#data-science-and-ml


dusky granite
#

i have put a random value there

#

rest is correct

#

also before it was 6 frames and adding steps_per_epoch made it 5

#

@serene scaffold you there?

serene scaffold
dusky granite
#

ok cool

serene scaffold
dusky granite
#

training dataset

serene scaffold
#

what class does it belong to?

dusky granite
#

i don't understand

#

wdym class

serene scaffold
#

what type of object is train_ds?

dusky granite
#

tf.keras.preprocessing.image_dataset_from_directory

#

this is what you asked right?

serene scaffold
#

yes. that function returns an instance of tf.data.Dataset

dusky granite
#

and that should work right?

serene scaffold
#

still looking

#

what is the import statement for image_learner @dusky granite

dusky granite
#

which one is import - creation of model,compile or fit?

serene scaffold
#

rephrase: what type is image_learner?

dusky granite
#

Sequential

serene scaffold
#

okay, and what is the import statement for Sequential?

dusky granite
#

i am not very sure i understood this but i think this is what you are asking for

#
```py
image_learner = Sequential([
  data_augmentation,
  layers.experimental.preprocessing.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
  layers.Conv2D(16, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(64, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Dropout(0.2),
  layers.Flatten(),
  layers.Dense(128, activation='relu'),
  layers.Dense(num_classes)
])
```
serene scaffold
#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

dusky granite
#

here

lunar zenith
#

I got it already, thanks for your help!

mint palm
#

Why am i wrong in question 7

dusky granite
mint palm
#

The table used in ques 7 is above it

serene scaffold
#

@dusky granite I noticed that you don't pass a y argument

dusky granite
#

well i don't have a y argument

dusky granite
serene scaffold
#

Looks like you don't need one when x is a tf.data instance of some kind

#

hmmmmmmmmm

dusky granite
#

also it worked as a gpu model

serene scaffold
#

btw I'm pretty sure you didn't share the whole error message

dusky granite
#

what should be the steps_per_epoch?

serene scaffold
#

"InvalidArgumentError: Unable to parse tensor proto" appears to be cut off

dusky granite
#

i can send an ss if you want

serene scaffold
#

Did you look into what "InvalidArgumentError: Unable to parse tensor proto" might mean?

dusky granite
#

like wrong input something missing

#

the screenshots

#

it was 6 frames earlier

#

adding the steps_per_epoch made it 5

#

i believe these 5 are based on the first one

serene scaffold
#

I don't think I have any ideas other than to check out how you're interacting with the GPU

dusky granite
#

it is able to connect to a tpu

#

here

dusky granite
serene scaffold
dusky granite
#

do you know what is currently wrong?

serene scaffold
#

No

dusky granite
#

thanks for your help

#

does anyone else know how to fix this error?

mint palm
#

Why am i wrong in question 7

#

I think my friend is right 💯

#

The given answer is option a

grave frost
#

did you try reading the TF docs?

dusky granite
#

As much as I could

grave frost
#

the model isn't supposed to be in the strategy.scope

dusky granite
#

The kaggle guide told me to do that

#

In order for it to work with tpu

grave frost
#

you just said you read the TF docs

#

use this ^^

dusky granite
#

I tried to where there were no other guides

grave frost
#

take out an hour, read about TPU and solve accordingly

#

go through the whole guide, even if it's not for your case

dusky granite
#
  model = create_model()
  model.compile(optimizer='adam',
                loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['sparse_categorical_accuracy'])
dusky granite
grave frost
dusky granite
grave frost
upper lily
# mint palm

Practically, I think your choice is a reasonable conclusion, but here are some other things to consider: there's pretty much always a higher error in validation/test splits (is 5% acceptable?). There's also a large gap between train error and human-level performance, which could indicate that the model itself may be a poor choice. Finally, a high discrepancy in train/val error doesn't necessarily mean validation samples are harder; they could simply belong to a separate distribution

dusky granite
somber prism
#

can someone explain the dummy variable trap to me? it is defined as the variables being highly correlated with each other, but how? i mean, aren't the features independent and the output variable dependent?

upper lily
upper lily
echo orbit
#

Hey, can someone enlighten me regarding the parameters that can help me know if a sample of a dataframe is a fair representation of the original dataframe please ? I'm thinking about bayesian inference but i don't really know how i could make it work (as my data are hashtags (and not values), i don't think i can work with mean values & std)

#

To sum up : i have multiple dataframes of different sizes (from 5m rows to 67m rows) and i, for example, took only 500K rows of each dataframe. Are there ways to verify if such samples' properties can be applied to the original dataframes ?

lapis sequoia
#

Hi guys. I don't know how relevant this is for Python but I am using Python to do this so I might as well just ask it here. My plan is to use images of a well-known object (a painting in a museum for example) and then use those to query some online database to retrieve the correct name for that painting.

I know this will involve AI but I was hoping someone here can point me in the right direction. Is there already some Python library to do this?
that online database could be a wikidatabase

dusky granite
#

@grave frost i figured most of the stuff out

#

just one thing

#

how can i put my own dataset here?

#
```py
                            as_supervised=True, try_gcs=True)
```
#

i currently use this for my datasets

#

tf.keras.preprocessing.image_dataset_from_directory

near cosmos
echo orbit
#

Hmmm

#

I think my issue is that i don't really understand what comparing distributions of sample and populations over parameters mean when my "values" are strings

#

If i had numerical values i could probably understand how to do it (with mean, std, etc...) but i really can't see how to make a distribution of strings

near cosmos
echo orbit
#

I think i'll go with a statistical analysis

#

though i wouldn't mind going with a bayesian analysis if that can help me verify if the sample's properties represent the original dataframe's properties as well

near cosmos
echo orbit
#

Yeah it seems i forgot to give the context :
-My main objective is to study how the COVID-19 crisis has been lived by Twitter users using hashtags more or less related with COVID-19.
-To do that, i downloaded a dataset on github containing CSV files of tweet/thread IDs with the hashtags (https://github.com/lopezbec/COVID19_Tweets_Dataset/tree/master/Summary_Hashtag)
-As these are hourly data, i concatenated them so they become monthly data & applied some edits (lowercase, delete NaN, etc...)
-Since there is an enormous amount of data, i decided to sample the dataframes so it becomes easier to study them
-My final objective is to use networkx to plot networks of the hashtags & to observe how they are linked to each other (2 hashtags mentioned in the same tweet/thread means they are linked).

However there is not a single analysis there : i need to figure out what would be the best parameters to sample the dataframes so i can tell if any property/result found in the samples would still be accurately found on the original dataframes.

Sorry if my english is kinda bad, please tell me if anything isn't clear
@near cosmos

GitHub

COVID-19 Tweets Dataset. Contribute to lopezbec/COVID19_Tweets_Dataset development by creating an account on GitHub.

#

The dataframes look like this (that was for ex after taking 500K rows) :

```
           Index             Tweet_ID            Hashtag
5              5  1219774023246192640               #cdc
15            15  1219775789270351873       #coronavirus
21            21  1219778666877448192       #coronavirus
30            30  1219781023023685633            #cancer
32            32  1219781023023685633         #ourmoment
43            43  1219784288687792128    #chinapneumonia
57            57  1219788195799302144       #coronavirus
60            60  1219789360725483520       #coronavirus
67            67  1219794180274151424               #cdc
73            73  1219794987451158528        #wuhanvirus
79            79  1219797876127215616            #corona
127          127  1219807844599336961       #coronavirus
132          132  1219808345713868800       #coronavirus
146          146  1219811459200249856             #sb276
153          153  1219811598845349888                #us
154          154  1219811697176612867  #wuhancoronavirus
156          156  1219811935765483520    #wuhanpneumonia
176          176  1219815212607594506       #coronavirus
193          193  1219817193241751553        #jesussaves
197          197  1219817448083464192          #breaking
205          205  1219818433165946880             #china
209          209  1219818516473270273  #wuhancoronavirus
210          210  1219818516473270273    #wuhanpneumonia
229          229  1219821014735118337      #publichealth
236          236  1219821518718406658  #wuhancoronavirus
238          238  1219821744724369408       #coronavirus
242          242  1219822045367656449             #wuhan
249          249  1219823148998119425       #coronavirus
256          256  1219823683092566016       #coronavirus
257          257  1219824203454529536    #chinapneumonia```
near cosmos
#

So a simple first check then is to calculate your network over several resamples and see if it varies too much for your purposes

echo orbit
#

In that case is there a parameter i can look at to verify if my networks are sufficiently accurate ?

near cosmos
#

You can also look at cross-validation techniques, which are essentially aimed at the same question (is my sample good)

echo orbit
#

Also is there another parameter that can evolve with the sample size so i can see at what exact/approximative value my sample size starts to be good enough ?

near cosmos
#

I'd say the parameter you care about and what is "good enough" is specific to your question/domain. That is, there isn't one right technical answer: You have to think about what you care about and what matters to your problem. But what I would tend to look at is, say, variance in the linkage strengths

echo orbit
#

I see

#

There is still something i don't understand : iirc cross-validation needs an estimator, however i don't think i have any estimator to use (as i'm not trying to make predictions on the datas)

#

In that case how should i define my cross-validation method ?

near cosmos
#

I'd say it this way: you are calculating some statistic (edge strengths in a network) and you want to estimate the variance. Resampling, bootstrapping, cross-validation, etc all give techniques for creating new samples so that you can measure variance.

echo orbit
#

Then i can just make new samples using the DataFrame.sample function

near cosmos
#

(Or in Bayes world, you are estimating the posterior distribution for linkage strength)

echo orbit
#

In that case what would be the objective ? Get the variance as low as possible ?

near cosmos
#

Yes, if your goal is to show that your sample doesn't alter your conclusions

echo orbit
#

I may be dumb but i don't see how i can establish a link between the sample size & the accuracy of the network

#

Like even if i get a pretty low variance (which will tell me i get similar results regardless of the samples (of the same size ?)), how does that tell me that my sample is sufficiently large to correctly represent the original dataset without being too large so i can work on it with satisfying execution times ?

mint palm
upper lily
#

Oh totally, it's a bunk question 😛

mint palm
#

Is it just to be ignored or

#

Ya

#

I am just asking cuz instructor specified 30 times to go through questions in detail

#

I hv one more mind f question

#

I got this wrong too

crude fable
#

Does anyone have an idea of how many GCN layers should be stacked when using GCNs?

mint palm
#

This is the referenced table for it

mint palm
# mint palm

I tried double crossing by not doing option 2 ( too obvious) lol

near cosmos
echo orbit
#

That's right

near cosmos
#

If you increase your subsample size, your uncertainty about the actual value of the sample will go down. You can demonstrate that by subsampling a bunch of times, perhaps sweeping the sample size and showing you converge

#

If I did 1000 repeat experiments on the real world and showed that my uncertainty on my stats were very small, wouldn't you say that was good evidence I am appropriately sampling reality?

#

We are using the same logic here to show that your subsampling is appropriate
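
The subsample-and-compare check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `edge_weights` and `edge_share_spread` are hypothetical helper names, the toy `population` stands in for the real data, and each item is assumed to be the hashtag list of one tweet (that is, rows were grouped by `Tweet_ID` first, so co-occurring hashtags stay together).

```python
import random
from collections import Counter
from itertools import combinations
from statistics import pstdev


def edge_weights(tweets):
    """Count how often each unordered hashtag pair shares a tweet."""
    counts = Counter()
    for tags in tweets:
        for pair in combinations(sorted(set(tags)), 2):
            counts[pair] += 1
    return counts


def edge_share_spread(tweets, edge, sample_size, repeats, rng):
    """Std. dev. of one edge's share of tweets across repeated subsamples."""
    shares = []
    for _ in range(repeats):
        sample = rng.sample(tweets, sample_size)
        shares.append(edge_weights(sample)[edge] / sample_size)
    return pstdev(shares)


# Toy population (an assumption): 30% of tweets carry both hashtags.
rng = random.Random(0)
population = [["#coronavirus", "#wuhan"]] * 3000 + [["#cdc"]] * 7000
edge = ("#coronavirus", "#wuhan")

# Sweep the sample size and record how much the edge strength varies.
spreads = {
    size: edge_share_spread(population, edge, size, repeats=50, rng=rng)
    for size in (100, 1000, 5000)
}
for size, spread in spreads.items():
    print(size, round(spread, 4))
```

If the spread stops shrinking meaningfully past some sample size, that size is a defensible choice; what counts as "meaningfully" depends on the question being asked of the network.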

echo orbit
#

So i should, let's say take multiple subsamples (10%,20%,30%, etc...), plot the network for each subsample then notice a convergence in the network constitution (similar links, same nodes, same nodes size, etc...) & conclude my model is accurate at whatever sampling coefficient i notice the convergence ?

#

Where would the bayesian analysis interfere then ?

near cosmos
echo orbit
#

Then i plot the variance as function of the sample size (in % for ex)

#

And take the sample size at which the variance is minimal ?

near cosmos
#

Well, it's going to always be smaller as your sample gets bigger, so look at it first like you are characterizing the behavior and demonstrating that your choice is reasonable

#

But yeah, you've got the basic idea. Give it a go

echo orbit
#

Alright

#

Just in case : where would the bayesian inference/analysis come in please ? Like where in the advices you gave me can i see a bayesian reasoning please ?

near cosmos
echo orbit
#

That's a bit hard to imagine ngl

near cosmos
#

Frequentist: there is one true linkage strength + error in my ability to measure it. Bayesian: there is a family of linkage strengths, some more likely than others
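
One way to make the Bayesian framing concrete, assuming (this is not from the discussion) that a linkage strength is modeled as the probability that a tweet contains a given hashtag pair, with a uniform Beta(1, 1) prior: after seeing `k` co-occurrences in `n` sampled tweets, the posterior is Beta(1 + k, 1 + n - k). `beta_posterior` is a hypothetical helper name.

```python
from math import sqrt


def beta_posterior(k, n):
    """Posterior mean and std. dev. of a proportion under a uniform prior."""
    a, b = 1 + k, 1 + n - k
    mean = a / (a + b)
    std = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, std


# Same observed share (30%), two different sample sizes.
small = beta_posterior(k=30, n=100)
large = beta_posterior(k=3000, n=10000)
print(small, large)
```

The posterior for the larger sample is narrower, which is the Bayesian counterpart of the shrinking variance across resamples.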

echo orbit
#

From what i understand, i have a distribution of sample size correlated with a distribution of results & i want to find what sample size maximize the probability to get the correct result

near cosmos
#

(With apologies for discussing complex topics over chat)

echo orbit
#

I think i start to understand what is the difference between both

warped pebble
#

hello can someone help me involving finding the Exponential Moving Average from a list of prices

grave frost
#

Signal Processing people - what is the technical term for the specific frequency of an audio file that occurs most of the time? (for ex. say I have a bass song and a lot of frequencies are near 2khz - what would be the technical term for it?)

timid halo
mint palm
#

What makes this wrong choice?

#

The answer is 3rd option

olive willow
#

whats this app called?

tranquil wadi
#
```py
from flask import Flask, render_template
import pandas as pd
from sklearn import linear_model

app = Flask(__name__, template_folder='template')

@app.route("/")
@app.route("/home")
def home():

     dataset = pd.read_csv("diabetes.csv")

     df = pd.DataFrame(dataset, columns=['Gender', 'AGE', 'Urea', 'Chol', "BMI"])

     X = df[['Gender', 'AGE', 'Urea', 'Chol', 'BMI']]
     Y = df['CLASS']

     regr = linear_model.LinearRegression(n_jobs=-1)
     regr.fit(X, Y)

     X_TEST = [['F', 24, 4.5, 4.2, 21]]

     predicted_val = regr.predict(X=X_TEST)

     return render_template('index.html', data = predicted_val)


if __name__ == '__main__':
    app.run(debug=True)```
#

this giving me indent error : X = df[['Gender', 'AGE', 'Urea', 'Chol', 'BMI']]
^
IndentationError: unindent does not match any outer indentation level

flint mason
#

try shift+tab on the content inside the function home

#

Or so I think

tranquil wadi
#

it worked

kindred blade
#

Can someone tell me how to start learning machine learning? I'm a beginner in the AI field. I love this field so much that I want to get into it, but I'm not a beginner at data analysis and visualization, so I believe that will help

desert oar
# mint palm

because that's not how overfitting works. you can still overfit within one class even if you are reasonably confident that you have a representative sample of other classes

flint mason
#

have a look at its graph, it's quite straightforward

near cosmos
#

The range of cosine function is [-1, 1]

near cosmos
#

Definition: cosine(theta) is the x coordinate of the point on a unit circle at angle theta.

tidal bough
#

As for why, well: dot product of two vectors divided by the product of their norms is the cosine of the angle between vectors. That's from -1 to 1. Then you take 1 minus that, and get a value from 0 to 2.

#

With 0 achieved by collinear vectors, and 2 by antiparallel ones.
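
The 0-to-2 range can be checked with a small sketch in plain Python; `cosine_distance` is a hypothetical helper written for this illustration, not a library function.

```python
from math import sqrt


def cosine_distance(a, b):
    """1 minus the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


same_direction = cosine_distance([1.0, 0.0], [2.0, 0.0])  # cosine is 1
orthogonal = cosine_distance([1.0, 0.0], [0.0, 3.0])      # cosine is 0
opposite = cosine_distance([1.0, 0.0], [-3.0, 0.0])       # cosine is -1
print(same_direction, orthogonal, opposite)
```

Note that the first pair has distance 0 even though the two vectors are not equal, only pointing the same way.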

kindred blade
#

and not only this question , another question since days and not answered

iron basalt
kindred blade
iron basalt
# kindred blade isnt matplotlib and ChartJs data science related ?

They are related to data science in the same way that knowing how to open a window is related to game development. Data science is also a buzz term and therefore related to everything. The question is like asking if math is related to physics. Also the previous question was web stuff so it's better suited for the web dev channel.

grave frost
#

I don't think it makes sense computationally - and we have a mathematical foundation for the negative range

near cosmos
near cosmos
tidal bough
#

distance functions with negative values are kinda bad. Though even the 0-2 cosine distance isn't positive definite (or whatever that property is called) - cosine(a,b)==0 means a || b, but not necessarily a==b, so it's not a metric on unnormalized vectors. (Even on normalized vectors it isn't strictly a metric, since 1 - cos can violate the triangle inequality.)

iron basalt
kindred blade
#

there are unlimited projects to do with ML

grave frost
iron basalt
maiden saddle
#

how long is the learning curve from beginner to coding ai

grave frost
oak laurel
#

Hi people! I'm an undergrad that wants to make the best of his summer, and I'm looking for good courses to learn skills in Data Science/ML/AI. I'm not really sure where to start, but I've been suggested to look at PyTorch as a starting point (since I'm already decently fluent in Python), and then make my way up to SQL. Can somebody suggest any good online courses/bootcamps/online resources to use over the summer? Thanks.

lapis sequoia
#

Why are neural networks not used for everything?

#

Like, why would you choose to use a statistical model like a Random Forest over a neural network?

velvet thorn
#

lower computational complexity

#

higher interpretability

#

access to software

lapis sequoia
#

so generally just for computation?

#

would big tech companies generally always use NNs then?

#

since they have the resources

velvet thorn
#

interpretability and variance matter

#

NNs aren't better @ solving all problems

#

example

velvet thorn
#

with high enough time complexity and dataset size, nobody has enough resources

lapis sequoia
#

Makes sense, thanks

fast dune
#

@velvet thorn
You helped me 2 days ago on Numpy array broadcasting. I've been studying array broadcasting, but I'm still stuck. I rewrote my code snippet (https://paste.pythondiscord.com/inoyecifax.py) and I've included the necessary utility functions to convert image files to integer matrix for testing purposes.

lapis sequoia
#

Does anybody use MatLab anymore

mint palm
mint palm
#

If we take fog example:
If we Use 1000 fog sample on 100,000
One fogs texture would be repeated 100 times

tiny flax
#

qq decision trees use binary trees as a structure/support?

uncut barn
#

what type of regularisation is used for linear classifiers?

tranquil wadi
#
```py
@app.route("/home", methods=['GET', 'POST'])
def home():

     form= DataForm()
     if form.validate_on_submit():
        gender = float(form.gender.data)
        age = float(form.age.data)
        urea = float(form.urea.data)
        cr = float(form.cr.data)
        hba1c = float(form.hba1c.data)
        chol = float(form.chol.data)
        tg = float(form.tg.data)
        hdl = float(form.hdl.data)
        ldl = float(form.ldl.data)
        vldl = float(form.vldl.data)
        bmi = float(form.bmi.data)
     else: 
        pass

     DATA = [[gender, age, urea, cr, hba1c, chol, tg, hdl, ldl, vldl, bmi]]```
#

Error: gender = float(form.gender.data) TabError: inconsistent use of tabs and spaces in indentation

unique birch
#

you have some indents with tabs and some with spaces

tranquil wadi
#

what shall i do?

unique birch
#

Redo your indents, check your whitespace if your text editor has that

tranquil wadi
#

that worked, but now it is giving the error DATA = [[gender, age, urea, cr, hba1c, chol, tg, hdl, ldl, vldl, bmi]] UnboundLocalError: local variable 'gender' referenced before assignment
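
The `UnboundLocalError` above comes from the names only being assigned inside the `if` branch: when validation fails (for example on the initial GET request), the `else: pass` path reaches the `DATA = ...` line with nothing assigned. A minimal sketch of one way around it, without Flask; `build_features`, `FIELDS`, and the plain-dict form are hypothetical stand-ins for the real `DataForm`.

```python
FIELDS = ["gender", "age", "urea", "cr", "hba1c", "chol",
          "tg", "hdl", "ldl", "vldl", "bmi"]


def build_features(form_is_valid, form_data):
    """Return one feature row, or None when the form was not submitted."""
    if not form_is_valid:
        # Return early instead of falling through to code that
        # needs values which were never assigned.
        return None
    return [[float(form_data[name]) for name in FIELDS]]


row = build_features(True, {name: "1.5" for name in FIELDS})
nothing = build_features(False, {})
print(row, nothing)
```

In the Flask view, the equivalent would be to build `DATA` and call the model inside the `if form.validate_on_submit():` block, and to render the empty form otherwise.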

lapis sequoia
#

Hi, I'm looking for help with seaborn/matplotlib - How can I put many stripplots on same figure, with (categorical) axis common for all plots?
I tried to do it like this:

sns.stripplot(y="Project", x="DUT_result", data=df[df['Test_result']=='Pass'], hue='Country', marker = 'o')
sns.stripplot(y="Project", x="DUT_result", data=df[df['Test_result']=='Fail'], hue='Country', marker = 'X')

but the second plot changes the y axis, as the rows with Test_result==Pass contain different Project names than ones with Fail.

south crag
#

Hey i am having error importing pipeline as pkl

#

This is the error:

digital merlin
#

Hi, I would like to know if let's say I'm going to train a machine learning model based on a stroke dataset from kaggle, is it possible if I create a form asking for some stuff using a form or data from another file and it can predict if the patient has a stroke? I'm not too sure if it's possible, googling doesn't give me any info if it is

ripe forge
#

Sure, possible. Why wouldn't it be. As long as the data from the form can be converted (which, yes it can. You're the programmer you're in control) then that model can make a prediction on it

digital merlin
#

And is the conversion via one-hot encoding?

ripe forge
#

That depends entirely on how the train data was prepared.

#

You follow the same steps you took

lapis sequoia
#

i need help regarding ocr

#

is this the right chatroom?

digital merlin
#

@ripe forge but would it be possible if let's say I'm doing it on an application, and I've already prepared the trained data and have already built the application? Because what I'm doing is basically the user entering the data from the form and afterwards there'll be a result of whether the patient has a stroke or not

#

sorry for ping

lapis sequoia
#

are there any libraries which can implement OCR on a screen that is being scrolled live?
like for example if i want to make a program which constantly watches my screen and keeps converting text on the screen to like a dictionary and if a certain word pops up on the screen, it needs to detect it and then shut the program off
its kinda like u have a screen recorder(like obs) and u are implementing ocr at the same time

teal nova
#

off topic question here, but are there any university professors specializing in mathematical modelling/statistics or economics here?

mossy stratus
#

I'm making a discord bot to play card games and was wondering if the best way to make an AI for this would be to have it "cheat".
If you know of a better way, please tell me.

grave frost
#

Noob Question: why can't the attention mask have values between 0 and 1 (rather than only 0 and 1)? Kind of thinking them as weights for instance, then if I want the model to have partial attention to a token, can't I use something like [0, 0.5, 0.8, 1, 1]??

ripe forge
#

So, you need to save the steps, and the model. and then re-do the dataprep steps on the new data, and load the model, and run a predict on the data
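
Saving the preparation steps and replaying them on new input could look roughly like this. A minimal sketch, assuming one-hot encoding was the preparation used in training; `fit_prep`, `transform`, and the toy records are hypothetical, and the model itself is left out.

```python
import json


def fit_prep(rows, categorical):
    """Learn the categories seen in training for each categorical column."""
    return {col: sorted({row[col] for row in rows}) for col in categorical}


def transform(row, prep, numeric):
    """Turn one record into a feature vector using the saved categories."""
    features = [float(row[col]) for col in numeric]
    for col, categories in prep.items():
        features += [1.0 if row[col] == cat else 0.0 for cat in categories]
    return features


# Training time: learn the preparation and persist it next to the model.
train = [{"gender": "F", "age": 61}, {"gender": "M", "age": 45}]
prep = fit_prep(train, categorical=["gender"])
saved = json.dumps(prep)

# Prediction time: reload the saved steps and apply them to the form input.
form_input = {"gender": "F", "age": "24"}
vector = transform(form_input, json.loads(saved), numeric=["age"])
print(vector)
```

The resulting vector has the same column order as the training features, which is what the loaded model would expect.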

grave frost
#

> then re-do the dataprep steps on the new data
That's kinda discouraged now. it's recommended you keep pre-processing as custom layers in the model itself to keep it simple and quick

dusky granite
#

figured some more stuff out
i need help creating this type of dataset

```py
                            as_supervised=True, try_gcs=True)
```
i generally do this type
```tf.keras.preprocessing.image_dataset_from_directory```
vocal bay
#

I have a use case where I will have a central model which needs to be trained and will deliver predictions. My data is harvested in real time from multiple clients across the internet. I originally thought of making a system where each time a client collects a data point, it sends it to the server with an API and is added to a queue which will contain data which will be batched and trained on. However, this data is sensitive and as far as possible, I would like to avoid it traversing the internet.

Then I came across PySyft which would allow me to train on data remotely. However, can PySyft be used in a way where there is one model and innumerable data sources, instead of multiple models sharing one data source? Also, my use case requires that data clients can come online and offline randomly so it should be possible to add and remove data sources while training.

I appreciate that this is probably a niche use case but would greatly appreciate any guide on the matter.

#

Please tag in responses

grave frost
#

> data is sensitive and as far as possible, I would like to avoid it traversing the internet.
what?

tall zinc
#

I have a binary image with some features in it, but sometimes they touch and I want to get individual contours for each. Is there a "more proper" way to go about this than just using cv2.drawContours() to draw a black outline around each directly onto the binary image, thereby shrinking them, and then contour it again?

#

It feels like such a hack but like ... it does work perfectly

tall zinc
#

I don't need noise got rid of, so I don't need the dilate afterwards, and if anything the dilating would just put me right back where I started. I guess it would just be erosion that I want on reflection

tall zinc
near cosmos
tall zinc
#

Yeah erosion and opening are both handy tools but it's just a bit too destructive for this purpose, sadly. I'm fine with my hacky solution I just wasn't sure if there was a "proper" one. Though I guess erosion is that proper solution, I'd just need to also retain the un-eroded mask so I could use that for getting the contours within these other ones

near cosmos
nova tapir
#

how can i solve this problem

tall zinc
#

Dense() apparently needs to be given units, either by name or by position ahead of the named arguments

tall zinc
nova tapir
tall zinc
#

Sounds like you provided 6 logits and only 1 label, rather than one for each

green owl
#

does anyone here have experience making a webcrawler

limpid saddle
#

My data seems to be overfitting, how can I do regularisation?

bold timber
#

Anyone can tell me why I get an error?

heavy tree
#

At the bottom of the error message it says what the problem is. You are trying to convert Florida directly into a number... Not sure if that's possible.

tall zinc
#

Florida doesn't float? Better hope those sea levels don't rise

remote flume
#

hello, does anybody know if speech-to-text conversion with speech recognition is possible on the output audio?

desert oar
jade carbon
#

btw, what are differences in algorithm between object detection and grad cam?

upper lily
# limpid saddle My data seems to be overfitting, how can I do regularisation?

Hey @limpid saddle , that's quite a broad question. There are many regularization techniques available, but your choice of which technique to use will depend on the data modality (e.g. tabular, imagery, text, etc) and modeling approach. Folks might be able to help more if you could describe the problem, data, and model :)

#

For example, @uncut barn asked this morning about regularization with linear classifiers. In that case, L1 and/or L2 regularization are typically used

#

But if you're working with a neural network and imagery, it's typical to use Dropout layers, batch normalization, and data augmentation

jolly nest
#

!e what should I add? ```py
#LVL 0
mean=avg=lambda*a,n=1,sigma=0,**_:sum(a)/len(a or[()])if a else sigma/(n or 1)
sqrt= lambda n:n**0.5
_p_m= lambda a,b:(min([a+b,a-b]),max([a+b,a-b]))
c = lambda*a:a

def sd(*a,**k):
    mean = k.get('mean',avg(*a,n=k.get('n',len(a))))
    return sqrt(avg(*[abs(i-mean)**2 for i in a]))

#LVL 2
def SE(*a,**k):
    s = k.get('s',sd(*a))
    n = k.get('n',len(a))
    return s/sqrt(n)
def z(x, *a, **k):
    mean = k.get('mean',avg(*a))
    s = k.get('s',sd(*a,**({'mean':mean}|k)))
    return (x-mean)/s
def raw_score(z, *a, **k):
    k['mean'] = k.get('mean',avg(*a))
    s = k.get('s',sd(*a,**k))
    return z*s + k['mean']
#LVL 3
s2=lambda*a,**k:k.get('s',sd(*a,**k))**2
def t_stat(*a,mu0=0,**k):
    mean = k.get('mean',avg(*a))
    se = k.get('SE',SE(*a,**k))
    return (mean-mu0)/se
def conf_int(*a,T,**k):
    se = k.get('SE',SE(*a,**k))
    return _p_m(k.get('mean',avg(*a)),T*se)

#LVL 4
def summary(*a,**k):
    k['mean']=k.get('mean',avg(*a,**k))
    k['s']=k.get('s',sd(*a,**k))
    k['SE']=k.get('SE',SE(*a,**k))
    k['t']=t_stat(*a,**k)
    if'T'in k:
        k['confidence_interval']=conf_int(*a,**k)
        _01 = [k['t'] <- k['T'], k['t'] >+ k['T']]
        k |= {'H0':'reject'if any(_01)else'accept','Ha':['mu < mu0']*(_01[0])+['mu > mu0']*(_01[1])+['mu != mu0']if any(_01)else'reject'}
    k['Z-score']=z(k.get('mu0',0),*a,**k)
    for n,v in k.items():print(n,'=',v)

data = {'n':40,'mean':172.55,'s':26.33}
summary(**data)
data |= {'mu0':166.3,'T':2.021}
print(summary(**data))
```

arctic wedgeBOT
#

@jolly nest :white_check_mark: Your eval job has completed with return code 0.

001 | n = 40
002 | mean = 172.55
003 | s = 26.33
004 | SE = 4.163138539611671
005 | t = 41.44709534842795
006 | Z-score = -6.5533611849601225
007 | None
008 | n = 40
009 | mean = 172.55
010 | s = 26.33
011 | mu0 = 166.3
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/sufijoxodi.txt?noredirect

jolly nest
#

actually, how do you get p-value?

#

I see it everywhere, but no clear way to get it algorithmically

#

just modules and ambiguous formulae
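
A p-value is the probability, under the null hypothesis, of seeing a test statistic at least as extreme as the observed one. A minimal sketch using only the standard library and a normal approximation (reasonable for larger samples); `two_sided_p_value` is a hypothetical helper name, and the numbers reuse the sample statistics from the snippet above. For small samples the Student t distribution would be the usual choice, for example via `scipy.stats.t.sf`, which this sketch does not use.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(mean, mu0, s, n):
    """Two-sided p-value for H0: mu == mu0, using a normal approximation."""
    se = s / sqrt(n)          # standard error of the mean
    z = (mean - mu0) / se     # standardized test statistic
    # P(|Z| >= |z|) for a standard normal Z
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


p = two_sided_p_value(mean=172.55, mu0=166.3, s=26.33, n=40)
print(p)
```

With these numbers the p-value comes out above 0.05, which agrees with the statistic of about 1.5 not exceeding the critical value 2.021 used above.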

lapis sequoia
#

hello

#

am interested in machine learning and ai

#

i just figured web development wasnt for me

#

i hope you dont need any form of html/css or js for machine learning and ai 😞

ripe forge
#

Not really, no. Though those are useful things to know if you ever need to deploy stuff yourself

#

Even basic knowledge is enough

winged stratus
#

anyone know how to do early stopping in pytorch?

ripe forge
#

In PyTorch you write the training loop yourself, yeah? Just add logic for checking validation data in the loop and keep track of the last validation score

#

If the score becomes worse, break
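
A minimal, framework-agnostic sketch of that loop. The recorded `val_losses` list stands in for validating after each epoch, and the `patience` parameter (how many non-improving epochs to tolerate before breaking) is an assumption added here, not something stated above; `train_with_early_stopping` is a hypothetical name.

```python
def train_with_early_stopping(val_losses, patience=2):
    """Return (epoch where training stopped, best validation loss seen)."""
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        # In a real loop: train for one epoch here, then compute `loss`
        # on the validation set.
        if loss < best:
            # Improvement: remember it (and typically save the weights).
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best  # stop early
    return len(val_losses) - 1, best


stopped_at, best_loss = train_with_early_stopping(
    [1.0, 0.8, 0.7, 0.75, 0.72, 0.9], patience=2
)
print(stopped_at, best_loss)
```

Setting `patience=1` reproduces the stricter "break as soon as the score gets worse" rule.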

nova tapir
limpid oak
#

hello every one
I have one python code which generates .csv as output,but I want to share as link where, when user hit url csv should be downloaded at user side
please help

lapis sequoia
#

how advanced math do i need for ai

mint palm
dusky granite
nova tapir
#

I have a code from about 3 years ago, and i don't know how can i fix it

#

I guess it's code for older versions of keras and tensorflow, and there are some bugs inside

#

can someone fix this for me? it's a CNN code

#

if you guys want to help, text me, I can send the code and data set

cedar finch
#

Anyone have any recommended texts for data science and ai?

serene scaffold
blissful dragon
#

I want to make a Gantt chart for different phases of a space mission. I'm using matplotlib atm (not averse to using bokeh). Trying to add more entries along the y-axis and it's not displaying for whatever reason. The timeframe I want to do it over would be over years.

#

nvm

acoustic forge
#

Does it make sense to do cross validation on undersampled data?

cedar finch
dusty parcel
#

and any recommendations on a data analytics book? I've heard of 'python for data analysis' from Wes McKinney but I think it's kind of outdated as for today

torn pollen
#

Is it correct that with the right training, I can setup a tensorflow AI with python that will solve captchas with 90%+ accuracy?

hushed hatch
#

Hi Anyone know if we have anything to figure out difference between figures and text? Something that would mark figures and text seperatly in an Image?

upper stirrup
#

Hi everyone!lemon_pleased Can someone help me "Natural Language Processing"(nlp) in "word level tokenization"?

mystic orchid
#

Hi everyone! What you can recommend to learn for noob, who want to learn ml. Have knowledge of python.

arctic wedgeBOT
#

Hey @lapis sequoia!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

kindred radish
#

Ok this is driving me crazy, can someone confirm something about neural networks:

#

These neurons/nodes... do they ALL have summation and activation functions in them?

#

This is how i thought a single layer perceptron looked:

#

Where the circle and box are the summation and activation functions respectively

#

But this confuses the "input layer" and "output" layer terminology.

#

(please ping me!)

fading thunder
fading thunder
late shell
#

Hello, I'm learning about regression algorithms, and I'm having trouble understanding the Support Vector Regression. I don't understand why we want to minimize the coefficients vector w? Also how does C play a role in affecting the model? I read this explanation of C on youtube : How much should an SVM/SVR care about getting everything right vs getting the things that it gets right very right. But I don't really get it. Can someone please explain it in noob language. Thanks

desert oar
#

@kindred radish the terminology is indeed confusing, for the reason you mention: the idea of a "layer" doesn't make sense when you think of a neural network as a sequence of data transformations

fading thunder
desert oar
#

@kindred radish maybe it's best to think of a "layer" as "the data resulting from a transformation", if that helps at all. Also, not all "layers" are neatly divisible into "linear component" and "elementwise activation", cf. convolutional layers, attention units, softmax output layer, etc.

fading thunder
late shell
#

I'm sorry, I've not yet read about logistic regression. I'm a beginner.

#

The course I'm following instructs us to study about SVR before SVM, that doesn't make sense to me though.

fading thunder
late shell
#

okay

#

alright, thanks @fading thunder

fading thunder
#

And follow it until you reach SVMs

desert oar
# late shell Hello, I'm learning about regression algorithms, and I'm having trouble understa...

This is a technique called "regularization". The very general idea is that if the weights are "small", then our estimates are closer to a common "baseline" estimate, and we aren't making wildly strong predictions and/or making wildly large changes in predictions for small changes in the input data.

This helps prevent overfitting, at the cost of "shrinking" our predictions towards a baseline and potentially reducing the sensitivity of the model to variation in the inputs. In the extreme case, an over-regularized model might predict the same output for every input.

By adding the total size of the weights into the objective function, we are telling our optimization process to penalize bigger weights and prefer smaller weights. We control the strength of the penalty by adjusting C.

On the math side of things, this technique is an application of something called "Lagrange multipliers" that you will learn about in university-level calculus courses.

The technique of regularization by penalizing the "size" of the weights is quite general and you will see it appear in different types of machine learning models.

Edit: specifically in the context of an SVM, the optimization problem is not possible to solve without this weight penalization thing... there are other mathematical interpretations for what is happening here.
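
For reference, the standard textbook form of the epsilon-insensitive SVR problem makes the two competing terms explicit. This is the usual formulation, not something quoted from the course being followed:

```latex
\min_{w,\,b,\,\xi,\,\xi^*} \;
  \frac{1}{2}\lVert w \rVert^2 \;+\; C \sum_{i=1}^{n} \left(\xi_i + \xi_i^*\right)
\quad \text{subject to} \quad
\begin{cases}
  y_i - (w^\top x_i + b) \le \varepsilon + \xi_i \\
  (w^\top x_i + b) - y_i \le \varepsilon + \xi_i^* \\
  \xi_i,\ \xi_i^* \ge 0
\end{cases}
```

The first term keeps the weights small (a flatter function); the second term adds up the errors that fall outside the epsilon tube. In this form C multiplies the error term, so a large C punishes errors heavily and the weight penalty matters less, while a small C gives stronger regularization.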

lapis sequoia
#

somebody here can help me get all plugins i need in pycharm for an own voice asisstented i cannot download them and i dont know why

kindred radish
#

Like, i guess the "output layer" is the "outer-most hidden layer"?

#

according to that diagram i made?

desert oar
#

I'd rephrase it as, the output layer is just the last layer. And the input layer is a special case where the data flows in from "outside", rather than resulting from a computation.

kindred radish
#

Right I think that clears that up nicely? Lemme sketch something quick

desert oar
#

A "layer" itself is a computation with inputs and outputs. The idea of a layer as "a bunch of nodes" kind of breaks down once you get away from fully-connected multilayer perceptron models.

late shell
grave frost
#

Noob Question: why can't the attention mask have values between 0 and 1 (rather than only 0 and 1)? Kind of thinking them as weights for instance, then if I want the model to have partial attention to a token, can't I use something like [0, 0.5, 0.8, 1, 1]??

kindred radish
#

Would this be right?

#

each circle represents a node that contains AFs and the like

wild dome
#

I'm trying to detect the circle of the wheel and draw its shape on the image

#

this is what I have but it's not working

#
img = cv2.medianBlur(img, 7)
circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, 1, 300, param1=30, param2=45, minRadius=0, maxRadius=0)
for c in np.uint16(np.around(circles))[0, :]:
    a, b, r = c[0], c[1], c[2]
    cv2.circle(img, (a, b), r, (0, 255, 0), 2)
plt.imshow(img, 'gray');
#

how can I adjust the parameters?

timid halo
wild dome
#

maxRadius=1000

#

same with 500

timid halo
# wild dome

well, it's a progress, better more than none, amirite? apparently it's seeing everything else but the one circle in the middle 😂

wild dome
#

do I have to apply more filters? reduce more noise?

timid halo
#

that would be the next step, yea

tawdry hamlet
#

With Wasserstein GANs, how easily comparable are Critic outputs as metrics? I understand that the Critic doesnt output a 0-1 probability of the data being fake/real and instead is more of an abstract score reflective of 'real-ness', but I am not sure if a battery of Critic outputs are 'comparable' between runs?

#

I have a WGAN model that at the end of training runs the original dataset through the Critic and records the output for every datapoint and im trying to see whether or not by repeating this many times to produce many runs I can take the mean of each run for each datapoint as a means of aproximating a score of normality for the purposes of outlier detection

#

Anyone know anything about that sort of thing?

mint palm
# wild dome

its actually so funny to see nn mistakes for the first time in vision

#

this gets me excited for upcoming computer vision lectures

#

😆

desert oar
# kindred radish

Honestly... I think you are overthinking the whole nodes and layers things

#

But yes, that diagram looks fine

kindred radish
#

I'm writing a dissertation about it so I have to be specific hahaha

#

Thank you

desert oar
#

Oh? What's the topic?

iron basalt
# kindred radish I'm writing a dissertation about it so I have to be specific hahaha

Home page: https://www.3blue1brown.com/
Help fund future projects: https://www.patreon.com/3blue1brown
Additional funding for this project provided by Amplify Partners
An equally valuable form of support is to simply share some of the videos.
Special thanks to these supporters: http://3b1b.co/nn1-thanks

Full playlist: http://3b1b.co/neural-netw...

#

A neural network diagram is just a computation graph, but specific to processes which very loosely mimic neural networks.

green owl
#

Hi all I currently trying to write program to scrape numerical data from a website, but I am having a hard time finding any tutorials on finding some reference material or code to extract the: Date, Volume & Short Volume from a table

#

Can anyone help or point me in the right direction? can't seem to figure out how to get the table

median ember
#

Hello, I have a numpy question ( likely a matrix operation )
I have 2 arrays, A and B, with sizes 2,3 and 3,3
I want to subtract A - B in a way that I would have a result with the shape 3,2,3 or 2,3,3 with all the results
is this possible?

#

@tidal bough is it better?

tidal bough
#

(like how, for example, for elementwise multiplication C[i,j] = A[i,j] * B[i,j])

#

you might be looking for np.subtract.outer, but that would be an array of shape 2,3,3,3 with formula C[i,j,k,l] = A[i,j] - B[k,l].

median ember
#

but basically I want every element of A iterate through every element of B and return the results

#

when the arrays have same size, I use a for and np.roll

#

like this:

A = [...]
for i in range(A.shape[0]):
    B_shifted = np.roll(B, i, axis=0)
    C = A - B_shifted
#

but since they don't have same size, I can't do that
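
One hedged reading of the request is "subtract every row of B from every row of A", which gives a (2, 3, 3) result and can be done with broadcasting instead of a loop. The array contents below are made up, and the interpretation is an assumption based on the `np.roll` loop above.

```py
import numpy as np

A = np.arange(6).reshape(2, 3)        # shape (2, 3)
B = np.arange(9).reshape(3, 3) * 10   # shape (3, 3)

# Add a length-1 axis to each array so that every row of A
# is paired with every row of B when the shapes broadcast:
#   (2, 1, 3) - (1, 3, 3) -> (2, 3, 3)
C = A[:, None, :] - B[None, :, :]

# C[i, k, j] == A[i, j] - B[k, j]
print(C.shape)
```

Swapping the operands' axes (`A[None, :, :] - B[:, None, :]`) gives the (3, 2, 3) layout instead.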

median ember
desert oar
#

@serene scaffold i think i forgot to share this the other day, this was that pd.concat thing we were doing, but in hy:

(require [hy.contrib.walk [let]])
(import [pandas :as pd])

(setv df1
  (pd.DataFrame {"x" [1 2 3]
                 "y" [4 5 6]}
                :index ["a" "b" "c"]))

(setv df2
  (pd.DataFrame {"x" [11 22 33]
                 "y" [44 55 66]}
                :index ["c" "b" "a"]))

(defn myfunc [v1 v2]
  "Silly function for demo purposes"
  (/ (+ (get v1 "x")
        (get v2 "y"))
     2))

(setv result
  (let [myfunc2 (fn [row] (myfunc (get row "x")
                                  (get row "x")))]
    (doto
      (pd.concat {"x" df1 "y" df2} :axis 1)
      (.apply myfunc2 :axis 1))))

(print result)
velvet thorn
#

wtf

#

it's not April Fool's is it

desert oar
#

scheme-like language that compiles to python 🙂

#

ah, i seem to be using doto wrong, but this works:

(require [hy.contrib.walk [let]])
(import [pandas :as pd])

(setv df1
  (pd.DataFrame {"x" [1 2 3]
                 "y" [4 5 6]}
                :index ["a" "b" "c"]))

(setv df2
  (pd.DataFrame {"x" [11 22 33]
                 "y" [44 55 66]}
                :index ["c" "b" "a"]))

(defn myfunc [v1 v2]
  "Silly function for demo purposes"
  (/ (+ (get v1 "x")
        (get v2 "y"))
     2))

(setv result
  (let [myfunc2 (fn [row] (myfunc (get row "x")
                                  (get row "x")))
        df (pd.concat {"x" df1 "y" df2} :axis 1)]
    (.apply df myfunc2 :axis 1)))

(print result)
velvet thorn
#

^

desert oar
#

why not?

#

in fact, why wouldnt everyone do this

#

ive actually written datascience code like this, its not that bad

#

although i think lisp is better left as an application dev language style, too verbose to comfortably express "math" in lisp

#

I would definitely write ETL code this way though

#

run it on pypy for extra zoom

#

it even supports type annotations

serene scaffold
#

but do you get an optimization if you use them?

exotic maple
#

a quick question folks, just to make sure I have the idea right.

In general, cross_validation is performed to estimate the classifier / regressor performance on the training set (and skips the need for a validation split). Cross_val doesn't train a model, but only provide a estimation of performance.

Grid_search is used for hyperparameter tuning the selected model; grid search finds and remembers the "best" parameters for the scoring selected. A grid CAN be used for predictions.

Is this correct?

desert oar
#

But sometimes it's better to let the runtime do its own optimization. Although TCO would be really cool

#

It's probably better that the generated python is easy to reason about and mostly 1:1 with the hy code

dapper swan
#

Sorry to bother... It might be a dumb question but...
Does anyone know why I got different coefficients with sklearn's LinearRegression and statsmodels' OLS?

#

This is sklearn's result

#

while OLS's result like this...
only Year's coef same

desert oar
#

@dapper swan what's the .intercept_ of the sklearn model?

desert oar
#

maybe something is different in how the categorical State feature is expanded. can you show your code for both?

iron basalt
# desert oar why not?

Cost benefit analysis of a new programming language: Benefits: a syntax that you prefer | Costs: More code = more problems, you need to now learn a new language, everyone else that you work with needs to learn a new language, the language needs to be maintained, new programmers at the company need to learn this language making the hiring process a nightmare. Overall, more technical debt with no gain other than "I like it".

#

Not to misunderstand though, I like the idea of trying new programming languages and making them, but one must always be honest with oneself about the costs and benefits. Making a language worth adopting is hard since it requires some extreme benefits.

dapper swan
iron basalt
#

(The benefits for many actually comes through the standard library, not the language itself (it's why a language tends to take off (see ruby on rails, jquery, etc)).

dapper swan
#

And here is statsmodels'

iron basalt
#

(Or in the case of something like python, a giant distributed globally accessible library pool via pip)

desert oar
iron basalt
#

Or some more subtle "restrained" approach, like Rust in which the goal is to make things more safe or stable, etc.

desert oar
#

I have snuck it into a couple scripts at work but

iron basalt
#

The main advantage that something like Lisp has is that it's easy to implement, so back in the day when you were stuck in assembly and you wanted to quickly get out of assembly, you could quickly make Lisp.

desert oar
#

I do think that lisp like languages can express certain kinds of ideas and programs very elegantly

#

"Real" lispers seem to treat lisp as a kind of smalltalk style all-encompassing environment

#

I don't feel that way about it

#

But i do think there are some advantages to lisp and it certainly has its joys, although I don't believe any lisp will ever gain adoption at the level of python

#

@dapper swan do you mind sharing your code as text and not as a screenshot?

#

!code-block

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

iron basalt
#

(Also back then, Lisp's macros and meta-programming ideas were new-ish (in terms of popularity), but these days there is nothing stopping someone from making something like python or C but with Lisp-like macros or even cleaner).

iron basalt
desert oar
#

(Crystal and Nim also have strong syntactic macros, OCaml has its macro PPX thing, Idris has elaborator reflection, et al)

iron basalt
#

I am currently awaiting a bunch of C++ replacements to have such strong macros, Rust has some pretty ok ones.

desert oar
#

Does D have good macros?

iron basalt
#

IDR, but D has kind of fallen off in favor of Rust, Zig, etc in the systems software space. Garbage collection has no place there.

desert oar
#

Anyway the pandas and numpy apis do not work that well with Hy and you'd want to work up some nice macro DSL for it

desert oar
# exotic maple a quick question folks, jus to make sure I have to idea right. In general, cros...

Kinda, but i think maybe focus on the concepts rather than the sklearn implementation thereof. Cross validation is a technique for estimating out of sample performance by slicing up your data and training the model on different slices of the data. Grid search means defining a grid of model parameters and fitting the model at each point on the grid, picking the best performing model. The standard "intro to ML" approach is to use grid search with cross validation at each point in the grid

#

The scikit learn implementation of grid search does give you the convenience of keeping the best-performing model for you and letting you make predictions from it
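
To make the "slicing up your data" part concrete, here is a minimal standard-library sketch of k-fold splitting. The function name `k_fold_indices` is hypothetical and it uses contiguous folds with no shuffling; scikit-learn's own splitters do more than this. With k folds, each round validates on 1 fold and trains on the other k - 1.

```py
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, val_idx
        start = stop


splits = list(k_fold_indices(n_samples=10, k=5))
for train_idx, val_idx in splits:
    print("train:", train_idx, "validate:", val_idx)
```

A grid search with cross validation simply repeats this loop for every parameter combination on the grid and keeps the combination with the best average validation score.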

exotic maple
#

to use them together I can use the CV parameter of sklearn?

#

if im not mistaken the "folds" would CV -1, no?

desert oar
exotic maple
#

thanks! 🙂 I feel like i've gotten most of these things but im still missing some important bits of the basics >.<

exotic maple
inland zephyr
#

I have a problem when I try to extract a face with the MTCNN library. I put my mtcnn in a function:

```py
def face_detector(img):  # signature inferred from the call below
    face_detector = MTCNN()
    detected = face_detector.detect_faces(img)
    return detected
```
and run it with this:

```py
from skimage import io as ios
from mtcnn.mtcnn import MTCNN
...
img = ios.imread(images)
detected_faces = face_detector(img)
if len(detected_faces) > 0:
    k = detected_faces[0]
...
```

it raises an error: ```ValueError: Input 0 of layer conv2d_3444 is incompatible with the layer: expected axis -1 of input shape to have value 3 but received input with shape [None, 272, 507, 4]``` so I know something is wrong with the model.
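
The error message reports an input whose last axis has size 4 where 3 is expected, which usually means the image was read with an alpha channel (RGBA) rather than that the model is broken. A minimal sketch of dropping the extra channel before detection, using a made-up array in place of the real image:

```py
import numpy as np

# Stand-in for the array returned by imread: height 272, width 507, 4 channels (RGBA)
img = np.zeros((272, 507, 4), dtype=np.uint8)

# Keep only the first three channels (RGB) when an alpha channel is present
if img.ndim == 3 and img.shape[-1] == 4:
    img = img[..., :3]

print(img.shape)
```

Slicing simply discards transparency. If the transparent areas matter, compositing onto a background colour first would be the more careful option.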
frosty hamlet
#

Hello guys so I have a pretrained Pix2Pix GAN (pytorch) model that takes an input edge drawing and from one folder and output the drawing colored in essentially. Im currently trying to build a web application that would allow for users to upload there own drawings and receive the generated results.

What would be some technical solution to implement this?

median ember
potent badge
#

How do the depth and width of a deep neural network play into the mean and variance

winged yew
#

do i need external GPU or i can run with my intel UHD 620 graphics card to learn deep learning ?

median ember
exotic maple
slate hollow
#

yo wHAT

#

wait nvm

old grove
# bold timber

You need to transform all values to numerics via standard scaling or whatever fits so... and then train... Strings aren't interpreted by models, we need to convert and transform them into numerics 😃... Normalize them from what i think

inland zephyr
frank acorn
#

Apache Hadoop or Spark which one should i prefer. I'm a junior in college and i'm interested in data analytics.

carmine finch
#

Hello has anyone worked with databases before please someone DM or anything i need help 🙂 thanks hope someone helps me

lapis sequoia
#

Hi guys I scraped some e-commerce dataset and posted it in kaggle. There's product, shop, and text data. I would like to see how an experienced data analyst/scientist would approach the dataset and its shortcomings etc. I would really appreciate it if someone can give a minute or two to see this dataset.. https://www.kaggle.com/jaepin/shopeeph-koreantop-clothing thanks

kindred radish
# desert oar Oh? What's the topic?

It's about using machine learning to improve industrial practice. I'm a physics student so this whole paper is going to go over my supervisor's head lmao. So if i can explain everything as concisely and clearly as everything I should hopefully do well

modern swift
#

Another question would be the interaction of comments and ratings. You could try some sentiment analysis, but I think your translations would need some love for that first 🙂

lapis sequoia
red hound
#

Recently i'm doing a lot of work on sequential data like for example log-files. I would like to make a comparison of LSTM vs. CNN on generating and separately classifying those samples. Can you recommend a setup, an architecture or anything to compare both types of networks to get meaningful results? Would be awesome to let me know your thoughts

modern swift
lapis sequoia
#

I actually did. The googletrans package was really unstable. It wouldn't detect the source language, and got buggy along the process. I translated it via deep-translator module. It was more stable, but still had some issues.

tough cosmos
#

Can anyone recommend a startup related to data science ?

lapis sequoia
bold timber
#

Hi, I have a question: what of condition to build a training set and test set in Decision Tree Regression?

inland zephyr
#

its also same as how to built the dataset for the classifier task too

#

oh anyway... i have specific question about storing preprocessed datasets. I have let say a 2*n array of feature and 1 class of data which will i used for classification in CNN. The feature taken from a quantized soundwave (so it must be array) with two channel (that's why the dimension is 2 times n when n is the length of the signal). Since pandas are not design to store this kind of data structure , is there the alternative to store the data? so it will have structure like this:

record|min|data                                   |class
--------------------------------------------------------
0001  |1  |[[0.011,...,0.01],[-0.1010,...,-0.001]]|1

since as far my experience Pandas cannot store data like this so i looking for other alternatives.

desert oar
gleaming oyster
#

Hey, how do I know if each value in a Series is a number and, if so, divisible by 10?
I've tried (df[1].str.isnumeric() & df[1].astype(int).mod(10) == 0) but it will stop at any non-numeric value as it can't convert it.
(Don't know if I can ask this here or it has to be on the help channels)

tidal bough
gleaming oyster
#

inds = inds[inds2] does not work (IndexingError), but inds = inds & inds2 does the job. Thanks!

gleaming oyster
potent badge
#

How would the depth and width of a deep neural network play into the mean and variance?

tidal bough
#

because I think inds2 would be shifted compared to inds

gleaming oyster
tidal bough
#

like, say inds is [1,3,5], and df[1][3] is the only one that's 0 mod 10. Then inds2 would be [1], I think, meaning that only the second element of inds fullfills it

#

maybe inds = np.array(inds)[inds2] or something

gleaming oyster
#
x = pd.Series(['1', 'data', '3', '5'])
inds = x.str.isnumeric()
inds2 = x[inds].astype(int).mod(5) == 0
inds = inds & inds2

works and outputs False, False, False, True

bold timber
#

Anyone can tell me what the meaning of first line in that cell?

tidal bough
#

as for how it works, well:

#

!docs numpy.arange

arctic wedgeBOT
#

```py
numpy.arange([start, ]stop, [step, ]dtype=None, *, like=None)
```
Return evenly spaced values within a given interval.

Values are generated within the half-open interval `[start, stop)` (in other words, the interval including *start* but excluding *stop*). For integer arguments the function is equivalent to the Python built-in *range* function, but returns an ndarray rather than a list.

When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use [`numpy.linspace`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html#numpy.linspace "numpy.linspace") for these cases.
bold timber
tidal bough
#

it doesn't; it create a bunch of evenly spaced points

#

they are later used as the X values for the plot

#

huh?

bold timber
tidal bough
#

I don't really get what you're asking

bold timber
#

I mean if i count a grid for X --> len(X_grid) I get 900

#

And my question is, Why in that plot has only 10 dots?

#

wait...

#

Oh I know

#

900 for Position level, right?

#

that is for drawn of scale on axis, right?

tidal bough
#

X_grid is for drawing the blue line

#

the red points are placed separately

#

there's 900 points in the blue line (but they are too close to see where the line passes through them)

bold timber
bold timber
tidal bough
#

This reshapes the input from (900,) to (900,1) - from 1d to 2d

#

that's required for the input to the model

#

basically, the input needs to be 2d, even if each sample is just 1 number.
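
A small sketch of what that reshape does. The grid here is built with `np.linspace` over a made-up range purely to get 900 points; the original cell's exact grid is not shown in this log.

```py
import numpy as np

# Hypothetical grid of 900 evenly spaced points
X_grid = np.linspace(1.0, 10.0, 900)
print(X_grid.shape)   # 1d: one axis of length 900

# Same 900 numbers, now arranged as 900 rows (samples) by 1 column (feature)
X_col = X_grid.reshape(len(X_grid), 1)
print(X_col.shape)
```

Nothing about the values changes. Only the shape differs, which is what lets a model read the array as "900 samples with 1 feature each".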

bold timber
#

Sorry, I still don't understand about that

#

Because I'm beginner in Machine Learning

tidal bough
#

no real visualization here, it's just that 1d arrays are considered different from 2d arrays with a second shape of 1

#

even though they are laid out the same way

bold timber
tidal bough
#

Why'd that array be reshapable into len(X_grid),2? It'd imply that len(X_grid) == len(X_grid)*2, not really possible 🙂

bold timber
molten hamlet
#

Can someone help me find module that helps calculation of variogram 2D?

sharp vale
#

anyone familliar with the openclassrooms deep learning tasks

#

I need help

grave frost
#

he won't be an expert, but atleast he would have a decent idea

tidal bough
#

that's pretty optimistic

#

professional scientists in hard fields absolutely can be totally ignorant about computers, sadly

brittle turtle
#

hi I'm supposed to find a model for the data provided and plot it, can someone help me take a look and see if its alright

grave frost
tidal bough
#

don't know about mathematics, but physics - totally

#

if they're old, at least

grave frost
#

if a physicist doesn't know maths/calc/lin algebra how do they do physics?

#

Im pretty convinced they can get atleast a basic and rough idea of how MLP works from the formalization

kind cedar
#

Hi, i'm working with pandas, I was wondering if there was a way to group data and count according to column value? Here is an example of my dataframe and what I would want.

noble nimbus
#

Hi everyone.

I'm currently working on a news aggregator and I want to group same-topic news. As my dataset will be continuously increasing, so I want to use Incremental Clustering.

Q 1: Is "Incremental Clustering" a name of some algorithm or is it a way of clustering?

Q 2: If "Incremental Clustering" is not an algorithm but an approach, then tell me what specific algorithms will help me.

Request: Please suggest some good tutorials (Python preferred).

wild dome
#

I'm using 'gray' as argument when plotting because the image is in grayscale but I need the circle to be displayed in color, how can I do that

```py
img = cv2.medianBlur(img, 5)
img = cv2.GaussianBlur(img, (5, 5), 0)
img = cv2.medianBlur(img, 5)
circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, 2, 1000, param1=10, param2=10, minRadius=0, maxRadius=0)
for c in np.uint16(np.around(circles))[0, :]:
    a, b, r = c[0], c[1], c[2]
    cv2.circle(img, (a, b), r, (0, 0, 255), 2)
plt.imshow(img, 'gray');```
kindred radish
#

Part of it as well is I don't want to bore them

#

Because they might not be interested in

desert oar
#

ah... you are actually adding the circle to the image
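
Since the circle is drawn into the image itself, a single-channel grayscale array has nowhere to store a colour. One hedged way around this is to make a 3-channel copy and draw on that. The sketch below uses only NumPy with a made-up blank image; `cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)` should give the same kind of 3-channel copy in OpenCV.

```py
import numpy as np

# Stand-in for the blurred grayscale image, shape (H, W)
gray = np.zeros((100, 100), dtype=np.uint8)

# Repeat the single channel three times: (H, W) -> (H, W, 3)
color = np.stack([gray, gray, gray], axis=-1)

# Coloured drawing now works on `color`, for example:
#   cv2.circle(color, (a, b), r, (0, 255, 0), 2)
# and plt.imshow(color) is then called without the 'gray' colormap.
color[50, 50] = (0, 255, 0)   # one green pixel as a stand-in for the drawn circle

print(color.shape)
```

Circle detection can still run on the grayscale array; only the drawing and display need the 3-channel copy. Note that OpenCV orders channels as BGR while matplotlib expects RGB, so colours other than green will look swapped unless the channels are reordered.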

old hatch
#

in pyspark, what's the best way to run some function over each row of a dataframe and map them to a new row with a different schema?

#

im a data science noob

lilac raven
#

Why is extend replacing the previous array with the new one, instead of adding the new array to a list of arrays?
```py
if file.endswith("_MID-R1-ECG.1D_hrv.txt"):
    full_name = pathlib.Path(root) / file
    try:
        read_fname = full_name
        data = np.loadtxt(read_fname)

        data_list = data.tolist()

        data_list.extend(data_list)

        c = np.array(data_list)
```
#

it seems to be just one single [x,x,x,x,x,x,x,x] array instead of [[x,x,x,x,x,x,x,x],[x,x,x,x,x,x,x]] which is what I would want

#

currently it replaces the first array with the second, as I just have two files in the folder to test it out

#

i thought extend would combine the separate arrays into a list of arrays

exotic maple
#

extend (merges) the two lists and creates a single longer list

#

append would "add" the 2nd list to the first one

#

what you should do is create a 3rd, upper-level list

#

and then append your data to it
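
A tiny illustration of the difference, with made-up lists:

```py
first = [1, 2, 3]
second = [4, 5, 6]

# extend: the items of `second` are added one by one -> a single flat list
merged = list(first)
merged.extend(second)

# append: `second` is added as one item -> a list of lists
nested = [list(first)]
nested.append(second)

print(merged)
print(nested)
```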

lilac raven
#

so initialize a new list for them all?

exotic maple
#

no...

#

something like this

#
list_of_arrays = [] # master list to store the lists/arrays

if file.endswith("_MID-R1-ECG.1D_hrv.txt"):
                    full_name = pathlib.Path(root) / file
                    try:
                        read_fname = full_name
                        data = np.loadtxt(read_fname) # im assuming you're creating an array from text here
                    
                        data_list = data.tolist()
                        
                        list_of_arrays.append(data_list) #this will append the list-like array to the end of the master list
#

if for some reason you want to set those lists back to array you can do some list comprehension

#
list_of_arrays2 = [np.array(element) for element in list_of_arrays]
lilac raven
#

ah ok, that is working

#

and setting it back to arrays would allow to me to np.mean them im guessing

exotic maple
#

that depends on whatever you want to do

#

I'm not sure why you're converting them to list after reading them thou

#

but i'll leave your logic to you :p

lilac raven
#

originally thought I had to as I thought combining lists was easier than arrays

exotic maple
#

You can append any kind of object to a list.

errant crown
#

I want to do the following:
If a user misspells a command or input or whatever my programm gives an output with relevant commands (based on the input and the using history) that are available
how would i do that?

errant crown
#

maybe we have to research it on google or somthin because the ppl on this server dont seem to know how this works

gloomy berry
#

already went there

lilac raven
#

hmm I tried np.mean on the both versions, the list and array and I get the error ''function' object is not subscriptable'

errant crown
gloomy berry
#

!halp

#

this

errant crown
exotic maple
lilac raven
#

I want the mean of [x1,x2,x3,x4,x5] and [y1,y2,y3,y4,y5] like [(x1+y1)/n,(x2+y2)/n,etc]

#

np.mean with axis= 1 i thought would do that

exotic maple
#

I think @serene scaffold once showed me a way to compute the mean of a row/axis across many np.arrays, but I can't quite remember which function it was

arctic wedgeBOT
#

```py
numpy.mean(a, axis=None, dtype=None, out=None, keepdims=<no value>, *, where=<no value>)
```
Compute the arithmetic mean along the specified axis.

Returns the average of the array elements. The average is taken over the flattened array by default, otherwise over the specified axis. [`float64`](https://docs.scipy.org/doc/numpy/reference/arrays.scalars.html#numpy.float64 "numpy.float64") intermediate and return values are used for integer inputs.
lilac raven
#

so I just need to make [[47.634249643827026, 48.707791774949484, 44.958609806628594, 46.17740725913995, 38.02733794748916, 38.1356384904845, 35.35533905932738, 35.68120160740313, 38.23956264058725, 40.523534677334084, 36.66523725058259, 31.91423692521127, 39.82019774119848, 40.08918628686366, 33.96831102433787, 59.219460014799566, 43.164887897106965, 44.69394835554186, 40.131993759056165, 75.0, 72.50760609188853, 28.4045450908509, 22.941573387056174, 26.28287415189234, 30.525697073419664, 37.17810563304078, 32.21390769615825, 23.27373340628157]] [[356.22258666457407, 349.47877856411634, 256.22921201710994, 251.57835094989127, 393.43572113709587, 204.17516989095418, 108.25317547305482, 109.66546927595373, 156.7907310185565, 215.62248388226018, 76.82953714410739, 131.98240351921797, 107.11309110334874, 100.0, 155.02932273957373, 267.6284738214527, 342.3813663153998, 289.35272592460575, 319.09348500700077, 277.6278189993808, 261.0439415001608, 229.46949688357273, 313.3243843228943, 250.97033910625996, 194.7798480058684, 326.25957840345467, 235.80044921893565, 140.24663149986463]] into one [.........] then?

serene scaffold
lilac raven
#

average of (47.6+356)/n (for now n=2), (48.7+349.5)/n, etc

#

they should all be the same shape

serene scaffold
lilac raven
#

yeah

sly salmon
#

for linear regression, is the "weight" of a feature the same as the variance between our outcome and our feature?

lilac raven
#

but it will be much larger: (x+y+z+...)/n

#

just testing it out with these two files for now

serene scaffold
#

np.sum(arr, axis=0) / n will take the "vertical sum" (I just made that up) of your array

#

and then divide each element by n
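#

A minimal sketch of that idea, assuming each file yields a 1-D array of the same length (the values below are made up and shortened for illustration). Stacking the arrays into one 2-D array makes `axis=0` run down the list of files, so the sum or mean is taken element by element:

```python
import numpy as np

# Hypothetical stand-ins for the per-file arrays; all must share one shape.
list_of_arrays = [
    np.array([47.6, 48.7, 44.9]),
    np.array([356.2, 349.5, 256.2]),
]

stacked = np.stack(list_of_arrays)   # shape (n_files, n_values)
n = stacked.shape[0]

mean_by_sum = np.sum(stacked, axis=0) / n   # "vertical sum" divided by n
mean_direct = np.mean(stacked, axis=0)      # same result in one call

print(stacked.shape)
print(mean_direct)
```

The list has to be complete before the mean is taken, so in a file-reading loop the stacking and averaging would normally go after the loop rather than inside it.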

lilac raven
#

seems to almost do it, but it's summing strangely

#
```
[47.63424964 48.70779177 44.95860981 46.17740726 38.02733795 38.13563849
 35.35533906 35.68120161 38.23956264 40.52353468 36.66523725 31.91423693
 39.82019774 40.08918629 33.96831102 59.21946001 43.1648879  44.69394836
 40.13199376 75.         72.50760609 28.40454509 22.94157339 26.28287415
 30.52569707 37.17810563 32.2139077  23.27373341]
[[356.22258666457407, 349.47877856411634, 256.22921201710994, 251.57835094989127, 393.43572113709587, 204.17516989095418, 108.25317547305482, 109.66546927595373, 156.7907310185565, 215.62248388226018, 76.82953714410739, 131.98240351921797, 107.11309110334874, 100.0, 155.02932273957373, 267.6284738214527, 342.3813663153998, 289.35272592460575, 319.09348500700077, 277.6278189993808, 261.0439415001608, 229.46949688357273, 313.3243843228943, 250.97033910625996, 194.7798480058684, 326.25957840345467, 235.80044921893565, 140.24663149986463]]
[356.22258666 349.47877856 256.22921202 251.57835095 393.43572114
 204.17516989 108.25317547 109.66546928 156.79073102 215.62248388
  76.82953714 131.98240352 107.1130911  100.         155.02932274
 267.62847382 342.38136632 289.35272592 319.09348501 277.627819
 261.0439415  229.46949688 313.32438432 250.97033911 194.77984801
 326.2595784  235.80044922 140.2466315 ]
```
#

after converting each element back to an array

serene scaffold
#

what is the shape of the array you're passing to np.sum?

lilac raven
#

hmm says list object has no attribute shape

#

data has a shape of 28, which is what I want, but once I do list_of_arrays.append(data) it has no shape

#

```py
if file.endswith("_MID-R1-ECG.1D_hrv.txt"):
    full_name = pathlib.Path(root) / file
    try:
        read_fname = full_name
        data = np.loadtxt(read_fname)  # im assuming you're creating an array from text here

        data_list = data.tolist()

        list_of_arrays.append(data)  # this will append the list-like array to the end of the master list
        list_of_arrays2 = [np.array(element) for element in list_of_arrays]

        print(list_of_arrays)
        n = len(list_of_arrays)
        s = np.sum(list_of_arrays, axis=0) / n
        print(s)
```
#

and getting [array([47.63424964, 48.70779177, 44.95860981, 46.17740726, 38.02733795, 38.13563849, 35.35533906, 35.68120161, 38.23956264, 40.52353468, 36.66523725, 31.91423693, 39.82019774, 40.08918629, 33.96831102, 59.21946001, 43.1648879 , 44.69394836, 40.13199376, 75. , 72.50760609, 28.40454509, 22.94157339, 26.28287415, 30.52569707, 37.17810563, 32.2139077 , 23.27373341])] [47.63424964 48.70779177 44.95860981 46.17740726 38.02733795 38.13563849 35.35533906 35.68120161 38.23956264 40.52353468 36.66523725 31.91423693 39.82019774 40.08918629 33.96831102 59.21946001 43.1648879 44.69394836 40.13199376 75. 72.50760609 28.40454509 22.94157339 26.28287415 30.52569707 37.17810563 32.2139077 23.27373341] [array([356.22258666, 349.47877856, 256.22921202, 251.57835095, 393.43572114, 204.17516989, 108.25317547, 109.66546928, 156.79073102, 215.62248388, 76.82953714, 131.98240352, 107.1130911 , 100. , 155.02932274, 267.62847382, 342.38136632, 289.35272592, 319.09348501, 277.627819 , 261.0439415 , 229.46949688, 313.32438432, 250.97033911, 194.77984801, 326.2595784 , 235.80044922, 140.2466315 ])] [356.22258666 349.47877856 256.22921202 251.57835095 393.43572114 204.17516989 108.25317547 109.66546928 156.79073102 215.62248388 76.82953714 131.98240352 107.1130911 100. 155.02932274 267.62847382 342.38136632 289.35272592 319.09348501 277.627819 261.0439415 229.46949688 313.32438432 250.97033911 194.77984801 326.2595784 235.80044922 140.2466315 ]

#

oops, also replacing list_of_arrays in the final few lines with list_of_arrays2 gives same output

exotic maple
#

can you do

#

array.shape

#

on both your arrays

#

and then show us that

lilac raven
#
```
Traceback (most recent call last):

  File "<ipython-input-333-83ca99a6c7b8>", line 1, in <module>
    list_of_arrays2.shape

AttributeError: 'list' object has no attribute 'shape'
```
#
```
list_of_arrays.shape
Traceback (most recent call last):

  File "<ipython-input-334-637c734ee8f5>", line 1, in <module>
    list_of_arrays.shape

AttributeError: 'list' object has no attribute 'shape'
```
exotic maple
#

no...I mean the actual arrays inside

#

how much Python do you know?

#

because it seems your struggles come a bit more from the fundamentals

#

try this:

for array in list_of_arrays:
    print(array.shape)

lilac raven
#

for array in list_of_arrays:
array.shape

array.shape
Traceback (most recent call last):

File "<ipython-input-336-270abd9e5a99>", line 1, in <module>
array.shape

AttributeError: type object 'array.array' has no attribute 'shape

#

im still learning definitely

exotic maple
#

do this

#

type(list_of_arrays[0])

lilac raven
#

(list_of_arrays[0])
Traceback (most recent call last):

File "<ipython-input-337-c9ed6307f273>", line 1, in <module>
(list_of_arrays[0])

IndexError: list index out of range

exotic maple
#

ok this is a bit frustrating

lilac raven
#

(list_of_arrays2[0])
Out[340]:
array([356.22258666, 349.47877856, 256.22921202, 251.57835095,
393.43572114, 204.17516989, 108.25317547, 109.66546928,
156.79073102, 215.62248388, 76.82953714, 131.98240352,
107.1130911 , 100. , 155.02932274, 267.62847382,
342.38136632, 289.35272592, 319.09348501, 277.627819 ,
261.0439415 , 229.46949688, 313.32438432, 250.97033911,
194.77984801, 326.2595784 , 235.80044922, 140.2466315 ])

exotic maple
#

use that

#

but with the name of the actual list

#

type(NAME_OF_LIST[0])

exotic maple
#

for array in list_of_arrays2:
    print(array.shape)

lilac raven
#

for array in list_of_arrays2:
array.shape

array.shape
Out[344]: (28,)

#

28 is good for each individual, but I wanted to combine the two individual files (arrays out of those files) into one so I can find the mean easily

late shell
#

hello, noob question : I was training a Support vector regression model when I realized that scaling the target variable significantly boosted the accuracy as opposed to where I only scaled the predictors. Can someone please explain the reason behind this? Why should scaling the target variable help in training the model in any way?

exotic maple
inner estuary
#

Does anyone know a good article explaining facial recognition with Python that they could share with me, please?

lapis sequoia
#

What do people actually use R for over Python?

#

And MATLAB for that matter

near cosmos
#

assuming use of tidyverse

#

IME, a lot of research is still done in MATLAB. My sense is this is partly because of legacy/familiarity and lower incentives to learn new tools vs get work done, and partly because there really is good support and documentation for things like data acquisition and optimization. That's kind of mind-reading though--I'm not a fan of MATLAB

desert oar
#

I was blown away the first time i saw someone doing regression in matlab while they also had university access to stata

#

As silly as stata is, there is zero reason to use matlab over stata for just doing basic regression analysis

#

People are really weird when it comes to the tools they like

near cosmos
#

For a lot of users, the work of learning enough to be productive in one environment was tremendous. So they'll go through incredible pain to keep using that environment.

#

Also, "the last postdoc wrote it in matlab and gave me the script"

lapis sequoia
#

can someone help me finding this dataset pls?

jade carbon
#

which one do we use for time series prediction?
sparse categorical crossentropy or categorical crossentropy

dusty turret
#

Any sample code for reading training data set of pdf containing scan image of restaurant invoice, and using nlp based model to extract total amount from receipt?

slate hollow
#

hey i'm trying to learn RNNs

#

and they say you define one in keras like so?

#

*:

#

keras.layers.SimpleRNN(1, input_shape=[None, 1])

#

the thing is, where do we specify the number of times it's passed through?

exotic maple
#

But I can be extremely wrong as well

jade carbon
#

almost in every predictions, they use Dense(1) without any activation.

#

what should I choose?

exotic maple
#

I'm surprised...I've used sigmoid activation at the last layer for that kind of problem (kill me)

#

for cost function you may use...binary cross entropy i think its called

winged stratus
exotic maple
slate hollow
#

hey i'm trying to learn RNNs
and they say you define one in keras like so:
keras.layers.SimpleRNN(1, input_shape=[None, 1])
the thing is, where do we specify the number of times it's passed through?
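#

For what it's worth, a rough NumPy sketch of what a one-unit simple RNN does may help here. This is an illustration with made-up weights, not Keras's actual implementation. The layer is applied once per time step of the input, so the number of passes comes from the sequence length of the data rather than from a layer argument. That is also why the time dimension in `input_shape=[None, 1]` can be left as `None`.

```python
import numpy as np

def simple_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Run a single-unit RNN over a sequence of shape (time_steps, 1)."""
    h = 0.0                      # initial hidden state
    steps = 0
    for x_t in sequence[:, 0]:   # one pass per time step
        h = np.tanh(w_x * x_t + w_h * h + b)
        steps += 1
    return h, steps

short_seq = np.ones((3, 1))    # 3 time steps
long_seq = np.ones((10, 1))    # 10 time steps

_, short_steps = simple_rnn(short_seq)
_, long_steps = simple_rnn(long_seq)
print(short_steps, long_steps)
```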

winged stratus
worn hinge
#

So I've been thinking about learning how to do object detecting and stuff with opencv... However, I have no clue where I might start with that. Are there any resources for this type of stuff? (or maybe even just a list of relevant concepts that I can use to piece something together)

winged stratus
#

and on the model side, you could learn some R-CNNs or yolo models

worn hinge
#

I should maybe mention that I've got almost no prior knowledge about any of this

winged stratus
#

yeah, so learn about filters in image processing

#

and when you get an idea about that, you can move on to the other stuff

lapis sequoia
cyan lantern
#

anyone know if there is a way to reduce memory usage from scipy sparse matrices? trying to run a classification model with a pretty big dataset and it is way too expensive to run in terms of memory

kindred radish
#

Just a check about the precision and recall metrics

#

if i were to get a precision and recall of 50%, that implies the classifier is as good as a coin flip in making predictions?

#

(would reallllly appreciate an answer my dissertation is due soon!!)

lapis sequoia
#

which is better

#

tensorflow or pytorch?

cyan lantern
kindred radish
#

im using sklearn to compute them

#

just using their functions

cyan lantern
#

macro or micro?

kindred radish
#

uhhhhhhhh wdym

cyan lantern
#

you are using precision_score() function right?

kindred radish
#

aye and recall_score()

cyan lantern
#

so in these functions you can implement different ways of calculation

kindred radish
#

ohhh yeah i see them

cyan lantern
kindred radish
#

I've been using 'binary'

#

as i'm using a binary classifier

#

ie. the output is either 1 or 0

cyan lantern
#

maybe try out different ones to see the difference

cyan lantern
#

but in general if you have an equal number of samples in each class then it will result in the same score (i think)

kindred radish
#

wait really, wouldn't that make them bad metrics then?

cyan lantern
#

sorry i mean macro and micro will result in same score

kindred radish
#

oh jesus that scared me hahahahaha

#

Otherwise my entire conclusion for my dissertation would have been fucked lmao

cyan lantern
#

so yeah it really depends on what the distribution is
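#

To make the coin-flip comparison concrete, here is a small sketch in plain Python (the labels are invented for illustration). A classifier that guesses at random gets a recall near 0.5, but its precision is roughly the share of positives in the data, so 50% precision only looks like a coin flip when the classes are balanced.

```python
def precision_recall(y_true, y_pred):
    """Binary precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fp), tp / (tp + fn)

# A stand-in "coin flip" that predicts 1 for exactly half of each class.
coin = [1, 0]

balanced_true = [1] * 50 + [0] * 50
balanced_pred = coin * 25 + coin * 25

imbalanced_true = [1] * 10 + [0] * 90
imbalanced_pred = coin * 5 + coin * 45

balanced = precision_recall(balanced_true, balanced_pred)
imbalanced = precision_recall(imbalanced_true, imbalanced_pred)
print(balanced)
print(imbalanced)
```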

jade carbon
winged stratus
jade carbon
#

even to non linear regression in time series prediction?

winged stratus
#

yeah, since the neural net itself is non linear

jade carbon
#

okay y see.
thx for helps!

tacit palm
#

Hi was wondering if it was possible to do a linear regression

#

with multiple categorical variables

noble nimbus
#

Can anyone help me out with Incremental Clustering?

#

I want to group same-topic news but I seem to find either too advanced stuff, or just theoretical resources.

inland zephyr
#

I want to ask about Siamese NN implementation. Given how the model works, it needs 2 images to compare whether they are similar. In a real-world case, let's say I have 1000 people in a db and I need to find which one is probably the same person as a new image.

#

Is the better way to loop 1000 times and summarize the result (the minimum distance is the most similar person), or is there a way to parallelize the process?
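#

One common way to avoid 1000 separate model calls, sketched here with NumPy under the assumption that each half of the Siamese network can be used on its own to turn one image into an embedding vector: embed the database once, embed the query once, then compute all distances in a single vectorized step. The embeddings below are random placeholders, not real model output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder embeddings: 1000 people in the database, 128 numbers each.
db_embeddings = rng.normal(size=(1000, 128))

# Pretend the query is a slightly noisy view of person 42.
query = db_embeddings[42] + 0.01 * rng.normal(size=128)

# Euclidean distance from the query to every database entry at once.
distances = np.linalg.norm(db_embeddings - query, axis=1)
best_match = int(np.argmin(distances))
print(best_match, distances[best_match])
```

The database embeddings only need to be recomputed when a person is added or the model changes, so the per-query cost is one forward pass plus one array operation.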

jade carbon
#

should compare the images 1 by 1

inland zephyr
#

but I need a loop, right? That will cost a lot in time and money, especially if I'm using an API such as AWS Rekognition or a similar one

inland zephyr
#

or should i use usual classification tasks for my case?

jade carbon
#

just siamese, count the limit classification

grave frost
#

🤣 🤣

#
#

the research is that pre-training code on large models works better? how is that research???

#

if nobody has actually tried pre-training LM on code before, I am going to go and hang myself

jade chasm
#

Hey guys, does anyone have any idea how to use monte carlo simulation for a continuous variable in python?

#

I've been breaking my head over this. This should be insanely easy

#

np.random usually deals with discrete probability density functions

#

I could calculate the CDF, but that wouldn't really help here, would it?

#

obviously, the CDF is just a straight line from 0 to 2.

#

my previous monte carlo estimates were along the lines of:

```py
import math
import numpy as np

M = (3 * (math.e**4)) / 103  # normalizes the five probabilities so they sum to 1
p = lambda k: M * (((4**k) * (math.e**-4)) / math.factorial(k))
probs = [p(k) for k in range(5)]
print(probs)
sample_space = [k for k in range(5)]
# samples = np.random.choice(sample_space, size=1000000, replace=True, p=probs)
# print(np.mean(samples))

solution = {}  # running mean estimate after N samples
N = 0
cumsum = 0
while N < 100000:
    N += 100
    samples = np.random.choice(sample_space, size=100, replace=True, p=probs)
    cumsum += sum(samples)
    solution[N] = cumsum / N
```
for different parts.
#

If you have an idea, please @ me as I'll be alt-tabbing in a bit. You're a god if you give me the tip which helps me solve it, because I'm all out of ideas.
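#

A hedged sketch, assuming the variable really is uniform on [0, 2] as the straight-line CDF suggests (the original problem statement is not shown here). Continuous variables can be sampled directly, for example with `random.uniform` from the standard library or `np.random.uniform`, and the inverse-CDF trick covers other distributions whose CDF can be inverted.

```python
import random

random.seed(0)
N = 100_000

# Direct sampling from a continuous uniform distribution on [0, 2].
direct = [random.uniform(0.0, 2.0) for _ in range(N)]

# Inverse-transform sampling: draw u ~ U(0, 1) and apply the inverse CDF.
# For a uniform variable on [0, 2], F(x) = x / 2, so F^-1(u) = 2 * u.
inverse = [2.0 * random.random() for _ in range(N)]

mean_direct = sum(direct) / N
mean_inverse = sum(inverse) / N
print(mean_direct, mean_inverse)   # both should be close to 1.0
```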

harsh horizon
bold timber
uncut barn
#

can anyone tell me what went wrong here?

idle summit
#

missing ) for imshow function

uncut barn
#

ah thanks

idle summit
#

should be view(18, 28))

silver widget
#

is there a good source to study transformations of non-normally distributed data? I'd like to understand when to choose StandardScaler, log transform, Box-Cox, etc.

bronze skiff
wicked sierra
#

Any Natural language processing expert who can help me?

bronze skiff
jade chasm
#

thanks for your insight by the way

uncut monolith
#

i need some help with dash from plotly

#

does anyone have experience with this library here?

#

im using for a school project

jaunty idol
upper stirrup
#

Hi. Is it required to use train / val / test splits for word-level tokenization?

serene scaffold
upper stirrup
#

now I just try discover word level tokenization

dapper halo
#

dummy question...how can you reset a dataframe index without dropping out any duplicates.

Trying to append data to end of a dataframe and at the end would just like to reset index so its sequential. But it always removes the additional rows I've added onto it

#

I guess reset is supposed to take it back to the original indexing....doesn't seem any of the commands lets you just overwrite the ordering
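#

A small sketch of the usual approach, assuming pandas and made-up data: `pd.concat(..., ignore_index=True)` appends rows and renumbers them in one step, and `reset_index(drop=True)` renumbers an existing frame. Neither call removes rows; duplicates only disappear if something like `drop_duplicates` is called.

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30]})
extra = pd.DataFrame({"value": [20, 40]})   # includes a duplicate value on purpose

# Append and renumber 0..n-1 in one step.
combined = pd.concat([df, extra], ignore_index=True)

# Or renumber afterwards; drop=True discards the old index instead of
# keeping it as a new column.
renumbered = pd.concat([df, extra]).reset_index(drop=True)

print(combined)
```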

serene scaffold
mystic orchid
#

Hi everyone! What can you recommend for a noob who wants to learn ML? I have knowledge of Python.

gritty socket
#

does anybody know how I can search for a specific face in opencv? like when I show my face it will say jack, and when I show any other person it will say human

#

i want to make my own haarcascade file

serene scaffold
#

there's also a lot of possible directions with ML, so you should look into what those are and pick one.

desert oar
#
  • linear algebra: vector/matrix math and how to interpret matrices as systems of equations
  • calculus: derivatives, convex optimization, at least conceptually know what an integral is (riemann sums)
  • probability: random variables, mean / expected value, variance / std dev, conditional probability (bayes' theorem & law of total probability), law of large numbers, central limit theorem, bernoulli/binomial and gaussian distributions
  • statistics: sample vs population, bias-variance tradeoff, cross validation, classical null hypothesis testing, linear regression, logistic regression

off the top of my head, those are probably the fundamental tools that you will use on a regular basis in machine learning

#

you don't have to learn all of it at once

#

imo the best place to start is basic data analysis (estimating mean and median, data visualization basics) and play around with real datasets using pandas and matplotlib.

#

you will build intuition and experience working with real world data

#

you'll start learning some of the core statistics vocabulary, and you'll gradually start to encounter things you don't understand from probability and stats

#

when you have a bit of comfort in that area, you can move on to more advanced problems. basically, start by getting hands-on with real data, while gradually expanding your sphere of understanding

#

but you absolutely need to be comfortable working with, visualizing, talking about, and thinking about data

flint mason
#

how to insert a dataframe in an sql database

desert oar
arctic wedgeBOT
#

```py
DataFrame.to_sql(name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None, method=None)
```
Write records stored in a DataFrame to a SQL database.

Databases supported by SQLAlchemy [[1]](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html#r689dfd12abe5-1) are supported. Tables can be newly created, appended to, or overwritten.
mint palm
#

Why filter has -1

#

Arent rgb for 0 to 255

#

And even if we only use grey scale ...isnt it 0 to up

desert oar
#

@mint palm often in machine learning we like to normalize data to the -1,1 or 0,1 range when we know it is bounded

#

here they normalized 255->1 and 0->-1

mint palm
#

While plotting, will we have to take extra care for the effects of normalization?

desert oar
#

all closed intervals of real numbers are isomorphic 🙂 meaning you can always rescale an interval without losing any data
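#

As a quick illustration of that rescaling (a sketch, assuming 8-bit pixel values in 0..255 mapped linearly onto -1..1): the mapping can be inverted up to floating-point rounding, so the original pixel values can be recovered before plotting.

```python
def to_unit_range(pixel):
    """Map a pixel value in [0, 255] to [-1, 1]."""
    return pixel / 127.5 - 1.0

def to_pixel_range(value):
    """Inverse mapping from [-1, 1] back to [0, 255]."""
    return (value + 1.0) * 127.5

pixels = [0, 64, 128, 255]
scaled = [to_unit_range(p) for p in pixels]
restored = [round(to_pixel_range(v)) for v in scaled]
print(scaled)
print(restored)
```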

mint palm
#

Forming image i mean

desert oar
#

in matplotlib you should probably set the Z range manually

mint palm
#

Ok thank you

desert oar
mint palm
#

Currently theory ...will see practical in weekend thank you though

flint mason
desert oar
#

you can also manually iterate over rows with .itertuples and run an INSERT for each row

flint mason
desert oar
#

to the database, right?

#

then use .to_sql

#

or do what i said with .itertuples

flint mason
#

sorry I am automating for the first time

desert oar
#

did you read the docs?

flint mason
desert oar
#

it says that you need to use the sqlalchemy library to create an "engine" object

#

then you pass the engine

flint mason
#

yes, I got that working.

desert oar
#

if you use sqlite you can use the sqlite connection object directly, as a special convenience case

flint mason
#

can I retrieve the data later when my compiler has stopped working or reset?

mint palm
#

I requested permission @desert oar

desert oar
#

oh i didnt know you needed permission

desert oar
#

it's stored in the database

#

try again @mint palm i unrestricted it

flint mason
mint palm
#

Ok

desert oar
#

are you using sqlite?

#

its possible that you need to commit after saving if you are using the sqlite3 library

flint mason
#

no I m using

engine = create_engine('sqlite://', echo=False)
df.to_sql('table_name', con=engine, if_exists='append')
desert oar
#

you don't need to create an engine if you're using sqlite

mint palm
#

Got it so we normalize it setting max 1 and min -1

flint mason
desert oar
#

also it looks like that won't save to a file since you didn't provide a filename. so this database might be in-memory only

mint palm
#

Thank you

desert oar
#

i am saying that pandas specifically lets you use a plain sqlite connection without sqlalchemy

#

this is a specific case for sqlite, for convenience

flint mason
mint palm
#

You may restrict it again if you want

desert oar
#

@mint palm i will try to make it read only but i dont care if its publicly viewable

#

i believe it already is read-only actually

mint palm
#

It was editable I guess, it felt so

desert oar
#

are you sure?

#

can you try to add and run a new cell?

#

@flint mason sqlite has the ability to save the database "in memory", which means it doesn't create a database file, and the database disappears when the python process exits
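#

A small standard-library sketch of the difference (the table name and file name are made up for the example): a connection to `:memory:` loses its data when it is closed, while a connection to a file path keeps the data on disk once it has been committed. With pandas, the same `sqlite3` connection object can be passed as `con` to `DataFrame.to_sql`.

```python
import os
import sqlite3
import tempfile

def count_rows_after_reopen(target):
    """Insert one row, commit, close, reconnect, and count surviving rows."""
    con = sqlite3.connect(target)
    con.execute("CREATE TABLE IF NOT EXISTS readings (value REAL)")
    con.execute("INSERT INTO readings VALUES (?)", (1.5,))
    con.commit()
    con.close()

    # A fresh connection stands in for restarting the Python process.
    con = sqlite3.connect(target)
    con.execute("CREATE TABLE IF NOT EXISTS readings (value REAL)")
    (count,) = con.execute("SELECT COUNT(*) FROM readings").fetchone()
    con.close()
    return count

in_memory_rows = count_rows_after_reopen(":memory:")

with tempfile.TemporaryDirectory() as tmp:
    on_disk_rows = count_rows_after_reopen(os.path.join(tmp, "example.db"))

print(in_memory_rows, on_disk_rows)
```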

mint palm
#

Let me try

#

I am able to run and add cell

#

@desert oar

desert oar
#

weird ok

#

i don't see it on my version

#

maybe it made a local copy

#

try to save?

tiny flax
#

So I did a course on a kinda sketchy website (sketchy in the sense that the course did not even mention tensorflow, only sklearn) and I got a certificate of completion. So I wanted to know if the certificate is hot garbage or is it worth something?? the website is https://www.sololearn.com

and on a side note, after doing another course on ML from freecodecamp I see tf.estimator.LinearClassifier() and sklearn LinearClassifier(), what are the key differences between them?


velvet thorn
#

p sure

#

there's no sklearn LinearClassifier?

#

show me

#

the TensorFlow one is a logistic regression I believe

tiny flax
#

I meant sklearns's Linear classifier

#

don't exactly remember the module its imported from

#

yeah sklearn.linear_model.LinearClassifier

#

@velvet thorn

velvet thorn
#

I'm pretty sure

#

there is no such class

tiny flax
#

oops my bad

#

its linear regression I got confused

grave frost
#

@tiny flax where did you write your code? on their IDE or your own computer

tiny flax
grave frost
velvet thorn
#

because LinearRegression is a regression model

#

the LinearClassifier from TF is a classifier

#

so my next question is

#

why are you comparing the two?

tiny flax
#

Uhh I got confused, I thought both of them were classifiers

#

not confused now

velvet thorn
#

okay so assuming

tiny flax
#

thats why I was repeatedly going on about sklearn's LinearClassifier

velvet thorn
#

you meant

#

LogisticRegression

#

they may differ in terms of implementation?

#

possibly TF's uses the GPU

tiny flax
#

in terms of implementation or something

velvet thorn
#

I'd suggest looking @ the source

tiny flax
#

I found sklearn is a bit simpler, at least for what's covered in the course, and tensorflow seems to be a bit more complicated

#

syntax and usage wise

grave frost
#

that's because Sklearn is supposed to be simple and light, while TF is a part of a bigger framework for heavy DL computation on multiple devices.

tiny flax
#

I mean I couldn't initially download the tensorflow package using pip over wifi( 456 MB ) It took long enough for the mirror to close the connection,

#

I thought it was a problem with Jupyter

#

My wifi is kinda slow

#

sklearn is comparatively lightweight

upper stirrup
#

hello, what is the Pre-processing Data?

serene scaffold
upper stirrup
#

I don't want to print the "ali", but also that "0",how can i do this? Shortly: string --> decimal literal

autumn veldt
#

Hello everyone i have a question, currently im creating a program for classification and prediction disease using diagnosed dataset.

500 datasamples (imbalanced)
9 predictor class
1 Target class (450 class a, and 50 class b)

On predictor attribute, mostly data samples is in categorical type, "Yes" or "No"

I tried to balance the dataset using SMOTE, and the result of my program is always 100% accuracy. The question is, how can my accuracy always be 100%, even though I'm using SMOTE to balance the dataset?

main kernel
serene scaffold
main kernel
#

guys, how do I train a time series model (scikit or other) with chunks of data (the whole dataset is 20 GB)? How do I free up memory and keep advancing the training/fitting, like model.fit(df1) then model.fit(df2)..., so that in the end I have only one fitted model?

#

i have only 8GB of ram to use

serene scaffold
main kernel
#

it's not an option. is there a way to do this in chunks or parts?

serene scaffold
#

in either case, what library are you using to train?

winged stratus
#

dask is a common library, so if you don't mind having it as a dependency

main kernel
#

i thought to use just scikit LinearRegression

#

ok, i have chunks of data, but can i fit with chunks?

winged stratus
main kernel
#

nice, partial_fit is like: fit df1, then fit df2 ... ?

winged stratus
#

you use partial_fit normally as you would fit, but on each minibatch

#
for x_chunk, y_chunk in chunks:
    model.partial_fit(x_chunk, y_chunk)
#

like dis
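#

One caveat worth noting: in scikit-learn, `partial_fit` is offered by incremental estimators such as `SGDRegressor`, while `LinearRegression` itself does not have it. As an alternative that stays with ordinary least squares, here is a NumPy sketch (not scikit-learn's method, and using synthetic data) that accumulates the normal-equation terms chunk by chunk, so only one chunk has to be in memory at a time.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])

def make_chunk(n_rows):
    """Synthetic stand-in for reading one chunk from disk."""
    X = rng.normal(size=(n_rows, 3))
    y = X @ true_w + 0.01 * rng.normal(size=n_rows)
    return X, y

xtx = np.zeros((3, 3))   # running sum of X^T X
xty = np.zeros(3)        # running sum of X^T y

for _ in range(5):                # five chunks, processed one at a time
    X_chunk, y_chunk = make_chunk(1000)
    xtx += X_chunk.T @ X_chunk
    xty += X_chunk.T @ y_chunk

weights = np.linalg.solve(xtx, xty)   # least-squares fit without intercept
print(weights)
```

The accumulated matrices are only as large as the number of features, so memory use does not grow with the number of rows.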

main kernel
#

nice, thanks, i think this solves my problem!

winged stratus
#

cool

#

do look into dask, it was made for handling large datasets

#

and it has some lazy evaluation features

main kernel
#

i tried using vaex, but it has so many bugs 😦

velvet thorn
#

dask is a good choice IMO

#

or you could consider Spark

exotic maple
#

It seems to work as a wrapper around Dask

#

and simplifies a lot of its operations

jolly nest
#

statistics: how to do chi squared thingy in python without imports.

main kernel
#

if you dont want to use np, good luck, but i think you will find some guides

jolly nest
#

ah thank you

#

We start by importing some Python libraries:
XD

#

they abstract away what im trying to learn!
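#

For reference, the chi-squared statistic itself needs nothing beyond plain Python. The sketch below (with an invented 2x2 table) computes Pearson's statistic for a contingency table: expected counts come from the row and column totals, and the statistic sums (observed - expected)^2 / expected over all cells. Turning it into a p-value still requires the chi-squared distribution, which is the part libraries usually supply.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

observed_counts = [[10, 20],
                   [30, 40]]
stat, dof = chi_squared(observed_counts)
print(stat, dof)
```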

main kernel
#

he builds it with numpy, he just uses scikit to compare

mint palm
#

See if it changed at your end too

silk prawn
#

Anybody got any good resources to learn computer vision?

mint palm
#

deeplearning.ai, course 4 of the specialization

silk prawn
#

Thanks 👍

#

I was also wondering if there are some free of cost resources available?

mint palm
#

This gives pretty basic intro to CNN

silk prawn
#

Ohh thanks

mint palm
#

And intro to Tensorflow and RNN too

mint palm
upper spade
#

guys

#

when should i learn data science

#

i just finished the basics

#

what should i read

#

to go down data science and ai road

fading wave
#

@upper spade Finishing basic python is enough to continue with data science

upper spade
#

damn

#

okay man

#

ill go on the hunt for my first book

fading wave
#

you will learn advanced stuff along the way of learning data science anyways

upper spade
#

ohhhh i see

#

thanks dude

fading wave
#

Many people recommend to start with Machine Learning course on Coursera (even I did) ... It's in Octave but the concepts are very useful ... also a lot of syntax is similar like slicing and stuff

fading wave
upper spade
#

i see

#

its okay

#

i just want to learn

fading wave
#

You can try for financial aid though if you want certificate ... it is usually quite easy to get

upper spade
#

okay man

#

thanks so much for your help

#

really appreciate it man

fading wave
#

After that you can do two specializations on Coursera "Applied Data Science with python by UMich" and "Deep Learning by deeplearning.ai"

upper spade
#

this is what you took?

fading wave
upper spade
#

okay me too then

fading wave
#

After doing that you will have enough knowledge to participate in kaggle contests and boost your skills

upper spade
#

what kaggle

#

what's

fading wave
# upper spade what kaggle

kaggle is like codeforces for data scientists ... there are various data science contests, expert notebooks to learn from, and datasets to experiment on

upper spade
#

oh wow

#

good to know

#

i sure am far from getting there yet

#

but ill try my best

fading wave
#

Also, do follow some medium sites like Towards Data Science, Analytics Vidhya, etc.

upper spade
#

will do man

#

if i have any more questions ill ask you

sage swan
#

so no money has to be spent for it

#

just focus on skills . Certificate means nothing

mint palm
#

Haha hard way is the only way left

#

Github answer key is useless now

sage swan
#

ohh

grave frost
#

looks like TPU v4 is gonna be out soon

#

I wanna get a grant

paper gorge
#

can anybody teach me python

pine bluff
gritty socket
#

does anyone know how i can teach objects to opencv

#

like it comes with face eyes car plates

#

but i want to add other things too

#

how can i train it

sly salmon
#

how can I visualize numpy arrays? I'm having a hard time understanding the concept of 1D, 2D, 3D arrays.

sly salmon
paper gorge
gritty socket
#

yeah

velvet thorn
#

it gets complicated after 3D

sly salmon
#

yeah I can wrap my head around the 1D and 2D arrays, but building and visualizing 3D or higher with numpy is a bit of a headache

#

technically I could have a 2D array with many different features, which represent coordinates to make an n-dimensional graph - that makes sense
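#

A quick way to build intuition, sketched with NumPy on toy data: check `ndim` and `shape`, and read a 3-D array as a stack of 2-D tables. Note that the number of axes of an array (`ndim`) is a different thing from the number of feature columns in a 2-D table.

```python
import numpy as np

a1 = np.arange(4)                    # 1-D: a row of 4 numbers
a2 = np.arange(12).reshape(3, 4)     # 2-D: a table with 3 rows, 4 columns
a3 = np.arange(24).reshape(2, 3, 4)  # 3-D: 2 tables, each 3 rows by 4 columns

for name, arr in [("a1", a1), ("a2", a2), ("a3", a3)]:
    print(name, "ndim =", arr.ndim, "shape =", arr.shape)

first_table = a3[0]        # indexing the first axis yields a 2-D table
print(first_table.shape)
```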

#

but it's just a minor hiccup

velvet thorn