#data-science-and-ml

heady hatch
#

Because that's how society uses facial recognition.

#

Terrible for certain groups of people but okay for others.

#

Some suggestions I have for dealing with certain groups of data points: you can either do feature engineering

#

ensemble the models

lapis sequoia
#

yeah good point

heady hatch
#

etc etc.

lapis sequoia
#

whats that

heady hatch
#

You might need to look into data science.

lapis sequoia
#

@heady hatch any helpful resources for that?

steep olive
#

Hey, I'm learning pandas and numpy... Can somebody tell me how I can split a dataframe into series?

heady hatch
#

Could you clarify what you mean by dividing a dataframe into series?

steep olive
#

Get specific information from a dataframe by labels

#

Like "dates", and get the whole column

lapis sequoia
#

What are some good books to learn stats

heady hatch
#

@steep olive So get certain columns?

ie

if there's a column named 'a', get 'a' as a column from the dataframe?

steep olive
#

Yeah, but I've found what I need, thank you XD
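
For anyone reading along, a minimal sketch of label-based selection in pandas. The frame and its column names (`dates`, `a`, `b`) are made up for illustration; they are not from the original question.

```python
import pandas as pd

# Hypothetical data; the column names are invented for this example.
df = pd.DataFrame({
    "dates": ["2020-10-01", "2020-10-02", "2020-10-03"],
    "a": [1, 2, 3],
    "b": [4.0, 5.0, 6.0],
})

dates = df["dates"]          # a single label gives a Series
subset = df[["dates", "a"]]  # a list of labels gives a DataFrame
first_row = df.loc[0]        # a row label gives a Series indexed by column name
```

Each column of a DataFrame is already a Series, so "splitting" usually just means indexing with the label you want.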

hollow sentinel
#

hey guys

#

what's the difference between linear regression and multiple linear regression

#

can someone explain it simply

lapis sequoia
#

Most of the time what we (in ML) use is multiple linear regression.
When you have more than one independent variable, it becomes multiple linear regression (MLR).

#

Y = Model(X_1) Linear Regression
Y = Model(X_1, X_2, X_3) Multiple Linear Regression
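
A small sketch of the same distinction with NumPy least squares. The data is synthetic and `fit_linear` is a hypothetical helper written for this example, not a library function.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))  # three made-up predictors X_1, X_2, X_3
y = 2.0 + 1.5 * X[:, 0] - 0.5 * X[:, 1] + 3.0 * X[:, 2]  # noiseless target


def fit_linear(features, target):
    """Ordinary least squares with an intercept column prepended."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


simple_coef = fit_linear(X[:, :1], y)  # [intercept, slope for X_1]
multiple_coef = fit_linear(X, y)       # [intercept, b_1, b_2, b_3]
```

The fitting procedure is identical in both cases; only the number of predictor columns changes.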

hollow sentinel
#

very cool

#

thank you @lapis sequoia

full narwhal
#

I was looking at different ways of doing array filtering in Python, and came across something I find weird. Why is the second method of filtering the fastest?

lapis sequoia
#

@full narwhal Actually the difference is not a lot (at max 20ish %).
Anyway the difference is most likely due to how indexing works in the background for numpy arrays.

In the 1st and 3rd you need to calculate the flat index k = i*ncol + j from the row and column indexes for each cell.
But in the 2nd you are avoiding that computation, so it is a little faster.

#

I'm not sure I'm accounting for all the possible reasons, but the above is one of them.

full narwhal
#

@lapis sequoia The difference isn't a lot, but the order is consistent across multiple array sizes. Naively, though, doesn't the second method have the most amount of allocations?

#

There's one for the data[1] >= 0.75, then one for np.where(), then one for data[0][index_list]

#

and i feel like there should be a way to combine what np.where is doing in the other two methods without that extra allocation

#

am i missing something?

lapis sequoia
full narwhal
#

that code is comparing python performance to numpy perf. i dont really see what it has to do with this

lapis sequoia
#

%%timeit
index_list = [data[1] >= 0.75]

%%timeit
index_list = np.where(data[1] >= 0.75)

full narwhal
lapis sequoia
#

Try breaking the code into more cells and timing each one. Then see whether it is due to NumPy's C optimisation or not.

lapis sequoia
#

As you can see, np.where is slower, but its output is an array of indices, whereas the simple conditional gives an array of True and False values, which leads to different sizes.
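
A minimal sketch of the two indexing forms being compared. It assumes a 2 x N array where row 1 is the filter key and row 0 holds the values to keep; the size and the random data are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((2, 10_000))  # assumed layout: row 0 = values, row 1 = filter key

mask = data[1] >= 0.75          # boolean array, one entry per column
idx = np.where(mask)[0]         # integer positions where mask is True

by_mask = data[0][mask]         # boolean-mask indexing
by_index = data[0][idx]         # integer ("fancy") indexing

# Both routes select the same elements in the same order.
assert np.array_equal(by_mask, by_index)
```

Timing `mask`, `idx`, `by_mask` and `by_index` in separate `%%timeit` cells shows which step accounts for the difference on a given array size.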

full narwhal
#

Yes, but that computation has to happen either way

#

I would argue np.where has to do the extra step of gathering the indices

#

The way i see it, why doesn't the simple solution do what np.where does, except rather than gathering the indices, it gathers the associated data[0][i] (which should be an O(1) operation)

lapis sequoia
#

If you want to go in depth on why it happens like that and why there is a difference, I can only suggest looking under the hood.

velvet thorn
#

hm.

#

this is an interesting problem

#

@full narwhal I don't have an answer, only a guess

#

and my guess is that in the np.where case, it's faster because the size of the result is known

#

so there is only ever one allocation

#

I'm not sure if there's a way to track allocations, but if there is that might help?

full narwhal
#

@velvet thorn The size of the result is known because np.where had to figure out the size for its return array

#

and you still have to allocate another array for the result, no?

#

it's not overwriting the index array

velvet thorn
#

and you still have to allocate another array for the result, no?
@full narwhal yes

#

but what I'm saying is

#

the difference is in the column-level indexer for the original array, right?

#

whether it's a boolean mask or an array of indices

#

and I'm saying that that part is faster with the latter because the length of the result is known in that case

#

but not in the former case

#

since you have to traverse the entire boolean mask to know the length of the result

#

so presumably there's a number of reallocations, which lead to the slightly higher time

daring crag
#

Hello there! Im new at data science and i want to start but i dont know from where... Can someone recommend me a course, Tutorial or any resource? Thanks btw

velvet thorn
#

non-conclusive support:

#

if you reduce the size of the original array by a lot, the former is actually faster

#

which, if my supposition were valid, would make sense, because there'd be fewer reallocations

full narwhal
#

@velvet thorn what i'm saying is np.where doesn't know what size the output array is to begin with, either

#

right?

velvet thorn
#

@velvet thorn what i'm saying is np.where doesn't know what size the output array is to begin with, either
@full narwhal yes, it doesn't

#

but it calculates it

#

that's my point

full narwhal
#

what i'm asking is why couldn't we skip the intermediate step of collecting the indices?

velvet thorn
#

The way i see it, why doesn't the simple solution do what np.where does, except rather than gathering the indices, it gathers the associated data[0][i] (which should be an O(1) operation)
@full narwhal because it's not necessarily faster, perhaps

#

what i'm asking is why couldn't we skip the intermediate step of collecting the indices?
@full narwhal there's probably some sort of tradeoff.

full narwhal
#

so i just tested it on a 2x1000 array and you seem to be right, but at 2x10000 it seems to flip the other way around (and i wouldnt really consider 10000 to be large)

#

but i still feel like there's something missing here; i just don't see how the np.where method can be faster for larger arrays when it has to do an extra step

velvet thorn
#

but i still feel like there's something missing here; i just don't see how the np.where method can be faster for larger arrays when it has to do an extra step
@full narwhal like I said, presumably the different indexing format cuts down on reallocations when applied to the original array

#

but without looking at the source it'd be hard to tell I guess

ruby summit
#

Hello everyone. What do you think is the appropriate mixture of data science skills and domain knowledge?

jolly plank
#

Can somebody help me with question

#

hello...

lone osprey
#

Filter the subset.. does anyone understand the question??

deep galleon
#

I'll be that guy.. if someone is knowledgeable with numpy masked array, I posted a question with a simplified example in #help-pancakes 🙂

mild topaz
#

my code here https://paste.pythondiscord.com/ezojitenom.py
Traceback (most recent call last):
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 468, in wrapper
    resp = resource(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 583, in dispatch_request
    resp = meth(*args, **kwargs)
  File "E:\demo3\findDocumentType1.py", line 126, in post
    self.resize_im(image_data)
  File "E:\demo3\findDocumentType1.py", line 209, in resize_im
    img = preprocessing(img)
  File "E:\demo3\findDocumentType1.py", line 194, in preprocessing
    img = img//255
TypeError: unsupported operand type(s) for //: 'NoneType' and 'int'

vague bear
#

in real jobs, how common is it to code the visualization with Python matplotlib as opposed to tools like Tableau?

odd yoke
#

Ime it depends on the team and the tools they use. If they're already doing all the preprocessing and analysis in Python, it's generally simpler to plot stuff directly from there with mpl/pyplot

#

Same with r and ggplot etc

sudden delta
#

just used matplotlib for realtime visualization recently, it happened to be really quick to get going

#

short answer: matplotlib if engineers/scientists are looking at it, tableau if managers and up are looking at it :>

hard swan
#

Is this room also for Machine Learning?

pale thunder
#

yes

hard swan
#

I am starting to learn ML and I would be asking many questions about that

#

I have the book Python Machine Learning 3rd Edition by Sebastian Raschka

spark nimbus
#

Does anyone have good references for signal processing?

mild topaz
#

model = load_model(pathlib.Path(r'E://demo3\\united_kingdom_50.h5')) this is not working

grave frost
#

@daring crag What exactly do you find interesting?

lapis sequoia
#

Can somebody help me with question
@jolly plank just apply a filter or condition to select the desired category.

#

in real jobs, how common is it to code the visualization with Python matplotlib as opposed to tools like Tableau?
@vague bear It'll depend on your job. If your job requires you to deliver a final viz report, then maybe you don't want to use matplotlib. But when you are doing analysis as a subtask in a project, or you just need quick plots, then you may go for matplotlib.

Conclusion: If you are serving the final result to a non technical group or its a presentation then you may want to have a nice dashboard made from tableau.

spark nimbus
#

@lapis sequoia is that one mainly focused around machine learning or mainly signal processing, because I don't need the former

#

nvm, seems to mostly be machine learning and the applications of it in signal processing

lapis sequoia
#

yeah, I thought you were looking for ML. This is the data science channel, tbh.

#

@spark nimbus You can try this. This looks pure signal Processing. https://www.coursera.org/specializations/digital-signal-processing

Coursera

Offered by École Polytechnique Fédérale de Lausanne. This Specialization provides a full course in Digital Signal Processing, with a focus on audio processing and data transmission. You will start from the basic concepts of discrete-time signals and proceed to learn how to ana...

#

But it looks paid. I'm not able to find audit option in Coursera.

spark nimbus
#

Yeah, I was about to say it requires a login :/

lapis sequoia
#

You can try free trial..

#

There was an audit option in coursera courses. You could watch free videos. But now it looks like they have removed that option.

spark nimbus
#

The main issue in signal processing is that the basic concepts are pretty simple, but for anything slightly more complex you need to suddenly understand a couple dozen terms that you've likely never heard before, and I just keep getting lost in all this, especially since I have a hard time visualizing it in my head

lapis sequoia
spark nimbus
#

oh now that you mention it, I might still have the PDFs of the books in my uni's mega drive

#

I think they had a signal processing course

austere swift
#

For some reason i'm getting a keyerror when trying to read a column from a dataframe in pandas, when I know the column name is correct

lapis sequoia
#

df.columns will give you columns.
print(df.columns) and see the column names

austere swift
#

here it shows the dataframe and the column lists but it still says keyerror trade type

#

yeah i already did df.columns

#

you can see 'Trade Type' is in the dataframe and in df.columns but it still gives me a keyerror

lapis sequoia
#

check for whitespaces and \n

lone osprey
#

Try checking data in database

#

Like pasta told

#

U like pasta, pasta?

#

Or ur name is pasta?

austere swift
#

what's weird is that when I try to call 'Trade Date' it works fine, but 'Trade Type' doesnt work

lone osprey
#

I think u have to check on data only

austere swift
#

what do you mean

lapis sequoia
#

@lone osprey I keep my nicks related to food and fruits. And yes i like pasta.

lone osprey
#

Nice

spark nimbus
lone osprey
#

what do you mean
@austere swift check like pasta told

lapis sequoia
#

what's weird is that when I try to call 'Trade Date' it works fine, but 'Trade Type' doesnt work
@austere swift did you check for whitespaces and \n ?

austere swift
#

how would I check that?

lone osprey
#

Show us data once

lapis sequoia
#

df[df.columns[6]]

#

see where the Trade Type is in the columns array. And select it.

#

most likely it will be at index 6. If that works, then there is some whitespace in the column name

austere swift
#

df[df.columns[6]]
@lapis sequoia this works for some reason but putting the string directly doesnt

#

maybe it has some weird whitespace thats not a normal space

#

but even when i copy it from the terminal it doesn't work

lapis sequoia
#

You can't see space when you print it.

solar bluff
#

Yeah I've had df columns do that, when there was a space included at the end of the name that isn't obvious

austere swift
#

yeah, I guess i'll just use df[df.columns[6]] instead just as a workaround

solar bluff
#

you could also just rename that column?

austere swift
#

well it's being scraped from a website, so it would probably just be easier to use that as a workaround

lapis sequoia
#

@austere swift you can clean those spaces by using string.strip()

#

df.columns = [col.strip() for col in df.columns]

austere swift
#

Yeah i'll try that

#

nope that didnt work either
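
One possible cause when `strip()` does not help is an unusual whitespace character inside the name rather than at its ends. A sketch under that assumption: the frame below is hypothetical and places a non-breaking space (`\u00a0`) between "Trade" and "Type".

```python
import pandas as pd

# Hypothetical scraped frame: the second header contains a non-breaking space
# between the two words, which looks identical to a normal space when printed.
df = pd.DataFrame({"Trade Date": ["2020-10-01"], "Trade\u00a0Type": ["Buy"]})

# repr() exposes characters that print() hides.
print([repr(col) for col in df.columns])

# Replace every run of Unicode whitespace with a plain space, then trim the ends.
df.columns = df.columns.str.replace(r"\s+", " ", regex=True).str.strip()

trade_type = df["Trade Type"]  # now matches the name typed on a keyboard
```

If the `repr` output shows something other than whitespace (for example a zero-width character), the regex would need to target that character instead.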

haughty nymph
#

Hey folks, does anyone know any good and robust ways to convert a pretty extensive MATLAB script to a Python script?

lapis sequoia
#

i got a question

#

Hey folks, does anyone know any good and robust ways to convert a pretty extensive MATLAB script to a Python script?
@haughty nymph You'll have to rewrite it in Python. Or you can check whether some library or repo already has the required script written.

#

i got a series of 3d brainscans with their labels

#

how can i extract them with the correct labels

#

its in matlab file

#

what is the format of 3d brainscans ? images?
And where are the labels? In filename ?

narrow flume
#

Have the user input a list of columns for a table

Have the user input a data type for every column: int, float, string size 255

In a loop have the user input the values for each row

Ensure the string size doesn't exceed 255

Print the results when the user is finished

#

for the second sentence, how do you know when the user has already input every column include int , float , string size 255?

lapis condor
#

does anyone know how to implement the naive Bayes classification algorithm? I understand how it works but I'm new to the Python language.

lapis sequoia
#

@lapis sequoia the labels are in an array

#

@lapis sequoia the labels are in an array
@lapis sequoia I'm not sure what is your problem.
Do you want to train a image classifier to classify the brainscans with correct labels ?

#

does anyone know how to implement naive baye's classification algorithm? I understood how it works but I'm new to Python language.
@lapis condor If you are looking for basic naive Bayes Classification algorithm implementation in python then they are very much available on internet.

You just have to create a table of probability(frequency/total) for each word by each class.

#

And if the variables are numbers (decimals) then it is a lil tricky.

lapis condor
#

Actually, I'm looking for something that doesn't use iris. I got specific dataset and was asked not to import

#

@lapis sequoia If you could help with that please

lapis sequoia
#

@lapis sequoia the problem is that I have never used matlab files

#

And I need a way to extract several scans and their corresponding labels to train a model

#

I'm not sure how to do so

#

@lapis sequoia
can you tell me the extension of the file ?

#

in which the brainscan image is stored

narrow flume
#

Could anyone help me with python numpy structure array?

lone osprey
#

Yup

narrow flume
#

Have the user input a list of columns for a table

Have the user input a data type for every column: int, float, string size 255

In a loop have the user input the values for each row

Ensure the string size doesn't exceed 255

Print the results when the user is finished

You may use Numpy or Pandas to do this

#

I choose to use numpy

#

import numpy as np

a = int(input("Size of array:"))
my_array = []  # collect the raw inputs here before converting
for i in range(a):
    my_array.append(input("Values:"))
my_array = np.array(my_array)

#

here's what i have been doing

#

how do i know when the user has already input a data type for every column: int, float, string 255

#

@lone osprey

lapis sequoia
#

@lapis sequoia
can you tell me the extension of the file ?
@lapis sequoia well its a .mat file

#

i believe its several thousand 3d images

#

import scipy.io
X = scipy.io.loadmat('file.mat')

#

Hello guys, does someone here attempted using neuronal networks for building better trading bots?

#

@lapis sequoia
Images are nothing but arrays. Just load them as shown above and check the shape of X.
Your X will have a shape like (n, h, w, ch).

#

Where n = number of images
h = height of the brainscan images
w = width of the brainscan images
ch = channels (= 3 if it's a color image)

#

Apply CNN on them and you should be able to get a decent classifier.
Or Use transfer learning if the images are not enough.

#

its telling me that its a dictionary

#

@lapis sequoia OK, then you just have to extract the value stored under the right key.
Most likely it's the last item: ('Data', array[])

#

how would i do that

#

Yeah its the key with 'Data'.
data = scipy.io.loadmat('file.mat')
X = data['data']

#

And you can see it's a 4D array as I mentioned above: (n, h, w, ch).
Check X.shape.
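
A self-contained sketch of inspecting what `scipy.io.loadmat` returns. It writes a small stand-in file first; the variable names `data` and `target` and the array shapes are assumptions for illustration, not the contents of the real file.

```python
import os
import tempfile

import numpy as np
import scipy.io

# Write a stand-in .mat file so the example runs on its own.
# The names "data" and "target" and the shapes are assumed.
path = os.path.join(tempfile.mkdtemp(), "file.mat")
scipy.io.savemat(path, {
    "data": np.zeros((89, 176, 176), dtype=np.uint8),
    "target": np.zeros((89, 1), dtype=np.uint8),
})

mat = scipy.io.loadmat(path)

# loadmat returns a dict; keys wrapped in double underscores are file metadata.
for name, value in mat.items():
    if not name.startswith("__"):
        print(name, value.shape, value.dtype)

X = mat["data"]    # keys are case-sensitive, so copy them exactly as printed
y = mat["target"]
```

Printing the non-metadata keys with their shapes is usually enough to tell which entry holds the images and which holds the labels.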

#

@lapis sequoia could this be the dimensions

#

i suspect its a greyscale image

#

Yes. its greyscale. But I'm not sure which one is the n = Number of brainscan files.

#

most likely n=89 and the brainscan images are 176*176.

#

so i got 89 brains scans

#

pixels is 176 by 176

#

channel is one because of brain scan

#

👍

#

thank you so much

#

Now you should be able to create a classifier. Try using Transfer learning as you have only 89 images.

#

i got one more question

#

i basically have to diagnose a condition, which is a yes or no value

#

should the labels be one hot encoded?

#

so i went back and looked at the labels, which are labeled "Target"

#

and i got this result

#

the 89 corresponding to how many images i have

#

and 1 is binary

#

would that be a correct explanation

lone osprey
#

how do i know when user has already input a data type of every colum: int , float , string 255
@narrow flume u want to know if input is int or float or string?

lapis sequoia
#

Most libraries and packages take care of this automatically.
Also, whether you should use OHE or binary (1/0) labels depends on your loss function: binary cross-entropy (log loss) typically takes 0/1 labels, while categorical cross-entropy typically expects one-hot labels.
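
A short sketch of the two label layouts, using a toy array in place of the (89, 1) label array from the file.

```python
import numpy as np

y = np.array([[0], [1], [1], [0]])  # toy stand-in for the (89, 1) label array

y_binary = y.ravel()                        # shape (4,), one 0/1 value per scan
y_one_hot = np.eye(2, dtype=int)[y_binary]  # shape (4, 2), one column per class
```

For a yes/no diagnosis the flat 0/1 form is usually the simpler choice; the one-hot form only matters if the model ends in a two-unit output.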

#

thank you

narrow flume
#

can anyone help me with structure array

earnest forge
#

lol. I was about to ask the same

#

what are you trying to do, though?

lapis sequoia
#

Have the user input a list of columns for a table

Have the user input a data type for every column: int, float, string size 255

In a loop have the user input the values for each row

Ensure the string size doesn't exceed 255

Print the results when the user is finished

You may use Numpy or Pandas to do this
@narrow flume First get the columns names as inputs from user.
Then get columns types as inputs.
After you have this let user input the values of each row.

lapis sequoia
#

You have to do something like that.
I'm not completing the code. You can validate that strings don't exceed the maximum allowed length (255) and check the types.

#

You can do the above with numpy also.
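
A sketch of the NumPy route with a structured array. `TYPE_MAP` and `build_table` are hypothetical names invented here, and the hard-coded lists stand in for what `input()` would return in the real exercise.

```python
import numpy as np

# Map the three allowed type names to NumPy dtypes; "U255" is a Unicode
# string of at most 255 characters.
TYPE_MAP = {"int": "i8", "float": "f8", "string": "U255"}
CONVERTERS = {"int": int, "float": float, "string": str}


def build_table(columns, types, rows):
    """Build a structured array from column names, type names and raw string rows."""
    dtype = np.dtype([(name, TYPE_MAP[kind]) for name, kind in zip(columns, types)])
    records = []
    for row in rows:
        converted = []
        for raw, kind in zip(row, types):
            if kind == "string" and len(raw) > 255:
                raise ValueError("string longer than 255 characters")
            converted.append(CONVERTERS[kind](raw))  # raises ValueError on a bad number
        records.append(tuple(converted))
    return np.array(records, dtype=dtype)


# The values below stand in for what the user would type.
table = build_table(
    columns=["name", "age", "height"],
    types=["string", "int", "float"],
    rows=[["Ann", "31", "1.70"], ["Bob", "45", "1.82"]],
)
print(table)
```

The column types are asked for once, up front; every row is then converted and checked against those types.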

narrow flume
#

oh so we have to ask user for the data type of their input every time?

#

what is the size of string 255 ? @lapis sequoia

lapis sequoia
#

oh so we have to ask user for the data type of their input every time?
@narrow flume No, we ask for the type of each column once, and then make the user enter values of that type.

#

what is the size of string 255 ? @lapis sequoia
@narrow flume size and len are two different things in Python.
Check the question to see whether you need size or len.
And you have to check whether the user is entering it correctly. If not, either discard that input or ask the user to fill in that row again.
Check your question for what to do.
If it's not mentioned, you can decide on your own.

narrow flume
#

then i will make a length function to check

fading dirge
#

i would like to find the 4x4 transformation matrix that best fits one 3d set of points to another 3d set of points
can i do that with scikit learn and what functions should i start looking at to accomplish that?

#

i think i could use their stochastic gradient descent module, but is there a better way that i just dont know about?

nocturne kraken
#

least squares

#

you're basically trying to solve a least squares problem XA = Y

#

and there's a solution to that which is just the pseudoinverse
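
A sketch of that least-squares fit with NumPy. The points and the "true" transform are synthetic, and the row-vector convention (points as rows of X, so X @ A = Y) is an assumption chosen to match the XA = Y notation above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up ground truth: a 4x4 affine transform in homogeneous coordinates,
# written for row vectors, so its last column stays [0, 0, 0, 1].
A_true = np.eye(4)
A_true[:3, :3] = rng.normal(size=(3, 3))
A_true[3, :3] = [1.0, -2.0, 0.5]  # translation row

src = rng.normal(size=(50, 3))
X = np.column_stack([src, np.ones(len(src))])  # (50, 4) homogeneous source points
Y = X @ A_true                                 # (50, 4) target points

# Solve X @ A = Y in the least-squares sense; same result as pinv(X) @ Y.
A_fit, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

This fits an unconstrained affine transform. If the transform must be a pure rotation plus translation, a constrained method such as the Kabsch algorithm would be the thing to look at instead.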

fading dirge
#

cool, so it looks like the least squares in scikit fits a line to a set of points, is there an easy way to get it to learn a matrix? sorry i'm very new to DS im basically a full stack dev who got thrown onto some ds projects hahaha

weary heart
#

hi, if i got an r2 score of 0.96 on train data and 0.90 on test data, does it still count as overfitting?
and if so, how do i handle it? should i change the max depth and gamma? (i'm using XGBoost with hyperparameter tuning)

bold olive
#

How do you access the X_train, y_train, X_test, y_test after doing a KFold like this: cv = StratifiedKFold(n_splits=10, random_state=42, shuffle=True)?

I want to now fit the data like this:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
lda = LinearDiscriminantAnalysis()
lda.fit(X_train,y_train)
y_pred = lda.predict(X_test)
```

and get the mean scores, mean ROC, etc.

scenic hollow
#

what does tf.compat.v1.get_default_graph mean? Like, what is a computational graph?

bold olive
#

This I know: using the train_test_split function, you can get the splits like this:

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

#

But how to do the same for KFold?

lapis sequoia
bold olive
#

So basically fit the classifier in the for loop:

```python
for train_index, test_index in cv.split(X, y):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    clf = lda.fit(X_train,y_train)
```

And then calculate mean scores and AUCs, correct?

hollow sentinel
#

hey . guys

#

can someone explain what pandas.to_datetime means?

#

I've been seeing it pop up in a lot of Kaggle notebooks and I don't understand what it does

#

I've looked at the pandas doc too

calm forge
hollow sentinel
#

I think I won't figure it out until I do it in a project

#

or download a dataset and use datetime on it

heady hatch
#

Hey guys quick question on TFRecord format.

When is it worthwhile to use?

indigo steppe
#

if you understand the basics of python (basic functions,if statements,loops...),how hard is it to grasp ml with scikit-learn?

hard swan
#

I need help with adaline

#

I dont understand it

#

so how does it actually work?

#

and I dont really get weight and cost in ML

lapis sequoia
#

So basically fit the classifier in the for loop:

```python
for train_index, test_index in cv.split(X, y):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    clf = lda.fit(X_train,y_train)
```

And then calculate mean scores and AUCs, correct?

@bold olive Yes that is the idea. Just take the mean of all the metrics you want to have.
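
A runnable sketch of that loop with per-fold metrics collected and averaged. The dataset is synthetic (`make_classification`) and stands in for the real X and y; the choice of accuracy and ROC AUC as the metrics is an assumption based on what was asked for.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in data; the real X and y come from the user's dataset.
X_arr, y_arr = make_classification(n_samples=300, n_features=8, random_state=42)
X, y = pd.DataFrame(X_arr), pd.Series(y_arr)

cv = StratifiedKFold(n_splits=10, random_state=42, shuffle=True)
accuracies, aucs = [], []

for train_index, test_index in cv.split(X, y):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

    lda = LinearDiscriminantAnalysis()  # fresh model for every fold
    lda.fit(X_train, y_train)

    accuracies.append(accuracy_score(y_test, lda.predict(X_test)))
    aucs.append(roc_auc_score(y_test, lda.predict_proba(X_test)[:, 1]))

print(f"mean accuracy: {np.mean(accuracies):.3f}, mean AUC: {np.mean(aucs):.3f}")
```

When only the averaged scores are needed, `sklearn.model_selection.cross_validate` does the same splitting and scoring without the explicit loop.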

bold olive
#

It worked btw, @lapis sequoia. Managed to get the confusion matrix, mean accuracy and ROC curves of all folds with mean AUC.

velvet thorn
#

can someone explain what pandas.to_datetime means?
@hollow sentinel it just converts a value or number of values to the datetime type
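
A tiny sketch of what that conversion buys you. The column name `when` and the dates are made up.

```python
import pandas as pd

df = pd.DataFrame({"when": ["2020-10-11", "2020-10-12", "2020-11-01"]})

# Before conversion the column holds plain strings; afterwards it holds datetimes.
df["when"] = pd.to_datetime(df["when"], format="%Y-%m-%d")

# Datetime columns expose date parts through the .dt accessor.
months = df["when"].dt.month
span = df["when"].max() - df["when"].min()  # a Timedelta
```

Once a column is a datetime type, sorting, resampling and date arithmetic work on it directly, which is why the call shows up early in so many notebooks.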

#

what don't you get about it?

shell berry
#

Are there any examples of using SVMs for multilabel problems, with a SVM per label perhaps?

odd yoke
#

you can do multi label by combining any binary classification model, so yes, it's possible

#

not that it necessarily gives good results
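
A sketch of the "one SVM per label" idea using scikit-learn's `OneVsRestClassifier`. The data is synthetic; whether this wrapper is the one behind the link shared in the channel is not known, so treat it as one reasonable option.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Synthetic multilabel data: Y is a 0/1 indicator matrix, one column per label.
X, Y = make_multilabel_classification(
    n_samples=200, n_features=10, n_classes=4, random_state=0
)

clf = OneVsRestClassifier(LinearSVC())  # fits one binary SVM per label column
clf.fit(X, Y)
pred = clf.predict(X)                   # indicator matrix with the same shape as Y
```

`sklearn.multioutput.MultiOutputClassifier` is a similar wrapper for the same job; either way each label is predicted independently, so correlations between labels are ignored.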

shell berry
#

@odd yoke Thanks - do you have any code examples or is there a pipeline in scikit learn for it?

#

I have a multiclass multilabel dataset and decision trees arent performing particularly well

#

Maybe I need more features?

odd yoke
shell berry
#

Thanks for the link! That looks like exactly what I need

#

I was thinking I had to hardcode it all myself

#

Is it not helpful most of the time @odd yoke ?

#

Do you recommend multilabel classifiers over wrapping a SVM or something into a multioutput?

grave thunder
#

Hey lads, I have a pandas question. What's the difference between groupby and sort_values?

patent flame
#

group by groups based on a condition

tidal bough
#

They return different stuff, and probably also have different algorithmic complexities (groupby requires one pass over the array, so O(n), sorting, as always, is O(n log n))

patent flame
#

sort sorts the values based on a condition

#

for example:

#

[1, 2, 3, 4, 5, 6]

#

if u group this by > 4

#

then u get two groups: [1, 2, 3, 4] and [5, 6]

#

if u sort it from large to small

#

u get

#

[6, 5, 4, 3, 2, 1]

#

@grave thunder

grave thunder
#

And if I do groupby("Column").max(), how is that different from sorting then?

velvet thorn
#

that's a totally different operation

grave thunder
#

It will also return in order

velvet thorn
#

groupby and sorting are not more than tangentially related

patent flame
#

instead of being condescending about his question

velvet thorn
#

the point of groupby is to split some data into groups and apply an operation to each group independently

patent flame
#

you can just answer it, u know

velvet thorn
#

you can just answer it, u know
@patent flame relax, I'm answering it

#
  • creating an example
grave thunder
#

Ah I see, and sorting is used more like for printing data

velvet thorn
#

not necessarily

#

sorting is used when you want to impose some kind of order on data

grave thunder
#

And groupby when I wanna operate on a group of data

velvet thorn
#
>>> df
    fruit  price
0   apple    1.8
1   apple    1.3
2    pear    2.3
3  banana    3.7
4    pear    2.5
5   apple    1.5
6  banana    3.4
>>> df['price'].max()
3.7
>>> df.groupby('fruit')['price'].max()
fruit
apple     1.8
banana    3.7
pear      2.5
Name: price, dtype: float64
#

a quick example

#

if you want to answer the question "what is the highest price of any fruit", then you just do df['price'].max() (the max price)

#

but if you want to answer the question "for each fruit, what is the highest price", then you need to do a groupby.

grave thunder
#

Ohhhh

velvet thorn
#

conceptually, this splits the DataFrame into one "mini-DF" for each value of fruit, so you have one mini-DF where fruit is apple, one for banana and so on

#

then you get the .max() of each

#

then you combine them together.
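
The split / apply / combine steps spelled out by hand next to the built-in call, using the same fruit table as the example above.

```python
import pandas as pd

df = pd.DataFrame({
    "fruit": ["apple", "apple", "pear", "banana", "pear", "apple", "banana"],
    "price": [1.8, 1.3, 2.3, 3.7, 2.5, 1.5, 3.4],
})

# Split: one mini-DataFrame per fruit. Apply: max price of each.
# Combine: collect the results into one Series.
by_hand = pd.Series({
    fruit: mini_df["price"].max()
    for fruit, mini_df in df.groupby("fruit")
})

built_in = df.groupby("fruit")["price"].max()

# The full row of the single most expensive item, fruit name included.
most_expensive = df.loc[df["price"].idxmax()]
```

The hand-written version is only there to show the mechanics; the built-in call is the one to use.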

#

does that make sense?

grave thunder
#

yup yup ^_^

velvet thorn
#

okay

#

so sort just orders values

#

now, you see the DF above is not ordered in any way

grave thunder
#

Thanks lad! I've been trying to wrap my head around that for a while now

velvet thorn
#

but I can impose an ordering:

>>> df.sort_values('price')
    fruit  price
1   apple    1.3
5   apple    1.5
0   apple    1.8
2    pear    2.3
4    pear    2.5
6  banana    3.4
3  banana    3.7
#

so now it's ordered by price in ascending order

#

you can just answer it, u know
@patent flame happy now?

grave thunder
#

๐Ÿค

patent flame
#

I'm not the one you should please. Be a better person for yourself not for anyone else.

grave thunder
#

Chill, you both helped out. Thanks lads

velvet thorn
#

I'm not the one you should please. Be a better person for yourself not for anyone else.
@patent flame that's pretty ironic because you seem rather quick to jump down someone's throat

#

Chill, you both helped out. Thanks lads
@grave thunder yw

patent flame
#

i dont think u can have nudity in profile pic

hollow sentinel
#

let's calm down bois we're all friends here

velvet thorn
#

if you understand the basics of python (basic functions,if statements,loops...),how hard is it to grasp ml with scikit-learn?
@indigo steppe too early; don't do it.

#

you can follow a tutorial, and maybe something will kind of work, but way too soon you will run into problems that are above your level

#

work on your fundamentals (not just programming; mathematics too) for a while first.

grave thunder
#

Learn about sigmoid functions for example

velvet thorn
#

ML has become a lot more accessible in recent years, but it's still a very complex subject.

hollow sentinel
#

@indigo steppe you can try a udemy course that I'm using: Python for Data Science and Machine Learning Bootcamp by Jose Portilla

#

not that all your doubts will be cleared but it's a good start

grave thunder
#

Hey, I've gone through that one too! Although it's very well done, it's not for total beginners

hollow sentinel
#

you can also try this udemy course: 2020 complete python bootcamp from zero to hero in python by Jose Portilla, Kaggle mini courses, and Andrew Ng's course

#

I haven't finished the python for DS & ML bootcamp bc of college I'm still on linear regression

grave thunder
#

Yup, definitely can recommend Portilla. Among best 20 bucks I've spent.

#

You'll get there. ML is super fun and applicable almost everywhere. I have custom py ML programs for stocks

hollow sentinel
#

lmao dude I can't even figure out the right dataset to use for linear regression

#

I've been looking at Kaggle datasets

grave thunder
#

Kaggle is good imo

hollow sentinel
#

yeah I need tabular data otherwise it's lots of value_counts

grave thunder
#

But depends on what you wanna use ML for, I went through course mostly to automate my day trading and I have constantly updating market that comes in nicely sorted json or csv files

velvet thorn
#

lmao dude I can't even figure out the right dataset to use for linear regression
@hollow sentinel what do you mean "right"?

hollow sentinel
#

@velvet thorn like I wouldn't know how to do linear regression on a dataset of words

velvet thorn
#

ah, okay

#

well there are different things you can do

hollow sentinel
#

natural language processing

velvet thorn
#

for example, sentiment analysis

#

for text

#

that's the most common case

hollow sentinel
#

I also need a dataset that's between 50 and 100 KB otherwise seaborn takes too long to make a graph of it

velvet thorn
#

??

#

what kind of graph are you making

hollow sentinel
#

distplot

#

I don't have the dataset anymore tho 😦

velvet thorn
#

that doesn't sound right...

hollow sentinel
#

found it

oblique socket
#

Is there a better way to get the euclidean norm of a row using pandas and numpy?

```python
        for index, row in sums.iteritems():
            df.iloc[index] = df.iloc[index].divide(row)
        return df
```
#

preferably like a one liner

#

well, not just get the norm, but also normalize the row by dividing by the norm.

velvet thorn
#

huh.

#

so do you want the norm or not (stored separately)

#

or do you just want to normalise

oblique socket
#

I just want to normalize

velvet thorn
#

use np.linalg.norm

oblique socket
#

I don't really care about the norm

velvet thorn
#

although there's a better way to do that

#

sec

oblique socket
#

does that work on a row by row basis?

lapis sequoia
#

in pandas, I have a column with strings like "foo_2020_10_11", how can I extract that date as a datetime?

oblique socket
#

I thought I tried that

velvet thorn
#

df / np.linalg.norm(df, axis=1, keepdims=True)

#

@oblique socket
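
A quick sketch of that one-liner on a toy frame (the column names and values are made up for illustration). `np.linalg.norm(..., axis=1, keepdims=True)` returns one norm per row as an `(n_rows, 1)` array, which then broadcasts across the columns:

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration.
df = pd.DataFrame({"x": [3.0, 1.0], "y": [4.0, 0.0]})

# One Euclidean norm per row, kept as a column vector of shape (n_rows, 1).
norms = np.linalg.norm(df.to_numpy(), axis=1, keepdims=True)

# Dividing broadcasts each row's norm across that row's columns.
unit_rows = df / norms
print(unit_rows)
```

A row of all zeros has norm 0, so it would come out as NaN; that case probably needs handling separately.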

#

in pandas, I have a column with strings like "foo_2020_10_11", how can I extract that date as a datetime?
@lapis sequoia use a regex

#

or rather, pd.to_datetime with a regex
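
Roughly like this, assuming every string ends in a `YYYY_MM_DD` date as in the example (the second value is invented). `Series.str.extract` pulls out the date part with a regex and `pd.to_datetime` parses it with an explicit format:

```python
import pandas as pd

# Example values; the "prefix_YYYY_MM_DD" layout is assumed from the question.
s = pd.Series(["foo_2020_10_11", "bar_2019_01_05"])

# Pull out the YYYY_MM_DD part, then parse it with an explicit format.
date_part = s.str.extract(r"(\d{4}_\d{2}_\d{2})", expand=False)
dates = pd.to_datetime(date_part, format="%Y_%m_%d")
print(dates)
```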

lapis sequoia
#

thanks

oblique socket
#

Thank you!

velvet thorn
#

yw!

oblique socket
#

I knew there was a simpler way

velvet thorn
#

yup

#

use of iterrows/iteritems/itertuples is very often an antipattern

oblique socket
#

I figured, it just seemed like spaghetti

velvet thorn
#

try to get used to broadcasting/vectorisation

#

it helps

oblique socket
#

I have this function to normalize a dataset

#
```python
    # minmax feature scaling
    if method == 'minmax':
        # scaled value = (value - min) / (max - min)
        # should also return min and max values for future use
        # if new values are added to the dataset
        # normalize on scale [a, b] (default is [0, 1])
        normalized_df = a + (df - df.min())*(b - a)/(df.max() - df.min())
    elif method == 'mean_normalization':
        normalized_df = (df - df.mean()) / (df.max() - df.min())
    # z-score normalization (standardization)
    elif method=='standardize':
        # make each feature have zero mean and unit variance
        # should also return mean and std for each attribute
        # for future use in case new values are added to dataset
        # This method is widely used for normalization in many machine
        # learning algorithms (e.g., support vector machines,
        # logistic regression, and artificial neural networks).
        normalized_df=(df-df.mean())/df.std()
    elif method=='unit':
        # x' = x / ||x||
        # sums = df.apply(lambda x: np.sqrt(np.sum(x**2)),axis='columns')
        # for index, row in sums.iteritems():
        #     df.iloc[index] = df.iloc[index].divide(row)
        normalized_df = df / np.linalg.norm(df, axis=1, keepdims=True)
    return normalized_df
```
#

It's complete, for now

velvet thorn
#

HELP

#

I'M BEING DROWNED IN COMMENTS

#

okay purely from a software engineering perspective

#

this is kind of dodgy IMO

#

I would write one function for each method of feature scaling
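
A sketch of what that split could look like. The function names are made up here, and the formulas are the ones from the snippet above, with a short docstring in place of each block of comments:

```python
import numpy as np
import pandas as pd


def min_max_scale(df, a=0.0, b=1.0):
    """Rescale each column to the range [a, b]."""
    return a + (df - df.min()) * (b - a) / (df.max() - df.min())


def mean_normalize(df):
    """Centre each column on its mean and divide by its range."""
    return (df - df.mean()) / (df.max() - df.min())


def standardize(df):
    """Give each column zero mean and unit (sample) standard deviation."""
    return (df - df.mean()) / df.std()


def unit_rows(df):
    """Scale each row to unit Euclidean length."""
    return df / np.linalg.norm(df.to_numpy(), axis=1, keepdims=True)


# Small made-up frame to try the functions on.
demo = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 40.0]})
scaled = min_max_scale(demo)
print(scaled)
```

Each function can then be tested and documented on its own, and callers no longer need to pass a method string.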

oblique socket
#

yeah, I guess I could do that

#

I probably should

#

right now I'm the only one using it

velvet thorn
#

yup, it's up to you

#

also snake case for function names is much preferred

oblique socket
#

oops

#

I guess I'll clean that up!

#

thanks for your input!

tidal bough
#

Alternatively, make those docstrings.

#

I'd say it's more important, since comment would take actually going to your code and reading it.

oblique socket
#

good point

shell berry
#

Seems like from the source that you can't do multi-class and multi-label together?

#

Are there any workarounds or other wrappers for this

velvet thorn
#

Seems like from the source that you can't do multi-class and multi-label together?
@shell berry what is that from

lapis sequoia
#

I need a novelty voice TTS engine with python..
but the only good engine I see is pyttsx3
and microsoft bob is most definitely not a novelty voice...
Im specifically trying to approximate glados
from portal
I found this: https://github.com/EtiennePerot/gladosvoicegen

but it looks terrible... and is 6 years old
and requires melodyne which wont work on my linux server
next I found this
https://github.com/kairess/tacotron

but that takes a 130 GB dataset
so... yea thats out

#

OK... alternatively... because I cant find something good...

#

what if I used pyttsx3's microsoft lucy or whatever

velvet thorn
#

what limitations do you have

lapis sequoia
#

and then distorted it

velvet thorn
#

I'm assuming it has to be free?

lapis sequoia
#

yes

#

and needs to be fast 5 seconds MAX delay from discord bot command to saying it in VC

velvet thorn
#

hm.

#

you ask much 🥴

lapis sequoia
#

quad core 4gb ram VPS. No gpu though

velvet thorn
#

no GPU
โ—

#

okay I guess

#

won't know until you profile it

lapis sequoia
#

*running on a potato

velvet thorn
#

hm.

#

just an idea but

#

why don't you use those generic TTS services that have been around since forever

#

you know, the robotic ones

#

and then just have a transformer to make it more GLaDOS-like

lapis sequoia
#

and fuck with it to make it distorted

velvet thorn
#

ye

lapis sequoia
#

yep that was fall back idea

velvet thorn
#

I think that'd be more efficient

lapis sequoia
#

that looks like what this does

#

its slow because it uses a GUI tool and automates it anyway

#

but idk how to do that distortion command line

#

and their hacky solution of VM with windows and melodyne is not an option lol

#

that one that actually works does the same thing

#

OK.. I thought GladOS would be easy... perhaps theirs some other novelty tts engine I could use

#

morgan freeman, or snoop dogg, or something else... though that seems way more complex

velvet thorn
#

OK.. I thought GladOS would be easy... perhaps theirs some other novelty tts engine I could use
@lapis sequoia I'd say GLaDOS is the easiest because you can just do what we said above

#

it already sounds kinda like TTS

#

on the other hand, a real human's voice is more complex.

#

this is an unsolved problem btw

#

realistic TTS is worth a lot of $$

lapis sequoia
#

yea thats what I was thinking

#

hum... well good news...

#

pyttsx3 is so awful it sounds close to glados already

#

NVM thats not pyttsx3 thats espeak

#

not their fault its bad I guess

#

ok other voices actually do decent this could work

#

so... how would I add robotic distortion to a audio file?

lapis sequoia
#

oh ffs. I cant even get pyttsx3 to change the fcking voice

#

or volume or anything

lone osprey
#

U can

#

My friend did change its voice

#

I don't know what code to change

#

Check in google or docs

lapis sequoia
#

ok gtts actually works decently

#

though Im not thrilled with the delay and external server need

shell berry
#

I keep getting this error when training, but if I set my test set to like 0.001% it goes away

#

Whenever I try a sizable test set I get the error again. I tried np.unique to make sure I had two classes and I do. Any ideas? appreciated

lapis sequoia
#

what is the average statistic that values the present more than the past in a time series?

hasty grail
#

Exponential Moving Average?
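
For reference, a small sketch with pandas, assuming the series is a `pd.Series` (the numbers are made up). With `adjust=False`, `ewm(...).mean()` follows the recursive form `ema_t = alpha * x_t + (1 - alpha) * ema_(t-1)`, so recent values get more weight than older ones:

```python
import pandas as pd

# Made-up series for illustration.
prices = pd.Series([10.0, 11.0, 12.0, 13.0])

# alpha close to 1 reacts quickly to new values, alpha close to 0 smooths more.
ema = prices.ewm(alpha=0.5, adjust=False).mean()
print(ema.tolist())
```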

shell berry
#

I fixed that; I ran a SVC which takes like 10 mins to train and gives me ~77% accuracy, but a linear SVC takes 2-3 seconds and gives me 97% around no matter what my test split is

#

Is it really that performant or a false reading?

lapis sequoia
#

accuracy of train, test or validation set?
Also are you performing proper splitting?
Try use grid search Cross validation and enter the desired hyper-parameter values for both and compare the results.

shell berry
#

Just test @lapis sequoia , Im doing this: x_train, x_test, y_train, y_test = train_test_split(x_counts, output_labels, test_size=0.33, random_state=100)

dusty depot
#

if it's giving 97% accuracy on the test set

#

that's probably okay then

#

linearsvc can converge a lot faster

shell berry
#

Something seems off because that's really really high

dusty depot
#

is this sklearn?

shell berry
#

Yessir

#

I used a randomforest and got ~70%

dusty depot
#

hmm

#

try bumping up the test size to like 50% and see what happens maybe?

shell berry
#

Just tried that

lapis sequoia
#

linearsvc can converge a lot faster
@dusty depot it can converge lot faster but the results shouldn't be so different.

dusty depot
#

oh, no matter what your test split is

#

hm yeah

shell berry
#

got like 0.002% higher lol

lapis sequoia
#

can you paste the code.

shell berry
#

It can't be my data splitting because I''m splitting it the same way for randomforest and etc

#

Not sure if I can paste the entire code, this is for school

#

Oh oops

#

Ok lmao

lapis sequoia
#

message me if you like.
I just need to see the splitting part and training

#

also the results.

shell berry
#

This is really really embarrassing - I was testing on the train set. I must have changed my code and forgot to change it back
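
A minimal sketch of the check that catches this kind of mix-up, using synthetic data from `make_classification` in place of the real features (an assumption, since the actual data isn't shown). Printing both scores side by side makes it obvious when the "test" number is really a train number:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic data standing in for the real features and labels.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=100
)

clf = LinearSVC(max_iter=10000).fit(X_train, y_train)

# Scoring on the data the model was fitted on is usually optimistic.
train_acc = clf.score(X_train, y_train)
# Only the held-out score says anything about generalisation.
test_acc = clf.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```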

dusty depot
#

oh

#

oop

lapis sequoia
#

lol.

shell berry
#

lol .. 😣

#

Im getting 72% now haha

dusty depot
#

rip

shell berry
#

Trying a normal SVC now, should get vastly different results since I'm actually doing stuff properly now

#

I've spent 90% of my time on this assignment cleaning the data

#

Are real world projects mostly like that lol

#

Ok a normal SVC gives me 56% now 😦

lapis sequoia
#

Welcome to Data Science.
Basic cleaning is nothing.
In some projects I have spent 70% to 80% time in cleaning and cleaning only.
And I'm talking about a multiple month long project.

#

Extraction, Cleaning and Transformation will be the biggest problem in almost every project.

turbid hearth
#

Does the cross validation graph look good

#

sorry, im new to this and im getting a negative r-squared value compared to a baseline model

#

but it looks like the graph of the model i created plateaus

lapis sequoia
#

I don't know what is your CV score but from Loss plot I can say that you are doing something wrong or at least you are not doing something correctly .

Also Baselines are used as a reference point.
Your train and test losses should be lower and the R-squared value should be near 1. 1 is the best, 0 means no better than always predicting the mean, and a negative value means worse than that.
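
To make the R-squared scale concrete, here is a small sketch that computes it by hand as `1 - SS_res / SS_tot` (the helper name and the numbers are made up). It shows how a model that does worse than always predicting the mean ends up with a negative score:

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # error of the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error of "predict the mean"
    return 1.0 - ss_res / ss_tot


y = [1.0, 2.0, 3.0]
perfect = r_squared(y, [1.0, 2.0, 3.0])   # exact predictions
baseline = r_squared(y, [2.0, 2.0, 2.0])  # always predict the mean
worse = r_squared(y, [3.0, 2.0, 1.0])     # predictions point the wrong way
print(perfect, baseline, worse)
```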

hasty grail
#

I feel that I am missing something because the Keras callback isn't working. Can someone point that out?

#
def get_pred_loss_dataset(test_dataset: tf.data.Dataset, model: tf.keras.Model) -> Tuple[tf.data.Dataset, tf.data.Dataset]:
    """
    Returns a dataset that yields the prediction and loss for each batch in the test dataset.

    Parameters
    ----------
    test_dataset : Dataset
        The test dataset to evaluate on. Yields `(x_true, y_true, ...)` (batched).
    model : Model
        The model that predicts on the test dataset.

    Returns
    -------
    pred_dataset : Dataset
        The resultant dataset that is suitable to be zipped with `test_dataset`.
        Yields the batch prediction.
    loss_dataset : Dataset
        The resultant dataset that is suitable to be zipped with `test_dataset`.
        Yields the batch loss.
    """
    pred_dataset = test_dataset.map(lambda x_true, *_: model(x_true))

    print(test_dataset)
    print("Obtaining loss values...")
    losses = []
    def on_batch_end(batch, logs):
        print(f"batch: {batch}, loss: {logs['loss']}")
        losses.append(logs['loss'])
    log_batch_loss = tf.keras.callbacks.LambdaCallback(on_batch_end=on_batch_end)
    results = model.evaluate(test_dataset, callbacks=[log_batch_loss])
    print(results, losses)
    loss_dataset = tf.data.Dataset.from_tensor_slices(tf.stack(losses))

    return pred_dataset, loss_dataset
#

Console output (yes I know the model sucks, that's why I am looking at where it went wrong):

<BatchDataset shapes: ((1, 512, 512, 3), (1,), (1,)), types: (tf.float16, tf.int32, tf.float16)>
Obtaining loss values...
3965/3965 [==============================] - 578s 146ms/step - loss: 4.4005 - top_1_accuracy: 0.0504 - top_3_accuracy: 0.0918 - top_5_accuracy: 0.2683
[4.40053129196167, 0.05044136196374893, 0.09180327504873276, 0.2683480381965637] []
#

As you can see the print statement in the callback isn't being called

#

I feel dumb for still not seeing the mistake after staring at the code for several minutes

#

ok I still have no idea lol

#

oh

#

I'm such an idiot

#

on_batch_end

A backwards compatibility alias for on_train_batch_end.

#

from the docs

#

so LambdaCallback is useless when not training

#

Nowhere in the docs for LambdaCallback was this mentioned

#

It's only mentioned in the base class Callback
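
One possible way around it, sketched under the assumption of TensorFlow 2.x Keras: subclass `tf.keras.callbacks.Callback` and override `on_test_batch_end`, which is the hook that `model.evaluate` calls. The class name, the tiny model and the random data are placeholders. Also, according to the Keras docs the `logs` passed to this hook hold metric results aggregated up to that batch, so the values may be running averages rather than true per-batch losses:

```python
import numpy as np
import tensorflow as tf


class BatchLossLogger(tf.keras.callbacks.Callback):
    """Collects the reported loss after every evaluation batch."""

    def __init__(self):
        super().__init__()
        self.losses = []

    def on_test_batch_end(self, batch, logs=None):
        logs = logs or {}
        self.losses.append(float(logs["loss"]))


# Placeholder model and data, only here to make the sketch runnable.
model = tf.keras.Sequential(
    [tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)]
)
model.compile(optimizer="sgd", loss="mse")
x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")

logger = BatchLossLogger()
model.evaluate(x, y, batch_size=8, callbacks=[logger], verbose=0)
print(len(logger.losses))
```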

rose swift
#

hi

grave thunder
#

Quick pandas question. Say I have DataFrame

   col1 col2
A    a1   a2
B    b1   b2
How do I check row B, column 1 if it has value b1 and if it does, drop that whole row? I tried with df.drop(df.loc[df["col1"] == "b1"]) but it doesn't work

keen sinew
#

hi

#

can anyone help me out with this?

lapis sequoia
#

Quick pandas question. Say I have DataFrame
How do I check row B, column 1 if it has value b1 and if it does, drop that whole row? I tried with df.drop(df.loc[df["col1"] == "b1"]) but it doesn't work
@grave thunder If you know its rows B then you can directly drop it. using df.drop(index = 'B').
Or
index_to_drop = df[df['col1'] == "b1"].index
df.drop(index = index_to_drop )

#

also you have to make inplace = True if you want to reflect the changes.

#

can anyone help me out with this?
@keen sinew Well there is no code. But it means you are missing some imports or there is a version clash which is not directly obvious. There can be other reasons too.

rain stone
#

May I get the algo info here?

somber bane
#

hello, I am still new to algorithm.
I am planning to build a recommendation system, maybe just a basic one
Can any one give me some helpful recommendation on how should I start and what method should I use, things like that.
I plan to use the feedback and ratings for others users as the data for the recommendation

velvet thorn
#

Quick pandas question. Say I have DataFrame
How do I check row B, column 1 if it has value b1 and if it does, drop that whole row? I tried with df.drop(df.loc[df["col1"] == "b1"]) but it doesn't work
@grave thunder df = df[df['col1'] != 'b1']
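
A runnable sketch of both approaches on the two-row frame from the question. The original attempt fails because `df.drop` expects index labels, while `df.loc[mask]` returns a whole DataFrame:

```python
import pandas as pd

# The frame from the question.
df = pd.DataFrame(
    {"col1": ["a1", "b1"], "col2": ["a2", "b2"]}, index=["A", "B"]
)

# Keep only the rows where col1 is not "b1" (boolean-mask filtering).
kept = df[df["col1"] != "b1"]

# Equivalent with drop(): pass the index labels of the matching rows.
dropped = df.drop(index=df[df["col1"] == "b1"].index)
print(kept)
```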

grave thunder
#

@velvet thorn You save me once again

velvet thorn
#

np

regal belfry
#

whats a good deep learning home workstation?

lapis sequoia
#

Anyone here use Spyder IDE? How good is it in proceding eye-pleasing visual results?

#

producing*

limpid oak
#

if you have a python background and want to practice google earth engine, which one is more comfortable, using the python lib in conda or js on the gee platform?

#

please suggest

coral trellis
#

Hi guys I wonder a thing. Which libraries are most use for NLP? PyTorch or TF-Keras?

hollow sentinel
#

@lapis sequoia Idk but I find the SpyderIDE kinda ugly. If you're doing data science I would recommend Jupyter Notebook

lapis sequoia
#

Anyone here use Spyder IDE? How good is it in proceding eye-pleasing visual results?
@lapis sequoia Personally I'm a big fan of the RStudio IDE which used to be solely for R but has recently gained support for Python in the preview version 1.4. I'm not sure it's as feature complete as for R but it's getting there.

#

I have tried using Spyder as well but I just couldn't get used to it. The problem I have with Jupyter notebooks is that I can't readily see what variables I defined and what they look like and there is no real data browser.

real geode
#

just use Jupyter on Visual Studio Code

#

it shows you the variables and has a data explorer

bitter harbor
#

RStudio is great (minus the fact that it's r) but the desktop version that comes with anaconda feels like it's lacking for some reason

#

I had been using spyder for a while and it was pretty good

hollow sentinel
#

brand_of_car = car_data.groupby('brand')['model'].count().reset_index().sort_values('model',ascending = False).head(10)
brand_of_car = brand_of_car.rename(columns = {'model':'count'})
fig = px.bar(brand_of_car, x='brand', y='count', color='count')
fig.show()
#

guys what is groupby

grave frost
#

@hollow sentinel Just google it bro

hollow sentinel
#

i did

#

it's grouping data

#

but like what does that mean

#

how does grouping the data help

grave frost
#

@coral trellis It depends on what you want to do. I recommend TF if you want to implement some DL paper published by Google and have a SOTA model for your task. Else just find a tutorial that covers all the theory in ML and learn that first before diving into text generation, etc.

#

@regal belfry Depends on what kind of tasks you want to do 🙂 for most people, a GTX 1050ti would work well enough (since you would be using colab for heavy tasks anyway)

#

@somber bane For what data do you want the recommendation system? What is your tentative metric for that data type?

hollow sentinel
#

what's the difference between using df["column"] v df.column

somber bane
#

@grave frost I was ask user to give a 1-10 scale of rating on shows. And then base on the average rating system along with the type of genre, maybe also on user's age, recommend them shows

analog hatch
#

@hollow sentinel groupby is to group similar attributes in a column that is why its usefull
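
A small sketch of what the `groupby` line in the snippet above is doing, on invented rows (only the `brand` and `model` column names come from that snippet). The rows are split into one group per brand, each group is reduced to a single number, and the results are put back together:

```python
import pandas as pd

# Invented rows; only the column names mirror the snippet above.
car_data = pd.DataFrame({
    "brand": ["ford", "ford", "bmw", "ford", "bmw", "kia"],
    "model": ["f-150", "focus", "x5", "fusion", "x3", "rio"],
})

# Split rows by brand, count the rows in each group, and sort by that count.
counts = (
    car_data.groupby("brand")["model"]
    .count()
    .reset_index()
    .rename(columns={"model": "count"})
    .sort_values("count", ascending=False)
)
print(counts)
```

That is why it helps: instead of one number for the whole table, you get one number per brand.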

#

and df["columns"] depends if u have a column name column

grave frost
#

@somber bane Hmm.. seems workable. How much accuracy should it have? Like is it for personal use or you want to use it in a real world scenario?

analog hatch
#

df.columns shows your columns in the dataframe
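
On the `df["column"]` versus `df.column` question itself, a short sketch with made-up columns. For simple names both return the same Series, but attribute access breaks down when the name has a space or clashes with an existing DataFrame attribute, so brackets are the safer habit:

```python
import pandas as pd

# Made-up frame with one awkward name and one that clashes with a method.
df = pd.DataFrame(
    {"price": [100, 200], "model year": [2019, 2020], "count": [1, 2]}
)

# For a simple name both forms return the same Series.
same = df["price"].equals(df.price)

# Names with spaces only work with brackets.
years = df["model year"]

# "count" collides with DataFrame.count, so the attribute is the method.
attr_is_method = callable(df.count)
col = df["count"]
print(same, attr_is_method)
```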

somber bane
#

I am building this for my freshman computer science project

#

but I plan to publish it for public use

#

so maybe as accurate as possible, but does not require

grave frost
#

If the project is all about a recommendation system, I recommend you use some industry-level algo instead of implementing your own. However, if your teacher expects a custom system, then that's a different story..

heady hatch
#

You can write a basic one, given that you know how to use numpy and linear algebra.

somber bane
#

@grave frost so do you have any recommend industry level ones

#

I just learned to use numpy and pandas, so where should I start. I mean I need some help in setting up a basic picture and working frame behind the algo

grave frost
#

If you are sure that the project's goal is not to make your own custom algo, then industry ones are obv good enough. How does your data look like?

somber bane
#

for right now I have not start to collect data from user's yet

grave frost
#

If it was me, I would be developing some method based on simple ML techniques

somber bane
#

what does ML stand for?

grave frost
#

But it may require a good amount of coding and theory

#

ML- Machine Learning

somber bane
#

oh,

grave frost
#

But don't worry, it would just be a bit of maths

#

not actual ML

somber bane
#

I recently look up at this website

#

I think I can understand most of it, but I do believe my teacher hopes me to build one by myself, not just do some copy and paste

heady hatch
#

To clarify when you say your teacher hopes you build one yourself, do they mean implement a library or algo vs using some library?

somber bane
#

algo

heady hatch
#

Like is it okay to

import recommendation_engine
recommendation_engine.fit()
recommendation_engine.predict()
violet veldt
#

guys, i want to split x axis labels into years, and labelless ticks between them, how can i do that?

somber bane
#

I think he will be okay with it, since I am only a freshman

heady hatch
#

Okay well, @grave frost might have more libraries. But couple that comes to my mind is Surprise and ALS from spark.

somber bane
#

so do I go ahead and study on how should I use those library, and implement them?

heady hatch
#

But recommendation problems is a bit tricky to begin with since it requires you to have an understanding of what the algorithm is doing.

But like previously stated, it only requires a bit of understanding.

#

Implementing them from scratch might be a bit rough.

#

But I think it would be useful to look at source code to see how they're implemented.

#

If you want something easy to digest and start with, you can grab your data and then recommend items most popular by ratings.

somber bane
#

so can you recommend one library that is friendly to beginner

heady hatch
#

I think Surprise is pretty friendly.

#

I think it's the whole problem that requires a bit of understanding.

#

Once you understand the problem, the library is just tools to help solve it.

somber bane
#

I will workhard on the understanding part, could you help me find some sources that you think is helpful for me to begin learning with Surprise. Thanks

heady hatch
#

I think the link you used talked about it.

#

Here's one from google.

somber bane
#

Thank you very much! @heady hatch

heady hatch
#

But if you built a basic one like just recommending stuff based on popularity, you won't really need much of anything other than maybe data science.

ie
-> group by some category
-> grab top 10 items in those categories
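
Those two steps could be sketched like this with pandas. The table layout, column names and the `top_by_genre` helper are all assumptions for illustration, not an existing API:

```python
import pandas as pd

# Hypothetical ratings table; column names are assumptions for this sketch.
ratings = pd.DataFrame({
    "show":   ["A", "A", "B", "B", "C", "D", "D"],
    "genre":  ["drama", "drama", "drama", "drama", "comedy", "comedy", "comedy"],
    "rating": [9, 7, 6, 6, 8, 10, 9],
})


def top_by_genre(ratings, n=10):
    """Return the n highest-rated shows per genre, ranked by mean rating."""
    mean_ratings = (
        ratings.groupby(["genre", "show"], as_index=False)["rating"].mean()
    )
    ranked = mean_ratings.sort_values(
        ["genre", "rating"], ascending=[True, False]
    )
    # head(n) on the grouped frame keeps the first n rows of every genre.
    return ranked.groupby("genre").head(n)


top = top_by_genre(ratings, n=1)
print(top)
```

A plain mean favours shows with very few ratings, so a real version would probably also want a minimum number of votes.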

somber bane
#

no, I ask from my professor for something that is challenging. Because I experienced on how to program before. So his purpose is not to keep me boring in the class

heady hatch
#

Ahh okay then, check out those resources and have fun!

somber bane
#

😀 I might come back with more questions @heady hatch

#

Thanks a lot

hollow sentinel
#

to think I wanted to make a linear regression out of this

heady hatch
#

Looks like there could be some kind of relationship there! Maybe under log transformation?

#

Or maybe no relationships at all.

hollow sentinel
#

dk what that is

heady hatch
#

Me neither.

hollow sentinel
#

but def not linear regression

#

I'm gonna do it anyways for the learning experience

heady hatch
#

Do it!

#

btw daspecito, I noticed you're unfamiliar with Python. It's great that you're eager to do data science but I think it's important to be familiar with Python first.

hollow sentinel
#

I took my college CS course when I was a cs major

heady hatch
#

Once you get familiar with Python more, do a bit of data science, check out other's notebooks, and then back and forth.

#

Okay, that's good.

hollow sentinel
#

I'm just rusty lol

heady hatch
#

But it doesn't seem to help your unfamiliarity with Python.

hollow sentinel
#

where am I unfamiliar

#

a lot of places

#

hahhahha

heady hatch
#

hahaha

#

I'm not saying you're terrible at programming.

hollow sentinel
#

I don't know OOP

heady hatch
#

Just should be more familiar with Python syntax.

hollow sentinel
#

that is something I should learn

heady hatch
#

Because when you jump into data science, you don't want to be dealing with both Python and Data Science. Since both topics are quite wide.

#

You just want to focus on getting the information you want rather than dealing with syntax troubles.

hollow sentinel
#

oh like that one time I got confused and kept writing state

heady hatch
#

Ye.

hollow sentinel
#

yeah that was dumb

#

Well I was newer to pandas then

#

I make a lot of mistakes starting out

#

I'll get better

heady hatch
#

💪 I know you will.

hollow sentinel
#

so do you most commonly use a jointplot to see if there's a relationship?

#

between two variables?

#

jointplot in seaborn I mean

heady hatch
#

Yea, jointplot works. I think I also use pairplot.

hollow sentinel
#

yeah my pairplot isn't very promising either

#

not very surprised

#
df.drop(['Unnamed: 0','vin'],axis=1,inplace=True)
#

what does axis = 1 control

heady hatch
#

Second axis.

hollow sentinel
#

but what does second axis mean

heady hatch
#

if you have row and col, it's col.

hollow sentinel
#

oh

heady hatch
#

It's referring to dim.

hollow sentinel
#

hahhhahahha linear alg also don't know that

heady hatch
#

Because matrix can have more than 2 dim.
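
A small sketch of what the axis numbers mean in pandas, on a made-up frame. Axis 0 is the rows and axis 1 is the columns, both for `drop` and for reductions like `sum`:

```python
import pandas as pd

# Made-up frame; "vin" echoes the column dropped in the snippet above.
df = pd.DataFrame({"a": [1, 2], "b": [10, 20], "vin": ["x", "y"]})

# axis=1 refers to columns, so this drops the column labelled "vin".
no_vin = df.drop(["vin"], axis=1)

# axis=0 refers to rows, so this drops the row labelled 0.
no_first_row = df.drop([0], axis=0)

# For reductions the axis is the one that gets collapsed.
col_sums = no_vin.sum(axis=0)  # one value per column
row_sums = no_vin.sum(axis=1)  # one value per row
print(col_sums.tolist(), row_sums.tolist())
```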

hollow sentinel
#

I may need to take a linear algebra course

heady hatch
#

It would help.

#

Or if you're going to be working mainly with 2 dimensional data, you could look into data science courses on MOOC.

bitter harbor
#

3b1b has a pretty good series on la

#

that's where I learnt it from

#
  • the basics of how nn's work
#
  • calc if you need it
#

really if I need anything math related I go to him first

#

super nice guy too

hollow sentinel
#

very cool @bitter harbor I'll look at his stuff

#

thanks

bitter harbor
#

i'd suggest taking notes, la gets pretty heavy pretty quick

#

but the comments are full of people complaining that he taught the subject better in a video than their profs in a semester

heady hatch
#

I wonder why they don't use comment to talk about LA stuff, I think it would be a better use of their time.

bitter harbor
#

it's yt?

heady hatch
#

hahaha

hollow sentinel
#
carsData.drop([labels = "vin","lot"],axis=1)
#

I'm trying to drop columns "vin" and "lot"

#

and it says I have a syntax error unsurprisingly
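
The syntax error comes from putting `labels =` inside the square brackets; keyword arguments belong outside the list. A sketch of two working forms on a stand-in frame (the `price` column and all values are invented):

```python
import pandas as pd

# Stand-in frame; only the "vin" and "lot" column names come from the question.
carsData = pd.DataFrame({"vin": ["v1"], "lot": [7], "price": [6300]})

# Pass the labels as one list; keyword arguments go outside the brackets.
trimmed = carsData.drop(labels=["vin", "lot"], axis=1)

# The columns= keyword says the same thing without needing axis.
trimmed_alt = carsData.drop(columns=["vin", "lot"])
print(trimmed.columns.tolist())
```

Neither call changes `carsData` itself; the result has to be assigned to a name, or `inplace=True` passed.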

analog hatch
#

if u are making linear regression on data u need to find a closer correlation between certain columns

#

try using corr()

#

is a great way to find what is the best one
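
For example, a sketch with made-up numbers (the column names are only illustrative). `DataFrame.corr()` gives the pairwise Pearson correlation between numeric columns, and picking one column from the result shows how strongly everything else moves with it:

```python
import pandas as pd

# Made-up numbers; the column names are only illustrative.
cars = pd.DataFrame({
    "price":   [30000, 25000, 18000, 12000, 8000],
    "mileage": [10000, 30000, 60000, 90000, 140000],
    "year":    [2020, 2019, 2017, 2015, 2012],
})

# Pairwise Pearson correlation between the numeric columns.
corr = cars.corr()

# Correlation of every column with the one we want to predict.
print(corr["price"].sort_values(ascending=False))
```

Values near 1 or -1 suggest a strong linear relationship, values near 0 suggest linear regression on that pair won't find much.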

hollow sentinel
#

thank you @analog hatch

analog hatch
#

btw each factor of a data is important if u puting it in as the X_train data

#

dont exclude anything other then columns with strings

hollow sentinel
#

so don't take out the lot or the vin number

#

some kaggle notebooks did that so I was wondering if I should do it too

analog hatch
#

I mean if u are doing a tutorial might as well but data is important even if their is barely correlation ofcourse correlation would always be the main factor for the outcome

hollow sentinel
#

makes sense

#

yeah I'm just looking how they decide to visualize the data to make it look good

analog hatch
#

I mean u are doing it good with seaborn and matplotlib

hollow sentinel
#

yeah

analog hatch
#

u can try plotly, it's more 3 dimensional

#

if u want to use it

hollow sentinel
#

pairplot, distplot, lmplot does the job pretty well

analog hatch
#

yeah pairplot works similarly to .corr()

hollow sentinel
#

i think i've seen .corr() before

analog hatch
#

I am not an expert at this but i do like ML and DL

#

XD I enjoy that shit

hollow sentinel
#

hahaahhahahahh I am nowhere near an expert

#

I just started DS & ML like 2 weeks ago

analog hatch
#

yeah it must be hard

#

dammm thats good thou

hollow sentinel
#

yeah I've been doing a udemy course

analog hatch
#

yoooo me too

#

udemy is the best

#

I been doing on and off for a year

hollow sentinel
#

python for data science and machine learning bootcamp?

analog hatch
#

yess great course

#

deep learning courses also

hollow sentinel
#

love it jose Portilla is amazing

analog hatch
#

yeahh he is really good at explaining the basic of it

hollow sentinel
#

yeah the way he gives you answers with documented code is great

#

so you can follow along

#

I always have his stuff open when I do my own work bc it's a good guide

analog hatch
#

True ones u finish doing data analysis the machine learning parts gets better

#

the deep learning is my favor one

#

is like 5 hours of course

#

but its great

hollow sentinel
#

I will try it

analog hatch
#

yeahh dude do it its worth it

#

you can do so much with ML

hollow sentinel
#

yeah my friend keeps telling me to switch to hacking and i'm like

analog hatch
#

True XD

#

ML is more modern

hollow sentinel
#

only after I completely master machine learning and make bank

analog hatch
#

you can be more creative

hollow sentinel
#

I just find it cool

analog hatch
#

yeah me too

hollow sentinel
#

I want to create algorithms that help predict cancer

#

that's what I find cool

analog hatch
#

actually with the basic u can make one easily with enough data

hollow sentinel
#

yeah but like an insanely accurate one

#

you don't even need to create an algorithm for that tbh

#

it's a classification problem

analog hatch
#

that takes time ofcourse cleaning and difying to make it almost perfect in the course you would learn how u are supposed to control your data so it does not overfit or underfit your results

hollow sentinel
#

yep

analog hatch
#

oo yeah thats logistic regression

hollow sentinel
#

haven't learned that yet

#

figured I'd do some linear regression on my own and then hop back into the course

analog hatch
#

cool cool i mean your in the right path

#

take your time and enjoy what u doing

hollow sentinel
#

definitely

#

I improve every day

analog hatch
#

cool cool you got any question I would be gladly to help

#

u can just dm it to me

hollow sentinel
#

great man I really appreciate it

#

all of you guys have been very supportive

heady tide
#

it's so cool that this method actually knows when a word is overused and lowers it's weight accordingly

hollow sentinel
#
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)


#

ValueError: With n_samples=1, test_size=0.3 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.

#

uhhhhhh

#
X = [["Price", 'lot', 'year' ]]
#What you're using to predict the mileage
y = ["Mileage"]
#Trying to predict the mileage based on the price
#

that's what X and y are equal to

#

I got no clue

#

omg guys I think I figured out my first machine learning error

errant parcel
#

pyqtgraph is datascience right!

#

what the fresh hell is going on here

#

i assumed it was some 'first argument is self' business

#

but

#

wot

hollow sentinel
#

never heard of pyqtgraph @errant parcel but I'm new to DS/ML

errant parcel
#

noice

#

what was your error in the end

#

and it turns out the issue with that is just that they accidentally made something optional that shouldn't have been

hollow sentinel
#

I didn't have the dataframe I was getting the column out of

errant parcel
#

so it's possible to pass no arguments and it needs arguments

#

oh

hollow sentinel
#

so I was just putting in a list

#

lmao

#

that's the good type of error
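
For anyone hitting the same `n_samples=1` error, a sketch of the fix on invented rows (the column names follow the snippet above, the values are made up). A bare list of column names is just one "sample" of strings; indexing the DataFrame with those names gives the actual data:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Invented rows; the column names follow the snippet above.
df = pd.DataFrame({
    "Price":   [6300, 2899, 5350, 25000, 27700, 5700, 7300, 13350, 14600, 5250],
    "lot":     [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "year":    [2008, 2011, 2018, 2014, 2018, 2018, 2010, 2017, 2018, 2017],
    "Mileage": [274117, 190552, 39590, 64146, 6654, 45561, 149050, 23525, 9371, 63418],
})

# Index the DataFrame with the names instead of passing the names themselves.
X = df[["Price", "lot", "year"]]
y = df["Mileage"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)
print(X_train.shape, X_test.shape)
```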

quick epoch
#

Hi guys, does anyone know Matplotlib and is willing to help me out XD?

austere swift
#

just ask your question

hollow sentinel
#

^

quick epoch
#

Sorry, so, I want to produce a histogram that show how the different traffic levels impact the light changes. I got all the data. I kinda know what to do but I am struggling with making a multicolour curve and histogram

velvet thorn
#

Sorry, so, I want to produce a histogram that show how the different traffic levels impact the light changes. I got all the data. I kinda know what to do but I am struggling with making a multicolour curve and histogram
@quick epoch what do you mean multicolour?

#

got an example?

#

so it's possible to pass no arguments and it needs arguments
@errant parcel my guess is that it wraps a C library so it can't do argument checking on the Python side...?

#

but honestly the message looks p self-explanatory

errant parcel
#

well I'm not passing None

#

i'm passing nothing

#

my guess is that all arguments are optional, but when it calls addHandle it fails to actually provide valid default values

velvet thorn
#

my guess is that all arguments are optional, but when it calls addHandle it fails to actually provide valid default values
@errant parcel don't have enough experience to say, but one way of implementing overloads is to have None as default arguments

#

🤷‍♂️

errant parcel
#

yep but i think the combinations that it allows are wrong

velvet thorn
#

yeah, that's possible

#

just throwing in my utterly uninformed two cents

#

it's a classification problem
@hollow sentinel it being a classification problem doesn't mean a new algorithm/architecture wouldn't be appropriate/necessary

hollow sentinel
#

true @velvet thorn

quick epoch
#

Something like this

#

But I just want a single line

velvet thorn
#

But I just want a single line
@quick epoch that is a single line

#

or do you mean a single plot?

#

anyway, I believe that's from an official MPL example, right?

#

so you should just be able to follow it

hollow sentinel
#

so there's no graphs you create after a logistic regression

#

after you create the model you just do the classification report and that's how you can judge how the model did

#

right?

hollow sentinel
#

also has anyone's tab shift tab to see jupyter doc stop working?

#

mine doesn't work at times and I don't get why

velvet thorn
#

after you create the model you just do the classification report and that's how you can judge how the model did
@hollow sentinel that's a start

#

but there are many other things you can do

#

look into lift

#

calibration

#

ROC-AUC score

#

PR curve
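
The metrics listed above can all be computed from predicted probabilities with scikit-learn. The following is only a sketch: the synthetic data from `make_classification` stands in for the real dataset, and the top-decile lift calculation is one common definition of lift, not the only one.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    average_precision_score,
    classification_report,
    precision_recall_curve,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data (assumption: replaces the real dataset).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    n_clusters_per_class=1, weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1

# The starting point mentioned in the chat.
print(classification_report(y_test, model.predict(X_test)))

# Threshold-independent ranking quality.
roc_auc = roc_auc_score(y_test, proba)

# Precision-recall curve and its summary (useful for imbalanced classes).
precision, recall, thresholds = precision_recall_curve(y_test, proba)
avg_precision = average_precision_score(y_test, proba)

# Calibration: do predicted probabilities match observed frequencies?
frac_positive, mean_predicted = calibration_curve(y_test, proba, n_bins=5)

# Lift in the top 10% of scores: positive rate there vs. overall positive rate.
top = np.argsort(proba)[::-1][: len(proba) // 10]
lift_top_decile = y_test[top].mean() / y_test.mean()

print(f"ROC-AUC={roc_auc:.3f}  AP={avg_precision:.3f}  lift@10%={lift_top_decile:.2f}")
```

Plotting `recall` against `precision`, or `mean_predicted` against `frac_positive`, gives the PR curve and the calibration plot respectively.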

hollow sentinel
#

oh my udemy course didn't mention those haha probably bc it's introductory

#

idk why my tab shift isn't working

#

when you have X why do we use a list inside of a list

velvet thorn
#

when you have X why do we use a list inside of a list
@hollow sentinel because X must be 2D

hollow sentinel
#

bc train_test_split requires it?

velvet thorn
#

no

#

because otherwise

#

how would you tell the difference between N samples with 1 feature and 1 sample with N features?
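
A small NumPy sketch of that ambiguity (the values are made up): the same four numbers can be read either way, and only the 2D shape says which is meant.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])  # 1D: 4 samples or 4 features?

n_samples_one_feature = values.reshape(-1, 1)  # 4 rows, 1 column
one_sample_n_features = values.reshape(1, -1)  # 1 row, 4 columns

print(values.shape)                 # (4,)
print(n_samples_one_feature.shape)  # (4, 1)
print(one_sample_n_features.shape)  # (1, 4)
```

In pandas the double brackets play the same role: `df[["col"]]` returns a 2D DataFrame with one column, while `df["col"]` returns a 1D Series.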

hollow sentinel
#

uhhhhhh

#

feature?

velvet thorn
#

yes

hollow sentinel
#

oh ok i see

#

yeah there'd be no other way

#

thanks @velvet thorn that was a question that was bothering me haha

velvet thorn
#

yw

quick epoch
#

Yeah I tried but I did if statements to change the colours whenever a certain value appears. I will show you what I mean in a sec

lapis sequoia
paper niche
#

you have 5 bars but are trying to set 7 tick labels? @lapis sequoia

lapis sequoia
#

you have 5 bars but are trying to set 7 tick labels? @lapis sequoia
@paper niche How do I change this?

paper niche
#

your names list has 7 items in it

#

reduce it to 5 items. I see you have Logistic Regression and Decision Tree repeated twice. I suppose that isn't intentional?

lapis sequoia
#

Oops

#

your names list has 7 items in it
@paper niche I've been staring myself blind on this, thanks!

paper niche
#

yep, no sweat

lapis sequoia
#

yep, no sweat
@paper niche Is there a way to get the percentage shown in each bar?
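
One possible way to do this in Matplotlib, as a sketch: draw the bars, then write a text label at the centre of each one. The accuracy values and three of the model names below are made up for illustration; only "Logistic Regression" and "Decision Tree" come from the conversation.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Five bars need exactly five tick labels.
names = ["Logistic Regression", "Decision Tree", "Random Forest", "KNN", "SVM"]
accuracies = [0.84, 0.79, 0.88, 0.81, 0.86]  # hypothetical scores

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.bar(names, accuracies)

# Put the percentage in the middle of each bar.
for bar, acc in zip(bars, accuracies):
    ax.text(
        bar.get_x() + bar.get_width() / 2,  # horizontal centre of the bar
        bar.get_height() / 2,               # halfway up the bar
        f"{acc:.0%}",
        ha="center", va="center", color="white",
    )

ax.set_ylabel("Accuracy")
```

Recent Matplotlib versions (3.4 and later) also offer `ax.bar_label(bars, ...)`, which handles the placement automatically.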

paper niche
hazy field
#

hey,
i tried vgg2 face for face verification. and i was wondering, can we detect that the face is, in fact, a real human face, not some cut-out face print on cardboard ?
is there any research on this or any library that i can use?

#

hmm, i think i got the keyword, liveness detection!

sour cradle
#

I'm trying to convert my data into a format for ml. It gives an example where it uses audio from util, but I can't find anything about that online. Is there a drop-in replacement for that library?

#

never mind, it was a custom library

hollow sentinel
#

and I did this

#
total_cells = np.prod(nfl_data.shape)
total_missing = missing_values_count.sum()

# percent of data that is missing
(total_missing/total_cells) * 100
#

on my dataset and I got that my dataset is missing 95% of its data

#

did I do something wrong? is it possible to have a dataset that's missing 95% of its data?
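
A very high overall percentage is possible when a few columns are almost entirely empty, so it can help to look at the missing share per column before concluding the calculation is wrong. A sketch with a tiny made-up DataFrame (the column names are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data: one complete column, two mostly empty ones.
df = pd.DataFrame({
    "game_id": [1, 2, 3, 4],
    "penalty_type": [np.nan, np.nan, np.nan, "Holding"],
    "challenge_result": [np.nan, np.nan, np.nan, np.nan],
})

# Percentage of missing values in each column.
missing_per_column = df.isnull().mean() * 100

# Percentage of missing cells in the whole table.
total_missing_pct = df.isnull().to_numpy().mean() * 100

print(missing_per_column.sort_values(ascending=False))
print(f"overall: {total_missing_pct:.1f}% missing")
```

If the per-column numbers look wrong too, it is worth checking how `missing_values_count` was built, for example whether it came from the same DataFrame.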

lapis sequoia
#

I'm wondering why parameter tuning with gridsearchCV is giving me worse accuracy than the default?
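
Without seeing the code this is only a guess, but two common causes are a grid that does not contain the default values, and comparing scores computed on different splits. A sketch of a like-for-like comparison, assuming a decision tree on the iris data (neither is from the question):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# One seeded splitter, reused everywhere, so every score uses the same folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Cross-validated accuracy of the untuned model.
default_score = cross_val_score(
    DecisionTreeClassifier(random_state=0), X, y, cv=cv
).mean()

# The grid includes the defaults (max_depth=None, min_samples_leaf=1),
# so the best candidate cannot score below the default on these folds.
param_grid = {"max_depth": [None, 2, 3, 5], "min_samples_leaf": [1, 2, 5]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=cv)
search.fit(X, y)

print(f"default CV accuracy: {default_score:.3f}")
print(f"best CV accuracy:    {search.best_score_:.3f} with {search.best_params_}")
```

A tuned model can still do slightly worse than the default on one particular held-out test set, since that single score is noisy.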

livid tundra
#

I hope it's not a dumb question; where can one find interesting csv files to practice visualization/analysis as a beginner?

hollow sentinel
#

@livid tundra not at all. kaggle is great for that. There are notebooks that will teach you data cleaning, data visualization, and machine learning.

livid tundra
#

Thank you very much!

hollow sentinel
#

@livid tundra no problem

shell berry
#

How do you fit multiple features into a model? Would it be like [[f1, f2], [f1,f2]] etc. where [f1, f2] is one training example with two features?

tidal bough
#

each example would be a vector, yes

#

usually the input is a matrix of shape (n_samples,n_features)

shell berry
#

Thank you

tidal bough
#

so each row vector is a single datapoint

shell berry
#

What if feature 1 is a bag of words and feature 2 is a bag of POS tags? [[1,0,1,0,1],[0,0,0,0,0,1]] or something. Is this a good feature vector? It seems intuitively hard to "graph" these as one point

tidal bough
#

why not? That's a lot of binary features.
More generally, a feature is anything your model accepts 😛
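
As a sketch of what that looks like in practice, the two bags can simply be concatenated column-wise so that each sentence becomes one row of binary features. The vectors below are made up.

```python
import numpy as np

# Two sentences, a 5-word vocabulary and 6 possible POS tags (hypothetical).
bag_of_words = np.array([[1, 0, 1, 0, 1],
                         [0, 1, 1, 0, 0]])
bag_of_pos = np.array([[0, 0, 0, 0, 0, 1],
                       [1, 0, 0, 1, 0, 0]])

# One row per sentence, 5 + 6 = 11 features per row.
X = np.hstack([bag_of_words, bag_of_pos])
print(X.shape)  # (2, 11)
```

The model never needs to "graph" the point; it just sees an 11-dimensional vector per sentence.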

shell berry
#

Cool thanks lol

#

What do you think would be better, a bag of bigrams of (word, pos_tag) or bag of words followed by a bag of POS tags?

#

I'm sure it depends on the scenario but any thoughts?

tidal bough
#

the former seems to make more sense, though depending on the model it might not matter

#

like, if the model knows there's a correspondence between the two bags, it's the same as if they were already in bigrams

shell berry
#

Thanks @tidal bough, I'll try both just to experiment 🙂 I now have a list of lists of tuples, where each inner list is a sentence and each tuple is (word, tag). However, I can't use CountVectorizer or TF-IDF now. Is there another way to make it an input vector, or should I convert the tuples to strings?
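
One option, sketched here with made-up sentences, is to pass `CountVectorizer` a callable `analyzer` that turns each sentence into `word_TAG` strings. The helper name `word_tag_tokens` is hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Each document is already tokenized: a list of (word, tag) tuples.
sentences = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]

def word_tag_tokens(sentence):
    """Turn a list of (word, tag) tuples into 'word_TAG' string tokens."""
    return [f"{word}_{tag}" for word, tag in sentence]

# With a callable analyzer, CountVectorizer skips its own tokenization
# and counts whatever tokens the callable returns.
vectorizer = CountVectorizer(analyzer=word_tag_tokens)
X = vectorizer.fit_transform(sentences)

print(sorted(vectorizer.vocabulary_))
print(X.toarray())
```

`TfidfVectorizer` accepts the same `analyzer` argument, so the TF-IDF variant should work the same way.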

bold olive
#

Any reason why mean accuracy from cross_val_score and manually average will be different?

heady hatch
#

What do you mean by manually average?

#

@hollow sentinel what do you mean by no accuracy?

bold olive
#

@heady hatch outputting separate accuracies for each fold and then taking the mean of it.

hollow sentinel
#

like accuracy is blank in the pic I sent

heady hatch
#

@hollow sentinel if you take a look at accuracy in the third column, it shows 0.84.

#

@bold olive So from my understanding are you taking the accuracy of predicted train and predicted val and taking the mean?

bold olive
#

Yes, exactly. Accuracy from each fold and then averaged in the end.

hollow sentinel
#

can someone explain what StandardScaler is and why you need to do it on your dataset before you run k nearest neighbors

#

also how is k nearest neighbors regression different from linear regression

heady hatch
#

@bold olive I believe cross_val_score uses K-fold cross-validation.

#

It trains the model on the training folds and then scores it on the held-out validation fold.

#

@hollow sentinel You want to scale the features before running KNN because KNN takes distance into account. Features on different scales will throw these distance calculations off.

For KNN vs linear regression, I recommend reading up on the two algorithms.

In short, KNN predicts from the k nearest neighbors, while linear regression fits a linear model, i.e. y = mx + b.
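
A minimal sketch of the scaling step, with made-up data where one feature (income) is numerically far larger than the other (age). Putting the scaler in a pipeline means it is fitted on the training data only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical features: age in years, income in dollars.
X = np.array([[25, 40_000],
              [30, 42_000],
              [45, 90_000],
              [50, 95_000]], dtype=float)
y = np.array([0, 0, 1, 1])

# StandardScaler subtracts each column's mean and divides by its
# standard deviation, so every feature ends up with mean 0 and std 1.
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))

# Scale, then classify by nearest neighbor.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=1))
model.fit(X, y)
pred = model.predict([[28, 41_000]])
print(pred)
```

Without scaling, the Euclidean distance would be dominated almost entirely by the income column.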

hollow sentinel
#

i think it's time to break out the Intro to Statistical Learning

heady hatch
#

Good luck.

hollow sentinel
#

It's so boring to read

heady hatch
#

It might help to apply the concepts to real life.

bold olive
#

@heady hatch , we can change the validation technique in the function.

heady hatch
#

What do you mean by in the function?

bold olive
#

cross_val_score(estimator, X, y, cv=cv)

#

cv can be anything we declare.

heady hatch
#

Did you set anything there?

#

Because I think by default, it uses kfold.

bold olive
#

Yes, I'm using stratified shuffle split and calling it.

#

Even if I use KFold the problem is that the accuracies are different!

#

cross_val_score is reliable right?

heady hatch
#

it's just a function that does cross validation. hahaha

bold olive
#

Yeah ik I meant the way it calculates the accuracy metric

heady hatch
#

Yup.

bold olive
#

Something probably wrong in my manual approach then!

heady hatch
#

Hmm one question to ask is why are you looking at the accuracy of the training set?

bold olive
#

Uh, not the training, test

#

Or does crossval compute training accuracies?

heady hatch
#

It does not.

#

I was wondering because you said you took the average of training and validation.

bold olive
#

No! The average over all folds

heady hatch
#

So you trained the model on the training set, scored it on the validation set and then took the average of all the validation score?

bold olive
#

Yes

heady hatch
#

Ahh.

#

Hmm the other factor I would probably consider is maybe the splitting through each fold.

#

Might not be splitting the same way.

#

I think something you can try is write your own splitting function

seed it
do cross_val_score using your own splitting function
seed it again
split it the same way each fold.
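
A sketch of that idea using a seeded scikit-learn splitter instead of a hand-written one (the iris data and logistic regression are stand-ins, not the setup from the question). With the same splitter object and a fresh clone of the model per fold, the manual loop should reproduce `cross_val_score`.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# A fixed random_state makes every call to cv.split produce the same folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Library version.
auto_scores = cross_val_score(model, X, y, cv=cv)

# Manual version: train on the training folds, score on the held-out fold.
manual_scores = []
for train_idx, val_idx in cv.split(X, y):
    fold_model = clone(model).fit(X[train_idx], y[train_idx])
    manual_scores.append(fold_model.score(X[val_idx], y[val_idx]))
manual_scores = np.array(manual_scores)

print(auto_scores.mean(), manual_scores.mean())
```

If the two means still differ in a real project, likely suspects are an unseeded splitter, a model that is reused across folds without cloning, or a different scoring metric.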

shell berry
#

Is anyone here well versed in scikit-learn offering paid tutoring services?

hollow sentinel
#

@shell berry you can try finding a udemy course that does that. Are you a beginner to machine learning? If you are I would use Python for Data Science and Machine Learning Bootcamp by Jose Portilla

oblique socket
#

Is there a better way to implement cross_validation_split using pandas and numpy?

```
def cross_validation_split(df, num_folds):
    folds = []
    fold_length = df.index.size // num_folds
    shuffled = df.sample(frac=1)
    for i in range(num_folds):
        folds.append(shuffled.iloc[i*fold_length:(i+1)*fold_length])
    return folds
```
cerulean spindle
#

Have you tried the KFold class in sklearn? I'm pretty sure you can do that automatically, without your own function, by passing it as the cv parameter in cross_validate.

oblique socket
#

I saw that, I wanted to try it without sklearn first

cerulean spindle
#

Oh ok.

oblique socket
#

I'll try that

velvet thorn
#

you can just use len(df)

#

also, if you do it that way, your folds will all be the same size

oblique socket
#

oh yeah, I wasn't sure if I could do that

#

or if it made a difference

velvet thorn
#

which could omit rows if the number of rows you have is not perfectly divisible by the number of folds

#

other than that it looks more or less okay

#

space out your operators

oblique socket
#

oh yeah

#

thanks

velvet thorn
#

yw

oblique socket
#

also, if you do it that way, your folds will all be the same size
@velvet thorn What do you mean? Are they not the same size?

velvet thorn
#

which could omit rows if the number of rows you have is not perfectly divisible by the number of folds
@velvet thorn see this
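
To make the dropped-rows point concrete: with 10 rows and 3 folds, a fixed `fold_length` of 3 covers only 9 rows. One way around it, sketched below, is to shuffle the row positions and let `np.array_split` hand out the remainder. The `seed` parameter is an addition for reproducibility, not part of the original function.

```python
import numpy as np
import pandas as pd

def cross_validation_split(df, num_folds, seed=0):
    """Split df into num_folds shuffled folds without dropping any rows."""
    rng = np.random.default_rng(seed)
    shuffled_positions = rng.permutation(len(df))
    # array_split allows uneven pieces: the first folds get one extra row.
    return [df.iloc[idx] for idx in np.array_split(shuffled_positions, num_folds)]

df = pd.DataFrame({"x": range(10)})
folds = cross_validation_split(df, num_folds=3)
print([len(fold) for fold in folds])  # fold sizes 4, 3, 3
```

The folds are no longer all the same size, but every row ends up in exactly one fold.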

oblique socket
#

yeah

agile wing
#

just realized logistic regression default solver in scikit-learn uses l-bfgs solver instead of gradient descent
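
This can be checked directly on the estimator. The sketch assumes scikit-learn 0.22 or later, where the default solver became `lbfgs`. The `saga` solver shown as an alternative is a stochastic-gradient-style method.

```python
from sklearn.linear_model import LogisticRegression

# The default solver is stored as a plain attribute on the estimator.
default_solver = LogisticRegression().solver
print(default_solver)  # lbfgs on scikit-learn >= 0.22

# A different optimizer can be requested explicitly.
saga_model = LogisticRegression(solver="saga", max_iter=1000)
print(saga_model.solver)
```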

shell berry
#

@hollow sentinel Thanks for the advice. I'm actually looking for some guidance on a particular project and I have specific questions

lapis sequoia
#

Hey guys, I'm not seeing the branches, do you guys know the issue?

```
dtree = dtree.fit(X_train,y_train)
```

```
plot_tree(dtree,
 filled = True,
 rounded = True,
 class_names = ['released', 'deceased'],
 feature_names = X.columns)
```
glacial rune
#

any ideas what's causing this? This code ran fine a few weeks ago, but I tried it again today and it didn't work. Not sure if asos changed their backend

molten hamlet
#

why it does not work?

#

your code is 200, so its fine

#

@glacial rune

glacial rune
#

the response is different in python

molten hamlet
#

what do you mean?

glacial rune
#

the response body from Python is:

{"id":14014948,"name":"Nike Air Jordan 1 Mid trainers in colourblock","description":"<a href="/women/shoes/trainers/cat/?cid=6456"><strong>Trainers</strong></a> by   <a href="women/a-to-z-of-brands/jordan/cat/?cid=29517"><strong>Jordan</strong></a><ul>    <li><span style="background-color: initial;">Unboxing potential: considerable</span></li><li>Mid rise</li><li>Padded cuff for a supportive fit</li><li>Lace-up fastening&nbsp;</li><li>Nike Swoosh logo</li><li>Perforated toe cap for breathability</li><li>Helps keep them fresher for longer</li><li>Nike Air sole with Air units</li><li>Units contain pressurised air that compress on impact</li><li>For lightweight, durable cushioning</li><li>Rubber outsole</li></ul>","alternateNames":[{"locale":"en-GB","title":"Nike Air Jordan 1 Mid trainers in colourblock"},{"locale":"ru-RU","title":"Кроссовки средней высоты в стиле колор блок Nike Air Jordan 1"},{"locale":"sv-SE","title":"Nike – Air Jordan 1 – Blockfärgade träningsskor med halvhögt skaft"}],"localisedData":null,"gender":"Women","productCode":"1611119","pdpLayout":"Footwear",
#

I'm expecting it to look more like my first screenshot, with the productID and prices

lapis sequoia
#

data

summer holly
#

Hi everyone, I'm trying to build flask backend that takes tweets as input, preprocesses and makes predictions. I want to use a keras model saved as h5 format. Can anyone direct me to any helpful resources on this? Thank you

hollow sentinel
lapis sequoia
#

Somebody help me with this... why does the normal DQN perform so much better than a lot of the other ones

sleek rampart
#

I need help with Deep Neural Network, like OG pro

lapis sequoia
#

Is it because DQN was faster to train in a simple environment?

sleek rampart
#

what does DQN mean?@lapis sequoia does it mean double Q Network?

#

Deep Q Network I see, cool

hollow sentinel
#

idek what that is

#

machine learning noob here

sleek rampart
#

School Homework?^

hollow sentinel
#

@sleek rampart are you talking about what I posted

heady hatch
#

Hey @hollow sentinel , the reason for dummy variables is for dealing with categorical variables.

#

You could also use ordinal encoder but sometimes order doesn't make sense, so you need dummies instead.

#

Dummy features also allow you to access multi class.

quick epoch
#

Yo guys. Can you help me with the problem?

#

For some reason I am not getting multicoloured line

heady hatch
#

I'm not familiar with visualization, but have you checked the documentation?

quick epoch
hollow sentinel
#

you need dummy columns to run a random forest/decision trees?

heady hatch
#

You need dummy columns to deal with non-numerical data.
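
A short sketch of what dummy columns look like, using `pandas.get_dummies` on a made-up table (the column names are hypothetical). Each category becomes its own 0/1 column, so no artificial order is implied.

```python
import pandas as pd

# Hypothetical data with one numeric and one categorical column.
df = pd.DataFrame({
    "age": [22, 35, 58],
    "embarked": ["S", "C", "Q"],
})

# Replace the categorical column with one indicator column per category.
encoded = pd.get_dummies(df, columns=["embarked"])
print(encoded)
print(list(encoded.columns))  # age, embarked_C, embarked_Q, embarked_S
```

For linear models, `drop_first=True` is often passed to avoid perfectly collinear columns. Tree-based models in scikit-learn do not need that, but they still require numeric input.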

quick epoch
#

I am just getting a green curve

hollow sentinel
#

@quick epoch not using jupyter notebook is a dangerous game haha

quick epoch
#

I know. But I got all the necessary libraries and etc xd

#

So I am not worried about that XD

hollow sentinel
#

@quick epoch i couldn't get pycharm to run on my machine properly lmao

dawn vault
#

hey anyone familiar with dash library ?

quick epoch
#

It's easy actually. There is a way to install all necessary packages automatically using conda

#

But I am not good at data visualisation but I need this for data science and ml

#

So if you could help me I would appreciate it

hollow sentinel
#

maybe that's it idk

#

I've been using seaborn the most bc that's what I've been doing with Udemy

quick epoch
#

There are multiple plots and I need one

#

That changes

hollow sentinel
#

you need one but you want certain parts of the line to be different colors

#

what graphing library are you using

quick epoch
#

Matplotlib

#

What I mean is

#

Do you see these if statements?

quick epoch
#

I want the plot to change whenever it sees those values in the csv file

hollow sentinel
#

i get it the doc for multi colored line might help
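
A rough sketch of the approach in Matplotlib's multicoloured line example: split the curve into segments and give each segment its own colour through a `LineCollection`. The traffic values and the thresholds in `level_colour` are invented here; in the real script they would come from the CSV file and the existing if statements.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection

t = np.arange(8)
traffic = np.array([3, 5, 9, 12, 7, 4, 10, 2])  # hypothetical traffic levels

def level_colour(value):
    """Map a traffic level to a colour (thresholds are assumptions)."""
    if value < 5:
        return "green"
    if value < 10:
        return "orange"
    return "red"

# Build one segment per pair of consecutive points: shape (n - 1, 2, 2).
points = np.column_stack([t, traffic]).reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)

# Colour each segment by the value at its starting point.
colours = [level_colour(v) for v in traffic[:-1]]

fig, ax = plt.subplots()
ax.add_collection(LineCollection(segments, colors=colours, linewidths=2))
ax.autoscale()  # collections do not rescale the axes on their own
ax.set_xlabel("time step")
ax.set_ylabel("traffic level")
```

Calling `plt.plot` once with a single colour argument draws the whole curve in that colour, which may be why only a green curve showed up.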

dawn vault
#

@hollow sentinel are you familiar with dash? and datatables and callbacks? if so maybe you could help me out? thnx

hollow sentinel
#

@dawn vault hahhahaha I've only been doing machine learning for 2 or 3 weeks

#

is dash a graphing library like plotly?

dawn vault
#

sort of.. one can build dashboards quite easily.. and yea dash and plotly are more or less the same thing..

#

with dash one can build interactive sahboards using plotly and stuff

#

*dashboards

hollow sentinel
#

yeah so far I only know pandas, matplotlib, seaborn, and plotly rn

dawn vault
#

kk

#

I am working with pandas and dash/plotly.. to get my project going .. but running into issues..

hollow sentinel
#

F

#

don't worry you'll get help here

dawn vault
#

so what are you working on right now ?

hollow sentinel
#

support vector machines

dawn vault
#

omg .. dont know any of those words... lol

hollow sentinel
#

hahahhahahah me neither

#

this udemy course is killing me with all this new info

dawn vault
#

which one?

hollow sentinel
#

Python for Data Science and Machine Learning Bootcamp by Jose Portilla

#

I'd recommend it for people who want to start learning machine learning it's only 20 bucks

dawn vault
#

jose portilla rings a bell.. i might have taken a course.. from him.. w

hollow sentinel
#

yeah he's great

#

probably one of my favorite people to learn from

dawn vault
#

took python for financial analysis and algo trading...