untold tundra Nov 28, 2021, 8:01 PM

#

what you're loading with en_cor... is just the vocab vectors, so if you want to store that on s3, i imagine you can just save a version from the github where i imagine its downloaded from

rotund basin Nov 28, 2021, 8:02 PM

#

this is what I had been assuming, but have not made it work. maybe just need to keep digging there

untold tundra Nov 28, 2021, 8:04 PM

#

well i think nlp.vocab.to_disk() works in any case, so you can use that

rotund basin Nov 28, 2021, 8:05 PM

#

yes, I just might. Sometimes it seems like there may be an easier way when there actually isn't :)

#

thanks for your insights

sand loom Nov 28, 2021, 8:48 PM

#

Is there an efficient way to, for example, remove columns from a numpy ndarray? Ive got an (n,85) ndarray and am trying to efficiently trim out the last 80 entries

#

In reality selecting relevant data from the last 80 entries and placing it into index 5 of the original array, rendering the last 80 no longer useful

#

Ive looked into masking, deletion, copying to new array... but I'm curious if anyone has any insight into an efficient method, or if leaving as-is is the best? I'm just a bit memory constrained

serene scaffold Nov 28, 2021, 9:12 PM

#

sand loom Is there an efficient way to, for example, remove columns from a numpy ndarray? ...

you can't change the shape of an array in-place, but you can still use slicing

#

!e

import numpy as np
arr = np.random.random((43, 85))
new = arr[:-10, :]  # chop off the last ten rows
print(new.shape)

arctic wedgeBOT Nov 28, 2021, 9:13 PM

#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.

(33, 85)

sand loom Nov 28, 2021, 9:15 PM

#

serene scaffold !e ```py import numpy as np arr = np.random.random((43, 85)) new = arr[:-10, :] ...

And does slicing still use the original memory in place, i.e. a shallow copy?

#

Ah yeah, it seems that slicing just provides a view. Alrighty

serene scaffold Nov 28, 2021, 9:18 PM

#

sand loom And does slicing still use the original memory in place, i.e. a shallow copy?

No, I don't believe there's an in-place solution.
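
To illustrate the view-versus-copy point above, here is a minimal sketch. It assumes the (25200, 85) uint8 shape that comes up later in this conversation, and the variable names are made up for illustration. Basic slicing returns a view that shares the parent's buffer, so the full array stays in memory for as long as the view is alive; an explicit `.copy()` produces a small independent array.

```python
import numpy as np

# Shape and dtype assumed from the discussion; the contents are placeholders.
arr = np.zeros((25200, 85), dtype=np.uint8)

view = arr[:, :5]                 # basic slicing: no data is copied
assert np.shares_memory(view, arr)
assert view.base is arr           # the view keeps the whole parent buffer alive

small = arr[:, :5].copy()         # independent (n, 5) array with its own buffer
assert not np.shares_memory(small, arr)

print(arr.nbytes, small.nbytes)   # 2142000 126000
```

Once `small` exists, dropping every reference to `arr` and `view` lets the large buffer be reclaimed.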

serene scaffold Nov 28, 2021, 10:22 PM

#

@sand loom how memory constrained are we talking, anyway?

#

If you can prevent the array from being assigned to a variable before you slice it, it might not stay in memory as long.

sand loom Nov 28, 2021, 11:15 PM

#

serene scaffold <@188428700188934145> how memory constrained are we talking, anyway?

I don't have an exact number on that as the rest of the system is still under construction, but it's an embedded application. Doing non-max suppression on (25200x85) unsigned int8, so that gets a bit messy

serene scaffold Nov 28, 2021, 11:16 PM

#

sand loom I don't have an exact number on that as the rest of the system is still under co...

Are you using cpython, or a different implementation?

sand loom Nov 28, 2021, 11:16 PM

#

the device is yocto-based (stripped linux) so just standard python

#

at least, afaik

serene scaffold Nov 28, 2021, 11:18 PM

#

@sand loom I've never worked on an embedded system. I'm not aware of how numpy has been used with them

sand loom Nov 28, 2021, 11:20 PM

#

I'm doing some more research on this (hadnt considered that yocto could use non-standard python which would change all profilings of things I have looked at). However, afaik it should be just the same python installation and implementation as a standard python install on ubuntu or other equivalent (python version 3.8.11 for that matter)

#

Well, at least at the top level I am not using cpython. The implementation on the system might be using it, though. Was not able to find any definite answer on that matter. But for now I am comfortable assuming it works nearly identical to a standard linux install

untold tundra Nov 28, 2021, 11:32 PM

#

an (n,85) ndarray and am trying to efficiently trim out the last 80 entries

#

so you're looking to go from (n, 85) to (n, 5) ?

#

how efficient is data = data[:, -5:] ?

#

or otherwise, eg., more_data = data[:, -5:]; del data[:, :80]

#

the array built-in python module may also be helpful
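
A hedged sketch of how these suggestions behave in NumPy. Two details are worth checking: `data[:, -5:]` selects the last five columns (the first five are `data[:, :5]`), and `del` on an ndarray slice raises `ValueError` because array elements cannot be deleted in place. The "relevant data" reduction below (index of the largest trailing value, written to column 5) is only an assumption for illustration, as are the array size and names.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.integers(0, 256, size=(1000, 85), dtype=np.uint8)  # hypothetical input

# `del` cannot remove elements from an ndarray.
try:
    del data[:, 5:]
    del_worked = True
except ValueError:
    del_worked = False

# Build a compact (n, 6) array: the first five columns plus one derived column.
compact = np.empty((data.shape[0], 6), dtype=data.dtype)
compact[:, :5] = data[:, :5]
# Assumed reduction: position of the maximum among the 80 trailing columns.
compact[:, 5] = data[:, 5:].argmax(axis=1)

del data  # drop the last reference so the large buffer can be freed
print(del_worked, compact.shape)
```

This copies the kept columns once into a small array instead of holding a view on the original, which is what allows the (n, 85) buffer to go away.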

pure pumice Nov 29, 2021, 12:06 AM

#

@serene scaffold Hey, is it possible if you can help me with a few more things?

serene scaffold Nov 29, 2021, 12:06 AM

#

@pure pumice idk what it is

pure pumice Nov 29, 2021, 12:07 AM

#

serene scaffold <@104664534446272512> idk what it is

{'Index': [0, 1, 2, 3, 4],
'ID': [1, 2, 3, 4, 5],
'Title': ['Inception',
'The Matrix',
'Avengers: Infinity War',
'Back to the Future',
'The Good, the Bad and the Ugly'],
'Year': [2010, 1999, 2018, 1985, 1966],
'Age': ['13+', '18+', '13+', '7+', '18+'],
'IMDb': [8.8, 8.7, 8.5, 8.5, 8.8],
'Rotten Tomatoes': [87, 87, 84, 96, 97],
'Netflix': [1, 1, 1, 1, 1],
'Hulu': [0, 0, 0, 0, 0],
'Prime Video': [0, 0, 0, 0, 1],
'Disney+': [0, 0, 0, 0, 0],
'Type': [0, 0, 0, 0, 0],
'Directors': ['Christopher Nolan',
'Lana Wachowski,Lilly Wachowski',
'Anthony Russo,Joe Russo',
'Robert Zemeckis',
'Sergio Leone'],
'Genres': ['Action,Adventure,Sci-Fi,Thriller',
'Action,Sci-Fi',
'Action,Adventure,Sci-Fi',
'Adventure,Comedy,Sci-Fi',
'Western'],
'Country': ['United States,United Kingdom',
'United States',
'United States',
'United States',
'Italy,Spain,West Germany'],
'Language': ['English,Japanese,French',
'English',
'English',
'English',
'Italian'],
'Runtime': [148.0, 136.0, 149.0, 116.0, 161.0]}

serene scaffold Nov 29, 2021, 12:07 AM

#

What are you trying to do

pure pumice Nov 29, 2021, 12:07 AM

#

serene scaffold <@104664534446272512> idk what it is

i need to Create a new dataframe that only contains values that are on two or more streaming platforms

HINT: This is a great place to use filters!

#

#

this is what ive done for netflix but i cant just do the same this for all 4

#

is it possible to do it in one filter?

serene scaffold Nov 29, 2021, 12:09 AM

#

@pure pumice so you need to add the four streaming platform columns

#

And then get those

#

That are >= 2

pure pumice Nov 29, 2021, 12:10 AM

#

serene scaffold <@104664534446272512> so you need to add the four streaming platform columns

filt1 = df['Netflix','Hulu','Prime Video', 'Disney+'] >= 2
df.loc[filt1

serene scaffold Nov 29, 2021, 12:10 AM

#

@pure pumice you didn't take the sum of them

#

.sum()

#

Also I'm on my phone

#

At my parents house

#

I wanna go home

#

Help me

pure pumice Nov 29, 2021, 12:10 AM

#

lol

#

send ur address

#

ill pick you up and we can work on this

pure pumice Nov 29, 2021, 12:11 AM

#

serene scaffold .sum()

the .sum()

#

would be after the []

serene scaffold Nov 29, 2021, 12:12 AM

#

@pure pumice try it

#

@pure pumice also you might need to set the axis

#

!docs pandas.DataFrame.sum

arctic wedgeBOT Nov 29, 2021, 12:13 AM

#

pandas.DataFrame.sum


DataFrame.sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
Return the sum of the values over the requested axis.

This is equivalent to the method `numpy.sum`.

pure pumice Nov 29, 2021, 12:13 AM

#

wtffff

serene scaffold Nov 29, 2021, 12:14 AM

#

What the fuckkkkk

pure pumice Nov 29, 2021, 12:14 AM

#

so my column names

#

are gonna be inside the sum()

serene scaffold Nov 29, 2021, 12:15 AM

#

No

fierce patio Nov 29, 2021, 12:16 AM

#

hi

pure pumice Nov 29, 2021, 12:17 AM

#

serene scaffold No

how do i set an axis?

serene scaffold Nov 29, 2021, 12:17 AM

#

pure pumice how do i set an axis?

It's 1 or 0

fierce patio Nov 29, 2021, 12:18 AM

#

plz i still can t understand whats the role of cross_val_score

serene scaffold Nov 29, 2021, 12:18 AM

#

@fierce patio it's the score from cross validation

pure pumice Nov 29, 2021, 12:19 AM

#

serene scaffold It's 1 or 0

so i only need to set an axis, none of that other skipna, level... stuff

serene scaffold Nov 29, 2021, 12:19 AM

#

@pure pumice ya

pure pumice Nov 29, 2021, 12:20 AM

#

filt1 = df['Netflix','Hulu','Prime Video', 'Disney+'].sum(axis=1) >= 2

serene scaffold Nov 29, 2021, 12:20 AM

#

@pure pumice that's going to be a series of bools

pure pumice Nov 29, 2021, 12:20 AM

#

serene scaffold <@104664534446272512> that's going to be a series of bools

it just gives an error

serene scaffold Nov 29, 2021, 12:21 AM

#

@pure pumice remember what I said about saying you got an error.

pure pumice Nov 29, 2021, 12:21 AM

#

Traceback (most recent call last)
/cloud/lib/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3360 try:
-> 3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:

/cloud/lib/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: ('Netflix', 'Hulu', 'Prime Video', 'Disney+')

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last)
<ipython-input-12-bbb026620058> in <module>
1 # Create a new dataframe that only contains values that are on two or more streaming platforms
2 # HINT: This is a great place to use filters!
----> 3 filt1 = df['Netflix','Hulu','Prime Video', 'Disney+'].sum(axis=1) >= 2
4 df.loc[filt1]
5 #1 filter, total number line

/cloud/lib/lib/python3.9/site-packages/pandas/core/frame.py in __getitem__(self, key)
3456 if self.columns.nlevels > 1:
3457 return self._getitem_multilevel(key)
-> 3458 indexer = self.columns.get_loc(key)
3459 if is_integer(indexer):
3460 indexer = [indexer]

/cloud/lib/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3364
3365 if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: ('Netflix', 'Hulu', 'Prime Video', 'Disney+')

#

whoops sorry

serene scaffold Nov 29, 2021, 12:21 AM

#

It has to be a list

#

Not a tuple

pure pumice Nov 29, 2021, 12:22 AM

#

ah so i dont need quotations on each one of thems

#

them

serene scaffold Nov 29, 2021, 12:22 AM

#

No that's not what I said

fierce patio Nov 29, 2021, 12:23 AM

#

serene scaffold <@750786725646696448> it's the score from cross validation

what u mean by score plz

pure pumice Nov 29, 2021, 12:23 AM

#

serene scaffold Not a tuple

but im using []

#

not parenthesis

serene scaffold Nov 29, 2021, 12:23 AM

#

@pure pumice yes but you still need quotes

#

For each string that is a column name

pure pumice Nov 29, 2021, 12:24 AM

#

df['Netflix','Hulu','Prime Video', 'Disney+']

#

like that right?

serene scaffold Nov 29, 2021, 12:24 AM

#

No

#

You need an extra pair of []

pure pumice Nov 29, 2021, 12:24 AM

#

for each

#

column

serene scaffold Nov 29, 2021, 12:24 AM

#

No

#

For the whole thing

#

Df[[.,.,.,.,]]

fierce patio Nov 29, 2021, 12:25 AM

#

serene scaffold <@750786725646696448> it's the score from cross validation

do u mean the best hyperparametr
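
A small sketch of what `cross_val_score` returns, assuming scikit-learn is installed; the iris dataset and logistic regression model are stand-ins chosen only for illustration. The function fits the model once per fold and returns one score per fold (for a classifier the default metric is accuracy). It does not search for hyperparameters.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# cv=5 splits the data into 5 folds; each fold is held out once for scoring.
scores = cross_val_score(model, X, y, cv=5)

print(scores)         # one accuracy value per fold
print(scores.mean())  # a common single-number summary
```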

pure pumice Nov 29, 2021, 12:25 AM

#

serene scaffold For the whole thing

omg it worked
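
For reference, a minimal sketch of the filter that was worked out above, using a cut-down copy of the sample data posted earlier. `df['a', 'b']` passes a single tuple as the key, which is why the `KeyError` shows a tuple; `df[['a', 'b']]` passes a list and selects several columns. The names `platforms`, `on_two_or_more`, and `multi` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Title": ["Inception", "The Matrix", "Avengers: Infinity War",
              "Back to the Future", "The Good, the Bad and the Ugly"],
    "Netflix": [1, 1, 1, 1, 1],
    "Hulu": [0, 0, 0, 0, 0],
    "Prime Video": [0, 0, 0, 0, 1],
    "Disney+": [0, 0, 0, 0, 0],
})

platforms = ["Netflix", "Hulu", "Prime Video", "Disney+"]

# Row-wise sum of the 0/1 flags, then a boolean Series marking rows with >= 2.
on_two_or_more = df[platforms].sum(axis=1) >= 2
multi = df.loc[on_two_or_more]

print(multi["Title"].tolist())  # ['The Good, the Bad and the Ugly']
```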

serene scaffold Nov 29, 2021, 12:26 AM

#

meow_party

pure pumice Nov 29, 2021, 12:26 AM

#

thank you python master

serene scaffold Nov 29, 2021, 12:26 AM

#

Yw python disciple

pure pumice Nov 29, 2021, 12:26 AM

#

i feel like shooting my computer screen i hate this python assignment 😦

serene scaffold Nov 29, 2021, 12:26 AM

#

pure pumice i feel like shooting my computer screen i hate this python assignment 😦

That won't fix it

#

Or make it go away

pure pumice Nov 29, 2021, 12:27 AM

#

https://tenor.com/view/pain-naruto-anime-anonymouskun-u-dont-know-me-gif-21604973

Tenor

fierce patio Nov 29, 2021, 12:28 AM

#

pure pumice https://tenor.com/view/pain-naruto-anime-anonymouskun-u-dont-know-me-gif-2160497...

i have the same feeling

pure pumice Nov 29, 2021, 12:28 AM

#

also @serene scaffold if i am trying to create a pie chart for a specific column is that possible, or would i have to put it into a pivot table first

pure pumice Nov 29, 2021, 12:28 AM

#

fierce patio i have the same feeling

pain

serene scaffold Nov 29, 2021, 12:29 AM

#

@pure pumice I've never made a pie chart

pure pumice Nov 29, 2021, 12:30 AM

#

damn

serene scaffold Nov 29, 2021, 12:31 AM

#

I like pie

pure pumice Nov 29, 2021, 12:31 AM

#

apple pie

serene scaffold Nov 29, 2021, 12:31 AM

#

Pecan pie

#

Pumpkin pie

pure pumice Nov 29, 2021, 12:31 AM

#

serene scaffold I like pie

never tried

#

do u eat ice cream?

#

if u like pecans u should eat some pralines and cream ice cream

#

@serene scaffold

#

okay so over here

#

{'Index': [0, 1, 2, 3, 4],
'ID': [1, 2, 3, 4, 5],
'Title': ['Inception',
'The Matrix',
'Avengers: Infinity War',
'Back to the Future',
'The Good, the Bad and the Ugly'],
'Year': [2010, 1999, 2018, 1985, 1966],
'Age': ['13+', '18+', '13+', '7+', '18+'],
'IMDb': [8.8, 8.7, 8.5, 8.5, 8.8],
'Rotten Tomatoes': [87, 87, 84, 96, 97],
'Netflix': [1, 1, 1, 1, 1],
'Hulu': [0, 0, 0, 0, 0],
'Prime Video': [0, 0, 0, 0, 1],
'Disney+': [0, 0, 0, 0, 0],
'Type': [0, 0, 0, 0, 0],
'Directors': ['Christopher Nolan',
'Lana Wachowski,Lilly Wachowski',
'Anthony Russo,Joe Russo',
'Robert Zemeckis',
'Sergio Leone'],
'Genres': ['Action,Adventure,Sci-Fi,Thriller',
'Action,Sci-Fi',
'Action,Adventure,Sci-Fi',
'Adventure,Comedy,Sci-Fi',
'Western'],
'Country': ['United States,United Kingdom',
'United States',
'United States',
'United States',
'Italy,Spain,West Germany'],
'Language': ['English,Japanese,French',
'English',
'English',
'English',
'Italian'],
'Runtime': [148.0, 136.0, 149.0, 116.0, 161.0]}

#

i have these genres and i need to slice it so only the first genre shows on each row

#

but what about the rows with only one genre. wont they get sliced as well?

serene scaffold Nov 29, 2021, 12:34 AM

#

pure pumice <@!253696366952316929>

Reading screenshots of text is annoying, especially on mobile. Which I'm on.

pure pumice Nov 29, 2021, 12:34 AM

#

serene scaffold Reading screenshots of text is annoying, especially on mobile. Which I'm on.

Using the original dataframe, take the genres column, and only keep the first genre.

For example, if the value was previously Comedy,Drama,Romance, then it would become Comedy

serene scaffold Nov 29, 2021, 12:36 AM

#

@pure pumice the way with the fewest steps involves regular expressions

#

You might use apply and a lambda

pure pumice Nov 29, 2021, 12:36 AM

#

serene scaffold <@104664534446272512> the way with the fewest steps involves regular expressions

omg lambda

#

never heard of 😦

serene scaffold Nov 29, 2021, 12:36 AM

#

Lambda

#

It's just where you make a one statement function

pure pumice Nov 29, 2021, 12:37 AM

#

so what im doing rn is just longer

serene scaffold Nov 29, 2021, 12:37 AM

#

I can show you
When I get home

pure pumice Nov 29, 2021, 12:37 AM

#

serene scaffold I can show you When I get home

thank you

pure pumice Nov 29, 2021, 12:40 AM

#

serene scaffold I can show you When I get home

what is an aggfunc?

#

Create a pivot table where the average runtime of the movie is examined

Make the rows Year and the columns Genres

#

gp_pivot = df.pivot_table(values='Runtime', index="Genres",
columns = 'Year', aggfunc='mean')
gp_pivot.tail()

sand loom Nov 29, 2021, 12:47 AM

#

untold tundra how efficient is `data = data[:, -5:]` ?

I assume this would just be another view. However, that's probably fine as long as I'm not making more copies of my slightly modified data. Will need to profile it and also profile your deletion suggestion. I appreciate the help!

quiet vault Nov 29, 2021, 1:00 AM

#

Has anyone worked with colab when using a runtime of a vm from gcp

serene scaffold Nov 29, 2021, 1:01 AM

#

pure pumice what is an aggfunc?

A function that you use to reduce aggregated data, such as sum or mean
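
A short sketch of `aggfunc` in a pivot table. The assignment text asks for Year as rows and Genres as columns, so this example uses `index="Year"` and `columns="Genres"`. The data is a simplified version of the sample above with only the first genre kept, and the sixth row is invented so that one cell actually averages two values.

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [2010, 1999, 2018, 1985, 1966, 2010],
    "Genres": ["Action", "Action", "Action", "Adventure", "Western", "Action"],
    # The last runtime is a made-up value for a hypothetical second 2010 film.
    "Runtime": [148.0, 136.0, 149.0, 116.0, 161.0, 152.0],
})

# Each cell collects the Runtime values for one (Year, Genre) pair,
# and aggfunc reduces that group to a single number.
pivot = df.pivot_table(values="Runtime", index="Year",
                       columns="Genres", aggfunc="mean")

print(pivot.loc[2010, "Action"])  # 150.0
```

Cells with no matching rows come out as NaN.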

#

@pure pumice

In [7]: df['Language']
Out[7]:
0    English,Japanese,French
1                    English
2                    English
3                    English
4                    Italian
Name: Language, dtype: object

In [6]: df['Language'].str.extract(r'^([A-z]+),?')
Out[6]:
         0
0  English
1  English
2  English
3  English
4  Italian

#

I guess it could just be df['Language'].str.extract(r'([A-z]+)')

pure pumice Nov 29, 2021, 1:09 AM

#

serene scaffold <@!104664534446272512> ```py In [7]: df['Language'] Out[7]: 0 English,Japanes...

😮

#

@serene scaffold (r'^([A-z]+),?')

#

what does this mean exactly

serene scaffold Nov 29, 2021, 1:09 AM

#

it's my demon summoning spell

pure pumice Nov 29, 2021, 1:10 AM

#

does that mean its mine now

serene scaffold Nov 29, 2021, 1:10 AM

#

no

pure pumice Nov 29, 2021, 1:10 AM

#

dammit

serene scaffold Nov 29, 2021, 1:10 AM

#

anyway a regular expression is a pattern that strings can match

#

^([A-z]+),? means "from the start of the string (^), extract (()) one or more (+) consecutive characters from A to z ([A-z]) possibly (?) followed by a comma (,)".
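
Two caveats about that pattern, sketched below. The range `[A-z]` also covers the six ASCII characters between `Z` and `a` (including `_` and `^`), and a letters-only class stops at a hyphen, so a leading "Sci-Fi" would be cut to "Sci". Splitting on the comma, or matching "anything up to the first comma", avoids both. The third row is hypothetical and exists only to show the hyphen case.

```python
import re
import pandas as pd

genres = pd.Series([
    "Action,Adventure,Sci-Fi,Thriller",
    "Action,Sci-Fi",
    "Sci-Fi,Comedy",   # hypothetical row
    "Western",
])

first_by_split = genres.str.split(",").str[0]
first_by_regex = genres.str.extract(r"^([^,]+)", expand=False)
letters_only = genres.str.extract(r"^([A-Za-z]+)", expand=False)

# [A-z] matches more than letters: "_" sits between "Z" and "a" in ASCII.
underscore_matches = re.fullmatch(r"[A-z]", "_") is not None

print(first_by_split.tolist())  # ['Action', 'Action', 'Sci-Fi', 'Western']
print(letters_only.tolist())    # ['Action', 'Action', 'Sci', 'Western']
```

Rows with a single genre are left unchanged by either approach, since there is no comma to split on.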

pure pumice Nov 29, 2021, 1:12 AM

#

and the reason we cant use str[]

serene scaffold Nov 29, 2021, 1:12 AM

#

pure pumice and the reason we cant use str[]

it's a method call, not str slicing

pure pumice Nov 29, 2021, 1:12 AM

#

is because the languages are not in a list

#

ahhhhh okay

serene scaffold Nov 29, 2021, 1:12 AM

#

str[1:2] would be a string slice

pure pumice Nov 29, 2021, 1:13 AM

#

ya i was trying tha

#

and it was just

#

messing everything up

serene scaffold Nov 29, 2021, 1:13 AM

#

that would work if every language name had the same number of characters.

#

I'm gonna play a game before I go back to work in the morning

#

sadge

pure pumice Nov 29, 2021, 1:14 AM

#

have a great night

#

thanks

#

again

serene scaffold Nov 29, 2021, 1:14 AM

#

👍

sullen tinsel Nov 29, 2021, 2:14 AM

#

Hey, does anyone mind taking a look at a code I am working on? I'm struggling with one aspect of it that has to do with user input and the help chat said this chat is also helpful!

serene scaffold Nov 29, 2021, 2:31 AM

#

sullen tinsel Hey, does anyone mind taking a look at a code I am working on? I'm struggling wi...

this chat is for data science, so questions about "user input" probably fall under a different domain. If it's a data science question, go ahead.

Any time you want help in this Discord, or anywhere on the internet, just say your question right away. Putting extra work between people and knowing what your question is just slows things down and makes it less likely that you'll get help.

brisk moth Nov 29, 2021, 2:32 AM

#

how do i use CUDA toolkit?

serene scaffold Nov 29, 2021, 2:32 AM

#

brisk moth how do i use CUDA toolkit?

to do what?

brisk moth Nov 29, 2021, 2:32 AM

#

do cuda things

serene scaffold Nov 29, 2021, 2:32 AM

#

shrug2

brisk moth Nov 29, 2021, 2:32 AM

#

i have a 1060 i tried installing cuda toolkit and it failed

serene scaffold Nov 29, 2021, 2:32 AM

#

show error message

brisk moth Nov 29, 2021, 2:33 AM

#

uhh

serene scaffold Nov 29, 2021, 2:34 AM

#

brisk moth how do i use CUDA toolkit?

it sounds like your question is really "how can I install CUDA toolkit on my system despite it not working when I tried x"

brisk moth Nov 29, 2021, 2:34 AM

#

true

serene scaffold Nov 29, 2021, 2:35 AM

#

that's a different question from "how do I use CUDA toolkit?". afaik, CUDA toolkit is just a compatibility layer for installing pytorch and stuff (and therefore can't be "used"), but I try not to rule out the possibility that people know something I don't, since they usually do.

#

but if you can't show the error message, idk what to do.

royal crest Nov 29, 2021, 2:50 AM

#

XY problem

serene scaffold Nov 29, 2021, 3:13 AM

#

royal crest XY problem

which problem are you saying is an XY problem?

stoic musk Nov 29, 2021, 5:20 AM

#

Is anybody familiar with basic TensorFlow/Keras?

#

for t in range(Tx):

    # Step 2.A: select the "t"th time step vector from X. 
    x = X[:,t,:](X)

Trying to figure out how to iterate over the above tensor X
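
A sketch of the slicing step, using a NumPy array as a stand-in for the tensor (the sizes are hypothetical). `X[:, t, :]` already is the t-th time step; the trailing `(X)` in the snippet above tries to call that slice as if it were a function. One possible reading, which is only an assumption here, is that the exercise intends the slice to be wrapped in a layer that is then called on `X`.

```python
import numpy as np

m, Tx, n_values = 4, 3, 5   # hypothetical: batch size, time steps, features
X = np.arange(m * Tx * n_values).reshape(m, Tx, n_values)

steps = []
for t in range(Tx):
    x = X[:, t, :]          # shape (m, n_values): time step t for every example
    steps.append(x)

print(len(steps), steps[0].shape)  # 3 (4, 5)
```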

final scaffold Nov 29, 2021, 6:55 AM

#

Hi, ive installed anaconda and kept these checked while installing:
a) install for All Users (not current)
b) Add PATH
Installed location is c:/ProgramData.

Now, when i open cmd (both as user and administrator) and type: python
i get this warning message:-
This python interpreter is in conda environment, but the environment has not been activated. Libraries may fail to load.

gilded copper Nov 29, 2021, 7:10 AM

#

Any professionals over here who can help me a bit

hard shuttle Nov 29, 2021, 9:30 AM

#

Hi everyone

junior lintel Nov 29, 2021, 11:08 AM

#

If I need help with an AI script do I simply go into a help channel or do I go here?

serene scaffold Nov 29, 2021, 11:08 AM

#

@gilded copper always ask your actual question. Don't rule out everyone except "professionals" before you've put an answerable question out there

serene scaffold Nov 29, 2021, 11:09 AM

#

junior lintel If I need help with an AI script do I simply go into a help channel or do I go h...

You can ask here.

marsh yacht Nov 29, 2021, 11:54 AM

#

final scaffold Hi, ive installed anaconda and kept these checked while installing: a) install f...

ill pm you if you need help installing anaconda

marsh yacht Nov 29, 2021, 11:55 AM

#

final scaffold Hi, ive installed anaconda and kept these checked while installing: a) install f...

in the cmd type in conda instead of python see if that work

marsh yacht Nov 29, 2021, 12:02 PM

#

final scaffold Hi, ive installed anaconda and kept these checked while installing: a) install f...

have you created a new anaconda environment yet?

gilded copper Nov 29, 2021, 12:14 PM

#

serene scaffold <@759330370470019092> always ask your actual question. Don't rule out everyone e...

Am confused to choose my career. Which to choose : IOT or Cloud. That's my qn

serene scaffold Nov 29, 2021, 12:28 PM

#

gilded copper Am confused to choose my career. Which to choose : IOT or Cloud. That's my qn

sounds like a question for #career-advice, but I suspect that it will ultimately be up to you

acoustic forge Nov 29, 2021, 3:02 PM

#

Guys - How would you "rank" these algorithms in terms of complexity (NOT Big O, but rather complexity in terms of explainability to stakeholders)
XGBoost
Random Forest
Logistic Regression
K-Nearest Neighbours
Decision tree
Perceptron

shut valve Nov 29, 2021, 3:03 PM

#

regression, decision trees, forests, k-nearest, xgboost, perceptron

#

but like k nearest could be higher

acoustic forge Nov 29, 2021, 3:04 PM

#

K nearest is the easiest in my opinion

shut valve Nov 29, 2021, 3:04 PM

#

i think between k-nearest and decision trees it's a toss-up which one you introduce first

acoustic forge Nov 29, 2021, 3:05 PM

#

I see. I have this stupid assignment for my final exam in applied data science, and they want basically a powerpoint. I need to explain it to stakeholders who know nothing about data science

shut valve Nov 29, 2021, 3:05 PM

#

i feel that decision trees are easier to show to non tech people

#

yeah then i feel k-nearest is a bit more algorithmic to understand, whereas with decision trees it's easier to explain the bigger picture without getting too technical

acoustic forge Nov 29, 2021, 3:07 PM

#

Right, yeah. Makes sense

shut valve Nov 29, 2021, 3:08 PM

#

just a bit of advice i didn't appreciate as much in school was to talk more about results and consequences than specific technical aspects.

#

Hey, anyone have any interest in taking https://cds.nyu.edu/deep-learning/ ? It looks real cool. I have taken some other deep learning classes and done some projects, so I'm not a total noob, but idk if i would even call myself intermediate yet. I took linear algebra in college and did ok, but that was a few years ago, and I haven't taken a derivative or anything in years, which I'm concerned about. I'm not going to be sprinting through it. I'm going to try to stick to the weekly schedule, but if something takes me an additional week for whatever reason then so be it. I was planning on starting it after the new year, just asking to see if there was any interest

NYU Center for Data Science

Yann LeCun’s Deep Learning Course at CDS

trail horizon Nov 29, 2021, 3:29 PM

#

guys i would like to learn data engineering but dont know where to start, I already know python and SQL

#

can u pls give me like a career track or course ?

#

pls

serene scaffold Nov 29, 2021, 3:42 PM

#

acoustic forge Guys - How would you "rank" these algorithms in terms of complexity (NOT Big O, b...

K nearest would be easiest, followed by decision tree. Random forest can't be explained without first explaining decision tree. You might be able to reduce "logistic regression" to "best-curve fitting".

serene scaffold Nov 29, 2021, 3:42 PM

#

trail horizon guys i would to learn data engineering but dont know from where to start, I alre...

"Data Science from Scratch" is a good book imo.

trail horizon Nov 29, 2021, 3:43 PM

#

serene scaffold "Data Science from Scratch" is a good book imo.

thank bro but im more interested on data engineering

serene scaffold Nov 29, 2021, 3:44 PM

#

trail horizon thank bro but im more interested on data engineering

what definitions of "data engineering" and "data science" are you working with?

trail horizon Nov 29, 2021, 3:44 PM

#

data science = machine learning, deep learning, etc etc

#

data engineering = spark, pipelines, cloud

shut valve Nov 29, 2021, 3:45 PM

#

umm then you're asking in the wrong channel, try devops?

serene scaffold Nov 29, 2021, 3:46 PM

#

afaik the fundamentals are the same.

#

one of my classmates got a job titled "data engineer" whereas my title is "computational linguist", but we both did the data science program.

#

(though I also took linguistics classes.)

shut valve Nov 29, 2021, 3:48 PM

#

yeah but if you're more interested in dataops and mlops there is a great course on coursera https://www.coursera.org/learn/introduction-to-machine-learning-in-production/home/welcome I took it and it was really interesting, got to see a side that you dont get in more technical classes

normal radish Nov 29, 2021, 3:49 PM

#

Hey guys can any of you help me with Convolutional Neural Networks? It is for a school project

shut valve Nov 29, 2021, 3:49 PM

#

umm maybe whats your problem?

normal radish Nov 29, 2021, 3:50 PM

#

I have a code I have trouble understanding

#

Posted it in the brocoli help channel

#

#help-broccoli

shut valve Nov 29, 2021, 3:54 PM

#

well it looks like it's just making the network. do you have a specific layer that you dont understand? i honestly never worked with separable, i dont know what that is

normal radish Nov 29, 2021, 3:55 PM

#

The make model part is my problem. How does it work?

#

Do you have time for voice chat?

shut valve Nov 29, 2021, 3:57 PM

#

i cant talk rn but it looks like you make stacks of convolutions with larger and larger filter sizes

normal radish Nov 29, 2021, 3:57 PM

#

Ye I stole the code from the Keras creator.

#

Need to analyze and understand it

#

But the filter applying is confusing me. The difference between seperableconv2d and just conv2d

shut valve Nov 29, 2021, 4:00 PM

#

yeah I never used separable so i dont know if its make or break or if you can just swap it with regular 2d and get similar performance
https://keras.io/api/layers/convolution_layers/separable_convolution2d/

Keras documentation: SeparableConv2D layer

normal radish Nov 29, 2021, 4:02 PM

#

Shit yeah I read that but it didnt help me

#

Give me 2 sec I can make a model for you

shut valve Nov 29, 2021, 4:02 PM

#

i think you can just use regular conv2d. yeah try it with regular conv2d

normal radish Nov 29, 2021, 4:03 PM

#

Can you give me an example code private?

shut valve Nov 29, 2021, 4:03 PM

#

it says it has The depth_multiplier argument but i dont see it used in the code

#

i would just ctrl-R SeparableConv2D to Conv2D

normal radish Nov 29, 2021, 4:03 PM

#

for reals?

#

Used this https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/vision/ipynb/image_classification_from_scratch.ipynb#scrollTo=M1uRIyLcbtx9

Google Colaboratory

#

Makes this model:

(attached image: plot of the generated model)

shut valve Nov 29, 2021, 4:06 PM

#

ah i see the residual now

#

ok so where the network splits into separable and regular conv is because of this part

# Project residual
residual = layers.Conv2D(size, 1, strides=2, padding="same")(
    previous_block_activation
)
x = layers.add([x, residual])  # Add back residual
previous_block_activation = x  # Set aside next residual

#

I think the network would be the same if you changed separable to regular conv2d
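
One hedged note on that swap: the layer graph keeps the same shape, but the parameter count does not. A depthwise separable convolution applies one spatial filter per input channel and then a 1x1 convolution to mix channels, while a regular convolution learns a full spatial filter for every input/output channel pair. The helper functions below are hypothetical and only do the arithmetic; the channel sizes are chosen for illustration.

```python
def conv2d_params(kernel, c_in, c_out, bias=True):
    """Weights in a regular 2D convolution with a square kernel."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


def separable_conv2d_params(kernel, c_in, c_out, depth_multiplier=1, bias=True):
    """Weights in a depthwise separable 2D convolution."""
    depthwise = kernel * kernel * c_in * depth_multiplier  # one filter per channel
    pointwise = c_in * depth_multiplier * c_out            # 1x1 channel mixing
    return depthwise + pointwise + (c_out if bias else 0)


print(conv2d_params(3, 128, 256))            # 295168
print(separable_conv2d_params(3, 128, 256))  # 34176
```

So replacing the separable layers with regular ones would likely still train, but with a noticeably larger model.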

#

so what that piece of code is doing is applying the conv2d layer on the right side of the split, adding the two branches back together, and saving the result as previous_block_activation

normal radish Nov 29, 2021, 4:12 PM

#

Ehhh?

#

Can you maybe draw a layered model of it?

shut valve Nov 29, 2021, 4:14 PM

#

like i just changed it to regular conv. if your issue is with the network splitting after the activation, it's bc of the code above

#

if you re post the make model in a help chat i can show you my comments

brisk moth Nov 29, 2021, 4:16 PM

#

can anyone help me figure out why im getting training and validation accuracy over 3.0 lol

normal radish Nov 29, 2021, 4:16 PM

#

I put it in the #help-broccoli

#

But it was closed again

#

Can you still use it?

bleak kiln Nov 29, 2021, 4:25 PM

#

anyone has some experience with pandas library ?

fierce patio Nov 29, 2021, 4:37 PM

#

hi guys, i work on data about testosterone and i want to create a ML model for classification. my target is testosterone. do i have to drop it from my data if i want to use the kmeans algorithm?

desert oar Nov 29, 2021, 5:23 PM

#

bleak kiln anyone has some experience with pandas library ?

don't "ask to ask". state your question, and someone will answer if they are willing and able

#

see here for a guide on asking good questions online https://www.codementor.io/learn-programming/how-to-get-programming-help-online

#

here as well https://www.pythondiscord.com/pages/guides/pydis-guides/asking-good-questions/

Python Discord - Asking Good Questions

A guide for how to ask good questions in our community.

bold timber Nov 29, 2021, 5:28 PM

#

Hi, I am so confused about this. I have a 5000 feature in the dataset, but I only get around 2500 components in the plot. What happened in this case?

untold tundra Nov 29, 2021, 5:37 PM

#

print `X_train.shape`

#

either way, do you need to see all 5k components? it's >0.95 at like 200

bold timber Nov 29, 2021, 5:50 PM

#

untold tundra print `X_train.shape`

Before this, I used a different dataset to visualize like this, and I get a whole feature as an axis n_components. But, when I use the new dataset that have 5000 features, I only get some n_components from the whole feature. What happened?

untold tundra Nov 29, 2021, 5:53 PM

#

not sure, are you certain that X_train has 5000?

desert oar Nov 29, 2021, 5:54 PM

#

bold timber Hi, I am so confused about this. I have a 5000 feature in the dataset, but I onl...

what is pca.explained_variance_ratio_.shape?

bold timber Nov 29, 2021, 5:59 PM

#

desert oar what is `pca.explained_variance_ratio_.shape`?

yeah, pca.explained_variance_ratio_.shape is 2418. But why I didn't get a whole number of datasets in the axis of n_components?

desert oar Nov 29, 2021, 6:00 PM

#

bold timber yeah, pca.explained_variance_ratio_.shape is 2418. But why I didn't get a whole ...

because you can't have more components than the matrix rank, and matrix rank can't be greater than the minimum of the number of rows and number of columns
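
The rank bound described above can be checked numerically. This is a minimal NumPy sketch on made-up data (the 20 x 50 shape is an assumption for illustration, not the dataset from the chat). As far as I know, scikit-learn's `PCA` with default settings keeps `min(n_samples, n_features)` components, so 2418 components would be consistent with `X_train` having about 2418 rows, though the chat never confirms the shape.

```py
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: fewer rows (samples) than columns (features).
n_samples, n_features = 20, 50
X = rng.normal(size=(n_samples, n_features))

# The rank can never exceed the smaller of the two dimensions.
rank = np.linalg.matrix_rank(X)

# PCA works on mean-centered data; centering removes one more degree of freedom.
X_centered = X - X.mean(axis=0)
centered_rank = np.linalg.matrix_rank(X_centered)

# One singular value per possible component: min(n_samples, n_features) of them.
singular_values = np.linalg.svd(X_centered, compute_uv=False)

print(rank, centered_rank, singular_values.shape)
```

With 50 features but only 20 rows, there are only 20 singular values, so there is nothing to plot for components 21 through 50.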

bold timber Nov 29, 2021, 6:01 PM

#

desert oar because you can't have more components than the matrix rank, and matrix rank can...

What is the matrix rank? Can you explain it to me?

desert oar Nov 29, 2021, 6:02 PM

#

bold timber What is the matrix rank? Can you explain me?

it's a concept from linear algebra. you can think of it as the number of individual "components" in a matrix

#

a matrix with 100 rows but rank 1 really only has 1 piece of data in it

#

(this is a very very non-mathematical explanation)

#

(the actual explanation has to do with "linear transformations")

#

i strongly encourage you to learn the fundamentals of linear algebra, it's an essential tool for building mathematical models in statistics and machine learning

#

linear algebra and calculus

bold timber Nov 29, 2021, 6:05 PM

#

desert oar it's a concept from linear algebra. you can think of it as the number of individ...

but why, when I have 30 features in the dataset, do I get all of the features in the plot?

undone heron Nov 29, 2021, 6:05 PM

#

Hello everyone... I have a stacking ensemble with the current config

estimators = [
        ('decision_tree', dtm),
        ('linear_regression',LinearRegressionModel),
    ]

    stack = StackingRegressor(estimators=estimators, final_estimator=RandomForestModel, cv= 7, passthrough = True)

Why does the one above perform better than this one below?

estimators = [
        ('decision_tree', dtm),
        ('rf', RandomForestModel)
    ]

    stack = StackingRegressor(estimators=estimators, final_estimator=LinearRegressionModel, cv= 7, passthrough = True)
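
The two configurations above can be compared side by side with a sketch like the following. `dtm`, `LinearRegressionModel` and `RandomForestModel` are not defined in the chat, so default scikit-learn estimators stand in for them here, and the data is synthetic; the numbers it prints say nothing about the bus dataset. One detail worth keeping in mind: with `passthrough=True`, the final estimator is trained on the first-stage predictions plus the original features, not on the two predictions alone.

```py
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption; the real passenger data is not available).
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)


def build_stack(first_stage, final_estimator):
    """Build a stack with the same cv/passthrough settings as in the chat."""
    return StackingRegressor(
        estimators=first_stage,
        final_estimator=final_estimator,
        cv=7,
        passthrough=True,
    )


# Variant 1: decision tree + linear regression, combined by a random forest.
stack_rf_final = build_stack(
    [
        ("decision_tree", DecisionTreeRegressor(random_state=0)),
        ("linear_regression", LinearRegression()),
    ],
    RandomForestRegressor(n_estimators=50, random_state=0),
)

# Variant 2: decision tree + random forest, combined by a linear regression.
stack_lr_final = build_stack(
    [
        ("decision_tree", DecisionTreeRegressor(random_state=0)),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ],
    LinearRegression(),
)

# Fit both on the same split and score them on the same held-out rows.
results = {}
for name, stack in [("rf_final", stack_rf_final), ("lr_final", stack_lr_final)]:
    stack.fit(X_train, y_train)
    results[name] = mean_absolute_error(y_test, stack.predict(X_test))

print(results)
```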

bold timber Nov 29, 2021, 6:05 PM

#

desert oar i strongly encourage you to learn the fundamentals of linear algebra, it's an es...

can you recommend an article?

desert oar Nov 29, 2021, 6:06 PM

#

bold timber but why, when I have 30 features in the dataset, I can get a whole number of f...

if you have 30 features and >=30 rows, then the data matrix can have rank up to 30 (exactly 30 if no feature is a linear combination of the others), so you can have up to 30 components

#

i recommend an introductory book or course on linear algebra. MIT 18.06 is excellent, the lectures are all on YouTube and the lecturer Gilbert Strang is very entertaining and passionate

#

i actually think a new online version is starting soon or has already started. but the old lectures are also easily available

undone heron Nov 29, 2021, 6:09 PM

#

undone heron Hello everyone... I have a stacking ensemble with the current config ``` estimat...

I know there is not much information about the data here but I'm looking for a more theoretical response as to how/why this would be the case

desert oar Nov 29, 2021, 6:12 PM

#

undone heron Hello everyone... I have a stacking ensemble with the current config ``` estimat...

that's a great question. my guess is that the linear regression and decision tree offset each other in a "bias-variance" sense. it might also be the case that the random forest model on just 2 features (the two first-stage model outputs) is finding a good bias-variance balance

#

i don't think there's any compelling theory here

bold timber Nov 29, 2021, 6:14 PM

#

desert oar because you can't have more components than the matrix rank, and matrix rank can...

Does that mean not all of the features in the dataset can be plotted if it has fewer rows than features?

undone heron Nov 29, 2021, 6:16 PM

#

desert oar that's a great question. my guess is that the linear regression and decision tre...

Got it, so the actual nature of how these models operate could explain why pairing them like that would make it better/worse? The data I'm working with is basically a time series of passenger flow... So it's a non-linear problem, and a Linear Regression would have no business solving it

robust jungle Nov 29, 2021, 6:18 PM

#

would anyone mind helping in #help-dumpling ? I'm having trouble getting model_main_tf2.py to work

desert oar Nov 29, 2021, 6:24 PM

#

bold timber Does it mean that the number of features in the dataset can't be plotted complet...

plotting what exactly?

desert oar Nov 29, 2021, 6:25 PM

#

undone heron Got it, so the actual nature of how these models operate could explain why pairi...

in that case the linear regression is possibly fitting some kind of trend, and the decision tree is possibly fitting some kind of higher-order variation around the trend. if you can be more specific about the nature of the data, we might be able to provide more specific advice

bold timber Nov 29, 2021, 6:25 PM

#

desert oar plotting what exactly?

plotting pca.explained_variance_ratio_, i mean

desert oar Nov 29, 2021, 6:26 PM

#

bold timber plot.pca.explained_variance_ratio i mean

well yeah, if there are only 2418 components then you can only plot the explained variance ratios for those 2418 components

#

there is no 2873'th component to plot a variance ratio for

#

also i'm not sure there's value in 2000+ components that explain < 1% of variance...

bold timber Nov 29, 2021, 6:33 PM

#

desert oar also i'm not sure there's value in 2000+ components that explain < 1% of varianc...

What do you mean by "I'm not sure"? Why can 2000+ components only explain <1% of the variance? Can you explain the relationship between components and variance?

especially in my case

desert oar Nov 29, 2021, 6:48 PM

#

bold timber What are you mean by "I'm not sure"? why 2000+ component is can't explain <1% of...

perhaps it would be more illustrative to look at the explained variance of each component, instead of the cumulative explained variances

#

i.e. do the plot without cumsum

#

you will see that the components at the end only explain a tiny fraction of overall variance

#

so you can probably just ignore them

#

and if you look at the plot you currently have, you will see that ~90% of the variance is explained by the first ~250 components
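
The difference between the per-component and the cumulative view can be sketched with NumPy alone. The data below is synthetic (5 strong directions plus weak noise, an assumption chosen so that the tail is obviously small), and the ratios are computed from singular values rather than taken from a fitted `PCA` object.

```py
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 5 strong latent directions mixed into 40 features, plus weak noise.
latent = rng.normal(size=(200, 5))
mixing = rng.normal(size=(5, 40))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 40))

# Center the data and take singular values; squared values are proportional
# to the variance captured by each component.
X_centered = X - X.mean(axis=0)
singular_values = np.linalg.svd(X_centered, compute_uv=False)

# Per-component share of the variance (the plot "without cumsum").
explained_variance_ratio = singular_values**2 / np.sum(singular_values**2)

# Running total (the plot "with cumsum").
cumulative = np.cumsum(explained_variance_ratio)

# Smallest number of components whose cumulative share reaches 95%.
n_for_95 = int(np.searchsorted(cumulative, 0.95) + 1)

print(explained_variance_ratio[:8].round(4))
print(n_for_95)
```

Plotting `explained_variance_ratio` instead of `cumulative` makes the long flat tail visible: most components contribute almost nothing on their own.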

undone heron Nov 29, 2021, 6:57 PM

#

desert oar in that case the linear regression is possibly fitting some kind of trend, and t...

Sure, it is the passenger onboarding of buses in my city (for the year 2015, in 1-hour increments). Here's a quick graph

#

That is basically 24 hours of predictions (blue lines containing the real values) and that one above is the Linear Regression model trying to solve it

#

This one is the decision tree

#

Random Forest below

desert oar Nov 29, 2021, 7:10 PM

#

@undone heron you just did linear regression of bus usage vs time?

#

this is average hourly usage over 2 months?

#

i don't think either model is a great idea tbh

#

you surely have seasonal effects to consider

#

as well as year-over-year trends

undone heron Nov 29, 2021, 7:11 PM

#

Model trained with the complete months of Jan, Feb and March and is predicting April 1st (24 predictions = 24hrs of the day)

#

I only have 2015 as data (it is mentioned on the limitations of the paper)

desert oar Nov 29, 2021, 7:12 PM

#

i see, maybe you have to assume that there is no change throughout the year although that is very very risky

#

you will at least need weekly seasonality

#

surely transit usage is very different on saturday and sunday vs mon-fri

undone heron Nov 29, 2021, 7:13 PM

#

Indeed, my features make sure that this is accounted for

#

Day of week (Mon - Sun), hour, month (1-12), day of year (1-365), day of month (1-31)
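
Calendar features like the ones listed above can be derived from a timestamp with the standard library. This is a sketch only: the function name and the `is_weekend` flag are illustrative additions, not something taken from the actual feature pipeline.

```py
from datetime import datetime, timedelta


def calendar_features(ts):
    """Return the calendar features for one hourly timestamp."""
    return {
        "day_of_week": ts.weekday(),            # 0 = Monday ... 6 = Sunday
        "hour": ts.hour,                        # 0-23
        "month": ts.month,                      # 1-12
        "day_of_year": ts.timetuple().tm_yday,  # 1-365 (366 in leap years)
        "day_of_month": ts.day,                 # 1-31
        "is_weekend": ts.weekday() >= 5,        # extra flag, not in the list above
    }


# 24 hourly rows for April 1st, 2015, the day being predicted in the chat.
start = datetime(2015, 4, 1)
rows = [calendar_features(start + timedelta(hours=h)) for h in range(24)]

print(rows[0])
```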

#

My big Q is just why the f* is the Linear Regression as a weak learner improving performance if it is so bad?

bold timber Nov 29, 2021, 7:14 PM

#

desert oar perhaps it would be more illustrative to look at the explained variance of _each...

Sorry, I still don't understand this plot. Can you explain what information I get from it?

undone heron Nov 29, 2021, 7:14 PM

#

Even removing it from the stack makes the thing worse lol

desert oar Nov 29, 2021, 7:16 PM

#

bold timber Sorry, I still don't understand about this plot. Can you explain to me what info...

the % of variance explained by each component

#

see how it's nearly 0 for most of the components?

desert oar Nov 29, 2021, 7:17 PM

#

undone heron My big Q is just why the f* is the Linear Regression as a weak learning improvin...

it's not bad, look at your plot

#

it's great actually, it predicts the average hourly trend throughout the day

#

high bias low variance

#

the decision tree takes care of overfitting to all the little fluctuations

#

and it gets smeared out by the linear regression being very _under_fitted

#

and the predictions aren't highly correlated

undone heron Nov 29, 2021, 7:18 PM

#

Hmmmmmm that is an interesting take....

desert oar Nov 29, 2021, 7:18 PM

#

and putting them together with the random forest i guess makes sense?

#

maybe you should do it the other way

#

stage 1: linear regression + random forest
stage 2: decision tree

#

that would be intuitive to me at least

#

or maybe not

#

since you want lower variance in stage 2

#

either way, i can see how the random forest being nonlinear is essential in correctly "re-combining" the 2 models

#

what is the predicted output from the first stacked model? using that same plot

undone heron Nov 29, 2021, 7:20 PM

#

Well, to give more context... The final stage is -> the best ensemble for that social domain. Let me plot the two stacking models I mentioned in the first question. 1 sec

desert oar Nov 29, 2021, 7:20 PM

#

tldr my guess is that your stacking model is doing what time series decomposition does, splitting apart "trend" and "noise", and then recombining them with the "noise" turned down and/or filtered with some kind of low-pass filter

undone heron Nov 29, 2021, 7:25 PM

#

Random Forest as final estimator and Decision Tree + Linear Regression as weak learners

#

Linear Reg as final estimator DT + RF as weak learners

#

wtf

#

the plot is better but the performance is not

#

MAE on second one goes up from 50 to 55

desert oar Nov 29, 2021, 7:28 PM

#

this is "in-sample" prediction performance, right?

undone heron Nov 29, 2021, 7:28 PM

#

nope

#

unseen data

desert oar Nov 29, 2021, 7:28 PM

#

i see

#

you are saying that the 1st one has a slightly lower median absolute error than the 2nd one?

#

conceptually i don't think DT + RF makes much sense

#

RF is definitely not a "weak" learner

#

and RF already is constructed from a bunch of trees

#

that said, i am surprised that the first plot has lower median abs error

#

maybe it has to do with median vs mean

#

since the "tails" of the error distribution are essentially discarded with medians

undone heron Nov 29, 2021, 7:29 PM

#

Wait, technical question... How do I measure performance from a .predict() run?

desert oar Nov 29, 2021, 7:29 PM

#

try mean abs error or mean squared error instead!
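
A small made-up example of why the choice between median and mean matters: the median ignores the tails of the error distribution, so two models can swap ranks depending on the metric. All numbers below are invented for illustration.

```py
from statistics import mean, median

y_true = [100, 120, 130, 150, 160, 170, 180]
pred_a = [102, 118, 131, 149, 161, 171, 280]  # tiny errors, plus one big miss
pred_b = [110, 110, 140, 140, 170, 160, 190]  # consistently off by 10


def abs_errors(actual, predicted):
    """Absolute error for each prediction."""
    return [abs(a - p) for a, p in zip(actual, predicted)]


errors_a = abs_errors(y_true, pred_a)
errors_b = abs_errors(y_true, pred_b)

# Median absolute error prefers model A; mean absolute error prefers model B.
print("median:", median(errors_a), median(errors_b))
print("mean:  ", mean(errors_a), mean(errors_b))
```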

untold tundra Nov 29, 2021, 7:29 PM

#

what's the thing here?

is it that lr(dt(rf(X, y))) has different characteristics vs. rf(dt(lr(X,y ))) ?

untold tundra Nov 29, 2021, 7:30 PM

#

undone heron Wait, technical question... How do I measure performance from a .predict() run?

cross_val_score(...).mean()
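
Spelled out, that could look like the sketch below, next to a plain held-out score computed from `.predict()`. The data and model are placeholders. Note that `cross_val_score` reports error metrics negated (`neg_mean_absolute_error`), and that for a time series an ordered splitter such as `TimeSeriesSplit` is probably a better fit than the default folds used here.

```py
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import cross_val_score, train_test_split

# Placeholder data and model (assumptions for illustration).
X, y = make_regression(n_samples=200, n_features=4, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression()

# Cross-validated estimate on the training data; flip the sign to get a positive MAE.
cv_scores = cross_val_score(
    model, X_train, y_train, cv=5, scoring="neg_mean_absolute_error"
)
cv_mae = -cv_scores.mean()

# Held-out estimate: fit once, predict on unseen rows, compare with the truth.
model.fit(X_train, y_train)
test_mae = mean_absolute_error(y_test, model.predict(X_test))

print(cv_mae, test_mae)
```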

desert oar Nov 29, 2021, 7:30 PM

#

untold tundra what's the thing here? is it that lr(dt(rf(X, y))) has different characteristi...

lr(dt(x,y), rf(x,y)) vs rf(dt(x,y), lr(x,y))

#

i suppose it makes sense to take 1 very deep tree and take a weighted average of it with a forest of many shallow trees

#

that is what the 1st one does

#

im not sure a random forest makes much sense on 2 features either

#

i am almost tempted to do this:

stage1:

random forest
linear regression

stage 2:

fully connected neural network with 1 hidden layer and ~5 hidden layer units

untold tundra Nov 29, 2021, 7:31 PM

#

dt is basically a sort of modal learning, lr is a mean learning, and rf a mean of mode learners

desert oar Nov 29, 2021, 7:31 PM

#

that's a great way to put it

#

in which case, yeah. i guess smashing a mode and mean together kinda makes sense

#

that said, i think maybe this entire problem would benefit from probability calibration and estimated error bounds 🙂

untold tundra Nov 29, 2021, 7:33 PM

#

yeah, it all boils down to some form of average

desert oar Nov 29, 2021, 7:33 PM

#

i really want to see confidence bands around that predicted line

#

or better yet, a probability density surface

#

i'm not sure what a hardcore bayesian machine learning person would do here

undone heron Nov 29, 2021, 7:34 PM

#

Oh Jesus now I'm seeing things said here that I have no idea what they are about lol

#

I'm just trying to get a Bachelor Degree in CS folks

#

I have the feeling something somewhere is wrong

untold tundra Nov 29, 2021, 7:35 PM

#

a neural network is a "mean of means" learner, the first mean() phase projects X into a compressed space; the second mean() is basically a distance from your input x to the nearby points in the compressed space

desert oar Nov 29, 2021, 7:35 PM

#

it doesn't have to be compressed though, right? that's the whole magic of having more hidden units than inputs

#

like kernel methods that used to be fashionable

#

this data isn't public, is it? i might be curious to mess with it, if it's public

untold tundra Nov 29, 2021, 7:36 PM

#

approx,
```py
W, b = mean(historical data)
layer = mean(WX + b)
predictions = mean(layers)
```

desert oar Nov 29, 2021, 7:37 PM

#

oh, i see

untold tundra Nov 29, 2021, 7:37 PM

#

i think if you just replace mean with "taking an expectation", you could probably mess around a bit and get the definition precise

undone heron Nov 29, 2021, 7:38 PM

#

desert oar this data isn't public, is it? i might be curious to mess with it, if it's publi...

aaaa it isnt public per se... It is public but it would literally be a crime if I was the one to give it to you, you'd have to request it from my local Gov.

untold tundra Nov 29, 2021, 7:38 PM

#

a layer's activations are just a weighted mean of the previous layer's, where the weights are W

and W are just basically compressions of the original data

undone heron Nov 29, 2021, 7:38 PM

#

If you want to jump on a voice chat I can share my screen and we can chat about it, I just need to compare ensemble methods on that domain

untold tundra Nov 29, 2021, 7:38 PM

#

so the "intuitive formula" above, i think is largely correct

#

so a NN is basically an RF where the core stat part is a mean() rather than a mode

as you can just see routes from X to Y through the layers as independent regressions (basically, means), and the final layers as ensembling/mean'ing those

#

but i'll let you get back to helping, if that's what's going on

bold timber Nov 29, 2021, 8:01 PM

#

desert oar the % of variance explained by each component

Ok I understand now. Thank you

desert oar Nov 29, 2021, 8:01 PM

#

undone heron aaaa it isnt public per se... It is public but it would literally be a crime if ...

don't worry about it then 😛

desert oar Nov 29, 2021, 8:02 PM

#

untold tundra a layer's activations are just a weighed-mean of the previous layer's, where the...

i guess i only take issue with the "compression" part - it might be a projection into a higher-dimensional space than it started in

#

otherwise i really like this explanation

#

and you're definitely not in the way of helping at all!

untold tundra Nov 29, 2021, 8:06 PM

#

sure, its a higher-d space, but its linear in that space

#

the intuition is that the weights are basically templates of the original dataset

#

so you're projecting a new x into a space where the templates are the axes

#

that space isnt a compression of your new-x, its a compression of the historical data

desert oar Nov 29, 2021, 8:09 PM

#

oh, i see what you're saying

#

yeah, interesting way to think about it

#

i tend to think of it as a "recombining" or "mixing" rather than "compressing"

untold tundra Nov 29, 2021, 8:10 PM

#

well its compression just if len(W) << len(X)

#

in the sense that if len(W) == len(X), under forced interpolation, W == X

desert oar Nov 29, 2021, 8:11 PM

#

you're talking about len() as in the entire data matrix? like len(x) being the number of data points in the training set?

untold tundra Nov 29, 2021, 8:11 PM

#

yeah, well W, b = someop(X, y) right?

#

my claim is the basic heart of someop is mean

#

so really, W, b = means(X, y)

desert oar Nov 29, 2021, 8:11 PM

#

yeah i am on board there

#

linear transformations are linear combinations are means

untold tundra Nov 29, 2021, 8:12 PM

#

if len(W,b) == len(X, y), and if loss(training) == 0, then more-or-less W,b should just be X,y

oblique vine Nov 29, 2021, 8:12 PM

#

Hey, can someone explain how I obtain the model "score" the way sklearn's GridSearchCV does?
I have built a model and now want to compare that score against an external test set, but I can't get the "score" into a normal range (GridSearchCV gives something below 1, I get -60 or something like that)
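
One possible explanation, offered as a guess since the chat does not show the code: for a regressor with no `scoring` argument, GridSearchCV reports the estimator's default score, which is R². R² is at most 1 but has no lower bound, so a model that does much worse than predicting the mean of the test targets can score far below zero. A plain-Python sketch with invented numbers:

```py
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [10.0, 11.0, 12.0, 13.0]
close_predictions = [10.1, 10.9, 12.2, 12.8]    # small errors
shifted_predictions = [30.0, 31.0, 32.0, 33.0]  # right shape, wrong scale/offset

r2_close = r2_score(y_true, close_predictions)
r2_shifted = r2_score(y_true, shifted_predictions)

print(r2_close, r2_shifted)
```

If the external test set is preprocessed or scaled differently from the training data, predictions can be systematically shifted like this, which is one way to end up with a strongly negative score.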

desert oar Nov 29, 2021, 8:12 PM

#

ok i see what you mean, it's an average over all data points within the possibly very-high-dimensional feature space

untold tundra Nov 29, 2021, 8:13 PM

#

yeah, it's mean**s**(X, y)

#

so if those means are just the same number as the original data points, and if you can predict all of those points without error, those means are just the data points

simple ivy Nov 29, 2021, 8:14 PM

#

hey everyone, is anyone here familiar with ONNX models?

desert oar Nov 29, 2021, 8:14 PM

#

do you think in that sense, a neural network is fundamentally different from e.g. an svm or linear regression (maybe with polynomial or other hand-transformed features)?

#

or even a general additive model for that matter

untold tundra Nov 29, 2021, 8:15 PM

#

there's a paper which provies all grad desc learners are eqv. to svm

#

*proves

desert oar Nov 29, 2021, 8:15 PM

#

i remember hearing something like that

untold tundra Nov 29, 2021, 8:15 PM

#

in my mind, i see the NN alg as basically a dial from: knn -> ensemble of lr

desert oar Nov 29, 2021, 8:16 PM

#

interesting idea

untold tundra Nov 29, 2021, 8:16 PM

#

if len(W) <<< len(X) is ensemble(LR), if len(W) ~= len(X), its knn

desert oar Nov 29, 2021, 8:17 PM

#

knn as in nearest neighbors?

#

i haven't heard that idea before

untold tundra Nov 29, 2021, 8:17 PM

#

maybe ensemble knn might be more accurate

#

yeah, to me its kinda obvious, but a lot of the marketing BS requires people basically ignore the weights

#

once you sub W = mean(historical data) into all the formulas, it isnt that much of a mystery

#

W = means(historical) , so W = historical if len(W) = len(historical) and loss(historical) = 0

#

which makes it knn

#

should be kinda obvious from autoencoders and the like too

#

an autoencoder just shows that the weights are basically "local approximations" / compressions of the original data

and the main mechanism of a NN is just to put your new point into the spaces of those approximation points, and take a mean

desert oar Nov 29, 2021, 8:26 PM

#

oh i see, basically if you have 1 weight per data point you're just taking local approximation around each data point?

#

fair enough

#

i have to remember that you're talking about a time series here

#

and not a "flat" dataset of rows and columns

#

i think i was hung up on that point

untold tundra Nov 29, 2021, 8:28 PM

#

me?

desert oar Nov 29, 2021, 8:28 PM

#

are you?

untold tundra Nov 29, 2021, 8:28 PM

#

am i?

#

i'm speaking generically

desert oar Nov 29, 2021, 8:28 PM

#

i guess not then!

untold tundra Nov 29, 2021, 8:28 PM

#

i'm not the time series person

desert oar Nov 29, 2021, 8:28 PM

#

i guess i'm not sure if you're speaking abstractly about the number of parameters or about the actual shape of the weight matrix/tensor

untold tundra Nov 29, 2021, 8:29 PM

#

oh i'm speaking very approximately

#

= here means, "is, at its heart, "

desert oar Nov 29, 2021, 8:29 PM

#

because in the basic 1-layer feedforward case you have a "1xH" weight vector where H is the number of hidden units

#

i know that in general the closer you get to 1 parameter : 1 data point, the closer you get to just memorizing the original data. but i'm not sure how well that intuition generalizes

untold tundra Nov 29, 2021, 8:30 PM

#

if you do KNN(k=1).fit(X,y).predict(X) you get exactly y, ie., 0 training loss ... why? because W = (X, y) by design

if you do repeat: NN(num_weights=len(X)).fit(X) until ==y, then you've got 0 training loss, ... why?

well it isnt literally that W = (X, y) , but W "is basically" shuffled(X, y)
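
The 1-nearest-neighbor half of that claim is easy to verify. The sketch below uses a hand-rolled NumPy 1-NN (not scikit-learn's `KNeighborsRegressor`) on random data: the "model" is literally the stored `(X, y)` pairs, so predicting on the training set returns `y` exactly.

```py
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)


def predict_1nn(X_train, y_train, X_query):
    """Return the target of the closest training point for each query row."""
    # Squared Euclidean distance between every query and every training point.
    diffs = X_query[:, None, :] - X_train[None, :, :]
    squared_distances = (diffs**2).sum(axis=2)
    nearest = squared_distances.argmin(axis=1)
    return y_train[nearest]


# Each training point is its own nearest neighbor (distance 0), so the
# training error is exactly zero.
train_predictions = predict_1nn(X, y, X)
print(np.array_equal(train_predictions, y))
```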

desert oar Nov 29, 2021, 8:31 PM

#

sure, but what do you mean by "repeat" in this case? are you talking about stacking more layers? adding more weights? running more epochs?

untold tundra Nov 29, 2021, 8:31 PM

#

running more epochs

desert oar Nov 29, 2021, 8:31 PM

#

i don't disagree btw, but i want to make sure i understand your point if i am to borrow the idea 😉

untold tundra Nov 29, 2021, 8:31 PM

#

you can see it as a probabilistic condition on W

#

like, what happens if len(W) >> len(X)

#

then it is certainly never the case that the entries of W would be the entries of X

desert oar Nov 29, 2021, 8:32 PM

#

is running more epochs really increasing the size of W though?

untold tundra Nov 29, 2021, 8:32 PM

#

no, its about permuting W until its just a rotation of X into a new space

desert oar Nov 29, 2021, 8:33 PM

#

ok, sure. or iterating as close as possible thereto

untold tundra Nov 29, 2021, 8:33 PM

#

yeah

#

i mean, i think it is pretty exact

#

if a PCA is basically just "rotate X by its means"

#

then a NN under these conditions is just the same thing

desert oar Nov 29, 2021, 8:34 PM

#

oh you are saying specifically if you have at least 1 weight per data point

untold tundra Nov 29, 2021, 8:34 PM

#

ie., W = rotate (X,y) by its means

desert oar Nov 29, 2021, 8:34 PM

#

yes, that makes sense

untold tundra Nov 29, 2021, 8:34 PM

#

yeah, if len(W) < len(X) you get more compressive

#

and end up closer to an ensemble of linear regressions

#

it's always just: mean(means...( x rotated-by means...(history)))

desert oar Nov 29, 2021, 8:36 PM

#

i guess i still don't have a great sense for what the len(W) is. if you have two one-hidden-layer networks, but one has 5 hidden units and the other has 10, the second one has greater len(W) in your eyes, right?

untold tundra Nov 29, 2021, 8:36 PM

#

i just mean all the parameters

#

as in, every single thing under optimization

desert oar Nov 29, 2021, 8:36 PM

#

yeah, ok

#

really interesting idea

#

makes sense intuitively but i might have to simulate and convince myself 🙂

#

and maybe write out some equations

#

certainly i agree with the idea that if you have enough parameters you end up memorizing and reshuffling the data rather than compressing it

#

that such a thing conceptually is similar to k-nearest-neighbors sounds logical but somehow isn't fitting right into my head. will have to tinker with it

untold tundra Nov 29, 2021, 8:38 PM

#

well a NN is just a prediction fn, f =A W1 X...A W2 X... A W3 X
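
Read as nested function application, that formula can be written as a short NumPy forward pass. This is a sketch under assumptions: `A` is taken to be an elementwise activation (tanh here), biases are left out as in the formula, and the layer sizes are arbitrary. The total parameter count at the end is what "len(W)" refers to in this discussion.

```py
import numpy as np

rng = np.random.default_rng(0)


def forward(x, weights, activation=np.tanh):
    """Apply h = A(W @ h) once per weight matrix, starting from the input x."""
    h = x
    for W in weights:
        h = activation(W @ h)
    return h


# Arbitrary layer sizes (assumption): 4 inputs -> 8 -> 8 -> 1 output.
weights = [
    rng.normal(size=(8, 4)),  # W1
    rng.normal(size=(8, 8)),  # W2
    rng.normal(size=(1, 8)),  # W3
]

x = rng.normal(size=4)
output = forward(x, weights)

# Every entry of every weight matrix is a parameter under optimization.
n_parameters = sum(W.size for W in weights)

print(output.shape, n_parameters)
```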

desert oar Nov 29, 2021, 8:38 PM

#

sure

untold tundra Nov 29, 2021, 8:39 PM

#

so maybe, if we say, entries of W* = entries of W1, W2, W3

#

my claim amounts to something like, AW* is just a pca-like rotation of X

#

when len(W) == len(X) and when loss(training) == 0

#

ie., when the network is predicting its historical data, and when the number of parameters = the number of data points

desert oar Nov 29, 2021, 8:41 PM

#

i follow you that far

untold tundra Nov 29, 2021, 8:41 PM

#

right, so maybe the idea is something like

#

"roughly", AX on W* == AW* on X

#

ie., W* and X are basically just the same

#

this isnt how i arrived at the conclusion though

#

i arrived at my general view, by:
(1) dropout on NNs basically ensembles them.. wait, softmax/last-layer is just a mean() anyway, so they're coming into last layer as an ensemble

(2) all ML reduces to mean(), mode(), etc

(3) algs which predict their training data exactly are overfit, ie., their parameters are closer to the original data than they should be

(4) if you're perfectly interpolated and have sufficient parameters to play with, it is extremely likely your parameters are just your original points ("in a rotated space")

#

and also, if you think about the two branches of alg, either you force distributional assumptions on your historical data.. in which case you fit to a model

or you dont, in which case you fit to the data

#

a NN is just a dial between those

desert oar Nov 29, 2021, 8:54 PM

#

yeah, that much i totally follow

#

i really like that line of reasoning

#

so i can definitely see how that would lead to something knn-ish

#

i wouldn't really describe it as nearest neighbors, but certainly an increasingly local approximation

#

i also really like the mode vs mean thing

#

going to borrow that one

#

ty for the insights!

lapis sequoia Nov 29, 2021, 9:32 PM

#

to classify 1000 labels how many imgs per label do i need?

#

gonna use albumentations

untold tundra Nov 29, 2021, 9:32 PM

#

there's no universal answer to that question, if each label is a basic shape, then a handful

lapis sequoia Nov 29, 2021, 9:33 PM

#

wdym with basic shape

untold tundra Nov 29, 2021, 9:33 PM

#

it depends on what the images are of

lapis sequoia Nov 29, 2021, 9:33 PM

#

cartoon

#

anime

untold tundra Nov 29, 2021, 9:34 PM

#

at a guess, 1k/label is a minimum

lapis sequoia Nov 29, 2021, 9:34 PM

#

HAHAHAAH

#

i barely can get 100

#

xD

untold tundra Nov 29, 2021, 9:34 PM

#

well just use that and see what happens

#

how big are the images?

lapis sequoia Nov 29, 2021, 9:34 PM

#

160x160

untold tundra Nov 29, 2021, 9:35 PM

#

c. 25,000 pixels/image, 100 images/label, 1000 labels
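
The rough numbers above, worked out exactly (assuming 160x160 images, 100 images per label and 1000 labels, as stated in the chat; color channels are not counted):

```py
width, height = 160, 160
images_per_label = 100
labels = 1000

pixels_per_image = width * height            # 25,600, i.e. "c. 25,000"
total_images = images_per_label * labels     # size of the dataset before augmentation
total_pixels = pixels_per_image * total_images

print(pixels_per_image, total_images, total_pixels)
```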

lapis sequoia Nov 29, 2021, 9:35 PM

#

what is c.?

untold tundra Nov 29, 2021, 9:35 PM

#

"circa", it means approx.

lapis sequoia Nov 29, 2021, 9:36 PM

#

yeah, but u are forgetting albumentations

untold tundra Nov 29, 2021, 9:36 PM

#

yeah, you can augment

lapis sequoia Nov 29, 2021, 9:37 PM

#

well, gotta scrape the images first. I did, but some images do not correspond to the label, so i had to clean them manually, and after cleaning 170 labels i got bored. I guess it will be easier with better scraping

untold tundra Nov 29, 2021, 9:37 PM

#

yip, you could probably pay on mechanical turk

#

to have them labelled for you

lapis sequoia Nov 29, 2021, 9:37 PM

#

uff not gonna pay for a hobby xD

#

thanks tho

quiet vault Nov 29, 2021, 9:42 PM

#

Does anyone know how to use google cloud storage with colab

#

I want to know how to access a folder

lapis sequoia Nov 29, 2021, 9:42 PM

#

click on the drive folder inside colab

#

it will give u a link and request for a token

#

just click on the link

quiet vault Nov 29, 2021, 9:47 PM

#

ok thanks

pure pumice Nov 30, 2021, 12:29 AM

#

does anyone know how to filter out items from a dataframe?

#

to only show those specific items in that column

calm thicket Nov 30, 2021, 12:30 AM

#

df.loc[df['column'] == value]

pure pumice Nov 30, 2021, 12:30 AM

#

calm thicket `df.loc[df['column'] == value]`

can i show you what i mean sorry

calm thicket Nov 30, 2021, 12:31 AM

#

sure

pure pumice Nov 30, 2021, 12:32 AM

#

calm thicket sure

{'Index': [0, 1, 2, 3, 4],
'ID': [1, 2, 3, 4, 5],
'Title': ['Inception',
'The Matrix',
'Avengers: Infinity War',
'Back to the Future',
'The Good, the Bad and the Ugly'],
'Year': [2010, 1999, 2018, 1985, 1966],
'Age': ['13+', '18+', '13+', '7+', '18+'],
'IMDb': [8.8, 8.7, 8.5, 8.5, 8.8],
'Rotten Tomatoes': [87, 87, 84, 96, 97],
'Netflix': [1, 1, 1, 1, 1],
'Hulu': [0, 0, 0, 0, 0],
'Prime Video': [0, 0, 0, 0, 1],
'Disney+': [0, 0, 0, 0, 0],
'Type': [0, 0, 0, 0, 0],
'Directors': ['Christopher Nolan',
'Lana Wachowski,Lilly Wachowski',
'Anthony Russo,Joe Russo',
'Robert Zemeckis',
'Sergio Leone'],
'Genres': ['Action,Adventure,Sci-Fi,Thriller',
'Action,Sci-Fi',
'Action,Adventure,Sci-Fi',
'Adventure,Comedy,Sci-Fi',
'Western'],
'Country': ['United States,United Kingdom',
'United States',
'United States',
'United States',
'Italy,Spain,West Germany'],
'Language': ['English,Japanese,French',
'English',
'English',
'English',
'Italian'],
'Runtime': [148.0, 136.0, 149.0, 116.0, 161.0]}

#

so this is the data in my dataframe

#

i need to select 4 genres of my choice from the genres column and filter the dataframe so that only those 4 are left

calm thicket Nov 30, 2021, 12:34 AM

#

ok

pure pumice Nov 30, 2021, 12:35 AM

#

calm thicket ok

okay wait sorry

#

just ignore everything i just said

#

the first step i had to do was to take the genres column and only keep the first genre in it, like if the genres column has comedy,drama,romance. I had to turn it into just comedy

#

df['Genres'] = df['Genres'].str.extract(r'([A-Za-z]+)')
df.head()

#

i used that^

#

now i am being asked in the instructions to select 4 genres of my choice from the genres column and filter the dataframe so that only those 4 are left

calm thicket Nov 30, 2021, 12:37 AM

#

i c

#

you can use

#

!d pandas.Series.isin

arctic wedgeBOT Nov 30, 2021, 12:38 AM

#

pandas.Series.isin


```py
Series.isin(values)
```
Whether elements in Series are contained in values.

Return a boolean Series showing whether each element in the Series matches an element in the passed sequence of values exactly.

calm thicket Nov 30, 2021, 12:39 AM

#

so, df['Genres'].isin(genres) is a series with bools you can index with

pure pumice Nov 30, 2021, 12:41 AM

#

so using the series.isin

#

id have each genre in the values area

pure pumice Nov 30, 2021, 12:41 AM

#

arctic wedge

each genre i include in my filter

calm thicket Nov 30, 2021, 12:44 AM

#

yeah

#

in a list

#

or tuple, whatever

pure pumice Nov 30, 2021, 12:49 AM

#

calm thicket yeah

okay thank you, gonna try rn

pure pumice Nov 30, 2021, 12:52 AM

#

calm thicket in a list

do u mean like this df['Genres'].isin('Action', 'Adventure', 'Sci-fi', 'Thriller')

calm thicket Nov 30, 2021, 12:52 AM

#

i think you need a list, not separate arguments

pure pumice Nov 30, 2021, 12:54 AM

#

calm thicket i think you need a list, not separate arguments

but you know how before this step

#

i used df['Genres'] = df['Genres'].str.extract(r'([A-Za-z]+)')
df.head()

#

to only display one genre in the column

#

meaning the genres aren't in a list anymore
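
The list passed to `isin` is the list of values to keep, so it does not matter that each cell now holds a single string. A minimal sketch using the five rows from the sample data: the `chosen` genres are an arbitrary pick, and the first genre is taken with `str.split` here instead of the regex from the chat (a letters-only regex would cut "Sci-Fi" down to "Sci").

```py
import pandas as pd

df = pd.DataFrame(
    {
        "Title": [
            "Inception",
            "The Matrix",
            "Avengers: Infinity War",
            "Back to the Future",
            "The Good, the Bad and the Ugly",
        ],
        "Genres": [
            "Action,Adventure,Sci-Fi,Thriller",
            "Action,Sci-Fi",
            "Action,Adventure,Sci-Fi",
            "Adventure,Comedy,Sci-Fi",
            "Western",
        ],
    }
)

# Step 1: keep only the first genre of each comma-separated string.
df["Genres"] = df["Genres"].str.split(",").str[0]

# Step 2: pick 4 genres (arbitrary choice) and keep only the matching rows.
chosen = ["Action", "Western", "Comedy", "Drama"]
mask = df["Genres"].isin(chosen)  # boolean Series, one entry per row
filtered = df[mask]

print(filtered)
```

Here "Back to the Future" drops out because its first genre, "Adventure", is not in `chosen`.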

quiet vault Nov 30, 2021, 1:01 AM

#

I am trying to access a directory with a ton of images stored in Google Cloud Storage using Colab. I type this in:
```py
!gcloud config set project {project_id}
!gsutil cp -r dir gs://digits
```
And get the following error:
```
Updated property [core/project].
CommandException: No URLs matched: dir
```
Can someone tell me what I have done wrong?

old plinth Nov 30, 2021, 1:29 AM

#

Hey guys, I have a question about TensorFlow. I have been working with PyTorch for a long time and felt I should give TensorFlow a shot. Right now I understand custom training loops and so on. My only question: PyTorch has the custom Dataset and DataLoader classes from torch.utils.data. Is there anything similarly flexible in TensorFlow that makes custom pre-processing of data easy? What is most commonly used for creating a custom dataset, like we do in PyTorch?

pure pumice Nov 30, 2021, 1:41 AM

#

@serene scaffold hey can I please get a little bit more help before i hand this project in?

#

{'Index': [0, 1, 2, 3, 4],
'ID': [1, 2, 3, 4, 5],
'Title': ['Inception',
'The Matrix',
'Avengers: Infinity War',
'Back to the Future',
'The Good, the Bad and the Ugly'],
'Year': [2010, 1999, 2018, 1985, 1966],
'Age': ['13+', '18+', '13+', '7+', '18+'],
'IMDb': [8.8, 8.7, 8.5, 8.5, 8.8],
'Rotten Tomatoes': [87, 87, 84, 96, 97],
'Netflix': [1, 1, 1, 1, 1],
'Hulu': [0, 0, 0, 0, 0],
'Prime Video': [0, 0, 0, 0, 1],
'Disney+': [0, 0, 0, 0, 0],
'Type': [0, 0, 0, 0, 0],
'Directors': ['Christopher Nolan',
'Lana Wachowski,Lilly Wachowski',
'Anthony Russo,Joe Russo',
'Robert Zemeckis',
'Sergio Leone'],
'Genres': ['Action,Adventure,Sci-Fi,Thriller',
'Action,Sci-Fi',
'Action,Adventure,Sci-Fi',
'Adventure,Comedy,Sci-Fi',
'Western'],
'Country': ['United States,United Kingdom',
'United States',
'United States',
'United States',
'Italy,Spain,West Germany'],
'Language': ['English,Japanese,French',
'English',
'English',
'English',
'Italian'],
'Runtime': [148.0, 136.0, 149.0, 116.0, 161.0]}

serene scaffold Nov 30, 2021, 1:47 AM

#

@pure pumice I'm busy RN but I guess ask your question

pure pumice Nov 30, 2021, 1:49 AM

#

serene scaffold @pure pumice I'm busy RN but I guess ask your question

thanks, remember when yesterday we only kept the first genre in the genre column

pure pumice Nov 30, 2021, 1:49 AM

#

serene scaffold @pure pumice I'm busy RN but I guess ask your question

using df['Genres'] = df['Genres'].str.extract(r'([A-z]+)') #
df.head()

#

well now i need to

#

Select 4 Genres of your choice. Filter your dataframe so that only those 4 Genres are left

#

after that first step which we did

tribal oracle Nov 30, 2021, 1:58 AM

#

any data scientist can tell me:

#

you store the data on an excel, for example, when you need to work with it, you transform it ALL from excel to dict(for example)?

#

or you work directly on the excel?

serene scaffold Nov 30, 2021, 2:10 AM

#

pure pumice # Select 4 Genres of your choice. Filter your dataframe so that only those 4 Gen...

well, what have you tried?

serene scaffold Nov 30, 2021, 2:21 AM

#

tribal oracle you store the data on an excel, for example, when you need to work with it, you ...

there are libraries for reading excel files into Python

#

it's basically effortless.

tribal oracle Nov 30, 2021, 2:24 AM

#

serene scaffold there are libraries for reading excel files into Python

i know, that was not my question, i wanted to know how you handle with it

serene scaffold Nov 30, 2021, 2:25 AM

#

handle with it?

tribal oracle Nov 30, 2021, 2:25 AM

#

with the data

#

i'll send an example, one sec

serene scaffold Nov 30, 2021, 2:25 AM

#

it depends on what you're trying to do shrug2

serene scaffold Nov 30, 2021, 2:25 AM

#

tribal oracle i'll send an example, one sec

don't post any screenshots.

tribal oracle Nov 30, 2021, 2:27 AM

#

def load_excel(arquivo_excel, index_coluna):
    df = pd.read_excel(arquivo_excel)
    df.set_index(index_coluna).T.to_dict('list')
    return df.to_dict(orient='list')

i'm doing that to load from the .xlsx to (for example) that:

dict_machine = {'ignore-2': [],
                'ignore-1': [],
                "ignore": [],
                "ignore2": [],
                "ignore4": []}

serene scaffold Nov 30, 2021, 2:28 AM

#

tribal oracle def load_excel(arquivo_excel, index_coluna): df = pd.read_excel(arquiv...

so it's a dict of lists. do you want that? Also df.set_index(index_coluna).T.to_dict('list') does not modify the DataFrame in-place, so that statement has no effect.

tribal oracle Nov 30, 2021, 2:28 AM

#

wait, so i can totally remove it?

serene scaffold Nov 30, 2021, 2:29 AM

#

tribal oracle wait, so i can totally remove it?

yes. or that can be the statement that you return

#

but at that point you might as well have

def load_excel(arquivo_excel, index_coluna):
    return pd.read_excel(arquivo_excel).set_index(index_coluna).T.to_dict('list')

tribal oracle Nov 30, 2021, 2:31 AM

#

hmmmm, let me try

#

nah, its returning a key error

serene scaffold Nov 30, 2021, 2:33 AM

#

shrug2

tribal oracle Nov 30, 2021, 2:33 AM

#

maybe something below is conflicting

#

but anyways, nevermind
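
A minimal sketch of the point above that `set_index(...)` returns a new object instead of modifying the DataFrame in place. It uses a small in-memory DataFrame rather than an Excel file so it runs anywhere, and the column names `maquina`, `a` and `b` are made up for illustration. The `KeyError` mentioned above would be consistent with `index_coluna` not matching a column name in the sheet, but that is only a guess since the traceback isn't shown.

```python
import pandas as pd

# Hypothetical stand-in for what pd.read_excel(...) might return.
df = pd.DataFrame({"maquina": ["m1", "m2"], "a": [1, 2], "b": [3, 4]})

# set_index returns a NEW DataFrame; the result is discarded here, so df is unchanged.
df.set_index("maquina").T.to_dict("list")
unchanged = df.to_dict(orient="list")

# Returning (or assigning) the chained expression keeps the effect of set_index.
by_index = df.set_index("maquina").T.to_dict("list")

print(unchanged)  # {'maquina': ['m1', 'm2'], 'a': [1, 2], 'b': [3, 4]}
print(by_index)   # {'m1': [1, 3], 'm2': [2, 4]}

# A name that is not a column raises KeyError.
try:
    df.set_index("not_a_column")
except KeyError as exc:
    print("KeyError:", exc)
```

Which of the two dict shapes is wanted depends on how the rest of the program looks values up: by column name (`unchanged`) or by the index column's values (`by_index`).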

pure pumice Nov 30, 2021, 3:13 AM

#

serene scaffold well, what have you tried?

SORRRY im late filter1 = df['Genres'] == "Action,Adventure,Sci-Fi,Thriller"

serene scaffold Nov 30, 2021, 3:13 AM

#

pure pumice SORRRY im late filter1 = df['Genres'] == "Action,Adventure,Sci-Fi,Thriller"

Can you think of why that doesn't work

pure pumice Nov 30, 2021, 3:13 AM

#

because

#

the genres aren't listed as that

pure pumice Nov 30, 2021, 3:14 AM

#

serene scaffold Can you think of why that doesn't work

after we did the df['Genres'] = df['Genres'].str.extract(r'([A-z]+)') #
df.head() yesterday

serene scaffold Nov 30, 2021, 3:14 AM

#

@pure pumice but also you're comparing them to one string

#

Which is ordered

pure pumice Nov 30, 2021, 3:14 AM

#

ya because I have them all under one ""

serene scaffold Nov 30, 2021, 3:14 AM

#

pure pumice after we did the df['Genres'] = df['Genres'].str.extract(r'([A-z]+)') # df.head(...

Also that's going to remove all but the first genre

pure pumice Nov 30, 2021, 3:16 AM

#

ya

pure pumice Nov 30, 2021, 3:16 AM

#

serene scaffold Also that's going to remove all but the first genre

so this filter1 = df['Genres'] == "Action,Adventure,Sci-Fi,Thriller" only works before we removed all but first

serene scaffold Nov 30, 2021, 3:38 AM

#

pure pumice so this filter1 = df['Genres'] == "Action,Adventure,Sci-Fi,Thriller" only works b...

it would only work for rows where the Genre is literally "Action,Adventure,Sci-Fi,Thriller"

pure pumice Nov 30, 2021, 3:38 AM

#

serene scaffold it would only work for rows where the Genre is literally `"Action,Adventure,Sci-...

ya it only worked with that

#

before we removed

#

everything but the first one

serene scaffold Nov 30, 2021, 3:39 AM

#

pure pumice everything but the first one

but you don't really want that either, do you?

pure pumice Nov 30, 2021, 3:39 AM

#

serene scaffold but you don't really want that either, do you?

nah

pure pumice Nov 30, 2021, 4:22 AM

#

serene scaffold but you don't really want that either, do you?

filter1 = df['Genres'] == "Action"
filter2 = df["Genres"] == 'Adventure'
filter3 = df['Genres'] == 'Sci-Fi'
filter4 = df['Genres'] == 'Thriller'

#

this wouldnt work either eh?

serene scaffold Nov 30, 2021, 4:23 AM

#

@pure pumice you don't want to be using == with the whole column

#

Because it will check if the value in that column matches the string exactly, from beginning to end.

pure pumice Nov 30, 2021, 4:29 AM

#

serene scaffold Because it will check if the value in that column matches the string exactly, fr...

What would u suggest doing then?

#

cuz just one = sign doesnt work either

serene scaffold Nov 30, 2021, 4:30 AM

#

pure pumice cuz just one = sign doesnt work either

Well of course not. That's for assignment

pure pumice Nov 30, 2021, 4:40 AM

#

serene scaffold Well of course not. That's for assignment

am i on the right track at least?

serene scaffold Nov 30, 2021, 4:42 AM

#

Not at the moment, if I'm being perfectly honest. Your goal is to get those rows where at least one of the genres belongs to one of four that you pick, yes?

pure pumice Nov 30, 2021, 4:45 AM

#

serene scaffold Not at the moment, if I'm being perfectly honest. Your goal is to get those rows...

Select 4 Genres of your choice. Filter your dataframe so that only those 4 Genres are left

#

exact instructions^

serene scaffold Nov 30, 2021, 4:46 AM

#

@pure pumice so you need to pick those rows where the genres are a subset of the four that you pick

pure pumice Nov 30, 2021, 4:46 AM

#

yes

#

like when i do df.head()

#

only movies in those 4 genres should display

serene scaffold Nov 30, 2021, 4:49 AM

#

@pure pumice "only movies in those four genres" needs a more robust definition, since a movie can belong to more than one genre

#

Does it need to belong to all four? Exactly one? At least one (but possibly others that aren't?)?

pure pumice Nov 30, 2021, 4:51 AM

#

just one

#

because before this

#

we deleted all the genres from the column and kept one

serene scaffold Nov 30, 2021, 4:51 AM

#

Exactly one?

pure pumice Nov 30, 2021, 4:51 AM

#

so only one genre will show

#

if uk what i mean

serene scaffold Nov 30, 2021, 4:52 AM

#

pure pumice we deleted all the genres from the column and kept one

Are you sure you were supposed to delete the others?

pure pumice Nov 30, 2021, 4:52 AM

#

serene scaffold Are you sure you were supposed to delete the others?

Using the original dataframe, take the genres column, and only keep the first genre.

For example, if the value was previously Comedy,Drama,Romance, then it would become Comedy

Select 4 Genres of your choice. Filter your dataframe so that only those 4 Genres are left

Create a pivot table of the average runtime of movies over time. The rows are therefore the year

The columns will be the 4 Genres you filtered for

serene scaffold Nov 30, 2021, 4:53 AM

#

@pure pumice alright

#

!docs pandas.Series.isin

arctic wedgeBOT Nov 30, 2021, 4:53 AM

#

pandas.Series.isin

Series.isin(values)
Whether elements in Series are contained in values.

Return a boolean Series showing whether each element in the Series matches an element in the passed sequence of values exactly.

pure pumice Nov 30, 2021, 4:55 AM

#

okay so where it says values

#

id put in the genres

#

in a list?

#

@serene scaffold

serene scaffold Nov 30, 2021, 4:56 AM

#

@pure pumice try it and see

pure pumice Nov 30, 2021, 5:00 AM

#

serene scaffold @pure pumice try it and see

would series

#

be df?

serene scaffold Nov 30, 2021, 5:01 AM

#

@pure pumice a series is basically a stand alone column

#

I must now sleep

pure pumice Nov 30, 2021, 5:03 AM

#

serene scaffold @pure pumice a series is basically a stand alone column

so i would put "Genres" instead of series

#

okay goodnight

serene scaffold Nov 30, 2021, 5:03 AM

#

The genres column is a series.

pure pumice Nov 30, 2021, 5:04 AM

#

serene scaffold The genres column is a series.

Series.isin('Action','Adventure','Sci-Fi','Thriller')

serene scaffold Nov 30, 2021, 5:04 AM

#

That won't work

pure pumice Nov 30, 2021, 5:04 AM

#

ya series not defined

serene scaffold Nov 30, 2021, 5:04 AM

#

The genre column is the series

#

Also you passed four strings individually as four arguments

#

Not a list

#

Good luck!

pure pumice Nov 30, 2021, 5:05 AM

#

so they should all be under one " "

serene scaffold Nov 30, 2021, 5:05 AM

#

No.

#

You should have passed one list with four strings.
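
A small sketch of the difference being described, assuming a toy `Genres` series made up for illustration: `isin` takes one list-like argument, not several strings and not one long comma-joined string.

```python
import pandas as pd

genres = pd.Series(["Action", "Western", "Adventure", "Comedy"], name="Genres")
wanted = ["Action", "Adventure", "Sci-Fi", "Thriller"]

# One list argument: returns a boolean Series, True where the value is in the list.
mask = genres.isin(wanted)
print(mask.tolist())  # [True, False, True, False]

# Four separate string arguments: isin() only accepts a single `values` argument.
try:
    genres.isin("Action", "Adventure", "Sci-Fi", "Thriller")
except TypeError as exc:
    print("TypeError:", exc)

# A single string is rejected too; pandas asks for a list-like.
try:
    genres.isin("Action,Adventure,Sci-Fi,Thriller")
except TypeError as exc:
    print("TypeError:", exc)
```

The boolean mask can then be used to index the DataFrame, e.g. `df[mask]`.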

pure pumice Nov 30, 2021, 5:06 AM

#

okay ya i cant do this

#

i will try again

#

tomorrow

serene scaffold Nov 30, 2021, 5:06 AM

#

meow_party

cold skiff Nov 30, 2021, 5:49 AM

#

oh both the argument and the parameters have to be set like objects

torpid elk Nov 30, 2021, 6:02 AM

#

I’m trying to learn data science for python. Any good resources you guys can recommend? Or site where I can practice my skills?

magic edge Nov 30, 2021, 6:03 AM

#

torpid elk I’m trying to learn data science for python. Any good resources you guys can rec...

How about an AI that learns by talking

torpid elk Nov 30, 2021, 6:04 AM

#

magic edge How about an AI that learns by talking

Too Advanced for me right now I think. I’m not that skilled yet.

magic edge Nov 30, 2021, 6:22 AM

#

torpid elk Too Advanced for me right now I think. I’m not that skilled yet.

How about a game using variables. like memory game

torpid elk Nov 30, 2021, 6:26 AM

#

magic edge How about a game using variables. like memory game

Sounds doable. I’ll explore some existing projects. Thanks

frank torrent Nov 30, 2021, 8:47 AM

#

Hi, does anyone know how I could create an orientation histogram using SIFT? I have found the key points and now would like to make an orientation histogram for some of them

next lance Nov 30, 2021, 9:24 AM

#

Why can we not make fully auto driving cars when we already have technology to do that

untold tundra Nov 30, 2021, 9:33 AM

#

we dont have the technology

rigid dawn Nov 30, 2021, 9:35 AM

#

What's the difference between MYSQL, POWER BI , AND TABLEAU

untold tundra Nov 30, 2021, 9:36 AM

#

mysql = database for storing data in tables; powerbi = microsoft data visualization & reporting tool which gets data from a database; tableau = non-microsoft "alternative" to powerbi, maybe a bit harder to use

rigid dawn Nov 30, 2021, 9:37 AM

#

So, basically powerbi extracts data from MySQL

#

??

untold tundra Nov 30, 2021, 9:38 AM

#

it's capable of doing that, yes

rigid dawn Nov 30, 2021, 9:39 AM

#

What should I learn first?

#

I am on the data analysis road

#

Done with python basics( i still get stuck a lot of times)

#

Modules also

#

Numpy pandas matplotlib seaborn plotly

#

Now what should I do?

untold tundra Nov 30, 2021, 9:41 AM

#

err, it is very important to learn SQL

#

the easiest way of doing that first is using sqlite3 in python, as you dont have to install anything

#

so go learn SQL & sqlite3, and when you've done that, then maybe look at powerbi

rigid dawn Nov 30, 2021, 9:42 AM

#

untold tundra the easiest way of doing that first is using sqlite3 in python, as you dont have...

Oh ,is that a module?

untold tundra Nov 30, 2021, 9:42 AM

#

import sqlite3

rigid dawn Nov 30, 2021, 9:42 AM

#

So, I can use all python modules in MySQL?

untold tundra Nov 30, 2021, 9:42 AM

#

sqlite3 is a simpler database, that is a bit like mysql
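
A minimal sketch of what practising SQL through the standard-library `sqlite3` module can look like. The `movies` table and its rows are made up for illustration (the titles are borrowed from the sample data posted earlier in the channel).

```python
import sqlite3

# An in-memory database: nothing is written to disk and nothing needs installing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, year INTEGER, imdb REAL)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        ("Inception", 2010, 8.8),
        ("The Matrix", 1999, 8.7),
        ("Back to the Future", 1985, 8.5),
    ],
)

# A basic SELECT with a filter and ordering; the ? placeholder is filled from the tuple.
rows = conn.execute(
    "SELECT title, year FROM movies WHERE year < ? ORDER BY year", (2005,)
).fetchall()
print(rows)  # [('Back to the Future', 1985), ('The Matrix', 1999)]
conn.close()
```

The SQL itself (`CREATE TABLE`, `INSERT`, `SELECT ... WHERE ... ORDER BY`) carries over to MySQL with few changes, which is why it works as a first step.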

rigid dawn Nov 30, 2021, 9:43 AM

#

Sure, I would try that first. And after that move to powerbi

#

Bro, can I send you friend request? If that's not an issue

#

If you comfortable with that

untold tundra Nov 30, 2021, 9:54 AM

#

you can send one, i wont accept immediately; i might accept at some point

acoustic forge Nov 30, 2021, 10:40 AM

#

What's the best way to deploy a machine learning model on Azure? Which service should I look into?

untold tundra Nov 30, 2021, 10:40 AM

#

depends what you mean by "deploy"

#

the obvious service is AzureML

acoustic forge Nov 30, 2021, 10:42 AM

#

Perfect - Yeah, I think that's what I was looking for

untold tundra Nov 30, 2021, 10:42 AM

#

microsoft has lots of free courseware in this area, on their microsoft learning github

#

have a look at: https://microsoftlearning.github.io/mslearn-dp100/

acoustic forge Nov 30, 2021, 10:42 AM

#

It's primarily cause we have to design a deployment model for a fictional company as part of one of our courses

#

Basically I need to make an architecture that scales well

#

I'll send a picture of the architecture in a bit, maybe something stands out to you as being comically wrong

untold tundra Nov 30, 2021, 10:43 AM

#

i suspect if you just followed the instructions on that course above, maybe first 6 or 7 labs

#

you'd basically have the solution

buoyant nebula Nov 30, 2021, 10:52 AM

#

Can any body help me on this query

#

https://stackoverflow.com/questions/68983970/subtracting-value-in-specific-order-and-replace-invoice-number-and-date-with-the

Stack Overflow

Subtracting value in specific order and replace invoice number and ...

I want to create a logic and apply it to dataframe to subtract -ve sales(sales return) in the highest current sale. The logic should follow as below:

-ve sales should subtract from the highest day's

acoustic forge Nov 30, 2021, 10:59 AM

#

untold tundra you'd basically have the solution

Anything here that (to you) looks very out of place? Any suggestions? Basically we need to present a scalable architecture to people who don't know anything about architecture/machine learning for a machine learning model in steel plate fault detection

desert oar Nov 30, 2021, 1:59 PM

#

acoustic forge Anything here that (to you) looks very out of place? Any suggestions? Basically ...

hah this looks a lot like my setup at a previous job

#

this is for people to actually train models?

#

i am generally skeptical of letting people who know 0 machine learning do machine learning

#

also the databricks notebook interface is hot steaming garbage

#

not worth it imo, use databricks-connect or just run non-interactive jobs

acoustic forge Nov 30, 2021, 2:01 PM

#

I understand your skepticism 😛 This is a completely fictional report. And we are forced to use Databricks

#

Well, the report is not fictional, the case is

desert oar Nov 30, 2021, 2:01 PM

#

jesus, are you working for my previous employer?

acoustic forge Nov 30, 2021, 2:01 PM

#

Hahahaha our teacher is a consultant, so if you work in consulting I might very well be

desert oar Nov 30, 2021, 2:02 PM

#

i don't, but fuck the microsoft salespeople who convinced upper mgmt to force databricks on everyone

#

dont get me wrong, i really liked having a managed and optimized spark cluster

#

but there was a moment when they actually believed we could do actual work using that interface

acoustic forge Nov 30, 2021, 2:02 PM

#

It's honestly crazy. It's such a hindrance for actual work

desert oar Nov 30, 2021, 2:02 PM

#

like not just "big data" work -- they thought we could do all of our work either on our laptops or databricks

acoustic forge Nov 30, 2021, 2:03 PM

#

Can I send you an updated architecture diagram?

desert oar Nov 30, 2021, 2:03 PM

#

sure

#

and im so relieved that you agree

#

serious tip: set up a local spark cluster on a server, so people can actually develop and test their spark code without burning your very very expensive databricks cluster time

acoustic forge Nov 30, 2021, 2:03 PM

#

desert oar Nov 30, 2021, 2:03 PM

#

or even set it up on data scientists' / ml devs' machines

acoustic forge Nov 30, 2021, 2:03 PM

#

desert oar serious tip: set up a local spark cluster on a server, so people can actually de...

Here's the thing, this will never see the light of day. There's no factory, no nothing

#

We basically had to make up the case ourself

desert oar Nov 30, 2021, 2:04 PM

#

i guess my point is that dev affordances must at least be part of the plan somewhere

#

at least in my opinion

#

maybe consultants dont care

acoustic forge Nov 30, 2021, 2:04 PM

#

desert oar i guess my point is that dev affordances must at least be part of the plan somew...

I would agree, but I am honestly convinced that our teacher knows next to nothing about software development. (As you might be able to tell, I am NOT a fan of this course)

desert oar Nov 30, 2021, 2:04 PM

#

if the plan is "make pretty diagram, tell contributors to f off" then your contributors will all quit for better and better-paying jobs, and your project will fail

#

yeah this seems kind of rough

#

so if it makes you feel better: yes, that architecture was good enough for a global 500

#

so it's good enough for your course

#

and actually on the "production" side it was pretty damn good

acoustic forge Nov 30, 2021, 2:07 PM

#

Super nice! I am very happy about that, thanks a lot 🙂

desert oar Nov 30, 2021, 2:08 PM

#

and better diagrams than we had too 😛

#

ours were literally low res jpg's someone had downloaded off the azure site

acoustic forge Nov 30, 2021, 2:10 PM

#

Yeah, we're not super into making these diagrams, we're data scientists, not necessarily cloud architects. He was super savage to us last time (cause we kept it more technical than he wanted). So this time we really tried to reduce it to something anyone could understand (more or less)

#

But god damn. Soon I will never have to touch databricks again (hopefully)

desert oar Nov 30, 2021, 2:13 PM

#

this looks good

#

and honestly if you do need spark, databricks beats the hell out of running your own yarn cluster

#

but yes avoid that accursed notebook interface

wicked grove Nov 30, 2021, 3:30 PM

#

desert oar this looks good

Hello, can you please tell me if this is a good tutorial from which i can follow and learn
https://towardsdatascience.com/how-to-build-a-movie-recommender-system-in-python-using-lightfm-8fa49d7cbe3b

Medium

How to build a Movie Recommender System in Python using LightFm

In this blog post, we will be creating a movie recommender system in python, that suggest new movies to the user based on their viewing…

#

I want to build a movie recommendation system using ml / can you suggest any good ml projects that i can do

untold tundra Nov 30, 2021, 5:05 PM

#

seems ok

queen crag Nov 30, 2021, 5:19 PM

#

Are courses on free code camp useful?

serene scaffold Nov 30, 2021, 5:20 PM

#

queen crag Are courses on free code camp useful?

useful for what goal?

robust jungle Nov 30, 2021, 5:32 PM

#

does anyone know how to fix this? absl.flags._exceptions.IllegalFlagValueError: flag --sample_1_of_n_eval_examples=: invalid literal for int() with base 10: ''

#

coming from:

#

python model_main_tf2.py
--model_dir=$/tmp/model_outputs --num_train_steps=$10000
--sample_1_of_n_eval_examples=$1
--pipeline_config_path=$/Users/admin/PycharmProjects/Imageclassifier/model/object_detection/efficientdet_d7_coco17_tpu-32 2/pipeline.config
--alsologtostderr

desert oar Nov 30, 2021, 5:49 PM

#

@robust jungle did you write this program? model_main_tf2.py

#

it seems like someone gave you incorrect usage instructions

#

that, or you wrote $10000 when you meant 10000

#

the $ is not a "number", it introduces a shell variable

#

so $1 is the first argument of a shell script

#

so it's not clear what $1 or $10000 are supposed to be

#

when you try to use a nonexistent variable in a shell script, the value is an empty string

#

so that's probably what's causing this error

#

if you were given usage instructions, can you post those here?
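
A rough way to check the explanation above, sketched in Python so it matches the rest of the channel. It assumes a POSIX `sh` is on the PATH, and `expand` is a made-up helper that just asks the shell to expand one argument. With no script arguments, `$1` is unset, so it expands to an empty string, and `$10000` is read as `$1` followed by the literal text `0000`.

```python
import subprocess


def expand(arg: str) -> str:
    """Return `arg` after a POSIX shell has expanded it inside double quotes."""
    # `sh -c` receives no extra arguments here, so $1 is unset.
    result = subprocess.run(
        ["sh", "-c", f'printf "%s" "{arg}"'],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


print(expand("--sample_1_of_n_eval_examples=$1"))  # --sample_1_of_n_eval_examples=
print(expand("--num_train_steps=$10000"))          # --num_train_steps=0000
print(expand("--num_train_steps=10000"))           # --num_train_steps=10000
```

The first result lines up with the error message: the flag receives `''`, which cannot be parsed as an int.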

robust jungle Nov 30, 2021, 5:56 PM

#

desert oar @robust jungle did you write this program? `model_main_tf2.py`

no

desert oar Nov 30, 2021, 5:56 PM

#

if you were given usage instructions, can you post those here?

robust jungle Nov 30, 2021, 5:57 PM

#

ngl I probably added that to experiment since I saw it on each line

#

but I just removed it

#

still doesnt work

#

just removed it from the other line too

#

new error (placeholder thing was still there), seems to have fixed it

#

thanks

#

side note: the thing im running is commented at the top of model_main_tf2.py, which can be downloaded on the tensorflow github

#

nevermind

#

just looked at it again

#

that one was on me

#

I did a goof

#

new issue

#

TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. tensorflow.python.lib.io._pywrap_file_io.BufferedInputStream(filename: str, buffer_size: int, token: tensorflow.python.lib.io._pywrap_file_io.TransactionToken = None)

Invoked with: None, 524288

#

comes from:

PIPELINE_CONFIG_PATH=path/to/pipeline.config
MODEL_DIR=/tmp/model_outputs
NUM_TRAIN_STEPS=10000
SAMPLE_1_OF_N_EVAL_EXAMPLES=1
python model_main_tf2.py -- \
  --model_dir=$MODEL_DIR --num_train_steps=$NUM_TRAIN_STEPS \
  --sample_1_of_n_eval_examples=$SAMPLE_1_OF_N_EVAL_EXAMPLES \
  --pipeline_config_path=$PIPELINE_CONFIG_PATH \
  --alsologtostderr

#

ignore the difference in the last bit, thought the top part was a quickstart and the bottom was placeholders

queen crag Nov 30, 2021, 6:26 PM

#

serene scaffold useful for what goal?

For now to get Introduced to the subject and be able to execute few projects ..I am planning to integrate it with my current skillsets

pure pumice Nov 30, 2021, 8:10 PM

#

@serene scaffold I figured it out df[df["Genres"].isin(['Action','Animation','Comedy','Adventure'])]

serene scaffold Nov 30, 2021, 8:22 PM

#

pure pumice @serene scaffold I figured it out df[df["Genres"].isin(['Action','Animatio...

meow_party

pure pumice Nov 30, 2021, 8:23 PM

#

serene scaffold meow_party

is there a way to permanently change my df to show only those genres when i do df.head()??

#

like we did with rotten tomatoes

serene scaffold Nov 30, 2021, 8:30 PM

#

@pure pumice you can write over the existing one

pure pumice Nov 30, 2021, 8:32 PM

#

serene scaffold @pure pumice you can write over the existing one

what do you mean by over?

pure pumice Nov 30, 2021, 8:39 PM

#

serene scaffold @pure pumice you can write over the existing one

nvm figured it out

#

thanks
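
A sketch of the whole assignment as quoted earlier (first genre only, filter to four genres, pivot table of average runtime), assuming a tiny made-up DataFrame whose column names follow the sample data posted in the channel. "Writing over" the existing frame here means assigning the filtered result back to `df`.

```python
import pandas as pd

# Tiny made-up sample; only the column names come from the thread.
df = pd.DataFrame({
    "Title": ["A", "B", "C", "D", "E"],
    "Year": [2010, 2010, 1999, 1999, 2018],
    "Genres": ["Action,Sci-Fi", "Comedy,Drama", "Action", "Sci-Fi,Thriller", "Western"],
    "Runtime": [148.0, 100.0, 136.0, 120.0, 161.0],
})

# Keep only the first genre. Splitting on the comma keeps "Sci-Fi" whole,
# whereas the pattern [A-z]+ stops at the hyphen and would give "Sci".
df["Genres"] = df["Genres"].str.split(",").str[0]

# Overwrite df with the filtered rows so that df.head() only shows these genres.
chosen = ["Action", "Animation", "Comedy", "Adventure"]
df = df[df["Genres"].isin(chosen)]

# Average runtime with years as rows and the remaining genres as columns.
pivot = df.pivot_table(index="Year", columns="Genres", values="Runtime", aggfunc="mean")
print(pivot)
```

Genres from `chosen` that never occur in the data (here `Animation` and `Adventure`) simply do not appear as pivot columns.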

normal radish Nov 30, 2021, 8:40 PM

#

Hey guys I am in serious need of help on a CNN. Do any of you have some time to spare?

normal radish Nov 30, 2021, 8:59 PM

#

Please anyone with any knowledge of convolutional neural networks! You will save my day!

serene scaffold Nov 30, 2021, 9:41 PM

#

@normal radish try asking your CNN question. It's not likely that anyone will commit to a question you haven't asked yet, even if they know about CNNs.

#

(This goes for any time you want to ask a question on the internet: just ask the question.)

normal radish Nov 30, 2021, 9:47 PM

#

Ok so I have a CNN where I have a 180x180x3 (RGB) image being convolved with 32 filters. Is it correct to say that this gives me 32 new images?

#

@serene scaffold

untold tundra Nov 30, 2021, 9:50 PM

#

no

#

there will be 32 activation maps when the 32 filters are applied, but the activation maps arent new images as such

normal radish Nov 30, 2021, 9:53 PM

#

But it will change from a 180x180x3 to a 180x180x32 right?

untold tundra Nov 30, 2021, 9:53 PM

#

if i recall correctly, yes -- you can just build it in keras and then print a summary

normal radish Nov 30, 2021, 9:53 PM

#

Yeah I did that but not sure I understand how

untold tundra Nov 30, 2021, 9:53 PM

#

from keras.applications.vgg16 import VGG16
model = VGG16()
model.summary()  # summary() prints the table itself and returns None

#

compare the VGG16 diagram (google images) with the summary

#

convolving layers are often compressive

normal radish Nov 30, 2021, 9:55 PM

#

Thing is each of the filters will take a 180x180x3 but turn it into a 180x180x1. Does it just add the values down?

untold tundra Nov 30, 2021, 9:55 PM

#

so you can go from 180x180x3 -> lots of things

#

yes

#

the "convolution product" is a dot product thru' the image

normal radish Nov 30, 2021, 9:55 PM

#

Do you have 2 seconds to talk?

untold tundra Nov 30, 2021, 9:55 PM

#

i have two seconds to type

#

are you aware of the stanford course on this? i'm sure there's a video which does parameter counting

normal radish Nov 30, 2021, 9:57 PM

#

I’m not sure if I understand. I understand that the filter passes over every pixel with a kernel, and there it dots and takes the sum for the new pixel. But… there are 3 layers. How does this turn into 1?

normal radish Nov 30, 2021, 9:57 PM

#

untold tundra are you aware of the stanford course on this? i'm sure there's a video which doe...

I am not aware no

untold tundra Nov 30, 2021, 9:57 PM

#

https://www.youtube.com/playlist?list=PLf7L7Kg8_FNxHATtLwDceyh72QQL9pvpQ

YouTube

Stanford Computer Vision

#

6 is where CNN itself starts: https://www.youtube.com/watch?v=bNb2fEVKeEo&list=PLf7L7Kg8_FNxHATtLwDceyh72QQL9pvpQ&index=6

#

the courseware is: http://cs231n.stanford.edu/

#

https://cs231n.github.io/convolutional-networks/

CS231n Convolutional Neural Networks for Visual Recognition

Course materials and notes for Stanford class CS231n: Convolutional Neural Networks for Visual Recognition.

normal radish Nov 30, 2021, 9:58 PM

#

Do you know the answer to my previous question? I will watch this tomorrow

untold tundra Nov 30, 2021, 9:59 PM

#

its like a volumetric dot product

#

you take the pixels of all three layers, and you write them in a single row

#

then the dot product is just the filter's weights in a single row

#

dot'd with those

#

visually, the filter "punches thru'" the layers

normal radish Nov 30, 2021, 10:00 PM

#

So you dot the first pixel in every layer and then the next

untold tundra Nov 30, 2021, 10:01 PM

#

i cant recall the exact formula, but iirc, i think basically you take the first weight, eg., w00 and you do that with each layer, so x0red*w00 + x0blue*w00 + ...

#

the idea is that w00 is the weight for "that part of the image"

#

and that there are three color channels is a bit of a distraction, it's duplicated information

#

it's halfway down the page of notes i sent: https://cs231n.github.io/convolutional-networks/

normal radish Nov 30, 2021, 10:05 PM

#

So if the first pixel in the red layer is 2, blue is 3 and green is 1 and the weights are all 1, the first pixel in the new feature map is 6?

untold tundra Nov 30, 2021, 10:06 PM

#

well the output, ie., activation map, is a scanning of the filter across the image

normal radish Nov 30, 2021, 10:08 PM

#

But you say it punches through basically adding the values of the pixels in each layer?

#

Resulting in an “image” and not 3

untold tundra Nov 30, 2021, 10:10 PM

#

yeah, but the detail is subtle, and i wouldnt want to get it wrong

quiet vault Nov 30, 2021, 10:10 PM

#

normal radish But it will change from a 180x180x3 to a 180x180x32 right?

If you are using padding, yes

normal radish Nov 30, 2021, 10:10 PM

#

I am

#

Padding=same

#

And strides=1

untold tundra Nov 30, 2021, 10:11 PM

#

the 180x180x32 activation maps are "images" made by weights, but the weights can be 3x3x3, say

#

or 3x3x1

normal radish Nov 30, 2021, 10:12 PM

#

So it could give me 3 “images” pr. filter again?

untold tundra Nov 30, 2021, 10:12 PM

#

each cell in the "image" ( activation map ) corresponds to the "same region" in the input picture

#

so the top left of each activation map is the top left of the input

#

and the value is the sum of the filters applied to that region

normal radish Nov 30, 2021, 10:13 PM

#

Yes that’s the scan it makes with the filter

untold tundra Nov 30, 2021, 10:13 PM

#

if you see a filter as a volume: 3x3x3

#

then it's dot'd with the same volume in the input, 3x3x3

#

and all i can remember about that, is the math is basically, take those 9 numbers in the top corner, and dot them with 9 numbers in the filter

#

basically in quite a naive linear order

normal radish Nov 30, 2021, 10:14 PM

#

I see a filter as 3 values in top 3 in middle and 3 in bottom

#

Yes okay

#

What confuses me was how 180x180x3 turned into 180x180x32

untold tundra Nov 30, 2021, 10:16 PM

#

well the "volume in weights" dot "volume in the image" is a single number

normal radish Nov 30, 2021, 10:16 PM

#

Since in my case I convolve more than once resulting in 180x180x32 turning into 180x180x64 after and I thought it would be 32*64 activation maps

#

So 180x180x(32*64)

untold tundra Nov 30, 2021, 10:17 PM

#

the underlying operation between each filter and each region of the image is just a dot product

#

so it projects just to a single number for each application of the filter
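
A rough NumPy sketch of the "volume dot volume" idea above, assuming stride 1 and zero "same" padding as mentioned in the thread. `conv2d_same` is a made-up helper, not a library function; biases and activations are left out, and a small 8x8 input stands in for the 180x180 one so the loops stay fast. In the standard formulation each channel has its own weights, so a 3x3 filter over 3 channels holds 27 weights.

```python
import numpy as np


def conv2d_same(image, filters):
    """Naive stride-1, zero-padded convolution (cross-correlation, as in CNN layers).

    image:   (H, W, C_in)
    filters: (k, k, C_in, C_out) with k odd
    returns: (H, W, C_out)
    """
    h, w, _ = image.shape
    k, _, _, c_out = filters.shape
    pad = k // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros((h, w, c_out))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + k, j:j + k, :]  # (k, k, C_in) volume under the filter
            # One dot product per filter: sums over height, width AND channels.
            out[i, j, :] = np.tensordot(patch, filters, axes=3)
    return out


rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))         # small stand-in for a 180x180x3 input
layer1 = rng.random((3, 3, 3, 32))    # 32 filters, each 3x3x3
layer2 = rng.random((3, 3, 32, 64))   # 64 filters, each 3x3x32

maps1 = conv2d_same(image, layer1)
maps2 = conv2d_same(maps1, layer2)
print(maps1.shape, maps2.shape)  # (8, 8, 32) (8, 8, 64)
```

The second layer gives 64 maps rather than 32*64 because each of its 64 filters spans all 32 input channels and still collapses to one number per position.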

normal radish Nov 30, 2021, 10:18 PM

#

I think I get it. Thanks for the answers @untold tundra and @quiet vault

untold tundra Nov 30, 2021, 10:18 PM

#

sure, i'd strongly recommend watching the stanford course

#

all of these questions are answered very clearly and logically

#

good night

normal radish Nov 30, 2021, 10:32 PM

#

Goodnight

wooden cosmos Nov 30, 2021, 11:12 PM

#

Hello, i have a question regarding the implementation of a particular neural network. I understand the model but i can't figure out how this guy does backpropagation. Could someone explain it to me? https://teddykoker.com/2019/12/beating-the-odds-machine-learning-for-horse-racing/

Teddy Koker

Beating the Odds: Machine Learning for Horse Racing

Inspired by the story of Bill Benter, a gambler who developed a computer model that made him close to a billion dollars1 betting on horse races in the Hong Kong Jockey Club (HKJC), I set out to see if I could use machine learning to identify inefficiencies in horse racing wagering. Kit Chellel, The Gambler Who Cracked the Horse-Racing Code, ↩

distant trout Dec 1, 2021, 12:50 AM

#

if anyone could help with minimax ai come to kiwi channel 😂

buoyant nebula Dec 1, 2021, 1:43 AM

#

buoyant nebula https://stackoverflow.com/questions/68983970/subtracting-value-in-specific-order...

Can anybody help me on this..

lusty valley Dec 1, 2021, 6:21 AM

#

Can somebody explain a classification report done on an SGD classifier to me. I have 79% precision and 100% recall on 0s but I have 0% precision and recall on 1s

tidal bough Dec 1, 2021, 6:23 AM

#

That means your classifier classifies everything as 0, I believe
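A small sketch of why those numbers point to an "always predict 0" classifier. The 79/21 class split is an assumption chosen to match the reported 79% precision; the real data was not posted. Precision for class 1 is technically undefined when nothing is predicted as 1, and is reported as 0 here by convention.

```python
import numpy as np

# Assumed toy labels: 79 zeros and 21 ones, with a classifier that always says 0.
y_true = np.array([0] * 79 + [1] * 21)
y_pred = np.zeros_like(y_true)

def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class; 0.0 when the denominator is empty."""
    tp = np.sum((y_pred == label) & (y_true == label))
    predicted = np.sum(y_pred == label)  # how often the model said `label`
    actual = np.sum(y_true == label)     # how often `label` really occurs
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return float(precision), float(recall)

for label in (0, 1):
    print(label, precision_recall(y_true, y_pred, label))
```

Class 0 comes out at precision 0.79 and recall 1.0, class 1 at 0.0 for both, which matches the report described above.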

lusty valley Dec 1, 2021, 6:27 AM

#

I see. So it’s useless then

#

Not enough features

slender kestrel Dec 1, 2021, 6:49 AM

#

yo anyone who is learning data science or working in field of data science ? be kind enough to hit me up on dms or ping me

acoustic forge Dec 1, 2021, 7:28 AM

#

Hey guys - So I am creating a real estate regression model based on historical sales of real estate in the Copenhagen (Denmark) area. I was curious if anyone has any cool articles regarding the performance and real-world viability of these types of models?

#

I know how to create it, I am just curious how performant these things are, especially cause I don't know much about real estate, but will be buying an apartment soon

quasi ether Dec 1, 2021, 9:00 AM

#

def prepare(filepath):
    size=50
    img=cv2.imread(filepath)
    img=cv2.resize(img,(size,size))
    return img.reshape(-1,size,size,1)

#

i need help

#

return img.reshape(-1,size,size,1)

warm jungle Dec 1, 2021, 9:06 AM

#

reshape takes a single argument (normally a tuple)

quasi ether Dec 1, 2021, 9:08 AM

#

warm jungle reshape takes a single argument (normally a tuple)

i know

#

https://youtu.be/A4K6D_gx2Iw

YouTube

sentdex

How to use your trained model - Deep Learning basics with Python, T...

In this part, we're going to cover how to actually use your model. We will use our cats vs dogs neural network that we've been perfecting.

Text tutorial and sample code: https://pythonprogramming.net/using-trained-model-deep-learning-python-tensorflow-keras/

Dog example: https://pythonprogramming.net/static/images/machine-learning/dog.jpg

Cat ...

warm jungle Dec 1, 2021, 9:08 AM

#

so img.reshape((-1, size, size,1)) rather than what you have
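A quick NumPy check suggests the extra parentheses are not the issue: ndarray.reshape accepts the dimensions either as separate integers or as one tuple. The sketch below uses zero arrays as stand-ins for what cv2.imread returns (an assumption, since the actual image and error were not posted). If the image is read in OpenCV's default 3-channel colour mode, reshaping to a trailing dimension of 1 turns one image into a "batch" of 3, which may be the real problem. Reading with cv2.IMREAD_GRAYSCALE would give a single (size, size) array instead.

```python
import numpy as np

size = 50
gray = np.zeros((size, size), dtype=np.uint8)      # stand-in for a grayscale read
color = np.zeros((size, size, 3), dtype=np.uint8)  # stand-in for a default colour read

# Both call styles give the same result.
a = gray.reshape(-1, size, size, 1)
b = gray.reshape((-1, size, size, 1))
print(a.shape, b.shape)  # (1, 50, 50, 1) (1, 50, 50, 1)

# A 3-channel image silently becomes three "images" with one channel each.
c = color.reshape(-1, size, size, 1)
print(c.shape)  # (3, 50, 50, 1)
```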

normal radish Dec 1, 2021, 9:21 AM

#

Hey does anyone know how to make a visualisation like this for my ConvNet?

orchid kayak Dec 1, 2021, 10:33 AM

#

What does it mean when my evaluation loss is magnitudes higher than my training loss?

odd meteor Dec 1, 2021, 11:15 AM

#

orchid kayak What does it mean when my evaluation loss is magnitudes higher than my training ...

Your model is overfitting. Your model is suffering from high variance problem.

In layman's terms, it means your model did well at minimizing the loss function on the training set, but did not replicate that success on your validation set.

The aim is to get the loss (RMSE, categorical cross-entropy, or whatever your exact loss function is) on the validation set to be the same as, or at least close to, the loss obtained on your training set.

old grove Dec 1, 2021, 12:32 PM

#

Hello all.... In the Spearman correlation test, what is our null hypothesis? That there is a correlation, or that there is no correlation?

warm valley Dec 1, 2021, 12:34 PM

#

Hello, I have a small question. I have a pandas column which has negative values, and I want to convert them to positive.

I tried
data[data['Quantity']] = abs(data['Quantity'])
but it's giving an error

dusk zephyr Dec 1, 2021, 12:39 PM

#

which error? are there nan/null values in the column?

warm valley Dec 1, 2021, 12:42 PM

#

By error, I mean, it was taking years.
But upon searching a bit, I found the code
data[data.columns[data.dtypes != object]] = data[data.columns[data.dtypes != object]].abs()
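For a single column, a shorter route is to assign the absolute values back to that column. The original attempt, data[data['Quantity']], uses the column's values as column labels on the left-hand side, which is probably not what was intended. A minimal sketch with a made-up frame (the 'Item' column and the values are illustrative assumptions):

```python
import pandas as pd

# Hypothetical data; only the 'Quantity' column name comes from the question.
data = pd.DataFrame({"Quantity": [-3, 5, -1], "Item": ["a", "b", "c"]})

# Overwrite just this one column with its absolute values.
data["Quantity"] = data["Quantity"].abs()
print(data["Quantity"].tolist())  # [3, 5, 1]
```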

orchid kayak Dec 1, 2021, 12:48 PM

#

odd meteor Your model is overfitting. Your model is suffering from high variance problem. ...

I understand, I've got a few followups if you don't mind:

1. My training accuracy itself wasn't that high anyway, so what does it mean that it minimized my loss function?
2. Is there a method to fix the issue, or is it just randomly changing my hyper-parameters?

odd meteor Dec 1, 2021, 1:21 PM

#

orchid kayak I understand, I've got a few followups if you don't mind: 1. My training accurac...

These ML terms can be a bit confusing, but there are two things I'd like you to understand first.

Remember that one of the main goals of almost every model architecture in ML or Deep Learning projects is to either:

1. Minimize the loss function
Or
2. Maximize the objective function

Now, as an ML Engineer, your goal is to find the optimum point. You'd want a function that minimizes your loss function and at the same time maximizes / minimizes your objective function (you can think of this roughly as finding the equilibrium point) to achieve the lowest generalisation error. In ML this balancing act is related to the Bias-Variance Tradeoff.

Usually, achieving #1 in most scenarios leads to achieving #2 as well.

What is a Loss Function or an Objective Function?

In layman's terms, a loss function is what we use while training our model to measure how well the model explains the data.

NB: When we're minimizing a function, it's called a loss function or cost function.

Examples: RMSE, MAE, log loss (closely related to categorical cross-entropy), MSE, Huber, etc.

An objective function is the function we want to either minimize or maximize. In general terms, it's any function we want to optimize during training, so a loss function is a type of objective function.

Examples: Coefficient of Determination a.k.a. R^2, explained variance score, F1 score, ROC score, AUC score, accuracy_score, precision_score, and all the examples of loss functions, etc.

With the above explanation I believe you can easily understand what I'm about to say next.

a) High Error leads to Low Accuracy score
b) Low error leads to High Accuracy

So to answer your questions now:

1. I'm super sure everything I've explained up to this point has answered your first question.
2. You'd have to reduce your model complexity, or gather more data. By reducing model complexity I mean, for example, reducing your max_depth or increasing your min_samples_leaf, etc.

Still confused? I hope not 😀

orchid kayak Dec 1, 2021, 1:24 PM

#

Not confused anymore, thanks a lot!

bold timber Dec 1, 2021, 3:24 PM

#

Hi, I am so confused about this code. What does argsort()[-6:] do in this code?

serene scaffold Dec 1, 2021, 3:25 PM

#

bold timber Hi, I am so confused about this code. What does argsort()[-6:] do in this code?

do you know what argsort does, and do you know what [-6:] does?

#

!e

import numpy as np
arr = np.array([1, 7, 3, 4])
print(arr.argsort())

arctic wedgeBOT Dec 1, 2021, 3:26 PM

#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.

[0 2 3 1]

bold timber Dec 1, 2021, 3:29 PM

#

serene scaffold do you know what argsort does, and do you know what `[-6:]` does?

I only know argsort is a function to get a similarity value each other

serene scaffold Dec 1, 2021, 3:29 PM

#

bold timber I only know argsort is a function to get a similarity value each other

That is not what argsort does.

#

!docs numpy.ndarray.argsort

arctic wedgeBOT Dec 1, 2021, 3:29 PM

#

numpy.ndarray.argsort


ndarray.argsort(axis=-1, kind=None, order=None)
Returns the indices that would sort this array.

Refer to [`numpy.argsort`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html#numpy.argsort "numpy.argsort") for full documentation.

See also

[`numpy.argsort`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html#numpy.argsort "numpy.argsort"): equivalent function

bold timber Dec 1, 2021, 3:30 PM

#

serene scaffold That is not what argsort does.

can you explain more to me?

serene scaffold Dec 1, 2021, 3:31 PM

#

bold timber can you explain more to me?

when you argsort an array, you get an array of ints where each int represents an index in the original array

lapis sequoia Dec 1, 2021, 3:31 PM

#

If we explain the above example stel gave,
argsort will give index of element instead of actual element in the sorted array.

serene scaffold Dec 1, 2021, 3:32 PM

#

and the ints are in the order that the elements would be if you had actually sorted the array

bold timber Dec 1, 2021, 3:36 PM

#

serene scaffold and the ints are in the order that the elements would be if you had actually sor...

does argsort give an ascending array of values?

bold timber Dec 1, 2021, 3:37 PM

#

lapis sequoia If we explain the above example stel gave, argsort will give index of element in...

does argsort give an ascending array of values?

lapis sequoia Dec 1, 2021, 3:37 PM

#

Means?

bold timber Dec 1, 2021, 3:40 PM

#

lapis sequoia Means?

Does argsort give values from smaller to higher?

lapis sequoia Dec 1, 2021, 3:42 PM

#

bold timber Does argsort give values from smaller to higher?

It is giving indices of values from smaller to higher.
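Putting the two pieces together: argsort() returns indices ordered from the smallest value to the largest, so slicing with [-6:] keeps the indices of the six largest values (still in ascending order of value). A minimal sketch with made-up scores, since the original code was not posted as text:

```python
import numpy as np

# Made-up scores; in the original code this was presumably some similarity array.
scores = np.array([0.1, 0.9, 0.3, 0.7, 0.5, 0.2, 0.8, 0.4])

order = scores.argsort()
print(order)          # [0 5 2 7 4 3 6 1]

top6 = order[-6:]     # indices of the six largest values, smallest of them first
print(top6)           # [2 7 4 3 6 1]
print(top6[::-1])     # [1 6 3 4 7 2], largest first
print(scores[top6])   # [0.3 0.4 0.5 0.7 0.8 0.9]
```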

tough bolt Dec 1, 2021, 3:49 PM

#

https://github.com/HRNet/HigherHRNet-Human-Pose-Estimation

GitHub

GitHub - HRNet/HigherHRNet-Human-Pose-Estimation: This is an offici...

This is an official implementation of our CVPR 2020 paper "HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation" (https://arxiv.org/abs/1908.10357)...

#

Yo

#

I have the task to run this

#

But have no idea where to start

#

Could somebody give me some guidance or nudge me into the right direction how I use those pretrained models

chilly flame Dec 1, 2021, 3:52 PM

#

Hey everyone, I have a character recognition task. I've already finished the preprocessing and got some binary images, but I have no idea how to extract the features and store them in the database. Can anyone who is learning image processing and knows about zoning-based feature extraction with an SVM classifier explain it to me or give me a link to the documentation? You can hit me up on DM

mighty spoke Dec 1, 2021, 5:13 PM

#

Hi how would I create different plots from different data frames using a loop?

normal radish Dec 1, 2021, 5:46 PM

#

Hey everyone I need some help! How does this ConvNet return a negative value when the sigmoid function is applied and there are no negative values as far as I can see after the model: https://colab.research.google.com/drive/1F4prYqhvItrD9xaHlZs46-EE7f5MxOEF?usp=sharing

Google Colaboratory

serene scaffold Dec 1, 2021, 5:58 PM

#

mighty spoke Hi how would I create different plots from different data frames using a loop?

Why did you start with the assumption that there has to be a loop?

Can you give an example of the data you're trying to plot? do print(df.head().to_dict('list')) and show it as text.

robust jungle Dec 1, 2021, 6:01 PM

#

Im running this in terminal (from model_main_tf2.py) and im getting an error:

#

PIPELINE_CONFIG_PATH=/Users/admin/PycharmProjects/Imageclassifier/model/object_detection/efficientdet_d7_coco17_tpu-32 2/pipeline.config
MODEL_DIR=/Users/admin/PycharmProjects/Imageclassifier/model/object_detection/efficientdet_d7_coco17_tpu-32 2
NUM_TRAIN_STEPS=10000
SAMPLE_1_OF_N_EVAL_EXAMPLES=1
python model_main_tf2.py -- \
  --model_dir=$MODEL_DIR --num_train_steps=$NUM_TRAIN_STEPS \
  --sample_1_of_n_eval_examples=$SAMPLE_1_OF_N_EVAL_EXAMPLES \
  --pipeline_config_path=$PIPELINE_CONFIG_PATH \
  --alsologtostderr

#

error:

#

TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. tensorflow.python.lib.io._pywrap_file_io.BufferedInputStream(filename: str, buffer_size: int, token: tensorflow.python.lib.io._pywrap_file_io.TransactionToken = None)

Invoked with: None, 524288

serene scaffold Dec 1, 2021, 6:02 PM

#

robust jungle ``` TypeError: __init__(): incompatible constructor arguments. The following arg...

Can you show what comes before that part of the error message?

#

!traceback

robust jungle Dec 1, 2021, 6:02 PM

#

serene scaffold Can you show what comes before that part of the error message?

sure, but it is quite long

serene scaffold Dec 1, 2021, 6:02 PM

#

robust jungle sure, but it is quite long

!paste

arctic wedgeBOT Dec 1, 2021, 6:02 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

robust jungle Dec 1, 2021, 6:02 PM

#

alright thanks

#

https://paste.pythondiscord.com/wunumakapi.sql

serene scaffold Dec 1, 2021, 6:04 PM

#

robust jungle https://paste.pythondiscord.com/wunumakapi.sql

Looks like this isn't something I know enough about to jump into rn, but you can save this paste and ask again later or in a help channel. Or wait here.

robust jungle Dec 1, 2021, 6:04 PM

#

serene scaffold Looks like this isn't something I know enough about to jump into rn, but you can...

alright

mighty spoke Dec 1, 2021, 6:08 PM

#

serene scaffold Why did you start with the assumption that there has to be a loop? Can you give...

Hi @serene scaffold my data is like this

import pandas as pd#import pandas package to read data more easily
import matplotlib.pyplot as plt#imported pyplot to plot graphs
import datetime as dt#date time to read first column of csv file
import numpy as np
from datetime import datetime



df = pd.read_csv('TSLA.csv')
df2 = pd.read_csv('NBM.V.csv')
df3 = pd.read_csv('TSLA.csv')

df['Date'] = pd.to_datetime(df['Date'])
df2['Date'] = pd.to_datetime(df2['Date'])
x1=(df['Date'] - dt.datetime(1970,1,1)).dt.total_seconds()/86400
x2=(df2['Date'] - dt.datetime(1970,1,1)).dt.total_seconds()/86400

y1=df['Close']
y2=df2['Close']

t0=[]
d0=[]

def udcf(df,df2,t0,d0):
    y1_mean = np.mean(y1)
    y2_mean = np.mean(y2)
    y1_stdv = np.std(y1)
    y2_stdv = np.std(y2)
    for i in range(len(df)):
        for j in range(len(df2)):
            t=x2[j]-x1[i]
            t0.append(t)
            d = (y1[i]- y1_mean)*(y2[j] - y2_mean)/(y1_stdv*y2_stdv)
            d0.append(d)
    return udcf,t0,d0
x, y = zip(*sorted(zip(t0, d0)))#ensures x and y values correspond to each others in pairs when sorted
plt.scatter(x, y, ls='-', lw='1', color='red', marker='.')

#

I want to use other data frames(df2,df3) and calculate t0 and d0 for each then plot them in different graphs rather than doing it manually

serene scaffold Dec 1, 2021, 6:46 PM

#

mighty spoke Hi @serene scaffold my data is like this ``` import pandas as pd#import p...

Please don't show code when asked to show data. I don't have any of those three CSVs on my computer, so this isn't helpful for me.

pale thunder Dec 1, 2021, 6:47 PM

#

anyone aware of a jupyter frontend that is capable of accepting user input and connecting to an already running kernel, like jupyter console can with --existing?

serene scaffold Dec 1, 2021, 7:12 PM

#

What does one call it when they do an "outer" operation on two vectors, other than multiplication (ie outer product)?

#

looks like it might not matter, in this case

mighty spoke Dec 1, 2021, 7:13 PM

#

one of the csv files (df)

serene scaffold Dec 1, 2021, 7:13 PM

#

Remember what I said before: print(df.head().to_dict('list'))

#

if your next message doesn't have the data in that format, I'm afraid I'll have to stop helping.

#

That was right except that it was the same df twice

mighty spoke Dec 1, 2021, 7:16 PM

#

serene scaffold if your next message doesn't have the data in that format, I'm afraid I'll have ...

print(df.head().to_dict('list'))
{'Date': [Timestamp('2010-08-10 00:00:00'), Timestamp('2010-11-10 00:00:00'), Timestamp('2010-12-10 00:00:00'), Timestamp('2010-10-13 00:00:00'), Timestamp('2010-10-14 00:00:00')], 'Open': [10.25, 10.19, 11.05, 12.25, 12.9], 'High': [10.57, 12.0, 12.75, 12.8, 14.79], 'Low': [10.1, 9.85, 10.96, 11.86, 12.75], 'Close': [10.13, 11.13, 12.05, 12.71, 13.94], 'Adj Close': [10.13, 11.13, 12.05, 12.71, 13.94], 'Volume': [1135300, 712500, 777000, 1413100, 1895200]}

#

is that ok?

serene scaffold Dec 1, 2021, 7:16 PM

#

yes, this is what I wanted

mighty spoke Dec 1, 2021, 7:16 PM

#

ah kl

serene scaffold Dec 1, 2021, 7:17 PM

#

It's a good format because I can copy and paste it directly and use it
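As a sketch of why that format is convenient: the printed dict is valid Python once Timestamp is imported, so it can be pasted straight back into a DataFrame. Only the Date and Close columns from the output above are used here, to keep the example short.

```python
import pandas as pd
from pandas import Timestamp

# Subset of the pasted output (Date and Close only).
pasted = {
    'Date': [Timestamp('2010-08-10 00:00:00'), Timestamp('2010-11-10 00:00:00'),
             Timestamp('2010-12-10 00:00:00'), Timestamp('2010-10-13 00:00:00'),
             Timestamp('2010-10-14 00:00:00')],
    'Close': [10.13, 11.13, 12.05, 12.71, 13.94],
}

# Rebuild the frame and confirm it round-trips to the same dict.
df = pd.DataFrame(pasted)
print(df.shape)
print(df.head().to_dict('list') == pasted)
```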

mighty spoke Dec 1, 2021, 7:18 PM

#

ohh i see

serene scaffold Dec 1, 2021, 7:21 PM

#

def udcf(df,df2,t0,d0):
    y1_mean = np.mean(y1)
    y2_mean = np.mean(y2)
    y1_stdv = np.std(y1)
    y2_stdv = np.std(y2)
    for i in range(len(df)):
        for j in range(len(df2)):
            t=x2[j]-x1[i]
            t0.append(t)
            d = (y1[i]- y1_mean)*(y2[j] - y2_mean)/(y1_stdv*y2_stdv)
            d0.append(d)
    return udcf,t0,d0

This can be greatly simplified

def udcf(y1, y2):
    d = np.outer(y1 - y1.mean(), y2 - y2.mean()) / (y1.std() * y2.std())
    t = (y1.reshape(-1, 1) - y2.reshape(1, -1)).reshape(-1)
    return t, d

or something like that.

#

Anyway @mighty spoke what are you trying to plot? Which two columns are x and y?

mighty spoke Dec 1, 2021, 7:24 PM

#

@serene scaffold I'm trying to plot the lag values (x) and dcf values (y)

#

@serene scaffold this is my other data frame df2

#

print(df2.head().to_dict('list'))
{'Date': [Timestamp('2020-11-27 00:00:00'), Timestamp('2020-11-30 00:00:00'), Timestamp('2020-01-12 00:00:00'), Timestamp('2020-02-12 00:00:00'), Timestamp('2020-03-12 00:00:00')], 'Open': [0.09, 0.09, 0.09, 0.09, 0.09], 'High': [0.09, 0.09, 0.09, 0.09, 0.09], 'Low': [0.09, 0.09, 0.09, 0.09, 0.09], 'Close': [0.09, 0.09, 0.09, 0.09, 0.09], 'Adj Close': [0.09, 0.09, 0.09, 0.09, 0.09], 'Volume': [0, 0, 0, 0, 0]}

arctic wedgeBOT Dec 1, 2021, 7:28 PM

#

Hey @mighty spoke!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

mighty spoke Dec 1, 2021, 7:28 PM

#

my full code:

serene scaffold Dec 1, 2021, 7:29 PM

#

@mighty spoke idk if I can do a full code exploration rn. Can you make it so that your dataframes have columns that represent the x and y data?

#

how is lag calculated?

mighty spoke Dec 1, 2021, 7:30 PM

#

serene scaffold @mighty spoke idk if I can do a full code exploration rn. Can you make ...

lag is calculated using t=x2[j]-x1[i]

mighty spoke Dec 1, 2021, 7:31 PM

#

serene scaffold @mighty spoke idk if I can do a full code exploration rn. Can you make ...

sure i did this df4 = pd.DataFrame({'X' : x, 'Y' : y})

#

also I tried binning the x values but i'm not sure its done the most efficient way

#

x=t0, y=d0

#

i'm trying to compare different data frames like df and df3 or df2 and df3 then plot them in different graphs but i don't want to make a udcf function for each

serene scaffold Dec 1, 2021, 7:35 PM

#

mighty spoke i'm trying to compare different data frames like df and df3 or df2 and df3 then ...

you should only have to make one udcf function that you can call whenever

#

though it looks like your function is weirdly defined

#

it depends on variables defined in the global scope and doesn't use its parameters. it also returns itself

mighty spoke Dec 1, 2021, 7:38 PM

#

serene scaffold you should only have to make one udcf function that you can call whenever

yes thats what i want

serene scaffold Dec 1, 2021, 7:39 PM

#

mighty spoke yes thats what i want

def udcf(y1, y2):
    return np.outer(y1 - y1.mean(), y2 - y2.mean()) / (y1.std() * y2.std())

assuming that they are both vectors (one-dimensional arrays)
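One way the pieces discussed so far might fit together, as a sketch rather than a drop-in replacement: a single function that takes the time and value vectors explicitly, returns the lags (x2[j] - x1[i], as in the original loops) together with the flattened dcf values, and is then called once per pair of series. The `series` dict and its toy numbers are hypothetical stand-ins for the (days, Close) columns of each data frame.

```python
from itertools import combinations

import numpy as np

def udcf(x1, y1, x2, y2):
    """Return (lag, dcf) for every pair (i, j), flattened in the same order
    as the original nested loops (i outer, j inner)."""
    x1, y1, x2, y2 = (np.asarray(a, dtype=float) for a in (x1, y1, x2, y2))
    lag = (x2.reshape(1, -1) - x1.reshape(-1, 1)).reshape(-1)  # x2[j] - x1[i]
    # Uses NumPy's default population std (ddof=0); a constant series
    # (zero standard deviation) would make this divide by zero.
    dcf = np.outer(y1 - y1.mean(), y2 - y2.mean()) / (y1.std() * y2.std())
    return lag, dcf.reshape(-1)

# Hypothetical toy data: name -> (x in days, y values).
series = {
    "A": (np.array([0.0, 1.0, 2.0]), np.array([1.0, 2.0, 4.0])),
    "B": (np.array([0.5, 1.5]), np.array([3.0, 5.0])),
    "C": (np.array([0.0, 2.0]), np.array([2.0, 1.0])),
}

# One call per pair of series, no copy-pasted function per data frame.
results = {}
for (name1, (x1, y1)), (name2, (x2, y2)) in combinations(series.items(), 2):
    results[(name1, name2)] = udcf(x1, y1, x2, y2)

for pair, (lag, dcf) in results.items():
    print(pair, lag.shape, dcf.shape)
```

Each entry in `results` could then go to its own figure, for example by calling plt.figure() followed by plt.scatter(lag, dcf) inside the last loop.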

mighty spoke Dec 1, 2021, 7:40 PM

#

serene scaffold ```py def udcf(y1, y2): return np.outer(y1 - y1.mean(), y2 - y2.mean()) / (y...

then i could call it outside like y=udcf(y1, y2) ?

serene scaffold Dec 1, 2021, 7:41 PM

#

mighty spoke then i could call it outside like y=udcf(y1, y2) ?

ye

mighty spoke Dec 1, 2021, 7:41 PM

#

when you did this t = (y1.reshape(-1, 1) - y2.reshape(1, -1)).reshape(-1)
return t, d

#

what does reshape do ?

serene scaffold Dec 1, 2021, 7:43 PM

#

mighty spoke what does reshape do ?

change the shape of the array
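A slightly longer illustration of what the reshape calls in that one-liner do, using small made-up vectors: reshape(-1, 1) makes a column, reshape(1, -1) makes a row, subtracting them broadcasts to every pairwise difference, and the final reshape(-1) flattens the grid back to one dimension.

```python
import numpy as np

a = np.array([10, 20, 30])
b = np.array([1, 2])

col = a.reshape(-1, 1)    # shape (3, 1): one column, -1 means "infer this size"
row = b.reshape(1, -1)    # shape (1, 2): one row
diff = col - row          # broadcasting gives shape (3, 2): a[i] - b[j]
flat = diff.reshape(-1)   # back to 1-D, row by row

print(diff.shape)  # (3, 2)
print(flat)        # [ 9  8 19 18 29 28]
```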

sleek sentinel Dec 1, 2021, 7:44 PM

#

Hello, I don't have a powerful PC to train a resource-intensive model, do you know a software to make clusters that works on both linux and windows?

serene scaffold Dec 1, 2021, 7:45 PM

#

sleek sentinel Hello, I don't have a powerful PC to train a resource-intensive model, do you kn...

so you want to do clustering? you can use sklearn for that.

autumn delta Dec 1, 2021, 7:46 PM

#

Hello everyone !

I was wondering if anyone is able to help guide me in a direction for a project ?

I’ve been looking into it and I’ve been seeing a lot of Ai.

Not 100 percent sure if this is the place.

sleek sentinel Dec 1, 2021, 7:47 PM

#

serene scaffold so you want to do clustering? you can use sklearn for that.

I use Hugging Face transformers for training

serene scaffold Dec 1, 2021, 7:48 PM

#

sleek sentinel I use Hugging Face transformers for training

in either case, all the deep learning libraries I know about (sklearn is not a deep learning library, for the record) can run on Windows and Linux, but might require extra work to get running on Windows.

sleek sentinel Dec 1, 2021, 7:48 PM

#

uh yes but how to launch on several machines on the same job?

serene scaffold Dec 1, 2021, 7:49 PM

#

do you mean several machines or several CPUs?

sleek sentinel Dec 1, 2021, 7:50 PM

#

several machines :p

serene scaffold Dec 1, 2021, 7:51 PM

#

autumn delta Hello everyone ! I was wondering if anyone is able to help guide me in a direc...

This is the channel for asking AI questions, yes.

serene scaffold Dec 1, 2021, 7:51 PM

#

sleek sentinel several machines :p

you might need to use Hadoop or something for that.

odd meteor Dec 1, 2021, 8:31 PM

#

Just thinking out loud...

Does anyone here really use TensorFlow's high level Estimator API to train a model? If so, how often cos... 🤔

I'm well aware of its many advantages over the low level algebraic method and Keras Sequential method but I think it can be stressful when we have many features in our dataset.

Let's say we have 52+ features in our data, do we really have to define each of the 52+ feature columns manually? 😩

Is there no way to avoid manually defining each feature column?

desert oar Dec 1, 2021, 9:23 PM

#

odd meteor Just thinking out loud... Does anyone here really use TensorFlow's high level Es...

if it makes you feel better, the docs recommend against Estimator because it doesn't support the v2 api, and suggest using the Keras api instead

#

that said, you can always write a for loop or list comprehension if you need to programmatically build up lists of features

final scaffold Dec 1, 2021, 9:25 PM

#

Hey guys,
i have installed anaconda, which comes with almost all the packages i need. Is it still necessary to create an environment?
Can i not just do this:

create a project in a location of my choice and select the default base environment, and then finally run the scripts, since most of my packages are already there in the conda installed location (base env)?

desert oar Dec 1, 2021, 9:30 PM

#

final scaffold Hey guys, i have installed anaconda which almost comes with all packages i need...

creating one environment per project helps make sure your anaconda installation itself doesn't get messed up

#

there are a lot of reasons for this, but it's going to save you a lot of pain in the future if you just create one env per project

#

so yes of course you can do what you are asking about, but you shouldn't

#

personally i think anaconda made a very poor decision by shipping everything in one big base environment

compact parrot Dec 1, 2021, 9:31 PM

#

Hi guys
How to implement roc auc for multiclass?
Tried various variants from google and nothing worked (perhaps cause i am not as smart as i want)
desert oar Dec 1, 2021, 9:33 PM

#

compact parrot Hi guys How to implement roc auc for multiclass? Tried various variants from goo...

you can compute it separately for each pair of classes, and combine those results using this formula: https://stats.stackexchange.com/q/76830/36229

Cross Validated

Area Under ROC Curve for Multiple Classes

I am working with a highly class-skewed three class classification problem. The class percentages are A = 1.8%, B = 17.5% and C = 80.7%. According to this paper, the following definition of multi-c...

compact parrot Dec 1, 2021, 9:33 PM

#

desert oar you can compute it separately for each pair of classes, and combine those result...

Thanks!

#

Compute separately like
for class in multiclass:
code
?

desert oar Dec 1, 2021, 9:34 PM

#

yes, it can be a for loop over pairs of classes

compact parrot Dec 1, 2021, 9:34 PM

#

Thanks!

odd meteor Dec 1, 2021, 9:35 PM

#

desert oar if it makes you feel better, the docs recommend against Estimator because it doe...

Thanks 😊. Could you send the link to the doc you referenced here? I'd love to read it as well

desert oar Dec 1, 2021, 9:35 PM

#

actually.. i'm not sure if that's how you do it

#

let me look into this a bit more @compact parrot

compact parrot Dec 1, 2021, 9:35 PM

#

desert oar let me look into this a bit more @compact parrot

Ok!

desert oar Dec 1, 2021, 9:35 PM

#

you might need to do something like fit a different model for every pair of classes

compact parrot Dec 1, 2021, 9:35 PM

#

Oh

desert oar Dec 1, 2021, 9:35 PM

#

it's not generally used for multi-class problems

desert oar Dec 1, 2021, 9:35 PM

#

odd meteor Thanks 😊. Could you send the link to the doc you referenced here? I'd love to r...

https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator

Warning: Estimators are not recommended for new code. Estimators run v1.Session-style code which is more difficult to write correctly, and can behave unexpectedly, especially when combined with TF 2 code. Estimators do fall under our compatibility guarantees, but will receive no fixes other than security vulnerabilities. See the migration guide for details.

TensorFlow

tf.estimator.Estimator | TensorFlow Core v2.7.0

Estimator class to train and evaluate TensorFlow models.

compact parrot Dec 1, 2021, 9:36 PM

#

I am quite new for DT 👉 👈

#

I could share my code for better understanding

odd meteor Dec 1, 2021, 9:36 PM

#

desert oar https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator > Warning: Est...

Thanks man

desert oar Dec 1, 2021, 9:37 PM

#

compact parrot I could share my code for better understanding

you don't need to share the code, but if you explain more about what you are doing, people can provide more useful advice

#

although feel free to share if you can

compact parrot Dec 1, 2021, 9:37 PM

#

desert oar it's not generally used for multi-class problems

My scientific director in uni says that I should implement that

desert oar Dec 1, 2021, 9:38 PM

#

compact parrot My scientific director in uni says that I should implement that

let me skim the paper that the stackexchange answer linked

compact parrot Dec 1, 2021, 9:39 PM

#

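#%%

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler

# df (the loaded dataset) and show_info are assumed to be defined in earlier cells

#%%

x = df.loc[:,0:63]
y = df[64]

n_classes = y.nunique()  # number of distinct labels (y[0] would only give the first label)

#%%

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=42)

#%%

sc = StandardScaler()
x_train = pd.DataFrame(sc.fit_transform(x_train))
x_test = pd.DataFrame(sc.transform(x_test))

#%%

# Naive Bayes
gnb = GaussianNB()
fit = gnb.fit(x_train, y_train)
y_train_pred = fit.predict(x_train)
y_test_pred = fit.predict(x_test)

result = {'y_train': y_train, 'y_test': y_test, 'y_train_pred': y_train_pred, 'y_test_pred':y_test_pred}
show_info('Naive Bayes', gnb, result)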
I am using this dataset
https://www.kaggle.com/kyr7plus/emg-4

desert oar Dec 1, 2021, 9:39 PM

#

https://www.hpl.hp.com/techreports/2003/HPL-2003-4.pdf yeah here, see section 8.2 for an explanation of this.
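For reference, a rough NumPy-only sketch of the pairwise idea mentioned earlier: compute an AUC for every pair of classes from the predicted class probabilities and average the results (an approach often attributed to Hand and Till). The function names and toy data are illustrative assumptions, not code from the linked report, and no refitting per pair is needed because the per-class probabilities of one model are reused.

```python
from itertools import combinations

import numpy as np

def binary_auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative
    (ties count as one half)."""
    pos = np.asarray(pos_scores, dtype=float).reshape(-1, 1)
    neg = np.asarray(neg_scores, dtype=float).reshape(1, -1)
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

def pairwise_multiclass_auc(y_true, proba, classes):
    """Average AUC over all unordered class pairs.

    proba[:, k] is the predicted probability of classes[k].
    """
    y_true = np.asarray(y_true)
    proba = np.asarray(proba, dtype=float)
    pair_aucs = []
    for i, j in combinations(range(len(classes)), 2):
        in_i = y_true == classes[i]
        in_j = y_true == classes[j]
        a_ij = binary_auc(proba[in_i, i], proba[in_j, i])  # i vs j, scored by P(i)
        a_ji = binary_auc(proba[in_j, j], proba[in_i, j])  # j vs i, scored by P(j)
        pair_aucs.append((a_ij + a_ji) / 2)
    return float(np.mean(pair_aucs))

# Toy check: perfect probabilities give 1.0, uninformative ones give 0.5.
y_true = [0, 0, 1, 1, 2, 2]
classes = [0, 1, 2]
perfect = np.eye(3)[y_true]
uninformative = np.full((6, 3), 1 / 3)
print(pairwise_multiclass_auc(y_true, perfect, classes))        # 1.0
print(pairwise_multiclass_auc(y_true, uninformative, classes))  # 0.5
```

With the Naive Bayes model above, `proba` would come from gnb.predict_proba(x_test) and `classes` from gnb.classes_. As far as I know, scikit-learn's roc_auc_score also offers a multi_class="ovo" option for this kind of pairwise average, so the hand-rolled version is mainly useful for seeing what is being computed.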

compact parrot Dec 1, 2021, 9:40 PM

#

Thanks, will read it

HINT: This is a great place to use filters!

Using the original dataframe, take the genres column, and only keep the first genre.

For example, if the value was previously Comedy,Drama,Romance, then it would become Comedy

Create a pivot table where the average runtime of the movie is examined

Make the rows Year and the columns Genres

Select 4 Genres of your choice. Filter your dataframe so that only those 4 Genres are left

Create a pivot table of the average runtime of movies over time. The rows are therefore the year

The columns will be the 4 Genres you filtered for