hasty grail Nov 5, 2020, 5:49 AM

#

yes

heady hatch Nov 5, 2020, 5:49 AM

#

But even prior to that, how would I know to get the b column?

#

Because as of now, I have no idea where the value is.

hasty grail Nov 5, 2020, 5:49 AM

#

oh I assumed that you only wanted to perform a search on the b column

heady hatch Nov 5, 2020, 5:49 AM

#

Oh! hahaha

#

Yea sorry for being unclear.

hasty grail Nov 5, 2020, 5:50 AM

#

if there are multiple columns then you can convert the entire thing to a NumPy array

#

nonzero would still work

heady hatch Nov 5, 2020, 5:50 AM

#

So essentially we would need to scan across all columns for it?

hasty grail Nov 5, 2020, 5:50 AM

#

yes

heady hatch Nov 5, 2020, 5:50 AM

#

Ahh okay.

hasty grail Nov 5, 2020, 5:51 AM

#

Am assuming that you want *all* the occurrences

#

not just the first match
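
A minimal sketch of the approach described above, assuming a small DataFrame of plain scalar values; the frame `df` and the searched `value` are made up for illustration. It converts the frame to a NumPy array, uses `np.nonzero` to find every match (not only the first), and maps the positions back to row and column labels.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the value we look for appears in three different columns.
df = pd.DataFrame(
    {"a": ["x", "y", "z"], "b": ["y", "q", "r"], "c": ["s", "t", "y"]},
    index=["r0", "r1", "r2"],
)
value = "y"

# Compare every cell at once, then collect the positions of all matches.
rows, cols = np.nonzero(df.to_numpy() == value)

# Translate integer positions back to the original row and column labels.
matches = list(zip(df.index[rows], df.columns[cols]))
print(matches)
```

Each entry of `matches` is a `(row label, column label)` pair, so the column does not need to be known in advance.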

narrow flume Nov 5, 2020, 5:53 AM

#

I have a question about drawing a Sierpiński triangle with matplotlib

golden pecan Nov 5, 2020, 6:26 AM

#

hi everyone, i have a JSON question, hope this is the right channel to ask.

I need to parse the JSON from a string. it goes like this:

is there a way to just get the JSON part without declaring something something parts?

hasty grail Nov 5, 2020, 6:41 AM

#

yes you can use regex
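
A rough sketch of the regex idea, assuming the string contains exactly one JSON object and no stray braces around it. The sample `text` is hypothetical, since the original string was not posted.

```python
import json
import re

# Hypothetical input: a JSON object embedded in other text.
text = 'something something {"name": "example", "values": [1, 2, 3]} trailing text'

# Greedy match from the first "{" to the last "}" (DOTALL lets it span lines).
match = re.search(r"\{.*\}", text, flags=re.DOTALL)

# Parse only the matched part; None if no braces were found.
payload = json.loads(match.group(0)) if match else None
print(payload)
```

If the surrounding text can itself contain braces, this pattern would over-match and a stricter extraction would be needed.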

chilly pasture Nov 5, 2020, 6:52 AM

#

hi how do i activate a conda virtual environment in colab that persists across all cells?

#

📎 unknown.png

#

📎 unknown.png

#

!source activate command works only for that particular cell

arctic wedgeBOT Nov 5, 2020, 6:53 AM

#

Bad argument

Unable to convert 'activate command works only for that particular cell' to valid command, tag, or Cog.

velvet thorn Nov 5, 2020, 6:55 AM

#

Can we search for index by values in pandas DataFrame?
@heady hatch why do you want to do that?

#

that's my first question

#

second question: do you want the original index, or a numeric one?

heady hatch Nov 5, 2020, 7:34 AM

#

@velvet thorn Because I have a dataframe consisting of unique values throughout, and I need to look up the index of those unique values.

eg

a ('asdf', 1) ('fdsa', 2) ('qwert', 3)
b ('zxcv', 4) ('vcxz', 5) ('qsqdqw', 6)
c ...

original index, but I'm assuming I could just map it back using the numerical one.

One solution I've found was

df.isin([value]).any(axis=1)

This will give me the index of where the value exists.

But I think looking up values to transform them isn't efficient. I think it would be better to transform the values beforehand before turning it into a dataframe.

Originally what I needed to do is count all the values in a list, then turn that count into a dataframe.

But then later on, the objective changed into count all values plus some metadata.

So I was thinking since I've already made the dataframe, I could just go back and insert the metadata at the particular values.

Apparently it's a whole other mess. hahaha
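
For reference, `df.isin([value]).any(axis=1)` returns a boolean Series (one entry per row) rather than the labels themselves; indexing `df.index` with that mask gives the labels. A small sketch with made-up string values:

```python
import pandas as pd

# Hypothetical frame with labelled rows.
df = pd.DataFrame(
    {0: ["asdf", "zxcv"], 1: ["fdsa", "vcxz"], 2: ["qwert", "qsqdqw"]},
    index=["a", "b"],
)
value = "vcxz"

# True for every row that contains the value in any column.
mask = df.isin([value]).any(axis=1)

# Use the mask to pull out the original row labels.
labels = df.index[mask].tolist()
print(labels)
```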

velvet thorn Nov 5, 2020, 7:35 AM

#

hm

#

@heady hatch okay, wait

#

originally you said "by column", didn't you?

#

like can you give a realistic example?

heady hatch Nov 5, 2020, 7:37 AM

#

Hmm what do you mean by column?

#

Oh I think it was Darklight's solution of scanning by column.

velvet thorn Nov 5, 2020, 7:38 AM

#

I'm not really sure what the code you posted represents

#

(also...why are there tuples in your DF...?)

heady hatch Nov 5, 2020, 7:42 AM

#

@velvet thorn
A realistic example, hmm.

So let's say we have a list of features.

#features for category
[['feat_1', 'feat_2', 'feat_3', 'feat_1', 'feat_3'], ['feat_1', 'feat_3', 'feat_4', 'feat_5']]
# turning into count
[{'feat_1': 2, 'feat_2': 1, 'feat_3': 2}, {'feat_1': 1, 'feat_3': 1, 'feat_4': 1, 'feat_5': 1}]

df
category1 ('feat_1', 2) ('feat_2', 1) ('feat_3', 2) (None)
category2 ('feat_1', 1) ('feat_3', 1) ('feat_4', 1) ('feat_5', 1)

But now I need to go back into each feature and add some metadata.

velvet thorn Nov 5, 2020, 7:50 AM

#

🥴

#

are those values?

heady hatch Nov 5, 2020, 7:50 AM

#

Yup.

velvet thorn Nov 5, 2020, 7:51 AM

#

...is it supposed to be like this?

#

>>> df = pd.DataFrame([[1, 1], [1, 2], [1, 3], [1, 1], [1, 3], [2, 1], [2, 3], [2, 4], [2, 5]], columns=['category', 'feature'])
>>> df
   category  feature
0         1        1
1         1        2
2         1        3
3         1        1
4         1        3
5         2        1
6         2        3
7         2        4
8         2        5
>>> df.groupby('category').count()
          feature
category         
1               5
2               4

#

(I meant more something like this btw)

#

like something that can be executed

undone flare Nov 5, 2020, 8:04 AM

#

df_csv.loc[df_csv['Type 1'] == 'Fire'] I have this but I only want to include those who are Flying Fire pokemon so how can I do it?

velvet thorn Nov 5, 2020, 8:05 AM

#

df_csv.loc[df_csv['Type 1'] == 'Fire'] I have this but I only want to include those who are Flying Fire pokemon so how can I do it?
@undone flare show data

undone flare Nov 5, 2020, 8:05 AM

#

ok

velvet thorn Nov 5, 2020, 8:05 AM

#

no images please

undone flare Nov 5, 2020, 8:06 AM

#

then?

velvet thorn Nov 5, 2020, 8:07 AM

#

text

heady hatch Nov 5, 2020, 8:07 AM

#

@velvet thorn
Something like this, this was generated via random ascii lowercase letters 100 times for 3 times.

In [34]: pd.DataFrame(results)
Out[34]:
        0       1       2       3       4
0  (d, 8)  (v, 7)  (q, 6)  (u, 5)  (m, 5)
1  (c, 9)  (u, 7)  (e, 6)  (d, 6)  (q, 6)
2  (n, 8)  (b, 7)  (z, 7)  (o, 7)  (p, 6)

velvet thorn Nov 5, 2020, 8:07 AM

#

although I'm going to assume that you have a Type 1 and Type 2 column

undone flare Nov 5, 2020, 8:07 AM

#

yes

velvet thorn Nov 5, 2020, 8:07 AM

#

accordingly, I believe you want df_csv[(df_csv['Type 1'] == 'Fire') & (df_csv['Type 2'] == 'Flying')].

#

assuming 'Fire' can only be in 'Type 1'

undone flare Nov 5, 2020, 8:08 AM

#

oh

#

I need to make them tuple

velvet thorn Nov 5, 2020, 8:08 AM

#

...

heady hatch Nov 5, 2020, 8:08 AM

#

HAHAHA

velvet thorn Nov 5, 2020, 8:08 AM

#

what?

#

why would you do that

#

no don't do it

undone flare Nov 5, 2020, 8:08 AM

#

Wait

heady hatch Nov 5, 2020, 8:08 AM

#

Don't do it.

velvet thorn Nov 5, 2020, 8:08 AM

#

@velvet thorn
Something like this, this was generated via random ascii lowercase letters 100 times for 3 times.

In [34]: pd.DataFrame(results)
Out[34]:
        0       1       2       3       4
0  (d, 8)  (v, 7)  (q, 6)  (u, 5)  (m, 5)
1  (c, 9)  (u, 7)  (e, 6)  (d, 6)  (q, 6)
2  (n, 8)  (b, 7)  (z, 7)  (o, 7)  (p, 6)

@heady hatch 🥴

#

why are there tuples in your DataFrame

#

that is Bad

heady hatch Nov 5, 2020, 8:08 AM

#

🥴

velvet thorn Nov 5, 2020, 8:08 AM

#

not Bad, but still Bad

undone flare Nov 5, 2020, 8:08 AM

#

no I mean

heady hatch Nov 5, 2020, 8:09 AM

#

It's terrible, but the people who wanted me to do this wanted the data like this.

#

Or hmm do you have any other suggestions?

undone flare Nov 5, 2020, 8:10 AM

#

do I have to do df_csv.loc[(df_csv['Type 1'] == 'Fire') & (df_csv['Type 2' == 'Flying'])]?

heady hatch Nov 5, 2020, 8:10 AM

#

Depending on what you want.

velvet thorn Nov 5, 2020, 8:10 AM

#

do I have to do df_csv.loc[(df_csv['Type 1'] == 'Fire') & (df_csv['Type 2' == 'Flying'])]?
@undone flare that's literally what I typed

#

without the .loc

#

and assuming this

#

assuming 'Fire' can only be in 'Type 1'
@velvet thorn

#

if you can have Flying/Fire in that order then you need to add a bit more
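
A sketch of the "bit more" needed when the two types can appear in either order, assuming `Type 1` and `Type 2` columns as discussed; the rows below are invented for illustration.

```python
import pandas as pd

# Hypothetical sample; the last row deliberately has the types swapped.
df_csv = pd.DataFrame({
    "Name": ["Charizard", "Moltres", "Charmander", "Pidgey", "HypotheticalBird"],
    "Type 1": ["Fire", "Fire", "Fire", "Normal", "Flying"],
    "Type 2": ["Flying", "Flying", None, "Flying", "Fire"],
})

# Each comparison needs its own parentheses because & binds tighter than ==.
fire_first = (df_csv["Type 1"] == "Fire") & (df_csv["Type 2"] == "Flying")
flying_first = (df_csv["Type 1"] == "Flying") & (df_csv["Type 2"] == "Fire")

# Keep rows matching either ordering.
fire_flying = df_csv[fire_first | flying_first]
print(fire_flying)
```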

undone flare Nov 5, 2020, 8:10 AM

#

ok

velvet thorn Nov 5, 2020, 8:10 AM

#

It's terrible, but the people who wanted me to do this wanted the data like this.
@heady hatch uh...

#

Or hmm do you have any other suggestions?
@heady hatch to store the data differently?

#

what's the first element in the tuple

heady hatch Nov 5, 2020, 8:11 AM

#

[Counter({'j': 7,
          'f': 3,
          'y': 8,
          'b': 2,
          'c': 3,
          'm': 8,
          's': 6,
          'z': 6,
          'r': 3,
          'a': 3,
          'h': 6,
          'd': 5,
          'w': 1,
          'p': 4,
          'g': 2,
          'i': 5,
          'u': 6,
          'q': 5,
          'o': 3,
          'n': 3,
          'l': 2,
          'k': 4,
          'x': 3,
          'v': 1,
          't': 1}),
 Counter({'i': 6,
          'e': 5,
          'f': 5,
          'v': 6,
          'g': 4,
          'o': 2,
          'x': 4,
          'q': 1,
          'm': 2,
          'k': 4,
          'y': 3,
          'w': 4,
          'a': 3,
          'r': 3,
          'z': 9,
          'd': 3,
          's': 4,
          'h': 5,
          'n': 2,
          'l': 2,
          'p': 8,
          'c': 5,
          't': 2,
          'b': 5,
          'u': 2,
          'j': 1}),
 Counter({'d': 2,
          'x': 3,
          'b': 5,
          'k': 4,
          'i': 6,
          't': 5,
          'v': 9,
          'm': 5,
          's': 3,
          'a': 4,
          'z': 5,
          'p': 3,
          'r': 4,
          'o': 9,
          'q': 5,
          'l': 5,
          'c': 3,
          'e': 4,
          'u': 1,
          'g': 4,
          'n': 1,
          'f': 1,
          'h': 4,
          'j': 3,
          'y': 1,
          'w': 1})]

So the data is something like this.

#

It's a count of certain values.

#

and they want the top 50, each as a column.

#

Not the top 50 of alphabetical characters but top 50 of something else.

#

The first element of the tuple is the key in the count, the second value is the count itself.

undone flare Nov 5, 2020, 8:13 AM

#

This gives me error df_csv.loc[(df_csv['Type 1'] == 'Fire') & (df_csv['Type 2' == 'Flying'])]

heady hatch Nov 5, 2020, 8:13 AM

#

And there are a thousand something of these counts.

#

What kind of error are you getting?

undone flare Nov 5, 2020, 8:13 AM

#

oh wait

#

nvm.

#

The placement of ] was wrong

velvet thorn Nov 5, 2020, 8:15 AM

#

The placement of ] was wrong
@undone flare yes, because I told you to look at the code that I wrote

#

not edit what you wrote...

#

as I said, you shouldn't be using .loc

#

And there are a thousand something of these counts.
@heady hatch wait, go back

#

so each individual Counter instance, when stored in the DataFrame, should have something to identify it?

#

i.e. a count from one is distinguishable from a count from another

heady hatch Nov 5, 2020, 8:16 AM

#

It’ll be identified by another value, which will be their index.

velvet thorn Nov 5, 2020, 8:17 AM

#

yeah

heady hatch Nov 5, 2020, 8:17 AM

#

I’m on mobile so I can’t type code. But something like

{“value”: Counter(...)}

And the value will be the index.

velvet thorn Nov 5, 2020, 8:17 AM

#

that's what you want in the result

#

what I mean is

#

IDEALLY

#

you would have a DataFrame with three columns

#

category, character, count

#

not that tuple mess 🥴

heady hatch Nov 5, 2020, 8:18 AM

#

Hahaha I’ve maintained two data frames. One before the tuple mess and the other one as the output that the other people want them.

velvet thorn Nov 5, 2020, 8:18 AM

#

why do they want that

#

did you ask?

undone flare Nov 5, 2020, 8:18 AM

#

as I said, you shouldn't be using .loc
@velvet thorn I am learning rn

heady hatch Nov 5, 2020, 8:19 AM

#

But then I needed to edit the tuples which started this whole journey.

velvet thorn Nov 5, 2020, 8:19 AM

#

which is why I'm telling you not to use it

heady hatch Nov 5, 2020, 8:19 AM

#

Lmeow

velvet thorn Nov 5, 2020, 8:19 AM

#

I'm just saying

#

it'd be a lot easier to add metadata

#

you would have a DataFrame with three columns
@velvet thorn with this

#

add one more column, done 🙂
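
A minimal sketch of the three-column layout suggested above, built from the feature lists posted earlier. The column names and the `meta` table are assumptions made for illustration, not the actual data.

```python
from collections import Counter

import pandas as pd

# Counts per category, taken from the earlier example lists.
counts = {
    "category1": Counter(["feat_1", "feat_2", "feat_3", "feat_1", "feat_3"]),
    "category2": Counter(["feat_1", "feat_3", "feat_4", "feat_5"]),
}

# One row per (category, feature) pair instead of tuples inside cells.
long_df = pd.DataFrame(
    [(cat, feat, n) for cat, counter in counts.items() for feat, n in counter.items()],
    columns=["category", "feature", "count"],
)

# Hypothetical metadata: adding it is a single merge on the feature column.
meta = pd.DataFrame({"feature": ["feat_1", "feat_3"], "group": ["A", "B"]})
long_df = long_df.merge(meta, on="feature", how="left")

# Top-N features per category (N = 2 here) without reshaping anything.
top = (
    long_df.sort_values(["category", "count"], ascending=[True, False])
    .groupby("category")
    .head(2)
)
print(top)
```

The wide "one column per rank" view that was requested could then be derived from `top` as a final presentation step, while the long frame stays easy to edit.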

heady hatch Nov 5, 2020, 8:20 AM

#

I think the reason they wanted it is because they’re not familiar with Python and they want to visually understand the counts.

velvet thorn Nov 5, 2020, 8:21 AM

#

create a visualisation then

heady hatch Nov 5, 2020, 8:22 AM

#

I’ll let them figure that out and I’ll update you tomorrow on what happens.

#

Going to head to bed, good night and thanks again.

velvet thorn Nov 5, 2020, 8:23 AM

#

yw!

boreal summit Nov 5, 2020, 8:26 AM

#

Hello everyone, I've been having a little issue. I'm unable to import datasets from sklearn. I'm getting a "URLopen error (error no 11001) getaddrinfo failed"

velvet thorn Nov 5, 2020, 8:28 AM

#

Hello everyone, I've been having a little issue. I'm unable to import datasets from sklearn. I'm getting a "URLopen error (error no 11001) getaddrinfo failed"
@boreal summit HUH.

#

are you running behind a proxy?

#

like are you in school or something

#

or somewhere that restricts what sites you can visit

brazen canyon Nov 5, 2020, 8:29 AM

#

Dru, that has to do with your internet
Check and try again

boreal summit Nov 5, 2020, 8:46 AM

#

No, I'm running it on vs code.

#

Sorry, I had to go do something real quick.

#

@velvet thorn @brazen canyon it's on vs code.

#

Running jupyter on vs code.

chrome barn Nov 5, 2020, 8:53 AM

#

running it on vscode has nothing to do with your internet, read the questions above again....

boreal summit Nov 5, 2020, 9:28 AM

#

I'm running a proxy.

#

Not on the internet.

lapis sequoia Nov 5, 2020, 9:29 AM

#

How would you guys break this down: np.zeros(shape=(7, 7, channels, 2), dtype=np.float32)? What should be the result of that shape? Is it a 7x7 matrix or?

undone flare Nov 5, 2020, 9:33 AM

#

np.zeros((7,7)) This will give 7x7 matrix

boreal summit Nov 5, 2020, 9:34 AM

#

I'm also not connected to the internet.

lapis sequoia Nov 5, 2020, 9:34 AM

#

I think I got it..is this a 4D tensor then?

undone flare Nov 5, 2020, 9:36 AM

#

yea I think so

lapis sequoia Nov 5, 2020, 9:36 AM

#

Something like..there x-channels 7x7 matrices twice

#

tricky but interesting :))

velvet thorn Nov 5, 2020, 10:27 AM

#

I'm also not connected to the internet.
@boreal summit you need to be

#

the datasets are downloaded

#

if you're accessing them for the first time

#

How would you guys break this down: np.zeros(shape=(7, 7, channels, 2), dtype=np.float32)? What should be the result of that shape? Is it a 7x7 matrix or?
@lapis sequoia it's 4D
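
A quick way to check this, assuming `channels = 3` (the real value was not given):

```python
import numpy as np

channels = 3  # assumed value for illustration

x = np.zeros(shape=(7, 7, channels, 2), dtype=np.float32)

print(x.ndim)    # number of axes
print(x.shape)   # length of each axis
print(x.size)    # total number of elements: 7 * 7 * channels * 2

# Fixing the last two axes leaves a single 7x7 matrix.
one_slice = x[:, :, 0, 0]
print(one_slice.shape)
```

So the array holds `channels * 2` separate 7x7 slices, indexed by the last two axes.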

boreal summit Nov 5, 2020, 10:27 AM

#

@velvet thorn ooh, I never knew. I thought they come with the installation. Thanks for the tip. 👍🏿👍🏿

undone flare Nov 5, 2020, 10:27 AM

#

How can I get this only for Bug?

📎 unknown.png

velvet thorn Nov 5, 2020, 10:28 AM

#

@velvet thorn ooh, I never knew. I thought they come with the installation. Thanks for the tip. 👍🏿👍🏿
@boreal summit np! the thing is some of the datasets are a bit larger

#

and many people will never use them

boreal summit Nov 5, 2020, 10:29 AM

#

@velvet thorn true, that's a valid reason.

lapis sequoia Nov 5, 2020, 10:30 AM

#

@undone flare filter where Type 1 == 'Bug' before doing the groupby()

undone flare Nov 5, 2020, 10:31 AM

#

How?

#

is .where() a thing?

lapis sequoia Nov 5, 2020, 10:32 AM

#

df_xlsx[df_xlsx['Type 1'] == 'Bug].groupby(['Type 1']).count()['count']

#

'Bug' - i missed the closing quote mark

undone flare Nov 5, 2020, 10:33 AM

#

thx

lapis sequoia Nov 5, 2020, 10:33 AM

#

and put Type 2 in the groupby too
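
Putting those two hints together, a sketch with invented rows. It uses `.size()` to count rows per group rather than the `count` column from the screenshot, which is not visible here.

```python
import pandas as pd

# Hypothetical sample with the same column names.
df_xlsx = pd.DataFrame({
    "Type 1": ["Bug", "Bug", "Bug", "Fire", "Bug"],
    "Type 2": ["Flying", "Poison", "Flying", "Flying", "Poison"],
})

# Filter first, then group by both type columns and count rows per group.
bug_counts = (
    df_xlsx[df_xlsx["Type 1"] == "Bug"]
    .groupby(["Type 1", "Type 2"])
    .size()
)
print(bug_counts)
```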

grave path Nov 5, 2020, 10:44 AM

#

Hello guys

#

lets say I want to do this
if condition meets put 1 else put 0 in the row

#

how do i add ELSE to this

#

data.income = data.income.replace('>50K',1)

lapis sequoia Nov 5, 2020, 10:46 AM

#

data.income.apply(lambda x: 1 if x == '>50K' else 0)

grave path Nov 5, 2020, 10:46 AM

#

lambda?

lapis sequoia Nov 5, 2020, 10:47 AM

#

or you can use np.where()

grave path Nov 5, 2020, 10:47 AM

#

how do i do it with np.where?

lapis sequoia Nov 5, 2020, 10:47 AM

#

np.where(data.income == '>50K', 1, 0)

grave path Nov 5, 2020, 10:49 AM

#

hmmm

#

Thank you

velvet thorn Nov 5, 2020, 2:08 PM

#

uh

#

data['income'] = (data['income'] == '>50K').astype(int)

#

in general, don't use apply if there's another method
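
For comparison, the three variants mentioned in this exchange side by side on a tiny made-up `income` column. All give the same 1/0 result; the first two are vectorized.

```python
import numpy as np
import pandas as pd

# Hypothetical data using the same labels as above.
data = pd.DataFrame({"income": [">50K", "<=50K", ">50K", "<=50K"]})

# Vectorized comparison, then cast True/False to 1/0.
via_compare = (data["income"] == ">50K").astype(int)

# np.where picks 1 where the condition holds and 0 elsewhere.
via_where = np.where(data["income"] == ">50K", 1, 0)

# apply calls the lambda once per row, which is usually slower on large data.
via_apply = data["income"].apply(lambda x: 1 if x == ">50K" else 0)

print(via_compare.tolist(), via_where.tolist(), via_apply.tolist())
```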

whole vortex Nov 5, 2020, 2:17 PM

#

How would I order the labels in the x axis of a graph using seaborn

velvet thorn Nov 5, 2020, 2:17 PM

#

@whole vortex they should be ordered by default

whole vortex Nov 5, 2020, 2:18 PM

#

So my data contains a date/time datatype and I've created a new column to retrieve and show the specific day based on these date/time values

#

That works well and good however when the graph is displayed, the days are ordered randomly

#

📎 Screenshot_2020-11-05_at_14.18.58.png

#

I have 6 of these graphs btw

#

Ideally I want to start with Monday and end with Sunday. Do you or anyone here know if there's a way to custom order the labels here?

velvet thorn Nov 5, 2020, 2:19 PM

#

ah, okay

#

this is a bit different

#

they're not ordered randomly

#

they're ordered in increasing order of value

#

you need to order the source data

whole vortex Nov 5, 2020, 2:20 PM

#

That's coincidence

#

I'll give you all 6 graphs

velvet thorn Nov 5, 2020, 2:20 PM

#

okay, but anyway

#

have you ordered the source data

whole vortex Nov 5, 2020, 2:20 PM

#

📎 Screenshot_2020-11-05_at_14.20.50.png

#

I haven't done anything to change the data's order

#

I've only been adding data to the pre-existing rows and analysing it all in different ways

velvet thorn Nov 5, 2020, 2:22 PM

#

hm.

#

categorical data with Seaborn is a bit tricky

whole vortex Nov 5, 2020, 2:22 PM

#

That reference is what I used to be able to create 6 separate graphs with the data I had and to present them nicely

#

I'm not restricted to seaborn, I've just been sticking to it because it looks nice 😬

#

I don't mind trying something new? To be honest, I think this is an aesthetic problem and not really needed but I think it'd be nicer to have the days ordered

velvet thorn Nov 5, 2020, 2:32 PM

#

sorry got distracted

#

@whole vortex okay I don't normally use Seaborn (don't like the abstractions)

#

there's probably a way

#

but I don't know what it is

#

in matplotlib

whole vortex Nov 5, 2020, 2:32 PM

#

Aha, don't worry, you're volunteering 😂

velvet thorn Nov 5, 2020, 2:32 PM

#

I would just process the data manually

#

because that's basically the result of a groupby, right

#

and feed that directly to ax.plot

#

because then I would be able to control the order of the data

whole vortex Nov 5, 2020, 2:33 PM

#

This is going to be interesting. I'm quite new to data science as a whole so still figuring some things out

#

I did come across something earlier regarding ordering the days but didn't manage to apply it

#

https://stackoverflow.com/questions/49034829/keep-weekdays-ordered-on-pandas-boxplot-using-seaborn

Stack Overflow

Keep weekdays ordered on pandas boxplot using seaborn

I have a simple dataset with days on it:

dt, value, coltype
2017-01-01, 10, A
2017-01-02, 11, B
2017-01-03, 30, A
2017-01-04, 90, C
2017-01-05, 9, A
2017-01-06, 13, E
2017-01-07, 12, C
2017-01-0...

velvet thorn Nov 5, 2020, 2:34 PM

#

{row,col,hue}_order : lists

Order for the levels of the faceting variables. By default, this will be the order that the levels appear in data or, if the variables are pandas categoricals, the category order.

#

this might help

#

check that out

#

https://seaborn.pydata.org/generated/seaborn.FacetGrid.html
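
One way to act on the "pandas categoricals" part of that description is to make the day column an ordered categorical, so anything that respects category order goes Monday to Sunday. A sketch with invented timestamps and a hypothetical `value` column:

```python
import pandas as pd

day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical data, deliberately not in weekday order.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2020-11-05", "2020-11-02", "2020-11-08", "2020-11-02", "2020-11-04"]
    ),
    "value": [3, 1, 5, 2, 4],
})

# Day names as an ordered categorical rather than plain strings.
df["day"] = pd.Categorical(
    df["timestamp"].dt.day_name(), categories=day_order, ordered=True
)

# Grouping now follows the category order, not order of appearance.
per_day = df.groupby("day", observed=True)["value"].sum()
print(per_day)
```

Seaborn functions that accept an `order` argument, or `FacetGrid` with `col_order`, could alternatively be passed `day_order` directly.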

whole vortex Nov 5, 2020, 2:50 PM

#

@velvet thorn does matplotlib have a facetgrid equivalent

#

I'm unsure how I'd go about this yet

lapis sequoia Nov 5, 2020, 3:13 PM

#

yes, take a look at subplots() on the matplotlib documentation

narrow flume Nov 5, 2020, 3:15 PM

#

can anyone help me with matplotlib? I am doing Sierpiński Triangle

tropic junco Nov 5, 2020, 3:47 PM

#

i have a value, temperature, and i want to make a graph in matplotlib with 0 C to 50 C, and i want my temperature to show on that graph, how will i make this?

whole vortex Nov 5, 2020, 3:48 PM

#

What type of graph do you want

tropic junco Nov 5, 2020, 3:48 PM

#

any kind tbh

#

a line one would be good though

hollow sentinel Nov 5, 2020, 3:55 PM

#

https://matplotlib.org/tutorials/introductory/pyplot.html

#

https://datatofish.com/line-chart-python-matplotlib/

Data to Fish

How to Plot a Line Chart in Python using Matplotlib - Data to Fish

In this short guide, you'll see how to plot a Line chart in Python using Matplotlib. Example is also included for demonstration.

tropic junco Nov 5, 2020, 3:55 PM

#

i see, thanks

#

but the data is different from the examples

#

i have one data and i want to display it between a range

#

so a straight line
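
A minimal sketch of that idea, assuming a single reading of 25 °C: draw a horizontal line at the reading and fix the y-axis to the 0 to 50 °C range. The `Agg` backend is chosen only so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line for a window
import matplotlib.pyplot as plt

temperature_c = 25.0  # hypothetical single reading

fig, ax = plt.subplots()

# A single value becomes a horizontal line across the plot.
ax.axhline(temperature_c, color="tab:red", label=f"{temperature_c} °C")

# Fix the visible range to 0..50 °C regardless of the reading.
ax.set_ylim(0, 50)
ax.set_ylabel("Temperature (°C)")
ax.set_xticks([])  # the x-axis carries no information here
ax.legend()
```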

hollow sentinel Nov 5, 2020, 5:19 PM

#

https://pythonforbiologists.com/ @left vault out of my realm of stuff I know but this might be helpful

Python for Biologists

mortal pendant Nov 5, 2020, 5:47 PM

#

Hey! I'm wanting to learn how to make an RNN but I can't find anything that doesn't require Tensorflow. However, I am unable to install tensorflow through pip as I get an error that a lot of other people seem to get but none of the alternative command lines work.

ERROR: Could not find a version that satisfies the requirement tensorflow (from versions: none)
ERROR: No matching distribution found for tensorflow

I've tried lots of the .whl files that I've seen suggested as solutions but either pip says it's unsupported or it just results in another large error. Any ideas on an actual fix?
I'm on Windows 10 64bit, I wish to use my GPU, I just updated to 3.8.0 to see if that might fix it despite the fact Tensorflow is supposed to support Python 3.5 and up, I'm on the latest version of pip... let me know if you need any more information

flint arrow Nov 5, 2020, 7:40 PM

#

hello

#

I am in my 1st sem of DSA

#

please suggest what I should be learning out of class

hollow sentinel Nov 5, 2020, 11:59 PM

#

@flint arrow do you like courses or books

flint arrow Nov 5, 2020, 11:59 PM

#

I am already in a Bachelor's course

#

I want to be good at programming

hollow sentinel Nov 6, 2020, 12:00 AM

#

you didn’t answer my question lmao

obtuse skiff Nov 6, 2020, 12:20 AM

#

Can someone pls help me understand bias in a neural network
say the bias is 1, does it act like another input and have a weight for each output/node

or does it just add 1 to each node.

#

Ive seen both, and idk which is correct or if both are and when to use one over the other

velvet thorn Nov 6, 2020, 1:19 AM

#

@obtuse skiff each neuron always has its own bias

#

but there are two ways to represent that

#

one bias per layer and one weight per neuron

#

or simply one bias per neuron

#

output = w * a + b, which is equivalent to w * (a + b / w).

#

although you can have one bias per layer

#

but that would make it harder to fit
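
The "extra input with its own weight" view and the "just add a bias" view from the question can be checked numerically. A sketch with random made-up shapes, assuming a plain dense layer without activation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))   # batch of 4 inputs with 3 features each
W = rng.normal(size=(3, 2))   # weights of a layer with 2 neurons
b = rng.normal(size=2)        # one bias per neuron

# View 1: add the bias vector after the matrix product.
out_add = x @ W + b

# View 2: append a constant input of 1 and treat the biases as its weights.
x_aug = np.hstack([x, np.ones((4, 1))])
W_aug = np.vstack([W, b])
out_aug = x_aug @ W_aug

print(np.allclose(out_add, out_aug))
```

Both give the same outputs, so the two pictures describe the same computation; they differ only in bookkeeping.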

smoky bobcat Nov 6, 2020, 3:25 AM

#

@serene scaffold here i am

serene scaffold Nov 6, 2020, 3:25 AM

#

there you are indeed

#

let's see if we can figure out what this article is saying: https://sebastianraschka.com/faq/docs/lda-vs-pca.html

Dr. Sebastian Raschka

What is the difference between LDA and PCA for dimensionality reduc...

Both LDA and PCA are linear transformation techniques: LDA is a supervised whereas PCA is unsupervised – PCA ignores class labels.

#

by the way, any time you have a general question about machine learning, a lot of people who know way more about the subject than me hang out here.

#

in this particular channel.

smoky bobcat Nov 6, 2020, 3:26 AM

#

I read it before, from my perspective for PCA is that it sees a covariance between 2 different datas and then tries to standardize it?

#

by the way, any time you have a general question about machine learning, a lot of people who know way more about the subject than me hang out here.
@serene scaffold ok thanks for the info

#

I read it before, from my perspective for PCA is that it sees a covariance between 2 different datas and then tries to standardize it?
@smoky bobcat is this right by any chance? like Tries to standardise data by looking at covariance matrix between different data @serene scaffold

serene scaffold Nov 6, 2020, 3:28 AM

#

I only learned about LDAs recently so I'm trying to wrap my head around all this myself

#

hmm

smoky bobcat Nov 6, 2020, 3:29 AM

#

i havent got even a clue about LDA

#

it's even more confusing than PCA

serene scaffold Nov 6, 2020, 3:32 AM

#

I'll see if another staff member can more effectively answer this question.

smoky bobcat Nov 6, 2020, 3:32 AM

#

ook

#

i think that LDA is more about classification while PCA is more about standardisation

velvet thorn Nov 6, 2020, 3:56 AM

#

i think that LDA is more about classification while PCA is more about standardisation
@smoky bobcat ...what do you mean by that?

smoky bobcat Nov 6, 2020, 3:57 AM

#

@smoky bobcat ...what do you mean by that?
@velvet thorn i mean that LDA tries to classify the data in different portions while PCA tries to get all the data at the same level. correct me if im wrong, im not an expert just a noobie trying to understand

velvet thorn Nov 6, 2020, 3:57 AM

#

uh.

#

to be clear

#

when you say LDA

#

you mean linear discriminant analysis, right?

tropic junco Nov 6, 2020, 3:57 AM

#

this is pretty vague, but what would be the best way to plot this kind of data, i just want to plot a few things like temp and humidity, i am getting this data from an api

#

can someone help?

velvet thorn Nov 6, 2020, 4:09 AM

#

@tropic junco your question is pretty vague.

#

as you noticed

#

also the data is very chunky

#

like maybe if you shared your ultimate objective

#

it'd be easier to help you

#

"I just want to plot a few things" <- what did you set out to do originally?

tropic junco Nov 6, 2020, 4:09 AM

#

i just want to plot a graph for temperature and humidity

#

but i am getting confused as how to do it, as i cant plot one time values

velvet thorn Nov 6, 2020, 4:11 AM

#

uh

#

isn't that one entry

#

in your dataset?

tropic junco Nov 6, 2020, 4:11 AM

#

wdym?

velvet thorn Nov 6, 2020, 4:12 AM

#

you said "this kind of data"

#

so I assume you have more like that

#

so just extract temperature and humidity

#

now you have 2 1D arrays

#

scatterplot them against each other

tropic junco Nov 6, 2020, 4:12 AM

#

i mean, if i have temp given 25 C , how do i plot it between a range of 0 C to 50 C

#

oh

heady hatch Nov 6, 2020, 4:13 AM

#

Hey @velvet thorn :^) They wanted to separate out the number in the tuple as its own column now.

eg

('a', 1) ... -> ('a', 1) (1)

velvet thorn Nov 6, 2020, 4:13 AM

#

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

heady hatch Nov 6, 2020, 4:13 AM

#

Hahahaha

velvet thorn Nov 6, 2020, 4:13 AM

#

I'M DYING

#

🥴

#

so who wants you to do this

#

I'm guessing they don't have DB experience

tropic junco Nov 6, 2020, 4:18 AM

#

it just shows an empty graph to me

heady hatch Nov 6, 2020, 4:19 AM

#

hahaha I think they wanted to do this so they can visually see how the data breaks down.

velvet thorn Nov 6, 2020, 4:25 AM

#

it just shows an empty graph to me
@tropic junco show code

tropic junco Nov 6, 2020, 4:26 AM

#

nvm, i realized it will be useless to plot a graph of one time values rather than comparing it with past ones

#

like a graph of the temperatures in the past week

smoky bobcat Nov 6, 2020, 4:56 AM

#

you mean linear discriminant analysis, right?
@velvet thorn yes sir

surreal willow Nov 6, 2020, 7:31 AM

#

Idk if this is the right channel, but is anyone here familiar with likelihood ratios?

flint arrow Nov 6, 2020, 10:39 AM

#

@hollow sentinel I meant I am in a course and its ok.

#

but u can suggest me some other courses too.

tropic junco Nov 6, 2020, 12:55 PM

#

how can you create graphs with sqlite query?

#

or, what is the best way to create a graph for the user's messages i get from my discord bot?
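
One possible route, sketched under assumptions: run the query with `sqlite3`, load the result into a pandas DataFrame, and plot from there. The `messages` table, its columns, and the rows are invented for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory database standing in for the bot's SQLite file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (author TEXT, sent_at TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [
        ("alice", "2020-11-06 12:00"),
        ("bob", "2020-11-06 12:05"),
        ("alice", "2020-11-06 12:10"),
        ("alice", "2020-11-06 12:15"),
    ],
)

# Let SQL do the aggregation, then hand the result to pandas.
query = """
    SELECT author, COUNT(*) AS n_messages
    FROM messages
    GROUP BY author
    ORDER BY n_messages DESC
"""
per_author = pd.read_sql_query(query, conn)
conn.close()

print(per_author)
# With matplotlib installed, per_author.plot.bar(x="author", y="n_messages")
# would draw the counts as a bar chart.
```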

hollow sentinel Nov 6, 2020, 1:33 PM

#

@flint arrow python for data science and machine learning bootcamp

flint arrow Nov 6, 2020, 1:36 PM

#

@hollow sentinel from?

#

there are so many

hollow sentinel Nov 6, 2020, 1:42 PM

#

Udemy

flint arrow Nov 6, 2020, 2:03 PM

#

right..thank you.

hasty grail Nov 6, 2020, 2:06 PM

#

Is there a more efficient way to do batched scatter operations (in TensorFlow terms) in NumPy? Currently I have something like this:

>>> import numpy as np
>>> a = np.zeros((5, 5))
>>> indices = [4, 2, 4, 3, 1]
>>> np.add.at(a, (np.arange(5), indices), 1)
>>> print(a)
[[0. 0. 0. 0. 1.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 1.]
 [0. 0. 0. 1. 0.]
 [0. 1. 0. 0. 0.]]

#

I'm also interested in ways to parallelize a loop that calls the batched scatter operation in each iteration.
The use case is building a histogram from a dataset (in practice the dataset is a generator instead of a NumPy array because it doesn't fit in memory).

import numpy as np

n_samples, n_positions, n_bins = 1024, 256, 100    # Real situation: (~64k, ~64k, ~256)
hist_per_position = np.zeros((n_positions, n_bins), dtype=int)
idx_dataset = np.random.randint(n_bins, size=(n_samples, n_positions))
for bin_indices in idx_dataset:
    np.add.at(hist_per_position, (np.arange(n_positions), bin_indices), 1)
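
One alternative worth trying, sketched here under the same toy sizes: flatten each (position, bin) pair into a single integer and count with `np.bincount`, processing the dataset in chunks. The helper `chunk_histogram` and the chunk size of 128 are made up for illustration; whether it is actually faster on the real data would need measuring.

```python
import numpy as np

n_samples, n_positions, n_bins = 1024, 256, 100
rng = np.random.default_rng(0)
idx_dataset = rng.integers(n_bins, size=(n_samples, n_positions))

# Reference result using the np.add.at loop from the question.
expected = np.zeros((n_positions, n_bins), dtype=np.int64)
for bin_indices in idx_dataset:
    np.add.at(expected, (np.arange(n_positions), bin_indices), 1)


def chunk_histogram(chunk, n_bins):
    """Histogram for a (rows, n_positions) chunk of bin indices."""
    n_positions = chunk.shape[1]
    # Encode (position p, bin k) as the flat index p * n_bins + k.
    flat = chunk + np.arange(n_positions) * n_bins
    counts = np.bincount(flat.ravel(), minlength=n_positions * n_bins)
    return counts.reshape(n_positions, n_bins)


# Accumulate partial histograms chunk by chunk (e.g. from a generator).
hist = np.zeros((n_positions, n_bins), dtype=np.int64)
for start in range(0, n_samples, 128):
    hist += chunk_histogram(idx_dataset[start:start + 128], n_bins)

print(np.array_equal(hist, expected))
```

Because the partial histograms are independent and simply add up, each chunk could in principle be handled by a separate worker and the results summed afterwards.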

rich silo Nov 6, 2020, 2:18 PM

#

Hey guys how do i mark code here?
If i need to post same code here?

hasty grail Nov 6, 2020, 2:18 PM

#

!code

arctic wedgeBOT Nov 6, 2020, 2:18 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

#

Hey @rich silo!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

rich silo Nov 6, 2020, 2:22 PM

#

!code-blocks

arctic wedgeBOT Nov 6, 2020, 2:22 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

rich silo Nov 6, 2020, 2:23 PM

#

print('Hello world!')

hasty grail Nov 6, 2020, 2:33 PM

#

There you go. 🙂

grave path Nov 6, 2020, 2:40 PM

#

Hello guys, I have a question. I'm gathering information for a project on an image classification problem: I have pictures of, for example, different animals, and the program needs to classify them as accurately as possible. What would you recommend I look into, considering I want to test multiple algorithms and see which is most accurate for such a problem? Should I use Machine Learning or Deep Learning, and what tools or libraries should I learn?

#

If I use keras would I be able to specify which algorithm I want it to use or how exactly does it work

hasty grail Nov 6, 2020, 2:43 PM

#

What do you mean by "algorithm"?

#

Keras is pretty much for Deep Learning only

grave path Nov 6, 2020, 2:44 PM

#

Well I just want someone to direct me a bit tbh, the algorithm to classify the pictures

hasty grail Nov 6, 2020, 2:44 PM

#

If you want to use other types of Machine Learning methods (such as KMeans) you may want to take a look at sklearn

grave path Nov 6, 2020, 2:44 PM

#

like different algorithms will give different accuracy

#

should I be using ML or DL?

#

what would be easier ?

hasty grail Nov 6, 2020, 2:47 PM

#

DL is a subset of ML

#

In your case, I do recommend using DL

#

There are plenty of tutorials on Keras that you can search online

grave path Nov 6, 2020, 2:48 PM

#

I know its a subset but I don't understand then if I use ML would then ML automatically use DL behind the scenes

hasty grail Nov 6, 2020, 2:48 PM

#

Keras is a Deep Learning library

#

Anything you set up there is basically DL

grave path Nov 6, 2020, 2:50 PM

#

Then I can specify in Keras whether I want it to use CNN, RNN or other algorithms?

rugged owl Nov 6, 2020, 2:59 PM

#

Hi everyone, as one of the authors of the open-source framework github.com/dstackai/dstack I’d like to kindly share with the community what I and my friends are doing currently to help use ML models in applications.

In today's blog post we wrote about how one can run ML models on live data to build interactive reports with our open-source library https://blog.dstack.ai/run-ml-model-on-live-data-to-build-interactive-reports If this is something relevant to your work, we’d appreciate your feedback!

dstack.ai

Run ML model on live data to build interactive reports

Using ML models to predict and solve business use cases currently is a very lengthy and iterative process. Data scientists and engineers need to tackle a lot of challenges continuously, from building and improving the models to deploying them and usi...

pallid oxide Nov 6, 2020, 3:07 PM

#

Hi guys! I'm looking for projects which utilise the concept of digital twins. I'm doing research for a school assignment and would like to see what has been done already.

vast lava Nov 6, 2020, 3:15 PM

#

data-science, I am looking at SymPy for calculating integrals from calculus. I am struggling with some fundamentals of calculating the area under a curve. Can anyone help?

rich silo Nov 6, 2020, 3:27 PM

#

Hello all, I am looking for some help with plotly.
I want to make 2 vertically stacked graphs that share the same range slider (and also the x-axis).
This is my code so far:

#

https://controlc.com/5a075f09

fig = make_subplots(rows=2, cols=1, shared_xaxes=True, row_width=[0...

fig = make_subplots(rows=2, cols=1, shared_xaxes=True, row_width=[0.2, 0.8]) fi - 5a075f09

#

Too long to paste

pallid oxide Nov 6, 2020, 3:30 PM

#

@rich silo plotly provides a library called dash, which has that capability. Maybe you could take a look at it?

#

See: https://dash.plotly.com/interactive-graphing

Part 4. Interactive Graphing and Crossfiltering | Dash for Python D...

Bind interactivity to the Dash Graph component whenever you hover, click, or select points on your chart.

hasty grail Nov 6, 2020, 3:41 PM

#

@grave path Sorry for the late response, you basically build your model block by block so it's highly customizable.

grave path Nov 6, 2020, 3:43 PM

#

@hasty grail its okay thanks a lot man I'll just have to figure out whether I will do it in ml or dl considering the time frame I have and which is less complex as Im learning ML right now

smoky bobcat Nov 6, 2020, 4:13 PM

#

anyone can suggest me a really good dataset to work on for a uni coursework?

hollow sentinel Nov 6, 2020, 4:28 PM

#

@smoky bobcat Kaggle is a great source for datasets

lapis sequoia Nov 6, 2020, 5:28 PM

#

@smoky bobcat uspto has some good sets https://developer.uspto.gov/data?MURL=data

viral rock Nov 6, 2020, 5:39 PM

#

hi I am new to python, is there any way to install pep8

#

in pycharm

#

i am using mac os

austere swift Nov 6, 2020, 5:40 PM

#

it should just be pip3 install pep8

spare trellis Nov 6, 2020, 5:42 PM

#

I don't know if this is the place to ask but should I learn SQL before starting a data science course or am I able to learn it while doing it?

austere swift Nov 6, 2020, 5:42 PM

#

you dont really need sql for data science

spare trellis Nov 6, 2020, 5:43 PM

#

Oh, do I need any other language understanding besides Python 3 or should I be set to dive in and have fun with it

austere swift Nov 6, 2020, 5:43 PM

#

no you can just go straight in lol

spare trellis Nov 6, 2020, 5:43 PM

#

woot thank you, have a good one

smoky bobcat Nov 6, 2020, 5:56 PM

#

https://www.kaggle.com/sid321axn/heart-statlog-cleveland-hungary-final

Heart Disease Dataset (Comprehensive)

statlog + cleveland + hungary dataset

#

how does this one look

hollow sentinel Nov 6, 2020, 5:57 PM

#

@smoky bobcat depends on what you want to do

smoky bobcat Nov 6, 2020, 5:57 PM

#

uni coursework

hollow sentinel Nov 6, 2020, 5:58 PM

#

yes but like do you want tabular data?

smoky bobcat Nov 6, 2020, 6:01 PM

#

what do you mean

austere swift Nov 6, 2020, 6:01 PM

#

what kinda data do you want

#

images? numbers? tables? etc

smoky bobcat Nov 6, 2020, 6:02 PM

#

numbers

austere swift Nov 6, 2020, 6:02 PM

#

that would be tabular data

smoky bobcat Nov 6, 2020, 6:02 PM

#

its been all day im searching for a good dataset as i need to start working on something asap

heady hatch Nov 6, 2020, 6:02 PM

#

Here's something real basic.

#

http://archive.ics.uci.edu/ml/datasets/Iris/

smoky bobcat Nov 6, 2020, 6:02 PM

#

that would be tabular data
@austere swift oh okay

heady hatch Nov 6, 2020, 6:02 PM

#

All numbers, nothing but numbers.

#

Lots of data science parts to practice.

austere swift Nov 6, 2020, 6:03 PM

#

tabular data is basically just any data thats in the form of tables, like different features of something

smoky bobcat Nov 6, 2020, 6:03 PM

#

Here's something real basic.
@heady hatch bro, cant do the most basic ones like these, these are used as examples in uni lectures

heady hatch Nov 6, 2020, 6:03 PM

#

Oh man.

errant cargo Nov 6, 2020, 6:37 PM

#

Hey guys, i'm a recent grad of computational physics, and for this year i've studied lots of python, data science tools like numpy, pandas, scikit-learn, data visualization, basic sql, machine learning fundamentals and algorithms with scikit-learn, and neural networks with keras tensorflow. I'm doing some projects, but I feel like it would be better for me to ask an experienced person on some tips, so that I know i'm not just wasting time. What more should I learn, what projects should I make, and is there something that i'm missing?

austere swift Nov 6, 2020, 6:39 PM

#

well if you havent gone into the deep math of neural networks and machine learning i highly recommend you do since thatll help you make much better models and will make your life a whole lot easier

#

as for projects that's really up to you and what you wanna do

#

since you're into computational physics you can do some projects of machine learning in physics

#

there were a few papers I've heard of that used machine learning and deep learning for CFD simulations and it made them wayy more efficient in terms of processing and speed, you can try to replicate those

errant cargo Nov 6, 2020, 6:42 PM

#

Yeah, I did go into the maths, they really are useful. Some holes here and there but I intend to make an implementation on each and everyone of them soon enough to make sure I learned. Yesterday I did a little project on computational physics, went well actually.

#

I'll try looking up for them

#

I've been wanting to know what else should I learn to get started on the career as a junior DS. I've heard that stuff like Azure spark is important

#

Does anyone know a platform for people looking for a mentee?

austere swift Nov 6, 2020, 6:49 PM

#

codementor.io?

errant cargo Nov 6, 2020, 6:49 PM

#

i'll try it out

#

thanks!

errant cargo Nov 6, 2020, 7:13 PM

#

Apparently its all paid, and I don't have much money atm. And I only want some guide/roadmap, I don't need someone to teach me something in specific.

heady hatch Nov 6, 2020, 7:17 PM

#

@errant cargo Hmm anything you're looking for specifically?

#

To say that you're not wasting time, and what you should learn, depends on what your final goal is.

#

Do you want to get a job? Do you want to go back into academia? etc etc.

errant cargo Nov 6, 2020, 7:23 PM

#

Getting a job first for sure

heady hatch Nov 6, 2020, 7:24 PM

#

Okay now what kind of job?

#

Do you want a MLE, DE, DS, DA, etc etc.

errant cargo Nov 6, 2020, 7:24 PM

#

Data Scientist

#

Don't know exactly what domain tbh

#

finance, or tech

heady hatch Nov 6, 2020, 7:25 PM

#

Okay, DS have different requirements and definitions at different companies.

#

Do you know what kind of company and what kind of ds they're looking for?

#

I think it's good that you have a good pool of skills to refine.

#

Now the next step would probably be looking for particular company to understand what skills they're looking for.

errant cargo Nov 6, 2020, 7:26 PM

#

makes sense

heady hatch Nov 6, 2020, 7:26 PM

#

Because how's your analytical skills?

errant cargo Nov 6, 2020, 7:26 PM

#

then work on the stuff they require

#

In terms of EDA, i believe that a few more notebooks and it'll be really decent

heady hatch Nov 6, 2020, 7:27 PM

#

Hmm not just EDA.

#

But actually breaking down a problem.

errant cargo Nov 6, 2020, 7:27 PM

#

There's some statistical concepts that I need to learn that my uni didnt cover, but thats fine

heady hatch Nov 6, 2020, 7:27 PM

#

Let's say a company asks you to break down why their user engagement is decreasing by 10% over the past few weeks.

errant cargo Nov 6, 2020, 7:27 PM

#

Abstracting and stuff, its decent. Can get better

#

hmm

heady hatch Nov 6, 2020, 7:28 PM

#

EDA is nice and helpful in many things but not really helpful if you can't get some insight that will help with the solution.

errant cargo Nov 6, 2020, 7:28 PM

#

itll depend on what do I have to work with

#

But I guess that what I have to work with depends on me as well

#

Say, maintaining a data base

heady hatch Nov 6, 2020, 7:29 PM

#

Here's some of the definition of DS I've come across.

Hard ML researchers
DS for products/decisions
DS, that's a senior version of DA
Some combination of DA + DE, maybe MLE

#

Probably many more.

#

Hard ML researchers usually look for graduate degrees in actual ML.

#

DS for products and decisions is dealing with the question I asked above.

#

senior version of DA is also that but I suppose adding ML to the mix.

#

Sometimes company doesn't have infrastructure so they ask you to do the data engineering too.

errant cargo Nov 6, 2020, 7:31 PM

#

That would be something i would have to work on a lot if they ask me

#

since I dont have a CS curriculum, just a computational physics

heady hatch Nov 6, 2020, 7:32 PM

#

I think if you want a direction for the next step to take, talk to people who are actually working and ask them what their company is like and what their data scientists are like.

#

I think having some kind of comfort with programming is nice.

errant cargo Nov 6, 2020, 7:32 PM

#

I've been interested in IBM recently, so i'll try that first

heady hatch Nov 6, 2020, 7:32 PM

#

Which then helps you ease into what company might be actually looking for.

errant cargo Nov 6, 2020, 7:33 PM

#

I'll try to find some then

heady hatch Nov 6, 2020, 7:33 PM

#

Good luck.

#

Feel free to come back and ask more questions.

errant cargo Nov 6, 2020, 7:34 PM

#

Atm i'm just developing skills that I know that i'll use as a Data Scientist, but I havent looked into the gritty details yet

#

Which now would be the moment

#

Thanks a lot, would definitely help

heady hatch Nov 6, 2020, 7:35 PM

#

Not to be mean but to play devil's advocate. How do you know you'll use them as a data scientist?

#

Unless you've had data scientist experience already, I'm curious of what you're using as your ground of evidence.

errant cargo Nov 6, 2020, 7:36 PM

#

Everywhere that I looked it mentioned

#

I'm mostly learning from books that are focused on data science

heady hatch Nov 6, 2020, 7:37 PM

#

That's fair.

#

To be honest, I'm in a similar boat as you.

errant cargo Nov 6, 2020, 7:37 PM

#

Data Science Tools for python, hands on machine learning with scikit-learn and keras

heady hatch Nov 6, 2020, 7:37 PM

#

I don't have any fancy degree in ML and pretty much everything is self taught.

#

Currently working as a NLP engineer.

errant cargo Nov 6, 2020, 7:37 PM

#

It's rough

heady hatch Nov 6, 2020, 7:38 PM

#

There are some data scientists I've come across that don't touch ML at all.

errant cargo Nov 6, 2020, 7:38 PM

#

Thats cool, i've been wanting to learn a bit on NLP

heady hatch Nov 6, 2020, 7:38 PM

#

Which then kinda makes me question why are you learning sklearn and tf/pt if you're not going to use them on your job.

#

But I'm digressing.

#

I think asking industry people for their experience is a much better metric.

#

Because you get to see what they're working with and what they're looking for.

errant cargo Nov 6, 2020, 7:39 PM

#

yeah, nothing better than people actually working on it

#

Although I would guess that it would depend a lot on the job that they're doing

heady hatch Nov 6, 2020, 7:40 PM

#

Mhm.

errant cargo Nov 6, 2020, 7:40 PM

#

So I would have to ask more than one person

heady hatch Nov 6, 2020, 7:40 PM

#

👍

#

I hope that gives you somewhat of a direction for the next step to take.

errant cargo Nov 6, 2020, 7:40 PM

#

thanks a lot for the help though, def helped

#

yeah, it did

#

Since you're working already, would you recommend me getting an intern before trying to apply as a DS?

heady hatch Nov 6, 2020, 7:41 PM

#

Yea, unless you have some kind of connection to the company.

#

Or maybe sometimes they're okay with you just having academia experience.

#

I think that part depends on how well you sell yourself in terms of job search + interview.

errant cargo Nov 6, 2020, 7:42 PM

#

yeah

#

In any case, maybe its good to do 2 months or 3 of internship just to fixate the stuff i learned

#

thanks for the talk bud

heady hatch Nov 6, 2020, 7:44 PM

#

Ye, update us. Would love to hear your progress.

errant cargo Nov 6, 2020, 7:44 PM

#

Yeah, same for you

smoky bobcat Nov 6, 2020, 8:16 PM

#

how do I balance a dataset?

heady hatch Nov 6, 2020, 8:17 PM

#

You can undersample the majority class, oversample the minority class, or combine under- and oversampling.
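
As a rough illustration of the oversampling option, here is a minimal sketch using only the standard library. The helper name `random_oversample` and the sample data are made up for this example; it simply duplicates randomly chosen minority-class rows until every class is as large as the biggest one.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Made-up imbalanced data: four rows of class 0, two rows of class 1.
rows = [[0.1], [0.2], [0.3], [0.4], [0.9], [1.0]]
labels = [0, 0, 0, 0, 1, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Resampling is usually applied to the training split only, so the evaluation data keeps its real class distribution. Libraries such as imbalanced-learn offer ready-made samplers if you would rather not write this by hand.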

smoky bobcat Nov 6, 2020, 8:18 PM

#

@heady hatch u good good in this ml stuff?

heady hatch Nov 6, 2020, 8:19 PM

#

Maybe? I have no idea.

#

I can only provide my thoughts. lol

smoky bobcat Nov 6, 2020, 8:19 PM

#

lol u work?

heady hatch Nov 6, 2020, 8:21 PM

#

You should ask your ds questions. hahaha

smoky bobcat Nov 6, 2020, 8:26 PM

#

lol

heady hatch Nov 6, 2020, 8:32 PM

#

~~Hey guys question on fine tuning gpt2.~~

~~Let's say I'm trying to generate stories, would it be better to fine tune it on the whole story text or the stories broken down into sentences?~~

Never mind, figured out a direction to head towards!

proper swift Nov 6, 2020, 10:13 PM

#

anyone use Kaggle on here?

gray phoenix Nov 6, 2020, 10:25 PM

#

I have an integer variable that is 20201015.

How do i convert it to datetime while maintaining the format of yyyymmdd?
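
One possible approach, sketched with the standard library only: turn the integer into a string and parse it with the `%Y%m%d` pattern. Note that a `datetime` object does not carry a display format of its own, so the yyyymmdd look is something you reapply with `strftime` when printing. The variable names are just for illustration.

```python
from datetime import datetime

raw = 20201015

# Parse: int -> str -> datetime, using the yyyymmdd pattern.
parsed = datetime.strptime(str(raw), "%Y%m%d")
print(parsed)

# Display: format the datetime back to yyyymmdd when you need that layout.
print(parsed.strftime("%Y%m%d"))
```

For a whole pandas column, `pd.to_datetime(column.astype(str), format="%Y%m%d")` should do the same parsing step in one call.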

hollow sentinel Nov 6, 2020, 10:29 PM

#

@proper swift yeah it's a great resource

proper swift Nov 6, 2020, 10:29 PM

#

@hollow sentinel could you get this csv file for me, https://www.kaggle.com/crawford/80-cereals?select=cereal.csv

80 Cereals

Nutrition data on 80 cereal products

#

ive forgot my password and my reset email hasnt come through yet :/

hollow sentinel Nov 6, 2020, 10:30 PM

#

you can't download it yourself?

#

oh

#

print(val_y.head())

#

anyone know why it's saying invalid syntax

#

I don't see it

gray phoenix Nov 6, 2020, 10:37 PM

#

I dont think you need print with .head()

#

@hollow sentinel

hollow sentinel Nov 6, 2020, 10:37 PM

#

nope still wrong

gray phoenix Nov 6, 2020, 10:37 PM

#

why are you printing the df?

hollow sentinel Nov 6, 2020, 10:38 PM

#

bc Kaggle asked

gray phoenix Nov 6, 2020, 10:38 PM

#

oh lol

hollow sentinel Nov 6, 2020, 10:38 PM

#

# print the top few validation predictions
print(iowa_model.predict(val_X.head())
# print the top few actual prices from validation data
val_y.head()

#

confusion

#

why is kaggle so stupid

#

idk why it's wrong too

#

nvm copy pasting from the answer key fixed it for some reason
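
A likely cause, judging only from the snippet as pasted: the `print(iowa_model.predict(val_X.head())` line opens three parentheses but closes two. Older Python versions often report an unclosed parenthesis as "invalid syntax" on the following line, which would explain why the next statement looked wrong. The sketch below only checks syntax with `compile`, so the model and data names do not need to exist; `is_valid_syntax` is a helper invented for this example.

```python
broken = "print(iowa_model.predict(val_X.head())"
fixed = "print(iowa_model.predict(val_X.head()))"


def is_valid_syntax(source):
    """Return True if the source compiles; names need not exist for this check."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


print(is_valid_syntax(broken))  # three "(" but only two ")"
print(is_valid_syntax(fixed))   # parentheses balanced
```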

nova smelt Nov 6, 2020, 10:42 PM

#

Is this the channel for stuff related to machine learning and AI?

hollow sentinel Nov 6, 2020, 10:43 PM

#

yessir

nova smelt Nov 6, 2020, 10:44 PM

#

If so...
Can you guys recommend any tutorials to learn neural networks? I've watched the series about neural networks and the series about machine learning by Tech With Tim

#

Dunno if you know him

#

But now I feel kinda stuck in what to do next

hollow sentinel Nov 6, 2020, 10:44 PM

#

Tensorflow 2.0 deep learning and artificial intelligence

#

it's on udemy

nova smelt Nov 6, 2020, 10:44 PM

#

Okay

#

I will check that out

#

Thx

hollow sentinel Nov 6, 2020, 10:45 PM

#

no problem

acoustic shadow Nov 6, 2020, 11:14 PM

#

i need some help with pandas?

#

my DataFrame seems to be duplicating itself

rich silo Nov 6, 2020, 11:14 PM

#

lol how

#

code plz

acoustic shadow Nov 6, 2020, 11:15 PM

#

sure

#

import seaborn
import matplotlib.pyplot
import numpy
import pandas
import requests
import re
import parse
from parse import *
import pandas as pd
    


#Pull Database, from Site 
DB = requests.get("https://www.milehighcomics.com/cgi-bin/genresearch.cgi?title=SUPERM").text

#Global Variables, to pull from
lines = DB.split("\n")

#Create Easy Dataframe to Confirm Conditions

data = pandas.DataFrame({
  "Store": "Mile High Comics",
  "Comics": lines ,
  "Comics": lines ,
  "Comics": lines ,
  "Comics": lines
})
#Change Data Frame size to display The entire Data frame


print("This is working?!?!")

#Fuctions, which search the scraped Site

def BatmanMap(line):
    for line in lines:
      return 1 if search("Batman", line) else 0
  
def WWMap(line):
    for line in lines:
      return 1 if search("Wonder Woman", line) else 0

def GLMap(line):
    for line in lines:
      return 1 if search("Green Lantern", line) else 0

def FMap(line):
    for line in lines:
      return 1 if search("Flash", line) else 0

#Mapping to the Data Frame

data["Wonder Woman"] = data["Comics"].map(WWMap)

data["Green Lantern"] = data["Comics"].map(GLMap)

data["Batman"] = data["Comics"].map(BatmanMap)

data["Flash"] = data["Comics"].map(FMap)

pd.set_option("display.max_rows", None, "display.max_columns", None)

print(data)

#

The project is to eventually graph a bar chart detailing something. What I chose was superheroes appearing in Superman titles on this comic store's site. However, I suck at coding and it doesn't seem to be working. I added in the pd.set_option, but ever since that was added it just makes 2 data frames: one which is the entire site's source, and another which is correctly formatted but doesn't work (because i suck)

#

so right now im just trying to get it to where Pandas formats the Dataframe i want, and not reposts the sites Source....

heady hatch Nov 7, 2020, 1:10 AM

#

I don't know if this is the issue you're having, but I think your dataframe is initialized with the same "Comics" key repeated, so each entry just overwrites the previous one.

data = pandas.DataFrame({
  "Store": "Mile High Comics",
  "Comics": lines ,
  "Comics": lines ,
  "Comics": lines ,
  "Comics": lines
})
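
A minimal sketch of why the repeated "Comics" entries collapse into one: a Python dict keeps a single value per key, so each duplicate replaces the one before it. The sample `lines` values are made up.

```python
lines = ["Superman #1", "Batman #2"]  # made-up sample rows

columns = {
    "Store": "Mile High Comics",
    "Comics": lines,
    "Comics": lines,  # same key again: replaces the entry above
    "Comics": lines,
}

# Only two keys survive, so a DataFrame built from this dict gets two columns.
print(list(columns))
```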

lapis sequoia Nov 7, 2020, 2:55 AM

#

Any recommendations on a finite difference book that gives examples in Python?

tropic junco Nov 7, 2020, 3:02 AM

#

how can i make a bar graph, with my x axis like - [1, 1, 1, 2, 3, 3, 2, 1, 5, 4, 1, 1, 2, 4, 2, 3, 1, 2], basically i want to make a graph based on the occurences of same elements

acoustic shadow Nov 7, 2020, 3:05 AM

#

Didnt fix it

#

but, did reduce code.

#

so thanks
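
Another thing that may be worth checking in the posted code: each `...Map(line)` function loops over the global `lines` and returns inside the first iteration, so it always tests the first scraped line instead of the row it was given. A hedged sketch of a per-row check follows; `mentions` is a made-up helper, the sample rows are invented, and a plain substring test is swapped in for `parse.search`.

```python
def mentions(hero, line):
    """Return 1 if the hero's name appears in this one line, else 0."""
    return 1 if hero in line else 0


# Made-up sample rows standing in for the scraped lines.
lines = ["Superman/Batman #1", "Wonder Woman #7", "Action Comics #12"]

batman_flags = [mentions("Batman", line) for line in lines]
print(batman_flags)
```

With a DataFrame, the same helper could be applied as `data["Comics"].map(lambda line: mentions("Batman", line))`, or the column could be tested directly with `data["Comics"].str.contains("Batman").astype(int)`.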

heady hatch Nov 7, 2020, 3:21 AM

#

@tropic junco I'm not sure what you're talking about with the xaxis, but you can look into how to make a histogram.
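
If the goal is one bar per distinct value with its number of occurrences, one way to sketch it is to count first and plot second. The counting below uses only the standard library; the plotting call is left as a comment because it assumes matplotlib is installed.

```python
from collections import Counter

values = [1, 1, 1, 2, 3, 3, 2, 1, 5, 4, 1, 1, 2, 4, 2, 3, 1, 2]

counts = Counter(values)
xs = sorted(counts)                  # distinct values, in order
heights = [counts[x] for x in xs]    # how often each value occurs

print(xs)       # [1, 2, 3, 4, 5]
print(heights)  # [7, 5, 3, 2, 1]

# With matplotlib available:
#   import matplotlib.pyplot as plt
#   plt.bar(xs, heights)
#   plt.show()
```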

tropic junco Nov 7, 2020, 3:21 AM

#

i see

tropic junco Nov 7, 2020, 4:08 AM

#

i did it :)

heady hatch Nov 7, 2020, 4:11 AM

#

Congratulations!

undone flare Nov 7, 2020, 7:49 AM

#

what does index_col do in pd.read_csv()

heady hatch Nov 7, 2020, 7:58 AM

#

It uses the column you choose as the DataFrame's index (the row labels) instead of the default 0, 1, 2, ... numbering.
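
A small sketch of the difference, assuming pandas is installed. The CSV text is made-up sample data read from memory so the example does not depend on a file.

```python
import io

import pandas as pd

# Made-up sample data; a real call would pass a file path instead.
csv_text = "name,hp,attack\nBulbasaur,45,49\nCharmander,39,52\n"

default = pd.read_csv(io.StringIO(csv_text))
named = pd.read_csv(io.StringIO(csv_text), index_col="name")

print(default.index.tolist())  # default integer row labels
print(named.index.tolist())    # the "name" column is now the row labels

# Rows can then be looked up by label.
print(named.loc["Charmander", "attack"])
```

`index_col` also accepts a column position such as `index_col=0`.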

undone flare Nov 7, 2020, 8:01 AM

#

k

rancid mango Nov 7, 2020, 9:30 AM

#

is this the chat for aperture science

winged lark Nov 7, 2020, 11:28 AM

#

good morning

mild topaz Nov 7, 2020, 11:45 AM

#

```
Traceback (most recent call last):
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 468, in wrapper
    resp = resource(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 583, in dispatch_request
    resp = meth(*args, **kwargs)
  File "E:\demo3\recDoc1.py", line 283, in post
    print("{}: {:.2f}%".format(label1, predictions1 * 100))
TypeError: unsupported format string passed to numpy.ndarray.__format__
```

summer holly Nov 7, 2020, 12:57 PM

#

Hi, I'm trying to deploy my custom keras flask app which has a size of about 2.3gb and due to these heavy size constraints, I don't think it is possible to use heroku or netlify to deploy it. Is there any alternative?

#

*free alternative

#

Or even a budget alternative

#

📎 Screenshot_20201107-1832422.jpg

grave frost Nov 7, 2020, 1:39 PM

#

Google VM??

#

@mild topaz The error is pretty self-explanatory - you passed an incorrect datatype
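
To make that concrete: the `{:.2f}` format code expects a single number, while `predictions1 * 100` in the traceback is a NumPy array. A minimal sketch of the failure and one possible fix, assuming NumPy is installed; the label, the array contents, and its shape are made up, since the real model output is not shown.

```python
import numpy as np

label = "cat"                     # hypothetical label
predictions = np.array([0.8731])  # hypothetical model output, shape (1,)

try:
    print("{}: {:.2f}%".format(label, predictions * 100))
except TypeError as err:
    # Same kind of error as in the traceback above.
    print("TypeError:", err)

# Pull out a single number before formatting.
score = float(predictions[0])
print("{}: {:.2f}%".format(label, score * 100))
```

If the real array has more dimensions, the index needs to match its shape, for example `predictions1[0][0]` for a (1, 1) output.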

slender nymph Nov 7, 2020, 1:43 PM

#

hi, can someone help me and tell me what this means:
ValueError: exog does not have full column rank.

grave frost Nov 7, 2020, 1:44 PM

#

Did you google your error first?

slender nymph Nov 7, 2020, 1:44 PM

#

```py
a=PanelOLS(dependent=df['logQ'],exog=df[['founderCEO','logassets','logage','bs_volatility']],time_effects=True)
print(a.fit())
```

#

yeah and not find a solution

grave frost Nov 7, 2020, 1:44 PM

#

BTW Post the whole Traceback

#

So it becomes easier to help you

slender nymph Nov 7, 2020, 1:45 PM

#

```py
import pandas as pd
import numpy as np
from linearmodels import PanelOLS


#read data
data = pd.read_excel("familyfirms.xlsx")

#drop NaN
data.dropna(inplace=True)

#Log Tobin's Q
data['logQ'] = np.log(data['Q'])

#Log age
data['logage'] = np.log(data['agefirm'])

#Log assets
data['logassets'] = np.log(data['assets'])

df = data.set_index(['company','year'])

a=PanelOLS(dependent=df['logQ'],exog=df[['founderCEO','logassets','logage','bs_volatility']],time_effects=True)
print(a.fit())

data
```

#

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-313-bb26f941ef72> in <module>
     21 df = data.set_index(['company','year'])
     22 
---> 23 a=PanelOLS(dependent=df['logQ'],exog=df[['founderCEO','logassets','logage','bs_volatility']],time_effects=True)
     24 print(a.fit())
     25 

~\anaconda3\lib\site-packages\linearmodels\panel\model.py in __init__(self, dependent, exog, weights, entity_effects, time_effects, other_effects, singletons, drop_absorbed)
   1038         drop_absorbed: bool = False,
   1039     ) -> None:
-> 1040         super(PanelOLS, self).__init__(dependent, exog, weights=weights)
   1041 
   1042         self._entity_effects = entity_effects

~\anaconda3\lib\site-packages\linearmodels\panel\model.py in __init__(self, dependent, exog, weights)
    242         )
    243         self._original_index = self.dependent.index.copy()
--> 244         self._validate_data()
    245         self._singleton_index: Optional[NDArray] = None
    246 

~\anaconda3\lib\site-packages\linearmodels\panel\model.py in _validate_data(self)
    381         w = w / w.mean()
    382         self.weights = PanelData(w)
--> 383         rank_of_x = self._check_exog_rank()
    384         self._constant, self._constant_index = has_constant(x, rank_of_x)
    385 

~\anaconda3\lib\site-packages\linearmodels\panel\model.py in _check_exog_rank(self)
    343         rank_of_x = matrix_rank(x)
    344         if rank_of_x < x.shape[1]:
--> 345             raise ValueError("exog does not have full column rank.")
    346         return rank_of_x
    347 

ValueError: exog does not have full column rank.
```

#

thats the rror

#

error

#

i dont understand why

#

everything seems okay

📎 unknown.png
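
Reading the last frame of the traceback, the check compares `matrix_rank(x)` with the number of exog columns, so the error suggests that at least one regressor can be written as a linear combination of the others (for example a duplicated column, an all-zero column, or one column that is the sum of two others). Which column is responsible cannot be told from the post. The sketch below reproduces the same rank comparison on made-up numbers, assuming NumPy is installed.

```python
import numpy as np

# Hypothetical regressors: the third column is the sum of the first two.
x = np.array([
    [1.0, 2.0, 3.0],
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 3.0],
])

rank_all = np.linalg.matrix_rank(x)
print(rank_all, x.shape[1])        # rank is below the column count: rank deficient

# Dropping the redundant column restores full column rank.
reduced = x[:, :2]
rank_reduced = np.linalg.matrix_rank(reduced)
print(rank_reduced, reduced.shape[1])
```

Running the same comparison on subsets of the real exog columns, for example on `df[cols].to_numpy()`, is one way to narrow down which variable is redundant.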

golden saffron Nov 7, 2020, 1:57 PM

#

Any leads on chatbot powered by Generative Models? Even any git repo link will do.

grave frost Nov 7, 2020, 2:52 PM

#

@golden saffron DO you want to make one, or do you want to use some pre-existing model?

golden saffron Nov 7, 2020, 2:53 PM

#

@grave frost I want to make one, but need some reference and guidance. I have already made few rule based and context based bots.

grave frost Nov 7, 2020, 2:55 PM

#

Do you know what is a generative model?

#

And have you done NLP before?

golden saffron Nov 7, 2020, 2:56 PM

#

Yes, my next target is to make a Bot who can interact with user.

grave frost Nov 7, 2020, 2:56 PM

#

Yeah, but do you know Machine Learning?

golden saffron Nov 7, 2020, 2:57 PM

#

Something like GPT - 3.

grave frost Nov 7, 2020, 2:57 PM

#

Are you actually trying to make GPT-3 ? I am confused

golden saffron Nov 7, 2020, 2:58 PM

#

Yes, ML, NLP, RL I know. What to use RL for the chatbot.

#

Not exactly GPT - 3 but as I mentioned above some RL based chatbot.

grave frost Nov 7, 2020, 2:59 PM

#

You can't use RL in a chatbot 🤦

misty cargo Nov 7, 2020, 3:00 PM

#

You can't use RL in a chatbot 🤦
@grave frost imagine lol

grave frost Nov 7, 2020, 3:00 PM

#

First, I recommend brushing up on the basics of ML and NLP before diving into chatbots

golden saffron Nov 7, 2020, 3:00 PM

#

Why not. every conversation will be at one state, there would be some information available about that user that can be used for the conversation

grave frost Nov 7, 2020, 3:01 PM

#

wow. That is not how it works

#

Recommend to brush up on RL as well

misty cargo Nov 7, 2020, 3:01 PM

#

so what do u want exactly?

unborn wraith Nov 7, 2020, 3:01 PM

#

hey!

misty cargo Nov 7, 2020, 3:02 PM

#

hi

unborn wraith Nov 7, 2020, 3:02 PM

#

guys i am new to programming any good resource to learn data science\

golden saffron Nov 7, 2020, 3:02 PM

#

That state can tell me the interest of the user, at least gender, age etc. that can be used in conversation.

misty cargo Nov 7, 2020, 3:02 PM

#

guys i am new to programming any good resource to learn data science
@unborn wraith sure do you know the maths already?

grave frost Nov 7, 2020, 3:02 PM

#

That state can tell me the interest of the user, at least gender, age etc. that can be used in conversation.
@golden saffron ok , just tell us for what task is the chatbot for

unborn wraith Nov 7, 2020, 3:02 PM

#

@misty cargo nope

#

i am a lil young

misty cargo Nov 7, 2020, 3:02 PM

#

or do you need some calculus, linear algebra and stuff too

golden saffron Nov 7, 2020, 3:03 PM

#

@grave frost I think you are mistaken about RL. RL is all about having a state, an option to be picked, and a reward.

#

WHy cannot that be applied for chatbots?

grave frost Nov 7, 2020, 3:03 PM

#

@golden saffron You can research about that. Bottom line is that it would produce a random bag of words

unborn wraith Nov 7, 2020, 3:03 PM

#

please tag me

misty cargo Nov 7, 2020, 3:03 PM

#

@misty cargo nope
@unborn wraith oh ok then i recommend starting with calculus, you can find courses on mit open courseware for both LA and Calculus

golden saffron Nov 7, 2020, 3:04 PM

#

Dude, That would come up with a set of meaning responses that the RL will have to select.

misty cargo Nov 7, 2020, 3:04 PM

#

that applies to probability and statistics too, mit has pretty good courses

golden saffron Nov 7, 2020, 3:04 PM

#

Dude, that would come up with a set of meaningful responses that the RL will have to select.

grave frost Nov 7, 2020, 3:04 PM

#

@unborn wraith Just watch the 3Blue1Brown (3b1b) YouTube videos and they will give you an extremely good base

golden saffron Nov 7, 2020, 3:04 PM

#

e.g. a Hi can be responded by Hi, how are you

grave frost Nov 7, 2020, 3:04 PM

#

@golden saffron bro, it doesn't work like that

golden saffron Nov 7, 2020, 3:04 PM

#

or by Hello, whats up

misty cargo Nov 7, 2020, 3:05 PM

#

after that i suggest

unborn wraith Nov 7, 2020, 3:05 PM

#

can anyone provide me a link?

grave frost Nov 7, 2020, 3:05 PM

#

You would have to provide a whole skeleton for RL to fill it up with below avg accuracy

misty cargo Nov 7, 2020, 3:05 PM

#

https://www.coursera.org/learn/machine-learning Stanford Machine Learning (Andrew Ng)
http://work.caltech.edu/lectures.html Caltech courses that are great
https://www.fast.ai/ EVERYTHING FROM FAST.AI

Coursera

Machine Learning

Learn Machine Learning from Stanford University. Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, ...

Home

Making neural nets uncool again

golden saffron Nov 7, 2020, 3:06 PM

#

@grave frost, can you explain where did you actually used RL? and whats your understanding of it.

grave frost Nov 7, 2020, 3:06 PM

#

@unborn wraith https://developers.google.com/machine-learning/crash-course Google's crash course. Very good, with interactive things and 2 min videos. Google whatever you don't get, or ask it here. It is for absolute beginners

Google Developers

Machine Learning Crash Course | Google Developers

unborn wraith Nov 7, 2020, 3:06 PM

#

thanks

grave frost Nov 7, 2020, 3:06 PM

#

Yw 🙂

misty cargo Nov 7, 2020, 3:07 PM

#

thanks
@unborn wraith np

golden saffron Nov 7, 2020, 3:07 PM

#

I just want to explore RL into Chatbot to understand the user and have a better meaning-full and rewarding conversation.

#

if it fails that will be perfectly fine.

grave frost Nov 7, 2020, 3:08 PM

#

bro, you can explore ofc, no one is stopping you, but you will have to research a bit to find out how it can be used. The way you described it is not how it is done

golden saffron Nov 7, 2020, 3:09 PM

#

Okay that means you don't know RL.

misty cargo Nov 7, 2020, 3:09 PM

#

also guys i came talking here just because there was some requirement of sending 50 messages or smth

grave frost Nov 7, 2020, 3:09 PM

#

ohk, I am not saying anything now lemon_swag

misty cargo Nov 7, 2020, 3:09 PM

#

if you need help im free to help

undone flare Nov 7, 2020, 3:10 PM

#

any tips for ds?

grave frost Nov 7, 2020, 3:10 PM

#

@undone flare depends on what you want to do (in general)

undone flare Nov 7, 2020, 3:10 PM

#

data analysis..

misty cargo Nov 7, 2020, 3:11 PM

#

any tips for ds?
@undone flare don t jump to deep learning right off

undone flare Nov 7, 2020, 3:11 PM

#

yea I am not

misty cargo Nov 7, 2020, 3:11 PM

#

most problems can be solved with scikit in like light speed

#

even tho you may not like it lol

grave frost Nov 7, 2020, 3:11 PM

#

@undone flare yeah, there is a site called kaggle.com with great datasets. There is something called EDA - Exploratory Data analysis. Find a dataset you like (There are tons of real world ds and are pretty great). You can check the EDA others have done and try to learn the libs....

undone flare Nov 7, 2020, 3:12 PM

#

I learned the basics of NumPy and currently learning Pandas and I am using the Pokemon Dataset from Kaggle

grave frost Nov 7, 2020, 3:12 PM

#

great! It's a pretty good place to start your DS journey.

undone flare Nov 7, 2020, 3:13 PM

#

Should I take up the udemy bootcamp course?

golden saffron Nov 7, 2020, 3:13 PM

#

ohk, I am not saying anything now
@grave frost Anyways you don't know much, still thanks for the info.

#

Should I take up the udemy bootcamp course?
@undone flare Take free things. There are a lot of free things available, and most importantly, take up an internship.

unborn wraith Nov 7, 2020, 3:14 PM

#

@grave frost the course u gave me is that for maths?

grave frost Nov 7, 2020, 3:14 PM

#

@golden saffron If you keep saying things like that, you would be reported to server sooner or later. We are not getting paid to help you - it's completely voluntary. Don't get too frazzled up about these things and google things first instead of harassing others

golden saffron Nov 7, 2020, 3:15 PM

#

Internships will be very helpful in learning real life issues in data science.

undone flare Nov 7, 2020, 3:15 PM

#

@undone flare Take free things. There are a lot of free things available, and most importantly, take up an internship.
@golden saffron just wanted your guys opinion

#

alright

#

thx

grave frost Nov 7, 2020, 3:15 PM

#

@unborn wraith It tries to explain things intuitively - without maths. That's why I liked it since it helps to grasp concepts easily and then explore the maths side of it

unborn wraith Nov 7, 2020, 3:15 PM

#

ok thanks 🙂

undone flare Nov 7, 2020, 3:16 PM

#

when should I go for ML?

#

after learning some basic modules?

grave frost Nov 7, 2020, 3:16 PM

#

@undone flare When you feel like it 🙂

unborn wraith Nov 7, 2020, 3:17 PM

#

is it worth to learn ml and data science now?

grave frost Nov 7, 2020, 3:17 PM

#

@undone flare If you are getting bored of EDA and other things, just do some courses to help you get started. Try simple things first (perceptron and linear regression are good first projects, though the names may sound heavy)

undone flare Nov 7, 2020, 3:17 PM

#

@unborn wraith yes

misty cargo Nov 7, 2020, 3:17 PM

#

after learning some basic modules?
@undone flare maths(calculus, linear algebra, probability & statistics), then classical ml and then deep learning

golden saffron Nov 7, 2020, 3:17 PM

#

Also @undone flare, Try working with spark and cloud as well as many real life datasets are on cloud and they use spark ML libraries for the same.

undone flare Nov 7, 2020, 3:18 PM

#

@undone flare maths(calculus, linear algebra, probability & statistics), then classical ml and then deep learning
@misty cargo I still need to learn calculus

#

is Kaggle Competitions good?

grave frost Nov 7, 2020, 3:19 PM

#

@undone flare Youtube it. Easiest way to learn something fast

misty cargo Nov 7, 2020, 3:19 PM

#

@misty cargo I still need to learn calculus
@undone flare i suggest mit open courseware

#

is Kaggle Competitions good?
@undone flare yup but focus on the ones marked with #knowledge

undone flare Nov 7, 2020, 3:19 PM

#

Okay, thx guys very helpful 👍

grave frost Nov 7, 2020, 3:20 PM

#

yup but focus on the ones marked with #knowledge Any reason why? lol

#

can still participate in the lower end ones, like $500 or so

undone flare Nov 7, 2020, 3:21 PM

#

I only know numpy and pandas so I will just skip competitions for now lol

golden saffron Nov 7, 2020, 3:21 PM

#

@golden saffron If you keep saying things like that, you would be reported to server sooner or later. We are not getting paid to help you - it's completely voluntary. Don't get too frazzled up about these things and google things first instead of harassing others
@grave frost Look bro, I asked for some suggestions. If you don't know, it's fine. No need to panic and rant. Generative models don't rely on pre-defined responses. They generate new responses from scratch. When we have multiple GMs they will give multiple responses, and I am just exploring RL here. Your responses to my question were simply idiotic. The above definition is the textbook definition of a GM.

grave frost Nov 7, 2020, 3:21 PM

#

@undone flare S'ok - you will get there eventually

#

@golden saffron Man, just read what I posted above. I didn't doubt RL or GM. All I said is that your approach to using RL/GM in NLP is very wrong and you should research that.

#

Right now you are just raging about why it wouldn't work and saying that I am not fit to answer. If you think so, just ignore me. Why would you keep pinging me after that??

pale thunder Nov 7, 2020, 3:25 PM

#

please keep it civil, both of you.

hollow sentinel Nov 7, 2020, 3:28 PM

#

hEaTeD

#

I don't think the Kaggle mini courses are helpful they're kind of cookie cutter

grave frost Nov 7, 2020, 3:30 PM

#

Is that analogy supposed to be obvious? I don't do much baking

hollow sentinel Nov 7, 2020, 3:30 PM

#

oh that just means it's really simple

#

I'm kind of scared of Ng's course bc it's not in Python

#

so I'm doing Kaggle instead

grave frost Nov 7, 2020, 3:32 PM

#

What? What is it in - julia, matlab or something?

hollow sentinel Nov 7, 2020, 3:32 PM

#

Octave

#

probably bc Octave was more prevalent back then for machine learning

grave frost Nov 7, 2020, 3:33 PM

#

That's surprising, should have been converted to python by now

hollow sentinel Nov 7, 2020, 3:33 PM

#

yeah but I found a github that does everything in python

#

so that's good

#

I think the google crash course is good too

grave frost Nov 7, 2020, 3:34 PM

#

yeah, but not much coding (at least not in the start)

hollow sentinel Nov 7, 2020, 3:34 PM

#

I like courses that make me code from the start

#

I didn't like Columbia's course bc it was so focused on theory that it was boring

#

it's why I liked the Python for Data Science and Machine Learning Bootcamp so much

grave frost Nov 7, 2020, 3:36 PM

#

yep, especially in ML, the theory-practical balance is just too bad. There are vids explaining complex things in 4 points and then there are people who explain it all by pretty advanced code. sad

hollow sentinel Nov 7, 2020, 3:36 PM

#

statquest is good for explaining

#

I just wish Ng decided to do it in python

grave frost Nov 7, 2020, 3:37 PM

#

He has other important work too, except making new courses lemon_swag

hollow sentinel Nov 7, 2020, 3:38 PM

#

he has another course in deep learning AO

#

AI

grave frost Nov 7, 2020, 3:38 PM

#

Plus there are plenty of others too, so it's not like there isn't much choice

hollow sentinel Nov 7, 2020, 3:38 PM

#

I just think I won't get much help if I do it in octave

#

whatever

grave frost Nov 7, 2020, 3:39 PM

#

I think matlab is great - it doesn't even require coding for most tasks

#

Like regression is done from GUI

#

Just a few values here and there, load the database, a couple of drop-downs and boom, your regression is done. And it handles some pretty complex graphs up to 3D (in the old version) too

hollow sentinel Nov 7, 2020, 3:43 PM

#

Ng has people do linear regression by hand

#

no using sci kit learn

#

ooooooooh spooky

ebon lynx Nov 7, 2020, 3:48 PM

#

there's nothing wrong with doing linear regression by hand

hollow sentinel Nov 7, 2020, 3:48 PM

#

I know I just never did it before

ebon lynx Nov 7, 2020, 3:48 PM

#

because the real ML stuff happens always "by hand"

#

the deeper you go

hollow sentinel Nov 7, 2020, 3:48 PM

#

yeah

#

can someone explain what underfitting is? What does it mean to perform poorly on training data

ebon lynx Nov 7, 2020, 3:50 PM

#

@hollow sentinel y = 1. fit that model

#

that's underfitting

grave frost Nov 7, 2020, 3:51 PM

#

because the real ML stuff happens always "by hand"
@ebon lynx I disagree- in the world where every other guy uses scikit-learn and Keras, it just isn't ML "by-hand" anymore - more like a glorified version that involves programming for people is a trend, so they can secure a good job

ebon lynx Nov 7, 2020, 3:51 PM

#

@grave frost I do scikit learn + keras but I still feel like I'm missing out

#

a lot of real world problems require knowing how to program the solutions

grave frost Nov 7, 2020, 3:52 PM

#

That's my point

hollow sentinel Nov 7, 2020, 3:52 PM

#

lol @ebon lynx i still don’t get it

ebon lynx Nov 7, 2020, 3:52 PM

#

@hollow sentinel it's a shit model and it won't fit anything

#

unless y = 1

hollow sentinel Nov 7, 2020, 3:52 PM

#

oh ok

grave frost Nov 7, 2020, 3:52 PM

#

Few understand how it all actually works

hollow sentinel Nov 7, 2020, 3:52 PM

#

yeah well ML is a niche field

#

it’s like cybersecurity

grave frost Nov 7, 2020, 3:53 PM

#

cybersecurity is not a niche field

hollow sentinel Nov 7, 2020, 3:53 PM

#

oh

ebon lynx Nov 7, 2020, 3:53 PM

#

neither is ML

grave frost Nov 7, 2020, 3:53 PM

#

Though ML is kinda

#

Coz the people who truly understand it have PhDs - years of experience and studying to get to be the experts

hollow sentinel Nov 7, 2020, 3:54 PM

#

yep

grave frost Nov 7, 2020, 3:54 PM

#

CyberSec can be done by a postgrad

#

or script kiddies also these days

hollow sentinel Nov 7, 2020, 3:55 PM

#

well yeah but the ones who aren’t script kiddies are hard to find

grave frost Nov 7, 2020, 3:55 PM

#

no they aren't

#

It's not that technical

hollow sentinel Nov 7, 2020, 3:55 PM

#

oh

grave frost Nov 7, 2020, 3:56 PM

#

Even you can learn a great deal about it in a few weeks (and implement it if coding skills are good)

#

The thing is to just have knowledge about the methods involved

hollow sentinel Nov 7, 2020, 3:57 PM

#

idk I just found ML more interesting

grave frost Nov 7, 2020, 3:57 PM

#

me too 🙂

#

I find the lack of interpretability of ML models very interesting, which is one of the reasons why I delved into it

hushed wasp Nov 7, 2020, 3:59 PM

#

Hello,

Can someone tell me what I need to change so that every row isn't replaced by NaN, please? 🙂

📎 unknown.png

hollow sentinel Nov 7, 2020, 3:59 PM

#

.fillna would be a good method to look into

hushed wasp Nov 7, 2020, 4:01 PM

#

it's just that my list comprehension replaces every variable other than the ones with kBtu by NaN and I don't know why

hollow sentinel Nov 7, 2020, 4:01 PM

#

how much of your dataset is NaN

hushed wasp Nov 7, 2020, 4:02 PM

#

there aren't before my last code line

ebon lynx Nov 7, 2020, 4:02 PM

#

@hushed wasp to find the correct columns, try the function .filter(like="kBtu")

#

that will give you only the columns with that in the name of the column
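
A minimal sketch of what `.filter(like="kBtu")` selects. The column names here are made up for illustration and are not the ones from the screenshot:

```python
import pandas as pd

# Hypothetical frame: two energy columns and one unrelated column
df = pd.DataFrame({
    "SiteEnergyUse(kBtu)": [100.0, -5.0],
    "Electricity(kBtu)": [40.0, 10.0],
    "BuildingName": ["a", "b"],
})

# Keep only the columns whose name contains the substring "kBtu"
kbtu = df.filter(like="kBtu")
print(list(kbtu.columns))
```

The match is a plain substring test on the column labels, so it is case-sensitive.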

hushed wasp Nov 7, 2020, 4:04 PM

#

for the location of the columns it "works", it's just the NaN replacement I don't know how to solve

#

📎 unknown.png

#

df = df[df[[c for c in df.columns if c.endswith('(kBtu)')]] >= 0]

#

it's this last line which gives me so much nan
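
A likely explanation, sketched with made-up data: indexing a DataFrame with a boolean DataFrame masks it cell by cell, and every cell that has no `True` in the mask (including whole columns missing from the mask) becomes NaN. If the goal is to drop rows with negative kBtu values, one option is to reduce the mask to one boolean per row first. The column names below are assumptions.

```python
import pandas as pd

# Hypothetical data standing in for the real dataset
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "gas(kBtu)": [10.0, -1.0, 5.0],
    "steam(kBtu)": [0.0, 3.0, 7.0],
})
kbtu_cols = [c for c in df.columns if c.endswith("(kBtu)")]

# Cell-wise masking: "name" is not in the mask, so it turns into NaN everywhere
masked = df[df[kbtu_cols] >= 0]

# Row-wise filtering: keep rows where every kBtu column is non-negative
keep = (df[kbtu_cols] >= 0).all(axis=1)
filtered = df[keep]
print(filtered)
```

With `.all(axis=1)` the other columns are left untouched and only whole rows are removed.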

fierce swallow Nov 7, 2020, 4:58 PM

#

O

heady hatch Nov 7, 2020, 5:17 PM

#

@hollow sentinel are you still confused about underfitting?

hollow sentinel Nov 7, 2020, 5:38 PM

#

underfitting is where the model hasn't learned enough from the training data

#

right?

#

but like what does it mean to not learn enough

heady hatch Nov 7, 2020, 5:58 PM

#

@hollow sentinel let's focus on a classic model, linear regression.

How do you know when a linear regression is doing badly?

hollow sentinel Nov 7, 2020, 6:05 PM

#

the mean squared error

heady hatch Nov 7, 2020, 6:05 PM

#

And what does that tell you?

hollow sentinel Nov 7, 2020, 6:07 PM

#

how close a regression line is to a set of points

heady hatch Nov 7, 2020, 6:07 PM

#

Right right, and in terms of prediction this means how good or bad your prediction is.

#

So where does underfitting and overfitting come in?

#

Looking solely at underfitting first.

#

Let's think about the relationship between weight and the height.

#

Let's first assume there's a linear relationship between the two.

#

where f(x) = y, and x = weight and y = height.

#

Meaning we're trying to use weight to predict height.

#

How does that theory sound to you?

#

Do you think it makes sense that if people's weight increases, their height will increase in some linear fashion too?

drowsy kite Nov 7, 2020, 6:26 PM

#

Hey guys, wondering if i could get a solution to a small problem i'm having with pandas

#

im trying to use "read_html" on a url but the url is behind a login screen. even when i login with bs4 pandas doesn't recognise it has access past the login screen. is there another way to do this?

hollow sentinel Nov 7, 2020, 7:05 PM

#

Yes @heady hatch

heady hatch Nov 7, 2020, 7:09 PM

#

@hollow sentinel okay now think about what happens if the algorithm predicts the average height for every weight.

Meaning f(x) = avg.

#

How would you describe this algorithm in terms of complexity and the quality of prediction?

#

Is there anything wrong with the algorithm? What's going to happen with the MSE?

heady tide Nov 7, 2020, 9:06 PM

#

📎 1iiPH0JyowvS3k12T0-W2HA.png

hollow sentinel Nov 7, 2020, 10:29 PM

#

idk

velvet thorn Nov 7, 2020, 11:03 PM

#

but like what does it mean to not learn enough
@hollow sentinel there is an actual physical relationship between two populations of data (features and target). a model is one "guess" (based on mathematical rules) at that relationship, which we can evaluate.

#

naturally, we do not have access to the whole population, but only a subset (the datasets that we perform training on)

#

we say a model is "underfit" when the actual relationship is much more complex than that represented by the model
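
The weight/height example from earlier can be sketched numerically. The data below is synthetic and the linear relationship is an assumption made purely for illustration: a model that always predicts the average height (f(x) = avg) is too simple for it, and that shows up as a high MSE even on the training data.

```python
import numpy as np

rng = np.random.default_rng(0)
weight = rng.uniform(50, 100, size=200)
# Made-up ground truth: height grows linearly with weight, plus noise
height = 100 + 0.9 * weight + rng.normal(0, 3, size=200)

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return float(np.mean((y_true - y_pred) ** 2))

# Underfit model: ignores weight and always predicts the average height
baseline_pred = np.full_like(height, height.mean())

# Straight-line fit via least squares
slope, intercept = np.polyfit(weight, height, deg=1)
linear_pred = slope * weight + intercept

print("constant model MSE:", mse(height, baseline_pred))
print("linear model MSE:  ", mse(height, linear_pred))
```

The constant model cannot represent the trend at all, so its training error stays large no matter how much data it sees.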

hollow sentinel Nov 7, 2020, 11:13 PM

#

Got it

tawny oak Nov 7, 2020, 11:33 PM

#

hey guys

#

I have this dataframe

#

📎 unknown.png

#

and I want to make it like this

#

📎 unknown.png

#

do you have any idea?

austere swift Nov 7, 2020, 11:39 PM

#

i don't get what you mean

#

oh wait i think i see it now

#

you wanna sum all the ones that have the same id?

tawny oak Nov 7, 2020, 11:40 PM

#

YEAH

austere swift Nov 7, 2020, 11:40 PM

#

i think i remember there being a function for this but i don't remember what it was called

#

wait no it was just a groupby

#

df.groupby(['id']).sum()

velvet thorn Nov 7, 2020, 11:41 PM

#

df.groupby(['name', 'id']).sum()

tawny oak Nov 7, 2020, 11:42 PM

#

I know this but it give me the name just one time

velvet thorn Nov 7, 2020, 11:43 PM

#

I know this but it give me the name just one time
@tawny oak elaborate

#

did you do

#

what I said?

tawny oak Nov 7, 2020, 11:44 PM

#

YEAH

velvet thorn Nov 7, 2020, 11:44 PM

#

show the result

tawny oak Nov 7, 2020, 11:44 PM

#

it give

#

give me that

#

📎 unknown.png

velvet thorn Nov 7, 2020, 11:45 PM

#

it's supposed to be like that

tawny oak Nov 7, 2020, 11:45 PM

#

but I want that

#

https://discordapp.com/channels/267624335836053506/366673247892275221/774778637307740170

velvet thorn Nov 7, 2020, 11:45 PM

#

.reset_index()

tawny oak Nov 7, 2020, 11:46 PM

#

nope

velvet thorn Nov 7, 2020, 11:46 PM

#

actually, no

#

df.groupby(['name', 'id'], as_index=False).sum()

#

.reset_index works too though

#

p sure you didn't use it right

#

  name  id  minutes
0    A  11        3
1    A  13        3
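
A small reproduction of the two variants, using invented rows that sum to the output shown above:

```python
import pandas as pd

# Hypothetical input: several rows per (name, id) pair
df = pd.DataFrame({
    "name": ["A", "A", "A", "A"],
    "id": [11, 11, 13, 13],
    "minutes": [1, 2, 1, 2],
})

# Default: the group keys become a MultiIndex, so "name" is printed once per group
indexed = df.groupby(["name", "id"]).sum()

# Keep the keys as ordinary columns instead
flat = df.groupby(["name", "id"], as_index=False).sum()

# Same result by resetting the index afterwards
same = df.groupby(["name", "id"]).sum().reset_index()

print(flat)
```

`reset_index()` returns a new frame, so its result has to be assigned or chained; calling it on its own line without assignment changes nothing.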

tawny oak Nov 7, 2020, 11:50 PM

#

yeah thank you

#

?D

velvet thorn Nov 7, 2020, 11:50 PM

#

yw

ripe forge Nov 8, 2020, 6:07 AM

#

Wanted Ideas for metric: what's a good substitute for false positive rate in a one-class object detection algorithm?

#

The kicker is: it would be important for this to be model agnostic. And truly capture the essence of "how likely is my model to falsely predict another object where none exists"

#

Any suggestions or even partial ideas welcome.

heady hatch Nov 8, 2020, 6:14 AM

#

How come 1 vs 0 wouldn't work for the metric?

#

Where it detects the class or it doesn't.

ripe forge Nov 8, 2020, 6:38 AM

#

The issue with object detection is that it's not a binary detection. There's the problem with localization as well (where in the image is an object detected). As such, when it doesnt predict a box, it's doing a good job out of an amazingly large number of candidate boxes that were never predicted.

#

So we don't really compute true negatives for object detection (and if we did it wouldn't be model agnostic anyways) thus leading me to this issue.

cedar sky Nov 8, 2020, 6:43 AM

#

Anyone into Kaggle can DM me we could form a team

heady hatch Nov 8, 2020, 6:51 AM

#

I was actually thinking of per pixel binary detection.

#

During inference, you would predict whether the pixel is part of the object you're trying to detect or not.

#

in terms of localization, it would be part of the extraction to localize on where it thinks the object is, then within the ROI detect the object.

#

Then you can have an average precision rate of how well it recognize the pixels.

tawny oak Nov 8, 2020, 8:18 AM

#

hey

#

I have a pandas series which type is string

#

the series is like this

#

10:30

#

02:45

#

I want it hour:minute

#

could I change data type?
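
Two possible conversions, as a sketch that assumes every string is in zero-padded `HH:MM` form. Which one fits depends on whether the values are clock times or durations.

```python
import pandas as pd

s = pd.Series(["10:30", "02:45"])

# Option 1: clock times (datetime.time objects; the Series dtype stays object)
as_time = pd.to_datetime(s, format="%H:%M").dt.time

# Option 2: durations (timedelta64 dtype, supports arithmetic and sorting)
as_delta = pd.to_timedelta(s + ":00")

print(as_time)
print(as_delta)
```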

nova smelt Nov 8, 2020, 12:30 PM

#

yo so i am a beginner in ML and neural networks and i am currently trying to create a face recognition neural network with tensorflow and keras
i have finally figured out how to bring the data into the right shape
but my accuracy is 0.00 sth xDDD
how do i find out which loss functions i should use, which activation functions and how many dense layers
cause i guess thats why i have such a low accuracy xDD
or how many epochs i do need

grave path Nov 8, 2020, 12:33 PM

#

how do i do this

#

📎 unknown.png

#

in Jupyter

undone flare Nov 8, 2020, 1:41 PM

#

@grave path do you mean find the determinant of matrices?

#

or matrix multiplication?

grave path Nov 8, 2020, 1:45 PM

#

nevermind I figured it out

#

no i meant the headline xD

#

like the font itself

undone flare Nov 8, 2020, 1:46 PM

#

lol

grave path Nov 8, 2020, 1:46 PM

#

Perhaps you can help with this question

#

How do I keep using the return of a function instead of it only being available inside the function

#

📎 unknown.png

#

It used the one defined outside the function if that makes sense

undone flare Nov 8, 2020, 1:48 PM

#

hmm I see what you mean

#

you should store it in a variable and then do the conditions

#

if you know what I mean

grave path Nov 8, 2020, 1:49 PM

#

yeah I just thought about that let me try it

undone flare Nov 8, 2020, 1:50 PM

#

after the return dataset you should store that in a variable and then use it

grave path Nov 8, 2020, 1:50 PM

#

after?

#

I stored the dataset inside another variable and then returned the new variable

undone flare Nov 8, 2020, 1:51 PM

#

the line NominalEncoder(data, ....) store that in var

#

and then use that

grave path Nov 8, 2020, 1:52 PM

#

ah I see what you mean ill try that

undone flare Nov 8, 2020, 1:52 PM

#

yea otherwise they are overriding each other
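
The screenshot is not available, so this is only a sketch of the idea with a hypothetical stand-in for `NominalEncoder`: a function's return value is lost unless the caller stores it.

```python
def nominal_encoder(rows, column):
    """Hypothetical stand-in: return new rows with `column` replaced by integer codes."""
    categories = sorted({row[column] for row in rows})
    codes = {category: code for code, category in enumerate(categories)}
    # Build new dicts so the input rows are left unmodified
    return [{**row, column: codes[row[column]]} for row in rows]

data = [{"color": "red"}, {"color": "blue"}, {"color": "red"}]

nominal_encoder(data, "color")            # result is computed, then thrown away
encoded = nominal_encoder(data, "color")  # result is kept and can be reused

print(data)     # still the original strings
print(encoded)  # integer codes
```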

grave path Nov 8, 2020, 1:52 PM

#

Legend plus1

undone flare Nov 8, 2020, 1:52 PM

#

Worked?

grave path Nov 8, 2020, 1:53 PM

#

yeah I see this it just calls the function everytime to have that result

#

yeah it did ❤️

undone flare Nov 8, 2020, 1:53 PM

#

nice

grave path Nov 8, 2020, 1:53 PM

#

cheers mate

undone flare Nov 8, 2020, 2:08 PM

#

np.eye() and np.identity() are same right
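
For a square matrix they give the same result; `np.eye` is the more general one, since it also accepts a column count and a diagonal offset. A quick check:

```python
import numpy as np

# Same output for the square case
square_equal = np.array_equal(np.eye(3), np.identity(3))

# np.eye can also build non-square matrices and shift the diagonal with k
shifted = np.eye(2, 3, k=1)

print(square_equal)
print(shifted)
```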

lapis sequoia Nov 8, 2020, 2:28 PM

#

will it be easy switching from web dev to ai

grave path Nov 8, 2020, 2:30 PM

#

@lapis sequoia Hey mate I have done web dev for a while not the best in it but have built some websites and now I'm doing a bit of machine learning its not the hardest I have realized so far because some stuff in ML are repetitive but then again my experience is limited in both

#

Just go for it

lapis sequoia Nov 8, 2020, 2:31 PM

#

ight thx

undone flare Nov 8, 2020, 2:35 PM

#

@lapis sequoia if you have the basic knowledge of python which I am assuming you have because you were doing web dev so it will not be that hard

grave path Nov 8, 2020, 2:35 PM

#

how do I know if i should use StandardScaler or MinMaxScaler?

hollow gull Nov 8, 2020, 3:12 PM

#

@grave path It is one of those design decisions that is frequently not very obvious to me going into a problem. You can always try both, but it adds a lot of compute if you keep trying every combination of methods. I frequently will research the model I am planning to build and see if the documentation recommends normalization or standardization.
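
To make the two options concrete, here is a NumPy-only sketch of the arithmetic each one performs on a single feature. It mirrors the default behaviour of scikit-learn's `StandardScaler` (zero mean, unit variance) and `MinMaxScaler` (range 0 to 1) but does not use those classes; the sample values are made up.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # made-up feature with one outlier

# Standardization: subtract the mean, divide by the (population) standard deviation
standardized = (x - x.mean()) / x.std()

# Min-max normalization: squeeze the values into [0, 1]
minmax = (x - x.min()) / (x.max() - x.min())

print(standardized)
print(minmax)
```

Note how the outlier pushes the other min-max values close to 0, which is one reason the choice depends on the data and the model.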

grave path Nov 8, 2020, 3:12 PM

#

Thank you very much

spark nimbus Nov 8, 2020, 3:15 PM

#

I'm working on a jupyter notebook in pycharm, but it takes up excessive amounts of memory. Does anyone know how to solve this?

molten hamlet Nov 8, 2020, 3:22 PM

#

jupyter notebook in pycharm?

#

😮

golden saffron Nov 8, 2020, 3:25 PM

#

I'm working on a jupyter notebook in pycharm, but it takes up excessive amounts of memory. Does anyone know how to solve this?
@spark nimbus Directly use the python terminal. No need for Jupyter in PyCharm.

undone flare Nov 8, 2020, 3:29 PM

#

What I prefer is make a different folder and shift+right click and then install all the libraries and jupyter notebook and run the jupyter notebook from that folder itself so it is easier to keep track of stuff

#

Note : This is just my opinion

spark nimbus Nov 8, 2020, 3:37 PM

#

@golden saffron No I mean, this is meant as interactive documentation

#

but this kinda keeps happening every so often

📎 unknown.png

undone flare Nov 8, 2020, 4:06 PM

#

@spark nimbus maybe allocate more memory to PyCharm

spark nimbus Nov 8, 2020, 4:25 PM

#

I already allocated 8G rn

#

and it still happens

undone flare Nov 8, 2020, 4:40 PM

#

oof

steel roost Nov 8, 2020, 4:54 PM

#

hey guys

#

how would i turn this into a dictionary:

#

📎 unknown.png

#

i really want to convert size_name to a dictionary

#

the output looks like 20511552:10

#

where the number after the colon is the size

#

nvm i found it

#

📎 unknown.png

hollow gull Nov 8, 2020, 5:02 PM

#

It is sort of confusing to me to rename something over the size variable inside the loop of sizes. Maybe the initial size variable in for size in sizes should be named differently?

tranquil apex Nov 8, 2020, 5:05 PM

#

0     10   condo    A
1     24  duplex    D
2     32    home    D
3     25  duplex    A
4     65   condo    A

#

how do I turn it in to this^

Price    Type City   AVG
0     10   condo    A  37.5
1     24  duplex    D  24.0
2     32    home    D  32.0
3     25  duplex    A  25.0
4     65   condo    A  37.5

#

i tried groupby type, city and .agg price to mean

hollow gull Nov 8, 2020, 5:08 PM

#

And what did that give you?

tranquil apex Nov 8, 2020, 5:11 PM

#

incompatible index of inserted column with frame index

hollow gull Nov 8, 2020, 5:19 PM

#

import pandas as pd

list_values=[[10, 'condo', 'A'],
[24, 'duplex', 'D'],
[32, 'home', 'D'],
[25, 'duplex', 'A'],
[65, 'condo', 'A']]

df_values = pd.DataFrame(list_values, columns=['Price', 'Type', 'City'])

df_values.groupby(by=['Type', 'City'], as_index=False).agg({'Price': 'mean'})

This outputs:
Type City Price
0 condo A 37.5
1 duplex A 25.0
2 duplex D 24.0
3 home D 32.0

Then you need to rename price as 'AVG' and then left join this dataset back on to your original dataset on Type and City.
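
The rename-and-join step can be sketched as follows, together with a shorter alternative using `transform`, which broadcasts the group mean back onto the original rows without a merge:

```python
import pandas as pd

df_values = pd.DataFrame(
    [[10, "condo", "A"], [24, "duplex", "D"], [32, "home", "D"],
     [25, "duplex", "A"], [65, "condo", "A"]],
    columns=["Price", "Type", "City"],
)

# Aggregate, naming the result column AVG directly
avg = df_values.groupby(["Type", "City"], as_index=False).agg(AVG=("Price", "mean"))

# Left join back onto the original rows (row order of df_values is preserved)
merged = df_values.merge(avg, on=["Type", "City"], how="left")

# Alternative: transform returns one value per original row
df_values["AVG"] = df_values.groupby(["Type", "City"])["Price"].transform("mean")

print(merged)
```

Assigning a grouped aggregate directly to a new column fails because its index is the group keys rather than the original row index, which matches the "incompatible index" error mentioned above.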

tranquil apex Nov 8, 2020, 5:19 PM

#

ya! i just did that and it worked

#

i used merge

hollow gull Nov 8, 2020, 5:20 PM

#

I always forget if I need to use merge or join. I think one uses the index and the other columns, but I was being a little loose in my language :/

tranquil apex Nov 8, 2020, 5:39 PM

#

i got what you meant tho, and it was useful!

#

practically the last piece to my puzzle

alpine bay Nov 8, 2020, 6:04 PM

#

What determines the default values of an ndarray when created like this ?
a = np.ndarray((height,width,3),dtype=np.uint8)

molten hamlet Nov 8, 2020, 8:53 PM

#

is there any book for image processing? but not for opencv, I know that, but some more advanced stuff, detecting people or creating haar contours

#

What determines the default values of an ndarray when created like this ?
a = np.ndarray((height,width,3),dtype=np.uint8)
@alpine bay I wrote that code and it is random

alpine bay Nov 8, 2020, 9:15 PM

#

@molten hamlet what do you mean you wrote that code?

arctic wedgeBOT Nov 8, 2020, 9:24 PM

#

Hey @indigo skiff!

It looks like you tried to attach file type(s) that we do not allow (.pdf). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg, .webm, .webp, .flac, .afdesign, .m4a, .csv.

Feel free to ask in #community-meta if you think this is a mistake.

indigo skiff Nov 8, 2020, 9:27 PM

#

Hey guys, I need help with an assignment which is due within the next few hours. I wanted to check if I'm doing it all right. It's an introductory-level master's assignment which asks us to apply DFS, BFS, uniform cost search, best-first search and Algorithm A functions, along with a few more interesting questions. I am not able to attach the assignment. Reading time would be 4-6 mins; please could someone have a look? Any help would be really appreciated. I am looking for someone I can discuss this with quickly. I am a new member, so please excuse me if I'm not asking this in the right place. Unfortunately, since I am a new member I am not able to use the voice chat function, so would anyone want to volunteer and have a quick discussion, please? Thanks again everyone.

molten hamlet Nov 8, 2020, 9:41 PM

#

@molten hamlet what do you mean you wrote that code?
@alpine bay just print(a) few times and you will see
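
Calling `np.ndarray(...)` directly allocates memory without initializing it, the same as `np.empty`, so the contents are whatever bytes happened to be there. A short sketch with arbitrary example dimensions, showing the constructors that do give defined values:

```python
import numpy as np

height, width = 4, 5  # arbitrary example size

# Uninitialized: the values are leftovers from memory and must not be relied on
a = np.empty((height, width, 3), dtype=np.uint8)

# Defined contents: all zeros (a black image) or a chosen fill value
black = np.zeros((height, width, 3), dtype=np.uint8)
white = np.full((height, width, 3), 255, dtype=np.uint8)

print(a.shape, black.sum(), white.min())
```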

spark dirge Nov 8, 2020, 11:11 PM

#

@indigo skiff create sample problems and code up some stubs. Internet has Bfs and Dfs free for the picking.

hasty grail Nov 9, 2020, 1:55 AM

#

@spare lotus What are you trying to do?

indigo skiff Nov 9, 2020, 4:02 AM

#

@spark dirge are you available for quick discussion? please check message

timber pollen Nov 9, 2020, 6:39 AM

#

deta

undone flare Nov 9, 2020, 6:42 AM

#

?

verbal jetty Nov 9, 2020, 9:46 AM

#

Hey - hopefully, this is the right channel: This is encoded with "ISO-8859-1" What is happening here, and how can I avoid that? (Left Dataset CSV - Right Side Output)

📎 Screenshot_2020-11-09_at_10.35.35.png

velvet thorn Nov 9, 2020, 9:53 AM

#

@verbal jetty try encoding='latin1'

verbal jetty Nov 9, 2020, 9:54 AM

#

Thank you @velvet thorn . but unfortunately the same result

velvet thorn Nov 9, 2020, 9:54 AM

#

you sure

#

the encoding is correct?

#

can you

#

show

#

all the arguments

#

to pd.read_csv

#

other than filename

verbal jetty Nov 9, 2020, 9:57 AM

#

@velvet thorn

📎 Screenshot_2020-11-09_at_10.57.02.png

velvet thorn Nov 9, 2020, 9:58 AM

#

hm.

#

weird

#

what are you opening the left side in?

verbal jetty Nov 9, 2020, 9:58 AM

#

numbers

velvet thorn Nov 9, 2020, 9:58 AM

#

huh?

#

I mean, what program

verbal jetty Nov 9, 2020, 9:59 AM

#

Numbers(MacOS)

#

Equivalent to excel

#

In Excel it looks like this

📎 Screenshot_2020-11-09_at_10.59.52.png

verbal jetty Nov 9, 2020, 10:27 AM

#

got it - works with utf-16
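
A self-contained sketch of that fix. It writes a small UTF-16 file to a temporary directory (the file name and contents are made up) and reads it back with the matching `encoding` argument:

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "export.csv")  # hypothetical file name

    # Simulate an export that was saved as UTF-16
    with open(path, "w", encoding="utf-16", newline="") as fh:
        fh.write("name,city\nJörg,Köln\n")

    # The encoding passed to read_csv has to match how the file was written
    df = pd.read_csv(path, encoding="utf-16")

print(df)
```

Decoding UTF-16 bytes as latin1 does not raise an error, because every byte is a valid latin1 character, which is why the wrong guess produces garbled output instead of an exception.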

waxen fiber Nov 9, 2020, 11:33 AM

#

Hello everyone!
i am wondering how to extract validation data from this

self.__Dir_Data = tf.keras.preprocessing.image_dataset_from_directory(self.__Dir_Path ,validation_split = 0.1 ,subset="training", seed = 1,  labels='inferred', label_mode='int' ,batch_size=32 ,image_size=(124, 124))

#

I have following statement, which is getting from specified directory whole Train Data

#

And I store inside Dir Data the Train Data from directory, Is it possible to extract for example 10%-20% of images to separate Validation Data?

#

and it is the same for labels

spark dirge Nov 9, 2020, 11:49 AM

#

See train test split
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

sharp sage Nov 9, 2020, 12:02 PM

#

hey

#

can someone help me iterate into a list?

#

📎 unknown.png

#

📎 unknown.png

#

📎 unknown.png

#

i think i need to add an if statement to increase right?

#

wait

lapis sequoia Nov 9, 2020, 12:06 PM

#

Well, looking at the code, I expect it to add the same number 1000 times

#

its never re-evaluated

sharp sage Nov 9, 2020, 12:06 PM

#

yeah

#

its not called 1000times

#

only the list it

#

is*

#

sorry

#

the range

#

dont i just need to do somthing like

#

data * 1000

#

and add it to the list

#

📎 unknown.png

#

📎 unknown.png

#

📎 unknown.png

#

📎 unknown.png

#

should be correct?

#

the only thing being its saying array

lapis sequoia Nov 9, 2020, 12:10 PM

#

better use a help channel for this

#

if you guys have data such as "time, Rates per minute, Rate of penetration, Torque, and Weight"
what type of algorithm do you suggest I use? I was thinking of just plotting the data then being like "when weight spiked at this time, the Rates per minute were increased"

#

I heard doing that is a type of algorithm called "linear regression"

#

you guys got any other suggestions ?

#

Well, plotting alone doesnt have a lot to do with linear regression

#

📎 1200px-Linear_regression.png

#

linear regression is finding the line, that fits the data (blue points) best

#

AL_02notes

#

so if you plot your data and the points are arranged like this, linear regression is probably a good model

#

if it tilts slightly as time goes by

#

i.e. exponential,

#

What do you recommend I use then?

#

well, in most cases you would want to transform your data

#

and do linear regression after that

#

so if it looks exponential, you'd take the log

#

and do linear regression with the transformed values
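
A minimal sketch of the log-then-fit idea on synthetic data. The growth rate and scale below are invented so the result can be checked:

```python
import numpy as np

t = np.arange(10, dtype=float)
y = 2.0 * np.exp(0.3 * t)  # made-up exponential data: y = 2 * e^(0.3 t)

# Taking the log turns y = a * e^(b t) into log(y) = log(a) + b * t, a straight line
slope, intercept = np.polyfit(t, np.log(y), deg=1)

growth_rate = slope          # recovers b
scale = np.exp(intercept)    # recovers a

print(growth_rate, scale)
```

This only works when all values of `y` are positive, since the log of zero or a negative number is undefined.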

#

Ohh, I see. Thanks for your explanation

undone flare Nov 9, 2020, 12:53 PM

#

What library should I use for reading MySQL in pandas?

proven yarrow Nov 9, 2020, 1:27 PM

#

can anyone type me numpy and matplotlib code for this graph

📎 124558538_367192271024616_5034592061348451073_n.png

#

anyone pls

spark nimbus Nov 9, 2020, 4:04 PM

#

```
import matplotlib.pyplot as plt

x = [-2, -1, -1, 1, 2, 3, 4]
y = [0, 0, -1, 1, 1, 0, 0]
plt.xlim(-1.5, 3.5)
plt.ylim(-1.5, 1.5)
plt.plot(x, y)
plt.show()
```
and then some labels too somewhere in there

shy mesa Nov 9, 2020, 6:44 PM

#

how can I remove a row in my dataframe if it contain all NaN value? (using pandas)
I tried this but doesn't work:
.dropna(how='all', axis=0)

stark orchid Nov 9, 2020, 7:35 PM

#

Just wrote a new blog:
https://greatexpectations.io/blog/data-tests-failed-now-what/
TLDR: The job isn't done after you build your data pipeline tests. This blog goes through the processes necessary after a test fails.

Your data tests failed! Now what?

You think all you need to do to secure your data pipeline is implementing some tests and all your data problems are solved? Unfortunately, it’s not quite that easy...

lone osprey Nov 9, 2020, 7:39 PM

#

I have one doubt

#

I have taken a gender classification model

#

I have columns as 'names' and 'gender'

#

But for better training, I trained using columns 'starts by vowel/consonant', 'ends by vowel/consonant', 'long/short size'

#

I trained it using a decision tree classifier

#

And I saved the model

#

Now, I sent the model to someone

#

He knows only that the dataset had 'name' and 'gender' columns

#

So, he gives predict(test['name'])

#

Will it return right answer? I mean, will it return gender?

#

Or he has to give it only in the form test['starts with consonant/vowel', 'ends with consonant/vowel', 'short/long word']

#

Please ping me while saying solution to me

#

And please do provide me a solution

safe tapir Nov 9, 2020, 9:26 PM

#

Any opinions on Dagster vs. Prefect?

smoky fractal Nov 9, 2020, 11:15 PM

#

Hello, I am using pandas dataframes with the following short snippet: https://pastebin.pl/view/691d9cc0

I am getting the error

rsi.py:17: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I have tried using the .loc method, but It hasn't worked thus far. it says I'm making a copy, but I'm not sure how or where.

velvet thorn Nov 10, 2020, 12:17 AM

#

how can I remove a row in my dataframe if it contain all NaN value? (using pandas)
I tried this but doesn't work:
.dropna(how='all', axis=0)
@shy mesa pandas methods create copies

#

they don't modify inplace

#

you need to reassign to the original variable or add inplace=True
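
A small sketch of the difference, with made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
})

# Returns a new frame; df itself is unchanged because the result is discarded
df.dropna(how="all", axis=0)
rows_before = len(df)

# Reassign to keep the result (row 1, which is all NaN, is removed)
df = df.dropna(how="all", axis=0)
rows_after = len(df)

print(rows_before, rows_after)
```

Row 0 survives because `how="all"` only drops rows where every value is NaN.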

#

@smoky fractal you're doing it at the start

#

symbolData = symbolData.tail(bars)

#

which is equivalent to symbolData.iloc[-bars:]

#

anyway

#

your code could be improved a lot IMO

smoky fractal Nov 10, 2020, 12:19 AM

#

ah, how can I isolate only the most recent x rows?

velvet thorn Nov 10, 2020, 12:19 AM

#

so the simplest solution would be

#

to add a .copy() there

#

symbolData.tail(bars).copy()
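
Since the pastebin is not reproduced here, this is a sketch with invented data and a made-up `change` column. The explicit `.copy()` makes the last rows an independent frame, so adding columns to it is unambiguous and leaves the original untouched.

```python
import pandas as pd

# Hypothetical price data standing in for symbolData
symbol_data = pd.DataFrame({"close": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
bars = 3

# Take the most recent rows as an independent copy
recent = symbol_data.tail(bars).copy()

# Assigning a new column now affects only the copy
recent["change"] = recent["close"].diff()

print(recent)
```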

#data-science-and-ml

data-science , I am looking for SymPY for calculating integral from calculus, I am struggling with some fundamentals for calculating the area under the curve. Can anyone help ?