#data-science-and-ml

hasty grail
#

yes

heady hatch
#

But even prior to that, how would I know to get the b column?

#

Because as of now, I have no idea where the value is.

hasty grail
#

oh I assumed that you only wanted to perform a search on the b column

heady hatch
#

Oh! hahaha

#

Yea sorry for being unclear.

hasty grail
#

if there are multiple columns then you can convert the entire thing to a NumPy array

#

nonzero would still work

heady hatch
#

So essentially we would need to scan across all columns for it?

hasty grail
#

yes

heady hatch
#

Ahh okay.

hasty grail
#

I'm assuming that you want *all* the occurrences

#

not just the first match
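
A minimal sketch of the "convert to NumPy and use nonzero" suggestion above. The frame, its labels and the target value are made up for illustration; the idea is to compare every cell at once and then map the positional hits back to the original index and column labels.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; labels and values are invented for this example.
df = pd.DataFrame(
    {"a": ["x", "y", "z"], "b": ["y", "q", "y"]},
    index=["r0", "r1", "r2"],
)
target = "y"

# Compare every cell at once, then ask NumPy where the matches are.
rows, cols = np.nonzero(df.to_numpy() == target)

# Translate positional hits back into the original index and column labels.
matches = [(df.index[r], df.columns[c]) for r, c in zip(rows, cols)]
print(matches)
```

This returns every occurrence, in row-major order, rather than stopping at the first match.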

narrow flume
#

i have questions about making a Sierpiński triangle with matplotlib

golden pecan
#

hi everyone, i have a JSON question, hope this is the right channel to ask.

I need to parse the JSON from a string. it goes like this:
<!-- something something some changing text {"prodID":"20XE","isenrollment":false}} something something some changing text -->
is there a way to just get the JSON part without declaring something something parts?

hasty grail
#

yes you can use regex
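
A hedged sketch of the regex route for the string in the question. It assumes the JSON object is flat (no nested braces), so a non-greedy match from the first `{` to the nearest `}` is enough; the surrounding text is a shortened stand-in for the "changing text".

```python
import json
import re

raw = '<!-- some changing text {"prodID":"20XE","isenrollment":false}} more changing text -->'

# Non-greedy match from the first "{" to the nearest "}".
# Assumption: the object is flat; nested objects would be cut short by this pattern.
match = re.search(r"\{.*?\}", raw)
payload = json.loads(match.group(0)) if match else None
print(payload)
```

For nested objects, one option is to find the first `{` and hand the rest of the string to `json.JSONDecoder().raw_decode`, which parses one JSON value and ignores whatever follows it.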

chilly pasture
#

hi how do i activate a conda virtual environment in colab that persists across all cells?

#

!source activate command works only for that particular cell

velvet thorn
#

Can we search for index by values in pandas DataFrame?
@heady hatch why do you want to do that?

#

that's my first question

#

second question: do you want the original index, or a numeric one?

heady hatch
#

@velvet thorn Because I have a DataFrame consisting of unique values throughout, and I need to look up the index of those unique values.

eg

a ('asdf', 1) ('fdsa', 2) ('qwert', 3)
b ('zxcv', 4) ('vcxz', 5) ('qsqdqw', 6)
c ...

original index, but I'm assuming I could just map it back using the numerical one.

One solution I've found was

df.isin([value]).any(axis=1)

This gives me a boolean mask marking the rows where the value exists.

But I think looking up values to transform them isn't efficient. I think it would be better to transform the values before turning them into a DataFrame.

Originally what I needed to do is count all the values in a list, then turn that count into a dataframe.

But then later on, the objective changed to counting all values plus some metadata.

So I was thinking since I've already made the dataframe, I could just go back and insert the metadata at the particular values.

Apparently it's a whole other mess. hahaha
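
A small sketch of the `isin` lookup mentioned in this message, using plain strings instead of tuples to keep it simple. The frame is hypothetical. `isin` returns a boolean frame, so the labels still have to be pulled out of the index and columns explicitly.

```python
import pandas as pd

# Hypothetical frame loosely modelled on the example rows "a" and "b".
df = pd.DataFrame(
    {0: ["asdf", "zxcv"], 1: ["fdsa", "vcxz"], 2: ["qwert", "qsqdqw"]},
    index=["a", "b"],
)
value = "vcxz"

mask = df.isin([value])                             # boolean frame, same shape as df
row_labels = df.index[mask.any(axis=1)].tolist()    # rows that contain the value
col_labels = df.columns[mask.any(axis=0)].tolist()  # columns that contain the value
print(row_labels, col_labels)
```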

velvet thorn
#

hm

#

@heady hatch okay, wait

#

originally you said "by column", didn't you?

#

like can you give a realistic example?

heady hatch
#

Hmm what do you mean by column?

#

Oh I think it was Darklight's solution of scanning by column.

velvet thorn
#

I'm not really sure what the code you posted represents

#

(also...why are there tuples in your DF...?)

heady hatch
#

@velvet thorn
A realistic example, hmm.

So let's say we have a list of features.

#features for category
[['feat_1', 'feat_2', 'feat_3', 'feat_1', 'feat_3'], ['feat_1', 'feat_3', 'feat_4', 'feat_5']]
# turning into count
[{'feat_1': 2, 'feat_2': 1, 'feat_3': 2}, {'feat_1': 1, 'feat_3': 1, 'feat_4': 1, 'feat_5': 1}]

df
category1 ('feat_1', 2) ('feat_2', 1) ('feat_3', 2) (None)
category2 ('feat_1', 1) ('feat_3', 1) ('feat_4', 1) ('feat_5', 1)

But now I need to go back into each feature and add some metadata.

velvet thorn
#

🥴

#

are those values?

heady hatch
#

Yup.

velvet thorn
#

...is it supposed to be like this?

#
>>> df = pd.DataFrame([[1, 1], [1, 2], [1, 3], [1, 1], [1, 3], [2, 1], [2, 3], [2, 4], [2, 5]], columns=['category', 'feature'])
>>> df
   category  feature
0         1        1
1         1        2
2         1        3
3         1        1
4         1        3
5         2        1
6         2        3
7         2        4
8         2        5
>>> df.groupby('category').count()
          feature
category         
1               5
2               4
#

(I meant more something like this btw)

#

like something that can be executed

undone flare
#

df_csv.loc[df_csv['Type 1'] == 'Fire'] I have this but I only want to include those that are Flying Fire Pokémon, so how can I do it?

velvet thorn
#

df_csv.loc[df_csv['Type 1'] == 'Fire'] I have this but I only want to include those that are Flying Fire Pokémon, so how can I do it?
@undone flare show data

undone flare
#

ok

velvet thorn
#

no images please

undone flare
#

then?

velvet thorn
#

text

heady hatch
#

@velvet thorn
Something like this. It was generated by drawing 100 random lowercase ASCII letters, repeated 3 times.

In [34]: pd.DataFrame(results)
Out[34]:
        0       1       2       3       4
0  (d, 8)  (v, 7)  (q, 6)  (u, 5)  (m, 5)
1  (c, 9)  (u, 7)  (e, 6)  (d, 6)  (q, 6)
2  (n, 8)  (b, 7)  (z, 7)  (o, 7)  (p, 6)
velvet thorn
#

although I'm going to assume that you have a Type 1 and Type 2 column

undone flare
#

yes

velvet thorn
#

accordingly, I believe you want df_csv[(df_csv['Type 1'] == 'Fire') & (df_csv['Type 2'] == 'Flying')].

#

assuming 'Fire' can only be in 'Type 1'

undone flare
#

oh

#

I need to make them tuple

velvet thorn
#

...

heady hatch
#

HAHAHA

velvet thorn
#

what?

#

why would you do that

#

no don't do it

undone flare
#

Wait

heady hatch
#

Don't do it.

velvet thorn
#

@velvet thorn
Something like this. It was generated by drawing 100 random lowercase ASCII letters, repeated 3 times.

In [34]: pd.DataFrame(results)
Out[34]:
        0       1       2       3       4
0  (d, 8)  (v, 7)  (q, 6)  (u, 5)  (m, 5)
1  (c, 9)  (u, 7)  (e, 6)  (d, 6)  (q, 6)
2  (n, 8)  (b, 7)  (z, 7)  (o, 7)  (p, 6)

@heady hatch 🥴

#

why are there tuples in your DataFrame

#

that is Bad

heady hatch
#

🥴

velvet thorn
#

not Bad, but still Bad

undone flare
#

no I mean

heady hatch
#

It's terrible, but the people who wanted me to do this wanted the data like this.

#

Or hmm do you have any other suggestions?

undone flare
#

do I have to do df_csv.loc[(df_csv['Type 1'] == 'Fire') & (df_csv['Type 2' == 'Flying'])]?

heady hatch
#

Depending on what you want.

velvet thorn
#

do I have to do df_csv.loc[(df_csv['Type 1'] == 'Fire') & (df_csv['Type 2' == 'Flying'])]?
@undone flare that's literally what I typed

#

without the .loc

#

and assuming this

#

assuming 'Fire' can only be in 'Type 1'
@velvet thorn

#

if you can have Flying/Fire in that order then you need to add a bit more
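
A minimal sketch of the filter discussed above, on a tiny made-up table (the last row is invented to show the reversed order). Each comparison needs its own parentheses, with the closing `]` of the column lookup before the `==`; the "either order" case just ORs two masks together.

```python
import pandas as pd

# Tiny stand-in for the CSV; the "Made-up" row is invented for the reversed-order case.
df_csv = pd.DataFrame({
    "Name": ["Charizard", "Moltres", "Vulpix", "Made-up"],
    "Type 1": ["Fire", "Fire", "Fire", "Flying"],
    "Type 2": ["Flying", "Flying", None, "Fire"],
})

fire_then_flying = (df_csv["Type 1"] == "Fire") & (df_csv["Type 2"] == "Flying")
flying_then_fire = (df_csv["Type 1"] == "Flying") & (df_csv["Type 2"] == "Fire")

strict = df_csv[fire_then_flying]                           # Fire only in Type 1
either_order = df_csv[fire_then_flying | flying_then_fire]  # Fire/Flying in any order
print(either_order)
```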

undone flare
#

ok

velvet thorn
#

It's terrible, but the people who wanted me to do this wanted the data like this.
@heady hatch uh...

#

Or hmm do you have any other suggestions?
@heady hatch to store the data differently?

#

what's the first element in the tuple

heady hatch
#
[Counter({'j': 7,
          'f': 3,
          'y': 8,
          'b': 2,
          'c': 3,
          'm': 8,
          's': 6,
          'z': 6,
          'r': 3,
          'a': 3,
          'h': 6,
          'd': 5,
          'w': 1,
          'p': 4,
          'g': 2,
          'i': 5,
          'u': 6,
          'q': 5,
          'o': 3,
          'n': 3,
          'l': 2,
          'k': 4,
          'x': 3,
          'v': 1,
          't': 1}),
 Counter({'i': 6,
          'e': 5,
          'f': 5,
          'v': 6,
          'g': 4,
          'o': 2,
          'x': 4,
          'q': 1,
          'm': 2,
          'k': 4,
          'y': 3,
          'w': 4,
          'a': 3,
          'r': 3,
          'z': 9,
          'd': 3,
          's': 4,
          'h': 5,
          'n': 2,
          'l': 2,
          'p': 8,
          'c': 5,
          't': 2,
          'b': 5,
          'u': 2,
          'j': 1}),
 Counter({'d': 2,
          'x': 3,
          'b': 5,
          'k': 4,
          'i': 6,
          't': 5,
          'v': 9,
          'm': 5,
          's': 3,
          'a': 4,
          'z': 5,
          'p': 3,
          'r': 4,
          'o': 9,
          'q': 5,
          'l': 5,
          'c': 3,
          'e': 4,
          'u': 1,
          'g': 4,
          'n': 1,
          'f': 1,
          'h': 4,
          'j': 3,
          'y': 1,
          'w': 1})]

So the data is something like this.

#

It's a count of certain values.

#

and they want the top 50, each as a column.

#

Not the top 50 of alphabetical characters but top 50 of something else.

#

The first element of the tuple is the key in the Counter; the second element is the count itself.

undone flare
#

This gives me error df_csv.loc[(df_csv['Type 1'] == 'Fire') & (df_csv['Type 2' == 'Flying'])]

heady hatch
#

And there are a thousand something of these counts.

#

What kind of error are you getting?

undone flare
#

oh wait

#

nvm.

#

The placement of ] was wrong

velvet thorn
#

The placement of ] was wrong
@undone flare yes, because I told you to look at the code that I wrote

#

not edit what you wrote...

#

as I said, you shouldn't be using .loc

#

And there are a thousand something of these counts.
@heady hatch wait, go back

#

so each individual Counter instance, when stored in the DataFrame, should have something to identify it?

#

i.e. a count from one is distinguishable from a count from another

heady hatch
#

It’ll be identified by another value, which will be their index.

velvet thorn
#

yeah

heady hatch
#

I’m on mobile so I can’t type code. But something like

{"value": Counter(...)}

And the value will be the index.

velvet thorn
#

that's what you want in the result

#

what I mean is

#

IDEALLY

#

you would have a DataFrame with three columns

#

category, character, count

#

not that tuple mess 🥴

heady hatch
#

Hahaha I’ve maintained two DataFrames. One from before the tuple mess, and the other as the output in the form the other people want.

velvet thorn
#

why do they want that

#

did you ask?

undone flare
#

as I said, you shouldn't be using .loc
@velvet thorn I am learning rn

heady hatch
#

But then I needed to edit the tuples which started this whole journey.

velvet thorn
#

which is why I'm telling you not to use it

heady hatch
#

Lmeow

velvet thorn
#

I'm just saying

#

it'd be a lot easier to add metadata

#

you would have a DataFrame with three columns
@velvet thorn with this

#

add one more column, done 🙂
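
A sketch of the long-format layout suggested above, built from the feature lists given earlier in the conversation. The column is called `feature` here rather than `character`, and the `group` metadata column is purely hypothetical. With one row per (category, feature) pair, adding metadata is a merge and "top N per category" is a sort plus `head`.

```python
from collections import Counter

import pandas as pd

counts = {
    "category1": Counter(["feat_1", "feat_2", "feat_3", "feat_1", "feat_3"]),
    "category2": Counter(["feat_1", "feat_3", "feat_4", "feat_5"]),
}

# One row per (category, feature) pair instead of tuples inside cells.
long_df = pd.DataFrame(
    [
        {"category": cat, "feature": feat, "count": n}
        for cat, counter in counts.items()
        for feat, n in counter.items()
    ]
)

# Hypothetical metadata table; "group" is an invented column.
meta = pd.DataFrame({
    "feature": ["feat_1", "feat_2", "feat_3", "feat_4", "feat_5"],
    "group": ["x", "x", "y", "y", "z"],
})
long_df = long_df.merge(meta, on="feature", how="left")

# Top 2 features per category (the chat mentions top 50; 2 keeps the demo small).
top2 = (
    long_df.sort_values(["category", "count"], ascending=[True, False])
    .groupby("category")
    .head(2)
)
print(top2)
```

If a wide, one-column-per-rank view is still wanted for people reading the table, it can be derived from this frame at the very end, leaving the long frame as the one that gets edited.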

heady hatch
#

I think the reason they wanted it is because they’re not familiar with Python and they want to visually understand the counts.

velvet thorn
#

create a visualisation then

heady hatch
#

I’ll let them figure that out and I’ll update you tomorrow on what happens.

#

Going to head to bed, good night and thanks again.

velvet thorn
#

yw!

boreal summit
#

Hello everyone, I've been having a little issue. I'm unable to import datasets from sklearn. I'm getting a "URLopen error (error no 11001) getaddrinfo failed"

velvet thorn
#

Hello everyone, I've been having a little issue. I'm unable to import datasets from sklearn. I'm getting a "URLopen error (error no 11001) getaddrinfo failed"
@boreal summit HUH.

#

are you running behind a proxy?

#

like are you in school or something

#

or somewhere that restricts what sites you can visit

brazen canyon
#

Dru, that has to do with your internet
Check and try again

boreal summit
#

No, I'm running it on vs code.

#

Sorry, I had to go do something real quick.

#

@velvet thorn @brazen canyon it's on vs code.

#

Running jupyter on vs code.

chrome barn
#

running it on vscode has nothing to do with your internet, read the questions above again....

boreal summit
#

I'm running a proxy.

#

Not on the internet.

lapis sequoia
#

How would you guys break this down: np.zeros(shape=(7, 7, channels, 2), dtype=np.float32). What should be the result of that shape? Is it a 7x7 matrix or?

undone flare
#

np.zeros((7, 7)) gives a 7x7 matrix

boreal summit
#

I'm also not connected to the internet.

lapis sequoia
#

I think I got it..is this a 4D tensor then?

undone flare
#

yea I think so

lapis sequoia
#

Something like... there are `channels` 7x7 matrices, twice over

#

tricky but interesting :))

velvet thorn
#

I'm also not connected to the internet.
@boreal summit you need to be

#

the datasets are downloaded

#

if you're accessing them for the first time

#

How would you guys break this down: np.zeros(shape=(7, 7, channels, 2), dtype=np.float32). What should be the result of that shape? Is it a 7x7 matrix or?
@lapis sequoia it's 4D
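
A quick way to check the shape question above. The value of `channels` is an assumption, since the original snippet does not define it.

```python
import numpy as np

channels = 3  # assumption: the original snippet does not say what `channels` is
a = np.zeros(shape=(7, 7, channels, 2), dtype=np.float32)

print(a.ndim)                # number of axes
print(a.shape)               # length of each axis
print(a[..., 0].shape)       # fixing the last axis leaves a (7, 7, channels) block
print(a[:, :, 0, 0].shape)   # fixing the last two axes leaves a single 7x7 matrix
```

So the array holds `channels * 2` separate 7x7 matrices, one for each combination of the last two indices.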

boreal summit
#

@velvet thorn ooh, I never knew. I thought they come with the installation. Thanks for the tip. 👍🏿👍🏿

undone flare
velvet thorn
#

@velvet thorn ooh, I never knew. I thought they come with the installation. Thanks for the tip. 👍🏿👍🏿
@boreal summit np! the thing is some of the datasets are a bit larger

#

and many people will never use them

boreal summit
#

@velvet thorn true, that's a valid reason.

lapis sequoia
#

@undone flare filter where Type 1 == 'Bug' before doing the groupby()

undone flare
#

How?

#

is .where() a thing?

lapis sequoia
#

df_xlsx[df_xlsx['Type 1'] == 'Bug].groupby(['Type 1']).count()['count']

#

'Bug' - i missed the closing quote mark

undone flare
#

thx

lapis sequoia
#

and put Type 2 in the groupby too
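
A sketch of "filter first, then group" with both type columns in the `groupby`. The table is a made-up stand-in for the spreadsheet, and `.size()` is used here instead of `.count()['count']` so the example does not depend on a column named `count` existing.

```python
import pandas as pd

# Made-up stand-in for the spreadsheet; names are placeholders.
df_xlsx = pd.DataFrame({
    "Name": ["p1", "p2", "p3", "p4", "p5"],
    "Type 1": ["Bug", "Bug", "Bug", "Fire", "Bug"],
    "Type 2": ["Flying", "Poison", "Flying", "Flying", None],
})

# Boolean indexing does the filtering; no separate .where() call is needed.
bugs = df_xlsx[df_xlsx["Type 1"] == "Bug"]

# Rows with a missing Type 2 are dropped from the groups by default.
per_pair = bugs.groupby(["Type 1", "Type 2"]).size()
print(per_pair)
```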

grave path
#

Hello guys

#

lets say I want to do this
if condition meets put 1 else put 0 in the row

#

how do i add ELSE to this

#

data.income = data.income.replace('>50K',1)

lapis sequoia
#

data.income.apply(lambda x: 1 if x == '>50K' else 0)

grave path
#

lambda?

lapis sequoia
#

or you can use np.where()

grave path
#

how do i do it with np.where?

lapis sequoia
#

np.where(data.income == '>50K', 1, 0)

grave path
#

hmmm

#

Thank you

velvet thorn
#

uh

#

data['income'] = (data['income'] == '>50K').astype(int)

#

in general, don't use apply if there's another method
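
The three options from this exchange side by side, on a made-up `income` column. All of them give the same 0/1 result; the comparison and `np.where` versions work on the whole column at once, while `apply` calls a Python function per row.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"income": [">50K", "<=50K", ">50K", "<=50K"]})  # made-up rows

via_compare = (data["income"] == ">50K").astype(int)                  # vectorised comparison
via_where = np.where(data["income"] == ">50K", 1, 0)                  # explicit if/else values
via_apply = data["income"].apply(lambda x: 1 if x == ">50K" else 0)   # per-row Python call

print(via_compare.tolist())
```

`np.where` is the natural fit when the two outcomes are something other than 1 and 0.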

whole vortex
#

How would I order the labels in the x axis of a graph using seaborn

velvet thorn
#

@whole vortex they should be ordered by default

whole vortex
#

So my data contains a date/time datatype and I've created a new column to retrieve and show the specific day based on these date/time values

#

That works well and good however when the graph is displayed, the days are ordered randomly

#

I have 6 of these graphs btw

#

Ideally I want to start with Monday and end with sunday, do you or anyone here know if there's a way to custom order the labels here

velvet thorn
#

ah, okay

#

this is a bit different

#

they're not ordered randomly

#

they're ordered in increasing order of value

#

you need to order the source data

whole vortex
#

That's coincidence

#

I'll give you all 6 graphs

velvet thorn
#

okay, but anyway

#

have you ordered the source data

whole vortex
#

I haven't done anything to change the data's order

#

I've only been adding data to the pre-existing rows and analysing it all in different ways

velvet thorn
#

hm.

#

categorical data with Seaborn is a bit tricky

whole vortex
#

That reference is what I used to be able to create 6 separate graphs with the data I had and to present them nicely

#

I'm not restricted to seaborn, I've just been sticking to it because it looks nice 😬

#

I don't mind trying something new? To be honest, I think this is an aesthetic problem and not really needed but I think it'd be nicer to have the days ordered

velvet thorn
#

sorry got distracted

#

@whole vortex okay I don't normally use Seaborn (don't like the abstractions)

#

there's probably a way

#

but I don't know what it is

#

in matplotlib

whole vortex
#

Aha, don't worry, you're volunteering 😂

velvet thorn
#

I would just process the data manually

#

because that's basically the result of a groupby, right

#

and feed that directly to ax.plot

#

because then I would be able to control the order of the data

whole vortex
#

This is going to be interesting. I'm quite new to data science as a whole so still figuring some things out

#

I did come across something earlier regarding ordering the days but didn't manage to apply it

velvet thorn
#

{row,col,hue}_order : lists

Order for the levels of the faceting variables. By default, this will be the order that the levels appear in data or, if the variables are pandas categoricals, the category order.
#

this might help

#

check that out
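
One way to get Monday-to-Sunday ordering, following the "pandas categoricals" hint in the quoted documentation. This is a sketch on made-up timestamps: the day column is turned into an ordered categorical, after which sorting and grouping follow the category order rather than alphabetical or first-seen order.

```python
import pandas as pd

day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Made-up data: a Sunday, two Mondays and a Wednesday, deliberately out of order.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-06-07", "2020-06-01", "2020-06-03", "2020-06-01"]),
    "value": [4, 1, 2, 3],
})

# An ordered categorical carries the custom weekday order with the column.
df["day"] = pd.Categorical(df["timestamp"].dt.day_name(),
                           categories=day_order, ordered=True)

per_day = df.groupby("day", observed=True)["value"].sum()
print(per_day)
```

The same `day_order` list can also be passed to the ordering parameters quoted above (for example `col_order` on a FacetGrid) if the column is left as plain strings.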

whole vortex
#

@velvet thorn does matplotlib have a facetgrid equivalent

#

I'm unsure how I'd go about this yet

lapis sequoia
#

yes, take a look at subplots() on the matplotlib documentation
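
A minimal sketch of a 2x3 grid with `subplots()`, as a rough stand-in for a six-panel FacetGrid. The data and panel titles are invented; the `Agg` backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(7)
panels = {f"panel {i}": (i + 1) * x for i in range(6)}  # made-up data, one series per panel

fig, axes = plt.subplots(2, 3, figsize=(9, 5), sharex=True, sharey=True)
for ax, (title, y) in zip(axes.flat, panels.items()):
    ax.plot(x, y)        # the order of x and y here is fully under your control
    ax.set_title(title)
fig.tight_layout()
```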

narrow flume
#

can anyone help me with matplotlib? I am drawing a Sierpiński triangle

tropic junco
#

i have a value, temperature, and i want to make a graph in matplotlib with 0 C to 50 C, and i want my temperature to show on that graph, how will i make this?

whole vortex
#

What type of graph do you want

tropic junco
#

any kind tbh

#

a line one would be good though

hollow sentinel
tropic junco
#

i see, thanks

#

but the data is different from the examples

#

i have one data and i want to display it between a range

#

so a straight line
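
A sketch of showing one reading against a fixed 0 to 50 C axis: a horizontal line at the value, with the y-limits pinned to the range. The reading itself is a made-up number.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

temperature_c = 25.0  # made-up single reading

fig, ax = plt.subplots()
ax.axhline(temperature_c, color="tab:red", label=f"{temperature_c} C")
ax.set_ylim(0, 50)        # fixed range, independent of the reading
ax.set_ylabel("Temperature (C)")
ax.set_xticks([])         # the x axis carries no information for a single value
ax.legend()
```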

hollow sentinel
mortal pendant
#

Hey! I want to learn how to make an RNN, but I can't find anything that doesn't require TensorFlow. However, I am unable to install tensorflow through pip: I get an error that a lot of other people seem to get, but none of the alternative command lines work.

ERROR: Could not find a version that satisfies the requirement tensorflow (from versions: none)
ERROR: No matching distribution found for tensorflow

I've tried lots of the .whl files that I've seen suggested as solutions, but either pip says it's unsupported or it just results in another large error. Any ideas on an actual fix?
I'm on Windows 10 64-bit and I wish to use my GPU. I just updated to 3.8.0 to see if that might fix it, despite the fact that TensorFlow is supposed to support Python 3.5 and up. I'm on the latest version of pip... let me know if you need any more information

flint arrow
#

hello

#

I am in my 1st sem of DSA

#

please suggest what I should be learning out of class

hollow sentinel
#

@flint arrow do you like courses or books

flint arrow
#

I am already in a Bachelor's course

#

I want to be good at programming

hollow sentinel
#

you didn’t answer my question lmao

obtuse skiff
#

Can someone pls help me understand bias in a neural network
say the bias is 1, does it act like another input and have a weight for each output/node

or does it just add 1 to each node.

#

Ive seen both, and idk which is correct or if both are and when to use one over the other

velvet thorn
#

@obtuse skiff each neuron always has its own bias

#

but there are two ways to represent that

#

one constant bias input per layer, with one weight per neuron

#

or simply one bias per neuron

#

output = w * a + b, which is equivalent to w * (a + b / w).

#

although you can have one bias per layer

#

but that would make it harder to fit
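
A small numeric check of the two representations described above, using random made-up numbers. Adding a bias vector to `W @ x` gives the same result as appending a constant input of 1 and storing the biases as an extra column of weights.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)        # inputs to the layer
W = rng.normal(size=(4, 3))   # one row of weights per neuron
b = rng.normal(size=4)        # one bias per neuron

# Representation 1: add the bias explicitly.
explicit = W @ x + b

# Representation 2: treat the bias as an extra input fixed at 1,
# whose per-neuron weights are the bias values.
x_aug = np.append(x, 1.0)
W_aug = np.hstack([W, b[:, None]])
as_extra_input = W_aug @ x_aug

print(np.allclose(explicit, as_extra_input))
```

Both forms have the same number of free parameters per neuron; which one a library or textbook shows is a matter of notation.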

smoky bobcat
#

@serene scaffold here i am

serene scaffold
#

there you are indeed

#

by the way, any time you have a general question about machine learning, a lot of people who know way more about the subject than me hang out here.

#

in this particular channel.

smoky bobcat
#

I read about it before. My understanding of PCA is that it looks at the covariance between 2 different variables and then tries to standardize them?

#

by the way, any time you have a general question about machine learning, a lot of people who know way more about the subject than me hang out here.
@serene scaffold ok thanks for the info

#

I read about it before. My understanding of PCA is that it looks at the covariance between 2 different variables and then tries to standardize them?
@smoky bobcat is this right by any chance? like it tries to standardise the data by looking at the covariance matrix between different variables @serene scaffold

serene scaffold
#

I only learned about LDAs recently so I'm trying to wrap my head around all this myself

#

hmm

smoky bobcat
#

i havent got even a clue about LDA

#

it's even more confusing than PCA

serene scaffold
#

I'll see if another staff member can more effectively answer this question.

smoky bobcat
#

ook

#

i think that LDA is more about classification while PCA is more about standardisation

velvet thorn
#

i think that LDA is more about classification while PCA is more about standardisation
@smoky bobcat ...what do you mean by that?

smoky bobcat
#

@smoky bobcat ...what do you mean by that?
@velvet thorn i mean that LDA tries to classify the data in different portions while PCA tries to get all the data at the same level. correct me if im wrong, im not an expert just a noobie trying to understand

velvet thorn
#

uh.

#

to be clear

#

when you say LDA

#

you mean linear discriminant analysis, right?

tropic junco
#
this is pretty vague, but what would be the best way to plot this kind of data? i just want to plot a few things like temp and humidity. i am getting this data from an api
#

can someone help?

velvet thorn
#

@tropic junco your question is pretty vague.

#

as you noticed

#

also the data is very chunky

#

like maybe if you shared your ultimate objective

#

it'd be easier to help you

#

"I just want to plot a few things" <- what did you set out to do originally?

tropic junco
#

i just want to plot a graph for temperature and humidity

#

but i am getting confused as how to do it, as i cant plot one time values

velvet thorn
#

uh

#

isn't that one entry

#

in your dataset?

tropic junco
#

wdym?

velvet thorn
#

you said "this kind of data"

#

so I assume you have more like that

#

so just extract temperature and humidity

#

now you have 2 1D arrays

#

scatterplot them against each other
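
A sketch of the "extract two 1D arrays and scatterplot them" advice. The record layout and key names are hypothetical, since the actual API response is not shown in the log.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical API records; the keys "temp" and "humidity" are assumptions.
readings = [
    {"temp": 21.5, "humidity": 60},
    {"temp": 24.0, "humidity": 55},
    {"temp": 27.5, "humidity": 48},
    {"temp": 30.0, "humidity": 40},
]

temps = [r["temp"] for r in readings]
humidity = [r["humidity"] for r in readings]

fig, ax = plt.subplots()
ax.scatter(temps, humidity)
ax.set_xlabel("Temperature (C)")
ax.set_ylabel("Humidity (%)")
```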

tropic junco
#

i mean, if i have temp given 25 C , how do i plot it between a range of 0 C to 50 C

#

oh

heady hatch
#

Hey @velvet thorn :^) They wanted to separate out the number in the tuple as its own column now.

eg

('a', 1) ... -> ('a', 1) (1)
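
If the tuple layout has to stay, one hedged way to add the count as its own column next to each tuple column is sketched here. The two-column frame and the `_count` suffix are made up for illustration.

```python
import pandas as pd

# Made-up frame with (key, count) tuples in each cell.
df = pd.DataFrame({0: [("d", 8), ("c", 9)], 1: [("v", 7), ("u", 7)]})

out = pd.DataFrame(index=df.index)
for col in df.columns:
    out[f"{col}"] = df[col]                                   # original tuple column
    out[f"{col}_count"] = df[col].map(lambda pair: pair[1])   # second element only

print(out)
```
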
velvet thorn
#

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

heady hatch
#

Hahahaha

velvet thorn
#

I'M DYING

#

🥴

#

so who wants you to do this

#

I'm guessing they don't have DB experience

tropic junco
#

it just shows an empty graph to me

heady hatch
#

hahaha I think they wanted to do this so they can visually see how the data breakdown.

velvet thorn
#

it just shows an empty graph to me
@tropic junco show code

tropic junco
#

nvm, i realized it will be useless to plot a graph of one time values rather than comparing it with past ones

#

like a graph of the temperatures in the past week

smoky bobcat
#

you mean linear discriminant analysis, right?
@velvet thorn yes sir

surreal willow
#

Idk if this is the right channel, but is anyone here familiar with likelihood ratios?

flint arrow
#

@hollow sentinel I meant I am in a course and its ok.

#

but u can suggest me some other courses too.

tropic junco
#

how can you create graphs with sqlite query?

#

or, what is the best way to create a graph for the user's messages i get from my discord bot?
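
One possible route from an SQLite query to something plottable, sketched with an in-memory database. The `messages` table, its columns and the rows are all hypothetical stand-ins for whatever the bot actually stores.

```python
import sqlite3

import pandas as pd

# In-memory database with a hypothetical "messages" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (author TEXT, sent_on TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [("alice", "2020-06-01"), ("alice", "2020-06-01"),
     ("bob", "2020-06-02"), ("alice", "2020-06-03")],
)

# Let SQL do the aggregation, then load the result straight into a DataFrame.
per_author = pd.read_sql_query(
    "SELECT author, COUNT(*) AS n FROM messages GROUP BY author ORDER BY n DESC",
    conn,
)
conn.close()
print(per_author)
```

From there, `per_author.plot.bar(x="author", y="n")` would draw a bar chart, assuming matplotlib is installed.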

hollow sentinel
#

@flint arrow python for data science and machine learning bootcamp

flint arrow
#

@hollow sentinel from?

#

there are so many

hollow sentinel
#

Udemy

flint arrow
#

right..thank you.

hasty grail
#

Is there a more efficient way to do batched scatter operations (in TensorFlow terms) in NumPy? Currently I have something like this:

>>> import numpy as np
>>> a = np.zeros((5, 5))
>>> indices = [4, 2, 4, 3, 1]
>>> np.add.at(a, (np.arange(5), indices), 1)
>>> print(a)
[[0. 0. 0. 0. 1.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 1.]
 [0. 0. 0. 1. 0.]
 [0. 1. 0. 0. 0.]]
#

I'm also interested in ways to parallelize a loop that calls the batched scatter operation in each iteration.
The use case is building a histogram from a dataset (in practice the dataset is a generator instead of a NumPy array because it doesn't fit in memory).

import numpy as np

n_samples, n_positions, n_bins = 1024, 256, 100    # Real situation: (~64k, ~64k, ~256)
hist_per_position = np.zeros((n_positions, n_bins), dtype=int)
idx_dataset = np.random.randint(n_bins, size=(n_samples, n_positions))
for bin_indices in idx_dataset:
    np.add.at(hist_per_position, (np.arange(n_positions), bin_indices), 1)
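
One alternative to calling `np.add.at` per sample, offered as a sketch rather than a benchmarked answer: give every (position, bin) pair a unique flat id and count ids with `np.bincount`, a whole chunk of samples at a time. The helper name and the chunk size of 128 are arbitrary choices for this example.

```python
import numpy as np

n_samples, n_positions, n_bins = 1024, 256, 100
rng = np.random.default_rng(0)
idx_dataset = rng.integers(n_bins, size=(n_samples, n_positions))

def histogram_chunk(chunk, n_bins):
    """Count bin hits per position for a (n_rows, n_positions) chunk of bin indices."""
    n_positions = chunk.shape[1]
    # Unique flat id per (position, bin) pair: position * n_bins + bin.
    flat = chunk + np.arange(n_positions) * n_bins
    counts = np.bincount(flat.ravel(), minlength=n_positions * n_bins)
    return counts.reshape(n_positions, n_bins)

hist_per_position = np.zeros((n_positions, n_bins), dtype=np.int64)
for start in range(0, n_samples, 128):  # chunks could equally come from a generator
    hist_per_position += histogram_chunk(idx_dataset[start:start + 128], n_bins)
```

Because each chunk produces an independent partial histogram and the partial results are simply summed, the chunks could in principle be handed to separate worker processes and added together at the end.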
rich silo
#

Hey guys, how do I mark code here?
If I need to post some code here?

hasty grail
#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

#

Hey @rich silo!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

rich silo
#

!code-blocks

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

rich silo
#
print('Hello world!')
  
hasty grail
#

There you go. 🙂

grave path
#

Hello guys, I have a question. I'm trying to gather a bit of information for a project, and I'm looking into an image classification problem where I have, for example, different animals, and the program needs to classify them with the best accuracy. What would you recommend I look into, considering I want to test multiple algorithms and see which is most accurate for such a problem? Should I use machine learning or deep learning, and what tools or libraries should I learn?

#

If I use Keras, would I be able to specify which algorithm I want it to use, or how exactly does it work?

hasty grail
#

What do you mean by "algorithm"?

#

Keras is pretty much for Deep Learning only

grave path
#

Well I just want someone to direct me a bit tbh, the algorithm to classify the pictures

hasty grail
#

If you want to use other types of Machine Learning methods (such as KMeans) you may want to take a look at sklearn

grave path
#

like different algorithms will give different accuracy

#

should I be using ML or DL?

#

what would be easier ?

hasty grail
#

DL is a subset of ML

#

In your case, I do recommend using DL

#

There are plenty of tutorials on Keras that you can search online

grave path
#

I know its a subset but I don't understand then if I use ML would then ML automatically use DL behind the scenes

hasty grail
#

Keras is a Deep Learning library

#

Anything you set up there is basically DL

grave path
#

Then I can specify in Keras whether I want it to use CNN, RNN or other algorithms?

rugged owl
#

Hi everyone, as one of the authors of the open-source framework github.com/dstackai/dstack I’d like to kindly share with the community what my friends and I are currently doing to help use ML models in applications.

In today's blog post we wrote about how one can run ML models on live data to build interactive reports with our open-source library https://blog.dstack.ai/run-ml-model-on-live-data-to-build-interactive-reports If this is something relevant to your work, we’d appreciate your feedback!

dstack.ai

Using ML models to predict and solve business use cases currently is a very lengthy and iterative process. Data scientists and engineers need to tackle a lot of challenges continuously, from building and improving the models to deploying them and usi...

pallid oxide
#

Hi guys! I'm looking for projects which utilise the concept of digital twins. I'm doing research for a school assignment and would like to see what has been done already.

vast lava
#

I am looking at SymPy for calculating integrals from calculus, and I am struggling with some fundamentals of calculating the area under a curve. Can anyone help?

rich silo
#

Hello all, I am looking for some help with plotly.
I want to make 2 vertically stacked graphs that share the same range slider (and also the x-axis).
This is my code so far:

#

Too long to paste

pallid oxide
#

@rich silo plotly provides a library called dash, which has that capability. Maybe you could take a look at it?

hasty grail
#

@grave path Sorry for the late response, you basically build your model block by block so it's highly customizable.

grave path
#

@hasty grail its okay thanks a lot man I'll just have to figure out whether I will do it in ml or dl considering the time frame I have and which is less complex as Im learning ML right now

smoky bobcat
#

anyone can suggest me a really good dataset to work on for a uni coursework?

hollow sentinel
#

@smoky bobcat Kaggle is a great source for datasets

lapis sequoia
viral rock
#

hi I am new in python is there anyway to install pep8

#

in pycharm

#

i am using mac os

austere swift
#

it should just be pip3 install pep8

spare trellis
#

I don't know if this is the place to ask but should I learn SQL before starting a data science course or am I able to learn it while doing it?

austere swift
#

you dont really need sql for data science

spare trellis
#

Oh, do I need any other language understanding besides Python 3 or should I be set to dive in and have fun with it

austere swift
#

no you can just go straight in lol

spare trellis
#

woot thank you, have a good one

smoky bobcat
#

how does this one look

hollow sentinel
#

@smoky bobcat depends on what you want to do

smoky bobcat
#

uni coursework

hollow sentinel
#

yes but like do you want tabular data?

smoky bobcat
#

what do you mean

austere swift
#

what kinda data do you want

#

images? numbers? tables? etc

smoky bobcat
#

numbers

austere swift
#

that would be tabular data

smoky bobcat
#

its been all day im searching for a good dataset as i need to start working on something asap

heady hatch
#

Here's something real basic.

smoky bobcat
#

that would be tabular data
@austere swift oh okay

heady hatch
#

All numbers, nothing but numbers.

#

Lots of data science parts to practice.

austere swift
#

tabular data is basically just any data thats in the form of tables, like different features of something

smoky bobcat
#

Here's something real basic.
@heady hatch bro, I can't use the most basic ones like these, they are used as examples in uni lectures

heady hatch
#

Oh man.

errant cargo
#

Hey guys, i'm a recent grad of computational physics, and for this year i've studied lots of python, data science tools like numpy, pandas, scikit-learn, data visualization, basic sql, machine learning fundamentals and algorithms with scikit-learn, and neural networks with keras tensorflow. I'm doing some projects, but I feel like it would be better for me to ask an experienced person on some tips, so that I know i'm not just wasting time. What more should I learn, what projects should I make, and is there something that i'm missing?

austere swift
#

well if you havent gone into the deep math of neural networks and machine learning i highly recommend you do since thatll help you make much better models and will make your life a whole lot easier

#

as for projects that's really up to you and what you wanna do

#

since you're into computational physics you can do some projects of machine learning in physics

#

there were a few papers I've heard of that used machine learning and deep learning for CFD simulations and it made them wayy more efficient in terms of processing and speed, you can try to replicate those

errant cargo
#

Yeah, I did go into the maths, they really are useful. Some holes here and there but I intend to make an implementation on each and everyone of them soon enough to make sure I learned. Yesterday I did a little project on computational physics, went well actually.

#

I'll try looking up for them

#

I've been wanting to know what else should I learn to get started on the career as a junior DS. I've heard that stuff like Azure spark is important

#

Does anyone knows a platform for people looking for a mentee?

austere swift
errant cargo
#

i'll try it out

#

thanks!

errant cargo
#

Apparently its all paid, and I don't have much money atm. And I only want some guide/roadmap, I don't need someone to teach me something in specific.

heady hatch
#

@errant cargo Hmm anything you're looking for specifically?

#

To say that you're not wasting time, and what you should learn, depends on what your final goal is.

#

Do you want to get a job? Do you want to go back into academia? etc etc.

errant cargo
#

Getting a job first for sure

heady hatch
#

Okay now what kind of job?

#

Do you want a MLE, DE, DS, DA, etc etc.

errant cargo
#

Data Scientist

#

Don't know exactly what domain tbh

#

finance, or tech

heady hatch
#

Okay, DS have different requirements and definitions at different companies.

#

Do you know what kind of company and what kind of ds they're looking for?

#

I think it's good that you have a good pool of skills to refine.

#

Now the next step would probably be looking for particular company to understand what skills they're looking for.

errant cargo
#

makes sense

heady hatch
#

Because how's your analytical skills?

errant cargo
#

than work on the stuff they require

#

In terms of EDA, i believe that a few more notebooks and it'll be really decent

heady hatch
#

Hmm not just EDA.

#

But actually breaking down a problem.

errant cargo
#

THeres some statistical concepts that I need to learn that my uni didnt cover, but thats fine

heady hatch
#

Let's say a company asks you to break down why their user engagement is decreasing by 10% over the past few weeks.

errant cargo
#

Abstracting and stuff, its decent. Can get better

#

hmm

heady hatch
#

EDA is nice and helpful in many things but not really helpful if you can't get some insight that will help with the solution.

errant cargo
#

itll depend on what do I have to work with

#

But I guess that what I have to work with depends on me as well

#

Say, maintaining a data base

heady hatch
#

Here's some of the definition of DS I've come across.

  • Hard ML researchers
  • DS for products/decisions
  • DS, that's a senior version of DA
  • Some combination of DA + DE, maybe MLE
#

Probably many more.

#

Hard ML researchers usually look for graduate degrees in actual ML.

#

DS for products and decisions is dealing with the question I asked above.

#

senior version of DA is also that but I suppose adding ML to the mix.

#

Sometimes company doesn't have infrastructure so they ask you to do the data engineering too.

errant cargo
#

That would be something i would have to work on a lot if they ask me

#

since I dont have a CS curriculum, just a computational physics

heady hatch
#

I think if you want a direction for the next step to take, talk to people who are actually working and ask them what their company is like and what their data scientists are like.

#

I think having some kind of comfort with programming is nice.

errant cargo
#

I've been interested in IBM recently, so i'll try that first

heady hatch
#

Which then helps you ease into what company might be actually looking for.

errant cargo
#

I'll try to find some then

heady hatch
#

Good luck.

#

Feel free to come back and ask more questions.

errant cargo
#

Atm i'm just developing skills that I know that i'll use as a Data Scientist, but I havent looked into the gritty details yet

#

Which now would be the moment

#

Thanks a lot, would definitely help

heady hatch
#

Not to be mean but to play devil's advocate. How do you know you'll use them as a data scientist?

#

Unless you've had data scientist experience already, I'm curious of what you're using as your ground of evidence.

errant cargo
#

Everywhere that I looked it mentioned

#

I'm mostly learning from books that are focused on data science

heady hatch
#

That's fair.

#

To be honest, I'm in a similar boat as you.

errant cargo
#

Data Science Tools for python, hands on machine learning with scikit-learn and keras

heady hatch
#

I don't have any fancy degree in ML and pretty much everything is self taught.

#

Currently working as a NLP engineer.

errant cargo
#

It's rough

heady hatch
#

There are some data scientists I've come across that doesn't touch ML at all.

errant cargo
#

Thats cool, i've been wanting to learn a bit on NLP

heady hatch
#

Which then kinda makes me question why are you learning sklearn and tf/pt if you're not going to use them on your job.

#

But I'm digressing.

#

I think asking industry people for their experience is a much better metric.

#

Because you get to see what they're working with and what they're looking for.

errant cargo
#

yeah, nothing better than people actually working on it

#

Although I would guess that it would depend a lot on the job that they're doing

heady hatch
#

Mhm.

errant cargo
#

So I would have to ask more than one person

heady hatch
#

👍

#

I hope that gives you somewhat of a direction for the next step to take.

errant cargo
#

thanks a lot for the help though, def helped

#

yeah, it did

#

Since you're working already, would you recommend me getting an intern before trying to apply as a DS?

heady hatch
#

Yea, unless you have some kind of connection to the company.

#

Or maybe sometimes they're okay with you just having academia experience.

#

I think that part depends on how well you sell yourself in terms of job search + interview.

errant cargo
#

yeah

#

In any case, maybe its good to do 2 months or 3 of internship just to fixate the stuff i learned

#

thanks for the talk bud

heady hatch
#

Ye, update us. Would love to hear your progress.

errant cargo
#

Yeah, same for you

smoky bobcat
#

how do I balance a dataset?

heady hatch
#

You can under, over, or combine under and over sampling.
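
A minimal sketch of plain random under- and oversampling with pandas, assuming a binary `label` column (the column name and the toy data are made up for illustration). Libraries such as imbalanced-learn offer more refined methods; this only shows the basic idea.

```python
import pandas as pd

# Toy imbalanced dataset: 8 rows of class 0, 2 rows of class 1 (illustrative only).
df = pd.DataFrame({
    "feature": range(10),
    "label": [0] * 8 + [1] * 2,
})

counts = df["label"].value_counts()
minority_n = int(counts.min())
majority_n = int(counts.max())

# Undersampling: shrink every class down to the size of the smallest class.
under = pd.concat(
    [group.sample(n=minority_n, random_state=0) for _, group in df.groupby("label")]
)

# Oversampling: grow every class up to the size of the largest class,
# sampling with replacement so the small class can repeat rows.
over = pd.concat(
    [group.sample(n=majority_n, replace=True, random_state=0) for _, group in df.groupby("label")]
)

print(under["label"].value_counts().to_dict())
print(over["label"].value_counts().to_dict())
```

Resampling is usually applied to the training split only, so that the validation data keeps the real class proportions.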

smoky bobcat
#

@heady hatch u good good in this ml stuff?

heady hatch
#

Maybe? I have no idea.

#

I can only provide my thoughts. lol

smoky bobcat
#

lol u work?

heady hatch
#

You should ask your ds questions. hahaha

smoky bobcat
#

lol

heady hatch
#

Hey guys question on fine tuning gpt2.

Let's say I'm trying to generate stories, would it be better to fine tune it on the whole story text or the stories broken down into sentences?

Never mind, figured out a direction to head towards!

rich silo
#

Hello all, I am looking for some help with plotly.
I want to make 2 vertically stacked graphs that share the same range slider (and also the x-axis).
This is my code so far:

proper swift
#

anyone use Kaggle on here?

gray phoenix
#

I have an variable integer that is 20201015.

How do i convert it to datetime while maintaining the format of yyyymmdd?
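
One possible approach, sketched with the standard library only: a datetime object carries no display format of its own, so the integer is parsed once and the yyyymmdd look is applied again only when turning it back into text.

```python
from datetime import datetime

value = 20201015  # integer in yyyymmdd form, as in the question

# Parse the digits as a date.
parsed = datetime.strptime(str(value), "%Y%m%d")

# The yyyymmdd look only comes back when formatting for display.
as_text = parsed.strftime("%Y%m%d")
print(parsed.date(), as_text)
```

For a whole pandas column, `pd.to_datetime(column.astype(str), format="%Y%m%d")` follows the same idea.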

hollow sentinel
#

@proper swift yeah it's a great resource

proper swift
#

ive forgot my password and my reset email hasnt come through yet :/

hollow sentinel
#

you can't download it yourself?

#

oh

#
print(val_y.head())
#

anyone know why it's saying invalid syntax

#

I don't see it

gray phoenix
#

I dont think you need print with .head()

#

@hollow sentinel

hollow sentinel
#

nope still wrong

gray phoenix
#

why are you printing the df?

hollow sentinel
#

bc Kaggle asked

gray phoenix
#

oh lol

hollow sentinel
#
# print the top few validation predictions
print(iowa_model.predict(val_X.head())
# print the top few actual prices from validation data
val_y.head()
#

confusion

#

why is kaggle so stupid

#

idk why it's wrong too

#

nvm copy pasting from the answer key fixed it for some reason
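
For what it's worth, the `print(iowa_model.predict(val_X.head())` line in the snippet above looks one closing parenthesis short, which would explain an "invalid syntax" error being reported on the following line. A small check using the built-in `compile`, which only parses the text and never runs it (so `iowa_model` and `val_X` do not need to exist here):

```python
def parses(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


# Same calls as in the message above; the first version is missing a ")".
broken = "print(iowa_model.predict(val_X.head())\nprint(val_y.head())"
fixed = "print(iowa_model.predict(val_X.head()))\nprint(val_y.head())"

print(parses(broken), parses(fixed))
```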

nova smelt
#

Is this the channel for stuff related to machine learning and AI?

hollow sentinel
#

yessir

nova smelt
#

If so...
Can you guy recommend any tutorials to learn neural networks? I've watched the series about neural networks and the series about machine learning by tech with Tim

#

Dunno if you know him

#

But now I feel kinda stuck in what to do next

hollow sentinel
#

Tensorflow 2.0 deep learning and artificial intelligence

#

it's on udemy

nova smelt
#

Okay

#

I will check that out

#

Thx

hollow sentinel
#

no problem

rich silo
acoustic shadow
#

i need some help with pandas?

#

my Dataframe seems to be duplicating its self

rich silo
#

lol how

#

code plz

acoustic shadow
#

sure

#
import seaborn
import matplotlib.pyplot
import numpy
import pandas
import requests
import re
import parse
from parse import *
import pandas as pd
    


#Pull Database, from Site 
DB = requests.get("https://www.milehighcomics.com/cgi-bin/genresearch.cgi?title=SUPERM").text

#Global Variables, to pull from
lines = DB.split("\n")

#Create Easy Dataframe to Confirm Conditions

data = pandas.DataFrame({
  "Store": "Mile High Comics",
  "Comics": lines ,
  "Comics": lines ,
  "Comics": lines ,
  "Comics": lines
})
#Change Data Frame size to display The entire Data frame


print("This is working?!?!")

#Fuctions, which search the scraped Site

def BatmanMap(line):
    for line in lines:
      return 1 if search("Batman", line) else 0
  
def WWMap(line):
    for line in lines:
      return 1 if search("Wonder Woman", line) else 0

def GLMap(line):
    for line in lines:
      return 1 if search("Green Lantern", line) else 0

def FMap(line):
    for line in lines:
      return 1 if search("Flash", line) else 0

#Mapping to the Data Frame

data["Wonder Woman"] = data["Comics"].map(WWMap)

data["Green Lantern"] = data["Comics"].map(GLMap)

data["Batman"] = data["Comics"].map(BatmanMap)

data["Flash"] = data["Comics"].map(FMap)

pd.set_option("display.max_rows", None, "display.max_columns", None)

print(data)
#

The project is to eventually graph a bar chart detailing something. What I chose was superheroes appearing in Superman titles on this comic store's site. However, I suck at coding and it doesn't seem to be working. I added the pd.set_option, but ever since that was added it just makes 2 data frames: one which is the entire site's source, and another which is correctly formatted but doesn't work (because I suck)

#

so right now im just trying to get it to where Pandas formats the Dataframe i want, and not reposts the sites Source....

heady hatch
#

I don't know if this is the issue you're having but I think your dataframe is initialized with the same column rewriting itself.

data = pandas.DataFrame({
  "Store": "Mile High Comics",
  "Comics": lines ,
  "Comics": lines ,
  "Comics": lines ,
  "Comics": lines
})
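
A rough sketch of how this could look, under two assumptions: dictionary keys must be unique, so the repeated "Comics" entries collapse into one column, and each mapper function above loops over the global `lines` and returns on the first iteration, so every row would receive the same value. The sample lines below are made up so the sketch runs without hitting the site.

```python
import pandas as pd

# Stand-in for the scraped page (sample text is made up).
lines = [
    "SUPERMAN #1 featuring Batman",
    "SUPERMAN #2 with Wonder Woman and Flash",
    "SUPERMAN #3",
]

# One key per column; a scalar such as the store name is broadcast to every row.
data = pd.DataFrame({"Store": "Mile High Comics", "Comics": lines})

heroes = ["Batman", "Wonder Woman", "Green Lantern", "Flash"]

# One 0/1 indicator column per hero, computed per row rather than over all lines.
for hero in heroes:
    data[hero] = data["Comics"].str.contains(hero, regex=False).astype(int)

# Column totals are what a bar chart of appearances would plot.
totals = data[heroes].sum()
print(totals)
```

With the real page, `lines` would come from the `requests.get(...).text.split("\n")` call already in the script, and `totals.plot.bar()` would give the chart.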
lapis sequoia
#

Any recommendations on a finite difference book that gives examples in Python?

tropic junco
#

how can i make a bar graph, with my x axis like - [1, 1, 1, 2, 3, 3, 2, 1, 5, 4, 1, 1, 2, 4, 2, 3, 1, 2], basically i want to make a graph based on the occurences of same elements

acoustic shadow
#

Didnt fix it

#

but, did reduce code.

#

so thanks

heady hatch
#

@tropic junco I'm not sure what you're talking about with the xaxis, but you can look into how to make a histogram.
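
If the goal is one bar per distinct element, a sketch along these lines might help. It counts occurrences with `collections.Counter`; the `plot_counts` helper is a made-up name and needs matplotlib installed.

```python
from collections import Counter

values = [1, 1, 1, 2, 3, 3, 2, 1, 5, 4, 1, 1, 2, 4, 2, 3, 1, 2]

# Count how often each distinct element occurs, ordered by element.
counts = dict(sorted(Counter(values).items()))
print(counts)


def plot_counts(counts):
    """Draw one bar per distinct element (hypothetical helper, needs matplotlib)."""
    import matplotlib.pyplot as plt

    plt.bar([str(key) for key in counts], list(counts.values()))
    plt.xlabel("element")
    plt.ylabel("occurrences")
    plt.show()
```

Calling `plot_counts(counts)` then draws the chart.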

tropic junco
#

i see

tropic junco
#

i did it :)

heady hatch
#

Congratulations!

undone flare
#

what does index_col do in pd.read_csv()

heady hatch
#

It sets the index as the column you want.
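
A small illustration, using an in-memory CSV with made-up contents in place of a real file:

```python
import io

import pandas as pd

# Small in-memory CSV standing in for a file on disk (contents are made up).
csv_text = "name,hp,attack\nBulbasaur,45,49\nCharmander,39,52\n"

# Default: pandas adds its own 0, 1, 2, ... index and keeps `name` as a column.
default_df = pd.read_csv(io.StringIO(csv_text))

# index_col="name": the `name` column becomes the row index instead.
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col="name")

# Rows can now be looked up by name.
print(indexed_df.loc["Charmander", "hp"])
```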

undone flare
#

k

rancid mango
#

is this the chat for aperture science

winged lark
#

good morning

mild topaz
#
Traceback (most recent call last):
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 468, in wrapper
    resp = resource(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 583, in dispatch_request
    resp = meth(*args, **kwargs)
  File "E:\demo3\recDoc1.py", line 283, in post
    print("{}: {:.2f}%".format(label1, predictions1 * 100))
TypeError: unsupported format string passed to numpy.ndarray.__format__
summer holly
#

Hi, I'm trying to deploy my custom keras flask app which has a size of about 2.3gb and due to these heavy size constraints, I don't think it is possible to use heroku or netlify to deploy it. Is there any alternative?

#

*free alternative

#

Or even a budget alternative

grave frost
#

Google VM??

#

@mild topaz The error is pretty self-explanatory - you passed an incorrect datatype
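
To make that concrete, a sketch assuming `predictions1` is a NumPy array such as a model's output (the label and the value below are made up): a float format spec like `.2f` needs a single number, so one element has to be pulled out before formatting.

```python
import numpy as np

label1 = "cat"                       # hypothetical label
predictions1 = np.array([[0.8731]])  # hypothetical model output, shape (1, 1)

try:
    "{:.2f}".format(predictions1 * 100)
    raised = False
except TypeError:
    # An array with one or more dimensions rejects a float format spec.
    raised = True

# Pull out a plain Python float first, then format it.
score = float(predictions1[0, 0])
message = "{}: {:.2f}%".format(label1, score * 100)
print(message)
```

If the model returns several scores, the index (or an `argmax`) would pick which one to report.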

slender nymph
#

hi, can someone help me and tell me what this means:
ValueError: exog does not have full column rank.

grave frost
#

Did you google your error first?

slender nymph
#
a=PanelOLS(dependent=df['logQ'],exog=df[['founderCEO','logassets','logage','bs_volatility']],time_effects=True)
print(a.fit())
#

yeah and not find a solution

grave frost
#

BTW Post the whole Traceback

#

So it becomes easier to help you

slender nymph
#
import pandas as pd
import numpy as np
from linearmodels import PanelOLS


#lecture data 
data = pd.read_excel("familyfirms.xlsx")

#drop NaN
data.dropna(inplace=True)

#Log Tobin's Q
data['logQ'] = np.log(data['Q'])

#Log age
data['logage'] = np.log(data['agefirm'])

#Log assets
data['logassets'] = np.log(data['assets'])

df = data.set_index(['company','year'])

a=PanelOLS(dependent=df['logQ'],exog=df[['founderCEO','logassets','logage','bs_volatility']],time_effects=True)
print(a.fit())

data
#
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-313-bb26f941ef72> in <module>
     21 df = data.set_index(['company','year'])
     22 
---> 23 a=PanelOLS(dependent=df['logQ'],exog=df[['founderCEO','logassets','logage','bs_volatility']],time_effects=True)
     24 print(a.fit())
     25 

~\anaconda3\lib\site-packages\linearmodels\panel\model.py in __init__(self, dependent, exog, weights, entity_effects, time_effects, other_effects, singletons, drop_absorbed)
   1038         drop_absorbed: bool = False,
   1039     ) -> None:
-> 1040         super(PanelOLS, self).__init__(dependent, exog, weights=weights)
   1041 
   1042         self._entity_effects = entity_effects

~\anaconda3\lib\site-packages\linearmodels\panel\model.py in __init__(self, dependent, exog, weights)
    242         )
    243         self._original_index = self.dependent.index.copy()
--> 244         self._validate_data()
    245         self._singleton_index: Optional[NDArray] = None
    246 

~\anaconda3\lib\site-packages\linearmodels\panel\model.py in _validate_data(self)
    381         w = w / w.mean()
    382         self.weights = PanelData(w)
--> 383         rank_of_x = self._check_exog_rank()
    384         self._constant, self._constant_index = has_constant(x, rank_of_x)
    385 

~\anaconda3\lib\site-packages\linearmodels\panel\model.py in _check_exog_rank(self)
    343         rank_of_x = matrix_rank(x)
    344         if rank_of_x < x.shape[1]:
--> 345             raise ValueError("exog does not have full column rank.")
    346         return rank_of_x
    347 

ValueError: exog does not have full column rank.
#

thats the rror

#

error

#

i dont understand why
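
Going by the traceback, the check compares `matrix_rank(x)` with the number of columns, so the error means at least one regressor carries no independent information: for example an all-zero column, a duplicated column, or a column that is an exact linear combination of the others. Which of these applies to the data above is not visible from the chat. A sketch with made-up regressors showing how to check:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 50

# Toy regressors (made up): `c` is an exact linear combination of `a` and `b`.
exog = pd.DataFrame({"a": rng.normal(size=n), "b": rng.normal(size=n)})
exog["c"] = 2 * exog["a"] - exog["b"]

# Full column rank would mean rank == number of columns.
rank = np.linalg.matrix_rank(exog.to_numpy())
print(f"rank {rank} with {exog.shape[1]} columns")

# Quick checks for common culprits: all-zero columns and perfect correlations.
zero_cols = [col for col in exog.columns if (exog[col] == 0).all()]
print("all-zero columns:", zero_cols)
print(exog.corr().round(2))
```

Running the same checks on `df[['founderCEO','logassets','logage','bs_volatility']]` after the `dropna` should point at the offending column.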

golden saffron
#

Any leads on chatbot powered by Generative Models? Even any git repo link will do.

grave frost
#

@golden saffron DO you want to make one, or do you want to use some pre-existing model?

golden saffron
#

@grave frost I want to make one, but need some reference and guidance. I have already made few rule based and context based bots.

grave frost
#

Do you know what is a generative model?

#

And have you done NLP before?

golden saffron
#

Yes, my next target is to make a Bot who can interact with user.

grave frost
#

Yeah, but do you know Machine Learning?

golden saffron
#

Something like GPT - 3.

grave frost
#

Are you actually trying to make GPT-3 ? I am confused

golden saffron
#

Yes, ML, NLP, RL I know. What to use RL for the chatbot.

#

Not exactly GPT - 3 but as I mentioned above some RL based chatbot.

grave frost
#

You can't use RL in a chatbot 🤦

misty cargo
#

You can't use RL in a chatbot 🤦
@grave frost imagine lol

grave frost
#

FIrst, I recommend brush up on the basics of ML and NLP first before diving in to chatbots

golden saffron
#

Why not. every conversation will be at one state, there would be some information available about that user that can be used for the conversation

grave frost
#

wow. That is not how it works

#

Recommend to brush up on RL as well

misty cargo
#

so what do u want exactly?

unborn wraith
#

hey!

misty cargo
#

hi

unborn wraith
#

guys i am new to programming any good resource to learn data science

golden saffron
#

That state can tell me the interest of the user, at lest gender age etc that can be used in conversation.

misty cargo
#

guys i am new to programming any good resource to learn data science
@unborn wraith sure do you know the maths already?

grave frost
#

That state can tell me the interest of the user, at lest gender age etc that can be used in conversation.
@golden saffron ok , just tell us for what task is the chatbot for

unborn wraith
#

@misty cargo nope

#

i am a lil young

misty cargo
#

or do you need some calculus, linear algebra and stuff too

golden saffron
#

@grave frost I think you are mistaken about RL. RL is all about having a state, an option to be picked and a reward.

#

WHy cannot that be applied for chatbots?

grave frost
#

@golden saffron You can research about that. Bottom line is that it would produce a random bag of words

unborn wraith
#

please tag me

misty cargo
#

@misty cargo nope
@unborn wraith oh ok then i recommend starting with calculus, you can find courses on mit open courseware for both LA and Calculus

golden saffron
#

Dude, That would come up with a set of meaning responses that the RL will have to select.

misty cargo
#

that applies to probability and statistics too, mit has pretty good courses

golden saffron
#

Dude, That would come up with a set of meaningfull responses that the RL will have to select.

grave frost
#

@unborn wraith Just see 3b1B youtube videos and it would keep you an extremely good base

golden saffron
#

e.g. a Hi can be responded by Hi, how are you

grave frost
#

@golden saffron bro, it doesn't work like that

golden saffron
#

or by Hello, whats up

misty cargo
#

after that i suggest

unborn wraith
#

can anyone provide me a link?

grave frost
#

You would have to provide a whole skeleton for RL to fill it up with below avg accuracy

misty cargo
#

https://www.coursera.org/learn/machine-learning Stanford Machine Learning (Andrew NG)
http://work.caltech.edu/lectures.html Caltech courses that are great
https://www.fast.ai/ EVERYTHING FROM FAST.AI

golden saffron
#

@grave frost, can you explain where did you actually used RL? and whats your understanding of it.

grave frost
unborn wraith
#

thanks

grave frost
#

Yw 🙂

misty cargo
#

thanks
@unborn wraith np

golden saffron
#

I just want to explore RL into Chatbot to understand the user and have a better meaning-full and rewarding conversation.

#

if it fails that will be perfectly fine.

grave frost
#

bro, you can explore ofc, no one is stopping you, but you will have to research a bit to find out how it can be used. THe way you described it not how it is to be done

golden saffron
#

Okay that means you don't know RL.

misty cargo
#

also guys i came talking here just because there was some requirement of sending 50 messages or smth

grave frost
#

ohk, I am not saying anything now lemon_swag

misty cargo
#

if you need help im free to help

undone flare
#

any tips for ds?

grave frost
#

@undone flare depends on what you want to do (in general)

undone flare
#

data analysis..

misty cargo
#

any tips for ds?
@undone flare don t jump to deep learning right off

undone flare
#

yea I am not

misty cargo
#

most problems can be solved with scikit in like light speed

#

even tho you may not like it lol

grave frost
#

@undone flare yeah, there is a site called kaggle.com with great datasets. There is something called EDA - Exploratory Data analysis. Find a dataset you like (There are tons of real world ds and are pretty great). You can check the EDA others have done and try to learn the libs....

undone flare
#

I learned the basics of NumPy and currently learning Pandas and I am using the Pokemon Dataset from Kaggle

grave frost
#

great! It's a pretty good place to start your DS journey.

undone flare
#

Should I take up the udemy bootcamp course?

golden saffron
#

ohk, I am not saying anything now lemon_swag
@grave frost Anyways you don't know much, still thanks for the info.

#

Should I take up the udemy bootcamp course?
@undone flare Take free things, There are lot many free things available and most important take up a internship.

unborn wraith
#

@grave frost the course u gave me is that for maths?

grave frost
#

@golden saffron If you keep saying things like that, you would be reported to server sooner or later. We are not getting paid to help you - it's completely voluntary. Don't get too frazzled up about these things and google things first instead of harassing others

golden saffron
#

Internships will be very help full in learning real life issues in data science.

undone flare
#

@undone flare Take free things, There are lot many free things available and most important take up a internship.
@golden saffron just wanted your guys opinion

#

alright

#

thx

grave frost
#

@unborn wraith It tries to explain things intuitively - without maths. That's why I liked it since it helps to grasp concepts easily and then explore the maths side of it

unborn wraith
#

ok thanks 🙂

undone flare
#

when should I go for ML?

#

after learning some basic modules?

grave frost
#

@undone flare When you feel like it 🙂

unborn wraith
#

is it worth to learn ml and data science now?

grave frost
#

@undone flare If you are getting bored of EDA and other things, just do some courses to help you get started. Try simple things first (perceptron and linear regression are good first projects, though the names may sound heavy)

undone flare
#

@unborn wraith yes

misty cargo
#

after learning some basic modules?
@undone flare maths(calculus, linear algebra, probability & statistics), then classical ml and then deep learning

golden saffron
#

Also @undone flare, Try working with spark and cloud as well as many real life datasets are on cloud and they use spark ML libraries for the same.

undone flare
#

@undone flare maths(calculus, linear algebra, probability & statistics), then classical ml and then deep learning
@misty cargo I still need to learn calculus

#

is Kaggle Competitions good?

grave frost
#

@undone flare Youtube it. Easiest way to learn something fast

misty cargo
#

@misty cargo I still need to learn calculus
@undone flare i suggest mit open courseware

#

is Kaggle Competitions good?
@undone flare yup but focus on the ones marked with #knowledge

undone flare
#

Okay, thx guys very helpful 👍

grave frost
#

yup but focus on the ones marked with #knowledge Any reason why? lol

#

can still participate in the lower end ones, like $500 or so

undone flare
#

I only know numpy and pandas so I will just skip competitions for now lol

golden saffron
#

@golden saffron If you keep saying things like that, you would be reported to server sooner or later. We are not getting paid to help you - it's completely voluntary. Don't get too frazzled up about these things and google things first instead of harassing others
@grave frost Look bro, I asked for some suggestion. If you dont know its fine, No need to panic and rant out things, A Generative models don't rely on pre-defined responses. They generate new responses from scratch. When we have multiple GMs they will give multiple responses, and I am just exploring RL here. You responses to my question was simply idiotic. The above definition is textbook def of GM.

grave frost
#

@undone flare S'ok - you will get there eventually

#

@golden saffron Man, just read what I posted above. I didn't doubt about RL of GM. All I said is that your approach to using RL/GM in NLP is very wrong and you should research about that.

#

Right now you are just raging that on why it wouldn't work and saying that I am not fit to answer. If you think so, just ignore me. Why would you keep pinging me after that??

pale thunder
#

please keep it civil, both of you.

hollow sentinel
#

hEaTeD

#

I don't think the Kaggle mini courses are helpful they're kind of cookie cutter

grave frost
#

Is that analogy supposed to be obvious? I don't do much baking

hollow sentinel
#

oh that just means it's really simple

#

I'm kind of scared of Ng's course bc it's not in Python

#

so I'm doing Kaggle instead

grave frost
#

What? What is it in - julia, matlab or something?

hollow sentinel
#

Octave

#

probably bc Octave was more prevalent back then for machine learning

grave frost
#

That's surprising, should have been converted to python by now

hollow sentinel
#

yeah but I found a github that does everything in python

#

so that's good

#

I think the google crash course is good too

grave frost
#

yeah, but not much coding (atleast not in the start)

hollow sentinel
#

I like courses that make me code from the start

#

I didn't like Columbia's course bc it was so focused on theory that it was boring

#

it's why I liked the Python for Data Science and Machine Learning Bootcamp so much

grave frost
#

yep, especially in ML, the theory-practical balance is just too bad. There are vids explaining complex things in 4 points and then there are people who explain it all by pretty advanced code. sad

hollow sentinel
#

statquest is good for explaining

#

I just wish Ng decided to do it in python

grave frost
#

He has other important work too, except making new courses lemon_swag

hollow sentinel
#

he has another course in deep learning AO

#

AI

grave frost
#

PLus there are plenty others too, so it's not like there isn't much choice

hollow sentinel
#

I just think I won't get much help if I do it in octave

#

whatever

grave frost
#

I think matlab is great - it doesn't even require coding for most tasks

#

Like regression is done from GUI

#

Just a few values here and there, load the database, a couple drop downs and boom, your regression is done. And it handles some pretty complex graphs upto 3D (in old version) too

hollow sentinel
#

Ng has people do linear regression by hand

#

no using sci kit learn

#

ooooooooh spooky

ebon lynx
#

there's nothing wrong with doing linear regression by hand
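
As a rough idea of what "by hand" can mean for one feature, here is a sketch of the closed-form least-squares fit in plain Python. The data is made up, and this is not necessarily the route the course takes.

```python
# Toy data lying exactly on y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form least squares for a single feature:
#   slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
#   intercept = mean_y - slope * mean_x
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
var_x = sum((x - mean_x) ** 2 for x in xs)
slope = cov_xy / var_x
intercept = mean_y - slope * mean_x

print(slope, intercept)
```

scikit-learn's `LinearRegression` solves the same least-squares problem, just for any number of features.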

hollow sentinel
#

I know I just never did it before

ebon lynx
#

because the real ML stuff happens always "by hand"

#

the deeper you go

hollow sentinel
#

yeah

#

can someone explain what underfitting is? What does it mean to perform poorly on training data

ebon lynx
#

@hollow sentinel y = 1. fit that model

#

that's underfitting

grave frost
#

because the real ML stuff happens always "by hand"
@ebon lynx I disagree- in the world where every other guy uses scikit-learn and Keras, it just isn't ML "by-hand" anymore - more like a glorified version that involves programming for people is a trend, so they can secure a good job

ebon lynx
#

@grave frost I do scikit learn + keras but I still feel like I'm missing out

#

a lot of real world problems require knowing how to program the solutions

grave frost
#

That's my point

hollow sentinel
#

lol @ebon lynx i still don't get it

ebon lynx
#

@hollow sentinel it's a shit model and it won't fit anything

#

unless y = 1

hollow sentinel
#

oh ok

grave frost
#

Few understand how it all actually works

hollow sentinel
#

yeah well ML is a niche field

#

it's like cybersecurity

grave frost
#

cybersecurity is not a niche field

hollow sentinel
#

oh

ebon lynx
#

neither is ML

grave frost
#

Though ML is kinda

#

Coz the people who truly understand it have PhDs - years of experience and studying to get to be the experts

hollow sentinel
#

yep

grave frost
#

CyberSec can be done by a postgrad

#

or script kiddies also these days

hollow sentinel
#

well yeah but the ones who aren't script kiddies are hard to find

grave frost
#

no they aren't

#

It's not that technical

hollow sentinel
#

oh

grave frost
#

Even you can learn a great deal about it in a few weeks (and implement it if coding skills are good)

#

THe thing is to just have knowledge about the methods involved

hollow sentinel
#

idk I just found ML more interesting

grave frost
#

me too 🙂

#

I find the lack of interpretability of ML models very interesting, which is one of the reasons why I delved into it

hushed wasp
#

Hello,

Can someone tell me what I need to change so that not every row gets replaced by NaN, please? 🙂

hollow sentinel
#

.fillna would be a good method to look into

hushed wasp
#

it's just that my list comprehension replaces every variable other than the ones with kBtu by NaN and I don't know why

hollow sentinel
#

how much of your dataset is NaN

hushed wasp
#

there aren't before my last code line

ebon lynx
#

@hushed wasp to find the correct columns, try the function .filter(like="kBtu")

#

that will give you only the columns with that in the name of the column

hushed wasp
#

for the location of the columns it "works" just the Nan replacement I don't know how to solve

#

df = df[df[[c for c in df.columns if c.endswith('(kBtu)')]] >= 0]

#

it's this last line which gives me so much nan
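
A likely explanation, sketched on a made-up frame: indexing a DataFrame with a boolean DataFrame keeps the shape and turns every cell where the mask is False, plus every column missing from the mask, into NaN. Reducing the mask to one True/False per row filters whole rows instead.

```python
import pandas as pd

# Toy frame (made up): two columns ending in "(kBtu)" plus one unrelated column.
df = pd.DataFrame({
    "Building": ["A", "B", "C"],
    "Electricity(kBtu)": [100.0, -5.0, 30.0],
    "Gas(kBtu)": [20.0, 10.0, 0.0],
})

kbtu_cols = [c for c in df.columns if c.endswith("(kBtu)")]

# DataFrame mask: cells where the mask is False, and every column missing
# from the mask, come back as NaN. This reproduces the problem.
masked = df[df[kbtu_cols] >= 0]

# Row mask: one True/False per row, so whole rows are kept or dropped.
filtered = df[(df[kbtu_cols] >= 0).all(axis=1)]
print(filtered)
```

Here `.all(axis=1)` keeps a row only if every kBtu column is non-negative; `.any(axis=1)` would keep rows where at least one is.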

fierce swallow
#

O

heady hatch
#

@hollow sentinel are you still confused about underfitting?

hollow sentinel
#

underfitting is where the model hasn't learned enough from the training data

#

right?

#

but like what does it mean to not learn enough

heady hatch
#

@hollow sentinel let's focus on a classic model, linear regression.

How do you know when a linear regression is doing badly?

hollow sentinel
#

the mean squared error

heady hatch
#

And what does that tell you?

hollow sentinel
#

how close a regression line is to a set of points

heady hatch
#

Right right, and in terms of prediction this means how good or bad your prediction is.

#

So where does underfitting and overfitting come in?

#

Looking solely at underfitting first.

#

Let's think about the relationship between weight and the height.

#

Let's first assume there's a linear relationship between the two.

#

where f(x) = y, and x = weight and y = height.

#

Meaning we're trying to use weight to predict height.

#

How does that theory sound to you?

#

Do you think it makes sense that if people's weight increases, their height will increase in some linear fashion too?

drowsy kite
#

Hey guys, wondering if i could get a solution to a small problem i'm having with pandas

#

im trying to use "read_html" on a url but the url is behind a login screen. even when i log in with bs4, pandas doesn't recognise that it has access past the login screen. is there another way to do this?
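
One thing that might work, sketched under assumptions: `pd.read_html(url)` makes its own request without the login cookies, so the page can be fetched with a logged-in `requests.Session` and the HTML text handed to pandas. The function name, the URLs and the form field names below are hypothetical; sites that use CSRF tokens or JavaScript logins need more than this.

```python
import io

import pandas as pd
import requests


def read_table_behind_login(login_url, table_url, username, password):
    """Log in with one session, then parse the tables on a protected page.

    The form field names ("username", "password") are assumptions; the real
    names come from the site's login form.
    """
    with requests.Session() as session:
        # The session keeps the cookies set by the login response.
        login = session.post(login_url, data={"username": username, "password": password})
        login.raise_for_status()

        page = session.get(table_url)
        page.raise_for_status()

    # Hand pandas the HTML text instead of the URL, so no second,
    # unauthenticated request is made.
    return pd.read_html(io.StringIO(page.text))
```

The return value is a list of DataFrames, one per `<table>` found on the page.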

hollow sentinel
#

Yes @heady hatch

heady hatch
#

@hollow sentinel okay now think about what happens if the algorithm predict average of height for all weight.

Meaning f(x) = avg.

#

How would you describe this algorithm in terms of complexity and the quality of prediction?

#

Is there anything wrong with the algorithm? What's going to happen with the MSE?

heady tide
hollow sentinel
#

idk

velvet thorn
#

but like what does it mean to not learn enough
@hollow sentinel there is an actual physical relationship between two populations of data (features and target). a model is one "guess" (based on mathematical rules) at that relationship, which we can evaluate.

#

naturally, we do not have access to the whole population, but only a subset (the datasets that we perform training on)

#

we say a model is "underfit" when the actual relationship is much more complex than that represented by the model
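
To make the f(x) = avg idea concrete, here is a small sketch on synthetic data. The weight/height numbers are invented, so only the comparison matters: a model that always predicts the average ignores the relationship entirely and ends up with a higher MSE than a fitted line.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, made-up relationship: height grows linearly with weight plus noise.
weight = rng.uniform(50, 100, size=200)
height = 100 + 0.8 * weight + rng.normal(0, 5, size=200)


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return float(np.mean((y_true - y_pred) ** 2))


# Model 1: f(x) = avg, the same prediction for every weight (underfit).
mean_pred = np.full_like(height, height.mean())

# Model 2: least-squares straight line through the data.
slope, intercept = np.polyfit(weight, height, deg=1)
line_pred = slope * weight + intercept

mse_mean = mse(height, mean_pred)
mse_line = mse(height, line_pred)
print(mse_mean, mse_line)
```

The constant model's MSE equals the variance of the heights, which is the error you get when the model has learned nothing about how weight relates to height.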

hollow sentinel
#

Got it

tawny oak
#

hey guys

#

I have this dataframe

#

and I want to make it like this

#

do you have any idea?

austere swift
#

i don't get what you mean

#

oh wait i think i see it now

#

you wanna sum all the ones that have the same id?

tawny oak
#

YEAH

austere swift
#

i think i remember there being a function for this but i don't remember what it was called

#

wait no it was just a groupby

#

df.groupby(['id']).sum()

velvet thorn
#

df.groupby(['name', 'id']).sum()

tawny oak
#

I know this but it give me the name just one time

velvet thorn
#

I know this but it give me the name just one time
@tawny oak elaborate

#

did you do

#

what I said?

tawny oak
#

YEAH

velvet thorn
#

show the result

tawny oak
#

it give

#

give me that

velvet thorn
#

it's supposed to be like that

velvet thorn
#

.reset_index()

tawny oak
#

nope

velvet thorn
#

actually, no

#

df.groupby(['name', 'id'], as_index=False).sum()

#

.reset_index works too though

#

p sure you didn't use it right

#
  name  id  minutes
0    A  11        3
1    A  13        3
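
A short sketch of the difference between the variants discussed above. The rows are hypothetical, since the original dataframe was only shared as an image.

```python
import pandas as pd

# Hypothetical rows standing in for the screenshot.
df = pd.DataFrame({
    "name": ["A", "A", "A", "A"],
    "id": [11, 11, 13, 13],
    "minutes": [1, 2, 1, 2],
})

# Default: the group keys become a MultiIndex, so 'name' is shown once per block.
indexed = df.groupby(["name", "id"]).sum()

# as_index=False keeps the keys as ordinary columns, repeated on every row.
flat = df.groupby(["name", "id"], as_index=False).sum()

# Resetting the index afterwards gives the same flat layout.
same = df.groupby(["name", "id"]).sum().reset_index()
print(flat)
```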
tawny oak
#

yeah thank you

#

:D

velvet thorn
#

yw

ripe forge
#

Wanted Ideas for metric: what's a good substitute for false positive rate in a one-class object detection algorithm?

#

The kicker is: it would be important for this to be model agnostic. And truly capture the essence of "how likely is my model to falsely predict another object where none exists"

#

Any suggestions or even partial ideas welcome.

heady hatch
#

How come 1 vs 0 wouldn't work for the metric?

#

Where it detects the class or it doesn't.

ripe forge
#

The issue with object detection is that it's not a binary detection. There's the problem with localization as well (where in the image is an object detected). As such, when it doesnt predict a box, it's doing a good job out of an amazingly large number of candidate boxes that were never predicted.

#

So we don't really compute true negatives for object detection (and if we did it wouldn't be model agnostic anyways) thus leading me to this issue.

cedar sky
#

Anyone into Kaggle can DM me we could form a team

heady hatch
#

I was actually thinking of per pixel binary detection.

#

During inference, you would predict whether the pixel is part of the object you're trying to detect or not.

#

in terms of localization, it would be part of the extraction to localize on where it thinks the object is, then within the ROI detect the object.

#

Then you can have an average precision rate of how well it recognizes the pixels.

tawny oak
#

hey

#

I have a pandas series which type is string

#

the series is like this

#

10:30

#

02:45

#

I want it hour:minute

#

could I change data type?
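
One possible way to do this, sketched under the assumption that the strings are always in HH:MM form. Whether clock times or durations fit better depends on what the column means.

```python
import pandas as pd

s = pd.Series(["10:30", "02:45"])

# Option 1: treat the values as times of day (datetime.time objects).
as_time = pd.to_datetime(s, format="%H:%M").dt.time

# Option 2: treat them as durations; seconds are appended so the
# strings are in the HH:MM:SS form that to_timedelta parses.
as_delta = pd.to_timedelta(s + ":00")

print(as_time.tolist())
print(as_delta.tolist())
```

Durations are usually the more convenient choice if the values need to be summed or averaged later.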

nova smelt
#

yo so i am a beginner in ML and neural networks and i am currently trying to create a face recognition neural network with tensorflow and keras
i have finally figured out how to bring the data into the right shape
but my accuracy is 0.00 sth xDDD
how do i find out which loss functions i should use, which activation functions and how many dense layers
cause i guess that's why i have such a low accuracy xDD
or how many epochs i need

grave path
#

how do i do this

#

in Jupyter

undone flare
#

@grave path do you mean find the determinant of matrices?

#

or matrix multiplication?

grave path
#

nevermind I figured it out

#

no i meant the headline xD

#

like the font itself

undone flare
#

lol

grave path
#

Perhaps you can help with this question

#

How do I keep using the return of a function instead of it only being available inside the function

#

It used the one defined outside the function if that makes sense

undone flare
#

hmm I see what you mean

#

you should store it in a variable and then do the conditions

#

if you know what I mean

grave path
#

yeah I just thought about that let me try it

undone flare
#

after the return dataset you should store that in a variable and then use it

grave path
#

after?

#

I stored the dataset inside another variable and then returned the new variable

undone flare
#

the line NominalEncoder(data, ....) store that in var

#

and then use that
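
A minimal sketch of the "store the return value" advice. `encode_column` is a hypothetical stand-in for the NominalEncoder call, since the real function was only shown in a screenshot.

```python
def encode_column(rows, column):
    """Hypothetical encoder: returns new rows with `column` replaced by integer codes."""
    categories = sorted({row[column] for row in rows})
    codes = {category: i for i, category in enumerate(categories)}
    return [{**row, column: codes[row[column]]} for row in rows]


data = [{"city": "A", "type": "condo"}, {"city": "D", "type": "home"}]

# The function does not change `data` in place, so keep what it returns
# and pass that result into the next call.
data = encode_column(data, "city")
data = encode_column(data, "type")
print(data)
```

Without the assignments, each call would start again from the original `data` and the earlier encoding would be lost.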

grave path
#

ah I see what you mean ill try that

undone flare
#

yea otherwise they are overriding each other

grave path
#

Legend plus1

undone flare
#

Worked?

grave path
#

yeah I see this it just calls the function everytime to have that result

#

yeah it did ❤️

undone flare
#

nice

grave path
#

cheers mate

undone flare
#

np.eye() and np.identity() are same right
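
For square matrices the two give the same result. `np.eye` is the more general one, as this short sketch shows.

```python
import numpy as np

# Same output for the square case.
same = np.array_equal(np.eye(3), np.identity(3))

# np.eye additionally accepts a column count and a diagonal offset.
rect = np.eye(2, 3)       # 2x3 matrix with ones on the main diagonal
shifted = np.eye(3, k=1)  # ones on the first diagonal above the main one

print(same)
print(rect)
print(shifted)
```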

lapis sequoia
#

will it be easy switching from web dev to ai

grave path
#

@lapis sequoia Hey mate I have done web dev for a while not the best in it but have built some websites and now I'm doing a bit of machine learning its not the hardest I have realized so far because some stuff in ML are repetitive but then again my experience is limited in both

#

Just go for it

lapis sequoia
#

ight thx

undone flare
#

@lapis sequoia if you have the basic knowledge of python which I am assuming you have because you were doing web dev so it will not be that hard

grave path
#

how do I know if i should use StandardScaler or MinMaxScaler?

hollow gull
#

@grave path It is one of those design decisions that is frequently not very obvious to me going into a problem. You can always try both, but it adds a lot of compute if you keep trying every combination of methods. I frequently will research the model I am planning to build and see if the documentation recommends normalization or standardization.
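
For reference, a sketch of what the two options compute, written in plain NumPy so the arithmetic is visible. With default settings, scikit-learn's StandardScaler and MinMaxScaler apply these same per-column formulas. The sample values are made up.

```python
import numpy as np

# Made-up feature matrix: rows are samples, columns are features.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 1000.0]])

# Standardization: zero mean and unit variance per column.
standardized = (X - X.mean(axis=0)) / X.std(axis=0)

# Min-max normalization: rescale each column to the range [0, 1].
min_max = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(standardized)
print(min_max)
```

Min-max scaling is bounded but sensitive to outliers (note how 1000.0 squeezes the other values in the second column), which is one reason to check what the model's documentation recommends.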

grave path
#

Thank you very much

spark nimbus
#

I'm working on a jupyter notebook in pycharm, but it takes up excessive amounts of memory. Does anyone know how to solve this?

molten hamlet
#

jupyter notebook in pycharm?

#

😮

golden saffron
#

I'm working on a jupyter notebook in pycharm, but it takes up excessive amounts of memory. Does anyone know how to solve this?
@spark nimbus Directly use the python terminal. No need for Jupyter in pycharm.

undone flare
#

What I prefer is make a different folder and shift+right click and then install all the libraries and jupyter notebook and run the jupyter notebook from that folder itself so it is easier to keep track of stuff

#

Note : This is just my opinion

spark nimbus
#

@golden saffron No I mean, this is meant as interactive documentation

undone flare
#

@spark nimbus maybe allocate more memory to PyCharm

spark nimbus
#

I already allocated 8G rn

#

and it still happens

undone flare
#

oof

steel roost
#

hey guys

#

how would i turn this into a dictionary:

#

i really want to convert size_name to a dictionary

#

the output looks like 20511552:10

#

where the number after the colon is the size

#

nvm i found it

hollow gull
#

It is sort of confusing to me to rename something over the size variable inside the loop of sizes. Maybe the initial size variable in for size in sizes should be named differently?

tranquil apex
#
   Price    Type City
0     10   condo    A
1     24  duplex    D
2     32    home    D
3     25  duplex    A
4     65   condo    A
#

how do I turn it in to this^

   Price    Type City   AVG
0     10   condo    A  37.5
1     24  duplex    D  24.0
2     32    home    D  32.0
3     25  duplex    A  25.0
4     65   condo    A  37.5
#

i tried groupby type, city and .agg price to mean

hollow gull
#

And what did that give you?

tranquil apex
#

incompatible index of inserted column with frame index

hollow gull
#

import pandas as pd

list_values=[[10, 'condo', 'A'],
[24, 'duplex', 'D'],
[32, 'home', 'D'],
[25, 'duplex', 'A'],
[65, 'condo', 'A']]

df_values = pd.DataFrame(list_values, columns=['Price', 'Type', 'City'])

df_values.groupby(by=['Type', 'City'], as_index=False).agg({'Price': 'mean'})

This outputs:
Type City Price
0 condo A 37.5
1 duplex A 25.0
2 duplex D 24.0
3 home D 32.0

Then you need to rename price as 'AVG' and then left join this dataset back on to your original dataset on Type and City.
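
The rename-and-join step described above can be sketched as follows, together with a `transform` alternative that skips the join. Both use the sample data from the question.

```python
import pandas as pd

df_values = pd.DataFrame(
    [[10, "condo", "A"], [24, "duplex", "D"], [32, "home", "D"],
     [25, "duplex", "A"], [65, "condo", "A"]],
    columns=["Price", "Type", "City"],
)

# Average price per (Type, City), renamed so it does not clash with 'Price'.
avg = (
    df_values.groupby(["Type", "City"], as_index=False)
    .agg({"Price": "mean"})
    .rename(columns={"Price": "AVG"})
)

# Left join the averages back onto the original rows.
merged = df_values.merge(avg, on=["Type", "City"], how="left")

# Alternative: transform broadcasts each group's mean back to its rows directly.
df_values["AVG"] = df_values.groupby(["Type", "City"])["Price"].transform("mean")
print(merged)
```

Assigning a grouped result straight to a new column fails with the "incompatible index" error because the grouped frame has one row per group rather than one per original row. `transform` avoids that by returning a result aligned with the original index.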

tranquil apex
#

ya! i just did that and it worked

#

i used merge

hollow gull
#

I always forget if I need to use merge or join. I think one uses the index and the other columns, but I was being a little loose in my language :/

tranquil apex
#

i got what you meant tho, and it was useful!

#

practically the last piece to my puzzle

alpine bay
#

What determines the default values of an ndarray when created like this ?
a = np.ndarray((height,width,3),dtype=np.uint8)

molten hamlet
#

is there any book for image processing? not for opencv, I know that already, but for some more advanced stuff, like detecting people or creating haar contours

#

What determines the default values of an ndarray when created like this ?
a = np.ndarray((height,width,3),dtype=np.uint8)
@alpine bay I wrote that code and it is random

alpine bay
#

@molten hamlet what do you mean you wrote that code?

arctic wedgeBOT
#

Hey @indigo skiff!

It looks like you tried to attach file type(s) that we do not allow (.pdf). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg, .webm, .webp, .flac, .afdesign, .m4a, .csv.

Feel free to ask in #community-meta if you think this is a mistake.

indigo skiff
#

Hey guys, I need help with an assignment that is due within the next few hours. I wanted to check whether I'm doing it all right. It's an introductory-level master's assignment that asks us to apply DFS, BFS, uniform cost search, best-first search and Algorithm A functions, along with a few more interesting questions. I am not able to attach the assignment. Reading time would be 4-6 minutes; could someone please have a look? Any help would be really appreciated. I am looking for someone I can discuss this with quickly. I am a new member, so please excuse me if I'm not asking this in the right place. Unfortunately, since I am a new member I am not able to use the voice chat function, so would anyone want to volunteer and have a quick discussion please? Thanks again everyone.

molten hamlet
#

@molten hamlet what do you mean you wrote that code?
@alpine bay just print(a) few times and you will see
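
Strictly speaking the values are not random. Calling `np.ndarray(shape, dtype=...)` directly, like `np.empty`, allocates memory without initializing it, so the array shows whatever bytes were already there. A small sketch of the usual alternatives, with arbitrary example dimensions:

```python
import numpy as np

height, width = 4, 5  # arbitrary example dimensions

# Uninitialized: contents are whatever was in memory, so do not rely on them.
uninitialized = np.empty((height, width, 3), dtype=np.uint8)

# Explicitly initialized alternatives.
black = np.zeros((height, width, 3), dtype=np.uint8)      # all 0
white = np.full((height, width, 3), 255, dtype=np.uint8)  # all 255

print(uninitialized.shape, black.sum(), white.min())
```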

spark dirge
#

@indigo skiff create sample problems and code up some stubs. Internet has Bfs and Dfs free for the picking.

hasty grail
#

@spare lotus What are you trying to do?

indigo skiff
#

@spark dirge are you available for quick discussion? please check message

timber pollen
#

deta

undone flare
#

?

verbal jetty
velvet thorn
#

@verbal jetty try encoding='latin1'

verbal jetty
#

Thank you @velvet thorn . but unfortunately the same result

velvet thorn
#

you sure

#

the encoding is correct?

#

can you

#

show

#

all the arguments

#

to pd.read_csv

#

other than filename

verbal jetty
velvet thorn
#

hm.

#

weird

#

what are you opening the left side in?

verbal jetty
#

numbers

velvet thorn
#

huh?

#

I mean, what program

verbal jetty
#

Numbers(MacOS)

#

Equivalent to excel

verbal jetty
#

got it - works with utf-16
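
For anyone hitting the same problem, a self-contained sketch of the fix. The file name and contents are invented; the example writes its own UTF-16 file so that it can run anywhere.

```python
import os
import tempfile

import pandas as pd

# Create a small UTF-16 encoded CSV, similar to what some spreadsheet exports produce.
path = os.path.join(tempfile.mkdtemp(), "export_utf16.csv")
with open(path, "w", encoding="utf-16") as f:
    f.write("name,city\nAda,Oslo\n")

# Passing the matching encoding lets pandas decode the file correctly.
df = pd.read_csv(path, encoding="utf-16")
print(df)
```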

waxen fiber
#

Hello everyone!
i am wondering how to extract validation data from this

self.__Dir_Data = tf.keras.preprocessing.image_dataset_from_directory(self.__Dir_Path ,validation_split = 0.1 ,subset="training", seed = 1,  labels='inferred', label_mode='int' ,batch_size=32 ,image_size=(124, 124))
#

I have following statement, which is getting from specified directory whole Train Data

#

And I store inside Dir Data the Train Data from directory, Is it possible to extract for example 10%-20% of images to separate Validation Data?

#

and it is the same for labels
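
One way this is commonly done, sketched as an untested helper: call the same loader a second time with `subset="validation"`, keeping `validation_split` and `seed` identical so the two subsets do not overlap. The function name `load_train_and_val` is made up; the other arguments mirror the call above. Newer TensorFlow versions also expose the loader as `tf.keras.utils.image_dataset_from_directory`.

```python
def load_train_and_val(dir_path, val_fraction=0.1, seed=1):
    """Hypothetical helper returning (train_ds, val_ds) from one image directory."""
    # Imported here so the sketch can be read and imported without TensorFlow installed.
    import tensorflow as tf

    # Arguments shared by both calls; split and seed must match between them.
    common = dict(
        validation_split=val_fraction,
        seed=seed,
        labels="inferred",
        label_mode="int",
        batch_size=32,
        image_size=(124, 124),
    )
    train_ds = tf.keras.preprocessing.image_dataset_from_directory(
        dir_path, subset="training", **common
    )
    val_ds = tf.keras.preprocessing.image_dataset_from_directory(
        dir_path, subset="validation", **common
    )
    return train_ds, val_ds
```

Each dataset yields (images, labels) batches, so the labels are split together with the images.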

sharp sage
#

hey

#

can someone help me iterate into a list?

#

i think i need to add an if statement to increase right?

#

wait

lapis sequoia
#

Well, looking at the code, I expect it to add the same number 1000 times

#

its never re-evaluated

sharp sage
#

yeah

#

its not called 1000times

#

only the list it

#

is*

#

sorry

#

the range

#

dont i just need to do somthing like

#

data * 1000

#

and add it to the list

#

should be correct?

#

the only thing being its saying array

lapis sequoia
#

better use a help channel for this

#

if you guys have data such as "time, Rates per minute, Rate of penetration, Torque, and Weight"
what type of algorithm do you suggest I use? I was thinking of just plotting the data then being like "when weight spiked at this time, the Rates per minute were increased"

#

I heard doing that is a type of algorithm called "linear regression"

#

you guys got any other suggestions ?

#

Well, plotting alone doesnt have a lot to do with linear regression

#

linear regression is finding the line, that fits the data (blue points) best

#

so if you plot your data and the points are arranged like this, linear regression is probably a good model

#

if it tilts slightly as time goes by

#

i.e. exponential,

#

What do you recommend I use then?

#

well, in most cases you would want to transform your data

#

and do linear regression after that

#

so if it looks exponential, you'd take the log

#

and do linear regression with the transformed values
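
A small sketch of the transform-then-fit idea on synthetic data. The constants 2.0 and 0.3 are arbitrary and chosen only so the recovered values are easy to check.

```python
import numpy as np

# Synthetic exponential data: y = 2.0 * exp(0.3 * t)
t = np.arange(10, dtype=float)
y = 2.0 * np.exp(0.3 * t)

# Taking the log turns it into a straight line: log(y) = log(2.0) + 0.3 * t
slope, intercept = np.polyfit(t, np.log(y), deg=1)

rate = slope               # estimated growth rate
scale = np.exp(intercept)  # estimated multiplicative constant
print(rate, scale)
```

With real, noisy measurements the recovered values are only estimates, and the log transform requires all y values to be positive.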

#

Ohh, I see. Thanks for your explanation

undone flare
#

What library should I use for reading MySQL in pandas?

proven yarrow
#

anyone pls

spark nimbus
#
```
import matplotlib.pyplot as plt

x = [-2, -1, -1, 1, 2, 3, 4]
y = [0, 0, -1, 1, 1, 0, 0]
plt.xlim(-1.5, 3.5)
plt.ylim(-1.5, 1.5)
plt.plot(x, y)
```
and then some labels too somewhere in there
shy mesa
#

how can I remove a row in my dataframe if it contain all NaN value? (using pandas)
I tried this but doesn't work:
.dropna(how='all', axis=0)

stark orchid
lone osprey
#

I have one doubt

#

I have taken a gender classification model

#

I have columns as 'names' and 'gender'

#

But for better training, I trained using columns 'starts by vowel/consonant', 'ends by vowel/consonant', 'long/short size'

#

I trained it using a decision tree classifier

#

And I saved the model

#

Now, I sent the model to someone

#

He only knows that the dataset had 'name' and 'gender' columns

#

So, he calls predict([test['name']])

#

Will it return right answer? I mean, will it return gender?

#

Or does he have to pass it only in the form test['starts with consonant/vowel', 'ends with consonant/vowel', 'short/long word']?

#

Please ping me while saying solution to me

#

And please do provide me a solution

safe tapir
#

Any opinions on Dagster vs. Prefect?

smoky fractal
#

Hello, I am using pandas dataframes with the following short snippet: https://pastebin.pl/view/691d9cc0

I am getting the error

rsi.py:17: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I have tried using the .loc method, but It hasn't worked thus far. it says I'm making a copy, but I'm not sure how or where.

velvet thorn
#

how can I remove a row in my dataframe if it contain all NaN value? (using pandas)
I tried this but doesn't work:
.dropna(how='all', axis=0)
@shy mesa pandas methods create copies

#

they don't modify inplace

#

you need to reassign to the original variable or add inplace=True
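
A quick illustration with made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})

# This returns a new frame; df itself still has the all-NaN row.
df.dropna(how="all", axis=0)
rows_before = len(df)

# Reassigning keeps the result.
df = df.dropna(how="all", axis=0)
print(rows_before, len(df))
```

Only the middle row is removed, because `how="all"` drops a row only when every value in it is NaN.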

#

@smoky fractal you're doing it at the start

#

symbolData = symbolData.tail(bars)

#

which is equivalent to symbolData.iloc[-5:]

#

anyway

#

your code could be improved a lot IMO

smoky fractal
#

ah, how can I isolate only the most recent x rows?

velvet thorn
#

so the simplest solution would be

#

to add a .copy() there

#

symbolData.tail(bars).copy()
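
A sketch of that fix on invented data, since the original snippet is only available via the pastebin link. The `close` column and the `change` calculation are placeholders.

```python
import pandas as pd

# Placeholder price data.
symbol_data = pd.DataFrame({"close": [10.0, 11.0, 10.5, 12.0, 11.5, 13.0]})
bars = 5

# .copy() makes `recent` an independent DataFrame, so adding columns to it
# is no longer flagged as writing to a slice of symbol_data.
recent = symbol_data.tail(bars).copy()
recent["change"] = recent["close"].diff()
print(recent)
```

The original `symbol_data` is left untouched, which is usually what you want when working on just the most recent rows.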