#data-science-and-ml | Python | Page 74

visual sleet Jul 25, 2023, 6:42 AM

#

The language you speak

#

Where are you from

cursive drift Jul 25, 2023, 6:43 AM

#

uh, my native language, and russian

#

are u understanding for my bad english?

visual sleet Jul 25, 2023, 6:43 AM

#

So do you speak Russian

cursive drift Jul 25, 2023, 6:43 AM

#

yeah

visual sleet Jul 25, 2023, 6:43 AM

#

Mm

cursive drift Jul 25, 2023, 6:44 AM

#

what

visual sleet Jul 25, 2023, 6:44 AM

#

Alright

#

Tell you what

#

Find something you like and pursue it

#

Learn it thoroughly, inside and out

cursive drift Jul 25, 2023, 6:46 AM

#

i like code anything

#

but im interested in AI

visual sleet Jul 25, 2023, 6:46 AM

#

I’m not an entrepreneur or a freelancer nor have I made any money from anything online whether that be coding or something else so good luck 🫡

cursive drift Jul 25, 2023, 6:46 AM

#

but i think i cant learn it in 9 months

cursive drift Jul 25, 2023, 6:46 AM

#

visual sleet I’m not an entrepreneur or a freelancer nor have I made any money from anything ...

okay, thx

hollow magnet Jul 25, 2023, 8:53 AM

#

https://discord.com/channels/267624335836053506/1133314395087327232

slim bone Jul 25, 2023, 9:25 AM

#

Hey fellas
I'm about to enter my 2nd year of CS - I have a couple of months before starting the year though. I'm contemplating what to do with this time and my top* choice at the moment is to start learning Pytorch (Specifically, due to academic courses I'll probably be taking later)
Throughout my first year I learned Discrete math, Calc1/2, and Linear1/2 (Unfortunately no probability/statistics yet)
I'm wondering whether or not this knowledge could be useful while learning, or will I need to wait a little while longer before I can utilize what I've learned in math?

Ty in advance to anyone who replies

civic elm Jul 25, 2023, 9:27 AM

#

I realized that's also expensive. With large datasets, how much could it cost when using AWS or Google Cloud services?

civic elm Jul 25, 2023, 9:28 AM

#

slim bone Hey fellas I'm about to enter my 2nd year of CS - I have a couple of months befo...

Statistics and math is always a good career path

slim bone Jul 25, 2023, 9:29 AM

#

civic elm Statistics and math is always a good career path

Agreed, I’m planning on pursuing a masters in Deep Learning. I’m just not sure that’s what I asked 🙂

civic elm Jul 25, 2023, 9:35 AM

#

I would prefer masters in data science because it's more practical but I don't know really

slim bone Jul 25, 2023, 9:41 AM

#

I'm not sure a degree in Data Science is more practical than Deep Learning if I want to specialize in Deep Learning, you could be right though. As mentioned earlier I'm only starting my journey

I'd really like to emphasize however, that I'm asking about whether or not having my background in math could come useful in any significant way while learning Pytorch?* And not about my future academic ventures :)

civic elm Jul 25, 2023, 9:43 AM

#

Yes it's useful because when you want to train models

slim bone Jul 25, 2023, 9:43 AM

#

Could you elaborate how perhaps?

civic elm Jul 25, 2023, 9:44 AM

#

you would then be able to understand the math equations in papers on the model you are using

slim bone Jul 25, 2023, 9:44 AM

#

Oh. Will I necessarily be taught these though, through the common learning sources available?

#

I went through a few tutorials before and other than a very primitive explanation about Linear Transformations I didn't really notice anything "math-y"

#

Obviously the whole field is fundamentally very reliant on math. Perhaps I should rephrase:
How can I utilize my math background (Calc1/2, Linear1/2, Discrete) to learn Pytorch? (Assuming I can utilize it at all)

I think that's a relatively concise question

boreal gale Jul 25, 2023, 9:51 AM

#

it really depends on what do you want out of pytorch..? like what are you hoping to do with it.
"learn pytorch" is maybe a little bit vague, do you have a concrete goal in mind?

civic elm Jul 25, 2023, 9:53 AM

#

You mean In concrete examples? Like you would instantly know what a linear regression would do to your dataset

boreal gale Jul 25, 2023, 9:54 AM

#

my message was a question for @slim bone in case you misunderstood.

slim bone Jul 25, 2023, 9:54 AM

#

boreal gale it really depends on what do you want out of pytorch..? like what are you hoping...

Ah, you’re right. My apologies
Hopefully Deep Learning? Admittedly I’m not sure what my options are.

#

Machine Learning in general sounds really cool, too

#

I’d imagine there’s subsets for those disciplines as well, I’m sorry if my answers aren’t useful

civic elm Jul 25, 2023, 9:55 AM

#

What I mean by practicality is that I see more job openings for a master's in data science than for a master's in DL

slim bone Jul 25, 2023, 9:56 AM

#

Ah, but if I just wanted a job I’d just stick to Fullstack lol

boreal gale Jul 25, 2023, 9:56 AM

#

right, to learn how to use pytorch requires some linear algebra.
to understand how a model works properly you need a metric ton of linear algebra and calc

civic elm Jul 25, 2023, 9:57 AM

#

slim bone Ah, but if I just wanted a job I’d just stick to Fullstack lol

not what I meant

slim bone Jul 25, 2023, 9:57 AM

#

boreal gale right, to learn how to use pytorch requires some linear algebra. to understand h...

Oh snap. Could you elaborate? I’m assuming my background doesn’t qualify as “Metric Ton” - what other academic courses should I take?

slim bone Jul 25, 2023, 9:57 AM

#

civic elm not what I meant

Oh, sorry

civic elm Jul 25, 2023, 9:57 AM

#

I mean you can practice your education in the real world

slim bone Jul 25, 2023, 9:59 AM

#

Ah perhaps I was unclear - I’m not looking to just utilize my knowledge. I’m hoping to utilize it about something I’m passionate about

boreal gale Jul 25, 2023, 9:59 AM

#

slim bone Oh snap. Could you elaborate? I’m assuming my background doesn’t qualify as “Met...

i assume you are based in the US?
i don't know what is calc1/2 and linear1/2, and i assume that's a standard thing in the US?
you potentially already have the necessary knowledge and just need to fill in minor gaps by reading papers

slim bone Jul 25, 2023, 10:02 AM

#

boreal gale i assume you are based in the US? i don't know what is calc1/2 and linear1/2, an...

Ah, no I'm not based in the US at all
Calculus 1 and Calculus 2 are basically the courses that teach you about Limits, Derivatives, Integrals, Taylor series(es?), multivariable calculus, function series, etc'..
Linear Algebra 1/2 mostly teach you about the fundamentals of Linear Algebra, transforming a matrix into a diagonal one, checking if that's possible, Tensors, Bilinear forms, and the superset for diagonal matrices whose name I can't remember (The one that's built out of eigenvalues, with 1's across the secondary diagonal if that makes sense)

#

It's obviously a little hard to compress a years-worth of knowledge into a concise paragraph but I hope I managed to get the message across

civic elm Jul 25, 2023, 10:03 AM

#

slim bone Ah perhaps I was unclear - I’m not looking to just utilize my knowledge. I’m hop...

I got you.. If I would have all the resources I would even get a phd in DL

slim bone Jul 25, 2023, 10:03 AM

#

Oh and I don't know if this small detail is relevant but I think I technically learned Real Analysis and not Calculus

boreal gale Jul 25, 2023, 10:03 AM

#

slim bone Ah, no I'm not based in the US at all Calculus 1 and Calculus 2 are basically th...

oh right, that's already a plenty good start.

slim bone Jul 25, 2023, 10:03 AM

#

So, more proof-based I suppose.

slim bone Jul 25, 2023, 10:04 AM

#

civic elm I got you.. If I would have all the resources I would even get a phd in DL

I'm definitely considering a PhD but that's obviously so far ahead of me haha

short path Jul 25, 2023, 10:05 AM

#

Guys, I'm trying to install pandoc to turn rmarkdown into pdf but the .msi file isn't running in my windows

#

do you know how could I install it?

slim bone Jul 25, 2023, 10:05 AM

#

boreal gale oh right, that's already a plenty good start.

Right, so I'm wondering if I can use this knowledge somehow that's relevant to what I want to do in the future

#

Kind of dip my toes in the water, if that makes sense

short path Jul 25, 2023, 10:06 AM

#

And Jupyter says I need to install it

slim bone Jul 25, 2023, 10:06 AM

#

Because as mentioned earlier - the tutorials I've found don't really dive into the math-side of things

boreal gale Jul 25, 2023, 10:06 AM

#

slim bone Right, so I'm wondering if I can use this knowledge somehow that's relevant to w...

i would just pick a paper that you feel interested in (that's related to NN), look up a reference implementation and see how people did it, think about why.
repeat with another paper but try not to look at the reference implementation and replicate the paper's result and see if you enjoy this

also you can try coming up with extension to their models/apply it to a new problem

short path Jul 25, 2023, 10:07 AM

#

@slim bone vc é brasileiro, né? ("you're Brazilian, right?")

slim bone Jul 25, 2023, 10:08 AM

#

boreal gale i would just pick a paper that you feel interested in (that's related to NN), lo...

So, basically if I'm understanding you correctly:

1) Learn about the fundamentals of Pytorch
2) Try to look up some fundamental papers related to DL (Not sure how to find em', but I'll figure it out later I suppose)
3) Try to understand the papers, and replicate the results to see if I understand
4) Recurse step 2?

#

Also err, what's NN?

slim bone Jul 25, 2023, 10:08 AM

#

short path @slim bone vc é brasileiro, né?

I'm not sure what this means, my apologies!

short path Jul 25, 2023, 10:09 AM

#

Oh, my bad

#

I thought you were from my country

slim bone Jul 25, 2023, 10:09 AM

#

All good haha

civic elm Jul 25, 2023, 10:10 AM

#

slim bone So, basically if I'm understanding you correctly: 1) Learn about the fundamental...

that roadmap looks like a top to bottom approach, I would go bottom up meaning not using pytorch only numpy first
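For illustration only (this was not posted in the thread), a NumPy-first exercise could be as small as fitting a linear model by gradient descent, which uses exactly the linear algebra (matrix products) and calculus (gradients) mentioned above. The data, learning rate, and variable names are all made up for this sketch.

```python
import numpy as np

# Hypothetical noise-free data: y = X @ true_w + true_b
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_w = np.array([2.0, -3.0])
true_b = 0.5
y = X @ true_w + true_b

# Start from zero and follow the negative gradient of the mean squared error.
w = np.zeros(2)
b = 0.0
lr = 0.1
for _ in range(500):
    err = X @ w + b - y                 # residuals (linear algebra)
    grad_w = 2 * X.T @ err / len(y)     # d(MSE)/dw (calculus)
    grad_b = 2 * err.mean()             # d(MSE)/db
    w = w - lr * grad_w
    b = b - lr * grad_b

print(w, b)
```

PyTorch automates the gradient lines via autograd, so writing them by hand once is one way to see where the math background comes in.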

boreal gale Jul 25, 2023, 10:10 AM

#

slim bone So, basically if I'm understanding you correctly: 1) Learn about the fundamental...

NN = neural network.

slim bone Jul 25, 2023, 10:10 AM

#

civic elm that roadmap looks like a top to bottom approach, I would go bottom up meaning n...

I'm not entirely sure what this means
Does Pytorch utilize Numpy as a dependency?

boreal gale Jul 25, 2023, 10:10 AM

#

civic elm that roadmap looks like a top to bottom approach, I would go bottom up meaning n...

that's also valid!

civic elm Jul 25, 2023, 10:10 AM

#

slim bone I'm not entirely sure what this means Does Pytorch utilize Numpy as a dependency...

yes

slim bone Jul 25, 2023, 10:11 AM

#

boreal gale NN = neural network.

Ah, silly me.
Does my approach look good in theory though?
Perhaps indeed, learn some Numpy together with step 1

boreal gale Jul 25, 2023, 10:12 AM

#

slim bone Ah, silly me. Does my approach look good in theory though? Perhaps indeed, learn ...

yeah, being familiar with numpy will help you a lot in the long run. that's a good shout.

tall tulip Jul 25, 2023, 10:13 AM

#

My dataset contains approx. 21k values, recorded every 5 minutes, but 764 of them are not on a 5-minute interval. So I'm trying to resample the non-5-minute timestamps to a 5-minute interval using `resample`. I have tried the following code:

```py
import pandas as pd

df_raw['Time'] = pd.to_datetime(df_raw['Time'])
df_raw.drop_duplicates(subset='Time', inplace=True)
df_raw.set_index('Time', inplace=True)
df_freq = df_raw.resample('10T').ffill()
# df_freq = df_raw.resample('5min').interpolate(method='polynomial', order=1)
# df_asfreq = df.asfreq('5T')
# df_resampled = df_raw.resample('5T', on='Time').asfreq()
# df_freq = df_raw.resample('5min').sum()
```
The issues are:
1. It creates rows for all 12 months, even though my dataset only contains data from Feb to April
2. It reduces my dataset from 21k values to 9k values

boreal gale Jul 25, 2023, 10:13 AM

#

tall tulip My dataset contains 21k values approx, the dataset values are recorded after eve...

got some example data we can play with?

slim bone Jul 25, 2023, 10:13 AM

#

Great. Better get to it then - Too much future planning can be harmful at times I suppose
Thanks a lot you two! @boreal gale@civic elm

tall tulip Jul 25, 2023, 10:15 AM

#

@boreal gale Okay let me make a sample data for you

short path Jul 25, 2023, 10:15 AM

#

Jupyter isn't getting the characters in a table right. Is there a way to allow it to show the right characters?

#

It should be like that:

#

In RStudio ^^

boreal gale Jul 25, 2023, 10:16 AM

#

short path Jupyter isn't getting the characters in a table right. Is there a way to allow i...

post example data for us to test 😄

short path Jul 25, 2023, 10:16 AM

#

I just loaded the table from a book I'm using to study

#

load(url("http://ime.usp.br/~pam/dados.RData"))
tab2_1

#

So just these two lines ^^

#

the problem is that it should show "médio" instead of "mÃ©dio"

#

Since in RStudio it works fine, the problem must be with the encoding in jupyter
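The symptom is consistent with UTF-8 bytes being read as Latin-1 (or Windows-1252). A minimal Python sketch of that effect, offered as an illustration rather than a diagnosis of the R kernel setup in this thread:

```python
original = "médio"

# Encode to UTF-8 bytes, then (wrongly) decode those bytes as Latin-1.
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)  # mÃ©dio

# Undoing the wrong decoding step recovers the original text.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # médio
```

If that is what is happening, telling the reader which encoding the file uses is usually the fix, rather than changing anything in Jupyter itself.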

#

@boreal gale I guess you won't be able to open it there because you would need an R kernel

#

but do you have an idea how to change the jupyter encoding?

#

to allow more characters

tall tulip Jul 25, 2023, 10:21 AM

#

@boreal gale can I upload sample dataset here?

boreal gale Jul 25, 2023, 10:21 AM

#

tall tulip @boreal gale can I upload sample dataset here?

!paste

arctic wedgeBOT Jul 25, 2023, 10:21 AM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

boreal gale Jul 25, 2023, 10:22 AM

#

post a dump (e.g. `dataframe.to_dict()`) there if possible

boreal gale Jul 25, 2023, 10:22 AM

#

short path @boreal gale I guess you won't be able to open it there because you wou...

yeah i don't have it at hand in this PC.

tall tulip Jul 25, 2023, 10:23 AM

#

I just make a csv file with 270 values, it's not that big

tall tulip Jul 25, 2023, 10:23 AM

#

boreal gale post a dump (e.g. `dataframe.to_dict()`) there if possible

Ok doing it

boreal gale Jul 25, 2023, 10:24 AM

#

short path to allow more characters

seems fine to me 🤷 that's the default

short path Jul 25, 2023, 10:25 AM

#

Let me try to do the table manually then

#

to see if that works

boreal gale Jul 25, 2023, 10:25 AM

#

boreal gale seems fine to me 🤷 that's the default

just in case

```py
import io
import pandas as pd

pd.read_csv(io.StringIO("""
l'accent aigu (acute accent) – é
l'accent grave (grave accent) – à, è, ù
la cédille (cedilla) – ç
l'accent circonflexe (circumflex) – â, ê, î, ô, û
l'accent tréma (trema) – ë, ï, ü
"""), sep='–')
```

short path Jul 25, 2023, 10:29 AM

#

@boreal gale it shows this

#

but now I don't know if the problem is really with the character or with the way I used the function

boreal gale Jul 25, 2023, 10:31 AM

#

ah! sorry i was confused, you are using R throughout and not python/pandas.

short path Jul 25, 2023, 10:31 AM

#

yeah

#

but I'm trying to adapt it

#

the code

boreal gale Jul 25, 2023, 10:32 AM

#

it might be an issue with whatever dataframe library you are using, i will have to try it myself in a bit

short path Jul 25, 2023, 10:33 AM

#

Ok. thank you

#

that's the data:

#

N;estado_civil;grau_instrucao;n_filhos;salario;idade_anos;idade_meses;reg_procedencia
1;solteiro;ensino fundamental;;4,00;26;3;interior
2;casado;ensino fundamental;1;4,56;32;10;capital
3;casado;ensino fundamental;2;5,25;36;5;capital
4;solteiro;ensino médio;;5,73;20;10;outra
5;solteiro;ensino fundamental;;6,26;40;7;outra
6;casado;ensino fundamental;0;6,66;28;0;interior
7;solteiro;ensino fundamental;;6,86;41;0;interior
8;solteiro;ensino fundamental;;7,39;43;4;capital
9;casado;ensino médio;1;7,59;34;10;capital
10;solteiro;ensino médio;;7,44;23;6;outra
11;casado;ensino médio;2;8,12;33;6;interior
12;solteiro;ensino fundamental;;8,46;27;11;capital
13;solteiro;ensino médio;;8,74;37;5;outra
14;casado;ensino fundamental;3;8,95;44;2;outra
15;casado;ensino médio;0;9,13;30;5;interior
16;solteiro;ensino médio;;9,35;38;8;outra
17;casado;ensino médio;1;9,77;31;7;capital
18;casado;ensino fundamental;2;9,80;39;7;outra
19;solteiro;superior;;10,53;25;8;interior
20;solteiro;ensino médio;;10,76;37;4;interior
21;casado;ensino médio;1;11,06;30;9;outra
22;solteiro;ensino médio;;11,59;34;2;capital
23;solteiro;ensino fundamental;;12,00;41;0;outra
24;casado;superior;0;12,79;26;1;outra
25;casado;ensino médio;2;13,23;32;5;interior
26;casado;ensino médio;2;13,60;35;0;outra
27;solteiro;ensino fundamental;;13,85;46;7;outra
28;casado;ensino médio;0;14,69;29;8;interior
29;casado;ensino médio;5;14,71;40;6;interior
30;casado;ensino médio;2;15,99;35;10;capital
31;solteiro;superior;;16,22;31;5;outra
32;casado;ensino médio;1;16,61;36;4;interior
33;casado;superior;3;17,26;43;7;capital
34;solteiro;superior;;18,75;33;7;capital
35;casado;ensino médio;2;19,40;48;11;capital
36;casado;superior;3;23,30;42;2;interior

#

and the codeline:

#

tab2_1<-read.table("tabela2_1.csv", dec=",", sep=";",h=T)

boreal gale Jul 25, 2023, 10:35 AM

#

perfect

#

it probably is due to the R dataframe library, doing it in pandas seems fine to me

short path Jul 25, 2023, 10:37 AM

#

boreal gale it probably is due to the R dataframe library, doing it in pandas seems fine to ...

but it may be something with the jupyter as well

#

because it works fine in RStudio

boreal gale Jul 25, 2023, 10:37 AM

#

boreal gale it probably is due to the R dataframe library, doing it in pandas seems fine to ...

sorry that's not true. can't replicate in R kernel

short path Jul 25, 2023, 10:38 AM

#

with the data you send me ^^

boreal gale Jul 25, 2023, 10:39 AM

#

wot <- ' â, ê, î, ô, û'
wot

how about this

short path Jul 25, 2023, 10:39 AM

#

let me see

#

tall tulip Jul 25, 2023, 10:40 AM

#

@boreal gale https://paste.pythondiscord.com/IHGQ
The discord suggest me this link

boreal gale Jul 25, 2023, 10:40 AM

#

short path

sorry i meant run that as a cell.

short path Jul 25, 2023, 10:40 AM

#

oooh

#

Now I'm puzzled

boreal gale Jul 25, 2023, 10:41 AM

#

okay perfect

tall tulip Jul 25, 2023, 10:42 AM

#

@boreal gale #data-science-and-ml message here is the link to my question.
And
21055   0 days 00:10:00
21056   0 days 00:00:00
21063   0 days 01:40:00
21109   0 days 00:10:00
21115   0 days 00:10:00
These are the time steps which are not 5-min

boreal gale Jul 25, 2023, 10:42 AM

#

short path

try specifying the encoding argument?

short path Jul 25, 2023, 10:43 AM

#

ô

#

it worked

#

boreal gale Jul 25, 2023, 10:44 AM

#

tall tulip @boreal gale https://discord.com/channels/267624335836053506/3666732478...

👍 give me a moment to parse your question again..

tall tulip Jul 25, 2023, 10:45 AM

#

Sure

short path Jul 25, 2023, 10:45 AM

#

@boreal gale do you know if there's a way to do something like that?

#

for cases in which I'm not doing the "read.table" to create the dataset

boreal gale Jul 25, 2023, 10:53 AM

#

tall tulip @boreal gale https://discord.com/channels/267624335836053506/3666732478...

#

did i get your requirement right?

tall tulip Jul 25, 2023, 10:57 AM

#

If you check the index 21055 and 21056 There time are duplicated I want to remove duplicates, and if you check the index 21062 and 21063 the time difference is above 1 hour I want to make all the times are 5-min interval

#

@boreal gale see the difference

boreal gale Jul 25, 2023, 11:01 AM

#

tall tulip @boreal gale see the difference

i get "If you check the index 21055 and 21056 There time are duplicated I want to remove duplicates"
and i don't get "and if you check the index 21062 and 21063 the time difference is above 1 hour I want to make all the times are 5-min interval"
it's not clear enough what you want yet.
do you want to "insert" more entries in-between, at 5 minute interval, using the previous seen temperature readings?

boreal gale Jul 25, 2023, 11:03 AM

#

short path for cases in which I'm not doing the "read.table" to create the dataset

not sure, my R-using days are way behind me.

short path Jul 25, 2023, 11:03 AM

#

boreal gale not sure, my R-using days are way behind me.

Have you learned R before Python?

boreal gale Jul 25, 2023, 11:04 AM

#

short path Have you learned R before Python?

yep

tall tulip Jul 25, 2023, 11:04 AM

#

"do you want to "insert" more entries in-between, at 5 minute interval, using the previous seen temperature readings?"
I just want to make my complete dataset to 5-min interval, there are 764 interval which are not 5-min.

boreal gale Jul 25, 2023, 11:05 AM

#

tall tulip "do you want to "insert" more entries in-between, at 5 minute interval, using th...

still don't get it sorry.
type out the example output if you see rows 21060 to 21064 please

short path Jul 25, 2023, 11:06 AM

#

boreal gale yep

Why did you stop using R so much? Is Python that much more effective? I'm curious because I'm at the beginning of my major in Statistics and I wanted to learn Python, but all my professors use R

#

Don't you want to use R for the data visualization at least?

short path Jul 25, 2023, 11:07 AM

#

short path Don't you want to use R for the data visualization at least?

to complement what you can do with Python

tall tulip Jul 25, 2023, 11:09 AM

#

Okay let me give you an example: if you see the time column at index 21062 and 21063, at index 21062 the time is 27/04/2023 20:55, but at index 21063 it jumps almost two hours to 27/04/2023 22:35. It needs to be 27/04/2023 21:00, not 27/04/2023 22:35.

#

@boreal gale

boreal gale Jul 25, 2023, 11:09 AM

#

short path Why did you stop using R so much? Is Python that much more effective? I'm curiou...

> Why did you stop using R so much?
because i stop being a statistician.
> I'm curious because I'm at the beginning of my major in Statistics and I wanted to learn Python, but all my professors use R
the statistics support in R is much better (not sure if that's still the case today; python has come a long way, especially in time series modelling, which was the one thing that was mega awesome in R back in the day). i would stick to R if you are more productive in it. but for job prospects.. learning python is probably just an eventuality, might as well get started now 😛
> Don't you want to use R for the data visualization at least?
not really, matplotlib, seaborn, bokeh/plotly are plenty for my needs.

boreal gale Jul 25, 2023, 11:11 AM

#

tall tulip Okay let me give you an example: if you see the time column at index 21062 and 2...

okay, still missing a crucial bit of information.

given

27/04/2023 20:55
27/04/2023 21:15

what is the output?

27/04/2023 20:55
27/04/2023 21:00

OR

27/04/2023 20:55
27/04/2023 21:00
27/04/2023 21:05
27/04/2023 21:10
27/04/2023 21:15

OR something else?

tall tulip Jul 25, 2023, 11:12 AM

#

I want this output:

```
27/04/2023 20:55
27/04/2023 21:00
27/04/2023 21:05
27/04/2023 21:10
27/04/2023 21:15
```

short path Jul 25, 2023, 11:12 AM

#

boreal gale > Why did you stop using R so much? because i stop being a statistician. > I'm c...

Would you recommend learning both python and R at the same time, focus on one first and then the other or it would be better for me to just focus in Python and do just the necessary for what my professors ask me to do in R?

boreal gale Jul 25, 2023, 11:13 AM

#

tall tulip I want this output: ```27/04/2023 20:55 27/04/2023 21:00 27/04/2023 21:05 27/04/...

perfect. that's what i meant by inserting rows inbetween.

boreal gale Jul 25, 2023, 11:15 AM

#

tall tulip I want this output: ```27/04/2023 20:55 27/04/2023 21:00 27/04/2023 21:05 27/04/...

have you considered

1) deduplicating your dataset first (by aggregation or whatever you want)
2) generate the min-max timestamp range at 5 minute interval
3) set time as index if you haven't
4) `pd.DataFrame.reindex` with 2)

?

boreal gale Jul 25, 2023, 11:16 AM

#

short path Would you recommend learning both python and R at the same time, focus on one fi...

hmmm.. that's a good question. one that i don't have an answer to.

short path Jul 25, 2023, 11:17 AM

#

Do you come across data science books written in R? @boreal gale

boreal gale Jul 25, 2023, 11:18 AM

#

not really! (i don't read much 😦 )

short path Jul 25, 2023, 11:19 AM

#

Do you prefer to learn in a top-down approach?

#

by getting projects first and then learning what you have to know to complete it

#

practicing a lot

boreal gale Jul 25, 2023, 11:20 AM

#

probably yes. i am the kind of person who sometimes disregard docs and actually look at source code of libraries i am using..

boreal gale Jul 25, 2023, 11:21 AM

#

tall tulip I want this output: ```27/04/2023 20:55 27/04/2023 21:00 27/04/2023 21:05 27/04/...

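```py
df_raw = df.copy()
df_raw['Time'] = pd.to_datetime(df_raw['Time'])
# drop repeated timestamps, then index by time
df_raw.drop_duplicates(subset='Time', inplace=True)
df_raw.set_index('Time', inplace=True)
# build the full 5-minute grid from first to last timestamp and forward-fill the gaps
df_raw.reindex(pd.date_range(df_raw.index.min(), df_raw.index.max(), freq='5T')).ffill()
```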

something like this i mean (#data-science-and-ml message)
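A self-contained sketch of the same dedupe, reindex, forward-fill approach on made-up data, so it can be run without the original dataset. The `Temp` column, its values, and the explicit day-first `format` are assumptions for the example; `"5min"` is used because newer pandas releases prefer it over the `"5T"` alias.

```python
import pandas as pd

# Hypothetical sample: one duplicated timestamp and one 20-minute gap.
df = pd.DataFrame({
    "Time": ["27/04/2023 20:50", "27/04/2023 20:55",
             "27/04/2023 20:55", "27/04/2023 21:15"],
    "Temp": [20.1, 20.3, 20.3, 21.0],
})

# Parse day-first strings explicitly so day and month cannot be swapped.
df["Time"] = pd.to_datetime(df["Time"], format="%d/%m/%Y %H:%M")
df = df.drop_duplicates(subset="Time").set_index("Time")

# Full 5-minute grid from the first to the last timestamp.
full_index = pd.date_range(df.index.min(), df.index.max(), freq="5min")

# Rows missing from the grid are created as NaN, then filled from the previous reading.
df_5min = df.reindex(full_index).ffill()
print(df_5min)
```

One caveat of this approach: `reindex` keeps only original rows whose timestamps fall exactly on the grid, so readings taken at off-grid times (say 21:02) would be dropped rather than shifted.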

short path Jul 25, 2023, 11:22 AM

#

short path Do you come across data science books written in R? @boreal gale

and @boreal gale did some job demand you to use R because it was the language used there before?

#

these are my two fears for not learning R

boreal gale Jul 25, 2023, 11:23 AM

#

short path and @boreal gale did some job demand you to use R because it was the la...

i didn't even go look for those job so no.
after i graduated, i became a data scientist using python, eventually just an engineer that uses python mostly
i've probably used R 1-2 times in my career thus far..

short path Jul 25, 2023, 11:23 AM

#

short path and @boreal gale did some job demand you to use R because it was the la...

or because of your coworkers

short path Jul 25, 2023, 11:24 AM

#

boreal gale i didn't even go look for those job so no. after i graduated, i became a data s...

I get it

#

Do you use kaggle?

#

I want to get to be able to do some projects there

#

and be kinda competitive

#

and Python seems to be way more effective than R for that

boreal gale Jul 25, 2023, 11:25 AM

#

ah it's important to note i am no longer a data scientist 😂
(but to answer your question, i tried, but i had better things to do, so actually kinda no.)

short path Jul 25, 2023, 11:26 AM

#

What do you work with now?

boreal gale Jul 25, 2023, 11:26 AM

#

this is going offtopic 😛 catch me in one of the off topic channel 😉

short path Jul 25, 2023, 11:27 AM

#

Ok. I'm curious just because there's been some talk about the field of data science risking to be way smaller

#

because of the new tools

#

and a possible market saturation

tall tulip Jul 25, 2023, 11:28 AM

#

boreal gale ```py df_raw = df.copy() df_raw['Time'] = pd.to_datetime(df_raw['Time']) df_raw....

Thank you man It works It resolve both the issues. @boreal gale

short path Jul 25, 2023, 11:28 AM

#

short path Ok. I'm curious just because there's been some talk about the field of data scie...

So I wonder if that made you move to other field

cosmic dew Jul 25, 2023, 1:12 PM

#

Hi guys, I started to study Python, is there any website that you recommend me to practice?

tidal bough Jul 25, 2023, 1:12 PM

#

codewars, I guess

oblique quarry Jul 25, 2023, 1:26 PM

#

Guys I posted a question in the python help channel would be much appreciated if someone would take the time to take a look at it

misty flint Jul 25, 2023, 2:19 PM

#

huggingface has a great API

#

for ML

#

saves a lot of time for stuff

late ruin Jul 25, 2023, 2:23 PM

#

Hi I hope someone could help me out, I have data in a file named [MESTP].JZF from what I've searched around I found nothing about this sort of file extension, but there is data in there. would love some help: how could I read that kind of file using pandas/pickle to tabulate it in Jupyter?

left tartan Jul 25, 2023, 3:26 PM

#

late ruin Hi I hope someone could help me out, I have data in a file named [MESTP].JZF fro...

Perhaps it’s a common file type, but just with the wrong extension. You can check the first few bytes to see if it has a known magic number. You can use `file xyz` in Linux, which does this for you. You could also try opening it in a compression utility like 7zip that can inspect the metadata.
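A rough Python sketch of the magic-number check, for cases where the `file` command is not available. The helper name `sniff_file_type` and the short signature table are made up for illustration; the table covers only a handful of common formats, so a `None` result does not mean the file is unreadable.

```python
# A few well-known file signatures (far from exhaustive).
MAGIC_NUMBERS = {
    b"PK\x03\x04": "ZIP archive (also .xlsx, .docx, .jar)",
    b"\x1f\x8b": "gzip",
    b"7z\xbc\xaf\x27\x1c": "7-Zip archive",
    b"%PDF": "PDF",
    b"SQLite format 3\x00": "SQLite database",
    b"\x89HDF\r\n\x1a\n": "HDF5",
}


def sniff_file_type(path):
    """Guess a file type from its leading bytes; return None if unknown."""
    with open(path, "rb") as f:
        head = f.read(16)  # the longest signature above is 16 bytes
    for signature, name in MAGIC_NUMBERS.items():
        if head.startswith(signature):
            return name
    return None
```

Calling `sniff_file_type("[MESTP].JZF")` would then return one of the names above or `None`; once the real format is known, the matching pandas reader (or a decompression step first) can be chosen.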

worn plank Jul 25, 2023, 4:39 PM

#

im tryna help my gf with her hw and she's tasked with finding the center and radius of a circle, and im following this video but it gets to one point that confuses both of us. why would (x^2+4x+4)+(y^2+10y+25) simplify to (x+2)^2+(y+5)^2? where does the 4x and 10y go??

small wedge Jul 25, 2023, 4:43 PM

#

worn plank im tryna help my gf with her hw and she's tasked with finding the center and rad...

Because if you expand the expressions, e.g. (x + 2) * (x + 2) == x*x + 2*x + x*2 + 2*2, the cross terms (the left 2 times the right x, and the right 2 times the left x) give you two terms of 2x, which combine to make 4x

#

Same with the y's
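Written out term by term, the expansion described above is:

```latex
\begin{aligned}
(x+2)^2 &= (x+2)(x+2) = x^2 + 2x + 2x + 4 = x^2 + 4x + 4,\\
(y+5)^2 &= (y+5)(y+5) = y^2 + 5y + 5y + 25 = y^2 + 10y + 25.
\end{aligned}
```

So the 4x and 10y have not gone anywhere; they are the doubled cross terms inside the squares. Assuming the right-hand side of the homework equation ends up as r^2, the form (x+2)^2 + (y+5)^2 = r^2 gives center (-2, -5) and radius r.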

teal mesa Jul 25, 2023, 4:58 PM

#

The function x^2+4x+4 has a double root at x = -2, which means you can write x^2+4x+4=(x+2)^2

cosmic dew Jul 25, 2023, 4:59 PM

#

tidal bough codewars, I guess

thanks

quiet seal Jul 25, 2023, 6:43 PM

#

you know what would be nifty

#

I've summarized categories of stuff with Pandas and drawn spider charts with plotly before

#

but it never occurred to me to have each category be a slice of a pie chart, and build out those slices from layers

#

so if you're doing a CMM with ratings from non-existent to 1-5, you have 5 colors (or desaturate as you go out from greatest to least maturity, and color each grouping of capabilities) and then just grayed out at the outside

#

but I don't think that kind of charting exists in python yet?

#

it's one hell of a lot of information in one place. 3 dimensional: group (pie slice), categories within the group (bands), portion of the group in each category (thickness of the bands)
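For what it's worth, stacked bars on polar axes in matplotlib get fairly close to this idea. The sketch below is only a guess at the chart being described (the original image is not available), and the group names, maturity shares, and colors are all invented.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: per capability group, the share of capabilities
# at maturity levels 1..5 (each row sums to 1).
groups = ["Govern", "Build", "Operate", "Secure"]
shares = np.array([
    [0.1, 0.2, 0.3, 0.3, 0.1],
    [0.3, 0.3, 0.2, 0.1, 0.1],
    [0.2, 0.2, 0.2, 0.2, 0.2],
    [0.5, 0.2, 0.1, 0.1, 0.1],
])
level_colors = ["#d73027", "#fc8d59", "#fee08b", "#91cf60", "#1a9850"]

# One angular slice per group.
slice_width = 2 * np.pi / len(groups)
thetas = np.arange(len(groups)) * slice_width

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
bottoms = np.zeros(len(groups))
for level in range(shares.shape[1]):
    # Each maturity level is a radial band; its thickness is the share.
    ax.bar(thetas, shares[:, level], width=slice_width * 0.95,
           bottom=bottoms, color=level_colors[level],
           align="edge", label=f"level {level + 1}")
    bottoms = bottoms + shares[:, level]

ax.set_xticks(thetas + slice_width / 2)
ax.set_xticklabels(groups)
ax.set_yticklabels([])
ax.legend(loc="upper right", bbox_to_anchor=(1.3, 1.1))
```

Saving with `fig.savefig(...)` or calling `plt.show()` under an interactive backend would display the result; a grayed-out outer band for "non-existent" could be added as one more level.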

tidal bough Jul 25, 2023, 8:16 PM

#

In your actual example, do you really have as many different functions as you have elements?

#

Because if so, not sure anything can be done about it.

#

Well, these functions are all just python code, so one way or another they all need to be executed on the corresponding elements and that'll take the bulk of the time. Not sure if anything can be done.

ripe forge Jul 25, 2023, 8:47 PM

#

Vectorization assumes the same operation on multiple data points. If all you are doing are running completely independent functions on independent values, then you can't really do much this way. Options would be either just running it all in parallel (and it would depend on actual task whether you get speedups or not) or rewrite your functions somehow first if that's an option. Make them all the same or make them more efficient.
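A small sketch of the distinction, using made-up values and an arbitrary set of functions: one operation over all elements is a single vectorized call, while a different function per element still has to be dispatched one call at a time.

```python
import numpy as np

values = np.array([1.0, 4.0, 9.0, 16.0])

# Same operation on every element: a single vectorized call.
vectorized = np.sqrt(values)

# A different (hypothetical) function per element: NumPy cannot batch
# this, so each function is called individually from Python.
funcs = [np.sqrt, np.log, np.square, np.negative]
per_element = np.array([f(v) for f, v in zip(funcs, values)])

print(vectorized)
print(per_element)
```

If many elements share the same function, grouping them by function and vectorizing within each group is one way to "make them all the same"; otherwise something like `concurrent.futures` can spread the calls across workers, with the speedup depending on the task.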

obsidian otter Jul 25, 2023, 9:20 PM

#

yoo

#

i want to build an ai, if i manage to do everything where do i train that ting ?

rapid temple Jul 25, 2023, 9:29 PM

#

what is the default model used for OpenAI and ChatOpenAI classes when no model is specified?

#

is it text-davinci-003 and gpt-3.5-turbo by default?

young granite Jul 25, 2023, 9:43 PM

#

quiet seal you know what would be nifty

one avoids pie charts at all cost

tepid tartan Jul 25, 2023, 11:12 PM

#

Is it better to become a data analyst and then slowly transition to data science?

quiet seal Jul 25, 2023, 11:17 PM

#

young granite one avoids pie charts at all cost

spider charts are useful but lower information than this. Plus it's not really a pie chart when the data is along the radius rather than represented by the arc length

serene scaffold Jul 25, 2023, 11:27 PM

#

tepid tartan It is better to become data analytics and then slowly transition to data science...

Those terms aren't really used precisely and consistently in industry. What education and experience do you have currently, and where are you trying to get?

rugged rapids Jul 25, 2023, 11:55 PM

#

tepid tartan It is better to become data analytics and then slowly transition to data science...

doesnt matter

lapis sequoia Jul 26, 2023, 12:21 AM

#

What's the best way to get started with AI's?

#

Any recommended tutorials?

#

Recommend programs or websites

#

I want to make a discord bot that talks in chat like me

tepid tartan Jul 26, 2023, 12:22 AM

#

serene scaffold Those terms aren't really used precisely and consistently in industry. What educ...

I got none, I haven’t taken computer classes yet beside sql. Plan on doing the khan stats prop/ linear math when my summer semester done

serene scaffold Jul 26, 2023, 12:23 AM

#

tepid tartan I got none, I haven’t taken computer classes yet beside sql. Plan on doing the k...

so you're pursuing a degree currently? what degree?

serene scaffold Jul 26, 2023, 12:24 AM

#

lapis sequoia I want to make a discord bot that talks in chat like me

that is not a good first project. you will run out of motivation before you feel any reward.

#

unless you're okay with the responses not sounding natural.

lapis sequoia Jul 26, 2023, 12:24 AM

#

Hmm, what do you recommend?

serene scaffold Jul 26, 2023, 12:25 AM

#

!resources data science

arctic wedgeBOT Jul 26, 2023, 12:25 AM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

serene scaffold Jul 26, 2023, 12:25 AM

#

I would follow along with a book or course, and then try developing basic models, even if you don't have a use case in mind for them.

lapis sequoia Jul 26, 2023, 12:28 AM

#

I want to do something AI related..

rugged rapids Jul 26, 2023, 12:30 AM

#

arctic wedge

there are no projects from these resources

rugged rapids Jul 26, 2023, 12:30 AM

#

lapis sequoia I want to do something AI related..

do whatever you want man its coding

#

have fun

serene scaffold Jul 26, 2023, 12:35 AM

#

lapis sequoia I want to do something AI related..

yes, that will link you to the data science ones.

left tartan Jul 26, 2023, 12:37 AM

#

lapis sequoia Any recommended tutorials?

This one has been highly recommended as an intro in Python: https://cs50.harvard.edu/ai/2020/

CS50's Introduction to Artificial Intelligence with Python

This course explores the concepts and algorithms at the foundation of modern artificial intelligence, diving into the ideas that give rise to technologies like game-playing engines, handwriting recognition, and machine translation. Through hands-on projects, students gain exposure to the theory behind graph search algorithms, classification, opt...

left tartan Jul 26, 2023, 12:38 AM

#

serene scaffold yes, that will link you to the data science ones.

Do you think cs50 should be added to resources?

serene scaffold Jul 26, 2023, 12:39 AM

#

left tartan Do you think cs50 should be added to resources?

you can open an issue on the meta repo to suggest that. https://github.com/python-discord/meta/issues/new/choose

tepid tartan Jul 26, 2023, 12:46 AM

#

serene scaffold so you're pursuing a degree currently? what degree?

Bachelor of computer science

flint grail Jul 26, 2023, 12:47 AM

#

@tepid tartan

#

i'm currently 13 and i've been doing Python for a while

#

is this a good age to start or do most start younger

#

i've done lua when i was 9

#

when did you start doing computer science

mild dirge Jul 26, 2023, 12:47 AM

#

I never know if these comments are trolls or..

flint grail Jul 26, 2023, 12:47 AM

#

mild dirge I never know if these comments are trolls or..

what?

#

why would this be a troll

tepid tartan Jul 26, 2023, 12:48 AM

#

flint grail is this a good age to start or do most start younger

Depends on what you're most curious about.

flint grail Jul 26, 2023, 12:48 AM

#

please shut up.

lapis sequoia Jul 26, 2023, 12:48 AM

#

left tartan This one has been highly recommended as an intro in Python: https://cs50.harvard...

does it actually take 7 weeks??

flint grail Jul 26, 2023, 12:48 AM

#

tepid tartan Depends on what you're most curious about.

AI I want to work with AI but other things too

#

such as, work with cyber security

left tartan Jul 26, 2023, 12:48 AM

#

lapis sequoia does it actually take 7 weeks??

I guess it takes however long you want.

flint grail Jul 26, 2023, 12:49 AM

#

lapis sequoia does it actually take 7 weeks??

7 weeks for a course in which you as a "Harvard Student" have other classes too

left tartan Jul 26, 2023, 12:49 AM

#

flint grail 7 weeks for a course in which you as a "Harvard Student" have other classes too

It’s an online self paced course

flint grail Jul 26, 2023, 12:49 AM

#

dont they have

#

exams on it?

left tartan Jul 26, 2023, 12:50 AM

#

Only if you want to do them

flint grail Jul 26, 2023, 12:50 AM

#

what?

#

so they can pass a class without taking an exam?

tepid tartan Jul 26, 2023, 12:51 AM

#

If you enjoy it, you can pursue it

flint grail Jul 26, 2023, 12:51 AM

#

tepid tartan If you enjoy it, you can pursue it

how long does it take

#

to achieve what I stated previously

left tartan Jul 26, 2023, 12:51 AM

#

I suggest just visiting the page, they list several ways to take it and has a lot of information.

flint grail Jul 26, 2023, 12:52 AM

#

left tartan I suggest just visiting the page, they list several ways to take it and has a lo...

i already checked out the harvard link

tepid tartan Jul 26, 2023, 12:52 AM

#

Python? Or…

flint grail Jul 26, 2023, 12:52 AM

#

currently im doing python

#

i got to go finish this course

#

so i'll see you later guys

cursive drift Jul 26, 2023, 5:25 AM

#

hey, where i can learn finetuning?

oblique quarry Jul 26, 2023, 5:53 AM

#

cursive drift hey, where i can learn finetuning?

finetuning as in setting hyperparameters?

#

comes with practice tbh like you get at some point a feeling when looking at the data to know how aggressive you can go with the learning rate decay and whatnot

cursive drift Jul 26, 2023, 6:21 AM

#

oblique quarry finetuning as in setting hyperparameters?

as approach to transfer learning in which the weights of a pre-trained model are trained on new data, i need to know what i need to learn, resources, videos, what module in python

oblique quarry Jul 26, 2023, 7:01 AM

#

I only read about residual Learning (microsofts ResNet) as part of my cv project. If you wanna know more about this kinda stuff you should ask more credible members of the data-science channel such as @past meteor. But I can link you some resources I used https://arxiv.org/abs/1512.03385

arXiv.org

Deep Residual Learning for Image Recognition

Deeper neural networks are more difficult to train. We present a residual
learning framework to ease the training of networks that are substantially
deeper than those used previously. We explicitly reformulate the layers as
learning residual functions with reference to the layer inputs, instead of
learning unreferenced functions. We provide comp...

cursive drift Jul 26, 2023, 7:15 AM

#

oblique quarry I only read about residual Learning (microsofts ResNet) as part of my cv project...

thx

lyric olive Jul 26, 2023, 10:08 AM

#

I have started working as healthcare AI ML engineer, any good resources for AI in Pathology & Radiology like Monai.io

barren fable Jul 26, 2023, 10:27 AM

#

I have a question in machine learning. A lot of people have told me that when you split your data in your code, it's better to split it into training and testing data because validation is not that important. Is that true?

steady spindle Jul 26, 2023, 10:58 AM

#

Hello, I started a new project "self assisting A.I.", need a guidance so that I can complete this project, So if any of you want to join. 😇

hasty mountain Jul 26, 2023, 11:25 AM

#

lyric olive I have started working as healthcare AI ML engineer, any good resources for AI i...

There's a library of datasets called "MedMNIST". It has many optical microscopy images, X-rays and I think some other exams, too.

#

Guys, I'm trying to use Genetic Algorithm optimization together with Stochastic Gradient Descent to optimize my VAE which has already reached its plateau. I'm only a bit confused on whether I should use Genetic Algorithms to find a model that will provide a lower loss for each batch (Stochastic approach?) or which will provide a lower loss for an entire epoch (Global approach, I guess?)

I know that the stochastic approach has some good properties for gradient descent, but would that also be valid for genetic algorithms? So far, I've tested the stochastic approach and it seems that it may cause the epoch loss to decrease on some epochs and increase on others...

oblique quarry Jul 26, 2023, 12:52 PM

#

barren fable I have a question in machine learning. A lot of people have told me that when yo...

It really depends. When I have an abundance of data I usually split my data into train, val and test, but you can always use cross-validation for cases where you don't have that much data at hand. If you're interested: https://towardsdatascience.com/what-is-cross-validation-60c01f9d9e75
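A minimal sketch of k-fold cross-validation with scikit-learn, assuming a synthetic dataset from `make_classification` and a KNN classifier purely as stand-ins:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data; replace with your own features and labels.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# 5-fold CV: the model is fit 5 times, each time scored on the held-out fold.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)

print("fold accuracies:", scores)
print("mean:", scores.mean(), "std:", scores.std())
```

Every sample ends up in a held-out fold exactly once, so no separate validation split is needed, though a final untouched test set is still worth keeping.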

Medium

What is Cross-Validation?

Testing your machine learning models with cross-validation

mild dirge Jul 26, 2023, 12:55 PM

#

barren fable I have a question in machine learning. A lot of people have told me that when yo...

If you want to know how well your model would perform in the real world, then you need a separate test dataset that you haven't used at all in the process of designing your model.

lapis sequoia Jul 26, 2023, 12:57 PM

#

barren fable I have a question in machine learning. A lot of people have told me that when yo...

it depends on the context. For an online competition like Kaggle where there is a score given to you maybe you won't need validation, but for a real life project if you only use train and test you would be just tuning your model until it works on the test set but not until it generalizes. It's like you are overfitting the test set, not the model.
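A sketch of a three-way split with scikit-learn's `train_test_split`, assuming synthetic data and a 60/20/20 ratio picked only for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; replace with your own features and labels.
X, y = make_classification(n_samples=1000, n_features=5, random_state=0)

# First carve off the test set, which is not touched until the very end.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Then split the remainder into train and validation (0.25 of 80% = 20%).
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0
)

print(len(X_train), len(X_val), len(X_test))
```

Hyperparameters get tuned against the validation set, and the test set is scored once at the end.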

oblique quarry Jul 26, 2023, 1:23 PM

#

is there somebody with more experience who can review my code
```py
import numpy as np
import scipy.signal

class Convolution():
    def __init__(self, inputSize, kernelSize):
        self.weight = np.random.randn(inputSize[0], kernelSize, kernelSize) / kernelSize**2
        self.outputShape = (inputSize[0],inputSize[1] - kernelSize + 1, inputSize[0] - kernelSize + 1)
        self.bias = np.random.randn(*self.outputShape)
        self.kernelSize = kernelSize

    def image(self, images):
        for batch in range(len(images)):
            yield images[batch], batch

    def forward(self, images):#performing crossCorrelation
        self.input = images
        for image, b in self.image(images):
            for y in range(self.outputShape[1]):
                for x in range(self.outputShape[2]):
                    self.bias[b, y, x] += np.sum(image[y:y+self.kernelSize, x:x+self.kernelSize] * self.weight[b])
        return self.bias

    def backward(self, gradient):
        self.dbias = gradient
        self.dweight = np.zeros_like(self.weight).astype(np.float64)
        dInput = np.zeros_like(self.input).astype(np.float64)
        _, h, w = gradient.shape
        for grad, batch in self.image(gradient):
            for y in range(h):
                for x in range(w):
                    self.dweight[batch] += self.input[batch, y:y+self.kernelSize, x:x+self.kernelSize] * grad[y,x]
                    dInput[batch, y:y+self.kernelSize, x:x+self.kernelSize] += scipy.signal.convolve2d(np.flip(self.weight[batch]), grad[y,x].reshape((1,1)), mode="full")
        return dInput

conv = Convolution((2, 4,4), 2)
bilder = np.random.randn(2, 4, 4)
out = conv.forward(bilder)
dInput = conv.backward(out)
```

#

(I'll vectorize the code but before i do that i want to know if everything checks out)
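One possible sanity check before vectorizing is to compare a loop-based forward pass against `scipy.signal.correlate2d` in `"valid"` mode. The sketch below uses a standalone helper, `naive_cross_correlation`, which is a hypothetical name and not part of the class above. For a 4x4 input and a 2x2 kernel the valid output should be 3x3, i.e. `(H - k + 1, W - k + 1)`, so the width term in `outputShape` would presumably use `inputSize[2]` rather than `inputSize[0]`.

```python
import numpy as np
from scipy.signal import correlate2d

def naive_cross_correlation(image, kernel):
    """Loop-based 'valid' cross-correlation of one 2D image with one 2D kernel."""
    k_h, k_w = kernel.shape
    out_h = image.shape[0] - k_h + 1
    out_w = image.shape[1] - k_w + 1
    out = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            out[y, x] = np.sum(image[y:y + k_h, x:x + k_w] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((4, 4))
kernel = rng.standard_normal((2, 2))

result = naive_cross_correlation(image, kernel)
expected = correlate2d(image, kernel, mode="valid")

print(result.shape)
print(np.allclose(result, expected))
```

The backward pass could be checked in a similar spirit with a finite-difference gradient, which is not shown here.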

spark inlet Jul 26, 2023, 1:57 PM

#

Traceback (most recent call last):
  File "main.py", line 1, in <module>
    import cv2
  File "/home/runner/some-school-project/venv/lib/python3.10/site-packages/cv2/__init__.py", line 181, in <module>
    bootstrap()
  File "/home/runner/some-school-project/venv/lib/python3.10/site-packages/cv2/__init__.py", line 111, in bootstrap
    load_first_config(['config.py'], True)
  File "/home/runner/some-school-project/venv/lib/python3.10/site-packages/cv2/__init__.py", line 109, in load_first_config
    raise ImportError('OpenCV loader: missing configuration file: {}. Check OpenCV installation.'.format(fnames))
ImportError: OpenCV loader: missing configuration file: ['config.py']. Check OpenCV installation.

#

code:

import cv2

img = cv2.imread("img.jpg")
cv2.imshow("output image", img)

cv2.waitKey(0)

#

im new to python am i doing something wrong?

pseudo spire Jul 26, 2023, 2:41 PM

#

@spark inlet what are you trying to do?

spark inlet Jul 26, 2023, 2:43 PM

#

pseudo spire @spark inlet what are you trying to do?

trying to read an image with open cv and display it but its not even installing opencv python thingy

soft dock Jul 26, 2023, 3:11 PM

#

Perhaps this may help with your issue?
https://github.com/opencv/opencv/issues/14064

GitHub

OpenCV loader: missing configuration file: ['config.py'] · Issue #1...

System information (version) OpenCV => 3.4.5 / 4.0.0 / 4.0.1 Operating System / Platform => Ubuntu 16 Compiler => g++ Python => 3.6.8 Detailed description I compiled the open-cv with gs...

indigo wing Jul 26, 2023, 4:39 PM

#

People I need immediate help, I am working on an OMDENA AI project on the baseline regression model. I am a 3rd year B.Tech CSE DS student. I have only worked with the datasets given in our classes with questions on them. I have no idea what is happening here. I can't understand how I can contribute to the project. If I can't make some contribution then I will be kicked out. I want to contribute but it feels so chaotic, though not beyond my understanding. I don't understand what I can contribute to the project. Please advise.

#

Please help. Me noob. The scale of participation of this project and their achievements, knowledge is far greater than mine

young granite Jul 26, 2023, 6:51 PM

#

speak with ur project partners and find a way maybe u just dont get one point and in an discussion u get back on track @indigo wing

hoary sphinx Jul 26, 2023, 7:12 PM

#

Is anyone here doing job as data scientist?

cunning falcon Jul 26, 2023, 8:01 PM

#

I am not sure if this is the place to ask this question. I have been reading about statistical learning with Python.

https://hastie.su.domains/ISLP/ISLP_website.pdf

It seems that a row in a matrix is called a “feature” vector. A column in the matrix is a vector. Is there a special vector name for a column, like there is for a row?

pseudo spire Jul 26, 2023, 8:14 PM

#

spark inlet trying to read an image with open cv and display it but its not even installing...

Should work. Check maybe you use venv inside IDE, and the packages are installed outside the venv

humble portal Jul 26, 2023, 8:16 PM

#

I'm getting this issue on the server I use. It started happening a few days ago and I can still use CUDA, but it is both slow and outputs a warning to console every time I spawn a new process.

The warning being Can't initialize NVML. If I try nvcc --version I get Failed to initialize NVML: Driver/library version mismatch

The server is running Ubuntu and I do not have super user access. The server administrator is refusing to do anything about it until it becomes a major issue rather than just an annoyance, and I am unable to use conda rather than pip due to the standards of the paper I am aiming on submitting to in the end.

young granite Jul 26, 2023, 8:23 PM

#

hoary sphinx Is anyone here doing job as data scientist?

many ppl around here maybe ask ur question directly, if u seek career advice go to #career-advice

misty flint Jul 26, 2023, 9:17 PM

#

lyric olive I have started working as healthcare AI ML engineer, any good resources for AI i...

talk to your SMEs, talk to your SMEs, talk to your SMEs

(important things need to be said 3x. i am also in healthcare tech)

#

other than that if you need MLE resources, i highly recommend madewithml.com

#

if you are into traditional books, i highly recommend "Machine Learning Engineering in Action by Ben Wilson" (more than worth every penny i spent)

vernal acorn Jul 26, 2023, 11:21 PM

#

Hey guys! So I want to get into the python GUI space with a project essentially as a annotation checker for a TTS dataset. The idea being to

A) Load long audio and a csv file(s) indicating the text and timestamp points for when a NN thinks it said that point and
B) Allow the user to scrub through the audio, and modify/confirm the timestamp points.

My problem comes seemingly with a decent audio playback system, with seemingly no GUI libraries supporting comprehensive (not to mention not 2000s-esque UI) elements for audio processing. Are there any libraries or examples out there that can handle something like this in python?

#

Essentially, I want to create a dumbed-down version of what can be seen here with Prodigy:

#

https://prodi.gy/docs/audio-video#manual

Prodigy

Audio and Video · Prodigy · An annotation tool for AI, Machine Lear...

A downloadable annotation tool for NLP and computer vision tasks such as named entity recognition, text classification, object detection, image segmentation, A/B evaluation and more.

slim bone Jul 26, 2023, 11:28 PM

#

Hey fellas, I’m trying to get into ML but I feel like I’m drowning in an endless sea of documentation and terminologies while not really going anywhere.

I have some academic background and some of the math nailed down already (mainly calculus and linear algebra). Can someone please recommend me a book that teaches some PyTorch? Many thanks in advance

mild dirge Jul 26, 2023, 11:35 PM

#

This one was good
https://www.manning.com/books/deep-learning-with-pytorch

Manning Publications

Deep Learning with PyTorch

Create neural networks and deep learning systems with PyTorch. Discover best practices for the entire DL pipeline, including the PyTorch Tensor API and loading data in Python.

#

Might be a new version out

jovial elm Jul 27, 2023, 2:48 AM

#

anyone had trouble with enabling GPU for tensorflow? i've followed a step by step tutorial and they have my exact graphics card (1050 Ti) but 0 luck getting it to work

serene scaffold Jul 27, 2023, 2:48 AM

#

jovial elm anyone had trouble with enabling GPU for tensorflow? i've followed a step by ste...

what happens when you do nvidia-smi from a terminal

jovial elm Jul 27, 2023, 2:49 AM

#

serene scaffold Jul 27, 2023, 2:50 AM

#

jovial elm

remember to always show text as text, not a screenshot.

how do you know that using tensorflow with the gpu isn't working? I'm not saying it isn't, but like what code are you running, and what does it do that is different from what you expect?

jovial elm Jul 27, 2023, 2:54 AM

#

Not sure why it says 12.2 in the top right, i installed 11.8 after noticing tensorflow specifically asks for 11.8 instead of 12.2
I even uninstalled every 12.2 version so that's weird.

and I run this code to determine if it has successfully detected the GPU

import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))

and so far it outputs [] only, which is an empty array meaning it hasn't found any GPU.

#

i also installed anaconda, and used the conda environment to handle the installation of tensorflow and all that for me, tested it with that environment and it was the exact same result

#

previously i used my default python installation, modules installed with pip

serene scaffold Jul 27, 2023, 2:56 AM

#

I'm not seeing any solutions that don't involve anaconda, but I don't use anaconda, so I'll have to get off here.

jovial elm Jul 27, 2023, 2:59 AM

#

I'll just try restarting my PC for now and see if that'll do anything.

#

spoiler: it did not

#

And it still says 12.2 when running nvidia-smi so i think i'll make this my focus to try and fix because i have no idea what else lol

misty flint Jul 27, 2023, 5:00 AM

#

vernal acorn Hey guys! So I want to get into the python GUI space with a project essentially ...

theres a ton of these in react. im having to go this route for one of my side projects involving podcasts. python ones just dont cut it for me

vernal acorn Jul 27, 2023, 5:02 AM

#

misty flint theres a ton of these in react. im having to go this route for one of my side pr...

Yeah, its looking more and more like thats the only reasonable option, pythons GUI toolkits just arent there yet

#

I do wonder if something like plotly and dash with react wrappers would do the trick, but it might already complicate the issue

misty flint Jul 27, 2023, 5:03 AM

#

that might not work

#

if you need to wrap react components, i recommend this one https://reflex.dev/

Reflex

Performant, customizable web apps in pure Python. Deploy in seconds.

#

reminiscent of streamlit

#

lets you wrap some of the popular react components you might need out there so you dont have to work with custom react code if you dont need to: https://reflex.dev/docs/advanced-guide/wrapping-react

vernal acorn Jul 27, 2023, 5:06 AM

#

Ill try that out

#

Thanks!

misty flint Jul 27, 2023, 5:06 AM

#

np. gl

brittle storm Jul 27, 2023, 5:46 AM

#

Any one know how to use PyBluez?

barren fable Jul 27, 2023, 6:13 AM

#

ML (KNN) - Finding The Best n_neighbor

# Perform Grid Search for best n_neighbors on the validation set
param_grid = {'n_neighbors': range(1, 11)}
knn_model = KNeighborsClassifier()
grid_search = GridSearchCV(knn_model, param_grid, cv=5, scoring='accuracy')
grid_search.fit(xTrain, yTrain)

# Get the best value for n_neighbors from the validation set
best_n_neighbors = grid_search.best_params_['n_neighbors']

print("Best n_neighbors:", best_n_neighbors)

# Train the model with the best hyperparameter on the combined training and validation data
final_knn_model = KNeighborsClassifier(n_neighbors=best_n_neighbors)
final_knn_model.fit(np.vstack((xTrain, xValidation)), np.concatenate((yTrain, yValidation)))

# Test the final model on the test data
yPrediction = final_knn_model.predict(xTest)

# Calculate accuracy and display it as a percentage
accuracy = np.mean(yPrediction == yTest)
print("Accuracy: {:.2%}".format(accuracy))

# You can also use classification_report to get detailed metrics
print(classification_report(yTest, yPrediction))

I took this code from chatgpt to find the best n_neighbor, but the problem is that I tested it on many different codes, and every time it gave me the result that the best n_neighbor was 1, and its accuracy was 81%. When I tested n_neighbor manually, it gave me these results.

(n_neighbor: accuracy)
1: 82%
2: 80%
3: 82%
4: 81%
5: 83%
6: 81%
7: 83%
8: 82%
9: 83%
10: 82%

5, 7 and 9 are better so why didn't the code print any of them?
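One thing worth checking: `GridSearchCV` picks `n_neighbors` by the mean cross-validated accuracy on whatever is passed to `fit` (here `xTrain`), not by accuracy on the test set, so a manual sweep measured on a different split can rank the values differently. A sketch for printing the per-value CV scores, assuming synthetic data in place of the real `xTrain` / `yTrain`:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for xTrain / yTrain.
xTrain, yTrain = make_classification(n_samples=300, n_features=5, random_state=0)

param_grid = {"n_neighbors": list(range(1, 11))}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
grid_search.fit(xTrain, yTrain)

# Mean and spread of the 5 fold scores for every candidate value.
results = grid_search.cv_results_
for k, mean, std in zip(
    results["param_n_neighbors"],
    results["mean_test_score"],
    results["std_test_score"],
):
    print(f"n_neighbors={k}: {mean:.3f} +/- {std:.3f}")

print("best:", grid_search.best_params_)
```

If the fold-to-fold standard deviation is larger than the 1-2% gaps in the manual table, those differences are probably within noise.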

crimson quiver Jul 27, 2023, 8:59 AM

#

!resources

arctic wedgeBOT Jul 27, 2023, 8:59 AM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

wind cosmos Jul 27, 2023, 9:46 AM

#

Suggest a good comprehensive data science, AI ML course available online for paid/free ( yt, udemy, Coursera anything works)
Preferably cheaper

lapis sequoia Jul 27, 2023, 9:53 AM

#

wind cosmos Suggest a good comprehensive data science, AI ML course available online for pa...

this is vague , what do you want to learn exactly

wind cosmos Jul 27, 2023, 9:54 AM

#

Data science/ Data analytics
I'll be pursuing an applied statistics and data science/analytics degree from next year and I want to learn all that ahead of time and prepare my projects beforehand to stay ahead of my college

humble shore Jul 27, 2023, 12:28 PM

#

yay

#

there is an ai and ml field

#

: )

#

any one uses keras, tf or sklearn?

lapis sequoia Jul 27, 2023, 12:37 PM

#

can someone explain to me the SPPF layer in YOLO?

rugged mist Jul 27, 2023, 1:56 PM

#

what's the best way to do torch.tensor([model(torch.tensor([t])) for t in T])
(T is a 1d tensor)

tidal bough Jul 27, 2023, 1:59 PM

#

torch.tensor([t]) is weird; you can probably do, like,

torch.tensor([model(t) for t in T[:,None]])

to make all the t have a shape of (1,) already.

rugged mist Jul 27, 2023, 1:59 PM

#

any way to avoid the list comp?

jovial elm Jul 27, 2023, 1:59 PM

#

jovial elm spoiler: it did not

update: nvidia-smi showing 12.2 was normal, so that wasn't the problem.
went ahead and installed conda & tensorflow in WSL ubuntu environment and bravo hurray it works .. !

>>> e = tf.config.list_physical_devices('GPU')[0]
>>> e
PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')

tidal bough Jul 27, 2023, 1:59 PM

#

rugged mist any way to avoid the list comp?

well, you can preallocate the result tensor and loop over T writing the results into it

spark inlet Jul 27, 2023, 2:00 PM

#

pseudo spire Should work. Check maybe you use venv inside IDE, and the packages are installed...

what?

tidal bough Jul 27, 2023, 2:00 PM

#

besides that, though, not much advice except "vectorize your model so you don't have to do this".

spark inlet Jul 27, 2023, 2:00 PM

#

im using replit online ide

pseudo spire Jul 27, 2023, 2:00 PM

#

use local then

spark inlet Jul 27, 2023, 2:03 PM

#

pseudo spire use local then

i cant im not working alone

rugged mist Jul 27, 2023, 2:13 PM

#

tidal bough besides that, though, not much advice except "vectorize your model so you don't ...

okay im probably very deep into an x-y problem so just ignore that previous question

im trying to achieve sthn similar to this video about solving an ODE with an nn

the nn's shape is like 1->32->1 and its being used to approximate an R->R function such that it obeys NN'(t) = f(NN(t), t)
to do this it makes a loss function L(NN) = sum(NN'(t) - f(NN(t), t) for t in T)

as far as i understand, normally the flow is like: for each (t, xtrue) pair, loss is L(model(t), xtrue)
but here its not like the loss takes in a single input-output pair and returns the loss for that sample, instead the loss takes in all input-output pairs and returns the loss for the whole set

tidal bough Jul 27, 2023, 2:15 PM

#

rugged mist okay im probably very deep into an x-y problem so just ignore that previous ques...

Usually if you have an nn of shape 1->32->1 (so each sample is one number), you can apply it to an array of shape like(N,1) (N samples one number each) to get an output of shape (N,1) (the results for each sample).

#

sum(NN'(t) - f(NN(t), t) for t in T)
I suspect you want to have, like, a square inside the sum here, otherwise it's very easy to get 0 or negative loss without being anywhere close to optimality.
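A minimal PyTorch sketch of both points, batching over an `(N, 1)` input and squaring the residual. The right-hand side `f` is a made-up example (x' = -x), and the initial-condition term that an actual ODE fit would need is left out:

```python
import torch
from torch import nn

torch.manual_seed(0)

# 1 -> 32 -> 1 network approximating x(t).
model = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))

def f(x, t):
    # Hypothetical right-hand side of the ODE x'(t) = f(x, t); here x' = -x.
    return -x

# All time points at once, shaped (N, 1) so nn.Linear treats each as a sample.
T = torch.linspace(0.0, 1.0, 50).reshape(-1, 1).requires_grad_(True)

x = model(T)  # shape (N, 1): one output per time point, no Python loop

# dx/dt for every sample; each x[i] only depends on T[i], so a grad_outputs
# of ones gives the per-sample derivative.
(dx_dt,) = torch.autograd.grad(
    x, T, grad_outputs=torch.ones_like(x), create_graph=True
)

# Squared residual, so positive and negative errors cannot cancel out.
loss = ((dx_dt - f(x, T)) ** 2).mean()
loss.backward()

print(x.shape, dx_dt.shape, loss.item())
```

`create_graph=True` is what lets the loss be backpropagated through the derivative itself.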

rugged mist Jul 27, 2023, 2:19 PM

#

tidal bough Usually if you have an nn of shape `1->32->1` (so each sample is one number), yo...

oh, id tried that first and got some error about matmul shapes not being compatible so i thought it wasnt possible
i initialized my T as a linspace, doing .reshape(-1, 1) fixed it

rugged mist Jul 27, 2023, 2:19 PM

#

tidal bough > sum(NN'(t) - f(NN(t), t) for t in T) I suspect you want to have, like, a squar...

yea i do have that, missed it when typing here

civic elm Jul 27, 2023, 3:03 PM

#

Hi, this is from the book Hands-On Machine Learning... I am confused why would models be biased and how?

jovial elm Jul 27, 2023, 3:32 PM

#

jovial elm update: nvidia-smi showing 12.2 was normal, so that wasn't the problem. went ahe...

UPDATE: IT WORKS

#

(On windows this time, instead of WSL)

#

I ran

conda install python=3.8
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
python -m pip install "tensorflow<2.11"

and that worked

misty flint Jul 27, 2023, 3:46 PM

#

civic elm Hi, this is from the book Hands-On Machine Learning... I am confused why would m...

because the features aren't normalized. they arent on the same scale. do you see how the absolute values for both ML features in the example are drastically different?

if you dont normalize/scale them, then the model will bias towards the one with greater values; in this case, total # of rooms. someone feel free to amend/clarify my response.
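A small sketch of standardizing features with plain NumPy, using made-up housing-style numbers (total rooms in the thousands, median income in single digits):

```python
import numpy as np

# Hypothetical rows: [total_rooms, median_income]
X = np.array([
    [880.0, 8.3],
    [7099.0, 8.3],
    [1467.0, 7.3],
    [1274.0, 5.6],
    [1627.0, 3.8],
])

# Standardize each column to zero mean and unit variance.
mean = X.mean(axis=0)
std = X.std(axis=0)
X_scaled = (X - mean) / std

print("before:", X.min(axis=0), X.max(axis=0))
print("after: ", X_scaled.min(axis=0), X_scaled.max(axis=0))
```

In a real pipeline the mean and standard deviation would be computed on the training set only and then reused for validation and test data.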

scarlet iron Jul 27, 2023, 4:12 PM

#

hey, i'm starting out with AI and data science and want to build a project. fine tuning LLMs has really interested me, although i'm not sure if that's something one can do as a beginner or how tough or complex it is

agile jackal Jul 27, 2023, 4:21 PM

#

gpt4all what training mod is better, I want unhinged results that might be accurate
I have gpt4all and mpt

inland nebula Jul 27, 2023, 6:05 PM

#

Hey guys, I've been learning Python as of late and just got an idea for a motion/limb tracking project; what Python libraries and related technologies should I use to make something like that? I know OpenCV may be one option, but I wanna know all my options before I start working on it

#

Also, I've got another idea regarding GPT where I feed the chatbot some data on a specific topic which can then be used for knowledge purposes (e.g. lawbotpro.com), but I don't really know where and how to start working on it. YT tutorials only go through surface level knowledge. What should I do regarding this, and where should I start?

oblique quarry Jul 27, 2023, 6:15 PM

#

inland nebula Hey guys, I've been learning Python as of late and just got an idea for a motion...

the motion/limb tracking idea is great! Opencv is very useful for capturing the image but you could also use the windows api instead. As for the logic I'd go with a framework such as tf or pt. I for my part would use pt but it's really up to you

#

https://youtu.be/06TE_U21FK4

YouTube

Nicholas Renotte

AI Pose Estimation with Python and MediaPipe | Plus AI Gym Tracker ...

Tired of stacking reps at the gym?

Been lifting heavy and just can't seem to lift that pen? (actually lol'd)

Well, have I got the app for you!

In this video you'll learn how to build your very own AI powered gym tracker using AI Powered Pose estimation. You'll leverage MediaPipe and Python to detect different poses from a webcam feed. Then re...

#

this vid is prob going to help you

inland nebula Jul 27, 2023, 6:37 PM

#

Thanks a lot, gonna look into this

cedar turret Jul 27, 2023, 8:35 PM

#

I have a general data analysis question, but I'm not sure where the best place to ask it is.

#

I got a dataset that contains null values in some of the rows. Under what circumstances is it ok to keep those entries in my dataset? Should it be best practices to drop all those rows, or should I look at it as kind of a case by case basis?

serene scaffold Jul 27, 2023, 8:39 PM

#

cedar turret I got a dataset that contains null values in some of the rows. Under what circum...

what kind of dataset, and what are you trying to do with it?

cedar turret Jul 27, 2023, 8:41 PM

#

It contains ride information for a bike-share company. So it contains columns such as membership type, ride start time, end time, start station name, end station name and latitude and longitude for both the start and end of the ride.

#

The null values are isolated to location information, mostly the station name.

#

The goal is to identify trends in how different membership types use the service.

#

So when I'm looking at how long or what time of day or month different member types ride, I don't really need the location information. So when I do that kind of analysis, am I fine to keep those rows if they have null location information? Or am I best dropping it all together?

serene scaffold Jul 27, 2023, 8:51 PM

#

cedar turret The null values are isolated to location information, mostly the station name.

so the null values are the lack of a name for unnamed locations?

#

sounds like you don't need that information for most of the analysis you might do.

#

but you can probably learn things about the unnamed locations given the lat/long and some external resource.

cedar turret Jul 27, 2023, 8:53 PM

#

Yeah that was my intention. I want to take the lat and long and use geopandas and a shapefile to basically find out where and which neighborhoods rides most often take place.

serene scaffold Jul 27, 2023, 8:54 PM

#

sounds like you know what to do.

cedar turret Jul 27, 2023, 8:55 PM

#

And obviously if I have a null lat and long value, well I can't use that for that analysis but when I'm looking to determine how many riders use the service in January, a lack of lat and long shouldn't mean that I need to throw out that entry right?

serene scaffold Jul 27, 2023, 8:56 PM

#

exactly

cedar turret Jul 27, 2023, 8:56 PM

#

Ok great. I just needed someone else to bounce this idea off of. Make sure I wasn't completely crazy.

serene scaffold Jul 27, 2023, 8:56 PM

#

you probably are, just in different ways.

hasty mountain Jul 27, 2023, 9:21 PM

#

Guys, is there an optimized way of taking the mean and standard deviation of a dataset without having to load it all into RAM?

I have a dataset of 60,000 100x100 images, and I want to resize them into 64x64 and take the mean and standard deviation for a VAE, but I'm afraid this may cause my RAM to become a george foreman grill.

I was thinking about loading the first 5,000 images, taking their mean and standard deviation. Then, load the second 5,000 images, mean and standard deviation and so on. In the end, I would sum all those means and standard deviations and take the average.

However, I did some quick calculations with presumed numbers (like 3, 7, 5...) and discovered that this approach won't get me exactly the correct mean it would provide if I took the mean and standard deviation of all 60,000 images at once. Is there a correct approach that will also save me memory and computation time?

agile cobalt Jul 27, 2023, 9:25 PM

#

~~it should get you a mean close enough for all practical purposes?~~

#

maybe check these posts out if you haven't yet:

PyTorch Forums

Computing the mean and std of dataset

I think in the other post by @ptrblck, he is computing the mean and std over the pixels not over samples in the batch. So, then that code in About Normalization using pre-trained vgg16 networks is correct, since the goal is to compute the mean and std for each batch and then take the average of these two quantities over the entire dataset.

PyTorch Forums

About Normalization using pre-trained vgg16 networks

Hi ptrblck. Please have a look at this Computing the mean and std of dataset

hasty mountain Jul 27, 2023, 9:28 PM

#

Thanks! I'll take a look!

#

Hm... The first post seems closer to what I want, the mean and standard deviation of the whole dataset considering each pixel value. The second seems to be more focused on mean and standard deviation of the channels. The first post seems to, in a nutshell, try the same approach as I said. Taking the mean of a number of samples (batches), and, in the end, taking the mean of those mean samples.

That approach doesn't really provide the exact numbers for the complete mean, though:

Total mean: (1 + 6 + 7 + 3 + 4)/5 = 21/5 = 4.2

Partial Mean: (1 + 6 + 7)/3 + (3 + 4)/2 = 14/3 + 7/2 = (28/6 + 21/6) = 49/6 ----> Total mean would be = (49/6)/2 = 49/12 = 4.083

I suppose this difference could be discarded, then? Unless I did something wrong in my calculations...
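The gap in the numbers above comes from averaging the two batch means with equal weight even though the batches have different sizes (3 and 2). A minimal sketch, using the same five toy numbers and hypothetical variable names, of weighting each batch mean by its batch size, which recovers the full mean:

```python
# Same toy numbers as in the message above, split into uneven batches.
batches = [[1, 6, 7], [3, 4]]

batch_means = [sum(b) / len(b) for b in batches]
batch_sizes = [len(b) for b in batches]

# Unweighted average of the batch means: off when batch sizes differ.
naive_mean = sum(batch_means) / len(batch_means)

# Weight each batch mean by its size to recover the overall mean.
total = sum(batch_sizes)
weighted_mean = sum(m * n for m, n in zip(batch_means, batch_sizes)) / total

# Reference value computed over all numbers at once.
full_mean = sum(x for b in batches for x in b) / total

print(naive_mean, weighted_mean, full_mean)
```

When every batch has the same size (for example 12 batches of 5,000 images), the plain average of the batch means already equals the full mean. The same is not true for the standard deviation: averaging per-batch standard deviations is only an approximation.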

past meteor Jul 28, 2023, 6:43 AM

#

hasty mountain Guys, is there an optimized way of taking the mean and standard deviation of a d...

There's "online" variants of the mean and standard deviation

#

Meaning, you can express them as a running total of something. For the mean it's obvious how to do this, for the stdev a bit less so but you're one quick google search away 🙂 => Welford's algorithm
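For reference, a minimal pure-Python sketch of Welford's algorithm as it is commonly stated. The function name `welford` and the choice to return the population standard deviation are assumptions for illustration, not something from the chat:

```python
import math


def welford(values):
    """Return (mean, population std) in one pass without storing the data."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # second factor uses the updated mean
    if count == 0:
        raise ValueError("welford() needs at least one value")
    return mean, math.sqrt(m2 / count)


# Works on any iterable, including a generator that yields values lazily.
mean, std = welford(iter([1, 6, 7, 3, 4]))
print(mean, std)
```

Only three numbers are kept in memory regardless of dataset size. For images, one option would be to feed pixel values (or apply the same update per block) as they are loaded from disk.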

fallen dagger Jul 28, 2023, 7:37 AM

#

Has anyone here coded a deep learning/ML library from scratch? If so can I see it I'm trying to write my own and want to see other people's approaches to it

agile cobalt Jul 28, 2023, 7:44 AM

#

fallen dagger Has anyone here coded a deep learning/ML library from scratch? If so can I see i...

take a look at https://course.fast.ai - particularly the second part
that is definitely not something you should write your own of though, or at least, limit your code to high-level stuff while building on top of something like PyTorch (which is what they do there, though it does explain most operations and implement them in Python before switching over to PyTorch's version)

Practical Deep Learning for Coders - Practical Deep Learning

A free course designed for people with some coding experience, who want to learn how to apply deep learning and machine learning to practical problems.

barren fable Jul 28, 2023, 7:46 AM

#

Is there anyone who knows about scatter plot interpretation and linear regression, dropping some columns that are not important, etc...?

fallen dagger Jul 28, 2023, 7:59 AM

#

agile cobalt take a look at https://course.fast.ai - particularly the second part that is def...

I've done a bit of that course but I'm really just looking to write my own, don't want to use existing libraries. Thanks though.

agile cobalt Jul 28, 2023, 8:03 AM

#

just keep in mind that the performance of a ML library written in pure python would be many hundreds of thousands of times worse than something like PyTorch

#

even if it were written in C you would still likely be looking at hundreds of times worse performance, between complicated optimizations and GPU support
...mainly the latter

fallen dagger Jul 28, 2023, 8:07 AM

#

that's fine it's a learning experience foremost

#

maybe I'll try to optimize it as well and replicate it in C++ later

tidal bough Jul 28, 2023, 8:11 AM

#

hasty mountain Guys, is there an optimized way of taking the mean and standard deviation of a d...

Mean can be computed in an online way, only considering an element at a time:

def mean_online(it):
    cur_mean = next(it)
    cnt = 1
    for el in it:
        # currently we have sum(lst[:cnt])/cnt, and we want sum(lst[:cnt+1])/(cnt+1)
        # so we want cur_mean = (cur_mean*cnt + el)/(cnt+1)
        # which can be rearranged a bit to get:
        mul = 1/(cnt+1)
        cur_mean = cur_mean*(1-mul) + el*mul
        cnt += 1
    return cur_mean

And for std... compute the mean square, too, then subtract the squared mean, then take the square root.

#

Now, if you want to also do it quickly... probably the best idea would be to rewrite mean_online a bit so that it works on blocks of K elements, instead of 1 element at a time.

tidal bough Jul 28, 2023, 8:28 AM

#

tidal bough Now, if you want to also do it quickly... probably the best idea would be to rew...

yup, this seems to work for me:

from typing import Iterator

import numpy as np

def mean_std_blocks(it: Iterator[np.ndarray], ddof: int = 0) -> tuple[float, float]:
    cur_mean = 0
    cur_meansq = 0
    cnt = 0
    for block in it:
        # number of values in the block; len(block) would only count the
        # first axis, which is wrong for a batch of images
        k = block.size
        cur_mean = cnt / (cnt + k) * cur_mean + 1 / (cnt + k) * block.sum()
        cur_meansq = cnt / (cnt + k) * cur_meansq + 1 / (cnt + k) * (block**2).sum()
        cnt += k
    std = np.sqrt(cur_meansq - cur_mean**2)
    if ddof != 0:
        std *= np.sqrt(cnt / (cnt - ddof))
    return cur_mean, std

#

So you just need to load your dataset in blocks small enough to comfortably fit into memory, and feed the iterator of blocks through a function like that.

hasty mountain Jul 28, 2023, 9:20 AM

#

tidal bough yup, this seems to work for me: ```py def mean_std_blocks(it: Iterator[np.ndarra...

Nice! Thanks, guys!

hasty mountain Jul 28, 2023, 10:38 AM

#

Well...I just discovered that if I resize my 100x100 images to 64x64, the mean and standard deviation won't change that much (~0.02 more or less) pithink

#

But then... could that also be valid if I resize my 100x100 images to 200x200?

civic elm Jul 28, 2023, 11:15 AM

#

agile cobalt even if it were written in C you would still likely be looking at hundreds of ti...

Why though? Is it because of Cuda and numpy?

tidal bough Jul 28, 2023, 11:18 AM

#

hasty mountain But then... could that also be valid if I resize my 100x100 images to 200x200?

I would expect them not to, yeah. Mean changing would be like the image getting on average brighter, and std changing would be like the contrast getting on average higher, one could very roughly say - and resizing really shouldn't do it.

#

(another way of thinking about this, is that duplicating points (like, going from arr to np.concatenate([arr,arr])) changes neither mean nor std - resizing isn't quite the same, since it does interpolation, but it shouldn't be far from it. Though I'm not sure how to state it formally.)

hasty mountain Jul 28, 2023, 11:25 AM

#

tidal bough (another way of thinking about this, is that duplicating points (like, going fro...

Hm... Maybe it could be seen like something like this:

original_mean = (1 + 2 + 2 + 1)/4 = 6/4 = 1.5

resized_mean = (1 + 1 + 2 + 2 + 2 + 2 + 1 + 1)/8 = 12/8 = 6/4 = 1.5

?

tidal bough Jul 28, 2023, 11:26 AM

#

Yeah, that's the "duplicating points doesn't change moments" thing, but in general resizing involves linear or nonlinear interpolation over some grid.
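A small sketch of the duplication argument, using the same four toy values as above. It assumes a 2x nearest-neighbour upsampling of a single row, which amounts to repeating every pixel; interpolating resize modes would only match approximately:

```python
import statistics

pixels = [1, 2, 2, 1]

# Nearest-neighbour style 2x upsampling of a 1-D row: repeat every pixel.
upsampled = [p for p in pixels for _ in range(2)]

# Mean and population std (divide by n) are unchanged by duplication.
print(statistics.fmean(pixels), statistics.fmean(upsampled))
print(statistics.pstdev(pixels), statistics.pstdev(upsampled))

# The sample std (divide by n - 1) does shift, because n changes.
print(statistics.stdev(pixels), statistics.stdev(upsampled))
```

So the "doesn't change" claim holds exactly for the population statistics under pure duplication, and only roughly otherwise.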

hasty mountain Jul 28, 2023, 11:27 AM

#

Yeah, I suppose it may be possible that some modes of resizing might mess with the statistics. But usually I'm just going for the classic mode (which is nearest neighbors, I think?)

#

Well... in that case... Maybe I could resize my 100x100 images into... 4x4 and take their mean and std? pithink

lapis sequoia Jul 28, 2023, 12:05 PM

#

what do you do at work when you are waiting for some model training?

tidal bough Jul 28, 2023, 12:43 PM

#

hasty mountain Well... in that case... Maybe I could resize my 100x100 images into... 4x4 and t...

Is there a reason you want that? Like, taking the mean and std of the original image would almost certainly be faster than resizing it.

hasty mountain Jul 28, 2023, 12:47 PM

#

tidal bough Is there a reason you want that? Like, taking the mean and std of the original i...

It's because of my Variational AutoEncoder. It generates parameters of a normal distribution, so I have to denormalize its outputs to get the proper images.

civic elm Jul 28, 2023, 2:20 PM

#

fallen dagger I've done a bit of that course but I'm really just looking to write my own, don'...

Coursera's Andrew Ng ML course is the one right for your needs

#

they use numpy though, but that's opensource and written in c/c++ I think?

coral field Jul 28, 2023, 2:51 PM

#

does anyone know any model/ website i can use to find the best font style for an image?

spare briar Jul 28, 2023, 3:16 PM

#

hasty mountain It's because of my Variational AutoEncoder. It generates parameters of a normal ...

I think this is slightly misunderstanding the VAE. You sample from a gaussian z~N but learn a decoder p_\theta(x|z) that maps the sample into image space. The image reconstruction quality is evaluated with a pixel-wise gaussian likelihood.

TLDR you don't need to denormalize image outputs from a VAE, it generates the image directly

spare briar Jul 28, 2023, 3:16 PM

#

hasty mountain Well... in that case... Maybe I could resize my 100x100 images into... 4x4 and t...

This would break the VAE since the objective function (evidence lower bound) includes a likelihood term over the reconstructed image pixels

#

loss = mean squared error reconstruction term + KL divergence

#

the mean squared error term is derived from the pixel-wise gaussian likelihood

#

have you read this? https://arxiv.org/abs/1312.6114

arXiv.org

Auto-Encoding Variational Bayes

How can we perform efficient inference and learning in directed probabilistic
models, in the presence of continuous latent variables with intractable
posterior distributions, and large datasets? We introduce a stochastic
variational inference and learning algorithm that scales to large datasets and,
under some mild differentiability conditions, ...

hasty mountain Jul 28, 2023, 3:19 PM

#

spare briar I think this is slightly misunderstanding the VAE. You sample from a gaussian z~...

That's the thing. It seems that the VAE actually generates parameters for a gaussian distribution for each pixel. Each value that is outputted from the Decoder would be like the position X in a gaussian distribution.
If I don't denormalize the output, it'll just generate blurry images.

spare briar Jul 28, 2023, 3:19 PM

#

No this isn't what the VAE does

hasty mountain Jul 28, 2023, 3:19 PM

#

I've been reading this:

https://arxiv.org/pdf/2006.10273.pdf

spare briar Jul 28, 2023, 3:19 PM

#

it generates latent gaussians then maps them with a neural network to denormalized pixels

hasty mountain Jul 28, 2023, 3:20 PM

#

spare briar it generates latent gaussians then maps them with a neural network to denormaliz...

That is with the MSE loss, then?

spare briar Jul 28, 2023, 3:20 PM

#

The pixels are modeled as Gaussians centered around the true pixel value (which is like adding gaussian noise)

#

The latent gaussians are the KL loss. The pixel-wise reconstruction is the MSE loss

#

assume that each pixel has a true value, but there is some noise from sampling

#

we model the noise as a gaussian

hasty mountain Jul 28, 2023, 3:20 PM

#

pithink

spare briar Jul 28, 2023, 3:21 PM

#

so when we evaluate a particular pixel we do e^{-(x - true value)^2/(2\sigma^2)}

hasty mountain Jul 28, 2023, 3:21 PM

#

I've never had a VAE working on RGB images when using MSE Loss. Only on grayscale images.

spare briar Jul 28, 2023, 3:21 PM

#

if you take the log that gives you the reconstruction loss term which is a gaussian log likelihood over pixels
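Spelled out as a sketch, assuming a decoder output \hat{x}_\theta(z) with D pixels and a fixed noise variance \sigma^2 shared by all pixels (these symbols are introduced here for illustration and do not appear in the chat):

```latex
\log p_\theta(x \mid z)
  = \log \mathcal{N}\!\left(x;\ \hat{x}_\theta(z),\ \sigma^2 I\right)
  = -\frac{1}{2\sigma^2}\,\lVert x - \hat{x}_\theta(z) \rVert^2
    - \frac{D}{2}\,\log\!\left(2\pi\sigma^2\right)

-\mathrm{ELBO}(x)
  = \mathbb{E}_{q_\phi(z \mid x)}\!\left[
      \frac{1}{2\sigma^2}\,\lVert x - \hat{x}_\theta(z) \rVert^2
    \right]
    + \mathrm{KL}\!\left(q_\phi(z \mid x)\,\Vert\,p(z)\right)
    + \text{const}
```

With \sigma fixed, the log-normalizer is a constant, so maximizing the pixel-wise Gaussian likelihood is the same as minimizing a scaled squared error, and the quantity minimized in training is that reconstruction term plus the KL term.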

#

Something is wrong with your implementation

hasty mountain Jul 28, 2023, 3:22 PM

#

I've also been using this code: https://colab.research.google.com/drive/1_yGmk8ahWhDs23U4mpplBFa-39fsEJoT?usp=sharing#scrollTo=MvBo844ZHQhF

Google Colaboratory

spare briar Jul 28, 2023, 3:22 PM

#

https://arxiv.org/abs/2007.03898

arXiv.org

NVAE: A Deep Hierarchical Variational Autoencoder

Normalizing flows, autoregressive models, variational autoencoders (VAEs),
and deep energy-based models are among competing likelihood-based frameworks
for deep generative learning. Among them, VAEs have the advantage of fast and
tractable sampling and easy-to-access encoding networks. However, they are
currently outperformed by other models suc...

hasty mountain Jul 28, 2023, 3:22 PM

#

The gaussian log likelihood is indeed over the output values...but not over the pixels, but instead it considers the values as parameters of the distribution

spare briar Jul 28, 2023, 3:23 PM

#

ok this is just a very shitty old fashioned vae implementation

hasty mountain Jul 28, 2023, 3:23 PM

#

But I do find it strange that most papers still consider MSE. But MSE never worked for me

spare briar Jul 28, 2023, 3:23 PM

#

hasty mountain The gaussian log likelihood is indeed over the output values...but not over the ...

no the pixel distribution is modeled as a gaussian

#

look at the paper i linked just above

hasty mountain Jul 28, 2023, 3:24 PM

#

spare briar ok this is just a very shitty old fashioned vae implementation

It's a code by a Meta Engineer, though pithink

#

But ok, I'll take a look

#

I plan on making a paper on VAEs, so it'll be useful

spare briar Jul 28, 2023, 3:24 PM

#

or this one https://arxiv.org/abs/1906.00446

arXiv.org

Generating Diverse High-Fidelity Images with VQ-VAE-2

We explore the use of Vector Quantized Variational AutoEncoder (VQ-VAE)
models for large scale image generation. To this end, we scale and enhance the
autoregressive priors used in VQ-VAE to generate synthetic samples of much
higher coherence and fidelity than possible before. We use simple feed-forward
encoder and decoder networks, making our m...

#

it's code by a Meta engineer to teach VAEs

#

not to implement a state of the art high quality image generation vae

tidal bough Jul 28, 2023, 3:24 PM

#

It's a code by a Meta Engineer, though
~~ah yes, it is well known that Meta engineers only write good, quality code~~

spare briar Jul 28, 2023, 3:26 PM

#

I highly recommend that you read the VAE paper (autoencoding variational bayes) closely

#

It seems like you are misunderstanding how the VAE works/how the loss is derived

#

Then when you go to implement it follow the NVAE or VQVAE papers for all of the modern tricks to make the images look nice

hasty mountain Jul 28, 2023, 3:27 PM

#

Ok then. But then... should the dataset be normalized in a specific way? Or should the Decoder have some specific activation function?

#

I've been using a dataset scaled to be within range [-1, 1], and my VAE only worked with sigmoid activation at the decoder + GLLLoss

spare briar Jul 28, 2023, 3:28 PM

#

If the input images are scaled, the generated images will be scaled

#

Again you are working based on that very barebones VAE implementation, which is similar to the original paper and doesn't include anything learned since

lapis sequoia Jul 28, 2023, 3:29 PM

#

Hey there! I am currently pursuing a degree in bioinformatics..I was looking for someone in the same field or in the data science domain in order to participate with..to solve an extremely challenging and interesting problem statement available on Kaggle..would love if anyone would want to collaborate..looking forward to the same!!! Do DM if u r an enthusiast too!

hasty mountain Jul 28, 2023, 3:29 PM

#

spare briar If the input images are scaled, the generated images will be scaled

So, I could replace the sigmoid function by a Tanh function?

spare briar Jul 28, 2023, 3:30 PM

#

This question is missing the forest for the trees

hasty mountain Jul 28, 2023, 3:30 PM

#

Or maybe remove the final activation function altogether? pithink

hasty mountain Jul 28, 2023, 3:49 PM

#

Hm... I didn't quite understand why use MSE instead of GLLLoss at all... But I do like the possibility of using Feedforward Layers for the VAE in an effective way.
I also didn't get the "codebook" thing. Would it be like...a second Encoder? Or simply...a "book" of optimizable parameters?

verbal oyster Jul 28, 2023, 3:50 PM

#

lapis sequoia Hey there! I am currently pursuing a degree in bioinformatics..I was looking for...

I can help, but first explain it more clearly

hasty mountain Jul 28, 2023, 3:51 PM

#

I'll take a look at the vanilla VQ-VAE paper...

spare briar Jul 28, 2023, 3:55 PM

#

i would back off of implementing VAEs and work on fundamentals first
https://ermongroup.github.io/cs228-notes/
https://deepgenerativemodels.github.io/notes/index.html

Contents

Lecture notes for Stanford cs228.

Contents

Lecture notes for Deep Generative Models.

humble canyon Jul 28, 2023, 3:56 PM

#

Hey guys, we've launched an open-source AI code assistant for JetBrains and VS Code check it out! https://github.com/smallcloudai/refact

GitHub

GitHub - smallcloudai/refact: Refact: Open-source Copilot alternati...

Refact: Open-source Copilot alternative with fine-tuning - GitHub - smallcloudai/refact: Refact: Open-source Copilot alternative with fine-tuning

verbal oyster Jul 28, 2023, 3:56 PM

#

Yo can I be part of the group

hasty mountain Jul 28, 2023, 4:02 PM

#

spare briar i would back off of implementing VAEs and work on fundamentals first https://er...

Thanks. It's been quite difficult to find decent content explaining how to implement VAEs.

#

I may need to review Lilian Weng's blog, too pithink

spare briar Jul 28, 2023, 4:05 PM

#

yeah her content is good too

#

have you read this? https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf

#

(I mean the chapters on latent variable inference)

keen kettle Jul 28, 2023, 4:55 PM

#

Hi guys ... all my ML friends I need a bit of advice from you all.

I am working with my professor to conduct research in the fraud detection domain. I've recently made a hybrid ensemble model (using ensemble learning with LR, DT, SGD and NB) and written a paper on the same. It is in review by my professor, and I hope to send it out to journals shortly.

The next aim for our research is to develop a novel machine learning model for credit card fraud detection, any insights on how I should go about implementing the model?

keen kettle Jul 28, 2023, 4:56 PM

#

keen kettle Hi guys ... all my ML friends I need a bit of advice from you all. I am workin...

From what I understand, I need to work on a new classification algorithm / model right? I haven't had a chance to talk regarding this with my professor yet, but I was feeling curious and thought I'd ask the community. 😄

lapis sequoia Jul 28, 2023, 6:27 PM

#

is this a good channel for data visualization questions

#

data vis in this context would be computer graphics based, not like say R based

boreal gale Jul 28, 2023, 6:34 PM

#

lapis sequoia is this a good channel for data visualization questions

data visualisation is one part of the data science workflow, so i would argue yes, feel free to post your question here 🙂

lapis sequoia Jul 28, 2023, 6:36 PM

#

don't have any yet but when I do I will

lapis sequoia Jul 28, 2023, 6:50 PM

#

lapis sequoia data vis in this context would be computer graphics based, not like say R based

Cool. What do you make your visualisations with

hasty mountain Jul 28, 2023, 6:52 PM

#

spare briar have you read this? https://www.microsoft.com/en-us/research/uploads/prod/2006/0...

Nah. I really didn't manage to find any decent theoretical content about VAEs... Looks like I should've searched for latent variables rather than VAEs directly.

civic elm Jul 28, 2023, 10:01 PM

#

keras is awesome just saying

#

pytorch vs keras?

#

what does one offer that is exclusive from the other?

hasty mountain Jul 29, 2023, 2:37 AM

#

civic elm what does one offer that is exclusive from the other?

Pytorch, being a lower-level API, allows you to have a greater control over what happens through your models, but without being as annoying as even lower-level APIs like tensorflow

somber hamlet Jul 29, 2023, 7:49 AM

#

Hello, does someone know a matplotlib colormap that renders well on both white and black backgrounds?

sleek harbor Jul 29, 2023, 9:15 AM

#

somber hamlet Hello, does someone know a matplotlib `colormap` that render well both on white ...

RdBu and twilight_shifted are my go-to ones for dark grey backgrounds. Both are fine for white as well. See https://matplotlib.org/stable/tutorials/colors/colormaps.html for others and experiment

somber hamlet Jul 29, 2023, 9:18 AM

#

sleek harbor `RdBu` and `twilight_shifted` are my go to ones for a dark grey backgrounds. Bot...

Thanks for the ideas, I'll try them but I fear they may be too low contrast. I'm trying to plot something like the image above with many interesting lines, so contrast is top priority

#

I've found a list of colors that are supposed to be high contrast on both dark and white background: https://codepen.io/finnhvman/full/bZQLgR, maybe I'll end up using a custom colormap

#

Another open question I have is which color to choose for the axis labels lemon_thinking

sleek harbor Jul 29, 2023, 9:23 AM

#

somber hamlet Thanks for ideas, I'll try them but I fear they may be too low contrast. I'm try...

I'm just guessing here, but you might want to look at the hue, not cmap. Cmap is for intensity of smth. It would be quite difficult to determine multiple different lines by intensity, not category, like that. But if that is what you want, then try smth like hsv or qualitative colormaps maybe

somber hamlet Jul 29, 2023, 9:26 AM

#

sleek harbor I'm just guessing here, but you might want to look at the hue, not cmap. Cmap is...

I see, I only took cmap in its most basic form, a map of colors. Can you just expand a bit on hue?

sleek harbor Jul 29, 2023, 9:28 AM

#

somber hamlet I see, I only took `cmap` in it's most basic form, a map of colors. Can you just...

My bad, I was actually thinking about seaborn. I see ur using matplotlib..

young granite Jul 29, 2023, 9:32 AM

#

somber hamlet Thanks for ideas, I'll try them but I fear they may be too low contrast. I'm try...

in my opinion generate as groups with max 7 diff. colours otherwise (even if good contrast is given) u dont see anything

sleek harbor Jul 29, 2023, 9:34 AM

#

somber hamlet I see, I only took `cmap` in it's most basic form, a map of colors. Can you just...

If u want less vibrant, calm-ish colors, try these.. but I don't think that's what you want..

import matplotlib.pyplot as plt
from cycler import cycler

def dark_style():
    plt.style.use(["dark_background", "bmh"])
    plt.rcParams["axes.facecolor"] = "#23272e"
    plt.rcParams["figure.facecolor"] = "#23272e"
    plt.rcParams["axes.prop_cycle"] = cycler(
        "color",
        [
            "#1c90d4",
            "#ad0026",
            "#530fff",
            "#429900",
            "#d55e00",
            "#ff47ac",
            "#42baff",
            "#009e73",
            "#fff133",
            "#0072b2",
        ],
    )

dark_style()

young granite Jul 29, 2023, 9:41 AM

#

somber hamlet I see, I only took `cmap` in it's most basic form, a map of colors. Can you just...

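for custom color_ramp:

# imports the snippet appears to rely on (assumed, not in the original message)
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
from colour import Color  # assumed: the `colour` package
import cmasher as cmr  # assumed: `cmr` refers to cmasher

rgb_list = [

]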

def rgb_to_hex(r, g, b):
    return '#{:02x}{:02x}{:02x}'.format(r, g, b)

hex_list = []
for i in rgb_list:
    hex_list.append(rgb_to_hex(i[0], i[1], i[2]))

def make_Ramp(ramp_colors): 
    color_ramp = LinearSegmentedColormap.from_list( 'my_list', [ Color( c1 ).rgb for c1 in ramp_colors ] )
    plt.figure( figsize = (15,3))
    plt.imshow( [list(np.arange(0, len( ramp_colors ) , 0.1)) ] , interpolation='nearest', origin='lower', cmap= color_ramp )
    plt.xticks([])
    plt.yticks([])
    return color_ramp

custom_ramp = make_Ramp(hex_list)

rgb_tuple = cmr.take_cmap_colors(custom_ramp, "color_ticks", return_fmt='hex')

somber hamlet Jul 29, 2023, 9:48 AM

#

thanks for the provided cmap, will try it. FTR here is the native colormap, I'll compare to it

#

It's quite good actually! Probably good enough, thanks!

#

Will try to fiddle a bit around

young granite Jul 29, 2023, 10:18 AM

#

somber hamlet thanks for the provided cmap, will try it. FTR here is the native colormap, I'll...

if they are just x_shifted maybe u can do a gradient rather than a cmap

somber hamlet Jul 29, 2023, 12:34 PM

#

somber hamlet It's quite good actually! Probably good enough, thanks!

Btw, does someone know how to change/edit the color of the margin/border/outline of the figure (in black on light theme, in white on dark theme). Documentation only refers to ticks

#

Ah, I was searching with the wrong term, fig.set_edgecolor is supposed to work

#

no that's not it, it's the edges of the objects drawn pithink

somber hamlet Jul 29, 2023, 12:55 PM

#

edgecolor: The figure patch edge color: https://matplotlib.org/stable/api/figure_api.html#

import matplotlib as mpl
import matplotlib.pyplot as plt

def plt_test():
    fig, ax = plt.subplots()
    fig.set_edgecolor(mpl.colors.to_rgba("#FF6600")) #orange-ish
    plt.show()

yet it's still black, hmm

boreal gale Jul 29, 2023, 12:59 PM

#

somber hamlet > edgecolor: The figure patch edge color: https://matplotlib.org/stable/api/figu...

do you mean these edges?

somber hamlet Jul 29, 2023, 12:59 PM

#

is it called the frameon?

boreal gale Jul 29, 2023, 1:00 PM

#

try ax.spines['left'].set_color('red')

somber hamlet Jul 29, 2023, 1:03 PM

#

yeay, success. ax.spines[:].set_color(mpl.colors.to_rgba("#FF6600")), thanks! research hell

sleek harbor Jul 29, 2023, 1:04 PM

#

somber hamlet yeay, success. `ax.spines[:].set_color(mpl.colors.to_rgba("#FF6600"))`, thanks! ...

Share ur final theme when ur done, if u don't mind. Would be interested to check it out

odd meteor Jul 29, 2023, 4:50 PM

#

keen kettle From what I understand, I need to work on a new classification algorithm / model...

Maybe you can try implementing Cuckoo Search Optimization in your model

#

To whom it may interest.

keen kettle Jul 29, 2023, 4:58 PM

#

odd meteor Maybe you can try implementing Cuckoo Search Optimization in your model

I've heard of it for the first time today. A quick search shows it's quite a new technique. I'll talk it over with my professor, thanks a lot!

#

In the meantime, if you've any other suggestions I'd highly appreciate it 😄

coral field Jul 29, 2023, 5:13 PM

#

how can i incorporate huggingface models into tensorflow for transfer learning? I have google's ViT from huggingface imported, and i want to add another dense layer to identify classes

tulip barn Jul 29, 2023, 5:22 PM

#

Dunno if this is the right place for it but I found this super cool upcoming AI class

humble shore Jul 29, 2023, 7:34 PM

#

tulip barn Dunno if this is the right place for it but I found this super cool upcoming AI ...

is this like targeted at a certain country or state or is it open worldwide

delicate apex Jul 29, 2023, 7:37 PM

#

tulip barn Dunno if this is the right place for it but I found this super cool upcoming AI ...

!rule ad it's not

arctic wedgeBOT Jul 29, 2023, 7:37 PM

#

Rules

6. Do not post unapproved advertising.

noble plover Jul 29, 2023, 8:04 PM

#

Hello, can someone help me with a csv file? I am trying to read the file using pandas.

csv_file = 'recensioni.csv'
df = pd.read_csv(csv_file)

and I get this error pandas.errors.ParserError: Error tokenizing data. C error: Expected 12 fields in line 3, saw 13.
My third line of the .csv file is this one:

"Consiglio vivamente!", "È fantastico, lo adoro!",4,,Alessia,,8481001046362,,,,,
What's causing the error is the comma inside the quotes. The parser treats it as a delimiter, so it thinks the row has one more column, but it is not a delimiter. I found on the internet that you just have to put the field containing the comma inside quotes ( just like this -> "This is not, delimited" ) but it doesn't seem to work. Does anyone have any idea?

serene scaffold Jul 29, 2023, 8:14 PM

#

noble plover Hello, can someone help me with a csv file? I am trying to read the file using p...

try doing df = pd.read_csv(csv_file, sep=r"\s*,\s*") so that spaces before and after a comma (if any) are treated as part of the separator

noble plover Jul 29, 2023, 8:15 PM

#

something changed

#

I get the same error but new information has been added to it:
Error could possibly be due to quotes being ignored when a multi-char delimiter is used.

serene scaffold Jul 29, 2023, 8:15 PM

#

everything changed when the fire nation attacked.

#

oh, great.

left tartan Jul 29, 2023, 8:15 PM

#

noble plover Hello, can someone help me with a csv file? I am trying to read the file using p...

You can do something like:

```py
import pandas as pd
from io import StringIO

datastr = """Date,Name,Value
2023-07-01,Alice,10
2023-07-02,Bob,15,,,
"""

header = pd.read_csv(StringIO(datastr), nrows=1).columns.tolist()
pd.read_csv(StringIO(datastr), usecols=header)
```

#

that ignores any "extra" trailing commas

#

because: it reads the header first, then only looks for those columns.

noble plover Jul 29, 2023, 8:17 PM

#

left tartan You can do something like ```py import pandas as pd from io import StringIO da...

I have a csv file with over 200 lines, can I read it as text file and put it inside datastr right?

left tartan Jul 29, 2023, 8:17 PM

#

You don't need to use StringIO, I just did that so I could provide a standalone example

civic elm Jul 29, 2023, 8:18 PM

#

I finally got my cat/not a cat binary classifier woot!

serene scaffold Jul 29, 2023, 8:18 PM

#

noble plover I have a csv file with over 200 lines, can I read it as text file and put it ins...

unless the lines are really long, 200 lines should be nothing as far as RAM is concerned.

left tartan Jul 29, 2023, 8:18 PM

#

You'd just do:

```py
header = pd.read_csv(filename, nrows=1).columns.tolist()
pd.read_csv(filename, usecols=header)
```

noble plover Jul 29, 2023, 8:18 PM

#

serene scaffold unless the lines are really long, 200 lines should be nothing as far as RAM is c...

yes, they're not that many, but I wouldn't like to insert everything in a string

civic elm Jul 29, 2023, 8:19 PM

#

5 weeks!

noble plover Jul 29, 2023, 8:19 PM

#

left tartan You'd just do ```py header = pd.read_csv(filename, nrows=1).columns.tolist() p...

okay I'll try

#

okay now it's not giving errors

#

but it's splitting where it finds the comma inside the quotes

#

the value that ended up in rating should have been part of body, and the value that ended up in review_date should have been in rating

#

It parsed the quotes instead of the comma

tepid tartan Jul 29, 2023, 8:25 PM

#

The best way to start data science is focus on Linear Math first right?

left tartan Jul 29, 2023, 8:26 PM

#

noble plover okay I'll try

Try to modify my example to reproduce your problem. That's the best way to get help: a reproducible example that someone can work from.

noble plover Jul 29, 2023, 8:27 PM

#

left tartan Try to modify my example to reproduce your problem. That's the best way to get h...

I would do it. But your example in the datastr doesn't contain a field which has a comma inside quotes

#

I have two fields in many rows that contain a comma inside quotes

left tartan Jul 29, 2023, 8:28 PM

#

noble plover I would do it. But your example in the datastr doesn't contain a field which has...

Yes, so modify the example to add a comma in a way that mirrors your data.
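
A reproduction of the quoted-comma problem could look like the sketch below. It uses Python's standard `csv` module instead of pandas so that it runs without extra installs, and the column names and rows are made up for illustration. It assumes there is a space between the delimiter and the opening quote, which is one common reason a quoted comma still gets split; the real file may have a different cause. `pandas.read_csv` accepts a `skipinitialspace` parameter with the same meaning.

```python
import csv
from io import StringIO

# Hypothetical sample data; the real file's columns and rows may differ.
well_formed = 'id,body,rating\n1,"Great, would buy again",5\n'
spaced = 'id,body,rating\n1, "Great, would buy again",5\n'


def parse(text, **kwargs):
    """Parse CSV text into a list of rows using the standard csv module."""
    return list(csv.reader(StringIO(text), **kwargs))


# Quote directly after the delimiter: the comma inside the quotes is kept.
print(parse(well_formed)[1])

# Space before the opening quote: the quote is treated as ordinary text,
# so the comma inside it splits the field and the row gains a column.
print(parse(spaced)[1])

# skipinitialspace=True drops that space, so the quotes are honoured again.
print(parse(spaced, skipinitialspace=True)[1])
```

This also fits the warning shown earlier: a regex separator such as `\s*,\s*` is a multi-character delimiter, and pandas reports that quotes may be ignored in that case.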

noble plover Jul 29, 2023, 8:28 PM

#

I'll try, thank you

tired elk Jul 29, 2023, 8:54 PM

#

hey everyone, im trying to implement this linear regression model for stock prices, but instead of a line I want to plot a more complex curve that is shaped like this data. anyone have any suggestions? also pls let me know if you think that's a bad idea, because I'm a complete beginner. thanks

WhatsApp_Image_2023-07-30_at_01.59.41.jpg

desert oar Jul 29, 2023, 9:30 PM

#

tired elk hey everyone, im trying to implement this linear regression model for stock pric...

how would you draw a straight line on this chart? just intuitively, no math

mild dirge Jul 29, 2023, 9:38 PM

#

tired elk hey everyone, im trying to implement this linear regression model for stock pric...

Maybe draw a line with a confidence interval or something?

#

Like not only predicting the price change, but also the std of the price change

civic elm Jul 29, 2023, 9:51 PM

#

Draw a line using 2 vectors

#

My statistics book would tell me. You have not really explored the data set

desert oar Jul 29, 2023, 11:23 PM

#

mild dirge Maybe draw a line with a confidence interval or something?

the chart to me is basically showing 0 linear relationship, so the best fit line is horizontal. it's also showing a very particular pattern of heteroskedasticity: conditional variance is strongly and monotonically related to volatility... which it had damn well better be because that is usually how volatility is defined.

#

oh that's volume not volatility

#

lol well it's still flat
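
The point about a flat best-fit line with growing spread can be illustrated with synthetic data. The sketch below does not use the data from the chart: it generates points whose mean is zero everywhere but whose noise grows with x, fits an ordinary least-squares line by hand, and compares the variance of y in the lower and upper halves of x. The sample size and noise model are assumptions chosen for illustration.

```python
import random
from statistics import mean, pvariance

random.seed(0)  # fixed seed so the sketch is repeatable

n = 2000
xs = [random.random() for _ in range(n)]
# Noise whose standard deviation grows with x; the true mean is 0 everywhere.
ys = [x * random.gauss(0, 1) for x in xs]

# Ordinary least-squares slope and intercept, computed directly.
mean_x, mean_y = mean(xs), mean(ys)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

# Compare the spread of y for small and large x.
low = [y for x, y in zip(xs, ys) if x < 0.5]
high = [y for x, y in zip(xs, ys) if x >= 0.5]

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"variance of y for x < 0.5:  {pvariance(low):.3f}")
print(f"variance of y for x >= 0.5: {pvariance(high):.3f}")
```

The fitted slope stays close to zero while the variance in the upper half is clearly larger, which is the pattern described above: no linear relationship in the mean, but a strong relationship between x and the spread.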

floral tangle Jul 30, 2023, 12:22 AM

#

Anyone know why the first debug print shows my joined string correctly, but the ValueError ends up tearing the string apart into single characters?

My call:

pos_tags = ['PRON', 'VERB', 'PUNCT']
is_valid = check_sentence(' '.join(pos_tags), grammar)

def check_sentence(sentence, grammar):
    print(f"try sentence: '{sentence}'")
    parser = nltk.ChartParser(grammar)
    try:
        for tree in parser.parse(sentence):
            print("Matching syntax tree:", tree)
            return True
        print("No match found for the defined grammar.")
        return None
    except ValueError as e:
        print("Error while parsing:", e)
        return False

Output:
try sentence: 'PRON VERB PUNCT'
Error while parsing: Grammar does not cover some of the input words: "'P', 'R', 'O', 'N', ' ', 'V', 'E', 'R', 'B', ' ', 'P', 'U', 'N', 'C', 'T'".
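
A likely explanation for the error above: NLTK parsers expect a sequence of tokens, and a Python string is itself a sequence of single characters. Passing the joined string therefore makes the parser see 'P', 'R', 'O', 'N' and so on as the input words. The sketch below shows the difference in plain Python, without NLTK, assuming the grammar's terminals are the tags themselves.

```python
sentence = "PRON VERB PUNCT"

# Iterating over a str yields one character at a time,
# which matches the "words" listed in the error message.
as_characters = list(sentence)

# Splitting on whitespace yields the tokens the grammar presumably covers.
tokens = sentence.split()

print(as_characters[:4])
print(tokens)
```

Under that assumption, calling `parser.parse(sentence.split())` inside `check_sentence`, or passing the `pos_tags` list directly instead of joining it, should give the parser the tokens it expects.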

young granite Jul 30, 2023, 6:04 AM

#

desert oar the chart to me is basically showing 0 linear relationship, so the best fit line...

would go with what u said and maybe suggest SVR for regression but overall i would try to go with different preprocessing lol

bronze flint Jul 30, 2023, 6:06 AM

#

My prof gave me this project to work on, but i assume it was posted quite a bit ago on his website
The project itself contains 10gb of checkpoints and annotations and data which in my opinion is a lot because it was specifically formatted for Mask RCNN or Faster RCNN

This was released when Yolo V4 wasnt even out so i assume Faster RCNN is old now that YOLOV8 exists and it's format

Is Faster RCNN still used, or should I try to find another project that I could use YOLO on?

#

I guess for the sake of practise i could do Faster RCNN

vestal widget Jul 30, 2023, 6:50 AM

#

Can someone explain for me the difference between dataset and language model?

#

I read some articles about both of them but I'm still not really clear on the difference

trail rune Jul 30, 2023, 8:50 AM

#

vestal widget Can someone explain for me the difference between dataset and language model?

Dataset is basically raw data (can be text, video etc) that is used to train machine learning models or any model at all. Language models, on the other hand, are machine learning/statistical models that are trained on text data and are able to generate text based on the dataset they're trained on.
In the context of large language models, the dataset is text gathered from various sources (books, web pages etc) and the language model is trained on this text. It's able to find relationships between the words in the text and can learn to generate more text based on the dataset it's been trained on.

vestal widget Jul 30, 2023, 8:58 AM

#

trail rune Dataset is basically raw data (can be text, video etc) that is used to train mac...

Thanks for the explanation!

zealous badger Jul 30, 2023, 11:52 AM

#

hey um are there any datasets which dont have models trained on them with more than 90% accuracy?

#

it's to do with an assignment: we have to find these "challenging" datasets and try to improve on these scores. i just can't seem to find any. i assume all tabular datasets have models that score >90%

mild dirge Jul 30, 2023, 11:54 AM

#

Well if they have a low accuracy and they are popular, it will be hard to improve on them by yourself

#

But there are pretty old datasets like ImageNet that have barely 91% accuracy even after existing for so long

pine escarp Jul 30, 2023, 12:10 PM

#

Can you guys recommend me beginner machine learning projects?

zealous badger Jul 30, 2023, 12:22 PM

#

mild dirge But there are pretty old datasets like ImageNet that have barely 91% accuracy ev...

yeah i thought about taking it. but i assume its tough to train models on it right?

#

not feasible for an individual

desert oar Jul 30, 2023, 12:50 PM

#

young granite would go with what u said and maybe suggest SVR for regression but overall i wou...

it seems more like there simply is no function relating this x and y

desert oar Jul 30, 2023, 12:51 PM

#

pine escarp Can you guys recommend me beginner machine learning projects?

kaggle titanic and boston housing are hard to go wrong with. very well constructed datasets, lots to explore, lots of articles and blog posts for when you get stuck

buoyant mural Jul 30, 2023, 12:52 PM

#

def scrapingMobilePhones():
    url="https://www.flipkart.com/search?q=mobiles under 50000&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off"
    r=requests.get(url)
    soup=BeautifulSoup(r.text,"html.parser")
    while True:
        np=soup.find("a",class_="1LKTO3").get("href")
        cnp="https://www.flipkart.com"+np
        return cnp
        #url=cnp
        #r=requests.get(url)
        #soup=BeautifulSoup(r.text,"html.parser")

print(scrapingMobilePhones())

Mobiles Under 50000- Buy Products Online at Best Price in India - A...

Shop for electronics, apparels & more using our Flipkart app Free shipping & COD.

#

help me with it

#

import requests
from bs4 import BeautifulSoup
import pandas as pd

def scrapingMobilePhones():
    url="https://www.flipkart.com/search?q=mobiles under 50000&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off"
    r=requests.get(url)
    soup=BeautifulSoup(r.text,"html.parser")
    while True:
        np=soup.find("a",class_="1LKTO3").get("href")
        cnp="https://www.flipkart.com"+np
        return cnp
        #url=cnp
        #r=requests.get(url)
        #soup=BeautifulSoup(r.text,"html.parser")

print(scrapingMobilePhones()) #why I am not being able to get the href link from the class 1LKTO3

Mobiles Under 50000- Buy Products Online at Best Price in India - A...

Shop for electronics, apparels & more using our Flipkart app Free shipping & COD.

#

I am trying web scraping and I am complete beginner in it

#

fix the problem

#

I fixed it no worries

cinder schooner Jul 30, 2023, 1:30 PM

#

Hello, can anyone review this part of my resume ? I'm feeling like it may be too long or have too much detail, but I don't know how to rewrite it. I know maybe it should be in career-discussion, but it needs AI expertise to answer so I thought I'd write here

past meteor Jul 30, 2023, 2:16 PM

#

cinder schooner Hello, can anyone review this part of my resume ? I'm feeling like it may be too...

Way too much detail

cinder schooner Jul 30, 2023, 2:16 PM

#

i know but what should i remove

past meteor Jul 30, 2023, 2:16 PM

#

Most of it

unborn adder Jul 30, 2023, 2:17 PM

#

1st point, 3rd point, 5th point

past meteor Jul 30, 2023, 2:17 PM

#

Most companies have a prescreen by HR, for them that's just unreadable

cinder schooner Jul 30, 2023, 2:18 PM

#

unborn adder 1st point, 3rd point, 5th point

i remove those or i keep those?

unborn adder Jul 30, 2023, 2:18 PM

#

remove those

#

keep just 2nd and 4th

#

those others aren't saying much

cinder schooner Jul 30, 2023, 2:19 PM

#

how about this version

#

Development of a Real-time ultrasound image analysis system for an Nvidia Jetson Nano edge device.

Refactored the codebase, profiled and analyzed the model's architecture, hardware optimization and deployment to diagnose latency and overheating issues.
Developed a lighter, extensible deep multi-task model that competes with the baseline segmentation model while decreasing inference time and halving loading time.

past meteor Jul 30, 2023, 2:20 PM

#

Why add NVIDIA Jetson nano

#

Less is more, take it out. It detracts from the rest

unborn adder Jul 30, 2023, 2:21 PM

#

I think you are trying to impress them by fancy words..that won't happen haha, they know what they are looking for, list them 3,4 bullet points you think you can speak upon and that's it...they are not going to be "impressed" by your CV, they actually don't care..trust me

cinder schooner Jul 30, 2023, 2:24 PM

#

I'm really trying to convey what I worked on, so let me explain and you can tell me how to say it.
First I refactored the code base. Then I analyzed the model and profiled it with CUDA tools to find out why inference was taking so long. I found that, since it's a segmentation model followed by a calculation of bounding boxes from the segmentation masks, there's a big bottleneck in the bounding box calculation. I then replaced it with a multi-task model that does classification and bounding box regression. Since it's a direct bounding box regression, there's no need for the post-processing and thus no bottleneck, and since it's only one model we didn't have to load two models on the edge device.
And since it's a multi-task model it's extensible, so we're adding other auxiliary tasks like calculating surface and angles, etc.

past meteor Jul 30, 2023, 2:25 PM

#

I'm on mobile otherwise I'd have written my take on it.

To me it's a bit like academic writing: no superfluous language. Why? They're reading tons of CVs and they won't read it in detail.

Secondly, it's about being able to impress HR, technical people and business people with one document. Too much of what you have doesn't speak to HR or business. Even if it's a DS they might not be in your niche

cinder schooner Jul 30, 2023, 2:27 PM

#

that's why i asked, i'm really convinced its long and complexe but I really also want someone who after the HR that works on computer vision can know what I did. I really worked hard on this and since i'm a graduate i don't have many experiences on the computer vision part most of them are pure software engineering so i'm focusing on this

past meteor Jul 30, 2023, 2:27 PM

#

I don't think detailed explanations of what you did are ever relevant

unborn adder Jul 30, 2023, 2:29 PM

#

cinder schooner that's why i asked, i'm really convinced its long and complexe but I really also...

my experience so far:
-Refactored code base
-Analyzed and profiled model with CUDA tools
-Identified bottleneck
-Replaced with multi-task model for classification and bounding box regression
-Added auxiliary tasks for extensibility

I'm willing to learn more about it and can't wait to work in rapid growing environment, I'm the right choice for you,
my pleasure,
-Belga

#

.
that's it..nothing else...

#

I haven't failed any interview or job proposal I ever did, I aced everything so far..plus..pls don't do the cold email approach, that's the worst, hit them with that on LinkedIn or something...or if you have to do the email approach, mention their company and don't sound like you cold emailed 10 other companies too..this should work

cinder schooner Jul 30, 2023, 2:30 PM

#

unborn adder my experience so far: -Refactored code base -Analyzed and profiled model with CU...

what i'm struggling with is that I worked on the model but also on the optimization on ONNX and tensorrt deployment

past meteor Jul 30, 2023, 2:31 PM

#

Development of system for real time fetal ultrasound image analysis:

Development of a neural network based system to classify and locate fetuses.
Improved time-to-prediction and performance of the previous model.
Deployed the model on specialised hardware in the field.

cinder schooner Jul 30, 2023, 2:31 PM

#

and now I really know onnx and played a lot with it

past meteor Jul 30, 2023, 2:31 PM

#

Notice how I use time to prediction instead of inference time

cinder schooner Jul 30, 2023, 2:31 PM

#

Thank you really much guys

past meteor Jul 30, 2023, 2:31 PM

#

That term doesn't even exist but everyone can understand what it means

unborn adder Jul 30, 2023, 2:31 PM

#

cinder schooner what i'm struggling with is that I worked on the model but also on the optimizat...

but don't think they will be impressed by it...do you even know if that company is working on that and need someone to work on that technology?

cinder schooner Jul 30, 2023, 2:31 PM

#

sorry if you're a woman, i always say guys

past meteor Jul 30, 2023, 2:32 PM

#

HR and business don't care about onnx

unborn adder Jul 30, 2023, 2:32 PM

#

that's right

past meteor Jul 30, 2023, 2:32 PM

#

That's 2/3rd of your audience

unborn adder Jul 30, 2023, 2:33 PM

#

I hope you did some research on their company, the technologies they are working on, and who they are working with...because if you mention anything unrelated to that...they are not interested

past meteor Jul 30, 2023, 2:33 PM

#

If I'm hiring I'm more interested in seeing if you can solve the problem. Why? Maybe inference on edge isn't even important, maybe you can just call the model from some API

#

ONNX is a very specific niche I think

unborn adder Jul 30, 2023, 2:34 PM

#

unless you KNOW for sure they are working with ONNX, don't mention it

past meteor Jul 30, 2023, 2:34 PM

#

(We do inference on edge but we just use a container and NVIDIA machines, they run it in exactly the same way our desktops run the models so no ONNX or tflite)

cinder schooner Jul 30, 2023, 2:35 PM

#

yeah i didn't mention onnx but thought about that hardware optimization part

unborn adder Jul 30, 2023, 2:35 PM

#

maybe they are working on something else and hate that...for example, maybe someone is using PyTorch and hates TensorFlow, and if you mention that you are good at TensorFlow and played a lot with it, firstly they won't care, and secondly they might reject you just because of that...it's complicated

cinder schooner Jul 30, 2023, 2:35 PM

#

that's what important for edge ai

#

thank you very much, i understand

unborn adder Jul 30, 2023, 2:35 PM

#

any time...good luck with that!

somber hamlet Jul 30, 2023, 2:37 PM

#

sleek harbor Share ur final theme when ur done, if u don't mind. Would be interested to check...

Hey Mayushii, I've compiled my results here: https://github.com/kraktus/cosmopolitan-colormap since you were interested

unborn adder Jul 30, 2023, 2:37 PM

#

any good ML papers to recommend? I don't know who to trust xd

past meteor Jul 30, 2023, 2:39 PM

#

I mean, what type of paper?

unborn adder Jul 30, 2023, 2:39 PM

#

I don't know if this is a good answer but I'm interested in computer vision

#

I have never read them..any of them..but I want to

#

is there that type of paper? on computer vision? or anything related

past meteor Jul 30, 2023, 2:42 PM

#

I'd just read dive into deep learning

#

It lists seminal papers in computer vision so you can just read those then if you want more details

unborn adder Jul 30, 2023, 2:43 PM

#

alright, any specific papers on that you would recommend?

past meteor Jul 30, 2023, 2:44 PM

#

I'd just go with a book because papers expect you to have certain prerequisite knowledge and CV is mature enough to have books that take you from A to Z

unborn adder Jul 30, 2023, 2:44 PM

#

past meteor I'd just go with a book because papers expect you to have certain prerequisite k...

oh okay, I think I will start with grokking series then, just for the start

cinder schooner Jul 30, 2023, 2:53 PM

#

unborn adder I don't know if this is a good answer but I'm interested in computer vision

you just want to read any paper on anything related to computer vision?

tired elk Jul 30, 2023, 2:53 PM

#

desert oar how would you draw a straight line on this chart? just intuitively, no math

alright thank you

pine escarp Jul 30, 2023, 2:54 PM

#

desert oar kaggle titanic and boston housing are hard to go wrong with. very well construct...

Thank you, I'll try them.

sleek harbor Jul 30, 2023, 2:56 PM

#

somber hamlet Hey Mayushii, I've compiled my results here: https://github.com/kraktus/cosmopol...

👍 I like mine best :3

P.s. they aren't exactly picked by hand tho. I ripped the colors from the bmh theme, and configured the color intensity (saturation) in such a way, so that on my specific background (the one I used up there, which is, btw, also ripped from the background color of the Dark One Pro Darker vs code theme, which is what I use), it looks best (to my eyes), when there are overlapping semi-transparent elements (such as histograms). So I get graphs that seems as if they have no background at all (cus the background of the graphs is the same as my editor theme), but others will just get a pleasant dark grey background with calm-ish colors that mix well on the background, as well as when elements are transparent. It was never meant to work well on a light background tho, never even considered that

cinder schooner Jul 30, 2023, 2:56 PM

#

unborn adder alright, any specific papers on that you would recommend?

What I do is read up on something in particular and then build on that. For example, I was working on object detection, so I found that there are single-shot detectors that predict directly and two-stage detectors. I started reading the papers that shaped both: I read the papers about the versions of YOLO and what each one introduced, then the papers about RCNN and its versions, so I understood more about each type and the differences. Then I started reading about the different loss functions used and the versions of the IoU loss that were built each time. Then I read about tuning these models and choosing the hyperparameters, and then I built something to try.

somber hamlet Jul 30, 2023, 3:07 PM

#

sleek harbor 👍 I like mine best :3 P.s. they aren't exactly picked by hand tho. I ripped th...

I see, thanks for the details. Gave you credit for the colormap btw

vestal widget Jul 30, 2023, 3:14 PM

#

Im using nano-gpt, is it possible to create a language model for chatbot from my own dataset i give it?

serene scaffold Jul 30, 2023, 3:38 PM

#

vestal widget Im using nano-gpt, is it possible to create a language model for chatbot from my...

it should be possible to continue training the model on your own data, yes. this is called fine-tuning.

median fulcrum Jul 30, 2023, 4:15 PM

#

Hi guys, I don't see much discussion about the Jupyter Notebook 7 migration, so I thought it would be cool to talk about it here

#

the strange think is that there's not a lot of issues in github repos saying that the extension is not working properly

tepid tartan Jul 30, 2023, 4:26 PM

#

https://www.udemy.com/course/core-data-science-and-machine-learning/
I found this, let me know if it's worth taking after I've finished my linear algebra and prob/stats/calc math

Udemy

2023 CORE: Data Science and Machine Learning

A complete survey of all core skills required on the job

median fulcrum Jul 30, 2023, 4:27 PM

#

median fulcrum the strange think is that there's not a lot of issues in github repos saying tha...

themes

#

also

void veldt Jul 30, 2023, 5:09 PM

#

is this where I would ask questions regarding data fitting and scipy minimize?

serene scaffold Jul 30, 2023, 5:09 PM

#

void veldt is this where I would ask questions regarding data fitting and scipy minimize?

yes

void veldt Jul 30, 2023, 5:10 PM

#

so I posted my question on SO since easier to format code there, but in short just trying to confirm my code is setup properly

#

had a quick question regarding differences between LSMR and LMFIT using my setup, I appear to be getting different solutions but don't quite understand why: https://stackoverflow.com/questions/76798827/lmfit-vs-lsmr-am-i-getting-different-fits-due-to-machine-precision

Stack Overflow

LMFIT vs. LSMR am I getting different fits due to machine precision?

I've been playing around with using LMFIT vs. LMSR for fitting of linear systems. The coefficients of this system are in fact derived from global parameters that are also being fit (i.e. I have a n...

#

Based on my understanding, my data is just trash and due to the differences between how the solvers work (Nelder-Mead versus Levenberg-Marquardt), I arrive at different solutions due to the minima being within machine precision

#

like with good quality data, with the setup I have, I should arrive to the same solution. But since my data is trash, that is why I am observing the divergence

tepid tartan Jul 30, 2023, 5:24 PM

#

tepid tartan https://www.udemy.com/course/core-data-science-and-machine-learning/ I found thi...

@serene scaffold you think helps?

lapis sequoia Jul 30, 2023, 6:59 PM

#

hello, I wanted to know whether Universal Sentence Encoder uses the GPU or not?

#

hey guys please help me

grim hearth Jul 30, 2023, 7:10 PM

#

lapis sequoia hey guys please help me

you failed an import

lapis sequoia Jul 30, 2023, 7:10 PM

#

how do I fix it

#

the diffusion.py is there

#

it's not my code I am using : tortoise-tts-fast

void veldt Jul 30, 2023, 7:24 PM

#

lapis sequoia how do I fix it

samplers is not a function or class, what exactly r u trying to import here?

#

You can see what samplers is here: https://github.com/152334H/tortoise-tts-fast/blob/main/tortoise/utils/diffusion.py

GitHub

tortoise-tts-fast/tortoise/utils/diffusion.py at main · 152334H/tor...

Fast TorToiSe inference (5x or your money back!). Contribute to 152334H/tortoise-tts-fast development by creating an account on GitHub.

tidal bough Jul 30, 2023, 7:27 PM

#

void veldt samplers is not a function or class, what exactly r u trying to import here?

look at the traceback path; it's not their code - it's part of the same package.

void veldt Jul 30, 2023, 7:50 PM

#

tidal bough look at the traceback path; it's not their code - it's part of the same package.

oh...weird

cosmic lynx Jul 30, 2023, 9:14 PM

#

a few questions:

1. how far of a jump is a digit reading AI to something that could play nim
2. to make a game AI, would I need to learn another language aside from Python?

mild dirge Jul 30, 2023, 9:18 PM

#

cosmic lynx a few questions: 1. how far of a jump is a digit reading AI to something that c...

With "game AI" you probably need to look into reinforcement learning, which imo is one of the more complex fields in AI.

cosmic lynx Jul 30, 2023, 9:31 PM

#

in that case, what would be a better next step?

woeful fiber Jul 30, 2023, 9:50 PM

#

Has anyone seen a nice example repo for yolov7 object tracking in mp4 files

#

I have a little project idea for object tracking in video, and since it’s kinda the thing deep learning is recognized for, I figured there would be a lot of material for it

civic elm Jul 30, 2023, 9:52 PM

#

Anyone here working in a large company? what is the stack like? aws? azure?

#

What about the data extraction?

cosmic lynx Jul 30, 2023, 10:02 PM

#

civic elm Anyone here working in a large company? what is the stack like? aws? azure?

I would strongly suspect that it would largely depend on the company and industry

civic elm Jul 30, 2023, 10:05 PM

#

cosmic lynx I would strongly suspect that it would largely depend on the company and industr...

I just need info to build my cv

soft dock Jul 30, 2023, 10:52 PM

#

civic elm I just need info to build my cv

It still heavily depends on the industry you find yourself in and the company you'll be working for. It also depends on the subdiscipline of data science you'd like to go into. In my opinion, simply working on projects demonstrates a lot of your skills. Scrape ugly data from a website you're interested in, scrub it until it can fit into a model, and present your data analysis with visuals. It's even better if you can make a sort of dashboard app with it, and even better if you use a bit of DevOps to optimize how the app is shared/deployed. The specific packages and software don't matter as much as the results, because if you can do it once with x software you can be trained by your company to do it again with y software.

upper flame Jul 30, 2023, 11:12 PM

#

hey

#

does anyone understand a lil bit in finance

serene scaffold Jul 30, 2023, 11:26 PM

#

upper flame does anyone understand a lil bit in finance

Always ask your actual question right away. Don't ask to ask.

void veldt Jul 30, 2023, 11:42 PM

#

soft dock It still heavily depends on the industry you find yourself in and the company yo...

this. The underlying theory is the same regardless of language. Once you know the underlying theory behind how things r done, you know what the right questions r to ask and how to easily find answers to apply them using different languages and formats

frank helm Jul 31, 2023, 12:37 AM

#

Need a bit help with course selection.

I completed self studying calculus 1 and 2. My plan is to do calc 3 w/ probability, and linear algebra w/ statistics after that.

However, MITOCW's probabilistic systems and applied probability has been rather difficult for me. One of my good friends recommended Georgia Tech's probability course.

So my question is are the following two courses enough to get me started with Data Science?

Georgia Tech's Probability https://www2.isye.gatech.edu/~sman/courses/6739/
also available on edx: https://www.edx.org/professional-certificate/gtx-probability-random-variables
Statistics for applications: https://ocw.mit.edu/courses/18-650-statistics-for-applications-fall-2016/video_galleries/lecture-videos/

OR https://ocw.mit.edu/courses/6-041sc-probabilistic-systems-analysis-and-applied-probability-fall-2013/pages/resource-index/ is a must? I am currently taking this right now and I am not a huge fan of the psets. Its way too hard and so are the recitations and since there aren't any easy problems the learning curve is way too steep. I am also in a time crunch so its hard for me to go out of my way to research new things.

Georgia's probability also has stats in the course. But its preferable I take statistics for applications as well right?

vestal widget Jul 31, 2023, 1:25 AM

#

I read some articles online said that stuff like GPT-3 and GPT-4 is a language model. So I wanna ask: is the language model the code itself, or is it some kind of data that helps the code determine the output?

serene scaffold Jul 31, 2023, 1:46 AM

#

vestal widget I read some articles online said that stuff like GPT-3 and GPT-4 is a language m...

the language model is based on data about how words are used, and then it's used in the code.

bronze flint Jul 31, 2023, 9:41 AM

#

Hello,
I am converting COCO format to YOLO format and i am normalizing BBOX data to be between 0 and 1
When i did 1 epoch to just test if labels were correct, YOLO kept drawing rectangles off the actual coordinates

When i tested if i normalized coordinates correctly it worked locally

import cv2

img = cv2.imread('vid_000031_frame0000043.jpg')

x = round(190.00196078431372/img.shape[1],6)
y = round(132.00196078431372/img.shape[0],6)
w = round(116.99607843137255/img.shape[1],6)
h = round(20.996078431372553/img.shape[0],6)

print(x,y,w,h)

x_pixel = int(x*img.shape[1])
y_pixel = int(y*img.shape[0])
w_pixel = int(w*img.shape[1])
h_pixel = int(h*img.shape[0])

print(x_pixel,y_pixel,w_pixel,h_pixel)
print(img.shape)
cv2.rectangle(img, (x_pixel,y_pixel), (x_pixel+w_pixel,y_pixel+h_pixel),(255,0,0),4)

It drew it well
I normalized and got it back to original state and everything worked well

I am unsure if YOLO is doing something badly or if i normalized data badly for the YOLO format
All i did is scale it based on image height and width as u can see here

x = round(190.00196078431372/img.shape[1],6)
y = round(132.00196078431372/img.shape[0],6)
w = round(116.99607843137255/img.shape[1],6)
h = round(20.996078431372553/img.shape[0],6)
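
One possible cause, offered as an assumption since the label files are not shown: COCO boxes are stored as [x_min, y_min, width, height] with the origin at the top-left corner of the box, while the YOLO label format expects the normalized box center followed by the normalized width and height. Dividing x and y by the image size without shifting to the center would round-trip fine in the local test above, because that test also draws from the top-left corner, but YOLO would read those values as a center and draw the box shifted up and to the left. A minimal conversion sketch, with a hypothetical function name and made-up numbers:

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Convert a COCO box [x_min, y_min, w, h] in pixels to a YOLO box
    (x_center, y_center, w, h) normalized to the range 0..1."""
    x_min, y_min, w, h = bbox
    x_center = (x_min + w / 2) / img_w  # shift from left edge to center
    y_center = (y_min + h / 2) / img_h  # shift from top edge to center
    return (
        round(x_center, 6),
        round(y_center, 6),
        round(w / img_w, 6),
        round(h / img_h, 6),
    )


# Made-up example: a 200x100 box whose top-left corner is at (100, 50)
# in a 400x200 image sits exactly in the middle of the image.
print(coco_to_yolo([100, 50, 200, 100], img_w=400, img_h=200))
```

Note that `img.shape` from OpenCV is (height, width, channels), so the width is `img.shape[1]` and the height is `img.shape[0]`, as in the snippet above.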

lapis sequoia Jul 31, 2023, 10:46 AM

#

@sonic vapor offering money for services. Against rules.

north rain Jul 31, 2023, 10:51 AM

#

@zealous ermine You'd be better off looking for paid work on a freelance site like fiverr or upwork, we don't allow the solicitation of paid work here

#

!rule 9

arctic wedgeBOT Jul 31, 2023, 10:53 AM

#

Rules

9. Do not offer or ask for paid work of any kind.

slim bone Jul 31, 2023, 11:49 AM

#

Hey fellas, need a quick check on my understanding - Is "Label" the attribute which classifies what kind of data we fed the machine?
So, when we run some data through a neural network, we want it to, ideally, output label - and when the network does(?) back propagation it calculates the cost relative to the label (which essentially tells it the optimal outcome)?

I apologize if the explanation is unclear, I can rephrase if need be

serene scaffold Jul 31, 2023, 12:39 PM

#

slim bone Hey fellas, need a quick check on my understanding - Is "Label" the attribute wh...

a classification model is one that takes things and says which class they belong to. Like if you have a model that takes pictures, and it can tell you if the picture is of a cat or of a dog, then the classes are "cat" and "dog". and then your training and test instances are labeled as "cat" or as "dog".

#

> and when the network does(?) back propagation it calculates the cost relative to the label (which essentially tells it the optimal outcome)?
your understanding is missing some steps. you can't calculate "the cost relative to a label". a label is a symbol, not a number that you can do math/calculations with.

#

@slim bone let me know when you're here and we can go into more detail.

slim bone Jul 31, 2023, 12:42 PM

#

I’m internalizing, I’ll ping you in a moment

serene scaffold Jul 31, 2023, 12:42 PM

#

okie

slim bone Jul 31, 2023, 12:42 PM

#

I appreciate the detailed explanation

#

and then your training and test instances are labeled as "cat" or as "dog".
This is indeed what I thought labels were initially, but I'm following this link at the moment:
https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html
And the label (variable: labels) appears to be an array. What does this array represent exactly?

a label is a symbol
I'm assuming the first question would answer this

@serene scaffold Forgot to mention, I'm free to chat now

serene scaffold Jul 31, 2023, 12:56 PM

#

slim bone > and then your training and test instances are labeled as "cat" or as "dog". Th...

you're right that labels (the variable) are an array/tensor. when the labels are discrete, non-numeric symbols, a popular way to represent them as arrays is one-hot encoding. So you might decide that [1, 0] is the array for "dog", and [0, 1] is the array for "cat".

And if you have a sequence of images that are [cat, cat, dog], the representation would be

[[0, 1],
[0, 1],
[1, 0]]

does that make sense so far, @slim bone?

#

if you had a third class, like "turtle", then those could be [0, 0, 1], and you'd have to add an extra zero to "dog" and "cat" (because now there's three classes, not two)
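
A minimal sketch of the one-hot idea above, in plain Python. The `one_hot` helper and the class order (dog first, then cat) are assumptions picked to match the arrays in the messages; none of this comes from the linked tutorial.

```python
# Hypothetical helper: map a symbolic label to a one-hot list.
def one_hot(label, classes):
    """Return a list with a 1 at the label's index and 0 elsewhere."""
    return [1 if c == label else 0 for c in classes]

classes = ["dog", "cat"]              # assumed order: dog -> [1, 0], cat -> [0, 1]
batch = ["cat", "cat", "dog"]
encoded = [one_hot(lbl, classes) for lbl in batch]
print(encoded)                        # [[0, 1], [0, 1], [1, 0]]

# Adding a third class makes every vector one element longer.
classes3 = ["dog", "cat", "turtle"]
print(one_hot("turtle", classes3))    # [0, 0, 1]
print(one_hot("dog", classes3))       # [1, 0, 0]
```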

slim bone Jul 31, 2023, 12:57 PM

#

serene scaffold you're right that `labels` (the variable) are an array/tensor. when the labels a...

And if you have a sequence of images that are [cat, cat, dog]
Is this a dataset, essentially? And the entire dataset gets a single label?

#

Also, on that note - can a desired outcome be [0.5,0.5,0] for example? (Probably not in this model, but perhaps a different one?)

serene scaffold Jul 31, 2023, 12:58 PM

#

slim bone > And if you have a sequence of images that are [cat, cat, dog] Is this a datase...

no, each image gets its own label. [cat, cat, dog] is a slice of three images from the whole dataset. hopefully you have more than three 😄

serene scaffold Jul 31, 2023, 12:59 PM

#

slim bone Also, on that note - can a desired outcome be `[0.5,0.5,0]` for example? (Probab...

there might be different cases where you want that. but in this case, an output of [.5, .5, 0] would mean "there's a 50% chance this is a dog and a 50% chance this is a cat, but I'm not sure beyond that". but you want the model to produce a specific answer.
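
To make the "cost" part concrete: the label itself is a symbol, but once it is one-hot encoded it can be compared with the probabilities the model outputs. A rough sketch using cross-entropy, which is one common loss for classification. The chat does not say which loss is used, so that choice is an assumption, as is the class order (dog, cat, turtle).

```python
import math

def cross_entropy(target, predicted):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    eps = 1e-12  # avoids log(0)
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

dog = [1, 0, 0]                      # one-hot label for "dog"
unsure = [0.5, 0.5, 0.0]             # "50% dog, 50% cat"
confident = [0.9, 0.05, 0.05]        # mostly sure it is a dog

loss_unsure = cross_entropy(dog, unsure)
loss_confident = cross_entropy(dog, confident)
print(loss_unsure, loss_confident)   # the unsure prediction gets the higher loss
```

The lower loss for the confident (and correct) prediction is what training pushes the model towards.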

slim bone Jul 31, 2023, 1:00 PM

#

serene scaffold no, each image gets its own label. [cat, cat, dog] is a slice of three images fr...

Ah, maybe I don't entirely understand what a dataset is
Why would you want a sequence of images? To quicken the computation of er... the correction of the weights and biases? (Gradient descent?)

I apologize for the faulty terminology this is all still very new to me

slim bone Jul 31, 2023, 1:01 PM

#

serene scaffold there might be different cases where you want that. but in this case, an output ...

Right, right
Was just wondering about that

serene scaffold Jul 31, 2023, 1:05 PM

#

slim bone Ah, maybe I don't entirely understand what a dataset is Why would you want a seq...

in this case, a dataset is all the images you have to train and test your model.

Why would you want a sequence of images? To quicken the computation of er... the correction of the weights and biases?

this is getting into the batch size. which is a hyperparameter. the batch size is the number of training instances, and the model's current outputs on those instances, that the model considers at a time when calculating the direction of the gradient.
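
A small sketch of what "batch size" means in practice: the training data is walked through in chunks of that size. The `batches` helper and the seven made-up labels are assumptions for illustration only.

```python
def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

labels = ["cat", "cat", "dog", "dog", "cat", "dog", "cat"]  # made-up dataset
chunks = list(batches(labels, batch_size=3))
print(chunks)  # [['cat', 'cat', 'dog'], ['dog', 'cat', 'dog'], ['cat']]
```

In a real training loop each chunk would produce one gradient estimate and one weight update.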

slim bone Jul 31, 2023, 1:06 PM

#

serene scaffold in this case, a dataset is all the images you have to train and test your model....

Ah, so the dataset is all of the data you're training the machine on
I think I understand the rest of your explanation

serene scaffold Jul 31, 2023, 1:06 PM

#

slim bone Ah, so the dataset **all** of the data you're training the machine on I think I ...

you don't use the whole dataset for training. you have to keep part of it aside for testing.
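
A minimal sketch of holding part of the dataset aside, in plain Python. The helper name, the 80/20 ratio and the fixed seed are all assumptions; libraries such as scikit-learn ship their own version of this.

```python
import random

def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

dataset = list(range(10))            # stand-in for 10 labeled images
train, test = train_test_split(dataset)
print(len(train), len(test))         # 8 2
```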

slim bone Jul 31, 2023, 1:07 PM

#

serene scaffold you don't use the whole dataset for training. you have to keep part of it aside ...

Ah, that makes perfect sense. I didn't think about that

slim bone Jul 31, 2023, 1:08 PM

#

serene scaffold in this case, a dataset is all the images you have to train and test your model....

So uhm, if I could ask about the example I linked earlier specifically - the reason you might want an array with a thousand variables, is because each batch is "of size 1000"? Like, in the example you mentioned earlier, its a thousand images where each image contains either a cat, dog or turtle?

serene scaffold Jul 31, 2023, 1:12 PM

#

slim bone So uhm, if I could ask about the example I linked earlier specifically - the rea...

it looks to me like labels = torch.rand(1, 1000) is intended to represent all the y data for the dataset.

do you know the difference between X and y data?

slim bone Jul 31, 2023, 1:13 PM

#

serene scaffold it looks to me like `labels = torch.rand(1, 1000)` is intended to represent all ...

I'm unfamiliar with the terms x and y data, should I read up on those real quick?

serene scaffold Jul 31, 2023, 1:15 PM

#

in this case, the y data is the label for the image, and the X data is the image itself

slim bone Jul 31, 2023, 1:15 PM

#

serene scaffold in this case, the y data is the label for the image, and the X data is the image...

Oh, can you say that the dataset is made of x-data and that y-data are the possible outcomes?

serene scaffold Jul 31, 2023, 1:16 PM

#

slim bone Oh, can you say that the dataset is made of x-data and that y-data are the possi...

I guess that's one way of putting it.

slim bone Jul 31, 2023, 1:16 PM

#

If not, you don't have to explain. I'll just read on it later - I don't want to waste your time

#

Alright

serene scaffold Jul 31, 2023, 1:16 PM

#

You are not wasting my time.

#

Using it? yes. but not wasting it.

slim bone Jul 31, 2023, 1:17 PM

#

Apologies, I meant - I don't want to waste your time if I can probably read on this myself later
If you're still happy to spare an explanation I'll gladly take it obviously

#

Currently, I'm just trying to understand this little labels array

serene scaffold Jul 31, 2023, 1:17 PM

#

the page you're looking at doesn't show a whole training procedure.

slim bone Jul 31, 2023, 1:18 PM

#

Right

serene scaffold Jul 31, 2023, 1:18 PM

#

but if you had an image classifier for cats and dogs, a batch of three images would look like this for the y data

[[0, 1],
[0, 1],
[1, 0]]

slim bone Jul 31, 2023, 1:18 PM

#

Right, so far so good

serene scaffold Jul 31, 2023, 1:19 PM

#

and a batch of three 64-by-64 pixel colored images would be an array or tensor of shape (3, 3, 64, 64)

#

(the first three is three images, and the second three is red-green-blue)
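
The two shapes side by side, as a NumPy sketch. The arrays are filled with zeros because only the shapes matter here, and the label order (dog is [1, 0], cat is [0, 1]) is carried over from the earlier messages.

```python
import numpy as np

# Three 64x64 RGB images: (batch, channels, height, width).
X = np.zeros((3, 3, 64, 64))

# One-hot labels for [cat, cat, dog].
y = np.array([[0, 1],
              [0, 1],
              [1, 0]])

print(X.shape)  # (3, 3, 64, 64)
print(y.shape)  # (3, 2)
```

The first axis lines up: row i of `y` is the label for image i of `X`.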

slim bone Jul 31, 2023, 1:20 PM

#

Makes perfect sense

#

In the example though, data is only a single image
So wouldn't you only need a single label?

Wait actually, that is a single label, isn't it? It's just that there's "a thousand y-data"?

#

Or at least,* a thousand possible outcomes?

serene scaffold Jul 31, 2023, 1:21 PM

#

slim bone In the example though, `data` is only a single image So wouldn't you only need a...

the 1000 probably means a thousand instances. and if there's only one value per instance, it's probably some probability between 0 and 1

#

it might be that this is a binary classifier. a binary classifier for pictures of cats would give you the probability that the picture is of a cat
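
For what it's worth, there is another reading of the linked tutorial. It builds `data = torch.rand(1, 3, 64, 64)` and `labels = torch.rand(1, 1000)` for a pretrained ResNet-18, and that model outputs 1000 class scores per image. Under that reading the 1 is the batch size (a single image) and the 1000 is one value per class, rather than a thousand instances. A shape-only sketch, with NumPy standing in for torch purely to keep the snippet light (that substitution is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((1, 3, 64, 64))   # one image, 3 channels, 64x64 pixels
labels = rng.random((1, 1000))      # one row for that image, one value per class

n_images, n_classes = labels.shape
print(n_images, n_classes)          # 1 1000
```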

slim bone Jul 31, 2023, 1:23 PM

#

serene scaffold the 1000 probably means a thousand instances. and if there's only one value per ...

What does "instance" mean in this context?

serene scaffold Jul 31, 2023, 1:23 PM

#

slim bone What does "instance" mean in this context?

an image

slim bone Jul 31, 2023, 1:23 PM

#

Would you reckon I should even get hung up on this piece of code? If I think I understand your explanation for labels at least?

#

I'm just trying to dip my toes and familiarize myself with the terminology, so I can eventually read the Pytorch documentation without drowning

#

I'm about to enter my second year as a CS student and figured this would be a nice summer project

#

Considering I have some of the mathematical background, although probably very little of it.

serene scaffold Jul 31, 2023, 1:25 PM

#

slim bone Would you reckon I should even get hung up on this piece of code? If I think I u...

I think that article is mostly about the math of gradient descent. if you want to understand the code, you should look at something that contains a whole experiment

slim bone Jul 31, 2023, 1:25 PM

#

serene scaffold I think that article is mostly about the math of gradient descent. if you want t...

Makes sense. Any recommendations for where to find such experiments?

serene scaffold Jul 31, 2023, 1:26 PM

#

slim bone Makes sense. Any recommendations for where to find such experiments?

not off the top of my head

slim bone Jul 31, 2023, 1:26 PM

#

Should I just look up some papers? Or are you referring to something a little more basic

slim bone Jul 31, 2023, 1:26 PM

#

serene scaffold not off the top of my head

Ah, not a problem
I appreciate your help and patience. Thank you

left tartan Jul 31, 2023, 1:28 PM

#

slim bone Ah, not a problem I appreciate your help and patience. Thank you

for what it's worth, have you started at the absolute beginning on this content? Like, start with something like 3b1b's excellent NN intro: https://www.youtube.com/channel/CSANnvayOd8Ay2scpk91AtpA16FvVqEg_zCZL32hs

#

then, maybe something like this: https://cs50.harvard.edu/ai/2020/weeks/5/

Week 5 Neural Networks - CS50's Introduction to Artificial Intellig...

This course explores the concepts and algorithms at the foundation of modern artificial intelligence, diving into the ideas that give rise to technologies like game-playing engines, handwriting recognition, and machine translation. Through hands-on projects, students gain exposure to the theory behind graph search algorithms, classification, opt...

slim bone Jul 31, 2023, 1:28 PM

#

left tartan for what it's worth, have you started at the absolute beginning on this content?...

I watched the first three videos, the fourth one is a little much for me at the moment

#

Almost everything made sense (give or take, a couple of knowledge gaps in back propagation)

#

I do agree, the videos are excellent.

left tartan Jul 31, 2023, 1:29 PM

#

yah, like most things, it'll gain meaning on rewatch

slim bone Jul 31, 2023, 1:30 PM

#

Ah so far I've rewatched at least one video every day haha

#

Glad to see those are so heavily recommended though

left tartan Jul 31, 2023, 1:30 PM

#

the videos often have something for every skill level... look at the 4th (backpropagation), I get why it'd be a lot

slim bone Jul 31, 2023, 1:31 PM

#

left tartan the videos often have something for every skill level... look at the 4th (back p...

The fourth is the one that delves a little deeper into the mathematical theory behind it, isn't it?

left tartan Jul 31, 2023, 1:31 PM

#

Yah, but I think you can understand it without really intuiting why derivatives are involved

left tartan Jul 31, 2023, 1:33 PM

#

slim bone The fourth is the one that delves a little deeper into the mathematical theory b...

Like, imagine you could search the entire solution space of every possible parameter for the nn.

#

To find the optimal nn

#

Obviously it'd work (find the right answer)... it would just be computationally infeasible.

slim bone Jul 31, 2023, 1:36 PM

#

left tartan Yah, but I think you can understand it without really intuiting why derivatives ...

Ah I wasn't aware the fourth video is particularly relevant beyond "Here's the math behind things"

#

I figured I'd watch it later once I understand the terminology a little better
I just finished my calculus courses so I don't think understanding what's being said is beyond my reach (Then again, I have no idea)

slim bone Jul 31, 2023, 1:37 PM

#

left tartan Obviously it'd work (find the right answer)... it would just be computationally ...

Makes sense, I'm just not sure I'm following ^^;

left tartan Jul 31, 2023, 1:40 PM

#

slim bone Makes sense, I'm just not sure I'm following ^^;

Well, think of NN's as two things: 1. a pretty cool observation that we can build 'intelligent' classifiers via a nn, 2. a really tough problem of figuring out what parameters to use for the nn to yield the optimal results

#

So, video 4 gets into the "really tough problem" part: that it's really "hard" to find the optimal parameters.

#

And that's where all the clever algorithms and math come into play. but if you pretended for a moment that you could just exhaustively search every single possible set of parameters, and ignored this problem, everything is actually fairly simple.
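
The "just search everything" thought experiment can be sketched on a toy problem. Everything here is made up for illustration: a one-parameter model `y_hat = w * x`, four data points, and a grid of candidate weights.

```python
# Toy data generated by y = 3 * x, so the best single weight is w = 3.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

def mse(w):
    """Mean squared error of the one-parameter model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Exhaustive search: try every candidate on a grid from -10.0 to 10.0.
candidates = [i / 10 for i in range(-100, 101)]
best_w = min(candidates, key=mse)
print(best_w)  # 3.0

# The catch: with k parameters the grid has len(candidates) ** k points.
print(len(candidates) ** 3)  # 8120601 combinations for just 3 parameters
```

With one parameter this is trivial. With the thousands or millions of parameters in a real network the grid is far too large to visit, which is the "really tough problem" part.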

slim bone Jul 31, 2023, 1:43 PM

#

That's curious, and sounds rather crucial

left tartan Jul 31, 2023, 1:45 PM

#

slim bone That's curious, and sounds rather crucial

Did you cover hill climbing in any of your DSA/CS courses?

#

I think if you "get" hill climbing, gradient descent is much easier to intuit.

slim bone Jul 31, 2023, 1:46 PM

#

Oh I understand gradient descent just fine I think

#

I didn't cover a topic called "hill climbing" in my courses - the idea of a multivariable function being a hill was mentioned in calculus 2 though

#

I was indeed explained the idea of "gradient" by the "imagine you're traversing the function on-foot" which is what I think you're poking at

left tartan Jul 31, 2023, 1:48 PM

#

slim bone I was indeed explained the idea of "gradient" by the "imagine you're traversing ...

Yah... hill climbing is more a simple algorithm of: start at x, look left and right, and decide which direction to climb.

#

so, imagine combining that idea with derivatives (where you can look at a multidimensional "slope" to decide which way to go)

#

Anyway, this is basically the idea of the 4th video
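
A side-by-side sketch of the two ideas on a one-dimensional toy function. The function, step size and learning rate are arbitrary choices, and since the goal here is the lowest point, the "hill climbing" runs downhill.

```python
def f(x):
    return (x - 2.0) ** 2        # toy "landscape" with its lowest point at x = 2

def df(x):
    return 2.0 * (x - 2.0)       # derivative of f

# Hill climbing (here: descending): look left and right, move to the better side.
x = -5.0
step = 0.1
for _ in range(200):
    left, right = x - step, x + step
    best = min((left, x, right), key=f)
    if best == x:
        break                    # neither neighbour is better, so stop
    x = best
hill_x = x

# Gradient descent: the derivative says which way, and how far, to move.
x = -5.0
learning_rate = 0.1
for _ in range(200):
    x -= learning_rate * df(x)
grad_x = x

print(round(hill_x, 2), round(grad_x, 2))  # both end up near 2.0
```

Hill climbing has to probe its neighbours to pick a direction. The derivative gives that direction directly, which matters once there are many dimensions to probe.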

slim bone Jul 31, 2023, 1:49 PM

#

Ah, it sounds simple when you put it like that

left tartan Jul 31, 2023, 1:50 PM

#

Yah, the math does get hairy though.

slim bone Jul 31, 2023, 1:50 PM

#

As one would imagine, I'll keep it in mind though. Thank you

civic elm Jul 31, 2023, 2:02 PM

#

Question, are there any Data Science online certifications that are valuable to employers? for example, Google ML certification, is this something they like?

fresh harbor Jul 31, 2023, 2:22 PM

#

what is the equivalent of onnxruntime.InferenceSession.get_inputs in OpenCV's cv2.dnn.Net?

sturdy canyon Jul 31, 2023, 2:24 PM

#

civic elm Question, are there any Data Science online certifications that are valuable to ...

Unfortunately I think the answer is it depends on the recruiter. Based on talking with my friend who has done some hiring in this space, he doesn't care so much about what you've learned/what certs you have. He's more interested in the stuff you've worked on (personal projects or otherwise) in a real world space, and that you can logically reason your way through a problem. He's one person though, so the next recruiter may require you to have a dual degree in quantum computing and elementary education to be considered

#

I think you should ask yourself if you think you'd find value in them, or if you'd prefer to learn on your own. I got a data science cert back in the day for R, which was very helpful to solidify and learn how to apply the stats foundation I got in school (and it was faster than figuring it out myself). However, once I got into Python and ML, I felt comfortable enough to just learn it on my own/ask questions of my coworkers.

timid kiln Jul 31, 2023, 3:05 PM

#

Where do y'all think the best place to ask questions about plotly would be?

civic elm Jul 31, 2023, 3:20 PM

#

sturdy canyon I think you should ask yourself if you think you'd find value in them, or if you...

Thank you so much, very helpful

past meteor Jul 31, 2023, 3:52 PM

#

slim bone Makes sense. Any recommendations for where to find such experiments?

You can roll one yourself 🙂

#

It's actually good practice. Make some fake data, take a loss function (MSE) and write your own linear regression with gradient descent

#

I did this some years ago, it's good to for example start with regular GD, then make it SGD, then add regularization, then make a 2nd order method, ...

daring sphinx Jul 31, 2023, 3:55 PM

#

guys I'm kind of clueless on what to do next so I'm asking it here. I want to make some money next month through freelancing. $50 is enough. Right now, I know tensorflow, pytorch, natural language processing with spacy and transformers and traditional machine learning algorithms with sklearn. Most I've done in terms of deployment is deploying a simple gradio app of image classification in hugging face.
Right now what do I need to learn to start making some money through freelancing? I'm a fast learner.

slim bone Jul 31, 2023, 4:13 PM

#

past meteor I did this some years ago, it's good to for example start with regular GD, then ...

I'm getting tired just by thinking about it
I barely know a single line of code in Pytorch lol

#

But perhaps that is the way to go - think of a cool project, and just figure out how to do it by any means necessary

serene scaffold Jul 31, 2023, 4:14 PM

#

daring sphinx guys I'm kind of clueless on what to do next so I'm asking it here. I want to ma...

You won't be able to earn any money as a freelancer at your present level of knowledge.

slim bone Jul 31, 2023, 4:14 PM

#

Complete top-down approach type of thing, food for thought I suppose

past meteor Jul 31, 2023, 4:14 PM

#

Do it with numpy

daring sphinx Jul 31, 2023, 4:14 PM

#

serene scaffold You won't be able to earn any money as a freelancer at your present level of kno...

what more do I need to learn?

slim bone Jul 31, 2023, 4:15 PM

#

past meteor Do it with numpy

You mean, straight up program it from the ground up?

past meteor Jul 31, 2023, 4:15 PM

#

gradient descent with numpy is <10 lines of very easy code

serene scaffold Jul 31, 2023, 4:15 PM

#

daring sphinx what more do I need to learn?

to do AI related stuff as a freelancer, you'd need a degree related to AI and years of experience. I wouldn't even try.

past meteor Jul 31, 2023, 4:15 PM

#

Take a piece of paper and write out the partial derivatives of MSE and start there. It's easier than you think

daring sphinx Jul 31, 2023, 4:16 PM

#

serene scaffold to do AI related stuff as a freelancer, you'd need a degree related to AI and ye...

are you even into AI?

past meteor Jul 31, 2023, 4:16 PM

#

Afterwards, look at how ŷ (y-hat) is found in math terms (hint, it's just a dot product)
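
For reference, this is roughly what the pen-and-paper exercise comes out to for linear regression with MSE. The symbols are chosen here rather than taken from the chat: N data points, P variables, weights w_j and a bias b.

```latex
\hat{y}_i = \sum_{j=1}^{P} w_j x_{ij} + b
\qquad
L = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2

\frac{\partial L}{\partial w_j} = \frac{2}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)\, x_{ij}
\qquad
\frac{\partial L}{\partial b} = \frac{2}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)
```

Each partial derivative only needs the ordinary derivative of a square plus the chain rule.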

slim bone Jul 31, 2023, 4:16 PM

#

past meteor Take a piece of paper and write out the partial derivatives of MSE and start the...

Unfortunately my calculus courses only dealt with 2-variable calculus, and left out the "multivariable" for calculus 3. Would you reckon it's still as simple as you might think?

past meteor Jul 31, 2023, 4:16 PM

#

And then you're half way there

serene scaffold Jul 31, 2023, 4:16 PM

#

daring sphinx are you even into AI?

yes, I work in language AI professionally. there isn't anything AI related that's worthwhile that could be produced on a budget of $50.

past meteor Jul 31, 2023, 4:17 PM

#

slim bone Unfortunately my calculus courses only dealt with 2-variable calculus, and left ...

Yes, it's still as simple as I think

slim bone Jul 31, 2023, 4:17 PM

#

Worth trying then, got a fun project idea?

past meteor Jul 31, 2023, 4:17 PM

#

I gave a seminar on this a while ago to people that didn't have anything more than HS math

slim bone Jul 31, 2023, 4:17 PM

#

What.

#

I don't think I could've even fathomed the concept of a "gradient" in highschool lol

daring sphinx Jul 31, 2023, 4:18 PM

#

serene scaffold yes, I work in language AI professionally. there isn't anything AI related that'...

I'll try my hardest to prove you wrong and make some money doing machine learning without any degree and work experience.

#

thanks for the motivation.

past meteor Jul 31, 2023, 4:18 PM

#

But you did learn what a partial derivative is in high school right?

serene scaffold Jul 31, 2023, 4:18 PM

#

daring sphinx I'll try my hardest to prove you wrong and make some money doing machine learnin...

I hope it works out for you. Let me know how it goes.

slim bone Jul 31, 2023, 4:19 PM

#

past meteor But you did learn what a partial derivative is in high school right?

Vaguely, they were simply introduced as "Integrals" with no additional context (e.g., not every function has those)

past meteor Jul 31, 2023, 4:19 PM

#

I think I pitched it as learning is 1) trying something 2) making mistakes 3) getting feedback 4) improving 5) going back to step 1

daring sphinx Jul 31, 2023, 4:19 PM

#

sure bro

past meteor Jul 31, 2023, 4:19 PM

#

That's the core loop of gradient descent, idt I used the word "gradient" there unless in very vague terms

slim bone Jul 31, 2023, 4:19 PM

#

Ah

serene scaffold Jul 31, 2023, 4:19 PM

#

daring sphinx I'll try my hardest to prove you wrong and make some money doing machine learnin...

if you're interested in paths to being a professional AI developer that are more likely to work, we can have that discussion as well.

slim bone Jul 31, 2023, 4:19 PM

#

Yeah I do agree that the idea is rather elegant

past meteor Jul 31, 2023, 4:20 PM

#

I think you're scaring yourself. It's not that hard. If you know a regular derivative and the chain rule you can do partial derivatives and then you can understand what a gradient is

slim bone Jul 31, 2023, 4:20 PM

#

Oh, I know what a gradient is. It's just weird to me that you relied on Highschool math

past meteor Jul 31, 2023, 4:21 PM

#

Partial derivatives are high school math or at least they were to me

slim bone Jul 31, 2023, 4:21 PM

#

Weird? Impressive? Not sure what's the right word here

daring sphinx Jul 31, 2023, 4:21 PM

#

serene scaffold if you're interested in paths to being a professional AI developer that are more...

I'll look into that too. But I just need to make $50 this august using whatever machine learning I've learnt and will learn.

slim bone Jul 31, 2023, 4:21 PM

#

past meteor Partial derivatives are high school math or at least they were to me

Ah, some basic linear algebra too probably?

serene scaffold Jul 31, 2023, 4:21 PM

#

daring sphinx I'll look into that too. But I just need to make $50 this august using whatever ...

why does it have to be $50 and why in August?

daring sphinx Jul 31, 2023, 4:21 PM

#

serene scaffold why does it have to be $50 and why in August?

I've made a commitment to join MMA classes from september. And I want to do it with my own money

serene scaffold Jul 31, 2023, 4:22 PM

#

@past meteor my high school math ended at trig. only the most advanced students would learn limits and derivative calculus

past meteor Jul 31, 2023, 4:22 PM

#

Yeah we learnt what vectors and matrices are and how to multiply them

serene scaffold Jul 31, 2023, 4:22 PM

#

daring sphinx I've made a commitment to join MMA classes from september. And I want to do it w...

MMA as in mixed martial arts?

lapis sequoia Jul 31, 2023, 4:22 PM

#

daring sphinx I'll look into that too. But I just need to make $50 this august using whatever ...

Can you create machine learning programs already?

daring sphinx Jul 31, 2023, 4:22 PM

#

serene scaffold MMA as in mixed martial arts?

yea

past meteor Jul 31, 2023, 4:22 PM

#

and with that you have enough for SGD

slim bone Jul 31, 2023, 4:22 PM

#

serene scaffold MMA as in mixed martial arts?

They want to fund their lessons with Free-lanced-machine-learning-money I think

daring sphinx Jul 31, 2023, 4:22 PM

#

lapis sequoia Can you create machine learning programs already?

I'm learning AWS and Rest API using flask. If I try hard, I'll be done with that learning part within a week.

#

I'm through the AWS course 50% already.

past meteor Jul 31, 2023, 4:23 PM

#

Maybe I am being overly optimistic here, I'm just going by what we covered in HS 🤣

slim bone Jul 31, 2023, 4:23 PM

#

past meteor Yeah we learnt what vectors and matrices are and how to multiply them

Ah most of the topics you've mentioned are academic, I see now
I wasn't even aware of the existence of matrices until I took Linear Algebra haha

#

But you've raised a good point, I probably should learn Numpy ~~and maybe Python in general, I haven't touched it in two years.~~

lapis sequoia Jul 31, 2023, 4:23 PM

#

daring sphinx I'm learning AWS and Rest API using flask. If I try hard, I'll be done with that...

Have you created your own programs yet?

past meteor Jul 31, 2023, 4:24 PM

#

Vectors were covered in 3rd secondary which is the last year of middle school afaik

slim bone Jul 31, 2023, 4:24 PM

#

Wow.

#

Care to tell us where you're from?

daring sphinx Jul 31, 2023, 4:24 PM

#

lapis sequoia Have you created your own programs yet?

no. I just made models.

past meteor Jul 31, 2023, 4:24 PM

#

slim bone Care to tell us where you're from?

Belgium

slim bone Jul 31, 2023, 4:25 PM

#

Well I know where I want to grow my kids

daring sphinx Jul 31, 2023, 4:25 PM

#

lapis sequoia Have you created your own programs yet?

deployed one in hugging face with gradio ui. guess that doesn't count .

slim bone Jul 31, 2023, 4:25 PM

#

(Joking, of course)

#

But no, that's seriously impressive

past meteor Jul 31, 2023, 4:26 PM

#

Well, if you're not great with these topics I think you should interweave doing ML projects and coding up the algos from scratch

#

The former is the more relevant skill for jobs but the latter is a good test to see if you know what you're doing imo

lapis sequoia Jul 31, 2023, 4:27 PM

#

daring sphinx deployed one in hugging face with gradio ui. guess that doesn't count .

In my opinion learning from tutorials and actually using it to create your own models that you aren't making using tutorials are completely 2 different battles.

past meteor Jul 31, 2023, 4:27 PM

#

Some people can read a proof and grok it but personally implementing it helps me to be sure that I know what I'm doing

daring sphinx Jul 31, 2023, 4:28 PM

#

lapis sequoia In my opinion learning from tutorials and actually using it to create your own m...

I've done both. I've worked on personal projects without any help.

past meteor Jul 31, 2023, 4:28 PM

#

Naïve implementations of most algorithms are quite simple (the ones that are used in practice tend to have some nice tricks for numerical stability etc).

daring sphinx Jul 31, 2023, 4:29 PM

#

I think I'm gonna have to learn deployment, rest api and making a UI to use the model.

lapis sequoia Jul 31, 2023, 4:30 PM

#

If you can create machine learning models based on real world problems I don't see why you can't make money.

slim bone Jul 31, 2023, 4:40 PM

#

@past meteor
I'm trying to attribute it some thought, breaking down what each step consists of
I've come down with:

Obtaining, parsing and feeding the algorithm data (Just, File I/O?)
Breaking down the data, and implementing a forward propagation algorithm
Calculating the cost of the function and propagate it backwards - correcting the weights involved with Gradient Descent

Does this sound good? Because, I do think I know how to implement each step individually
(I know this sounds rather trivial, just asking if I'm in the right direction)

#

Oh and, I'll probably do a handwriting recognition software for the sake of being able to use 3blue1brown's videos.
Maybe something a little less basic if I manage this one

#

Oh and if I managed to get your attention - Can(should?) I do this in ~~Jupyter Notebooks~~ Google Colab for the sake of portability?

past meteor Jul 31, 2023, 4:47 PM

#

slim bone @past meteor I'm trying to attribute it some thought, breaking down wha...

1. Pick a loss function and find the partial derivatives with pen and paper.
2. Generate a dataset (P variables) that's easy to work with
3. Generate random weights (P + 1 variables, bias term)
4. In a for loop:
   - Make a prediction (dot product of weights and data)
   - Calculate the error
   - Calculate the gradient
   - Update the weights
5. Continue till n_iterations is met
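
The recipe above could be sketched in NumPy along these lines. The sizes, the learning rate, the "true" coefficients and the trick of appending a column of ones for the bias are all assumptions made for the example, not something the chat prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 200, 3                                  # N data points, P variables

# An easy dataset with a known linear relationship.
X = rng.normal(size=(N, P))
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0

# P + 1 random weights; the extra one is the bias, fed by a column of ones.
Xb = np.hstack([X, np.ones((N, 1))])
w = rng.normal(size=P + 1)

learning_rate, n_iterations = 0.1, 500
losses = []
for _ in range(n_iterations):
    y_hat = Xb @ w                             # prediction: dot product of data and weights
    error = y_hat - y                          # how far off each prediction is
    losses.append(np.mean(error ** 2))         # MSE loss, tracked for inspection
    gradient = (2.0 / N) * (Xb.T @ error)      # partial derivatives of MSE w.r.t. w
    w -= learning_rate * gradient              # update the weights

print(losses[0], losses[-1])                   # the loss should shrink towards 0
print(w)                                       # should approach [2, -1, 0.5, 4]
```

The loop body is four lines, which fits the "<10 lines" estimate from earlier in the chat.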

#

I'd start with linear regression

slim bone Jul 31, 2023, 4:50 PM

#

past meteor 1. Pick a loss function and find the partial derivatives with pen and paper. 2. ...

Question about steps 2 and 3 - Why is the number of weights dependent on the size of the dataset? Or did I miss something

#

For example, I can have a hundred, 16x16 images - but I'll need 257 weights no?

past meteor Jul 31, 2023, 4:51 PM

#

I changed N to P for clarity, P is the number of variables and N the number of datapoints

past meteor Jul 31, 2023, 4:51 PM

#

slim bone For example, I can have a hundred, 16x16 images - but I'll need 257 weights no?

yes

slim bone Jul 31, 2023, 4:51 PM

#

past meteor I changed N to P for clarity, P is the number of variables and N the number of d...

Pardon? They're both P's now

#

Is that a mistake?

past meteor Jul 31, 2023, 4:52 PM

#

No, you have 1 weight (or coefficient) for each variable

#

And a bias (intercept) term, so P+1
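
The count for the 16x16 example, spelled out. This assumes a single linear output, as in the linear regression being discussed; a multi-layer network for digit recognition would have many more weights.

```python
height, width = 16, 16
P = height * width        # one variable per pixel
n_weights = P + 1         # one weight per variable, plus the bias (intercept)
print(P, n_weights)       # 256 257
```

The number of images N does not appear anywhere in that count.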

slim bone Jul 31, 2023, 4:53 PM

#

Ah. I think I misread - when you say:

Generate a dataset (P variables) that's easy to work with
You mean, a dataset with N images (In my project), where each image gives up P variables (In our case, pixels)?

#

Is that right?

past meteor Jul 31, 2023, 4:53 PM

#

Yes

#

The issue with not just generating easy to work with data is that you can't sanity check as easily

#

If I were you I'd generate a NxP matrix at random and have a function you generate at random as well that defines the output

#

Why? You're pretty sure in this case that your loss is 0 if you did it correctly

slim bone Jul 31, 2023, 4:56 PM

#

past meteor Why? You're pretty sure in this case that your loss is 0 if you did it correctly

How does that derive from said matrix? I'm not sure I follow

#

Also uhm, wouldnt I have to calculate a ton of partial derivatives (256) with my implementation?

past meteor Jul 31, 2023, 4:57 PM

#

What I'm trying to say is that you could use fake data instead of the handwritten digits

#

But that's just my personal way of working

slim bone Jul 31, 2023, 4:57 PM

#

What does fake data mean in this context?

past meteor Jul 31, 2023, 4:57 PM

#

np.random

slim bone Jul 31, 2023, 4:58 PM

#

Yeah but I need to feed the network concrete, labeled inputs no?

#

I feel like I'm missing what you're trying to say

past meteor Jul 31, 2023, 4:58 PM

#

I just make a random matrix and then I make a vector that maps input to output

#

So if I have a random matrix that is N x P, I make a vector of size P (also random, but this time maybe random integers) and I do labels = np.dot(random_data, random_true_weights) + 42

#

At the end of your gradient descent the weights (coefficients) you learnt should be really close to that random_true_weights vector you made in the beginning
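
A sketch of that sanity check. The names `random_data`, `random_true_weights` and the `+ 42` follow the message above; the sizes, seed, learning rate and iteration count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
N, P = 500, 4

random_data = rng.normal(size=(N, P))                      # random N x P matrix
random_true_weights = rng.integers(-5, 6, size=P).astype(float)
labels = np.dot(random_data, random_true_weights) + 42     # 42 is the true intercept

weights = np.zeros(P)
bias = 0.0
learning_rate = 0.1
for _ in range(1000):
    error = np.dot(random_data, weights) + bias - labels
    weights -= learning_rate * (2.0 / N) * np.dot(random_data.T, error)
    bias -= learning_rate * (2.0 / N) * error.sum()

print(random_true_weights, weights)   # learnt weights should be close to the true ones
print(bias)                           # should be close to 42
```

Because the labels are an exact linear function of the data, there is no noise to fit, so a correct implementation should drive the loss to essentially zero.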

#

But maybe I'm just really confusing you so feel free to ignore this bit haha

slim bone Jul 31, 2023, 5:02 PM

#

Perhaps a tiny bit, I think I might need to fill in some knowledge gaps I have (I just realized for example, that I have no idea how to adjust the biases.)

past meteor Jul 31, 2023, 5:02 PM

#

TL;DR is that when I'm doing "fundamental" exercises (ones close to theory) I like making my own toy datasets because it helps in understanding what's going on

slim bone Jul 31, 2023, 5:03 PM

#

Ah I think I'd just rather implement something I'd be proud of

#

E.g., when I learned HTML5 I made an entire website about Corgis. Core memory for me 🙂

#

Granted, it took two days and was very easy, but there's something motivating about making something that's truly your own

past meteor Jul 31, 2023, 5:04 PM

#

I forget people aren't like me and that's also totally okay haha

#

Tbh, you might as well pick up any guide from Tensorflow / Pytorch's website because the first ones build simple neural networks for the handwritten dataset

#

https://keras.io/examples/vision/mnist_convnet/

Keras documentation: Simple MNIST convnet

slim bone Jul 31, 2023, 5:09 PM

#

past meteor Tbh, you might as well pick up any guide from Tensorflow / Pytorch's website bec...

Tbf I've been trying to read these docs for 2-3 days now and I just can't bring myself to understand the code

#

The Pytorch one specifically

#

But I thought you recommended that I use NumPy first, before relying on libraries? ^^;

past meteor Jul 31, 2023, 5:11 PM

#

It depends how deep you wanna know this stuff. If you want to make something that makes you proud and then move on to other projects that might not even be related to ML/AI, I'd just use Keras

#

If you want to stick with ML/AI for the long haul, then coding a few of the basics yourself in NumPy (starting from linear regression) and then moving to JAX, in between doing actual projects (could even be the Titanic dataset), makes the most sense

#

That's just my 2 cents, I'm actually curious to know what the rest thinks.

slim bone Jul 31, 2023, 5:18 PM

#

Hmm, I can give the whole picture about myself (I'll make it as brief as I can)

I'm currently approaching my 2nd year of bachelors in CS (out of 3.5, probably)
I've started pursuing this degree in order to obtain a masters (and maybe even a PhD) in ML
I probably have about 2 months to burn right now (summer vacation) and I figured to myself that I might want to start learning about ML, now that I have some of the mathematical background nailed down.
I've been trying to get into ML in the last few days, kind of drowned in tutorials, asked around for advice, got some advice, tried said advice, still have no idea how to start.
*. I've been reading Pytorch documentation for a while, trying to implement something, but to no avail - I just don't understand the code written there for the life of me.

Thing is, I've watched the first 3blue1brown videos and I do feel like I understand the fundamentals of how the process works, at least from the math-ier side of things. So your idea sounded intriguing

tl;dr: Starting 2nd year of CS. Definitely in it for the long haul. Can't implement anything despite genuine efforts. Narrowing down the problem to "just do math" actually sounds cool

#

@past meteor Obligatory tag*

twilit tundra Jul 31, 2023, 5:18 PM

#

100% agree on Keras for accessibility. In my experience, coding the basic components from scratch at least once is useful and gratifying but not sure there is a need to continue using them on actual projects instead of the available frameworks

slim bone Jul 31, 2023, 5:19 PM

#

twilit tundra 100% agree on Keras for accessibility. In my experience, coding the basic compon...

So you advocate for "Do the numpy thing once and then move on"?

#

Also, the reason I'm using Pytorch is because it seems universities tend to favour it over TF for some reason.

#

At least where I live

#

I've heard Keras is easier, don’t know if that’s true*

twilit tundra Jul 31, 2023, 5:25 PM

#

Yes basically: I like to think of neural networks as lego bricks. Once I know how they work and their purpose, I just want to be able to reuse them on each project without having to reimplement them. Having experience coding them is mostly useful when you want to introduce new custom components.

In terms of framework, Keras and Pytorch are basically equivalent until you go into specific models. Keras runs on top of tensorflow and makes it more accessible.

past meteor Jul 31, 2023, 5:26 PM

#

slim bone Hmm, I can give the whole picture about myself (I'll make it as brief as I can) ...

I think you can do it, it's pretty easy even as you are now

#

It's not as scary as you think

#

I can implement it for you but then you wouldn't learn anything compared to struggling with it for max 2 days

twilit tundra Jul 31, 2023, 5:30 PM

#

Keras and Pytorch work similarly. The most difficult part about learning either is the data science/ML design part. You can easily switch between the two

slim bone Jul 31, 2023, 5:30 PM

#

twilit tundra Yes basically: I like to think of neural networks as lego bricks. Once I know ho...

Your approach definitely makes sense, unfortunately I’m not sure I’m at the stage of making anything with PyTorch - let alone something worth reusing

slim bone Jul 31, 2023, 5:31 PM

#

past meteor I think you can do it, it's pretty easy even as you are now

“It” as in, making a model with Numpy right?

slim bone Jul 31, 2023, 5:32 PM

#

twilit tundra Keras and Pytorch work similarly. The most difficult part about learning either ...

Is one reasonably more accessible than the other though?

#

As in, would I have a reason to switch in the near future if my university uses PyTorch?

twilit tundra Jul 31, 2023, 5:32 PM

#

When I took the coursera ML course a few years ago, they had the implementation of a neural network from scratch on MATLAB as a workshop

slim bone Jul 31, 2023, 5:33 PM

#

That sounds cool!!

twilit tundra Jul 31, 2023, 5:33 PM

#

They are equally accessible I'd say

#

If you know you're going to use pytorch, then use pytorch

slim bone Jul 31, 2023, 5:34 PM

#

Right

twilit tundra Jul 31, 2023, 5:34 PM

#

I can't really point to any guide on pytorch but I'm sure there are a few beginner-friendly ones

slim bone Jul 31, 2023, 5:34 PM

#

So, I should probably do the NumPy thing, see if I manage with that, and then come back to you folks if I have some questions regarding how to proceed?

twilit tundra Jul 31, 2023, 5:35 PM

#

Sounds good

slim bone Jul 31, 2023, 5:35 PM

#

I am rather skeptical this project would help me read the documentation better but it sounds like a nice thing to do regardless

#

And honestly at this point I just want to make something

slim bone Jul 31, 2023, 5:35 PM

#

twilit tundra Sounds good

Right, better get to it then. Thanks you two for the help

past meteor Jul 31, 2023, 5:36 PM

#

slim bone “It” as in, making a model with Numpy right?

yes

long canopy Jul 31, 2023, 5:37 PM

#

I have a program that records all my window usage (ActivityWatch, if you know of it) and generates Event objects which contain the window name and the amount of time spent on that window; a new Event object is generated each time I switch windows.

I have 2 categories: Working and Not Working. I'd like to construct an AI that automatically assigns, or suggests, the proper classification for an Event object into one of these categories. For some Event objects, like those recording time playing a game, the categorization is obvious. But for Events related to internet browsing, the difference between Not Working and Working is not always clear cut and obvious, especially when, e.g., I'm on discord, facebook, reddit, or Google, which may be for either work or nonwork activities, or when I'm navigating websites I've never encountered before.

Does anyone have suggestions on a path that could get me started on eventually being able to program something like this?

twilit tundra Jul 31, 2023, 5:43 PM

#

A common use case that is quite similar is logs anomaly detection: detecting behaviors in log entries that are different from usual. Not an expert on the subject but it could be a good place to start

long canopy Jul 31, 2023, 5:45 PM

#

twilit tundra A common use case that is quite similar is logs anomaly detection: detecting beh...

will look into that, ty!

desert oar Jul 31, 2023, 6:13 PM

#

long canopy I have a program that records all my window usages (ActivityWatch, if you know o...

the biggest component of something like this is figuring out how to encode all this unstructured data into "features" that can be fed into a model of some kind

#

+1 on looking at anomaly detection literature for logs, maybe specifically "intrusion detection" in cybersecurity

#

i might consider starting by trying to classify work vs non-work in a fixed time window, before trying to determine when to start/stop an activity and generate a corresponding event

#

that might be a good way to get a feel for the data and the feature you want to develop, in a less-complicated modeling scenario

#

that said, you will also want to build up some structured thinking about what an "event" really is. it might be something like "i have reached a minimum confidence threshold that i switched from Non-Work to Work X minutes ago"

#

at which point some of the statistical and probabilistic reasoning might look similar to "changepoint detection" in time series analysis

#

you could also do something like chunk up the time into 1-minute sliding windows and look at the prevailing activity in each window
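
A rough sketch of the windowing idea, under some assumptions: events are represented here as hypothetical `(start_second, duration_seconds, window_title)` tuples (not ActivityWatch's real event format), and the windows are fixed, non-overlapping 1-minute buckets rather than true sliding windows, which keeps the example short. The function name `prevailing_activity` is made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical events: (start_second, duration_seconds, window_title).
events = [
    (0, 40, "editor"),
    (40, 40, "discord"),
    (80, 40, "editor"),
    (120, 60, "game"),
]

WINDOW = 60  # seconds per window


def prevailing_activity(events, window=WINDOW):
    """Map each window index to the title with the most seconds in it."""
    seconds = defaultdict(Counter)
    for start, duration, title in events:
        end = start + duration
        t = start
        # Split the event across every window it overlaps.
        while t < end:
            index = t // window
            window_end = (index + 1) * window
            chunk = min(end, window_end) - t
            seconds[index][title] += chunk
            t += chunk
    return {i: c.most_common(1)[0][0] for i, c in sorted(seconds.items())}


result = prevailing_activity(events)
print(result)
```

Each window's prevailing title could then be turned into a feature (or a label to correct by hand) before any model is involved.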

#

lots of options here, but also hopefully you can see how there are a lot of components to a project like this. it's a great idea, but by the time you're done with it, you might have put enough work into it to have a product you can sell vs. just a hobby project to learn machine learning

#

i can imagine someone making a side business out of selling an "AI time tracker" app like this, if one doesn't already exist

long canopy Jul 31, 2023, 6:26 PM

#

desert oar the biggest component of something like this is figuring out how to encode all t...

thanks a lot for your comments!

#

already found a couple of books about anomaly detection with python, so I'll begin with that

#

will look into changepoint detection and time series analysis too

umbral charm Jul 31, 2023, 7:05 PM

#

Hey, I'm using pandas (I'm new to this package) and I'm trying to create a column called 'MAT'. However, I want this column to only hold a value where MA1 == MA2 (both of these are columns in the same dataframe), and if not it should just give NaN for that index
How would I implement this? I've tried ChatGPT and Bing AI but they're no good

sleek harbor Jul 31, 2023, 7:06 PM

#

When u code, does ur code look "good"? Cus.. everything I do is such a chaotic mess, I literally can't ask for help with my code when smth doesn't work, cus it's such a mess.. and I usually just clean things up at the end, when everything works..

twilit tundra Jul 31, 2023, 7:09 PM

#

umbral charm Hey, I'm using pandas (I'm new to this package) and I'm trying to create a column c...

Do you mean that you have a column MA1 and a column MA2 and MAT should be equal to True if MA1 == MA2, NaN otherwise?

umbral charm Jul 31, 2023, 7:09 PM

#

twilit tundra Do you mean that you have a column MA1 and a column MA2 and MAT should be equal ...

not True, but the value in the columns of MA1 and MA2

#

https://gyazo.com/8a6232d24adc2e123bc0a07b9389b66d

Gyazo

#

like this

twilit tundra Jul 31, 2023, 7:11 PM

#

Something like
df["MAT"] = df["MA1"]
df.loc[df["MA1"] != df["MA2"], "MAT"] = np.nan

umbral charm Jul 31, 2023, 7:12 PM

#

ok

twilit tundra Jul 31, 2023, 7:12 PM

#

First line to initialize the value, and then you filter on the indices that have MA1!=MA2
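
For reference, a small self-contained version of the two-step approach, using made-up numbers (the real data is not shown in the chat). The `MAT_where` column is an extra illustration, not part of the original suggestion: `Series.where` keeps values where the condition holds and fills NaN elsewhere, so it should give the same result in one line.

```python
import numpy as np
import pandas as pd

# Made-up example data; column names follow the question.
df = pd.DataFrame({
    "MA1": [1.0, 2.5, 3.0, 4.2],
    "MA2": [1.0, 2.0, 3.0, 4.0],
})

# Step 1: start with a copy of MA1.
df["MAT"] = df["MA1"]
# Step 2: blank out the rows where the two columns differ.
df.loc[df["MA1"] != df["MA2"], "MAT"] = np.nan

# Alternative one-liner with Series.where (NaN where the condition is False).
df["MAT_where"] = df["MA1"].where(df["MA1"] == df["MA2"])

print(df)
```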

umbral charm Jul 31, 2023, 7:13 PM

#

It is a slight bit more complicated than that tho: my numbers go to around 5 dp, and they need to stay like that. However, if the numbers are equal to each other to 2 dp, that value is allowed to go into MAT

twilit tundra Jul 31, 2023, 7:13 PM

#

sleek harbor When u code, does ur code look "good"? Cus.. everything I do is such a chaotic m...

What do you mean by mess? Is it because you introduce too many variables that don't make sense?

umbral charm Jul 31, 2023, 7:14 PM

#

So like i want to compare the numbers to 2 dp, but i wanna display them to 5 dp

#

hopefully its just simple .round()

twilit tundra Jul 31, 2023, 7:15 PM

#

Yeah, you can just replace it with a rounded comparison

#

In the pandas boolean

sleek harbor Jul 31, 2023, 7:15 PM

#

twilit tundra What do you mean by mess? Is it because you introduce too many variables that do...

I mean like, no comments, strange formatting, in jup notebooks - lots and lots of redundant cells I used for experiments, sometimes even in a non linear order.. basically, a mess 🥲

twilit tundra Jul 31, 2023, 7:16 PM

#

It's a huge mess when I use a notebook and I know I'm not going to show it to anyone else

past meteor Jul 31, 2023, 7:16 PM

#

sleek harbor When u code, does ur code look "good"? Cus.. everything I do is such a chaotic m...

I can always give pointers

#

I obsess about mine looking good but it's not great either because it's at the expense of doing less stuff

twilit tundra Jul 31, 2023, 7:17 PM

#

The easiest way to make your code easier to read is to use markdown to partition your code and add descriptions

#

And clean up cells that you defined just to check one variable once you're done with them

sleek harbor Jul 31, 2023, 7:17 PM

#

past meteor I can always give pointers

U familiar with plotly dash?

twilit tundra Jul 31, 2023, 7:17 PM

#

Or merge them together

past meteor Jul 31, 2023, 7:18 PM

#

With plotly yes but I haven't used dash specifically

sleek harbor Jul 31, 2023, 7:19 PM

#

twilit tundra And clean up cells that you defined just to check one variable once you're done ...

I often write several versions of doing the same thing, and then have difficulty deciding what to keep, especially when I'm not 100% certain on my design, as in, I might want this to happen, and then this should be kept as it can be used later, or I might not need it later and can use this more optimized version.. and.. it's a mess

past meteor Jul 31, 2023, 7:19 PM

#

Notebooks are good but dangerous imo because out of experience, they lead to many globals and hard to understand / debug / change code

sleek harbor Jul 31, 2023, 7:20 PM

#

past meteor Notebooks are good but dangerous imo because out of experience, they lead to man...

My latest bug is - a global :3 can't figure out where I messed up. Will sleep on it ig

umbral charm Jul 31, 2023, 7:21 PM

#

twilit tundra Or merge them together

Where do i put the round?

past meteor Jul 31, 2023, 7:21 PM

#

sleek harbor I often write several versions of doing the same thing, and then have difficulty...

Do you use version control?

umbral charm Jul 31, 2023, 7:22 PM

#

Surely if I put it within the .loc it will try to locate a rounded number, which therefore does not exist

twilit tundra Jul 31, 2023, 7:22 PM

#

df.loc[df["MA1"].round(2) != df["MA2"].round(2), "MAT"] = np.nan

#

something like that, you're just transforming the 2 Series you're comparing

sleek harbor Jul 31, 2023, 7:22 PM

#

past meteor Do you use version control?

Yes, but.. probably not often enough. I commit when I finish like, a "chapter" of what I'm doing, like a big step. Should do it more often, but.. ¯⁠\⁠_⁠(⁠ツ⁠)⁠_⁠/⁠¯

umbral charm Jul 31, 2023, 7:22 PM

#

would that not try to locate the rounded number, and not the initial number

twilit tundra Jul 31, 2023, 7:22 PM

#

The first term is a boolean

#

It returns a series that is True on the indices you're filtering it on

#

df["MA1"].round(2) != df["MA2"].round(2) evaluates to something like pd.Series([False, False, False, True]), for instance

#

So when you use loc, it will select the last row
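
To make the boolean-mask point concrete, here is a small sketch with invented 5 dp values. The rounding only happens inside the comparison, so the values stored in `MAT` keep their full precision. The variable name `mismatch` is just for illustration.

```python
import numpy as np
import pandas as pd

# Invented values with 5 decimal places.
df = pd.DataFrame({
    "MA1": [10.12345, 20.55555, 30.00001, 40.98765],
    "MA2": [10.12399, 20.54444, 30.00499, 40.98000],
})

# The mask is a boolean Series: True where the 2 dp roundings differ.
mismatch = df["MA1"].round(2) != df["MA2"].round(2)
print(mismatch)

# .loc uses the mask to pick rows; it never looks up the rounded numbers.
df["MAT"] = df["MA1"]
df.loc[mismatch, "MAT"] = np.nan
print(df)
```

One caveat: comparing rounded values is not quite the same as asking whether the two numbers are within 0.005 of each other, since two close values can fall on either side of a rounding boundary. If that matters, comparing the absolute difference against a tolerance is a common alternative.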

past meteor Jul 31, 2023, 7:24 PM

#

sleek harbor Yes, but.. probably not often enough. I commit when I finish like, a "chapter" o...

Each commit should leave your codebase in a runnable, well formed state

#

Each time you reach one of those checkpoints, commit

umbral charm Jul 31, 2023, 7:25 PM

#

OHH

#

I SEEE

#

Pandas is so different

umbral charm Jul 31, 2023, 7:38 PM

#

twilit tundra df["MA1"].round(2) != df["MA2"].round(2) evaluates to something like pd.Series([False, ...

Its not working correctly

#

fuck it ill just add 2 more columns with the rounded values

twilit tundra Jul 31, 2023, 7:43 PM

#

Did you put an argument in round()?

umbral charm Jul 31, 2023, 7:43 PM

#

twilit tundra Did you put an argument in round()?

Yea

#

2 for 2 dp

#

It's not finding all the values. It finds like 5 values which match, but when I just iterate through it I find about 13 that match

odd meteor Jul 31, 2023, 7:45 PM

#

umbral charm fuck it ill just add 2 more columns with the rounded values

Perhaps, you might wanna share the code you wrote and the error message you got (if any) so roseluxembourg can help further

twilit tundra Jul 31, 2023, 7:46 PM

#

Maybe your argument is not correct for the computation you're looking for (putting 1 instead maybe?)

umbral charm Jul 31, 2023, 7:48 PM

#

odd meteor Perhaps, you might wanna share the code you wrote and the error message you got ...

I can share my code but my CSV is like 1000 rows so that'll be a bit harder

#

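TSLA['Boo'] = TSLA['MA20']
# Row-wise: blank out rows where MA20 and MA80 differ at 2 dp
TSLA.loc[TSLA['MA20'].round(2) != TSLA['MA80'].round(2), 'Boo'] = np.nan
print(TSLA['Boo'])
# Nested loop: checks every MA20 value against every MA80 value
for i in TSLA['MA20']:
    for j in TSLA['MA80']:
        if round(i, 2) == round(j, 2):
            print(i, j)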

twilit tundra Jul 31, 2023, 7:48 PM

#

You're comparing every value in col1 with every value in col2 here

#

If you were trying to keep MA1 when there is at least one MA2 value that approximates it, that's a different formula
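
The difference between the two comparisons can be sketched like this, with invented numbers. `same_row` is the row-by-row check from the `.loc` approach; `any_row` uses `Series.isin` as one possible way to express "matches at least one MA2 value anywhere", which is roughly what the nested loop was counting. Both variable names are illustrative.

```python
import pandas as pd

# Invented values: no row matches itself, but some values match other rows.
df = pd.DataFrame({
    "MA1": [1.001, 2.002, 3.003],
    "MA2": [3.004, 1.002, 9.000],
})

ma1 = df["MA1"].round(2)
ma2 = df["MA2"].round(2)

# Row-wise: compares MA1 and MA2 at the same index only.
same_row = ma1 == ma2

# Any-row: True if the rounded MA1 value appears anywhere in rounded MA2.
any_row = ma1.isin(ma2)

print(same_row.sum(), "row-wise matches")
print(any_row.sum(), "any-row matches")
```

This is why a loop over every pair can report more matches than the mask does.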

umbral charm Jul 31, 2023, 7:50 PM

#

twilit tundra You're comparing every value in col1 with every value in col2 here

AHHH

#

I see

#

It's because even at different indexes the values of MA1 and MA2 can be the same, so the loop compares them across 2 different indexes

#

So your original code only compares MA1 and MA2 from the same index row, correct?

twilit tundra Jul 31, 2023, 7:52 PM

#

Yes

umbral charm Jul 31, 2023, 7:52 PM

#

But when I did my iteration it could've been comparing MA1 from index 5 and MA2 from index 12

twilit tundra Jul 31, 2023, 7:52 PM

#

Yes

#

Which one is the one you're trying to do

umbral charm Jul 31, 2023, 7:53 PM

#

Compare in the same index