#data-science-and-ml | Python | Page 298

spiral trail Mar 19, 2021, 7:27 PM

#

The ass is missing but it's a starting point?

bronze skiff Mar 19, 2021, 7:34 PM

#

ai decensorship for japanese culture is a thing

#

basically-- you can "learn the ass"

grave frost Mar 19, 2021, 7:38 PM

#

bronze skiff basically-- you can "learn the ass"

what was the training data? hen-

bronze skiff Mar 19, 2021, 7:42 PM

#

its just gradient descent

#

descending into the

spiral trail Mar 19, 2021, 7:44 PM

#

You could create a mask with a preference based on a big enough sample size, right? Then it's goodbye Tinder swiping 😊

#

On a serious note though, why is this not a thing on porn websites?

iron bough Mar 19, 2021, 7:52 PM

#

hmm

#

i think it's because of bbq sauce

grave frost Mar 19, 2021, 8:47 PM

#

they had a bug bounty program; must be hard for the experts to browse the site for vulnerabilities 😏

serene scaffold Mar 19, 2021, 9:28 PM

#

@spiral trail @misty flint there are other examples of ML being used in the real world that are more appropriate for our server, which is designed for users as young as 13.

bronze skiff Mar 19, 2021, 9:43 PM

#

13 year olds learn ds?

#

this is why the rust server is better

serene scaffold Mar 19, 2021, 9:46 PM

#

bronze skiff 13 year olds learn ds?

we allow anyone who is eligible to use Discord to fully participate in our community. While it's not likely that a 13 year old would have the prerequisite knowledge to succeed in data science right away, that doesn't preclude them from participating.

misty flint Mar 19, 2021, 9:54 PM

#

serene scaffold @spiral trail @misty flint there are other examples of ML bei...

this is true. let me delete my comment

misty flint Mar 19, 2021, 9:55 PM

#

bronze skiff this is why the rust server is better

what about julia pithink

#

where is foxxy

#

maybe i will learn julia next Clown2

shut valve Mar 19, 2021, 10:05 PM

#

im using SpeechBrain for a project rn that uses pytorch, first time working with it. not doing anything advanced, gonna end up using the pretrained models anyway so its the same layers and stuff. I just kinda started in tf and figured that the cert would help me get a job in ai

#

apparently its all just straight from the coursera course so only sequential

misty flint Mar 19, 2021, 10:11 PM

#

shut valve apparently its all just straight from the coursera course so only sequential

interesting

#

pithink

#

my classmates and i were thinking of doing the certificate just to get more familiar with TF not really to use the certificate lol

#

since ik very few certificates mean anything to companies

#

which is fair

#

my friend said their company just hired a guy that had a certificate for a certain tech but didnt actually end up knowing the tech when they brought him onboard

#

memecringeharold

exotic maple Mar 19, 2021, 10:14 PM

#

that shit happens all the time lol

misty flint Mar 19, 2021, 10:15 PM

#

he said if it was up to him, that guy def would not be hired

misty flint Mar 19, 2021, 10:15 PM

#

exotic maple that shit happens all the time lol

yeah one of the red flags was the guy had a 4 page resume

exotic maple Mar 19, 2021, 10:15 PM

#

sadly it means the DS learn certificates i have dont mean shit even thou I know -a bit- of it

exotic maple Mar 19, 2021, 10:15 PM

#

misty flint yeah one of the red flags was the guy had a 4 page resume

that alone would have made him discard his CV lol

misty flint Mar 19, 2021, 10:15 PM

#

and its not like hes a phd and this is academia or anything

#

CryLaugh

exotic maple Mar 19, 2021, 10:15 PM

#

I have always rejected people with more than 2 pages

#

1 is optimal

dim dirge Mar 19, 2021, 10:15 PM

#

Hello everyone, I am new to python programming and would like some help please in building a script!

misty flint Mar 19, 2021, 10:15 PM

#

but my friend said no one consulted him so Oopsies

exotic maple Mar 19, 2021, 10:16 PM

#

dim dirge Hello everyone, I am new to python programming and would like some help please i...

Hey. This is the DS channel. If your question is about DS we can help, otherwise try looking for a more appropriate channel 🙂

#

Fire away man

shut valve Mar 19, 2021, 10:16 PM

#

well its not bad at all (the course and the exam). I guess it depends on your course load, like i had the python skills to do this years ago in school but i was spreading myself thin enough there and didnt have the time to do it. Well I dont plan on cheating my way through it, i like ai and think it would just be nice to show with my projects

misty flint Mar 19, 2021, 10:16 PM

#

exotic maple sadly it means the DS learn certificates i have dont mean shit even thou I know ...

yeah. i think at this point, projects >>> certificates bc you can actually show employers your learning

exotic maple Mar 19, 2021, 10:17 PM

#

misty flint yeah. i think at this point, projects >>> certificates bc you can actually show ...

Then you have a shit mind like mine and can't think of a good project :v

#

-sad dog face-

misty flint Mar 19, 2021, 10:17 PM

#

shut valve well its not bad at all (the course and the exam) I guess it depends on your cou...

good to know ValkNaruhodo

exotic maple Mar 19, 2021, 10:17 PM

#

Eventually i'm just going to run a sentiment analysis of my friends in twitter

misty flint Mar 19, 2021, 10:17 PM

#

exotic maple Then you have a shit mind like mine and can't think of a good project :v

see the good thing about group projects is im never the one that has to think of the idea since im also shit at coming up with ideas

exotic maple Mar 19, 2021, 10:17 PM

#

or just find a random dataset on kaggle

misty flint Mar 19, 2021, 10:17 PM

#

memethinkingblackguy

exotic maple Mar 19, 2021, 10:17 PM

#

-proceeds to embarrass himself-

misty flint Mar 19, 2021, 10:18 PM

#

CryLaugh

#

one of my projects were doing some light nlp

#

on telegram data

dim dirge Mar 19, 2021, 10:18 PM

#

exotic maple Hey. This is the DS channel. If your question is about DS we can help, otherwise...

I'm not sure, but I thought I could get some help here because it's about data that I want to manage in python (manipulation of data files). Does it fit in the group discussion? or...

misty flint Mar 19, 2021, 10:19 PM

#

since all the public channels have an easy way to download their data

#

you just go to the top right corner, and literally press export chat data

#

kekw

#

so if you need ideas, theres that

#

they even have a telegram api

exotic maple Mar 19, 2021, 10:21 PM

#

dim dirge I'm not sure, but I thought I could get some help here because it's about data t...

it should

misty flint Mar 19, 2021, 10:21 PM

#

not that i really use telegram. its not that popular in the states

#

but it still seems like something nice to put on the resume

exotic maple Mar 19, 2021, 10:21 PM

#

misty flint

classification challenge - Is Rex trolling or not? :v

misty flint Mar 19, 2021, 10:21 PM

#

exotic maple classification challenge - Is Rex trolling or not? :v

CryLaugh

#

seems too good to be true, right?

#

thats what i thought too at first

dim dirge Mar 19, 2021, 10:23 PM

#

exotic maple it should

I explain my problem ? or...

misty flint Mar 19, 2021, 10:23 PM

#

the problem comes when you try to download the data, its usually too big so you have to select what you want

shut valve Mar 19, 2021, 10:24 PM

#

I honestly don't know the value of certs to projects like obv a good project is worth the most but like i make shitty little things that are fun for me i dont do medical or business stuff i do stupid shit

misty flint Mar 19, 2021, 10:25 PM

#

shut valve I honestly don't know the value of certs to projects like obv a good project is ...

yes, but does your stupid shit use dif technologies? dif languages? it doesnt matter then

#

DoggoKek

#

you can still put it

#

did you make a docker container for it? its still using docker DoggoKek

#

logo_docker

shut valve Mar 19, 2021, 10:28 PM

#

I do everything in python and yeah atm im just trying to get really good at a few libs: tf, (numpy, pandas, the basic data exploratory stuff), I try to put them on plotly's Dash

misty flint Mar 19, 2021, 10:28 PM

#

plotly's Dash ID_blurryeyes

#

im also trying to get better at that

#

data viz leggo

#

Praise

shut valve Mar 19, 2021, 10:28 PM

#

word its sick

misty flint Mar 19, 2021, 10:28 PM

#

ye fam

exotic maple Mar 19, 2021, 10:29 PM

#

what is plotly's dash?

#

should i save another bookmark of another tool to learn? -pukes-

shut valve Mar 19, 2021, 10:30 PM

#

like Im not a front end dev and i just do the basic scatter bar colorful graphs stuff its just a way to have stuff on the internet

misty flint Mar 19, 2021, 10:30 PM

#

exotic maple should i save another bookmark of another tool to learn? -pukes-

~~possibly~~

#

ID_BoomKek

#

im jk idk what you use for data viz

exotic maple Mar 19, 2021, 10:31 PM

#

I hate you rex

#

die

misty flint Mar 19, 2021, 10:31 PM

#

but think of it like a tableau alt

#

but for python

#

and more python-integrated

exotic maple Mar 19, 2021, 10:31 PM

#

i was thinking of using this

shut valve Mar 19, 2021, 10:31 PM

#

yeah like tableau with flask

exotic maple Mar 19, 2021, 10:31 PM

#

https://www.datawrapper.de/

Create charts and maps with Datawrapper

Datawrapper

Create interactive, responsive & beautiful charts — no code required.

#

since its free and shit xd

#

and looks pretty

misty flint Mar 19, 2021, 10:31 PM

#

oh that one looks interesting

exotic maple Mar 19, 2021, 10:31 PM

#

oh plotly is coded in python as well?

misty flint Mar 19, 2021, 10:31 PM

#

i will also save it

exotic maple Mar 19, 2021, 10:31 PM

#

another library?

misty flint Mar 19, 2021, 10:31 PM

#

yeah

exotic maple Mar 19, 2021, 10:32 PM

#

-dies buried by libraries-

misty flint Mar 19, 2021, 10:32 PM

#

exotic maple -dies buried by libraries-

all the libraries 💀

#

~~better than R packages~~

#

RunFail

shut valve Mar 19, 2021, 10:32 PM

#

yeah plotly is for graphs and data viz and dash is for front end deployment

misty flint Mar 19, 2021, 10:32 PM

#

~~those are endless~~

exotic maple Mar 19, 2021, 10:33 PM

#

development? NEVER

#

I tried using django once

#

almost killed my friend

shut valve Mar 19, 2021, 10:33 PM

#

lol yeah thats why i like dash real easy one file type shit

misty flint Mar 19, 2021, 10:33 PM

#

we used flask for our last project

#

and by we, i mean my friend

#

and by used, i mean we had one page that was a pain to figure out

#

💀

exotic maple Mar 19, 2021, 10:34 PM

#

https://plotly.com/

Plotly: The front end for ML and data science models

Plotly creates & stewards the leading data viz & UI tools for ML, data science, engineering, and the sciences. Language support for Python, R, Julia, and JavaScript.

#

this is it?

shut valve Mar 19, 2021, 10:34 PM

#

the struggling means your closer* to learning

misty flint Mar 19, 2021, 10:34 PM

#

exotic maple this is it?

yeah

#

what about when youre bashing your head against the keyboard

#

what does that mean

shut valve Mar 19, 2021, 10:34 PM

#

https://dash-gallery.plotly.host/Portal/

Dash Enterprise

misty flint Mar 19, 2021, 10:34 PM

#

pithink

exotic maple Mar 19, 2021, 10:35 PM

#

sigh

#

-looks for youtube guides for plotly-

misty flint Mar 19, 2021, 10:36 PM

#

theres another tool i was going to mention but i think warden will strangle me

#

RunFail

exotic maple Mar 19, 2021, 10:36 PM

#

oh yeah baby look at those animations. py_strong

exotic maple Mar 19, 2021, 10:36 PM

#

misty flint theres another tool i was going to mention but i think warden will strangle me

Have you seen SAO?

misty flint Mar 19, 2021, 10:36 PM

#

yeah

exotic maple Mar 19, 2021, 10:36 PM

#

I'll get rid of you ala Gun Gale Online :v

misty flint Mar 19, 2021, 10:36 PM

#

CryLaugh

#

ID_BoomKek

#

skull mask villain monkaCHRIST

#

but yeah

#

were going to try to use this for our telegram project

#

https://streamlit.io/gallery

#

the code is v minimal

#

even less than flask

shut valve Mar 19, 2021, 10:38 PM

#

yes streamlit is also very cool again its just i started in dash gonna build on what i know

exotic maple Mar 19, 2021, 10:38 PM

#

bro is this what my CS friend called "tool hell"

misty flint Mar 19, 2021, 10:38 PM

#

shut valve yes streamlit is also very cool again its just i started in dash gonna build on ...

im going to try dash if i cant figure out how to make the graphs interactive on streamlit

exotic maple Mar 19, 2021, 10:39 PM

#

like 500 good tools

#

to do the same shit

misty flint Mar 19, 2021, 10:39 PM

#

the engineer's dilemma

exotic maple Mar 19, 2021, 10:39 PM

#

abandon ML

#

return to carpentry

misty flint Mar 19, 2021, 10:39 PM

#

too caught up in the tools you never actually build anything

misty flint Mar 19, 2021, 10:39 PM

#

exotic maple return to carpentry

blobhyperthink

exotic maple Mar 19, 2021, 10:39 PM

#

When you have a hammer, everything looks like a nail

#

-throws a naive bayes classifier in your face-

misty flint Mar 19, 2021, 10:40 PM

#

we need a ML emote

#

somehow

exotic maple Mar 19, 2021, 10:40 PM

#

my GPU burning?

#

me cooking meat over my PC?

misty flint Mar 19, 2021, 10:40 PM

#

💀

shut valve Mar 19, 2021, 10:40 PM

#

new tool for that google colab

exotic maple Mar 19, 2021, 10:40 PM

#

or my accuracy being lower than 0.7?

#

ay

misty flint Mar 19, 2021, 10:41 PM

#

exotic maple or my accuracy being lower than 0.7?

💀 we did a Random Forest and a Decision Tree the other day and the accuracy was only slightly better than 50-50

#

0.53

#

💀

exotic maple Mar 19, 2021, 10:41 PM

#

might as well guess

#

lmao

misty flint Mar 19, 2021, 10:41 PM

#

right?

#

it had to do with poor data

#

but good thing it was just a throwaway assignment

shut valve Mar 19, 2021, 10:42 PM

#

what was your data?

misty flint Mar 19, 2021, 10:43 PM

#

like super small sample so the machine couldnt learn properly

#

also

#

tensorflow emote when

#

blobhyperthink

#

i need one

iron basalt Mar 19, 2021, 10:50 PM

#

dim dirge I explain my problem ? or...

Just ask, if it's the wrong channel we can tell you which channel to ask the question in.

dim dirge Mar 19, 2021, 10:51 PM

#

iron basalt Just ask, if it's the wrong channel we can tell you which channel to ask the que...

rps=np.loadtxt('Line_001RPS.txt',dtype = str)
sps=np.loadtxt('Line_001SPS.txt',dtype = str)
print('File 1 shape', sps.shape) # to see the structure of the file in detail
print('File 2 shape', rps.shape)
rps.shape
sps.shape
print(len(sps))
Nlines_sps=sps.shape[0]
Nlines_rps=rps.shape[0]
fichier=open('mise_a_jour.txt','w')
for i in range(Nlines_sps): # for loop handling the first file (this part works correctly)
    Cf1 = sps[i,0] + ' ' + sps[i,1] + ' ' + sps[i,2] + ' ' + sps[i,3] # line from the first file
    fichier.write(( (Cf1 +'\n')*282 ))
fichier.close()

iron basalt Mar 19, 2021, 10:51 PM

#

!code

arctic wedgeBOT Mar 19, 2021, 10:51 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

dim dirge Mar 19, 2021, 10:52 PM

#

import numpy as np

# np.str was only an alias of the builtin str and is gone from recent NumPy releases
rps = np.loadtxt('Line_001RPS.txt', dtype=str)
sps = np.loadtxt('Line_001SPS.txt', dtype=str)
print('File 1 shape', sps.shape)  # to see the structure of the file in detail
print('File 2 shape', rps.shape)
print(len(sps))
Nlines_sps = sps.shape[0]
Nlines_rps = rps.shape[0]
# the with-block closes the file even if the loop raises
with open('mise_a_jour.txt', 'w') as fichier:
    for i in range(Nlines_sps):  # for loop handling the first file (this part works correctly)
        Cf1 = sps[i, 0] + ' ' + sps[i, 1] + ' ' + sps[i, 2] + ' ' + sps[i, 3]  # line from the first file
        fichier.write((Cf1 + '\n') * 282)  # write each SPS line 282 times

iron basalt Mar 19, 2021, 10:52 PM

#

ok, so what is the context and what is the goal?

dim dirge Mar 19, 2021, 10:54 PM

#

I have two data files: SPS (251 rows, 4 columns) and RPS (781 rows, 4 columns).
First I duplicated each line of SPS 282 times, so I got a new file (mise_a_jour) of 70782 lines and 4 columns. Then I want to scan RPS: take the first 282 lines and add them to the mise_a_jour file, then repeat the operation, starting again not at d but at d+2 (I shift by two (2) lines each time, always taking 282 lines).
I don't know if I have been clear enough, but you can ask me other questions!

iron basalt Mar 19, 2021, 10:58 PM

#

So your problem is that you want to loop over RPS and take the first 282 lines and add them, then move the cursor down 2 lines and add the next 282 lines and so on. What are these 282 lines being added to?

#

"add them in the mise_a_jour file" - this needs more explanation.

#

by "add" do you mean append to the end of the file (such that the number of lines of the file increases by 282 each time)?

#

Also what is the context? So that we do not waste time on an XY problem. @dim dirge

dim dirge Mar 19, 2021, 11:06 PM

#

iron basalt So your problem is that you want to loop over RPS and take the first 282 lines a...

yes exactly !

dim dirge Mar 19, 2021, 11:08 PM

#

iron basalt So your problem is that you want to loop over RPS and take the first 282 lines a...

the 282 rows are added to the mise_a_jour file, which currently contains 70782 rows and 4 columns. at the end of the job, the mise_a_jour file will contain 70782 rows and 8 columns

dim dirge Mar 19, 2021, 11:10 PM

#

iron basalt by "add" do you mean append to the end of the file (such that the number of line...

not appended as new lines, but rather placed on the same line to make the 8 columns

iron basalt Mar 19, 2021, 11:10 PM

#

So you are aligning / matching lines and adding them up, resulting in the same number of lines (element-wise addition)?

dim dirge Mar 19, 2021, 11:14 PM

#

iron basalt So you are aligning / matching lines and adding them up, resulting in the same n...

exactly... so the first line of SPS will correspond to the first line of RPS in mise_a_jour
this will be the beginning of the mise_a_jour file

exotic maple Mar 19, 2021, 11:14 PM

#

does pandas have an append I/O method when producing a CSV output?

In standard python i think we can

with("file", "a")

to add something at the end of a file instead of overwriting.

iron basalt Mar 19, 2021, 11:16 PM

#

dim dirge exactly... so the first line of SPS will correspond to the first line of RPS in ...

Ok, but there is a problem, this will not align, RPS has too many lines even when you skip every other line in it (781 / 2 * 282 = 110121).

#

What do you want to happen when the end of SPS is reached before the end of RPS is reached?

tidal bough Mar 19, 2021, 11:21 PM

#

exotic maple does pandas have an append I/O method when producing a CSV output? In standard ...

You're talking about with open(file, "a") (which opens the file in append mode, so writes result in the data being added to the end of the file).

It's probably a better idea to just read the file, append your data to it, then dump it back. If you must, though, try passing a file-object opened in "a" mode to to_csv.
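
A minimal sketch of the append approach described above, assuming pandas is installed. The file name `log.csv` and the column names are made up for illustration. `DataFrame.to_csv` accepts `mode="a"` directly, and it also accepts a file object opened in append mode; in both cases `header=False` keeps the header from being written a second time.

```python
import os
import tempfile

import pandas as pd

# Hypothetical output path inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "log.csv")

first = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
more = pd.DataFrame({"a": [5], "b": [6]})

# The first write creates the file and includes the header row.
first.to_csv(path, index=False)

# Variant 1: let pandas open the file in append mode.
more.to_csv(path, mode="a", header=False, index=False)

# Variant 2: pass a file object that was opened in append mode.
with open(path, "a", newline="") as fh:
    more.to_csv(fh, header=False, index=False)

result = pd.read_csv(path)
print(result)
```

Appending only works cleanly when the new rows have the same columns in the same order as the existing file, which is one reason reading, concatenating and rewriting can be the safer route.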

dim dirge Mar 19, 2021, 11:21 PM

#

iron basalt Ok, but there is a problem, this will not align, RPS has too many lines even whe...

No no RPS has 781 lines.
and SPS has 251; so 251*282=70782
If I take the first 282 lines of RPS, and match them to the first line of SPS duplicate 282 times.
I'll leave you the three files if you want, so you can look at them and maybe it will help you to understand better!

dim dirge Mar 19, 2021, 11:22 PM

#

iron basalt What do you want to happen when the end of SPS is reached before the end of RPS ...

the SPS lines are duplicated 282 times, which makes 251*282=70782 lines

#

if I take 282 lines of RPS and start again by d+2 each time, I will have 70782 lines also for RPS
NB: the numbering of RPS goes from 561 to 1342.
it is an acquisition in which each signal sent by one (1) point of SPS (line) is recorded by 282 points RPS (line). then to send my second signal I shift two (2) points in RPS. that is to say that the first two points of RPS which recorded the first point of SPS will not record the second point of SPS anymore

exotic maple Mar 19, 2021, 11:31 PM

#

tidal bough You're talking about `with open(file, "a")` (which opens the file in append mode...

Thanks, Mr. Confused Reptile pithink

iron basalt Mar 19, 2021, 11:32 PM

#

dim dirge if I take 282 lines of RPS and start again by d+2 each time, I will have 70782 l...

RPS has 781 lines correct?

#

So one thing you want for sure is to duplicate each line of sps 282 times right? let's start with that.

#

sps_duplicated = np.repeat(sps, 282, axis=0)

dim dirge Mar 19, 2021, 11:35 PM

#

iron basalt So one thing you want for sure is to duplicate each line of sps 282 times right?...

yes yes I already did

iron basalt Mar 19, 2021, 11:35 PM

#

next you want every other line from rps right?

#

every_other_rps = rps[::2]

#

oh wait nvm, you want to have a sliding window over rps
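
A rough sketch of that sliding window, assuming each SPS row should sit next to 282 consecutive RPS rows and the window moves down by 2 rows per SPS row. The function name and the small demo arrays are made up for illustration; the real data would come from `np.loadtxt` as in the script above.

```python
import numpy as np


def pair_shots_with_receivers(sps, rps, window=282, step=2):
    """Repeat every sps row `window` times and place a sliding window of rps rows beside it."""
    n_shots = sps.shape[0]
    # Number of rps rows needed so that the last window is still complete.
    needed = (n_shots - 1) * step + window
    if rps.shape[0] < needed:
        raise ValueError(f"rps needs at least {needed} rows, got {rps.shape[0]}")
    sps_repeated = np.repeat(sps, window, axis=0)
    # Window k covers rps rows k*step .. k*step + window - 1.
    starts = np.arange(n_shots) * step
    row_index = (starts[:, None] + np.arange(window)).ravel()
    return np.hstack([sps_repeated, rps[row_index]])


# Synthetic stand-ins for the real files: 3 shots, 8 receivers, window of 4.
sps_demo = np.arange(12).reshape(3, 4)
rps_demo = np.arange(100, 132).reshape(8, 4)
combined = pair_shots_with_receivers(sps_demo, rps_demo, window=4, step=2)
print(combined.shape)
print(combined[4])
```

With 251 SPS rows the last window starts at row 500 and needs 500 + 282 = 782 RPS rows. The numbering 561 to 1342 covers 782 points, while the chat mentions 781 rows, so the row count of the RPS file is worth double-checking before running this on the real data.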

dim dirge Mar 19, 2021, 11:38 PM

#

iron basalt ```py sps_duplicated = np.repeat(sps, 282, axis=0) ```

I can't believe it, you got it in one line?
It took me 2 days though 😩

dim dirge Mar 19, 2021, 11:39 PM

#

iron basalt next you want every other line from rps right?

yes

iron basalt Mar 19, 2021, 11:39 PM

#

# Loop through rps, but with a step of 2 (every other line)
for i in range(0, rps.shape[0], 2):
  ...  # Do stuff here

dim dirge Mar 19, 2021, 11:41 PM

#

iron basalt next you want every other line from rps right?

Yes, but the first 282 lines must correspond to the other 282 of SPS and then I come back to take the other 282 values that follow, leaving the first two lines

iron basalt Mar 19, 2021, 11:44 PM

#

You are looping through every other line of rps while also looping over every line of sps (not every other)? @dim dirge

dark willow Mar 19, 2021, 11:46 PM

#

What's a nice way to embed PyCharm visualizations in a Medium article?

sharp gate Mar 19, 2021, 11:47 PM

#

i think this is the right channel...

from PIL import Image

im = Image.open("maze.jpg")
im.show()

# Append every pixel's (R, G, B) tuple to maze.txt; the with-block closes the file
with open('maze.txt', 'a+') as output:
    for pixel in im.getdata():
        output.write(str(pixel))

this gives me a lot of tuples in (R,G,B) configuration, however i would like them to be in a [1, 0, 0, 1, 1, 0] sorta configuration
i am ~~fairly~~ very new to doing this kind of thing in python.

#

so any help would be great
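
One possible way to get from (R, G, B) tuples to a flat list of 0s and 1s is to threshold the brightness of each pixel. This is only a sketch: the helper name `pixels_to_bits`, the threshold of 128 and the choice that dark pixels become 1 are all assumptions, since the question does not say which colour the maze walls have.

```python
def pixels_to_bits(pixels, threshold=128):
    """Map (R, G, B) tuples to 1 for dark pixels and 0 for light ones."""
    bits = []
    for r, g, b in pixels:
        brightness = (r + g + b) / 3  # simple average of the three channels
        bits.append(1 if brightness < threshold else 0)
    return bits


# A few hand-written pixels standing in for the image data.
sample = [(0, 0, 0), (255, 255, 255), (250, 248, 251), (10, 5, 12)]
bits = pixels_to_bits(sample)
print(bits)
```

With the image from the snippet above, the same helper could be fed `im.convert("RGB").getdata()`. JPEG compression tends to leave greyish pixels near edges, so the threshold may need some tuning.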

dim dirge Mar 19, 2021, 11:51 PM

#

iron basalt You are looping through every other line of rps while also looping over every li...

wait I'll leave you the files, so we can understand each other better I think

arctic wedgeBOT Mar 19, 2021, 11:51 PM

#

Hey @dim dirge!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

sharp gate Mar 19, 2021, 11:56 PM

#

dim dirge wait I'll leave you the files, so we can understand each other better I think

try putting the code in pastebin.com

dim dirge Mar 20, 2021, 12:04 AM

#

sharp gate try putting the code in pastebin.com

I did it but it doesn't work

dim dirge Mar 20, 2021, 12:09 AM

#

iron basalt You are looping through every other line of rps while also looping over every li...

701 688081.8 3838302.1 46.0 561 684590.2 3837867.6 41.0

#

this is the first line of the file I need to have

bronze skiff Mar 20, 2021, 12:24 AM

#

shut valve im using speech brain for a project rn that uses pytorch first time working with...

no, certs won't get you a job in ai

#

considering the material that cert is based off of is really basic, it won't really be an indicator of anything

#

the only cert that matters for ml jobs is your degree

#

lets be real

grave frost Mar 20, 2021, 12:26 AM

#

yeah, totally agree

#

maybe you could get a job without degree, but you would have to be a 200iq genius for that

misty flint Mar 20, 2021, 12:29 AM

#

your degree is a pretty important cert

#

DoggoKek

tranquil apex Mar 20, 2021, 12:44 AM

#

im stuck

#

i want to calculate the growth ratio between these dates for each file

file_name           date   dist        
20210314_080621.txt 03-16  0.820328
                    03-18  0.838098
20210314_080633.txt 03-16  0.755168
                    03-18  0.784473
20210314_080644.txt 03-16  0.561407

#

i dont think groupby.agg would work for this
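
For what it's worth, a groupby can cover this if "growth ratio" means the last `dist` divided by the first `dist` of each file, which is an assumption about the question. The sketch below rebuilds only the two files whose rows are both visible in the table above.

```python
import pandas as pd

# Rebuild the visible part of the table with a (file_name, date) MultiIndex.
df = pd.DataFrame(
    {
        "file_name": ["20210314_080621.txt"] * 2 + ["20210314_080633.txt"] * 2,
        "date": ["03-16", "03-18", "03-16", "03-18"],
        "dist": [0.820328, 0.838098, 0.755168, 0.784473],
    }
).set_index(["file_name", "date"]).sort_index()

# Group by the file_name level; sort_index above keeps the dates in order within each file.
per_file = df.groupby(level="file_name")["dist"]
growth_ratio = per_file.last() / per_file.first()
print(growth_ratio)
```

If a relative change is wanted instead, `growth_ratio - 1` gives it. A file with a single date ends up with a ratio of exactly 1 under this definition.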

misty flint Mar 20, 2021, 1:28 AM

#

oh wait i misinterpreted your question

stiff barn Mar 20, 2021, 1:50 AM

#

If anyone is interested, I just finished up a project experimenting with style GANs. I trained a model that generates 1024x1024 bird images. If you're curious, I post 2 per day on Twitter as @bird_not_exist. Happy to discuss methodology with anyone who is interested.

misty flint Mar 20, 2021, 2:03 AM

#

just took a peek, looks awesome!

#

would love to ask more but idk anything about GANs but i might come to you later if i do something with it.

#

like, do you think its possible to do the same with fishies?

#

🐡

stiff barn Mar 20, 2021, 2:05 AM

#

Thanks @misty flint! Happy to answer any questions you have. Feel free to send me a DM whenever as well.

#

You could for sure do it for fish haha.

#

Would just need to get enough images. I used something like 35K bird images to train and more would have been better.

#

Also need a pretty beefy GPU haha

misty flint Mar 20, 2021, 2:08 AM

#

this is good info to know. thanks i might take you up on that offer when i get a bit further along in my ML studies

#

Praise

stiff barn Mar 20, 2021, 2:10 AM

#

Good stuff! GANs seem to have a lot of really cool use cases for sure.

misty flint Mar 20, 2021, 2:11 AM

#

DoggoKek

exotic maple Mar 20, 2021, 2:17 AM

#

misty flint https://streamlit.io/gallery

bro this crap looks amazing af omg

misty flint Mar 20, 2021, 2:20 AM

#

ah yes, learn all the tools

#

blobhyperthink

exotic maple Mar 20, 2021, 2:21 AM

#

misty flint

Luvia

#

a goddamn

#

man

#

of culture

misty flint Mar 20, 2021, 2:21 AM

#

DoggoKek

stiff barn Mar 20, 2021, 2:27 AM

#

Seems pretty sweet. Looks like it goes a lot further than plotly

#

Been a while since I bothered with plotly though. Could be better by now. Really just only use matplotlib and seaborn

misty flint Mar 20, 2021, 2:37 AM

#

usually good for most cases

#

~~until you need to show non-technical people~~

#

DoggoKek

stiff barn Mar 20, 2021, 2:57 AM

#

I live a simple life lol.

misty flint Mar 20, 2021, 3:00 AM

#

jealous

autumn veldt Mar 20, 2021, 5:53 AM

#

i got 97 test data, and i'm willing to do some manual calculation, but when i print my test data, i only get 5 from the top and 5 from the bottom. im using print(X_test). what syntax do i need to write to show all my 97 data samples?

agile wing Mar 20, 2021, 6:53 AM

#

exploratory data analytics

lapis sequoia Mar 20, 2021, 9:00 AM

#

Hi - I wanted to ask what is the name of the thing I'm trying to do:

Say I have 200 sentences that follow this pattern:

Chevrolet can go 200 km/h
230 km/h is the top speed of this Ford

I wanted to train a model that would identify what's the CarBrand and TopSpeed in that sentence. What is the scientific name for what I'm trying to do? It's not sentiments

EDIT: it's named entity recognition

buoyant yacht Mar 20, 2021, 9:41 AM

#

please suggest the world best github repo / any blog for sentiment analysis with good accuracy?

#

anyone knows?

grave frost Mar 20, 2021, 10:15 AM

#

buoyant yacht please suggest the world best github repo / any blog for sentiment analysis with...

depends on the dataset; in most cases it would need a lot of modifications/tweaks to get it working

buoyant yacht Mar 20, 2021, 10:21 AM

#

can you suggest some github repo for sentiment analysis ?

grave frost Mar 20, 2021, 10:28 AM

#

you can google it lol, there are plenty out there

grave frost Mar 20, 2021, 11:45 AM

#

autumn veldt i got 97 test data, and i willing to do some manual calculation, but when i prin...

if it is a numpy array, you can convert it to list and use print()

#

if in pandas, use .values to convert a column to a numpy array

autumn veldt Mar 20, 2021, 11:53 AM

#

grave frost if in pandas, use `.values` to convert a column to a numpy array

Ok, thanks, I'll try it later

grave frost Mar 20, 2021, 11:57 AM

#

if it still can't be done, you can use magic functions to write the variable to the file (I think its %writefile variable > file_namee.txt) and view the file
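
Five rows from the top and five from the bottom is how pandas shortens a long DataFrame, so the sketch below assumes `X_test` is a DataFrame; the column names and values are made up. As far as I know, the `%%writefile` magic saves the text of a notebook cell rather than the value of a variable, so `to_string()` or `to_csv()` is the more direct route to a file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for X_test: 97 rows, two made-up columns.
X_test = pd.DataFrame({"feature_a": np.arange(97), "feature_b": np.arange(97) * 0.5})

# Option 1: lift the row limit only inside this block.
with pd.option_context("display.max_rows", None):
    print(X_test)

# Option 2: build the full text explicitly; to_string() does not shorten the rows.
full_text = X_test.to_string()
print(len(full_text.splitlines()))
```

`pd.set_option("display.max_rows", None)` changes the limit for the whole session instead of a single block.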

remote pumice Mar 20, 2021, 1:37 PM

#

i want to display date and time generated in python file (.py) in my django project its a opencv file
i have captured the drowsiness alert data so

lapis sequoia Mar 20, 2021, 2:59 PM

#

I'm totally noob in Python, can someone help me? I have some homework from space data science and this is my notebook:

hollow sentinel Mar 20, 2021, 3:03 PM

#

We can’t supply code for homework

#

but we can give general ideas

rotund dock Mar 20, 2021, 3:13 PM

#

Hi anyone here familiar with scipy.stats?

hollow sentinel Mar 20, 2021, 3:14 PM

#

@rotund dock https://dontasktoask.com/

Don't ask to ask, just ask

rotund dock Mar 20, 2021, 3:15 PM

#

hollow sentinel @rotund dock https://dontasktoask.com/

Fair enough

#

I want to calculate the value for a given probability using the st.gumbel_r.ppf(). I'm comparing it with the analytical solution and it is giving me completely different results, anyone knows why?
I obtained the values for the scale and location using the method of moments

P_class = [0.14285714, 0.28571429, 0.42857143, 0.57142857, 0.71428571, 0.85714286, 0.999999]

u = 8.590342451210152
alpha = 0.1841827435642898

h_class1 = st.gumbel_r.ppf(P_class, scale = u, loc = alpha)
h_class = u-np.log(-np.log(P_class))/alpha

Results
h_class1 = [ -5.53466431,  -1.7516637 ,   1.6076281 ,   5.17091797, 9.54112426,  16.24661736, 118.86414528]
h_class = [4.97583548,  7.36682128,  9.49000861, 11.74212971, 14.50424958, 18.74235061, 83.60013865]

I want to get the same h_class results when using the scipy function
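A possible explanation, sketched with the numbers from the message above: SciPy's `gumbel_r.ppf` computes `loc - scale * ln(-ln(P))`, so matching the analytical formula `u - ln(-ln(P)) / alpha` would need `loc=u` and `scale=1/alpha`, not `scale=u, loc=alpha`. That `u` is the location and `alpha` the inverse scale is read off the analytical formula, not stated in the message.

```python
import numpy as np
import scipy.stats as st

P_class = np.array([0.14285714, 0.28571429, 0.42857143, 0.57142857,
                    0.71428571, 0.85714286, 0.999999])

u = 8.590342451210152        # location parameter
alpha = 0.1841827435642898   # inverse of the scale in the analytical formula

# Analytical Gumbel quantile: h = u - ln(-ln(P)) / alpha
h_analytical = u - np.log(-np.log(P_class)) / alpha

# SciPy parameterisation: loc is the location, scale is 1 / alpha.
h_scipy = st.gumbel_r.ppf(P_class, loc=u, scale=1.0 / alpha)

print(h_analytical)
print(h_scipy)
print(np.allclose(h_analytical, h_scipy))
```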

serene scaffold Mar 20, 2021, 3:18 PM

#

hollow sentinel @rotund dock https://dontasktoask.com/

I'm not sure what your intentions are, but linking people to that website isn't appropriate. It comes off as dismissive and condescending.

hollow sentinel Mar 20, 2021, 3:20 PM

#

sorry I saw other people do it so I assumed it was ok

serene scaffold Mar 20, 2021, 3:21 PM

#

hollow sentinel sorry I saw other people do it so I assumed it was ok

I see. Thanks for letting me know. We see that sort of thing in the same light as "google it" or "rtfm", so be sure to avoid it.

hollow sentinel Mar 20, 2021, 3:22 PM

#

Got it. Won’t happen again

rotund dock Mar 20, 2021, 3:25 PM

#

I was just wondering if this was the right place to ask a question about that topic

hollow sentinel Mar 20, 2021, 3:27 PM

#

Which topic

rotund dock Mar 20, 2021, 3:32 PM

#

scipy.stats

#

Statistics

hollow sentinel Mar 20, 2021, 3:57 PM

#

yeah it is

serene scaffold Mar 20, 2021, 4:06 PM

#

rotund dock I was just wondering if this was the right place to ask a question about that to...

This is the right channel for asking about scipy. The best way to get help is to just dive right in to your question--if someone understands the subject matter, they'll see your message and help.

grave frost Mar 20, 2021, 4:07 PM

#

serene scaffold I'm not sure what your intentions are, but linking people to that website isn't ...

hmm..but the content in the website seems pretty polite and very understanding. It's just a suggestion - not phrased as a rule. If the tone was bad, I would have understood. but I dont see the harm in anyone getting someone to read the content in the site

#

if it was me, This link should be in the first message every new user gets

serene scaffold Mar 20, 2021, 4:07 PM

#

So for example, don't ask if anyone knows about a general library or a type of problem, and hope that that person will know how to answer your specific question. It's easier to start helping if we know exactly what you'd like help with.

serene scaffold Mar 20, 2021, 4:09 PM

#

grave frost hmm..but the content in the website seems pretty polite and very understanding. ...

the wording on the website isn't necessarily bad, but getting a link like that when wanting help with a problem sends the message that "look, you've made a rookie mistake so common that there's a website dedicated to it"

#

I believe we cover some of the same material in our question asking guide.

grave frost Mar 20, 2021, 4:09 PM

#

serene scaffold the wording on the website isn't necessarily bad, but getting a link like that w...

yeah, so? StackOverflow has many such links for everyone. its much easier to link a message than to type it out

#

typing out everything everytime someone new asks would flood the server. its not efficient

serene scaffold Mar 20, 2021, 4:10 PM

#

grave frost yeah, so? StackOverflow has many such links for everyone. its much easier to lin...

I'm not sure I follow your reasoning. This isn't stack overflow. And Discord hasn't threatened to decrease their resource allocation for us.

grave frost Mar 20, 2021, 4:11 PM

#

its not about the resource allocation, just simply that :
putting a link to a message is more efficient than typing out a huge wall of lines everytime

serene scaffold Mar 20, 2021, 4:12 PM

#

That's why I have a file of copypastas that I wrote

grave frost Mar 20, 2021, 4:13 PM

#

serene scaffold That's why I have a file of copypastas that I wrote

that's pretty useless and inefficient. either you let someone put a link (because not everyone wants to store copypastas) or you get a bot to do that

#

a link does not feel harsh or anything. its just a website. how can it convey some negative feelings?

serene scaffold Mar 20, 2021, 4:14 PM

#

I can appreciate that getting a link like that might not feel harsh to you, but it does for a lot of people, and the reason it does makes complete sense.

#

How about you DM @sonic vapor if you'd like to discuss this further.

bronze skiff Mar 20, 2021, 4:37 PM

#

otherwise you only lead to inefficiencies of a question-answering system and help vampirism

serene scaffold Mar 20, 2021, 4:38 PM

#

Pasting the link without another word is the problem (not necessarily the content of the website), for the reasons I've explained previously.

You're not required to give long-winded responses for each ask-to-ask instance. You can simply say "Go ahead and ask" if you'd like.

Let us know in #community-meta if you'd like to discuss this further.

kindred radish Mar 20, 2021, 5:11 PM

#

So I'm using sklearn's MLPClassifier to predict whether a machine breaks based on input data

#

I've tried a tonne of different parameter settings and 8/10 times the machine's score is terrible

#

It's not really predicting it at all

#

Is this indicative that the input data might not actually correlate to whether a break happens or not?

serene scaffold Mar 20, 2021, 5:35 PM

#

@kindred radish what features are you using?

kindred radish Mar 20, 2021, 5:41 PM

#

I've got like 6 features and they're features of the thing that's going into the machine

#

Such as the thickness of the material and its composition

#

I don't have any data on the machine itself apart from how many times it breaks a day

grave frost Mar 20, 2021, 6:03 PM

#

@kindred radish wut?

#

can you clarify what exactly you want to accomplish?

kindred radish Mar 20, 2021, 6:14 PM

#

Ok so:
-X_train are like 6 features about the film that goes into the machine.
-y_train is an array of 1s and 0s that represent whether or not the machine breaks.
-This is fed into the MLPClassifier that SkLearn provides.
-The aim was to create a model that could correctly predict if the machine breaks or not.
-Despite playing around with the classifier's parameters, the model is unsuccessfully predicting whether the machine breaks: the score and the precision are abysmal.

my question is: Does this mean that the input data is uncorrelated to whether or not the machine breaks? Can I say that this data doesn't have anything to do with the machine breaking?

kindred radish Mar 20, 2021, 6:40 PM

#

Lmk if anything needs clarification, I think that's as clear as I could make it! I standardised the input data as well ^^

grave frost Mar 20, 2021, 7:03 PM

#

yeah, either your model does not have correctly inputted data, or there is no correlation. can you identify whether or not the machine breaks seeing a sample?

ionic sun Mar 20, 2021, 7:04 PM

#

ive been stuck for hours on trying to curve fit some data

#

the curve is relatively exponential

#

but when i plot the curve fit it gives me a straight line

#

```python
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit


def curve1(x, a, b, c):
    # Exponential model: a * b**x + c
    return a * (b ** x) + c


def plot1():
    # year1 and production are the data arrays, defined elsewhere
    fig, ax = plt.subplots()
    graph = ax.scatter(year1, production, c=production, cmap="bone_r", s=8)
    popt, _ = curve_fit(curve1, year1, production, p0=[-1000, 1e-6, 1])
    a, b, c = popt
    # Evaluate the fitted curve at every x value
    fit = [curve1(i, a, b, c) for i in year1]

    plt.plot(year1, fit)
    plt.tight_layout()
    plt.savefig("graph1.png")
    plt.show()
```
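One possible cause of the straight line, offered as a guess since the data isn't shown: if `year1` holds raw calendar years (around 2000), then `b ** x` with a starting guess of `b = 1e-6` underflows to zero, the model collapses to the constant `c`, and the optimiser has no gradient to follow. The sketch below uses synthetic data and an equivalent model `a * exp(k * x) + c` (so `b = exp(k)`), shifts the years to start at zero, and picks a starting guess on the scale of the data. All names and numbers are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit


def exp_model(x, a, k, c):
    # Same family as a * b**x + c, with b = exp(k)
    return a * np.exp(k * x) + c


# Synthetic stand-in for the real data: roughly exponential growth plus noise.
rng = np.random.default_rng(0)
years = np.arange(1990, 2021)
production = 5.0 * np.exp(0.15 * (years - 1990)) + 20.0 + rng.normal(0, 1.0, years.size)

# Shift x so the exponent stays small and the optimiser is well conditioned.
x = years - years.min()

# Starting guess chosen on the scale of the data.
p0 = [1.0, 0.1, production.min()]
popt, _ = curve_fit(exp_model, x, production, p0=p0, maxfev=10000)

fitted = exp_model(x, *popt)
print("a, k, c =", popt)
```

Plotting `fitted` against `years` should then trace the curve, not a flat line. If the real data behave differently, the starting guess is the first thing to revisit.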

kindred radish Mar 20, 2021, 7:11 PM

#

grave frost yeah, either your model does not have correctly inputted data, or there is no co...

I'm sorry, what did you mean by that last bit? Are you asking if I have access to the physical machine itself?

polar charm Mar 20, 2021, 7:21 PM

#

hello, when I run my network, "model.fit(x=train_samples, y=train_labels, batch_size=10, epochs=30, verbose=2)" and train_sample and _label are train_labels =['^GSPC.Adj Close.csv']
train_samples = ['^GSPC.Close.csv']

I get Function call stack:
train_function

misty flint Mar 20, 2021, 8:30 PM

#

rotund dock I want to calculate the value for a given probability using the st.gumbel_r.ppf(...

i couldnt figure it out either. sorry. i tried all the various methods in that module too

grave frost Mar 20, 2021, 8:57 PM

#

kindred radish I'm sorry, what did you mean by that last bit? Are you asking if I have access t...

no, like if you were given the training data, could you yourself identify whether it would break or not?

kindred radish Mar 20, 2021, 9:09 PM

#

grave frost no, like if you were given the training data, could you yourself identify whethe...

No I couldn't, I didn't even know if there was a correlation to begin with. I was just given the data to play around with and see if I could get anything out of it

kindred radish Mar 20, 2021, 9:46 PM

#

But it's fine that it doesn't work, as long as that means I can say that the variables do not affect whether the machine breaks or not

grave frost Mar 20, 2021, 9:49 PM

#

kindred radish But it's fine that it doesn't work, as long as that means I can say that the var...

no, it would be more likely that your feature engineering is wrong

kindred radish Mar 20, 2021, 9:51 PM

#

How do you mean?

#

All of the features have been standardised and I've removed outliers

grave frost Mar 20, 2021, 9:53 PM

#

https://machinelearningmastery.com/discover-feature-engineering-how-to-engineer-features-and-how-to-get-good-at-it/

Machine Learning Mastery

Discover Feature Engineering, How to Engineer Features and How to G...

Feature engineering is an informal topic, but one that is absolutely known and agreed to be key to success in applied machine learning. In creating this guide I went wide and deep and synthesized all of the material I could. You will discover what feature engineering is, what problem it solves, why it matters, how […]

kindred radish Mar 20, 2021, 9:56 PM

#

Is this more to do with unsupervised learning?

#

Thank you for your answers so far btw!

grave frost Mar 20, 2021, 9:57 PM

#

kindred radish Is this more to do with unsupervised learning?

uhh no. this is feature engineering - that involves applying certain methodologies to enhance the features you are feeding to a model

kindred radish Mar 20, 2021, 10:03 PM

#

I'm unsure how much engineering I can really do with my features

#

They're measured values of things like Viscosity or amount of chemical

grave frost Mar 20, 2021, 10:04 PM

#

well, its mostly common-sense/logic. What do you think is the best way to represent your data so that the model can understand

kindred radish Mar 20, 2021, 10:05 PM

#

Well I made them all standardised so that they were comparable to one another

#

As some features are of the order of magnitude of 100 whilst some are like 0.1

#

So I guess that was a form of feature engineering?

grave frost Mar 20, 2021, 10:05 PM

#

uh-huh

#

and what is your task as well as your model?

kindred radish Mar 20, 2021, 10:07 PM

#

My task was exploratory, I wanted to see if this data was responsible for why the machines break

grave frost Mar 20, 2021, 10:07 PM

#

wait, if you are doing EDA then why are you using models?

kindred radish Mar 20, 2021, 10:07 PM

#

If I could get my model to reliably predict a break, that would mean that the input data affects the machine

#

EDA?

misty flint Mar 20, 2021, 10:08 PM

#

exploratory data analysis

grave frost Mar 20, 2021, 10:08 PM

#

EDA involves something like this: https://www.kaggle.com/sohier/structured-eda-for-data-cleaning

Structured EDA for Data Cleaning

Explore and run machine learning code with Kaggle Notebooks | Using data from Credit Card Data from book "Econometric Analysis"

#

a small list of what you have to do (not to solve the task, but to explore the data)

#

exploration: like what feature appears the most. plots etc.

misty flint Mar 20, 2021, 10:12 PM

#

whether the machine breaks? what does that even mean

#

memecringeharold

kindred radish Mar 20, 2021, 10:13 PM

#

kindred radish Ok so: -X_train are like 6 features about the film that goes into the machine. ...

Ah it's a bit confusing, I don't mean machine as machine learning. Refer to this ^^

misty flint Mar 20, 2021, 10:13 PM

#

you have some kind of machine that youre feeding data to to determine whether it physically breaks?

#

why arent you measuring performance of the machine?

#

or does the machine not have any metrics?

#

i know youre not referring to machine in machine learning

kindred radish Mar 20, 2021, 10:13 PM

#

The machine's only metric is that it breaks or that it doesn't break

misty flint Mar 20, 2021, 10:14 PM

#

memecringeharold

#

its tough to do any feature engineering then

#

if all you know is that

kindred radish Mar 20, 2021, 10:14 PM

#

Yeah exactly, there isn't context for me to problem solve

grave frost Mar 20, 2021, 10:15 PM

#

well, you don't have to train a model anyways, so no feature engineering required

misty flint Mar 20, 2021, 10:15 PM

#

can you not look for more context? i feel like it would go a long way in your data analysis

kindred radish Mar 20, 2021, 10:15 PM

#

This is all the data that I have to work with unfortunately

misty flint Mar 20, 2021, 10:15 PM

#

memecringeharold

kindred radish Mar 20, 2021, 10:16 PM

#

Yeah I know

misty flint Mar 20, 2021, 10:16 PM

#

Despite playing around with the classifier's parameters, the model is unsuccessfully predicting whether the machine breaks: the score and the precision are abysmal.
how much data did you have to work with

#

you usually need A LOT of instances

kindred radish Mar 20, 2021, 10:16 PM

#

After cleaning it, it's around 300

misty flint Mar 20, 2021, 10:17 PM

#

pithink

kindred radish Mar 20, 2021, 10:17 PM

#

Yeah not a lot

#

Kind of frustrating, this is my final project as well

misty flint Mar 20, 2021, 10:17 PM

#

memecringeharold

#

yikes

#

how many classifiers did you try

#

try more

#

best answer might be a combo

kindred radish Mar 20, 2021, 10:18 PM

#

I've only used the sklearn MLPClassifier

misty flint Mar 20, 2021, 10:19 PM

#

why did you use that one

kindred radish Mar 20, 2021, 10:19 PM

#

Because i thought it would classify between "break" and "not break"

misty flint Mar 20, 2021, 10:20 PM

#

i think you should not use a neural network

#

you dont have enough data

#

use more simple classifiers

#

its probably overfitting

kindred radish Mar 20, 2021, 10:20 PM

#

Oh right, what kinds of classifier's would you recommend from SkLearn?

misty flint Mar 20, 2021, 10:21 PM

#

try a bunch

#

logistic regression, naive-bayes, SVM, random forest, decision tree, etc.
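A rough sketch of "try a bunch", assuming a dataset of about 300 rows and 6 features like the one described. The data here is synthetic (`make_classification`), so the scores mean nothing in themselves; the point is that cross-validation gives each model the same treatment on a small dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data: 300 rows, 6 features, binary target.
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)

models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "naive Bayes": GaussianNB(),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validation; the default score for classifiers is accuracy.
    scores = cross_val_score(model, X, y, cv=5)
    results[name] = scores.mean()
    print(f"{name}: {scores.mean():.2f} +/- {scores.std():.2f}")
```

Putting the scaler inside the pipeline means it is refitted on each training fold, so the held-out fold never leaks into the standardisation.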

kindred radish Mar 20, 2021, 10:22 PM

#

Ok thank you I'll give it a go

#

I know I keep harping on about it

misty flint Mar 20, 2021, 10:22 PM

#

see how your accuracy looks like afterwards

kindred radish Mar 20, 2021, 10:23 PM

#

But If the precision is bad for those as well, maybe that suggests the input data has nothing to do with the output?

misty flint Mar 20, 2021, 10:23 PM

#

yes

kindred radish Mar 20, 2021, 10:23 PM

#

Thank god

misty flint Mar 20, 2021, 10:23 PM

#

but youll have to try to see

#

DoggoKek

kindred radish Mar 20, 2021, 10:23 PM

#

Okok thank you

misty flint Mar 20, 2021, 10:23 PM

#

technically thats a conclusion too

kindred radish Mar 20, 2021, 10:23 PM

#

Yes exactly

misty flint Mar 20, 2021, 10:23 PM

#

and then you can show that by showing all the different classifiers you tried

kindred radish Mar 20, 2021, 10:23 PM

#

The worst thing I can say to my supervisor is "I don't know why it doesn't work"

misty flint Mar 20, 2021, 10:24 PM

#

DoggoKek

#

yep

kindred radish Mar 20, 2021, 10:24 PM

#

So if I try a bunch of things and it doesn't work then I can say "this could be because the data isn't right for the job"

#

Which is much much nicer

misty flint Mar 20, 2021, 10:24 PM

#

yep yep

kindred radish Mar 20, 2021, 10:24 PM

#

Thank you !!!

misty flint Mar 20, 2021, 10:24 PM

#

youve proven too

#

np

#

DoggoKek

kindred radish Mar 20, 2021, 10:24 PM

#

Feel so relieved

misty flint Mar 20, 2021, 10:24 PM

#

Praise

kindred radish Mar 20, 2021, 10:25 PM

#

Would you mind if I @you here if I run into trouble?

#

I won't do it for petty shit, just advice

misty flint Mar 20, 2021, 10:25 PM

#

yeah sure

kindred radish Mar 20, 2021, 10:25 PM

#

And it'd only be for tomorrow + Monday

#

Thank you <3 TT.TT

misty flint Mar 20, 2021, 10:25 PM

#

whenever i have time, ill end up responding

#

haha np

#

we're all here learning together

#

DoggoKek

grave frost Mar 20, 2021, 10:27 PM

#

kindred radish But If the precision is bad for those as well, *maybe* that suggests the input d...

huh? what does the precision have to do with any correlation between data?

kindred radish Mar 20, 2021, 10:28 PM

#

Lemme show you what I mean, one sec

misty flint Mar 20, 2021, 10:30 PM

#

you can look at accuracy scores using sklearn btw

#

theres a function/method

#

also this is cool if people are actually trying to figure out CAUSATION, not correlation https://livefreeordichotomize.com/2016/12/15/hill-for-the-data-scientist-an-xkcd-story/

Live Free or Dichotomize - Hill for the data scientist: an xkcd story

kindred radish Mar 20, 2021, 10:31 PM

#

#

So like, right now it's incorrectly predicting "Break" when it's not a break

#

Ideally the top left and bottom right elements would be high numbers

misty flint Mar 20, 2021, 10:32 PM

#

did you use the confusion matrix function

#

try that

kindred radish Mar 20, 2021, 10:33 PM

#

aye that's what this is

misty flint Mar 20, 2021, 10:33 PM

#

hmm

#

why is it in percentages?

#

or is that how the data is?

kindred radish Mar 20, 2021, 10:33 PM

#

It's set to be normalised

misty flint Mar 20, 2021, 10:33 PM

#

ah

grave frost Mar 20, 2021, 10:33 PM

#

whats your F-score?

kindred radish Mar 20, 2021, 10:34 PM

#

uhhh how would i find that? Do you just literally mean "model.score"? For this run it was 57%

grave frost Mar 20, 2021, 10:35 PM

#

well, is your data imbalanced?

misty flint Mar 20, 2021, 10:35 PM

#

i think classification_report() tells you a bunch of scores
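For reference, a small sketch of the scikit-learn helpers mentioned here, run on made-up labels (1 = break, 0 = no break). `normalize="true"` is what turns the confusion matrix rows into fractions, as in the normalised matrix discussed above.

```python
from sklearn.metrics import classification_report, confusion_matrix, f1_score

# Made-up labels for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1, 1, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

# Same matrix with each row divided by its total (fractions of each true class).
cm_norm = confusion_matrix(y_true, y_pred, normalize="true")

f1 = f1_score(y_true, y_pred)

print(cm)
print(cm_norm)
print("F1:", f1)
print(classification_report(y_true, y_pred, target_names=["no break", "break"]))
```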

grave frost Mar 20, 2021, 10:35 PM

#

(and please do not jump in a task without learning the appropriate basics,
you would struggle more and you wouldn't understand anything)

kindred radish Mar 20, 2021, 10:35 PM

#

imbalanced how?

#

This is my final year project and I've been teaching myself everything since my supervisor doesn't understand how any of this works. This is the last stage of it all 😕

misty flint Mar 20, 2021, 10:36 PM

#

since my supervisor doesn't understand how any of this works.

#

thats a big rip

#

🕯️

grave frost Mar 20, 2021, 10:36 PM

#

do you want to do data analysis or EDA?

kindred radish Mar 20, 2021, 10:37 PM

#

I specifically have to do machine learning. I've got a final meeting with a CEO on Tuesday where I have to explain the results of this :))))))))))))

#

So i wont really have time to implement anything major

grave frost Mar 20, 2021, 10:38 PM

#

how does making a model help in this?

#

to find correlation between features, there are different techniques

kindred radish Mar 20, 2021, 10:38 PM

#

I originally just wanted to predict if the machine would break as I thought the data was correlated

#

As i wanted to be able to say to the CEO "hey this is why ML is good. Look, i can predict when your machines break"

#

And my supervisor also thought that would be good

grave frost Mar 20, 2021, 10:39 PM

#

and what exactly is your task- meaning what exact variables have you been given?

kindred radish Mar 20, 2021, 10:41 PM

#

The Ripeness of the film, the amount of a chemical in it, the viscosity of it, the thickness and some other stuff as well

grave frost Mar 20, 2021, 10:42 PM

#

and WTH is the machine?

misty flint Mar 20, 2021, 10:42 PM

#

hahahaha

#

thats what I was asking

#

💀

#

people always try to separate the data from the context and it never ends well

kindred radish Mar 20, 2021, 10:42 PM

#

The machine essentially rolls the "liquid" into films

grave frost Mar 20, 2021, 10:43 PM

#

and how often does it break per hour?

kindred radish Mar 20, 2021, 10:43 PM

#

kindred radish The Ripeness of the film, the amount of a chemical in it, the viscosity of it, t...

All of these inputs are for the "liquid". Sorry i said film because ive wanted to avoid saying "liquid" since i dont want to talk too much about specifics of the manufacture since it's a company i sort of working under

#

I havent been given that data, ive only been given how many times a day it breaks

grave frost Mar 20, 2021, 10:44 PM

#

🤦 if its a number, then why are you trying to predict whether it breaks or not?

kindred radish Mar 20, 2021, 10:44 PM

#

I tried to break the problem down into a simpler one

#

And i thought that predicting whether the machine would break at all would be simpler than trying to predict how many times it breaks

grave frost Mar 20, 2021, 10:45 PM

#

please tell the whole problem next time. your task is not classification, its regression

kindred radish Mar 20, 2021, 10:45 PM

#

the number of breaks is very small: around 0-5

grave frost Mar 20, 2021, 10:45 PM

#

and does it break atleast once a day?

kindred radish Mar 20, 2021, 10:45 PM

#

No, some days it doesnt break

grave frost Mar 20, 2021, 10:46 PM

#

then that is a regression task - which is much easier than making it a classification task with 5 labels

kindred radish Mar 20, 2021, 10:47 PM

#

When i tried to use MLPRegressor it didn't work too well

misty flint Mar 20, 2021, 10:47 PM

#

ID_BoomKek

grave frost Mar 20, 2021, 10:47 PM

#

kindred radish When i tried to use MLPRegressor it didn't work too well

hm...accuracy?

kindred radish Mar 20, 2021, 10:47 PM

#

I got a negative score lol

grave frost Mar 20, 2021, 10:47 PM

#

kindred radish I got a negative score lol

thats not accuracy lol

#

accuracy is always positive

misty flint Mar 20, 2021, 10:47 PM

#

idk how you can get a negative accuracy

kindred radish Mar 20, 2021, 10:47 PM

#

yeah i know thats why i was so confused

misty flint Mar 20, 2021, 10:47 PM

#

kekw

#

think you have the wrong numbers

kindred radish Mar 20, 2021, 10:48 PM

#

I used this: https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor.score

#

And got a negative number

#

yes i understand the square of something gives a positive

misty flint Mar 20, 2021, 10:49 PM

#

you found R^2

#

thats not accuracy

deft ruin Mar 20, 2021, 10:49 PM

#

Yeah that can be negative if the model performs worse than the mean
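A tiny worked example of that point, on made-up numbers. R^2 is defined as 1 - SS_res / SS_tot, where SS_tot is the error of always predicting the mean, so a model that does worse than the mean gets a score below zero.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([0, 1, 2, 3])

# Baseline: always predict the mean of the targets.
mean_pred = np.full(y_true.shape, y_true.mean())

# A deliberately bad model: predictions in reverse order.
bad_pred = np.array([3, 2, 1, 0])

r2_mean = r2_score(y_true, mean_pred)  # SS_res equals SS_tot, so R^2 = 0
r2_bad = r2_score(y_true, bad_pred)    # SS_res = 20, SS_tot = 5, so R^2 = 1 - 4 = -3

print(r2_mean, r2_bad)
```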

misty flint Mar 20, 2021, 10:49 PM

#

anyway

#

heres a stats meme

kindred radish Mar 20, 2021, 10:50 PM

#

Id read that a negative score was bad 😅

misty flint Mar 20, 2021, 10:50 PM

#

grave frost Mar 20, 2021, 10:50 PM

#

just leave that and focus on the accuracy 🤷

misty flint Mar 20, 2021, 10:50 PM

#

💀

kindred radish Mar 20, 2021, 10:51 PM

#

ok im sorry that i might seem really dumb or whatever, but ive been given literally no direction and im a Physicist, i code in python all day and pray to god that my code doesnt come back to haunt me because it looks monstrous

misty flint Mar 20, 2021, 10:51 PM

#

friendly reminder: sometimes, outliers can mean something, so just think about your problem as a whole before cutting them out, etc.

grave frost Mar 20, 2021, 10:51 PM

#

kindred radish ok im sorry that i might seem really dumb or whatever, but ive been given litera...

thats alright

deft ruin Mar 20, 2021, 10:52 PM

#

No worries man

grave frost Mar 20, 2021, 10:52 PM

#

negative R2 is pretty bad - did you try increasing the number of hidden layers in the MLP?

kindred radish Mar 20, 2021, 10:52 PM

#

thank you guys

#

Yeah i played around with the parameters for absolutely ages

deft ruin Mar 20, 2021, 10:52 PM

#

Sometimes that means that there is something going on with your model specification

#

It might be worth taking a closer look at your data

kindred radish Mar 20, 2021, 10:52 PM

#

Tried a bunch of things, brought on my CS housemate to look at the parameters too

misty flint Mar 20, 2021, 10:52 PM

#

ruler just wants to remind people to brush up on stats basics before diving into ML

#

which is fair

#

bc it happens a lot

#

ID_BoomKek

grave frost Mar 20, 2021, 10:53 PM

#

something like this: hidden_layer_sizes=(10,30,30,50)

#

@kindred radish number of layers, not the amount of neurons in each layer

kindred radish Mar 20, 2021, 10:53 PM

#

I did (100,50,25) ive tried just doing 100 or stuff

deft ruin Mar 20, 2021, 10:54 PM

#

Might also be worth plotting some of the variables against each other and coloring by whether the machine broke that day to see if there is any pattern

#

Or doing the same with histograms
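A sketch of both ideas, using random numbers in place of the real measurements. The feature names `viscosity` and `thickness` are borrowed from the discussion, and the values are invented. With real data, visible separation between the two colours or the two histograms would hint that a feature carries signal.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for two measured features and the daily break flag.
viscosity = rng.normal(100, 10, 300)
thickness = rng.normal(0.1, 0.02, 300)
broke = rng.integers(0, 2, 300)  # 1 = machine broke that day

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Scatter of one feature against another, coloured by the break flag.
ax1.scatter(viscosity, thickness, c=broke, cmap="coolwarm", s=10)
ax1.set_xlabel("viscosity")
ax1.set_ylabel("thickness")

# Overlaid histograms of a single feature, split by the break flag.
ax2.hist(viscosity[broke == 0], bins=20, alpha=0.6, label="no break")
ax2.hist(viscosity[broke == 1], bins=20, alpha=0.6, label="break")
ax2.set_xlabel("viscosity")
ax2.legend()

fig.tight_layout()
fig.savefig("features_by_break.png")
```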

kindred radish Mar 20, 2021, 10:54 PM

#

grave frost @kindred radish number of layers, not the amount of neurons in each layer

oh this might be a problem then? Wait let me show you

grave frost Mar 20, 2021, 10:54 PM

#

deft ruin Might also be worth plotting some of the variables against each other and colori...

in multi-dimensions? 👀 TSNE it

misty flint Mar 20, 2021, 10:54 PM

#

the problem with data analysis is it can take time

#

DoggoKek

kindred radish Mar 20, 2021, 10:55 PM

#

https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor.score
the very first parameter "hidden_layer_sizes" is what i've been doing (100,50,25)

#

I did see improvement

#

But not much

grave frost Mar 20, 2021, 10:56 PM

#

make it even longer; use solver lbfgs

kindred radish Mar 20, 2021, 10:56 PM

#

Yeah ive used that solver because my data set is small

grave frost Mar 20, 2021, 10:56 PM

#

and more hidden layers?

kindred radish Mar 20, 2021, 10:56 PM

#

Yeah i tried a bunch of things

grave frost Mar 20, 2021, 10:56 PM

#

make max_iter=1000 or so?

kindred radish Mar 20, 2021, 10:56 PM

#

    return MLPClassifier(hidden_layer_sizes = (100,50,25),
                          random_state=0,
                          activation = 'relu', 
                          learning_rate_init=0.0001,
                          solver='adam',
                          max_iter=5000,
                          verbose=True,
                          early_stopping=False,
                          n_iter_no_change=10,
                          alpha=0.001,
                          tol=1e-8,
                          beta_1=0.9,
                          max_fun=15000)

#

Like, everything you see here I changed

#

Spent hours messing around with it trying to create variations on each other

grave frost Mar 20, 2021, 10:58 PM

#

what about a 10-layer network?

#

also, put validation_fraction=0.1 to test your network on 10% of your train data

kindred radish Mar 20, 2021, 11:01 PM

#

so like if i made hidden_layer_sizes = (1000,500,250,125,60,30,15)? I thought the length of this tuple dictated the number of layers??

grave frost Mar 20, 2021, 11:01 PM

#

kindred radish so like if i made hidden_layer_sizes = (1000,500,250,125,60,30,15)? I thought t...

it does

#

are you using colab?

kindred radish Mar 20, 2021, 11:02 PM

#

grave frost also, put `validation_fraction=0.1` to test your network on 10% of your train da...

Right now it's testing against 20% of the total data. Like i split my data into test and train data

kindred radish Mar 20, 2021, 11:02 PM

#

grave frost are you using colab?

No im not, what's that?

grave frost Mar 20, 2021, 11:02 PM

#

kindred radish No im not, what's that?

nvm that

kindred radish Mar 20, 2021, 11:02 PM

#

grave frost it does

Ill try something like this

grave frost Mar 20, 2021, 11:03 PM

#

kindred radish Right now it's testing against 20% of the total data. Like i split my data into ...

try leaving the split, and just put the parameter there. be careful to pass all your data variable (before the split)
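One detail worth knowing before following this: in scikit-learn's MLP estimators, `validation_fraction` is only used when `early_stopping=True` (and early stopping applies to the `sgd` and `adam` solvers). The sketch below, on synthetic data, turns it on while still keeping a separate test set, since the validation fraction only decides when training stops and is not a substitute for a final held-out score. The layer size is an arbitrary choice for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,),
                  solver="adam",
                  early_stopping=True,      # needed for validation_fraction to take effect
                  validation_fraction=0.1,  # 10% of the training data monitors progress
                  n_iter_no_change=10,
                  max_iter=1000,
                  random_state=0),
)
model.fit(X_train, y_train)

# Final score on data the model never saw during training or early stopping.
test_accuracy = model.score(X_test, y_test)
print(test_accuracy)
```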

kindred radish Mar 20, 2021, 11:04 PM

#

umm ok, why do it this way round? The test and train data is always jumbled up before hand each time to make sure that it's not just doing exactly the same thing each time

grave frost Mar 20, 2021, 11:04 PM

#

for newbies, integrated is much better

#

because it reduces the chance of a problem somewhere

kindred radish Mar 20, 2021, 11:05 PM

#

kindred radish so like if i made hidden_layer_sizes = (1000,500,250,125,60,30,15)? I thought t...

Also i still got this despite using the above

#

grave frost Mar 20, 2021, 11:05 PM

#

I told you, ditch the scores and focus on the accuracy

kindred radish Mar 20, 2021, 11:05 PM

#

okok sorry ill go implement that now

grave frost Mar 20, 2021, 11:06 PM

#

your precision/recall may matter, but it depends on the task you are doing (like what does your implementation value - false positives, false negatives etc.) different values are preferred for different scenarios

#

like for predicting cancer, you do not want any False Negatives.
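To make the trade-off concrete, here is a small example on made-up labels. Recall drops with every false negative (a missed positive), while precision drops with every false positive (a false alarm), so which one to watch depends on which mistake is costlier.

```python
from sklearn.metrics import precision_score, recall_score

# Made-up labels: 1 = positive (e.g. "break"), 0 = negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]

# Here: 2 true positives, 1 false positive, 2 false negatives.
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 2 / 3
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 2 / 4

print(precision, recall)
```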

kindred radish Mar 20, 2021, 11:08 PM

#

I guess for breaks i wouldnt want a False Negative either

grave frost Mar 20, 2021, 11:10 PM

#

you just want a high accuracy for prediction 🤷

#

for your task, FP's, FN's etc. dont matter

#

because you want to say to the CEO that your model can predict 90% of the time whether the machine would break or not

#

not talk to him about FP's or precision/recall

kindred radish Mar 20, 2021, 11:12 PM

#

So im using accuracy_score()?

#

That the right one?

grave frost Mar 20, 2021, 11:13 PM

#

thats for classification

#

try max_error

kindred radish Mar 20, 2021, 11:16 PM

#

grave frost thats for classification

Ah im using the MLPClassifier did you want me to swap to Regression? Sorry if im being slow, i dont mean to be frustrating to help!

grave frost Mar 20, 2021, 11:17 PM

#

kindred radish Ah im using the MLPClassifier did you want me to swap to Regression? Sorry if im...

yeah, regression

kindred radish Mar 20, 2021, 11:17 PM

#

aight lemme swap it to regression real quick

#

The regressor assumes a linear relationship right?

cerulean stream Mar 20, 2021, 11:20 PM

#

anyone know of a non-blocking way to implement matplotlib in an async environment: the regular run_in_executor from asyncio doesnt work

deft ruin Mar 20, 2021, 11:21 PM

#

@kindred radish no MLP has a nonlinear activation

kindred radish Mar 20, 2021, 11:26 PM

#

ooof im getting an error lol

#

    raise ValueError("Classification metrics can't handle a mix of {0} "

ValueError: Classification metrics can't handle a mix of multiclass and continuous targets

#

Im much too tired to be able to try and handle the error

#

So i'll probably call it a night and try and do something tomorrow. Is it alright if I could ping you as well tomorrow @grave frost ? Im on GMT timezone

#

absolutely fine if not, you've helped me a lot already! ^^

grave frost Mar 20, 2021, 11:32 PM

#

kindred radish ```python raise ValueError("Classification metrics can't handle a mix of {0}...

you've probably not set the columns to be predicted correctly. its basically an error that it can't handle classification and continuous regression together; you have to choose one
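A likely way to hit this error, sketched with made-up numbers: a regressor outputs continuous values, and passing those to a classification metric such as `accuracy_score` or `confusion_matrix` raises the `ValueError` quoted above. Regression metrics (`mean_absolute_error`, `max_error`) take the predictions as they are. Rounding them to whole counts first is one option if a classification-style score is still wanted.

```python
import numpy as np
from sklearn.metrics import accuracy_score, max_error, mean_absolute_error

# Made-up example: true breaks per day and a regressor's continuous output.
y_true = np.array([0, 1, 2, 3, 0, 5])
y_pred = np.array([0.3, 1.4, 1.8, 2.6, 0.1, 4.2])

# Classification metric on continuous predictions: raises ValueError.
try:
    accuracy_score(y_true, y_pred)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Regression metrics work directly on the continuous predictions.
mae = mean_absolute_error(y_true, y_pred)
worst = max_error(y_true, y_pred)

# Optional: round to whole counts before using a classification metric.
rounded = np.rint(y_pred).astype(int)
acc_rounded = accuracy_score(y_true, rounded)

print(mae, worst, acc_rounded)
```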

grave frost Mar 20, 2021, 11:32 PM

#

kindred radish So i'll probably call it a night and try and do *something* tomorrow. Is it alri...

yeah, thats alright

kindred radish Mar 20, 2021, 11:39 PM

#

thank you!

stiff barn Mar 21, 2021, 2:15 AM

#

kindred radish thank you!

I think @misty flint mentioned this before but since you only have 300 samples I would stay away from a deep learning model and probably stick with a gradient booster if all the data is tabular (no images or any unstructured data). I’m assuming since it’s your first dive into ML your supervisor will want the model to be at least somewhat explainable and not just a black box which a gradient booster will give you to some extent. They’re also probably more of the standard for structured data.

#

Depending on what problem you’re trying to solve you may have been right in choosing classification over regression. If you just want to predict if a machine will fail that day then I would turn the problem into a binary classification one. That should be a bit more achievable with that little data. Otherwise I’d stick with regression and not do multi-class classification.

exotic maple Mar 21, 2021, 2:56 AM

#

Can someone explain in plain english what is the difference between Dense and Sparse matrix?

#

I feel like i get but i want to be sure i have the right idea...

deft ruin Mar 21, 2021, 3:00 AM

#

Sparse matrices are mostly zeroes, so much so that it becomes efficient to create a separate data structure that stores the location and value of the nonzero entries rather than the whole matrix

exotic maple Mar 21, 2021, 3:10 AM

#

but for example, why are sparse matrices preferred in some cases? I was reviewing some NLP tasks and it seems sparse matrices are like the daily bread there.

#

is that an inevitable conclusion of the pain in the ass of text?

deft ruin Mar 21, 2021, 3:17 AM

#

Yeah usually with NLP you’re dealing with a giant matrix where each column is a word and each value is the number of times that word appeared in the “document” (which is whatever you’re considering a single observation)

#

You can imagine how that can get big fast and how lots of counts will be zero

#

I said word but usually it’s called something more general, e.g. token

shut valve Mar 21, 2021, 3:21 AM

#

Lol good question all I know is If you do One hot encoding it’s sparse

#

But beyond that and the mostly zeros I don’t really know

#

But no that’s not the end all for nlp most utilize some sort of embedding which turns your tokens to vectors so you don’t have to use one hot or sparse stuff

deft ruin Mar 21, 2021, 3:24 AM

#

Yeah I should say that’s the simplest case like bag of words

#

Lots of techniques are concerned with reducing the dimensionality of the feature space

velvet thorn Mar 21, 2021, 3:26 AM

#

exotic maple but for example, why are sparse matrices preferred in some cases? I was reviewin...

it depends on what you're doing

#

like the simplest form of vectorisation

#

bag of words modelling

#

there are many unique words

#

but most of them will not appear in any one document

#

-> lots of 0s

#

imagine a matrix where 99.9% of the values are 0

#

naively storing it in a data structure meant to hold dense data

#

will take ~100x of the amount of memory a sparse data structure would take

#

sparse data structures can also optimise for certain operations

#

for example, say you want the document(s) that contain the most of a certain word

#

you can clearly ignore all the 0s

#

there are many ways to store sparse data

#

each with their drawbacks and advantages
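
A rough sketch of the point above in plain Python. The toy documents and the names `dense`, `sparse` and `doc_with_most` are made up for illustration; real libraries use more compact layouts than a dict, so treat this as a picture of the idea rather than how any library does it.

```python
# Toy corpus; the three "documents" are invented for illustration.
docs = [
    "the cat sat on the mat",
    "the dog barked",
    "sparse matrices store only nonzero counts",
]

# Vocabulary: one column per unique token, in sorted order.
vocab = sorted({token for doc in docs for token in doc.split()})
column = {token: j for j, token in enumerate(vocab)}

# Dense bag of words: every cell is stored, including all the zeros.
dense = [[0] * len(vocab) for _ in docs]
for i, doc in enumerate(docs):
    for token in doc.split():
        dense[i][column[token]] += 1

# Sparse "dictionary of keys": only the nonzero cells are stored.
sparse = {
    (i, j): count
    for i, row in enumerate(dense)
    for j, count in enumerate(row)
    if count != 0
}

total_cells = len(docs) * len(vocab)
print(f"dense cells stored:  {total_cells}")
print(f"sparse cells stored: {len(sparse)}")


def doc_with_most(token):
    """Index of the document with the highest count of `token`,
    looking only at stored (nonzero) entries."""
    j = column[token]
    hits = {i: c for (i, jj), c in sparse.items() if jj == j}
    return max(hits, key=hits.get) if hits else None


print(doc_with_most("the"))
```

Here the dense grid stores 39 cells while the dict stores 14, and the gap widens quickly as the vocabulary grows.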

exotic maple Mar 21, 2021, 4:22 AM

#

@velvet thorn thanks a lot man.

#

I see that sklearn's CountVectorizer returns a scipy.sparse matrix

#

is that one of the optimized structures you mentioned?

astral path Mar 21, 2021, 4:32 AM

#

ay so the ML model i built is in first place in my march madness pool

misty flint Mar 21, 2021, 5:30 AM

#

astral path ay so the ML model i built is in first place in my march madness pool

Praise

lean ledge Mar 21, 2021, 5:45 AM

#

exotic maple is that one of the optimized structures you mentioned?

Yes

#

Sparse matrix types, unlike normal matrix types, aren't going to be stored like straight arrays

#

They're going to have more complicated encoding schemes to avoid doing useless operations
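
For the CountVectorizer question, a small sketch of what comes back, assuming scikit-learn and SciPy are installed. The example sentences are invented; `nnz` is the number of explicitly stored entries.

```python
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "the dog barked", "dogs and cats"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)  # SciPy sparse matrix of token counts

print("is sparse:", sparse.issparse(X))
print("shape:", X.shape)
print("stored nonzeros:", X.nnz, "out of", X.shape[0] * X.shape[1], "cells")

# Convert to a dense NumPy array only when the matrix is small enough.
print(X.toarray())
```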

paper lake Mar 21, 2021, 7:41 AM

#

misty flint Praise

🍼

spring seal Mar 21, 2021, 8:03 AM

#

Hi there, I am a beginner. I am currently working on an EDA of 'Temperature Variation of Countries'. If any like-minded beginner wants to work/study with me, just drop me a message. It's just a study group, nothing else.

rotund dock Mar 21, 2021, 8:36 AM

#

misty flint i couldnt figure it out either. sorry. i tried all the various methods in that m...

Great I’ll keep trying and see! Thanks for trying

uncut barn Mar 21, 2021, 9:34 AM

#

What does it mean for the model (nn) loss to fluctuate?

lean ledge Mar 21, 2021, 9:46 AM

#

it's not training properly

#

decrease your learning rate

#

ideally have gradient clipping and stuff also

#

and normalise your data
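
A toy illustration of the first two suggestions, in plain Python with a made-up one-parameter loss (`w * w`). It is only a sketch: a real network has many parameters and a noisier loss, but the overshooting behaviour is the same in spirit.

```python
def loss(w):
    return w * w            # toy loss with its minimum at w = 0


def grad(w):
    return 2.0 * w          # derivative of the toy loss


def train(lr, steps=10, clip=None, w=5.0):
    """Plain gradient descent; optionally clip the gradient to [-clip, clip]."""
    history = []
    for _ in range(steps):
        g = grad(w)
        if clip is not None:
            g = max(-clip, min(clip, g))
        w -= lr * g
        history.append(loss(w))
    return history


too_high = train(lr=1.1)              # overshoots: w flips sign and grows
lower = train(lr=0.1)                 # shrinks steadily towards 0
clipped = train(lr=1.1, clip=1.0)     # each step is bounded by lr * clip

print("lr=1.1:         ", [round(x, 2) for x in too_high[:4]])
print("lr=0.1:         ", [round(x, 2) for x in lower[:4]])
print("lr=1.1, clipped:", [round(x, 2) for x in clipped[:4]])
```

Note that clipping alone only bounds the step size: in this toy run it stops the blow-up but the loss still bounces around, so lowering the learning rate is still needed.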

grave frost Mar 21, 2021, 9:54 AM

#

velvet thorn there are many ways to store sparse data

I think the most common method is storing indices, right?

#

for sparse arrays with less density

tidal bough Mar 21, 2021, 9:55 AM

#

!docs scipy.sparse

arctic wedgeBOT Mar 21, 2021, 9:55 AM

#

`scipy.sparse`

This appears to be a generic page not tied to a specific symbol.

tidal bough Mar 21, 2021, 9:56 AM

#

there's compressed-sparse-rows, compressed-sparse-columns, dictionary-of-keys, and probably others I don't remember

#

and that's only for sparse matrices.
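
A small sketch of those formats using `scipy.sparse` (assuming NumPy and SciPy are available; the matrix values are arbitrary).

```python
import numpy as np
from scipy import sparse

dense = np.array([
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 2],
])

csr = sparse.csr_matrix(dense)   # compressed sparse rows
csc = csr.tocsc()                # compressed sparse columns
dok = csr.todok()                # dictionary of keys

# CSR keeps three flat arrays instead of the full grid.
print("data:   ", csr.data)      # the nonzero values, row by row
print("indices:", csr.indices)   # column index of each value
print("indptr: ", csr.indptr)    # where each row starts in `data`

# DOK behaves like a dict keyed by (row, column).
print(dok[2, 3])
```

CSR is usually the convenient choice for row slicing and matrix-vector products, CSC for column slicing, and DOK for building a matrix up one entry at a time.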

hallow girder Mar 21, 2021, 11:42 AM

#

Hello there,

I have somehow managed to land a Data Engineer role. I have never been a Data Engineer before.
My background is DevOps (Linux for the last 15 years, and also AWS, Ansible, systems, bash, python, networking, etc). At the interview they mentioned being in a Data Team and technologies and terms such as ETL, data pipelines, python, pytest, Jupyter Notebooks (with Pandas, numpy, matplotlib), AWS s3, data wrangling, Linux, SQL, Agile and doing code reviews in Github.

I think I have either worked with or played with some of these technologies before, but I don't know anything about ETL, data pipelines, pytest, Jupyter Notebooks (with Pandas, numpy, matplotlib), and doing code reviews to any great extent. Could someone point me to some recommended learning guides such as MOOCs, online courses etc. about those (ETL, data pipelines, pytest, Jupyter Notebooks and doing code reviews)?

Also, what is it like to work in a Data Team, is it different to working in a Software Development Team? How best can I transition from a DevOps (mostly Linux sysadmin type experience) to a Data Engineer mindset? What would be some good things to figure out and ask questions about when you first start in a Data Team?

Any help much appreciated.

lapis sequoia Mar 21, 2021, 12:27 PM

#

Hello everyone!
I am working on a project that uses OCR. I have videos, shot from above, of boxes rolling along a production lane from one side to the other. I already have code, provided by someone else, that crops only the box’s label out of each video frame, and my task is to OCR this label.
The videos are recorded with a fisheye camera lens, so the image is a little distorted, which is not making it easy for me. The first issue I struggled with was finding a suitable OCR tool, because it is going to run on a weak virtual machine with only a CPU (4 cores). I found the brilliant PaddleOCR, which is lightweight and seems to fit my needs, but I can’t get it to work well on my box labels. Without doing anything to the images, PaddleOCR detects and recognizes the numbers correctly 85% of the time, but I need to make it better. I figured out that rotating an image helps in many cases, but I can’t use one fixed angle for every image, so I found that a four point transform can help in my case. To use this transform properly I need to find the contour of my label. I tried using opencv to pre-process the images, but I can’t find universal parameters that work for all images in order to find this contour and transform it. I tried pre-processing the images with thresholds, blurs, erosion, dilation etc., after which I use Canny edge detection and then try to apply the four point transform. I will be thankful for any help. The label images I am working with look like this:

#

[image attachments not preserved]

serene scaffold Mar 21, 2021, 12:29 PM

#

@hallow girder please save this question and ask again another time. I'm worried no one with that knowledge saw it.

lean ledge Mar 21, 2021, 12:32 PM

#

lapis sequoia Hello everyone! I am working on a project that uses OCR. I have videos that show...

dont do BS, try to unwarp the images properly

#

the issue is a fisheye warp, the solution should be an unwarp

#

if you have access to the camera, run a calibration procedure; or if you know the camera model, try to find its distortion coefficients online

#

no OCR software should have any issue with those photos given it's undistorted properly

grave frost Mar 21, 2021, 12:39 PM

#

using pre-trained embeddings here, when I try to generate a subword vector, it has no problem. but when I try to find its most similar word (using an inbuilt method) then it reports that the word is out-of-vocabulary. how does that work?

#

ahh, nvm

kindred radish Mar 21, 2021, 1:18 PM

#

              precision    recall  f1-score   support

           0       0.58      0.77      0.66        39
           1       0.55      0.33      0.42        33

    accuracy                           0.57        72
   macro avg       0.56      0.55      0.54        72
weighted avg       0.56      0.57      0.55        72

@grave frost As per @stiff barn 's suggestion, I've used a Gradient Booster to try and classify breaks or not. I just printed classification_report() to get this table

grave frost Mar 21, 2021, 1:18 PM

#

hmm, why are you still doing classification? what about regression?

kindred radish Mar 21, 2021, 1:20 PM

#

Aye I'm gonna try the regression one next, I just thought to print this out since I had everything set up for classification. I guess it was like "let's just see what happens" 😅 I'll go use GradientBoostingRegressor now

grave frost Mar 21, 2021, 1:20 PM

#

kindred radish Aye I'm gonna try the regression one next, I just thought to print this out sinc...

cool

stiff barn Mar 21, 2021, 1:23 PM

#

hallow girder Hello there, I have somehow managed to land a Data Engineer role. I have never ...

Hey @hallow girder. I’ve worked as a data engineer for a few years now and currently work as one for a large company in the US so I will do my best to provide some help and answers.

First thing, congratulations! It is possible that the team wants someone with a lot of dev ops knowledge as we do need to perform a lot in that area.

Since you already have cloud and Python experience that is going to help a lot as well since we generally like to build in Python. If you have IAC experience as well that would be great to show the team.

If you know which cloud provider they are using, I would look into getting a data engineering certification for that cloud. Each of them has one I believe and there are usually courses on Coursera for them. There is also a company called DataQuest which has a Data Engineer track which is pretty good. They go through writing memory efficient pipelines, pandas, numpy, etc. You may find their other tracks useful as well.

When you first get to the team you’ll need to figure out what you’ll actually be doing and what technologies they use. Data Engineering can vary to some extent where some companies you focus more on writing SQL procedures and managing databases, while others it’s more writing data pipelines in the cloud, and others it’s a lot of both. On that note, if you’re not strong in SQL that’s definitely a top if not the top skill to brush up on. Figuring out what the team generally actually does will help you target what to learn.

Feel free to reach out if you have any other questions or want to chat about the role.

grave frost Mar 21, 2021, 1:23 PM

#

stiff barn I think @misty flint mentioned this before but since you only have 300...

lol regression is very achievable in little data if there is a simple correlation between the input features. that is no basis for a recommendation to a task

stiff barn Mar 21, 2021, 1:24 PM

#

kindred radish Aye I'm gonna try the regression one next, I just thought to print this out sinc...

Do you have it set so there are only two labels, 1 and 0?

kindred radish Mar 21, 2021, 1:25 PM

#

Yeah, 1 meant there was a break, 0 meant there was no break

#

That's for the output, y

stiff barn Mar 21, 2021, 1:33 PM

#

grave frost lol regression is *very* achievable in little data if there is a simple correlat...

That is not true in his case. Reframing a regression problem as a binary classification problem can increase accuracy, as you’re essentially condensing a range of values into one. It can be helpful, especially in his case, where I assume the same input values can often lead to different output values due to chance. At the end of the day though, we don’t know that much about the data so I could be wrong.

grave frost Mar 21, 2021, 1:34 PM

#

stiff barn That is not true in his case. Given that he is reframing a regression problem in...

all good points, but moot since that would cause bias in the model

#

@kindred radish how many times a week does it (on average) break?

stiff barn Mar 21, 2021, 1:38 PM

#

I am here to learn as well so if there is something I’m missing then please call me on it.

kindred radish Mar 21, 2021, 1:39 PM

#

The number of breaks is pretty small, each day (which is the time frame everything is recorded) there is about zero to four breaks

grave frost Mar 21, 2021, 1:39 PM

#

kindred radish The number of breaks is pretty small, each day (which is the time frame everythi...

each day?!

kindred radish Mar 21, 2021, 1:39 PM

#

yeah ikr lol

grave frost Mar 21, 2021, 1:39 PM

#

yeah, then its a regression hands down

kindred radish Mar 21, 2021, 1:40 PM

#

wait why

#

I thought it was such a small amount of breaks

#

~~that regression wouldn't be a good idea~~ that i could simplify the problem using a classifier

grave frost Mar 21, 2021, 1:41 PM

#

yeah, but having a large frequency each day makes it much more suitable for regression. if it broke, say, 3-4 times a week on different days, then it would be classification because you could counteract the bias

stiff barn Mar 21, 2021, 1:41 PM

#

If it’s that often then binary classification will probably just always output 1

#

Can’t really learn much. If you had the data on an hourly basis or something that might be different

kindred radish Mar 21, 2021, 1:42 PM

#

Why does it make it more suitable to regression?

grave frost Mar 21, 2021, 1:43 PM

#

because it can handle the fluctuations pretty well and classification would not handle the bias and would just output ones

kindred radish Mar 21, 2021, 1:43 PM

#

When i think of regression, as a physicist, i think in terms of some equation being obeyed. I have no idea what equation this data would follow to yield the output we see

stiff barn Mar 21, 2021, 1:43 PM

#

That’s what the machine must learn haha.

#

Could be a simple linear equation

#

Or a bunch of decision trees working together if a gradient booster/random forest

#

Or any of the number of regression models

kindred radish Mar 21, 2021, 1:45 PM

#

I doubt it is linear, only because the factors are kind of complicated and aren't so simple

grave frost Mar 21, 2021, 1:45 PM

#

it might be linear with data engineering

#

but you wouldn't know unless you do it.

stiff barn Mar 21, 2021, 1:45 PM

#

Thanks for calling me out there @grave frost, I wasn’t aware it was breaking that often

kindred radish Mar 21, 2021, 1:45 PM

#

Like im not even sure that the data is correlated to whether the machine breaks or not, this is just the data ive been given. It could be something like the humidity of the room, which they haven't recorded, which is actually making the machine break

grave frost Mar 21, 2021, 1:46 PM

#

kindred radish Like im not even sure that the data is correlated to whether the machine breaks ...

even if there was no direct correlation, you could find partial correlation which is helpful enough

stiff barn Mar 21, 2021, 1:47 PM

#

kindred radish Like im not even sure that the data is correlated to whether the machine breaks ...

You have the data in Pandas right? You can start by running df.corr() and get a look at the correlation between columns.

kindred radish Mar 21, 2021, 1:47 PM

#

So even if the data didn't actually affect whether the machine breaks or not, a model could still figure something out?

kindred radish Mar 21, 2021, 1:47 PM

#

stiff barn You have the data in Pandas right? You can start by running df.corr() and get a ...

I'll try that out ig. Is this fine to do on standardised data?

stiff barn Mar 21, 2021, 1:48 PM

#

More like even if the data isn’t directly correlated, there can be some partial correlation that can give the model some signal to make predictions on.

stiff barn Mar 21, 2021, 1:49 PM

#

kindred radish I'll try that out ig. Is this fine to do on standardised data?

As in scaled data via MinMax or Standard scaling?

kindred radish Mar 21, 2021, 1:49 PM

#

Standard scaling

#

That isnt the raw data, i've standardised it myself

#

Just wondering which to use. It's just more convenient for me to use the standardised one rn thats why i was asking!

stiff barn Mar 21, 2021, 1:50 PM

#

It should preserve the correlation and be fine.

#

If the scaling didn’t preserve the correlation then that’s a problem haha
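
A quick way to convince yourself, in plain Python: the Pearson correlation is unchanged by subtracting a constant and dividing by a positive constant, which is all standard scaling does. The feature and break values below are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def standardise(xs):
    """Subtract the mean and divide by the (population) standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    std = sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / std for x in xs]


# Made-up feature and target values, purely for illustration.
feature = [12.0, 15.5, 9.0, 20.0, 17.5, 11.0]
breaks = [1, 2, 0, 4, 2, 1]

raw = pearson(feature, breaks)
scaled = pearson(standardise(feature), breaks)
print(raw, scaled)  # the two values agree up to floating-point rounding
```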

#

How do you have the data stored in the notebook now when training? Numpy, pandas DataFrame?

kindred radish Mar 21, 2021, 1:53 PM

#

So i have a Pandas DataFrame of all the features, X, and the output, y.
Then i split the data into training and testing data and randomise their positions according to sklearn's train_test_split() function

stiff barn Mar 21, 2021, 1:56 PM

#

So you'll want to run the correlation on the full training dataset with the y included.
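
A minimal sketch of that step, assuming pandas is installed. The column names `feature_1`, `feature_2`, `breaks` and all the values are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Hypothetical training split; column names and values are invented.
X_train = pd.DataFrame({
    "feature_1": [0.2, 1.4, -0.7, 0.9, -1.1, 0.3],
    "feature_2": [1.0, -0.5, 0.8, -1.2, 0.4, -0.3],
})
y_train = pd.Series([1, 3, 0, 2, 0, 1], name="breaks")

# Put the target next to the features so corr() covers it too.
train = X_train.assign(breaks=y_train)

# Pearson correlation of every feature with the target.
corr_with_target = train.corr()["breaks"].drop("breaks")
print(corr_with_target.sort_values())
```

Only the training rows are used here so that nothing about the test split leaks into feature selection.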

kindred radish Mar 21, 2021, 1:57 PM

#

Yeah not looking promising

#

-0.08902379625215213    -0.09753689601363387    -0.17866755803410775    0.02011288489513478

#

Are the correlations im getting to the total number of breaks

stiff barn Mar 21, 2021, 2:00 PM

#

So what are those, are those the correlation of each feature with the y?

kindred radish Mar 21, 2021, 2:00 PM

#

Yeah, that's the correlation of each feature with the total number of breaks that happened on that day

stiff barn Mar 21, 2021, 2:01 PM

#

There is some signal there. Not much but some. feature 3 has a decent negative correlation.

#

It's not all bad as there may be some non-linear correlation that you cannot see here

kindred radish Mar 21, 2021, 2:01 PM

#

Yeah i would presume this just tells me that there isn't really a linear correlation

stiff barn Mar 21, 2021, 2:01 PM

#

You can bring that out using feature engineering techniques like feature crosses
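
A sketch of why a feature cross can help, using synthetic data in NumPy. Here `y` is constructed to depend only on the product of the two features, which is an assumption made purely to show the effect; the real data may behave nothing like this.

```python
import numpy as np

# Synthetic grid of two features; y depends only on their product.
values = np.linspace(-1.0, 1.0, 21)
x1, x2 = np.meshgrid(values, values)
x1, x2 = x1.ravel(), x2.ravel()
y = x1 * x2


def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


print("corr(x1, y):     ", round(corr(x1, y), 3))
print("corr(x2, y):     ", round(corr(x2, y), 3))
print("corr(x1 * x2, y):", round(corr(x1 * x2, y), 3))
```

Each feature alone shows essentially zero linear correlation with `y`, while the crossed feature correlates perfectly, so a near-zero `df.corr()` entry does not rule out a non-linear relationship.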

kindred radish Mar 21, 2021, 2:01 PM

#

And that the third feature is the "most" linearly correlated

stiff barn Mar 21, 2021, 2:02 PM

#

Yeah, basically

#

The gradient booster should be able to capture some of the non linear correlation.

kindred radish Mar 21, 2021, 2:08 PM

#

so i shouldn't use 'LS' since that's for a linear problem?

#

stiff barn Mar 21, 2021, 2:09 PM

#

I would just start with the default.

#

You can try other options later in your hyperparameter tuning phase and see if others work better.

kindred radish Mar 21, 2021, 2:10 PM

#

aight what metric should i use then? I just used classification_report() before but obviously this is regression?

stiff barn Mar 21, 2021, 2:12 PM

#

MSE most likely. Could also use r2

#

Would probably use both
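
A short sketch of both metrics with scikit-learn (assuming it is installed; the numbers are invented). It also shows that `classification_report` rejects continuous predictions, which is where the ValueError quoted earlier in the chat comes from.

```python
from sklearn.metrics import classification_report, mean_squared_error, r2_score

# Invented example: true break counts and a regressor's continuous predictions.
y_true = [0, 1, 2, 1, 4, 0]
y_pred = [0.3, 1.2, 1.6, 0.8, 3.1, 0.4]

mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print("MSE:", mse)
print("R2: ", r2)

# classification_report expects discrete labels on both sides, so
# continuous predictions make it raise a ValueError.
try:
    classification_report(y_true, y_pred)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```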

spark olive Mar 21, 2021, 2:49 PM

#

i was referred here from the help channels, could someone help me with some code?

i am trying to make a bar chart, but i am unsure how to call up the x-axis and y-axis variables needed in
sb.factorplot(x=), (y=)
as the data is from several different csv files, and the math i have done is in functions

the code is on here https://paste.pythondiscord.com/uqezixusen.py

https://drive.google.com/drive/folders/1EJtc3R60eAMcMqLcSmGuPAGYPifLVIzz?usp=sharing here is the csv files relevant for the chart

serene scaffold Mar 21, 2021, 3:04 PM

#

@spark olive sb.factorplot(x=), (y=) is a syntax error

spark olive Mar 21, 2021, 3:05 PM

#

i need to put something in the x= and y=, but i cant figure out what to put

deft ruin Mar 21, 2021, 3:14 PM

#

Which columns do you want to plot?

#

You’ll need a data frame with two columns containing the data you want to plot

#

x and y are strings with the names of the columns

spark olive Mar 21, 2021, 3:16 PM

#

i want to plot EAB_SUM as the x axis, and postBIODV as the y axis, but i cant figure out how to turn it from a function to a series

kindred radish Mar 21, 2021, 3:16 PM

#

@stiff barn So i mentioned yesterday that this happened, but i have somehow got a negative R2 value...

#

Best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse)

#

My R^2 value for this is like -0.8 and the MSE is like 1

spark olive Mar 21, 2021, 3:19 PM

#

aiming for something like this

deft ruin Mar 21, 2021, 3:19 PM

#

@spark olive the globals and the Xs in the function signature make the code a bit hard to follow

spark olive Mar 21, 2021, 3:20 PM

#

(realized i said barchart earlier when i meant dot graph, my bad there)

deft ruin Mar 21, 2021, 3:20 PM

#

Is the data you want for EAB-SUM in the total column of XX_EAB?

spark olive Mar 21, 2021, 3:21 PM

#

the data i want for EAB_SUM would be the result of

XX_EAB['TOTAL'] = XX_EAB.sum()

#

but it would differ depending on if it was NY, WI, or TX

#

the functions having the XX was for the purpose of being able to replace the "XX" with the state code (NY, WI, or TX) without having to retype all of it every time

deft ruin Mar 21, 2021, 3:25 PM

#

Uh ok might be better to have a positional argument called state

spark olive Mar 21, 2021, 3:26 PM

#

oh? i have never used that before

deft ruin Mar 21, 2021, 3:27 PM

#

I’d refactor your code so that it returns the column you want

#

And set it up so that you can give it the state and it will get the sum for that state

#

Then use pd.concat to put them together

spark olive Mar 21, 2021, 3:29 PM

#

how do you do that?

deft ruin Mar 21, 2021, 3:33 PM

#

Try writing a function that takes the data and the state name and gives you the sum for that state

#

Then create a loop that puts the sum for each state in a list

#

Then convert that list to a series
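
Those three steps could look roughly like this. The `eab_by_state` tables, the `eab_sum` helper and the numbers are hypothetical placeholders, since the real data lives in the linked CSV files, and summing every cell is only a guess at what the total should be.

```python
import pandas as pd

# Hypothetical per-state tables; the real ones would come from the CSV files.
eab_by_state = {
    "NY": pd.DataFrame({"2018": [3, 5], "2019": [4, 6]}),
    "WI": pd.DataFrame({"2018": [1, 2], "2019": [0, 7]}),
    "TX": pd.DataFrame({"2018": [9, 1], "2019": [2, 2]}),
}


def eab_sum(tables, state):
    """Return the grand total of every cell in the table for `state`."""
    return tables[state].sum().sum()


# Loop over the state codes and collect one sum per state in a list.
states = ["NY", "WI", "TX"]
sums = []
for state in states:
    sums.append(eab_sum(eab_by_state, state))

# Convert the list to a Series indexed by state code.
eab_sums = pd.Series(sums, index=states, name="EAB_SUM")
print(eab_sums)
```

Passing the state code as an argument replaces the find-and-replace on "XX", and the resulting Series can be placed in a DataFrame column for plotting.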

spark olive Mar 21, 2021, 3:35 PM

#

how would you create the loop? sorry i am new to python! 😅

craggy sundial Mar 21, 2021, 3:35 PM

#

I want to implement a generative adversarial network in native python. not for any practical applications, but for learning

stiff barn Mar 21, 2021, 3:37 PM

#

kindred radish My R^2 value for this is like -0.8 and the MSE is like 1

Did you adjust the dataset to regression where you have the full range of values not just 1 and 0?

kindred radish Mar 21, 2021, 3:38 PM

#

Yeah im a mug i just realised i did that lmao

stiff barn Mar 21, 2021, 3:38 PM

#

lol all good

kindred radish Mar 21, 2021, 3:39 PM

#

Seems the regressor doesn't like that the input is continuous whilst the output is discrete?

#

ValueError: Classification metrics can't handle a mix of multiclass and continuous targets

stiff barn Mar 21, 2021, 3:39 PM

#

What metrics are you using?

kindred radish Mar 21, 2021, 3:40 PM

#

Ah i see yeah it's the classification report

#

OKKK it works now it works

stiff barn Mar 21, 2021, 3:41 PM

#

Lol good, how are they looking?

kindred radish Mar 21, 2021, 3:41 PM

#

Annd Im getting the following:

MSE:  1.2018049174929522
R2:  -0.2902649077024708
MSE:  0.6243021962953844
R2:  -0.016531632695806264
MSE:  0.7926685016933164
R2:  -0.3511394915226984

#

For three runs

#

So they're still negative. I checked to see the data that im running through and it all looks good

deft ruin Mar 21, 2021, 3:42 PM

#

@spark olive no worries, you might want to go to a dedicated help channel for that kind of thing

#

Afraid I can’t walk you through it right now

spark olive Mar 21, 2021, 3:43 PM

#

@deft ruin i went there but they directed me to here instead
no worries though! any time youre free id really appreciate it. thank you so much for your pointers so far!

kindred radish Mar 21, 2021, 3:45 PM

#

Fiddling with the Gradient Booster's parameters doesn't seem to help

stiff barn Mar 21, 2021, 3:45 PM

#

Let it run for longer

kindred radish Mar 21, 2021, 3:45 PM

#

aight

stiff barn Mar 21, 2021, 3:45 PM

#

MSE is decreasing

#

For the most part

#

Also, if you can show your loss that would be helpful

kindred radish Mar 21, 2021, 3:48 PM

#

Aight i slapped it up to 1000

#

      Iter       Train Loss   Remaining Time 
         1           0.6906            0.00s
         2           0.6635            0.50s
         3           0.6451            0.33s
         4           0.6294            0.50s
         5           0.6113            0.40s
         6           0.5965            0.50s
         7           0.5845            0.43s
         8           0.5727            0.50s
         9           0.5605            0.44s
        10           0.5534            0.50s
        20           0.4828            0.39s
        30           0.4391            0.39s
        40           0.3983            0.36s
        50           0.3602            0.36s
        60           0.3275            0.34s
        70           0.2983            0.35s
        80           0.2739            0.33s
        90           0.2500            0.33s
       100           0.2338            0.32s
       200           0.1234            0.28s
       300           0.0760            0.24s
       400           0.0524            0.20s
       500           0.0355            0.17s
       600           0.0244            0.13s
       700           0.0175            0.10s
       800           0.0133            0.07s
       900           0.0098            0.03s
      1000           0.0073            0.00s

#

I could fiddle with the learning rate?

#

So i made the learning rate like 0.001 as opposed to the default 0.1. This makes it much less negative, it actually makes it go to around zero.
I've read that having a "high" learning rate is what you do if you have a "small" amount of data?

stiff barn Mar 21, 2021, 3:55 PM

#

negative r2 means that the model performs worse than a horizontal straight line through the mean of the targets. 0 would mean it performs just as well as one
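
To make that concrete, a plain-Python sketch of R2 computed from its definition, with made-up break counts. Predicting the mean every time gives exactly 0, and predictions that are worse than that go negative.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up daily break counts.
y_true = [0, 1, 4, 2, 0, 3, 1, 1]
mean = sum(y_true) / len(y_true)

flat_line = [mean] * len(y_true)          # always predict the mean
worse = [3, 0, 0, 4, 2, 0, 3, 0]          # predictions that miss the pattern

print("flat line:", r2(y_true, flat_line))
print("worse:    ", r2(y_true, worse))
print("perfect:  ", r2(y_true, y_true))
```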

#

.001 would be a fairly standard rate. It would be lower than the default though

kindred radish Mar 21, 2021, 3:55 PM

#

0.1 is the default yeah

stiff barn Mar 21, 2021, 3:56 PM

#

Try lowering max depth to 3 or 4

kindred radish Mar 21, 2021, 3:56 PM

#

So this is the same as me just drawing a flat straight line through my data? That's pretty crap right? lol

stiff barn Mar 21, 2021, 3:56 PM

#

Lol yeah basically

kindred radish Mar 21, 2021, 3:56 PM

#

depth is default to 3 lemme try 4

#

Tried 4. it's looking like it's still 0

#

like floating around that, both negatively and positively

stiff barn Mar 21, 2021, 3:57 PM

#

Is that on the holdout dataset?

kindred radish Mar 21, 2021, 3:57 PM

#

holdout?

stiff barn Mar 21, 2021, 3:58 PM

#

The dataset you save that doesn't touch the training process

kindred radish Mar 21, 2021, 3:58 PM

#

oh the "test" data?

#

Like i split the data into a 80-20 train-test split. These metrics then compare the predicted output based on what's trained against the test dataset

stiff barn Mar 21, 2021, 4:00 PM

#

Yeah, the test dataset

kindred radish Mar 21, 2021, 4:00 PM

#

Yeah then it's all done on that

stiff barn Mar 21, 2021, 4:03 PM

#

kindred radish Yeah then it's all done on that

I would try some of the things here

#

https://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_regularization.html

kindred radish Mar 21, 2021, 4:08 PM

#

are you worried that it's correlating the training data fine, just not the test data?

#

If the positions of the training and testing data is randomised each time, doesn't that help?

stiff barn Mar 21, 2021, 4:10 PM

#

More that it's over-fitting to the training set.

#

I'd suggest setting a random seed for both the model and the training test split so you don't get different results each time due to randomization
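
A sketch of that suggestion with scikit-learn, on synthetic noise rather than the real data (all values and the seed 42 are arbitrary assumptions). Pinning `random_state` in both places makes runs repeatable, and comparing train and test R2 is one rough way to spot over-fitting.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 300 rows, 4 features, and a
# target that is pure noise, so there is nothing real to learn.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = rng.integers(0, 5, size=300).astype(float)


def run(seed):
    """Split, fit and score with every source of randomness pinned to `seed`."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = GradientBoostingRegressor(n_estimators=500, random_state=seed)
    model.fit(X_train, y_train)
    return model.score(X_train, y_train), model.score(X_test, y_test)


train_r2, test_r2 = run(seed=42)
print(f"train R2: {train_r2:.3f}")
print(f"test R2:  {test_r2:.3f}")

# Same seed, same split, same model: a second run gives the same scores.
repeat = run(seed=42)
print(np.allclose(repeat, (train_r2, test_r2)))
```

A train score far above the test score, as on this noise target, is the pattern that points to the model memorising the training rows.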

deft ruin Mar 21, 2021, 4:11 PM

#

@spark olive sure thing i might be able to provide some more help later

kindred radish Mar 21, 2021, 4:11 PM

#

Ok that sounds like a good idea

#

What's on the y-axis of this plot though @stiff barn? Or did you mean to track how the R2 value varies upon changing the parameters?

sonic raft Mar 21, 2021, 4:13 PM

#

Hi! I need some help with pytorch.
So let's say I've calculated a gradient for a set of parameters( It doesn't matter, but let's say with a MSE loss function)
So, when I'd like to step the parameters
is there any difference between
parameter.data -= parameter.grad * learning_rate
vs
parameter.data -= parameter.grad.data * learning_rate?
Since I've already told Pytorch not to calculate the gradients for this stepping operation with "parameter.data"

stiff barn Mar 21, 2021, 4:14 PM

#

kindred radish What's on the y-axis of this plot though @stiff barn Or did you mean ...

Just try changing the parameters they suggested there and see the effects.

spark olive Mar 21, 2021, 4:15 PM

#

@deft ruin thank you so much! feel free to dm me if thats easier

stiff barn Mar 21, 2021, 4:16 PM

#

The y axis is just a measure of goodness of fit

kindred radish Mar 21, 2021, 4:25 PM

#

stiff barn Just try changing the parameters they suggested there and see the effects.

HMMMMMMMMMM:

#

i used exactly their code, and i've just put X_train,X_test,y_train and y_test into it

stiff barn Mar 21, 2021, 4:26 PM

#

Wouldn't bother copying the code. I'd just try those parameters like subsample=0.5 in the code you had

kindred radish Mar 21, 2021, 4:27 PM

#

ohhhhhhhhhhh

#

aight lemme reverse reverse

stiff barn Mar 21, 2021, 4:27 PM

#

haha sorry should have been more clear

kindred radish Mar 21, 2021, 4:30 PM

#

Looks like im getting about the same as what I was getting before

stiff barn Mar 21, 2021, 4:31 PM

#

Yeah I mean unfortunately there may not be much signal in the data you have. You could try doing some feature engineering or collecting more data.

#

If you're willing to share the notebook I can look for anything being off

#

Not sure if that's allowed for you though

kindred radish Mar 21, 2021, 4:31 PM

#

I can't share it I'm afraid im under an NDA

stiff barn Mar 21, 2021, 4:31 PM

#

Yeah, makes sense

#

Same lol

kindred radish Mar 21, 2021, 4:32 PM

#

it's a real bitch right? hahaha

#

I want to be able to conclusively say something about the data

stiff barn Mar 21, 2021, 4:32 PM

#

Yeah, mine's pretty invasive

kindred radish Mar 21, 2021, 4:32 PM

#

I had an idea to push in artificial input data to induce a correlation that the models would learn. For example: create a feature that always results in a break if it has a value of 0.5

#

And this would show that the data i've been given is likely to not have an effect on whether the machine breaks or not

#

Would this be a good idea to do, do you think?

stiff barn Mar 21, 2021, 4:34 PM

#

Not a bad idea to validate your process
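
A rough sketch of that sanity check, assuming scikit-learn and NumPy. Everything here is synthetic: `X_noise` stands in for the real features and `planted` is an artificial feature that fully determines the target, which is a simpler variant of the 0.5-means-break idea above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for the real features: noise that carries no signal about y.
X_noise = rng.normal(size=(300, 4))

# Artificial feature that fully determines the number of breaks.
planted = rng.integers(0, 5, size=300).astype(float)
y = planted.copy()


def holdout_r2(X):
    """Fit on 80% of the rows and return R2 on the held-out 20%."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )
    model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
    return r2_score(y_test, model.predict(X_test))


r2_noise_only = holdout_r2(X_noise)
r2_with_planted = holdout_r2(np.column_stack([X_noise, planted]))

print(f"noise features only:  {r2_noise_only:.3f}")
print(f"with planted feature: {r2_with_planted:.3f}")
```

If the pipeline is sound, the score should jump to nearly 1 once the planted feature is included; if it does not, the problem is in the code rather than the data.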

kindred radish Mar 21, 2021, 4:34 PM

#

omg some hope!

stiff barn Mar 21, 2021, 4:34 PM

#

If you had the time, you could also try something like this

#

https://github.com/sdv-dev/SDV

GitHub: sdv-dev/SDV. Synthetic Data Generation for tabular, relational and time series data.

#

You could use that to generate more training data that has the same statistical properties as what you have.

#

That could improve training

#

Wouldn't recommend it though unless you had a bunch of time

kindred radish Mar 21, 2021, 4:36 PM

#

ohhh is this like... In unsupervised learning, say you had pictures of people's faces and you wanted a model to learn how to recognise them. You don't need to take more photos, you can just mirror them and that's the same as new data?

stiff barn Mar 21, 2021, 4:37 PM

#

Yeah, I'd say the idea is similar.

#

Not exactly the same as new data but helps prevent overfitting and such.
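The mirroring idea can be sketched in a few lines. This is a toy illustration, assuming images are plain nested lists of pixel values; `mirror_horizontal` and `augment` are hypothetical helper names, not part of any library.

```python
def mirror_horizontal(image):
    """Return a left-right flipped copy of an image given as rows of pixels."""
    return [list(reversed(row)) for row in image]

def augment(dataset):
    """Append a mirrored copy of every (image, label) pair; labels are unchanged."""
    return dataset + [(mirror_horizontal(img), label) for img, label in dataset]

tiny_face = [
    [0, 1, 2],
    [3, 4, 5],
]
augmented = augment([(tiny_face, "alice")])
print(len(augmented))   # 2
print(augmented[1][0])  # [[2, 1, 0], [5, 4, 3]]
```

The mirrored copy carries no genuinely new information, which is why it helps with overfitting more than it substitutes for collecting fresh data.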

kindred radish Mar 21, 2021, 4:37 PM

#

ok that's cool, i might check this out after the meeting, unfortunately it's on Tuesday so i don't really have the time to look into it now

stiff barn Mar 21, 2021, 4:37 PM

#

Gives more examples to train on

kindred radish Mar 21, 2021, 4:38 PM

#

Thank you for helping me, this channel is awesome. Definitely made me realise how little i understood about what I was doing! Wish I'd been more active on here at the beginning of the academic year lmao

stiff barn Mar 21, 2021, 4:39 PM

#

No problem, you seem to learn fast and are able to iterate quickly on what we suggested. It's a pretty cool community for sure!

#

We're all still always learning. You have to be in this field haha

kindred radish Mar 21, 2021, 4:41 PM

#

thanks ahaha i've been in a bit of a panic so i was laser focused on whatever you guys said 😅

#

Yeah it definitely seems like it. When i was younger i saw lots of videos about ML and data science which was more on the meme-y side of things so i think i falsely assumed that the field wasn't as deep as it is. Obviously a terrible false assumption on my part! So much info for such a young field of study

stiff barn Mar 21, 2021, 4:43 PM

#

Yeah it's all super cool. It's a very interesting way to solve problems

#

Unfortunately just takes a lot of data generally haha

kindred radish Mar 21, 2021, 4:48 PM

#

Well that seems to be what was holding back the science when it was first developed, right?

#

Because a lot of this Computer Science was discovered decades ago

#

But """"Big Data""""" wasn't available back then as it is now. Like the amount of data that Google and Facebook or TikTok has on people is absolutely insane, so there's so much room to explore with ML

stiff barn Mar 21, 2021, 4:51 PM

#

Yeah for sure. Both availability of data and processing power. Now you can buy a solid GPU for $700 and be able to build fairly large models in a few hours of training or just hop over to the cloud and rent even larger resources.

kindred radish Mar 21, 2021, 5:27 PM

#

that is if you can find a GPU hahaha ;)

#

So im pretty happy with how everything's turned out, despite it being sloppy on my part. One thing I would like to check with you @stiff barn is why you suggested the GaussianBoost function? That question might be a bit involved so if there's material you know of that I could read about it then that would be awesome! I've looked around a bit on wikipedia and stuff, just unsure on some concepts i guess.

stiff barn Mar 21, 2021, 5:32 PM

#

kindred radish that is if you can *find* a GPU hahaha ;)

Haha I’ve been very lucky in that area with the 3090

#

The library XGBoost is what is generally used in the industry for structured data. That mostly contains optimized boosting models. They generally are the go to because they work really well on this kind of data. The most basic explanation I can think of is that you’re training a bunch of decision trees in a row where each subsequent decision tree tries to learn to correct the errors made by the decision tree before it
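To make the "each tree corrects the one before it" idea concrete, here is a bare-bones sketch of boosting on a 1-D regression problem. It is not how XGBoost is implemented (XGBoost adds regularization, second-order gradients and much more); the stump learner, the learning rate of 0.5 and the toy data are all assumptions for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (a 'stump') to the current residuals."""
    best = None
    for split in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < split]
        right = [r for x, r in zip(xs, residuals) if x >= split]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, split, left_mean, right_mean)
    _, split, left_mean, right_mean = best
    return lambda x: left_mean if x < split else right_mean

def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Train stumps one after another, each on the errors left by the ensemble so far."""
    predictions = [0.0] * len(xs)
    trees, history = [], []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]  # what is still wrong
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + learning_rate * tree(x) for p, x in zip(predictions, xs)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return trees, history

xs = [i / 10 for i in range(30)]
ys = [x * x for x in xs]
trees, mse_history = boost(xs, ys)
print(f"MSE after round 1: {mse_history[0]:.3f}, after round 20: {mse_history[-1]:.3f}")
```

Each round fits only the residuals, so the training error shrinks as trees are added; the learning rate keeps any single tree from overcorrecting.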

kindred radish Mar 21, 2021, 6:04 PM

#

Thank you! So I should look into XGBoost then and read around that a bit?

stiff barn Mar 21, 2021, 6:18 PM

#

@kindred radish Yeah, probably a good idea. This is the book I read to get a good introduction into a wide range of things. https://www.amazon.com/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1492032646

marsh gale Mar 21, 2021, 6:42 PM

#

Hey all!
I recently finished my Master in mechanical engineering and I thought about getting myself a little into ML (not for a job per se, but since i'm interested)
I'm currently playing a browser game where players produce units and fight each other. Combat is fairly simple: each unit picks a random unit as a target and attacks it, but some units are good against others, which means that if they hit a unit they're good against, they have a chance to attack again.

So my idea was to train an ML model on the unit compositions of all the players on a server and calculate the best possible "counter" to their units.

Would you strongly advise me not to try this? If so, why?

stiff barn Mar 21, 2021, 7:06 PM

#

marsh gale Hey all! I recently finished my Master in mechanical engineering and I thought a...

Hey @marsh gale, welcome! It sounds like what you're attempting to do there is reinforcement learning. It sounds like it could be doable but I would suggest tackling a simple problem first for your first dive into ML. Something that already has a clear labeled dataset so you can wrap your head around some of the underlying concepts in ML. Reinforcement learning is definitely more on the advanced end and a current area of active research.

faint ocean Mar 21, 2021, 7:14 PM

#

#help-corn message

#

Could somebody maybe take a look at this for me?

grave frost Mar 21, 2021, 7:40 PM

#

marsh gale Hey all! I recently finished my Master in mechanical engineering and I thought a...

you can try naive Q-learning or DQN since your environment seems simple enough

misty flint Mar 21, 2021, 7:42 PM

#

paper lake 🍼

ID_BoomKek

#

what does this even mean

grave frost Mar 21, 2021, 8:25 PM

#

misty flint what does this even mean

baby bottle?

hollow sentinel Mar 21, 2021, 8:27 PM

#

out of context baby bottle

grave frost Mar 21, 2021, 8:27 PM

#

maybe he/she was trying to tease him?

hollow sentinel Mar 21, 2021, 8:30 PM

#

idk

fiery frost Mar 21, 2021, 8:42 PM

#

Hi, I'd be happy if someone could guide me.
I am trying to build a CNN that finds the most similar fonts to a font in a picture.
The catch is that the font in the picture is in a different language from the font I'm trying to find.
For example, there is a picture with a font in Japanese, and the neural network needs to find the most similar font in English.
I don't know a lot of neural network stuff, and I'm not asking for a step-by-step guide, I just need a direction and a place to start.
I've already started building a database for it.
Any help would be welcome!

uncut barn Mar 21, 2021, 9:17 PM

#

will a small batch size cause the loss function to fluctuate? my data set has shape (2000, 40, 1)

lethal crater Mar 21, 2021, 10:13 PM

#

uncut barn will a small batch size cause the loss function to fluctuate, my data set is a 2...

yes

#

well, your batch size changes how a given learning rate behaves

#

a larger batch size tends to work better with a slightly higher learning rate in my experience
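One way to see why small batches make the loss jumpy: the mini-batch gradient is an average over the batch, so its spread shrinks as the batch grows. A rough sketch with invented per-sample gradients (the Gaussian numbers below are an assumption, not anyone's real model):

```python
import random
import statistics

rng = random.Random(0)
# Hypothetical per-sample gradients for 2000 training examples: true mean 1.0, noisy.
per_sample_grads = [rng.gauss(1.0, 2.0) for _ in range(2000)]

def batch_gradient_spread(batch_size, n_batches=500):
    """Std. dev. of the mini-batch gradient estimate across random batches."""
    estimates = [
        statistics.fmean(rng.sample(per_sample_grads, batch_size))
        for _ in range(n_batches)
    ]
    return statistics.stdev(estimates)

spread = {b: batch_gradient_spread(b) for b in (4, 32, 256)}
for b, s in spread.items():
    print(f"batch size {b:>3}: gradient std ~ {s:.3f}")
```

Noisier gradient estimates mean each update can point further from the true descent direction, which is one reason a smaller batch often pairs better with a smaller learning rate.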

lethal crater Mar 21, 2021, 10:15 PM

#

fiery frost Hi, I will be happy if someone can guide me. I am trying to build CNN that finds...

have a look at the MNIST dataset. It's kinda the 'hello world' of machine learning and revolves around identifying handwritten digits with CNNs. It'd be a great point to study since you're also analysing images of text

marsh gale Mar 21, 2021, 10:31 PM

#

stiff barn Hey @marsh gale, welcome! It sounds like what you're attempting to do...

Hey! Tyvm for the warm welcome! 🙂 Yeah that's what I thought too, but on the other hand, I really enjoy solving a problem I've set myself.
Yeah, from what I've read it would fit into "reinforcement learning".

@grave frost tyvm for the advice! What makes Q-learning/DQN so desirable for my task?

granite wolf Mar 21, 2021, 10:46 PM

#

#

anyone know how to add the ID column to the left and stop it using country/region as index?

#

this isn't freshly read in data, its the result of transposing the data then setting first row as headers, so cant use read_csv parameters

hollow zephyr Mar 21, 2021, 10:53 PM

#

HI there, I've covered numpy basics, and would like to practice, can you recommend me some sources that I can use to find exercises

grave frost Mar 21, 2021, 10:58 PM

#

marsh gale Hey! Tyvm for the warm welcome! 🙂 Yeah thats what I thought, too but on the oth...

Q-learning and DQN are usually preferred for solving simple environments, usually ones that don't include any multi-agent interactions. If your game is of greater complexity, you might want to look into more complex techniques like NEAT.
Either way, it depends on your environment.
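For a feel of what "naive" (tabular) Q-learning looks like, here is a small sketch on a made-up environment: a 5-state corridor where walking right to the last state pays a reward of 1. The environment, the hyperparameters and the purely random exploration are all assumptions for illustration; the browser game would need its own state and action encoding.

```python
import random

N_STATES, GOAL = 5, 4
ACTIONS = ("left", "right")

def step(state, action):
    """Deterministic chain: reaching the last state pays 1 and ends the episode."""
    nxt = max(0, state - 1) if action == "left" else state + 1
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap episode length
            action = rng.choice(ACTIONS)  # explore uniformly; Q-learning is off-policy
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Move Q(s, a) toward reward + gamma * max_a' Q(s', a').
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
            if done:
                break
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)
```

The table has one entry per (state, action) pair, which is why this only scales to small environments; DQN replaces the table with a neural network.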

granite wolf Mar 21, 2021, 10:58 PM

#

@hollow zephyr try find some interesting kaggle data and look at the tasks tab if you want to find solutions yourself without following a youtube tutorial

hollow zephyr Mar 21, 2021, 10:59 PM

#

granite wolf @hollow zephyr try find some interesting kaggle data and look at the tas...

Thanx a lot

zinc lark Mar 21, 2021, 11:07 PM

#

is there any way to slim down pytorch installation to just torch.jit? It's 170MB as of now even w/ only CPU support

#

only using it for running traced models in prod

velvet thorn Mar 21, 2021, 11:08 PM

#

zinc lark is there any way to slim down pytorch installation to just `torch.jit`? It's 170...

not trivially AFAIK

still otter Mar 21, 2021, 11:27 PM

#

granite wolf anyone know how to add the ID column to the left and stop it using country/regio...

you want the dates to still be the index, but the index name to be id? You can just rename the index, if that is what you want
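The screenshot isn't preserved here, so the following is only a guess at the shape of the data: a transposed table whose index holds the dates and whose columns axis kept the name "Country/Region". Under that assumption, a sketch with pandas (the country names and values are invented):

```python
import pandas as pd

# Hypothetical stand-in for the transposed table: dates ended up as the index,
# and the columns axis kept the name "Country/Region".
df = pd.DataFrame(
    {"Australia": [1, 2, 3], "Brazil": [4, 5, 6]},
    index=["1/22/20", "1/23/20", "1/24/20"],
)
df.columns.name = "Country/Region"

tidy = df.rename_axis(index="Date").reset_index()  # index -> ordinary "Date" column
tidy.columns.name = None                           # drop the "Country/Region" label
tidy.insert(0, "ID", range(1, len(tidy) + 1))      # new leftmost ID column

print(tidy)
```

If the goal is just to rename the index rather than replace it, `rename_axis` on its own is enough.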

serene scaffold Mar 21, 2021, 11:34 PM

#

velvet thorn not trivially AFAIK

I assume you'd have to clone it and figure out what you can safely prune from the code base without breaking what you need?

grave frost Mar 21, 2021, 11:35 PM

#

Quick question - what are the best ways to squeeze all the perf out of pre-training? (apart from things like hyperparameter tuning)

serene scaffold Mar 21, 2021, 11:50 PM

#

My guess is that if you try too hard, you'll just overfit, but I could be wildly incorrect.

velvet thorn Mar 21, 2021, 11:55 PM

#

serene scaffold I assume you'd have to clone it and figure out what you can safely prune from th...

yeah

grave frost Mar 22, 2021, 12:10 AM

#

serene scaffold My guess is that if you try too hard, you'll just overfit, but I could be wildly...

yea, that's why I am not using it ¯_(ツ)_/¯ it was just an inquiry whether someone knows of a method that can enhance its performance

grave frost Mar 22, 2021, 12:31 AM

#

This aged well: https://www.reddit.com/r/MachineLearning/comments/6n97my/what_do_you_guys_think_about_siraj_ravals_videos/

r/MachineLearning - What do you guys think about Siraj Raval's vide...

142 votes and 101 comments so far on Reddit

#

3 Years ago - before the scandal

hollow sentinel Mar 22, 2021, 1:15 AM

#

@grave frost this would be more suited for #ot0-psvm’s-eternal-disapproval

#

Since it’s more about a person than a DS/ML topic

zinc lark Mar 22, 2021, 1:22 AM

#

@velvet thorn I'm trying to think of a reason why a general-purpose tool that prunes unused modules from 3rd party packages isn't out there. Is it because the very dynamic nature of python would make detection of unused modules very hard?

misty flint Mar 22, 2021, 1:23 AM

#

sounds like itll be very easy to break something

#

DoggoKek
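A small illustration of why static detection is hard, using the standard library's `ast` module. The `SOURCE` snippet is invented: it imports one module normally and another through a name assembled at runtime, which a scan of import statements cannot see.

```python
import ast

SOURCE = '''
import json
import importlib

plugin = importlib.import_module("col" + "lections")  # name built at runtime
'''

def static_imports(source):
    """Collect top-level module names that appear in import statements."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found

print(sorted(static_imports(SOURCE)))  # ['importlib', 'json']
```

`collections` never shows up, so a pruning tool relying on this kind of scan would happily delete a module the program needs.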

velvet thorn Mar 22, 2021, 1:32 AM

#

zinc lark @velvet thorn I'm trying to think of a reason why a general-purpose too...

yes

#

well

#

actually I need to think about that

#

but my gut feeling is also that there’s not much need for such a tool?

zinc lark Mar 22, 2021, 1:34 AM

#

yeah I don't think there's a concrete need. But I can also see how some people might use it (for example, minimizing attack surface on third party dependencies)

quasi sparrow Mar 22, 2021, 1:57 AM

#

Hey guys, quick question. How can I downgrade to Python 3.8 without using virtual environments? I'm running on Linux Ubuntu.

zinc lark Mar 22, 2021, 2:01 AM

#

While it's possible to downgrade, @quasi sparrow you should also check out just using python3.8

#

instead of python3 (which I'm guessing is linked to 3.9 on your system)

quasi sparrow Mar 22, 2021, 2:01 AM

#

Oh, that's a good idea. Let me try that!

misty flint Mar 22, 2021, 2:15 AM

#

depends on who you are. academia vs. industry. research vs. applied. etc.

#

speaking of papers

#

https://www.nature.com/articles/s41598-020-79310-1

Scientific Reports

Facial recognition technology can expose political orientation from...

#

ID_BoomKek

#

also if you are in academia, it should be less about reading X amount of papers, and more about focusing on your particular field + 3 pass method

#

maybe those related to your research, you would do the complete 3 passes, while those in other domains, would get 1 pass

exotic maple Mar 22, 2021, 2:16 AM

#

Ah yes, political inclinations from algorithms

#

what could go wrong?

misty flint Mar 22, 2021, 2:16 AM

#

memecringeharold

misty flint Mar 22, 2021, 2:17 AM

#

exotic maple Ah yes, political inclinations from algorithms

as a reference, they mentioned that someone did sexual orientation from algorithms

exotic maple Mar 22, 2021, 2:17 AM

#

I want my technocratic dictatorship to have at least some engineered catgirls or sex robots, but instead we have Onlyfan thots

misty flint Mar 22, 2021, 2:17 AM

#

and i was like

#

Pika

#

i wonder if anyone is ever like: 'maybe we shouldnt do this project'

#

DoggoKek

quasi sparrow Mar 22, 2021, 2:18 AM

#

zinc lark While it's possible to downgrade, @quasi sparrow you should also check o...

Seems like I need to use tensorflow==1.11 to run this pretrained network. I would need to downgrade to Python 3.8 so I can install tensorflow==1.11

exotic maple Mar 22, 2021, 2:18 AM

#

bro I laugh at the U,S researchers thinking

misty flint Mar 22, 2021, 2:18 AM

#

have you heard of the 3 pass method?

#

if not, then reading papers will eat up all your time

#

DoggoKek

exotic maple Mar 22, 2021, 2:18 AM

#

"guys we have a very tough political situation at home, what should we do"
"I KNOW, WE CAN CREATE A ML TO IDENTIFY "THEM" -> them is anyone you dont like

misty flint Mar 22, 2021, 2:18 AM

#

exotic maple "guys we have a very tough political situation at home, what should we do" "I KN...

ID_BoomKek

exotic maple Mar 22, 2021, 2:19 AM

#

I've only read 1 paper, ever, and it was the SAGA paper. It was...tough, but i think i got most of it

#

fuck implementing that though lol

zinc lark Mar 22, 2021, 2:20 AM

#

quasi sparrow Seems like I need to use tensorflow ==1.11 to run this pretrained network. I wou...

two separate python installations do not share packages AFAIK. So you can just do python3.8 -m pip install tensorflow==1.11 and then python3.8 -c "import tensorflow" should work

#

pip3.8 could also possibly work instead of python3.8 -m pip, depends on if you have that linked

#

and then to test my claim that different python installations don't share packages, python3 -c "import tensorflow" would not work (or if it did, tensorflow would be a different version cause you said 1.11 is incompatible with 3.9)
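A quick way to check which interpreter is actually running and what it can see is to ask it directly. This sketch uses only the standard library; `installed_version` is a hypothetical helper name, and the script would be run once per interpreter (for example with `python3.8` and again with `python3`).

```python
import sys
from importlib import metadata  # standard library since Python 3.8

def installed_version(package):
    """Return the installed version of a package for THIS interpreter, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

print("interpreter:", sys.executable)
print("python:", ".".join(map(str, sys.version_info[:3])))
print("tensorflow:", installed_version("tensorflow"))
```

Different results from the two interpreters would confirm that each installation keeps its own set of packages.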

misty flint Mar 22, 2021, 2:26 AM

#

exotic maple I've only read 1 paper, ever, and it was the SAGA paper. It was...tough, but i t...

i come from a background that is heavily focused on research so being able to read papers was part of the skill set

#

this is the original version: https://web.stanford.edu/class/cs244/papers/HowtoReadPaper.pdf

this is the modern take: https://towardsdatascience.com/how-to-read-scientific-papers-df3afd454179

Medium

How To Read Scientific Papers

Increase your efficiency with the three-pass approach

#

i also have one "real" publication

#

so ig that helps

#

pithink

#

non-cs tho

#

DoggoKek

quasi sparrow Mar 22, 2021, 2:33 AM

#

zinc lark two separate python installations do not share packages AFAIK. So you can just d...

Thanks! I will try that

quasi sparrow Mar 22, 2021, 2:36 AM

#

zinc lark two separate python installations do not share packages AFAIK. So you can just d...

I think it's a problem with tensorflow. I can't install version 1.11. It may be rolled out already

#

I don't think it's available anymore

zinc lark Mar 22, 2021, 2:36 AM

#

oh right it's on version 2. forgot about that

#

@quasi sparrow if you really need 1.11, would you be willing to use py 3.6? or do you need 3.8?

#

if you can hop down to 3.6, this should work:

#

pip install https://files.pythonhosted.org/packages/ce/d5/38cd4543401708e64c9ee6afa664b936860f4630dd93a49ab863f9998cd2/tensorflow-1.11.0-cp36-cp36m-manylinux1_x86_64.whl

#

or you'd have to change to whatever OS you're on

#

you can find the links here

#

https://pypi.org/project/tensorflow/1.11.0/#modal-close
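The reason the Python version matters is encoded in the wheel's filename. A rough sketch that splits the name into the fields defined by the wheel naming convention (PEP 427); `parse_wheel_name` is a hypothetical helper and ignores edge cases such as names that contain extra hyphens.

```python
def parse_wheel_name(filename):
    """Split a wheel filename into its PEP 427 components."""
    stem = filename[: -len(".whl")]
    parts = stem.split("-")
    # Layout: name-version[-build]-python_tag-abi_tag-platform_tag
    name, version = parts[0], parts[1]
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {"name": name, "version": version, "python": python_tag,
            "abi": abi_tag, "platform": platform_tag}

tags = parse_wheel_name("tensorflow-1.11.0-cp36-cp36m-manylinux1_x86_64.whl")
print(tags)
```

Here `cp36` means the wheel was built for CPython 3.6 and `manylinux1_x86_64` means 64-bit Linux, so pip on another Python version or OS will refuse to install it.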

misty flint Mar 22, 2021, 2:51 AM

#

ig = i guess

#

my paper is not important anymore

#

at least not for AI

#

~~i still put it on my resume tho~~

#

DoggoKek

quasi sparrow Mar 22, 2021, 2:52 AM

#

zinc lark @quasi sparrow if you really need 1.11, would you be willing to use py 3...

Thanks for all the help! I'm going to try with tensorflow 2.0 and fix all the errors one by one. It should be a good learning exercise, lol.

#

I'm trying to fine-tune a BERT model from Hugging Face but damn, they make it seem so easy on the website

lean ledge Mar 22, 2021, 3:36 AM

#

your job isn't to read papers, it's to know what's going on in the field

#

read as many as you need to to keep up, don't go full detail on them if you dont need to

#

if you know the general methods, their pros and cons, why they are used, and the general trends, you're fine

wide oxide Mar 22, 2021, 4:26 AM

#

Has anyone worked with NEAT?

uneven gust Mar 22, 2021, 4:47 AM

#

does anyone here know how to code in R?

wide oxide Mar 22, 2021, 4:48 AM

#

uneven gust does anyone here know how to code in R?

Done some programming in R before

uneven gust Mar 22, 2021, 4:48 AM

#

I'm like super stuck on something basic

#

I need to make a data frame with 3 elements

#

i got that part down but it's what each element does that i'm stuck on

#

here's what i need to do

#

Sample should contain sample numbers from 1 to 20;
Group should have alternating labels ‘A’ and ‘B’. ( Hint : use the rep() function with one of its arguments being the same number of rows as in column Sample);
Value should contain a sample of numbers between -20 and 20 (after setting the seed at 42)

#

and here's my code so far

#

```r
df <- data.frame(sample = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20),
                 group = rep(c(A,B)sample),
                 value = c(13,11,9))
```

#

it showed up strange on here but you get the point

#

i have no idea what to do for value

misty flint Mar 22, 2021, 4:53 AM

#

i need to learn more R sometime. all i know is tidyverse 10/10

#

DoggoKek

uneven gust Mar 22, 2021, 4:53 AM

#

@wide oxide do you know how to fix it?

wide oxide Mar 22, 2021, 4:53 AM

#

uneven gust @wide oxide do you know how to fix it?

I can give it a try.

uneven gust Mar 22, 2021, 4:56 AM

#

thanks let me know here or in dm 🙂

misty flint Mar 22, 2021, 4:56 AM

#

i think youre really close BongoCat

high badge Mar 22, 2021, 5:00 AM

#

this is the bellman optimality equation (for reinforcement learning)
but thats about all i really know about it
can someone explain to me what the summation sign is for? and what max_a is?

lean ledge Mar 22, 2021, 5:20 AM

#

@high badge "the optimal value of a state = the maximum over actions of (the expected reward for taking that action plus the discounted optimal value of the state you land in, averaged over the possible next states)"

#

it's one of the most basic statements of dynamic programming, although this is specifically written for reinforcement learning and has an added discount factor

#

Idk wtf T(s, a, s') is doing there, that shouldn't be there

high badge Mar 22, 2021, 5:24 AM

#

oh

lean ledge Mar 22, 2021, 5:25 AM

#

Oh T is the transition dynamics

#

yeah no thats fine

#

http://incompleteideas.net/book/first/ebook/node35.html

3.8 Optimal Value Functions
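The image of the equation isn't preserved here, so this is an assumption about what was posted: a common way to write the Bellman optimality equation that uses the symbols mentioned above (max_a, a summation, T(s, a, s') and a discount factor) is

```latex
V^*(s) = \max_{a} \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma \, V^*(s') \right]
```

Read under that assumption: the summation runs over every possible next state s' and weights each one by T(s, a, s'), the probability of landing there after taking action a in state s, so the sum is just an expected value. max_a then picks the action whose expected value is largest. R is the immediate reward and gamma (between 0 and 1) discounts the value of future states.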

high badge Mar 22, 2021, 5:26 AM

#

uh

sturdy heron Mar 22, 2021, 5:45 AM

#

Hi, I'm Hans. I'm new here. Do you guys have any recommendation for beginner who wants to learn data science?

lean ledge Mar 22, 2021, 5:46 AM

#

http://progdisc.club/resources/#machine-learning

sturdy heron Mar 22, 2021, 5:49 AM

#

lean ledge http://progdisc.club/resources/#machine-learning

okay, thanks man

#

Btw, is Kaggle good for learning data science from zero?

#

especially for a person who don't have IT background

lean ledge Mar 22, 2021, 5:54 AM

#

a maths background is more important than anything IT or software related

#

kaggle is okay for practice, it doesn't teach you anything directly

sturdy heron Mar 22, 2021, 5:58 AM

#

oh, i see

floral mauve Mar 22, 2021, 6:58 AM

#

Hey guys, new to this server (< 1 day). Hello!

I was wondering, is there a good place to learn /recommended learning track for ML for someone who has decent knowledge of multi-variable calc and linear algebra?

misty flint Mar 22, 2021, 7:00 AM

#

#data-science-and-ml message

#

one of the links is math heavy so you can start there

lean ledge Mar 22, 2021, 7:01 AM

#

floral mauve Hey guys, new to this server (< 1 day). Hello! I was wondering, is there a good...

Bishop's!

#

Pattern Recognition and Machine Learning is a great book

#

Follow it up with Goodfellow's deep learning book

floral mauve Mar 22, 2021, 7:04 AM

#

Thanks, I'll look into them

grave frost Mar 22, 2021, 10:33 AM

#

lean ledge kaggle is okay for practice, it doesn't teach you anything directly

I disagree - In my opinion, I find competing in competitions the biggest motivator to learn new things and try to apply them to some real task. It keeps one motivated to keep learning something new every day and counter the people at the top of the LB!

lean ledge Mar 22, 2021, 10:34 AM

#

grave frost I disagree - In my opinion, I find competing in competitions the biggest motivat...

They were asking about learning from kaggle from scratch. It doesn't teach you anything from scratch, it's a practice platform

#

Read context before you reply pls

#

I mean it does have intro courses but they're so basic might as well read wikipedia pages or X in Y minutes

grave frost Mar 22, 2021, 10:38 AM

#

intro courses but they're so basic
That's the point - its meant for beginners absolutely new to data science. and several people I know have benefited a lot from kaggle courses 🤷

lean ledge Mar 22, 2021, 11:10 AM

#

It doesn't even teach you linear regression lol

#

In fact, it doesn't teach you ML at all

#

It gives you a tutorial thing which literally just calls the Decision Tree method without really explaining what it is

#

And a more advanced tutorial which swaps out the class name with RandomForests

#

I dont know why you choose to be contrarian about everything I say

#

It's objectively a horrible introduction that teaches you nothing about ML

sweet ginkgo Mar 22, 2021, 11:14 AM

#

Hello there. I need someone to help me/give advice about making a Snake game neural network

#

Thanks a lot