#data-science-and-ml

i can invite u :((
nooooooooooo
julia's syntax is as easy as peethon

@hollow sentinel whats the thing youre learning after ds&a
i feel like youre never getting to ML at this rate

or do you want to go the SWE route

whats SWE?
Software Engineer
^
whats W?
Soft"W"are Engineer

winning
the math
frick u love that sticka so much
"softwinning" ah yes
it's just linked lists slowing me down
just practical statistics behind ML

just that for now
statistics is one of the biggest parts of ML
so it would be nice if I can knock that out
yes yes
plus I have you to help
thats good. even a bit of knowledge with it can help u like bayesian and stuff but yeah
and the rest of this server
do you do a lot of bayesian stats with bioinformatics too?

are you all undergrads?
28?
i am just a microbiologist undergrad so probably yes to that i guess or can u share your knowledge sensei


close
If you're 20, you're not old
nice! i did molecular bio as my undergrad

But I remember how it felt being 20 relative to 18 year olds
I live in a pretty young neighborhood and my parents are like go play w the high schoolers
I mean
a 20 year old playing with high schoolers????
uhhhh
sus
freak so i am sure i can ask some guidance here from u while suffering with the rain of kekw and pepe stickas
haha yes
ig if youre going the bioinformatics route, try to do well in your stats class
i regret not taking mine seriously

speak for yourself I got a A+ in statistics bc I cheated
i only passed :3 freaking hell i also regretted
it wasn't just me cheating
it was the entire class
we were all in one groupchat just cheating
well I know some stuff
but it was business statistics
not "machine learning statistics"
dont worry, i have a book :3 and then i will suffer from it
rex can I send the book to you
already suffering
see what you think
the practical statistics behind machine learning book
sure
imma check my lib again if i had such book
any of you guys learned ai through the internet?
there are some good resources out there
i wanna learn ai with python but i'm too young to go to uni where can i learn that stuff?
Kaggle
kaggle?
is that free?
content looks relatively solid
I'm not sure Kaggle is the best "learn AI from nothing" site. It's good if you have some exposure. The traditional rec is Andrew Ng's Machine Learning course, but I suspect you don't have a great math background yet. What's your math like and how strong are you in python?
oreilly is mostly good
very strong in python but don't have a crazy math background
Kaggle is a data science competition website where you are given datasets and need to extract something from them (e.g. a class, predicted price, etc...) There are a number of notebooks where people step through what they did, but I don't think those are a best first resource
it assumes you know calculus, linear algebra, statistics, probability, and discrete maths
it's also in Octave, not Python
Agreed, hence why I was hesitating to recommend it
nice i can just translate some stuff into julia
define descrete math?
i know all of that other stuff but i don't know what descrete math means
Discrete math is the math of finite things (as opposed to infinite or continuous elements).
that's the most ELI5 explanation I could find
oh yeah i did that in school
i think i wont worry much. highschool destroyed me anyway with these stuffs
To be fair, discrete math covers infinite sets too (like the integers), just not continuous ones
it's a generalized explanation
Sure
discrete math is the miscellaneous math subject where they throw all the subtopics from other maths they think engineers might need

like "find the limit of 5/x" and stuff like that?
frick


find the limit of 5/x would be calculus, no?
amirite or amirite?
claculusp

lol my phone skid
if you want to learn calculus
i don't know how things are called in my language
watch 3blue1brown
Anyway, back to the question at hand, these resources are pretty good for learning AI: https://github.com/louisfb01/start-machine-learning-in-2020
Oh right, I forget Kaggle had courses now, so I take back what I said, you could learn there
hmm i should do those 100 numpy exercises some time
don't youtube have good ai courses?
If you really want to learn AI/ML, you're going to need to pair learning AI/ML things like scikit/random forest and studying a lot of math
the one i found was techwithtim and i don't like him
Probably. Many of the courses host their videos on YouTube. I personally prefer more structured coursework than straight YouTube lectures offer but to each their own
oh definitely find someone you like or else itll be harder to learn
goood stufff
I tried doing the MIT algos/DS course
i like chris
and it started w peak finding....
not even big O
just peak finding...
MIT is gonna MIT huh
do i need to learn ds?
¯\_(ツ)_/¯
ds is so boring
do i need ds for ai?
yeah thats not MIT just the teacher
you mean data science or data structures?
well yes
IF you want to be effective, yes
yup
data structures
yes
i do that stuff at school
Oh, you mean Data Structures? Yes, very much so
it's so boring
ye, gotta know at least the basics
yes, it is. thats why im procrastinating on learning it

which is why I'm learning it
time to stop procrastinating


i know queues trees, linked lists all that stuff
<strikethrough>I don't know HTML</strikethrough>
Hash tables?
no
aren't dictionaries just optimized hash tables
next year we do hash tables and big O
ur in college?
Yes, but there's a difference between calling dict and knowing how it works
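For anyone curious what "knowing how it works" looks like, here is a minimal sketch of a hash table using separate chaining. It is a toy for illustration only: the `ToyHashTable` class and its method names are made up for this example, and CPython's real `dict` uses open addressing and is far more optimized.

```python
class ToyHashTable:
    """Toy hash table with separate chaining (hypothetical, for illustration)."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]
        self.size = 0

    def _bucket(self, key):
        # hash() maps the key to an int; modulo picks one of the buckets
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # key already present: overwrite
                return
        bucket.append((key, value))
        self.size += 1
        if self.size > 2 * len(self.buckets):
            self._resize()

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def _resize(self):
        # Double the bucket count and re-insert everything
        old_items = [item for bucket in self.buckets for item in bucket]
        self.buckets = [[] for _ in range(2 * len(self.buckets))]
        self.size = 0
        for k, v in old_items:
            self.put(k, v)


table = ToyHashTable()
table.put("linked list", "boring")
table.put("hash table", "next year")
table.put("linked list", "still boring")  # same key, so the value is replaced
```

Lookups only scan one bucket instead of the whole collection, which is where the average O(1) behaviour comes from as long as the buckets stay short.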
highschool
interesting
wait, they teach you that in HS?
this is in julia! i see what youre trying to do

yeah
it is called baiting
at least they are teaching you the stuff 🤷
I think AP CS A in the US teaches algorithms
its ok. just breeze through the boring stuff for now and come back to it when you REALLY need it

but they forget them bc they don't bother to look it over till college
ok i will
indian python CS is pretty basic
we have pretty advanced stuff in cs class compared to other countries i think
we also have to do a graded project next year
ye, that's in EU too
some kid did an operating system
seriously??
ooh nice
like Siraj Raval? 🤣
who is siraj raval?
nevermind
oh god
Siraj Raval is this guy who claimed he was a ML prodigy but all he did was copy code
idk who that is
yeah, he also claimed he built an OS
aaah a script kiddie
basically yes
like me
which was also copy/paste

everything he did in his videos
he just copy pasted from someone else's github repo
it wouldn't be a problem if he cited them
being a script kiddie in general is pretty bad, he made it worse by PLAGIARIZING
uhm wait, he read code, copied it, never learned it? what a scam lol
but he didn't cite them
yeah
like devs integrate stack overflow code into their stuff
but they know how it works
they know what it does
so basically copy the whole repo without creditin
what happened after he was caught doing that
uh all of his credibility vanished
he even ran a course and it scammed hella people
incorrect explanations
freaking hell... Fake Gurus
so he demonstrated it himself
he released his own "research papers"

oh yeah, that one too
but all he did was reword what they said dumber
y i k e s
and what he couldn't understand he completely removed
something like hilbert complex to 'hilbert complicated'
which apparently doesn't make sense as it's some mathematical term
sometimes if lucky, they wont
true
There are so many get rich quick gurus out there

were u the one who sent a yt link about the rise of fake gurus
yep
another problem is these damn data science bootcamps
i see my useless long term memory can still recall
and there are more; techlead, joma tech etc.
Fluke too
Joshua Fluke
he has these "courses" you can buy to start your own company
what bullshit
i convinced people not to trust joma tech
was he involved in something
Joma tech is a sellout
?
so is tech lead
still they say he is cool and trusted
they both doxxed a kid

cringey enough they made people trust them frick
tech w tim any day over these people
I'm a self taught programmer working as a Data Analyst. I graduated in Economics, but tossed it all when I got bored.
I had a hard time trying to find a good course to start learning
whattt I wanna see
andrew mo
and there was another guy that roasts whitehatjr - I forgot his name
andrew Ng is the best stuff you'll find
^^^^^
find him in coursera
And I completely feel the pain of all those bootcamps saying that in 2 weeks you'll get a 6 figure salary at the top companies.
but you need to know the math behind it
Yeah, I did his ML course.
thats bs
But I have to redo it.
lol, who does them anyway?
I like datacamp.com
these goddamn people who think they'll make 6 figures after 6 weeks?
you dont get a 6 figure salary after learning in two weeks,,,,
it's just a psychology trick
thats.... impossible
playing w peoples motivation
bs more like
and all you are is a script kiddie anyways you just execute the code they do no questions asked
Doing that one, everything from Andrew at coursera.org and trying to get a cert at some cloud service. Most likely AWS
but there are plenty of them around so apparently it seems to work; greed is a big motivator
freecodecamp is also a good resource
I have seen so many tik tokers endorse these bootcamps and it makes me sad
Because they get paid lol
yeah
tiktok is inherently bad though xD
6 figures after 6 weeks. sounds like something a fake guru would say

there's some good people on there
not many
but some
sadly some sounds so few
I find vines on YT to be much more entertaining
I can literally do these test correctly , they just HATE me.
adios
gn friend
"they just hate me" its the algorithmic bias

pretty sure no one uses these captchas anymore
at least the ones I see 🤷
most of them involve just ticking a box and going on
they use ML apparently to identify if the mouse movement is from a bot or a human, tho not much details have been released
you know companies use ML to track what their employees do on their work laptops
they can track your key strokes and stuff
see what emails you sent
exactly what you're doing
yeah, thats pretty common tho they dont use much ML - its just the plain old stuff
keyloggers and so on
you are advised not to use work laptops for personal stuff anyways
yes
you won't know if they do implement it, most likely the box is fake and they just analyze snippets of your mouse movements when you browse the site
rather than checking it before

IMO they would get more data if they analyze your browsing history 🤷
like if you are visiting several hyperlinks in small amount of time, that could indicate some automation (hmm..that does seem viable)
guess I'm never gonna be free from big brother then huh
just blow up your router and go to the jungle
also everything I use is Apple meaning Apple has all my data
also meaning they're probably selling it to third parties
im okay if google is big brother. better than facebook. i dont trust them
apple has the best privacy IMO

according to who?
they have the T2 security chip, and they don't give app devs on the App Store as much freedom in terms of permissions
On a side note. There's a captcha solver that uses Google's own language audio AI engine to solve the audio part of the captcha.
Fight fire with fire
it doesn't work that well
the discriminator always comes to be better, so you wont be able to generate a very good captcha
7/10 times it works. You can just refresh the web again if you want to automate stuff without human input.
I kind of do that at work lol.
I believe there was some mathematical proof for that (I visualized it in some video) but that is the reason why they moved away from puzzle based captchas
its one of the basic properties of GAN, and the reason in general why GANs work (not sure about the maths, so has to be confirmed by someone)
plus if you fail more than once or twice, doesn't it temporarily ban you from attempting?
After you solve a captcha successfully, sometimes the engine itself asks you to do some more challenges. Not because what you did was wrong, but to use you as a manual classifier for training.
also, isn't it illegal to collect captcha data?
and using it on google's own servers is kinda trolling them
Not quite. They don't ban you if you don't send like 1000 queries a second from the same IP. If you do less than 50 per sec, you'll be fine. You can always clear the cookies and get a new IP
in sites that allow you to use them for a month for specific times, changing IP+MAC does the trick (I had a script for that)
those were fun times
They check only the IP, the browser info, such as header, version and stuff like that. You could, but I don't ever change Mac addresses.
cool
I'm trying to train an agent in a racetrack environment but I am getting strange results when I plot the reward against episodes. If anyone can have a look at it, I'd appreciate it
Hey @remote fossil!
Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:
⢠If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)
⢠If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:
not sure how I can share the txt file that's used to build the environment
Gdrive it
thats why


By the way, sorry if my rant is not appropriate for the channel. I just joined this server yesterday. What's the random channel called?
Thanks.
its ok. captchas are kinda like ai

ai is a very broad field
tbh

Yeah, but I meant that since I wasn't asking or answering anything in particular. It was a bit of meaningless rant on my behalf.
I'm trying to get more familiar with parts of the data science stack I haven't used yet. Right now I'm trying to preprocess the Titanic dataset and it seems the sklearn preprocessing tools were designed to simplify problems more complex than the one that I'm having
for example, I want to normalize the ages of the passengers by squishing them between 0 and 1, and convert the passenger class into a one-hot. And then the end-result would be array like [a, b, c, d, e] where only one of a, b or c is 1 (for the passenger class), d is float between 0 and 1 for the age, and e is 0 for men and 1 for women, or something.
Do people typically run machine learning projects locally or in the cloud?
Why don't you try to wrangle the data in pandas? @serene scaffold
I'm trying to train a model with YOLOv5 in Jupyter on my laptop and it took like 20 minutes to run 3 epochs
Depends on the size of the data and your hardware specs. The cloud is usually faster but obviously costs more than running it on your old PC, which takes longer. You can use the Azure free trial, since I think it's the longest one of the big ones (vs Google Cloud or AWS), but don't take my word as fact.
alright thanks
I'm using pandas, yeah, but I want to learn more about sklearn.
And even though I gave a list earlier, it's going to be a pytorch tensor.
I'm not used to working with sklearn, but reading some of the documentation https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
you have the .OneHotEncoder() with the .fit_transform and the .transform() methods. I would start there.
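A minimal sketch of the [a, b, c, d, e] layout described earlier, assuming a tiny made-up frame whose column names follow the Kaggle Titanic CSV (`Pclass`, `Age`, `Sex`). The `is_female` helper column is invented for this example, and the real dataset has missing ages that would need imputing first.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Tiny made-up stand-in for the Titanic data
df = pd.DataFrame({
    "Pclass": [1, 2, 3, 3],
    "Age": [20.0, 40.0, 60.0, 30.0],
    "Sex": ["male", "female", "female", "male"],
})
# Hypothetical helper column: 0 for men, 1 for women
df["is_female"] = (df["Sex"] == "female").astype(float)

preprocess = ColumnTransformer(
    [
        ("pclass", OneHotEncoder(), ["Pclass"]),   # -> a, b, c (one-hot)
        ("age", MinMaxScaler(), ["Age"]),          # -> d, squished into [0, 1]
        ("sex", "passthrough", ["is_female"]),     # -> e, passed through as-is
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

features = preprocess.fit_transform(df)  # one row of [a, b, c, d, e] per passenger
```

The result is a plain NumPy array, so something like `torch.tensor(features, dtype=torch.float32)` should get it into PyTorch afterwards.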
https://www.youtube.com/watch?v=irHhDMbw3xo
This guy drops the age column, but still does a one-hot encoding with the Titanic dataset.
@serene scaffold
Thanks, I'll take a look at that video!
Anyone know how you would extract classified Named Entities (NEs) from an NLTK tree? I'm having trouble grabbing these NEs and assigning them to a Python list. When I traverse the NLTK Tree object, for some reason no leaves are getting identified, so I can't actually start retrieving the NEs
cloud is much better. for beginners, colab is so much easier. though the best path is to invest in your own hardware, I recommend you ditch your laptop as soon as possible because YOLOv5 is pretty small anyway
Anybody know any good videos on Pandas Data Manipulation and Data Tidying?
how long should training a model take on hardware
which hardware
nevermind, I guess it depends on the computer, model, images, etc.
On laptop - weeks (due to constant thermal throttling)
On Colab/cloud - a couple hours usually
you can still fine-tune models on laptop, but that is pretty restrictive
Does colab let me save the models locally?
so I could train it in the cloud and use it to detect something locally
colab saves the model on google drive since it is the fastest and most reliable. after that, you can download the model from google drive to local
but if you have bad internet, then you should invest in building your own Deep Learning rig
how does internet influence it
its downloading, man
you download the model checkpoint and use it to do inference locally
oh ok
I thought you meant my internet would influence how fast colab runs or something
nah. for beginners, colab is the best. if you want extra power, you can pay for colab pro subscription
alright thank you
cool, no worries
please could someone help me with a groupby in pandas?
ObservationDate Country/Region Confirmed Deaths Recovered
0 01/22/2020 Mainland China 1.0 0.0 0.0
1 01/22/2020 Mainland China 14.0 0.0 0.0
2 01/22/2020 Mainland China 6.0 0.0 0.0
3 01/22/2020 Mainland China 1.0 0.0 0.0
4 01/22/2020 Mainland China 0.0 0.0 0.0
... ... ... ... ... ...
236012 02/27/2021 Ukraine 69504.0 1132.0 65049.0
236013 02/27/2021 Netherlands 16480.0 178.0 0.0
236014 02/27/2021 Mainland China 1321.0 1.0 1314.0
236015 02/27/2021 Ukraine 50582.0 834.0 44309.0
236016 02/27/2021 Netherlands 255335.0 3732.0 0.0
im trying to group by ObservationDate and Country/Region
summing confirmed, deaths and recovered
Any reason you're using nltk and not spacy?
it would be easier to experiment if you provide the data as comma separated values, though let me see if I can figure it out.
thanks, i think i've figured it out using: df2 = df.groupby(['Country/Region', 'ObservationDate']).agg({'Confirmed': 'sum', 'Deaths': 'sum', 'Recovered': 'sum'})
however it creates like new but lower columns that aren't counted as columns by pandas?
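Those "lower columns" are most likely the group keys, which `groupby` moves into the index (a MultiIndex here) by default. A minimal sketch with made-up rows in the same shape as the table above: passing `as_index=False` keeps the keys as ordinary columns, and calling `.reset_index()` on the original result should do the same.

```python
import pandas as pd

# Made-up rows shaped like the table above
df = pd.DataFrame({
    "ObservationDate": ["01/22/2020", "01/22/2020", "02/27/2021", "02/27/2021"],
    "Country/Region": ["Mainland China", "Mainland China", "Ukraine", "Ukraine"],
    "Confirmed": [1.0, 14.0, 69504.0, 50582.0],
    "Deaths": [0.0, 0.0, 1132.0, 834.0],
    "Recovered": [0.0, 0.0, 65049.0, 44309.0],
})

# as_index=False keeps the group keys as normal columns instead of an index
df2 = df.groupby(["Country/Region", "ObservationDate"], as_index=False)[
    ["Confirmed", "Deaths", "Recovered"]
].sum()
```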
Pasting large amounts of code
If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
^ I suggest providing the CSV there if you'd like any other kind of help with that data
Nope, I don't really have any particular reason for using NLTK tbh, but compared to SpaCy it does have a plethora of corpora and algorithms to choose from, and it also works with the Stanford Tagger too
hello
Does anybody understand how to import a large csv file into a database table using python?
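One possible approach, sketched with only the standard library and assuming SQLite as the database: stream the file in batches so it never has to fit in memory. The `load_csv` helper, the table name, and the demo file are all made up for this example. Every column is stored as text, and the table name is interpolated directly, so it should only come from trusted input.

```python
import csv
import os
import sqlite3
import tempfile
from itertools import islice


def load_csv(csv_path, conn, table, batch_size=10_000):
    """Stream a CSV into a SQLite table in batches (hypothetical helper)."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # first row is assumed to hold column names
        cols = ", ".join(f'"{name}"' for name in header)
        marks = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
        total = 0
        while True:
            batch = list(islice(reader, batch_size))  # next chunk of rows
            if not batch:
                break
            conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', batch)
            conn.commit()
            total += len(batch)
    return total


# Demo on a tiny throwaway file and an in-memory database
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv")
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows([["id", "name"], [1, "a"], [2, "b"], [3, "c"]])
    conn = sqlite3.connect(":memory:")
    n_rows = load_csv(path, conn, "demo", batch_size=2)
    count = conn.execute('SELECT COUNT(*) FROM "demo"').fetchone()[0]
    conn.close()
```

With pandas, `pd.read_csv(..., chunksize=...)` combined with `DataFrame.to_sql(..., if_exists="append")` follows the same batching idea.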
https://www.youtube.com/watch?v=iEFqcMrszH0 Generating art to music using a GAN
Made with Lucid Sonic Dreams, an upcoming Python package that syncs generative art to music!
Model weights trained by Reddit user new_confusion_2021
Song: Raspberry by Saje
this is really cool, if there's ever a release of this package, I will definitely be trying it out
is spacy much better than nltk? were thinking about using nltk for a telegram sentiment analysis project
oof. I was "trying" to test a few models with 100k x 1k array and PC is so hot I can cook a steak over it
NEW VIDEO: Cooking with Intel 7 - Ramen Noodle Soup on a Pentium D 820 CPU
https://www.youtube.com/watch?v=yNWdB1_nGos
In this video, you'll learn how fast a CPU can get hot, how hot it can get, and how quick it'll die out without a heatsink, as I fry me a little snack on it. The CPU is an Intel Celeron 1.8GHz Willamette CPU. NO, it is NOT an A...
F CPU

shoulda used cloud
I don't have enough experience with nltk to say. What functionality do you want from either of them?
I've mostly used spacy to get token-level features
I finally got the shit model to converge
score -> 0.9.
not bad
but now im scared of getting precision, recall for comparison ugh
im going to burn my shit HAHA
CV?
So I'm taking a MOOC for data analytics, I never took a stats class before. Every time a new stats term is mentioned, Im having to google what each word is.
Do you think I should cut my losses and take a stats course on Khan Academy to catch myself up?
what stats terms?
P Values, R squared, adj r squared, F test
These are a few, and im probably only a week in
@serene scaffold
@gray phoenix is googling each new stats concept helping you keep up with the course?
ah gotcha.
also what is the name of the MOOC?
@serene scaffold
Yes and no, I'm learning what it is. But I'm honestly concerned its going in one ear, and out the other.
Yes, in the sense that I am learning what it is.
EDX, Predictive Analytics with Python.
Company is paying for it, so im not TOO concerned about "passing". But I am more interested in trying to learn and retain it
@gray phoenix https://youtube.com/playlist?list=PLblh5JKOoLUK0FLuzwntyYI10UQFUhsY9
Check out Statquest. Very easy to digest videos
@misty flint Thank you!
I'll take a look at this.
he makes it fun with a little tune at the start of each video
np
i love statquest lol
My thinking is, the advantage of taking a stats course to supplement the main course is that you'd learn the stats concepts in a structured way. But I don't think you can pause the main course while you finish the stats course.
that's rough buddy
that really sucks
yeah thats exactly right. I was just thinking about taking the loss on it. I do keep the course. So I was going to revisit once I catch up in stats
its alright. i think this last time, things are finally becoming internalized, instead of just memorizing formulas, yknow?
seeing bayes rule in ML was cool
i watched statquest. highly recommend it

@gray phoenix do you need help understanding P values, for example?
ive seen most of the stats vids. need to go through the ML ones
I think right now for me atleast, I just gotta start to learn it. I'll definitely reach out if I have questions.
Thank you!
I have some bank transaction data and one of the features is a short description
for example some values are "DEBIT - east coast rail" ; "2019-7-14 UBER" ; "Hotel Ritz" and so on. Struggling to find a way to prepare this transaction description feature to predict the label of the transaction ("Travel", "Accommodation").
use nlp
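A minimal sketch of what "use nlp" could mean here, assuming a plain TF-IDF plus logistic regression baseline from scikit-learn. The training rows below are made up around the examples in the question, and real data would need far more rows per label.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up training examples for illustration only
descriptions = [
    "DEBIT - east coast rail", "2019-7-14 UBER", "city taxi fare", "rail ticket office",
    "Hotel Ritz", "grand hotel booking", "hostel downtown", "hotel plaza stay",
]
labels = ["Travel"] * 4 + ["Accommodation"] * 4

model = make_pipeline(
    # Character n-grams cope with dates, codes and abbreviations in the text
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4), lowercase=True),
    LogisticRegression(max_iter=1000),
)
model.fit(descriptions, labels)

# Predict the label of an unseen description
prediction = model.predict(["HOTEL RITZ PARIS"])[0]
```

Character n-grams are just one choice. Word-level tokens, or cleaning out dates and prefixes like "DEBIT -" first, may work better depending on the data.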
10gb csv
The capacity of a neural network to absorb information is limited by its
number of parameters. Conditional computation, where parts of the network are
active on a per-example basis, has been...
137 billion parameters

ofc jeff dean would be coauthor

Does anyone know cv2?
I really need help
it's best to ask the question you have right away.
@serene scaffold I wanna live template match a video
how do i use numpy arrays in cv.matchTemplate
without downloading each frame
I probably don't know how to help with your question, I'm just letting you know that asking your question is more likely to attract someone who knows how to help than stating the general topic of an unasked question.
:/
im trying to solve this using numpy/pandas/matplotlib and i dont know how to get 'r'squared ive tried youtube and google/khan academy and i cant find anything that uses a formula like this
any help to point me in the right direction would be appreciated
khan academy like others only ever seem to use y=mx+c
are you having trouble with the definition of the "coefficient of determination" or how to encode the solution?
both
Hey, anyone know anything about powerbi?
Go ahead and ask the question you would have asked if someone said yes.
You know how in a line graph there is a trend line that you can add in the analytics tab?
It doesnt show up for me
Like lol
@serene scaffold Coefficient of determination is, in its simplest form, a measure of how much of the variation in the dependent variable is explained by the independent variable(s)
Thanks, though A_new_player is the one who wants help with that
depending on the statistic, R2 can be computed directly (literally correlation squared)
or...you need to obtain via other means.
Linear regressions almost always generate R2 directly
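A quick sketch of the "correlation squared" point using NumPy on made-up data. For a straight-line fit with one predictor, squaring Pearson's r and computing 1 - SS_res/SS_tot from the fitted line should give the same number.

```python
import numpy as np

# Made-up data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Route 1: square the Pearson correlation coefficient
r = np.corrcoef(x, y)[0, 1]
r2_from_corr = r ** 2

# Route 2: fit a line, then compare residual and total sums of squares
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_from_fit = 1 - ss_res / ss_tot
```

The shortcut in route 1 only holds for simple linear regression. For other models, route 2 is the general definition.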
ok but using i.e. matlab/numpy package how can i find the r squared?
i have no idea how to incorporate 't' data or how to sub anything into a/b/c
i'm going to assume that T is student's T score
If my memory serves me right students t is calculated just like a Z score
(sample mean - hypothesized mean) / (std dev of the SAMPLE / sqrt(n))
o wait you werent asking how to calculate t lol
honestly, for the first, the wiki explanation is very indepth. https://en.wikipedia.org/wiki/Coefficient_of_determination
In statistics, the coefficient of determination, denoted R2 or r2 and pronounced "R squared", is the proportion of the variance in the dependent variable that is predictable from the independent variable(s).
It is a statistic used in the context of statistical models whose main purpose is either the prediction of future outcomes or the testing o...
I really dont know what that T is thou
yeh but again, it doesnt seem to get me anywhere
https://en.wikipedia.org/wiki/Coefficient_of_determination#Adjusted_R2
doesn't explain what the (model) is either
it says right there...below lol
next sentence
"where p is the number of explanatory variables"
which is?
@prisma willow https://youtu.be/2AQKmw14mHM
isnt that just 'n'
thx ill take a look
n is data points
do it. its really short
will be v helpful
trust rex
tell me what ''n' and explanatory variables are
i trust him
ur saying n(sample size) is the number of rows of 'x's and so are the explantory variables?
I recommend you watch the video. I'm kinda tired and i don want to mislead you with a shitty explanation
also, you shouldnt be watching that wiki section
Your problem EXPLICITLY states UNADJUSTED R2. you're reading adjusted R2
the problem he has some t values idk wtf they are lol
i have never related students t to R2
maybe just a random variable
im sure there is a way though, because some linear regressions assume Gaussian distribution
maybe. someone should tell that teacher to not recycle important variables though...
t, z, those letters have meaning in statistics lol
using python
- packages
also video helped with some logic, but again it doesnt help with the question itself
What's the Question?
im trying to solve this using numpy/pandas/matplotlib and i dont know how to get 'r'squared ive tried youtube and google/khan academy and i cant find anything that uses a formula like this
any help to point me in the right direction would be appreciated
khan academy like others only ever seem to use y=mx+c
if i could get 'y' and 'f' i could do it
https://www.geeksforgeeks.org/python-coefficient-of-determination-r2-score/
Hm. What's t?
do you have to use matplotlib? i can never get it to do what i need it to lol
its trying a polynomial function instead of a linear one 
T must stand for "true values" I think.
oh nice. it looks like it returns what youre looking for R^2. just gotta use the score() function
I guess it doesn't matter whether t is true or predicted does it
@misty flint
i dont know for this line:
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
should the second values in each list be the 't'?
sure?
No I'm not sure. More like speculating
possibly. then you could use the functions below maybe
im not too sure tho since i have a hard time interpreting what its asking

assuming that that's right then this is the only confusing line with what to sub for...
y = np.dot(X, np.array([1, 2])) + 3
oh you have to change that one completely
thats the function
in that example it looks like its doing the dot product of X * [1,2] then adding 3 bc why not

then it comes right back to the initial question
what is y=ax^3 + bx^2 etc.... whats 'a' and 'b' and 'c'?
that is literally the part when i was like this

idk if those coefficients are what you plug in t in for or what
no idea
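A possible reading, offered as a guess since the problem statement isn't shown here: `t` is the independent variable, and a, b, c are unknown coefficients that get estimated by a least-squares fit rather than plugged in by hand. A minimal sketch on made-up data, using `np.polyfit` for the coefficients and scikit-learn's `r2_score` for the unadjusted R²:

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up measurements; the real t and y would come from the assignment
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 2.9, 9.2, 28.1, 64.8, 126.1])

# Least-squares cubic fit; coefficients come back highest power first,
# i.e. f(t) = a*t**3 + b*t**2 + c*t + d
a, b, c, d = np.polyfit(t, y, 3)
f = np.polyval([a, b, c, d], t)  # model predictions, the "f" values

# Unadjusted R^2: 1 - sum((y - f)**2) / sum((y - mean(y))**2)
r_squared = r2_score(y, f)
```

That gives both the 'y' and 'f' pieces mentioned earlier: `y` is the observed data and `f` is the fitted polynomial evaluated at the same `t` values.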
I have to Draw a contoured field plot to illustrate the particular pollutant anomaly variations in terms of the month (y-axis) and year (x-axis)
I am not able to get months on y axis
Can anyone help?
@oak elk can you please share your code which contains the graph algo
in statistics, how come they always say its impossible to determine causation for something?
yes, i get that correlation does not equal causation but at this rate, can you EVER prove causation

without being omnipotent
heh e
ok serious question time
how do you determine where to set the x bounds for a plot?
I'm trying to visualize the distribution of a variable
but it's kind of heavily skewed because there's some outliers
i.e. there's only 10 occurrences of a plays-over-expected greater than 6 in a list of 7000+ entries
maybe one with and one without outliers then
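One common way to pick the bounds for the "without outliers" version is to use percentiles instead of the raw min and max. A minimal sketch on made-up skewed data (the 0.5 and 99.5 cutoffs are arbitrary choices, not a rule):

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up skewed data: about 7000 typical values plus 10 extreme outliers
values = np.concatenate([rng.exponential(1.0, 7000), rng.uniform(6, 40, 10)])

# Bounds that cover the central 99% of the data
lo, hi = np.percentile(values, [0.5, 99.5])
inside = values[(values >= lo) & (values <= hi)]

# Histogram over the clipped range only
# (same idea as plt.hist(values, bins=50, range=(lo, hi)) in matplotlib)
counts, edges = np.histogram(values, bins=50, range=(lo, hi))
```

Plotting the full range next to the clipped one keeps the outliers visible without letting them squash the bulk of the distribution.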
also you know what distribution this looks like
what?
huhhhhh i guess it does look like that
plays over expected is basically just a measure of how many more plays I have for a song than the average user has
also
what exactly might the purpose of having gradient coloring like this be? (not my plot)
It's not always impossible
If you know there's no confounding variables, you can draw causational networks and use your knowledge of the system to infer the causation
At least to an extent
ohhh. wait then that means your outliers mean something. those 10 songs are very popular, most likely outside standard distribution too. but obv you can remove them if youre trying to analyze just the bulk of data
Probably not much more than "it looks nicer"
(for that particular graph)
yeah my graph's purpose is pretty similar so i think i can just do the same if i choose
yeah yeah like i'm trying to make a plot that will show me what the general trend is for plays above expected
wait
i need to take into consideration number of plays
the only thing i can interpret is that at each timestamp, you have multiple values, so the coloring helps you distinguish some of the more granular data but idk why they dont just divide the data up instead
ohhh yeah that would make sense
obv depends on the use case
i could always use hue for weight but that wouldn't be too helpful for some cases which don't have a lot of plays and aren't very visible
man i need to take a data viz course so much
too bad all the ones at my uni have heavy prereqs :(
maybe take a look at a few books?

also i found a cool library that does bayesian causal networks

oooh thanks
i ended up doing a parallel plot for the purposes i was wanting
O LMAO
i didnt realize that recorded my audio too
https://github.com/ossu/bioinformatics dunno if this is a good resource for anyone interested in bioinformatics
Is it possible to import graphs?
import a graph instance? when you import a Python module into another module, you get all the variables defined in the global scope for that module.
@serene scaffold ok?
did I not understand your question?
see #how-to-get-help to open a help channel and ping me when you do

frik u are too addicted of that sticka :((
can I ask question anyone in here do the Shopee Code League - Multi-Channel Contacts
i wanted to recommend books but i havent read any good ones about data viz principles. all i have are podcasts
*graph

sadsuu
well heres one about florence nightingale and how she used data viz to get what she wanted
Listen to this episode from 99% Invisible on Spotify. Victorian nurse Florence Nightingale (played in this episode by her distant cousin Helena Bonham Carter) is a hero of modern medicine - but her greatest contribution to combating disease and death resulted from the vivid graphs she made to back her public health campaigns. Her charts...
this is a pretty concise one that I found
In pandas can I select a column with '.' - as shown in the highlight below ".sales"?
try that
@lone drum the error said there that u should have that "GDAL API" or something installed
i aint familiar with GeoPandas
Ok np
if you're more of a videos guy you can use this too https://youtu.be/xvpNA7bC8cs
Have you ever been confused about the "right" way to select rows and columns from a DataFrame? pandas gives you an incredible number of options for doing so, but in this video, I'll outline the current best practices for row and column selection using the loc, iloc, and ix methods.
this guy is goated
I was watching his vids all the time when I was using Pandas
Awesome. Thanks so much
yep
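For the `.sales` question above, a minimal sketch of the usual ways to pick a column. The frame and its column names are made up for illustration and are not taken from the screenshot:

```python
import pandas as pd

# Made-up data; the column names are assumptions for illustration only.
df = pd.DataFrame({"sales": [10, 20, 30], "unit price": [1.5, 2.0, 2.5]})

# Attribute access works only when the name is a valid Python identifier
# and does not clash with an existing DataFrame attribute or method.
by_attribute = df.sales

# Bracket access works for any column name, including ones with spaces.
by_bracket = df["sales"]
with_space = df["unit price"]

# .loc selects by label, .iloc selects by integer position.
by_loc = df.loc[:, "sales"]
by_iloc = df.iloc[:, 0]
```

Bracket access is usually the safer habit, since `df.sales` stops working as soon as a column is named something like `count` or contains a space.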
how do I motion track colors in opencv??
gpus are faster on matrix operations than cpu

this is why training a model on a gpu tends to be faster
Is it bad that today I learned that a GPU is a graphic processing unit
no
am noob
interesting
just dont use your gaming PC thinking you can handle ML
-cries in former graphic cards-
killed it again


but holy it popped like popcorn
will a rx 570 in a desktop perform better when training CNNs rather than my macbook?
at that point just use cloud
i havent used cloud cause im a lazy fuck
well its not that, ima gamer and coder so im building a gaming PC, on my mac I use Colab, but since Im building a PC I wanna know if its worth going local rather than cloud
how much ML are you doing
casual
nah just use the cloud then
gpus are for gaming!! abooos google colab gpus for freeee


lol yea a mentor of mine told me he built a desktop with a dedicated GPU for deep learning
sounds moronic.
Deep learning on a desktop is kind of dumb lol
itll still take forever to train
i can barely manage a few random forest or GDtrees on my pc xd
ah ok i see
on the cloud, you can just run multiple gpus rather than just running 1 for like a week
also less expensive
maybe it is a desktop with a lot of gpus

ive been running that crap for like 1 hour xd
its a GBDT
my comment is big brain thou
nah hes a 1st year at stanford he cant afford SLI or Crossfire after tuition Lul
especially in this market
lol
Silicon shortage be like
dont they make specific chips for ML now
oooof

I only saw chips for mining
which i think is the greatest waste of power in recent history
hmm i dont remember if i heard it correctly on a podcast
i think thats what they said

I'm going to sound heretical here, but is anyone else here interested in ML / AI from a simple applications perspective?
imagine mining can use more power than a whole city? freaking helll
As in, limited uses, small implementations, quick wins, etc
I'm liking everything i'm learning but i dont see myself working on massive DL or AI stuff lol

yes. low hanging fruit

thats what i like my projects to be
This.

btw we are planning to create a package in julia related to epidemiology or protein folding. is it okay to ask advice from u? just not on julia stuff
I literally was telling that to my CS friend

like
he's into DL and all that
and i'm like: "cool dude, thats why you studied CS and I didnt lol"
yeah sure thanks :3


Julia looks sexy but i havent found a single piece of decent learning material
lol idk if i can do a pure CS degree. im in grad school now but for AI
coming from a non-technical field

pffft
I'm coming from industrial engineering
im dead
most people in my field and in most admin fields find their epoch in excel lmao
freak i am an undergrad in microbiology :((
but then : WILD 30MB DATA FILE SHOWS UP

and excel cant eat that up xD
wait is that really the limit of excel?
depends
idk why i thought it was bigger
ive never seen excel opening anything above 25 MB lol
all i know is they cant do more than 1mil rows
excel breaks at around 100k rows
i remembered the UK have large backlogs of data because they used excel for covid

wut
the UK?
they have a data science ecosystem there
not sure
they couldnt get anyone to pandas it?
but there was large backlogs because they used excel
yeah i think i have read it from last year
UK got hit so bad by covid iirc
i saw a job listing for an internship at a local power company to "migrate all their data from excel to 'python' to run analyses"
i literally died

PLOP
imagine getting hired to do that
you'll end up building their whole data infrastructure
imagine your bosses not knowing enough to even see if youre doing your work properly
migrate to 'python'
i still cant get over that
eh
all my career LMAO
hahahahhahaa.... i can only laugh in excel
good side, my bosses never annoy me
bad side... figure shit out yourself lol
I, without knowing crap about dbs, implemented the most disgusting, heretical and repulsive postgresql server you can imagine
ive heard that you never want to be the first data scientist at a company bc youll end up building the whole data infrastructure like you said
then itll be a pain to fix everything later
and every data scientist later on will hate you

and then they will say your infrastructure is shit not knowing your struggles of being alone :((
imma watch Redo of Healer tomorrow. goodnight
I made a postgre to keep track of my weight and i havent update in months
bro. the ending of last chapter AHAHAHA
such a fucking meme
thank you, gridsearch. after 2 hours of burning my pc finally got something :v
my prof has been lecturing about error backpropagation for a while and im still like

hes one of those profs thats all rambly
Is there a good way to compute correlation between location (as in on a map) and a numerical variable?
I'm trying to figure out if there's a correlation between location of artist who I listen to and # of plays they have
you need to convert location to an encoded variable
a location is a nominal value so you cant numerically process it.
ahhhh ok
even if you have LAT/LON of location, i dont think it would work
yeah that's what i'm working with
lat/lon is still a type of categorical variable, its a coordinate. -40 E has no numerical value in itself
that makes sense
thanks for clearing that up, i was about to make a pretty big mistake i think
I had a similar issue not long ago
lat lons are given as numbers
but in reality they are coordinates, nominal data, not a continuous value in itself
I found a pretty good model called Moran's I but I'm not sure it would work here
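For reference, Moran's I is a spatial autocorrelation statistic: it asks whether nearby points tend to have similar values. A rough sketch with NumPy, assuming inverse-distance weights and treating lat/lon as flat x/y coordinates (both are simplifying assumptions; `morans_i` is a hypothetical helper name, not a library function):

```python
import numpy as np

def morans_i(values, coords):
    """Moran's I with inverse-distance weights (an assumed weighting scheme)."""
    x = np.asarray(values, dtype=float)
    pts = np.asarray(coords, dtype=float)
    n = x.size

    # Pairwise straight-line distances between all points.
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Weight = 1 / distance; zero on the diagonal and for coincident points.
    w = np.zeros_like(dist)
    mask = dist > 0
    w[mask] = 1.0 / dist[mask]

    # I = (n / sum of weights) * sum_ij w_ij z_i z_j / sum_i z_i^2
    z = x - x.mean()
    return (n / w.sum()) * (w * np.outer(z, z)).sum() / (z ** 2).sum()

# Toy points on a line: similar values sit next to each other.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
clustered = morans_i([1, 1, 1, 10, 10, 10], coords)
alternating = morans_i([1, 10, 1, 10, 1, 10], coords)
```

A positive value suggests that neighbours look alike and a negative value suggests they alternate. Whether this fits the artist-location question depends on how the points and play counts are defined.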
how would you encode them?
I don't know for a fact but I think it's anywhere from 500-1000
onehotencode turns basically every unique category into a binary variable
so, that would create 1000 features from 1000 unique pairs of lat lon
ohh ok that makes sense
wait 1000 unique values for one variable or 1000 categorical variables?
1000 categorical
like zipcode

wait lol yeah
OrdinalEncoder is a decent one too if you dont want that many categories
its what you are asking for lol
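A small sketch of the two encodings mentioned above, using plain pandas as a stand-in for scikit-learn's encoders. The cities and play counts are made up for illustration:

```python
import pandas as pd

# Made-up listening data; city names and play counts are illustrative only.
plays = pd.DataFrame({
    "city": ["London", "Tokyo", "London", "Lagos"],
    "plays": [120, 45, 80, 15],
})

# One-hot encoding: one binary column per unique category.
one_hot = pd.get_dummies(plays["city"], prefix="city")

# Ordinal-style encoding: a single integer code per category.
# The codes follow the sorted order of the labels and carry no real ranking.
codes = plays["city"].astype("category").cat.codes
```

With 1000 unique locations the one-hot version would produce 1000 binary columns, while the integer codes stay in one column but imply an ordering that is not really there.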
My reinforcement learning agent takes about 5 seconds to train over 150 episodes, however, when I implement a function to average 20 agents it takes over 3 minutes, basic math says it should take 1 minute 30, any ideas?
function probably isnt linear
something something big o notation
something something

also other things could affect training time
def agent_avrg():
    #initialise average undiscounted return among x agents
    #episode number : return value
    avrg_undiscounted_return = {}
    for itr in range(150):
        avrg_undiscounted_return[itr]=0
    env = RacetrackEnv()
    for i in range(20):
        env.reset()
        a = Agent(env, 0.2, 0.9, 0.15, 150)
        undiscounted_return = a.sarsa()
        for ep in avrg_undiscounted_return:
            avrg_undiscounted_return[ep]+= undiscounted_return[ep]
    for ep in avrg_undiscounted_return:
        avrg_undiscounted_return[ep] = avrg_undiscounted_return[ep]/20
    return avrg_undiscounted_return
ah makes sense, the above function is my averaging function, do you see a way to improve speed/simplify it
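The averaging loops themselves are cheap, so the extra time most likely comes from the 20 training runs rather than from the bookkeeping. A minimal sketch, assuming `a.sarsa()` returns something indexable by episode number as in the function above. `average_returns`, `run_one_agent` and `fake_training_run` are hypothetical names, and the timing list is only there to show which runs are slow:

```python
import time

import numpy as np

def average_returns(run_one_agent, n_agents=20, n_episodes=150):
    """Average per-episode returns over several independent training runs."""
    returns = np.zeros((n_agents, n_episodes))
    run_seconds = []
    for i in range(n_agents):
        start = time.perf_counter()
        episode_returns = run_one_agent()  # maps episode index -> return
        run_seconds.append(time.perf_counter() - start)
        returns[i] = [episode_returns[ep] for ep in range(n_episodes)]
    # Column-wise mean replaces the manual sum-then-divide loops.
    return returns.mean(axis=0), run_seconds

# Hypothetical stand-in for Agent(env, ...).sarsa(); swap in the real call.
def fake_training_run(n_episodes=150):
    return {ep: float(ep) for ep in range(n_episodes)}

avg, run_seconds = average_returns(fake_training_run)
```

If the per-run times come out well above the 5 seconds seen for a single agent, the slowdown is inside training (for example, state carried over in the shared environment) and not in the averaging.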
Why does this not load
Pasting large amounts of code
If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
#Loading the dataframe
df = pd.read_csv(r'C:\Users\polsm\Desktop\pd.csv')
df
like it says it cant find the file? Am i getting the wrong directory or something
Can you paste the entire error message into the paste bin linked to above, and then give us the link to the paste bin?
where is your jupyter environment?
I'm not a fan of jupyter for this reason
I would first go to the directory where that file is located and do python -c "import pandas as pd; pd.read_csv('pd.csv')", just to ascertain that there's not some weirder issue at play here.
can I just upload the file somewhere and then use the URL? worked for me b4 but i never uploaded the file anywhere, just straight from url
now i get this...
the csv file needs to be on the same computer as the Python interpreter. Is this a Jupyter notebook on your computer or on a cloud platform?
is my url
I fixed it with a band aid
I uploaded the file to website and copy pasted the URL
#Loading the dataframe
df = pd.read_csv(r'C:\Users\polsm\Desktop\pd.csv')
df
Now it magically worked???
idk whatever
lol
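When `read_csv` says it can't find a file, it usually helps to check what Python actually sees before trying anything else. A small sketch, where `read_csv_checked` is a hypothetical helper and the demo writes a throwaway file so it runs anywhere:

```python
import tempfile
from pathlib import Path

import pandas as pd

def read_csv_checked(path):
    """Read a CSV, failing early with a message that shows where Python looked."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; current working directory is {Path.cwd()}"
        )
    return pd.read_csv(path)

# Demo with a temporary file standing in for the real pd.csv.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "pd.csv"
    csv_path.write_text("a,b\n1,2\n3,4\n")
    df = read_csv_checked(csv_path)
```

If the notebook runs on a cloud platform, a path like `C:\Users\...` refers to a machine the interpreter cannot see, which would explain why loading from a URL worked.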
@shut slate what level of Python experience would you say that you have?
complete noob lol
I would encourage you not to use jupyter at all, then
It makes it look easier, but this is deception. it actually adds a lot of extra considerations and makes debugging more difficult.
Yeah I noticed but my uni is kind of forcing us to use it
jupyter notebook is annoying with the cells
oh hey i recognize you from the #algos-and-data-structs channel
yeah I frequent this channel sometimes bc I wanna go into data science
it's a dream
I'm a data science/AI major rn
nice
one way you can simplify the debugging process is to always run the cells in order, from top to bottom.
god knows how I'm gonna learn all the math behind this field
is it a dataframe already?
yeah
df.columns = ['col1', 'col2', ...]
ok thanks
But let's say I defined the list already newColumns = ['Hospital', 'Provider ID', 'State', 'Period', 'Claim Type', 'Avg Spending Hospital', 'Avg Spending State', 'Avg Spending Nation', 'Percent Spending Hospital', 'Percent Spending State', 'Percent Spending Nation']
When i do df.columns(newColumns) it doesnt work
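`df.columns` is an attribute, not a method, so calling it with parentheses raises an error; assigning the list to it should work. A minimal sketch with a made-up three-column frame standing in for the eleven columns in the question:

```python
import pandas as pd

# Small made-up frame; three columns stand in for the eleven in the question.
df = pd.DataFrame([["A", 1, "TX"], ["B", 2, "CA"]])
new_columns = ["Hospital", "Provider ID", "State"]

# df.columns is an attribute (an Index), not a method, so assign to it.
# The list length must match the number of columns.
df.columns = new_columns

# Alternative that returns a new frame and renames only selected columns.
renamed = df.rename(columns={"Provider ID": "provider_id"})
```

The assignment changes `df` in place, while `rename` leaves `df` untouched and returns a copy.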





