#data-science-and-ml
Returns the coefficient of determination R^2 of the prediction
Ah I see! Looks like you can choose the error measure yourself
I guess you have this one http://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html#sklearn.metrics.r2_score
Then hold on, let me reread your older messages
I use the built in .score methods of the model classes
```py
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
```
Aight, so you can ignore my error measure up there then
I've pretty much never used R² though
But anyway, so you just average your score over your 100 experiments then? Sounds fine
Yeah, avg, min and max
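A minimal sketch of that averaging, assuming synthetic data and a plain `LinearRegression` as stand-ins (neither comes from the actual assignment). For scikit-learn regressors, `.score` returns R², so the loop below collects one R² per random split and then reports avg, min and max:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption): y is linear in X plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.0, size=200)

scores = []
for seed in range(100):
    # A fresh random train/test split for every experiment.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = LinearRegression()
    model.fit(X_train, y_train)
    # For regressors, .score returns the coefficient of determination R^2.
    scores.append(model.score(X_test, y_test))

scores = np.array(scores)
print(f"avg={scores.mean():.3f} min={scores.min():.3f} max={scores.max():.3f}")
```

Some splits will score worse than others; the spread between min and max gives a rough feel for how much the result depends on the split.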
Then just have a go at changing the data to a set of integers 0-17 and try fitting the model again, evaluating the score just like before
Should i stick to manual data splitting?
Trust the machine
Just do random draws or something to split your data into training/testing sets
Sometimes they'll be bad, sometimes good, c'est la vie
If the model is good then the average score will reflect that
The "legit" way to split your data over a series of experiments using a limited data set is cross validation however: https://en.wikipedia.org/wiki/Cross-validation_(statistics)
Definitely have a look at that, I think i saw something in the docs about it
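For what it's worth, a hedged sketch of k-fold cross validation with scikit-learn's `cross_val_score`. The data and the choice of 5 folds are assumptions for illustration only; with a regressor and no `scoring` argument, each fold's score is the estimator's own `.score`, i.e. R²:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (assumption): two features, linear target plus noise.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(120, 2))
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.5, size=120)

# Every sample lands in the test fold exactly once across the 5 folds.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(LinearRegression(), X, y, cv=cv)

print(cv_scores.mean(), cv_scores.min(), cv_scores.max())
```

Compared with repeated random draws, every data point is used for testing exactly once, which matters when the data set is small.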
Since the dataset has 6 * 3 features I can see us getting 18 combos
But what do I do with data that share features ?
I don't understand
just give both the same category value and keep it as is?
Oh
Yes, that will be "translated" to
15, 0.4627
15, 0.4135
Or whatever integer you assign that combo
There's nothing weird about that just so you know. A data set will always be a flawed representation of reality
Say we wanted to predict the weight of a human being, using the features "nationality" and "gender" for instance
Certainly we can have two swedish men with different weights
The issue is that we lack features, such as "profession", "salary", etc. Those would help us, and they exist IRL -- but we don't have access to them
They are so-called "latent features" https://en.wikipedia.org/wiki/Latent_variable
In statistics, latent variables (from Latin: present participle of lateo ("lie hidden"), as opposed to observable variables), are variables that are not directly observed but are rather inferred (through a mathematical model) from other variables that are observed (direct...
I see
As you do more ML you'll encounter this more often. If you plot the weight of all swedish men, for instance, you would quickly see groups, or clusters, forming
Why these clusters arise is because of latent features. Maybe most of the men in one cluster were recently divorced, or something
No way to know, but we can at least find the clusters and make educated guesses
that probably would make you eat more
exactly
Finding clusters like this is the main problem in unsupervised learning -- finding patterns that arise due to latent features https://en.wikipedia.org/wiki/Unsupervised_learning
That's a tangent though, not directly related to your problem
The point is that without knowing these latent features, we will always see variation in the data
Of course, if we had ALL conceivable data, we would have 0 variation
good explanations
good night
Wow. I learnt a lot today. Thanks to you guys. Esp Slandon
I want to be able to train a program to identify a object in a picture
By feeding it pictures of what I want it to identify
Are Keras and TensorFlow the tools to do that?
Yes those will work! Pytorch is another library that I find easier. You'll want to have a look at convolutional neural networks
Hope you have a lot of training data :)
@small ore By the way, after waking up I suddenly realized what you meant by "Is arbitrary numbering a good input training set?" and the answer is no. If we number the combinations 0-17 and regress upon those, we should first sort the combinations by, say, average weight, so that combination 5 corresponds to a higher weight than combination 4, etc. Otherwise there will be no correlation between the input and the weight, and our linear estimate 'y = kx + m' will fail miserably, which I suspect it did. Completely slipped my mind yesterday
I tried categorizing the combos of level and species btw
Didn't really see any improvements 
Did you try sorting the list of combinations by average weight before assigning the new label?
If we don't, there won't be any correlation between the label and the output
But even then, in the best case, your regression will just be another way of immediately mapping a combination to its average weight :/
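A minimal sketch of the sort-then-label idea, assuming a pandas DataFrame with hypothetical `combo` and `weight` columns (the actual CSV isn't available, so the values here are made up apart from the two weights quoted earlier):

```python
import pandas as pd

# Toy data (assumption): three combinations, two observations each.
df = pd.DataFrame({
    "combo": ["a", "a", "b", "b", "c", "c"],
    "weight": [0.4627, 0.4135, 0.90, 0.80, 0.10, 0.20],
})

# Average weight per combination, sorted from lightest to heaviest.
mean_weight = df.groupby("combo")["weight"].mean().sort_values()

# Label 0 goes to the lightest combination, 1 to the next, and so on.
label_of = {combo: rank for rank, combo in enumerate(mean_weight.index)}
df["label"] = df["combo"].map(label_of)

print(df)
```

With this ordering the label at least increases with the average weight, but the regression then mostly relearns the per-combination averages that were used to build the labels.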
It's such a weird assignment tbh :/
Yup
Was even stated that the highest possible score from all hand ins would be the top grade for the assignment, so if nobody got higher than -0.5.. that would be an A
They pit you against each other like fighting roosters. Is the data available online?
Not that im aware of
We just got a csv with it
But i think I've found a solution imma stick to
So to make a model that can distinguish between let's say different fish. I would need thousands of pictures of each fish I want it to recognise?
Quite possible!
The more the merrier
https://arxiv.org/pdf/1805.10106.pdf
In this paper they do some work on recognizing fish. In total they use 27000 images
Thnx
no problemo
Any ideas on how to go about effectively knowing what to fiddle with to help a network train? Right now I have a network with about 400,000 images in the training set across two classes that are very similar (photoshopped vs not photoshopped), essentially of the same category, and it's having a hard time not overfitting into one class or the other. Do I reduce data? Change activation functions? Scale the images larger? All of the above?
@feral lodge The sorting would indeed make it better, but arbitrary numbers still don't make much sense to me. It imposes a step of 1 to distinguish between species, while the actual meaningful things may not have any reasonable correlation to that function. One-hot encoding seems better, especially since there isn't much data.
The number of dimensions is also reasonable
@fathom current From what little I have read, do not apply your recognition codes directly to the original image. Dumb it down ( Grey scale, etc and perhaps other dumbing masks) to make the problem solve in reasonable time.
sadly there's an app already that does everything i was considering doing
Some common sense even says things like outlines may be sufficient to determine what these are. ( Not always)
so i may not be doing any machine learning stuff any time soon
back to the drawing board
but thank you for the info
Google photos is one app I can think of which does not need a great many photos to recognize faces and search image by faces. Maybe it incrementally betters the prediction model for each face when you add more photos to its database. And going by its silent speed it perhaps dumbs down images a lot and uses only a few bits of information for making indexes. Recently someone known to me was impressed when it could detect a childhood photo of theirs based on a few 10s of their adulthood photos
@small ore I won't defend it, since I definitely agree with you that it's a very bad representation of the data. It enforces some strange stuff like a uniform distance of 1 between the data points, which is almost certainly false. But I also think we should be very skeptical of a representation that requires fitting in nine dimensions despite originally having only two features. In the end I think neither will work well -- the arbitrary-number representation because it simplifies and assumes too much, and the one-hot representation because it's too high-dimensional -- because linear regression simply is a poor model for the data
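To make the "nine dimensions from two features" point concrete, here is a small sketch using `pandas.get_dummies`. The column names `species` and `level` and their values are hypothetical placeholders; only the 6 × 3 structure is taken from the conversation:

```python
from itertools import product

import pandas as pd

# Hypothetical category values (assumption): 6 species and 3 levels.
species = [f"species_{i}" for i in range(6)]
levels = ["low", "mid", "high"]

# One row per combination: 6 * 3 = 18 rows, 2 categorical features.
df = pd.DataFrame(list(product(species, levels)), columns=["species", "level"])

# One-hot encoding creates one indicator column per category value: 6 + 3 = 9.
one_hot = pd.get_dummies(df, columns=["species", "level"])

print(df.shape, one_hot.shape)
```

The integer labelling keeps a single input column but imposes an ordering and equal spacing; the one-hot version imposes neither, at the cost of nine input columns.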
@velvet anchor How different are the photoshopped and real versions of the images?
If the data is like, half children's sketches of animals and half photographs of animals it might be better to first detect whether or not it's a sketch and then use one of two convolutional networks to classify which animal it is
However if they're fairly similar it's probably better to preprocess the images, keeping only black-and-white outlines, and train on those
https://jgeekstudies.org/2017/03/12/who-is-that-neural-network/ Here's a fun blog entry showcasing some image preprocessing techniques
Also, in the arxiv paper i linked above they do some preprocessing on real life fish images (page 3) If your photoshop images are realistic-looking I imagine this preprocessing may produce similar results for them as for real-life images
Those guys also saw big performance jumps when switching between activation functions, since you mentioned those
So I'm trying out some algorithms on this classification problem. But the data cleaning is a hassle. I usually just structure everything in a long script. I have several data sources that need tying together, and each source has several separate datasets for certain time periods. In addition, each dataset for each source has meta info like time offset etc. Would it be wise to create a class for handling all the data sources for each time period? With some function that returns a workable dataset?
And in the case that I want to add several of these time period datasets together into a larger dataset: should this also be in a class, or should I just have a function that iterates over some IDs, creates instances of the class, calls the desired output method and merges the results?
could this motherboard be used to build a gpu-based machine learning supercomputer? or is pci-e x1 over usb too limiting? https://edgeup.asus.com/2018/asus-h370-mining-master-20-gpus-one-motherboard-pcie-over-usb/
@feral lodge not very
Well kinda. They're deepfake images
So only a small portion has been touched
Yes
Tricky
Right now I'm setup as 7 convolutional layers in a binary classification problem
You just feed the images without preprocessing?
I've done a ton of preprocessing
Been working at this problem for like 4 months
Aw shucks
I've tried RGB, Greyscale, a Gradient (which is a self created technique of drawing vectors of brightness change within the image), etc etc
all the different activation functions, more layers, less layers, etc etc
I know there are entropy-based algorithms for detecting tampering with images
But that's like for detecting photoshop editing
Interesting but not really my area. I'll see if I can find some papers
Good project though, lord knows we'll need to be able to detect deepfake media in the near future
Yeah it's a research project with one of my professors and it's just a 2 man team with a limited GPU
I'm having to run a batch size of 2 @ 250x250 to even train networks lol
It's a super interesting problem because
I've read about cloud-based deep learning using tensorflow on google cloud, is that an option? https://cloud.google.com/ml-engine/docs/pricing To increase the computational power
you're essentially trying to detect the noise within an image
but that gets lost in training a lot of time
cloud based isn't really an option because of pricing but a new GPU may be in the department's future soonish
Aight
https://arxiv.org/pdf/1805.04096.pdf I guess this is too simple for you?
Looks like it detects copy-and-paste homemade fake images
Yeah
https://arxiv.org/pdf/1711.10394.pdf This looks like it handles a bit more challenging stuff. No humans though
And it may turn out that what we're trying to do is impossible given team size (basically only me) and the hardware. That's okay too. I just don't want it to be the case
I've tried every combination of settings though. I have a master Python script that generates models and tests accuracy, and it's either always 1 or always 0 over the test set.
Have you tried selu activation and alpha dropout?
Yes on selu no on alpha dropout
try alpha dropout after selu, also have you tried Bayesian search for your parameters?
Yes on Bayesian
what about other dropouts, batch normalization?
like keras.layers.BatchNormalization?
and for drop outs right now im dropping out 0.25 after 3rd layer then 0.1 after the 5th
Id post my code but i'm not at the workstation right now to have access to it
no github?
Nah I only work on it when I'm getting paid so I didn't have it uploaded remotely to remove temptation
ayy
Is there a better ML framework than Keras for images? Like one thats easier to use?
I like keras because you can just feed it np.arrays but if theres another im open to switchin
pytorch,tensorflow
might look at pytorch but im also more well versed in keras. probably not worth trying to replace 4months of keras knowledge
When you say you did Bayesian search, does that mean variational inference?
Maybe, any links for exactly what you're referring to to be sure?
https://arxiv.org/pdf/1506.02158.pdf Like this one for instance. Instead of producing point estimates of the weights, place a prior distribution over them, and train to compute the posterior
Sort of in its infancy and limited by computation power and bias due to choice of prior, but in theory good against overfitting
here's what I'm working with in my current iteration, just trying distinct activation functions to see what changes
The epochs and stuff are low for rapid testing just to see results on a model, i up them to something reasonable if I get promising results
have bigger kernels at the start and reduce them as you go deeper in the conv2d layers
Tried that combination as well, tried at one point going from like 64 to 2
they should gradually decrease, not decrease at once
Yeah I did
I stepped like 64 -> 58 -> ... -> 2 at one point
I've just been messing around with this variation for a bit
Examples from training images are like
All I ever did with images was on MNIST. My experiments showed that having kernels decrease slowly and window size decrease gradually helped. Sandwich dropouts/batch norms in as much as possible and they'd do well
One of the big problems I found with using large kernels and stuff was overfitting
Because the images are so close to a real image
Like that
Damn that looks difficult to say it's a fake.
Exactly
Sometimes theyre more obvious like this
And then theres these which are basically impossible
It's pretty much impossible to solve lol. I think natural images have some sort of static noise in them, maybe artificially created ones don't, you could try to extract that as a feature and feed it to the network.
That's what I tried to do by creating a gradient to measure light
Hey guys! Just wanted to drop in and ask if there's a relevant channel for python for finance purposes?
I have an ongoing project that involves some quant with py, so if there's someone who's good with that please tag them here or tag a relevant channel as I couldnt find one
@velvet anchor and @hasty maple you might find some interest in research around caricatures
the human mind abstracts things in ways that we have very difficult time reproducing in AI
@grave axle I would be interested to learn about the same topic too. If you learn about a server/channel or come across/know of any material, please let me know
Anyone here knowledgeable in Keras? Specifically multilabel class prediction
Getting some strange results from predict_generator
Im a bit of a keras noob but it's what i've been using for my project
predict_generator is returning incorrect values i believe
I have a model trained on ~1 million images
number of classes is 228
BATCH = 64
STEPS = len(test_paths) // BATCH
test_seq = TestBatchSequence(test_paths, BATCH)
probs = model.predict_generator(
test_seq,
steps = STEPS + 1,
workers = 5,
verbose = 1
)
Example of the resultant probability array (index 0)
Prob: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Shouldn't that be a probability for the classes, not just 1?
Also, each subsequent image has the exact same result array
If the results are the same for every image I think it's an issue of over/under fitting. As for 0 or 1, I think some activation functions return only 0 or 1 as a result. @hasty maple might be able to answer definitively if he's around
indeed
are you using softmax / sigmoid in your output layer?
I'm using sigmoid activation on my last layer, following: https://stackoverflow.com/questions/44164749/how-does-keras-handle-multilabel-classification
I'm not that used to keras either lol. All I did was mnist data set using it.
I know for sure the reason it's 1 is because of how sigmoid handles probability
its output looks like this
so it will drag output up to 1 or 0 outside a very small subset of values
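A quick numeric illustration of that saturation, using a hand-written sigmoid in NumPy (the logit values are made up; this says nothing about what the actual model's pre-activations look like):

```python
import numpy as np

def sigmoid(x):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical pre-activation values (logits) feeding the last layer.
logits = np.array([-20.0, -5.0, 0.0, 5.0, 20.0])
probs = sigmoid(logits)

# Only logits near 0 give values clearly between 0 and 1;
# large positive or negative logits end up indistinguishable from 1 or 0.
print(np.round(probs, 4))
```

So an output that prints as exact 0s and 1s can still come from a sigmoid layer; it suggests the logits are very large in magnitude, and identical rows for every image suggest the inputs are not influencing them.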
and the reason it's the same is because of presumably some set of images resulting in an overfit, have you tried changing kernel / window size?
yeah it might be overfitting, you're right
I oughta compare test results with validation to determine this
kfold that B
what kind of companies do you data scientists guys work for?
dont the big companies all use java?
not really
@lapis sequoia I'm no data scientist, but my understanding was that Python is a very strong incumbent in the data science industry because of the many specialized tools written for it and its ease of use. bar R, it might even be the most popular
("data science" being a massive and sweeping field that it's probably not fair to lump every portion of under the same umbrella)
I know statoil, the big norwegian oil company uses machine learning to determine if theres oil based on soil samples or something like that. Was a guest lecture about it in my python ML course. Doesnt quite imply that they use python.. but 
Is there a TF equivalent to keras' .flow_from_directory()? In essence, I want to try a TF model with some images, but the few examples I could find either use the built-in .dataset() module or a pickled file, and neither is very helpful in that regard
I think I gonna do a data science bootcamp, there are lot of companies offering that now, costs about 10000$ for 3-5 months, only basic coding background required, almost 100% of students find a job as data scientist afterwards
hello guys
i'm having some trouble with training this data set using stacked algorithms
What is currently the most widely used/best python library for HMM?
hmm?
Hidden Markov Models
Seems the answer is pomegranate anyhow..
Data science is simply a cool name for parts of statistics
is data science artificial intelligence?
A part of AI is data science, a part is not
can you guys help me with something
AI just means something that can make decisions. Doesn't have to be very statistical but it tends to be
is data science artificial intelligence?
no
in linear regression if i z score my data and calculate the constants for my function, how do i invert the z-score constants that is true for my z scored data to original data?
multiply by the standard deviation, then add the mean.
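For the regression constants themselves, a hedged worked example: assuming a single feature where both x and y were z-scored (an assumption, the question doesn't say which variables were scaled), a fit z_y = b·z_x + a maps back to original units as slope = b·s_y/s_x and intercept = μ_y + s_y·a − slope·μ_x. The data below is synthetic:

```python
import numpy as np

# Synthetic data (assumption): y = 4x + 10 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=50)
y = 4.0 * x + 10.0 + rng.normal(0, 2.0, size=50)

mu_x, s_x = x.mean(), x.std()
mu_y, s_y = y.mean(), y.std()
zx = (x - mu_x) / s_x
zy = (y - mu_y) / s_y

# Fit on the z-scored data; polyfit returns [slope, intercept] for degree 1.
b, a = np.polyfit(zx, zy, 1)

# Map the constants back to the original units.
slope = b * s_y / s_x
intercept = mu_y + s_y * a - slope * mu_x

# Sanity check against a fit done directly on the raw data.
slope_direct, intercept_direct = np.polyfit(x, y, 1)
print(slope, intercept)
print(slope_direct, intercept_direct)
```

If only the inputs were z-scored and y was left alone, the same formulas apply with s_y = 1 and μ_y = 0.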
Is there a dataset of normal photos containing people. Not just cropped faces like the LFW or Essex set?
@velvet anchor This has a wide list of photos containing people: http://image-net.org/synset?wnid=n07942152
No problem
what are some features that you guys commonly use when interpreting data
Depends on the data set
Would this be a good channel to ask for help on a neural network program?
@small pumice Yes
Ok
I'm working on a project in which a character in a text-based roguelike-styled world has to find its way around
To control it, I made a neural network from scratch
no libraries
But it isn't working
Wait-nevermind
I think I figured it out
Alrighty
@small pumice how did you do it?
in one of my future projects i plan on doing that as well
well except apply the neural network to something else
oh
Maybe. Coding a NN from scratch though is a very big task. I think you'll likely save a bunch of time getting tensorflow / Keras / pytorch to work for you than you will coding one from the ground up
Yeah, probably start from the basics, NN are actually pretty hard to make
I'm not saying it's impossible but it's definitely a several month task for a team of researchers to get a working version
Yeah
It's just so hard to find TensorFlow tutorials that explain the concepts that are being coded in
Try looking for Keras info. I had the same problem
It runs as basically a tensorflow wrapper. Almost. Either tensor or thano
You mean thanos :p
maybe. im not familiar with it just seen it before
I think its actually theano but im not 100%
a simple neural network only one or two layers is not hard to code up by yourself if you wanted to... you should use numpy for all the matrix operations
Mine has 2 hidden layers
Input layer has 190 neurons, second layer has 16, third has 16, and output layer has 4.
here is a multi-layer neural net i coded with just numpy, if you're interested
Cool
its not documented well or anything and its a few years old but maybe you'll find it useful
lemme know if u have any specific questions
i'd really only recommend coding one up yourself though if you are interested in the challenge or learning more about them or something.... if you just want one to use, id use libraries like other people recommended
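For a sense of scale, here is a minimal NumPy forward pass for the 190-16-16-4 layout mentioned above. It is only a sketch: the tanh activations, the softmax output and the random initial weights are assumptions, and there is no training step:

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the conversation: input, two hidden layers, output.
layer_sizes = [190, 16, 16, 4]

# One weight matrix and one bias vector per pair of adjacent layers.
weights = [
    rng.normal(0, 0.1, size=(n_in, n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    """Run a batch of inputs through the network and return output probabilities."""
    a = x
    # Hidden layers: affine transform followed by tanh (assumed activation).
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(a @ W + b)
    # Output layer: softmax over the 4 outputs (assumed), shifted for stability.
    logits = a @ weights[-1] + biases[-1]
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

batch = rng.normal(size=(5, 190))  # 5 made-up input vectors
action_probs = forward(batch)
print(action_probs.shape)
```

The forward pass is the easy part; backpropagation and a training loop are where most of the effort in a from-scratch network goes.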
If I have a NN detecting photoshop should the training set classes be like normal humans & a separate class of photoshopped humans or would it make more sense to do like unphotoshopped images of all possible objects then a class of photoshopped humans
that new dataset improved things dramatically @hasty maple BTW. with some preprocessing and stuff I've gotten 80% accuracy
Are you sure it's not overfit on the train/test set :P
I mean It could be but I just ran it over a test set of ~12,000 images not included in the training set
need more testing to confirm but its a start
Anyone know a server dedicated to Machine Learning ?
@feral lodge
I can't link invites, but search for artificial intelligence here: https://discordlist.me/
It's got chats for several branches of AI, including machine learning @nova viper
@feral lodge
Thanks for your help
/r/learningmachinelearning has a discord server as well. You could look into that if you're interested @nova viper
I didn't know about that one, cheers! Here's the reddit post: https://www.old.reddit.com/r/learnmachinelearning/comments/6mfwmf/introducing_rlearnmachinelearning_discord_chatroom/
Anyone know of a good way to get certified/prove skills in Python to a potential employer? My MS is in biostatistics but I'd like to get into data science.
One of the best ways to prove skills is with a github to show case but I'm not entirely sure on certification
Looks like there are organizations that issue Python certificates, but it looks like a hassle and sometimes expensive. I'd say it's probably better to have a personal project or two which you can show the employer
Agreed on github
I know some companies like microsoft are now additionally offering like a certification in their machine learning programs but i don't know how much value they hold to employers
Alright, thanks for the advice guys. I've got some basic stuff up on GitHub but it's not really data-science related. Do you have any advice on something I might want to look into project-wise? Or would this be a better question for a career counselor or someone in the industry?
Biostatistics sounds like a perfect application area for machine learning if that's what you're interested in
Definitely. We do pretty much everything in SAS though, and I'm already looking into getting certified for that soon. (I'm currently in college) I'd like to expand my skillset though.
Python has several excellent libraries for statistical analysis
Pandas, SciPy, Numpy among others
scikit-learn
I listen to a python podcast thats been covering a lot of data science applications lately and I hear scikit come up ALL the time
What's the pod?
I think its talk python to me
Thanks
He's been interviewing people who are using it in geoscience or at the allen institute, etc etc and they talk about their software stack and stuff
its pretty high level so not like super detailed but 3 you mentioned and scikit are what I hear every episode
I'm considering going to a data scientist boot camp, but I saw on the curriculum they don't teach Python or Java, they only use R. Is that any good for a job? ;/
R and Python are pretty neck and neck for data science
I dont have much personal experience with it but it was the original language and python's currently trying to overtake its spot
R is a bit like MATLAB if you've ever used it
As a programmer I like to think of it as a very advanced, programmable, calculator
Great for stats, but you can't really integrate it with a bigger program
Which you of course can with python
yeah i dont think its a very versatile programming language, more something for academia and mathematicians
That's my opinion yeah
It can be used in the field though, one of my professors has done a lot of work for the central bank of Sweden, and he works almost exclusively in R afaik
seems like both have a pretty even split though
I have an interview @ twitch soon as a data analyst and they use Python
Good luck! Wish I had an interview too >.<
Ive prolly sent out 50+ applications haha
just looking for something entry level for my last semester
dont super care what it is
Okay awesome, thanks for the help guys
shotgun technique @hasty maple
@velvet anchor have you done any Data Science projects before applying for all these jobs? I just did one, studied for like 4 months, did one kaggle comp, got good results and that's about it
Just this research fellowship over deepfakes
But I also almost have a BS in Math and CS
ah you have a data analytics preferred major
Yeah something like that:P
Are there any data-science centric Discords beyond the Python realm?
like for R?
Just in general, like broader topics than just language-specific stuff
I don't have any particular questions I'm more just curious
i think /r/learnmachinelearning has a discord
Ah, cool, thanks!
youre either not training well enough or displaying too low of confidence
Is there a book for tensorflow or pytorch that's most recommended?
That'll work. I've been using Keras mainly just wanna expand my horizons a bit
I'll dm you
K
I read some of the posts above. I am unable to see a search option on https://discordlist.me/ . Can someone help or just pass me the invites for the relevant servers please?
Also how do I add /r/learnmachinelearning?
@hasty maple @velvet anchor
Remove the [[[....]]] stuff if you didn't notice it
Thanks a lot
No problemo
@velvet anchor I've been recommended this book before: https://www.amazon.com/gp/product/1491962291/ref=as_li_qf_sp_asin_il_tl?ie=UTF8&tag=jackchanamazo-20&camp=1789&creative=9325&linkCode=as2&creativeASIN=1491962291&linkId=dba20e99d63c85b4bc0c89940c05cff0 Haven't read it though
Is that the Tensor flow pdf?
Possibly, but I have no idea! That pdf could be anything really
!mute @feral lodge 3d Bypassing the spam filter
:ok_hand: Slandön#5361 is now muted for 3 days (Bypassing the spam filter)
He knew what he was doing
He could have just DM'd you
These things are in place for a reason
He says he agrees with you but I still think 3 days of muting Slandon is a loss to us more than him
!unmute @feral lodge
:ok_hand: Slandön#5361 is now unmuted
Let's make sure that doesn't happen again
Lesson learned! Thanks boys
how does that slice work with the comma though. it looks weird
Oh 0 is step
I totally missed that comma
if it would be the step it would be [::0].
Also, a step of 0 isn't allowed: Python raises "ValueError: slice step cannot be zero".
I think it's referencing in a 2d manner with , just ignoring x and only giving y?
something like [x,y:x,0] ?
yea it must be something like that
and specifically implemented by the class of whatever x_factor is
They also (ab)use the __getitem__ to make the user write nicer syntax yes. Doesn't often look like this though.
data.iloc[:, 0] pandas dataframe
OK
It's an empty slice
followed by a 0
in a tuple
(slice(None, None, None), 0)
Right, but dependent on what lib it is, it could use 2 slices
plt.scatter(x_factor[:, 0], x_factor[:, 1])
Yeah we really need to know what x_factor is
At second x_factor there is 1 [:,1]
They are used to request most likely different indexed columns in a table
So 0 would get first columns data, 1 would get 2nd column
that syntax is kinda nasty
X_factor is a variable carrying data
It is :P
Would still need to know what x_factor is to give a proper answer
it's like 2d_data[row_selecting, column_selecting]
You can always print(type(x_factor)) if you have no idea 
Sklearn.decomposition
X_factor=F.fit_transform(iris.data)
with a capital X?
Anyways
Returns: X_new : numpy array of shape [n_samples, n_features_new]
Small
So yes it's a numpy array
Therefore it's array[row slicing, col slicing], so `x_factor[1:3, 0]` for instance would be rows 1 and 2 with only data from column 0
[: ,0] says give me all the rows, with only data in the first column
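A small runnable check of the above, using a made-up 4×3 array in place of the real `x_factor`. The `ShowKey` class is a hypothetical helper that just returns whatever `__getitem__` receives, to show the tuple-with-a-slice mentioned earlier:

```python
import numpy as np

# Stand-in for x_factor (assumption): 4 rows (samples), 3 columns (components).
x_factor = np.arange(12).reshape(4, 3)

first_col = x_factor[:, 0]    # all rows, column 0
second_col = x_factor[:, 1]   # all rows, column 1
rows_1_2 = x_factor[1:3, 0]   # rows 1 and 2, column 0

class ShowKey:
    """Hypothetical helper: hands back the key that indexing passes in."""
    def __getitem__(self, key):
        return key

# obj[:, 0] calls __getitem__ with a tuple: (slice(None, None, None), 0)
key = ShowKey()[:, 0]

print(first_col, second_col, rows_1_2)
print(key)
```

Plain Python lists don't accept such a tuple, which is why the syntax only works on objects like NumPy arrays or pandas indexers that implement it.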
Thanks
from sklearn import datasets
import numpy as np
iris = datasets.load_iris()
C = np.corrcoef(iris.data.T)
What is the capital T?
to me it seems like an alias for target
I misread something, lemme try again
Yeah, it seems to be the target
It's an array of all data split into multiple lists
first list is target
I actually don't know what I'm doing. But it's atleast every column of the dataset split into lists
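For reference, `.T` on a NumPy array is its transpose (rows and columns swapped), not the target; in scikit-learn's iris bunch the labels live in `iris.target`. `np.corrcoef` treats each row as one variable by default, which is why the data is transposed first. A sketch with random data of the same shape as `iris.data` (150 samples, 4 features), used here only so the example needs nothing but NumPy:

```python
import numpy as np

# Stand-in with the same shape as iris.data (assumption: random values).
rng = np.random.default_rng(0)
data = rng.normal(size=(150, 4))

# .T swaps the axes: one row per feature, one column per sample.
transposed = data.T

# corrcoef expects variables in rows, so this gives a 4x4 feature correlation matrix.
C = np.corrcoef(transposed)

print(data.shape, transposed.shape, C.shape)
```

Without the transpose, `np.corrcoef(data)` would instead correlate the 150 samples with each other and return a 150×150 matrix.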
Can anyone give me a cheat sheet for sklearn?
What kind of cheat sheet
Sklearn library
Yeah. Their docs are one of the best
I'm not sure if there are any quick-reference pages, though, that list often-used functions or whatever
Ow ,That's a good site.
Is there a reference for all the output values of different keras activation functions?
there's https://keras.io/activations/ but I wanted like a set of possible values
Yeah that's actually perfect
Didn't know there was such variety actually
Yeah picking the right activation functions is certainly difficult
because theyre all so different
https://arxiv.org/pdf/1804.02763.pdf
Looks like there's been some work on comparing some of them
hey, I'm trying to do some time series analysis.
year
1998-01-01 71
1998-01-01 60
1998-01-01 65
1998-01-01 83
1998-01-01 72
Name: yieldpercol, dtype: int64
this is ts.head()
My issue is that I need to combine all the data from each individual year together
that is, add up all the entries for 1998, then all the ones for 1999, etc
And I'm having trouble figuring out how to do that with pandas
There are so many TensorFlow and Keras tutorials that jump straight to things like MNIST recognition. Does anyone know some good tutorials on neural networks using simple data that you make?
For example, a tutorial that shows how to make a neural network that can add two numbers together? I know it's simple, but it would be a good way to get the concept down.
The Js tutorial on NN from coding train and 3blue1brown are pretty good
Subscribe to stay notified about new videos: http://3b1b.co/subscribe Support more videos like this on Patreon: https://www.patreon.com/3blue1brown Special t...
Welcome to Chapter 10 of The Nature of Code: Neural Networks. (http://natureofcode.com/book/chapter-10-neural-networks/) In this video, I provide a brief int...
Both not python related, but explain the concepts fairly well
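As a rough idea of what such an "add two numbers" exercise could look like without any framework, here is a single linear neuron trained with plain gradient descent in NumPy. The learning rate, step count and data range are arbitrary choices for this sketch:

```python
import numpy as np

# Training data made on the spot: pairs of numbers and their sums.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(256, 2))
y = X.sum(axis=1)

# One neuron, no activation: prediction = w1*a + w2*b + bias.
w = np.zeros(2)
b = 0.0
lr = 0.1  # learning rate (arbitrary choice)

for _ in range(500):
    pred = X @ w + b
    err = pred - y
    # Gradients of the mean squared error with respect to w and b.
    w -= lr * (2 / len(X)) * (X.T @ err)
    b -= lr * 2 * err.mean()

# The weights should approach [1, 1] and the bias 0, i.e. "a + b".
test_pair = np.array([0.3, 0.4])
print(w, b, test_pair @ w + b)
```

The same loop (predict, measure error, nudge the weights) is what Keras or TensorFlow run under the hood, just with more layers and automatic gradients.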
@worn cosmos By "add up", do you mean sum? In that case you can probably do something with pandas cumulative sum https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.cumsum.html (note that this gives a running total; for one total per year, grouping by year and summing is the more direct route). If you mean "for each year Y, create a vector/dataframe of all data from the first year to year Y", then you should be able to loop through range(0, length_of_data) and for each index create a slice from 0 to index with data.iloc[:index] or something like that:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#slicing-ranges
https://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-integer
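If the goal is one total per calendar year (how I read "add up all the entries for 1998, then all the ones for 1999"), a groupby on the year of the index is probably the shortest route. A minimal sketch, assuming the index holds real datetimes; the numbers below are made up to mimic ts.head():

```python
import pandas as pd

# Hypothetical stand-in for the series in the question: one value per row,
# indexed by a date whose year identifies the group.
ts = pd.Series(
    [71, 60, 65, 83, 72, 50, 55],
    index=pd.to_datetime(["1998-01-01"] * 5 + ["1999-01-01"] * 2),
    name="yieldpercol",
)
ts.index.name = "year"

# Group the rows by the calendar year of the index and add them up.
yearly_totals = ts.groupby(ts.index.year).sum()
print(yearly_totals)
```

cumsum would instead give a running total across all rows, which is a different thing from one sum per year.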
hey, i'm doing some pretty basic data acquisition and i can't get my live plot to work properly, would anyone mind helping out?
bot.tags['ask']
Asking good questions will yield a much higher chance of a quick response:
• Don't ask to ask your question, just go ahead and tell us your problem.
• Try to solve the problem on your own first, we're not going to write code for you.
• Show us the code you've tried and any errors or unexpected results it's giving
• Keep your patience while we're helping you.
You can find a much more detailed explanation on our website.
aiohttp and urllib can read from anything which is reachable with http or https
Oh thanks
hello
how do you guys apply models to the training.csv file
and what is a model made of/
?
Hey! It sounds to me like you're unsure what a model is, so this answer is pretty basic. Sorry if I misunderstood. When we observe and take measurements of stuff in the real world there's often good reason to assume that those observations follow a predictable pattern, even if they are seemingly random and independent of each other. A statistical model is a way of mathematically concretizing those patterns, so we can better understand and work more effectively with our data. We will never have enough data to completely accurately model the complexities of real-world relationships, but a simple mathematical model often captures the essential underlying patterns of the observations. What model is suitable depends on the nature of the data:
Sometimes our observations are only non-negative integers, like if we were counting the number of spam emails a person receives every day. This kind of data should probably be modelled with a Poisson or binomial distribution.
https://en.wikipedia.org/wiki/Poisson_distribution
https://en.wikipedia.org/wiki/Binomial_distribution
Sometimes they're real numbers without limits, like when measuring the temperature at 21:00 each day in January. That kind of data probably follows a normal distribution or a Cauchy distribution.
https://en.wikipedia.org/wiki/Normal_distribution
https://en.wikipedia.org/wiki/Cauchy_distribution
Sometimes the data consists of real numbers with explicit limits, like if we have a list of estimations of the probability of a turtle egg of a certain species containing a female turtle. Since a probability can only be between 0 and 1, that kind of data can likely be nicely modeled with a beta distribution.
https://en.wikipedia.org/wiki/Beta_distribution
Instead of only trying to model the distribution of possible values in your data, you're often also interested in modelling the relationship between the different features in your data. In what way does the chance of getting lung cancer increase with each year of smoking? How does it correlate with age? Income? Weight? To model these kinds of relationships we often use a technique called regression. With linear regression we assume the relationship is linear, with polynomial regression we assume the relationship is some polynomial.
https://en.wikipedia.org/wiki/Linear_regression
https://en.wikipedia.org/wiki/Polynomial_regression
If the relationship between features is very complex and high-dimensional, we often need to use a more complex model, like a support vector machine or a neural network.
https://en.wikipedia.org/wiki/Support_vector_machine
https://en.wikipedia.org/wiki/Artificial_neural_network
Since we haven't seen your training.csv file and don't know what kind of analysis you want to do on this data, we can't really say what kind of model or approach is suitable for you. There're plenty of tutorials online for machine learning in python, maybe check this one out?
https://towardsdatascience.com/simple-and-multiple-linear-regression-in-python-c928425168f9
Sorry if this was too verbose or below your level friendo, let us know if you wanted some other kind of direction
@lapis sequoia
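To make "fitting a model" a bit more concrete, here is a minimal sketch of simple linear regression (one feature, one target) in plain Python. The data is made up and lies exactly on y = 1 + 2x, and fit_line is just an illustrative helper name, not something from a library:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up observations that follow y = 1 + 2x exactly.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```

The "model" here is just the two numbers intercept and slope. Applying it to new data means computing intercept + slope * x for an x you haven't seen.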
oh its fine!
im kind of new to the whole scene
plus im just a high school student lol
Never too early to start
https://www.coursera.org/learn/machine-learning I haven't looked through this course myself, but my friends who have liked it a lot! Might be a good intro to machine learning, and it's free I believe? You enroll and then get access to the video lectures
Just as a headsup Slandon, you can cut down the length of messages by removing the embed sent by each link: just wrap the link in <>, e.g. <www.google.com> won't send the embed for www.google.com
Unless you wanted them ofc
Oh I had no idea, thanks! I've got link previews turned off, so I always forget they exist
Oh, makes sense. You had quite the wall there ':P
Every wiki link had a pretty big image with them
Live and learn
What's a good book for statistics, I'm primarily looking for something small and concise just to go over the concepts, not anything with a ton of derivations and stuff.
Hey all. I'm using Python to do some basic data visualization for a pet project I'm working on. I'm wondering what the best way to plot a timeline (historical, as in multi-year) for the reign of multiple emperors would be. The columns I've got that I think are actually of interest are as follows: Start (in years), End (in years), and ruler name. Basically I'd like to get a chart with all the rulers mapped onto it (just straight lines) but separated so you can see them individually, as they're sometimes overlapping.
I'm happy to share the dataset. It's a collection of all Chinese emperors. I'm gonna be releasing it onto my blog once I've finished this last part, but I'm having trouble finding the proper package to use. I found a suggestion that I could use a Gantt chart for this, but can't figure out how to actually work that with my data. I've got it all in a pandas dataframe.
Each row is an emperor, with each emperor having a start date, end date, and length of reign. Not sure if that will help with answering the question.
This link https://plot.ly/python/gantt/#use-a-pandas-dataframe seems to cover plotting pandas data in Gantt charts using Plotly @young aurora
I'll try this! Thank you. I'll report whether it worked or not afterwards.
Hope it helps!
So I feel like an idiot - but my plot is totally blank.
This is the code I used to create it - don't know if this is what you'd need to help.
fig = ff.create_gantt(ThreeSovereigns, colors=['#333F44', '#93e4c1', '#93e4c1'], show_colorbar=True, group_tasks=True)
py.iplot(fig, filename='gantt-group-tasks-together', world_readable=True)
Yeah, that one definitely wont work. It just isn't doing what it needs to do. Any other ideas?
ThreeSovereigns = ChinaEmpire[ChinaEmpire.DynastyCode == '00a']
In one of their earlier snippets they define their data like this
It works fine for creating matplotlib charts etc.
If you just copy-paste their code, using their example data, does it plot correctly?
Yes. I was hoping there was a more elegant solution than hardcoding in the start and end date - I think the issue may be that my start and end dates aren't in datetime
That being said, they're only years (e.g. -2023 Start, -1500 End) so I'm not sure how to convert them into datetime if that's what this requires
If it isn't and hardcoding is what it wants, I can do that too. It just seems extremely... bad
As long as it fits the pattern it should be fine! That is, the data should be of the form [ {"Task" : <Name>, "Start" : <Start time>, "Finish" : <End time>} ]
So if you can process your data file and parse it to such a list i imagine it'll work. So then the issue is the dates... If you only have the years you can probably just set the month and day to be the first of January or something
Yeah, that's totally fine - it's thousands of years and also mythology.
No idea if it'll handle BC dates nicely though. I have an early meeting so I'll have to leave, but if you ask in the help channels someone should be able to help
Okay, thank you!
This has become a sub-problem of the first, so if you show a snip of your data and explain about the negative dates that's probably enough to go on
Okay, cool. Pyplot just isn't playing nice with the numbers, either.
No problem, hope you can solve it! Feel free to DM me a link to the blog when you're done
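On the BC dates: Python's datetime can't represent years below 1 (datetime.MINYEAR is 1), so a start year like -2023 can't be turned into a date object. A rough sketch of building the Task/Start/Finish records while keeping the years as plain integers; the ruler names and years are made up and to_records is a hypothetical helper:

```python
from datetime import MINYEAR

# Made-up rows in the shape described above: (ruler name, start year, end year).
rows = [("Ruler A", -2023, -1500), ("Ruler B", -1600, -1450)]


def to_records(rows):
    """Turn (name, start, end) rows into Task/Start/Finish dicts with integer years."""
    return [{"Task": name, "Start": start, "Finish": end} for name, start, end in rows]


records = to_records(rows)

# True when at least one start year is too early for datetime to hold.
needs_numeric_axis = any(record["Start"] < MINYEAR for record in records)
print(records, needs_numeric_axis)
```

With integer years, one option is a horizontal bar chart on an ordinary numeric axis (for example matplotlib's barh with left= set to the start year and the width set to the reign length) instead of a date-based Gantt chart.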
Guys
Is the Yahoo scraper of pandas actually broken
The Yahoo data reader for stock prices
<@&267628507062992896> I suggest pinning Slandon's message above. The one with a lot of links and explaining basic 'model'
I agree
I think the entire message isn't pinned. It prolly is internally split into two messages
Hey, anyone here know much about logistic regression? I'm getting a huge upper limit for a Wald confidence interval and I don't know if it's reasonable or not.
fractional_shortening | 75.604| 0.252| >999.999 |
the values being the point estimate, and upper/lower wald CI's respectively
Looks like it works, but implies a high standard error SE(β-hat) = SD(β-hat)/sqrt(n) for the maximum likelihood estimation β-hat of your coefficient β. n is the sample size.
The Wald 95% confidence interval is usually a Gaussian centered on β-hat whose standard deviation is the standard error of the mean, SE(β-hat). That means that 95% of its density is contained within the two points β-hat ± 1.96 * SE(β-hat), so those two points are where the upper and lower CI limits usually lie.
Since this is logistic regression though, we're working in transformed space. That means the confidence interval is transformed as well, so the interval is rather exp(β-hat ± 1.96 * SE(β-hat)). So for you we have this:
> betaHat <- 75.604
> lower <- 0.252
>
> # Now, because lower <- exp(betaHat - 1.96*stdErr)
>
> stdErr <- (betaHat - log(lower))/1.96
>
> stdErr
[1] 39.2767
> # Pretty high!
>
>
> # Checking lower and upper CI limits:
>
> exp(betaHat - 1.96*stdErr) # Lower limit
[1] 0.252
>
> exp(betaHat + 1.96*stdErr) # Upper limit
[1] 1.85097e+66
> # Very big!
>
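One caveat on the session above: it treats 75.604 as the raw coefficient. If the table instead reports everything on the odds-ratio scale (an assumption, the original post doesn't say), the point estimate has to be logged first, and the numbers come out far less extreme. A sketch of that alternative reading in Python:

```python
import math

# Values from the table above; treating all of them as odds ratios is an assumption.
point_estimate = 75.604
lower_limit = 0.252
z = 1.96  # normal quantile for a 95% interval

# On the log-odds scale the interval is symmetric around beta_hat.
beta_hat = math.log(point_estimate)
std_err = (beta_hat - math.log(lower_limit)) / z
upper_limit = math.exp(beta_hat + z * std_err)

print(f"beta_hat={beta_hat:.3f}  SE={std_err:.3f}  upper limit={upper_limit:.1f}")
```

Under this reading the upper limit is still well above the 999.999 the table can display, so the ">999.999" would just be a display cap. The interval remains very wide either way.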
@hasty maple I can find the book we use for my Statistics class at Uni if you want
Found it. this is what we used, it was pretty nice, I'm sure you can find old versions quite easily
Hi, this looks like the scientific and numerical python channel on the python discord server. Is that correct?
Yeah
Wew that's an expensive book
Honestly its not so bad. its pricy new but used is reasonable
Is there a way to ensure a keras model is free from the GPU to train again?
Such as like
While(this):
train a model
test accuracy
free resources, to tweak settings
lets do the timewarp again
I've had to manually restart my notebook session to get the GPU freed from the clutches of Keras, let me know if you find a better way
Right now I'm running multiple scripts that call a second one to free it
So it frees when model.py exits
:o script calling a script, is this something different from an import? iirc import does just that
I wasn't importing it but it would probably also work
Just looking to rewrite my toolkit's as we move forward towards an adversarial network with 6 months of python knowledge
instead of the trash thats like taped together
lol
I took this research position with actually 0 knowledge of python really
So it was like, quick sketch of what I wanted to do in C#, Port to python for Keras & TF options, hold the codebase together with prayer
You can learn python in a week tbh, it's not that difficult
Yeah for basic stuff
but decorators, generators, etc etc are stuff that are also easy but not immediately apparent when you need them plus all the other libraries that are core parts of writing correct python definitely takes more than a week
ah true, I never learnt classes, decorators, generators and the like as I haven't found any use for them yet
They're nice
Generators are nice for datasets where you don't have a standard way of iterating but dont need (or can't fit) the whole set in memory
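A small sketch of that idea: a generator that hands out one batch at a time, so only the current batch has to sit in memory. batches is a made-up helper name, and range(10) stands in for a dataset too big to load at once:

```python
def batches(items, batch_size):
    """Yield successive lists of at most batch_size items, one at a time."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Hand out whatever is left over as a final, shorter batch.
        yield batch


# The loop only ever holds one batch, no matter how long the source is.
for batch in batches(range(10), 4):
    print(batch)
```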
https://paste.pydis.com/vaqofeyoxi.coffeescript monkeys with typewriter method of CNN training
@hasty maple
i have an interest in learning about neural networks, but i don't really know enough maths to do much with it (though i seem to pick stuff up quite quickly). i don't really have the willpower to learn a whole bunch of mathsy stuff straight up because i'd prefer to apply it and see what it does, rather than simply knowing what it does. essentially, is it possible to work with some of the more simpler aspects of neural nets without having a lot of mathematical knowledge at the beginning?
Yeah neural networks don't require a ton of math knowledge for categorization that's more of a data analytics type of problem
but not having a strong math background won't hurt too bad with NNs
i see
because i had a look at a sort of "hello world" example a while back and, even though i didn't know exactly how everything worked, i believe i got the general idea at least. i guess if i fiddled with stuff more to see what values affect certain things, i would be able to develop a better understanding of how stuff works together like that
https://github.com/keras-team/keras/blob/master/examples/mnist_cnn.py is a simple example btw @lilac shadow
ooh okay i'll take a look ^^ thanks!
i'll have to look up what a lot of this stuff does haha
And then mine as well with the pydis link above is an example of a network
but it's not standard because im kind of just room full of monkeys with typewritering my parameters
lol
@feral lodge gonna come in and lay down the science on why what I said has been wrong
If you don't mind a little bit of reading i would recommend checking out at least the first few chapters of this book: http://neuralnetworksanddeeplearning.com/chap1.html I think it's a wonderful introduction to NNs! The math is light also
ooh, now that is interesting
i've always been fascinated at NN's recognising stuff in images
and things like that
The math behind categorization is largely abstracted from the user
for Keras at least you just kind of throw your image as np.array() to the model and it tells you if it fits in one of your categories
i like to know how stuff works behind the scenes too, yeah?
Image stuff is usually handled using what's called a convolutional neural network -- an example of which you'll find in Clay's github link up there. One thing I like very much about the learning example in the book there is that it shows that even plain old feed-forward neural networks can be used for images (though in a limited capacity)
i see
all this terminology is going straight over my head so i'm going to be doing a lot of googles when i get round to looking at this in detail :D
oh, that makes sense
so you just need to find the best algorithms to do what you need to do, basically?
obviously easier said than done
Best parameters, activation functions, etc
yeah
some is pretty easy but when dealing with images that are really close together it gets difficult to distinguish
I've been working on this image classification problem for like 3-4 months now for example
i imagine it's a fuckton of optimisations to do as well
@feral lodge what do you know about genetic algorithms?
A big part of it is also figuring out clever ways to preprocess your data. Clay's working with images, for example, but he can't just toss his images through the net -- he also has to preprocess his images with stuff like this, to remove unnecessary noise and bring out important features
Not much at all actually, just what I remember from the AI intro course
I'm sure I can google a bit to look smart though
oh yeah, i know about pre-processing ^^ i always imagined it to be a way of making the data more "standard" i guess you could say
My boss was just explaining them to me and mentioned how they might be helpful to randomly tweak parameters
It can be. It can also be used to help guide the important parts too
yeah, that makes sense
I didn't really have a question about them though Slandon. Just didn't know if you'd heard of them
That's super interesting, haven't seen it before. Here's a paper on it you can get paid for reading https://arxiv.org/pdf/1712.06567.pdf
I've got it almost implemented actually already
Nice!
Just gotta make it object oriented and pretty
Got this stuff on an open github?
Nah
Not until we publish
I keep the network stuff kinda open but the pre processing is hidden
That's cool to be part of a publication before your graduation
Yeah
and lead author
Pre processing is kept secret because it's the magic I guess (and it's ugly AF code wise because I wrote it Day 1 of learning Python so it takes around 72hours to do 12,000 images LOL)
But if you want to read into it, it's based on light measurement and ELA
I can read the paper laterz ๐
ELA is such a genius method of image forensics honestly
but it only works on jpgs
Error level analysis is the analysis of compression artifacts in digital data with lossy compression such as JPEG.
but I keep all the stuff off public repos too @feral lodge so I don't work on them without getting paid hahaha
Don't you have the thirst for knowledge?
ELA looks cool, never heard of it. I've hardly worked on images at all
I do but I also have a thirst for not starving
Although with that being said, I'm currently working on my classes while off the clock xd
I meant classes like OOP classes my mistake haha but
I'm taking C++, Algorithms, Philosophy and Tech writing this summer
then I finish in the fall with Operating Systems, Programming Languages, Senior Design, and Assembly 4
I have ascended
That's some good stuff though! How far into your education are you?
I graduate in dec
Master's?
That's great! I didn't touch ML until after my bachelor's were completed
Here's my genetic algorithm btw @feral lodge but I haven't been able to test it yet
# genetic.py
import random

import netparams


class Genetic:
    def __init__(self):
        # The option lists must exist before createpops() runs, since it reads them.
        self.actfunc = ['relu', 'selu', 'linear', 'tanh', 'softmax', 'elu', 'softplus', 'softsign', 'sigmoid']
        self.paramlist = ['window1', 'window2', 'window3', 'window4', 'window5', 'window6', 'conv_depth_1', 'conv_depth_2',
                          'conv_depth_3', 'conv_depth_4', 'conv_depth_5', 'conv_depth_6']
        self.actlist = ['activation1', 'activation2', 'activation3', 'activation4', 'activation5', 'activation6',
                        'activation7', 'activation8']
        self._population = []
        self.createpops()

    def createpops(self):
        # Start with 9 randomly parameterised networks.
        for _ in range(9):
            child = netparams.NetworkParams()
            for attrib in self.paramlist:
                child.setval(attrib, random.randint(1, 36))
            for slot in self.actlist:
                child.setval(slot, random.choice(self.actfunc))
            child.setval('hidden', random.randint(400, 1600))
            self._population.append(child)

    def evolve(self):
        # random.sample returns two distinct parents.
        parent1, parent2 = random.sample(self._population, 2)
        child = netparams.NetworkParams()
        # random.choice takes a single sequence, so the two candidates go in a list.
        for attrib in self.paramlist:
            child.setval(attrib, random.choice([parent1.getval(attrib), parent2.getval(attrib)]))
        for slot in self.actlist:
            child.setval(slot, random.choice([parent1.getval(slot), parent2.getval(slot)]))
        return child

    def compare(self, child):
        # Assumes 'fit' has already been set on every member and on the child.
        # Replace only the weakest member, and only if the child beats it.
        weakest = min(self._population, key=lambda member: member.getval('fit'))
        if weakest.getval('fit') < child.getval('fit'):
            self._population.remove(weakest)
            self._population.append(child)


# netparams.py
class NetworkParams:
    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

    def getval(self, networkparam):
        return getattr(self, networkparam)

    def setval(self, networkparam, value):
        setattr(self, networkparam, value)
Would this be an appropriate place to ask for help concerning Matplotlib and Python?
Sure
actually, maybe.
this is more for analytics / ML so it depends on what you're asking about within it
Aperture redirected my help request to this channel
Ask away
It is about plotting a polar plot essentially.
I am working on generating a Radar PPI Scope using matplotlib. I need fine control of how major and minor ticks are handled along with tick labeling. Since plt.polar does not offer sufficient control over these parameters (to my knowledge), I have opted to use a PolarAxes transformation and AxisArtist functions to get the control I need. However, I have run into difficulties with how tick label printing and minor tick marks are handled. The picture below is an example PPI template that I seek to recreate.
the channel description was always like this
And this is what I have recreated thus far. North bearing corresponding to 0°.
I can't figure out how to get minor ticks to print every 1°. Additionally, I cannot get the major axis tick labels to print every 15° starting at 0.
My current Code: https://paste.pythondiscord.com/urozeduzov.py
These issues have been stumping me for the last couple of days, so I figured it was time to ask for some advice.
Lemme get matplot installed and such and i'll take a look
and by that I mean we're taking our break in class now so BRB 10
Damn can't get matplot installed
Which dependency manager are you using?
pip
Keeps failing for no reason
(in a virtual environment)
not in a virtual environment it says operation isn't permitted
Interesting. I am running it in a virtualized environment right now. I use MiniConda though as my dependency manager.
ooh that's some pretty pretty data
You bet it is. It's also problematic because it includes mythical beings, but hey, what can you do
Hi everyone,
Has anyone tried to predict words from letters? Or give word suggestions.
I want to train a model for my language with my own data and i want to predict words from letters or give word suggestions.
I'm open to all suggestions
@lilac shadow Andrew Ngs course is good for someone who fears math. He even teaches basic matrix multiplication and skips over derivations which require the simplest of PDs and straight away goes to the final result and concentrates more on discussing it
I've heard Ng's course is fantastic
@dreamy tartan Do you mean like autocorrect, or, easier, autocompletion? Or does your language use other symbols (Cyrillic, Greek, Bengali script, for example) and you want to predict those kinds of words using the abc alphabet?
Can you show us a small example of what the program should be able to do?
@round current I've never used matplotlib, but this guy https://stackoverflow.com/a/44657941 seems to have created major and minor ticks using some other approach than transformation
I'm not sure how to interpret the graph @young aurora, could you explain? For Yuan for instance, the bar goes between 3ish and 12ish years, but wikipedia says the dynasty lasted from 1260 to 1368. Those dates seem to be pretty exact, so why do you have error bars?
Oh, so this isn't the length of the dynasty - this is the length of time for individual rulers!
Also, it's all based on the traditional dating used by Chinese historians (AKA the old one) rather than newer dating methods.
I should be more clear with the title/X label.
Hmm, but these were the Yuan emperors -- only Kublai lasted for 34 years, but I'm interpreting the graph to say he ruled for maybe 2 years
And Temür lasted 35 years, but the tick dividing the Yuan bar in two is far from the middle
So this is important to understand for chinese dating
Lay it on me
The emperors aren't necessarily what you see on Wikipedia
These are taken not from historians in the modern, technical sense, but rather from court records made and changed much later
Think of it as "edited history"
I'll go check the data for them, though, and give you a more complete answer
It's 398 total emperors, haha
Very interesting! Send it as a DM so we don't scare away new questions here
@feral lodge my language uses the latin alphabet. Peter Norvig's approach helped me a lot and i think it solved my problem. I'm doing a spell checker with it now. For autocorrect and autocompletion i'm guessing i need something like this too. Am i correct?
@feral lodge That is what I ended up doing. I generated small line segments prior to the transformation to create the minor and major ticks. Far from an elegant solution, but workable. I am satisfied with the end result.
That looks cool!
This is the Peter Norvig approach, right? https://norvig.com/spell-correct.html Looks like it already functions as an autocorrector! For autocompletion a good starting approach is to just keep track of the letters the user has written, and keep a list of all words in the dictionary that begin with that sequence, sorted by how common the word is (if you have that info).
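The autocompletion idea above can be sketched in a few lines: count how often each word appears, then return the words that start with what the user has typed, most common first. The corpus string is made up and autocomplete is just an illustrative helper name:

```python
from collections import Counter


def build_counts(text):
    """Count how often each whitespace-separated word occurs."""
    return Counter(text.lower().split())


def autocomplete(prefix, counts, limit=3):
    """Return up to `limit` known words starting with prefix, most frequent first."""
    matches = [word for word in counts if word.startswith(prefix)]
    # Sort by descending frequency, then alphabetically to break ties.
    matches.sort(key=lambda word: (-counts[word], word))
    return matches[:limit]


# Tiny made-up corpus; a real one would be a large text in your own language.
corpus = "the cat sat on the mat then the cat ran to that tree"
counts = build_counts(corpus)
suggestions = autocomplete("th", counts)
print(suggestions)
```

Scanning every word on each keystroke is fine for a small dictionary. For a large one, a sorted list with binary search or a trie would be the usual next step.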
Yeah, that looks awesome, great job
Nice job @round current glad you got it working
Slandon! You know the universe
I don't know why I was tagged to that haste bin Clay
Oh just because it was a dumb solution to just like random parameters and rerun it
what makes more sense from an OOP perspective for an evolutionary algorithm? wrapping the model + parameters inside of an overarching simulation class or just letting the model be a procedural setup that calls parental gene manipulation as needed?
also because doing it that way does free up the model memory Ichi, it seems just running del model and gc.collect() will clear the GPU allocation allowing for another model to run again
and, I thought, you mentioned being interested in a solution from within the same script
ah yeah I was, but it was hard to follow the code as I checked back a day later and wasn't really sure why I was tagged.
I'll keep del model and gc.collect() in mind. Do I need to import anything to run gc.collect()?
You need import gc at the top of the script, but it's part of the standard library so there's nothing to install
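A rough sketch of that train / test / free loop inside a single script. train_many and build_and_train are hypothetical names (build_and_train stands for whatever builds, fits and scores one model); the clear_session() call is only attempted if Keras can be imported, and whether this fully releases GPU memory depends on the backend and version:

```python
import gc


def train_many(configs, build_and_train):
    """Train one model per config, releasing each model before starting the next."""
    results = []
    for config in configs:
        model, accuracy = build_and_train(config)
        results.append((config, accuracy))

        # Drop our reference so the model can be garbage collected.
        del model
        try:
            # Only runs when Keras is installed; resets Keras' global state.
            from keras import backend as K
            K.clear_session()
        except ImportError:
            pass
        gc.collect()
    return results
```

If memory still isn't released, running each training run in its own process (as with the script-calling-a-script setup above) remains the blunt but reliable fallback, since the memory goes away when the process exits.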
Once I get back to the office i'm gonna be rewriting it to add in the genetic / evolutionary algorithm instead of just rand()ing parameters
hard in the sense I didn't know why I was given the code, the code was easy to understand
Yeah just because you'd been the person i'd been primarily talking to about it ๐
Good luck with the Genetic Algorithms
Is there a quick way to calculate how much space a network will take up in memory?
Is it just input size^2 * layers?
Can someone tell me as to what genetic algorithm is and how and where they are useful?
Ok so a genetic algorithm (also called evolutionary algorithm, or a few other terms) is a way used to create a set of parameters that gets better over time
It works, at a really high level, like this. Create a population of parameter sets. Let's use 4 in this example
These parameters may be a list like:
1 layer, 3x3 window, 200x200 input size, sigmoid output function.
3 layers, 10x10 window, 400x400 input size, soft max output function
And 2 more with a different set of parameters you want to adjust
Now you take these sets of parameters and compute a score for them for how well they match an optimal output. So for my case, as an example, I'm scoring based on how accurate they are at identifying images.
Now that I have 4 sets of parameters and a score for each, I can create children. So I'll take 2 random sets from my population and just randomly pick parameters. So I may take dad's input size, layers and mom's output function and window
You'll take this new child set of parameters and score it. If it's better than one of the other 4, you replace the lowest score and run it again
You can also implement "mutations" to your population. So you could take the parameters and add 1 to them or whatever
Does that make sense @small ore
Still reading and trying to make sense
So, if I understood it right, you set the parameters for each layer(including activation function, window, etc) in the beginning quite randomly and then use the "genetic algorithm" to change parameters ( is that what you call mutation) and see if it scores better?
Yeah exactly. You just randomly set parameters. Make children from them and see if it's better. Then with survival of the fittest you replace the lowest score with the new one if it's better
And repeat until you're satisfied. And that whole process is called a genetic / evolutionary algorithm
It's useful for optimization. There's a few techniques like the one above, simulated annealing, swarming, etc
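The steps described above (random population, pick two parents, mix their parameters, maybe mutate, replace the lowest score if the child is better) can be sketched on a toy problem. Everything here is made up for illustration: the two parameters, the allowed values, and a fitness function that simply prefers layers=3 and window=5 instead of training a real network:

```python
import random


def crossover(mom, dad, rng):
    """Build a child by picking each parameter from one parent at random."""
    return {key: rng.choice([mom[key], dad[key]]) for key in mom}


def mutate(child, options, rate, rng):
    """With probability `rate`, replace a parameter with a random allowed value."""
    for key in child:
        if rng.random() < rate:
            child[key] = rng.choice(options[key])
    return child


def evolve(population, fitness, options, generations=200, rate=0.1, seed=0):
    rng = random.Random(seed)
    population = [dict(member) for member in population]  # leave the caller's list alone
    for _ in range(generations):
        mom, dad = rng.sample(population, 2)  # two distinct parents
        child = mutate(crossover(mom, dad, rng), options, rate, rng)
        worst = min(population, key=fitness)
        # Survival of the fittest: the child replaces the weakest member only if it scores higher.
        if fitness(child) > fitness(worst):
            population[population.index(worst)] = child
    return max(population, key=fitness)


# Toy search space and score (both made up): the best set is layers=3, window=5.
options = {"layers": [1, 2, 3, 4], "window": [3, 5, 7, 9]}


def fitness(params):
    return -abs(params["layers"] - 3) - abs(params["window"] - 5)


setup_rng = random.Random(1)
start = [{key: setup_rng.choice(values) for key, values in options.items()} for _ in range(4)]
best = evolve(start, fitness, options)
print(best, fitness(best))
```

In the real setting the fitness call is the expensive part (train a network, measure accuracy), so you would cache each member's score instead of recomputing it every generation as this sketch does.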
Wow. And here I am finding it difficult to understand even the basic NN well
There is loads to learn flop
There's a lot to learn but it's not too hard once you get it
I've been working with Keras for a few months now and I'm like just barely scratching the surface kinda.
I have read forward and backward propagation twice now and while I understand everything that is said, I am yet to figure what are the knowns and what are unknowns in each step
So if I'm understanding this correctly. a network with input shape 100x100 in RGB and 3 convolutional layers with 3 filters, with a dense output will take up
100x100x3 = 30000 +
100x100x3 = 30000 +
100x100x3 = 30000 +
100x100x3 = 30000 +
1x100x100 = 10000
Then multiply the batch size by the total amount?
did i do that right? does window size matter at all?
I know there's model.timeline() or model.summary() but I believe that the model gets loaded into memory first fully before computing that information. So I'm trying to avoid OOM errors instead of wrapping everything in a try except
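A back-of-the-envelope estimate can be done without building the model at all. The sketch below counts weights and activations for a plain stack of conv layers followed by one dense layer, under loud assumptions: 'same' padding, stride 1, no pooling, 4-byte floats, and a single dense output unit. conv_stack_estimate is a made-up helper, not a Keras function. Window size does matter, but for the weight count rather than the activation count:

```python
def conv_stack_estimate(height, width, channels, conv_layers, dense_units,
                        batch_size=1, bytes_per_value=4):
    """Rough parameter and activation counts for conv layers plus one dense layer.

    conv_layers is a list of (filters, window) pairs. Assumes 'same' padding
    and stride 1, so every conv output keeps the input height and width.
    """
    params = 0
    activations = height * width * channels  # the input image itself
    in_channels = channels
    for filters, window in conv_layers:
        params += (window * window * in_channels + 1) * filters  # weights + biases
        activations += height * width * filters
        in_channels = filters
    flattened = height * width * in_channels
    params += (flattened + 1) * dense_units  # dense weights + biases
    activations += dense_units

    weight_bytes = params * bytes_per_value
    # Only activations scale with the batch size; the weights are shared.
    activation_bytes = activations * batch_size * bytes_per_value
    return params, weight_bytes, activation_bytes


# 100x100 RGB input, three conv layers with 3 filters and 3x3 windows, batch of 32.
params, weight_bytes, activation_bytes = conv_stack_estimate(
    100, 100, 3, [(3, 3)] * 3, dense_units=1, batch_size=32)
print(params, weight_bytes, activation_bytes)
```

Treat the result as a lower bound: training also keeps gradients and optimizer state (often a few more copies of the weights), and the framework adds its own overhead.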
@small ore those hastebin links above are an example of an evolutionary algorithm (version 1.0) if you wanted to see one fully written
Oh wow. Not sure if I will understand it. I will take a look at it. Thank you
Are those ruby files?
No it's python
Okay. It does look like python classes and methods but the extension in those hastebins made me think it could be ruby
Thanks for getting me interested in it
No problem. Haha. Gave me a reason to come into work today and finish it
Yeah IDK why they got the ruby extension that's weird.
if you edit them to have a .py URL, it'll have python syntax highlighting
ALso dont hate on my awful use of kwargs. it's got a reason as this build continues fleshing out
@small ore To answer the question you asked this morning at a very high level...
Genetic Algorithms are useful for optimizing over extremely large search spaces. They don't necessarily give you the absolute best possible value. However, they often get close, with significantly less computational cost (less computer power).
Think of it as finding a solution that's 90% as good as the best, in 2 days on your laptop, instead of the absolute best in a year on a supercomputer.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.330.1662&rep=rep1&type=pdf paper about if you want even more high level
@quiet gyro Thank you
hey data scientists, I wonder ... I have some time series data that I use as a regressor in a GLM for some other measurement. the problem is that there are some high intensity peaks in this time series that kinda mess up the regression
i know that music processing people usually apply some "dynamic range compression" to make the whole recording have a similar amplitude. is that used for other data as well or would it change too much?
My two cents: Does the data representing those peak really matter to the representation of the data and prediction if they are removed?
@lapis sequoia
Yeah that type of processing is used in other places Hypo
It might change too much but smoothing out data from massive peaks is always part of the challenge but do keep in mind it might alter the rest of your results too so you might have to redo your formulas with the compressed data
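One simple form of compression outside audio is a signed log transform: it leaves small values nearly untouched and squashes large peaks while keeping their sign and order. A minimal sketch with a made-up series; compress and its scale parameter are illustrative names, and whether a transform like this suits your GLM is something to check against your own data:

```python
import math


def compress(values, scale=1.0):
    """Signed log compression: close to linear near zero, logarithmic for large magnitudes."""
    return [math.copysign(math.log1p(abs(v) / scale) * scale, v) for v in values]


# Made-up regressor with two large peaks among otherwise small values.
series = [0.5, -0.8, 1.2, 40.0, 0.9, -35.0, 1.1]
squashed = compress(series, scale=1.0)
print([round(v, 3) for v in squashed])
```

Keep in mind that the regression coefficient then describes the effect of the compressed regressor, not the original one, so the fitted numbers are not directly comparable to a fit on the raw series.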
Any pandas people here? I've got a little thing I'm wondering if I can pick your minds.
I've got two tables, one for hospitals and one for patients.
Each patient record has a foreign key back to the hospital table.
each patient record also has a mortality result (ALIVE/EXPIRED)
I'm trying to add a column to the hospital table of the mortality ratio, that is n_expired / n_total
Currently I was thinking about going down the hospital table with iterrows() and grabbing all the patient records for the corresponding hospital and calculate the ratio one by one. I was wondering if there was a more pandas-y/pythonic way of doing this..
Not that I know of. #databases may have a clever way to do it that I'm unaware of, but your method is how I'd approach it.
This would be the R-y way of doing it, and pandas is meant to resemble R, afaik:
import pandas as pd
apples = pd.DataFrame({'color': ["red", "green", "red", "yellow", "red", "green", "yellow"], 'taste': ["nasty", "tasty", "nasty", "tasty", "tasty", "tasty", "nasty"]})
color_stats = pd.DataFrame({'color': ["red", "green", "yellow"]})
def compute_tasty_ratio(c):
    # Fraction of apples of color c that are tasty
    c_apples = apples[apples['color'] == c]
    n_tasty = (c_apples['taste'] == "tasty").sum()
    n_tot = c_apples.shape[0]
    return n_tasty / n_tot
color_stats['ratio'] = color_stats.color.apply(compute_tasty_ratio)
print(apples)
print(color_stats)
In this example we add a column to show the ratio of tastiness of apples of different colors
@bleak geode
Prints
    color  taste
0     red  nasty
1   green  tasty
2     red  nasty
3  yellow  tasty
4     red  tasty
5   green  tasty
6  yellow  nasty
    color     ratio
0     red  0.333333
1   green  1.000000
2  yellow  0.500000
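For the hospital/patient question itself, a `groupby` avoids looping over hospitals at all. This is a sketch under assumptions: the column names `hospital_id` and `outcome` and the tiny tables are made up, since the real schema wasn't posted.

```python
import pandas as pd

# Hypothetical tables; the column names are assumptions
hospitals = pd.DataFrame({"hospital_id": [1, 2, 3], "name": ["A", "B", "C"]})
patients = pd.DataFrame({
    "hospital_id": [1, 1, 1, 2, 2, 3],
    "outcome": ["ALIVE", "EXPIRED", "ALIVE", "EXPIRED", "EXPIRED", "ALIVE"],
})

# The mean of a boolean column is the fraction of True values,
# i.e. n_expired / n_total per hospital
mortality = (
    patients["outcome"].eq("EXPIRED")
    .groupby(patients["hospital_id"])
    .mean()
)

# Look up each hospital's ratio; hospitals without patients get NaN
hospitals["mortality_ratio"] = hospitals["hospital_id"].map(mortality)
print(hospitals)
```

Using `map` keeps the hospital table's row order and index untouched, which a merge would not necessarily do.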
@feral lodge I thought you died
I live!! Got a 70 hour work week during the summer though, so I'll probably not be as active :O
Besides, you keep answering all questions for me
Ok so @feral lodge general question here. If I wanted to release the evolutionary algorithm onto pip as a framework, what makes the most sense? Allowing users to submit their own class and a list of keys to create children of? Or another way?
Let me check my textbook and refamiliarize myself with GAs
Is it a general GA, or is it specifically for training NN weights?
I want to at some point release another version specifically for NN weights but there's a lack of GA frameworks for Python in general so having a general version too isn't a bad idea
Good initiative!
At some point for NN weights I envision a constructor where you say like "Genetic(pop=X,conv_layers=y,...)" and have it generate full models
But for a regular GA it's much simpler
The way I see it the algorithm needs two things: how each individual member of the population is formatted (like, is it a binary vector? A vector of floats? A mix? How long is the vector?) and a fitness function
What did you mean by their own class?
The inputs to their pseudocode are an initial population and a fitness function
Would your framework be fancier than this?
genetic algorithms amaze me ^^
@lilac shadow http://boxcar2d.com/ Check this out if you haven't
Not really. But I wasn't going to implement the fitness function from within (you can see my framework as it is up a few lines). Essentially I envisioned a way so that the user has a simulation class that scores fitness.
From within simulation they instantiate my genetic class providing a list of attributes to be randomized (the things we care about) and a class to contain them. My GA framework will handle the evolutions / population control / etc so that
From the user's perspective all they have to do would be
GeneticF = Genetic(...)
for X in GeneticGenerator:
    X.fit = function result
    X.compare()
And then after a predetermined number of iterations it would spit out the population of most fit results
Where the generator would handle evolving from the pop / mutations / etc
The reason you'd want a class to contain values vs a dict doesn't super matter, but it gives you freedom to apply @properties so you can apply processing to specific results later
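The loop described above could look roughly like the sketch below. Everything in it is hypothetical: the class names, the `candidates()` generator, uniform crossover, and keeping the best `pop_size` of parents plus children are stand-in choices, not the actual framework. The `compare()` step is folded into the generator here.

```python
import random

class Individual:
    """Container for the evolved attributes plus a user-assigned fitness."""
    def __init__(self, genes):
        self.genes = genes   # dict: attribute name -> value
        self.fit = None      # set by the user after scoring

class Genetic:
    """Hypothetical GA that yields candidates; the caller sets .fit on each."""
    def __init__(self, gene_space, pop_size=4, generations=10,
                 mutation_rate=0.1, seed=None):
        self.gene_space = gene_space   # dict: attribute name -> allowed values
        self.pop_size = pop_size
        self.generations = generations
        self.mutation_rate = mutation_rate
        self.rng = random.Random(seed)
        self.population = []

    def _child(self, a, b):
        genes = {}
        for key, choices in self.gene_space.items():
            genes[key] = self.rng.choice([a.genes[key], b.genes[key]])  # crossover
            if self.rng.random() < self.mutation_rate:
                genes[key] = self.rng.choice(choices)                   # mutation
        return Individual(genes)

    def candidates(self):
        # Execution resumes after each yield only once the caller has
        # finished its loop body, so .fit is already set at that point.
        for _ in range(self.pop_size):
            ind = Individual({k: self.rng.choice(v)
                              for k, v in self.gene_space.items()})
            yield ind
            self.population.append(ind)
        for _ in range(self.generations):
            children = []
            for _ in range(self.pop_size):
                a, b = self.rng.sample(self.population, 2)
                child = self._child(a, b)
                yield child
                children.append(child)
            merged = sorted(self.population + children,
                            key=lambda ind: ind.fit, reverse=True)
            self.population = merged[:self.pop_size]

# User side: the fitness function lives entirely outside the framework
space = {"x": list(range(-10, 11)), "y": list(range(-10, 11))}
ga = Genetic(space, pop_size=6, generations=20, seed=1)
for ind in ga.candidates():
    ind.fit = -(ind.genes["x"] - 3) ** 2 - (ind.genes["y"] + 2) ** 2
best = max(ga.population, key=lambda ind: ind.fit)
print(best.genes, best.fit)
```

If the caller forgets to set `fit`, the sort fails on `None`, so a real version would want to validate that before comparing.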
Oh, I see. So they test fitness themselves, and your class functions as a way to produce new individuals to test?
Seems fine to me!
I can't think of any direct improvement of what you have right now, except generalizing the code in Genetic
I think
Yeah that comes too
I just needed something workable and I had 2 hours to write and test before class
Or maybe I still don't get it completely, the reproduction rates are based on the fitness evaluation right?
The percentages in that screenie
Does the user have to input the fitness evaluation for each individual?
Not exactly. It's just kind of random sampling
But the point is to choose/evolve fit individuals
The user would supply fit = function result inside a loop or whatever
And then the framework would handle making sure it fit within the population
It is the goal to evolve fit individuals. Yes.
However more fit parents doesn't always equal a more fit offspring
So just evolving from the two most fit every time doesn't guarantee the best result
So you randomly sample parents from across your population that are within the "fittest"
So like in your picture above, unless I misunderstood what you asked, you don't mate just the best two and the worst two. You just mate all of them kinda and see what the best results are. Make a new population of best performers that's the same size. Repeat
Oh no, definitely not just pair the best two
But I think our two algorithms are slightly different
"In this particular variant of the genetic algorithm, the probability of being chosen for reproducing is directly proportional to the fitness score, and the percentages are shown next to the raw scores."
I was thinking this
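The quoted variant is usually called fitness-proportional (roulette wheel) selection. A minimal sketch with the standard library follows; the function name and the four raw scores are made up for illustration.

```python
import random
from collections import Counter

def select_parents(population, fitnesses, k, rng=random):
    """Pick k parents, with replacement, with probability proportional to fitness."""
    if min(fitnesses) < 0:
        raise ValueError("fitness-proportional selection needs non-negative scores")
    return rng.choices(population, weights=fitnesses, k=k)

population = ["A", "B", "C", "D"]
fitnesses = [24, 23, 20, 11]   # illustrative raw scores

# Selection probability of each individual = its share of the total fitness
total = sum(fitnesses)
for ind, f in zip(population, fitnesses):
    print(ind, f, f"{f / total:.0%}")

parents = select_parents(population, fitnesses, k=4, rng=random.Random(0))
counts = Counter(select_parents(population, fitnesses, k=10000, rng=random.Random(0)))
print(parents, counts)
```

Weak individuals can still be picked, just less often, which keeps some diversity without mating every pair.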
Maybe. I'm on mobile too so I could just be representing my ideas incompletely
Yeah nah. I'm not choosing based on fitness scores in any way. Though I suppose it wouldn't be hard to implement. I was thinking of just mating all of them together so 1&2, 1&3, 1&4, 2&3, 2&4, 3&4. Make new population of best 4. Repeat
Nah it's not. Just how my professor explained how it worked.
But I guess at the same time GAs aren't not time consuming either.
Oh sure, they're basically a random search! But I was thinking your approach adds a lot of extra randomness and time/space requirements, while disregarding a big part of the "genetic" aspect
Yeah could be for sure. Definitely wouldn't hurt to add in percentage to be chosen
But, the Genetic class has no knowledge of the fitness function then?
That's all handled by the user?
Yeah
So how do you choose the best children after pairing all individuals?
for child in babies:
    if child.fit > worst in population:
        replace worst with child
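Put together, the "mate every pair, then replace the worst" scheme might look like this sketch. Single-point crossover and the bit-list encoding are assumptions made for illustration; the conversation doesn't say which crossover is used.

```python
import random
from itertools import combinations

def crossover(a, b, rng):
    """Single-point crossover of two equal-length gene lists, giving one child."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def next_generation(population, fitness, rng):
    """Mate every pair once; a child replaces the current worst member if it is fitter."""
    survivors = sorted(population, key=fitness)      # worst member first
    for p1, p2 in combinations(population, 2):       # 1&2, 1&3, ..., 3&4
        child = crossover(p1, p2, rng)
        # A real framework would cache scores instead of re-evaluating fitness
        if fitness(child) > fitness(survivors[0]):
            survivors[0] = child
            survivors.sort(key=fitness)              # keep the worst at index 0
    return survivors

rng = random.Random(0)
population = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1, 0],
]
new_population = next_generation(population, fitness=sum, rng=rng)
print(sorted(sum(ind) for ind in new_population))
```

Note that in each pair the first parent only ever contributes the head of the child and the second only the tail, so the pairing order limits which children are reachable.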
And child.fit is computed how? o:
The generator function would return a child object to the user
The user would score the child with their fitness function
Set child.fit equal to its result
gotcha
Then the generator function could, using the child's newly given fitness score, compute a new child object to supply
You said you pair the individuals [1,2,3,4] like this: [1,2], [1,3], [1,4], [2,3], [2,4], [3,4] right?
Ye
Does each pairing only generate one offspring?
Right now
But there's nothing making that not be the case later
But it seems that most implementations only generate one offspring
Indeed! But I think most implementations don't choose parents like this. In your case, if 1 is the individual [1111 1111], we can never conceive a child with 1111 in the second half
But one bigger thing I was thinking regarding that, is that you generate (N choose 2) children each generation leap, which the user has to test before settling on the N best ones, which become the next generation
Whereas in the figure up there, they generate N children each generation leap
Right
It's kinda hard to say which is correct I think. The % gets you fewer but possibly better guesses, but this way is a more complete sampling. Hard to say, both have their pros and cons
Sure! And I'm definitely no expert
But if we compare the complexities of f(x) = x choose 2 and g(x) = x we get this
So for x choose 2 to be a reasonable choice, then the best child must be very similar to a member of the original population
Whereas the blue line will quickly move through populations, finding descendants very different from the initial pop
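To put numbers on that comparison: N choose 2 equals N(N-1)/2, so the all-pairs scheme needs quadratically many fitness evaluations per generation while the proportional scheme needs N. A quick check with the standard library (the helper name is made up):

```python
from math import comb

def evaluations_per_generation(n):
    """Children to score per generation: all-pairs mating vs. N children."""
    return {"all_pairs": comb(n, 2), "n_children": n}

for n in (4, 10, 50, 100):
    print(n, evaluations_per_generation(n))
```

For a population of 100 that is 4950 evaluations per generation against 100, which matters when a single evaluation means training a network.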
Right
nerds! :D
math

I wonder if there's any justification for just random choice. Where each individual has 1/n choice of being a parent
That starts to approach beam search a bit imo
I was just thinking in terms of NNs on that where
Given certain problems you have to beam because small changes can give drastic results
Especially with breeding and activation functions
Sure, that's a pickle
ooh i like pickles :^)
Are you sure you want to have the activation function as part of the GA though? When I first heard you explaining the application of GAs in NNs i figured you were just going to evolve the weights
It can go both ways
Not gonna evolve the output function to keep the uh
Answer range the same
But everything else is fair game I think
You da boss
Right like obviously it makes no sense to score fitness on networks with sigmoid
And then suddenly breed a tanh answer
So your last dense layer would stay the same
But I think activation functions on the convolution layers can be helpful
Worth a shot! Have you tried training with it yet?
It's running over the weekend
I had some bugs to quash which I think have all been taken care of
Was gonna go into work today in a couple hours and see if it's still running over night
I feel like I'm missing something with scikit-learn by not knowing what it's capable of. Is there a resource showing all the advanced stuff it can do?
There are quite a few tutorials on stock price prediction with machine learning, but many of them are outdated or use the Quandl library, which only has stock data to March of this year. Does anyone know of any tutorials that don't have these problems?
note: if there was a tutorial that worked for actually getting a profitable model, everyone would be doing it
i had a group of friends work on a machine learning model for stock prediction for half a year, and they only got vaguely positive predictions that hypothetically made them money but were never tested live...
they couldn't find data without paying big bucks, either.
Yeah, anything that's real-time or any useful aggregate analytics is often behind a paywall
@feral lodge, check this out:
@small pumice DM me for data if it is only for testing your code and learning. I do not know how credible the data is though
I like that Joseph
I think I liked the vertical timeline more but the round data has a cool feel to it
Donut chart
Machine learning (and possibly human civilisation) has peaked: a paper describing an algorithm that can automatically generate internet memes. Oh, and the paper is called "Dank Learning". via @samfreis https://t.co/btyqNumBfi
So the round chart and the timeline are actually representing the data differently
My project this semester is to write some type of technical manuscript. Going to wind up doing a tutorial / user's manual for Keras. Is there any interest in that being posted here? Ideally it'll cover types of networks / optimizations / when to use techniques such as forwards / backwards propagation, etc.
Maybe send it to the Keras developers?
I'm sure they'd love to add it to their documentation
Maybe. I just HATED having to have 5-10 tabs open while researching at the start
and found no concrete starting point
every tutorial is the same copy pasted MNIST flower petal model with a sentence variation
Definitely send it to them then
Depending on how their docs are done, you could just take pieces of content from whatever you make and add it, then submit a pull request or something
The bane of all software projects is lack of good documentation
and no users
Often because they have no idea how to use the awesome thing you built and don't have the time or desire to retrace your steps to figure out how it works
True
People will also like a quick reference guide for switching from one ML module to the other
Yeah. It'll all be in Keras scope though, because unless you're doing natural language modeling, which the Azure platform is set to excel at, there's not a huge reason to use like PyTorch here, Keras there, etc. to my understanding of it
Aren't the Keras docs easy enough to use? I didn't have any trouble for the most part
Yeah the docs are nice but piecing together the docs into a coherent structure was :puke: cuz there weren't like any nice tutorials
except the datasets ones that are used everywhere
they're more geared towards people with some type of knowledge about ML in general, not for newbies
IMO ML shouldn't be like you can pick a library, learn it and use it. You should understand some of the principles and ideas before being able to use them, so in a way the current state of resources is good to filter out overenthusiastic, hardly working entrants to this field
I dunno. I guess in theory that's not wrong, but there's value in being an entrant building something and understanding it
even if you don't know the inner workings behind it
the cycle should be learn-->understand-->build, you don't need inner-workings-level understanding but at least the surface level, so that you would know where else a certain ML concept might be applied to
I need a data set of headshots that are larger than 64x64. Anyone have any ideas?
Ideally I'd like 400x400 or larger
I might scrape insta for them
No idea, maybe there's one here: https://www.kairos.com/blog/60-facial-recognition-databases
Checked all those. Using bits and pieces from like 4 different ones there
Thinking scraping insta or Facebook might be best
But idk if that's okay to use in research
http://discovery.cs.wayne.edu/lab_website/index.php/lsdl/
Probably have to cite this paper if you use: https://arxiv.org/pdf/1706.08690.pdf
Reminds me of an industrious little company called cambridge analytica
5 million faces. Machine learning intensifies.
Computing gradients of all of those will take a month though. Lol.
Probably worth
No problem, looks like half of them are obama's face
Worth. That works since half my deepfake is trump
Dlib is such a great library
Nah. It's just quick to double check a dataset and make sure the images have faces
Cool! Wonder if it's good enough to detect on images like these
aw ye
Go green
lmao
Oh snap
Was that on those obscured faces?
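import glob
import os
import dlib
from skimage import io
face_dir = "/faces/dir"
detector = dlib.get_frontal_face_detector()
correct = 0
total = 0
# os.path.join adds the separator that dir + "*.jpg" was missing
for path in glob.glob(os.path.join(face_dir, "*.jpg")):
    img = io.imread(path)
    total += 1
    # Second argument: how many times to upsample the image before detecting
    faces = detector(img, 1)
    if len(faces) > 0:
        correct += 1
print("Found {} correct faces out of {} total images".format(correct, total))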
yeah
No wonder
25% isn't terrible when you can literally only see half the face
Maybe you should give parts of the face as training sets
I'm just using dlib's default model
because slandon was curious if it works
enlarging the images gave about 10% more correct
Found 3636 correct faces out of 10049 total images
I am just throwing random thoughts
Yeah I'm just messing around with things while I keep tweaking my research project
<@&267628507062992896> Worth pinning the code-block above
xd
I upped the resampling to 10 to see if it changed anything and it's taken like 20 minutes to run
40 minutes still going strong
no idea why we would pin that.
same
same
Well, if someone is trying to write their own image recognition, then that code above will serve as an evaluation standard to measure your own code against
@feral lodge set it up to check every resampling rate dlib offers, seems to get about 10% better each sampling. didn't have time to do 10 fully, but 0 was 1100, 1 was 2200, etc will report tomorrow with exact results. it looks promising though
not that it matters for anything but it's cool none the less
It also works on the CPU so it won't impact my Keras training xd
Hey all - I'm trying to do two subplots - each being a LineCollection - in matplotlib. Just to make it easy since I am not providing the underlying data, here's the function I created to make the first line collection - I only need to duplicate this so that there are two of them side by side.
'x2': Xia['Finish']})
segs = np.zeros((len(df_lines), 2,2))
segs[:,:,1] = df_lines[["x1","x2"]].values
fig, ax = plt.subplots(figsize=(3,20))
colors = [mcolors.to_rgba(c)
for c in plt.rcParams['axes.prop_cycle'].by_key()['color']]
line_segments = LineCollection(segs, colors=colors, linewidths=7)
ax.add_collection(line_segments)
ax.set_ylim(-1,1)
plt.title('Xia Dynasty', fontsize = '25')
plt.ylabel('Year', fontsize = '20')
plt.yticks(fontsize = '15')
plt.xticks(range(len(begin)), "")
plt.ylim(-2230, -1750)
plt.xlim(-.3,1)
for i in range(18):
    plt.text(.1, begin.iloc[i] + length.iloc[i]/2, event.iloc[i], ha='left', fontsize = '14', rotation=0)
plt.gca().invert_yaxis()
plt.show()
fig.savefig('xiadynasty.png', dpi=100)```
code blocks
I've tried this solution, and I get an error about the image being too big to create.
begin = Xia['Start']
end = Xia['Finish']
length = Xia['Length']
event2 = XiaXSZCP['Dynasty']
begin2 = XiaXSZCP['Start']
end2 = XiaXSZCP['Finish']
length2 = XiaXSZCP['Length']
df_lines = pd.DataFrame({'y1': Xia['Start'],
'y2': Xia['Finish']})
df_lines2 = pd.DataFrame({'y1': XiaXSZCP['Start'],
'y2': XiaXSZCP['Finish']})
segs = np.zeros((len(df_lines), 2,2))
segs[:,:,1] = df_lines[["y1","y2"]].values
segs2 = np.zeros((len(df_lines2), 2,2))
segs2[:,:,1] = df_lines2[["y1","y2"]].values
colors = [mcolors.to_rgba(c)
for c in plt.rcParams['axes.prop_cycle'].by_key()['color']]
plt.subplots(figsize=(3,6))
ax1 = plt.subplot(1,2,1)
line_segments = LineCollection(segs, colors=colors, linewidths=7)
ax1.add_collection(line_segments)
for i in range(18):
    plt.text(.1, begin.iloc[i] + length.iloc[i]/2, event.iloc[i], ha='left', fontsize = '12', rotation=0)
plt.title('Xia Dynasty', fontsize = '25')
ax2 = plt.subplot(1,2,2)
line_segments2 = LineCollection(segs2, colors=colors, linewidths=7)
ax2.add_collection(line_segments2)
for i in range(1):
    plt.text(.1, begin2.iloc[i] + length2.iloc[i]/2, event2.iloc[i], ha='left', fontsize = '12', rotation=0)
plt.gca().invert_yaxis()
plt.show()
fig.savefig('xiadynasty.png', dpi=100)```
I should really get around to learning matplotlib so I can help with these
have you tried lowering your dpi maybe? or does the error occur sooner than that
It's sooner - removing it or limiting it to a tiny amount still spits out a "this is way too big" error. Here's the error message when dpi = 100
<Figure size 216x432 with 2 Axes>```
That's not a dpi problem
Yeah was just a quick troubleshooting thing to be sure :p
Sorry, I don't know matplotlib, wish I could help more
Maybe a list is too big, or a loop doesn't have an end condition
Stack seems to think that it could be a stray text coordinate. Make sure they're all being given within the bounds of the image. but I'm not sure about MPL either, @feral lodge normally handles these questions hahaha
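One way to sidestep a stray text coordinate is to draw each timeline through a helper that sets the axis limits itself. In the two-subplot attempt above the limits are never set, so text placed at y around -2000 may land far outside the default 0 to 1 range; that is a plausible cause of the "too big" error rather than a confirmed one. The sketch below is a guess at the intended layout. The `timeline` helper, the years, and the labels are placeholders, not the real Xia data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection

def timeline(ax, starts, ends, labels, title):
    """Draw one vertical segment per entry at x=0 and label it at its midpoint."""
    starts = np.asarray(starts, dtype=float)
    ends = np.asarray(ends, dtype=float)
    segs = np.zeros((len(starts), 2, 2))
    segs[:, 0, 1] = starts   # y of each segment's first point
    segs[:, 1, 1] = ends     # y of each segment's second point
    colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
    ax.add_collection(LineCollection(segs, colors=colors, linewidths=7))
    for s, e, label in zip(starts, ends, labels):
        ax.text(0.1, (s + e) / 2, label, ha="left", va="center")
    # Set the limits explicitly so every label falls inside the axes
    ax.set_xlim(-0.3, 1)
    ax.set_ylim(starts.min() - 10, ends.max() + 10)
    ax.invert_yaxis()
    ax.set_xticks([])
    ax.set_title(title)

# Placeholder data; substitute the real Start/Finish/Dynasty columns
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(6, 6))
timeline(ax1, [-2200, -2150, -2100], [-2150, -2100, -2050],
         ["Entry 1", "Entry 2", "Entry 3"], "Left timeline")
timeline(ax2, [-2200], [-2050], ["Whole span"], "Right timeline")
fig.canvas.draw()  # renders the figure in memory
```

Calling `fig.savefig(...)` afterwards writes the image, and because `fig` comes from this `plt.subplots` call it refers to the figure that was just drawn.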


