#data-science-and-ml


feral lodge
#

Lemme have a quick look

#

Soon sleep time for me though, heads up

placid snow
#

Returns the coefficient of determination R^2 of the prediction

feral lodge
#

Ah I see! Looks like you can choose the error measure yourself

#

Then hold on, let me reread your older messages

placid snow
#

I use the built-in .score methods of the model classes

#
```
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
```
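For scikit-learn regressors, `.score` returns R² = 1 − SS_res / SS_tot (classifiers return accuracy instead). A minimal numpy sketch of that formula, with made-up toy values, shows why a score can even be negative: a model that predicts worse than always guessing the mean of `y_test` gets R² < 0. The helper name `r2_score_manual` is hypothetical.

```python
import numpy as np

def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # spread of y around its own mean
    return 1.0 - ss_res / ss_tot

y_test = np.array([0.40, 0.45, 0.50, 0.55])          # toy targets
print(r2_score_manual(y_test, y_test))                     # perfect predictions
print(r2_score_manual(y_test, np.full(4, y_test.mean())))  # always predicting the mean
print(r2_score_manual(y_test, y_test[::-1]))               # worse than the mean: negative
```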
feral lodge
#

Aight, so you can ignore my error measure up there then

#

I've pretty much never used R² though

#

But anyway, so you just average your score over your 100 experiments then? Sounds fine

placid snow
#

Yeah, avg, min and max
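A minimal sketch of the "100 random splits, report avg/min/max" scheme, assuming a plain straight-line fit (`numpy.polyfit`) stands in for whatever model is actually used. The data is synthetic and `repeated_split_scores` is a hypothetical helper.

```python
import numpy as np

def repeated_split_scores(x, y, n_runs=100, test_frac=0.2, seed=0):
    """Fit y = k*x + m on random train/test splits; return avg, min and max test R^2."""
    rng = np.random.default_rng(seed)
    n_test = int(len(x) * test_frac)
    scores = []
    for _ in range(n_runs):
        idx = rng.permutation(len(x))               # fresh random shuffle each run
        test, train = idx[:n_test], idx[n_test:]
        k, m = np.polyfit(x[train], y[train], deg=1)
        pred = k * x[test] + m
        ss_res = np.sum((y[test] - pred) ** 2)
        ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
        scores.append(1.0 - ss_res / ss_tot)
    scores = np.array(scores)
    return scores.mean(), scores.min(), scores.max()

# Synthetic, roughly linear data purely for illustration
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 1.0 + rng.normal(0, 1.0, size=200)
avg, lo, hi = repeated_split_scores(x, y)
print(f"avg={avg:.3f} min={lo:.3f} max={hi:.3f}")
```

Some splits will be unlucky and some lucky, which is exactly why the min and max are worth reporting alongside the average.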

feral lodge
#

👌

#

Then just have a go at changing the data to a set of integers 0-17 and try fitting the model again, evaluating the score just like before

placid snow
#

Should I stick to manual data splitting?

feral lodge
#

Trust the machine 🤖👌

#

Just do random draws or something to split your data into training/testing sets

#

Sometimes they'll be bad, sometimes good, c'est la vie

#

If the model is good then the average score will reflect that

placid snow
#

Hm

#

I guess

#

Thank you! Imma go back to tinkering the knobs then

#

Before I go GWcmeisterPeepoE

feral lodge
#

The "legit" way to split your data over a series of experiments using a limited data set is cross-validation, however: https://en.wikipedia.org/wiki/Cross-validation_(statistics)

Cross-validation, sometimes called rotation estimation, or out-of-sample testing is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the g...

#

Definitely have a look at that; I think I saw something in the docs about it
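In k-fold cross-validation the data is shuffled once and cut into k folds; each fold is the test set exactly once while the remaining k-1 folds train the model, and the k scores are averaged. A minimal numpy sketch of the index bookkeeping (`kfold_indices` is a hypothetical helper; in practice scikit-learn's `KFold` and `cross_val_score` in `sklearn.model_selection` do this for you):

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample lands in exactly one test fold."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)                  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

for train, test in kfold_indices(10, k=5):
    print(len(train), len(test))
```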

placid snow
#

Since the dataset has 6 * 3 features, I can see us getting 18 combos

#

But what do I do with data points that share features?

feral lodge
#

🤔 I don't understand

placid snow
#
```
M,trinervia,0.4627
M,trinervia,0.4135
```
#

Etc

hollow lantern
#

avg?

#

😃

placid snow
#

just give both the same category value and keep it as is?

feral lodge
#

Oh

#

Yes, that will be "translated" to

15, 0.4627
15, 0.4135

Or whatever integer you assign that combo
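A minimal sketch of that translation, assuming rows shaped like the CSV snippet above (level, species, weight). Only the first two rows come from the chat; the "H" row is hypothetical. Each new combo gets the next free integer and repeated combos reuse it.

```python
# Rows in the same shape as the CSV snippet: (level, species, weight); "H" row is made up
rows = [
    ("M", "trinervia", 0.4627),
    ("M", "trinervia", 0.4135),
    ("H", "trinervia", 0.5012),
]

combo_to_code = {}
encoded = []
for level, species, weight in rows:
    # First time a combo appears it gets the next free integer; repeats reuse it
    code = combo_to_code.setdefault((level, species), len(combo_to_code))
    encoded.append((code, weight))

print(encoded)  # [(0, 0.4627), (0, 0.4135), (1, 0.5012)]
```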

placid snow
#

alright

#

Just wanted to make sure peepoSmile

feral lodge
#

There's nothing weird about that just so you know. A data set will always be a flawed representation of reality

#

Say we wanted to predict the weight of a human being, using the features "nationality" and "gender" for instance

#

Certainly we can have two swedish men with different weights

placid snow
#

Yeah, you're right

#

Just had one of those "but what if bad things happen" moments

feral lodge
#

The issue is that we lack features, such as "profession", "salary", etc. Those would help us, and they exist IRL -- but we don't have access to them

#

They are so-called "latent features" https://en.wikipedia.org/wiki/Latent_variable

In statistics, latent variables (from Latin: present participle of lateo (“lie hidden”), as opposed to observable variables), are variables that are not directly observed but are rather inferred (through a mathematical model) from other variables that are observed (direct...

placid snow
#

I see

feral lodge
#

As you do more ML you'll encounter this more often. If you plot the weight of all swedish men, for instance, you would quickly see groups, or clusters, forming

#

Why these clusters arise is because of latent features. Maybe most of the men in one cluster were recently divorced, or something

#

No way to know, but we can at least find the clusters and make educated guesses
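To make "finding the clusters" concrete, here is a minimal sketch of k-means (one common clustering algorithm, swapped in for illustration; `sklearn.cluster.KMeans` is the usual library version) on synthetic 1-D weights drawn from two made-up groups. `kmeans_1d` is a hypothetical helper.

```python
import numpy as np

def kmeans_1d(values, k=2, n_iter=20, seed=0):
    """Plain Lloyd's k-means on 1-D data: alternate assignment and centroid update."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    centers = rng.choice(values, size=k, replace=False)   # start from k random points
    for _ in range(n_iter):
        # Assign every value to its nearest center
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        # Move each center to the mean of its members (keep it if the cluster is empty)
        centers = np.array([values[labels == c].mean() if np.any(labels == c) else centers[c]
                            for c in range(k)])
    return np.sort(centers)

# Synthetic weights (kg) from two made-up groups, purely for illustration
rng = np.random.default_rng(1)
weights = np.concatenate([rng.normal(72, 3, 100), rng.normal(95, 3, 100)])
centers = kmeans_1d(weights, k=2)
print(centers)
```

The algorithm only finds the groups; explaining *why* they exist (the latent feature) is still up to us.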

hollow lantern
#

that probably would make you eat more

feral lodge
#

exactly

#

Finding clusters like this is the main problem in unsupervised learning -- finding patterns that arise due to latent features https://en.wikipedia.org/wiki/Unsupervised_learning

Unsupervised machine learning is the machine learning task of inferring a function to describe hidden structure from "unlabeled" data (a classification or categorization is not included in the observations). Since the examples given to the learner are unlabeled, there is no e...

#

That's a tangent though, not directly related to your problem 😃

#

The point is that without knowing these latent features, we will always see variation in the data

#

Of course, if we had ALL conceivable data, we would have 0 variation

hollow lantern
#

good explanations 👍🏽

feral lodge
#

Cheers 😄

#

Good night friendo

hollow lantern
#

good night

placid snow
#

๐Ÿ‘๐Ÿพ

small ore
#

Wow. I learnt a lot today. Thanks to you guys. Esp Slandon

fathom current
#

I want to be able to train a program to identify an object in a picture

#

By feeding it pictures of what I want it to identify

#

Are Keras and TensorFlow the tools to do that?

feral lodge
#

Yes, those will work! PyTorch is another library that I find easier. You'll want to have a look at convolutional neural networks
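The core operation of a convolutional layer is sliding a small kernel over the image and taking a weighted sum at each position. A minimal numpy sketch with a hand-made vertical-edge kernel; in a real CNN the kernel values are learned during training, and `conv2d_valid` is a hypothetical helper (like most deep learning libraries, it computes cross-correlation, i.e. no kernel flip).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over a 2-D `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 5x5 image: dark left part, bright right part
image = np.zeros((5, 5))
image[:, 3:] = 1.0
vertical_edge = np.array([[-1.0, 0.0, 1.0]] * 3)   # responds where brightness rises left-to-right
feature_map = conv2d_valid(image, vertical_edge)
print(feature_map)
```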

#

Hope you have a lot of training data :)

feral lodge
#

@small ore By the way, after waking up I suddenly realized what you meant by "Is arbitrary numbering a good input training set?", and the answer is no. If we number the combinations 0-17 and regress on those, we should first sort the combinations by, say, average weight, so that combination 5 corresponds to a higher weight than combination 4, etc. Otherwise there will be no correlation between the input and the weight, and our linear estimate 'y = kx + m' will fail miserably, which I suspect it did. Completely slipped my mind yesterday 😖
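A minimal sketch of "sort the combinations by average weight before numbering them", assuming rows of (level, species, weight). The rows and the "L"/"H" levels are hypothetical. One caveat worth adding: the per-combo means should be computed on the training split only, otherwise test targets leak into the features.

```python
from collections import defaultdict

def ordinal_codes_by_mean(rows):
    """Map each (level, species) combo to an integer rank ordered by its mean weight."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for level, species, weight in rows:
        sums[(level, species)] += weight
        counts[(level, species)] += 1
    means = {combo: sums[combo] / counts[combo] for combo in sums}
    # Lowest mean weight gets 0, the next gets 1, ... so the code increases with weight
    ranked = sorted(means, key=means.get)
    return {combo: rank for rank, combo in enumerate(ranked)}

# Hypothetical training rows; only the shape mirrors the CSV snippet earlier in the chat
rows = [("M", "trinervia", 0.46), ("M", "trinervia", 0.41),
        ("L", "trinervia", 0.20), ("H", "trinervia", 0.70)]
codes = ordinal_codes_by_mean(rows)
print(codes)
```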

placid snow
#

I tried categorizing the combos of level and species btw

#

Didn't really see any improvements FeelsSadMan

feral lodge
#

Did you try sorting the list of combinations by average weight before assigning the new label?

#

If we don't, there won't be any correlation between the label and the output

#

But even then, in the best case, your regression will just be another way of immediately mapping a combination to its average weight :/

#

It's such a weird assignment tbh :/

placid snow
#

Yup

#

It was even stated that the highest score among all hand-ins would be the top grade for the assignment, so if nobody got higher than -0.5, that would be an A

feral lodge
#

They pit you against each other like fighting roosters 🤔 Is the data available online?

placid snow
#

Not that im aware of

#

We just got a csv with it

#

But i think I've found a solution imma stick to

fathom current
#

So to make a model that can distinguish between, let's say, different fish, I would need thousands of pictures of each fish I want it to recognise?

feral lodge
#

Quite possible!

placid snow
#

The more the merrier

feral lodge
fathom current
#

Thnx

feral lodge
#

no problemo 🐠

velvet anchor
#

Any ideas on how to go about effectively knowing what to fiddle with to help a network train? Right now I have a network with about 400,000 images in the training set across two classes that are very similar (photoshopped vs. not photoshopped), essentially of the same category, and it's having a hard time not overfitting to one class or the other. Do I reduce data? Change activation functions? Scale the images larger? All of the above?

small ore
#

@feral lodge Sorting would indeed make it better, but arbitrary numbers still don't make much sense to me. It's like using a step of 1 to distinguish between species, while the actually meaningful things may not have any reasonable correlation with that function. One-hot encoding seems better, especially since there isn't much data.

#

The number of dimensions is also reasonable
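A minimal sketch of one-hot encoding each feature separately and concatenating, assuming 6 distinct levels and 3 distinct species (the labels below are hypothetical placeholders). It shows where the "nine dimensions" discussed a bit later come from: 6 level columns plus 3 species columns.

```python
import numpy as np

def one_hot(values):
    """One column per distinct value; each row has a single 1 in its value's column."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(values), len(categories)))
    for row, v in enumerate(values):
        out[row, index[v]] = 1.0
    return out, categories

# Hypothetical: every combination of 6 levels and 3 species, once each
levels = ["L1", "L2", "L3", "L4", "L5", "L6"] * 3
species = ["sp_a"] * 6 + ["sp_b"] * 6 + ["sp_c"] * 6
X_level, _ = one_hot(levels)
X_species, _ = one_hot(species)
X = np.hstack([X_level, X_species])
print(X.shape)  # (18, 9): 6 level columns + 3 species columns
```

When the regression also has an intercept, one column per feature is redundant; `pandas.get_dummies(..., drop_first=True)` is a common way to drop it.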

#

@fathom current From what little I have read, don't apply your recognition code directly to the original image. Dumb it down (greyscale, etc., and perhaps other simplifying masks) so the problem can be solved in reasonable time.
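Converting to greyscale cuts the input from three channels to one. A minimal numpy sketch using the ITU-R BT.601 luma weights (the same weights Pillow documents for `convert("L")`); the random array is only a stand-in for a real photo, and `to_greyscale` is a hypothetical helper.

```python
import numpy as np

def to_greyscale(rgb):
    """Collapse an (H, W, 3) RGB array to (H, W) luma using BT.601 weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb[..., :3] @ weights        # weighted sum over the channel axis

img = np.random.default_rng(0).random((4, 6, 3))   # stand-in for a real photo
grey = to_greyscale(img)
print(img.shape, "->", grey.shape)
```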

fathom current
#

sadly there's an app already that does everything i was considering doing

small ore
#

Common sense even says that things like outlines may be sufficient to determine what these are (not always).

fathom current
#

so I may not be doing any machine learning stuff any time soon 😦

#

back to the drawing board

#

but thank you for the info

small ore
#

Google Photos is one app I can think of that doesn't need a great many photos to recognize faces and search images by face. Maybe it incrementally improves the prediction model for each face as you add more photos to its database. And going by its speed, it perhaps dumbs down images a lot and uses only a few bits of information to build its indexes. Recently someone I know was impressed when it detected a childhood photo of theirs based on a few dozen of their adulthood photos.

feral lodge
#

@small ore I won't defend it, since I definitely agree with you that it's a very bad representation of the data. 😄 It enforces some strange stuff like a uniform distance of 1 between the data points, which is almost certainly false. But I also think we should be very skeptical of a representation that requires fitting in nine dimensions despite originally having only two features 😕 In the end I think neither will work well -- the arbitrary-number representation because it simplifies and assumes too much, and the one-hot representation because it's too high-dimensional -- because linear regression simply is a poor model for the data

feral lodge
#

@velvet anchor How different are the photoshopped and real versions of the images?

#

If the data is like, half children's sketches of animals and half photographs of animals it might be better to first detect whether or not it's a sketch and then use one of two convolutional networks to classify which animal it is

#

However if they're fairly similar it's probably better to preprocess the images, keeping only black-and-white outlines, and train on those
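A minimal sketch of "keep only black-and-white outlines": threshold the gradient magnitude of a greyscale image. The threshold is an arbitrary assumption to tune per dataset, `outline` is a hypothetical helper, and dedicated edge detectors such as OpenCV's `cv2.Canny` are the more robust option.

```python
import numpy as np

def outline(grey, threshold=0.5):
    """Binary edge map: 1 where the brightness gradient magnitude exceeds `threshold`."""
    gy, gx = np.gradient(np.asarray(grey, dtype=float))   # finite differences per axis
    magnitude = np.hypot(gx, gy)
    return (magnitude > threshold).astype(np.uint8)

grey = np.zeros((6, 6))
grey[2:4, 2:4] = 1.0              # a bright square on a dark background
edges = outline(grey, threshold=0.25)
print(edges)
```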

#

Also, in the arXiv paper I linked above, they do some preprocessing on real-life fish images (page 3). If your photoshopped images are realistic-looking, I imagine this preprocessing may produce similar results for them as for real-life images

#

Those guys also saw big performance jumps when switching between activation functions, since you mentioned those

polar acorn
#

So I'm trying out some algorithms on this classification problem, but the data cleaning is a hassle. I usually just structure everything in one long script 😔. I have several data sources that need tying together, and each source has several separate datasets for certain time periods. In addition, each dataset for each source has meta info like time offset etc. Would it be wise to create a class for handling all the data sources for each time period, with some function that returns a workable dataset?

#

And in the case that I want to combine several of these time-period datasets into a larger dataset: should this also be a class, or should I just have a function that iterates over some IDs, creates instances of the class, calls the desired output and merges the results?
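One reasonable structure (a sketch, not the only answer): a small class owns one time period and its per-source metadata, and combining periods stays a plain function, which matches the second option in the question. All names, the (timestamp, value) record layout and the offset handling below are hypothetical; real code would more likely hold pandas DataFrames.

```python
from dataclasses import dataclass, field

@dataclass
class PeriodData:
    """All sources for one time period, plus per-source metadata such as a time offset."""
    period_id: str
    sources: dict = field(default_factory=dict)        # source name -> list of (t, value)
    time_offsets: dict = field(default_factory=dict)   # source name -> offset added to t

    def workable(self):
        """Shift every source onto a shared clock and return one sorted list of rows."""
        rows = []
        for name, records in self.sources.items():
            offset = self.time_offsets.get(name, 0)
            rows.extend((t + offset, name, value) for t, value in records)
        return sorted(rows)

def combine_periods(periods):
    """Build the workable set for each period and concatenate them, tagged by period."""
    combined = []
    for period in periods:
        combined.extend((period.period_id, *row) for row in period.workable())
    return combined

p1 = PeriodData("2019-01", {"sensor_a": [(0, 1.5)], "sensor_b": [(0, 2.0)]}, {"sensor_b": 5})
p2 = PeriodData("2019-02", {"sensor_a": [(1, 1.7)]})
merged = combine_periods([p1, p2])
print(merged)
```

Keeping the per-period cleaning inside the class means the merge function never needs to know about offsets or source quirks.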

astral harbor
#

could this motherboard be used to build a gpu-based machine learning supercomputer? or is pci-e x1 over usb too limiting? https://edgeup.asus.com/2018/asus-h370-mining-master-20-gpus-one-motherboard-pcie-over-usb/

velvet anchor
#

@feral lodge not very

#

Well kinda. They’re deepfake images

#

So only a small portion has been touched

feral lodge
#

Ah

#

Is it humans?

velvet anchor
#

Yes

feral lodge
#

Tricky 🤔

velvet anchor
#

Right now I'm set up with 7 convolutional layers in a binary classification problem

feral lodge
#

You just feed the images without preprocessing?

velvet anchor
#

I've done a ton of preprocessing

#

Been working at this problem for like 4 months 😛

feral lodge
#

Aw shucks 🤔

velvet anchor
#

I've tried RGB, greyscale, a gradient (a self-created technique of drawing vectors of brightness change within the image), etc. etc.

#

all the different activation functions, more layers, less layers, etc etc

feral lodge
#

I know there are entropy-based algorithms for detecting tampering with images

#

But that's like for detecting photoshop editing

#

Interesting but not really my area. I'll see if I can find some papers

#

Good project though, lord knows we'll need to be able to detect deepfake media in the near future

velvet anchor
#

Yeah it's a research project with one of my professors and it's just a 2 man team with a limited GPU

#

I'm having to run a batch size of 2 @ 250x250 to even train networks lol

#

It's a super interesting problem because

feral lodge
velvet anchor
#

you're essentially trying to detect the noise within an image

#

but that gets lost in training a lot of time

#

Cloud-based isn't really an option because of pricing, but a new GPU may be in the department's future soonish

feral lodge
#

Aight 🤔

#

Looks like it detects copy-and-paste homemade fake images 😄

velvet anchor
#

Yeah

feral lodge
velvet anchor
#

And it may turn out that what we’re trying to do is impossible given the team size (basically only me) and the hardware. That’s okay too. I just don’t want it to be the case

#

I’ve tried every combination of settings though. I have a master Python script that generates models and tests accuracy, and it’s either always 1 or always 0 over the test set.

hasty maple
#

Have you tried selu activation and alpha dropout?

velvet anchor
#

Yes on selu no on alpha dropout

hasty maple
#

try alpha dropout after selu, also have you tried Bayesian search for your parameters?
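Why SELU and alpha dropout are paired: SELU's two constants (from Klambauer et al., 2017, "Self-Normalizing Neural Networks") are chosen so that standard-normal inputs come out with mean about 0 and variance about 1, and alpha dropout is designed to keep those statistics intact where regular dropout would not. A minimal numpy sketch of the activation; in Keras the pieces are `activation="selu"` and `keras.layers.AlphaDropout`, usually with `kernel_initializer="lecun_normal"`.

```python
import numpy as np

SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    """Scaled exponential linear unit, applied elementwise."""
    x = np.asarray(x, dtype=float)
    return SELU_SCALE * np.where(x > 0, x, SELU_ALPHA * np.expm1(x))

# Standard-normal inputs: outputs should stay near mean 0 and variance 1
z = np.random.default_rng(0).standard_normal(1_000_000)
out = selu(z)
print(round(out.mean(), 2), round(out.var(), 2))
```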

velvet anchor
#

Yes on Bayesian

hasty maple
#

what about other dropouts, batch normalization?

velvet anchor
#

like keras.layers.BatchNormalization?

#

and for dropouts, right now I'm dropping out 0.25 after the 3rd layer, then 0.1 after the 5th

#

I'd post my code, but I'm not at the workstation right now to have access to it

hasty maple
#

no github?

velvet anchor
#

Nah, I only work on it when I'm getting paid, so I didn't upload it remotely, to remove temptation 😛

hasty maple
#

ayy

velvet anchor
#

Is there a better ML framework than Keras for images? Like one thats easier to use?

#

I like Keras because you can just feed it np.arrays, but if there's another I'm open to switching

hasty maple
#

PyTorch, TensorFlow

velvet anchor
#

Might look at PyTorch, but I'm also more well versed in Keras. Probably not worth trying to replace 4 months of Keras knowledge

feral lodge
#

When you say you did Bayesian search, does that mean variational inference?

velvet anchor
#

Maybe, any links for exactly what you're referring to to be sure?

feral lodge
#

https://arxiv.org/pdf/1506.02158.pdf Like this one for instance. Instead of producing point estimates of the weights, place a prior distribution over them, and train to compute the posterior

#

Sort of in its infancy and limited by computation power and bias due to choice of prior, but in theory good against overfitting
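The idea of "a distribution over the weights instead of a point estimate" is easiest to see in the one case with an exact answer: linear regression with a Gaussian prior N(0, I/alpha) and Gaussian noise of precision beta, where the posterior is Gaussian in closed form. This is a swapped-in, much simpler setting than the dropout-based variational approximation for CNNs in the linked paper; the data and the alpha/beta values are assumptions for illustration.

```python
import numpy as np

def bayes_linreg_posterior(X, y, alpha=1.0, beta=25.0):
    """Exact Gaussian posterior over weights for prior N(0, I/alpha), noise precision beta."""
    d = X.shape[1]
    S = np.linalg.inv(alpha * np.eye(d) + beta * X.T @ X)   # posterior covariance
    m = beta * S @ X.T @ y                                   # posterior mean
    return m, S

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=30)
X = np.column_stack([np.ones_like(x), x])            # intercept + slope features
y = 0.5 + 2.0 * x + rng.normal(0, 0.2, size=30)      # synthetic; noise sd 0.2 -> beta = 25
m, S = bayes_linreg_posterior(X, y)
print("posterior mean:", m.round(2))
print("posterior std :", np.sqrt(np.diag(S)).round(3))
```

The posterior standard deviations say how unsure the model still is about each weight; with more data they shrink, which is the built-in guard against overconfident fits.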

velvet anchor
#

Might work if I had more hardware

#

but our machine has a 970 I think

velvet anchor
#

here's what I'm working with in my current iteration, just trying distinct activation functions to see what changes

#

The epochs and stuff are low for rapid testing just to see results on a model, i up them to something reasonable if I get promising results

hasty maple
#

have bigger kernels at the start and reduce them as you go deeper in the conv2d layers

velvet anchor
#

Tried that combination as well, tried at one point going from like 64 to 2

hasty maple
#

they should gradually decrease, not decrease at once

velvet anchor
#

Yeah I did

#

I stepped like 64 -> 58 -> ... -> 2 at one point

#

I've just been messing around with this variation for a bit

#

Examples from training images are like

hasty maple
#

All I ever did with images was on MNIST. My experiments showed that having the number of kernels decrease slowly and the window size gradually decrease helped. Sandwich dropouts/batch norms as much as possible and they'd do well

velvet anchor
#

One of the big problems I found with using large kernels and stuff was overfitting

#

Because the images are so close to a real image

#

Like that

hasty maple
#

Damn, that looks difficult to tell it's a fake.

velvet anchor
#

Exactly

#

Sometimes they're more obvious like this

#

And then there's these, which are basically impossible

hasty maple
#

It's pretty much impossible to solve lol. I think natural images have some sort of static noise in them, maybe artificially created ones don't, you could try to extract that as a feature and feed it to the network.
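A minimal sketch of extracting such a noise feature: subtract a local box blur from the image so that only high-frequency content (sensor-like noise, sharp edges) remains. The filter size and synthetic images are assumptions, `noise_residual` is a hypothetical helper, and image-forensics work uses more elaborate residual filters than this.

```python
import numpy as np

def noise_residual(grey, size=3):
    """High-pass residual: the image minus a size x size box blur of itself."""
    g = np.asarray(grey, dtype=float)
    pad = size // 2
    padded = np.pad(g, pad, mode="edge")      # repeat border pixels so shapes line up
    blurred = np.zeros_like(g)
    for dy in range(size):
        for dx in range(size):
            blurred += padded[dy:dy + g.shape[0], dx:dx + g.shape[1]]
    blurred /= size * size
    return g - blurred

rng = np.random.default_rng(0)
smooth = np.tile(np.linspace(0, 1, 32), (32, 1))     # smooth ramp, no noise
noisy = smooth + rng.normal(0, 0.05, smooth.shape)   # same ramp plus sensor-like noise
r_smooth, r_noisy = noise_residual(smooth), noise_residual(noisy)
print(r_smooth.std(), r_noisy.std())
```

The residual could be fed to the network as an extra input channel alongside (or instead of) the raw pixels.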

velvet anchor
#

That's what I tried to do by creating a gradient to measure light

grave axle
#

Hey guys! Just wanted to drop in and ask if there's a relevant channel for python for finance purposes?

#

I have an ongoing project that involves some quant work with Python, so if there's someone who's good with that please tag them here, or tag a relevant channel, as I couldn't find one

spark summit
#

@velvet anchor and @hasty maple you might find some interest in research around caricatures

#

the human mind abstracts things in ways that we have a very difficult time reproducing in AI

small ore
#

@grave axle I would be interested to learn about the same topic too. If you learn about a server/channel or come across/know of any material, please let me know

fresh otter
#

Anyone here knowledgeable in Keras? Specifically multilabel class prediction

#

Getting some strange results from predict_generator

velvet anchor
#

I'm a bit of a Keras noob, but it's what I've been using for my project

fresh otter
#

predict_generator is returning incorrect values i believe

#

I have a model trained on ~1 million images

#

Number of classes is 228

#
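```
import math

BATCH = 64
# Round up so the last, partial batch is predicted exactly once
# (len // BATCH + 1 requests one batch too many when len is divisible by BATCH)
STEPS = math.ceil(len(test_paths) / BATCH)

test_seq = TestBatchSequence(test_paths, BATCH)  # the poster's own Sequence class (not shown)

probs = model.predict_generator(
    test_seq,
    steps=STEPS,
    workers=5,
    verbose=1
)
```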
#

Example of the resultant probability array (index 0)

#
Prob: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
#

Shouldn't that be a probability for the classes, not just 1?

#

Also, each subsequent image has the exact same result array

velvet anchor
#

If the results are the same for every image, I think it's an issue of over- or underfitting. As for the 0s and 1s, I think some activation functions return only 0 or 1 as a result. @hasty maple might be able to answer definitively if he's around

fresh otter
#

indeed

velvet anchor
#

are you using softmax / sigmoid in your output layer?

fresh otter
hasty maple
#

I'm not that used to keras either lol. All I did was mnist data set using it.

velvet anchor
#

I know for sure the reason it's 1 is because of how softmax handles probability

#

its output looks like this

#

so it will drag output up to 1 or 0 outside a very small subset of values

#

and the reason it's the same is because of presumably some set of images resulting in an overfit, have you tried changing kernel / window size?
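A quick numpy check of how the two output activations behave on made-up logits. Softmax outputs always sum to 1, so the printed array above, which contains several entries equal to 1., looks more like saturated sigmoid outputs: with large logits a sigmoid rounds to exactly 1.0 in floating point. For multi-label problems the usual Keras setup is a sigmoid output layer with `binary_crossentropy`; that the model uses sigmoid is an assumption here. Identical arrays for every image would then suggest the logits barely depend on the input.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([40.0, 2.0, -3.0, 35.0])   # made-up raw scores for 4 labels
p_soft = softmax(logits)    # mutually exclusive: one label takes nearly all the mass
p_sig = sigmoid(logits)     # independent per label: both large logits come out as ~1.0
print(p_soft)
print(p_sig)
```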

fresh otter
#

yeah it might be overfitting, you're right

#

I oughta compare test results with validation to determine this

prime thistle
#

kfold that B

lapis sequoia
#

what kind of companies do you data scientists guys work for?

#

don't the big companies all use Java?

earnest prawn
#

not really

lapis sequoia
#

guys

#

how do you guys kaggle

naive hornet
#

@lapis sequoia I'm no data scientist, but my understanding was that Python is a very strong incumbent in the data science industry because of the many specialized tools written for it and its ease of use. bar R, it might even be the most popular

#

("data science" being a massive and sweeping field that it's probably not fair to lump every portion of under the same umbrella)

placid snow
#

I know Statoil, the big Norwegian oil company, uses machine learning to determine if there's oil based on soil samples or something like that. There was a guest lecture about it in my Python ML course. Doesn't quite imply that they use Python.. but GWcmeisterPeepoShrug

velvet anchor
#

Is there a TF equivalent to Keras' .flow_from_directory()? In essence, I want to try a TF model with some images, but the few examples I could find use either the built-in .dataset() module or a pickled file, and neither is very helpful in that regard
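`flow_from_directory` expects one subfolder per class, with class indices assigned in alphanumeric order of the folder names. Recent TensorFlow 2.x versions provide `tf.keras.utils.image_dataset_from_directory` for that same layout. If you'd rather build the pipeline yourself, a minimal stdlib sketch that produces (path, label) lists, which can then be fed to `tf.data.Dataset.from_tensor_slices`; the helper name, suffix list and "fake"/"real" folders are hypothetical.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}

def list_labeled_images(root):
    """Return (paths, labels, class_names) for a root/<class_name>/<image> layout."""
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())  # alphanumeric order
    paths, labels = [], []
    for label, name in enumerate(class_names):
        for f in sorted((root / name).iterdir()):
            if f.suffix.lower() in IMAGE_SUFFIXES:     # skip non-image files
                paths.append(str(f))
                labels.append(label)
    return paths, labels, class_names

# Demo on a throwaway folder with two hypothetical classes
demo_root = Path(tempfile.mkdtemp())
for cls in ("fake", "real"):
    (demo_root / cls).mkdir()
    for i in range(2):
        (demo_root / cls / f"{i}.png").touch()
(demo_root / "real" / "notes.txt").touch()             # ignored: not an image
paths, labels, names = list_labeled_images(demo_root)
print(names, labels)
```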

lapis sequoia
#

I think I'm gonna do a data science bootcamp. There are a lot of companies offering them now; it costs about $10,000 for 3-5 months, only a basic coding background is required, and almost 100% of students find a job as a data scientist afterwards

lapis sequoia
#

hello guys

#

i'm having some trouble with training this data set using stacked algorithms

lapis sequoia
#

wait nvm

#

i think i know how to fix it

polar acorn
#

What is currently the most widely used/best python library for HMM?

lapis sequoia
#

hmm?

polar acorn
#

Hidden Markov Models

polar acorn
#

Seems the answer is pomegranate anyhow..

lapis sequoia
#

Data science is simply a cool name for parts of statistics

lapis sequoia
#

is data science artificial intelligence?

velvet anchor
#

Can be

#

Data science is extremely broad

lapis sequoia
#

true

#

by the way

lean ledge
#

A part of AI is data science, a part is not

lapis sequoia
#

can you guys help me with something

lean ledge
#

AI just means something that can make decisions. Doesn't have to be very statistical but it tends to be

lapis sequoia
#

oh okay

#

by the way

lapis sequoia
#

is data science artificial intelligence?
no

elder basalt
#

In linear regression, if I z-score my data and calculate the coefficients for my function, how do I convert the coefficients that hold for the z-scored data back to the original data?

silk schooner
#

multiply by the standard deviation, then add the mean.
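That rule undoes the scaling for an individual value. For the fitted coefficients, writing y = m_y + s_y·(a + b·(x − m_x)/s_x) gives slope = b·s_y/s_x and intercept = m_y + s_y·a − slope·m_x. A minimal numpy sketch on synthetic data; because least squares is unaffected by this rescaling, the converted coefficients should match a direct fit on the raw data.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(10, 50, size=100)
y = 4.0 * x - 7.0 + rng.normal(0, 2.0, size=100)     # synthetic data

mx, sx = x.mean(), x.std()
my, sy = y.mean(), y.std()
xz, yz = (x - mx) / sx, (y - my) / sy                 # z-scored copies

b_z, a_z = np.polyfit(xz, yz, deg=1)                  # slope, intercept on z-scored data

# Undo the scaling: y = my + sy * (a_z + b_z * (x - mx) / sx)
slope = b_z * sy / sx
intercept = my + sy * a_z - slope * mx

direct = np.polyfit(x, y, deg=1)                      # fit on raw data, for comparison
print(slope, intercept)
print(direct)
```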

velvet anchor
#

Is there a dataset of normal photos containing people, not just cropped faces like the LFW or Essex sets?

sacred summit
velvet anchor
#

that's P E R F E C T

#

ty

sacred summit
#

No problem

lapis sequoia
#

what are some features that you guys commonly use when interpreting data

velvet anchor
#

Depends on the data set

small pumice
#

Would this be a good channel to ask for help on a neural network program?

sacred summit
#

@small pumice Yes

small pumice
#

Ok

#

I'm working on a project in which a character in a text-based, roguelike-style world has to find its way around

#

To control it, I made a neural network from scratch

#

no libraries

#

But it isn't working

#

Wait, never mind

#

I think I figured it out

sacred summit
#

Alrighty

lapis sequoia
#

@small pumice how did you do it?

#

in one of my future projects i plan on doing that as well

#

well except apply the neural network to something else

small pumice
#

Don’t ask me

#

It doesn’t really wprk

#

Work

lapis sequoia
#

oh

small pumice
#

I’ll post the code later though

#

Maybe someone will know how to fix it

velvet anchor
#

Maybe. Coding a NN from scratch though is a very big task. I think you’ll likely save a bunch of time getting TensorFlow / Keras / PyTorch to work for you rather than coding one from the ground up

sacred summit
#

Yeah, probably start from the basics, NN are actually pretty hard to make

velvet anchor
#

I’m not saying it’s impossible, but it’s definitely a several-month task for a team of researchers to get a working version

small pumice
#

Yeah

#

It’s just so hard to find TensorFlow tutorials that explain the concepts that are being coded in

velvet anchor
#

Try looking for Keras info. I had the same problem

#

It runs as basically a tensorflow wrapper. Almost. Either tensor or thano

sacred summit
#

You mean thanos :p

velvet anchor
#

maybe. im not familiar with it just seen it before

#

I think it's actually Theano, but I'm not 100% sure

small pumice
#

snaps fingers

#

Also the recent TensorFlow update makes most tutorials outdated

silk schooner
#

a simple neural network with only one or two layers is not hard to code up by yourself if you wanted to... you should use numpy for all the matrix operations

small pumice
#

Mine has 2 hidden layers

#

Input layer has 190 neurons, second layer has 16, third has 16, and output layer has 4.
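With numpy, the forward pass of that 190 → 16 → 16 → 4 network is a few matrix products. A minimal sketch; the ReLU hidden activations, the softmax over the 4 outputs (read as, say, 4 possible moves), the weight initialization and the random batch are all assumptions, and training (backpropagation) is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [190, 16, 16, 4]     # input, two hidden layers, output

# One weight matrix and bias vector per connection between consecutive layers
weights = [rng.normal(0, 1 / np.sqrt(n_in), size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    """ReLU on the hidden layers, softmax over the output layer (one row per observation)."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(0.0, a @ W + b)
    logits = a @ weights[-1] + biases[-1]
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

batch = rng.random((8, 190))       # 8 made-up observations of the game state
probs = forward(batch)
print(probs.shape)                 # (8, 4)
```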

silk schooner
small pumice
#

Cool

silk schooner
#

it's not documented well or anything and it's a few years old, but maybe you'll find it useful

#

lemme know if u have any specific questions

#

i'd really only recommend coding one up yourself though if you are interested in the challenge or learning more about them or something.... if you just want one to use, id use libraries like other people recommended

small pumice
#

Yeah

#

Thanks

velvet anchor
#

If I have a NN detecting photoshop, should the training set classes be normal humans & a separate class of photoshopped humans, or would it make more sense to have unphotoshopped images of all possible objects and then a class of photoshopped humans?

velvet anchor
#

That new dataset improved things dramatically @hasty maple BTW. With some preprocessing and stuff I've gotten 80% accuracy 😮

hasty maple
#

Are you sure it's not overfit on the train/test set :P

velvet anchor
#

I mean It could be but I just ran it over a test set of ~12,000 images not included in the training set

#

need more testing to confirm but its a start

nova viper
#

Anyone know a server dedicated to Machine Learning ?

austere quartz
#

@feral lodge

feral lodge
#

It's got chats for several branches of AI, including machine learning @nova viper

nova viper
#

@feral lodge
Thanks for your help

feral lodge
#

No problemo friendo

#

Ty for the mention @austere quartz ๐Ÿ˜

austere quartz
#

๐Ÿ˜

hasty maple
#

/r/learnmachinelearning has a Discord server as well. You could look into that if you're interested @nova viper

feral lodge
worn cosmos
#

Anyone know of a good way to get certified/prove skills in Python to a potential employer? My MS is in biostatistics but I'd like to get into data science.

velvet anchor
#

One of the best ways to prove skills is with a github to show case but I'm not entirely sure on certification

feral lodge
#

Looks like there are organizations that issue Python certificates, but it looks like a hassle and sometimes expensive 🤔 I'd say it's probably better to have a personal project or two which you can show the employer

#

Agreed on github

velvet anchor
#

I know some companies like microsoft are now additionally offering like a certification in their machine learning programs but i don't know how much value they hold to employers

worn cosmos
#

Alright, thanks for the advice guys. I've got some basic stuff up on GitHub but it's not really data-science related. Do you have any advice on something I might want to look into project-wise? Or would this be a better question for a career counselor or someone in the industry?

feral lodge
#

Biostatistics sounds like a perfect application area for machine learning if that's what you're interested in

worn cosmos
#

Definitely. We do pretty much everything in SAS though, and I'm already looking into getting certified for that soon. (I'm currently in college) I'd like to expand my skillset though.

feral lodge
#

Python has several excellent libraries for statistical analysis

#

Pandas, SciPy, Numpy among others

velvet anchor
#

scikit-learn

#

I listen to a Python podcast that's been covering a lot of data science applications lately, and I hear scikit come up ALL the time

feral lodge
#

What's the pod?

velvet anchor
#

I think it's Talk Python To Me

feral lodge
#

Thanks ๐Ÿ‘Œ

velvet anchor
#

He's been interviewing people who are using it in geoscience or at the allen institute, etc etc and they talk about their software stack and stuff

#

It's pretty high level, so not super detailed, but the 3 you mentioned and scikit are what I hear every episode

lapis sequoia
#

I'm considering going to a data science boot camp, but I saw on the curriculum that they don't teach Python or Java, they only use R. Is that any good for a job? ;/

velvet anchor
#

R and Python are pretty neck and neck for data science

#

I don't have much personal experience with it, but it was the original language, and Python's currently trying to overtake its spot

feral lodge
#

R is a bit like MATLAB if you've ever used it

#

As a programmer I like to think of it as a very advanced, programmable, calculator

#

Great for stats, but you can't really integrate it with a bigger program

#

Which you of course can with python

lapis sequoia
#

yeah i dont think its a very versatile programming language, more something for academia and mathematicians

feral lodge
#

That's my opinion yeah

#

It can be used in the field though; one of my professors has done a lot of work for the central bank of Sweden, and he works almost exclusively in R afaik

velvet anchor
#

seems like both have a pretty even split though

#

I have an interview @ twitch soon as a data analyst and they use Python

hasty maple
#

Good luck! Wish I had an interview too >.<

velvet anchor
#

I've prolly sent out 50+ applications haha

#

just looking for something entry level for my last semester

#

dont super care what it is

hasty maple
#

:o nice

#

I sent out like 5 a couple weeks ago. I guess I should send more

worn cosmos
#

Okay awesome, thanks for the help guys

velvet anchor
#

shotgun technique @hasty maple

hasty maple
#

@velvet anchor have you done any data science projects before applying for all these jobs? I just did one: studied for like 4 months, did one Kaggle comp, got good results, and that's about it 😂

velvet anchor
#

Just this research fellowship over deepfakes

#

But I also almost have a BS in Math and CS

hasty maple
#

ah you have a data analytics preferred major

velvet anchor
#

Yeah something like that:P

past gazelle
#

Are there any data-science centric Discords beyond the Python realm?

velvet anchor
#

like for R?

past gazelle
#

Just in general, like broader topics than just language-specific stuff

#

I don't have any particular questions I'm more just curious

velvet anchor
#

i think /r/learnmachinelearning has a discord

past gazelle
#

Ah, cool, thanks!

stoic gyro
#

i have this problem

#

how can i solve it?

velvet anchor
#

You're either not training well enough or displaying too low a confidence

stoic gyro
#

ok ok

#

I'll figure this out tomorrow

velvet anchor
#

Is there a book for TensorFlow or PyTorch that’s most recommended?

sacred summit
#

@velvet anchor I have a pdf on Machine Learning with Tensorflow

#

in python btw

velvet anchor
#

That’ll work. I’ve been using Keras mainly, just wanna expand my horizons a bit

sacred summit
#

I'll dm you

velvet anchor
#

K

small ore
#

Also how do I add /r/learnmachinelearning?

#

@hasty maple @velvet anchor

feral lodge
#

Remove the [[[....]]] stuff if you didn't notice it 👌

small ore
#

Thanks a lot

feral lodge
#

No problemo 👌

small ore
#

Is that the Tensor flow pdf?

feral lodge
#

Possibly, but I have no idea! That pdf could be anything really

hearty hazel
#

!mute @feral lodge 3d Bypassing the spam filter

crimson lightBOT
#

:ok_hand: Slandön#5361 is now muted for 3 days (Bypassing the spam filter)

small ore
#

😦

#

Too harsh a punishment

hearty hazel
#

He knew what he was doing

#

He could have just DM'd you

#

These things are in place for a reason

small ore
#

He says he agrees with you but I still think 3 days of muting Slandon is a loss to us more than him

hearty hazel
#

!unmute @feral lodge

crimson lightBOT
#

:ok_hand: Slandön#5361 is now unmuted

hearty hazel
#

Let's make sure that doesn't happen again

feral lodge
#

Lesson learned! Thanks boys

hot karma
#

plt.scatter(x_factor[:,0],x_factor[:,0])

#

What is [:,0] ?

placid snow
#

a slice

#

Is x_factor a numpy or pandas array?

hot karma
#

Matplotlib

#

Y there is 0 in [:,0]?

proven crater
#

how does that slice work with the comma though. it looks weird

placid snow
#

A slice is an object with start, end and step.

#

Wait, with a comma

hot karma
#

Oh 0 is step

placid snow
#

I totally missed that comma

proven crater
#

if it would be the step it would be [::0].
Also a step of 0 would be an infinite loop I guess,

placid snow
#

I think it's referencing in a 2d manner with , just ignoring x and only giving y?

#

something like [x,y:x,0] ?

proven crater
#

yea it must be something like that

hearty hazel
#

It's a tuple

#

one second

proven crater
#

and specifically implemented by the class of whatever x_factor is

placid snow
#

^

#

Hence why i asked if it's numpy or pandas, I believe they use syntax like that?

proven crater
#

They also (ab)use the __getitem__ to make the user write nicer syntax yes. Doesn't often look like this though.

placid snow
#

data.iloc[:, 0] pandas dataframe

hearty hazel
#

OK

#

It's an empty slice

#

followed by a 0

#

in a tuple

#

(slice(None, None, None), 0)
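To see that tuple for yourself, here's a tiny sketch with a made-up class (`Probe` is hypothetical, it just hands back whatever `__getitem__` receives). What the tuple *means* is then up to the library, e.g. NumPy. As a side note, built-in sequences reject a step of 0 with a `ValueError` rather than looping forever.

```python
class Probe:
    """Hypothetical helper: returns the key that the [] syntax passes in."""

    def __getitem__(self, key):
        return key


p = Probe()

# x[:, 0] becomes x.__getitem__((slice(None, None, None), 0))
print(p[:, 0])     # (slice(None, None, None), 0)
print(p[1:3, 0])   # (slice(1, 3, None), 0)
print(p[0])        # 0  (no comma, so no tuple)

# A zero step is rejected by built-in sequences
try:
    [1, 2, 3][::0]
except ValueError as err:
    print("ValueError:", err)
```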

placid snow
#

Right, but dependent on what lib it is, it could use 2 slices

hot karma
#

plt.scatter(x_factor[:,0],x_factor[:,1])

hearty hazel
#

Yeah we really need to know what x_factor is

hot karma
#

At second x_factor there is 1 [:,1]

placid snow
#

They are used to request most likely different indexed columns in a table

#

So 0 would get first columns data, 1 would get 2nd column

hearty hazel
#

that syntax is kinda nasty

hot karma
#

X_factor is a variable carrying data

proven crater
#

It is :P

placid snow
#

Would still need to know what x_factor is to give a proper answer

proven crater
#

it's like 2d_data[row_selecting, column_selecting]

placid snow
#

You can always print(type(x_factor)) if you have no idea GWcmeisterPeepoShrug

hot karma
#

F = factorAnalysis(n_components=2)

#

X_factor=f.fit_transform(iris.data)

placid snow
#

What lib is factorAnalysis from

#

sklearn?

hot karma
#

sklearn.decomposition

placid snow
#

Also did you mean x_factor = ?

#

Else it's a different variable

hot karma
#

X_factor=F.fit_transform(iris.data)

placid snow
#

with a capital X?

#

Anyways

#

Returns: X_new : numpy array of shape [n_samples, n_features_new]

hot karma
#

Small

placid snow
#

So yes it's a numpy array

#

Therefore it's array[row slicing, col slicing], so x_factor[1:3, 0] for instance would be rows 1 and 2 with only data from column 0

#

[:, 0] says give me all the rows, with only data in the first column
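A quick NumPy check of the above, using a small made-up array in place of the `FactorAnalysis` output (which, per the sklearn docs quoted above, has shape [n_samples, n_features_new]):

```python
import numpy as np

# Stand-in for fit_transform output: 4 samples x 2 components (made-up numbers)
x_factor = np.array([[0.1, 1.0],
                     [0.2, 2.0],
                     [0.3, 3.0],
                     [0.4, 4.0]])

first_col = x_factor[:, 0]    # all rows, column 0 -> shape (4,)
second_col = x_factor[:, 1]   # all rows, column 1
rows_1_2 = x_factor[1:3, 0]   # rows 1 and 2, column 0 only

print(first_col)   # [0.1 0.2 0.3 0.4]
print(rows_1_2)    # [0.2 0.3]

# plt.scatter(x_factor[:, 0], x_factor[:, 1]) therefore gets one x and one y per sample
```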

hot karma
#

Thanks

#

from sklearn import datasets

#

import numpy as np

#

iris = datasets.load_iris()

#

C = np.corrcoef(iris.data.T)

#

What is the capital T?

placid snow
#

to me it seems like an alias for target

#

I misread something, lemme try again

#

Yeah, it seems to be the target

#

It's an array of all data split into multiple lists

#

first list is target

#

I actually don't know what I'm doing. But it's at least every column of the dataset split into lists

feral lodge
#

It's the transpose I think

placid snow
#

Something like that yeah

#

It's data, but transposed
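Why the transpose matters here: `np.corrcoef` treats each *row* as one variable by default (`rowvar=True`), while `iris.data` has one row per flower and one column per feature. `.T` flips that, so the result is a feature-by-feature correlation matrix. A sketch on a small made-up array with the same layout:

```python
import numpy as np

# Made-up data: 5 samples (rows) x 3 features (columns), like iris.data's layout
data = np.array([[5.1, 3.5, 1.4],
                 [4.9, 3.0, 1.4],
                 [6.2, 2.9, 4.3],
                 [5.9, 3.0, 5.1],
                 [6.7, 3.1, 5.6]])

print(data.shape, data.T.shape)   # (5, 3) (3, 5)

# corrcoef correlates rows, so transpose to correlate the feature columns
c = np.corrcoef(data.T)
print(c.shape)                    # (3, 3)

# Equivalent without the transpose: tell corrcoef the variables are columns
c_alt = np.corrcoef(data, rowvar=False)
```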

velvet anchor
#

#slandonisfree

#

Also ty for that link @feral lodge gonna pick it up after work

hot karma
#

Can anyone give me a cheat sheet for sklearn?

velvet anchor
#

What kind of cheat sheet

hot karma
#

Sklearn library

placid snow
#

Their docs are fairly good imo

velvet anchor
#

Yeah. Their docs are one of the best

#

I’m not sure if there are any quick reference pages though, like lists of commonly used functions or whatever

hot karma
#

Oh, that's a good site.

velvet anchor
#

Is there a reference for all the output values of different keras activation functions?

feral lodge
#

That OK? Shows the range and a bunch of other properties

velvet anchor
#

Yeah that's actually perfect

feral lodge
#

Didn't know there was such variety actually 🤔

velvet anchor
#

Yeah picking the right activation functions is certainly difficult

#

because they're all so different
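For a quick feel of those output ranges, here are a few common activations written out with their standard formulas in NumPy (a sketch for intuition, not Keras's own implementation), evaluated over a wide input range:

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # range (0, 1)


def relu(x):
    return np.maximum(0.0, x)         # range [0, inf)


def softsign(x):
    return x / (1.0 + np.abs(x))      # range (-1, 1)


def softplus(x):
    return np.log1p(np.exp(x))        # range (0, inf), a smooth relu


x = np.linspace(-10, 10, 2001)
for name, fn in [("sigmoid", sigmoid), ("tanh", np.tanh), ("relu", relu),
                 ("softsign", softsign), ("softplus", softplus)]:
    y = fn(x)
    print(f"{name:8s} min={y.min():+.4f} max={y.max():+.4f}")
```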

feral lodge
velvet anchor
#

Noice

#

I'll get paid to read this tomorrow

worn cosmos
#

hey, I'm trying to do some time series analysis.

#
year
1998-01-01    71
1998-01-01    60
1998-01-01    65
1998-01-01    83
1998-01-01    72
Name: yieldpercol, dtype: int64
#

this is ts.head()

#

My issue is that I need to combine all the data from each individual year together

#

that is, add up all the entries for 1998, then all the ones for 1999, etc

#

And I'm having trouble figuring out how to do that with pandas

small pumice
#

There are so many TensorFlow and Keras tutorials that jump straight to things like MNIST recognition. Does anyone know some good tutorials on neural networks using simple data that you make?
For example, a tutorial that shows how to make a neural network that can add two numbers together? I know it’s simple, but it would be a good way to get the concept down.

placid snow
#

The JS tutorial on NNs from The Coding Train and the 3Blue1Brown videos are pretty good

#

Subscribe to stay notified about new videos: http://3b1b.co/subscribe Support more videos like this on Patreon: https://www.patreon.com/3blue1brown Special t...

Welcome to Chapter 10 of The Nature of Code: Neural Networks. (http://natureofcode.com/book/chapter-10-neural-networks/) In this video, I provide a brief int...
#

Neither is Python-related, but both explain the concepts fairly well
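As a rough illustration of the "network that adds two numbers" idea, here is the smallest possible version in plain NumPy: a single linear neuron (no activation function) trained with gradient descent on made-up pairs. The data range, learning rate and step count are arbitrary choices for this sketch:

```python
import numpy as np

# Toy data: pairs of numbers in [0, 1) and their sums
rng = np.random.default_rng(0)
X = rng.random((200, 2))
y = X.sum(axis=1)

# One linear neuron: prediction = X @ w + b
w = np.zeros(2)
b = 0.0
lr = 0.5

for step in range(2000):
    err = X @ w + b - y
    # Gradients of the mean squared error with respect to w and b
    grad_w = 2 * X.T @ err / len(X)
    grad_b = 2 * err.mean()
    w -= lr * grad_w
    b -= lr * grad_b

print("weights:", w, "bias:", b)   # weights close to [1, 1], bias close to 0
print("3 + 4 ->", np.array([3.0, 4.0]) @ w + b)
```

One neuron is enough here only because addition is linear; hidden layers and non-linear activations are what you'd need for anything more interesting.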

feral lodge
#

@worn cosmos By "add up", do you mean sum? In that case you can probably do something with pandas cumulative sum https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.cumsum.html. If you mean "for each year Y, create a vector/dataframe of all data from the first year to year Y", then you should be able to loop through range(0, length_of_data) and for each index create a slice from 0 to index with data.iloc[:index] or something like that:

https://pandas.pydata.org/pandas-docs/stable/indexing.html#slicing-ranges
https://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-integer
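For the original question (one total per year), grouping on the year of the index is probably closer than `cumsum` alone. A sketch on made-up numbers shaped like the `ts.head()` output above, assuming the index is already a `DatetimeIndex`:

```python
import pandas as pd

# Made-up yields with several rows per year, like ts.head() above
idx = pd.to_datetime(["1998-01-01", "1998-01-01", "1998-01-01",
                      "1999-01-01", "1999-01-01"])
ts = pd.Series([71, 60, 65, 83, 72], index=idx, name="yieldpercol")

# One total per calendar year: 1998 -> 196, 1999 -> 155
yearly = ts.groupby(ts.index.year).sum()
print(yearly)

# Running total across years, if that's what's wanted instead
running = yearly.cumsum()
print(running)
```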

trail current
#

hey, i'm doing some pretty basic data acquisition and i can't get my live plot to work properly, would anyone mind helping out?

naive swallow
#

bot.tags['ask']

arctic wedgeBOT
#
ask

Asking good questions will yield a much higher chance of a quick response:

• Don't ask to ask your question, just go ahead and tell us your problem.
• Try to solve the problem on your own first, we're not going to write code for you.
• Show us the code you've tried and any errors or unexpected results it's giving
• Keep your patience while we're helping you.

You can find a much more detailed explanation on our website.

junior hemlock
#

Guys I need a little help

#

Can aiohttp and urllib read from websites like YT ?

earnest prawn
#

aiohttp and urllib can read from anything which is reachable with http or https

junior hemlock
#

Oh thanks

hasty maple
#

😂

lapis sequoia
#

hello

#

how do you guys apply models to the training.csv file

#

and what is a model made of/

#

?

feral lodge
#

Hey! It sounds to me like you're unsure what a model is, so this answer is pretty basic. Sorry if I misunderstood 😄 When we observe and take measurements of stuff in the real world there's often good reason to assume that those observations follow a predictable pattern, even if they are seemingly random and independent of each other. A statistical model is a way of mathematically concretizing those patterns, so we can better understand and work more effectively with our data. We will never have enough data to completely accurately model the complexities of real-world relationships, but a simple mathematical model often captures the essential underlying patterns of the observations. What model is suitable depends on the nature of the data:

#

Sometimes our observations are only non-negative integers, like if we were counting the number of spam emails a person receives every day. This kind of data should probably be modelled with a Poisson or binomial distribution.
https://en.wikipedia.org/wiki/Poisson_distribution
https://en.wikipedia.org/wiki/Binomial_distribution

Sometimes they're real numbers without limits, like when measuring the temperature at 21:00 each day in January. That kind of data probably follows a normal distribution or a Cauchy distribution.
https://en.wikipedia.org/wiki/Normal_distribution
https://en.wikipedia.org/wiki/Cauchy_distribution

Sometimes the data consists of real numbers with explicit limits, like if we have a list of estimations of the probability of a turtle egg of a certain species containing a female turtle. Since a probability can only be between 0 and 1, that kind of data can likely be nicely modeled with a beta distribution.
https://en.wikipedia.org/wiki/Beta_distribution
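For the simplest of these models, "fitting" comes down to a formula: the maximum-likelihood estimate of a Poisson rate λ is just the sample mean, and a normal distribution's μ and σ are estimated by the sample mean and (divide-by-n) standard deviation. A small sketch on simulated data (the true rate 4 and the temperature parameters are made up):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated "spam emails per day" counts drawn from Poisson(rate=4)
spam_counts = rng.poisson(lam=4.0, size=10_000)
lam_hat = spam_counts.mean()    # Poisson MLE of the rate

# Simulated evening temperatures drawn from Normal(mean=-3, sd=5)
temps = rng.normal(loc=-3.0, scale=5.0, size=10_000)
mu_hat = temps.mean()           # MLE of the mean
sigma_hat = temps.std()         # MLE of the sd (ddof=0, divides by n)

print(f"lambda ~ {lam_hat:.2f}, mu ~ {mu_hat:.2f}, sigma ~ {sigma_hat:.2f}")
```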

#

Instead of only trying to model the distribution of possible values in your data, you're often also interested in modelling the relationship between the different features in your data. In what way does the chance of getting lung cancer increase with each year of smoking? How does it correlate with age? Income? Weight? To model these kinds of relationships we often use a technique called regression. With linear regression we assume the relationship is linear, with polynomial regression we assume the relationship is some polynomial.
https://en.wikipedia.org/wiki/Linear_regression
https://en.wikipedia.org/wiki/Polynomial_regression

If the relationship between features is very complex and high-dimensional, we often need to use a more complex model, like a support vector machine or a neural network.
https://en.wikipedia.org/wiki/Support_vector_machine
https://en.wikipedia.org/wiki/Artificial_neural_network
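To make the linear vs polynomial regression distinction concrete, here's a sketch using NumPy's `polyfit` on made-up data with a quadratic relationship: a degree-1 fit is linear regression, a degree-2 fit is polynomial regression, and the better-matching model leaves a smaller error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: y = 0.5x^2 - 2x + 3 plus a little noise
x = np.linspace(0, 10, 50)
y = 0.5 * x**2 - 2.0 * x + 3.0 + rng.normal(scale=0.5, size=x.size)

linear_coefs = np.polyfit(x, y, deg=1)   # [slope, intercept]
quad_coefs = np.polyfit(x, y, deg=2)     # [a, b, c] for a*x^2 + b*x + c


def mse(coefs):
    """Mean squared error of a fitted polynomial on the training data."""
    return np.mean((np.polyval(coefs, x) - y) ** 2)


print("linear:", linear_coefs, "MSE", mse(linear_coefs))
print("quadratic:", quad_coefs, "MSE", mse(quad_coefs))
```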

#

Sorry if this was too verbose or below your level friendo, let us know if you wanted some other kind of direction 😄

#

@lapis sequoia

lapis sequoia
#

oh its fine!

#

im kind of new to the whole scene

#

plus im just a high school student lol

feral lodge
#

Never too early to start 😄

#

https://www.coursera.org/learn/machine-learning I haven't looked through this course myself, but my friends who have liked it a lot! Might be a good intro to machine learning, and it's free I believe? You enroll and then get access to the video lectures

placid snow
#

Just as a heads-up Slandon, you can cut down the length of messages by removing the embed sent by each link: just wrap the link in <>, e.g. <www.google.com> won't send the embed for www.google.com

#

Unless you wanted them ofc 😅

feral lodge
#

Oh I had no idea, thanks! I've got link previews turned off, so I always forget they exist

placid snow
#

Oh, makes sense. You had quite the wall there ':P

#

Every wiki link had a pretty big image with them

feral lodge
#

Live and learn 🤦

hasty maple
#

What's a good book for statistics? I'm primarily looking for something small and concise just to go over the concepts, not anything with a ton of derivations and stuff.

young aurora
#

Hey all. I'm using Python to do some basic data visualization for a pet project I'm working on. I'm wondering what the best way to plot a timeline (historical, as in multi-year) for the reign of multiple emperors would be. The columns I've got that I think are actually of interest are as follows: Start (in years), End (in years), and ruler name. Basically I'd like to get a chart with all the rulers mapped onto it (just straight lines) but separated so you can see them individually, as they're sometimes overlapping.

#

I'm happy to share the dataset. It's a collection of all Chinese emperors. I'm gonna be releasing it onto my blog once I've finished this last part, but I'm having trouble finding the proper package to use. I found a suggestion that I could use a Gantt chart for this, but can't figure out how to actually work that with my data. I've got it all in a pandas dataframe.

#

Each row is an emperor, with each emperor having a start date, end date, and length of reign. Not sure if that will help with answering the question.

feral lodge
young aurora
#

I'll try this! Thank you. I'll report whether it worked or not afterwards.

feral lodge
#

Hope it helps!

young aurora
#

So I feel like an idiot - but my plot is totally blank.

#

This is the code I used to create it - don't know if this is what you'd need to help.

#

fig = ff.create_gantt(ThreeSovereigns, colors=['#333F44', '#93e4c1', '#93e4c1'], show_colorbar=True, group_tasks=True)
py.iplot(fig, filename='gantt-group-tasks-together', world_readable=True)

young aurora
#

Yeah, that one definitely won't work. It just isn't doing what it needs to do. Any other ideas?

feral lodge
#

How is ThreeSovereigns defined?

#

@young aurora

young aurora
#

ThreeSovereigns = ChinaEmpire[ChinaEmpire.DynastyCode == '00a']

feral lodge
young aurora
#

It works fine for creating matplotlib charts etc.

feral lodge
#

If you just copy-paste their code, using their example data, does it plot correctly?

young aurora
#

Yes. I was hoping there was a more elegant solution than hardcoding in the start and end date - I think the issue may be that my start and end dates aren't in datetime

#

That being said, they're only years (e.g. -2023 Start, -1500 End) so I'm not sure how to convert them into datetime if that's what this requires

#

If it isn't and hardcoding is what it wants, I can do that too. It just seems extremely... bad

feral lodge
#

As long as it fits the pattern it should be fine! That is, the data should be of the form [ {"Task" : <Name>, "Start" : <Start time>, "Finish" : <End time>} ]

#

So if you can process your data file and parse it to such a list i imagine it'll work. So then the issue is the dates... If you only have the years you can probably just set the month and day to be the first of January or something
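A sketch of turning the dataframe into that list-of-dicts shape. The column names `Ruler`, `Start` and `End` and the two rows are hypothetical placeholders for the real emperor dataset. It also shows why BC years are awkward: Python's `datetime` can't represent years below `datetime.MINYEAR` (1), so whether the Gantt helper accepts plain numeric years instead of dates is something to check in its docs rather than assume.

```python
import datetime

import pandas as pd

# Hypothetical frame: real column names in the emperor dataset may differ
rulers = pd.DataFrame({
    "Ruler": ["Ruler A", "Ruler B"],
    "Start": [-2023, -1500],   # years; negative = BC
    "End": [-1500, -1200],
})

# The [{"Task": ..., "Start": ..., "Finish": ...}] shape described above
tasks = [
    {"Task": row.Ruler, "Start": int(row.Start), "Finish": int(row.End)}
    for row in rulers.itertuples(index=False)
]
print(tasks)

# datetime stops at year 1, so BC reigns need a numeric axis instead
print(datetime.MINYEAR)   # 1
try:
    datetime.date(-2023, 1, 1)
except ValueError as err:
    print("ValueError:", err)
```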

young aurora
#

Yeah, that's totally fine - it's thousands of years and also mythology.

feral lodge
#

No idea if it'll handle BC dates nicely though 🤔 I have an early meeting so I'll have to leave, but if you ask in the help channels someone should be able to help

young aurora
#

Okay, thank you!

feral lodge
#

This has become a sub-problem of the first, so if you show a snip of your data and explain about the negative dates that's probably enough to go on

young aurora
#

Okay, cool. Pyplot just isn't playing nice with the numbers, either.

feral lodge
#

No problem, hope you can solve it! Feel free to DM me a link to the blog when you're done 😄

young aurora
#

-9000 maps, -10000 does not.

#

Will do!

lapis sequoia
#

Guys

#

Is the Yahoo scraper of pandas actually broken

#

The Yahoo data reader for stock prices

small ore
#

<@&267628507062992896> I suggest pinning Slandon's message above. The one with a lot of links and explaining basic 'model'

hearty hazel
#

I agree

small ore
#

I think the entire message isn't pinned. It prolly is internally split into two messages

hearty hazel
#

Not gonna pin the whole thing

worn cosmos
#

Hey, anyone here know much about logistic regression? I'm getting a huge upper limit for a Wald confidence interval and I don't know if it's reasonable or not.

#
fractional_shortening | 75.604| 0.252| >999.999 |
#

the values being the point estimate and the lower/upper Wald CI limits, respectively

feral lodge
#

Looks like it works, but it implies a high standard error SE(β-hat) for the maximum likelihood estimate β-hat of your coefficient β. (SE(β-hat) comes from the model's estimated covariance matrix and typically shrinks roughly like 1/sqrt(n) with the sample size n.)

The Wald 95% confidence interval approximates the sampling distribution of β-hat with a Gaussian centered on β-hat whose standard deviation is SE(β-hat). 95% of that Gaussian's density lies between β-hat ± 1.96 * SE(β-hat), so those two points are where the lower and upper CI limits lie.

Since this is logistic regression though, we're working in transformed space: the interval for the odds ratio is exp(β-hat ± 1.96 * SE(β-hat)). So for you we have this:

> betaHat <- 75.604
> lower <- 0.252
> 
> # Now, because lower <- exp(betaHat - 1.96*stdErr)
> 
> stdErr <- (betaHat - log(lower))/1.96
> 
> stdErr
[1] 39.2767
> # Pretty high!
> 
> 
> # Checking lower and upper CI limits:
> 
> exp(betaHat - 1.96*stdErr)  # Lower limit
[1] 0.252
> 
> exp(betaHat + 1.96*stdErr)  # Upper limit
[1] 1.85097e+66
> # Very big!
> 
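One thing worth double-checking: a `>999.999` cell in a `Point Estimate | Wald limits` row looks like SAS's odds-ratio table, in which case 75.604 would already be exp(β-hat) rather than β-hat. Under that assumption (it isn't confirmed in the thread), the same back-calculation should start from log(75.604). A minimal Python sketch:

```python
import math

# Values from the table above. Assumption: 75.604 is the odds ratio exp(beta_hat)
# and 0.252 is the lower 95% Wald limit on the same odds-ratio scale.
or_hat = 75.604
lower = 0.252
z = 1.96

beta_hat = math.log(or_hat)                  # back to the log-odds scale
std_err = (beta_hat - math.log(lower)) / z   # from lower = exp(beta_hat - z*SE)
upper = math.exp(beta_hat + z * std_err)

print(f"beta_hat = {beta_hat:.3f}, SE = {std_err:.3f}")
print(f"upper limit = {upper:.0f}")
```

Under that reading SE comes out around 2.9 on the log-odds scale and the upper limit around 2.3e4, which matches the `>999.999` cell; the conclusion (a very uncertain coefficient) stays the same.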
velvet anchor
#

@hasty maple I can find the book we use for my Statistics class at Uni if you want

#

Found it. this is what we used, it was pretty nice, I'm sure you can find old versions quite easily

flat umbra
#

Hi, this looks like the scientific and numerical python channel on the python discord server. Is that correct?

velvet anchor
#

Yeah

hasty maple
#

Wew that's an expensive book

velvet anchor
#

Honestly it's not so bad. It's pricey new but used is reasonable

#

Is there a way to ensure a keras model is free from the GPU to train again?

#

Such as like

While(this):
    train a model
    test accuracy
    free resources, to tweak settings
    lets do the timewarp again
hasty maple
#

I've had to manually restart my notebook session to get the GPU freed from the clutches of Keras, let me know if you find a better way 😂

velvet anchor
#

Right now I’m running multiple scripts that call a second one to free it

hasty maple
#

:o script calling a script, is this something different from an import? iirc import does just that

velvet anchor
#

I wasn't importing it but it would probably also work
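Importing the second script would run it inside the same process, so it likely wouldn't hand the GPU back; TensorFlow generally holds on to the GPU memory it grabs until its process exits. A rough sketch of the separate-process approach, where each trial runs in a fresh interpreter and reports back as JSON. `CHILD_CODE` is a hypothetical stand-in: the real version would build and train the Keras model, and the `0.5` "accuracy" is a placeholder, not a real result.

```python
import json
import subprocess
import sys

# Hypothetical child program. In real use it would build/train the Keras model
# for these params and print the real accuracy; here it echoes a placeholder.
CHILD_CODE = """
import json, sys
params = json.loads(sys.argv[1])
# ... build, train and evaluate the model here ...
print(json.dumps({"params": params, "accuracy": 0.5}))
"""


def run_trial(params):
    """Run one trial in a fresh interpreter; its memory is released when it exits."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, json.dumps(params)],
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)


# Tweak settings between trials without restarting anything by hand
results = [run_trial({"hidden": h}) for h in (400, 800)]
print(results)
```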

#

Just looking to rewrite my toolkits as we move forward towards an adversarial network with 6 months of python knowledge

#

instead of the trash that's like taped together

hasty maple
#

lol

velvet anchor
#

I took this research position with actually 0 knowledge of python really

#

So it was like, quick sketch of what I wanted to do in C#, Port to python for Keras & TF options, hold the codebase together with prayer

hasty maple
#

You can learn python in a week tbh, it's not that difficult

velvet anchor
#

Yeah for basic stuff

#

but decorators, generators, etc. are also easy yet not immediately apparent when you need them, and learning all the other libraries that are core parts of writing correct python definitely takes more than a week

hasty maple
#

ah true, I never learnt classes, decorators, generators and the like as I haven't found any use for them yet

velvet anchor
#

They're nice

#

Generators are nice for datasets where you don't have a standard way of iterating but don't need (or can't fit) the whole set in memory
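A small sketch of that idea: a generator that yields fixed-size batches of parsed rows from any line source, so only one batch is in memory at a time. `io.StringIO` stands in for a real file opened with `open(...)`; the one-number-per-line format is made up.

```python
import io
from itertools import islice


def batches(lines, batch_size):
    """Yield lists of up to batch_size parsed rows, reading lazily."""
    it = iter(lines)
    while True:
        batch = [float(line) for line in islice(it, batch_size)]
        if not batch:
            return
        yield batch


# io.StringIO stands in for a big file opened with open("data.txt")
fake_file = io.StringIO("1\n2\n3\n4\n5\n")
for batch in batches(fake_file, batch_size=2):
    print(batch)   # [1.0, 2.0], then [3.0, 4.0], then [5.0]
```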

#

@hasty maple

lilac shadow
#

i have an interest in learning about neural networks, but i don't really know enough maths to do much with it (though i seem to pick stuff up quite quickly). i don't really have the willpower to learn a whole bunch of mathsy stuff straight up because i'd prefer to apply it and see what it does, rather than simply knowing what it does. essentially, is it possible to work with some of the more simpler aspects of neural nets without having a lot of mathematical knowledge at the beginning?

velvet anchor
#

Yeah neural networks don't require a ton of math knowledge for categorization that's more of a data analytics type of problem

#

but not having a strong math background won't hurt too bad with NNs

lilac shadow
#

i see

#

because i had a look at a sort of "hello world" example a while back and, even though i didn't know exactly how everything worked, i believe i got the general idea at least. i guess if i fiddled with stuff more to see what values affect certain things, i would be able to develop a better understanding of how stuff works together like that

velvet anchor
lilac shadow
#

ooh okay i'll take a look ^^ thanks!

#

i'll have to look up what a lot of this stuff does haha

velvet anchor
#

And then mine as well with the pydis link above is an example of a network

#

but it's not standard because im kind of just room full of monkeys with typewritering my parameters

lilac shadow
#

lol

velvet anchor
#

@feral lodge gonna come in and lay down the science on why what I said has been wrong 😛

lilac shadow
#

haha slandon is amazing at this stuff from what i've seen

feral lodge
lilac shadow
#

ooh, now that is interesting

#

i've always been fascinated at NN's recognising stuff in images

#

and things like that

velvet anchor
#

The math behind categorization is largely abstracted from the user

#

for Keras at least you just kind of throw your image as np.array() to the model and it tells you if it fits in one of your categories

lilac shadow
#

i like to know how stuff works behind the scenes too, yeah?

feral lodge
#

Image stuff is usually handled using what's called a convolutional neural network -- an example of which you'll find in Clay's github link up there. One thing I like very much about the learning example in the book there is that it shows that even plain old feed-forward neural networks can be used for images (though in a limited capacity)

lilac shadow
#

i see

#

all this terminology is going straight over my head so i'm going to be doing a lot of googles when i get round to looking at this in detail :D

velvet anchor
#

It's not super difficult to pick up

#

there's just a LOT of techniques for it

lilac shadow
#

oh, that makes sense

#

so you just need to find the best algorithms to do what you need to do, basically?

#

obviously easier said than done

velvet anchor
#

Best parameters, activation functions, etc

lilac shadow
#

yeah

velvet anchor
#

some is pretty easy but when dealing with images that are really close together it gets difficult to distinguish

#

I've been working on this image classification problem for like 3-4 months now for example

lilac shadow
#

i imagine it's a fuckton of optimisations to do as well

velvet anchor
#

@feral lodge what do you know about genetic algorithms?

feral lodge
#

A big part of it is also figuring out clever ways to preprocess your data. Clay's working with images, for example, but he can't just toss his images through the net -- he also has to preprocess them with stuff like this, to remove unnecessary noise and bring out important features

#

Not much at all actually, just what I remember from the AI intro course 😄

#

I'm sure I can google a bit to look smart though 👌

lilac shadow
#

oh yeah, i know about pre-processing ^^ i always imagined it to be a way of making the data more "standard" i guess you could say

velvet anchor
#

My boss was just explaining them to me and mentioned how they might be helpful to randomly tweak parameters

#

It can be. It can also be used to help guide the important parts too

lilac shadow
#

yeah, that makes sense

velvet anchor
#

I didn’t really have a question about them though Slandon. Just didn’t know if you’d heard of them

feral lodge
velvet anchor
#

😂. I’ve got it almost implemented already actually

feral lodge
#

Nice!

velvet anchor
#

Just gotta make it object oriented and pretty

feral lodge
#

Got this stuff on an open github?

velvet anchor
#

Nah

#

Not until we publish

#

I keep the network stuff kinda open but the pre processing is hidden 😛

feral lodge
#

That's cool to be part of a publication before your graduation 😮

velvet anchor
#

Yeah

#

and lead author 🤑

#

Pre processing is kept secret because it's the magic I guess (and it's ugly AF code wise because I wrote it Day 1 of learning Python so it takes around 72hours to do 12,000 images LOL)

feral lodge
#

😄

#

That's real nice though man, hope it turns out well

velvet anchor
#

But if you want to read into it, it's based on light measurement and ELA

feral lodge
#

I can read the paper laterz

velvet anchor
#

ELA is such a genius method of image forensics honestly

#

but it only works on jpgs 😦

#

but I keep all the stuff off public repos too @feral lodge so I don't work on them without getting paid hahaha

feral lodge
#

Don't you have the thirst for knowledge? 🤓

#

ELA looks cool, never heard of it. I've hardly worked on images at all

velvet anchor
#

I do but I also have a thirst for not starving

#

Although with that being said, I'm currently working on my classes while off the clock xd

feral lodge
#

hey me too

#

But i have some exams in August

#

What're your courses?

velvet anchor
#

I meant classes like OOP classes my mistake haha but

#

I'm taking C++, Algorithms, Philosophy and Tech writing this summer

#

then I finish in the fall with Operating Systems, Programming Languages, Senior Design, and Assembly 4

feral lodge
#

I have ascended 👼

#

That's some good stuff though! How far into your education are you?

velvet anchor
#

I graduate in dec

feral lodge
#

Master's?

velvet anchor
#

Bachelors

#

But a degree in math and CS

feral lodge
#

That's great! I didn't touch ML until after my bachelor's were completed

velvet anchor
#

Here's my genetic algorithm btw @feral lodge but I haven't been able to test it yet 😛

#
import random

import netparams


class Genetic:

    def __init__(self):
        self._population = []
        # Gene pools must exist *before* createpops() reads them
        self.actfunc = ['relu', 'selu', 'linear', 'tanh', 'softmax', 'elu', 'softplus', 'softsign', 'sigmoid']
        self.paramlist = ['window1', 'window2', 'window3', 'window4', 'window5', 'window6', 'conv_depth_1', 'conv_depth_2',
                          'conv_depth_3', 'conv_depth_4', 'conv_depth_5', 'conv_depth_6']
        self.actlist = ['activation1', 'activation2', 'activation3', 'activation4', 'activation5', 'activation6',
                        'activation7', 'activation8']
        self.createpops()

    def createpops(self):
        # Initial population of 9 random parameter sets
        for _ in range(9):
            child = netparams.NetworkParams()
            for attrib in self.paramlist:
                child.setval(attrib, random.randint(1, 36))

            for slot in self.actlist:
                child.setval(slot, random.choice(self.actfunc))

            child.setval('hidden', random.randint(400, 1600))
            self._population.append(child)

    def evolve(self):
        # Two distinct parents
        parent1, parent2 = random.sample(self._population, 2)

        # Uniform crossover: each gene (including 'hidden') comes from one parent;
        # random.choice needs a single sequence, hence the tuple
        child = netparams.NetworkParams()
        for attrib in self.paramlist + self.actlist + ['hidden']:
            child.setval(attrib, random.choice((parent1.getval(attrib), parent2.getval(attrib))))

        return child

    def compare(self, child):
        # Replace the weakest member if the child is fitter
        # (assumes every member's 'fit' is set after training)
        weakest = min(self._population, key=lambda p: p.getval('fit'))
        if weakest.getval('fit') < child.getval('fit'):
            self._population.remove(weakest)
            self._population.append(child)
#
class NetworkParams:

    def __init__(self, **kwargs):
        for key,value in kwargs.items():
            setattr(self,key,value)

    def getval(self, networkparam):
        return getattr(self, networkparam)

    def setval(self, networkparam, value):
        setattr(self,networkparam, value)
round current
#

Would this be an appropriate place to ask for help concerning Matplotlib and Python?

velvet anchor
#

Sure

#

actually, maybe.

#

this is more for analytics / ML so it depends on what you're asking about within it 😛

round current
#

Aperture redirected my help request to this channel

velvet anchor
#

Ask away 😃

round current
#

It is about plotting a polar plot essentially.

#

I am working on generating a Radar PPI Scope using matplotlib. I need fine control of how major and minor ticks are handled along with tick labeling. Since plt.polar does not offer sufficient control over these parameters (to my knowledge), I have opted to use a PolarAxes transformation and AxisArtist functions to get the control I need. However, I have run into difficulties with how tick label printing and minor tick marks are handled. The picture below is an example PPI template that I seek to recreate.

naive swallow
#

the channel description was always like this

round current
#

And this is what I have recreated thus far. North bearing corresponds to 0°.

#

I can't figure out how to get minor ticks to print every 1°. Additionally, I cannot get the major axis tick labels to print every 15° starting at 0.
My current Code: https://paste.pythondiscord.com/urozeduzov.py
These issues have been stumping me for the last couple of days, so I figured it was time to ask for some advice. 😛
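For comparison, matplotlib's standard polar projection does expose the bearing convention and fixed tick positions, so a PPI-style template might not need the AxisArtist route. A sketch under those assumptions (not tested against the pasted code): labels every 15° from 0, North at the top, clockwise bearings, and the 1° / 15° tick marks drawn as short radial segments with `vlines`, since the polar theta axis hides its own tick lines by default.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})

# Radar bearings: 0 deg at North, increasing clockwise
ax.set_theta_zero_location("N")
ax.set_theta_direction(-1)

# Major grid lines and labels every 15 deg, starting at 0
major = np.arange(0, 360, 15)
ax.set_xticks(np.deg2rad(major))
ax.set_xticklabels([f"{d:03d}" for d in major])

# Tick marks as short radial segments at the outer ring:
# every 1 deg (short) and every 15 deg (longer)
rmax = 1.0
ax.set_ylim(0, rmax)
ax.vlines(np.deg2rad(np.arange(0, 360, 1)), 0.97 * rmax, rmax, colors="k", linewidth=0.5)
ax.vlines(np.deg2rad(major), 0.93 * rmax, rmax, colors="k", linewidth=1.0)

fig.canvas.draw()  # render; use plt.show() or fig.savefig(...) in practice
```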

velvet anchor
#

Lemme get matplot installed and such and i'll take a look

#

and by that I mean we're taking our break in class now so BRB 10

#

Damn can't get matplot installed

round current
#

Which dependency manager are you using?

velvet anchor
#

pip

#

Keeps failing for no reason

#

(in a virtual environment)

#

not in a virtual environment it says operation isn't permitted

round current
#

Interesting. I am running it in a virtualized environment right now. I use MiniConda though as my dependency manager.

young aurora
#

Here's a cool thing I made:

naive swallow
#

ooh that's some pretty pretty data

young aurora
#

You bet it is. It's also problematic because it includes mythical beings, but hey, what can you do

dreamy tartan
#

Hi everyone,

Has anyone tried to predict words from letters? Or give word suggestions.

I want to train a model for my language with my own data and i want to predict words from letters or give word suggestions.

I'm open to all suggestions

small ore
#

@lilac shadow Andrew Ngs course is good for someone who fears math. He even teaches basic matrix multiplication and skips over derivations which require the simplest of PDs and straight away goes to the final result and concentrates more on discussing it

feral lodge
#

I've heard Ng's course is fantastic 👌

#

@dreamy tartan Do you mean like autocorrect, or, easier, autocompletion? Or does your language use other symbols like фкушщк αιερξςονςγ বগবডসকত and you want to predict those kinds of words using the abc alphabet? 😄

#

Can you show us a small example of what the program should be able to do?

#

@round current I've never used matplotlib, but this guy https://stackoverflow.com/a/44657941 seems to have created major and minor ticks using some other approach than transformation

#

I'm not sure how to interpret the graph @young aurora, could you explain? For Yuan for instance, the bar goes between 3ish and 12ish years, but Wikipedia says the dynasty lasted from 1260 to 1368 🤔 Those dates seem to be pretty exact, so why do you have error bars?

young aurora
#

Oh, so this isn’t the length of the dynasty - this is the length of time for individual rulers!

#

Also, it’s all based on the traditional dating used by Chinese historians (AKA the old one) rather than newer dating methods.

#

I should be more clear with the title/X label.

feral lodge
#

Hmm, but these were the Yuan emperors -- only Kublai lasted for 34 years, but I'm interpreting the graph to say he ruled for maybe 2 years 😄

#

And Temür lasted 35 years, but the tick dividing the Yuan bar in two is far from the middle 😄

young aurora
#

So this is important to understand for chinese dating

feral lodge
#

Lay it on me

young aurora
#

The emperors aren't necessarily what you see on Wikipedia

#

These are taken not from historians in the modern, technical sense, but rather from court records made and changed much later

#

Think of it as “edited history”

#

I'll go check the data for them, though, and give you a more complete answer

#

It's 398 total emperors, haha

feral lodge
#

Very interesting! Send it as a DM so we don't scare away new questions here

dreamy tartan
#

@feral lodge my language uses the Latin alphabet 😃 Peter Norvig's approach helped me a lot and I think it solved my problem; I'm building a spell checker with it now. I'm guessing I'd need something like this for autocorrect and autocompletion too. Am I correct?

round current
#

@feral lodge That is what I ended up doing. I generated small line segments prior to the transformation to create the minor and major ticks. Far from an elegant solution, but workable. I am satisfied with the end result.

young aurora
#

That looks cool!

feral lodge
#

This is the Peter Norvig approach, right? https://norvig.com/spell-correct.html Looks like it already functions as an autocorrector! For autocompletion a good starting approach is to just keep track of the letters the user has written, and keep a list of all words in the dictionary that begin with that sequence, sorted by how common the word is (if you have that info).
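A minimal sketch of that prefix lookup, assuming the dictionary is simply the words of your own corpus (the names `build_index` and `complete` are made up for illustration). Keeping the distinct words sorted means every word that starts with a given prefix sits in one contiguous run, which `bisect` finds quickly; the matches are then ranked by how often each word occurred.

```python
import bisect
from collections import Counter


def build_index(corpus_words):
    """Count word frequencies and keep a sorted list of the distinct words."""
    counts = Counter(w.lower() for w in corpus_words)
    return sorted(counts), counts


def complete(prefix, sorted_words, counts, k=5):
    """Return up to k words starting with `prefix`, most frequent first."""
    prefix = prefix.lower()
    # Words sharing a prefix form one contiguous run in sorted order.
    start = bisect.bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix):
            break
        matches.append(word)
    # Most common first; ties broken alphabetically so the output is stable.
    matches.sort(key=lambda w: (-counts[w], w))
    return matches[:k]


words = "the cat sat on the mat then the cat ran to the theatre".split()
sorted_words, counts = build_index(words)
print(complete("th", sorted_words, counts))
```

For a big vocabulary or very frequent lookups a trie is the usual next step, but the sorted list is hard to beat for simplicity.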

#

Yeah, that looks awesome, great job 😮

velvet anchor
#

Nice job @round current glad you got it working

small ore
#

Slandon! You know the universe

hasty maple
#

I don't know why I was tagged in that hastebin, Clay 😕

velvet anchor
#

Oh, just because it was a dumb solution: just randomize the parameters and rerun it

#

😛

#

what makes more sense from an OOP perspective for an evolutionary algorithm? wrapping the model + parameters inside of an overarching simulation class or just letting the model be a procedural setup that calls parental gene manipulation as needed?

#

also because doing it that way does free up the model memory, Ichi: it seems just running del model and gc.collect() clears the GPU allocation, allowing another model to run again

#

and, I thought, you mentioned being interested in a solution from within the same script

hasty maple
#

ah yeah I was, but it was hard to follow the code as I checked back a day later and wasn't really sure why I was tagged.

I'll keep del model and gc.collect() in mind. Do I need to import anything to run gc.collect()?

velvet anchor
#

I don't believe so
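For what it's worth, `gc` is part of Python's standard library, but it still has to be imported with `import gc` before `gc.collect()` works. The sketch below uses a made-up `BigModel` class with a reference cycle (framework model objects often contain cycles) and a `weakref` probe to show why `del` alone may not free an object while an explicit `gc.collect()` does. Whether GPU memory is actually handed back depends on the framework; with Keras, `tf.keras.backend.clear_session()` is also commonly called, so treat this as a sketch of the Python side only.

```python
import gc
import weakref


class BigModel:
    """Hypothetical stand-in for a model that holds a lot of memory."""

    def __init__(self):
        self.weights = [0.0] * 1_000_000
        self.me = self  # reference cycle: reference counting alone can't free this


model = BigModel()
probe = weakref.ref(model)  # a weak reference does not keep the object alive

del model  # removes our name, but the cycle still references the object
print("alive after del:", probe() is not None)  # normally True

gc.collect()  # the cycle collector finds and frees the unreachable cycle
print("alive after gc.collect():", probe() is not None)  # False
```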

#

Once I get back to the office I'm gonna be rewriting it to add in the genetic / evolutionary algorithm instead of just rand()ing parameters

hasty maple
#

hard in the sense that I didn't know why I was given the code; the code itself was easy to understand

velvet anchor
#

Yeah, just because you'd been the person I'd been primarily talking to about it 😛

hasty maple
#

Good luck with the Genetic Algorithms

velvet anchor
#

Is there a quick way to calculate how much space a network will take up in memory?

#

Is it just input size^2 * layers?

small ore
#

Can someone tell me what a genetic algorithm is, and how and where they are useful?

velvet anchor
#

Ok so a genetic algorithm (one kind of evolutionary algorithm; there are a few other terms too) is a way to evolve a set of parameters that gets better over time

#

It works, at a really high level, like this. Create a population of several parameter sets. Let's use 4 in this example

#

These parameter sets may look like:
1 layer, 3x3 window, 200x200 input size, sigmoid output function
3 layers, 10x10 window, 400x400 input size, softmax output function
And 2 more with different values for the parameters you want to adjust

#

Now you take these sets of parameters and compute a score for each one, measuring how well it matches an optimal output. In my case, for example, I'm scoring them on how accurately they identify images.

Now that I have 4 sets of parameters, each with a score, I can create children. I take 2 random sets from my population and randomly pick each parameter from one of them, so I may take dad's input size and layers and mom's output function and window.

You take this new child set of parameters and score it. If it beats the lowest-scoring set in the population, it replaces that set, and you run it again

#

You can also apply “mutations” to your population: for example, take a parameter and add 1 to it, or whatever
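A minimal sketch of the loop described above, assuming each individual is a dict of hyperparameters drawn from a small, made-up search space and that higher fitness is better. Crossover picks each gene from a random parent (uniform crossover), mutation occasionally swaps a gene for another allowed value, and a child only enters the population if it beats the current worst member. The toy fitness in the usage line is a placeholder for something like validation accuracy after training.

```python
import random

# Hypothetical search space: each gene and the values it may take.
SPACE = {
    "layers": [1, 2, 3, 4],
    "window": [3, 5, 7, 10],
    "input_size": [200, 300, 400],
    "output": ["sigmoid", "softmax"],
}


def random_individual(rng):
    """One random parameter set drawn from SPACE."""
    return {gene: rng.choice(values) for gene, values in SPACE.items()}


def crossover(mom, dad, rng):
    """Uniform crossover: every gene is copied from a randomly chosen parent."""
    return {gene: rng.choice((mom[gene], dad[gene])) for gene in SPACE}


def mutate(ind, rng, rate=0.1):
    """With probability `rate`, resample a gene from its allowed values."""
    return {gene: rng.choice(SPACE[gene]) if rng.random() < rate else value
            for gene, value in ind.items()}


def evolve(fitness, pop_size=4, generations=50, seed=0):
    """One child per step; it replaces the worst member only if it is fitter."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        mom, dad = rng.sample(pop, 2)                 # two random parents
        child = mutate(crossover(mom, dad, rng), rng)
        child_score = fitness(child)
        worst = min(range(pop_size), key=scores.__getitem__)
        if child_score > scores[worst]:               # survival of the fittest
            pop[worst], scores[worst] = child, child_score
    return pop, scores


pop, scores = evolve(lambda ind: ind["layers"] + ind["window"], generations=100)
print(max(scores))
```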

#

Does that make sense, @small ore?

small ore
#

Still reading and trying to make sense

#

So, if I understood it right, you set the parameters for each layer (including activation function, window, etc.) quite randomly in the beginning, and then use the "genetic algorithm" to change parameters (is that what you call mutation?) and see if it scores better?

velvet anchor
#

Yeah exactly. You just randomly set parameters, make children from them and see if they're better. Then, survival of the fittest: you replace the lowest score with the new one if it's better

#

And repeat until you're satisfied. That whole process is called a genetic / evolutionary algorithm

#

It's useful for optimization. There are a few techniques like the one above, simulated annealing, swarm optimization, etc.

small ore
#

Wow. And here I am finding it difficult to understand even the basic NN well

#

There is loads to learn flop

velvet anchor
#

There's a lot to learn but it's not too hard once you get it 😃

#

I've been working with Keras for a few months now and I'm like just barely scratching the surface kinda.

small ore
#

I have read about forward and backward propagation twice now, and while I understand everything that is said, I have yet to figure out what the knowns and the unknowns are in each step

velvet anchor
#

So if I'm understanding this correctly, a network with input shape 100x100 in RGB, 3 convolutional layers with 3 filters each, and a dense output will take up

100x100x3 = 30000 +
100x100x3 = 30000 +
100x100x3 = 30000 +
100x100x3 = 30000 +
1x100x100 = 10000

Then multiply the total by the batch size?
Did I do that right? Does window size matter at all?
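A rough way to sanity-check numbers like these is to count parameters and activations separately, since only the activations scale with batch size. The sketch below assumes stride-1 `Conv2D` layers with `'same'` padding (so every layer keeps the 100x100 size), then `Flatten` and a single `Dense` output; the function name and defaults are made up. It answers the window-size question in one direction: kernel size changes the parameter count, not the activation size. Gradients, optimizer state and framework overhead come on top during training, so treat the result as a lower bound rather than an OOM guarantee.

```python
def conv_net_memory(h, w, c_in, conv_filters, kernel=3, dense_units=1,
                    batch_size=1, bytes_per_float=4):
    """Return (parameters, activations per sample, bytes) for a simple conv net.

    Assumes 'same'-padded, stride-1 Conv2D layers followed by Flatten + Dense.
    """
    params = 0
    activations = h * w * c_in              # the input image itself
    c = c_in
    for filters in conv_filters:
        params += kernel * kernel * c * filters + filters   # weights + biases
        activations += h * w * filters      # 'same' padding keeps h x w
        c = filters
    params += h * w * c * dense_units + dense_units         # Dense after Flatten
    activations += dense_units
    total_floats = params + batch_size * activations        # params stored once
    return params, activations, total_floats * bytes_per_float


print(conv_net_memory(100, 100, 3, [3, 3, 3]))   # (30253, 120001, 601016)
```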

velvet anchor
#

I know there's model.timeline() or model.summary(), but I believe the model has to be fully loaded into memory before it can report that information. So I'm trying to avoid OOM errors rather than wrapping everything in a try/except

velvet anchor
#

@small ore those hastebin links above are an example of an evolutionary algorithm (version 1.0) if you wanted to see one fully written

small ore
#

Oh wow. Not sure if I will understand it. I will take a look at it. Thank you

#

Are those ruby files? 😮

velvet anchor
#

No, it's Python

small ore
#

Okay. It does look like python classes and methods but the extension in those hastebins made me think it could be ruby

#

Thanks for getting me interested in it

velvet anchor
#

No problem. Haha. Gave me a reason to come into work today and finish it

velvet anchor
#

Yeah, IDK why they got the Ruby extension, that's weird.

lilac shadow
#

if you edit them to have a .py URL, it'll have python syntax highlighting

velvet anchor
#

Also, don't hate on my awful use of kwargs. There's a reason for it as this build gets fleshed out

quiet gyro
#

@small ore To answer the question you asked this morning at a very high level...

Genetic Algorithms are useful for optimizing over extremely large search spaces. They don't necessarily give you the absolute best possible value. However, they often get very close, at a significantly lower computational cost (less computer power).

Think of it as finding a solution that's 90% as good as the best, in 2 days on your laptop, instead of the absolute best in a year on a supercomputer.

velvet anchor
small ore
#

@quiet gyro Thank you

lapis sequoia
#

hey data scientists, I wonder ... I have some time series data that I use as a regressor in a GLM for some other measurement. the problem is that there are some high-intensity peaks in this time series that kinda mess up the regression

#

i know that music processing people usually apply some "dynamic range compression" to give the whole recording a similar amplitude. is that used for other data as well, or would it change too much?

small ore
#

My two cents: do the data points behind those peaks really matter for representing the data and for the prediction? What happens if they are removed?

#

@lapis sequoia

velvet anchor
#

Yeah, that type of processing is used in other places, Hypo

velvet anchor
#

It might change too much, but smoothing out massive peaks is always part of the challenge. Do keep in mind it might alter the rest of your results too, so you might have to redo your formulas with the compressed data
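Two minimal ways to tame peaks in a regressor, sketched with NumPy: a signed log transform, and an audio-style compressor that leaves values below a threshold untouched and shrinks the excess above it by a ratio. The threshold and ratio are knobs you would have to choose (a high percentile of |x| is one option); both are assumptions here, not a standard recipe. Either transform changes the regressor nonlinearly, so the GLM coefficient no longer refers to the raw scale.

```python
import numpy as np


def log_compress(x):
    """Signed log transform: close to linear near 0, logarithmic for large peaks."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(np.abs(x))


def soft_compress(x, threshold, ratio=4.0):
    """Compress |x| above `threshold` by `ratio`, like an audio compressor.

    Values with |x| <= threshold pass through unchanged; the sign is kept.
    """
    x = np.asarray(x, dtype=float)
    mag = np.abs(x)
    out = np.where(mag > threshold, threshold + (mag - threshold) / ratio, mag)
    return np.sign(x) * out


signal = np.array([0.5, 1.0, -2.0, 10.0, 1.2])
threshold = np.percentile(np.abs(signal), 80)   # e.g. compress above the 80th percentile
print(soft_compress(signal, threshold))
print(log_compress(signal))
```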

bleak geode
#

Any pandas people here? I've got a little thing I'd like to pick your brains about.
I've got two tables, one for hospitals and one for patients.

#

Each patient record has a foreign key back to the hospital table.

#

each patient record also has a mortality result (ALIVE/EXPIRED)

#

I'm trying to add a column to the hospital table of the mortality ratio, that is n_expired / n_total

#

Currently I was thinking about going down the hospital table with iterrows() and grabbing all the patient records for the corresponding hospital and calculate the ratio one by one. I was wondering if there was a more pandas-y/pythonic way of doing this..

velvet anchor
#

Not that I know of. #databases may have a clever way to do it that I'm unaware of, but your method is how I'd approach it.

feral lodge
#

This would be the R-y way of doing it, and pandas is meant to resemble R, afaik:

```python
import pandas as pd

apples = pd.DataFrame({'color': ["red", "green", "red", "yellow", "red", "green", "yellow"], 'taste': ["nasty", "tasty", "nasty", "tasty", "tasty", "tasty", "nasty"]})

color_stats = pd.DataFrame({'color': ["red", "green", "yellow"]})

def compute_tasty_ratio(c):
    c_apples = apples[apples['color'] == c]
    n_tasty = (c_apples['taste']=="tasty").sum()
    n_tot = c_apples.shape[0]
    return n_tasty/n_tot

color_stats['ratio'] = color_stats.color.apply(compute_tasty_ratio)

print(apples)
print(color_stats)
```

In this example we add a column to show the ratio of tastiness of apples of different colors

#

@bleak geode

#

Prints

```
    color  taste
0     red  nasty
1   green  tasty
2     red  nasty
3  yellow  tasty
4     red  tasty
5   green  tasty
6  yellow  nasty

    color     ratio
0     red  0.333333
1   green  1.000000
2  yellow  0.500000
```
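A more vectorized alternative to iterrows() or apply, sketched with made-up column names (`hospital_id`, `mortality`) standing in for the real tables: turn the status into a boolean, take its mean per hospital with `groupby` (the mean of a boolean column is exactly n_expired / n_total), and `map` the result onto the hospital table by its key. Hospitals with no patients end up as NaN.

```python
import pandas as pd

hospitals = pd.DataFrame({"hospital_id": [1, 2, 3, 4],
                          "name": ["North", "South", "East", "West"]})
patients = pd.DataFrame({
    "hospital_id": [1, 1, 1, 2, 2, 3, 3],   # foreign key into `hospitals`
    "mortality": ["ALIVE", "EXPIRED", "ALIVE", "EXPIRED", "EXPIRED", "ALIVE", "ALIVE"],
})

# True where the patient died; the per-hospital mean is n_expired / n_total.
expired_ratio = (patients["mortality"].eq("EXPIRED")
                 .groupby(patients["hospital_id"])
                 .mean())

# Look each hospital's ratio up by key; hospitals without patients get NaN.
hospitals["mortality_ratio"] = hospitals["hospital_id"].map(expired_ratio)
print(hospitals)
```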
velvet anchor
#

😮 @feral lodge I thought you died

feral lodge
#

I live!! Got a 70 hour work week during the summer though, so I'll probably not be as active :O

#

Besides, you keep answering all questions for me 👌

velvet anchor
#

Ok so, @feral lodge, general question here. If I wanted to release the evolutionary algorithm on pip as a framework, what makes the most sense? Allowing users to submit their own class and a list of keys to create children from? Or another way?

feral lodge
#

Let me check my textbook and refamiliarize myself with GAs 🤔

#

Is it a general GA, or is it specifically for training NN weights?

velvet anchor
#

I want to at some point release another version specifically for NN weights, but there's a lack of GA frameworks for Python in general, so having a general version too isn't a bad idea

feral lodge
#

Good initiative!

velvet anchor
#

At some point for NN weights I envision a constructor where you say something like ‘Genetic(pop=X,conv_layers=y,...)’ and have it generate full models

#

But for a regular GA it's much simpler

feral lodge
#

The way I see it the algorithm needs two things: how each individual member of the population is formatted (like, is it a binary vector? A vector of floats? A mix? How long is the vector?) and a fitness function

#

What did you mean by their own class?

#

Would your framework be fancier than this?

lilac shadow
#

genetic algorithms amaze me ^^

feral lodge
velvet anchor
#

Not really. But I wasn't going to implement the fitness function from within (you can see my framework as it is up a few lines). Essentially I envisioned it so that the user has a simulation class that scores fitness.

From within the simulation they instantiate my Genetic class, providing a list of attributes to be randomized (the things we care about) and a class to contain them. My GA framework will handle the evolution / population control / etc. so that

#

From the user's perspective, all they would have to do is:
```
GeneticF = Genetic(...)
for X in GeneticGenerator:
    X.fit = function_result
    X.compare()
```

#

And then after a predetermined number of iterations it would spit out the population of most fit results

#

Where the generator would handle evolving from the pop / mutations / etc

#

Whether you use a class or a dict to contain the values doesn't matter much, but a class gives you the freedom to use @property so you can apply processing to specific results later
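A small sketch of the kind of thing meant here, with a hypothetical `Individual` class: the evolved genes are plain attributes, and a `@property` derives a value from them on access, so the derived value always matches the current genes and never needs to be evolved or stored separately. The RGB input shape is an assumption for illustration.

```python
class Individual:
    """Hypothetical container for one evolved parameter set."""

    def __init__(self, layers, window, input_size):
        self.layers = layers
        self.window = window
        self.input_size = input_size
        self.fit = None   # filled in by the user's fitness function

    @property
    def input_shape(self):
        # Derived on access from the evolved input_size (RGB assumed).
        return (self.input_size, self.input_size, 3)


ind = Individual(layers=3, window=5, input_size=400)
print(ind.input_shape)   # (400, 400, 3)
ind.input_size = 200
print(ind.input_shape)   # (200, 200, 3)
```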

feral lodge
#

Oh, I see 🤔 So they test fitness themselves, and your class functions as a way to produce new individuals to test?

velvet anchor
#

Yeah

#

Unless thereโ€™s a better way

feral lodge
#

Seems fine to me!

#

I can't think of any direct improvement of what you have right now, except generalizing the code in Genetic

#

I think 🤔

velvet anchor
#

Yeah that comes too

#

I just needed something workable and I had 2 hours to write and test before class 😂

feral lodge
#

Or maybe I still don't get it completely, the reproduction rates are based on the fitness evaluation right?

#

The percentages in that screenie

#

Does the user have to input the fitness evaluation for each individual?

velvet anchor
#

Not exactly. It's just kind of random sampling

feral lodge
#

But the point is to choose/evolve fit individuals 😮

velvet anchor
#

The user would supply fit = function result inside a loop or whatever

#

And then the framework would handle making sure it fit within the population

#

It is the goal to evolve fit individuals. Yes.

#

However, fitter parents don't always produce fitter offspring

#

So just breeding from the two fittest every time doesn't guarantee the best result

#

So you randomly sample parents from across the “fittest” part of your population

#

So like in your picture above, unless I misunderstood what you asked, you don't mate just the best two and the worst two. You kinda mate all of them and see what the best results are, make a new population of the best performers that's the same size, and repeat

feral lodge
#

Oh no, definitely not just pair the best two

#

But I think our two algorithms are slightly different

#

"In this particular variant of the genetic algorithm, the probability of being chosen for reproducing is directly proportional to the fitness score, and the percentages are shown next to the raw scores."

I was thinking this
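A minimal sketch of that fitness-proportional ("roulette wheel") selection using the standard library's `random.choices`, which samples with replacement according to weights. It assumes fitness scores are non-negative and not all zero; the population and scores in the usage lines are made up.

```python
import random


def roulette_select(population, fitnesses, k, rng=random):
    """Pick k parents, each with probability fitness / total fitness (with replacement)."""
    if any(f < 0 for f in fitnesses) or sum(fitnesses) == 0:
        raise ValueError("need non-negative fitness scores that are not all zero")
    return rng.choices(population, weights=fitnesses, k=k)


population = ["A", "B", "C", "D"]
fitnesses = [30, 20, 10, 40]
total = sum(fitnesses)
for ind, f in zip(population, fitnesses):
    print(f"{ind}: fitness {f}, chosen with probability {f / total:.0%}")
print(roulette_select(population, fitnesses, k=4, rng=random.Random(42)))
```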

velvet anchor
#

Maybe. I'm on mobile too, so I could just be representing my ideas incompletely

#

Yeah nah. I'm not choosing based on fitness scores in any way, though I suppose it wouldn't be hard to implement. I was thinking of just mating all of them together, so 1&2, 1&3, 1&4, 2&3, 2&4, 3&4. Make a new population of the best 4. Repeat
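A minimal sketch of the all-pairs scheme just described, assuming individuals are equal-length gene lists, one uniform-crossover child per pair, and that the next generation is the best N of parents plus children (matching the replace-the-worst rule discussed further down). `itertools.combinations` yields exactly the pairs 1&2, 1&3, 1&4, 2&3, 2&4, 3&4 for four individuals.

```python
import itertools
import random


def crossover(mom, dad, rng):
    """Uniform crossover: each gene is copied from a randomly chosen parent."""
    return [rng.choice(genes) for genes in zip(mom, dad)]


def all_pairs_generation(population, fitness, rng):
    """Mate every pair once (N choose 2 children) and keep the best N overall."""
    children = [crossover(a, b, rng)
                for a, b in itertools.combinations(population, 2)]
    pool = population + children
    pool.sort(key=fitness, reverse=True)   # best first
    return pool[:len(population)]


rng = random.Random(0)
pop = [[rng.randint(0, 1) for _ in range(8)] for _ in range(4)]   # 4 random bit strings
for _ in range(10):
    pop = all_pairs_generation(pop, fitness=sum, rng=rng)   # toy fitness: number of 1s
print(pop)
```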

feral lodge
#

I see 🤔 Is this approach based on a paper or something?

#

It seems time consuming

velvet anchor
#

Nah, it's not. Just how my professor explained it. 😂

#

But I guess GAs aren't exactly fast anyway.

feral lodge
#

Oh sure, they're basically a random search! But I was thinking your approach adds a lot of extra randomness and time/space requirements, while disregarding a big part of the "genetic" aspect 🤔

velvet anchor
#

Yeah, could be for sure. Definitely wouldn't hurt to add in a probability of being chosen

feral lodge
#

But, the Genetic class has no knowledge of the fitness function then?

#

That's all handled by the user?

velvet anchor
#

Yeah

feral lodge
#

So how do you choose the best children after pairing all individuals?

velvet anchor
#

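```python
for child in babies:
    worst = min(population, key=lambda ind: ind.fit)
    if child.fit > worst.fit:
        population[population.index(worst)] = child   # replace worst with baby
```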

feral lodge
#

And child.fit is computed how? o:

velvet anchor
#

The generator function would return a child object to the user

#

The user would score the child with their fitness function

#

Set child.fit equal to its result

feral lodge
#

gotcha 👌

velvet anchor
#

Then the generator function could, using the child's newly given fitness score, compute a new child object to supply
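A minimal sketch of that generator protocol, assuming the names `Individual` and `GeneticSketch` (both hypothetical, not the framework discussed here): iterating yields individuals, the caller's loop body sets `.fit`, and when the loop asks for the next one the generator resumes, reads that score, and lets the child replace the current worst member if it is fitter. The first `pop_size` individuals are random and simply fill the population.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Individual:
    genes: dict
    fit: Optional[float] = None   # set by the caller's fitness function


class GeneticSketch:
    """Yields Individuals to score; the caller sets `.fit` before the next one."""

    def __init__(self, space, pop_size=4, n_children=20, seed=0):
        self.space = space              # gene name -> list of allowed values
        self.pop_size = pop_size        # at least 2, so two parents can be drawn
        self.n_children = n_children
        self.rng = random.Random(seed)
        self.population = []

    @staticmethod
    def _scored(ind):
        if ind.fit is None:
            raise ValueError("set .fit on each individual before asking for the next")
        return ind

    def __iter__(self):
        # Phase 1: random individuals fill the initial population.
        for _ in range(self.pop_size):
            ind = Individual({g: self.rng.choice(v) for g, v in self.space.items()})
            yield ind
            self.population.append(self._scored(ind))
        # Phase 2: breed two random parents; keep the child if it beats the worst.
        for _ in range(self.n_children):
            mom, dad = self.rng.sample(self.population, 2)
            child = Individual({g: self.rng.choice((mom.genes[g], dad.genes[g]))
                                for g in self.space})
            yield child
            self._scored(child)
            worst = min(range(len(self.population)),
                        key=lambda i: self.population[i].fit)
            if child.fit > self.population[worst].fit:
                self.population[worst] = child


ga = GeneticSketch({"layers": [1, 2, 3], "window": [3, 5, 7]}, n_children=30)
for x in ga:
    x.fit = x.genes["layers"] * x.genes["window"]   # stand-in for a real fitness function
print(max(ga.population, key=lambda ind: ind.fit))
```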

feral lodge
#

You said you pair the individuals [1,2,3,4] like this: [1,2], [1,3], [1,4], [2,3], [2,4], [3,4] right?

velvet anchor
#

Ye

feral lodge
#

Does each pairing only generate one offspring?

velvet anchor
#

Right now

#

But there's nothing stopping that from changing later

#

But it seems that most implementations only generate one offspring

feral lodge
#

Indeed! But I think most implementations don't choose parents like this 😄 In your case, if 1 is the individual [1111 1111], we can never conceive a child with 1111 in the second half
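To make the bit-string point concrete: with one-point crossover, if each pair produces a single child and the first-listed parent always supplies the head, individual 1 (first in every one of its pairs) never passes on its tail. A common remedy, sketched below with a made-up function name, is to return both children of each crossover (or to randomize the cut point and the parent order).

```python
import random


def one_point_crossover(mom, dad, cut=None, rng=random):
    """Swap tails at `cut` and return BOTH children, so each parent's head and
    tail can be inherited. A random cut in 1..len-1 is used if none is given."""
    if cut is None:
        cut = rng.randrange(1, len(mom))
    return mom[:cut] + dad[cut:], dad[:cut] + mom[cut:]


print(one_point_crossover("11111111", "00000000", cut=4))   # ('11110000', '00001111')
```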

#

But one bigger thing I was thinking regarding that, is that you generate (N choose 2) children each generation leap, which the user has to test before settling on the N best ones, which become the next generation

#

Whereas in the figure up there, they generate N children each generation leap

velvet anchor
#

Right

#

It's kinda hard to say which is correct, I think. The percentage approach gets you fewer but possibly better guesses, while this way samples more completely. Hard to say; both have their pros and cons

feral lodge
#

Sure! And I'm definitely no expert

#

But if we compare the complexities of f(x) = x choose 2 and g(x) = x we get this
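The graph isn't reproduced here, but the numbers behind it are easy to tabulate: pairing every individual once gives N choose 2 = N(N-1)/2 children per generation, versus N for a scheme that breeds one child per population slot. Each child needs its own fitness evaluation (for a neural network, a full training run), so the gap matters quickly. The helper name is made up.

```python
from math import comb


def children_per_generation(n, scheme):
    """Children to evaluate per generation: all pairs (n choose 2) vs one per slot (n)."""
    if scheme == "all_pairs":
        return comb(n, 2)
    if scheme == "one_per_slot":
        return n
    raise ValueError(f"unknown scheme: {scheme}")


for n in (4, 10, 20, 100):
    print(f"N={n:>3}: all pairs {children_per_generation(n, 'all_pairs'):>5}, "
          f"one per slot {children_per_generation(n, 'one_per_slot'):>3}")
```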

#

So for x choose 2 to be a reasonable choice, the best child must be very similar to a member of the original population

#

Whereas the blue line will quickly move through populations, finding descendants very different from the initial pop

velvet anchor
#

Right

lilac shadow
#

nerds! :D

feral lodge
velvet anchor
#

I wonder if there's any justification for just random choice, where each individual has a 1/n chance of being a parent

feral lodge
#

That starts to approach beam search a bit imo 🤔

velvet anchor
#

I was just thinking in terms of NNs on that where

#

Given certain problems you have to beam because small changes can give drastic results

#

Especially with breeding and activation functions

feral lodge
#

Sure, that's a pickle

lilac shadow
#

ooh i like pickles :^)

feral lodge
#

Are you sure you want to have the activation function as part of the GA though? When I first heard you explaining the application of GAs in NNs i figured you were just going to evolve the weights

velvet anchor
#

It can go both ways

#

Not gonna evolve the output function to keep the uh

#

Answer range the same

#

But everything else is fair game I think

feral lodge
#

You da boss 👌

velvet anchor
#

Right like obviously it makes no sense to score fitness on networks with sigmoid

#

And then suddenly breed a tanh answer

#

So your last dense layer would stay the same

#

But I think activation functions on the convolution layers can be helpful
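One way to encode that split, sketched with a made-up genome layout: the activations of the convolution layers are genes that mutation may change, while the output activation is a constant copied into every individual, so all of them predict on the same range. The activation names are ones Keras accepts as strings; the mutation rate is arbitrary.

```python
import random

HIDDEN_ACTIVATIONS = ["relu", "elu", "tanh", "sigmoid"]   # evolvable choices
OUTPUT_ACTIVATION = "sigmoid"   # fixed: never mutated or crossed over


def random_genome(n_conv, rng):
    return {"conv_activations": [rng.choice(HIDDEN_ACTIVATIONS) for _ in range(n_conv)],
            "output_activation": OUTPUT_ACTIVATION}


def mutate_genome(genome, rng, rate=0.2):
    """Resample each conv activation with probability `rate`; keep the output fixed."""
    conv = [rng.choice(HIDDEN_ACTIVATIONS) if rng.random() < rate else act
            for act in genome["conv_activations"]]
    return {"conv_activations": conv, "output_activation": OUTPUT_ACTIVATION}


rng = random.Random(0)
genome = random_genome(3, rng)
for _ in range(5):
    genome = mutate_genome(genome, rng)
print(genome)
```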

feral lodge
#

Worth a shot! Have you tried training with it yet?

velvet anchor
#

It's running over the weekend

#

I had some bugs to quash which I think have all been taken care of

#

Was gonna go into work today in a couple hours and see if it's still running overnight

velvet anchor
#

I feel like I'm missing out on scikit-learn by not knowing what it's capable of. Is there a resource showing all the advanced stuff it can do?

spring radish
#

's got a cool flowchart

small pumice
#

There are quite a few tutorials on stock price prediction with machine learning, but many of them are outdated or use the Quandl library, which only has stock data up to March of this year. Does anyone know of any tutorials that don't have these problems?

spring radish
#

note: if there was a tutorial that worked for actually getting a profitable model, everyone would be doing it

#

i had a group of friends work on a machine learning model for stock prediction for half a year, and they only got vaguely positive predictions that hypothetically made them money but were never tested live...

#

they couldn't find data without paying big bucks, either.

quiet gyro
#

Yeah, anything that's real-time or any useful aggregate analytics is often behind a paywall

young aurora
small ore
#

@small pumice DM me for data if it is only for testing your code and learning. I do not know how credible the data is though

velvet anchor
#

I like that Joseph

#

I think I liked the vertical timeline more but the round data has a cool feel to it

hasty maple
#

Donut chart 😄

lapis sequoia
young aurora
#

So the round chart and the vertical timeline are actually representing the data differently

velvet anchor
#

My project this semester is to write some type of technical manuscript. Going to wind up doing a tutorial / user's manual for Keras. Is there any interest in that being posted here? Ideally it'll cover types of networks / optimizations / when to use techniques such as forwards / backwards propagation, etc.

quiet gyro
#

Maybe send it to the Keras developers?

#

I'm sure they'd love to add it to their documentation

velvet anchor
#

Maybe. I just HATED having to have 5-10 tabs open while researching at the start

#

and found no concrete starting point

#

every tutorial is the same copy-pasted MNIST or flower-petal model with a sentence of variation

quiet gyro
#

Definitely send it to them then

#

Depending on how their docs are done, you could just take pieces of content from whatever you make and add them, then submit a pull request or something

#

The bane of all software projects is lack of good documentation

velvet anchor
#

and no users

quiet gyro
#

Often because they have no idea how to use the awesome thing you built and don't have the time or desire to retrace your steps to figure out how it works ๐Ÿ˜‰

velvet anchor
#

True

small ore
#

People would also like a quick reference guide for switching from one ML module to another

velvet anchor
#

Yeah. It'll all be in Keras scope though, because unless you're doing natural language modeling, which the Azure platform is set to excel at, there's not a huge reason to use PyTorch here, Keras there, etc., to my understanding of it

hasty maple
#

Isn't Keras docs easy enough to use? I didn't have any trouble for the most part

velvet anchor
#

Yeah, the docs are nice, but piecing them together into a coherent structure was :puke: cuz there weren't really any nice tutorials

#

except the dataset ones that are used everywhere

#

they're more geared towards people with some general knowledge of ML, not newbies

hasty maple
#

IMO ML shouldn't be like you can pick a library, learn it and use it. You should understand some of the principles and ideas before being able to use them, so in a way the current state of resources is good for filtering out overenthusiastic, hardly working entrants to this field

velvet anchor
#

I dunno. I guess in theory that's not wrong, but there's value in being an entrant building something and understanding it

#

even if you don't know the inner workings behind it

hasty maple
#

the cycle should be learn-->understand-->build. You don't need inner-workings-level understanding, but at least the surface level, so that you know where else a certain ML concept might be applied

velvet anchor
#

I need a data set of headshots that are larger than 64x64. Anyone have any ideas?

#

Ideally I'd like 400x400 or larger

#

I might scrape insta for them

feral lodge
velvet anchor
#

Checked all those. Using bits and pieces from like 4 different ones there

#

Thinking scraping insta or Facebook might be best

#

But idk if that's okay to use in research 😂

feral lodge
#

Reminds me of an industrious little company called Cambridge Analytica 😏

velvet anchor
#

5 million faces. Machine learning intensifies.

#

Computing gradients of all of those will take a month though. Lol.

#

Probably worth

feral lodge
#

No problem, looks like half of them are Obama's face 😛

velvet anchor
#

Worth. That works, since half my deepfake is Trump 😂

feral lodge
#

😄

velvet anchor
#

Dlib is such a great library

feral lodge
#

Never used it

#

I've only ever used opencv

#

Oh, it's not only image processing

velvet anchor
#

Nah. It's just quick to double-check a dataset and make sure the images have faces

feral lodge
velvet anchor
#

I can test

#

Some of those obscured look like the subject just ripped a fat vape

feral lodge
#

aw ye

velvet anchor
#

Go green

lilac shadow
#

lmao

velvet anchor
#

@feral lodge Found 2595 correct faces out of 10049 total images

#

with dlib

feral lodge
#

Oh snap

small ore
#

Was that on those obscured faces?

velvet anchor
#
```python
import glob
import os

import dlib
from skimage import io

faces_dir = "/faces/dir"

detector = dlib.get_frontal_face_detector()
correct = 0
total = 0
for path in glob.glob(os.path.join(faces_dir, "*.jpg")):
    img = io.imread(path)
    total += 1
    faces = detector(img, 1)  # 1 = upsample the image once before detecting
    if len(faces) > 0:  # count images where at least one face was found
        correct += 1
print("Found {} correct faces out of {} total images".format(correct, total))
```
#

yeah

small ore
#

No wonder

velvet anchor
#

25% isn't terrible when you can literally only see half the face

small ore
#

Maybe you should give parts of the face as training sets

velvet anchor
#

I'm just using dlib's default model

#

because slandon was curious if it works

#

enlarging the images gave about 10% more correct

#

Found 3636 correct faces out of 10049 total images

small ore
#

I am just throwing random thoughts 😃

velvet anchor
#

Yeah, I'm just messing around with things while I keep tweaking my research project

small ore
#

<@&267628507062992896> Worth pinning the code-block above

south quest
#

uhhhhhh

#

That's probably not the kind of thing we'd pin

velvet anchor
#

xd

#

I upped the resampling to 10 to see if it changed anything and it's taken like 20 minutes to run

velvet anchor
#

40 minutes still going strong

weak kiln
#

no idea why we would pin that.

velvet anchor
#

same

naive swallow
#

same

small ore
#

Well, if someone is trying to write their own image recognition, then that code above will serve as an evaluation standard to measure their own code against

velvet anchor
#

@feral lodge I set it up to check every resampling rate dlib offers; it seems to get about 10% better with each level. I didn't have time to finish 10, but 0 was 1100, 1 was 2200, etc. Will report tomorrow with exact results. It looks promising though

#

not that it matters for anything, but it's cool nonetheless

#

It also works on the CPU, so it won't impact my Keras training xd
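A sketch of that sweep, assuming dlib's frontal face detector is called as `detector(image, upsample_num_times)` as in the script above. Each image is decoded once and then tried at every upsampling level, so files aren't re-read per level. The detector and image loader are passed in as arguments (the function name is made up), which also lets the counting logic be checked without dlib. Each extra level enlarges the image further, so runtime climbs quickly.

```python
def count_faces_by_upsample(image_paths, detector, load_image, max_upsample=2):
    """Return {upsample level: number of images with at least one detected face}."""
    counts = {n: 0 for n in range(max_upsample + 1)}
    for path in image_paths:
        img = load_image(path)          # decode each file only once
        for n in counts:
            if len(detector(img, n)) > 0:
                counts[n] += 1
    return counts


# Usage with dlib and skimage, as in the script above (not run here):
# import glob, os
# import dlib
# from skimage import io
# detector = dlib.get_frontal_face_detector()
# paths = glob.glob(os.path.join("/faces/dir", "*.jpg"))
# print(count_faces_by_upsample(paths, detector, io.imread, max_upsample=3))
```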

young aurora
#

Hey all - I'm trying to do two subplots - each being a LineCollection - in matplotlib. Just to make it easy since I am not providing the underlying data, here's the function I created to make the first line collection - I only need to duplicate this so that there are two of them side by side.

```python
                             'x2': Xia['Finish']})

segs = np.zeros((len(df_lines), 2,2))
segs[:,:,1] = df_lines[["x1","x2"]].values


fig, ax = plt.subplots(figsize=(3,20))

colors = [mcolors.to_rgba(c)
          for c in plt.rcParams['axes.prop_cycle'].by_key()['color']]

line_segments = LineCollection(segs, colors=colors, linewidths=7)
ax.add_collection(line_segments)

ax.set_ylim(-1,1)
plt.title('Xia Dynasty', fontsize = '25')
plt.ylabel('Year', fontsize = '20')
plt.yticks(fontsize = '15')
plt.xticks(range(len(begin)), "")

plt.ylim(-2230, -1750)
plt.xlim(-.3,1)

for i in range(18):
    plt.text(.1, begin.iloc[i] + length.iloc[i]/2, event.iloc[i], ha='left', fontsize = '14', rotation=0)

plt.gca().invert_yaxis()
plt.show()
fig.savefig('xiadynasty.png', dpi=100)
```
velvet anchor
#

code blocks 😦

young aurora
#

I've tried this solution, and I get an error about the image being too big to create.

```python
begin = Xia['Start']
end = Xia['Finish']
length = Xia['Length']

event2 = XiaXSZCP['Dynasty']
begin2 = XiaXSZCP['Start']
end2 = XiaXSZCP['Finish']
length2 = XiaXSZCP['Length']

df_lines = pd.DataFrame({'y1': Xia['Start'], 
                             'y2': Xia['Finish']})

df_lines2 = pd.DataFrame({'y1': XiaXSZCP['Start'], 
                             'y2': XiaXSZCP['Finish']})

segs = np.zeros((len(df_lines), 2,2))
segs[:,:,1] = df_lines[["y1","y2"]].values

segs2 = np.zeros((len(df_lines2), 2,2))
segs2[:,:,1] = df_lines2[["y1","y2"]].values

colors = [mcolors.to_rgba(c)
          for c in plt.rcParams['axes.prop_cycle'].by_key()['color']]

plt.subplots(figsize=(3,6))
ax1 = plt.subplot(1,2,1)
line_segments = LineCollection(segs, colors=colors, linewidths=7)
ax1.add_collection(line_segments)
for i in range(18):
    plt.text(.1, begin.iloc[i] + length.iloc[i]/2, event.iloc[i], ha='left', fontsize = '12', rotation=0)
plt.title('Xia Dynasty', fontsize = '25')


ax2 = plt.subplot(1,2,2)
line_segments2 = LineCollection(segs2, colors=colors, linewidths=7)
ax2.add_collection(line_segments2)
for i in range(1):
    plt.text(.1, begin2.iloc[i] + length2.iloc[i]/2, event2.iloc[i], ha='left', fontsize = '12', rotation=0)


plt.gca().invert_yaxis()
plt.show()
fig.savefig('xiadynasty.png', dpi=100)
```
velvet anchor
#

I should really get around to learning matplotlib so I can help with these

#

have you tried lowering your dpi maybe? or does the error occur sooner than that

young aurora
#

It's sooner - removing it or limiting it to a tiny amount still spits out a "this is way too big" error. Here's the error message when dpi = 100

#

```
<Figure size 216x432 with 2 Axes>
```
manic mason
#

That's not a DPI problem

velvet anchor
#

Yeah was just a quick troubleshooting thing to be sure :p

manic mason
#

Sorry, I don't know matplotlib, wish I could help more

#

Maybe a list is too big, or a loop doesn't have an end condition

velvet anchor
#

Stack Overflow seems to think it could be a stray text coordinate: make sure they're all within the bounds of the image. But I'm not sure about MPL either; @feral lodge normally handles these questions hahaha
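A sketch of one likely fix, using made-up placeholder dates and labels rather than the real Xia data: create the figure and both axes with a single `plt.subplots(1, 2)` call (the second snippet never assigns `fig`, so `fig.savefig` would fail), draw on each axis with `ax.text`, and set each axis's limits explicitly. Matplotlib does not automatically rescale the view to fit a `LineCollection`, so without limits the labels at y around -2000 sit far outside the default 0..1 view; with a tight bounding box, as Jupyter's inline backend uses, that can produce an enormous image. The helper `timeline` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection


def timeline(ax, starts, ends, labels, title):
    """Draw one vertical bar per (start, end) at x=0, labelled at its midpoint."""
    starts = np.asarray(starts, dtype=float)
    ends = np.asarray(ends, dtype=float)
    segs = np.zeros((len(starts), 2, 2))   # (segment, point, x/y); x stays 0
    segs[:, 0, 1] = starts
    segs[:, 1, 1] = ends
    ax.add_collection(LineCollection(segs, linewidths=7))
    for s, e, label in zip(starts, ends, labels):
        ax.text(0.1, (s + e) / 2, label, ha="left", va="center", fontsize=12)
    # Explicit limits: the view won't rescale to the collection on its own.
    lo = min(starts.min(), ends.min())
    hi = max(starts.max(), ends.max())
    ax.set_xlim(-0.3, 1)
    ax.set_ylim(hi, lo)   # reversed limits put the earliest year at the top
    ax.set_xticks([])
    ax.set_title(title)


# Placeholder data, not real reign dates.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(6, 6))
timeline(ax1, [-2070, -1900], [-1900, -1600], ["Ruler A", "Ruler B"], "Xia Dynasty")
timeline(ax2, [-2070], [-1600], ["Xia"], "Dynasties")
fig.tight_layout()
fig.savefig("xiadynasty.png", dpi=100)
```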