odd yoke Aug 2, 2020, 11:44 PM

#

(or comment it)

fallow sandal Aug 2, 2020, 11:44 PM

#

oh and just do model_dir = (path for my custom model)

#

?

odd yoke Aug 2, 2020, 11:45 PM

#

yes

#

make sure this uses the correct api, this one https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/saved_model/README.md

GitHub

tensorflow/tensorflow

An Open Source Machine Learning Framework for Everyone - tensorflow/tensorflow

#

how did you save your model before ?

#

if you used tf.saved_model.save(model, path) then this should work

fallow sandal Aug 2, 2020, 11:46 PM

#

I was following tf2 documentation and it made me save my model like this

#

📎 unknown.png

odd yoke Aug 2, 2020, 11:47 PM

#

perfect

fallow sandal Aug 2, 2020, 11:47 PM

#

Woah it works

#

TY SO MUCH 😁 @odd yoke

odd yoke Aug 2, 2020, 11:48 PM

#

yw

fallow sandal Aug 2, 2020, 11:48 PM

#

That was so much effort (I have like beginner to intermediate python) so like it was a big challenge haha XD

#

But I think I'm going to look into YoloV3 or V4 framework next time..

odd yoke Aug 2, 2020, 11:48 PM

#

personal opinion here, but tensorflow is a big mess

fallow sandal Aug 2, 2020, 11:48 PM

#

yeah

#

YoloV3 and V4 seem easier to get set up

lapis sequoia Aug 2, 2020, 11:49 PM

#

thinking of switching to torch

odd yoke Aug 2, 2020, 11:49 PM

#

they are algorithms, you can implement them in tensorflow too

fallow sandal Aug 2, 2020, 11:49 PM

#

oh yeah pytorch was recommended to me too

#

i mean fun weekend project touching into something i wasnt comfortable with

odd yoke Aug 2, 2020, 11:49 PM

#

to be fair, TF2 is orders of magnitude better than TF1

#

they made the api more "torch-like"

fallow sandal Aug 2, 2020, 11:49 PM

#

ugh finding documentation was the big pain

odd yoke Aug 2, 2020, 11:49 PM

#

it was completely unintuitive before

fallow sandal Aug 2, 2020, 11:49 PM

#

lots of tf1 documentation mixed with tf2 XD

lapis sequoia Aug 2, 2020, 11:50 PM

#

I think eager execution made it choppy

odd yoke Aug 2, 2020, 11:50 PM

#

yeah it's officially stable but only as of recently

fallow sandal Aug 2, 2020, 11:50 PM

#

Ok, something went wrong with my training lol

#

It's recognizing my object twice..? One with a longer bounding box lmao

odd yoke Aug 2, 2020, 11:51 PM

#

so, that is a common thing, but usually models have techniques to suppress overlapping bounding boxes

#

eg, faster-rcnn uses what's called "non-maximum suppression" at the end of the model

#

that only works if the boxes are extremely similar tho, if that's your case here
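
For reference, here is a minimal sketch of greedy non-maximum suppression in plain Python. The box format `(x1, y1, x2, y2)`, the 0.5 IoU threshold, and the example boxes are assumptions for illustration, not values from the model discussed above; real detectors ship their own tuned implementation.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: a good box, a near-duplicate, and a much taller box.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (10, 10, 50, 120)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

In this made-up case the near-duplicate is dropped, but the taller box overlaps the best one with an IoU of only about 0.36, so it survives. That is one way a second, longer box around the same object can remain in the output.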

fallow sandal Aug 2, 2020, 11:52 PM

#

I'm guessing this is normal? I needed to pick a random object so I picked a toy goat from my fav robotics part supplier (i just like the goat)

#

📎 116346490_333516914450196_5089848629270352261_n.png

#

I only trained it for 20 minutes with about 55 images

odd yoke Aug 2, 2020, 11:53 PM

#

hard to know just like that, but it's safe to expect it's due to lack of training

#

or overfitting

#

can't really make any conclusion just like that

fallow sandal Aug 2, 2020, 11:53 PM

#

it was based on one of the ssd resnet 50 models

#

maybe I should try a faster rcnn model

odd yoke Aug 2, 2020, 11:54 PM

#

if you don't need real-time object detection, it's the current SOTA

fallow sandal Aug 2, 2020, 11:54 PM

#

I stopped training early because I thought something crashed, I was checking my task manager and for some reason my GPU and CPU activity dropped to like near 0

#

(like idle, not 0)

odd yoke Aug 2, 2020, 11:55 PM

#

weird, you should log metrics during training to have a better idea of what's going on

fallow sandal Aug 2, 2020, 11:55 PM

#

Is it a config on how intensive the gpu/cpu goes? Cause like all the youtube videos I was watching said it would make my computer go sicko mode

#

I think I still have my tensorboard, let me check

lapis sequoia Aug 2, 2020, 11:55 PM

#

guys would you recommend a good book for code-approach deeplearning, preferably keras

#

documentations have me lost tbh

fallow sandal Aug 2, 2020, 11:55 PM

#

📎 unknown.png

#

Not sure how to make sense of this, documentation told me total loss of less than 1 was good, while another youtube video was like less than 0.05 consistently

odd yoke Aug 2, 2020, 11:56 PM

#

I don't know about keras, but pytorch recently distributed their free book called "Deep Learning with Pytorch", and it seemed pretty good when I skimmed through it

#

You should try to log your validation set too

fallow sandal Aug 2, 2020, 11:57 PM

#

oh shoot

#

Does it only log my training set?

odd yoke Aug 2, 2020, 11:58 PM

#

yes
also, your learning rate seems to be gradually rising, is that a warm-up that you stopped too early, or is it a mistake?
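
For context, a sketch of what a linear learning-rate warm-up looks like: the rate climbs from a small starting value to the base rate over a fixed number of steps, so a run stopped inside that window shows only a rising curve. The function name and all numbers below are hypothetical, not taken from the training config in the screenshot.

```python
def warmup_lr(step, base_lr=0.04, warmup_start_lr=0.013, warmup_steps=2000):
    """Linear warm-up: ramp from warmup_start_lr to base_lr, then hold base_lr.

    All default values are made up for illustration.
    """
    if step >= warmup_steps:
        return base_lr
    frac = step / warmup_steps
    return warmup_start_lr + frac * (base_lr - warmup_start_lr)


# A run stopped at step 1000 would only ever have seen a rising learning rate.
for step in (0, 500, 1000, 2000, 5000):
    print(step, round(warmup_lr(step), 5))
```

Real schedules usually follow the warm-up with some decay (step, exponential, cosine), which this sketch leaves out.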

#

https://pytorch.org/deep-learning-with-pytorch @lapis sequoia

PyTorch

An open source deep learning platform that provides a seamless path from research prototyping to production deployment.

fallow sandal Aug 2, 2020, 11:59 PM

#

I have no clue, I should probably try running it again but for a bit longer and see how the graph changes over time

#

Man, I got so many warning messages so I was just scared something would have gone wrong regardless

#

lol

odd yoke Aug 2, 2020, 11:59 PM

#

it contains some code, like here, but also charts, more theoretical details, intuitions to have etc

📎 unknown.png

#

But again, it's torch, not keras

fallow sandal Aug 2, 2020, 11:59 PM

#

voxels? 😮

odd yoke Aug 3, 2020, 12:00 AM

#

3d pixels

fallow sandal Aug 3, 2020, 12:00 AM

#

So it doesn't use tensors?

frank bone Aug 3, 2020, 12:00 AM

#

Any recommendations for interactive charting for time series data on local machine, preferably not in a browser?

odd yoke Aug 3, 2020, 12:00 AM

#

~~well, Nd pixels~~ brainfart

fallow sandal Aug 3, 2020, 12:00 AM

#

supreme voxels

#

XD

odd yoke Aug 3, 2020, 12:01 AM

#

wait no in this case it's 3d

#

they have volumetric images of ct scans or something, voxels are how you refer to the pixels in a 3d image

lapis sequoia Aug 3, 2020, 12:03 AM

#

@odd yoke seems pretty cool skimmed through the contents

fallow sandal Aug 3, 2020, 12:03 AM

#

hmm interesting

#

also yeah, my model is having issues with doing two bounding boxes lols

#

📎 wOOPUAlhTn6rgAAAABJRU5ErkJggg.png

#

but its working so im happy ^_^

#

thank you again @odd yoke

#

i can sleep in peace tonight knowing that at least i didnt go away empty handed with diving into this mess of documentation lol

#

hmm

#

do you think the fact that I didn't have multiples of the goat

#

led to this maybe?

#

If I trained it better with pictures with multiples, that might have helped the training maybe..? not sure but random theory

odd yoke Aug 3, 2020, 12:08 AM

#

multiples ? as in many instances of the goat in the same image ?

fallow sandal Aug 3, 2020, 12:09 AM

#

yeah

odd yoke Aug 3, 2020, 12:09 AM

#

it can help, but it shouldn't be needed

fallow sandal Aug 3, 2020, 12:09 AM

#

Yeah hmm

odd yoke Aug 3, 2020, 12:09 AM

#

(cute goat btw)

fallow sandal Aug 3, 2020, 12:10 AM

#

thx 😂 thats why I did it haha. we had a more practical idea of detecting different types of physical ports (USB) for the tech unsavvy but i guess i just wanted to see how hard it would be to use tensorflow

safe tapir Aug 3, 2020, 1:34 AM

#

Anyone familiar with text-to-spectrogram?

Specifically interested in what spectrogram features make certain vocal characteristics (e.g. "sad", "happy")

lapis sequoia Aug 3, 2020, 6:45 AM

#

hey friends, im making a GAN atm and im having a bit of trouble with the input pipeline and training step, would anyone be able to help me out?

lapis sequoia Aug 3, 2020, 7:15 AM

#

https://pastebin.com/91FW6kGU for reference

Pastebin

GAN update - Pastebin.com

#

im using the tensorflow pix2pix function as the basis for the generate images and train function

#

but im having problems with the images. i can show how i was storing my data as a h5py file

#

im not sure if the issue is that i need to load my images using a tf.dataset.dataset object

#

but then i dont know how to go about that

#

and im not too sure what to do with the generate images functions and stuff because the way they do it in the pix2pix documentation on the tensorflow website seems really efficient

slender robin Aug 3, 2020, 7:59 AM

#

Hi all, I have a question about web scraping. Does anyone have experience getting product images from BigBasket?

#

I got other data for example Product name, quality, price...

hollow silo Aug 3, 2020, 10:50 AM

#

pretty basic question, i am trying to decide between 2 projects to put on my resume
one project is where i built an OCR system from scratch - involved a lot of image processing to cluster and extract text patches from images and then pre-process them to look as close as possible to the actual training set
but the network was fairly simple and the data set was just EMNIST
and another was pointcloud segmentation using a multi-class SVM
i had a dense point cloud data set and i trained an SVM to classify different regions in it. any suggestions?

fervent bridge Aug 3, 2020, 11:41 AM

#

@lapis sequoia https://stackoverflow.com/questions/48309631/tensorflow-tf-data-dataset-reading-large-hdf5-files

Stack Overflow

TensorFlow - tf.data.Dataset reading large HDF5 files

I am setting up a TensorFlow pipeline for reading large HDF5 files as input for my deep learning models. Each HDF5 file contains 100 videos of variable size length stored as a collection of compres...

#

Take a look at this link. Also, by chance, can you send me the code you used to convert image arrays into HDF5 files? I currently need to do the same and would like some help.

lapis sequoia Aug 3, 2020, 11:42 AM

#

for sure man

#

i was using sentdex as a basis

#

im thinking i might switch to numpy arrays but ill send in a sec

fervent bridge Aug 3, 2020, 11:43 AM

#

Hmm Sentdex has an HDF5 video?

lapis sequoia Aug 3, 2020, 11:43 AM

#

nah not on hdf5

#

he uses pickle but i used hdf5 because i had issues with pickle

#

pickle seemed to mess up my files

#

i was using an image classifier and running a test with it

#

and i got significantly worse results with pickle than i did with hdf5

#

https://pastebin.com/PNBSQ73Y

fervent bridge Aug 3, 2020, 11:45 AM

#

Ah what did you use to learn HDF5

lapis sequoia Aug 3, 2020, 11:46 AM

#

it was a while ago but i looked up a bunch of stuff on like medium or towardsdatascience

#

i didnt do anything special other than store the image arrays in them

#

but i dont think its too complex from memory

#

https://www.christopherlovell.co.uk/blog/2016/04/27/h5py-intro.html

Christopher Lovell

h5py: reading and writing HDF5 files in Python

A brief guide on how to read and write HDF5 files in Python using the h5py package

#

this site's pretty good

fervent bridge Aug 3, 2020, 11:47 AM

#

Ah ok I see what you did I mean it helps that you are using the same code base as I am 😄

lapis sequoia Aug 3, 2020, 11:48 AM

#

hahaha

#

glad i could help 🙂

#

what is your project?

fervent bridge Aug 3, 2020, 11:49 AM

#

You just stored X and y in a respective dataset, cool cool, I want to do the same thing but without having to loop through all the images and store them in a var, as I am looping through 40,000 images 277x277. I was wanting to append to the X dataset and y dataset as I looped through the images so that I would not have to store the arrays in memory all at once.

#

Any idea on how to do this?

lapis sequoia Aug 3, 2020, 11:51 AM

#

hmm

fervent bridge Aug 3, 2020, 11:52 AM

#

I grabbed an image dataset from Google and am working on an ANN, CNN, RNN and checking the differences

lapis sequoia Aug 3, 2020, 11:52 AM

#

so you mean appending them into one single dataset rather than what i did where i have images and labels as separate

fervent bridge Aug 3, 2020, 11:52 AM

#

No I want to have them separate as you have them, just that I want to be able to append my arrays to that dataset as if I was appending them to a list

#

So that I would not have to load them all into the list thus having loaded them into memory

lapis sequoia Aug 3, 2020, 11:53 AM

#

ah right

fervent bridge Aug 3, 2020, 11:53 AM

#

Not have to loop through and add all images into a list then to transfer that list into a HDF5 file

lapis sequoia Aug 3, 2020, 11:54 AM

#

right

fervent bridge Aug 3, 2020, 11:57 AM

#

@lapis sequoia Nice the link you provided has the same questions I did in the comment sections

#

https://stackoverflow.com/questions/47072859/how-to-append-data-to-one-specific-dataset-in-a-hdf5-file-with-h5py

Stack Overflow

How to append data to one specific dataset in a hdf5 file with h5py

I am looking for a possibility to append data to an existing dataset inside a .h5 file using Python (h5py).

A short intro to my project: I try to train a CNN using medical image data. Because of the

#

https://www.pythonforthelab.com/blog/how-to-use-hdf5-files-in-python/#resizing-datasets

Python For The Lab

How to use HDF5 files in Python

HDF5 allows you to store large amounts of data efficiently

#

It directed me to these two links, will have to read them through, thanks man. Is it fine if I ping you with any questions if they arise?

lapis sequoia Aug 3, 2020, 11:58 AM

#

for sure man

#

im not all that great but ill try to help ^_^

fervent bridge Aug 3, 2020, 11:59 AM

#

Haha we'll both learn

#

Have you gotten your hands on a Kaggle comp yet?

lapis sequoia Aug 3, 2020, 12:00 PM

#

ah wait i think i get what youre trying to do

#

so you mean you have like different datasets that youre appending into one h5 file

#

and nah im on google's servers

fervent bridge Aug 3, 2020, 12:01 PM

#

Just appending in batches into one dataset

lapis sequoia Aug 3, 2020, 12:01 PM

#

yeah gotcha

#

my b

fervent bridge Aug 3, 2020, 12:01 PM

#

So append like 1 or 100 image arrays at a time so that I do not have to append all 40,000 at once

lapis sequoia Aug 3, 2020, 12:01 PM

#

yeye

#

so like

#

the img arrays are np arrays right

#

hmm

#

that stackoverflow link seems to do everything you need i think
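
A minimal sketch of the batch-append idea discussed above, using resizable h5py datasets so that only one batch of image arrays is held in memory at a time. The dataset names `X` and `y` follow the chat; the file path, batch size, chunk shapes, and the dummy image data are assumptions for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

IMG_SHAPE = (277, 277)  # image size mentioned in the chat
BATCH = 10              # hypothetical batch size


def append_batch(dset, batch):
    """Grow the dataset along axis 0 and write the batch into the new rows."""
    n_old = dset.shape[0]
    dset.resize(n_old + len(batch), axis=0)
    dset[n_old:] = batch


path = os.path.join(tempfile.mkdtemp(), "images.h5")  # hypothetical location
with h5py.File(path, "w") as f:
    # maxshape=None on axis 0 makes the dataset resizable along that axis.
    X = f.create_dataset("X", shape=(0, *IMG_SHAPE), maxshape=(None, *IMG_SHAPE),
                         dtype="uint8", chunks=(1, *IMG_SHAPE))
    y = f.create_dataset("y", shape=(0,), maxshape=(None,),
                         dtype="int64", chunks=(1024,))
    for batch_idx in range(3):
        # Stand-in for "load the next few images from disk".
        imgs = np.full((BATCH, *IMG_SHAPE), batch_idx, dtype="uint8")
        labels = np.full(BATCH, batch_idx, dtype="int64")
        append_batch(X, imgs)
        append_batch(y, labels)

with h5py.File(path, "r") as f:
    n_images = f["X"].shape[0]
    last_label = int(f["y"][-1])
print(n_images, last_label)
```

Resizing once per batch rather than once per image keeps the number of resize calls low. If the total count is known up front (40,000 here), creating the dataset at full size and filling slices is an even simpler option.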

acoustic halo Aug 3, 2020, 1:00 PM

#

@fervent bridge Did you not manage to get memmaps working?

serene oar Aug 3, 2020, 1:29 PM

#

Hi!

I'm plotting with pyplot / mpld3 and I notice that whenever I do

plt.show()

I see the correct X tick labels, which are normal strings from a list.
However, when I do

mpld3.show()

These labels don't show correctly and I just get numbers. It seems to be a known issue but I only found a fix for it for some guy using dates, not strings.

arctic cliff Aug 3, 2020, 1:37 PM

#

What's the usage of this: plt.figure() Because when I got rid of it the plotting worked fine

serene oar Aug 3, 2020, 1:41 PM

#

I do fig, ax = plt.subplots(figsize=(20, 10))

arctic cliff Aug 3, 2020, 1:41 PM

#

What's it used for ?

serene oar Aug 3, 2020, 1:43 PM

#

For plotting.
I create a bar chart comparing the stats of different features.

arctic cliff Aug 3, 2020, 1:43 PM

#

I meant the fig variable

#

which is plt.figure() if I'm not mistaken

#

Plotting seems to work fine without it

serene oar Aug 3, 2020, 1:45 PM

#

Not sure how to accomplish that.
I must create a figure to have the subplots in, no?

#

Also if I didn't have that fig I couldn't save it to html later.

arctic cliff Aug 3, 2020, 1:46 PM

#

📎 unknown.png

serene oar Aug 3, 2020, 1:47 PM

#

With pyplot it works for me too.
But I am aiming for mpld3. It's much more interactive when used as html.

arctic cliff Aug 3, 2020, 1:47 PM

#

I see ..

#

Thanks !

serene oar Aug 3, 2020, 1:48 PM

#

Soo.. any idea on how to get the labels to mpld3 plot?

    ax.set_xticks(x)
    ax.set_xticklabels(labels)

This does the trick for pyplot but not mpld3

arctic cliff Aug 3, 2020, 1:50 PM

#

I just googled, I'm not sure if this is gonna help but here

#

https://stackoverflow.com/questions/38205238/plotting-date-labels-with-mpld3

Stack Overflow

Plotting date labels with mpld3

I need help with this.
The plot is fine but when I'm hovering over the points I get the S&P price as y (which is right) but instead of the date as x I get a timestamp.
Is there anyone able to ...

#

https://ask.xiaolee.net/questions/1092991

python - Possible to make labels appear when hovering over a point ...

I am using matplotlib to make scatter plots. Each point on the scatter plot is associated with a named object. I would like to be able to see the name of an object when I hover my cursor over the ...

serene oar Aug 3, 2020, 1:54 PM

#

Hm, it's a bit different from what I'm going for but it might help. Thanks

autumn veldt Aug 3, 2020, 2:45 PM

#

excuse me, do u guys know why i keep getting the same accuracy on my SVMClassifier, but get a varying accuracy when i test my DecisionTree (DT) Classifier?

📎 dt1.PNG

#

📎 dt2.PNG

desert oar Aug 3, 2020, 2:55 PM

#

SVMs don't work in epochs

#

well, internally they do. because the fitting algorithm is usually iterative

#

in fact the same is true for DecisionTree

#

i assume these are sklearn models?

#

the whole idea of an "epoch" is really an implementation detail of gradient descent

#

basically anywhere other than a neural network, the software tries its best to hide the optimizer from the user

autumn veldt Aug 3, 2020, 3:00 PM

#

yeah, it's sklearn models.
but sir, what should i do if i need some accuracy testing on SVM (when i know svm don't work with epochs)

desert oar Aug 3, 2020, 3:01 PM

#

you need bootstrapping or cross validation for that

#

you should do that with other models too btw
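
A sketch of what cross validation looks like with sklearn models. The synthetic data from `make_classification` is a stand-in for the real image features, which are not shown in the chat. One possible reason for the pattern described above (an assumption, since the screenshots are not available): fitting `SVC` on the same data is deterministic, while `DecisionTreeClassifier` shuffles candidate features internally and can change between runs unless `random_state` is fixed.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data, roughly the size of the 800-image dataset.
X, y = make_classification(n_samples=800, n_features=20, random_state=0)

# Each model is fitted 5 times, each time scored on a different held-out fold.
svm_scores = cross_val_score(SVC(), X, y, cv=5)
tree_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)

print("SVM  accuracy per fold:", svm_scores.round(3))
print("Tree accuracy per fold:", tree_scores.round(3))
print("SVM  mean +/- std: %.3f +/- %.3f" % (svm_scores.mean(), svm_scores.std()))
print("Tree mean +/- std: %.3f +/- %.3f" % (tree_scores.mean(), tree_scores.std()))
```

The spread across folds gives a rough idea of how much the accuracy estimate depends on which samples ended up in the test set, which a single train/test split cannot show.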

autumn veldt Aug 3, 2020, 3:01 PM

#

like cnn?

desert oar Aug 3, 2020, 3:01 PM

#

yes although if the model is big and complicated and slow to fit, then sometimes it doesnt make sense because it would take too long to run

autumn veldt Aug 3, 2020, 3:02 PM

#

actually my dataset only 800+ images

desert oar Aug 3, 2020, 3:02 PM

#

i recommend you step back and consider why all this is happening

#

look at how SVMs and decision trees are fitted

#

and why epochs are used in fitting NNs but not other models

autumn veldt Aug 3, 2020, 3:04 PM

#

ok sir, thanks btw

safe tapir Aug 3, 2020, 3:14 PM

#

Is anyone familiar with the text-to-audio generation process? I'm interested in realistic / emotional voices.

Specifically:

Are there any datasets that contain labeled emotional audio data (e.g. "sad", "happy", "surprised")
Is there any intuition for what the spectrograms of different emotions would look like?

grave frost Aug 3, 2020, 3:32 PM

#

Hey all! I had posted a question earlier on what model would be good for cipher applications like Input: welcome, Output: njoigfr. I had been recommended to use models like BERT with powerful word embeddings but it seems that NLP models study tokens with respect to other words in a sentence. My intention is not to have it take it as a sentence. My intention is for the model to find out a relationship between the input and output and on the basis of the relationship predict the output accordingly.

My training data is a .csv file which looks like this:
inp1, out1
inp2, out2

#

Here inp stands for input and out stands for output. So can anyone confirm whether a BERT-like NLP model can find the relationship b/w input and output data, considering one row at a time rather than the whole dataset?

acoustic halo Aug 3, 2020, 3:33 PM

#

I recommended bert to you, it was before i knew you were doing ciphers though and would definitely be a bad idea

#

bert is used for getting the contextual information between words, which a hash has no use for

grave frost Aug 3, 2020, 3:34 PM

#

So would you happen to know what model would be able to handle that?

acoustic halo Aug 3, 2020, 3:35 PM

#

Throw out any model that relies on word embeddings, unless you know for a fact that they are used in the hashing function. Do you have any idea how the hashes are actually generated?

grave frost Aug 3, 2020, 3:37 PM

#

Yeah, they are made from a cryptographic function though I don't know how exactly they accomplish that. For me, that function is like a black box...

acoustic halo Aug 3, 2020, 3:37 PM

#

and are the output hashes always the same length?

grave frost Aug 3, 2020, 3:38 PM

#

We can pad them, right?

#

The input has to be padded in any case...

acoustic halo Aug 3, 2020, 3:39 PM

#

Well, does the output always have the same number of characters as the input?

grave frost Aug 3, 2020, 3:40 PM

#

I think so. I haven't decided on a cipher yet but probably the input characters would be equal to the output ones...

acoustic halo Aug 3, 2020, 3:43 PM

#

Well, if its going to be some kind of substitution cipher like enigma, perhaps a RNN would be best

#

https://greydanus.github.io/2017/01/07/enigma-rnn/

Learning the Enigma with Recurrent Neural Networks

Learning about learning.
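
A sketch of how input/output training pairs for a simple substitution cipher could be generated and turned into integer sequences for a character-level model. The random key, the word list, and the helper names are all hypothetical; the actual cipher had not been chosen at this point in the conversation.

```python
import csv
import io
import random
import string

ALPHABET = string.ascii_lowercase


def make_substitution_key(seed=0):
    """Random one-to-one letter mapping (a made-up key for illustration)."""
    rng = random.Random(seed)
    shuffled = list(ALPHABET)
    rng.shuffle(shuffled)
    return dict(zip(ALPHABET, shuffled))


def encrypt(word, key):
    """Substitute every character using the key."""
    return "".join(key[c] for c in word)


def encode(word):
    """Map characters to integer indices, one timestep per character."""
    return [ALPHABET.index(c) for c in word]


key = make_substitution_key()
words = ["welcome", "hello", "cipher"]
rows = [(w, encrypt(w, key)) for w in words]

# Same two-column layout as the inp,out CSV described above.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
csv_text = buf.getvalue()
print(csv_text)
print(encode(rows[0][0]), encode(rows[0][1]))
```

With a character-wise substitution, input and output always have the same length and each output character depends only on the matching input character, which is the kind of structure a sequence model can pick up. A cryptographic hash has no such per-character structure.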

grave frost Aug 3, 2020, 3:45 PM

#

Hmm... What if I use a cryptographic hash? Then RNNs won't be very suitable. The timesteps will have no relations whatsoever....

acoustic halo Aug 3, 2020, 3:45 PM

#

Then you probably wont be able to solve it with a NN faster than brute force

#

Because you probably have a salt to figure out as well as an unknown algorithm

grave frost Aug 3, 2020, 3:46 PM

#

The whole point is to determine whether there do exist any arbitrary relations between the hashes and the inputs. There is always that bias in there. Even though the chances are pretty slim, but I wanna experiment on them

acoustic halo Aug 3, 2020, 3:47 PM

#

Just try a dense network for a start

desert oar Aug 3, 2020, 3:47 PM

#

is the entire message encrypted? or are we talking about individual encrypted words

#

because if you encrypt each word in a message then you're basically solving a "fill in the blank" problem where you probabilistically infer the most likely word in each slot

grave frost Aug 3, 2020, 3:47 PM

#

Initially I am considering hashes as input and the numbers as output..

desert oar Aug 3, 2020, 3:48 PM

#

but to decrypt an entire encrypted message is basically saying "i dont care about the theoretical results im going to try anyway" which seems like its likely to result in failure but i guess it doesnt hurt to try

acoustic halo Aug 3, 2020, 3:49 PM

#

The numbers correspond to the initial word?

grave frost Aug 3, 2020, 3:49 PM

#

Of course, but doesn't hurt to try. The hashes are pretty complex, but I want to start delving into some pre-college research about it and maybe brainstorm some ideas in the later years...

#

@acoustic halo the hashes correspond to the numbers

acoustic halo Aug 3, 2020, 3:50 PM

#

And how do you link a number to a hash?

grave frost Aug 3, 2020, 3:50 PM

#

By encrypting it

#

Basically encrypting the number as output and the hash as input for the model

acoustic halo Aug 3, 2020, 3:53 PM

#

Well, i would start with just a densely connected network as a start

grave frost Aug 3, 2020, 3:53 PM

#

My model should be able to derive more than the statistical relationship and move towards complex ones. That's why I am struggling to choose the right model. High dimensional vector representations seem like a weak start, but would probably do.

#

But Dense layers cannot absorb abstract relations

acoustic halo Aug 3, 2020, 3:54 PM

#

Considering you don't know if any relation exists anyway, it would be a start

#

And several stacked dense layers can learn complex and abstract patterns

#

depending on how you define abstract

grave frost Aug 3, 2020, 4:14 PM

#

Well, good for a start I guess. May look for 1024 Dense layers just for starting 🙂 but I guess it will do for experimentation....

flat quest Aug 3, 2020, 4:18 PM

#

i mean stacked dense layers are basically what most neural nets are :/. And they work pretty well in a lot of cases

odd yoke Aug 3, 2020, 4:20 PM

#

ehhh, that's a bit of a stretch

#

fully connected layers really aren't that common anymore

#

*networks, not layers

flat quest Aug 3, 2020, 4:22 PM

#

well networks are different. There's this concept of sparse layers that is used, but numerically the operation is pretty much the same in most cases. We're still running computations over those 0's, it's just a lot more efficient.

odd yoke Aug 3, 2020, 4:26 PM

#

Yeah I understand that one can be used to represent the other (not that it should be done), but with that definition you can basically go all the way down to like addition and multiplication, and while it may be true, it's not exactly a useful definition

flat quest Aug 3, 2020, 4:56 PM

#

true true, but at the same time a lot of people think these various layers are completely different things since they never look at the actual mathematical operation behind it.

It's good to know where their similarities lie, and why they work.
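
To make the "same operation, mostly zeros" point concrete, here is a small sketch showing that a 1D convolution (as used in deep learning, i.e. without flipping the kernel) gives the same result as a dense layer whose weight matrix is mostly zeros with the kernel repeated along the diagonal. The input, kernel, and function names are made up for illustration.

```python
def conv1d_valid(x, kernel):
    """Slide the kernel over x ('valid' mode, no padding, no kernel flip)."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]


def conv_as_dense_matrix(kernel, input_len):
    """Build the sparse weight matrix a dense layer would need to mimic the conv."""
    k = len(kernel)
    rows = input_len - k + 1
    weights = [[0.0] * input_len for _ in range(rows)]
    for i in range(rows):
        for j in range(k):
            weights[i][i + j] = kernel[j]  # same kernel values on every row
    return weights


def dense(weights, x):
    """Plain matrix-vector product, i.e. a dense layer with no bias."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 0.0, -1.0]

direct = conv1d_valid(x, kernel)
via_dense = dense(conv_as_dense_matrix(kernel, len(x)), x)
print(direct, via_dense)
```

The dense version stores 15 weights, of which only 3 distinct kernel values are shared across rows; the convolution stores just those 3. That is where the efficiency difference comes from even though the arithmetic result is identical.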

desert oar Aug 3, 2020, 5:14 PM

#

are there any coherent resources on neural network architectures for more "traditional" problems? im specifically not interested in the typical deep learning domains like images, audio, video, nlp/text, or even time series. im wondering about more mundane problems like autoencoders and prediction on "social science" datasets, more akin to titanic, boston housing, etc. than mnist.

#

id be interested in any research comparing training times & prediction/inference performance with other methods like xgboost

#

i ask because i was recently playing around with some NNs that gave me huge increases in accuracy (like 10+ percentage points) on a problem at my company, using just 1 hidden layer with parameters that just sounded like nice round numbers and weren't hyper-optimized at all. so it got me thinking that there was a lot of untapped potential for neural networks in domains where they aren't necessarily popular or dominant. trying to educate myself a bit.

lapis sequoia Aug 3, 2020, 7:03 PM

#

What is Data Science. I dont really have a clear understanding of what it is and what it is used for.

desert oar Aug 3, 2020, 7:36 PM

#

@lapis sequoia it's a broad term that encompasses statistics, machine learning, and data analysis. usually someone with a "data scientist" job title works on some combination of those things.

past maple Aug 3, 2020, 7:42 PM

#

hello anyone here?

#

have a little doubt here.

#

so i trained a model and the accuracy shows to be around 90%.
but when i submit the results, my AUC-ROC Score comes out to be very low.(in the range of 0.5)
so what am i doing wrong?

desert oar Aug 3, 2020, 7:53 PM

#

@past maple this is binary classification? are your classes very imbalanced?

past maple Aug 3, 2020, 7:59 PM

#

yes its binary classification.

#

also yes imbalanced classes.

#

but when i use random forest the accuracy is much lower, like 18%, but the AUC-ROC score improves. (in the range of 0.8)

#

@desert oar

desert oar Aug 3, 2020, 8:01 PM

#

if your classes are 90% "A" and 10% "B" your model can get 90% accuracy by predicting "A" for any input

#

random forest might be doing a better job at not overfitting to the baseline class distribution

past maple Aug 3, 2020, 8:02 PM

#

so how do i overcome this thing?

dreamy fractal Aug 3, 2020, 8:04 PM

#

If the data is highly imbalanced, computing the accuracy is perhaps not the best way to evaluate your model

past maple Aug 3, 2020, 8:06 PM

#

then what should this poor soul do?

dreamy fractal Aug 3, 2020, 8:06 PM

#

I think your first model always predicts 0 or always predicts 1, hence the AUC score close to 0.5
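
A small worked example of the point being made here, in plain Python. The 90/10 class split follows the example given earlier in the chat; the scores are made up. AUC is computed as the chance that a random positive is ranked above a random negative, with ties counted as half.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0] * 90 + [1] * 10

# Model A: always predicts the majority class with the same score.
const_scores = [0.0] * 100
const_pred = [0] * 100
print(accuracy(y_true, const_pred), roc_auc(y_true, const_scores))  # 0.9 0.5

# Model B: ranks positives above negatives, but every score is above the
# 0.5 cut-off, so thresholding labels everything as class 1.
ranked_scores = [0.6] * 90 + [0.9] * 10
ranked_pred = [1 if s >= 0.5 else 0 for s in ranked_scores]
print(accuracy(y_true, ranked_pred), roc_auc(y_true, ranked_scores))  # 0.1 1.0
```

Model B is an extreme version of the random forest situation described above: poor accuracy at the default threshold, but good ranking, so a better threshold could recover much of the accuracy.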

past maple Aug 3, 2020, 8:06 PM

#

yes right.

dreamy fractal Aug 3, 2020, 8:07 PM

#

Consider using other metrics such as precision, recall or F1 score. You can also visualize the confusion matrix to see where you make most of your errors

past maple Aug 3, 2020, 8:08 PM

#

okay noted, will check with that.

desert oar Aug 3, 2020, 8:13 PM

#

depending on your model you can improve the outcome by adjusting hyperparameters

#

you might also have success with oversampling or undersampling, those dont always work well though
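
A minimal sketch of random oversampling in plain Python: minority-class rows are duplicated at random until every class has as many rows as the largest one. The function name and toy data are hypothetical. In practice this would be applied to the training split only, so duplicated rows do not leak into the evaluation data.

```python
import random


def random_oversample(X, y, seed=0):
    """Duplicate random rows of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    target = max(len(rows) for rows in by_class.values())

    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


# Toy data with the 90/10 imbalance used as an example in the chat.
X = [[i] for i in range(100)]
y = [0] * 90 + [1] * 10
X_bal, y_bal = random_oversample(X, y)
print(y_bal.count(0), y_bal.count(1))
```

Undersampling is the mirror image: rows of the larger class are dropped instead. As noted above, neither is guaranteed to help.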

past maple Aug 3, 2020, 8:15 PM

#

yes, i have tried adjusting the hyperparameters for random forests. but it only slightly improves the model.

charred blaze Aug 3, 2020, 8:28 PM

#

I'll second the use of other evaluation metrics. Consider those which are more adequate for your scenario of a binary classification with an unbalanced label distribution, like weighted accuracy or class balance accuracy. For binary classification, I'm quite partial to geometric mean of sensitivity and specificity.
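
A sketch of how the metrics mentioned in this thread fall out of the four confusion-matrix counts. The predictions are made up, and "balanced accuracy" is taken here to mean the average of sensitivity and specificity, which may not be exactly the variant meant above.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def imbalance_metrics(y_true, y_pred):
    """Precision, recall, specificity, F1, balanced accuracy, and G-mean."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "balanced_accuracy": (recall + specificity) / 2,
        "g_mean": math.sqrt(recall * specificity),
    }


y_true = [0] * 90 + [1] * 10
always_zero = [0] * 100                              # 90% accuracy, useless
some_model = [0] * 85 + [1] * 5 + [1] * 6 + [0] * 4  # hypothetical predictions

print(imbalance_metrics(y_true, always_zero))
print(imbalance_metrics(y_true, some_model))
```

The always-zero model scores 0 on recall, F1, and G-mean despite its 90% accuracy, which is why these metrics are more informative on imbalanced data.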

past maple Aug 3, 2020, 8:32 PM

#

okay okay, thank you tho. i will see what i can do.
honestly i am just getting started so figuring out these things.

lapis sequoia Aug 3, 2020, 8:51 PM

#

python ok

iron rampart Aug 3, 2020, 9:09 PM

#

Hey, so i've been doing Machine Learning for a few days and have this code

```python
import pickle

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib import style
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle

style.use("ggplot")

data = pd.read_csv("student-mat.csv", sep=";")

predict = "G3"  # column to predict: the final grade

data = data[["G1", "G2", "absences", "failures", "studytime", "G3"]]
data = shuffle(data)  # Optional - shuffle the data

x = np.array(data.drop(columns=[predict]))  # features
y = np.array(data[predict])  # target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1)

# Fit a fresh model on 20 random splits and keep the best-scoring one.
best = 0
for _ in range(20):
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1)

    linear = linear_model.LinearRegression()

    linear.fit(x_train, y_train)
    acc = linear.score(x_test, y_test)  # R^2 for a regressor, not accuracy
    print("Accuracy: " + str(acc))

    if acc > best:
        best = acc
        with open("studentgrades.pickle", "wb") as f:
            pickle.dump(linear, f)

# Reload the best model saved above.
with open("studentgrades.pickle", "rb") as pickle_in:
    linear = pickle.load(pickle_in)


print("-------------------------")
print('Coefficient: \n', linear.coef_)
print('Intercept: \n', linear.intercept_)
print("-------------------------")

# Compare predictions with the true grades on the last test split.
predicted = linear.predict(x_test)
for i in range(len(predicted)):
    print(predicted[i], x_test[i], y_test[i])

plot = "failures"
plt.scatter(data[plot], data["G3"], label=plot)
plt.legend(loc=4)
plt.xlabel(plot)
plt.ylabel("Final Grade")
plt.show()
```

But i still just don't get it. The results are still the same. It looks like it won't learn anything

uncut shadow Aug 3, 2020, 9:45 PM

#

wdym?

iron rampart Aug 3, 2020, 9:50 PM

#

So the results are the same, i can not see if it learns from it

tidal bough Aug 3, 2020, 10:06 PM

#

It looks like you're recreating the model every epoch 😛 No wonder it doesn't learn.

iron rampart Aug 3, 2020, 10:07 PM

#

I'm sorry my english isn't that great haha and i've just begun, so if i'm being honest, i don't know what to do or how to not recreate it every epoch

desert oar Aug 3, 2020, 10:07 PM

#

this is the same problem that someone else had

#

you dont use epochs with sklearn models
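
A sketch of why repeating the fit does not make a `LinearRegression` "learn more": ordinary least squares has a single best solution for a given training set, so refitting on the same data returns the same coefficients. Any run-to-run difference comes from the random train/test split. The synthetic data below is a hypothetical stand-in for the student grades file.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: 5 features with a known linear relationship plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, 2.0, 0.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)

# Two independent fits on the same training data.
first = LinearRegression().fit(X_train, y_train)
second = LinearRegression().fit(X_train, y_train)

same_fit = np.allclose(first.coef_, second.coef_)
r2 = first.score(X_test, y_test)  # .score() is R^2 for regressors
print("identical coefficients:", same_fit)
print("R^2 on the test split:", round(r2, 4))
```

To get a more reliable estimate of how well the model generalizes, averaging the score over several splits (cross validation) is the usual approach, rather than keeping only the best-scoring split.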

iron rampart Aug 3, 2020, 10:10 PM

#

So i should just remove the line of code?

quartz crow Aug 3, 2020, 10:34 PM

#

data science best tutorial ?

#

i am a beginner at the moment

sand spoke Aug 3, 2020, 10:36 PM

#

Specializations on Coursera can help you

fervent bridge Aug 3, 2020, 10:45 PM

#

@acoustic halo Just saw your message now, just figured that HDF5 would be of more convenience later down the road just in case I have to move around between libraries, better to learn it now than later

#

@lapis sequoia Did the link help you out

quartz crow Aug 3, 2020, 11:02 PM

#

actually there are a lot of options bro: ml, deep learning, computer vision engineer, data scientist. which to choose?

fervent bridge Aug 3, 2020, 11:11 PM

#

You have $10 to spare @quartz crow ?

quartz crow Aug 3, 2020, 11:11 PM

#

no bro, i am poor at the moment

fervent bridge Aug 3, 2020, 11:12 PM

#

ML/DL, computer vision, data scientist all kind of fall under the same umbrella

#

with computer vision you utilize ML/DL

quartz crow Aug 3, 2020, 11:12 PM

#

yeah i know

fervent bridge Aug 3, 2020, 11:12 PM

#

it's just that you are working with images

#

as a data scientist if someone gives you images as data you as a data scientist are supposed to extract valuable information from that data and make it work with ML/AI

#

Hmm if you had $10 I would have recommended a nice updated tensorflow 2 course that covered all those topics

#

but take a look at Sentdex on youtube

#

he has a lot of tuts some may be outdated though

quartz crow Aug 3, 2020, 11:14 PM

#

hmm ok . i need help for cv

fervent bridge Aug 3, 2020, 11:14 PM

#

https://www.youtube.com/watch?v=wQ8BIBpya2k&list=RDCMUCfzlCWGWYyIQ0aLC5w48gBQ&index=2

YouTube

sentdex

Deep Learning with Python, TensorFlow, and Keras tutorial

An updated deep learning introduction using Python, TensorFlow, and Keras.

Text-tutorial and notes: https://pythonprogramming.net/introduction-deep-learning-python-tensorflow-keras/

TensorFlow Docs: https://www.tensorflow.org/api_docs/python/
Keras Docs: https://keras.io/lay...

#

Take a look at this one its updated

tidal bough Aug 3, 2020, 11:15 PM

#

coursera also has plenty of nice courses

#

Most of them are paid, but you can access any paid course in audit mode, which as far as I'm aware literally only disallows you from doing quizzes. All the materials, and most importantly programming assignments (including their automatic grading) are available.

turbid hearth Aug 4, 2020, 12:34 AM

#

📎 unknown.png

#

how can i fit my regression model better

flat quest Aug 4, 2020, 3:12 AM

#

tbh there's enough free material out there that paid courses aren't entirely necessary.

The advanced stuff you can just learn through medium and reading through papers.

fervent bridge Aug 4, 2020, 4:31 AM

#

@flat quest Yeaup, but I wouldn't recommend it if it wasn't a good course. Medium and reading through papers require that the reader, most of the time, build their own structure in order to learn what to do next. I mean, not many have an A-Z fully structured Medium article 🙂 but yes, a lot of free material

#

Woot woot got HDF5 to work in appending mode 🙂 currently looping through my 40k images 🙂
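
For anyone wanting to try the same thing, here is a minimal sketch of appending batches with h5py. It assumes, as a guess not confirmed above, that "appending mode" means growing a resizable dataset batch by batch. The file name, dataset name, image shape, and the `append_batch` helper are all made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np


def append_batch(dataset, batch):
    """Grow the first axis of a resizable dataset and write the batch at the end."""
    old_len = dataset.shape[0]
    dataset.resize(old_len + batch.shape[0], axis=0)
    dataset[old_len:] = batch


path = os.path.join(tempfile.mkdtemp(), "images.h5")  # hypothetical file name

with h5py.File(path, "w") as f:
    # maxshape=(None, ...) leaves the first axis unlimited, which allows resizing later.
    images = f.create_dataset(
        "images",
        shape=(0, 32, 32, 3),
        maxshape=(None, 32, 32, 3),
        dtype="uint8",
        chunks=(16, 32, 32, 3),
    )
    for start in range(0, 40, 16):  # 40 fake images in batches of up to 16
        n = min(16, 40 - start)
        batch = np.full((n, 32, 32, 3), start, dtype="uint8")
        append_batch(images, batch)

with h5py.File(path, "r") as f:
    stored_shape = f["images"].shape
    first_pixel_of_each_batch = [int(f["images"][i, 0, 0, 0]) for i in (0, 16, 32)]

print(stored_shape, first_pixel_of_each_batch)
```

Each batch is written at the index range it was appended to, so rows read back in the order they went in.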

flat quest Aug 4, 2020, 4:35 AM

#

yeah you have to figure out how to get the information, and which one will be most useful for you

But its something everyone has to do eventually

#

nice! ;D. Guess it didn't take too long to learn?

fervent bridge Aug 4, 2020, 4:36 AM

#

Nope, I mean internet was getting installed today, was out, took about 2 hours of research

#

🙂

flat quest Aug 4, 2020, 4:38 AM

#

ah i see. Yeah its worth learning it, can use it with a number of different libs/packages.

fervent bridge Aug 4, 2020, 4:38 AM

#

Yeah always better to get the tough part out early rather than later.

flat quest Aug 4, 2020, 4:39 AM

#

^^

#

it's also more efficient, so it makes working with data a lot nicer

fervent bridge Aug 4, 2020, 4:45 AM

#

Yeah I see it's taking care of the reshaping in itself per batch. So I don't have to reshape through Numpy.

#

📎 screenshot-9320933e.jpg

#

HDF5 maintains order right? @flat quest

past schooner Aug 4, 2020, 4:59 AM

#

hey guys, I've got a work problem
I'm scraping media news articles and I need to get the article's text that's not a script or some other random sh*t you can find in html data. I'm using BeautifulSoup, for example soup.select('article p')
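
One way this could be approached with BeautifulSoup, as a rough sketch only: the sample HTML and the `extract_article_text` helper are invented here, and real news sites will need their own selectors.

```python
from bs4 import BeautifulSoup

# Invented sample page; a real article will have a different layout.
SAMPLE_HTML = """
<html><body>
<nav><p>Home | Politics</p></nav>
<article>
  <h1>Headline</h1>
  <p>First paragraph of the story.</p>
  <script>trackPageView();</script>
  <p>Second <a href="#">paragraph</a> with a link.</p>
  <p><script>var ad = 1;</script></p>
</article>
<footer><p>Copyright notice</p></footer>
</body></html>
"""


def extract_article_text(html):
    """Return the text of <p> tags inside <article>, skipping scripts and styles."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove elements that never hold readable article text.
    for tag in soup.find_all(["script", "style", "noscript"]):
        tag.decompose()
    paragraphs = [p.get_text(" ", strip=True) for p in soup.select("article p")]
    # Paragraphs that only contained a script are empty now, so drop them.
    return "\n".join(text for text in paragraphs if text)


article_text = extract_article_text(SAMPLE_HTML)
print(article_text)
```

Removing the `script` and `style` tags before selecting keeps their contents out of `get_text()`, and the `article p` selector leaves out the navigation and footer paragraphs.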

#

it is 1 am and I'm asking questions on a discord channel I just joined, so bear with me

bitter harbor Aug 4, 2020, 5:06 AM

#

You know if their tos’s allow it?

past schooner Aug 4, 2020, 5:07 AM

#

if you mean the source, yes, though good point to ask

flat quest Aug 4, 2020, 5:13 AM

#

I think so yeah @fervent bridge. Not entirely sure on that one

blazing bridge Aug 4, 2020, 5:18 AM

#

Hmm if you had $10 I would have recommended a nice updated tensorflow 2 course that covered all those topics
@fervent bridge could you send the link to it

bitter harbor Aug 4, 2020, 5:18 AM

#

I’m not too familiar with soup but would changing the .select class to children work?

outer fulcrum Aug 4, 2020, 5:21 AM

#

Hey guys, do you know a good project example in data science where I can train OOP ?

fervent bridge Aug 4, 2020, 5:49 AM

#

https://www.udemy.com/course/deep-learning-tensorflow-2

Udemy

Tensorflow 2.0: Deep Learning and Artificial Intelligence

Neural Networks for Computer Vision, Time Series Forecasting, NLP, GANs, Reinforcement Learning, and More!

#

@blazing bridge It's a great course for high-level knowledge, I mean I consider it a must-have. Covers a wide range of NNs in TensorFlow 2

#

Again its all High Level focused around TensorFlow 2 but great to work with.

bitter harbor Aug 4, 2020, 5:50 AM

#

ty

blazing bridge Aug 4, 2020, 5:51 AM

#

@fervent bridge thank you

fervent bridge Aug 4, 2020, 5:54 AM

#

yeaup going to bed, but this course and NNFS got me on the right track, NNFS providing more lower-level knowledge and the Udemy course complementing it.

#

https://nnfs.io

desert parcel Aug 4, 2020, 9:21 AM

#

```py
import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader

inputs = np.array([[313, 1], [323, 1], [333, 1], [343, 1]], dtype='float32')
target = np.array([[14.76], [16.42], [18.08], [23.41]], dtype='float32')

inputs = torch.from_numpy(inputs)
target = torch.from_numpy(target)

model = nn.Linear(2, 1)
preds = model(inputs)

train_ds = TensorDataset(inputs, target)
train_dl = DataLoader(train_ds, batch_size=5, shuffle=True)

loss_fn = F.mse_loss
loss = loss_fn(preds, target)

opt = torch.optim.Adam(model.parameters())

def fit(num_epochs, model, loss_fn, opt):
    with torch.autograd.set_detect_anomaly(True):
        for epoch in range(num_epochs):
            for xb, yb in train_dl:
                pred = model(xb)
                loss = loss_fn(preds, yb)
                loss.backward(retain_graph=True)
                opt.step()
                opt.zero_grad()

            if (epoch+1) % 10 == 0:
                print(f"Epoch: {epoch+1}, loss: {loss.item()}")

fit(50, model, loss_fn, opt)

Output:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
```

#

I followed the hints

#

But have no idea what that means

#

So I guess I tried to follow the hint

desert oar Aug 4, 2020, 10:17 AM

#

oh boy that's a fun one

#

what is loss_fn? @desert parcel

#

can you also show some of said backtrace?
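
Not a confirmed diagnosis, since the backtrace was never posted, but reading the snippet as pasted, one likely culprit is that the loop computes the loss from `preds` (the global forward pass made before training) instead of the fresh `pred`. With `retain_graph=True`, the second `backward()` walks that old graph after `opt.step()` has already updated the weights in place, which matches the error text. A sketch of the loop with that one change (the `losses` list and the seed are additions for illustration):

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

inputs = torch.from_numpy(
    np.array([[313, 1], [323, 1], [333, 1], [343, 1]], dtype="float32")
)
target = torch.from_numpy(
    np.array([[14.76], [16.42], [18.08], [23.41]], dtype="float32")
)

model = nn.Linear(2, 1)
train_dl = DataLoader(TensorDataset(inputs, target), batch_size=5, shuffle=True)
loss_fn = F.mse_loss
opt = torch.optim.Adam(model.parameters())


def fit(num_epochs, model, loss_fn, opt):
    losses = []
    for epoch in range(num_epochs):
        for xb, yb in train_dl:
            pred = model(xb)          # fresh forward pass with the current weights
            loss = loss_fn(pred, yb)  # use `pred`, not the stale global `preds`
            loss.backward()           # graph is rebuilt each step, so no retain_graph
            opt.step()
            opt.zero_grad()
        losses.append(loss.item())    # loss of the last batch in this epoch
    return losses


losses = fit(50, model, loss_fn, opt)
print(f"first epoch loss: {losses[0]:.2f}, last epoch loss: {losses[-1]:.2f}")
```

With inputs around 300 and Adam's default learning rate, 50 epochs will not get close to a good fit; scaling the inputs or training longer is a separate matter from the error itself.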

lapis sequoia Aug 4, 2020, 10:47 AM

#

I wish they had an ML practice section on CodeWars
just writing short code or "complete this code" for Numpy/R/PyTorch to practice when there is time to kill
is there any website like that?

uncut shadow Aug 4, 2020, 11:00 AM

#

Don't think so. it would require lots of computational power cuz u can't just check if your code is the same as the correct one

#

(it assumes u mean deep learning)

#

But other algorithms might require lots of power too

#

So, (IMO) pretty unlikely

#

But ofc u can google and check it yourself

lapis sequoia Aug 4, 2020, 11:16 AM

#

Kaggle is good

#

actually there is a lot of good content on there

#

but a lot of the quick practice content is from universities, and they lock access to everyone but the people in that course

desert oar Aug 4, 2020, 11:22 AM

#

@lapis sequoia just download a dataset and play with it

#

e.g. from the UCI machine learning site

#

plenty of small clean easy to understand datasets out there

#

or simulate your own data, which is more advanced but potentially more educational

lapis sequoia Aug 4, 2020, 11:29 AM

#

that's not how I learn

#

it's the same as giving a student a bunch of problems without teaching him how to solve them and say "just try them"

desert oar Aug 4, 2020, 11:30 AM

#

oh, you are asking for specific tasks to complete

#

not just data

#

there is actually an interesting lack of that, out there in the world

#

obviously students get those kinds of assignments in school

lapis sequoia Aug 4, 2020, 11:31 AM

#

I believe Udacity and Coursera courses have such practice problems

desert oar Aug 4, 2020, 11:31 AM

#

it could be an interesting niche

#

yeah

#

but nothing public like you're talking about

lapis sequoia Aug 4, 2020, 11:31 AM

#

make a new website

#

you can charge for compute cost

desert oar Aug 4, 2020, 11:31 AM

#

who even writes questions for those kinds of sites?

#

yeah it could obviously time out computations after a certain point, limit memory usage and processes/threads

#

charge for premium membership to answer more than N challenges per day

#

etc

#

seems like a legit product tbh

high pulsar Aug 4, 2020, 1:59 PM

#

hey guys, I've got a work problem
I'm scraping media news articles and I need to get the article's text that's not a script or some other random sh*t you can find in html data. I'm using BeautifulSoup, for example soup.select('article p')
@past schooner You can use regex maybe?

odd yoke Aug 4, 2020, 2:02 PM

#

Using regex to parse html is an absolutely terrible idea

desert oar Aug 4, 2020, 2:18 PM

#

this

jolly briar Aug 4, 2020, 3:25 PM

#

idk , for simple stuff regex can be ok, as a general rule its better to use bs4 or something, but for something simple egrep and sed can be useful

grave frost Aug 4, 2020, 4:35 PM

#

Can anyone explain why Keras Embedding layers don't accept strings? They seem to run on numbers fine; however, for strings it requires one-hot encoding, which kinda defeats the purpose of creating the embeddings. In the end, I got it one-hot encoded (a -> 1; b -> 2) but am still curious. Isn't the whole purpose of embeddings to represent data in higher dimensions? Why doesn't Keras implicitly understand it and encode it accordingly??

#

Would it be just a lack of feature in Keras, or does it make sense not to have the embeddings accept strings and have the dev one-hot encode them?

spark stag Aug 4, 2020, 4:48 PM

#

@grave frost when one-hot encoding, you turn each value (in your case, characters) into an array containing a single truthy value (a one). if keras tried to one-hot encode data as you feed it in, it wouldn't know how long to make each array. e.g. one-hot encoding a sequence like 0, 3, 1, 2, 0, 3 gives a matrix of shape 6x4 (6 items, 4 classes). if the model is being fed information gradually, it doesn't know how many classes there will be, so it would end up with inconsistently shaped arrays representing each class. (it could one-hot encode all the data at once, but then i think every possible class needs to appear in that input so it knows how many classes there are, as this shouldn't be changeable)
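
A side note that may help here: the Keras `Embedding` layer takes integer indices as input, and a mapping like `a -> 1; b -> 2` is integer-index encoding, not one-hot. The sketch below uses plain NumPy rather than Keras (the vocabulary and table are invented) to show that looking up rows by index gives the same result as multiplying one-hot vectors by the table, which is why no one-hot matrix is needed.

```python
import numpy as np

text = "abca"

# Build a vocabulary: each distinct character gets an integer index.
vocab = {ch: idx for idx, ch in enumerate(sorted(set(text)))}
indices = np.array([vocab[ch] for ch in text])

# An embedding layer is essentially a trainable table with one row per vocabulary entry.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 4))  # 3 tokens, 4-dimensional vectors

# Looking up rows by integer index...
looked_up = embedding_table[indices]

# ...matches multiplying one-hot vectors by the table,
# without ever building the one-hot matrix.
one_hot = np.eye(len(vocab))[indices]
via_one_hot = one_hot @ embedding_table

lookups_match = np.allclose(looked_up, via_one_hot)
print(vocab, indices.tolist(), lookups_match)
```

The string-to-index step is left to the user because the layer has to know the vocabulary size up front, which is the same point made above about not knowing the number of classes.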

fringe cove Aug 4, 2020, 4:50 PM

#

hello i have a simple dataframe with 2 colums including minutes and points. i just would like to see by curiosity what a sklearn model would predict for this dataset. how can i do that please. i dont really know sklearn and would like to see thank you

#

look like that

📎 unknown.png

grave frost Aug 4, 2020, 4:51 PM

#

@spark stag What about compromising on the inconsistency of input dim by padding after doing analysis of the data pipeline OR more practical, having the user specify the custom dims of the batches if data is like that and handle it accordingly. Like if I had encoded something like [0. 0. 0. 1. .....] for each character, it would be overkill and too memory intensive (Like used by scikit-learn lib for 1-hot). I just think that the whole way it works could be improved manifold and is a bit too complex....

#

@fringe cove look up Linear Regression...
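
A minimal sketch of what that could look like in scikit-learn, assuming a dataframe with `minutes` and `points` columns. The numbers below are made up for illustration and are not the real game log.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up numbers for illustration only; replace with the real data.
games = pd.DataFrame({
    "minutes": [12, 18, 25, 30, 34, 36],
    "points": [4, 7, 11, 15, 16, 19],
})

model = LinearRegression()
# X must be 2-D (rows x features), hence the double brackets.
model.fit(games[["minutes"]], games["points"])

slope = model.coef_[0]
intercept = model.intercept_
predicted = model.predict(pd.DataFrame({"minutes": [32]}))[0]

print(f"points = {slope:.2f} * minutes + {intercept:.2f}")
print(f"predicted points for 32 minutes: {predicted:.1f}")
```

The fitted line is the y = ax + b form: `coef_` holds a and `intercept_` holds b.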

fringe cove Aug 4, 2020, 4:53 PM

#

ok thank you

sinful dock Aug 4, 2020, 4:53 PM

#

hey folks, anyone know how to drop multiple columns from a dataframe using slices of index locations? I've been stuck on this for a couple of hrs and can't find anything on Stack Overflow. I want to drop columns with indexes 1:31 and all columns after column index 67: stage_metrics.drop(stage_metrics.iloc[:, [1:31, 67:], axis = 1, inplace = True)
Also tried this one stage_metrics.drop(stage_metrics.columns[1:31, 67:], axis = 1, inplace = True)
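
One possible way to do this, sketched on a made-up 80-column frame standing in for `stage_metrics`: `np.r_` can join several slices into a single array of positions, which can then index `df.columns`. The exact bounds are an assumption about what "1:31" and "after 67" mean.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 80 columns named c0..c79, standing in for stage_metrics.
stage_metrics = pd.DataFrame(
    np.zeros((3, 80)), columns=[f"c{i}" for i in range(80)]
)

# np.r_ concatenates several slices into one array of integer positions.
# Assumption: "1:31" follows Python slice semantics (positions 1..30) and
# "after 67" means position 67 onward; adjust the bounds if that is not the intent.
positions = np.r_[1:31, 67:stage_metrics.shape[1]]

# Translate positions into labels, then drop by label.
trimmed = stage_metrics.drop(columns=stage_metrics.columns[positions])

print(trimmed.shape)
print(list(trimmed.columns[:3]))
```

A slice like `1:31` is not valid inside a list literal or next to another slice in `columns[...]`, which is why both attempts above raise an error. Dropping by label assumes the column names are unique.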

grave frost Aug 4, 2020, 4:55 PM

#

Why don't you use a Pandas DataFrame? It has all these functionalities and would serve as a much better and arguably more feature-rich tool for any dataset...

sinful dock Aug 4, 2020, 4:56 PM

#

yes, I'm on pandas, sorry trying to get the code to display in color

spark stag Aug 4, 2020, 5:00 PM

#

@grave frost there probably are ways of doing it but its quite a lot of overhead for it to process as its being fed data, especially as, unless they process the data every time they see it each epoch, a new copy of the data needs to be made that is one hot encoded so now you have 2 copies in memory

grave frost Aug 4, 2020, 5:01 PM

#

Hmm... that makes sense

spark stag Aug 4, 2020, 5:02 PM

#

idk, there may be an easy way for it to be implemented, but in my experience at least, i wouldn't say it's too much effort to do manually, especially considering how much easier keras makes setting up a neural network in general

grave frost Aug 4, 2020, 5:03 PM

#

Of course, But I had to spend an hour or something just to make that one-hot encoding work (I don't like coding) And I couldn't use Sk-Learn at all for my use case..

#

A dedicated lib for that would be so much better and smooth..

flat quest Aug 4, 2020, 5:04 PM

#

@desert oar

A lot of beginners might use that kind of site. Don't think it'll be really that helpful in pursuing a DS career, but yeah there's a good chance beginners might buy

desert oar Aug 4, 2020, 5:05 PM

#

@flat quest yep, about as useful as pursuing a programming career 😛

#

good for younger people i think

#

or real real novices who dont know enough to make toy problems for themselves

flat quest Aug 4, 2020, 5:06 PM

#

xd
Maybe as an introductory. They're gonna have to learn to ask their own questions on the data and make their own problems. Not many ppl get to that stage :/.

But if they buy it -> its a selling product 😉

grave frost Aug 4, 2020, 5:09 PM

#

With everything that's on YouTube and other resources, it seems hard to believe that any beginner would buy something like that. When I was starting out, I saw plenty of these paid resources, but the free ones don't have any problems. The real factor is that these paid resources usually just bunch the topics in the right order in one place, so people don't have to look up complex eqs on Wiki or hunt YT for an explanation of k-means clustering. That said, few people do them for learning. Mostly they are for boosting credentials for newbies who think they matter....

fringe cove Aug 4, 2020, 5:12 PM

#

@grave frost ok i manage to do this following a tutorial

📎 unknown.png

#

i suppose these scores are y = ax+B ?

flat quest Aug 4, 2020, 5:12 PM

#

you'd be surprised how many noobies do that
Yes you can learn DS through reading online articles, books, yt, working on your own datasets, etc. But very few people are actually willing to go through all that. They'd rather complete a course or a set of problems that would certify them as job-ready.

Tho algorithmic competition sites are quite widely used. So some people might do it just for the fun of it

grave frost Aug 4, 2020, 5:13 PM

#

Is the plot for the whole data, and is it correctly represented? double check all your code because it doesn't seem like a Linear problem but rather a regression one.

fringe cove Aug 4, 2020, 5:15 PM

#

i think i messed up in my head

odd yoke Aug 4, 2020, 5:15 PM

#

@fringe cove no, it's the R2 coeff

grave frost Aug 4, 2020, 5:15 PM

#

@flat quest But to be honest, they really aren't much use even for getting "job-ready". I have read the experiences of many Data Scientists who have done an analysis on people who have put MOOCs on their CV and whether they got the job or not. The numbers aren't very pretty....

odd yoke Aug 4, 2020, 5:15 PM

#

it's an indicator that represents how well your model fits the data
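
To show where that number comes from, here is a small sketch (made-up data) that computes R² by hand as 1 - SS_res / SS_tot and compares it with what `model.score()` returns for a scikit-learn regressor.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data for illustration.
x = np.array([12, 18, 25, 30, 34, 36], dtype=float).reshape(-1, 1)
y = np.array([4, 7, 11, 15, 16, 19], dtype=float)

model = LinearRegression().fit(x, y)
y_hat = model.predict(x)

ss_res = np.sum((y - y_hat) ** 2)     # what the model fails to explain
ss_tot = np.sum((y - y.mean()) ** 2)  # total variation around the mean
r2_manual = 1 - ss_res / ss_tot

r2_sklearn = model.score(x, y)        # the same quantity, computed by scikit-learn
print(r2_manual, r2_sklearn)
```

A value near 1 means the line explains most of the variation in y. A value near 0 means it does little better than always predicting the mean.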

fringe cove Aug 4, 2020, 5:16 PM

#

oh yeah ofc

#

haha

flat quest Aug 4, 2020, 5:17 PM

#

oh they're not useful at all @grave frost

But noobies will always fall for it.

grave frost Aug 4, 2020, 5:17 PM

#

Newbies will fall for anything, as long as it looks professional and is affiliated to a big company..

flat quest Aug 4, 2020, 5:18 PM

#

^^

#

and thats how you make a selling product xd

fringe cove Aug 4, 2020, 5:18 PM

#

so this is the data i have, just to start over because i think i'm overplaying it. this is the nba scoreboard for a season of a player

📎 unknown.png

odd yoke Aug 4, 2020, 5:18 PM

#

Saying they're not useful at all is definitely wrong. Sure, doing a MOOC doesn't mean you're fit for a job yet, but it doesn't mean you didn't learn anything doing said MOOC

fringe cove Aug 4, 2020, 5:18 PM

#

if i know this player will have a minutes > 30 minutes in next game

odd yoke Aug 4, 2020, 5:18 PM

#

Also, saying "MOOC are bad just watch youtube" is laughable

fringe cove Aug 4, 2020, 5:18 PM

#

is it possible to have a model from all these data ? and make a prediction for points ?

grave frost Aug 4, 2020, 5:19 PM

#

@odd yoke Of course, but YT would still be free anyways...

odd yoke Aug 4, 2020, 5:19 PM

#

Yes, that's true, but that's also the case for some coursera courses for example

#

(I agree the """"certification"""" they give you at the end is basically digital toilet paper)

flat quest Aug 4, 2020, 5:20 PM

#

sure you can gain exp and knowledge from an MOOC
but an MOOC doesn't really mean all that much to a job recruiter

odd yoke Aug 4, 2020, 5:20 PM

#

And when you see frauds like Siraj Raval on yt having such big communities, I find it hard to say that using yt is a better idea

fringe cove Aug 4, 2020, 5:20 PM

#

his videos are cool ( as a complete newb)

odd yoke Aug 4, 2020, 5:21 PM

#

He is the embodiment of the ML hype taken to the extreme

#

He doesn't know much about it, but pretends he does, because it draws people in

grave frost Aug 4, 2020, 5:21 PM

#

@fringe cove Take my advice- stop watching him now

fringe cove Aug 4, 2020, 5:21 PM

#

^^

#

i'm trying to get practical with data by making some scripts with nba data for player performance

#

i have mastered the scraping and now have a complete data for the season for every player

#

as u can see i can plot things etc and make some deductions with my brain

#

but i d love to see what a mathematical model could do

sinful dock Aug 4, 2020, 5:23 PM

#

@flat quest What you guys recommend then to use to learn instead of MOOC's if you are beginner?

odd yoke Aug 4, 2020, 5:24 PM

#

I think MOOCs are fine

grave frost Aug 4, 2020, 5:24 PM

#

@odd yoke Did you see some of the videos that gave evidence that he:-

Plagiarised a paper on neural qubits and claimed that he wrote it
Copied tons of Code from Github without citing the author, made minimal changes, and called it his own code
Filed a YT copyright infringement on another YTber who unearthed all his black activities
scammed newbies in a $200 course titled "How to earn money with ML" from which he made approx. 200,000$

odd yoke Aug 4, 2020, 5:24 PM

#

Yeah I did

#

I'd rather not talk about him or I'll get angry and spam this channel when someone is asking questions

grave frost Aug 4, 2020, 5:24 PM

#

Ok 😆

fringe cove Aug 4, 2020, 5:25 PM

#

yeah stop fight and just model my shit xd

flat quest Aug 4, 2020, 5:25 PM

#

they're not bad for new people. Just don't list them on your resume or depend on them too much.

But at some point you should transition into just reading through other people's work and suggestions (articles, papers, books, other resources, yt maybe), rather than following a predefined course @fringe cove

odd yoke Aug 4, 2020, 5:26 PM

#

Yeah, just like "regular" programming really, you can't stick to tutorials indefinitely

sinful dock Aug 4, 2020, 5:27 PM

#

Agree, you have to put that knowledge into practice. for someone that doesn't have a Comp Sc education i think it might help to redirect attention to another area

fringe cove Aug 4, 2020, 5:28 PM

#

what should i look for when what i want to do is like feeding lots of data and make the model find the best fit for predictions ? i dont know if i make sense at all

#

but in my case

#

tonight orlando plays against indiana

#

i would like to know what a model would say about one player points for tonight game

odd yoke Aug 4, 2020, 5:28 PM

#

You're defining machine learning here, it's kinda hard to give a useful answer with such a broad question

fringe cove Aug 4, 2020, 5:28 PM

#

according to all previous data

#

yeah i want ml haha

#

only experience i had was mechanical arm movement training while in internship

#

but nothing else

odd yoke Aug 4, 2020, 5:29 PM

#

There are all sorts of recurrent networks you could use if you want to preserve knowledge from past matches. I don't have experience in that area, so perhaps there are better-fitting algorithms, but maybe start looking there

fringe cove Aug 4, 2020, 5:31 PM

#

but can u do this in like 2 lines of code with sk learn just to see a basic view ?

hearty seal Aug 4, 2020, 5:31 PM

#

hello guys

fringe cove Aug 4, 2020, 5:31 PM

#

i realise it looks candid

#

but rn i'm just curious af

#

and have no knowledge at all in ml

hearty seal Aug 4, 2020, 5:31 PM

#

i just wanted to know if the book i bought "datascience from scratch" is a good book to start data science with

fringe cove Aug 4, 2020, 5:32 PM

#

if u bought it i think thats it is worth trying it lol

hearty seal Aug 4, 2020, 5:32 PM

#

its like 450 pages xd

#

i am trying to finish a python tutorial book first before i start

odd yoke Aug 4, 2020, 5:33 PM

#

never heard of it

hearty seal Aug 4, 2020, 5:35 PM

#

sorry if i killed your convo there

fringe cove Aug 4, 2020, 5:38 PM

#

dw i'm just a noob like depending on them to tell me what to do so ^^

odd yoke Aug 4, 2020, 5:49 PM

#

ask away

#

you mean you have the indices of an element, and want to find its column ?

fringe cove Aug 4, 2020, 6:05 PM

#

if u know the cell u can find the column name with the indice no ?

arctic cliff Aug 4, 2020, 6:36 PM

#

How can I get the repeated names ?

📎 unknown.png

odd yoke Aug 4, 2020, 6:36 PM

#

.duplicated
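
A short sketch of how `.duplicated` could be used here, on an invented frame (the column names are placeholders, since the screenshot's columns are not visible in the log):

```python
import pandas as pd

# Invented sample data; "name" stands in for whatever column holds the names.
people = pd.DataFrame({
    "name": ["Ana", "Bo", "Ana", "Cy", "Bo", "Ana"],
    "score": [1, 2, 3, 4, 5, 6],
})

# keep=False marks every occurrence of a repeated value, not just the later ones.
is_repeated = people["name"].duplicated(keep=False)

repeated_rows = people[is_repeated]
repeated_names = people.loc[is_repeated, "name"].unique().tolist()

print(repeated_rows)
print(repeated_names)
```

With the default `keep="first"`, the first occurrence of each name is not flagged, so only the later repeats would be returned.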

balmy ice Aug 4, 2020, 6:50 PM

#

Hi i am a student doing web development with django.
I am thinking to start moving toward AI and deep learning!
Can any of you show me the right path to start?

tidal bough Aug 4, 2020, 6:51 PM

#

I usually recommend this coursera course, it's a very nice overview of the field with programming assignments:
https://www.coursera.org/learn/machine-learning

#

It uses Octave for them, mind, not Python.

lapis sequoia Aug 4, 2020, 6:58 PM

#

Hi all! Is this a good place to ask a question about Pandas?

tidal bough Aug 4, 2020, 6:59 PM

#

Yeah, pretty good.

lapis sequoia Aug 4, 2020, 7:00 PM

#

Stupid question incoming. How can I append a row to a dataframe? I've been using append with no luck. I have the row in a list.

desert oar Aug 4, 2020, 7:01 PM

#

@lapis sequoia show us sample data & the code you're running, which reproduces the error or problem you have

lapis sequoia Aug 4, 2020, 7:02 PM

#

right away

#

```py
def my_fun(i):
    return [i**2, i**3, i**4]

df = pd.DataFrame(columns=['i','a','b','c'])


for i in range(100):
    [a,b,c] = my_fun(i)
    df.append([i, a,b,c])
    
display(df)
```

desert oar Aug 4, 2020, 7:07 PM

#

ah

#

DataFrame.append doesn't work like list.append

#

it doesn't modify the dataframe, it creates a new one with the row appended

#

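import pandas as pd

def my_fun(i):
    return [i**2, i**3, i**4]

df = pd.DataFrame(columns=['i','a','b','c'])

for i in range(100):
    a, b, c = my_fun(i)
    # append returns a new frame, so reassign it; pass the row as a dict so it
    # becomes one row (a bare list would be added as a new column instead).
    # note: DataFrame.append was removed in pandas 2.0; use pd.concat there
    df = df.append({'i': i, 'a': a, 'b': b, 'c': c}, ignore_index=True)

display(df)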

#

however i don't really recommend constructing dataframes this way. it's quite inefficient

#

it's much faster if you do it like this:

import pandas as pd

def my_fun(i):
    return [i**2, i**3, i**4]

colnames = ['i','a','b','c']

# collect plain dicts first, then build the dataframe in one go
data = []
for i in range(100):
    a, b, c = my_fun(i)
    record = {'i': i, 'a': a, 'b': b, 'c': c}
    data.append(record)

df = pd.DataFrame(data, columns=colnames)

lapis sequoia Aug 4, 2020, 7:11 PM

#

make a list of dicts

#

and then a dataframe out of the list of dicts?

desert oar Aug 4, 2020, 7:11 PM

#

or maybe better still:

def my_fun(i):
    return [i**2, i**3, i**4]

data = []
for i in range(100):
    a, b, c = my_fun(i)
    data.append([i, a, b, c])

df = pd.DataFrame(data, columns=['i','a','b','c'])

#

yeah list-of-dicts is one way

#

list-of-lists is another

lapis sequoia Aug 4, 2020, 7:12 PM

#

aha

#

so don't modify the dataframe

desert oar Aug 4, 2020, 7:12 PM

#

yeah adding rows to a dataframe is really slow

lapis sequoia Aug 4, 2020, 7:12 PM

#

only call stuff from it

desert oar Aug 4, 2020, 7:12 PM

#

adding columns isn't that bad

#

actually adding columns is pretty efficient

#

but adding rows is slow and you should avoid it if possible

lapis sequoia Aug 4, 2020, 7:12 PM

#

ok

#

thanks a lot

#

one more thing

#

where can I learn stuff like this about pandas?

desert oar Aug 4, 2020, 7:13 PM

#

reading the docs and trying things

lapis sequoia Aug 4, 2020, 7:13 PM

#

I am a Matlab refugee with 0 pandas experience

desert oar Aug 4, 2020, 7:13 PM

#

there are a lot of docs pages, not all of them are well-written or easy to understand

#

ah

#

well, you should feel comfortable with numpy

#

which is basically modeled after matlab

#

pandas is more like R

#

or more like Excel if you've never used R

lapis sequoia Aug 4, 2020, 7:14 PM

#

ok

#

I tried with the docs

#

not easy

desert oar Aug 4, 2020, 7:14 PM

#

yeah, you have to suffer through it

#

one of my many "todos" is to contribute better user guide content for pandas

quartz crow Aug 4, 2020, 7:15 PM

#

for machine learning engineer what skills are necessary

lapis sequoia Aug 4, 2020, 7:15 PM

#

@desert oar Thanks a lot for your time! Have a nice day/evening!

desert oar Aug 4, 2020, 7:16 PM

#

youre welcome

serene scaffold Aug 4, 2020, 7:17 PM

#

Has anyone here used async for their data science stuff?

desert oar Aug 4, 2020, 7:17 PM

#

only for webscraping or otherwise hitting APIs. not much value in it otherwise

serene scaffold Aug 4, 2020, 7:18 PM

#

I've used joblib to parallelize stuff but that's not the same thing.

desert oar Aug 4, 2020, 7:18 PM

#

async =/= parallel

serene scaffold Aug 4, 2020, 7:18 PM

#

right

desert oar Aug 4, 2020, 7:18 PM

#

stick with joblib

serene scaffold Aug 4, 2020, 7:18 PM

#

can you parallel with async?

#

or is it not even meant for that

desert oar Aug 4, 2020, 7:18 PM

#

yeah its not meant for that

serene scaffold Aug 4, 2020, 7:18 PM

#

then what's async for for?

desert oar Aug 4, 2020, 7:18 PM

#

asyncio lets you run stuff in a separate process w/ run_in_executor

#

thats a complicated question

#

you know how __iter__ works?

serene scaffold Aug 4, 2020, 7:19 PM

#

ye

odd yoke Aug 4, 2020, 7:19 PM

#

concurrency != parallelism

desert oar Aug 4, 2020, 7:19 PM

#

now imagine you await before you yield

#

that's what async for is
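
A tiny sketch of that idea, with `asyncio.sleep` standing in for a real network call (the `fetch_pages` generator and the fake "urls" are invented for illustration):

```python
import asyncio


async def fetch_pages(urls):
    """Async generator: awaits (simulated) I/O, then yields each result."""
    for url in urls:
        await asyncio.sleep(0.01)  # stand-in for a network request
        yield f"body of {url}"


async def main():
    results = []
    # `async for` awaits the generator's next item on each iteration,
    # so other tasks can run while this one is waiting.
    async for body in fetch_pages(["a", "b", "c"]):
        results.append(body)
    return results


results = asyncio.run(main())
print(results)
```

Nothing here runs in parallel: the event loop only switches tasks while one is waiting, which is why this helps with I/O-bound work such as scraping and not with CPU-bound number crunching.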

#

but yeah, async/await isn't even a good programming model for computational parallelism

#

let alone a good way to implement it in python

#

stick with joblib or concurrent.futures.ProcessPoolExecutor or multiprocessing.Pool

#

or dask or ray et al

odd yoke Aug 4, 2020, 7:20 PM

#

oh you said what i posted above already

desert oar Aug 4, 2020, 7:20 PM

#

yeah

#

well

#

not exactly!

#

😛

odd yoke Aug 4, 2020, 7:20 PM

#

i went to take my food and i saw this convo, should have scrolled up if i wanted to be useful

desert oar Aug 4, 2020, 7:20 PM

#

lol it happens

#

anyway async/await can make life easier if you're hitting APIs and you want the freedom to ctrl+c without doing a bunch of extra work

arctic cliff Aug 4, 2020, 7:21 PM

#

How can I get rid of this symbol so I can make a normal math operation? -_-

📎 unknown.png

#

📎 unknown.png

serene scaffold Aug 4, 2020, 7:21 PM

#

I think replace will only do the first occurrence

#

while currency_symbol in my_str:
    my_str = my_str.replace(currency_symbol, '')

#

could work
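
For what it's worth, plain Python `str.replace` already replaces every occurrence unless a count is passed as a third argument, so the loop should not be needed. A quick check on a made-up string:

```python
price_text = "₹ 1,000 + ₹ 530,000"  # made-up example string
currency_symbol = "₹"

# Without a count, every occurrence is replaced.
all_removed = price_text.replace(currency_symbol, "")

# With count=1, only the first occurrence is replaced.
first_only = price_text.replace(currency_symbol, "", 1)

print(repr(all_removed))
print(repr(first_only))
```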

arctic cliff Aug 4, 2020, 7:22 PM

#

There's only one symbol in every price

serene scaffold Aug 4, 2020, 7:22 PM

#

😮

#

but aren't there four here?

arctic cliff Aug 4, 2020, 7:23 PM

#

After the sum

#

Because they are strings/objects

#

I assume

#

wait

#

Price object

#

Yeah

serene scaffold Aug 4, 2020, 7:24 PM

#

I have to head out but maybe rock salt lamp can help.

desert oar Aug 4, 2020, 7:25 PM

#

you have a lot of issues here

#

what are you actually trying to do
what does the source data look like

arctic cliff Aug 4, 2020, 7:26 PM

#

I'm trying to sum every price that has a specific same year date so I can get the earnings of every year

#

📎 unknown.png

desert oar Aug 4, 2020, 7:31 PM

#

so the data is like pd.Series(['Free', 'Free', '₹ 1,000', '₹ 530,000']) etc.

#

right?

arctic cliff Aug 4, 2020, 7:31 PM

#

Yeah

desert oar Aug 4, 2020, 7:31 PM

#

ok

#

well those are strings

#

python has no idea that the text contains numbers

#

so you can't just add them and expect them to be added like numbers

#

python doesn't know that "Free" means ₹0

#

so you need to parse the strings, to extract numbers

#

i can give you a solution, but you've been in this server long enough to start developing your own solutions

#

once you know the basics, "how do i do X" is a matter of putting together what you already know. maybe 80-90% of the time.

arctic cliff Aug 4, 2020, 7:34 PM

#

Right ..

#

Ok wait

twilit badger Aug 4, 2020, 7:38 PM

#

hey guys i just wanted to share a great opportunity: https://ignition-hacks-2020.devpost.com/?ref_content=default&ref_feature=challenge&ref_medium=discover
Its a very beginner friendly hackathon and offers a prize pool of $4700

Ignition Hacks 2020

Ignition Hacks 2020 is a free virtual hackathon that runs from August 22-23 that introduces middle and high school students to Artificial Intelligence.

arctic cliff Aug 4, 2020, 8:05 PM

#

@desert oar It was kinda a nightmare not gonna lie

📎 unknown.png

desert oar Aug 4, 2020, 8:13 PM

#

ok

#

nice try

#

it can be done more simply

arctic cliff Aug 4, 2020, 8:13 PM

#

How ?

desert oar Aug 4, 2020, 8:14 PM

#

just clean up the prices first

#

make a new column of "price" that contains numbers

#

you can use regex to remove all the non-numerical characters:

df['Price_num'] = df['Price'].str.replace(r'[,₹ ]', '', regex=True).map(float)

arctic cliff Aug 4, 2020, 8:16 PM

#

Oh ..

desert oar Aug 4, 2020, 8:17 PM

#

df['Price_num'] = (
    df['Price']
    .str.replace(r'[,₹ ]', '', regex=True)
    .mask(lambda x: x == 'Free', 0.0)
    .map(float))

forgot to handle the "Free" case

#

now you can do whatever you need to do with df['Price_num']

odd yoke Aug 4, 2020, 8:17 PM

#

str.replace doesn't use regex btw

desert oar Aug 4, 2020, 8:17 PM

#

in pandas it does

#

in regular python it doesnt

odd yoke Aug 4, 2020, 8:17 PM

#

oh right

desert oar Aug 4, 2020, 8:17 PM

#

kind of poor choice imo

arctic cliff Aug 4, 2020, 8:17 PM

#

mask ?

desert oar Aug 4, 2020, 8:17 PM

#

the default should have been non-regex, with regex=True as an opt-in

#

mask is a bit of a weird function

arctic cliff Aug 4, 2020, 8:18 PM

#

Is it a python thing ? Or is it related to Pandas ?

desert oar Aug 4, 2020, 8:18 PM

#

pandas

#

pd.Series.mask

#

there is also pd.Series.where which does almost the same thing, but "reverse"

#

the first argument to pd.Series.mask is a Series of bool (True/False), or a function that returns one
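
#

a small sketch of how mask and where mirror each other, using a made-up Series (the values are assumptions, not data from this chat): mask replaces values where the condition is True, where replaces them where it is False.

```python
import pandas as pd

# Hypothetical sample values, only for illustration.
s = pd.Series([1, 5, 10])

# mask: replace the entries where the condition is True.
masked = s.mask(s > 4, 0)

# where: keep the entries where the condition is True, replace the rest.
kept = s.where(s > 4, 0)

# Both also accept a callable that returns the boolean Series.
masked_by_callable = s.mask(lambda x: x > 4, 0)

print(masked.tolist(), kept.tolist())
```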

#

ah you know what

#

do this instead, easier to understand

#

df['Price_num'] = (
    df['Price']
    .str.replace(r'[,₹ ]', '')
    .replace('Free', 0.0)
    .map(float))
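
#

a self-contained sketch of the same cleanup, with made-up prices since the real data is only in the screenshot. two assumptions to flag: on pandas 2.0 and later, Series.str.replace treats the pattern as a literal string unless regex=True is passed, so it is passed explicitly here; and 'Free' is replaced with the string '0' (not 0.0) so the column stays all strings until the final conversion.

```python
import pandas as pd

# Hypothetical sample data; the real column comes from the user's dataset.
df = pd.DataFrame({"Price": ["₹ 1,299", "Free", "₹ 49"]})

df["Price_num"] = (
    df["Price"]
    # Strip commas, the rupee sign and spaces. regex=True is explicit because
    # newer pandas versions default to literal (non-regex) replacement.
    .str.replace(r"[,₹ ]", "", regex=True)
    # Whole-value replacement of the "Free" label.
    .replace("Free", "0")
    # Convert the cleaned strings to floats.
    .astype(float)
)

print(df)
```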

odd yoke Aug 4, 2020, 8:20 PM

#

<@&267629731250176001>

arctic cliff Aug 4, 2020, 8:21 PM

#

Here it treats Price values one by one?
Because I had to loop to make changes to everyone of them

desert oar Aug 4, 2020, 8:21 PM

#

yes, pandas methods let you make changes without looping

#

they can be significantly faster than looping

#

and a lot less code

arctic cliff Aug 4, 2020, 8:22 PM

#

I see ..

#

I will start making columns instead from now on

#

By the way

#

I know it's too early to ask but I'm just so excited
When should I start learning AI things?

desert oar Aug 4, 2020, 8:23 PM

#

you can start learning concepts now, or at least some math

arctic cliff Aug 4, 2020, 8:23 PM

#

I also know I'm not ready yet
I just wanna know when will I be

desert oar Aug 4, 2020, 8:23 PM

#

it's good to learn programming concurrently with the math and the concepts

#

you start putting ideas together

arctic cliff Aug 4, 2020, 8:23 PM

#

Oh?
Do you suggest a specific source ?

desert oar Aug 4, 2020, 8:23 PM

#

for ai? no, i have no idea

arctic cliff Aug 4, 2020, 8:24 PM

#

For the math of AI

desert oar Aug 4, 2020, 8:24 PM

#

what's your academic level and background?

arctic cliff Aug 4, 2020, 8:24 PM

#

Highschool

desert oar Aug 4, 2020, 8:26 PM

#

start learning pre-calculus and calculus. logarithms, exponential functions, quadratic functions, derivatives

#

maybe you can start looking at intro probability & statistics

#

and very simple linear algebra, concepts like understanding what vectors and matrixes are

#

once you know a little bit on each of those areas, you will start to learn important terminology and concepts

#

the more you learn, the easier it will be to learn more

arctic cliff Aug 4, 2020, 8:30 PM

#

I see !

mellow spruce Aug 4, 2020, 8:43 PM

#

Does anyone know how to keep the y axis from changing when more traces are added to a waterfall/scatter plot chart?

#

Chart looks like this with one trace but the moment i add more, the y axis changes and it distorts the graph

📎 image0.jpg

#

📎 image0.jpg

#

Each trace index is following a list, however not every trace has all the elements of the list

desert oar Aug 4, 2020, 8:47 PM

#

@mellow spruce you can use ax.autoscale(False) to disable changing the axes

mellow spruce Aug 4, 2020, 8:49 PM

#

@mellow spruce you can use ax.autoscale(False) to disable changing the axes
@desert oar is that set on the trace or on the fig.update_layout()?

desert oar Aug 4, 2020, 8:49 PM

#

neither

#

show your plotting code

mellow spruce Aug 4, 2020, 8:52 PM

#


   

```py
    sorterIndex=dict(zip(routing_list,range(len(routing_list))))
    group['Route']=group['ope_no'].map(sorterIndex)
    group.sort_values(['Route'], ascending=True, inplace=True)
    group.drop('Route',1,inplace=True)
    fig.add_trace(go.Scatter(
        name=k,
        mode='lines+markers',
        y=group['ope_no'],
        x=group['processstart'],
    ))

fig.update_layout(title="Title",
                  yaxis={'autorange':"reversed"})

fig.show()
```

#

the first part is the order that I want each trace to follow

desert oar Aug 4, 2020, 8:54 PM

#

ah

#

what is go

#

wait

#

is this not matplotlib?

mellow spruce Aug 4, 2020, 8:55 PM

#

no, it's plotly

desert oar Aug 4, 2020, 8:55 PM

#

oh

#

i have absolutely no idea

#

in the future, clarify what library you're using

#

i assumed it was matplotlib, i should have asked

mellow spruce Aug 4, 2020, 8:55 PM

#

Sorry, my bad. Thanks anyway

willow parcel Aug 4, 2020, 9:41 PM

#

my b if this is the wrong channel but heres a simplified part of my code

#

def foo(bar):
    bar = bar + 1
    return bar
play = True
while play:
    baz = foo(0)
    print(baz)

#

how do i get it so that it prints numbers increasing instead of just 1's

odd yoke Aug 4, 2020, 9:42 PM

#

you can ask in a help channel

#

#❓｜how-to-get-help

lapis sequoia Aug 4, 2020, 11:21 PM

#

Hi. Does someone know a website where I can get info, data, statistics, like a repository for COVID-19? I would like to get data for analyzing.

fervent bridge Aug 4, 2020, 11:32 PM

#

@lapis sequoia were you able to load the HDF5 file into TensorFlow?

gray scaffold Aug 4, 2020, 11:38 PM

#

@lapis sequoia https://github.com/nytimes/covid-19-data

GitHub

nytimes/covid-19-data

An ongoing repository of data on coronavirus cases and deaths in the U.S. - nytimes/covid-19-data

lapis sequoia Aug 4, 2020, 11:40 PM

#

@gray scaffold thank you

gray scaffold Aug 4, 2020, 11:40 PM

#

no problem, enjoy

desert parcel Aug 4, 2020, 11:59 PM

#

@desert oar
Warning: Error detected in AddmmBackward. No forward pass information available. Enable detect anomaly during forward pass for more information. (print_stack at ..\torch\csrc\autograd\python_anomaly_mode.cpp:42)
Traceback (most recent call last):
  File "d:/python/ML/Corrosion test/test.py", line 37, in <module>
    fit(50, model, loss_fn, opt)
  File "d:/python/ML/Corrosion test/test.py", line 30, in fit
    loss.backward(retain_graph=True)
  File "D:\python\ML\lib\site-packages\torch\tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "D:\python\ML\lib\site-packages\torch\autograd\__init__.py", line 100, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
Here is the entire error output.

#

There are versions where I tried to enable the anomaly detection

#

but have no idea how to

#

I did search it online but for some reason I can't find it

#

changing the optimizer to other options didn't work either

#

https://hastebin.com/oqatuzesab.py

#

here is the code

desert parcel Aug 5, 2020, 12:35 AM

#

I got it working

#

I copied another piece of sample code and that worked for some reason

willow karma Aug 5, 2020, 12:37 AM

#

After googling/stack overflowing/githubing for a while, I believe Facebook's Prophet modeling package does not include a feature importance method. Does anyone know a workaround to use here so I can see which predictors are most important in forecasting the target?

arctic cliff Aug 5, 2020, 12:37 AM

#

I'm trying to plot the earnings increasing but It's not working, What am I doing wrong ?

📎 unknown.png

willow karma Aug 5, 2020, 12:39 AM

#

@arctic cliff you shouldnt iterate on the plot method. If you just feed that method a dataframe with the time data as the index and the y values as your single column you'll get the result you want.

desert parcel Aug 5, 2020, 12:39 AM

#

Could someone compare the code? The one on the left works but the one on the right doesn't. I tried to find the difference but so far I have seen none.

📎 unknown.png

arctic cliff Aug 5, 2020, 12:39 AM

#

@arctic cliff you shouldnt iterate on the plot method. If you just feed that method a dataframe with the time data as the index and the y values as your single column you'll get the result you want.
@willow karma Oh! Thanks a bunch

willow karma Aug 5, 2020, 12:41 AM

#

@arctic cliff if you have a dataframe df with a date index and one column 'y_value'.. you would just need to run df.plot()

arctic cliff Aug 5, 2020, 12:45 AM

#

You made my day

#

📎 04rxjSxXbR0AAAAAElFTkSuQmCC.png

#

Can't I change the x and y ?

#

Ah nvm

#

Ignore me

desert parcel Aug 5, 2020, 1:01 AM

#

how many other methods are there to improv epredictions

#

improve predictions*

#

other than the number of iterations and messing around with the learning rate

#

Predictions:
tensor([[ 5.7500,  7.2500,  8.0000],
        [ 5.7500,  7.2500,  8.0000],
        [ 5.7500,  7.2500,  8.0000],
        [15.0000, 14.0000, 15.0000],
        [ 5.7500,  7.2500,  8.0000]], grad_fn=<AddmmBackward>)
----------------------------------------
Originals:
tensor([[ 5.,  6.,  6.],
        [ 5.,  5.,  6.],
        [ 7.,  8., 10.],
        [15., 14., 15.],
        [ 6., 10., 10.]])

#

Because right now it's not the most precise

#

some are exactly on point

#

not all of them

#

Ohh maybe I can add more like stuff in the inputs

#

yes adding more data in the inputs worked

#

I added enough stuff until it became very precise

#

😄 it finally figured it out

📎 unknown.png

desert oar Aug 5, 2020, 1:32 AM

#

the best way to improve prediction is to use input data that's strongly related to your target, and to represent that input data in such a way that the relationship is easy to learn

lapis sequoia Aug 5, 2020, 2:28 AM

#

@fervent bridge hey sorry missed your message yesterday, been busy. Nah haven't been able to, might try as an npz or npy file

fervent bridge Aug 5, 2020, 2:50 AM

#

Hmm did you want to go over it ? @lapis sequoia I am almost done getting it into TS

lapis sequoia Aug 5, 2020, 2:53 AM

#

For sure

#

Will you be free in like half an hour

#

I'm just doing smth at the moment

fervent bridge Aug 5, 2020, 2:56 AM

#

Yeaup

desert parcel Aug 5, 2020, 2:59 AM

#

the best way to improve prediction is to use input data that's strongly related to your target, and to represent that input data in such a way that the relationship is easy to learn
@desert oar yeah that makes sense. I had like 5 extra rows of input data that's why the predictions were so close.

fervent bridge Aug 5, 2020, 3:32 AM

#

@lapis sequoia ready?

lapis sequoia Aug 5, 2020, 3:58 AM

#

Apologies, gimme a bit more

arctic cliff Aug 5, 2020, 4:01 AM

#

How can I get the row of that value ?

📎 unknown.png

desert parcel Aug 5, 2020, 4:08 AM

#

There is an error with TensorDataSet

📎 unknown.png

#

I'm not sure how to fix it

#

there is an assertion error

#

assert all(tensors[0].size(0) == tensor.size(0) for tensor in tensors)

fervent bridge Aug 5, 2020, 4:09 AM

#

https://stackoverflow.com/questions/17071871/how-to-select-rows-from-a-dataframe-based-on-column-values

Stack Overflow

How to select rows from a DataFrame based on column values?

How to select rows from a DataFrame based on values in some column in Python Pandas?

In SQL, I would use:

SELECT *
FROM table
WHERE colume_name = some_value
I tried to look at pandas documentati...

#

@arctic cliff

desert parcel Aug 5, 2020, 4:09 AM

#

I put the error into google but everything is in chinese

arctic cliff Aug 5, 2020, 4:11 AM

#

Thanks !

lapis sequoia Aug 5, 2020, 4:11 AM

#

@fervent bridge yo

fervent bridge Aug 5, 2020, 4:13 AM

#

@lapis sequoia ready?, been asking some questions trying to move along, shall we continue through DM?

lapis sequoia Aug 5, 2020, 4:13 AM

#

yeah no prob

desert oar Aug 5, 2020, 4:20 AM

#

@arctic cliff use Series.idxmax
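
#

a quick sketch of what that looks like, assuming the goal is the whole row that holds the maximum of one column. the frame and the column names ('App', 'Earnings') are made up, since the original data is only in the screenshot.

```python
import pandas as pd

# Hypothetical data standing in for the screenshot.
df = pd.DataFrame({
    "App": ["A", "B", "C"],
    "Earnings": [10.0, 42.5, 7.0],
})

# idxmax returns the index label of the largest value in the column...
top_label = df["Earnings"].idxmax()

# ...and .loc with that label returns the full row.
top_row = df.loc[top_label]

print(top_label)
print(top_row)
```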

arctic cliff Aug 5, 2020, 4:22 AM

#

That's what I was looking for !

glacial comet Aug 5, 2020, 4:28 AM

#

Hi All, new to the group. Is there a Data Science FAQ area?

granite light Aug 5, 2020, 6:02 AM

#

I have a basic question for Pandas

#

I am new to it. Let's say I only want to consider values after a certain index. My index is integers. I have 50 rows and I only want to use data from row 26 onwards in my new calculation. There are three columns which are not in ascending or descending order

desert oar Aug 5, 2020, 6:15 AM

#

vaex is on my perpetual todo list

#

@granite light data.loc[26:] if you want to use the index value 26, or data.iloc[26:] if you want to use the row number 26
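
#

a small sketch of the difference, using a made-up frame whose integer index starts at 20 instead of 0 (an assumption chosen so that labels and positions disagree):

```python
import pandas as pd

# Hypothetical frame: 10 rows with index labels 20..29.
data = pd.DataFrame({"value": range(10)}, index=range(20, 30))

# .loc slices by index label: rows labelled 26, 27, 28, 29.
by_label = data.loc[26:]

# .iloc slices by position: there is no 27th row here, so this is empty.
by_position = data.iloc[26:]

# Position 6 happens to be the row labelled 26, so this matches by_label.
same_rows = data.iloc[6:]

print(len(by_label), len(by_position), len(same_rows))
```

with a default 0..n-1 index the two give the same rows, which is why the difference is easy to miss.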

granite light Aug 5, 2020, 6:17 AM

#

@desert oar thank you. Now how can I use it in a condition? Let us say I am doing data[data.column1 > something & index> 26]

#

I am not sure how to write that condition to ensure that first condition is only checked on the rows 26:50

desert oar Aug 5, 2020, 6:18 AM

#

you can save the subset of the data to a variable first

#

then apply your other conditions

granite light Aug 5, 2020, 6:18 AM

#

True

#

But I am kinda trying to learn, so would like to know if it can be done without copying

desert oar Aug 5, 2020, 6:19 AM

#

data_sub = data.iloc[26:]
data_final = data_sub.loc[data_sub['column1'] > something]
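
#

if you do want it as a single expression, here is a sketch under the assumption of a default integer index (the frame, column values and threshold are made up). each comparison needs its own parentheses because & binds more tightly than > in python.

```python
import pandas as pd

# Hypothetical data: 50 rows with a default 0..49 index.
data = pd.DataFrame({"column1": range(50), "column2": range(50, 0, -1)})
something = 40

# Two-step version: slice first, then filter the slice.
data_sub = data.iloc[26:]
two_step = data_sub.loc[data_sub["column1"] > something]

# One-step version: combine both conditions into one boolean mask.
one_step = data[(data["column1"] > something) & (data.index >= 26)]

print(len(two_step), len(one_step))
```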

#

you aren't copying data

granite light Aug 5, 2020, 6:19 AM

#

Also what about computation time in the two approaches?

desert oar Aug 5, 2020, 6:19 AM

#

in fact if you try to modify data_sub pandas will give you a warning

granite light Aug 5, 2020, 6:19 AM

#

you aren't copying data
@desert oar Okay. I hadn't considered this

desert oar Aug 5, 2020, 6:20 AM

#

.loc and .iloc try to avoid making copies when possible

#

in most cases they return different "views" to the same underlying data

#

the pandas documentation avoids saying that they never make copies

#

but i cant think of a time i used it where it did make a copy

granite light Aug 5, 2020, 6:21 AM

#

Thanks a lot, that makes it a lot easier

desert oar Aug 5, 2020, 6:21 AM

#

it's a nice feature

lone nebula Aug 5, 2020, 7:30 AM

#

hey guys why does this line break my column header

📎 unknown.png

#

and how can i merge rows with the same year column into a single row without breaking this

misty lake Aug 5, 2020, 7:39 AM

#

I have a weird question. Can anyone suggest techniques or approaches similar to how Google returns a value for a parameter you ask about?

For Example
parameter value

"Temperature Today" - 20 F
"Rating Dark Night" - 4.4

Basically, I've run into a problem mapping parameters to their values from a large set of Word documents. The Word documents have complex table structures, paragraphs, and essays.

The parameters and values are in the documents, but not structured.

I'm looking for some help in cracking this

With keyword search and NER models I was able to get the parameters, but I haven't found a way to pull the relevant value for each parameter from a set of Word documents.

#

Please tag me if someone could help

desert parcel Aug 5, 2020, 8:17 AM

#

I have a basic question

#

np.random.permutation(n)

#

This just randomly chooses a few values from n right?

velvet thorn Aug 5, 2020, 8:18 AM

#

uh

#

no, it randomly permutes (shuffles) np.arange(n)

desert parcel Aug 5, 2020, 8:20 AM

#

so it just changes the order of the thing?

velvet thorn Aug 5, 2020, 8:22 AM

#

strictly speaking

#

it makes a copy with the order changed randomly, yes

spark stag Aug 5, 2020, 8:25 AM

#

if you pass it an iterable it will shuffle those values instead of np.arange()
```py
>>> np.random.permutation((3, 5, 4, 2, 3))
array([3, 5, 4, 3, 2])
>>> np.random.permutation((3, 5, 4, 2, 3))
array([5, 2, 3, 3, 4])
```

desert parcel Aug 5, 2020, 8:25 AM

#

ah gotcha

#

also

#

this part wasn't explained clearly in the yt tut

#

import numpy as np

def split_indices(n, eval):
    eval = int(eval*n)
    index = np.random.permutation(n)
    return index[eval:], index[:eval]

train_index, eval_index = split_indices(len(dataset), eval=0.2)

velvet thorn Aug 5, 2020, 8:26 AM

#

if you pass it an iterable it will shuffle those values instead of np.arange()
```py
>>> np.random.permutation((3, 5, 4, 2, 3))
array([3, 5, 4, 3, 2])
>>> np.random.permutation((3, 5, 4, 2, 3))
array([5, 2, 3, 3, 4])
```
@spark stag it will make a copy with shuffled values

desert parcel Aug 5, 2020, 8:26 AM

#

So here it shuffles the array then takes 20% of it and puts it inside train_index and eval_index?

velvet thorn Aug 5, 2020, 8:26 AM

#

uh

desert parcel Aug 5, 2020, 8:26 AM

#

Or takes 20% then shuffles that

#

that being the 20%

velvet thorn Aug 5, 2020, 8:26 AM

#

it shuffles an array that represents the index

desert parcel Aug 5, 2020, 8:26 AM

#

uhuh

velvet thorn Aug 5, 2020, 8:26 AM

#

then it takes the first x% of the shuffled array containing random indices

#

and uses that to form the evaluation set

spark stag Aug 5, 2020, 8:27 AM

#

@velvet thorn ah ye that's what i meant, it will use those values when creating the array but i guess i wasn't really clear on that

velvet thorn Aug 5, 2020, 8:27 AM

#

and the rest for the training set

desert parcel Aug 5, 2020, 8:27 AM

#

so it shuffles it once, takes 20%, then shuffles it again?

velvet thorn Aug 5, 2020, 8:27 AM

#

although I don't like that code

#

no, it shuffles once only

#

eval as a parameter name is Bad

desert parcel Aug 5, 2020, 8:27 AM

#

So it shuffles once, takes 20% and put it into the variables?

#

Lol I didn't know what else to put it

#

the parameter set by the yt tut

velvet thorn Aug 5, 2020, 8:28 AM

#

So it shuffles once, takes 20% and put it into the variables?
@desert parcel yes.

desert parcel Aug 5, 2020, 8:28 AM

#

was something like

#

n_val

velvet thorn Aug 5, 2020, 8:28 AM

#

well I'm not sure if you have the same understanding as me

desert parcel Aug 5, 2020, 8:28 AM

#

Oh yeah lol

velvet thorn Aug 5, 2020, 8:29 AM

#

so just to be clear

#

shuffle a sequential array (0, 1, 2...n - 1) representing indices

#

with eval=0.2, use the first 20% of the shuffled indices for the evaluation set and the remaining 80% for the train set
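
#

here is a sketch of the same split with the parameter renamed so it no longer shadows the builtin eval. the names eval_fraction and seed are my own, and it swaps np.random.permutation for a seeded np.random.default_rng generator so the split is repeatable; neither change comes from the tutorial.

```python
import numpy as np


def split_indices(n, eval_fraction, seed=None):
    """Shuffle the indices 0..n-1 once and split them into train/eval parts."""
    n_eval = int(eval_fraction * n)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n)  # a shuffled copy of np.arange(n)
    # The first n_eval shuffled indices go to evaluation, the rest to training.
    return shuffled[n_eval:], shuffled[:n_eval]


train_index, eval_index = split_indices(10, eval_fraction=0.2, seed=0)
print(train_index, eval_index)
```

every index lands in exactly one of the two arrays, because both are slices of the same single shuffle.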

autumn veldt Aug 5, 2020, 9:34 AM

#

Excuse me guys.
so, im trying to run 5 random states, and i want to save the accuracy for each random_state into a csv file. do u guys know how to do it?

📎 ask1.PNG

#

📎 ask2.PNG

outer fulcrum Aug 5, 2020, 9:53 AM

#

What kind of package do you use to generate a PDF report of your data analysis?

warm moth Aug 5, 2020, 11:53 AM

#

Hi! I am pretty new to Data Science. I was wondering how you would do, say Regression, on Real time data? Would you have to train the model on the whole dataset again everytime new data is avaliable? Would you be able to Pickle the model then just do model.fit(x, y)
over and over again for every new data?

I am working on a little project which deals with realtime weather data and I want to predict the weather. I want to Implement it on my website and maybe a Discord Bot.

tidal bough Aug 5, 2020, 12:07 PM

#

What you want is called "online learning" - when new data becomes available in batches and the algorithm should ideally be able to quickly update on the new data without being refit on the entire updated dataset.
https://en.wikipedia.org/wiki/Online_machine_learning
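
#

To make the idea concrete, here is a minimal NumPy sketch of a linear model that is updated one batch at a time instead of being refit on everything. The class OnlineLinearRegression and its partial_fit method are hypothetical names written for this example, not a library API, and the data is synthetic (y = 2x + 1). In practice you would more likely reach for an existing estimator that supports incremental updates.

```python
import numpy as np


class OnlineLinearRegression:
    """Hypothetical linear model trained by one gradient step per batch."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = np.zeros(n_features)
        self.bias = 0.0
        self.learning_rate = learning_rate

    def partial_fit(self, X, y):
        # One gradient step on the mean squared error of this batch only;
        # earlier batches are never revisited.
        error = X @ self.weights + self.bias - y
        self.weights -= self.learning_rate * 2 * (X.T @ error) / len(y)
        self.bias -= self.learning_rate * 2 * error.mean()
        return self

    def predict(self, X):
        return X @ self.weights + self.bias


rng = np.random.default_rng(0)
model = OnlineLinearRegression(n_features=1)

# Simulate data arriving in small batches over time.
for _ in range(2000):
    X_batch = rng.uniform(0, 1, size=(16, 1))
    y_batch = 2.0 * X_batch[:, 0] + 1.0
    model.partial_fit(X_batch, y_batch)

print(model.weights, model.bias)
```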

warm moth Aug 5, 2020, 12:13 PM

#

What you want is called "online learning" - when new data becomes available in batches and the algorithm should ideally be able to quickly update on the new data without being refit on the entire updated dataset.
https://en.wikipedia.org/wiki/Online_machine_learning
@tidal bough Thankyou for the answer. I will check it out. Any idea on how I could go about implementing it in a Discord Bot or a Website? Like should I make an API which can be accessed by the Bot?

zenith saffron Aug 5, 2020, 12:13 PM

#

What should I pass to kmeans.fit() if my input is a PDF file that has already been preprocessed and vectorized with tf-idf? Or am I using the wrong function? I already looked on Stack Overflow and other websites but couldn't find the answer. My program does k-means clustering on a PDF file, and I want to add the elbow method to it.

📎 unknown.png

tidal bough Aug 5, 2020, 12:21 PM

#

@warm moth Might be a good idea if you will need to access it from different places (your website and the bot).

warm moth Aug 5, 2020, 12:23 PM

#

Alrighty! Thanks for the Answer.

still delta Aug 5, 2020, 1:03 PM

#

What are the best statistics books, you have seen at univ???

ornate dagger Aug 5, 2020, 1:43 PM

#

Would preprocessing in Python (well any language, just using Python as an example) mean simply taking a look at the source code and copying ONLY used functions in this code from the imported modules that contain them? Here is an example;
module that will be imported:

def add(a, b):
  return a+b

def subtract(a, b):
  return a-b

source code:

x = 5
y = 4
print(add(x,y))

After preprocessing:

def add(a, b):
  return a+b

x = 5
y = 4
print(add(x,y))

acoustic halo Aug 5, 2020, 1:46 PM

#

preprocessing has a variety of different meanings. what you put is an example of preprocessing, but it comes down to the task at hand and what you want to achieve

#

For example, I preprocessed a bunch of c++ files, and for me that meant removing all comments and undoing all the #define preprocessor directives

ornate dagger Aug 5, 2020, 1:49 PM

#

and how would one go about removing all the comments and undoing all the #define directives?

#

wouldn't it be essentially having a function in a library perhaps that goes over your code and does this - same thing as described above?

acoustic halo Aug 5, 2020, 1:50 PM

#

Comments largely with regex, undoing directives is a massive effort so unless you need to, i wouldnt recommend it
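
#

A rough sketch of the regex part, written in Python. This is a naive illustration and not the program I actually used: the name strip_comments is made up, and the pattern will wrongly cut into string literals that happen to contain // or /*.

```python
import re

# Matches either a // line comment (up to the end of the line)
# or a /* ... */ block comment (non-greedy, may span lines).
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def strip_comments(source):
    """Remove C/C++ style comments from source text (naive approach)."""
    return COMMENT_RE.sub("", source)


example = "int x = 1; // one\n/* block\ncomment */int y = 2;"
cleaned = strip_comments(example)
print(cleaned)
```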

ornate dagger Aug 5, 2020, 1:50 PM

#

not trying to replicate it, simply trying to understand the other variant of preprocessing more clearly.

acoustic halo Aug 5, 2020, 1:51 PM

#

I have a program where i feed in the source code text and it spits out the processed code

ornate dagger Aug 5, 2020, 1:53 PM

#

so basically preprocessing means editing/preparing the source code before it goes through it all?

#

whether it's importing functions from used modules or doing any else kind of formatting

acoustic halo Aug 5, 2020, 1:53 PM

#

Yeah, in my case it was, I was building abstract syntax trees for each source code file, before that, each file had to be preprocessed in that way

ornate dagger Aug 5, 2020, 1:54 PM

#

alright, thank you!

acoustic halo Aug 5, 2020, 1:56 PM

#

Ultimately though, you have to know how you want each file preprocessed for whatever task it is you want to complete, and depending on that you might find that another way of preprocessing your code is better

lapis sequoia Aug 5, 2020, 3:48 PM

#

What sort of augmentation should I be applying to a dataset of skin cancer images? It's well segmented but doesn't contain many images (size 2 GB approx), and I'm going to try a few Transfer Learning architectures first. Also what metrics/score would be best to evaluate my model?

molten hamlet Aug 5, 2020, 4:13 PM

#

is there easy way to assign function that generates batches of data into data generator and feed model while training? in keras

#

def get_batch():
  # example
  yield X, Y

steel roost Aug 5, 2020, 5:06 PM

#

Where is a good starting point to start learning data science with python

grave frost Aug 5, 2020, 5:06 PM

#

What exactly are you interested in learning about?

odd yoke Aug 5, 2020, 5:10 PM

#

@molten hamlet are you using tf.data.Dataset ?

#

if so, there is a batch method

stark hornet Aug 5, 2020, 5:16 PM

#

I desperately need help. is it useful to use/learn matplotlib when you can just export the data to excel or some similar program?

desert oar Aug 5, 2020, 5:20 PM

#

Yes

fickle rampart Aug 5, 2020, 5:21 PM

#

My goal is to develop a stock options backtester. I'm 2 months into learning programming(python specifically). With so much information and so many fields of study, what areas should I focus on in order to develop this backtesting program?

#

I've started learning pandas but not sure where to go from here. Should I focus on understanding classes and objects? What will I need to focus on in order for the backtester to make the correct selection of orders to buy and sell amongst so many rows of data as well as calculate the necessary statistics such as the profit/loss per strategy? Any guidance on this will help me a lot. I don't know where to look.

molten hamlet Aug 5, 2020, 5:40 PM

#

@odd yoke no, but I solved it: fit supports generators since 2.0, so I think I can just pass the generator function

flat quest Aug 5, 2020, 6:02 PM

#

well the stock market is a really odd thing, especially rn. Breaking all the standard rules, so backtesting strats might not work as well as before.

But anyways, if you want to make a backtester, I would say learn the basics of classes and objects before jumping into pandas. As for pandas, there's lots of tutorials online and documentation is pretty good imo @fickle rampart

fickle rampart Aug 5, 2020, 6:13 PM

#

Yes I agree pandas is well documented and since it's so widely used I've been able to find how to do things with it with some searching. The dilemma I'm facing is that making an options backtester seems to be much more difficult than a stock backtester. While in stocks there is only one stock which never changes, in options there are hundreds of options that change every week. What would be useful for me to focus on in order to understand how to make the selection of the correct options with my code?

river fjord Aug 5, 2020, 7:02 PM

#

It is so hard to read such large paragraphs, keep it short pepeLaugh

lapis sequoia Aug 5, 2020, 7:25 PM

#

@desert oar you helped me with this yesterday but i had a followup question -- do you know why this fill_value is replacing everything in my dataframe with 0? this is the code

#

```py
import pandas as pd

data = pd.read_csv('my-data.csv')
data['MONTH'] = pd.to_datetime(data['MONTH'])

new_index = pd.date_range(data['MONTH'].min(), data['MONTH'].max(), freq='MS', name='MONTH')

def fill_monthly(df):
    return df.set_index('MONTH').drop('APP', axis='columns').reindex(new_index, fill_value=0)

data_filled = data.groupby('APP').apply(fill_monthly)
```

desert oar Aug 5, 2020, 7:25 PM

#

it shouldnt be. can you also provide some small test data?

lapis sequoia Aug 5, 2020, 7:25 PM

#

yea so

desert oar Aug 5, 2020, 7:25 PM

#

its easier than me constructing some tiny data set

lapis sequoia Aug 5, 2020, 7:26 PM

#

ID | MONTH | INCIDENTS
AP00094 | 2017-11 | 1
AP00094 | 2018-03 | 1
AP00095 | 2019-05| 3

#

it worked with some other dataframes but for this one im getting 0 replaced for everything

desert oar Aug 5, 2020, 7:27 PM

#

is ID equivalent to APP?

lapis sequoia Aug 5, 2020, 7:27 PM

#

yea

#

wait

#

i think it might be because my month columns are in string format right now

#

i didnt even notice. let me change that and see

desert oar Aug 5, 2020, 7:28 PM

#

did you forget the to_datetime?

#

that line is necessary
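
#

a small sketch of why, built from the three sample rows above with the ID column named APP. the helper takes new_index as an argument and handles a single APP only, which is a simplification of the original code. when MONTH is still text, none of the string labels match the datetime labels in new_index, so reindex fills every row with 0.

```python
import pandas as pd

data = pd.DataFrame({
    "APP": ["AP00094", "AP00094", "AP00095"],
    "MONTH": ["2017-11", "2018-03", "2019-05"],
    "INCIDENTS": [1, 1, 3],
})

months = pd.to_datetime(data["MONTH"], format="%Y-%m")
new_index = pd.date_range(months.min(), months.max(), freq="MS", name="MONTH")


def fill_monthly(df, new_index):
    # Reindex one APP's rows onto the full monthly range, filling gaps with 0.
    return df.set_index("MONTH")[["INCIDENTS"]].reindex(new_index, fill_value=0)


one_app = data[data["APP"] == "AP00094"]

# MONTH left as strings: no label matches, so everything becomes 0.
as_strings = fill_monthly(one_app, new_index)

# MONTH converted first: existing months keep their values.
converted = one_app.assign(MONTH=pd.to_datetime(one_app["MONTH"], format="%Y-%m"))
as_dates = fill_monthly(converted, new_index)

print(as_strings["INCIDENTS"].sum(), as_dates["INCIDENTS"].sum())
```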

lapis sequoia Aug 5, 2020, 7:28 PM

#

yeah thats probably it. lemme try

#

yup that worked!

#

thanks

desert oar Aug 5, 2020, 7:30 PM

#

👍

lapis sequoia Aug 5, 2020, 7:46 PM

#

I am pretty new to ML and DS so I have probably misunderstood the concepts, but I hope someone can clarify it for me.
What is the point of having multiple kernels in a CNN's convolution layer if the Maxpool in the next layer performs a max operation? Since all the kernel outputs will give the same max values per pool window.

#

It's not a python specific question, so I posted it here. Hope that's alright

odd yoke Aug 5, 2020, 8:02 PM

#

(i'm assuming conv2d for this example)
if you have C convolution kernels, the output will have the dimensions HWC (or CHW, depending on the data layout you use). pooling operations are used to downsample the spatial dimensions (HW); the C dimension still keeps its size

#

and the kernels are not initialized with the same values, so the values won't be the same

#

@lapis sequoia ping in case you left

lapis sequoia Aug 5, 2020, 8:03 PM

#

I'm here, reading it, thanks

#

Nah, I'm working with the grayscale images for now. I understood the downsizing the spatial dimensions part.

#

Wait, lemme try to use an example

#

Example output after convolving with a kernel:
[1 2 3 4]
[2 1 3 4]
[4 2 1 3]
[2 2 4 1]
Now if I do a max in axis 1, won't all of them become 4?

#

class Layer_Maxpool:
    def __init__(self, pool_scale):
        # Initializing attributes
        self.pool_scale = pool_scale

    def maxpool(self, img, maxpool_out):
        maxpool_out = np.zeros((conv_out.shape[0] // self.pool_scale, conv_out.shape[1] // self.pool_scale, conv_out.shape[-1]))
        for ix in range(img.shape[-1]):
            new_img = conv_out[:,:,ix]
            for i in range(maxpool_out.shape[0]):
                for j in range(maxpool_out.shape[1]):
                    segment = new_img[i * self.pool_scale:(i+1) * self.pool_scale, j * self.pool_scale:(j+1) * self.pool_scale]
                    maxpool_out[i,j] = np.amax(segment, axis=(0, 1))
        return maxpool_out

    def forward(self, inputs, training=False):
        self.inputs = inputs
        self.output = np.zeros(
            (
                inputs.shape[0] // self.pool_scale,
                inputs.shape[1] // self.pool_scale,
                inputs.shape[2]
            )
        )
        # Calculate output values from input ones, weights and biases
        self.output = self.maxpool(inputs, self.output)

This is the code I'm using. Might've made a mistake in it somewhere.

#

@odd yoke

odd yoke Aug 5, 2020, 8:10 PM

#

You don't directly apply the max pool on the convolution kernel of the previous layer, you apply it on the output of said convolution

lapis sequoia Aug 5, 2020, 8:10 PM

#

omg, I figured it out

#

I'm sorry 😅

#

Instead of broadcasting, I was looping over the images
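
#

For reference, a sketch of a max pool that handles every channel at once with a reshape instead of Python loops. It assumes an HWC layout and a square pool window; the function name maxpool2d is mine, and the first channel is the 4x4 example from above.

```python
import numpy as np


def maxpool2d(x, pool):
    """Max pool an (H, W, C) array with a pool x pool window, per channel."""
    h, w, c = x.shape
    h_out, w_out = h // pool, w // pool
    # Drop rows/columns that do not fill a complete window.
    x = x[:h_out * pool, :w_out * pool, :]
    # Split H and W into (block index, offset inside block).
    blocks = x.reshape(h_out, pool, w_out, pool, c)
    # Take the max over the two within-block axes; channels stay separate.
    return blocks.max(axis=(1, 3))


channel = np.array([
    [1, 2, 3, 4],
    [2, 1, 3, 4],
    [4, 2, 1, 3],
    [2, 2, 4, 1],
])
# Two channels: the example above and its negation.
conv_out = np.stack([channel, -channel], axis=-1)

pooled = maxpool2d(conv_out, 2)
print(pooled[:, :, 0])
print(pooled[:, :, 1])
```

Each channel gets its own 2x2 output, so different kernels still produce different pooled maps.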

#

You don't directly apply the max pool on the convolution kernel of the previous layer, you apply it on the output of said convolution
@odd yoke yeah, aware of that

#

the inputs here is the conv out

#

So the correct architecture of the model is:

Conv -> Maxpool -> ReLu -> Dense Layer -> Softmax

correct?

odd yoke Aug 5, 2020, 8:15 PM

#

That looks good yep

lapis sequoia Aug 5, 2020, 8:15 PM

#

Thanks a lot

odd yoke Aug 5, 2020, 8:15 PM

#

You may see ReLu -> Maxpool instead sometimes, but it's the same result
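
#

A tiny NumPy check of that claim, using a 1D max pool for brevity (the helper names and the input values are made up for the example). Both orders agree because ReLU and max are both non-decreasing operations.

```python
import numpy as np


def relu(x):
    return np.maximum(0, x)


def maxpool1d(x, pool):
    # Group consecutive values into windows of size `pool` and take the max.
    return x.reshape(-1, pool).max(axis=1)


x = np.array([-1, 2, 3, 4, -5, -6, 0, -2])

pool_then_relu = relu(maxpool1d(x, 2))
relu_then_pool = maxpool1d(relu(x), 2)

print(pool_then_relu, relu_then_pool)
```

Pooling first means ReLU only runs on the smaller, downsampled array.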

lapis sequoia Aug 5, 2020, 8:15 PM

#

yeah, was reading about that just now

odd yoke Aug 5, 2020, 8:15 PM

#

Mostly for optimization purposes

lapis sequoia Aug 5, 2020, 8:16 PM

#

I see

#

wouldn't subsampling it first reduce the overhead on Relu?
Not sure which one would be costlier as both strive to reduce the computation in their own way

#

intuition says maxpool does a tougher job

oblique belfry Aug 5, 2020, 8:17 PM

#

Relu is a really simple function to execute. So, it doesn't matter much.

odd yoke Aug 5, 2020, 8:18 PM

#

I think that's a reasonable assumption, I'm not knowledgeable enough in GPGPU to know exactly what they may do to make it faster with Conv -> ReLu -> Maxpool

lapis sequoia Aug 5, 2020, 8:18 PM

#

I see

oblique belfry Aug 5, 2020, 8:18 PM

#

relu = max(0, x)

lapis sequoia Aug 5, 2020, 8:18 PM

#

yeah, aware of that tonus

oblique belfry Aug 5, 2020, 8:18 PM

#

Not trying to talk down, but just typing as I think.

lapis sequoia Aug 5, 2020, 8:18 PM

#

ah lol, okay

odd yoke Aug 5, 2020, 8:19 PM

#

Yeah, but when we're talking about millions of weights, that relu operation that is ran several times per iteration can make a non-negligible difference

oblique belfry Aug 5, 2020, 8:19 PM

#

Yeah....I don't disagree. But, I feel like that is a level of optimization that isn't necessary in my opinion to think about at this point.

odd yoke Aug 5, 2020, 8:19 PM

#

https://github.com/tensorflow/tensorflow/issues/3180 Here some people shortly discuss the idea of automatically reversing relu -> pooling

GitHub

Execution order of ReLU and Max-Pooling · Issue #3180 · tensorflow/...

Hello Everyone, I'm new to Deep Learning and TensorFlow. From studying tutorials / research papers / online lectures it appears that people always have the execution order: ReLU -> P...

lapis sequoia Aug 5, 2020, 8:19 PM

#

But from maxpool's p.o.v, will max([1 2 3 4]) and max([-1 2 3 4]) make any difference?
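
The question above can be checked numerically. Below is a minimal NumPy sketch (the helper names `relu` and `maxpool2x2` are made up for illustration, not a framework API) showing that ReLU followed by 2x2 max pooling gives the same values as max pooling followed by ReLU, while the second order applies ReLU to fewer elements.

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x)
    return np.maximum(x, 0)

def maxpool2x2(x):
    # Non-overlapping 2x2 max pooling on a 2-D array with even height and width
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))  # hypothetical feature map with mixed signs

relu_then_pool = maxpool2x2(relu(x))
pool_then_relu = relu(maxpool2x2(x))

# Both orders agree because max and ReLU are both monotone
same = np.array_equal(relu_then_pool, pool_then_relu)

# Number of elements ReLU has to touch in each order
relu_calls_before_pool = x.size
relu_calls_after_pool = maxpool2x2(x).size
```

With a 2x2 window, pooling first means ReLU only touches a quarter of the values, which is the saving discussed above. Whether that matters in practice depends on how the framework fuses these operations.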

#

ty, will check it out

odd yoke Aug 5, 2020, 8:20 PM

#

When your program takes hours or days to train, even an improvement of 1% is important

oblique belfry Aug 5, 2020, 8:20 PM

#

Had a typo. I don't disagree with you.

odd yoke Aug 5, 2020, 8:20 PM

#

Ah, my bad

oblique belfry Aug 5, 2020, 8:20 PM

#

Nah. It's mine.

odd yoke Aug 5, 2020, 8:21 PM

#

So apparently, tensorflow doesn't optimize for it (yet?)

oblique belfry Aug 5, 2020, 8:21 PM

#

I am not sure how I feel about TF doing that on its own.

#

Not saying it isn't an effective optimization. But, I think the dev should handle that. And, there should be better documentation on similar operations.

loud breach Aug 5, 2020, 8:36 PM

#

hi, im a noob in neural networks and i was trying to make a very simple perceptron that simply tries to guess the slope.
so you give it a x, it needs to spit out the correct y (so curve fitting?)
the cost function is (a-y)²
i thought this is the way to calculate the new weight:
W1 = W0- learning_rate*i*2*(a-y)
is this right?

#

a is the network's prediction

#

y is the desired output

#

i is the input (so x)
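
The update rule in the question can be sanity-checked with a small sketch. It assumes a single weight, no bias, and a linear output `a = w * x`; under those assumptions the derivative of `(a - y)**2` with respect to `w` is `2 * (a - y) * x`, which matches the formula posted above. The function name and the data are hypothetical.

```python
def train_slope(xs, ys, learning_rate=0.01, epochs=200):
    """Fit y = w * x by per-sample gradient descent on the cost (a - y)**2."""
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            a = w * x                   # the network's prediction
            grad = 2 * (a - y) * x      # d/dw of (a - y)**2, since a = w * x
            w = w - learning_rate * grad
    return w

# Hypothetical data lying exactly on a line with slope 3
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x for x in xs]
w = train_slope(xs, ys)
```

If the learning rate is too large for the scale of the inputs, the same loop can diverge, so the value 0.01 here is only a choice that works for this toy data.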

oblique belfry Aug 5, 2020, 9:04 PM

#

Are there any good benchmarks for Flax?

red carbon Aug 5, 2020, 11:34 PM

#

anybody knows whats the best way to get the output of a particular hidden layer in a NN using pytorch?

odd yoke Aug 5, 2020, 11:38 PM

#

you can create a list out of a model where each element is a layer

#

alternatively, when you define your model, store a reference to the layer that interests you and retrieve it using a method

#

this seems to be a very common question, there are multiple other solutions you can find online
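
Two of the approaches mentioned above can be sketched as follows. The model is a made-up `nn.Sequential` used only for illustration; the first part walks over the layers one by one, the second uses `register_forward_hook` to capture one layer's output during a normal forward pass.

```python
import torch
from torch import nn

# Hypothetical model: the ReLU output at index 1 is the hidden activation we want
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 2),
)
x = torch.randn(3, 4)

# Option 1: treat the model as a list of layers and keep every intermediate output
activations = []
h = x
for layer in model.children():
    h = layer(h)
    activations.append(h)

# Option 2: a forward hook stores the output of one chosen layer
captured = {}

def save_output(module, inputs, output):
    captured["hidden"] = output.detach()

handle = model[1].register_forward_hook(save_output)
out = model(x)
handle.remove()  # remove the hook once it is no longer needed
```

The hook version also works for models that are not a plain `nn.Sequential`, as long as you can get a reference to the layer of interest.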

arctic cliff Aug 6, 2020, 12:41 AM

#

I don't get df.groupby()

velvet thorn Aug 6, 2020, 1:07 AM

#

I don't get df.groupby()
@arctic cliff what about it

desert parcel Aug 6, 2020, 1:10 AM

#

Could someone take a look at the tensor shapes, it's not getting the output I wanted

#

https://hastebin.com/jocugisulo.py

#

this is something I drew to help myself

📎 unknown.png

#

The first two parts of this work, but the final part i'm not sure how to get

#

this is the output

📎 unknown.png

#

I tried to do a .t() at targets to try and fix it but there are errors so I'm not sure what to do.

odd yoke Aug 6, 2020, 1:13 AM

#

model = nn.Linear(13, 1) here you define your model as a linear model that takes an input of size 13, and has an output of size 1

#

I'm confused as to why it doesn't crash directly in your training loop

desert parcel Aug 6, 2020, 1:18 AM

#

It didn't crash

#

model = nn.Linear(13, 1) here you define your model as a linear model that takes an input of size 13, and has an output of size 1
@odd yoke Alright but I changed it to (13, 13) but doing that just gives an error about singleton dimensions

#

I changed it to (13, 2) and that also crashed it

#

so it only works with (13, 1) I tried transposing the tensor but it didn't work either

odd yoke Aug 6, 2020, 1:21 AM

#

which line crashes when you set it to 13, 13

desert parcel Aug 6, 2020, 1:21 AM

#

📎 unknown.png

#

The lines are linked

odd yoke Aug 6, 2020, 1:22 AM

#

what's the exact stack trace ?

desert parcel Aug 6, 2020, 1:22 AM

#

let me get it again

#

```
d:/Coding/python/ML/winrate.py:37: UserWarning: Using a target size (torch.Size([5])) that is different to the input size (torch.Size([13])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  loss = loss_fn(preds, yb)
Traceback (most recent call last):
  File "d:/Coding/python/ML/winrate.py", line 46, in <module>
    fit(250, model, loss_fn, opt)
  File "d:/Coding/python/ML/winrate.py", line 37, in fit
    loss = loss_fn(preds, yb)
  File "D:\Coding\python\ML\lib\site-packages\torch\nn\functional.py", line 2542, in mse_loss
    expanded_input, expanded_target = torch.broadcast_tensors(input, target)
  File "D:\Coding\python\ML\lib\site-packages\torch\functional.py", line 62, in broadcast_tensors
    return _VF.broadcast_tensors(tensors)
RuntimeError: The size of tensor a (13) must match the size of tensor b (5) at non-singleton dimension 0
```

odd yoke Aug 6, 2020, 1:25 AM

#

oh it's the batch size

#

now as to why the shapes don't fit, can you print the shapes right before the loss in the loop ?

#

wait wait

#

you're using inputs

#

instead of xb

#

I'm not exactly familiar with pytorch, but that doesn't seem right

#

Also, is your dataset supposed to be one parameter and one label ?

#

In which case you want to set model to Linear(1, 1)

arctic cliff Aug 6, 2020, 1:29 AM

#

@velvet thorn What's it used for ?

odd yoke Aug 6, 2020, 1:29 AM

#

If you really don't understand it at all, I suggest you look at the documentation directly @arctic cliff

#

It's used for "grouping" values together based on some arbitrary criteria

#

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
It even has some examples

desert parcel Aug 6, 2020, 1:31 AM

#

@odd yoke sorry what.

odd yoke Aug 6, 2020, 1:32 AM

#

Your model's input size isn't 13

#

it's 1, right ?

desert parcel Aug 6, 2020, 1:32 AM

#

When I do .shape

odd yoke Aug 6, 2020, 1:32 AM

#

13 being the number of examples in your dataset

desert parcel Aug 6, 2020, 1:32 AM

#

it just gives [13]

#

Ohhhh

#

it's a 1,13

odd yoke Aug 6, 2020, 1:32 AM

#

1, 1

arctic cliff Aug 6, 2020, 1:32 AM

#

Guess I got it, Thanks !

desert parcel Aug 6, 2020, 1:32 AM

#

so I just put in 1,1

odd yoke Aug 6, 2020, 1:32 AM

#

Yes, and in the training loop, you're using inputs but I'm p sure you want to use xb

desert parcel Aug 6, 2020, 1:33 AM

#

yeah inputs is xb targets is yb

#

It says there is a size mismatch

#

RuntimeError: size mismatch, m1: [1 x 13], m2: [1 x 1] at C:\w\b\windows\pytorch\aten\src\TH/generic/THTensorMath.cpp:41

#

After changing it model=nn.Linear(1,1)

#

m2 being model

odd yoke Aug 6, 2020, 1:34 AM

#

which line causes this ?

desert parcel Aug 6, 2020, 1:34 AM

#

I'm a bit busy right now I'll get back to you

#

Sorry about the wait

velvet thorn Aug 6, 2020, 1:34 AM

#

@velvet thorn What's it used for ?
@arctic cliff many things, but the most common one is to apply aggregations over subsets of data

odd yoke Aug 6, 2020, 1:34 AM

#

you shouldn't get 13 as input anywhere
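
A minimal sketch of the shapes being discussed, assuming (as suggested above) that 13 is the number of examples and each example has a single feature. The data is made up. `nn.Linear(1, 1)` expects the last dimension of its input to be 1, so a 1-D tensor of shape `[13]` has to be reshaped to `[13, 1]` first, and the targets should get the same shape so that the loss does not broadcast.

```python
import torch
from torch import nn
import torch.nn.functional as F

# Hypothetical dataset: 13 examples with one feature and one label each
inputs = torch.arange(13, dtype=torch.float32)   # shape [13]
targets = 2 * inputs + 1                         # shape [13]

model = nn.Linear(1, 1)

# Add a feature dimension: [13] -> [13, 1]
xb = inputs.unsqueeze(1)
yb = targets.unsqueeze(1)

preds = model(xb)               # shape [13, 1], one prediction per example
loss = F.mse_loss(preds, yb)    # shapes match, so nothing is broadcast
```

Inside a batched training loop the same idea applies per batch, for example shapes `[5, 1]` for a batch size of 5.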

velvet thorn Aug 6, 2020, 1:35 AM

#

for example, say you have a dataset that contains three columns: department, name, and age

odd yoke Aug 6, 2020, 1:35 AM

#

don't forget to remove ```py
preds = model(inputs)

print(preds.shape)

loss_fn = F.mse_loss # except this line
loss = loss_fn(preds, targets)```

velvet thorn Aug 6, 2020, 1:35 AM

#

if you wanted the average age of the whole company, you would do df['age'].mean()

#

but if you wanted the average age of each department, you would do df.groupby('department').mean()

arctic cliff Aug 6, 2020, 1:36 AM

#

Can't I do: df.department.mean() ?

velvet thorn Aug 6, 2020, 1:37 AM

#

no, that would be the mean of the column department

#

which doesn't make sense because it contains strings.

#

>>> df
    department name  age
0   Accounting    A   36
1   Accounting    B   29
2  Engineering    C   24
3  Engineering    D   37
4  Engineering    E   33
>>> df['age'].mean()
31.8
>>> df.groupby('department').mean()
                   age
department            
Accounting   32.500000
Engineering  31.333333

arctic cliff Aug 6, 2020, 1:38 AM

#

!

#

I got it !

velvet thorn Aug 6, 2020, 1:38 AM

#

this is the simplest and (I think) the most common use case for groupby

#

but the general principle is split-apply-combine

#

split into subsets based on the value of a specified column, apply some operation, combine the results back into a DataFrame

#

in this case the operation is the mean aggregation.
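
The split-apply-combine idea described above can be spelled out by hand and compared with the built-in call. This sketch rebuilds the example DataFrame from the session shown earlier and selects the `age` column explicitly, since recent pandas versions may raise an error when asked for the mean of a string column such as `name`.

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["Accounting", "Accounting", "Engineering", "Engineering", "Engineering"],
    "name": ["A", "B", "C", "D", "E"],
    "age": [36, 29, 24, 37, 33],
})

# Split: one sub-DataFrame per department
pieces = {dept: sub for dept, sub in df.groupby("department")}

# Apply: compute the mean age inside each piece
means = {dept: sub["age"].mean() for dept, sub in pieces.items()}

# Combine: put the per-group results back into one object
manual = pd.Series(means, name="age")

# The same thing in one call
builtin = df.groupby("department")["age"].mean()
```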

#

however, you can do stuff like transform and filter, in particular

#

also, as you get more advanced you'll find that you don't have to group on only a single column, or even on columns at all

#

an easy example of the first case is...imagine you also had a "sex" column

#

you could do df.groupby(['department', 'sex']).mean() to get the average age by department and sex
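
A short sketch of the three ideas mentioned above: grouping on two columns, `transform`, and `filter`. The values in the `sex` column are invented for illustration, and `age` is again selected explicitly so that the string column `name` is not averaged.

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["Accounting", "Accounting", "Engineering", "Engineering", "Engineering"],
    "name": ["A", "B", "C", "D", "E"],
    "age": [36, 29, 24, 37, 33],
    "sex": ["F", "M", "M", "F", "M"],  # hypothetical values
})

# Group on two columns: average age per (department, sex) pair
by_dept_and_sex = df.groupby(["department", "sex"])["age"].mean()

# transform: result has one value per original row (here, the department mean)
df["dept_mean_age"] = df.groupby("department")["age"].transform("mean")

# filter: keep only the rows of groups that satisfy a condition
big_departments = df.groupby("department").filter(lambda g: len(g) > 2)
```

`transform` is handy when you want to compare each row with its group (for example `age - dept_mean_age`), while `filter` drops whole groups rather than individual rows.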

arctic cliff Aug 6, 2020, 1:41 AM

#

Let me try this out

#

Thanks

lapis sequoia Aug 6, 2020, 2:52 AM

#

lst = eval(input("Enter list :"))
length = len(lst)
#List to hold unique elements
uniq = [ ]
#List to hold duplicate elements
dupl = [ ]
count = i = 0
while i < length :
    element = lst[i]
    #Count as 1 for the element at lst[i]
    count = 1
    if element not in uniq and element not in dupl:
        i+=1
        for j in range(i,length):
            if element==lst[j]:
                count+=1
        #when inner loop - for loop ends
        else:
            print("Element",element,"frequency:",count)
            if count==1:
                uniq.append(element)
            else:
                dupl.append(element)
    #When element is found in uniq or dupl lists
    else:
        i+=1
print("Original list",lst)
print("Unique elements list",uniq)
print("Duplicates elements list",dupl)

#

why I'm getting error?

#

$python main.py
Enter list :
Traceback (most recent call last):
  File "main.py", line 1, in <module>
    lst = eval(input("Enter list :"))
EOFError: EOF when reading a line
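
`input()` raises `EOFError` when there is nothing to read on standard input, which typically happens when a script is run somewhere that does not provide interactive input (some online runners, for example). A hedged sketch of one way around it: fall back to a default when no input is available, and parse the text with `ast.literal_eval` instead of `eval`. The function names and the default list are hypothetical, and `collections.Counter` is swapped in for the hand-written counting loop.

```python
import ast
from collections import Counter

def read_list(prompt="Enter list :", default="[1, 2, 2, 3]"):
    """Read a list literal from stdin, using a default if stdin is empty."""
    try:
        text = input(prompt)
    except EOFError:
        text = default  # no stdin available, e.g. in a non-interactive runner
    # literal_eval only accepts Python literals, unlike eval
    return ast.literal_eval(text)

def split_unique_duplicates(lst):
    """Return (elements seen once, elements seen more than once), in first-seen order."""
    counts = Counter(lst)
    uniq = [x for x in counts if counts[x] == 1]
    dupl = [x for x in counts if counts[x] > 1]
    return uniq, dupl
```

Usage would look like `uniq, dupl = split_unique_duplicates(read_list())`.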

desert parcel Aug 6, 2020, 3:06 AM

#

don't forget to remove ```py
preds = model(inputs)

print(preds.shape)

loss_fn = F.mse_loss # except this line
loss = loss_fn(preds, targets)```
@odd yoke wydm

odd yoke Aug 6, 2020, 3:06 AM

#

remove that code, it's not part of your model

#

it may be what's causing the error with the shape 13

#

because you should really only have shapes 1 and 5

desert parcel Aug 6, 2020, 3:07 AM

#

like this?

📎 unknown.png

odd yoke Aug 6, 2020, 3:19 AM

#

yes

lapis sequoia Aug 6, 2020, 5:29 AM

#

What types of career paths are you all wanting to do with data science?

#

Just curious

#

New to this

brittle edge Aug 6, 2020, 5:36 AM

#

Does anyone here use notebooks.ai?

slate scroll Aug 6, 2020, 5:38 AM

#

@lapis sequoia I am a machine learning engineer, it is a growing area.

desert parcel Aug 6, 2020, 8:09 AM

#

📎 unknown.png

#

Could someone explain this line?

#

import numpy as np

def split_indices(n, eval):
    eval = int(eval*n)
    index = np.random.permutation(n)
    return index[eval:], index[:eval]

train_index, eval_index = split_indices(len(dataset), eval=0.2)

#

Here is the full code

#

So does it split the 20% between train_index and eval_index?

#

📎 unknown.png

#

Here is the output I don't really understand it

desert parcel Aug 6, 2020, 8:39 AM

#

mostly because they're different
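
A small sketch of what the function above does, with the parameter renamed to the hypothetical `eval_fraction` so it does not shadow the built-in `eval`. `np.random.permutation(n)` returns the numbers `0..n-1` in random order; the first 20% of that shuffled order become the evaluation indices and the remaining 80% become the training indices. The output differs between runs because the permutation is random.

```python
import numpy as np

def split_indices(n, eval_fraction):
    # Number of samples reserved for evaluation
    n_eval = int(eval_fraction * n)
    # Shuffled indices 0..n-1
    index = np.random.permutation(n)
    # Everything after the first n_eval entries is training, the first n_eval are evaluation
    return index[n_eval:], index[:n_eval]

# With 10 samples: 8 training indices and 2 evaluation indices, with no overlap
train_index, eval_index = split_indices(10, eval_fraction=0.2)
```

Calling `np.random.seed(...)` before the split would make the result repeatable, if that is wanted.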

velvet thorn Aug 6, 2020, 9:05 AM

#

didn't we go through this yesterday

still delta Aug 6, 2020, 9:42 AM

#

please, do you have a good "google API's" tutorial ?

uncut shadow Aug 6, 2020, 10:17 AM

#

what

lapis sequoia Aug 6, 2020, 10:21 AM

#

Series.filter(regex="..")

#

need to filter out strings ending with -org in the series

#

what will the regular expression be like
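
One thing worth knowing here: `Series.filter(regex=...)` matches against the index labels, not the values. To select by value, a boolean mask built with the `.str` accessor is the usual route. A minimal sketch with made-up data, where `-org$` is the pattern for "ends with -org":

```python
import pandas as pd

# Hypothetical data, since the real series was only shown as a screenshot
s = pd.Series(["alpha-org", "beta-com", "gamma-org", "delta"])

# True where the string ends with "-org"; na=False treats missing values as no match
ends_with_org = s.str.contains(r"-org$", na=False)

only_org = s[ends_with_org]        # keep the strings ending in -org
without_org = s[~ends_with_org]    # drop the strings ending in -org
```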

uncut shadow Aug 6, 2020, 10:25 AM

#

no idea cuz u didn't show any example of data

lapis sequoia Aug 6, 2020, 10:29 AM

#

📎 Screenshot_2020-08-06_at_3.59.00_PM.png

#

@uncut shadow here you go

#

Nvm i used another approach

dreamy fractal Aug 6, 2020, 10:44 AM

#

Hello guys, I have a question regarding Deep Learning frameworks. I know how to make simple neural network architectures, but I have some difficulties implementing custom architectures even though I'm quite familiar with the theory behind the implementation. Do you have any resources or ideas about how to practice the "coding" part of implementing custom neural networks using Tensorflow and/or PyTorch ?

lapis sequoia Aug 6, 2020, 10:46 AM

#

Look up architectures and try to implement a broken down version of it

#

the last few chaps of Hands on ML with scikit learn and tensorflow are helpful

dreamy fractal Aug 6, 2020, 10:49 AM

#

Will look into that, is the book adapted for Tensorflow 2.0+ ?

vocal sluice Aug 6, 2020, 10:56 AM

#

I want to ask something: I have training data for object detection and I want to use tensorflow for this purpose. I'm labeling the pictures, but the problem is that most of the pictures are horizontal. Will that cause any problem once my model is trained? Sorry for my bad English.

hidden halo Aug 6, 2020, 11:02 AM

#

I need to do a calculation over a list where, for each item, I need to count how many of the items appearing before it are smaller than it. I have written this function using a numpy array for this:

L = np.array([5,8,2,77,34,67,....,56,342,567])
num_lower = []
for i, j in enumerate(L):
    cur_L = L[:i+1]
    lt = np.sum(cur_L < j)
    num_lower.append(lt)

Is there a way to vectorize this loop using Numpy?

tidal bough Aug 6, 2020, 11:05 AM

#

honestly, this sounds like it can be solved in O(n) by dynamic programming from the right

#

but to speed this solution up, using numba should work too if the contents are homogenous.

hidden halo Aug 6, 2020, 11:07 AM

#

basically it's a time series data and for each number, I want to know where it stands with respect to historical data.
I'm not familiar with numba

#

I'll look it up

tidal bough Aug 6, 2020, 11:07 AM

#

Pretty much just make this part into a function and apply the @numba.njit decorator to it.

#

It'll lag the first time you call it because it'll be compiled into machine code, but then it'll be much faster.

#

of course, not all functions can be compiled by numba, but this looks like something that can - just some math and loops.

hidden halo Aug 6, 2020, 11:10 AM

#

OK. I'll try it out. Thanks

#

honestly, this sounds like it can be solved in O(n) by dynamic programming from the right
@tidal bough What did you mean by this?

tidal bough Aug 6, 2020, 11:11 AM

#

so... for each number in the list, you need to count the number of elements that are to the right of that number and smaller than it?

velvet thorn Aug 6, 2020, 11:12 AM

#

honestly, this sounds like it can be solved in O(n) by dynamic programming from the right
@tidal bough really...?

hidden halo Aug 6, 2020, 11:12 AM

#

I'd say to the left of the number. As in, the numbers are on a timeline, starting from left and moving to the right. So I need to consider all numbers appearing before that
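
For the problem as clarified above (for each value, count the earlier values that are strictly smaller), here is a sketch of three variants. The function names are made up. The vectorized version uses NumPy broadcasting and builds an n-by-n boolean matrix, so it only suits moderately sized inputs; the `bisect` version keeps a sorted list of the values seen so far and is not the O(n) dynamic-programming idea mentioned earlier, just a simple alternative.

```python
import numpy as np
from bisect import bisect_left, insort

def count_lower_prior_loop(values):
    # Reference version: same idea as the original loop
    arr = np.asarray(values)
    return [int(np.sum(arr[:i] < v)) for i, v in enumerate(arr)]

def count_lower_prior_vectorized(values):
    arr = np.asarray(values)
    # smaller[i, k] is True when arr[k] < arr[i]
    smaller = arr[None, :] < arr[:, None]
    # Keep only k < i (strictly below the diagonal), then count per row
    return np.tril(smaller, k=-1).sum(axis=1)

def count_lower_prior_sorted(values):
    seen = []   # values seen so far, kept sorted
    out = []
    for v in values:
        out.append(bisect_left(seen, v))  # number of earlier values strictly below v
        insort(seen, v)
    return out

L = [5, 8, 2, 77, 34, 67]  # the first few values from the example above
result = count_lower_prior_vectorized(L)
```

For long time series the sorted-list version avoids the quadratic memory of the broadcasting approach.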

velvet thorn Aug 6, 2020, 11:13 AM

#

I can't see it but maybe you're right
