#data-science-and-ml


odd yoke
#

(or comment it)

fallow sandal
#

oh and just do model_dir = (path for my custom model)

#

?

odd yoke
#

yes

#

how did you save your model before ?

#

if you used tf.saved_model.save(model, path) then this should work
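For reference, a minimal save/load round trip with `tf.saved_model.save` and `tf.saved_model.load` might look like the sketch below. It assumes TensorFlow 2.x and uses a hypothetical toy `tf.Module` (`Scale`) and a temporary directory in place of the real detection model and `model_dir`.

```python
import tempfile

import tensorflow as tf


class Scale(tf.Module):
    """Hypothetical toy model: multiplies its input by a trainable weight."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(2.0)

    # A fixed input_signature lets SavedModel export a concrete, reloadable function
    @tf.function(input_signature=[tf.TensorSpec(shape=[], dtype=tf.float32)])
    def __call__(self, x):
        return self.w * x


model = Scale()
model_dir = tempfile.mkdtemp()  # stand-in for the path to your own model
tf.saved_model.save(model, model_dir)

restored = tf.saved_model.load(model_dir)
result = restored(tf.constant(3.0)).numpy()
print(result)
```

What a loaded object exposes (a callable, `signatures`, etc.) depends on how it was exported, so an exported detection model may need to be called differently than this toy.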

fallow sandal
#

I was following tf2 documentation and it made me save my model like this

odd yoke
#

perfect

fallow sandal
#

Woah it works

#

TY SO MUCH @odd yoke

odd yoke
#

yw

fallow sandal
#

That was so much effort (I'm like beginner to intermediate in python) so it was a big challenge haha XD

#

But I think I'm going to look into YoloV3 or V4 framework next time..

odd yoke
#

personal opinion here, but tensorflow is big mess

fallow sandal
#

yeah

#

YoloV3 and V4 seem easier to get set up

lapis sequoia
#

thinking of switching to torch

odd yoke
#

they are algorithms, you can implement them in tensorflow too

fallow sandal
#

oh yeah pytorch was recommended to me too

#

i mean, it was a fun weekend project getting into something i wasn't comfortable with

odd yoke
#

to be fair, TF2 is orders of magnitude better than TF1

#

they made the api more "torch-like"

fallow sandal
#

ugh finding documentation was the big pain

odd yoke
#

it was completely unintuitive before

fallow sandal
#

lots of tf1 documentation mixed with tf2 XD

lapis sequoia
#

I think eager execution made it choppy

odd yoke
#

yeah it's officially stable but only as of recently

fallow sandal
#

Ok, something went wrong with my training lol

#

It's recognizing my object twice..? One with a longer bounding box lmao

odd yoke
#

so, that is a common thing, but usually models have techniques to suppress overlapping bounding boxes

#

eg, faster-rcnn uses what's called "non-maximum suppression" at the end of the model
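A minimal NumPy sketch of greedy non-maximum suppression, assuming boxes in `(x1, y1, x2, y2)` format and an IoU threshold of 0.5 (both choices are illustrative). It also shows why a "longer" duplicate box can survive: a box is only suppressed when its IoU with a higher-scoring box exceeds the threshold. Libraries ship their own versions (e.g. `tf.image.non_max_suppression`).

```python
import numpy as np


def iou(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it too much, repeat."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]  # survivors for the next round
    return keep


boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 60], [100, 100, 140, 140]], dtype=float)
scores = np.array([0.9, 0.6, 0.8])
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IoU ~0.70 and is suppressed
```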

#

that only works if the boxes are extremely similar tho, if that's your case here

fallow sandal
#

I'm guessing this is normal? I needed to pick a random object so I picked a toy goat from my fav robotics part supplier (i just like the goat)

#

I only trained it for 20 minutes with about 55 images

odd yoke
#

hard to know just like that, but it's safe to expect it's due to lack of training

#

or overfitting

#

can't really make any conclusion just like that

fallow sandal
#

it was based on an SSD ResNet-50 model

#

maybe I should try a faster rcnn model

odd yoke
#

if you don't need real-time object detection, it's the current SOTA

fallow sandal
#

I stopped training early because I thought something crashed. I was checking my task manager and for some reason my GPU and CPU activity dropped to like near 0

#

(like idle, not 0)

odd yoke
#

weird, you should log metrics during training to have a better idea of what's going on

fallow sandal
#

Is there a config setting for how hard it pushes the gpu/cpu? Cause like all the youtube videos I was watching said it would make my computer go sicko mode

#

I think I still have my tensorboard, let me check

lapis sequoia
#

guys would you recommend a good book with a code-first approach to deep learning, preferably keras

#

documentations have me lost tbh

fallow sandal
#

Not sure how to make sense of this: the documentation told me a total loss of less than 1 was good, while a youtube video said consistently less than 0.05

odd yoke
#

I don't know about keras, but pytorch recently distributed their free book called "Deep Learning with Pytorch", and it seemed pretty good when I skimmed through it

#

You should try to log your validation set too

fallow sandal
#

oh shoot

#

Does it only log my training set?

odd yoke
#

yes
also, your learning rate seems to be gradually rising. is that a warmup phase that you stopped too early, or is it a mistake?
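To illustrate what a warmup phase looks like, here is a minimal sketch of a linear warmup schedule. The function and its numbers are hypothetical; many detection pipeline configs combine a warmup like this with a decay schedule, so check your own `pipeline.config`. If training is stopped before `warmup_steps`, the logged learning rate only ever goes up.

```python
def linear_warmup_lr(step, base_lr, warmup_steps, start_lr=0.0):
    """Ramp the LR linearly from start_lr to base_lr, then hold it at base_lr."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return base_lr
    frac = step / warmup_steps
    return start_lr + frac * (base_lr - start_lr)


# Hypothetical values: base LR 0.04 reached after 1000 warmup steps
for step in [0, 500, 1000, 2000]:
    print(step, linear_warmup_lr(step, base_lr=0.04, warmup_steps=1000))
```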

fallow sandal
#

I have no clue, I should probably try running it again but for a bit longer and see how the graph changes over time

#

Man, I got so many warning messages so I was just scared something would have gone wrong regardless

#

lol

odd yoke
#

it contains some code, like here, but also charts, more theoretical details, intuitions to have etc

#

But again, it's torch, not keras

fallow sandal
#

voxels? 😮

odd yoke
#

3d pixels

fallow sandal
#

So it doesn't use tensors?

frank bone
#

Any recommendations for interactive charting of time series data on a local machine, preferably not in a browser?

odd yoke
#

well, Nd pixels (brainfart)

fallow sandal
#

supreme voxels

#

XD

odd yoke
#

wait no in this case it's 3d

#

they have volumetric images of ct scans or something, voxels are how you refer to the pixels in a 3d image

fallow sandal
#

hmm interesting

#

also yeah, my model is having issues with doing two bounding boxes lols

#

but its working so im happy ^_^

#

thank you again @odd yoke

#

i can sleep in peace tonight knowing that at least i didn't come away empty-handed after diving into this mess of documentation lol

#

hmm

#

do you think the fact that I didn't have multiples of the goat

#

led to this maybe?

#

If I trained it better with pictures with multiples, that might have helped the training maybe..? not sure but random theory

odd yoke
#

multiples ? as in many instances of the goat in the same image ?

fallow sandal
#

yeah

odd yoke
#

it can help, but it shouldn't be needed

fallow sandal
#

Yeah hmm

odd yoke
#

(cute goat btw)

fallow sandal
#

thx 😂 thats why I did it haha. we had a more practical idea of detecting different types of physical ports (USB) for the tech-unsavvy, but i guess i just wanted to see how hard it would be to use tensorflow

safe tapir
#

Anyone familiar with text-to-spectrogram?

Specifically interested in what spectrogram features make certain vocal characteristics (e.g. "sad", "happy")

lapis sequoia
#

hey friends, im making a GAN atm and im having a bit of trouble with the input pipeline and training step, would anyone be able to help me out?

lapis sequoia
#

im using the tensorflow pix2pix function as the basis for the generate images and train function

#

but im having problems with the images. i can show how i was storing my data as a h5py file

#

im not sure if the issue is that i need to load my images using a tf.data.Dataset object

#

but then i dont know how to go about that

#

and im not too sure what to do with the generate images functions and stuff because the way they do it in the pix2pix documentation on the tensorflow website seems really efficient

slender robin
#

Hi all, I have a question about web scraping. Does anyone have experience getting product images from BigBasket?

#

I got other data for example Product name, quality, price...

hollow silo
#

pretty basic question, i am trying to decide between 2 projects to put on my resume
one project is where i built an OCR system from scratch - involved a lot of image processing to cluster and extract text patches from images and then pre-process them to look as close as possible to the actual training set
but the network was fairly simple and the data set was just EMNIST
and another was pointcloud segmentation using a multi-class SVM
i had a dense point cloud data set and i trained an SVM to classify different regions in it. any suggestions?

fervent bridge
#

Take a look at this link, also by chance can you send me the code used to convert image arrays into HDF5 files. I am currently needing to do so would like some help.

lapis sequoia
#

for sure man

#

i was using sentdex as a basis

#

im thinking i might switch to numpy arrays but ill send in a sec

fervent bridge
#

Hmm Sentdex has an HDF5 video?

lapis sequoia
#

nah not on hdf5

#

he uses pickle but i used hdf5 because i had issues with pickle

#

pickle seemed to mess up my files

#

i was using an image classifier and running a test with it

#

and i got significantly worse results with pickle than i did with hdf5

fervent bridge
#

Ah what did you use to learn HDF5

lapis sequoia
#

it was a while ago but i looked up a bunch of stuff on like medium or towardsdatascience

#

i didnt do anything special other than store the image arrays in them

#

but i dont think its too complex from memory

#

this site's pretty good

fervent bridge
#

Ah ok I see what you did, I mean it helps that you are using the same code base as I am 😄

lapis sequoia
#

hahaha

#

glad i could help 🙂

#

what is your project?

fervent bridge
#

You just stored X and y in a respective dataset, cool cool, I want to do the same thing but without having to loop through all the images and store them in a var, as I am looping through 40,000 images 277x277. I was wanting to append to the X dataset and y dataset as I looped through the images so that I would not have to store the arrays in memory all at once.

#

Any idea on how to do this?

lapis sequoia
#

hmm

fervent bridge
#

I grabbed a image dataset from Google and am working on a ANN, CNN, RNN and checking the differences

lapis sequoia
#

so you mean appending them into one single dataset rather than what i did where i have images and labels as separate

fervent bridge
#

No I want to have them separate as you have them, just that I want to be able to append my arrays to that dataset as if I was appending them to a list

#

So that I would not have to load them all into the list thus having loaded them into memory

lapis sequoia
#

ah right

fervent bridge
#

Not have to loop through and add all images into a list then to transfer that list into a HDF5 file

lapis sequoia
#

right

fervent bridge
#

@lapis sequoia Nice the link you provided has the same questions I did in the comment sections

#

It directed me to these two links, will have to read them through, thanks man. Is it fine if I ping you with any questions if they arise?

lapis sequoia
#

for sure man

#

im not all that great but ill try to help ^_^

fervent bridge
#

Haha we'll both learn

#

Have you gotten your hands on a Kaggle comp yet?

lapis sequoia
#

ah wait i think i get what youre trying to do

#

so you mean you have like different datasets that youre appending into one h5 file

#

and nah im on google's servers

fervent bridge
#

Just appending in batches into one dataset

lapis sequoia
#

yeah gotcha

#

my b

fervent bridge
#

So append like 1 or 100 image arrays at a time so that I do not have to append all 40,000 at once
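A minimal h5py sketch of appending in batches, assuming `uint8` images of shape 277x277x3 and integer labels. The `image_batches` generator is a hypothetical stand-in for code that reads images from disk; it fakes data with random pixels. The key pieces are `maxshape=(None, ...)`, which makes axis 0 growable, and `Dataset.resize` before each write, so only one batch is in memory at a time.

```python
import os
import tempfile

import h5py
import numpy as np

IMG_SHAPE = (277, 277, 3)


def image_batches(n_images, batch_size):
    """Hypothetical loader: yields (X, y) batches; replace with real image reading."""
    rng = np.random.default_rng(0)
    for start in range(0, n_images, batch_size):
        n = min(batch_size, n_images - start)
        X = rng.integers(0, 256, size=(n, *IMG_SHAPE), dtype=np.uint8)
        y = rng.integers(0, 2, size=n)
        yield X, y


def append(ds, batch):
    """Grow a resizable dataset along axis 0 and write the batch at the end."""
    n_old = ds.shape[0]
    ds.resize(n_old + len(batch), axis=0)
    ds[n_old:] = batch


path = os.path.join(tempfile.mkdtemp(), "images.h5")
with h5py.File(path, "w") as f:
    # Start empty; maxshape=None on axis 0 allows unlimited growth (requires chunking)
    X_ds = f.create_dataset("X", shape=(0, *IMG_SHAPE), maxshape=(None, *IMG_SHAPE),
                            dtype="uint8", chunks=(1, *IMG_SHAPE))
    y_ds = f.create_dataset("y", shape=(0,), maxshape=(None,), dtype="int64", chunks=(1024,))
    for X_batch, y_batch in image_batches(n_images=25, batch_size=10):
        append(X_ds, X_batch)
        append(y_ds, y_batch)

with h5py.File(path, "r") as f:
    X_shape, y_shape = f["X"].shape, f["y"].shape
print(X_shape, y_shape)
```

Rows are written in the order the batches arrive, so X and y stay aligned as long as each `append` pair uses the same batch.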

lapis sequoia
#

yeye

#

so like

#

the img arrays are np arrays right

#

hmm

#

that stackoverflow link seems to do everything you need i think

acoustic halo
#

@fervent bridge Did you not manage to get memmaps working?
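For comparison, a minimal NumPy memmap sketch of the same idea, with random pixels standing in for real image loading (hypothetical). Unlike a resizable HDF5 dataset, a memmap needs the final shape up front, so the total image count must be known before writing.

```python
import os
import tempfile

import numpy as np

n_images, img_shape, batch_size = 25, (277, 277, 3), 10
path = os.path.join(tempfile.mkdtemp(), "X.npy")

# open_memmap writes a real .npy header, so np.load can reopen the file later
X = np.lib.format.open_memmap(path, mode="w+", dtype=np.uint8, shape=(n_images, *img_shape))

rng = np.random.default_rng(0)
for start in range(0, n_images, batch_size):
    stop = min(start + batch_size, n_images)
    # hypothetical stand-in for reading images start..stop-1 from disk
    X[start:stop] = rng.integers(0, 256, size=(stop - start, *img_shape), dtype=np.uint8)
X.flush()
del X  # release the writable mapping

X_ro = np.load(path, mmap_mode="r")  # pages data in from disk on access
print(X_ro.shape, X_ro.dtype)
```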

serene oar
#

Hi!

I'm plotting with pyplot / mpld3 and I notice that whenever I do

plt.show()

I see the correct X tick labels, which are normal strings from a list.
However, when I do

mpld3.show()

These labels don't show correctly and I just get numbers. It seems to be a known issue but I only found a fix for it for some guy using dates, not strings.

arctic cliff
#

What's the usage of this: plt.figure() Because when I got rid of it the plotting worked fine

serene oar
#

I do fig, ax = plt.subplots(figsize=(20, 10))

arctic cliff
#

What's it used for ?

serene oar
#

For plotting.
I create a bar chart comparing the stats of different features.

arctic cliff
#

I meant the fig variable

#

which is plt.figure() if I'm not mistaken

#

Plotting seems to work fine without it

serene oar
#

Not sure how to accomplish that.
I must create a figure to have the subplots in, no?

#

Also if I didn't have that fig I couldn't save it to html later.

serene oar
#

With pyplot it works for me too.
But I am aiming for mpld3. It's much more interactive when used as html.

arctic cliff
#

I see ..

#

Thanks !

serene oar
#

Soo.. any idea on how to get the labels to mpld3 plot?

    ax.set_xticks(x)
    ax.set_xticklabels(labels)

This does the trick for pyplot but not mpld3

arctic cliff
#

I just googled, I'm not sure if this is gonna help but here

serene oar
#

Hm, it's a bit different from what I'm going for but it might help. Thanks

autumn veldt
#

excuse me, do u guys know why i keep getting the same accuracy on my SVMClassifier, but i get varying accuracy when i test my DecisionTree (DT) Classifier?

desert oar
#

SVMs don't work in epochs

#

well, internally they do. because the fitting algorithm is usually iterative

#

in fact the same is true for DecisionTree

#

i assume these are sklearn models?

#

the whole idea of an "epoch" is really an implementation detail of gradient descent

#

basically anywhere other than a neural network, the software tries its best to hide the optimizer from the user

autumn veldt
#

yeah, it's sklearn models.
but sir, what should i do if i need some accuracy testing on SVM (when i know svm don't work with epochs)

desert oar
#

you need bootstrapping or cross validation for that
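A minimal scikit-learn cross-validation sketch, using the bundled breast cancer dataset as a hypothetical stand-in for your own features. On a fixed train/test split, `SVC` is deterministic, so rerunning it gives the same accuracy; `DecisionTreeClassifier` without a `random_state` randomly permutes features at each split, which likely explains the varying tree accuracy. Cross-validation reports a mean and spread over several splits instead.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset shipped with sklearn
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "svm": make_pipeline(StandardScaler(), SVC()),  # SVMs are sensitive to feature scale
    "tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = scores
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```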

#

you should do that with other models too btw

autumn veldt
#

like cnn?

desert oar
#

yes although if the model is big and complicated and slow to fit, then sometimes it doesnt make sense because it would take too long to run

autumn veldt
#

actually my dataset only 800+ images

desert oar
#

i recommend you step back and consider why all this is happening

#

look at how SVMs and decision trees are fitted

#

and why epochs are used in fitting NNs but not other models

autumn veldt
#

ok sir, thanks btw

safe tapir
#

Is anyone familiar with the text-to-audio generation process? I'm interested in realistic / emotional voices.

Specifically:

  1. Are there any datasets that contain labeled emotional audio data (e.g. "sad", "happy", "surprised")
  2. Is there any intuition for what spectrograms would look like for which emotions?
grave frost
#

Hey all! I had posted a question earlier on what model would be good for cipher applications like Input: welcome Output: njoigfr. I had been recommended to use models like BERT with powerful word embeddings, but it seems that NLP models study tokens with respect to other words in a sentence. My intention is not to have it treat the input as a sentence. My intention is for the model to find a relationship between the input and output and, on the basis of that relationship, predict the output accordingly.

MY training data is a .csv file which looks like this:-
inp1, out1
inp2,out2

#

Here inp stands for input and out stands for output. So can anyone confirm whether a BERT-like NLP model can find the relationship b/w input and output data, considering each row on its own rather than the whole dataset?

acoustic halo
#

I recommended bert to you, it was before i knew you were doing ciphers though and would definitely be a bad idea

#

bert is used for getting the contextual information between words, which a hash has no use for

grave frost
#

So would you happen to know what model would be able to handle that?

acoustic halo
#

Throw out any model that relies on word embeddings, unless you know for a fact that they are used in the hashing function. Do you have any idea how the hashes are actually generated?

grave frost
#

Yeah, they are made from a cryptographic function, though I don't know how exactly they accomplish that. For me, that function is like a black box...

acoustic halo
#

and are the output hashes always the same length?

grave frost
#

We can pad them, right?

#

The input has to be padded in any case...

acoustic halo
#

Well, does the output always have the same number of characters as the input?

grave frost
#

I think so. I haven't decided on a cipher yet but probably the input characters would be equal to the output ones...

acoustic halo
#

Well, if its going to be some kind of substitution cipher like enigma, perhaps a RNN would be best

grave frost
#

Hmm... What if I use a cryptographic hash? Then RNNs won't be very suitable. The timesteps will have no relations whatsoever....

acoustic halo
#

Then you probably wont be able to solve it with a NN faster than brute force

#

Because you probably have a salt to figure out as well as an unknown algorithm

grave frost
#

The whole point is to determine whether there do exist any arbitrary relations between the hashes and the inputs. There is always that bias in there. Even though the chances are pretty slim, but I wanna experiment on them

acoustic halo
#

Just try a dense network for a start

desert oar
#

is the entire message encrypted? or are we talking about individual encrypted words

#

because if you encrypt each word in a message then you're basically solving a "fill in the blank" problem where you probabilistically infer the most likely word in each slot

grave frost
#

Initially I am considering hashes as input and the numbers as output..

desert oar
#

but to decrypt an entire encrypted message is basically saying "i dont care about the theoretical results im going to try anyway" which seems like its likely to result in failure but i guess it doesnt hurt to try

acoustic halo
#

The numbers correspond to the initial word?

grave frost
#

Of course, but doesn't hurt to try. The hashes are pretty complex, but I want to start delving into some pre-college research about it and maybe brainstorm some ideas in the later years...

#

@acoustic halo the hashes correspond to the numbers

acoustic halo
#

And how do you link a number to a hash?

grave frost
#

By encrypting it

#

Basically encrypting the number as output and the hash as input for the model

acoustic halo
#

Well, i would start with just a densely connected network as a start

grave frost
#

My model should be able to derive more than the statistical relationship and move towards complex ones. That's why I am struggling to choose the right model. High dimensional vector representations seem like a weak start, but would probably do.

#

But Dense layers cannot absorb abstract relations

acoustic halo
#

Considering you don't know if any relation exists anyway, it would be a start

#

And several stacked dense layers can learn complex and abstract patterns

#

depending on how you define abstract

grave frost
#

Well, good for a start I guess. May look for 1024 Dense layers just for starting 🙂 but I guess it will do for experimentation....

flat quest
#

i mean stacked dense layers are basically what most neural nets are :/. And they work pretty well in a lot of cases

odd yoke
#

ehhh, that's a bit of a stretch

#

fully connected layers really aren't that common anymore

#

*networks, not layers

flat quest
#

well networks are different. There's this concept of sparse layers that is used, but numerically the operation is pretty much the same in most cases. We're still running computations over those 0's, it's just a lot more efficient.

odd yoke
#

Yeah I understand that one can be used to represent the other (not that it should be done), but with that definition you can basically go all the way down to addition and multiplication, and while it may be true, it's not exactly a useful definition

flat quest
#

true true, but at the same time a lot of people think these various layers are completely different things since they never look at the actual mathematical operation behind it.

Its good to know where their similarities lie, and why they work.

desert oar
#

are there any coherent resources on neural network architectures for more "traditional" problems? im specifically not interested in the typical deep learning domains like images, audio, video, nlp/text, or even time series. im wondering about more mundane problems like autoencoders and prediction on "social science" datasets, more akin to titanic, boston housing, etc. than mnist.

#

id be interested in any research comparing training times & prediction/inference performance with other methods like xgboost

#

i ask because i was recently playing around with some NNs that gave me huge increases in accuracy (like 10+ percentage points) on a problem at my company, using just 1 hidden layer with parameters that just sounded like nice round numbers and weren't hyper-optimized at all. so it got me thinking that there was a lot of untapped potential for neural networks in domains where they aren't necessarily popular or dominant. trying to educate myself a bit.

lapis sequoia
#

What is Data Science. I dont really have a clear understanding of what it is and what it is used for.

desert oar
#

@lapis sequoia it's a broad term that encompasses statistics, machine learning, and data analysis. usually someone with a "data scientist" job title works on some combination of those things.

past maple
#

hello anyone here?

#

have a little doubt here.

#

so i trained a model and the accuracy shows to be around 90%.
but when i submit the results, my AUC-ROC Score comes out to be very low.(in the range of 0.5)
so what i am doing wrong?

desert oar
#

@past maple this is binary classification? are your classes very imbalanced?

past maple
#

yes its binary classification.

#

also yes imbalanced classes.

#

but when i use random forest the accuracy is quite less like 18% but the AUC-ROC Score score improves. (in the range of 0.8)

#

@desert oar

desert oar
#

if your classes are 90% "A" and 10% "B" your model can get 90% accuracy by predicting "A" for any input
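A quick scikit-learn sketch of that baseline, on hypothetical random features with a 90/10 label split: a classifier that always predicts the majority class scores 90% accuracy but a ROC AUC of 0.5, because its scores carry no ranking information.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))  # hypothetical features (ignored by the dummy model)
y = np.r_[np.zeros(900, dtype=int), np.ones(100, dtype=int)]  # 90% class 0, 10% class 1

baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
acc = accuracy_score(y, baseline.predict(X))
auc = roc_auc_score(y, baseline.predict_proba(X)[:, 1])  # constant scores for every row
print(f"accuracy={acc:.2f}  roc_auc={auc:.2f}")  # accuracy=0.90  roc_auc=0.50
```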

#

random forest might be doing a better job at not overfitting to the baseline class distribution

past maple
#

so how do i overcome this thing?

dreamy fractal
#

If the data is highly imbalanced, computing the accuracy is perhaps not the best way to evaluate your model

past maple
#

then what should this poor soul do?

dreamy fractal
#

I think your first model always predict 0 or always predict 1, hence the AUC score close to 0.5

past maple
#

yes right.

dreamy fractal
#

Consider using other metrics such as precision, recall or F1 score. You can also visualise the confusion matrix to see where you make most of your errors
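A minimal scikit-learn sketch of those metrics on hypothetical labels (8 negatives, 2 positives). With `labels` ordered 0, 1, the confusion matrix is laid out as `[[TN, FP], [FN, TP]]`.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical ground truth and predictions for an imbalanced binary problem
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall
print(cm)
print(precision, recall, f1)  # 0.5 0.5 0.5, even though accuracy is 0.8
```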

past maple
#

okay noted, will check with that.

desert oar
#

depending on your model you can improve the outcome by adjusting hyperparameters

#

you might also have success with oversampling or undersampling, those dont always work well though
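A minimal sketch of random oversampling with `sklearn.utils.resample` on hypothetical data: minority rows are duplicated (sampled with replacement) until both classes have the same count. Resample only the training split, never the test split, or the evaluation gets inflated. Many sklearn models (e.g. `RandomForestClassifier`) also accept `class_weight="balanced"` as an alternative.

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # hypothetical training features
y = np.r_[np.zeros(90, dtype=int), np.ones(10, dtype=int)]

X_maj, X_min = X[y == 0], X[y == 1]
# Draw minority rows with replacement until they match the majority count
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=0)

X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.r_[np.zeros(len(X_maj), dtype=int), np.ones(len(X_min_up), dtype=int)]
print(np.bincount(y_bal))  # [90 90]
```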

past maple
#

yes, i have tried adjusting the hyperparameters for random forests. but then it slightly improves the model.

charred blaze
#

I'll second the use of other evaluation metrics. Consider those which are more adequate for your scenario of a binary classification with an unbalanced label distribution, like weighted accuracy or class balance accuracy. For binary classification, I'm quite partial to geometric mean of sensitivity and specificity.
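A minimal sketch of the geometric mean of sensitivity and specificity, computed from a scikit-learn confusion matrix on hypothetical labels, next to `balanced_accuracy_score` (the arithmetic mean of the same two rates). The imbalanced-learn package also provides a `geometric_mean_score` helper.

```python
import math

from sklearn.metrics import balanced_accuracy_score, confusion_matrix

# Hypothetical labels: 8 negatives, 2 positives
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 7 + [1] + [1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)  # recall of the positive class
specificity = tn / (tn + fp)  # recall of the negative class
g_mean = math.sqrt(sensitivity * specificity)
bal_acc = balanced_accuracy_score(y_true, y_pred)  # (sensitivity + specificity) / 2
print(sensitivity, specificity, g_mean, bal_acc)
```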

past maple
#

okay okay, thank you tho. i will see what i can do.
honestly i am just getting started so figuring out these things.

lapis sequoia
#

python ok

iron rampart
#

Hey, so i've been doing Machine Learning for a few days and have this code

```python
import pickle

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib import style
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle

style.use("ggplot")

data = pd.read_csv("student-mat.csv", sep=";")

predict = "G3"

data = data[["G1", "G2", "absences", "failures", "studytime", "G3"]]
data = shuffle(data)  # Optional - shuffle the data

# drop(columns=...) instead of drop([predict], 1): newer pandas rejects the positional axis
x = np.array(data.drop(columns=[predict]))
y = np.array(data[predict])

best = 0
for _ in range(20):
    # Each pass draws a NEW random split and fits a NEW model from scratch;
    # nothing carries over between passes, so this is not training over epochs.
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1)

    linear = linear_model.LinearRegression()
    linear.fit(x_train, y_train)
    acc = linear.score(x_test, y_test)  # R^2 on the test split, not classification accuracy
    print("Accuracy (R^2): " + str(acc))

    if acc > best:
        best = acc
        with open("studentgrades.pickle", "wb") as f:
            pickle.dump(linear, f)

with open("studentgrades.pickle", "rb") as pickle_in:
    linear = pickle.load(pickle_in)

print("-------------------------")
print('Coefficient: \n', linear.coef_)
print('Intercept: \n', linear.intercept_)
print("-------------------------")

# x_test/y_test come from the LAST split in the loop, not the split of the saved model
predicted = linear.predict(x_test)
for i in range(len(predicted)):
    print(predicted[i], x_test[i], y_test[i])

plot = "failures"
plt.scatter(data[plot], data["G3"], label="students")  # label so legend() has an entry
plt.legend(loc=4)
plt.xlabel(plot)
plt.ylabel("Final Grade")
plt.show()
```

But i still just don't get it. The results are still the same. It looks like it won't learn anything
uncut shadow
#

wdym?

iron rampart
#

So the result's are the same, i can not see if it learns from it

tidal bough
#

It looks like you're recreating the model every epoch 😛 No wonder it doesn't learn.

iron rampart
#

I'm sorry, my english isn't that great haha and i've just begun, if i'm being honest. I don't know what to do, or how to not recreate it every epoch

desert oar
#

this is the same problem that someone else had

#

you dont use epochs with sklearn models
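A small sketch of why the loop in the script above doesn't "learn" across iterations: `LinearRegression` solves ordinary least squares in one shot, so refitting on the same data returns the same coefficients, and the printed scores change only because each iteration draws a different random split. The data here is hypothetical and synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # hypothetical features
y = X @ np.array([1.5, -2.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=200)

first = LinearRegression().fit(X, y)
second = LinearRegression().fit(X, y)  # refitting on the same data: identical answer

# The same least-squares solution via NumPy, with a column of ones for the intercept
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(first.coef_, first.intercept_)
print(coef)
```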

iron rampart
#

So i should just remove the line of code?

quartz crow
#

data science best tutorial ?

#

i am a beginner at the moment

sand spoke
#

Specializations on Coursera can help you

fervent bridge
#

@acoustic halo Just saw your message now. I just figured that HDF5 would be more convenient later down the road in case I have to move between libraries, better to learn it now than later

#

@lapis sequoia Did the link help you out

quartz crow
#

actually a lot of things bro: ml, deep learning, computer vision engineer and data scientist. which to choose?

fervent bridge
#

You have $10 to spare @quartz crow ?

quartz crow
#

no bro i am poor at the moment

fervent bridge
#

ML/DL, computer vision, data scientist all kind of fall under the same umbrella

#

with computer vision you utilize ML/DL

quartz crow
#

yeah i know

fervent bridge
#

just you are working with images

#

as a data scientist if someone gives you images as data you as a data scientist are supposed to extract valuable information from that data and make it work with ML/AI

#

Hmm if you had $10 I would have recommended a nice updated TensorFlow 2 course that covered all those topics

#

but take a look at Sentdex on youtube

#

he has a lot of tuts some may be outdated though

quartz crow
#

hmm ok . i need help for cv

fervent bridge
#

Take a look at this one its updated

tidal bough
#

coursera also has plenty of nice courses

#

Most of them are paid, but you can access any paid course in audit mode, which as far as I'm aware literally only disallows you from doing quizzes. All the materials, and most importantly programming assignments (including their automatic grading), are available.

turbid hearth
#

how can i fit my regression model better

flat quest
#

tbh there's enough free material out there that paid courses aren't entirely necessary.

The advanced stuff you can just learn through medium and reading through papers.

fervent bridge
#

@flat quest Yeaup, but I wouldn't recommend it if it wasn't a good course. Medium and reading through papers requires that the reader, most of the time, build their own structure in order to know what to learn next. I mean not many have an A-Z fully structured medium article 🙂 but yes, a lot of free material

#

Woot woot got HDF5 to work in appending mode 🙂 currently looping through my 40k images 🙂

flat quest
#

yeah you have to figure out how to get the information, and which one will be most useful for you

But its something everyone has to do eventually

#

nice! ;D. Guess it didn't take too long to learn?

fervent bridge
#

Nope, I mean internet was getting installed today, was out, took about 2 hours of research

#

🙂

flat quest
#

ah i see. Yeah its worth learning it, can use it with a number of different libs/packages.

fervent bridge
#

Yeah always better to get the tough part out early rather than later.

flat quest
#

^^

#

its also more efficient so makes working with data a lot nicer

fervent bridge
#

Yeah I see it's taking care of the reshaping in itself per batch. So I don't have to reshape through Numpy.

#

HDF5 mantains order right? @flat quest

past schooner
#

hey guys, I've got a work problem
I'm scraping media news articles and I need to get the article's text that's not a script or some other random sh*t you can find in html data. I'm using BeautifulSoup, for example soup.select('article p')

#

it is 1 am and I'm asking questions on a discord channel I just joined, so bear with me

bitter harbor
#

You know if their ToS's allow it?

past schooner
#

if you mean the source, yes, though good point to ask

flat quest
#

I think so yeah @fervent bridge. Not entirely sure on that one

blazing bridge
#

Hmm if you had $10 I would of recommended a nice update tensorflow 2 course that covered all those topics
@fervent bridge could you send the link to it

bitter harbor
#

I'm not too familiar with soup but would changing the .select class to children work?

outer fulcrum
#

Hey guys, do you know a good project example in data science where I can practice OOP?

fervent bridge
#

@blazing bridge Its a great course for High Level knowledge, I mean I consider it a must have. Covers a wide range of NNs in TensorFlow 2

#

Again its all High Level focused around TensorFlow 2 but great to work with.

bitter harbor
#

ty

blazing bridge
#

@fervent bridge thank you

fervent bridge
#

yeaup going to bed, but this course and NNFS got me on the right track, NNFS providing more lower-level knowledge and the Udemy course complementing it.

desert parcel
#
```python
import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader

inputs = np.array([[313, 1], [323, 1], [333, 1], [343, 1]], dtype='float32')
target = np.array([[14.76], [16.42], [18.08], [23.41]], dtype='float32')

inputs = torch.from_numpy(inputs)
target = torch.from_numpy(target)

model = nn.Linear(2, 1)
preds = model(inputs)

train_ds = TensorDataset(inputs, target)
train_dl = DataLoader(train_ds, batch_size=5, shuffle=True)

loss_fn = F.mse_loss
loss = loss_fn(preds, target)

opt = torch.optim.Adam(model.parameters())

def fit(num_epochs, model, loss_fn, opt):
    with torch.autograd.set_detect_anomaly(True):
        for epoch in range(num_epochs):
            for xb, yb in train_dl:
                pred = model(xb)
                loss = loss_fn(preds, yb)
                loss.backward(retain_graph=True)
                opt.step()
                opt.zero_grad()

            if (epoch+1) % 10 == 0:
                print(f"Epoch: {epoch+1}, loss: {loss.item()}")

fit(50, model, loss_fn, opt)
```

Output:
```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
```

#

I followed the hints

#

But have no idea what that means

#

So I guess I tried to follow the hint

desert oar
#

oh boy that's a fun one

#

what is loss_fn? @desert parcel

#

can you also show some of said backtrace?
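A likely culprit in the snippet above: inside the loop, the loss is computed from `preds` (the global forward pass made once before training) instead of `pred`. Every `backward()` then walks the same stale graph, and after `opt.step()` updates the weight in place, that graph's saved copy of the weight is out of date, which matches the `[2, 1]` tensor (the transposed weight) in the error. A minimal corrected sketch, assuming the same data and model:

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)  # reproducible initial weights

inputs = torch.from_numpy(np.array([[313, 1], [323, 1], [333, 1], [343, 1]], dtype="float32"))
target = torch.from_numpy(np.array([[14.76], [16.42], [18.08], [23.41]], dtype="float32"))

train_dl = DataLoader(TensorDataset(inputs, target), batch_size=5, shuffle=True)
model = nn.Linear(2, 1)
opt = torch.optim.Adam(model.parameters())


def fit(num_epochs, model, loss_fn, opt):
    history = []
    for epoch in range(num_epochs):
        for xb, yb in train_dl:
            pred = model(xb)          # fresh forward pass on this batch
            loss = loss_fn(pred, yb)  # use `pred`, not the stale global `preds`
            opt.zero_grad()
            loss.backward()           # no retain_graph: the graph is rebuilt every step
            opt.step()
        history.append(loss.item())
        if (epoch + 1) % 10 == 0:
            print(f"Epoch: {epoch+1}, loss: {loss.item():.4f}")
    return history


history = fit(50, model, F.mse_loss, opt)
```

With raw inputs around 300 and Adam's default learning rate, convergence will be slow; standardizing the first feature usually helps.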

lapis sequoia
#

I wish they had an ML practice section on CodeWars
just writing short code or "complete this code" for Numpy/R/PyTorch to practice when there is time to kill
is there any website like that?

uncut shadow
#

Don't think so. it would require lots of computational power cuz u can't just check if your code is the same as the correct one

#

(it assumes u mean deep learning)

#

But other algorithms might require lots of power too

#

So, (IMO) pretty unlikely

#

But ofc u can google and check it yourself

lapis sequoia
#

Kaggle is good

#

actually there is a lot of good content on there

#

but a lot of the quick practice content is from universities, and they lock access to everyone but the people in that course

desert oar
#

@lapis sequoia just download a dataset and play with it

#

e.g. from the UCI machine learning site

#

plenty of small clean easy to understand datasets out there

#

or simulate your own data, which is more advanced but potentially more educational

lapis sequoia
#

that's not how I learn

#

it's the same as giving a student a bunch of problems without teaching him how to solve them and saying "just try them"

desert oar
#

oh, you are asking for specific tasks to complete

#

not just data

#

there is actually an interesting lack of that, out there in the world

#

obviously students get those kinds of assignments in school

lapis sequoia
#

I believe Udacity and Coursera courses have such practice problems

desert oar
#

it could be an interesting niche

#

yeah

#

but nothing public like you're talking about

lapis sequoia
#

make a new website

#

you can charge for compute cost

desert oar
#

who even writes questions for those kinds of sites?

#

yeah it could obviously time out computations after a certain point, limit memory usage and processes/threads

#

charge for premium membership to answer more than N challenges per day

#

etc

#

seems like a legit product tbh

high pulsar
#

hey guys, I've got a work problem
I'm scraping media news articles and I need to get the article's text that's not a script or some other random sh*t you can find in html data. I'm using BeautifulSoup, for example soup.select('article p')
@banville#2284 You can use regex maybe?

odd yoke
#

Using regex to parse html is an absolutely terrible idea

desert oar
jolly briar
#

idk, for simple stuff regex can be OK. As a general rule it's better to use bs4 or something, but for something simple egrep and sed can be useful
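To make the "use a parser, not regex" point concrete, here is a minimal sketch that uses only the standard library's `html.parser` (so it runs without installing BeautifulSoup). It keeps the text of `<p>` tags inside `<article>` and skips anything in `<script>`/`<style>`. The class name `ArticleTextParser` and the sample HTML are made up for illustration; with bs4 the rough equivalent is to `decompose()` the script/style tags and then use `soup.select('article p')`.

```python
from html.parser import HTMLParser


class ArticleTextParser(HTMLParser):
    """Collect the text of <p> tags inside <article>, ignoring <script>/<style> contents."""

    def __init__(self):
        super().__init__()
        self.article_depth = 0   # > 0 while inside an <article> element
        self.skip_depth = 0      # > 0 while inside <script> or <style>
        self.in_p = False
        self._buf = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_depth += 1
        elif tag in ("script", "style"):
            self.skip_depth += 1
        elif tag == "p" and self.article_depth:
            self.in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth:
            self.article_depth -= 1
        elif tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "p" and self.in_p:
            text = "".join(self._buf).strip()
            if text:
                self.paragraphs.append(text)
            self.in_p = False

    def handle_data(self, data):
        # Only keep text that is inside an article paragraph and not inside a script/style
        if self.in_p and not self.skip_depth:
            self._buf.append(data)


html = """<nav><p>Menu</p></nav>
<article><h1>Title</h1><p>First <b>para</b>.</p>
<script>var x = 1;</script><p>Second.</p></article>"""

parser = ArticleTextParser()
parser.feed(html)
parser.close()
print(parser.paragraphs)  # ['First para.', 'Second.']
```

Real news pages are messier (nested wrappers, ads inside `<article>`), so treat this as a starting point rather than a finished scraper.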

grave frost
#

Can anyone explain why the Keras Embedding layer doesn't accept strings? It runs fine on numbers, but for strings it requires encoding them first, which kind of defeats the purpose of creating the embeddings. In the end, I got it integer-encoded (a -> 1; b -> 2), but I'm still curious. Isn't the whole purpose of embeddings to represent data in higher dimensions? Why didn't Keras implicitly understand it and encode it accordingly??

#

Is it just a missing feature in Keras, or does it make sense for embeddings not to accept strings and to have the dev encode them?

spark stag
#

@grave frost when one-hot encoding, you turn some value (in your case, characters) into an array of values containing a single truthy value (a one). If Keras tried to one-hot encode data as you feed it in, it wouldn't know how long to make each array. E.g. one-hot encoding a sequence like 0, 3, 1, 2, 0, 3 gives a 6x4 matrix (6 items, 4 classes), but if the model is fed the data bit by bit, it doesn't know how many classes there will be, so the arrays representing each class would have inconsistent shapes. (It could one-hot encode all the data at once, but then I think the input would need to contain every possible class so it knows how many there are, since that number shouldn't change.)
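For what it's worth, a Keras `Embedding` layer takes integer indices (exactly the a -> 1, b -> 2 kind of mapping), not one-hot vectors: it is a lookup table whose rows are the learned vectors. A small NumPy sketch (no Keras involved; the vocabulary, text and dimensions are made up) showing that indexing the table with integer ids gives the same result as multiplying a one-hot matrix by the weights, without ever building the one-hot matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Map each distinct string to an integer id (the step Keras leaves to you)
text = list("abcab")
vocab = {ch: i for i, ch in enumerate(sorted(set(text)))}  # {'a': 0, 'b': 1, 'c': 2}
ids = np.array([vocab[ch] for ch in text])                 # [0, 1, 2, 0, 1]

embed_dim = 4
weights = rng.normal(size=(len(vocab), embed_dim))  # one (learned) row per token id

# Embedding lookup: just index the table with the integer ids
looked_up = weights[ids]                 # shape (5, 4)

# Mathematically identical, but materializes a (5, 3) one-hot matrix first
one_hot = np.eye(len(vocab))[ids]
via_matmul = one_hot @ weights

print(np.allclose(looked_up, via_matmul))  # True
```

In newer TensorFlow versions, preprocessing layers such as `tf.keras.layers.StringLookup` can do the string-to-id step inside the model.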

fringe cove
#

hello, I have a simple dataframe with 2 columns, minutes and points. Just out of curiosity, I'd like to see what an sklearn model would predict for this dataset. How can I do that, please? I don't really know sklearn and would like to see. Thank you

grave frost
#

@spark stag What about compromising on the inconsistency of input dim by padding after doing analysis of the data pipeline OR more practical, having the user specify the custom dims of the batches if data is like that and handle it accordingly. Like if I had encoded something like [0. 0. 0. 1. .....] for each character, it would be overkill and too memory intensive (Like used by scikit-learn lib for 1-hot). I just think that the whole way it works could be improved manifold and is a bit too complex....

#

@fringe cove look up Linear Regression...

fringe cove
#

ok thank you

sinful dock
#

hey folks, does anyone know how to drop multiple columns from a dataframe using slices of index locations? I've been stuck on this for a couple of hours and can't find anything on Stack Overflow. I want to drop the columns with indexes 1:31 and all columns after column index 67: stage_metrics.drop(stage_metrics.iloc[:, [1:31, 67:], axis = 1, inplace = True)
Also tried this one: stage_metrics.drop(stage_metrics.columns[1:31, 67:], axis = 1, inplace = True)
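One way to do it, as a sketch: `np.r_` concatenates several slices into a single array of integer positions, and you can use that array to pick the column labels to drop. The 80-column frame below is a made-up stand-in for `stage_metrics`. An open-ended `67:` does not behave as intended inside `np.r_`, so the end is passed explicitly as `n_cols`.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for stage_metrics: 80 columns named c0..c79
stage_metrics = pd.DataFrame(np.zeros((3, 80)), columns=[f"c{i}" for i in range(80)])

n_cols = stage_metrics.shape[1]
# Positions 1..30 and 67..n_cols-1, as one integer array
drop_positions = np.r_[1:31, 67:n_cols]
stage_metrics = stage_metrics.drop(columns=stage_metrics.columns[drop_positions])

print(stage_metrics.shape)  # (3, 37)
```

Equivalently, you can select what you want to keep instead: `stage_metrics.iloc[:, np.r_[0, 31:67]]`.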

grave frost
#

Why don't you use a Pandas DataFrame? It has all these functionalities and would serve as a much better and arguably more feature-rich tool for any dataset...

sinful dock
#

yes, I'm on pandas, sorry trying to get the code to display in color

spark stag
#

@grave frost there probably are ways of doing it, but it's quite a lot of overhead to process the data as it's being fed in, especially since, unless it re-encodes the data every time it sees it each epoch, a new one-hot encoded copy of the data has to be made, so now you have 2 copies in memory

grave frost
#

Hmm... that makes sense

spark stag
#

idk, there may be an easy way for it to be implemented, but I wouldn't say, in my experience at least, it's too much effort to do manually, especially considering how much easier Keras makes setting up a neural network in general

grave frost
#

Of course, but I had to spend an hour or so just to make that one-hot encoding work (I don't like coding), and I couldn't use sklearn at all for my use case..

#

A dedicated lib for that would be so much better and smooth..

flat quest
#

@desert oar

A lot of beginners might use that kind of site. Don't think it'll be all that helpful in pursuing a DS career, but yeah, there's a good chance beginners might buy it

desert oar
#

@flat quest yep, about as useful as pursuing a programming career 😛

#

good for younger people i think

#

or real real novices who dont know enough to make toy problems for themselves

flat quest
#

xd
Maybe as an introduction. They're gonna have to learn to ask their own questions of the data and make their own problems. Not many ppl get to that stage :/.

But if they buy it -> it's a selling product 😉

grave frost
#

With all the buzz on YouTube and other resources, it seems hard to believe that any beginner would buy something like that. When I was starting out, I saw plenty of these paid resources, but the free ones don't have any practice problems. The real factor is that these paid resources usually just bundle the topics in the right order in one place, so people don't have to look up complex equations on Wiki or hunt YT for an explanation of k-means clustering. That said, few people do them for learning. Mostly they are for boosting credentials for newbies who think they matter....

fringe cove
#

i suppose these scores are y = ax+B ?

flat quest
#

you'd be surprised how many noobies do that
Yes, you can learn DS through reading online articles, books, YT, working on your own datasets, etc. But very few people are actually willing to go through all that. They'd rather complete a course or a set of problems that would certify them as job-ready.

Tho algorithmic competition sites are quite widely used. So some people might do it just for the fun of it

grave frost
#

Is the plot for the whole data, and is it correctly represented? Double-check all your code, because it doesn't look like a linear relationship, so a straight-line regression may not fit it well.

fringe cove
#

i think i messed up in my head

odd yoke
#

@fringe cove no, it's the R2 coeff

grave frost
#

@flat quest But to be honest, they really aren't much use even for getting "job-ready". I have read about the experiences of many data scientists who analyzed people who put MOOCs on their CV and whether they got the job or not. The numbers aren't very pretty....

odd yoke
#

it's an indicator that represents how well your model fits the data
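To make "how well your model fits the data" concrete: R² compares the model's squared errors with those of always predicting the mean, R² = 1 − SS_res / SS_tot, so 1 is a perfect fit and 0 is no better than the mean. A small NumPy sketch; the minutes/points numbers are invented purely for illustration:

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # variation the model fails to explain
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot


# Made-up minutes/points pairs
minutes = np.array([20, 25, 30, 35, 40], dtype=float)
points = np.array([10, 14, 15, 20, 22], dtype=float)

# Least-squares line: points ≈ a * minutes + b
a, b = np.polyfit(minutes, points, deg=1)
print(round(r_squared(points, a * minutes + b), 3))  # 0.97
```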

fringe cove
#

oh yeah ofc

#

haha

flat quest
#

oh they're not useful at all @grave frost

But noobies will always fall for it.

grave frost
#

Newbies will fall for anything, as long as it looks professional and is affiliated to a big company..

flat quest
#

^^

#

and thats how you make a selling product xd

fringe cove
#

so this is the data I have, just to start over, because I think I'm overcomplicating it. This is the NBA scoreboard data for one player's season

odd yoke
#

Saying they're not useful at all is definitely wrong. Sure, doing a MOOC doesn't mean you're fit for a job yet, but it doesn't mean you didn't learn anything doing said MOOC

fringe cove
#

if I know this player will play > 30 minutes in the next game

odd yoke
#

Also, saying "MOOC are bad just watch youtube" is laughable

fringe cove
#

is it possible to have a model from all these data ? and make a prediction for points ?

grave frost
#

@odd yoke Of course, but YT would still be free anyways...

odd yoke
#

Yes, that's true, but that's also the case for some coursera courses for example

#

(I agree the """"certification"""" they give you at the end is basically digital toilet paper)

flat quest
#

sure you can gain exp and knowledge from an MOOC
but an MOOC doesn't really mean all that much to a job recruiter

odd yoke
#

And when you see frauds like Siraj Raval on yt having such big communities, I find it hard to say that using yt is a better idea

fringe cove
#

his videos are cool ( as a complete newb)

odd yoke
#

He is the embodiment of the ML hype taken to the extreme

#

He doesn't know much about it, but pretends he does, because it draws people in

grave frost
#

@fringe cove Take my advice- stop watching him now

fringe cove
#

^^

#

i'm trying to get practical with data by making some scripts with nba data for player performance

#

i have mastered the scraping and now have a complete data for the season for every player

#

as u can see i can plot things etc and make some deductions with my brain

#

but i d love to see what a mathematical model could do

sinful dock
#

@flat quest What do you guys recommend using to learn instead of MOOCs if you are a beginner?

odd yoke
#

I think MOOCs are fine

grave frost
#

@odd yoke Did you see some of the videos that gave evidence that he:-

  1. Plagiarised a paper on Neural Qubits and claimed that he wrote it
  2. Copied tons of code from GitHub without citing the author, made minimal changes, and called it his own code
  3. Filed a YT copyright infringement claim against another YouTuber who unearthed all his shady activities
  4. Scammed newbies with a $200 course titled "How to earn money with ML", from which he made approx. $200,000
odd yoke
#

Yeah I did

#

I'd rather not talk about him or I'll get angry and spam this channel when someone is asking questions

grave frost
#

Ok 😆

fringe cove
#

yeah stop fight and just model my shit xd

flat quest
#

they're not bad for new people. Just don't list them on your resume or depend on them too much.

But at some point you should transition into just reading through other people's work and suggestions (articles, papers, books, other resources, yt maybe), rather than following a predefined course @fringe cove

odd yoke
#

Yeah, just like "regular" programming really, you can't stick to tutorials indefinitely

sinful dock
#

Agree, you have to put that knowledge into practice. For someone who doesn't have a CompSci education, I think it might help to redirect attention to another area

fringe cove
#

what should i look for when what i want to do is like feeding lots of data and make the model find the best fit for predictions ? i dont know if i make sense at all

#

but in my case

#

tonight orlando plays against indiana

#

i would like to know what a model would say about one player points for tonight game

odd yoke
#

You're defining machine learning here, it's kinda hard to give a useful answer with such a broad question

fringe cove
#

according to all previous data

#

yeah i want ml haha

#

only experience i had was mechanical arm movement training while in internship

#

but nothing else

odd yoke
#

There are all sorts of recurrent networks you could use if you want to preserve knowledge from past matches. I don't have experience in that area, so perhaps there are better-fitting algorithms, but maybe start looking there

fringe cove
#

but can u do this in like 2 lines of code with sk learn just to see a basic view ?
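Roughly, yes. A minimal scikit-learn sketch, assuming a DataFrame with `minutes` and `points` columns (the column names and numbers here are made up). Treat the result as a toy: minutes alone won't capture much about a particular game.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up game log: minutes played and points scored
df = pd.DataFrame({
    "minutes": [20, 25, 30, 35, 40],
    "points": [10, 14, 15, 20, 22],
})

model = LinearRegression()
model.fit(df[["minutes"]], df["points"])  # X must be 2-D (a DataFrame), y 1-D

# Predict points for a hypothetical 32-minute game
tonight = pd.DataFrame({"minutes": [32]})
print(model.predict(tonight))                      # one predicted value (about 17.4 here)
print(model.score(df[["minutes"]], df["points"]))  # R² on the training data
```

A fair check of any model would score it on games it hasn't seen (e.g. with `train_test_split`), not on the training data.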

hearty seal
#

hello guys

fringe cove
#

i realise it looks candid

#

but rn i'm just curious af

#

and have no knowledge at all in ml

hearty seal
#

i just wanted to know if the book I bought, "Data Science from Scratch", is a good book to start data science with

fringe cove
#

if u bought it, I think it's worth trying lol

hearty seal
#

its like 450 pages xd

#

i am trying to finish a python tutorial book first before i start

odd yoke
#

never heard of it

hearty seal
#

sorry if i killed your convo there

fringe cove
#

dw i'm just a noob like depending on them to tell me what to do so ^^

odd yoke
#

ask away

#

you mean you have the indices of an element, and want to find its column ?

fringe cove
#

if u know the cell, u can find the column name from the index, no?
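Yes. A sketch, assuming you have the element's integer positions (row `i`, column `j`): `df.index` and `df.columns` map positions back to labels, and `np.where` finds the positions of a given value. The frame and the searched value are made up:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"pts": [12, 30, 8], "reb": [5, 7, 11], "ast": [3, 9, 2]},
                  index=["g1", "g2", "g3"])

# Positions -> labels
i, j = 1, 2
print(df.index[i], df.columns[j])  # g2 ast
print(df.iat[i, j])                # 9

# Value -> positions -> labels
rows, cols = np.where(df.to_numpy() == 11)
matches = [(df.index[r], df.columns[c]) for r, c in zip(rows, cols)]
print(matches)                     # [('g3', 'reb')]
```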

arctic cliff
odd yoke
#

.duplicated

balmy ice
#

Hi i am a student doing web development with django.
I am thinking to start moving toward AI and deep learning!
Can any of you show me the right path to start?

tidal bough
#

It uses Octave for them, mind, not Python.

lapis sequoia
#

Hi all! Is this a good place to ask a question about Pandas?

tidal bough
#

Yeah, pretty good.

lapis sequoia
#

Stupid question incoming. How can I append a row to a dataframe? I've been using append with no luck. I have the row in a list.

desert oar
#

@lapis sequoia show us sample data & the code you're running, which reproduces the error or problem you have

lapis sequoia
#

right away

#
```py
def my_fun(i):
    return [i**2, i**3, i**4]

df = pd.DataFrame(columns=['i','a','b','c'])


for i in range(100):
    [a,b,c] = my_fun(i)
    df.append([i, a,b,c])
    
display(df)
```
desert oar
#

ah

#

DataFrame.append doesn't work like list.append

#

it doesn't modify the dataframe, it creates a new one with the row appended

#
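```py
import pandas as pd

def my_fun(i):
    return [i**2, i**3, i**4]

df = pd.DataFrame(columns=['i','a','b','c'])

for i in range(100):
    [a,b,c] = my_fun(i)
    # DataFrame.append (removed in pandas 2.0) and pd.concat both return a *new* frame,
    # so the result must be assigned back; the row also needs the column names
    df = pd.concat([df, pd.DataFrame([[i, a, b, c]], columns=df.columns)], ignore_index=True)

display(df)
```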
#

however i don't really recommend constructing dataframes this way. it's quite inefficient

#

it's much faster if you do it like this:

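```py
import pandas as pd

def my_fun(i):
    return [i**2, i**3, i**4]

colnames = ['i','a','b','c']

data = []
for i in range(100):
    a, b, c = my_fun(i)
    record = {'i': i, 'a': a, 'b': b, 'c': c}
    data.append(record)

# Build the frame once at the end instead of appending row by row
df = pd.DataFrame(data, columns=colnames)
```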
lapis sequoia
#

make a list of dicts

#

and then a dataframe out of the list of dicts?

desert oar
#

or maybe better still:

```py
import pandas as pd

def my_fun(i):
    return [i**2, i**3, i**4]

data = []
for i in range(100):
    a, b, c = my_fun(i)
    data.append([i, a, b, c])

df = pd.DataFrame(data, columns=['i','a','b','c'])
```
#

yeah list-of-dicts is one way

#

list-of-lists is another

lapis sequoia
#

aha

#

so don't modify the dataframe

desert oar
#

yeah adding rows to a dataframe is really slow

lapis sequoia
#

only call stuff from it

desert oar
#

adding columns isn't that bad

#

actually adding columns is pretty efficient

#

but adding rows is slow and you should avoid it if possible
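Since adding columns is cheap, the toy example above doesn't even need a Python loop: each column can be computed for all rows at once. A sketch using NumPy for the `i` values:

```python
import numpy as np
import pandas as pd

i = np.arange(100)
df = pd.DataFrame({"i": i})

# Each assignment adds a whole column in one vectorized operation
df["a"] = df["i"] ** 2
df["b"] = df["i"] ** 3
df["c"] = df["i"] ** 4

print(df.tail(3))
```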

lapis sequoia
#

ok

#

thanks a lot

#

one more thing

#

where can I learn stuff like this about pandas?

desert oar
#

reading the docs and trying things

lapis sequoia
#

I am a Matlab refugee with 0 pandas experience

desert oar
#

there are a lot of docs pages, not all of them are well-written or easy to understand

#

ah

#

well, you should feel comfortable with numpy

#

which is basically modeled after matlab

#

pandas is more like R

#

or more like Excel if you've never used R

lapis sequoia
#

ok

#

I tried with the docs

#

not easy

desert oar
#

yeah, you have to suffer through it

#

one of my many "todos" is to contribute better user guide content for pandas

quartz crow
#

for a machine learning engineer, what skills are necessary?

lapis sequoia
#

@desert oar Thanks a lot for your time! Have a nice day/evening!

desert oar
#

youre welcome

serene scaffold
#

Has anyone here used async for their data science stuff?

desert oar
#

only for webscraping or otherwise hitting APIs. not much value in it otherwise

serene scaffold
#

I've used joblib to parallelize stuff but that's not the same thing.

desert oar
#

async =/= parallel

serene scaffold
#

right

desert oar
#

stick with joblib

serene scaffold
#

can you parallel with async?

#

or is it not even meant for that

desert oar
#

yeah its not meant for that

serene scaffold
#

then what's async for for?

desert oar
#

asyncio can hand blocking work off to a thread or process pool w/ loop.run_in_executor

#

thats a complicated question

#

you know how __iter__ works?

serene scaffold
#

ye

odd yoke
#

concurrency != parallelism

desert oar
#

now imagine you await before you yield

#

that's what async for is
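A minimal sketch of that idea: an async generator `await`s something (here `asyncio.sleep`, standing in for waiting on the network) before each `yield`, and `async for` consumes it. The names `ticker` and `main` are just for illustration:

```python
import asyncio


async def ticker(n, delay=0.01):
    """Async generator: awaits (e.g. on I/O) before yielding each value."""
    for i in range(n):
        await asyncio.sleep(delay)  # stand-in for waiting on a response
        yield i


async def main():
    results = []
    async for value in ticker(3):   # each step awaits the generator's __anext__()
        results.append(value)
    return results


print(asyncio.run(main()))  # [0, 1, 2]
```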

#

but yeah, async/await isn't even a good programming model for computational parallelism

#

let alone a good way to implement it in python

#

stick with joblib or concurrent.futures.ProcessPoolExecutor or multiprocessing.Pool

#

or dask or ray et al

odd yoke
#

oh you said what i posted above already

desert oar
#

yeah

#

well

#

not exactly!

#

😛

odd yoke
#

i went to take my food and i saw this convo, should have scrolled up if i wanted to be useful

desert oar
#

lol it happens

#

anyway async/await can make life easier if you're hitting APIs and you want the freedom to ctrl+c without doing a bunch of extra work
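Where async does shine is waiting on many I/O calls at once. A sketch with a fake "API call" (`fetch` is hypothetical and just sleeps): the three 0.1 s waits overlap, so the whole thing takes roughly 0.1 s rather than 0.3 s. This is concurrency, not parallelism; CPU-bound work would gain nothing from it.

```python
import asyncio
import time


async def fetch(item_id):
    """Hypothetical API call: sleeping stands in for network latency."""
    await asyncio.sleep(0.1)
    return {"id": item_id, "status": "ok"}


async def fetch_all(ids):
    # gather runs the coroutines concurrently; results come back in input order
    return await asyncio.gather(*(fetch(i) for i in ids))


start = time.perf_counter()
results = asyncio.run(fetch_all([1, 2, 3]))
elapsed = time.perf_counter() - start
print([r["id"] for r in results], f"{elapsed:.2f}s")
```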

arctic cliff
serene scaffold
#

I think replace will only do the first occurrence

#
while currency_symbol in my_str:
    my_str = my_str.replace(currency_symbol, '')
#

could work
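For what it's worth, Python's `str.replace` already replaces every occurrence unless you pass the optional count argument, so the `while` loop shouldn't be needed. A quick check on a made-up string:

```python
price = "₹ 1,000 ₹"

print(price.replace("₹", ""))     # every occurrence removed
print(price.replace("₹", "", 1))  # count=1: only the first one
```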

arctic cliff
#

There's only one symbol in every price

serene scaffold
#

😮

#

but aren't there four here?

arctic cliff
#

After the sum

#

Because they are strings/objects

#

I assume

#

wait

#

Price object

#

Yeah

serene scaffold
#

I have to head out but maybe rock salt lamp can help.

desert oar
#

you have a lot of issues here

#
  1. what are you actually trying to do
  2. what does the source data look like
arctic cliff
#
  1. I'm trying to sum every price that has a specific same year date so I can get the earnings of every year
desert oar
#

so the data is like pd.Series(['Free', 'Free', '₹ 1,000', '₹ 530,000']) etc.

#

right?

arctic cliff
#

Yeah

desert oar
#

ok

#

well those are strings

#

python has no idea that the text contains numbers

#

so you can't just add them and expect them to be added like numbers

#

python doesn't know that "Free" means ₹0

#

so you need to parse the strings, to extract numbers

#

i can give you a solution, but you've been in this server long enough to start developing your own solutions

#

once you know the basics, "how do i do X" is a matter of putting together what you already know. maybe 80-90% of the time.

arctic cliff
#

Right ..

#

Ok wait

twilit badger
arctic cliff
desert oar
#

ok

#

nice try

#

it can be done more simply

arctic cliff
#

How ?

desert oar
#

just clean up the prices first

#

make a new column of "price" that contains numbers

#

you can use regex to remove all the non-numerical characters:

df['Price_num'] = df['Price'].str.replace(r'[,₹ ]', '', regex=True).map(float)
arctic cliff
#

Oh ..

desert oar
#
df['Price_num'] = (
    df['Price']
    .str.replace(r'[,₹ ]', '', regex=True)
    .mask(lambda x: x == 'Free', 0.0)
    .map(float))

forgot to handle the "Free" case

#

now you can do whatever you need to do with df['Price_num']
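Putting it together for the original goal (earnings per year), here is a sketch that assumes the frame has a date column called `Date` (a made-up name, as is the sample data). Note the `regex=True`: since pandas 2.0, `Series.str.replace` matches literally by default, so without it the character class would not be treated as a regex.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-03-01", "2019-07-15", "2020-01-10", "2020-05-20"]),
    "Price": ["Free", "₹ 1,000", "₹ 530,000", "Free"],
})

df["Price_num"] = (
    df["Price"]
    .str.replace(r"[,₹ ]", "", regex=True)  # strip commas, rupee sign and spaces
    .replace("Free", "0")                   # "Free" means ₹0
    .astype(float)
)

# Total of the prices for each calendar year
earnings_per_year = df.groupby(df["Date"].dt.year)["Price_num"].sum()
print(earnings_per_year)
```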

odd yoke
#

str.replace doesn't use regex btw

desert oar
#

in pandas it does (by default before pandas 2.0; newer versions need regex=True)

#

in regular python it doesnt

odd yoke
#

oh right

desert oar
#

kind of poor choice imo

arctic cliff
#

mask ?

desert oar
#

should have made it not regex, then given regex=True or something as a parameter

#

mask is a bit of a weird function

arctic cliff
#

Is it a python thing ? Or is it related to Pandas ?

desert oar
#

pandas

#

pd.Series.mask

#

there is also pd.Series.where which does almost the same thing, but "reverse"

#

the first argument pd.Series.mask is a function that should return a Series of bool (True/False)

#

ah you know what

#

do this instead, easier to understand

#
df['Price_num'] = (
    df['Price']
    .str.replace(r'[,₹ ]', '', regex=True)
    .replace('Free', 0.0)
    .map(float))
odd yoke
#

<@&267629731250176001>

arctic cliff
#

Here it treats the Price values one by one?
Because I had to loop to make changes to every one of them

desert oar
#

yes, pandas methods let you make changes without looping

#

they can be significantly faster than looping

#

and a lot less code

arctic cliff
#

I see ..

#

I will start making columns instead from now on

#

By the way

#

I know it's too early to ask but I'm just so excited
When should I start learning AI things?

desert oar
#

you can start learning concepts now, or at least some math

arctic cliff
#

I also know I'm not ready yet
I just wanna know when will I be

desert oar
#

it's good to learn programming concurrently with the math and the concepts

#

you start putting ideas together

arctic cliff
#

Oh?
Do you suggest a specific source ?

desert oar
#

for ai? no, i have no idea

arctic cliff
#

For the math of AI

desert oar
#

what's your academic level and background?

arctic cliff
#

Highschool

desert oar
#

start learning pre-calculus and calculus. logarithms, exponential functions, quadratic functions, derivatives

#

maybe you can start looking at intro probability & statistics

#

and very simple linear algebra, concepts like understanding what vectors and matrices are

#

once you know a little bit on each of those areas, you will start to learn important terminology and concepts

#

the more you learn, the easier it will be to learn more

arctic cliff
#

I see !

mellow spruce
#

Does anyone know how to fix the y axis so that it doesn't change when more traces are added to a waterfall/scatter plot chart?

#

Chart looks like this with one trace but the moment i add more, the y axis changes and it distorts the graph

#

Each trace index is following a list, however not every trace has all the elements of the list

desert oar
#

@mellow spruce you can use ax.autoscale(False) to disable changing the axes

mellow spruce
#

@desert oar is that set on the trace or on the fig.update_layout()?

desert oar
#

neither

#

show your plotting code

mellow spruce
#

   

```py
    sorterIndex = dict(zip(routing_list, range(len(routing_list))))
    group['Route'] = group['ope_no'].map(sorterIndex)
    group.sort_values(['Route'], ascending=True, inplace=True)
    group.drop(columns='Route', inplace=True)  # positional axis argument was removed in pandas 2.0
    fig.add_trace(go.Scatter(
        name=k,
        mode='lines+markers',
        y=group['ope_no'],
        x=group['processstart'],
    ))

fig.update_layout(title="Title",
                  yaxis={'autorange': "reversed"})
fig.show()
```
#

the first part is the order that I want each trace to follow

desert oar
#

ah

#

what is go

#

wait

#

is this not matplotlib?

mellow spruce
#

no, it's plotly

desert oar
#

oh

#

i have absolutely no idea

#

in the future, clarify what library you're using

#

i assumed it was matplotlib, i should have asked

mellow spruce
#

Sorry, my bad. Thanks anyway

willow parcel
#

my b if this is the wrong channel but heres a simplified part of my code

#

```py
def foo(bar):
    bar = bar + 1
    return bar

play = True
while play:
    baz = foo(0)
    print(baz)
```

#

how do i get it so that it prints numbers increasing instead of just 1's

odd yoke
#

you can ask in a help channel
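(For the record, the usual fix is to feed the previous result back in instead of restarting from 0 every time. A bounded sketch, using `range` instead of the endless `while play:` so it terminates:)

```python
def foo(bar):
    return bar + 1


baz = 0
seen = []
for _ in range(5):   # the original loops forever with `while play:`
    baz = foo(baz)   # pass the previous result back in
    seen.append(baz)
print(seen)          # [1, 2, 3, 4, 5]
```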

lapis sequoia
#

Hi. Does anyone know a website where I can get info, data and statistics, like a COVID-19 repository? I would like to get data for analysis.

fervent bridge
#

@lapis sequoia were you able to load the HDF5 file into TensorFlow?

gray scaffold
lapis sequoia
#

@gray scaffold thank you

gray scaffold
#

no problem, enjoy

desert parcel
#

@desert oar
```
Warning: Error detected in AddmmBackward. No forward pass information available. Enable detect anomaly during forward pass for more information. (print_stack at ..\torch\csrc\autograd\python_anomaly_mode.cpp:42)
Traceback (most recent call last):
  File "d:/python/ML/Corrosion test/test.py", line 37, in <module>
    fit(50, model, loss_fn, opt)
  File "d:/python/ML/Corrosion test/test.py", line 30, in fit
    loss.backward(retain_graph=True)
  File "D:\python\ML\lib\site-packages\torch\tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "D:\python\ML\lib\site-packages\torch\autograd\__init__.py", line 100, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
```
Here is the entire error output.

#

There are versions where I tried to enable the anomaly detection

#

but have no idea how to

#

I did search it online but for some reason I can't find it

#

changing the optimizer to other options didn't work either

#

here is the code

desert parcel
#

I got it working

#

I copied another piece of sample code and that worked for some reason
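A likely explanation, reading the loop posted earlier: it assigns `pred = model(xb)` but computes the loss from `preds`, presumably a variable left over from earlier in the script. So every step backpropagates through the same old graph. Without `retain_graph=True` the second backward fails; with it, `opt.step()` has already updated the weights in place, which produces the "modified by an inplace operation" error. A corrected sketch on made-up linear data (the data, model and learning rate are assumptions, not the original corrosion setup):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Made-up data: 2 input features -> 1 target, exactly y = 3*x0 - 2*x1 + 1
X = torch.randn(64, 2)
y = 3 * X[:, :1] - 2 * X[:, 1:] + 1

train_dl = DataLoader(TensorDataset(X, y), batch_size=8, shuffle=True)
model = nn.Linear(2, 1)
loss_fn = nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.05)


def fit(num_epochs, model, loss_fn, opt):
    for epoch in range(num_epochs):
        for xb, yb in train_dl:
            preds = model(xb)          # fresh forward pass builds a fresh graph
            loss = loss_fn(preds, yb)  # use the predictions just computed
            loss.backward()            # no retain_graph needed
            opt.step()
            opt.zero_grad()
        if (epoch + 1) % 10 == 0:
            print(f"Epoch: {epoch + 1}, loss: {loss.item():.6f}")
    return loss.item()


final_loss = fit(50, model, loss_fn, opt)
```

If a similar error comes up again, `torch.autograd.set_detect_anomaly(True)` (or the `torch.autograd.detect_anomaly()` context manager) enables the anomaly detection the warning mentions; it slows training, so use it only while debugging.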

willow karma
#

After googling/stack overflowing/githubing for a while, I believe Facebook's Prophet modeling package does not include a feature importance method. Does anyone know a workaround so I can see which predictors are most important in forecasting the target?

arctic cliff
#

I'm trying to plot the earnings increasing but It's not working, What am I doing wrong ?

willow karma
#

@arctic cliff you shouldnt iterate on the plot method. If you just feed that method a dataframe with the time data as the index and the y values as your single column you'll get the result you want.

desert parcel
#

Could someone compare the code? The one on the left works but the one on the right doesn't. I tried to find the difference but so far have seen none.

arctic cliff
#

@willow karma Oh! Thanks a bunch

willow karma
#

@arctic cliff if you have a dataframe df with a date index and one column 'y_value'.. you would just need to run df.plot()

arctic cliff
#

You made my day

#

Can't I change the x and y ?

#

Ah nvm

#

Ignore me

desert parcel
#

how many other methods are there to improve predictions

#

other than the number of iterations and messing around with the learning rate

#
Predictions:
tensor([[ 5.7500,  7.2500,  8.0000],
        [ 5.7500,  7.2500,  8.0000],
        [ 5.7500,  7.2500,  8.0000],
        [15.0000, 14.0000, 15.0000],
        [ 5.7500,  7.2500,  8.0000]], grad_fn=<AddmmBackward>)
----------------------------------------
Originals:
tensor([[ 5.,  6.,  6.],
        [ 5.,  5.,  6.],
        [ 7.,  8., 10.],
        [15., 14., 15.],
        [ 6., 10., 10.]])
#

Because right now it's not the most precise

#

some are exactly on point

#

not all of them

#

Ohh maybe I can add more like stuff in the inputs

#

yes adding more data in the inputs worked

#

I added enough stuff until it became very precise

desert oar
#

the best way to improve prediction is to use input data that's strongly related to your target, and to represent that input data in such a way that the relationship is easy to learn

lapis sequoia
#

@fervent bridge hey sorry missed your message yesterday, been busy. Nah haven't been able to, might try as an npz or npy file

fervent bridge
#

Hmm did you want to go over it ? @lapis sequoia I am almost done getting it into TS

lapis sequoia
#

For sure

#

Will you be free in like half an hour

#

I'm just doing smth at the moment

fervent bridge
#

Yeaup

desert parcel
#

@desert oar yeah that makes sense. I had like 5 extra rows of input data that's why the predictions were so close.

fervent bridge
#

@lapis sequoia ready?

lapis sequoia
#

Apologies, gimme a bit more

arctic cliff
desert parcel
#

I'm not sure how to fix it

#

there is an assertion error

#

assert all(tensors[0].size(0) == tensor.size(0) for tensor in tensors)

fervent bridge
#

@arctic cliff

desert parcel
#

I put the error into Google but everything is in Chinese
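That assertion comes from `TensorDataset`: every tensor passed to it must have the same size along dimension 0, i.e. the same number of samples. A sketch with made-up shapes showing the failure and a quick check:

```python
import torch
from torch.utils.data import TensorDataset

inputs = torch.randn(5, 3)   # 5 samples, 3 features (made-up shapes)
targets = torch.randn(4, 1)  # only 4 targets: one sample short

print(inputs.shape[0], targets.shape[0])  # compare the sample counts first: 5 4

mismatch_raised = False
try:
    TensorDataset(inputs, targets)
except AssertionError:
    mismatch_raised = True   # dim-0 sizes differ, so the assert fails
print(mismatch_raised)       # True

ds = TensorDataset(inputs[:4], targets)  # equal first dimensions are fine
print(len(ds))                           # 4
```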

arctic cliff
#

Thanks !

lapis sequoia
#

@fervent bridge yo

fervent bridge
#

@lapis sequoia ready?, been asking some questions trying to move along, shall we continue through DM?

lapis sequoia
#

yeah no prob

desert oar
#

@arctic cliff use Series.idxmax
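A quick illustration of `Series.idxmax`, which returns the index label of the largest value rather than the value itself (the yearly totals are made up):

```python
import pandas as pd

earnings = pd.Series([1000.0, 530000.0, 42000.0], index=[2018, 2019, 2020])
print(earnings.idxmax())  # 2019: the label of the maximum
print(earnings.max())     # 530000.0: the maximum itself
```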

arctic cliff
#

That's what I was looking for !

glacial comet
#

Hi All, new to the group. Is there a Data Science FAQ area?

granite light
#

I have a basic question for Pandas

#

I am new to it. Let's say I only want to consider values after a certain index. My index is integers. I have 50 rows and I only want to use data from row 26 onwards in my new calculation. There are three columns which are not in ascending or descending order

desert oar
#

vaex is on my perpetual todo list

#

@granite light data.loc[26:] if you want to use the index value 26, or data.iloc[26:] if you want to use the row number 26
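The difference only shows up when the index isn't simply 0..n-1. A small sketch with a made-up sorted integer index; note also that `.loc` slicing includes the end label while `.iloc` excludes the end position:

```python
import pandas as pd

data = pd.DataFrame({"column1": [5, 8, 13, 21, 34]}, index=[10, 20, 26, 30, 40])

by_label = data.loc[26:]     # rows whose label is 26 or later (index is sorted)
by_position = data.iloc[2:]  # rows from position 2 onward, whatever their labels
print(by_label.index.tolist(), by_position.index.tolist())  # [26, 30, 40] [26, 30, 40]

print(data.loc[20:30].index.tolist())  # [20, 26, 30]: label slice includes the end
print(data.iloc[1:3].index.tolist())   # [20, 26]: position slice excludes the end
```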

granite light
#

@desert oar thank you. Now how can I use it in a condition? Let us say I am doing data[data.column1 > something & index> 26]

#

I am not sure how to write that condition to ensure that first condition is only checked on the rows 26:50

desert oar
#

you can save the subset of the data to a variable first

#

then apply your other conditions

granite light
#

True

#

But I am kinda trying to learn, so would like to know if it can be done without copying

desert oar
#
data_sub = data.iloc[26:]
data_final = data_sub.loc[data_sub['column1'] > something]
#

the slice itself isn't copying data

granite light
#

Also what about computation time in the two approaches?

desert oar
#

in fact if you try to modify data_sub, pandas will give you a warning

granite light
#

@desert oar Okay. I hadn't considered this

desert oar
#

.loc and .iloc try to avoid making copies when possible

#

in most cases they return different "views" to the same underlying data

#

the pandas documentation avoids saying that they never make copies

#

but i cant think of a time i used it where it did make a copy

granite light
#

Thanks a lot, that makes it a lot easier

desert oar
#

it's a nice feature
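For completeness, the one-expression version of the earlier condition also works, as long as each comparison is wrapped in parentheses: `&` binds more tightly than `>`, so `data.column1 > something & index > 26` is parsed as something else entirely. A sketch with made-up data, where `threshold` stands in for `something` and `>= 26` matches `iloc[26:]` on a default index:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"column1": np.arange(50) % 7})  # made-up values, default 0..49 index
threshold = 3                                         # stands in for `something`

# Both conditions at once; the parentheses are required
result = data[(data["column1"] > threshold) & (data.index >= 26)]

# Same rows as the two-step version (slice first, then filter)
data_sub = data.iloc[26:]
two_step = data_sub.loc[data_sub["column1"] > threshold]
print(result.equals(two_step))  # True
```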

lone nebula
#

and how can i merge rows with the same year column into a single row without breaking this

misty lake
#

I have a weird question: can anyone suggest techniques/approaches for something like how Google gives a value for a parameter you ask about?

For example:
parameter - value

  • "Temperature Today" - 20 F
  • "Rating Dark Night" - 4.4

Basically, I've run into a problem where I need to map parameters to their values across a large set of Word documents. The Word documents have complex table structures, paragraphs and essays.

The parameters and values are in the documents, but not structured.

I'm looking for some help in cracking this.

With keyword search and NER models I was able to get the parameters, but I haven't found a way to pull the relevant value of each parameter from the Word documents.

#

Please tag me if someone could help

desert parcel
#

I have a basic question

#

np.random.permutation(n)

#

This just randomly chooses a few values from n right?

velvet thorn
#

uh

#

no, it randomly permutes (shuffles) np.arange(n)

desert parcel
#

so it just changes the order of the thing?

velvet thorn
#

strictly speaking

#

it makes a copy with the order changed randomly, yes

spark stag
#

if you pass it an array-like (e.g. a list or tuple) it will shuffle those values instead of np.arange(n):
```py
>>> np.random.permutation((3, 5, 4, 2, 3))
array([3, 5, 4, 3, 2])
>>> np.random.permutation((3, 5, 4, 2, 3))
array([5, 2, 3, 3, 4])
```
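A small sketch of the two call forms, using NumPy's seeded Generator API (`np.random.default_rng`) instead of the legacy `np.random` functions so the result is reproducible; the seed 42 is arbitrary. Both forms return a shuffled copy and leave the input untouched; `rng.shuffle(arr)` is the in-place counterpart.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded Generator, so the shuffle is reproducible

idx = rng.permutation(5)             # int -> shuffled copy of np.arange(5)
values = np.array([3, 5, 4, 2, 3])
shuffled = rng.permutation(values)   # array -> shuffled copy; `values` is not modified

print(idx)
print(values, shuffled)
```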

desert parcel
#

ah gotcha

#

also

#

this part wasn't explained clearly in the yt tut

#
```py
import numpy as np

def split_indices(n, eval):
    eval = int(eval*n)
    index = np.random.permutation(n)
    return index[eval:], index[:eval]

train_index, eval_index = split_indices(len(dataset), eval=0.2)
```
velvet thorn
#

@spark stag it will make a copy with shuffled values

desert parcel
#

So here it shuffles the array then takes 20% of it and puts it inside train_index and eval_index?

velvet thorn
#

uh

desert parcel
#

Or takes 20% then shuffles that

#

that being the 20%

velvet thorn
#

it shuffles an array that represents the index

desert parcel
#

uhuh

velvet thorn
#

then it takes the first 20% (the eval fraction) of the shuffled array containing random indices

#

and uses that to form the evaluation set

spark stag
#

@velvet thorn ah ye that's what i meant, it will use those values when creating the array but i guess i wasn't really clear on that

velvet thorn
#

and the rest for the training set

desert parcel
#

so it shuffles it once, takes 20%, then shuffles it again?

velvet thorn
#

although I don't like that code

#

no, it shuffles once only

#

eval as a parameter name is bad: it shadows Python's built-in eval()

desert parcel
#

So it shuffles once, takes 20% and put it into the variables?

#

Lol I didn't know what else to name it

#

the parameter name was set by the yt tut

velvet thorn
#

@desert parcel yes.

desert parcel
#

was something like

#

n_val

velvet thorn
#

well I'm not sure if you have the same understanding as me

desert parcel
#

Oh yeah lol

velvet thorn
#

so just to be clear

#

shuffle a sequential array (0, 1, 2...n - 1) representing indices

#

use the last x% for the train set and the first (1 - x)% for the evaluation set
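For reference, a sketch of the same split with the parameter renamed to `eval_frac` (a hypothetical name, chosen so it no longer shadows the built-in `eval`) and an optional seed for reproducibility. The shuffle happens once; the first 20% of the shuffled indices become the evaluation set and the remaining 80% the training set.

```python
import numpy as np

def split_indices(n, eval_frac, seed=None):
    """Shuffle 0..n-1 once; first eval_frac of it -> eval set, rest -> train set."""
    n_eval = int(eval_frac * n)
    index = np.random.default_rng(seed).permutation(n)
    return index[n_eval:], index[:n_eval]

train_index, eval_index = split_indices(10, eval_frac=0.2, seed=0)
print(train_index, eval_index)  # 8 train indices, 2 eval indices, no overlap
```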

autumn veldt
#

Excuse me guys.
so, I'm trying to run my model with 5 different random_state values, and I want to save the accuracy for each random_state into a CSV file. Do you guys know how to do it?
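One way to do this, sketched under assumptions: the iris dataset and `DecisionTreeClassifier` stand in for the real data and model, and the default path `accuracy_by_random_state.csv` is a placeholder. The idea is to collect one row per random_state in a list, build a DataFrame at the end, and write it once with `to_csv`.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def accuracy_per_seed(seeds, csv_path="accuracy_by_random_state.csv"):
    X, y = load_iris(return_X_y=True)  # stand-in dataset
    rows = []
    for seed in seeds:
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=seed
        )
        model = DecisionTreeClassifier(random_state=seed).fit(X_train, y_train)
        acc = accuracy_score(y_test, model.predict(X_test))
        rows.append({"random_state": seed, "accuracy": acc})
    results = pd.DataFrame(rows)
    results.to_csv(csv_path, index=False)  # one row per random_state
    return results
```

Calling `accuracy_per_seed(range(5))` writes a two-column CSV (`random_state`, `accuracy`) with one row per seed.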

outer fulcrum
#

What kind of package do you use to generate a PDF report of your data analysis?

warm moth
#

Hi! I am pretty new to data science. I was wondering how you would do, say, regression on real-time data? Would you have to train the model on the whole dataset again every time new data is available? Would you be able to pickle the model and then just call model.fit(x, y) over and over again for each new batch of data?

I am working on a little project which deals with real-time weather data, and I want to predict the weather. I want to implement it on my website and maybe a Discord bot.

tidal bough
#

What you want is called "online learning" - when new data becomes available in batches and the algorithm should ideally be able to quickly update on the new data without being refit on the entire updated dataset.
https://en.wikipedia.org/wiki/Online_machine_learning
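In scikit-learn, calling `fit()` again on most estimators retrains from scratch on whatever you pass in, so it does not accumulate what was learned before. Estimators that support online learning expose `partial_fit()` instead. A minimal sketch with `SGDRegressor`, assuming a hypothetical stream of batches drawn from y = 3x + 0.5 plus noise; the pickle round trip mimics saving the model between requests (e.g. from a bot or web API) and continuing to update it.

```python
import pickle

import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
model = SGDRegressor(learning_rate="constant", eta0=0.05, random_state=0)

def new_batch(size=32):
    """Hypothetical data stream: y = 3x + 0.5 plus a little noise."""
    X = rng.uniform(-1, 1, size=(size, 1))
    y = 3 * X[:, 0] + 0.5 + rng.normal(scale=0.1, size=size)
    return X, y

for _ in range(200):
    X, y = new_batch()
    model.partial_fit(X, y)  # update on this batch only, no refit on old data

# Persist the model, load it back later, and keep updating it
restored = pickle.loads(pickle.dumps(model))
restored.partial_fit(*new_batch())
print(restored.coef_, restored.intercept_)
```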

warm moth
#

@tidal bough Thank you for the answer. I will check it out. Any idea how I could go about implementing it in a Discord bot or a website? Like, should I make an API which the bot can access?

zenith saffron
#

What should I pass to kmeans.fit() when my input is a PDF file that has already been preprocessed and vectorized with TF-IDF? Or is that not the right function for this method? I've already looked on Stack Overflow and other websites but couldn't find the answer. My program does k-means clustering on PDF files, and I want to add the elbow method to it.
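`kmeans.fit()` expects the feature matrix, i.e. the output of `TfidfVectorizer`, not the PDF file itself. A sketch of the elbow method under assumptions: the PDFs have already been turned into one text string per document (the strings below are hypothetical placeholders), and k is tried from 1 to 5. Plotting `inertias` against k (e.g. with matplotlib) and looking for the bend gives the elbow.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-ins for text already extracted from the PDFs (one string per document)
documents = [
    "neural networks learn weights with gradient descent",
    "gradient descent minimizes the training loss",
    "pandas dataframes support groupby and merge",
    "merge and join combine pandas dataframes",
    "kmeans clusters points around centroids",
    "the elbow method picks the number of clusters",
]

X = TfidfVectorizer(stop_words="english").fit_transform(documents)  # sparse, one row per doc

ks = range(1, 6)
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(X)                     # fit takes the TF-IDF matrix, not the PDF file
    inertias.append(km.inertia_)  # within-cluster sum of squared distances

for k, inertia in zip(ks, inertias):
    print(k, round(inertia, 3))
```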

tidal bough
#

@warm moth Might be a good idea if you will need to access it from different places (your website and the bot).

warm moth
#

Alrighty! Thanks for the Answer.

still delta
#

What are the best statistics books you've seen at university?

ornate dagger
#

Would preprocessing in Python (well, any language, just using Python as an example) mean simply looking at the source code and copying ONLY the functions it actually uses from the imported modules that contain them? Here is an example.
Module that will be imported:
```py
def add(a, b):
  return a+b

def subtract(a, b):
  return a-b
```
Source code:
```py
x = 5
y = 4
print(add(x,y))
```
After preprocessing:
```py
def add(a, b):
  return a+b

x = 5
y = 4
print(add(x,y))
```
acoustic halo
#

preprocessing has a variety of different meanings; what you put is an example of preprocessing, but that is down to the task at hand and what you want to achieve

#

For example, I preprocessed a bunch of c++ files, and for me that meant removing all comments and undoing all the #define preprocessor directives

ornate dagger
#

and how would one go about removing all the comments and undoing all the #define directives?

#

wouldn't it be essentially having a function in a library perhaps that goes over your code and does this - same thing as described above?

acoustic halo
#

Comments largely with regex; undoing directives is a massive effort, so unless you need to, i wouldn't recommend it
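A sketch of the regex approach for C/C++ comments, assuming plain string and character literals only: the pattern also matches literals so that comment-like text inside them (such as `"http://x"`) is kept. It does not handle raw strings (`R"(...)"`), backslash line continuations inside `//` comments, or trigraphs, so treat it as a starting point rather than a full tokenizer. Each comment is replaced with a space so that `a/**/b` does not collapse into one token.

```python
import re

# Either a comment or a string/char literal; literals are matched so that
# comment-like text inside them is left alone.
_TOKEN = re.compile(
    r'//[^\n]*'              # line comment (stops before the newline)
    r'|/\*.*?\*/'            # block comment, non-greedy, may span lines
    r'|"(?:\\.|[^"\\])*"'    # double-quoted string literal
    r"|'(?:\\.|[^'\\])*'",   # character literal
    re.DOTALL,
)

def strip_cpp_comments(source: str) -> str:
    """Replace every C/C++ comment with a single space; keep literals intact."""
    def keep_or_blank(match):
        text = match.group(0)
        return " " if text.startswith("/") else text
    return _TOKEN.sub(keep_or_blank, source)

code = 'int a = 1; // counter\nconst char* url = "http://x"; /* done */'
print(strip_cpp_comments(code))
```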

ornate dagger
#

not trying to replicate it, simply trying to understand the other variant of preprocessing more clearly.

acoustic halo
#

I have a program where i feed in the source code text and it spits out the processed code

ornate dagger
#

so basically preprocessing means editing/preparing the source code before it goes through it all?

#

whether it's importing functions from used modules or doing any other kind of formatting

acoustic halo
#

Yeah, in my case it was, I was building abstract syntax trees for each source code file, before that, each file had to be preprocessed in that way
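The "copy only the functions you use" idea from earlier can be sketched with Python's own `ast` module, which builds exactly this kind of syntax tree. `inline_used_functions` is a hypothetical helper, not a standard tool: it only looks at direct calls by name (`add(x, y)`), so it misses methods, functions passed around as values, and helpers that the kept functions call in turn. `ast.unparse` requires Python 3.9 or later.

```python
import ast

MODULE_SRC = """
def add(a, b):
    return a + b

def subtract(a, b):
    return a - b
"""

SOURCE = """x = 5
y = 4
print(add(x, y))
"""

def inline_used_functions(module_src: str, source: str) -> str:
    """Prepend only the module-level functions that `source` calls by name."""
    called = {
        node.func.id
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    used = [
        node for node in ast.parse(module_src).body
        if isinstance(node, ast.FunctionDef) and node.name in called
    ]
    # Turn the kept function nodes back into source text (Python 3.9+)
    return "\n\n".join(ast.unparse(node) for node in used) + "\n\n" + source

print(inline_used_functions(MODULE_SRC, SOURCE))
```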

ornate dagger
#

alright, thank you!

acoustic halo
#

Ultimately though, you have to know how you want each file preprocessed for whatever task you want to complete, and depending on that you might find another way of preprocessing your code that is better

lapis sequoia
#

What sort of augmentation should I be applying to a dataset of skin cancer images? It's well segmented but doesn't contain many images (approx. 2 GB), and I'm going to try a few transfer learning architectures first. Also, what metrics/scores would be best to evaluate my model?

molten hamlet
#

is there an easy way to plug a function that generates batches of data into a data generator and feed the model while training? in keras

#
```py
def get_batch():
  # example
  yield X, Y
```
steel roost
#

Where is a good place to start learning data science with Python?

grave frost
#

What exactly are you interested in learning about?

odd yoke
#

@molten hamlet are you using tf.data.Dataset ?

#

if so, there is a batch method

stark hornet
#

I desperately need help. is it useful to use/learn matplotlib when you can just export the data to excel or some similar program?

desert oar
#

Yes

fickle rampart
#

My goal is to develop a stock options backtester. I'm 2 months into learning programming(python specifically). With so much information and so many fields of study, what areas should I focus on in order to develop this backtesting program?

#

I've started learning pandas but not sure where to go from here. Should I focus on understanding classes and objects? What will I need to focus on in order for the backtester to make the correct selection of orders to buy and sell amongst so many rows of data as well as calculate the necessary statistics such as the profit/loss per strategy? Any guidance on this will help me a lot. I don't know where to look.

molten hamlet
#

@odd yoke no, but I solved it: fit supports generators since 2.0, so I think I can just pass the generator returned by get_batch() directly

flat quest
#

well the stock market is a really odd thing, especially rn. Breaking all the standard rules, so backtesting strats might not work as well as before.

But anyways, if you want to make a backtester, I would say learn the basics of classes and objects before jumping into pandas. As for pandas, there's lots of tutorials online and documentation is pretty good imo @fickle rampart

fickle rampart
#

Yes, I agree pandas is well documented, and since it's so widely used I've been able to find how to do things with it with some searching. The dilemma I'm facing is that making an options backtester seems to be much more difficult than a stock backtester. While in stocks there is only one stock which never changes, in options there are hundreds of options that change every week. What would be useful for me to focus on in order to understand how to make the selection of the correct options with my code?

river fjord
#

It is so hard to read such large paragraphs, keep it short pepeLaugh

lapis sequoia
#

@desert oar you helped me with this yesterday but i had a followup question -- do you know why this fill_value is replacing everything in my dataframe with 0? this is the code

#
```py
import pandas as pd

data = pd.read_csv('my-data.csv')
data['MONTH'] = pd.to_datetime(data['MONTH'])

new_index = pd.date_range(data['MONTH'].min(), data['MONTH'].max(), freq='MS', name='MONTH')
def fill_monthly(df):
    return df.set_index('MONTH').drop('APP', axis='columns').reindex(new_index, fill_value=0)
data_filled = data.groupby('APP').apply(fill_monthly)
```
desert oar
#

it shouldn't be. can you also provide some small test data?

lapis sequoia
#

yea so

desert oar
#

its easier than me constructing some tiny data set

lapis sequoia
#

ID | MONTH | INCIDENTS
AP00094 | 2017-11 | 1
AP00094 | 2018-03 | 1
AP00095 | 2019-05| 3

#

it worked with some other dataframes but for this one im getting 0 replaced for everything

desert oar
#

is ID equivalent to APP?

lapis sequoia
#

yea

#

wait

#

i think it might be because my month columns are in string format right now

#

i didnt even notice. let me change that and see

desert oar
#

did you forget the to_datetime?

#

that line is necessary

lapis sequoia
#

yeah thats probably it. lemme try

#

yup that worked!

#

thanks

desert oar
#

👍
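A runnable version with the sample rows above, assuming the ID column is named APP as in the code. It shows why the to_datetime line matters: without it MONTH holds strings, none of them equal a timestamp in `new_index`, so every row of the reindexed frame becomes `fill_value`. This sketch also swaps `groupby().apply()` for a single reindex over an APP x MONTH MultiIndex, an alternative approach rather than a fix to the original code; it assumes each (APP, MONTH) pair appears at most once.

```python
import pandas as pd

# Sample rows from the question (the ID column is called APP here)
data = pd.DataFrame({
    "APP": ["AP00094", "AP00094", "AP00095"],
    "MONTH": ["2017-11", "2018-03", "2019-05"],
    "INCIDENTS": [1, 1, 3],
})

# Without this, MONTH holds strings that never match the DatetimeIndex below
data["MONTH"] = pd.to_datetime(data["MONTH"], format="%Y-%m")

new_index = pd.date_range(data["MONTH"].min(), data["MONTH"].max(), freq="MS", name="MONTH")

# Every APP paired with every month start; missing combinations get 0
full_index = pd.MultiIndex.from_product([data["APP"].unique(), new_index], names=["APP", "MONTH"])
data_filled = data.set_index(["APP", "MONTH"]).reindex(full_index, fill_value=0)
print(data_filled.head())
```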

lapis sequoia
#

I am pretty new to ML and DS, so I might have misunderstood the concepts, but I hope someone can clarify it for me.
What is the point of having multiple kernels in a CNN's convolution layer if the max-pool in the next layer performs a max operation? It seems like all the kernel outputs will give the same max values per pool window.

#

It's not a python specific question, so I posted it here. Hope that's alright

odd yoke
#

(i'm assuming conv2d for this example)
if you have C convolution kernels, the output will have the dimensions HWC (or CHW depending on the data layout you use). Pooling is used to downsample the spatial dimensions (HW); the C dimension keeps its size

#

and the kernels are not initialized with the same values, so the values won't be the same

#

@lapis sequoia ping in case you left

lapis sequoia
#

I'm here, reading it, thanks

#

Nah, I'm working with the grayscale images for now. I understood the downsizing the spatial dimensions part.

#

Wait, lemme try to use an example

#

Example output after convolving with a kernel:
[1 2 3 4]
[2 1 3 4]
[4 2 1 3]
[2 2 4 1]
Now if I do a max in axis 1, won't all of them become 4?

#
```py
class Layer_Maxpool:
    def __init__(self, pool_scale):
        # Initializing attributes
        self.pool_scale = pool_scale

    def maxpool(self, img, maxpool_out):
        maxpool_out = np.zeros((conv_out.shape[0] // self.pool_scale, conv_out.shape[1] // self.pool_scale, conv_out.shape[-1]))
        for ix in range(img.shape[-1]):
            new_img = conv_out[:,:,ix]
            for i in range(maxpool_out.shape[0]):
                for j in range(maxpool_out.shape[1]):
                    segment = new_img[i * self.pool_scale:(i+1) * self.pool_scale, j * self.pool_scale:(j+1) * self.pool_scale]
                    maxpool_out[i,j] = np.amax(segment, axis=(0, 1))
        return maxpool_out

    def forward(self, inputs, training=False):
        self.inputs = inputs
        self.output = np.zeros(
            (
                inputs.shape[0] // self.pool_scale,
                inputs.shape[1] // self.pool_scale,
                inputs.shape[2]
            )
        )
        # Calculate output values from input ones, weights and biases
        self.output = self.maxpool(inputs, self.output)
```

This is the code I'm using. Might've made a mistake in it somewhere.

#

@odd yoke

odd yoke
#

You don't directly apply the max pool on the convolution kernel of the previous layer, you apply it on the output of said convolution

lapis sequoia
#

omg, I figured it out

#

I'm sorry 😅

#

Instead of broadcasting, I was looping over the images
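In the posted loop, `maxpool_out[i,j] = np.amax(segment, axis=(0, 1))` receives a scalar from a single channel and writes it into every channel at (i, j), and each pass over `ix` overwrites the previous one, which would make all channels look identical (the method also reads the global `conv_out` rather than its `img` argument). A vectorized sketch that pools every channel at once, assuming an (H, W, C) layout and non-overlapping windows; `maxpool2d` is a hypothetical name:

```python
import numpy as np

def maxpool2d(x, pool):
    """Non-overlapping max pooling over an (H, W, C) array; one result per channel."""
    H, W, C = x.shape
    h, w = H // pool, W // pool
    x = x[: h * pool, : w * pool, :]  # drop rows/cols that don't fill a full window
    # (h, pool, w, pool, C): axes 1 and 3 index positions inside each window
    return x.reshape(h, pool, w, pool, C).max(axis=(1, 3))

conv_out = np.arange(32).reshape(4, 4, 2)  # hypothetical conv output with 2 channels
pooled = maxpool2d(conv_out, 2)
print(pooled[..., 0])  # channel 0
print(pooled[..., 1])  # channel 1 keeps its own values
```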

#

@odd yoke yeah, aware of that

#

the inputs here is the conv out

#

So the correct architecture of the model is:

Conv -> Maxpool -> ReLu -> Dense Layer -> Softmax

correct?

odd yoke
#

That looks good yep

lapis sequoia
#

Thanks a lot

odd yoke
#

You may see ReLu -> Maxpool instead sometimes, but it's the same result

lapis sequoia
#

yeah, was reading about that just now

odd yoke
#

Mostly for optimization purposes
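A quick numerical check that the two orders give the same result: ReLU is non-decreasing, so max(relu(a), relu(b)) = relu(max(a, b)) and the two operations commute. The random input is hypothetical; the size printout shows why pooling first leaves ReLU fewer values to process (4x fewer with a 2x2 window).

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def maxpool2d(x, pool):
    """Non-overlapping max pooling over an (H, W, C) array."""
    H, W, C = x.shape
    h, w = H // pool, W // pool
    return x[: h * pool, : w * pool].reshape(h, pool, w, pool, C).max(axis=(1, 3))

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 6, 3))  # hypothetical conv output

a = relu(maxpool2d(x, 2))  # Conv -> Maxpool -> ReLU
b = maxpool2d(relu(x), 2)  # Conv -> ReLU -> Maxpool
print(np.array_equal(a, b))  # same result either way
print(x.size, a.size)        # ReLU touches 108 values vs 27 after pooling
```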

lapis sequoia
#

I see

#

wouldn't subsampling it first reduce the overhead on ReLU?
Not sure which one would be costlier, as both strive to reduce the computation in their own way

#

intuition says maxpool does a tougher job

oblique belfry
#

Relu is a really simple function to execute. So, it doesn't matter much.

odd yoke
#

I think that's a reasonable assumption, I'm not knowledgeable enough in GPGPU to know exactly what they may do to make it faster with Conv -> ReLu -> Maxpool

lapis sequoia
#

I see

oblique belfry
#

relu = max(0, x)

lapis sequoia
#

yeah, aware of that tonus

oblique belfry
#

Not trying to talk down, but just typing as I think.

lapis sequoia
#

ah lol, okay

odd yoke
#

Yeah, but when we're talking about millions of weights, that relu operation that is run several times per iteration can make a non-negligible difference

oblique belfry
#

Yeah....I don't disagree. But, I feel like that is a level of optimization that isn't necessary in my opinion to think about at this point.

odd yoke
lapis sequoia
#

But from maxpool's p.o.v, will max([1 2 3 4]) and max([-1 2 3 4]) make any difference?

#

ty, will check it out

odd yoke
#

When your program takes hours or days to train, even an improvement of 1% is important

oblique belfry
#

Had a typo. I don't disagree with you.

odd yoke
#

Ah, my bad

oblique belfry
#

Nah. It's mine.

odd yoke
#

So apparently, tensorflow doesn't optimize for it (yet?)

oblique belfry
#

I am not sure how I feel about TF doing that on its own.

#

Not saying it isn't an effective optimization. But I think the dev should handle that, and there should be better documentation on similar operations.

loud breach
#

hi, I'm a noob in neural networks and I was trying to make a very simple perceptron that just tries to guess the slope.
so you give it an x, and it needs to spit out the correct y (so curve fitting?)
the cost function is (a-y)²
I thought this is the way to calculate the new weight:
W1 = W0 - learning_rate*i*2*(a-y)
is this right?

#

a is the network's prediction

#

y is the desired output

#

i is the input (so x)
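With a = w·i (no bias) and cost C = (a - y)², the chain rule gives dC/dw = 2(a - y)·i, so the update W1 = W0 - learning_rate·i·2·(a - y) is the gradient descent step. A minimal sketch that also learns a bias b (with dC/db = 2(a - y)), assuming hypothetical noise-free data on the line y = 3x + 1 and an arbitrary learning rate of 0.05:

```python
import numpy as np

xs = np.linspace(-1, 1, 21)  # hypothetical inputs
ys = 3 * xs + 1              # hypothetical targets: slope 3, intercept 1

w, b = 0.0, 0.0
learning_rate = 0.05
for epoch in range(200):
    for i, y in zip(xs, ys):
        a = w * i + b                         # prediction
        w -= learning_rate * i * 2 * (a - y)  # dC/dw = 2(a - y) * i
        b -= learning_rate * 2 * (a - y)      # dC/db = 2(a - y)

print(round(w, 4), round(b, 4))
```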

oblique belfry
#

Are there any good benchmarks for Flax?

red carbon
#

does anybody know what's the best way to get the output of a particular hidden layer in a NN using pytorch?

odd yoke
#

you can create a list out of a model where each element is a layer

#

alternatively, when you define your model, store a reference to the layer that interests you and retrieve it using a method

#

this seems to be a very common question, there are multiple other solutions you can find online
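A sketch of both suggestions on a hypothetical `nn.Sequential` model: slicing the model (supported for `nn.Sequential`) to run only the first layers, and `register_forward_hook`, which works on any submodule and records its output during a normal forward pass. For a custom `nn.Module` the hook is usually the less invasive option.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(  # hypothetical 3-layer model
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 2),
)
x = torch.randn(4, 8)

# Option 1: nn.Sequential can be sliced like a list of layers
hidden_sliced = model[:2](x)  # output after Linear(8, 16) + ReLU

# Option 2: a forward hook records a submodule's output during model(x)
captured = {}

def save_output(module, inputs, output):
    captured["hidden"] = output.detach()

handle = model[1].register_forward_hook(save_output)
logits = model(x)
handle.remove()  # stop recording once done

print(hidden_sliced.shape, captured["hidden"].shape)
```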

arctic cliff
#

I don't get df.groupby()

velvet thorn
#

@arctic cliff what about it

desert parcel
#

Could someone take a look at the tensor shapes, it's not getting the output I wanted

#

The first two parts of this work, but the final part i'm not sure how to get

#

I tried to do a .t() at targets to try and fix it but there are errors so I'm not sure what to do.

odd yoke
#

model = nn.Linear(13, 1) here you define your model as a linear model that takes an input of size 13, and has an output of size 1

#

I'm confused as to why it doesn't crash directly in your training loop

desert parcel
#

It didn't crash

#

@odd yoke Alright but I changed it to (13, 13) but doing that just gives an error about singleton dimensions

#

I changed it to (13, 2) and that also crashed it

#

so it only works with (13, 1) I tried transposing the tensor but it didn't work either

odd yoke
#

which line crashes when you set it to 13, 13

desert parcel
#

The lines are linked

odd yoke
#

what's the exact stack trace ?

desert parcel
#

let me get it again

#
```
d:/Coding/python/ML/winrate.py:37: UserWarning: Using a target size (torch.Size([5])) that is different to the input size (torch.Size([13])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  loss = loss_fn(preds, yb)
Traceback (most recent call last):
  File "d:/Coding/python/ML/winrate.py", line 46, in <module>
    fit(250, model, loss_fn, opt)
  File "d:/Coding/python/ML/winrate.py", line 37, in fit
    loss = loss_fn(preds, yb)
  File "D:\Coding\python\ML\lib\site-packages\torch\nn\functional.py", line 2542, in mse_loss
    expanded_input, expanded_target = torch.broadcast_tensors(input, target)
  File "D:\Coding\python\ML\lib\site-packages\torch\functional.py", line 62, in broadcast_tensors
    return _VF.broadcast_tensors(tensors)
RuntimeError: The size of tensor a (13) must match the size of tensor b (5) at non-singleton dimension 0
```
odd yoke
#

oh it's the batch size

#

now as to why the shapes don't fit, can you print the shapes right before the loss in the loop ?

#

wait wait

#

you're using inputs

#

instead of xb

#

I'm not exactly familiar with pytorch, but that doesn't seem right

#

Also, is your dataset supposed to be one parameter and one label ?

#

In which case you want to set model to Linear(1, 1)

arctic cliff
#

@velvet thorn What's it used for ?

odd yoke
#

If you really don't understand it at all, I suggest you look at the documentation directly @arctic cliff

#

It's used for "grouping" values together based on some arbitrary criteria

desert parcel
#

@odd yoke sorry what.

odd yoke
#

Your model's input size isn't 13

#

it's 1, right ?

desert parcel
#

When I do .shape

odd yoke
#

13 being the number of examples in your dataset

desert parcel
#

it just gives [13]

#

Ohhhh

#

it's a 1,13

odd yoke
#

1, 1

arctic cliff
#

Guess I got it, Thanks !

desert parcel
#

so I just put in 1,1

odd yoke
#

Yes, and in the training loop, you're using inputs but I'm p sure you want to use xb

desert parcel
#

yeah inputs is xb targets is yb

#

It says there is a size mismatch

#

RuntimeError: size mismatch, m1: [1 x 13], m2: [1 x 1] at C:\w\b\windows\pytorch\aten\src\TH/generic/THTensorMath.cpp:41

#

After changing it model=nn.Linear(1,1)

#

m2 being model

odd yoke
#

which line causes this ?

desert parcel
#

I'm a bit busy right now I'll get back to you

#

Sorry about the wait

velvet thorn
#

@arctic cliff many things, but the most common one is to apply aggregations over subsets of data

odd yoke
#

you shouldn't get 13 as input anywhere

velvet thorn
#

for example, say you have a dataset that contains three columns: department, name, and age

odd yoke
#

don't forget to remove
```py
preds = model(inputs)

print(preds.shape)

loss_fn = F.mse_loss # except this line
loss = loss_fn(preds, targets)
```

velvet thorn
#

if you wanted the average age of the whole company, you would do df['age'].mean()

#

but if you wanted the average age of each department, you would do df.groupby('department')['age'].mean()

arctic cliff
#

Can't I do: df.department.mean() ?

velvet thorn
#

no, that would be the mean of the column department

#

which doesn't make sense because it contains strings.

#
```py
>>> df
    department name  age
0   Accounting    A   36
1   Accounting    B   29
2  Engineering    C   24
3  Engineering    D   37
4  Engineering    E   33
>>> df['age'].mean()
31.8
>>> df.groupby('department')[['age']].mean()
                   age
department            
Accounting   32.500000
Engineering  31.333333
```
arctic cliff
#

!

#

I got it !

velvet thorn
#

this is the simplest and (I think) the most common use case for groupby

#

but the general principle is split-apply-combine

#

split into subsets based on the value of a specified column, apply some operation, combine the results back into a DataFrame

#

in this case the operation is the mean aggregation.

#

however, you can do stuff like transform and filter, in particular

#

also, as you get more advanced you'll find that you don't have to group on only a single column, or even on columns at all

#

an easy example of the first case is...imagine you also had a "sex" column

#

you could do df.groupby(['department', 'sex'])['age'].mean() to get the average age by department and sex
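A runnable version of the example above, with a hypothetical `sex` column added. It selects the `age` column before aggregating: in pandas 2.0 and later, `df.groupby('department').mean()` raises a TypeError when non-numeric columns like `name` are present, unless you pass `numeric_only=True`. It also shows `transform` (one value per original row) and `filter` (keep or drop whole groups).

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["Accounting", "Accounting", "Engineering", "Engineering", "Engineering"],
    "name": ["A", "B", "C", "D", "E"],
    "age": [36, 29, 24, 37, 33],
    "sex": ["F", "M", "F", "M", "F"],  # hypothetical extra column
})

by_dept = df.groupby("department")["age"].mean()               # one row per department
by_dept_sex = df.groupby(["department", "sex"])["age"].mean()  # group on two columns

# transform: broadcast each group's mean back onto its rows
df["dept_mean_age"] = df.groupby("department")["age"].transform("mean")

# filter: keep only the departments whose mean age is above 32
older = df.groupby("department").filter(lambda g: g["age"].mean() > 32)

print(by_dept)
print(by_dept_sex)
print(df)
print(older)
```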

arctic cliff
#

Let me try this out

#

Thanks

lapis sequoia
#

```py
lst = eval(input("Enter list :"))
length = len(lst)
# List to hold unique elements
uniq = []
# List to hold duplicate elements
dupl = []
count = i = 0
while i < length:
    element = lst[i]
    # Count as 1 for the element at lst[i]
    count = 1
    if element not in uniq and element not in dupl:
        i += 1
        for j in range(i, length):
            if element == lst[j]:
                count += 1
        # when inner loop - for loop ends
        else:
            print("Element", element, "frequency:", count)
            if count == 1:
                uniq.append(element)
            else:
                dupl.append(element)
    # When element is found in uniq or dupl lists
    else:
        i += 1
print("Original list", lst)
print("Unique elements list", uniq)
print("Duplicate elements list", dupl)
```

#

why am I getting this error?

#

```
$ python main.py
Enter list :
Traceback (most recent call last):
  File "main.py", line 1, in <module>
    lst = eval(input("Enter list :"))
EOFError: EOF when reading a line
```
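The EOFError means `input()` reached end-of-file before reading anything, which typically happens when the script runs somewhere that provides no standard input (for example some online runners, or when input is piped from an empty source). A sketch of a more defensive version, with hypothetical helper names `read_list` and `split_unique_duplicates`: `ast.literal_eval` parses the list without executing arbitrary code the way `eval` can, and `collections.Counter` replaces the manual counting loop.

```python
import ast
from collections import Counter

def read_list(prompt="Enter list: "):
    """Parse a Python list literal from stdin; return [] if stdin is empty/closed."""
    try:
        return ast.literal_eval(input(prompt))  # literals only, unlike eval()
    except EOFError:  # raised when input() hits end-of-file (no input provided)
        return []

def split_unique_duplicates(items):
    counts = Counter(items)  # element -> frequency, in first-seen order
    for element, count in counts.items():
        print("Element", element, "frequency:", count)
    uniq = [e for e, c in counts.items() if c == 1]
    dupl = [e for e, c in counts.items() if c > 1]
    return uniq, dupl

uniq, dupl = split_unique_duplicates([1, 2, 2, 3, 3, 3, 4])
print("Unique elements list", uniq)
print("Duplicate elements list", dupl)
```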

desert parcel
#

@odd yoke wydm

odd yoke
#

remove that code, it's not part of your model

#

it may be what's causing the error with the shape 13

#

because you should really only have shapes 1 and 5

desert parcel
odd yoke
#

yes
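A sketch of the shape fix, with hypothetical stand-in data since winrate.py isn't shown in full: 13 examples with one feature each, batch size 5, and targets on the line y = 2x + 0.5. `nn.Linear(1, 1)` expects inputs of shape (N, 1) and returns (N, 1), so both inputs and targets are reshaped from (13,) with `unsqueeze(1)`; then `preds` and `yb` have matching shapes and `mse_loss` has nothing to broadcast.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-in data: 13 examples, one feature each
inputs = torch.linspace(0, 1, 13)  # shape (13,)
targets = 2 * inputs + 0.5         # shape (13,)

# nn.Linear(1, 1) wants (N, 1): add the feature dimension to both tensors
inputs = inputs.unsqueeze(1)       # (13,) -> (13, 1)
targets = targets.unsqueeze(1)     # (13,) -> (13, 1)

train_dl = DataLoader(TensorDataset(inputs, targets), batch_size=5, shuffle=True)
model = nn.Linear(1, 1)            # 1 input feature -> 1 output
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def fit(num_epochs, model, loss_fn, opt):
    for _ in range(num_epochs):
        for xb, yb in train_dl:        # use the batch, not the full `inputs`
            preds = model(xb)          # (batch, 1)
            loss = loss_fn(preds, yb)  # (batch, 1) vs (batch, 1)
            loss.backward()
            opt.step()
            opt.zero_grad()
    return loss.item()

final_loss = fit(250, model, F.mse_loss, opt)
print(final_loss, model(inputs).shape)
```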

lapis sequoia
#

What types of career paths are you all wanting to do with data science?

#

Just curious

#

New to this

brittle edge
slate scroll
#

@lapis sequoia I am a machine learning engineer, it is a growing area.

desert parcel
#

Could someone explain this line?

#
```py
import numpy as np

def split_indices(n, eval):
    eval = int(eval*n)
    index = np.random.permutation(n)
    return index[eval:], index[:eval]

train_index, eval_index = split_indices(len(dataset), eval=0.2)
```
#

Here is the full code

#

So does it split the 20% between train_index and eval_index?

#

Here is the output; I don't really understand it

desert parcel
#

mostly because they're different

velvet thorn
#

didn't we go through this yesterday

still delta
#

please, do you have a good tutorial on Google's APIs?

uncut shadow
#

what

lapis sequoia
#

Series.filter(regex="..")

#

need to filter out strings ending with -org in the series

#

what will the regular expression be like

uncut shadow
#

no idea cuz u didn't show any example of data

lapis sequoia
#

@uncut shadow here you go

#

Nvm i used another approach
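For the record, `Series.filter(regex=...)` matches against the index labels, not the values, which may be why it didn't work here. For the values, the `.str` accessor is the usual route. A sketch with hypothetical strings:

```python
import pandas as pd

s = pd.Series(["python-org", "numpy", "scipy-org", "pandas"])  # hypothetical values

ends_with_org = s.str.endswith("-org")         # boolean mask on the values
kept = s[ends_with_org]                        # keep only the "-org" strings
removed = s[~ends_with_org]                    # or drop them
same_via_regex = s[s.str.contains(r"-org$")]   # regex equivalent

print(kept.tolist(), removed.tolist())
```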

dreamy fractal
#

Hello guys, I have a question regarding deep learning frameworks. I know how to make simple neural network architectures, but I have some difficulties implementing custom architectures even though I'm quite familiar with the theory behind them. Do you have any resources or ideas about how to practice the "coding" part of implementing custom neural networks using TensorFlow and/or PyTorch?

lapis sequoia
#

Look up architectures and try to implement a broken down version of it

#

the last few chapters of Hands-On Machine Learning with Scikit-Learn and TensorFlow are helpful

dreamy fractal
#

Will look into that, is the book adapted for Tensorflow 2.0+ ?

vocal sluice
#

I want to ask something: I have training data for object detection and I want to use TensorFlow for this purpose. I'm labeling the pictures, but almost all of them are horizontal. Will the horizontal pictures cause any problem once my model is trained? Sorry for my bad English

hidden halo
#

I need to do a calculation over a list: for each item, I need to find the number of items appearing before it that are smaller than it. I have written this function using a numpy array for this:

```py
import numpy as np

L = [5,8,2,77,34,67,....,56,342,567]
num_lower = []
for i, j in enumerate(L):
    cur_L = np.asarray(L[:i+1])
    lt = np.sum(cur_L < j)
    num_lower.append(lt)
```

Is there a way to vectorize this loop using Numpy

tidal bough
#

honestly, this sounds like it can be solved in O(n) by dynamic programming from the right

#

but to speed this solution up, using numba should work too if the contents are homogeneous.

hidden halo
#

basically it's a time series data and for each number, I want to know where it stands with respect to historical data.
I'm not familiar with numba

#

I'll look it up

tidal bough
#

Pretty much just make this part into a function and apply the @numba.njit decorator to it.

#

It'll lag the first time you call it because numba JIT-compiles it to machine code, but after that it'll be much faster.

#

of course, not all functions can be compiled by numba, but this looks like something that can - just some math and loops.

hidden halo
#

OK. I'll try it out. Thanks

#

@tidal bough What did you mean by this?

tidal bough
#

so... for each number in the list, you need to count the number of elements that are to the right of that number and smaller than it?

velvet thorn
#

@tidal bough really...?

hidden halo
#

I'd say to the left of the number. As in, the numbers are on a timeline, starting from left and moving to the right. So I need to consider all numbers appearing before that

velvet thorn
#

I can't see it but maybe you're right
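Two sketches for the count itself, using the visible values from the question as example data (the full series was elided). The first vectorizes the loop with broadcasting: it builds an n x n comparison matrix, so it is O(n²) in time and memory but has no Python loop. The second swaps in a different technique, a Fenwick (binary indexed) tree over value ranks, which runs in O(n log n) and suits long series. Both count strictly smaller earlier values, matching the original loop.

```python
import numpy as np

def count_smaller_before(values):
    """Vectorized O(n^2): for each i, how many earlier values are < values[i]."""
    a = np.asarray(values)
    less = a[None, :] < a[:, None]          # less[i, k] is True when a[k] < a[i]
    return np.tril(less, k=-1).sum(axis=1)  # keep only k < i (strictly earlier)

def count_smaller_before_fenwick(values):
    """O(n log n) using a Fenwick (binary indexed) tree over value ranks."""
    a = np.asarray(values)
    ranks = np.searchsorted(np.unique(a), a).tolist()  # 0-based rank; ties share a rank
    tree = [0] * (len(set(ranks)) + 1)
    out = []
    for r in ranks:
        i, total = r, 0
        while i > 0:              # prefix sum over ranks 0..r-1 (strictly smaller)
            total += tree[i]
            i -= i & -i
        out.append(total)
        i = r + 1
        while i < len(tree):      # record one occurrence at rank r
            tree[i] += 1
            i += i & -i
    return np.array(out)

series = [5, 8, 2, 77, 34, 67, 56, 342, 567]  # visible part of the example list
print(count_smaller_before(series))
print(count_smaller_before_fenwick(series))
```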