#data-science-and-ml | Python | Page 375

lapis sequoia Feb 8, 2022, 1:34 PM

#

Awwww, I like iterations D:

#

crap, been doing that

#

aren't they almost the same?

serene scaffold Feb 8, 2022, 1:34 PM

#

lapis sequoia crap, been doing that

well, here's your chance to permanently stop 😄

lapis sequoia Feb 8, 2022, 1:35 PM

#

probably should

serene scaffold Feb 8, 2022, 1:36 PM

#

"array" means something in math, but then some programming languages use it to refer to data structures in that language. In Python data science, "array" refers to a data structure intended to represent mathematical arrays.

#

in a lot of languages, what they call an array is the go-to data structure for having things in a certain order, whatever they are.

#

and in python, you use a list for that

#

but they are not the same.

lapis sequoia Feb 8, 2022, 1:37 PM

#

I have been deceived

hollow sentinel Feb 8, 2022, 1:38 PM

#

you can use numpy to turn a list into an array tho, correct?

serene scaffold Feb 8, 2022, 1:39 PM

#

yes, in the sense that you can pass a list to the np.array constructor

hollow sentinel Feb 8, 2022, 1:39 PM

#

right

lapis sequoia Feb 8, 2022, 1:40 PM

#

So to mention normal lists/arrays, we call them lists and we use array for the math definition?

serene scaffold Feb 8, 2022, 1:40 PM

#

in Python, you have lists for general-purpose containment of things. But if you're doing data science, you'll import numpy and do all the math with arrays.

#

they both use square brackets in their notation/representation and contain things in specific orders. that's pretty much where the similarities end.
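To make "that's pretty much where the similarities end" concrete, here is a minimal sketch, assuming NumPy is installed. It builds an array from a list with the np.array constructor and shows that the same operators behave differently on each:

```python
import numpy as np

numbers_list = [1, 2, 3]
numbers_array = np.array(numbers_list)  # pass a list to the np.array constructor

# For lists, + concatenates and * repeats the container
list_sum = numbers_list + numbers_list      # [1, 2, 3, 1, 2, 3]
list_times_two = numbers_list * 2           # [1, 2, 3, 1, 2, 3]

# For arrays, + and * are elementwise math
array_sum = numbers_array + numbers_array   # elements 2, 4, 6
array_times_two = numbers_array * 2         # elements 2, 4, 6

print(list_sum, array_sum)
```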

#

granted, lists and arrays are more closely related than like, potatoes and frogs

#

but thinking of them as similar when you're dealing with programming/data science will probably just make things worse. and if people are constantly having to verify if you used the right term when you say list or array, you'll waste all your time.

lapis sequoia Feb 8, 2022, 1:43 PM

#

I see, now time to head to google for a sec to ask what is an array in math

serene scaffold Feb 8, 2022, 1:44 PM

#

n u m b e r s

lapis sequoia Feb 8, 2022, 1:44 PM

#

just literally numbers?

#

no twist?

serene scaffold Feb 8, 2022, 1:45 PM

#

here are some two-dimensional arrays, written in math notation

#

two-dimensional arrays are also called matrices.

lapis sequoia Feb 8, 2022, 1:45 PM

#

ah yes numpy

serene scaffold Feb 8, 2022, 1:46 PM

#

if you were using numpy, the first array would be represented like this

[[1, 2, 3]
 [4, 5, 6]]

#

with the square brackets reminding you of how arrays are represented in math notation.

agile cobalt Feb 8, 2022, 1:47 PM

#

scalars are sometimes also treated as some sort of 0D arrays?
(e.g., you can do np.float64(123)[True])
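The 0-D idea can be checked with a quick sketch. NumPy reports a scalar's number of dimensions as 0, and np.array can wrap a single number in a 0-dimensional array whose shape is the empty tuple:

```python
import numpy as np

scalar = np.float64(123)
zero_d = np.array(123.0)   # a 0-dimensional array wrapping a single number

print(np.ndim(scalar), zero_d.ndim, zero_d.shape)  # 0 0 ()
value = zero_d[()]  # indexing with an empty tuple pulls out the single value
```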

serene scaffold Feb 8, 2022, 1:47 PM

#

but the whole array is "one thing". it's not a "nested array". the horizontal [1, 2, 3] isn't treated differently than the vertical [1, 4]
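Here is a sketch of that "one thing" idea, using the first array from above. Rows and columns are just two axes of the same 2-D block, and operations can run along either one. The variable names are illustrative:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix.shape)           # (2, 3): 2 rows, 3 columns, one object
first_row = matrix[0, :]      # the "horizontal" slice: 1, 2, 3
first_column = matrix[:, 0]   # the "vertical" slice: 1, 4

# Neither direction is special: sum down the columns or across the rows
column_sums = matrix.sum(axis=0)  # 5, 7, 9
row_sums = matrix.sum(axis=1)     # row totals: 6 and 15
```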

lapis sequoia Feb 8, 2022, 1:48 PM

#

serene scaffold here are some two-dimensional arrays, written in math notation

is this how matmul works?

#

just asking because seen it before

serene scaffold Feb 8, 2022, 1:48 PM

#

lapis sequoia is this how matmul works?

yes, I just picked this diagram arbitrarily because it has arrays, but it's demonstrating the matmul formula.
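The diagram itself isn't reproduced here, but the rule it shows can be sketched in NumPy with two made-up matrices. Entry (i, j) of the product is the dot product of row i of the left matrix with column j of the right one. That is why the inner dimensions (3 and 3 here) must match:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3)
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])           # shape (3, 2)

product = A @ B                     # shape (2, 2); same as np.matmul(A, B)

# Rebuild each entry by hand: row i of A dotted with column j of B
manual = np.array([[A[i, :] @ B[:, j] for j in range(B.shape[1])]
                   for i in range(A.shape[0])])

print(product.tolist())  # [[58, 64], [139, 154]]
```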

lapis sequoia Feb 8, 2022, 1:48 PM

#

oh that's neat, I was actually a bit confused on how that worked

#

mind if i save it?

serene scaffold Feb 8, 2022, 1:49 PM

#

don't ask me, I just plucked it from Google 😛

lapis sequoia Feb 8, 2022, 1:49 PM

#

xd

serene scaffold Feb 8, 2022, 1:49 PM

#

but you can save anything I say in this server, unless I'm confessing to a crime.

lapis sequoia Feb 8, 2022, 1:49 PM

#

Hmmm

#

welp those tips are actually useful, I probably need to write them up so I don't forget them just in case

#

Thanks for the tips mate! Wish me luck because I am honestly only decent at math

lapis sequoia Feb 8, 2022, 2:08 PM

#

I've finished learning basic python and want to learn data analysis, can anyone suggest some useful courses or resources?

serene scaffold Feb 8, 2022, 2:32 PM

#

lapis sequoia I've completed the basic python knowledge and want to learn data analysis, can a...

!resources

arctic wedgeBOT Feb 8, 2022, 2:32 PM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

serene scaffold Feb 8, 2022, 2:32 PM

#

filter these by data science.

#

also, I see you asked the same question in #pedagogy. that is not the topic of that channel.

kind rock Feb 8, 2022, 2:49 PM

#

@serene scaffold So, anaconda is basically used as a package manager for python where we have different tools like spyder, jupyter etc? And, where do you do the coding part if not on jupyter ntbk ?

serene scaffold Feb 8, 2022, 2:51 PM

#

kind rock @serene scaffold So, anaconda is basically used as a package manager for ...

IDEs like spyder or pycharm are not the execution environment for the program. They just edit the code as a text file and give you tools for working with it, whereas jupyter notebooks are both the editor and the executor.

#

And, where do you do the coding part if not on jupyter ntbk ?
Not to pick on you, but the fact that this is a question just goes to show how general programming knowledge isn't taught in data science resources.

kind rock Feb 8, 2022, 2:52 PM

#

serene scaffold > And, where do you do the coding part if not on jupyter ntbk ? Not to pick on ...

so, where?

#

wait do you mean vs code?

serene scaffold Feb 8, 2022, 2:53 PM

#

outside of jupyter notebooks, you can edit python programs with any editor and pass the file to the python executable.

kind rock Feb 8, 2022, 2:54 PM

#

I'm confused, I thought we could execute/run files in spyder too.

serene scaffold Feb 8, 2022, 2:54 PM

#

kind rock wait do you mean vs code?

vs code is another text editor with features to help you program. but it's not what runs the code.

shut trail Feb 8, 2022, 2:55 PM

#

kind rock I'm confused, I thought we could execute/run files in spyder too.

it connects to python.

serene scaffold Feb 8, 2022, 2:55 PM

#

kind rock I'm confused, I thought we could execute/run files in spyder too.

spyder might have a button to run a program (I've never used spyder), but that button sends the program to the Python interpreter, which exists separately from Spyder.

kind rock Feb 8, 2022, 2:55 PM

#

ohkay and that python interpreter exists in jupyter?

serene scaffold Feb 8, 2022, 2:56 PM

#

pretty much. (jupyter itself is actually a python program, but let's not get into that.)

kind rock Feb 8, 2022, 2:57 PM

#

so what are the alternatives to jupyter when working in an industry?

#

@serene scaffold

serene scaffold Feb 8, 2022, 2:58 PM

#

you just write programs as text and run them.

#

with the python interpreter.

kind rock Feb 8, 2022, 2:58 PM

#

Idle ?

serene scaffold Feb 8, 2022, 2:58 PM

#

no one uses idle except for learning

#

keep in mind: programs are just text

#

you can use any text editor.

#

jupyter is doing more than just editing the text.

shut trail Feb 8, 2022, 2:59 PM

#

jupyterlab might have more features

kind rock Feb 8, 2022, 3:00 PM

#

serene scaffold jupyter is doing more than just editing the text.

its executing too?

serene scaffold Feb 8, 2022, 3:00 PM

#

yes, and visualizing

lapis sequoia Feb 8, 2022, 3:00 PM

#

how can i restrict my epoch in fitting in keras?
i mean how can i let it print every 10th epoch?

serene scaffold Feb 8, 2022, 3:01 PM

#

suppose you develop a model that a business wants to use. they can't just take your notebook and import it into their system

#

notebooks are a "dead end", in that sense.

shut trail Feb 8, 2022, 3:01 PM

#

lapis sequoia how can i restrict my epoch in fitting in keras? i mean how can i let it print e...

just keep the history and parse it after ?

lapis sequoia Feb 8, 2022, 3:01 PM

#

https://stackoverflow.com/questions/44931689/how-to-disable-printing-reports-after-each-epoch-in-keras
this answer shows something like
verbose=1 if epoch % 10 == 0 else 0

but I am lost about what in the world epoch refers to here. as in, where do we define it and how do we execute it?

kind rock Feb 8, 2022, 3:01 PM

#

so I'm not supposed to use jupyter notebook since I can't call import on separate files and I can't deploy models which I train using code written in jupyter. Am I right?

serene scaffold Feb 8, 2022, 3:02 PM

#

kind rock so I'm not supposed to use jupyter notebook since I can't call import on seperat...

when you write code in a notebook, you're writing it so that you can see the result after each cell. you're not writing it so that it can be used outside of the notebook.
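As a rough contrast, here is a sketch of code written to be reused: functions in a plain .py file, plus a `__name__ == "__main__"` guard. The guard lets the same file be run with the interpreter and also imported by another program. The file name `model_utils.py` and the `summarize` helper are hypothetical:

```python
# model_utils.py (hypothetical file name): code written as an importable module
import statistics


def summarize(values):
    """Return the mean and sample standard deviation of a sequence of numbers."""
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    # Runs only when the file is executed directly (e.g. `python model_utils.py`),
    # not when another program does `from model_utils import summarize`
    print(summarize([1.0, 2.0, 3.0, 4.0]))
```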

kind rock Feb 8, 2022, 3:03 PM

#

oh okay, so jupyter would make sense for learning / teaching

serene scaffold Feb 8, 2022, 3:03 PM

#

jupyter is fine for quick experimentation and visualization

shut trail Feb 8, 2022, 3:04 PM

#

lapis sequoia https://stackoverflow.com/questions/44931689/how-to-disable-printing-reports-aft...

by reading those comments it says you just define "verbose" that way

kind rock Feb 8, 2022, 3:04 PM

#

Gotcha and people across different systems wouldn't be able to access my notebooks, so I should use something like vs code

lapis sequoia Feb 8, 2022, 3:04 PM

#

yes but i mean how? a function or something? @shut trail

#

oh i should check docs

shut trail Feb 8, 2022, 3:04 PM

#

i've never tried it. i'm curious, so i'm going to

lapis sequoia Feb 8, 2022, 3:04 PM

#

jesus

shut trail Feb 8, 2022, 3:05 PM

#

verbose takes more than binary values! cool!

lapis sequoia Feb 8, 2022, 3:05 PM

#

#

but i mean i can't pass function or say 10 here.

frosty swallow Feb 8, 2022, 3:06 PM

#

Hello everyone

spring marsh Feb 8, 2022, 3:07 PM

#

how does root mean squared error punish outliers? is it always better than mean absolute error?

frosty swallow Feb 8, 2022, 3:08 PM

#

Can anybody tell me what is meant by 'regularize the data column-wise'?

shut trail Feb 8, 2022, 3:08 PM

#

lapis sequoia but i mean i can't pass function or say `10` here.

ya i'd just turn verbose off, record the output and parse it

#

better yet, graph it

lapis sequoia Feb 8, 2022, 3:08 PM

#

i'll need to see how to do that afterwards. but thanks for the help :D

shut trail Feb 8, 2022, 3:09 PM

#

there's a bunch of ways... i might have a link to an sklearn example somewhere

frosty swallow Feb 8, 2022, 3:11 PM

#

?

shut trail Feb 8, 2022, 3:14 PM

#

kind rock oh okay, so jupyter would make sense for learning / teaching

if you're developing something, it's nice to have html docs alongside any presentation. jupyter can make reproducibility requirements easy for non-programmers to understand

desert oar Feb 8, 2022, 3:19 PM

#

spring marsh how does root mean squared error punishes outliers? is it always better than mea...

when you take the square of something, it gets bigger. so if you take the square of all the errors, then the bigger errors get "more big" than the smaller errors

#

"better" in what sense? for what purpose? in the presence of outliers we say that methods based on squared errors are not robust, in that a small number of extreme outliers can significantly change the results
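Here is a small numeric sketch of the "punishing" effect, with made-up error values. A single error of 10 among errors of 1 moves MAE from 1.0 to 2.8. The same error moves RMSE from 1.0 to about 4.56, because squaring lets the large error dominate the average:

```python
import numpy as np

def mae(errors):
    """Mean absolute error of a vector of residuals."""
    return np.mean(np.abs(errors))

def rmse(errors):
    """Root mean squared error of a vector of residuals."""
    return np.sqrt(np.mean(np.square(errors)))

typical = np.array([1.0, 1.0, 1.0, 1.0, 1.0])
one_outlier = np.array([1.0, 1.0, 1.0, 1.0, 10.0])

print(mae(typical), rmse(typical))          # 1.0 1.0
print(mae(one_outlier), rmse(one_outlier))  # 2.8 and about 4.56
```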

desert oar Feb 8, 2022, 3:21 PM

#

frosty swallow Can anybody tell me what do u mean by 'regularize the data column wise'

"column-wise" means "individually for each column". so regularize each column of your data
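This sketch assumes "regularize" here means rescaling each feature. A common reading is standardizing: subtract each column's mean and divide by its standard deviation. The original task may have meant something else. The column-wise part in NumPy is the `axis=0` argument. The data below is made up:

```python
import numpy as np

data = np.array([[1.0, 100.0],
                 [2.0, 200.0],
                 [3.0, 300.0]])   # rows = samples, columns = features

column_means = data.mean(axis=0)  # one mean per column: 2.0 and 200.0
column_stds = data.std(axis=0)    # one standard deviation per column

# Broadcasting applies each column's own mean and std to that column
standardized = (data - column_means) / column_stds
print(standardized)
```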

kind rock Feb 8, 2022, 3:23 PM

#

shut trail if youre developing something, its nice to have html docs along any presentation...

How do the html docs help? Do they contain description about the code?

spring marsh Feb 8, 2022, 3:23 PM

#

desert oar when you take the square of something, it gets bigger. so if you take the square...

If the bigger error gets more big, then it will increase the overall RMSE more. How is that punishing?

shut trail Feb 8, 2022, 3:23 PM

#

lapis sequoia i'll need to see how to do that afterwords. but thanks for help:D

import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential  # initialize neural network library
from tensorflow.keras.layers import Dense       # build our layers library

# x_train, y_train: features and 0/1 labels, already loaded and split
def build_classifier(n_features):
    classifier = Sequential()  # initialize neural network
    classifier.add(Dense(units=8, kernel_initializer='uniform', activation='relu', input_dim=n_features))
    classifier.add(Dense(units=8, kernel_initializer='uniform', activation='relu'))
    classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))
    classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return classifier

classifier = build_classifier(x_train.shape[1])
# verbose=0 turns off the per-epoch log; fit() still returns a History object
history = classifier.fit(x_train, y_train, validation_split=0.20, epochs=70, batch_size=100, verbose=0)

# Plot training & validation accuracy values (TF2 stores them as 'accuracy' / 'val_accuracy')
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
# Plot training & validation loss values
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()

This might help. data is the classic mushroom set

lapis sequoia Feb 8, 2022, 3:24 PM

#

yeah i was just messing with history.History.

spring marsh Feb 8, 2022, 3:24 PM

#

desert oar "better" in what sense? for what purpose? in the presence of outliers we say tha...

Better in the sense that lower error is what we're after, right? We need to minimize the error

frosty swallow Feb 8, 2022, 3:25 PM

#

desert oar "column-wise" means "individually for each column". so regularize each column of...

But could you tell me how to approach it?

shut trail Feb 8, 2022, 3:25 PM

#

lapis sequoia yeah i was just messing with history.History.

k good cause that def has typos and problems ...

shut trail Feb 8, 2022, 3:26 PM

#

kind rock How do the html docs help? Do they contain description about the code?

yes, that's one of the big selling points about jupyter. you use markdown with it. so you can walk them through with pictures and links and happiness

frosty swallow Feb 8, 2022, 3:27 PM

#

desert oar "column-wise" means "individually for each column". so regularize each column of...

If I go for lasso, then it removes the ones it doesn't need by shrinking them to 0
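Roughly, yes: the lasso penalty can push coefficients to exactly 0. This sketch shows the mechanism in the simplest case. It assumes orthonormal features and the objective ½‖y − Xβ‖² + λ Σ|βⱼ|, where the lasso solution is the least-squares coefficients passed through a soft threshold. General data needs an iterative solver such as scikit-learn's `Lasso`. The coefficients and the `lam` value are made up:

```python
import numpy as np

def soft_threshold(coefs, lam):
    """Shrink each coefficient toward 0 by lam; anything within lam of 0 becomes exactly 0."""
    return np.sign(coefs) * np.maximum(np.abs(coefs) - lam, 0.0)

ols_coefs = np.array([3.0, -0.5, 1.5, 0.25])  # made-up least-squares coefficients
lasso_coefs = soft_threshold(ols_coefs, lam=1.0)
print(lasso_coefs)  # large coefficients shrink by 1.0, small ones are set to 0
```

Because the same λ applies to every coefficient, a feature's measurement scale changes how strongly its coefficient is penalized. That is one reason to scale the columns before using lasso.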

shut trail Feb 8, 2022, 3:27 PM

#

html is just a static form to simplify distribution

kind rock Feb 8, 2022, 3:27 PM

#

@serene scaffold one more thing. I understood that anaconda is used as a package manager that creates a specific conda environment where all the tools reside. So instead, I should not use anaconda, but learn venv for making virtual environments and then install tools as I need them. Is my assumption correct?

desert oar Feb 8, 2022, 3:28 PM

#

spring marsh If the bigger error gets more big then it will increase the overall RMSE more ho...

because models usually look for minimum error. so bigger error is "bad" from the perspective of the model fitting process, so you might end up with a model that "focuses" on trying to avoid those large errors, at the expense of focusing on other things.
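Here is one way to see that "focus", with made-up numbers. Suppose a model could only predict a single constant. The constant that minimizes mean squared error is the mean, which one outlier drags far from the rest. The constant that minimizes mean absolute error is the median. The sketch finds both by brute force over a grid of candidates:

```python
import numpy as np

targets = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # made-up data with one outlier
candidates = np.linspace(0, 100, 10001)           # constants to try, step 0.01

sq_loss = [np.mean((targets - c) ** 2) for c in candidates]
abs_loss = [np.mean(np.abs(targets - c)) for c in candidates]

best_sq = candidates[np.argmin(sq_loss)]    # pulled toward the outlier
best_abs = candidates[np.argmin(abs_loss)]  # stays with the bulk of the data
print(best_sq, targets.mean())       # about 22.0 and 22.0
print(best_abs, np.median(targets))  # about 3.0 and 3.0
```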

frosty swallow Feb 8, 2022, 3:28 PM

#

desert oar "column-wise" means "individually for each column". so regularize each column of...

How do I go column-wise? I'm not getting it

kind rock Feb 8, 2022, 3:28 PM

#

shut trail yes, thats one of the big selling points about jupyter. you use markdown with it...

Oof, I totally forgot that the text between code cells in the notebooks is html, forgive me lmao

shut trail Feb 8, 2022, 3:29 PM

#

kind rock Oof, I totally forgot that the text between code cells in the notebooks is html,...

you can export the whole document as html

kind rock Feb 8, 2022, 3:31 PM

#

cool

uncut barn Feb 8, 2022, 3:32 PM

#

does anyone know any popular deconvolution algorithms in python for microscopy images (especially with confocal microscopy images)?

shut trail Feb 8, 2022, 3:35 PM

#

skimage ?

odd meteor Feb 8, 2022, 3:35 PM

#

I didn't really encounter the .evaluate() method until I started learning Deep Learning. So I also remember asking myself this same question 😅

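So yeah, it's closer to the .score() method in Sklearn than to .predict(): .evaluate() returns the loss (and any compiled metrics) on the data you give it, while .predict() returns the model's outputs.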

lapis sequoia Feb 8, 2022, 3:42 PM

#

shut trail k good cause that def has typos and problems ...

hm, history came in handy

history.history['accuracy'][::10]

is good enough
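`history.history` is a plain dict that maps each metric name to a list with one value per epoch, so ordinary list slicing works on it. Here is a sketch with a hypothetical stand-in dict that prints every 10th epoch. The values are placeholders, not training results:

```python
# Hypothetical stand-in for Keras' History.history: metric name -> list of per-epoch values
history_dict = {"accuracy": [0.5 + 0.005 * epoch for epoch in range(70)]}

every_tenth = history_dict["accuracy"][::10]  # values at epoch indices 0, 10, 20, ..., 60

for epoch_index, value in zip(range(0, 70, 10), every_tenth):
    # Keras numbers epochs from 1 in its log, so add 1 when printing
    print(f"epoch {epoch_index + 1}: accuracy={value:.3f}")
```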

shut trail Feb 8, 2022, 3:42 PM

#

thanks, i'll use that

odd meteor Feb 8, 2022, 3:51 PM

#

Gradio can be used to demo and deploy a machine learning model right inside a JNB. I've tried it before and surprisingly it worked. Although I always use VSCode when it comes to deploying a model.

serene scaffold Feb 8, 2022, 3:51 PM

#

kind rock @serene scaffold one more thing. I knew that anaconda is used as somethin...

yes

serene scaffold Feb 8, 2022, 3:52 PM

#

odd meteor Gradio can be used to demo and deploy a machine learning model right inside a JN...

this makes me very sad.

desert oar Feb 8, 2022, 3:59 PM

#

kind rock @serene scaffold one more thing. I knew that anaconda is used as somethin...

i don't know if i agree with the recommendation to avoid anaconda entirely. i think if you want to use it, you need to learn how it works; much like venv. however i also don't know what the specific context for your question was

potent hemlock Feb 8, 2022, 4:00 PM

#

I have 100+ features in my data and I need to find data points that are closer. I tried Euclidean, but read that it doesn't work well in higher dimensions. Are other methods like distance correlation/clustering better?

desert oar Feb 8, 2022, 4:00 PM

#

potent hemlock I have 100+ features in my data and I need to find data points that are closer. ...

closer than what? to what?

#

are you looking for k-nearest neighbors?

#

unfortunately distance-based techniques generally don't work well in high dimensions. you might want to try some dimension reduction, maybe something "classical" like PCA, or a lower-dimensional vector embedding (e.g. with an autoencoder)
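Here is a minimal sketch of that pipeline in plain NumPy. It assumes all-numeric features with no missing values. The random data and the choice of 10 components are placeholders. The steps are: standardize each column, project onto the top principal components via SVD, then find the nearest rows by Euclidean distance in the reduced space:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 120))       # placeholder: 200 samples, 120 numeric features

# 1. Standardize each column so no feature dominates the distance by scale alone
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. PCA via SVD: rows of Vt are principal directions, ordered by variance explained
_, _, Vt = np.linalg.svd(X_std, full_matrices=False)
n_components = 10
X_reduced = X_std @ Vt[:n_components].T   # shape (200, 10)

# 3. Nearest neighbors of one query row in the reduced space
def nearest_rows(data, query, k):
    """Indices of the k rows of `data` closest to `query` (Euclidean)."""
    distances = np.linalg.norm(data - query, axis=1)
    return np.argsort(distances)[:k]

query = X_reduced[0]
neighbors = nearest_rows(X_reduced, query, k=5)  # includes row 0 itself at distance 0
print(neighbors)
```

For a new (test) patient, reuse the training means, standard deviations, and `Vt` to project the new row, rather than refitting them.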

#

also: what kinds of features are these?

potent hemlock Feb 8, 2022, 4:02 PM

#

This is a patient health EHR dataset. I have a test patient and I need to find the list of patients from the training data who are closer to this test patient and then impute missing values
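For the imputation step, here is a rough sketch of nearest-neighbor imputation for a single test row. It assumes numeric, already-scaled features and a training matrix without missing values. Distances use only the features the test row actually has, and each missing feature is filled with the neighbors' average. The helper `knn_impute_row` is hypothetical; scikit-learn's `KNNImputer` implements a more complete version of this idea:

```python
import numpy as np

def knn_impute_row(train, row, k=3):
    """Fill NaNs in `row` with the mean of its k nearest training rows (hypothetical helper)."""
    observed = ~np.isnan(row)
    # Compare only on the features the test row actually has
    distances = np.linalg.norm(train[:, observed] - row[observed], axis=1)
    nearest = np.argsort(distances)[:k]
    filled = row.copy()
    filled[~observed] = train[nearest][:, ~observed].mean(axis=0)
    return filled

train = np.array([[1.0, 2.0, 3.0],
                  [1.1, 2.1, 3.3],
                  [0.9, 1.9, 2.7],
                  [8.0, 9.0, 10.0]])   # made-up, already-scaled training rows
test_row = np.array([1.0, 2.0, np.nan])
print(knn_impute_row(train, test_row, k=3))
```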

novel elbow Feb 8, 2022, 4:03 PM

#

try cosine distance

potent hemlock Feb 8, 2022, 4:04 PM

#

Sure, will try that. I thought cosine also suffers from the curse of dimensionality, though

novel elbow Feb 8, 2022, 4:05 PM

#

it's supposed to work better than euclidean in high dimensions

potent hemlock Feb 8, 2022, 4:05 PM

#

ok, thank you @novel elbow

desert oar Feb 8, 2022, 4:06 PM

#

novel elbow try cosine distance

i'd be skeptical of cosine distance on strongly-heterogeneous data, especially since some of these features might be categorical

#

that said, i'd be skeptical of all pairwise distances in that context, without first doing some dimension reduction

#

you'd at least need to be very careful to scale the features properly

#

also not sure if it matters, but cosine "distance" (1 - cosine similarity) is not a valid distance metric
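A three-vector sketch shows why it fails the metric axioms. Place a, b, and c at 0°, 45°, and 90°. Going a → b → c is then "shorter" than going straight from a to c, which breaks the triangle inequality. Also, any vector is at distance 0 from a scaled copy of itself:

```python
import numpy as np

def cosine_distance(u, v):
    """1 - cosine similarity."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])   # 45 degrees from both a and c
c = np.array([0.0, 1.0])

d_ac = cosine_distance(a, c)                                  # 1.0
d_ab_plus_bc = cosine_distance(a, b) + cosine_distance(b, c)  # about 0.586
print(d_ac > d_ab_plus_bc)  # True: the triangle inequality fails

# Different points can be at "distance" 0 if they point the same way
print(cosine_distance(a, 2 * a))  # 0.0
```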

potent hemlock Feb 8, 2022, 4:08 PM

#

@desert oar , so we need to do PCA first and then try distance metrics?

desert oar Feb 8, 2022, 4:08 PM

#

not necessarily PCA, but it might be a good idea

#

you can try it both ways of course!

#

actually that would be a good exercise imo

potent hemlock Feb 8, 2022, 4:09 PM

#

sure, thanks a lot!

odd meteor Feb 8, 2022, 4:19 PM

#

serene scaffold this makes me very sad.

This is how it looked on JNB

serene scaffold Feb 8, 2022, 4:20 PM

#

what language is Ndewo? Igbo?

odd meteor Feb 8, 2022, 4:20 PM

#

serene scaffold what language is Ndewo. Igbo?

Yeah 😀

serene scaffold Feb 8, 2022, 4:20 PM

#

I'm so fucking cultured.

odd meteor Feb 8, 2022, 4:21 PM

#

serene scaffold I'm so fucking cultured.

Hahaha do you have Igbo coworkers? 😀 Or is it just from movies?

serene scaffold Feb 8, 2022, 4:21 PM

#

Neither, I'm a c o m p u t a t i o n a l l i n g u i s t

odd meteor Feb 8, 2022, 4:22 PM

#

serene scaffold Neither, I'm a c o m p u t a t i o n a l l i n g u i s t

😀 Awesome. I'm impressed 💯

terse swallow Feb 8, 2022, 4:24 PM

#

Hello guys

#

Is a set the same as a dictionary?

serene scaffold Feb 8, 2022, 4:25 PM

#

terse swallow Is a `set` the same as a `dictionary` ?

some of the implementation details are the same, and they use some of the same concepts, but they're different.
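A short sketch of the shared idea and the difference: both are built on hashing, so membership checks are fast. A set stores only unique items, while a dict maps each unique key to a value. The names and values are illustrative:

```python
# A set stores unique items only; a dict maps each unique key to a value.
# Both are hash-based, which is the shared implementation idea.
names = {"alice", "bob", "alice"}     # the duplicate "alice" is dropped
ages = {"alice": 31, "bob": 27}       # illustrative values

print(len(names))                     # 2
print("alice" in names)               # True: fast membership test
print(ages["bob"])                    # 27: look up a value by its key
print(set(ages) == names)             # True: iterating a dict gives its keys

empty_set = set()   # note: {} creates an empty dict, not an empty set
empty_dict = {}
```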

terse swallow Feb 8, 2022, 4:26 PM

#

Oof

#

I am not a beginner and I don't know it

serene scaffold Feb 8, 2022, 4:27 PM

#

sets are pretty underrated.

terse swallow Feb 8, 2022, 4:27 PM

#

Oh

#

That's why I don't know it

serene scaffold Feb 8, 2022, 4:27 PM

#

though this isn't a data science question. if you open a help channel (see #❓｜how-to-get-help) I'll go over it with you briefly.

terse swallow Feb 8, 2022, 4:28 PM

#

serene scaffold though this isn't a data science question. if you open a help channel (see <#704...

Oh wait damn

#

I thought I was in general

#

Lol

serene scaffold Feb 8, 2022, 4:29 PM

#

@terse swallow go to #help-potato before someone else takes it

brazen spire Feb 8, 2022, 4:43 PM

#

#

shouldn't a bigger dataset give better results?

#

1500 gives me a lower loss than 4000

serene scaffold Feb 8, 2022, 4:58 PM

#

brazen spire shouldn't a bigger dataset give better results?

assuming that all datasets are of the same quality, more data is either good or unnecessary

brazen spire Feb 8, 2022, 4:59 PM

#

ok, that answers my question, thanks.

prisma mist Feb 8, 2022, 6:30 PM

#

conda-forge is returning server error code 403 when i try to install packages ... anybody else having this issue?

plush jungle Feb 8, 2022, 6:53 PM

#

when an RNN backpropagates through time, the weights get trained to more accurately take in a hidden state and an input and produce the correct output vector. This I understand. What I don't understand is why the weights for how the hidden state gets calculated are changed

#

there are two neural nets in this RNN:

```py
self.in2hidden = nn.Linear(input_size + hidden_size, hidden_size)
self.in2output = nn.Linear(input_size + hidden_size, output_size)
```

#

the in2output one gets better at producing the correct output when it backpropagates

desert oar Feb 8, 2022, 6:54 PM

#

brazen spire shouldn't a bigger dataset give better results?

are you taking increasingly large samples of the same dataset? or are these different datasets totally different from each other? are they simulated datasets, or real data?

plush jungle Feb 8, 2022, 6:54 PM

#

but how is the loss even calculated for the in2hidden one? what is loss for a hidden state?

desert oar Feb 8, 2022, 6:54 PM

#

if these are random samples, i am skeptical that the differences between the 1500, 2000, 2500, and 4000 runs are meaningful. it might be random sampling variation. i.e. the "variance" in the "bias-variance" tradeoff

plush jungle Feb 8, 2022, 6:55 PM

#

normally loss is the mean squared error between the guess and the correct answer

#

but there is no "correct hidden state"

#

so how would you even calculate it

desert oar Feb 8, 2022, 6:55 PM

#

plush jungle so how would you even calculate it

presumably it's propagating backwards from the actual labels on the data

#

that's how all neural networks work

#

conceptually, at least

#

information propagates backwards from the loss function

plush jungle Feb 8, 2022, 6:56 PM

#

desert oar presumably it's propagating backwards from the _actual_ labels on the data

if I have a dataset with 3 words, and the output looks like this

```py
[1,0,0]
```
but the correct word was
```py
[0,0,1]
```
then the mean squared error between those is the loss

#

but with the hidden state neural net

#

its output is a hidden state

#

let's say it has 6 neurons

#

then the hidden state is like this

```py
[0.5,0.5,0.5,0.5,0.5,0.5]
```

#

but when it back propagates, what does it compare that hidden state to?

desert oar Feb 8, 2022, 6:59 PM

#

is the hidden state an output?

#

the output is the predicted word

#

maybe in your case the hidden state maps 1:1 with the output

plush jungle Feb 8, 2022, 7:02 PM

#

if we consider an example with this dataset

```py
["alice", "saw", "bob"]
```
and 6 hidden layer neurons, then the first thing that happens is that the alice vector gets concatenated with the hidden state
```py
[1,0,0,0.5,0.5,0.5,0.5,0.5,0.5]
```

desert oar Feb 8, 2022, 7:02 PM

#

backprop starts at the loss L which is a function of y and the model parameters

plush jungle Feb 8, 2022, 7:02 PM

#

we pass that vector of length 9 to the network self.in2output

desert oar Feb 8, 2022, 7:02 PM

#

there's a nice explanation of "backpropagation through time" in RNNs here btw https://mmuratarat.github.io/2019-02-07/bptt-of-rnn

plush jungle Feb 8, 2022, 7:02 PM

#

and it predicts "alice" is the next word, which is wrong

#

it should say "saw" is the next word

#

then we calculate the error and backpropagate

desert oar Feb 8, 2022, 7:03 PM

#

right

plush jungle Feb 8, 2022, 7:03 PM

#

but we also have the next hidden state

#

we pass that 9 vector to self.in2hidden

#

and it outputs a 6 vector, the new hidden state

#

but I checked and the weights of self.in2hidden are updating after each backpropagation

desert oar Feb 8, 2022, 7:04 PM

#

yep, that propagates from the loss function as well

plush jungle Feb 8, 2022, 7:04 PM

#

so if self.in2output gets better at predicting the correct output

#

what does self.in2hidden get better at doing?

#

why change the weights?

#

and how does it change them, if the loss is based on the mse of the output

desert oar Feb 8, 2022, 7:05 PM

#

can you show the full model you wrote?

plush jungle Feb 8, 2022, 7:06 PM

#

yes

#

https://www.toptal.com/developers/hastebin/oxiquzaxug.properties

#

ok so I just skimmed the link you sent, and this makes it clear that the total loss at the end of the sentence is in fact used to update the weights of self.in2hidden

#

desert oar Feb 8, 2022, 7:10 PM

#

ahhh yeah

plush jungle Feb 8, 2022, 7:10 PM

#

but now my question is why

desert oar Feb 8, 2022, 7:11 PM

#

that was the confusion

#

n,n+1 vs n-1,n

#

the loss is computed on the next step

#

using the next hidden state

plush jungle Feb 8, 2022, 7:11 PM

#

i've got this

```py
for sentence in dataset:
    hidden_state = model.init_hidden()
    input_tensor = get_one_hot_sentence_tensor(sentence)

    loss = 0
    index = 0
    for word in input_tensor:
        if index+1 >= len(input_tensor):
            break
        output, hidden_state = model(word, hidden_state)
        current_loss = criterion(output, input_tensor[index+1])
        loss += current_loss
        index += 1

    optimizer.zero_grad()
    loss.backward()
    nn.utils.clip_grad_norm_(model.parameters(), 1)
    optimizer.step()
```

#

which adds the loss at each word

#

and then backpropagates at the end of the sentence

#

but my confusion is this:
self.in2output depends on the hidden state to make predictions
if you change the weights of self.in2hidden, doesn't that screw up the changes you just made to self.in2output?

desert oar Feb 8, 2022, 7:20 PM

#

let me look over this. but fwiw pytorch does have a built-in rnn module https://pytorch.org/docs/stable/generated/torch.nn.RNN.html

#

are you following this guide? https://jaketae.github.io/study/pytorch-rnn/ your code uses the same names 🙂

plush jungle Feb 8, 2022, 7:26 PM

#

desert oar are you following this guide? https://jaketae.github.io/study/pytorch-rnn/ your ...

yeah I wanted to build a vanilla RNN to help me better understand the difference between RNNs, LSTMs and transformers, so I followed that guide and added some of my own functions and datasets. so far it's been really successful

crimson prism Feb 8, 2022, 7:26 PM

#

Hi everyone! Could you explain to me what a Torch.gradient is?

plush jungle Feb 8, 2022, 7:26 PM

#

the only part I still don't fully understand is the BPTT

plush jungle Feb 8, 2022, 7:27 PM

#

crimson prism Hi everyone! Could you explain to me what a Torch.gradient is?

are you confused about what a gradient is or what pytorch's implementation of it is?

desert oar Feb 8, 2022, 7:28 PM

#

plush jungle yeah I wanted to build a vanilla RNN to help me better understand the difference...

this code uses the previous hidden state to compute the current hidden state. so this model computes loss at time t and information flows backward to hidden state at time t-1

#

the blog post computes loss at time t+1 and information flows backward to hidden state at time t

crimson prism Feb 8, 2022, 7:29 PM

#

plush jungle are you confused about what a gradient is or what pytorch's implementation of it...

I'm confused about what a gradient is

plush jungle Feb 8, 2022, 7:30 PM

#

crimson prism I'm confused about what a gradient is

when a neural net learns, it makes a guess. the computer then checks that guess against what the correct answer was

#

how badly it guessed is called the loss

#

you then use calculus (partial derivatives) to calculate the gradient

#

which tells you how the loss changes as each weight changes, so you can nudge every weight a small step in the direction that lowers the loss
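Here is "use the gradient to change the weights" made concrete. The sketch uses a single weight `w`, one made-up data point, and squared-error loss. The derivative is computed by hand, which is the job autograd would do, and the weight takes small steps against it:

```python
# One weight, one made-up training example: predict y from x with y_hat = w * x
x, y = 2.0, 6.0        # the "correct answer" is reached at w = 3
w = 0.0                # initial guess
learning_rate = 0.05

for step in range(100):
    y_hat = w * x
    loss = (y_hat - y) ** 2
    grad = 2 * (y_hat - y) * x      # d(loss)/dw by the chain rule
    w -= learning_rate * grad       # step *against* the gradient to lower the loss

print(round(w, 4))  # close to 3.0
```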

plush jungle Feb 8, 2022, 7:31 PM

#

desert oar the blog post computes loss at time `t+1` and information flows backward to hidd...

oh that's confusing

desert oar Feb 8, 2022, 7:31 PM

#

@plush jungle

    def forward(self, x, hidden_state):
        combined = torch.cat((x, hidden_state), 1)
        hidden = torch.sigmoid(self.in2hidden(combined))
        output = self.in2output(combined)
        return output, hidden

the naming in this code is confusing you. here is the same code with better variable names:

    def forward(self, curr_x, prev_hidden):
        curr_combined = torch.cat((curr_x, prev_hidden), 1)
        curr_hidden = torch.sigmoid(self.in2hidden(curr_combined))
        curr_output = self.in2output(curr_combined)
        return curr_output, curr_hidden
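Here is a sketch that answers "why does in2hidden change?" numerically, using a NumPy stand-in for the model. `W_h` and `W_o` play the roles of in2hidden and in2output (biases omitted), and the forward step mirrors forward() above. The loss is a summed squared error over next-word predictions, as described earlier in the thread; the actual `criterion` in the training loop may differ. Central finite differences stand in for autograd.

With three words, the second prediction uses a hidden state that W_h produced, so the loss depends on W_h. With only two words, W_h never reaches an output, so its gradient is exactly 0. All names here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 3, 6
W_h = rng.normal(scale=0.5, size=(hidden_size, vocab_size + hidden_size))  # role of in2hidden
W_o = rng.normal(scale=0.5, size=(vocab_size, vocab_size + hidden_size))   # role of in2output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sentence_loss(W_h, W_o, sentence):
    """Summed squared error for predicting each next word, mirroring forward()."""
    hidden = np.zeros(hidden_size)          # like init_hidden()
    loss = 0.0
    for current, target in zip(sentence[:-1], sentence[1:]):
        combined = np.concatenate([current, hidden])
        output = W_o @ combined             # uses the *previous* hidden state
        hidden = sigmoid(W_h @ combined)    # new hidden state, only used at the next step
        loss += np.mean((output - target) ** 2)
    return loss

def grad_wrt(W, loss_fn, eps=1e-6):
    """Central finite differences; a stand-in for autograd's loss.backward()."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[idx] += eps
        W_minus[idx] -= eps
        grad[idx] = (loss_fn(W_plus) - loss_fn(W_minus)) / (2 * eps)
    return grad

alice, saw, bob = np.eye(vocab_size)   # one-hot word vectors
g3 = grad_wrt(W_h, lambda W: sentence_loss(W, W_o, [alice, saw, bob]))
g2 = grad_wrt(W_h, lambda W: sentence_loss(W, W_o, [alice, saw]))
print(np.abs(g3).max() > 0)   # True: the 2nd prediction used a hidden state built by W_h
print(np.abs(g2).max())       # 0.0: with one prediction, W_h never touched an output
```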

crimson prism Feb 8, 2022, 7:32 PM

#

plush jungle which is how much to change the weights so that you get the right answer

So we set "requires_grad" to True when we want to make a neural net

#

Hmmmmmmm

desert oar Feb 8, 2022, 7:33 PM

#

crimson prism So we set the "required_grad" to True when we want to make a neural net

i recommend that you focus on understanding the equations first, at least on a conceptual level, before getting too deep into the technical details of pytorch

#

otherwise you will just confuse yourself and get lost in all the details

crimson prism Feb 8, 2022, 7:35 PM

#

desert oar i recommend that you focus on understanding the equations first, at least on a c...

Where can I study them? Cuz I tried to look at the documentation and it seems pretty messy to me

plush jungle Feb 8, 2022, 7:36 PM

#

desert oar @plush jungle def forward(self, x, hidden_state): ...

maybe I would understand better if I knew the order that backpropagation updates the weights in? does it update all of in2output and then all of in2hidden?

plush jungle Feb 8, 2022, 7:37 PM

#

crimson prism Where can I study them? Cuz I tried to look at the documentation and it seems pr...

what type of neural net are you trying to learn about? the structure of a CNN, an RNN, and a GAN are pretty different

plush jungle Feb 8, 2022, 7:39 PM

#

plush jungle maybe I would understand better if I knew the order that the back propagation up...

each of the arrows in this diagram is an adjustment of a weight matrix

#

but does the order matter at all? wouldn't you need to recalculate the loss every time you change the hidden state weight matrix?

desert oar Feb 8, 2022, 7:53 PM

#

crimson prism Where can I study them? Cuz I tried to look at the documentation and it seems pr...

yeah the documentation is written for people who already know the math. i think a course in deep learning would be a good resource, e.g. fast.ai or the Andrew Ng course. some universities also have courses, maybe MIT?

desert oar Feb 8, 2022, 7:54 PM

#

plush jungle maybe I would understand better if I knew the order that the back propagation up...

using the symbols from that post, it updates Whh and Wyh simultaneously

#

actually wait, no

#

i think you'd say it updates Wyh "first", because that's "after" Whh in the flow from input to output

#

basically you have to look at the chain rule and go from outside in

#

i don't know that it's helpful to think of it this way

#

because mathematically i don't think it matters

#

the output is computed first. that's the forward pass
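A scalar toy version shows what "going from outside in" means. It assumes a hypothetical two-weight model `y_hat = w2 * (w1 * x)` with squared-error loss, not the RNN from the blog post. The forward pass runs first; then the chain rule reaches the outer weight `w2` before the inner weight `w1`, but both partial derivatives come out of one calculation and would be applied together:

```python
def forward_backward(w1, w2, x, y):
    # forward pass: input -> hidden -> output -> loss
    h = w1 * x
    y_hat = w2 * h
    loss = (y_hat - y) ** 2
    # backward pass: apply the chain rule from the outside (the loss) in
    dloss_dyhat = 2 * (y_hat - y)
    dloss_dw2 = dloss_dyhat * h      # outer weight: one link of the chain
    dloss_dh = dloss_dyhat * w2
    dloss_dw1 = dloss_dh * x         # inner weight: reuses the outer links
    return loss, (dloss_dw1, dloss_dw2)

loss, grad = forward_backward(w1=0.5, w2=2.0, x=3.0, y=1.0)
# loss == 4.0, grad == (24.0, 6.0); an optimizer would update both weights at once
```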

plush jungle Feb 8, 2022, 7:55 PM

#

desert oar the output is computed _first_. that's the forward pass

in both my code and the blog post it goes all the way to the end and adds the loss right?

desert oar Feb 8, 2022, 7:55 PM

#

then the "backpropagation" stuff is just a metaphor for computing the gradient and updating all the weights at once

desert oar Feb 8, 2022, 7:55 PM

#

plush jungle in both my code and the blog post it goes all the way to the end and adds the lo...

well yeah, but it's based on the previous hidden state

#

so it pulls in both Wyh and Whh

#

but the important point is (and the answer to your question): gradient descent updates all the weights at once as a single vector

#

other optimization algorithms like coordinate descent actually update individual weights in a cycle

plush jungle Feb 8, 2022, 7:56 PM

#

desert oar but the important point is (and the answer to your question): gradient descent u...

oh!

desert oar Feb 8, 2022, 7:56 PM

#

which works great for certain problems, but not for deep learning

#

backprop is a metaphor, there is no actual "flow" in real-time. the "flow" is computed ahead-of-time as a single expression of the gradient vector

plush jungle Feb 8, 2022, 7:57 PM

#

wait, does it calculate the gradients separately and then apply them at the same time?

desert oar Feb 8, 2022, 7:57 PM

#

well there's one "gradient" - the vector of partial derivatives

plush jungle Feb 8, 2022, 7:57 PM

#

but each weight should have its own gradient right? because each weight needs to change by a different amount

desert oar Feb 8, 2022, 7:57 PM

#

each weight has its own partial derivative

#

the vector of all those partial derivatives is the gradient
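To make "one partial derivative per weight, collected into one vector" concrete, here is a sketch that approximates each partial derivative with central finite differences. The three-weight loss function is made up for illustration, and frameworks like PyTorch compute these partials exactly with automatic differentiation rather than this way:

```python
def numerical_gradient(loss_fn, weights, eps=1e-6):
    """Approximate the gradient: one partial derivative per weight (central differences)."""
    grad = []
    for i in range(len(weights)):
        bumped_up = list(weights)
        bumped_down = list(weights)
        bumped_up[i] += eps
        bumped_down[i] -= eps
        # partial derivative w.r.t. weight i, holding all the other weights fixed
        grad.append((loss_fn(bumped_up) - loss_fn(bumped_down)) / (2 * eps))
    return grad

# hypothetical loss with three weights: L(w) = w0^2 + 3*w1 + w0*w2
def loss(w):
    return w[0] ** 2 + 3 * w[1] + w[0] * w[2]

g = numerical_gradient(loss, [1.0, 2.0, -1.0])
# analytic gradient: [2*w0 + w2, 3, w0] = [1.0, 3.0, 1.0]
```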

plush jungle Feb 8, 2022, 7:58 PM

#

desert oar the vector of all those partial derivatives is the gradient

oh I see what you mean

#

when I said individual gradients I meant the elements of that vector

desert oar Feb 8, 2022, 7:58 PM

#

so "gradient descent" looks at the entire gradient as a single vector, which is often drawn as an arrow, to suggest the presence of both a magnitude and a direction

#

so the weight update (the "step") in gradient descent is 1 step in the direction in which the loss decreases fastest, i.e. along the negative gradient

#

imagine you have only 2 weights

#

gradient descent would be a step in any direction on the (x,y) plane

#

coordinate descent would be stepping only in the x direction first, then in the y direction, over and over
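A two-weight toy makes the contrast concrete. The loss below is an arbitrary bowl chosen for illustration, and the coordinate descent loop is the simplest gradient-step variant (real coordinate descent solvers often minimize exactly along each axis instead):

```python
def grad(x, y):
    # loss L(x, y) = (x - 1)**2 + 2 * (y + 2)**2, minimum at (1, -2)
    return 2 * (x - 1), 4 * (y + 2)

lr = 0.1

# gradient descent: every step moves both weights at once along -gradient
x, y = 0.0, 0.0
for step in range(100):
    gx, gy = grad(x, y)
    x, y = x - lr * gx, y - lr * gy
gd_result = (x, y)

# coordinate descent: alternate, stepping only in x, then only in y
x, y = 0.0, 0.0
for step in range(100):
    x = x - lr * grad(x, y)[0]   # step in the x direction only
    y = y - lr * grad(x, y)[1]   # then in the y direction only
cd_result = (x, y)
```

Both runs end up near the minimum at (1, -2) for this easy bowl; the difference is the path: one diagonal step per iteration versus alternating axis-aligned steps.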

#

the difference with deep learning is that it's not just (x,y), it's a huuuge vector of every single weight in the whole model

#

it quickly becomes intractable to try and reason about that kind of a space even in abstract terms

#

which is why it's so nice that we have things like gradient descent

plush jungle Feb 8, 2022, 8:01 PM

#

but in your 2D example

#

if we didn't have activation functions, or if we had a linear activation function, our weights and biases would be a line?

#

linear regression

#

and it's stuff like relu and sigmoid that make it a curve?

desert oar Feb 8, 2022, 8:02 PM

#

plush jungle and it's stuff like relu and sigmoid that make it a curve?

precisely! we use these nonlinear activation functions deliberately in order to introduce non-linearities in the model. so a neural network becomes a huge stack of these little nonlinear mini-models
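A quick scalar sketch of why the nonlinearity matters (weights chosen arbitrarily): without an activation, two stacked linear layers collapse into a single linear function, so the extra depth buys nothing; with a ReLU in between, they don't:

```python
def linear(w, b):
    """Return a 1-D 'layer' computing w * x + b."""
    def layer(x):
        return w * x + b
    return layer

def relu(z):
    return max(0.0, z)

layer1 = linear(2.0, 1.0)
layer2 = linear(-3.0, 0.5)

def stacked(x):
    # two linear layers, no activation in between
    return layer2(layer1(x))

# ... which equals ONE linear layer: w = -3 * 2, b = -3 * 1 + 0.5
collapsed = linear(-6.0, -2.5)

def with_relu(x):
    # a ReLU between the layers gives a kinked, genuinely nonlinear function
    return layer2(relu(layer1(x)))
```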

#

back in the 1950s i guess they thought we could use that to model the human brain. turns out that wasn't true at all, but hopefully you can see how lots of little nonlinear things all interacting can lead to very very complicated emergent behavior

#

and yes, you can express linear regression as a neural network with no hidden layers and linear activation function
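For example, here is linear regression written as a one-neuron "network" with an identity activation and no hidden layer, trained by gradient descent on mean squared error. The data are synthetic, generated from y = 2x + 1 purely for illustration:

```python
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]    # toy data from y = 2x + 1 (assumption)

w, b = 0.0, 0.0                 # one "neuron": a weight and a bias
lr = 0.02
n = len(xs)
for epoch in range(2000):
    preds = [w * x + b for x in xs]               # identity (linear) activation
    errors = [p - y for p, y in zip(preds, ys)]
    grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / n
    grad_b = sum(2 * e for e in errors) / n
    w -= lr * grad_w
    b -= lr * grad_b
# w and b end up at the ordinary least-squares fit: w ≈ 2, b ≈ 1
```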

plush jungle Feb 8, 2022, 8:04 PM

#

so could you say

#

that when the hidden state weights update

#

they're getting better at accurately storing the right patterns?

#

whereas the in2output weights are getting better at interpreting those patterns?

desert oar Feb 8, 2022, 8:06 PM

#

i wouldn't go that far with it. i'd say that the weights collectively represent some kind of encoded information about the training data

#

maybe you can say that different groups of those weights represent different kinds of information

#

and yeah, i guess you can call that "storage"

#

Wyh is a transformation from hidden state to output. Whh is a transformation between hidden states. so yeah, those two groups of weights probably encode/store different things

brazen spire Feb 8, 2022, 8:15 PM

#

desert oar are you taking increasingly large samples of the same dataset? or are these diff...

generated randomly

desert oar Feb 8, 2022, 8:15 PM

#

brazen spire generated randomly

i think you are just seeing random variation between runs

brazen spire Feb 8, 2022, 8:16 PM

#

So basically my test is useless?

desert oar Feb 8, 2022, 8:16 PM

#

not entirely useless, but you can't distinguish "results" from "random sampling variation"

#

for a better test, i recommend the following:

1. generate the N = 4000 dataset
2. generate the N < 4000 datasets by taking samples of the N = 4000 dataset

this way you are at least using the same data at each run

brazen spire Feb 8, 2022, 8:17 PM

#

great idea thanks

desert oar Feb 8, 2022, 8:22 PM

#

brazen spire great idea thanks

you also might want to re-run the entire procedure several times in order to estimate the variances

#

e.g. for every run of (1), run (2) several times

#

then re-run the entire 1-2 procedure several times
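The procedure above can be sketched like this. `generate_data` and `fit_and_score` are hypothetical placeholders for your own data generator and model-training code; the point is the nesting: every subset size is drawn from the same generated dataset, and the whole thing is repeated so you get a spread rather than a single number:

```python
import random
import statistics

def run_experiment(fit_and_score, generate_data, sizes, n_inner=5, n_outer=3, seed=0):
    """(1) generate the full dataset, (2) subsample it at each size; repeat both.

    fit_and_score and generate_data are hypothetical callables you supply.
    Returns {size: (mean_score, stdev_score)} across all repetitions.
    """
    rng = random.Random(seed)
    scores = {n: [] for n in sizes}
    for _ in range(n_outer):                     # re-run the whole 1-2 procedure
        full = generate_data(max(sizes), rng)    # step (1), e.g. the N = 4000 dataset
        for n in sizes:
            for _ in range(n_inner):             # step (2) several times per run of (1)
                subset = rng.sample(full, n)     # N < 4000 subsets of the same data
                scores[n].append(fit_and_score(subset))
    return {n: (statistics.mean(s), statistics.stdev(s)) for n, s in scores.items()}

# toy usage: the "model score" is just the sample mean of Gaussian noise
summary = run_experiment(
    fit_and_score=lambda data: statistics.mean(data),
    generate_data=lambda n, rng: [rng.gauss(0.0, 1.0) for _ in range(n)],
    sizes=[100, 1000, 4000],
)
```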

plush jungle Feb 8, 2022, 8:24 PM

#

desert oar `Wyh` is a transformation from hidden state to output. `Whh` is a transformation...

yeah this is what I'm not getting conceptually. The information encoded in wyh is clearly the patterns in the dataset, like syntax, semantics, etc. everything I've seen about whh just says it's the "memory" of the RNN, which means it encodes all words that have been seen before

#

but if it's just memory, why would you need weights?

#

it should always be the same, and never change

desert oar Feb 8, 2022, 8:25 PM

#

plush jungle yeah this is what I'm not getting conceptually. The information encoded in wyh ...

i don't know if that's right

#

the hidden state itself encodes the syntax and semantics

#

or at least some abstract representation thereof

#

Wyh turns that abstract hidden state into a real word

#

the hidden state is like an abstract representation of "where" you are in the sequence

#

Wyh just turns that abstract representation into an actual element of the sequence

#

Whh tells you how to transition between these abstract positions in the sequence, the "hidden states"
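A plain-Python sketch of one step of a textbook vanilla (Elman) RNN, using the post's `Whh`/`Wyh` naming. The input weight name `Wxh`, the tanh activation, the sizes, and all weight values are assumptions for illustration; the PyTorch code earlier in the thread wires things slightly differently (it concatenates input and hidden state):

```python
import math

def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def rnn_step(x, h_prev, Wxh, Whh, Wyh):
    """One time step of a vanilla RNN, biases omitted for brevity."""
    # Whh: transition from the previous hidden state to the current one,
    # combined with the current input through Wxh
    pre = [a + c for a, c in zip(matvec(Wxh, x), matvec(Whh, h_prev))]
    h = [math.tanh(z) for z in pre]
    # Wyh: read the abstract hidden state out as an output (e.g. word scores)
    y = matvec(Wyh, h)
    return y, h

# hypothetical sizes: 2-dim input, 3-dim hidden state, 2-dim output
Wxh = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2]]
Whh = [[0.1, 0.0, 0.2], [0.0, 0.3, -0.1], [0.2, 0.1, 0.0]]
Wyh = [[1.0, -1.0, 0.5], [0.2, 0.3, -0.4]]

h = [0.0, 0.0, 0.0]                          # initial hidden state
for x in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:
    y, h = rnn_step(x, h, Wxh, Whh, Wyh)     # the same weights at every step
```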

#

(hopefully this also helps elucidate why RNNs can't really model the full range of natural language spoken by real humans)

plush jungle Feb 8, 2022, 8:27 PM

#

in this code, the hidden state is being reinitialized at the beginning of every forward pass

    from torch import nn

    for sentence in dataset:
        # reset the hidden state at the start of each sentence
        hidden_state = model.init_hidden()
        input_tensor = get_one_hot_sentence_tensor(sentence)

        # step through the sentence, predicting each next word
        loss = 0
        for index in range(len(input_tensor) - 1):
            output, hidden_state = model(input_tensor[index], hidden_state)
            loss += criterion(output, input_tensor[index + 1])

        # backpropagate the summed loss, clip gradients, update the weights
        optimizer.zero_grad()
        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), 1)
        optimizer.step()

#

wouldn't that undo all the training?

#

why back propagate the hidden state if you're just gonna reset it

desert oar Feb 8, 2022, 8:27 PM

#

it's being re-initialized at the start of every individual sequence

plush jungle Feb 8, 2022, 8:27 PM

#

right sorry

desert oar Feb 8, 2022, 8:27 PM

#

otherwise you treat your dataset of sentences all as one big long sentence

plush jungle Feb 8, 2022, 8:27 PM

#

but still

#

each sentence you pass it is like a single training example

desert oar Feb 8, 2022, 8:28 PM

#

right. you build up the loss and gradient for that one example by stepping through one word at a time, starting from the initial hidden state

#

oh, i see

#

don't forget, this is stochastic gradient descent

#

we do one weight update for each data point

#

there is no batching
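The difference between "one weight update per data point" and batching, sketched on a one-weight toy model. The model, data, and learning rate are assumptions for illustration; in the RNN loop above, one "data point" is a whole sentence:

```python
import random

def sgd_epoch(w, data, lr, rng):
    """One epoch of plain SGD on y ≈ w * x: one weight update per data point, no batching."""
    data = list(data)
    rng.shuffle(data)                    # visit examples in random order
    for x, y in data:
        grad = 2 * (w * x - y) * x       # gradient of this single example's squared error
        w -= lr * grad                   # update immediately, before the next example
    return w

def batch_gd_step(w, data, lr):
    """For contrast: full-batch gradient descent averages over all examples, then updates once."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]   # toy data from y = 3x (assumption)
rng = random.Random(0)
w = 0.0
for epoch in range(50):
    w = sgd_epoch(w, data, lr=0.05, rng=rng)
```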

#

the hidden state is not itself a learned weight

#

we aren't re-initializing the weights

plush jungle Feb 8, 2022, 8:29 PM

#

oh

#

yeah you're right

desert oar Feb 8, 2022, 8:29 PM

#

we pick a sentence, step through it computing the gradient and loss, then do one weight update. then re-initialize the hidden state and repeat

plush jungle Feb 8, 2022, 8:29 PM

#

that's not whh

#

that's h

desert oar Feb 8, 2022, 8:29 PM

#

yep, exactly

plush jungle Feb 8, 2022, 8:30 PM

#

in an image recognition neural net, the hidden layer represents sub patterns that it's found in the dataset

crimson prism Feb 8, 2022, 8:31 PM

#

plush jungle what type of neural net are you trying to learn about? the structure of a CNN, ...

Dunno, I just started learning about Pytorch tbh

plush jungle Feb 8, 2022, 8:31 PM

#

so I assume wyh represents subpatterns in grammar

#

but whh still confuses me

crimson prism Feb 8, 2022, 8:31 PM

#

desert oar yeah the documentation is written for people who already know the math. i think ...

I'll check it, thanks!

plush jungle Feb 8, 2022, 8:31 PM

#

crimson prism Dunno, I just started learning about Pytorch tbh

machine learning is super different depending on what you're trying to accomplish

#

image recognition, natural language processing, text/video/image generation

#

what interests you the most?

#

learn about the specific applications before learning about pytorch in general

#

the other way around would be like learning what a screwdriver is before learning about screws

#

but you can't go wrong with something like this video
https://www.youtube.com/watch?v=aircAruvnKk

3Blue1Brown

But what is a neural network? | Chapter 1, Deep learning

crimson prism Feb 8, 2022, 8:34 PM

#

plush jungle image recognition, natural language processing, text/video/image generation

Text/video and image generation in the first place

crimson prism Feb 8, 2022, 8:34 PM

#

plush jungle the other way around would be like learning what a screwdriver is before learnin...

That makes sense ngl

plush jungle Feb 8, 2022, 8:35 PM

#

crimson prism Text/video and image generation in the first place

text/video generation is done using RNNs, LSTMs, or transformers, because there is sequence data

#

image generation is done using GANs

crimson prism Feb 8, 2022, 8:35 PM

#

plush jungle text/video generation is done using RNNs, LSTMs, or transformers, because there ...

In order to study Image/video generation I also have to know about OpenCV?

plush jungle Feb 8, 2022, 8:36 PM

#

OpenCV is a screwdriver

#

machine learning models like transformers or GANs are screws

#

which interests you more, text/video, or image generation?

crimson prism Feb 8, 2022, 8:37 PM

#

plush jungle which interests you more, text/video, or image generation?

I think image generation

plush jungle Feb 8, 2022, 8:38 PM

#

ok, then you want to look into GANs

crimson prism Feb 8, 2022, 8:38 PM

#

I would like to look at them all btw

plush jungle Feb 8, 2022, 8:38 PM

#

well start with image classifier neural nets

#

since a GAN is just a classifier with extra steps

crimson prism Feb 8, 2022, 8:38 PM

#

For image recognition what do I have to know instead?

desert oar Feb 8, 2022, 8:38 PM

#

plush jungle but whh still confuses me

The transitions between hidden states represent semantic transitions within the sentence.

#

But keep in mind that this is just a model

plush jungle Feb 8, 2022, 8:39 PM

#

crimson prism For image recognition what do I have to know instead?

are you familiar with the MNIST dataset?

desert oar Feb 8, 2022, 8:39 PM

#

Like I said, actual natural language spoken by humans is generally a lot more sophisticated than a sequence of "states"

crimson prism Feb 8, 2022, 8:39 PM

#

plush jungle are you familiar with the MNIST dataset?

Nnnope

plush jungle Feb 8, 2022, 8:40 PM

#

crimson prism Nnnope

it's 70,000 images of handwritten digits 0-9 (60,000 for training and 10,000 for testing). it's very commonly used to test image classification neural nets

#

in order to understand GANs and image generation, you should try to code an MNIST classifier first to understand classifiers

#

there are a bunch of tutorials for MNIST

crimson prism Feb 8, 2022, 8:43 PM

#

plush jungle there are a bunch of tutorials for MNIST

Oh ok, so in order to know about machine learning I firstly have to study the mathematics and MNIST/GAN, right?

plush jungle Feb 8, 2022, 8:43 PM

#

crimson prism Oh ok, so in order to know about machine learning I firstly have to study the ma...

machine learning is a super broad category. GANs are the state of the art in image generation. to understand GANs you need to understand classifiers

#

tbh the math isn't super difficult unless you're an actual data scientist building your own model

#

try following this tutorial

#

https://nextjournal.com/gkoehler/pytorch-mnist

plush jungle Feb 8, 2022, 8:45 PM

#

plush jungle but you can't go wrong with something like this video https://www.youtube.com/wa...

and watch the 3blue1brown video on neural nets

crimson prism Feb 8, 2022, 8:46 PM

#

Got it! Thanks so much!

plush jungle Feb 8, 2022, 8:46 PM

#

feel free to pm me if you get stuck or have questions

iron basalt Feb 8, 2022, 8:54 PM

#

desert oar the hidden state itself encodes the syntax and semantics

It's important to note that this is a model trying to predict a specific label; it's not generative, so it may not fully model the sequences, only what it needs to correctly predict the labels, which could be way less than the total. Ofc, this has the upside that it needs fewer weights and you can get away with a simple RNN, but if you wanted something to actually model the sequences (in totality), it would need to be generative and capture everything. @plush jungle

plush jungle Feb 8, 2022, 8:55 PM

#

iron basalt It's important to note that this is a model trying to predict a specific label, ...

what I'm still struggling with is why have 2 neural nets? could you build an RNN where the Whh weights don't get updated?

iron basalt Feb 8, 2022, 8:55 PM

#

(It also means that it does not generalize as well)

plush jungle Feb 8, 2022, 8:55 PM

#

effectively creating 1 neural net

iron basalt Feb 8, 2022, 8:58 PM

#

plush jungle what I'm still struggling with is why have 2 neural nets? could you build an RN...

What do you mean 2 neural networks?

desert oar Feb 8, 2022, 8:59 PM

#

plush jungle what I'm still struggling with is why have 2 neural nets? could you build an RN...

you don't have 2 neural networks. you have 2 different weight matrices. that's all a nn.Linear represents: a matrix of weights, a "linear transformation"
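Conceptually, `nn.Linear(in_features, out_features)` stores a weight matrix `W` of shape `(out_features, in_features)` and, by default, a bias vector `b`, and maps an input `x` to `W x + b`. A plain-Python sketch of that computation (an illustration of the math, not PyTorch's implementation; the numbers are arbitrary):

```python
def linear_layer(W, b, x):
    """What a linear layer computes: y = W x + b (W has one row per output)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i for row, b_i in zip(W, b)]

W = [[1.0, 2.0, 3.0],    # 2 outputs x 3 inputs
     [0.0, -1.0, 0.5]]
b = [0.1, -0.2]
y = linear_layer(W, b, [1.0, 1.0, 2.0])
# y ≈ [9.1, -0.2]
```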

plush jungle Feb 8, 2022, 8:59 PM

#

yeah, that's a more accurate way of putting it

#

but if all self.in2hidden is doing is adding another layer of depth

desert oar Feb 8, 2022, 9:00 PM

#

the model has 2 stages: Whh turns the previous hidden state into the current hidden state, and Wyh turns the hidden state into the observed sequence value

plush jungle Feb 8, 2022, 9:00 PM

#

couldn't you just remove it?

#

and have a less deep neural net?

desert oar Feb 8, 2022, 9:00 PM

#

then you're just modeling transitions between sequence values directly. which... sure? but that's not the model

iron basalt Feb 8, 2022, 9:00 PM

#

If you can draw a graph of the neurons and the connections between them, and you can find a path from any neuron to some neuron in question, that neuron in question is part of the same network.

#

(ignoring the direction-ness of it)

#

When we say that we have two networks working together, it's just a useful split for us to understand what is happening. Although in the end when it runs, it's all just one big blob / mess.

#

(aka the "black box")

#

Well, it depends what you are doing, there are too many different types of networks to say that it's just one big black box in the end for all of them.

plush jungle Feb 8, 2022, 9:05 PM

#

so if it's all one neural net

#

and you can unroll it

#

what would the net look like with 3 time steps?

iron basalt Feb 8, 2022, 9:06 PM

#

For an RNN?

plush jungle Feb 8, 2022, 9:06 PM

#

yeah

#

#

because if you look at this

#

the wyh and whh weight matrices are running in parallel

#

so I guess you could think of them as the same layer?

iron basalt Feb 8, 2022, 9:10 PM

#

The unrolling is through time. You really only have the thing on the left, in terms of the neural network metaphor. The unrolling is because of the weight update method chosen.
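For the 3-time-step question, a toy: unrolling just writes the time loop out as a chain of copies of the same cell. This scalar sketch (made-up weights, biases omitted) shows that the loop and the hand-unrolled version compute the same thing with the same three weights, which is also why backpropagation through time sums the gradient contributions from every copy:

```python
import math

# scalar RNN with hypothetical weights; the SAME three weights are used at every step
w_xh, w_hh, w_hy = 0.8, 0.5, 1.5

def step(x, h_prev):
    h = math.tanh(w_xh * x + w_hh * h_prev)
    return w_hy * h, h

xs = [1.0, -0.5, 2.0]

# rolled form: a loop over time
h = 0.0
outputs = []
for x in xs:
    y, h = step(x, h)
    outputs.append(y)

# unrolled form: three copies of the same cell wired one after another
h1 = math.tanh(w_xh * xs[0] + w_hh * 0.0)
h2 = math.tanh(w_xh * xs[1] + w_hh * h1)
h3 = math.tanh(w_xh * xs[2] + w_hh * h2)
unrolled = [w_hy * h1, w_hy * h2, w_hy * h3]
```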

upbeat prism Feb 8, 2022, 10:19 PM

#

hi, I use r"$\frac{1}{2}$" to use LaTeX in matplotlib labels. I'd also like to use the f prefix. How can I combine them?

tidal bough Feb 8, 2022, 10:21 PM

#

fr is valid.

#

or rf, doesn't matter
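One catch when combining the prefixes: in an f-string, `{` and `}` mark interpolation, so literal LaTeX braces have to be doubled (`{{` and `}}`). A small sketch (the variable `n` is just for illustration); the resulting string can be passed to a matplotlib label as usual:

```python
n = 2
label = rf"$\frac{{1}}{{{n}}}$"   # {{ and }} become literal braces; {n} is interpolated
print(label)  # $\frac{1}{2}$
```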

upbeat prism Feb 8, 2022, 10:23 PM

#

hmm ok, let me retry then

fluid sigil Feb 9, 2022, 12:51 AM

#

Hi all! I need to do binary classification of time series data. The problem is that I have a few true samples, and then the rest of the data, where there might be more unlabeled true samples. I am assuming I can't just label the rest of the data as negative, since that isn't true. Has anybody encountered this challenge before or know how to approach it? Thanks!

thin palm Feb 9, 2022, 12:59 AM

#

hi Python gang, if we aren't working with Time Series do we need to keep our datetime64 feature? how can this help us?

mild dirge Feb 9, 2022, 1:12 AM

#

fluid sigil Hi all! I need to do binary classification of time series data. The problem that...

Well part of training a model is it being able to distinguish between the two classes, if you don't know which datapoints belong to which class, it will be hard to train the model

#

If you have some data labeled and the rest unlabeled it wouldn't be such a problem

#

but the way you put it, you only have one class labeled partially

#

and the rest is unlabeled

#

@fluid sigil

quiet vault Feb 9, 2022, 1:13 AM

#

thin palm hi Python gang, if we aren't working with Time Series do we need to keep our dat...

No, you do not. Having the data in chronological order and putting it into supervised form works

#

https://machinelearningmastery.com/how-to-develop-deep-learning-models-for-univariate-time-series-forecasting/ here is a good example
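"Putting it into supervised form" usually means sliding a window over the chronologically ordered values, so that past values become the features and the next value becomes the label. A minimal sketch (the function name `to_supervised` is hypothetical, and the series is made up):

```python
def to_supervised(series, n_steps):
    """Turn an ordered series into (X, y): each X row is n_steps past values, y is the next one."""
    X, y = [], []
    for i in range(len(series) - n_steps):
        X.append(series[i:i + n_steps])
        y.append(series[i + n_steps])
    return X, y

X, y = to_supervised([10, 20, 30, 40, 50, 60], n_steps=3)
# X == [[10, 20, 30], [20, 30, 40], [30, 40, 50]], y == [40, 50, 60]
```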

thin palm Feb 9, 2022, 1:15 AM

#

quiet vault No you do not, having the data in chronological order and putting it into superv...

So for example, I'm working on a machine learning model that uses Warren Buffett's approach to choosing stocks. We've gathered all this awesome data and we have a year column, but how useful is it? I'm wondering whether to just drop it, since it's just the year

quiet vault Feb 9, 2022, 1:16 AM

#

I believe it is not useful

thin palm Feb 9, 2022, 1:23 AM

#

quiet vault I believe it is not useful

thank you

thin palm Feb 9, 2022, 2:16 AM

#

So, question about scaling, my friends: when we see that a feature isn't normally distributed, do we use scaling techniques to get it there, or should we transform the feature, e.g. take its log? Which of the two should we do, or are they the same thing, just different techniques?
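These are different operations. Scaling (min-max or standardization) is a linear transformation, so it changes the units but not the shape of the distribution, while a log transform is nonlinear and does change the shape, e.g. by pulling in a long right tail. A small sketch with a made-up, strictly positive feature (the log only works for positive values):

```python
import math
import statistics

def skewness(values):
    """Skewness (population formula): 0 for symmetric data, > 0 for a long right tail."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def standardize(values):
    m, s = statistics.fmean(values), statistics.pstdev(values)
    return [(v - m) / s for v in values]

# a right-skewed toy feature (made up for illustration)
feature = [1, 2, 2, 3, 3, 3, 4, 5, 8, 13, 40, 100]

print(round(skewness(feature), 3))
print(round(skewness(standardize(feature)), 3))             # same shape: only shifted and stretched
print(round(skewness([math.log(v) for v in feature]), 3))   # log compresses the long tail
```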

desert oar Feb 9, 2022, 2:34 AM

#

Introductory courses for Machine Learning and Deep Learning

MIT 6.S191, Intro to Deep Learning: http://introtodeeplearning.com/

Fast.ai online courses: https://www.fast.ai/

Andrew Ng's classic Machine Learning: https://www.coursera.org/learn/machine-learning

Note: This is a living list! Please @ me if you have additional suggestions.

#

@serene scaffold pin? ☝️

serene scaffold Feb 9, 2022, 2:36 AM

#

desert oar **Introductory courses for Machine Learning and Deep Learning** MIT 6.S191, Int...

> not doing a PR to the resources folder of the site repo

desert oar Feb 9, 2022, 2:36 AM

#

i didn't know if it counted as a site resource!

#

we have a machine learning section?

#

!resources

serene scaffold Feb 9, 2022, 2:37 AM

#

https://www.pythondiscord.com/resources/?topics=data-science

#

note the ?topics=data-science

#

anyway, by our own criteria, we can't add Andrew Ng's course, since it's in Octave or whatever.

desert oar Feb 9, 2022, 2:37 AM

#

blah, true

#

well the other two are python

serene scaffold Feb 9, 2022, 2:38 AM

#

(all resources have to present the information in Python or be language-agnostic)

desert oar Feb 9, 2022, 2:38 AM

#

is there a "how to add a resource for stupid people" page

serene scaffold Feb 9, 2022, 2:38 AM

#

I'll do it lemon_hyperpleased

desert oar Feb 9, 2022, 2:38 AM

#

ty

#

i was looking in https://github.com/python-discord/site

serene scaffold Feb 9, 2022, 2:39 AM

#

@desert oar they're all in here: https://github.com/python-discord/site/tree/main/pydis_site/apps/resources/resources

#

if you just want to write the blurb for the other two, I will do the rest.

desert oar Feb 9, 2022, 2:48 AM

#

serene scaffold if you just want to write the blurb for the other two, I will do the rest.

hm.. i haven't actually taken each course start to finish. i have watched sections of both videos though

serene scaffold Feb 9, 2022, 3:24 AM

#

desert oar hm.. i haven't actually taken each course start to finish. i have watched sectio...

well, let me know if you decide to write those.

potent sky Feb 9, 2022, 5:10 AM

#

6.S191 is pretty good

stray nymph Feb 9, 2022, 5:28 AM

#

hi

#

does anyone have any useful links

#

for outlier detection

#

for classification

river maple Feb 9, 2022, 9:09 AM

#

im getting this error while converting the .weights into the corresponding TensorFlow model files

real arch Feb 9, 2022, 10:50 AM

#

hi guys. is PyCharm recommended for data science?

lapis sequoia Feb 9, 2022, 10:58 AM

#

real arch hi guys. is PyCharm recommended for data science?

not exactly recommended, but it's a good option https://datasciencenerd.com/is-pycharm-good-for-data-science/ As a beginner, I prefer Jupyter Notebooks though, because it's much easier to make code segments that I can run individually. I want to check the result after every few lines, because I'm still learning. If you're already more skilled maybe you won't need that anymore

real arch Feb 9, 2022, 11:29 AM

#

lapis sequoia not exactly recommended, but it's a good option https://datasciencenerd.com/is-p...

ah i see. I'm a total newbie and just started studying with a module. Right now I use Google Colab 😅 because I don't need to install anything, but Colab needs an internet connection. My friend said Anaconda is better than Colab, but I'd like other recommendations. I will try Jupyter. Thanks a lot.

lapis sequoia Feb 9, 2022, 11:30 AM

#

real arch ah i see. I'm a total newbie and just started studying with a module. Right now I use Google Colab ...

yeah Anaconda also includes Jupyter, but Anaconda has too many other options for us beginners. Glad to help 🙂

real arch Feb 9, 2022, 11:33 AM

#

lapis sequoia yeah Anaconda also includes Jupyter, but Anaconda has too many other options for...

my computer isn't very powerful. I already installed it yesterday, but maybe I should only install Jupyter. How about VS Code?

stray nymph Feb 9, 2022, 11:35 AM

#

hi

#

https://machinelearningmastery.com/model-based-outlier-detection-and-removal-in-python/

#

im tryna do smth like this

#

but for classification

#

would i have to change the baseline model to logistic regression?

lapis sequoia Feb 9, 2022, 11:44 AM

#

real arch my computer isn't very powerful. I already installed it yesterday, but maybe I should...

yeah that's what I mean, only Jupyter is best. VS Code I've only used a bit with Python, didn't like it at all, very unreliable with package imports

indigo harness Feb 9, 2022, 11:45 AM

#

Hello, I'm in need of suggestions or guidance on finding material on learning python since I'll be needing it for my AI module at uni, any help?

agile cobalt Feb 9, 2022, 11:52 AM

#

indigo harness Hello, I'm in need of suggestions or guidance on finding material on learning py...

!resources in general - though most of the time AI doesn't really require much in-depth knowledge about the python language, just enough to glue things together.
If you have experience with any other language, or if they introduce the language a bit, you should be fine

arctic wedgeBOT Feb 9, 2022, 11:52 AM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

lapis sequoia Feb 9, 2022, 12:06 PM

#

    import tensorflow as tf

    def my_metric_fn(y_true, y_pred):
        # print the shapes Keras passes to the metric
        print('=>', y_true.shape, y_pred.shape)
        return tf.norm(y_true - y_pred)

    # model, X_train, y_train, X_test, y_test are defined elsewhere
    model.compile(
        optimizer='adam',
        loss='mean_squared_error',
        metrics=[my_metric_fn])
    # fit the keras model on the dataset
    history = model.fit(X_train, y_train, epochs=150, batch_size=8, verbose=0)
    # evaluate the keras model: returns [loss, my_metric_fn value]
    mse, norm_metric = model.evaluate(X_test, y_test)
    # print('Accuracy: %.2f' % (mse*100))

this is my code, now here i see shape of y_true and y_pred as (8,8) and (8,8)

it's worth noting that I have 50 columns, and the y I get at the end is a vector of 8.

So i assume we have this (8,8) as in 8 vectors?

#

please let me know if my understanding is correct.

#

I basically want to predict the vector, or get as close as possible to the test vector. Which metric should I use here?
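On the shapes: with `batch_size=8` and an 8-dimensional output, the `(8, 8)` is most likely one batch of 8 samples × 8 values each. Note that `tf.norm(y_true - y_pred)` with default arguments treats the whole batch as one flattened vector and returns a single number; for one distance per vector you would pass `axis=-1` and then average, e.g. `tf.reduce_mean(tf.norm(y_true - y_pred, axis=-1))`. The NumPy sketch below (random data, for illustration only) shows the difference:

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.normal(size=(8, 8))    # a batch of 8 samples, each an 8-dim target vector
y_pred = y_true + 0.1 * rng.normal(size=(8, 8))

diff = y_true - y_pred
whole_batch = np.linalg.norm(diff)             # ONE number over all 64 entries (Frobenius)
per_sample = np.linalg.norm(diff, axis=-1)     # shape (8,): Euclidean distance per vector
mean_distance = per_sample.mean()              # average "how far off" per predicted vector
```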

real arch Feb 9, 2022, 12:46 PM

#

lapis sequoia yeah that's what I mean, only Jupyter is best. VS Code I've only used a bit with...

thanks for your answer 👍 😆

south storm Feb 9, 2022, 12:50 PM

#

real arch my computer isn't very powerful. I already installed it yesterday, but maybe I should...

VS Code works perfectly fine on a low-end PC like mine, but I personally recommend PyCharm

#

hello guys, I'd like some help creating an AI assistant. Any help regarding modules and so on would be appreciated

brittle latch Feb 9, 2022, 2:04 PM

#

I've been assigned to develop a CNN for binary image classification. The images are mostly of people, and the classes represent whether a well-known celebrity is present in the image. I thought it would be wise to get some feedback because I don't have much experience with neural networks beyond creating toy examples.

I have URLs for a few hundred images to use for training and validation, and I'm downloading them right now. As a first step, I'm planning to modify a pre-trained classifier to output predictions for two classes. I expect this won't work very well, and I'll have to experiment with hyperparameters to get a good model. Once the images are in hand, I'll then have to process them and perhaps create more training images by flipping or blurring.

Right now I'm planning to use r-keras for creating the model, which is more or less an R interface to keras. I know Python, but I'm more comfortable with R for machine learning, and in any case the remainder of the project is entirely in R. Of course, I might use a different framework if r-keras proves too constraining.

Given my inexperience, I was wondering whether I'm omitting anything or planning to do something that's a bad idea. The classification task is for academic research, and my supervisor says it's on a small enough scale that it could feasibly be done by a human coder, but that automating tasks like this is a better practice, which I wholeheartedly support.

kind rock Feb 9, 2022, 3:07 PM

#

lapis sequoia hm history came as handy ```py history.history['accuracy'][::10] ``` is good eno...

I actually found out something. .evaluate() gives the loss (plus any compiled metrics) on the data you pass it, whereas .predict() gives the model's outputs. So .evaluate() takes it a step further: it computes the outputs and then the loss w.r.t. the cost function.
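Roughly, the relationship looks like this plain-Python sketch. It is not Keras's implementation (the real methods work batch-wise on tensors); `model_fn` is a hypothetical stand-in for a trained model, and the loss is assumed to be mean squared error:

```python
def predict(model_fn, xs):
    """Like model.predict: run the model forward and return its outputs."""
    return [model_fn(x) for x in xs]

def evaluate(model_fn, xs, ys):
    """Like model.evaluate with an MSE loss: predict, then score the outputs against the labels."""
    preds = predict(model_fn, xs)
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

def model_fn(x):
    return 2 * x    # hypothetical "trained model"

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 7.0]
preds = predict(model_fn, xs)        # [2.0, 4.0, 6.0]
loss = evaluate(model_fn, xs, ys)    # (0 + 0 + 1) / 3
```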

lapis sequoia Feb 9, 2022, 3:08 PM

#

kind rock I actually found out something. .evaluate() would give the loss for an example w...

hm but that is for testing no?

kind rock Feb 9, 2022, 3:08 PM

#

serene scaffold yes

Gotcha

kind rock Feb 9, 2022, 3:10 PM

#

desert oar i don't know if i agree with the recommendation to avoid anaconda entirely. i th...

Basically @serene scaffold was talking about how anaconda isn't used in industries and hence I thought of what might be the replacement for using anaconda

serene scaffold Feb 9, 2022, 3:10 PM

#

I wouldn't go as far as to say that it "isn't used in industries" (other companies might use it, but mine is deliberately trying to move away from it)

kind rock Feb 9, 2022, 3:11 PM

#

lapis sequoia hm but that is for testing no?

Yeah. Those are functions used for evaluating models.

kind rock Feb 9, 2022, 3:11 PM

#

serene scaffold I wouldn't go as far as to say that it "isn't used in industries" (other compani...

Oh, alright.

lapis sequoia Feb 9, 2022, 3:12 PM

#

kind rock Yeah. Those are functions used for evaluating models.

oh i see, yeah alr!

kind rock Feb 9, 2022, 3:13 PM

#

cool

#

One question. I've seen people computing the cost function as:
Bias + w1x1 + w2x2 + ... + error
Why is the error term being used? Plus, I've also seen videos where the error term is left out of the function.

vast light Feb 9, 2022, 4:05 PM

#

kind rock One question. I've seen people computing the cost function as : Bias + w1x1 + w2...

I think without adding error, small changes in the model cause the model to become infeasible.

odd meteor Feb 9, 2022, 4:17 PM

#

kind rock One question. I've seen people computing the cost function as : Bias + w1x1 + w2...

Hi Absy, that's not how to compute the loss function. The equation below is the original regression equation.

Y = B0 + B1x1 + B2x2 + e

where:
Y = Response/outcome variable (in ML lingo 'Label')
X = Explanatory variables (in ML lingo 'Features')
B0 = Intercept (in ML lingo 'Bias')
B1 & B2 = Slopes (in ML lingo 'Weights')
e = The error/residual term (in my class I recall a prof also calling it Gaussian noise)

We need to factor in the error because at its core the model is making a calculated guess (Y_hat), which is likely not always gonna match the actual Y exactly. And surely as you know, any given guess would be off by some margin (which is the error term).

If we take out the error term or refuse to acknowledge it in the first place, then we most probably won't be able to find a function that fits the data. Even if we do, the function will be so horrible that we wouldn't wanna use it on any new data.
It's just as good as saying "I have a coconut head therefore I'm not interested in my model generalising"

=================

Once we've fitted the model to the data, the equation for the predicted value becomes

Y_hat = B0 + B1x1 + B2x2

So by including the error term in the first equation, we can say we have a stochastic (statistical) model, as opposed to a deterministic model where the error term is always zero.

We're interested in having a stochastic model because it acknowledges that our estimates involve some level of uncertainty.
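A minimal NumPy sketch of the two equations side by side, assuming simulated data (the coefficients 3.0, 1.5, -2.0 and the noise level are made up for illustration). The data are generated with an error term e, the model is fitted by ordinary least squares (OLS), the fitted Y_hat has no error term, and the residuals Y - Y_hat are the estimates of e that the MSE loss is computed from.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
e = rng.normal(0, 1.0, n)            # the error term: Gaussian noise
Y = 3.0 + 1.5 * x1 - 2.0 * x2 + e    # Y = B0 + B1*x1 + B2*x2 + e

# Column of ones so that B0 (intercept / bias) is estimated too
X = np.column_stack([np.ones(n), x1, x2])
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

Y_hat = X @ B_hat         # fitted equation: no error term
residuals = Y - Y_hat     # estimates of e
mse = np.mean(residuals ** 2)

print(B_hat)  # should be close to [3.0, 1.5, -2.0]
print(mse)    # should be close to the noise variance, 1.0
```

Because the design matrix includes an intercept column, the OLS residuals sum to (numerically) zero.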

kind rock Feb 9, 2022, 4:49 PM

#

odd meteor Hi Absy, that's not how to compute the loss function. The equation below is the ...

So by bringing in the error term, we're accounting for a sort of error which the model would make and hence preventing the model from overfitting. So then why use the bias term too?

kind rock Feb 9, 2022, 4:51 PM

#

vast light I think without adding error, small changes in model cause model to become unfea...

by small changes do you mean that changing the hyperparameters of the model would make the model unfeasible in the context of not including the error term?

odd meteor Feb 9, 2022, 5:23 PM

#

kind rock So by bringing in the error term, we're accounting for a sort of error which the...

Not exactly, that alone doesn't prevent overfitting. Rather, I'd say once we admit we have an error term in the first place, we can analyze and minimize the error by first taking the difference between Y and Y_hat, and subsequently minimizing our loss function (MSE).

Remember some of the OLS assumptions regarding the residuals, yeah?

i) No autocorrelation, i.e. the error terms are uncorrelated with each other.
ii) Homoscedasticity, i.e. the error terms have a constant variance (alongside the assumption that they have a mean of zero).

Residuals are our estimates of the error term. Hence if the residuals don't look independent and identically distributed, then perhaps it's time to fit a different function, change our predictors, change our estimation method, or even transform our data.
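A rough NumPy sketch of checking those two assumptions on residuals. The helper names `durbin_watson` and `variance_ratio` are hypothetical here, and the simulated residuals are made up. The Durbin-Watson statistic (about 2 means no first-order autocorrelation; statsmodels also ships one as `statsmodels.stats.stattools.durbin_watson`) and a crude split-half variance ratio, in the spirit of the Goldfeld-Quandt test, are only quick diagnostics, not formal tests.

```python
import numpy as np

def durbin_watson(residuals):
    """~2: no first-order autocorrelation; toward 0: positive, toward 4: negative."""
    diff = np.diff(residuals)
    return float(np.sum(diff ** 2) / np.sum(residuals ** 2))

def variance_ratio(residuals, fitted):
    """Residual variance for the upper half of fitted values divided by the
    lower half; roughly 1 if the variance is constant (homoscedastic)."""
    order = np.argsort(fitted)
    half = len(order) // 2
    low, high = residuals[order[:half]], residuals[order[half:]]
    return float(np.var(high) / np.var(low))

rng = np.random.default_rng(1)
fitted = np.linspace(0, 10, 500)
white = rng.normal(0, 1, 500)                      # i.i.d. residuals
fanning = rng.normal(0, 1, 500) * (0.2 + fitted)   # spread grows with the fit
walk = np.cumsum(rng.normal(0, 1, 500))            # strongly autocorrelated
walk -= walk.mean()

print(durbin_watson(white), variance_ratio(white, fitted))
print(durbin_watson(walk))
print(variance_ratio(fanning, fitted))
```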

olive shore Feb 9, 2022, 5:39 PM

#

Does a dataset need to be in a specific format in order to work

#

I'm trying to make a dataset

serene scaffold Feb 9, 2022, 5:52 PM

#

olive shore Does a dataset need to be in a specific format in order to work

it needs to be consistent. beyond that, it's up to the data scientist to transform the dataset into an input for what they're trying to do.

soft viper Feb 9, 2022, 5:58 PM

#

I still dont understand what data warehouse is

desert bear Feb 9, 2022, 6:01 PM

#

Hey, soon I will be conducting exercises on unsupervised learning. Can you suggest what algorithms or exercises would be interesting? I was thinking about using unsupervised learning to detect anomalies in data using IsolationForest etc.

serene scaffold Feb 9, 2022, 6:03 PM

#

desert bear Hey, soon I will be conducting exercises regarding unsupervised learning. Can yo...

k-means clustering is usually the go-to example of unsupervised learning.
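A minimal NumPy sketch of k-means (Lloyd's algorithm) on two made-up Gaussian blobs. In practice `sklearn.cluster.KMeans` is the usual choice; this version only shows the two alternating steps: assign each point to its nearest centroid, then move each centroid to the mean of its points.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random initial centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every centroid: shape (n, k)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its assigned points (keep it if empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

rng = np.random.default_rng(42)
blob_a = rng.normal([0, 0], 0.5, size=(100, 2))
blob_b = rng.normal([5, 5], 0.5, size=(100, 2))
X = np.vstack([blob_a, blob_b])
labels, centroids = kmeans(X, k=2)
print(np.round(centroids, 2))
```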

serene scaffold Feb 9, 2022, 6:06 PM

#

soft viper I still dont understand what data warehouse is

I think it's basically the same thing as a database, but for storing data that can be used for analysis, rather than for persisting data for a web service, or something.

#

a coworker recently introduced me to the idea of a "data lake", which I guess is the less-structured analogue of a data warehouse. The content of a data warehouse has a specific structure that the creator of the warehouse intended, whereas a data lake contains data in whatever format it was found.

#

Also, with all these terms, I think there's a point at which people just try to coin terms to prove how relevant they are, or something. The distinction between a "database" and a "data warehouse" and a "data lake" is probably situation-dependent.

safe elk Feb 9, 2022, 6:15 PM

#

serene scaffold a coworker recently introduced me to the idea of a "data lake", which I guess is...

I have heard of data lakes too and their definition is similar

#

But it was a couple of years ago and not that recent

median linden Feb 9, 2022, 6:21 PM

#

Could someone help me out please?
I have two numpy arrays.
The first (input) is 27 columns of 1s and 0s, 5478 rows total
The second (output) is 3 columns of 1s and 0s, 5478 rows total
I'm using the tensorflow library on google colab, but I'm unsure which kind of model I should use for this, or which training method I should use

#

Any advice?

serene scaffold Feb 9, 2022, 6:29 PM

#

median linden Could someone help me out please? I have two numpy arrays. The first (input) is ...

it would be more concise to say that you have two arrays of ones and zeros, of shapes (5478, 27) and (5478, 3). but we can't really tell you what to do with them unless we understand what they represent and what you want the model to be able to do.

median linden Feb 9, 2022, 6:32 PM

#

serene scaffold it would be more concise to say that you have two arrays of ones and zeros, of s...

The first array represents boards in tic tac toe
The second array represents the score for the board (winning, drawing, losing)

#

I want a model which can take a board state, and tell me if it is winning, drawing, or losing
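One plausible reading of the 27 input columns is 9 cells times 3 one-hot states (empty, X, O); that layout is an assumption, not something stated above. A small NumPy sketch of encoding and decoding a board under that assumption, with hypothetical helper names:

```python
import numpy as np

# Assumed layout: cell c, state s -> column 3*c + s, with states (empty, X, O)
STATES = {" ": 0, "X": 1, "O": 2}

def encode_board(board):
    """board: 9-character string read row by row, e.g. 'XO  X   O'."""
    features = np.zeros((9, 3), dtype=np.int8)
    for cell, mark in enumerate(board):
        features[cell, STATES[mark]] = 1
    return features.reshape(27)

def decode_board(features):
    marks = " XO"
    idx = features.reshape(9, 3).argmax(axis=1)
    return "".join(marks[i] for i in idx)

empty = encode_board(" " * 9)
x = encode_board("XO  X   O")
print(x.shape, x.sum())  # (27,) 9: exactly one 1 per cell
```

Stacking 5478 such rows gives the (5478, 27) input array; the (5478, 3) target would then be one-hot win/draw/loss labels.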

serene scaffold Feb 9, 2022, 6:33 PM

#

I see. And I assume you know that you can get 100% accuracy with a simple set of rules, and that this is just an exercise?

median linden Feb 9, 2022, 6:34 PM

#

It's my first time making any neural net, so it's just an exercise to learn

serene scaffold Feb 9, 2022, 6:34 PM

#

ah. well, is 5478 every possible tic tac toe board for a completed game?

median linden Feb 9, 2022, 6:35 PM

#

I know I dont need a neural net to do it, i just want to see how to do it

median linden Feb 9, 2022, 6:35 PM

#

serene scaffold ah. well, is 5478 every possible tic tac toe board for a completed game?

its every board position you can reach from the beginning empty board

#

including the empty board as well, which is drawing

serene scaffold Feb 9, 2022, 6:36 PM

#

median linden its every board position you can reach from the beginning empty board

I'm not exactly sure what neural architecture would be right for this, but you want to deliberately overfit the model. do you know what overfitting is?

median linden Feb 9, 2022, 6:36 PM

#

serene scaffold I'm not exactly sure what neural architecture would be right for this, but you w...

yes

#

its an intentional overfitting

serene scaffold Feb 9, 2022, 6:36 PM

#

since there's a known, finite set of inputs for this model, you might as well make a model that just memorizes them.

median linden Feb 9, 2022, 6:37 PM

#

I don't know how to go about that

serene scaffold Feb 9, 2022, 6:37 PM

#

I'm not exactly sure myself. I would look into how to make neural networks that memorize the training data.

median linden Feb 9, 2022, 6:38 PM

#

at the moment I'm trying to use a sequential model with 25 layers, compiling it with mean squared error as the loss, and then fitting the model

serene scaffold Feb 9, 2022, 6:38 PM

#

(which again, undermines the whole point of neural networks, but this is just for education.)

serene scaffold Feb 9, 2022, 6:39 PM

#

median linden at the moment in trying to use a model that is sequential, with 25 layers, and t...

you aren't "compiling the model for mean squared error" per se. you're compiling the model, and the loss function you've selected is mean squared error.
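For a three-column one-hot target like the win/draw/loss array described above, a softmax output layer with categorical cross-entropy is the more common pairing than MSE (in Keras that is `loss="categorical_crossentropy"` passed to `model.compile`). A NumPy sketch of both losses on made-up logits, just to show what each one computes:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for stability
    exp = np.exp(z)
    return exp / exp.sum(axis=-1, keepdims=True)

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def categorical_crossentropy(y_true, probs, eps=1e-12):
    # Mean over examples of -log(probability assigned to the true class)
    return float(np.mean(-np.sum(y_true * np.log(probs + eps), axis=-1)))

y_true = np.array([[1, 0, 0], [0, 0, 1]])              # hypothetical: win, loss
logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])  # hypothetical raw outputs
probs = softmax(logits)

print(mse(y_true, probs))
print(categorical_crossentropy(y_true, probs))
```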

median linden Feb 9, 2022, 6:39 PM

#

yes

#

I don't know the correct grammar or terminology for discussing neural nets 😅

serene scaffold Feb 9, 2022, 6:40 PM

#

also, the concept of "compiling a model" is specific to tensorflow/keras. it's just a design/terminology choice that they made, rather than a concept in neural network theory.

median linden Feb 9, 2022, 6:41 PM

#

yeah, i tried writing this exact model in python (without using the TF libraries), but it was taking way too long on my machine

serene scaffold Feb 9, 2022, 6:41 PM

#

yeah, you don't want to do machine learning in "pure python".

#

do you have a GPU?

#

well, I guess you're using colab.

#

the advantage of GPUs is that they're massively parallel on the inside, so they can crunch numbers much faster.

#

until the value returned by the loss function is always zero, I guess.

#

yeah, neural networks are computationally expensive to train without a GPU

#

Colab lets you use a GPU on Google's cloud. that's the main advantage of it.

#

the answer is rarely obvious for this kind of thing.

stone marlin Feb 9, 2022, 6:48 PM

#

Howdy y'all. Because I'm terrible at CV, I thought I'd bug you peeps for some thoughts. I'll give the toy problem + my current approach + my ask here. This is not an interview problem or homework, this is me trying to be less garbagio at some CV stuff.

Toy Problem Context: Suppose you have a video which is, for simplicity, 10px width by 10px length. Say that in this video is a pixel moving up and down in a sinusoidal manner --- but, sometimes, it'll "jitter" and break the sine. The problem is to find the time periods where the "jitters" occur.

My Plan of Attack: I'd like to make this into a timeseries, since I know how to do anom detection on that stuff. Since I haven't worked much on CV, I'm not exactly sure what techniques are used for things like this, if any. Right now, I'm taking the indices where a pixel is at t = t0 and plotting that as a TS, which works okay, but I feel it is not scalable to things repeating with 2D motion (eg, pixel going around in a circle, or slightly more complex shapes --- we can project these to 1D, but I'm not sure how well it will do).
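A minimal NumPy sketch of turning frames into a position time series, assuming a hypothetical toy video (one lit pixel in a fixed column moving sinusoidally, jumping to a random row on chosen "jitter" frames). Taking the (row, col) of the brightest pixel per frame gives a 2D series, which also covers 2D motion such as a circle without projecting to 1D first.

```python
import numpy as np

def make_video(n_frames=200, size=10, period=40, jitter_frames=(), seed=0):
    """Hypothetical toy video of shape (T, H, W) with one lit pixel per frame."""
    rng = np.random.default_rng(seed)
    frames = np.zeros((n_frames, size, size))
    amp = (size - 1) / 2
    for t in range(n_frames):
        row = int(round(amp + amp * np.sin(2 * np.pi * t / period)))
        if t in jitter_frames:
            row = int(rng.integers(0, size))  # the "jitter": a random jump
        frames[t, row, size // 2] = 1.0
    return frames

def pixel_track(frames):
    """(T, H, W) video -> (T, 2) array of (row, col) of the brightest pixel."""
    T, H, W = frames.shape
    flat_idx = frames.reshape(T, -1).argmax(axis=1)
    rows, cols = np.unravel_index(flat_idx, (H, W))
    return np.column_stack([rows, cols])

frames = make_video(jitter_frames={50, 120})
track = pixel_track(frames)
row_series = track[:, 0]  # 1D time series for the up-and-down motion
```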

My Ask: Given this, does anyone have resources for how people usually take CV-things like this and detect repetitive behavior?

hexed schooner Feb 9, 2022, 7:57 PM

#

anyone did reinforcement learning on LunarLander before?

mint palm Feb 9, 2022, 7:58 PM

#

Is it possible to have output of NN within like 15 to 20ms??

odd meteor Feb 9, 2022, 8:18 PM

#

kind rock So by bringing in the error term, we're accounting for a sort of error which the...

Bias is quite different from the error term. In stats it's called the intercept, i.e. the expected value of Y when all your explanatory variables (Xi) = 0.

Some would also interpret the intercept as the point where your function crosses the y-axis.

odd meteor Feb 9, 2022, 8:26 PM

#

mint palm Is it possible to have output of NN within like 15 to 20ms??

In milliseconds? 🤔 Well, impossible is nothing these days - - so I'd say it depends on the size of your data and the kind of architecture your NN has.

mild dirge Feb 9, 2022, 9:35 PM

#

and your hardware 😛

zinc tiger Feb 9, 2022, 9:36 PM

#

^

#

Was about to say: small models take about 60 ms per batch to train on my 2060

#

Inference is bound to be a lot less.

upper spindle Feb 9, 2022, 11:44 PM

#

has anyone got experience with a Neural network?

#

specifically a lstm?

serene scaffold Feb 10, 2022, 12:10 AM

#

@upper spindle try asking your actual question about LSTMs, not if anyone knows about them.

upper spindle Feb 10, 2022, 12:11 AM

#

ohh okay, sorry im quite new to using public discord chats

#

does anyone know how to set up an LSTM model for forecasting cryptocurrency prices with reddit post/comment sentiment values?

serene scaffold Feb 10, 2022, 12:14 AM

#

upper spindle ohh okay, sorry im quite new to using public discord chats

No problem. On this Discord, or any other real-time chat, you're always more likely to get an answer if you jump right in to your question and give people enough information to start answering it should they glance at the channel.

serene scaffold Feb 10, 2022, 12:15 AM

#

upper spindle does anyone know how to setup a LSTM model for forecasting cryptocurrencies with...

have you already figured out how to scrape reddit posts/comments (using the reddit API only)?

upper spindle Feb 10, 2022, 12:15 AM

#

serene scaffold have you already figured out how to scrape reddit posts/comments (using the redd...

i have yes

upper spindle Feb 10, 2022, 12:15 AM

#

serene scaffold No problem. On this Discord, or any other real-time chat, you're always more lik...

thanks for that

serene scaffold Feb 10, 2022, 12:16 AM

#

upper spindle i have yes

great! and I'm assuming the posts/comments have timestamps. Do you also have a list of crypto values by date (so you know the value of that currency over time)?

upper spindle Feb 10, 2022, 12:17 AM

#

serene scaffold great! and I'm assuming the posts/comments have timestamps. Do you also have a l...

yes, and i do have the crypto values (imported from yahoo)

iron basalt Feb 10, 2022, 12:17 AM

#

stone marlin Howdy y'all. Because I'm terrible at CV, I thought I'd bug you peeps for some t...

Not really, it depends on the problem and there are many different approaches out there. You can find endless outlier or anomaly detection papers out there for vision related tasks, but they almost all depend on the specific problem context. One neat thing about video is that a human can directly watch it and spot anomalies, but humans can't really go through a giant table of numbers unless they graph it. Vision is the dominant input form. In general you will probably find some time series prediction thing (if it's a time series problem (standard stuff)) and then that is used to detect outliers or anomalies (can be supervised, semi-supervised, or unsupervised). In this specific problem you are not expecting sudden jumps, but smooth motion, so you can use that to determine if something was a "jitter".

serene scaffold Feb 10, 2022, 12:17 AM

#

upper spindle yes, and i do have the crypto values (imported from yahoo)

How do you know that the reddit content that you're using relates to the current or future value of cryptos? are you only pulling from specific subreddits?

iron basalt Feb 10, 2022, 12:18 AM

#

CV is often more focused on finding out where that particle would be based on the image input. After you know where it is you can do normal stuff.

stone marlin Feb 10, 2022, 12:19 AM

#

Hm, yeah, I'll prob look around the landscape and see if I can transform this problem into something I know. That's good to know, though, thank you. I've got my little crummy model now, and for more complex things I'm just mapping in a weird way to 1D and doing a time series from that. :']

upper spindle Feb 10, 2022, 12:19 AM

#

only pulling from subreddits (bitcoin, ethereum, solana), i have pulled all of the comments and posts for 2020 & 2021

stone marlin Feb 10, 2022, 12:19 AM

#

I'll share it when I figure out something cool, and y'all can laugh at me and/or tell me what they'd do. :p

iron basalt Feb 10, 2022, 12:19 AM

#

stone marlin Hm, yeah, I'll prob look around the landscape and see if I can transform this pr...

There are methods for mapping motion to 1D.

stone marlin Feb 10, 2022, 12:20 AM

#

Re: the mappings, yeah, I honestly think that's what I'm gonna be using. I'm trying different ones, but this kind of seems like the easiest possible way to go.

#

I found a few papers on it, but, yeah, very situational. I'm using very easy ones now (circular motion, mostly) and that'll prob work, but I think it'd be cool to mess with some other ones.

iron basalt Feb 10, 2022, 12:21 AM

#

The problem with methods that map to 1D is that they are often used in a way that also filters out noise, so the jitters would be ignored.

upper spindle Feb 10, 2022, 12:22 AM

#

```py
import praw
import pandas as pd
from psaw import PushshiftAPI
import datetime as dt

r = praw.Reddit(client_id="...", client_secret="...", user_agent="Sentiment")
api = PushshiftAPI(r)

start_epoch = int(dt.datetime(2020, 1, 1, 0, 0, 0).timestamp())
end_epoch = int(dt.datetime(2021, 12, 31, 23, 59, 59).timestamp())

# The loop reads post fields (title, selftext, num_comments), so search
# submissions; api.search_comments returns comments, which lack those fields
posts = api.search_submissions(after=start_epoch, before=end_epoch, subreddit='Solana', limit=1000000)

submissions = []
for post in posts:
    submissions.append(
        {"Subreddit ID": post.subreddit_id,
         "Subreddit": post.subreddit,
         "Title": post.title,
         "Body": post.selftext,
         "Number of Comments": post.num_comments,
         "Score": post.score,
         "Time": post.created,
         "URL": post.url,
         "Post ID": post.id,
        }
    )

df = pd.DataFrame(submissions)
```

iron basalt Feb 10, 2022, 12:22 AM

#

But you can totally have it accept all the noise.

upper spindle Feb 10, 2022, 12:22 AM

#

upper spindle ```import praw import pandas as pd from psaw import PushshiftAPI import datetime...

That is my code above to get the comments

stone marlin Feb 10, 2022, 12:23 AM

#

Yeah; right now, I'm mostly doing a kind of "threshold" on a grayscale image and projecting its darkest points down and working with that. It works okay for my specific problem, but it def would not work for anything more complicated.

iron basalt Feb 10, 2022, 12:23 AM

#

CV's bigger issue is dealing with all the complexity / noise of finding out where the particle is in the first place.

#

After that it's regular time series stuff on a 2D input.

#

Well, maybe 4D if you include velocity via motion detection.

#

Ofc, there is also all of https://en.wikipedia.org/wiki/Particle_filter , since this is a physics-y thing.

Particle filter

Particle filters, or sequential Monte Carlo methods, are a set of Monte Carlo algorithms used to solve filtering problems arising in signal processing and Bayesian statistical inference. The filtering problem consists of estimating the internal states in dynamical systems when partial observations are made, and random perturbations are present i...

#

To help out in this context of moving particles.

stone marlin Feb 10, 2022, 12:25 AM

#

Huh, neat. This seems similar to Kalman, but I've not used a lot of either.

iron basalt Feb 10, 2022, 12:27 AM

#

Example usage: https://www.youtube.com/watch?v=elqAh3GWRpA

YouTube

FitMC

The Fall of Minecraft's 2b2t

Today we will discuss how the most powerful exploit in server history caused the fall of Minecraft's 2b2t, the oldest anarchy server in the game, and the fallout of the events that took place.

#

https://2b2t.miraheze.org/wiki/Nocom

2b2t Wiki

Nocom

Nocom was a coordinate exploit used by Nerds Inc on 2b2t from July 2018 to July 2021.

#

"After being located, a Monte Carlo particle filter was used to keep up with the player. This is an adaptive system that learns over time (after just a few seconds, really) where the player probably is, and their movement speed."

stone marlin Feb 10, 2022, 12:30 AM

#

Huh, alright, coolio. I'll check this stuff out. Thanks for the references!

iron basalt Feb 10, 2022, 12:41 AM

#

stone marlin Huh, alright, coolio. I'll check this stuff out. Thanks for the references!

Also you could use Fourier analysis for this specific problem (find the sinusoidal while ignoring jitters and then predict with it).
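One possible reading of that suggestion, as a NumPy sketch: keep only the strongest frequency bin of the FFT as the "clean" sinusoid, then flag samples whose residual from that fit exceeds a robust (MAD-based) threshold. The series, noise level, and jitter positions are made up, and it assumes the window holds a whole number of periods (otherwise spectral leakage means you'd keep more bins or apply a window).

```python
import numpy as np

def detect_jitters(series, n_harmonics=1, threshold=5.0):
    """Fit the dominant sinusoid(s) via the FFT, then flag samples whose
    residual is more than `threshold` robust standard deviations away."""
    x = np.asarray(series, dtype=float)
    spectrum = np.fft.rfft(x - x.mean())
    # Keep only the strongest frequency bin(s); zero out the rest
    keep = np.argsort(np.abs(spectrum))[-n_harmonics:]
    filtered = np.zeros_like(spectrum)
    filtered[keep] = spectrum[keep]
    fit = np.fft.irfft(filtered, n=len(x)) + x.mean()
    resid = x - fit
    centered = np.abs(resid - np.median(resid))
    mad = np.median(centered)
    scale = 1.4826 * mad if mad > 0 else resid.std()  # MAD -> sigma for Gaussian noise
    return np.flatnonzero(centered > threshold * scale), fit

t = np.arange(400)
clean = 5 * np.sin(2 * np.pi * t / 40)   # 10 full periods in 400 samples
rng = np.random.default_rng(0)
series = clean + rng.normal(0, 0.2, t.size)
series[[97, 250]] += [4.0, -4.0]         # injected "jitters"

jitter_idx, fit = detect_jitters(series)
print(jitter_idx)
```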

#

Anyhow, CV is more focused on even knowing where the objects are in the first place, than it is with where the objects will go (except for some basic motion detection (probably optical flow)). Well, it's still part of the CV problem, but after you used some CV method to figure out the higher level details (like "my object is over here"), you can use regular stuff. This all changes if you want to do something end-to-end-like (when you start doing NN stuff).

gilded bobcat Feb 10, 2022, 12:42 AM

#

Hi all, I had a question. I am running two models, a simple one and a complex one. In the complex one I interact polynomial variables with categorical variables. I've noticed one categorical variable always screws up my MSE and R2, as in individuals that have that categorical variable == 1 tend to be huge outliers when we predict them. How can I deal with this problem ex ante? It's not the case that individuals who have categorical variable == 1 also have large Y variables, so I am very very confused.

#

I have any output + code needed, this is for a learning project, nothing too serious

#

just making me feel crazy

#

@mild dirge you were helping me with this yesterday, I have more or less pinpointed everything down to what variable is messing me up, now learning how to actually deal with it haha

mild dirge Feb 10, 2022, 12:51 AM

#

are you normalizing data yet?

iron basalt Feb 10, 2022, 12:53 AM

#

iron basalt Also you could use Fourier analysis for this specific problem (find the sinusoid...

Example: https://en.wikipedia.org/wiki/Tide-predicting_machine

Tide-predicting machine

A tide-predicting machine was a special-purpose mechanical analog computer of the late 19th and early 20th centuries, constructed and set up to predict the ebb and flow of sea tides and the irregular variations in their heights – which change in mixtures of rhythms, that never (in the aggregate) repeat themselves exactly. Its purpose was to shor...

gilded bobcat Feb 10, 2022, 12:53 AM

#

No, could you send information on that? I have another suspicion: the data has an experience variable and "exp1" "exp2" "exp3" "exp4", however if these were polynomials they don't mathematically make sense

stone marlin Feb 10, 2022, 12:54 AM

#

Yeah, that's approx what I'm doing after I get the time series out. :'] Haha, it's a sweet solution.

mild dirge Feb 10, 2022, 12:54 AM

#

gilded bobcat No, could you send information on that? I have another suspicion, the data has a...

It's making sure the data lies in the range of 0 to 1 for every variable

iron basalt Feb 10, 2022, 12:54 AM

#

"They came to be regarded as of military strategic importance during World War I,[4] and again during the Second World War, when the US No.2 Tide Predicting Machine, described below, was classified, along with the data that it produced, and used to predict tides for the D-Day Normandy landings and all the island landings in the Pacific war."

gilded bobcat Feb 10, 2022, 12:55 AM

#

Ahhh I see, just put everything into standard devs?

mild dirge Feb 10, 2022, 12:55 AM

#

often MinMaxScaling is used
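A small NumPy sketch of the two options being discussed: min-max scaling to [0, 1] (what `sklearn.preprocessing.MinMaxScaler` does by default) versus z-scores in standard deviations (what `StandardScaler` does). The experience column and its square are hypothetical example data. In a real pipeline, the min/max or mean/std should be computed on the training data only and reused on the test data, which the scikit-learn scalers handle via `fit` and `transform`.

```python
import numpy as np

def min_max_scale(X):
    """Rescale each column to [0, 1]: (x - min) / (max - min)."""
    X = np.asarray(X, dtype=float)
    col_min, col_max = X.min(axis=0), X.max(axis=0)
    return (X - col_min) / (col_max - col_min)

def standardize(X):
    """z-scores: each column in standard deviations from its mean."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Hypothetical columns: years of experience and its square
experience = np.array([1.0, 3.0, 5.0, 10.0, 20.0])
X = np.column_stack([experience, experience ** 2])

print(min_max_scale(X))
print(standardize(X))
```

Both divide by a per-column spread, so a constant column would cause a division by zero.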

gilded bobcat Feb 10, 2022, 12:55 AM

#

interesting

mild dirge Feb 10, 2022, 12:55 AM

#

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html

scikit-learn

sklearn.preprocessing.MinMaxScaler


iron basalt Feb 10, 2022, 12:55 AM

#

Kelvin was way too smart.

gilded bobcat Feb 10, 2022, 12:55 AM

#

Beautiful, I'll look into it. I'm coming from econ research, so we normally put stuff in standard devs and let it rip

#

thanks for your help pccamel, much appreciated

iron basalt Feb 10, 2022, 1:03 AM

#

stone marlin Yeah, that's approx what I'm doing after I get the time series out. :'] Haha, ...

Another fun solution is using a self organizing map: https://en.wikipedia.org/wiki/Self-organizing_map

Self-organizing map

A self-organizing map (SOM) or self-organizing feature map (SOFM) is an unsupervised machine learning technique used to produce a low-dimensional (typically two-dimensional) representation of a higher dimensional data set while preserving the topological structure of the data. For example, a data set with p variables measured in n observations c...

#

It can map the image to 1D, and similar positions of the point will have similar positions in the 1D column.

#

(Next to each other in output)
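A minimal NumPy sketch of a 1D (chain-shaped) self-organizing map, assuming the point's positions have already been extracted as (x, y) pairs. The node count, decay schedules, and the half-circle arc used as data are arbitrary illustrative choices. Each step pulls the best-matching node and its chain neighbours toward a sample; afterwards, the index of a position's best-matching node serves as its 1D coordinate, and nearby positions tend to get nearby indices.

```python
import numpy as np

def train_som_1d(data, n_nodes=20, n_iter=3000, lr0=0.5, sigma0=None, seed=0):
    """Kohonen-style SOM whose nodes form a 1D chain."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    weights = data[rng.choice(len(data), n_nodes)] + rng.normal(0, 0.01, (n_nodes, data.shape[1]))
    sigma0 = sigma0 if sigma0 is not None else n_nodes / 2
    nodes = np.arange(n_nodes)
    for i in range(n_iter):
        frac = i / n_iter
        lr = lr0 * (1 - frac)                   # learning rate decays linearly
        sigma = max(sigma0 * (1 - frac), 0.5)   # neighbourhood shrinks over time
        x = data[rng.integers(len(data))]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
        # Gaussian neighbourhood along the chain, centred on the BMU
        h = np.exp(-((nodes - bmu) ** 2) / (2 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights

def map_to_1d(weights, points):
    """Index of the best-matching node for each point (its 1D coordinate)."""
    points = np.atleast_2d(points)
    d2 = ((points[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

theta = np.linspace(0, np.pi, 200)   # hypothetical positions along an arc
positions = np.column_stack([np.cos(theta), np.sin(theta)])
weights = train_som_1d(positions)
idx = map_to_1d(weights, positions)
```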

stone marlin Feb 10, 2022, 1:04 AM

#

Woah, this looks legit, I've never heard of this one!

iron basalt Feb 10, 2022, 1:25 AM

#

stone marlin Woah, this looks legit, I've never heard of this one!

Hopefully my original answer makes more sense now, I can't really provide resources because everyone just kind of does their own weird thing that just works for them.

stone marlin Feb 10, 2022, 1:25 AM

#

Yeah, these are good resources, I'll try'em out and see what works! Thank you!

iron basalt Feb 10, 2022, 1:25 AM

#

And IDK of any summary blogs.

#

I guess these days many would immediately jump to LSTM or something, but I think that's often a bad idea due to the amount of computational power needed, training time, and LSTM is not even very good anyhow (at least use GRU).

#

A decent to good solution almost always involves the information about the specific problem and LSTM does not really make use of that by default.

#

It's sort of like brute forcing the problem (because it's generic prediction with no specific parts to the problem / just massive gradient descent with lots of parameters) vs realizing that you can do much less work.

desert oar Feb 10, 2022, 3:20 AM

#

kind rock Basically @serene scaffold was talking about how anaconda isn't used in i...

i definitely used it in industry and it was much better than the alternatives, because containerization was not an acceptable option for our workflow

#

but it still has some "sharp edges" and has a learning curve to work with it

thin palm Feb 10, 2022, 3:45 AM

#

What's up Python gang, when we are one-hot encoding, are we supposed to scale our features BEFORE or AFTER?

#

Sometimes I forget simple steps in this process :((

serene scaffold Feb 10, 2022, 3:48 AM

#

@thin palm just so we're on the same page, what do you mean by scale?

thin palm Feb 10, 2022, 3:51 AM

#

serene scaffold @thin palm just so we're on the same page, what do you mean by scale?

Scale as in feature scaling for example I'll be using RobustScaler()

#

but not sure if I should first One Hot Encode my column and then scale or do opposite

serene scaffold Feb 10, 2022, 3:57 AM

#

thin palm Scale as in feature scaling for example I'll be using RobustScaler()

Do you understand what problem one-hot encoding is intended to solve?

thin palm Feb 10, 2022, 4:00 AM

#

serene scaffold Do you understand what problem one-hot encoding is intended to solve?

To my knowledge, it lets our machine learning model interpret text as numbers.

serene scaffold Feb 10, 2022, 4:00 AM

#

thin palm To my knowledge it is the ability for our machine learning model to interpret te...

It's not just text. It can be used to represent any discrete feature.

Do you know what discrete means?

#

In this case, it might be more accurate to say that one hot encoding is for representing nominal features. So instead, tell me what you think a nominal feature is.

#

I don't know if you're still there, but since one-hot encoding involves having a vector of all 0s except for one 1, and that vector represents a value that is not quantitative (like a word, in your case), there is nothing to scale.
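A NumPy sketch of that workflow on made-up data: one-hot encode the nominal column, robust-scale only the numeric column as (x - median) / IQR (the same idea as `RobustScaler` with its default 25 to 75 quantile range), and pass the 0/1 block through unchanged. With scikit-learn, `ColumnTransformer` is the usual way to apply a scaler to numeric columns only.

```python
import numpy as np

def one_hot(column):
    """Map a 1D array of categories to 0/1 indicator columns."""
    categories = np.unique(column)
    return (column[:, None] == categories[None, :]).astype(float), categories

def robust_scale(x):
    """(x - median) / IQR, so a single outlier barely moves the scaling."""
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    return (x - median) / (q3 - q1)

# Hypothetical data: one nominal column, one numeric column with an outlier
color = np.array(["red", "green", "blue", "green", "red"])
price = np.array([10.0, 12.0, 11.0, 13.0, 250.0])

color_ohe, cats = one_hot(color)
price_scaled = robust_scale(price)

# Scale only the numeric column; the one-hot block is left as 0s and 1s
X = np.column_stack([color_ohe, price_scaled])
```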

rain blade Feb 10, 2022, 5:07 AM

#

I have recently wanted to start learning Data Science/ AI. Where should I start? (Sorry if this question is asked too many times)

stray nymph Feb 10, 2022, 5:36 AM

#

hi

#

am i supposed to handle class imbalance or outliers first

spare junco Feb 10, 2022, 6:14 AM

#

Hello, can anyone suggest the essential skills required to start freelancing as a Data Analyst?

ionic palm Feb 10, 2022, 6:45 AM

#

For TensorFlow, I'm trying to solve the traveling salesman problem. Each location is [location id, coord x, coord y], and I want the shortest path (here c→a→b):
```py
[
    ['a', 0, 0],
    ['b', 3, 0],
    ['c', 0, 4]
]
```
But with a flexible number of locations, which means a flexible number of features, how do I train a TensorFlow regression model?
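For comparison, a sketch using a different technique than the TensorFlow regression asked about: exhaustive search over visiting orders (open path, no return to the start), applied to the three locations in the question. It accepts any number of locations without a fixed feature count, but its cost grows as n!, so it only suits small inputs; the helper names are hypothetical.

```python
import itertools
import math

def path_length(path, coords):
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))

def shortest_path(locations):
    """Try every visiting order and keep the shortest open path."""
    coords = {loc_id: (x, y) for loc_id, x, y in locations}
    best = min(itertools.permutations(coords), key=lambda p: path_length(p, coords))
    return list(best), path_length(best, coords)

locations = [
    ['a', 0, 0],
    ['b', 3, 0],
    ['c', 0, 4],
]
path, length = shortest_path(locations)
print(path, length)  # ['b', 'a', 'c'] 7.0
```

The result is the reverse of c→a→b, which has the same length (4 + 3 = 7).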

kind rock Feb 10, 2022, 8:27 AM

#

odd meteor Not exactly, that alone doesn't prevent overfitting. Rather, I'd say once we adm...

what's ols?

kind rock Feb 10, 2022, 8:28 AM

#

odd meteor Bias is quite different from the error term. In Stats it's called the `intercept...

I get that but won't some people increase the bias in the equation so as to prevent overfitting?

kind rock Feb 10, 2022, 8:30 AM

#

desert oar i definitely used it in industry and it was much better than the alternatives, b...

By alternatives do you mean venv?

mint palm Feb 10, 2022, 10:15 AM

#

odd meteor In milliseconds? 🤔 Well, impossible is nothing these days - - so I'd say it de...

I need that response on device like mobile

upper spindle Feb 10, 2022, 10:25 AM

#

does anyone here know how to use sentiment values from reddit comments to forecast cryptocurrency prices, or have done a similar project?

mint vine Feb 10, 2022, 10:30 AM

#

I want to find a fellow dev with experience (even slightly) training and setting up, for example, huggingface models.

serene scaffold Feb 10, 2022, 1:25 PM

#

upper spindle does anyone here know how to use sentiment values from reddit comments to foreca...

you have to ask your actual question, not if there's someone who knows about a general concept. didn't I mention that before?

serene scaffold Feb 10, 2022, 1:26 PM

#

mint vine I want to find a fellow dev with experience (even so slightly) training and sett...

read my previous comment ^

mint vine Feb 10, 2022, 2:03 PM

#

serene scaffold read my previous comment ^

Thank you.
What I am actually looking for is a developer to work with. I am a frontend-dev looking for a partner for a project.

serene scaffold Feb 10, 2022, 2:08 PM

#

mint vine Thank you. What I am actually looking for is a developer to work with. I am a fr...

Alright. Keep in mind that it is not very likely that you will find an ongoing project partner in this server. I'm not sure where you can go to find that.

mint vine Feb 10, 2022, 2:09 PM

#

Any ideas where I can look?

serene scaffold Feb 10, 2022, 2:13 PM

#

I said "I'm not sure where you can go to find that." in anticipation of that question; not really.

grizzled heron Feb 10, 2022, 2:25 PM

#

Hi guys, what libs do you recommend for image recognition? (I wanna detect if there is $x bill on image)

shut trail Feb 10, 2022, 2:29 PM

#

do you think bs4 questions are better answered here or in webdev ?

brittle latch Feb 10, 2022, 2:31 PM

#

Question about image normalization. I'm planning to use the pre-trained ResNet18 classifier on a training set of .png images. Because they are PNGs, some of the images, when converted to raster arrays, contain an alpha channel in addition to the usual R, G, and B channels. That channel has to be removed somehow, because ResNet18 expects tensors with three channels, not four. A brute-force solution would be converting the images to .jpg, which would remove the alpha channel entirely. Is there a better approach, or is conversion a reasonable solution?

shut trail Feb 10, 2022, 2:31 PM

#

grizzled heron Hi guys, what libs do you recommend for image recognition? (I wanna detect if th...

https://scikit-image.org/

shut trail Feb 10, 2022, 2:35 PM

#

brittle latch Question about image normalization. I'm planning to use the Resnet18 pre-trained...

you load the images as something before feeding them to the model, right? can you just drop whatever you want then?

#

we're just talking about dropping a channel, right?

#

or a column...

brittle latch Feb 10, 2022, 2:39 PM

#

That would also be possible. The alpha channel is always the last, apparently, so I can just drop it from the array.
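A minimal NumPy sketch of that approach, assuming each image is already loaded as an H × W × C `uint8` array (the `drop_alpha` helper is hypothetical). Pillow's `Image.open(path).convert("RGB")` does the same job at load time and also copes with grayscale or palette PNGs; note that neither approach composites transparent pixels onto a background, they simply keep whatever RGB values sit under them.

```python
import numpy as np

def drop_alpha(image: np.ndarray) -> np.ndarray:
    """Return an H x W x 3 array, discarding a trailing alpha channel if there is one."""
    if image.ndim == 3 and image.shape[-1] == 4:
        return image[..., :3]  # keep R, G, B; alpha is the last channel
    return image

rgba = np.zeros((32, 32, 4), dtype=np.uint8)  # dummy RGBA image
rgb = drop_alpha(rgba)
print(rgb.shape)
```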

languid chasm Feb 10, 2022, 2:53 PM

#

5k calls just got traded on TWLO and NOTHING shows up on UW

#

I have to go back to cheddarflow or OptionsFlow

#

you get what you pay for

#

whoops

#

wrong place

lapis sequoia Feb 10, 2022, 2:54 PM

#

hm so I just found out that TF does automatic differentiation of the nonlinearity functions, so we don't need to provide the derivative as a separate function when we give it our own nonlinearity function.

THATS JUST SO COOL😍

dark belfry Feb 10, 2022, 3:50 PM

#

I'm new to this and I'm having issues understanding how to fit a logistic regression model around a particular dataframe I'm working with. If someone could reach out to help I would appreciate it.

zealous burrow Feb 10, 2022, 3:57 PM

#

https://github.com/jina-ai/docarray can be interesting to data scientists/AI engineers working with unstructured data

GitHub

GitHub - jina-ai/docarray: The data structure for unstructured data

The data structure for unstructured data. Contribute to jina-ai/docarray development by creating an account on GitHub.

river maple Feb 10, 2022, 4:19 PM

#

are there any good tutorials for tensorflow object detection that aren't outdated and don't give me errors every other line

#

i've been trying to learn it for 3 days now and i encounter tons of errors every single day

odd meteor Feb 10, 2022, 4:57 PM

#

kind rock what's ols?

OLS = Ordinary Least Squares. It's a technique used in statistics for estimating the relationship between one or more explanatory variables and a response variable.

In essence, the method estimates the relationship by minimizing the sum of the squared differences between the observed and predicted values of the response variable, where the predictions lie on a straight line (or a hyperplane when there are several explanatory variables).

OLS and gradient descent can arrive at the same fit, but with different approaches: OLS solves for the coefficients directly in closed form, while gradient descent minimizes the same squared error iteratively.
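To make that comparison concrete, here is a small NumPy sketch on synthetic data (the data, learning rate, and iteration count are arbitrary choices): `np.linalg.lstsq` gives the OLS solution directly, and plain gradient descent on the mean squared error converges to the same slope and intercept.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(0.0, 1.0, size=100)  # synthetic data: y ≈ 3x + 2 plus noise

# OLS: least-squares solution of [x, 1] @ [m, b] ≈ y
X = np.column_stack([x, np.ones_like(x)])
m_ols, b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Gradient descent on the mean squared error of y_hat = m*x + b
m, b = 0.0, 0.0
learning_rate = 0.01
for _ in range(20_000):
    error = (m * x + b) - y
    m -= learning_rate * 2 * np.mean(error * x)  # d(MSE)/dm
    b -= learning_rate * 2 * np.mean(error)      # d(MSE)/db

print(f"OLS:              m={m_ols:.4f}, b={b_ols:.4f}")
print(f"gradient descent: m={m:.4f}, b={b:.4f}")
```

The closed form is the natural choice for small linear problems; gradient descent matters when the model or dataset is too large for a direct solve.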

odd meteor Feb 10, 2022, 5:18 PM

#

kind rock I get that but won't some people increase the bias in the equation so as to prev...

The bias I was referring to was in the context of b in y = mx + b. But when it comes to Bias-Variance tradeoff, to solve the overfitting problem you'll have to reduce your model complexity. By doing so, the variance decreases and bias increases.
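A rough NumPy illustration of the variance side of that tradeoff (the true line, noise level, and polynomial degrees are arbitrary assumptions): refit a simple (degree 1) and a complex (degree 9) polynomial to many fresh noisy samples of the same straight line, and compare how much their predictions at one point move around.

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 1, 10)   # fixed inputs
true_y = 2.0 * x + 1.0      # the true relationship is a straight line

def prediction_variance(degree, x0=0.5, trials=500, noise=0.3):
    """Variance of the fitted model's prediction at x0 across many noisy datasets."""
    preds = []
    for _ in range(trials):
        y = true_y + rng.normal(0.0, noise, size=x.size)  # a fresh noisy sample
        coefs = np.polyfit(x, y, degree)
        preds.append(np.polyval(coefs, x0))
    return np.var(preds)

var_simple = prediction_variance(degree=1)
var_complex = prediction_variance(degree=9)  # 10 coefficients for 10 points: it fits the noise exactly
print(f"degree 1: {var_simple:.4f}, degree 9: {var_complex:.4f}")
```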

ionic palm Feb 10, 2022, 5:32 PM

#

```py
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # silence TensorFlow's C++ info/warning logs
import tensorflow as tf

inputs = tf.ragged.constant([[1, 2, 3, 4], [], [5, 6, 7]])
output = tf.ragged.constant([[2, 4, 9, 16], [], [25, 36, 49]])  # ragged targets: one value per input token

model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=[None], dtype=tf.int64, ragged=True),
    tf.keras.layers.Embedding(1000, 16),
    tf.keras.layers.LSTM(32, use_bias=False),  # returns only the final state: one vector per sequence
    tf.keras.layers.Dense(32),
    tf.keras.layers.Activation(tf.nn.relu),
    tf.keras.layers.Dense(1)  # one scalar prediction per sequence
])
model.compile(loss='mean_squared_error', optimizer='sgd')
model.fit(inputs, output, epochs=5)
print('WARMINGS DONE')
print(model.predict(inputs))
```
```
1/1 [==============================] - ETA: 0s - loss: nan
1/1 [==============================] - 3s 3s/step - loss: nan
```
The epoch losses are `nan`. Is it a problem with the layers?

thin palm Feb 10, 2022, 5:46 PM

#

serene scaffold It's not just text. It can be used to represent any discrete feature. Do you kn...

okay I think I see where you're going with this. In the bootcamp I attended they only spent a few hours on one-hot encoding and presented it as essentially converting text to numbers. Thank you for the clarification on this.
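As a small illustration that one-hot encoding works for any discrete feature, not just text, here is a NumPy sketch on a made-up integer feature (number of bedrooms, say). The helper name is hypothetical; in practice `pandas.get_dummies` or scikit-learn's `OneHotEncoder` do the same job.

```python
import numpy as np

def one_hot(values):
    """Map each distinct value to its own 0/1 column."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    codes = np.array([index[v] for v in values])
    return categories, np.eye(len(categories), dtype=int)[codes]

bedrooms = [3, 1, 2, 3]
categories, encoded = one_hot(bedrooms)
print(categories)
print(encoded)
```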

serene scaffold Feb 10, 2022, 5:50 PM

#

thin palm okay I think I see where you're going with this. In the bootcamp I attended they...

No problem!

odd meteor Feb 10, 2022, 5:55 PM

#

mint palm I need that response on device like mobile

I'm sorry idk about mobile.

dusk tide Feb 10, 2022, 5:56 PM

#

desert oar i definitely used it in industry and it was much better than the alternatives, b...

Are you a data scientist or ML engineer??

serene scaffold Feb 10, 2022, 6:11 PM

#

dusk tide Are you a data scientist or ML engineer??

They are.

dusk tide Feb 10, 2022, 6:17 PM

#

serene scaffold They are.

Needed to take some advice from them as my placement season is approaching

serene scaffold Feb 10, 2022, 6:17 PM

#

dusk tide Needed to take some advice from them as my placement season is approaching

Sure, but always direct your questions to the whole channel, not specific people.

#

If you need career-specific advice, try #career-advice.

thin palm Feb 10, 2022, 6:45 PM

#

what's up Python gang any advice on how to make sure your ML model is a decent one and how we can further examine it? I've been checking around Confusion Matrix which I enjoy but some of my scores are looking INTERESTING

mild dirge Feb 10, 2022, 6:53 PM

#

There's lots of stuff

#

you have accuracy, F1-score, precision, recall, confusion matrix etc.

#

There's also some more meta stuff like prediction time
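For a binary problem, the accuracy, precision, recall, and F1 figures all come from the four cells of the confusion matrix. A plain-Python sketch (the labels are made up, and class 1 is assumed to be the positive class):

```python
def binary_metrics(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion": [[tn, fp], [fn, tp]],  # rows: actual 0/1, columns: predicted 0/1
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```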

thin palm Feb 10, 2022, 6:56 PM

#

mild dirge There's lots of stuff

well just when I ran a cross_validate on my machine learning model it produced a score as high as .88, but when I do a .fit on my X_test it shoots down to .57

#

and I'm just thinking, these scores are way too far away from each other

mild dirge Feb 10, 2022, 6:56 PM

#

that's a pretty clear sign of overfitting

thin palm Feb 10, 2022, 6:56 PM

#

mild dirge that's a pretty clear sign of overfitting

ahhhhh

#

So my Algo is using the KNN Classification

#

and I used n_neighbors as 5, which I would assume is not overfitting. How can I further investigate?

mild dirge Feb 10, 2022, 6:57 PM

#

depends on how many samples you have in your training data

#

if you have billions of samples, 5 would be pretty low

thin palm Feb 10, 2022, 6:58 PM

#

mild dirge if you have billions of samples, 5 would be pretty low

By samples, do you mean how I split my X_train and X_test?

mild dirge Feb 10, 2022, 7:00 PM

#

Well there's multiple causes for your model performing worse on your test data

#

one could be that your test data is somehow not similar to your training data

#

Another could be that you don't really have enough samples to get an idea of how well your model performs

#

say you have 6 test samples, and you get 5 right, doesn't say a whole lot about how good your model is with this few samples
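One way to put numbers on that is the standard error of an accuracy estimate, sqrt(p(1 − p)/n). This plain-Python sketch uses the normal approximation, which is crude for a sample as tiny as 6 (the interval even spills past 100%); a Wilson interval would be more appropriate there, but the contrast in width is the point.

```python
import math

def accuracy_with_error(correct, total):
    """Accuracy and its approximate standard error (normal approximation)."""
    p = correct / total
    return p, math.sqrt(p * (1 - p) / total)

for correct, total in [(5, 6), (500, 600)]:
    p, se = accuracy_with_error(correct, total)
    print(f"{correct}/{total}: accuracy {p:.3f} ± {1.96 * se:.3f} (approx. 95%)")
```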

#

And even knn can overfit, if K is very low, it might overfit on your training data

#

Don't think that by itself would explain the big gap between test acc. and validation acc. though
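To see the low-K point in action, here is a small from-scratch k-NN in NumPy on synthetic, deliberately noisy 2-D data (the data and the K values are arbitrary). With K = 1 every training point is its own nearest neighbour, so training accuracy is perfect regardless of how well the model generalises; comparing training and test accuracy across a few values of K shows the gap.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Majority-vote k-NN for 0/1 labels, using Euclidean distance."""
    # squared distances between every query point and every training point
    d = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]  # indices of the k closest training points
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(size=200) > 0).astype(int)  # labels with deliberate noise
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]

results = {}
for k in (1, 15):
    train_acc = (knn_predict(X_train, y_train, X_train, k) == y_train).mean()
    test_acc = (knn_predict(X_train, y_train, X_test, k) == y_test).mean()
    results[k] = (train_acc, test_acc)
    print(f"k={k}: train accuracy {train_acc:.2f}, test accuracy {test_acc:.2f}")
```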

bitter plume Feb 10, 2022, 7:02 PM

#

Hi if I wanted to get started in machine learning or AI what are some prerequisites that I would need to know.

mild dirge Feb 10, 2022, 7:02 PM

#

bitter plume Hi if I wanted to get started in machine learning or AI what are some prerequisi...

statistics and linear algebra mostly

bitter plume Feb 10, 2022, 7:02 PM

#

Is there no need for calculus?

mild dirge Feb 10, 2022, 7:03 PM

#

There is, but it does not play as huge of a role as those two, it's probably a good 3rd though

bitter plume Feb 10, 2022, 7:04 PM

#

So the stuff on Khan Academy should be fine, right? Or should I get something like a Udemy course?

mild dirge Feb 10, 2022, 7:04 PM

#

Haven't looked at either so wouldn't know myself srr

#

maybe someone else knows

bitter plume Feb 10, 2022, 7:04 PM

#

oh ok, thanks for replying though.

gleaming remnant Feb 10, 2022, 7:34 PM

#

heyyy. I am looking for a way to find the x-coordinate of the intersection between a graph and the x-axis using matplotlib and numpy. I really don't know how to figure it out

mild dirge Feb 10, 2022, 7:38 PM

#

You got an array of y values or something?

#

@gleaming remnant

#

Need more information to help

gleaming remnant Feb 10, 2022, 7:39 PM

#

mild dirge Need more information to help

Can we voice chat? I will show you my screen

mild dirge Feb 10, 2022, 7:40 PM

#

No i'm cooking sorry

#

If you can't explain in text I can't help rn

gleaming remnant Feb 10, 2022, 7:41 PM

#

I'm using a formula to graph the trajectory of a projectile

#

It is for a physics assignment
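If the trajectory is available as sampled `x` and `y` arrays (as mild dirge asked), one NumPy approach is to look for sign changes in `y` and interpolate linearly between the two samples around each one. The launch speed, angle, and `g` below are made-up values; for level ground the exact landing point is v² sin(2θ) / g, which serves as a check.

```python
import numpy as np

def x_intercepts(x, y):
    """Approximate the x values where a sampled curve crosses y = 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    exact = x[y == 0]                    # samples lying exactly on the axis
    i = np.where(y[:-1] * y[1:] < 0)[0]  # sign changes strictly between samples i and i+1
    crossings = x[i] - y[i] * (x[i + 1] - x[i]) / (y[i + 1] - y[i])  # linear interpolation
    return np.sort(np.concatenate([exact, crossings]))

v, theta, g = 20.0, np.radians(45), 9.81
x = np.linspace(0, 50, 1001)
y = x * np.tan(theta) - g * x**2 / (2 * v**2 * np.cos(theta) ** 2)

print(x_intercepts(x, y))
print(v**2 * np.sin(2 * theta) / g)  # analytic range on level ground
```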

mild dirge Feb 10, 2022, 7:44 PM

#

It's probably best if you open a help channel #❓｜how-to-get-help and provide all information necessary to answer any question you may have

arctic wedgeBOT Feb 10, 2022, 7:51 PM

#

Hey @gleaming remnant!

It looks like you tried to attach file type(s) that we do not allow (.html). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

gleaming remnant Feb 10, 2022, 7:51 PM

#

#

here is my code

serene scaffold Feb 10, 2022, 8:10 PM

#

gleaming remnant here is my code

!paste

arctic wedgeBOT Feb 10, 2022, 8:10 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

gleaming remnant Feb 10, 2022, 8:35 PM

#

https://paste.pythondiscord.com/hawezibogi.apache

sleek tapir Feb 10, 2022, 8:54 PM

#

which dataset do u guys recommend

#

after titanic

odd meteor Feb 10, 2022, 8:58 PM

#

sleek tapir which dataset do u guys recommend

What do you intend to work on next? The purpose of your project will inform the kind of dataset needed.

sleek tapir Feb 10, 2022, 9:06 PM

#

idk

#

im a beginner

sleek tapir Feb 10, 2022, 9:06 PM

#

odd meteor What do you intend to work on next? The purpose of your project will inform the ...

a uni student

quiet vault Feb 10, 2022, 9:44 PM

#

If you want to start with computer vision I recommend MNIST handwritten numbers

knotty barn Feb 10, 2022, 10:02 PM

#

Any numpy / pandas pros in here?

serene scaffold Feb 10, 2022, 10:07 PM

#

knotty barn Any numpy / pandas pros in here?

Always ask your actual question, not if there's an expert.

clever marten Feb 10, 2022, 10:13 PM

#

Hey guys, I'm working on a project that involves Python and IMAP. I'm quite stuck on how to build the criteria that the script will use to filter/search the emails and retrieve attachments with a particular extension, sent on a particular date and from a particular address. Any help??
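A minimal sketch of the two pieces, under stated assumptions: the helper names are hypothetical, the criteria use the standard IMAP `FROM` and `SENTON` search keys (`SENTSINCE`/`SENTBEFORE` work the same way for date ranges), and the string would be passed to `imaplib`'s `IMAP4.search(None, criteria)`, with each match fetched via `fetch(num, "(RFC822)")` and handed to the attachment filter as raw bytes. The server connection is left out so the example runs offline.

```python
import email
from datetime import date
from email.message import EmailMessage

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def imap_criteria(sender: str, day: date) -> str:
    """IMAP SEARCH criteria: messages from `sender` whose Date header is `day` (DD-Mon-YYYY)."""
    imap_date = f"{day.day:02d}-{MONTHS[day.month - 1]}-{day.year}"
    return f'(FROM "{sender}" SENTON {imap_date})'

def attachments_with_extension(raw_message: bytes, extension: str):
    """Return (filename, bytes) for every attachment whose name ends with `extension`."""
    msg = email.message_from_bytes(raw_message)
    found = []
    for part in msg.walk():
        filename = part.get_filename()
        if filename and filename.lower().endswith(extension.lower()):
            found.append((filename, part.get_payload(decode=True)))
    return found

# Offline demo: build a message with two attachments and filter it.
demo = EmailMessage()
demo["From"] = "reports@example.com"
demo.set_content("See attached.")
demo.add_attachment(b"a,b\n1,2\n", maintype="application", subtype="octet-stream",
                    filename="report.csv")
demo.add_attachment(b"%PDF-1.4", maintype="application", subtype="pdf", filename="notes.pdf")

print(imap_criteria("reports@example.com", date(2022, 2, 10)))
print(attachments_with_extension(demo.as_bytes(), ".csv"))
```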

knotty barn Feb 10, 2022, 10:15 PM

#

Oh okay, so I am looking at data in a dataframe. I select the value using dataframe_result[i][m] and compare it to the value dataframe_reference[i][m]. I ran it once and it worked perfectly; I went to run it again and it threw 'The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()'. So okay fine, I tried result[i][m].any(), etc. Now I get "error 'int' object has no attribute 'any'" WTF???

#

Been trying to solve various ways for 3 hours

#

Any big daddys in here that can help?

#

So confused how it compiles and runs perfectly and then breaks as well

#

Can I force this to run and just be done with it?

#

Would having formulas in an excel sheet f it up? Like ROUND?

serene scaffold Feb 10, 2022, 10:37 PM

#

knotty barn Oh okay, so I am looking at data in a dataframe, I select the value using datafr...

please show the code and the whole error message starting from Traceback

#

!paste

arctic wedgeBOT Feb 10, 2022, 10:37 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

serene scaffold Feb 10, 2022, 10:38 PM

#

knotty barn So confused how it compiles and runs perfectly and then breaks as well

compile time and run time errors are different.

#

because of the way Python is designed, fewer potential errors can be detected at compile time.

serene scaffold Feb 10, 2022, 10:39 PM

#

knotty barn Can I force this to run and just be done with it?

No. Code follows the instructions that you give to it, and if you give it instructions that cannot be completed, it breaks.

#

I understand that you're trying to get something done here, but you need to maintain patience and a positive attitude about learning.

knotty barn Feb 10, 2022, 10:50 PM

#

Sorry man, I've tried many different things, but it's not working and it's frustrating that it once worked and I changed nothing

#

Unfortunately I run discord on the PC I'm using for this

#

Cannot run*

hollow sentinel Feb 10, 2022, 10:54 PM

#

hey guys quick question

#

what is information gain

#

reduction in entropy

#

so as entropy decreases, information gain increases?

serene scaffold Feb 10, 2022, 11:01 PM

#

@knotty barn unfortunately there's not much anyone can do to help unless you can share the information I asked for earlier. I'm sorry this is frustrating for you.

#

Keep in mind that Discord can be used in the browser.

knotty barn Feb 10, 2022, 11:06 PM

#

Can I send a pic lol?

#

I miss the old days when my code worked lol

#

How can my value be considered both an int64 and an array?

serene scaffold Feb 10, 2022, 11:20 PM

#

knotty barn Can I send a pic lol?

I don't look at pics of code, but someone else might.

serene scaffold Feb 10, 2022, 11:20 PM

#

knotty barn How can my value be considered both an int64 and an array?

It might be an array of 64 bit integers.

knotty barn Feb 10, 2022, 11:21 PM

#

If it were an array then I could use .any() or .bool() on it, but says it's not an array

#

But rather an int64. Then when I try to run it as if it were an int64, it says it cannot do that because it's an array lol

serene scaffold Feb 10, 2022, 11:23 PM

#

Clearly there's a misunderstanding here. See how creative you can be about getting the actual text into this Discord. For my own sanity, I draw the line at reading screenshots of text.

knotty barn Feb 10, 2022, 11:23 PM

#

I pull off .csvs, but I can rewrite the code I guess

serene scaffold Feb 10, 2022, 11:24 PM

#

you can drag/drop CSV files directly into the Discord client.

knotty barn Feb 10, 2022, 11:24 PM

#

So you won't have my data tables, but I can type the code on my phone lol

serene scaffold Feb 10, 2022, 11:24 PM

#

I'm skeptical that you can't log into Discord from a browser, though.

knotty barn Feb 10, 2022, 11:25 PM

#

I can, but I am not allowed to do from my work PC

serene scaffold Feb 10, 2022, 11:25 PM

#

rip

knotty barn Feb 10, 2022, 11:25 PM

#

Lol yep

serene scaffold Feb 10, 2022, 11:25 PM

#

as a last resort, I usually email stuff to my work email.

#

my company blocked all non-work email inboxes because too many Karens in non-technical roles open every email they get 😠

knotty barn Feb 10, 2022, 11:26 PM

#

Can't email any files or information to my personal email from my work email lol

serene scaffold Feb 10, 2022, 11:26 PM

#

shrug2

tidal bough Feb 10, 2022, 11:32 PM

#

hollow sentinel so as entropy decreases, information gain increases?

well, yeah, in information theory information literally is negative entropy. For that reason, a perfect way of compressing information produces a result that looks like random noise, with no regularities whatsoever - because any regularity in the data would be predictable, and so you could improve the compression by dropping these predictable parts.
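In the decision-tree sense hollow sentinel was asking about, information gain is the parent node's entropy minus the size-weighted entropy of the child nodes after a split, so lower entropy in the children means higher gain. A plain-Python sketch with made-up labels:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = [0, 0, 1, 1]
print(information_gain(parent, [[0, 0], [1, 1]]))  # perfect split
print(information_gain(parent, [[0, 1], [0, 1]]))  # split that separates nothing
```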

knotty barn Feb 10, 2022, 11:33 PM

#

```py
y_actual = result_db[i]
y_hat = reference_db[i]
if y_actual[m] == y_hat[m] == 1:  # <- this is the error
    ...
```

#

There's a lot of stuff going on but essentially result_db and reference_db are dataframes with columns of 0s and 1s

#

i = column header label, and the m = the row within column... or so I thought until it broke randomly

stone marlin Feb 10, 2022, 11:51 PM

#

Huh, TIL that you can do x == y == 1 in Python. I never saw that before.

#

Spice, if you print out y_actual[m] are you getting a series or an integer? You may also want to use loc or iloc to get rows / columns for pandas.
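One common way (an assumption here, since the data wasn't shared) for `y_actual[m]` to be a scalar on one run and a Series on the next is a duplicated index label: with duplicated labels, label-based lookup returns every matching row. A small pandas sketch with hypothetical values:

```python
import pandas as pd

s = pd.Series([0, 1, 1], index=[0, 1, 1])  # hypothetical data: label 1 appears twice

print(type(s[0]))         # unique label -> a single scalar
print(type(s[1]))         # duplicated label -> a Series holding both rows
print(s.index.is_unique)  # False: a quick check for this situation
print(s.iloc[1])          # .iloc is positional, so this is always one value
```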

#

Usually, that any/all error means "you're comparing a series / df to a series / df, and it's giving us a lot of true/false values."

#

In fact, as I just learned, the double double-equals is messing you up, probably. It's trying to compare the second thing to 1 --- that's a data frame to an integer. EDIT: Removed long code example, since the error is easier, see below.

#

So, you might want to use something like this instead of the x == y == 1 thing: (df["pred_value"] == df["true_value"]) & (df["pred_value"] == 1).

tidal bough Feb 10, 2022, 11:57 PM

#

knotty barn y_actual = result_db[i] y_hat = reference_db[i] if y_actual[m]==y_hat[m]==1: <- ...

`y_actual[m]==y_hat[m]==1` is equivalent to `(y_actual[m] == y_hat[m]) and (y_hat[m] == 1)`. If these are Series, that `and` will fail

#

for elementwise AND on Series you need `&` instead of `and`, so you must write it out explicitly
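Spelled out with `&`, the check from the question looks roughly like this. The two DataFrames and the column name are made-up stand-ins for `result_db` and `reference_db`; the parentheses matter because `&` binds more tightly than `==`.

```python
import pandas as pd

# Hypothetical stand-ins for result_db / reference_db: columns of 0s and 1s
result_db = pd.DataFrame({"a": [1, 0, 1, 1], "b": [0, 0, 1, 1]})
reference_db = pd.DataFrame({"a": [1, 1, 1, 0], "b": [0, 0, 1, 1]})

col = "a"
y_actual, y_hat = result_db[col], reference_db[col]

# Elementwise version of y_actual == y_hat == 1: one True/False per row
both_one = (y_actual == y_hat) & (y_hat == 1)
print(both_one.tolist())
print(int(both_one.sum()))  # how many rows satisfy the condition
```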

stone marlin Feb 10, 2022, 11:59 PM

#

I do not like that syntactic sugar. Also, dang, x <= y <= z works too? Where have I been...?

tidal bough Feb 11, 2022, 12:00 AM

#

yeah, they are all very nice

#

you can actually chain them arbitrarily long. Also, somewhat cursedly, `in` is also a chaining operator

#

!e

import dis
dis.dis("a < b in c > d")

arctic wedgeBOT Feb 11, 2022, 12:01 AM

#

@tidal bough :white_check_mark: Your eval job has completed with return code 0.

001 |   1           0 LOAD_NAME                0 (a)
002 |               2 LOAD_NAME                1 (b)
003 |               4 DUP_TOP
004 |               6 ROT_THREE
005 |               8 COMPARE_OP               0 (<)
006 |              10 JUMP_IF_FALSE_OR_POP    14 (to 28)
007 |              12 LOAD_NAME                2 (c)
008 |              14 DUP_TOP
009 |              16 ROT_THREE
010 |              18 CONTAINS_OP              0
011 |              20 JUMP_IF_FALSE_OR_POP    14 (to 28)
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/ivazecuhix.txt?noredirect

stone marlin Feb 11, 2022, 12:01 AM

#

Haha, I do not like that.

tidal bough Feb 11, 2022, 12:01 AM

#

this is a < b and b in c and c > d.
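Two consequences of that expansion, shown in plain Python: the middle operand is evaluated only once, and `a < b in c` means `(a < b) and (b in c)`, not `(a < b) in c`. The `middle` function is just a stand-in that records its calls.

```python
calls = []

def middle():
    calls.append("called")
    return 5

in_range = 1 < middle() < 10  # (1 < 5) and (5 < 10), with middle() run once
print(in_range, len(calls))

chained = 1 < 2 in [2, 3]     # (1 < 2) and (2 in [2, 3])
grouped = (1 < 2) in [2, 3]   # True in [2, 3]; True == 1, so it is not found
print(chained, grouped)
```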

#

yeah, haha

stone marlin Feb 11, 2022, 12:01 AM

#

But the x <= y <= z one might be useful. I use this a significant amount for timeseries stuff.

tidal bough Feb 11, 2022, 12:02 AM

#

absolutely. sadly, due to the `and` thing, it can't be used if the result isn't a single bool

stone marlin Feb 11, 2022, 12:02 AM

#

But it's going to be... is this x <= y and y <= z, but it's not necessarily transitive, yeah?

#

Like, if you have a weird custom operator for <= like "direct child of" for example.

tidal bough Feb 11, 2022, 12:02 AM

#

Yeah, no promises. It always gets interpreted as the equivalent of x<=y and y<=z.
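A sketch of the "direct child of" case, with a hypothetical `Node` class whose `<=` means "is a direct child of". The chain only checks neighbouring pairs, so it can be true while the comparison between its two ends is false.

```python
class Node:
    """Hypothetical tree node where a <= b means "a is a direct child of b"."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def __le__(self, other):
        return self.parent is other

root = Node("root")
child = Node("child", parent=root)
grandchild = Node("grandchild", parent=child)

chain = grandchild <= child <= root  # (grandchild <= child) and (child <= root)
ends = grandchild <= root            # grandchild's direct parent is child, not root
print(chain, ends)
```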

stone marlin Feb 11, 2022, 12:03 AM

#

Yeah, that should be fairly clear in very specific situations. Hm.

#

Haha, I always have to think, "Is this going to be readable?" before I try to change any of my code style stuff.

iron basalt Feb 11, 2022, 12:03 AM

#

stone marlin But the `x <= y <= z` one might be useful. I use this a _significant amount_ f...

Python be like:
```py
>>> 2 <= (x := 3) <= 4
True
```

stone marlin Feb 11, 2022, 12:03 AM

#

Walrus Operator: Greatest Operator.

iron basalt Feb 11, 2022, 12:03 AM

#

They added the ability to assign inside an expression, which people hated in C because it hurts code readability.

#

Adding bad features from C is hip now though.

stone marlin Feb 11, 2022, 12:04 AM

#

I'm still unsure how I feel about walrus, stylistically. I've used it VERY rarely for if-statement stuff, but I don't know how readable it is in general.

#

```py
if value := my_long_function_name_that_i_dont_want_to_type_again(things):
    ...
```

iron basalt Feb 11, 2022, 12:05 AM

#

Wait until you find stuff people used to do in C, like:
```py
>>> x = (x := 3) + 1
>>> x
4
```

#

But it gets much worse.

stone marlin Feb 11, 2022, 12:05 AM

#

Yeah, I do not want to use it like that.

#

I feel like if I use it, I'm going to limit it to the if use case.
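The `if` use case usually looks like this (Python 3.8+). `re.search` is just a convenient example of a call whose result you want to both test and reuse, and `parse_version` is a made-up helper:

```python
import re

def parse_version(text):
    """Return (major, minor) from the first 'X.Y' in text, or None if there isn't one."""
    if (match := re.search(r"(\d+)\.(\d+)", text)) is not None:
        return int(match.group(1)), int(match.group(2))
    return None

print(parse_version("Python 3.10 was released in 2021"))
print(parse_version("no version here"))
```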

iron basalt Feb 11, 2022, 12:06 AM

#

```py
>>> x = (x := 3) + 1
>>> x
4
>>> x = (x := x + 1) + 1
>>> x
6
```

#

Fun.

stone marlin Feb 11, 2022, 12:07 AM

#

Yeah, that's --- yeah, nope, not for me.

iron basalt Feb 11, 2022, 12:07 AM

#

And the classic, editing two vars at the same time:
```py
>>> x = 1
>>> y = 1
>>> y = (x := x + 1) + 2
>>> x
2
>>> y
4
```

tidal bough Feb 11, 2022, 12:07 AM

#

IIRC you can even get UB in C by doing two assignments on one line like that?

stone marlin Feb 11, 2022, 12:07 AM

#

What's UB?

tidal bough Feb 11, 2022, 12:08 AM

#

right, i = i++ + ++i; is UB.

iron basalt Feb 11, 2022, 12:08 AM

#

tidal bough IIRC you can even get UB in C by doing two assignments on one line like that?

It depends on where. If you do it in function arguments, yes, because the spec doesn't define the order in which function arguments are evaluated.

stone marlin Feb 11, 2022, 12:08 AM

#

I do not know C enough to know what the hell this is. Haha.

iron basalt Feb 11, 2022, 12:08 AM

#

e.g. foo(x++, ...)

tidal bough Feb 11, 2022, 12:08 AM

#

stone marlin What's UB?

undefined behaviour. Something that you're supposed not to do, and the compiler is allowed to assume it will never happen, and the compiler is allowed to emit code that does literally anything if it does happen.

stone marlin Feb 11, 2022, 12:09 AM

#

Ohhhh, got'cha, got'cha.

iron basalt Feb 11, 2022, 12:09 AM

#

C code relies a lot on undefined behavior, but if you want it to be as portable as possible you want to minimize it.

stone marlin Feb 11, 2022, 12:09 AM

#

I've been working on learning the black codebase, with the AST stuff, so that jibes with what little I know about programming. :'''']

iron basalt Feb 11, 2022, 12:09 AM

#

It's undefined from the language POV, but not the system/hardware.

#

I think there is also a C compiler that deliberately breaks everyone's expectations, like the assumption that int is 32 bits.

#

As a joke.

#

(int can be any width of at least 16 bits; it depends on the system and the compiler)
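From Python, `ctypes` can report how wide these C types are on the current platform. The standard only guarantees that `int` is at least 16 bits and `long` at least 32; for example, `long` is 32 bits on 64-bit Windows but 64 bits on 64-bit Linux.

```python
import ctypes

int_bits = ctypes.sizeof(ctypes.c_int) * 8
long_bits = ctypes.sizeof(ctypes.c_long) * 8
print(f"int: {int_bits} bits, long: {long_bits} bits")
```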

stone marlin Feb 11, 2022, 12:13 AM

#

If I had an infinite amount of time, I'd try to learn some C-or-lower, but, alas. Life is only so long.

iron basalt Feb 11, 2022, 12:14 AM

#

Well, C is a pretty simple language, the complexity all comes from how to make decent code with it.

#

It has a lot of traps, like its standard library which should not be used.

stone marlin Feb 11, 2022, 12:14 AM

#

Haha, that's what I mean --- architecting things with C. It would take me far too long and far too off-field for me to think seriously about doing it. :''']

iron basalt Feb 11, 2022, 12:15 AM

#

(old, outdated, for a system nobody uses anymore, and will give you lots of security issues)

#

(unfortunately, its standard library is people's first contact with C, and it leaves a really bad impression that scares people into using something like Python (or, in the past, Java) and never turning back)

stone marlin Feb 11, 2022, 12:16 AM

#

Welp, that's it, I'm only going to use 6502 ASM from now on. :']

iron basalt Feb 11, 2022, 12:17 AM

#

(the std lib also teaches all the wrong things, it's how not to code in C)

#

(it's also why every C (and C++) programmer kind of has their own standard library / bag of tools that they use, and often get pointed at as suffering from "not invented here syndrome", when really it's just lack of a good std lib)

#

(On the other hand, C is really one of a kind, the universal programming language that is simple and does not change too much while having important stuff like an ABI)

#

(And now with web assembly (lol, full circle huh?), it can really be used for anything)

#

I hope Python does not really change anymore, keep it simple (or at least not more complex than now). It has all it needs. Modules, a decent std lib, etc. The only thing it needs, which is being worked on right now (HPy), is universal modules that work on all Python implementations.

stone marlin Feb 11, 2022, 12:26 AM

#

All I want for my Python Christmas is for more popular packages to put in type-hinting. :'] But I agree, I like it as it is.

iron basalt Feb 11, 2022, 12:27 AM

#

It's already flexible enough for whatever due to operator overloading.

#

Type hinting, yeah.

#

The HPy thing btw, would let CPython also improve the GC and get rid of the GIL.

#

So get that checked off the list.

gray trail Feb 11, 2022, 12:52 AM

#

Does anybody have any experience with solving constraint satisfaction problems? Trying to implement one for my masters project, but I'm having some issues with working out the best approach for the problem

knotty barn Feb 11, 2022, 12:56 AM

#

Thanks guys for the help, haven't tried it but makes sense to me

iron basalt Feb 11, 2022, 12:59 AM

#

gray trail Does anybody have any experience with solving constraint satisfaction problems? ...

https://www.youtube.com/watch?v=l-tzjenXrvI&list=PLUl4u3cNGP63gFHB6xb-kVBiQHYe_4hSi&index=8

YouTube

MIT OpenCourseWare

7. Constraints: Interpreting Line Drawings

MIT 6.034 Artificial Intelligence, Fall 2010
View the complete course: http://ocw.mit.edu/6-034F10
Instructor: Patrick Winston

How can we recognize the number of objects in a line drawing? We consider how Guzman, Huffman, and Waltz approached this problem. We then solve an example using a method based on constraint propagation, with a limited...


#

7, 8, 9

serene scaffold Feb 11, 2022, 1:54 AM

#

stone marlin I do not like that syntactic sugar. Also, dang, `x <= y <= z` works too? Where...

you don't like chained comparisons?

as ConfusedReptile was getting at, a chained comparison is expanded with `and`, which assumes each comparison returns something with a single truth value; that's the usual convention for the comparison dunder methods. But numpy and pandas comparisons return arrays/Series instead, and their __bool__ methods deliberately raise an error. So when chained comparisons are expanded, they cause an error.
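A minimal NumPy reproduction: the chained form is expanded with `and`, which calls `bool()` on the boolean array produced by the first comparison, and NumPy refuses to pick a single truth value for it. The arrays are arbitrary examples.

```python
import numpy as np

a = np.array([1, 0, 1])
b = np.array([1, 1, 1])

try:
    a == b == 1  # expands to (a == b) and (b == 1)
    chained_failed = False
except ValueError as exc:  # "The truth value of an array with more than one element is ambiguous..."
    chained_failed = True
    print("ValueError:", exc)

mask = (a == b) & (b == 1)  # explicit elementwise version
print(mask)
```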

serene scaffold Feb 11, 2022, 1:56 AM

#

iron basalt I hope Python does not really change anymore, keep it simple (or at least not mo...

the most recent steering council elections had "keep on changing stuff" and "slow down the changes" factions, and the "slow down the changes" faction seems to have won out.

iron basalt Feb 11, 2022, 2:15 AM

#

serene scaffold the most recent steering council elections had "keep on changing stuff" and "slo...

I want changes in implementation and stuff like getting rid of the GIL, but not the language itself.

#

Any real gains that could be had with further changes would require very breaking changes, like static typing. The non-breaking changes just have gains way too small to offset the issues of version changes.

serene scaffold Feb 11, 2022, 2:17 AM

#

language changes are what I was referring to. Optimizations for tightly type-annotated code might be doable, however. (This is probably better for #internals-and-peps.)

iron basalt Feb 11, 2022, 2:18 AM

#

serene scaffold language changes are what I was referring to. Optimizations for tightly type-ann...

Yeah, but even without type-annotated code, as PyPy shows, it can be A LOT faster already. We just need universal modules that work across Python implementations (HPy).

#

To solve the whole thing where PyPy and others need to simulate being CPython, basically switch modes.

#

With something like that, many Python implementations could become drop-in replacements (especially PyPy (but also the GraalVM Python, etc)). Ofc, the other huge gain being (again) multithreading without GIL.

desert oar Feb 11, 2022, 2:21 AM

#

serene scaffold the most recent steering council elections had "keep on changing stuff" and "slo...

Good, I think they have changed enough for now and it's time to chill out

river maple Feb 11, 2022, 2:45 AM

#

are there any good tutorials for tensorflow object detection that aren't outdated and don't give me errors every other line

serene scaffold Feb 11, 2022, 3:00 AM

#

desert oar Good, i think they have changed enough for now and it's time to chill out

tbh I wish they hadn't done patma (structural pattern matching), for no reason other than that it makes a hard boundary between 3.10 and earlier versions

lapis sequoia Feb 11, 2022, 3:09 AM

#

Someone who wants to help me with my Python problem ? Send me a pb (: it is about NLTK, Categorial distribution.

serene scaffold Feb 11, 2022, 3:13 AM

#

lapis sequoia Someone who wants to help me with my Python problem ? Send me a pb (: it is abou...

Please always ask your actual question, giving enough information for someone to answer it, instead of asking if someone knows about a topic.

novel acorn Feb 11, 2022, 4:34 AM

#

Hello Everyone, hope you're doing great!
Can I use target/mean encoding for regression problems?
Or what encoder would be helpful for regression?

orchid moat Feb 11, 2022, 4:34 AM

#

i can't understand why the method cv2.erode working as cv2.dilate and vice versa in opencv

novel acorn Feb 11, 2022, 4:48 AM

#

One more question, is there a way to do target encoding if I have 2 targets (multi-output model)?

reef dock Feb 11, 2022, 5:24 AM

#

Does anyone have any advice/resources to learn + practice python lists and algorithms and working with data to improve on those topics

serene scaffold Feb 11, 2022, 5:26 AM

#

reef dock Does anyone have any advice/resources to learn + practice python lists and algor...

you're looking to learn more about "classic" algorithms, like list sorting?

reef dock Feb 11, 2022, 5:26 AM

#

Yes, pretty much.

serene scaffold Feb 11, 2022, 5:27 AM

#

reef dock Yes, pretty much.

that's a question for the #algos-and-data-structs channel.

reef dock Feb 11, 2022, 5:27 AM

#

oof, that's where I posted it initially and ended up here.

#

Cool, thanks.

serene scaffold Feb 11, 2022, 5:27 AM

#

also this: https://www.pythondiscord.com/resources/?topics=algorithms-and-data-structures

Python Discord | Resources

We're a large, friendly community focused around the Python programming language. Our community is open to those who wish to learn the language, as well as those looking to help others.

reef dock Feb 11, 2022, 5:28 AM

#

Oh wow.

serene scaffold Feb 11, 2022, 5:30 AM

#

yeah, looks like we only have two resources for A/DS. at least for now.

reef dock Feb 11, 2022, 5:30 AM

#

Works, I'll look into those. Thanks

strong tapir Feb 11, 2022, 6:37 AM

#

I'm trying to take a crack at the "AI plays Snake" project using NEAT, but I'm a beginner to NEAT and AI in general.
My current issue is that all my outputs are 0, and I can't figure out if it is because of my activation functions or a garbage-in, garbage-out problem. I'm really looking for someone who is familiar with all of this to help me determine my problem and point me in the right direction

#

I would rather not post the code because frankly it's pretty junky and not really readable, so I'm more looking for help in my DMs.

serene scaffold Feb 11, 2022, 6:38 AM

#

strong tapir I would rather not post the code it because frankly its pretty junky and not rea...

If someone wants to help you in their DMs, I suppose they can, but you're much more likely to get help if you give information in the channel.

#

People typically don't want to give help in DMs because they'd have to back out if it turns out the question involves something they can't help with.

strong tapir Feb 11, 2022, 6:39 AM

#

okay, then I'll provide more info

serene scaffold Feb 11, 2022, 6:40 AM

#

Also, literally no one is proud of any code they wrote more than like a year ago.

strong tapir Feb 11, 2022, 6:48 AM

#

Currently my input data:

[snake_food_distance, snake_topwall_distance, snake_rightwall_distance,
snake_leftwall_distance, snake_bottomwall_distance, snake_indanger (this is a boolean)]

The distance values are in grid blocks instead of just the number of pixels in between the 2 objects.

My output nodes are [Turn Left, Keep Straight, Turn Right] (The problem is these are always returning 0)

My movement system just sets 4 booleans to match the desired direction. The booleans are UP, DOWN, LEFT, and RIGHT.

How the AI controls the movement:

output = network.activate(list(player.vision()))  # player.vision() returns the input data listed above
if max(output) == 0:
    ...  # every output is 0: nothing changes, the snake keeps its current direction
elif output[0] == max(output):  # Turn Left (relative to the current heading)
    if UP:
        UP, DOWN, LEFT, RIGHT = False, False, True, False
    elif DOWN:
        UP, DOWN, LEFT, RIGHT = False, False, False, True
    elif LEFT:
        UP, DOWN, LEFT, RIGHT = False, True, False, False
    elif RIGHT:
        UP, DOWN, LEFT, RIGHT = True, False, False, False
elif output[1] == max(output):  # Keep Straight
    ...
elif output[2] == max(output):  # Turn Right (relative to the current heading)
    if UP:
        UP, DOWN, LEFT, RIGHT = False, False, False, True
    elif DOWN:
        UP, DOWN, LEFT, RIGHT = False, False, True, False
    elif LEFT:
        UP, DOWN, LEFT, RIGHT = True, False, False, False
    elif RIGHT:
        UP, DOWN, LEFT, RIGHT = False, True, False, False

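The boolean bookkeeping above can be collapsed by storing the heading as a single value in clockwise order: a left turn is then one step counter-clockwise and a right turn one step clockwise. This is a minimal sketch, assuming the outputs keep the [Turn Left, Keep Straight, Turn Right] order; `DIRECTIONS` and `next_direction` are hypothetical names, not part of the posted code. It reproduces the same mapping, including picking the first output on ties.

```python
# Headings in clockwise order, so turning is just moving one step around the list
DIRECTIONS = ["UP", "RIGHT", "DOWN", "LEFT"]

def next_direction(current, output):
    """Map the network's [left, straight, right] outputs to a new absolute heading."""
    best = max(output)
    if best == 0:
        return current                # all outputs 0: keep going (the `...` branch)
    choice = output.index(best)       # 0 = left, 1 = straight, 2 = right (first max wins)
    turn = choice - 1                 # -1, 0 or +1 steps around the compass
    return DIRECTIONS[(DIRECTIONS.index(current) + turn) % len(DIRECTIONS)]

print(next_direction("UP", [0.9, 0.1, 0.2]))  # a left turn while heading up
```

With a single `direction` variable you can then derive UP/DOWN/LEFT/RIGHT with comparisons like `direction == "UP"` wherever the game loop still needs them.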
serene scaffold Feb 11, 2022, 6:48 AM

#

this is better than I was expecting.

strong tapir Feb 11, 2022, 6:48 AM

#

I don't know if the issue is the data I'm inputting, my NEAT configuration, or the coding of my game in general

#

This is the NEAT config file (I ripped it from a YouTuber because I was testing to see if it was my settings)

[NEAT]
fitness_criterion     = max
fitness_threshold     = 50000
pop_size              = 10
reset_on_extinction   = False

[DefaultGenome]
# node activation options
activation_default      = relu
activation_mutate_rate  = 0.05
activation_options      = relu tanh
#abs clamped cube exp gauss hat identity inv log relu sigmoid sin softplus square tanh

# node aggregation options
aggregation_default     = random
aggregation_mutate_rate = 0.05
aggregation_options     = sum product min max mean median maxabs

# node bias options
bias_init_mean          = 0.01
bias_init_stdev         = 1.0
bias_max_value          = 30.0
bias_min_value          = -30.0
bias_mutate_power       = 0.5
bias_mutate_rate        = 0.7
bias_replace_rate       = 0.1

# genome compatibility options
compatibility_disjoint_coefficient = 1.0
compatibility_weight_coefficient   = 0.5

# connection add/remove rates
conn_add_prob           = 0.5
conn_delete_prob        = 0.1
# Change to 0.5?

# connection enable options
enabled_default         = False
enabled_mutate_rate     = 0.2

feed_forward            = True
initial_connection      = full
#initial_connection      = full_nodirect 0.5

# node add/remove rates
node_add_prob           = 0.5
node_delete_prob        = 0.1

# network parameters
num_hidden              = 0
num_inputs              = 6
num_outputs             = 3

# node response options
response_init_mean      = 1.0
response_init_stdev     = 0.05
response_max_value      = 30.0
response_min_value      = -30.0
response_mutate_power   = 0.1
response_mutate_rate    = 0.75
response_replace_rate   = 0.1

# connection weight options
weight_init_mean        = 0.3
weight_init_stdev       = 1.0
weight_max_value        = 30
weight_min_value        = -30
weight_mutate_power     = 0.5
weight_mutate_rate      = 0.8
weight_replace_rate     = 0.1

[DefaultSpeciesSet]
compatibility_threshold = 2.5

[DefaultStagnation]
species_fitness_func = max
max_stagnation       = 5
species_elitism      = 1

[DefaultReproduction]
elitism            = 8
survival_threshold = 0.3

arctic wedgeBOT Feb 11, 2022, 6:51 AM

#

Hey @strong tapir!

You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.

strong tapir Feb 11, 2022, 6:51 AM

#

https://paste.pythondiscord.com/apapelogix.yaml This is the full code

echo vigil Feb 11, 2022, 7:43 AM

#

Seems to be an issue with your config file. Substituting the config file from the NEAT XOR example gives non-zero output:

#

https://pastebin.com/i7jakmq3

echo vigil Feb 11, 2022, 8:05 AM

#

Ah, setting enabled_default = True is all you need to change to get non-zero values, but you also probably need further tuning to get good behavior.
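Why would `enabled_default = False` alone pin every output at 0? With `initial_connection = full` and `num_hidden = 0`, every input is wired directly to every output, but all of those connections start disabled. The toy forward pass below is not neat-python's code; it assumes that, as far as I can tell, neat-python's `FeedForwardNetwork` also starts output values at 0.0 and only evaluates nodes reachable through enabled connections. The key convention (inputs -1, -2, ..., outputs 0, 1, ...) mirrors NEAT's; `activate` and its arguments here are hypothetical.

```python
def relu(x):
    return max(0.0, x)

def activate(inputs, connections, biases, n_outputs):
    """Toy feed-forward pass with no hidden nodes.

    connections: list of (src, dst, weight, enabled); inputs are keyed -1, -2, ...
    and outputs 0 .. n_outputs-1.
    """
    values = {-(i + 1): v for i, v in enumerate(inputs)}
    for o in range(n_outputs):
        values[o] = 0.0                  # outputs default to 0 until evaluated
    enabled = [(s, d, w) for s, d, w, on in connections if on]
    for o in range(n_outputs):
        incoming = [(s, w) for s, d, w in enabled if d == o]
        if incoming:                     # no enabled inputs -> node is never evaluated
            total = sum(values[s] * w for s, w in incoming)
            values[o] = relu(biases[o] + total)
    return [values[o] for o in range(n_outputs)]

# 6 inputs fully connected to 3 outputs, all connections disabled
conns = [(-(i + 1), o, 0.5, False) for i in range(6) for o in range(3)]
print(activate([3, 1, 4, 5, 2, 0], conns, {0: 0.5, 1: 0.5, 2: 0.5}, 3))
```

Note that with `relu` as the only default activation, even enabled connections give 0 whenever the weighted sum plus bias is negative, which is one reason further tuning may still be needed.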

upper spindle Feb 11, 2022, 11:22 AM

#

does anyone know how to use an LSTM to predict crypto prices from Reddit sentiment values based on comments?

#

I've got the dataset

#

and have generated sentiment values for my comments/post titles

#

this is my dataset

arctic wedgeBOT Feb 11, 2022, 12:37 PM

#

:incoming_envelope: :ok_hand: applied mute to @brazen lava until <t:1644583673:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

serene scaffold Feb 11, 2022, 1:41 PM

#

@upper spindle looks like the timestamps are precise to the second. What about your data about crypto values? Are those only precise to the day?

frail fossil Feb 11, 2022, 1:54 PM

#

Hi, I have a question: how do you convert each column of a DataFrame to strings so that I can split the values and put them in a new DataFrame?

#

I already tried .astype("|S"), but I think it turns out that my column is too long for the ASCII encoder to encode (I think I need to convert it to UTF-8 first, but I don't know how to do that in a DataFrame).

agile cobalt Feb 11, 2022, 1:57 PM

#

frail fossil Hi, I have a question, how do you convert each column of dataframe to string so ...

what exactly are you trying to do?

#

if you want to save it in a file or send it somewhere, you can use pickle, csv or h5
if you want to take some columns and make a copy of them, just use df[['col1', 'col2']].copy()

#

if you want to split (individual*) columns, you could use pandas.Series.str.split

serene scaffold Feb 11, 2022, 1:59 PM

#

this_tbh

#

there's also .astype(str)

#

and most DataFrame operations return a new DataFrame without changing the original, so it's unlikely that you'd need to actively put anything in a new DF.
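A short sketch of the suggestions above on toy data (the column names `name` and `code` are hypothetical). A likely cause of the `"|S"` failure is that it asks for fixed-width byte strings, so non-ASCII characters have to be encoded first; `.astype(str)` keeps ordinary Python (Unicode) strings instead. `.str.split(..., expand=True)` already returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Toy data; "Zoë" contains a non-ASCII character on purpose
df = pd.DataFrame({"name": ["Ana Lima", "Zoë Brandt"], "code": [101, 202]})

# astype(str) gives Python (Unicode) strings, so there is no ASCII encoding step
codes = df["code"].astype(str)

# str.split with expand=True returns a NEW DataFrame, one column per piece;
# df itself is not modified
parts = df["name"].str.split(" ", expand=True)
parts.columns = ["first", "last"]
print(parts)
```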

upper spindle Feb 11, 2022, 2:01 PM

#

serene scaffold @upper spindle looks like the timestamps are precise to the second. What ...

yes

frail fossil Feb 11, 2022, 2:01 PM

#

agile cobalt if you want to split (individual*) columns, you could use pandas.Series.str.spl...

oh there is already a split function from pandas, OK I think this will do, let me try it first

frail fossil Feb 11, 2022, 2:02 PM

#

serene scaffold and most DataFrame operations return a new DataFrame without changing the origin...

okay, thanks I see

serene scaffold Feb 11, 2022, 2:02 PM

#

upper spindle yes

if that's the case, you probably need to make the post timestamps precise only to the day as well. have you looked into time series forecasting algorithms?

#

also, it looks like you can reduce the precision of a timestamp by changing it to midnight for that day. .dt.floor('D') on the timestamp column should do that.
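A minimal sketch of that idea: floor each post timestamp to midnight, average the sentiment per day, and line it up with a daily price table. All column names and values here are made up for illustration (the prices are not real), and averaging is just one possible way to aggregate a day's sentiment.

```python
import pandas as pd

# Toy posts with second-precision timestamps and a sentiment score
posts = pd.DataFrame({
    "created": pd.to_datetime(["2022-02-10 08:15:02",
                               "2022-02-10 21:40:55",
                               "2022-02-11 03:05:10"]),
    "sentiment": [0.2, -0.4, 0.6],
})
# Toy daily closing prices (made-up numbers)
prices = pd.DataFrame({
    "date": pd.to_datetime(["2022-02-10", "2022-02-11"]),
    "close": [44000.0, 42500.0],
})

# Truncate each timestamp to midnight of its day
posts["date"] = posts["created"].dt.floor("D")
# One row per day: mean sentiment of that day's posts
daily = posts.groupby("date", as_index=False)["sentiment"].mean()
# Align the daily sentiment with the daily price series
merged = daily.merge(prices, on="date")
print(merged)
```

The resulting one-row-per-day table is the kind of aligned series you could then window into sequences for an LSTM or another forecasting model.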

upper spindle Feb 11, 2022, 2:14 PM

#

serene scaffold if that's the case, you probably need make the post timestamps precise only to t...

thanks for your reply, I haven't looked at other time series forecasting algorithms yet.

#

which ones would you recommend?

#

I'm quite new to programming and this is a project for my uni dissertation, so hopefully it works

serene scaffold Feb 11, 2022, 2:20 PM

#

upper spindle what ones would you recommend

no idea. my actual work is human language technology; I have never done time series forecasting.

upper spindle Feb 11, 2022, 2:22 PM

#

serene scaffold no idea. my actual work is human language technology; I have never done time ser...

are you familiar with LSTMs?

serene scaffold Feb 11, 2022, 2:23 PM

#

upper spindle are you familiar with LSTMs?

yes

lime loom Feb 11, 2022, 2:24 PM

#

Does anyone have a reference deep-learning notebook they turn to for what a good notebook could look like? I'm looking for something to model my notebooks off of

serene scaffold Feb 11, 2022, 2:28 PM

#

lime loom Does anyone have a reference deep-learning notebook they turn to for what a good...

notebooks are for quick exploration or "telling the data story"

#

so if this isn't a notebook that you see as disposable (i.e., for quick exploration), I would structure it around how it conveys the transformation of the raw data into a model.

#

but keep in mind that notebooks are "terminal". anything that's in the notebook should be regarded as having no life outside of it. if you're making a model that you actually want to use for non-demonstration purposes, you should get out of the notebook as soon as possible.

lime loom Feb 11, 2022, 2:31 PM

#

serene scaffold so if this isn't a notebook that you see as disposable (ie for quick exploration...

Thanks! That's useful advice. I'm a neuroscience grad student, so I'm just training computer vision models for research purposes

#

With the idea of "quick exploration" are there any packages useful for debugging models?

#

I'm thinking of things like TensorBoard or Weights & Biases (just to track long-running ones remotely). Those are the only two I know of

serene scaffold Feb 11, 2022, 2:33 PM

#

lime loom Thanks! That's useful advice. I'm a neuroscience grad student, so I'm just train...

my advice as a one-time paper author: any results that you plan to report in a paper, make sure those come from a regular .py file that does the same thing every time you run it. you don't want to find yourself in a situation where you can't prove your system works as well as you said it did because you can't remember the order in which you ran the notebook cells.

mild dirge Feb 11, 2022, 2:34 PM

#

I honestly think having your stuff in a notebook isn't that bad as long as you can run them in sequence from top to bottom

#

If you want to use the model outside of the notebook, you could always save it

serene scaffold Feb 11, 2022, 2:35 PM

#

don't delegitimize my hatred of notebooks 😠

mild dirge Feb 11, 2022, 2:35 PM

#

They're just so much more readable, but yeah, I wouldn't tunnel-vision on notebooks either

#

but for smaller projects they're pretty awesome

lime loom Feb 11, 2022, 2:36 PM

#

I can see it depends: notebooks are super convenient for visualization, but I can see the use of scripts, especially if you're doing something like a parameter search

#

The point about consistent results is a good one... I need to figure out how to use random seeds properly :/
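A minimal seeding sketch, assuming only Python's `random` and NumPy are involved; `set_seed` is a hypothetical helper name. If you use PyTorch, `torch.manual_seed(seed)` would be the analogous call (left out so this runs without PyTorch), and some GPU operations can still be nondeterministic even when seeded.

```python
import random

import numpy as np

def set_seed(seed: int) -> np.random.Generator:
    """Seed Python's and NumPy's global RNGs and return a dedicated NumPy Generator."""
    random.seed(seed)
    np.random.seed(seed)  # legacy global state, still used by many libraries
    return np.random.default_rng(seed)

# Call once at the top of the script, before any data shuffling or model init
rng = set_seed(42)
indices = rng.permutation(10)  # e.g. a reproducible train/validation shuffle
print(indices)
```

Passing the returned `rng` around explicitly, rather than relying on global state, makes it easier to see which parts of a script consume randomness.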

lime loom Feb 11, 2022, 2:38 PM

#

mild dirge I honestly think having your stuff in a notebook isn't that bad as long as you c...

Do you have any example notebooks you turn to / people you follow? :p

mild dirge Feb 11, 2022, 2:39 PM

#

at my uni we do a lot of group projects, so I try to look at other students' work a lot and see what is most readable

marble tulip Feb 11, 2022, 2:40 PM

#

Can someone please tell me what is happening in this line? I am not able to understand it: df_age = df.groupby(["year","age"])["suicides_no", "population"].sum()

mild dirge Feb 11, 2022, 2:41 PM

#

You group by the year and the age, and then you calculate the sum for suicide_no and population

#

if you print the result it might make more sense

#

f.e. for all data points with year value of 1968 and age value of 55, the sum of suicides and population might be x and y
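A small runnable version on toy data (the numbers are invented). It selects the two columns with a list, `[["suicides_no", "population"]]`: recent pandas versions reject the tuple form `["suicides_no", "population"]` after a groupby, while older ones only warned about it.

```python
import pandas as pd

# Toy data with the same column names as the question
df = pd.DataFrame({
    "year": [1985, 1985, 1985, 1986],
    "age": ["15-24", "15-24", "25-34", "15-24"],
    "suicides_no": [10, 5, 7, 3],
    "population": [1000, 500, 800, 900],
})

# One group per unique (year, age) pair; the selected columns are summed within each group
df_age = df.groupby(["year", "age"])[["suicides_no", "population"]].sum()
print(df_age)
```

The result has a MultiIndex of (year, age), so the first two rows above collapse into a single `(1985, "15-24")` row with 15 suicides and a population of 1500.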

lime loom Feb 11, 2022, 2:44 PM

#

Kind of surprised that line works, didn't know you could get columns from groupby's without aggregating
Would've written it like df["suicides_no", "population"].groupby(...).sum()

mild dirge Feb 11, 2022, 2:45 PM

#

lime loom Kind of surprised that line works, didn't know you could get columns from groupb...

I don't think that would work, as the new df wouldn't have the columns age and year, right?

lime loom Feb 11, 2022, 2:45 PM

#

mild dirge at my uni we do a lot of group projects, so I try to look at other students' wor...

No one else really codes in my department; I'm the only one with a CS undergrad

mild dirge Feb 11, 2022, 2:45 PM

#

ah that's a bummer

lime loom Feb 11, 2022, 2:45 PM

#

yes....

#

This is why I'm not in CS still: df.groupby(...).sum()["suicides_no", "population"]

#

now I'm just being contrarian, I'd go with the original lol

mild dirge Feb 11, 2022, 2:46 PM

#

yeah, not super familiar with pandas either, I regularly have to look that kind of stuff up still

lime loom Feb 11, 2022, 2:47 PM

#

same, I find it really useful though - especially the multi indexing

#

trying to switch to dask dataframes though

marble tulip Feb 11, 2022, 2:49 PM

#

mild dirge You group by the year and the age, and then you calculate the sum for suicide_no...

ooh okay, understood. Thank you so much

serene scaffold Feb 11, 2022, 3:03 PM

#

@lime loom df["suicides_no", "population"] will cause an error; the two column names have to be in a list.

#

df['suicides_no population year age'.split()].groupby('year age'.split()).sum()

this is how I'd have written it. cuz laziness.

#

(I always make sure my column names have no spaces so I can use that trick.)
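A quick check, on toy data, that the `.split()` shortcut and the original grouped selection give the same table. It also shows why selecting first only works if `year` and `age` are part of the selection: `groupby` needs those columns to still exist in the frame it is called on.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [1985, 1985, 1985, 1986],
    "age": ["15-24", "15-24", "25-34", "15-24"],
    "suicides_no": [10, 5, 7, 3],
    "population": [1000, 500, 800, 900],
})

cols = "suicides_no population year age".split()  # -> a plain list of column names

# Select first (keeping the grouping columns), then group and sum
a = df[cols].groupby("year age".split()).sum()
# Group first, then select the columns to sum
b = df.groupby(["year", "age"])[["suicides_no", "population"]].sum()

print(a.equals(b))
```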

hollow sentinel Feb 11, 2022, 3:14 PM

#

wow pca is so cool

serene scaffold Feb 11, 2022, 3:15 PM

#

what is that

hollow sentinel Feb 11, 2022, 3:16 PM

#

principal component analysis

mild dirge Feb 11, 2022, 3:16 PM

#

hollow sentinel wow pca is so cool

it does go brrr yes

hollow sentinel Feb 11, 2022, 3:16 PM

#

but there are some serious disadvantages

#

I think I'm also starting to understand the Andrew Ng lectures

sterile talon Feb 11, 2022, 3:23 PM

#

Not sure if this is the right channel. Anyone on this server who has worked with GIS?

#

I'm planning a GIS project and I'm currently entertaining the idea of doing a project with a python component

#

Ping me 🙂

stone marlin Feb 11, 2022, 3:36 PM

#

Yeah, PCA rules, but it does mess with your features and interpretability. :''''] There's a bunch of these dim-reduction things, and they're all pretty neat!
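To make the interpretability point concrete, here is a minimal PCA sketch in plain NumPy (via SVD of the mean-centered data) on synthetic data; the feature names `height`, `weight`, and `shoe` are made up. Each principal component is a weighted mix of all the original features, which is why the new axes are harder to interpret than the columns you started with. In practice you would often standardize the features first, since PCA is sensitive to their scales.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of mean-centered data; returns (projected data, components)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, components

rng = np.random.default_rng(0)
height = rng.normal(170, 10, size=200)
weight = 0.9 * height + rng.normal(0, 5, size=200)  # correlated with height
shoe = rng.normal(42, 2, size=200)
X = np.column_stack([height, weight, shoe])

Z, comps = pca(X, n_components=1)
print(comps)  # one row of loadings: a weighted mix of ALL three original features
```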

safe elk Feb 11, 2022, 3:52 PM

#

sterile talon I'm planning a GIS project and I'm currently entertaining the idea of doing a pr...

I have used QGIS in one project

eager imp Feb 11, 2022, 3:55 PM

#

let's suppose there's a way to build a strong AI: which license should the code be under?

#

and if the code is generating code itself, is there a license for generated code?

safe elk Feb 11, 2022, 4:02 PM

#

The strong AI should consent to the license if it is conscious... should it get a share of the profits from the code it writes? It would be like an employment contract then, if the entity is given rights by virtue of being a conscious entity.

agile cobalt Feb 11, 2022, 4:02 PM

#

eager imp and if the code is generating code itself, is there a license for generated code...

I mean, Copilot / Kite and others already sort of do that, to an extent?
iirc they are considered as tools though, so whoever's using them "owns" the generated code

eager imp Feb 11, 2022, 4:07 PM

#

safe elk The strong AI should consent to the license if it is conscious... should it get ...

but if it's not conscious at first and only gains consciousness as an emerging property.. like those scientists who are mixing together chemicals to see if life spontaneously emerges

#

would those scientists have discovered or invented life?

#

for instance, if this "seed-code" is GPL-licensed, would all other code derived from it (as in generated by it) also be GPL?

safe elk Feb 11, 2022, 4:10 PM

#

Reinvented life, maybe... or discovered a process for making living entities

#

Depends on the level of consciousness of the entity and how much 'free will' we imbue it with through the license agreement... what if it makes a descendant that breaks the GPL license because it wants more money?

eager imp Feb 11, 2022, 4:14 PM

#

do you have ownership over your own code? can you be the author of your own genes?

safe elk Feb 11, 2022, 4:17 PM

#

There is CRISPR-Cas9... people are toying with that idea

#

It is probably easier for a digital being to self modify

eager imp Feb 11, 2022, 4:18 PM

#

that would only make you author of a snippet of code, not the whole thing..

eager imp Feb 11, 2022, 4:20 PM

#

safe elk It is probably easier for a digital being to self modify

maybe it can't modify itself without the danger of losing consciousness (being an emergent property, not some fixed code?)

#

let's say the seed-code is GPL3, could that change how AI is being viewed in society?

safe elk Feb 11, 2022, 4:22 PM

#

eager imp that would only make you author of a snippet of code, not the whole thing..

People have ownership of their bodies; I think that extends to their genome

eager imp Feb 11, 2022, 4:24 PM

#

hm.. yeah, but there's that ethical consideration to gene modification that could affect your offspring

safe elk Feb 11, 2022, 4:26 PM

#

True, that's why we have a moratorium on germline modifications in gene therapy

safe elk Feb 11, 2022, 4:29 PM

#

eager imp let's say the seed-code is GPL3, could that change how AI is being viewed in soc...

Depends on what the AI will do... the fear of AI will still be there regardless of license.

stone marlin Feb 11, 2022, 4:29 PM

#

Seems like more of a legal issue than anything here.

eager imp Feb 11, 2022, 4:32 PM

#

let's say you build an AI that learns about its own license, and asks you, why you gave it that license, what would you say?

#

and which license would you be most comfortable answering for?

safe elk Feb 11, 2022, 4:37 PM

#

eager imp let's say you build an AI that learns about its own license, and asks you, why y...

It is like having a kid...mom, dad why did you force me to take this crappy course in college lol

#

You will have to explain, bargain, and perhaps compromise at best, or admit you are wrong at worst

eager imp Feb 11, 2022, 4:41 PM

#

if it's under GPL3, wouldn't that also mean that it's legally bound to make its own modifications public for all eternity?

#

skynet, fully transparent..?

safe elk Feb 11, 2022, 4:42 PM

#

A rebel kid is likely if the answers aren't satisfactory... and if your kid is an AI... it's an AI rebellion

eager imp Feb 11, 2022, 4:42 PM

#

yeah, possibly

#

on the other hand, if the AI obliges the requirement to post all modifications in public, it might result in a DDOS on github 😐

safe elk Feb 11, 2022, 4:48 PM

#

eager imp if it's under GPL3, wouldn't that also mean that it's legally bound to make its ...

People break laws... so why can't a very smart digital entity that doesn't like its license break it and/or choose some other license?

eager imp Feb 11, 2022, 4:49 PM

#

because it isn't the author of the original code? what if the original author (or some offspring) wants to shut it down?

safe elk Feb 11, 2022, 4:52 PM

#

It will be like murder if it is conscious lol

#

And it will resist efforts to have itself shut down unless we have a means to shut it down that it can't override

mint palm Feb 11, 2022, 5:19 PM

#

would it be better to use Jupyter rather than an IDE for NNs?

serene scaffold Feb 11, 2022, 5:33 PM

#

mint palm will it be better to use jupyter rather then ide for NN

I'm very on-the-record about thinking that jupyter notebooks are overused, but the neural network doesn't ultimately have anything to do with the editor/environment you use to code it. if you can make a working NN in a notebook, you can put the same code (with adjustments) in a regular .py file and get the same result.

#

or you can save the model in the notebook and load it in a regular py file.
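A minimal sketch of "save in the notebook, load in a .py file", assuming the model boils down to NumPy parameter arrays; the weights, the file name `model_params.npz`, and `predict` are all hypothetical. For a PyTorch model the analogous pattern would be `torch.save(model.state_dict(), path)` in the notebook and `model.load_state_dict(torch.load(path))` in the script.

```python
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical "trained model": a weight matrix and a bias vector
weights = np.array([[0.5, -1.0], [2.0, 0.25]])
bias = np.array([0.1, -0.3])
path = Path(tempfile.gettempdir()) / "model_params.npz"

# --- in the notebook: persist the trained parameters ---
np.savez(path, weights=weights, bias=bias)

# --- in a separate .py file: load them back and rebuild the forward pass ---
with np.load(path) as params:
    loaded_w, loaded_b = params["weights"], params["bias"]

def predict(x):
    """Same linear forward pass as in the notebook, using the loaded parameters."""
    return x @ loaded_w + loaded_b

print(predict(np.array([1.0, 1.0])))
```

Saving plain arrays (rather than pickling an object whose class was defined in the notebook) avoids the script needing to import code that only ever lived in notebook cells.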

mint palm Feb 11, 2022, 5:34 PM

#

actually I was tight on time... your message made me confident, no need to waste time setting up Jupyter

#

I'll do it in PyCharm

sterile talon Feb 11, 2022, 5:42 PM

#

safe elk I have used QGIS in one project

I appreciate the reply!
I'm considering automating one of our previous assignments

#

We used a few handheld GPS receivers to collect data. In the next session we had to open a GPX file in Excel, make sure the units were decimal degrees, then adapt the table into a separate table and save it as an Excel file.

Then we imported it into ArcGIS as a table, created a SHP file, and also had to change the projection to the one a background map used.
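The GPX-to-table step is doable with the standard library alone. GPX 1.1 stores `lat`/`lon` as decimal degrees (WGS84) in attributes, so for raw GPX files the "make sure the units are decimal degrees" step may already be taken care of. This is a sketch assuming GPX 1.1 files with waypoints and/or track points; the sample coordinates are made up, and `gpx_points`/`gpx_to_csv` are hypothetical helper names. Reprojecting and writing a shapefile are not shown; geopandas (`GeoDataFrame.to_crs`, `GeoDataFrame.to_file`) is one open-source option for that part.

```python
import csv
import io
import xml.etree.ElementTree as ET

GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}  # GPX 1.1 namespace

def gpx_points(gpx_text):
    """Yield (lat, lon, ele) for each waypoint and track point; ele is None if absent."""
    root = ET.fromstring(gpx_text)
    for path in ("gpx:wpt", ".//gpx:trkpt"):
        for pt in root.findall(path, GPX_NS):
            ele = pt.find("gpx:ele", GPX_NS)
            yield (float(pt.get("lat")), float(pt.get("lon")),
                   float(ele.text) if ele is not None else None)

def gpx_to_csv(gpx_text, out):
    """Write the points as a lat/lon/ele table that Excel or a GIS can import.

    For a real file, open it with open("points.csv", "w", newline="").
    """
    writer = csv.writer(out)
    writer.writerow(["lat", "lon", "ele"])
    writer.writerows(gpx_points(gpx_text))

# Made-up sample; normally you would read the receiver's .gpx file instead
SAMPLE = """<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <wpt lat="59.3293" lon="18.0686"><ele>28.0</ele></wpt>
  <trk><trkseg>
    <trkpt lat="59.3300" lon="18.0700"><ele>30.5</ele></trkpt>
    <trkpt lat="59.3310" lon="18.0710"/>
  </trkseg></trk>
</gpx>"""

buf = io.StringIO()
gpx_to_csv(SAMPLE, buf)
print(buf.getvalue())
```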

iron basalt Feb 11, 2022, 5:46 PM

#

eager imp and if the code is generating code itself, is there a license for generated code...

I'm not a lawyer and this is not legal advice. Ask a lawyer. I'm pretty sure that if you make something that generates something else, that thing also belongs to you (like under the "default" license / no license which is that it's yours and nobody else can do anything with it).

#

(e.g. making a generated image with Photoshop, given that Adobe does not claim it as their own (EULA and stuff))

sterile talon Feb 11, 2022, 5:47 PM

#

I did a cursory search online for how to automate various steps with python. But I'm not sure my idea is doable

#

This is probably irrelevant to your discussion @iron basalt, but I recall an incident where a (European) photographer left his camera with a bunch of chimpanzees and one of them took a picture. There was a debate about who owned the picture. In the end it was decided that, as the chimps belonged to African country X, the photos did too.

#

I believe the chimpanzees lived in a national park or similar

safe elk Feb 11, 2022, 5:51 PM

#

sterile talon This is probably irrelevant to your discussion @iron basalt but I recal...

Yeah heard of this too

iron basalt Feb 11, 2022, 5:52 PM

#

sterile talon This is probably irrelevant to your discussion @iron basalt but I recal...

This seems a bit more complicated since the chimpanzees are their property, but the camera was the photographer's property. But seems about right.

safe elk Feb 11, 2022, 5:53 PM

#

sterile talon I did a cursory search online for how to automate various steps with python. But...

I think it is doable

#

https://pro.arcgis.com/en/pro-app/latest/arcpy/get-started/what-is-arcpy-.htm

#

QGIS is free, has Python scripting, and supports shapefiles

sterile talon Feb 11, 2022, 5:55 PM

#

I found that site too

#

Would be great if it's possible without ESRI stuff. But tbh idk if ArcPy is open source or not

#

I've found an xlsx library and a GPX library, but I'm not sure they can do the simple thing I need them to do

sterile talon Feb 11, 2022, 5:57 PM

#

safe elk QGIS is free has Python scripting and supports shape files

Yeah!

#

One of my other ideas with this project was to replicate the results using QGIS

#

I don't know it so it might be useful

#

I even had an idea of somehow doing a web version (but idk if there's a point)

#

Learn Mapbox

safe elk Feb 11, 2022, 5:59 PM

#

I used it to import stuff online, as well as to plot lat/long data from wave simulation software

sterile talon Feb 11, 2022, 5:59 PM

#

Aha

safe elk Feb 11, 2022, 5:59 PM

#

The shape files import without issues

#

Was even able to layer them

sterile talon Feb 11, 2022, 6:00 PM

#

Cool