#data-science-and-ml


safe elk
#

QGIS closest to ArcGIS

sterile talon
#

I remember giving QGIS a shot as self study summer of 2020 but the tutorial video series I found on YouTube was by this Swedish guy (I'm Swedish myself) and his English was really boring to listen to and had a strong Swedish accent. I fell asleep

#

😜

safe elk
#

Mapbox (the web version) might be useful if you want something like an online dashboard... but it is more work compared to QGIS if you are coming from ArcGIS... depends on what you are using the maps for

brazen spire
#

Are biases updated the same way as weights in a neural network?

#

with the Delta rule?
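A minimal sketch for the question above, assuming a single linear neuron trained with the delta rule: the bias gets the same update as the weights, as if it were a weight whose input is always 1. The function name `delta_rule_step`, the learning rate `lr` and the toy numbers are illustrative assumptions, not from the chat.

```python
def delta_rule_step(weights, bias, inputs, target, lr=0.1):
    """One delta-rule update for a single linear neuron."""
    output = sum(w * x for w, x in zip(weights, inputs)) + bias
    error = target - output
    # Weights: w_i <- w_i + lr * error * x_i
    new_weights = [w + lr * error * x for w, x in zip(weights, inputs)]
    # Bias: same rule, with a constant input of 1
    new_bias = bias + lr * error * 1.0
    return new_weights, new_bias


weights, bias = delta_rule_step([0.0, 0.0], 0.0, inputs=[1.0, 2.0], target=1.0)
print(weights, bias)
```

In a multi-layer network the same idea holds with backpropagated error terms in place of `error`.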

safe elk
sterile talon
sterile talon
#

Aha ok I see.. Dashboards..

safe elk
sterile talon
#

I'm in Earth science 🙂

#

Hydrology /hydrogeology as a specialty

#

Have you tried Julia? I started a few weeks ago. I like it a lot!

#

It has great potential imho

safe elk
sterile talon
#

Makes sense

#

I'd like to travel around Asia at some point

#

Just need to finish my degrees and get a job!

safe elk
safe elk
sterile talon
#

Julia has UTF-8 support

sterile talon
#

If its related to the uni that is..

safe elk
#

They need to... and yes it is related

sterile talon
#

^^

safe elk
sterile talon
#

We have mainly worked with matlab and domain specific software /coding

#

Ty!

safe elk
#

Used Matlab too lol

sterile talon
#

Geochemical simulations, fluid simulations

#

PHREEQC, GMS (a GUI for MODFLOW)

#

And others

safe elk
#

Navier Stokes and like

sterile talon
#

Ye they are in there somewhere. I'm glad I didn't have to look for em

safe elk
#

Yep having them there is nice

sterile talon
#

I've been thinking about doing something crazy in my spare time and writing something, perhaps in Julia, to simplify working with PHREEQC.

#

Started on a webdev project for a hydrochemistry course. Haven't had time to finish

#

Sorry if I'm way off topic.

safe elk
#

Lol people here are geeky

#

I think it isnt an issue

sterile talon
#

I was at the gym today earlier 😉

#

Yes I'm quite geeky.. Even my GF with a PhD in organic chemistry says so.

safe elk
#

Lol I majored in Chem

hexed schooner
#

can anyone tell me why this code uses tensorflow v1 and not v2, and why it runs very slowly if i use tensorflow v2?

sterile talon
#

Wow that's cool!

hexed schooner
#

is tensorflow 2 worse than v1?

#

hmm... I dont know pytorch, I am more familiar with tensorflow v2

hexed schooner
#

yea I'll learn Pytorch but not now 😒 because I need to submit the work by 14 feb and I only know tensorflow now

#

at first I thought that it is because tensorflow v2 is using GPU but even if i switch to Google Colab and use GPU it runs slow too

#

but when I include

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

it runs very fast

karmic moth
#

Hi guys...sorry to interrupt

#

is any one of u guys data scientists or AI experts?

hexed schooner
#

what is it

#

I'm just a student

karmic moth
#

oh nw

#

no im a student as well

#

am in my final year..i'm trying to find some experts to interview for my final year thesis, cuz im implementing a deep learning model for detecting inconsistent reviews on Amazon, and i need to interview some experts to gain their opinions on my approaches and techniques

#

and i have been struggling and couldnt find anyone..

karmic moth
#

u want me to run the code?

#

uhm k...is this to check if im a bot or not?

#

then..

#

wait

#

il run

#

i dont have the packages

hexed schooner
#

wait so what do u mean by narrow it down

#

ohhh i see

#

i tried to do that too

#

but it seems like normal to me...

#

because it is just normal Sequential model fit and predict

#

its jupyter notebook

restive rock
#

hi a friend of mine needs help making reports with python....some data science stuff...would anyone be willing to assist....
I've hardly used python..

serene scaffold
#

you're making reports. what reports?

restive rock
#

he doesn't need rn....
he'll do it tomorrow, i just saw his text and since i don't have experience with python i hopped here

serene scaffold
restive rock
#

cool

grave frost
#

it's a long-fixed issue 🤷‍♂️ please don't give wrong advice to others without reading your own link first....

grave frost
strong tapir
#

Thanks now im getting non-zero outputs but just like you said its gonna require some further tuning for desired behavior

#

i think this is mainly just because i need better input data but i think i can do it from here, I appreciate the help

late ruin
#

Hi guys, maybe someone could help me, I have this function, that im using at the start of my machine learning script, and I'm stuck understanding how I could predict the winner when the winners in my column are the name of the team, thanks in advance

chrome marten
#

how do i pass an actual image in this model to get 128 dimensional vectors?

inputs = tf.keras.Input(shape=(32, 32, 3))
x = tf.keras.layers.Conv2D(filters=32, kernel_size=(1, 1), activation='relu')(inputs)
x = tf.keras.layers.MaxPool2D()(x)
x = tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu')(x)
x = tf.keras.layers.MaxPool2D()(x)
x = tf.keras.layers.Conv2D(filters=128, kernel_size=(3, 3), activation='relu')(x)
x = tf.keras.layers.MaxPool2D()(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
prime hearth
#

hello im doing a linear regression model however my cost function is fluctuating

#

is this normal

#

?

#

my lowest cost is 1

dusk tide
prime hearth
#

im trying to predict annual salary based on 3 features

dusk tide
#

What's the total cost coming after training??

prime hearth
#

oh no not yet, also this was implemented from scratch without any libraries

#

so i dont think i have access to these metrics

#

lemme check total cost

#

total cost is 320

#

not sure what this mean

#

i can try using libraries i just practicing with implementing from scratch

dusk tide
prime hearth
#

training

#

which is worse im guessing

dusk tide
prime hearth
#

lemme try visualizing the current weights and the testing data and il get back here

#

its because the data is not quite linear in every feature

#

its from kaggle dataset insurance

#

so i picked the top 3 features with high correlation to the target

#

and removed ones with very low correlation

dusk tide
prime hearth
#

gaussian distribution and linear decision boundary?

#

i know these are two

#

not sure for other assumptions

#

and relationship - some type of correlation

dusk tide
prime hearth
#

yes i removed multicollinearity

#

thanks for sharing that too

dusk tide
#

And also feature preprocessing and other things are required before modelling

prime hearth
#

yes i did do that

#

i scaled data and also applied categorical transformation

#

and removed outliers with IQR

dusk tide
#

You can try polynomial regression

prime hearth
#

oh moving it up a dimension

#

i guess i will need to learn how to do that from scratch

#

only know how to do that with libraries
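A rough sketch of polynomial regression without sklearn, assuming NumPy is acceptable for the array math: expand each feature into its powers, then reuse ordinary linear regression trained by gradient descent. The helper names `polynomial_features` and `fit_linear`, the learning rate and the synthetic data are all assumptions made for illustration; this is not the insurance dataset from the chat.

```python
import numpy as np


def polynomial_features(X, degree):
    """Stack X, X**2, ..., X**degree column-wise (no cross terms)."""
    X = np.asarray(X, dtype=float)
    return np.hstack([X ** d for d in range(1, degree + 1)])


def fit_linear(X, y, lr=0.1, epochs=5000):
    """Batch gradient descent on mean squared error."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        residual = X @ w + b - y
        w -= lr * (X.T @ residual) / n_samples
        b -= lr * residual.mean()
    return w, b


# Synthetic, noise-free data: y = 2*x^2 - x + 0.5
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(200, 1))
y = 2.0 * x[:, 0] ** 2 - x[:, 0] + 0.5

X_poly = polynomial_features(x, degree=2)
w, b = fit_linear(X_poly, y)
print(w, b)
```

The model is still linear in its weights, so the from-scratch training loop does not change; only the input columns do. Scaling the expanded features usually matters more as the degree grows.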

dusk tide
prime hearth
#

okay thanks

#

it cause i saw someone else in kaggle

#

do linear regression

#

and got 70% accuracy

#

with sklearn

#

i thought i could get the same without sklearn

hexed schooner
#

is deep q network same as double q learning?

gilded bobcat
#

Hi all quick question

#

is it normal or atypical to see a simple train test split beat out k-fold cross validation in out of sample prediction?

#

2 models, 2 methods of sample splitting.

misty flint
#

depends on datasize

#

there could be overfitting

uneven cargo
#

👋 Hey all, I've put together a Jupyter notebook that I'm trying to make sure has a really good developer experience when sharing as I want to use it as a tutorial for how to encode data as vectors, cluster it using KMeans, dimensionality reduce it with PCA and then visualise in a projector and dashboard. I'd be super keen on getting your thoughts and feedback on how to make it as usable as possible: https://colab.research.google.com/drive/1C6waQQCXKqXyG2ZRmrJohZn9UEe-8iI3?usp=sharing

kind rock
#

I'm trying to build a machine that plays rock paper scissors against a user and then learns along the way. Does this come under supervised or unsupervised?

dim heart
#

hi

#

any one here know about tensorflow

#

i have some problem with it

#

can someone help me ?

pastel valley
#

is it ok to use this kind of images on training a model lets say to classify a cat or not?

#

or its better to just use single cat on images for the dataset?

#

im talking about in training convolutional neural network models

#

so its better to use just like these kind of images?

#

the model will learn better with that? but if i try to input an image with multiple cat will it still recognize it as cat?

#

even i only trained it with images of single cat?

lapis sequoia
#

Why ?

pastel valley
#

yo any tips on resizing an image to be a NxN without making it like fat?

cunning parrot
#

If the file is stored in another folder

brazen spire
#

in this case

#

are there multiple ways to update b2?

#

If L = 0.5*( (u_target1 - n_3out)**2 + (u_target2 - n_4out)**2 )

#

is it ∂L/∂b2 = ∂L/∂n_3out * ∂n_3out/∂n_3in * ∂n_3in/∂b2 ?

#

or ∂L/∂b2 = ∂L/∂n_4out * ∂n_4out/∂n_4in * ∂n_4in/∂b2?
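One way to read the question above, as a hedged sketch: assuming b2 is a single bias shared by both output neurons n_3 and n_4 (the network diagram is not in the log, so this is an assumption), both paths contribute, and the two candidate expressions are added together rather than chosen between. With L as defined above:

```latex
\frac{\partial L}{\partial b_2}
  = \frac{\partial L}{\partial n_{3out}}\,
    \frac{\partial n_{3out}}{\partial n_{3in}}\,
    \frac{\partial n_{3in}}{\partial b_2}
  + \frac{\partial L}{\partial n_{4out}}\,
    \frac{\partial n_{4out}}{\partial n_{4in}}\,
    \frac{\partial n_{4in}}{\partial b_2},
\qquad
\frac{\partial L}{\partial n_{3out}} = -(u_{target1} - n_{3out}),
\quad
\frac{\partial L}{\partial n_{4out}} = -(u_{target2} - n_{4out}),
\quad
\frac{\partial n_{3in}}{\partial b_2} = \frac{\partial n_{4in}}{\partial b_2} = 1.
```

If each output neuron instead has its own bias, only that neuron's term remains.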

charred wedge
#

What would you use to capture data from a json streaming api?

#

I mean you don't really ask the api for anything, you just.. listen to a stream.

ionic palm
#
```py
inputs = tf.ragged.constant([[[0,0],[3,0],[0,4]],[[0,0],[1,0]]])
print(inputs.shape)
```
```
(2, None, None)
```
It does not recognize it is `(2,None,2)` , what should I do?
shut raven
#

Hey.
I wanna do a survey (this system is finished) but I need a good library to display the results.
That library should be able to display the results as rendered images (but with a good vision on each input, since there will be mostly 30-50 answers/per user/per survey) & with percentage/custom text.
Does anyone here know a good one for my project?

wicked grove
#

Hello

#

What can I do when the train and validation results differ by 10%?

dim heart
#

anyone here using tensorflow

nova tapir
#
  1. is f(i) the same as "x(i)" features. i mean, it is just a different symbol, right?

  2. if i'm not mistaken, l(i) is placed at the point where x(i) is, but the sim(similarity) between them is always 0? because l(i) is on x(i)

prime hearth
#

i would like some way to know how the different values of regularization are affecting the performance of my model
all i see right now is the cost going down but thats it, and plotting it on a graph doesnt help since it all looks the same

#

for linear regression model implemented from scratch ^

#

@nova tapir is this from school or a website? Might need to know how the writer is interpreting "l"

#

oh wait

#

l is actually data point

#

X is the full data

#

so it comparing one sample of observation with the full dataset

gilded kestrel
#

how can this be explained: high test accuracy but moderate accuracy in practice?

late ruin
#

hi quick question, if i want to give int values to a column of strings, is there a way to do it? i have a column of team names, and i want to give each team a unique value of an int, is there a way to do it?

junior wharf
#

Hey everyone. So, I was trying to find a fast, stable way to solve a system of equations Ax = b for constant A and many vectors, one at a time (so vectorization doesn't help me). I could calculate numpy.linalg.inv(A) and then multiply the vector b and that is very fast, but I believe it is unstable for the matrix and vectors I have to go through. I could also use np.linalg.solve(A, b) on every iteration, and that looks like it can be more stable, but is much slower. I thought if I factored my matrix beforehand and then used scipy.linalg.cho_solve() I could have a faster solution, but even though the factorization is done beforehand, solving the system is slower than the solve option. Is there a way to get better performance in this case?

serene scaffold
late ruin
serene scaffold
late ruin
serene scaffold
# late ruin yea im trying to use a svm model to predict match result between two teams, but ...

so, that's actually a great example of how assigning arbitrary numbers to each value wouldn't work. each input for a SVM is a point in space, and the SVM finds the boundaries between types of points. It doesn't make any sense to say that one team is "more in one direction" than another.

question is, is the team name a feature (information about a data point), or the target (the label for the data point)?
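A small sketch of the usual alternative to arbitrary integer IDs when the team name is a feature: one-hot encoding, where each team gets its own 0/1 column so that no team is "further along an axis" than another. The column names `home_team`, `away_team`, `home_win` and the team names are made up for illustration.

```python
import pandas as pd

# Hypothetical match history; the real data was not shown in the chat.
matches = pd.DataFrame({
    "home_team": ["Lions", "Tigers", "Bears", "Lions"],
    "away_team": ["Tigers", "Bears", "Lions", "Bears"],
    "home_win": [1, 0, 1, 1],
})

# One 0/1 column per (role, team) pair instead of one integer per team.
features = pd.get_dummies(matches[["home_team", "away_team"]], dtype=int)
target = matches["home_win"]
print(features)
```

Each row then has exactly two 1s (one home team, one away team), and `features` and `target` can be passed to a classifier.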

late ruin
#

feature

serene scaffold
late ruin
#

given a history of results in a match between two teams, predict a result of a match between them, or between any given two teams

serene scaffold
late ruin
#

umm i decided because i found couple of projects that used it, but im ok to use anything

serene scaffold
#

if you're going to use SVM, it sounds like you'd need to have a different SVM for each combination of teams.

tight dove
#

Hello all

#

I am trying to get a count of the number of customers by country, so I did this -

#
df = df.groupby(by=['country'])['name'].count()
serene scaffold
#

you're writing over the df variable, which I would avoid

#

if you're using a jupyter notebook, that's probably why you're having an issue.

tight dove
#

What is the best approach?

serene scaffold
tight dove
#

Aside from that, I was trying to express the output or the dataframe as -

<COUNTRY>: <number>
<COUNTRY>: <number>
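A minimal sketch of one way to get that output, assuming a frame with `name` and `country` columns as in the snippet above (the rows here are invented). The grouped counts are kept in a new variable rather than written over `df`.

```python
import pandas as pd

# Hypothetical customer data standing in for the real frame.
customers = pd.DataFrame({
    "name": ["Ada", "Bo", "Cy", "Di"],
    "country": ["SE", "NG", "SE", "NG"],
})

# Series indexed by country, with the number of non-null names per country.
counts = customers.groupby("country")["name"].count()

lines = [f"{country}: {n}" for country, n in counts.items()]
print("\n".join(lines))
```

`count()` skips missing names; `size()` would count every row regardless.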
serene scaffold
#

I have no further comments until seeing the raw text printed by print(df.head().to_dict('list')).

tight dove
#

Oh okay

serene scaffold
#

Ping me if you decide to show that. I'm happy to help you solve this, but I'm particular about what information askers make available.

upper spindle
#

gives me this

#

nvm,

#

i think i know where i went wrong

#

still comes up with this error: AttributeError: Can only use .dt accessor with datetimelike values

#

does anyone know how to convert the dates to just the day e.g. from 2021-12-31 23:48:38 to 2021-12-31

umbral anvil
#

Good morning, I have a question, so I leave a message here.
It constitutes an airport user prediction model.
We're going to predict the number of users every month.
I'm not sure which model to build.
scikit-learn? k-means?
I'd like to get technical advice.

lapis sequoia
#

hey they added support to py3.10 on a new tf?

lapis sequoia
#

check out pd.to_datetime function

serene scaffold
# upper spindle gives me this

is that column secretly a string column? you have to use pd.to_datetime (credit to RA), but you also have to write over the existing column.

serene scaffold
#

We're going to predict the number of users every month.
be really specific about what data you have that you can use to "predict the number of users".

umbral anvil
#

..um...

serene scaffold
#

let's forget that kmeans, classifiers, or sklearn exist for a moment. what data are you working with?

upper spindle
serene scaffold
upper spindle
#

i tried converting to datetime but it still doesnt work

serene scaffold
upper spindle
upper spindle
umbral anvil
upper spindle
#

thanks

serene scaffold
#

if you're lucky, it doesn't really require any work. see if pd.to_datetime(df['Date']) works. but remember, this won't change df or any of its columns in any way. it returns an entirely new Series.

serene scaffold
#

Those who I'm helping/attempting to help, please ping me if you respond.

umbral anvil
serene scaffold
#

@upper spindle did it work?

umbral anvil
upper spindle
serene scaffold
upper spindle
#

just checking on stackoverflow

serene scaffold
upper spindle
#

I typed this code pd.to_datetime(df['Date']) , the dtype was datetime64[ns]

serene scaffold
upper spindle
#

then I tried this df['Time'] = pd.to_datetime(df['Time'], unit='d')

serene scaffold
#

datetime64[ns] is an unambiguous way of storing a time.

upper spindle
serene scaffold
#

why didn't you try using pd.to_datetime(df['Time']) in conjunction with the .dt. thing we talked about earlier?

umbral anvil
serene scaffold
upper spindle
#

Im not sure how to use pd.to_datetime(df['Time']) in conjunction with .dt.

umbral anvil
# prime hearth for linear regression model implemented from scratch ^

I'm sorry, but I'll leave an answer in Japanese only for this person.
泣いオオカミー-san, your question is hard to understand. Please summarize it.
It seems that your question is the one that is hard to understand.
If you leave a message like this, no one can help you. I'm sorry.

serene scaffold
green niche
#

should you learn AI before going into the math or should you learn the math before going into AI.

serene scaffold
#

(well, you can separate the theoretical math from the AI, but not vice versa.)

green niche
#

I heard you can do a lot with AI without the math, but to be proficient, you need a lot of math

upper spindle
serene scaffold
green niche
#

ah ok

serene scaffold
#

and what is pd.to_datetime(df['Date'])?

green niche
#

so overall, the math is crucial to learn first before the AI

serene scaffold
#

though I don't know how much you like pure math.

upper spindle
serene scaffold
upper spindle
#

before, i had converted my unix time using df['Time'] = pd.to_datetime(df['Time'], unit='s'), which converted to a normal datetime like 2021-12-31 23:58:50 , but when i try df['Time'] = pd.to_datetime(df['Time'], unit='d') on the same data column it comes out as ValueError: non convertible value 2021-12-31 23:48:38 with the unit 'd'

serene scaffold
#

any time you say something "doesn't work", please be specific about what happens instead.

#

I can't help you debug if I don't know what's actually happening. Did you try pd.to_datetime(df['Date']).dt.floor('D')?

#

this is assuming that df['Date'] is still a Series of strings that are timestamps.
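A short sketch of the two steps discussed above, assuming `df['Date']` holds timestamp strings like the one quoted earlier: parse with `pd.to_datetime`, assign the result back (nothing changes in place), then use `.dt.floor('D')` to drop the time of day. The `Day` column name is an assumption.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2021-12-31 23:48:38", "2021-12-31 23:58:50"]})

# to_datetime returns a new Series, so it has to be assigned back.
df["Date"] = pd.to_datetime(df["Date"])

# .dt only works once the column is datetime-like; floor('D') keeps the day.
df["Day"] = df["Date"].dt.floor("D")
print(df)
```

`df["Date"].dt.date` is an alternative if plain `datetime.date` objects are preferred over midnight timestamps.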

upper spindle
upper spindle
serene scaffold
upper spindle
#

and btw, thanks for your responses, theyve helped a lot

serene scaffold
#

it's a difficult thing to wrap your head around: methods that change the object "in-place", vs functions/methods that return entirely new objects.

upper spindle
serene scaffold
upper spindle
serene scaffold
#

in "normal python", it's mostly methods changing things in-place (like list.append), whereas the python data science world is mostly returning new objects.
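To make the distinction concrete, a tiny sketch contrasting the two styles: `list.append` mutates the list and returns `None`, while `Series.sort_values` leaves the original alone and returns a new object. The variable names are arbitrary.

```python
import pandas as pd

numbers = [3, 1, 2]
result = numbers.append(4)   # changes `numbers` in place, returns None

s = pd.Series([3, 1, 2])
sorted_s = s.sort_values()   # returns a new Series; `s` is unchanged

print(numbers, result)
print(s.tolist(), sorted_s.tolist())
```

This is why `df['Date'] = pd.to_datetime(df['Date'])` needs the assignment: without it the converted Series is simply discarded.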

lime ocean
#

Is there any way to separate two IPython displays being run in the same notebook cell? Some sort of vertical spacer I can insert or something?

#

right now they are really squished together and they use the same horizontal scrollbar which is annoying

serene scaffold
#
SoftHints - Python, Data Science and Linux Tutorials

In this brief tutorial, we'll see how to display two and more DataFrames side by side in Jupyter Notebook. To start let's create two example DataFrames: import pandas as pd df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz'], 'value': [1, 2, 3]}) df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz'], 'value': [5,

lime ocean
#

yeah, that helps :)
thanks

dusk tide
#

The pic is the SVM question.
We need to find ||theta||.
Pic 2 has p(i) for example x, which is its projection onto theta.
So according to me, p(i) * ||theta|| >= 1 to classify example x.
Putting in p(i) = 2 we get ||theta|| = 1/2.
So option 2 is correct?

fallow rune
#

Hi guys, just asking. Does anyone here has an experience in doing NLP?

lapis sequoia
#

Hi, I'm having trouble understanding what the parameters a, b, c and d correspond to and why they are passed as the second argument to plt.plot:
```py
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize

x = np.linspace(0, 2, 100)
y = 1/3 * x**3 - 3/5 * x**2 + np.random.randn(x.shape[0]) / 20

def f(x, a, b, c, d):
    # cubic model; curve_fit estimates the coefficients a, b, c, d
    return a * x**3 + b * x**2 + c * x + d

params, param_covariance = optimize.curve_fit(f, x, y)

plt.figure(figsize=(8, 8))

plt.scatter(x, y)
plt.plot(x, f(x, params[0], params[1], params[2], params[3]), c='g', lw=3)
```

wary breach
#

What point seems most like an elbow? 25?

#

or 13?

lapis sequoia
tidal bough
lapis sequoia
#

these parameters are coming from optimize.curve_fit. basically the optimizer gives you parameters with which you can build a function whose curve passes close to all the points (they will probably be very close to the curve, if not on it).

#

hm, i am choosing the worst words, follow what reptile says.

tidal bough
#

Here's the result, with the params estimated shown

#

note that the original params were 1/3, -3/5, 0, 0, so it's pretty close but not perfect (obviously)

#

in fact, here's it with the original polynomial shown too:

cinder schooner
#

Hello, i have a question about the log loss metric. So what i understood is that logloss= -1*Log(Likelihood).
I have a model that i'm using for multi class classification, i'm showing for every epoch the accuracy, the precision, the recall, the logloss and the vallogloss. I'm using categorical crossentropy.
What i'm not understanding is why sometimes when the loss function decreases the logloss increases and sometimes when the loss function increases the log loss decreases. Shouldn't they like go together? like increase together or decrease together? What's the relation between them?

karmic moth
#

Does anyone know how to convert a request_json to a Dataframe

serene scaffold
serene scaffold
karmic moth
#

[{'ReviewId': 'RLP00H7L5ITZL', 'ReviewComment': "some dude yoinked my lil brothers bike, It's my fault for buying this dogsht fkn lock. DO NOT BUY IF YOU VALUE YOUR BIKE", 'StarRating': 1}, {'ReviewId': 'RMYJ0K43DKOLF', 'ReviewComment': "This lock literally fell apart after one use. All of the number rings slid off. It might be fixable, but for me this isn't worth it. Will be going back to a keyed lock", 'StarRating': 1}]

#

in this format

serene scaffold
#

Try pd.json_normalize
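A minimal sketch of that suggestion, assuming the parsed JSON is a list of flat dicts with the same keys as the records above (the values here are shortened placeholders, not the real reviews).

```python
import pandas as pd

# Placeholder records with the same keys as the posted format.
reviews = [
    {"ReviewId": "R1", "ReviewComment": "example text", "StarRating": 1},
    {"ReviewId": "R2", "ReviewComment": "another example", "StarRating": 5},
]

# One row per dict, one column per key.
df = pd.json_normalize(reviews)
print(df)
```

For flat records like these, `pd.DataFrame(reviews)` gives the same table; `json_normalize` mainly helps when the dicts are nested.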

#

Tfw I'm answering python questions on my phone right after waking up. With one eye open

karmic moth
#

lols

#

thnx dude!

serene scaffold
#

Did it work?

meager scroll
#

Hi guyz, do you know how to calculate sth like that on dataset in python?

serene scaffold
#

and what is delta t?

meager scroll
#

I'm working on dataset from stock, it's closure prices

serene scaffold
#

is t a day?

meager scroll
#

yes

serene scaffold
#

so delta t is the difference in closure price from the previous day?

meager scroll
#

yes

serene scaffold
#

alright. do you have an array of t values?

meager scroll
#

yes

serene scaffold
#

can you show the array?

meager scroll
#

I just do something like np.arange(1, len(data) + 1), which is equal to the number of days, and it's like 5k records

serene scaffold
#

okay, so it's just an arbitrary array of shape (len(data),)?

meager scroll
#

yeah

serene scaffold
#

great. though if each element is a t value, I'm still not sure how to get S(t)

#

alternatively, if each element is actually an S(t) value, then I don't know how to get S(t + Dt)

meager scroll
#

May it help?

#

In this article they used s(t + dt) - s(t)

serene scaffold
#

they define s(t) = ln(S(t))

meager scroll
#

yes, s(t) is ln of closure prices
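Under the definitions agreed on above (s(t) = ln S(t), with S(t) the daily closing price), the quantity s(t + dt) - s(t) can be sketched with NumPy as below. The function name `log_returns` and the sample prices are assumptions; how this feeds into the rest of the paper's formula is not settled in the chat.

```python
import numpy as np


def log_returns(close, dt=1):
    """s(t + dt) - s(t) for every t, where s = ln(close)."""
    s = np.log(np.asarray(close, dtype=float))
    return s[dt:] - s[:-dt]


close = [100.0, 110.0, 121.0, 110.0]   # made-up closing prices
r1 = log_returns(close, dt=1)          # one-day log returns
r2 = log_returns(close, dt=2)          # two-day log returns
print(r1, r2)
```

The result has `len(close) - dt` entries, because the last `dt` days have no later price to compare against.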

serene scaffold
#

can you show figure 1?

meager scroll
serene scaffold
#

does this mean that your data has a way to look up t and their respective s(t) values?

meager scroll
#

yes

serene scaffold
#

is it in a csv?

meager scroll
#

yes

serene scaffold
#

please drag/drop the CSV into this chat.

meager scroll
serene scaffold
#

alright, one moment

serene scaffold
meager scroll
#

Hmmm... they call it log return, I'm also confused about it

serene scaffold
#

I'm just trying to map the formula they gave you onto what data you have

meager scroll
#

https://www.r-bloggers.com/2019/03/inverse-statistics-and-how-to-create-gain-loss-asymmetry-plots-in-r/ There is R code where someone calculates it, but it looks like it's done in a different way(?)

#

ret <- cumsum(as.numeric(na.omit(ROC(p[d:end]))))

serene scaffold
#

If you know how the parts of the formula relate to the data in the CSV, I can help you with that

#

otherwise, I'm just guessing.

meager scroll
#

Alright. Will try to better understand what's going on in this paper. Thanks for your time!

wicked grove
#

After training my model,i tried evaluating it

#

And i get this

#

I can't understand why the test loss is greater than the test acc

#
 
```
score = model2.evaluate(X_new_img_test,onehot_t,batch_size=128)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
```
```
3/3 [==============================] - 14s 2s/step - loss: 0.8399 - accuracy: 0.7333
Test loss: 0.8398879766464233
Test accuracy: 0.7333333492279053
```
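On the loss-versus-accuracy question, a small pure-Python sketch may help: cross-entropy loss and accuracy are measured on different scales, so one being larger than the other says nothing by itself. Accuracy is a fraction between 0 and 1, while the loss is an average of -log(probability of the true class) and has no upper bound. The function name and the toy probabilities are invented for illustration.

```python
import math


def cross_entropy_and_accuracy(probs, labels):
    """Mean cross-entropy loss and accuracy for predicted class probabilities."""
    losses = [-math.log(p[y]) for p, y in zip(probs, labels)]
    hits = [p.index(max(p)) == y for p, y in zip(probs, labels)]
    return sum(losses) / len(losses), sum(hits) / len(hits)


# Three correct predictions and one confident mistake (the last sample).
probs = [[0.6, 0.4], [0.55, 0.45], [0.1, 0.9], [0.02, 0.98]]
labels = [0, 0, 1, 0]
loss, accuracy = cross_entropy_and_accuracy(probs, labels)
print(f"loss={loss:.4f} accuracy={accuracy:.4f}")
```

A single confident wrong prediction dominates the loss here, while it costs the accuracy only one sample out of four.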
#

These are my graphs

upper spindle
#

is there a way to calculate the average sentiment of each singular day ?
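One possible approach, sketched under the assumption that the frame has a datetime column `Time` and a numeric column `sentiment` (both names and all values are hypothetical): group the rows by calendar day and take the mean.

```python
import pandas as pd

df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2021-12-30 08:00:00",
        "2021-12-30 20:00:00",
        "2021-12-31 23:48:38",
    ]),
    "sentiment": [0.5, -0.1, 0.8],
})

# Group by the day each timestamp falls on, then average within each day.
daily = df.groupby(df["Time"].dt.floor("D"))["sentiment"].mean()
print(daily)
```

Days with no rows do not appear in the result; `df.resample("D", on="Time")["sentiment"].mean()` would include them as NaN instead.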

agile cobalt
#

how large are the Training and Test sets?
(*replying to urfaa)

lapis sequoia
#

where do i go to learn numpy, pandas, seaborn etc

#

theres like no good tutorials

#

im lost

shut obsidian
calm thicket
#

or the pandas docs?

serene scaffold
#

pandas is built on top of numpy, so they overlap in some ways.

stone marlin
#

Oh, this is a cool resource.

serene scaffold
stone marlin
#

I was making "koans" for Pandas + Numpy for some students I am getting soon, a la https://github.com/gregmalcolm/python_koans, and I love looking at tutorial resources to see what people feel like beginners / intermediates struggle the most on. :''']

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @keen cairn until <t:1644769791:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

prime hearth
#

hello, i would like to please ask, why do we find the min for arg min -log(p(x|theta))

#

for me it makes more sense to find max so arg max -log(p(x|theta))

#

because min is infinite

#

like x^2 for example

#

why argmax of x^2 and argmin of -x^2

#

it makes more sense for opposite

spare briar
#

p(x|theta) is our likelihood function

#

loss function is being derived by maximum likelihood

#

so arg max log(p(x|theta))

#

log is monotonic so arg max log(p(x|theta)) = arg max p(x|theta)

#

notice that arg max log(p(x|theta)) = arg min -log(p(x|theta))
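The equivalence can be checked numerically. The sketch below assumes a Gaussian likelihood with known sigma = 1 and searches a grid of candidate means; the data values and the grid are arbitrary choices for illustration.

```python
import numpy as np

data = np.array([1.8, 2.1, 2.4, 1.9, 2.3])
mus = np.linspace(0.0, 4.0, 401)          # candidate values of theta = mu

# Gaussian log-likelihood with sigma = 1, constant terms dropped.
log_lik = np.array([-0.5 * np.sum((data - mu) ** 2) for mu in mus])
nll = -log_lik                             # negative log-likelihood

best_by_max = mus[np.argmax(log_lik)]
best_by_min = mus[np.argmin(nll)]
print(best_by_max, best_by_min, data.mean())
```

Both searches pick the same grid point, which sits next to the sample mean. The log-likelihood is concave here, so it has a finite maximum, and its negation has a finite minimum at the same place.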

prime hearth
#

yes but why do we take max of logp(x)

#

because if we graph it derivative

#

and the function

spare briar
#

because p(x|theta) is our likelihood function

prime hearth
#

the max is infinite

spare briar
#

we want the highest likelihood of the observed data under our model

prime hearth
#

oh okay, so what about let say x^2

#

argmax x^2

spare briar
#

right so when our likelihood is gaussian

prime hearth
#

this isnt possible right

spare briar
#

then log p(x|theta) \approx -|x - mu|^2

#

because the gaussian has the form

#

e^(-|x-mu|^2/(2\sigma^2))

#

so when we take the log we have exponent of gaussian concave down

#

but you are right

prime hearth
#

oh okay thanks i think that made more sense by showing the gaussian formula

spare briar
#

what if our likelihood function is more complicated

prime hearth
#

i forgot that p(x) is the gaussian formula with -|x-mu|^2

spare briar
#

this is why with these sorts of models we use likelihoods from exponential family distributions

#

these are basically the distributions where it is possible to write a closed form likelihood and get a loss

#

hope that helps a bit

#

i strongly recommend Bishop's book chapters 3 and 4 on this topic

prime hearth
#

thanks, i was just confused why we took max of log but i understand now because if we were to graph it it would give max value

#

i just forgot about the minus sign in the equation for probability density function

#

thanks!

ionic palm
#

How do I use layers.Discretization() across a dimension when a ragged tensor's innermost dimension is 1? Is there a layers.Reshape() trick?
Like [[[10],[20],[30]],[[10],[20],[30],[40]]] to become [[[1],[2],[3]],[[1],[2],[3],[4]]]
And [] to become []. Since it is ragged, reshape() needs to be flexible, right?
Also, Flatten() does not support ragged tensors

candid flare
#

Hey I am currently reading "Automate the Boring Stuff with Python" but I want to at some point learn how to do something with machine learning (not sure what, I'm a newbie). What do you think a good next book to read would be?

serene scaffold
#

@candid flare data science from scratch

candid flare
#

I was thinking of getting that one next! @serene scaffold

trail ibex
#

Hi guys, I have a really basic question that I can't seem to find the answer to. It's a pandas dataframe I am working with. Newbie stuff for a college project - is this the right channel to ask about it?

#

Or should I use the help channels?

agile cobalt
#

just ask away

trail ibex
#

It's so simple, but I just can't get my head around it. I am counting the nulls for <column name> in a dataframe to see how many there are - all I want is "<Name of Column>: <number of nulls>"

#

I cannot seem to figure it out, I might be a bit tired :/

agile cobalt
#

have you checked the user guide for working with missing data? it should mention the methods you'd need

serene scaffold
#

You can use the isna and sum methods.

trail ibex
#

I've been googling to the point where I can't even read anymore tbh, really frustrated now. I know I'm missing something silly as all hell

serene scaffold
#

Pandas sometimes uses "na" in reference to nan/null

#

@trail ibex don't worry, we'll fix this. Deep breaths lemon_hyperpleased

#

Start by calling isna() on your dataframe and print it to see what you get
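A sketch of where those two methods lead, using a tiny made-up frame: `isna()` gives a same-shaped table of booleans, and `sum()` over it counts the `True` values per column, which is the "<Name of Column>: <number of nulls>" view asked for above.

```python
import numpy as np
import pandas as pd

# Invented stand-in for the real data.
df = pd.DataFrame({
    "country": ["A", "B", "C"],
    "iso_code": ["AAA", np.nan, np.nan],
    "gdp": [1.0, np.nan, 3.0],
})

null_counts = df.isna().sum()   # Series: column name -> number of nulls
for column, n_nulls in null_counts.items():
    print(f"{column}: {n_nulls}")
```

For a single column, `df["iso_code"].isna().sum()` returns just that one number.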

trail ibex
#

So I'm working with a data file where there's countries (with names "country") and iso codes for the 3 letter abbrev. I'm trying to get a list of "<Country>: <number of nulls>". I can get a list with no issues, but it's 1002 lines long, I am just missing something silly. I apologise for the silly question again, I am only 2 weeks into my course

#

Here's my code:

#

fixing ISO codes first

Let's look at the iso_codes column and see where the nulls are

null_isos = df[df['iso_code'].isna()]
print(null_isos['country'])

#

df is the dataframe containing all the stuffs

serene scaffold
#

df[df.isna()] is wrong for this

#

You'll lose entire rows that have a single nan

#

Or something like that

#

Just look at df.isna() by itself first.

trail ibex
#

Let me try that

serene scaffold
#

I'm about to drive home. I'll be back in ten minutes or so.

trail ibex
#

It's giving me a list of bools now, it's the same output basically but instead of country names, I now get row numbers

minor elbow
#

u can sum() bools

trail ibex
#

No worries Stel

minor elbow
#

so like df.isna().groupby('country').sum()

trail ibex
#

groupby.....I didn't know this method. Let me try this

#

I swear if it works I'm gonna cry

minor elbow
#

lol

#

it can be a little unwieldy

trail ibex
#

So it did.....something, but not quite what I expected hehe

#

It's so nice to have help....jesus the relief is real

minor elbow
#

its not clear to me what counts as being na? you might want to subset the colums to country and whatever u are looking for na in

serene scaffold
#

whether or not something is na/null is unambiguous. it is or it isn't.

minor elbow
#

i mean like the structure of the df, is it just country, code or are there other columns

trail ibex
#

I am not sure that I phrased my question very well now :/ So I have a dataset that has a bunch of country names - it's a little messy cos it's joined from 2 sources. One source has individual countries, with ISO codes (the 3 letter abbreviations for them). The other source has stuff I want to exclude in this sense like "Americas" or "Europe" - those don't have 3 letter iso codes. I want to just look at how many countries have null iso codes, and how many, and make this a displayable list, if you get me (I'm using Jupyter Notebook for my project). So basically it's a preclean step - I know how to drop them, no issue, I just want to display what I am dropping, and it's driving me crazy heh
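For the pre-clean step described above, one possible sketch: select the rows whose `iso_code` is missing, then count how many rows each `country` value accounts for. The column names match the ones mentioned in the chat; the rows are invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Europe", "Europe", "Europe", "Americas"],
    "iso_code": ["AFG", "AFG", np.nan, np.nan, np.nan, np.nan],
})

# Rows with no ISO code, counted per country/region name.
missing_iso = df.loc[df["iso_code"].isna(), "country"].value_counts()

for country, n_rows in missing_iso.items():
    print(f"{country}: {n_rows}")
```

`len(missing_iso)` gives the number of distinct names that would be dropped, and `missing_iso.sum()` the number of rows.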

serene scaffold
#

I didn't expect .groupby('country') to be part of the solution. do you mind doing print(df.head().to_dict('list')) and copying the exact string output into the chat as text?

#

@trail ibex ^

trail ibex
#

{'biofuel_consumption': [nan, nan, nan, nan, nan], 'biofuel_electricity': [nan, nan, nan, nan, nan], 'coal_consumption': [nan, nan, nan, nan, nan], 'coal_electricity': [nan, nan, nan, nan, nan], 'coal_production': [0.691, 0.726, 0.842, 0.842, 0.859], 'country': ['Afghanistan', 'Afghanistan', 'Afghanistan', 'Afghanistan', 'Afghanistan'], 'electricity_generation': [nan, nan, nan, nan, nan], 'fossil_electricity': [nan, nan, nan, nan, nan], 'fossil_fuel_consumption': [nan, nan, nan, nan, nan], 'gas_consumption': [nan, nan, nan, nan, nan], 'gas_electricity': [nan, nan, nan, nan, nan], 'gas_production': [nan, nan, nan, nan, nan], 'gdp': [31712751616.0, 32398444544.0, 33068124160.0, 34692370432.0, 35319054336.0], 'hydro_consumption': [nan, nan, nan, nan, nan], 'hydro_electricity': [nan, nan, nan, nan, nan], 'iso_code': ['AFG', 'AFG', 'AFG', 'AFG', 'AFG'], 'low_carbon_consumption': [nan, nan, nan, nan, nan], 'low_carbon_electricity': [nan, nan, nan, nan, nan], 'nuclear_consumption': [nan, nan, nan, nan, nan], 'nuclear_electricity': [nan, nan, nan, nan, nan], 'oil_consumption': [nan, nan, nan, nan, nan], 'oil_electricity': [nan, nan, nan, nan, nan], 'oil_production': [nan, nan, nan, nan, nan], 'other_renewable_consumption': [nan, nan, nan, nan, nan], 'other_renewable_electricity': [nan, nan, nan, nan, nan], 'population': [13356500.0, 13171679.0, 12882518.0, 12537732.0, 12204306.0], 'renewables_consumption': [nan, nan, nan, nan, nan], 'renewables_electricity': [nan, nan, nan, nan, nan], 'solar_consumption': [nan, nan, nan, nan, nan], 'solar_electricity': [nan, nan, nan, nan, nan], 'wind_consumption': [nan, nan, nan, nan, nan], 'wind_electricity': [nan, nan, nan, nan, nan], 'year': [1980, 1981, 1982, 1983, 1984]}
C:\Users<me>\AppData\Local\Temp/ipykernel_13120/1923355927.py:1: UserWarning: DataFrame columns are not unique, some columns will be omitted.
print(df.head().to_dict('list'))

serene scaffold
#

alright, let me see

trail ibex
#

I only have the notebook saved locally at the moment, but if it helps, I could set up a Git and push it, I guess. Might take me a while to figure it out

serene scaffold
#

so, I would add the country to the index.

trail ibex
#

So index(['country'])

#

Let me see if I can get back to where I was

#

Guys, thank you so much for the help on this

#

I can do later steps, I just can't demonstrate why I'm doing them, and it's.....argh

serene scaffold
#

do you mind drag/dropping the CSV into this chat?

trail ibex
#

Ofc that's no prob, I'm working with a Kaggle file. OK to just link?

serene scaffold
#

df.isna().sum() "works" but doesn't organize it by country. it sounds like at the end you want a table with rows for each country and columns for each kind of data.

#

and each cell is the number of nans.

trail ibex
minor elbow
#

df[['country', 'iso_code']].groupby('country').iso_code.apply(lambda x: x.isnull().sum())

trail ibex
#

I will google this, thanks for the pointer 🙂

serene scaffold
trail ibex
#

Basically what I want to do with this bunny is give a justification for zapping areas like "America", "Europe" and ISO codes with zero data

minor elbow
#

well it works on the 5 rows with no nulls 😉

trail ibex
#

So I want to show the number of nulls for the country name, and say "This is why I am dropping these" - if that makes sense

#

I am thinking I am overthinking it badly hehe

#

Ah it's a college course, I want to do well. No questions at the end, you get me 🙂

#

I mean, that is what I will argue, but I want to show that the data wouldn't have helped either, so I zapped it

#

Pretty sure it was calculated as the sum of the territories anyhow, so I could always reproduce it just by summing if I needed to

#

I agree 🙂 But I want to show what I am zapping

serene scaffold
#

this worked for me:

df.drop(['year', 'iso_code'], axis=1).groupby('country').apply(lambda d: d.isna().sum())

I'll explain why this works

trail ibex
#

year.....hokay. That's interesting, didn't see that one playing in

slow sable
#

how can i interpret this graph? is test set overfitting with higher degrees and train set underfitting?

serene scaffold
#

.drop(['year', 'iso_code'], axis=1) -- we don't care about these columns (columns are axis 1)
.groupby('country') -- this sort of makes a separate dataframe for each country, where every df is for one country
.apply(lambda d: d.isna().sum()) -- this does isna().sum() for each of those dataframes
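
#

A minimal sketch of the same per-country count, assuming a toy frame with made-up values in place of the Kaggle file. It follows the earlier suggestion of putting the country in the index, which gives the same table without going through apply:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the energy dataset; every value here is made up.
df = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Europe", "Europe"],
    "iso_code": ["AFG", "AFG", np.nan, np.nan],
    "year": [1980, 1981, 1980, 1981],
    "gdp": [3.1e10, np.nan, np.nan, np.nan],
    "population": [13356500.0, 13171679.0, np.nan, 7.0e8],
})

# Move country into the index so isna() leaves the names alone,
# then add up the True values (True counts as 1) inside each country.
nan_counts = (
    df.drop(columns=["year", "iso_code"])
      .set_index("country")
      .isna()
      .groupby(level="country")
      .sum()
)
print(nan_counts)
```

Each row is a country, each column a kind of data, and each cell the number of nans.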

trail ibex
serene scaffold
trail ibex
#

I did. I got down from 123 columns to 32 I was interested in. The ISO column is one I kept for 2 reasons - (1) as a cleaner - if it's blank I can dump it (2) I'll use it as the ref to get the flag graphics from somewhere else for the viz 🙂 Again, it's a college project, I have to demonstrate this stuff

trail ibex
#

I know. There are other columns like GDP which I want to use back and forward fills on. I only wanted to deal with the blanks in the iso_codes column tbh

#

But I can't seem to find the right syntax

#

Gonna try Stel's suggestion now though

#

Ah, Stel is using a drop. That's my next step for sure (although he's 1000000000 miles beyond me) but I need to display what I'm dropping first, if that makes any sense

minor elbow
#

did u try the one i posted

serene scaffold
trail ibex
#

Sorry about all the stupid questions btw :/ I really am a newb, Python is my first programming language and man it's rough with the syntax

minor elbow
#

yeah pandas is like its own little language in some respects

trail ibex
serene scaffold
#

you can't do df.var2 == NaN, btw. comparisons to NaN are just always false.

#

but in either case, lst is not trying to filter rows or columns that have nans. they're trying to count them

trail ibex
serene scaffold
#

not the rows or columns, but the instances of nan.
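
#

A small sketch of the point about NaN comparisons, assuming a made-up Series: an equality test against NaN never matches, while isna() finds the missing entries so they can be counted.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan])  # made-up values

# Comparing with NaN is False everywhere, even where the value is NaN.
equality_hits = int((s == np.nan).sum())

# isna() flags the missing entries, and summing the flags counts them.
nan_count = int(s.isna().sum())

print(equality_hits, nan_count)
```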

trail ibex
#

I'll read that, thank you YoDaddy 🙂

trail ibex
serene scaffold
#

I don't understand what you mean by "show what you drop". We're just ignoring two columns that either don't have nans or which are redundant.

minor elbow
#

u can subset dfs by indexing with a list of cols, so it gets the country/iso_code, then it groups by country, then for each group it counts the nulls in the iso_code series

#

it returns a series indexed by country so you can sort_values() ascending/descending, filter out those with 0 values etc
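
#

A sketch of that line on a toy frame (made-up rows, not the real dataset), including one way to keep only the countries that actually have null iso codes. The names null_counts and to_drop are only for illustration:

```python
import numpy as np
import pandas as pd

# Made-up rows: one real country and two regions without ISO codes.
df = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Europe", "Europe", "Europe", "Americas"],
    "iso_code": ["AFG", "AFG", np.nan, np.nan, np.nan, np.nan],
})

# Count the null iso codes inside each country group.
null_counts = (
    df[["country", "iso_code"]]
    .groupby("country")
    .iso_code.apply(lambda x: x.isnull().sum())
)

# Keep only the countries with at least one null, highest count first.
to_drop = null_counts[null_counts > 0].sort_values(ascending=False)
print(to_drop)
```

Displaying to_drop in a notebook cell before the drop step shows what is about to be removed and how many rows each entry accounts for.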

trail ibex
#

So, it's a college project - introductory data analysis. I need to do a project showing the steps I take to arrive at certain conclusions. With a dataset like this, I am not gonna use everything, so I am gonna zap a big part of it. Some of that I can do by justifying only importing certain columns (done), but other parts I need to justify getting rid of stuff that's not consistent or in the right format. The ISO code column is one of those. I want to zap the ones with none, cos they're generally territories (not what I want) or have no data. I need to say "This is what I am dropping, and this is why"

#

It's just the requirements of the course, is all 🙂

#

But I'd still need to be able to select them to drop them anyhow

untold belfry
#

Can anyone tell me shortly how to use numpy's rjust for the second value of each numpy array inside the big array (arr[:, 1])?

trail ibex
#

Yup, those are the bunnies I want rid of, but I can't say "Well, it looked shit in Excel" 😛 I have to show I am doing it with Python

#

I may have lost this in the chat tbh :/ Let me scroll back. At the moment, I have dizzy's line, which does do the trick, but also shows the ones which have no nulls

#

Man, so much googling tomorrow to even figure that line out, but again, let me scroll back to the groupby

#

Ahhh, so it's the same line. Is there a method to exclude the ones that have no nulls from it?

#

Damn, that is useful. How did I not find that. Newbie search terms ftl :/

#

I seriously cannot thank you guys enough

#

You're saving my sanity here

minor elbow
#

you can use .sort_values(ascending=False) to sort highest to lowest

trail ibex
#

I mean, I love everyone here so far, but this.....will let me get to tomorrow's bit. I'm gonna have to do some serious googling into why this works, but I seriously thank you man, really

#

YoDaddyM, I looked at the filtering example, but I got some weird outputs on that one. I probably need to read up a bit on the methods used

minor elbow
#

haha urw it turns out i have spent a lot of time working with pandas

trail ibex
#

Am sorry if it seems like I am begging to "solve it for me please" but honestly, I am finding this course way harder than I thought :/

#

I seriously do appreciate the help

#

And the explanations

minor elbow
#

theres a book "python for data analysis" by the original author of pandas which is a great reference to have around if you are going to be using pandas often

trail ibex
#

So here's another question

#

It may not be a sensible one

#

Actually

#

It isn't. I just noticed the answer to my question is a few lines up hehe. So I won't ask it 😛

trail ibex
#

iso_code.apply(lambda x: x.isnull().sum())

#

The x:x - are they supposed to be something?

#

Or is this saying " well, we're calling it x, so we'll call isnull() on x"?

#

That is how I am reading it

minor elbow
#

yeah the latter, lambda's are just like little one line functions

trail ibex
#

Gotcha, thank you. Google time on lambdas 🙂

minor elbow
#

its like having
```python
def something(x):
    return x.isnull().sum()
```

trail ibex
#

Yeah, that makes sense, just wanted to make sure I was reading it right 🙂

minor elbow
#

theres some quirks to what is returned by the grouping things in pandas, i try to avoid lambdas but sometimes they are the only option

trail ibex
#

Can I ask about isnull and isna - is there any difference?

minor elbow
#

i dont think there is, no. isnull is just an alias for isna, so they do exactly the same thing

#

dataframes originated in a different language called R which has is.na so a lot of data science ppl are used to using it

trail ibex
#

Ah OK

#

I noticed that YoD was intimating that a filter would do in this case - is it a good idea looking into that to see if I can find a second way of doing this?

#

R is not in this course for me, it's in the next one. If I survive this.

minor elbow
#

yes filtering is worth a look, its a logical predicate used to index (ie goes in the [] part)

trail ibex
#

So filtering is the same as slicing? (sorry, again, newbie)

minor elbow
#

yes and no but mostly no, slicing is a way to select a subset based on the index, filtering is a way to select a subset based on conditions
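
#

A rough sketch of the difference, assuming a made-up four-row frame. The mask here is also one way to display the rows that are about to be dropped:

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration.
df = pd.DataFrame({
    "country": ["Afghanistan", "Europe", "Albania", "Americas"],
    "iso_code": ["AFG", np.nan, "ALB", np.nan],
})

# Slicing: rows are picked by where they sit (here the first two positions).
first_two = df.iloc[0:2]

# Filtering: a True/False value per row decides what is kept.
mask = df["iso_code"].isna()
rows_to_drop = df[mask]    # what would be removed
cleaned = df[~mask]        # what remains

print(rows_to_drop)
```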

trail ibex
#

Ah, so filtering sounds much more interesting. I'll read up on that - thanks man 🙂

#

The weird thing here is that I make bloody dashboards all day long in Tableau, and I can't fathom the basic stuff

#

SQL is nice, Python is yuck. Fight me 😛

minor elbow
#

different kettles of fish 😛

#

id take python over sql every day unsurprisingly

trail ibex
#

I know hehe 🙂 And the reason for the course is because I need to move into dirty nasty areas of the business I'm in where it's all in excel files and not in the db

minor elbow
#

python is cool cause everyone knows excel but less ppl know python so its better for job security 😉

trail ibex
minor elbow
#

sql dbs are good if you have a lot of well structured data

#

doing data science stuff in python is more like a map/apply functional approach

trail ibex
#

They usually end up not well structured tho. The place I work has some......spaghetti.....DBs. Yes, I can query them in SQL, but eh, who designed this

minor elbow
#

which can scale easier than sql

trail ibex
minor elbow
#

but sql dbs have been around a long time and very good at what they do

#

yeah having a good ref really helps

#

its a different way of thinking about things, it took me a long while to get my head around it, i wouldnt say i have mastered it yet either

trail ibex
#

Does it drive you guys nuts when two parts of the business are recording the same stuff, but separately, and with different names? And often different units?

trail ibex
minor elbow
#

it'll be fine dude, taking the time to understand every step is the way to go, eventually it will all click

trail ibex
#

welp, you helped me out a lot this evening man. I'm gonna do some googling on lambdas to see what it is that worked. Thanks again 🙂 Sorry in advance but expect many more stupid questions in the near future 😛

#

Thanks Stel and YoD too 🙂

vagrant kite
#

what do you need?

#

that is a question that you should post in this channel without pinging admins or moderators
ping moderators only if you need moderation, and admins only if there's something wrong with the server

nova pollen
#

typically you get a response faster if you ask your question instead of asking if someone can answer your question

#

so what's your question

#

sure

#

send it

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

nova pollen
#

while im reading it, what's your question about it?

#

which is?

#

can you show a screenshot of what you mean

serene scaffold
#

All help in this server is given by volunteers. There's no guarantee about when or if you'll get an answer.

#

Also, most of the mods and admins are not data scientists.

nova pollen
#

m1d = (-2/n) * sum(y - y_predicted) * x1
m2d = (-2/n) * sum(y - y_predicted) * x2

#

note the *x1 at the end, that's a numpy array

#

hence m1d is an array

#

hence later,

m1_curr = m1_curr - (learning_rate * m1d)

#

m1 curr becomes an array too

#

perhaps you want sum((y-ypred)*x1)

#

same for m2d

#
(y-ypred)²
partial wrt m1
-2(y-ypred) * x1
average of gradients
1/n sum(-2(y-ypred)*x1)
= -2/n sum((y-ypred)*x1)
#

more accurately i would write
1/n sum(-2(y[i]-ypred[i])*x1[i])

#

-2 is a constant, hence can come out

#

anything that depends on the index stays inside
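
#

A sketch of the corrected update, assuming a model of the form y ≈ m1*x1 + m2*x2 + b and made-up, noise-free data. The function name gradient_step and the bias term b are illustrative and were not in the pasted code:

```python
import numpy as np

def gradient_step(x1, x2, y, m1, m2, b, learning_rate):
    """One gradient-descent update for y ≈ m1*x1 + m2*x2 + b under mean squared error."""
    n = len(y)
    y_predicted = m1 * x1 + m2 * x2 + b
    residual = y - y_predicted
    # The feature is multiplied inside the sum, so each gradient is one number.
    m1d = (-2 / n) * np.sum(residual * x1)
    m2d = (-2 / n) * np.sum(residual * x2)
    bd = (-2 / n) * np.sum(residual)
    return (m1 - learning_rate * m1d,
            m2 - learning_rate * m2d,
            b - learning_rate * bd)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
y = 2.0 * x1 - 3.0 * x2 + 1.0   # made-up target with known coefficients

m1 = m2 = b = 0.0
for _ in range(2000):
    m1, m2, b = gradient_step(x1, x2, y, m1, m2, b, learning_rate=0.1)

print(m1, m2, b)
```

With the multiplication inside the sum, m1 and m2 stay scalars after every update instead of turning into arrays.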

#

looks right, you might need to test run it yourself though

#

feel free to ping me again if something seems wrong

serene scaffold
#

yes. though do not ping me to ask me questions. I will answer if I am reading the channel.

#

though if someone expresses interest in your question, then you can ping them with respect to that question, to keep communication going.

#

with what? please ask a question, giving enough information that I can answer it if I know how.

#

Sorry, I can't do that right now. Did you check the output to see if it's correct? It's a lot more reliable to confirm that code has the expected result than to stare at it and convince yourself that it is or isn't written correctly.

viral jackal
#

where do i start with machine learning

#

i see

hybrid isle
#

Hey, is there anyone who uses a macbook for ml? I had some problems while using tensorflow. I am a rookie on mac and need someone to help me out in setting up an ml environment

humble garnet
#

hello everyone

#

I have a problem, I have a 2 class dataset with 2500 images, I need to put it in a csv file, the problem is, when I place the image matrix, it becomes an object, not float64, what advice can you give to collect it in float64 format?

royal crest
#

data_frame.Image.astype('float64') I guess

desert oar
# humble garnet

pandas doesn't have specific support for columns that contain arrays. each individual value might still be a numpy array of dtype float64, but the pandas column itself can only have dtype object because that is the "generic" dtype for arbitrary python objects that pandas doesn't know how to handle

#

important question: how did you expect to put a multi-dimensional array into a csv in the first place?
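
#

One possible way around the object dtype, sketched under the assumption that every image has the same shape: flatten each image into one row of numbers so each pixel gets its own float64 column. The 4x4 random arrays, the px column names and the file name in the comment are all made up:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
images = [rng.random((4, 4)) for _ in range(3)]   # stand-ins for real images
labels = [0, 1, 0]                                # made-up class labels

# One row per image, one column per pixel.
flat = np.stack([img.ravel() for img in images]).astype("float64")
columns = [f"px{i}" for i in range(flat.shape[1])]
df = pd.DataFrame(flat, columns=columns)
df.insert(0, "label", labels)

# df.to_csv("images.csv", index=False)   # hypothetical file name

# After loading, the pixel columns can be reshaped back into images.
restored = df[columns].to_numpy().reshape(-1, 4, 4)
```

For 2500 larger images this produces a very wide CSV, so a binary format such as np.save may be a better fit if CSV is not a hard requirement.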

upper spindle
#

does anyone know if it is possible to scrape twitter for historical tweets using selenium or beautifulsoup based on keywords/hashtags?

#

the twitter api wont let me do a large scale scrape as historical tweets are mainly for academic research

rotund isle
#

Hi, how does this snippet of code create training data and add some noise?

```python
import numpy as np

# We will create some one dimensional data with a bit of noise
num_points = 50
X = np.linspace(0,100,num_points).reshape(num_points,1)
y = (4 + 3 * X) + 25*np.random.randn(num_points, 1)
```
serene scaffold
modest shuttle
#

Hello,
Why did the image change?

tidal bough
#

well, presumably you inverted the image somehow between these two cells

modest shuttle
#

in jupyter it is okay but pycharm doesn't correctly show it

#

Why?

tidal bough
#

although inverting colors is a very weird behaviour for a colormap

tidal bough
modest shuttle
#

how to fix it in pycharm?

deft sapphire
#

hey could anyone help me with this issue

#

i have been trying to capture video from my own webcam using opencv
but the window keeps greying out

#

any solution to it ?

#

i am trying to run this

#

import cv2 as cv

cap = cv.VideoCapture(0)

while True:
s, img = cap.read()

cv.imshow("Image", img)

cv.Waitkey(0)
#

anyone ?

orchid kayak
#

I've got a model which, when trained on the exact same data different times, has very different accuracy scores. Does this make sense? Does it mean the data is not good? (the accuracy values themselves range from around 0.04 to 0.09, so the scale itself is small but its always in that scale)

I simply don't understand how the same model architecture, trained with the same data can have different accuracy scores each time it is trained.

brazen spire
#

Anyone good with amazon Sagemaker?

viral bone
strong tapir
#

I've been trying to tackle the Snake with AI problem using the NEAT algorithm but I can't seem to get any behavior. I can't tell if its from my input data, bugs in the game itself (using pygame), or my NEAT config. My code is very junky so for further information I'll provide the important stuff below.

My input data right now is

input_data = 
[(distance of snake head to food in north south east west directions (4 inputs)), 
(distance of snake head to walls (4 inputs)), 
(nearest snake body in north south east west directions (4 inputs))]

'if there isnt any food or a snake torso on one of the directions it returns 0 for the input'

my activation function is defaulted to relu but can mutate to tanh

my output is toggling a list of directions [UP, DOWN, LEFT, RIGHT] to either True or False for the desired direction.

[NEAT CONFIG]
[NEAT]
fitness_criterion     = max
fitness_threshold     = 100000
pop_size              = 10
reset_on_extinction   = False

[DefaultGenome]
# node activation options
activation_default      = relu
activation_mutate_rate  = 0.5
activation_options      = relu tanh

# node aggregation options
aggregation_default     = sum
aggregation_mutate_rate = 0.2
aggregation_options     = sum

# node bias options
bias_init_mean          = 0.0
bias_init_stdev         = 1.0
bias_max_value          = 30.0
bias_min_value          = -30.0
bias_mutate_power       = 0.5
bias_mutate_rate        = 0.9
bias_replace_rate       = 0.1

# genome compatibility options
compatibility_disjoint_coefficient = 1.0
compatibility_weight_coefficient   = 0.5

# connection add/remove rates
conn_add_prob           = 0.5
conn_delete_prob        = 0.5

# connection enable options
enabled_default         = True
enabled_mutate_rate     = 0.05

feed_forward            = True
initial_connection      = full

# node add/remove rates
node_add_prob           = 0.2
node_delete_prob        = 0.2

# network parameters
num_hidden              = 0
num_inputs              = 12
num_outputs             = 4

# node response options
response_init_mean      = 1.0
response_init_stdev     = 0.0
response_max_value      = 30.0
response_min_value      = -30.0
response_mutate_power   = 0.0
response_mutate_rate    = 1.0
response_replace_rate   = 0.0

# connection weight options
weight_init_mean        = 0.0
weight_init_stdev       = 1.0
weight_max_value        = 30
weight_min_value        = -30
weight_mutate_power     = 0.5
weight_mutate_rate      = 0.8
weight_replace_rate     = 0.1

[DefaultSpeciesSet]
compatibility_threshold = 3.0

[DefaultStagnation]
species_fitness_func = max
max_stagnation       = 20
species_elitism      = 2

[DefaultReproduction]
elitism            = 2
survival_threshold = 0.2

I can provide more info, visualization, or code if needed

minor elbow
lapis sequoia
#

how do i train python???

minor elbow
#

very carefully

trail ibex
#

Am I safe enough to ignore this? It appears to have done the job

#

Code used is:

# Still 2 issues here. GDP and population. We're going to tackle both with forward and back fills
# Let's fix those
cols = ['gdp', 'population']
df.loc[:,cols] = df.loc[:,cols].ffill()
minor elbow
#

it might be df is already a copy

#

like from earlier code

#

u can pass inplace=True to skip the assignment

trail ibex
#

That was the issue, thanks a mill - again!
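
#

A sketch of the pattern, assuming the lost screenshot showed pandas' SettingWithCopyWarning (the replies suggest so, but that is a guess). Taking an explicit .copy() after the filtering step makes the later assignment unambiguous. The frame and its values are made up:

```python
import numpy as np
import pandas as pd

# Made-up rows; the row without an ISO code stands in for a region.
raw = pd.DataFrame({
    "iso_code": ["AFG", "AFG", None, "AFG", "AFG"],
    "gdp": [np.nan, 1.0, 9.0, np.nan, 2.0],
    "population": [10.0, np.nan, 9.0, np.nan, np.nan],
})

# .copy() makes df its own frame instead of a view onto raw.
df = raw[raw["iso_code"].notna()].copy()

# Forward fill first, then back fill whatever is still missing at the start.
cols = ["gdp", "population"]
df[cols] = df[cols].ffill().bfill()

print(df)
```

In a frame with many countries a plain fill can carry values from one country into the next, so filling within each country group may be worth considering.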

#

Dizzy, your advice last night got me to the point where I can play with pictures now with my dataset. Super pleased. I wanted to just say thanks again 🙂 I know I was annoying, I was just super frustrated - you were kind and patient

minor elbow
#

ur welcome and i appreciate the thanks. i didnt find you annoying fwiw, theres definitely a pretty steep learning curve

trail ibex
#

Yeah hehe 🙂 But when you look at the output you wanted (I'm still using excel to test if my code output matches) and find it's the right stuff......man, quite a buzz 🙂

minor elbow
#

haha yeah i still get that feeling

trail ibex
#

Now all I gotta do is learn Seaborn 😛

minor elbow
#

if anything it was enjoyable to see your enthusiasm, its easy to get a bit jaded after a while

#

seaborn is so pretty

trail ibex
#

But for this set (what's left of it), I won't be doing anything nuts

#

It is, isn't it? 🙂

minor elbow
#

the examples are good too

trail ibex
#

Man the documentation is exceptional

minor elbow
#

like all the docs

#

yeah

#

pandas has great docs too

trail ibex
#

It does, absolutely. The issue I had yesterday was just that I didn't know how to phrase my questions 🙂

#

You helped a lot though, seriously

#

My googles today were better than yesterday's ones. And that's a result

#

I use Tableau at work, but Seaborn is actually prettier, I think

minor elbow
#

yeah theres 2 things in python i use a lot, dir() and help(), dir gives you a list of all the functions an object has so like i will go dir(df) to see if theres anything that looks like what i want, and like help(df.ffill) will load up the docs for the function, im not sure how help() works in ipython
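
#

A quick sketch of that workflow, assuming a throwaway one-column frame: dir() lists the attribute names, and filtering them narrows the list down to the fill-related methods.

```python
import pandas as pd

df = pd.DataFrame({"gdp": [1.0, None]})  # throwaway frame

# dir() returns attribute names as strings, so they can be searched.
fill_methods = [name for name in dir(df) if "fill" in name]
print(fill_methods)

# help(df.ffill) would then print the documentation for one of them.
```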

trail ibex
#

I didn't know about dir, actually, that one will be useful. The help() tends to give a LOT of info all in one blast. Thanks!

minor elbow
#

yeah i use python from the command line mostly so help just gives the info one page at a time and its easier to search through

trail ibex
#

Ah yes, OK. I am using Jupyter (I am required to for this project, but I actually kinda like it anyhow)

#

Jupyter tends to squash the output if there's more than a few lines

minor elbow
#

jupyter is good for sharing examples/demos with others

trail ibex
#

I have PyCharm installed as well, but I am still too afraid to tackle using it πŸ˜›

orchid kayak
minor elbow
#

i use visual studio code a lot, not for python but for other langs

trail ibex
#

Anyhow, looks like MooseMom needs help more than me. Again, thanks so much!

minor elbow
#

can u share ur code or parts of it?

orchid kayak
#

I'd be happy to, but just to be clear I am following a semi-tutorial and this is the first time I am meddling in the machine learning field (signal processing), so I may not fully understand all the choices here

minor elbow
#

sure np

orchid kayak
#

The model:

  model = Sequential()
  model.add(Conv2D(32, (3,3), padding='same', input_shape=(513, 26, 1), name='conv_1'))
  model.add(LeakyReLU(name='leaky_relu_1'))
  model.add(Conv2D(16, (3,3), padding='same', name='conv_2'))
  model.add(LeakyReLU(name='leaky_relu_2'))
  model.add(MaxPooling2D(pool_size=(3,3), name='max_pooling_1'))
  model.add(Dropout(0.25, name='dropout_1'))
  model.add(Conv2D(64, (3,3), padding='same', name='conv_3'))
  model.add(LeakyReLU(name='leaky_relu_3'))
  model.add(Conv2D(16, (3,3), padding='same', name='conv_4'))
  model.add(LeakyReLU(name='leaky_relu_4'))
  model.add(MaxPooling2D(pool_size=(3,3), name='max_pooling_2'))
  model.add(Dropout(0.25, name='dropout_2'))
  model.add(Flatten(name='flatten_1'))
  model.add(Dense(128, name='dense_1'))
  model.add(LeakyReLU(name='leaky_relu_5'))
  model.add(Dropout(0.5, name='dropout_3'))
  model.add(Dense(513, name='dense_2'))
  
  sgd = SGD(learning_rate=0.001, decay=1e-6, momentum=0.9, nesterov=True)
  model.compile(loss='mse', optimizer=sgd, metrics=['accuracy'])
#

The feature method:

def transforming_librosa(y_mixture, y_vocals=None):
  mixture_librosa_stft = lb.stft(y=y_mixture, n_fft=1024, win_length=1024, hop_length=256)
  mixture_librosa_stft = abs(mixture_librosa_stft)
  mixture_stft = normalize(mixture_librosa_stft)

  if y_vocals is not None:
    vocal_librosa_stft = lb.stft(y=y_vocals, n_fft=1024, win_length=1024, hop_length=256)
    vocal_librosa_stft = abs(vocal_librosa_stft)
    vocals_stft = normalize(vocal_librosa_stft)
    
    return mixture_stft ,vocals_stft
  
  else:
    return mixture_stft, []
#
def featue_extract(cls):  
  y_mixture = []
  y_vocals = []
  for i in range(len(df_data)):
    a = df_data.at[i, 'mixture']
    b = df_data.at[i, 'vocals']
    

    m, v = transforming_librosa(a, b)

    if(cls[i] == 1):
      t = binary_mask(v)
    else:
      t = np.zeros(shape=(513, 26), dtype=np.float64)
    
    t = t.T
    t = t[13]
    
    y_mixture.append(m)
    y_vocals.append(t)

  return np.array(y_mixture) ,np.array(y_vocals)
minor elbow
#

ok thats a deep learning model, it will randomly initialize the weights

orchid kayak
#
y_mixture = np.reshape(y_mixture, newshape=(3168, 513, 26, 1))
y_vocals = y_vocals.astype(np.float64)
minor elbow
#

its not uh regression strictly speaking

orchid kayak
#

Oh?

#

I hadn't realized that

minor elbow
#

neural networks are effectively weighted sets of regression models

#

the weights is what gets "learned"

#

usually they are randomly initialized

#

also dropout layers will randomly drop things

#

try putting np.random.seed(x) at the top

orchid kayak
#

So are you saying that due to the random initialization the results won't necessarily repeat themselves

minor elbow
#

yes

#

that + dropout

orchid kayak
#

I should have thought about that, thanks

#

Now I just need to figure out how to increase the accuracy

minor elbow
#

also sgd = stochastic gradient descent, and stochastic is just a fancy way of saying random so theres randomness in that as well
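
#

A NumPy-only sketch of why seeding matters, assuming a hypothetical init_weights helper that mimics random weight initialization. For the Keras model above, TensorFlow's own generator would probably need seeding too (for example with tf.random.set_seed), and some GPU operations can still vary between runs:

```python
import numpy as np

def init_weights(seed, shape=(3, 2)):
    """Draw random starting weights from a generator with a fixed seed."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 1.0, size=shape)

same_a = init_weights(42)
same_b = init_weights(42)   # same seed, same starting weights
other = init_weights(7)     # different seed, different starting weights

print(np.array_equal(same_a, same_b), np.array_equal(same_a, other))
```

Different starting weights, different dropout masks and a different batch order each push training down a slightly different path, which is enough to move the final score between runs.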

orchid kayak
#

the sgd part I copied, I was not taught the meaning of it

minor elbow
#

if ur new to ML, starting with deep learning is definitely hard mode

#

thats a good reference though

orchid kayak
#

My high school decided it was a good idea to let us do final projects in this area

#

I could've done something a lot simpler i.e image classification but I wanted something more interesting hence the signal processing

#

Had I had any idea how complicated it would be I'd never have done it

minor elbow
#

if theres termporal ordering to the data, ie samples from a signal over time, a different type of model like rnn or lstm might be more effective

#

*temporal

#

yeah its pretty full on, most of the problems i deal with arent really suited to deep learning, plus the training time for the models is too long for me so i dont use it much

orchid kayak
#

I think I understand what you are saying, but in reality what this whole project is, is converting the audio data into image data, and converting a regression problem to a classification one

minor elbow
#

oh right

#

model building and tuning is quite a lot of work

#

id find a tutorial/example model that does what you want and try it out, which sounds like maybe what you are doing?

#

realistically coming up with a novel/new method of using deep learning for signal analysis would be a phd worthy topic

orchid kayak
#

Exactly my issue lol, I've discovered that my topic has almost 0 results on a deep learning model, except a pair of articles by the same person who discusses EXACTLY my topic. But his articles don't give a fully detailed explanation on how to do it yourself, so the struggle still remains

mint palm
#

i saw a citation of a kaggle dataset
it mentioned it had 65000 entries
but when i download it and open in excel it doesnt have entries...whats the matter

#

but i cant see 65000 entries

prime hearth
#

hello , for map estimate linear regression

#

how would i find the posterior term?

#

should i use chi square method

#

or any arbitrary value?

brazen spire
#

Anyone proficient with Amazon sagemaker?

#

can't get the GPU to work

minor elbow
#

are u on a gpu instance

brazen spire
#

yeah

#

which is weird

#

i know we can force it on tensorflow

#

but i don't know with pytorch

minor elbow
#

ive only used mxnet with sagemaker

prime hearth
#

hello

#

for map estimation linear regression

#

can i please ask

#
```python
from predictor import Predictor
import numpy as np


class LinearRegressionMLE(Predictor):
    def __init__(self):
        self.weights = None

    def train(self, train_x, train_y):
        bias = np.ones((train_x.shape[0], 1))
        X = np.concatenate((train_x, bias), axis = 1)
        self.weights = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(train_y)
        

    def predict(self, test_x):
        bias = np.ones((test_x.shape[0], 1))
        X = np.concatenate((test_x, bias), axis = 1)
        return X.dot(self.weights)
```
#

from this website https://medium.com/@luckecianomelo/the-ultimate-guide-for-linear-regression-theory-918fe1acb380

#

i would like to please know for the train method(), how do we get x weights?

#

also the formula for mle is
(X^T X)^-1 X^T Y

tidal bough
#

np.linalg.inv(X.T.dot(X)).dot(X.T).dot(train_y)
this looks like the normal equation - that's how.

prime hearth
#

because when i do the math i get (1,1)

#

for weights

#

and if i have 3 features

#

how to get 3 weights?

#

i see they added bias as second column to x

#

to get 2 weights i think

#

but i dont understand why they did this

#

thanks so much for taking time to respond

tidal bough
# prime hearth how to get 3 weights?

X is a matrix of shape sample_number, feature_number. train_y is a vector of shape sample_number. Then let's look at the shape of the result:

X.T @ X is (feature_number,feature_number).
Inverting it preserves the shape.
Multiplying by X.T once more produces a shape of (feature_number,sample_number)
finally, multiplying by a vector of (sample_number,1) gets you a (feature_number,1) vector

so indeed, the shape of the result should always be correct - a vector of feature_number weights.
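
#

The shape walk-through can be checked with a short sketch, assuming made-up data with three raw features plus the bias column (so feature_number is 4 here). The names true_w and gram are only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
sample_number, raw_features = 50, 3
train_x = rng.normal(size=(sample_number, raw_features))   # made-up inputs

# Append the column of ones, as in the class above.
X = np.concatenate((train_x, np.ones((sample_number, 1))), axis=1)

true_w = np.array([[2.0], [-1.0], [0.5], [4.0]])   # last entry is the constant term
train_y = X @ true_w                               # noise-free labels

gram = X.T @ X                                     # (feature_number, feature_number)
weights = np.linalg.inv(gram) @ X.T @ train_y      # (feature_number, 1)

print(X.shape, gram.shape, weights.shape)
```

Three raw features therefore give four weights: one per feature and one for the bias column.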

prime hearth
#

hmm okay so let say i have x shape (2,1) and y is (2,1)
if i apply this to the math above that you wrote i get
(1,2) x (2,1) = (1,1)
we take inverse of this so (1,1)
(1,1) times (1,2) = (1,2)
now times y (2,1) becomes (1,1)

#

this is using dot product like code above

tidal bough
#

yeah, that's right - you started with a dataset with 1 feature and ended up with 1 weight

#

though note that this is counting the bias among the features

#

so if you have 1 feature, including the bias, that means you actually have no data at all, only the bias column of ones.

prime hearth
#

oh so i have one feature and one label

#

so 2 features in total but one is x and another is y

tidal bough
#

the label doesn't count as a feature

prime hearth
#

like salary and age

#

oh ok

tidal bough
#

features are the inputs to your model that you use to determine the output (label)

prime hearth
#

yes

#

so you said that this counts the bias among the features

tidal bough
#

in my definition, I count the bias as a feature, which means you'll never have less than 2 features, yeah

prime hearth
#

so the reason why they added bias

#

to X

#

so you are saying we need to add bias

#

before doing the formula to calculate weights

tidal bough
#

We need to add the bias to X because if we don't have the bias column, then our linear regression won't have a constant term

#

like, it won't be able to learn relationships like y = 5 + x - it'll approximate it with something like y=x and be consistently wrong by the constant of 5
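
A small sketch of that point, assuming toy data generated from y = 5 + x. The helper name `fit_normal_equation` is made up for illustration, and the column of ones is appended as the second column, as in the article's code.

```python
import numpy as np

def fit_normal_equation(X, y):
    # Solves (X^T X) w = X^T y, which avoids forming the inverse explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 5 + x                                          # true relationship: slope 1, constant 5

X_no_bias = x.reshape(-1, 1)                       # only the feature column
X_bias = np.column_stack([x, np.ones_like(x)])     # feature column plus a column of ones

w_no_bias = fit_normal_equation(X_no_bias, y)      # a single slope, forced through the origin
w_bias = fit_normal_equation(X_bias, y)            # [slope, constant]

print("without bias column:", w_no_bias)
print("with bias column:   ", w_bias)              # approximately [1, 5]
```

The weight that multiplies the column of ones is the constant term, so the same formula estimates the bias along with the other weights.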

prime hearth
#

oh okay thanks and that @ symbol

#

is that like multiplication of matrix

#

or dot product as well

tidal bough
#

numpy uses @ for matrix multiplication; for 2-D arrays it's the same as using np.dot and I'd say it's more readable

prime hearth
#

oh okay

lapis sequoia
#

Is anyone here familiar with bagging (bootstrapping + aggregating)? I have one doubt about a thing which I'm not sure I'm doing right:
is it normal to get the same accuracy no matter the number of bootstrapped trees??

#

I don't think it's right

#

but I can't quite see what I'm doing wrong

prime hearth
#

thanks conufsed reptile

lapis sequoia
#

very confused

#

n-no one?

prime hearth
#

oh sorry forgot to ask one more thing, why did they use np.ones for bias

#

they only estimate the weights using mle but not the bias, and most websites dont explain this

#

when i derived the mle for the bias it comes out as BiasMLE = (1/N) * sum(Yi) - Weights * (1/N) * sum(Xi)

hollow sentinel
#

are there any courses i can use for scikitlearn?

tidal bough
#

the latter approach is easier

hidden wadi
#

hello ypu can help me

#

you

#

with a ai

serene scaffold
#

Looks like that person left.

hollow sentinel
#

i think it's better to understand how the diff algos work than sklearn

#

i'm not sure tho

frosty flower
#

This represents 10 images of 1024w and 768h

#

I want to look at a specific pixel (i, j) in all 10 images and see it as a vector with length 10

#

How do I do that?

minor elbow
#

z[:,i,j]
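
A quick check of that indexing, as a sketch only. It assumes the stack `z` has the image index on the first axis, i.e. shape (10, rows, cols); the small 4x3 size and the values of `i` and `j` are stand-ins for the real images.

```python
import numpy as np

# Stand-in for 10 images; small sizes keep the example readable.
z = np.arange(10 * 4 * 3).reshape(10, 4, 3)

i, j = 2, 1
pixel = z[:, i, j]        # value of pixel (i, j) in every image

print(pixel.shape)        # (10,)
print(pixel[:2])          # the first two entries are z[0, 2, 1] and z[1, 2, 1]
```

If the image index were on the last axis instead, the equivalent would be `z[i, j, :]`.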

serene scaffold
#

"vectors of tuples" aren't really a thing.

minor elbow
#

its a bit more under the hood than sklearn but you should be able to pick up skl afterwards

serene scaffold
#

in either case, you should not aim to learn specific libraries

minor elbow
#

i cant remember what lang it is, im sure ppl have done python versions u can look at

serene scaffold
#

libraries are tools. you should try solving different problems, and over time you'll figure out which libraries can help you solve those problems.

minor elbow
#

yeah sklearn is a pretty straight forward ml lib if you dont know how to do ML then sklearn or any other library will be no use

serene scaffold
#

right. also sklearn does a lot of different things that are kind of unrelated.

inland zephyr
#

Hello all, I want to explore a multi-input CNN that combines wavelets and a CNN, as described in this paper. However, I need a good example or hands-on tutorial for a multi-input CNN model. The paper source for the diagram: http://arxiv.org/abs/1805.08620

frosty flower
#

But now I've got a different problem: I need a 1024 by 768 matrix that each entry (i, j) is the original entry (i, j)'s dot product with itself

#

I can do it with a loop but is there a vectorized way to do it?
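
One vectorized option, sketched under the assumption that "entry (i, j)" means the length-10 vector of that pixel across the images and that the stack `z` has shape (10, rows, cols). The random data and the small sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
# Cast to float so squaring does not overflow a small integer dtype such as uint8.
z = rng.integers(0, 256, size=(10, 4, 3)).astype(np.float64)

# Sum over the image axis k of z[k, i, j] * z[k, i, j].
sq_norm = np.einsum("kij,kij->ij", z, z)

# Equivalent without einsum.
sq_norm_alt = (z * z).sum(axis=0)

# Reference loop, only to confirm the vectorized versions.
expected = np.empty(z.shape[1:])
for i in range(z.shape[1]):
    for j in range(z.shape[2]):
        v = z[:, i, j]
        expected[i, j] = v @ v

print(np.allclose(sq_norm, expected), np.allclose(sq_norm_alt, expected))
```

The result has the same (rows, cols) shape as a single image.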

serene scaffold
#

okay, let me see

#

I might have to look into that tomorrow tangerine_think

iron basalt
# frosty flower I meant ith row and jth col

I think you just described matrix powers:
```py
>>> import numpy as np
>>> x = np.arange(25).reshape((5, 5))
>>> x
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
>>> y = np.dot(x, x)
>>> y
array([[ 150,  160,  170,  180,  190],
       [ 400,  435,  470,  505,  540],
       [ 650,  710,  770,  830,  890],
       [ 900,  985, 1070, 1155, 1240],
       [1150, 1260, 1370, 1480, 1590]])
>>> np.dot(x[0,:], x[:,0])
150
>>> np.dot(x[1,:], x[:,0])
400
>>> np.dot(x[0,:], x[:,1])
160
>>> np.dot(x[1,:], x[:,1])
435
```

lapis sequoia
#

Is anyone here familiar with bagging (bootstrapping + aggregating)?
I'm trying to implement both the bootstrapping and aggregating phase manually, but I get inaccurate results
Can someone help me sort this out?

iron basalt
#

(Dot product along last axis)

inland zephyr
#

Hello, I have tried to build a multi-input model and predict its output. However, I have a problem when I try to run the model.

#

I have four inputs (they are images, actually) that are processed in parallel (through multiple Conv2D layers) before being concatenated at the end. However, I get an error when I run the model to predict on the inputs:
```
ValueError: Exception encountered when calling layer "model_6" (type Functional).

Input 0 of layer "conv2d_3" is incompatible with the layer: expected min_ndim=4, found ndim=3. Full shape received: (32, 142, 32)

Call arguments received:
  • inputs=('tf.Tensor(shape=(32, 142, 32), dtype=float32)', 'tf.Tensor(shape=(32, 142, 32), dtype=float32)', 'tf.Tensor(shape=(32, 142, 32), dtype=float32)', 'tf.Tensor(shape=(32, 142, 32), dtype=float32)')
  • training=False
  • mask=None
```
inland zephyr
#

I think I'm missing something to feed the model with my four images

#

if I can see the model summary, why can't I feed the model directly?

#

this is the model that I built and compiled; however, when I try to call model.predict(x=[img1,...,img4]) the error happens

misty flint
#

looks like your dimensions dont match

#

its expecting 4 dimensions but found only 3

inland zephyr
#

yep you're right

#

i need to reshape my images so they can be fed directly to the model.
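
A sketch of that reshape with NumPy only. It assumes each input is a stack of grayscale images that lacks the trailing channel axis a Conv2D layer expects (batch, height, width, channels). The array names and the batch of 5 are hypothetical, and which axis is actually missing depends on how the images were loaded and on the model's input shape.

```python
import numpy as np

# Hypothetical stack of 5 grayscale images, each 142 x 32, with no channel axis.
img1 = np.zeros((5, 142, 32), dtype=np.float32)

# Add a trailing channel axis: (batch, height, width) -> (batch, height, width, 1).
img1_4d = np.expand_dims(img1, axis=-1)
print(img1_4d.shape)      # (5, 142, 32, 1)

# A single image needs both a batch axis and a channel axis.
single = np.zeros((142, 32), dtype=np.float32)
single_4d = single[np.newaxis, ..., np.newaxis]
print(single_4d.shape)    # (1, 142, 32, 1)
```

Each of the four inputs would need the same treatment before being passed as a list to `model.predict`.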

umbral anvil
#

I'm working on a project to predict congestion at the airport.
We are trying to build a pipeline that connects data and machine learning.
The API data is currently stored in a database, and the SQL used here is PostgreSQL.

vivid ridge
#

Which book/course is recommended for Time series analysis/prediction (for someone without statistics background, but with math degree)

mint palm
#

I saw some work.....it had used CNN for detecting malicious data....but the dataset was actually in csv file and not at all related to image.....it was kind of a typical data used in normal ANN....

#

Is it actually possible to do that

#

??

orchid kayak
#

Do the amount of data and the model accuracy go hand in hand? Does lack of data mean poorer accuracy? If I add more data for my training, can my accuracy improve?

kindred silo
#

I am not sure if this is the right place to ask but does anyone have a good dataset on pronouns/neo-pronouns ?

misty flint
#

you can start here i guess https://restfulapi.net

REST API Tutorial

REST is an acronym for REpresentational State Transfer. It is an architectural style for hypermedia systems and was first presented by Roy Fielding.

orchid kayak
#

Thanks, so just to make sure for myself: If I am following a tutorial where the creator has a dataset of 15M examples, while I have 3000, it is expected my model will perform worse, correct?

agile cobalt
#

I wouldn't be surprised if it performed better against the training data, but a bit worse on the test data
how many samples you need depends mostly on how many features you're using iirc, but it can vary a lot

lapis sequoia
#

it depends on the task and data. but getting worse results with 3000 is not a rule of thumb at all.

#

and hm if it has 1.5M for training then it can fall into overfitting too. (not necessarily)

desert oar
#

the job of a data scientist is to build models and design business solutions using data and data-derived products. the job of a data engineer is to support data science with software

desert oar
#

then i recommend that you start by learning sql and specifically postgresql

#

because if that's where the data is currently stored, then you will need to be able to at least query from it and connect to it from other systems

#

it's also important to define what you need to accomplish, in more detail than "connect machine learning to data" which is vague

umbral anvil
#

@desert oar Thank you very much for your advice.
I understand what you mean.

frosty flower
#

How do I perform linear regression on data that's structured this way?

umbral anvil
#

Is the purpose of learning the size of the image? That's how I understood it.

frosty flower
#

The training set is 10 of 1024x768 images, the target is one array with shape (1024, 768)

#

I want to do a linear regression for each of the pixels

#

and store w and b (or w0 and w1, whatever you call it) separately in two 1024x768 arrays

umbral anvil
#

@frosty flower I think it's about image learning among AI techniques. Is that right?
I won't be able to help, but others will be able to help you.😢

storm stone
#

hey, is anyone here familiar with openAI, GPT-3 or their API?

storm stone
#

so basically, i'm trying to develop a program that uses two bots to talk to each other

#

sorta like a chatbot in a way

#

with the openAI API which uses GPT-3 models

storm stone
#

but my API settings for GPT must not be configured correctly

mild dirge
#

Depends on how complicated your desired transformation is

storm stone
#

because they are unable to speak fluently with each other

#

if anyone could help me out with this, i would really appreciate a dm or anything

brazen spire
#

What are some applications of neural rendering?

#

i can't find much online

#

besides deepfakes

desert oar
#

linear regression with 1 variable has a closed form solution, maybe you can just write a vectorized numpy expression and apply it to your 10x1024x768 array of training images

#

but i have to say this seems like a weird thing to want to do
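
A sketch of the closed-form idea, under a loud assumption: the chat only describes one (1024, 768) target array, so this example invents a per-sample target stack `y` with the same (n, rows, cols) shape as the inputs `x`. The function name `pixelwise_linreg` and the toy data are hypothetical.

```python
import numpy as np

def pixelwise_linreg(x, y):
    """Fit y = w * x + b independently at every pixel.

    x, y: arrays of shape (n_samples, rows, cols).
    Returns w and b, each of shape (rows, cols).
    """
    x_mean = x.mean(axis=0)
    y_mean = y.mean(axis=0)
    dx = x - x_mean
    cov = (dx * (y - y_mean)).sum(axis=0)   # per-pixel covariance, unnormalized
    var = (dx * dx).sum(axis=0)             # per-pixel variance, unnormalized
    w = cov / var                           # undefined where a pixel is constant in x
    b = y_mean - w * x_mean
    return w, b

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 4, 3))             # small stand-in for (10, 1024, 768)
y = 2.0 * x + 5.0                           # made-up relationship to recover
w, b = pixelwise_linreg(x, y)
print(w.shape, b.shape)
```

Pixels whose value never changes across the samples have zero variance, so they would need separate handling.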

modest shuttle
#

Why pycharm doesn't show correctly the image???

modest shuttle
#

Why????????????????????

frosty flower
#

It's likely your cv2.imread parameters

#

I'm not too familiar with it but I think:

  1. The imread and imshow cmaps should probably match each other
  2. By default, cv2 reads images in BGR order instead of RGB. That might also be something you should be aware of
mild dirge
#

yeah iirc matplotlib is RGB and cv2 is BGR by default
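
The channel-order mismatch can be illustrated without loading a file. This sketch uses a made-up 2x2 array as a stand-in for what `cv2.imread` returns and reverses the last axis with NumPy; whether this is the actual cause of the display problem above is not confirmed in the chat.

```python
import numpy as np

# Stand-in for an image loaded in BGR order: pure blue everywhere.
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255                 # channel 0 is blue in BGR order

# Reverse the channel axis to get RGB, the order matplotlib's imshow expects.
rgb = bgr[..., ::-1]

print(bgr[0, 0])                  # blue is the first channel
print(rgb[0, 0])                  # blue is the last channel
# With OpenCV installed, cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB) performs the same conversion.
```

Showing the BGR array directly with matplotlib would render this blue image as red.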

silk basin
#

hey i want to create a program that lets u check other ppl's usernames what should i use

#

cause idk

silk basin
#

i use it for

#

but it isnt bout the library

silk basin
#

i just wanna know if i can use dict

shut trail
#

check the gui settings before bug reporting haha

#

@modest shuttle

mild dirge
#

why would anyone want that option enabled lol

#

that's just screaming for confusion

shut trail
#

dark theme with big white blocks can be hard on the eyes

desert oar
# shut trail

i bet it's an attempt to make charts with matplotlib defaults look good on the dark background

#

what a funny default

mild dirge
#

yeah it's just strange to me, you can change matplotlib plot styles anyways

#

making it invert images just is weird haha

neat schooner
#

does anyone have a recommendation for learning Pandas Multiindexing. I am struggling with this concept. Been reading Python data science handbook by Vanderplas, but it's just not sticking.

shut trail
#

if you use tuples as indices, it gets annoying to use one index at a time

#

pandas multiindex allows you to get past that

neat schooner
#

@lapis sequoia, not sure if it's 128 but the chapter on Hierarchical Indexing (reading it online)

shut trail
#

ya thats the right section. "the bad way" describes the situation it helps with

neat schooner
#

the whole chapter seems disparate and not cohesive (to me at least)

shut trail
#

haha fk the whole chapter just read the 'the bad way' and forget everything else . at least then you'll know the what and why of multiindex

neat schooner
#

Pandas: when doing it the bad way works....

shut trail
#

it does lol read

#

if you produce documents that you dont want to scare people, these kinds of tools are really nice

shut trail
#

re munging every time you want to use one index in a multi index situation

desert oar
#

sometimes the reset_index/set_index dance is unavoidable, eg. after groupby or before join

shut trail
#

yup. pandas tries to simplify the code and make it more efficient

desert oar
#

oh, no the "bad way" here is literally just a multiindex but worse

#

there is no reason imo to explicitly use a tuple-valued index instead of a multiindex

shut trail
#

lol no kidding

neat schooner
#

I guess what isn't clicking is when creating a MultiIndex: do I just create a Series, a DataFrame, do I use from_arrays, from_tuples, from_product? I get having options, but it seems overly complicated
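
A compact sketch of two common routes, which may help the options click. The state names and population figures are made up for illustration and are not taken from the handbook.

```python
import pandas as pd

# Route 1: build the index explicitly from the product of two level lists.
index = pd.MultiIndex.from_product(
    [["California", "Texas"], [2000, 2010]], names=["state", "year"]
)
pop = pd.Series([33, 37, 20, 25], index=index, name="population_millions")

# Selecting on the outer level returns a Series indexed by year.
ca = pop.loc["California"]

# Selecting on the inner level is what tuple-valued indices make awkward.
y2010 = pop.xs(2010, level="year")

# Route 2: start from a flat table and promote columns to index levels.
df = pop.reset_index()
same = df.set_index(["state", "year"])["population_millions"]

print(ca)
print(y2010)
```

In practice the second route, or a groupby on several columns, often produces the MultiIndex for you, so the `from_*` constructors are mostly needed when building an index from scratch.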

shut trail
#

^ this is why i love pandas. but I felt like you did at first dr.venture

neat schooner
#

thats why I asked. I want to understand this and maybe a different approach would make it click. Was looking at Corey schafer's tutorials to see if he had one

shut trail
#

you dont need to call multiindex, the logic just works

neat schooner
#

ok, thank you!

frosty flower
#

Dumb question

#

Oh....

#

Nvm I'm stupid

misty flint
#

dw, no one saw it except me

#

im jk. it was a fair question

frosty flower
#

Hey everyone

#

So a general question: what makes you interested in data science?

#

Personally I feel like the more ML courses I take (and the more assignments I do), the less I feel like working in this field.

#

Most CS concepts I've learned at uni made me feel like "aha that's smart, that's the way it should be done". But DS concepts are more like "this works alright, and now let's look at it algebraically to see if we can make it work even better"

#

If any of you genuinely find DS interesting, would you share your thoughts on what makes it fun? Because I kind of can't see it right now (while taking multiple courses and struggling). Need some motivation.

fringe igloo
#

Can someone help with fixing this matplotlib chart animation? It's recreating the lines each call instead of redrawing them

```py
from random import randint
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

x = [datetime.now() - timedelta(seconds=i) for i in range(10)]
random_numbers = [randint(0, 100) for i in range(10)]
random_numbers_again = [randint(0, 25) for i in range(10)]


def update(_frame):
    x.pop(0)
    x.append(datetime.now())
    random_numbers.pop(0)
    random_numbers.append(randint(0, 100))
    random_numbers_again.pop(0)
    random_numbers_again.append(randint(0, 25))

    plt.plot(x, random_numbers, label="random_numbers", color="#1f78ff")
    plt.plot(x, random_numbers_again, label="random_numbers_again", color="#ff4747")

    plt.xlabel("Time")
    plt.ylabel("Random Number")
    plt.title("Random Number Graph")
    plt.legend(loc="upper left")


def main():
    fig, ax = plt.subplots()
    _animation = FuncAnimation(fig, update, interval=1000)
    plt.show()


if __name__ == "__main__":
    main()
```
misty flint
#

in the real world, the problems are never straightforward and theres usually more than one solution

fringe igloo
#

So something like this?

```py
def update(_frame):
    x.pop(0)
    x.append(datetime.now())
    random_numbers.pop(0)
    random_numbers.append(randint(0, 100))
    random_numbers_again.pop(0)
    random_numbers_again.append(randint(0, 25))

    plt.plot(x, random_numbers, label="random_numbers", color="#1f78ff")
    plt.plot(x, random_numbers_again, label="random_numbers_again", color="#ff4747")

    plt.xlabel("Time")
    plt.ylabel("Random Number")
    plt.title("Random Number Graph")
    plt.legend(loc="upper left")

    plt.draw()


def main():
    fig, ax = plt.subplots()
    _animation = FuncAnimation(fig, update, interval=1000)
    plt.show()
```
#

Nvm nope

shut trail
#

tbh i didnt read the first snippet, just thought it would be a good piece of info for you

#

try it haha

fringe igloo
#

I don't really understand how to use it

#

The above produces a nightmare

shut trail
#

are you in interactive mode ?

fringe igloo
#

No idea what that means, I just run the code above

#

Is that in interactive mode?

#

I'm just looking for a chart that updates with the lines with the data every sec

#

Which it does but it creates new lines each time

shut trail
#

when i read "recreating the lines each call instead of redrawing them", i think, add a plt.draw() before the plt.show()

fringe igloo
#

I'm doing it completely wrong I just realized

#
```py
from random import randint
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

x = [datetime.now() - timedelta(seconds=i) for i in range(10)]
random_numbers = [randint(0, 100) for i in range(10)]

plt.xlabel("Time")
plt.ylabel("Random Number")
plt.title("Random Number Graph")

fig = plt.figure()
line, = plt.plot(x, random_numbers, label="random_numbers", color="#1f78ff")
plt.legend(loc="upper left")


def update(_frame):
    x.pop(0)
    x.append(datetime.now())
    random_numbers.pop(0)
    random_numbers.append(randint(0, 100))

    line.set_data(x, random_numbers)
    return line,


def main():
    _animation = FuncAnimation(fig, update, interval=1000)
    plt.show()


if __name__ == "__main__":
    main()
```
#

I think this is much closer to the correct one

#

Though still not yet correct

shut trail
#

dont want to try plt.draw before plt.show ? lol

fringe igloo
#

Like this or what?

```py
def main():
    _animation = FuncAnimation(fig, update, interval=1000)
    plt.draw()
    plt.show()
```
#

I don't see why/where/how I need to use it

fringe igloo
#

Pretty sure that's completely different

#

Since I'm using FuncAnimation

#

I haven't seen draw used anywhere in the examples

shut trail
#

you dont need draw with animation

fringe igloo
#

Right

#

So what's wrong with my code?

#
```py
from random import randint
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

x = [datetime.now() - timedelta(seconds=i) for i in range(10)]
random_numbers = [randint(0, 100) for i in range(10)]

fig = plt.figure()
line, = plt.plot(x, random_numbers, label="random_numbers", color="#1f78ff")
plt.legend(loc="upper left")
plt.xlabel("Time")
plt.ylabel("Random Number")
plt.title("Random Number Graph")


def update(_frame):
    x.pop(0)
    x.append(datetime.now())
    random_numbers.pop(0)
    random_numbers.append(randint(0, 100))

    line.set_data(x, random_numbers)
    return line,


def main():
    _animation = FuncAnimation(fig, update, interval=1000)
    plt.show()


if __name__ == "__main__":
    main()
```
#

The output of that makes no sense
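
One possible reading of the confusing output, offered as a sketch rather than a confirmed diagnosis: `x` starts newest-first, so `pop(0)` drops the newest point and each appended timestamp lands after older ones, which makes the line zigzag; and `set_data` alone does not rescale the axes, so new points drift out of view. The version below keeps the data oldest-first in a `deque` and rescales on every update. The `WINDOW` constant and the use of `deque` are choices made for this example, not part of the original code.

```python
from collections import deque
from datetime import datetime, timedelta
from random import randint

import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

WINDOW = 10  # number of points kept on screen (hypothetical choice)

now = datetime.now()
# Oldest timestamp first, so the line is drawn left to right.
x = deque((now - timedelta(seconds=i) for i in range(WINDOW - 1, -1, -1)), maxlen=WINDOW)
random_numbers = deque((randint(0, 100) for _ in range(WINDOW)), maxlen=WINDOW)

fig, ax = plt.subplots()
(line,) = ax.plot(list(x), list(random_numbers), label="random_numbers", color="#1f78ff")
ax.legend(loc="upper left")
ax.set_xlabel("Time")
ax.set_ylabel("Random Number")
ax.set_title("Random Number Graph")


def update(_frame):
    # Appending to a full deque drops the oldest entry automatically.
    x.append(datetime.now())
    random_numbers.append(randint(0, 100))

    # Reuse the existing line instead of plotting a new one.
    line.set_data(list(x), list(random_numbers))

    # Recompute the data limits and rescale so the new points stay visible.
    ax.relim()
    ax.autoscale_view()
    return (line,)


def main():
    # Keep a reference to the animation so it is not garbage collected.
    _animation = FuncAnimation(fig, update, interval=1000)
    plt.show()
```

Calling `main()` opens the window; `update` can also be called by hand to check that the timestamps stay in ascending order.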

shut trail
#

nothing since you havent defined features or outcomes

fringe igloo
#

No, run it

#

It just looks nonsense

shut trail
#

likee... randomly chosen numbers ?

fringe igloo
#

No

#

Did you run it?

shut trail
#

no

fringe igloo
#

...

#

Can someone help with the above?

shut trail
#

if you had defined features and outcomes, train_test_split does just what it says. it returns training and testing sets of the size you asked for

#

and random state is used for reproducibility
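
A small sketch of both points, assuming scikit-learn is installed. The feature matrix and labels are made-up integers, and the 0.2 test fraction is an arbitrary choice for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)    # 10 made-up samples, 2 features
y = np.arange(10)                   # made-up outcomes

# 20% of the 10 samples go to the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Same random_state, same shuffle: the split is reproducible.
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)          # (8, 2) (2, 2)
print(np.array_equal(X_test, X_test2))      # True
```

Any fixed integer gives the same reproducibility; the value itself has no special meaning to the algorithm.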

mild dirge
#

Why is random_state always set to 42 for that one?

#

Is there like a single popular guide that uses that value or something?

shut trail
#

hitchhikers guide