#data-science-and-ml


neat anvil
#

overfitting also is usually determined by comparing performance on the training set versus performance on held-out data (the validation or test set), and is really quite impossible to determine w/o at the very least that comparison

pastel valley
#

if i used this i should also indicate the labels for my validation set?

somber prism
#

wym indicate the labels ?

somber prism
pastel valley
somber prism
# pastel valley

yes this is how you do it

```py
train_test_split(x, y, test_size=some_float, stratify=y)
```

#

only if you specify stratify will it split the class distributions evenly between train and validation
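A small sketch of what `stratify` changes, assuming scikit-learn is installed. The toy arrays `x` and `y` and the 80/20 class mix are made up for illustration; they are not the data from this conversation.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy, imbalanced labels: 80 samples of class 0 and 20 of class 1 (made-up data).
y = np.array([0] * 80 + [1] * 20)
x = np.arange(len(y)).reshape(-1, 1)

# stratify=y asks for the same class proportions in both parts of the split.
x_train, x_val, y_train, y_val = train_test_split(
    x, y, test_size=0.2, stratify=y, random_state=0
)

train_share = y_train.mean()  # fraction of class 1 in the training part
val_share = y_val.mean()      # fraction of class 1 in the validation part
print(train_share, val_share)
```

Without `stratify`, the two fractions can drift apart from split to split, which is the "maybe the validation set is unbalanced" worry raised here.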

pastel valley
#

this is sample of the split i did top is train bottom is validation but its possible that the validation set is unbalanced ? like maybe its composed of 50% class_a right?

#

based on imagedatagenerator its what it does

#

it did not specify if its balance or not

#

is it necessary to have a balanced validation set?

#

oh i see what you mean here is the validation data of fit()

#

but that part about regularization layer drop out meaning during prediction and validation my dropout layer is ignored?

#

drop out layers are just used during training?

neat anvil
#

typically dropout layers are only used during training, yes. That doesn't necessarily have to be true but that is the convention.

prime hearth
pastel valley
neat anvil
#

I cannot speak to your particular codebase. You'll have to refer to the documentation and/or explore the source code.

#

The typical situation would be that dropout layers are essentially only actually "in" the model during the training process (the forward pass and the back-propagation that follows it)

#

and are not used at any other time
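A minimal NumPy sketch of the convention described above, assuming the common "inverted dropout" formulation. The function name `dropout` and its arguments are illustrative, not a library API. The random mask is applied in the forward pass while training (the gradient then only flows through the kept units), and the layer is a no-op at validation and prediction time.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: active only when training is True."""
    if not training:
        return x  # inference/validation: inputs pass through unchanged
    keep_mask = rng.random(x.shape) >= rate   # drop each unit with probability `rate`
    return x * keep_mask / (1.0 - rate)       # rescale so the expected activation is unchanged

rng = np.random.default_rng(0)
activations = np.ones((2, 4))

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
eval_out = dropout(activations, rate=0.5, training=False, rng=rng)
print(train_out)
print(eval_out)
```

With `rate=0.5`, each training-time value is either dropped to 0 or scaled up to 2, while the evaluation output equals the input.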

pastel valley
#

its ignored based on this if i understand it correctly

pastel valley
#

so only half of the units on a layer will be updated?

#

at first i thought the dropout happens during forward prop?

neat anvil
# pastel valley back prop is the time where the weights are being updated right?

yes. I'd highly consider taking this course if you have any confusion on any of these details we've been discussing. https://www.coursera.org/learn/machine-learning

#

consider taking some time to write a basic neural network "from scratch", meaning only using numpy functions and data structures. It'll really build up a lot of these complicated concepts of the internals of deep learning in your mind

#

why things are done a certain way

#

instead of just having to memorize "people only apply dropout during training, not during inference, b/c that's the way it's done"

pastel valley
#

it mentioned normalization and dropout too but i forgot, and its probably not too detailed compared to the course

#

thank you for the reference 😅 👍

mighty agate
#

hello, i'm trying to display my data frame, and i don't know why, but when i try to show the elements from my json array it doesn't...

#
```py
import json
import pandas as pd
import matplotlib.pyplot as plt
from pandas.io.json import json_normalize
```
#
```py
with open('C:/Users/PC/Desktop/desktop/git/projets/python/Data-Analysis-Velib/station_status.json', 'r') as f:
    velos = pd.DataFrame(json.loads(f.read()))
#df_velos = pd.json_normalize(velos['data'], record_path=['stations'])
df_velos = pd.json_normalize(velos['data'])
```
serene scaffold
mighty agate
#

all right, like that?
with open('C:/Users/PC/Desktop/desktop/git/projets/python/Data-Analysis-Velib/station_status.json', 'r') as f:
    velos = json.loads(f.read())
#df_velos = pd.json_normalize(velos['data'], record_path=['stations'])
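A hedged sketch of why keeping the parsed JSON as a plain dict helps. The dictionary below is a hypothetical stand-in for `station_status.json`: it assumes a top-level `data` object holding a `stations` list, as the commented-out `record_path=['stations']` line suggests, and the field names are invented.

```python
import pandas as pd

# Hypothetical stand-in for the parsed station_status.json; the real file's
# fields are not shown in the conversation.
velos = {
    "last_updated": 1650000000,
    "data": {
        "stations": [
            {"station_id": 1, "num_bikes_available": 3, "is_renting": 1},
            {"station_id": 2, "num_bikes_available": 0, "is_renting": 1},
        ]
    },
}

# Normalise the list of station records: one row per station.
df_velos = pd.json_normalize(velos["data"], record_path=["stations"])
print(df_velos)
```

Wrapping the whole payload in `pd.DataFrame(...)` first tends to leave the nested list sitting inside a single cell, which is why nothing useful shows up when it is printed.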

lapis sequoia
#

yo

#

a little help please?

#

i am not able to install ecapture module

#

pip install ecapture

#

it says scikit-image wheels cannot build

#

using python 3.10

serene scaffold
#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

serene scaffold
#

It doesn't look like this question is on topic for this channel.

strange zealot
#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

graceful glacier
#

i have the following table

#

and i want to add a 'Games out of Position' column.

it will detail how many games he has played in a position that is
NOT the position listed in the 'position' column

#

for example
adrian it would be 2 and then 9,
and for gomez it would be 17, 5, 14(read from top to bottom)

#

how can i accomplish this?

desert oar
graceful glacier
#

yes, thats the index

desert oar
#

it sounds like you are seeking to aggregate the data to just 1 data point per player

#

so you are looking to create a new table with just player as the index

#

and your dataframe is using a multiindex?

graceful glacier
#

no i dont think so. just looking to add a column to the existing table. aggregation might be involved in achieving this however im not sure

#

yes multi index

desert oar
#

well that number seems like a number per player only

#

so you could then re-join that back into the original table, but the number will be repeated for each row for each player

graceful glacier
#

no let me just send a picture of what the solution column will look like

desert oar
#

ok. i think i am missing some context

#

what do you want to do with this quantity that you calculate?

graceful glacier
graceful glacier
desert oar
graceful glacier
#

so for gomez, he played 17 games outside the RB position, 5 games outside the RBP position and 14 games outside the SUB position

desert oar
#

i think i understand that. what i recommend is writing a function that takes a dataframe as an input and returns this new column, that works on one single player. then you can do a .groupby(level='player name').apply(your_new_function) to get your desired column

graceful glacier
#

ok i think i got it

#

a window function totaling the appearances for that player

#

and subtract the appeared column from that number

desert oar
#

im not sure you need a window function, but that is one way to do it

#

you would probably need to define a custom "window" which i think is pretty complicated

#

i tried to do it once and gave up, the docs weren't clear and the examples that ship with pandas are kind of convoluted internally

#

that's why i suggested groupby

#

honestly i'd just loop over rows inside the function

#

these individual per-player dataframes are so small that the performance isn't important

graceful glacier
desert oar
#

i wouldn't even bother

graceful glacier
#

ok, ill try both way none the less and learn something new.

graceful glacier
desert oar
#

the super-naive way is something like this:

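def compute_player_oop(player_df):
    # For each row, sum 'appeared' over this player's other rows (leave-one-out).
    result = []
    for row in player_df.itertuples():
        # itertuples() exposes the index label as row.Index (capital I)
        other_rows = player_df.loc[player_df.index != row.Index]
        other_total = other_rows['appeared'].sum()
        result.append(other_total)
    return pd.Series(result, index=player_df.index)

# group_keys=False keeps the original index so the result aligns with data
data['games_out_of_position'] = data.groupby(level='player name', group_keys=False).apply(compute_player_oop)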
#

something like that anyway

#

untested code written by volunteer strangers etc.

graceful glacier
#

ahhh nice. thanks by the way

desert oar
#

this is pretty inefficient but easy

graceful glacier
desert oar
#

there are probably faster ways using set operations on indexes or something with window functions

graceful glacier
desert oar
#

yeah for small dataframes it's fine

#

if you are looping over 5 rows who cares

#

if you are looping over 5000 rows even who cares

graceful glacier
#

true

#

ok this was constructive, thanks

desert oar
#

the reason to avoid loops on small datasets is more for readability + concision than for performance

#

you're welcome

serene scaffold
#

you're welcome to who? I was going to say that sometimes the only reason I encourage people to avoid loops is to force them to learn the API lemon_sweat

desert oar
#

idk if there's a tidy pandas solution for this problem though

#

"un-selecting" one row at a time

serene scaffold
#

also I see that you're speaking to Ahmad. I was momentarily confused because you wrote that message quite quickly while I was typing, so I thought that was related.

desert oar
#

basically a "leave-one-out" operation

desert oar
#

it'd be nice to have a general efficient idiom for "leave-one-out" operations with pandas

#

or even "leave-n-out", like an inverse window function

#

could be a good SO question (if it isn't already)

serene scaffold
#

if you ever figure it out, you can ask it and answer it

#

if there's a way to make a boolean series where n are True, you can use that to get the retained rows and invert it with ~ to get those that aren't.

desert oar
#

that's pretty much what i did, other_rows = data.loc[this_row.index != data.index]

#

(assuming that the indexes are unique, which they really should be)

#

non-unique indexes in pandas are just a Bad Idea, like non-string column names

minor elbow
#

u could sum all of them and then subtract the row
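A short sketch of that "sum everything, then subtract the row" idea with pandas. The `appeared` values are made up so that they reproduce the numbers quoted earlier (2 and 9 for adrian; 17, 5 and 14 for gomez), and adrian's position labels are invented because the real table was only shared as a screenshot.

```python
import pandas as pd

# Made-up appearances chosen to match the expected answers quoted in the chat.
data = pd.DataFrame(
    {
        "player name": ["adrian", "adrian", "gomez", "gomez", "gomez"],
        "position": ["GK", "SUB", "RB", "RBP", "SUB"],
        "appeared": [9, 2, 1, 13, 4],
    }
).set_index(["player name", "position"])

# Total appearances per player, broadcast back onto every row of that player.
total_per_player = data.groupby(level="player name")["appeared"].transform("sum")

# Leave-one-out: everything the player played minus the games in this row's position.
data["games_out_of_position"] = total_per_player - data["appeared"]
print(data)
```

This avoids the per-row loop entirely and relies only on `groupby(...).transform("sum")`, so it stays fast on larger tables too.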

marsh juniper
#

why there is no data in due_date and a few other columns when the dataset originally has it

#

please if anyone can help me asap, I have to turn in my project

desert oar
frozen hedge
#

does anyone know how to use kronecker delta or levi civita symbols with autograd / jax?

wide rose
#

Anyone have idea how to stack 3d matricies in numpy?

#
```py
corgi_1 = np.asarray(cv2.imread(corgi_jpgs[0]))
corgi_2 = np.asarray(cv2.imread(corgi_jpgs[1]))
corgi_3 = np.asarray(cv2.imread(corgi_jpgs[2]))

corgis = np.stack((corgi_1, corgi_2))
corgis.shape
```
#
(2,100,100,3)
#

when I try to add the third one to "master" array I get a value error

#
corgi_3.shape
(100,100,3)
#

do I just need to add an extra dimension along the fourth axis?

serene scaffold
wide rose
#

tried it same error

#

tried literally all of them hahaha

#

before coming here

#
```py
corgi_3 = np.expand_dims(corgi_3, axis=0)
corgis = np.stack((corgis, corgi_3))
```
#

doesnt work

#

vstack gets it with the extra dim added

#

cool thanks

wide rose
#

yea I just had to add the extra dim for vstack to work

#

wasnt expecting that behavior but in retrospect makes perfect sense

#

realized as soon as I typed here :/

desert oar
#

vstack and hstack don't add dimensions for you

#

you can use .reshape to "wrap" with an extra dimension

wide rose
#

yea I used expand dim

desert oar
#

ah i see, yep

wide rose
#

I get so use to numpy being so smart sometimes i do dumb shit like that lmao

desert oar
#

it'd be nice to have a "concat with extra dimension" function

#

there are column_stack and row_stack but i think those are 2d-only

wide rose
#

wonder how hard that would be

iron basalt
#
>>> def combine(x, y):
...     return np.concatenate((np.expand_dims(x, axis=0), np.expand_dims(y, axis=0)))
#

Make it assert that x and y have the same dims though*

#

Or use stack*
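A small sketch of the two situations discussed here, using zero arrays as stand-ins for the real `cv2.imread` output: `np.stack` creates the new leading axis itself, whereas `np.concatenate` (like `vstack`) needs the new image to carry a batch axis already.

```python
import numpy as np

# Stand-ins for three 100x100 RGB images (zeros instead of real image data).
corgi_1 = np.zeros((100, 100, 3), dtype=np.uint8)
corgi_2 = np.zeros((100, 100, 3), dtype=np.uint8)
corgi_3 = np.zeros((100, 100, 3), dtype=np.uint8)

# np.stack adds a new leading axis, so every input must have the same shape.
corgis = np.stack([corgi_1, corgi_2])                            # (2, 100, 100, 3)

# Appending to an existing batch: give the new image a batch axis, then concatenate.
corgis = np.concatenate([corgis, corgi_3[np.newaxis]], axis=0)   # (3, 100, 100, 3)

# Simplest when all images are available up front.
all_at_once = np.stack([corgi_1, corgi_2, corgi_3])
print(corgis.shape, all_at_once.shape)
```

Stacking the 4-D batch with a 3-D (or expanded 4-D) image fails because `np.stack` insists on identical input shapes; that is the value error reported above.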

desert oar
#

yeah it's not that bad

#

just nobody did it

misty flint
#

has anyone been able to play around with gpt-3 before? how was your experience

graceful glacier
#

i have the following table

#

and i want to apply something like this

#
np.where(dfx['space'], dfx['full_name'].str.extract('.*\s(.*)'), dfx['full_name'])
#

it raises an error and i understand why

#

any ideas on how to execute this?

#

basically i want to get the last name, which is anything after the first space, if a space exists

#

i found a round about way of doing this(i created multiple columns and then removed them) but i would still like to know if its possible with one line of code
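One possible single expression, sketched with made-up names because the original table was only shown as an image. It assumes "last name" means everything after the first space, as stated above, and sidesteps `np.where` by letting names without a space fall through unchanged.

```python
import pandas as pd

# Made-up names for illustration.
dfx = pd.DataFrame({"full_name": ["Alisson", "Joe Gomez", "Virgil van Dijk"]})

# Split on the first space only; the last piece is the whole name when there is no space.
dfx["last_name"] = dfx["full_name"].str.split(" ", n=1).str[-1]
print(dfx)
```

No helper `space` column is needed with this approach, since a name without a space produces a one-element list whose last element is the name itself.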

wide rose
wide rose
misty flint
#

gpt-2 yes, openai-gpt (gpt1) yes, not gpt-3

#

ive used the first two in my NLP class

minor elbow
#

u can use np.newaxis/None as a new dim eg np.ones((3,4))[:,:,np.newaxis].shape

pastel valley
#

if i dont plan to validate or change my model do i just use train and test set? and use the test set in validation_data ?

#

instead of using validation set in validation_data for fit() method then i can just use the test set to evaluate it? or it is necessary to just evaluate the model on test set after the training?

prime hearth
#

hello, i would like to please ask, im currently self teaching machine learning. The algorithms I learned are mostly linear regression, neural networks and logistic regression. I learned a bit about feature engineering, like how to handle missing values or imbalanced datasets, and a bit of hyperparameter tuning. I am focusing deeper now on NLP

#

am i on the right path to landing first internship?

#

im also making a machine learning project with NLP that uses full stack and Kubernetes

frosty flower
#

I have an array with shape (2, 3) and an array with shape (2, )

#

I want to do an element-wise division

#
a = [[1,2,3], [4,5,6]]
b = [3, 3]
# Want: 
c = [[0.333, 0.666, 1], [1.333, 1.666, 2]]
prime hearth
#

you can just do

#

a/b

#

assuming b is 1 element

frosty flower
#

It says "operands could not be broadcast together with shapes (2,3) (2,)"

prime hearth
#

yeah because must be same shape

#

since b has same element

#

can just use first index
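A sketch of one common way to make the shapes compatible when `b` holds one divisor per row. It is not necessarily the code that was posted in reply here.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
b = np.array([3, 3])                   # shape (2,)

# Broadcasting lines shapes up from the right: (2, 3) vs (2,) compares 3 with 2 and fails.
# Adding a trailing axis makes b shape (2, 1), so each row of a is divided by its own b value.
c = a / b[:, np.newaxis]
print(c)
```

Shapes do not have to be identical for element-wise operations; each pair of trailing dimensions only has to match or contain a 1.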

minor elbow
minor elbow
frosty flower
#

It works. Do you mind explain what exactly the code does?

minor elbow
pastel valley
#

or maybe at first i split the training data into train and validation sets, and after some changes i use all the training data for the training set and the test set for validation?

minor elbow
#

ok theres 2 things here

#

lets say u just have a train and test set

#

you build your model on the train set, and it seems to go ok, then you run it on your test set but it does not do as well.

#

so you change something in your model, do the training again on the train set, test it again with the test set and it does better

#

at this point u have no idea how your model will perform on unseen/new data

#

like it should be ok, but you dont actually know, because you have reused ur test set and have potentially implicitly introduced overfitting

#

so how do u know? the validation set

#

if u go and make changes after evaluating performance on some new data set, that new data set ceases to be a useful indicator of how the model will perform on unseen data

#

you explictly said in your question you would not be making any changes to the model

#

so u need to be clear if you are going to tweak the model, or accept its performance as is

pastel valley
#

i am trying to compare image augmentation techniques to see which method gives the best raw performance. to do that without bias i created a model from scratch and compare their performance without optimizing the model, because the models should always stay identical, and optimizing one would just introduce bias between them

#

if i explained it right, i dont need to use validation: i just create 2 identical models, train them with different augmentation techniques and test which one gives the best raw performance
by raw performance i mean no optimizations or tweaks are applied to the models, so they stay the same

minor elbow
#

yeah i think if i understand u, just a train and test set is needed and you can report the results for each augmentation type

pastel valley
#

i even go to the lengths of copying the initial weights of models so they really have the same starting point

minor elbow
#

yeah you should be able to set the random seed which should give the same initialization

pastel valley
minor elbow
#

i am not familiar enough with those libraries to be able to say

pastel valley
minor elbow
#

i prefer setting the seed unless you are really sure there are no other random things happening in other layers

#

but im not really a dl person

#

i dont know enough about keras/tf to be able to say either way

pastel valley
#

this is what is says on validation data of sequential.fit()
if i understand it correctly the models is seeing the validation every end of epoch but its not being trained for it right? so its the same as testing the models on new data did i understand it correct?

pastel valley
minor elbow
#

deep learning

#

yeah the validation_data wont be used to train, so u could use your test set there and the results of your experiment would be whatever the final metrics are from the final epoch

pastel valley
#

nice nice

#

btw this is the example i trained without validation data

#

the metrics shown there are the model's blind guess on the training data, right?

minor elbow
#

im not 100% but if you didnt provide any validation data then yeah they would be the training data metrics

#

its good to track the metrics with a validation set

#

im always really suspicious of a model that learns really well such as that one seems to be doing

#

its often a sign of the model overfitting the training data

#

but it depends on the type of data you are working with, sometimes things do work very well

pastel valley
#

overfitting is when the model is too good on training but garbage on new data right?

minor elbow
#

yeah

pastel valley
#

underfitting is what?

#

not learning enough?

minor elbow
#

more or less

#

its the bias/variance tradeoff

pastel valley
#

if the model predict something like this what does it mean?

minor elbow
#

thats just the class

pastel valley
#

also softmax is probability distribution right?
so if am getting 1 is the model too confident of the answer?

minor elbow
#

softmax is a probability-like distribution

pastel valley
minor elbow
#

yeah

pastel valley
minor elbow
#

the element with the highest number is the most likely class

#

relative to the others

pastel valley
#

the overall values of the classes should be 1 right?

minor elbow
#

the values should all sum to 1 yes, thats what makes it probability-like

#

and they are all in the range [0,1]
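A minimal NumPy sketch of softmax showing both properties mentioned above (values in [0, 1] that sum to 1). The logits are made-up scores for three classes.

```python
import numpy as np

def softmax(logits):
    """Turn a vector of raw scores into non-negative values that sum to 1."""
    shifted = logits - np.max(logits)   # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1])       # made-up scores for three classes
probs = softmax(logits)
predicted_class = int(np.argmax(probs))  # index of the largest value
print(probs, predicted_class)
```

An output very close to 1 for one class just means its raw score is much larger than the others; whether that confidence is deserved is a separate question.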

pastel valley
#

oh isee thank you

#

btw how is overfitting underfitting solved?

minor elbow
#

hyper parameter optimization, regularisation, different types of models, stuff like that

#

its kinda a how long is a piece of string type thing

#

depends on ur data, model, approach, desired outcome etc

pastel valley
#

in my case if i am comparing if the other model is overfitting while the other one is not then i can say the dataset is the one responsible ?

minor elbow
#

overfitting is fundamentally a model parameter problem

#

but it may also be due to not having enough data

#

in which case, you can still change model parameters to prevent overfitting, however it likely wont be a very good model

lone drum
#

Hello
I am trying df.to_csv()
But it is not creating CSV file
It has created folder name as file name but not file
Ping me when replying

#

I am not getting output as CSV file in specified path

mint palm
#

is state space search heuristic method??

lapis sequoia
#

like best first search uses heuristic, while bfs/dfs/uniform search doesnt.

serene scaffold
mint palm
#

both are not heuristic, right??

lapis sequoia
mint palm
#

look at this

lapis sequoia
# mint palm look at this

I am not aware of the source of the given slide, but as far as I've studied in this field, I never used heuristics in bfs and dfs while writing programs.

#

hill climber needs heuristic but not bfs or dfs.

mint palm
#

yeah i agree its wrong...it even specifies blind bfs...which has no heuristic

lapis sequoia
#

may be the person put it to explain that we dont do that in bfs and dfs by putting blind because well, it does not have any info about environment, except what environment serves as next state.

mint palm
#

hmm may be

pastel valley
#

how can i get the the true positives etc from keras after training so that i can create confusion matrix? multiclass

lapis sequoia
pastel valley
#

y pred is i get from this?

#

what is y test?

odd meteor
# pastel valley what is y test?

Y_test = your target variable's true value (the portion which has not been seen by your model.) Y_train on the other hand is also the true value of your target variable but the portion used to train your model.

Y_pred = The prediction made by your model
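A sketch of how these pieces could be turned into a multiclass confusion matrix with NumPy alone. The labels and probabilities are made up; with Keras, `y_prob` would typically be the array returned by `model.predict(...)` and `y_test` the true class indices in the same order.

```python
import numpy as np

# Made-up example: true labels and softmax outputs for 5 samples, 3 classes.
y_test = np.array([0, 1, 2, 2, 1])
y_prob = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.3, 0.5, 0.2],
    [0.1, 0.8, 0.1],
])
y_pred = y_prob.argmax(axis=1)   # most likely class per row

num_classes = y_prob.shape[1]
confusion = np.zeros((num_classes, num_classes), dtype=int)
np.add.at(confusion, (y_test, y_pred), 1)   # rows: true class, columns: predicted class

true_positives = np.diag(confusion)         # correct predictions per class
print(confusion)
```

The order of `y_test` has to match the order of the predictions, so any shuffling in the data generator needs to be switched off before predicting.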

teal mortar
#

anyone saw any difference between normalising whole data vs using only batch normalisation, or using both at the same time, what is a more optimal solution?

pastel valley
#

during prediction the row here are the test images and the column are the classes right?

#

but the order is not the same from the generator to me trying to manually input a single image

#

i tried to predict using the very first test image and the output is different from the 1st row of the predictions using test_generator

#

i did not use the shuffle parameter so my test set probably reading the test images in order?

#

oh it default shuffles hahaha my bad

odd meteor
# teal mortar anyone saw any difference between normalising whole data vs using only batch nor...

Normalising the inputs to your neural network generally helps training. But remember that deeper layers are trained on the output of the previous layer. Since the weights get updated via gradient descent, the distribution of each layer's inputs keeps shifting, so later layers no longer benefit from the initial normalisation: they have to keep adapting to the previous layer's weight changes, which makes it harder for them to learn their own weights!

With Batch Normalisation, we can avoid this with finesse! Batch Normalisation makes sure that, independently of those changes, the input to the next layer is normalised. And above all, it does this in a smart way, with trainable parameters that learn how much to scale and shift the normalised values.

I hope you understand it better now. ✌️
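A rough NumPy sketch of the training-time forward pass of batch normalisation, to make the "trainable scale and shift" part concrete. The function name, the made-up activations and the `eps` value are illustrative; real layers also track running statistics for use at inference, which is left out here.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalise each feature over the batch, then apply the learned scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))   # made-up activations, batch of 64

gamma = np.ones(4)    # trainable scale, initialised to 1
beta = np.zeros(4)    # trainable shift, initialised to 0
out = batch_norm_forward(x, gamma, beta)
print(out.mean(axis=0), out.std(axis=0))
```

With `gamma` at 1 and `beta` at 0 the output has roughly zero mean and unit variance per feature; training can then move those two parameters away from the plain normalisation if that helps.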

teal mortar
odd meteor
# pastel valley oh so the y test is the one i will provide manually?

I'm not sure I understand what you mean by 'providing Y_test manually' but both Y_test and Y_train are gotten from the original Y when you split your whole data into train set and holdout set.

Y_test is very important because that's what you'll use to evaluate the accuracy/lapses/difference in the prediction made by your model (Y_pred).

pastel valley
odd meteor
pastel valley
#

if i try to get the y
it said too many values to unpack

#

how do i get my y_test from the test_generator?

odd meteor
# pastel valley

I don't see any error here though. I'm more of an NLP guy (because that's what I'm learning at the moment) So I don't have enough experience in Computer Vision yet.

So if the problem is actually beyond what's in the pics then, other people here can help out

odd meteor
pastel valley
#

i now get the true values of my test set

#

btw is this normal predictions for the model? .

#

alot of negatives some doesnt even have a positive prediction that means the model cant predict that input?

acoustic halo
lapis sequoia
#

its this simple lol

#

1eN = 1 x 10^N

pastel valley
acoustic halo
#

no

pastel valley
#

aww

pastel valley
#

oh you guys back at it again haha

merry wadi
#

I'm working with some data and its ballooned into a massive amount of if statements that need to include or exclude certain key words. Looking like this
if ((routes[0] == 'IN' or routes[0] == 'BANG') and (routes[1] == 'IN' or routes[1] == 'BANG')):
    return 'DBL DIG'
if (((routes[0] == 'IN' or routes[0] == 'BANG') and routes[1] == 'UNDER') or (routes[0] == 'UNDER' and (routes[1] == 'IN' or routes[1] == 'BANG'))):
    return 'DRIVE 6'
if ((routes[0] == 'UNDER' and (routes[1] == 'SHORT OUT') and (routes[2] != 'SHORT OUT' and routes[2] != 'RETURN' and routes[2] != 'RETURN'))):
    return 'DRIVE 7'

I think to simplify this I'd be able to use a dictionary structure but I'm unsure how to proceed/ how it would work to exclude certain values as well. Any help would be appreciated!
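One possible way to tame this, sketched with sets and a small ordered rule list rather than a plain dictionary, since a dictionary lookup on its own cannot express "anything except these values". The names `DIG` and `classify` are made up for the sketch, and only the three rules shown above are covered.

```python
# Routes that are treated alike in the rules above.
DIG = {"IN", "BANG"}

def classify(routes):
    """Return the concept name for a list of routes, or None if no rule matches."""
    first, second = routes[0], routes[1]
    # Unlike the original, a missing third route is treated as "not excluded".
    third = routes[2] if len(routes) > 2 else None

    # Each rule is (name, matched); the first rule that matched wins.
    rules = [
        ("DBL DIG", first in DIG and second in DIG),
        ("DRIVE 6", (first in DIG and second == "UNDER")
                    or (first == "UNDER" and second in DIG)),
        ("DRIVE 7", first == "UNDER" and second == "SHORT OUT"
                    and third not in {"SHORT OUT", "RETURN"}),
    ]
    for name, matched in rules:
        if matched:
            return name
    return None

print(classify(["IN", "BANG"]))
print(classify(["UNDER", "SHORT OUT", "GO"]))
```

Membership tests such as `first in DIG` replace the chains of `==`/`or`, and `not in {...}` handles the exclusions; new rules become one more tuple in the list.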

urban prism
#

Would it overfit if I were to run a model, save it and then run it again with the same data?

midnight crater
#

Hello, I want to create a dataset regarding heart rate and oxygen saturation that will determine whether a person has fainted 0 or 1. My question is how will I get the fainted value for training?

Im creating my own dataset because I couldn't find any dataset that have these features.

serene scaffold
arctic venture
#

hello

serene scaffold
midnight crater
serene scaffold
#

it might be easiest to just show the data that you have. like copy/pasting the CSV into the chat, if you have that.

midnight crater
midnight crater
serene scaffold
#

For the rest of this conversation, I will only look at text (no screenshots).

So, like I said, you have to already know if they fainted or not. The point of machine learning is that it learns from real examples of the inputs (gender, age, HR, SPO2) and outputs (FAINT). So, if you don't have access to a dataset with this information, you will have to conduct a study.

#

The alternative is to make up fake answers and see what happens, if this is just for educational purposes.

midnight crater
serene scaffold
serene scaffold
#

if you make a model that returns a y value for a given X, but you don't know what y is, you'll never be able to confirm that the model is correct.

midnight crater
#

well, that makes sense. it's really hard to find data that have similar attributes as mine

#

that's my only problem

serene scaffold
#

you can use unsupervised learning, which could tell you which X instances are more similar, and you might discover that there's two discernable subsets. But you'd have no way of knowing which is "faint" and which is "did not faint".

serene scaffold
#

for example, if you had a dataset of people that gave their birthday (as a timestamp), but you want their age (as a number of years), you could calculate that based on what you know about how age works.

midnight crater
#

thanks!!

steep cypress
#

Hello everyone, I've been learning ML, in neural networks now [Andrew Ng course]...was wondering if I should go ahead and try out neural network implementation in pytorch tutorial docs or try and implement it from scratch first, from course material and other resources. I do have a basic knowledge of the working, feed forwards, backprop and stuff [sentdex, 3b1b] BUT I wanted to try it out on a dataset and pytorch docs seems fun. What should I go for first?

frosty flower
#

I'm coding (manually) an MNIST categorizer using no hidden layer. So it's just input -> output -> softmax -> cross entropy loss. I was trying to calculate the loss's derivative wrt the output, and stumbled upon this formula:

#

So... I don't even need to calculate the loss to do back prop??
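The formula itself is not preserved in this log. Assuming it was the usual result that the gradient of the cross-entropy loss with respect to the pre-softmax outputs is `softmax(z) - y`, a quick numerical check could look like the sketch below (the logits and target are made up).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(z, y):
    """Loss for logits z and a one-hot target y."""
    return -np.sum(y * np.log(softmax(z)))

z = np.array([1.0, -0.5, 2.0])   # made-up logits for 3 classes
y = np.array([0.0, 0.0, 1.0])    # one-hot target

analytic_grad = softmax(z) - y   # the shortcut: no loss value needed

# Central finite differences as an independent check.
h = 1e-6
numeric_grad = np.array([
    (cross_entropy(z + h * np.eye(3)[i], y) - cross_entropy(z - h * np.eye(3)[i], y)) / (2 * h)
    for i in range(3)
])
print(analytic_grad, numeric_grad)
```

So the loss value is indeed not needed for the weight updates under that assumption; it is still worth computing to monitor training.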

graceful glacier
#

hello

#

i was wondering if you could call a function from within .assign() that returns a dataframe instead of a series

#

so somthing like this

#
... .assign(new_col = lambda dfx : get_df(dfx)[0])
#

would the '[0]' part be sufficient to turn it back into a series?
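A hedged sketch of the distinction that matters here: `[0]` on a DataFrame looks for a column labelled `0`, while `.iloc[:, 0]` takes the first column by position and always yields a Series. `get_df` below is a hypothetical helper standing in for the real function, which was not shown.

```python
import pandas as pd

def get_df(dfx):
    """Hypothetical helper that returns a DataFrame with the same index as its input."""
    return pd.DataFrame({"doubled": dfx["a"] * 2, "squared": dfx["a"] ** 2}, index=dfx.index)

df = pd.DataFrame({"a": [1, 2, 3]})

# [0] would look for a column *labelled* 0; .iloc[:, 0] takes the first column by position.
result = df.assign(new_col=lambda dfx: get_df(dfx).iloc[:, 0])
print(result)
```

If the returned DataFrame really does have a column named `0` (for example, default integer column labels), then `[0]` works too and also returns a Series.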

cosmic lynx
#

how large is difficulty spike for starting to learn how to make machine learning? ngl, I still feel like I'm shaking off rust, and I have no experience with interacting with large datasets...
Doing a little digging, it sounds kind of necessary to learn SQL to a certain degree at very least.

balmy willow
#

i made ai tictactoe guys

#

chk out this link

serene scaffold
cosmic lynx
#

okay thanks

vague kindle
#

Would it be possible (or a good idea) for someone with geometry level knowledge to learn the algebra and calculus that is needed for ml?

serene scaffold
vague kindle
serene scaffold
safe elk
#

We didnt split algebra like that in my uni lol

vague kindle
#

Do you think it would be a good idea or should I wait a few years until I take all the math courses?

serene scaffold
#

though if you're learning algebra, it might not be a bad time to learn array/matrix arithmetic.

serene scaffold
#

other branches of math you could look into are set theory and graph theory. they differ from the kind of math you learn in high school in that they're a lot more conceptual. there isn't a whole lot of calculating.

#

set theory is just about having things in groups, and graph theory is just about things and relationships between things. they're used to model real-world phenomena in precise terms.

vague kindle
serene scaffold
urban prism
#

My kaggle notebook gives memory error seemingly at random (can run smoothly one time and return an error another) what should I do?

frosty flower
#

Training on MNIST using a 1 layer NN (left, no hidden layer) and a 2 layer NN (right, 1 hidden layer).
So, in the 1 layer case, the accuracy reaches maximum after just 1 epoch and basically just fluctuates around there. Is it normal?

junior lintel
#

Hello, I am totally lost.
I am trying to learn machine learning ( but I am 14 which means I don’t have a lot of experience)
So I bought the famous and recommended book “Hands-On Machine learning with Scikit-Learn, Keras & Tensorflow”.
I then saw in the Prerequisites that this book assumes that I am familiar with Python’s main scientific libraries in particular Numpy, Pandas and Matplotlib.
I then learned these libraries but I don’t really understand the code part of the book still because it uses scikit learn, keras and Tensorflow without explaining what the syntax means so I told myself that i should start learning ML on youtube first but the explanations are too simplified so the reason I wrote this big message is to ask you please tell me where to start

#

Or for anyone who has already read this book does it explain later on the syntax of Tensorflow, Keras and Scikit-Learn?

pastel valley
junior lintel
#

I already know python

#

I am talking about the syntax of the machine learning libraries like tensor flow and Keras

pastel valley
#

it is python objects and such i think

#

maybe you want to find is the documentations of those library to know what those methods objects do?

junior lintel
#

Yeah what I am saying is I don’t understand the methods of these ML libraries but I am asking if (for anyone who read the book) The book will explain the methods later on and I am also asking where should I start to learn ML

pastel valley
#

there is a course recommended here its machine learning by andrew ng iirc

junior lintel
#

Ok

#

You just answered my question / helped me so thank you a lot

#

I quickly watched it and although math is extremely important for ML it only talks about math and not the libraries

pastel valley
#

that course is for theories i think and for libraries documentations is the way i think

junior lintel
#

Sorry?

pastel valley
#

i mean the course is for ml fundamentals and if you want to learn about the libraries its the documentations you should read i think

junior lintel
#

Ok

#

Do you know where I can find these documentations?

iron basalt
junior lintel
#

Ok thanks

iron basalt
junior lintel
#

Yes this is where I just went

#

Thank you a lot then @pastel valley and thank you @iron basalt have a great day

#

Both of you

grave frost
#

my advice: get up to scratch with your math background first, then do ML side-by-side

minor elbow
minor elbow
#

you can think of a neural network as a weighted set of linear/logistic regressions, the "learning" part is finding the weights

#

if you have no weights (no hidden layer) theres nothing to learn

tidal bough
#

if there's no activation functions, your network is just a linear operator (regardless of the number of layers it has, actually), and so equivalent to logistic regression. Since without a hidden layer there's no activations either, the same happens here.
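A small NumPy check of the "regardless of the number of layers" point: two stacked layers with no activation in between compute exactly the same function as one layer with combined weights. The shapes and random values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))            # batch of 5 made-up inputs with 8 features

# Two stacked layers with no activation function in between.
w1, b1 = rng.normal(size=(8, 16)), rng.normal(size=16)
w2, b2 = rng.normal(size=(16, 3)), rng.normal(size=3)
two_layer_out = (x @ w1 + b1) @ w2 + b2

# The same mapping written as a single layer.
w_single = w1 @ w2
b_single = b1 @ w2 + b2
one_layer_out = x @ w_single + b_single

print(np.allclose(two_layer_out, one_layer_out))
```

Adding depth only buys extra expressive power once a non-linear activation sits between the layers.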

iron basalt
#

(if we are being pedantic, it's affine (from linear algebra POV, from calculus POV it's "linear" (deg. 0 or 1)), the activation function is linear)

#

("linear layer" then refers to the activation function)

junior lintel
grave frost
urban prism
#

Don't let your age bring you down. I'm 18 and got my first ML Engineer job recently -which tbh I'm doing terrible at-

urban prism
#

Is there an explanation for my kaggle notebook to terminate itself with a memory allocation error every now and then?
It works fine one time and then gives me the error on another run

serene scaffold
urban prism
#

It's normally just this:
"Your notebook tried to allocate more memory than is available. It has restarted." But the notebook isn't consistent with the error.
Also I can send the part(s) where I think may be causing the errors but it will take time to reproduce

urban prism
#

After I run

img_height =256
img_width = 256
num_channels = 3
unet = unet_model((img_height, img_width, num_channels))
#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

urban prism
#

which isn't an issue, I don't think

#

@serene scaffold Tried to give all the information needed, you can ask anything

#

Kind of desperate, been on this for hours now

serene scaffold
#

@urban prism sorry I've been afk. Though generally speaking, I don't know what guarantees Colab makes about how much compute power they'll give you

urban prism
#

It's alright. Thanks

misty flint
#

the recommendation is to reduce one or the other

#

or get colab pro

#

even then sometimes you run out of memory

serene scaffold
misty flint
#

it really seems random at times tbh. the info at the top right is sometimes useful, but i think sometimes youre sharing with others

#

its like some cloud providers

urban prism
#

I see. Thanks for the info! @misty flint

river maple
#

got this while training a custom model

#

does changing random in cfg file to 0 solve this?

river maple
#

im using google colab

zenith oar
#

Hi

#

how can i tackle this problem (which data structure i should use)

#

Basically i have 2 columns, one for the white player and one for the black player in each chess game

#

I need to find out which player has played the most with another player

#

I was suggested to use this view

#

I am just not sure how is it called

safe elk
#

Looks like a crosstab
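
A minimal pandas sketch of the crosstab idea, assuming a hypothetical `games` frame with `white` and `black` columns (the real column names were not shown). `pd.crosstab` counts games per (white, black) pair; adding the table to its transpose ignores who had which colour.

```python
import pandas as pd

# Hypothetical data: one row per game.
games = pd.DataFrame({
    "white": ["ann", "bob", "ann", "cat", "bob"],
    "black": ["bob", "ann", "cat", "ann", "ann"],
})

players = sorted(set(games["white"]) | set(games["black"]))

# Rows are white players, columns are black players, cells are game counts.
table = pd.crosstab(games["white"], games["black"]).reindex(
    index=players, columns=players, fill_value=0
)

# Games between two players regardless of colour.
pair_counts = table + table.T

# For each player, the opponent they faced most often.
best_opponent = pair_counts.idxmax(axis=1)
print(pair_counts)
print(best_opponent)
```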

junior lintel
#

Hello, I am trying to learn the main ML libraries like Tensorflow and scikit learn but when I watch courses on YouTube like the ones made by “FreeCodeCamp.com” people (in the comments) say it’s a great tutorial but personally I don’t understand what’s going on as they just write the scikit learn methods without explaining what they do I am lost.

neat anvil
#

The documentation and tutorials on the tensorflow and sklearn websites are a great resource

#

As well, if you don’t understand the fundamental mathematics/logic of the ML algorithms, you may need to study up on that background knowledge before things start to make sense

weary flint
#

anyone want to start a data science podcast with me? i literally just started learning and have tons of questions so the podcast would just be me asking questions and my partner answering them

#

i think it could potentially be entertaining

#

or it could be some else that is also new and we report on our progress together

serene scaffold
pastel valley
#

yo this is overfitting right?

#

btw in those metrics i can also see that around 30 epochs the model started to learn almost nothing?

serene scaffold
#

or did you predict on the test data after every epoch...?

pastel valley
#

but the accuracy is 80% above mostly is it still good?

#

or do models should naturally get higher?

serene scaffold
#

if 80% is better than human performance for that task, then yes. if it's for something that's not very important, maybe

pastel valley
#

based on the margins between the test and train metrics, is it somewhat acceptable? or does that also depend on the task?

serene scaffold
#

it always depends. for all I know, you want a model that's always wrong.

pastel valley
#

there is not right or wrong models its just about tweaking it to how you want it to perform? its bugging me if i created a right model or a wrong one but if i understand you correctly then if this performance is acceptable to me then the model is right ?

neat anvil
#

So there’s a quote often used when making new theories in science in general- “there’s no such thing as a model that is right, only models that are useful”

#

This quote is doubly true for machine learning where you’re essentially trying to skip the “actually understand what is happening” part of science right to the “have a model that can predict future events” part of science

#

If you think 90% test accuracy on your data is good enough to be useful, congrats! you’re done fine-tuning that model

pastel valley
#

i still dont get it

serene scaffold
neat anvil
#

We can’t make that decision for you, there’s no arbitrary “model is good enough now” cutoff that applies to all models

pastel valley
#

but is there ever a model that has 98%+ accuracy? like is there ever someone capable of creating a very good model?

neat anvil
#

And yea, like you observed it seemed your model may have stopped learning anything new around 20-30 epochs of training. At least in terms of achieving better statistical metrics. That’s quite normal, google “neural network training early stopping” and you’ll find resources on that
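
A rough, framework-free sketch of what an early stopping rule does, assuming a list of per-epoch validation losses. The function name, the `patience` value and the numbers are made up for illustration; real libraries ship their own callbacks for this.

```python
def early_stop_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-based, for a list of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best validation loss: remember where it happened.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs in a row: stop here.
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical curve: improves for a while, then drifts upward.
val_losses = [0.90, 0.70, 0.60, 0.58, 0.59, 0.60, 0.61, 0.62]
stop_epoch, best_epoch = early_stop_epoch(val_losses, patience=3)
print(stop_epoch, best_epoch)
```

Training would stop at `stop_epoch`, and the weights worth keeping are the ones from `best_epoch`.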

serene scaffold
#

there are models that are basically 100% for problems that don't have ambiguous cases.

neat anvil
pastel valley
#

neural network training early stopping: are there cases called overtraining where the model's performance will decrease? like after the all-time high it will just curve down? is this what it means?

neat anvil
#

One model being more “overtrained” than another means the difference between test metrics and train metrics is higher

#

This may or may not have anything to do with how many epochs of training the model undergoes

pastel valley
#

so if i plan to compare performances of models ill just use the same epochs for all models?

neat anvil
#

Unfortunately it’s not that simple. Models with different architecture may take different numbers of epochs to train to equivalent performance. Usually this type of thing is addressed by using a consistent validation set and consistent early stopping criteria between the models you are trying to compare.

pastel valley
#

i just create 3 models and train them to classify same classes but the data they will be trained is different per model

neat anvil
pastel valley
#

they are originally have the same data but applied with different augmentations techniques, for example the data set of cats, dogs, birds they will give their own training to the model
now if i apply to augmentationA to those original dataset then how much does the same model improved or if it performed worst then the same model trained on augmentationB with the original dataset then is A better or B or the base model without augmentation applied to the cats, dogs, birds dataset

#

does it make sense? 😅

#

its like experiment, does applying this kind of augmentation to this type of classes will be better or not or how about this type of augmentation or this one

#

like that

#

sorry my English is not that good 😅

neat anvil
#

Yes, it’s a sensible question, but since you’re introducing different augmentation data into the models you cannot just give them the same architecture and same training procedure and compare their performance and use this as a way to understand which augmentation is always “best”

#

How would you know if perhaps a slightly different model architecture trained on augmentation B data wouldn’t outperform the initial model architecture on un-augmented data?

pastel valley
# neat anvil Yes, it’s a sensible question, but since you’re introducing different augmentati...

yes this is also the one thing that will negate this experiment because there are cases where what if its a different architecture used then maybe the output will not be the same
but if i say like the class features
for example if i apply this experiment on classifying cars and say like augA is rotating etc, and augB is color casting etc, then it wouldnt makes sense because there seems to be nothing wrong with those augmentations
but if for example my model should classify something like color of a ball then there is a chance that augB will perform worst because applying colorcasting on the sample that color is a special feature would be a problem, example classes will be blue ball and red ball and i have a original sample of blue ball and when i applied augB it produced a red ball like augmented image then the model will learn that its red ball but infact its just augmented image from a blue ball
of course it is all only a "what if"

pastel valley
#

if i said it correct
this is the biggest question does it make sense? or i am wasting my time?

scarlet light
#

Hi can someone help me

neat anvil
pastel valley
#

or there are more?

neat anvil
#

Many, many more

pastel valley
#

if i say that given this architecture and this classes which type of augmentation is the best and worst then its back to why not use different model? hahaha wew

neat anvil
#

Indeed

broken shell
#

Hey, I want to learn more about AI and implement it into python scripts, but I don't know where to start. Any suggestions ? (I'm a total beginner when it comes to AI)

serene scaffold
urban prism
#

I'm trying to implement this

from tensorflow.python.ops.numpy_ops import np_config
np_config.enable_numpy_behavior()
def dice_metric(inputs, target):
    intersection = 2.0 * (target * inputs).sum()
    union = target.sum() + inputs.sum()
    if target.sum() == 0 and inputs.sum() == 0:
        return 1.0
    return intersection / union
def dice_loss(inputs, target):
    num = target.size(0)
    inputs = inputs.reshape(num, -1)
    target = target.reshape(num, -1)
    smooth = 1.0
    intersection = (inputs * target)
    dice = (2. * intersection.sum(1) + smooth) / (inputs.sum(1) + target.sum(1) + smooth)
    dice = 1 - dice.sum() / num
    return dice
def bce_dice_loss(inputs, target):
    dicescore = dice_loss(inputs, target)
    bcescore = tf.keras.losses.BinaryCrossentropy()
    bceloss = bcescore(inputs, target)

    return bceloss + dicescore

Though it returns:

    /opt/conda/lib/python3.7/site-packages/keras/engine/training.py:853 train_function  *
        return step_function(self, iterator)
    /tmp/ipykernel_35/144329947.py:17 bce_dice_loss  *
        dicescore = dice_loss(inputs, target)
    /tmp/ipykernel_35/144329947.py:8 dice_loss  *
        num = target.size(0)

    TypeError: 'NoneType' object is not callable

I'm a bit lost. Any ideas?
unet.compile(optimizer=Adam(learning_rate=1e-4), loss=[bce_dice_loss], metrics=[dice_metric])

broken shell
serene scaffold
#

@urban prism if you get an error about NoneType, it usually means something returned none when you thought it returned something
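
One possible reading of the traceback, offered as a guess rather than a confirmed diagnosis: `target.size(0)` is PyTorch style, while on a TensorFlow tensor with NumPy behaviour enabled `size` is an attribute, and it can be `None` while Keras traces the function. A sketch that sidesteps this with plain TensorFlow ops, and that uses the `(y_true, y_pred)` argument order Keras passes to losses:

```python
import tensorflow as tf

def dice_loss(y_true, y_pred, smooth=1.0):
    y_true = tf.cast(y_true, y_pred.dtype)
    batch = tf.shape(y_pred)[0]  # dynamic batch size, also valid in graph mode
    y_true = tf.reshape(y_true, (batch, -1))
    y_pred = tf.reshape(y_pred, (batch, -1))
    intersection = tf.reduce_sum(y_true * y_pred, axis=1)
    totals = tf.reduce_sum(y_true, axis=1) + tf.reduce_sum(y_pred, axis=1)
    dice = (2.0 * intersection + smooth) / (totals + smooth)
    return 1.0 - tf.reduce_mean(dice)

bce = tf.keras.losses.BinaryCrossentropy()

def bce_dice_loss(y_true, y_pred):
    # Keras calls losses as loss(y_true, y_pred).
    return bce(y_true, y_pred) + dice_loss(y_true, y_pred)

# Sanity check on a made-up batch: identical masks should give zero dice loss.
masks = tf.ones((2, 4, 4, 1))
perfect = float(dice_loss(masks, masks))
print(perfect)
```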

urban prism
#

Yay, debugging time

#

time to use bunches of prints :P

broken shell
serene scaffold
#

there's also an online course by Andrew Ng, but it hasn't been reviewed by our staff yet. I also recommend 3blue1brown on YouTube for the math stuff.

broken shell
#

Thanks !

merry wadi
#

whats the best way to code a lot of if statements?

serene scaffold
merry wadi
serene scaffold
misty flint
#

sigh

#

i hate minitorch

#

if anyone asks you to do it for fun to "learn ML from scratch", you should heavily reconsider

#

unless thats something youre passionate about, then feel free

grave frost
#

doesn't make sense re-implementing literally everything from scratch

#

I'd say just learn how to implement stuff, implement papers then. Writing your own autograd engine is pretty useless IMO since its just heuristic/rule-based anyways

misty flint
#

yeah too bad its assignment for deep learning class

#

so no choice but to drag my feet

#

even tho i will never use this again

grave frost
#

atleast you'll get a better understanding of tensor manipulation

#

use einops if you aren't already - might save you a ton of time

iron basalt
# grave frost I'd say just learn how to implement stuff, implement papers then. Writing your o...

Autodiff systems are actually pretty straight forward, the real difficulty by these kinds of things (libraries) is having ALL the features. It's having to implement N different things. And if they can interact with each other it starts to become an N^2 problem. For example when implementing a database and wanting all the different selections, HTTPS interface, maybe a GUI, etc vs just having the base systems (the record storage system / file format, some indexing).

#

Even if the different features are not hard to implement (especially if someone already did it before), it just takes a lot of time, especially for debugging it all together.

#

On the other hand, if you know you only need a few of the features, then it wont take too long and you can probably optimize it further because it's more specific and more specific / less general tends to be faster on computers (and less code, so easier to debug and maintain and browse, etc).

grave frost
#

agreed. in the end, its more of a programming exercise than conceptual one

iron basalt
#

Yeah, since it's been done before (it's called minitorch after all, so it probably has nothing new in it).

grave frost
#

oh well. I suppose I'd have to do it too one day

iron basalt
#

Ofc, being able to do that grind is super important if you want to make something new, it's a huge initial hill to climb before you see any results.

#

Or in RL terms, a very very delayed reward.

grave frost
#

doesn't seem like a particularly useful grind - but I suppose my programming skills needs a lot of work

#

yea. still...

iron basalt
#

If you do want to make something new that requires implementing a new library / system, then I would recommend making it very specific, don't let the feature count get too high because the time needed is not a linear function of the number of features, so adding even one more might make it way more work than expected.

iron basalt
misty flint
#

yes im glad the guy who does this for work agrees with me

#

i now feel 200% validated

upper spindle
#

When I execute this code, trainx = input_df.loc[train_index], I get this error KeyError: "None of [DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',\n '2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08',\n '2020-01-09', '2020-01-10',\n ...\n '2021-05-17', '2021-05-18', '2021-05-19', '2021-05-20',\n '2021-05-21', '2021-05-22', '2021-05-23', '2021-05-24',\n '2021-05-25', '2021-05-26'],\n dtype='datetime64[ns]', name='Date', length=512, freq=None)] are in the [index]"

#

Im not sure why
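
The frame itself was not shown, so this is only one common cause: `.loc` looks up index labels, and if `Date` is an ordinary column while the index is still 0..n-1, none of the dates are found. A minimal sketch with made-up data:

```python
import pandas as pd

dates = pd.date_range("2020-01-01", periods=5, freq="D", name="Date")
input_df = pd.DataFrame({"Date": dates, "value": range(5)})  # default index 0..4

train_index = dates[:3]

# input_df.loc[train_index] would raise KeyError here: the labels are 0..4, not dates.

# Option 1: make the dates the index, then .loc works by label.
by_label = input_df.set_index("Date").loc[train_index]

# Option 2: leave the index alone and filter on the column.
by_mask = input_df[input_df["Date"].isin(train_index)]

print(by_label)
print(by_mask)
```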

frosty flower
#

How should I adjust the learning rate wrt the batch size?

vague kindle
#

Sorry if this is a stupid question but what are some actual uses of svm, knn, logistic or linear regression, and other similar algorithms? Wouldn't a human be able to figure out patterns like this and accurately predict outcomes?

frosty flower
#

Some of them are used as components of larger and more complex systems

carmine gulch
#

Hello, does anyone know how to deploy and easily share notebooks from Jupyter without having to spin up a separate streamlit server? In other words, I want a way to share notebooks like they were separate dashboard pages. I don’t want to pay for plotly enterprise.

minor elbow
carmine gulch
minor elbow
#

linear and logistic regression is also valuable as a tool for interpretation, they are widely used in medical research and other sciences

#

also its not a question of if humans can do it, humans can drive cars great too but ppl are spending a lot of time on models for that

#

ML is best suited when u have relatively simple tasks you need to do a gagillion times very quickly

#

also the different model types do better with different types of data/problems, its not necessarily possible to tell in advance what is the best type of model to use for a given problem

#

its a rookie mistake i see over and over to pick deep learning models for everything

frosty flower
minor elbow
#

good point

neat anvil
# carmine gulch I basically want to create an on-prem instance with some way to publish the out ...

You could host a Jupyter lab instance with read-only notebooks? https://stackoverflow.com/questions/58944458/read-only-python-notebook-in-jupyter-lab

carmine gulch
#

Thanks for the link though

plucky willow
#

does anyone have a really simple example of multivariate regression with tensor flow?

#

I have done regression with a csv with just an x and a y, but if i had two independent variables what would i do

#

the data that i am using is 100 random floats from -10 to 10 for all 3 columns

#

but idk how to find any relationship btw them

#

i understand 2 variable linear regression, but i am completly lost with 3 variable regression
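
Not TensorFlow, but the same idea in plain NumPy as a sketch: with two independent variables the model is `y = w1*x1 + w2*x2 + c`, and ordinary least squares finds all three numbers at once. The target below is made up so that a relationship actually exists; three columns of independent random floats would have nothing to recover.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(-10, 10, size=100)
x2 = rng.uniform(-10, 10, size=100)

# Hypothetical target that really depends on both inputs, plus a little noise.
y = 3.0 * x1 - 2.0 * x2 + 5.0 + rng.normal(scale=0.1, size=100)

# Design matrix: one column per independent variable, plus ones for the intercept.
X = np.column_stack([x1, x2, np.ones_like(x1)])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
w1, w2, intercept = coef
print(w1, w2, intercept)
```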

somber prism
#

guys is there any AI assisted image labelling tool thats actually free to use? like all we need to do is draw a bounding box for some images and rest will be taken care by the ai

#

i have like 278k images but i want to label that 😭

river maple
#

this might help?

somber prism
steep cypress
#

---
hello, I was learning pytorch from the docs and following some sentdex neural networks examples. Was wondering if it is necessary to transform the target labels into one-hot encoded vectors. If I transform I'm having problem with nll_loss()
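
For context, `torch.nn.functional.nll_loss` takes log-probabilities plus integer class indices as targets, not one-hot vectors. A NumPy sketch of the same computation with made-up numbers, showing that `argmax` recovers the indices from one-hot labels and that both forms give the same loss:

```python
import numpy as np

# One-hot labels for three samples and three classes.
one_hot = np.array([[0, 0, 1],
                    [1, 0, 0],
                    [0, 1, 0]])
class_idx = one_hot.argmax(axis=1)  # integer class indices

logits = np.array([[0.1, 0.2, 2.0],
                   [1.5, 0.3, 0.1],
                   [0.2, 0.9, 0.4]])
# Log-softmax over the class axis.
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

# Negative log-likelihood using class indices.
nll = -log_probs[np.arange(len(class_idx)), class_idx].mean()

# The same value computed from the one-hot form.
nll_one_hot = -(one_hot * log_probs).sum(axis=1).mean()
print(class_idx, nll, nll_one_hot)
```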

remote ridge
#

hey guys! i am new here, i am having trouble choosing a project for my final sem in uni, i would like to make a project on machine learning, any idea where can i get help from and get the project done? are there any good courses in udemy which can help?

barren jungle
#

Hey i need to know how to use stegnatography to hide a code inside an image and when omage opened by someone it pastes a code inside the browser console , pls this is the imformation i required for my project if u know pls help

#

Nobody help regarding ai and datascience in this server

shell drift
#

Hello, I want to start my AI/Ml journey can anyone please share some roadmap or structure which I can follow for learning

urban ore
#

can i train a neural network with untrainable params in them?

urban prism
#

How do I make it save according to the increase of metric?

312/312 [==============================] - 74s 197ms/step - loss: 0.4691 - dice_metric: 0.7883 - val_loss: 0.5346 - val_dice_metric: 0.7840

Epoch 00001: dice_metric improved from inf to 0.78828, saving model to model_unet.h5
Epoch 2/32
312/312 [==============================] - 67s 207ms/step - loss: 0.2512 - dice_metric: 0.8916 - val_loss: 0.2132 - val_dice_metric: 0.9031

Epoch 00002: dice_metric did not improve from 0.78828
Epoch 3/32
312/312 [==============================] - 71s 217ms/step - loss: 0.1809 - dice_metric: 0.9258 - val_loss: 0.1983 - val_dice_metric: 0.9246
tacit basin
tacit basin
tacit basin
remote ridge
#

CNN for traffic sign classification using LeNet

tacit basin
odd meteor
tacit basin
urban prism
# tacit basin As i can see it saves model on metric increase

312/312 [==============================] - 74s 197ms/step - loss: 0.4691 - dice_metric: 0.7883 - val_loss: 0.5346 - val_dice_metric: 0.7840

Epoch 00001: dice_metric improved from inf to 0.78828, saving model to model_unet.h5
Epoch 2/32
312/312 [==============================] - 67s 207ms/step - loss: 0.2512 - dice_metric: 0.8916 - val_loss: 0.2132 - val_dice_metric: 0.9031

Epoch 00002: dice_metric did not improve from 0.78828
But didn't it improve from 0.78828 to 0.8916?

remote ridge
#

i just dont know if its viable for resume and project

tacit basin
urban prism
#

What do you mean?

urban ore
urban ore
#

i have an autoencoder

#

with 5 layers initially trained and now i have frozen the first 3 for transfer learning

#

and have created new layers as well

desert bear
#

hi i am working on a fun project. it is a sort of alexa and it is trained with tensorflow. the text is classified with an nlu. now i was wondering if there is a character that i can put in my training model that can mean any word? or is this not necessary and do i need to train it with for instance city names.

barren jungle
#

@tacit basin let it be btw knowledge not a bad thing

tacit basin
tacit basin
urban prism
#

I solved it. Apparently I wrote mode="min" to the wrong kernel (two of the same kaggle notebooks were open)
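
A plain-Python stand-in for the comparison a best-only checkpoint makes, to show why the log above says "improved from inf" once and then "did not improve". The helper names are hypothetical; in Keras the relevant knob is the `mode` argument of `ModelCheckpoint`, which should be `"max"` for a metric like dice.

```python
def improved(current, best, mode):
    """Hypothetical stand-in for the 'did the metric improve?' check."""
    return current < best if mode == "min" else current > best

def checkpoint_epochs(values, mode):
    # "min" starts from +inf, "max" starts from -inf.
    best = float("inf") if mode == "min" else float("-inf")
    saved = []
    for epoch, value in enumerate(values, start=1):
        if improved(value, best, mode):
            best = value
            saved.append(epoch)
    return saved

dice = [0.7883, 0.8916, 0.9258]  # dice_metric from the three epochs in the log
saved_min = checkpoint_epochs(dice, "min")
saved_max = checkpoint_epochs(dice, "max")
print(saved_min, saved_max)
```

With `"min"` only the first epoch counts as an improvement, because every later, higher dice value looks worse.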

midnight crater
#

hello, why is this returning an error?

for x in range(len(y_pred)):
    print(X_test[x], y_test[x], y_pred[x])```
I've checked the length of the 3 variables and they are all the same, but somehow the `X_test` and `y_test` throws an error
pine wolf
#

are they all lists?

#

if one or more are dicts, say, then there might be an error

midnight crater
#

ops, no, I already assumed they are all in array

#

only the y_pred is in array

pine wolf
#

i would try using zip instead

#

it would look like this:

for x, y, z in zip(X_test, y_test, y_pred):
    print(x, y, z)
midnight crater
#

I just used .to_numpy() to convert them to arrays

#

Thanks!!
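
The error text was not posted, so this is a guess at the cause: after a train/test split, a pandas Series keeps its original shuffled labels, and `y_test[0]` looks for the label 0 rather than the first row. A small sketch with made-up values:

```python
import pandas as pd

# Shuffled labels, as a Series typically has after a train/test split.
y = pd.Series([10, 20, 30, 40], index=[7, 2, 9, 4])
y_pred = [11, 19, 31, 42]

# y[0] would raise KeyError here: there is no label 0, only position 0.
first_by_position = y.iloc[0]

# Converting to an array (or zipping directly) goes by position.
pairs = list(zip(y.to_numpy(), y_pred))
print(first_by_position, pairs)
```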

pine wolf
#

np

wicked grove
#

Hello,how can i plot the training and validation curve after kfold cross validation??

hard basalt
tacit basin
# wicked grove Hello,how can i plot the training and validation curve after kfold cross validat...

Sci-kit learn example

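import matplotlib.pyplot as plt
from sklearn.model_selection import validation_curve

# regressor, data, target and cv are defined earlier in the linked notebook.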

max_depth = [1, 5, 10, 15, 20, 25]
train_scores, test_scores = validation_curve(
    regressor, data, target, param_name="max_depth", param_range=max_depth,
    cv=cv, scoring="neg_mean_absolute_error", n_jobs=2)
train_errors, test_errors = -train_scores, -test_scores
plt.plot(max_depth, train_errors.mean(axis=1), label="Training error")
plt.plot(max_depth, test_errors.mean(axis=1), label="Testing error")
plt.legend()

plt.xlabel("Maximum depth of decision tree")
plt.ylabel("Mean absolute error (k$)")
_ = plt.title("Validation curve for decision tree")

https://inria.github.io/scikit-learn-mooc/python_scripts/cross_validation_validation_curve.html

tacit basin
desert oar
#

if you just want to share notebooks as finished products in some on-prem fashion, your options are:

  1. use nbconvert to convert a notebook to plain html, which can then be shared however you want to share plain html
  2. nbviewer, which just does (1) automatically as a server application
#

@tacit basin fyi since you were interested too ☝️

tacit basin
#

Google colab is an option too. Or deep note...

urban ore
#

any idea on how to add new layers to a model in torch?

neat anvil
#

this delightfully simple example from there

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        x = self.fc(x)
        return x


model = MyModel()
x = torch.randn(1, 10)
print(model(x))
# e.g. tensor([[-0.2403,  0.8158]], grad_fn=<ThAddmmBackward>)

# Wrap the existing model to append a new layer after it.
model = nn.Sequential(
    model,
    nn.Softmax(1)
)
print(model(x))
# e.g. tensor([[0.2581, 0.7419]], grad_fn=<SoftmaxBackward>)
urban ore
#

and i need to add my encoding function in it and idk how

thin palm
#

What's up Python gang, when building a project is it bad to have all our data cleaning, encoding, and data engineering all in one file?

desert oar
#

well actually.. not if it's small

#

it's not bad to start that way

#

actually i take back what i said because i misread

#

yes, all of your data processing stuff can be in one file

#

however consider that if the file is really big you might benefit from writing some separate modules

neat anvil
# thin palm What's up Python gang, when building a project is it bad to have all our data cl...

I'd highly recommend taking a look at DAG execution tools like https://www.nextflow.io/blog.html, https://airflow.apache.org/, https://www.prefect.io/, or https://dvc.org/. Which one works best for you will depend on your specific needs. They have many benefits beyond encouraging you to adopt the better practice of separating your python code into separate modules

tacit basin
#

Or ploomber https://ploomber.io/
This is example on how to rewrite one huge notebook into a pipeline
https://ploomber.io/blog/refactor-nb-i/
There are many tools to chose from as usual
https://ploomber.io/blog/survey/

neat anvil
#

so many DAG engines

#

it's frankly silly

tacit basin
#

Yep

iron basalt
#

If you want nice visual DAGs (which can also run Python, but it's its own language too (visual scripting)), then check out Enso: https://enso.org/

desert oar
#

it's silly but i think the big problem is that they all do slightly different things and have slightly different benefits

#

this ploomber article is useful but it's frustrating that they are also clearly (somewhat) biased

desert bear
#

what is the best way to train an ai to recognize cities and counties

serene scaffold
desert bear
#

not really. my problem is that i want to train a model like alexa or siri and i don't really know how to begin with this. i use an nlu model and tensorflow

serene scaffold
#

can you give a specific use case? "recognizing cities and counties" could mean a lot of things

desert bear
#

i have a start but i want to use it so that when i ask where new amsterdam lies it will answer in the usa, so i need to train it to know the sentence and to know that new amsterdam is a city

tacit basin
desert oar
#

not that i know of

#

i wish there was

desert oar
#

otherwise you are looking for a general category called "named entity recognition"

serene scaffold
#

and since that's a popular problem, there are lots of off-the-shelf solutions. you can use spaCy.

desert bear
#

alright i think i get it thanks.

neat anvil
#

spaCy is great. 100% recommend

desert bear
neat anvil
#

very

tacit basin
# desert bear alright i think i get it thanks.

Some more model ideas for question answering task https://paperswithcode.com/task/question-answering

Question Answering is the task of answering questions (typically reading comprehension questions), but abstaining when presented with a question that cannot be answered based on the provided context.

Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular b...

desert bear
thin palm
#

Python gang I have a question for making a Class for a "Trainer" to train my machine learning model. If I want to add a Scaler, where would I add this within my code? I'm trying to understand how I'd incorporate a scaler into this:

class Trainer:
    def __init__(self, X, y):
        #  X: pandas DataFrame
        #  y: pandas Series
        self.X = X
        self.y = y
        self.knn_model = None

    def set_model(self):
        """Defines our model as a class attribute."""
        self.knn_model =  KNeighborsClassifier(n_neighbors=10)

    def run(self):
        self.set_model()
        self.knn_model.fit(self.X,self.y)

    def evaluate(self, X_test, y_test):
        r2_test = self.knn_model.score(X_test, y_test)
        y_pred = self.knn_model.predict(X_test)

        # Confusion Matrix
        print(confusion_matrix(y_test, y_pred))
        # Accuracy
        print(accuracy_score(y_test, y_pred))
        # Recall
        print(recall_score(y_test, y_pred, average=None))
        # Precision
        print(precision_score(y_test, y_pred, average=None))

    def save_model(self):
        joblib.dump(self.knn_model, 'model.joblib')
        print(colored("model.joblib saved locally", "green"))
desert oar
#

(note: self.knn_model is an instance attribute, not a class attribute)

#

you would add any transformers as additional instance attributes

#

alternatively, use a Pipeline and set that as self.knn_model

#

i think scikit-learn pipelines are very useful

thin palm
desert oar
#

it's easier to use them

thin palm
desert oar
#

what does a pipeline have to do with a notebook?

thin palm
#

I'd have to go back and test my pipeline in notebooks before I move into production

#

was told to always experiment in notebooks then go to Visual Studio

thin palm
desert oar
#

you don't have to do that

#

test it however you need to test it

thin palm
desert oar
#

i also question the value of this set_model method

thin palm
#

maybe you're right with the Pipelines being easier

desert oar
#
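from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

class Trainer: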
    def __init__(self, X, y):
        self.X = X
        self.y = y
        self.model = None

    def build_model(self):
        return make_pipeline(
            StandardScaler(),
            KNeighborsClassifier(n_neighbors=10),
        )

    def run(self):
        self.model = self.build_model()
        self.model.fit(self.X,self.y)
#
thin palm
#

Sorry, sometimes I find myself asking questions that only confuse myself... in my notebooks I decide to scale once, after we have our X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = .3, random_state=0)
how do I incorporate this in my class or do I need to do it in my class?
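
A sketch of how the split and the pipeline fit together, using the iris dataset as stand-in data (the real data was not shown). With the scaler inside the pipeline there is no separate scaling step to write: `fit` learns the scaling from the training rows only, and `score` or `predict` reuse it on the test rows.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)  # stand-in dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=10))

# Scaler statistics come from X_train only.
model.fit(X_train, y_train)

# X_test is scaled with those same statistics before prediction.
test_accuracy = model.score(X_test, y_test)
print(test_accuracy)
```

In the `Trainer` class above, that would mean passing only the training split to the constructor and the test split to `evaluate`.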

urban prism
#

Is there a way I can calculate my model's gpu memory and ram usage? The methods I've found seem to focus on the usage while training. I want to see how it is while using it

neat anvil
#

I just use htop for stuff like that, but I'm sure there's a better way

urban prism
#

Yeah

desert oar
proven ermine
#

Anybody have a rec for a course or website to learn python data science best practices? I'm an experienced data scientist in R and I have familiarity with python but I'm interested in learning the Pydata stack (particularly Dask)

neat anvil
#

If you're already experienced in data science generally, the tutorials & user guides for the big libraries would probably be the best place to turn IMO. Tensorflow, pytorch, sklearn, pandas, all have extensive and well-written guides/tutorials.

ember minnow
#

In order to process the data, I used the lookback index of 60 points, so when trying to predict on data that is outside of the dataset, I would need the last 60 points as well, but I am I doing something wrong with the way I am predicting?

azure lagoon
#

hello! i got redirected from #discord-bots ... was wondering if anyone had a plt.style built for outputting to discord?

serene scaffold
pallid bramble
#

I have two lists of States. is it possible to use Pandas to compare the two lists, and create a new series?

#

I'm going to try set intersection

#

I'm not being very clear. I want a list of the items from list A that are NOT in list B. Forming list C. or subtracting list B from list A.

serene scaffold
#

yes, sounds like you should use sets and then convert the result back to a Series @pallid bramble

#

yes but I don't know what you're going to ask. you should always just ask your actual question, not if someone knows about something you haven't asked.

#

Nope

#

@stuck storm all you have to do is ask a question that someone can look at and see what the question is. then whoever's available can try to answer it.

pallid bramble
#

thank you. that works great. its the difference method, not intersection
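
For reference, a short sketch of both routes with made-up state codes: a set difference, and a pandas filter with `isin` that keeps the original order and returns a Series.

```python
import pandas as pd

a = pd.Series(["TX", "CA", "NY", "WA", "OR"])
b = pd.Series(["CA", "WA"])

# Items of a that are not in b, order preserved, as a new Series.
only_in_a = a[~a.isin(b)].reset_index(drop=True)

# Same items as an unordered set; equivalent to set(a).difference(b).
as_set = set(a) - set(b)

print(only_in_a.tolist(), as_set)
```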

serene scaffold
arctic wedgeBOT
#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.

{2, 3}
serene scaffold
#

why do you keep deleting stuff that you say

pallid bramble
#

thank you. i like that better too.

serene scaffold
#

it was better before when you had the function header. but thanks for saying what the variables are. what are you confused by? (by the way, your question is really about neural network theory, not so much numpy. that's another reason why it's better to ask your actual question, not say what you think the topic is.)

#
def loss_and_gradient(X, A, y):
    """Compute the loss"""
    loss = np.sum((X @ A - y) ** 2) / 2
    #compute the gradient
    grad = X.T @ (X @ A - y)  # this is an array
    # return the loss and gradient
    return loss, grad
#

is this a tuple of (X, A, y)? also, what does it do that is different than what you expected?

short delta
#

Hi guys, I am trying to debug a hand me down code base.

The script ingests an input Excel form and generates an output Excel form with the query column added, but it doesn't seem to be working: it keeps complaining that [Column A, Column B, Column C] is not in index.

I added Column A and B to the input excel form and it cleared the error for their respective not in index error.

However, Column C that is not supposed to be in the input excel form is being flagged not in index but is expected to be in the output excel form and it is not there after it was generated.

Looking for anyone that can assist with this debugging

serene scaffold
#

not entirely sure. I rewrote it as this:

def loss_and_grad(X, A, y):
    foo = (X @ A) - y
    return (
        np.sum(foo ** 2) / 2,
        X.T @ foo
    )
serene scaffold
short delta
#

erm i cannot show the code..

#

that is the issue lol

serene scaffold
#

@stuck storm do you not want me to help? I was looking at what you had written right as you deleted it

short delta
#

I need some time to prep those then

tacit basin
tacit basin
tacit basin
short delta
#

I will update with details later on, gotta get some work done in office now

urban knoll
#

I'm not too sure if this is the thread I should use to ask questions about clustering, but since it's about AI, here I go. I'm interested in applying agglomerative clustering to something I'm working on, specifically the Ward method, but I'm unable to find useful information on how to fill out the connectivity parameter. I know what it is, but I don't know what exactly I should place in there. What's the format, basically? Is it some np.shape() type thing I'm supposed to place in there? I've already looked here: https://scikit-learn.org/0.15/modules/generated/sklearn.cluster.Ward.html for AgglomerativeClustering(n_clusters=7, ...). It doesn't say what the format is.
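For reference, a hedged sketch of one common way to build that argument in current scikit-learn: `connectivity` is an (n_samples, n_samples) matrix, usually sparse, marking which samples count as neighbours. The data and the choice of `kneighbors_graph` with 5 neighbours are assumptions for illustration, not something from the question:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.neighbors import kneighbors_graph

# Made-up data: two well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])

# Sparse (n_samples, n_samples) matrix; entry (i, j) is nonzero when j is
# one of the 5 nearest neighbours of i.
connectivity = kneighbors_graph(X, n_neighbors=5, include_self=False)

model = AgglomerativeClustering(
    n_clusters=2, linkage="ward", connectivity=connectivity
)
labels = model.fit_predict(X)

print(connectivity.shape)
print(labels)
```

With a connectivity matrix, only neighbouring clusters are allowed to merge; leaving it as None lets any two clusters merge.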

desert bear
idle tartan
#

Holaa guys I am working on deepfake detection system I have to submit it within one week can anyone help me with it?

lone drum
#
Traceback (most recent call last):

  File "D:\college_project\modules\model_train.py", line 21, in <module>
    model.add(MaxPooling2D(pool_size = (2,2)))

  File "C:\Users\shubh\anaconda3\lib\site-packages\tensorflow\python\training\tracking\base.py", line 629, in _method_wrapper
    result = method(self, *args, **kwargs)

  File "C:\Users\shubh\anaconda3\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None

  File "C:\Users\shubh\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 2013, in _create_c_op
    raise ValueError(e.message)

ValueError: Exception encountered when calling layer "max_pooling2d_7" (type MaxPooling2D).

Negative dimension size caused by subtracting 2 from 1 for '{{node max_pooling2d_7/MaxPool}} = MaxPool[T=DT_FLOAT, data_format="NHWC", explicit_paddings=[], ksize=[1, 2, 2, 1], padding="VALID", strides=[1, 2, 2, 1]](Placeholder)' with input shapes: [?,1,1,16].

Call arguments received:
  • inputs=tf.Tensor(shape=(None, 1, 1, 16), dtype=float32)

how to fix this error? ping me when replying
tidal bough
#

probably something went wrong at an earlier stage, since 1x1 is a weird size to have.

lone drum
abstract falcon
#

Can we plot accuracy/loss curve of each iteration in ML algorithms? If yes, Can somebody suggest me any resource to refer?

tidal bough
abstract falcon
#

sklearn

tidal bough
#

Oh, I see. And what kind of model are you training? Is it a neural network or something more classical?

abstract falcon
#

I will be training four models Logistic Regression, SVM, DecisionTrees and Naive Bayes. And I want to see how does each model perform in that dataset.

tidal bough
lone drum
tidal bough
gloomy prawn
#

Hi, I have table with properties in first row and object in first column and I must put an X in each intersection of object with its property like this: https://github.com/xflr6/concepts/blob/master/examples/relations.csv, after that I need to do some manipulations like deleting one column , deleting one row etc... my question is what's the best way to do that ? Pandas ?
Thanks

tidal bough
#

oh hold on

#

@lone drum ah, your convolution layers aren't just 3x3, they have a kernel_size of 3 (so 3x3) and a strides of 3 (so 3x3).

tidal bough
#

so then:
0) input: 32x32

  1. after first convolution: something like 10x10
  2. after max pooling: 5x5
  3. after second convolution: 1x1
  4. and the second max pooling fails, since the input size is too small for a 2x2 pooling
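The sizes above can be checked with the usual output-size formula for "valid" padding, `(n - kernel) // stride + 1`. This is a small sketch of that arithmetic only, assuming a 32x32 input and the kernel/stride values mentioned here; it does not build the actual Keras model:

```python
def out_size(n, kernel, stride):
    """Output length along one axis with 'valid' padding (no padding added)."""
    return (n - kernel) // stride + 1

layers = [
    ("conv1", 3, 3),  # kernel_size=3, strides=3
    ("pool1", 2, 2),  # pool_size=2, default stride equals pool_size
    ("conv2", 3, 3),
    ("pool2", 2, 2),
]

size = 32
sizes = [size]
for name, kernel, stride in layers:
    size = out_size(size, kernel, stride)
    sizes.append(size)
    status = "ok" if size >= 1 else "fails: input smaller than the window"
    print(f"{name}: {size}x{size} ({status})")
```

A result below 1 means the window no longer fits in the input, which is the situation the "Negative dimension size" error describes.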
lone drum
#

ah okay

#

but when i use this code previously it worked

lone drum
tidal bough
#

Alter your layers in some way so that they don't squeeze the image too much. Can't really recommend how exactly, I haven't worked with CNNs.

maiden shore
#

I'm not sure how much of a mouthful this is but could someone please explain linear regression and cost functions to me like I'm 5

tacit basin
idle tartan
#

I am working on deepfake detection system I have to submit it within one week can anyone help me with it?

onyx island
#

Hello, I'm using K-NN (SVM may be) classifier to detect defective part on image.
After training the classifier, I have the confusion matrix, etc.
Now I want to put a label on the original image to show which part is defective or not.
How can I retrieve data after training the data ?

maiden shore
tranquil oak
#

what's the best beginner course for ML?

tacit basin
tender talon
#

def hello():
    return 'Goodbye, Mars!'

#

need helps please

#

help how to print "hello world!"

prime hearth
tranquil oak
#

I heard the basics in ML are very broad, so a course that would cover them would be wonderful. Thanks for the article, very useful as well!

prime hearth
#

Yeah i agree, this article I found to be helpful as it really is targeted for beginners and explains all the math concepts easily with examples and data science terms

#

But there are many other articles out there that are also great for linear regression

junior fractal
#

Hi

maiden shore
final kiln
#

can anyone recommend a good, math-heavy book on AI and data science?

hard basalt
#

I am just thinking of giving my users a set-solution for them to browse and tinker with their data

prime hearth
#

@maiden shore yeah, so drawing any random line at first

#

It's the same when implementing linear regression: we initialize the parameters for our weights and bias as random values or 0

#

So initially line will be flat if everything initialized to 0
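A small sketch of that idea with plain NumPy, assuming made-up points on the line y = 2x + 1 and a hand-picked learning rate. The weight and bias start at 0 (the flat line), and each step nudges them in the direction that lowers the cost, here half the mean squared error:

```python
import numpy as np

# Made-up data that lies exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

w, b = 0.0, 0.0  # start from the flat line y = 0
lr = 0.05        # learning rate (an assumption, tuned by hand for this data)

def cost(w, b):
    """Half the mean squared error of the line w*x + b."""
    return np.mean((w * x + b - y) ** 2) / 2

initial_cost = cost(w, b)
for _ in range(2000):
    err = w * x + b - y
    w -= lr * np.mean(err * x)  # gradient of the cost with respect to w
    b -= lr * np.mean(err)      # gradient of the cost with respect to b
final_cost = cost(w, b)

print(w, b, initial_cost, final_cost)
```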

thorn venture
#

Hi, I need to add some data from 3 CSV files to an Excel file as it is. The catch is that all the files have a series of dates, like 5th Feb to 1st Mar, and all rows for the same date need to be in the same section of the master file, e.g. all the 3rd Feb rows come one after another, then all the 4th Feb rows are added.
Should I copy-paste and then sort by the date only? Will that work, or is there another way? Should I use pandas or openpyxl? Please help me guys, I'm a noob in this field.

serene scaffold
#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

serene scaffold
#

the steps will go something like this:

  1. open all three CSVs as DataFrames (with pandas)
  2. Normalize the date representation for each DataFrame
  3. Concatenate all the DataFrames into one
  4. Sort the rows by date
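The steps above can be sketched roughly like this. The column names, the date formats, and the in-memory stand-ins for the three files are all assumptions, since the real CSVs were not posted:

```python
import io
import pandas as pd

# Stand-ins for the three CSV files, each paired with its (assumed) date format.
# Real code would pass file paths to pd.read_csv instead of StringIO objects.
sources = [
    (io.StringIO("date,value\n2022-02-05,1\n2022-02-03,2\n"), "%Y-%m-%d"),
    (io.StringIO("date,value\n03/02/2022,3\n04/02/2022,4\n"), "%d/%m/%Y"),
    (io.StringIO("date,value\n2022-02-04,5\n2022-02-03,6\n"), "%Y-%m-%d"),
]

frames = []
for source, date_format in sources:
    frame = pd.read_csv(source)                                          # step 1
    frame["date"] = pd.to_datetime(frame["date"], format=date_format)    # step 2
    frames.append(frame)

combined = (
    pd.concat(frames, ignore_index=True)      # step 3
    .sort_values("date", kind="mergesort")    # step 4, stable sort keeps file order within a date
    .reset_index(drop=True)
)
print(combined)

# Writing the result out would be something like this (needs openpyxl installed):
# combined.to_excel("master.xlsx", index=False)
```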
final kiln
serene scaffold
thorn venture
#

This is how the data in the CSV files looks. I have to paste it so that all rows with the same date are together, then apply a formula to the new column.

serene scaffold
desert oar
desert oar
#

the latter has a new 2022 edition which is free to read (as a draft) online

#

the 2012 version is also free to read online, and is more polished, but might seem a bit out of date

upper spindle
#

what version of numpy runs with tf

solar phoenix
#

sometimes one of the values will = 0

#

in that case i want to take one of the numbers only and ignore the 0

desert oar
solar phoenix
#

if both are 0 i want to return 0

desert oar
#

it's easy to get so lost in all the pandas stuff that you forget to use the basic tools that are part of python!

solar phoenix
#

Thanks for the response, sorry, can you extrapolate on that?

#

oh so it is like

#

if x > 0: type thing?

desert oar
#

yeah, exactly

#

pandas isn't a separate programming language, it's still python
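A plain-Python sketch of that kind of check. The rule for the case where both values are nonzero was not stated, so averaging them here is purely an assumption, and `combine` is a hypothetical name:

```python
def combine(a, b):
    """Combine two numbers, ignoring a value that is 0."""
    if a == 0 and b == 0:
        return 0
    if a == 0:
        return b
    if b == 0:
        return a
    # Assumption: when both values are present, take their mean.
    return (a + b) / 2

print(combine(0, 0), combine(0, 4), combine(3, 0), combine(2, 4))
```

A function like this can be applied to two DataFrame columns row by row, for example with a list comprehension over `zip(df["x"], df["y"])`, where the column names are again placeholders.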

solar phoenix
#

sure sure, i get it, thanks

#

appreciate it

upper spindle
#

what does it mean if TypeError: 'NoneType' object is not callable

desert oar
#

usually this is a mistake in your program logic somewhere

#

perhaps you overwrote the name of a function (e.g. sum = None; sum([1,2,3]))
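A tiny reproduction of that kind of mistake, using a hypothetical helper named `total` rather than a built-in so the snippet is safe to run anywhere:

```python
def total(values):
    """Hypothetical helper that adds up a list of numbers."""
    return sum(values)

print(total([1, 2, 3]))  # works while the name still points at the function

total = None  # accidental overwrite: the name now refers to None

try:
    total([1, 2, 3])
except TypeError as exc:
    message = str(exc)
    print(message)
```

The same thing happens when a function forgets its `return` and the caller then tries to call the result.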

#

you should show your code and the full error traceback

#

it's unlikely that this is a datascience-specific problem

upper spindle
#
validation_split=0.2, shuffle=True, verbose=0, batch_size=batch_size, epochs=200)
#

i executed this code, but says the error came from the mat_X_train mat_y_train line

real flame
#

Anyone here knows how to use R script?

serene scaffold
#

instead of being sorry for the ping, just don't ping. I am busy.

desert oar
# real flame Anyone here knows how to use R script?

in general it's a good idea not to "ask to ask" like this. you can ask if something is on topic or not, but "does anyone know how to X" is not a good way to get help, because it forces people to "interview" you before getting to any useful work

lapis sequoia
#

Fricking

real flame
desert oar
real flame
#

ok?

serene scaffold
#

@real flame salt rock lamp is giving you advice to help you get help in any online context. he is helping you, not criticizing you.

real flame
#

jeez stop attacking me I asked one question and I don’t need anyone’s advice I never asked for it nor do I care

serene scaffold
#

No one is attacking you. We answer a lot of questions in this server, so we're giving you suggestions that will increase the likelihood that you will receive help in the future.

real flame
#

‘It forces people to interview you’ where is the correlation between the question I asked and this response

#

I appreciate your help, but it was never my intention make it seem that way

serene scaffold
#

because someone would have to interview you about what your R question is before they could attempt to answer it. people usually don't want to do that--they want to see a question and start answering it.

real flame
#

I wanted to see if anyone knew R in the first place since this isn’t a channel exactly for it, nothing wrong with seeing if someone understood the subject

serene scaffold
#

nothing wrong with seeing if someone understood the subject
the best way for someone to know if they understand the subject matter of your question is to see the actual question.

#

how much R do they have to know to be able to help? they'll never know until the question is there.

real flame
serene scaffold
#

Even though R is out-of-scope, if you decide to ask your question, I'll allow it as a gesture of goodwill.

real flame
#

because I had no idea what they meant and it sounded like I meant something way different than what it actually was, don’t assume I was forcing anyone to interview me

#

they could have said this

#

anyways I understand what you meant, next time I’ll ask the question instead of asking who knows how to help

scenic ferry
#

I was told that I should join this community if I want to get into actual Python

serene scaffold
scenic ferry
#

I accidentally posted this here sorry

#

I meant to send this in general

scenic ferry
serene scaffold
scenic ferry
#

Nvmd found it

serene scaffold
#

where was it?

scenic ferry
serene scaffold
scenic ferry
#

Oh okay

serene scaffold
#

be sure to read the channel description before using a channel for the first time

timid forge
#

Am I allowed to post a reddit post here that I made instead of rewriting my question out?

serene scaffold
timid forge
#

I'm having a problem running code from the machine learning course on freecodecamp. I am getting this when I run it: Process finished with exit code -1073740791 (0xC0000409). I'm just wondering if this is because my computer can't handle the bigger data or if I am doing something wrong. I explained it a little better here https://www.reddit.com/r/MLQuestions/comments/t3vsj7/code_never_runs_and_getting_similar_error_message/

hard basalt
desert oar
#

maybe you can use nbconvert to convert the notebook to plain html with the widgets intact

arctic crown
#

please help what is a tensor?

serene scaffold
# arctic crown please help what is a tensor?

a generalization of an array. An individual number, like 5 is a 0-dimensional tensor. [7, 9, 10] is a one-dimensional tensor of shape (3,). [[4, 9, 7], [1, 0, 6]] is a two-dimensional tensor of shape (2, 3). and so on.
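The same three examples can be inspected with NumPy, where `ndim` is the number of dimensions and `shape` is the size along each one. NumPy calls these arrays rather than tensors, but the idea of rank and shape carries over:

```python
import numpy as np

scalar = np.array(5)                        # 0-dimensional
vector = np.array([7, 9, 10])               # 1-dimensional
matrix = np.array([[4, 9, 7], [1, 0, 6]])   # 2-dimensional

for t in (scalar, vector, matrix):
    print(t.ndim, t.shape)
```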

serene scaffold
arctic wedgeBOT
#

Hey @shut trail!

You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.

timid forge
#

Feels like a big task tbh I don't know this stuff 😛 I just want to write code Dx

shut trail
#

you could be missing pyqt

tidal bough
graceful glacier
#

i have the following table

#

its per company per month

#

how can i create a 'last month sales' column?

neat anvil
#

@graceful glacier try using a HAVING block in your query

graceful glacier
#

HAVING?

#

isnt that SQL specific only?

serene scaffold
#

also, what is "last month sales"?

#

is it the sum of each sales value grouped by company and calendar month?

#

are March 2022 and March 2021 the same month?

graceful glacier
#

yea sure, im using pandas and last month sales would indicate the sales from the month previous in the current row. the sales are summed already. and this particular dataset has only two months in it for the same year(2020)

#

so the last month sales would look something like this

serene scaffold
graceful glacier
#

i can print the whole table, its not big

serene scaffold
#

you can do print(df.to_dict('list')) if you want.

graceful glacier
#

{'Company': ['British Soaps', 'British Soaps', 'Chin & Beard Suds Co', 'Chin & Beard Suds Co', 'Soap and Splendour', 'Soap and Splendour', 'Squeaky Cleanies', 'Squeaky Cleanies', 'Sudsie Malone', 'Sudsie Malone'], 'Date': [Timestamp('2020-03-01 00:00:00'), Timestamp('2020-04-01 00:00:00'), Timestamp('2020-03-01 00:00:00'), Timestamp('2020-04-01 00:00:00'), Timestamp('2020-03-01 00:00:00'), Timestamp('2020-04-01 00:00:00'), Timestamp('2020-03-01 00:00:00'), Timestamp('2020-04-01 00:00:00'), Timestamp('2020-03-01 00:00:00'), Timestamp('2020-04-01 00:00:00')], 'Sales': [671772175.7995872, 687222935.8429785, 483505038.56760347, 508107776.28321755, 1896382984.9812155, 1933258732.721503, 1790308563.639309, 1818003538.774453, 1747533597.012284, 1794467930.657847]}

serene scaffold
#

thanks, let me see

#
df.groupby('Company')['Sales'].diff()
Out[8]:
0             NaN
1    1.545076e+07
2             NaN
3    2.460274e+07
4             NaN
5    3.687575e+07
6             NaN
7    2.769498e+07
8             NaN
9    4.693433e+07
Name: Sales, dtype: float64

Here's one way to do it

#

this works because there's only one row per (company, month) and they're already in chronological order.

#

if one company had more than one row for a given month, you'd have to aggregate them to get unique (company, month) rows.

#

also this is the change in sales. I think that's what you meant

graceful glacier
#

yes this would be change in sales

#

and thanks for the solution

#

i think another solution might also be a rolling sum of 2 rows subtracted by the value in the sales column

serene scaffold
#

similar solution:

In [12]: df.set_index(['Company', 'Date']).groupby(level=0).diff()
Out[12]:
                                        Sales
Company              Date
British Soaps        2020-03-01           NaN
                     2020-04-01  1.545076e+07
Chin & Beard Suds Co 2020-03-01           NaN
                     2020-04-01  2.460274e+07
Soap and Splendour   2020-03-01           NaN
                     2020-04-01  3.687575e+07
Squeaky Cleanies     2020-03-01           NaN
                     2020-04-01  2.769498e+07
Sudsie Malone        2020-03-01           NaN
                     2020-04-01  4.693433e+07
serene scaffold
#

one of the most important considerations when you write data science code is that people can follow along with what you wrote.

graceful glacier
#

oh wait i just realized i dont want change in sales but rather the previous months sales

#

so sort of the step before you get change in sales

serene scaffold
#

sounds like you're just adding redundancy

#
In [13]: df.set_index(['Company', 'Date']).groupby(level=0).shift()
Out[13]:
                                        Sales
Company              Date
British Soaps        2020-03-01           NaN
                     2020-04-01  6.717722e+08
Chin & Beard Suds Co 2020-03-01           NaN
                     2020-04-01  4.835050e+08
Soap and Splendour   2020-03-01           NaN
                     2020-04-01  1.896383e+09
Squeaky Cleanies     2020-03-01           NaN
                     2020-04-01  1.790309e+09
Sudsie Malone        2020-03-01           NaN
                     2020-04-01  1.747534e+09

In [14]: df.set_index(['Company', 'Date']).groupby(level=0).shift().fillna(0)
Out[14]:
                                        Sales
Company              Date
British Soaps        2020-03-01  0.000000e+00
                     2020-04-01  6.717722e+08
Chin & Beard Suds Co 2020-03-01  0.000000e+00
                     2020-04-01  4.835050e+08
Soap and Splendour   2020-03-01  0.000000e+00
                     2020-04-01  1.896383e+09
Squeaky Cleanies     2020-03-01  0.000000e+00
                     2020-04-01  1.790309e+09
Sudsie Malone        2020-03-01  0.000000e+00
                     2020-04-01  1.747534e+09
#

if you must

#
In [15]: df.groupby('Company')['Sales'].shift()
Out[15]:
0             NaN
1    6.717722e+08
2             NaN
3    4.835050e+08
4             NaN
5    1.896383e+09
6             NaN
7    1.790309e+09
8             NaN
9    1.747534e+09
Name: Sales, dtype: float64

this one is better if you want to attach it to the original df, since it's indexed the same way.

graceful glacier
#

ok great, i dont know why i didnt think of this before when i was playing around with shift

serene scaffold
#

🇸 🇭 🇮 🇫 🇹

graceful glacier
#

df_output_1 = df_sales.groupby(['Company', 'Date'], as_index=False)['Sales']\
                      .sum()\
                      .assign(Last_Month_Sales = 
                              lambda dfx : dfx.groupby('Company')['Sales'].shift())
serene scaffold
#

is this supposed to do the same thing as what I wrote

graceful glacier
#

yess, it does

graceful glacier
#

😂 the first two lines are to get the table into the form i screenshotted

serene scaffold
#
df_output_1 = (
    df_sales.groupby(['Company', 'Date'], as_index=False)['Sales']
        .sum()
        .assign(
           Last_Month_Sales=lambda dfx: dfx.groupby('Company')['Sales'].shift()
    )
)
#

what about this

timid forge
#

Sorry, can you explain what you mean by get cpu working? Also I checked and I have cuda 11.0 and in the github thread confusedreptile posted, one of the comments mentioned it not working with 11.0 but he got it with 10.1 so I guess I just need to go to a lower version. I remember when I was first doing this I had a reason I was going for 11.0 but I really can't remember now... I will try this fix in a bit 🙂 thanks for the help.

serene scaffold
#

if you're doing something that would benefit from CUDA, you might be able to do limited prototyping on the CPU.

#

whether or not you should solve the CUDA issues first is up to you, I guess.

timid forge
#

Ya I'm sure I don't even need my gpu involved right now but I jumped the gun when I started and wanted to do it the best way :/ Maybe I am better off forgetting all the extras for now while I learn the basics.

serene scaffold
timid forge
thick acorn
#

I would like to model global warming from temperature records recorded daily from June 1920 to October 2019 in Montélimar on Python. For this, I would like to model these seasonal variations with a sinusoidal fit. However, such a model fitted to the data set will not give any average temperature increase. I therefore try to apply a sinusoidal fit for each decade.

I first plotted the data from the data file and then I wanted to create a time variable to be able to do my ten-year average.

I would like to apply the sine fit for all the decades in the data file (not just the 1950s) and then plot the entire graph with the fit. However I have no idea how to do this in code. Does anyone have any suggestions?

As the code I wrote so far is quite long, I put it here: https://paste.pythondiscord.com/holaxaxawa

#

This is an example of model I must have

remote wave
#

Did anyone here did a project on stock trend prediction?

graceful glacier
#

hello

#

is it possible to .merge() a df to itself during chaining?

#

so something like this

#
df_output_1 = df_sales.groupby(['Company', 'Date'], as_index=False)['Sales']\
                      .sum()\
                      .assign(Last_Month_Sales = 
                              lambda dfx : dfx.groupby('Company')['Sales'].shift())\
                      .merge(...)
serene scaffold
#

you might be able to do some walrus operator fuckery

graceful glacier
#

i ask bc the .assign() method is able to take the most recent version of the df

serene scaffold
#

also you didn't use my refactor from before.

graceful glacier
serene scaffold
graceful glacier
#

not off the top of my head

graceful glacier
serene scaffold
#

an expression is something that evaluates to another value, like df_sales.groupby(['Company', 'Date'], as_index=False). in this expression, the ['Company', 'Date'] is also an expression on its own.

#

but a statement is not an expression. df_output_1 = df_sales.groupby(['Company', 'Date'], as_index=False) is an assignment statement. even though df_sales.groupby(['Company', 'Date'], as_index=False) by itself is an expression, when you have the df_output_1 = part, that part isn't an expression.

#

you can't do (a = 1 + 1) + 3 to assign a the value of 2, and then use it as an expression

#

except you can, if you put a : in front of the =

#

🤯

graceful glacier
#

so youve taken a statement and used it in an expression simultaneously

#

correct?

serene scaffold
graceful glacier
serene scaffold
#

!e

x = (y := 2 + 3) + 4
print(y)
arctic wedgeBOT
#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.

5
serene scaffold
#

but there are times where it saves you from having to do an expensive function call twice

#

and there are other times that it can make your code more elegant
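A sketch of the "expensive call" case, assuming a stand-in function `expensive` that counts how often it runs. Without `:=`, the comprehension would have to call it once in the filter and once more to keep the value:

```python
import math

def expensive(n):
    """Stand-in for a slow computation; counts how many times it is called."""
    expensive.calls += 1
    return math.factorial(n) % 97

expensive.calls = 0

values = [5, 10, 15, 20]

# The walrus binds the result to r, so it can be tested and kept in one call.
kept = [r for n in values if (r := expensive(n)) > 10]

print(kept, expensive.calls)
```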

graceful glacier
#

i see, me personally im just trying to chain assign as much as i can

serene scaffold
#

⛓️

thick acorn
neat anvil
serene scaffold
neat anvil
#

other minor recommendation, if you nest everything after the = inside a parentheses pair, you don't need a \ at the end of every line @graceful glacier

graceful glacier
graceful glacier
#

df_output_1 = (
df_sales.groupby(['Company', 'Date'], as_index=False)['Sales']\
                      .sum()\
                      .assign(Last_Month_Sales = lambda dfx : dfx.groupby('Company')['Sales'].shift())\
                      .merge(df_output_1)
)
neat anvil
neat anvil
graceful glacier
#

yea im working on it lol

thick acorn
neat anvil
#
df_output_1 = (
  df_sales.groupby(['Company', 'Date'], as_index=False)['Sales']
  .sum()
  .assign(Last_Month_Sales = lambda dfx : dfx.groupby('Company')['Sales'].shift())
  .merge(df_output_1)
)

is better

graceful glacier
#

thanks!!

#

but sadly its throwing an error

#

that df_output_1 isnt defined

thick acorn
neat anvil
# graceful glacier that df_output_1 isnt defined

O i see the question - I misunderstood. Why not just do this? trying to do some code-golf or something?

df_output_1 = (
  df_sales.groupby(['Company', 'Date'], as_index=False)['Sales']
  .sum()
  .assign(Last_Month_Sales = lambda dfx : dfx.groupby('Company')['Sales'].shift())
)
df_output_1 = df_output_1.merge(df_output_1)
graceful glacier
#

yea i could have done this, i was curious if it was possible during chain assignment
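One way this can work inside a chain is `DataFrame.pipe`, which passes the intermediate frame to a function, much like `.assign` does with its lambda. This is only a sketch: the toy data is made up, and the merge keys and suffixes are assumptions, since the intended merge was never spelled out:

```python
import pandas as pd

# Made-up stand-in for df_sales.
df_sales = pd.DataFrame({
    "Company": ["A", "A", "B", "B"],
    "Date": pd.to_datetime(["2020-03-01", "2020-04-01"] * 2),
    "Sales": [10.0, 12.0, 20.0, 25.0],
})

df_output_1 = (
    df_sales.groupby(["Company", "Date"], as_index=False)["Sales"]
    .sum()
    .assign(Last_Month_Sales=lambda dfx: dfx.groupby("Company")["Sales"].shift())
    # pipe hands over the frame built so far, so it can be merged with itself.
    .pipe(lambda dfx: dfx.merge(dfx, on="Company", suffixes=("", "_other")))
)
print(df_output_1)
```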

neat anvil
thick acorn
neat anvil
#

applying a sinusoidal fit to each decade separately would be unable to predict temperatures in any future decade, so that wouldn't be very useful as a predictive model

thick acorn
neat anvil
neat anvil
thick acorn
thick acorn
desert oar
#

@thick acorn you should consider decomposing this time series into a "trend" and "seasonal" component

#

it sounds like your technique is to break up the time series into 10-year chunks and try to fit a separate function in every chunk

#

seems crude but you should definitely be able to see the trend that way

#

that said, you could get a similar effect by just fitting a straight line!

#

you'll see visually that it goes up over time, in a way that is much bigger than random fluctuations

thick acorn
desert oar
#

my suggestion for how to do this with code: put the data into a pandas series, use date/time indexing to "step through" the series 10 years at a time, and then just loop

thick acorn
desert oar
#

however i will warn you that it won't look as good as the "expected" output you showed

#

wouldn't you want to do the linear regression first, then subtract off the trend that you estimated in the regression model?
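A rough sketch of that two-step idea on synthetic data: fit a straight line for the trend, subtract it, then fit a one-year sinusoid to what is left. All the numbers below are made up, and writing the sinusoid as `a*sin + b*cos` (so it can be fitted with ordinary least squares) is a choice made for this sketch, not something from the original code:

```python
import numpy as np

# Synthetic "daily temperatures" over 20 years; t is measured in years.
rng = np.random.default_rng(0)
t = np.arange(20 * 365) / 365.0
temp = 12.0 + 0.03 * t - 8.0 * np.cos(2 * np.pi * t) + rng.normal(0, 1.0, t.size)

# Step 1: linear regression gives the long-term trend (degrees per year).
slope, intercept = np.polyfit(t, temp, 1)
detrended = temp - (slope * t + intercept)

# Step 2: fit an annual cycle a*sin(2*pi*t) + b*cos(2*pi*t) to the remainder.
design = np.column_stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
(a, b), *_ = np.linalg.lstsq(design, detrended, rcond=None)
amplitude = np.hypot(a, b)

print(f"trend: {slope:.3f} per year, seasonal amplitude: {amplitude:.2f}")
```

The full model is then `slope * t + intercept + a*sin(2*pi*t) + b*cos(2*pi*t)`, which can be plotted over the raw data.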

thick acorn
desert oar
#

you have to do a lot more "bookkeeping" of indexes if you use plain numpy for this

thick acorn
desert oar
#

pandas makes it pretty easy to just select a date range

#

you can even use .resample and iterate over the result

thick acorn
neat anvil
desert oar
desert oar
#

anyway this is a tidy way to iterate over 10 year chunks in a pandas series that has a datetime index:

# iterating over a resampler yields (bin start, chunk) pairs
for decade_start, chunk_10year in my_series.resample('10YS'):
    ...
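A runnable version of that loop on made-up monthly data, assuming a series that starts on 1 January so the 10-year bins line up with calendar decades. Iterating over the resampler gives a (bin start, chunk) pair for each 10-year block:

```python
import numpy as np
import pandas as pd

# Made-up monthly series covering 1920 through 1949.
index = pd.date_range("1920-01-01", "1949-12-01", freq="MS")
my_series = pd.Series(np.arange(len(index), dtype=float), index=index)

decade_means = {}
for decade_start, chunk_10year in my_series.resample("10YS"):
    # Any per-decade work goes here, e.g. fitting a model to chunk_10year.
    decade_means[decade_start.year] = chunk_10year.mean()

print(decade_means)
```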
thick acorn
thick acorn
thick acorn
desert oar
thick acorn
#

Okay thank you

ornate sky
#

hey ,so i tried to import pyPDF2 and it always produces an error

#

even tho , i have it installed

#

(using linux btw -ubuntu) so i tried to reinstall it and it didn't work

#

any ideas ?

tacit basin
ornate sky
#

module error

tacit basin
#

How do you import it?

ornate sky
#

import pyPDF2

#

i'm pretty sure it's a path related issue, am trying to export the installation path to bashrc file