wooden sail Dec 26, 2022, 8:11 PM

#

can you print the shape of h, k1_x, k1_y, etc all the way until the line where you assign x and y to x_sol and y_sol?

prime hearth Dec 26, 2022, 9:02 PM

#

hello, I would like to ask: are there any beginner guides or resources on naming variables in data science? For example, this is what I have:

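```python
import io
import pandas as pd
from google.colab import files  # files.upload() is the Colab file-upload widget

# load data
uploaded = files.upload()
yelp_business_category_correlation_dataframe = pd.read_csv(io.BytesIO(uploaded['yelp_bussiness_category_correlation.csv']))
yelp_business_category_correlation_dataframe.head(10)
def getBusinessCompetitorsByCategory(business_category):
    ...  # body not shown in the original message
```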

#

the CSV file I'm loading is a Pearson correlation of businesses by category

#

and my function will return the business categories most correlated with the parameter business_category. I used the keyword Competitor since these businesses can be potential competitors

#

I don't want to name variables like
df = pd.read... where df stands for DataFrame. df is very ambiguous and not readable in the long term

#

or is what I have okay?
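A minimal sketch of one common convention (PEP 8: snake_case for functions and variables, nouns for data, verbs for functions) applied to the snippet above. The names `category_correlations`, `get_competitor_categories` and `top_n` are hypothetical, and the code assumes the CSV is a square correlation matrix with category names as both index and columns (e.g. read with `pd.read_csv(..., index_col=0)`).

```python
import pandas as pd

def get_competitor_categories(category_correlations: pd.DataFrame,
                              business_category: str,
                              top_n: int = 5) -> pd.Series:
    """Return the top_n categories most correlated with business_category.

    Assumes a square correlation matrix whose index and columns are category names.
    """
    correlations_with_category = category_correlations[business_category].drop(business_category)
    return correlations_with_category.nlargest(top_n)

# Tiny stand-in for the Yelp correlation CSV (made-up numbers for illustration).
category_names = ["pizza", "burgers", "sushi"]
category_correlations = pd.DataFrame(
    [[1.0, 0.8, 0.1], [0.8, 1.0, 0.3], [0.1, 0.3, 1.0]],
    index=category_names,
    columns=category_names,
)
competitor_categories = get_competitor_categories(category_correlations, "pizza", top_n=2)
print(competitor_categories)
```

The idea is that a name says what the object holds (`category_correlations`) rather than its type (`df`); very short names like `df` are still fine inside a scope of a few lines.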

timber spoke Dec 26, 2022, 9:17 PM

#

say I have a 3-layer MLP neural network (2 hidden layers, 1 output layer). I know the range of my input (0 to 1, for example), but I do not know the possible range of the layer 1 output. Given a set of weights and biases and a defined range for the layer 1 inputs, does anyone know if it is possible to determine the maximum possible output of layer 1?

hasty mountain Dec 26, 2022, 9:34 PM

#

timber spoke say I have a 3 layer MLP neural network (2 hidden 1 output). I know the range fo...

You could use a custom threshold function, if there aren't any suitable activation functions available (tanh, ReLU, ReLU6...)

#

From 0 to 1 you could use a sigmoid or softmax function (though I guess a sigmoid between hidden layers might not be recommended due to vanishing gradients)

timber spoke Dec 26, 2022, 9:36 PM

#

hasty mountain From 0 to 1 you could use a sigmoid or softmax function(though I guess sigmoid f...

i was using ReLU for my case

#

for the hidden neurons that is
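Since the hidden layers use ReLU, one way to answer the layer 1 question is interval arithmetic (often called interval bound propagation). A minimal NumPy sketch, assuming `W` has shape `(n_out, n_in)` and the input lies in a box `lower <= x <= upper`; `relu_layer_bounds` is a hypothetical helper name. For a single affine + ReLU layer each neuron's bound is exact (a linear function reaches its maximum over a box at a corner, and ReLU is monotone). Feeding these bounds into layer 2 still gives valid bounds, but they can be loose because each neuron is treated independently.

```python
import numpy as np

def relu_layer_bounds(W, b, lower, upper):
    """Elementwise bounds on relu(W @ x + b) over all x with lower <= x <= upper."""
    W_pos = np.maximum(W, 0.0)  # positive weights: largest input gives largest output
    W_neg = np.minimum(W, 0.0)  # negative weights: smallest input gives largest output
    pre_upper = W_pos @ upper + W_neg @ lower + b
    pre_lower = W_pos @ lower + W_neg @ upper + b
    # ReLU is monotone, so it can be applied to the pre-activation bounds directly.
    return np.maximum(pre_lower, 0.0), np.maximum(pre_upper, 0.0)

# Example: 2 inputs in [0, 1], 2 hidden ReLU neurons (made-up weights).
W1 = np.array([[1.0, -2.0], [0.5, 0.5]])
b1 = np.array([0.0, -0.25])
lo, hi = relu_layer_bounds(W1, b1, np.zeros(2), np.ones(2))
print(lo, hi)  # upper bounds are 1.0 and 0.75
```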

hasty mountain Dec 26, 2022, 9:37 PM

#

Sigmoid seems to make things quite...unstable. At least I was testing a GAN here and, well...with sigmoid things got quite messy.

timber spoke Dec 26, 2022, 9:40 PM

#

hmm, that's interesting

iron basalt Dec 26, 2022, 10:04 PM

#

Notation in probability is a mess (especially expected value, it can be ambiguous without more information). I prefer p and Pr. Hats on top for approximation.

#

In ML they may or may not state what the notation means, or even be consistent. Things are left ambiguous and made unambiguous with the surrounding text (which they sometimes don't have and expect you to know based on some other papers that they are copying in notation (but mixed together, so it may require a lot of inference to decode its meaning (like solving a Sudoku puzzle at times)))

serene scaffold Dec 26, 2022, 10:08 PM

#

@iron basalt thank you for your input 💚

iron basalt Dec 26, 2022, 10:10 PM

#

*Or my favorite, you could only know what they mean by being on the same wavelength / predicting what they are trying to do in the paper because everyone working on similar stuff has convergent ideas.

#

("culture")

serene scaffold Dec 26, 2022, 10:11 PM

#

there's a whole reference implementation for the paper I wrote two-ish years ago. but even with that, I'm already regretting some imprecision in how I explained a few points.

iron basalt Dec 26, 2022, 10:12 PM

#

Yeah in ML they sometimes kind of give up, and write it for others that are on that same wavelength. In that case it's a very fast way to write it, but terrible for anyone trying to get in on it.

serene scaffold Dec 26, 2022, 10:12 PM

#

(and it's not a shitty reference implementation. you could reproduce everything in the paper with one bash command, if you have the dataset.)

iron basalt Dec 26, 2022, 10:14 PM

#

(in that case one hopes for a reference implementation, hopefully in Python, because it's unlikely that any C++ or other stuff makes any sense and is bug free)

iron basalt Dec 26, 2022, 10:31 PM

#

@hasty mountain Try different loss functions.

#

The default 2014 one is not so great.

hasty mountain Dec 26, 2022, 10:32 PM

#

iron basalt @hasty mountain Try different loss functions.

Yeah, I'm taking a look at one TensorFlow implementation of Ian Goodfellow's suggestion

#

A modified one, the non-saturating version for the generator loss
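For reference, the non-saturating trick comes from the original 2014 GAN paper: instead of minimizing log(1 - D(G(z))), the generator minimizes -log D(G(z)). A small NumPy sketch (hypothetical helper names, written in terms of the discriminator's logits to stay numerically stable) shows why it helps: when D confidently rejects the fakes, the minimax gradient almost vanishes while the non-saturating one does not.

```python
import numpy as np

def softplus(z):
    # log(1 + exp(z)), computed without overflow
    return np.logaddexp(0.0, z)

def generator_losses(fake_logits):
    """Generator losses for discriminator logits on fake samples, with D = sigmoid(logit).

    Returns (minimax_loss, non_saturating_loss, grad_minimax, grad_non_saturating),
    where the gradients are taken w.r.t. the logits.
    """
    fake_logits = np.asarray(fake_logits, dtype=float)
    d = 1.0 / (1.0 + np.exp(-fake_logits))  # D(G(z))
    n = fake_logits.size
    minimax = np.mean(-softplus(fake_logits))         # mean of log(1 - D), minimized by G
    non_saturating = np.mean(softplus(-fake_logits))  # mean of -log(D)
    grad_minimax = -d / n                  # d/dlogit of log(1 - D) is -D
    grad_non_saturating = -(1.0 - d) / n   # d/dlogit of -log(D) is -(1 - D)
    return minimax, non_saturating, grad_minimax, grad_non_saturating

# Early in training the discriminator easily spots fakes, e.g. logit = -5 (D is about 0.0067).
_, _, g_minimax, g_non_saturating = generator_losses([-5.0])
print(g_minimax, g_non_saturating)  # roughly -0.0067 vs -0.9933
```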

iron basalt Dec 26, 2022, 10:32 PM

#

hasty mountain Yeah, I'm taking a look at one Tensorflow implementation of Ian Goodfellow's sug...

Do you mean the 2016 tutorial? https://arxiv.org/pdf/1701.00160.pdf

hasty mountain Dec 26, 2022, 10:33 PM

#

iron basalt Do you mean the 2016 tutorial? https://arxiv.org/pdf/1701.00160.pdf

Nah, this one
https://github.com/tensorflow/tensorflow/blob/2007e1ba474030fcce840b0b8a599558e7d5998f/tensorflow/python/ops/losses/losses_impl.py

#

But I didn't know about this tutorial. Interesting.

iron basalt Dec 26, 2022, 10:35 PM

#

hasty mountain Nah, this one https://github.com/tensorflow/tensorflow/blob/2007e1ba474030fcce84...

You also might want to look into https://en.wikipedia.org/wiki/Wasserstein_GAN if you did not already.

Wasserstein GAN

The Wasserstein Generative Adversarial Network (WGAN) is a variant of generative adversarial network (GAN) proposed in 2017 that aims to "improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches".Compared with the original GAN discriminator,...

hasty mountain Dec 26, 2022, 10:35 PM

#

Ugh, I've read about WGAN, but I'm kind of too lazy to implement it

iron basalt Dec 26, 2022, 10:36 PM

#

https://www.intel.com/content/www/us/en/developer/articles/technical/better-generative-modelling-through-wasserstein-gans.html

Intel

Better Generative Modelling through Wasserstein GANs

This paper about Wasserstein GANs (Generative Adversarial Networks) shows how developers can train their discriminator to convergence. Doing this eliminates the need to balance generator updates with discriminator updates.

#

hasty mountain Dec 26, 2022, 10:37 PM

#

Look at that...it looks like a C++ loop

#

lr = 5e-5 and RMSProp? Woah...

iron basalt Dec 26, 2022, 10:39 PM

#

Compared to the original GAN algorithm, the WGAN makes the following changes:

    - After every gradient update on the critic function, the weights are clamped to a small fixed range, usually [−c, c].
    - A new loss function derived from the Wasserstein distance is used. The discriminator no longer acts as a direct critic, but rather as a helper for estimating the Wasserstein metric between the real and generated data distributions.

Empirically, the authors recommended using the RMSProp optimizer on the critic rather than a momentum-based optimizer such as Adam, which could cause instability in model training.
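The two changes above fit in a few lines. A framework-agnostic NumPy sketch, assuming the critic outputs unbounded real scores (no sigmoid) and its parameters are a list of arrays; the helper names are hypothetical. The defaults given in the WGAN paper are c = 0.01, RMSProp with lr = 5e-5, and 5 critic updates per generator update.

```python
import numpy as np

def critic_loss(real_scores, fake_scores):
    # The critic maximizes mean(f(real)) - mean(f(fake)), an estimate of the
    # Wasserstein distance up to a scale factor; as a loss to minimize, flip the sign.
    return np.mean(fake_scores) - np.mean(real_scores)

def generator_loss(fake_scores):
    # The generator tries to raise the critic's scores on generated samples.
    return -np.mean(fake_scores)

def clip_weights(params, c=0.01):
    # Applied after every critic update: clamp each parameter into [-c, c],
    # the original WGAN's crude way of keeping the critic (roughly) Lipschitz.
    return [np.clip(p, -c, c) for p in params]

params = [np.array([0.5, -0.003]), np.array([[-2.0, 0.001]])]
print(clip_weights(params))                  # values clamped into [-0.01, 0.01]
print(critic_loss([1.0, 3.0], [0.0, 0.0]))   # -2.0
```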

hasty mountain Dec 26, 2022, 10:40 PM

#

Oooh... I see... I didn't know that about Adam

iron basalt Dec 26, 2022, 10:41 PM

#

hasty mountain Oooh... I see... I didn't know that about Adam

It's a default choice... until it's not.

hasty mountain Dec 26, 2022, 10:42 PM

#

Now that you mentioned it... I think Goodfellow used a simple SGD with momentum, didn't he?

iron basalt Dec 26, 2022, 10:42 PM

#

You are past the beginning here, so default choices need to be reevaluated.

hasty mountain Dec 26, 2022, 10:42 PM

#

But now it makes sense...especially since some papers on GANs differ so much on the values of beta1 and beta2 for Adam

#

And I don't have more than 9,000 GPUs to spend so much time on trial and error

#

Thanks!

iron basalt Dec 26, 2022, 10:43 PM

#

Yeah, when you don't have ALL the compute, your choices need to be more careful / examined.

#

/ don't assume that because everyone does it that it's the best.

prime hearth Dec 26, 2022, 11:10 PM

#

hello, I would like to ask: is it better to build a recommendation system based on what other users like, or to make a simple recommendation based on popular things?

#

what I want is simply to recommend other popular businesses, but I'm not sure if this is better than recommending what other users like. Recommending popular businesses just uses some math to find the highest rating count and highest rating, compared to the second option, where I need to use KNN or something similar
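Even the popularity option benefits from some care: ranking by raw average lets a business with two 5-star reviews beat one with hundreds of good ones. A common fix is a damped (Bayesian-style) average, as in the IMDb weighted-rating formula, which pulls businesses with few ratings toward the global mean. A pandas sketch; the column names `avg_rating`/`rating_count` and the `min_count` prior strength are assumptions to adapt and tune.

```python
import pandas as pd

def rank_by_weighted_rating(businesses: pd.DataFrame, min_count: int = 50) -> pd.DataFrame:
    """Sort businesses by a damped average rating (IMDb-style weighted rating).

    score = v / (v + m) * R + m / (v + m) * C, where R is the business's average
    rating, v its rating count, m = min_count, and C the mean rating over all businesses.
    """
    R = businesses["avg_rating"]
    v = businesses["rating_count"]
    C = R.mean()
    m = min_count
    score = v / (v + m) * R + m / (v + m) * C
    return businesses.assign(score=score).sort_values("score", ascending=False)

businesses = pd.DataFrame({
    "name": ["A", "B", "C"],
    "avg_rating": [5.0, 4.5, 3.0],
    "rating_count": [2, 200, 500],
})
ranked = rank_by_weighted_rating(businesses)
print(ranked[["name", "score"]])  # B first: A's perfect score rests on only 2 ratings
```

The same ranking is also a common fallback for users with no history, where a collaborative (KNN-style) recommender has nothing to work with.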

tacit basin Dec 27, 2022, 6:14 AM

#

prime hearth hello, i would like to please ask is it better to do recommendation system based...

Why not use both?

hasty kiln Dec 27, 2022, 8:08 AM

#

Are you using tensorflow?

wooden sail Dec 27, 2022, 8:08 AM

#

nope

#

i've used keras a little, but i usually go for jax

hasty kiln Dec 27, 2022, 8:09 AM

#

Wow cool

iron basalt Dec 27, 2022, 8:18 AM

#

Taichi is there, but I have not used it; it seems fine, assuming it's not buggy. It has limitations as usual, though. We have our own stuff, so I don't use these open source frameworks except when using others' stuff or when one happens to be a good fit for something. https://www.taichi-lang.org/

Taichi Lang: High-performance Parallel Programming in Python

Taichi is a domain-specific language embedded in Python that helps you easily write portable, high-performance parallel programs.

hasty kiln Dec 27, 2022, 8:18 AM

#

I love your job, this is my dream

wooden sail Dec 27, 2022, 8:20 AM

#

iron basalt Taichi is there, but I have not used it, seems fine, assuming it's not buggy. It...

ah i hadn't heard about taichi before

iron basalt Dec 27, 2022, 8:20 AM

#

wooden sail ah i hadn't heard about taichi before

It's new-ish.

austere swift Dec 27, 2022, 8:20 AM

#

iron basalt Taichi is there, but I have not used it, seems fine, assuming it's not buggy. It...

i've never heard of that, it seems pretty interesting

wooden sail Dec 27, 2022, 8:21 AM

#

hasty kiln I love your job, this is my dream

better start studying math 😛 the coding is really kinda secondary

austere swift Dec 27, 2022, 8:21 AM

#

that looks like something i'll definitely start using

iron basalt Dec 27, 2022, 8:22 AM

#

The way that it hijacks the Python type hinting syntax is interesting.

#

Building a language out of Python's existing syntax, since Python kind of has everything you need now.

fresh tiger Dec 27, 2022, 8:24 AM

#

Hi, I have a question regarding evaluating and retraining a model.

Assuming I have the following flow: 1) the user can evaluate the existing deployed model, 2) based on the evaluation, the user can retrain the model, 3) if the model is retrained and its accuracy is better than the evaluation accuracy from step 1, deploy the new model.

My question here is, for step 3, should I be comparing the newly trained model's accuracy to the accuracy from step 1? Or should I be comparing it to the old model's accuracy from when it was initially trained?

iron basalt Dec 27, 2022, 8:24 AM

#

*At least the docs seem to give warnings for limitations, which is nice:
```
WARNING

Taichi only supports fields of dimensions ≤ 8.
```

#

(I happen to have used 12 dimensional arrays so 😦 )

wooden sail Dec 27, 2022, 8:25 AM

#

sounds nasty 😌

wooden sail Dec 27, 2022, 8:25 AM

#

fresh tiger Hi, I have a question regarding evaluating and retraining a model. Assuming I h...

what's the difference between the two? how do you evaluate the model in step 1?

iron basalt Dec 27, 2022, 8:25 AM

#

(I do like "fields" more than "tensors" actually, might yoink that naming)

#

(I just call them ndarrays in my stuff, because I like to call data structures exactly what they are)

wooden sail Dec 27, 2022, 8:27 AM

#

hmm, I don't think it's a clear nomenclature though, considering field is already used for the fields over which one defines vector spaces, or when working with vector fields, where it kinda hints at there being other (possibly spatial) dimensions to which vectors are assigned

#

i prefer tensor if it's a multilinear transformation, n-way array otherwise

iron basalt Dec 27, 2022, 8:28 AM

#

I think Taichi comes from graphics programming, so the name kind of makes sense given that background.

wooden sail Dec 27, 2022, 8:28 AM

#

i see... but having to relearn nomenclature AND syntax/API makes it less appealing 😛

iron basalt Dec 27, 2022, 8:29 AM

#

But I will just stick with ndarray, like NumPy. The actual name is rarely typed since I have factory procedures (functions).

wooden sail Dec 27, 2022, 8:29 AM

#

i like jax cuz it looks the same as numpy and the nomenclature is mathematically sound

fresh tiger Dec 27, 2022, 8:29 AM

#

wooden sail what's the difference between the two? how do you evaluate the model in step 1?

forgot to add that part, my bad. The user would add new data to the database, and the user can then initiate an evaluation. I don't really have anything specifically defined for that evaluation, but I assume something like:
```python
score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
```

The model is already a .pb and I would load it in, and then probably run something like .evaluate() on it. Shouldn't this accuracy then be used for step 3, since this accuracy is the one impacted by the new data?

wooden sail Dec 27, 2022, 8:31 AM

#

i don't get your last point

#

this evaluation is not that different from the validation that was already done when the model was trained

fresh tiger Dec 27, 2022, 8:32 AM

#

but if the evaluation is done with a lot of new data added to the dataset, couldn't that impact the accuracy?

#

i.e. the model trained on the original dataset may have had 80% accuracy

#

but then a large amount of new data is added, the model is evaluated, and it may have 60% accuracy?

wooden sail Dec 27, 2022, 8:33 AM

#

ok, that's fair. but by retrain you don't mean from scratch, do you?

fresh tiger Dec 27, 2022, 8:34 AM

#

I do

wooden sail Dec 27, 2022, 8:34 AM

#

that doesn't seem very well thought out

#

but sure, you can do that

iron basalt Dec 27, 2022, 8:35 AM

#

*Due to the graphics background Taichi seems to have spatial partitioning trees so one can do voxels, fluid simulations and such. Pretty neat.

wooden sail Dec 27, 2022, 8:35 AM

#

you could also treat this as a batching strategy for the data, though in fairness the populations would have different statistical properties

fresh tiger Dec 27, 2022, 8:35 AM

#

wooden sail but sure, you can do that

yeah we are a bit limited on time ;-; so some stuff like this is quite rushed

wooden sail Dec 27, 2022, 8:36 AM

#

i see. anyway, yeah, that makes sense. evaluate using the new data

#

just make sure it's split properly so that you don't evaluate on data you will also train on

fresh tiger Dec 27, 2022, 8:37 AM

#

Could that cause overfitting?

wooden sail Dec 27, 2022, 8:37 AM

#

not overfitting, unfair evaluation

#

you wouldn't even be able to tell if there was overfitting in that case

wooden sail Dec 27, 2022, 8:38 AM

#

iron basalt *Due to the graphics background Taichi seems to have spatial partitioning trees ...

ok that's pretty handy, yeah

#

maybe i'll force a student to look at it in some ultrasound simulations :x

fresh tiger Dec 27, 2022, 8:41 AM

#

wooden sail not overfitting, unfair evaluation

Would this still be the case if the new data is added to the old data? I.e. the dataset for retraining will always be the old data plus the newly added data?

wooden sail Dec 27, 2022, 8:43 AM

#

all i mean is that the data you use for evaluation cannot be part of the training data

#

so when you add new data to do this evaluation, that data can't be used for anything else
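A minimal sketch of that rule for the retrain-from-scratch flow described above: hold out part of the new data, train the candidate on the old data plus the rest, and score the old and new models on the same held-out slice so the comparison is fair. `train_fn`/`evaluate_fn` stand in for your own Keras code (e.g. `model.fit` / `model.evaluate`); the helper names, the 30% split and the toy majority-class "model" are assumptions for illustration.

```python
import numpy as np

def split_new_data(X_new, y_new, eval_fraction=0.3, seed=0):
    """Split newly added data into disjoint training and held-out evaluation parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X_new))
    n_eval = int(round(len(X_new) * eval_fraction))
    eval_idx, train_idx = order[:n_eval], order[n_eval:]
    return (X_new[train_idx], y_new[train_idx]), (X_new[eval_idx], y_new[eval_idx])

def retrain_and_compare(old_model, train_fn, evaluate_fn, X_old, y_old, X_new, y_new,
                        eval_fraction=0.3, seed=0):
    """Retrain on old data + the training part of the new data, then score BOTH models
    on the same held-out slice of new data, which neither model was trained on."""
    (X_new_train, y_new_train), (X_eval, y_eval) = split_new_data(X_new, y_new, eval_fraction, seed)
    candidate = train_fn(np.concatenate([X_old, X_new_train]), np.concatenate([y_old, y_new_train]))
    old_score = evaluate_fn(old_model, X_eval, y_eval)
    new_score = evaluate_fn(candidate, X_eval, y_eval)
    deployed = candidate if new_score > old_score else old_model
    return deployed, old_score, new_score

def train_majority(X, y):
    # Toy "model": always predict the most common training label.
    return int(np.bincount(y).argmax())

def accuracy(model, X, y):
    return float(np.mean(y == model))

X_old, y_old = np.zeros((100, 1)), np.zeros(100, dtype=int)
X_new, y_new = np.ones((400, 1)), np.ones(400, dtype=int)
deployed, old_acc, new_acc = retrain_and_compare(0, train_majority, accuracy, X_old, y_old, X_new, y_new)
print(old_acc, new_acc)
```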

fresh tiger Dec 27, 2022, 9:06 AM

#

Ahh ok yes that makes sense

#

thank you very much for all of your help! 😄

patent lynx Dec 27, 2022, 9:11 AM

#

hey, is there a NumPy function for Gram-Schmidt, for finding a set of orthonormal basis vectors?

wooden sail Dec 27, 2022, 9:12 AM

#

not directly, but you can use a QR decomp or SVD to achieve a similar effect

#

QR is probably faster

patent lynx Dec 27, 2022, 9:13 AM

#

guess I'll look into it, otherwise I'll be relying on someone else's gsBasis function I found on GitHub

wooden sail Dec 27, 2022, 9:13 AM

#

scipy linalg orth produces an orthonormal basis, and the docs say it does it via SVD as i suggested

#

https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.orth.html

#

like so
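Both routes in NumPy, as a sketch (function names are hypothetical). Reduced QR gives an orthonormal basis directly when the columns are linearly independent; the SVD version also copes with dependent columns by keeping only singular vectors above a tolerance, which is the approach the scipy docs describe for `scipy.linalg.orth`. The exact tolerance rule below is an assumption modelled on common rank cutoffs.

```python
import numpy as np

def orthonormal_basis_qr(A):
    """Orthonormal basis for the column space of A via reduced QR.
    Only reliable when A has full column rank."""
    Q, _ = np.linalg.qr(A)  # mode='reduced' is the default
    return Q

def orthonormal_basis_svd(A, rcond=None):
    """Orthonormal basis for the column space of A via SVD; handles rank-deficient A."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    if rcond is None:
        rcond = np.finfo(s.dtype).eps * max(A.shape)
    tol = s.max() * rcond if s.size else 0.0
    rank = int(np.sum(s > tol))  # singular values below tol are treated as zero
    return U[:, :rank]

A_full = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
Q = orthonormal_basis_qr(A_full)
A_rank1 = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])  # second column = 2 * first
B = orthonormal_basis_svd(A_rank1)
print(Q.shape, B.shape)  # (3, 2) (3, 1)
```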

patent lynx Dec 27, 2022, 9:14 AM

#

yeah, the GitHub function uses a similar method, but I am not sure if it is robust:

#

```python
import numpy as np
import numpy.linalg as la

verySmallNumber = 1e-14  # below this norm a vector is treated as linearly dependent

def gsBasis(A):
    B = np.array(A, dtype=np.float64)  # Make B a copy of A, since we're going to alter its values.
    # Loop over all vectors, starting with zero, label them with i
    for i in range(B.shape[1]):
        # Inside that loop, loop over all previous vectors, j, to subtract.
        for j in range(i):
            # Subtract the overlap of the current vector B[:, i] with the previous vector B[:, j]
            B[:, i] = B[:, i] - B[:, i] @ B[:, j] * B[:, j]
        # Normalise B[:, i], or zero it out if it was (numerically) linearly dependent
        if la.norm(B[:, i]) > verySmallNumber:
            B[:, i] = B[:, i] / la.norm(B[:, i])
        else:
            B[:, i] = np.zeros_like(B[:, i])
    # Finally, we return the result:
    return B
```

wooden sail Dec 27, 2022, 9:17 AM

#

yeah this looks like vanilla gram schmidt, might have some numerical stability issues

#

i'd just use the scipy one if you don't wanna make a robust one yourself

patent lynx Dec 27, 2022, 9:17 AM

#

thank you

wide ibex Dec 27, 2022, 10:08 AM

#

Hey everyone, I am trying to find answers to a strange problem in TensorFlow Keras where the exported weights don't seem to produce the same results, and I'm trying to bring some attention to the problem to see if anyone can help understand why this is happening. There is a GitHub issue about the problem here: https://github.com/keras-team/keras/issues/17332

If anyone can help shed some light on this issue it would be greatly appreciated, thank you.

GitHub

Exporting Weights from Tensorflow Keras to C produces a different s...

For some reason when I export my weights from a Tensorflow Keras model, be it a simple Sequential FNN, and then load them into C to perform forward passes I get different output results, sometimes ...

odd dagger Dec 27, 2022, 3:22 PM

#

hi, can someone give me an example of web crawling with parallel processing?
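A minimal standard-library sketch: crawling is mostly waiting on the network, so a thread pool (rather than processes) is usually enough. `crawl_parallel` and `fetch` are hypothetical names, and a real crawler should also respect robots.txt and rate limits. If you are already using Scrapy, it schedules requests concurrently on its own (see its `CONCURRENT_REQUESTS` setting), so you may not need this.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen

def fetch(url, timeout=10):
    """Download one page and return its raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()

def crawl_parallel(urls, fetch_fn=fetch, max_workers=8):
    """Fetch many URLs concurrently with a thread pool.

    Threads suit this job because it is I/O-bound: the GIL is released while
    waiting on the network. Returns {url: page bytes or the exception raised}.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_fn, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # keep crawling if one page fails
                results[url] = exc
    return results
```

For example, `pages = crawl_parallel(urls, max_workers=16)` returns a dict mapping each URL to its bytes, or to the exception it raised.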

plain cobalt Dec 27, 2022, 3:42 PM

#

would someone tell me where I can learn data science for free?

serene scaffold Dec 27, 2022, 3:47 PM

#

plain cobalt would someone tell me where should i learn data science for free

!resources data science

arctic wedgeBOT Dec 27, 2022, 3:47 PM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

plain cobalt Dec 27, 2022, 3:57 PM

#

serene scaffold !resources data science

yes

serene scaffold Dec 27, 2022, 4:21 PM

#

plain cobalt yes

click the link 😛

ripe sapphire Dec 27, 2022, 4:24 PM

#

what should I learn in AIML specifically?

#

what is required to be a professional

#

someone guide me please

serene scaffold Dec 27, 2022, 4:25 PM

#

ripe sapphire what is required to be a professional

a computer science degree related to AI/ML, probably at the masters level

ripe sapphire Dec 27, 2022, 4:25 PM

#

@serene scaffold did you complete your masters

serene scaffold Dec 27, 2022, 4:26 PM

#

ripe sapphire @serene scaffold did you complete your masters

I'm like the only member of my department who doesn't have a masters (and many have PhDs), but I'm starting grad school next month

#

not that I'm some prodigy. I was very fortunate to be hired. but I also cultivated a very niche skillset during my undergrad that happened to be what they wanted.

ripe sapphire Dec 27, 2022, 4:27 PM

#

silly question: Is good maths necessary for ML?

serene scaffold Dec 27, 2022, 4:28 PM

#

depends on what you mean by "good at maths". ML is math. but being "bad at math" is mostly a state of mind.

patent lynx Dec 27, 2022, 4:33 PM

#

You generally need 3 things

#

linear algebra

#

multivariate calc

#

statistics (PCA?) still working on this

#

https://www.youtube.com/playlist?list=PLblh5JKOoLUK0FLuzwntyYI10UQFUhsY9

YouTube

Statistics Fundamentals

These videos give you a general overview of statistics as well as a be a reference for statistical concepts.

#

I'd recommend starting with this

#

https://www.youtube.com/watch?v=WUvTyaaNkzM&list=PLZHQObOWTQDMsr9K-rj53DwVRMYO3t5Yr&ab_channel=3Blue1Brown

YouTube

3Blue1Brown

The essence of calculus

#

and this

serene scaffold Dec 27, 2022, 4:36 PM

#

though bear in mind that even if you learn those things, potential AI/ML employers won't really take you seriously without a degree. whether that's fair or not can be debated.

odd dagger Dec 27, 2022, 5:03 PM

#

patent lynx https://www.youtube.com/watch?v=WUvTyaaNkzM&list=PLZHQObOWTQDMsr9K-rj53DwVRMYO3t...

best channel for maths overall

wooden sail Dec 27, 2022, 5:06 PM

#

very strong disagree there

#

due to how 3b1b presents content, it only makes sense if you have already covered the content in another form first

#

e.g. by reading in a book or in lectures in uni

#

on its own the channel does not provide nearly enough background nor detail of the right kind to be a standalone good math resource

serene scaffold Dec 27, 2022, 5:40 PM

#

wooden sail due to how 3b1b presents content, it only makes sense if you have already covere...

I don't think the Essence of Calculus series is intended to help one learn calculus starting from algebra. I think it's intended to strengthen one's existing understanding of calculus.

And I think his series on neural networks with MNIST is great for laying a foundation

So yes, I agree that it's not a good standalone math resource. But I don't think that's what it's trying to be.

wooden sail Dec 27, 2022, 5:42 PM

#

i agree, but the way historify and curry chicken recommended it is misleading if the other person has no prior knowledge

serene scaffold Dec 27, 2022, 5:42 PM

#

and I think the guiding principle for what videos he decides to make is "what aspects of a given topic could be better explained with animations than with static visuals?" and goes from there

patent lynx Dec 27, 2022, 5:46 PM

#

Sorry if I gave bad advice, but yeah, I should have added an asterisk that this should be supplemented with examples and exercises, like in books

wooden sail Dec 27, 2022, 5:48 PM

#

not bad advice, just needs a little more oomph 😛

odd dagger Dec 27, 2022, 6:03 PM

#

@wooden sail I somewhat agree with you. I felt it was the best channel for me for calculus stuff at least, and I still watch his videos for fun...
Also, everyone's personal opinions differ, right?

wooden sail Dec 27, 2022, 6:10 PM

#

most certainly. my wording might have been harsh, but at the end of the day it's my opinion rather than fact 😛

serene scaffold Dec 27, 2022, 6:26 PM

#

wooden sail most certainly. my wording might have been harsh, but at the end of the day it's...

!otn a edd's opinion is fact

arctic wedgeBOT Dec 27, 2022, 6:26 PM

#

:ok_hand: Added edd’s-opinion-is-fact to the names list.

wooden sail Dec 27, 2022, 6:30 PM

#

lol

odd meteor Dec 27, 2022, 6:40 PM

#

wooden sail i agree, but the way historify and curry chicken recommended it is misleading if...

I saw curry chicken and I immediately thought it's some food condiment brand and stuff like that 😄

Hey, Curry Chicken, cool name mate. Are you from India or Spain?

#

Is anyone here attending ICLR 2023 ?

#

We can do a small PythonDiscord Hangout / Dinner if we're up to 3 people that'll attend ICLR 😎

steady basalt Dec 27, 2022, 7:00 PM

#

ripe sapphire silly question: Is good maths necessary for ML

You should have a decent statistical skill set at minimum

hasty mountain Dec 27, 2022, 8:33 PM

#

@iron basalt ChatGPT might not have all the answers...but he's quite a Socrates.
I was talking to him about my GAN and that my discriminator had its loss stabilized around 0.34, while my Generator loss, in the best case, oscillated between 5 and 8, and he just told me "you could train the generator for more epochs with the discriminator's weights frozen"

#

Like...every piece of GAN code I've seen uses the approach 1 batch -> 1 step for the discriminator -> 1 step for the generator.
But then this reminded me that Goodfellow suggested that one could use more iterations per batch for either the discriminator or the generator. He just used 1 in the paper for convenience. Heh

#

Now I'll see if 5 more iterations for the generator work (it has 5 times more parameters than the discriminator)... using a content loss function to try to avoid mode collapse, of course

#

I was trying to compensate for this by using a higher learning rate for the generator optimizer, but it also didn't work.
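The schedule being discussed, as a framework-agnostic sketch: Algorithm 1 in the 2014 GAN paper runs k discriminator steps per generator step (with k = 1 in their experiments), and the experiment above instead adds extra generator steps. `discriminator_step`/`generator_step` are placeholders for your own TensorFlow update functions, and `train_gan`, `d_steps` and `g_steps` are hypothetical names.

```python
def train_gan(batches, discriminator_step, generator_step, d_steps=1, g_steps=1):
    """Per batch: d_steps discriminator updates, then g_steps generator updates.

    discriminator_step(batch) and generator_step(batch) are placeholders for your own
    update functions and should return the loss they just computed. The generator step
    usually only needs the batch to know how much fresh noise to sample.
    """
    history = []
    for batch in batches:
        for _ in range(d_steps):
            history.append(("D", discriminator_step(batch)))
        for _ in range(g_steps):
            history.append(("G", generator_step(batch)))
    return history

# e.g. 5 generator updates per discriminator update, with dummy step functions
history = train_gan(range(3), lambda batch: 0.0, lambda batch: 1.0, d_steps=1, g_steps=5)
print(len(history))
```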

toxic vault Dec 27, 2022, 8:38 PM

#

question

#

i was wondering if this code would work for a stock trading bot that uses machine learning and stock strategies

serene scaffold Dec 27, 2022, 9:06 PM

#

I have a Bert-for-NER model, named m0, that does 9 classes, so the final layer is Linear(in_features=768, out_features=9, bias=True). And I'm trying to make a copy of that model, m1, that does all of those same classes and three additional ones. So I did this.

```python
m1 = BertForTokenClassification.from_pretrained('./m0.pkl')
linear = nn.Linear(768, len(e1))  # len(e1) == 12
with torch.no_grad():
    linear.weight[:len(e0), :] = m0.classifier.weight.clone().detach()  # len(e0) == 9
    m1.classifier = linear
m1.train().to(cuda)
```

And if I print m1, I can see that indeed, the last layer is (classifier): Linear(in_features=768, out_features=12, bias=True).

But I still end up with this error:

```
Traceback (most recent call last):
  File "/home/farnsworthsw/projects/cont_learning/replicate_addner.py", line 194, in <module>
    optimizer = AdamW(m1.parameters(), lr=1e-5, eps=1e-8)
  File "/home/farnsworthsw/projects/cont_learning/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/farnsworthsw/projects/cont_learning/venv/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 1785, in forward
    loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
RuntimeError: shape '[-1, 9]' is invalid for input of size 7296
```

#

I checked all the layers of both m0 and m1, and I didn't see anything in either that depended on the number of classes except the classifier layer (which is the last one), so I'm not sure why this would suddenly become a problem.

#

I suppose I could try making m1 with nn.Sequential of all the layers of m0 except the last one, and then the new linear one.

#

I hope my question is sufficiently detailed without being too much.
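The traceback points at the likely cause: `forward` reshapes the logits with `self.num_labels`, which `BertForTokenClassification` copies from `config.num_labels` when it is constructed, so swapping in a wider classifier leaves it at 9 (and 7296 = 12 × 608 divides by 12 but not by 9). A hedged sketch of a helper that widens the head and keeps those attributes in sync; `expand_token_classifier` is a hypothetical name, and it also copies the bias, which the snippet above leaves freshly initialized.

```python
import torch
from torch import nn

def expand_token_classifier(model, new_num_labels):
    """Widen model.classifier to new_num_labels outputs, keeping the old rows.

    Assumes model.classifier is an nn.Linear and that forward() reads
    model.num_labels (as BertForTokenClassification does).
    """
    old = model.classifier
    n_old = old.out_features
    new = nn.Linear(old.in_features, new_num_labels, bias=old.bias is not None)
    new = new.to(device=old.weight.device, dtype=old.weight.dtype)
    with torch.no_grad():
        new.weight[:n_old] = old.weight  # keep the learned rows for the old classes
        if old.bias is not None:
            new.bias[:n_old] = old.bias
    model.classifier = new
    # num_labels was fixed at construction time; update it (and the config) too.
    model.num_labels = new_num_labels
    model.config.num_labels = new_num_labels
    return model
```

Usage would be something like `m1 = expand_token_classifier(m1, len(e1))` before building the optimizer. With a Hugging Face config, setting `num_labels` may regenerate `id2label`/`label2id` with generic names when their length doesn't match, so it is worth setting your real label maps afterwards.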

keen notch Dec 27, 2022, 9:14 PM

#

wooden sail can you print the shape of h, k1_x, k1_y, etc all the way until the line where...

omg sorry missed thisss

#

shall i np.shape(h)

#

```python
import numpy as np
import matplotlib.pyplot as plt
# Define functions to compute the right-hand sides of the differential equations
def f_x(x, y, vx, vy):
    return -2 * y**2 * x * (1 - x**2) * np.exp(- (x**2 + y**2))
def f_y(x, y, vx, vy):
    return -2 * x**2 * y * (1 - y**2) * np.exp(- (x**2 + y**2))
def trajectory(impactpar, speed):
    maxtime = 10 / speed
    t = np.linspace(0, maxtime, 300)
    x = impactpar
    y = -2
    vx = 0
    vy = speed
    # Initialize arrays to store the solutions
    x_sol = np.empty(t.shape)
    y_sol = np.empty(t.shape)
    for i, _t in enumerate(t[:-1]):
        h = t[i+1] - _t
        k1_x, k1_y = h * vx, h * vy
        k2_x, k2_y = h * (vx + 0.5 * k1_x), h * (vy + 0.5 * k1_y)
        k3_x, k3_y = h * (vx + 0.5 * k2_x), h * (vy + 0.5 * k2_y)
        k4_x, k4_y = h * (vx + k3_x), h * (vy + k3_y)
        x += (k1_x + 2 * k2_x + 2 * k3_x + k4_x) / 6
        y += (k1_y + 2 * k2_y + 2 * k3_y + k4_y) / 6
        vx = f_x(x, y, vx, vy)
        vy = f_y(x, y, vx, vy)
        x_sol[i+1], y_sol[i+1] = x, y
    return x_sol, y_sol
x_sol, y_sol = trajectory(0.1, 0.1)
# Plot the resulting trajectory
plt.plot(x_sol, y_sol)
plt.xlabel("x")
plt.ylabel("y")
plt.show()

# Solution to part (b)
def scatterangles(allb, speed):
    # Initialize an array to store the scatter angles
    angles = np.empty(allb.shape)

    # Loop over the impact parameter values
    for i, impactpar in enumerate(allb):
        # Solve the differential equations and store the final values of x and y
        _, vy = trajectory(impactpar, speed)
        # Compute the scatter angle
        angles[i] = np.arctan2(vy, 0)
    # Return the array of scatter angles
    return angles
allb = np.arange(-2, 2, 0.001)
angles = scatterangles(allb, 0.1)

# Plot the scatter angles as a function of impact parameter
plt.plot(allb, angles)
plt.xlabel("Impact parameter")
plt.ylabel("Scatter angle")
plt.show()
```
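Looking at the code above, two things probably go wrong. First, `_, vy = trajectory(impactpar, speed)` unpacks the whole `y_sol` array, so `np.arctan2(vy, 0)` is an array of 300 values being stored into the single slot `angles[i]`, which NumPy rejects. Second, the RK4 step never integrates the velocities: the `k` terms for x feed position increments back into `vx`, and `vx = f_x(...)` then overwrites the velocity with the acceleration. (Also, `x_sol[0]` and `y_sol[0]` are never set, so they hold whatever `np.empty` left there.) A hedged rewrite that runs classic RK4 on the full state (x, y, vx, vy) follows. It assumes f_x and f_y are accelerations (they equal -dV/dx and -dV/dy for V = x²y²·exp(-(x²+y²))), and it defines the scatter angle as the angle between the final velocity and the initial +y direction, which may differ from the convention in your assignment.

```python
import numpy as np

def acceleration(x, y):
    """Same right-hand sides as f_x, f_y above (force per unit mass)."""
    e = np.exp(-(x**2 + y**2))
    ax = -2 * y**2 * x * (1 - x**2) * e
    ay = -2 * x**2 * y * (1 - y**2) * e
    return ax, ay

def derivative(state):
    """Time derivative of the state vector (x, y, vx, vy)."""
    x, y, vx, vy = state
    ax, ay = acceleration(x, y)
    return np.array([vx, vy, ax, ay])

def trajectory(impactpar, speed, n_steps=300):
    """Classic RK4 on the first-order system; returns an (n_steps, 4) array of states."""
    t = np.linspace(0, 10 / speed, n_steps)
    states = np.empty((n_steps, 4))
    states[0] = [impactpar, -2.0, 0.0, speed]  # start below the origin, moving in +y
    for i in range(n_steps - 1):
        h = t[i + 1] - t[i]
        s = states[i]
        k1 = derivative(s)
        k2 = derivative(s + 0.5 * h * k1)
        k3 = derivative(s + 0.5 * h * k2)
        k4 = derivative(s + h * k3)
        states[i + 1] = s + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return states

def scatterangles(allb, speed):
    """Angle between the final velocity and the initial +y direction, per impact parameter."""
    angles = np.empty(allb.shape)
    for i, impactpar in enumerate(allb):
        vx_final, vy_final = trajectory(impactpar, speed)[-1, 2:]
        angles[i] = np.arctan2(vx_final, vy_final)  # one scalar per impact parameter
    return angles
```

Plotting works as before with `x, y = trajectory(0.1, 0.1)[:, 0], trajectory(0.1, 0.1)[:, 1]` and `scatterangles(allb, 0.1)`; with 4000 impact parameters the pure-Python loop is slow, so a coarser `allb` grid is handy while debugging.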

hasty mountain Dec 27, 2022, 9:19 PM

#

hasty mountain @iron basalt ChatGPT might not have all the answers...but he's quite a ...

Eeeeh...maybe he was wrong here, too. Again, discriminator loss stabilized at 0.33 and generator loss stabilized around 5.73

#

I hope I'm just being too hasty

keen notch Dec 27, 2022, 9:30 PM

#

so when printing the shape I get this

#

hasty mountain Dec 27, 2022, 11:48 PM

#

hasty mountain Eeeeh...maybe he was wrong here, too. Again, discriminator loss stabilized at 0....

In the end, the correct answer was "try using residual blocks". I was trying to use an architecture similar to SRGAN but without residual blocks, but, apparently, residual blocks are magical. Though I don't quite understand the justification for why they work so well...
I can understand it when you concatenate a residual block to your output, but I don't quite get why it works when you directly sum the residual blocks element-wise.
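One way to see why the element-wise sum helps: a residual block computes y = x + F(x), so its Jacobian is I + dF/dx, and the identity term carries the signal (and the gradient) straight through even when F contributes little. A toy NumPy sketch with small random, untrained weights (all names hypothetical) compares 20 plain ReLU layers with 20 residual ones. Concatenation (DenseNet-style) also keeps an identity path, but grows the channel count at every block, whereas summation keeps the shape fixed.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def plain_stack(x, weights):
    for W in weights:
        x = relu(W @ x)       # each layer has to rebuild the whole signal
    return x

def residual_stack(x, weights):
    for W in weights:
        x = x + relu(W @ x)   # identity path plus a learned correction, summed element-wise
    return x

rng = np.random.default_rng(0)
weights = [0.05 * rng.standard_normal((4, 4)) for _ in range(20)]  # small, untrained
x = np.ones(4)
print(np.linalg.norm(plain_stack(x, weights)))     # collapses towards 0
print(np.linalg.norm(residual_stack(x, weights)))  # stays >= ||x|| = 2 here
```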

idle sequoia Dec 28, 2022, 12:20 AM

#

guys, I have this column in a PySpark DataFrame:

date
2022/1/1
2022/10/2
2022/2/4

and I really need to convert the dates to:

2022/01/01
2022/10/02
2022/02/04

how can I do this with PySpark?

craggy patio Dec 28, 2022, 7:21 AM

#

How do I make an AI that tries making different chords, tests whether the user likes them, and keeps making chords that the user likes?

strong sedge Dec 28, 2022, 8:02 AM

#

can we create an autoencoder for dealing with names?

#

if so can you link me to some paper/article

odd dagger Dec 28, 2022, 10:12 AM

#

anyone good with XPath here?
//*[contains(concat( " ", @class, " " ), concat( " ", "organizationName", " " ))]

Can someone explain to me how this XPath works?

I am trying to crawl data from a website using Scrapy
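The expression selects every element (`//*`) whose `class` attribute contains `organizationName` as a whole, space-separated class name. Padding both sides with spaces is the trick: `" organizationName "` occurs in `" card organizationName big "` but not in `" organizationNameLink "`, whereas a bare `contains(@class, "organizationName")` would match both. The same test in plain Python, as an illustration (the helper name is hypothetical; note that this XPath only treats literal spaces as separators, not tabs or newlines, unless `@class` is wrapped in `normalize-space()`):

```python
def has_class_token(class_attr, token):
    """Python equivalent of the XPath test
    contains(concat(" ", @class, " "), concat(" ", token, " "))."""
    return f" {token} " in f" {class_attr} "

print(has_class_token("card organizationName big", "organizationName"))  # True
print(has_class_token("organizationNameLink", "organizationName"))       # False
print("organizationName" in "organizationNameLink")                      # True: naive substring match
```

In Scrapy, the CSS selector `response.css(".organizationName")` expresses the same whole-class match more readably, since class selectors get translated into essentially this XPath pattern.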

ocean swallow Dec 28, 2022, 10:48 AM

#

is there any service for labeling product images?

#

I used Google Vision and probably if I had an hour I would deploy a service better than that...

#

but mine wouldn't be enough as well

odd mason Dec 28, 2022, 11:58 AM

#

Not sure if this is the right place to ask,
I've been an ML Engineer for 2+ years now at the same company (straight out of college)
I'm considering switching companies soon and was looking for potential project ideas to put on my resume.
Is there any place I can get ideas from? (Which aren't too generic)

#

My resume just has one project at present

patent lynx Dec 28, 2022, 12:45 PM

#

join a Kaggle competition; you might want to form a team with someone you know. Gravitate toward a topic that is relevant to the company you are interested in, or that matches the new job's required skills

#

for example, if you want to be an ML engineer at a real estate company, perhaps you might want to join this: https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques

odd mason Dec 28, 2022, 1:11 PM

#

patent lynx join a kaggle competition, you might want to form teams with someone you know. G...

Wouldn't these be a better fit for a Data Scientist profile than an ML Engineer one?

patent lynx Dec 28, 2022, 1:14 PM

#

odd mason Wouldn't these would be more fit for a Data Scientist rather than ML Engineer pr...

they kinda do, but i think you can develop some kind of ML technique and test it against the top results (other competitors, whose code you can see on Kaggle).

amber lark Dec 28, 2022, 4:27 PM

#

Does someone know how to change the colors so that they won't look so similar?

strange igloo Dec 28, 2022, 4:41 PM

#

You may try something similar to this:

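```python
import matplotlib.pyplot as plt

fig = plt.figure()
axes = fig.add_subplot(projection="3d")

# high_action / low_action: DataFrames with suspense, action and comedy columns
axes.scatter(high_action.suspense, high_action.action, high_action.comedy, c="red", marker="x", s=200)
axes.scatter(low_action.suspense, low_action.action, low_action.comedy, c="blue", marker="o", s=200)

axes.set_title("Sample Movies")
axes.set_xlabel('Suspense')
axes.set_ylabel('Action')
axes.set_zlabel('Comedy')
```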

#

And the full article will be helpful if you are using matplotlib

#

https://medium.com/geekculture/matplotlib-3d-scatterplots-from-zero-to-hero-e7291766a0b2

hasty mountain Dec 28, 2022, 5:09 PM

#

craggy patio How do I make an AI that tries making different chords and tests if the user lik...

Try using a generative model (Diffusion, GAN, Variational AutoEncoder) to make the chords, and take a look at Reinforcement Learning methods to take the user feedback into account.
(I'd recommend asking ChatGPT about PPO... it explains quite clearly how it works. But also consider Temporal Difference Learning.)
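A full generative model plus PPO is a big project. A much simpler stand-in for the "keep playing what the user likes" part is a multi-armed bandit. This is a swapped-in technique, not the PPO/TD methods mentioned above. Treat each candidate chord as an arm, play one, record the user's like/dislike, and favour chords with the best average rating while still exploring now and then. The sketch below assumes a hypothetical fixed chord list and a simulated user:

```python
import random

# Hypothetical chord vocabulary; a generative model could supply these instead.
CHORDS = ["C-E-G", "A-C-E", "F-A-C", "G-B-D", "D-F#-A"]

def learn_preferences(get_feedback, n_rounds=500, epsilon=0.1, seed=0):
    """Epsilon-greedy multi-armed bandit over a fixed set of chords.

    Each round plays one chord, asks get_feedback(chord) for a reward
    (e.g. 1 = liked, 0 = disliked) and updates that chord's running average.
    With probability epsilon it explores a random chord; otherwise it plays
    the chord with the best average so far (untried chords count as best)."""
    rng = random.Random(seed)
    plays = {c: 0 for c in CHORDS}
    reward_sum = {c: 0.0 for c in CHORDS}

    def average(chord):
        return reward_sum[chord] / plays[chord] if plays[chord] else float("inf")

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            chord = rng.choice(CHORDS)
        else:
            chord = max(CHORDS, key=average)
        plays[chord] += 1
        reward_sum[chord] += get_feedback(chord)
    return plays

def simulated_user(chord):
    # Stand-in for real feedback: this user only likes A minor.
    return 1.0 if chord == "A-C-E" else 0.0

plays = learn_preferences(simulated_user)
print(plays)
```

With a real listener, `get_feedback` would be a like/dislike button, and the fixed list could be replaced by chords sampled from a generative model.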

lethal spade Dec 28, 2022, 5:17 PM

#

Am I the only one who thinks that advanced indexing in numpy doesn't follow the principle of least astonishment?

for example

a = np.random.rand(100, 100)

a[(2,4)] #this yields the element at [2,4]
a[[2,4]] #this yields the rows at positions 2 and 4
a[1, (2,4)] #this yields the elements at indices 2 and 4 of row 1 (so it actually does advanced indexing)
a[1, [2,4]] # works the same way as the previous one

Worst of all, it's very easy for someone to make a mistake and not notice it: it seems to me that the first form, a[(2,4)], should not be allowed, and only a[*(2,4)] should work. I checked how this works in Julia (which has a similar syntax), and there a[(2,4)] would yield an error, which makes sense to me. Could it be an idea to deprecate a[(2,4)]-like usage?
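A quick check of the four cases on a matrix whose values encode their position (`a[i, j] == 10 * i + j`), plus two ways to make "select rows using a tuple" explicit. The variable names are just for illustration:

```python
import numpy as np

a = np.arange(100).reshape(10, 10)  # a[i, j] == 10 * i + j, so values show their position

elem = a[(2, 4)]         # identical key to a[2, 4]: the single element 24
rows = a[[2, 4]]         # a list is an index array: rows 2 and 4
cols = a[1, (2, 4)]      # a tuple *inside* the key is treated like a list
cols_too = a[1, [2, 4]]

# Making "select rows using a tuple" explicit:
idx = (2, 4)
rows_from_list = a[list(idx)]  # convert to a list first
rows_from_comma = a[idx,]      # or wrap it: the key becomes ((2, 4),)

print(elem, rows.shape, cols, cols_too)
```

Only the outermost tuple is unpacked into per-axis indices; any sequence nested inside it, tuple or list, becomes an index array.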

agile cobalt Dec 28, 2022, 5:40 PM

#

lethal spade Am I the only one that thinks that Advanced indexing in numpy doesn't follow the...

numpy syntax is not a magical layer on top of python, it uses the features provided by python - when you do arr[1, 2] that 1, 2 is a tuple.
```pycon
>>> class Foo:
...     def __getitem__(self, *something):
...         print(repr(something))
...
>>> foo = Foo()
>>> foo[(1, 2, 3)]
((1, 2, 3),)
>>> foo[1, 2, 3]
((1, 2, 3),)
```

arctic wedgeBOT Dec 28, 2022, 5:42 PM

#

@agile cobalt :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | (1, 2, 3)
002 | (1, 2, 3)

agile cobalt Dec 28, 2022, 5:43 PM

#

!e without the * (had it there for testing)
```py
class Foo:
    def __getitem__(self, item):
        print(repr(item))
arr = Foo()
arr[1, 2, 3]
arr[(1, 2, 3)]
arr[*(1, 2, 3)]
```

arctic wedgeBOT Dec 28, 2022, 5:43 PM

#

@agile cobalt :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | (1, 2, 3)
002 | (1, 2, 3)
003 | (1, 2, 3)

strange igloo Dec 28, 2022, 6:08 PM

#

I have a data frame that looks like this. I used a recursive function to create a lineage of dependencies. The problem is, some of the lineage routes are incomplete, though the data set will include the complete route at some point.

How can I remove the incomplete routes and preserve the complete routes?

Here the red line is incomplete, while the green line is complete.

lapis sequoia Dec 28, 2022, 6:14 PM

#

JAI HIND. I AM FROM INDIA AND LOOKING FORWARD TO BE A DATA SCEINTIST

young granite Dec 28, 2022, 6:33 PM

#

strange igloo I have a data frame that looks like this. I used a recursive function to create ...

what makes the marked rows incomplete?

strange igloo Dec 28, 2022, 6:35 PM

#

So it takes every row of the original data set and then creates that row's chain.

#

but any particular row in the original data set can be in the middle of a chain

young granite Dec 28, 2022, 6:36 PM

#

how do u define whether a row is complete or incomplete then

strange igloo Dec 28, 2022, 6:36 PM

#

It's more like a series of rows

young granite Dec 28, 2022, 6:36 PM

#

to cluster ur data u need to define rules for clustering

strange igloo Dec 28, 2022, 6:37 PM

#

It's like this

row 1
row 2
row 3
row 4

is complete series

row 3
row 4

is incomplete series

young granite Dec 28, 2022, 6:38 PM

#

so different sources ?

#

use something like this in a pandas df:

df.loc[df['column_name'] == some_value]
df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]

or make a boolean df beforehand and filter afterwards.
unfortunately I do not understand your question logic

patent lynx Dec 28, 2022, 6:44 PM

#

I think what he means is that the chain is built from the dependency and the dependent

#

Like delta cool 1 dependent is cool 1

#

Then the delta cool 2 dependency is cool 1

strange igloo Dec 28, 2022, 6:45 PM

#

It’s hard to describe

#

It’s hard to think about lol

#

But that is right

patent lynx Dec 28, 2022, 6:46 PM

#

The chain at indices 9, 10, 11 is incomplete because the dependent of one row and the dependency of the next row are not equal.

#

Like delta cool 3 in index 9 has dependent cool report but delta cool 2 in index 10 has the dependency cool 1. This forms an incomplete chain.

strange igloo Dec 28, 2022, 6:55 PM

#

Yeah, that's right.

Delta cool 3 in index 9 is the end of a chain, so there isn't anything that comes after it.

So there's a correction there on my part, the chain actually begins at index 10, with delta cool 2, but is incomplete

Another way to say it: I only want to preserve chains that begin at the root

patent lynx Dec 28, 2022, 7:04 PM

#

Idk how you would define the root tho

#

Is it the proc name or the dependency?

strange igloo Dec 28, 2022, 7:19 PM

#

the dependency

#

I realized that maybe this isn't ideal, though.

Basically, I'm pulling definitions of stored procedures from a SQL database and then parsing the from and into clauses to find dependencies.

And I envisioned this as a spreadsheet lineage.

Now I realize that preserving only the complete series might be a bad idea, because you might want to search a lineage when the "root" is actually in the middle of a series.
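A sketch of one way to do both: treat each row as an edge from its dependency to its dependent, call a node a root if it never appears as a dependent (matching "the root is the dependency" above), and walk the graph from the roots to list only complete chains. An optional start node lets you trace from the middle of a series instead. The table names below are hypothetical, and with a DataFrame the edges could come from something like `df[["dependency", "dependent"]].itertuples(index=False)` (column names assumed):

```python
from collections import defaultdict

def lineage_chains(edges, start=None):
    """Enumerate dependency chains from (dependency, dependent) pairs.

    With start=None, chains begin only at roots: nodes that appear as a
    dependency but never as a dependent, so fragments that begin mid-chain
    are not returned on their own. Pass start=<node> to trace from any node
    instead. Assumes the lineage has no cycles."""
    children = defaultdict(list)
    dependents = set()
    order = {}  # dict keeps the first-seen order of nodes
    for dependency, dependent in edges:
        children[dependency].append(dependent)
        dependents.add(dependent)
        order.setdefault(dependency, None)
        order.setdefault(dependent, None)

    starts = [start] if start is not None else [n for n in order if n not in dependents]
    chains = []

    def walk(node, path):
        path = path + [node]
        if not children[node]:  # leaf: nothing depends on this node
            chains.append(path)
        for child in children[node]:
            walk(child, path)

    for node in starts:
        walk(node, [])
    return chains

edges = [("raw_sales", "stg_sales"), ("stg_sales", "sales_report"), ("raw_sales", "audit_log")]
print(lineage_chains(edges))                     # complete chains only
print(lineage_chains(edges, start="stg_sales"))  # trace from the middle
```

Building the chains on demand like this avoids having to store and later filter incomplete series.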

#

Thank you for the help @patent lynx - I'm going to go for a walk. Enough hacking for now.

craggy patio Dec 28, 2022, 8:55 PM

#

hasty mountain Try using a Generative model(Diffusion, GAN, Variational AutoEncoder) to make th...

Thank you

serene scaffold Dec 28, 2022, 9:19 PM

#

serene scaffold I have a Bert-for-NER model, named `m0`, that does 9 classes, so the final layer...

turns out my solution was basically right, except that I needed to deconstruct m1's individual layers into a nn.Module subclass.

keen notch Dec 28, 2022, 9:47 PM

#

I have a question: why do all my plots look the same? Are my energy functions wrong?

#

```python
# YOUR CODE HERE
import numpy as np
import math
import matplotlib.pyplot as plt

G=6.6738e-11
M=1.9891e30
m=5.9722e24

def verlet(x0,y0,vx0,vy0,N,paramters=()):
    t = 1/N #timestep
    G = paramters[0]
    M = paramters[1]
    m = paramters[2]
    x=np.zeros((N,2))
    v=np.zeros((N,2))
    x[0]=(x0,y0)
    v[0]=(vx0,vy0)
    for i in range(N-1):
        x[i+1] = x[i] + (v[i] * t)
        f = -G * M * x[i+1] / (np.linalg.norm(x[i+1])**3)
        v[i+1] = v[i] + (f * t)
    return x,v

def solve(par):
    xval,vval = verlet(1.521e11,0,0,2.9291e4,35040,paramters=par)
    return xval,vval

def potentialEnergy(r,par):
    energy = np.zeros(len(r))
    vals = r[:, 1]
    for i in range(len(r)):
        energy[i] = par[2] * par[0] * np.linalg.norm(r[i])
    return energy

def kineticEnergy(v,par):
    energy = np.zeros(len(v))
    vals = v[:, 0]
    for i in range(len(v)):
        energy[i] = 0.5 * par[2] * ((np.linalg.norm(v[i]))**2)
    return energy

xval,vval = verlet(1.521e11,0,0,2.9291e4,35040,paramters=(G,M,m))

pe = potentialEnergy(xval,(G,M,m))
ke = kineticEnergy(vval,(G,M,m))

total = pe+ke

plt.subplot(3, 1, 1)
plt.plot(pe)


plt.subplot(3, 1, 2)
plt.plot(ke)

plt.subplot(3, 1, 3)
plt.plot(total)

plt.show()
```

#

#

happy to show the question if needed

young granite Dec 28, 2022, 9:56 PM

#

provide the question pls @keen notch

keen notch Dec 28, 2022, 9:57 PM

#

yes of course, here!

#

#

can you read it?

young granite Dec 28, 2022, 9:57 PM

#

nah please provide text

#

or paste the markdown in a pastebin

#

!paste

arctic wedgeBOT Dec 28, 2022, 9:59 PM

#

Hey @keen notch!

You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.

keen notch Dec 28, 2022, 10:01 PM

#

https://paste.pythondiscord.com/erohinipef

#

has a few equations so ss again

young granite Dec 28, 2022, 10:17 PM

#

does this seem suited?

#

@keen notch

keen notch Dec 28, 2022, 10:22 PM

#

omgg

#

looks better i thinkk

#

the start scale is squashed

#

tbh not sure how it's supposed to look just knew my graphs looked very wrong😂

young granite Dec 28, 2022, 10:23 PM

#

😄

keen notch Dec 28, 2022, 10:23 PM

#

how did you do it in terms of code

young granite Dec 28, 2022, 10:29 PM

#

wait

#

somethings wrong

keen notch Dec 28, 2022, 10:29 PM

#

ohh ok let me know what's up

young granite Dec 28, 2022, 10:33 PM

#

https://paste.pythondiscord.com/ubazolajad

keen notch Dec 28, 2022, 10:35 PM

#

ooo thank you so what was the issue

#

so i can understand

#

DAMN YOU'RE A GENIUS

#

is there no orange plot?

young granite Dec 28, 2022, 10:38 PM

#

keen notch is there no orange plot?

under the green one

#

u see blue line is by 0

#

so there wont be many changes to the resulting curve

keen notch Dec 28, 2022, 10:39 PM

#

ohhh makes sense

#

what was wrong with what i had previously

young granite Dec 28, 2022, 10:42 PM

#

i would assume its ur np.zeros all time so u get a mismatch somewhere

#

but i didnt check ur "logic"

#

so the iterations u do

#

i would suggest to check em

keen notch Dec 28, 2022, 10:43 PM

#

ahh fair enough so what does this do

keen notch Dec 28, 2022, 10:43 PM

#

young granite i would suggest to check em

good plan

keen notch Dec 28, 2022, 10:43 PM

#

young granite i would assume its ur np.zeros all time so u get a mismatch somewhere

u right acc that makes sense

young granite Dec 28, 2022, 10:44 PM

#

keen notch ahh fair enough so what does this do

which part

keen notch Dec 28, 2022, 10:44 PM

#

the appends

young granite Dec 28, 2022, 10:45 PM

#

store out the values

#

u need to calculate for each step

keen notch Dec 28, 2022, 10:45 PM

#

young granite store out the values

oh right t[-1]?

young granite Dec 28, 2022, 10:45 PM

#

thats why u need to append later and why i think ur np.zeros are resulting in mismatch

keen notch Dec 28, 2022, 10:45 PM

#

young granite thats why u need to append later and why i think ur np.zeros are resulting in mi...

ahhh i seee

#

smart

#

and interesting

#

t[-1]?

young granite Dec 28, 2022, 10:48 PM

#

u want to compare last t value with the tmax

keen notch Dec 28, 2022, 10:48 PM

#

young granite u want to compare last t value with the tmax

ohhhh

young granite Dec 28, 2022, 10:48 PM

#

to iterate only in the range

#

!e

import numpy as np


test = np.array((1,2,3))
print(test[-1])

keen notch Dec 28, 2022, 10:49 PM

#

makes sensee

#

thank you so so much

young granite Dec 28, 2022, 10:49 PM

#

no worries

#

but [-1] is pretty basic stuff

#

u should check the basics first

#

before u attempt such difficult questions?!

keen notch Dec 28, 2022, 10:52 PM

#

young granite before u attend such difficult questions?!

i google haha

#

i should ur right

young granite Dec 28, 2022, 10:52 PM

#

u doing it for university?

keen notch Dec 28, 2022, 10:52 PM

#

working on a course outside uni but yes

young granite Dec 28, 2022, 10:52 PM

#

i would highly recommend to learn python tho

#

its the "soft-skill" to have

keen notch Dec 28, 2022, 10:53 PM

#

young granite i would highly recommend to learn python tho

in summer i will it's more a side hobby atm

keen notch Dec 28, 2022, 10:53 PM

#

young granite its the "soft-skill" to have

for sure!!

young granite Dec 28, 2022, 10:53 PM

#

thats totally fine

#

but maybe ur current class is then too advanced for u

#

its no shame to start small

keen notch Dec 28, 2022, 10:55 PM

#

young granite but maybe ur current class is then too advanced for u

that's true!!!

#

i'm more a C girl

#

i'm just trying to do python questions I've done some basic ones tbf and didn't have problems

young granite Dec 28, 2022, 10:57 PM

#

keen notch i'm just trying to do python questions I've done some basic ones tbf and didn't ...

its a good learning approach

#

however it can be frustrating and difficult

#

im out for today have a great night

#

🦉

keen notch Dec 28, 2022, 11:01 PM

#

young granite im out for today have a great night

have a goodnight too!! and thanks again I'll take your advice

#

one question

#

shouldn't the total remain constant?

robust jungle Dec 29, 2022, 4:28 AM

#

im trying to train a model with labels in the format of integers ranging from 0-2 and am getting this error:

Received a label value of 2 which is outside the valid range of [0, 1).  Label values: 1 2 2 2 2 0 0 0 1 1 2 2 1 1 0 0
     [[{{node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]] [Op:__inference_train_function_73146]

I know that sparse categorical crossentropy is supposed to accept integer-encoded labels rather than one-hot ones, so what am I doing wrong?
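The valid range `[0, 1)` in the error hints that the model's last layer has a single output unit. Sparse categorical cross-entropy uses each integer label as an index into the output vector, so labels 0, 1 and 2 need three outputs, e.g. `tf.keras.layers.Dense(3, activation="softmax")` with `loss="sparse_categorical_crossentropy"`, or `Dense(3)` with `tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)`. That diagnosis is an assumption, since the model code isn't shown. A NumPy sketch of the loss (hypothetical helper name) makes the indexing visible:

```python
import numpy as np

def sparse_categorical_crossentropy(labels, logits):
    """Mean cross-entropy for integer labels and raw logits of shape (batch, n_classes).

    Each label is used as a column index into its row of logits, so every
    label must lie in [0, n_classes)."""
    labels = np.asarray(labels, dtype=int)
    logits = np.asarray(logits, dtype=float)
    n_classes = logits.shape[1]
    if labels.min() < 0 or labels.max() >= n_classes:
        raise ValueError(f"labels must lie in [0, {n_classes}), got max {labels.max()}")
    # log-softmax, shifted by the row maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

labels = [1, 2, 2, 0]
print(sparse_categorical_crossentropy(labels, np.zeros((4, 3))))  # 3 output units: fine
try:
    sparse_categorical_crossentropy(labels, np.zeros((4, 1)))     # 1 output unit
except ValueError as err:
    print(err)
```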

plush jungle Dec 29, 2022, 7:27 AM

#

lol my q learning bot that I'm trying to teach to play a top down shooter game has learned that pygame has lag

#

I don't have a cap on how many bullets it can shoot, and I gave it a penalty for not putting its laser on the target and a reward for being on target, so naturally as it spun around trying to find the target, it realized it could minimize its penalty if it slowed down time

#

so in between every turn action it takes, it fires another bullet

#

now that it's found the target I think it's about to learn that not slowing down time is actually beneficial

dusk tide Dec 29, 2022, 7:31 AM

#

So I was following a Kaggle notebook for training a model on TPU by Kaggle grandmaster Phil Culliton (https://www.kaggle.com/code/philculliton/a-simple-tf-2-1-notebook/notebook). In it he uses the VGG16 model, which takes an input shape of (224,224), but he trains with (192,192) images?? On GPU or CPU it would throw an error. How is that possible??
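My guess, not verified against the notebook, is that VGG16 is built with `include_top=False`. In that case the Keras docs allow an `input_shape` other than (224, 224, 3), as long as there are 3 channels and width and height are at least 32. Convolution weights depend only on the kernel size and the channel counts, so they slide over any image size; only the fully connected "top" (Flatten + Dense) needs a fixed 224×224 input. A pooling layer after the convolutional base then turns any spatial size into a fixed-length vector. A NumPy sketch with hypothetical helper names and one random 3×3 layer, not the real VGG16:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv3x3_valid(image, kernels):
    """'Valid' 3x3 convolution (cross-correlation, as in most DL libraries).

    image: (H, W, C_in); kernels: (3, 3, C_in, C_out) -> (H-2, W-2, C_out).
    The weights do not depend on H or W, only on the channel counts."""
    windows = sliding_window_view(image, (3, 3), axis=(0, 1))  # (H-2, W-2, C_in, 3, 3)
    return np.einsum("hwcij,ijco->hwo", windows, kernels)

def global_average_pool(feature_map):
    """Average over the spatial axes, giving one value per channel."""
    return feature_map.mean(axis=(0, 1))

rng = np.random.default_rng(0)
kernels = rng.normal(size=(3, 3, 3, 8))  # one set of weights, reused for both sizes
results = {}
for size in (224, 192):
    image = rng.random((size, size, 3))
    fmap = conv3x3_valid(image, kernels)
    results[size] = (fmap.shape, global_average_pool(fmap).shape)
    print(size, *results[size])
```

The feature maps differ in size (222×222 vs 190×190), but the pooled vectors have the same length, so a Dense classifier on top works for both.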

keen notch Dec 29, 2022, 12:11 PM

#

hey, does anyone know whether this means my bottom plot is wrong (it should be constant)?

#

fast schooner Dec 29, 2022, 12:22 PM

#

Hi, I hope everyone is having a great Christmas. I'm looking for some advice and think this might be the right channel to ask. I'm about to finish my theoretical physics degree in Barcelona. I've taken some computational courses, plus a Machine Learning course, and I did an ML-related internship in Stockholm (Erasmus). I really enjoyed these topics and was thinking about pursuing a career in data science. I was hoping someone with a similar background could give some advice. Thanks in advance 🙂

fast schooner Dec 29, 2022, 12:24 PM

#

keen notch

Idk without seeing the rest of the plot but ur total energy should be higher than the kinetic energy too

#

sry without seeing the rest of the code

#

Maybe its not properly labelled

keen notch Dec 29, 2022, 12:28 PM

#

I can pastebin the rest of the code

#

with the question?

#

https://paste.pythondiscord.com/ubazolajad

keen notch Dec 29, 2022, 12:31 PM

#

fast schooner Maybe its not properly labelled

it is labelled properly?

keen notch Dec 29, 2022, 12:50 PM

#

the thing I'm unsure about is whether the total has to stay constant

fast schooner Dec 29, 2022, 12:50 PM

#

yes it should

#

Energy is conserved

fast schooner Dec 29, 2022, 12:51 PM

#

fast schooner Idk without seeing the rest of the plot but ur total energy should be higher tha...

Okay dont look at this comment

#

the total energy should be larger in absolute value in this case

keen notch Dec 29, 2022, 12:52 PM

#

not sure what I'm doing wrong

fast schooner Dec 29, 2022, 12:52 PM

#

Oh

#

I think it is right

#

it oscillates because of the method u're using

#

just check b

#

It does oscillate around a constant mean value

#

That can happen when using numerical methods

keen notch Dec 29, 2022, 12:54 PM

#

fast schooner It does oscillate around a constant mean value

wait how so maybe I understand this wrong

keen notch Dec 29, 2022, 12:54 PM

#

fast schooner It does oscillate around a constant mean value

ohh so it's fine

fast schooner Dec 29, 2022, 12:55 PM

#

Smthing like this

#

I think so yea

keen notch Dec 29, 2022, 12:56 PM

#

ohh i see the red line is the mean!

#

how'd you do that

fast schooner Dec 29, 2022, 12:56 PM

#

Paint

keen notch Dec 29, 2022, 12:56 PM

#

haha ohh

fast schooner Dec 29, 2022, 12:56 PM

#

hahahah

keen notch Dec 29, 2022, 12:56 PM

#

i get youu

#

smart thank you so the mean is what they want constant

#

but it oscillates

fast schooner Dec 29, 2022, 12:57 PM

#

but you can get the mean value with np.mean and then plot a constant line if u want

#

yess

keen notch Dec 29, 2022, 12:57 PM

#

ofc i know that function hehe:)

#

yesss so my graphs are all good😎

fast schooner Dec 29, 2022, 12:58 PM

#

Can u plot the potential and kinetic energy

keen notch Dec 29, 2022, 12:58 PM

#

top graph

fast schooner Dec 29, 2022, 12:59 PM

#

yes but i cant see the potential

keen notch Dec 29, 2022, 1:00 PM

#

because it's the sum of both

#

so will cancel

fast schooner Dec 29, 2022, 1:00 PM

#

mmm

#

I think in this case the potential is just on top of ur total energy

keen notch Dec 29, 2022, 1:01 PM

#

is it because the blue line is by 0
so there wont be many changes to the resulting curve (orange)

fast schooner Dec 29, 2022, 1:02 PM

#

fast schooner I think in this case the potential is just on top of ur total energy

So this basically

mystic grotto Dec 29, 2022, 1:02 PM

#

Hey guys, question regarding regression model.
I want to predict a salary based on categorical data such as experience level or job title.
What model would you recommend?
Thx in advance 🙂
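Categorical inputs have to be encoded before most regression models can use them. A common baseline is one-hot encoding plus linear regression. Tree ensembles such as random forests or gradient boosting are a frequent next step when categories interact. In scikit-learn this is usually a `OneHotEncoder` + `LinearRegression` pipeline. To show the mechanics without extra libraries, here is a NumPy-only sketch on a made-up four-row table; the categories, salaries and helper name are all hypothetical:

```python
import numpy as np

def one_hot_design(rows, categories):
    """Build a design matrix: an intercept column plus drop-first one-hot columns.

    rows: sequence of tuples, one value per categorical feature.
    categories: one list of known values per feature; the first value in each
    list is the baseline and gets no column of its own."""
    X = []
    for row in rows:
        features = [1.0]  # intercept
        for value, known in zip(row, categories):
            features.extend(1.0 if value == c else 0.0 for c in known[1:])
        X.append(features)
    return np.array(X)

# Hypothetical toy data: (experience level, job title) -> salary in thousands.
categories = [["junior", "senior"], ["analyst", "engineer"]]
rows = [("junior", "analyst"), ("junior", "engineer"),
        ("senior", "analyst"), ("senior", "engineer")]
salaries = np.array([50.0, 70.0, 80.0, 100.0])

coef, *_ = np.linalg.lstsq(one_hot_design(rows, categories), salaries, rcond=None)
prediction = one_hot_design([("senior", "engineer")], categories) @ coef
print(coef, prediction)
```

Each coefficient reads as "salary difference versus the baseline category". Note that an unseen category silently maps to the baseline here; `OneHotEncoder(handle_unknown="ignore")` behaves similarly.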

keen notch Dec 29, 2022, 1:03 PM

#

fast schooner So this basically

so we all good:))

fast schooner Dec 29, 2022, 1:03 PM

#

Im not sure

#

hahahahahah

#

If u could plot only the potential

#

im assuming it will look just like the total energy

keen notch Dec 29, 2022, 1:06 PM

#

fast schooner im assuming it will look just like the total energy

that might be right

#

hmm i'll think

#

but thank you for your help!:)

fast schooner Dec 29, 2022, 1:06 PM

#

np

#

It still doesnt look quite okay

#

But think about it I can't help rn maybe at night

keen notch Dec 29, 2022, 1:08 PM

#

ahhh no a is wrong total should be constant

keen notch Dec 29, 2022, 1:09 PM

#

fast schooner But think about it I can't help rn maybe at night

it's okee

fast schooner Dec 29, 2022, 1:13 PM

#

yes there is something wrong

#

If I have time tonight I'll take a look

keen notch Dec 29, 2022, 1:14 PM

#

yeah there is I'll think about it when I'm back home as well

#

might be my equations
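Going by the code alone (the exercise text is only in the paste), two things stand out. First, `t = 1/N` makes each step 1/35040 s, so the whole run covers about one second and the Earth barely moves, which would explain the flat-looking plots. Second, `potentialEnergy` computes `m*G*|r|`, whereas gravitational potential energy is `-G*M*m/|r|`. Below is a sketch, not the exercise's reference solution, that uses velocity Verlet with an assumed 15-minute step over one year (35040 = 365 × 96 hints at that). It also uses the append/`t[-1]` loop discussed above:

```python
import numpy as np

G = 6.6738e-11   # gravitational constant [m^3 kg^-1 s^-2]
M = 1.9891e30    # mass of the Sun [kg]
m = 5.9722e24    # mass of the Earth [kg]

def acceleration(r):
    """Acceleration of the Earth towards a Sun fixed at the origin."""
    return -G * M * r / np.linalg.norm(r) ** 3

def velocity_verlet(r0, v0, h, t_max):
    """Velocity Verlet, appending one state per step until t[-1] reaches t_max."""
    t = [0.0]
    r = [np.array(r0, dtype=float)]
    v = [np.array(v0, dtype=float)]
    a = acceleration(r[-1])
    while t[-1] < t_max:
        r_new = r[-1] + h * v[-1] + 0.5 * h**2 * a
        a_new = acceleration(r_new)
        v_new = v[-1] + 0.5 * h * (a + a_new)
        r.append(r_new)
        v.append(v_new)
        t.append(t[-1] + h)
        a = a_new
    return np.array(t), np.array(r), np.array(v)

def potential_energy(r):
    # U = -G M m / |r|, evaluated for every row of r
    return -G * M * m / np.linalg.norm(r, axis=1)

def kinetic_energy(v):
    # K = m |v|^2 / 2, evaluated for every row of v
    return 0.5 * m * np.sum(v**2, axis=1)

h = 900.0                    # assumed step: 15 minutes
t_max = 365 * 24 * 3600.0    # assumed span: one year (35040 steps of 15 min)
t, r, v = velocity_verlet((1.521e11, 0.0), (0.0, 2.9291e4), h, t_max)

pe = potential_energy(r)
ke = kinetic_energy(v)
total = pe + ke
drift = np.max(np.abs(total - total[0])) / abs(total[0])
print(f"largest relative change in total energy: {drift:.1e}")
```

Plotting `total` against `t` should show a tiny oscillation around a constant, negative value, which is the expected behaviour of a symplectic integrator. `plt.axhline(np.mean(total), color="red")` adds the mean line.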

fading zealot Dec 29, 2022, 2:31 PM

#

does anyone know feature engineering ?

charred light Dec 29, 2022, 2:45 PM

#

fading zealot does anyone know feature engineering ?

That's somewhat of a vague question. There are a lot of parts to feature engineering. What do you need to know?

fading zealot Dec 29, 2022, 2:46 PM

#

I have 4 datasets

#

📎 Global_Temperatures.csv 📎 ElectricCarPrices.csv 📎 CO2_emission.csv 📎 Air_Polution.csv

#

I want to apply feature engineering using these four datasets with a hypothesis question

charred light Dec 29, 2022, 2:50 PM

#

fading zealot I want to apply feature engineering using these four dataset with an hypothesis ...

So what's the goal? Are you trying to first merge these datasets together? By country and then by year?

Saying "apply feature engineering" is very generic. It would be like: I want to apply math to this math problem. Do you want to do integrals? Algebra?

fading zealot Dec 29, 2022, 2:52 PM

#

apply a model and predict something

#

Do you mind if we get on a call so I can explain everything to you @charred light

#

thinking about applying linear regression model

charred light Dec 29, 2022, 2:58 PM

#

fading zealot apply a model and predict something

Then you would want to start by merging these datasets together. You can use Pandas and its pd.merge function.

I don't have time to commit to a call, nor do I really want to do that either.
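A minimal sketch of that merge step. It assumes each CSV has `Country` and `Year` columns (hypothetical names; the real files may label them differently, and the EV price data may have no country column at all). `functools.reduce` chains `pd.merge` over a list of frames, and `how="inner"` keeps only country/year pairs present in every table:

```python
from functools import reduce

import pandas as pd

# Toy stand-ins for the CSVs (hypothetical columns and values, not the real data).
temps = pd.DataFrame({"Country": ["A", "A", "B"], "Year": [2019, 2020, 2020],
                      "AvgTemp": [14.1, 14.3, 9.8]})
co2 = pd.DataFrame({"Country": ["A", "A", "B"], "Year": [2019, 2020, 2020],
                    "CO2": [5.0, 4.8, 7.2]})
air = pd.DataFrame({"Country": ["A", "B"], "Year": [2020, 2020],
                    "PM25": [12.0, 20.5]})

# Merge the frames pairwise on the shared keys; "inner" keeps only
# country/year combinations that appear in every frame.
merged = reduce(
    lambda left, right: pd.merge(left, right, on=["Country", "Year"], how="inner"),
    [temps, co2, air],
)
print(merged)
```

Because the join is inner, the merged table only covers the years that all datasets share.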

fading zealot Dec 29, 2022, 2:59 PM

#

I do really need help @charred light though

charred light Dec 29, 2022, 3:00 PM

#

And text is just fine to do that.

rancid sorrel Dec 29, 2022, 3:00 PM

#

you should convert the data over to a common format using sklearn, unable it and then throw it in your models

#

first thing is to compare the types of data using pandas.

#

df1.dtypes

#

for example

fading zealot Dec 29, 2022, 3:02 PM

#

can you explain using a zoom call

rancid sorrel Dec 29, 2022, 3:02 PM

#

lol not zoom no

fading zealot Dec 29, 2022, 3:03 PM

#

oops we do have discord

fading zealot Dec 29, 2022, 3:03 PM

#

rancid sorrel df1.dtypes

I did that

open storm Dec 29, 2022, 3:03 PM

#

https://pastebin.com/LJaFfFsE
Can someone please help? This is a deep learning problem. I trained a gesture learning model on 225x225 pixel images using Keras and neural networks, and saved the model to an h5 file. Above is the code I want to use for detecting gestures. However, when I show my hand in front of the camera, it shuts down right away with this error:

ValueError: Input 0 of layer "sequential_4" is incompatible with the layer: expected shape=(None, 224, 224, 3), found shape=(None, 21, 2)

I am fine with detecting my hand within a region of interest. But what else can I do to fix this problem and get it up and running?
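The error says the model expects a batch of images of shape (224, 224, 3), but it receives shape (21, 2). That looks like the 21 MediaPipe hand landmarks (x, y each) rather than pixels. This is an assumption, since the pastebin code isn't visible here. One way around it is to use the landmarks only to locate the hand, then crop that region from the frame, resize it to 224×224 and add a batch axis. A NumPy-only sketch follows (hypothetical helper name). In practice `cv2.resize` would do the resizing, and `cv2.cvtColor` the BGR to RGB conversion, since OpenCV frames are BGR:

```python
import numpy as np

def hand_roi_batch(frame, landmarks_xy, size=224, margin=0.1):
    """Crop the hand from a frame and return a batch of shape (1, size, size, 3).

    frame: uint8 array (H, W, 3), e.g. a webcam frame.
    landmarks_xy: (21, 2) array of x, y coordinates normalised to [0, 1]
    (the convention MediaPipe Hands uses for landmark.x / landmark.y)."""
    h, w = frame.shape[:2]
    pts = np.asarray(landmarks_xy, dtype=float)
    x0, y0 = pts.min(axis=0) - margin
    x1, y1 = pts.max(axis=0) + margin
    # pixel bounds of the box, clipped to the frame and at least 1 px wide/tall
    left = min(int(max(x0, 0.0) * w), w - 1)
    top = min(int(max(y0, 0.0) * h), h - 1)
    right = max(int(min(x1, 1.0) * w), left + 1)
    bottom = max(int(min(y1, 1.0) * h), top + 1)
    crop = frame[top:bottom, left:right]
    # nearest-neighbour resize via index arrays (cv2.resize would normally do this)
    rows = np.linspace(0, crop.shape[0] - 1, size).round().astype(int)
    cols = np.linspace(0, crop.shape[1] - 1, size).round().astype(int)
    resized = crop[rows][:, cols]
    # ASSUMPTION: the model was trained on images scaled to [0, 1];
    # use whatever preprocessing the training pipeline actually used.
    return (resized.astype(np.float32) / 255.0)[np.newaxis, ...]

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
landmarks = rng.uniform(0.3, 0.6, size=(21, 2))
batch = hand_roi_batch(frame, landmarks)
print(batch.shape, batch.dtype)
```

The resulting `batch` has the shape the model's error message asks for and could be passed to `model.predict`.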

rancid sorrel Dec 29, 2022, 3:04 PM

#

also anyone know why this isnt working

```
l1.dtypes
bare_nuclei                    object
----> 1 l1["bare_nuclei"] = pd.to_numeric(l1["bare_nuclei"])

TypeError: 'method' object is not subscriptable
l1["bare_nuclei"] = pd.to_numeric(l1["bare_nuclei"])
```

charred light Dec 29, 2022, 3:08 PM

#

rancid sorrel also anyone know why this isnt working ``` l1.dytpes bare_nuclei ...

The proper use case is just pd.to_numeric(l1["bare_nuclei"]). You don't need to set it back to the dataframe.

rancid sorrel Dec 29, 2022, 3:08 PM

#

well i kinda need to shove it there for all the stuff i am doing later

#

? does pd.to_numeric actually convert the data?

#

@fading zealot https://www.statology.org/pandas-merge-multiple-dataframes/

#

give that a read

charred light Dec 29, 2022, 3:09 PM

#

Yes, it applies similarly to the inplace flag in other pandas functions

rancid sorrel Dec 29, 2022, 3:09 PM

#

also @charred light just using pd.to_numeric(l1["bare_nuclei"]) has same error

charred light Dec 29, 2022, 3:10 PM

#

rancid sorrel also @charred light just using `pd.to_numeric(l1["bare_nuclei"])` has sam...

Do a type(pd.to_numeric(l1["bare_nuclei"])) and see what that returns. Also, check your dataframe. type(l1["bare_nuclei"])

fading zealot Dec 29, 2022, 3:11 PM

#

@rancid sorrel Do you want to help me later?

rancid sorrel Dec 29, 2022, 3:12 PM

#

```
ValueError                                Traceback (most recent call last)
File ~/.local/lib/python3.9/site-packages/pandas/_libs/lib.pyx:2363, in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string "?"

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[4], line 1
----> 1 type(pd.to_numeric(l1["bare_nuclei"]))
      2 #pd.to_numeric(l1["bare_nuclei"])

File ~/.local/lib/python3.9/site-packages/pandas/core/tools/numeric.py:185, in to_numeric(arg, errors, downcast)
    183 coerce_numeric = errors not in ("ignore", "raise")
    184 try:
--> 185     values, _ = lib.maybe_convert_numeric(
    186         values, set(), coerce_numeric=coerce_numeric
    187     )
    188 except (ValueError, TypeError):
    189     if errors == "raise":

File ~/.local/lib/python3.9/site-packages/pandas/_libs/lib.pyx:2405, in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string "?" at position 23
```

#

the pandas.core.series.Series is the type(l1["bare_nuclei"])

charred light Dec 29, 2022, 3:13 PM

#

Ah, you'll need the errors flag

#

You have a "?" in one of your rows, which is causing the error.
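Putting the two hints together: `errors="coerce"` turns entries that can't be parsed (like the "?" values) into NaN. Also note that `pd.to_numeric` returns a new Series and has no `inplace` option, so the `l1["bare_nuclei"] = ...` assignment is what actually stores the converted column. A sketch on a toy stand-in for `l1`:

```python
import pandas as pd

# Toy stand-in for l1; the real column comes from the dataset.
l1 = pd.DataFrame({"bare_nuclei": ["1", "10", "?", "3"]})

# to_numeric returns a NEW Series and leaves l1 untouched, so assign it back.
# errors="coerce" turns unparseable entries such as "?" into NaN.
l1["bare_nuclei"] = pd.to_numeric(l1["bare_nuclei"], errors="coerce")
print(l1.dtypes)
print(l1["bare_nuclei"].isna().sum())  # how many "?" entries there were
```

Alternatively, `pd.read_csv(..., na_values="?")` marks the "?" entries as missing while loading, so the column is numeric from the start.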

#

rancid sorrel Dec 29, 2022, 3:14 PM

#

oh bugger yeah that would do it sorry this data is supposed to be sanitized already

#

time to go throw some Clorox at it

charred light Dec 29, 2022, 3:14 PM

#

Welcome to DS, the data is never clean (No matter what the data team tells you).

rancid sorrel Dec 29, 2022, 3:15 PM

#

yeah 16 ? in the data

#

thank god the dataset is small enough i can just open it in vs code

charred light Dec 29, 2022, 3:16 PM

#

That's probably why it's being read in as an object too.

rancid sorrel Dec 29, 2022, 3:16 PM

#

do we have a crap in crap out emoji?

charred light Dec 29, 2022, 3:16 PM

#

It should have defaulted to an int/float.

rancid sorrel Dec 29, 2022, 3:17 PM

#

yeah i would have expected that, honestly i am final year CS this is my first time dealing with Data Science

#

💩 in --> 💩 out

fading zealot Dec 29, 2022, 3:17 PM

#

@charred light in order to predict something from a dataset what are the steps we need to take into consideration?

#

apply models and predict the accuracy ?

rancid sorrel Dec 29, 2022, 3:17 PM

#

you need to do an EDA

#

first

#

the Sweetviz library will do most of that for you

fading zealot Dec 29, 2022, 3:18 PM

#

Exploratory data analysis and then apply the model to predict?

rancid sorrel Dec 29, 2022, 3:19 PM

#

```python
# Without a scaling function for the models
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier


def models(X_train, Y_train):
    # Logistic Regression
    log = LogisticRegression(random_state=0)
    log.fit(X_train, Y_train)

    # Decision Tree (must be fitted before .score() can be called)
    dtc = DecisionTreeClassifier(criterion='entropy', random_state=0)
    dtc.fit(X_train, Y_train)

    # Random Forest Classifier
    forest = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
    forest.fit(X_train, Y_train)

    # Print model accuracy on the training data.
    print('[0]Logistic Regression Training Accuracy:', log.score(X_train, Y_train))
    print('[1]Decision Tree Classifier Training Accuracy:', dtc.score(X_train, Y_train))
    print('[2]Random Forest Classifier Training Accuracy:', forest.score(X_train, Y_train))

    return log, dtc, forest
```

#

but thats a decent load of code to run all the models

charred light Dec 29, 2022, 3:19 PM

#

fading zealot @charred light in order to predict something from a dataset what are the ...

Normally the process for modeling is as follows:

Feature engineering (Merge datasets, clean data, apply transformations) -> EDA (See relevant columns to be used in model) -> Modeling -> Check results + finetune if needed

fading zealot Dec 29, 2022, 3:20 PM

#

@charred light any youtube video to help me ?

fading zealot Dec 29, 2022, 3:20 PM

#

charred light Normally the process for modeling is as follows: **Feature engineering** (Merge...

thanks for this

rancid sorrel Dec 29, 2022, 3:20 PM

#

@fading zealot freecodecamp.org

fading zealot Dec 29, 2022, 3:21 PM

#

@rancid sorrel can we use Linear regression ?

rancid sorrel Dec 29, 2022, 3:21 PM

#

https://www.youtube.com/watch?v=r-uOLxNrNk8

#

@fading zealot you test the prediction accuracy after

#

so use sklearn to split the data into train and test, then you apply the metrics used to test each model

#

once you have your models scored you can then choose the correct one
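A minimal sketch of that split-then-score loop with scikit-learn, on synthetic data invented for illustration (y = 3x + 2 plus noise). For a regression model, "accuracy" isn't defined; R² or mean absolute error on the held-out rows are the usual scores, while classifiers use e.g. `accuracy_score`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only: y = 3x + 2 plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 1, size=200)

# Hold out 20% of the rows; the model never sees them during fitting.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print("R^2 on test set:", r2)
print("MAE on test set:", mae)
```

Repeating the last four lines for each candidate model on the same split gives comparable scores to choose from.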

fading zealot Dec 29, 2022, 3:22 PM

#

Oh k

#

I wish to learn that slowly @rancid sorrel with focus using different data sets from kaggle

charred light Dec 29, 2022, 3:23 PM

#

fading zealot @charred light any youtube video to help me ?

https://www.youtube.com/results?search_query=linear+regression+python+sklearn is a good point to start.
e.g. https://www.youtube.com/watch?v=b0L47BeklTE

But you'll also need to know how to merge datasets first. https://www.youtube.com/watch?v=h4hOPGo4UVU

fading zealot Dec 29, 2022, 3:24 PM

#

@charred light looking at the dataset I shared with you

#

what do you think would be a good hypothesis question?

rancid sorrel Dec 29, 2022, 3:25 PM

#

@charred light my uni skipped all the data sanitation and cleaning and just threw us at ANNs and CNNs and was like "lol enjoy the deep end, here is a rock to help you float"

#

but yeah if you're learning i recommend you take notes in Markdown with something like Obsidian (obsidian.md) to collate all your notes into a vault (it doesn't work well with Jupyter however)

#

at least not yet

charred light Dec 29, 2022, 3:27 PM

#

fading zealot what do you think what shall be a good hypothesis question

One could be: "Do CO2 emissions cause global temperatures to rise? (Spoiler: yes, they do)"
Another could be: "Do EV prices have an effect on CO2 emissions?"

fading zealot Dec 29, 2022, 3:27 PM

#

Oh merge is so easy, it is just combining two dataframes and passing on the query using an attribute

charred light Dec 29, 2022, 3:28 PM

#

rancid sorrel @charred light my uni skipped al the data sanitation and cleaning and jus...

The uni course probably has data cleaning/processing as prerequisite. At least that's what I would assume.

fading zealot Dec 29, 2022, 3:28 PM

#

The problem at the moment is the term period I have .. I need to finish off with this asap

rancid sorrel Dec 29, 2022, 3:29 PM

#

yeah, I've got 10 days for mine to be in too, and I'm on holiday abroad 🙂

#

i feel you

charred light Dec 29, 2022, 3:29 PM

#

You'll probably be using 1978 onward for global temps, and be limited to 2010 onward for EVs.

fading zealot Dec 29, 2022, 3:29 PM

#

@charred light you seem an expert in this field

charred light Dec 29, 2022, 3:30 PM

#

No expert here, just working as a data scientist. There's a lot to this field.

rancid sorrel Dec 29, 2022, 3:31 PM

#

honestly I hope to reach the level of skill you have one day, skyglow

fading zealot Dec 29, 2022, 3:31 PM

#

@charred light to be honest, all I want right now is for you to explain it using a Jupyter notebook and code

#

and then guide me how to be a good data scientist

#

I am willing to learn and make it happen

rancid sorrel Dec 29, 2022, 3:31 PM

#

well, not that you have the time for all of it, but I think you need to go back to basics

#

freeCodeCamp > CS50
intro to GitHub basics

fading zealot Dec 29, 2022, 3:32 PM

#

just 9 hours left to finish the project

#

that's the problem

rancid sorrel Dec 29, 2022, 3:32 PM

#

yeah this is like 2 weeks of time to watch

#

yeah I know what you mean, well basically you're kinda boned

fading zealot Dec 29, 2022, 3:32 PM

#

ya

#

If you or skyglow will help me with this project

#

that would be great

#

and then make a schedule or plan what to learn and how to be a good scientist

rancid sorrel Dec 29, 2022, 3:34 PM

#

start by following skyglow's recommendations and merging the datasets

fading zealot Dec 29, 2022, 3:34 PM

#

okay i will do that

#

@rancid sorrel can you stay online here

#

@charred light please do stay. I might need help

#

As a data scientist, please let me know the path to becoming a good data scientist

#

@rancid sorrel do we need to find the missing values as well?

rancid sorrel Dec 29, 2022, 3:38 PM

#

you can, but assuming (like me) you're not dealing with 8 PB datasets

#

you can just get the datasets into the correct shape, then deal with the sanitization

#

so shape > sanitize > merge
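Finding the missing values can be as simple as counting them per column before you merge. A sketch with a hypothetical frame (column names and values are placeholders):

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for one of the datasets
df = pd.DataFrame({
    "Year": [2018, 2019, 2020, 2021],
    "sales": [10.0, np.nan, 30.0, 40.0],
    "price": [np.nan, np.nan, 5.0, 6.0],
})

# Number of missing values per column
missing_counts = df.isna().sum()
print(missing_counts)

# Fraction of missing values per column, handy for deciding whether to drop or fill
missing_share = df.isna().mean()
print(missing_share)
```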

fading zealot Dec 29, 2022, 3:39 PM

#

ok

#

@charred light this was my hypothesis question

#

‘Will the increased usage of electric vehicles aid in decreasing CO2 emissions, therefore leading to reduced global warming?’

rancid sorrel Dec 29, 2022, 3:41 PM

#

no it won't

#

but that's a different issue

fading zealot Dec 29, 2022, 3:42 PM

#

okay

rancid sorrel Dec 29, 2022, 3:43 PM

#

you're dealing with the demand side of the equation, not the supply side, and that's the main issue with EVs

charred light Dec 29, 2022, 3:43 PM

#

fading zealot ‘Will the increased usage of electric vehicles aid in decreasing CO2 emissions, ...

You can use that hypothesis. I'm not sure if you can see a significant impact over 12 years worth of data though.

rancid sorrel Dec 29, 2022, 3:43 PM

#

honestly we should all be using H2 from biofuel with carbon capture at conversion, but that's just my opinion

fading zealot Dec 29, 2022, 3:44 PM

#

rancid sorrel you're dealing with the demand side of the equation, not the supply side, and that's...

true

charred light Dec 29, 2022, 3:44 PM

#

Or have better public transportation (for the US).

rancid sorrel Dec 29, 2022, 3:44 PM

#

my general point is that your analysis has some major survivorship bias with that dataset

fading zealot Dec 29, 2022, 3:44 PM

#

what would be a unique hypothesis question?

fading zealot Dec 29, 2022, 3:45 PM

#

charred light One could be: "Do CO2 emissions cause global temperatures to rise? (Spoilers: ...

this one ?

rancid sorrel Dec 29, 2022, 3:46 PM

#

honestly I'd compare the amount of EV adoption against the adoption of renewable energy generation

#

and see how much energy you're wasting, and analyse the supply-side shortfall compared to gas

#

if you can choose your datasets, you're better off using non-US data,

#

because US emissions data is a crapshoot (imo)

charred light Dec 29, 2022, 3:47 PM

#

fading zealot

These are the datasets that were provided.

fading zealot Dec 29, 2022, 3:47 PM

#

yes

rancid sorrel Dec 29, 2022, 3:48 PM

#

fair enough, just plow through it then 🙂

fading zealot Dec 29, 2022, 3:48 PM

#

but we were told we can use external data as well

rancid sorrel Dec 29, 2022, 3:48 PM

#

sorry for going off; data science rule no. 1 is cynicism

#

EU/UK has good data on this

#

as they use the European emission standards, and the EU also has a satellite that tracks it

#

the UK is good for EV adoption, as its data is centrally available and accurate due to the regulations

fading zealot Dec 29, 2022, 3:49 PM

#

@rancid sorrel how long does it take to get the code done?

rancid sorrel Dec 29, 2022, 3:50 PM

#

https://www.gov.uk/government/statistical-data-sets/vehicle-licensing-statistics-data-tables


charred light Dec 29, 2022, 3:51 PM

#

Yea, US has something like that from DOT (Department of Transportation). Although, most of it is null lol.

rancid sorrel Dec 29, 2022, 3:52 PM

#

you have to have all your info up to date in the UK

#

or you're going to jail, there are automatic number plate readers everywhere

#

so the DVLA (DMV) has a complete dataset

fading zealot Dec 29, 2022, 3:52 PM

#

@charred light how long will it take for you to get done with the feature engineering and the EDA with the datasets provided to you?

rancid sorrel Dec 29, 2022, 3:53 PM

#

@fading zealot EDA you use sweetviz

#

and it will do it for you in about 1 line

fading zealot Dec 29, 2022, 3:53 PM

#

ok

rancid sorrel Dec 29, 2022, 3:53 PM

#

import sweetviz as sv

analysis = sv.analyze(l1, target_feat='class1')  # l1 is your DataFrame, 'class1' its target column
analysis.show_html('EDA-Sweetviz2.html', open_browser=False)

fading zealot Dec 29, 2022, 3:54 PM

#

i am installing

arctic wedgeBOT Dec 29, 2022, 3:55 PM

#

Hey @rancid sorrel!

It looks like you tried to attach file type(s) that we do not allow (.html). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

rancid sorrel Dec 29, 2022, 3:55 PM

#

well that's sensible

#

but yeah it creates an HTML report

fading zealot Dec 29, 2022, 3:57 PM

#

@rancid sorrel @charred light please stay

#

@charred light what to do after merging

charred light Dec 29, 2022, 4:01 PM

#

fading zealot @charred light what to do after merging

Look at the relevant columns, and then prep the data for modeling.

fading zealot Dec 29, 2022, 4:01 PM

#

which datasets do you want me to use for the hypothesis question?

#

co2 emission and ev datasets ?

charred light Dec 29, 2022, 4:03 PM

#

If you merged correctly, you should have 1 dataset with all the columns.

fading zealot Dec 29, 2022, 4:03 PM

#

I know that.

#

but which datasets would you recommend me to merge

#

that's what I am asking

rancid sorrel Dec 29, 2022, 4:04 PM

#

All of them

tawny turtle Dec 29, 2022, 4:04 PM

#

why are my packages not showing?

fading zealot Dec 29, 2022, 4:05 PM

#

rancid sorrel All of them

okay

#

I might be annoying you both, but I only have 8 hours left to finish it @rancid sorrel @charred light. Apologies.

#

@charred light can you do the feature engineering for me ?

#

in a Jupyter notebook

agile cobalt Dec 29, 2022, 4:12 PM

#

fading zealot I might be annoying you both, but I only have 8 hours left to finish i...

finish what exactly?

charred light Dec 29, 2022, 4:12 PM

#

agile cobalt finish what exactly?

Their homework lol

fading zealot Dec 29, 2022, 4:13 PM

#

charred light Normally the process for modeling is as follows: **Feature engineering** (Merge...

this

agile cobalt Dec 29, 2022, 4:13 PM

#

fading zealot @charred light can you do the feature engineering for me ?

so yeah, no - that would be overstepping by quite a long margin.

fading zealot Dec 29, 2022, 4:13 PM

#

charred light Their homework lol

it's just because I don't understand it right now and I have no time left

serene scaffold Dec 29, 2022, 4:13 PM

#

fading zealot I might be annoying you both, but I only have 8 hours left to finish i...

please don't ping people asking them to answer questions for you. everyone is a volunteer and no one is on-call to provide help.

fading zealot Dec 29, 2022, 4:13 PM

#

I know

agile cobalt Dec 29, 2022, 4:14 PM

#

fading zealot it's just because I don't understand it right now and I have no time left

get a zero and study for the next time then

serene scaffold Dec 29, 2022, 4:14 PM

#

fading zealot I know

then why did you do it?

fading zealot Dec 29, 2022, 4:14 PM

#

Because I am stuck at the moment

serene scaffold Dec 29, 2022, 4:14 PM

#

I don't know that we'll be able to help you become unstuck before the assignment is due.

fading zealot Dec 29, 2022, 4:14 PM

#

I am not a frequent user asking for help

#

it's just because I'm stuck, and with Christmas going on I wasn't in a state to get everything done

agile cobalt Dec 29, 2022, 4:16 PM

#

we and your teacher(s) would rather have you ask for help often to learn things when you're supposed to than ask for help only when given homework, and ask for everything then

fading zealot Dec 29, 2022, 4:16 PM

#

please don't preach at me..

#

we had four assignments in the span of 2 weeks

rancid sorrel Dec 29, 2022, 4:17 PM

#

I'm in the same situation and literally dealing with YouTube rn. btw, do we have any rep?

serene scaffold Dec 29, 2022, 4:17 PM

#

fading zealot please don't preach at me..

you can ask a specific question that doesn't require catching up on all the context of what you're trying to do, or you can seek help elsewhere.

fading zealot Dec 29, 2022, 4:17 PM

#

Moreover, I'm the project manager, and all I'm stuck on is this code, that's all

rancid sorrel Dec 29, 2022, 4:17 PM

#

I would like to thank @charred light for helping me with my particular problem

serene scaffold Dec 29, 2022, 4:18 PM

#

fading zealot Moreover, I'm the project manager, and all I'm stuck on is this code, that's all

I'm sorry that you're struggling. but all this extra information about your personal situation isn't relevant.

fading zealot Dec 29, 2022, 4:18 PM

#

@serene scaffold I know what you are trying to explain

#

I'm not dumb enough to come here just to get my assignment done for me

serene scaffold Dec 29, 2022, 4:19 PM

#

fading zealot @serene scaffold I know what you are trying to explain

If you want help, please make sure that your next message is a stand-alone explanation of what the problem is, with relevant code samples as needed.

fading zealot Dec 29, 2022, 4:20 PM

#

@serene scaffold Seriously, brother. It's not easy to understand what other people are going through

serene scaffold Dec 29, 2022, 4:21 PM

#

fading zealot @serene scaffold Seriously, brother. It's not easy to understand what other peop...

This isn't going anywhere. If you're not going to ask your question in a way where people can provide you with meaningful assistance (and without doing it for you), we're just going to keep talking in circles.

fading zealot Dec 29, 2022, 4:21 PM

#

when there are 20 family members in your house and you want to study, it is next to impossible to get everything done

#

Ok

serene scaffold Dec 29, 2022, 4:22 PM

#

No more discussion of your personal situation in this channel. You can ask a question if you want. "Please do my homework for me", as we've discussed, does not count.

fading zealot Dec 29, 2022, 4:22 PM

#

I never said "do my homework for me"

#

Check if you want

#

Stop this crap; coding is hard for a non-technical person

#

Fine I wont ask anything

#

You need to see that I am eager to learn and understand the concepts

#

No more discussion please @serene scaffold

serene scaffold Dec 29, 2022, 4:25 PM

#

that's what I've been asking for bing_shrug

fading zealot Dec 29, 2022, 4:26 PM

#

Query - How to view multiple CSV files in a Jupyter notebook?

#

@charred light

serene scaffold Dec 29, 2022, 4:26 PM

#

fading zealot @charred light

stop pinging specific people. I already said that.

#

I was about to answer your question, too

fading zealot Dec 29, 2022, 4:26 PM

#

ok

#

noted

serene scaffold Dec 29, 2022, 4:27 PM

#

you can ping people if they've already engaged with your current question. not if they engaged with a similar question in the past.

you would have to load each csv with pd.read_csv and then have the name of each df be the last statement of a cell

#

so if there are 3 dataframes to display, you need 3 cells.

fading zealot Dec 29, 2022, 4:28 PM

#

I mean all three datasets

serene scaffold Dec 29, 2022, 4:28 PM

#

fading zealot I mean all three datasets

is each dataset not a CSV file?

fading zealot Dec 29, 2022, 4:29 PM

#

it is

#

but I have four datasets

#

I want to import all of them

serene scaffold Dec 29, 2022, 4:30 PM

#

so, you would do pd.read_csv four times. one for each dataset. and then you'd need four cells to display them

#

because each cell displays the result of the last statement
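A sketch of loading several CSVs in one go, assuming they all sit in one folder. The temporary files below exist only so the example runs on its own; point `tmp_dir` at your real data folder instead. Printing each preview sidesteps the one-display-per-cell behaviour (in Jupyter, `IPython.display.display(frame)` also works and keeps the rich table rendering).

```python
import tempfile
from pathlib import Path

import pandas as pd

# Throwaway CSVs so the sketch is self-contained; replace with your own folder
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "ev.csv").write_text("Year,sales\n2019,1\n2020,2\n")
(tmp_dir / "co2.csv").write_text("Year,co2\n2019,10\n2020,11\n")

# One DataFrame per file, keyed by the file name without its extension
frames = {path.stem: pd.read_csv(path) for path in sorted(tmp_dir.glob("*.csv"))}

# Explicit print() calls show every preview, not just the last expression in the cell
for name, frame in frames.items():
    print(name)
    print(frame.head())
```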

fading zealot Dec 29, 2022, 4:30 PM

#

it gave a UTF-8 error

serene scaffold Dec 29, 2022, 4:30 PM

#

remember to always show the whole error message.

fading zealot Dec 29, 2022, 4:30 PM

#

ok noted

#

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 150: invalid start byte

serene scaffold Dec 29, 2022, 4:31 PM

#

fading zealot UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 150: invali...

okay, so the encoding of your CSV file is different than expected. try encoding='ascii' in the read_csv function

#

so it would look something like df1 = pd.read_csv('file/path.csv', encoding='ascii')

fading zealot Dec 29, 2022, 4:34 PM

#

serene scaffold Dec 29, 2022, 4:34 PM

#

you need a comma between the file name and the encoding= part

fading zealot Dec 29, 2022, 4:34 PM

#

ok noted sir

serene scaffold Dec 29, 2022, 4:35 PM

#

if you do df.head() four times, only the last one will be displayed, afaik

rancid sorrel Dec 29, 2022, 4:36 PM

#

serene scaffold if you do `df.head()` four times, only the last one will be displayed, afaik

you're correct, you have to use print(df.head()) for each one otherwise

fading zealot Dec 29, 2022, 4:36 PM

#

UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 30: ordinal not in range(128)

serene scaffold Dec 29, 2022, 4:37 PM

#

fading zealot UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 30: ordinal...

is the data in any particular language?

fading zealot Dec 29, 2022, 4:37 PM

#

Can I share the datasets ?

serene scaffold Dec 29, 2022, 4:37 PM

#

you can put them in the pastebin

#

!paste

arctic wedgeBOT Dec 29, 2022, 4:37 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

serene scaffold Dec 29, 2022, 4:37 PM

#

but there isn't really any way for us to know what the encoding will be except to guess.

fading zealot Dec 29, 2022, 4:37 PM

#

fading zealot I have 4 dataset

these are the datasets

#

okay, I will try my own way to view the datasets

serene scaffold Dec 29, 2022, 4:39 PM

#

try removing the encoding part and using encoding_errors='ignore' instead, and see if that works

fading zealot Dec 29, 2022, 4:39 PM

#

ok

serene scaffold Dec 29, 2022, 4:40 PM

#

it looks like the problem is whatever char is at the end of this
Afghanistan,AF,93,1752,0,41128771,652230,0.40%,63/km�

#

so, we can just ignore it.
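Byte `0xb2` is `²` in Latin-1, so the file was probably saved in a single-byte encoding such as Latin-1 or cp1252 rather than UTF-8 (still a guess, as noted above). A sketch of both workarounds on a made-up one-row file; `encoding_errors` needs pandas 1.3 or newer.

```python
import io

import pandas as pd

# Simulated file bytes, assuming the original was written as Latin-1
raw = "Country,Density\nAfghanistan,63/km\u00b2\n".encode("latin-1")

# Option 1: decode with an encoding that maps byte 0xB2 to "²"
df_latin = pd.read_csv(io.BytesIO(raw), encoding="latin-1")

# Option 2: stay with UTF-8 but silently drop bytes that can't be decoded
df_ignore = pd.read_csv(io.BytesIO(raw), encoding_errors="ignore")

print(df_latin)
print(df_ignore)
```

Option 1 keeps the character; option 2 loses it, which is fine here because only the unit suffix is affected.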

fading zealot Dec 29, 2022, 4:41 PM

#

yes

serene scaffold Dec 29, 2022, 4:43 PM

#

fading zealot yes

did it work? I'll be leaving in about ten minutes btw.

fading zealot Dec 29, 2022, 4:44 PM

#

okay

#

can you just tell me what feature engineering is

#

what steps need to be taken into consideration?

serene scaffold Dec 29, 2022, 4:45 PM

#

it's where you use the data to create additional features

fading zealot Dec 29, 2022, 4:45 PM

#

with the existing data ..yes i know that

#

to predict something we need to have a hypothesis question

#

i have that

serene scaffold Dec 29, 2022, 4:46 PM

#

like these features: Year,BEV average price (USD),Global Sales Volume,Mileage (Km),Lithium Ion Battery Price (USD),,,Average price of new car

#

you might make another feature, battery price per mileage

#

or something like that
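A minimal sketch of deriving that ratio feature with pandas. The column names come from the header above, but the rows are invented, and battery price per km of range is just one possible feature:

```python
import pandas as pd

# Toy rows reusing two column names from the dataset; the numbers are made up
ev = pd.DataFrame({
    "Year": [2020, 2021],
    "Mileage (Km)": [400.0, 500.0],
    "Lithium Ion Battery Price (USD)": [8000.0, 7500.0],
})

# New feature: battery price per km of range
ev["battery_price_per_km"] = ev["Lithium Ion Battery Price (USD)"] / ev["Mileage (Km)"]
print(ev)
```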

fading zealot Dec 29, 2022, 4:46 PM

#

charred light Normally the process for modeling is as follows: **Feature engineering** (Merge...

is this what I need to do?

serene scaffold Dec 29, 2022, 4:47 PM

#

Feature engineering (Merge datasets, clean data, apply transformations)
imo, only the "apply transformations" is feature engineering. data cleaning is its own thing.

rancid sorrel Dec 29, 2022, 5:08 PM

#

can anyone explain why
missing_values = ["NA","N/a",np.nan,"?"]

u1 = pd.read_csv("../DataSets/Breast cancer dataset/breast-cancer-wisconsin.data",header=None,na_values=missing_values)
ul.dropna()

#

isn't working for ul.dropna()

#

is dropna() a predefined function, or does it use my missing_values when called this way?

pliant fox Dec 29, 2022, 5:35 PM

#

.

charred light Dec 29, 2022, 5:48 PM

#

rancid sorrel isnt working for ul.dropna()

dropna() generally only applies for None types or np.NaN (Not a Number). See https://pandas.pydata.org/docs/user_guide/missing_data.html#values-considered-missing

You'll need to filter out user defined list of null values manually. (You can use some type of filter to do this. e.g. .isin())
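A sketch of both routes on a toy column (the values are illustrative): filter the placeholder strings out with `isin()`, or convert them to real `NaN` first so `dropna()` can see them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"bare_nuclei": ["1", "?", "10", "N/a", "3"]})

# Placeholder strings that dropna() does not treat as missing on its own
missing_markers = ["NA", "N/a", "?"]

# Option 1: keep only the rows whose value is not one of the markers
cleaned = df[~df["bare_nuclei"].isin(missing_markers)]

# Option 2: replace the markers with NaN, then dropna() removes those rows
as_nan = df.replace(missing_markers, np.nan).dropna()

print(cleaned)
print(as_nan)
```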

rancid sorrel Dec 29, 2022, 5:49 PM

#

it appears i needed l1 = l1.dropna()

fading hill Dec 29, 2022, 5:49 PM

#

What does numpy use for the visualization of data in their documentation?

rancid sorrel Dec 29, 2022, 5:50 PM

#

missing_values = ["NA","N/a",np.nan,"?"] << appears to flag the ? as a null value
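That matches how pandas behaves: `na_values` makes `read_csv` convert `"?"` to `NaN`, and `dropna()` returns a new DataFrame instead of changing `l1` in place, which is why the reassignment was needed. A sketch on a few made-up rows in the same headerless format:

```python
import io

import pandas as pd

# Made-up rows with no header row; "?" marks a missing value
raw = "1,5,1,1\n2,5,?,4\n3,3,1,1\n"
missing_values = ["NA", "N/a", "?"]

l1 = pd.read_csv(io.StringIO(raw), header=None, na_values=missing_values)

l1.dropna()          # returns a new DataFrame; l1 itself is unchanged
rows_before = len(l1)

l1 = l1.dropna()     # reassign (or pass inplace=True) to keep the result
print(l1)
```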

#

i also tried l1['bare_nuclei'] = pd.to_numeric(l1['bare_nuclei'],errors='coerce') @charred light

charred light Dec 29, 2022, 5:51 PM

#

rancid sorrel i also tried `l1['bare_nuclei'] = pd.to_numeric(l1['bare_nuclei'],errors='coerce...

Did some error pop up?

rancid sorrel Dec 29, 2022, 5:51 PM

#

no that swaps the errors to nan

#

at which point I could do a drop of the nulls easily

#

errors='coerce' << swaps errors to nan
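For reference, a small sketch of the `errors="coerce"` approach on a toy column (values invented):

```python
import pandas as pd

l1 = pd.DataFrame({"bare_nuclei": ["1", "10", "?", "3"]})

# Anything that can't be parsed as a number becomes NaN
l1["bare_nuclei"] = pd.to_numeric(l1["bare_nuclei"], errors="coerce")

# The former "?" is now NaN, so an ordinary dropna() removes that row
l1 = l1.dropna(subset=["bare_nuclei"])
print(l1)
```

A side benefit is that the column ends up with a numeric dtype, which most models and plots need anyway.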

charred light Dec 29, 2022, 5:53 PM

#

Ok, good to hear.

rancid sorrel Dec 29, 2022, 5:54 PM

#

unless I am making a mistake. but so far that's a fairly novel way to handle the problem. now to turn autopilot back on

charred light Dec 29, 2022, 5:56 PM

#

It can be better to go in and manually fix errors (depending on the scale of your data). As mentioned earlier, if a data point is "16 ?", then it could be better to clean it with apply and a function. But then again, if you have many rows of data, losing a data point or two doesn't really matter.
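A sketch of the `apply` approach. The cleaning rule (keep the first run of digits, otherwise `NaN`) is an assumption about what a value like `"16 ?"` should become; adjust it to your data.

```python
import re

import numpy as np
import pandas as pd

s = pd.Series(["16 ?", "7", "?", " 3"])

def clean_value(value):
    """Return the first run of digits as a float, or NaN if there are none (illustrative rule)."""
    match = re.search(r"\d+", str(value))
    return float(match.group()) if match else np.nan

cleaned = s.apply(clean_value)
print(cleaned)
```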

charred light Dec 29, 2022, 5:57 PM

#

pliant fox .

You might want to clarify what you mean by flattening a pip list. I assume pip here is a part of the pip Python package, whose output you can send to a txt file.

rancid sorrel Dec 29, 2022, 5:57 PM

#

charred light It can be better to go in and manually fix errors (depending on scale of your da...

100% but i am working with a team

#

and i am responsible for creating the templates for the EDA

#

so the templates have to do the cleaning for them rather than me manipulating the data, even if the data is small

#

also honestly, the data forensics training I had really makes me not want to change the original data

#

in-band data manipulation is just a habit that's been forced into me, so you can see the manipulation clearly. also the academics prefer it

charred light Dec 29, 2022, 6:01 PM

#

Yea, academia tends to have perfect data. Real world data is mostly nulls lol

rancid sorrel Dec 29, 2022, 6:02 PM

#

honestly my background is sysadmin and cybersecurity (15 years); going back to academia was a bit of a mindfuck

#

at work I'd fix it with sed and damn the data, or pull this from a SQL server and fix it there

charred light Dec 29, 2022, 6:03 PM

#

I would sure hope the data is cleaned before heading into a database.

rancid sorrel Dec 29, 2022, 6:03 PM

#

honestly that's usually where you get the crap show

charred light Dec 29, 2022, 6:05 PM

#

Yea, it's always a fight with our digital team that handles the databases lol

rancid sorrel Dec 29, 2022, 6:06 PM

#

honestly it's usually the frontend's fault 😉

#

they are not doing the sanitisation in the JavaScript 😉

charred light Dec 29, 2022, 6:07 PM

#

~~that's why I add a ; DROP TABLE usernames every time a wifi connection asks me for info~~

rancid sorrel Dec 29, 2022, 6:08 PM

#

haha

#

I need to get into microservices more so I can do this full stack 😉

toxic vault Dec 29, 2022, 6:16 PM

#

import time

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# load_stock_data, get_current_stocks, get_current_prices, buy_stock and sell_stock
# are placeholders that still have to be implemented (e.g. against a broker API)

# Load and preprocess stock data
data = load_stock_data()
X = data[['past_performance']]
y = data['future_performance']
# shuffle=False keeps the time order, so the test set is the most recent data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Train a linear regression model on the training data
model = LinearRegression()
model.fit(X_train, y_train)

# Use the model to make predictions on the test data
predictions = model.predict(X_test)

# Evaluate the model's performance (score() returns R^2 for regressors, not accuracy)
score = model.score(X_test, y_test)
print("Model R^2:", score)

# Use the model to make decisions about which stocks to buy or sell
while True:
    current_stocks = get_current_stocks()
    current_prices = get_current_prices(current_stocks)
    for stock, price in current_prices.items():
        # Caution: the model was trained on 'past_performance', so the input here
        # should be that same feature computed for this stock, not the raw price
        prediction = model.predict([[price]])[0]
        if prediction > price:
            # Buy the stock
            buy_stock(stock, price)
        elif prediction < price:
            # Sell the stock
            sell_stock(stock, price)
    time.sleep(3600)  # Wait an hour before making new predictions

#

Would this work as a stock trading bot?

#

What add ons or tweaks would need to be made for it to work efficiently

charred light Dec 29, 2022, 6:31 PM

#

toxic vault Would this work as a stock trading bot?

Depends on how much money you're willing to lose. Maybe have a hold out set to test.
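A sketch of a time-ordered holdout, since a shuffled split lets the model peek at future prices. The synthetic trend data is a stand-in for real price history; `shuffle=False` keeps the most recent 20% as the test set, and `TimeSeriesSplit` gives a walk-forward variant.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, train_test_split

# Synthetic, time-ordered data (row i is time step i)
rng = np.random.default_rng(0)
X = np.arange(100, dtype=float).reshape(-1, 1)
y = 0.5 * X[:, 0] + rng.normal(0, 1, size=100)

# Holdout: train on the first 80% of time steps, test on the last 20%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
model = LinearRegression().fit(X_train, y_train)
holdout_r2 = model.score(X_test, y_test)

# Walk-forward: each fold trains on the past and tests on the block that follows it
fold_scores = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=4).split(X):
    fold_model = LinearRegression().fit(X[train_idx], y[train_idx])
    fold_scores.append(fold_model.score(X[test_idx], y[test_idx]))

print(holdout_r2, fold_scores)
```

Doing well on this kind of backtest is still no guarantee of profit in live trading.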

keen notch Dec 29, 2022, 6:32 PM

#

pliant fox .

?what

hasty mountain Dec 29, 2022, 6:36 PM

#

If I make a GAN and, instead of passing the Generator's output directly to the Discriminator, I pass it to a SuperResolution model and only then I pass it to the Discriminator...would this make things too messy?

wooden sail Dec 29, 2022, 6:39 PM

#

it should be fine, it'd make the two more capable of working independently. ofc there are now way more parameters, so the training would be slower (unless you train and freeze the super res part ahead of time)

hasty mountain Dec 29, 2022, 6:40 PM

#

Yeah, I was thinking about using a pretrained SRGAN in order to make my Generator(in my current GAN) to produce images with a better resolution

#

I mean...even images in 64x64 are so blurry without superresolution models...
(At least if I'm not doing anything wrong...which is quite likely)

wooden sail Dec 29, 2022, 6:42 PM

#

do keep in mind though that doing SR does not result in any extra info... at least normally. using a network that might be different, but the added info is bias that the network picked up from the training data

#

the original architecture without SR should perform about as well as with SR if everything is working ideally

hasty mountain Dec 29, 2022, 6:46 PM

#

Wouldn't the SR push the generator to make better images? I mean, it would highlight the generator's mistakes for the discriminator, wouldn't it?

toxic vault Dec 29, 2022, 6:47 PM

#

@charred light like a minimum Range and a maximum Range?

wooden sail Dec 29, 2022, 6:48 PM

#

hasty mountain Wouldn't the SR stimulate the generator to make better images? I mean, it would ...

that'll depend on what exactly the network does. most cost functions anyway have some kind of averaging incorporated, so i'm not convinced it'll make a huge difference

#

but try it out and see!

hasty mountain Dec 29, 2022, 6:49 PM

#

I'll need an entire AWS building just to learn about GANs... pithink

charred light Dec 29, 2022, 6:49 PM

#

toxic vault @charred light like a minimum Range and a maximum Range?

Meant more as a joke. You should be testing your model on a hold out set if you haven't.

sudden ermine Dec 29, 2022, 8:25 PM

#

Helpful resource for data science - Data science dictionary
https://play.google.com/store/apps/details?id=com.neuralnetworker.datasciencedictionary


rapid pasture Dec 29, 2022, 8:34 PM

#

Hello guys, I have this error : sklearn.exceptions.NotFittedError: The TF-IDF vectorizer is not fitted
at the code below :

def predict(message : str) : 
    tfidf_vectorizer=TfidfVectorizer(stop_words='english')
    pac = pickle.load(open("model.pkl", "rb"))
    vectMessage = tfidf_vectorizer.transform(pd.Series([message]))
    prediction = pac.predict(vectMessage)
    return prediction[0]

I don't know why I have this error because I am transforming the series so it can fit. Thanks in advance for the help.

#

Before saving the model into a pkl it worked fine, but now it keeps raising this error

#

if somebody could help it would be good, thanks in advance

sudden ermine Dec 29, 2022, 8:45 PM

#

rapid pasture Hello guys, I have this error : `sklearn.exceptions.NotFittedError: The TF-IDF ...

The error sklearn.exceptions.NotFittedError: The TF-IDF vectorizer is not fitted is raised when you try to use a scikit-learn estimator that has not been fitted yet.

In your case, it seems that you are trying to transform a message into a vector using the tfidf_vectorizer object, but this object has not been fitted to any data. To fit the vectorizer, you need to pass a list of documents to the fit method, like this:

tfidf_vectorizer=TfidfVectorizer(stop_words='english')
tfidf_vectorizer.fit(list_of_documents)

Where list_of_documents is a list of strings representing the documents you want to fit the vectorizer to. Once the vectorizer is fitted, you can use the transform method to transform new documents into vectors.

In your case, you might want to consider fitting the vectorizer to the training data that you used to train the classifier, so that the vectorizer is able to transform new messages in the same way as it did for the training data.
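One common fix, sketched under assumptions: fit the vectorizer and the classifier together in a scikit-learn `Pipeline` and pickle the whole fitted pipeline, so `predict()` reuses the training vocabulary instead of a fresh, unfitted `TfidfVectorizer`. The toy texts are made up, and `LogisticRegression` only stands in for the original `pac` model; any classifier works the same way.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up training set
texts = ["win a free prize now", "meeting moved to monday",
         "free money click here", "lunch at noon tomorrow"]
labels = ["spam", "ham", "spam", "ham"]

# Vectorizer and classifier are fitted together, so they share one vocabulary
pipeline = make_pipeline(TfidfVectorizer(stop_words="english"), LogisticRegression())
pipeline.fit(texts, labels)

# Save the *fitted* pipeline, not just the classifier
with open("model.pkl", "wb") as f:
    pickle.dump(pipeline, f)

def predict(message: str) -> str:
    with open("model.pkl", "rb") as f:
        loaded = pickle.load(f)
    # The loaded pipeline transforms the text with the vocabulary learned in fit()
    return loaded.predict([message])[0]

print(predict("claim your free prize"))
```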

I hope this helps!

young granite Dec 29, 2022, 8:47 PM

#

how can i find multi-input models from scikit learn and compare them on a given dataset?

sudden ermine Dec 29, 2022, 8:50 PM

#

young granite how can i find multi-input models from scikit learn and compare them on a given ...

Here is an example of how you could use the model_selection module to compare different models on a given dataset:

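from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Load the dataset
X = ... # Input features
y = ... # Target variable

# Define a list of models to compare (all regressors, so they share the same default R^2 scoring;
# LogisticRegression is a classifier and would not work with a continuous target)
models = [LinearRegression(), DecisionTreeRegressor(), RandomForestRegressor(), SVR()]

# Iterate over the models and print their mean cross-validation score
for model in models:
    scores = cross_val_score(model, X, y, cv=5) # 5-fold cross-validation
    print(f"{model.__class__.__name__}: {scores.mean():.2f}")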

This code will train each model on the training folds of a 5-fold cross-validation, and then evaluate its performance on the corresponding test folds. The mean cross-validation score for each model will be printed at the end.

I hope this helps. Let me know about it.

young granite Dec 29, 2022, 8:51 PM

#

sudden ermine Here is an example of how you could use the model_selection module to compare di...

ty for the response. I do use CV for LR, but I want to compare LR with other models; however, I can't find a list of compatible models on the scikit-learn site

rapid pasture Dec 29, 2022, 8:54 PM

#

sudden ermine The error sklearn.exceptions.NotFittedError: The TF-IDF vectorizer is not fitted...

Yes i thought it was possible to consider the text as something to be predicted by the model

#

so I will always need to insert the new document I want to predict inside a series of documents, right?

sudden ermine Dec 29, 2022, 8:58 PM

#

rapid pasture so I will always need to insert the new document I want to predict inside a seri...

Yes, that's correct. When you use a scikit-learn vectorizer to transform a document into a numerical representation (also known as a feature vector), you need to pass the document as a list of strings to the transform method.

For example, if you want to transform a single document message, you can do it like this:

vectMessage = tfidf_vectorizer.transform([message])

If you want to transform multiple documents at once, you can pass them as a list of strings:

vectMessages = tfidf_vectorizer.transform(list_of_messages)

Where list_of_messages is a list of strings representing the documents you want to transform.

rapid pasture Dec 29, 2022, 9:04 PM

#

rapid pasture Hello guys, I have this error : `sklearn.exceptions.NotFittedError: The TF-IDF ...

yes but in my code as you can see I wrote [message]

sudden ermine Dec 29, 2022, 9:09 PM

#

Disclaimer: those are the responses of chatgpt.

odd meteor Dec 29, 2022, 9:13 PM

#

young granite how can i find multi-input models from scikit learn and compare them on a given ...

To add to what CheemaBhaiExpereince has suggested, I also use this approach when I'm extremely lazy.

young granite Dec 29, 2022, 9:16 PM

#

odd meteor To add to what CheemaBhaiExpereince has suggested, I also use this approach when...

this looks good to me ❤️

#

does it all work for multiple features?

mighty meteor Dec 29, 2022, 9:18 PM

#

can someone help me?

odd meteor Dec 29, 2022, 9:20 PM

#

sudden ermine Disclaimer: those are the responses of chatgpt.

#announcements message

odd meteor Dec 29, 2022, 9:22 PM

#

mighty meteor can someone help me?

Post what you need help with and people will respond to your question.

charred light Dec 29, 2022, 9:24 PM

#

odd meteor https://discord.com/channels/267624335836053506/354619224620138496/1049439289764...

I wish this was added under #rules or #code-of-conduct for easier access.

cerulean lodge Dec 29, 2022, 9:25 PM

#

Is anyone aware of Lark grammar libraries? Not even necessarily actual libs but even just published grammar sets. I searched the discord, doesn't seem to be much here. I found: https://pypi.org/project/lark-grammars/0.3.0/ but I'm wondering if I can get an even larger set of grammar use cases/solutions.


sudden ermine Dec 29, 2022, 9:26 PM

#

sudden ermine Helpful resource for data science - Data science dictionary https://play.google....

What do you guys think about this app?

cerulean lodge Dec 29, 2022, 9:27 PM

#

I saw you posted that same link on the data science discord @sudden ermine. Your own app?

sudden ermine Dec 29, 2022, 9:28 PM

#

Yes, it's my first app

young granite Dec 29, 2022, 9:30 PM

#

@odd meteor it seems like this works only for 1 feature inputs?

odd meteor Dec 29, 2022, 9:31 PM

#

charred light I wish this was added under #rules or #code-of-conduct for e...

Nice observation. I'll forward your observation to the main mods.

cc: @serene scaffold

serene scaffold Dec 29, 2022, 9:33 PM

#

odd meteor Nice observation. I'll forward your observation to the main mods. cc: <@2536963...

there's ongoing discussion about that in #1044328825145786458

odd meteor Dec 29, 2022, 9:34 PM

#

young granite <@519319496868233227> it seems like this works only for 1 feature inputs?

It works for more than 1 input features x1, x2,..., xn

young granite Dec 29, 2022, 9:34 PM

#

odd meteor It works for more than 1 input features `x1, x2,..., xn`

for both x and y?

odd meteor Dec 29, 2022, 9:34 PM

#

serene scaffold there's ongoing discussion about that in <#1044328825145786458>

I had no idea. Okay that's supercool.

young granite Dec 29, 2022, 9:34 PM

#

currently i use df as input and it returns an empty result for clf.fit

#

do i need to convert my features into arrays?

#

normally scikit-learn works with DataFrames as well?

#

@odd meteor any idea?

#

i converted the df now to array with .to_numpy() but still empty result

odd meteor Dec 29, 2022, 9:46 PM

#

young granite currently i use df as input and it returns an empty result for clf.fit

Hi Greenleek, if you were able to split your data with train_test_split, all you need to do next is follow the snapshot I sent. If it's still not super clear, I'd advise you to try replicating this on your PC to see how it works. Once you're able to replicate the results, using lazypredict in your own project will become easier to grasp.

Just follow the screenshot I sent, or better still check the documentation for more clarity. https://lazypredict.readthedocs.io/en/latest/usage.html#classification

young granite Dec 29, 2022, 9:47 PM

#

odd meteor Hi Greenleek, If you were able to split your data with Train-Test-Split, all you...

thanks for the reply emyrs, but I did try your method on the test set; however, when using my data it's not working Q_Q

#

if I use the lazy regressor it gives different results than my previously determined ones, so I guess something isn't working with the input

#

i dunno why tho cause the input is exactly the same as in the example, with the only difference that both X and y got multiple features

odd meteor Dec 29, 2022, 9:49 PM

#

young granite thanks for the reply emyrs but i did tried ur method on the test-set however whe...

Just try and see if you can replicate the result from the example on the documentation page.

young granite Dec 29, 2022, 9:49 PM

#

odd meteor Just try and see if you can replicate the result from the example on the documen...

i can

odd meteor Dec 29, 2022, 9:52 PM

#

young granite i can

Awesome.

The reason you're getting different results could be a couple of things:

- the random state used
- the hyperparameters tuned/involved, etc.

#

So long as you can replicate the result on the documentation page, you can just pick, say, the top 3 algorithms and try to do more hyperparameter tuning to improve the model performance.

young granite Dec 29, 2022, 9:55 PM

#

but i do get an empty "models" after running the lazyclassifier 🗿

#

so it seems no model works for the offered data

#

even tho the data is in the correct format

odd meteor Dec 29, 2022, 9:57 PM

#

young granite but i do get an empty "models" after running the lazyclassifier 🗿

Are you building a classification model or a regression model? I've not had that issue using the library before, although I haven't used it for a while now.

young granite Dec 29, 2022, 9:58 PM

#

regression but there it results in errors as well

#

"y should be a 1d array, got an array of shape (20,20) instead"

#

so I guess they aren't capable of handling multiple features

#

and the LinearRegressor gives waaaaay different results than my previous scikit-learn run
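For reference, a minimal sketch of what that error usually means, assuming scikit-learn is installed: the message "y should be a 1d array" comes from scikit-learn's input validation, and it is about the *target* having several columns, not about X having several features. Many single-output regressors reject a 2-D y; two common workarounds are fitting one model per target column or wrapping the estimator in `sklearn.multioutput.MultiOutputRegressor`. The random data and the choice of `SVR` are illustrative assumptions, and whether `LazyRegressor` itself accepts 2-D targets is not checked here.

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))  # 3 input features: fine for any regressor
Y = rng.normal(size=(20, 2))  # 2 target columns: rejected by single-output models

# A single-output regressor refuses the 2-D target
raised_error = False
try:
    SVR().fit(X, Y)
except ValueError as exc:
    raised_error = True
    print(exc)

# Option 1: one model (or one LazyRegressor run) per target column
per_target_models = [SVR().fit(X, Y[:, j]) for j in range(Y.shape[1])]

# Option 2: let scikit-learn fit one copy of the estimator per target
multi = MultiOutputRegressor(SVR()).fit(X, Y)
pred = multi.predict(X)  # shape (20, 2)
```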

odd meteor Dec 29, 2022, 10:07 PM

#

young granite "y should be a 1d array, got an array of shape (20,20) instead"

I've not had any issue with the library (the couple of times I used it). It worked perfectly. If you believe this is a serious issue, perhaps you can raise this issue on the library's GitHub page. https://github.com/shankarpandala/lazypredict

I'll try to use the library once I get home today to confirm if it's still working properly.

Also, could you share the error message / your code?

young granite Dec 29, 2022, 10:09 PM

#

odd meteor I've not had any issue with the library (the couple of times I used it). It work...

there is no error, simply an empty result frame, but thanks for your help; I'll stick to changing models manually 🗿

hasty mountain Dec 29, 2022, 10:14 PM

#

Guys, I want to use a model in PyTorch that outputs 2 classes using softmax (softmax, not sigmoid). However, I don't really know which loss function I should use, as PyTorch's CrossEntropyLoss includes a LogSoftmax internally, and NLLLoss expects an output generated by a LogSoftmax function.
Any suggestions?
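A minimal NumPy sketch of the relationship described above (NumPy rather than PyTorch so it runs anywhere; the function names are illustrative): cross-entropy on raw logits equals NLL on log-softmax outputs. In PyTorch terms, one would typically have the model emit raw logits, train with `nn.CrossEntropyLoss`, and apply softmax only when probabilities are needed at inference; alternatively, end the model with `nn.LogSoftmax(dim=1)` and use `nn.NLLLoss`. The last lines show the usual mistake: feeding softmax outputs into cross-entropy applies softmax twice and gives a different loss.

```python
import numpy as np

def log_softmax(logits):
    # Subtract the row max for numerical stability before exponentiating
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def nll_loss(log_probs, targets):
    # Mean negative log-probability of the true class (NLLLoss's default "mean" reduction)
    return -log_probs[np.arange(len(targets)), targets].mean()

def cross_entropy(logits, targets):
    # CrossEntropyLoss on raw logits = LogSoftmax followed by NLLLoss
    return nll_loss(log_softmax(logits), targets)

logits = np.array([[2.0, -1.0], [0.5, 0.3], [-1.0, 1.5]])  # 3 samples, 2 classes
targets = np.array([0, 1, 1])

ce = cross_entropy(logits, targets)
probs = np.exp(log_softmax(logits))                 # the model's softmax output
nll_of_softmax = nll_loss(np.log(probs), targets)   # log, then NLL: same value as ce
double = cross_entropy(probs, targets)              # softmax applied twice: a different loss

print(ce, nll_of_softmax, double)
```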

dense crane Dec 29, 2022, 10:51 PM

#

are there association rules for non-binary variables?

#

like for iris dataset for example
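One common approach, sketched here under assumptions: association-rule miners work on binary "items", so non-binary columns are first discretised (e.g. with `pd.cut`) and one-hot encoded, and support and confidence are then computed on those boolean columns. The rows below are hypothetical iris-like values, not the real dataset, and the bin edges are arbitrary. A one-hot boolean DataFrame like `items` is also the input format that libraries such as mlxtend's `apriori` expect.

```python
import pandas as pd

# Hypothetical iris-like rows (illustrative values, not the real dataset)
df = pd.DataFrame({
    "petal_length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1, 1.5, 4.9],
    "species": ["setosa", "setosa", "versicolor", "versicolor",
                "virginica", "virginica", "setosa", "versicolor"],
})

# Step 1: turn the numeric column into categories (bin edges are an assumption)
df["petal_length_bin"] = pd.cut(df["petal_length"], bins=[0, 2.5, 5.0, 7.0],
                                labels=["short", "medium", "long"])

# Step 2: one-hot encode so every (column, value) pair becomes a binary item
items = pd.get_dummies(df[["petal_length_bin", "species"]]).astype(bool)

# Step 3: support and confidence for one candidate rule: short petals -> setosa
antecedent = items["petal_length_bin_short"]
consequent = items["species_setosa"]
support = (antecedent & consequent).mean()   # fraction of rows containing both items
confidence = support / antecedent.mean()     # P(setosa | short petals)
print(support, confidence)
```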

hasty mountain Dec 30, 2022, 1:36 AM

#

Since there are some folks here who are quite math maniacs, can someone tell me if this madness I made makes sense?
The idea here is to adapt the dot-product attention from the Transformer (NLP) into an element-wise attention layer to extract features from images. I want to avoid matrix multiplications because they're too computationally expensive.

import torch.nn as nn


class AttentionBlock(nn.Module):

    def __init__(self, in_channels, n_attention_weights):

        super().__init__()

        # 1x1 convolutions mix channels per pixel; they produce the "row" and "column" weight maps
        self.create_x_weights = nn.Conv2d(in_channels, n_attention_weights, kernel_size=1, stride=1, bias=False)
        self.create_y_weights = nn.Conv2d(in_channels, n_attention_weights, kernel_size=1, stride=1, bias=False)
        # Maps the combined weights back to in_channels so they can gate the input
        self.conclude_attention = nn.Conv2d(n_attention_weights, in_channels, kernel_size=1, stride=1, bias=False)

        # For NCHW tensors, dim -2 is the height (H) axis and dim -1 is the width (W) axis
        self.Xsoftmax = nn.Softmax(dim=-2)  # softmax over dim -2 (H) of each feature map
        self.Ysoftmax = nn.Softmax(dim=-1)  # softmax over dim -1 (W) of each feature map

    def forward(self, input):

        x_weights = self.create_x_weights(input)
        y_weights = self.create_y_weights(input)

        x_weights = self.Xsoftmax(x_weights)
        y_weights = self.Ysoftmax(y_weights)

        # Element-wise product of the two normalised maps (no H*W x H*W similarity matrix)
        attention_weights = x_weights * y_weights

        attention_weights = self.conclude_attention(attention_weights)

        # Gate the input element-wise with the attention map
        attention_output = attention_weights * input

        return attention_output

I've noticed that Transformer uses a "similarity matrix", that is the dot-product between queries and keys, then applies softmax to this product. But I don't see exactly how I could use something like this here, so I just applied softmax over the rows of some feature maps(which would be the row weights) and softmax over the columns of other feature maps(column weights) and then apply element-wise product to the input. The higher the X and Y weights, higher the final product, higher the relevancy...or so this is what I want.
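To make the cost argument concrete, here is a rough NumPy sketch (one image, no batch, random weights; all names and sizes are assumptions) comparing the element-wise scheme above with Transformer-style dot-product attention. A 1x1 convolution is still a matrix multiply, but only over channels, per pixel; the dot-product version builds an (H·W) x (H·W) similarity matrix, which is what grows quadratically with image size. The two are not equivalent: the separable map weights positions independently along H and W, whereas the similarity matrix relates every pixel to every other pixel.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along one axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def conv1x1(feat, weight):
    # A 1x1 convolution is a per-pixel channel mix: (out, in) @ (in, H*W)
    c, h, w = feat.shape
    return (weight @ feat.reshape(c, h * w)).reshape(weight.shape[0], h, w)

def separable_attention(feat, w_x, w_y, w_out):
    # Mirrors AttentionBlock.forward for a single (C, H, W) image
    x_weights = softmax(conv1x1(feat, w_x), axis=-2)  # normalised over H
    y_weights = softmax(conv1x1(feat, w_y), axis=-1)  # normalised over W
    gate = conv1x1(x_weights * y_weights, w_out)      # back to C channels
    return gate * feat

def dot_product_similarity(feat, w_q, w_k):
    # Transformer-style: every pixel is a token, scores form an (H*W, H*W) matrix
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w).T        # (H*W, C)
    q, k = tokens @ w_q, tokens @ w_k        # (H*W, d)
    scores = q @ k.T / np.sqrt(q.shape[1])
    return softmax(scores, axis=-1)

rng = np.random.default_rng(0)
C, H, W, A = 8, 16, 16, 4
feat = rng.normal(size=(C, H, W))
out = separable_attention(feat, rng.normal(size=(A, C)), rng.normal(size=(A, C)),
                          rng.normal(size=(C, A)))
sim = dot_product_similarity(feat, rng.normal(size=(C, A)), rng.normal(size=(C, A)))
print(out.shape, sim.shape)  # (8, 16, 16) (256, 256)
```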

#

I should simply test this... but I'm also crazy enough to have had this idea while trying to make a GAN, so even if this works, it might not look like it, since... well... GAN things

#

I also wonder whether it would be better to just stick with a single Conv2d instead of doing all this

worthy hollow Dec 30, 2022, 3:31 AM

#

hey guys, so I have this code that gives an error when dividing 2 different DataFrames that I want to use for a new DataFrame... It produces NaN everywhere. Here's the code: ```py

# Load financial data for companies in the pharmaceutical sector
df = pd.read_csv(r'data//income_statement.csv')

# Select the columns to include in the analysis
cols = ['entreprise', 'date', 'chiffre_affaires', 'resultat_operationnel', 'resultat_net']
df = df[cols]
df['date'] = pd.to_datetime(df['date'], format='%Y', errors='ignore')

# Filter the data to separate the years before COVID-19 (2019 and earlier) from the years including COVID-19 (2020 and later)
df_avant_covid = df[df['date'].dt.year < 2020]
df_apres_covid = df[df['date'].dt.year >= 2020]

# Compute the annual mean revenue and operating income for each company, before and after COVID-19
df_avant_covid = df_avant_covid.groupby(['entreprise', df_avant_covid['date'].dt.year]).mean()
df_apres_covid = df_apres_covid.groupby(['entreprise', df_apres_covid['date'].dt.year]).mean()

# Compute the change in revenue and operating income between the pre- and post-COVID-19 periods
df_variation = pd.DataFrame()
df_variation['variation_ca'] = df_apres_covid['chiffre_affaires'] / df_avant_covid['chiffre_affaires'] - 1
df_variation['variation_op'] = df_apres_covid['resultat_operationnel'] / df_avant_covid['resultat_operationnel'] - 1

# Display the changes in revenue and operating income for each company
print(df_variation)

# Create a chart comparing the changes in revenue and operating income for each company
plt.bar(df_variation.index, df_variation['variation_ca'], label="variation du chiffre d'affaires")
plt.bar(df_variation.index, df_variation['variation_op'], label="variation du résultat opérationnel")
plt.legend()
plt.show()```

#

here's the output of the df_variation DataFrame, containing the NaNs from the operation:

                       variation_ca  variation_op
entreprise       date
Roche Holding AG 2018           NaN           NaN
                 2019           NaN           NaN
                 2020           NaN           NaN
                 2021           NaN           NaN

#

and here's the error when it tries to generate the plot: ```py

TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_8504/54936595.py in <module>
24
25 # Create a chart comparing the changes in revenue and operating income for each company
---> 26 plt.bar(df_variation.index, df_variation['variation_ca'], label="variation du chiffre d'affaires")
27 plt.bar(df_variation.index, df_variation['variation_op'], label='variation du résultat opérationnel')
28 plt.legend()

c:\Users\PEGON\anaconda3\lib\site-packages\matplotlib\pyplot.py in bar(x, height, width, bottom, align, data, **kwargs)
2649 x, height, width=0.8, bottom=None, *, align='center',
2650 data=None, **kwargs):
-> 2651 return gca().bar(
2652 x, height, width=width, bottom=bottom, align=align,
2653 **({"data": data} if data is not None else {}), **kwargs)

c:\Users\PEGON\anaconda3\lib\site-packages\matplotlib\__init__.py in inner(ax, data, *args, **kwargs)
1359 def inner(ax, *args, data=None, **kwargs):
1360 if data is None:
-> 1361 return func(ax, *map(sanitize_sequence, args), **kwargs)
1362
1363 bound = new_sig.bind(ax, *args, **kwargs)

c:\Users\PEGON\anaconda3\lib\site-packages\matplotlib\axes\_axes.py in bar(self, x, height, width, bottom, align, **kwargs)
2277
2278 if orientation == 'vertical':
-> 2279 self._process_unit_info(
2280 [("x", x), ("y", height)], kwargs, convert=False)
2281 if log:

c:\Users\PEGON\anaconda3\lib\site-packages\matplotlib\axes\_base.py in _process_unit_info(self, datasets, kwargs, convert)
2339 # Update from data if axis is already set but no unit is set yet.
2340 if axis is not None and data is not None and not axis.have_units():
-> 2341 axis.update_units(data)
2342 for axis_name, axis in axis_map.items():
2343 # Return if no axis is set.

c:\Users\PEGON\anaconda3\lib\site-packages\matplotlib\axis.py in update_units(self, data)
1446 neednew = self.converter != converter
1447 self.converter = converter
-> 1448 default = self.converter.default_units(data, self)
1449 if default is not None and self.units is None:
1450 self.set_units(default)

c:\Users\PEGON\anaconda3\lib\site-packages\matplotlib\category.py in default_units(data, axis)
...
---> 92 raise TypeError(
93 "{!r} must be an instance of {}, not a {}".format(
94 k,

TypeError: 'value' must be an instance of str or bytes, not a tuple```

#

anyone have a clue?
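A likely cause of both problems, sketched with the Roche revenue figures posted in this thread (the reconstructed `df` is an assumption about the CSV's shape): pandas aligns on index labels when dividing two Series. After the groupby, `df_apres_covid` is indexed by (entreprise, 2020/2021) and `df_avant_covid` by (entreprise, 2018/2019); no labels match, so every ratio is NaN. Collapsing each period to one mean per company makes the labels match. The same MultiIndex is probably behind the plot error: `plt.bar` receives (entreprise, year) tuples as x values, whereas a single-level index of company names plots as ordinary categories.

```python
import pandas as pd

# Revenue (chiffre_affaires) for Roche Holding AG, as posted in this thread
df = pd.DataFrame({
    "entreprise": ["Roche Holding AG"] * 4,
    "date": pd.to_datetime(["2018", "2019", "2020", "2021"], format="%Y"),
    "chiffre_affaires": [60829.73, 64165.38, 64361.84, 72046.48],
})

df_avant_covid = df[df["date"].dt.year < 2020]
df_apres_covid = df[df["date"].dt.year >= 2020]

# Grouping as in the question: the index becomes (entreprise, year)
avant = df_avant_covid.groupby(["entreprise", df_avant_covid["date"].dt.year])["chiffre_affaires"].mean()
apres = df_apres_covid.groupby(["entreprise", df_apres_covid["date"].dt.year])["chiffre_affaires"].mean()

# Division aligns on labels: (Roche, 2020) has no partner in `avant`, and so on
naive = apres / avant
print(naive)  # four rows, all NaN

# Fix: one mean per company and period, so both Series share the same labels
avant_period = df_avant_covid.groupby("entreprise")["chiffre_affaires"].mean()
apres_period = df_apres_covid.groupby("entreprise")["chiffre_affaires"].mean()
variation_ca = apres_period / avant_period - 1
print(variation_ca)  # roughly +9.1% for Roche Holding AG
```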

queen cradle Dec 30, 2022, 3:54 AM

#

worthy hollow hey guys so i have this code that makes **an error at the division of 2 differen...

Check df_avant_covid['chiffre_affaires'] and df_avant_covid['resultat_operationnel'] for zeros.

worthy hollow Dec 30, 2022, 3:55 AM

#

queen cradle Check `df_avant_covid['chiffre_affaires']` and `df_avant_covid['resultat_operati...

there's no 0 in the df_avant_covid : ```py

                       chiffre_affaires  resultat_operationnel  resultat_net
entreprise       date
Roche Holding AG 2018          60829.73               15099.83      10735.20
                 2019          64165.38               17662.06      13584.73```

queen cradle Dec 30, 2022, 3:56 AM

#

What about df_apres_covid?

#

Do you already have NaNs there?

worthy hollow Dec 30, 2022, 3:56 AM

#

I have no NaNs there either

#

the NaNs are created here

#

df_variation

#

because I'm doing an operation there using the two DataFrames above (df_avant_covid & df_apres_covid)

#

but they don't have the same dimensions, I think that's why

queen cradle Dec 30, 2022, 3:59 AM

#

Oh, it could also be happening because when you groupby you have no data. Then when you call .mean() you get NaN.

worthy hollow Dec 30, 2022, 4:05 AM

#

nah its not

serene scaffold Dec 30, 2022, 4:06 AM

#

@queen cradle welcome to our wonderful data science chat waveboye

queen cradle Dec 30, 2022, 4:06 AM

#

serene scaffold <@710929945526009897> welcome to our wonderful data science chat <:waveboye:5856...

Thank you

serene scaffold Dec 30, 2022, 4:07 AM

#

how did you find this channel immediately after joining, anyway? thinkPeepo

queen cradle Dec 30, 2022, 4:07 AM

#

Um, I scrolled down?

serene scaffold Dec 30, 2022, 4:08 AM

#

good to know. (some people complain about the findability of our channels.)

worthy hollow Dec 30, 2022, 4:08 AM

#

I think it might be clearer with this little explanation:

#

"I want to divide (mean of 2018-2019) / (mean of 2020-2021) for the "chiffre_affaires" column... But it's complicated because it's all inside the same DataFrame, and I want to store the result in the "variation_ca" column."

#

queen cradle Dec 30, 2022, 4:10 AM

#

serene scaffold good to know. (some people complain about the findability of our channels.)

With a server this big, I assumed when I joined that there would be lots of channels I wouldn't be interested in; I just kept going until I found some that looked promising. But I'm not new to Discord, and I think the long list of channels might have been more difficult for me to parse if I were.

serene scaffold Dec 30, 2022, 4:11 AM

#

queen cradle With a server this big, I assumed when I joined that there would be lots of chan...

thanks for your feedback 👍

lapis sequoia Dec 30, 2022, 4:17 AM

#

can someone help me figure out why this is not working?

#

this is how br looks

#

br.set_index(["area_name","dat"]).stack("area_name")
@serene scaffold
I think it's related to the area_name column, because a plain set_index("area_name").stack() is also not working for that.
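A minimal sketch of what may be going on, assuming `br` has `area_name` and `dat` columns plus some values (the real frame was only shown as a screenshot, so this stand-in is hypothetical): `DataFrame.stack(level)` moves a level of the *column* labels into the index, so once `area_name` is in the index there is no column level with that name to stack. `unstack` goes the other way and turns an index level into columns, which may be what was intended.

```python
import pandas as pd

# Hypothetical stand-in for `br` (the real data was only shown as a screenshot)
br = pd.DataFrame({
    "area_name": ["North", "North", "South", "South"],
    "dat": ["2022-01", "2022-02", "2022-01", "2022-02"],
    "value": [1, 2, 3, 4],
})

indexed = br.set_index(["area_name", "dat"])

# stack() pivots a *column* level into the index; "area_name" is now an index level
stack_failed = False
try:
    indexed.stack("area_name")
except (KeyError, IndexError, ValueError) as exc:
    stack_failed = True
    print("stack failed:", exc)

# unstack() pivots an *index* level into the columns instead
wide = indexed["value"].unstack("area_name")
print(wide)
```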

queen cradle Dec 30, 2022, 4:18 AM

#

worthy hollow " **I want to divide: (mean of 2018-2019) / (mean of 2020-2021) from the "chiffr...

In the code you posted, df_variation is a new DataFrame unrelated to df. It looks like df doesn't have chiffre_affaires or resultat_operationnel columns and like df_variation only has those two columns. But in the picture, it looks like you have one large data frame with everything. Also, I can see a df4. So I think you must have done something that you haven't shown us.

#

I recommend restarting your analysis and inserting print statements.

worthy hollow Dec 30, 2022, 4:22 AM

#

ok i found something but got a new error now

#

here's the dataframe i have

#

      variation_ca  variation_op  chiffre_affaires  resultat_operationnel  \
date                                                                        
2018           0.0           0.0          60829.73               15099.83   
2019           0.0           0.0          64165.38               17662.06   
2020           0.0           0.0          64361.84               19777.96   
2021           0.0           0.0          72046.48               19863.39   

      resultat_net  
date                
2018      10735.20  
2019      13584.73  
2020      15247.05  
2021      15240.81

#

and the code with the error is:

#

# Load financial data for companies in the pharmaceutical sector
df = pd.read_csv(r'data//income_statement.csv')

# Select the columns to include in the analysis
cols = ['entreprise', 'date', 'chiffre_affaires', 'resultat_operationnel', 'resultat_net']
df = df[cols]
df['date'] = pd.to_datetime(df['date'], format='%Y', errors='ignore')

# Filter the data to separate the years before COVID-19 (2019 and earlier) from the years including COVID-19 (2020 and later)
df_avant_covid = df[df['date'].dt.year < 2020]
df_apres_covid = df[df['date'].dt.year >= 2020]

# Compute the annual mean revenue and operating income for each company, before and after COVID-19
df_avant_covid = df_avant_covid.groupby([df_avant_covid['date'].dt.year]).mean()
df_apres_covid = df_apres_covid.groupby([df_apres_covid['date'].dt.year]).mean()

# Compute the change in revenue and operating income between the pre- and post-COVID-19 periods
df_variation = pd.DataFrame()
df_variation['variation_ca'] = df_apres_covid['chiffre_affaires'] / df_avant_covid['chiffre_affaires'] - 1
df_variation['variation_op'] = df_apres_covid['resultat_operationnel'] / df_avant_covid['resultat_operationnel'] - 1

data = [df_variation, df_avant_covid, df_apres_covid]
df4 = pd.concat(data)
df4 = df4.iloc[4:, :]
df4 = df4.fillna(0)

# Select the rows to use for the division
df4 = df4.set_index("date")
row_2018_2019 = df4.loc[['2018', '2019'], 'chiffre_affaires']
row_2021_2020 = df4.loc[['2021', '2020'], 'chiffre_affaires']

# Divide the selected rows
df4["variation_ca"] = row_2021_2020 / row_2018_2019

df4

#

brings me this error idk why: ```py

KeyError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_12880/950567047.py in <module>
26
27 # Select the rows to use for the division
---> 28 df4 = df4.set_index("date")
29 row_2018_2019 = df4.loc[['2018', '2019'], 'chiffre_affaires']
30 row_2021_2020 = df4.loc[['2021', '2020'], 'chiffre_affaires']

c:\Users\PEGON\anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
309 stacklevel=stacklevel,
310 )
--> 311 return func(*args, **kwargs)
312
313 return wrapper

c:\Users\PEGON\anaconda3\lib\site-packages\pandas\core\frame.py in set_index(self, keys, drop, append, inplace, verify_integrity)
5449
5450 if missing:
-> 5451 raise KeyError(f"None of {missing} are in the columns")
5452
5453 if inplace:

KeyError: "None of ['date'] are in the columns"```
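A minimal sketch of the KeyError, rebuilding only the revenue column from the table posted above (an assumption; `df4` in the question also has other columns): after `groupby(...dt.year).mean()`, the year is the index (named `date`), not a column, so `set_index("date")` has nothing to find. `df4.reset_index()` would turn it back into a column; alternatively, skip `set_index` and select with integer labels, since the index holds ints rather than the strings `'2018'`, `'2019'`. Dividing the two 2-row Series directly would again produce NaN because their labels differ, so the sketch divides the two period means instead.

```python
import pandas as pd

# Revenue column from the df4 table above; after groupby(...dt.year).mean()
# the year is the index (named "date"), not a column
df4 = pd.DataFrame(
    {"chiffre_affaires": [60829.73, 64165.38, 64361.84, 72046.48]},
    index=pd.Index([2018, 2019, 2020, 2021], name="date"),
)

print("date" in df4.columns)  # False: this is why set_index("date") raises KeyError
print(df4.index.name)         # date

# The labels are integers, so select with 2018, 2019 rather than '2018', '2019'
mean_2018_2019 = df4.loc[[2018, 2019], "chiffre_affaires"].mean()
mean_2020_2021 = df4.loc[[2020, 2021], "chiffre_affaires"].mean()

# Dividing the two means (scalars) sidesteps label alignment entirely
ratio = mean_2018_2019 / mean_2020_2021  # the division described in the question
print(round(ratio, 4))
```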

verbal venture Dec 30, 2022, 5:01 AM

#

how much data is usually needed for very effective CNNs? And does the accuracy of an algorithm come down to just how much data you have, or to the type of algorithm you're creating?

hasty mountain Dec 30, 2022, 5:37 AM

#

I've read that Relativistic Discriminators tend to perform better in GANs, and now that I've implemented it, my discriminator simply won't learn anything...nice.

hasty mountain Dec 30, 2022, 5:38 AM

#

verbal venture how much data is usually needed for very effective CNNs? and does the accuracy o...

Accuracy should generally get higher the more data you have, unless your model is overfitting (in which case it comes down to the type of algorithm you're creating).

And people tend to use tens of thousands of samples, from what I've seen so far.

verbal venture Dec 30, 2022, 5:39 AM

#

hasty mountain The accuracy should get higher with the more data you have, unless your algorith...

but what's the difference between 10,000 and 1M ?

#

would the same algorithm perform way better on 1M images?

hasty mountain Dec 30, 2022, 5:39 AM

#

verbal venture would the same algorithm perform way better on 1M images?

Probably

#

1M images = more features to learn, more ways to generalize

#

A person who only learned around 100 words will have way more difficulty in communicating and developing social skills than someone who has more than 25,000 words in his vocabulary

hasty mountain Dec 30, 2022, 6:13 AM

#

hasty mountain I've read that Relativistic Discriminators tend to perform better in GANs, and n...

Now it's working...
Why does initializing my weights with too low a std lead to vanishing gradients, though? It doesn't seem to make sense.
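A rough NumPy illustration of why, ignoring activation functions (the width, depth and seed are arbitrary assumptions): backpropagating through a linear layer `y = W @ x` multiplies the gradient by `W.T`, and with weights drawn from N(0, std²) each layer scales the gradient norm by roughly `std * sqrt(width)`. With std = 0.01 and width 256 that factor is about 0.16, so after 10 layers almost nothing is left; scaling std as 1/sqrt(fan_in) keeps the factor near 1. The same shrinkage hits the forward activations, and since weight gradients are proportional to each layer's inputs, they shrink too. Saturating activations like sigmoid (derivative at most 0.25) make it worse, which is also why He-style initialisation uses variance 2/fan_in for ReLU.

```python
import numpy as np

def backprop_norms(std, width=256, depth=10, seed=0):
    """Norm of a gradient as it is pushed back through `depth` linear layers."""
    rng = np.random.default_rng(seed)
    weights = [rng.normal(0.0, std, size=(width, width)) for _ in range(depth)]
    grad = rng.normal(size=width)  # gradient arriving at the last layer's output
    norms = [np.linalg.norm(grad)]
    for w in reversed(weights):
        grad = w.T @ grad          # chain rule for y = W @ x: dL/dx = W.T @ dL/dy
        norms.append(np.linalg.norm(grad))
    return norms

small = backprop_norms(std=0.01)
scaled = backprop_norms(std=1.0 / np.sqrt(256))  # 1/sqrt(fan_in) scaling

for name, norms in [("std=0.01", small), ("std=1/sqrt(256)", scaled)]:
    print(f"{name}: gradient norm changes by a factor of {norms[-1] / norms[0]:.2e}")
```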
||Also thanks ChatGPT...someday I'll have an AI better than you.||

silk knot Dec 30, 2022, 6:34 AM

#

Does anyone know how to get only one line as the output for this?

s1 = """
Apples Oranges Grapes
White Black Red Green"""
s2 = "Apples"

print(s1[s1.index(s2) + len(s2):])

Output:

Oranges Grapes
White Black Red Green

I want the output to be Oranges Grapes (which is only the rest of the line after the word).

#

nvm i got it
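For anyone landing here with the same question, one way to do it (a sketch; there are others, such as `splitlines()[0]`): `str.partition("\n")` splits at the first newline only, so its first element is just the rest of the current line.

```python
s1 = """
Apples Oranges Grapes
White Black Red Green"""
s2 = "Apples"

start = s1.index(s2) + len(s2)
# partition("\n") splits at the first newline only; [0] is the text before it
rest_of_line = s1[start:].partition("\n")[0].strip()
print(rest_of_line)  # Oranges Grapes
```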

dull carbon Dec 30, 2022, 7:40 AM

#

#

How can I solve this?

fading zealot Dec 30, 2022, 11:46 AM

#

any data scientists here?

#

@charred light can you suggest the dataset

#

with the dataset I used, the accuracy is -0.44

#

are you there ?

steady basalt Dec 30, 2022, 12:11 PM

#

fading zealot the dataset i have used the accuracy is -0.44

Negative accuracy!

#

Ur predictions in a black hole or smtn

fading zealot Dec 30, 2022, 12:12 PM

#

ok

#

@steady basalt can you help with the datasets

#

📎 ElectricCarPrices.csv 📎 CO2_emission.csv 📎 Air_Polution.csv

#

these are the three datasets

#

I am finding it difficult to choose the features

#

EV dataset has only 10 rows

#

I need at least a big dataset to calculate r2

#

R-squared

keen notch Dec 30, 2022, 12:15 PM

#

does anyone know why I get this error?

#

import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp

def acc_func(t, vals):
    x, y, vx,vy = vals
    acc_x = -2.0 * y**2 * x * (1 - (x**2)) * np.exp(- (x**2 + y**2))
    acc_y = -2.0 * x**2 * y * (1 - (y**2)) * np.exp(- (x**2 + y**2))
    return np.array([vx, vy,acc_x,acc_y],dtype=object)


def trajectory(impactpar, speed):
    maxtime = 10.0 / speed
    t = np.linspace(0, maxtime, 300)
    x0 = impactpar
    y0 = -2.0
    vx0 = 0.0
    vy0 = speed
    vals =  np.array([x0, y0,vx0,vy0],dtype=object)
    acc = solve_ivp(acc_func, (0.,300.), vals, t_eval=t)
    x = acc.y[0]
    y = acc.y[1]
    return x, y

x, y = trajectory(0.15, 0.1)
# Plot the trajectory
plt.plot(x, y)
plt.xlabel("x")
plt.ylabel("y")
plt.show()

# # Solution to part (b)
def scatterangles(allb, speed):
    angles = []
    for impactpar in allb:
        x, y = trajectory(impactpar, speed)
        vx = x[-1]
        vy = y[-1]
        angle = np.arctan2(vy, vx)
        angles.append(angle)
    return angles
allb = np.arange(-2, 2, 0.001)
angles = scatterangles(allb, 0.1)
plt.plot(allb, angles)
plt.xlabel("Impact parameter")
plt.ylabel("Scatter angle")
plt.show()
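A possible revised sketch, assuming the intended model is the `acc_func` above with state `[x, y, vx, vy]`: it keeps `acc_func` and the initial conditions, but uses plain float arrays instead of `dtype=object`, ends `t_span` where `t_eval` ends, and computes the scatter angle from the final *velocity* (rows 2 and 3 of `sol.y`) rather than the final position, which the original `scatterangles` uses. One possible source of a range error: `solve_ivp` raises a `ValueError` when values in `t_eval` fall outside `t_span`, which the original fixed span `(0., 300.)` triggers whenever `speed < 1/30`. The tolerances and the coarser 41-point impact-parameter grid are assumptions chosen to keep it fast; this is not verified against the exact error mentioned later.

```python
import numpy as np
from scipy.integrate import solve_ivp

def acc_func(t, vals):
    # State vector is [x, y, vx, vy]; return its time derivative
    x, y, vx, vy = vals
    acc_x = -2.0 * y**2 * x * (1 - x**2) * np.exp(-(x**2 + y**2))
    acc_y = -2.0 * x**2 * y * (1 - y**2) * np.exp(-(x**2 + y**2))
    return [vx, vy, acc_x, acc_y]

def trajectory(impactpar, speed, n_points=300):
    maxtime = 10.0 / speed
    t = np.linspace(0.0, maxtime, n_points)
    # Plain float initial state: start at y = -2 moving in +y
    y0 = np.array([impactpar, -2.0, 0.0, speed], dtype=float)
    # t_span ends where t_eval ends, so every requested time lies inside it
    sol = solve_ivp(acc_func, (0.0, maxtime), y0, t_eval=t, rtol=1e-8, atol=1e-10)
    return sol.y  # rows: x, y, vx, vy

def scatterangles(allb, speed):
    angles = []
    for impactpar in allb:
        _, _, vx, vy = trajectory(impactpar, speed)
        # Direction of the final velocity, not the final position
        angles.append(np.arctan2(vy[-1], vx[-1]))
    return np.array(angles)

allb = np.linspace(-2.0, 2.0, 41)  # coarser than the 4000-point grid above
angles = scatterangles(allb, 0.1)
```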

#

dusky finch Dec 30, 2022, 12:18 PM

#

worthy hollow " **I want to divide: (mean of 2018-2019) / (mean of 2020-2021) from the "chiffr...

Try the .divide() function instead of /

keen notch Dec 30, 2022, 12:26 PM

#

I'm happy to provide the question so it makes more sense

keen notch Dec 30, 2022, 12:49 PM

#

my graph looks like this👀 😂

#

sorry for the spam, but it looks better now; just the range error remains :(

#

I think it might be i need to use solve_ivp

arctic wedgeBOT Dec 30, 2022, 12:58 PM

#

Hey @keen notch!

You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.

unique ridge Dec 30, 2022, 1:15 PM

#

Hey, I think this is the perfect place to ask my question, so let's give it a shot. I am building a predictive model based on temperature data from my greenhouse. I have data spanning 10 months, from January to October. Every hour, the sensors record attributes such as average temperature inside and outside the greenhouse, average relative humidity, absolute humidity, and average moisture deficit. I have plotted my data and I see some points that are quite high. Because of this, I want to detect and remove outliers, but I don't know the best approach for this use case, mainly because the data represent real measurements and the attributes depend on each other somewhat. Any advice on how best to approach this?

runic patrol Dec 30, 2022, 1:22 PM

#

hey guys, I started learning data science in Sep 2022, and so far I'm done with Python, pandas, NumPy, GUIs, databases, and graphs and charts. Although data science is a very vast topic, I'm planning to learn up to machine learning, which includes statistics, EDA & feature engineering, ML, PCA, NLP, and time series analysis, followed by end-to-end ML projects. Will this be enough to get an entry-level job in the ML field? I will keep working on data science alongside my current job. My background is civil engineering.

opaque bay Dec 30, 2022, 1:25 PM

#

Hi

#

I need source code for an Instagram scraper which scrapes an account's followers and gets their emails

runic patrol Dec 30, 2022, 1:27 PM

#

opaque bay I need source code for an Instagram scraper which scrapes an account's followers...

can we do this? Isn't that hacking?

opaque bay Dec 30, 2022, 1:27 PM

#

Web scraping

runic patrol Dec 30, 2022, 1:27 PM

#

ohk

unique ridge Dec 30, 2022, 1:29 PM

#

opaque bay I need source code for an Instagram scraper which scrapes an account's followers...

I don't think this is the right channel to ask.

steady basalt Dec 30, 2022, 1:34 PM

#

fading zealot EV dataset has only 10 rows

lol…..

fading zealot Dec 30, 2022, 1:34 PM

#

yes

#

can you just recommend me

#

which features i can take into consideration and do feature engineering

steady basalt Dec 30, 2022, 1:35 PM

#

What is useful

#

Also, you have 10 rows; this isn’t good enough unless you’re joining datasets and have a foreign key

fading zealot Dec 30, 2022, 1:35 PM

#

ya

#

what if we take co2 dataset and air pollution dataset

#

what features would you suggest

steady basalt Dec 30, 2022, 1:36 PM

#

What features are there, and how many?

#

Is this how much co2 a car produces?

fading zealot Dec 30, 2022, 1:37 PM

#

fading zealot ok

these are the datasets

steady basalt Dec 30, 2022, 1:37 PM

#

Given car engine, model, price etc?

fading zealot Dec 30, 2022, 1:37 PM

#

no

#

how the increase in co2 and no2 causing global warming

steady basalt Dec 30, 2022, 1:37 PM

#

your target variable is global temperature?

#

What is each sample?

#

A sample of what, factory output?

#

Where did you get this data, and what is the actual data? I’m on mobile

fading zealot Dec 30, 2022, 1:38 PM

#

oh

#

we have 4 data sets

steady basalt Dec 30, 2022, 1:39 PM

#

U can just say what a sample is in the data set u are working on

fading zealot Dec 30, 2022, 1:39 PM

#

global temperature

steady basalt Dec 30, 2022, 1:39 PM

#

Ok predicting global temp based on what

#

It’s time series?

fading zealot Dec 30, 2022, 1:39 PM

#

i want to calculate R-squared and for that I need a lot of data

#

in all the datasets i have

steady basalt Dec 30, 2022, 1:40 PM

#

you need to properly describe your data

fading zealot Dec 30, 2022, 1:40 PM

#

global temperature data set doesnt have countries

#

wait let me send you a pic

steady basalt Dec 30, 2022, 1:40 PM

#

Just one row

fading zealot Dec 30, 2022, 1:42 PM

#

#

steady basalt Dec 30, 2022, 1:42 PM

#

Globe is what ur predicting?

fading zealot Dec 30, 2022, 1:42 PM

#

no

#

i have to use two datasets it is mandatory

steady basalt Dec 30, 2022, 1:43 PM

#

Ok I see

fading zealot Dec 30, 2022, 1:43 PM

#

to test my hypothesis

steady basalt Dec 30, 2022, 1:43 PM

#

Global temp is time series monthly

fading zealot Dec 30, 2022, 1:43 PM

#

i tried but i got -0.44 accuracy

steady basalt Dec 30, 2022, 1:44 PM

#

Time series regression isn’t accuracy based

#

It’s error

#

Did you square the error? What is ur metric?
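For context on the "-0.44 accuracy": that number is most likely R², which is 1 - SS_res/SS_tot and becomes negative whenever the model's squared error is larger than that of always predicting the mean. A minimal NumPy sketch with toy numbers (the helper name `r_squared` is illustrative; `sklearn.metrics.r2_score` computes the same quantity for 1-D targets):

```python
import numpy as np

def r_squared(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # model's squared error
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared error of predicting the mean
    return 1.0 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y, [2.5, 2.5, 2.5, 2.5]))  # 0.0: no better than the mean
print(r_squared(y, [4.0, 3.0, 2.0, 1.0]))  # -3.0: much worse than the mean
```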

fading zealot Dec 30, 2022, 1:44 PM

#

i just want you to recommend what features i can use from two datasets and then will apply linear regression

steady basalt Dec 30, 2022, 1:44 PM

#

U need to understand better what predictions are and how you measure them

fading zealot Dec 30, 2022, 1:45 PM

#

i dont wanna use global temperature

steady basalt Dec 30, 2022, 1:45 PM

#

There’s no point until you understand how we measure regression

#

It’s more important than features

fading zealot Dec 30, 2022, 1:45 PM

#

ok

unique ridge Dec 30, 2022, 1:45 PM

#

unique ridge Hey, I think this is the perfect place to ask my question, so let's give it a sh...

nobody that has an answer on this?

steady basalt Dec 30, 2022, 1:45 PM

#

If you must merge data sets maybe you can show over time the largest co2 producers increasing co2

fading zealot Dec 30, 2022, 1:46 PM

#

if we will take co2 dataset and air pollution

#

which features would be perfect

steady basalt Dec 30, 2022, 1:46 PM

#

unique ridge Hey, I think this is the perfect place to ask my question, so let's give it a sh...

Have you identified extreme outliers?

#

Maybe just a box plot

fading zealot Dec 30, 2022, 1:46 PM

#

merge co2 and air pollution?

unique ridge Dec 30, 2022, 1:46 PM

#

steady basalt Have you identified extreme outliers

What defines an 'extreme outlier' ?

steady basalt Dec 30, 2022, 1:47 PM

#

fading zealot merge co2 and air pollution?

You need to define a goal before you define a method

#

What do you set out to predict

#

Decide that first

runic patrol Dec 30, 2022, 1:47 PM

#

runic patrol hey guys i have started learning for data science from sep 2022 till now im don...

anyone ?

steady basalt Dec 30, 2022, 1:47 PM

#

unique ridge What defines an 'extreme outlier' ?

Something way beyond the IQR, I guess? You can eyeball it and see
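A minimal sketch of the IQR rule mentioned here (Tukey's fences), assuming pandas: points more than 1.5×IQR outside the quartiles are conventionally called outliers, and those beyond 3×IQR "far out" or extreme. The temperatures are hypothetical; for greenhouse data this would be applied per variable, with flagged points reviewed rather than dropped blindly, since real conditions can produce legitimate extremes.

```python
import pandas as pd

def iqr_outliers(series, k=1.5):
    """Boolean mask of points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

# Hypothetical hourly greenhouse temperatures (degrees C) with one suspicious reading
temps = pd.Series([18.2, 18.9, 19.4, 20.1, 19.8, 45.0, 20.3, 19.7])

print(temps[iqr_outliers(temps)])         # k=1.5: ordinary outliers
print(temps[iqr_outliers(temps, k=3.0)])  # k=3.0: only "extreme" ones
```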

keen notch Dec 30, 2022, 1:48 PM

#

does anyone know how to use solve_ivp?

steady basalt Dec 30, 2022, 1:48 PM

#

If there’s a distribution of points and a single point miles out

keen notch Dec 30, 2022, 1:48 PM

#

fading zealot Dec 30, 2022, 1:48 PM

#

yes I am unable to get a good dataset

unique ridge Dec 30, 2022, 1:48 PM

#

steady basalt Something way beyond the iqr I guess? Can eyeball it and see

I can show you some charts? Since it is greenhouse data, the variables do depend on each other.

steady basalt Dec 30, 2022, 1:49 PM

#

It matters then whether you think it’s relevant or not to your model

unique ridge Dec 30, 2022, 1:49 PM

#

fading zealot yes I am unable to get a good dataset

check government websites

fading zealot Dec 30, 2022, 1:49 PM

#

i did but didnt get relevant dataset

unique ridge Dec 30, 2022, 1:49 PM

#

You can maybe combine datasets to get the wanted result.

steady basalt Dec 30, 2022, 1:50 PM

#

fading zealot yes I am unable to get a good dataset

I don’t think global temperature is something easy to predict unless you’re just going with x decades of trend

#

Shits random and has been for millions of years

#

All u can do is say line go up because humans

fading zealot Dec 30, 2022, 1:50 PM

#

lets say we can use co2 data set and air pollution dataset

#

can we say with the increase in temperature of co2 and no2

#

it is causing global warming

unique ridge Dec 30, 2022, 1:51 PM

#

Supermoon, want to see some charts?

steady basalt Dec 30, 2022, 1:51 PM

#

Controversial question

steady basalt Dec 30, 2022, 1:51 PM

#

unique ridge Supermoon, want to see some charts?

Sure

fading zealot Dec 30, 2022, 1:51 PM

#

this is co2 dataset

steady basalt Dec 30, 2022, 1:52 PM

#

fading zealot can we say with the increase in temperature of co2 an no2

Correlation.. causation… it’s theoretical still

#

Not something I’d personally work on

fading zealot Dec 30, 2022, 1:53 PM

#

so features like country and year can be taken into consideration

arctic wedgeBOT Dec 30, 2022, 1:53 PM

#

:incoming_envelope: :ok_hand: applied mute to @unique ridge until <t:1672409033:f> (10 minutes) (reason: attachments rule: sent 7 attachments in 10s).

The <@&831776746206265384> have been alerted for review.

steady basalt Dec 30, 2022, 1:54 PM

#

Ouch

brisk vapor Dec 30, 2022, 1:54 PM

#

!unmute 360683248151429131

arctic wedgeBOT Dec 30, 2022, 1:54 PM

#

:incoming_envelope: :ok_hand: pardoned infraction mute for @unique ridge.

steady basalt Dec 30, 2022, 1:54 PM

#

fading zealot so features like country year can be taken into consideration

Whatever is useful and related to your outcome

unique ridge Dec 30, 2022, 1:54 PM

#

🤣

brisk vapor Dec 30, 2022, 1:54 PM

#

Could you please upload them to some image host instead perhaps? Otherwise our bot will not like it

unique ridge Dec 30, 2022, 1:54 PM

#

I can send them one by one?

brisk vapor Dec 30, 2022, 1:55 PM

#

Sure, but it might take a little bit of time, since the bot looks at a 10-second window.

steady basalt Dec 30, 2022, 1:55 PM

#

Outliers where?

unique ridge Dec 30, 2022, 1:56 PM

#

Hold on, this is the general chart

steady basalt Dec 30, 2022, 1:57 PM

#

Might not want to delete anything based on that

unique ridge Dec 30, 2022, 1:57 PM

#

I can send boxplots too? 🤣

steady basalt Dec 30, 2022, 1:57 PM

#

More useful than time series

unique ridge Dec 30, 2022, 1:58 PM

#

What do you mean?

steady basalt Dec 30, 2022, 1:58 PM

#

How can u decide to delete points based on the series?

#

If it can be legit and caused by something at that time

#

I’d imagine none of those points will appear as outliers on ur box plot
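
For reference, the box-plot rule flags points more than 1.5 × IQR below the first quartile or above the third quartile. A minimal pandas sketch, with `iqr_outliers` as a hypothetical helper name and made-up greenhouse temperatures:

```python
import pandas as pd

def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask marking values outside the box-plot whiskers."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (series < lower) | (series > upper)

# Hypothetical greenhouse temperatures with one suspicious spike
temps = pd.Series([21.4, 21.5, 22.0, 22.3, 21.8, 35.0, 22.1, 21.9])
mask = iqr_outliers(temps)
print(temps[mask])  # only the 35.0 reading (index 5) is flagged
```

`k=1.5` is the conventional Tukey factor; a larger `k` flags only more extreme points. Note the rule ignores time order, so a value that is plausible given outside conditions can still be flagged.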

unique ridge Dec 30, 2022, 1:59 PM

#

It is real data, so it is all legit indeed, but the sensors can do a fuckywucky of course.

steady basalt Dec 30, 2022, 2:00 PM

#

If you believe this is the case, try linear interpolation?

#

Doesn’t look too bad but I don’t know much about greenhouses
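
A minimal sketch of the linear-interpolation suggestion in pandas: mark readings judged to be sensor glitches as missing, then fill them from their neighbours. The readings and the `> 30` glitch rule are hypothetical placeholders for whatever criterion is actually chosen.

```python
import pandas as pd

# Hypothetical readings at a fixed 10-minute interval
idx = pd.date_range("2022-12-30 12:00", periods=6, freq="10min")
temps = pd.Series([21.0, 21.2, 35.0, 21.6, 21.8, 22.0], index=idx)

# Assumed glitch rule for illustration only
suspect = temps > 30
cleaned = temps.mask(suspect)  # replace suspect values with NaN

# Linear interpolation fills the gap from the neighbouring readings
cleaned = cleaned.interpolate(method="linear")
print(cleaned)  # the 35.0 reading becomes 21.4
```

`method="linear"` treats rows as equally spaced; if the timestamps are irregular, `method="time"` weights by the actual time gaps instead.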

unique ridge Dec 30, 2022, 2:03 PM

#

Since I don't know when the sensor did an oopsie, that is the reason I want to find possible outliers. Just keep in mind that if it is warm outside, the temperature and humidity in the greenhouse increase as well. Preventive actions are only taken if the variables change too fast.
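
Because "changing too fast" is the real warning sign here, one option is to flag readings by their step-to-step change rather than by absolute value. A sketch assuming equally spaced readings; `flag_fast_changes` and the `max_step` threshold are hypothetical and would need domain knowledge to set.

```python
import pandas as pd

def flag_fast_changes(series: pd.Series, max_step: float) -> pd.Series:
    """Mark readings whose change from the previous reading exceeds max_step."""
    # diff() is NaN for the first reading, and NaN > max_step evaluates to False
    return series.diff().abs() > max_step

# Hypothetical temperature readings taken at equal intervals
temps = pd.Series([20.0, 20.4, 20.9, 27.5, 21.3, 21.6])
print(flag_fast_changes(temps, max_step=2.0))
```

A single spike gets flagged twice (the jump in and the jump out), and a genuine rapid change, such as vents opening, would be flagged as well, so these are candidates to inspect rather than points to delete automatically.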

steady basalt Dec 30, 2022, 2:03 PM

#

What are you modelling for?

unique ridge Dec 30, 2022, 2:03 PM

#

I basically want to predict the temperature in the greenhouse based on the other 5 attributes.

steady basalt Dec 30, 2022, 2:04 PM

#

In case ur thermostat breaks?

unique ridge Dec 30, 2022, 2:04 PM

#

I am following CRISP-DM, so I am now at the data preparation phase.

#

No, so you can predict the upcoming temperature 😛

steady basalt Dec 30, 2022, 2:04 PM

#

Interesting

unique ridge Dec 30, 2022, 2:05 PM

#

It's a small school project, so nothing too serious, but if I do something, I want it done right (or at least as well as I can).

#

Would you suggest I don't need outlier removal?

steady basalt Dec 30, 2022, 2:07 PM

#

I wouldn’t…

#

But you know glasshouses better than me

#

And it could be a good idea if ur use case aligns with it

#

Maybe ur recorders weren’t faulty

unique ridge Dec 30, 2022, 2:12 PM

#

What I could maybe do is select the 10 highest values of each attribute and check whether the values of the other attributes match up?
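
The "10 highest values per attribute" check maps directly onto `DataFrame.nlargest`. A sketch with a hypothetical helper and made-up columns; `n=2` is used only because the toy table is small.

```python
import pandas as pd

def top_rows_per_attribute(df: pd.DataFrame, n: int = 10) -> dict:
    """For each column, return the n rows with the highest values in that column."""
    return {col: df.nlargest(n, col) for col in df.columns}

# Hypothetical greenhouse data; the real columns will differ
df = pd.DataFrame({
    "temperature": [21.0, 22.5, 30.1, 21.7, 23.0],
    "humidity": [60, 65, 40, 62, 70],
    "outside_temp": [10.0, 12.0, 11.0, 9.5, 14.0],
})

for col, rows in top_rows_per_attribute(df, n=2).items():
    print(f"--- top {col} ---")
    print(rows)
```

Because each result keeps the full rows, the other attributes can be compared side by side. In this toy table the hottest row (index 2) has the lowest humidity and an unremarkable outside temperature, which is the kind of mismatch worth a closer look.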

steady basalt Dec 30, 2022, 2:13 PM

#

Is it possible to test ur model now without doing so, to see how close it is?

#

Predicting the next day 15 readings

#

Say

#

Maybe give it to a Bi-LSTM and then predict windows
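
Sequence models such as a (Bi-)LSTM are usually trained on fixed-length windows: the last `n_in` readings as input and the next `n_out` readings as target. This sketch covers only that data-preparation step, with no model; `make_windows` is a hypothetical helper and `np.arange` stands in for real readings.

```python
import numpy as np

def make_windows(series: np.ndarray, n_in: int, n_out: int):
    """Split a 1-D series into (input window, target window) pairs."""
    X, y = [], []
    for start in range(len(series) - n_in - n_out + 1):
        X.append(series[start:start + n_in])
        y.append(series[start + n_in:start + n_in + n_out])
    return np.array(X), np.array(y)

readings = np.arange(10, dtype=float)  # stand-in for temperature readings
X, y = make_windows(readings, n_in=4, n_out=2)
print(X.shape, y.shape)  # (5, 4) (5, 2)
```

Keras LSTM layers expect input shaped `(samples, timesteps, features)`, so a single-feature series would be passed as `X[..., np.newaxis]`; with several attributes, the windows would be built over a 2-D array instead.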

unique ridge Dec 30, 2022, 2:15 PM

#

Yeah, I can iterate through it multiple times: one run with outlier removal and one without.

steady basalt Dec 30, 2022, 2:18 PM

#

Try and see what u get after all preprocessing

#

At predicting the next 5 readings

#

Look at absolute error maybe to get an idea
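
Mean absolute error is easy to compute by hand and is in the same units as the target (°C here), which makes two preprocessing variants easy to compare. The forecasts below are invented placeholders, not results from any model.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred) -> float:
    """Average absolute difference between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Hypothetical forecasts of the next 5 readings from two preprocessing variants
actual = [21.0, 21.4, 21.9, 22.3, 22.6]
variant_a = [21.1, 21.3, 22.0, 22.5, 22.4]
variant_b = [21.3, 21.8, 21.5, 22.9, 23.1]

mae_a = mean_absolute_error(actual, variant_a)
mae_b = mean_absolute_error(actual, variant_b)
print(f"variant A: {mae_a:.2f} °C, variant B: {mae_b:.2f} °C")
```

scikit-learn's `sklearn.metrics.mean_absolute_error` computes the same quantity if that library is already in use.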

#

Are you using all readings at equal intervals to predict temp?

#

If only you had sunlight data too
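
On the equal-intervals question: if the sensor log has irregular timestamps, pandas can put it on a regular grid before modelling. A sketch with a made-up log and an assumed 10-minute interval.

```python
import pandas as pd

# Hypothetical sensor log with irregular timestamps
log = pd.Series(
    [21.0, 21.2, 21.9, 22.4],
    index=pd.to_datetime([
        "2022-12-30 12:00", "2022-12-30 12:04",
        "2022-12-30 12:21", "2022-12-30 12:29",
    ]),
)

# Average the readings within each 10-minute bin,
# then fill empty bins by time-weighted interpolation
regular = log.resample("10min").mean().interpolate(method="time")
print(regular)  # 12:00 -> 21.1, 12:10 -> 21.625 (filled), 12:20 -> 22.15
```

Averaging within bins smooths short spikes, so the bin size is a trade-off between noise and how quickly the model can react.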

raw tulip Dec 30, 2022, 4:24 PM

#

hey!

slate hollow Dec 30, 2022, 5:05 PM

#

the existence of a forward feeding neural network implies the existence of a mysterious, unseen backward feeding neural network

hasty mountain Dec 30, 2022, 5:12 PM

#

Can initializing my weights with a very low standard deviation (2e-5) lead to vanishing gradients, or was this just ChatGPT being delirious?
I've tested it and it seemed to actually make a difference, but I was sleepy back then and I might've changed my residual scaling factor... idk... I don't remember... pithink
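
A quick way to check the claim: push random inputs through a plain tanh MLP and compare the activation scale for std 2e-5 against a Xavier/Glorot-style std of 1/sqrt(fan_in). This sketch assumes no residual connections or normalization layers, and the depth and width are arbitrary.

```python
import numpy as np

def final_activation_std(init_std: float, depth: int = 6, width: int = 256, seed: int = 0) -> float:
    """Push random inputs through `depth` tanh layers and return the final activation std."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(64, width))
    for _ in range(depth):
        w = rng.normal(scale=init_std, size=(width, width))
        x = np.tanh(x @ w)
    return float(x.std())

tiny = final_activation_std(2e-5)
xavier = final_activation_std(1.0 / np.sqrt(256))  # Xavier/Glorot-style scale
print(tiny, xavier)
```

With std 2e-5 and width 256, each layer scales the signal by roughly 2e-5 × sqrt(256) ≈ 3.2e-4, so it is effectively zero after a few layers. Backpropagated gradients pass through the same small weight matrices, so they shrink in the same way. Residual blocks give the signal an identity path around those layers, so the effect depends on the architecture, including the residual scaling factor.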

raw tulip Dec 30, 2022, 5:47 PM

#

I was hoping to see more discussions about GPT and AI here. Maybe someone could recommend another channel?

serene scaffold Dec 30, 2022, 6:35 PM

#

raw tulip I was hoping to see more discussions about GPT and AI here, maybe anyone could p...

Everyone here is already talking about AI. And there's more to AI than gpt.

#

The message before yours even mentions ChatGPT

digital hazel Dec 30, 2022, 7:03 PM

#

When bringing my TensorFlow model code into an API, do I have to save the model and load it in the API file? I understand that saving the model saves its trained weights (and therefore its accuracy), but assuming I keep the same parameters and code in the API code, shouldn't the model reach around the same accuracy after fitting it?

#

In other words, why can't I just copy and paste the same code that trains/fits the model into the API code file, as it only runs one time when the server is setting up?

#

Nvm, I realize now that the model takes a long time to train, so training it once and saving it saves you a lot of time.
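
The train-once, load-at-startup pattern looks roughly like this in Keras. The toy model, its data, and the `model.keras` file name are hypothetical; the `.keras` single-file format assumes a recent TensorFlow/Keras version.

```python
import numpy as np
import tensorflow as tf

MODEL_PATH = "model.keras"  # hypothetical file name

def train_and_save(path: str = MODEL_PATH) -> tf.keras.Model:
    """Run once, offline: train a toy model and write it to disk."""
    rng = np.random.default_rng(0)
    x = rng.random((256, 3)).astype("float32")
    y = x.sum(axis=1, keepdims=True)
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(3,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(x, y, epochs=5, verbose=0)
    model.save(path)  # stores architecture, weights and optimizer state
    return model

def load_for_api(path: str = MODEL_PATH) -> tf.keras.Model:
    """Run at server startup: no training, just load the saved model."""
    return tf.keras.models.load_model(path)

if __name__ == "__main__":
    trained = train_and_save()
    served = load_for_api()
    sample = np.array([[0.1, 0.2, 0.3]], dtype="float32")
    print(trained.predict(sample, verbose=0), served.predict(sample, verbose=0))
```

Besides saving time, loading the saved file means the API serves exactly the weights that were evaluated, whereas retraining at startup can give a slightly different model each time because of random initialization and shuffling.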

serene scaffold Dec 30, 2022, 7:09 PM

#

digital hazel Nvm realize now that the model takes long to train so training it one time and s...

ya gotta retrain the model before each prediction
no batching, either
KEK

digital hazel Dec 30, 2022, 7:13 PM

#

Gotcha thank you

#

Idk why I didn't realize that

serene scaffold Dec 30, 2022, 7:19 PM

#

digital hazel Idk why I didnt realize that

to be clear, I was just being silly, since you seem to have figured out on your own that you can save and load models.

digital hazel Dec 30, 2022, 7:24 PM

#

No, I understand, no worries. It just took me one Google search of "why do I need to save models" 🤦‍♂️

rancid sorrel Dec 30, 2022, 7:44 PM

#

anyone got a good guide for how to hook up an AAN to input/output that's dynamic?

wintry geode Dec 30, 2022, 7:54 PM

#

Hello, I am using a module called chatterbot and I am trying to see what the confidence of the chatbot’s response is. Does anyone know how?

serene scaffold Dec 30, 2022, 8:06 PM

#

wintry geode Hello, I am using a module called **chatterbot** and I am trying to see what the...

I'm trying to install chatterbot so I can figure that out for you, but I suspect that each response from the bot isn't a string but some kind of Response object

hasty mountain Dec 30, 2022, 8:31 PM

#

Curious... I thought pretraining my Generator with an L1 loss would mess up the adversarial training with the Discriminator, but it seems to actually do no harm at all... so far, at least
EDIT: during adversarial training, the Discriminator simply messes up the Generator's pretrained weights py_guido

#

I'm almost becoming a GAN researcher... too bad I still couldn't get any decent results

steady basalt Dec 30, 2022, 8:55 PM

#

rancid sorrel anyone got a good guide for how to hook up an AAN to input/output thats dynamic?

Wdym?

steady basalt Dec 30, 2022, 8:55 PM

#

hasty mountain Curious...I thought pretraining my Generator with a L1 Loss would mess the adver...

How hard is it to build a good GAN model?

rancid sorrel Dec 30, 2022, 8:55 PM

#

we get a lot of guides about how to parse data into an AAN, but we don't get many guides on how to hook them up as, say, a control system

#

like for example a self driving car

steady basalt Dec 30, 2022, 8:56 PM

#

rancid sorrel we get a lot of guides about how to parse data into a AAN, we dont get many guid...

Control system??

#

Oh right

#

Well that’s just engineering

rancid sorrel Dec 30, 2022, 8:56 PM

#

well, the coding part

steady basalt Dec 30, 2022, 8:56 PM

#

Once you have the model you can deploy it

#

For instance, you can use the model you’ve trained in an app

rancid sorrel Dec 30, 2022, 8:56 PM

#

like for example
input A ->>> AAN ->>> output B

hasty mountain Dec 30, 2022, 8:57 PM

#

steady basalt How hard is it to build a good gan model

Apparently, it isn't that hard if you're a random guy making a tutorial on the internet, but it's proving a bit hard for me grumpchib

rancid sorrel Dec 30, 2022, 8:57 PM

#

i want more examples of parts A and B

steady basalt Dec 30, 2022, 8:57 PM

#

Data will come from somewhere, for me it’s cloud based. For your self driving car, I suppose an app will stream images

#

That’s probably very complex software

#

So I can only explain on a more basic level

#

You can do a batch or a real time app

#

Once you’ve built a model, it’s all app building, and retraining to account for drift
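
To make "retraining to account for drift" a bit more concrete, one simple check compares a feature's recent values against the values the model was trained on, here with SciPy's two-sample Kolmogorov-Smirnov test. The choice of test, the `alpha` threshold, and the synthetic data are assumptions for illustration; production setups often track several features and metrics.

```python
import numpy as np
from scipy.stats import ks_2samp

def has_drifted(reference: np.ndarray, recent: np.ndarray, alpha: float = 0.01) -> bool:
    """Two-sample KS test: True if `recent` looks unlike `reference`."""
    result = ks_2samp(reference, recent)
    return bool(result.pvalue < alpha)

rng = np.random.default_rng(0)
training_feature = rng.normal(loc=0.0, scale=1.0, size=1000)  # data the model was trained on
shifted_feature = rng.normal(loc=1.0, scale=1.0, size=1000)   # same feature after a shift

print(has_drifted(training_feature, shifted_feature))  # True: the mean moved by one std
```

A positive result is a signal to investigate and possibly retrain; with large samples, even tiny, harmless shifts can become statistically significant.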

rancid sorrel Dec 30, 2022, 8:58 PM

#

My dissertation is on ML with cybersecurity. Honestly, all the material so far is about importing static data

steady basalt Dec 30, 2022, 8:59 PM

#

There is some data engineering and possibly ml ops you will need to learn then

rancid sorrel Dec 30, 2022, 8:59 PM

#

steady basalt There is some data engineering and possibly ml ops you will need to learn then

Very much so. Got any places I can start?

steady basalt Dec 30, 2022, 8:59 PM

#

Data engineering will be getting the data to the model so it can retrain, and for MLOps you will need to work out how to deploy it to produce results

#

I mean I started on the job with Azure Pipelines

#

What is your objective?

#

Don’t do something random as it’s harder

rancid sorrel Dec 30, 2022, 9:00 PM

#

Honestly, I've got like 90% of a comp sci degree in me, so I can code whatever is needed,

steady basalt Dec 30, 2022, 9:00 PM

#

In which case

rancid sorrel Dec 30, 2022, 9:00 PM

#

Basically making an ML-powered bot

steady basalt Dec 30, 2022, 9:00 PM

#

You can try to simulate a data stream

#

So generate shit with a script
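
A minimal sketch of "simulate a data stream with a script" wired into the input A → model → output B loop described earlier. The reading format, the rule-based `model_predict` stand-in, and the alert threshold are all hypothetical; in practice the stand-in would be replaced by a call to the trained model's predict method.

```python
import random
import time
from typing import Iterator

def sensor_stream(n: int, seed: int = 0, delay: float = 0.0) -> Iterator[dict]:
    """Hypothetical data source: yields fake readings one at a time."""
    rng = random.Random(seed)
    for i in range(n):
        if delay:
            time.sleep(delay)  # mimic readings arriving in real time
        yield {"id": i, "value": rng.gauss(0.0, 1.0)}

def model_predict(reading: dict) -> str:
    """Stand-in for a trained model: maps input A to output B."""
    return "alert" if abs(reading["value"]) > 2.0 else "ok"

def run(n: int = 20) -> list:
    outputs = []
    for reading in sensor_stream(n):
        action = model_predict(reading)  # input A -> model -> output B
        outputs.append((reading["id"], action))
    return outputs

if __name__ == "__main__":
    for reading_id, action in run():
        print(reading_id, action)
```

Swapping the generator for a real source (a socket, message queue, or log tail) leaves the loop unchanged, which is the main point of structuring it this way.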

#

Can you build apps?

Load the dataset

Define a list of models to compare

Iterate over the models and print their mean cross-validation score
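
The three comment lines above describe a common scikit-learn pattern, but the code itself was not preserved. A sketch of it, using the iris dataset and three arbitrary classifiers as stand-ins for the unknown originals:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Load the dataset (iris is a stand-in; the original dataset is unknown)
X, y = load_iris(return_X_y=True)

# Define a list of models to compare
models = [
    ("logistic_regression", LogisticRegression(max_iter=1000)),
    ("decision_tree", DecisionTreeClassifier(random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

# Iterate over the models and print their mean cross-validation score
scores = {}
for name, model in models:
    cv_scores = cross_val_score(model, X, y, cv=5)
    scores[name] = cv_scores.mean()
    print(f"{name}: {scores[name]:.3f}")
```

For classifiers, `cross_val_score` defaults to accuracy; pass `scoring=...` to compare on a different metric.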

Loading the financial data of companies in the pharmaceutical sector

Selecting the columns to include in the analysis

Filtering the data to keep only the years before COVID-19 (2019 and earlier) and those including COVID-19 (2020 and later)

Computing the annual mean of revenue and operating income for each company, before and after COVID-19

Computing the change in revenue and operating income between the pre- and post-COVID-19 periods

Displaying the changes in revenue and operating income for each company

Creating a chart comparing the changes in revenue and operating income for each company
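
The steps above describe a pandas pipeline whose code (and the plotting error) was not preserved, so this cannot diagnose that error. It is a sketch of the same steps, assuming hypothetical columns `company`, `year`, `revenue`, and `operating_income`, toy numbers, and percentage change as the measure of "change" (an absolute difference would be equally valid).

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove to show the chart interactively
import matplotlib.pyplot as plt
import pandas as pd

# Load the financial data (hypothetical toy values)
df = pd.DataFrame({
    "company": ["X", "X", "X", "Y", "Y", "Y"],
    "year": [2018, 2019, 2020, 2018, 2019, 2021],
    "revenue": [100.0, 110.0, 130.0, 50.0, 50.0, 45.0],
    "operating_income": [20.0, 22.0, 30.0, 8.0, 10.0, 6.0],
    "employees": [900, 950, 990, 400, 410, 380],  # not used in the analysis
})

# Select the columns and label pre-COVID (<= 2019) vs COVID-era (>= 2020) years
cols = ["company", "year", "revenue", "operating_income"]
df = df[cols].assign(period=df["year"].le(2019).map({True: "before", False: "after"}))

# Annual mean of each metric per company and period
means = df.groupby(["company", "period"])[["revenue", "operating_income"]].mean().unstack("period")

# Percentage change from the pre-COVID mean to the COVID-era mean
before = means.xs("before", axis=1, level="period")
after = means.xs("after", axis=1, level="period")
change = (after / before - 1) * 100
print(change.round(1))

# Bar chart comparing both changes for each company
ax = change.plot(kind="bar")
ax.set_ylabel("Change vs. pre-COVID mean (%)")
ax.set_title("Revenue and operating income change per company")
plt.tight_layout()
plt.savefig("covid_change.png")  # hypothetical output path
```

Plotting `change` directly works because it is a plain DataFrame indexed by company with one column per metric; plotting the un-flattened `means` with its MultiIndex columns would give a much harder-to-read chart.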

and here's the error when it tries to generate the plot: ```py

It gives me this error, idk why: ```py