#data-science-and-ml
hello, I would like to ask: are there any beginner guide resources on naming variables in data science? For example, this is what I have:
```python
import io

import pandas as pd
from google.colab import files  # Colab-only file upload widget

# load data
uploaded = files.upload()
yelp_business_category_correlation_dataframe = pd.read_csv(io.BytesIO(uploaded['yelp_bussiness_category_correlation.csv']))
yelp_business_category_correlation_dataframe.head(10)

def getBusinessCompetitorsByCategory(business_category):
    ...  # body not shown
```
the csv file I'm loading is a Pearson correlation of businesses by category
and my function will return the top correlated business categories related to the parameter business_category. I put the keyword Competitor in the name since these businesses can be potential competitors
i dont want to name variables like
df = pd.read.. where df stands for dataframe. df is very ambiguous and not readable in the long term
or what i have is okay?
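For what it's worth, PEP 8 suggests snake_case for function and variable names, and a name can say what the data *is* without spelling out its type. A minimal sketch of the competitor lookup, assuming the CSV is a square Pearson correlation matrix with category names as both the row index and the column headers (e.g. read with `pd.read_csv(..., index_col=0)`); the function name, `top_n` parameter and sample categories are hypothetical:

```python
import pandas as pd

def get_competitor_categories(category_correlations: pd.DataFrame,
                              business_category: str,
                              top_n: int = 5) -> pd.Series:
    """Return the top_n categories most correlated with business_category."""
    # One column of the square correlation matrix, minus the category itself
    correlations = category_correlations[business_category].drop(business_category)
    return correlations.sort_values(ascending=False).head(top_n)

# Hypothetical stand-in for the uploaded CSV (categories as index and columns)
category_correlations = pd.DataFrame(
    [[1.0, 0.8, 0.1], [0.8, 1.0, 0.3], [0.1, 0.3, 1.0]],
    index=["Pizza", "Italian", "Bars"],
    columns=["Pizza", "Italian", "Bars"],
)
competitors = get_competitor_categories(category_correlations, "Pizza", top_n=2)
print(competitors)
```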
say I have a 3 layer MLP neural network (2 hidden 1 output). I know the range for my input (0 to 1 for example), but i do not know the possible range for the layer 1 output for example. given i have a set of weights and biases and a defined range for layer 1 inputs, does anyone know if it is possible to determine the maximum possible output for layer 1 for example?
You could use a customized threshold function, if there aren't any suitable activation functions available (tanh, ReLU, ReLU6...)
For outputs from 0 to 1 you could use a sigmoid or softmax function (though I guess a sigmoid between hidden layers might not be recommended due to vanishing gradients)
i was using ReLU for my case
for the hidden neurons that is
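For a single layer this can be computed exactly per neuron with interval arithmetic, assuming each input lies in a box [x_low, x_high]. A linear function w·x + b is largest when each x_j sits at x_high where w_j > 0 and at x_low where w_j < 0, and since ReLU is monotonic, applying it to those pre-activation bounds bounds the output. A minimal NumPy sketch (the function name is hypothetical):

```python
import numpy as np

def relu_layer_bounds(W, b, x_low, x_high):
    """Elementwise bounds on relu(W @ x + b) for x_low <= x <= x_high.

    W has shape (n_out, n_in); b, x_low and x_high are 1-D arrays.
    """
    W_pos = np.clip(W, 0, None)  # positive part of each weight
    W_neg = np.clip(W, None, 0)  # negative part of each weight
    # Max of w.x + b over the box: x_high where w > 0, x_low where w < 0
    pre_high = W_pos @ x_high + W_neg @ x_low + b
    # Min of w.x + b over the box: the opposite corner
    pre_low = W_pos @ x_low + W_neg @ x_high + b
    # ReLU is monotonic, so it maps the bounds to bounds on the activation
    return np.maximum(pre_low, 0), np.maximum(pre_high, 0)

W = np.array([[1.0, -2.0], [0.5, 0.5]])
b = np.array([0.1, -1.0])
out_low, out_high = relu_layer_bounds(W, b, np.zeros(2), np.ones(2))
print(out_low, out_high)
```

Each neuron's maximum is attained at some corner of the input box, but different neurons may need different corners. Feeding these intervals into layer 2 the same way gives valid but possibly loose bounds (usually called interval bound propagation).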
Sigmoid seems to make things quite...unstable. At least I was testing a GAN here and, well...with sigmoid things got quite messy.
hmm, that's interesting
Notation in probability is a mess (especially expected value, it can be ambiguous without more information). I prefer p and Pr. Hats on top for approximation.
In ML they may or may not state what the notation means, or even be consistent. Things are left ambiguous and made unambiguous with the surrounding text (which they sometimes don't have and expect you to know based on some other papers that they are copying in notation (but mixed together, so it may require a lot of inference to decode its meaning (like solving a Sudoku puzzle at times)))
@iron basalt thank you for your input 💚
*Or my favorite, you could only know what they mean by being on the same wavelength / predicting what they are trying to do in the paper because everyone working on similar stuff has convergent ideas.
("culture")
there's a whole reference implementation for the paper I wrote two-ish years ago. but even with that, I'm already regretting some imprecision in how I explained a few points.
Yeah in ML they sometimes kind of give up, and write it for others that are on that same wavelength. In that case it's a very fast way to write it, but terrible for anyone trying to get in on it.
(and it's not a shitty reference implementation. you could reproduce everything in the paper with one bash command, if you have the dataset.)
(in that case one hopes for a reference implementation, hopefully in Python, because it's unlikely that any C++ or other stuff makes any sense and is bug free)
Yeah, I'm taking a look at one Tensorflow implementation of Ian Goodfellow's suggestion
A modified one, the non-saturating version for the generator loss
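A small NumPy illustration of why the non-saturating version is used, writing l for the discriminator's logit on a generated sample so that D(G(z)) = sigmoid(l). The original minimax generator objective minimizes log(1 - D(G(z))), whose gradient with respect to l is -sigmoid(l); the non-saturating loss -log D(G(z)) has gradient sigmoid(l) - 1. Early in training, when the discriminator confidently rejects fakes (l very negative), the first gradient vanishes while the second stays near -1. The function names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def saturating_loss_grad(l):
    # d/dl of log(1 - sigmoid(l)): the minimax generator objective
    return -sigmoid(l)

def non_saturating_loss_grad(l):
    # d/dl of -log(sigmoid(l)): the non-saturating generator loss
    return sigmoid(l) - 1

# Discriminator very sure the sample is fake
l = -10.0
print(saturating_loss_grad(l), non_saturating_loss_grad(l))
```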
Do you mean the 2016 tutorial? https://arxiv.org/pdf/1701.00160.pdf
But I didn't know about this tutorial. Interesting.
You also might want to look into https://en.wikipedia.org/wiki/Wasserstein_GAN if you did not already.
The Wasserstein Generative Adversarial Network (WGAN) is a variant of generative adversarial network (GAN) proposed in 2017 that aims to "improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches".Compared with the original GAN discriminator,...
Ugh, I've read about WGAN, but I'm kind of lazy to try to implement it
Compared to the original GAN algorithm, the WGAN undertakes the following changes:
- After every gradient update on the critic function, we are required to clamp the weights to a small fixed range, usually [−c, c].
- Use a new loss function derived from the Wasserstein distance. The discriminator model does not play as a direct critic but rather a helper for estimating the Wasserstein metric between real and generated data distributions.
Empirically the authors recommended usage of RMSProp optimizer on the critic, rather than a momentum-based optimizer such as Adam which could cause instability in the model training.
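A minimal NumPy sketch of the two changes listed above (the function names are illustrative, not from any library). The defaults follow Algorithm 1 of the WGAN paper as far as I recall it: c = 0.01, RMSProp with learning rate 5e-5, and 5 critic updates per generator update:

```python
import numpy as np

def critic_loss(real_scores, fake_scores):
    # The critic maximizes E[f(real)] - E[f(fake)]; negated here to get a loss to minimize
    return np.mean(fake_scores) - np.mean(real_scores)

def generator_loss(fake_scores):
    # The generator tries to raise the critic's score on generated samples
    return -np.mean(fake_scores)

def clip_weights(params, c=0.01):
    # Weight clipping from the original WGAN: clamp every parameter to [-c, c]
    return [np.clip(p, -c, c) for p in params]

params = [np.array([0.5, -0.5, 0.001])]
print(clip_weights(params))
print(critic_loss(np.array([1.0, 3.0]), np.array([0.0, 0.0])))
```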
Oooh... I see... I didn't know that about Adam
It's a default choice... until it's not.
Now that you mentioned it... I think Goodfellow used a simple SGD with momentum, didn't he?
You are past the beginning here, so default choices need to be reevaluated.
But now it makes sense...especially since GAN papers differ so much in their beta1 and beta2 values for Adam
And I don't have 9,000+ GPUs to spend that much time on trial and error
Thanks!
Yeah, when you don't have ALL the compute, your choices need to be more careful / examined.
/ don't assume that because everyone does it that it's the best.
hello, I would like to ask: is it better to build a recommendation system based on what others like, or to make a simple recommendation based on popular things?
what I want is simply to recommend other popular businesses, but I'm not sure if this is better than recommending what other users like. Recommending popular businesses is just using some math to find the highest rating count and highest rating, compared to the second option where I need to use KNN or something similar
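A minimal sketch of the popularity option, assuming one row per business with hypothetical columns `name`, `rating_mean` and `rating_count`. Rather than sorting by count and rating separately, it uses an IMDb-style weighted (Bayesian) average, which pulls businesses with few ratings towards the global mean so that a single 5-star review doesn't top the list:

```python
import pandas as pd

def popular_businesses(ratings: pd.DataFrame, min_votes_quantile=0.9, top_n=10):
    """Rank businesses by a Bayesian-average rating."""
    C = ratings["rating_mean"].mean()                         # global mean rating
    m = ratings["rating_count"].quantile(min_votes_quantile)  # prior weight, in votes
    v = ratings["rating_count"]
    R = ratings["rating_mean"]
    # Weighted average of the business's own mean and the global mean
    score = (v / (v + m)) * R + (m / (v + m)) * C
    return ratings.assign(score=score).sort_values("score", ascending=False).head(top_n)

ratings = pd.DataFrame({
    "name": ["A", "B", "C"],
    "rating_mean": [5.0, 4.5, 3.0],
    "rating_count": [1, 1000, 500],
})
print(popular_businesses(ratings, top_n=3))
```

Here a business with one 5-star rating ranks below one averaging 4.5 over 1000 ratings. The two approaches can also be combined, e.g. falling back to this list for users with too little history for KNN.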
Why not use both?
Are you using tensorflow?
Wow cool
Taichi is there, but I have not used it, seems fine, assuming it's not buggy. It has limitations as usual though. We have our own stuff so I don't use these open source frameworks except when using other's stuff or when it happens to be a good fit for something. https://www.taichi-lang.org/
Taichi is a domain-specific language embedded in Python that helps you easily write portable, high-performance parallel programs.
I love your job, this is my dream
ah i hadn't heard about taichi before
It's new-ish.
i've never heard of that, it seems pretty interesting
better start studying math 😛 the coding is really kinda secondary
that looks like something i'll definitely start using
The way that it hijacks the Python type hinting syntax is interesting.
Building a language out of Python existing syntax, since Python kind of has everything you need now.
Hi, I have a question regarding evaluating and retraining a model.
Assuming I have the following flow: 1) the user can evaluate the existing deployed model, 2) based on the evaluation, the user can retrain the model, 3) if the model is retrained and its accuracy is better than the evaluation accuracy from step 1, deploy it.
My question here is, for step 3, should I be comparing the newly trained model's accuracy to the accuracy from step 1? Or should I be comparing it to the old model's accuracy from when it was initially trained?
*At least the docs seem to give warnings for limitations, which is nice:
```
WARNING
Taichi only supports fields of dimensions ≤ 8.
```
(I happen to have used 12 dimensional arrays so 😦 )
sounds nasty 😌
what's the difference between the two? how do you evaluate the model in step 1?
(I do like "fields" more than "tensors" actually, might yoink that naming)
(I just call them ndarrays in my stuff, because I like to call data structures exactly what they are)
hmm, I don't think it's a clear nomenclature though, considering "field" is already used for the fields over which one defines vector spaces, or, when working with vector fields, it kind of hints at there being other (possibly spatial) dimensions to which vectors are assigned
i prefer tensor if it's a multilinear transformation, n-way array otherwise
I think Taichi comes from graphics programming, so the name kind of makes sense given that background.
i see... but having to relearn nomenclature AND syntax/API makes it less appealing 😛
But I will just stick with ndarray, like numpy. The actual name is rarely typed since I have factory procedures (functions).
i like jax cuz it looks the same as numpy and the nomenclature is mathematically sound
forgot to add that part, my bad - the user would add new data to the database, and can then initiate an evaluation. I don't really have anything specifically defined for that evaluation, but I assume something like:
```python
# Keras: returns [loss, accuracy] when the model was compiled with metrics=['accuracy']
score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
```
The model is already a .pb and I would load it in, then probably run something like .evaluate() on it. Shouldn't this accuracy then be used for step 3, as this accuracy is the one impacted by the new data?
i don't get your last point
this evaluation is not that different from the validation that was already done when the model was trained
but if the evaluation is done with a lot of new data added to the dataset, couldn't that impact the accuracy?
ie the model with the original dataset may have had an 80% accuracy
but then a large amount of new data is added, model is evaluated, It may have an accuracy of 60%?
ok, that's fair. but by retrain you don't mean from scratch, do you?
I do
*Due to the graphics background Taichi seems to have spatial partitioning trees so one can do voxels, fluid simulations and such. Pretty neat.
you could also treat this as a batching strategy for the data, though in fairness the populations would have different statistical properties
yeah we are a bit limited on time ;-; so some stuff like this is quite rushed
i see. anyway, yeah, that makes sense. evaluate using the new data
just make sure it's split properly so that you don't evaluate on data you will also train on
Could that cause overfitting?
not overfitting, unfair evaluation
you wouldn't even be able to tell if there was overfitting in that case
ok that's pretty handy, yeah
maybe i'll force a student to look at it in some ultrasound simulations :x
Would this still be the case if the new data is added to the old data? I.e., the dataset for retraining will always be the old data plus the new data?
all i mean is that the data you use for evaluation cannot be part of the training data
so when you add new data to do this evaluation, that data can't be used for anything else
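A minimal sketch of that advice, assuming NumPy arrays and models exposed as prediction functions (all names hypothetical): hold out part of the new data, retrain on the old data plus the rest of the new data, and compare the old and new models on the same held-out rows. Evaluating both on identical data is what makes the step 3 comparison fair:

```python
import numpy as np

def split_for_retraining(X_old, y_old, X_new, y_new, test_fraction=0.3, seed=0):
    """Hold out part of the NEW data for evaluation; train on old + the rest of new."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X_new))
    n_test = int(round(test_fraction * len(X_new)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    X_train = np.concatenate([X_old, X_new[train_idx]])
    y_train = np.concatenate([y_old, y_new[train_idx]])
    return X_train, y_train, X_new[test_idx], y_new[test_idx]

def should_deploy(old_predict, new_predict, X_test, y_test):
    """Score both models on the SAME held-out data; deploy only if the new one wins."""
    old_acc = np.mean(old_predict(X_test) == y_test)
    new_acc = np.mean(new_predict(X_test) == y_test)
    return new_acc > old_acc, old_acc, new_acc

X_old, y_old = np.arange(10).reshape(10, 1), np.zeros(10)
X_new, y_new = np.arange(10, 20).reshape(10, 1), np.ones(10)
X_train, y_train, X_test, y_test = split_for_retraining(X_old, y_old, X_new, y_new)
```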
hey is there a numpy function for the gram-schmidt for finding a set of basis vectors?
not directly, but you can use a QR decomp or SVD to achieve a similar effect
QR is probably faster
I guess I'll look into it, otherwise I'll be relying on someone else's gsBasis function I found on GitHub
scipy linalg orth produces an orthonormal basis, and the docs say it does it via SVD as i suggested
like so
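Both suggestions can be sketched with NumPy alone. `np.linalg.qr` gives orthonormal columns spanning the same space when the columns of A are linearly independent, but it does not drop dependent columns; the SVD version below mirrors what the `scipy.linalg.orth` docs describe, discarding directions whose singular values fall below a relative tolerance (the helper name is hypothetical):

```python
import numpy as np

def orthonormal_basis(A, rcond=None):
    """Orthonormal basis for the column space of A via SVD (same idea as scipy.linalg.orth)."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    if rcond is None:
        rcond = np.finfo(float).eps * max(A.shape)
    # Keep only the directions with non-negligible singular values
    rank = int(np.sum(s > rcond * s.max()))
    return U[:, :rank]

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
Q_qr, _ = np.linalg.qr(A)  # fine here because A has full column rank
Q_svd = orthonormal_basis(A)
```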
yeah the GitHub function uses a similar method, but I am not sure if it is robust:
```python
import numpy as np
import numpy.linalg as la

verySmallNumber = 1e-14  # tolerance below which a vector counts as zero (value assumed)

def gsBasis(A):
    B = np.array(A, dtype=np.float64)  # Make B a copy of A, since we're going to alter its values.
    # Loop over all vectors, starting with zero, label them with i
    for i in range(B.shape[1]):
        # Inside that loop, loop over all previous vectors, j, to subtract.
        for j in range(i):
            # Subtract the overlap of the current vector B[:, i] with the previous vector B[:, j]
            B[:, i] = B[:, i] - B[:, i] @ B[:, j] * B[:, j]
        # Normalise B[:, i], or zero it out if it is (numerically) linearly dependent
        if la.norm(B[:, i]) > verySmallNumber:
            B[:, i] = B[:, i] / la.norm(B[:, i])
        else:
            B[:, i] = np.zeros_like(B[:, i])
    # Finally, we return the result:
    return B
```
yeah this looks like vanilla gram schmidt, might have some numerical stability issues
i'd just use the scipy one if you don't wanna make a robust one yourself
thank you
Hey everyone, I am trying to find answers to a strange problem in TensorFlow Keras where the exported weights don't seem to produce the same results, and I am trying to bring some attention to the problem to see if anyone can help me understand why this is happening. There is a GitHub issue about the problem here: https://github.com/keras-team/keras/issues/17332
If anyone can help shed some light on this issue it would be greatly appreciated, thank you.
hi, can someone give me an example of web crawling with parallel processing?
would someone tell me where I can learn data science for free?
!resources data science
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
yes
click the link 😛
what should I learn in AIML specifically?
what is required to be a professional
someone guide me please
a computer science degree related to AI/ML, probably at the masters level
@serene scaffold did you complete your masters?
I'm like the only member of my department who doesn't have a masters (and many have PhDs), but I'm starting grad school next month
not that I'm some prodigy. I was very fortunate to be hired. but I also cultivated a very niche skillset during my undergrad that happened to be what they wanted.
silly question: is being good at maths necessary for ML?
depends on what you mean by "good at maths". ML is math. but being "bad at math" is mostly a state of mind.
You generally need 3 things:
- linear algebra
- multivariate calc
- statistics (PCA?) still working on this
I'd recommend starting with this
What might it feel like to invent calculus?
In this first video of the series, we see how unraveling the nuances of a simp...
and this
though bear in mind that even if you learn those things, potential AI/ML employers won't really take you seriously without a degree. whether that's fair or not can be debated.
best channel for maths overall
very strong disagree there
due to how 3b1b presents content, it only makes sense if you have already covered the content in another form first
e.g. by reading in a book or in lectures in uni
on its own the channel does not provide nearly enough background nor detail of the right kind to be a standalone good math resource
I don't think the Essence of Calculus series is intended to help one learn calculus starting from algebra. I think it's intended to strengthen one's existing understanding of calculus.
And I think his series on neural networks with MNIST is great for laying a foundation
So yes, I agree that it's not a good standalone math resource. But I don't think that's what it's trying to be.
i agree, but the way historify and curry chicken recommended it is misleading if the other person has no prior knowledge
and I think the guiding principle for what videos he decides to make is "what aspects of a given topic could be better explained with animations than with static visuals?" and goes from there
Sorry if i give bad advice, but yeah should have put asterisks that this should have been supplemented with examples like in books and exercises
not bad advice, just needs a little more oomph 😛
@wooden sail I somewhat agree with you. I felt it was the best channel for me for calculus stuff at least; I still watch his videos for fun...
Also, everyone's personal opinions differ, right?
most certainly. my wording might have been harsh, but at the end of the day it's my opinion rather than fact 😛
!otn a edd's opinion is fact
:ok_hand: Added edd’s-opinion-is-fact to the names list.
lol
I saw curry chicken and I immediately thought it's some food condiment brand and stuff like that 😄
Hey, Curry Chicken, cool name mate. Are you from India or Spain?
Is anyone here attending ICLR 2023 ?
We can do a small PythonDiscord Hangout / Dinner if we're up to 3 people that'll attend ICLR 😎
You should have a decent statistical skill set at minimum
@iron basalt ChatGPT might not have all the answers...but he's quite a Socrates.
I was talking to him about my GAN and that my discriminator had its loss stabilized around 0.34, while my Generator loss, in the best case, oscillated between 5 and 8, and he just told me "you could train the generator for more epochs with the discriminator's weights frozen"
Like...all the GAN code I've seen uses the approach 1 batch -> 1 step for the discriminator -> 1 step for the generator.
But then this reminded me that Goodfellow suggested that one could use more iterations per batch for either the discriminator or the generator. He just used 1 in the paper for convenience. Heh
Now I'll see if 5 more iterations for the generator works (it has 5 times more parameters than the discriminator)... using a content loss function to try to avoid mode collapse, of course
I was trying to compensate for this by using a higher learning rate for the generator optimizer, but that also didn't work.
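For reference, Algorithm 1 in Goodfellow et al. (2014) runs k discriminator updates for every generator update and uses k = 1 in the experiments. A minimal framework-agnostic sketch of a loop where both counts are configurable; `d_step` and `g_step` are hypothetical callables that each perform one optimizer update on the batch and return the loss:

```python
def train_gan(batches, d_step, g_step, d_steps=1, g_steps=1):
    """Alternate discriminator and generator updates on each batch."""
    history = []
    for batch in batches:
        # d_steps discriminator updates (k in the original paper)
        d_losses = [d_step(batch) for _ in range(d_steps)]
        # then g_steps generator updates on the same batch
        g_losses = [g_step(batch) for _ in range(g_steps)]
        history.append((d_losses[-1], g_losses[-1]))
    return history
```

With `d_steps=1, g_steps=5` this gives the "5 generator iterations per batch" schedule described above.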
question
i was wondering if this code would work for a stock trading bot that uses machine learning and stock strategies
I have a Bert-for-NER model, named m0, that does 9 classes, so the final layer is Linear(in_features=768, out_features=9, bias=True). And I'm trying to make a copy of that model, m1, that does all of those same classes and three additional ones. So I did this.
```python
m1 = BertForTokenClassification.from_pretrained('./m0.pkl')
linear = nn.Linear(768, len(e1))  # len(e1) == 12
with torch.no_grad():
    linear.weight[:len(e0), :] = m0.classifier.weight.clone().detach()  # len(e0) == 9
m1.classifier = linear
m1.train().to(cuda)
```
And if I print m1, I can see that indeed, the last layer is (classifier): Linear(in_features=768, out_features=12, bias=True).
But I still end up with this error:
```
Traceback (most recent call last):
  File "/home/farnsworthsw/projects/cont_learning/replicate_addner.py", line 194, in <module>
    optimizer = AdamW(m1.parameters(), lr=1e-5, eps=1e-8)
  File "/home/farnsworthsw/projects/cont_learning/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/farnsworthsw/projects/cont_learning/venv/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 1785, in forward
    loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
RuntimeError: shape '[-1, 9]' is invalid for input of size 7296
```
I checked all the layers of both m0 and m1, and I didn't see anything in either that depended on the number of classes except the classifier layer (which is the last one), so I'm not sure why this would suddenly become a problem.
I suppose I could try making m1 with nn.Sequential of all the layers of m0 except the last one, and then the new linear one.
I hope my question is sufficiently detailed without being too much.
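The last frame of the traceback shows `forward` reshaping with `self.num_labels`, which is still 9 after the classifier is swapped, while the new logits have 12 columns (7296 = 608 × 12). A minimal sketch of a helper that swaps the head and updates the label count, assuming the model exposes `.classifier`, `.num_labels` and `.config` the way `BertForTokenClassification` does (the helper name is hypothetical). It also copies the old bias, which the snippet above leaves at its random initialization:

```python
import torch
from torch import nn

def expand_classifier(model, new_num_labels):
    """Replace model.classifier with a wider Linear layer, keeping the old rows."""
    old = model.classifier
    old_num_labels = old.out_features
    new = nn.Linear(old.in_features, new_num_labels, bias=old.bias is not None)
    new = new.to(device=old.weight.device, dtype=old.weight.dtype)
    with torch.no_grad():
        # Copy the trained weights (and bias) for the original classes
        new.weight[:old_num_labels] = old.weight
        if old.bias is not None:
            new.bias[:old_num_labels] = old.bias
    model.classifier = new
    # forward() reshapes logits with self.num_labels, so this must match too
    model.num_labels = new_num_labels
    model.config.num_labels = new_num_labels
    return model
```

In `transformers`, setting `config.num_labels` may also reset `id2label`/`label2id` to generic names, so set those explicitly if you rely on them.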
omg sorry missed thisss
shall i np.shape(h)
```python
import numpy as np
import matplotlib.pyplot as plt

# Right-hand sides of the differential equations: the accelerations vx' and vy'
def f_x(x, y, vx, vy):
    return -2 * y**2 * x * (1 - x**2) * np.exp(-(x**2 + y**2))

def f_y(x, y, vx, vy):
    return -2 * x**2 * y * (1 - y**2) * np.exp(-(x**2 + y**2))

def derivatives(state):
    # state = [x, y, vx, vy]  ->  [x', y', vx', vy']
    x, y, vx, vy = state
    return np.array([vx, vy, f_x(x, y, vx, vy), f_y(x, y, vx, vy)])

def trajectory(impactpar, speed):
    maxtime = 10 / speed
    t = np.linspace(0, maxtime, 300)
    # Start at x = impact parameter, y = -2, moving in the +y direction
    state = np.array([impactpar, -2.0, 0.0, speed])
    # Store the full state at every time, including the initial condition
    sol = np.empty((len(t), 4))
    sol[0] = state
    for i in range(len(t) - 1):
        h = t[i+1] - t[i]
        # Classic RK4: every stage advances positions AND velocities together
        k1 = h * derivatives(state)
        k2 = h * derivatives(state + 0.5 * k1)
        k3 = h * derivatives(state + 0.5 * k2)
        k4 = h * derivatives(state + k3)
        state = state + (k1 + 2 * k2 + 2 * k3 + k4) / 6
        sol[i+1] = state
    # Columns: x, y, vx, vy
    return sol[:, 0], sol[:, 1], sol[:, 2], sol[:, 3]

x_sol, y_sol, _, _ = trajectory(0.1, 0.1)
# Plot the resulting trajectory
plt.plot(x_sol, y_sol)
plt.xlabel("x")
plt.ylabel("y")
plt.show()

# Solution to part (b)
def scatterangles(allb, speed):
    # Initialize an array to store the scatter angles
    angles = np.empty(allb.shape)
    # Loop over the impact parameter values
    for i, impactpar in enumerate(allb):
        # Only the final velocity is needed for the scattering angle
        _, _, vx, vy = trajectory(impactpar, speed)
        # Angle of the outgoing velocity measured from the incoming +y direction
        # (assumed convention; use np.arctan2(vy[-1], vx[-1]) to measure from +x)
        angles[i] = np.arctan2(vx[-1], vy[-1])
    # Return the array of scatter angles
    return angles

allb = np.arange(-2, 2, 0.001)
angles = scatterangles(allb, 0.1)
# Plot the scatter angles as a function of impact parameter
plt.plot(allb, angles)
plt.xlabel("Impact parameter")
plt.ylabel("Scatter angle")
plt.show()
```
Eeeeh...maybe he was wrong here, too. Again, discriminator loss stabilized at 0.33 and generator loss stabilized around 5.73

I hope I'm just being too hasty
In the end, the correct answer was "try using residual blocks". I was trying to use an architecture similar to SRGAN but without residual blocks, but, apparently, residual blocks are magical. Though I don't quite understand the justification for why they work so well...
I can understand it when you concatenate a residual block to your output, but I don't quite get why it works when you directly sum the residual blocks, element-wise.
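A minimal NumPy sketch of the element-wise version (the weight names are hypothetical). One common explanation, from the ResNet papers by He et al.: with y = x + F(x) the block only has to learn the correction F(x), so if F's weights are near zero the block is close to the identity, and the Jacobian dy/dx = I + dF/dx always contains an identity term that passes gradients straight through. The sum requires F(x) to have the same shape as x, whereas concatenation does not:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def residual_block(x, W1, W2):
    # y = x + F(x), with F a small two-layer network
    return x + W2 @ relu(W1 @ x)

rng = np.random.default_rng(0)
x = rng.normal(size=4)
W1 = rng.normal(size=(4, 4))
# With the last layer of F at zero, the block is exactly the identity
y_identity = residual_block(x, W1, np.zeros((4, 4)))
```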
guys, I have this column in a PySpark DataFrame:
date
2022/1/1
2022/10/2
2022/2/4
and I really need to convert the dates to:
2022/01/01
2022/10/02
2022/02/04
how can i do this with pyspark?
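A minimal sketch, assuming Spark 3's datetime patterns: parse with the single-letter pattern `yyyy/M/d`, which accepts one- or two-digit months and days, then format back with `yyyy/MM/dd`, which zero-pads. The pure-Python helper does the same conversion for a single string and is handy for checking expected output; both helper names are hypothetical, and rows that don't match the pattern may come back as null:

```python
from datetime import datetime

def pad_date_string(value: str) -> str:
    # strptime's %m and %d accept 1- or 2-digit values; strftime zero-pads them
    return datetime.strptime(value, "%Y/%m/%d").strftime("%Y/%m/%d")

def pad_date_column(df, column="date"):
    """Spark version: parse with a 1-or-2-digit pattern, then re-format with padding."""
    # Imported here so the plain-Python helper above runs without Spark installed
    from pyspark.sql import functions as F
    return df.withColumn(
        column,
        F.date_format(F.to_date(F.col(column), "yyyy/M/d"), "yyyy/MM/dd"),
    )
```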
How do I make an AI that tries making different chords and tests if the user likes them or not and keep making chords that the user likes?
can we create an autoencoder for dealing with names?
if so can you link me to some paper/article
anyone good in xpath here?
```
//*[contains(concat( " ", @class, " " ), concat( " ", "organizationName", " " ))]
```
Can someone explain to me how this xpath works?
I am trying to crawl data from a website using scrapy
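Read left to right: `//*` selects every element in the document, and the predicate keeps those whose `class` attribute, padded with a space on each side, contains `" organizationName "`. The padding makes it match the whole class name only, so `class="card organizationName"` matches but `class="organizationNameSmall"` does not. The same test in plain Python (the helper name is hypothetical):

```python
def has_class_token(class_attr: str, token: str) -> bool:
    # Pad both strings with spaces, exactly like the concat(" ", ..., " ") calls,
    # so only a whole space-separated class name can match
    return f" {token} " in f" {class_attr} "

print(has_class_token("card organizationName big", "organizationName"))
print(has_class_token("organizationNameSmall", "organizationName"))
```

In Scrapy you can pass the expression unchanged to `response.xpath(...)`; `response.css(".organizationName")` should select the same elements, since CSS class selectors are translated into a very similar XPath.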
is there any service for labeling product images?
I used Google Vision and probably if I had an hour I would deploy a service better than that...
but mine wouldn't be enough as well
Not sure if this is the right place to ask,
I've been a ML Engineer for 2+ years now in the same company (straight out of college)
I'm considering switching companies soon and was looking for potential project ideas to put on my resume.
Is there any place I can get ideas from? (Which aren't too generic)
My resume just has one project at present
join a Kaggle competition; you might want to form a team with someone you know. Gravitate to a topic that is relevant to the company you are interested in or that matches the new job's skills
for example if you want to be an ML in a real estate company, perhaps you might want to join this: https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques
Wouldn't these be a better fit for a Data Scientist than an ML Engineer profile?
they kinda do, but I think you can develop some kind of ML technique and test it against the top results (other competitors, whose code you can see on Kaggle).
Does someone know how to change the colors so they won't look so similar?
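You may try something similar to this:
```python
import matplotlib.pyplot as plt

# high_action and low_action are assumed to be DataFrames with
# suspense, action and comedy columns (the movie data is not shown here)
fig = plt.figure()
axes = fig.add_subplot(projection="3d")
axes.scatter(high_action.suspense, high_action.action, high_action.comedy, c="red", marker="x", s=200)
axes.scatter(low_action.suspense, low_action.action, low_action.comedy, c="blue", marker="o", s=200)
axes.set_title("Sample Movies")
axes.set_xlabel('Suspense')
axes.set_ylabel('Action')
axes.set_zlabel('Comedy')
plt.show()
```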
And the full article will be helpful if you are using matplotlib
Try using a generative model (Diffusion, GAN, Variational AutoEncoder) to make the chords, and take a look at Reinforcement Learning methods to incorporate the user feedback.
(I'd recommend asking ChatGPT about PPO...it explains quite clearly how it works. But also consider Temporal Difference Learning)
Am I the only one who thinks that advanced indexing in numpy doesn't follow the principle of least astonishment?
for example
```python
import numpy as np

a = np.random.rand(100, 100)
a[(2,4)]     # this yields the element at [2,4]
a[[2,4]]     # this yields the rows at positions 2 and 4
a[1, (2,4)]  # this yields the elements at columns 2 and 4 of row 1 (so it actually does advanced indexing)
a[1, [2,4]]  # works the same way as the previous one
```
Worst of all, it's very easy for someone to make a mistake and not notice it: it seems to me that the first form, a[(2,4)], should not be allowed, and only a[*(2,4)] should work. I checked how it works in Julia (which has a similar syntax), and a[(2,4)] would yield an error, which makes sense to me. Could it be an idea to deprecate a[(2,4)]-style usage?
numpy syntax is not a magical layer on top of Python; it uses the features provided by Python - when you do arr[1, 2], that 1, 2 is a tuple.
```pycon
>>> class Foo:
...     def __getitem__(self, *something):
...         print(repr(something))
...
>>> foo = Foo()
>>> foo[(1, 2, 3)]
((1, 2, 3),)
>>> foo[1, 2, 3]
((1, 2, 3),)
```
@agile cobalt :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | (1, 2, 3)
002 | (1, 2, 3)
!e without the * (had it there for testing)
```py
class Foo:
    def __getitem__(self, item):
        print(repr(item))

arr = Foo()
arr[1, 2, 3]
arr[(1, 2, 3)]
arr[*(1, 2, 3)]  # star-unpacking inside [] requires Python 3.11+
```
@agile cobalt :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | (1, 2, 3)
002 | (1, 2, 3)
003 | (1, 2, 3)
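Since `a[(2, 4)]` and `a[2, 4]` reach `__getitem__` as the same tuple, as the eval shows, NumPy has no way to tell them apart. A quick check of the four cases from the question, plus spellings that leave no room for doubt (variable names are illustrative):

```python
import numpy as np

a = np.arange(100 * 100).reshape(100, 100)

# The parentheses don't create a nested index: both are the tuple (2, 4)
print(a[(2, 4)], a[2, 4])
# A list at the top level is an index array on axis 0: rows 2 and 4
print(a[[2, 4]].shape)
# Inside the index tuple, a tuple or list is an index array on that axis
print(a[1, (2, 4)], a[1, [2, 4]])
# Explicit alternatives
rows_2_and_4 = a[np.array([2, 4]), :]
element_2_4 = a[2, 4]
```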
I have a data frame that looks like this. I used a recursive function to create a lineage of dependencies. The problem is, some of the lineage routes are incomplete, though the data set will include the complete route at some point.
How can I remove the incomplete routes and preserve the complete routes?
Here the red line is incomplete, while the green line is complete.
JAI HIND. I AM FROM INDIA AND LOOKING FORWARD TO BECOMING A DATA SCIENTIST
what makes the marked rows incomplete?
So it takes every row of the original data set, and then creates that row's chain.
but any particular row in the original data set can be in the middle of a chain
how do u define whether a row is complete or incomplete then
It's more like a series of rows
to cluster ur data u need to define rules for clustering
It's like this
row 1
row 2
row 3
row 4
is complete series
row 3
row 4
is incomplete series
so different sources ?
use something like this in a pandas df:
```python
df.loc[df['column_name'] == some_value]
df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]
```
or make a boolean df beforehand and filter afterwards.
unfortunately I do not understand your question logic
I think what he means is that the chain is built from the dependency and the dependent
Like delta cool 1's dependent is cool 1
Then delta cool 2's dependency is cool 1
Index chain 9, 10, 11 is incomplete because the dependent of one row and the dependency of the next row are not equal.
Like delta cool 3 in index 9 has the dependent cool report, but delta cool 2 in index 10 has the dependency cool 1. This forms an incomplete chain.
Yeah, that's right.
Delta cool 3 in index 9 is the end of a chain, so there isn't anything that comes after it.
So there's a correction there on my part, the chain actually begins at index 10, with delta cool 2, but is incomplete
Another way to say it: I only want to preserve a chain that begins at the root
the dependency
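A minimal sketch of "only keep chains that begin at a root", assuming the edges live in a DataFrame with hypothetical columns `dependency` and `dependent` (one row per dependency -> dependent pair). A root is taken to be a value that appears as a dependency but never as a dependent; every path is then walked from the roots, so a chain that starts mid-way never appears on its own:

```python
from collections import defaultdict

import pandas as pd

def chains_from_roots(edges: pd.DataFrame, source="dependency", target="dependent"):
    """List every dependency chain that starts at a root."""
    children = defaultdict(list)
    for parent, child in zip(edges[source], edges[target]):
        children[parent].append(child)
    roots = set(edges[source]) - set(edges[target])

    chains = []

    def walk(node, path):
        # Follow each child; skip nodes already on this path to avoid cycles
        next_nodes = [c for c in children.get(node, []) if c not in path]
        if not next_nodes:
            chains.append(path)
            return
        for child in next_nodes:
            walk(child, path + [child])

    for root in sorted(roots):
        walk(root, [root])
    return chains

edges = pd.DataFrame({"dependency": ["A", "B", "C", "A"],
                      "dependent": ["B", "C", "D", "E"]})
print(chains_from_roots(edges))
```

As noted just below, if you later need lineages that start from a node in the middle, the same `walk` can be started from any node instead of only the roots.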
I realized that maybe this isn't ideal, though.
Basically, I'm pulling definitions of stored procedures from a SQL database and then parsing the from and into clauses to find dependencies.
And I envisioned this as a spreadsheet lineage.
Now I realize that preserving only the complete series might be a bad idea, because you might want to search a lineage when the "root" is actually in the middle of a series.
Thank you for the help @patent lynx - I'm going to go for a walk. Enough hacking for now.
Thank you
turns out my solution was basically right, except that I needed to deconstruct m1's individual layers into an nn.Module subclass.
I have a question: why do all my plots look the same? Are my energy functions wrong?
```python
# YOUR CODE HERE
import numpy as np
import matplotlib.pyplot as plt

G = 6.6738e-11  # gravitational constant (m^3 kg^-1 s^-2)
M = 1.9891e30   # mass of the Sun (kg)
m = 5.9722e24   # mass of the Earth (kg)

N = 35040
# Assumption: the N steps are meant to span one year, i.e. 900 s per step.
# The original timestep t = 1/N is 1/35040 s, far too short for the orbit to move.
DT = 365 * 24 * 3600 / N

def verlet(x0, y0, vx0, vy0, N, dt, parameters=()):
    G, M, m = parameters
    x = np.zeros((N, 2))
    v = np.zeros((N, 2))
    x[0] = (x0, y0)
    v[0] = (vx0, vy0)
    for i in range(N - 1):
        # Position first, then velocity from the acceleration at the new position
        # (this is the semi-implicit/symplectic Euler update rather than velocity Verlet)
        x[i+1] = x[i] + v[i] * dt
        a = -G * M * x[i+1] / np.linalg.norm(x[i+1])**3
        v[i+1] = v[i] + a * dt
    return x, v

def solve(par):
    return verlet(1.521e11, 0, 0, 2.9291e4, N, DT, parameters=par)

def potentialEnergy(r, par):
    G, M, m = par
    # Gravitational potential energy: U = -G M m / |r|
    return -G * M * m / np.linalg.norm(r, axis=1)

def kineticEnergy(v, par):
    m = par[2]
    # Kinetic energy: K = (1/2) m |v|^2
    return 0.5 * m * np.sum(v**2, axis=1)

xval, vval = solve((G, M, m))
pe = potentialEnergy(xval, (G, M, m))
ke = kineticEnergy(vval, (G, M, m))
total = pe + ke

plt.subplot(3, 1, 1)
plt.plot(pe)
plt.subplot(3, 1, 2)
plt.plot(ke)
plt.subplot(3, 1, 3)
plt.plot(total)
plt.show()
```
happy to show the question if needed
provide the question pls @keen notch
Hey @keen notch!
You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.
omgg
looks better i thinkk
the start scale is squashed
tbh not sure how it's supposed to look just knew my graphs looked very wrong😂
😄
how did you do it in terms of code
ohh ok let me know what's up
ooo thank you so what was the issue
so i can understand
DAMN YOU'RE A GENIUS
is there no orange plot?
under the green one
u see blue line is by 0
so there wont be many changes to the resulting curve
i would assume its ur np.zeros all time so u get a mismatch somewhere
but i didnt check ur "logic"
so the iterations u do
i would suggest to check em
ahh fair enough so what does this do
good plan
u right acc that makes sense
which part
the appends
oh right t[-1]?
thats why u need to append later and why i think ur np.zeros are resulting in mismatch
ahhh i seee
smart
and interesting
t[-1]?
u want to compare last t value with the tmax
ohhhh
to iterate only in the range
!e
```py
import numpy as np
test = np.array((1,2,3))
print(test[-1])
```
no worries
but [-1] is pretty basic stuff
u should check the basics first
before u attempt such difficult questions?!
i google haha
i should ur right
u doing it for university?
working on a course outside uni but yes
in summer i will it's more a side hobby atm
for sure!!
thats totally fine
but maybe ur current class is then too advanced for u
its no shame to start small
that's true!!!
i'm more a C girl
i'm just trying to do python questions I've done some basic ones tbf and didn't have problems
its a good learning approach
however it can be frustrating and difficult
im out for today have a great night
🦉
have a goodnight too!! and thanks again I'll take your advice
one question
shouldn't the total remain constant?
im trying to train a model with labels in the format of integers ranging from 0-2 and am getting this error:
```
Received a label value of 2 which is outside the valid range of [0, 1).  Label values: 1 2 2 2 2 0 0 0 1 1 2 2 1 1 0 0
	 [[{{node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]] [Op:__inference_train_function_73146]
```
I know that sparse crossentropy is supposed to accept integer encoded labels as opposed to one - hot, so what am I doing wrong?
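The valid range [0, 1) in the message suggests the loss sees only one output column, which usually points to a final Dense layer with 1 unit; with labels 0, 1 and 2 the last layer needs 3 units (e.g. `Dense(3)`, with `from_logits=True` in the loss if there is no softmax). The sketch below is a NumPy re-implementation for illustration, not TensorFlow's code, showing why the output width bounds the labels:

```python
import numpy as np

def sparse_categorical_crossentropy(labels, logits):
    """Mean cross-entropy for integer labels; logits has shape (batch, num_classes)."""
    labels = np.asarray(labels)
    num_classes = logits.shape[1]
    # Each label is used as a column index, so it must be < num_classes
    if labels.min() < 0 or labels.max() >= num_classes:
        raise ValueError(f"labels must be in [0, {num_classes})")
    # Numerically stable log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

labels = np.array([0, 1, 2])
loss_3_units = sparse_categorical_crossentropy(labels, np.zeros((3, 3)))
```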
lol my q learning bot that I'm trying to teach to play a top down shooter game has learned that pygame has lag
I don't have a cap on how many bullets it can shoot, and I gave it a penalty for not putting its laser on the target and a reward for being on target, so naturally as it spun around trying to find the target, it realized it could minimize its penalty if it slowed down time
so in between every turn action it takes, it fires another bullet
now that it's found the target I think it's about to learn that not slowing down time is actually beneficial
So I was following a notebook on Kaggle to train a model on TPU, by Kaggle grandmaster Phil Culliton: https://www.kaggle.com/code/philculliton/a-simple-tf-2-1-notebook/notebook . In it he uses a VGG16 model, which takes an input shape of (224,224) during training. But he is training with (192,192) input images?? On GPU or CPU it would throw an error. How is this possible??
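A hedged guess, assuming the notebook creates VGG16 with `include_top=False` (worth checking in the notebook itself): Keras only fixes the input at 224×224 when the fully connected top is included, because its first Dense layer expects a flattened 7×7×512 feature map. Without the top, the network is only convolutions and pooling, and the Keras docs allow any width and height of at least 32. The size arithmetic for the VGG16 convolutional base:

```python
def vgg16_feature_map_size(input_size: int) -> int:
    """Spatial size of the VGG16 convolutional base output (include_top=False)."""
    size = input_size
    # 3x3 convs use 'same' padding, so they keep the size; each of the 5 blocks
    # ends with a 2x2 max-pool with stride 2, which halves it (rounding down)
    for _ in range(5):
        size //= 2
    return size

print(vgg16_feature_map_size(224), vgg16_feature_map_size(192))
```

So a 192×192 input just yields a 6×6×512 feature map instead of 7×7×512, which a pooling or Flatten + Dense head built for that shape handles fine.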
Hi, I hope everyone is having a great Christmas. I'm searching for some advice and think that this might be the right channel to ask. I'm about to finish my theoretical physics degree in Barcelona, and I've taken some computational courses plus a Machine Learning course, and done an ML-related internship in Stockholm (Erasmus). I really enjoyed these topics and was thinking about pursuing a career in data science. I was hoping that there is someone with a similar background who could give some advice. Thanks in advance 🙂
Idk without seeing the rest of the plot but ur total energy should be higher than the kinetic energy too
sry without seeing the rest of the code
Maybe its not properly labelled
it is labelled properly?
the thing I'm unsure about is whether the total has to stay constant
Okay dont look at this comment
energy should be more in absolute value in this case
not sure what I'm doing wrong
Oh
I think it is right
it oscillates because of the method u're using
just check b
It does oscillate around a constant mean value
That can happen when using numerical methods
wait how so maybe I understand this wrong
ohh so it's fine
Paint
haha ohh
hahahah
but you can get the mean value with np.mean and then plot a constant line if u want
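To illustrate why the total energy wobbles: many fixed-step integrators only conserve energy approximately. Below is a sketch for a unit-mass harmonic oscillator (k = m = 1, an assumption; the original system and integrator aren't shown) comparing explicit Euler, whose energy grows every step, with semi-implicit (symplectic) Euler, whose energy oscillates around a constant mean that `np.mean` recovers:

```python
import numpy as np

def simulate(method, x0=1.0, v0=0.0, dt=0.05, steps=2000):
    """Integrate x'' = -x and return the total energy after every step."""
    x, v = x0, v0
    energies = np.empty(steps)
    for i in range(steps):
        if method == "explicit":
            # Explicit Euler: both updates use the old state.
            x, v = x + dt * v, v - dt * x
        else:
            # Semi-implicit (symplectic) Euler: update v first, then x with the new v.
            v = v - dt * x
            x = x + dt * v
        energies[i] = 0.5 * v**2 + 0.5 * x**2  # kinetic + potential
    return energies

e_explicit = simulate("explicit")
e_symplectic = simulate("symplectic")

print("explicit Euler, first/last energy:", e_explicit[0], e_explicit[-1])
print("symplectic Euler, min/max energy:", e_symplectic.min(), e_symplectic.max())

# Mean value to draw as a constant reference line, e.g. plt.axhline(mean_energy)
mean_energy = np.mean(e_symplectic)
print("symplectic Euler, mean energy:", mean_energy)
```

For this system explicit Euler multiplies the energy by (1 + dt²) each step, while the symplectic variant stays in a narrow band around the true value of 0.5.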
yess
Can u plot the potential and kinetic energy
top graph
yes but i cant see the potential
is it because the blue line is by 0
so there wont be many changes to the resulting curve (orange)
So this basically
Hey guys, question regarding regression model.
I want to predict a salary based on categorical data such as experience level or job title.
What model would recommend?
Thx in advance 🙂
so we all good:))
Im not sure
hahahahahah
If u could plot only the potential
im assuming it will look just like the total energy
that might be right
hmm i'll think
but thank you for your help!:)
np
It still doesnt look quite okay
But think about it I can't help rn maybe at night
ahhh no a is wrong total should be constant
it's okee
does anyone know feature engineering ?
That's somewhat of a vague question. There are a lot of parts to feature engineering. What do you need to know?
I have 4 datasets
I want to apply feature engineering using these four datasets with a hypothesis question
So what's the goal? Are you trying to first merge these datasets together? By country and then by year?
Saying "apply feature engineering" is very generic. It would be like: I want to apply math to this math problem. Do you want to do integrals? Algebra?
apply a model and predict something
Do you mind if we get on a call so I can explain everything to you @charred light
thinking about applying linear regression model
Then you would want to start by merging these datasets together. You can use Pandas and use pd.merge function.
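A minimal sketch of merging on both country and year with `pd.merge`. The column names and values are made up for illustration (countries "A" and "B" are placeholders); the real datasets will have their own columns:

```python
import pandas as pd

# Hypothetical example tables; replace with the real CSVs.
emissions = pd.DataFrame({
    "country": ["A", "A", "B"],
    "year": [2019, 2020, 2020],
    "co2_mt": [300.0, 270.0, 40.0],
})
ev_sales = pd.DataFrame({
    "country": ["A", "B", "B"],
    "year": [2020, 2019, 2020],
    "ev_sales": [110000, 60000, 77000],
})

# Check that the key columns have matching dtypes before merging.
print(emissions.dtypes, ev_sales.dtypes, sep="\n")

# Inner join keeps only (country, year) pairs present in both tables.
merged = pd.merge(emissions, ev_sales, on=["country", "year"], how="inner")
print(merged)
```

With `how="outer"` instead, unmatched rows are kept and filled with NaN, which is one way to see how much the datasets actually overlap.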
I don't have time to commit to a call, nor do I really want to do that either.
I do really need help @charred light though
And text is just fine to do that.
you should convert the data over to a common format using sklearn, unable it and then throw it in your models
first thing is to compare the types of data using pandas.
df1.dtypes
for example
can you explain using a zoom call
lol not zoom no
oops we do have discord
I did that
https://pastebin.com/LJaFfFsE
Can someone please help ? This is a deep learning problem. I trained a gesture learning model on 225x225 pixel images using keras and neural networks. I saved the model to an h5 file. Above is the code I want to use for detecting it. However when I show my hand in front of the camera it shuts down right away with error being that
ValueError: Input 0 of layer "sequential_4" is incompatible with the layer: expected shape=(None, 224, 224, 3), found shape=(None, 21, 2)
I am fine with detecting my hand within a region of interest. But what other things I can do to fix this problem and get it up and running?
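The error says the model expects a batch of 224x224 RGB images, but it is being given an array of shape (21, 2), which looks like 21 (x, y) hand-landmark points rather than pixels (an assumption based on the shape alone). One way to fix it is to crop the region of interest from the camera frame, resize it to 224x224, and add a batch axis. A NumPy-only sketch: the nearest-neighbour resize is a dependency-free stand-in for `cv2.resize`, and the ROI coordinates and `/ 255` scaling are hypothetical; they must match whatever preprocessing was used during training:

```python
import numpy as np

def resize_nearest(image, height, width):
    """Nearest-neighbour resize of an (H, W, C) array (stand-in for cv2.resize)."""
    rows = np.arange(height) * image.shape[0] // height
    cols = np.arange(width) * image.shape[1] // width
    return image[rows][:, cols]

def preprocess_roi(frame, top, left, size, target=224):
    """Crop a square ROI from a frame and shape it as a (1, target, target, 3) batch."""
    roi = frame[top:top + size, left:left + size]
    # Note: OpenCV frames are BGR; convert if the model was trained on RGB images.
    roi = resize_nearest(roi, target, target).astype("float32") / 255.0  # assumed scaling
    return np.expand_dims(roi, axis=0)  # add the batch dimension Keras expects

# Fake 480x640 RGB frame standing in for a webcam capture.
frame = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
batch = preprocess_roi(frame, top=100, left=100, size=300)
print(batch.shape)  # (1, 224, 224, 3)
# prediction = model.predict(batch)  # 'model' would be the loaded .h5 model
```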
also anyone know why this isnt working
l1.dtypes
bare_nuclei object
----> 1 l1["bare_nuclei"] = pd.to_numeric(l1["bare_nuclei"])
TypeError: 'method' object is not subscriptable
l1["bare_nuclei"] = pd.to_numeric(l1["bare_nuclei"])
The proper use case is just pd.to_numeric(l1["bare_nuclei"]). You don't need to set it back to the dataframe.
well i kinda need to shove it there for all the stuff i am doing later
? does pd.to_numeric actually convert the data?
@fading zealot https://www.statology.org/pandas-merge-multiple-dataframes/
give that a read
Yes, it applies similarly to the inplace flag in other pandas functions
also @charred light just using pd.to_numeric(l1["bare_nuclei"]) has same error
Do a type(pd.to_numeric(l1["bare_nuclei"])) and see what that returns. Also, check your dataframe. type(l1["bare_nuclei"])
@rancid sorrel Do you want to help me later?
ValueError Traceback (most recent call last)
File ~/.local/lib/python3.9/site-packages/pandas/_libs/lib.pyx:2363, in pandas._libs.lib.maybe_convert_numeric()
ValueError: Unable to parse string "?"
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
Cell In[4], line 1
----> 1 type(pd.to_numeric(l1["bare_nuclei"]))
2 #pd.to_numeric(l1["bare_nuclei"])
File ~/.local/lib/python3.9/site-packages/pandas/core/tools/numeric.py:185, in to_numeric(arg, errors, downcast)
183 coerce_numeric = errors not in ("ignore", "raise")
184 try:
--> 185 values, _ = lib.maybe_convert_numeric(
186 values, set(), coerce_numeric=coerce_numeric
187 )
188 except (ValueError, TypeError):
189 if errors == "raise":
File ~/.local/lib/python3.9/site-packages/pandas/_libs/lib.pyx:2405, in pandas._libs.lib.maybe_convert_numeric()
ValueError: Unable to parse string "?" at position 23
the pandas.core.series.Series is the type(l1["bare_nuclei"])
Ah, you'll need the errors flag
You have a "?" in one of your rows, which is causing the error.
oh bugger yeah that would do it sorry this data is supposed to be sanitized already
time to go throw some Clorox at it
Welcome to DS, the data is never clean (No matter what the data team tells you).
yeah 16 ? in the data
thank god the dataset is small enough i can just open it in vs code
That's probably why it's being read in as an object too.
do we have a crap in crap out emoji?
It should have defaulted as a int/float.
yeah i would have expected that; honestly i am a final-year CS student and this is my first time dealing with Data Science
💩 in --> 💩 out
@charred light in order to predict something from a dataset what are the steps we need to take into consideration?
apply models and predict the accuracy ?
Exploratory data analysis and then apply the model to predict?
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Fit the models without any feature scaling
def models(X_train, Y_train):
    # Logistic Regression
    log = LogisticRegression(random_state=0)
    log.fit(X_train, Y_train)

    # Decision Tree
    dtc = DecisionTreeClassifier(criterion='entropy', random_state=0)
    dtc.fit(X_train, Y_train)  # the tree must be fitted before it can be scored

    # Random Forest Classifier
    forest = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
    forest.fit(X_train, Y_train)

    # Print each model's accuracy on the training data
    print('[0]Logistic Regression Training Accuracy:', log.score(X_train, Y_train))
    print('[1]Decision Tree Classifier Training Accuracy:', dtc.score(X_train, Y_train))
    print('[2]Random Forest Classifier Training Accuracy:', forest.score(X_train, Y_train))

    return log, dtc, forest
```
but thats a decent load of code to run all the models
Normally the process for modeling is as follows:
Feature engineering (Merge datasets, clean data, apply transformations) -> EDA (See relevant columns to be used in model) -> Modeling -> Check results + finetune if needed
@charred light any youtube video to help me ?
thanks for this
@fading zealot freecodecamp.org
@rancid sorrel can we use Linear regression ?
@fading zealot you test the prediction accuracy after
so use sklearn to split the data into train and test, then you apply the metrics used to test each model
once you have your models scored you can then choose the correct one
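A sketch of that workflow with scikit-learn, using the bundled breast-cancer dataset purely as a stand-in for your own data: split once, fit each candidate on the training part, and compare scores on the held-out test part. Training-set accuracy alone, as printed by the `models` function above, tends to flatter models that overfit, such as an unpruned decision tree:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

candidates = {
    # Scaling helps logistic regression converge; trees don't need it.
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    results[name] = {
        "train_acc": model.score(X_train, y_train),
        "test_acc": model.score(X_test, y_test),
    }
    print(f"{name}: train={results[name]['train_acc']:.3f} test={results[name]['test_acc']:.3f}")

best = max(results, key=lambda n: results[n]["test_acc"])
print("best on held-out data:", best)
```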
Oh k
I wish to learn that slowly @rancid sorrel with focus using different data sets from kaggle
https://www.youtube.com/results?search_query=linear+regression+python+sklearn is a good point to start.
e.g. https://www.youtube.com/watch?v=b0L47BeklTE
But you'll also need to know how to merge datasets first. https://www.youtube.com/watch?v=h4hOPGo4UVU
@charred light looking at the dataset I shared with you
what do you think would be a good hypothesis question
@charred light my uni skipped all the data sanitation and cleaning and just threw us at ANNs and CNNs and was like "lol enjoy the deep end, here is a rock to help you float"
but yeah if you're learning, i recommend taking notes in Markdown with something like obsidian.md to collate all your notes into a vault (it doesn't work well with Jupyter, however)
at least not yet
One could be: "Does CO2 emissions cause global temperatures to rise? (Spoilers: Yes it does)"
Another could be: "Do EV prices have an effect on CO2 emissions?"
Oh merge is so easy, it is just combining two dataframes and passing on the query using an attribute
The uni course probably has data cleaning/processing as prerequisite. At least that's what I would assume.
The problem at the moment is the term period I have .. I need to finish off with this asap
You'll probably be using data from 1978 onward for global temps, and be limited to 2010 onward for EV.
@charred light you seem an expert in this field
No expert here, just working as a data scientist. There's a lot to this field.
honestly i hope to reach the levels of skill you have oneday skyglow
@charred light if you want me to be honest. All I want you to explain using a Jupyter notebook and coding at the moment
and then guide me how to be a good data scientist
I am willing to learn and make it happen
well not that you have the time for all of it but i think you need to go back to basics
freecodecamp > cs50
intro to github basics
yeah this is like 2 weeks of time to watch
yeah i know what you mean; well, basically you're kinda boned
ya
If you or skyglow will help me with this project
that would be great
and then make a schedule or plan what to learn and how to be a good scientist
once you have followed skyglow's recommendations by merging the datasets
okay i will do that
@rancid sorrel can you stay online here
@charred light please do stay. I might need help
As a data Scientist, do let me know the road to be a good data scientist
@rancid sorrel do we need to find the missing values as well
you can; however, assuming that, like me, you're not dealing with 8 PB datasets,
you can just get the datasets into the correct shape, then deal with the sanitization
so shape > sanitize > merge
ok
@charred light this was my hypothesis question
‘Will the increased usage of electric vehicles aid in decreasing CO2 emissions, therefore leading to reduced global warming?’
okay
you're dealing with the demand side of the equation, not the supply side, and that's the main issue with EV
You can use that hypothesis. I'm not sure if you can see a significant impact over 12 years worth of data though.
honestly we should all be using H2 from biofuel with carbon capture at conversion, but that's just my opinion
true
Or have better public transportation (for the US).
my general point is you got some major survivor bias with that analysis with your dataset
what would be a unique hypothesis question?
this one ?
honestly i'd compare the amount of EV adoption vs the adoption of renewable energy generation
and see how much energy your wasting, and analyse the supply side shortfall compared to gas
if you can choose your datasets you're better off using non-US data,
cause US emissions data is a crapshoot (imo)
These are the datasets that were provided.
yes
fair enough, just plow through it then 🙂
but was said we can use external data as well
sorry for going off on datascience no1 rule of cynicism
EU/UK has good data on this
as they use the European emission standards; also, the EU has the satellite that tracks it
the UK is good for EV adoption as its data is centrally available and accurate due to the regulations
@rancid sorrel how long does it take get done with the code ?
Yea, US has something like that from DOT (Department of Transportation). Although, most of it is null lol.
you have to have all your info up to date in the uk
or you're going to jail; there are automatic number plate readers everywhere
so the DVLA (DMV) has a complete dataset
@charred light how long will it take for you get done with the feature engineering and the EDA with the data sets provided to you ?
ok
import sweetviz as sv
analysis = sv.analyze(l1, target_feat='class1')
analysis.show_html('EDA-Sweetviz2.html', open_browser=False)
i am installing
Hey @rancid sorrel!
It looks like you tried to attach file type(s) that we do not allow (.html). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.
Feel free to ask in #community-meta if you think this is a mistake.
Look at the relevant columns, and then prep the data for modeling.
which datasets you want me use for the hypothesis question
co2 emission and ev datasets ?
If you merged correctly, you should have 1 dataset with all the columns.
I know that.
but which datasets would you recommend me to merge
thats what I am asking
All of them
why packages not showing?
okay
I might be annoying to both of you but all I am left is with 8 hours to finish it off @rancid sorrel @charred light Apologies.
@charred light can you do the feature engineering for me ?
in a Jupyter notebook
finish what exactly?
Their homework lol
this
so yeah, no - that would be overstepping by quite a long margin.
it is just bcoz I dont understand now and I am left with no time
please don't ping people asking them to answer questions for you. everyone is a volunteer and no one is on-call to provide help.
I know
get a zero and study for the next time then
then why did you do it?
Because I am stuck at the moment
I don't know that we'll be able to help you become unstuck before the assignment is due.
I am not a frequent user asking for help
it is just bcoz I am stuck and with the christmas around I was not in a state to get done with everything
we and your teacher(s) would rather have you ask for help often to learn things when you're supposed to than ask for help only when given homework, and ask for everything then
i am in the same situation and literally dealing with youtube rn, btw do we have any rep?
you can ask a specific question that doesn't require catching up on all the context of what you're trying to do, or you can seek help elsewhere.
Moreover i am project manager and all I am stuck is with this code thats all
i would like to thank @charred light for helping me with my particular problem
I'm sorry that you're struggling. but all this extra information about your personal situation isn't relevant.
@serene scaffold I know what you are trying to explain
I am not dumb to come up here to get done with my assignment
If you want help, please make sure that your next message is a stand-alone explanation of what the problem is, with relevant code samples as needed.
@serene scaffold Seriously brother. It's not easy to understand other problems what they go through
This isn't going anywhere. If you're not going to ask your question in a way where people can provide you with meaningful assistance (and without doing it for you), we're just going to keep talking in circles.
when family is around with 20 people in your house and one wish to study .. it is next to impossible for me get done with everything
Ok
No more discussion of your personal situation in this channel. You can ask a question if you want. "Please do my homework for me", as we've discussed, does not count.
I never mentioned do my homework for me as a statement
Check if you want
Stop this crap and coding for a non-technical person is hard
Fine I wont ask anything
You need to look at me that I am eager to learn and understand the concepts
No more discussion please @serene scaffold
that's what I've been asking for 
stop pinging specific people. I already said that.
I was about to answer your question, too
you can ping people if they've already engaged with your current question. not if they engaged with a similar question in the past.
you would have to load each csv with pd.read_csv and then have the name of each df be the last statement of a cell
so if there are 3 dataframes to display, you need 3 cells.
I mean all to say all the three datasets
is each dataset not a CSV file?
so, you would do pd.read_csv four times. one for each dataset. and then you'd need four cells to display them
because each cell displays the result of the last statement
it gave utf-8 error
remember to always show the whole error message.
ok noted
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 150: invalid start byte
okay, so the encoding of your CSV file is different than expected. try encoding='ascii' in the read_csv function
so it would look something like df1 = pd.read_csv('file/path.csv', encoding='ascii')
you need a comma between the file name and the encoding= part
ok noted sir
if you do df.head() four times, only the last one will be displayed, afaik
you're correct, you have to use print(df.head()) for each one otherwise
UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 30: ordinal not in range(128)
is the data in any particular language?
Can I share the datasets ?
Pasting large amounts of code
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
but there isn't really any way for us to know what the encoding will be except to guess.
these are the datasets
oky i will try my way to view the dataset
try removing the encoding part and doing encoding_errors='ignore' and see if that works
ok
it looks like the problem is whatever char is at the end of this
Afghanistan,AF,93,1752,0,41128771,652230,0.40%,63/km�
so, we can just ignore it.
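A likely cause: byte 0xb2 is the superscript two (²) in Latin-1/Windows-1252, so "km²" in a file saved in one of those encodings is not valid UTF-8 (or ASCII). A small sketch with an in-memory CSV (the bytes are invented to mimic the row above) showing two options: decode with `encoding="latin-1"`, which keeps the character, or keep UTF-8 and pass `encoding_errors="ignore"` (pandas 1.3+), which silently drops undecodable bytes:

```python
import io

import pandas as pd

# Invented CSV bytes mimicking the row above; b"\xb2" is '²' in Latin-1.
raw = b"country,density\nAfghanistan,63/km\xb2\n"

# Option 1: decode with the encoding the file was most likely saved in.
df_latin1 = pd.read_csv(io.BytesIO(raw), encoding="latin-1")
print(df_latin1["density"][0])

# Option 2: keep UTF-8 but drop bytes that cannot be decoded.
df_ignored = pd.read_csv(io.BytesIO(raw), encoding_errors="ignore")
print(df_ignored["density"][0])
```

Option 1 is usually preferable when a guess at the encoding works, since nothing is thrown away.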
yes
did it work? I'll be leaving in about ten minutes btw.
oky
can you just tell me what feature engineering is
what are the steps need to be taken into consideration
it's where you use the data to create additional features
with the existing data ..yes i know that
in order to predict something we need to have a hypothesis question
i have that
like these features: Year,BEV average price (USD),Global Sales Volume,Mileage (Km),Lithium Ion Battery Price (USD),,,Average price of new car
you might make another feature, battery price per mileage
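A sketch of that kind of derived feature with pandas. The column names come from the list above; the rows are invented placeholder numbers, not real market data:

```python
import pandas as pd

# Placeholder rows using the column names listed above (values are invented).
ev = pd.DataFrame({
    "Year": [2020, 2021],
    "Mileage (Km)": [400.0, 500.0],
    "Lithium Ion Battery Price (USD)": [12000.0, 10000.0],
})

# New feature: battery cost per km of range. Ratios like this can expose
# trends that neither raw column shows on its own.
ev["Battery price per km (USD/km)"] = (
    ev["Lithium Ion Battery Price (USD)"] / ev["Mileage (Km)"]
)
print(ev)
```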
or something like that
is this what I need to do
Feature engineering (Merge datasets, clean data, apply transformations)
imo, only the "apply transformations" is feature engineering. data cleaning is its own thing.
can anyone explain why
`missing_values = ["NA","N/a",np.nan,"?"]
u1 = pd.read_csv("../DataSets/Breast cancer dataset/breast-cancer-wisconsin.data",header=None,na_values=missing_values)
ul.dropna()`
isnt working for ul.dropna()
is dropna() a predefined function or does it take my missing_values when called this way
.
dropna() generally only applies for None types or np.NaN (Not a Number). See https://pandas.pydata.org/docs/user_guide/missing_data.html#values-considered-missing
You'll need to filter out user defined list of null values manually. (You can use some type of filter to do this. e.g. .isin())
it appears i needed l1 = l1.dropna()
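A small sketch of what happened, using an in-memory stand-in for the file (the rows are invented): `na_values` in `read_csv` turns the listed strings, including "?", into NaN while parsing, so `dropna()` does see them. The catch is that `dropna()` returns a new DataFrame unless the result is assigned (or `inplace=True` is passed). The `.isin()` filter mentioned above does the same job after loading:

```python
import io

import pandas as pd

# Invented rows standing in for breast-cancer-wisconsin.data (no header row).
csv_text = "1000025,5,1,1\n1057013,8,4,?\n1096800,6,6,2\n"
missing_values = ["NA", "N/a", "?"]

u1 = pd.read_csv(io.StringIO(csv_text), header=None, na_values=missing_values)
print(u1.isna().sum().sum())  # the "?" was parsed as NaN

u1.dropna()       # returns a new DataFrame; u1 itself is unchanged
u1 = u1.dropna()  # assign the result to keep it
print(len(u1))

# Alternative after loading: drop rows whose value is in a user-defined list.
raw = pd.read_csv(io.StringIO(csv_text), header=None)
clean = raw[~raw[3].isin(["?"])]
print(len(clean))
```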
What does numpy use for the visualization of data in their documentation?
missing_values = ["NA","N/a",np.nan,"?"] << appears to flag the ? as a null value
i also tried l1['bare_nuclei'] = pd.to_numeric(l1['bare_nuclei'],errors='coerce') @charred light
Did some error pop up?
no that swaps the errors to nan
at which point i could do a drop null easily
errors='coerce' << swaps errors to nan
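A sketch of that approach on a made-up column: `pd.to_numeric` returns a new Series rather than changing the column in place, so the result has to be assigned back; `errors="coerce"` turns unparseable entries such as "?" into NaN, which `dropna()` can then remove:

```python
import pandas as pd

# Made-up column mimicking bare_nuclei, read in as strings because of the "?".
l1 = pd.DataFrame({"bare_nuclei": ["1", "10", "?", "2"], "class1": [2, 4, 2, 2]})
print(l1["bare_nuclei"].dtype)  # object

# Unparseable values become NaN instead of raising ValueError.
l1["bare_nuclei"] = pd.to_numeric(l1["bare_nuclei"], errors="coerce")
print(l1["bare_nuclei"].isna().sum())  # 1

# Drop the rows that could not be converted.
l1 = l1.dropna(subset=["bare_nuclei"])
print(l1.dtypes)
```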
Ok, good to hear.
unless i am making a mistake. but so far thats a fairly novel way to handle the problem. now to turn autopilot back on
It can be better to go in and manually fix errors (depending on the scale of your data). Like mentioned earlier, if there are only 16 "?" values, then it could be better to clean them with apply and some function. But then again, if you have a large number of rows, it doesn't really matter if you lose a data point or two.
You might want to clarify what you mean by flattening a pip list. I assume pip here is a part of the pip python package, which you can send to a txt file.
100% but i am working with a team
and i am responsible for creating the templates for the EDA
so the templates have to do the cleaning for them rather than me manipulating the data, even if the data is small
also honestly the data forensic training i had really makes me not want to change the original data.
in band data manipulation is just a habit that's been forced into me so you can see the manipulation clearly. also the academics prefer it
Yea, academia tends to have perfect data. Real world data is mostly nulls lol
honestly my background is sysadmin and cybersecurity (15 years); going back to academia was a bit of a mindfuck
at work id fix it with sed and damn the data or pull this from a SQL server and fix it there
I would sure hope the data is cleaned before heading into a database.
honestly that's usually where you get the crap show
Yea, it's always a fight with our digital team that handles the databases lol
honestly it's usually the frontend's fault 😉
they are not doing the sanitisation in the JavaScript 😉
that's why I add a ; DROP TABLE usernames every time a wifi connection asks me for info
```python
import time

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load and preprocess stock data
# (load_stock_data, get_current_stocks, get_current_prices, buy_stock and
# sell_stock are your own helpers; they are not defined here)
data = load_stock_data()
X = data[['past_performance']]
y = data['future_performance']
# shuffle=False keeps the test set after the training set in time
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Train a linear regression model on the training data
model = LinearRegression()
model.fit(X_train, y_train)

# Use the model to make predictions on the test data
predictions = model.predict(X_test)

# Evaluate the model's performance (score() is R^2 for regressors, not accuracy)
score = model.score(X_test, y_test)
print("Model R^2 on test data:", score)

# Use the model to make decisions about which stocks to buy or sell
while True:
    current_stocks = get_current_stocks()
    current_prices = get_current_prices(current_stocks)
    for stock, price in current_prices.items():
        # The model was trained on 'past_performance', so its input here has to be
        # the same kind of value; a raw price is a different feature
        prediction = model.predict([[price]])[0]
        if prediction > price:
            # Buy the stock
            buy_stock(stock, price)
        elif prediction < price:
            # Sell the stock
            sell_stock(stock, price)
    time.sleep(3600)  # Wait an hour before making new predictions
```
Would this work as a stock trading bot?
What add ons or tweaks would need to be made for it to work efficiently
Depending on how much money you willing to lose. Maybe have a hold out set to test.
?what
If I make a GAN and, instead of passing the Generator's output directly to the Discriminator, I pass it to a SuperResolution model and only then I pass it to the Discriminator...would this make things too messy?
it should be fine, it'd make the two more capable of working independently. ofc there's now way more parameters so the training would be slower (unless you train and freeze the super res part ahead of time)
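A PyTorch sketch of the "train and freeze the super res part ahead of time" option: the pretrained SR network sits between generator and discriminator with its parameters frozen, so gradients still flow through it to the generator but its own weights never change. The three tiny networks are hypothetical placeholders, not an actual SRGAN:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins; in practice: the real G, a pretrained SR model, and D.
generator = nn.Sequential(nn.Linear(16, 3 * 8 * 8), nn.Tanh(), nn.Unflatten(1, (3, 8, 8)))
super_res = nn.Sequential(nn.Upsample(scale_factor=2), nn.Conv2d(3, 3, 3, padding=1))  # 8x8 -> 16x16
discriminator = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 1))

# Freeze the SR model: no parameter updates, but it stays differentiable.
super_res.eval()
super_res.requires_grad_(False)

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
loss_fn = nn.BCEWithLogitsLoss()

z = torch.randn(4, 16)
fake_hr = super_res(generator(z))  # G's output is upscaled before D sees it
# Generator step: it wants D to label the upscaled fakes as real (target 1).
g_loss = loss_fn(discriminator(fake_hr), torch.ones(4, 1))

g_opt.zero_grad()
g_loss.backward()
g_opt.step()

print(fake_hr.shape)  # torch.Size([4, 3, 16, 16])
```

Only the generator's optimizer steps here; a discriminator step would follow the usual GAN loop, also feeding it the SR-upscaled fakes.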
Yeah, I was thinking about using a pretrained SRGAN in order to make my Generator (in my current GAN) produce images with a better resolution
I mean...even images in 64x64 are so blurry without superresolution models...
(At least if I'm not making anything wrong...which is quite likely)
do keep in mind though that doing SR does not result in any extra info... at least normally. using a network that might be different, but the added info is bias that the network picked up from the training data
the original architecture without SR should perform about as well as with SR if everything is working ideally
Wouldn't the SR stimulate the generator to make better images? I mean, it would remark the generator's mistakes for the discriminator, wouldn't it?
@charred light like a minimum Range and a maximum Range?
that'll depend on what exactly the network does. most cost functions anyway have some kind of averaging incorporated, so i'm not convinced it'll make a huge difference
but try it out and see!
I'll need an AWS entire building just to learn about GANs...
Meant more as a joke. You should be testing your model on a hold out set if you haven't.
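For time-ordered data like prices, the hold-out set should come after the training data in time; a random split lets the model peek at the future. A sketch with scikit-learn's `TimeSeriesSplit` on a synthetic random-walk series (invented data; the one-day-lag feature is just an example), comparing against a naive "tomorrow equals today" baseline:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 1, size=500))  # synthetic random walk

# Feature: yesterday's price; target: today's price.
X = prices[:-1].reshape(-1, 1)
y = prices[1:]

results = []
for fold, (train_idx, test_idx) in enumerate(TimeSeriesSplit(n_splits=4).split(X)):
    # Every test index comes after every training index.
    assert train_idx.max() < test_idx.min()
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    mae = mean_absolute_error(y[test_idx], model.predict(X[test_idx]))
    # Naive baseline: predict that today's price equals yesterday's.
    baseline = mean_absolute_error(y[test_idx], X[test_idx, 0])
    results.append((mae, baseline))
    print(f"fold {fold}: model MAE={mae:.3f}, naive MAE={baseline:.3f}")
```

If a model can't beat the naive baseline on held-out folds, it's unlikely to be worth trading on.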
Helpful resource for data science - Data science dictionary
https://play.google.com/store/apps/details?id=com.neuralnetworker.datasciencedictionary
Hello guys, I have this error : sklearn.exceptions.NotFittedError: The TF-IDF vectorizer is not fitted
at the code below :
def predict(message : str) :
    tfidf_vectorizer=TfidfVectorizer(stop_words='english')
    pac = pickle.load(open("model.pkl", "rb"))
    vectMessage = tfidf_vectorizer.transform(pd.Series([message]))
    prediction = pac.predict(vectMessage)
    return prediction[0]
I don't know why I have this error because I am transforming the series so it can fit. Thanks in advance for the help.
Before saving the model into a pkl it worked fine, but now it keeps raising this error
if somebody could help it would be good, thanks in advance
The error sklearn.exceptions.NotFittedError: The TF-IDF vectorizer is not fitted is raised when you try to use a scikit-learn estimator that has not been fitted yet.
In your case, it seems that you are trying to transform a message into a vector using the tfidf_vectorizer object, but this object has not been fitted to any data. To fit the vectorizer, you need to pass a list of documents to the fit method, like this:
tfidf_vectorizer=TfidfVectorizer(stop_words='english')
tfidf_vectorizer.fit(list_of_documents)
Where list_of_documents is a list of strings representing the documents you want to fit the vectorizer to. Once the vectorizer is fitted, you can use the transform method to transform new documents into vectors.
In your case, you might want to consider fitting the vectorizer to the training data that you used to train the classifier, so that the vectorizer is able to transform new messages in the same way as it did for the training data.
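Another way to avoid refitting: save the vectorizer that was fitted during training together with the classifier, since a fresh `TfidfVectorizer()` knows no vocabulary. A sketch that bundles both in a scikit-learn `Pipeline` and pickles it as one object. The training texts and labels are invented, and since the classifier behind `pac` isn't shown, `LogisticRegression` stands in for it:

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Invented training data for illustration only.
texts = ["stocks rally after earnings", "aliens built the pyramids",
         "central bank holds rates", "miracle cure doctors hate"]
labels = ["REAL", "FAKE", "REAL", "FAKE"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("clf", LogisticRegression()),  # stand-in for the original classifier
])
pipeline.fit(texts, labels)  # fits the vectorizer and the classifier together

with open("model.pkl", "wb") as f:
    pickle.dump(pipeline, f)

def predict(message: str) -> str:
    with open("model.pkl", "rb") as f:
        model = pickle.load(f)  # includes the fitted vectorizer
    return model.predict([message])[0]  # a list with one string is enough

print(predict("rates held by central bank"))
```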
I hope this helps!
how can i find multi-input models from scikit learn and compare them on a given dataset?
Here is an example of how you could use the model_selection module to compare different models on a given dataset:
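from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

# Load the dataset
X = ... # Input features
y = ... # Target variable

# Define a list of models to compare
# (LogisticRegression is a classifier, so it is left out of this list of regressors)
models = [LinearRegression(), DecisionTreeRegressor(), RandomForestRegressor(), SVR()]

# Iterate over the models and print their mean cross-validation score
for model in models:
    scores = cross_val_score(model, X, y, cv=5) # 5-fold cross-validation
    print(f"{model.__class__.__name__}: {scores.mean():.2f}")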
This code will train each model on the training folds of a 5-fold cross-validation, and then evaluate its performance on the corresponding test folds. The mean cross-validation score for each model will be printed at the end.
I hope this helps. Let me know about it.
ty for the response, i do use CV for LR but i want to compare LR with other models; however, i can't find a list of compatible models on the scikit site
Yes i thought it was possible to consider the text as something to be predicted by the model
so I will always need to insert the new document I want to predict inside a series of documents, right?
Yes, that's correct. When you use a scikit-learn vectorizer to transform a document into a numerical representation (also known as a feature vector), you need to pass the document as a list of strings to the transform method.
For example, if you want to transform a single document message, you can do it like this:
vectMessage = tfidf_vectorizer.transform([message])
If you want to transform multiple documents at once, you can pass them as a list of strings:
vectMessages = tfidf_vectorizer.transform(list_of_messages)
Where list_of_messages is a list of strings representing the documents you want to transform.
yes but in my code as you can see I wrote [message]
Disclaimer: those are the responses of chatgpt.
To add to what CheemaBhaiExpereince has suggested, I also use this approach when I'm extremely lazy.
this looks good to me ❤️
does all work for multiple features?
can someone help me?
Post what you need help with and people will respond to your question.
I wish this was added under #rules or #code-of-conduct for easier access.
Is anyone aware of Lark grammar libraries? Not even necessarily actual libs but even just published grammar sets. I searched the discord, doesn't seem to be much here. I found: https://pypi.org/project/lark-grammars/0.3.0/ but I'm wondering if I can get an even larger set of grammar use cases/solutions.
What do you guys think about this app?
I saw you posted that same link on the data science discord @sudden ermine. Your own app?
Yes Its my first app
@odd meteor it seems like this works only for 1 feature inputs?
Nice observation. I'll forward your observation to the main mods.
cc: @serene scaffold
there's ongoing discussion about that in #1044328825145786458
It works for more than 1 input features x1, x2,..., xn
for both x and y?
I had no idea. Okay that's supercool.
currently i use df as input and it returns an empty result for clf.fit
do i need to convert my features into arrays?
normally scikit works with dfs as well?
@odd meteor any idea?
i converted the df now to array with .to_numpy() but still empty result
Hi Greenleek, If you were able to split your data with Train-Test-Split, all you need to do next is to just follow the snapshot I sent. If it's still not super clear, I'll advise you try replicating this on your PC to see how it works. Once you are able to replicate the results, then using lazypredict in your own project will become easier to grab.
Just follow the screenshot I sent, or better still check the documentation for more clarity. https://lazypredict.readthedocs.io/en/latest/usage.html#classification
thanks for the reply emyrs, but i did try ur method on the test set; however, when using my data it's not working Q_Q
if i use the lazy regressor it gives different results than my previously determined ones, so i guess something isn't working with the input
i dunno why tho cause the input is exactly the same as in the example, with the only difference that both X and y got multiple features
Just try and see if you can replicate the result from the example on the documentation page.
i can
Awesome.
The reason you're getting different result could be because of a couple of things...
- Random State used
- The hyperparameter tuned/involved etc
So long as you can replicate the result on the documentation page, you can just pick, say, the top 3 algorithms and try to do more hyperparameter tuning to improve the model performance.
but i do get an empty "models" after running the lazyclassifier 🗿
so it seems no model works for the offered data
even tho the data is in the correct format
Are you building a classification model or regression model? I've not had that issue using the library before. although I've not used it for a while now.
regression but there it results in errors as well
"y should be a 1d array, got an array of shape (20,20) instead"
so they aren't capable of handling multiple target columns, I guess
and the LinearRegressor gives waaaaay different results than my previous scikit run
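That `y should be a 1d array` message is raised by scikit-learn estimators that only accept a single target column. A sketch of one workaround, assuming the targets are the columns of `y`: wrap a single-target regressor (here `SVR`, picked only as an example) in scikit-learn's `MultiOutputRegressor`, which fits one copy per column. The data is random and purely illustrative.

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))  # 20 samples, 3 input features (random, illustrative)
# Two target columns, so y has shape (20, 2) rather than (20,)
y = np.column_stack([X @ np.array([1.0, 2.0, 0.0]), X @ np.array([0.0, -1.0, 3.0])])

# SVR alone would reject a 2-D y; the wrapper trains one SVR per target column
model = MultiOutputRegressor(SVR()).fit(X, y)
pred = model.predict(X)  # shape (20, 2): one prediction column per target
```

Some estimators, such as `LinearRegression`, accept a 2-D `y` directly. If a tool like LazyRegressor only handles one target (an assumption based on the error above), running it once per column of `y` is the simplest way to compare models.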
I've not had any issue with the library (the couple of times I used it). It worked perfectly. If you believe this is a serious issue, perhaps you can raise this issue on the library's GitHub page. https://github.com/shankarpandala/lazypredict
I'll try to use the library once I get home today to confirm if it's still working properly.
Also, can you share the error message / your code?
there is no error, simply an empty result frame, but thanks for your help; I'll stick to changing models manually 🗿
Guys, I want to use a model in PyTorch which outputs 2 classes using softmax (softmax, not sigmoid). However, I don't really know which loss function I should use, as PyTorch's CrossEntropyLoss has a LogSoftmax built in, and NLLLoss expects an output generated by a LogSoftmax function.
Any suggestion?
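Either pairing is standard in PyTorch: keep the model's last layer as raw logits and use `nn.CrossEntropyLoss`, or end the model with `LogSoftmax` and use `nn.NLLLoss`; the two give the same loss. The combination to avoid is applying `Softmax` and then `CrossEntropyLoss`, which normalises twice. A small check with random logits:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 2)            # raw model outputs: 4 samples, 2 classes, no softmax applied
targets = torch.tensor([0, 1, 1, 0])  # class indices

# Option 1: model returns logits; CrossEntropyLoss applies log-softmax internally
ce = nn.CrossEntropyLoss()(logits, targets)

# Option 2: model ends with LogSoftmax; pair it with NLLLoss
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

# At inference time, apply softmax yourself when you need class probabilities
probs = torch.softmax(logits, dim=1)
```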
are there association rules for non-binary variables?
like the iris dataset, for example
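Classic association-rule mining (Apriori and similar) works on binary items, so a common approach for numeric data like iris is to discretise each measurement into bins and one-hot encode the result, turning every column into `column=value` items. A sketch with pandas on a few made-up, iris-like rows (not the real dataset), computing support and confidence by hand for one rule:

```python
import pandas as pd

# Toy measurements in the spirit of the iris dataset (made-up values, not the real data)
df = pd.DataFrame({
    "petal_length": [1.4, 1.3, 4.5, 4.7, 5.9, 6.1],
    "petal_width": [0.2, 0.2, 1.5, 1.4, 2.1, 2.3],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

# 1) Discretise each numeric column into labelled, equal-width bins
for col in ["petal_length", "petal_width"]:
    df[col] = pd.cut(df[col], bins=3, labels=["low", "mid", "high"])

# 2) One-hot encode every column so each item is a boolean "column=value" flag
items = pd.get_dummies(df.astype(str), prefix_sep="=").astype(bool)

# 3) Support and confidence of the rule {petal_length=low} -> {species=setosa}
antecedent = items["petal_length=low"]
consequent = items["species=setosa"]
support = (antecedent & consequent).mean()   # fraction of rows containing both items
confidence = support / antecedent.mean()     # P(consequent | antecedent)
```

The boolean frame `items` is the kind of input that libraries such as mlxtend's `apriori` expect, if you'd rather not count itemsets by hand. Equal-width bins from `pd.cut` are only one choice; `pd.qcut` gives equal-frequency bins.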
Since there's some folks here that are quite mathmaniacs, can someone tell me if this madness I made makes sense?
The idea here is to adapt the dot-product attention from the Transformer (NLP) into an element-wise attention layer to extract features from images. I want to avoid matrix multiplications because they're too computationally expensive.
```py
import torch.nn as nn

class AttentionBlock(nn.Module):
    def __init__(self, in_channels, n_attention_weights):
        super(AttentionBlock, self).__init__()
        # 1x1 convolutions that produce the row and column weight maps
        self.create_x_weights = nn.Conv2d(in_channels, n_attention_weights, kernel_size=1, stride=1, bias=False)
        self.create_y_weights = nn.Conv2d(in_channels, n_attention_weights, 1, 1, bias=False)
        # 1x1 convolution mapping the combined weights back to in_channels
        self.conclude_attention = nn.Conv2d(n_attention_weights, in_channels, 1, 1, bias=False)
        self.Xsoftmax = nn.Softmax(-2)  # Softmax along dim -2 (the height axis of an NCHW feature map)
        self.Ysoftmax = nn.Softmax(-1)  # Softmax along dim -1 (the width axis of an NCHW feature map)

    def forward(self, input):
        x_weights = self.create_x_weights(input)
        y_weights = self.create_y_weights(input)
        x_weights = self.Xsoftmax(x_weights)
        y_weights = self.Ysoftmax(y_weights)
        attention_weights = x_weights * y_weights
        attention_weights = self.conclude_attention(attention_weights)
        attention_output = attention_weights * input
        return attention_output
```
I've noticed that the Transformer uses a "similarity matrix", i.e. the dot product between queries and keys, and then applies softmax to it. But I don't see exactly how I could use something like that here, so I just applied softmax over the rows of some feature maps (which would be the row weights) and softmax over the columns of other feature maps (the column weights), and then applied an element-wise product with the input. The higher the X and Y weights, the higher the final product, and the higher the relevance... or at least that's the idea.
I should simply test this... but I'm also crazy enough to have had this idea while trying to make a GAN, so even if it works, it might not look like it, since... well... GAN things.
I also wonder whether it would be better to just stick with a single Conv2D instead of doing all this.
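For comparison with the element-wise idea, here is a sketch of the usual dot-product self-attention applied to a feature map, in the style popularised by SAGAN (Self-Attention GAN), with the Transformer's 1/√d scaling added. It is not the block above; it shows where the similarity matrix comes from: one row per spatial position, so memory and compute grow with (H·W)², which is exactly the matrix-multiplication cost the question wants to avoid.

```python
import torch
import torch.nn as nn

class SpatialSelfAttention(nn.Module):
    """Scaled dot-product self-attention over the H*W positions of a feature map (sketch)."""

    def __init__(self, in_channels, key_channels=None):
        super().__init__()
        key_channels = key_channels or max(in_channels // 8, 1)
        # 1x1 convolutions produce a query, key and value vector at every spatial position
        self.query = nn.Conv2d(in_channels, key_channels, kernel_size=1, bias=False)
        self.key = nn.Conv2d(in_channels, key_channels, kernel_size=1, bias=False)
        self.value = nn.Conv2d(in_channels, in_channels, kernel_size=1, bias=False)
        # Learnable gate on the attention branch, starting at 0 so the block begins as an identity
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # (B, HW, Ck)
        k = self.key(x).flatten(2)                    # (B, Ck, HW)
        v = self.value(x).flatten(2)                  # (B, C, HW)
        # Similarity matrix between every pair of positions, softmax-normalised over the keys
        attn = torch.softmax(torch.bmm(q, k) / k.shape[1] ** 0.5, dim=-1)  # (B, HW, HW)
        # Each output position is a weighted sum of the values at all positions
        out = torch.bmm(v, attn.transpose(1, 2)).reshape(b, c, h, w)
        return x + self.gamma * out
```

Usage: `SpatialSelfAttention(64)(torch.randn(2, 64, 16, 16))` returns a tensor of the same shape. The zero-initialised `gamma` lets training decide how much attention to mix in, which can make this kind of block easier to drop into an existing GAN.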
hey guys, so I have this code that gives NaN everywhere when I divide two different dataframes to build a new dataframe... Here's the code:
```py
import pandas as pd
import matplotlib.pyplot as plt

# Load the financial data of the pharmaceutical companies
df = pd.read_csv(r'data//income_statement.csv')
# Select the columns to include in the analysis
cols = ['entreprise', 'date', 'chiffre_affaires', 'resultat_operationnel', 'resultat_net']
df = df[cols]
df['date'] = pd.to_datetime(df['date'], format='%Y', errors='ignore')
# Split the data into the pre-COVID-19 years (2019 and earlier) and the COVID-19 years (2020 and later)
df_avant_covid = df[df['date'].dt.year < 2020]
df_apres_covid = df[df['date'].dt.year >= 2020]
# Compute the yearly mean revenue and operating income for each company, before and after COVID-19
df_avant_covid = df_avant_covid.groupby(['entreprise', df_avant_covid['date'].dt.year]).mean()
df_apres_covid = df_apres_covid.groupby(['entreprise', df_apres_covid['date'].dt.year]).mean()
# Compute the change in revenue and operating income between the pre- and post-COVID-19 periods
df_variation = pd.DataFrame()
df_variation['variation_ca'] = df_apres_covid['chiffre_affaires'] / df_avant_covid['chiffre_affaires'] - 1
df_variation['variation_op'] = df_apres_covid['resultat_operationnel'] / df_avant_covid['resultat_operationnel'] - 1
# Print the changes in revenue and operating income for each company
print(df_variation)
# Plot a chart comparing the changes in revenue and operating income for each company
plt.bar(df_variation.index, df_variation['variation_ca'], label="variation du chiffre d'affaires")
plt.bar(df_variation.index, df_variation['variation_op'], label="variation du résultat opérationnel")
plt.legend()
plt.show()
```
here's the output of the df_variation dataframe containing the NaNs from the operation:
```py
                       variation_ca  variation_op
entreprise       date
Roche Holding AG 2018           NaN           NaN
                 2019           NaN           NaN
                 2020           NaN           NaN
                 2021           NaN           NaN
```
and here's the error when it tries to generate the plot: ```py
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_8504/54936595.py in <module>
24
25 # Plot a chart comparing the changes in revenue and operating income for each company
---> 26 plt.bar(df_variation.index, df_variation['variation_ca'], label="variation du chiffre d'affaires")
27 plt.bar(df_variation.index, df_variation['variation_op'], label='variation du résultat opérationnel')
28 plt.legend()
c:\Users\PEGON\anaconda3\lib\site-packages\matplotlib\pyplot.py in bar(x, height, width, bottom, align, data, **kwargs)
2649 x, height, width=0.8, bottom=None, *, align='center',
2650 data=None, **kwargs):
-> 2651 return gca().bar(
2652 x, height, width=width, bottom=bottom, align=align,
2653 **({"data": data} if data is not None else {}), **kwargs)
c:\Users\PEGON\anaconda3\lib\site-packages\matplotlib\__init__.py in inner(ax, data, *args, **kwargs)
1359 def inner(ax, *args, data=None, **kwargs):
1360 if data is None:
-> 1361 return func(ax, *map(sanitize_sequence, args), **kwargs)
1362
1363 bound = new_sig.bind(ax, *args, **kwargs)
c:\Users\PEGON\anaconda3\lib\site-packages\matplotlib\axes\_axes.py in bar(self, x, height, width, bottom, align, **kwargs)
2277
2278 if orientation == 'vertical':
-> 2279 self._process_unit_info(
2280 [("x", x), ("y", height)], kwargs, convert=False)
2281 if log:
c:\Users\PEGON\anaconda3\lib\site-packages\matplotlib\axes\_base.py in _process_unit_info(self, datasets, kwargs, convert)
2339 # Update from data if axis is already set but no unit is set yet.
2340 if axis is not None and data is not None and not axis.have_units():
-> 2341 axis.update_units(data)
2342 for axis_name, axis in axis_map.items():
2343 # Return if no axis is set.
c:\Users\PEGON\anaconda3\lib\site-packages\matplotlib\axis.py in update_units(self, data)
1446 neednew = self.converter != converter
1447 self.converter = converter
-> 1448 default = self.converter.default_units(data, self)
1449 if default is not None and self.units is None:
1450 self.set_units(default)
c:\Users\PEGON\anaconda3\lib\site-packages\matplotlib\category.py in default_units(data, axis)
...
---> 92 raise TypeError(
93 "{!r} must be an instance of {}, not a {}".format(
94 k,
TypeError: 'value' must be an instance of str or bytes, not a tuple```
anyone have a clue?
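The `TypeError` at the bottom of that traceback likely comes from the x values: `df_variation.index` is a MultiIndex, so each entry is a tuple such as `('Roche Holding AG', 2018)`, and `plt.bar` only accepts strings or numbers as categories. A sketch, with placeholder numbers rather than real results, that turns each index entry into a string label first:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# A frame indexed by (entreprise, date), like df_variation (placeholder values)
index = pd.MultiIndex.from_tuples(
    [("Roche Holding AG", 2018), ("Roche Holding AG", 2019)], names=["entreprise", "date"]
)
df_variation = pd.DataFrame({"variation_ca": [0.05, 0.02], "variation_op": [0.10, 0.04]}, index=index)

# plt.bar cannot use tuples as categories, so build one string label per index entry
labels = [f"{entreprise} {date}" for entreprise, date in df_variation.index]

fig, ax = plt.subplots()
ax.bar(labels, df_variation["variation_ca"], label="variation_ca")
ax.bar(labels, df_variation["variation_op"], label="variation_op")
ax.legend()
```

Two `bar` calls on the same labels draw on top of each other; pandas' `df_variation.plot.bar()` draws grouped bars and formats tuple labels itself. Even with valid labels, NaN heights still produce empty bars, so the NaN problem needs fixing separately.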
Check df_avant_covid['chiffre_affaires'] and df_avant_covid['resultat_operationnel'] for zeros.
there's no 0 in df_avant_covid:
```py
                       chiffre_affaires  resultat_operationnel  resultat_net
entreprise       date
Roche Holding AG 2018          60829.73               15099.83      10735.20
                 2019          64165.38               17662.06      13584.73
```
I have no NaN there either
the NaNs come from here:
df_variation
because I do an operation there using the two dfs above (df_avant_covid & df_apres_covid)
but they don't have the same index, I think that's why
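That guess looks right: arithmetic between two pandas Series is aligned on index labels, not on position. The pre-COVID means are indexed by 2018/2019 and the post-COVID means by 2020/2021, so no label matches and every result is NaN. A minimal reproduction using the figures posted in the chat:

```python
import pandas as pd

# Yearly means like df_avant_covid / df_apres_covid after the groupby (values from the chat)
before = pd.Series([60829.73, 64165.38], index=[2018, 2019])
after = pd.Series([64361.84, 72046.48], index=[2020, 2021])

# Series arithmetic aligns on index labels; no year is in both, so every result is NaN
aligned = after / before - 1

# Dividing the raw arrays pairs values by position instead (2020 with 2018, 2021 with 2019)
positional = after.to_numpy() / before.to_numpy() - 1
```

The positional version works, but only because the rows happen to be in a matching order; comparing period averages on a shared index is usually safer.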
Oh, it could also be happening because when you groupby you have no data. Then when you call .mean() you get NaN.
nah, it's not
@queen cradle welcome to our wonderful data science chat 
Thank you
how did you find this channel immediately after joining, anyway? 
Um, I scrolled down?
good to know. (some people complain about the findability of our channels.)
I think it might be clearer with this little explanation:
" I want to divide (mean of 2018-2019) / (mean of 2020-2021) from the "chiffre_affaires" column... But it's complicated because it's inside the same dataframe, and I want to store the result in the "variation_ca" column "
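Going by the code, the intended quantity seems to be `mean(2020-2021) / mean(2018-2019) - 1` per company (the quote above has the ratio the other way round; flipping it is a one-line change). A sketch, assuming `date` already holds plain years: tag each row with its period (the `period` column is a name introduced here), take one mean per company and period, and unstack so both periods sit side by side on the same index.

```python
import pandas as pd

# Figures for one company as posted in the chat; 'date' holds plain years here
df = pd.DataFrame({
    "entreprise": ["Roche Holding AG"] * 4,
    "date": [2018, 2019, 2020, 2021],
    "chiffre_affaires": [60829.73, 64165.38, 64361.84, 72046.48],
    "resultat_operationnel": [15099.83, 17662.06, 19777.96, 19863.39],
})

# Label each row with its period instead of splitting into two frames
df["period"] = df["date"].map(lambda year: "apres_covid" if year >= 2020 else "avant_covid")

# One mean per company and period, then move the period level into the columns
cols = ["chiffre_affaires", "resultat_operationnel"]
means = df.groupby(["entreprise", "period"])[cols].mean().unstack("period")

# Both operands now share the same 'entreprise' index, so the division lines up
variation = pd.DataFrame({
    "variation_ca": means[("chiffre_affaires", "apres_covid")] / means[("chiffre_affaires", "avant_covid")] - 1,
    "variation_op": means[("resultat_operationnel", "apres_covid")] / means[("resultat_operationnel", "avant_covid")] - 1,
})
```

With several companies in the CSV, the same code yields one row per company, which can then go straight into a bar chart.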
With a server this big, I assumed when I joined that there would be lots of channels I wouldn't be interested in; I just kept going until I found some that looked promising. But I'm not new to Discord, and I think the long list of channels might have been more difficult for me to parse if I were.
thanks for your feedback 👍
can someone help me figure out why this is not working?
this is how br looks
br.set_index(["area_name","dat"]).stack("area_name")
@serene scaffold
I think it's related to the area name column, because a plain set_index("area").stack() is also not working for it.
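`stack(level)` refers to a level of the *column* index and moves it into the rows, so after `set_index(["area_name", ...])` there is likely no column level called `area_name` for it to find; moving an index level out into the columns is `unstack`. A sketch on a hypothetical long-format `br` with made-up values (the `date` and `value` column names are assumptions, since the real frame isn't shown):

```python
import pandas as pd

# Hypothetical long-format data: one row per (area_name, date)
br = pd.DataFrame({
    "area_name": ["north", "north", "south", "south"],
    "date": ["2020-01", "2020-02", "2020-01", "2020-02"],
    "value": [1, 2, 3, 4],
})

indexed = br.set_index(["area_name", "date"])
# unstack moves an index level into the columns: one column per area
wide = indexed["value"].unstack("area_name")
# stack goes the other way: it moves the column labels back into the index
long_again = wide.stack()
```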
In the code you posted, df_variation is a new DataFrame unrelated to df. It looks like df doesn't have chiffre_affaires or resultat_operationnel columns and like df_variation only has those two columns. But in the picture, it looks like you have one large data frame with everything. Also, I can see a df4. So I think you must have done something that you haven't shown us.
I recommend restarting your analysis and inserting print statements.
ok i found something but got a new error now
here's the dataframe I have:
```py
      variation_ca  variation_op  chiffre_affaires  resultat_operationnel  \
date
2018           0.0           0.0          60829.73               15099.83
2019           0.0           0.0          64165.38               17662.06
2020           0.0           0.0          64361.84               19777.96
2021           0.0           0.0          72046.48               19863.39

      resultat_net
date
2018      10735.20
2019      13584.73
2020      15247.05
2021      15240.81
```
and the code with the error is:
```py
import pandas as pd

# Load the financial data of the pharmaceutical companies
df = pd.read_csv(r'data//income_statement.csv')
# Select the columns to include in the analysis
cols = ['entreprise', 'date', 'chiffre_affaires', 'resultat_operationnel', 'resultat_net']
df = df[cols]
df['date'] = pd.to_datetime(df['date'], format='%Y', errors='ignore')
# Split the data into the pre-COVID-19 years (2019 and earlier) and the COVID-19 years (2020 and later)
df_avant_covid = df[df['date'].dt.year < 2020]
df_apres_covid = df[df['date'].dt.year >= 2020]
# Compute the yearly mean revenue and operating income for each company, before and after COVID-19
df_avant_covid = df_avant_covid.groupby([df_avant_covid['date'].dt.year]).mean()
df_apres_covid = df_apres_covid.groupby([df_apres_covid['date'].dt.year]).mean()
# Compute the change in revenue and operating income between the pre- and post-COVID-19 periods
df_variation = pd.DataFrame()
df_variation['variation_ca'] = df_apres_covid['chiffre_affaires'] / df_avant_covid['chiffre_affaires'] - 1
df_variation['variation_op'] = df_apres_covid['resultat_operationnel'] / df_avant_covid['resultat_operationnel'] - 1
data = [df_variation, df_avant_covid, df_apres_covid]
df4 = pd.concat(data)
df4 = df4.iloc[4:, :]
df4 = df4.fillna(0)
# Select the rows to use for the division
df4 = df4.set_index("date")
row_2018_2019 = df4.loc[['2018', '2019'], 'chiffre_affaires']
row_2021_2020 = df4.loc[['2021', '2020'], 'chiffre_affaires']
# Divide the selected rows
df4["variation_ca"] = row_2021_2020 / row_2018_2019
df4
```
gives me this error and I don't know why: ```py
KeyError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_12880/950567047.py in <module>
26
27 # Select the rows to use for the division
---> 28 df4 = df4.set_index("date")
29 row_2018_2019 = df4.loc[['2018', '2019'], 'chiffre_affaires']
30 row_2021_2020 = df4.loc[['2021', '2020'], 'chiffre_affaires']
c:\Users\PEGON\anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
309 stacklevel=stacklevel,
310 )
--> 311 return func(*args, **kwargs)
312
313 return wrapper
c:\Users\PEGON\anaconda3\lib\site-packages\pandas\core\frame.py in set_index(self, keys, drop, append, inplace, verify_integrity)
5449
5450 if missing:
-> 5451 raise KeyError(f"None of {missing} are in the columns")
5452
5453 if inplace:
KeyError: "None of ['date'] are in the columns"```
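In the printed frame above, `date` appears on its own line under the column headers, which is how pandas shows an *index* name: grouping by `df['date'].dt.year` turns the years into an index named `date`, so there is no `date` column left for `set_index` to use. `dt.year` also produces integers, so looking up `'2018'` as a string would probably fail next. A sketch on a frame shaped like `df4` (values from the chat):

```python
import pandas as pd

# Shape of df4 in the question: yearly values whose index is named 'date'
df4 = pd.DataFrame(
    {"chiffre_affaires": [60829.73, 64165.38, 64361.84, 72046.48]},
    index=pd.Index([2018, 2019, 2020, 2021], name="date"),
)

print("date" in df4.columns)  # False, which is why set_index("date") raises KeyError
print(df4.index.name)         # date

# Option A: select straight from the index, using integer years rather than '2018' strings
before = df4.loc[[2018, 2019], "chiffre_affaires"].mean()
after = df4.loc[[2020, 2021], "chiffre_affaires"].mean()
variation_ca = after / before - 1

# Option B: turn the index back into an ordinary column if one is really needed
df4_flat = df4.reset_index()
```

Option A produces a single number per company rather than a column, which matches the "mean of 2018-2019 vs mean of 2020-2021" goal better than dividing row by row.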
how much data is usually needed for very effective CNNs? And does the accuracy of an algorithm come down to just how much data you have, or to the type of algorithm you're creating?
I've read that Relativistic Discriminators tend to perform better in GANs, and now that I've implemented it, my discriminator simply won't learn anything...nice.
Accuracy should generally get higher the more data you have, unless your algorithm is overfitting (in that case, it comes down to the type of algorithm you're creating).
And people tend to use tens of thousands of samples, from what I've seen so far.
but what's the difference between 10,000 and 1M ?
would the same algorithm perform way better on 1M images?
Probably
1M images = more features to learn, more ways to generalize
A person who only learned around 100 words will have way more difficulty in communicating and developing social skills than someone who has more than 25,000 words in his vocabulary
Now it's working...
Why does initializing my weights with too low a std lead to vanishing gradients, though? It doesn't seem to make sense
||Also thanks ChatGPT...someday I'll have an AI better than you.||
Does anyone know how to get only one line as the output for this?
```py
s1 = """
Apples Oranges Grapes
White Black Red Green"""
s2 = "Apples"
print(s1[s1.index(s2) + len(s2):])
```
Output:
```
Oranges Grapes
White Black Red Green
```
I want the output to be Oranges Grapes (i.e. only the rest of the line after the word)
nvm i got it
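One way to keep just the rest of that line: take everything after the search word as before, then cut at the first newline and trim the surrounding spaces.

```python
s1 = """
Apples Oranges Grapes
White Black Red Green"""
s2 = "Apples"

# Everything after the search word...
rest = s1[s1.index(s2) + len(s2):]
# ...up to (but not including) the next newline, without the leading space
first_line = rest.split("\n", 1)[0].strip()
print(first_line)  # Oranges Grapes
```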
any data scientists here?
@charred light can you suggest a dataset?
with the dataset I used, the accuracy is -0.44
are you there ?
Negative accuracy!
Ur predictions in a black hole or smtn
ok
@steady basalt can you help with the datasets
these are the three datasets
I am finding it difficult to find the features
EV dataset has only 10 rows
I need at least a big dataset to calculate R²
R-squared
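If the -0.44 is actually an R² score (an assumption, since it is reported as "accuracy"), a negative value is possible: R² = 1 − SS_res / SS_tot drops below 0 whenever the predictions fit worse than simply predicting the mean of y. A small NumPy sketch of the formula (scikit-learn's `sklearn.metrics.r2_score` computes the same quantity):

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # squared error of the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared error of "always predict the mean"
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
print(r2_score(y_true, [2.5, 2.5, 2.5, 2.5]))  # 0.0: no better than the mean
print(r2_score(y_true, [4.0, 3.0, 2.0, 1.0]))  # -3.0: worse than the mean
```

R² can be computed on 10 rows; it just isn't a reliable estimate of how the model will do on new data with so few points.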
does anyone know why I get this error?
```py
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp

def acc_func(t, vals):
    x, y, vx, vy = vals
    acc_x = -2.0 * y**2 * x * (1 - (x**2)) * np.exp(- (x**2 + y**2))
    acc_y = -2.0 * x**2 * y * (1 - (y**2)) * np.exp(- (x**2 + y**2))
    return np.array([vx, vy, acc_x, acc_y], dtype=object)

def trajectory(impactpar, speed):
    maxtime = 10.0 / speed
    t = np.linspace(0, maxtime, 300)
    x0 = impactpar
    y0 = -2.0
    vx0 = 0.0
    vy0 = speed
    vals = np.array([x0, y0, vx0, vy0], dtype=object)
    acc = solve_ivp(acc_func, (0., 300.), vals, t_eval=t)
    x = acc.y[0]
    y = acc.y[1]
    return x, y

x, y = trajectory(0.15, 0.1)

# Plot the trajectory
plt.plot(x, y)
plt.xlabel("x")
plt.ylabel("y")
plt.show()

# # Solution to part (b)
def scatterangles(allb, speed):
    angles = []
    for impactpar in allb:
        x, y = trajectory(impactpar, speed)
        vx = x[-1]
        vy = y[-1]
        angle = np.arctan2(vy, vx)
        angles.append(angle)
    return angles

allb = np.arange(-2, 2, 0.001)
angles = scatterangles(allb, 0.1)
plt.plot(allb, angles)
plt.xlabel("Impact parameter")
plt.ylabel("Scatter angle")
plt.show()
```
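A few things in that script look worth checking (hedged, since the actual error message isn't shown): `t_span` is fixed at `(0., 300.)` while `t_eval` runs to `10.0 / speed`, so any speed below about 0.033 puts `t_eval` outside `t_span` and `solve_ivp` raises a `ValueError`; `scatterangles` takes the last *positions* `x[-1]`, `y[-1]` as if they were velocities; and the `dtype=object` arrays are unnecessary. A sketch with those changed (`final_velocity` is a helper name introduced here), keeping the original `arctan2(vy, vx)` convention, so an undeflected particle comes out at π/2:

```python
import numpy as np
from scipy.integrate import solve_ivp

def acc_func(t, vals):
    # State vector: positions (x, y) followed by velocities (vx, vy)
    x, y, vx, vy = vals
    envelope = np.exp(-(x**2 + y**2))
    acc_x = -2.0 * y**2 * x * (1 - x**2) * envelope
    acc_y = -2.0 * x**2 * y * (1 - y**2) * envelope
    return [vx, vy, acc_x, acc_y]

def final_velocity(impactpar, speed):
    maxtime = 10.0 / speed
    # Integrate over exactly the interval that t_eval covers
    sol = solve_ivp(acc_func, (0.0, maxtime), [impactpar, -2.0, 0.0, speed],
                    t_eval=np.linspace(0.0, maxtime, 300), rtol=1e-8, atol=1e-10)
    # Rows 2 and 3 of sol.y hold vx and vy; take their values at the last time step
    return sol.y[2, -1], sol.y[3, -1]

def scatterangles(allb, speed):
    angles = []
    for b in allb:
        vx, vy = final_velocity(b, speed)
        angles.append(np.arctan2(vy, vx))
    return np.array(angles)
```

`angles = scatterangles(np.linspace(-2, 2, 401), 0.1)` can then be plotted against the impact parameters as before; 401 points instead of 4000 keeps the run time manageable.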
Try the .divide() function instead of /
I'm happy to provide the question to make more sense
my graph looks like this👀 😂
sorry for the spam, but it looks better now, just the range error still :(
I think I might need to use solve_ivp
Hey @keen notch!
You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.
Hey, I think this is the perfect place to ask my question, so let's give it a shot. I am building a predictive model based on temperature data from my greenhouse. I have data spanning 10 months, from January to October. Every hour, the sensors record attributes such as: average temperature inside and outside the greenhouse, average relative humidity, absolute humidity and average moisture deficit. I have plotted my data and I see some points that are quite high. Because of this, I want to detect outliers and remove them, but I don't know how best to approach it in this use case, mainly because the data represents real measurements and all the attributes depend on each other to some extent. Any advice on how to approach this?
hey guys, I started learning data science in Sep 2022, and so far I've covered Python, pandas, NumPy, GUIs, databases, graphs and charts. Although data science is a very vast topic, I'm planning to learn up to machine learning, which includes statistics, EDA & feature engineering, ML, PCA, NLP and time series analysis, and after that do end-to-end ML projects. Will this be enough to get an entry-level job in the ML field? I will keep studying data science alongside my current job. My background is civil engineering.
Hi
I need source code for an Instagram scraper which scrapes an account's followers and gets their emails
can we do this ? hacking
Web scraping
ohk
I don't think this is the right channel to ask.
lol…..
yes
can you just recommend me
which features i can take into consideration and do feature engineering
What is useful
And also, you have 10 rows; this isn't enough unless you're joining datasets and have a foreign key
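Joining tables on a shared key is what can make a small table usable. A sketch with hypothetical column names and made-up numbers, assuming both datasets are keyed by country and year:

```python
import pandas as pd

# Hypothetical tables: both keyed by (country, year), values made up
co2 = pd.DataFrame({
    "country": ["A", "A", "B"],
    "year": [2019, 2020, 2019],
    "co2_mt": [10.0, 9.5, 4.0],
})
pollution = pd.DataFrame({
    "country": ["A", "B", "B"],
    "year": [2019, 2019, 2020],
    "no2_index": [30.0, 12.0, 11.0],
})

# (country, year) plays the role of the foreign key; "inner" keeps only pairs present in both tables,
# and validate= raises an error if a key pair is duplicated on either side
merged = co2.merge(pollution, on=["country", "year"], how="inner", validate="one_to_one")
```

Rows without a match on both sides are dropped by an inner join, so it's worth checking how many rows survive before modelling.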
ya
what if we take co2 dataset and air pollution dataset
what features would you suggest
these are the datasets
Given car engine, model, price etc?
your target variable is global temperature?
What is each sample?
A sample of what, factory output?
Where do you get this data, what is the actual data? I'm on mobile
You can just say what one sample is in the dataset you're working on
- global temperature
i want to calculate R-squared, and for that I need a lot of data
in all the datasets i have
you need to properly describe your data
Just one row
Global temperature is what you're predicting?
Ok I see
to predict my hypothesis
Global temp is time series monthly
i tried but i got -0.44 accuracy
Time series regression isn’t accuracy based
It’s error
Did you take the square of the error? What is your metric?
I just want you to recommend which features I can use from the two datasets, and then I'll apply linear regression
You need to understand better what predictions are and how you measure them
i dont wanna use global temperature
There’s no point until you understand how we measure regression
It’s more important than features
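Two common error metrics for regression and forecasting are mean absolute error (MAE) and root mean squared error (RMSE), both in the target's own units. A small sketch with made-up numbers:

```python
import numpy as np

y_true = np.array([20.0, 21.0, 23.0, 22.0])  # made-up observed values
y_pred = np.array([21.0, 21.0, 21.0, 24.0])  # made-up predictions

errors = y_pred - y_true
mae = np.mean(np.abs(errors))         # average size of a miss
rmse = np.sqrt(np.mean(errors ** 2))  # squares first, so large misses weigh more
```

Here MAE is 1.25 and RMSE is 1.5; RMSE is larger because the two misses of size 2 dominate once squared.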
ok
does nobody have an answer to this?
If you must merge data sets maybe you can show over time the largest co2 producers increasing co2
Have you identified extreme outliers
Maybe just a box plot
merge co2 and air pollution?
What defines an 'extreme outlier' ?
You need to define a goal before you define a method
What do you set out to predict
Decide that first
anyone ?
Something way beyond the IQR, I guess? You can eyeball it and see
does anyone know how to use solve_ivp?
If there’s a distribution of points and a single point miles out
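One common, concrete definition is Tukey's fences: a point is an outlier if it lies more than 1.5 × IQR below Q1 or above Q3, and "far out" (an extreme outlier) beyond 3 × IQR. A sketch with pandas on made-up hourly temperatures:

```python
import pandas as pd

def iqr_outliers(series, k=1.5):
    """Boolean mask of points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

# Made-up hourly greenhouse temperatures with one suspicious reading
temps = pd.Series([20.0, 21.0, 22.0, 21.0, 20.0, 45.0, 22.0])
flags = iqr_outliers(temps)         # k=1.5: the usual outlier fence
extreme = iqr_outliers(temps, k=3)  # k=3: Tukey's "far out" fence
```

This treats the series as an unordered bag of values, so with strong day/night cycles a perfectly normal afternoon peak can land outside the fences; it is a first screen, not a verdict.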
yes I am unable to get a good dataset
I can show you some charts? Since it is greenhouse data, the variables do depend on each other.
It matters then whether you think it’s relevant or not to your model
check government websites
i did but didn't find a relevant dataset
You can maybe combine datasets to get the wanted result.
I don’t think global temperature is something easy to predict unless you’re just going with x decades of trend
Shit's random and has been for millions of years
All u can do is say line go up because humans
let's say we can use the CO2 dataset and the air pollution dataset
can we say that the increase in CO2 and NO2
is causing global warming?
Supermoon, want to see some charts?
Controversial question
Sure
this is co2 dataset
Correlation.. causation… it’s theoretical still
Not something I’d personally work on
so features like country year can be taken into consideration
:incoming_envelope: :ok_hand: applied mute to @unique ridge until <t:1672409033:f> (10 minutes) (reason: attachments rule: sent 7 attachments in 10s).
The <@&831776746206265384> have been alerted for review.
Ouch
!unmute 360683248151429131
:incoming_envelope: :ok_hand: pardoned infraction mute for @unique ridge.
Whatever is useful and related to your outcome
🤣
Could you please upload them to some image host instead perhaps? Otherwise our bot will not like it
sure but it might take a little bit of time, since the bot looks at 10s window
Outliers where?
Might not want to delete anything based on that
More useful than time series
What do you mean?
How can you decide to delete points based on the series
If they could be legit and caused by something at that time
I'd imagine none of those points will appear as outliers on your box plot
Since it is real data, it is all legit indeed, but the sensors can do a fuckywucky of course.
If you believe this is the case, try linear interpolation?
Doesn't look too bad, but I don't know much about greenhouses
Since I don't know whether the sensor did an oopsie, I want to find possible outliers. Just keep in mind that if it is warm outside, the temperature and humidity in the greenhouse increase as well. Preventive actions are only taken if the variables change too fast.
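Because the readings are a time series, one option is to compare each point with its neighbours rather than with the whole distribution, then fill flagged points by linear interpolation as suggested above. A sketch of a simplified Hampel-style filter on made-up hourly temperatures (the window size and threshold are arbitrary starting points, not tuned values):

```python
import pandas as pd

def hampel_mask(series, window=5, n_sigmas=3.0):
    """Flag points that sit far from the rolling median of their neighbours."""
    rolling_median = series.rolling(window, center=True, min_periods=1).median()
    abs_dev = (series - rolling_median).abs()
    # Rolling median absolute deviation, scaled by 1.4826 to approximate a standard deviation for Gaussian noise
    mad = 1.4826 * abs_dev.rolling(window, center=True, min_periods=1).median()
    return abs_dev > n_sigmas * mad

# Made-up hourly inside temperatures with one spike
temps = pd.Series([18.0, 18.5, 19.0, 35.0, 19.5, 20.0, 20.5])
mask = hampel_mask(temps)
# Blank out flagged readings and fill them by linear interpolation between neighbours
cleaned = temps.mask(mask).interpolate(method="linear")
```

Since the attributes move together, a flag is more convincing when the related columns (e.g. outside temperature) did not jump at the same hour; running the filter per column and comparing the masks is one way to check that.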
What are you modelling for
I basically want to predict the temperature in the greenhouse based on the other 5 attributes.
In case your thermostat breaks?
I'm following CRISP-DM, so I'm now at the data preparation stage.
No, so you can predict the upcoming temperature 😛
Interesting
It's a small school project, so nothing too serious, but if I do something, I want it done right (or at least as well as I can)
So you'd suggest I don't need outlier removal?
I wouldn’t…
But you know glasshouses better than me
And it could be a good idea if ur usecase aligns with it
Maybe your recorders weren't faulty
What I could maybe do is select the 10 highest values of each attribute and check whether the values of the other attributes match?
Is it possible to test your model now without doing so, to see how close it is?
Predicting the next day's 15 readings
Say
Maybe give it to a BiLSTM and then predict windows
Yeah, I can iterate through it multiple times: one run with outlier removal and one without.
Try it and see what you get after all the preprocessing
At predicting the next 5 readings
Look at absolute error maybe to get an idea
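Predicting "the next 5 readings" usually means reshaping the series into windows: the last `n_in` readings as input and the following `n_out` readings as the target. A NumPy sketch (the helper name and window sizes are illustrative), plus a naive "repeat the last reading" baseline whose mean absolute error any model should beat:

```python
import numpy as np

def make_windows(values, n_in, n_out):
    """Split a 1-D series into (last n_in readings -> next n_out readings) pairs."""
    values = np.asarray(values, dtype=float)
    inputs, targets = [], []
    for start in range(len(values) - n_in - n_out + 1):
        inputs.append(values[start:start + n_in])
        targets.append(values[start + n_in:start + n_in + n_out])
    return np.array(inputs), np.array(targets)

readings = np.arange(10, dtype=float)  # stand-in for hourly temperatures
X, y = make_windows(readings, n_in=3, n_out=5)

# Naive persistence baseline: repeat the last seen reading for every future step
baseline = np.repeat(X[:, -1:], y.shape[1], axis=1)
baseline_mae = np.mean(np.abs(baseline - y))
```

Recurrent layers such as an LSTM typically expect input shaped `(samples, timesteps, features)`, so `X[..., np.newaxis]` adds the feature axis for a single variable; with all five attributes you would stack them along that last axis instead.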
Are you using all readings at equal intervals to predict temp?
If only you had sunlight data too
hey!
the existence of a forward feeding neural network implies the existence of a mysterious, unseen backward feeding neural network
Can initializing my weights with a very low standard deviation (2e-5) lead to vanishing gradients, or was this just ChatGPT being delirious?
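A small NumPy experiment (a plain tanh MLP, width 256, 8 layers, all sizes chosen arbitrarily) suggests the claim is plausible for stacked layers without skip connections: each layer scales the signal by roughly `std * sqrt(fan_in)`, about 3.2e-4 for std 2e-5 and width 256, and backpropagation multiplies by the transposed weights again, so both the activations and the gradient reaching the input shrink geometrically with depth. With 1/√fan_in scaling (LeCun/Xavier-style) the per-layer factor stays near 1.

```python
import numpy as np

def forward_backward(weight_std, n_layers=8, width=256, seed=0):
    """Tanh MLP: return per-layer activation std and the norm of d(sum of outputs)/d(layer input)."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((64, width))  # batch of 64 unit-variance inputs
    weights, outputs, act_stds = [], [], []
    for _ in range(n_layers):
        W = rng.normal(0.0, weight_std, size=(width, width))
        h = np.tanh(h @ W)
        weights.append(W)
        outputs.append(h)
        act_stds.append(h.std())

    grad = np.ones_like(h)  # gradient of sum(outputs) w.r.t. the last layer's output
    grad_norms = []
    for W, out in zip(reversed(weights), reversed(outputs)):
        grad = (grad * (1.0 - out ** 2)) @ W.T  # chain rule through tanh, then through h @ W
        grad_norms.append(np.linalg.norm(grad))  # last entry = gradient w.r.t. the network input
    return act_stds, grad_norms

tiny_acts, tiny_grads = forward_backward(2e-5)
xavier_acts, xavier_grads = forward_backward(1.0 / np.sqrt(256))
```

Networks with residual connections behave differently, because the skip path carries the signal and gradient around each block; some schemes even initialise the last layer of each residual branch at or near zero on purpose, so in a residual generator the residual scaling factor may matter as much as the raw std.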
I've tested it and it seemed to actually make a difference, but I was sleepy back then and I might've changed my residual scaling factor...idk...I don't remember... 
I was hoping to see more discussions about GPT and AI here; maybe someone could recommend another channel?
Everyone here is already talking about AI. And there's more to AI than gpt.
The message before yours even mentions chatgpt
When bringing my TensorFlow model code into an API, do I have to save the model and load it in the API file? I understand that saving the model saves its weights and accuracy, but assuming I keep the same parameters and code in the API code, shouldn't the model reach around the same accuracy after fitting it?
In other words, why can't I just copy and paste the same code that trains/fits the model into the API code file, as it only runs once when the server is starting up?
Nvm, I realize now that the model takes a long time to train, so training it once and saving it saves you a lot of time.
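The pattern arrived at here (train once, persist, load when the API starts) can be sketched without TensorFlow. Below, a toy least-squares line and `pickle` stand in for the real model and its saved file, and the path and helper names are hypothetical; with Keras the analogous calls would be `model.save("model.keras")` after training and `keras.models.load_model("model.keras")` at start-up. Loading also gives identical weights on every restart, whereas retraining would typically give slightly different ones because of random initialisation.

```python
import os
import pickle
import tempfile

# Hypothetical location for the saved model
MODEL_PATH = os.path.join(tempfile.gettempdir(), "toy_model_sketch.pkl")

def train_model():
    # Stand-in for the expensive training step (model.fit in Keras): fit y = slope * x + intercept
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 3.0, 5.0, 7.0]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum((x - mean_x) ** 2 for x in xs)
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

def load_or_train(path=MODEL_PATH):
    # Reuse the saved model if it exists; otherwise train once and save it for next time
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    model = train_model()
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return model

MODEL = load_or_train()  # runs once, when the API process starts

def predict(x):
    # Request handlers only call this; no training happens per request
    return MODEL["slope"] * x + MODEL["intercept"]
```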
ya gotta retrain the model before each prediction
no batching, either

to be clear, I was just being silly, since you seem to have figured out on your own that you can save and load models.
No I understand no worries. Just took me one google search of why do I need to save models🤦♂️
anyone got a good guide on how to hook up an ANN to dynamic input/output?
Hello, I am using a module called chatterbot and I am trying to see what the confidence of the chatbot’s response is, does anyone know how to?
I'm trying to install chatterbot so I can figure that out for you, but I suspect that each response from the bot isn't a string, but some kind of Response object
Curious... I thought pretraining my generator with an L1 loss would mess up the adversarial training with the discriminator, but it actually seems to do no harm at all... so far, at least
EDIT: during adversarial training, the discriminator simply messes up the generator's pretrained weights 
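One common way to stop adversarial training from undoing an L1 pretraining, when paired targets exist, is to keep the L1 term in the generator's loss during adversarial training, as in pix2pix, whose paper used a weight of λ = 100. A PyTorch sketch of that combined generator objective (the function name and default weight are illustrative, not a drop-in for any particular codebase):

```python
import torch
import torch.nn as nn

adversarial_criterion = nn.BCEWithLogitsLoss()
l1_criterion = nn.L1Loss()

def generator_loss(disc_logits_on_fake, fake_images, target_images, lambda_l1=100.0):
    # The generator wants the discriminator to label its outputs as real (target = 1)
    adv = adversarial_criterion(disc_logits_on_fake, torch.ones_like(disc_logits_on_fake))
    # The reconstruction term keeps outputs close to the targets, as during L1 pretraining
    rec = l1_criterion(fake_images, target_images)
    return adv + lambda_l1 * rec, adv, rec
```

Returning the two parts separately makes it easy to log them and see whether the adversarial term is drowning out the reconstruction term or vice versa.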
I'm almost becoming a GAN researcher...too bad I still couldn't get any decent result
Wdym?
How hard is it to build a good gan model
we get a lot of guides about how to parse data into an ANN, but we don't get many guides on how to hook them up as, say, a control system
like for example a self driving car
Control system??
Oh right
Well that’s just engineering
well, the coding part
Once you have the model you can deploy it
For instance, you can use the model you’ve trained in an app
like for example
input A ->>> ANN ->>> output B
Apparently, it isn't that hard if you're a random guy making a tutorial on the internet, but it's proving a bit hard for me 
i want more examples of parts A and B
Data will come from somewhere, for me it’s cloud based. For your self driving car, I suppose an app will stream images
That’s probably very complex software
So I can only explain on a more basic level
You can do a batch or a real time app
Once you've built a model, it's all app building, plus retraining to account for drift
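A minimal sketch of the real-time "input A -> model -> output B" loop described above, in plain Python: readings arrive one at a time, a rolling window is kept, and each full window is passed to the model to produce an action. The readings, the window size and `threshold_model` are all hypothetical stand-ins; in a real system the input would be a sensor, socket or message queue, and the model a trained network's predict call.

```python
from collections import deque

def run_realtime(readings, model, window=3):
    """Consume a stream of readings, keep a rolling window, emit one action per full window."""
    buffer = deque(maxlen=window)
    actions = []
    for reading in readings:  # input A: stands in for a live data source
        buffer.append(reading)
        if len(buffer) == window:
            actions.append(model(list(buffer)))  # output B: handed to whatever acts on it
    return actions

def threshold_model(window):
    # Hypothetical stand-in for a trained network's predict call
    return "brake" if sum(window) / len(window) > 0.5 else "cruise"

print(run_realtime([0.1, 0.2, 0.9, 0.9, 0.8], threshold_model))  # ['cruise', 'brake', 'brake']
```

A batch app would instead collect records first and call the model on all of them at once; the model call itself stays the same, only the plumbing around it changes.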
my dissertation is ML with cybersecurity. honestly, all the stuff so far is about importing static data
There is some data engineering and possibly ml ops you will need to learn then
very much so. got any places i can start
Data engineering is about getting the data to the model so it can retrain, and with MLOps you will need to work out how to deploy it to produce results
I mean I started on the job with Azure pipelines
What is your objective?
Don’t do something random as it’s harder
honestly, I've got like 90% of a comp sci degree in me, so I can code whatever's needed,
In which case
basically making an ML-powered bot