#data-science-and-ml
uuuuuh
no but
i get my model with load_model from keras
i need to load the model, remove the last layer, add my own, freeze, and train?
it shouldnt take too long, right?
yes, should be faster than training all the model and you don't need many epochs as you are only optimizing one layer
thanks thanks
guys where to learn pandas
hi here,
I'm trying to do some stats on a pd DataFrame on some products,
I have a column store which is a list of string : ['carrefour', 'auchant', 'bi1', 'wallmart', ...] stores where that product is sold
and a column calories : float number of calories in that product
I want to rank stores based on the average calories of the products they sell
can someone help me ?
It'll be something like (from memory so something might be a little off)
# assuming df is the name of your DataFrame
df.groupby("store")["calories"].mean()
it's not really working,
df.groupby('store')['kcal'].mean().nlargest(10)
[['Wholefood']] 3830.0
[['Costco']] 3779.0
[['Super U', 'Magasins U', 'Woolworths', 'Coles']] 2384.0
[['carrefour market plouagat']] 2000.0
[['Biocoop eau vive']] 900.0
[['Bo nature et santé']] 900.0
[['Carrefour Market', 'Leclerc', 'Systeme U', 'Auchan', 'Casino', 'Intermarché']] 900.0
[['Carrefour', 'houra.fr', 'Magasins U']] 900.0
[['Carrefour', 'intermarché']] 900.0
[['Carrfeour', 'Auchan', 'Leclerc', 'Systeme U', 'Casino', 'Monoprix']] 900.0
Name: kcal, dtype: float64
it's not grouping the way I want. See how "Carrefour" appears in multiple rows
Hi, so I'm looking into config files and I have one, but it generates based off of my main.py file which explicitly defines the data structures. I want to be able to modify the config and have that reflect in my main.py file. How would I do this?
What's happening now:
main.py creates config.ini w/ pre-defined data-structures (config.ini = hardcoded)
What I want:
modifiable config file which main.py retrieves information from and uses to perform operations
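A minimal sketch of the "what I want" flow, assuming the standard-library configparser. The [calc] section and its keys are made up for illustration: main.py writes defaults only when config.ini is missing, and otherwise always reads its values back from the file.

```python
import configparser
import os
import tempfile

# Hypothetical defaults; the section and key names are made up for illustration.
DEFAULTS = {"calc": {"rate": "0.25", "items": "4"}}

def load_config(path):
    """Create the file with defaults only if it is missing, then read it."""
    config = configparser.ConfigParser()
    if not os.path.exists(path):
        config.read_dict(DEFAULTS)
        with open(path, "w") as fh:
            config.write(fh)
    config.read(path)
    return config

path = os.path.join(tempfile.mkdtemp(), "config.ini")
config = load_config(path)  # first run: the defaults get written

# Simulate the user editing config.ini by hand.
config["calc"]["rate"] = "0.5"
with open(path, "w") as fh:
    config.write(fh)

config = load_config(path)  # later runs: the edited value is picked up
rate = config.getfloat("calc", "rate")
items = config.getint("calc", "items")
total = rate * items
print(total)
```

The point of the design is that the hardcoded values are only a fallback; once the file exists, whatever is in it wins.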
Oh, I misread original. I recommend splitting your store column, so that each row has the number of calories for one product at one store. You may try https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.explode.html
looks like you'll need to make another dataframe where you explode the store column
# what you have
[['Wholefood']] A
[['Costco']] B
[['Super U', 'Magasins U', 'Woolworths', 'Coles']] C
# what you want
'Wholefood' A
'Costco' B
'Super U' C
'Magasins U' C
'Woolworths' C
'Coles' C
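A small sketch of that reshaping, assuming the store column holds real Python lists (if it holds strings that only look like lists, they would need parsing first). The rows are made up:

```python
import pandas as pd

# Made-up rows shaped like the question: one list of stores per product.
df = pd.DataFrame({
    "store": [["Carrefour", "Auchan"], ["Carrefour"], ["Costco"]],
    "kcal": [100.0, 300.0, 250.0],
})

# One row per (product, store) pair, then the average per store.
ranking = (
    df.explode("store")
      .groupby("store")["kcal"]
      .mean()
      .sort_values(ascending=False)
)
print(ranking)
```

After the explode, "Carrefour" is a single group that averages both products sold there.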
Can someone recommend me a good tutorial for generating text with an RNN in Keras? All the tutorials I've tried just don't work for me... Thanks in advance
yes and TF2
Anyone know how configs work? I have it working backwards of how I want it. Any help as to how to use config values to perform calculations in a .py file would be greatly appreciated!
Spam
Hey guys, Q about gradient descent.
When utilizing gradient descent, what is the function we are "descending"?
Is it the Loss (y axis) vs Feature weight (x axis)? If so how do we find this function?
Also for a neural network, is gradient descent done for each individual weight? E.g. for 20 lines connecting nodes, 20 gradient descents are performed on each weight to find the weights resulting in the least errors?
Is it the Loss (y axis) vs Feature weight (x axis)?
Yes, but keep in mind that there can be many feature weights in a big neural network, potentially thousands or millions. So x can be a very high-dimensional vector.
If so how do we find this function?
Either derive it by hand, or use an "automatic differentiation" software package to compute it for you.
Also for a neural network, is gradient descent done for each individual weight? E.g. for 20 lines connecting nodes, 20 gradient descents are performed on each weight to find the weights resulting in the least errors?
No. The "gradient" is kind of like a vector-valued derivative. You update the entire weight vector in one step. Updating each weight individually is called "coordinate descent", which is used in some models but usually it's not important from the user perspective.
Gradient descent is effectively an implementation detail. It happens that neural networks are so difficult to optimize that tuning the optimizer is a necessary part of training them. With most other models and optimizers, you don't have to tune the optimizer in day-to-day usage (e.g. logistic regression with L-BFGS).
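To make the "whole weight vector in one step" point concrete, here is a small sketch with NumPy. It assumes a mean-squared-error loss on a linear model with the gradient derived by hand; the data, learning rate and step count are all made up.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))            # 200 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])      # weights we hope to recover
y = X @ true_w                           # noiseless targets

def loss(w):
    return np.mean((X @ w - y) ** 2)

def gradient(w):
    # Vector of partial derivatives of the MSE loss, one entry per weight.
    return 2.0 * X.T @ (X @ w - y) / len(y)

w = np.zeros(3)
learning_rate = 0.1
for _ in range(500):
    w = w - learning_rate * gradient(w)  # all three weights move in one step

print(w, loss(w))
```

There is no per-weight loop anywhere: the gradient is a vector, and subtracting it updates every weight at once.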
hi! i was wondering how do i iterate over rows to find a specific value that meets a certain condition
I don't think you need machine learning for this
Just do:
for row in rows:
    if <whatever>:
        do_something()
Or maybe you meant something like anomaly detection?
(for that you need ai)
@simple shadow
haha
technically tho, even using if-else is A.I
in pandas, you don't usually need to specifically iterate over rows. do you want the row number? the value itself? the row "label" (aka the index)?
@grave breach this channel is kind of the catch-all channel for pandas, numpy, matplotlib, and scipy
Sorry, I didn't get that he meant pandas rows, I thought he just had a list of lists
ah yeah. you'll start to see common patterns in people's XY questions so you can skip some of the back and forth "what do you mean" stuff.
I didn't see that in there either
@desert oar i want to change specific values in one column
are you working in a pandas data frame?
yes @near cosmos
I think you are looking for this https://pandas.pydata.org/pandas-docs/stable/user_guide/cookbook.html#if-then
# for all rows where column AAA >= 5, change the value of BBB to -1
df.loc[df.AAA >= 5, "BBB"] = -1
@serene scaffold hello. He didn't reply to you yet, right?
He said he's not sure how one would request that.
huh?
what do u think i want? i mean, i was looking for like 50+ images of each pokemon
For a dissimilarity measure to compare ratings such as very dissatisfied, dissatisfied,
neutral, satisfied, very satisfied should it be 1 hot encoded and then use the hamming distance?
or can I use euclid's distance
hey i'm using tensorflow.keras to train a CNN, but for some reason it doesn't show anything or do anything after model.fit
i basically removed all my layers but still nothing
I need to gather a big image data set of pokemons to make a cnn classifier. If any of u wanna help me, pls check google image search python api, and follow the steps. Ping me too to share a script
it just prints 1 and keeps running forever
any idea why it might be?
i even have verbose to 1 so it should print some epoch info
Thank you for the awesome explanation. I am getting a grasp of how the cost function is used to update all the weights at once instead of doing them individually.
I have a few more qualms, if you don't mind.
- How do we actually know the formula of our cost function (e.g. in the form y = mx + c)?
My current intuition is that we set our cost function from the get-go, e.g. least squares. So that's how we know. Is that correct?
Also, here is how I think gradient descent works: Please let me know if this is wrong
- We have a point on our cost-function, (x, y, z) which is the weights of our x, y, z features
- We partially differentiate our cost-function, and sub in values for x, y, z to find the gradient at the point specified in the previous step (the rate of change of the loss function in respect to all features), so, something like [1, 2, 2]
- We then go down this gradient, by updating the weights for our x, y and z features at once:
weights = [5 (weight of x), 4 (weight of y), 3 (weight of z)]
# multiplying our weights with the gradient of the weight
[5, 4, 3] (as a column) x -[1, 2, 2] = multiplied_weights
new_weights ([new_x, new_y, new_z]) = old_weights - training_steps * multiplied_weights
We recalculate our cost, and carry on, until it's a minimum.
The hardest part was thinking of each feature as a vector. I really appreciate your help. If you have a bitcoin address let me know.
thanks!
also if I'm finding a cost function for a neural network, how many iterations should I go through (adding the cost to a cost_sum variable before I divide by the number of iterations to get the avg. cost)?
hi, I have a dataset with a mix of ints and strings, its type is numpy.ndarray. I'm trying to detect the string attributes with a condition: I used if isinstance(point, str) == True but it doesn't seem to work
can you make a minimal working example of the problem?
okay I will try
Hi guys, is someone experienced with scipy here? I'm trying to solve a set of two differential equations and i need some help to verify if it's correct
I don't know how correct this method is XD am still learning but this is an example
what does your data (X and centroids) look like? consider grabbing a help channel #❓|how-to-get-help
the centroids are random points from the data this is how they look
guys, using seaborn, how can i make a barplot without a legend on one axis?
usually you can pass legend=False or legend=None (I forget which)
it doesnt allow me
x = sns.barplot(list(dct.keys()), list(dct.values()), x=None)
so
dct.keys() is what i dont wanna display xd
so basically what i wanna display is
string1 = 80, string2 = 120
etc
the values of each one
but i only want a vertical axis having numbers like 25-50-75-100
am i making sense?
good evening
sorry I don't quite understand. can you provide an example and the output you are getting now
i dont want dct.keys to appear on the legend
what do neat-python outputs correspond to and how do I know what to do with them exactly?
My current intuition is that we set our cost function from the get-go, e.g. least squares. So that's how we know. Is that correct?
Yes. You choose your loss function f(y_true, y_predicted) and plug in your model for y_predicted. So if your model is linear regression, the loss function is f(y_true, ax + b).
I'm not sure I fully understand your example of gradient descent. Yes, the gradient at a specific point is the vector of partial derivatives evaluated at that specific point. You don't perform weight updates multiplicatively, you do them additively. I recommend looking at the equations, you might be surprised at how simple it is.
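For reference, the usual additive update is new_weights = old_weights - learning_rate * gradient. A tiny sketch using the weights [5, 4, 3] and gradient [1, 2, 2] from the question, with a made-up learning rate:

```python
weights = [5.0, 4.0, 3.0]      # current weights of x, y, z
gradient = [1.0, 2.0, 2.0]     # partial derivatives evaluated at that point
learning_rate = 0.1            # made-up step size

# Subtract a scaled copy of the gradient; the weights are never multiplied by it.
new_weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
print(new_weights)
```

Each weight moves against its own partial derivative, scaled by the same learning rate.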
Our server rules disallow offering or requesting payment for help, but I wouldn't accept payment anyway.
I don't understand your question. Can you be more explicit?
Does anyone know of any complete beginner tutorials that introduce Keras? I really want to get into ML a bit more but every tutorial I find is absolute trash when it comes to explaining the code. For the most part, other than some deeper level ideas, I understand most ML concepts. That isn't the issue. However, understanding what I'm writing down and not just mindlessly copying it is my problem. I have no idea what the code does or how to use it without almost completely rewatching an entire tutorial. And a lot of the tutorials that I do end up finding are very vague and tend to just read commonly available materials which I have already gone through. I'd really appreciate any and all help I could receive, thank you!
@severe valve the keras documentation is honestly the best for this. Tutorials tend to abstract away from what is going on under the hood.. and over abstraction leads to the confusion you are talking about. The documentation is dry, but it will leave you with an accurate understanding of how to use the API.
You are passing a file object, instead of a string, to tokenize_words. You need to .read or similar from the file to get the contents
I agree, so far it has been the best resource, although when I have a question regarding code in the documentation there doesn't seem to be many answers. But I guess I'll have to give it another shot before I do another exhaustive search.
can anyone explain this? im using CNN with keras and it prints epoch 1/10 and then lags for a bit and quits
basically when i add a conv layer to my network it stops working
any help is appreciated
Can anyone tell me please do i need to have a good understanding of mathematics in order to learn ML??
@subtle panther you should intend to expand your math knowledge and understanding in parallel with your hands-on experience and your programming knowledge
So yes, eventually. But you can get started without knowing lots of math up front
If you already know calculus and the basics of linear algebra (vector/matrix math and the interpretation thereof as systems of linear equations) it will help
Hello guys!
Could you help me to create some script?
Note that I'm using 'Selenium Webdriver' and now I need to get all these values (red square) and put them in a list, for example:
[5-2021, 4-2021, 3-2021, etc...]
If I could get these values in a list, I could do some big web scraping.
I can already click on each field, but I want to put it in a loop, where the RPA will select the first value, click and get all table values... after that it will click on the second and get all the table information
Use BeautifulSoup's find_all method and iterate over the rows with a for loop
I'll search for it and learn how to use. Thank you a lot!!
You’re welcome 🙂
Is there a list somewhere that shows all the deprecated functions from tensorflow v1 and their equivalents?
@desert oar thanks I understood properly what I need to do
I've imported this lib as you suggested, but I've tried many ways to extract the information.
'div', class_="form-control custom-select"'
'div', class_="drop-meslme"'
'div', class_="col-12 col-md-5 col-lg-4"'
'select', name_="meslme"
etc...
I can't find any field in the red square either.
I use this site
In addition to offering the most complete line of rolled and extruded products, Shockmetais also stands out for its strips and blanks division, with modern equipment providing transverse cuts, longitudinal cuts and blank cutting for a wide variety of applications.
and the values are in the "combobox" as in following picture
Hi, someone knows how use an audio output as signal generator in google colab?
I try to generate a signal v = E.sin(wt+phi) + Vbias but the function IPython.display.Audio kills my bias.
Thanks in advance!
hey when building analytical models, what is being referred to when someone says "have you done the business rules for this model"?
someone at work is asking me and im scared to answer because idk
does he mean documentation
TELL ME YOU HAVE IMPOSTER SYNDROME WITHOUT TELLING ME YOU HAVE IMPOSTER SYNDROME
tried saving it with librosa and reading it again with it?
I want to make a project analyzing programming language popularity by developer type based on the data contained in the Stack Overflow 2020 Developer survey.
I thought about creating a separate DataFrame for each dev type, then calculating a percentage for each language each dev type said they worked with, but it sounds like too much work for something that surely has a simpler solution.
any ideas?
how can I use opencv to scan what the camera sees and then give a 3D diagram of that to the computer
I'm streaming some time series data from Kafka using Spark Structured Streaming with a 10-second interval, but sometimes the streamed data doesn't contain a full 10 seconds. Any solution?
thanks in advance
it depends entirely on the format of the data
Hello guys, I have a question: how do I calculate the effect size in a chi-square test? (Outputs I should get are: Phi & Cramér's V)
Thank you for help
what is the nn that codes for u?
hi guys can anyone help me out please im sorta really stuck. im trying to draw a sphere onto processing but anytime i use P3D, the display image is just grey and blank. does anyone know why?
GPT-3
Anyone available to help me with Intro to Data Analysis for Python?
thanks, but doesn´t work on multiple items on B, I solved it, it was quite tricky
ye it wouldn’t
you would need a different approach for that
but your original example only had one row
sorry, I made a quick example, didn´t think it would influence on result
in general it’s more complicated if it’s a many to many thing
but if you’ve solved it then gratz 🏆
import numpy as np
import pandas as pd
data_in = pd.DataFrame([
    {'x': 3, 'y': 2, 'z': 1, 'data': [0.1, 0.2, 0.3]},
    {'x': 2, 'y': 0, 'z': 4, 'data': [0.7, 0.8, 0.9]},
])
shape_out = (5, 5, 5, 3)
data_out = np.zeros(shape_out)
for row in data_in.itertuples():
    data_out[row.x, row.y, row.z, :] = row.data
Is there a way to do this using Numpy fancy indexing, or something otherwise vectorized, without looping + itertuples?
Naively I had tried
shape_out = (5, 5, 5, 3)
data_out = np.zeros(shape_out)
data_out[data_in['x'], data_in['y'], data_in['z'], :] = data_in['data']
But I got the ValueError: setting an array element with a sequence. error
data_out[data_in['x'], data_in['y'], data_in['z']] = data_in['data'].tolist()
oof, really?
you get that problem because you have an array of lists
in data
data_in['data'].tolist() this converts it into a list of lists
which numpy will treat as an array
you need either an array or list of lists, not a mixture
huh, i figured it wouldn't matter if it was a list or an array-like thing
yep that worked perfectly
data_out2[data_in['x'], data_in['y'], data_in['z'], :] = data_in['data'].tolist()
it doesn't as long as it's homogeneous
@fallen trellis see above
yeah i like it for visual clarity
what do you mean by this?
i.e. not dtype='O'?
either a list of lists or an array of primitives, so, yes, I guess
just not an array of lists
because then numpy treats it as having length 1 across the relevant axis and tries to broadcast across it
which leads to trying to put the list in individual slots in data_out
hence setting an array element with a sequence (said list)
yeah that makes sense, it doesn't know what the objects are in the array so it treats them all as scalars
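A quick sketch of the difference being described, reusing the data_in frame from the question. The variable names as_objects and as_floats are made up for illustration:

```python
import numpy as np
import pandas as pd

data_in = pd.DataFrame([
    {"x": 3, "y": 2, "z": 1, "data": [0.1, 0.2, 0.3]},
    {"x": 2, "y": 0, "z": 4, "data": [0.7, 0.8, 0.9]},
])

as_objects = data_in["data"].to_numpy()          # 1-D array whose items are list objects
as_floats = np.array(data_in["data"].tolist())   # proper 2-D float array

print(as_objects.dtype, as_objects.shape)
print(as_floats.dtype, as_floats.shape)

# With a real 2-D array, the fancy-indexed assignment lines up row by row.
xs, ys, zs = (data_in[c].to_numpy() for c in ("x", "y", "z"))
data_out = np.zeros((5, 5, 5, 3))
data_out[xs, ys, zs] = as_floats
```

The object array has shape (2,), so NumPy sees two scalars, while the converted array has shape (2, 3) and matches the target slice.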
good to know
yeah, it´s not the most efficient thing ever, do you want to see?
so
how bad this is
this folder is supposed to have only bulbasaurs
but api randomly grabbed an ivysaur, the evolution
how many fails like this are a problem for a neural network?
Like, if i have 100 images, how many failures can i afford?
trying to avoid data clean :)
Please help!
Generate a vector of 1000 random numbers between 0 to 100.
• Plot a histogram of these numbers with number of bins equal to 10.
• Calculate the average of these numbers by using numpy method mean().
• Plot a red line (red color) from the mean point on the histogram plot in y direction to show the mean location in the plot.
import numpy as np
import matplotlib.pyplot as plt
data = np.random.randint(100,size(1,1000))
print(data)
matplotlib.pyplot
plt.hist(list(data)),range=(0,100),bins=10
mean = data.mean()
print(mean)
In general the disc is not to solve homework problems :p but if you're having any problems with your code at a specific point we might help
what is your issue?
I did the homework. I just keep getting an error.
please share that error
NameError: name 'matplotlib' is not defined
uh, have you installed matplotlib?
you need to install the library first before using it
pip install matplotlib, or if using a conda distribution, conda install matplotlib
it is installed
then at that point you need to make sure you're using the right PATH
that is, that your Python install is pointing to the right direction
unfortunately that's something you need to do yourself
sometimes rebooting works, but first you need to make sure Path is ok
thank you
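A guess at the actual cause, since the library is installed: `import matplotlib.pyplot as plt` only binds the name plt, so a bare `matplotlib.pyplot` line raises exactly this NameError. Below is a sketch of the exercise with that line dropped. It assumes integer samples and uses the Agg backend only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")              # headless backend; skip this in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.integers(0, 101, size=1000)     # 1000 integers in [0, 100]

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(data, bins=10, range=(0, 100))
mean = data.mean()
ax.axvline(mean, color="red")      # vertical red line at the mean
print(mean)
```

In an interactive session, plt.show() at the end would display the figure.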
there is no formula that can tell you this. there are many factors involved: how similar is the mislabeled record to the correctly labeled records? how much variation is there in the correctly labeled records? what % of records are mislabeled?
there are some techniques that are specifically designed to adjust for mislabeled data, e.g. Gold Loss Correction which i've used with some success in the past
this also isn't specific to neural networks. the same reasoning applies for pretty much all statistics and machine learning
mmmm
just for in case u know
if instead of searching "bulbasaur"
i search "*bulbasaur*"
with *
will i increase my occurrences of bulbasaur?
like in a normal user search. * mean only that
or thats what i think
i have no idea, it depends on what exactly you're searching
nah, nvm, dont worry about it
i will search for a few images, and take a fast look, and if i see many misplaced imgs
i will look for that loss u mentioned above
just at the screenshot you provided, there's also a picture with a faraway view of a street, and one with 3 pokemon
so a different evolution that looks quite similar isn't a big problem comparatively
so
the street is a bigger problem?
lmao
This is the street image
there is a bulbasaur technically
hm I think that can be improved
but I'd need to have a more complete example
and if what you have works for your purposes then might as well go with it
yeah, I thought that, maybe I can use outer with axis=X
it´s working, I´m still new to numpy, I will later try to make it more efficient
jesus, I sent you the wrong code
oh no, it was the right one
sorry
anyone ever feel like they've hit a wall when it comes to learning ML/NN? I really want to learn a lot about these fields so I can apply them to a future job in medical research but I just can't sometimes. It gets so boring and blunt. Everything feels too complex and involves so much math but I feel like if I don't learn ML/NN, then I won't be sought after job-wise. Data analytics, visualization, etc only takes you so far. Even if I try and push through this, all I get is a bunch of information that I can't apply leading me to rewatch the videos and get stuck in an endless cycle.
My advice: ditch the videos, use textbooks, spend some time learning the math.
With a good textbook, doing some exercises at the end of each chapter can be very important for learning.
At the same time, just start messing with data.
Do a bit of math, then forget all about math and just make some pretty plots, or fit some models.
sick name bro
so i've basically just been doing that last part, i've just messed around with a lot of models. but I've had absolutely zero idea what the model does other than on a high level. ( E.g CNN ~ image classifier. Linear regression ~ linear problem. etc ) But so far that hasn't gotten me very far and when I get errors or don't understand why my model is performing so bad I just have to stop because I have zero understanding of the subject.
and then when I go to actually learn it just becomes more and more difficult.
But I'll try and find some textbooks if I can and try doing the math
I have a friend that easily started a summer job that eventually turned into a part time, then full time position just blind applying to the clinical research unit at my university with just a bachelors and no experience in medicine. It really helped her learn machine learning in a more meaningful way, but the Math was unavoidable. Her first month there was just reading a textbook on Markov chains.
exactly my point. I'd really love to apply ML ( as I learn best through application. I initially struggled with this in other programming languages before I found python ) but even when I try to apply it, it all just breaks down in front of me. But I guess the concepts behind ML are the most important for now, I'll definitely go look into textbooks for ML. Thank you everyone for your advice and time. :)
I think this is a commonly mentioned book, I read Mathematics for Machine Learning and I felt like it was a very pleasant read and covered a good breadth of material. I probably wouldn't use it until you've had at least a first course in calculus and linear algebra though.
i am currently reading that book too, its a free book
There's also a great book on ML by Ian Goodfellow and Yoshua Bengio
Sorry man, I was offline, did you solve the problem? I’d suggest using xpath to specify the location of the red box
//*[@name="meslab"] like this
hi guys. got a question about kaggle house prices data; been investigating other solutions to improve my code and perspective. see sth like that
Getting the correlation of all the features with target variable.
(train.corr()**2)["SalePrice"].sort_values(ascending = False)[1:]
what is the reason for using train.corr()**2?
ops sorry double * makes the code bold
pow(train.corr(),2) is better to write here
oh silly me.. got it tnx anyways guys
For neural networks, how do you get the partial derivative of the cost function?
For tensorflow models, is this hardcoded depending on the cost function you choose, like MSE?
is the IBM data science certificate on coursera good? or can you people recommend me a good one? to mention, i'm new to data science.
Hello man!! Don't worry about it.
Is it possible to use xpath on beautifulsoup? I couldn't find yet
Interesting, however I'm not sure this works for my case as the indices of the data are hidden in attributes, e.g., rows[0].idx.y . Or do you see a way, still?
There's no way to get the attributes in some vectorized form?
Unless you iterate over the entire dataset, no
Ah, then no
Even if, how would numpy handle loading the 5gig+ dataset?
the problem isnt numpy
the problem is ur ram xd
u can read from a buffer i believe
Lets say, first 256 Mb of the data set, then the other 256, and so on
the more giant data you have to buffer, the more you should consider Dask
R u into software engineering?
I only dabble.
I see!
Where would be the best place to learn neural networks (preferably in python) for something like facial recognition?
5 GB is not that big
also, memory mapping
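A rough sketch of the memory-mapping idea with np.memmap, assuming the data can be stored as a flat binary file of float64 values. The small temp file here stands in for the real dataset and the chunk size is made up:

```python
import os
import tempfile
import numpy as np

# Stand-in for a big file on disk: one million float64 values.
path = os.path.join(tempfile.mkdtemp(), "big.dat")
np.arange(1_000_000, dtype=np.float64).tofile(path)

# Map the file instead of reading it; the OS pages data in lazily.
data = np.memmap(path, dtype=np.float64, mode="r")

chunk_size = 100_000               # made-up chunk size
total = 0.0
for start in range(0, data.shape[0], chunk_size):
    chunk = data[start:start + chunk_size]   # only this slice needs to be in RAM
    total += float(chunk.sum())

print(total)
```

The same loop shape works for the "first 256 MB, then the next 256 MB" approach; only the chunk size changes.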
5 GB is not that big
cries in 2Gb of memory
Hey guys, gradient descent Q
I read this:
A larger learning rate leads to a faster learning process at a cost to be stuck in a suboptimal solution (local minimum). A smaller learning rate might produce a good suboptimal or global solution, but it will take it much longer to converge. In the extremes, a learning rate too large will lead to an unstable learning process oscillating over the epochs. A learning rate too small may not converge or get stuck in a local minimum.
I don't get it.
A larger learning rate may mean that you miss the global minimum and end up somewhere else, but why does it mean you are stuck?
while with a tiny learning rate, won't you most definitely be stuck in the first local minimum you get into?
second question, yes
first question, possibly, depends
you might diverge
diverge? as in, miss the global minimum?
so... your loss is going to increase without bound? so you will never reach a minimum?
yes
hmm 🤔 why would that be, wouldn't you always go down the path of the negative gradient, thus moving towards a lower loss all the time?
could you give a possible scenario for this? My idea would be a function like y=x^2, where if you have a large learning rate you always overshoot the minimum, but I don't think that explains the loss increasing indefinitely
that's precisely the example
if you have a large enough learning rate, you jump from x to -x -a for some positive a
there the slope is higher, so next you jump to x + a + b where b>a...
and continue bouncing off the walls of the parabola, getting further and further away from the minimum
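The bouncing is easy to see numerically. A small sketch on f(x) = x**2 (gradient 2*x), with made-up learning rates on either side of the stable range:

```python
def descend(x, learning_rate, steps):
    """Gradient descent on f(x) = x**2, whose gradient is 2*x."""
    path = [x]
    for _ in range(steps):
        x = x - learning_rate * 2 * x
        path.append(x)
    return path

small = descend(1.0, learning_rate=0.1, steps=5)   # shrinks toward 0
large = descend(1.0, learning_rate=1.1, steps=5)   # flips sign and grows

print(small)
print(large)
```

Each step multiplies x by (1 - 2 * learning_rate), so for this particular function anything above 1.0 makes the magnitude grow on every bounce.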
whee
bouncy
...sorry I'll keep quiet now
ooh, thanks. I didn't think of it like that
so in gradient descent, how do you actually know that you reached the minimum - the gradient vector's components will all be 0? (so the gradient along each axis is 0, thus it's a minimum)?
Well, yes, a zero gradient is basically the condition for a minimum (strictly, for any stationary point), though a simpler way is just checking that you haven't moved much this step
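A sketch of the "haven't moved much" stopping check, assuming a made-up bowl-shaped loss f(x, y) = (x - 3)**2 + (y + 1)**2 and a made-up tolerance:

```python
import math

def grad(p):
    # Gradient of f(x, y) = (x - 3)**2 + (y + 1)**2.
    x, y = p
    return (2 * (x - 3), 2 * (y + 1))

p = (0.0, 0.0)
learning_rate = 0.1
tolerance = 1e-8       # made-up threshold on the step length
steps = 0
while True:
    g = grad(p)
    new_p = (p[0] - learning_rate * g[0], p[1] - learning_rate * g[1])
    moved = math.dist(p, new_p)    # how far this step took us
    p = new_p
    steps += 1
    if moved < tolerance or steps >= 10_000:
        break

print(p, steps)
```

Since the step length is the learning rate times the gradient norm, a tiny step and a near-zero gradient are the same signal here.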
ok gotcha. I also didn't fully understand why stochastic gradient descent is less susceptible to getting stuck in a local minimum compared to batch gradient descent.
iirc, the formula for updating our weights is proportional to our losses, so:
new_weights = old_weights + learning_rate*(negative_gradient_vector * loss)
If stochastic gradient descent is less likely to get stuck in a minimum, that means that the loss has to be greater? But why is that the case? Surely, if you take the loss of one of your predictions (instead of your whole dataset), you are not guaranteed to have a greater loss so I would think it's unfair to say it's less likely to get stuck in a local minimum.
Maybe you get lucky and SGD chooses a random point with a loss that is greater than your whole dataset's loss. Then I can see why it's less susceptible to getting stuck. But still, it's a bit "random" and is a chance. Is this why people say that?
I'm not sure about this, but it might simply be that since it's nondeterministic, it'll eventually luck into a path out of a local minimum, unlike deterministic ones that are definitely stuck
what do you mean by "stuck in a minimum"?
you want to get stuck in a minimum, that's the whole point of doing gradient descent
i mean, a local minimum which may not be the global minimum
that's different from "not converging", which is what reptile was talking about (and what you were asking about) with batch gradient descent
gradient descent only ever finds local minima
I don't know, but a lot of things in ML are not theoretically backed - it's just found that in practice x works better than y and so on
oh really?
I was just trying to think about how we can get ourselves out of a local minimum and continue to a global minimum
so, when people say "stochastic gradient descent is less susceptible to getting stuck in a local minimum", what does that mean?
gradient descent alone can't do that, to my knowledge
just curious, then how do we do that? adam and such?
i think this is because it bounces around more
so the idea is that it's less likely to get stuck in a small local minimum because it might just skip over it
that's my understanding, at least
but ultimately it's still finding a local minimum, there's no guarantee (that i know of) that it's a global minimum
ah I see, so just due to the random nature of SGD, it can randomly pick a prediction which has a high loss and makes you skip over say, a local minimum, and you might then get to a global minimum
but for batch, stochastic and mini-batch, they essentially all just converge at the first local minimum they find (most of the time)
so yeah, as @grave frost asked, what would you use then to find the global minimum?
what if the neural network re-runs with different initialized weights, multiple times, to try to find the global minimum
people do in fact do this
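A toy sketch of the restart idea, assuming a made-up one-dimensional loss with two minima. The starting points are picked by hand here; in practice they would be random initializations.

```python
def f(x):
    return x**4 - 3 * x**2 + x     # made-up loss with two minima

def df(x):
    return 4 * x**3 - 6 * x + 1    # its derivative

def descend(x, learning_rate=0.01, steps=2000):
    for _ in range(steps):
        x -= learning_rate * df(x)
    return x

starts = [-2.0, -1.0, 0.0, 1.0, 2.0]
results = [descend(x0) for x0 in starts]
best = min(results, key=f)         # keep the run with the lowest final loss

for x0, x in zip(starts, results):
    print(f"start {x0:+.1f} -> x = {x:+.4f}, loss = {f(x):+.4f}")
print("best found:", best)
```

Different starts land in different minima, and all you can do is compare the final losses; nothing in the procedure proves the best one is global.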
you won't ever know that it's global
yeah, I was thinking that, so there's essentially no way to know if its a global minimum?
or maybe you can differentiate the cost function and find each turning point, then you'd have an idea of which areas to check and one of them will be a global minimum
you can't ever know. you can compare loss values at 2 different local minima, but that's it
but if that is indeed the case, then why is it that changing the seed of the model does not yield much of an accuracy difference? does this imply 9/10 times a model does find a global minimum?
would this not be a viable method to check all possible minima?
yeah, because realistically there aren't that many minima, or different initializations don't have that much of an effect on which minimum is chosen
so even if different initializations get stuck in a local minimum, then what? I use something different?
but you don't and can't know that they are local, non-global minima
an accuracy jump/drastic loss decrease
would tell me I was in a local minimum, wouldn't it?
https://papers.nips.cc/paper/2018/file/a41b3bb3e6b050b6c9067c67f663b915-Paper.pdf
https://www.cwi.nl/events/cwi-scientific-meetings/ml.pdf
https://arxiv.org/abs/1704.08045
https://deepai.org/publication/understanding-the-loss-surface-of-neural-networks-for-binary-classification
there's some interesting and nontrivial research being done on this topic btw
so why can't you just differentiate the cost function to find all the minima then compare the losses between them? Because some cost functions can have an infinite number of minima?
the derivative of the cost function is the gradient
gradient descent is how we attempt to find a minimum
hmm, so the cost function is not in the form (an example) y = f(x)?
huh?
back up
what do we do in order to fit a model
- define a loss function
- minimize the loss function
right?
yes
so how do you propose to find all minima?
isn't that just the argmin?
yes, but how do you find it? with gradient descent.
why can't we brute-force? surely it wouldn't be that slow
brute force how? re-initialize at 1000 different points and re-run gradient descent for each one?
well... I might just be spouting rubbish... but, if you had the cost function y = f(x),
can't you differentiate it to get the gradient of each axis?
I guess then you would have to sub in numbers into each derivative so that all derivatives equal 0, and that would find you minimas
can't you differentiate it to get the gradient of each axis?
the gradient is the vector of partial derivatives
yes
if the objective is to get the global minima of the loss function, then surely the lowest value is the minima?
yeah, sure. might be interesting as an academic exercise, but probably a total waste of time otherwise.
how good is downloading a model that seems to work, download some random images cuz the data set used to train that model is gone, use that model to clean the data i downloaded, and use this cleaned data to train model for better results?
realistically models probably don't have that many minima
but what I think they're trying to say is that you don't explicitly know if it's the global minima, or just another local minima
so the gradient is a vector of partial derivatives, are we not able to equal each derivative to zero?
but...the lowest value would be the global minima
if we brute force everything, yeah I agree
yes, assuming you have in fact enumerated all local minima
(which i am not sure is even possible)
then? wouldn't brute forcing be faster?
brute forcing how? computing the derivative at "every" point?
some mathematical technique to single out potential candidate points first?
you’re basically talking about a grid search over the whole feature space
computationally intractable
go invent one and publish it, i'm not aware of any (other than the various neural network initialization techniques that are currently known)
hmm...have NN's been tried to find faster alternatives to SGD?
good talk, this community rocks 😎
@sly salmon derivative == 0 just means it's locally flat, could be a saddle point
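To illustrate the saddle-point remark, here is a minimal sketch. It assumes a toy function f(x, y) = x^2 - y^2, which is not from the discussion: the gradient vanishes at the origin, yet the Hessian has eigenvalues of both signs, so the point is neither a minimum nor a maximum.

```python
import numpy as np

def grad(p):
    """Gradient of the toy function f(x, y) = x**2 - y**2."""
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

# The Hessian of this quadratic is the same everywhere.
hessian = np.array([[2.0, 0.0],
                    [0.0, -2.0]])

origin = np.array([0.0, 0.0])
grad_at_origin = grad(origin)
eigenvalues = np.linalg.eigvalsh(hessian)  # returned in ascending order

is_flat = bool(np.allclose(grad_at_origin, 0.0))
is_saddle = bool(eigenvalues.min() < 0 < eigenvalues.max())
print(is_flat, is_saddle)
```

So "derivative equals zero" only narrows things down to candidate points; each candidate still has to be classified.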
you mean try to solve the cost function analytically?
and yeah i think that's what they're proposing - solve analytically for all roots of the derivative and compare the loss at each one
you can’t do that because
i assume that's not possible
hmm, but can't we do that to find every minima?
///<
3x + y = 6
x - y = -2
x + y = -3
there’s no consistent solution
to all those equations
now remember that
each set of feature values and target
this is
forms one such equation
and you often have many more data points than features
regrets not learning fully about SGD
In linear algebra, the Rouché–Capelli theorem determines the number of solutions for a system of linear equations, given the rank of its augmented matrix and coefficient matrix.
correct
(basically)
hmm, so for this example, since there is no consistent solution we can't determine a minima? But, we assume that there is a minima there?
so we have to find it via some exploratory technique with gradient descent?
BASICALLY
yes
it’s late so I won’t go into the details but
think about it this way
take a piece of cloth
no matter how you contort it
but...can't we just solve for each 2 and average the solutions?
alright, I appreciate it. really good talk I learnt a lot today!
also, the solution for this, isn't it where all of the lines meet?
those are lines
@sly salmon https://stats.stackexchange.com/q/212619/36229
they do not meet at any single point
there are several points to make
okay let’s continue this another time?
bedtime for me
yes, goodnight!
gn!
also, what do you guys mean by solving the cost function "analytically"? I've never heard the term before
Analysis is the branch of mathematics dealing with limits
and related theories, such as differentiation, integration, measure, infinite series, and analytic functions. These theories are usually studied in the context of real and complex numbers and functions. Analysis evolved from calculus, which involves the elementary concepts and techniques o...
i guess this answers that question
@sly salmon "analytically" means finding an exact solution by solving equations
i.e. "set the derivative equal to 0 and solve for x" is the analytical solution
as opposed to the numerical solution which doesn't require solving for the exact form
"analysis" in the sense of "real analysis" is a different thing
i see. and yeah, that example given before:
3x + y = 6
x - y = -2
x + y = -3
that simultaneous equation could essentially be replaced by all my partial derivatives, and it may be impossible to find a consistent solution. Ig I could use it as an analogy to say that "there are no consistent values where it's a minimum", so we have to take the iterative "gradient descent" approach.
but if I do it that way, essentially I'm saying all of my gradient vectors will never meet at one point? Thus they are never going to equal the same value (0) where there's a minima? < I might be wrong there.
But then the question lies...
If there is no consistent solution analytically, how can there be a solution iteratively (via gradient descent)? Or maybe the answer is just an approximation, hmmm.
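On that last question, a sketch with NumPy using the three equations from the discussion. The system has no exact solution, but the sum of squared residuals still has a minimizer, and gradient descent on that loss ends up at the same point as a least-squares solver. The learning rate and step count are assumptions picked for this small example.

```python
import numpy as np

# The three equations from the discussion, written as A @ w = b.
A = np.array([[3.0, 1.0],
              [1.0, -1.0],
              [1.0, 1.0]])
b = np.array([6.0, -2.0, -3.0])

# Closed-form least-squares answer: minimizes ||A @ w - b||**2.
w_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# Gradient descent on the loss 0.5 * ||A @ w - b||**2.
w = np.zeros(2)
learning_rate = 0.05  # assumed; small enough for this particular A
for _ in range(2000):
    gradient = A.T @ (A @ w - b)
    w -= learning_rate * gradient

# The residual is not zero: no point satisfies all three equations.
residual = A @ w - b
print(w_lstsq, w, residual)
```

Both approaches land on x = 1, y = 2/3. That point is not a solution of the equations, it is the point where the squared error is smallest, which is what "minimum of the loss" means here.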
Anybody know if pandas has expressed intent to port the package to arm for m1 macs?
can someone explain me why variance in ml referring to overfitting but in statistics its measuring the how much the data is spread from the mean ? i am little bit confused 😐
I'm inserting data into DB using json files
with os.walk() but after some times speed decreased
any solution for this
variance in ml referring to overfitting
it is not "referring to overfitting". the variance of a model is the variance of the model predictions.
@limpid oak you should ask this question in a help channel, and provide the code that you are using
or #databases
thank you @desert oar
so meaning how far the predicted value will be from the actual testing value?
no. variance is always a measure of spread around a mean.
in the context of model overfitting/underfitting, people usually refer to the "variance" of the entire model-fitting procedure
imagine that you could randomly re-generate your data over and over, then fit your model on each version of the data
then you would have a probability distribution of models, more or less
the definition of "variance" never changes
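The "re-generate your data over and over" thought experiment can be simulated. This is a sketch under assumptions that are not from the chat: a sine-shaped true signal, Gaussian noise, and polynomial fits of degree 1 and degree 7. It measures how much each model's prediction at a single point spreads across the re-generated datasets.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 15)
x0 = 1.0  # point at which predictions are compared

def prediction_variance(degree, n_datasets=500):
    """Variance of the fitted model's prediction at x0 across re-generated datasets."""
    predictions = []
    for _ in range(n_datasets):
        # Re-generate the data: same true signal, fresh noise.
        y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)
        coefficients = np.polyfit(x, y, degree)
        predictions.append(np.polyval(coefficients, x0))
    return np.var(predictions)

var_simple = prediction_variance(degree=1)
var_flexible = prediction_variance(degree=7)
print(var_simple, var_flexible)
```

The more flexible model's predictions spread more from dataset to dataset. It is the ordinary definition of variance, applied to the distribution of fitted models.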
oh ok thanks
hey there.. anyone knows how to develop algorithm using python3
I'm trying to get some data from wikipedia but wikipedia's data is so dirty so is there an easy way to clean it or is there another cleaner alternative to it?
no
anyone know a good module/api that can do multistep algebra like https://mathpapa.com/algebra-calculator.html
Algebra Calculator shows you the step-by-step solutions! Solves algebra problems and walks you through them.
(ping me when u respond cuz ima be afk coding)
maybe this ?
is it able to automatically detect things like something/something2 = something3/x but also normal equations with the same code or will i have to not be lazy and code all of that
no
maybe you can build this with https://www.serhii.net/blog/2018/02/18/experinces-jupyter-notebooks-pyplot-sympy/, but it won't solve step by step
Hello, can someone help me with simple linear regression? I have a feature total_spend and a target sales. I scale the data, train my model, and get estimates for beta0 and beta1 such that Y = beta0 + beta1 * total_spend. But the betas I have right now are estimated on the scaled data, so it's somewhere between 0 and 1. This is a problem because I cannot use these betas for inference, i.e. to study the effect on sales of a one-unit increase in total_spend. So how do I get my betas back to the original scale?
Save the scaling step as well. You have to apply the same scaling on inference also
Otherwise your model is pointless. It must be fed data with the same scaling for both train and inference
Once you do that, you'll realise your model is more like Y = beta0 + beta1 * f_scaling(total_spend)
That should let you do any analysis as you see fit
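A sketch of how the coefficients can be mapped back, assuming min-max scaling (the question only says the scaled values are between 0 and 1) and synthetic stand-in data. Substituting x_scaled = (x - x_min) / (x_max - x_min) into Y = beta0_s + beta1_s * x_scaled and collecting terms gives the slope and intercept on the original scale.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in data; the real total_spend and sales are not shown in the chat.
total_spend = rng.uniform(100.0, 1000.0, size=50)
sales = 4.0 + 0.05 * total_spend + rng.normal(scale=1.0, size=50)

# Min-max scaling (assumed): x_scaled = (x - x_min) / (x_max - x_min)
x_min, x_max = total_spend.min(), total_spend.max()
x_scaled = (total_spend - x_min) / (x_max - x_min)

# Fit on the scaled feature: sales = beta0_s + beta1_s * x_scaled
beta1_s, beta0_s = np.polyfit(x_scaled, sales, 1)

# Substitute the scaling formula and collect terms in total_spend.
beta1 = beta1_s / (x_max - x_min)
beta0 = beta0_s - beta1 * x_min

# Check against a fit on the unscaled feature.
beta1_raw, beta0_raw = np.polyfit(total_spend, sales, 1)
print(beta0, beta1, beta0_raw, beta1_raw)
```

With standardization instead of min-max scaling the same substitution applies, with the mean in place of x_min and the standard deviation in place of the range.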
yeah, I would have chimed in to make a custom preprocessing layer if your pre-pro gets a bit complex - but def not for linear regression
any idea how can you sort an np array with strings that follows the same kind of sorting of linux file systems?
How are linux file names sorted?
Say you had a 1000 simultaneous equation with 20 variables. would solving each equation for a consistent solution be insanely computationally hard and long?
is this related to your question last night
hi
Is anybody free for a short call about a few questions about datascience and AI?
just post your questions no one wants to be called
YEs
I have a quick question
Could I ask for some help with coming up with ideas for a future project. I am not a very creative person but I want a fun project to do with AI and CV
please ping me as I have this server muted
Can anyone explain what is going on here? I can't figure out why my dataframe is giving me the wrong length.
it seems to be the right length?
the indices skip some numbers
I realized it just as you typed that
Thank you, I spent way too much time looking for something else
rippp
first time doing on pycharm instead of jupyter ....the output it correct but theres bunch of following red text...is it nothing to bother...?
guys, i'm new to this data science and i'm serious to have a career in it. can you guys suggest me books/courses or any vids to start it?
Has anyone here used the dlib library? Im try to make a face recognition program and im having a little issue
The perfect amount of epochs would be the one that ends with the minimum loss?
Hello, while doing a simple linear regression, using just 1 feature. My MSE keeps on increasing a lot by each epoch until python gives out overflow error. What could this mean? Why is MSE increasing?
try:
    shcSurveyNo = shcSurvey.split('/')[0]
    # villDF['name_match'] = villDF['PIN1'].apply(lambda x: 'Match' if x==shcSurveyNo else 'Mismatch')
    if shcSurveyNo in villDF['PIN1'].unique():
        print(shcSurveyNo,'Yes')
        villDF['shc']=1
    else:
        villDF['shc']=0
        print(shcSurveyNo,'No')
except:
    print("Something went wrong!!!!!!!!")
what am I missing here, please help
65 Yes
38 No
185 Yes
396 Yes
373 Yes
but in the df it only shows 1
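One likely cause, stated as an assumption since the surrounding loop is not shown: `villDF['shc'] = 1` assigns to the whole column, so every row ends up with whatever the last iteration wrote. If the goal is a per-row flag, a sketch with made-up data could use `isin` instead. The names `shc_surveys` and `survey_numbers` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the data in the question.
villDF = pd.DataFrame({"PIN1": ["65", "12", "185", "396", "40"]})
shc_surveys = ["65/1", "38/2", "185/4", "396/1", "373/7"]

# Part before the slash, as in the original snippet.
survey_numbers = {survey.split("/")[0] for survey in shc_surveys}

# Row-wise flag: 1 where this row's PIN1 is among the survey numbers, else 0.
villDF["shc"] = villDF["PIN1"].isin(survey_numbers).astype(int)
print(villDF)
```

This sets each row independently, with no loop and no repeated overwriting of the column.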
Hey does anyone use any cloud service here for computing?
I have a DataFrame in which I'm trying to count the number of times a certain string exist in a particular column. All the methods I've tried didn't work out.
For instance, in a DataFrame, under the name column, I'm trying to find rows that contain the word 'Mega', and count the total number of times the word appears.
would training with many epochs, finding epoch with as less loss in the end, and limiting epochs to this amount be good?
@short heart I dont know much but when I increase epochs it decrease my loss and increases accuracy
@boreal summit Hey would you tell me a way you tried?
did you try count()?
I already tried using
mm = data['name'].str.contains('mega')
Then I passed the Boolean above to data
It didn't work.
The logic didn't even work, so it didn't get to the count part.
I've also tried str.find()
They seem to work online but not with what I'm doing ATM.
could send a screenshot?
Okay
It just worked now, thanks. I guess I was doing something wrong and I didn't know.
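For reference, a sketch of the counting approach on made-up data (the real DataFrame is not shown in the chat). Note that `str.contains` is case-sensitive by default, so searching for 'mega' will not match 'Mega'.

```python
import pandas as pd

# Hypothetical data; the real DataFrame is not shown in the chat.
data = pd.DataFrame({"name": ["Mega Charizard", "Pikachu", "Mega Mewtwo Mega", "Omega"]})

# Boolean mask of rows whose name contains 'Mega' (case-sensitive by default).
mask = data["name"].str.contains("Mega")
rows_with_mega = data[mask]
n_rows = int(mask.sum())

# Total occurrences across the column, counting repeats within one row.
n_occurrences = int(data["name"].str.count("Mega").sum())

# Case-insensitive variant of the row count; this also matches 'Omega'.
n_rows_any_case = int(data["name"].str.contains("mega", case=False).sum())
print(n_rows, n_occurrences, n_rows_any_case)
```

Summing the Boolean mask counts rows, while `str.count` counts every occurrence, so the two numbers differ when a name contains the word more than once.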
Great
anybody?
Say, if Ive got a really small loss (2.0239e-04), but result is pretty bad, is that underfitting?
I have a question: can I use min-max normalization and then Z-score normalization? In theory it would work well, but I am not sure, because I read it is recommended to use only one normalization method
Maybe. The absolute size of the loss usually isn't interpretable. A practical example of underfitting would be predicting the mean for any input.
It's "under" fitting in that the model isn't representing/learning enough of the variation in the data
Normally I recommend normalization when you know the bounds of the data, and standardization when you don't
kind of stock price
11000
lstm layers
What kinds of features are there? Is it classification or regression? Etc etc
with batch normalization layers and relu in between
hey
There is a technique called "early stopping" which is intended to help prevent overfitting
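A minimal sketch of the early-stopping rule in plain Python, independent of any framework. The `patience` parameter and the loss values are made up for illustration; deep learning libraries ship their own versions of this as callbacks.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses.

    Training stops once the validation loss has not improved for `patience`
    consecutive epochs. Epochs are numbered from 0.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

# Made-up validation losses: improve, then drift upward as the model overfits.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
best_epoch, stop_epoch = early_stopping_epoch(losses, patience=3)
print(best_epoch, stop_epoch)
```

The key point is that the loss being watched is the validation loss, not the training loss, and the weights kept are the ones from the best epoch.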
what is the case to drop the column with missing data ?
more than 90% missing value in that column ?
There is no rule or magic number for that either
for what ?
Why is the data missing? Why did you want to include that column in the first place?
Well subjectively that sounds like it might not be useful. But I don't know the specifics of your situation. Maybe that column is necessary and you need to do some more work
in that case, this is a great opportunity to practice being smart about missing data
someone told me when there is column with missing data over 90% just drop that column
don't attempt to follow or even invent strict rules for discarding data
it always depends on the situation
i happen to know that in the titanic dataset, age is important
but you don't know that up front
that's for you to figure out. maybe you can infer the data from somewhere else
or maybe its missingness or lack of missingness is itself a feature
i know age matters there
maybe you can infer broadly a range of values from other data, even if you don't know the exact value
or maybe you just drop it and see what happens 🙂
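A small pandas sketch of two of the options mentioned, on hypothetical data: measuring how much is missing, and keeping the missingness itself as a feature. The median fill is just one possible choice, not a recommendation from the chat.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a mostly-missing column.
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, np.nan, np.nan],
    "fare": [7.25, 71.28, 8.05, 53.10, 8.46],
})

# Share of missing values per column, to inspect before deciding anything.
missing_share = df.isna().mean()

# Keep the missingness itself as a feature, then fill with the median.
df["age_missing"] = df["age"].isna().astype(int)
df["age_filled"] = df["age"].fillna(df["age"].median())
print(missing_share)
print(df)
```

Whether the indicator column helps is something to check on the actual problem, in line with the "it depends" advice above.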
you might want to look into the different kinds of missing data.. "missing completely at random", "missing at random", and "not missing at random" (MCAR, MAR, and NMAR)
why is that your only option? 
not necessarily
And then forget about the missing at random idea because it's always a terrible assumption 😉
say I have a robotic wheelchair controlled by eye movements, can I class the user as an actuator?
what could be reasons for pandas groupby to output me rows with duplicate keys they are being rouped on?
When I try to remove it with .drop_duplicates() the problem persist but the rows are clearly the same for those keys.
hi, first timer here. i'd like to ask some general question regarding how you choose a machine learning algorithm to build a model, more specifically an image classification recognition problem
to my understanding, generally I would want to look at the data, judge its distribution, its features and go from there. But that answer seems too generalized and is there any format or "guideline" that i could follow?
https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
@shadow knot this could help you but imo the best is to know how each of these roughly function and from there you can choose the one best suited for your need. There ain't a magic formula for choosing and not a single right answer either most of the time
sometimes you can just depend on other people to tell you what to use 🙂 e.g. for image classification CNNs are dominant for a good reason
i am executing an assignment in school and one of the criteria is to choose a number of base models and justify why I chose it over a dataset of 20,000 RGB images of size 27x27, essentially making its feature dimensionality up to 729 if im handling it by greyscale value of individual pixels
whereas for "social science" data usually there is no right answer and you might need to try several things
how many do you have to choose, and do you have to justify all of them or just the "best" one?
what constitutes a "model" in this case? are imagenet and resnet considered different models for the purposes of this assignment?
dont have to choose a certain number, just have to justify why i chose it over other models.
I do have to justify all of them, but only to some degree. After hyper parameter tuning and feature selection, i am required to make an ultimate judgement and recommend a "best" one.
forgot to mention but I am not allowed to use pre-trained models, therefore i believe imagenet and resnet are out of bound. But if they werent, they would be considered 2 models.
right now i am going with Logistic Regression, Random Forest, CNN, SVM and KNN to cover all different "type" of algorithms
i'd recommend looking into kernel SVM, specifically radial basis kernel, which as far as i know was very popular before deep learning took over
knn is an interesting choice, because you specifically need to define how to measure the "distance" between images
which brings me to my next question. Due to the number of features i have, i was thinking about dropping KNN since from what i've read, KNN efficiency drops when you introduce a high dimensional dataset
can i use threads with a neural network?
the nature of the dataset is medical. the images are cell images and my two tasks consist of:
- Classify if it's a cancerous cell
- Classify if it's a specific type of cell
my original thought was i could use the "distance" metric for KNN instead of "uniform" since a cell of similar nature should have more "weight" when voting compared to a cell of completely different structure
i agree with this, look up "the curse of dimensionality" for why defining a distance metric on high-dimensional data is not necessarily helpful
as far as i understand, pre-deep-learning image classification depended heavily on special-purpose feature engineering
so you could theoretically still use KNN if you could significantly reduce the size of the feature space
what's your thought on Logistic Regression?
regularization could help bring the size of the feature space down, which is my primary reason on selecting it as one of the base model
i will be implementing Bagging Random Forest as my feature selection technique so some feature removal will be done there but imo a model that could do that also is a plus right?
Taking the Fourier transform and keeping only a few of the terms with the largest Fourier coefficients seems like the first strategy that would come to mind for me
is this technique widely applied for image recognition since my brief googling effort only shows its usage for sound/speech application
yeah i would prefer something "intelligent" that reduces the feature space rather than somewhat-randomly discarding features
To me an image is just sound with a higher dimension
people used to do all kinds of signal processing stuff for ML on images, and probably still do
^
sorry if your question got lost in my wall of text
I saw an example of this technique recently where they had multiple examples of biblical art with cherubs holding some kind of writing and used it to decode the text effectively
i will read up on this technique, much appreciated
You’re not really discarding random features though. It is analogous to PCA discarding eigenvectors with the lowest eigenvalue. Unless you consider that also discarding those eigenvectors is also discarding random features which is certainly reasonable
what could be some of the performance metric that is generally good for this type of classification?
i know about the general one like accuracy, recall, precision and f1
overall, F.T is worth experimenting with, but given how rarely it's used in real-world imaging it doesn't seem like as promising a candidate as, say, PCA
which is also easier and more commonly used 🤷
im really new to this domain so could you elaborate on what PCA is?
Really? I wouldn't put it like that - for instance the differences b/w images and sound are huge
sound can be represented as useless images spectral features, but images can't be represented as sounds, can they?
I’m just saying in a hand wavy way that many of the techniques used in dsp translate directly and nicely to image processing. I wouldn’t delve too deeply into a sound are pictures metaphor
fair enough
what would that even mean?
how many classes? f1 isn't a bad default, for a school project at least
you can also consider using a proper scoring rule like brier score, but neural networks tend to have very poor probability calculation
first one is binary, second one consist of 4 classes
sure, although if the underlying model prediction code is already multi-threaded then you don't want to start mixing in your own threading
also in python specifically multi-threading for computation doesn't work well due to something called the "global interpreter lock"
so in python you really need to use processes for parallel computations, threads are good for parallel/concurrent i/o but not cpu-bound computation
Mmmm
in english pls? xDDD
sorry i didnt understand. May i tell u my plan
and u tell me if it is doable?
yes, it helps if you are more specific
Hello, I want to ask again about a dummy dataset for face recognition using vector similarities. Let's say I have a thousand vectors from a thousand known persons in my vector database. I talked to someone who said I need to add some dummy vectors with an unknown class. But I'm still confused about why I need to add unknown dummy vectors to the known dataset. Is this for performance testing?
Okey. I downloaded a bunch of pokemon images. I found on github a model that is supposed to predict pokemon images. I wanna use this model to clean the images i downloaded. Can i use threading to increase the speed?
nvm, from 95 images the model i downloaded fails on 71
easy peasy
i guess i have to manually clean the data :D
gl hf
Is it good for training the model to pass this image with the Bulbasaur label?
no, right?
use processes, not threads. see https://pypi.org/project/joblib, but consider that there might be significant overhead in loading the data multiple times or passing it between processes. joblib can help by caching/sharing numpy arrays and dataframes, but i find that it's not always reliable (and there's no easy way to explicitly tell joblib what to cache and what not to cache)
what do you mean by this? what's a "dummy" dataset? what is the data being used for?
t1 checks pokemon[0], t2 pokemon[1] and t3 pokemon[2]. When they are done, t1 will move to pok[3], t2 to pok[4] and t3 [pok5]
threads for predicting, not training
there are 2 problems:
- in python, 2 threads can't execute computations in parallel. this is a python-specific limitation.
- the underlying machine learning library might already be using multithreaded computations.
so you want to explicitly turn off multithreading, group your data into batches, and have processes making predictions on their own batches
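A sketch of the "batches plus processes" idea using the standard library. `predict_batch` is a hypothetical stand-in for real model inference, and the batch size and worker count are arbitrary assumptions.

```python
from concurrent.futures import ProcessPoolExecutor

def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def predict_batch(batch):
    """Hypothetical stand-in for model inference on one batch.

    A real version would load the model inside the worker process and
    return its predictions; here it just returns the length of each name.
    """
    return [len(name) for name in batch]

def predict_in_processes(items, batch_size=2, workers=2):
    """Run predict_batch on each batch in a separate worker process."""
    batches = make_batches(items, batch_size)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(predict_batch, batches))
    # Flatten per-batch results back into one list, preserving input order.
    return [prediction for batch_result in results for prediction in batch_result]
```

`predict_in_processes` should be called from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Each worker handles whole batches, which keeps the cost of passing data between processes down.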
mhmm okey, so no threads
a model that cleans pokemon images?
can you send me the repo?
it doesnt clean xd
i just though about using that already trained model to predict the images i downloaded xd
if the prediction matches the class where i downloaded the image, then it is a good download xD
the probability that a random model on github generalizes on general data is slim, but ehh
i use it for image searching. The dataset is embedded vector from an image (512 x 1) and each vector represent one known image class. The dummy is the random 1 * 512 vectors with dummy numerical and unknown class
this is the one i found
loss: 0.1279 - accuracy: 0.9743 - validation loss: 0.9940 - validation accuracy: 0.7917
But failing 71 out of 95 doesnt fit that accuracy
xDDDD
so if I have 300 vectors from 300 image classes, I will add, for example, 600 random dummy vectors with an unknown class. Anyway, this is to measure performance, to know how often the vector search returns an unknown or wrong class. But I think it can work without the dummies, unless the vectors of each class are pretty close together
it's overfitted lol
didn't expect much either
whew thats pretty big loss too... and also sign of overfit
it's clearly overfit since the validation and train loss margin are too big (0.1 to 0.9)
yes this is what i expected
well, then how can you use it on your task?
i cant
:D
then why are you saying it you are using? 🤔
how does inference has to do with data cleaning?
?
.
that's not cleaning bruv
its more considered under general pre-processing
but anyways
it is not a bulbasaur
then model will say it is venusaur
but that img in on bulbasaur class
don't really matter if the quantity of outliers is less
you can always compensate by robustness for the model
anyway, ^
again, shouldn't be too much of an issue
mmm
so may i use this model and do transfer learning with it??
even i have some images that wont match the class?
you can't always compensate for outliers... usually you can tolerate a few
if you are training on your own pokemon data, then it does seem like a good idea to use an existing pokemon image classifier for transfer learning
This is how many images i have per class
well, not my own. I want it to predict any image
also, why u say this? everytime u wanna classify some images u start from some point. Imagenet for example. Isnt that transfer learning?
you are asking, why does it seem like a good idea?
cuz even if that model sucks, it has already seen some pokemon images
you can - most of the times
depends on how many and how bad
you have about 20% in kaggle competitions
looks at cassava
;(
right, and i said it was a good idea. you just answered your own question!
after learning basic python, if you want to learn ML and AI, where would you start?
I've heard of numPy
you can even learn them
It's simple
How should I learn them though
Okay, thank you
imo start learning probability and statistics. don't focus too much on learning fancy python libraries yet
there is probably a good "data science with python" book
ok, i actually do need to learn basic python first i was asking for future reference but thank you!
https://jakevdp.github.io/PythonDataScienceHandbook/ this might be good for specifically learning the basic data science tools in python, once you are comfortable with the python basics
but it doesn't help you learn the stats or math
this might be a good one too, again for learning how to use python libraries like scikit-learn https://www.oreilly.com/library/view/introduction-to-machine/9781449369880/
can it be a basic level of probability and statistics?
what do you mean by this? you should always start by learning the basics
numpy, scipy, matplotlib + seaborn, scikit-learn, spacy and nltk (for text/nlp work), tensorflow/pytorch/jax, etc.
i mean, do you have to learn advanced probability and statistics
oh ok thx
do you need to fully understand the measure theoretic definition of probability? no
oh kk
imagine a university graduate who got an A in calculus, probability, linear algebra, and statistics
if you have that level of training, you know enough
you don't need to learn all of it at once, of course
you should strive to gradually learn more of it over time
ah ok
can someone recommend me a good book to start data science with python?
@polar stag i just recommended two of them above
got it. thanks
Is there a way to keep a jupyter notebook running while it's not actively opened in the browser and keep the output?
def softmax(self):
    return np.exp(self.dataset) / np.sum(self.dataset)
does anyone know why the plot of my softmax looks wrong
the dataset only has 1 dimension
File --> Export Notebook As... --> Executable Script then run the python file it gives you.
hmm I'd like to run it remotely though
I have my jupyter notebook set up on a remote server
ssh into the server, run the file
at that point what's the point of the notebook then?
The point of notebook is exactly as the name implies, a notebook.
If I ssh I'll have to make sure to run it such that it doesn't stop when I close the ssh session as well..
It's useful for sharing things too.
use screen
Basically I need it to stay up so I can run experiments on azure over night
I'm not sure if it's possible to submit experiments in a queue
that would make it easier
What could be easier than making a script that does all this with one click?
(can tmux too)
I know about screen
it's mostly about things being harder to edit if I have to sftp them in and out every time..
yeah but it's not remote
I spent way too much time getting the notebook to run remote in the first place which kinda feels wasted now
I figured it would be possible since colab can keep sessions alive when a notebook isn't open
So you have notebook running on a remote server with the --no-browser option?
If you run some cell(s) it should just keep running.
Try running notebook server on your local machine: jupyter notebook --no-browser --port=8080, connect, make a new notebook, run an infinite loop that prints something, and exit. It should just keep running as long as the server is running.
hmm yeah it keeps running when I close it but when I open the notebook again it restarts the kernel (?)
Yeah you need to keep the page open.
because that's not the definition of the softmax function
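For comparison, a sketch of the standard definition: the denominator is the sum of the exponentials, not the sum of the raw values. It is written as a plain function rather than a method, and the max-subtraction is a common numerical-stability step that was not in the original snippet.

```python
import numpy as np

def softmax(values):
    """Softmax of a 1-D array: exp(x_i) / sum_j exp(x_j)."""
    values = np.asarray(values, dtype=float)
    # Subtracting the max does not change the result but avoids overflow in exp.
    shifted = values - values.max()
    exponentials = np.exp(shifted)
    return exponentials / exponentials.sum()

probabilities = softmax([1.0, 2.0, 3.0])
print(probabilities, probabilities.sum())
```

With this definition the outputs are all positive and sum to 1, which the version dividing by `np.sum(self.dataset)` does not guarantee.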
That's exactly what I'm trying to prevent
You can't it's a known issue that is open on github for about 5 years now. So it's not happening.
The weird part is that the kernel is shown as "running" when the tab is closed - it just restarts when it's reopened for some weird reason
I think it means the editing session.
If you already ran something the output is still there.
hmm I'll have to try it again
If you want it to run while you are not connected, you need to download the python file which runs all the cells and do it manually.
But the manual is not too hard, just need to write a small script that uploads and runs it on a screen.
No it's definitely still running in the background
it shows as running and when I run a new code cell it queues it
so it can keep the process running in the background
It doesn't seem to be able to track which cell is running though
So when I ran a simple input echo loop that runs forever and came back the cell had finished running
and the output is ofc not kept in the cell output but I could log to a file instead
The previous input output echos were still there
It looked like that to me at first too but it was just not shown as running
the tab still shows the hour glass icon and trying to run another cell queues it instead of running it immediately
Idk when I came back to the notebook it was no longer asking for input which means the loop has stopped.
Yea I verify it, I ran a loop that sleeps for ~30 seconds, closed the tab, opened the notebook again and queued another code cell
it showed as queued and executed a few seconds later
so in principle it seems to work, it just doesn't recognize which code cell is currently running and stops printing the output
maybe that's good enough if I log to a file instead
yea, I just check the tab logo to see if something is running
but for output, logging is the simplest way
Are you using the builtin python logging library for that?
I mostly train models only, and TF/PT have a built in logger as an optional callback
I store epoch-level stats in the checkpoint itself
the logging would be more for azure stuff
Hey everyone, I'm trying to determine the average treatment effect for a problem I've been working through. To achieve this, I'm using dowhy + econML. When using the ForestDRLearner that is a part of these packages, the results I'm getting back for an average treatment effect are waaaaaay bigger than they should be. Does anyone know what could be causing this?
For example, range of outcome variable is [-10, 10], binary treatment, ATE is -50
how much? down-payment? 😛
I want to learn AI in University, I need a guidance please Help me!
Bsc Artificial Intelligence
hey i got some trouble with pandas, anybody out here?
trying to get EMA5 from ccxt api with pandas, but its giving me some diff return values
import ccxt
import pandas as pd
exchange = ccxt.binance({ 'enableRateLimit': True })
ohlcv = exchange.fetch_ohlcv(symbol = 'DOGE/USDT', timeframe='1h', limit=5)
data = map(lambda x: [ x[4]], ohlcv)
df = pd.DataFrame(data, columns = ['close'])
df['ewm'] = df['close'].ewm(span=5, min_periods=0, adjust=False, ignore_na=False).mean()
print(df)
hm, that looks like the right usage to me. what were you expecting that you didn't get?
lemme send some samples
that'd be great
output:
0 0.33208 0.332080
1 0.33461 0.332923
2 0.33247 0.332772
3 0.33088 0.332141
4 0.33400 0.332761
following some screenies
thats the 3rd index
id expect around 0.33354 on the second value
but it says 0.332141
just to be sure that the blue line indicates EMA5
i have updated the whole script on my first message ^
thanks for the updated code
close close_ewm
timestamp
1622134800000 0.33208 0.332080
1622138400000 0.33461 0.332923
1622142000000 0.33247 0.332772
1622145600000 0.33088 0.332141
1622149200000 0.33494 0.333074
so this is what i got
ye same as i did, thanks
im not sure what im looking at with these screenshots
lemme screen the whole area
0.332141 should be 3.4something?
they might be doing something subtly different with their ewma calculation
this is the close price from that hour, which refers to the first column, index 3
what does the (3,5) indicate?
this is the value that the close_ewm should be
those are the 2 EMA indicators that i put on
EMA3 and EMA5
oh, that's the value for the 3-period and 5-period versions
i see
are they using a different definition of "period"?
yes EMA5 would be calculated over the last 5 periods of 1 hour
what kinds of timestamps are these? i thought they were unix timestamps but then they're dates 3000 years in the future
UTC timestamp in milliseconds, integer
i tried it with timestamps, but the order seems right
i tried to reverse the array tho, but that shouldnt be
btw here is how i loaded the data
import ccxt
import pandas as pd
exchange = ccxt.binance({ 'enableRateLimit': True })
ohlcv = exchange.fetch_ohlcv(symbol = 'DOGE/USDT', timeframe='1h', limit=5)
df = pd.DataFrame(ohlcv, columns = ['timestamp', 'open', 'high', 'low', 'close', 'volume'])
df.set_index('timestamp', inplace=True)
df.sort_index(inplace=True)
so yes they are definitely in ascending order
df['close_ewm'] = df['close'].ewm(span=5).mean()
print(df[['close', 'close_ewm']])
this gives me
close close_ewm
timestamp
1622134800000 0.33208 0.332080
1622138400000 0.33461 0.333598
1622142000000 0.33247 0.333064
1622145600000 0.33088 0.332157
1622149200000 0.33494 0.333225
but i can't get the 2nd-to-last one to be something that rounds up to 0.34
imnot sure if this is the correct place but hi guys, im trying to plot my own trendline onto this graph how do i do it
what would a trend line be in this case? this looks very non-linear
the exponential line you see has been converted into a linear line which has the value of 0.6559.. etc and i want to plot it over it, so i could see the two overlapping
so you want to plot a line with slope 0.6559? what is the y intercept, 0?
its about the same that i already got, still not near
yeah @cinder lantern im not sure. maybe they are doing something slightly different with their calculation
im not sure, i have used the line equation to get that value so if i use another y value wouldnt it make the line inaccurate? if thats not the case, ill try it rn!
it didnt work 😦 unless i did it wrong
i dont know if there is a y intercept, i got this line by using the straight line equation, y2-y1/x2-x1
lol i got it
man im dumb af
@desert oar
we could have seen this from our testresults
close ema3 ema5
0 0.33876 0.338760 0.338760
1 0.34095 0.339855 0.339490
2 0.34117 0.340512 0.340050
3 0.33920 0.339856 0.339767
4 0.33584 0.337848 0.338458
see how 0 is the same while having a different span. its cuz it had no history to calculate with....
i only queried 5 periods, but for the first period, i have to query 4 more backwards in time
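A small sketch of that warm-up effect, using synthetic closes instead of a real fetch_ohlcv call (the random-walk prices are an assumption, only there so this runs offline). With adjust=False the EMA is seeded with the first value it sees, so a 5-candle window starts from the close itself, while the same rows computed on a longer history carry the earlier candles with them.

```python
import numpy as np
import pandas as pd

# Synthetic hourly closes standing in for exchange data (assumption, no network call).
rng = np.random.default_rng(0)
close = pd.Series(0.33 + 0.002 * rng.standard_normal(200).cumsum(), name="close")

def ema(series, span=5):
    # Recursive EMA, seeded with the first value of whatever window is passed in.
    return series.ewm(span=span, adjust=False).mean()

short = ema(close.tail(5))      # only 5 candles: first row is just the close itself
warm = ema(close).tail(5)       # same 5 rows, warmed up on the 195 earlier candles

comparison = pd.DataFrame(
    {"close": close.tail(5), "ema5_short": short, "ema5_warm": warm}
)
print(comparison)
```

Strictly speaking an EMA never fully forgets its seed, so 4 extra candles only reduce the gap. Fetching a few multiples of the span (say 50+ candles for EMA5) and keeping only the last rows gets much closer to what a charting site shows.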
can i train a model on colab with files on my local machine?
Yeah, I suspected it was something like that
plot first argument should just be the 2 x values, second argument should be the 2 y values
https://stackoverflow.com/questions/32539832/keep-jupyter-notebook-running-after-closing-browser-tab
%%capture output
Code doesn't stop on tab closes, but the output can no longer find the current browser session and loses data on how it's supposed to be displayed, causing it to throw out all new output received until the code finishes that was running when the tab closed.
(Also why my input loop stopped, it could not fetch input anymore)
Does anyone know if there is a way to scrape fb data
@broken stratus discussion of scraping facebook would be against our server rules, sorry.
that's because it violates the facebook terms of use, and we are not allowed to help with that
You are just plotting a,b and these are points. If you want a line calculate trendline = a*x + b and plt.plot(x,trendline) .
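A rough sketch of that suggestion. The slope 0.6559 is the value mentioned above, but the intercept of 0 and the exponential-looking data are assumptions made only so the example runs, swap in your own x, y and intercept.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data with an exponential shape (assumption, not the original dataset).
x = np.linspace(0, 10, 50)
y = np.exp(0.3 * x)

a = 0.6559   # slope quoted in the chat
b = 0.0      # intercept is an assumption; use the real one if you have it
trendline = a * x + b

fig, ax = plt.subplots()
ax.plot(x, y, label="data")
ax.plot(x, trendline, label="trendline")  # a full line, not just two points
ax.legend()
```

Call fig.savefig(...) or plt.show() afterwards to actually see it. If you don't know the intercept, np.polyfit(x, y, 1) returns a fitted slope and intercept you could use instead.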
hey guys i need to manually map specific values with another value. Its like 55 records with specifically different (no logic) mapped valued. i have two alternatives im thinking about:
- make a dictionary with 55 values and associated values, or
- just put them in two columns and match them up via a .csv or .xlsx file.
Just wondering if theres a good practice for stuff like this? What would you recommend?
Hey
I am training a Keras model
a CNN for sentence classification
TensorFlow tells me my GPU is available, but how can I discretely see if the GPU is being utilized during training?
OS? GPU?
fixed!
!code
Here's how to format Python code on Discord:
```py
print('Hello world!')
```
These are backticks, not quotes. Check this out if you can't find the backtick key.
the select query keeps on giving error 1241. Operand should contain 1 column. Would anyone here know where the problem might be?
Maybe both? Keep it as a csv since that's easy to write. Load it in memory from csv to a dictionary for use in code. If the count was smaller I might not have used csv but 55 seems like a large enough number
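Something like this, as a sketch: keep the pairs in a CSV and load them into a dict once at startup. The column names source and target and the sample rows are made up for illustration, and the CSV text is inlined here only so the example is self-contained.

```python
import csv
import io

# In real use: open("mapping.csv", newline=""). File name and columns are hypothetical.
csv_text = "source,target\nA12,red\nB07,blue\nC33,green\n"

def load_mapping(fileobj):
    """Read two-column CSV rows into a plain dict for fast lookups."""
    reader = csv.DictReader(fileobj)
    return {row["source"]: row["target"] for row in reader}

mapping = load_mapping(io.StringIO(csv_text))
print(mapping["B07"])
```

If the values live in a pandas column, df["col"].map(mapping) then applies the whole table in one go.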
guys does anyone know a python project i could use as a reference for classifying whether a face in a photo is real or fake (e.g. a photo taken of a 2nd phone/monitor), using any kind of method?
Whats the purpose of Dense layer?
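it's a fully connected layer: every output is a weighted sum of all the inputs plus a bias, passed through an activation. flattening the n-dimension feature into a single dimension is what Flatten does, Dense comes after that
it usually sits after the convolution layers to turn their features into the final prediction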
how to increase the fig size????? #help-burrito #help-bread #help-avocado
help me please....
i tried all tricks..
the fig size is not increasing
heeeloooooooo
please advice...
maybe try making it horizontal?
Oh I see, your graph is tiny and no, I don't know how to make it bigger, sorry
So...I was thinking about gradient descent
suppose we have a simple equation where the variables are the weights for the network, and the equation is the loss function. so we would basically want to locate argmin
but instead of using SGD the whole time, why can't we graph it?
so like we take n samples of different random weights - and we visually graph it (not the TF graph). we store it in a data structure, say the weights in one column and the output loss in the other. As we store the values in the data structure, we build a visual graph of it as we go.
now, after we try n different combinations of weights, we see where all the local minima in the visual graph lie.
(Obviously we won't compute it for the whole domain, only a certain number of specific values.)
let's call n ---> resolution of the graph. Thus, with a decent enough resolution, we can at least guess where the global minimum might be.
Thus, we take the guess of the weights that might correspond to a minimum, and then we do SGD on it. so basically we initialize the weights and biases closer to a guess of a global minimum.
on the graphs, mathematically we can calculate minima: suppose we have a 3-D loss surface, then a point whose surrounding points are all greater than it would be a local minimum
we do this a few times (which would take milliseconds) and then we would have a quite good initialization for the weights of the NN.
why don't we do this?
something very similar to this is what metaheuristics like simulated annealing do - they randomly take some weights and explore multiple minima and hope to find a global minimum. in theory this sounds good but in practice metaheuristics suck at training neural networks.
also, finding the global minimum isn't necessarily the best thing, it could be overfit to that data. a sufficiently good local minimum that generalizes well is enough
Simply because it won't be as simple as you assume. These functions can get gnarly. And we don't really use sgd as is, we use it to teach sure, but we usually use some clever tricks on top of sgd (look up Adam or AdaGrad)
as Darr said, randomly selecting weights and letting sgd do its work isn't very efficient. the loss space is so huge that 99.99% of the time it would be better to start from a single random initialization and let it train for longer
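To make the idea concrete, here is a toy sketch of "sample random weights, keep the best, then run gradient descent from there". The 2-weight loss function, the sample count and the learning rate are all made up for illustration. It works fine in 2 dimensions, but note that a grid with 10 points per weight needs 10^d loss evaluations for d weights, which is why this doesn't carry over to real networks.

```python
import numpy as np

def loss(w):
    # Toy non-convex loss over the last axis (an assumption, not a real network loss).
    return np.sum(w**2 + 2.0 * np.sin(3.0 * w), axis=-1)

def grad(w):
    # Analytic gradient of the toy loss above.
    return 2.0 * w + 6.0 * np.cos(3.0 * w)

rng = np.random.default_rng(0)

# Step 1: evaluate the loss at n random weight vectors and keep the best one.
candidates = rng.uniform(-3.0, 3.0, size=(100, 2))
w = candidates[np.argmin(loss(candidates))].copy()
start_loss = loss(w)

# Step 2: plain gradient descent from that starting point.
for _ in range(200):
    w -= 0.01 * grad(w)
final_loss = loss(w)

print(start_loss, final_loss)
```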
If my data only has 1 feature, is feature scaling still required?
😔 But I thought scaling makes sense when there are at least 2 features out of scale, can you give me an explanation as to why it is needed with just 1 feature?
can anyone explain mcmc
i get the monte carlo part, i also know what markov chains are, but i dont get how u put them together and how it works
like markov chains of parameters? what does that even mean
it won't be as sensitive to variations in data that may be actually small, but numerically large.
but if you are doing plain linear regression (no regularization, solved in closed form), then it doesn't matter
Can you help me? Why does the accuracy remain constant in the epoch results in the cnn model?
The super tldr: there are ways to construct a markov chain such that the equilibrium distribution of the markov chain is a particular probability distribution. The really cool (and useful) part is that you can do this without knowing the exact analytical form of the distribution function. This enables us to fit and sample from complicated Bayesian models for which computing the exact form of the distribution function (especially the normalizing constant) would be intractable or impossible.
This general category of algorithms is called "Markov chain Monte Carlo". Typical MCMC algorithms include Metropolis-Hastings, Gibbs Sampling, Hamiltonian Monte Carlo, and the No U-Turn Sampler.
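A bare-bones random-walk Metropolis sketch, to show how the pieces fit together. The target here is a standard normal with its normalizing constant deliberately left out (an illustrative choice), and the step size and sample count are arbitrary. Each new state depends only on the current one (the Markov chain part), proposals are random (the Monte Carlo part), and only the ratio of densities is ever needed, so the missing constant cancels.

```python
import numpy as np

def unnormalized_density(x):
    # Target known only up to a constant: standard normal without 1/sqrt(2*pi).
    return np.exp(-0.5 * x**2)

def metropolis(n_samples, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = 0.0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        # Propose a move that depends only on the current state.
        proposal = x + step * rng.standard_normal()
        # Accept with probability min(1, ratio); the normalizing constant cancels.
        accept_prob = min(1.0, unnormalized_density(proposal) / unnormalized_density(x))
        if rng.random() < accept_prob:
            x = proposal
        samples[i] = x  # a rejected proposal repeats the current state
    return samples

samples = metropolis(20000)
print(samples.mean(), samples.std())
```

The mean and standard deviation of the samples should come out near 0 and 1. In a Bayesian model x would be the parameter vector and the density would be prior times likelihood.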
Vanishing gradient?
try fig.set_figheight and fig.set_figwidth, those always work for me
however there could be other issues here, it looks like the legend is very big but the main plotting axis is not
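A sketch of both ways to set the size, plus a smaller legend so it doesn't crowd the axes. The data and the specific sizes are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this also runs without a display
import matplotlib.pyplot as plt

# Option 1: set the size (in inches) when the figure is created.
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot([0, 1, 2], [0, 1, 4], label="example")

# A compact legend leaves more room for the plotting area.
ax.legend(loc="upper left", fontsize="small")

# Option 2: resize an existing figure afterwards.
fig.set_figwidth(12)
fig.set_figheight(5)

print(fig.get_size_inches())
```

One common reason for "nothing changes" is that the size is being set on a different figure than the one being drawn, for example calling plt.figure(figsize=...) after the plot already exists.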
do jpg files of mxn pixels have the same quality as if it was png?
any good paper for image processing?
PNG is a lossless compression format, JPEG is a lossy one.
so answer is no?
JPEG quality depends on the settings - the higher the compression, the more it butchers the image
if quality is important, use PNG
Well, PNG is the most common one; others exist:
https://en.wikipedia.org/wiki/Lossless_compression#Raster_graphics
well, my pokemon data set was full on png format. it was 4GB i guess. now i changed so that images with 3 channels are saved as jpg. size reduced by half
but idk if a png of 3 channels has the same quality of a jpg :D
I mean, i did this cuz i need to upload the dataset to drive :(
nah, JPEG can compress better because it introduces artifacts
isnt jpg = jpeg?
same, yes
ok ok
I mean better than PNG
you can play around with saving images to JPEG, and comparing them with the originals
here's an example
anyway, now that u mentioned artifacts, i guess is good having artifacts on ur dataset, so model trains better
o.O
ah! also... i have another question
A compression artifact (or artefact) is a noticeable distortion of media (including images, audio, and video) caused by the application of lossy compression. Lossy data compression involves discarding some of the media's data so that it becomes small enough to be stored within the desired disk space or transmitted (streamed) within the available...
sometimes... when u convert png to jpg, background has weird things... and my model works with 3 channels
so those shiity backgrounds... may i preprocess them?
not sure what you mean by weird things
wait
those "weird things" are the compression artifacts
no
i dont mean that
discord can open png, so it looks like this
if i open this image as RGB
no alpha
it looks like this
it is because the background has color, but since alpha channel is there, it isnt being painted
yep, (255, 255, 0, 0) looks the same as (255, 0, 255, 0) because they're both fully transparent
so should i do some kind of preprocessing like if alpha is 0 then paint the pixel full white
or something?
yeah i was about to suggest that
Yeah, you need to blend the alpha-channel into the image, since JPEG doesn't support RGBA
from PIL import Image
import io, numpy as np
fake_file = io.BytesIO()
img = Image.open(r"D:\Programming\1200px-Typescript_logo_2020.png")
img = img.convert("RGB")
img.save(fake_file,format="jpeg",quality=10)
fake_file.seek(0)
img2 = Image.open(fake_file)
# calculate difference:
arr1 = np.array(img)
arr2 = np.array(img2)
diff = np.abs(arr1.astype(np.int32)-arr2.astype(np.int32)).sum(axis=2)
diff = diff*255/np.max(diff)
diff = diff.astype(np.uint8)
diff_img = Image.fromarray(diff)
diff_img.show()
here's an example of compression artifacts
original
@tidal bough is there an "intelligent" way to do the alpha channel blending? i assume hard coding to white could cause problems if there are light colored or white objects in the image
difference with result
difference between png and jpg u mean?
it shouldnt since the only pixels full transparent are the background
this is a bad example because this is a very simple image and can be compressed well, but you can see that at the edge of letters and at the rounded edges, there are differences between original and compressed
ok ok, i see
I don't know of one. You need to select a background color and then mix it and the image depending on alpha. Perhaps you can somehow detect what color isn't present in the image
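The mixing step can be sketched with plain numpy as below. White as the background color is an assumption, pass whatever color you pick. The formula is the usual "alpha over background" blend: out = alpha * rgb + (1 - alpha) * background.

```python
import numpy as np

def blend_onto_background(rgba, background=(255, 255, 255)):
    """Composite an (H, W, 4) uint8 RGBA array onto a solid background color."""
    rgb = rgba[..., :3].astype(np.float32)
    alpha = rgba[..., 3:4].astype(np.float32) / 255.0  # shape (H, W, 1) for broadcasting
    bg = np.asarray(background, dtype=np.float32)
    out = alpha * rgb + (1.0 - alpha) * bg
    return np.rint(out).astype(np.uint8)

# Three test pixels: hidden green (alpha 0), opaque green, half-transparent black.
rgba = np.array([[[0, 200, 0, 0], [0, 200, 0, 255], [0, 0, 0, 128]]], dtype=np.uint8)
rgb = blend_onto_background(rgba)
print(rgb)
```

The fully transparent pixel takes the background color no matter what RGB value was hiding under it, which is what removes the stray background colors before saving as JPEG.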
what if the pokemon has white hair or something?
the edge of the hair is gonna be black
look, this is what ive done in the past
def black_white(image):
    # image is an RGBA numpy array; returns 255 where alpha == 0 (background), else 0
    return np.where(image[:, :, 3] == 0, 255, 0).astype('uint8')
eevee has hair tho
i think u mean hair could have some transparency, but some != totally transparent
So i only care about the pixels with alpha value == 0
maybe the eevee pixels have transparency 100, or maybe 85, or 1. i dont care. What i know is alpha = 0 -> no pokemon
but if u dont trust this 100%, an intelligent way is using a neural network that extracts the object on the picture and its mask :)
its called salient object detection or something like that
there is a model called u2net
I know I’m butting into the conversation. I recently did image processing for a class (not AI; just regular image processing).
Always use PNG for processing. If the user input file is a JPEG, convert it to PNG before operating on it.
As for alpha channel problem (Bulbasaur example) you will need to replace the color of the image background with the platform background color.
Thankfully, the image background is usually one solid color (white or close to white).
No, platform like your GUI program. Discord dark mode uses gray.
ah
Hey guys, I would like to start in DS and ML, but I am not good at math. What topics of math is used in DS and ML?
Do you guys have some material to help me with math?
yeah but, for the bulbasaur above, if i pass it to my model for training, model will see green tones on the background
cuz it will remove the alpha
for ML, a ton of linear algebra and some basic calculus (derivatives, etc)
for DS, well, statistics and the probability theory required for it
Can you guys recommend me some books to learn math?
One last thing. Is geometry used in ML? I just HATE GEOMETRY
I mean, not really, unless you count linear algebra as such
@cedar sun Unfortunately I didn’t do it with ML so I don’t know that answer. Just remember that an image of size 400x500 is always 400x500 unless you physically crop it. Which you don’t because that’s annoyingly hard. Therefore, every pixel in that dimension needs a numerical value.
I hate the fact that I love computer world but I have to learn math to understand
(heh, I mostly remember disliking geometry in high school because it was too... nonobvious - like, you had to find a way to solve a problem, what equations to write, as opposed to just doing some mindless math like in algebra or even physics)
does anybody know by chance whats the best accuracy anybody has ever gotten in predicting forex/stocks?
Do you guys think that Khan Academy is a good place to learn math?
linear algebra of the kind you'll be using in ML isn't really geometry-related
sure, it has nice calculus
but it's not the best right?