#data-science-and-ml | Python | Page 314

cedar sun May 24, 2021, 1:47 PM

#

and what will happen with the connections?

#

i need to retrain it, right?

novel elbow May 24, 2021, 1:48 PM

#

yes

#

you can freeze all previous layers and just train the last one

cedar sun May 24, 2021, 1:49 PM

#

uuuuuh

#

no but

#

i get my model with load_model from keras

#

i need to load the model, remove the last layer, add my own, freeze, and train?

#

it shouldn't take too long, right?

novel elbow May 24, 2021, 1:51 PM

#

yes, it should be faster than training the whole model, and you don't need many epochs since you are only optimizing one layer

#

check https://keras.io/guides/transfer_learning
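A minimal sketch of the load / drop the last layer / add a new one / freeze / train recipe, assuming a Sequential model whose last layer is the old output layer. A small stand-in model replaces `keras.models.load_model(...)` so the sketch runs on its own; the layer sizes, the 3-class head and the random data are made up. For a functional model the idea is the same, but the layers would be rewired with the functional API as in the guide.

```python
import numpy as np
from tensorflow import keras

# Stand-in for the loaded model; in practice: base = keras.models.load_model(<your file>)
base = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),  # old output layer, to be replaced
])

# Keep every layer except the old output layer, and freeze the kept ones.
kept_layers = base.layers[:-1]
for layer in kept_layers:
    layer.trainable = False

# Reuse the frozen layers and add a new trainable output layer (3 classes, hypothetical).
model = keras.Sequential([
    keras.Input(shape=(8,)),
    *kept_layers,
    keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Random placeholder data; only the new layer's weights are updated.
x = np.random.rand(32, 8).astype("float32")
y = np.random.randint(0, 3, size=32)
model.fit(x, y, epochs=2, verbose=0)
```

Because only the new layer's kernel and bias are trainable, each epoch is cheap, which is why a few epochs are usually enough.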


cedar sun May 24, 2021, 1:51 PM

#

thanks thanks

upper spade May 24, 2021, 4:02 PM

#

guys, where can I learn pandas?

pallid cliff May 24, 2021, 4:40 PM

#

hi all,
I'm trying to do some stats on a pandas DataFrame of products.
I have a column store, which is a list of strings (['carrefour', 'auchant', 'bi1', 'wallmart', ...]): the stores where that product is sold,
and a column calories: a float, the number of calories in that product.
I want to rank stores based on the average calories of the products they sell.
Can someone help me?

near cosmos May 24, 2021, 4:59 PM

#

pallid cliff hi here, I'm trying to do some stats on a pd DataFrame on some products, I have ...

It'll be something like this (from memory, so something might be a little off):

# assuming df is the name of your DataFrame
df.groupby("store")["calories"].mean()

pallid cliff May 24, 2021, 5:18 PM

#

near cosmos It'll be something like (from memory so something might be a little off) ```py ...

it's not really working,

df.groupby('store')['kcal'].mean().nlargest(10)

[['Wholefood']]                                                                      3830.0
[['Costco']]                                                                         3779.0
[['Super U', 'Magasins U', 'Woolworths', 'Coles']]                                   2384.0
[['carrefour market plouagat']]                                                      2000.0
[['Biocoop eau vive']]                                                                900.0
[['Bo nature et santé']]                                                              900.0
[['Carrefour Market', 'Leclerc', 'Systeme U', 'Auchan', 'Casino', 'Intermarché']]     900.0
[['Carrefour', 'houra.fr', 'Magasins U']]                                             900.0
[['Carrefour', 'intermarché']]                                                        900.0
[['Carrfeour', 'Auchan', 'Leclerc', 'Systeme U', 'Casino', 'Monoprix']]               900.0
Name: kcal, dtype: float64

it's not grouping the way I want. See how "Carrefour" appears in multiple rows

woven surge May 24, 2021, 5:20 PM

#

Hi, so I'm looking into config files and I have one, but it's generated from my main.py file, which explicitly defines the data structures. I want to be able to modify the config and have that reflected in my main.py file. How would I do this?

What's happening now:
main.py creates config.ini with pre-defined data structures (config.ini = hardcoded)

What I want:
a modifiable config file that main.py reads information from and uses to perform operations
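One common pattern, sketched with the standard-library `configparser` module: the config file is edited by hand, and main.py only reads it at startup, so changing the file changes what main.py computes. The section name `settings`, the keys `rate` and `years`, and the calculation are hypothetical. `read_string` keeps the sketch self-contained; with a real file you would call `config.read("config.ini")` instead.

```python
import configparser

# Hypothetical contents of config.ini, edited by hand rather than generated by main.py
EXAMPLE_INI = """
[settings]
rate = 0.05
years = 10
"""

config = configparser.ConfigParser()
config.read_string(EXAMPLE_INI)  # with a real file: config.read("config.ini")

# Typed getters convert the stored strings
rate = config.getfloat("settings", "rate")
years = config.getint("settings", "years")

# main.py then uses the values in its calculations
growth = (1 + rate) ** years
```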

near cosmos May 24, 2021, 5:21 PM

#

pallid cliff it's not really working, ```py df.groupby('store')['kcal'].mean().nlargest(10) `...

Oh, I misread the original. I recommend splitting your store column so that each row has the number of calories for one product at one store. You could try https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.explode.html

serene scaffold May 24, 2021, 5:24 PM

#

pallid cliff hi here, I'm trying to do some stats on a pd DataFrame on some products, I have ...

looks like you'll need to make another dataframe where you explode the store column

#

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.explode.html

#

# what you have
[['Wholefood']]                                           A
[['Costco']]                                              B
[['Super U', 'Magasins U', 'Woolworths', 'Coles']]        C

# what you want
'Wholefood'  A
'Costco'     B
'Super U'    C
'Magasins U' C
'Woolworths' C
'Coles'      C
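A minimal sketch of the explode-then-group approach, on a made-up toy frame that assumes each `store` cell holds a real Python list of store names:

```python
import pandas as pd

# Toy data (made-up numbers): each "store" cell is a list of store names
df = pd.DataFrame({
    "store": [["Wholefood"], ["Costco"], ["Super U", "Magasins U", "Carrefour"], ["Carrefour", "Costco"]],
    "kcal": [3830.0, 3779.0, 2384.0, 900.0],
})

# One row per (product, store) pair; the product's kcal is repeated for each store
per_store = df.explode("store")

# Average kcal of the products each store sells, highest first
ranking = per_store.groupby("store")["kcal"].mean().sort_values(ascending=False)
```

Here "Carrefour" and "Costco" each collect values from two products, which is the grouping the original attempt was missing. Real lists cannot be used as groupby keys, so if the grouping above worked on the raw column, the cells may be strings that only look like lists; those would need to be parsed into lists first (for example with `ast.literal_eval`).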

oblique raft May 24, 2021, 5:29 PM

#

Can someone recommend a good tutorial for generating text with an RNN in Keras? All the tutorials I've tried just don't work for me... Thanks in advance

grave frost May 24, 2021, 5:29 PM

#

oblique raft Can someone recommend me a good tutorial for generating text with a rnn with ker...

https://www.tensorflow.org/tutorials/text/text_generation


oblique raft May 24, 2021, 5:30 PM

#

Interesting...

#

Python 3?

grave frost May 24, 2021, 5:30 PM

#

oblique raft Python 3?

yes and TF2

woven surge May 24, 2021, 5:34 PM

#

woven surge Hi, so I'm looking into config files and I have one, but it generates based off ...

Anyone know how configs work? I have it working backwards of how I want it. Any help as to how to use config values to perform calculations in a .py file would be greatly appreciated!

oblique raft May 24, 2021, 5:36 PM

#

Spam

sly salmon May 24, 2021, 5:50 PM

#

Hey guys, Q about gradient descent.

When utilizing gradient descent, what is the function we are "descending"?

Is it the Loss (y axis) vs Feature weight (x axis)? If so how do we find this function?

Also, for a neural network, is gradient descent done for each individual weight? E.g. for 20 lines connecting nodes, are 20 gradient descents performed, one on each weight, to find the weights resulting in the least error?

desert oar May 24, 2021, 6:14 PM

#

sly salmon Hey guys, Q about gradient descent. When utilizing gradient descent, what is th...

Is it the Loss (y axis) vs Feature weight (x axis)?
Yes, but keep in mind that there can be many feature weights in a big neural network, potentially thousands or millions. So x can be a very high-dimensional vector.

If so how do we find this function?
Either derive it by hand, or use an "automatic differentiation" software package to compute it for you.

Also, for a neural network, is gradient descent done for each individual weight? E.g. for 20 lines connecting nodes, are 20 gradient descents performed, one on each weight, to find the weights resulting in the least error?
No. The "gradient" is kind of like a vector-valued derivative. You update the entire weight vector in one step. Updating each weight individually is called "coordinate descent", which is used in some models but usually it's not important from the user perspective.

Gradient descent is effectively an implementation detail. It happens that neural networks are so difficult to optimize that tuning the optimizer is a necessary part of training them. With most other models and optimizers, you don't have to tune the optimizer in day-to-day usage (e.g. logistic regression with L-BFGS).
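To make the distinction concrete, here is a small NumPy sketch on a made-up least-squares problem with a hand-derived gradient: gradient descent updates the whole weight vector in every step, while a simple cyclic variant of coordinate descent changes one weight per step. The learning rate and step counts are arbitrary choices that converge for this toy problem.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # toy data: 100 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w                       # noise-free targets, so the minimum is exactly true_w

def grad(w):
    # Gradient of the mean squared error, derived by hand: (2 / n) * X^T (X w - y)
    return 2.0 / len(y) * X.T @ (X @ w - y)

lr = 0.1

# Gradient descent: every step moves the whole weight vector along -gradient.
w_gd = np.zeros(3)
for _ in range(200):
    w_gd = w_gd - lr * grad(w_gd)

# Cyclic coordinate descent, for contrast: each step changes a single weight.
w_cd = np.zeros(3)
for step in range(600):
    j = step % 3
    w_cd[j] -= lr * grad(w_cd)[j]
```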

simple shadow May 24, 2021, 7:23 PM

#

hi! i was wondering how do i iterate over rows to find a specific value that meets a certain condition

grave breach May 24, 2021, 7:28 PM

#

simple shadow hi! i was wondering how do i iterate over rows to find a specific value that mee...

I don't think you need machine learning for this

#

Just do:
for row in rows:
    if condition(row):  # whatever test you need
        do_something()

#

Or maybe you meant something like anomaly detection?

#

(for that you need ai)

#

@simple shadow

near cosmos May 24, 2021, 7:30 PM

#

grave breach I don't think you need machine learning for this

https://joelgrus.com/2016/05/23/fizz-buzz-in-tensorflow/ 🙂

grave breach May 24, 2021, 7:31 PM

#

near cosmos https://joelgrus.com/2016/05/23/fizz-buzz-in-tensorflow/ 🙂

haha

grave frost May 24, 2021, 7:35 PM

#

technically tho, even using if-else is A.I

desert oar May 24, 2021, 7:41 PM

#

simple shadow hi! i was wondering how do i iterate over rows to find a specific value that mee...

in pandas, you don't usually need to specifically iterate over rows. do you want the row number? the value itself? the row "label" (aka the index)?

#

@grave breach this channel is kind of the catch-all channel for pandas, numpy, matplotlib, and scipy

grave breach May 24, 2021, 7:50 PM

#

Sorry, I didn't get that he meant pandas rows; I thought he just had a list of lists

desert oar May 24, 2021, 7:55 PM

#

ah yeah. you'll start to see common patterns in people's XY questions so you can skip some of the back and forth "what do you mean" stuff.

near cosmos May 24, 2021, 7:55 PM

#

I didn't see that in there either

simple shadow May 24, 2021, 8:01 PM

#

@desert oar i want to change specific values in one column

near cosmos May 24, 2021, 8:05 PM

#

simple shadow @desert oar i want to change specific values in one column

are you working in a pandas data frame?

simple shadow May 24, 2021, 8:07 PM

#

yes @near cosmos

near cosmos May 24, 2021, 8:11 PM

#

simple shadow @desert oar i want to change specific values in one column

I think you are looking for this https://pandas.pydata.org/pandas-docs/stable/user_guide/cookbook.html#if-then

# for all rows where column AAA >= 5, change the value of BBB to -1
df.loc[df.AAA >= 5, "BBB"] = -1
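A small self-contained comparison on toy data (column names taken from the cookbook snippet): the explicit row loop the original question had in mind, and the vectorized `.loc` assignment, which gives the same result without iterating.

```python
import pandas as pd

df = pd.DataFrame({"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40]})  # toy data

# Row-by-row loop (works, but slow on large frames)
looped = df.copy()
for idx, row in df.iterrows():
    if row["AAA"] >= 5:
        looped.at[idx, "BBB"] = -1

# Vectorized version from the cookbook: a boolean mask selects the rows to change
vectorized = df.copy()
vectorized.loc[vectorized["AAA"] >= 5, "BBB"] = -1
```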

cedar sun May 24, 2021, 8:20 PM

#

@serene scaffold hello. He didn't reply to you yet, right?

serene scaffold May 24, 2021, 8:23 PM

#

cedar sun @serene scaffold hello. He didn't reply to you yet, right?

He said he's not sure how one would request that.

cedar sun May 24, 2021, 8:25 PM

#

huh?

#

what do you think I want? I mean, I was looking for 50+ images of each Pokémon

uncut barn May 24, 2021, 8:58 PM

#

For a dissimilarity measure to compare ratings such as very dissatisfied, dissatisfied,
neutral, satisfied, very satisfied, should they be one-hot encoded and compared with the Hamming distance?

#

or can I use the Euclidean distance?
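One consideration, sketched under the assumption that the ratings are the five ordered levels listed: with one-hot encoding, the Hamming distance between any two different ratings is always 2 (and the Euclidean distance always √2), so "dissatisfied" vs "very dissatisfied" counts as far apart as "very dissatisfied" vs "very satisfied". Mapping the levels to ranks keeps the order; the scaled rank difference below is similar to how Gower's distance treats ordinal variables. The helper names are hypothetical.

```python
import numpy as np

LEVELS = ["very dissatisfied", "dissatisfied", "neutral", "satisfied", "very satisfied"]
RANK = {label: i for i, label in enumerate(LEVELS)}

def one_hot(label):
    vec = np.zeros(len(LEVELS))
    vec[RANK[label]] = 1.0
    return vec

def hamming_one_hot(a, b):
    # Number of positions where the two one-hot vectors differ (0 or 2)
    return int(np.sum(one_hot(a) != one_hot(b)))

def ordinal_distance(a, b):
    # Rank difference scaled to [0, 1], keeping the order of the levels
    return abs(RANK[a] - RANK[b]) / (len(LEVELS) - 1)
```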

gentle lion May 24, 2021, 9:05 PM

#

hey, I'm using tensorflow.keras to train a CNN, but for some reason it doesn't show or do anything after model.fit

#

i basically removed all my layers but still nothing

cedar sun May 24, 2021, 9:05 PM

#

I need to gather a big image dataset of Pokémon to make a CNN classifier. If any of you want to help me, please check the Google image search Python API and follow the steps. Ping me too so I can share a script

gentle lion May 24, 2021, 9:05 PM

#

it just prints 1 and keeps running forever

#

any idea why it might be?

#

i even have verbose set to 1, so it should print some epoch info

sly salmon May 24, 2021, 9:25 PM

#

desert oar > Is it the Loss (y axis) vs Feature weight (x axis)? Yes, but keep in mind that...

Thank you for the awesome explanation. I am getting a grasp of how the cost function is used to update all the weights at once instead of doing them individually.

I have a few more questions, if you don't mind.

How do we actually know the formula of our cost function (e.g. in the form y = mx + c)?
My current intuition is that we set our cost function from the get-go, e.g. least squares. So that's how we know. Is that correct?

Also, here is how I think gradient descent works: Please let me know if this is wrong

We have a point on our cost function, (x, y, z), which is the weights of our x, y, z features
We partially differentiate our cost function and substitute the values of x, y, z to find the gradient at the point specified in the previous step (the rate of change of the loss function with respect to all features)
so, something like [1, 2, 2]
We then go down this gradient, by updating the weights for our x, y and z features at once:
weights = [5 (weight of x), 4 (weight of y), 3 (weight of z)]

# multiplying our weights with the gradient of the weight
[5, 4, 3] x -[1, 2, 2] = multiplied_weights

new_weights ([new_x, new_y, new_z]) = old_weights - training_steps * multiplied_weights

We recalculate our cost, and carry on, until it's a minimum.

The hardest part was thinking of each feature as a vector. I really appreciate your help. If you have a bitcoin address let me know.

simple shadow May 24, 2021, 10:05 PM

#

near cosmos I think you are looking for this https://pandas.pydata.org/pandas-docs/stable/us...

thanks!

sly salmon May 24, 2021, 10:10 PM

#

also, if I'm computing the cost function for a neural network, how many iterations should I go through (adding each cost to a cost_sum variable before dividing by the number of iterations to get the average cost)?

fiery cipher May 24, 2021, 11:00 PM

#

hi, I have a dataset that is a mix of ints and strings; its type is numpy.ndarray. I'm trying to detect the string attributes to apply a condition. I used if isinstance(point, str) == True, but it doesn't seem to work
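One likely cause, assuming the array was built with `np.array` from a mixed list of ints and strings: NumPy picks a single dtype for the whole array, so the ints are converted to strings and `isinstance(point, str)` is true for every element. Building the array with `dtype=object` keeps the original Python objects. The values below are made up.

```python
import numpy as np

# Made-up values: an array built from mixed ints and strings
mixed = np.array([1, "a", 2, "b"])
# NumPy chose a single Unicode string dtype, so the ints became strings like "1"
is_str = [isinstance(point, str) for point in mixed]  # True for every element

# dtype=object keeps the original Python objects, so the check works as intended
mixed_obj = np.array([1, "a", 2, "b"], dtype=object)
str_mask = [isinstance(point, str) for point in mixed_obj]  # [False, True, False, True]
```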

near cosmos May 24, 2021, 11:04 PM

#

fiery cipher hi, I have a data set a mix of int and string , the type is numpy.ndarray am tr...

can you make a minimal working example of the problem?

fiery cipher May 24, 2021, 11:09 PM

#

near cosmos can you make a minimal working example of the problem?

okay I will try

soft silo May 24, 2021, 11:10 PM

#

Hi guys, is someone experienced with scipy here? I'm trying to solve a set of two differential equations and i need some help to verify if it's correct

fiery cipher May 24, 2021, 11:11 PM

#

near cosmos can you make a minimal working example of the problem?

I don't know how correct this method is XD, I'm still learning, but this is an example

near cosmos May 24, 2021, 11:13 PM

#

fiery cipher I don't know how correct this method is XD am still learning but this is an exam...

what does your data (X and centroids) look like? consider grabbing a help channel #❓｜how-to-get-help

fiery cipher May 24, 2021, 11:14 PM

#

the centroids are random points from the data this is how they look

cedar sun May 24, 2021, 11:20 PM

#

guys, using seaborn, how can I make a barplot without a legend on one axis?

near cosmos May 24, 2021, 11:21 PM

#

cedar sun guys, using seaborn, how can I make a barplot without a legend on one axis?

usually you can pass legend=False or legend=None (I forget which)

cedar sun May 24, 2021, 11:21 PM

#

it doesn't allow me

#

x = sns.barplot(list(dct.keys()), list(dct.values()), x=None)

#

so

#

dct.keys() is what I don't want to display xd

#

so basically what i wanna display is

#

string1 = 80, string2 = 120

#

etc

#

the values of each one

#

but I only want a vertical axis with numbers like 25-50-75-100

#

do i explain?

leaden meteor May 24, 2021, 11:41 PM

#

good evening

near cosmos May 24, 2021, 11:51 PM

#

cedar sun but I only want a vertical axis with numbers like 25-50-75-100

sorry, I don't quite understand. Can you provide an example and the output you are getting now?

cedar sun May 25, 2021, 12:01 AM

#

I don't want dct.keys() to appear on the legend
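Assuming the goal is to keep the numeric y axis but hide the category names under the bars (those are x tick labels rather than a legend), one way is to remove the x ticks from the Axes. The sketch uses plain Matplotlib so it runs without seaborn, with toy values shaped like the thread's `dct`; `sns.barplot` also returns a Matplotlib Axes, so the same call should apply there.

```python
import matplotlib.pyplot as plt

plt.switch_backend("Agg")  # headless backend for this sketch only

dct = {"string1": 80, "string2": 120}  # toy data in the same shape as the thread's dct

fig, ax = plt.subplots()
ax.bar(list(dct.keys()), list(dct.values()))
ax.set_xticks([])   # remove the category names (x tick labels) and their ticks
ax.set_xlabel("")   # make sure no x axis label is left either
```

With seaborn, passing the data as keywords, e.g. `ax = sns.barplot(x=list(dct.keys()), y=list(dct.values()))`, and then calling `ax.set_xticks([])` should give the same result.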

arctic crown May 25, 2021, 1:09 AM

#

please help

#

warped cave May 25, 2021, 1:50 AM

#

what do neat-python outputs correspond to, and how do I know what to do with them exactly?

desert oar May 25, 2021, 2:23 AM

#

sly salmon Thank you for the awesome explanation. I am getting a grasp of how the cost func...

My current intuition is that we set our cost function from the get-go, e.g. least squares. So that's how we know. Is that correct?
Yes. You choose your loss function f(y_true, y_predicted) and plug in your model for y_predicted. So if your model is linear regression, the loss function is f(y_true, ax + b).

I'm not sure I fully understand your example of gradient descent. Yes, the gradient at a specific point is the vector of partial derivatives evaluated at that specific point. You don't perform weight updates multiplicatively, you do them additively. I recommend looking at the equations, you might be surprised at how simple it is.
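A minimal sketch of the additive update for the linear-regression case: the model a*x + b is plugged into a squared-error loss, the two partial derivatives form the gradient, and each step subtracts learning_rate * gradient from the parameters. The data, learning rate and number of steps are made up for illustration.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_true = 3.0 * x + 1.0  # toy data generated with a = 3, b = 1

def loss(a, b):
    # Squared-error loss with the model a*x + b plugged in for y_predicted
    return np.mean((y_true - (a * x + b)) ** 2)

def gradient(a, b):
    residual = y_true - (a * x + b)
    d_a = -2.0 * np.mean(residual * x)  # partial derivative with respect to a
    d_b = -2.0 * np.mean(residual)      # partial derivative with respect to b
    return np.array([d_a, d_b])

params = np.array([0.0, 0.0])  # [a, b]
learning_rate = 0.05
for _ in range(2000):
    # Additive update: subtract the scaled gradient, no multiplication by the weights
    params = params - learning_rate * gradient(*params)
```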

Our server rules disallow offering or requesting payment for help, but I wouldn't accept payment anyway.

desert oar May 25, 2021, 2:25 AM

#

sly salmon also if I'm finding a cost function for a neural network, how many iterations sh...

I don't understand your question. Can you be more explicit?

severe valve May 25, 2021, 4:42 AM

#

Does anyone know of any complete beginner tutorials that introduce Keras? I really want to get into ML a bit more but every tutorial I find is absolute trash when it comes to explaining the code. For the most part, other than some deeper level ideas, I understand most ML concepts. That isn't the issue. However, understanding what I'm writing down and not just mindlessly copying it is my problem. I have no idea what the code does or how to use it without almost completely rewatching an entire tutorial. And a lot of the tutorials that I do end up finding are very vague and tend to just read commonly available materials which I have already gone through. I'd really appreciate any and all help I could receive, thank you!

autumn basin May 25, 2021, 5:13 AM

#

@severe valve the keras documentation is honestly the best for this. Tutorials tend to abstract away what is going on under the hood, and over-abstraction leads to the confusion you are talking about. The documentation is dry, but it will leave you with an accurate understanding of how to use the API.

near cosmos May 25, 2021, 5:40 AM

#

You are passing a file object, instead of a string, to tokenize_words. You need to .read or similar from the file to get the contents
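A small sketch of the fix, with a hypothetical `tokenize_words` that expects a string (the real function wasn't shown). `io.StringIO` stands in for the object returned by `open(...)`; with a real file, `with open(path) as f: text = f.read()` gives the same kind of string.

```python
import io

def tokenize_words(text):
    # Hypothetical stand-in: the real tokenize_words was not shown, but it expects a str
    return text.lower().split()

# io.StringIO behaves like the file object returned by open(); it is not a str itself
file_obj = io.StringIO("Hello world\nHello again")
text = file_obj.read()          # .read() returns the whole contents as one string
tokens = tokenize_words(text)   # ['hello', 'world', 'hello', 'again']
```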

severe valve May 25, 2021, 6:16 AM

#

autumn basin @severe valve the keras documentation is honestly the best for this. Tu...

I agree, so far it has been the best resource, although when I have a question about code in the documentation there don't seem to be many answers. But I guess I'll give it another shot before I do another exhaustive search.

gentle lion May 25, 2021, 8:29 AM

#

can anyone explain this? I'm using a CNN with keras and it prints epoch 1/10, then lags for a bit and quits

gentle lion May 25, 2021, 9:08 AM

#

basically when i add a conv layer to my network it stops working

#

any help is appreciated

subtle panther May 25, 2021, 12:13 PM

#

Can anyone tell me, please: do I need a good understanding of mathematics in order to learn ML??

desert oar May 25, 2021, 1:04 PM

#

@subtle panther you should intend to expand your math knowledge and understanding in parallel with your hands-on experience and your programming knowledge

#

So yes, eventually. But you can get started without knowing lots of math up front

#

If you already know calculus and the basics of linear algebra (vector/matrix math and the interpretation thereof as systems of linear equations) it will help

distant phoenix May 25, 2021, 1:13 PM

#

Hello guys!
Could you help me create a script?

Note that I'm using Selenium WebDriver and now I need to get all these values (red square) and put them in a list, for example:

[5-2021, 4-2021, 3-2021, etc...]

If I could get its values in a list, I could build a big web scraper.

#

I can already click on each field, but I want to put it in a loop, where the RPA selects the first value, clicks, and gets all the table values; then it clicks on the second one and gets all the table information

lapis sequoia May 25, 2021, 1:15 PM

#

distant phoenix Hello guys!1 Could you help me to create some script? Note that I'm using 'Sele...

Use BeautifulSoup's find_all method and iterate over the rows with a for loop

distant phoenix May 25, 2021, 1:16 PM

#

I'll search for it and learn how to use. Thank you a lot!!

lapis sequoia May 25, 2021, 1:17 PM

#

You’re welcome 🙂

red hound May 25, 2021, 1:32 PM

#

Is there a list somewhere that shows all the deprecated TensorFlow v1 functions and their equivalents?

subtle panther May 25, 2021, 1:52 PM

#

@desert oar thanks, I understand what I need to do now

distant phoenix May 25, 2021, 3:31 PM

#

lapis sequoia You’re welcome 🙂

I've imported this library as you suggested, and I've tried many ways to extract the information:

'div', class_="form-control custom-select"
'div', class_="drop-meslme"
'div', class_="col-12 col-md-5 col-lg-4"
'select', name_="meslme"
etc...

I can't find any of the fields in the red square either.

#

https://shockmetais.com.br/

This is the site I'm using


#

and the values are in the "combobox", as in the following picture
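A sketch of the `find_all` approach on a hypothetical copy of the combobox markup (the `select` tag, the `name="meslme"` attribute and the option values are assumptions based on the attempts above; the real page may differ). `name` is the first parameter of `find`/`find_all` (the tag name), so an HTML `name` attribute has to be matched through `attrs`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for the month combobox; the real page's tags may differ
html = """
<select name="meslme" class="form-control custom-select">
  <option value="5-2021">5-2021</option>
  <option value="4-2021">4-2021</option>
  <option value="3-2021">3-2021</option>
</select>
"""

soup = BeautifulSoup(html, "html.parser")
# Match the HTML name attribute via attrs, since `name` means the tag name here
combo = soup.find("select", attrs={"name": "meslme"})
months = [option.get("value") for option in combo.find_all("option")]
```

If the options are filled in by JavaScript, HTML fetched without a browser may not contain them; in that case the page source from Selenium (`driver.page_source`) can be passed to BeautifulSoup instead, and the resulting list can drive the click-and-scrape loop.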

woven kayak May 25, 2021, 4:43 PM

#

Hi, does anyone know how to use an audio output as a signal generator in Google Colab?
I'm trying to generate a signal v = E*sin(wt + phi) + Vbias, but the function IPython.display.Audio kills my bias.
Thanks in advance!

torpid ember May 25, 2021, 5:02 PM

#

hey, when building analytical models, what does someone mean when they ask "have you done the business rules for this model"?

#

someone at work is asking me and im scared to answer because idk

#

does he mean documentation

#

TELL ME YOU HAVE IMPOSTER SYNDROME WITHOUT TELLING ME YOU HAVE IMPOSTER SYNDROME

grave frost May 25, 2021, 5:16 PM

#

woven kayak Hi, someone knows how use an audio output as signal generator in google colab? I...

have you tried saving it with librosa and reading it back with librosa?

lapis sequoia May 25, 2021, 5:25 PM

#

I want to make a project analyzing programming language popularity by developer type based on the data contained in the Stack Overflow 2020 Developer survey.

I thought about creating a separate DataFrame for each dev type, then calculating a percentage for each language each dev type said they worked with, but it sounds like too much work for something that surely has a simpler solution.

#

any ideas?
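Assuming the multi-select answers are stored as `;`-separated strings (check the column format in your copy of the data), one approach that avoids a DataFrame per dev type is to split both columns, `explode` them into one row per (dev type, language) pair, and divide the counts by the number of respondents per dev type. The column names and the three toy respondents are placeholders.

```python
import pandas as pd

# Toy stand-in for the survey: one row per respondent, multi-select answers as ';'-separated strings
survey = pd.DataFrame({
    "DevType": ["Developer, back-end;Data scientist", "Data scientist", "Developer, back-end"],
    "LanguageWorkedWith": ["Python;SQL", "Python;R", "Java;SQL"],
})

split = survey.assign(
    DevType=survey["DevType"].str.split(";"),
    Language=survey["LanguageWorkedWith"].str.split(";"),
)

# Respondents per dev type (one respondent can count toward several types)
per_type = split.explode("DevType").groupby("DevType").size()

# One row per (respondent, dev type, language) combination
pairs = split.explode("DevType", ignore_index=True).explode("Language")
counts = pairs.groupby(["DevType", "Language"]).size()

# Share of each dev type's respondents who use each language
share = counts / per_type.reindex(counts.index.get_level_values("DevType")).to_numpy()
table = share.unstack(fill_value=0)  # rows: dev types, columns: languages
```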

arctic ice May 25, 2021, 5:35 PM

#

how can I use OpenCV to scan what the camera sees and then give a 3D diagram of that to the computer?

agile sinew May 25, 2021, 5:45 PM

#

I'm streaming some time-series data from Kafka using Spark Structured Streaming with a 10-second interval, but sometimes the streamed data doesn't cover the full 10 seconds. Any solution?

#

thanks in advance

desert oar May 25, 2021, 6:27 PM

#

lapis sequoia I want to make a project analyzing programming language popularity by developer ...

it depends entirely on the format of the data

lapis sequoia May 25, 2021, 6:29 PM

#

Hello guys, I have a question: how do I calculate the effect size for a chi-square test? (The outputs I should get are phi and Cramér's V.)
Thanks for the help
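A NumPy sketch of the usual formulas, assuming a contingency table of raw counts: phi = sqrt(chi2 / n), meaningful for 2x2 tables, and Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1))). The statistic here is Pearson's chi-square without Yates' continuity correction, which should match `scipy.stats.chi2_contingency(table, correction=False)`. The example table is made up.

```python
import numpy as np

def chi_square_effect_sizes(table):
    """Return (chi2, phi, cramers_v) for a contingency table of counts."""
    observed = np.asarray(table, dtype=float)
    n = observed.sum()
    # Expected counts under independence: row total * column total / grand total
    expected = observed.sum(axis=1, keepdims=True) * observed.sum(axis=0, keepdims=True) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()  # Pearson statistic, no Yates correction
    phi = np.sqrt(chi2 / n)
    k = min(observed.shape) - 1
    cramers_v = np.sqrt(chi2 / (n * k))
    return chi2, phi, cramers_v

chi2, phi, v = chi_square_effect_sizes([[10, 20], [30, 40]])
```

For a 2x2 table, k = 1, so phi and Cramér's V coincide; for larger tables, report Cramér's V.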

cedar sun May 25, 2021, 9:29 PM

#

what is the nn that codes for u?

snow cliff May 25, 2021, 9:54 PM

#

hi guys, can anyone help me out, please? I'm sort of really stuck. I'm trying to draw a sphere in Processing, but any time I use P3D, the display is just grey and blank. Does anyone know why?

grave frost May 25, 2021, 9:58 PM

#

cedar sun what is the nn that codes for u?

GPT-3

covert adder May 25, 2021, 10:11 PM

#

Anyone available to help me with Intro to Data Analysis for Python?

median ember May 25, 2021, 10:37 PM

#

thanks, but it doesn't work with multiple items in B. I solved it, though; it was quite tricky

velvet thorn May 25, 2021, 10:43 PM

#

median ember thanks, but doesn´t work on multiple items on B, I solved it, it was quite trick...

ye it wouldn’t

#

you would need a different approach for that

#

but your original example only had one row

median ember May 25, 2021, 11:17 PM

#

velvet thorn but your original example only had one row

sorry, I made a quick example; I didn't think it would affect the result

velvet thorn May 25, 2021, 11:20 PM

#

median ember sorry, I made a quick example, didn´t think it would influence on result

in general it’s more complicated if it’s a many to many thing

#

but if you’ve solved it then gratz 🏆

desert oar May 25, 2021, 11:22 PM

#

import numpy as np
import pandas as pd

data_in = pd.DataFrame([
    {'x': 3, 'y': 2, 'z': 1, 'data': [0.1, 0.2, 0.3]},
    {'x': 2, 'y': 0, 'z': 4, 'data': [0.7, 0.8, 0.9]},
])

shape_out = (5, 5, 5, 3)
data_out = np.zeros(shape_out)
for row in data_in.itertuples():
    data_out[row.x, row.y, row.z, :] = row.data

Is there a way to do this using Numpy fancy indexing, or something otherwise vectorized, without looping + itertuples?

#

Naively I had tried

shape_out = (5, 5, 5, 3)
data_out = np.zeros(shape_out)
data_out[data_in['x'], data_in['y'], data_in['z'], :] = data_in['data']

But I got the error ValueError: setting an array element with a sequence.

velvet thorn May 25, 2021, 11:42 PM

#

desert oar Naively I had tried ```python shape_out = (5, 5, 5, 3) data_out = np.zeros(shape...

data_out[data_in['x'], data_in['y'], data_in['z']] = data_in['data'].tolist()

desert oar May 25, 2021, 11:42 PM

#

oof, really?

velvet thorn May 25, 2021, 11:42 PM

#

you get that problem because you have an array of lists

#

in data

#

data_in['data'].tolist() this converts it into a list of lists

#

which numpy will treat as an array

#

you need either an array or list of lists, not a mixture

desert oar May 25, 2021, 11:44 PM

#

huh, i figured it wouldn't matter if it was a list or an array-like thing

#

yep that worked perfectly

data_out2[data_in['x'], data_in['y'], data_in['z'], :] = data_in['data'].tolist()

velvet thorn May 25, 2021, 11:44 PM

#

it doesn't, as long as it's homogeneous

desert oar May 25, 2021, 11:44 PM

#

@fallen trellis see above

velvet thorn May 25, 2021, 11:45 PM

#

also you can leave off the final :

#

but that's not very important

desert oar May 25, 2021, 11:46 PM

#

yeah i like it for visual clarity

desert oar May 25, 2021, 11:46 PM

#

velvet thorn it doesn't, as long as it's homogeneous

what do you mean by this?

#

i.e. not dtype='O'?

velvet thorn May 25, 2021, 11:47 PM

#

desert oar what do you mean by this?

either a list of lists or an array of primitives, so, yes, I guess

#

just not an array of lists

#

because then numpy treats it as having length 1 across the relevant axis and tries to broadcast across it

#

which leads to trying to put each list into an individual slot of data_out

#

hence setting an array element with a sequence (said list)

desert oar May 25, 2021, 11:48 PM

#

yeah that makes sense, it doesn't know what the objects are in the array so it treats them all as scalars

#

good to know
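A small NumPy-only reproduction of the explanation above, using an object array of lists in place of `data_in['data']` (the Series in the thread holds the same kind of values): assigning the object array fails with a ValueError, while `.tolist()` or `np.stack` turns the rows into a homogeneous (2, 3) block that fits the indexed slots.

```python
import numpy as np

# An "array of lists": 1-D, dtype=object, each element a Python list
rows = np.empty(2, dtype=object)
rows[0] = [0.1, 0.2, 0.3]
rows[1] = [0.7, 0.8, 0.9]

xs, ys, zs = np.array([3, 2]), np.array([2, 0]), np.array([1, 4])
data_out = np.zeros((5, 5, 5, 3))

# Fails: numpy sees two opaque objects, not a (2, 3) block of floats
try:
    data_out[xs, ys, zs] = rows
    object_array_failed = False
except ValueError:
    object_array_failed = True

# Works: a list of lists is homogeneous, so numpy reads it as shape (2, 3)
data_out[xs, ys, zs] = rows.tolist()

# Equivalent: stack the lists into a real (2, 3) float array first
data_out2 = np.zeros((5, 5, 5, 3))
data_out2[xs, ys, zs] = np.stack(rows)
```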

median ember May 25, 2021, 11:58 PM

#

velvet thorn but if you’ve solved it then gratz 🏆

yeah, it's not the most efficient thing ever; do you want to see?

velvet thorn May 25, 2021, 11:59 PM

#

median ember yeah, it´s not the most efficient thing ever, do you want to see?

sure

cedar sun May 26, 2021, 12:38 AM

#

so

#

how bad is this?

#

https://gyazo.com/b2c1b7e6dd813ad815feacb51359acd1


#

this folder is supposed to have only bulbasaurs

#

but api randomly grabbed an ivysaur, the evolution

#

how many fails like this are a problem for a neural network?

#

Like, if i have 100 images, how many failures can i afford?

#

trying to avoid data cleaning :)

covert adder May 26, 2021, 12:48 AM

#

Please help!

• Generate a vector of 1000 random numbers between 0 and 100.
• Plot a histogram of these numbers with the number of bins equal to 10.
• Calculate the average of these numbers using the numpy method mean().
• Plot a red line from the mean point on the histogram plot in the y direction to show the mean location in the plot.

import numpy as np
import matplotlib.pyplot as plt
data = np.random.randint(100,size(1,1000))
print(data)
matplotlib.pyplot
plt.hist(list(data)),range=(0,100),bins=10
mean = data.mean()
print(mean)

exotic maple May 26, 2021, 12:56 AM

#

covert adder Please help! Generate a vector of 1000 random numbers between 0 to 100.•Plot a ...

In general this Discord isn't for solving homework problems :p but if you're having a problem with your code at a specific point, we might be able to help

#

what is your issue?

covert adder May 26, 2021, 12:57 AM

#

I did the homework. I just keep getting an error.

exotic maple May 26, 2021, 12:57 AM

#

please share that error

covert adder May 26, 2021, 12:58 AM

#

NameError: name 'matplotlib' is not defined

exotic maple May 26, 2021, 12:58 AM

#

uh, have you installed matplotlib?

#

you need to install the library first before using it

#

pip install matplotlib, or if you're using a conda distribution, conda install matplotlib

covert adder May 26, 2021, 1:02 AM

#

it is installed

exotic maple May 26, 2021, 1:02 AM

#

then at that point you need to make sure you're using the right PATH

#

that is, that your Python install is pointing in the right direction

#

unfortunately that's something you need to do yourself

#

sometimes rebooting works, but first you need to make sure Path is ok

covert adder May 26, 2021, 1:09 AM

#

thank you
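A likely cause of that particular NameError, independent of the installation: `import matplotlib.pyplot as plt` binds only the name `plt`, so the bare line `matplotlib.pyplot` refers to an undefined name `matplotlib`, and deleting that line removes the error. The posted code has other problems too: `size(1,1000)` calls an undefined `size` (the keyword is `size=1000`), the parentheses in the `plt.hist(...)` line close before `range` and `bins`, and `randint`'s upper bound is exclusive. A corrected sketch; the seed and the headless backend switch are only there so it runs repeatably without a display.

```python
import matplotlib.pyplot as plt
import numpy as np

plt.switch_backend("Agg")  # only for running without a display; drop it and call plt.show() to see the plot

np.random.seed(0)  # repeatable example
data = np.random.randint(0, 101, size=1000)  # upper bound is exclusive, so values are 0..100
print(data)

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(data, bins=10, range=(0, 100))
mean = data.mean()
print(mean)
ax.axvline(mean, color="red")  # vertical red line at the mean
```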

desert oar May 26, 2021, 1:20 AM

#

cedar sun Like, if i have 100 images, how many failures can i afford?

there is no formula that can tell you this. there are many factors involved: how similar is the mislabeled record to the correctly labeled records? how much variation is there in the correctly labeled records? what % of records are mislabeled?

#

there are some techniques that are specifically designed to adjust for mislabeled data, e.g. Gold Loss Correction which i've used with some success in the past

#

this also isn't specific to neural networks. the same reasoning applies for pretty much all statistics and machine learning

cedar sun May 26, 2021, 1:22 AM

#

mmmm

#

just in case you know

#

if instead of searching "bulbasaur"

#

i search "*bulbasaur*"

#

with *

#

will I increase my occurrences of bulbasaur?

#

like in a normal user search, where * means only that

#

or thats what i think

desert oar May 26, 2021, 1:23 AM

#

i have no idea, it depends on what exactly you're searching

cedar sun May 26, 2021, 1:25 AM

#

nah, nvm, dont worry about it

#

i will search for a few images and take a quick look, and if i see many misplaced images

#

i will look for that loss u mentioned above

tidal bough May 26, 2021, 1:52 AM

#

just at the screenshot you provided, there's also a picture with a faraway view of a street, and one with 3 pokemon

#

so a different evolution that looks quite similar isn't a big problem comparatively

cedar sun May 26, 2021, 1:59 AM

#

so

#

the street is a bigger problem?

#

#

lmao

#

#

This is the street image

#

there is a bulbasaur technically

velvet thorn May 26, 2021, 2:36 AM

#

hm I think that can be improved

#

but I'd need to have a more complete example

#

and if what you have works for your purposes then might as well go with it

median ember May 26, 2021, 2:36 AM

#

velvet thorn hm I think that can be improved

yeah, I thought that, maybe I can use outer with axis=X

median ember May 26, 2021, 2:37 AM

#

velvet thorn and if what you have works for your purposes then might as well go with it

it's working; I'm still new to numpy, I will try to make it more efficient later

#

jesus, I sent you the wrong code

#

oh no, it was the right one

#

sorry

severe valve May 26, 2021, 2:56 AM

#

anyone ever feel like they've hit a wall when it comes to learning ML/NN? I really want to learn a lot about these fields so I can apply them to a future job in medical research, but sometimes I just can't. It gets so boring and dull. Everything feels too complex and involves so much math, but I feel like if I don't learn ML/NN, I won't be sought after job-wise. Data analytics, visualization, etc. only take you so far. Even if I try to push through, all I get is a bunch of information that I can't apply, which leads me to rewatch the videos and get stuck in an endless cycle.

desert oar May 26, 2021, 3:03 AM

#

My advice: ditch the videos, use textbooks, spend some time learning the math.

#

With a good textbook, doing some exercises at the end of each chapter can be very important for learning.

#

At the same time, just start messing with data.

#

Do a bit of math, then forget all about math and just make some pretty plots, or fit some models.

crisp ruin May 26, 2021, 3:31 AM

#

sick name bro

severe valve May 26, 2021, 3:32 AM

#

so I've basically just been doing that last part; I've messed around with a lot of models. But I've had absolutely zero idea what a model does other than at a high level (e.g. CNN ~ image classifier, linear regression ~ linear problem, etc.). So far that hasn't gotten me very far, and when I get errors or don't understand why my model is performing so badly, I just have to stop because I have zero understanding of the subject.

#

and then when I go to actually learn it just becomes more and more difficult.

#

But I'll try and find some textbooks if I can and try doing the math

merry ridge May 26, 2021, 3:34 AM

#

I have a friend who easily landed a summer job, which eventually turned into a part-time and then a full-time position, just by blind-applying to the clinical research unit at my university with only a bachelor's and no experience in medicine. It really helped her learn machine learning in a more meaningful way, but the math was unavoidable: her first month there was just reading a textbook on Markov chains.

severe valve May 26, 2021, 3:37 AM

#

exactly my point. I'd really love to apply ML (I learn best through application; I initially struggled with this in other programming languages before I found Python), but even when I try to apply it, it all just breaks down in front of me. But I guess the concepts behind ML are the most important for now, so I'll definitely go look into ML textbooks. Thank you everyone for your advice and time. :)

merry ridge May 26, 2021, 3:42 AM

#

I think this is a commonly mentioned book: I read Mathematics for Machine Learning and found it a very pleasant read that covered a good breadth of material. I probably wouldn't use it until you've had at least a first course in calculus and linear algebra, though.

vapid patrol May 26, 2021, 3:52 AM

#

i am currently reading that book too, it's a free book

hard hound May 26, 2021, 4:02 AM

#

There's also a great book on ML, Deep Learning, by Ian Goodfellow, Yoshua Bengio and Aaron Courville

lapis sequoia May 26, 2021, 4:06 AM

#

distant phoenix I've import this lab as you suggested, but I've tried many ways to extract the i...

Sorry man, I was offline, did you solve the problem? I’d suggest using XPath to specify the location of the red box

#

//*[@name="meslab"] like this

zealous tulip May 26, 2021, 7:43 AM

#

silver widget May 26, 2021, 8:09 AM

#

hi guys. got a question about the Kaggle House Prices data; I've been studying other solutions to improve my code and perspective, and I saw something like this:

Getting the correlation of all the features with the target variable.

(train.corr()**2)["SalePrice"].sort_values(ascending = False)[1:]

what is the reason for using train.corr()**2?

#

oops, sorry, the double * makes the code bold

#

pow(train.corr(),2) is better to write here

#

oh, silly me... got it, thanks anyway guys
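Squaring the correlation throws away the sign, so a feature with r = -0.94 ranks as high as one with r = +0.94. For a single predictor, r² is also the R² of a simple linear fit. A small sketch on made-up numbers (the column names are only illustrative). It uses `.drop("SalePrice")` instead of `[1:]`, because `[1:]` only works if SalePrice, with r² = 1 against itself, sorts first, and ties can break that:

```python
import pandas as pd

# Made-up numbers standing in for the Kaggle training set
train = pd.DataFrame({
    "GrLivArea": [1000, 1600, 1900, 2600, 3000],
    "Age":       [55, 35, 40, 15, 10],
    "Noise":     [3, 1, 4, 1, 5],
    "SalePrice": [100, 150, 200, 250, 300],
})

# Correlation of every feature with the target, excluding the target itself
r = train.corr()["SalePrice"].drop("SalePrice")
r_squared = (r ** 2).sort_values(ascending=False)

print(r.round(3))          # Age is strongly negative...
print(r_squared.round(3))  # ...but ranks near the top once squared
```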

sly salmon May 26, 2021, 8:46 AM

#

For neural networks, how do you get the partial derivative of the cost function?

For TensorFlow models, is this hardcoded depending on the cost function you choose, like MSE?
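Frameworks don't hardcode one derivative per loss. TensorFlow records the forward computation (for example inside `tf.GradientTape`) and applies the chain rule through each operation; this is automatic differentiation. The sketch below does the same chain rule by hand for MSE on a one-weight linear model and checks it against finite differences. It uses NumPy and is not TensorFlow's implementation:

```python
import numpy as np

def mse(w, b, x, y):
    return np.mean((w * x + b - y) ** 2)

def mse_grad(w, b, x, y):
    # Chain rule: L = mean(r**2) with r = w*x + b - y
    # dL/dr = 2r/n, dr/dw = x, dr/db = 1
    r = w * x + b - y
    return np.mean(2 * r * x), np.mean(2 * r)

def numerical_grad(w, b, x, y, eps=1e-6):
    # Central differences, used only to check the hand-derived gradient
    dw = (mse(w + eps, b, x, y) - mse(w - eps, b, x, y)) / (2 * eps)
    db = (mse(w, b + eps, x, y) - mse(w, b - eps, x, y)) / (2 * eps)
    return dw, db

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])
analytic = mse_grad(0.5, 0.0, x, y)
numeric = numerical_grad(0.5, 0.0, x, y)
print(analytic, numeric)  # both approximately (-13.5, -6.5)
```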

polar stag May 26, 2021, 9:04 AM

#

is the IBM Data Science certificate on Coursera good? Or can you people recommend a good one? For context, I'm new to data science.

distant phoenix May 26, 2021, 10:07 AM

#

lapis sequoia Sorry man, I was offline, did you solve the problem? I’d suggest using xpath to ...

Hello man!! Don't worry about it.
Is it possible to use XPath with BeautifulSoup? I couldn't find a way yet
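As far as I know, BeautifulSoup itself doesn't evaluate XPath, while lxml does (its elements have an `.xpath()` method). If the markup is well-formed, the standard library's `xml.etree.ElementTree` supports a limited XPath subset that includes attribute predicates. Here is a sketch with made-up markup, since the real page isn't shown. Note that ElementTree rejects an absolute `//...` path on an element, so the search starts with `.//`:

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed markup; real-world HTML often isn't, which is where lxml helps
markup = """
<html>
  <body>
    <div name="header">Title</div>
    <div name="meslab"><span>42</span></div>
  </body>
</html>
""".strip()

root = ET.fromstring(markup)
# Relative search for any element whose name attribute is "meslab"
matches = root.findall(".//*[@name='meslab']")
for el in matches:
    print(el.tag, el.find("span").text)  # div 42
```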

fallen trellis May 26, 2021, 11:27 AM

#

desert oar <@!521989840582213643> see above

Interesting, however I'm not sure this works for my case as the indices of the data are hidden in attributes, e.g., rows[0].idx.y . Or do you see a way, still?

desert oar May 26, 2021, 11:30 AM

#

fallen trellis Interesting, however I'm not sure this works for my case as the indices of the d...

There's no way to get the attributes in some vectorized form?

fallen trellis May 26, 2021, 11:31 AM

#

Unless you iterate over the entire dataset, no

desert oar May 26, 2021, 11:31 AM

#

Ah, then no

fallen trellis May 26, 2021, 11:31 AM

#

Even if I did, how would NumPy handle loading the 5 GB+ dataset?

cedar sun May 26, 2021, 12:28 PM

#

the problem isn't NumPy

#

the problem is ur RAM xd

#

u can read from a buffer i believe

#

Let's say, the first 256 MB of the dataset, then the next 256 MB, and so on
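Reading a raw binary file one chunk at a time keeps memory use bounded, because only one chunk is in RAM at once. The sketch below assumes the data is a flat file of float64 values. The file layout and chunk size are assumptions, and `chunked_mean` is a hypothetical helper:

```python
import os
import tempfile
import numpy as np

def chunked_mean(path, dtype=np.float64, chunk_bytes=256 * 1024 * 1024):
    """Mean of a raw binary file of `dtype` values, read chunk_bytes at a time."""
    itemsize = np.dtype(dtype).itemsize
    # Round down to whole elements so no value is split across two chunks
    chunk_bytes = max(itemsize, chunk_bytes - chunk_bytes % itemsize)
    total, count = 0.0, 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk_bytes)
            if not buf:
                break
            arr = np.frombuffer(buf, dtype=dtype)
            total += float(arr.sum())
            count += arr.size
    return total / count

# Small demo: 1000 values read in 100-value (800-byte) chunks
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "data.bin")
    np.arange(1000, dtype=np.float64).tofile(path)
    result = chunked_mean(path, chunk_bytes=800)
print(result)  # 499.5
```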

noble drum May 26, 2021, 1:34 PM

#

the larger the data you have to buffer, the more you should consider Dask

digital aurora May 26, 2021, 1:37 PM

#

R u into software engineering?

noble drum May 26, 2021, 1:37 PM

#

I only dabble.

digital aurora May 26, 2021, 1:38 PM

#

I see!

light merlin May 26, 2021, 1:52 PM

#

Where would be the best place to learn neural networks (preferably in python) for something like facial recognition?

velvet thorn May 26, 2021, 1:54 PM

#

fallen trellis Even if, how would numpy handle loading the 5gig+ dataset?

5 GB is not that big

#

also, memory mapping
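With memory mapping, NumPy treats a file on disk as an array, and data is paged in from disk as it is accessed instead of being loaded up front. Below is a minimal `np.memmap` sketch. The file name, dtype and shape are made up; for a real dataset you would need to know its on-disk layout:

```python
import os
import tempfile
import numpy as np

def column_mean_via_memmap(path, shape=(1000, 10)):
    # Write a file-backed array (stands in for an existing large file)
    mm = np.memmap(path, dtype=np.float32, mode="w+", shape=shape)
    mm[:] = np.arange(mm.size, dtype=np.float32).reshape(shape)
    mm.flush()
    del mm

    # Re-open read-only; pages are read from disk on access, not all at once
    ro = np.memmap(path, dtype=np.float32, mode="r", shape=shape)
    result = float(ro[:, 0].mean())
    del ro  # release the mapping before the file is removed
    return result

with tempfile.TemporaryDirectory() as d:
    col0_mean = column_mean_via_memmap(os.path.join(d, "big.dat"))
print(col0_mean)
```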

grave frost May 26, 2021, 1:58 PM

#

5 GB is not that big
cries in 2 GB of memory

sly salmon May 26, 2021, 3:26 PM

#

Hey guys, gradient descent Q

I read this:

A larger learning rate leads to a faster learning process at a cost to be stuck in a suboptimal solution (local minimum). A smaller learning rate might produce a good suboptimal or global solution, but it will take it much longer to converge. In the extremes, a learning rate too large will lead to an unstable learning process oscillating over the epochs. A learning rate too small may not converge or get stuck in a local minimum.

I don't get it.
A larger learning rate may mean that you miss the global minimum and end up somewhere else, but why does it mean you are stuck?
while with a tiny learning rate, won't you most definitely be stuck in the first local minimum you get into?

velvet thorn May 26, 2021, 3:29 PM

#

sly salmon Hey guys, gradient descent Q I read this: A larger learning rate leads to a fa...

second question, yes

#

first question, possibly, depends

#

you might diverge

sly salmon May 26, 2021, 3:30 PM

#

diverge? as in, miss the global minimum?

velvet thorn May 26, 2021, 3:30 PM

#

no

#

diverge meaning increase without bound

sly salmon May 26, 2021, 3:32 PM

#

velvet thorn diverge meaning increase without bound

so... your loss is going to increase without bound? so you will never reach a minimum?

velvet thorn May 26, 2021, 3:32 PM

#

yes

sly salmon May 26, 2021, 3:34 PM

#

hmm 🤔 why would that be, wouldn't you always go down the path of the negative gradient, thus moving towards a lower loss all the time?
could you give a possible scenario for this? My idea would be a function like y=x^2, where if you have a large learning rate you always overshoot the minimum, but I don't think that explains the loss increasing indefinitely

tidal bough May 26, 2021, 3:39 PM

#

sly salmon hmm 🤔 why would that be, wouldn't you always go down the path of the negative g...

that's precisely the example

#

if you have a large enough learning rate, you jump from x to -x -a for some positive a

#

there the slope is higher, so next you jump to x + a + b where b>a...

#

and continue bouncing off the walls of the parabola, getting further and further away from the minimum
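For f(x) = x², a gradient step is x ← x - η·2x = (1 - 2η)x, so every step multiplies x by (1 - 2η). For 0 < η < 1 the magnitude shrinks (with sign flips once η > 0.5). For η > 1 it grows, which gives the outward bouncing described above. A quick sketch:

```python
def gd_on_parabola(x0, lr, steps):
    """Plain gradient descent on f(x) = x**2; returns every iterate."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        grad = 2 * x          # f'(x) for f(x) = x**2
        x = x - lr * grad     # equivalent to x * (1 - 2 * lr)
        xs.append(x)
    return xs

converging = gd_on_parabola(1.0, lr=0.1, steps=5)   # factor 0.8 per step
diverging = gd_on_parabola(1.0, lr=1.1, steps=5)    # factor -1.2 per step
print([round(v, 4) for v in converging])  # [1.0, 0.8, 0.64, 0.512, 0.4096, 0.3277]
print([round(v, 4) for v in diverging])   # [1.0, -1.2, 1.44, -1.728, 2.0736, -2.4883]
```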

velvet thorn May 26, 2021, 3:40 PM

#

tidal bough and continue bouncing off the walls of the parabola, getting further and further...

whee

#

bouncy

#

...sorry I'll keep quiet now

sly salmon May 26, 2021, 3:41 PM

#

tidal bough if you have a large enough learning rate, you jump from `x` to `-x -a` for some ...

ooh, thanks. I didn't think of it like that

#

so in gradient descent, how do you actually know that you've reached a minimum? Will the components of the gradient vector all be 0 (so the gradient along each axis is 0, and thus it's a minimum)?

tidal bough May 26, 2021, 3:45 PM

#

Well, yes, that is basically the definition of a minimum, though a simpler way is just checking that you haven't moved much this step
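Both checks fit in one loop: stop when the gradient norm or the step length drops below a tolerance. A zero gradient only means a stationary point, which is usually, but not always, a minimum. The sketch uses a made-up two-parameter bowl, f(x, y) = (x - 3)² + 2(y + 1)², whose minimum is at (3, -1). The tolerances are arbitrary choices:

```python
import math

def grad(p):
    x, y = p
    return (2 * (x - 3), 4 * (y + 1))   # gradient of (x-3)**2 + 2*(y+1)**2

def gradient_descent(p0, lr=0.1, grad_tol=1e-8, step_tol=1e-10, max_iter=10_000):
    p = p0
    for i in range(max_iter):
        g = grad(p)
        if math.hypot(*g) < grad_tol:            # gradient (almost) zero
            return p, i
        new_p = (p[0] - lr * g[0], p[1] - lr * g[1])
        if math.hypot(new_p[0] - p[0], new_p[1] - p[1]) < step_tol:  # barely moved
            return new_p, i + 1
        p = new_p
    return p, max_iter

point, iterations = gradient_descent((0.0, 0.0))
print(point, iterations)
```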

sly salmon May 26, 2021, 3:55 PM

#

ok gotcha. I also didn't fully understand why stochastic gradient descent is less susceptible to getting stuck in a local minimum compared to batch gradient descent.

iirc, the formula for updating our weights is proportional to our losses, so:
new_weights = old_weights + learning_rate*(negative_gradient_vector * loss)

If stochastic gradient descent is less likely to get stuck in a minimum, that means that the loss has to be greater? But why is that the case? Surely, if you take the loss of one of your predictions (instead of your whole dataset), you are not guaranteed to have a greater loss, so I would think it's unfair to say it's less likely to get stuck in a local minimum.

Maybe you get lucky and SGD chooses a random point with a loss that is greater than your whole dataset's loss. Then I can see why it's less susceptible to getting stuck. But still, it's a bit "random" and is a chance. Is this why people say that?

tidal bough May 26, 2021, 3:56 PM

#

I'm not sure about this, but it might simply be that since it's nondeterministic, it'll eventually luck into a path out of a local minimum, unlike deterministic methods, which are definitely stuck

desert oar May 26, 2021, 3:57 PM

#

what do you mean by "stuck in a minimum"?

#

you want to get stuck in a minimum, that's the whole point of doing gradient descent

sly salmon May 26, 2021, 3:57 PM

#

i mean, a local minimum which may not be the global minimum

desert oar May 26, 2021, 3:58 PM

#

that's different from "not converging", which is what reptile was talking about (and what you were asking about) with batch gradient descent

#

gradient descent only ever finds local minima

grave frost May 26, 2021, 3:59 PM

#

I don't know, but a lot of things in ML are not theoretically backed - it's just found that in practice x works better than y and so on

sly salmon May 26, 2021, 3:59 PM

#

oh really?

#

I was just trying to think about how we can get ourselves out of a local minimum and continue to a global minimum

#

so, when people say "stochastic gradient descent is less susceptible to getting stuck in a local minimum", what does that mean?

desert oar May 26, 2021, 4:01 PM

#

sly salmon I was just trying to think about how we can get ourselves out of a local minima ...

gradient descent alone can't do that, to my knowledge

grave frost May 26, 2021, 4:02 PM

#

just curious, then how do we do that? Adam and such?

desert oar May 26, 2021, 4:02 PM

#

sly salmon so, when people say "stochastic gradient descent is less susceptible to getting ...

i think this is because it bounces around more

#

https://stats.stackexchange.com/a/144631/36229

Cross Validated

How can stochastic gradient descent avoid the problem of a local mi...

I know that stochastic gradient descent has random behavior, but I don't know why.
Is there any explanation about this?

#

so the idea is that it's less likely to get stuck in a small local minimum because it might just skip over it

#

that's my understanding, at least

#

but ultimately it's still finding a local minimum, there's no guarantee (that i know of) that it's a global minimum
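About the update formula quoted earlier: the usual update does not multiply the gradient by the loss. The loss enters only through its gradient, w ← w - η∇L(w). Batch and stochastic gradient descent differ only in which examples that gradient is computed on: all of them at once, or one at a time, which is where the extra "bouncing" comes from. A NumPy sketch on synthetic linear-regression data (the data, learning rates and epoch counts are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

def grad_mse(w, Xb, yb):
    # Gradient of mean((Xb @ w - yb)**2) with respect to w
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

def batch_gd(lr=0.1, epochs=200):
    w = np.zeros(3)
    for _ in range(epochs):
        w -= lr * grad_mse(w, X, y)            # one step per epoch, on all examples
    return w

def sgd(lr=0.01, epochs=20):
    w = np.zeros(3)
    for _ in range(epochs):
        for i in rng.permutation(len(y)):      # one noisy step per example
            w -= lr * grad_mse(w, X[i:i + 1], y[i:i + 1])
    return w

w_batch = batch_gd()
w_sgd = sgd()
print(w_batch, w_sgd)
```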

sly salmon May 26, 2021, 4:04 PM

#

ah I see, so just due to the random nature of SGD, it can randomly pick a prediction which has a high loss that makes you skip over, say, a local minimum, and you might then get to the global minimum

#

but for batch, stochastic and mini-batch, they essentially all just converge at the first local minimum they find (most of the time)

#

so yeah, as @grave frost asked, what would you use then to find the global minimum?

#

what if the neural network re-runs with different initialized weights, multiple times, to try to find the global minimum

desert oar May 26, 2021, 4:07 PM

#

sly salmon what if the neural network re-runs with different initialized weights, multiple ...

people do in fact do this

#

you won't ever know that it's global
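Random restarts are a heuristic: run gradient descent from several starting points and keep the lowest loss found. The method can only compare the minima it happens to reach, so it proves nothing about the global minimum. A sketch on a made-up 1-D function, f(x) = x⁴ - 3x² + x, which has two local minima:

```python
def f(x):
    return x**4 - 3 * x**2 + x

def f_prime(x):
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * f_prime(x)
    return x

starts = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
ends = [descend(x0) for x0 in starts]
best = min(ends, key=f)                      # lowest loss among the minima reached
print(sorted({round(x, 3) for x in ends}))   # the distinct minima that were reached
print(best, f(best))
```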

sly salmon May 26, 2021, 4:08 PM

#

yeah, I was thinking that, so there's essentially no way to know if it's a global minimum?

#

or maybe you can differentiate the cost function and find each turning point, then you'd have an idea of which areas to check and one of them will be a global minimum

desert oar May 26, 2021, 4:09 PM

#

you can't ever know. you can compare loss values at 2 different local minima, but that's it

grave frost May 26, 2021, 4:10 PM

#

but if that is indeed the case, then why is it that changing the seed of the model does not make much of a difference in accuracy? Does this imply that 9 times out of 10 a model does find the global minimum?

sly salmon May 26, 2021, 4:10 PM

#

sly salmon or maybe you can differentiate the cost function and find each turning point, th...

would this not be a viable method to check all possible minima?

desert oar May 26, 2021, 4:11 PM

#

yeah, because realistically there aren't that many minima, or different initializations don't have that much of an effect on which minimum is chosen

grave frost May 26, 2021, 4:12 PM

#

so even if different initializations get stuck in a local minimum, then what? Do I use something different?

desert oar May 26, 2021, 4:12 PM

#

but you don't and can't know that they are local, non-global minima

grave frost May 26, 2021, 4:12 PM

#

desert oar but you don't and can't know that they are local, non-global minima

an accuracy jump/drastic loss decrease

#

would tell me I was in a local minimum, wouldn't it?

desert oar May 26, 2021, 4:14 PM

#

https://papers.nips.cc/paper/2018/file/a41b3bb3e6b050b6c9067c67f663b915-Paper.pdf
https://www.cwi.nl/events/cwi-scientific-meetings/ml.pdf
https://arxiv.org/abs/1704.08045
https://deepai.org/publication/understanding-the-loss-surface-of-neural-networks-for-binary-classification
there's some interesting and nontrivial research being done on this topic btw

sly salmon May 26, 2021, 4:19 PM

#

so why can't you just differentiate the cost function to find all the minima, then compare the losses between them? Because some cost functions can have an infinite number of minima?

desert oar May 26, 2021, 4:21 PM

#

the derivative of the cost function is the gradient

#

gradient descent is how we attempt to find a minimum

sly salmon May 26, 2021, 4:22 PM

#

hmm, so the cost function is not in the form (an example) y = f(x)?

desert oar May 26, 2021, 4:22 PM

#

huh?

#

back up

#

what do we do in order to fit a model

#

define a loss function
minimize the loss function

#

right?

sly salmon May 26, 2021, 4:23 PM

#

yes

desert oar May 26, 2021, 4:24 PM

#

so how do you propose to find all minima?

grave frost May 26, 2021, 4:24 PM

#

desert oar so how do you propose to find _all_ minima?

isn't that just the argmin?

desert oar May 26, 2021, 4:25 PM

#

grave frost isn't that just the argmin?

yes, but how do you find it? with gradient descent.

grave frost May 26, 2021, 4:25 PM

#

desert oar yes, but how do you find it? with gradient descent.

why can't we brute-force? surely it wouldn't be that slow

desert oar May 26, 2021, 4:25 PM

#

brute force how? re-initialize at 1000 different points and re-run gradient descent for each one?

sly salmon May 26, 2021, 4:26 PM

#

well... I might just be spouting rubbish... but, if you had the cost function y = f(x),
can't you differentiate it to get the gradient of each axis?

I guess then you would have to plug numbers into each derivative so that all the derivatives equal 0, and that would find you the minima

desert oar May 26, 2021, 4:26 PM

#

can't you differentiate it to get the gradient of each axis?
the gradient is the vector of partial derivatives

sly salmon May 26, 2021, 4:26 PM

#

yes

grave frost May 26, 2021, 4:27 PM

#

desert oar brute force how? re-initialize at 1000 different points and re-run gradient desc...

if the objective is to get the global minimum of the loss function, then surely the lowest value is the minimum?

desert oar May 26, 2021, 4:27 PM

#

grave frost if the objective is to get the global minima of the loss function, then surely t...

yeah, sure. might be interesting as an academic exercise, but probably a total waste of time otherwise.

cedar sun May 26, 2021, 4:28 PM

#

how good an idea is it to download a model that seems to work, download some random images (because the dataset used to train that model is gone), use that model to clean the data I downloaded, and then use this cleaned data to train a model for better results?

desert oar May 26, 2021, 4:28 PM

#

realistically models probably don't have that many minima

sly salmon May 26, 2021, 4:28 PM

#

grave frost if the objective is to get the global minima of the loss function, then surely t...

but what I think they're trying to say is that you don't explicitly know if it's the global minimum, or just another local minimum

sly salmon May 26, 2021, 4:28 PM

#

desert oar > can't you differentiate it to get the gradient of each axis? the gradient is t...

so the gradient is a vector of partial derivatives; can we not set each derivative equal to zero?

grave frost May 26, 2021, 4:29 PM

#

sly salmon but what I think they're trying to say is that you don't explicitly know if it's...

but... the lowest value would be the global minimum

sly salmon May 26, 2021, 4:29 PM

#

grave frost but...the lowest value *would* be the global minima

if we brute force everything, yeah I agree

desert oar May 26, 2021, 4:29 PM

#

grave frost but...the lowest value *would* be the global minima

yes, assuming you have in fact enumerated all local minima

#

(which i am not sure is even possible)

grave frost May 26, 2021, 4:29 PM

#

then? wouldn't brute forcing be faster?

desert oar May 26, 2021, 4:30 PM

#

brute forcing how? computing the derivative at "every" point?

grave frost May 26, 2021, 4:30 PM

#

desert oar brute forcing how? computing the derivative at "every" point?

some mathematical technique to single out potential candidate points first?

velvet thorn May 26, 2021, 4:30 PM

#

you’re basically talking about a grid search over the whole parameter space

#

computationally intractable
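The arithmetic behind "intractable" is simple: a grid with k values per parameter over d parameters has k^d points. Even a coarse, illustrative grid blows up right away, and d = 100 is tiny for a neural network, which commonly has thousands to millions of weights:

```latex
N_{\text{grid}} = k^{d}, \qquad
k = 10,\; d = 100 \;\Rightarrow\; N_{\text{grid}} = 10^{100}
```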

desert oar May 26, 2021, 4:31 PM

#

grave frost some mathematical technique to single out potential candidate points first?

go invent one and publish it, i'm not aware of any (other than the various neural network initialization techniques that are currently known)

grave frost May 26, 2021, 4:32 PM

#

hmm... have NNs been used to find faster alternatives to SGD?

sly salmon May 26, 2021, 4:32 PM

#

good talk, this community rocks 😎

desert oar May 26, 2021, 4:32 PM

#

@sly salmon derivative == 0 just means it's locally flat, could be a saddle point
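A standard example is f(x, y) = x² - y². The gradient (2x, -2y) is zero at the origin, yet the origin is a minimum along x and a maximum along y. The second-derivative (Hessian) test distinguishes the cases: all eigenvalues positive means a local minimum, all negative means a local maximum, and mixed signs mean a saddle point. A NumPy sketch:

```python
import numpy as np

def gradient(x, y):
    return np.array([2 * x, -2 * y])     # for f(x, y) = x**2 - y**2

hessian = np.array([[2.0, 0.0],
                    [0.0, -2.0]])        # constant for this quadratic

def classify(h):
    eig = np.linalg.eigvalsh(h)          # symmetric matrix -> real eigenvalues
    if np.all(eig > 0):
        return "local minimum"
    if np.all(eig < 0):
        return "local maximum"
    if np.any(eig > 0) and np.any(eig < 0):
        return "saddle point"
    return "inconclusive"                # some zero eigenvalues

print(gradient(0.0, 0.0), classify(hessian))  # zero gradient, yet a saddle point
```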

velvet thorn May 26, 2021, 4:32 PM

#

sly salmon so the gradient is a vector of partial derivatives, are we not able to equal eac...

you mean try to solve the cost function analytically?

desert oar May 26, 2021, 4:33 PM

#

and yeah i think that's what they're proposing - solve analytically for all roots of the derivative and compare the loss at each one
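For a one-variable polynomial loss, this approach works. You find every root of the derivative, keep the real ones where the second derivative is positive, and compare the loss at each. The sketch below does this with NumPy on a made-up quartic, f(x) = x⁴ - 3x² + x. Under the hood `np.roots` still computes eigenvalues numerically. A network with millions of weights has no comparable closed form:

```python
import numpy as np

# f(x) = x**4 - 3x**2 + x, so f'(x) = 4x**3 - 6x + 1 and f''(x) = 12x**2 - 6
f = np.poly1d([1, 0, -3, 1, 0])
df = f.deriv()
d2f = df.deriv()

stationary = df.roots                                  # all roots of f'(x) = 0
real_points = stationary[np.isclose(stationary.imag, 0)].real
minima = [x for x in real_points if d2f(x) > 0]        # second-derivative test
global_min = min(minima, key=f)
print(sorted(minima), global_min, f(global_min))
```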

velvet thorn May 26, 2021, 4:33 PM

#

you can’t do that because

desert oar May 26, 2021, 4:33 PM

#

i assume that's not possible

velvet thorn May 26, 2021, 4:33 PM

#

the function is overdetermined

#

like

#

okay imagine you have

sly salmon May 26, 2021, 4:33 PM

#

desert oar <@!812098613450506351> derivative == 0 just means it's locally flat, could be a ...

hmm, but can't we do that to find every minima?

velvet thorn May 26, 2021, 4:33 PM

#

3x + y = 6
x - y = -2

#

you can solve that

#

but if you have

cedar sun May 26, 2021, 4:34 PM

#

cedar sun how good is downloading a model that seems to work, download some random images ...

///<

velvet thorn May 26, 2021, 4:34 PM

#

3x + y = 6
x - y = -2
x + y = -3

#

there’s no consistent solution

#

to all those equations
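The system above can be checked numerically. The first two equations alone have the unique solution x = 1, y = 3, which violates the third (1 + 3 ≠ -3). Least squares instead finds the point that minimizes the sum of squared residuals, which is what linear regression does with one equation per data point. A NumPy sketch:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, -1.0],
              [1.0, 1.0]])
b = np.array([6.0, -2.0, -3.0])

# First two equations: a square, consistent system
xy_two = np.linalg.solve(A[:2], b[:2])          # approximately [1, 3]
print(xy_two, A[2] @ xy_two)                    # third equation gives 4, not -3

# All three: no exact solution, so minimise ||A @ v - b||**2 instead
v, residual_ss, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print(v, residual_ss)                           # v approximately [1, 0.667], residual > 0
```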

#

now remember that

#

each set of feature values and target

cedar sun May 26, 2021, 4:35 PM

#

this is

velvet thorn May 26, 2021, 4:35 PM

#

forms one such equation

cedar sun May 26, 2021, 4:35 PM

#

rouche-frobenious (?)

#

or something like that

velvet thorn May 26, 2021, 4:35 PM

#

and you often have many more data points than features

grave frost May 26, 2021, 4:35 PM

#

regrets not learning fully about SGD

velvet thorn May 26, 2021, 4:35 PM

#

think about linear regression

#

you can’t draw a line

cedar sun May 26, 2021, 4:35 PM

#

In linear algebra, the Rouché–Capelli theorem determines the number of solutions for a system of linear equations, given the rank of its augmented matrix and coefficient matrix.

velvet thorn May 26, 2021, 4:35 PM

#

that goes through all points, right?

#

same concept

sly salmon May 26, 2021, 4:36 PM

#

velvet thorn that goes through all points, right?

correct

velvet thorn May 26, 2021, 4:36 PM

#

(basically)

sly salmon May 26, 2021, 4:36 PM

#

velvet thorn 3x + y = 6 x - y = -2 x + y = -3

hmm, so for this example, since there is no consistent solution, we can't determine a minimum? But we assume that there is a minimum there?

#

so we have to find it via some exploratory technique with gradient descent?

velvet thorn May 26, 2021, 4:36 PM

#

sly salmon hmm, so for this example, since there is no consistent solution we can't determi...

BASICALLY

#

yes

#

it’s late so I won’t go into the details but

#

think about it this way

#

take a piece of cloth

#

no matter how you contort it

grave frost May 26, 2021, 4:37 PM

#

but... can't we just solve each pair of equations and average the solutions?
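Averaging the pairwise solutions does give a point, but not the one that minimizes the squared error. For the three equations 3x + y = 6, x - y = -2, x + y = -3, the pairwise solutions are (1, 3), (4.5, -7.5) and (-2.5, -0.5). Their average is (1, -5/3), with a sum of squared residuals of 49. The least-squares solution (1, 2/3) has 98/3 ≈ 32.7. A sketch that reproduces this:

```python
from itertools import combinations
import numpy as np

A = np.array([[3.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
b = np.array([6.0, -2.0, -3.0])

def sse(v):
    """Sum of squared residuals over all three equations."""
    return float(np.sum((A @ v - b) ** 2))

# Solve every pair of equations exactly, then average those solutions
pair_solutions = [np.linalg.solve(A[list(p)], b[list(p)]) for p in combinations(range(3), 2)]
averaged = np.mean(pair_solutions, axis=0)
least_squares, *_ = np.linalg.lstsq(A, b, rcond=None)

print(averaged, sse(averaged))            # approximately [1, -1.667], SSE 49
print(least_squares, sse(least_squares))  # approximately [1, 0.667], SSE 32.67
```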

sly salmon May 26, 2021, 4:37 PM

#

alright, I appreciate it. really good talk I learnt a lot today!

velvet thorn May 26, 2021, 4:37 PM

#

it must have a minimum

#

a “lowest valley”

#

it’s a physical necessity

sly salmon May 26, 2021, 4:37 PM

#

velvet thorn 3x + y = 6 x - y = -2 x + y = -3

also, the solution for this, isn't it where all of the lines meet?

velvet thorn May 26, 2021, 4:37 PM

#

sly salmon also, the solution for this, isn't it where all of the lines meet?

those are lines

desert oar May 26, 2021, 4:37 PM

#

@sly salmon https://stats.stackexchange.com/q/212619/36229

Cross Validated

Why is gradient descent required?

When we can differentiate the cost function and find parameters by solving equations obtained through partial differentiation with respect to every parameter and find out where the cost function is

velvet thorn May 26, 2021, 4:37 PM

#

they do not meet at any single point

sly salmon May 26, 2021, 4:38 PM

#

okay, how does that relate to minima?

#

or is that just an analogy

velvet thorn May 26, 2021, 4:38 PM

#

there are several points to make

#

okay let’s continue this another time?

#

bedtime for me

sly salmon May 26, 2021, 4:39 PM

#

velvet thorn okay let’s continue this another time?

yes, goodnight!

grave frost May 26, 2021, 4:39 PM

#

gn!

sly salmon May 26, 2021, 4:40 PM

#

also, what do you guys mean by solving the cost function "analytically"? I've never heard the term before

#

https://en.wikipedia.org/wiki/Mathematical_analysis

Mathematical analysis

Analysis is the branch of mathematics dealing with limits
and related theories, such as differentiation, integration, measure, infinite series, and analytic functions.These theories are usually studied in the context of real and complex numbers and functions. Analysis evolved from calculus, which involves the elementary concepts and techniques o...

#

i guess this answers that question

desert oar May 26, 2021, 4:49 PM

#

@sly salmon "analytically" means finding an exact solution by solving equations

#

i.e. "set the derivative equal to 0 and solve for x" is the analytical solution

#

as opposed to the numerical solution which doesn't require solving for the exact form
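Linear least squares offers both routes, which makes the distinction concrete. Setting the gradient of the MSE to zero gives the normal equations XᵀXw = Xᵀy, which can be solved exactly. Gradient descent approaches the same w iteratively. A NumPy sketch on made-up data:

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.column_stack([np.ones(50), rng.uniform(-1, 1, size=50)])  # intercept + one feature
y = X @ np.array([0.5, 2.0]) + 0.05 * rng.normal(size=50)

# Analytical: solve grad = 0, i.e. the normal equations X^T X w = X^T y
w_analytic = np.linalg.solve(X.T @ X, X.T @ y)

# Numerical: gradient descent on the same MSE
w_numeric = np.zeros(2)
for _ in range(5000):
    grad = 2 * X.T @ (X @ w_numeric - y) / len(y)
    w_numeric -= 0.1 * grad

print(w_analytic, w_numeric)
```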

#

"analysis" in the sense of "real analysis" is a different thing

sly salmon May 26, 2021, 4:59 PM

#

i see. and yeah, that example given before:
3x + y = 6
x - y = -2
x + y = -3

that system of simultaneous equations could essentially be replaced by all my partial derivatives, and it may be impossible to find a consistent solution. I guess I could use it as an analogy to say that "there are no consistent values where it's a minimum", so we have to take the iterative "gradient descent" approach.

#

but if I do it that way, am I essentially saying that all of my gradient vectors will never meet at one point? Thus they are never going to equal the same value (0) where there's a minimum? (I might be wrong there.)

But then the question lies...
If there is no consistent solution analytically, how can there be a solution iteratively (via gradient descent)? Or maybe the answer is just an approximation, hmmm.

limpid oak May 26, 2021, 5:02 PM

#

Hello friends

#

need some help

worldly ruin May 26, 2021, 5:03 PM

#

Does anybody know if pandas has expressed intent to port the package to ARM for M1 Macs?

somber prism May 26, 2021, 5:03 PM

#

can someone explain to me why variance in ML refers to overfitting, while in statistics it measures how much the data is spread around the mean? I'm a little bit confused 😐

limpid oak May 26, 2021, 5:03 PM

#

I'm inserting data into DB using json files

#

with os.walk(), but after some time the speed decreased

#

any solution for this?

desert oar May 26, 2021, 5:05 PM

#

somber prism can someone explain me why variance in ml referring to overfitting but in statis...

variance in ml referring to overfitting
it is not "referring to overfitting". the variance of a model is the variance of the model predictions.

#

@limpid oak you should ask this question in a help channel, and provide the code that you are using

#

or #databases

limpid oak May 26, 2021, 5:06 PM

#

thank you @desert oar

somber prism May 26, 2021, 5:06 PM

#

desert oar > variance in ml referring to overfitting it is not "referring to overfitting". ...

so meaning how far the predicted value will be from the actual testing value?

desert oar May 26, 2021, 5:09 PM

#

somber prism so meaning how far the predicted value will be from the actual testing value?

no. variance is always a measure of spread around a mean.

#

in the context of model overfitting/underfitting, people usually refer to the "variance" of the entire model-fitting procedure

#

imagine that you could randomly re-generate your data over and over, then fit your model on each version of the data

#

then you would have a probability distribution of models, more or less

#

see https://en.wikipedia.org/wiki/Bias–variance_tradeoff


#

the definition of "variance" never changes
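The "regenerate the data and refit" thought experiment can be simulated directly. Fix the inputs, redraw the noise many times, refit each time, and look at the mean and spread of the predictions at one point. The true function, noise level and polynomial degrees below are made-up choices:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
x0 = 0.25                                  # point where we inspect the predictions

def true_f(t):
    return np.sin(2 * np.pi * t)

def predictions_at_x0(degree, n_repeats=1000, noise=0.3):
    preds = np.empty(n_repeats)
    for i in range(n_repeats):
        y = true_f(x) + noise * rng.normal(size=x.size)  # a fresh dataset
        preds[i] = Polynomial.fit(x, y, degree)(x0)       # refit, then predict at x0
    return preds

results = {}
for degree in (1, 9):
    p = predictions_at_x0(degree)
    bias_sq = (p.mean() - true_f(x0)) ** 2
    results[degree] = (bias_sq, p.var())   # variance = spread around the mean prediction
    print(f"degree {degree}: bias^2={bias_sq:.4f}, variance={p.var():.4f}")
```

The straight line is badly biased but stable. The degree-9 fit is nearly unbiased but its predictions spread more from dataset to dataset.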

somber prism May 26, 2021, 5:14 PM

#

oh ok thanks

tulip ridge May 26, 2021, 5:41 PM

#

hey there... does anyone know how to develop an algorithm using Python 3?

lapis sequoia May 26, 2021, 5:55 PM

#

I'm trying to get some data from Wikipedia, but Wikipedia's data is so dirty. Is there an easy way to clean it, or is there a cleaner alternative?

bronze skiff May 26, 2021, 5:55 PM

#

aren't there like a billion Wikipedia datasets out there

#

just google around

lapis sequoia May 26, 2021, 5:56 PM

#

no

unkempt lion May 26, 2021, 7:17 PM

#

anyone know a good module/api that can do multistep algebra like https://mathpapa.com/algebra-calculator.html

Algebra Calculator - MathPapa

Algebra Calculator shows you the step-by-step solutions! Solves algebra problems and walks you through them.

#

(ping me when u respond cuz ima be afk coding)

main kernel May 26, 2021, 7:20 PM

#

unkempt lion anyone know a good module/api that can do multistep algebra like https://mathpap...

maybe this?

#

https://reference.wolfram.com/language/WolframClientForPython/

unkempt lion May 26, 2021, 7:23 PM

#

main kernel https://reference.wolfram.com/language/WolframClientForPython/

is it able to automatically detect things like something/something2 = something3/x but also normal equations with the same code or will i have to not be lazy and code all of that

main kernel May 26, 2021, 7:37 PM

#

no

#

maybe you can build this with https://www.serhii.net/blog/2018/02/18/experinces-jupyter-notebooks-pyplot-sympy/, but it doesn't solve step by step

teal wadi May 26, 2021, 7:52 PM

#

hello

#

how do I get permission to talk?

late shell May 26, 2021, 7:56 PM

#

Hello, can someone help me with simple linear regression? I have a feature total_spend and a target sales. I scale this data, train my model, and get the estimates for beta0 and beta1 such that Y = beta0 + beta1 * total_spend. But the betas I have right now are estimated for the scaled data, so they're somewhere between 0 and 1. This is a problem because I cannot use these betas for inference, i.e. to study the effect on sales of a one-unit increase in total_spend. So how do I get my betas back to the original scale?

ripe forge May 26, 2021, 8:53 PM

#

Save the scaling step as well. You have to apply the same scaling on inference also

#

Otherwise your model is pointless. It must be fed data with the same scaling for both train and inference

#

Once you do that, you'll realise your model is more like Y = beta0 + beta1 * f_scaling(total_spend)

#

That should let you do any analysis as you see fit
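Suppose only the feature is standardized, x_s = (x - μ)/σ, as scikit-learn's StandardScaler does (this is an assumption about the setup). Then the fitted line can be rewritten on the original scale: Y = β₀ + β₁(x - μ)/σ = (β₀ - β₁μ/σ) + (β₁/σ)x. The slope β₁/σ is then the change in sales per one-unit increase in total_spend. If the target was scaled too, the result also has to be multiplied by the target's σ and shifted by its μ. A sketch with made-up numbers, checked against a fit on the raw data:

```python
import numpy as np

# Made-up data: sales roughly 5 + 0.3 * total_spend
total_spend = np.array([10.0, 20.0, 35.0, 50.0, 70.0, 90.0])
sales = np.array([8.1, 11.2, 15.3, 20.2, 25.9, 32.1])

mu, sigma = total_spend.mean(), total_spend.std()   # population std, like StandardScaler
x_scaled = (total_spend - mu) / sigma

# Fit on the scaled feature: sales = b0_s + b1_s * x_scaled (polyfit returns slope first)
b1_s, b0_s = np.polyfit(x_scaled, sales, deg=1)

# Undo the scaling algebraically
b1 = b1_s / sigma
b0 = b0_s - b1_s * mu / sigma

# Reference: fit directly on the raw feature
b1_raw, b0_raw = np.polyfit(total_spend, sales, deg=1)
print(b0, b1, b0_raw, b1_raw)
```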

grave frost May 26, 2021, 10:17 PM

#

yeah, I would have chimed in to suggest a custom preprocessing layer if your preprocessing gets a bit complex, but definitely not for linear regression

lapis sequoia May 26, 2021, 10:27 PM

#

any idea how you can sort an np array of strings so that it follows the same kind of sorting as Linux file systems?

serene scaffold May 26, 2021, 10:46 PM

#

lapis sequoia any idea how can you sort an np array with strings that follows the same kind of...

How are linux file names sorted?
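Assume "the Linux way" means natural or version ordering, which is what GNU `ls -v` does: `file2` comes before `file10`. A common trick is a sort key that splits out digit runs and compares them as integers. `natural_key` is a hypothetical helper. It ignores locale collation rules, so treat it as a sketch rather than an exact match for any particular file manager:

```python
import re
import numpy as np

def natural_key(s):
    # re.split with a capturing group alternates text, digits, text, ...
    # so odd positions are digit runs: compare those as ints, the rest case-insensitively
    parts = re.split(r"([0-9]+)", s)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

files = np.array(["file10.txt", "file2.txt", "File3.txt", "file1.txt"])
print(np.sort(files))                       # plain character-by-character order
natural = np.array(sorted(files, key=natural_key))
print(natural)  # ['file1.txt' 'file2.txt' 'File3.txt' 'file10.txt']
```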

sly salmon May 26, 2021, 10:59 PM

#

Say you had 1000 simultaneous equations with 20 variables. Would solving them for a consistent solution be insanely computationally hard and long?

velvet thorn May 26, 2021, 11:52 PM

#

more like

#

it depends

velvet thorn May 26, 2021, 11:52 PM

#

sly salmon Say you had a 1000 simultaneous equation with 20 variables. would solving each e...

is this related to your question last night

lapis sequoia May 27, 2021, 12:03 AM

#

hi

hallow sundial May 27, 2021, 12:48 AM

#

Is anybody free for a short call about a few questions about data science and AI?

bronze skiff May 27, 2021, 2:27 AM

#

just post your questions, no one wants to be called

ruby peak May 27, 2021, 2:50 AM

#

Yes

trim cobalt May 27, 2021, 2:56 AM

#

I have a quick question

#

Could I ask for some help coming up with ideas for a future project? I am not a very creative person, but I want a fun project to do with AI and CV

#

please ping me as I have this server muted

merry ridge May 27, 2021, 4:33 AM

#

Can anyone explain what is going on here? I can't figure out why my dataframe is giving me the wrong length.

main dome May 27, 2021, 4:41 AM

#

merry ridge Can anyone explain what is going on here? I can't figure out why my dataframe is...

it seems to be the right length?

#

the indices skip some numbers
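For anyone who hits the same confusion: after filtering or dropping rows, pandas keeps the original index labels, so the last label is no longer `len(df) - 1`. `reset_index(drop=True)` renumbers the rows. A small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"value": [5, -1, 7, -3, 9]})
filtered = df[df["value"] > 0]               # keeps the labels 0, 2, 4

print(len(filtered), filtered.index.tolist())    # 3 [0, 2, 4]
renumbered = filtered.reset_index(drop=True)     # drop=True discards the old labels
print(renumbered.index.tolist())                 # [0, 1, 2]
```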

merry ridge May 27, 2021, 4:41 AM

#

I realized it just as you typed that

#

Thank you, I spent way too much time looking for something else

main dome May 27, 2021, 4:42 AM

#

rippp

mint palm May 27, 2021, 6:29 AM

#

first time running this in PyCharm instead of Jupyter... the output is correct, but there's a bunch of red text after it... is it nothing to worry about?

#

polar stag May 27, 2021, 9:12 AM

#

guys, I'm new to data science and I'm serious about having a career in it. Can you suggest books/courses or any videos to get started?

robust charm May 27, 2021, 9:26 AM

#

Has anyone here used the dlib library? I'm trying to make a face recognition program and I'm having a little issue

short heart May 27, 2021, 10:10 AM

#

Would the ideal number of epochs be the one that ends with the minimum loss?

late shell May 27, 2021, 10:44 AM

#

Hello, I'm doing a simple linear regression using just 1 feature, and my MSE keeps increasing a lot with each epoch until Python throws an overflow error. What could this mean? Why is the MSE increasing?
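One common cause, though not the only one, is a learning rate that is too large for the scale of the feature. The curvature of the MSE grows with x², so with unscaled values such as spend amounts each gradient step overshoots by more than it corrects. The loss then grows every epoch until it overflows. A NumPy sketch with made-up numbers that compares raw and standardized inputs:

```python
import numpy as np

x = np.array([120.0, 250.0, 310.0, 400.0, 560.0])   # made-up spend values
y = 3.0 * x + 50.0                                   # made-up sales

def run_gd(features, target, lr, epochs=50):
    """Plain gradient descent for target = b0 + b1 * features; returns MSE per epoch."""
    b0 = b1 = 0.0
    history = []
    for _ in range(epochs):
        err = b0 + b1 * features - target
        history.append(float(np.mean(err ** 2)))
        b0 -= lr * 2 * np.mean(err)
        b1 -= lr * 2 * np.mean(err * features)
    return history

raw = run_gd(x, y, lr=1e-5)                 # too large for x in the hundreds
x_std = (x - x.mean()) / x.std()
scaled = run_gd(x_std, y, lr=0.1)           # same model, standardized feature

print(raw[0], raw[-1])        # grows every epoch
print(scaled[0], scaled[-1])  # shrinks
```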

limpid oak May 27, 2021, 11:00 AM

#

```py
for shcSurvey in shcDF['surveyno'].unique():
  try:
    shcSurveyNo = shcSurvey.split('/')[0]

#     villDF['name_match'] = villDF['PIN1'].apply(lambda x: 'Match' if x==shcSurveyNo else 'Mismatch')

    if shcSurveyNo in villDF['PIN1'].unique():
      print(shcSurveyNo,'Yes')
      villDF['shc']=1

    else:
      villDF['shc']=0
      print(shcSurveyNo,'No')
  except:
    print("Something went wrong!!!!!!!!")
```

#

What am I missing here? Please help

#

65 Yes
38 No
185 Yes
396 Yes
373 Yes

#

but in the df it only shows 1
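A likely culprit in the loop above: `villDF['shc']=1` (and `=0`) assigns to the *entire* column on every iteration, so only the last survey number's result survives. A minimal vectorized sketch, assuming the goal is to flag each `villDF` row whose `PIN1` appears among the SHC survey numbers and that `PIN1` holds strings (if it is numeric, convert one side first so the types match). Both frames below are hypothetical stand-ins:

```py
import pandas as pd

# Hypothetical stand-ins for the poster's villDF and shcDF
villDF = pd.DataFrame({"PIN1": ["65", "38", "185", "999"]})
shcDF = pd.DataFrame({"surveyno": ["65/1", "185/2", "396/3", "373/4"]})

# Survey number = the part before '/', computed for every row at once
survey_nos = shcDF["surveyno"].astype(str).str.split("/").str[0].unique()

# Flag each villDF row individually instead of overwriting the whole column per loop iteration
villDF["shc"] = villDF["PIN1"].isin(survey_nos).astype(int)
print(villDF)
```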

hard hound May 27, 2021, 11:09 AM

#

Hey, does anyone here use a cloud service for computing?

boreal summit May 27, 2021, 11:11 AM

#

I have a DataFrame in which I'm trying to count the number of times a certain string exists in a particular column. None of the methods I've tried worked.

#

For instance, in a DataFrame, under the name column, I'm trying to find rows that contain the word 'Mega', and count the total number of times the word appears.

short heart May 27, 2021, 11:14 AM

#

Would it be good to train with many epochs, find the epoch with the lowest loss, and then limit training to that many epochs?

hard hound May 27, 2021, 11:15 AM

#

@short heart I don't know much, but when I increase epochs it decreases my loss and increases accuracy

#

@boreal summit Hey, could you tell me one way you tried?

#

did you try count()?

boreal summit May 27, 2021, 11:17 AM

#

I already tried using `mm = data['name'].str.contains('mega')`

#

Then I passed the Boolean to `data`

#

It didn't work.

#

The logic didn't even work, so it didn't get to the count part.

#

I've also tried `str.find()`

#

They seem to work online but not with what I'm doing ATM.

hard hound May 27, 2021, 11:19 AM

#

Could you send a screenshot?

boreal summit May 27, 2021, 11:20 AM

#

Okay

boreal summit May 27, 2021, 11:21 AM

#

hard hound could send a screenshot?

It just worked now, thanks. I guess I was doing something wrong and I didn't know.

hard hound May 27, 2021, 11:21 AM

#

Great
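For the "Mega" question, a sketch of the usual pandas idioms on a made-up `name` column. Note that `str.contains` is case-sensitive by default, so `'mega'` does not match `'Mega'` unless `case=False` is passed; that is one plausible way an attempt like the one above can silently come back empty.

```py
import pandas as pd

# Made-up data; None stands in for a missing name
df = pd.DataFrame({"name": ["Mega Man", "mega drive", "Giga", "Mega Mega", None]})

# Boolean mask: does the row contain "mega" (any case)? na=False treats missing names as no match
mask = df["name"].str.contains("mega", case=False, na=False)
rows_with_mega = df[mask]
n_rows = int(mask.sum())  # number of rows containing the word

# Total occurrences across the column (one row can contain the word more than once)
n_occurrences = int(df["name"].str.lower().str.count("mega").sum())
print(n_rows, n_occurrences)
```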

limpid oak May 27, 2021, 11:33 AM

#

limpid oak for shcSurvey in shcDF['surveyno'].unique(): try: shcSurveyNo = shcSurv...

anybody?

short heart May 27, 2021, 11:37 AM

#

Say I've got a really small loss (2.0239e-04) but the results are pretty bad. Is that underfitting?

fiery cipher May 27, 2021, 12:12 PM

#

I have a question: can I use min-max normalization and then Z-score normalization? In theory it would work well, but I'm not sure, because I read that it's recommended to use only one normalization method

desert oar May 27, 2021, 1:32 PM

#

short heart Say, if Ive got a really small loss (2.0239e-04), but result is pretty bad, is t...

Maybe. The absolute size of the loss usually isn't interpretable. A practical example of underfitting would be predicting the mean for any input.

#

It's "under" fitting in that the model isn't representing/learning enough of the variation in the data
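One way to make "the absolute size of the loss isn't interpretable" concrete: compare the model's loss against the underfit baseline that always predicts the mean, whose MSE equals the variance of the target. This sketch assumes the quoted loss is an MSE on a target scaled into a narrow range; the data is synthetic.

```py
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=0.5, scale=0.01, size=1000)     # hypothetical scaled target with small variance

baseline_pred = np.full_like(y, y.mean())          # "underfit" model: always predict the mean
baseline_mse = np.mean((y - baseline_pred) ** 2)   # equals the (population) variance of y, about 1e-4 here

model_mse = 2.0239e-04                             # the loss quoted in the chat, assumed to be MSE
print(f"model / baseline MSE ratio: {model_mse / baseline_mse:.2f}")  # > 1 means worse than the mean
```

Here a loss that looks tiny is still worse than predicting the mean, because the target itself barely varies.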

desert oar May 27, 2021, 1:34 PM

#

fiery cipher I have a question : can I use min max data normalization than use Z score normal...

Normally I recommend normalization (min-max scaling) when you know the bounds of the data, and standardization (z-scores) when you don't
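A small NumPy sketch of both scalings. It also shows why chaining them is redundant: min-max scaling is a positive affine transform (a*x + b with a > 0), and z-scores are unchanged by such transforms, so z-scoring after min-max gives the same result as z-scoring the raw data.

```py
import numpy as np

def min_max(x):
    """Rescale to [0, 1] using the observed min and max."""
    return (x - x.min()) / (x.max() - x.min())

def z_score(x):
    """Shift to mean 0 and scale to (population) standard deviation 1."""
    return (x - x.mean()) / x.std()

x = np.array([2.0, 4.0, 6.0, 8.0])
x_mm = min_max(x)   # [0, 1/3, 2/3, 1]
x_z = z_score(x)

# Chaining min-max then z-score reproduces the z-scores of the raw data
chained = z_score(min_max(x))
print(np.allclose(chained, x_z))  # True
```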

short heart May 27, 2021, 1:34 PM

#

then, next question

#

are 6500 values enough to train on

#

or 11000

desert oar May 27, 2021, 1:34 PM

#

It depends entirely on the data and the model

#

There is no magic number

short heart May 27, 2021, 1:34 PM

#

it's kind of stock price data

lapis sequoia May 27, 2021, 1:34 PM

#

11000

desert oar May 27, 2021, 1:35 PM

#

How are you evaluating the model?

#

What is the model anyway?

short heart May 27, 2021, 1:35 PM

#

LSTM layers

desert oar May 27, 2021, 1:35 PM

#

What kinds of features are there? Is it classification or regression? Etc etc

short heart May 27, 2021, 1:35 PM

#

with batch normalization layers and relu in between

hoary wigeon May 27, 2021, 1:36 PM

#

hey

desert oar May 27, 2021, 1:37 PM

#

short heart would training with many epochs, finding epoch with as less loss in the end, and...

There is a technique called "early stopping" which is intended to help prevent overfitting
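A framework-agnostic sketch of patience-based early stopping over a hypothetical list of per-epoch validation losses. The key point for the question above is to monitor *validation* loss, not training loss. Keras ships this as `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=..., restore_best_weights=True)`.

```py
def early_stopping_epochs(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) under a patience-based early stopping rule.

    val_losses[i] is the validation loss after epoch i + 1. Training stops once
    `patience` consecutive epochs fail to beat the best loss by more than min_delta.
    """
    best_loss = float("inf")
    best_epoch = 0
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)

# Hypothetical validation losses: improvement stalls after epoch 3, then recovers at epoch 7
losses = [0.9, 0.7, 0.6, 0.62, 0.61, 0.65, 0.5]
best, stopped = early_stopping_epochs(losses, patience=3)
print(f"stopped after epoch {stopped}, keep the weights from epoch {best}")
```

With `patience=4` the same losses run to epoch 7 and keep its weights, so patience trades training time against the risk of stopping just before a later improvement.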

hoary wigeon May 27, 2021, 1:37 PM

#

in what case should you drop a column with missing data?

#

when more than 90% of the values in that column are missing?

desert oar May 27, 2021, 1:38 PM

#

There is no rule or magic number for that either

hoary wigeon May 27, 2021, 1:38 PM

#

for what ?

desert oar May 27, 2021, 1:38 PM

#

Why is the data missing? Why did you want to include that column in the first place?

hoary wigeon May 27, 2021, 1:38 PM

#

like

#

I have 77% of records with missing Age data in the dataset

desert oar May 27, 2021, 1:39 PM

#

Well subjectively that sounds like it might not be useful. But I don't know the specifics of your situation. Maybe that column is necessary and you need to do some more work

hoary wigeon May 27, 2021, 1:39 PM

#

it's just for practice

#

it is about titanic

#

😆

desert oar May 27, 2021, 1:40 PM

#

in that case, this is a great opportunity to practice being smart about missing data

hoary wigeon May 27, 2021, 1:40 PM

#

someone told me that when a column is missing over 90% of its data, you should just drop it

desert oar May 27, 2021, 1:40 PM

#

don't attempt to follow or even invent strict rules for discarding data

#

it always depends on the situation

#

i happen to know that in the titanic dataset, age is important

hoary wigeon May 27, 2021, 1:41 PM

#

but what's the use of a column

#

when 90% of the data is missing

desert oar May 27, 2021, 1:41 PM

#

but you don't know that up front

hoary wigeon May 27, 2021, 1:41 PM

#

so I must replace it with the median

#

-_-

desert oar May 27, 2021, 1:42 PM

#

that's for you to figure out. maybe you can infer the data from somewhere else

#

or maybe its missingness or lack of missingness is itself a feature

hoary wigeon May 27, 2021, 1:42 PM

#

i know age matters there

desert oar May 27, 2021, 1:42 PM

#

maybe you can infer broadly a range of values from other data, even if you don't know the exact value

#

or maybe you just drop it and see what happens 🙂
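A pandas sketch of two ideas from this exchange: recording whether Age was missing as its own feature, and inferring a plausible value from other columns (here, the median age within the same passenger class) instead of one global median. The column names follow the Titanic data, but the rows are made up, and group-wise median imputation is only one of several options.

```py
import numpy as np
import pandas as pd

# Made-up rows using Titanic-style column names
df = pd.DataFrame({"Age": [22.0, np.nan, 35.0, np.nan, 54.0], "Pclass": [3, 3, 1, 1, 1]})

# Keep "Age was missing" as a feature before filling anything in
df["Age_missing"] = df["Age"].isna().astype(int)

# Fill each gap with the median age of that passenger's class (NaNs are ignored by the median)
class_median = df.groupby("Pclass")["Age"].transform("median")
df["Age_filled"] = df["Age"].fillna(class_median)
print(df)
```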

hoary wigeon May 27, 2021, 1:42 PM

#

My only option is to replace it with the median

#

mean is close to median

desert oar May 27, 2021, 1:44 PM

#

you might want to look into the different kinds of missing data.. "missing completely at random", "missing at random", and "not missing at random" (MCAR, MAR, and NMAR)

noble drum May 27, 2021, 1:44 PM

#

why is that your only option?

velvet thorn May 27, 2021, 2:43 PM

#

hoary wigeon when there is 90 data missing

not necessarily

near cosmos May 27, 2021, 2:44 PM

#

And then forget about the missing at random idea because it's always a terrible assumption 😉

fleet dove May 27, 2021, 2:54 PM

#

say I have a robotic wheelchair controlled by eye movements, can I class the user as an actuator?

tidal bronze May 27, 2021, 2:56 PM

#

what could cause pandas groupby to output rows with duplicate values for the keys they are being grouped on?

When I try to remove them with .drop_duplicates() the problem persists, but the rows are clearly the same for those keys.

shadow knot May 27, 2021, 2:58 PM

#

hi, first timer here. I'd like to ask a general question about how you choose a machine learning algorithm to build a model, more specifically for an image classification/recognition problem

#

to my understanding, I would generally want to look at the data, judge its distribution and features, and go from there. But that answer seems too generalized; is there any format or "guideline" I could follow?

tidal bronze May 27, 2021, 3:01 PM

#

https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html

@shadow knot this could help you but imo the best is to know how each of these roughly function and from there you can choose the one best suited for your need. There ain't a magic formula for choosing and not a single right answer either most of the time

desert oar May 27, 2021, 3:05 PM

#

shadow knot hi, first timer here. i'd like to ask some general question regarding how you ch...

sometimes you can just depend on other people to tell you what to use 🙂 e.g. for image classification CNNs are dominant for a good reason

shadow knot May 27, 2021, 3:06 PM

#

I'm doing an assignment for school, and one of the criteria is to choose a number of base models and justify why I chose them for a dataset of 20,000 RGB images of size 27x27, which makes the feature dimensionality 729 if I use the greyscale value of each pixel

desert oar May 27, 2021, 3:06 PM

#

whereas for "social science" data usually there is no right answer and you might need to try several things

#

how many do you have to choose, and do you have to justify all of them or just the "best" one?

#

what constitutes a "model" in this case? are imagenet and resnet considered different models for the purposes of this assignment?

shadow knot May 27, 2021, 3:11 PM

#

desert oar how many do you have to choose, and do you have to justify all of them or just t...

don't have to choose a certain number, just have to justify why I chose each one over other models.

I do have to justify all of them, but only to some degree. After hyperparameter tuning and feature selection, I'm required to make a final judgement and recommend a "best" one.

forgot to mention, but I'm not allowed to use pre-trained models, so I believe imagenet and resnet are out of bounds. But if they weren't, they would be considered 2 models.

#

right now i am going with Logistic Regression, Random Forest, CNN, SVM and KNN to cover all different "type" of algorithms

desert oar May 27, 2021, 3:24 PM

#

i'd recommend looking into kernel SVM, specifically radial basis kernel, which as far as i know was very popular before deep learning took over
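A sketch of comparing several of the candidate base models with cross-validated F1 in scikit-learn, including an RBF-kernel SVM. The data is a synthetic stand-in (`make_classification` with 729 features, mimicking flattened 27x27 greyscale images), the hyperparameters are placeholders, and the CNN is left out because it needs a deep learning framework.

```py
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for 27x27 greyscale images flattened to 729 features (not real cell data)
X, y = make_classification(n_samples=300, n_features=729, n_informative=30, random_state=0)

models = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(C=0.1, max_iter=2000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "rbf_svm": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale")),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5, weights="distance")),
}

# Mean F1 over 5 cross-validation folds for each candidate
scores = {name: cross_val_score(m, X, y, cv=5, scoring="f1").mean() for name, m in models.items()}
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:>14}: mean F1 = {s:.3f}")
```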

#

knn is an interesting choice, because you specifically need to define the "distance" between images

shadow knot May 27, 2021, 3:26 PM

#

which brings me to my next question. Due to the number of features I have, I was thinking about dropping KNN, since from what I've read, KNN's performance drops on high-dimensional datasets

cedar sun May 27, 2021, 3:29 PM

#

can i use threads with a neural network?

shadow knot May 27, 2021, 3:30 PM

#

the nature of the dataset is medical. the images are cell images and my two tasks consist of:

Classify whether it's a cancerous cell
Classify whether it's a specific type of cell

my original thought was that I could use "distance" weighting for KNN instead of "uniform", since a cell of a similar nature should carry more weight when voting than a cell with a completely different structure

desert oar May 27, 2021, 3:38 PM

#

shadow knot which brings me to my next question. Due to the number of features i have, i was...

i agree with this, look up "the curse of dimensionality" for why defining a distance metric on high-dimensional data is not necessarily helpful

#

as far as i understand, pre-deep-learning image classification depended heavily on special-purpose feature engineering

#

so you could theoretically still use KNN if you could significantly reduce the size of the feature space
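A sketch of shrinking the feature space before KNN, with PCA inside a scikit-learn pipeline and `weights="distance"` voting. The data is synthetic and `n_components=50` is an arbitrary assumption; whether this helps on real cell images has to be checked empirically.

```py
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for flattened 27x27 images
X, y = make_classification(n_samples=300, n_features=729, n_informative=30, random_state=0)

# KNN on all 729 features vs. KNN on the top 50 principal components
raw_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5, weights="distance"))
pca_knn = make_pipeline(
    StandardScaler(),
    PCA(n_components=50),
    KNeighborsClassifier(n_neighbors=5, weights="distance"),
)

raw_f1 = cross_val_score(raw_knn, X, y, cv=5, scoring="f1").mean()
pca_f1 = cross_val_score(pca_knn, X, y, cv=5, scoring="f1").mean()
print(f"KNN on raw features: {raw_f1:.3f}, KNN after PCA: {pca_f1:.3f}")
```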

shadow knot May 27, 2021, 3:41 PM

#

what's your thought on Logistic Regression?

#

regularization could help bring the size of the feature space down, which is my primary reason for selecting it as one of the base models

#

i will be implementing Bagging Random Forest as my feature selection technique so some feature removal will be done there but imo a model that could do that also is a plus right?

merry ridge May 27, 2021, 3:44 PM

#

Taking the Fourier transform and keeping only a few of the terms with the largest Fourier coefficients seems like the first strategy that would come to mind for me
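A NumPy sketch of the Fourier idea: take the 2D FFT of an image, zero every coefficient except the `keep` largest in magnitude, and invert. The image is random noise standing in for a 27x27 cell image. For use as model features you would likely want the *same* coefficient positions for every image (for example a fixed low-frequency block), since the positions of the largest coefficients vary from image to image.

```py
import numpy as np

def fft_keep_largest(img, keep):
    """Zero all but the `keep` largest-magnitude 2D Fourier coefficients, then invert.

    Ties at the cutoff magnitude may keep a few extra coefficients. For a real image the
    mask is symmetric, so the inverse is real up to rounding and taking .real is safe.
    """
    coeffs = np.fft.fft2(img)
    mags = np.abs(coeffs)
    cutoff = np.sort(mags, axis=None)[-keep]
    mask = mags >= cutoff
    approx = np.real(np.fft.ifft2(coeffs * mask))
    return approx, mask

rng = np.random.default_rng(0)
img = rng.random((27, 27))  # hypothetical 27x27 greyscale image

errors = {}
for keep in (10, 100, 729):
    approx, mask = fft_keep_largest(img, keep)
    errors[keep] = np.mean((img - approx) ** 2)
    print(f"kept {mask.sum():3d} coefficients -> MSE {errors[keep]:.5f}")
```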

shadow knot May 27, 2021, 3:49 PM

#

merry ridge Taking the Fourier transform and keeping only a few of the terms with the larges...

Is this technique widely applied for image recognition? My brief googling only shows it being used in sound/speech applications

desert oar May 27, 2021, 3:52 PM

#

yeah i would prefer something "intelligent" that reduces the feature space rather than somewhat-randomly discarding features

merry ridge May 27, 2021, 3:52 PM

#

To me an image is just sound with a higher dimension

desert oar May 27, 2021, 3:52 PM

#

people used to do all kinds of signal processing stuff for ML on images, and probably still do

cedar sun May 27, 2021, 3:53 PM

#

cedar sun can i use threads with a neural network?

^

shadow knot May 27, 2021, 3:53 PM

#

cedar sun ^

sorry if your question got lost in my wall of text

merry ridge May 27, 2021, 3:53 PM

#

I saw an example of this technique recently where they had multiple examples of biblical art with cherubs holding some kind of writing and used it to decode the text effectively

shadow knot May 27, 2021, 3:54 PM

#

merry ridge I saw an example of this technique recently where they had multiple examples of ...

i will read up on this technique, much appreciated

merry ridge May 27, 2021, 3:55 PM

#

You’re not really discarding random features though. It's analogous to PCA discarding the eigenvectors with the lowest eigenvalues. Unless you consider discarding those eigenvectors to also be discarding random features, which is certainly reasonable
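Since PCA comes up here, a from-scratch NumPy sketch of the idea just described: center the data, eigendecompose the covariance matrix, and keep the eigenvectors with the largest eigenvalues while discarding the rest. The data is synthetic, built so that most of the variance lies in two directions; in practice `sklearn.decomposition.PCA` does this for you.

```py
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples in 5 dimensions, with most variance along 2 directions
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

Xc = X - X.mean(axis=0)                  # PCA works on centered data
cov = np.cov(Xc, rowvar=False)           # 5x5 sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: symmetric input, eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # sort components by variance, largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
X_reduced = Xc @ eigvecs[:, :k]          # project onto the top-k principal components
explained = eigvals[:k].sum() / eigvals.sum()
print(X_reduced.shape, f"{explained:.3f} of variance kept")
```

The variance of each projected column equals its eigenvalue, which is why dropping the smallest eigenvalues discards the least variance.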

shadow knot May 27, 2021, 3:58 PM

#

What are some performance metrics that generally work well for this type of classification?

#

I know about the general ones like accuracy, recall, precision and F1

grave frost May 27, 2021, 3:59 PM

#

overall, the Fourier transform is worth experimenting with, but given its lack of real-world use (in imaging, BTW), it doesn't seem as promising a candidate as, say, PCA

#

which is also easier and more commonly used 🤷

shadow knot May 27, 2021, 4:00 PM

#

grave frost overall, F.T is worth experimenting - but seeing the lack of use in real-world (...

I'm really new to this domain, so could you elaborate on what PCA is?

grave frost May 27, 2021, 4:00 PM

#

merry ridge To me an image is just sound with a higher dimension

Really? I wouldn't put it like that; the differences between images and sound are huge

#

sound can be represented as ~~useless images~~ spectral features, but images can't be represented as sounds, can they?

merry ridge May 27, 2021, 4:03 PM

#

I’m just saying, in a hand-wavy way, that many of the techniques used in DSP translate directly and nicely to image processing. I wouldn’t delve too deeply into a "sound is pictures" metaphor

grave frost May 27, 2021, 4:05 PM

#

fair enough

desert oar May 27, 2021, 4:06 PM

#

cedar sun ^

what would that even mean?

desert oar May 27, 2021, 4:07 PM

#

shadow knot what could be some of the performance metric that is generally good for this typ...

how many classes? f1 isn't a bad default, for a school project at least

#

you can also consider using a proper scoring rule like the Brier score, but neural networks tend to produce poorly calibrated probabilities

shadow knot May 27, 2021, 4:07 PM

#

desert oar how many classes? f1 isn't a bad default, for a school project at least

the first one is binary, the second one consists of 4 classes
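A scikit-learn sketch of the two metrics mentioned: the Brier score for the binary task (the mean squared difference between the predicted probability and the 0/1 label) and macro-averaged F1 for the 4-class task, which weights every class equally. The labels and probabilities are made up.

```py
import numpy as np
from sklearn.metrics import brier_score_loss, f1_score

# Binary task: true labels and predicted probabilities of the positive class (made up)
y_true = np.array([1, 0, 1, 1, 0])
p_pred = np.array([0.9, 0.2, 0.6, 0.4, 0.1])
brier_manual = np.mean((p_pred - y_true) ** 2)   # lower is better; 0 is perfect
brier = brier_score_loss(y_true, p_pred)

# 4-class task: macro F1 averages the per-class F1 scores without weighting by class size
y4_true = [0, 1, 2, 3, 0, 1]
y4_pred = [0, 1, 2, 2, 0, 3]
macro_f1 = f1_score(y4_true, y4_pred, average="macro")
print(f"Brier: {brier:.3f}, macro F1: {macro_f1:.3f}")
```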

cedar sun May 27, 2021, 4:09 PM

#

if i could have multiple threads predicting images

#

having the same model loaded

desert oar May 27, 2021, 4:10 PM

#

sure, although if the underlying model prediction code is already multi-threaded then you don't want to start mixing in your own threading

#

also in python specifically multi-threading for computation doesn't work well due to something called the "global interpreter lock"

#

so in python you really need to use processes for parallel computations, threads are good for parallel/concurrent i/o but not cpu-bound computation

cedar sun May 27, 2021, 4:12 PM

#

Mmmm

#

in english pls? xDDD

#

sorry i didnt understand. May i tell u my plan

#

and u tell me if it is doable?

desert oar May 27, 2021, 4:12 PM

#

yes, it helps if you are more specific

inland zephyr May 27, 2021, 4:12 PM

#

Hello, I want to ask again about a dummy dataset for face recognition using vector similarities. Let's say I have a thousand vectors from a thousand known people in my vector database. Someone told me I need to add some dummy vectors with an unknown class, but I'm still confused about why I need to add unknown dummy vectors to the known dataset. Is this for performance testing?

cedar sun May 27, 2021, 4:13 PM

#

Okey. I downloaded a bunch of pokemon images. I found on github a model that is supposed to predict pokemon images. I wanna use this model to clean the images i downloaded. Can i use threading to increase the speed?

#

nvm, from 95 images the model i downloaded fails on 71

#

easy peasy

#

i guess i have to manually clean the data :D

#

gl hf

#

Is it good for training the model to pass this image with the Bulbasaur label?

#

no, right?

desert oar May 27, 2021, 4:22 PM

#

cedar sun Okey. I downloaded a bunch of pokemon images. I found on github a model that is ...

use processes, not threads. see https://pypi.org/project/joblib, but consider that there might be significant overhead in loading the data multiple times or passing it between processes. joblib can help by caching/sharing numpy arrays and dataframes, but i find that it's not always reliable (and there's no easy way to explicitly tell joblib what to cache and what not to cache)

cedar sun May 27, 2021, 4:22 PM

#

i thought about threads cuz

#

imagine i have the first 6 pokemons

#

if i have 3 threads,

desert oar May 27, 2021, 4:23 PM

#

inland zephyr Hello i want to ask again about dummy dataset for face recognition using vector ...

what do you mean by this? what's a "dummy" dataset? what is the data being used for?

cedar sun May 27, 2021, 4:23 PM

#

t1 checks pokemon[0], t2 pokemon[1] and t3 pokemon[2]. When they are done, t1 will move to pok[3], t2 to pok[4] and t3 [pok5]

#

threads for predicting, not training

desert oar May 27, 2021, 4:24 PM

#

there are 2 problems:

1. in python, 2 threads can't execute computations in parallel. this is a python-specific limitation.
2. the underlying machine learning library might already be using multithreaded computations.

#

so you want to explicitly turn off multithreading, group your data into batches, and have processes making predictions on their own batches

cedar sun May 27, 2021, 4:25 PM

#

mhmm okey, so no threads

grave frost May 27, 2021, 4:26 PM

#

cedar sun Okey. I downloaded a bunch of pokemon images. I found on github a model that is ...

a model that cleans pokemon images?

#

can you send me the repo?

cedar sun May 27, 2021, 4:26 PM

#

it doesnt clean xd

#

i just thought about using that already trained model to predict the images i downloaded xd

#

if the prediction matches the class where i downloaded the image, then it is a good download xD

grave frost May 27, 2021, 4:28 PM

#

the probability that a random model on github generalizes to general data is slim, but ehh

inland zephyr May 27, 2021, 4:30 PM

#

desert oar what do you mean by this? what's a "dummy" dataset? what is the data being used ...

I use it for image search. The dataset consists of embedding vectors from images (512 x 1), and each vector represents one known image class. The dummies are random 1 x 512 vectors with arbitrary values and an unknown class

cedar sun May 27, 2021, 4:31 PM

#

https://github.com/AbdulAhadSiddiqui11/Pokemon-Image-Classifier

It's a ConvNet built upon InceptionV3 and trained on 928 Pokémon classes.

#

this is the one i found

#

loss: 0.1279 - accuracy: 0.9743 - validation loss: 0.9940 - validation accuracy: 0.7917

#

But failing 71 out of 95 doesnt fit that accuracy

#

xDDDD

inland zephyr May 27, 2021, 4:31 PM

#

so if I have 300 vectors from 300 image classes, I would add, for example, 600 random dummy vectors with an unknown class. Anyway, this is to measure performance: how often the vector search returns an unknown or wrong class. But I think it can work without the dummies, unless the vectors of each class are pretty close to each other
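A NumPy sketch of the evaluation described above: a gallery of 300 known vectors plus 600 random dummy vectors labeled unknown (`-1`), cosine-similarity nearest-neighbour search, and a similarity threshold below which a query is rejected as unknown. All vectors and the 0.5 threshold are hypothetical; with real embeddings the threshold has to be tuned.

```py
import numpy as np

rng = np.random.default_rng(0)
dim, n_known, n_dummy = 512, 300, 600

known = rng.normal(size=(n_known, dim))     # hypothetical embeddings of known classes
dummy = rng.normal(size=(n_dummy, dim))     # random distractors with an unknown class
gallery = np.vstack([known, dummy])
labels = np.array(list(range(n_known)) + [-1] * n_dummy)  # -1 = unknown/dummy
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)  # unit vectors: dot product = cosine

def search(query, threshold=0.5):
    """Return (label, similarity) of the nearest gallery vector, or -1 if below threshold."""
    q = query / np.linalg.norm(query)
    sims = gallery @ q
    best = int(np.argmax(sims))
    if sims[best] < threshold:
        return -1, float(sims[best])
    return int(labels[best]), float(sims[best])

# Simulated queries: noisy copies of the known vectors
queries = known + 0.3 * rng.normal(size=known.shape)
preds = np.array([search(q)[0] for q in queries])
accuracy = np.mean(preds == np.arange(n_known))  # correct class returned
rejected = np.mean(preds == -1)                  # wrongly treated as unknown
print(f"accuracy {accuracy:.3f}, rejected as unknown {rejected:.3f}")
```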

grave frost May 27, 2021, 4:32 PM

#

cedar sun But failing 71 out of 95 doesnt fit that accuracy

it's overfitted lol

#

didn't expect much either

inland zephyr May 27, 2021, 4:32 PM

#

cedar sun ``loss: 0.1279 - accuracy: 0.9743 - validation loss: 0.9940 - validation accurac...

whew thats pretty big loss too... and also sign of overfit

#

it's clearly overfit, since the gap between training and validation loss is too big (about 0.13 vs 0.99)

cedar sun May 27, 2021, 4:33 PM

#

grave frost it's overfitted lol

yes this is what i expected

grave frost May 27, 2021, 4:33 PM

#

cedar sun yes this is what i expected

well, then how can you use it on your task?

cedar sun May 27, 2021, 4:33 PM

#

i cant

cedar sun May 27, 2021, 4:33 PM

#

cedar sun i guess i have to manually clean the data :D

:D

grave frost May 27, 2021, 4:33 PM

#

then why are you saying you're using it? 🤔

#

what does inference have to do with data cleaning?

cedar sun May 27, 2021, 4:33 PM

#

?

cedar sun May 27, 2021, 4:34 PM

#

cedar sun if the prediction matches the class where i downloaded the image, then it is a g...

.

grave frost May 27, 2021, 4:34 PM

#

that's not cleaning bruv

cedar sun May 27, 2021, 4:35 PM

#

it is

#

imagine the model works 100%

grave frost May 27, 2021, 4:35 PM

#

it's more like general pre-processing

cedar sun May 27, 2021, 4:35 PM

#

https://gyazo.com/7b719d41d3eb9d50bcea7da7585524f2

#

see the 081

grave frost May 27, 2021, 4:35 PM

#

but anyways

cedar sun May 27, 2021, 4:35 PM

#

it is not a Bulbasaur

#

then the model will say it is Venusaur

#

but that image is in the Bulbasaur class

grave frost May 27, 2021, 4:36 PM

#

doesn't really matter if there are only a few outliers

#

you can always compensate by robustness for the model

cedar sun May 27, 2021, 4:36 PM

#

cedar sun Is good for the training model passing this image as Bulbasaur Label?

anyway, ^

grave frost May 27, 2021, 4:37 PM

#

cedar sun Is good for the training model passing this image as Bulbasaur Label?

again, shouldn't be too much of an issue

cedar sun May 27, 2021, 4:38 PM

#

mmm

#

so may i use this model and do transfer learning with it??

#

even if I have some images that won't match the class?

desert oar May 27, 2021, 4:48 PM

#

grave frost you can always compensate by robustness for the model

you can't always compensate for outliers... usually you can tolerate a few

cedar sun May 27, 2021, 4:49 PM

#

wait may i do?

#

https://gyazo.com/3bb2ec549fb88e5878f9402e8c0be1cb

desert oar May 27, 2021, 4:49 PM

#

cedar sun so may i use this model and do transfer learning with it??

if you are training on your own pokemon data, then it does seem like a good idea to use an existing pokemon image classifier for transfer learning

cedar sun May 27, 2021, 4:49 PM

#

This is how many images i have per class

cedar sun May 27, 2021, 4:50 PM

#

desert oar if you are training on your own pokemon data, then it does seem like a good idea...

well, not my own. I want it to predict any image

cedar sun May 27, 2021, 4:52 PM

#

desert oar if you are training on your own pokemon data, then it does seem like a good idea...

also, why u say this? everytime u wanna classify some images u start from some point. Imagenet for example. Isnt that transfer learning?

desert oar May 27, 2021, 4:53 PM

#

you are asking, why does it seem like a good idea?

cedar sun May 27, 2021, 4:53 PM

#

cuz even if that model sucks, it has already seen some pokemon images

grave frost May 27, 2021, 4:54 PM

#

desert oar you can't _always_ compensate for outliers... usually you can tolerate a few

you can - most of the times

desert oar May 27, 2021, 4:54 PM

#

depends on how many and how bad

grave frost May 27, 2021, 4:57 PM

#

desert oar depends on how many and how bad

you have about 20% in kaggle competitions

#

looks at cassava

cedar sun May 27, 2021, 4:58 PM

#

;(

desert oar May 27, 2021, 5:01 PM

#

cedar sun cuz even if that model sucks, it has already seen some pokemon images

right, and i said it was a good idea. you just answered your own question!

cedar sun May 27, 2021, 5:02 PM

#

oh

#

i read it doesnt seem

#

sorry, mb

icy python May 27, 2021, 5:10 PM

#

after learning basic python, if you want to learn ML and AI, where would you start?

lapis sequoia May 27, 2021, 5:11 PM

#

TensorFlow

#

NumPy and SciPy are famous

icy python May 27, 2021, 5:12 PM

#

I've heard of numPy

lapis sequoia May 27, 2021, 5:12 PM

#

you can even learn them

lapis sequoia May 27, 2021, 5:12 PM

#

icy python I've heard of numPy

It's simple

icy python May 27, 2021, 5:12 PM

#

How should I learn them though

lapis sequoia May 27, 2021, 5:12 PM

#

There are a lot of tutorials on youtube

#

and read documentation

icy python May 27, 2021, 5:13 PM

#

Okay, thank you

lapis sequoia May 27, 2021, 5:13 PM

#

🙂

#

apart from pandas, what other modules are useful for AI and data science?

desert oar May 27, 2021, 5:41 PM

#

icy python after learning basic python, if you want to learn ML and AI, where would you sta...

imo start learning probability and statistics. don't focus too much on learning fancy python libraries yet

#

there is probably a good "data science with python" book

icy python May 27, 2021, 5:43 PM

#

ok, i actually do need to learn basic python first i was asking for future reference but thank you!

desert oar May 27, 2021, 5:43 PM

#

https://jakevdp.github.io/PythonDataScienceHandbook/ this might be good for specifically learning the basic data science tools in python, once you are comfortable with the python basics

#

but it doesn't help you learn the stats or math

#

this might be a good one too, again for learning how to use python libraries like scikit-learn https://www.oreilly.com/library/view/introduction-to-machine/9781449369880/

Introduction to Machine Learning with Python

lapis sequoia May 27, 2021, 5:44 PM

#

can it be a basic level of probability and statistics?

desert oar May 27, 2021, 5:45 PM

#

lapis sequoia can it be a basic level of probability and statistics?

what do you mean by this? you should always start by learning the basics

desert oar May 27, 2021, 5:45 PM

#

lapis sequoia apart from pandas, what other modules are useful for AI and data science?

numpy, scipy, matplotlib + seaborn, scikit-learn, spacy and nltk (for text/nlp work), tensorflow/pytorch/jax, etc.

lapis sequoia May 27, 2021, 5:46 PM

#

i mean, do you have to learn advanced probability and statistics

desert oar May 27, 2021, 5:46 PM

#

yes, eventually

#

but it depends on what you mean by "advanced"

lapis sequoia May 27, 2021, 5:46 PM

#

desert oar numpy, scipy, matplotlib + seaborn, scikit-learn, spacy and nltk (for text/nlp w...

oh ok thx

desert oar May 27, 2021, 5:46 PM

#

do you need to fully understand the measure theoretic definition of probability? no

lapis sequoia May 27, 2021, 5:47 PM

#

oh kk

desert oar May 27, 2021, 5:48 PM

#

imagine a university graduate who got an A in calculus, probability, linear algebra, and statistics

#

if you have that level of training, you know enough

#

you don't need to learn all of it at once, of course

#

you should strive to gradually learn more of it over time

lapis sequoia May 27, 2021, 5:56 PM

#

ah ok

polar stag May 27, 2021, 5:57 PM

#

can someone recommend me a good book to start data science with python?

desert oar May 27, 2021, 5:57 PM

#

@polar stag i just recommended two of them above

polar stag May 27, 2021, 5:58 PM

#

got it. thanks

sharp reef May 27, 2021, 7:30 PM

#

Is there a way to keep a jupyter notebook running while it's not actively opened in the browser and keep the output?

uncut barn May 27, 2021, 7:40 PM

#

```py
  def softmax(self):
    return np.exp(self.dataset) / np.sum(self.dataset)
```

#

does anyone know why the plot of my softmax looks wrong

#

the dataset only has 1 dimension

iron basalt May 27, 2021, 7:42 PM

#

sharp reef Is there a way to keep a jupyter notebook running while it's not actively opened...

`File --> Export Notebook As... --> Executable Script` then run the python file it gives you.

sharp reef May 27, 2021, 7:42 PM

#

iron basalt `File --> Export Notebook As... --> Executable Script` then run the python file ...

hmm I'd like to run it remotely though

#

I have my jupyter notebook set up on a remote server

iron basalt May 27, 2021, 7:43 PM

#

ssh into the server, run the file

sharp reef May 27, 2021, 7:43 PM

#

at that point what's the point of the notebook then?

iron basalt May 27, 2021, 7:43 PM

#

The point of notebook is exactly as the name implies, a notebook.

sharp reef May 27, 2021, 7:44 PM

#

If I ssh I'll have to make sure to run it such that it doesn't stop when I close the ssh session as well..

iron basalt May 27, 2021, 7:44 PM

#

It's useful for sharing things too.

iron basalt May 27, 2021, 7:44 PM

#

sharp reef If I ssh I'll have to make sure to run it such that it doesn't stop when I close...

use screen

sharp reef May 27, 2021, 7:44 PM

#

Basically I need it to stay up so I can run experiments on Azure overnight

#

I'm not sure if it's possible to submit experiments in a queue

#

that would make it easier

iron basalt May 27, 2021, 7:45 PM

#

What could be easier than making a script that does all this with one click?

#

https://www.howtogeek.com/662422/how-to-use-linuxs-screen-command/

#

(can tmux too)

sharp reef May 27, 2021, 7:47 PM

#

I know about screen

#

it's mostly about things being harder to edit if I have to sftp them in and out every time..

iron basalt May 27, 2021, 7:48 PM

#

So you want a remote editing tool?

#

That's just a text editor / IDE feature.

sharp reef May 27, 2021, 7:48 PM

#

yeah but it's not remote

#

I spent way too much time getting the notebook to run remotely in the first place, which kinda feels wasted now

#

I figured it would be possible since colab can keep sessions alive when a notebook isn't open

iron basalt May 27, 2021, 7:50 PM

#

So you have notebook running on a remote server with the --no-browser option?

sharp reef May 27, 2021, 7:51 PM

#

yeah

#

I access it via the browser to edit and run notebooks

iron basalt May 27, 2021, 7:53 PM

#

If you run some cell(s) it should just keep running.

#

Try running the notebook server on your local machine with `jupyter notebook --no-browser --port=8080`, connect, make a new notebook, run an infinite loop that prints something, and exit. It should just keep running as long as the server is running.

sharp reef May 27, 2021, 7:56 PM

#

hmm yeah it keeps running when I close it but when I open the notebook again it restarts the kernel (?)

iron basalt May 27, 2021, 8:01 PM

#

sharp reef hmm yeah it keeps running when I close it but when I open the notebook again it ...

Yeah you need to keep the page open.

desert oar May 27, 2021, 8:01 PM

#

uncut barn ```py def softmax(self): return np.exp(self.dataset) / np.sum(self.dataset...

because that's not the definition of the softmax function
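For reference, softmax is exp(x_i) / sum_j exp(x_j): the exponentials belong in the denominator too, not just the numerator. A NumPy sketch with the common max-subtraction trick, which leaves the result unchanged but avoids overflow for large inputs:

```py
import numpy as np

def softmax(x):
    """Softmax of a 1-D array: exp(x_i) / sum_j exp(x_j)."""
    x = np.asarray(x, dtype=float)
    e = np.exp(x - np.max(x))  # shifting by the max cancels out in the ratio but keeps exp finite
    return e / np.sum(e)

probs = softmax([1.0, 2.0, 3.0])
print(probs, probs.sum())
```

Inside the class, the same idea would be `e = np.exp(self.dataset - np.max(self.dataset))` followed by `return e / e.sum()`.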

sharp reef May 27, 2021, 8:03 PM

#

iron basalt Yeah you need to keep the page open.

That's exactly what I'm trying to prevent

iron basalt May 27, 2021, 8:03 PM

#

sharp reef That's exactly what I'm trying to prevent

You can't; it's a known issue that has been open on GitHub for about 5 years now, so it's not happening.

sharp reef May 27, 2021, 8:04 PM

#

iron basalt You can't it's a known issue that is open on github for about 5 years now. So it...

The weird part is that the kernel is shown as "running" when the tab is closed - it just restarts when it's reopened for some weird reason

iron basalt May 27, 2021, 8:04 PM

#

sharp reef The weird part is that the kernel is shown as "running" when the tab is closed -...

I think it means the editing session.

#

If you already ran something the output is still there.

sharp reef May 27, 2021, 8:05 PM

#

hmm I'll have to try it again

iron basalt May 27, 2021, 8:06 PM

#

If you want it to run while you are not connected, you need to download the notebook as a Python file that runs all the cells, and run that manually.

#

But doing it manually is not too hard; you just need a small script that uploads the file and runs it inside a screen session.
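Jupyter can do the export itself (File > Download as > Python, or `jupyter nbconvert --to script`). The rough stdlib-only sketch below shows what that export amounts to. It assumes the standard nbformat v4 JSON layout, where a top-level `cells` list holds entries with `cell_type` and `source`. The function name and the sample notebook are hypothetical:

```python
import json


def notebook_to_script(nb_json: str) -> str:
    """Concatenate the code cells of an nbformat-v4 notebook into one script."""
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # "source" may be stored as a single string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source)
    return "\n\n".join(chunks) + "\n"


# Hypothetical two-cell notebook, inlined so the example is self-contained;
# in practice you would read it from disk, e.g. open("train.ipynb").read()
example_nb = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Training run"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 21\n", "print(x * 2)"]},
    ],
})

script = notebook_to_script(example_nb)
print(script)
```

You could then write `script` to a file such as `train.py`, copy it to the server, and start it with `screen -S train python train.py` so it keeps running after you disconnect. Unlike nbconvert, this sketch does not translate IPython magics such as `%matplotlib`.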

sharp reef May 27, 2021, 8:06 PM

#

No it's definitely still running in the background

#

it shows as running and when I run a new code cell it queues it

#

so it can keep the process running in the background

#

It doesn't seem to be able to track which cell is running though

iron basalt May 27, 2021, 8:07 PM

#

So when I ran a simple input echo loop that runs forever and came back the cell had finished running

sharp reef May 27, 2021, 8:07 PM

#

and the output is ofc not kept in the cell output but I could log to a file instead

iron basalt May 27, 2021, 8:08 PM

#

The previous input output echos were still there

sharp reef May 27, 2021, 8:08 PM

#

iron basalt So when I ran a simple input echo loop that runs forever and came back the cell ...

It looked like that to me at first too but it was just not shown as running

#

the tab still shows the hourglass icon and trying to run another cell queues it instead of running it immediately

iron basalt May 27, 2021, 8:09 PM

#

Idk when I came back to the notebook it was no longer asking for input which means the loop has stopped.

sharp reef May 27, 2021, 8:13 PM

#

Yeah, I verified it: I ran a loop that sleeps for ~30 seconds, closed the tab, opened the notebook again and queued another code cell

#

it showed as queued and executed a few seconds later

#

so in principle it seems to work; it just doesn't recognize which code cell is currently running and stops printing the output

#

maybe that's good enough if I log to a file instead

grave frost May 27, 2021, 8:47 PM

#

yea, I just check the tab logo to see if something is running

#

but for output, logging is the simplest way
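Here is a minimal sketch using the standard-library `logging` module, so that output from a long-running cell ends up in a file even after the browser tab is closed. The log path, the logger name and the fake training loop are placeholders:

```python
import logging
import os
import tempfile

# Placeholder path; on the server you might simply use "training.log"
log_path = os.path.join(tempfile.gettempdir(), "training.log")

logger = logging.getLogger("training")
logger.setLevel(logging.INFO)
handler = logging.FileHandler(log_path, mode="w")  # "w" starts a fresh file each run
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)

for epoch in range(3):
    loss = 1.0 / (epoch + 1)  # stand-in for a real training step
    logger.info("epoch=%d loss=%.4f", epoch, loss)
```

`FileHandler` flushes after every record, so `tail -f training.log` in an SSH session shows progress live. If you re-run the cell, check `logger.handlers` first, otherwise each run adds another handler and every line gets written twice.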

sharp reef May 27, 2021, 8:52 PM

#

grave frost but for output, logging is the simplest way

Are you using the builtin python logging library for that?

grave frost May 27, 2021, 8:52 PM

#

sharp reef Are you using the builtin python logging library for that?

I mostly just train models, and TF/PT have built-in loggers available as optional callbacks

sharp reef May 27, 2021, 8:53 PM

#

I store epoch-level stats in the checkpoint itself

#

the logging would be more for Azure stuff

steep rapids May 27, 2021, 9:01 PM

#

Hey everyone, I'm trying to determine the average treatment effect (ATE) for a problem I've been working through. To do this, I'm using DoWhy + EconML. When I use the ForestDRLearner from these packages, the ATE I get back is waaaaaay bigger than it should be. Does anyone know what could be causing this?

For example, range of outcome variable is [-10, 10], binary treatment, ATE is -50
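One quick sanity check for that example: with a binary treatment, the ATE is E[Y(1)] - E[Y(0)]. If every outcome lies in [-10, 10], each expectation does too, so the true ATE must lie in [-20, 20]. An estimate of -50 is therefore impossible as a true effect, not just large. That points at the data or the model setup rather than a real effect. One guess, unverified here, is a wrong outcome column or extreme propensity weights in the doubly robust step. The sketch below computes that bound and a naive difference in means to compare against. It is independent of DoWhy/EconML, and the toy data is made up:

```python
def ate_bounds(y_min: float, y_max: float) -> tuple:
    """Range any ATE can take when outcomes lie in [y_min, y_max]."""
    # E[Y(1)] and E[Y(0)] both lie in [y_min, y_max],
    # so their difference lies in [y_min - y_max, y_max - y_min].
    return (y_min - y_max, y_max - y_min)


def naive_difference_in_means(y, t):
    """Unadjusted E[Y | T=1] - E[Y | T=0]; a baseline, not a causal estimate."""
    treated = [yi for yi, ti in zip(y, t) if ti == 1]
    control = [yi for yi, ti in zip(y, t) if ti == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


low, high = ate_bounds(-10, 10)
print(low, high)  # -20 20

# Made-up toy data just to show the call
y = [3.0, -2.0, 5.0, 1.0, -4.0, 0.0]
t = [1, 0, 1, 0, 0, 1]
estimate = -50.0  # the value reported above
print(naive_difference_in_means(y, t))
print(low <= estimate <= high)  # False: outside what is possible
```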

grave frost May 27, 2021, 9:04 PM

#

how much? down-payment? 😛

lapis sequoia May 27, 2021, 9:07 PM

#

I want to study AI at university and need some guidance, please help me!

#

BSc Artificial Intelligence

cinder lantern May 27, 2021, 9:20 PM

#

hey i got some trouble with pandas, anybody out here?

#

trying to get EMA5 from the ccxt API with pandas, but it's giving me different values than I expect

#

import ccxt
import pandas as pd

exchange = ccxt.binance({'enableRateLimit': True})
ohlcv = exchange.fetch_ohlcv(symbol='DOGE/USDT', timeframe='1h', limit=5)
# each candle is [timestamp, open, high, low, close, volume]; keep only the close
closes = [candle[4] for candle in ohlcv]
df = pd.DataFrame({'close': closes})
df['ewm'] = df['close'].ewm(span=5, min_periods=0, adjust=False, ignore_na=False).mean()
print(df)

desert oar May 27, 2021, 9:25 PM

#

hm, that looks like the right usage to me. what were you expecting that you didn't get?

cinder lantern May 27, 2021, 9:26 PM

#

lemme send some samples

desert oar May 27, 2021, 9:26 PM

#

that'd be great

cinder lantern May 27, 2021, 9:26 PM

#

output:
0 0.33208 0.332080
1 0.33461 0.332923
2 0.33247 0.332772
3 0.33088 0.332141
4 0.33400 0.332761
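For reference, with `adjust=False` pandas computes the EMA recursively. The first value is the first close, and each later value is alpha * close + (1 - alpha) * previous EMA, with alpha = 2 / (span + 1), which is 1/3 for span=5. Reproducing the output above by hand in plain Python confirms the pandas call applies that formula, so any mismatch with the chart must come from something other than the formula itself:

```python
def ema(values, span):
    """EMA matching pandas' ewm(span=span, adjust=False).mean() for data without NaNs."""
    alpha = 2 / (span + 1)
    out = [values[0]]  # seeded with the first observation
    for x in values[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out


closes = [0.33208, 0.33461, 0.33247, 0.33088, 0.33400]  # closes from the output above
for close, value in zip(closes, ema(closes, span=5)):
    print(f"{close:.5f}  {value:.6f}")
# second column: 0.332080, 0.332923, 0.332772, 0.332141, 0.332761
```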

#

following some screenies

#

that's index 3

#

I'd expect around 0.33354 for the second value

#

but it says 0.332141

#

just to be sure that the blue line indicates EMA5

#

I have updated the whole script in my first message ^

desert oar May 27, 2021, 9:33 PM

#

thanks for the updated code

#

                 close  close_ewm
timestamp                        
1622134800000  0.33208   0.332080
1622138400000  0.33461   0.332923
1622142000000  0.33247   0.332772
1622145600000  0.33088   0.332141
1622149200000  0.33494   0.333074

so this is what i got

cinder lantern May 27, 2021, 9:33 PM

#

ye same as i did, thanks

desert oar May 27, 2021, 9:34 PM

#

im not sure what im looking at with these screenshots

cinder lantern May 27, 2021, 9:34 PM

#

lemme screen the whole area

desert oar May 27, 2021, 9:34 PM

#

0.332141 should be 0.34-something?

#

they might be doing something subtly different with their ewma calculation

cinder lantern May 27, 2021, 9:35 PM

#

cinder lantern

this is the close price from that hour, which refers to the first column, index 3

desert oar May 27, 2021, 9:36 PM

#

what does the (3,5) indicate?

cinder lantern May 27, 2021, 9:36 PM

#

cinder lantern

this is the value that the close_ewm should be

#

those are the 2 EMA indicators that i put on

#

EMA3 and EMA5

desert oar May 27, 2021, 9:36 PM

#

oh, that's the value for the 3-period and 5-period versions

#

i see

#

are they using a different definition of "period"?

cinder lantern May 27, 2021, 9:37 PM

#

yes EMA5 would be calculated over the last 5 periods of 1 hour

desert oar May 27, 2021, 9:41 PM

#

what kinds of timestamps are these? i thought they were unix timestamps but then they're dates tens of thousands of years in the future

cinder lantern May 27, 2021, 9:41 PM

#

UTC timestamp in milliseconds, integer

desert oar May 27, 2021, 9:41 PM

#

ahh

#

that makes more sense
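Since these are milliseconds, divide by 1000 before treating them as Unix seconds. A small stdlib sketch follows; the helper name is made up. In pandas, `pd.to_datetime(df.index, unit='ms', utc=True)` does the same for the whole index:

```python
from datetime import datetime, timezone


def ms_to_utc(ms: int) -> datetime:
    """Convert a Unix timestamp in milliseconds to an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


for ts in (1622134800000, 1622149200000):  # first and last rows above
    print(ts, ms_to_utc(ts).isoformat())
# 1622134800000 2021-05-27T17:00:00+00:00
# 1622149200000 2021-05-27T21:00:00+00:00
```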

cinder lantern May 27, 2021, 9:42 PM

#

i tried it with timestamps, but the order seems right

#

i tried reversing the array too, but that shouldn't be it

desert oar May 27, 2021, 9:44 PM

#

btw here is how i loaded the data

import ccxt
import pandas as pd

exchange = ccxt.binance({ 'enableRateLimit': True })
ohlcv = exchange.fetch_ohlcv(symbol = 'DOGE/USDT', timeframe='1h', limit=5)
df = pd.DataFrame(ohlcv, columns = ['timestamp', 'open', 'high', 'low', 'close', 'volume'])
df.set_index('timestamp', inplace=True)
df.sort_index(inplace=True)

so yes they are definitely in ascending order

#

df['close_ewm'] = df['close'].ewm(span=5).mean()
print(df[['close', 'close_ewm']])

this gives me

                 close  close_ewm
timestamp                        
1622134800000  0.33208   0.332080
1622138400000  0.33461   0.333598
1622142000000  0.33247   0.333064
1622145600000  0.33088   0.332157
1622149200000  0.33494   0.333225

but i can't get the 2nd-to-last one to be something that rounds up to 0.34

limpid raven May 27, 2021, 9:44 PM

#

I'm not sure if this is the right place, but hi guys: I'm trying to plot my own trendline onto this graph. How do I do it?

desert oar May 27, 2021, 9:45 PM

#

limpid raven imnot sure if this is the correct place but hi guys, im trying to plot my own tr...

what would a trend line be in this case? this looks very non-linear

limpid raven May 27, 2021, 9:46 PM

#

the exponential curve you see has been converted into a straight line with a value of 0.6559..., and i want to plot it over the curve so i can see the two overlapping

desert oar May 27, 2021, 9:46 PM

#

so you want to plot a line with slope 0.6559? what is the y intercept, 0?

cinder lantern May 27, 2021, 9:47 PM

#

desert oar ```python df['close_ewm'] = df['close'].ewm(span=5).mean() print(df[['close', 'c...

it's about the same as what I already got, still not close

desert oar May 27, 2021, 9:47 PM

#

yeah @cinder lantern im not sure. maybe they are doing something slightly different with their calculation

cinder lantern May 27, 2021, 9:47 PM

#

ey alr, might google further

#

thanks anyways tho

#

lookin at this for 2 hrs alr

limpid raven May 27, 2021, 9:48 PM

#

desert oar so you want to plot a line with slope 0.6559? what is the y intercept, 0?

im not sure, i have used the line equation to get that value so if i use another y value wouldnt it make the line inaccurate? if thats not the case, ill try it rn!

#

it didnt work 😦 unless i did it wrong

#

i don't know if there is a y intercept; i got this line by using the straight line equation, (y2-y1)/(x2-x1)

cinder lantern May 27, 2021, 9:54 PM

#

lol i got it

#

man im dumb af

#

@desert oar

#

we could have seen this from our testresults

#

     close      ema3      ema5
0  0.33876  0.338760  0.338760
1  0.34095  0.339855  0.339490
2  0.34117  0.340512  0.340050
3  0.33920  0.339856  0.339767
4  0.33584  0.337848  0.338458

#

see how 0 is the same while having a different span. its cuz it had no history to calculate with....

#

i only queried 5 periods, but for the first period, i have to query 4 more backwards in time
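That is the cause. With `adjust=False`, the EMA is seeded with the first close, and that seed still carries a weight of (1 - alpha)^n after n more bars. A chart that has loaded a long history therefore gives different values from a 5-row DataFrame. For span 5 (alpha = 1/3), four extra bars only shrink the seed's weight to (2/3)^4, about 0.20. Fetching a few dozen extra candles, e.g. with a larger `limit` in `fetch_ohlcv`, and keeping only the last rows gets much closer. The plain-Python sketch below shows that decay using two made-up price series that differ only in their first value:

```python
def ema(values, span):
    """Recursive EMA, same as pandas ewm(span=span, adjust=False).mean()."""
    alpha = 2 / (span + 1)
    out = [values[0]]
    for x in values[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out


span = 5
alpha = 2 / (span + 1)
n_extra = 30  # hypothetical number of bars after the seed

# Two made-up histories that differ only in their very first close
history_a = [0.30] + [0.33] * n_extra
history_b = [0.36] + [0.33] * n_extra

gap = abs(ema(history_a, span)[-1] - ema(history_b, span)[-1])
print(gap)                                          # tiny after 30 bars
print(abs(0.30 - 0.36) * (1 - alpha) ** n_extra)    # same number, from (1 - alpha)^n
```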

cedar sun May 27, 2021, 10:14 PM

#

can i train a model on colab with files on my local machine?

desert oar May 27, 2021, 10:16 PM

#

cinder lantern i only queried 5 periods, but for the first period, i have to query 4 more backw...

Yeah, I suspected it was something like that

desert oar May 27, 2021, 10:17 PM

#

limpid raven i dont know if there is a y intercept, i got this line by using the straight lin...

plot's first argument should just be the 2 x values, and the second argument should be the 2 y values

iron basalt May 27, 2021, 10:46 PM

#

sharp reef It looked like that to me at first too but it was just not shown as running

https://stackoverflow.com/questions/32539832/keep-jupyter-notebook-running-after-closing-browser-tab

Stack Overflow

Keep Jupyter notebook running after closing browser tab

I use Jupyter Notebook to run a series of experiments that take some time.
Certain cells take way too much time to execute so it's normal that I'd like to close the browser tab and come back later....

#

%%capture output

#

Code doesn't stop when the tab closes, but the output can no longer find the current browser session, so it loses track of how it's supposed to be displayed and throws away all new output until the code that was running when the tab closed finishes.

#

(Also why my input loop stopped, it could not fetch input anymore)

broken stratus May 27, 2021, 11:07 PM

#

Does anyone know if there is a way to scrape fb data

south gull May 27, 2021, 11:10 PM

#

sure there is

#

i'd say that's #web-development

desert oar May 28, 2021, 12:12 AM

#

@broken stratus discussion of scraping facebook would be against our server rules, sorry.

#

that's because it violates the facebook terms of use, and we are not allowed to help with that

vital apex May 28, 2021, 3:30 AM

#

limpid raven

You are just plotting a and b, and these are points. If you want a line, calculate trendline = a*x + b and call plt.plot(x, trendline).
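To spell out the intercept question: a straight line needs a slope m = (y2 - y1) / (x2 - x1) and an intercept b = y1 - m * x1. You then evaluate y = m * x + b over the x values you want to draw. A small sketch follows; the two points and the helper name are made up:

```python
def line_through(p1, p2):
    """Slope and intercept of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)  # the parentheses matter
    b = y1 - m * x1            # intercept: where the line crosses x = 0
    return m, b


# Hypothetical points read off the curve
m, b = line_through((1.0, 2.0), (5.0, 4.6))
xs = [0.0, 2.5, 5.0]
ys = [m * x + b for x in xs]
print(m, b)
print(ys)
# With matplotlib: plt.plot(xs, ys) draws this line on top of the existing plot
```

If you want the best-fit line through all the points rather than through two of them, `numpy.polyfit(x, y, 1)` returns the slope and intercept of the least-squares line.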

torpid ember May 28, 2021, 4:19 AM

#

hey guys, i need to manually map specific values to other values. It's about 55 records, each mapped to a specific value with no underlying logic. I have two alternatives in mind:

make a dictionary with the 55 values and their associated values, or
just put them in two columns and match them up via a .csv or .xlsx file.

Just wondering if there's a good practice for stuff like this? What would you recommend?

opaque stratus May 28, 2021, 5:39 AM

#

Hey

#

I am training a Keras model

#

a CNN for sentence classification

#

TensorFlow tells me my GPU is available, but how can I discretely see if the GPU is being utilized during training?

iron basalt May 28, 2021, 5:46 AM

#

opaque stratus TensorFlow tells me my GPU is available, but how can I discretely see if the GPU...

OS? GPU?

opaque stratus May 28, 2021, 5:50 AM

#

fixed!

opaque stratus May 28, 2021, 6:39 AM

#

!code

arctic wedgeBOT May 28, 2021, 6:39 AM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

sleek otter May 28, 2021, 6:51 AM

#

the select query keeps giving error 1241: Operand should contain 1 column. Would anyone here know where the problem might be?

ripe forge May 28, 2021, 7:28 AM

#

torpid ember hey guys i need to manually map specific values with another value. Its like 55 ...

Maybe both? Keep it as a CSV since that's easy to write, and load it from the CSV into a dictionary in memory for use in code. If the count were smaller I might not use a CSV, but 55 seems like a large enough number
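A sketch of that approach with the standard `csv` module: the mapping lives in a two-column file, gets loaded into a dict once, and lookups are ordinary dict access. With pandas, `df['col'].map(mapping)` applies it to a whole column. The column names `source`/`target` and the sample rows are placeholders:

```python
import csv
import io

# Stand-in for open("mapping.csv", newline=""); the file has a header row
mapping_csv = io.StringIO(
    "source,target\n"
    "A17,north\n"
    "B03,south\n"
    "C44,north\n"
)


def load_mapping(f) -> dict:
    """Read a two-column CSV (source,target) into a dict."""
    reader = csv.DictReader(f)
    mapping = {}
    for row in reader:
        key = row["source"].strip()
        if key in mapping:
            # Hand-maintained files drift; fail loudly instead of silently overwriting
            raise ValueError(f"duplicate key in mapping file: {key!r}")
        mapping[key] = row["target"].strip()
    return mapping


mapping = load_mapping(mapping_csv)
print(mapping["B03"])  # south
print([mapping.get(code, "UNMAPPED") for code in ["A17", "Z99"]])  # ['north', 'UNMAPPED']
```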

inland zephyr May 28, 2021, 8:25 AM

#

guys, does anyone know a Python project that classifies whether a face in a photo is real or fake (i.e. a photo taken of a 2nd phone/monitor), using any kind of method, that I could use as a reference?

short heart May 28, 2021, 8:37 AM

#

What's the purpose of a Dense layer?

inland zephyr May 28, 2021, 8:41 AM

#

short heart Whats the purpose of Dense layer?

its to flatten the n-dimension feature into single dimension one

#

to wrap up the entire convolutions layers
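For comparison, a Keras `Dense` layer itself computes a fully connected transform, `activation(dot(input, kernel) + bias)`. In a CNN, the step that turns the n-dimensional feature maps into a vector is usually a separate `Flatten` layer placed before the `Dense` layers. Below is a NumPy sketch of both steps; the shapes are made up and the weights are random:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conv output for a batch of 2: 4x4 feature maps with 8 channels
feature_maps = rng.normal(size=(2, 4, 4, 8))

# "Flatten": reshape each example to a vector, no learned parameters
flat = feature_maps.reshape(feature_maps.shape[0], -1)  # shape (2, 128)

# "Dense": every input unit connects to every output unit
units = 10
kernel = rng.normal(scale=0.1, size=(flat.shape[1], units))  # learned weights
bias = np.zeros(units)                                        # learned bias
logits = flat @ kernel + bias                                 # shape (2, 10)
activated = np.maximum(logits, 0.0)                           # ReLU activation

print(flat.shape, logits.shape)
```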

worthy bear May 28, 2021, 10:29 AM

#

how to increase the fig size????? #help-burrito #help-bread #help-avocado

#

help me please....

#

i tried all tricks..

#

the fig size is not increasing

#

heeeloooooooo

#

please advise...

lapis sequoia May 28, 2021, 10:37 AM

#

maybe try making it horizontal?

low hornet May 28, 2021, 10:49 AM

#

Oh I see, your graph is tiny and no, I don't know how to make it bigger, sorry

grave frost May 28, 2021, 12:36 PM

#

So...I was thinking about gradient descent

#

suppose we have a simple equation where the variables are the weights for the network, and the equation is the loss function. so we would basically want to locate argmin

#

but instead of using SGD the whole time, why can't we graph it?

so like we take n samples of different random weights and graph them visually (not the TF graph). We store them in a data structure, say the weights in one column and the output loss in the other, and build a visual graph of it as we go.

Now, after we try n different combinations of weights, we see where all the local minima in the visual graph lie.
(Obviously we won't compute it for the whole domain, only for a certain number of specific values.)
Let's call n the resolution of the graph. With a decent enough resolution, we can at least guess where the global minimum might be.

Then we take the weights we guess correspond to a minimum and run SGD from there, so basically we initialize the weights and biases close to a guess of the global minimum.

On the graph, we can find minima mathematically: if we have a 3-D loss surface, a point whose surrounding points all have a higher loss is a local minimum.

We do this a few times (which would take milliseconds), and then we would have a quite good initialization for the weights of the NN.

why don't we do this?

winged stratus May 28, 2021, 1:03 PM

#

grave frost but instead of using SGD the whole time, why can't graph it? so like we take `n...

something very similar to this is what metaheuristics like simulated annealing do: they randomly pick some weights, explore multiple minima and hope to find the global minimum. In theory this sounds good, but in practice metaheuristics suck at training neural networks.

also, finding the global minimum isn't necessarily the best thing; it could be overfit to that data. A sufficiently good local minimum that generalizes well is enough

ripe forge May 28, 2021, 1:04 PM

#

Simply because it won't be as simple as you assume. These functions can get gnarly. And we don't really use SGD as is; we use it to teach, sure, but in practice we usually add some clever tricks on top of SGD (look up Adam or AdaGrad)

winged stratus May 28, 2021, 1:06 PM

#

as Darr said, randomly selecting weights and letting SGD do its work isn't very efficient. The loss space is so huge that 99.99% of the time it would be better to start from a single random initialization and let it train for longer
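A toy 1-D version of the proposal makes the trade-off concrete: sample random weights, keep the one with the lowest loss, then refine it with gradient descent. It works here because there is only one weight. A grid with r points per weight needs r^d loss evaluations for d weights, so at a resolution of 10, a network with just 100 weights would already need 10^100 evaluations. That is why the sampling stage cannot be made dense enough for real networks. The loss function below is invented for illustration:

```python
import random


def loss(w):
    # Made-up non-convex loss with two local minima (near w = -1.30 and w = 1.13)
    return w**4 - 3 * w**2 + w


def grad(w):
    return 4 * w**3 - 6 * w + 1


rng = random.Random(0)

# Stage 1: coarse random search ("resolution" n = 50 samples)
samples = [rng.uniform(-2.0, 2.0) for _ in range(50)]
w = min(samples, key=loss)

# Stage 2: plain gradient descent starting from the best sample
lr = 0.01
for _ in range(1000):
    w -= lr * grad(w)

print(round(w, 4), round(loss(w), 4))
```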

late shell May 28, 2021, 1:17 PM

#

If my data only has 1 feature, is feature scaling still required?

grave frost May 28, 2021, 1:22 PM

#

late shell If my data only has 1 feature, is feature scaling still required?

yes

late shell May 28, 2021, 1:44 PM

#

😔 But I thought scaling makes sense when there are at least 2 features on different scales. Can you explain why it's needed with just 1 feature?

teal nova May 28, 2021, 2:18 PM

#

can anyone explain mcmc

#

i get the monte carlo part, i also know what markov chains are, but i dont get how u put them together and how it works

#

like markov chains of parameters? what does that even mean

grave frost May 28, 2021, 2:32 PM

#

late shell 😔 But I though scaling makes sense when there are atleast 2 features out of sca...

it won't be as sensitive to variations in data that may be actually small, but numerically large.

#

but if you are doing linear regression, then it doesn't matter
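To illustrate that last point: for ordinary least squares solved in closed form, rescaling the single feature only rescales the fitted slope, so the predictions are identical. Scaling matters more for models trained with gradient descent, where the feature's magnitude affects step sizes, or for models with a penalty on the weights (e.g. ridge/lasso), where it changes how hard the weight is penalized. Below is a plain-Python check with made-up data:

```python
def fit_simple_ols(x, y):
    """Closed-form least-squares slope and intercept for one feature."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def standardize(x):
    """z-score: subtract the mean, divide by the (population) std."""
    n = len(x)
    m = sum(x) / n
    sd = (sum((xi - m) ** 2 for xi in x) / n) ** 0.5
    return [(xi - m) / sd for xi in x], m, sd


# Made-up data with one large-valued feature
x = [1200.0, 1500.0, 1700.0, 2100.0, 2500.0]
y = [200.0, 240.0, 265.0, 330.0, 390.0]

slope_raw, icpt_raw = fit_simple_ols(x, y)
x_std, m, sd = standardize(x)
slope_std, icpt_std = fit_simple_ols(x_std, y)

x_new = 1800.0
pred_raw = slope_raw * x_new + icpt_raw
pred_std = slope_std * ((x_new - m) / sd) + icpt_std
print(pred_raw, pred_std)  # equal up to floating-point error
```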

lament stag May 28, 2021, 3:02 PM

#

Can you help me? Why does the accuracy stay constant across epochs when training my CNN model?

desert oar May 28, 2021, 3:04 PM

#

teal nova can anyone explain mcmc

The super tldr: there are ways to construct a Markov chain such that the equilibrium distribution of the chain is a particular probability distribution. The really cool (and useful) part is that you can do this without knowing the exact analytical form of the distribution function. This enables us to fit and sample from complicated Bayesian models for which computing the exact form of the distribution function (especially the normalizing constant) would be intractable or impossible.

#

This general category of algorithms is called "Markov chain Monte Carlo". Typical MCMC algorithms include Metropolis-Hastings, Gibbs Sampling, Hamiltonian Monte Carlo, and the No U-Turn Sampler.
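A minimal random-walk Metropolis sketch makes the idea concrete. This is Metropolis-Hastings with a symmetric proposal, so the Hastings correction cancels. The chain only ever uses the *ratio* of target densities at two points, so the normalizing constant cancels and an unnormalized density is enough. The target here, an unnormalized standard normal, is chosen for illustration. After enough steps, the samples' mean and variance approach 0 and 1:

```python
import math
import random


def log_target(x):
    # Unnormalized log-density of a standard normal: the 1/sqrt(2*pi) is omitted
    return -0.5 * x * x


def metropolis(log_p, n_samples, step=1.0, x0=0.0, seed=0):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, p(proposal) / p(x)); only the ratio is needed
        if rng.random() < math.exp(min(0.0, log_p(proposal) - log_p(x))):
            x = proposal
        samples.append(x)  # a rejected move repeats the current state
    return samples


draws = metropolis(log_target, 50_000)[5_000:]  # drop burn-in
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(round(mean, 2), round(var, 2))
```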

desert oar May 28, 2021, 3:07 PM

#

lament stag Can you help me? Why does the accuracy remain constant in the epoch results in t...

Vanishing gradient?

desert oar May 28, 2021, 3:08 PM

#

worthy bear how to increase the fig size????? #help-burrito #help-bread ...

try fig.set_figheight and fig.set_figwidth, those always work for me

#

however there could be other issues here, it looks like the legend is very big but the main plotting axis is not
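For reference, the size (in inches) can be set when the figure is created with `plt.subplots(figsize=(w, h))`, or changed afterwards with `set_figwidth`/`set_figheight`. The sketch below uses the non-interactive Agg backend so it runs without a display; the data and sizes are arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt

# Option 1: choose the size (in inches) when creating the figure
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([0, 1, 2, 3], [1, 3, 2, 4], label="example series")
ax.legend()

# Option 2: resize an existing figure
fig.set_figwidth(12)
fig.set_figheight(6)

print(fig.get_size_inches())
```

When saving, `fig.savefig("plot.png", bbox_inches="tight")` also enlarges the saved area so a large legend is not cut off.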

cedar sun May 28, 2021, 3:37 PM

#

do jpg files of mxn pixels have the same quality as if it was png?

soft viper May 28, 2021, 3:38 PM

#

any good paper for image processing?

tidal bough May 28, 2021, 3:40 PM

#

cedar sun do jpg files of mxn pixels have the same quality as if it was png?

PNG is a lossless compression format, JPEG is a lossy one.

cedar sun May 28, 2021, 3:41 PM

#

so answer is no?

tidal bough May 28, 2021, 3:41 PM

#

JPEG quality depends on the settings - the higher the compression, the more it butchers the image

#

if quality is important, use PNG

#

Well, PNG is the most common one; others exist:
https://en.wikipedia.org/wiki/Lossless_compression#Raster_graphics

cedar sun May 28, 2021, 3:43 PM

#

well, my pokemon dataset was fully in PNG format. it was 4GB i guess. now i've changed it so that images with 3 channels are saved as JPG. the size dropped by half

#

but idk if a png of 3 channels has the same quality of a jpg :D

#

I mean, i did this cuz i need to upload the dataset to drive :(

tidal bough May 28, 2021, 3:44 PM

#

nah, JPEG can compress better because it introduces artifacts

cedar sun May 28, 2021, 3:45 PM

#

isnt jpg = jpeg?

tidal bough May 28, 2021, 3:45 PM

#

same, yes

cedar sun May 28, 2021, 3:45 PM

#

ok ok

tidal bough May 28, 2021, 3:45 PM

#

I mean better than PNG

#

you can play around with saving images to JPEG, and comparing them with the originals

#

here's an example

cedar sun May 28, 2021, 3:46 PM

#

anyway, now that u mentioned artifacts, i guess is good having artifacts on ur dataset, so model trains better

#

o.O

#

ah! also... i have another question

tidal bough May 28, 2021, 3:46 PM

#

cedar sun anyway, now that u mentioned _artifacts_, i guess is good having _artifacts_ on ...

https://en.wikipedia.org/wiki/Compression_artifact

Compression artifact

A compression artifact (or artefact) is a noticeable distortion of media (including images, audio, and video) caused by the application of lossy compression. Lossy data compression involves discarding some of the media's data so that it becomes small enough to be stored within the desired disk space or transmitted (streamed) within the available...

cedar sun May 28, 2021, 3:46 PM

#

sometimes... when u convert png to jpg, background has weird things... and my model works with 3 channels

#

so those shiity backgrounds... may i preprocess them?

tidal bough May 28, 2021, 3:48 PM

#

not sure what you mean by weird things

cedar sun May 28, 2021, 3:48 PM

#

wait

desert oar May 28, 2021, 3:49 PM

#

cedar sun sometimes... when u convert png to jpg, background has weird things... and my mo...

those "weird things" are the compression artifacts

cedar sun May 28, 2021, 3:49 PM

#

no

#

i dont mean that

#

discord can open png, so it looks like this

#

if i open this image as RGB

#

no alpha

#

it looks like this

#

https://gyazo.com/d70c148d764a63c7e34d2a9fd2dbdc68

Gyazo

#

it is because the background has color, but since alpha channel is there, it isnt being painted

desert oar May 28, 2021, 3:51 PM

#

yep, (255, 255, 0, 0) looks the same as (255, 0, 255, 0) because they're both fully transparent

cedar sun May 28, 2021, 3:51 PM

#

so should i do some kind of preprocessing like if alpha is 0 then paint the pixel full white

#

or something?

desert oar May 28, 2021, 3:51 PM

#

yeah i was about to suggest that

cedar sun May 28, 2021, 3:51 PM

#

okey

#

dammit

tidal bough May 28, 2021, 3:52 PM

#

Yeah, you need to blend the alpha-channel into the image, since JPEG doesn't support RGBA

#

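from PIL import Image
import io, numpy as np

fake_file = io.BytesIO()  # in-memory "file" for the JPEG bytes
img = Image.open(r"D:\Programming\1200px-Typescript_logo_2020.png")
img = img.convert("RGB")  # JPEG can't store an alpha channel, so drop it
img.save(fake_file, format="jpeg", quality=10)  # low quality -> strong artifacts
fake_file.seek(0)
img2 = Image.open(fake_file)
# calculate the per-pixel difference, summed over the RGB channels:
arr1 = np.array(img)
arr2 = np.array(img2)
diff = np.abs(arr1.astype(np.int32) - arr2.astype(np.int32)).sum(axis=2)
# scale to 0..255 for display; max(..., 1) avoids dividing by zero if the images match
diff = diff * 255 / max(np.max(diff), 1)
diff = diff.astype(np.uint8)

diff_img = Image.fromarray(diff)
diff_img.show()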

here's an example of compression artifacts

#

original

desert oar May 28, 2021, 3:53 PM

#

@tidal bough is there an "intelligent" way to do the alpha channel blending? i assume hard coding to white could cause problems if there are light colored or white objects in the image

tidal bough May 28, 2021, 3:53 PM

#

difference with result

cedar sun May 28, 2021, 3:54 PM

#

tidal bough difference with result

difference between png and jpg u mean?

cedar sun May 28, 2021, 3:54 PM

#

desert oar @tidal bough is there an "intelligent" way to do the alpha channel ble...

it shouldn't, since the only fully transparent pixels are the background

tidal bough May 28, 2021, 3:54 PM

#

this is a bad example because this is a very simple image and can be compressed well, but you can see that at the edge of letters and at the rounded edges, there are differences between original and compressed

cedar sun May 28, 2021, 3:54 PM

#

ok ok, i see

tidal bough May 28, 2021, 3:55 PM

#

desert oar @tidal bough is there an "intelligent" way to do the alpha channel ble...

I don't know of one. You need to select a background color and then mix it and the image depending on alpha. Perhaps you can somehow detect what color isn't present in the image
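Here is a NumPy sketch of that mixing step, using standard "over" compositing onto a solid background. Each output channel is alpha * foreground + (1 - alpha) * background, with alpha scaled to [0, 1]. The white default is an assumption; any colour can be passed. For whole images, Pillow can do the same by pasting the RGBA image onto a solid `Image.new("RGB", img.size, color)` with the alpha band as the mask:

```python
import numpy as np


def flatten_alpha(rgba, background=(255, 255, 255)):
    """Composite an RGBA uint8 image (H, W, 4) onto a solid background, returning RGB."""
    rgba = rgba.astype(np.float32)
    # alpha kept with shape (H, W, 1) so it broadcasts over the 3 colour channels
    rgb, alpha = rgba[..., :3], rgba[..., 3:4] / 255.0
    bg = np.asarray(background, dtype=np.float32)
    out = alpha * rgb + (1.0 - alpha) * bg
    return np.clip(np.round(out), 0, 255).astype(np.uint8)


# 1x3 test image: opaque red, fully transparent "green" (hidden colour), half-transparent blue
img = np.array([[[255, 0, 0, 255], [0, 255, 0, 0], [0, 0, 255, 128]]], dtype=np.uint8)
print(flatten_alpha(img))
# [[[255   0   0]
#   [255 255 255]
#   [127 127 255]]]
```

The hidden green in the transparent pixel disappears, which is exactly the background problem from the Bulbasaur example.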

desert oar May 28, 2021, 3:55 PM

#

cedar sun it shouldnt since the only pixels full transparent are the background

what if the pokemon has white hair or something?

cedar sun May 28, 2021, 3:55 PM

#

the edge of the hair is gonna be black

#

look, this is what ive done in the past

#

def black_white(image):
    # mask from the alpha channel: fully transparent pixels -> 255 (white), everything else -> 0 (black)
    return np.where(image[:, :, 3] == 0, 255, 0).astype('uint8')

#

https://gyazo.com/a2f6a9c9ed463d329ec822ed6b887fa1

Gyazo

#

eevee has hair tho

#

i think u mean hair could have some transparency, but some != totally transparent

#

So i only care about the pixels with alpha value == 0

#

maybe the eevee pixels have transparency 100, or maybe 85, or 1. i dont care. What i know is alpha = 0 -> no pokemon

#

but if u dont trust this 100%, an intelligent way is using a neural network that extracts the object on the picture and its mask :)

#

salient object detection, or something like that, is what this is called

#

there is a model called u2net

#

https://github.com/xuebinqin/U-2-Net

GitHub

xuebinqin/U-2-Net

The code for our newly accepted paper in Pattern Recognition 2020: "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection." - xuebinqin/U-2-Net

fast dune May 28, 2021, 4:10 PM

#

I know I’m butting into the conversation. I recently did image processing for a class (not AI; just regular image processing).

#

Always use PNG for processing. If the user input file is a JPEG, convert it to PNG before operating on it.

#

As for alpha channel problem (Bulbasaur example) you will need to replace the color of the image background with the platform background color.

#

Thankfully, the image background is usually one solid color (white or close to white).

tidal bough May 28, 2021, 4:16 PM

#

oh, there exist default background colors for platforms?

#

how'd you get that?

cedar sun May 28, 2021, 4:19 PM

#

what is a platform?

#

framework?

fast dune May 28, 2021, 4:21 PM

#

No, platform like your GUI program. Discord dark mode uses gray.

cedar sun May 28, 2021, 4:22 PM

#

ah

rose cipher May 28, 2021, 4:22 PM

#

Hey guys, I would like to start in DS and ML, but I am not good at math. What math topics are used in DS and ML?

#

Do you guys have some material to help me with math?

cedar sun May 28, 2021, 4:22 PM

#

yeah but, for the bulbasaur above, if i pass it to my model for training, model will see green tones on the background

#

cuz it will remove the alpha

tidal bough May 28, 2021, 4:22 PM

#

for ML, a ton of linear algebra and some basic calculus (derivatives, etc)

#

for DS, well, statistics and the probability theory required for it

rose cipher May 28, 2021, 4:23 PM

#

Can you guys recommend me some books to learn math?

#

One last thing. Is geometry used in ML? I just HATE GEOMETRY

tidal bough May 28, 2021, 4:24 PM

#

I mean, not really, unless you count linear algebra as such

fast dune May 28, 2021, 4:24 PM

#

@cedar sun Unfortunately I didn’t do it with ML so I don’t know that answer. Just remember that an image of size 400x500 is always 400x500 unless you physically crop it. Which you don’t because that’s annoyingly hard. Therefore, every pixel in that dimension needs a numerical value.

rose cipher May 28, 2021, 4:25 PM

#

I hate the fact that I love computer world but I have to learn math to understand

tidal bough May 28, 2021, 4:25 PM

#

(heh, I mostly remember disliking geometry in high school because it was too... nonobvious - like, you had to find a way to solve a problem, what equations to write, as opposed to just doing some mindless math like in algebra or even physics)

short heart May 28, 2021, 4:25 PM

#

does anybody know by chance whats the best accuracy anybody has ever gotten in predicting forex/stocks?

rose cipher May 28, 2021, 4:25 PM

#

Do you guys think that Khan Academy is a good place to learn math?

tidal bough May 28, 2021, 4:25 PM

#

linear algebra of the kind you'll be using in ML isn't really geometry-related

tidal bough May 28, 2021, 4:25 PM

#

rose cipher Do you guys think that Khan Academy is a good place to learn math?

sure, it has nice calculus

rose cipher May 28, 2021, 4:26 PM

#

but it's not the best, right?

cedar sun May 28, 2021, 4:26 PM

#

i think ML needs more about calculus than algebra tho

#

backpropagation is basically the heart of ML

#

and thats calculus
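For what it's worth, the calculus in backpropagation is mostly the chain rule applied layer by layer, and the linear algebra shows up because each layer is a matrix product. The tiny single-neuron example below uses loss = (sigmoid(w*x + b) - y)^2 and compares the chain-rule gradient with a finite-difference estimate. All numbers are made up:

```python
import math


def forward(w, b, x, y):
    z = w * x + b
    a = 1.0 / (1.0 + math.exp(-z))  # sigmoid activation
    return z, a, (a - y) ** 2       # squared-error loss


def backward(w, b, x, y):
    """dL/dw and dL/db via the chain rule: dL/da * da/dz * dz/dw (or dz/db)."""
    _, a, _ = forward(w, b, x, y)
    dL_da = 2.0 * (a - y)
    da_dz = a * (1.0 - a)           # derivative of the sigmoid
    dL_dz = dL_da * da_dz
    return dL_dz * x, dL_dz * 1.0   # dz/dw = x, dz/db = 1


w, b, x, y = 0.5, -0.2, 1.5, 1.0
gw, gb = backward(w, b, x, y)

# Central finite differences as an independent check
eps = 1e-6
num_gw = (forward(w + eps, b, x, y)[2] - forward(w - eps, b, x, y)[2]) / (2 * eps)
num_gb = (forward(w, b + eps, x, y)[2] - forward(w, b - eps, x, y)[2]) / (2 * eps)
print(gw, num_gw)
print(gb, num_gb)
```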