#data-science-and-ml | Python | Page 388

misty flint Mar 20, 2022, 1:03 AM

#

imblearn? are your classes that imbalanced? that may be the reason tbh

safe viper Mar 20, 2022, 1:41 AM

#

misty flint imblearn? are your classes that imbalanced? that may be the reason tbh

yeah they are quite imbalanced, but even using imblearn the best i am getting is 50%.

#

It's just so discouraging when you feel like you can't do anything to solve the problem.

#

The field feels like such a black box

misty flint Mar 20, 2022, 1:48 AM

#

yep ×100

#

kekHands

safe viper Mar 20, 2022, 1:49 AM

#

I'm only in second year of college. I really wanted to go into ML but this like broke that. Sorry im like ranting but its just so annoying.

#

Maybe just the more data science part of it is not for me

#

arg

misty flint Mar 20, 2022, 1:49 AM

#

you shouldnt get discouraged

#

these are tough problems and it doesnt mean you arent good at it

#

im a grad student and i basically get the same results as you

#

kekHands

misty flint Mar 20, 2022, 1:51 AM

#

safe viper I'm only in second year of college. I really wanted to go into ML but this like ...

also this is just one domain, NLP. IRL, you usually dont have to reinvent the wheel again aka only using sklearn

#

DoggoKek

#

im reinventing the wheel again in my DL class with minitorch

#

and sometimes it feels like im just banging my head against the table

#

kekHands

misty flint Mar 20, 2022, 2:11 AM

#

https://www.quantamagazine.org/machine-learning-reimagines-the-building-blocks-of-computing-20220315/

Quanta Magazine

Machine Learning Reimagines the Building Blocks of Computing | Quan...

Traditional algorithms power complicated computational tools like machine learning. A new approach, called algorithms with predictions, uses the power of machine learning to improve algorithms.

#

interesting read talking about applications in machine learning + cybersecurity

pseudo wren Mar 20, 2022, 2:20 AM

#

Yes I am

urban prism Mar 20, 2022, 2:30 AM

#

I have an object detection problem with two types of images: cards, and random images without cards. The card images have bounding boxes and the others don't; I only include the random images so my model learns to tell when there aren't any cards in an image. How should I store this data? (I have to turn it into TFRecord files later on because I have to use the TensorFlow Object Detection API.) I have the images as numpy files such as id-1.npy and am keeping their bounding box information in a dictionary {"id-1":[],"id-2":[448,123,343,532]} (in this case id-1 is one of the random images, so it has no bounding box information, hence the empty list). How should I go about this?
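A minimal sketch of one way to keep the annotations. It assumes each non-empty list is a single card box in [x, y, width, height] pixel format; the message doesn't say which format [448,123,343,532] is, so check that first. Each dict entry becomes the normalized per-object lists that TFRecord writers for the Object Detection API typically fill (the API's own dataset scripts use feature keys such as `image/object/bbox/xmin` and `image/object/class/label`), and a background image simply gets empty lists. The helper name `boxes_to_feature_lists`, the `"card"`/`1` label and the image size are hypothetical, and the code doesn't touch TensorFlow itself.

```python
def boxes_to_feature_lists(box, height, width, class_name="card", class_id=1):
    """Convert one annotation entry into normalized per-object lists.

    ASSUMPTION: `box` is [x, y, w, h] in pixels with at most one card per image;
    an empty list marks a background image with no objects.
    """
    if not box:
        # Negative example: every per-object list is just empty
        return {"xmin": [], "xmax": [], "ymin": [], "ymax": [],
                "class_text": [], "class_label": []}
    x, y, w, h = box
    return {
        # Coordinates are divided by the image size so they fall in [0, 1]
        "xmin": [x / width],
        "xmax": [(x + w) / width],
        "ymin": [y / height],
        "ymax": [(y + h) / height],
        "class_text": [class_name.encode("utf8")],
        "class_label": [class_id],
    }


annotations = {"id-1": [], "id-2": [448, 123, 343, 532]}
# Hypothetical image size; in practice read it from np.load(f"{image_id}.npy").shape[:2]
height, width = 1000, 1000
features = {image_id: boxes_to_feature_lists(box, height, width)
            for image_id, box in annotations.items()}
```

When writing the record, each list would go into a `tf.train.Feature` (`float_list` for the coordinates, `bytes_list` for the text, `int64_list` for the label) under the matching key.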

misty flint Mar 20, 2022, 2:58 AM

#

https://twitter.com/felipehoffa/status/1503910665144061959

Felipe Hoffa (@felipehoffa)

I'm proud of this one 💪

graceful glacier Mar 20, 2022, 3:57 AM

#

hello

#

i have a question most likely related to numpy

#

so currently i am tasked with creating a mock schedule for an nba team and here are the rules

#

#

im having issues with the second bullet point

tiny osprey Mar 20, 2022, 3:59 AM

#

you can include an EarlyStopping callback to stop training once the validation loss hasn't improved for a set number of epochs
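In Keras the callback is `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)`, passed to `model.fit(..., callbacks=[...])`. The plain-Python sketch below only illustrates the patience rule it is based on (it is not Keras's source): count epochs without improvement and stop once the count reaches `patience`. The function name and the loss values are placeholders.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses.

    Same idea as EarlyStopping(monitor="val_loss", patience=...): stop once the
    loss has failed to improve by more than `min_delta` for `patience` epochs.
    """
    best = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch  # never triggered: ran all epochs


# Placeholder losses: improve for 3 epochs, then stall
print(early_stopping_epoch([1.0, 0.8, 0.7, 0.72, 0.75, 0.71, 0.9], patience=3))  # (5, 2)
```

With `restore_best_weights=True`, the model you end up with corresponds to `best_epoch`, not the epoch where training stopped.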

graceful glacier Mar 20, 2022, 4:00 AM

#

essentially i need to create a column in pandas that generates either a 0 or a 1 but it has to generate exactly 6 1's

#

how can i accomplish this^^^?

misty flint Mar 20, 2022, 4:03 AM

#

have you tried doing it on paper first

misty flint Mar 20, 2022, 4:13 AM

#

graceful glacier essentially i need to create a column in pandas that generates either a 0 or a 1...

something like this?

#

!e

import numpy as np
x = np.ones((6,1))
print(x)

arctic wedgeBOT Mar 20, 2022, 4:14 AM

#

@misty flint :white_check_mark: Your eval job has completed with return code 0.

001 | [[1.]
002 |  [1.]
003 |  [1.]
004 |  [1.]
005 |  [1.]
006 |  [1.]]

graceful glacier Mar 20, 2022, 4:15 AM

#

how can i add this randomly to a matrix of zeros then?

#

a column of zeros to be more specific

misty flint Mar 20, 2022, 4:16 AM

#

pretty sure you can stack them

#

lets see

#

!e

import numpy as np
a = np.array([0,0,0,0,0,0])
b = np.array([1,1,1,1,1,1])
c = np.column_stack([a,b])

print(c)

arctic wedgeBOT Mar 20, 2022, 4:19 AM

#

@misty flint :white_check_mark: Your eval job has completed with return code 0.

001 | [[0 1]
002 |  [0 1]
003 |  [0 1]
004 |  [0 1]
005 |  [0 1]
006 |  [0 1]]

radiant trout Mar 20, 2022, 6:11 AM

#

graceful glacier essentially i need to create a column in pandas that generates either a 0 or a 1...

lets say ur col is of size 20,

#

col=np.zeros(20)

#

col[np.random.choice(np.arange(20), 6, replace=False)] = 1
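`np.random.choice` samples with replacement by default, so the same index can be drawn twice and the column can end up with fewer than 6 ones; `replace=False` guarantees distinct positions. A sketch of the full pandas version using NumPy's `Generator` API; the column name `flag` and the sizes are placeholders.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # seeded only so reruns give the same column
n_rows, n_ones = 20, 6               # hypothetical sizes

col = np.zeros(n_rows, dtype=int)
# replace=False picks 6 distinct row positions, so there are exactly 6 ones
col[rng.choice(n_rows, size=n_ones, replace=False)] = 1

df = pd.DataFrame({"flag": col})
```

An equivalent approach is to shuffle a fixed array, e.g. `rng.permutation(np.r_[np.ones(6, int), np.zeros(14, int)])`.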

elfin copper Mar 20, 2022, 6:13 AM

#

Hey

#

Anyone have experience in Reinforcement learning

thorn pier Mar 20, 2022, 8:40 AM

#

Hi everyone, I have a project about "scanning results of corona tests". Which algorithm should I use, KNN or SVM?

tacit basin Mar 20, 2022, 9:50 AM

#

thorn pier Hi everyone, I have a project about " scannen results of corona test " which alg...

Try both and see which gives better results
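"Try both" can be done fairly with cross-validation, so both models are scored on the same folds. A sketch using scikit-learn's bundled digits dataset purely as a stand-in for the corona-test scans; the hyperparameters are defaults, not tuned values.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in dataset (8x8 digit images), not the real project data
X, y = load_digits(return_X_y=True)

models = {
    # Scaling matters for both: KNN uses distances, the RBF SVM uses a distance-based kernel
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}
scores = {name: cross_val_score(model, X, y, cv=5) for name, model in models.items()}
for name, s in scores.items():
    print(f"{name}: {s.mean():.3f} +/- {s.std():.3f}")
```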

wicked grove Mar 20, 2022, 10:18 AM

#

Hello, I'm training a VGG19 model using k-fold cross-validation and want to plot the validation accuracy

#

Should i use this ?

#

https://inria.github.io/scikit-learn-mooc/python_scripts/cross_validation_validation_curve.html

#

or this https://machinelearningmastery.com/how-to-develop-a-cnn-from-scratch-for-fashion-mnist-clothing-classification/

Machine Learning Mastery

Deep Learning CNN for Fashion-MNIST Clothing Classification

The Fashion-MNIST clothing classification problem is a new standard dataset used in computer vision and deep learning. Although the dataset […]

eager wedge Mar 20, 2022, 11:12 AM

#

tiny osprey you can include a EarlyStopping parameter as a callback to monitor if validation...

ok thx

tacit basin Mar 20, 2022, 11:28 AM

#

wicked grove Hello, im training a vgg 19 model and im using k fold cross validation and want ...

can you show an example of what you want the output to be?

wicked grove Mar 20, 2022, 11:29 AM

#

tacit basin can you show example of what you desired output to be?

Something like this but for all my folds

tacit basin Mar 20, 2022, 11:29 AM

#

wicked grove Something like this but for all my folds

just for one epoch?

#

so like if you train for 10 epochs, at each epoch you will have a different validation loss or other metric. you can use only the last epoch, that's a choice. just remember that neural nets can overfit with too many epochs, and then you will see overfitted results where some previous epochs could have been better, unless you use early stopping or save the best model checkpoint

#

so yeah, with early stopping or saving the best checkpoint you could just plot one loss value / metric per training run

#

then you could just append the results to a list and calculate the mean, standard deviation etc
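A sketch of "one number per fold, then mean and standard deviation". It assumes one dict per fold shaped like Keras's `History.history` (so `[h.history for h in histories]` would fit) and takes each fold's validation accuracy at its lowest-`val_loss` epoch, which is what `restore_best_weights=True` or a best-only checkpoint would keep. The function name and the numbers in the example are placeholders.

```python
import numpy as np

def summarize_folds(fold_histories):
    """Pick one validation accuracy per fold and aggregate across folds.

    fold_histories: list of dicts like {"val_loss": [...], "val_accuracy": [...]}.
    """
    best_acc = []
    for h in fold_histories:
        best_epoch = int(np.argmin(h["val_loss"]))  # epoch with the lowest validation loss
        best_acc.append(h["val_accuracy"][best_epoch])
    best_acc = np.array(best_acc)
    # np.std uses ddof=0 (population std) by default; pass ddof=1 for the sample std
    return best_acc, best_acc.mean(), best_acc.std()


# Placeholder histories for two folds
folds = [
    {"val_loss": [0.9, 0.5, 0.6], "val_accuracy": [0.6, 0.8, 0.75]},
    {"val_loss": [0.7, 0.4, 0.45], "val_accuracy": [0.7, 0.9, 0.88]},
]
per_fold, mean_acc, std_acc = summarize_folds(folds)
```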

unique tartan Mar 20, 2022, 11:33 AM

#

I don't even have money hahah !

tacit basin Mar 20, 2022, 11:34 AM

#

these are great (and free) courses:
https://www.fun-mooc.fr/en/courses/machine-learning-python-scikit-learn/
course.fast.ai

FUN MOOC

Machine learning in Python with scikit-learn

Build predictive models with scikit-learn and gain a practical understanding of the strengths and limitations of machine learning!

unique tartan Mar 20, 2022, 11:35 AM

#

I have just heard it before , but I don't know what are its functions

#

Thanks

wicked grove Mar 20, 2022, 11:36 AM

#

wicked grove Mar 20, 2022, 11:37 AM

#

tacit basin so like if you train for 10 epochs, at each epoch you will have different valida...

Yess i am using early stopping

#

And i want for each fold the last epoch's accuracy and loss curve to be plotted

wicked grove Mar 20, 2022, 11:39 AM

#

wicked grove

@tacit basin how do i get this kinda graph kfold cross validation

tacit basin Mar 20, 2022, 11:41 AM

#

wicked grove <@490342783572246538> how do i get this kinda graph kfold cross validation

what do you want on x and y axis?

wicked grove Mar 20, 2022, 11:43 AM

#

tacit basin what do you want on x and y axis?

Accuracy vs the fold

#

And loss vs the fold

tacit basin Mar 20, 2022, 11:44 AM

#

so you would just add the accuracy and loss for each fold to, say, a list and then plot that. would that work?

#

you are probably also interested in the average value and standard deviation, for example?

#

something like that:

#

#

https://inria.github.io/scikit-learn-mooc/python_scripts/parameter_tuning_grid_search.html

odd meteor Mar 20, 2022, 12:15 PM

#

unique tartan I don't even have money hahah !

Some ML courses on Udemy are pocket-friendly; you can get something good for less than $50 there.

Alternatively, you can try free courses and bootcamps. Check the DPhi bootcamp, you might like it.

https://dphi.tech/courses#exploreCourses

DPhi - Democratizing Data Science Learning

Learn Data Science for free through courses, practice on real-world datasets and discuss with leading domain experts and data science enthusiasts.

steady basalt Mar 20, 2022, 12:43 PM

#

unique tartan I have just heard it before , but I don't know what are its functions

Hahaha don’t say u know tensorflow if u can’t use it 😂

wicked grove Mar 20, 2022, 1:14 PM

#

tacit basin so you would just add the accuracy and losses for each fold to say list and then...

something like this?

#

but i can't understand the plot i am getting

#

import matplotlib.pyplot as plt

# histories: one Keras History object per fold
for i in range(len(histories)):
    # plot loss
    plt.subplot(211)
    plt.title('Cross Entropy Loss')
    plt.plot(histories[i].history['loss'], color='blue', label='train')
    plt.plot(histories[i].history['val_loss'], color='orange', label='test')
    # plot accuracy
    plt.subplot(212)
    plt.title('Classification Accuracy')
    plt.plot(histories[i].history['accuracy'], color='blue', label='train')
    plt.plot(histories[i].history['val_accuracy'], color='orange', label='test')
plt.tight_layout()  # keep the two subplot titles from overlapping
plt.show()

wicked grove Mar 20, 2022, 1:16 PM

#

tacit basin you are probably interested in average value and st deviation for example as we...

but this is only needed for test set right?

tacit basin Mar 20, 2022, 1:25 PM

#

wicked grove but this is only needed for test set right?

plt.scatter(folds, accuracies)
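Expanding that one-liner into a full "accuracy vs fold" plot, with the mean and a ±1 standard deviation band drawn across the folds. The accuracy values are placeholders for whatever you collected per fold, and the output file name is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere; remove it in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Placeholder per-fold results; replace with the values you collected
accuracies = np.array([0.81, 0.78, 0.84, 0.80, 0.79])
folds = np.arange(1, len(accuracies) + 1)
mean, std = accuracies.mean(), accuracies.std()

fig, ax = plt.subplots()
ax.scatter(folds, accuracies, label="val accuracy per fold")
ax.axhline(mean, linestyle="--", label=f"mean = {mean:.3f}")
ax.axhspan(mean - std, mean + std, alpha=0.2, label="±1 std")
ax.set_xlabel("fold")
ax.set_ylabel("accuracy")
ax.set_xticks(folds)
ax.legend()
fig.savefig("accuracy_per_fold.png")  # or plt.show() with an interactive backend
```

The same pattern works for loss vs fold: swap in the per-fold losses.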

wicked grove Mar 20, 2022, 1:25 PM

#

tacit basin pkt.scatter(folds, accuracies)

Oh okay thank you

wicked grove Mar 20, 2022, 1:26 PM

#

wicked grove ```py for i in range(len(histories)): # plot loss plt.subplot(21...

I used this

#

And got this graph

#

@tacit basin but idk if this is correct

tacit basin Mar 20, 2022, 1:26 PM

#

What graph did you get?

wicked grove Mar 20, 2022, 1:27 PM

#

wicked grove Mar 20, 2022, 1:27 PM

#

tacit basin What graph did you get?

This

#

Like does it mean it's underfitting cause the curves fall flat

tacit basin Mar 20, 2022, 1:53 PM

#

wicked grove Like does it mean it's underfitting cause the curves fall flat

You had 50 folds?

#

Or colors are folds? And x axis epochs?

wicked grove Mar 20, 2022, 2:00 PM

#

tacit basin Or colors are folds? And x axis epochs?

x axis is epochs, the orange is val accuracy and blue is train accuracy

tacit basin Mar 20, 2022, 2:01 PM

#

Nice

serene scaffold Mar 20, 2022, 2:05 PM

#

@slim frigate ask here.

wicked grove Mar 20, 2022, 2:06 PM

#

tacit basin Nice

But i cant understand the graph

#

Im getting kinda confused

#

If it's underfitting, because the loss curves look flat

tacit basin Mar 20, 2022, 2:07 PM

#

It kind of starts overfitting from epoch 6 ish

#

At least on the blue/orange curves that start at a high loss

#

Not sure why the other folds start at low losses, maybe something with the setup?

wicked grove Mar 20, 2022, 2:16 PM

#

tacit basin Not sure why the other folds start at low losses, maybe something with the setup...

Oh like?🙈

#

When i read these graphs

#

Am i only supposed to look at the loss curve or the accuracy one as well

tacit basin Mar 20, 2022, 2:18 PM

#

Yes, both. Loss is what the model uses to adjust its weights; accuracy is the human-readable metric.
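A small illustration of why the two can tell different stories: two sets of predictions with identical accuracy but very different loss. It uses binary cross-entropy for simplicity (a multi-class VGG model would use categorical cross-entropy, but the point is the same), and the labels and probabilities are made up.

```python
import numpy as np

def binary_cross_entropy(y_true, p):
    """Mean log loss: a smooth quantity the optimizer can take gradients of."""
    p = np.clip(p, 1e-7, 1 - 1e-7)  # keep log() finite
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def accuracy(y_true, p, threshold=0.5):
    """Fraction of correct hard predictions: easy to read, but piecewise constant in p."""
    return float(np.mean((p >= threshold) == y_true))

y = np.array([1, 0, 1, 0])
confident = np.array([0.95, 0.05, 0.90, 0.10])
hesitant = np.array([0.55, 0.45, 0.60, 0.40])

for name, p in [("confident", confident), ("hesitant", hesitant)]:
    # Both get every sample right (accuracy 1.0), but the loss differs a lot
    print(name, accuracy(y, p), round(binary_cross_entropy(y, p), 3))
```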

steady basalt Mar 20, 2022, 2:27 PM

#

Anyone know in pandas how to return only rows which meet condition

#

.where and .iloc are giving errors

#

I want to say where the columns are saying True, or 1 would also work as I converted to binary

tidal bough Mar 20, 2022, 2:36 PM

#

one usually does something like df[df["country"] == "my cool country"] or whatever.

#

the idea is that you get a boolean Series from comparisons, and you can index with it.
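A sketch of boolean-mask filtering on a toy frame (column names are placeholders). It also shows why `.where` looked wrong: it keeps every row and fills the non-matching ones with NaN. `.iloc` is positional and typically rejects a boolean Series (it accepts `mask.to_numpy()`), which may explain the `iloc` errors.

```python
import pandas as pd

df = pd.DataFrame({"country": ["A", "B", "A"], "flag": [True, False, True]})

mask = df["country"] == "A"             # comparison -> boolean Series, one entry per row
only_a = df[mask]                       # keeps only the rows where mask is True

only_flagged = df[df["flag"]]           # a bool column already is a mask; no "== True" needed
also_flagged = df.loc[df["flag"] == 1]  # same result when the column holds 0/1 values

masked = df.where(mask)                 # same shape as df; non-matching rows become NaN
```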

odd meteor Mar 20, 2022, 2:37 PM

#

steady basalt I want to say where the columns are saying True, or 1 would also work as I conv...

Try this : df[df['A'] == True]

wicked grove Mar 20, 2022, 2:38 PM

#

tacit basin Yes both. Loss is for model to adjust weights, accuracy is human read able metri...

So in the case of accuracy the train curve should be below the val curve?

slim frigate Mar 20, 2022, 2:39 PM

#

serene scaffold <@!691427365926076416> ask here.

data = {
    "Joe": {
        "math": 65,
        "science": 78,
        "english": 98,
        "gym": 89
    },
    "Bill": {
        "math": 55,
        "science": 72,
        "english": 87,
        "gym": 95
    },
    "Tim": {
        "math": 100,
        "science": 45,
        "english": 75,
        "gym": 92
    },
    "Sally": {
        "math": 30,
        "science": 25,
        "english": 45,
        "gym": 100
    },
    "Jane": {
        "math": 100,
        "science": 100,
        "english": 100,
        "gym": 60
    }
}

i want to add the values with inputs, like an app, so i don't have to keep editing the dictionary every time

serene scaffold Mar 20, 2022, 2:41 PM

#

slim frigate data = { "Joe": { "math": 65, "science": 78, "englis...

I don't understand the question, sorry

radiant trout Mar 20, 2022, 2:46 PM

#

slim frigate data = { "Joe": { "math": 65, "science": 78, "englis...

are u talking about GUIs like Tkinter or something?

manic tangle Mar 20, 2022, 2:50 PM

#

maybe a very broad question, but how do I decide whether I should try and write an algorithm or use an ai / machine learning approach for a task?

serene scaffold Mar 20, 2022, 2:53 PM

#

manic tangle maybe a very broad question, but how do I decide whether I should try and write ...

if the problem you're trying to solve can be solved with an exact procedure, don't use AI. if you can't, but you have training data, you can use AI.

slim frigate Mar 20, 2022, 3:11 PM

#

slim frigate data = { "Joe": { "math": 65, "science": 78, "englis...

@serene scaffold like this

def info():
    name = input('Your name ? : ')
    math = input('Your math degree ? : ')
    science = input('Your science degree ? : ')
    english = input('Your english degree ? : ')
    gym = input('Your gym degree ? : ')
info()
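A sketch of one way to get from those `input()` calls to the nested `data` dict, and from there to a table pandas can export. `SUBJECTS`, `add_student` and `prompt_student` are hypothetical names; the split keeps the storage logic testable without typing anything, and the Excel export assumes an engine such as openpyxl is installed.

```python
import pandas as pd

SUBJECTS = ["math", "science", "english", "gym"]

def add_student(data, name, scores):
    """Store one student's scores (converted to int) in the nested dict."""
    data[name] = {subject: int(scores[subject]) for subject in SUBJECTS}
    return data

def prompt_student(data):
    """Interactive version: ask for a name, then one score per subject."""
    name = input("Your name ? : ")
    scores = {subject: input(f"Your {subject} score ? : ") for subject in SUBJECTS}
    return add_student(data, name, scores)

data = {}
# Non-interactive example; in the app you would call prompt_student(data) in a loop instead
add_student(data, "Joe", {"math": "65", "science": "78", "english": "98", "gym": "89"})
add_student(data, "Bill", {"math": "55", "science": "72", "english": "87", "gym": "95"})

# One row per student, one column per subject
df = pd.DataFrame.from_dict(data, orient="index")
# df.to_excel("grades.xlsx")  # requires an Excel writer engine such as openpyxl
```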

serene scaffold Mar 20, 2022, 3:16 PM

#

slim frigate <@!253696366952316929> like this def info(): name = input('Your name ? : ')...

you mentioned over DMs that this would be a pandas question, but it would appear that it is not.

slim frigate Mar 20, 2022, 3:18 PM

#

serene scaffold you mentioned over DMs that this would be a pandas question, but it would appear...

because i'm trying to export the data as excel but i can't get past this step

serene scaffold Mar 20, 2022, 3:18 PM

#

slim frigate because i'm trying to export the data as excel but i can pass this step

try using a general help channel; see #❓｜how-to-get-help

slim frigate Mar 20, 2022, 3:18 PM

#

thanks

misty flint Mar 20, 2022, 3:46 PM

#

manic tangle maybe a very broad question, but how do I decide whether I should try and write ...

exactly like stelercus said. google's first rule of ML: if you can solve a problem without ML, do it. DoggoKek

manic tangle Mar 20, 2022, 3:53 PM

#

serene scaffold if the problem you're trying to solve can be solved with an exact procedure, don...

mm I guess I should just try an algorithmic approach first, I think it maybe makes more sense for my problem anyways. thank you for the advice

random sapphire Mar 20, 2022, 4:09 PM

#

I just made a youtube video about working with image data in python for anyone interested in image processing for computer vision / machine learning. Let me know if you have feedback: https://www.youtube.com/watch?v=kSqxn6zGE0c

YouTube

Medallion Data Science

Working with Image Data in Python

In this video I show how to work with image data in python! Using the popular python packages matplotlib and opencv you will learn how to open image data, how the data is formatted, some ways to manipulate the data and save it off in a different format. If you enjoy you can also check out my live twitch streams (below). Image data is extremely p...


tacit basin Mar 20, 2022, 4:28 PM

#

wicked grove So in the case of accuracy the train curev should be below the val curve?

Train loss / accuracy are usually better than test/valid. You don't want to see the valid/test loss or accuracy getting much worse and going in the wrong direction, as you see on your graph

prisma mist Mar 20, 2022, 4:43 PM

#

refactored and rerunning a training set that took 40 mins last time... this should be fun

wicked grove Mar 20, 2022, 4:48 PM

#

tacit basin Train loss / accuracy are usually better than test/valid. You don't want to see ...

I didn't get you

tacit basin Mar 20, 2022, 4:53 PM

#

wicked grove I didn't get you

https://en.m.wikipedia.org/wiki/File:Overfitting_svg.svg

File:Overfitting svg.svg

#

This is the point where model started overfitting

#

Red curve ( test/valid) started going in 'the wrong' direction

#

While train (blue) decreases

prisma mist Mar 20, 2022, 5:09 PM

#

an SVM is using 100 percent of a core for training for > 20 mins... anyone know how to make it multi-threaded?
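As far as I know, scikit-learn's `SVC` (built on libsvm) fits a single model on one core and has no `n_jobs` option. What can run in parallel is the set of independent fits in cross-validation or a grid search. Scaling the features also tends to help the solver converge faster, and for large datasets `LinearSVC` or training on a subsample are the usual alternatives. A sketch on synthetic data; the parameter grid is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small synthetic stand-in for the real training set
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

pipe = make_pipeline(StandardScaler(), SVC())
grid = GridSearchCV(
    pipe,
    param_grid={"svc__C": [0.1, 1, 10]},
    cv=5,
    n_jobs=-1,  # the 3 x 5 independent fits are spread across all cores
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```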

#

austere swift Mar 20, 2022, 5:13 PM

#

wicked grove Like does it mean it's underfitting cause the curves fall flat

underfitting happens when both training and validation accuracy are low, meaning the model doesnt have the complexity needed to fit the data

#

when they fall flat that just means it converged

#

which means its done training

wicked grove Mar 20, 2022, 5:14 PM

#

tacit basin https://en.m.wikipedia.org/wiki/File:Overfitting_svg.svg

Oh okayy, thank you so much

wicked grove Mar 20, 2022, 5:15 PM

#

austere swift when they fall flat that just means it converged

Yeah so i can't understand for kfold cross validation how can i evaluate the model when there are like 5 folds

wicked grove Mar 20, 2022, 5:16 PM

#

wicked grove

This is the graph

#

And the loss looks like it is flattening

wicked grove Mar 20, 2022, 5:17 PM

#

tacit basin https://en.m.wikipedia.org/wiki/File:Overfitting_svg.svg

Should i then plot the curve only using the average like you had suggested

steady basalt Mar 20, 2022, 5:18 PM

#

Has anyone here used the ukbiobank, urgent research help needed

#

@serene scaffold perhaps?

gentle swift Mar 20, 2022, 5:20 PM

#

Some amazing tweets from data twitter this week! Take a look 👇
https://twitter.com/moderndatastack/status/1505589000240701440?t=hEmXC9_15YzJX6nEwrfY1A&s=19

Modern Data Stack 🇺🇦 (@moderndatastack)

Data Twitter this Week!

Bringing you some of the amazing tweets from data twitter for this week. If we missed any amazing tweets or tweet threads (we know we did!), add them in the thread below👇
#datatwitter #moderndatastack

tacit basin Mar 20, 2022, 6:10 PM

#

wicked grove Should i then plot the curve only using the average like you had suggested

that's an option too. but i wonder why other plots in your graph start with such low error. is this correct?
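A sketch of the "average curve" option: average a metric epoch by epoch across folds. With early stopping, folds can stop at different epochs, so this assumes it's acceptable to cut every curve to the shortest fold (padding is another option). The input is assumed to look like Keras's `History.history`, e.g. `[h.history for h in histories]`; the numbers below are placeholders.

```python
import numpy as np

def mean_curve(fold_histories, key):
    """Average one metric curve across folds, epoch by epoch.

    Curves are truncated to the shortest fold before averaging.
    Returns (mean, std), each with one value per epoch.
    """
    curves = [np.asarray(h[key], dtype=float) for h in fold_histories]
    n = min(len(c) for c in curves)
    stacked = np.stack([c[:n] for c in curves])  # shape: (n_folds, n_epochs)
    return stacked.mean(axis=0), stacked.std(axis=0)


fold_histories = [  # placeholder curves; the second fold stopped early
    {"val_loss": [1.0, 0.5, 0.4]},
    {"val_loss": [0.8, 0.7]},
]
mean, std = mean_curve(fold_histories, "val_loss")
```

To plot it, `plt.plot(mean)` plus `plt.fill_between(range(len(mean)), mean - std, mean + std, alpha=0.2)` shows both the average and the spread across folds.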

upper spindle Mar 20, 2022, 6:49 PM

#

#

i got the correct cwd but still comes out with an error

#

error is FileNotFoundError: [Errno 2] No such file or directory: 'Downloads\\sp_500_stocks(1).csv'

prisma mist Mar 20, 2022, 6:50 PM

#

upper spindle

try using the full path. your cwd suggests otherwise. also.. not important but if you're using windows use raw-strings for less chances of errors

upper spindle Mar 20, 2022, 6:53 PM

#

i tried full path but still comes out with the same error

stone marlin Mar 20, 2022, 7:03 PM

#

You could try something like

os.path.expanduser("~/Downloads/whatever.csv")
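`'Downloads\\sp_500_stocks(1).csv'` is a relative path, so Python looks for a `Downloads` folder inside the current working directory. A sketch that builds an absolute path from the home folder with `pathlib` and, if the file isn't there, lists the CSVs Python can actually see. `resolve_csv` is a hypothetical helper, and on some Windows setups Downloads may live elsewhere (e.g. redirected to OneDrive), so the folder location is an assumption to check.

```python
from pathlib import Path

def resolve_csv(filename, folder="Downloads"):
    """Return an absolute path under the home folder, or fail with a helpful message."""
    path = Path.home() / folder / filename
    if not path.is_file():
        # Show what Python actually looked for, plus some CSVs that do exist there
        nearby = sorted(p.name for p in path.parent.glob("*.csv"))[:10] if path.parent.is_dir() else []
        raise FileNotFoundError(f"{path} not found; CSV files in {path.parent}: {nearby}")
    return path
```

Usage would be something like `pd.read_csv(resolve_csv("sp_500_stocks(1).csv"))`; if it still fails, the error message shows the exact path that was tried and whether the name differs (e.g. a different `(1)` suffix).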

mint palm Mar 20, 2022, 7:05 PM

#

is embedding just encoding?

odd meteor Mar 20, 2022, 7:06 PM

#

random sapphire I just made a youtube video about working with image data in python for anyone i...

I've just subscribed. Nice Channel.

Happy early congratulations on hitting 1k subscribers. 🎉🎉🎉

You should easily hit 1k subs if about 60 more people subscribe to your channel.

Hey guys let's get RoblksCube to 1k. Consider subscribing to his YouTube channel 🙏🙏

https://youtube.com/channel/UCxladMszXan-jfgzyeIMyvw

YouTube

Medallion Data Science

Medallion Data Science is a channel devoted to growing a community of people interested in learning machine learning, data science and coding in python. Also streaming live coding sessions on twitch as Medallion Stallion https://www.twitch.tv/medallionstallion_

prisma mist Mar 20, 2022, 7:19 PM

#

someone pls help... i'm utterly stuck... trying to use pandas.concat feature

for i in enumerate(col_name):
    column_name = col_name[i[0]]
    pearson_coef, p_value = stats.pearsonr(df[column_name], df["SEVERITYCODE"])
    fuck = {
            "Column Name": column_name,
            "Pearson Correlation Coefficient": pearson_coef,
            "P-value of": p_value,
        }
    i_m_crying = pd.DataFrame(fuck)
    df_local_list.append(i_m_crying)
    percof_smry = pd.concat(df_local_list, ignore_index=True)
ValueError: If using all scalar values, you must pass an index

#

i tried every solution but i keep getting errors.. how do i use pd.concat in a for loop?

azure marsh Mar 20, 2022, 7:26 PM

#

mint palm is embedding just encoding?

One is usually done to the input of a model, the other is usually the output of a model

mint palm Mar 20, 2022, 7:28 PM

#

azure marsh One is usually done to the input a model, the other is usually the output of a m...

why do we have a dedicated layer for embedding

#

is it trained

prisma mist Mar 20, 2022, 7:30 PM

#

that's not the problem... i got the idx correct... it's the i_m_crying = pd.DataFrame(fuck) that's saying i'm using scalar value

tacit basin Mar 20, 2022, 7:30 PM

#

prisma mist that's not the problem... i got the idx correct... it's the `i_m_crying = pd.Dat...

fuck

tacit basin Mar 20, 2022, 7:31 PM

#

prisma mist that's not the problem... i got the idx correct... it's the `i_m_crying = pd.Dat...

can you create dataframe without loop with the same data?

prisma mist Mar 20, 2022, 7:32 PM

#

tacit basin can you create dataframe without loop with the same data?

manually create an entire dataframe of the columns + their Pearson Corr + p value?

serene scaffold Mar 20, 2022, 7:33 PM

#

steady basalt <@!253696366952316929> perhaps?

Please don't ping me to answer questions that I haven't already started answering

tacit basin Mar 20, 2022, 7:33 PM

#

prisma mist manually create an entire dataframe of the columns + their Pearson Corr + p valu...

yeah without the loop,
if it wants an index, give it one:
d = {'col1': [0, 1, 2, 3], 'col2': pd.Series([2, 3], index=[2, 3])}

prisma mist Mar 20, 2022, 7:33 PM

#

i can keep using the old code

percof_smry = pd.DataFrame({'Column Name': [], 'Pearson Correlation Coefficient': [], 'P-value of': []})
for i in range (0,len(col_name)):
    pearson_coef, p_value = stats.pearsonr(df[col_name[i]], df['SEVERITYCODE'])
    percof_smry = percof_smry.append({"Column Name":col_name[i],"Pearson Correlation Coefficient": pearson_coef , "P-value of": p_value }, ignore_index=True)
print(percof_smry)

but DataFrame.append is deprecated

tacit basin Mar 20, 2022, 7:34 PM

#

prisma mist i can keep using the old code ``` percof_smry = pd.DataFrame({'Column Name': []...

where is the error raised? at concat?

prisma mist Mar 20, 2022, 7:36 PM

#

tacit basin where is the error raised? at concat?

i'm trying to use the new pd.concat feature. right now the error is at

i_m_crying = pd.DataFrame(fuck)

it goes away if i use pd.Series but that's the wrong thing to use... i need the output in a concatenated DataFrame

tacit basin Mar 20, 2022, 7:42 PM

#

prisma mist i'm trying to use the new pd.concat feature. right now the error is at ``` i_m_...

If you add index as the error message suggests?
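The error comes from passing a dict whose values are all scalars: pandas can't tell how many rows to build. Two small fixes, sketched on a made-up row with the same column names as in the loop.

```python
import pandas as pd

row = {"Column Name": "X", "Pearson Correlation Coefficient": 0.5, "P-value of": 0.01}

try:
    pd.DataFrame(row)  # all values are scalars, so there is no row count to infer
except ValueError as err:
    print(err)  # If using all scalar values, you must pass an index

fix_1 = pd.DataFrame(row, index=[0])  # give the single row an explicit label
fix_2 = pd.DataFrame([row])           # a list of dicts is read as a list of rows
```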

prisma mist Mar 20, 2022, 7:43 PM

#

SOLVED:

df_local_list = []
for i in enumerate(col_name):
    column_name = col_name[i[0]]
    pearson_coef, p_value = stats.pearsonr(df[column_name], df["SEVERITYCODE"])
    fuck = [{
            "Column Name": column_name,
            "Pearson Correlation Coefficient": pearson_coef,
            "P-value of": p_value,
        }]
    i_m_crying = pd.DataFrame.from_dict(fuck)
    df_local_list.append(i_m_crying)
percof_smry = pd.concat(df_local_list, ignore_index=True)

monkey around long enough and you eventually produce works of shakespeare. my exasperation can be seen thru my variable names

azure marsh Mar 20, 2022, 7:48 PM

#

mint palm why do we have a dedicated layer for embedding

It can be learned, or a pre-trained one can be used. It can be used on inputs to reduce computational complexity by reducing the input size to the actual network, since "encodings" are not typically learned
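A NumPy sketch of that difference: an embedding layer is essentially a trainable lookup table. Multiplying a one-hot encoding by a weight matrix gives the same numbers as indexing its rows, so a dedicated embedding layer can skip the large sparse matrix product. During training, the rows of `W` get updated by backprop (or are loaded from a pre-trained embedding and optionally frozen). Sizes and token ids are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 5, 3                  # hypothetical sizes
W = rng.normal(size=(vocab_size, embed_dim))  # embedding matrix: the trainable weights

ids = np.array([2, 0, 4])                     # integer-encoded tokens
one_hot = np.eye(vocab_size)[ids]             # shape (3, 5): fixed encoding, nothing learned

via_matmul = one_hot @ W                      # a bias-free Dense layer applied to one-hot inputs
via_lookup = W[ids]                           # what an embedding layer does: pick rows directly

print(np.allclose(via_matmul, via_lookup))    # True
```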

#

https://medium.com/logivan/neural-network-embedding-and-dense-layers-whats-the-difference-fa177c6d0304

mint palm Mar 20, 2022, 7:52 PM

#

azure marsh It can be learned, or a pre-trained one can be used. It can be used on inputs to...

yes, it finally made sense. i would probably give a million dollars to developers if they started writing docs the way people explain things to each other

prisma mist Mar 20, 2022, 7:56 PM

#

DataFrame.append... simple... straightforward.. easy to use
what they replaced it with: pd.concat
how to use it:

turn your dict into a list... x=[dict]
pass it to a dataframe... framed = pd.DataFrame(x)
append it to a list again ... appendify = [] appendify.append(framed)
now you can use concat... pd.concat([your_objective_df, *appendify]) 👏

#

who comes up with these ideas?.... rewriting features just for the sake of putting out a new version

misty flint Mar 20, 2022, 8:12 PM

#

concat comes from numpy world. its good for concatenating matrices

azure marsh Mar 20, 2022, 8:13 PM

#

It's been around pandas for a long time, maybe as long as append. It's also more general

misty flint Mar 20, 2022, 8:15 PM

#

i dont understand the issue tbh.

#

kekHands

prisma mist Mar 20, 2022, 8:17 PM

#

what's not to understand? it's written right there

azure marsh Mar 20, 2022, 8:18 PM

#

The issue is it adds several steps to appending a dict to a dataframe

#

Though they can be put on one line it's ugly

#

But python's tagline is to have one obvious way of doing things

prisma mist Mar 20, 2022, 8:19 PM

#

pd.append was working fine... they didn't need to deprecate it

azure marsh Mar 20, 2022, 8:20 PM

#

https://github.com/pandas-dev/pandas/issues/35407

#

If you want to read the actual reasons

#

Series.append and DataFrame.append [are] making an analogy to list.append, but it's a poor analogy since the behavior isn't (and can't be) in place. The data for the index and values needs to be copied to create the result.

prisma mist Mar 20, 2022, 8:30 PM

#

more 👎 than 👍 ... and his reasoning was flawed... what we ended up doing was pd.concat with ignore_index=True .... seems like the devs just needed to do "something" to put out a new version and forced this issue thinking it up in isolation

#

my old code:

percof_smry = pd.DataFrame({'Column Name': [], 'Pearson Correlation Coefficient': [], 'P-value of': []})
for i in range (0,len(col_name)):
    pearson_coef, p_value = stats.pearsonr(df[col_name[i]], df['SEVERITYCODE'])
    percof_smry = percof_smry.append({"Column Name":col_name[i],"Pearson Correlation Coefficient": pearson_coef , "P-value of": p_value }, ignore_index=True)
print(percof_smry)

my new code:

df_local_list = []
for i in enumerate(col_name):
    column_name = col_name[i[0]]
    pearson_coef, p_value = stats.pearsonr(df[column_name], df["SEVERITYCODE"])
    the_values = [
        {
            "Column Name": column_name,
            "Pearson Correlation Coefficient": pearson_coef,
            "P-value of": p_value,
        }
    ]
    pass_to_df = pd.DataFrame.from_dict(the_values)
    df_local_list.append(pass_to_df)
percof_smry = pd.concat(df_local_list, ignore_index=True)
print(percof_smry)

it has to pass thru extra steps now
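For what it's worth, this loop doesn't need `DataFrame.append` or `pd.concat` at all: collect plain dicts in a Python list (appending there is cheap and in place) and build the DataFrame once at the end. A sketch on toy data, assuming scipy is installed as in the original code; `df`, the feature columns and `SEVERITYCODE` values are placeholders for the real dataset.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Toy stand-in data; column names are placeholders
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=50), "b": rng.normal(size=50)})
df["SEVERITYCODE"] = (df["a"] + 0.5 * rng.normal(size=50) > 0).astype(int)
col_name = ["a", "b"]

rows = []  # plain Python list of dicts, one per feature column
for column_name in col_name:
    pearson_coef, p_value = stats.pearsonr(df[column_name], df["SEVERITYCODE"])
    rows.append({
        "Column Name": column_name,
        "Pearson Correlation Coefficient": pearson_coef,
        "P-value of": p_value,
    })

# Build the DataFrame once, after the loop
percof_smry = pd.DataFrame(rows)
print(percof_smry)
```

This also avoids copying the whole frame on every iteration, which is the cost behind the deprecation reasoning quoted from the pandas issue.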

tacit basin Mar 20, 2022, 8:36 PM

#

What deep neural network architecture would be good for classifying large images, where both large and small details will be important for classification?

azure marsh Mar 20, 2022, 8:38 PM

#

prisma mist more 👎 than 👍 ... and his reasoning was flawed... what we ended up doing was ...

If you've ever designed architecture, you'll know popularity is not always aligned with the correct choice.

#

If you provide counters to each of their reasons that would be much more relevant than one code sample

prisma mist Mar 20, 2022, 8:40 PM

#

azure marsh If you've ever designed architecture, you'll know popularity is not always align...

if you're just looking for pointless debates just go away. don't talk to me

azure marsh Mar 20, 2022, 8:41 PM

#

Go away? Is this your discord? I'm merely asking you to think beyond your specific use case

#

You're welcome to ignore the request and exit the discussion. You're trying to debate the deprecation of an API, if you're looking for more than blind agreement perhaps look elsewhere.

modern cypress Mar 20, 2022, 8:44 PM

#

Can anyone direct me to pretrained object detection models for COCO? If anything like that exists?

tacit basin Mar 20, 2022, 8:44 PM

#

modern cypress Can anyone direct me to pretrained object detection models for COCO? If anything...

Yolov5

prisma mist Mar 20, 2022, 8:45 PM

#

go away means don't talk to me... some of us have real models to train rather than having pointless debates

tacit basin Mar 20, 2022, 8:45 PM

#

modern cypress Can anyone direct me to pretrained object detection models for COCO? If anything...

https://pytorch.org/hub/ultralytics_yolov5/

PyTorch

azure marsh Mar 20, 2022, 8:45 PM

#

prisma mist go away means don't talk to me... some of us have real models to train rather th...

You introduced the topic

modern cypress Mar 20, 2022, 8:49 PM

#

tacit basin https://pytorch.org/hub/ultralytics_yolov5/

Thanks, I'll check this out ^^

azure marsh Mar 20, 2022, 8:51 PM

#

modern cypress Can anyone direct me to pretrained object detection models for COCO? If anything...

There are a lot, as COCO is one of the largest and most popular datasets

#

One of the biggest model zoos is detectron2. https://github.com/facebookresearch/detectron2/blob/main/MODEL_ZOO.md

GitHub

detectron2/MODEL_ZOO.md at main · facebookresearch/detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks. - detectron2/MODEL_ZOO.md at main · facebookresearch/detectron2

#

But there are surely more across all frameworks

minor elbow Mar 20, 2022, 8:53 PM

#

prisma mist more 👎 than 👍 ... and his reasoning was flawed... what we ended up doing was ...

srsl = [pd.Series((d[0], *d[1:]), index=['Colname', 'Corr', 'Pval']) for d in dat]
pd.DataFrame(srsl)

#

or something like that

azure marsh Mar 20, 2022, 8:57 PM

#

And as I mentioned above you can roll multiple of your lines into one. List append doesn't need to be its own line, and neither does from_dict

#

Additionally, rolling it all into one line is less readable and maintainable, specifically the old dict creation

random sapphire Mar 20, 2022, 9:15 PM

#

odd meteor I've just subscribed. Nice Channel. Happy early congratulations on hitting 1k ...

Thanks so much for the sub! Hope to hit 1k soon 😊

prisma mist Mar 20, 2022, 9:22 PM

#

minor elbow ```dat = [(col, stats.pearsonr(df[col], df["SEVERITYCODE"])) for col in col_name...

ValueError: Length of values (2) does not match length of index (3)

minor elbow Mar 20, 2022, 9:28 PM

#

you would have to work on it a bit most probably, was just eyeballing it

#

i feel like you would do well to work on your python fundamentals though

#

depends on what stats.pearsonr gives back, you could unpack it in the first comprehension maybe

prisma mist Mar 20, 2022, 9:35 PM

#

minor elbow depends on what stats.pearsonr gives back, you could unpack it in the first comp...

yes that worked.. although a bit messy in a single list comprehension line

minor elbow Mar 20, 2022, 9:38 PM

#

generators can be nice if you need to do a bit too much dancing for a straight comprehension

#

i definitely lean towards using more lines and being clearer than trying to wrap it all in one giant thing
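In the "more lines, clearer" spirit, here is a minimal sketch of the unpacking fix. It assumes `dat` was built from `scipy.stats.pearsonr`, which returns a `(correlation, p-value)` pair. The original `pd.Series((d[0], *d[1:]), ...)` got only two values because `d[1:]` was a single tuple `(r, p)`. The names `correlation_table`, `target`, and `cols` are illustrative.

```python
import pandas as pd
from scipy import stats


def correlation_table(df, target, cols):
    """One row per column: its Pearson r with `target` and the p-value."""
    rows = []
    for col in cols:
        # pearsonr returns a (statistic, pvalue) pair; unpack it explicitly
        r, p = stats.pearsonr(df[col], df[target])
        rows.append((col, r, p))
    return pd.DataFrame(rows, columns=["Colname", "Corr", "Pval"])


# Usage with the names from the thread:
# table = correlation_table(df, "SEVERITYCODE", col_names)
```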

steady basalt Mar 20, 2022, 9:40 PM

#

prisma mist more 👎 than 👍 ... and his reasoning was flawed... what we ended up doing was ...

U have a good point that sometimes pandas has some unintuitive design

#

Love the package tho

#

Sometimes I am confused why they fix what isn’t broken

#

Btw I don’t see anything wrong with concat

#

I don’t think it’s new

minor elbow Mar 20, 2022, 9:48 PM

#

i think one of the problems is, where do u draw the line about what to keep for backwards compatibility

#

like if they acknowledge it was a bad idea/didnt really work as expected, id rather they got rid of it

#

rather than having a million different ways to do things, many of which are suboptimal

#

also maintaining a bunch of older stuff that should have been deprecated takes time away from doing newer better things

#

like as an extreme, ive worked at places that still run on mainframes. the argument is 'well they still work fine', which is true on one level, but in practice it means they cant do anything modern that their users/customers want to do

unique tartan Mar 20, 2022, 10:22 PM

#

Hey Guys! Can u send some examples of ur projects using AI

#

You know what I mean, like a face detector or whatever

grave frost Mar 20, 2022, 10:34 PM

#

@iron basalt seems Numenta is ditching HTM https://arxiv.org/abs/2201.00042
Apparently, it doesn't change things.... much for TBT, but I am currently pestering some guys about the status

anyways, it's a pretty bad paper with plenty of criticisms - not to mention the DL baselines it competed against were weak, GOFAI stuff used to disguise cheating and general sussiness regarding inconsistent methodologies across experiments

languid stratus Mar 20, 2022, 10:34 PM

#

Anyone any good with doc2vec? I'm wondering why in this tutorial/ example https://radimrehurek.com/gensim/auto_examples/howtos/run_doc2vec_imdb.html#sphx-glr-auto-examples-howtos-run-doc2vec-imdb-py
They train the model on both the train and the test set. I had thought that this was not the way it was done and that you should keep the sets separate

misty flint Mar 20, 2022, 11:02 PM

#

minor elbow like as an extreme, ive worked at place that still run on mainframes, the argume...

until everything breaks and you dont have anyone around able to fix it

#

kekHands

#

just like if your system is run on COBOL. where are you going to find COBOL programmers nowadays kekHands

warm stirrup Mar 20, 2022, 11:16 PM

#

thanks, this helped a lot

misty flint Mar 20, 2022, 11:22 PM

#

languid stratus Anyone any good with doc2vec? I'm wondering why in this tutorial/ example https:...

doesnt look like it

misty flint Mar 20, 2022, 11:24 PM

#

languid stratus Anyone any good with doc2vec? I'm wondering why in this tutorial/ example https:...

looks like it is separated even when they trained the model

languid stratus Mar 20, 2022, 11:25 PM

#

brave sand Mar 20, 2022, 11:26 PM

#

so I’m close to completing LunarLander on OpenAI, how different is that from a 4 motor drone?

mint palm Mar 20, 2022, 11:38 PM

#

how do i deal with the problem that LabelEncoder only accepts inputs of a single datatype?

#

#

these are my modelling parameters

#

what kind of encoding do you suggest
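A common workaround, assuming the column mixes strings and numbers (e.g. `"red"` and `3`): cast everything to `str` before encoding so every value has the same type. Note that `3` and `"3"` then become the same category, which is usually what you want but worth checking. The column below is made up.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical column mixing strings and ints, as often happens after read_csv
col = pd.Series(["red", 3, "blue", "3", 3, "red"])

# LabelEncoder sorts the unique values, which fails for mixed str/int,
# so convert every value to str first
encoder = LabelEncoder()
codes = encoder.fit_transform(col.astype(str))

print(encoder.classes_.tolist())  # ['3', 'blue', 'red']
print(codes.tolist())             # [2, 0, 1, 0, 0, 2]
```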

misty flint Mar 21, 2022, 12:11 AM

#

languid stratus

oh thats weird. i have no idea then kekHands

pseudo wren Mar 21, 2022, 2:46 AM

#

Having trouble getting all the elements from my dataframe into one list

#

I’m looking to get everything into a list in order to calculate polarity

#

For some reason this is not working right now

#

I’ve been trying to find an answer to this question and can’t as of yet

novel elbow Mar 21, 2022, 2:49 AM

#

pseudo wren Having trouble getting all the elements from my dataframe into one list

paste a sample of your code

karmic valley Mar 21, 2022, 2:56 AM

#

anyone help me make a quick loop. if free for 5min just @winged grove would appreciate

#

i want to create a loop or something. all my images are named in order, like part_1, part_2, part_3. i want the code to run automatically for each part_[i+1].png

#

from skimage import io, img_as_float
import numpy as np

image= io.imread(r'C:\Users\guest\Dropbox\con1_outfolder_split_30sbeforepeak2min30safterpeak\part_1.png')
image = img_as_float(image)
print(np.mean(image))

novel elbow Mar 21, 2022, 3:01 AM

#

karmic valley ```py from skimage import io, img_as_float import numpy as np image= io.imread(...

eg:

images = [img_as_float(io.imread(f'C:\Users\guest\Dropbox\con1_outfolder_split_30sbeforepeak2min30safterpeak\part_{i}.png')) for i in range(10)]

karmic valley Mar 21, 2022, 3:09 AM

#

novel elbow eg: ```python images = [img_as_float(io.imread(f'C:\Users\guest\Dropbox\con1_out...

thank you

#

the only thing i'm wondering is whether it will give them in order, because i want to see the printed results in order: first part 1, then part 2, then part 3 @novel elbow

#

got this error

 images = [img_as_float(io.imread(f'C:\Users\guest\Dropbox\con4_outfolder_split_30sbeforepeak2min30safterpeak\part_{i}.png')) for i in range(10)]
                                                                                                                             ^
SyntaxError: invalid syntax
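The error is most likely related to the backslashes: in a non-raw string, `\U` in `C:\Users` starts a Unicode escape, which Python rejects. Here is a minimal sketch of the fix, assuming the files are named `part_1.png` through `part_N.png`. It uses a raw f-string (`rf"..."`) and starts `range` at 1. A list comprehension over `range` keeps the numeric order, so results come out as part 1, part 2, part 3. The helper names `part_paths` and `print_means` are hypothetical.

```python
def part_paths(folder, n_parts):
    """Paths part_1.png, part_2.png, ... in numeric order."""
    # rf"..." is a raw f-string: backslashes such as "\U" and "\p" stay
    # literal instead of being parsed as escape sequences.
    # range(1, n_parts + 1) starts at part_1 (range(10) would start at part_0).
    return [rf"{folder}\part_{i}.png" for i in range(1, n_parts + 1)]


def print_means(folder, n_parts):
    """Print the mean pixel value of each part image, in order."""
    # Imported here so part_paths() works even without scikit-image installed
    import numpy as np
    from skimage import io, img_as_float

    for path in part_paths(folder, n_parts):
        image = img_as_float(io.imread(path))
        print(path, np.mean(image))


folder = r"C:\Users\guest\Dropbox\con4_outfolder_split_30sbeforepeak2min30safterpeak"
# print_means(folder, 10)
```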

wicked grove Mar 21, 2022, 3:25 AM

#

tacit basin that's an option too. but i wonder why other plots in your graph start with such...

Yeah idk why either, i mean i did the plotting with that code

#

How can i know if it is wrong tho

graceful glacier Mar 21, 2022, 3:35 AM

#

whats the easiest way to extract the last digit from a number in pandas?

misty flint Mar 21, 2022, 4:08 AM

#

pseudo wren Having trouble getting all the elements from my dataframe into one list

have you tried .to_list()

#

youll need to select a column from your dataframe first, since to_list() is a pandas series method

#

your_list = df['column A'].to_list()

tacit basin Mar 21, 2022, 5:29 AM

#

wicked grove Yeah idk why either, i mean i did the plotting with that code

yeah, double-check cross-validation set up and plotting. looks suspicious.

wicked grove Mar 21, 2022, 5:42 AM

#

tacit basin yeah, double-check cross-validation set up and plotting. looks suspicious.

If everything checks out what could be the issue

iron basalt Mar 21, 2022, 5:50 AM

#

grave frost <@!119925597395877889> seems Numenta is ditching HTM https://arxiv.org/abs/2201....

I don't think they are ditching it for DL, they have pretty much always been doing some DL-ish stuff too (I think Jeff already said in his book that HTM was not right, the specifics of it, but general things like having cortical columns still persist). I have not read the paper all that much so IDK about its quality. It does not interest me that much. I'm more interested in Jeff's grid cells idea (thousand brains theory, but also just for localization / regular grid cells stuff).

#

While we were inspired by HTM and such, we don't do it the way they do because it never got really good results (if it does not work, we move on, though it's hard to tell since it's a multi-armed bandit problem). The big picture of the structure and such is there / modelling the neocortex, but the details of how to do that are very different.

#

You can also see their use of DL in their first papers on grid cells based object detection in which they use a pre-trained CNN to simply demonstrate that the grid cells can identify objects given only patches of the original image in any order (as a sequence of eye movements). Ideally the DL part would be replaced with something more biologically plausible that at least gets similar results (we have done that, which also enables online learning in our case). Numenta as a company has different things going on and for me personally it's hit and miss. Sometimes it's a really nice idea like thousand brains theory, but sometimes it's kind of meh.

tacit basin Mar 21, 2022, 5:56 AM

#

maybe convert the column to string, slice it, and convert back to a number

df.AA.astype(str).str[-1:].astype(int)
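For an integer column, arithmetic avoids the string round trip entirely, because the last decimal digit is the remainder modulo 10. This sketch assumes `AA` holds integers. The `abs()` is there because pandas follows Python's convention that `%` with a positive divisor returns a non-negative result, so `-128 % 10` is `2`, not `8`. The string approach also misbehaves on float columns, where `astype(str)` turns `123` into `'123.0'`.

```python
import pandas as pd

df = pd.DataFrame({"AA": [123, 45, 7, -128]})

# Last decimal digit of each integer; abs() so -128 gives 8, not 2
df["last_digit"] = df["AA"].abs() % 10
print(df["last_digit"].tolist())  # [3, 5, 7, 8]
```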

tacit basin Mar 21, 2022, 5:57 AM

#

wicked grove If everything checks out what could be the issue

different splits give different results, would try to understand splitting

wicked grove Mar 21, 2022, 5:58 AM

#

tacit basin different splits give different resutls, would try to understand splitting

Ahh, so i have like 2963 images

#

And i split it five times

tacit basin Mar 21, 2022, 5:59 AM

#

is it classification? i forgot sry. if classification, you can check the number of samples per class in each split and also in the validation set
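The sketch below follows tacit basin's suggestion and prints the class counts in each fold. It assumes integer class labels in `y` and uses scikit-learn's `StratifiedKFold`, which keeps class proportions roughly equal across folds. The synthetic `y` only stands in for the labels of the 2963 images.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
# Stand-in labels: 2963 samples, 3 imbalanced classes (replace with your own y)
y = rng.choice(3, size=2963, p=[0.7, 0.2, 0.1])
X = np.zeros((len(y), 1))  # features are not needed just to inspect splits

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_counts = []
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    # np.bincount gives the number of samples per class label
    train_counts = np.bincount(y[train_idx], minlength=3)
    val_counts = np.bincount(y[val_idx], minlength=3)
    fold_counts.append(val_counts)
    print(f"fold {fold}: train {train_counts}, val {val_counts}")
```

If the per-fold counts of a class vary a lot, or a class is nearly absent from some folds, that alone can explain very different scores between splits.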

subtle spoke Mar 21, 2022, 6:53 AM

#

does anyone know how to use ffmpeg? I'm following these instructions but keep getting an error saying the file can't be found, even though I saved it in the same directory as the ffmpeg-split.py file

#

This is the command I ran

mint palm Mar 21, 2022, 7:01 AM

#

subtle spoke Mar 21, 2022, 7:02 AM

#

tacit basin what is it classification i forgot sry, if classification you can check number o...

I just realized I don't have ffmpeg itself installed lmao

mint palm Mar 21, 2022, 7:02 AM

#

suggest some architectures for a completely categorical dataset

slim frigate Mar 21, 2022, 7:34 AM

#

hi

#

from openpyxl import Workbook, load_workbook
from openpyxl.utils import get_column_letter
from openpyxl.styles import Font
data = {"rename": {
        "math": 20,
        "science": 20,
        "english": 20,
        "gym": 20}
        }

def info():
    name = input('Your name ? : ')
    math = input('Your math degree ? : ')
    science = input('Your science degree ? : ')
    english = input('Your english degree ? : ')
    gym = input('Your gym degree ? : ')
    data.update({name:{"math":math,"science":science,"english":english,"gym":gym }})
    input('press any key ...')

for a in range(2):
    info()
    a+=1

wb = Workbook()
ws = wb.active
ws.title = "Grades"

headings = ['Name'] + list(data['rename'].keys())
ws.append(headings)

for person in data:
    grades = list(data[person].values())
    ws.append([person] + grades)

for col in range(1, 6):
    ws[get_column_letter(col) + '1'].font = Font(bold=True, color="0099CCFF")

wb.save("NewGrades.xlsx")

#

result :

#

#

but i want to change the output so it doesn't include the rename data

#

any suggestions would help me
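This sketch shows one way to drop the `rename` row from the output. It assumes `rename` only exists to supply the column headings, so the subject names go into their own list and that key is skipped when building rows. `build_rows`, `save_rows`, and `SUBJECTS` are illustrative names, and only `save_rows` needs openpyxl.

```python
SUBJECTS = ["math", "science", "english", "gym"]


def build_rows(data, subjects=SUBJECTS):
    """Header row plus one row per student, skipping the 'rename' template."""
    rows = [["Name"] + subjects]
    for name, grades in data.items():
        if name == "rename":  # template entry, not a real student
            continue
        rows.append([name] + [grades.get(subject) for subject in subjects])
    return rows


def save_rows(rows, filename="NewGrades.xlsx"):
    from openpyxl import Workbook
    from openpyxl.styles import Font

    wb = Workbook()
    ws = wb.active
    ws.title = "Grades"
    for row in rows:
        ws.append(row)
    for cell in ws[1]:  # ws[1] is the header row
        cell.font = Font(bold=True, color="0099CCFF")
    wb.save(filename)
```

Separately, the `a+=1` inside `for a in range(2):` has no effect, because `range` already advances `a` on each iteration.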

slim frigate Mar 21, 2022, 7:41 AM

#

slim frigate but i want change the output without rename data

#

stone marlin Mar 21, 2022, 7:59 AM

#

This does not appear to be data science, you may want to ask in the standard help rooms.

mint palm Mar 21, 2022, 9:49 AM

#

embedding needs encoding first right?

rotund trellis Mar 21, 2022, 10:17 AM

#

Is there a way for me to use drone-acquired images of water/ocean/lakes and use them to check for pollution using machine learning?

haughty dust Mar 21, 2022, 11:16 AM

#

Can someone advise what filter to keep on images while making word clouds in python??
Since I am not able to get proper imprint of the person with word cloud

steady basalt Mar 21, 2022, 11:33 AM

#

rotund trellis Is there a way for me to use drone-acquired images of water/ocean/lakes and use ...

Do photos show pollution? Like colour?

fallen jackal Mar 21, 2022, 11:55 AM

#

hey, i have a cropped opencv2 image that i want to predict in a model that i made.
the model requires shape (28,28), but when i try to reshape it i get the error
"cannot reshape array of size 63948 into shape (28,28)'"

img = cv2.imread(u'/content/drive/MyDrive/Data/Project/Screenshot_44.png')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

hImg, wImg, _ = img.shape
boxes = pytesseract.image_to_boxes(img)
for b in boxes.splitlines():
  b = b.split(' ')
  x,y,w,h = int(b[1]), int(b[2]), int(b[3]), int(b[4])

  width = w-x
  height = h-y
  n = width - height

  if (width > height):
    h = int(h + n/2)
    y = int(y - n/2)

  elif (height > width):
    w = int(w - n/2)
    x = int(x + n/2)

  crop_img = img[hImg-h:hImg-y, x:w]
  reshape = crop_img.reshape((28,28))

rotund trellis Mar 21, 2022, 12:04 PM

#

steady basalt Do photos show polution? Like colour?

yes and even stuff like oil spills

grave frost Mar 21, 2022, 12:17 PM

#

iron basalt You can also see their use of DL in their first papers on grid cells based objec...

It's a weird trajectory all-in-all I suppose, what with this weird hybrid of approaches. but at least now they have started competing on benchmarks 🤷‍♂️

steady basalt Mar 21, 2022, 12:21 PM

#

rotund trellis yes and even stuff like oil spills

Do you have thousands of these images

#

How high up are they taken

#

If a vast library of these images exists, then you could do it

forest knoll Mar 21, 2022, 12:35 PM

#

Hi guys, is it possible to use CNN to train a model without labels in order to query similar images?

serene scaffold Mar 21, 2022, 12:37 PM

#

forest knoll Hi guys, is it possible to use CNN to train a model without labels in order to ...

by what standard of similarity?

humble mountain Mar 21, 2022, 12:38 PM

#

Does anyone need help with code?

serene scaffold Mar 21, 2022, 12:40 PM

#

humble mountain Someone need help for a code?

if you want to answer questions, try checking the occupied help channels for those not being addressed.

forest knoll Mar 21, 2022, 12:47 PM

#

@serene scaffold I have 20 query images which have been cropped, I need to rank 10 most similar images among 5000 images, they are not quite similar
I only know I can use some feature extraction algo like SIFT or a color histogram to find those images, but some images are still not found. And I know transfer learning is a way to adapt an existing CNN model like VGG, but what's the most accurate way to do that?
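Whichever way the features are obtained (a pretrained CNN such as VGG, contrastive training, or colour histograms), the ranking step can be the same: compare feature vectors by cosine similarity and keep the top 10. This sketch assumes the features are already extracted into arrays. The random arrays are placeholders, and `top_k_similar` is a hypothetical helper.

```python
import numpy as np


def top_k_similar(query_feats, gallery_feats, k=10):
    """Indices of the k most cosine-similar gallery items for each query."""
    # L2-normalise rows so a dot product equals cosine similarity
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T  # shape (n_queries, n_gallery)
    # argsort on negated scores puts the highest similarity first
    return np.argsort(-sims, axis=1)[:, :k]


rng = np.random.default_rng(0)
queries = rng.normal(size=(20, 512))    # 20 query images, 512-d features
gallery = rng.normal(size=(5000, 512))  # 5000 candidate images
ranking = top_k_similar(queries, gallery)
print(ranking.shape)  # (20, 10)
```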

pseudo wren Mar 21, 2022, 1:16 PM

#

misty flint have you tried `.to_list()`

I have tried .to_list(), but I need it to be a list of strings
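Here are two sketches, assuming "list of strings" means every value converted with `str`. For one column, use `astype(str)` before `to_list()`. To get every element, flatten the whole frame row by row. The column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"text": ["good", "bad"], "score": [1, 2]})

# One column as a list of strings
one_col = df["text"].astype(str).to_list()

# Every element of the DataFrame, row by row, as strings
all_values = df.astype(str).to_numpy().ravel().tolist()

print(one_col)     # ['good', 'bad']
print(all_values)  # ['good', '1', 'bad', '2']
```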

spring marsh Mar 21, 2022, 1:24 PM

#

Does it matter if I normalize my data when using logistic regression? I tried StandardScaler from sklearn and I am somehow getting way better results in the confusion matrix and classification report

grave frost Mar 21, 2022, 1:25 PM

#

forest knoll Hi guys, is it possible to use CNN to train a model without labels in order to ...

look up contrastive learning

spring marsh Mar 21, 2022, 1:25 PM

#

I can send the whole code if someone wants to take a look

sour spindle Mar 21, 2022, 1:28 PM

#

spring marsh Does it matter if I normalize my data if I am using logistic regression ? I trie...

sometimes it helps a lot

sour spindle Mar 21, 2022, 1:29 PM

#

spring marsh I can send the whole code if someone wants to take a look

yeah send it if you want

#

normalizing data for lstm layers helps a lot, since it keeps values in the useful range of the tanh and sigmoid functions

fossil ivy Mar 21, 2022, 1:30 PM

#

Hey everyone! Anyone here experienced with Elasticsearch who could help me out? I'm using ES 7.17 for lower level security, but when I try to access the running node it now tells me that I am missing authentication credentials
elasticsearch.exceptions.AuthenticationException: AuthenticationException(401, 'security_exception', 'missing authentication credentials for REST request [/persons]')

spring marsh Mar 21, 2022, 1:30 PM

#

sour spindle yeah send it if you want

I am basically worried that I might have leaked the data to the scaler. I am sending some pics to you in DM, please take a look

sour spindle Mar 21, 2022, 1:30 PM

#

spring marsh I am basically worried that I might have leaked the data to the scaler I am send...

ok
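On the leakage worry, here is a sketch of the usual safeguard, assuming scikit-learn. When `StandardScaler` and `LogisticRegression` sit in one `Pipeline`, the scaler's mean and standard deviation are learned from the training split only and then reused on the test split. The synthetic data from `make_classification` stands in for the real features.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# fit() scales with statistics from X_train only; score() reuses them on X_test
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```

Logistic regression does not strictly need scaled inputs. However, the solver tends to converge more easily, and the default L2 penalty treats features more evenly, when they share a scale. That is one plausible reason for the better results.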

forest knoll Mar 21, 2022, 1:55 PM

#

@grave frost thank you for suggestion, i'm studying contrastive learning and transfer learning to see which one is more accurate

high grove Mar 21, 2022, 2:22 PM

#

How do I make a graph in the third quadrant in matplotlib?

hasty kiln Mar 21, 2022, 2:32 PM

#

What Data Science and Data Analysis Skills Are Required to Become an ML Engineer?

#

I read in an article about the requirements to become an ML engineer, that you should have some experience with Data Science, data analysis and other major requirements.

steady basalt Mar 21, 2022, 2:44 PM

#

hasty kiln What Data Science and Data Analysis Skills Are Required to Become an ML Engineer...

Are you a tensorflow user? Do you know how to code models from scratch and are you a math god

#

actual ml engineers in research are literal gods

#

Check out some papers on designing new algorithms

#

Tbh I don’t think I’ll ever reach that level due to the hard cap on my math

hasty kiln Mar 21, 2022, 2:46 PM

#

steady basalt Are you a tensorflow user? Do you know how to code models from scratch and are y...

No, I'm a beginner trying to make a fresh start

hasty kiln Mar 21, 2022, 2:47 PM

#

steady basalt Check out some papers on designing new algorithms

Ok

hasty kiln Mar 21, 2022, 2:48 PM

#

steady basalt Tbh I don’t think I’ll ever reach that level due to the hard cap on my math

I'm not bad in math 😅

steady basalt Mar 21, 2022, 2:51 PM

#

Ur gonna need like

#

Literal years of education or experience

#

Take CS at uni maybe, then a statistical postgrad

#

Do u use python? Or R

hasty kiln Mar 21, 2022, 2:56 PM

#

steady basalt Do u use python? Or R

Python

novel elbow Mar 21, 2022, 3:02 PM

#

In my view an ml engineer is someone who knows how to apply good software engineering practices to ml model development/deployment (mainly testing and CI)

desert oar Mar 21, 2022, 3:25 PM

#

good point of course. but you might want to look into scikit-learn's dataset generators for some insight into generating somewhat "realistic" datasets https://scikit-learn.org/stable/datasets/sample_generators.html
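For instance, `make_classification` can generate an imbalanced, slightly noisy classification set with a controlled number of informative features. The parameter values below are arbitrary illustrations.

```python
from collections import Counter

from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=5,     # features that actually carry signal
    n_redundant=2,       # linear combinations of the informative ones
    n_classes=2,
    weights=[0.9, 0.1],  # roughly 90% / 10% class balance
    flip_y=0.01,         # fraction of labels assigned at random (noise)
    random_state=0,
)
print(X.shape, Counter(y))
```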

hasty kiln Mar 21, 2022, 3:41 PM

#

novel elbow In my view a ml engineer is the one who knows how to apply software eng good pr...

Are you ML engineer?

novel elbow Mar 21, 2022, 3:44 PM

#

hasty kiln Are you ML engineer?

unemployed atm, but have worked as such

unreal charm Mar 21, 2022, 3:47 PM

#

Hi, how can i encode a csv file as base64?

hasty kiln Mar 21, 2022, 3:48 PM

#

novel elbow unemployed atm, but have worked as such

Good

karmic moth Mar 21, 2022, 4:07 PM

#

Hi guys, i have a question, its simple and it has 2 parts, so basically

Should we train a CNN with Conv1D layers, with 2D or 3D data, or both are possible?
Should we train a LSTM model with 2D or 3D data, or both are possible?
I need someone's advice on this before I proceed.
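This assumes Keras, since the framework isn't named. There, both `Conv1D` and `LSTM` layers expect 3D input of shape `(samples, timesteps, features)`. 2D tabular data of shape `(samples, features)` is usually given a trailing axis of size 1, so each feature is treated as one timestep. A numpy sketch of that reshape:

```python
import numpy as np

X_2d = np.arange(12, dtype=np.float32).reshape(4, 3)  # 4 samples, 3 features

# Add a trailing axis: (samples, timesteps=3, features=1)
X_3d = X_2d[..., np.newaxis]
print(X_2d.shape, "->", X_3d.shape)  # (4, 3) -> (4, 3, 1)

# A Keras model would then take input_shape=X_3d.shape[1:], i.e. (3, 1)
```

Whether treating features as a sequence makes sense depends on the data. It does for time series or signals, and much less for unrelated tabular columns.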

mint palm Mar 21, 2022, 4:27 PM

#

sys:1: DtypeWarning: Columns (1) have mixed types.Specify dtype option on import or set low_memory=False.

RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

#

error^^^^^^^^^^ on running
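This error typically shows up on Windows, and also on macOS, whose default start method is also spawn. There, child processes re-import the main script, so any code that starts worker processes (for example a `multiprocessing.Pool`, or a PyTorch `DataLoader` with `num_workers > 0`) must sit under an `if __name__ == "__main__":` guard. Here is a minimal sketch in which `square` and `main` are placeholders for your own code. The `DtypeWarning` is a separate issue: passing `dtype=` or `low_memory=False` to `read_csv`, as the message says, should silence it.

```python
import multiprocessing as mp


def square(x):
    # Worker functions must be defined at module top level so that
    # spawned child processes can import them
    return x * x


def main():
    with mp.Pool(processes=2) as pool:
        print(pool.map(square, range(5)))  # [0, 1, 4, 9, 16]


if __name__ == "__main__":
    # Children re-import this file; the guard stops them from
    # starting their own pools in turn
    main()
```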

jovial summit Mar 21, 2022, 4:46 PM

#

Continuing from #help-cherries

For context, I'm a mechanical engineer without a ton of software experience that's now writing finite-element analysis code. We have an old mess of MPI-based C++ code that nobody understands. I'm currently looking to rewrite it. I have a simpler version of it implemented in Python now using Numba, to see if it works with multithreading the way we need.

I'm trying to determine if Numba is the best way to implement this, or if I should look into something else like Cython

Here's a snippet that shows the 'type' of operations that it's mostly based on.

@njit(parallel = True)  
def stepAllCalcs(dx, dz, ux3, uz3, ux2, uz2, ux1, uz1, lam, mu, lam_2mu, dt2rho, weights):
    co_dxx = 1/dx**2
    co_dzz = 1/dz**2
    co_dxz = 1/(4.0 * dx * dz)

    #Ux
    dux_dxx = co_dxx * (ux2[1:-1,0:-2] - 2*ux2[1:-1,1:-1] + ux2[1:-1,2:])
    dux_dzz = co_dzz * (ux2[0:-2,1:-1] - 2*ux2[1:-1,1:-1] + ux2[2:,1:-1])
    dux_dxz = co_dxz * (ux2[0:-2,2:] - ux2[2:,2:]- ux2[0:-2,0:-2] + ux2[2:,0:-2])

    (...)

    # Stress G
    stressUX = lam_2mu * dux_dxx + lam * duz_dxz + mu * (dux_dzz + duz_dxz)

It's mostly simple array addition, with some scalar multiplication.

@desert oar

desert oar Mar 21, 2022, 4:47 PM

#

other than the horrifying 70s style variable names, this looks about as good as it's going to get

#

(i know i know it's math notation rendered in ascii, i've written/used code like this)

#

maybe there are some additional optimizations for this kind of calculation but i wouldn't know any

jovial summit Mar 21, 2022, 4:49 PM

#

desert oar other than the horrifying 70s style variable names, this looks about as good as ...

yeah there are certainly some improvements to be made there

desert oar Mar 21, 2022, 4:49 PM

#

writing high-performance cython is a lot closer to C than Python

#

you are still messing with pointers and such, and even worse you now have to worry about interacting with python, thread safety, reference counting, etc.

#

and at that point you're probably better off with the original c++ application

#

which leads me to ask: is this performing significantly worse than the C++ version?

jovial summit Mar 21, 2022, 4:50 PM

#

we don't have an exact comparison between the two

desert oar Mar 21, 2022, 4:50 PM

#

it seems like numba already uses openmp for parallelization, fwiw https://numba.discourse.group/t/does-numba-support-mpi-and-or-openmp-parallelization/483/2

jovial summit Mar 21, 2022, 4:51 PM

#

this is a 2d version that I wrote a while back and am just now trying on our HPC, while the 'actual' code is 3d

#

trying to figure out if it's worth rewriting in 3d

jovial summit Mar 21, 2022, 4:53 PM

#

desert oar it seems like numba already uses openmp for parallelization, fwiw https://numba....

Is there any reason I'd want to use MPI? from my limited understanding it's mostly useful for distributed computing, while we're mostly running on one system

#

just multiple cores, which numba seems to handle fine

#

also if anyone has general input for how to approach writing large-scale stuff like this I'd appreciate it

desert oar Mar 21, 2022, 5:03 PM

#

not that i know of, no

#

i think openmp is what a lot of numerical programs use anyway

trim pond Mar 21, 2022, 5:03 PM

#

Hi I want to make a project that uses object detection. I have some tf and data science experience but never used computer vision and stuff. Which libraries or frameworks do you guys recommend?

#

Or any courses to get started?

jolly knoll Mar 21, 2022, 5:32 PM

#

Guys, I have a column called "Routes" with 900 unique values. Should I have one-hot encoded it? Haha

#

If not, what should I have done?

desert oar Mar 21, 2022, 5:39 PM

#

jolly knoll Guys, I have a column called "Routes" with 900 unique values. Should I have one-...

for what purpose?

jolly knoll Mar 21, 2022, 5:40 PM

#

for feature selection (rfe), then model development (random forest)

desert oar Mar 21, 2022, 5:42 PM

#

jolly knoll for feature selection (rfe), then model development (random forest)

probably not actually. although random forest does tend to over-weight high-cardinality categorical features

jolly knoll Mar 21, 2022, 5:42 PM

#

hmm what should I have done with it instead?
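One common alternative for a 900-level column with tree models is frequency encoding (this was not suggested in the thread). Each route is replaced with how often it occurs in the training data. The counts are learned on the training split only, and unseen routes map to 0. Another option is plain integer (ordinal) codes, which trees can split on without creating 900 sparse columns. A pandas sketch with made-up route names:

```python
import pandas as pd

train = pd.DataFrame({"Routes": ["A-B", "A-B", "B-C", "A-B", "C-D"]})
test = pd.DataFrame({"Routes": ["B-C", "A-B", "X-Y"]})  # "X-Y" unseen in train

# Learn frequencies from the training split only
freq = train["Routes"].value_counts(normalize=True)

train["Routes_freq"] = train["Routes"].map(freq)
# Routes never seen in training get frequency 0
test["Routes_freq"] = test["Routes"].map(freq).fillna(0.0)

print(test)
```

Routes with the same count become indistinguishable, which is the price of a single dense column.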

jolly knoll Mar 21, 2022, 5:43 PM

#

desert oar probably not actually. although random forest does tend to over-weight high-card...

i was getting relatively decent results, do you think it may have overfitted?

mint palm Mar 21, 2022, 6:00 PM

#

should i apply hash encoding after train_test_split??

#

or before??????????????//

desert oar Mar 21, 2022, 6:00 PM

#

jolly knoll i was getting relatively decent results, do you think it may have overfitted?

probably not, i wouldn't worry about it

desert oar Mar 21, 2022, 6:01 PM

#

mint palm should i apply hash encoding after test_train_split??

hashing specifically doesn't matter, because there is no "training" involved

mint palm Mar 21, 2022, 6:01 PM

#

desert oar hashing specifically doesn't matter, because there is no "training" involved

i have some worry regarding hashing of Y

desert oar Mar 21, 2022, 6:01 PM

#

but normally you should fit/train your data transformations only on the training set, not on the test set. data transformation is part of your model!

mint palm Mar 21, 2022, 6:01 PM

#

how should i deal with Y

desert oar Mar 21, 2022, 6:01 PM

#

mint palm i have some worry regarding hashing of Y

you probably shouldn't hash your class labels / output values

#

why would you?

mint palm Mar 21, 2022, 6:02 PM

#

consider my dataset to be 100% categorical

#

should i just one_hot y

#

or LabelEncode Y

desert oar Mar 21, 2022, 6:03 PM

#

mint palm or LabelEncode Y

LabelEncode is simpler and better-designed for this purpose

#

assuming you're using scikit-learn, OneHotEncoder has a bunch of extra features that you don't need for labels

#

and hashing makes no sense for a variety of reasons; i encourage you to think about why
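This sketch follows desert oar's suggestion and assumes scikit-learn. The class labels are encoded with a `LabelEncoder` fitted on the training labels, and `inverse_transform` turns integer predictions back into the original names. The label values are invented.

```python
from sklearn.preprocessing import LabelEncoder

y_train = ["cat", "dog", "dog", "bird"]
y_test = ["dog", "bird"]

le = LabelEncoder()
y_train_enc = le.fit_transform(y_train)  # classes sorted: bird=0, cat=1, dog=2
y_test_enc = le.transform(y_test)        # raises ValueError on unseen labels

# Model predictions come back as integers; map them to names again
print(le.inverse_transform([2, 0]).tolist())  # ['dog', 'bird']
```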

mint palm Mar 21, 2022, 6:04 PM

#

yeah ok on it, i hope it doesnt give me 100 accuracy on all three sets this time

#

else i am screwed

desert oar Mar 21, 2022, 6:04 PM

#

mint palm yeah ok on it, i hope it doesnt give me 100 accuracy on all three sets this time

if you are getting 100% accuracy on all 3 sets, then you probably accidentally put the labels into the model as a feature
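Here is a quick, hypothetical check for that kind of leak, assuming a pandas DataFrame of features `X` and a Series of labels `y`. It flags any column whose values each map to exactly one label. A copy of the label, or a feature derived from it, will show up here. A legitimately perfect predictor can too, so treat hits as things to inspect, not proof. ID-like columns with all-unique values are skipped, since they pass this test trivially.

```python
import pandas as pd


def suspicious_columns(X, y):
    """Columns where every distinct value maps to exactly one label."""
    hits = []
    for col in X.columns:
        labels_per_value = y.groupby(X[col]).nunique()
        # Skip ID-like columns: if every value is unique, this test is trivial
        if X[col].nunique() < len(X) and (labels_per_value == 1).all():
            hits.append(col)
    return hits


X = pd.DataFrame({
    "colour": ["red", "red", "blue", "blue", "green", "green"],
    "leak": ["yes", "yes", "no", "no", "yes", "no"],
    "row_id": [0, 1, 2, 3, 4, 5],
})
y = pd.Series(["yes", "yes", "no", "no", "yes", "no"])
print(suspicious_columns(X, y))  # ['leak']
```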

mint palm Mar 21, 2022, 6:05 PM

#

yeah tried all that stuff

#

i am actually not making any mistakes

#

that model was just not suited for the problem i believe thats why i am switching the model

desert oar Mar 21, 2022, 6:06 PM

#

100% accuracy isn't a bad thing btw, but it does probably mean that your model is badly overfitted

mint palm Mar 21, 2022, 6:07 PM

#

desert oar 100% accuracy isn't a bad thing btw, but it does probably mean that your model i...

exactly. thats why i am just overcomplicating model. lol to decrease accu

desert oar Mar 21, 2022, 6:12 PM

#

mint palm exactly. thats why i am just overcomplicating model. lol to decrease accu

that's not a good idea

#

that's covering up the problem, not solving it

mint palm Mar 21, 2022, 6:12 PM

#

desert oar that's not a good idea

well i have a reason, its not just that 100/100/100

#

i have lower loss on validation than on train

#

that suggests an "inconclusive performance"

desert oar Mar 21, 2022, 6:13 PM

#

that suggests bugs in your code to me, or a particularly unlucky train/test split

#

this is why i think nested cross validation is a much better approximation of out-of-sample performance than a plain split
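Here is a sketch of nested cross-validation with scikit-learn, assuming a hyperparameter search is part of the workflow. `GridSearchCV` tunes on inner folds, and `cross_val_score` evaluates that whole tuning procedure on outer folds the search never saw. The model, grid, and synthetic data are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Inner loop: pick C using only the outer-training portion of each split
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    cv=inner,
)
# Outer loop: score the tuned model on data the search never saw
scores = cross_val_score(search, X, y, cv=outer)
print(f"nested CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```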

mint palm Mar 21, 2022, 6:14 PM

#

desert oar that suggests bugs in your code to me, or a particularly unlucky train/test spli...

if you want to try debugging it, i would appreciate it a lot

desert oar Mar 21, 2022, 6:14 PM

#

you can try posting your code, can't make any guarantees

#

!paste

arctic wedgeBOT Mar 21, 2022, 6:14 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

mint palm Mar 21, 2022, 6:15 PM

#

i have the dataset and code in google drive and colab

#

are you comfortable with those

#

i can dm you, if you give me permission to

desert oar Mar 21, 2022, 6:18 PM

#

that's a bit more than i'll have time to look at

mint palm Mar 21, 2022, 6:19 PM

#

This website paste.pythondiscord.com/ is currently offline. Cloudflare's Always Online™ shows a snapshot of this web page from the Internet Archive's Wayback Machine. To check for the live version, click Refresh.

desert oar Mar 21, 2022, 6:20 PM

#

that's a shame, try https://bpaste.net

#

or its new url https://bpa.st/

supple leaf Mar 21, 2022, 6:23 PM

#

salt rock lamp, could you give me a tip how i can find the maximum value from my code:

import pandas as pd
import matplotlib.pyplot as plt
#import numpy as np

var = pd.read_excel(r'/Users/pontusskol/Desktop/data.xlsx')
print(var)

x = list(var['X values'])
y = list(var['Y values'])

plt.figure(figsize=(10,10))
plt.style.use('seaborn')
plt.plot(x,y, '-o',label='x,y')
plt.scatter(x,y,marker="o",s=100,edgecolors="black",c="yellow")
plt.title("Excel sheet to Scatter Plot")
plt.show()

Which gives me this graph as i showed you before:

#

Ive been searching on internet but I just cant make it work :/

mint palm Mar 21, 2022, 6:23 PM

#

desert oar that's a shame, try https://bpaste.net

https://bpa.st/F53A

supple leaf Mar 21, 2022, 6:24 PM

#

by using this?

import numpy as np
xmax = x[np.argmax(y)]
ymax = max(y)

young egret Mar 21, 2022, 6:24 PM

#

Is there a way to make my twint program update its data in real time?

agile cobalt Mar 21, 2022, 6:25 PM

#

young egret Is there a way to make my twint program update its data in real time?

Use the Twitter API instead of web scraping libraries

desert oar Mar 21, 2022, 6:28 PM

#

supple leaf salt rock lamp, could you give me a tip how i can find the maximum value from my...

i told you: use loess, spline, or gaussian process to interpolate. then find the max value from that

#

either you compute a bunch of values on a very finely-spaced grid and do a search, or use some numerical optimization routine
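Here is a sketch of the spline route desert oar describes. It assumes SciPy is available and the x values are strictly increasing (sort them first if not). The steps are to fit a `CubicSpline` through the points, evaluate it on a fine grid, and take the largest value. The toy data is a parabola peaking at x = 2.5, which a cubic spline with default boundary conditions reproduces exactly.

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([0, 1, 2, 3, 4, 5], dtype=float)  # stand-in for var['X values']
y = -(x - 2.5) ** 2 + 10                        # stand-in for var['Y values']

spline = CubicSpline(x, y)  # requires strictly increasing x
x_fine = np.linspace(x.min(), x.max(), 5001)
y_fine = spline(x_fine)

i = np.argmax(y_fine)
x_best, y_best = x_fine[i], y_fine[i]
print(f"max ~ {y_best:.3f} at x ~ {x_best:.3f}")  # max ~ 10.000 at x ~ 2.500
```

A cubic spline passes through every point. With noisy measurements, a smoothing method such as loess may give a more trustworthy peak.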

supple leaf Mar 21, 2022, 6:28 PM

#

okay thanks, will look into that

mint palm Mar 21, 2022, 6:36 PM

#

you got any error?

jovial summit Mar 21, 2022, 6:36 PM

#

@desert oar sorry for the random ping, but it seemed like you were knowledgeable about Numba - any ideas where I'd start troubleshooting LLVM / SVML? I'm trying to enable it for the project from earlier but it doesn't seem to be working

#

right now I'm just doing numba._try_enable_svml which always returns false despite having the libs installed

mint palm Mar 21, 2022, 6:57 PM

#

how do i handle hash encoding if my column has more than one datatype?
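This sketch assumes scikit-learn's `FeatureHasher`. It casts the column to `str` first so every value has a single type before hashing. With `input_type="string"`, each sample must be an iterable of strings, hence the one-element lists. `n_features=16` is an arbitrary choice.

```python
import pandas as pd
from sklearn.feature_extraction import FeatureHasher

col = pd.Series(["red", 3, "blue", 3.5, "red"])  # hypothetical mixed-type column

hasher = FeatureHasher(n_features=16, input_type="string")
# Each sample is a list of strings; 3 and 3.5 become "3" and "3.5"
hashed = hasher.transform([[value] for value in col.astype(str)])

print(hashed.shape)  # (5, 16)
```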

umbral sage Mar 21, 2022, 6:59 PM

#

does anyone know how I can load a model from HDFS the way pickle.load would, given that it comes from a connection instead of a file?
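`pickle.load` accepts any binary file-like object, not just files opened from disk, and `pickle.loads` takes raw bytes. So if the HDFS client you use can give you a readable stream or the file's bytes, either call works. In this sketch `io.BytesIO` stands in for that stream, and no particular HDFS library is assumed.

```python
import io
import pickle

model = {"weights": [0.1, 0.2], "bias": 0.5}  # stand-in for a trained model

# Pretend these bytes were read from HDFS over a connection
payload = pickle.dumps(model)

# Option 1: you have the raw bytes
restored = pickle.loads(payload)

# Option 2: you have a readable binary stream (needs .read() and .readline())
stream = io.BytesIO(payload)
restored_from_stream = pickle.load(stream)

print(restored == restored_from_stream == model)  # True
```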

lapis sequoia Mar 21, 2022, 7:00 PM

#

hello guys how can i fix installing kivy in anaconda errors

desert oar Mar 21, 2022, 7:12 PM

#

jovial summit <@!389497659087650836> sorry for the random ping, but it seemed like you were kn...

oof, definitely no idea. sorry

#

i have a conceptual model about how numba works, and i know what llvm is, and that's about it

lapis sequoia Mar 21, 2022, 7:20 PM

#

ERROR: Could not find a version that satisfies the requirement kivy-deps.sdl2 (from versions: none)
ERROR: No matching distribution found for kivy-deps.sdl2

#

how to fix it?

mint palm Mar 21, 2022, 7:26 PM

#

        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
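The error describes the standard idiom for platforms where `multiprocessing` starts child processes with `spawn`. That is the default on Windows, and on macOS since Python 3.8. Each child re-imports the main module, so any code that starts processes has to sit under `if __name__ == '__main__':`. Below is a minimal sketch with a made-up `square` worker.

```python
from multiprocessing import Pool, freeze_support


def square(n: int) -> int:
    # Workers must be defined at module level so child processes can import them
    return n * n


def main() -> list:
    with Pool(processes=2) as pool:
        return pool.map(square, range(5))


if __name__ == "__main__":
    freeze_support()  # only needed for frozen executables; harmless otherwise
    print(main())
```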

ocean pendant Mar 21, 2022, 7:29 PM

#

hello guys

#

im kinda new to learning curve analysis. does anyone know if this curve is good or not?

#

im trying to run optimization but I dont exactly know how I can fine tune the hyperparameters according to the learning curve

unreal charm Mar 21, 2022, 7:42 PM

#

Hi, I want to convert this:

label,question,answer
label 1,pytanie 1?,odpowiedź 1
label 1-2,pytanie 2?,odpowiedź 1
label 1-2,pytanie 1?,odpowiedź 1
label 1-2,pytanie 2?,odpowiedź 2
label 1-2,pytanie 1?,odpowiedź 2
label 2,pytanie 2?,odpowiedź 2

#

with base64

#

and instead of this:

lapis sequoia Mar 21, 2022, 7:42 PM

#

ocean pendant im kinda new to learning curve analysis . does anyone know if this curve is good...

it kinda goes down so probably not so good

unreal charm Mar 21, 2022, 7:42 PM

#

bGFiZWwscXVlc3Rpb24sYW5zd2VyCmxhYmVsIDEscHl0YW5pZSAxPyxvZHBvd2llZMW6IDEKbGFiZWwgMS0yLHB5dGFuaWUgMj8sb2Rwb3dpZWTFuiAxCmxhYmVsIDEtMixweXRhbmllIDE/LG9kcG93aWVkxbogMQpsYWJlbCAxLTIscHl0YW5pZSAyPyxvZHBvd2llZMW6IDIKbGFiZWwgMS0yLHB5dGFuaWUgMT8sb2Rwb3dpZWTFuiAyCmxhYmVsIDIscHl0YW5pZSAyPyxvZHBvd2llZMW6IDIK

#

I have that written in my csv file:

98,71,70,105,90,87,119,115,99,88,86,108,99,51,82,112,98,50,52,115,89,87,53,122,100,50,86,121,67,109,120,104,89,109,86,115,73,68,69,115,99,72,108,48,89,87,53,112,90,83,65,120,80,121,120,118,90,72,66,118,100,50,108,108,90,77,87,54,73,68,69,75,98,71,70,105,90,87,119,103,77,83,48,121,76,72,66,53,100,71,70,117,97,87,85,103,77,106,56,115,98,50,82,119,98,51,100,112,90,87,84,70,117,105,65,120,67,109,120,104,89,109,86,115,73,68,69,116,77,105,120,119,101,88,82,104,98,109,108,108,73,68,69,47,76,71,57,107,99,71,57,51,97,87,86,107,120,98,111,103,77,81,112,115,89,87,74,108,98,67,65,120,76,84,73,115,99,72,108,48,89,87,53,112,90,83,65,121,80,121,120,118,90,72,66,118,100,50,108,108,90,77,87,54,73,68,73,75,98,71,70,105,90,87,119,103,77,83,48,121,76,72,66,53,100,71,70,117,97,87,85,103,77,84,56,115,98,50,82,119,98,51,100,112,90,87,84,70,117,105,65,121,67,109,120,104,89,109,86,115,73,68,73,115,99,72,108,48,89,87,53,112,90,83,65,121,80,121,120,118,90,72,66,118,100,50,108,108,90,77,87,54,73,68,73,75

#

why is that?

#

my code:

    if not name or name == '':
        print("badn name")

    label_check = db.session.query(Labels.label_name,Labels.label_id).first()

    if label_check == None:
        print("no labels")


    header = ['question', 'label', 'answer']
    data = db.session.query(Labels.label_name,Questions.question,Answers.answer)\
    .filter(Labels.label_id==Answers.label_id)\
    .filter(Labels.label_id==Questions.label_id).all()
    result = all_many_schema.dumps(data)
        
    with open(f"dowolands/{name}.csv", 'w', newline='') as f:
        header = ['label', 'question', 'answer']
        writer = csv.writer(f)
        writer.writerow(header)
        i = 0
        for range in data:
            writer.writerow(data[i])
            i = i+1
    data = open(f"dowolands/{name}.csv", "r").read().encode('utf8')
    encoded = base64.b64encode(data)
    with open(f"dowolands/{name}.csv", 'w') as f:
        writer = csv.writer(f)
        writer.writerow(encoded)
        f.close()
    print(encoded)
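What seems to be happening: `base64.b64encode` returns a `bytes` object, and `csv.writer.writerow` treats its argument as a sequence of fields. Iterating over `bytes` yields integers, so each character of the base64 text gets written as its ASCII code (`b` is 98, `G` is 71, and so on). That matches the `98,71,70,105,...` in the file. Here is a self-contained reproduction and fix, using in-memory files in place of `dowolands/...`:

```python
import base64
import csv
import io

# First line of the original CSV, as bytes
raw = "label,question,answer\n".encode("utf8")
encoded = base64.b64encode(raw)  # bytes, not str

# The bug: writerow() iterates the bytes object, and iterating bytes yields ints,
# so every base64 character is written out as its ASCII code
buggy = io.StringIO()
csv.writer(buggy).writerow(encoded)

# The fix: base64 output is plain ASCII text, so decode it to str and write it as-is
fixed = io.StringIO()
fixed.write(encoded.decode("ascii"))
```

In the original function, the equivalent change is to replace `writer.writerow(encoded)` in the last `with` block with `f.write(encoded.decode('ascii'))`. The explicit `f.close()` inside `with` is not needed. Separately, `for range in data:` shadows the built-in `range`; `for row in data: writer.writerow(row)` is the usual form.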

jolly knoll Mar 21, 2022, 7:43 PM

#

desert oar probably not, i wouldn't worry about it

Thanks, that's a relief to know. But do you know what I should have done instead when dealing with high-cardinality columns? I've read about PCA, but I heard it's designed for continuous variables. Would very much appreciate your input!

jaunty mural Mar 21, 2022, 8:30 PM

#

night guys, sorry for disturbing, I feel a bit sick and can't concentrate on a simple task. How can I add points (markers) to this subplot and prevent the axis from displaying numbers in scientific notation?
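Below is a minimal sketch with made-up data, assuming a grid of subplots like the one described. Passing `marker='o'` to `plot` draws a point at every sample. `Axes.ticklabel_format(style='plain', useOffset=False)` turns off scientific notation and the offset text on that axis; it applies when the axis uses Matplotlib's default `ScalarFormatter`.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)
y = 1_000_000 + 250_000 * x  # large values that would normally get a 1e6 exponent/offset

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
ax = axes[0, 0]

# marker="o" draws a point at each (x, y) sample on top of the line
(line,) = ax.plot(x, y, marker="o", markersize=4)

# Plain tick labels on the y-axis: no scientific notation, no offset
ax.ticklabel_format(axis="y", style="plain", useOffset=False)

fig.tight_layout()
```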

cerulean finch Mar 21, 2022, 9:02 PM

#

whats the ".exe" encoding bois?

iron basalt Mar 21, 2022, 9:25 PM

#

cerulean finch whats the ".exe" encoding bois?

The Microsoft portable executable format? Is this a DS question?

iron basalt Mar 21, 2022, 9:33 PM

#

mint palm embedding needs encoding first right?

Encoding is putting something into some kind of system of signals (very generic term). Embedding's goal is to embed something into something to gain new insight about it and other things related to it. You can imagine embedding the data as analogous to Archimedes embedding Hiero's crown into liquid to measure its volume.

#

Whenever you change the form of some data you have technically encoded it. But whether or not that encoding is useful in that it lets you compare things is what matters.

mint palm Mar 21, 2022, 9:38 PM

#

how do i predict embedding layer attributes

mint palm Mar 21, 2022, 9:38 PM

#

iron basalt Whenever you change the form of some data you have technically encoded it. But w...

can you tell me what these depend on?

#

input_dim
output_dim
and input_length
while embedding

iron basalt Mar 21, 2022, 9:40 PM

#

You need to define some embedded space and then have something that embeds items into that space. The input_dim is whatever the input's dim is. The output_dim is whatever you decided.
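To make the three arguments concrete, here is what they mean in the Keras `Embedding` layer. `input_dim` is the vocabulary size (the largest integer token id plus one). `output_dim` is the embedding size you choose. `input_length` is the length of each input sequence. The layer itself is a trainable lookup table. Below is a NumPy sketch of that lookup, using made-up sizes and token ids.

```python
import numpy as np

input_dim = 1000   # vocabulary size: token ids must lie in [0, input_dim)
output_dim = 8     # embedding size, a free choice
input_length = 5   # tokens per input sequence

rng = np.random.default_rng(0)
# The layer's weights: one row (vector) per token id
weights = rng.normal(size=(input_dim, output_dim))

# A batch of 2 sequences, each input_length token ids long
batch = np.array([[4, 20, 7, 0, 0],
                  [999, 3, 3, 15, 1]])

# Embedding is a row lookup; output shape is (batch, input_length, output_dim)
embedded = weights[batch]
```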

brave sand Mar 21, 2022, 9:45 PM

#

iron basalt You need to define some embedded space and then have something that embeds items...

hey squiggle

#

will this equation work for all reinforcement learning tasks?

#

new_q = (1 - LEARNING_RATE) * current_q + LEARNING_RATE * (reward + DISCOUNT * max_future_q)
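That line is the tabular Q-learning update. The sketch below applies it to a NumPy Q-table, assuming discrete states and actions; the sizes, rates, and transition are made up. One addition to the original line is the `done` flag: a terminal transition has no future value to bootstrap from, so the code uses 0 there.

```python
import numpy as np

LEARNING_RATE = 0.1
DISCOUNT = 0.95

n_states, n_actions = 4, 2
q_table = np.zeros((n_states, n_actions))


def q_update(q, state, action, reward, next_state, done):
    """One tabular Q-learning step using the formula above."""
    current_q = q[state, action]
    # Off-policy target: value of the best action in the next state
    max_future_q = 0.0 if done else np.max(q[next_state])
    new_q = (1 - LEARNING_RATE) * current_q + LEARNING_RATE * (reward + DISCOUNT * max_future_q)
    q[state, action] = new_q
    return new_q


q_table[2] = [0.0, 10.0]  # pretend action 1 in state 2 is already valued highly
q_update(q_table, state=1, action=0, reward=1.0, next_state=2, done=False)
```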

pastel valley Mar 21, 2022, 9:51 PM

#

im using free google colab and i cant train 100 epochs all in a single runtime so can i just train it 50 epochs first then save the model then next runtime load and train it for another 50 epochs?

iron basalt Mar 21, 2022, 10:04 PM

#

brave sand will this equation work for all reinforcement learning tasks?

If by works you mean works well, then no, not on every task (but a lot of them). Q-learning "works" on any RL task, as in you can try to apply it always. If you want another option see SARSA and try to figure out which tasks it would perform better on and why.
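For comparison, SARSA uses the same update but is on-policy. It bootstraps from the action the agent actually takes next, instead of the greedy `max` over next actions. Here is a sketch under the same assumptions (tabular, made-up numbers):

```python
import numpy as np

LEARNING_RATE = 0.1
DISCOUNT = 0.95


def sarsa_update(q, state, action, reward, next_state, next_action, done):
    """One tabular SARSA step: the target uses Q(s', a') for the action actually chosen."""
    current_q = q[state, action]
    future_q = 0.0 if done else q[next_state, next_action]
    new_q = (1 - LEARNING_RATE) * current_q + LEARNING_RATE * (reward + DISCOUNT * future_q)
    q[state, action] = new_q
    return new_q


q_table = np.zeros((4, 2))
q_table[2] = [0.0, 10.0]
# The exploring policy picked action 0 in state 2, so SARSA bootstraps from Q[2, 0] = 0
sarsa_value = sarsa_update(q_table, 1, 0, 1.0, 2, 0, done=False)
```

In the same situation, Q-learning would bootstrap from `max(Q[2]) = 10`. That difference is why SARSA tends to learn more cautious policies when exploratory actions can be costly.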

brave sand Mar 21, 2022, 10:09 PM

#

iron basalt If by works you mean works well, then no, not on every task (but a lot of them)....

Sure, I will look into SARSA. Since I’m working on Lunar Lander, how hard would a similar task with 4 motors be?

odd meteor Mar 22, 2022, 12:37 AM

#

ocean pendant im kinda new to learning curve analysis . does anyone know if this curve is good...

From your plot, you can see that your training loss (blue line) decreases as the number of epochs increases. However, we can't say the same about the validation loss.

The validation loss decreases alongside the training loss until around the 7th epoch, when it slowly starts to diverge.

So, in essence, the wider the gap between your training loss and validation loss, the more your model is overfitting the data.

odd meteor Mar 22, 2022, 12:54 AM

#

ocean pendant im trying to run optimization buut I dont exactly know how I can fine tune the ...

Try using the EarlyStopping callback to prevent overfitting.

from keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(monitor='val_loss', patience=5)
model.fit(X_train, y_train, epochs=250, validation_data=(X_test, y_test), callbacks=[early_stopping])
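Roughly, `EarlyStopping(monitor='val_loss', patience=5)` stops training once the validation loss has gone `patience` epochs without improving. The pure-Python sketch below applies that rule to a made-up list of validation losses. It ignores Keras options such as `min_delta` and is not the Keras implementation.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch), both 0-based, for a simple patience rule."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0  # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch  # rule never triggered: trained to the end


# Made-up curve: improves until epoch 6, then starts rising (overfitting)
val_losses = [0.90, 0.70, 0.55, 0.48, 0.44, 0.42, 0.41, 0.42, 0.43, 0.45, 0.47, 0.50, 0.52]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=5)
```

By default, Keras keeps the weights from the last epoch it trained, not the best one. Passing `restore_best_weights=True` restores the best weights instead.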

Also, for hyperparameter tuning in DL with an approach similar to RandomizedSearchCV in sklearn, you could use the scikit-learn wrapper from Keras:

from keras.wrappers.scikit_learn import KerasClassifier

quartz plank Mar 22, 2022, 12:59 AM

#

Has anyone tried out neural intent?

lapis sequoia Mar 22, 2022, 1:17 AM

#

i'm trying to make a new column that converts the S&P ratings to numbers

#

import pandas as pd

grades = {
    'AAA': 1,
    'AA+': 2,
    'AA': 3,
    'AA-': 4,
    'A+': 5,
    'A': 6,
    'A-': 7,
    'BBB+': 8,
    'BBB': 9,
    'BBB-': 10,
    'BB+': 11,
    'BB': 12,
    'BB-': 13,
    'B+': 14,
    'B': 15,
    'B-': 16,
    'CCC+': 17,
    'CCC': 18,
    'CCC-': 19,
    'CC': 20,
    'C': 21,
    'D': 22,
}

states = pd.read_csv('./data/states_credit_scores.csv')
states_frame = pd.DataFrame(states)
number_sp = [grades[x] for x in states_frame['Rating']]
states_frame['Rating_Num'] = number_sp
states_frame.sort_values(by='Rating_Num', inplace=True)
states_frame

countries = pd.read_csv('./data/countries_credit_scores.csv')
countries_frame = pd.DataFrame(countries)
number_cp = [grades[x] for x in countries_frame['Rating']]
countries_frame['Rating_Num'] = number_cp
countries_frame.sort_values(by='Rating_Num', inplace=True)
countries_frame

#

this is my error:

Traceback (most recent call last):
  File "/usr/lib/python3.8/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/snap/pycharm-professional/278/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/snap/pycharm-professional/278/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/amicharski/PycharmProjects/njBudget/main.py", line 37, in <module>
    number_cp = [grades[x] for x in countries_frame['Rating']]
  File "/home/amicharski/PycharmProjects/njBudget/main.py", line 37, in <listcomp>
    number_cp = [grades[x] for x in countries_frame['Rating']]
KeyError: nan

spark dirge Mar 22, 2022, 1:29 AM

#

turn it into a for loop and print each iteration? that or running in a debugger to see which key is causing problems.
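`KeyError: nan` means at least one row has an empty `Rating` cell. pandas reads that as NaN, and `grades` has no NaN key. `Series.map(grades)` avoids the crash: it returns NaN for any rating it cannot find instead of raising. You can then inspect the misses and drop or fill them. Below is a sketch with a made-up frame (the real one comes from `countries_credit_scores.csv`) and a shortened mapping.

```python
import numpy as np
import pandas as pd

grades = {"AAA": 1, "AA+": 2, "AA": 3, "BBB": 9, "D": 22}  # shortened version of the mapping

countries_frame = pd.DataFrame({
    "Country": ["W", "X", "Y", "Z"],
    "Rating": ["AAA", np.nan, "BBB", "SD"],  # one empty cell, one label not in the mapping
})

# map() yields NaN for missing or unknown ratings instead of raising KeyError
countries_frame["Rating_Num"] = countries_frame["Rating"].map(grades)

# Inspect what could not be converted before deciding how to handle it
unmapped = countries_frame[countries_frame["Rating_Num"].isna()]

# For example, drop those rows, then sort as in the original code
countries_frame = countries_frame.dropna(subset=["Rating_Num"]).sort_values("Rating_Num")
```

After dropping the NaN rows, `.astype(int)` turns `Rating_Num` back into integers.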

tranquil yarrow Mar 22, 2022, 1:54 AM

#

Anyone know of a way to invert the background/axis/label colors on a matplotlib 3d plot?
I'm trying to plot the orbits of solar system bodies, and they are color-coded with light colors because they will be on dark background (space) eventually.
I could probably convert to HSL and then just turn down the lightness, but a dark background would probably just be better.
Here is what they look like currently:

#

For anyone curious, the visible blue orbit is Neptune, the largest colored orbit is the dwarf planet Gonggong, and the biggest gray orbit is Comet Ikeya-Zhang.
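One option is Matplotlib's built-in `dark_background` style. It sets a black figure and axes background with light text and ticks. On 3D axes, the panes behind the data can also be recoloured per axis. Here is a minimal sketch with a made-up circular orbit:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 2 * np.pi, 200)

with plt.style.context("dark_background"):
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.plot(np.cos(t), np.sin(t), 0 * t, color="lightblue")  # made-up circular orbit

    # The 3D panes (the walls behind the data) are drawn separately; make them black too
    for axis in (ax.xaxis, ax.yaxis, ax.zaxis):
        axis.set_pane_color((0.0, 0.0, 0.0, 1.0))

    ax.set_xlabel("x [AU]")
```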

lapis sequoia Mar 22, 2022, 2:18 AM

#

Any suggestions on how to solve systems of differential equations on GPU using Python? Are there any packages like SciPy that offer this functionality on the GPU? I posted this in the #algos-and-data-structs channel too but it might be more appropriate for this channel.

iron basalt Mar 22, 2022, 2:47 AM

#

lapis sequoia Any suggestions on how to solve systems of differential equations on GPU using P...

Approximate numerically or solve analytically?

#

Which functions from SciPy do you want?

lapis sequoia Mar 22, 2022, 2:51 AM

#

@iron basalt Numerical solvers. For example, the solve_ivp function in SciPy solves a system of ODEs but uses the CPU. Is there anything like that available for GPU?

lime current Mar 22, 2022, 3:31 AM

#

Hello, has anyone used or worked on "spotify/ANNOY" machine learning nearest neighbor model?
I need held on my project!!!!

#

help*

iron basalt Mar 22, 2022, 3:32 AM

#

lapis sequoia @iron basalt Numerical solvers. For example, the solve_ivp function in...

Not sure, but you could implement it yourself to run on the GPU using either Numba (probably the easiest if you can get it to use the GPU), cupy (if you are using Nvidia GPUs) / pycuda, pyopencl (any GPU or CPU / device with parallel compute), or Kompute (Vulkan).

#

It seems SciPy's default solving method is Runge-Kutta of order 5 (4), assuming I am interpreting "RK45" correctly and they don't actually mean "RKF45".

misty flint Mar 22, 2022, 3:42 AM

#

PikaThink

#

we're moving into squiggle's domain

#

NervousSip

iron basalt Mar 22, 2022, 3:45 AM

#

Just checked the source code for it, it's Runge-Kutta order 5 (4).

#

If you have never written a solver for it before there are plenty of tutorials, the code is really short.

#

So you can first write it with numpy, then move that to numba.
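As a starting point, here is a fixed-step classical RK4 integrator in NumPy. It is simpler than SciPy's `RK45`, which uses the adaptive Dormand-Prince pair with error control. It only shows the structure you would later port to Numba or a GPU array library. The test system is made up.

```python
import numpy as np


def rk4(f, t0, y0, t1, n_steps):
    """Integrate y' = f(t, y) from t0 to t1 with n_steps fixed RK4 steps.

    y may be a vector, so systems of ODEs are handled the same way.
    """
    h = (t1 - t0) / n_steps
    t, y = t0, np.asarray(y0, dtype=float)
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y


# Made-up test system: harmonic oscillator x'' = -x, written as y = [x, v]
def oscillator(t, y):
    return np.array([y[1], -y[0]])


# Exact solution from x(0) = 1, v(0) = 0 is x = cos t, so at t = pi we expect [-1, 0]
y_end = rk4(oscillator, 0.0, [1.0, 0.0], np.pi, 1000)
```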

misty flint Mar 22, 2022, 3:49 AM

#

blobhyperthink

#

minitorch uses numba PikaThink

iron basalt Mar 22, 2022, 3:50 AM

#

SciPy's solve_ivp is basically just RK45 with some extra code for picking methods other than RK45, and parameter wrangling.

slate hollow Mar 22, 2022, 3:53 AM

#

can someone tell me why i need to install visual studio for cuda? i haven't installed any of the tools that it comes with, just the editor

#

and yet cuda seems to do just fine

#

what's so magical about vs 2019 that cuda wants

#

?

iron basalt Mar 22, 2022, 3:54 AM

#

slate hollow what's so magical about vs 2019 that cuda wants

It wants the Microsoft SDK probably, not the IDE. But the SDK comes with the IDE.

slate hollow Mar 22, 2022, 3:56 AM

#

Microsoft SDK
?

iron basalt Mar 22, 2022, 3:56 AM

#

Windows SDK*

slate hollow Mar 22, 2022, 3:57 AM

#

https://developer.microsoft.com/en-us/windows/downloads/windows-sdk/

#

this thing?

iron basalt Mar 22, 2022, 4:00 AM

#

slate hollow https://developer.microsoft.com/en-us/windows/downloads/windows-sdk/

I think so. I have not used Windows in a while, but there is some SDK which is needed for development on Windows and which is used by Visual Studio and others.

#

(unless you use mingw or something like that, but that is unofficial)

#

CUDA I think makes use of the visual studio SDK on windows so it may be that one (or both).

#

I know that whatever system you are using, CUDA hijacks your C++ compiler so you can write kernels in C++ directly.

#

Ah, found the info:

Visual Studio is an IDE (Integrated Development Environment). It's the user interface.

Build Tools include the compiler that compiles your source code into machine code.

Windows SDK contains headers, libraries and sample code used to develop applications.

#

I would think that it needs the SDK and build tools.

#

But it probably wants to also integrate into visual studio.

slate hollow Mar 22, 2022, 4:05 AM

#

thanks!

iron basalt Mar 22, 2022, 4:06 AM

#

Windows does not have a standard way of dealing with SDKs / libs like Linux, so it's all a mess there.

misty flint Mar 22, 2022, 4:06 AM

#

oof

#

sounds about right tho

#

DoggoKek

#

think i wanna try this manim library

#

to see if i can make a short clip about numpy's broadcasting

#

blobhyperthink

rocky bough Mar 22, 2022, 4:21 AM

#

Hello, I'm looking to have an interactive bot that reacts to user messages in certain scenarios and tries to match up a user's response up to one of a few different request options. I'm kind of lost as to what approach I should take here. Any general pointers would be much appreciated!

iron basalt Mar 22, 2022, 4:32 AM

#

rocky bough Hello, I'm looking to have an interactive bot that reacts to user messages in ce...

So you are trying to classify their messages?

rocky bough Mar 22, 2022, 4:33 AM

#

yes

iron basalt Mar 22, 2022, 4:33 AM

#

Have you tried something really simple like naive bayes?

rocky bough Mar 22, 2022, 4:34 AM

#

I haven't really tried anything yet. I'm looking for pointers on what I can read up on or specific libraries to use

#

I've done binary classification before, but now I need something more

#

and I also dont really have that large of a training set to work on

iron basalt Mar 22, 2022, 4:35 AM

#

Even more simple, you can just check for keywords in the messages.

#

Basically naive bayes, but hand crafted probabilities.
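Here is a minimal sketch of the keyword approach, with hypothetical intents and hand-picked keyword weights (the "hand crafted probabilities"). Each intent is scored by summing the weights of the words it matches. The best intent wins, and the function returns `None` when nothing matches.

```python
import re

# Hypothetical intents and hand-picked keyword weights
INTENT_KEYWORDS = {
    "schedule": {"schedule": 2.0, "meeting": 1.5, "tomorrow": 1.0, "time": 0.5},
    "cancel": {"cancel": 2.0, "stop": 1.0, "remove": 1.0},
    "help": {"help": 2.0, "how": 0.5, "what": 0.5},
}


def classify(message):
    """Return the intent whose keywords best match the message, or None."""
    words = re.findall(r"[a-z']+", message.lower())
    scores = {
        intent: sum(weights.get(w, 0.0) for w in words)
        for intent, weights in INTENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```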

misty flint Mar 22, 2022, 4:36 AM

#

DoggoKek

rocky bough Mar 22, 2022, 4:36 AM

#

yeah, I've considered that, but in some cases, it would be necessary to differentiate between who is being referred to

iron basalt Mar 22, 2022, 4:37 AM

#

Well that is not just classification, that is much more complicated.

misty flint Mar 22, 2022, 4:37 AM

#

yeah best to start simple. you can always iterate later

rocky bough Mar 22, 2022, 4:37 AM

#

for example if the message says "I will do xyz, you should do abc", that needs to be analysed

iron basalt Mar 22, 2022, 4:37 AM

#

However, you could first classify it with something dumb like naive bayes, and then try to figure out stuff from there based on that class.
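Here is a small multinomial naive Bayes with add-one (Laplace) smoothing. It is written from scratch so it has no dependencies, and the training messages and labels are made up. With scikit-learn, `CountVectorizer` plus `MultinomialNB` does the same job.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(class) + sum of smoothed log P(word | class); unseen words are skipped
            score = math.log(n_docs / total)
            for w in tokenize(text):
                if w in self.vocab:
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


texts = ["can we schedule a meeting", "book a time tomorrow",
         "cancel my booking", "please stop the meeting",
         "how does this work", "what can you do"]
labels = ["schedule", "schedule", "cancel", "cancel", "help", "help"]
model = NaiveBayes().fit(texts, labels)
```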

misty flint Mar 22, 2022, 4:38 AM

#

might as well learn NLP basics too

#

DoggoKek

iron basalt Mar 22, 2022, 4:38 AM

#

Or yeah, actually learn NLP.

misty flint Mar 22, 2022, 4:38 AM

#

you could use this as an excuse to learn more

misty flint Mar 22, 2022, 4:38 AM

#

iron basalt Or yeah, actually learn NLP.

kekHands

#

funny enough we went over RNNs today

#

super classic

#

not even LSTMs yet

#

or attention mechanisms

#

DoggoKek

#

we should get to transformers eventually

#

and modern NLP architectures

iron basalt Mar 22, 2022, 4:40 AM

#

If you are struggling with LSTM, try looking at GRU. It's better in every way.

#

Simpler, better results.

misty flint Mar 22, 2022, 4:40 AM

#

i also looked at that today

#

but on my own

#

DoggoKek

#

what i was using: https://d2l.ai/chapter_recurrent-modern/gru.html
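For reference, here is a single GRU step in NumPy. It follows the reset/update-gate formulation in the d2l.ai chapter linked above; frameworks differ in small details, such as where the reset gate is applied. The weights and input sequence are random placeholders.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x, h, p):
    """One GRU time step. x: (batch, n_in), h: (batch, n_hidden), p: dict of weights."""
    r = sigmoid(x @ p["W_xr"] + h @ p["W_hr"] + p["b_r"])              # reset gate
    z = sigmoid(x @ p["W_xz"] + h @ p["W_hz"] + p["b_z"])              # update gate
    h_tilde = np.tanh(x @ p["W_xh"] + (r * h) @ p["W_hh"] + p["b_h"])  # candidate state
    return z * h + (1.0 - z) * h_tilde                                 # blend old and new


n_in, n_hidden, batch = 3, 4, 2
rng = np.random.default_rng(0)
params = {}
for g in ("r", "z", "h"):
    params[f"W_x{g}"] = rng.normal(scale=0.1, size=(n_in, n_hidden))
    params[f"W_h{g}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
    params[f"b_{g}"] = np.zeros(n_hidden)

h = np.zeros((batch, n_hidden))
for x_t in rng.normal(size=(5, batch, n_in)):  # a made-up sequence of 5 time steps
    h = gru_step(x_t, h, params)
```

Compared with an LSTM, a GRU has no separate cell state and one gate fewer, which is where the "simpler" part comes from.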

rocky bough Mar 22, 2022, 4:42 AM

#

ok, thanks for the pointers. so Naive Bayes can provide some basic classification, that can then be further analyzed, and if I need more I need to look further into NLP

#

since at the same time it can't be unreasonably complicated (more complex algorithms usually take more processing power), it's probably better to keep it simple anyway

mint palm Mar 22, 2022, 4:51 AM

#

Can someone please help me understand how n_components works in hashing?

iron basalt Mar 22, 2022, 5:03 AM

#

mint palm Can someone please help me understand how n_components works in hashing?

https://en.wikipedia.org/wiki/Hash_table

Hash table

In computing, a hash table (hash map) is a data structure that implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found. During lookup, the key is hashed ...

mint palm Mar 22, 2022, 5:07 AM

#

iron basalt https://en.wikipedia.org/wiki/Hash_table

I am getting a dimension error when I change n_components

#

I understand how it works

#

But I don't know how the dimensions work
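A hashing encoder's output width is `n_components`, however many distinct categories there are. Each value is hashed into one of `n_components` buckets, and that bucket's column is incremented. Changing `n_components` therefore changes the number of feature columns. Anything downstream that still expects the old width, such as a model's input layer, is a likely source of a dimension error. The sketch below shows the idea. It is not `category_encoders.HashingEncoder` itself, it uses MD5 purely for illustration, and the rows are made up.

```python
import hashlib


def hash_encode(row, n_components=8):
    """Map a row of categorical values to a fixed-length count vector."""
    out = [0] * n_components
    for value in row:
        digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
        out[int(digest, 16) % n_components] += 1  # collisions simply add up
    return out


rows = [["eMBB", "5G", "smartphone"], ["URLLC", "LTE", "iot"]]  # made-up categorical rows
encoded_8 = [hash_encode(r, n_components=8) for r in rows]
encoded_16 = [hash_encode(r, n_components=16) for r in rows]
```

In this sketch, values from every column share the same buckets, and collisions add up. No output column corresponds to one original feature, which also makes correlations computed on these columns hard to interpret.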

lapis sequoia Mar 22, 2022, 6:39 AM

#

Hello guys, looking for some easy to follow python repos on github (data engineering preferred) where the code is written in a modular and production appropriate manner.

#

Basically I have been writing code in Jupyter notebooks for data lift and shift but would like to learn how to convert the code into a more modular and reusable format.

steady basalt Mar 22, 2022, 8:59 AM

#

Boys my essay and coding assignment has been set

#

Which supervised models should I become an expert in 🧐

jaunty mural Mar 22, 2022, 9:14 AM

#

wow, after a good cup of tea I've improved my simple script file to plot 4 subplots

icy nebula Mar 22, 2022, 10:12 AM

#

Hi there, is there anyone who has a sample presentation/study file analyzing PCA components in terms of the original variables? (I am struggling to find a good example that explains PCA in a business context.)
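Below is a minimal NumPy sketch of reading components in terms of the original variables, using made-up business-style columns. After standardizing, each principal component is a weight vector over the original variables (the "loadings"). Sorting a component's weights by absolute value shows which variables drive it. scikit-learn exposes the same vectors as `PCA().components_`.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
visits = rng.normal(100, 20, n)
revenue = 2.0 * visits + rng.normal(0, 5, n)  # made up: strongly tied to visits
price = rng.normal(50, 10, n)                 # made up: independent of both
X = np.column_stack([visits, revenue, price])
names = ["visits", "revenue", "price"]

# Standardize so every variable contributes on the same scale
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD: rows of Vt are the components, as weights on the original variables
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
explained_ratio = S**2 / np.sum(S**2)

for k, component in enumerate(Vt):
    order = np.argsort(-np.abs(component))
    weights = ", ".join(f"{names[j]}={component[j]:+.2f}" for j in order)
    print(f"PC{k + 1} ({explained_ratio[k]:.0%} of variance): {weights}")
```

The overall sign of each component is arbitrary. Only the relative signs and sizes of the weights within a component carry meaning.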

odd meteor Mar 22, 2022, 10:56 AM

#

steady basalt Which supervised models should I become an expert in 🧐

As much as you can. There are many supervised learning algorithms; once you know 2 or 3, it'll be easier to grasp how the others work. The syntax is almost the same, but the algorithms differ, and sometimes so do the hyperparameters.

Knowing both linear-based and tree-based algorithms is quite important

odd meteor Mar 22, 2022, 11:05 AM

#

icy nebula Hi there, Is there anyone who has a sample presentation/study file analyzing PCA...

Do you understand what PCA does? I think if you've understood it very well you can easily apply/implement it in any business context.

You can check this video https://youtu.be/FgakZw6K1QQ

YouTube

StatQuest with Josh Starmer

StatQuest: Principal Component Analysis (PCA), Step-by-Step

Principal Component Analysis, is one of the most useful data analysis and machine learning methods out there. It can be used to identify patterns in highly complex datasets and it can tell you what variables in your data are the most important. Lastly, it can tell you how accurate your new understanding of the data actually is.


odd meteor Mar 22, 2022, 11:09 AM

#

lapis sequoia Hello guys, looking for some easy to follow python repos on github (data enginee...

I don't know of any, but try checking this https://youtu.be/bkJZDmreIpA then put on your FBI hat and do some quick digging in their GitHub repo. You might find what you seek there

YouTube

DataTalksClub ⬛

Data Engineering Zoomcamp

https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html


icy nebula Mar 22, 2022, 11:10 AM

#

odd meteor Do you understand what PCA does? I think if you've understood it very well you c...

I have recently dived into it, am trying to learn and apply. Thank you for the information and the YouTube link. I will watch it.

steady basalt Mar 22, 2022, 11:56 AM

#

odd meteor As much as you can. There are many Supervised Learning algorithm, once you know ...

I meant that for the assignment you have to choose two models

#

Which two? I already know how all of them “work”; I meant on an expert level

#

We have to use them predictively as well as write essays

odd meteor Mar 22, 2022, 12:05 PM

#

steady basalt We have to use them predictively as well as write essays

Awesome. You can pick any of your favourite algorithms. For me, I like CatBoost 😂

mint palm Mar 22, 2022, 12:34 PM

#

#

    categorical_columns = [c for c in dataset.columns if (c != 'Slice Type (Output)')]

    hs = category_encoders.HashingEncoder(cols=categorical_columns, n_components=16)
    d = hs.fit_transform(dataset)

#

i applied this encoding; is this actually correct? i worry about not being able to see any correlation

patent pine Mar 22, 2022, 12:48 PM

#

If I want to compare a column from 2 data frames, is there a more efficient way than df1.compare(df2) ?

#

Sorry I can not help with your problem. I was asking about mine 😂😂😭

desert oar Mar 22, 2022, 12:55 PM

#

patent pine If I want to compare a column from 2 data frames, is there a more efficient way ...

what do you mean by "compare" exactly?

#

and what do you mean by "efficient"?

desert oar Mar 22, 2022, 12:56 PM

#

odd meteor Awesome. You can pick any of your favourite algorithm. For me, I like CatBoost ...

i always had good results with lightgbm for quick and dirty gradient boosting

#

xgboost seems to need more "care and feeding" to get good model performance, and generally is slower

#

and catboost never gave me good results compared to lightgbm on the problems where i tried it

serene scaffold Mar 22, 2022, 12:57 PM

#

patent pine If I want to compare a column from 2 data frames, is there a more efficient way ...

if there's a pandas operation that is one method call, that is almost certainly the "most efficient" way by any definition of "efficient".

desert oar Mar 22, 2022, 12:57 PM

#

lapis sequoia Hello guys, looking for some easy to follow python repos on github (data enginee...

imo just start reading code that isn't "data" code. production-ready just means no bugs. which means it needs to be testable, which means you are basically writing an application and all the usual recommendations about application design apply

#

data science code is usually bad quality

#

read the scikit-learn source code, their code is usually pretty decent

#

it's a bit "old school" in some respects, but for the most part it's a well-organized and thoughtful code base

serene scaffold Mar 22, 2022, 12:58 PM

#

hmm, old school in what way?

desert oar Mar 22, 2022, 1:00 PM

#

no type hints, * imports

#

one thing that they do which is interesting is mapping __init__ keyword arguments 1:1 to instance attributes; this is actually enforced by their base classes

#

in a world without type annotations, that's a really nice thing

#

and in general it makes it impossible to accidentally discard user input

#

also distinguishing "generated" fields by suffixing with _ is kind of ad-hoc but a very useful convention

#

of course they almost certainly should have gone the R/statsmodels route of returning a "result" object instead of mutating the original "model" object and adding a bunch of fields

#

oh yeah, another old school thing: fields that are not initialized in __init__ and using hasattr() to check the current state of the object

#

the flipside of making the model fitted in-place is that you can chain transformers easily, but that's kind of a quirky thing that you don't usually need anyway
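The sketch below shows those conventions in plain Python, using a hypothetical `MeanCenterer`. It is not a real scikit-learn class and does not inherit from `BaseEstimator`, so it runs without scikit-learn. Constructor arguments are stored unchanged under the same names. Learned state gets a trailing underscore and only exists after `fit`. `fit` returns `self`, so calls can be chained.

```python
class MeanCenterer:
    """Hypothetical transformer following scikit-learn-style conventions."""

    def __init__(self, with_scale=False):
        # Store the constructor argument as-is, under the same name; no logic here
        self.with_scale = with_scale

    def fit(self, X, y=None):
        n = len(X)
        self.mean_ = [sum(col) / n for col in zip(*X)]  # learned, so trailing underscore
        if self.with_scale:
            self.scale_ = [
                (sum((v - m) ** 2 for v in col) / n) ** 0.5 or 1.0
                for col, m in zip(zip(*X), self.mean_)
            ]
        else:
            self.scale_ = [1.0] * len(self.mean_)
        return self  # enables est.fit(X).transform(X)

    def transform(self, X):
        if not hasattr(self, "mean_"):  # the "old school" fitted-state check
            raise RuntimeError("call fit() before transform()")
        return [[(v - m) / s for v, m, s in zip(row, self.mean_, self.scale_)] for row in X]


X = [[1.0, 10.0], [3.0, 30.0]]
Xt = MeanCenterer().fit(X).transform(X)
```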

red sphinx Mar 22, 2022, 1:03 PM

#

hey yall

#

i wanted to ask a question

#

if AI isn't just code checking whether the person says "hello" or hi or sup or wassup or whatever word means a greeting, then how is AI made? i mean like siri and google assistant

serene scaffold Mar 22, 2022, 1:04 PM

#

red sphinx if AI isnt telling the code if the person says "hello" or hi or sup or wassup or...

programs like siri are not an AI on the whole, but they contain components that are

red sphinx Mar 22, 2022, 1:05 PM

#

yea but if siri has components that include AI

serene scaffold Mar 22, 2022, 1:05 PM

#

for example, if you ask siri a factual question, it uses an information retrieval algorithm to find a statement that answers the question, and that is AI.

red sphinx Mar 22, 2022, 1:06 PM

#

so when i say hello it uses an algorithm to know what hello even means and what to answer if a user says hello

#

is that what you're saying?

serene scaffold Mar 22, 2022, 1:06 PM

#

no, just saying "hello" to siri and getting a response does not include any AI.

red sphinx Mar 22, 2022, 1:07 PM

#

then how? how does it know what hello means?

serene scaffold Mar 22, 2022, 1:07 PM

#

if user.says() in ('hello', 'hi'):
    return random.choice(['hi', 'hello', 'greetings'])
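`user.says()` in the snippet above is a made-up placeholder. Below is a runnable version of the same idea, with the message passed in as a plain string; the greeting and reply lists are illustrative.

```python
import random

GREETINGS = ("hello", "hi", "hey", "sup")
REPLIES = ["hi", "hello", "greetings"]


def reply(message):
    """Return a random canned greeting if the message is a greeting, else None."""
    if message.strip().lower() in GREETINGS:
        return random.choice(REPLIES)
    return None
```

In a script, you would call it as `print(reply(input("Type your message: ")))`. Note that `return` only works inside a function, which is why the logic lives in `reply`.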

serene scaffold Mar 22, 2022, 1:07 PM

#

red sphinx then how?, how does it know what hello means

speech recognition. but it's just mapping what you say to a string.

red sphinx Mar 22, 2022, 1:08 PM

#

so you mean that this is how siri is made?

#

serene scaffold Mar 22, 2022, 1:08 PM

#

it's unlikely that the siri source code has whole conversations mapped out like this

red sphinx Mar 22, 2022, 1:08 PM

#

of course

#

thats what im saying

serene scaffold Mar 22, 2022, 1:09 PM

#

but for trivial conversations, it's probably just picking from a few canned responses, just using speech recognition.

red sphinx Mar 22, 2022, 1:09 PM

#

oh,

#

well thanks

serene scaffold Mar 22, 2022, 1:09 PM

#

yw

red sphinx Mar 22, 2022, 1:09 PM

#

https://tenor.com/view/steven-universe-stevenuniverse-excited-happy-gif-7198942


red sphinx Mar 22, 2022, 1:10 PM

#

serene scaffold yw

well can i ask you another question

serene scaffold Mar 22, 2022, 1:10 PM

#

red sphinx well can i ask you another question

sure

red sphinx Mar 22, 2022, 1:10 PM

#

a quick one

#

thanks

red sphinx Mar 22, 2022, 1:10 PM

#

serene scaffold if user.says() in ('hello', 'hi'): return random.choice(['hi', 'hello'...

in your code here

#

return random.choice

#

i want to put that in my code

#

look ill give you an example

serene scaffold Mar 22, 2022, 1:11 PM

#

it's my code, so you owe me 100 bucks

modest shuttle Mar 22, 2022, 1:12 PM

#

Hello,
Which is better for detecting fake news?
Supervised Learning? Semi-Supervised Learning? Unsupervised Learning? Deep Learning?

red sphinx Mar 22, 2022, 1:12 PM

#

message = input("Type your message: ")
if message == ("hello"):
print("hello", "hi", "greetings")

#

will this work?

serene scaffold Mar 22, 2022, 1:12 PM

#

modest shuttle Hello, Which is better for detecting fake news? Supervised Learning? Semi-Super...

these are all broad categories of algorithms. your question can only be answered in terms of specific algorithms.

#

also none of those are mutually exclusive with deep learning

serene scaffold Mar 22, 2022, 1:13 PM

#

red sphinx message = input("Type your message: ") if message == ("hello"): print("hel...

you don't need to wrap hello in parentheses. the indentation for the print call is wrong. that would print all three of those, not one of them randomly.

red sphinx Mar 22, 2022, 1:14 PM

#

oh

#

so how do i make it random then?

modest shuttle Mar 22, 2022, 1:14 PM

#

serene scaffold also none of those are mutually exclusive with deep learning

What is your suggestion?

serene scaffold Mar 22, 2022, 1:14 PM

#

red sphinx so how do i make it random then?

it's in the code example you were referencing.

red sphinx Mar 22, 2022, 1:14 PM

#

so i can just type

#

message = input("")

if message == hello:
return random.choice(['hi', 'hello'])

serene scaffold Mar 22, 2022, 1:15 PM

#

red sphinx message = input("") if message == hello: return random.choice(['hi', 'hel...

!indent

arctic wedgeBOT Mar 22, 2022, 1:15 PM

#

Indentation

Indentation is leading whitespace (spaces and tabs) at the beginning of a line of code. In the case of Python, they are used to determine the grouping of statements.

Spaces should be preferred over tabs. To be clear, this is in reference to the character itself, not the keys on a keyboard. Your editor/IDE should be configured to insert spaces when the TAB key is pressed. The amount of spaces should be a multiple of 4, except optionally in the case of continuation lines.

Example

def foo():
    bar = 'baz'  # indented one level
    if bar == 'baz':
        print('ham')  # indented two levels
    return bar  # indented one level

The first line is not indented. The next two lines are indented to be inside of the function definition. They will only run when the function is called. The fourth line is indented to be inside the if statement, and will only run if the if statement evaluates to True. The fifth and last line is like the 2nd and 3rd and will always run when the function is called. It effectively closes the if statement above as no more lines can be inside the if statement below that line.

Indentation is used after:
1. Compound statements (eg. if, while, for, try, with, def, class, and their counterparts)
2. Continuation lines

More Info
1. Indentation style guide
2. Tabs or Spaces?
3. Official docs on indentation

modest shuttle Mar 22, 2022, 1:16 PM

#

modest shuttle What is your suggestion?

@serene scaffold ?

serene scaffold Mar 22, 2022, 1:16 PM

#

modest shuttle @serene scaffold ?

I don't have one, sorry

red sphinx Mar 22, 2022, 1:17 PM

#

serene scaffold !indent

yea about the tabs im typing the code in discord thats why the tabs arent there

#

well thanks

serene scaffold Mar 22, 2022, 1:17 PM

#

a company I interviewed for told me about their fake news detection algorithm, but I don't think they want me repeating it.

serene scaffold Mar 22, 2022, 1:17 PM

#

red sphinx yea about the tabs im typing the code in discord thats why the tabs arent there

if the indentation isn't there when you paste the code into Discord, I have no way of knowing what the actual code looks like.

red sphinx Mar 22, 2022, 1:17 PM

#

oh ok

#

then ill just make it in VSC then send it here

#

well ik i have asked you a lot

#

but just the last question

#

user.says() THIS

#

THIS

#

I HAVE SUFFERED FROM THIS

#

sorry caps but dude please tell me

serene scaffold Mar 22, 2022, 1:18 PM

#

that part is entirely made up. there is no user.says()

red sphinx Mar 22, 2022, 1:18 PM

#

ik

#

but the user.says

#

has the input code and stuff

#

please tell me how do i make it

serene scaffold Mar 22, 2022, 1:19 PM

#

I don't know.

red sphinx Mar 22, 2022, 1:19 PM

#

oh

#

ok

#

well thanks for the help

#

bye

odd meteor Mar 22, 2022, 1:21 PM

#

desert oar i always had good results with lightgbm for quick and dirty gradient boosting

I like LightGBM too. For me, it's CatBoost, XGBoost, LightGBM in that order. 😀

patent pine Mar 22, 2022, 1:27 PM

#

desert oar what do you mean by "compare" exactly?

I have two models, and I want to compare their outputs to find the events they predicted differently.
Since I'm dealing with massive data frames, I want the most efficient way to compare them, in terms of memory.

patent pine Mar 22, 2022, 1:28 PM

#

serene scaffold if there's a pandas operation that is one method call, that is almost certainly ...

Yes, there is compare. So I should just use it?
Thank you
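A minimal sketch of both approaches, using small hypothetical prediction frames (`preds_a`, `preds_b` and the `label` column are made-up names). `DataFrame.compare` needs both frames to have identical index and columns; if you only need the IDs of the events where the models disagree, a boolean mask on the prediction column avoids building the side-by-side result and is likely lighter on memory.

```python
import pandas as pd

# Hypothetical predictions from two models for the same five events (same index).
preds_a = pd.DataFrame({"label": [0, 1, 1, 0, 1]}, index=[10, 11, 12, 13, 14])
preds_b = pd.DataFrame({"label": [0, 1, 0, 0, 0]}, index=[10, 11, 12, 13, 14])

# DataFrame.compare keeps only the rows/columns that differ, showing
# "self" (preds_a) and "other" (preds_b) side by side.
diff = preds_a.compare(preds_b)
print(diff)

# A boolean mask on one column gives just the differing event IDs,
# without materialising the side-by-side frame.
mask = preds_a["label"].ne(preds_b["label"])
differing_ids = preds_a.index[mask]
print(list(differing_ids))  # [12, 14]
```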

keen helm Mar 22, 2022, 2:04 PM

#

ok, i have https://www.toptal.com/developers/hastebin/epelemifuz.properties as an A* pathfinding AI, i just wanna ask, where is the (0,0) point in this list https://www.toptal.com/developers/hastebin/novaconile.ini

Hastebin: Send and Save Text or Code Snippets for Free | Toptal®

Hastebin is a free web-based pastebin service for storing and sharing text and code snippets with anyone. Get started now.


short heart Mar 22, 2022, 2:32 PM

#

is there an easy way to replace each value with the mean of its column? For example:

[[0,6,0],
[6,3,1]]

and i'd want to take the means of [0,6], [6,3] and [0,1], so the result will be the following:

[[3,4.5,0.5],
[3,4.5,0.5]]

with numpy or without, anything will do
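One way to do this with NumPy, as a sketch: take the column means with `mean(axis=0)` and broadcast them back to the original shape.

```python
import numpy as np

arr = np.array([[0, 6, 0],
                [6, 3, 1]])

# Mean of each column (axis=0): [3.0, 4.5, 0.5]
col_means = arr.mean(axis=0)

# Repeat the row of means for every row. np.broadcast_to returns a
# read-only view, so .copy() gives a normal writable array.
result = np.broadcast_to(col_means, arr.shape).copy()
print(result.tolist())  # [[3.0, 4.5, 0.5], [3.0, 4.5, 0.5]]
```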

serene scaffold Mar 22, 2022, 2:46 PM

#

you can use mean imputation to replace the numeric NaNs and mode imputation to replace the string NaNs. Both of these can involve DataFrame.fillna. If you have any follow up questions, keep in mind that I will not look at any screenshots of text, only actual text in a markdown block in the chat or in the pastebin.
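A minimal sketch of mean and mode imputation with `fillna`, on a small hypothetical frame (the `amount` and `region` columns are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one numeric and one string column containing NaNs.
df = pd.DataFrame({
    "amount": [100.0, np.nan, 300.0, np.nan],
    "region": ["north", None, "north", "south"],
})

# Mean imputation for the numeric column (mean ignores NaN: (100 + 300) / 2).
df["amount"] = df["amount"].fillna(df["amount"].mean())

# Mode imputation for the string column; mode() returns a Series because
# there can be ties, so take its first entry.
df["region"] = df["region"].fillna(df["region"].mode()[0])
print(df)
```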

serene scaffold Mar 22, 2022, 2:47 PM

#

short heart is there an easy way to replace values with their mean for array area. For examp...

which values are you trying to replace?

keen helm Mar 22, 2022, 2:52 PM

#

keen helm ok, i have https://www.toptal.com/developers/hastebin/epelemifuz.properties as (...

:) which one?

acoustic crow Mar 22, 2022, 3:04 PM

#

serene scaffold you can use mean imputation to replace the numeric NaNs and mode imputation to r...

Apologies for sending them in such format. I am not looking to replace the values in the dataset, but instead whenever the function encounters a NULL value just to skip over it and do nothing with it. This is where I define the column list and the function


# positive integer columns
pos_int_col = data_check[['ApplicationFinancedAmount'
              ,'AssetHighestValueGapRatio'
              ,'AssetHighestValueManufacturingYear'
              ,'DeductionPercentage']] 

# Creating a function to check for negative values "find_neg_index" and print the value of the row and its position
# The function takes two arguments 
# df - the dataframe to validate
# num_col - is a predefined list of integer-only columns within the dataframe / or a single integer column

def find_neg_index(df, num_col):
  
  neg_dict = {}
# Iterating on column level
  for col in num_col:

# Creating a list within the dictionary, with the column name as the key and an empty list as its value
    neg_dict[col] = []

# Getting the full length of the dataframe (number of rows)
    indx_list = range(0,len(df[col]))

# Creating an empty list for the index position
    neg_indices = []
  
# Iterating on row level
    for indx in indx_list:   
    
# Extracting the value on each row the loop is working on
      val = data_check.loc[indx,col]

# Setting the condition for the validation and transforming string values to numeric 
      if pd.to_numeric(val) < 0:
        print('Find ',val,'at row',indx,'for column',col)
        neg_indices.append(indx)
        neg_dict.update({col:neg_indices})
        
  return neg_dict

#

After that, when I pass the dataframe and the list of columns to the function, I get the following error:


#error message
Find  -62242.65 at row 25 for column ApplicationFinancedAmount
ValueError: Unable to parse string "NULL" at position 0
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
pandas/_libs/lib.pyx in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string "NULL"

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<command-156159945049379> in <module>
----> 1 find_neg_index(data_trans, pos_int_col)

<command-3934177330870497> in find_neg_index(df, num_col)
     26 
     27 # Setting the condition for the validation and transforming string values to numeric
---> 28       if pd.to_numeric(val) < 0:
     29         print('Find ',val,'at row',indx,'for column',col)
     30         neg_indices.append(indx)

/databricks/python/lib/python3.8/site-packages/pandas/core/tools/numeric.py in to_numeric(arg, errors, downcast)
    152         coerce_numeric = errors not in ("ignore", "raise")

#

I just want for the function to skip over the NULLS but I am not sure how to achieve that.

serene scaffold Mar 22, 2022, 3:06 PM

#

acoustic crow Apologies for sending them in such format. I am not looking to replace the value...

you can use dropna

#

if you call dropna on a series, it will give you a copy of the series with no NaNs. if you do it on a df, it will give you a copy without rows that had at least one NaN
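One caveat worth spelling out: a literal string `"NULL"` is not NaN, so `dropna` will not remove it on its own. A sketch, assuming the columns hold numbers stored as strings with `"NULL"` placeholders (the sample data below is invented), is to convert with `pd.to_numeric(..., errors="coerce")` so unparseable values become NaN, then drop them before checking for negatives. This version also reads from the `df` argument rather than the global `data_check`, and works on whole columns instead of row by row.

```python
import pandas as pd

# Hypothetical data: numbers stored as strings, with literal "NULL" placeholders.
data = pd.DataFrame({
    "ApplicationFinancedAmount": ["1000", "NULL", "-62242.65", "500"],
    "DeductionPercentage": ["5", "-2", "NULL", "10"],
})

def find_neg_index(df, num_cols):
    """Return {column: [row labels with negative values]}, skipping NULLs."""
    neg_dict = {}
    for col in num_cols:
        # errors="coerce" turns unparseable strings such as "NULL" into NaN...
        values = pd.to_numeric(df[col], errors="coerce")
        # ...and dropna removes them before the comparison.
        values = values.dropna()
        neg_dict[col] = values.index[values < 0].tolist()
    return neg_dict

print(find_neg_index(data, ["ApplicationFinancedAmount", "DeductionPercentage"]))
# {'ApplicationFinancedAmount': [2], 'DeductionPercentage': [1]}
```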

acoustic crow Mar 22, 2022, 3:09 PM

#

serene scaffold if you call dropna on a series, it will give you a copy of the series with no Na...

Thank you!

keen helm Mar 22, 2022, 3:11 PM

#

how does A* determine x and y position
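The hastebin code isn't visible here, so this is a generic sketch rather than that implementation. It assumes the map is a list of rows, which is the common convention: `grid[row][col]`, with (0, 0) as the top-left cell, rows increasing downward and columns increasing to the right. A* itself doesn't pick the axes; it expands neighbouring cells in whatever coordinate system the grid uses, ordered by cost so far plus a heuristic (Manhattan distance here, assuming 4-directional moves of cost 1).

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected grid given as a list of rows.

    Coordinates are (row, col): grid[0][0] is the top-left cell.
    Cells equal to 1 are walls. Returns the path as a list of cells, or None.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: admissible for unit-cost up/down/left/right moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    came_from = {}
    g_score = {start: 0}
    while open_heap:
        _, g, current = heapq.heappop(open_heap)
        if current == goal:
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        if g > g_score.get(current, float("inf")):
            continue  # stale entry: a cheaper route to this cell was found later
        r, c = current
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                tentative = g + 1
                if tentative < g_score.get((nr, nc), float("inf")):
                    g_score[(nr, nc)] = tentative
                    came_from[(nr, nc)] = current
                    heapq.heappush(open_heap, (tentative + h((nr, nc)), tentative, (nr, nc)))
    return None

grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
print(astar(grid, (0, 0), (2, 0)))
# [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0)]
```

If the code you're reading indexes as `grid[x][y]`, then its "x" is the row (vertical) axis, which is a frequent source of confusion.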

lapis sequoia Mar 22, 2022, 4:07 PM

#

What rl algorithm would be smart to choose for a simple shooter game.

The agent would be given:
the position and rotation (the direction they're facing) of enemies,
the distance to the enemies,
and its own rotation, speed, position and acceleration.

Possible moves would be:
turn left or right,
accelerate forwards, backwards, left or right,
fire.

Positive points for hitting enemies, negative points for being hit.

(If for whatever reason more complexity is needed, enemies could move, the agent could accelerate in 8 directions instead of 4 (diagonal) and the agent only gets position of visible enemies.)

lapis sequoia Mar 22, 2022, 4:32 PM

#

is QR decomposition in numpy deterministic? as in, does it involve any random factors, or will it give the same output for the same matrix every time?
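For what it's worth, `numpy.linalg.qr` has no random component: it calls LAPACK's Householder-based routines, so the same matrix gives the same factors on repeated calls. A quick check, using an arbitrary example matrix. One caveat: QR is only unique up to the signs of Q's columns (and the matching rows of R), so a different NumPy/LAPACK build could return factors with flipped signs.

```python
import numpy as np

A = np.array([[12.0, -51.0, 4.0],
              [6.0, 167.0, -68.0],
              [-4.0, 24.0, -41.0]])

Q1, R1 = np.linalg.qr(A)
Q2, R2 = np.linalg.qr(A)

# Same input, same result: no random initialisation is involved.
print(np.array_equal(Q1, Q2) and np.array_equal(R1, R2))
# The factors reconstruct A, and Q has orthonormal columns.
print(np.allclose(Q1 @ R1, A), np.allclose(Q1.T @ Q1, np.eye(3)))
```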

misty flint Mar 22, 2022, 4:44 PM

#

could probably use simple q-learning for that
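A minimal tabular Q-learning sketch on a hypothetical 5-cell corridor (not the shooter game), just to show the update rule Q(s, a) += alpha * (r + gamma * max Q(s', .) - Q(s, a)). The shooter's observations (positions, rotations, speeds) are continuous, so in practice you'd need to discretise them or replace the table with a function approximator such as a DQN.

```python
import random

# Toy environment (hypothetical): corridor of 5 cells, start in cell 0,
# reward +1 for reaching cell 4, small penalty for every other step.
N_STATES, ACTIONS = 5, (-1, +1)  # index 0 = move left, index 1 = move right

def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    reward = 1.0 if done else -0.01
    return nxt, reward, done

alpha, gamma, epsilon = 0.5, 0.9, 0.1
Q = [[0.0, 0.0] for _ in range(N_STATES)]
rng = random.Random(0)

for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit the table, sometimes explore
        if rng.random() < epsilon:
            a = rng.randrange(2)
        else:
            a = max(range(2), key=lambda i: Q[s][i])
        s2, r, done = step(s, ACTIONS[a])
        # Q-learning target: bootstrap from the best action in the next state
        target = r if done else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2

greedy = [max(range(2), key=lambda i: Q[s][i]) for s in range(N_STATES - 1)]
print(greedy)  # [1, 1, 1, 1]: always move right
```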

#

pithink

brave sand Mar 22, 2022, 4:46 PM

#

if i had a drone learn how to fly, what would be the reward?

lapis sequoia Mar 22, 2022, 4:56 PM

#

misty flint could probably use simple q-learning for that

what is q learning?

brave sand Mar 22, 2022, 5:00 PM

#

lapis sequoia what is q learning?

Google

lapis sequoia Mar 22, 2022, 5:05 PM

#

hm yeah Q learning is related to reinforcement learning, and qr decomposition is a very different process.

mint palm Mar 22, 2022, 5:17 PM

#

Will i have to experiment when it comes to making architectures like this:

misty flint Mar 22, 2022, 5:38 PM

#

lapis sequoia what is q learning?

not for you. the other guy is asking about RL algorithms bud

#

kekHands

lapis sequoia Mar 22, 2022, 5:39 PM

#

hi

#

where do i ask for help

misty flint Mar 22, 2022, 5:39 PM

#

try a #help channel

serene scaffold Mar 22, 2022, 5:40 PM

#

lapis sequoia where do i ask for help

this is the channel to ask for data science help. otherwise there are instructions in #❓｜how-to-get-help. just don't interrupt someone else's help channel

lapis sequoia Mar 22, 2022, 5:40 PM

#

tysm

misty flint Mar 22, 2022, 5:40 PM

#

yes there should be available ones

#

so try to use those

lapis sequoia Mar 22, 2022, 5:50 PM

#

misty flint not for you. the other guy is asking about RL algorithms bud

Oh yeah yeah lol my bad

misty flint Mar 22, 2022, 5:54 PM

#

all good

mint palm Mar 22, 2022, 6:02 PM

#

mint palm Will i have to experiment when it comes making architectures like this:

any advice ? plz

worldly zinc Mar 22, 2022, 6:11 PM

#

Do questions about numerical integration methods go here?

serene scaffold Mar 22, 2022, 6:12 PM

#

worldly zinc Do questions about numerical integration methods go here?

I suppose, but unfortunately I think it's unlikely that you will get an answer.

worldly zinc Mar 22, 2022, 6:12 PM

#

Ah, darn

#

I tried the help channel but no luck, I'll try another server

#

thanks!

paper compass Mar 22, 2022, 6:39 PM

#

hi! is it possible to create a working sound classification model for atypical speech (stuttering etc) using this dataset? https://github.com/apple/ml-stuttering-events-dataset i'm fairly new to machine learning and my attempts left me unsatisfied.

GitHub

GitHub - apple/ml-stuttering-events-dataset

Contribute to apple/ml-stuttering-events-dataset development by creating an account on GitHub.

#

most of these audio clips contain multiple labels (in rating system 0-3)

misty flint Mar 22, 2022, 7:04 PM

#

not sure

#

speech recognition is not my area of expertise

#

but it does sound like an interesting problem

#

PikaThink

lapis sequoia Mar 22, 2022, 7:23 PM

#

paper compass hi! is it possible to create working sound classification model for atypical spe...

it looks promising at first, but I cannot find the short stuttering clips

#

only longer interviews, seems too messy to work with

misty flint Mar 22, 2022, 7:25 PM

#

oof

#

probs gotta clean it first before can get to usable state then

#

kekHands

vestal ocean Mar 22, 2022, 7:28 PM

#

artists_european = artists_european.drop(['Position','Track Name', 'URL', 'Date','Region'], axis = 1)

#

Why does this code give me this?

#

#

But i get this when running:
artists_european = artists_european.groupby("Artist")['Streams'].sum()

#

How can i make it stay in the previous format but just sum the streams for each artist?

lapis sequoia Mar 22, 2022, 7:29 PM

#

with previous format you mean including the index labels ?

#

or the order ?

vestal ocean Mar 22, 2022, 7:30 PM

#

lapis sequoia with previous format you mean including the index labels ?

just the previous format, indexing doesnt really matter

#

i can just reset it

lapis sequoia Mar 22, 2022, 7:32 PM

#

the result of groupby("Artist")['Streams'].sum() is a Series, not a DataFrame, that's why it looks different https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html

paper compass Mar 22, 2022, 7:33 PM

#

lapis sequoia it looks promising at first, but I cannot find the short stuttering clips

there's a script to extract short clips, which i already did receive

#

but training model with this data is kinda unclear for me, how should i correctly label those clips?

lapis sequoia Mar 22, 2022, 7:33 PM

#

paper compass there's a script to extract short clips, which i already did receive

ah that will help great ^^

paper compass Mar 22, 2022, 7:34 PM

#

if for example one clip contains 4 labels, with associated number from 1 to 3

lapis sequoia Mar 22, 2022, 7:34 PM

#

paper compass but training model with this data is kinda unclear for me, how should i correctl...

oh I don't know that much about it yet, just the very basics
I learn mostly Python and pandas lately, ML just every now and then not deep yet

vestal ocean Mar 22, 2022, 7:34 PM

#

lapis sequoia the result of groupby is not a dataframe, that's why it looks different https://...

is there an alternative that would do what im looking for?

lapis sequoia Mar 22, 2022, 7:35 PM

#

vestal ocean is there an alternative that would do what im looking for?

I think your best bet would be to turn the groupby result back into a DataFrame (e.g. with `.reset_index()`)
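A sketch with a small made-up frame standing in for `artists_european`: `groupby(...)["Streams"].sum()` gives a Series indexed by artist, and either `as_index=False` or `reset_index()` turns it back into a two-column DataFrame.

```python
import pandas as pd

# Hypothetical chart data in the same shape as artists_european after the drop.
artists_european = pd.DataFrame({
    "Artist": ["A", "B", "A", "C", "B"],
    "Streams": [100, 50, 25, 10, 5],
})

# groupby(...)["Streams"].sum() returns a Series indexed by Artist.
series = artists_european.groupby("Artist")["Streams"].sum()

# Option 1: as_index=False keeps Artist as an ordinary column.
summed = artists_european.groupby("Artist", as_index=False)["Streams"].sum()

# Option 2: reset_index() turns the Series back into a DataFrame.
# (Passing drop=True here would throw away the Artist labels.)
summed_too = series.reset_index()

print(summed)
```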

#

is the dataset publicly available ?

misty flint Mar 22, 2022, 7:36 PM

#

paper compass if for example one clip contains 4 labels, with associated number from 1 to 3

this is what it seems like bud

#

3 means all 3 agree on that label

vestal ocean Mar 22, 2022, 7:37 PM

#

lapis sequoia is the dataset publicly available ?

No it isnt unfortunately

paper compass Mar 22, 2022, 7:39 PM

#

misty flint this is what it seems like bud

yeah exactly! training my model with only clips that all reviewers agreed to give certain label resulted in 60% accuracy.. so what really could be improved?

lapis sequoia Mar 22, 2022, 7:40 PM

#

vestal ocean No it isnt unfortunately

ah well I'll make a simple mock-up version, hold on

vestal ocean Mar 22, 2022, 7:41 PM

#

lapis sequoia ah well I'll make a simple mock-up version, hold on

i think i might have figured it out

mint palm Mar 22, 2022, 7:41 PM

#

suggest a functional api plz, columns 7 and 8 are highly correlated, the others comparatively much less

misty flint Mar 22, 2022, 7:41 PM

#

paper compass yeah exactly! training my model with only clips that *all reviewers* agreed to ...

have you tried with 2 reviewers or even different models. i think at this point its just trying different approaches/ methods
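One way to try the 2-reviewer idea is to treat each event type as its own binary label and threshold the annotator counts. A sketch on a tiny made-up table; the column names are assumed to mirror the dataset's label CSV, so check them against the real file.

```python
import pandas as pd

# Hypothetical label table: each cell counts how many of the 3 annotators
# marked that event in the clip (0-3). Column names are assumptions.
labels = pd.DataFrame({
    "clip": ["a", "b", "c", "d"],
    "Block":        [3, 1, 0, 2],
    "Prolongation": [0, 2, 3, 1],
    "SoundRep":     [1, 0, 0, 3],
})
event_cols = ["Block", "Prolongation", "SoundRep"]

def to_multilabel(df, cols, min_votes):
    """Multi-hot targets: 1 where at least `min_votes` annotators agreed."""
    return (df[cols] >= min_votes).astype(int)

strict = to_multilabel(labels, event_cols, min_votes=3)    # all 3 agree
majority = to_multilabel(labels, event_cols, min_votes=2)  # at least 2 agree

# Positives per label: the majority threshold keeps more training examples.
print(strict.sum().tolist(), majority.sum().tolist())
```

Another option along the same lines is to keep the fraction `count / 3` as a soft target instead of thresholding.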

vestal ocean Mar 22, 2022, 7:41 PM

#

vestal ocean i think i might have figured it out

but my font is weird especially for suicideboy, is there a way to make all of the font types to be the same?

misty flint Mar 22, 2022, 7:42 PM

#

i also dont know what you tried so theres always dif ways for improvements

paper compass Mar 22, 2022, 7:43 PM

#

misty flint have you tried with 2 reviewers or even different models. i think at this point ...

i've tried multiple approaches with no success, therefore my main question is — is this dataset capable of creating actually working model?

misty flint Mar 22, 2022, 7:43 PM

#

who knows tbh

#

you cant really look at a dataset and know immediately without exploring and/or trying a few models

#

kekHands

paper compass Mar 22, 2022, 7:44 PM

#

so there's hope, that's all i really needed to know lol

misty flint Mar 22, 2022, 7:44 PM

#

kekHands

#

glad i could help (?) kekHands

paper compass Mar 22, 2022, 7:45 PM

#

yes u did help! thank you 🙂

misty flint Mar 22, 2022, 7:45 PM

#

Praise

#

@strange stump what do you mean by just plotting and gradients?

#

you seem like youre at least comfortable in excel, no?

strange stump Mar 22, 2022, 7:50 PM

#

scatter plots for example

misty flint Mar 22, 2022, 7:50 PM

#

thats a good start

strange stump Mar 22, 2022, 7:51 PM

#

i used python to analyse my data for my degree

#

i am a physics student

misty flint Mar 22, 2022, 7:51 PM

#

i see

steady basalt Mar 22, 2022, 7:51 PM

#

odd meteor Awesome. You can pick any of your favourite algorithm. For me, I like CatBoost ...

We’re only allowed to use the basic ones

strange stump Mar 22, 2022, 7:51 PM

#

had some "useless" data i needed to clean and then identify peaks

#

i used python for this

misty flint Mar 22, 2022, 7:51 PM

#

physics is a good background for this tbh

#

since you are used to working with messy data kekHands

strange stump Mar 22, 2022, 7:52 PM

#

ya thats probably why i got the interview

misty flint Mar 22, 2022, 7:52 PM

#

thats good

#

i would try to step back and try to understand the dataset when you open it up

#

look at the column names, see if theres any attached documentation that might give you more context

strange stump Mar 22, 2022, 7:52 PM

#

yeah ok

misty flint Mar 22, 2022, 7:52 PM

#

then you can maybe figure out what exactly you want to plot

strange stump Mar 22, 2022, 7:53 PM

#

so to my understanding

misty flint Mar 22, 2022, 7:53 PM

#

what is independent vs dependent

strange stump Mar 22, 2022, 7:53 PM

#

id wanna clean it first

#

technically i can do that with python

misty flint Mar 22, 2022, 7:53 PM

#

typically but you may also receive an already cleaned dataset

strange stump Mar 22, 2022, 7:54 PM

#

true!

misty flint Mar 22, 2022, 7:54 PM

#

if youre more comfortable in python, then feel free

#

whatever youre most comfortable with

strange stump Mar 22, 2022, 7:54 PM

#

and they said i have 30 mins for this

#

so its probably already cleaned

misty flint Mar 22, 2022, 7:54 PM

#

then after understanding the columns, i would do some EDA (Exploratory Data Analysis)

strange stump Mar 22, 2022, 7:54 PM

#

i just gotta draw some conclusions about the variables they give

misty flint Mar 22, 2022, 7:54 PM

#

pandas is really good for EDA

strange stump Mar 22, 2022, 7:55 PM

#

i might need to look at that more

#

i just remember using pandas to store data in a dataframe

misty flint Mar 22, 2022, 7:55 PM

#

strange stump i might need to look at that more

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html

#

is a good start

strange stump Mar 22, 2022, 7:55 PM

#

and then doing fourier transforms on my cleaned data

lapis sequoia Mar 22, 2022, 7:55 PM

#

vestal ocean i think i might have figured it out

yeah artists_european.reset_index(drop=True)
the $ make it cursive, weird cannot find info about it online
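The likely cause: Jupyter treats text between two `$` signs as inline LaTeX (MathJax), so a name stylised with dollar signs, such as `$uicideboy$`, gets rendered in a math font in the DataFrame's HTML view. pandas has a display option to switch this off for its HTML output; a minimal sketch (the sample frame is invented):

```python
import pandas as pd

# Stop pandas' HTML tables from being processed by MathJax in the notebook.
pd.set_option("display.html.use_mathjax", False)

df = pd.DataFrame({"Artist": ["$uicideboy$", "Artist B"], "Streams": [10, 20]})
print(pd.get_option("display.html.use_mathjax"))  # False
# Displaying df in a notebook cell should now show the dollar signs literally.
```

To limit it to one cell, `with pd.option_context("display.html.use_mathjax", False): display(df)` should also work.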

misty flint Mar 22, 2022, 7:55 PM

#

since you can see avg, max, min, count, etc.
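A quick EDA starter along those lines, on a small invented frame (swap in `pd.read_csv(...)` for real data). `describe()` summarises numeric columns with count, mean, std, min, quartiles and max; `include="all"` adds count/unique/top/freq for text columns.

```python
import pandas as pd

# Hypothetical dataset; replace with pd.read_csv("your_file.csv").
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "score": [52, 55, 61, 68, 70],
    "group": ["a", "b", "a", "b", "a"],
})

print(df.dtypes)                  # what type pandas inferred for each column
summary = df.describe()           # numeric columns only
print(summary)
print(df.describe(include="all"))  # numeric and text columns together
print(df.isna().sum())            # missing values per column
```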

misty flint Mar 22, 2022, 7:56 PM

#

strange stump and then doing fourier transforms on my cleaned data

lol you probably wont have to do fourier transformations here unless there is signal processing involved kekHands

strange stump Mar 22, 2022, 7:56 PM

#

HAHAHAHAH

#

yeah its just nice discrete data i hope

#

i wouldnt expect anything too spicy just a nice simple graph

misty flint Mar 22, 2022, 7:56 PM

#

probs otherwise it would be a bit much to expect from a data analyst role

strange stump Mar 22, 2022, 7:56 PM

#

graduate level too...

misty flint Mar 22, 2022, 7:57 PM

#

yeah how comfortable are you with python viz libraries

#

either matplotlib or plotly or seaborn

strange stump Mar 22, 2022, 7:57 PM

#

for example matplotlib?

#

ive used matplotlib to draw my graphs

misty flint Mar 22, 2022, 7:57 PM

#

make sure you're able to:

plot graphs
label axes and titles
draw simple regression lines
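A minimal matplotlib sketch covering those three points: a scatter plot, labelled axes and title, and a least-squares line from `np.polyfit`. The data is synthetic and the axis labels are placeholders.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data with a roughly linear trend plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
y = 2.0 * x + 1.0 + rng.normal(scale=1.5, size=x.size)

# Least-squares straight line: with deg=1, polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)

fig, ax = plt.subplots()
ax.scatter(x, y, label="data")
ax.plot(x, slope * x + intercept, color="red",
        label=f"fit: y = {slope:.2f}x + {intercept:.2f}")
ax.set_xlabel("x (units)")
ax.set_ylabel("y (units)")
ax.set_title("Scatter plot with a simple linear regression line")
ax.legend()

buf = io.BytesIO()
fig.savefig(buf, format="png")  # or fig.savefig("plot.png") / plt.show()
```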

strange stump Mar 22, 2022, 7:57 PM

#

the other channel user said seaborn is just for visualisation too but not necessary

misty flint Mar 22, 2022, 7:57 PM

#

yeah it just looks nicer is all

#

im partial to plotly myself

#

but you should be able to still convey the info with matplotlib

strange stump Mar 22, 2022, 7:58 PM

#

should i bother learning how to present data with seaborn?

misty flint Mar 22, 2022, 7:58 PM

#

ehh

#

do you think it will help

strange stump Mar 22, 2022, 7:58 PM

#

i mean if its visually pleasing they might like it more?

misty flint Mar 22, 2022, 7:58 PM

#

or do you think the company cares more about the info

strange stump Mar 22, 2022, 7:58 PM

#

psychology and sht like that

misty flint Mar 22, 2022, 7:58 PM

#

i mean its not that hard to pick up tbh

#

so up to you

#

i wouldnt use matplotlib graphs in any documentation or papers but thats me

#

kekHands

strange stump Mar 22, 2022, 7:59 PM

#

i think theyre a bit ugly too yeah xD

#

ok fine i got the visualisation bit

#

and maybe the analysis

#

so i should be good then?

misty flint Mar 22, 2022, 7:59 PM

#

good

strange stump Mar 22, 2022, 7:59 PM

#

i just gotta answer some questions apparently

misty flint Mar 22, 2022, 7:59 PM

#

get some practice in working with regular datasets and you should be good

#

just so you dont run out of time

strange stump Mar 22, 2022, 8:00 PM

#

oh yeah

#

ok i should be good then

#

thanks man

#

ill ping you next week IF i need it 😄

misty flint Mar 22, 2022, 8:01 PM

#

no problem bud

#

best of luck

#

Praise

vestal ocean Mar 22, 2022, 8:35 PM

#

lapis sequoia yeah artists_european.reset_index(drop=True) the $ make it cursive, weird cannot...

Is there a better solution to what I did?

lapis sequoia Mar 22, 2022, 8:38 PM

#

vestal ocean Is there a better solution to what I did?

not that I know of

mild dirge Mar 22, 2022, 11:07 PM

#

Using an existing neural network model (efficient net b1) for object recognition, but untrained. Trying to fit it on dataset of 400 classes (bird images) with about 100 images per class. Is this going to take really long to train/ is it possible from an untrained efficient net b1 model?

#

Running it with pytorch on a 2080 gpu with cuda

agile cobalt Mar 22, 2022, 11:17 PM

#

according to https://keras.io/api/applications/#usage-examples-for-image-classification-models it takes around 5.6 ms for each inference, but other than that I have no idea about what to even look for

mild dirge Mar 22, 2022, 11:18 PM

#

hmm

#

I'm more so wondering if it would even converge after a reasonable amount of epochs

#

it takes about 5 mins per epoch (40k images)

#

But I guess there's no good way to tell

#

Training such a massive network from scratch seems not very do-able

agile cobalt Mar 22, 2022, 11:20 PM

#

7.9M parameters with a depth of 186...

mild dirge Mar 22, 2022, 11:20 PM

#

yeah haha

#

using transfer learning it's literally 1 epoch that takes about 1 minute and 90% accuracy

#

Doing it for a project comparing pre-trained and non-pretrained networks

agile cobalt Mar 22, 2022, 11:21 PM

#

maybe look into https://keras.io/api/applications/efficientnet_v2?

mild dirge Mar 22, 2022, 11:21 PM

#

But otherwise i'd have to design my own network, which makes the comparison basically worthless

mild dirge Mar 22, 2022, 11:21 PM

#

agile cobalt maybe look into <https://keras.io/api/applications/efficientnet_v2>?

Why would this be better if I may ask? is it smaller?

agile cobalt Mar 22, 2022, 11:22 PM

#

the v2 sounds like it should be a direct improvement on the v1

mild dirge Mar 22, 2022, 11:22 PM

#

right haha

agile cobalt Mar 22, 2022, 11:23 PM

#

same authors, two years of progress later down the line

mild dirge Mar 22, 2022, 11:33 PM

#

Don't see it in pytorch, and all my code is with pytorch rn so that would complicate a lot

#

I'll just try efficientnet b0 for now, it seems a lot smaller and at least 3 times faster per iteration

mild dirge Mar 23, 2022, 1:29 AM

#

!paste

#

https://paste.pythondiscord.com/giviqakofi

#

Currently looking at this code (custom CNN model using PyTorch), And i'm not completely sure how the shapes match for a specific line (line 46)

#

The input shape there is 64 x 7 x 7 but in the forward pass they explain that the output after the layer before it would be 128 x 7 x 7 (line 68)

#

The code seems to work fine however, so is the comment wrong, or am I missing something?

#

And a bonus question, They seem to bundle these layers up multiple times. Does this pattern have a name? what does res stand for?

#

Appreciate any response!

misty flint Mar 23, 2022, 2:54 AM

#

PikaThink

#

possibly ResNet

#

where are all our CV peeps kekHands

misty flint Mar 23, 2022, 3:01 AM

#

mild dirge And a bonus question, They seem to bundle these layers up multiple times. Does t...

looks like its a typical ResNet "residual block"

#

ResNet follows VGG’s full 3×3 convolutional layer design. The residual block has two 3×3 convolutional layers with the same number of output channels. Each convolutional layer is followed by a batch normalization layer and a ReLU activation function.
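A sketch of such a block in PyTorch, closely following the structure described in the d2l chapter (this is not the code from the paste, which isn't shown here). "Res" is short for residual: the block adds its input back onto the output of the two conv layers. When the channel count changes, e.g. from 64 to 128, a 1×1 convolution on the skip path makes the shapes match, which is one way a 64×7×7 input can come out as 128×7×7.

```python
import torch
from torch import nn
from torch.nn import functional as F

class Residual(nn.Module):
    """Basic residual block: two 3x3 convs with batch norm, plus a skip connection."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # 1x1 conv on the skip path only when channels or spatial size change
        self.shortcut = None
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Conv2d(in_channels, out_channels, 1, stride=stride)

    def forward(self, x):
        y = F.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        skip = x if self.shortcut is None else self.shortcut(x)
        return F.relu(y + skip)  # add the input back before the final ReLU

block = Residual(64, 128)
out = block(torch.randn(2, 64, 7, 7))
print(out.shape)  # torch.Size([2, 128, 7, 7])
```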

#

https://d2l.ai/chapter_convolutional-modern/resnet.html

#

im not even a computer vision guy

#

kekHands

mild dirge Mar 23, 2022, 3:40 AM

#

misty flint looks like its a typical ResNet "residual block"

Ah that would make sense, just read about the residual block coincidentally too lol

#

Thx for the reply!

mild dirge Mar 23, 2022, 3:41 AM

#

mild dirge Currently looking at this code (custom CNN model using PyTorch), And i'm not com...

still wondering about this though if anyone could explain that (might reply late, going to sleep rn)

misty flint Mar 23, 2022, 3:49 AM

#

yeah maybe the comment is wrong. i believe you can always check

#

have it print the size at that line or something
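One way to do that without editing `forward` is a forward hook on each layer that prints its output shape. A sketch on a hypothetical small CNN (28×28 single-channel input assumed); with the real model you would register the hooks on its own submodules, e.g. via `model.named_modules()`.

```python
import torch
from torch import nn

# Hypothetical small CNN standing in for the model in the paste.
model = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28x28 -> 14x14
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 10),
)

shapes = []

def log_shape(module, inputs, output):
    # Forward hooks receive the module, its input tuple and its output.
    name = module.__class__.__name__
    shapes.append((name, tuple(output.shape)))
    print(f"{name:<10} -> {tuple(output.shape)}")

handles = [m.register_forward_hook(log_shape) for m in model]
model(torch.randn(1, 1, 28, 28))
for h in handles:
    h.remove()  # detach the hooks once you've seen the shapes
```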

#

good night

#

waveboye

misty flint Mar 23, 2022, 5:19 AM

#

https://medium.com/p/3de817bd69ec

Medium

Why Data Engineering Has Overtaken Data Science

It was October 2012, Data Science seemingly exploded overnight with the publishing of the now-famous Harvard Business Review article titled…

#

blobhyperthink

stone marlin Mar 23, 2022, 5:59 AM

#

I feel like one thing that people looking to advance in their DS careers tend to not think about is DE and Devops stuff, as well as Business-related things. It's totally possible to become a staff or whatever DS without this stuff, but having a general understanding, in my opinion, makes one much more competitive in the industry and allows for a more holistic understanding of the entire data pipeline --- instead of just modeling.

But since I'm doing MLE right now, my opinion is pretty biased, ha.

lone drum Mar 23, 2022, 7:16 AM

#

hello
File "C:\Users\Admin\AppData\Local\Temp/ipykernel_2592/3914410830.py", line 1
    nf_df_cur_exp = df[df['Expiry_new'] == 2022-03-24]
SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers

#

how i can fix above error ?

#

when i try `nf_df_cur_exp = df[df['Expiry_new'] == 2022-0o3-24]` i am getting an empty dataframe

#

can anyone help me in this ? ping me when reply

iron basalt Mar 23, 2022, 7:21 AM

#

lone drum hello ```python File "C:\Users\Admin\AppData\Local\Temp/ipykernel_2592/3914410...

`2022-03-24` is not valid python syntax.

#

Did you mean "2022-03-24"?
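Without quotes, `2022-03-24` is parsed as subtraction of integer literals, and `03` is rejected because of the leading zero. The `0o3` variant runs but evaluates to 2022 - 3 - 24 = 1995, which is why the filter came back empty. A sketch of comparing against an actual date instead (the frame is made up, and assumes `Expiry_new` holds date strings or datetimes):

```python
import pandas as pd

# Hypothetical frame with expiry dates stored as strings.
df = pd.DataFrame({
    "Strike Price": [14350.0, 14400.0, 14450.0],
    "Expiry_new": ["2022-03-24", "2022-03-31", "2022-03-24"],
})

# Convert once to real datetimes, then compare against a Timestamp.
df["Expiry_new"] = pd.to_datetime(df["Expiry_new"])
nf_df_cur_exp = df[df["Expiry_new"] == pd.Timestamp("2022-03-24")]
print(nf_df_cur_exp)
```

If the column is left as plain strings, comparing against the string `"2022-03-24"` works as well.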

lone drum Mar 23, 2022, 7:32 AM

#

iron basalt `2022-03-24` is not valid python syntax.

Hii thanks for your response
I have fixed the issue

inland zephyr Mar 23, 2022, 9:48 AM

#

i have question about loss while training a model

Epoch 1/10
8/8 [==============================] - 14s 578ms/step - loss: 529.6362 - accuracy: 0.7676 - categorical_crossentropy: 529.6362 - val_loss: 62.1763 - val_accuracy: 0.0000e+00 - val_categorical_crossentropy: 62.1763
Epoch 2/10
8/8 [==============================] - 2s 248ms/step - loss: 466.1245 - accuracy: 0.8966 - categorical_crossentropy: 466.1245 - val_loss: 78.1461 - val_accuracy: 0.1207 - val_categorical_crossentropy: 78.1461
Epoch 3/10
8/8 [==============================] - 2s 248ms/step - loss: 201.3840 - accuracy: 0.9024 - categorical_crossentropy: 201.3840 - val_loss: 139.4732 - val_accuracy: 0.1762 - val_categorical_crossentropy: 139.4732
...
Epoch 9/10
8/8 [==============================] - 2s 252ms/step - loss: 60.5674 - accuracy: 0.9659 - categorical_crossentropy: 60.5674 - val_loss: 897.9677 - val_accuracy: 0.7778 - val_categorical_crossentropy: 897.9677
Epoch 10/10
8/8 [==============================] - 2s 245ms/step - loss: 66.1619 - accuracy: 0.9669 - categorical_crossentropy: 66.1619 - val_loss: 924.7500 - val_accuracy: 0.8333 - val_categorical_crossentropy: 924.7500
3/3 [==============================] - 1s 241ms/step - loss: 414.3506 - accuracy: 0.8426 - categorical_crossentropy: 414.3506

i used a small number of epochs (10) since my dataset is very small. I dont know if it's normal for the validation loss to climb this quickly, but the evaluation results look fine

shell anvil Mar 23, 2022, 10:27 AM

#

that never happened to me

#

when you say the data are very small, what do you really mean by that?

#

@inland zephyr

inland zephyr Mar 23, 2022, 10:29 AM

#

18 classes, each class has 18 images

#

with an 80:20 split, so 16 train and 4 test. On top of that, i also set a 0.1 validation split during training
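One thing worth ruling out with data this small: Keras documents that `validation_split` takes the *last* fraction of the samples, before any shuffling. If the training arrays are still ordered by class (an assumption about this data), the validation set may contain only one or two classes, which can produce odd val_loss/val_accuracy curves. A sketch with invented arrays of the same size, comparing that tail slice with a stratified split from scikit-learn:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the real data: 18 classes x 18 images, sorted by class.
n_classes, per_class = 18, 18
X = np.random.rand(n_classes * per_class, 8, 8, 3).astype("float32")
y = np.repeat(np.arange(n_classes), per_class)

# Roughly what validation_split=0.1 holds out: the last 10% of rows.
split_at = int(len(X) * (1 - 0.1))
tail_classes = np.unique(y[split_at:])
print("classes in tail slice:", tail_classes.tolist())  # only the last classes

# A stratified split keeps every class represented in the validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
print("classes in stratified val set:", len(np.unique(y_val)))
# then pass it explicitly: model.fit(X_train, ..., validation_data=(X_val, ...))
```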

shell anvil Mar 23, 2022, 10:32 AM

#

ok

#

returning to your doubt

shell anvil Mar 23, 2022, 10:33 AM

#

inland zephyr i have question about loss while training a model ``` Epoch 1/10 8/8 [==========...

i think its fine when that happens

#

but...

#

I'm not 100% sure

#

I'm new to machine learning

#

but I think its fine

inland zephyr Mar 23, 2022, 10:58 AM

#

but this is what happens when i call model.evaluate()

Accuracy: 0.790123462677002
AUC: 0.5
Precision: 0.0555555559694767
Recall: 0.0555555559694767
F1-Sco: 0.0555555559694767

I think this is troublesome

lone drum Mar 23, 2022, 11:27 AM

#

my dataframe looks like this:

1 Strike Price Token_x Exchange_x ... Vega_y Gamma_y Expiry_new_y
0 14350.0 102048025.0 NgdE ... None None 2022-03-24
1 14350.0 102048025.0 NSsgE ... None None 2022-03-24

i want to make the first row the header

#

ping me when reply
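A common pandas idiom for promoting the first row to the header, sketched on a made-up frame with a few of the columns above. If the frame comes from `read_csv`/`read_excel`, passing the right `header=` row at load time avoids the problem altogether.

```python
import pandas as pd

# Hypothetical frame whose real column names ended up in the first data row.
df = pd.DataFrame([
    ["Strike Price", "Token_x", "Expiry_new_y"],
    [14350.0, 102048025.0, "2022-03-24"],
    [14350.0, 102048025.0, "2022-03-24"],
])

df.columns = df.iloc[0]                   # use row 0 as the header
df = df.iloc[1:].reset_index(drop=True)   # drop that row from the data
df.columns.name = None                    # clear the leftover "0" label on the columns
df = df.infer_objects()                   # numeric columns were object dtype; re-infer
print(df)
```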

dusty ivy Mar 23, 2022, 12:24 PM

#

HI guyz

#

can anyone help me how to implement this one?

#

I implemented this in C++ but the output doesn't seem correct

#

void train(vector<vector<double>> xy){
            int x = 0, y = 1;
            int epoch = 3;
            while (epoch--){
                random_shuffle(xy.begin(), xy.end());
                double tot_err = 0;
                while(tot_err < 0.01){
                    for(vector<double> data : xy){
                        double y_c = predict(data[x]);
                        // a.
                        err = data[y] - y_c;
                        tot_err += err * err;
                        // b.
                        b1 = b1 + alpha * err * data[x];
                        b0 = b0 + alpha * err;            
                    }
                }
            }
        }

#

the epochs here run after the x and y values have been shuffled.

#

Total error: 1.80167e+09
Total error: 1.45195e+09
Total error: 1.54914e+09
y = -2556 + 6608x

#

the total error isn't zero, which is fine, but the fitted line should be something like y = -2467 + 256x, not a slope in the thousands, so my output is way off.

tacit basin Mar 23, 2022, 12:35 PM

#

dusty ivy can anyone help me how to implement this one?

here's code for simple linear regression code from scratch: https://github.com/joelgrus/data-science-from-scratch/blob/master/scratch/simple_linear_regression.py

GitHub

data-science-from-scratch/simple_linear_regression.py at master · j...

code for Data Science From Scratch book. Contribute to joelgrus/data-science-from-scratch development by creating an account on GitHub.
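A Python sketch of the same per-sample update, for comparison with the C++ loop above. One thing stands out in that loop: `tot_err` starts at 0, so `while (tot_err < 0.01)` typically runs a single pass before the accumulated error exceeds the threshold; a stopping rule would normally compare the other way and reset the total each pass. Separately, if x and y are large (a total error around 1e9 suggests they may be), a fixed learning rate can overshoot; the sketch standardises x before fitting and converts the coefficients back at the end. The data is a made-up exact line, y = 2 + 3x.

```python
import random

def fit_sgd(xs, ys, alpha=0.01, epochs=200, seed=0):
    """Fit y = b0 + b1 * x by stochastic gradient descent on squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    std_x = (sum((x - mean_x) ** 2 for x in xs) / n) ** 0.5
    zs = [(x - mean_x) / std_x for x in xs]  # standardised inputs

    b0 = b1 = 0.0
    data = list(zip(zs, ys))
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(data)        # new sample order every epoch
        total_err = 0.0          # reset the error total every epoch
        for z, y in data:
            err = y - (b0 + b1 * z)
            total_err += err * err
            b0 += alpha * err        # gradient step for the intercept
            b1 += alpha * err * z    # gradient step for the slope
    # Undo the standardisation: y = b0 + b1 * (x - mean_x) / std_x
    slope = b1 / std_x
    intercept = b0 - slope * mean_x
    return intercept, slope, total_err

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3 * x + 2 for x in xs]  # exact line y = 2 + 3x
intercept, slope, last_err = fit_sgd(xs, ys)
print(round(intercept, 3), round(slope, 3))
```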

dusty ivy Mar 23, 2022, 12:37 PM

#

def sum_of_sqerrors(alpha: float, beta: float, x: Vector, y: Vector) -> float:
    return sum(error(alpha, beta, x_i, y_i) ** 2
               for x_i, y_i in zip(x, y))

@tacit basin what is the zip(x, y) here?

tacit basin Mar 23, 2022, 12:49 PM

#

dusty ivy ```python def sum_of_sqerrors(alpha: float, beta: float, x: Vector, y: Vector) -...

it pairs up the elements of x and y, like this:

>>> a = [1,2,3,4]
>>> b = [5,6,7,8]
>>> list(zip(a,b))
[(1, 5), (2, 6), (3, 7), (4, 8)]

dusty ivy Mar 23, 2022, 12:49 PM

#

okay thanks...

steady basalt Mar 23, 2022, 12:53 PM

#

does anyone know when sklearn's random forest was released?

#

https://www.researchgate.net/publication/318468361_Liver_Disease_Diagnosis_Based_on_Neural_Networks

#

I am quite surprised that these authors did not show in their results that RF can achieve higher than their NN. On this data set I see it is 74% with RF without oversampling

#

is this sort of academic trickery prevalent?

#

And I wonder why these essentially homework-level projects are being published

misty flint Mar 23, 2022, 1:11 PM

#

stone marlin I feel like one thing that people looking to advance in their DS careers tend to...

yeah i agree. there are def things that can make you more competitive, and i feel like many DS steer clear of the DE/DevOps stuff, which is a shame tbh

mint palm Mar 23, 2022, 1:18 PM

#

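```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization, Dense
model = Sequential()
model.add(Dense(8, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(4, activation='tanh'))
model.add(BatchNormalization())
model.add(Dense(3, activation='softmax'))  # 3-class output
```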

#

comment on performance please

tacit basin Mar 23, 2022, 1:27 PM

#

Beautiful

#

Although a bit unusual that train is worse than test

mint palm Mar 23, 2022, 1:33 PM

#

tacit basin Although a bit unusual that train is worse than test

yeah, focussing on that

mild dirge Mar 23, 2022, 1:44 PM

#

Maybe the test images are cherry-picked? @mint palm

mint palm Mar 23, 2022, 1:48 PM

#

mild dirge Maybe the test images are cherry-picked? @mint palm

my dataset is huge, half a million examples; i also used train_test_split with shuffle
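For reference, scikit-learn's `train_test_split` shuffles by default; `random_state` fixes the shuffle, and `stratify` keeps the class proportions the same in both splits. The arrays below are random placeholders, not the dataset discussed here. Re-running with a few different `random_state` values is a cheap way to see whether a train/test gap is just a lucky split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))     # placeholder features
y = rng.integers(0, 3, size=1000)  # placeholder 3-class labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)         # (800, 8) (200, 8)
print(np.bincount(y_train) / len(y_train))  # class proportions in train
print(np.bincount(y_test) / len(y_test))    # nearly identical proportions in test
```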

mild dirge Mar 23, 2022, 1:48 PM

#

Ah

#

well if it's not averaged it could just be an outlier

mint palm Mar 23, 2022, 1:49 PM

#

?

#

you mean intentionally picking examples, to get a normal distribution?

mild dirge Mar 23, 2022, 1:49 PM

#

Have another PyTorch question: when using transfer learning you often see something like the code I attached, i.e. the `model.classifier =` line. Is this an existing part of the model that we replace with our own layers?

mild dirge Mar 23, 2022, 1:50 PM

#

mint palm you means intentionally pick example, for normal distribution

Just a lucky draw on the test images maybe

mint palm Mar 23, 2022, 1:51 PM

#

mild dirge Just a lucky draw on the test images maybe

maybe removing seed would do that

karmic moth Mar 23, 2022, 2:03 PM

#

Does anyone know TF-IDF well? My question is: should we remove extremely rare words/features and the most common words/features when producing TF-IDF vectors, using min_df and max_df?

sinful bramble Mar 23, 2022, 2:22 PM

#

please i need help. i want to crop the passport photo of each student based on the order of the student in the album. i wrote an algorithm that can crop the passports, but it crops them in random order, whereas i want the first passport to be 001.jpg and the second passport to be 002.jpg.
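If the crops come from something like contour detection, the order they come back in is not guaranteed (an assumption about how the crops were found). A sketch of one fix: sort the bounding boxes top-to-bottom, then left-to-right within each row, and name files with a zero-padded counter. The `(x, y, w, h)` box format, the `row_tol` pixel tolerance, and the sample boxes are all hypothetical.

```python
def order_boxes(boxes, row_tol=20):
    """Sort (x, y, w, h) boxes in reading order: rows top-to-bottom, then left-to-right.

    A box joins the current row if its top edge is within `row_tol` pixels
    of the top edge of that row's first box.
    """
    rows, current = [], []
    for box in sorted(boxes, key=lambda b: b[1]):  # sort by top edge
        if current and abs(box[1] - current[0][1]) > row_tol:
            rows.append(sorted(current, key=lambda b: b[0]))  # close row, sort by x
            current = []
        current.append(box)
    if current:
        rows.append(sorted(current, key=lambda b: b[0]))
    return [box for row in rows for box in row]


def passport_names(count):
    """Zero-padded file names: 001.jpg, 002.jpg, ..."""
    return [f"{i:03d}.jpg" for i in range(1, count + 1)]


# Hypothetical boxes, as they might come back in arbitrary order
boxes = [(300, 12, 80, 100), (10, 10, 80, 100), (155, 210, 80, 100), (10, 205, 80, 100)]
ordered = order_boxes(boxes)
for name, (x, y, w, h) in zip(passport_names(len(ordered)), ordered):
    print(name, (x, y, w, h))
    # with OpenCV this is where you would save: cv2.imwrite(name, image[y:y + h, x:x + w])
```

Using a tolerance instead of sorting on the raw `y` matters because photos in the same album row rarely have exactly the same top edge.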

mild dirge Mar 23, 2022, 2:27 PM

#

karmic moth Does anyone knows TF-IDF well, my question is should we remove exremely rare wor...

We did this as well, but our teacher actually pointed out that even very rare features can be very decisive. We had to classify the review rating of recipes, and stuff like "bell pepper" could make everyone give the recipe a 1 star rating, even though bell peppers aren't super common.

#

Does that make sense?

karmic moth Mar 23, 2022, 2:34 PM

#

mild dirge Does that make sense?

hmm i see
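A small illustration of what `min_df` and `max_df` do in scikit-learn's `TfidfVectorizer`, on made-up documents. An integer `min_df` is an absolute document count, and a float `max_df` is a fraction of documents. Note that "bell" and "pepper" get pruned here: exactly the kind of rare but possibly decisive feature mentioned above, so these thresholds are worth validating rather than fixing up front.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the soup has bell pepper",
    "the soup is great",
    "the cake is great",
    "the cake is dry",
]

# min_df=2: drop terms that appear in fewer than 2 documents (rare words)
# max_df=0.9: drop terms that appear in more than 90% of documents ("the" is in all 4)
vec = TfidfVectorizer(min_df=2, max_df=0.9)
X = vec.fit_transform(docs)
print(sorted(vec.vocabulary_))  # ['cake', 'great', 'is', 'soup']
print(X.shape)                  # (4, 4)
```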

grave frost Mar 23, 2022, 2:48 PM

#

@iron basalt Pretty disappointed in Numenta all in all, the fact that they're resorting to such base tricks to try and show the performance of their methods is...honestly appalling.

#

they feed an explicit one-hot-encoded vector to their model for a meta learning, multi-task RL env and they have the gall to call it a "prior" which other DL models don't have access to?

#

pretty much exploiting the definition of a prior smh

mint palm Mar 23, 2022, 2:58 PM

#

mild dirge Maybe the test images are cherry-picked? @mint palm

i tried changing the seed and the validation size; the gap between val and train accuracy stayed similar, or got very slightly worse. can i conclude my model is ok?

mild dirge Mar 23, 2022, 2:58 PM

#

I mean if that test accuracy is correct then it seems fine

#

but it is weird that your test accuracy is higher than training accuracy

#

So there might be some unknown underlying problem

desert oar Mar 23, 2022, 3:30 PM

#

patent pine I have two models, and I want to compare their output. I want to know which even...

you can just do series1 == series2, which will give you a bool-valued Series, with True where they are equal and False otherwise

#

you should check the docs to see what compare does, it probably does more than you need
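A quick sketch of both options on made-up predictions. `==` requires both Series to have identical index labels (otherwise pandas raises a `ValueError`), and `Series.compare` needs pandas 1.1 or newer.

```python
import pandas as pd

# Hypothetical predictions from two models, on the same index
preds_a = pd.Series(["cat", "dog", "dog", "bird"])
preds_b = pd.Series(["cat", "cat", "dog", "bird"])

same = preds_a == preds_b             # element-wise comparison -> bool Series
print(same.tolist())                  # [True, False, True, True]
print(same.mean())                    # 0.75, the fraction of rows where the models agree
print(preds_a[~same].index.tolist())  # [1], the rows where they differ

# Series.compare keeps only the differing rows, side by side as 'self' and 'other'
print(preds_a.compare(preds_b))
```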

mild dirge Mar 23, 2022, 5:17 PM

#

Is it unconventional to not freeze a large part of the model when using transfer learning?

lapis sequoia Mar 23, 2022, 5:43 PM

#

Can someone help me figure out why it's not working when I try it in a loop?

#

model is a list of strings

serene scaffold Mar 23, 2022, 5:51 PM

#

lapis sequoia Can someone help in why is it not working when I try it in loop

we'd need to see the whole error message. the most salient part is off-screen. Also, text is pretty much always better than screenshots

#

!code

arctic wedgeBOT Mar 23, 2022, 5:51 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

serene scaffold Mar 23, 2022, 5:51 PM

#

Anyway, it looks like you're trying to select rows where df['Model'] is an element of model, is that right?

lapis sequoia Mar 23, 2022, 5:52 PM

#

```
IndexError                                Traceback (most recent call last)
<ipython-input-50-9ca28fa27480> in <module>
      1 for m in model:
----> 2     x=df[df["Model"]==m].sort_values("Total",ascending=False).iloc[0]    ##Taking the one with the maximum Total

~\anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
    893 
    894             maybe_callable = com.apply_if_callable(key, self.obj)
--> 895             return self._getitem_axis(maybe_callable, axis=axis)
    896 
    897     def _is_scalar_access(self, key: Tuple):

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
   1499 
   1500             # validate the location
-> 1501             self._validate_integer(key, axis)
   1502 
   1503             return self.obj._ixs(key, axis=axis)

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_integer(self, key, axis)
   1442         len_axis = len(self.obj._get_axis(axis))
   1443         if key >= len_axis or key < -len_axis:
-> 1444             raise IndexError("single positional indexer is out-of-bounds")
   1445 
   1446     # -------------------------------------------------------------------

IndexError: single positional indexer is out-of-bounds```

lapis sequoia Mar 23, 2022, 5:52 PM

#

serene scaffold Anyway, it looks like you're trying to select rows where `df['Model']` is an ele...

yup

serene scaffold Mar 23, 2022, 5:52 PM

#

lapis sequoia yup

and why do you want that?

lapis sequoia Mar 23, 2022, 5:53 PM

#

there are repeated rows. I want to take the one with the maximum "Total" value, because it's the latest

serene scaffold Mar 23, 2022, 5:54 PM

#

try df.groupby('Model')['Total'].max()

#

without the for loop. just that one statement by itself.

lapis sequoia Mar 23, 2022, 5:58 PM

#

Then do I drop the duplicates and replace their Total values with the ones from this new DF?

serene scaffold Mar 23, 2022, 5:59 PM

#

lapis sequoia Then do i drop the duplicates and replace their total value from this new DF?

what I showed you is just intended to give you the max Total value for each value of Model. I don't know enough about your goal or what data you have to guide you beyond that

#

If I were to do so, I'd have to see the rest of the DataFrame. but I'm heading out now.
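A sketch on a toy DataFrame; the column names match the traceback, but the data is made up. The `IndexError` above is what `.iloc[0]` raises on an empty DataFrame, so one likely cause (an assumption, since the full data wasn't shown) is a name in `model` with no matching rows. Selecting the whole row with the maximum `Total` per `Model` can be done without a loop via `groupby(...).idxmax()`, and `isin` restricts it to the names in `model`.

```python
import pandas as pd

df = pd.DataFrame({
    "Model": ["A", "A", "B", "B", "C"],
    "Total": [10, 25, 7, 3, 12],
    "Other": ["old", "new", "new", "old", "only"],
})
model = ["A", "B", "D"]  # hypothetical list; "D" has no rows in df

# Why the loop can fail: the filter for "D" is empty, so .iloc[0] has nothing to return
empty = df[df["Model"] == "D"]
print(len(empty))  # 0 -> .iloc[0] would raise "single positional indexer is out-of-bounds"

# Max Total per Model (one number per group)
print(df.groupby("Model")["Total"].max())

# Whole row with the max Total per Model, limited to the names in `model`
subset = df[df["Model"].isin(model)]
latest = subset.loc[subset.groupby("Model")["Total"].idxmax()]
print(latest)
```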

tacit basin Mar 23, 2022, 6:05 PM

#

mild dirge Is it unconventional to not freeze a large part of the model when using transfer...

The way fastai fine tune works by default is to train one epoch with frozen weights except for the head, then multiple epochs with unfrozen weights but with differential learning rates, meaning earlier layers get updated with a much lower learning rate than later layers

mint palm Mar 23, 2022, 6:19 PM

#

https://stackoverflow.com/questions/56694980/valueerror-n-components-4-must-be-between-0-and-minn-samples-n-features-2-wi

Stack Overflow

ValueError: n_components=4 must be between 0 and min(n_samples, n_f...

I get an error with my code like this
for n, df_process in enumerate(all_df):
#Normalize data using the Standard Scaler method
scaler=StandardScaler()

#

what does it mean by n_samples and n_features

#

in above
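In scikit-learn, `n_samples` is the number of rows in `X` (observations) and `n_features` the number of columns, and PCA cannot return more components than the smaller of the two. A small sketch on random data with 2 features; the exact error message wording varies between scikit-learn versions.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(100, 2))  # 100 rows (samples), 2 columns (features)
n_samples, n_features = X.shape
print(n_samples, n_features)  # 100 2

try:
    PCA(n_components=4).fit(X)  # 4 > min(100, 2), so this is rejected
except ValueError as exc:
    print("ValueError:", exc)

pca = PCA(n_components=min(n_samples, n_features)).fit(X)  # at most 2 components here
print(pca.n_components_)  # 2
```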

mild dirge Mar 23, 2022, 6:41 PM

#

tacit basin The way fastai fine tune works by default is to train one epoch with frozen weig...

with "except for the head" you mean put on a different head and train that right? and what do you mean with differential learning rate?

final spruce Mar 23, 2022, 6:45 PM

#

Hey, I'm trying to make my own trading bot.
Does anyone know a good API I could use to get stock information (not only the price/volume but also indicator values like MACD / RSI)?

formal breach Mar 23, 2022, 7:40 PM

#

Guys, is Dataquest a good site to learn Python? i've learned the basics, so now i will move on to doing projects and learning about data science (data analysis and machine learning). are there better places to learn, or is this good? i like learning visualization and doing projects

jaunty mural Mar 23, 2022, 7:42 PM

#

formal breach Guys is dataquest a good site to learn python? i've learned the basic so now i w...

it's the most popular thing in Python (not counting web development with Django); data manipulation with pandas and numpy, plus visualisation with matplotlib, are the most common things today

lapis sequoia Mar 23, 2022, 7:52 PM

#

formal breach Guys is dataquest a good site to learn python? i've learned the basic so now i w...

it's pretty good, but only if you have the money to spare. You can learn similar things all over the internet for free

tacit basin Mar 23, 2022, 8:00 PM

#

mild dirge with "except for the head" you mean put on a different head and train that right...

Yes, correct, the first epoch only trains the new head. Differential learning rates means different layers in the network are trained with different learning rates.
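A minimal plain-PyTorch sketch of the ideas in this thread: replacing an existing `classifier` head, training only the new head first, then unfreezing with per-layer (discriminative) learning rates via optimizer parameter groups. `TinyBackbone` is a hypothetical stand-in for a pretrained network (torchvision models such as VGG and DenseNet do expose a `classifier` attribute, while ResNet calls its head `fc`). The layer sizes, class count, and learning rates are illustrative assumptions, and this is not fastai's `fine_tune`.

```python
import torch
from torch import nn


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for a pretrained model with `features` and `classifier` parts."""

    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Linear(16, 32), nn.ReLU(),  # "early" layer
            nn.Linear(32, 32), nn.ReLU(),  # "later" layer
        )
        self.classifier = nn.Linear(32, num_classes)  # original head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = TinyBackbone()

# Assigning to the existing attribute swaps out the old head for a new one (5 classes here).
model.classifier = nn.Linear(32, 5)

# Stage 1: freeze the backbone; only the new head's parameters go to the optimizer.
for p in model.features.parameters():
    p.requires_grad = False
head_optimizer = torch.optim.SGD(model.classifier.parameters(), lr=1e-2)
# ... train for an epoch or so with head_optimizer ...

# Stage 2: unfreeze, and use parameter groups so earlier layers learn more slowly.
for p in model.features.parameters():
    p.requires_grad = True
full_optimizer = torch.optim.SGD(
    [
        {"params": model.features[0].parameters(), "lr": 1e-4},  # earliest layer
        {"params": model.features[2].parameters(), "lr": 1e-3},  # later layer
        {"params": model.classifier.parameters(), "lr": 1e-2},   # new head
    ],
    momentum=0.9,
)
# Optional: cosine-anneal every group's learning rate over 10 epochs.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(full_optimizer, T_max=10)

out = model(torch.randn(4, 16))
print(out.shape)  # torch.Size([4, 5])
print([group["lr"] for group in full_optimizer.param_groups])
```

Not freezing at all is also a valid choice; the staged approach mainly protects the pretrained weights from large, noisy gradients while the new head is still random.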

mild dirge Mar 23, 2022, 8:00 PM

#

ah alright

#

Using SGD now with decaying learning rates for the entire model so won't be using that I think

#

thx for the replies!

tacit basin Mar 23, 2022, 8:02 PM

#

mild dirge Using SGD now with decaying learning rates for the entire model so won't be usin...

You get it for free with the fastai learner. You can use an LR scheduler like cosine annealing and, on top of it, differential learning rates

mild dirge Mar 23, 2022, 8:04 PM

#

ah cool. Just started using pytorch for this project. I'll just probably try to wrap up this project as soon as possible to start on the report. but pytorch seems really cool.

#

The fact that it is a lot lower level than sklearn, which is what I mostly use, really helps me understand stuff better

tacit basin Mar 23, 2022, 8:04 PM

#

Fastai is a layer on top of PyTorch https://docs.fast.ai/callback.schedule.html#Learner.fine_tune

Hyperparam schedule

Callback and helper functions to schedule any hyper-parameter

#

Sorry it's discriminative learning rates not differential

iron basalt Mar 23, 2022, 8:17 PM

#

grave frost @iron basalt Pretty disappointed in Numenta all in all, the fact that ...

Is this still about that same paper? Can you link it again? Let me put it this way, I am impressed with Numenta's ideas, not their results or comparisons with others. For example, there are several others that have also gone and run with the grid cell idea and their stuff seems to be getting results. So I would suggest taking their ideas and trying to make them work yourself, and avoid the issues that they have. There is ofc always drama in the ML community and such. If you think the idea might still have some merit to it but they did it wrong, either in their implementation or method of testing / comparison, then you can do it the right way. You can find those that blindly follow Numenta's work, and those that are overly dismissive.

lapis sequoia Mar 23, 2022, 8:54 PM

#

If anyone here with experience in the AI & ML field doesn't mind recommending a solid book for AI beginners/juniors, please do; I'm really confused about how to learn AI

stark zenith Mar 23, 2022, 9:59 PM

#

Been doing data manipulation in pandas for like 6 months now for work and just now got a strong hold on apply, and now I feel like i can both rule the world, and need to reinvent everything I wrote so far.

proven meadow Mar 23, 2022, 10:00 PM

#

So full disclosure I originally asked this in a help channel but it’s kind of an open ended question so I think it fits here better.

Hello, so I have a project where I need to parse through a txt file of a classic novel, examine all lines of spoken dialogue, and (this is the hard part) decide which character speaks which line.

My teacher has not lectured us on NLP before, and I honestly don’t know where to start for the actual classification algorithm. If anyone can help guide me with any tips on what I would have to employ, links to resources (that aren’t too mathy for a HS sophomore), explain some packages that can help, etc., that would be great, thanks!

serene scaffold Mar 23, 2022, 10:03 PM

#

proven meadow So full disclosure I originally asked this in a help channel but it’s kind of an...

the simplest way would probably be to use words or groups of words that one character uses more than other characters

#

what class is this for?

proven meadow Mar 23, 2022, 10:05 PM

#

serene scaffold what class is this for?

My AI course in school. Also, would NLTK be any help for this?

serene scaffold Mar 23, 2022, 10:06 PM

#

proven meadow My AI course in school. Also would the NLTK be any help for this?

yes, nltk can help you make ngrams. an ngram is a tuple of n words/tokens/"grams". so you might pick an n value of 3, which are trigrams. and then you'd look for trigrams that correlate with specific characters.

misty flint Mar 23, 2022, 10:08 PM

#

so can SpaCy

#

RunFail

proven meadow Mar 23, 2022, 10:08 PM

#

serene scaffold yes, nltk can help you make ngrams. an ngram is a tuple of n words/tokens/"grams...

How does one determine the correlation, though, if the same exact three words aren't said repeatedly? Is it an ML algorithm?

#

Sorry I’m a noob at this I don’t really know ML

blissful bone Mar 23, 2022, 10:09 PM

#

#885998868200828928 message

misty flint Mar 23, 2022, 10:09 PM

#

depending on the complexity of the text, simple classification models should suffice

#

otherwise

#

youre looking at maybe more advanced stuff

#

DoggoKek

serene scaffold Mar 23, 2022, 10:11 PM

#

proven meadow How does one determine the correlation though if its not the case that the same ...

if you do it based on trigrams, you'd have to make trigrams for everything that every character says in your training data, and count them. and then add them up for all the characters. and then see which ones are high for a given character but low for all the others.
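A stdlib-only sketch of the counting approach described above. It assumes you already have some lines attributed to each character (the `lines` dict is hypothetical training data), and the score used here, the count for this character minus the count for everyone else, is just one simple choice. `nltk.util.ngrams(tokens, n)` produces the same tuples as the `ngrams` helper below.

```python
from collections import Counter


def ngrams(tokens, n=3):
    """Consecutive n-token tuples (trigrams by default)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical training data: character -> lines they speak
lines = {
    "Alice": ["I dare say you are right", "I dare say it is late"],
    "Bob": ["well I never did", "well I never heard of it"],
}

# Count every trigram each character uses
counts = {
    who: Counter(g for line in said for g in ngrams(line.lower().split()))
    for who, said in lines.items()
}


def distinctive(who, top=3):
    """Trigrams that are frequent for `who` but rare for every other character."""
    others = Counter()
    for name, c in counts.items():
        if name != who:
            others.update(c)
    scores = {g: n - others[g] for g, n in counts[who].items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]


print(distinctive("Alice"))  # ('i', 'dare', 'say') comes first
print(distinctive("Bob"))    # ('well', 'i', 'never') comes first
```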

proven meadow Mar 23, 2022, 10:12 PM

#

serene scaffold if you do it based on trigrams, you'd have to make trigrams for everything that ...

I think it’s supposed to be unsupervised learning

misty flint Mar 23, 2022, 10:13 PM

#

so instead of classification you're doing clustering. for the most part the same tbh

#

kekHands
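A minimal clustering sketch with scikit-learn, assuming the dialogue lines have already been extracted: TF-IDF over word 1- to 3-grams, then k-means. The lines and `n_clusters=2` are placeholder assumptions; in practice the number of clusters would be the number of speakers you expect, and clusters of similar wording will not necessarily line up with speakers.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical dialogue lines pulled from the novel
dialogue = [
    "I dare say you are right",
    "I dare say it is late",
    "well I never did",
    "well I never heard of it",
]

X = TfidfVectorizer(ngram_range=(1, 3)).fit_transform(dialogue)  # sparse TF-IDF matrix
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)      # KMeans accepts sparse input

for label, line in zip(km.labels_, dialogue):
    print(label, line)
```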

serene scaffold Mar 23, 2022, 10:13 PM

#

proven meadow I think it’s supposed to be unsupervised learning

uh okay. are there any other details like this?

misty flint Mar 23, 2022, 10:13 PM

#

inb4 10+ details

#

kekHands

proven meadow Mar 23, 2022, 10:13 PM

#

Give me a moment

misty flint Mar 23, 2022, 10:13 PM

#

this is like when you work with a business stakeholder

#

no offense to business peeps

#

kekHands

grave frost Mar 23, 2022, 10:15 PM

#

iron basalt Is this still about that same paper? Can you link it again? Let me put it this w...

ye, https://arxiv.org/abs/2201.00042
my main issue is: why such inconsistency in the overall testing methodology? I put it on the forum, and the authors won't even reply 🤷‍♂️

its not a major thing really, but....it does put a dent in Numenta's overall credibility

arXiv.org

Avoiding Catastrophe: Active Dendrites Enable Multi-Task Learning...

A key challenge for AI is to build embodied systems that operate in
dynamically changing environments. Such systems must adapt to changing task
contexts and learn continuously. Although standard...

proven meadow Mar 23, 2022, 10:15 PM

#

serene scaffold uh okay. are there any other details like this?

Ok so basically the project is: Build a "profile" for each character in the novel. The profile includes all of their spoken dialogue as well as a list of adjectives that would accurately describe their characterization in the novel. I'm pretty sure it's supposed to be unsupervised clustering (i.e., I will not go through the novel by hand and match words to characters).

#

This is my teacher's first year doing this lab so it's pretty open-ended, I don't need something perfect

serene scaffold Mar 23, 2022, 10:16 PM

#

proven meadow Ok so basically the project is: Build a "profile" for each character in the nove...

what do you mean "by hand"? none of it will be "by hand" because you'll write a program that does it.

proven meadow Mar 23, 2022, 10:17 PM

#

Oh oops yeah. In that case what I meant is that I'm only doing this for one novel

#

wait