#data-science-and-ml


warm verge
#

Can you send your variables?

#

Like copy and paste a sample

#

Are you trying to convert your labels to floats?

lapis sequoia
#

hey, i was wondering how could i make labels and test images ? because ever since i started learning they're given to me

upper spindle
#

can someone help me with my lstm model, this is what it predicts

#
import tensorflow as tf
from tensorflow.keras import Sequential

lstm_5 = Sequential([
    tf.keras.layers.InputLayer(input_shape=[n_past, n_dims]),

    # ADDING 1st LSTM LAYER
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.Dropout(0.2),

    # ADDING 2nd LSTM LAYER
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dropout(0.2),

    # DENSE OUTPUT LAYER
    tf.keras.layers.Dense(1)
])

lstm_5.compile(loss='mse', optimizer="adam")

multivariate = lstm_5.fit(mat_X_train, mat_y_train, 
                  validation_split=0.2, shuffle=True,
                  verbose=0, batch_size=batch_size, epochs=200)

# FORECASTING ON VALIDATION SET
multivariate_prediction = lstm(lstm_5, validation_index)

# SCALING OUTPUT TO MINMAXSCALER FITTED TO TRAINING CURRENT VOLUME
multivariate_prediction_scaled = scale(vol_scaler, multivariate_prediction)
#

have i used too many epochs or are my units incorrect

dusk tide
#

Has anyone tried the book The hundred page ML book by Andrey Burkov?

regal gale
#

any kind soul can help to give some feedback to a self-check assignment from a regression textbook? Unfortunately I have to pay for the answer and I am not willing to, hopefully someone can let me know if there's any glaring issue

woven fractal
#

Any newbies want to do some NumPy practice? I was thinking about doing some live coding.

civic ivy
#

So question. i want to have it so an AI does things repeatedly (wake up, go to the window, open fridge, sleep, repeat), but doing so brings down a variable called happiness. then i want another map like (wake up then leave), and i want the AI to choose this one when happiness reaches a certain point, while still having a chance to not choose it. Ik i could do something like

if happi == "0":
    map1 = False
    map2 = True
    while map2:
        ...  # other code stuff

but i want the AI to choose map2 instead of doing map1. I am wondering if this is possible?
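One way to read the question above is as a probabilistic choice gated by a threshold: once happiness drops to a certain level, the agent leaves with some probability, otherwise it keeps doing the usual routine. The sketch below is only an illustration under assumptions that are not in the original message: the names `choose_routine` and `simulate`, the threshold of 30, the 70% chance, the cost of 10 per day, and the idea that leaving resets happiness are all made up.

```python
import random


def choose_routine(happiness, threshold=30, leave_chance=0.7, rng=random):
    """Return "map2" (wake up, leave) or "map1" (the usual loop).

    map2 is only possible once happiness is at or below the threshold,
    and even then it is picked with probability leave_chance.
    """
    if happiness <= threshold and rng.random() < leave_chance:
        return "map2"
    return "map1"


def simulate(days=20, happiness=100, cost_per_day=10, seed=0):
    """Run the agent for a number of days and record which routine it picked."""
    rng = random.Random(seed)
    history = []
    for _ in range(days):
        routine = choose_routine(happiness, rng=rng)
        history.append(routine)
        if routine == "map1":
            # The repetitive routine wears happiness down.
            happiness = max(0, happiness - cost_per_day)
        else:
            # Assumption: leaving the house restores happiness.
            happiness = 100
    return history
```

Calling `simulate()` gives a list of routine names; the agent cannot pick `map2` until happiness has fallen to the threshold, and after that each day is a weighted coin flip.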

serene scaffold
regal gale
#

Anyone know autocorrelation function (ACF) and partial autocorrelation function (PACF) #🤡help-banana
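For anyone reading along: the ACF at lag k is the correlation between a series and a copy of itself shifted by k steps, and the PACF at lag k is that same correlation after the effect of the shorter lags has been removed. Below is a minimal pure-Python sketch of the sample ACF only; the function name `autocorrelation` is made up for illustration, and in practice a library such as statsmodels would normally be used for both ACF and PACF.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (lag 0 is always 1.0)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered)
    if variance == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    acf = []
    for lag in range(max_lag + 1):
        # Sum of products between the series and itself shifted by `lag`.
        cov = sum(centered[t] * centered[t - lag] for t in range(lag, n))
        acf.append(cov / variance)
    return acf
```

A series that flips sign every step has a strongly negative value at lag 1 and a strongly positive one at lag 2, which is the kind of pattern an ACF plot makes visible.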

civic ivy
#

that is a good point but if it was possible i was going to see if i could use this in my project SLOAM as a way to emulate a form of emotion that is caused by repetitive actions. SLOAM is a Self Learning Optical Auditory Machine. it will learn from well optical and auditory responses.

bitter pilot
#

Hello Everyone,
I have a data science use case (I am not a super beginner tho), however I need some guidance on where to start/look.
I have a dataset of people with some work information (department, skills, responsibilities, etc.) and also the trainings they have taken.
I need to train a system to suggest to new or even existing employees which trainings they should take, based on the model.
for example, if I switched from Junior to Senior range, then the system would recommend which trainings to follow.
any pointer in the right direction would be useful

odd meteor
#

Just a quick question. What makes you believe your clustering "didn't make sense"?

Your observation/answer to this will determine the kind of solution I'll suggest you try.

urban lance
odd meteor
#

It could possibly be the problem of exploding gradient.

regal gale
#

Anyone know autocorrelation function (ACF) and partial autocorrelation function (PACF)

odd meteor
urban lance
#

no not KMeans, it wasn't well suited for my problem

#

I used Chi2 distance with gaussian mixture models and hierarchical clustering (My data was/is a count of the occurrences within a time interval)

regal gale
#

I need some help with the autocorrelation function (ACF) and partial autocorrelation function (PACF)

upper spindle
regal gale
#

@upper spindle omg life savior

upper spindle
#

wasnt the best at it, but i understood acf and pacf partially

regal gale
odd meteor
odd meteor
urban lance
#

DBSCAN had even worse results 😅

#

I've now tinkered some more with the features, tomorrow I'll see if it worked or not

regal gale
#

Anyone know autocorrelation function (ACF) and partial autocorrelation function (PACF)

odd meteor
# urban lance DBSCAN had even worse results 😅

If your data is high-dimensional, try reducing it with t-SNE before applying your preferred clustering algorithm. Maybe you'll get a more customer-friendly result that way.

Nonetheless, may the force be with you 😊

urban lance
#

I read that t-SNE might find false patterns

odd meteor
urban lance
#

I mean that it might not be able to reconstruct the data in 2d

odd meteor
#

You can use PCA as well, but t-SNE usually separates clusters better for visualization.

regal gale
#

Anyone know autocorrelation function (ACF) and partial autocorrelation function (PACF)

lapis sequoia
#

so it could be a vanishing gradient too right?

misty flint
#

learning that i can replace pandas' default matplotlib viz library with the plotly one has greatly improved my mood

#

whoever came up with that feature. i am glad. shiroGomen

spiral gale
#

hey, i am stuck with assigning weights to my multinomial one-vs-rest logistic regression with sklearn. I know that I can assign weights, but how would I go about it in a multilabel setting with e.g. seven possible outcomes / labels?

regal gale
#

Anyone familiar with bootstrapping (sampling with replacement)?
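For context, bootstrapping means repeatedly drawing a sample of the same size as the data, with replacement, and recomputing a statistic on each draw to see how much it varies. The sketch below uses only the standard library; the helper names `bootstrap_means` and `percentile_interval`, the 1000 resamples, and the simple rounded-index percentile rule are illustrative assumptions, not a reference implementation.

```python
import random
import statistics


def bootstrap_means(data, n_resamples=1000, seed=0):
    """Mean of each bootstrap resample (same size as data, drawn with replacement)."""
    rng = random.Random(seed)
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]


def percentile_interval(values, lower=2.5, upper=97.5):
    """Rough percentile interval using the nearest sorted position."""
    ordered = sorted(values)

    def pick(p):
        idx = round(p / 100 * (len(ordered) - 1))
        return ordered[idx]

    return pick(lower), pick(upper)
```

Passing the output of `bootstrap_means(data)` to `percentile_interval` gives an approximate 95% interval for the mean; the same pattern works for a median or any other statistic by swapping out `statistics.fmean`.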

frosty jackal
#

Which platform is better for data science: Colab or Jupyter?

cosmic pewter
#

i'm VERY NEW to tensorflow

and I want to make a rock paper scissor game using tensorflow (basically just detect rock paper or scissor)
I've trained my model, but i wonder how i can trigger a function when the CONFIDENCE is above 70% or something like that, e.g. by having it return a list with data that I can check
||or maybe it already returns something i can check, like a confidence level, but i just don't know how to access it. educate me pls||

anyone have tutorials or article about this matter?

there's a ref : this project use tensorflow trained model to check for laughing face and trigger "LOSE"
https://github.com/andypotato/do-not-laugh

**
tl;dr : wanna make rock paper scissor game using my already trained model
how can i check for the confidence score so i can use it to trigger win/lose

please suggest what i need to know and what i need to learn
**
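A common pattern for this, assuming the model ends in a softmax layer so that each prediction is a list of per-class probabilities, is to take the largest probability as the confidence and only act when it clears the threshold. The sketch below works on a plain list of numbers; the `decide` function, the `CLASS_NAMES` order, and the 0.7 threshold are assumptions for illustration and have to match how the model was actually trained.

```python
CLASS_NAMES = ["rock", "paper", "scissors"]  # assumed label order, must match training


def decide(probabilities, threshold=0.7):
    """Return (label, confidence), with label None when confidence is too low."""
    # Index of the class with the highest predicted probability.
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    confidence = float(probabilities[best])
    if confidence >= threshold:
        return CLASS_NAMES[best], confidence
    return None, confidence
```

With a Keras model, the probabilities would typically come from something like the first row of `model.predict(batch)`; the game logic then only fires win/lose when the returned label is not `None`.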


prisma mist
#

any way to make pytesseract.image_to_string() faster?

jolly knoll
#

Guys, how do you showcase a Tableau workbook when you no longer have Tableau?

tacit basin
karmic valley
#

please

#

if anyone could help me

hollow sentinel
#

damnit

#

dead kernel

#

why

serene scaffold
#

because fuck notebooks, that's why

hollow sentinel
#

bless

#

i created a new environment and then just copy pasted that and it worked

iron basalt
#

This talk will present a new approach to dimension reduction called UMAP. UMAP is grounded in manifold learning and topology, making an effort to preserve the topological structure of the data. The resulting algorithm can provide both 2D visualisations of data of comparable quality to t-SNE, and general purpose dimension reduction. UMAP has been...

graceful glacier
#

what would be the best method of getting the second column given the first column

graceful glacier
#

yes

misty flint
#

i mean theyre obv not meant for dev or production

tall blaze
#

I would separate the text column by the space delimiter. Then write a lambda function to multiply the newly created numeric column based on whether the first column is minus or not

misty flint
#

but theyre decent for experiments

tall blaze
serene scaffold
#

@misty flint we were shitting on notebooks in #pedagogy earlier

misty flint
#

honestly they can be kinda annoying tbh

serene scaffold
#

honestly they can be kinda annoying tbh

misty flint
graceful glacier
#

thats along the lines of wht i was thinking as well

misty flint
#

maybe they will die out in the future when better tooling comes out

tall blaze
# graceful glacier i dont think the code is necessary

Snap I started writing it. I’m on my phone and it’s not the most efficient but it gets the job done:
df[['col1', 'col2']]=df['string'].str.split(' ', expand=True)

df['col2']=pd.to_numeric(df['col2'])

df['col1']=df['col1'].apply(lambda x:-1 if x=='Minus' else 1)

df['final_col']=df['col1']*df['col2']

graceful glacier
#

thanks

grave frost
hollow sentinel
#

that made me almost spit out my coffee

#

good one

#

CPR failed, trying paddles

grave frost
#

thanks lol

grave frost
tall blaze
grave frost
#

if you mean to say its more convenient to crash a kernel

tall blaze
grave frost
#

I find colab more useful. the fact that I can totally mess my env up by downloading 10 versions of torch and do a simple reset blows my mind

tall blaze
grave frost
#

a big F U to docker nerds, nerding over why learning 100 commands and debugging errors is better than colab

grave frost
#

colab pro+ is heavily undercutting its competition

tall blaze
# grave frost I doubt it

Maybe, depending on if you are using it a ton. I have a databricks account with aws cloud and never go over 50 a month for my personal stuff

tall blaze
#

Just be supperrrrr careful to terminate clusters when you done

grave frost
tall blaze
#

Pretty much

grave frost
#

useless

#

AWS costs a shitton

misty flint
#

databricks has a ton of ML specific stuff tho

grave frost
#

GCP is soo cheap

#

or maybe y'all are just rich

tall blaze
misty flint
#

i mean i think 50/mo is cheap

tall blaze
#

Heaven

grave frost
#

well, depends

tall blaze
#

Spark sql pass throughs with hive storage

grave frost
#

Google's TPUs are dirt cheap, and AWS doesn't do much for pricing for A100s

tall blaze
#

But if you training all bloody month go with colab, although I think they have ways of limiting usage

grave frost
#

what are you training that takes a month?

#

even GPT2 took a few weeks

tall blaze
grave frost
#

I suppose

#

Colab is for experimentation anyways, which is why I love the colab+kaggle combo

#

GCP is just objectively cheap all around. it has a pretty UI, helpful support and plenty of integrations

tall blaze
#

I think google is trying to capture space in the market, they’ll prolly jack up prices

#

Just don’t use Microsoft lol

grave frost
#

¯_(ツ)_/¯

#

I doubt it

#

I think they're just getting more competitive with thinner margins - and their TPUs ofc

tall blaze
#

Maybe

grave frost
#

TPUs are a pain to get working, but they just outperform every single GPU out there no biggie

#

they're criminally underrated IMO

#

(but I hope it stays that way lol)

serene scaffold
iron basalt
# grave frost (but I hope it stays that way lol)

I would say that they are everywhere now due to being part of Apple's new SOCs. But in typical Apple fashion they are very closed off in terms of access. You have to use their own tools to do specific operations on it while in reality it's pretty generic and could probably support something like OpenCL.

iron basalt
#

Since it all readjusted as you change anything then.

#

(see Enso)

#

(or shader graphs in Blender or flow graphs in Unreal, etc)

serene scaffold
#

regardless, I've decided to position myself as the anti-notebook, anti-jupyter guy to spread awareness of their limitations/issues.

iron basalt
#

Rather than what they actually want to be which is this pipes and filters thing.

inland mantle
#

@serene scaffold This is just a personal question. Because you are a data scientist with linguistics, do you work on NLP?

raven cloud
#

how would you go about counting objects (with tensorflow for example) in a video where the camera is moving randomly ?
As an example , say the camera is pointing towards an object and counts it and afterwards it moves away from that object facing anywhere else away from the object and later faces again to the same object but you would not add it to the count since its already been counted.

karmic valley
#

i have an image and i want python to work out the average pixel intensity below the blue line and average pixel intensity above blue line from image. I know code to work out average pixel intensity of full image. But dont know how to do pixel intensity below blue line and above. IF you have any ideas please help. Or another way is to get python to split image into 2 - one with everything below blue line and other with everything above line then i can do my code. But i dont know how to do that

#

just @ me if you have any thoughts
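One possible approach, assuming the image is loaded as an RGB NumPy array and the blue line is roughly horizontal, is to find the rows that contain strongly blue pixels and then average the grayscale intensity of everything above and below them. The function name `split_means_at_blue_line` and the colour thresholds are illustrative guesses that would need tuning on the real image; note also that OpenCV loads images as BGR, so the channels would need converting first.

```python
import numpy as np


def split_means_at_blue_line(image_rgb, blue_min=150, other_max=100):
    """Return (mean_above, mean_below) of grayscale intensity around a blue line.

    image_rgb: array of shape (height, width, 3) in RGB channel order.
    The thresholds are assumptions about what counts as "blue".
    """
    r = image_rgb[..., 0].astype(int)
    g = image_rgb[..., 1].astype(int)
    b = image_rgb[..., 2].astype(int)
    blue_mask = (b > blue_min) & (r < other_max) & (g < other_max)

    # Rows that contain at least one blue pixel.
    rows = np.where(blue_mask.any(axis=1))[0]
    if rows.size == 0:
        raise ValueError("no blue line found with the given thresholds")
    top, bottom = rows.min(), rows.max()

    gray = image_rgb.mean(axis=2)
    above = gray[:top]          # everything above the line
    below = gray[bottom + 1:]   # everything below the line
    return above.mean(), below.mean()
```

The `above` and `below` slices are also the two sub-images asked about, so they could be returned or saved instead of averaged. If the line touches the top or bottom edge, one of the slices is empty and its mean comes out as NaN.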

misty flint
#

fourier transformation

#

jk

#

idk anything about signal processing

#

even tho i took one class

#

made me avoid it more monkaCHRIST

elfin jungle
#

Got a machine learning question for you guys
I have an airbnb data set that has prices of properties throughout the year (300+ entries for a single ID). Applying ml on the data would result in heavy overfitting and not capture the true goal of measuring price change throughout the year. I know in R there is lm.clustering which accounts for multiple entries, is there any equivalent in python? @tender hearth

misty flint
# serene scaffold Yes

stelercus i know you might be biased and may or may not be able to look into the future, but do you think specializing in NLP is good in terms of future job growth or should i try to specialize in another set of ML algos 🔮

karmic valley
misty flint
desert oar
#

heck you can probably even do it in photoshop, if you don't need to automate it for more than one image

#

then you can literally just look at the pixel rgb values

#

of course you can use gimp if you don't want to use adobe proprietary shitware 🙂

carmine oasis
neon imp
#

unless you are postdoc

#

my experience is that it's very hard to find work as a specialist

#

there are too many postdoc kaggle grandmasters for rent

#

I think people have an extremely tilted idea of how competitive the post doc grind is and how much talent there is in the market atm

#

ML algorithm work is being advanced by huge research teams, and the impact of a lone ranger in the field is becoming super low

iron basalt
#

(Actually applies to more than just ML)

#

(incremental vs different axis)

neon imp
#

I think you underestimate how hard it is to do research outside a research group

#

of highly motivated peers

#

but that's sorta not industry

#

I think in terms of industry the applications of NLP are extremely saturated right now kinda. The capacity for NLP projects is maxxed out.

iron basalt
#

I don't but I know it's hard for many to do so. Solo is not only hard for motivation, but also confidence. You may fail and have wasted your time or not, but if you can get over that fear it's fine.

neon imp
#

To be successful in research you need to know 100 other successful researchers

#

just to help you course correct etc

#

and learn of opportunity

#

and yeah, motivation confidence etc

iron basalt
#

Eh, not exactly. You have one more trick up your sleeve and that is the results. If it clearly works (and you made sure it actually works and are not tricking yourself) then it works.

neon imp
#

Yes, but the low hanging fruit is very picked

#

in the 21st century every research area has a small army of wannabes assaulting the easy stuff

#

so you need quite the edge to tackle stuff.

#

and the network and mentorship is part of that

iron basalt
#

It's actually not because of again the axis issue. The focus is different. There are many things in ML for which there are only a handful of people working on it.

#

It's just not even being attempted often.

neon imp
#

like?

iron basalt
#

Online learning, causal modelling, non-backpropagation based methods. These things have relatively few people working on them.

#

There are still many, but these big teams in the ML world are often focused on their specific stuff which others follow making it seem like that is all there is.

#

And that is a losing strategy because they are already doing it, and in a large group.

neon imp
#

Hmm, online learning I think is not very compatible with large scale backprop

#

also to be frank unsupervised online learning in industry is very...

#

risky

iron basalt
#

Yeah online learning is an example where incremental improvements by just using backprop does not work.

neon imp
#

but yes I guess so

iron basalt
#

Risky, but it must also happen, because an AGI can do online learning.

#

But that's research, it's risky and you will probably fail, but that is how all interesting things happen. Not being afraid of failure is the first step. It's why kids are considered creative, they lack that fear in their shielded environment.

neon imp
#

That's a very pure research mentality haha

#

the flipside is that successful research requires extremely insane feedback loops to be done successfully

iron basalt
#

Having a group to work with is of course much better, but if there is not such group, and you know that it must be done (e.g. for AGI), then it is what it is. You need to start that group, someone has to do it.

neon imp
#

So you need to be on top of your game to really do any thing, just because the intellectual capital being deployed is significant so you have to be really cutting edge to find delta

#

Artificial general intelligence talk kinda doesn't motivate me much

#

that entire field has skipped so much foundational knowledge

#

about neuroscience

#

we know so little about problem solving techniques that don't involve Von Neumann state machines

iron basalt
#

This is why our approach is based on modern neuroscience and results. If it does not work, even with a nice pretty theory, it goes into the bin. There are things however that we know we must have, like online learning.

neon imp
#

but it's clear that we will need to use models that are not von neumann state machines to get anywhere close

iron basalt
#

We also are very into non-von neumann machines.

neon imp
#

yea, but that is pure research.

iron basalt
#

The reason for the neuroscience is that while it may not be necessary, it's the current real world example out there of it working (the human brain).

neon imp
#

I don't doubt it's valuable, but you must bring a pure research mindset to it

#

neuroscience is very necessary

#

we need other models of computation haha

#

depending on your definition of neuroscience

iron basalt
#

For non-research, yeah you can join a big group and help with the incremental improvements. And maybe just keep tabs on the pure research people's stuff.

neon imp
#

Well non research means that its commercial applications

#

if that makes sense

#

and commercially all this stuff is just miles and miles off

inland zephyr
#

hello all
i'm currently working on a personal image classification project with a cnn. i want to build an airplane tail classifier, starting with 20 different airlines, each with 20 different tail images at 120x120 px, manually cropped using PS (i don't even care about the aircraft type). what i want to ask is: is it feasible with only 20 images per airline, which i plan to split into train and test?

misty flint
neon imp
#

allg

iron basalt
#

Not exactly, there are already non-backprop methods in use and have been long before deep learning actually. And they can out perform them.

neon imp
iron basalt
#

Especially when it comes to computation cost**

neon imp
#

but you can cheat and go with a few hundred if you're really good at oversampling techniques

misty flint
#

i did research in my past life. not a fan. kekHands

#

if i can avoid it, i will.

neon imp
#

where are you coming from then?

inland zephyr
misty flint
#

secret RunFail

neon imp
#

well where do you think about going

#

I think the tl;dr of a MLE career is Kaggle good

#

Pump the weights in the Kaggle gym it's really real world relevant

neon imp
misty flint
#

tbh im probs gonna go the route of technical PM

#

so yeah

neon imp
#

#1 don't use photoshop for labelling

inland zephyr
#

maybe i will do the augmentation with albumentations or keras preprocessing to help

neon imp
#

invest the effort up front to configure a labelling tool

#

#2 get a bit more source data, say 50 distinct original images for each

neon imp
#

the world doesn't need more pms

#

I'm really serious

inland zephyr
#

what is PM? project manager?

neon imp
#

in this context yea

misty flint
#

the world doesnt need a lot of things but it still gets it. ill do what makes me happy. i wouldve continued my past life if i didnt care about my happiness DoggoKek

neon imp
#

I think the hardest lesson I've learnt in life is that happiness doesn't get solved by the easy path

#

most happiness is hard earnt

#

if you've got a good path ahead of you for the hard yards of that domain sure, but just keep that in mind

#

I am of the view that it's very hard to succeed in any domain in this industry if you can't succeed at the technical IC path

misty flint
#

i know myself and my personality and i would suffer if i went down the IC route

#

everyone is different. dif strokes for dif folks.

neon imp
#

Perhaps, but I think the fundamental thing you need is the ability to focus and achieve difficult hard work

#

I think the skills to be a successful IC are a lot less rare than people think

#

well, a lot less specialised

#

but what happens is that highly successful ICs are highly successful because of a range of skills that are successful in any domain

misty flint
#

i dont doubt your words, but i think success in life looks dif for everybody, yknow?

neon imp
#

hence they are given 10 direct reports because they're highly successful people

neon imp
#

is different for everyone

#

I think the inputs of success are extremely similar for everyone

iron basalt
misty flint
#

i think we are talking past each other. lets just say i wouldve stayed in my original path had i cared about what others thought and what society thought of "having a successful career" meant

#

but to each their own, you do bring up some valid points.

neon imp
#

That's fair Squiggle haha. I was trying to think from a industry mindset

#

AGI research is of course extremely interesting as long as you do it right

neon imp
#

I'm just saying that you need to figure out what you want and then you need to really work on improving your input goods to success kinda

#

but I can't give more advice to you really without knowing more, I know you said ML nat lang stuff but yea

#

My experience in life is just that

#

successful people are rarely bad at a particular area

#

The successful researchers I know are often surprisingly solid full stack devs,

misty flint
#

and im saying that success in life =/= success in career

#

thats what i mean

#

you get me?

iron basalt
misty flint
#

anyway, im getting off-topic for this channel, so ill see myself out RunFail

iron basalt
#

(often out of need because we need to simulate things, have a web UI for it, etc. Or because people now in ML often come from other things such as game dev)

#

"jack of all trades master of none" - lame, self limiting phrase, "jack of all trades master of one" - much better.

#

(a large math knowledge base is probably the underlying thing here (and/or physics))

neon imp
#

and im saying that success in life =/= success in career
thats what i mean
you get me?

#

I understand, but I completely disagree

#

The #1 factor to success IMO is not having a mentality of.

#

"I'm not good at that and that's fine"

#

Most successful people are "Jack of all Trades AND Master of Something"

#

oh that's what you already said squiggle haha

#

jack of all master of one

iron basalt
#

Yeah it's what can be seen when looking at pretty much all famous researchers (and lesser known ones). For example Newton was good at much more than just physics.

#

But that was his one master thing.

neon imp
#

I don't like Newton or older researchers as an example

#

back in the 19th century you could be a weird shit and still successful

#

for various different reasons.

iron basalt
#

I mean yeah, Newton and such are weird.

neon imp
#

Modern successful researcher just looks like Alan Kay kinda

#

Just... they're not bad at stuff.

iron basalt
#

Yeah. Well, the thing is there is of course always some stuff that is out of your hands, but using that as an excuse is not going to help you.

#

The thing is though, you can often see such people, successful or not, doing many different things.

#

And often just as a sort of problem along the way to trying to solve a different problem.

#

It really comes down to whether or not you throw up hands when there is a problem or you constantly push for a solution.

neon imp
#

in a way

#

I think the biggest attribute is investing in your personal productive capacity kinda

#

I think it's less about charging through problems

#

because there are a lot of failed people who pushed really hard on bad problems

#

Survivorship bias is a really strong thing

#

I think it's about investing in your personal capacity to do really good work, and to find and identify really important and profitable work to do

#

and there's a lot involved in doing that successfully for obvious reasons

iron basalt
#

The thing is that it's a multi-armed bandit problem and it's really hard to tell whether you should keep going or not. At some point the whole different approach thing kicks in for some, and others just keep pushing the same way. Knowing more about seemingly unrelated things may just happen to give you the alternative solution, and math happens to be generic enough to often allow for such bridges.

neon imp
#

I just would try to not worry about the problem

#

rather than the process

#

be process minded

#

Always, be very process minded

#

Don't sweat about the win sweat about having an amazingly good process

iron basalt
#

I agree.

neon imp
#

breadth of knowledge is also a very good point

pastel valley
#

is 87% training accuracy considered underfitting?

alpine bay
alpine bay
# alpine bay

I want to print this in google colab, can anyone help??

pastel valley
# alpine bay I want to print this in google colab any can help??

i just saw this on net

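import matplotlib.pyplot as plt

# con_mat is assumed to be a 2D array of counts (e.g. a confusion matrix)
fig, ax = plt.subplots(figsize=(8, 8))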
ax.matshow(con_mat, cmap=plt.cm.Blues, alpha=0.3)
for i in range(con_mat.shape[0]):
    for j in range(con_mat.shape[1]):
        ax.text(x=j, y=i,s=con_mat[i, j], va='center', ha='center', size='xx-large')

plt.xlabel('Predictions', fontsize=18)
plt.ylabel('Actuals', fontsize=18)
plt.title('Confusion Matrix', fontsize=18)
plt.show()
alpine bay
silver sun
#

Im getting this error on my CNN deep learning code. Any help? I tried to fix it but Im stuck. ValueError: Input 0 of layer "sequential" is incompatible with the layer: expected shape=(None, 24), found shape=(None, 8)

regal gale
#

Hi

#

Anyone familiar with fitting a logistic regression model using 70%-30% of the data for training-testing the model? Report AUC

stone marlin
#

This feels like another homework question, Jessica.

lapis sequoia
regal gale
#

@stone marlin i have the answer

#

just need some explanation @stone marlin

pastel valley
# alpine bay

try ctrl v or paste on notepad first then copy paste again

lapis sequoia
#

When my pipeline gets really convoluted, it'd be nice to see all the steps graphically and be able to inspect intermediate results

tacit basin
# lapis sequoia When my pipeline gets really convoluted, it'd be nice to see all the steps graph...
inland zephyr
#

My model get emotional damage for now

Epoch 2/500
5/5 [==============================] - 1s 116ms/step - loss: 33.5003 - accuracy: 0.0507 - false_negatives: 139.0000 - categorical_crossentropy: 33.5003 - val_loss: 236863552.0000 - val_accuracy: 0.8100 - val_false_negatives: 57.0000 - val_categorical_crossentropy: 236863568.0000
Epoch 3/500
5/5 [==============================] - 1s 116ms/step - loss: 7.6320 - accuracy: 0.0300 - false_negatives: 131.0000 - categorical_crossentropy: 7.6320 - val_loss: 361702784.0000 - val_accuracy: 0.8233 - val_false_negatives: 53.0000 - val_categorical_crossentropy: 361702784.0000
Epoch 4/500
5/5 [==============================] - 1s 123ms/step - loss: 2.8214 - accuracy: 0.0343 - false_negatives: 127.0000 - categorical_crossentropy: 2.8214 - val_loss: 139447792.0000 - val_accuracy: 0.8100 - val_false_negatives: 57.0000 - val_categorical_crossentropy: 139447792.0000
Epoch 5/500
5/5 [==============================] - 1s 142ms/step - loss: 4.0497 - accuracy: 0.0379 - false_negatives: 131.0000 - categorical_crossentropy: 4.0497 - val_loss: 51209328.0000 - val_accuracy: 0.8233 - val_false_negatives: 53.0000 - val_categorical_crossentropy: 51209328.0000

I need to tweak it more ...

solemn dragon
#

Hey guys how would you go about using groupby only on rows that are "1T" apart in a pandas timeseries ?
So in my mind it would be something like this :
deltaTimeThreshold = np.timedelta64(1, 'm')

Not valid code obviously : df = df.groupby('sn')(if df.date -df.date.shift() <= deltaTimeThreshold)
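One common way to express "group only rows that are at most one minute apart" is to mark where a new run starts (the gap to the previous row of the same `sn` exceeds the threshold) and then take a cumulative sum of those marks to get a run id to group on. The sketch below uses made-up sample data and a hypothetical `value` column; only `sn`, `date`, and the one-minute threshold come from the question.

```python
import pandas as pd

# Made-up sample data for illustration only.
df = pd.DataFrame({
    "sn": ["a", "a", "a", "a", "b", "b"],
    "date": pd.to_datetime([
        "2022-01-01 00:00", "2022-01-01 00:01", "2022-01-01 00:05",
        "2022-01-01 00:06", "2022-01-01 00:00", "2022-01-01 00:10",
    ]),
    "value": [1, 2, 3, 4, 5, 6],
})

threshold = pd.Timedelta(minutes=1)
df = df.sort_values(["sn", "date"])

# Time since the previous row with the same sn (NaT for the first row of each sn).
gap = df.groupby("sn")["date"].diff()

# A new run starts at the first row of each sn or whenever the gap is too large.
new_run = gap.isna() | (gap > threshold)
df["run"] = new_run.astype(int).groupby(df["sn"]).cumsum()

# Aggregate within each run of closely spaced rows.
result = df.groupby(["sn", "run"])["value"].sum()
```

Here rows of the same `sn` that are within a minute of their predecessor share a `run` id, so any aggregation on `["sn", "run"]` only combines rows that belong to the same closely spaced stretch.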

misty flint
#

im curious, what happens if you try it

urban lance
#

on the topic of groupby
does a groupby sum change nan to 0 🤔
if a group has only nans, I want it to remain nan in the sum

df = df.groupby(["x",pd.Grouper(key="y", freq="M")]).agg({
    "feature" : "sum",
#

ex:

features = [nan, nan, nan, nan, nan]
output : nan

features = [nan, nan, 0.14, 0, nan]
output : 0.14
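By default a pandas sum skips NaN values, so a group that contains only NaN sums to 0. The `min_count` argument of `sum` asks for at least that many non-NaN values before a number is returned, which gives the behaviour described in the example. The sketch below uses made-up data and leaves out the `pd.Grouper` on `y` to stay short.

```python
import numpy as np
import pandas as pd

# Made-up data: group "a" is all NaN, group "b" has two real values.
df = pd.DataFrame({
    "x": ["a"] * 5 + ["b"] * 5,
    "feature": [np.nan] * 5 + [np.nan, np.nan, 0.14, 0.0, np.nan],
})

# Default behaviour: the all-NaN group becomes 0.
default_sum = df.groupby("x")["feature"].sum()

# min_count=1 requires at least one non-NaN value, otherwise the result is NaN.
kept_nan = df.groupby("x")["feature"].sum(min_count=1)

# The same idea inside an agg dictionary, using a lambda per column.
via_agg = df.groupby("x").agg({"feature": lambda s: s.sum(min_count=1)})
```

With this, group `a` stays NaN while group `b` still sums to 0.14.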
inland zephyr
mint palm
#

i actually was trying to run a NN model from github, but i am getting this error .......all i changed was remove the encoding part as i had the encoded dataset for the same

upper spindle
mint palm
#

nobody helping

#

😢

lapis sequoia
#

its simple shape error

mint palm
#

yes but i can even show you my code, all i did is remove encoding

lapis sequoia
#

also given the error already, what is the shape of your X and y?

inland zephyr
#
s_label=[]
s_image=[]
fig = plt.figure(figsize=(5, 20))
k = 0
sample = paths.list_images(r'this is the path')
for s in sample:
    s_label = s.split(os.path.sep)[-2]
    s_image = cv2.imread(s)
    s_image = np.array(s_image, dtype="float") / 255.0
    fig.add_subplot(2, 5, (k + 1))
    plt.imshow(s_image)
    plt.axis='off'
    plt.title=s_label
plt.savefig(r'this is also a path', bbox_inches='tight')

I want to plot 10 image into a plot of 2x5 area but when i run this, i only got the last image

#

the structure look like this, but only the last folder shown in the plot

lapis sequoia
#

you want to save all of them right? its out of loop.

#

put it in the loop.

inland zephyr
#

actually i want to make it as a montage

#

so all in one image

lapis sequoia
#

oh, so then you should use...

#

subplot

inland zephyr
#

it works

#

but now the axis make it annoying

#

nvm

fig = plt.figure()
k = 0
r = 2
c = 5
i = 1
sample = paths.list_images(r'')
print(sample)
for s in sample:
    plt.subplot(r,c,i)
    plt.title(s.split(os.path.sep)[-2])
    plt.axis('off')
    s_img = cv2.imread(s)
    plt.imshow(cv2.cvtColor(s_img, cv2.COLOR_BGR2RGB))
    i=i+1

plt.savefig(r'', bbox_inches='tight')
#

and the output seems fine

lapis sequoia
#

cheers!

misty flint
inland zephyr
#

i don't know, even though each tail looks very distinct, i still cannot make the model predict all of them easily

#

visually

#

it was easy to classify them, since i carefully chose the easy ones and left out the hard-to-separate patterns

mint palm
# lapis sequoia yeah showing code will help.
import tensorflow.keras
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import np_utils

import numpy
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt
from keras.utils.vis_utils import plot_model

seed = 9
numpy.random.seed(seed)

data = pd.read_csv("C:\\Users\\rahul\\PycharmProjects\\pythonProject1\\complete.csv")
X = data.iloc[:, 0:8]
Y = data.iloc[:, 8]

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.001, random_state=seed)
# create model
model = Sequential()
model.add(Dense(8, input_dim=8, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(3, activation='tanh'))
model.add(Dense(3, activation='softmax'))
print(model.summary())
# Compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Fit the model

history = model.fit(X_train, Y_train, validation_split=0.3, epochs=16, batch_size=128)

# evaluate the model
scores = model.evaluate(X_test, Y_test)

print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

plot_model(model, to_file='model.png')

# Plot training & validation accuracy values
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()

# Plot training & validation loss values
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper right')
plt.show()

#

my code^

lapis sequoia
mint palm
#
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import np_utils

import numpy
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt
from keras.utils import plot_model

seed = 9
numpy.random.seed(seed)

# load datasets
#csv files were filtered based on the data.
input_file = "C:\\XXX...csv"
test_file = "C:\\XXX.csv"

dataset = pd.read_csv(input_file).values

# read training data
datasetTest = pd.read_csv(test_file).values

# split into input (X) and output (Y) variables
X = dataset[:,0:8].astype("int32")
Y = dataset[:,8]
XT = datasetTest[:,0:8].astype("int32")

encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)

# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)

(X_train, X_test, Y_train, Y_test) = train_test_split(X, dummy_y, test_size=0.001, random_state=seed)
# create model
model = Sequential()
model.add(Dense(8, input_dim=8, init='normal', activation='relu'))
model.add(Dense(4, init='normal', activation='relu'))
model.add(Dense(3, init='normal', activation='tanh'))
model.add(Dense(3, init='normal', activation='softmax'))
print(model.summary())
# Compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Fit the model

history = model.fit(X_train, Y_train, validation_split=0.3, epochs=16, batch_size=128)

# evaluate the model
scores = model.evaluate(X_test, Y_test)

print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

plot_model(model, to_file='model.png')

# Plot training & validation accuracy values
plt.plot(history.history['acc'])
(removed some lines to fit in)
#

dataset is 63168 by 9

lapis sequoia
#

i repeat, what is the shape of your Y?

mint palm
#

where last column is Y

#

63168 by 1

lapis sequoia
#
model.add(Dense(3, init='normal', activation='softmax'))

your model expects an output vector of 3, not 1.

mint palm
#

i didnt change it but...

#

i will try

lapis sequoia
#

you are not using encoded y right?

mint palm
#

yes

#

yes

#

i have encoded data

lapis sequoia
#

so your encoded y has the shape of (63168, 3) that is why it works for that, and not for this.

mint palm
#

no Y is encoded to either 1 or 2 or 3

#

for each 63K example

lapis sequoia
#

whats the shape of encoded_Y can you print it?

mint palm
#

yes i will verify

mint palm
#

its 1 column

lapis sequoia
#

i said print encoded_Y.shape lemon_grumpy

mint palm
#

i removed my encoded

lapis sequoia
#

okay...okay do you first of all understand what is the issue?

mint palm
#

yeah

lapis sequoia
#

and why removing encoding affects your code?

mint palm
#

it want different size

#

i am giving it 3 by something

#

it need 1 by something

lapis sequoia
#

yes.

#

do you know the simplest way to resolve it?

mint palm
#

i think i should try printing his encoded Y too

#

i will know if error is after that or somewhere else

lapis sequoia
urban lance
#

I'm guessing those 2 purple clusters are classified as being of the same cluster right? 😬

lapis sequoia
#
Y = to_categorical(Y)

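this will convert your Y into shape (something, num_classes), so then you're good. note that to_categorical assumes the labels start at 0: with raw labels 1, 2, 3 you would get 4 columns, so run LabelEncoder first (like the original code does) to end up with (something, 3).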
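A minimal pure-Python sketch of what the encoding step does to the shape of the label column, assuming the labels are 1, 2, 3 as described above. The helpers `encode_labels` and `one_hot` are made up for illustration; they only mimic what `LabelEncoder` and `to_categorical` do and are not the library implementations.

```python
def encode_labels(labels):
    """Map arbitrary labels to 0..K-1 in sorted order (roughly what LabelEncoder does)."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels], classes


def one_hot(encoded, num_classes):
    """Turn integer class ids into rows holding a single 1 (roughly what to_categorical does)."""
    return [[1 if i == class_id else 0 for i in range(num_classes)] for class_id in encoded]


y = [1, 2, 3, 1, 3]                    # raw labels: one value per example
encoded, classes = encode_labels(y)    # class ids now start at 0
dummy_y = one_hot(encoded, len(classes))

print(encoded)                         # [0, 1, 2, 0, 2]
print(len(dummy_y), len(dummy_y[0]))   # 5 3
```

Each row of `dummy_y` has three entries, which is the width a final `Dense(3, activation='softmax')` layer with `categorical_crossentropy` is compared against.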

#

@mint palm

mint palm
#

ok thank you....i will try all that you said

lapis sequoia
#

alright, ping me here if you're still stuck.

urban lance
#

I'm using Kmeans so how does this work 🤔

#

my input is a distance matrix of my dataset

lapis sequoia
urban lance
#

of course I did

lapis sequoia
urban lance
#

it's 2 dimensional data

#

(with chi2 distance)

lapis sequoia
#

if it's 2 dimensional then by the definition of kmeans, there are 2 possibilities, something is wrong with your algo or something is wrong with your visualization.

urban lance
#

this is what I input

#

probably the plot then, thought finally I had something cause I clearly saw 3 clusters now :/

#

been trying to cluster this data for almost 2 weeks 😅

urban lance
#

the elbow method tells me the optimal number of clusters is 3

#

and this is what happens when I input the original data set

coral garden
#

what is that zombie infestation

urban lance
#

haha

#

ye

lapis sequoia
coral garden
#

;_)

urban lance
#

but kmeans isn't optimal for my dataset;

#

I tried with chi2 distance and just counts of my dataset

lapis sequoia
urban lance
urban lance
urban lance
#

but when I did a profile report, I saw that the clusters didn't make sense

#

I know what you're all thinking, outliers! haha

lapis sequoia
urban lance
#

pandas profiling report

lapis sequoia
#

...so how does this relate with kmeans?

#

wtf @strange elbow

urban lance
#

I made a different dataset for each cluster, then looked at the values for all features, but they weren't really different
for ex:
in cluster 1, the num_visits ranges from 1-27
and in cluster 2 they'd range from 1-24
and cluster 3: 1-32

#

I tried with count values + chi2 distance and hierarchical clustering/GMM
and redid the feature engineering so I had the right data from kmeans

#

should add that the data is normalized

#

@lapis sequoia

lapis sequoia
#

i cannot understand whats the issue

urban lance
#

the clusters don't make sense, so something must be off with the feature engineering
I'm trying to predict where in the customer journey a certain user is so the data should make sense 🤔

#

(tried with 3 datasets from different companies)

#

all the same results

karmic valley
#

i want to work out the colour gradient in this image. basically i'm thinking of creating a line or box that starts at e.g. y=0 and goes up until it hits the black region, and reading the pixel values (in terms of whiteness) at every point along that line. so it should go from high white pixel values to medium grey pixel values to blackish pixel values. i can use more basic software like imagej to draw a line and plot pixel values, but the problem is that in my region of interest there are lots of random pixels with completely different grey levels, so the graph would have lots of noise (jumping from very high whiteness to sudden low whiteness). i want some way to smooth the image, or to change odd pixels to have the same colour as their neighbouring pixels. i can't smooth much, because essentially i want an accurate plot of the changing pixel intensity along the line. could anyone suggest some code?
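One common way to drop isolated odd pixels without blurring the overall ramp is a median filter. The sketch below is a pure-Python illustration on a toy one-column "image"; the function names and the toy data are assumptions for the example, and on a real image a library routine such as OpenCV's `cv2.medianBlur` does the same job on the whole picture.

```python
from statistics import median


def line_profile(image, column):
    """Grey values along one vertical line, read from the bottom row to the top row."""
    return [row[column] for row in reversed(image)]


def median_smooth(values, window=3):
    """Replace each value with the median of its neighbourhood (window must be odd)."""
    half = window // 2
    # Repeat the edge values so every position has a full window.
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [median(padded[i:i + window]) for i in range(len(values))]


# Toy greyscale image, rows listed top to bottom: dark at the top, white at the
# bottom, with one rogue dark pixel (5) in the middle of the ramp.
image = [[10], [60], [120], [5], [200], [250]]

profile = line_profile(image, column=0)
smoothed = median_smooth(profile)

print(profile)    # [250, 200, 5, 120, 60, 10]
print(smoothed)   # [250, 200, 120, 60, 60, 10]
```

A small window (3 or 5) removes single outliers while leaving the position of the white-to-black transition largely intact, which is the trade-off asked about above; a larger window smooths more but flattens the gradient.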

grave frost
#

idk how they're exactly closed - their speciality is bf16 ops, that's what they're designed to do and Jax, Pytorch or TF works very well IMO

mint palm
#

Will double encoding do anything at all?

serene scaffold
mint palm
#

for encoding string to integers

mint palm
lean harbor
serene scaffold
#

it depends on the algorithm, but generally speaking, you can't just assign words to arbitrary integers, as this tells the algorithm that a word with a higher number is "more" than another word, which makes no sense.
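To make the point concrete, here is a small sketch comparing the distances implied by integer codes with those implied by one-hot vectors. The three category names are hypothetical and chosen only for illustration.

```python
import math

words = ["red", "green", "blue"]  # hypothetical categories with no natural order

# Arbitrary integer codes: red=0, green=1, blue=2.
as_int = {w: [float(i)] for i, w in enumerate(words)}

# One-hot codes: one column per category.
as_one_hot = {
    w: [1.0 if i == j else 0.0 for j in range(len(words))]
    for i, w in enumerate(words)
}

for a, b in [("red", "green"), ("red", "blue")]:
    print(a, b,
          math.dist(as_int[a], as_int[b]),
          round(math.dist(as_one_hot[a], as_one_hot[b]), 3))
```

With integer codes, "blue" ends up twice as far from "red" as "green" is, an ordering the data never contained. With one-hot codes every pair of categories is equally far apart.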

serene scaffold
#

I don't look at screenshots of code, sorry

#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

serene scaffold
#

^ that and the paste bin are the only way I will look at code.

mint palm
# serene scaffold ^ that and the paste bin are the only way I will look at code.
import tensorflow.keras
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import np_utils

import numpy
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt
from keras.utils.vis_utils import plot_model

seed = 9
numpy.random.seed(seed)

input_file = "C:\\Users\\rahul\\PycharmProjects\\pythonProject1\\complete.csv"
test_file = "C:\\Users\\rahul\\PycharmProjects\\pythonProject1\\complete.csv"

dataset = pd.read_csv(input_file).values

# read training data
datasetTest = pd.read_csv(test_file).values

# split into input (X) and output (Y) variables
X = dataset[:,0:8].astype("int32")
Y = dataset[:,8]
XT = datasetTest[:,0:8].astype("int32")
print(Y)
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)

# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)
print(dummy_y)
(X_train, X_test, Y_train, Y_test) = train_test_split(X, dummy_y, test_size=0.5, random_state=seed)
serene scaffold
#

what's the algorithm?

mint palm
serene scaffold
#

so you're using a neural network, built with keras

#

what is the model intended to do?

mint palm
#

i have a dataset thats basically all strings

#

so i have to predict an output that falls in one of the 3 categories

#

for which i use onehot representation

serene scaffold
#

"strings". just saying that you have strings is uninformative--what kind of strings? what do they represent? where do they come from?

#

and what are the categories?

mint palm
#
1,1,1,1,0,2,1,5,1
1,1,1,1,1,2,1,5,1
1,1,1,1,2,2,1,5,1
1,1,1,1,3,2,1,5,1
1,1,1,1,4,2,1,5,1
1,1,1,1,5,2,1,5,1
1,1,1,1,6,2,1,5,1
1,1,1,1,7,2,1,5,1
1,1,1,1,8,2,1,5,1
1,1,1,1,9,2,1,5,1
1,1,1,1,10,2,1,5,1
1,1,1,1,11,2,1,5,1
#

this is a sample

serene scaffold
#

after you encode them?

mint palm
#

encoded sample^

lapis sequoia
#

Is there a good format to save a table with images? I'll typically plot a data frame and save each plot to a file, but it might be nice to just save the whole table to a single file which I can open/annotate. The question is, is there a format which already has visualizers, which allow you to filter/maybe annotate?

serene scaffold
mint palm
serene scaffold
#

remember: I know nothing about what you're trying to do. only you do

lapis sequoia
#

Something a little better than opening a folder full of plots with the file explorer

karmic valley
lapis sequoia
#

I've cooked up something like this, just wondering if there's something mature and off-the-shelf out there

serene scaffold
mint palm
# mint palm

the elements in the left-hand column are the column names for the encoded data

#

almost

serene scaffold
# mint palm

what are the categories you're predicting for?

#

I should really get back to work, but I would recommend reading about feature encoding. you have to be intentional about how you represent each feature for the network, or it won't understand what you're telling it to do.

mint palm
#

actually the main problem was....when i run this code it says 100% accuracy lol

crisp sluice
#

has anyone had any experience with openai gym? i installed in python 3.9 and tried running sample code from the docs on the openai gym site but the example isn't popping up for me...

#

here's the code:

#

import gym

env = gym.make('CartPole-v1')
observation = env.reset()
for _ in range(1000):
    env.render()
    action = env.action_space.sample()  # pick a random action
    observation, reward, done, info = env.step(action)  # apply it
    if done:
        observation = env.reset()
env.close()
misty flint
#

you might have to mess around with kernel sizes

karmic valley
misty flint
#

im not a CV person, so im not the best person to ask sorry. i really didnt like my image processing/feature engineering class kekHands

silver sun
# lapis sequoia hm seems like you are either giving the wrong input or you need to edit architec...

My X train input is this shape (700227, 8) and my y train input is this shape (700227, 11). Here is my model architecture:

model.add(Dense(2000, activation='relu', input_dim=24))
model.add(Dense(1500, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(800, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(400, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(150, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(12, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

My error is:

ValueError: Input 0 of layer "sequential" is incompatible with the layer: expected shape=(None, 24), found shape=(None, 8)
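The error message compares the declared `input_dim` with the number of feature columns in X; the same kind of check applies between the last layer's units and the width of y. A rough sketch of that bookkeeping, using only the numbers quoted in the message (the helper `check_shapes` is hypothetical and not part of Keras):

```python
def check_shapes(x_shape, y_shape, input_dim, output_units):
    """Return a list of human-readable mismatches between the data and the model."""
    problems = []
    if x_shape[0] != y_shape[0]:
        problems.append(f"X has {x_shape[0]} rows but y has {y_shape[0]}")
    if x_shape[1] != input_dim:
        problems.append(f"input_dim={input_dim} but X has {x_shape[1]} features")
    if y_shape[1] != output_units:
        problems.append(f"last Dense has {output_units} units but y has {y_shape[1]} columns")
    return problems


# Shapes and layer sizes quoted in the message above.
problems = check_shapes((700227, 8), (700227, 11), input_dim=24, output_units=12)
for problem in problems:
    print(problem)

# Sizing the first and last layers from the data leaves nothing to report.
fixed = check_shapes((700227, 8), (700227, 11), input_dim=8, output_units=11)
print(fixed)  # []
```

Under these assumptions, setting `input_dim=8` would clear the reported `ValueError`, and the 12-unit softmax would likely raise a second mismatch against the 11-column y once training starts.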

karmic valley
#

is there any matlab discord server

mint palm
#

this is a part from github code

#

why are there separate input_file and test_file

#

he later does split data set into X_train, Y_train, and X_test , Y_test

lean kindle
#

Hello all, I am trying to perform invoice data extraction from an image of invoice and export the data into an excel file. But I want to extract only a few fields from the invoice and not the entire invoice. Can anyone please advise how can I do that ?

import csv

import pytesseract  # pip install pytesseract
from PIL import Image  # pip install Pillow

# set tesseract cmd to be the path to your tesseract engine executable
# (where you installed tesseract from above urls)
# and start doing it

# your saved images on desktop
list_with_many_images = [
    "PartI_Data/Img1.PNG",
    "PartI_Data/Img1.PNG",
    "PartI_Data/Img1.PNG",
]

# create a function that returns the text
def image_to_str(path):
    """Return the text recognized in the image at `path`."""
    return pytesseract.image_to_string(Image.open(path))

# now pure action + csv part
# csv.writer quotes the OCR text, which usually contains commas and newlines
with open("images_content.csv", "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["ImagePath", "ImageText"])
    for image_path in list_with_many_images:
        writer.writerow([image_path, image_to_str(image_path)])

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

mint palm
#

i think NO

lapis sequoia
mint palm
#

it's like giving each string a unique value, e.g. in a dataset containing sunday to saturday we can substitute 0 to 6

#

well leaving that aside for a while, my main issue is i am getting 200% accuracy on running github code without any change

lapis sequoia
#

what in the world

mint palm
#

100

#

sorry

#

lmao

#

its 100%

#

it seems like my test set is a part of train set
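One way to test that suspicion is to count how many test rows also appear verbatim in the training rows. The sketch below is an assumption-laden illustration: `overlap_fraction` is a made-up helper, and the rows are taken from the encoded sample pasted earlier rather than from the real split. In the posted code `input_file` and `test_file` also point at the same CSV, which is worth checking separately.

```python
def overlap_fraction(train_rows, test_rows):
    """Fraction of test rows that also appear verbatim in the training rows."""
    seen = {tuple(row) for row in train_rows}
    hits = sum(1 for row in test_rows if tuple(row) in seen)
    return hits / len(test_rows)


train = [
    [1, 1, 1, 1, 0, 2, 1, 5, 1],
    [1, 1, 1, 1, 1, 2, 1, 5, 1],
    [1, 1, 1, 1, 2, 2, 1, 5, 1],
]
test = [
    [1, 1, 1, 1, 1, 2, 1, 5, 1],   # duplicate of a training row
    [1, 1, 1, 1, 9, 2, 1, 5, 1],   # not in the training rows
]

print(overlap_fraction(train, test))  # 0.5
```

A fraction near 1.0 would mean the evaluation mostly re-scores rows the model has already seen, which can make 100% accuracy unsurprising.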

lapis sequoia
#

lol

lapis sequoia
#

seems like private

mint palm
#

no

#

its public

lapis sequoia
modest shuttle
#

Hello, I'm beginner in AI,
What are the best methods for detecting fake (modified) images? (without CUDA)

serene scaffold
lapis sequoia
serene scaffold
mint palm
modest shuttle
lapis sequoia
#

so involve as in to make it faster right?

modest shuttle
serene scaffold
lapis sequoia
modest shuttle
serene scaffold
#

also, you can experiment with GPU computation using google colab.

lapis sequoia
#

^for certain hours

serene scaffold
mint palm
#

input_file is ok

#

but if he used train_test_split then why does he need test_file

lapis sequoia
#

oh yes so if you have different testdata why are you bothering to split train data?

mint palm
lapis sequoia
#

ok so lets see, I'll assume he has just split train data to get some scores, but if you do look below, he has commented out the part

#from sklearn.metrics import confusion_matrix
#y_pred_keras = model.predict_classes(XT)

#csv = open("C:\\DeepSlice\\5G\\output.csv", "w")
#"w" indicates that you're writing strings to the file

#pd.DataFrame(y_pred_keras).to_csv("C:\\DeepSlice\\5G\\output.csv")
#cm = confusion_matrix(Y_test, y_pred_keras, labels=[0, 1, 2])
#

which is for...testing i assume

#

now why he did above thing, well thats the thing I am unaware of.

mint palm
#

so we are stuck

lapis sequoia
#

you* are :lemon_pensive:

mint palm
#

it still gives 100 %accuracy

lapis sequoia
#

i mean... it literally depends on dataset and the problem, just for the sake of argument, give it like 0.1?

#

also just ping me here if you ask another question, I'm reading a novel.

lapis sequoia
#

i mean, i don't know what the problem is, maybe it could be solved by some linear function for all i know.

mint palm
#

cant be....an author mentioned after applying CNN +LSTM it gave 95% accu

#

how can it be 100 with linear function lmao

lapis sequoia
mint palm
#

yeah i understand, i am not complaining, you are fair at your place

exotic thicket
#

How come the solution got (20,20) would someone mind helping me with this solution

agile cobalt
#

that isn't really fit for this channel
but either way, it's just (2+2, 3+1) = (4,4) then (4*5, 4*5) = (20, 20).
You could've asked in a general help channel though (#❓|how-to-get-help)

barren vigil
#

hello sorry which chat talks about computational neuroscience?

#

૮₍ ˶•⤙•˶ ₎ა
./づ~ 🍓

karmic valley
#

could anyone help me with something. i want to get the pixel whiteness (in terms of greyscale) from bottom to top along a single imaginary line in my image. so i get many values of numbers

serene scaffold
karmic valley
#

oh okay, would you be able to help me with the code

serene scaffold
#

I can't do that rn. sorry.

karmic valley
#

sure, no problem. would i be able to come back at a time that suits you to get more help

serene scaffold
iron basalt
# grave frost wait what - TPUs merging with Apple Sillicon?

They call them neural cores or whatever, but the thing is that you can't just write programs for them without using their high level ML library which is Pytorch-like (and can convert Pytorch, TF models). Which is limiting because for those who want to run their Pytorch models and such there are missing functions (and you just have to hope that Apple adds the missing stuff), and for those that want to just get as much compute as possible out of the SOC, they would have to now hack on this high level API rather than just being able to generate instructions for it directly. People are already actively reverse engineering it, but there is not really any (good) reason for Apple to make it this painful.

#

(The neural cores are more or less just CUDA-like cores ripped out of the GPU)

#

In addition, in the high level API, you can't control where the ANN runs. Apple's driver decides it dynamically and can place it either on the CPU, GPU, or neural cores. This might sound nice, but in practice programmers often know where they want it to run and the driver will just make everything worse by trying to be smart (many have already run into this issue). It would be fine if that was the default setting and you can still force it to run where you want.

minor elbow
#

Apple hardware has not really been that good for scientific computing. It partly stems from the apple/nvidia rift and NV being completely ahead of the game wrt CUDA, but also historically the higher end apple hardware has been designed to be really good at photo/video work

#

i say this as a huge apple fanboy who just got an m1 pro macbook

#

i think they are broadening things tho with the arm/m1 stuff

grave frost
minor elbow
#

its still very early days for desktop ARM. As I see it building massive deep learning models is still a very niche thing and most orgs are going to either do it via cloud or dedicated specialized hardware

#

also ipython nb are for sharing/documenting research output for other ppl, they arent good as units of computation imo

#

stuff like anaconda are obsolete and shouldnt be used IMO

#

it seems to me if you want to do serious GPU compute, you want to use CUDA cause thats what all the libraries support best, which means you're going to use Nvidia hw, which rules out apple.

#

I am not sure how correct that view is

iron basalt
grave frost
#

there hasn't been any information released about the neural cores AFAIK

iron basalt
#

We know that you can use Apple's core-ml lib or whatever it's called which is Pytorch-like and convert Pytorch / TF models to itself. Unofficially people have been reverse engineering it for some time.

minor elbow
#

if u want to use other ppls built models coreml is good imo

#

its a decent solution to a tricky problem

grave frost
#

and I still don't see how it is relevant to the original point - TPUs are pretty customizable w/ Jax, and nothing like Apple's SOC at all. they have plenty of information available and work directly with XLA as well as the Jax team

iron basalt
#

I thought you wanted more info on TPUs being a thing on Apple silicon.

grave frost
#

I don't see any operation you can't do with jax, just that it won't be any faster or optimized unless its precision-agnostic

iron basalt
#

TPUs are Google's specific terminology for them, but it's more or less the same thing as Apple's neural cores. 16 bit floats and all.

grave frost
iron basalt
#

TPUs are much more open, but the general hardware idea of them could be considered to be everywhere now that they are in Apple silicon.

#

And will continue to be everywhere like how a GPU is now.

grave frost
#

well, again AFAIK fundamentally TPU structure is pretty different to other hardware alternatives

#

its more about the hardware architecture 🤔 rather than direct customizations of the chips themselves - they have different memory systems and other complex stuff which I didn't get

iron basalt
#

TPUs do two things well, fast low precision floats for matrix multiplies, and convolutions. They were also designed to fit nicely into their data center racks. Other than that, they are basically just stripped down GPU cores.

#

Apple's neural cores do these two things well also.

#

And are also stripped down GPU cores.

#

So other than name, TPU vs neural core, they are more or less the same thing. My guess is that the rename was to both avoid confusion with Google's stuff and to sell it better.

#

There are probably differences between the two, but they both have the same goal of those fast low precision float operations. And both come from having previously used the GPU and so they are still GPU-like to save R&D time.

misty flint
#

may or may not get to use gpt-3 at my company

frank moth
#

Hello, I am trying to use plot_predict with python 3.10.3 and statsmodels 0.13.2. My advisor ran the exact same code and it worked but when I run it I get the following error. I have tried uninstalling and reinstalling everything 3 times with python 3.9.11, 3.9.9 and statsmodels 0.12.2 which is the version that the advisor uses. None of it is working, how can I get it to work?
Thank you

fig, ax = plt.subplots()
ax = adtrain.loc['2020-05-02':].plot(ax=ax)
fig = result_whole.plot_predict(start = '2021-05-02', end = "2022-02-14", dynamic=True, ax=ax, plot_insample=False)
plt.show()

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [7], in <cell line: 6>()
      4 ax = adtrain.loc['2020-05-02':].plot(ax=ax)
      5 ## fig = result_whole.plot_predict(start = '2020-05-02', end = "2022-02-14", dynamic=True, ax=ax, plot_insample=False)
----> 6 fig = result_whole.plot_predict(start = '2021-05-02', end = "2022-02-14", dynamic=True, ax=ax, plot_insample=False)
      7 plt.show()

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\statsmodels\base\wrapper.py:34, in ResultsWrapper.__getattribute__(self, attr)
     31 except AttributeError:
     32     pass
---> 34 obj = getattr(results, attr)
     35 data = results.model.data
     36 how = self._wrap_attrs.get(attr)

AttributeError: 'ARIMAResults' object has no attribute 'plot_predict'

#

If I try to install statsmodels 0.12.2 with the current python version I get a very long error: ``` note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.```
I found this solution ( https://stackoverflow.com/questions/71009659/note-this-error-originates-from-a-subprocess-and-is-likely-not-a-problem-with ) but Idk what plugin to take from the site. The one from the stackoverflow solution does not work.

eager wedge
#

How do I change the learning rate of my CNN using tensorflow?

serene scaffold
eager wedge
#

ill check it out

#

Is there a way to check out the current learning rate

serene scaffold
eager wedge
#

cnn = tf.keras.models.Sequential()
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu'))
cnn.add(tf.keras.layers.MaxPooling2D())
cnn.add(tf.keras.layers.Conv2D(32, 3, activation='relu'))
cnn.add(tf.keras.layers.MaxPooling2D())
cnn.add(tf.keras.layers.Conv2D(32, 3, activation='relu'))
cnn.add(tf.keras.layers.MaxPooling2D())
cnn.add(tf.keras.layers.Flatten())
cnn.add(tf.keras.layers.Dense(units=255, activation='relu'))
cnn.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
cnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
cnn.fit(x=train_set,validation_data=test_set,epochs=25)

serene scaffold
#
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
model.add(layers.Dense(64, kernel_initializer='uniform', input_shape=(10,)))
model.add(layers.Activation('softmax'))

opt = keras.optimizers.Adam(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=opt)

I found this.

eager wedge
#

ok thx

serene scaffold
#

tbh I'm not clear on what an "optimizer" is. is it a way of representing backpropagation as an object?

#

has OOP gone too far?

desert oar
#

i guess you could say it's an implementation of the strategy pattern, if you want to think about it in OO terms

#

basically it just changes the weight update algorithm

#

although in principle you could have l-bfgs or something like that, i don't think there's a "stochastic" l-bfgs version, so you'd have to set batch size = training set

serene scaffold
desert oar
#

no, backprop is backprop

#

"backprop" just is how you calculate the gradient

serene scaffold
desert oar
#

so backprop is always backprop

#

but how the weights are updated, is the optimizer

#

so SGD is the basic algorithm

serene scaffold
#

gradient descent that conforms to a defined probabilistic distribution?!

desert oar
#

what do you mean

serene scaffold
#

just expanding out "SGD" to have the definition of "stochastic" in it.

desert oar
#

oh

serene scaffold
#

I'm still listening, if there is more that you were planning to say.

desert oar
#

sorry i got pulled into a dota match!

#

ping me in an hour lol

#

tldr you can have fancier weight updates than sgd
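A toy sketch of that split between "computing the gradient" and "updating the weights". The loss, learning rate, and momentum coefficient are arbitrary choices for illustration; in a real network backprop would supply the gradient, and the two `*_step` functions stand in for two different optimizers.

```python
def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2; in a network, backprop supplies this."""
    return 2.0 * (w - 3.0)


def sgd_step(w, g, lr=0.1):
    """Plain gradient descent: step against the gradient."""
    return w - lr * g


def momentum_step(w, g, velocity, lr=0.1, beta=0.9):
    """Fancier update: keep a running velocity and step along it."""
    velocity = beta * velocity - lr * g
    return w + velocity, velocity


w_sgd = w_mom = 0.0
velocity = 0.0
for _ in range(200):
    w_sgd = sgd_step(w_sgd, grad(w_sgd))
    w_mom, velocity = momentum_step(w_mom, grad(w_mom), velocity)

print(w_sgd, w_mom)  # both end up close to the minimum at w = 3
```

The same `grad` feeds both update rules; only the rule that turns a gradient into a new weight differs, which is the part an optimizer object such as `Adam` encapsulates.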

serene scaffold
#

!remind 1h lick the salt

arctic wedgeBOT
#
Affirmative!

Your reminder will arrive on <t:1647485281:F>!

ionic palm
#

Is simulated annealing in category of machine learning?

serene scaffold
desert oar
#

it's an optimization algorithm

serene scaffold
#

are you playing dota or not?!

ionic palm
#

Can something be an algorithm and unsupervised learning at the same time?

iron basalt
serene scaffold
#

yes. informally, unsupervised learning is when you don't tell it what the answers are.

odd meteor
desert oar
#

@serene scaffold an optimization algorithm in general is an algorithm for finding a local or global maximum or minimum

#

so gradient descent is a general category of optimization algorithms

ionic palm
#

Okay a method of unsupervised learning

iron basalt
#

Unsupervised is the type of algorithm, not a specific one. There are several.

ionic palm
#

Huh, then it's not machine learning?

desert oar
#

simulated annealing is another optimization algorithm
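For reference, a minimal sketch of simulated annealing as a plain optimizer, with no learning or data involved. The objective `bumpy`, the cooling schedule, and all parameter values are arbitrary assumptions made for the example.

```python
import math
import random


def simulated_annealing(f, x0, steps=5000, t_start=1.0, t_end=1e-3, step_size=0.5, seed=0):
    """Minimize f over the reals, starting at x0, with a geometric cooling schedule."""
    rng = random.Random(seed)
    x = best = x0
    for i in range(steps):
        temperature = t_start * (t_end / t_start) ** (i / steps)
        candidate = x + rng.uniform(-step_size, step_size)
        delta = f(candidate) - f(x)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x = candidate
            if f(x) < f(best):
                best = x
    return best


def bumpy(x):
    """Toy objective with many local minima."""
    return x * x + 3.0 * math.sin(5.0 * x)


best = simulated_annealing(bumpy, x0=4.0)
print(best, bumpy(best))
```

Nothing here is specific to machine learning: the routine only needs a function to evaluate, which is why it sits under optimization and can be borrowed by an ML method when that method has something to minimize.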

iron basalt
#

There are categories of machine learning, and there are categories of optimization.

grave frost
#

@ionic palm you are probably looking for k-means clustering as an unsupervised algorithm

serene scaffold
grave frost
#

I was referring to this

Can something be an algorithm and unsupervised learning at the same time?

iron basalt
#

So you pick which category of machine learning you want / need. Then it will probably involve some optimization problem which can be solved by picking a method from some category of optimization algorithms.

ionic palm
#

Wtf is unsupervised algorithm

grave frost
#

no labels

#

say you have a dataset of cats and dogs - in unsupervised learning, you just lump photos that look like cats together and photos that look like dogs together, but you don't know which of the two sets is the cats and which is the dogs

#

this is an example btw
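The same idea as a tiny runnable sketch: a one-dimensional k-means that groups numbers without ever seeing a label. The body weights are invented for illustration, and `kmeans_1d` is a simplified teaching version, not a library function.

```python
def kmeans_1d(points, k=2, iterations=20):
    """Plain k-means on numbers: only distances between points are used, no labels."""
    ordered = sorted(points)
    # Start the centers at evenly spaced positions in the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    assignment = []
    for _ in range(iterations):
        # Assign every point to its nearest center.
        assignment = [min(range(k), key=lambda c: abs(p - centers[c])) for p in points]
        # Move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assignment, centers


# Hypothetical animal weights in kg; the algorithm is never told which are cats.
weights_kg = [3.9, 30.5, 4.2, 28.0, 4.8, 33.1]
groups, centers = kmeans_1d(weights_kg)

print(groups)   # [0, 1, 0, 1, 0, 1]
print(centers)
```

The output only says "group 0" and "group 1"; deciding that group 0 is the cats is a labelling step that happens outside the algorithm.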

serene scaffold
#

what was your actual question going to be, anyway, @ionic palm?

ionic palm
#

Is simulated annealing in category of machine learning?

#

Now I understand: simulated annealing is an optimization method of unsupervised learning

odd meteor
# ionic palm Wtf is unsupervised algorithm

😀 It's a type of Machine Learning. So apparently there are basically 5 types of ML (some even say 4 types)

Supervised Learning
Unsupervised Learning
Semi-Supervised Learning
Self-Supervised Learning
Reinforcement Learning

serene scaffold
serene scaffold
ionic palm
#

Im sorry for asking that much, thank you so much

serene scaffold
#

it's long debugging sessions that are more draining for us

odd meteor
iron basalt
#

(some don't even have proper names yet beyond temporary made up names)

#

(you could argue there are just two: unsupervised (external input only, plus inputs it generates itself), and supervised, meaning additional input / structure (e.g. labels, embedded vector spaces, knowledge graphs, reward signals, etc., which can be hand crafted))

#

(but you could also go more extreme and say it's all just inputs, which is not very useful because it does not distinguish anything (how many clusters do I choose?))

arctic wedgeBOT
misty flint
#

as much as people seem to dislike the idea of full stack DS, i really think there is power in being able to prototype stuff, especially "data apps"

#

thats my hot take for tonight kekHands

serene scaffold
#

@misty flint if full stack were applied to DS, I would think it should mean "a data scientist who is also a software engineer"

#

Because there are a lot of data scientists who shit out terrible code. Especially if they only use notebooks.

#

There's my hot take as well.

iron basalt
#

(Assuming they do DS just as well)

serene scaffold
#

I'm not sure what rex means by disliking the concept of full stack ds

misty flint
#

im trying to understand that myself

#

i think those who are more "A-type Analyst" data scientists dislike the concept of full stack is what it seems like

serene scaffold
#

There are a lot of concepts I hate as well. Including Keurig and Applebee's.

misty flint
#

but the "B-type Builder" data scientists obv are all for it

mint palm
#

(X_train, X_test, Y_train, Y_test) = train_test_split(X, dummy_y, test_size=0.001, random_state=seed)
does this shuffle the dataset before dividing it between train and test?

serene scaffold
#

@mint palm the docs probably specify if the partitions are random, arbitrary, or deterministic.
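For the record, scikit-learn's `train_test_split` shuffles by default (`shuffle=True`), and `random_state` makes that shuffle reproducible. A rough stdlib-only sketch of the same shuffle-then-cut idea; `shuffled_split` is a hypothetical stand-in and does not reproduce sklearn's exact ordering.

```python
import math
import random


def shuffled_split(rows, test_size, seed):
    """Shuffle row indices with a seeded RNG, then cut off the first part as the test set."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_size * len(rows))
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


rows = list(range(8))
train, test = shuffled_split(rows, test_size=0.25, seed=9)
print(train, test)

# The same seed gives the same partition again.
print(shuffled_split(rows, test_size=0.25, seed=9) == (train, test))  # True
```

With `test_size=0.001` on 63168 rows the test set holds only about 64 examples, so the reported accuracy rests on a very small sample.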

misty flint
#

i think a full-stack DS can prove business value to average companies much more easily than someone who isnt; i think maybe as this field develops more they can help add ML features to apps and such (outside of data-driven orgs)

#

otherwise you have business people always asking DS whats their business value

#

since many dont truly understand the concept of R&D

#

and experimentation

#

"what did you do this quarter?"
"...our experiments were inconclusive"
"..."

iron basalt
#

I'm not sure who would be opposed to this. If you can do more work for me then sure, go ahead.

misty flint
#

oh, im all for this squiggle

lone merlin
#

I am new to the data science world. I want to ask some questions. If I want to be a Business Analytics Specialist, is it better to focus on the ML aspects, or do I need to learn something else? Right now I am still an undergrad student in math. Thank you!

misty flint
#

T-shaped for life

iron basalt
#

Is this your chart?

ionic palm
#

Sadly.. yes

misty flint
iron basalt
#

Simulated annealing is outside all of that, in another bubble, under "optimization".

misty flint
#

optimization, oof

#

thats dissertation stuff right there

ionic palm
#

I am so sorry

iron basalt
#

What are you sorry for?

#

Getting any kind of set chart with intersections, subsets, and such will be difficult and not really work for this.

#

Not everything fits in those.

misty flint
#

as humans, we like to put everything into boxes PikaThink

iron basalt
#

It's more like pick type of ML, then pick algorithm in that type, and that may involve annealing, etc, which is its own separate thing.

ionic palm
#

ML Type + Algorithm = ML?

iron basalt
#

What you had is kind of like putting linear algebra under chemistry. Chemistry makes use of it, but it's not a subset of it, linear algebra exists on its own.

iron basalt
#

Note that this is super loose and can pretty much include any program that stores inputs or information about those inputs.

#

A really simple example is a program that takes two numbers as input, X and Y.

#

The program then stores those as pairs.

#

And if you give it one, it can give you the other.

#

It learned to associate them.
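The pair-storing program described above can be sketched in a few lines. This is only a toy illustration under my own assumptions: the class name `PairMemory` and its methods are made up for the example, and it does exact lookup only, with no inference or noise handling.

```python
class PairMemory:
    """Toy 'learner' that memorizes (x, y) pairs and recalls one from the other."""

    def __init__(self):
        self.x_to_y = {}
        self.y_to_x = {}

    def store(self, x, y):
        # "Learning" here is nothing more than remembering the association.
        self.x_to_y[x] = y
        self.y_to_x[y] = x

    def recall(self, value):
        # Given one member of a stored pair, return the other (None if unseen).
        if value in self.x_to_y:
            return self.x_to_y[value]
        return self.y_to_x.get(value)


memory = PairMemory()
memory.store(3, 9)
memory.store(4, 16)

result_forward = memory.recall(3)    # looks up the Y stored for X = 3
result_backward = memory.recall(16)  # looks up the X stored for Y = 16
result_unknown = memory.recall(5)    # never stored, so nothing to recall
```

The last call is where the hard part of ML starts: an exact-match store has nothing to say about inputs it has not seen.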

#

The complexity of ML is how to do more with less. And how to infer stuff based on what was stored.

#

And also what if your input is noisy / not exact? Etc.

inland zephyr
#

DS researches the model

#

ML Engineer takes it to production level

iron basalt
#

What if the input is too complex to really deal with directly (e.g. an image)?

inland zephyr
#

There is still confusion about the job responsibilities and capabilities of ML and DS

iron basalt
#

(how do you store the relevant stuff about it?)

#

ML is about machine learning, nothing more and nothing less. DS often involves standard statistics, forecasting, business stuff. DS can make use of ML since ML happens to also often make use of statistics to function well.

#

DS is more like, I have this job, and I need to pick the right tools and such. That might include picking an ML based tool.

inland zephyr
#

from my previous code
i hate that big gap between rows... now find the way to tighten the gap

inland zephyr
iron basalt
#

However, many DS are also often MLEs, etc. People are not limited to one thing, it's just their job title.

inland zephyr
#

oh now i get the point

misty flint
#

also this field is so new that the boundaries tend to blur quite often

inland zephyr
#

DS in title
full stack Data and Modelling is the real job description

misty flint
#

and it differs per company too

iron basalt
inland zephyr
#

The clearer distinction is between DS and DA

#

DS could also handle job of MLE

#

and sometimes DA too

iron basalt
#

Yes, DS is broad.

misty flint
#

for companies, i think in general for a "data team" if they can cover the majority of the skills between all the roles, i think thats sufficient for most business use cases

iron basalt
#

(in part due to companies not knowing what DS is and often just want a statistician that knows Python or R)

misty flint
#

obv if you are a data-driven SaaS company, thats very dif then

iron basalt
#

(but also acts somewhat as an accountant? idk, it's weird)

misty flint
inland zephyr
lone merlin
#

I rarely find a job title that uses 'Machine Learning Engineer'. Sometimes it's also business analytics, and many people don't know what Business Analytics means. That's why I am often confused, haha

mint palm
misty flint
inland zephyr
#

I found an MLE job
and it needs a higher degree like a Master's or PhD

iron basalt
#

I mean really it's a nebulous "handles data" person. Which often involves statistics, some spreadsheets (or getting stuff from databases / any tables), and some graphing.

#

(from what the companies know / POV)

misty flint
iron basalt
#

They will often make the requirements much bigger than needed.

misty flint
#

yknow what i heard about that on a separate podcast

#

that particular element as well as increasing number of years allows companies to do one critical thing for job postings

#

decrease the amount of job applicants

#

dunno if thats actually true but thats what the podcast guest advocated for

#

hes a director-level so maybe Oopsies

misty flint
#

interesting

#

i do want to try out airflow sometime

#

get bit familiar with it

#

see if i can use it for this one project

glossy hinge
#

can anyone suggest me a good YouTube video on reinforcement learning ?

lapis sequoia
#

i need a nice platform, I'm using draw.io but I dont know...it seems okay.

tough barn
#

please can anyone help me with these 2 csv files? I want a single row from each that contains only numeric data, so that I can convert it into another format and use it to test my model. I am getting this type of csv as features extracted by a package. So please help me out with this. Thank you

#

I tried, but it gives weird output, and I am unable to open t2.csv using pandas or to convert the strings to floats

tacit basin
tough barn
#

like 0.0406, 0.0363, 0.0278, 0.0206, 0.1041, -0.0145, -6e-04, 0.0654, 0.04, 0.086, 0.0775, 0.0018, 0.0285, 0.109, 0.0569, 0.0169, 0.0484, 0.161, 0.0248, 0.0696, 0.0285, 0.0367, 0.0438, 0.0269, 0.0758, 0.0389, 0.0049, 0.0367, 0.0325, 0.0796, 0.0778, 0.0334, 0.0589, 0.0939, 0.0919, 0.026, 0.0331, 0.0943, 0.0247, 0.0616, 0.014, 0.0314, 0.0409, 0.0419, 0.0949, 0.0409, 0.0249, 0.0614, 0.0345, 0.066, 0.0485, 0.0438, 0.031, 0.0688, 0.1064, 0.0406, 0.0488, 0.0868, 0.0314, 0.0431, 0.0329, 0.0514, 0.0432, 0.0533,
0.0747, 0.0552, 0.0489, 0.0638, 0.045, 0.0484, 0.031, 0.0579, 0.0085, 0.0498, 0.1074, 0.0454, 0.0442, 0.0902, 0.0173, 0.0316, 0.0124, 0.0327, 0.0582, 0.0438, 0.116, 0.0352, 0.029, 0.0849, 0.0306, 0.0418, 0.0375, 0.0412, 0.0265, 0.0628, 0.0717, 0.0515, 0.0487, 0.0904, 0.0454, 0.0399, 0.0316, 0.0517, 0.0459, 0.0304, 0.0781, 0.0454, 0.0245, 0.0442, 0.0446, 0.0643, 0.0492, 0.0501, 0.0239, 0.0616, 0.0838, 0.0381, 0.0484, 0.1091, 0.0281, 0.0469, 0.0461, 0.0619, 0.0493, 0.0503, 0.0629, 0.0572, 0.0522, 0.0611, ...... in a single row without c() for t2.csv and t3.csv

#

separate output for t2 and t3

#

like c(1,2,3,4),c(5,6,7,8,..)
should give output as 1,2,3,4,5,6,7,8...
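One way to get from `c(1,2,3,4),c(5,6,7,8,..)` to a single flat row is to ignore the `c(...)` wrappers entirely and just pull out every number. A minimal sketch, assuming the text contains nothing but such `c(...)` groups (a header like `t2` would contribute a stray `2`); the function name `flatten_r_vectors` is made up for this example.

```python
import re


def flatten_r_vectors(text):
    """Extract every number from text such as 'c(1,2),c(3,4)' into one flat list of floats."""
    # Matches integers, decimals, and scientific notation such as -6e-04.
    number_pattern = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
    return [float(token) for token in re.findall(number_pattern, text)]


sample = "c(0.0406, 0.0363, -6e-04),c(5, 6, 7, 8)"
flat = flatten_r_vectors(sample)

# Join back into one comma-separated row, ready to write out as a single CSV line.
single_row = ",".join(str(value) for value in flat)
```

For the real files the same function could be applied to the raw file contents read as plain text, once per file, which sidesteps pandas failing to parse the `c(...)` cells.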

tacit basin
tough barn
#

diff files t2.csv and t3.csv

#

which are uploaded

tacit basin
tough barn
#

okay I have to try this, if you have code ref you can share

tacit basin
tacit basin
tough barn
#

can we perform t2.split() directly on csv

tacit basin
tough barn
#

by which func

tacit basin
tough barn
#

thank you for saving my day

#

🤝

mint palm
#

@lapis sequoia

#

this is the unencoded matrix

#

i found it
the green box in Y(prediction)

#

Y can be either eMBB, URLLC, mMTC

#

as you can see the rest (X[0:8,:]) are categorical data, that's why the author might have encoded it

stark breach
#

Hey, I wanted to begin ML, but I don't know where to start, or whether I should do algorithms and data structures and delve into competitive coding first

#

Also i don't comprehend uni level math

#

like the stuff in andrew Ng

mint palm
#

yes

lapis sequoia
#

yes what?

#

I'm sorry I don't even remember what was the question. it was all yesterday.

mint palm
#

ok, the problem was:
is it ok to encode X before solving

mint palm
exotic thicket
#

How come the 2nd question has an answer of 77mm? Would someone mind interpreting it?

odd meteor
# mint palm ``(X_train, X_test, Y_train, Y_test) = train_test_split(X, dummy_y, test_size=0....

Yes it does. By default, it randomizes your sample observations before splitting them into train and test sets. You can also disable shuffling prior to splitting.

train_test_split(x, y, test_size =0.15, shuffle=False, random_state = 2022)

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
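To see the difference on a concrete case, here is a small sketch using a made-up 10-row dataset (the arrays and variable names are illustrative only). With the default `shuffle=True` the rows are permuted before the split; with `shuffle=False` the test set is simply the tail of the data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten ordered samples so that any shuffling is easy to spot.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Default behaviour: rows are shuffled before splitting; random_state makes it repeatable.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=2022)

# shuffle=False: original order is kept, so the last 20% of rows become the test set.
X_tr_ns, X_te_ns, y_tr_ns, y_te_ns = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

print("shuffled test labels:  ", y_te)
print("unshuffled test labels:", y_te_ns)
```

Note that `random_state` only controls the shuffling, so it has no effect on the result when `shuffle=False`.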

odd meteor
iron shell
#

Hey, I have a doubt: I was training a model and split my data into a train part and a validation part, and I normalized both. But I have another dataset to use as the test set, which simulates "real data". Should I normalize that too?
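The usual practice for this situation is to compute the normalization statistics on the training data only and then apply those same statistics to the validation and test data, since real data would have to go through the identical transform. A minimal NumPy sketch with synthetic data (all array names and sizes here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the train / validation / "real data" test sets.
X_train = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
X_valid = rng.normal(loc=50.0, scale=10.0, size=(20, 3))
X_test = rng.normal(loc=50.0, scale=10.0, size=(20, 3))

# Statistics come from the training set ONLY.
train_mean = X_train.mean(axis=0)
train_std = X_train.std(axis=0)


def normalize(X):
    """Standardize X using the training-set mean and standard deviation."""
    return (X - train_mean) / train_std


X_train_n = normalize(X_train)
X_valid_n = normalize(X_valid)
X_test_n = normalize(X_test)  # same transform, no re-fitting on the test data
```

The normalized test set will not have exactly zero mean and unit variance, and that is expected: it was scaled by the training statistics, not its own.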

exotic thicket
#

@lone merlin hello dude, do u know any good discord communities for mathematics? I'm stuck on linear algebra, coordinate geometry, and random processes, all in the context of computer vision and image processing problems.. Plz would u mind helping me find what I'm desperately looking for, since I took the CV and IP course, which is all abt mathematical and physical underpinnings

odd meteor
# stark breach Hey ,wanted to begin ML ,don't know where to start and if i should to algorithms...

I think a large number of people who recently got into ML have at some point used Andrew Ng's popular Machine Learning course on Coursera.

You can start from there and see if that works for you. However, if you happen to be like me, who was practically dozing off each time I went beyond 20 mins in Andrew Ng's course, then feel free to drop the course and try Udemy, DataQuest, DataCamp, books, a bootcamp, etc.

I guess the moral of my story is: try as many resources as possible before you eventually settle on one, and don't hesitate to drop any material that doesn't work for you.

Only after you've gained some form of experience in learning ML would you truly appreciate using Hackathons to validate and reinforce what you've learned.

All the best ✌️

exotic thicket
#

@odd meteor @catcatgurl hello dude, do u know any good discord communities for mathematics? I'm stuck on linear algebra, coordinate geometry, and random processes, all in the context of computer vision and image processing problems.. Plz would u mind helping me find what I'm desperately looking for, since I took the CV and IP course, which is all abt mathematical and physical underpinnings

jaunty mural
#

any suggestion, how to improve looking this graph?

#

and this one

odd meteor
nimble jolt
#

Hey guys I'm a beginner in python and I have some module installation problems that I don't know how to solve, can any of you help

#

Or is this not the place to ask

odd meteor
odd meteor
nimble jolt
#

The module name is pywhatkit

#

The error message is line 3, in <module>import pywhatkit

odd meteor
nimble jolt
#

I don't understand still a beginner

#

Can you explain in simpler terms

#

plz

jaunty mural
odd meteor
nimble jolt
#

Ok

#

But what if i'm using pycharm

jaunty mural
nimble jolt
#

OK

odd meteor
nimble jolt
#

But i've tried installing it in both pycharm and the command prompt, and they both say it has been installed.

nimble jolt
#

Oh and thanks for your help.

odd meteor
# jaunty mural any suggestion, how to improve looking this graph?

I don't know if there's any other better way to visualise a 3d dataset other than probably making it interactive. That way you can interact, rotate, and view each dimension of the data with ease.

Isn't this plotly? 😀

If not, try using plotly or cufflinks to plot and visualize the data

sand fossil
#

hello peoples

urban lance
#

what clustering methods are great for hyperbolic data?

jaunty mural
exotic thicket
# exotic thicket

@serene scaffold @digital radish plz any of u help me with this problem

jaunty mural
#

@odd meteor no, plotly is good for zooming live, but I need these plots to include in my paper, so

ember lark
#

anyone here have any experience with pyttsx3 for text to speech? Currently weighing different options for a voice engine for my AI but for windows pyttsx3 uses sapi5 engine which brings robotic voices. Other txt to speech engines I have found need a file to read from to turn it to speech. Does anyone have a recommendation for natural sounding voices using pyttsx3 or a free API that can achieve the same thing?

misty flint
#

anyone work with gpt3 before

odd meteor
serene scaffold
#

@exotic thicket it's impolite to ping random people to draw attention to your question. Please refrain in the future--this is a warning.

#

@lapis sequoia did you see what I just said to pari?

lapis sequoia
#

I'll delete it if you want.

serene scaffold
lapis sequoia
#

damn, i just..did.

serene scaffold
#

now they've been pinged and they'll never be able to figure out where it came from.

#

anyway, you can ping someone if they've already engaged with your specific question.

lapis sequoia
lapis sequoia
#

high definition was possible in matlab as far as i remember.

jaunty mural
jaunty mural
lapis sequoia
#

ah alr, my bad lol.

jaunty mural
#

it's ok

lapis sequoia
#

so yeah what do you mean by improve here?

jaunty mural
mint palm
#

i used one hot encoding on all Y.
is it correct?

jaunty mural
#

changed transparency value

mint palm
#

my output isnt one hot though

#

output is still probabilities (floats)
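Getting probabilities back is expected when the targets are one-hot encoded: the network outputs one score per class, and the predicted class is the one with the highest score. A small sketch of converting such outputs back to labels; the probability values are invented, and the column order `eMBB, URLLC, mMTC` is an assumption that has to match whatever order the encoder actually used.

```python
import numpy as np

# Assumed column order of the one-hot encoding; check it against your encoder.
class_names = np.array(["eMBB", "URLLC", "mMTC"])

# Made-up model outputs: one row per sample, one probability per class.
probs = np.array(
    [
        [0.7, 0.2, 0.1],
        [0.1, 0.1, 0.8],
        [0.3, 0.6, 0.1],
    ]
)

# The index of the largest probability in each row is the predicted class.
pred_idx = probs.argmax(axis=1)
pred_labels = class_names[pred_idx]

# If a hard one-hot prediction is needed, index an identity matrix.
pred_one_hot = np.eye(len(class_names), dtype=int)[pred_idx]
```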

lapis sequoia
odd meteor
jaunty mural
odd meteor
jaunty mural
serene scaffold
#

!otn a panel beat the data

arctic wedgeBOT
#

:ok_hand: Added panel-beat-the-data to the names list.

jaunty mural
mint palm
copper dirge
#

G'day everyone, not sure if it would be able to be done or not, but is it possible to create a VERY simple machine learning algorithm using ONLY numpy?

#

Something that can categorise messages as spam or not for example

#

If so, please ping/pm me 🙂

lapis sequoia
copper dirge
#

That's pretty cool

#

I'm not sure what to look for, do you think that you'd be able to nudge me in the right direction @lapis sequoia ?

#

Would it also be possible to do the following:

#

1. Export the program's "learning" to a text file of some sort
2. Read this file when a "classify" function is run?

I assume this would be better than retraining the bot every time you want to classify something?
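Both parts are doable with NumPy alone. Below is a rough sketch of one possible approach, a bag-of-words naive Bayes classifier, with the learned parameters saved to disk and loaded again before classifying. The six training messages, the function names, and the file name `spam_model.npz` are all invented for the example, and a real spam filter would need far more data and better tokenization.

```python
import os
import tempfile

import numpy as np

# Tiny made-up training set: 1 = spam, 0 = not spam.
train_messages = [
    "win free money now",
    "free prize claim now",
    "cheap pills free offer",
    "are we meeting for lunch today",
    "see you at the meeting tomorrow",
    "can you send the lunch menu",
]
train_labels = np.array([1, 1, 1, 0, 0, 0])


def vectorize(message, word_index):
    """Turn a message into a vector of word counts over the known vocabulary."""
    counts = np.zeros(len(word_index))
    for word in message.lower().split():
        if word in word_index:  # words never seen in training are ignored
            counts[word_index[word]] += 1
    return counts


def train(messages, labels, word_index):
    """Estimate log priors and Laplace-smoothed log word likelihoods per class."""
    X = np.array([vectorize(m, word_index) for m in messages])
    log_prior = np.log(np.array([(labels == c).mean() for c in (0, 1)]))
    word_counts = np.array([X[labels == c].sum(axis=0) + 1.0 for c in (0, 1)])
    log_likelihood = np.log(word_counts / word_counts.sum(axis=1, keepdims=True))
    return log_prior, log_likelihood


def classify(message, log_prior, log_likelihood, word_index):
    """Return 1 for spam, 0 for not spam, by comparing per-class log scores."""
    scores = log_prior + log_likelihood @ vectorize(message, word_index)
    return int(np.argmax(scores))


vocab = sorted({word for msg in train_messages for word in msg.split()})
word_index = {word: i for i, word in enumerate(vocab)}
log_prior, log_likelihood = train(train_messages, train_labels, word_index)

# 1. Export the "learning": the vocabulary plus the two parameter arrays.
model_path = os.path.join(tempfile.gettempdir(), "spam_model.npz")
np.savez(model_path, log_prior=log_prior, log_likelihood=log_likelihood,
         vocab=np.array(vocab))

# 2. Later (even in another process), load the file instead of retraining.
loaded = np.load(model_path)
loaded_index = {word: i for i, word in enumerate(loaded["vocab"].tolist())}
pred_spam = classify("claim your free money", loaded["log_prior"],
                     loaded["log_likelihood"], loaded_index)
pred_ham = classify("lunch meeting tomorrow", loaded["log_prior"],
                    loaded["log_likelihood"], loaded_index)
```

`np.savez` writes a binary `.npz` archive rather than a text file; if a human-readable file matters, `np.savetxt` and `np.loadtxt` can store the numeric arrays as text, with the vocabulary written separately.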

lapis sequoia
#

Anyone with experience in (geo)pandas that wants to help me figure out why my plot does not show? #help-cookie

#

Hi, does anyone here use kaggle? I want to ask about how much data it uses, since I don't have an unlimited internet plan

tacit basin
lapis sequoia
# tacit basin I use kaggle from time to time

I am thinking of signing up to kaggle to learn pytorch in their jupyter notebooks. i was worried it would cost money to use their gpu, but someone told me that there aren't any charges.
I would like to ask, does kaggle need a constant internet connection to code?

tacit basin
lapis sequoia
tacit basin
#

There are more free GPU options: google colab, paperspace gradient, AWS sagemaker studio lab

lapis sequoia
tacit basin
lapis sequoia
lapis sequoia
tacit basin
#

But kaggle is a data science competition platform. They provide free GPU hours so ppl can learn or enter competitions, but it's not used as a paid cloud computing resource like, for example, GCP, AWS, or Azure

lapis sequoia
lapis sequoia
#

i'll setup an account and try it only for learning data science.

tacit basin
#

See you on some competition leaderboard soon. Good luck!

lapis sequoia
lapis sequoia
steady basalt
#

@lapis sequoia

lapis sequoia
#

Can anyone point me to good educational documentation on heatmapping etc. with matplotlib?

#

Ive found this but it isnt well explained

steady basalt
#

I’d normally recommend seaborn for that, but matplotlib is really straightforward, it only takes one line of code @lapis sequoia

#

Nevermind, opened your document to find something that looks far from the heatmap I am used to

misty flint
#

yeah im used to the other heatmap term

#

not the geographical one

agile cobalt
#

maybe look up Choropleth charts?
(and relevant documentation / stackoverflow questions for whichever libraries you use)

mint palm
#

Sterlercus
i found the data

steady basalt
#

Anyone else find it impossible to beat 80% accuracy on the titanic Kaggle?

#

I see scores even higher

serene scaffold
steady basalt
#

Have you done the space titanic too?

#

It’s better

#

It’s about a spaceship called titanic

misty flint
#

do the majority of the people die as well

steady basalt
#

No they disappear

misty flint
tacit basin
lapis sequoia
misty flint
#

ahhhhh

#

endless query optimizations

#

if only people had all their requirements listed at the beginning

#

tragic

#

this is the real data scientist life

exotic thicket
steady basalt
#

Ah you guys.. how long does RFE take to run on a dataset with 8000 rows and 8 features

#

On Kaggle CPUS

#

It’s been trying to select 3 features for 8 minutes now

#

Step=1

#

logistic regression estimator

steady basalt
#

Okay it’s been running for 25 mins now, and for 15 mins on my Mac’s CPU

#

It’s sklearn btw

#

Is there even any point in doing this or should it just be manual selection based on a correlation heatmap

#

okay my laptops about to set on fire holy shit

#

AAA HELP

tacit basin
#

living on the edge :))

steady basalt
#

Bro, i am on 170%

#

Did i do something wrong?

#

its been running for 30 mins

#

no, more actually

#

does this feature selection only work on data with same data type or wat?

tacit basin
#

gpu memory 15.7 out of 15.9, almost the dreaded out of memory error lol

steady basalt
#

the gpu accelerator doesnt even work with mine

#

is it normal for feature selection to run for hours? when theres only 8 features in total?

#

okay well, im actually running with the one hot encoded version, so quite a lot more

tacit basin
#

not sure why you select features from X after you've already split off X_train
not sure if selecting on X_train will speed things up massively, as that is still 70% of the data, but the correct way is to not use test data to set up features etc

steady basalt
#

I have read before that its better to do this way

tacit basin
#

can you share a link?

#

to where you read that

steady basalt
#

I just googled 'feature selection before or after test train split'

#

first result

tacit basin
#

because it's a bit cheating, in practice you don't know the test data, that's the reason to split
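A sketch of what "select features using the training data only" can look like with sklearn's `RFE`, matching the settings mentioned earlier (logistic regression estimator, 3 features, `step=1`, 70/30 split). The data is synthetic and the scaling step is my own addition: unscaled inputs can make the logistic regression solver slow to converge, which is one possible (unconfirmed) reason for very long RFE runs.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data: 8 features, of which only columns 0, 3 and 5 drive the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))
y = (X[:, 0] + 2.0 * X[:, 3] - X[:, 5] > 0).astype(int)

# Split FIRST, so the test rows play no part in choosing features.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit the scaler on the training rows only, then apply it to both parts.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Recursive feature elimination, fitted on the training data only.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3, step=1)
selector.fit(X_train_s, y_train)

# Apply the already-chosen feature mask to both train and test.
X_train_sel = selector.transform(X_train_s)
X_test_sel = selector.transform(X_test_s)
selected = np.flatnonzero(selector.support_)
print("selected feature indices:", selected)
```

The test set is only ever passed through `transform`, never `fit`, which is the whole point of splitting before selecting.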