#data-science-and-ml | Python | Page 5

steady basalt Aug 2, 2022, 6:02 PM

#

there are 1800

#

of 50k test size

desert oar Aug 2, 2022, 6:02 PM

#

this is exactly why we do stratified sampling. if the rare class is 5% of cases, you want to make sure that it's 5% in both the train and test sets if possible. then you can oversample in the training set later.
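
A minimal sketch of the stratified split described above, assuming scikit-learn is in use. The synthetic labels (5% rare class) and the variable names are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, illustrative data: 1000 rows, 5% belong to the rare class (label 1).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = np.array([1] * 50 + [0] * 950)

# stratify=y keeps the class proportions (almost) identical in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

train_rate = y_train.mean()
test_rate = y_test.mean()
print(f"rare class share: train={train_rate:.3f}, test={test_rate:.3f}")
```

Any resampling of the rare class would then be applied to `X_train` / `y_train` only, so the test split keeps the real-world class balance.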

steady basalt Aug 2, 2022, 6:02 PM

#

that isnt an issue

steady basalt Aug 2, 2022, 6:02 PM

#

desert oar this is exactly why we do stratified sampling. if the rare class is 5% of cases,...

hmm?

#

i see that as cheating as test data shud be totally unseen

desert oar Aug 2, 2022, 6:03 PM

#

it's not cheating. it's a valid technique. the test data is meant to be a simulation of unseen data, but your data

steady basalt Aug 2, 2022, 6:03 PM

#

how do i fix this?!

desert oar Aug 2, 2022, 6:04 PM

#

1800 / 50_000 = 3.6%

#

what's the % in the training set before oversampling?

steady basalt Aug 2, 2022, 6:04 PM

#

in the overall dataset its 18k/500+k

#

so um

desert oar Aug 2, 2022, 6:05 PM

#

ok so about the same

steady basalt Aug 2, 2022, 6:05 PM

#

15k/45?

#

youd expect so with random split

desert oar Aug 2, 2022, 6:05 PM

#

yeah with that much data it should be ok. so alright, you've ruled out a pathological case like having literally 5 instances of the rare class in the test set

steady basalt Aug 2, 2022, 6:05 PM

#

1800 is more than enough

desert oar Aug 2, 2022, 6:06 PM

#

next question: feature distributions. are there any rare feature values that show up in one split but not the other?

steady basalt Aug 2, 2022, 6:06 PM

#

is there a quick and ez way to test that

desert oar Aug 2, 2022, 6:06 PM

#

not in general. but conceptually it's "groupby and compute the distribution"

#

or just compute the distribution in both sets and compare to the baseline
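
One way the "compute the distribution in both sets and compare" idea could look for a single categorical feature, assuming pandas. The helper name `compare_category_shares` and the toy columns are hypothetical.

```python
import pandas as pd

def compare_category_shares(train: pd.Series, test: pd.Series) -> pd.DataFrame:
    """Return the share of each category in train and test, side by side."""
    shares = pd.concat(
        [
            train.value_counts(normalize=True).rename("train"),
            test.value_counts(normalize=True).rename("test"),
        ],
        axis=1,
    ).fillna(0.0)  # a category absent from one split gets share 0
    shares["abs_diff"] = (shares["train"] - shares["test"]).abs()
    return shares.sort_values("abs_diff", ascending=False)

# Illustrative data: category "c" never appears in the test split.
train_col = pd.Series(["a"] * 6 + ["b"] * 3 + ["c"] * 1)
test_col = pd.Series(["a"] * 3 + ["b"] * 2)

report = compare_category_shares(train_col, test_col)
missing_in_test = report.index[report["test"] == 0].tolist()
print(report)
print("missing in test:", missing_in_test)
```

Running this per feature gives a quick table of where the two splits disagree, including rare values that show up in only one of them.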

steady basalt Aug 2, 2022, 6:06 PM

#

X_train and X_test plots?

#

let me go and look

desert oar Aug 2, 2022, 6:06 PM

#

how many features do you have anyway? what kind of model is this?

steady basalt Aug 2, 2022, 6:07 PM

#

ok there were about 10 features but after dummifying its 50

#

a handful of continuous

#

random forest for now, but will change that later once fixed

#

to xgb maybe

desert oar Aug 2, 2022, 6:08 PM

#

they're all categorical?

steady basalt Aug 2, 2022, 6:08 PM

#

so, histograms for cont data and countplots for the onehot encoded data?

#

i have 5 numerical

#

altho about ~70% of one of them was imputed w median

desert oar Aug 2, 2022, 6:10 PM

#

generally it's not necessary to dummify categorical variables in tree-based models. if you do that, you end up skewing your model to over-fitting on high-cardinality features. sometimes it's actively harmful.

steady basalt Aug 2, 2022, 6:10 PM

#

Oh.........

#

let me change that and see if the problem is fixed

desert oar Aug 2, 2022, 6:11 PM

#

steady basalt altho about ~70% of one of them was imputed w median

are you doing this inside the training loop? just checking. the median should only be computed on the train set, not the full data (train+test set).

steady basalt Aug 2, 2022, 6:11 PM

#

well spotted

#

i did it on all data

desert oar Aug 2, 2022, 6:11 PM

#

well that's an obvious source of data leakage

#

https://towardsdatascience.com/one-hot-encoding-is-making-your-tree-based-ensembles-worse-heres-why-d64b282b5769 here's a good illustration of the one-hot encoding problem

Medium

One-Hot Encoding is making your Tree-Based Ensembles worse, here’s...

Optimizing Tree-Based Models

steady basalt Aug 2, 2022, 6:12 PM

#

so its fillna of both test and train with train median
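
A small sketch of that fix, assuming pandas: the median comes from the training rows only and is then used to fill both splits. The column name `bmi` and the values are invented for the example.

```python
import numpy as np
import pandas as pd

# Illustrative frames; "bmi" is a hypothetical numeric column with gaps.
train = pd.DataFrame({"bmi": [20.0, 22.0, np.nan, 30.0]})
test = pd.DataFrame({"bmi": [np.nan, 100.0]})

# The statistic comes from the training rows only...
train_median = train["bmi"].median()

# ...and is then reused for both splits, so the test values never influence it.
train["bmi"] = train["bmi"].fillna(train_median)
test["bmi"] = test["bmi"].fillna(train_median)
```

The extreme test value (100.0) has no effect on what gets imputed, which is the point of computing the median before looking at the test rows.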

#

ah man, i imputed modes too way earlier when i first loaded the data

desert oar Aug 2, 2022, 6:13 PM

#

yeah that's an easy mistake to make. assuming you're using scikit-learn, you'll want to add it as a step in your sklearn pipeline and not at the beginning when loading data
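
A sketch of what "add it as a step in your sklearn pipeline" could look like, assuming scikit-learn's `Pipeline` and `SimpleImputer`. The step names, the tiny dataset and the forest settings are placeholders, not the setup from this conversation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline

# Illustrative data: one numeric feature with missing values.
X_train = np.array([[1.0], [2.0], [np.nan], [4.0], [5.0], [np.nan]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[np.nan], [100.0]])

model = Pipeline(
    steps=[
        # The median is learned in fit(), i.e. only from the data fit() receives.
        ("impute", SimpleImputer(strategy="median")),
        ("forest", RandomForestClassifier(n_estimators=10, random_state=0)),
    ]
)
model.fit(X_train, y_train)

learned_median = model.named_steps["impute"].statistics_[0]
preds = model.predict(X_test)  # X_test is imputed with the median learned on X_train
```

When a pipeline like this is handed to a grid search or cross-validation helper, the imputer is refit on each training fold, so the held-out fold never contributes to the median.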

steady basalt Aug 2, 2022, 6:14 PM

#

actually, im not so sure all of it was a problem

#

a lot of things had to be imputed manually

#

such as missing variable to a certain category

desert oar Aug 2, 2022, 6:16 PM

#

yeah i don't want to declare prematurely that fixing the data leakage will solve your problem. but it's one thing you'll want to eliminate anyway.

steady basalt Aug 2, 2022, 6:18 PM

#

maybe your advice on the balance of the features will work, ill check it out

#

@desert oar

#

first comparison

#

its not too different

#

omfg i have some outlier

#

no nvm

#

outlier shudnt matter

#

looking better with some other params i chose, but still not good enough for class1 f1

#

now, if i grid searched for f1, and it thought the best params are when class0 f1 is 0.9 and class 1 f1 is 0.2, how do i instead try to go for both being somewhere in the middle, say 0.7

#

@desert oar

#

class-specific f1 for class1 would fix it?

desert oar Aug 2, 2022, 6:40 PM

#

steady basalt now, if i grid searched for f1, and it thought the best params are when class0 f...

you only have 2 classes, right? use overall f1, not class-specific

steady basalt Aug 2, 2022, 6:40 PM

#

so just 'f1' metric no variant

desert oar Aug 2, 2022, 6:40 PM

#

right

steady basalt Aug 2, 2022, 6:41 PM

#

but it keeps thinking best params is when class0 f1 is really high

#

and that means class1 will suck

desert oar Aug 2, 2022, 6:41 PM

#

isn't "class1 f1" what you want anyway?

steady basalt Aug 2, 2022, 6:41 PM

#

yeah

desert oar Aug 2, 2022, 6:41 PM

#

assuming the rare class is class 1

#

so ignore class 0 f1, it's not the thing you care about

steady basalt Aug 2, 2022, 6:41 PM

#

well, it kinda matters

#

but still, class1 f1 sucks

#

idk why

#

did u see the histograms above

desert oar Aug 2, 2022, 6:43 PM

#

steady basalt well, it kinda matters

in this case it doesn't matter because you only have 2 classes - every prediction of "class 1" is a prediction of "not class 0"

#

a false positive for one is a false negative for the other
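
The symmetry can be checked by counting errors from each class's point of view. This is a pure-Python sketch with made-up labels and predictions; `confusion_counts` is a hypothetical helper.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true positives, false positives and false negatives for one class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn

# Made-up labels and predictions for a binary problem.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]

tp1, fp1, fn1 = confusion_counts(y_true, y_pred, positive=1)
tp0, fp0, fn0 = confusion_counts(y_true, y_pred, positive=0)
print("class 1:", tp1, fp1, fn1)
print("class 0:", tp0, fp0, fn0)
```

The error counts are mirrored (`fp1 == fn0` and `fn1 == fp0`), but the true-positive counts are not, which is why the F1 of the majority class can look high while the F1 of the rare class stays low.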

steady basalt Aug 2, 2022, 6:43 PM

#

?

#

its best to get a decent f1 for both

#

?

desert oar Aug 2, 2022, 6:44 PM

#

i am suggesting that you focus on the f1 score for the class that you care about. and that you should convince yourself in the binary classification case that there's no point in looking at both

#

i'm not sure what the histograms are telling me

steady basalt Aug 2, 2022, 6:44 PM

#

so if recall and precision for the negative class is rly low, im saying my models going to tell people theyre getting x disease when actually they aint?

#

thats bad

#

with predicting diseases its probably quite important to consider both classes

#

i dont want a model that cannot properly predict people who dont have the disease

desert oar Aug 2, 2022, 6:49 PM

#

steady basalt so if recall and precision for the negative class is rly low, im saying my model...

it is true that precision and recall don't take into account the true negative %. in this case you have two competing optimization goals

steady basalt Aug 2, 2022, 6:49 PM

#

Yes

#

thats a further challenge, but for now, maybe i shud optimise to make class1 f1 highest rather than just say scorer=f1

desert oar Aug 2, 2022, 6:50 PM

#

you can also look at "balanced accuracy" which is the average of sensitivity and specificity
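
Written out by hand, balanced accuracy looks roughly like this. The sketch is pure Python with invented labels; scikit-learn ships the same metric as `balanced_accuracy_score`, with `"balanced_accuracy"` as the scoring string.

```python
def balanced_accuracy(y_true, y_pred):
    """Average of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    pos = sum(t == 1 for t in y_true)
    neg = sum(t == 0 for t in y_true)
    sensitivity = tp / pos
    specificity = tn / neg
    return (sensitivity + specificity) / 2

# Made-up, imbalanced example: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
bal_acc = balanced_accuracy(y_true, always_negative)
print(accuracy, bal_acc)
```

A model that always predicts the majority class gets 0.8 plain accuracy here but only 0.5 balanced accuracy, so the metric takes both classes into account.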

desert oar Aug 2, 2022, 6:50 PM

#

steady basalt thats a further challenge, but for now, maybe i shud optimise to make class1 f1 ...

scorer=f1 is equivalent to "class1 f1", that's what i'm saying

steady basalt Aug 2, 2022, 6:50 PM

#

the issue with that is that the stupid grid search will think '0.9' precision and '0.1' recall is best bcs the f1s highest, when i cud be getting something better like 0.65 and 0.65

#

in this case recall>precision

desert oar Aug 2, 2022, 6:50 PM

#

fair enough. maybe you can penalize extreme values somehow

steady basalt Aug 2, 2022, 6:50 PM

#

i cud optimise for recall, and have tried, but still got the same bad result

desert oar Aug 2, 2022, 6:51 PM

#

try balanced accuracy maybe

steady basalt Aug 2, 2022, 6:51 PM

#

i think this has to be a data problem, i shud step away from grid searching and just use default models until something looks alright

desert oar Aug 2, 2022, 6:51 PM

#

that's also a good decision

#

start with a baseline model and then optimize after you know it works

steady basalt Aug 2, 2022, 6:51 PM

#

so how should i go about fixing this weird issue

#

i showed you the distributions arent too different

#

a bit for age, sure but shudnt ruin the model

desert oar Aug 2, 2022, 6:52 PM

#

right, so you've ruled out another issue

#

now it seems that you need to turn to the imbalance problem itself

steady basalt Aug 2, 2022, 6:52 PM

#

well i used smote so its 5050 for train

desert oar Aug 2, 2022, 6:52 PM

#

what are your performance metrics within the train set?

steady basalt Aug 2, 2022, 6:52 PM

#

i turned 18k/450k into 450k/450k

desert oar Aug 2, 2022, 6:53 PM

#

it's possible also that the smote results aren't good

steady basalt Aug 2, 2022, 6:53 PM

#

i dont think undersampling will help me here either as i need the data

desert oar Aug 2, 2022, 6:53 PM

#

i've heard "mixed reviews" about oversampling and i don't personally use it

steady basalt Aug 2, 2022, 6:53 PM

#

smote shud work, i mean its just balancing the class while adding nothing new

#

for model purposes

#

i dont think thats the problem

#

my halving grid search seems to be showing test f1 of 0.7 so far maximum

#

but im telling u when i print the actual results of the test set its gona suck for either class

#

especially recall/precision being imbalanced

#

if i dont fix this soon im gona have to call it quits and submit bad results, which doesnt look good for a thesis

#

sure, its not about performance but it helps when the reader sees decent numbers that would be helpful in deployment

desert oar Aug 2, 2022, 6:58 PM

#

unfortunately this is just how machine learning goes

#

i do need to get back to my own work for now, but you'll have to keep trying things and coming up with reasons why the model might be overfitting to one class or another

steady basalt Aug 2, 2022, 6:58 PM

#

is there a metric thats balanced f1? so its not maximising by saying 0.99 precision and 0.1 recall and instead tries to find a maximum where both are highest and balanced?

desert oar Aug 2, 2022, 6:59 PM

#

you could e.g. take the average of both f1 scores

steady basalt Aug 2, 2022, 6:59 PM

#

does sklearn have it?

#

i need to make it?

desert oar Aug 2, 2022, 6:59 PM

#

i'm not sure if that's equivalent to plain f1 though... might need to write it out
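
Writing it out on a toy example suggests the two are not equivalent. The sketch below is pure Python with made-up labels; `f1_for_class` is a hypothetical helper. In scikit-learn the averaged version corresponds to `f1_score(..., average="macro")`, or `scoring="f1_macro"` in a search.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)

# Made-up labels and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]

f1_class1 = f1_for_class(y_true, y_pred, 1)  # what plain binary F1 reports
f1_class0 = f1_for_class(y_true, y_pred, 0)
macro_f1 = (f1_class1 + f1_class0) / 2       # the "average of both" idea
print(f1_class1, f1_class0, macro_f1)
```

Here plain F1 is 0.4 while the average of both is 0.6, because the majority class pulls the average up.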

steady basalt Aug 2, 2022, 6:59 PM

#

yeah no i meant for class 1 mostly

desert oar Aug 2, 2022, 6:59 PM

#

steady basalt is there a metric thats balanced f1? so its not maximising by saying 0.99 precis...

oh like penalizing extreme values

#

im not sure, it would be useful though

steady basalt Aug 2, 2022, 7:00 PM

#

i meant i prefer 0.7 recall and 0.6 precision over 0.9 and 0.3

desert oar Aug 2, 2022, 7:00 PM

#

actually it's a harmonic mean, it should discourage extreme values anyway
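
A quick arithmetic check of the harmonic-mean point, using precision/recall pairs like the ones mentioned earlier in the conversation. The function name is made up.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

lopsided = f1_from_precision_recall(0.9, 0.1)
balanced = f1_from_precision_recall(0.65, 0.65)
print(f"P=0.90, R=0.10 -> F1={lopsided:.2f}")
print(f"P=0.65, R=0.65 -> F1={balanced:.2f}")
```

The lopsided pair scores 0.18 and the balanced pair 0.65, so a search that maximises F1 should already prefer the balanced model if such a model is among the candidates.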

steady basalt Aug 2, 2022, 7:01 PM

#

optimising f1 again hasnt done well for class 1

#

its maximising class 0

#

0.07... terrible

#

0.55 recall...

desert oar Aug 2, 2022, 7:02 PM

#

you don't know that it's maximizing class 0 precision. you only know that it's coming out high

steady basalt Aug 2, 2022, 7:02 PM

#

thats def not trying to optimise class 1 f1

#

scorer=f1 must be doing average f1

#

f2_score = make_scorer(fbeta_score, beta=2, pos_label=1) shud i do this

desert oar Aug 2, 2022, 7:03 PM

#

stop and read the docs. f1 score without any further qualifications is computed from the precision and recall of the "1" class. and that is how it's almost always used. resist the temptation to guess

steady basalt Aug 2, 2022, 7:03 PM

#

that just cant be whats going on here

desert oar Aug 2, 2022, 7:04 PM

#

it has to be, unless there's a bug in your code

steady basalt Aug 2, 2022, 7:05 PM

#

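```py
                             random_state=42, verbose=10, n_jobs=-1, cv=3, scoring='f1').fit(X_train, y_train)
# best_params_ only holds hyperparameters, so the new forest has to be fit before it can predict
clf = RandomForestClassifier(**search.best_params_).fit(X_train, y_train)
```

#

```py
y_preds = clf.predict(X_test)
print(accuracy_score(y_test, y_preds))

print(classification_report(y_test, y_preds))
# RocCurveDisplay.from_estimator replaces plot_roc_curve, which was removed in scikit-learn 1.2
metrics.RocCurveDisplay.from_estimator(clf, X_test, y_test)
```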

#

maybe i shud try .95 .05 split ....

#

xd

#

900 rare class is still not too bad

desert oar Aug 2, 2022, 7:07 PM

#

i mean, it's possible that it's using the wrong f1 score i guess. try being explicit, scoring=make_scorer(f1_score, average='binary')
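
A small check of that suggestion, assuming scikit-learn: the `"f1"` string and the explicit `make_scorer(f1_score, average="binary", pos_label=1)` should give the same number. The synthetic data and the logistic regression are stand-ins chosen only to keep the example fast.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, get_scorer, make_scorer

# Small synthetic binary problem, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 1.0).astype(int)

clf = LogisticRegression().fit(X, y)

string_scorer = get_scorer("f1")  # what scoring="f1" resolves to
explicit_scorer = make_scorer(f1_score, average="binary", pos_label=1)
class0_scorer = make_scorer(f1_score, average="binary", pos_label=0)

score_string = string_scorer(clf, X, y)
score_explicit = explicit_scorer(clf, X, y)
score_class0 = class0_scorer(clf, X, y)
print(score_string, score_explicit, score_class0)
```

Changing `pos_label` is the way to score the other class, and any of these scorer objects can be passed as `scoring=` to a grid search.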

steady basalt Aug 2, 2022, 7:08 PM

#

f1 score doesnt work

#

i think new sklearn is f1?

desert oar Aug 2, 2022, 7:09 PM

#

from sklearn.metrics import f1_score

#

that isn't a string. it's the actual function

steady basalt Aug 2, 2022, 7:09 PM

#

i was using grid search's scorer f1 function

desert oar Aug 2, 2022, 7:09 PM

#

right, it lets you pass a string for convenience

#

somewhere in the docs (i forget where) it says which string corresponds to which function

steady basalt Aug 2, 2022, 7:09 PM

#

'f1' is probably the same as yours

#

anyway, tried splitting down to 95% 5% which is 870 class1's

#

some more training data for it

#

i suppose it wud be bad luck if its like, given the 800 samples of the 18k that are the hardest to predict

#

: )

#

guess thats why i need to do cv, ill do it on the train set

amber thorn Aug 2, 2022, 10:36 PM

#

Can anyone here help me with a code I've been working on...I am having difficulty calling a function into another function. lmk if anyone's interested, i'll give detailed explanation in PM.

#

help is appreciated!!

serene scaffold Aug 2, 2022, 10:50 PM

#

amber thorn Can anyone here help me with a code I've been working on...I am having difficult...

Don't ask people to go to your DMs. No one wants to wait for you to type out what your real question is to find out if it's something they can/want to answer.

amber thorn Aug 2, 2022, 10:51 PM

#

oh okay...thanks!

worldly kiln Aug 2, 2022, 10:55 PM

#

hey, if possible i would like some guidance regarding starting my journey in data science….If anyone can help me where i should start from, what steps to take on first and where do i learn it from? thank you.

serene scaffold Aug 2, 2022, 10:58 PM

#

worldly kiln hey, if possible i would like some guidance regarding starting my journey in dat...

keep in mind that it's mostly math, and data scientist positions are very difficult to get without a degree. that said, this is the book I recommend: https://www.oreilly.com/library/view/data-science-from/9781492041122/

O’Reilly Online Learning

Data Science from Scratch, 2nd Edition

worldly kiln Aug 2, 2022, 10:59 PM

#

serene scaffold keep in mind that it's mostly math, and data scientist positions are very diffic...

thank you so much, i do have a degree in IT but it doesnt take me to a direction i want to go in

serene scaffold Aug 2, 2022, 11:01 PM

#

worldly kiln thank you so much, i do have a degree in IT but it doesnt take me to a direction...

if you work on a book like this for a while and feel confident that this is the way to go, a boot camp might help you bridge the gap. but it's very difficult to get a first job in data science, even with a CS degree.

worldly kiln Aug 2, 2022, 11:04 PM

#

serene scaffold if you work on a book like this for a while and feel confident that this is the ...

so, you are suggesting doing MS in data science first is a better approach?

serene scaffold Aug 2, 2022, 11:06 PM

#

worldly kiln so, you are suggesting doing MS in data science first is a better approach?

yes, that would make it quite likely that you'd find a position.

#

though it might be a CS degree, and you'd have to look into ways to make it data science-focused. there usually aren't data science degrees.

delicate wasp Aug 3, 2022, 12:10 AM

#

Whats AI ?

lapis sequoia Aug 3, 2022, 12:26 AM

#

delicate wasp Whats AI ?

anything that has logic pretty much

#

could be an if statement

serene scaffold Aug 3, 2022, 12:26 AM

#

lapis sequoia anything that has logic pretty much

That's too general.

iron basalt Aug 3, 2022, 12:26 AM

#

delicate wasp Whats AI ?

Good question...

serene scaffold Aug 3, 2022, 12:26 AM

#

delicate wasp Whats AI ?

Programs that emulate the application of knowledge.

lapis sequoia Aug 3, 2022, 12:26 AM

#

serene scaffold That's too general.

ai is a general definition

#

serene scaffold Aug 3, 2022, 12:27 AM

#

lapis sequoia ai is a general definition

There's no formal definition, but the one you proposed captures so many things that aren't AI that it shouldn't be considered

iron basalt Aug 3, 2022, 12:28 AM

#

lapis sequoia

I have yet to see one of these diagrams that works fully.

#

I guess you could say AI is more about the goal, what you want the end result to be, and not so much about what exact methods are being used / what it is right now (to some limit, it has to do enough for most to consider it AI, so some lower bound of things).

#

And that goal is often mimicking some part of human intelligence.

#

Or not even human intelligence.

#

Like maybe how bees map out their territory.

#

Though the very general idea of AI is often dropped to something more specific to be on the same page and be productive.

iron basalt Aug 3, 2022, 12:38 AM

#

lapis sequoia could be an if statement

The problem is that "an if statement" does probably not reach that lower bound for most people. It does not say enough.

vernal crescent Aug 3, 2022, 12:55 AM

#

Hello, does anyone here have any knowledge about the Speech Recognition module? I have a question

bold timber Aug 3, 2022, 1:20 AM

#

Hi, I have a problem for imbalanced data. How to determine the equation for class_weight with classes 0 or 1?
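
One common starting point, sketched under assumptions: scikit-learn documents its `class_weight="balanced"` option as weighting each class by `n_samples / (n_classes * count_of_that_class)`. The helper name and the labels below are made up.

```python
from collections import Counter

def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * count_c), the 'balanced' heuristic."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Made-up labels: 90 negatives, 10 positives.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)
```

A dictionary like this can be passed as `class_weight={0: ..., 1: ...}` to estimators that accept the parameter. The weights are only a heuristic and can be tuned like any other hyperparameter.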

lapis sequoia Aug 3, 2022, 3:12 AM

#

When a ML model "learns," how do I know what exactly is it doing to learn? What mathematical functions does it use?

agile cobalt Aug 3, 2022, 3:33 AM

#

look up a tutorial on gradient descent to have a basic idea or take a course like Andrew Ng's on Coursera

#

if you just want to know "what it does", it's not that complex ~~except the math behind the derivatives~~, but if you want to know "why it works", especially for some of the most complex tasks... I'm not sure whether anyone in the world can answer that adequately

~~probably yes and I'm just exaggerating, but still~~

lapis sequoia Aug 3, 2022, 3:53 AM

#

agile cobalt if you just want to "what it does", it's not that complex ~~except the math behi...

Ok, thank you. I looked up activation functions a bit and it's making more sense.

#

I have another question, so based on my understanding a neural network basically creates a mathematical predictive function that maps x to y. How can I see the function it has created? If I can't see it, can I know the rough shape of it?

agile cobalt Aug 3, 2022, 3:57 AM

#

I recommend taking a look at simpler algorithms before looking into neural networks
(e.g., linear regression and decision trees)

but the "function" in neural networks is essentially just multiplying all the previous layer's nodes by the connection weights, then adding it all together (for each node in the next layer)

#

oh, I almost forgot - then yeeting it into the activation function
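
The description above (weighted sum per node, then the activation function) can be sketched in a few lines of NumPy. The weights are hand-picked rather than trained, and the bias terms and the ReLU activation are assumptions that were not mentioned in the conversation.

```python
import numpy as np

def relu(z):
    """A common activation function: negative values are clipped to zero."""
    return np.maximum(z, 0.0)

def forward(x, layers):
    """Apply each layer in turn: weighted sum, add bias, then activation."""
    activation = x
    for weights, bias in layers:
        activation = relu(activation @ weights + bias)
    return activation

# Hand-picked illustrative weights: 2 inputs -> 2 hidden nodes -> 1 output.
layers = [
    (np.array([[1.0, -1.0], [0.5, 2.0]]), np.array([0.0, 1.0])),
    (np.array([[2.0], [1.0]]), np.array([-1.0])),
]
x = np.array([1.0, 2.0])
output = forward(x, layers)
print(output)
```

After training, the "final function" is this same composition with the numbers in `layers` frozen, so editing one of those arrays by hand changes the function directly.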

lapis sequoia Aug 3, 2022, 3:57 AM

#

agile cobalt I recommend taking a look at simpler algorithms before looking into neural netwo...

but there must be a final function that it ends up with at the end after training, right?

#

that it would run future predictions on

agile cobalt Aug 3, 2022, 3:58 AM

#

I'm not sure if the libraries ever compile it into a single function or just store all the weights and then run it layer by layer

#

my guess is that it has to store and run layer by layer because of the activation functions though
(otherwise, the final function would get monstrous really, really fast)

#

but yeah, like I suggested before: If you want to understand them properly, do take a course on it.
I don't understand them all that well either, as you may have already guessed

lapis sequoia Aug 3, 2022, 4:04 AM

#

I've decided to read a book called Neural Networks from Scratch, where we build a NN from raw Python

iron basalt Aug 3, 2022, 4:04 AM

#

lapis sequoia but there must be a final function that it ends up with at the end after trainin...

After training there are no more changes or it will still be considered to be training.

lapis sequoia Aug 3, 2022, 4:05 AM

#

iron basalt After training there are no more changes or it will still be considered to be tr...

Right, but what if we want to store the function and want to make some tweaks manually in the future(even though they might be worse than retraining)?

agile cobalt Aug 3, 2022, 4:05 AM

#

ah, francis explained it a bit better in #python-discussion

iron basalt Aug 3, 2022, 4:05 AM

#

lapis sequoia Right, but what if we want to store the function and want to make some tweaks ma...

You can.

lapis sequoia Aug 3, 2022, 4:06 AM

#

iron basalt You can.

How can I access the function in order to do that?

agile cobalt Aug 3, 2022, 4:06 AM

#

lapis sequoia Right, but what if we want to store the function and want to make some tweaks ma...

you would just about never tweak a neural network manually and I kinda doubt that the libraries provide the means to do so

fine-tuning is a thing though

iron basalt Aug 3, 2022, 4:06 AM

#

lapis sequoia How can I access the function in order to do that?

If you wrote the code, you have it.

lapis sequoia Aug 3, 2022, 4:06 AM

#

iron basalt If you wrote the code, you have it.

If I use a pytorch library, I don't think I'll be able to see it

lapis sequoia Aug 3, 2022, 4:07 AM

#

agile cobalt you would just about never tweak a neural network manually and I kinda doubt tha...

Yeah, probably not. But I'm asking this to improve my understanding of NN's.

agile cobalt Aug 3, 2022, 4:07 AM

#

the model is not one single function.
It's a bunch of weights.

you can look up "pytorch visualise model weights" or more generally "pytorch model visualization"

iron basalt Aug 3, 2022, 4:07 AM

#

lapis sequoia If I use a pytorch library, I don't think I'll be able to see it

Depends on what kinds of tweaks, libraries like Pytorch are designed for a more specific kind of neural network, and only allow specific types of modifications.

#

You can edit its source code though.

#

But i'm not sure what kind of modifications you are trying to do. If it's just altering some weights, yeah, that is what it already does.

#

Neural networks can be seen as just big functions that have fine-tuned parameters via some algorithm (all can if you abstract hard enough).

#

(The details of that function are what matter though, and it's why it's considered a neural network and not just any big function)

frigid creek Aug 3, 2022, 7:52 AM

#

hi, so there are IoU, confidence threshold, precision, recall, AP and im just still confused about the part where mAP is based on IoU plus theres also part where some say precision-recall curve is weighted by threshold, in this graph is IoU and confidence threshold the same thing? thanks

unique flame Aug 3, 2022, 8:10 AM

#

Anyone know of a paper for image classification in which the researchers tried different training/validation ratio? So a split of 60/40,70/30,80/20 etc.

modest onyx Aug 3, 2022, 8:57 AM

#

my rnn is actually converging pog

Screen_Shot_2022-08-03_at_1.57.30_AM.png

rapid cedar Aug 3, 2022, 9:18 AM

#

coders of ~~reddit~~ discord, where did you learn ai?
also, what modules do you recommend to learn b4 starting ai learning

steady basalt Aug 3, 2022, 9:20 AM

#

Uni

steady basalt Aug 3, 2022, 9:22 AM

#

lapis sequoia I have another question, so based on my understanding a neural network basically...

Differential equation

rapid cedar Aug 3, 2022, 9:27 AM

#

steady basalt Uni

uni?

modest onyx Aug 3, 2022, 9:30 AM

#

rapid cedar coders of ~~reddit~~ discord, where did you learn ai? also, what modules do ...

I started off in the classic andrew ng coursera ml course back in spring break

#

now there's a new and improved version you can check out (it's free)

wooden sail Aug 3, 2022, 9:30 AM

#

steady basalt Differential equation

hmm?

modest onyx Aug 3, 2022, 9:33 AM

#

steady basalt Differential equation

why would you need DE here

#

an NN is just a massive composite and piecewise function

#

There are many ways to visualize what each layer is doing though

#

https://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html I found this quite interesting

unique flame Aug 3, 2022, 9:38 AM

#

rapid cedar coders of ~~reddit~~ discord, where did you learn ai? also, what modules do ...

Started in statistics classes and then suddenly found ourselves using AI.

jaunty creek Aug 3, 2022, 10:02 AM

#

hello, I'm trying to make a few google accounts and google asks me for a phone number
to get around this, i've tried MAC spoofing, changing IP's, multiple browsers, various timeframes, various passwords and username configurations and clearing the browser cache and the cache on my local cache on my machine.
how does google still know im the same user trying to make new accounts?

#

is this some sort of ai

#

detecting it

neat crescent Aug 3, 2022, 10:14 AM

#

Unknown, but this sounds a little sketchy. It's not difficult to make more than one Google account, so whatever you're doing is out of the norm. As such we won't help as per our rule 5.

#

!rule 5

arctic wedgeBOT Aug 3, 2022, 10:14 AM

#

Rules

5. Do not provide or request help on projects that may break laws, breach terms of services, or are malicious or inappropriate.

barren wedge Aug 3, 2022, 10:23 AM

#

Is there any document ai outside google cloud?

jaunty creek Aug 3, 2022, 10:31 AM

#

neat crescent Unknown, but this sounds a little sketchy. It's not difficult to make more than ...

out of the norm? i like to use 1 gmail account for each service

steady basalt Aug 3, 2022, 10:34 AM

#

I’ve got like 4 gmails

modest onyx Aug 3, 2022, 10:34 AM

#

I've got like 12

#

half of which are for each institute/organization I'm in

wicked vessel Aug 3, 2022, 10:47 AM

#

hi 👋I was playing around with tensorflow and was wondering if there was a better way for the following:

I have a dataset with a feature (tcp port) that can range from 0-65500 where only the exact value matters. Eg if the dataset feature has port 123 then it only has a meaning if the input variable is exactly 123, and not 124, 122, etc. From what ive seen you normally want a categorical/integer encoding for something like this but i have a feeling there should be a better way when dealing with 65k different possible values

#

Right now my model seems to heavily depend on that one feature as a big indicator, giving outputs that make no sense when looking at the other features

tropic matrix Aug 3, 2022, 10:52 AM

#

is there a way to implement a custom loss function using sklearn mathematical operations?

i use a MinMaxScaler on my y output (regression), and i'm wondering if there's a way to inverse_transform the rmse in order to find a "true" rmse and judge how accurate it is

#

nvm figured it out

#

i had to first enable run_eagerly=True in model.compile, then converted my y_true and y_pred to np arrays using .numpy(), then i just inputted them into the function
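
Outside of the training loop, the same "true" RMSE can be recovered with plain NumPy. The sketch assumes a `MinMaxScaler` with the default (0, 1) range fitted on the target; the numbers are made up and `y_pred_scaled` stands in for a model's output.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up targets and predictions in original units.
y_true = np.array([[10.0], [20.0], [30.0], [50.0]])
y_pred = np.array([[12.0], [18.0], [33.0], [45.0]])

scaler = MinMaxScaler().fit(y_true)
y_true_scaled = scaler.transform(y_true)
y_pred_scaled = scaler.transform(y_pred)  # stands in for a model's scaled output

def rmse(a, b):
    """Root mean squared error between two arrays of the same shape."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

rmse_scaled = rmse(y_true_scaled, y_pred_scaled)

# Option 1: map both arrays back to original units, then compute the RMSE.
rmse_original = rmse(
    scaler.inverse_transform(y_true_scaled),
    scaler.inverse_transform(y_pred_scaled),
)

# Option 2: min-max scaling is linear, so the RMSE just scales by the data range.
data_range = float(scaler.data_range_[0])
rmse_rescaled = rmse_scaled * data_range
print(rmse_scaled, rmse_original, rmse_rescaled)
```

Passing the scaled RMSE itself through `inverse_transform` would also add the minimum of the data, which is why the error is multiplied by the range instead.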

steady basalt Aug 3, 2022, 11:24 AM

#

does anyone know how to fillna conditionally so that where ycol = 1 it fills with medians with that condition rather than column median

#

as one of my features would be better imputed this way for better predictive power

#

seems like a tricky pandas query

wooden sail Aug 3, 2022, 11:27 AM

#

do it in two lines, perhaps. df.loc[some_condition & some_other_condition, 'somecol'] = some_val, and a second line for where the condition doesn't hold

#

otherwise, pandas should have an equivalent to numpy's where, which does exactly this

#

https://stackoverflow.com/questions/38579532/pandas-equivalent-of-np-where here are some examples

steady basalt Aug 3, 2022, 11:28 AM

#

so i can say impute median of x column using a median thats calculated only on a condition of another col

#

for example, my target class is binary, and i want to impute medians based on that rather than blanket impute a column

#

so the median xcol is say 130 where y=1 but 125 where y=0

#

so a missing value will check first, does ycol equal 1 or 0, then accordingly impute the median of all rows where that condition is met

#

the code you wrote works for finding median values, how do you put that into the fillna command

wooden sail Aug 3, 2022, 11:32 AM

#

this would be without fillna, but rather making one of the conditions into isna
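
A sketch of the "make one of the conditions into isna" approach, assuming pandas. The column names `xcol` and `ycol` follow the wording used in the question and the values are invented.

```python
import numpy as np
import pandas as pd

# Illustrative frame; "xcol" is the feature to impute, "ycol" the binary target.
df = pd.DataFrame(
    {
        "xcol": [130.0, np.nan, 140.0, 120.0, np.nan, 125.0],
        "ycol": [1, 1, 1, 0, 0, 0],
    }
)

for label in (0, 1):
    in_class = df["ycol"] == label
    class_median = df.loc[in_class, "xcol"].median()  # NaN is skipped by default
    # Row mask and column name go into a single .loc call, so df itself is modified.
    df.loc[in_class & df["xcol"].isna(), "xcol"] = class_median

print(df)
```

Conditions on pandas Series are combined with `&` (and wrapped in parentheses where needed), since the Python keyword `and` raises an error on Series.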

steady basalt Aug 3, 2022, 11:32 AM

#

with fillna youd literally write the exact same code

#

before and after fillna

#

seemed to have worked

#

df['somecol'].loc[ some_condition and some_other_condition ]

#

.fillna(df['somecol'].loc[ some_condition and some_other_condition ] .median())

wooden sail Aug 3, 2022, 11:34 AM

#

that's basically the same thing, and also requires two lines

#

you only need one condition in that case

steady basalt Aug 3, 2022, 11:34 AM

#

seemed to have imputed the wrong number

wooden sail Aug 3, 2022, 11:34 AM

#

or you could check out the pandas where that i shared above. any of these 3 work

steady basalt Aug 3, 2022, 11:35 AM

#

not sure why it didnt work

#

it seemed to have imputed all

#

not just where y=1

wooden sail Aug 3, 2022, 11:37 AM

#

show the code

steady basalt Aug 3, 2022, 11:38 AM

#

combined_train['bp'].loc[combined_train['stroke'] == 1].fillna(combined_train['bp'].loc[combined_train['stroke'] == 1].median(),inplace=True)

#

NaN 1

#

still have these

wooden sail Aug 3, 2022, 11:43 AM

#

you're replacing the entries of column 'bp' in the rows where column 'stroke' equals 1 with the median of those same entries of bp

steady basalt Aug 3, 2022, 11:44 AM

#

But it’s fillna so it shud do so where it’s missing based on where it isn’t

wooden sail Aug 3, 2022, 11:44 AM

#

no

#

if you tell it to replace nans with the median of some nans, it will happily give you nans again

steady basalt Aug 3, 2022, 11:45 AM

#

Ummmm

untold bloom Aug 3, 2022, 11:45 AM

#

pandas' median and many other methods exclude NaN in the result

#

code is perhaps not working due to inplace=True

steady basalt Aug 3, 2022, 11:45 AM

#

I’m thinking it’s easier to fillna with a float that I’ve calculated myself to be the median

untold bloom Aug 3, 2022, 11:45 AM

#

at that point, you're perhaps modifying a copy

steady basalt Aug 3, 2022, 11:45 AM

#

Nah it works otherwise on normal use

untold bloom Aug 3, 2022, 11:45 AM

#

inplace=True is to be avoided 999 out of 1000 occasions :p

#

it's useful in very rare cases; so perhaps try re-assigning
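
The "modifying a copy" behaviour can be reproduced on a tiny frame, assuming pandas. The column names mirror the ones in the pasted code, and the values and the fill value are made up.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({"bp": [130.0, np.nan, 120.0], "stroke": [1, 1, 0]})
mask = df["stroke"] == 1

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # newer pandas versions warn about this pattern
    # Boolean selection returns a new object, so inplace=True fills that temporary copy.
    df["bp"].loc[mask].fillna(134.5, inplace=True)
still_missing = int(df["bp"].isna().sum())

# Re-assigning through a single .loc call writes into df itself.
df.loc[mask, "bp"] = df.loc[mask, "bp"].fillna(134.5)
missing_after_assign = int(df["bp"].isna().sum())
print(still_missing, missing_after_assign)
```

The first attempt leaves the missing value in place because only a throwaway object was filled, while the re-assignment removes it.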

wooden sail Aug 3, 2022, 11:46 AM

#

what does combined_train['bp'].loc[combined_train['stroke'] == 1].median() yield?

steady basalt Aug 3, 2022, 11:46 AM

#

The correct number

#

It’s higher than when == 0

wooden sail Aug 3, 2022, 11:46 AM

#

then try what untold bloom says

steady basalt Aug 3, 2022, 11:47 AM

#

What if I just fillna of that condition with the number

#

Shud work

untold bloom Aug 3, 2022, 11:47 AM

#

```py
sub_values = combined_train.loc[combined_train["stroke"] == 1, "bp"]
combined_train.loc[combined_train["stroke"] == 1, "bp"] = sub_values.fillna(sub_values.median())
```
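
The same idea can be extended to both classes in one step with `groupby(...).transform("median")`, assuming pandas. The frame below is a made-up stand-in for `combined_train`.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for combined_train.
df = pd.DataFrame(
    {
        "bp": [130.0, np.nan, 140.0, 120.0, np.nan, 126.0],
        "stroke": [1, 1, 1, 0, 0, 0],
    }
)

# Median of "bp" within each "stroke" group, broadcast back to every row.
group_median = df.groupby("stroke")["bp"].transform("median")

# fillna with a Series fills each missing row from the value at the same index.
df["bp"] = df["bp"].fillna(group_median)
print(df)
```

One caveat: imputing by target class needs the label, which is not available for new data at prediction time, so this only works as described on labelled training rows.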

#

problem is inplace=True...

wooden sail Aug 3, 2022, 11:49 AM

#

the other two methods i mentioned earlier also get you there without using fillna and are pretty similar to this, so try those out too, if you like

steady basalt Aug 3, 2022, 11:50 AM

#

A value is trying to be set on a copy of a slice from a DataFrame

#

his strategy gave this

untold bloom Aug 3, 2022, 11:50 AM

#

that means combined_train was defined from another dataframe, possibly as a subset or something

#

like combined_train = other_df[...] idk

steady basalt Aug 3, 2022, 11:50 AM

#

it was

#

yes

untold bloom Aug 3, 2022, 11:51 AM

#

you need to chain .copy() at the end

steady basalt Aug 3, 2022, 11:51 AM

#

when creating it?

untold bloom Aug 3, 2022, 11:51 AM

#

combined_train = other_df[...].copy()

steady basalt Aug 3, 2022, 11:51 AM

#

well it was created from X_train and y_train which is also a subset of another df

#

and on and on and on before that

#

problem?

#

combined_train = pd.concat([X_train, y_train], axis=1).copy()

#

? works ?

wooden sail Aug 3, 2022, 11:52 AM

#

will you work with the original df or only these new ones? you could use copies if you don't need to modify the original

steady basalt Aug 3, 2022, 11:52 AM

#

this shudnt be an issue

#

qol sacrificed for what functionality?

untold bloom Aug 3, 2022, 11:53 AM

#

pd.concat would give a copy anyway; that warning is perhaps due to some other code that you didn't share

steady basalt Aug 3, 2022, 11:53 AM

#

.copy didnt work

untold bloom Aug 3, 2022, 11:53 AM

#

yes, that was expected :p

#

because pd.concat gives you a new thing anyway

steady basalt Aug 3, 2022, 11:53 AM

#

nah its from x_train and y_train from previous df, which in itself is likely from another df

#

its a long notebook im not going back and going thru everything

untold bloom Aug 3, 2022, 11:54 AM

#

untold bloom ```py sub_values = combined_train.loc[combined_train["stroke"] == 1, "bp"] combi...

do you have this part, or something different? if so, how? can you share that? i guess it gives you the warning after this or similar operation

steady basalt Aug 3, 2022, 11:54 AM

#

instead of sub values i just said the condition you made subvalues from =

#

same thing

#

combined_train['bp'].loc[combined_train['stroke'] == 1].fillna(134.5,inplace=True)

#

this SHUD work wtf

untold bloom Aug 3, 2022, 11:56 AM

#

that should never work really

steady basalt Aug 3, 2022, 11:56 AM

#

the 1s didnt get filled

untold bloom Aug 3, 2022, 11:56 AM

#

you have chained access...

steady basalt Aug 3, 2022, 11:56 AM

#

combined_train['bp'].loc[combined_train['stroke'] == 1]=combined_train['bp'].loc[combined_train['stroke'] == 1].fillna(134.5) this worked

untold bloom Aug 3, 2022, 11:56 AM

#

combined_train['bp'].loc[combined_train['stroke'] == 1] is unpreferred

#

combined_train.loc[combined_train['stroke'] == 1, "bp"] is preferred
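
A small sketch of the preferred single `.loc` form, on a toy frame (the values are made up; only the column names `stroke` and `bp` come from the thread). Selecting rows and column in one `.loc` call assigns into the frame itself, whereas `df['bp'].loc[...]` goes through an intermediate object that may be a copy.

```python
import numpy as np
import pandas as pd

# Toy stand-in for combined_train (illustrative values only).
raw = pd.DataFrame({
    "stroke": [1, 1, 1, 0, 0, 0],
    "bp": [150.0, np.nan, 160.0, 120.0, np.nan, 110.0],
})
combined_train = raw.copy()

# Fill each class with its own median, using one .loc call per assignment.
for label in (0, 1):
    mask = combined_train["stroke"] == label
    group_median = combined_train.loc[mask, "bp"].median()
    combined_train.loc[mask, "bp"] = combined_train.loc[mask, "bp"].fillna(group_median)

# Equivalent one-liner: broadcast each group's median back onto its rows.
alt = raw["bp"].fillna(raw.groupby("stroke")["bp"].transform("median"))
```

Neither version needs `inplace=True`; the result is always assigned back explicitly.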

steady basalt Aug 3, 2022, 11:56 AM

#

so the weird thing is why this worked and the earlier didnt work

untold bloom Aug 3, 2022, 11:57 AM

#

untold bloom `inplace=True` is to be avoided 999 out of 1000 occasions :p

^^

untold bloom Aug 3, 2022, 11:57 AM

#

steady basalt ```combined_train['bp'].loc[combined_train['stroke'] == 1]=combined_train['bp']....

if this didn't work, we shouldn't be surprised as well

untold bloom Aug 3, 2022, 11:57 AM

#

untold bloom `combined_train.loc[combined_train['stroke'] == 1, "bp"]` is preferred

because of this.

steady basalt Aug 3, 2022, 11:57 AM

#

well its done now and working on what i typed there

untold bloom Aug 3, 2022, 11:58 AM

#

ok, sorry for the clutter

steady basalt Aug 3, 2022, 11:58 AM

#

so now i have different medians filled in my train data to base imputing my tests on that are conditionally different

#

so the model easier predicts

#

maybe improve score

steady basalt Aug 3, 2022, 12:22 PM

#

are you allowed to fillna of the test data conditionally or does it have to just be based off of a single column train value

#

it wud be cheating right? to impute test values where the test y is a certain value

#

is the best way to do this to impute based off of the entire column median from train?

#

but isnt that leaking data in a way

#

its that you have seen and imputed on knowledge of test data

#

if you say 'impute test value with x value where testy=1'

#

x value coming from training data

#

if you know that test1 is 1 or 0, isnt that cheating

#

testy*

wooden sail Aug 3, 2022, 12:25 PM

#

hmm yeah, you mean to modify the values of the input based on the output? you don't wanna do that

#

it'll make your test results better than they would otherwise be, not representative of real data

steady basalt Aug 3, 2022, 12:25 PM

#

so the entire load of work i just did, is just to make better training set, but for the test set id need to impute missing values based on a single training value

wooden sail Aug 3, 2022, 12:27 PM

#

you can incorporate this behavior into the model with some kind of recursion or adding a variable that keeps track of the previous state. that would allow you to accumulate your predictions

steady basalt Aug 3, 2022, 12:28 PM

#

wdym?

#

looking at the first prediction, seeing its y value and adjusting accordingly ? cant do that if theres any na in the first place

wooden sail Aug 3, 2022, 12:29 PM

#

let's take a step back. you're calling x the input and y what you're trying to predict, yeah?

steady basalt Aug 3, 2022, 12:29 PM

#

yes

wooden sail Aug 3, 2022, 12:29 PM

#

ok. and you want to modify values of x based on what y is

steady basalt Aug 3, 2022, 12:30 PM

#

well i want to boost my f1 score its rly low

#


```
              precision    recall  f1-score   support

           0       0.97      0.75      0.84     48436
           1       0.04      0.30      0.08      1806

    accuracy                           0.73     50242
   macro avg       0.50      0.53      0.46     50242
weighted avg       0.93      0.73      0.82     50242
```

#

for class1 its still bad

wooden sail Aug 3, 2022, 12:30 PM

#

sure, but what you're doing right now is replacing NaNs in x, yeah?

steady basalt Aug 3, 2022, 12:31 PM

#

i just did that with your guys help to make it better than just blanket imputing based off of the entire column but instead based off of the value of train_y

wooden sail Aug 3, 2022, 12:31 PM

#

just yes or no lol

steady basalt Aug 3, 2022, 12:31 PM

#

yes

wooden sail Aug 3, 2022, 12:32 PM

#

ok. well, in practice the values of y will not be available. but is there any reason to believe the current value of y depends on the previous values of y?

steady basalt Aug 3, 2022, 12:33 PM

#

no

wooden sail Aug 3, 2022, 12:33 PM

#

how about the current x on the previous values of x?

steady basalt Aug 3, 2022, 12:33 PM

#

no, its meant to be random

wooden sail Aug 3, 2022, 12:33 PM

#

then this approach cannot be done in practice

steady basalt Aug 3, 2022, 12:33 PM

#

well that isnt what i was doing

#

i was giving the model the assumption that any missing data from test is going to be different depending on its x or y in the first place

#

by imputing different medians before training and evaluating

#

based on training sets x and y

wooden sail Aug 3, 2022, 12:34 PM

#

you're still trying to modify x based on y

steady basalt Aug 3, 2022, 12:35 PM

#

if the average bp of someone with a stroke is higher than someone without a stroke in the training set, it makes sense to impute conditionally so that those with a stroke have higher bp value in the training data, and this isnt cheating because.. its training data

#

then the model will 'see' someone with that higher blood pressure and may tend towards putting it in stroke=1 category, which may improve accuracy

#

make sense?

wooden sail Aug 3, 2022, 12:36 PM

#

yes, that's fine

#

what will you replace the values with when you don't have y

steady basalt Aug 3, 2022, 12:36 PM

#

yea that is what all the fuss was about

#

oh, I have y for everything

#

I only used data where y existed

wooden sail Aug 3, 2022, 12:37 PM

#

yes but when you go out and actually use the network, y is not available

steady basalt Aug 3, 2022, 12:37 PM

#

y is either 'the source said this person was on the record for stroke' or wasnt

wooden sail Aug 3, 2022, 12:37 PM

#

or are you not trying to predict y from x?

steady basalt Aug 3, 2022, 12:37 PM

#

I am

wooden sail Aug 3, 2022, 12:37 PM

#

these are two inputs?

steady basalt Aug 3, 2022, 12:37 PM

#

wdym y is not available?

#

in practise it isnt

wooden sail Aug 3, 2022, 12:38 PM

#

if you want to predict y from x in real data, you are given x and y is unknown

steady basalt Aug 3, 2022, 12:38 PM

#

yes

wooden sail Aug 3, 2022, 12:38 PM

#

so what do you plan on doing with the nans then

steady basalt Aug 3, 2022, 12:38 PM

#

its based off of the training set

#

so this new unseen data will use the training sets values

#

to impute nans

#

same as how my test set has also done

#

which isnt conditional that is just based on entire column value

wooden sail Aug 3, 2022, 12:40 PM

#

so the training nans in x are computed from the training y, and then the test x nans are computed from the training x values?

steady basalt Aug 3, 2022, 12:40 PM

#

weird, my grid search cv is now saying 0.95 scores for these tests.... that is odd

steady basalt Aug 3, 2022, 12:41 PM

#

wooden sail so the training nans in x are computed from the training y, and then the test x ...

yes

wooden sail Aug 3, 2022, 12:41 PM

#

all right. i have to say it's a bit weird when you also said the values of x are unrelated to each other

steady basalt Aug 3, 2022, 12:41 PM

#

idk what u meant by that. but i had to do it this way to prevent leakage while also improving the model

wooden sail Aug 3, 2022, 12:41 PM

#

but you're using something like a population average, so that's okish, not great

steady basalt Aug 3, 2022, 12:41 PM

#

what else can i do?

wooden sail Aug 3, 2022, 12:42 PM

#

if you're computing just an average value, may as well look at statistics people have gathered in larger populations so that you get a better estimate of the median or mean or whatever you're using

steady basalt Aug 3, 2022, 12:43 PM

#

this is odd, how my grid search scores are 0.97 now, what is happening? do you think its now seeing x values and being like 'well ill just guess y being this' and getting them all right due to the distribution of the x values

#

thats strange, shudnt score that well using this

#

its surpassed 0.99 train and test in grid search cv on my training set

#

!!!!

#

how the hell has this happened

#

oh i know how

#

hmmm. i oversampled so actually still shudnt have happened

#

somehow must still be guessing only one class to a massively high f1 and just disregarding the other class

#

optimising on f1

#

@wooden sail do u know why my f1 for class0 is 0.9 and class1 is only 0.1

wooden sail Aug 3, 2022, 12:50 PM

#

what's your cost function

steady basalt Aug 3, 2022, 12:50 PM

#

RF criterion?

#

entropy

#

im using random forest and xgb

#

[CV 4/10; 6/32] END bootstrap=True, criterion=entropy, max_depth=20, max_features=sqrt, min_samples_leaf=2, min_samples_split=5, n_estimators=100, n_jobs=-1;, score=(train=0.997, test=0.993) total time= 1.8s this is on the training set btw

#

the training sets oversampled to balance

#

i did SMOTE after one-hot encoding also

wooden sail Aug 3, 2022, 12:56 PM

#

if one of the categories is not very common, you could get it wrong always and still get a good performance by this metric

steady basalt Aug 3, 2022, 12:57 PM

#

yep

wooden sail Aug 3, 2022, 12:57 PM

#

didn't someone suggest using max f1 or something like that yesterday?

steady basalt Aug 3, 2022, 12:57 PM

#

they said just f1

wooden sail Aug 3, 2022, 12:57 PM

#

you could also make your own cost function where you average the results from the two categories, for instance

steady basalt Aug 3, 2022, 12:57 PM

#

also, random forest isnt supposed to take onehot encoded columns right?

#

if thats the case, it wudnt be possible to use smote

#

as i believe it requires you to encode

wooden sail Aug 3, 2022, 12:58 PM

#

it can, but it's probably not the most efficient at dealing with the sparsity involved

steady basalt Aug 3, 2022, 12:58 PM

#

whats better model?

wooden sail Aug 3, 2022, 12:58 PM

#

you could use smote and cast back to the original categories

steady basalt Aug 3, 2022, 12:58 PM

#

wooden sail you could also make your own cost function where you average the results from th...

how wud i code that

steady basalt Aug 3, 2022, 12:58 PM

#

wooden sail you could use smote and cast back to the original categories

how?

wooden sail Aug 3, 2022, 12:58 PM

#

how did you do the encoding? just do that backwards

wooden sail Aug 3, 2022, 12:59 PM

#

steady basalt how wud i code that

this depends on which library you're using for the training

steady basalt Aug 3, 2022, 12:59 PM

#

ok

#

sklearn for now

wooden sail Aug 3, 2022, 12:59 PM

#

should be able to do it with numpy, then, i think
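
One way this could look in plain numpy, as a sketch rather than a definitive recipe: compute F1 per class and average the two, so the rare class counts as much as the common one. The class counts reuse the supports from the report above (48436 and 1806); the always-predict-0 model is a made-up baseline, and `f1_for_class` and `macro_f1` are hypothetical helper names.

```python
import numpy as np

def f1_for_class(y_true, y_pred, label):
    """F1 for one class: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    tp = np.sum((y_pred == label) & (y_true == label))
    fp = np.sum((y_pred == label) & (y_true != label))
    fn = np.sum((y_pred != label) & (y_true == label))
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else float(2 * tp / denom)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = np.unique(y_true)
    return float(np.mean([f1_for_class(y_true, y_pred, l) for l in labels]))

# Same class balance as the test set discussed above.
y_true = np.array([0] * 48436 + [1] * 1806)

# Baseline that never predicts the rare class.
y_pred = np.zeros_like(y_true)

accuracy = float((y_true == y_pred).mean())  # looks good despite missing class 1
macro = macro_f1(y_true, y_pred)             # pulled down by the class-1 F1 of 0
```

The baseline scores about 0.96 accuracy while its macro F1 stays below 0.5, which is the failure mode described above. In sklearn the same average should be available as `f1_score(y_true, y_pred, average="macro")`, or `scoring="f1_macro"` in a grid search.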

steady basalt Aug 3, 2022, 1:00 PM

#

what model would well handle one hot encoded data

#

neural network?

desert oar Aug 3, 2022, 2:14 PM

#

steady basalt so now i have different medians filled in my train data to base imputing my test...

you use the median from the train data to impute missing values in the test data
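
A minimal sketch of that rule with made-up numbers: the fill value is learned from the training features only and then reused unchanged on the test features, so no test labels (and no test statistics) are involved. The frame and column names here are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy feature frames (illustrative values only).
X_train = pd.DataFrame({"bp": [120.0, np.nan, 140.0, 160.0]})
X_test = pd.DataFrame({"bp": [np.nan, 130.0]})

# Learn the fill value from the training split only.
train_median = X_train["bp"].median()

# Apply the same value to both splits.
X_train_filled = X_train.fillna({"bp": train_median})
X_test_filled = X_test.fillna({"bp": train_median})
```

sklearn's `SimpleImputer(strategy="median")` follows the same pattern: `fit` on the training data, then `transform` both splits.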

desert oar Aug 3, 2022, 2:14 PM

#

steady basalt as i believe it requires you to encode

you can undo the one-hot encoding afterwards
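
A rough sketch of one way to undo it by hand, assuming the one-hot columns for a single feature sit next to each other and you still have the category list in the same order (both are assumptions here). Interpolation-based oversampling such as SMOTE can leave fractional values in those columns, so taking the largest entry per row picks the closest category.

```python
import numpy as np

# Category order must match the column order used when encoding.
categories = np.array(["Female", "Male"])

# One-hot block as it might look after oversampling: the middle row is a
# synthetic sample interpolated between a Female and a Male neighbour.
onehot = np.array([
    [1.0, 0.0],
    [0.3, 0.7],
    [0.0, 1.0],
])

# argmax returns the index of the largest entry in each row.
decoded = categories[onehot.argmax(axis=1)]
```

If the encoding was done with sklearn's `OneHotEncoder`, its `inverse_transform` does the equivalent lookup. imbalanced-learn also ships `SMOTENC`, which is designed for mixed categorical and numeric features and may avoid the round trip altogether.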

steady basalt Aug 3, 2022, 2:28 PM

#

desert oar you can undo the one-hot encoding afterwards

whats the best way to do this

#

i forgot the syntax

#

and shud it be converted to string as i encoded on floats which were categories

wooden sail Aug 3, 2022, 2:29 PM

#

should be inverse_transform

steady basalt Aug 3, 2022, 2:30 PM

#

i shud also string(values) yes?

wooden sail Aug 3, 2022, 2:31 PM

#

this is the first example on the sklearn website if you look up onehot encoding

>>> enc = OneHotEncoder(handle_unknown='ignore')
>>> X = [['Male', 1], ['Female', 3], ['Female', 2]]
>>> enc.fit(X)
OneHotEncoder(handle_unknown='ignore')
>>> enc.categories_
[array(['Female', 'Male'], dtype=object), array([1, 2, 3], dtype=object)]
>>> enc.transform([['Female', 1], ['Male', 4]]).toarray()
array([[1., 0., 1., 0., 0.],
       [0., 1., 0., 0., 0.]])
>>> enc.inverse_transform([[0, 1, 1, 0, 0], [0, 0, 0, 1, 0]])
array([['Male', 1],
       [None, 2]], dtype=object)
>>> enc.get_feature_names_out(['gender', 'group'])
array(['gender_Female', 'gender_Male', 'group_1', 'group_2', 'group_3'], ...)

steady basalt Aug 3, 2022, 2:32 PM

#

do u think this is whats causing my random forest to do this

#

#

when training cross val seems to be fine (balanced)

ornate shard Aug 3, 2022, 2:36 PM

#

hello guys

#

#

how to read this csv file ?

wooden sail Aug 3, 2022, 2:36 PM

#

in python, you can read csvs with the csv lib or pandas. there, you can specify the separator, which is a bar here instead of the more common comma

steady basalt Aug 3, 2022, 2:41 PM

#

wud anyone know why my test and train scores are nan

ornate shard Aug 3, 2022, 2:41 PM

#

wooden sail in python, you can read csvs with the csv lib or pandas. there, you can specify ...

how can i do this bro

#

steady basalt Aug 3, 2022, 2:42 PM

#

after i went back and removed the onehot encoding part of the process and just used strings

ornate shard Aug 3, 2022, 2:42 PM

#

because it shows like this

wooden sail Aug 3, 2022, 2:42 PM

#

ornate shard how can i do this bro

pandas.read_csv(file_name, sep='|')

rapid cedar Aug 3, 2022, 2:42 PM

#

can someone explain about this
i was looking thru some videos on youtube about ml, and i found one that talked about deep learning. i find that very interesting. but it said that it has generations, how do i know that what im using rn is the generation i wanted? i cant just set a seed like minecraft does. so how do they recognize the generation? like theres no token of the generation that has certain data

steady basalt Aug 3, 2022, 2:43 PM

#

its saying randomforest doesnt accept nans but i have no nans

rapid cedar Aug 3, 2022, 2:43 PM

#

rapid cedar can someone explain about this i was looking thru some videos on youtube about m...

help please 🥹

ornate shard Aug 3, 2022, 2:44 PM

#

wooden sail pandas.read_csv(file_name, sep='|')

thank you soooo much brother

ornate shard Aug 3, 2022, 2:44 PM

#

wooden sail pandas.read_csv(file_name, sep='|')

❤️

wooden sail Aug 3, 2022, 2:44 PM

#

glad it worked

steady basalt Aug 3, 2022, 2:45 PM

#

sklearn doesnt take strings

#

so how to deal with categoricals?

#

if u arent meant to encode for random forest

ornate shard Aug 3, 2022, 2:48 PM

#

#

how about this file how to read from a specifc line because i don't want the data definitions ?

wooden sail Aug 3, 2022, 2:50 PM

#

there's another parameter called skiprows, which counts from 0. you want to skip lines 0 to 8, which is 9 lines, and an integer skiprows is the number of lines to skip from the top. how about trying pandas.read_csv(file_name, sep = '|', skiprows = 9)

steady basalt Aug 3, 2022, 2:50 PM

#

@desert oar i think it was you talking about this

ornate shard Aug 3, 2022, 2:54 PM

#

wooden sail there's another parameter called skiprows, which counts from 0. you want to skip...

thank you sooooo much ❤️

wooden sail Aug 3, 2022, 2:56 PM

#

for further reference, you can check here if you like reading the docs https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html i've never used pandas myself, so i'm just digesting the docs for you 😛

lapis sequoia Aug 3, 2022, 2:57 PM

#

ornate shard

besides skiprows, there is a parameter called comment that skips some lines depending on how they start. in this case it would be comment="#"
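
A small sketch comparing the two options on an in-memory file. The file contents are made up (the real file was only shared as a screenshot), so the number of definition lines and the `#` prefix are assumptions.

```python
import io
import pandas as pd

# Two definition lines, then a bar-separated header and two data rows.
raw = "# data definitions\n# a: first column\na|b\n1|2\n3|4\n"

# Option 1: skip a fixed number of lines from the top of the file.
by_count = pd.read_csv(io.StringIO(raw), sep="|", skiprows=2)

# Option 2: ignore every line that starts with the comment character.
by_comment = pd.read_csv(io.StringIO(raw), sep="|", comment="#")
```

`comment` is the more forgiving choice when the number of definition lines can change between files, as long as they all share the same prefix.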

ornate shard Aug 3, 2022, 2:58 PM

#

lapis sequoia besides skiprows, there is a parameter called comment that skips some lines depe...

thank youu soo much brother ❤️

steady basalt Aug 3, 2022, 3:40 PM

#

Cannot index with multidimensional key

#

anyone know why this suddenly started happening when it was working earlier

#

sns.boxplot(x=combined_train.loc[combined_train['stroke'] == 1, 'bp'])

naive turret Aug 3, 2022, 4:14 PM

#

Need help

#

being able to combine tables

#

#

I've so far used pandas to try and clean them

#

I want to try and restructure the second table to look like the first

steady basalt Aug 3, 2022, 4:19 PM

#

anyone know why keras validation auc is always 0

#

but binary cross entropy seems to be ok

#

oh you have to type a function and not the string

#

no, ok that didnt fix it

#

```
262/262 [==============================] - 2s 9ms/step - loss: 0.2052 - auc: 0.9507 - val_loss: 0.2516 - val_auc: 0.0000e+00
```

#

how is that possible

misty flint Aug 3, 2022, 4:27 PM

#

hahaha this is so accurate

#

kekHands

lapis sequoia Aug 3, 2022, 4:29 PM

#

naive turret I want to try and restructure the second table to look like the first

use pandas melt function

steady basalt Aug 3, 2022, 4:32 PM

#

@misty flint any idea why my neural network always gets 0.0 auc

#

for validation

misty flint Aug 3, 2022, 4:33 PM

#

hmm i have a note here that says you dont really listen to what i have to say

#

so...

#

Oopsies

steady basalt Aug 3, 2022, 4:34 PM

#

but it makes 0 sense that training auc is 0.7 while val auc is 0

naive turret Aug 3, 2022, 4:45 PM

#

lapis sequoia use pandas melt function

I'm trying to wrap my head around it

#

Because the years arent headers wouldnt it merge and say "year" repeatedly

steady basalt Aug 3, 2022, 4:53 PM

#

somehow the neural network only predicts one class

#

even tho i did undersample

midnight rain Aug 3, 2022, 5:24 PM

#

Has anyone ever thought about implementing a feature store by using the entity component system (ECS) architecture?

#

I've been thinking about it a ton over the last few weeks and it seems like it'd be super nice

misty flint Aug 3, 2022, 5:31 PM

#

midnight rain Has anyone ever thought about implementing a feature store by using the entity c...

thats a very interesting idea

#

PikaThink

#

there are also a lot of out-of-the-box feature stores out there too

midnight rain Aug 3, 2022, 5:40 PM

#

misty flint there are also a lot of out-of-the-box feature stores out there too

Yeah for sure

#

I've always wanted to make one as a SAAS product though

misty flint Aug 3, 2022, 5:46 PM

#

ml startup when

#

lemon_fingerguns_shades

#

+1 for a very good UI

#

if you do end up doing something

iron basalt Aug 3, 2022, 6:13 PM

#

midnight rain Has anyone ever thought about implementing a feature store by using the entity c...

A (relational) database / datastore? ECS works well for games that want something like a database, you can even implement an ECS by using one (pretty much a subset of what a database does).

serene scaffold Aug 3, 2022, 6:14 PM

#

naive turret

you need to split Month and Year into separate columns, and then do pivot_table

#

oh, you want to go the other direction. well, you can also do that with pivot_table

#

!docs pandas.pivot_table

arctic wedgeBOT Aug 3, 2022, 6:15 PM

#

pandas.pivot_table

```py
pandas.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False, sort=True)
```
Create a spreadsheet-style pivot table as a DataFrame.

The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.
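
A hedged sketch of both directions on a toy table. The actual tables were only posted as screenshots, so the column names `Month`, `Year` and `Value` and all numbers are assumptions. `melt` goes from one column per year to one row per (month, year) pair, and `pivot_table` goes back.

```python
import pandas as pd

# Wide layout: one column per year (illustrative values only).
wide = pd.DataFrame({
    "Month": ["Jan", "Feb"],
    "2020": [10, 20],
    "2021": [30, 40],
})

# Wide -> long: the year columns become values of a single "Year" column.
long = wide.melt(id_vars="Month", var_name="Year", value_name="Value")

# Long -> wide: spread the "Year" values back out into columns.
back = long.pivot_table(index="Month", columns="Year", values="Value", aggfunc="sum")
```

In the long layout the year does repeat on every row; that repetition is what makes the table easy to filter, group and join.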

midnight rain Aug 3, 2022, 6:18 PM

#

iron basalt A (relational) database / datastore? ECS works well for games that want somethin...

Well with a good feature store you want to be able to define whether a feature is computed when stored or at query time right? And you want graph dependencies on those as well. I'm curious if you could take the ECS architecture and get better performance and easy maintenance on a feature store/service

#

Maybe you could use an in memory ECS setup to cache the "hot" features and allow a regular relational/graph db to handle colder features if they aren't in the cache.

iron basalt Aug 3, 2022, 6:22 PM

#

midnight rain Well with a good feature store you want to be able to define whether a feature i...

Databases already have caching and especially relational databases have pretty much always been ECS-like. They are already optimized (although still on-going as many databases adapted to changes in hardware such as SSDs, bigger caches / memory speed being the bottleneck, etc). Relational databases are not always the preferred choice, but if configured correctly they are pretty much already following the data-oriented-design principle.

#

A more specific datastore / database for a specific task, will as with most things, probably be faster.

#

So if you have some niche to hit, it could be worth it.

midnight rain Aug 3, 2022, 6:23 PM

#

iron basalt Databases already have caching and especially relational databases have pretty m...

I think the most beneficial setup for an ECS + feature store would be allowing for an easy to use query + dsl setup to maintain the dependency graphs and query for mixtures of systems and component features

iron basalt Aug 3, 2022, 6:24 PM

#

midnight rain I think the most beneficial setup for an ECS + feature store would be allowing f...

Yeah maybe, if you want to make it go for it, and see how it works out. I am very much for reinventing the wheel. Even if it ends up not being a better wheel you will have learned a lot.

midnight rain Aug 3, 2022, 6:25 PM

#

I think it would be pretty cool to build a feature store/service on top of apache arrow + arrow flight + parquet

iron basalt Aug 3, 2022, 6:26 PM

#

midnight rain I think it would be pretty cool to build a feature store/service on top of apach...

Yeah apache arrow or some other standard.

#

(As long as it does not hold back your design)

midnight rain Aug 3, 2022, 6:27 PM

#

iron basalt Yeah apache arrow or some other standard.

think arrow would be a nice base because you get immediate support for a super efficient storage format, get high performance vectorized compute (for basic operations at least), and you get a data transport protocol

iron basalt Aug 3, 2022, 6:29 PM

#

midnight rain think arrow would be a nice base because you get immediate support for a super e...

The format is a pretty straight forward generic binary format. The speed comes from it just not being a mess so that you can write reasonably fast code. It does not prevent you from being fast I guess would be the way to put it.

#

More importantly, if multiple programs support it you get all the interop, but not with some janky format.

midnight rain Aug 3, 2022, 6:30 PM

#

well apache arrow flight is built on top of gRPC + IPC

iron basalt Aug 3, 2022, 6:30 PM

#

(So if someone wrote some fast parallel query stuff for it, you can just use it)

iron basalt Aug 3, 2022, 6:32 PM

#

midnight rain well apache arrow flight is built on top of gRPC + IPC

IPC via sockets on the same machine is silly, but if you plan on actually having it communicate with processes on different machines then it's fine.

steady basalt Aug 3, 2022, 7:24 PM

#

anyone know why xgb over predicts class 1 while random forest over predicts class 0

modest onyx Aug 3, 2022, 7:43 PM

#

does it?

agile cobalt Aug 3, 2022, 7:53 PM

#

probably something related to the way their data is distributed or just a coincidence, I highly doubt that this holds true in general for these model types

rancid kelp Aug 3, 2022, 7:54 PM

#

hey I have a dataset where some of the variables are numerical and are very skewed. Would you recommend Transformation(log/normal) OR Discretization? or mix of both?
Thanks :))

steady basalt Aug 3, 2022, 7:57 PM

#

well it seems to have improved a lot simply by just dropping all na values rather than finding the most efficient way to impute, this way it isnt guessing a single class for every guess

#

now to somehow improve precision, i dont know when theres only 500 of each class in a binary prediction, originally had 500k rows

#

perhaps a good way to impute would be the mid-way between class means in the training set, but ive never done that before.

dusty valve Aug 3, 2022, 9:14 PM

#

can someone recommend a tensorflow tutorial (not anything specific, i just wanna learn everything about it)

tropic matrix Aug 3, 2022, 9:31 PM

#

when tuning hyperparameters in keras_tuner, what is the difference between hyperband and bayesian optimization? and which one should i be using to find good hyperparameters the fastest?

dusty valve Aug 4, 2022, 2:40 AM

#

here comes the fun part, so, what layers should i use for my text gen tensorflow Sequential model

tame grail Aug 4, 2022, 3:00 AM

#

what does the code under #normalize data do

dusty valve Aug 4, 2022, 3:15 AM

#

what am i doing wrong?

lapis sequoia Aug 4, 2022, 4:22 AM

#

dusty valve what am i doing wrong?

Share whole thing.

lapis sequoia Aug 4, 2022, 4:23 AM

#

tame grail what does the code under `#normalize data` do

Well it is normalizing both xtrain and xtest. So you first fit on xtrain and transform it. Then, using the same fitted transformer, you transform xtest as well.
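
The code in question was only shared as a screenshot, so this is a sketch of the general pattern rather than that exact code: the scaling statistics come from the training data alone and are then reused for the test data. It is written with plain numpy and made-up values; sklearn's `StandardScaler` (`fit_transform` on train, `transform` on test) does the same thing.

```python
import numpy as np

# Toy feature matrices (illustrative values only).
X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[2.0], [4.0]])

# "fit": learn the per-column mean and standard deviation from train only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

# "transform": apply the same statistics to both splits.
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std
```

The scaled training data has mean 0 and standard deviation 1; the test data generally will not, and that is expected.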

tropic matrix Aug 4, 2022, 5:00 AM

#

I am trying to optimize 3 hyperparameters: number of layers, number of neurons per layer, and the learning rate. Would you like to know the ranges of these values?

rapid cedar Aug 4, 2022, 6:23 AM

#

should i learn tensorflow b4 starting ml?

modest onyx Aug 4, 2022, 7:08 AM

#

no it's the other way around

#

I don't recommend tensorflow though, learn pytorch

lapis sequoia Aug 4, 2022, 8:34 AM

#

hey, so im using pytorch. Would there be any problem down the road if weights.grad from my model's layers produce gradients tensors but weights.grad_fn returns None

rapid cedar Aug 4, 2022, 8:39 AM

#

sooo, study math first?

silver ibex Aug 4, 2022, 8:48 AM

#

please can someone help me, i really dont know where or what im doing wrong 😦

modest onyx Aug 4, 2022, 9:07 AM

#

rapid cedar sooo, study math first?

yes basically

#

you'd be surprised how much of ml is math in comparison to coding

#

though if you're already pretty good at math then you can dive in and study the more complex math stuff in parallel with learning the coding aspect of it

#

at least that's what I did

silver ibex Aug 4, 2022, 9:20 AM

#

is this what you mean?

steady basalt Aug 4, 2022, 9:44 AM

#

anyone know why my accuracy doesnt change in my neural network

#

but loss does

modest onyx Aug 4, 2022, 9:53 AM

#

could be because your loss function isn't strongly correlated with accuracy

serene scaffold Aug 4, 2022, 10:49 AM

#

silver ibex please can someone help me, i really dont know where or what im doing wrong 😦

it looks like row['shares'] is None, but in data science python, you want missing values to be represented as NaN. A number times NaN will just give you another NaN, instead of erroring.
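
A short sketch of the difference, with made-up values (the original code was a screenshot, so the names `price` and `shares` are assumptions): multiplying by `None` raises, multiplying by NaN quietly propagates the missing value.

```python
import numpy as np
import pandas as pd

price = 10.0

# None is not a number, so arithmetic with it raises TypeError.
try:
    price * None
    raised = False
except TypeError:
    raised = True

# NaN is a float, so arithmetic works and the result is NaN again.
nan_result = price * np.nan

# pandas converts None to NaN when building a float column.
shares = pd.Series([3, None, 5], dtype="float64")
value = shares * price
```

Rows with a missing `shares` value end up with a missing `value`, which can then be handled with `dropna` or `fillna`.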

#

By the way, please don't ask people to read screenshots of text. actual text is easier for everyone.

#

!code

arctic wedgeBOT Aug 4, 2022, 10:49 AM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

silver ibex Aug 4, 2022, 10:50 AM

#

serene scaffold !code

sorry, thank you for your help

steady basalt Aug 4, 2022, 11:16 AM

#

modest onyx could be because your loss function isn't strongly correlated with accuracy

hm?

#

binary cross entropy?

mild dirge Aug 4, 2022, 12:16 PM

#

Any good book on linear algebra someone would recommend? If it is specific to machine learning that would be a bonus

#

Read through some books about general mathematics for ml, but it did not go deep enough into linear algebra

#

Also started on the book by bishop but it seemed kinda dense, and it is also quite old, so the chapter on NN is probably a bit deprecated

grim coral Aug 4, 2022, 12:29 PM

#

hi

wooden sail Aug 4, 2022, 12:33 PM

#

mild dirge Read through some books about general mathematics for ml, but it did not go deep...

gilbert strang's linalg is good

#

axler's linear algebra done right is great, but it's a lot more abstract. i don't think it even deals with matrices until rather late on, focusing on abstract vector spaces and linear transformations first

#

e.g. integration, differential operators, operations on polynomials and the like

mild dirge Aug 4, 2022, 12:52 PM

#

wooden sail gilbert strang's linalg is good

I'll try the gilbert one, think I already downloaded it but haven't started on it yet

wooden sail Aug 4, 2022, 12:57 PM

#

was there anything in particular you wanted to reinforce? the main point in strang's linalg is his so-called fundamental theorem of linalg, which relates the "4 fundamental subspaces" associated with matrices as linear maps

mild dirge Aug 4, 2022, 1:01 PM

#

I just feel in general that I don't have a good grip on the mathematics behind some of the machine learning methods. So I want to get a good understanding of that before I move on to getting a better understanding of more complicated deep learning concepts.

wooden sail Aug 4, 2022, 1:02 PM

#

hmmm it's very likely that linalg alone won't be enough to make you feel comfortable with ML methods. it's a great place to start though

mild dirge Aug 4, 2022, 1:03 PM

#

I am also reading some stuff on probability and statistics

wooden sail Aug 4, 2022, 1:03 PM

#

so, linalg and stats help you in defining the problems you want to solve, but not in solving them 😛

#

that's still a separate topic that comes after

mild dirge Aug 4, 2022, 1:04 PM

#

Yeah sure

#

Imo my uni just did not spend enough time on the basics, and just started to move onto topics like deep learning and cnns without really explaining why that stuff works

wooden sail Aug 4, 2022, 1:05 PM

#

are you in bsc? engineering?

mild dirge Aug 4, 2022, 1:05 PM

#

Master AI

wooden sail Aug 4, 2022, 1:05 PM

#

huh, then that's kinda bad

mild dirge Aug 4, 2022, 1:05 PM

#

But it's a bit of a jack of all trades, we can choose a lot of our courses

#

But I tend to choose more of the computer vision and mathematics type stuff

#

But there is also cognitive ergonomics (like we had to design a UI for a satnav f.e.) which I don't care about a lot

#

In hindsight I probably should have chosen ML master, but I feel like I can supplement most of the stuff with self-study

wooden sail Aug 4, 2022, 1:08 PM

#

that's certainly the goal of masters programs, to prepare you to be able to teach yourself what you need

severe oriole Aug 4, 2022, 1:08 PM

#

guys I just want to ask but is Kaggle a good place to study all about data science and practice competitions in it for experience?

wooden sail Aug 4, 2022, 1:09 PM

#

kaggle is pretty good, sure

#

it won't teach you everything, but it's a nice complement to your studies

mild dirge Aug 4, 2022, 1:09 PM

#

wooden sail that's certainly the goal of masters programs, to prepare you to be able to teac...

Well in the end I'll at least have some relevant masters degree

severe oriole Aug 4, 2022, 1:09 PM

#

I'm still pretty much a beginner in this field right now tbh

mild dirge Aug 4, 2022, 1:09 PM

#

And hopefully enough knowledge to just gain some practical exp

severe oriole Aug 4, 2022, 1:09 PM

#

and what should I go for next after Kaggle then

mild dirge Aug 4, 2022, 1:10 PM

#

You could maybe even join a team for competitions @severe oriole

wooden sail Aug 4, 2022, 1:10 PM

#

kaggle is nice due to the challenges and availability of data, but you should learn some theory that you can throw at them

severe oriole Aug 4, 2022, 1:10 PM

#

should it be practicing with Tableau for data visualization

severe oriole Aug 4, 2022, 1:12 PM

#

mild dirge You could maybe even join a team for competitions @severe oriole

Hmm that sounds interesting. I'd love to try after I master all basics first then

#

And the theory you mean like statistical theory basics right

mild dirge Aug 4, 2022, 1:13 PM

#

Yeah or just different types of machine learning models

severe oriole Aug 4, 2022, 1:14 PM

#

Thanks a lot man. Let me search more before I come back here and ask again

#

Right now I just know 3 models from the intro: decision tree, random forest, and the validation one

#

Lemme find more abt this

mild dirge Aug 4, 2022, 1:15 PM

#

Those 2 could be used on tabular data (like just numbers)

#

But you would also want to learn about multi-layer perceptrons, and stuff like CNNs (convolutional neural networks) for image data

severe oriole Aug 4, 2022, 1:17 PM

#

Interesting. That's something new there for me

#

Now I'm even hyped more to finish this kaggle thing before diving into that xd

mild dirge Aug 4, 2022, 1:19 PM

#

Well I would think this would come before trying to classify a lot of kaggle datasets

#

You would use these models on the data

severe oriole Aug 4, 2022, 1:20 PM

#

Oh then it's better if I know them first right

#

Since they're models and knowing to use them helps me when I go and try those kaggle datasets is what you mean?

mild dirge Aug 4, 2022, 1:22 PM

#

Yes, exactly

#

You could use kaggle datasets to try them out while you are learning about them though

severe oriole Aug 4, 2022, 1:23 PM

#

Noted. Let me try monkeying with data around to see about that

#

I'll keep the update here then

desert oar Aug 4, 2022, 2:36 PM

#

wooden sail axler's linear algebra done right is great, but it's a lot more abstract. i don'...

axler is definitely good for a "2nd course in linear algebra" after you've gone through a "1st course" that builds strong intuition in R^n like strang's

desert oar Aug 4, 2022, 2:37 PM

#

steady basalt

i think it would help a lot if you showed your actual model-fitting and evaluation code

#

personally i've never seen output this pathological even on egregiously unbalanced datasets. to the point where i'd sooner suspect a bug in your code than a problem in your procedures

livid goblet Aug 4, 2022, 3:03 PM

#

Where do I start learning AI ? any beginner friendly resources ?

desert oar Aug 4, 2022, 3:06 PM

#

livid goblet Where do I start learning AI ? any beginner friendly resources ?

fast.ai is good (even if you're not a beginner, it's a great catch-up to industry-standard ai/ml), but i'm not sure what the pre-requisites are

tropic matrix Aug 4, 2022, 3:08 PM

#

Alright, thank you!

desert oar Aug 4, 2022, 3:11 PM

#

@tropic matrix @lapis sequoia i've also had good results w/ "halving search" (which is available in scikit-learn) for such problems. for black-box optimization though, i suggest the Optuna library, and i suggest comparing both techniques.
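
A rough sketch of the idea behind halving search, for illustration only. This is not scikit-learn's implementation (there it lives in `HalvingGridSearchCV` / `HalvingRandomSearchCV`), and `toy_score` is a made-up, noise-free objective standing in for a real validation score.

```python
import math

def successive_halving(candidates, evaluate, min_budget=1, factor=2):
    """Each round: score all survivors, keep the best 1/factor, grow the budget."""
    survivors = list(candidates)
    budget = min_budget
    while len(survivors) > 1:
        scored = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = scored[:max(1, len(scored) // factor)]
        budget *= factor
    return survivors[0]

def toy_score(learning_rate, budget):
    # Hypothetical objective: best at learning_rate = 0.1. A real score would be
    # a validation metric from training with `budget` epochs or samples.
    return -abs(math.log10(learning_rate) + 1.0) * budget / (budget + 1)

learning_rates = [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
best = successive_halving(learning_rates, toy_score)
print(best)
```

Cheap early rounds discard most candidates, so the expensive large-budget evaluations are only spent on the few that survive.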

tropic matrix Aug 4, 2022, 3:13 PM

#

tropic matrix Alright, thank you!

One more question, is it effective to tune hyperparameters on just a portion of the dataset (if the dataset is large and can take hours per epoch)?

#

@desert oar

desert oar Aug 4, 2022, 3:14 PM

#

tropic matrix One more question, is it effective to tune hyperparameters on just a portion of ...

it can be, if you are confident that the portion of the dataset is representative of the full dataset, and that you aren't introducing excessive sparsity in the features or labels
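
One way to keep a tuning subset representative of the labels is to subsample each class separately. A minimal sketch, assuming a plain label array; the 3.6% rare-class share is an illustrative number and `stratified_subsample` is a hypothetical helper, not a library function.

```python
import numpy as np

def stratified_subsample(labels, fraction, seed=0):
    """Return sorted indices of a subsample that keeps each class's share."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    chosen = []
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)
        n_keep = max(1, int(round(fraction * len(cls_idx))))
        chosen.append(rng.choice(cls_idx, size=n_keep, replace=False))
    return np.sort(np.concatenate(chosen))

# Illustrative labels: 36 rare-class rows out of 1000 (3.6%).
labels = np.array([1] * 36 + [0] * 964)
subset = stratified_subsample(labels, fraction=0.25)
print(len(subset), labels.mean(), labels[subset].mean())
```

This only protects the label distribution; rare feature values can still vanish from the subset and need a separate check.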

tropic matrix Aug 4, 2022, 3:15 PM

#

Alright, ig i'll see if it ends up becoming effective

desert oar Aug 4, 2022, 3:15 PM

#

it's a very good technique for testing that your models actually work though

livid goblet Aug 4, 2022, 3:35 PM

#

desert oar fast.ai is good (even if you're not a beginner, it's a great catch-up to industr...

thank you for the link i'll check it out now . I wanna get into ml and ai . They are in general the same thing right ?

desert oar Aug 4, 2022, 3:36 PM

#

livid goblet thank you for the link i'll check it out now . I wanna get into ml and ai . They...

kind of. "ai" in practice is more like "ml but with fancy window dressing". real "ai" is still something that mostly happens in university labs.

#

there is a philosophical argument to be made that ml is ai, but it's important to distinguish between the appearance of intelligence and actually having intelligence.

serene scaffold Aug 4, 2022, 3:45 PM

#

desert oar kind of. "ai" in practice is more like "ml but with fancy window dressing". real...

we had a big meeting in my division yesterday, and the AI director for my company said "no one can agree on what AI is, except that it's what you can't currently do"

lapis sequoia Aug 4, 2022, 4:17 PM

#

livid goblet Where do I start learning AI ? any beginner friendly resources ?

I would recommend also Andrew Ng's machine learning course on Coursera, he starts from the basics to more advanced concepts.

iron basalt Aug 4, 2022, 4:18 PM

#

desert oar kind of. "ai" in practice is more like "ml but with fancy window dressing". real...

What makes this very confusing is that they used to be considered the same thing (ML/AI). But then a rift formed between the two as there was a large push against statistical/probabilistic methods from the established symbolic AI community which was gate-keeping (they were considered the authority on AI at the time). This caused the "AI Winter" to happen because symbolic AI was going nowhere in terms of progress and all funding / effort to probabilistic methods and especially neural networks had been cut (at the universities in general, there still was much progress being made but it had no recognition). This went on until the performance of neural networks was too great to ignore, especially after CNNs completely blew all previous computer vision out of the water.

desert oar Aug 4, 2022, 4:20 PM

#

iron basalt What makes this very confusing is that they used to be considered the same thing...

very good points. i think what's happened nowadays is that a distinction is made between "AGI or something close to it" and "automating tasks via computer w/ the user-facing perception of human intelligence"

iron basalt Aug 4, 2022, 4:20 PM

#

Now I like to use ML as a term to avoid debate over what AI is and instead just work on something productive.

#

Because everyone has an opinion on what AI is. You can see the difference if you for example look what you can find on the AI subreddit vs the ML subreddit or anywhere else.

#

(One is productive in majority, the other would be productive in majority if it was not always spammed with debate that leads nowhere)

desert oar Aug 4, 2022, 4:22 PM

#

i think the other problem is marketing and media hype

#

really complicates terminology and makes me want to avoid calling anything "ai" as much as possible

iron basalt Aug 4, 2022, 4:24 PM

#

Yeah, it has caused everyone to have an opinion on it by muddying the definition to make it accessible to all for opinions. You will hear an opinion from everyone on things that are easy to have an opinion on. I don't see everyone having an opinion on the merits of batched gradient descent vs not.

desert oar Aug 4, 2022, 4:26 PM

#

right, i will continue to let the "technologists" emptily debate on medium.com, while i go solve business problems

iron basalt Aug 4, 2022, 4:29 PM

#

It's fine to do such debate, but at some point I just choose what matters to me (what do I find interesting / productive) and go do some work.

ornate shard Aug 4, 2022, 4:37 PM

#

hello guys anyone available to help me with my project in a voice channel for just 5 mins ?

wooden sail Aug 4, 2022, 4:41 PM

#

desert oar right, i will continue to let the "technologists" emptily debate on medium.com, ...

i'm gonna let you solve business problems while i do research no one will ever read or care about 😌

iron basalt Aug 4, 2022, 4:54 PM

#

wooden sail i'm gonna let you solve business problems while i do research no one will ever r...

You're both being productive IMO (research or business).

#

👍

serene steeple Aug 4, 2022, 7:05 PM

#

guys i downloaded a compressed data in a "zst" format, do i need to extract the file in it or is supposed to be just the compressed zst file ?

quaint leaf Aug 4, 2022, 7:18 PM

#

serene steeple guys i downloaded a compressed data in a "zst" format, do i need to extract the ...

from what I read you cannot open it directly, so yeah you need to extract to reach the files

modest onyx Aug 4, 2022, 7:56 PM

#

steady basalt hm?

Well we can't really help you with the information that you gave us. Could be a bug in your implementation

mint palm Aug 4, 2022, 8:50 PM

#

what does it mean to say "anomaly detection methods are quite sparse"?
does it mean there are fewer relevant features compared to useless ones?

unique flame Aug 4, 2022, 9:55 PM

#

I would interpret it as the algorithms used. Like K-means clustering,

#

k nearest neighbour, hierarchical clustering

mint palm Aug 4, 2022, 10:07 PM

#

unique flame I would interpret it as the algorithms used. Like K-means clustering,

you mean diverse in feature?

#

^"sort of"

unique flame Aug 4, 2022, 10:11 PM

#

no as in I can't think of any more methods

mint palm Aug 4, 2022, 10:16 PM

#

sparse meant taking only highly pronounced features, thats what i wanted to know

vale solstice Aug 5, 2022, 1:27 AM

#

Has anyone done Andrew Ngs Machine Learning Specialization that just came out?

last peak Aug 5, 2022, 2:18 AM

#

im working on it

modest onyx Aug 5, 2022, 2:30 AM

#

I did the classic one a while ago

dusty valve Aug 5, 2022, 4:32 AM

#

it's beautiful

#

holy shit it just hit 100% accuracy

#

eeeeeee

#

it actually hit 100%, but it went down after

#

screw it, my 48% accuracy text model did better

#

it produced actual words, although they did not make sense

#

i want to string the python

#

maybe i just need a larger dataset

#

im just gonna train it on my all my text messages

modest onyx Aug 5, 2022, 5:13 AM

#

is this an lstm?

lapis sequoia Aug 5, 2022, 5:26 AM

#

is anyone good with tensorflow image classification btw
im having a hard time learning it

dusty valve Aug 5, 2022, 5:26 AM

#

modest onyx is this an lstm?

Yes

lapis sequoia Aug 5, 2022, 5:26 AM

#

dusty valve it actually hit 100%, but it went down after

thats tensorflow

#

right?

dusty valve Aug 5, 2022, 5:26 AM

#

Yes

modest onyx Aug 5, 2022, 5:26 AM

#

damn

lapis sequoia Aug 5, 2022, 5:26 AM

#

finally

#

can u help me :d

modest onyx Aug 5, 2022, 5:27 AM

#

I tried building my own rnn and lstm and they won't budge

dusty valve Aug 5, 2022, 5:27 AM

#

No

modest onyx Aug 5, 2022, 5:27 AM

#

not learning at all

lapis sequoia Aug 5, 2022, 5:27 AM

#

the official docs look like cancer to mee

dusty valve Aug 5, 2022, 5:27 AM

#

Same

modest onyx Aug 5, 2022, 5:27 AM

#

import torch
from torch import nn


class LSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim) -> None:
        super(LSTM, self).__init__()

        self.input_linear = nn.Linear(input_dim + hidden_dim, 4*hidden_dim)
        self.output_linear = nn.Linear(hidden_dim, input_dim)

        self.hidden_state = torch.zeros(hidden_dim)
        self.cell_state = torch.zeros(hidden_dim)

    def forward(self, input):
        
        self.hidden_state, self.cell_state = self.hidden_state.detach(), self.cell_state.detach()
        # l is the length of each gate
        l = len(self.hidden_state)

        concatenated = torch.cat((self.hidden_state, input), axis=0)
        temp = self.input_linear(concatenated)
        i, f, o, g = (
            torch.sigmoid(temp[:l]), 
            torch.sigmoid(temp[l:2*l]), 
            torch.sigmoid(temp[2*l:3*l]), 
            torch.tanh(temp[3*l:])
        )
        self.cell_state = f * self.cell_state + i * g
        self.hidden_state = o * torch.tanh(self.cell_state)

        output = self.output_linear(self.hidden_state)
        return output
    
    def reset_states(self):
        hidden_dim = len(self.hidden_state)
        self.hidden_state = torch.zeros(hidden_dim)
        self.cell_state = torch.zeros(hidden_dim)
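
One possible contributor to the "won't learn" behaviour, offered as a guess rather than a diagnosis: the class above detaches its state at the start of every `forward` call, so gradients never flow back through more than one time step. The sketch below is a hypothetical variant (`SeqLSTM` is not from the chat) that processes a whole sequence per call and detaches only the state it hands back.

```python
import torch
from torch import nn

class SeqLSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.input_linear = nn.Linear(input_dim + hidden_dim, 4 * hidden_dim)
        self.output_linear = nn.Linear(hidden_dim, input_dim)

    def forward(self, sequence, state=None):
        # sequence has shape (seq_len, input_dim); state is an optional (h, c) pair.
        if state is None:
            h = sequence.new_zeros(self.hidden_dim)
            c = sequence.new_zeros(self.hidden_dim)
        else:
            h, c = state
        outputs = []
        for x in sequence:
            gates = self.input_linear(torch.cat((h, x), dim=0))
            i, f, o, g = gates.chunk(4)
            i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
            g = torch.tanh(g)
            c = f * c + i * g          # state stays in the graph within the sequence
            h = o * torch.tanh(c)
            outputs.append(self.output_linear(h))
        # Detach only the state passed on to the next sequence.
        return torch.stack(outputs), (h.detach(), c.detach())

torch.manual_seed(0)
model = SeqLSTM(input_dim=5, hidden_dim=8)
sequence = torch.randn(6, 5)
outputs, state = model(sequence)
outputs.sum().backward()  # gradients now reach all 6 time steps
```

Whether this fixes the original problem depends on the training loop and data encoding, which were not shown.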

lapis sequoia Aug 5, 2022, 5:27 AM

#

dusty valve Same

what are you making btw

modest onyx Aug 5, 2022, 5:27 AM

#

legit won't learn

dusty valve Aug 5, 2022, 5:27 AM

#

I had to tweak with a bunch of stuff and rerun a lot

dusty valve Aug 5, 2022, 5:27 AM

#

lapis sequoia what are you making btw

Text gen

lapis sequoia Aug 5, 2022, 5:28 AM

#

ohhh

dusty valve Aug 5, 2022, 5:28 AM

#

I wanted to perfect my first

lapis sequoia Aug 5, 2022, 5:28 AM

#

I want to make hcaptcha classifier but cant

dusty valve Aug 5, 2022, 5:28 AM

#

When I get my discord messages I’m gonna train it on them

dusty valve Aug 5, 2022, 5:28 AM

#

lapis sequoia I want to make hcaptcha classifier but cant

!rule 5

arctic wedgeBOT Aug 5, 2022, 5:28 AM

#

Rules

5. Do not provide or request help on projects that may break laws, breach terms of services, or are malicious or inappropriate.

lapis sequoia Aug 5, 2022, 5:28 AM

#

its not breaking law

#

I'm doing it as a fun project

#

it will be fun to learn tensorflow while trying to make it

dusty valve Aug 5, 2022, 5:29 AM

#

That can be used to bypass captchas

lapis sequoia Aug 5, 2022, 5:31 AM

#

dusty valve That can be used to bypass captchas

I agree but thats not my intention

dusty valve Aug 5, 2022, 5:31 AM

#

That doesn’t matter

#

If it can be used maliciously we can’t help

lapis sequoia Aug 5, 2022, 5:32 AM

#

well I find that a captcha identifier will motivate me to learn AI completely But alright I'm good with the rules
lets keep that apart

I just need help with tensorflow

#

nothing seems understandable to me

modest onyx Aug 5, 2022, 5:45 AM

#

I'mma try making it more than one block deep and see how it goes

#

maybe my model's capacity is currently too low to generate text

lapis sequoia Aug 5, 2022, 5:46 AM

#

modest onyx maybe my model's capacity is currently too low to generate text

do you know tensorflow completely?

#

then would you mind helping me :d

modest onyx Aug 5, 2022, 5:47 AM

#

no I don't know tensorflow that well

lapis sequoia Aug 5, 2022, 5:47 AM

#

oh

modest onyx Aug 5, 2022, 5:47 AM

#

because I use pytorch

lapis sequoia Aug 5, 2022, 5:47 AM

#

pytorch

#

pytorch can it classify images?

modest onyx Aug 5, 2022, 5:48 AM

#

ofc

#

classifying images is now one of the simplest things to do using frameworks

lapis sequoia Aug 5, 2022, 5:49 AM

#

oh cool

modest onyx Aug 5, 2022, 6:48 AM

#

@dusty valve what encoding/decoding method did you use?

#

I did a bit of research and it seems my problem comes from a common problem known as text degeneration

#

which comes from using the naive encoding approach which I am using
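
Text degeneration is usually discussed in terms of the decoding step, so assuming the "naive approach" here means always picking the most likely next token, a common alternative is to sample with a temperature and a top-k cutoff. A minimal sketch; `sample_next_token` and the logits are hypothetical.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, rng=None):
    """Pick a token index from raw scores; temperature <= 0 means greedy."""
    logits = np.asarray(logits, dtype=float)
    if temperature <= 0:
        return int(np.argmax(logits))
    if rng is None:
        rng = np.random.default_rng()
    scaled = logits / temperature
    if top_k is not None:
        cutoff = np.sort(scaled)[-top_k]
        scaled = np.where(scaled >= cutoff, scaled, -np.inf)  # drop unlikely tokens
    scaled = scaled - scaled.max()  # numerical stability before exponentiating
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

logits = [2.0, 0.5, 1.0, -1.0]
print(sample_next_token(logits, temperature=0))  # greedy pick
print(sample_next_token(logits, temperature=0.8, top_k=2, rng=np.random.default_rng(0)))
```

Greedy decoding tends to loop on the same phrases, while sampling trades some coherence for variety.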

modest spire Aug 5, 2022, 8:05 AM

#

xD such a pretty graph

lapis sequoia Aug 5, 2022, 8:12 AM

#

is anyone willing to help me with tensorflow

unique flame Aug 5, 2022, 8:16 AM

#

For captcha classifying (and possibly bypassing)? no

lapis sequoia Aug 5, 2022, 8:17 AM

#

unique flame For captcha classifying (and possibly bypassing)? no

not for that

#

I just want to learn image classification

#

to create stuffs like traffic signal detector etc

#

so can you help me with that

modest onyx Aug 5, 2022, 8:22 AM

#

If you can't build a simple image classifier using tensorflow then you've definitely not done your homework

#

Go search a tutorial or better yet learn the underlying theory first

#

Thats if your goal is to learn

unique flame Aug 5, 2022, 9:17 AM

#

So I'm trying to make a confusion matrix for the validation set of my data, but the accuracy shown using model.evaluate() seems to be different than the accuracy shown in classification_report(), so the confusion matrix would already look weird. Anyone know what I'm doing wrong? The batch size for this specific goal was set to the amount of images in the data-set as some comments on stack overflow recommended to do that.

validation_dataset = image_dataset_from_directory(Path_to_images,
    image_size=(400, 400),
    validation_split=0.3,
    subset="validation",
    seed=2,
    batch_size=1801)

model=keras.models.load_model(Path_to_model)

#first method to get accuracy
val_loss, val_acc = model.evaluate(validation_dataset)
print(f"Validation accuracy: {val_acc:.3f}") #prints acc 93%

#second method to get accuracy also to get confusion matrix
y_true = np.concatenate([y for x, y in validation_dataset], axis=0)
y_pred = model.predict(validation_dataset).argmax(axis=1)
print(classification_report(y_true, y_pred)) #prints acc 17%
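
A likely explanation, stated as an assumption: `image_dataset_from_directory` shuffles by default (`shuffle=True`), so the pass that collects `y_true` and the pass inside `model.predict` may see the images in different orders. The NumPy toy below imitates that with a made-up `ReshufflingDataset` and a made-up perfect `toy_predict`; it is not TensorFlow code.

```python
import numpy as np

class ReshufflingDataset:
    """Toy stand-in for a batched dataset that reshuffles on every pass."""

    def __init__(self, features, labels, batch_size, seed=0):
        self.features = features
        self.labels = labels
        self.batch_size = batch_size
        self.rng = np.random.default_rng(seed)

    def __iter__(self):
        order = self.rng.permutation(len(self.labels))
        for start in range(0, len(order), self.batch_size):
            idx = order[start:start + self.batch_size]
            yield self.features[idx], self.labels[idx]

def toy_predict(batch):
    # Hypothetical "perfect" model: the class is stored in the first feature.
    return batch[:, 0].astype(int)

labels = np.arange(100) % 5
features = labels.reshape(-1, 1).astype(float)
dataset = ReshufflingDataset(features, labels, batch_size=32)

# Two passes: labels come from one shuffle order, predictions from another.
y_true_two_pass = np.concatenate([y for _, y in dataset])
y_pred_two_pass = np.concatenate([toy_predict(x) for x, _ in dataset])
two_pass_accuracy = float((y_true_two_pass == y_pred_two_pass).mean())

# One pass: labels and predictions are taken from the same batches.
y_true, y_pred = [], []
for x, y in dataset:
    y_true.append(y)
    y_pred.append(toy_predict(x))
one_pass_accuracy = float((np.concatenate(y_true) == np.concatenate(y_pred)).mean())

print(two_pass_accuracy, one_pass_accuracy)
```

In the real code, the equivalent fixes would be passing `shuffle=False` when building the validation dataset, or collecting labels and predictions batch by batch in a single loop.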

iron tusk Aug 5, 2022, 9:54 AM

#

Hello smart Python people! I have some problems with the MediaPipe realtime pose tracker. So I can get the landmarks and everything, but the data is very noisy and practically unusable. Does anybody know any way of smoothing realtime landmark data?
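
A simple starting point for smoothing realtime landmarks is an exponential moving average over frames. This is a generic sketch, not a MediaPipe API; `LandmarkSmoother` and the `alpha` value are assumptions to tune.

```python
import numpy as np

class LandmarkSmoother:
    """Exponential moving average over successive landmark arrays."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha  # closer to 1: less smoothing, less lag
        self.state = None

    def update(self, landmarks):
        landmarks = np.asarray(landmarks, dtype=float)
        if self.state is None:
            self.state = landmarks.copy()  # first frame passes through unchanged
        else:
            self.state = self.alpha * landmarks + (1 - self.alpha) * self.state
        return self.state

smoother = LandmarkSmoother(alpha=0.5)
for frame in ([0.0, 0.0], [2.0, 4.0], [2.0, 4.0]):
    print(smoother.update(frame))
```

The trade-off is lag: heavier smoothing (smaller `alpha`) makes fast movements trail behind the real pose.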

worthy phoenix Aug 5, 2022, 10:41 AM

#

hi, i was trying to make an image synthesis which uses clip, but i wanna run it on my cpu, i get this error:

RuntimeError: "softmax_lastdim_kernel_impl" not implemented for 'Half'

here is the code:

def convert_weights(model: nn.Module):
    """Convert applicable model parameters to fp16"""

    def _convert_weights_to_fp16(l):
        if isinstance(l, (nn.Conv1d, nn.Conv2d, nn.Linear)):
            l.weight.data = l.weight.data.half()
            if l.bias is not None:
                l.bias.data = l.bias.data.half()

        if isinstance(l, nn.MultiheadAttention):
            for attr in [*[f"{s}_proj_weight" for s in ["in", "q", "k", "v"]], "in_proj_bias", "bias_k", "bias_v"]:
                tensor = getattr(l, attr)
                if tensor is not None:
                    tensor.data = tensor.data.half()

        for name in ["text_projection", "proj"]:
            if hasattr(l, name):
                attr = getattr(l, name)
                if attr is not None:
                    attr.data = attr.data.half()

    model.apply(_convert_weights_to_fp16)

im using pytorch if thats not clear, any help would be appreciated

#

here is a similar issue: https://github.com/nerdyrodent/VQGAN-CLIP/issues/70

GitHub

Error when running in CPU mode · Issue #70 · nerdyrodent/VQGAN-CLIP

Bug I get RuntimeError: "softmax_lastdim_kernel_impl" not implemented for 'Half' when running this against my CPU. To reproduce $ python generate.py -p "A...

#

but i dont wanna comment it out tbh
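
An alternative to commenting out the fp16 conversion, sketched under the assumption that the error comes from half-precision weights being used on the CPU: cast the model back to float32 whenever the target device is the CPU. `prepare_for_device` is a hypothetical helper and the `nn.Linear` is only a stand-in for the real model.

```python
import torch
from torch import nn

def prepare_for_device(model: nn.Module, device: str) -> nn.Module:
    """Move the model to `device`, casting weights to float32 on CPU."""
    model = model.to(device)
    if torch.device(device).type == "cpu":
        model = model.float()  # undo any earlier .half() conversion
    return model

# Stand-in for a model whose weights were converted to fp16.
toy = nn.Linear(4, 2).half()
toy = prepare_for_device(toy, "cpu")
print(next(toy.parameters()).dtype)

probabilities = torch.softmax(toy(torch.randn(3, 4)), dim=-1)
```

The inputs passed to the model need to be float32 as well, otherwise a dtype mismatch error replaces the original one.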

unborn crow Aug 5, 2022, 12:36 PM

#

hey, how can i show my df after i ran through a sklearn pipeline?

dim palm Aug 5, 2022, 12:47 PM

#

unborn crow hey, how can i show my df after i ran through a sklearn pipeline?

during the pipeline ? you can't

unborn crow Aug 5, 2022, 12:49 PM

#

dim palm during the pipeline ? you can't

i would like to take a peek after preprocessing

dim palm Aug 5, 2022, 12:53 PM

#

just print it ?

dim palm Aug 5, 2022, 12:54 PM

#

unborn crow i would like to take a peek after preprocessing

where is the problem 😅
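
For peeking at the data after the preprocessing steps, one option is to slice the fitted pipeline so that it stops before the final estimator. A minimal sketch with made-up data and step names; the actual pipeline in question was not shown.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data and pipeline.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
y = np.array([0, 0, 1, 1])

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
pipe.fit(X, y)

# pipe[:-1] is a sub-pipeline of every step except the last one.
preprocessed = pipe[:-1].transform(X)
print(preprocessed)
```

The result is a NumPy array by default, so column names have to be reattached by hand when a DataFrame view is wanted.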

sterile heath Aug 5, 2022, 1:41 PM

#

@tired matrix

tired matrix Aug 5, 2022, 1:43 PM

#

Thanks

real oyster Aug 5, 2022, 1:52 PM

#

Anyone here have experience with ML in Python?

Question regarding CNN in Keras that I am building:

I have 10000 images and am training a CNN. I added data augmentation, but as soon as I did, training started taking about 10 hours, and even after 40 epochs it only reaches an efficiency of about 61%. Is there any way I can speed it up? I guess an easy fix would be to increase the epochs to 100 or so and get higher efficiency from the longer training, but that is going to take 2 full days. Am I being too impatient, or what do you think? Thank you!

I am a beginner

mild dirge Aug 5, 2022, 1:53 PM

#

Alright, so first of all, what is the model that you use, and is that "efficiency" the accuracy?

real oyster Aug 5, 2022, 1:54 PM

#

Screen_Shot_2022-08-05_at_9.54.05_AM.png

#

Yes like the accuracy is taking so long

mild dirge Aug 5, 2022, 1:54 PM

#

How many classes are there?

real oyster Aug 5, 2022, 1:54 PM

#

Like in my dataset all I have are 5000 images of "Yes it is a blowdryer" and 5000 images of "No it is not a blow dryer", 0 or 1

#

So like 2 classes?

mild dirge Aug 5, 2022, 1:55 PM

#

And your accuracy stays at 0.5?

real oyster Aug 5, 2022, 1:55 PM

#

mild dirge And your accuracy stays at 0.5?

I mean it looks like it

#

Kinda with little to no improvement

mild dirge Aug 5, 2022, 1:55 PM

#

That is as good as random guessing then

real oyster Aug 5, 2022, 1:55 PM

#

Yes

#

And this was after I added data augmentation

#

Before when I didn't have data augmentation which I heard was bad, my model was overfitting like crazy

mild dirge Aug 5, 2022, 1:56 PM

#

Alright, so it might be that something horribly goes wrong then, because it is making basically nothing from your data right now

#

What did you get before data augmentation?

real oyster Aug 5, 2022, 1:56 PM

#

mild dirge What did you get before data augmentation?

99%

#

But it was overfitting all the way to hell

mild dirge Aug 5, 2022, 1:57 PM

#

training or validation?

real oyster Aug 5, 2022, 1:57 PM

#

training

mild dirge Aug 5, 2022, 1:57 PM

#

training accuracy is only really useful to check if you are over/under-fitting

#

validation is what we care about

real oyster Aug 5, 2022, 1:57 PM

#

All I know is that the difference in val_loss and loss was like so large and the val_accuracy was staying the same

#

So it was overfitting

mild dirge Aug 5, 2022, 1:57 PM

#

Yes, but how much

#

what was your validation accuracy

real oyster Aug 5, 2022, 1:58 PM

#

82%

#

84%

mild dirge Aug 5, 2022, 1:58 PM

#

That is already quite decent

#

Better than random at least

real oyster Aug 5, 2022, 1:58 PM

#

Yes but I need it to be like 95%+

mild dirge Aug 5, 2022, 1:58 PM

#

And what does your model look like

#

And what have you tried to prevent overfitting

real oyster Aug 5, 2022, 1:58 PM

#

`model.add(Conv2D(64, kernel_size=4, activation="relu", input_shape = (256, 256, 3)))
model.add(MaxPooling2D(4,4))

model.add(Conv2D(32, kernel_size=3, activation="relu", padding="same"))
model.add(MaxPooling2D(3,3))

model.add(Flatten())
model.add(Dense(32))
model.add(Dense(train_generator.num_classes, activation='sigmoid', kernel_regularizer=tf.keras.regularizers.l2(0.0001)))`

#

This is what it is right now

#

I added data augmentation and it became the results I just showed you

#

Before this was performing at like 90%+ but was overfitting like crazy

mild dirge Aug 5, 2022, 1:59 PM

#

Why do you have sigmoid at the end, with 2 neurons?

real oyster Aug 5, 2022, 1:59 PM

#

mild dirge Why do you have sigmoid at the end, with 2 neurons?

I don't know

mild dirge Aug 5, 2022, 1:59 PM

#

Well, you chose it 😛

real oyster Aug 5, 2022, 2:00 PM

#

I asked another person and they said that since all I have is 0 and 1 make it 2 so I did

#

Like Blow dryer or no blow dryer

#

Could you help me understand

#

What would u change it to?

mild dirge Aug 5, 2022, 2:00 PM

#

Sigmoid in the final layer is mostly used when you want to check if a class is present or not

real oyster Aug 5, 2022, 2:00 PM

#

mild dirge Sigmoid in the final layer is mostly used when you want to check if a class is p...

Ok gotcha

mild dirge Aug 5, 2022, 2:00 PM

#

And each node then represents a class

real oyster Aug 5, 2022, 2:00 PM

#

mild dirge And each node then represents a class

Ok gotcha...

mild dirge Aug 5, 2022, 2:00 PM

#

So if you use sigmoid with 2 classes, you basically want to have 1 neuron in the final layer

#

which is the blow dryer

#

OR you could use softmax with 2 neurons
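
The two options above describe the same model family, which a small NumPy check can illustrate. The logit values are arbitrary examples.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logit = 1.5  # arbitrary example score for "blow dryer"

# One sigmoid neuron: a single probability for the positive class.
p_single = sigmoid(logit)

# Two softmax neurons with logits [0, logit] give the same probability.
p_pair = softmax(np.array([0.0, logit]))

# Two independent sigmoid neurons: the outputs need not sum to 1.
two_sigmoids = sigmoid(np.array([0.3, 1.5]))

print(p_single, p_pair, two_sigmoids.sum())
```

With two sigmoid outputs the network can say "both" or "neither", which is fine for multi-label problems but does not match a yes/no question.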

real oyster Aug 5, 2022, 2:01 PM

#

model.add(Dense(1, activation='sigmoid', kernel_regularizer=tf.keras.regularizers.l2(0.0001)))

#

So like that?

mild dirge Aug 5, 2022, 2:01 PM

#

Yes, but you might want to change your labels from [0 1] [1 0] to just [0] and [1]

real oyster Aug 5, 2022, 2:02 PM

#

Right now my Labels are already that

#

I have it separated in 2 folders

mild dirge Aug 5, 2022, 2:02 PM

#

I doubt this would change a lot about the accuracy, but this is the common way to do it

#

When looking at your model, it also seems that you start with a lot of filters, and you use maxpooling that decreases the size by a lot

real oyster Aug 5, 2022, 2:03 PM

#

mild dirge When looking at your model, it also seems that you start with a lot of filters, ...

Ok gotcha

mild dirge Aug 5, 2022, 2:03 PM

#

I would try only using maxpooling with 2x2, and start with less filters, and have more filters for later layers

real oyster Aug 5, 2022, 2:03 PM

#

mild dirge I would try only using maxpooling with 2x2, and start with less filters, and hav...

Ok gotcha

mild dirge Aug 5, 2022, 2:04 PM

#

As there are less small patterns, and more complex patterns typically in an image

#

And later layers can extract more complex features from smaller features

real oyster Aug 5, 2022, 2:04 PM

#

So like should I remove Conv2D too?

#

Or just change all the MaxPoolings to (2,2)

mild dirge Aug 5, 2022, 2:05 PM

#

you need convolutional layers

#

they are the ones extracting patterns

real oyster Aug 5, 2022, 2:05 PM

#

yes

mild dirge Aug 5, 2022, 2:05 PM

#

but maxpool with 3x3 removes a lot of information all at once

#

or 4x4 even
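
To see how much the pooling sizes shrink the feature maps, the spatial sizes for the posted model can be worked out by hand. The sketch assumes Keras defaults ("valid" padding unless stated, pooling stride equal to the pool size); `conv_out` and `flattened_features` are hypothetical helpers.

```python
def conv_out(size, kernel, stride=1, padding="valid"):
    """Output width of a conv or pooling layer along one spatial axis."""
    if padding == "same":
        return -(-size // stride)  # ceiling division
    return (size - kernel) // stride + 1

def flattened_features(pool1, pool2, input_size=256):
    size = conv_out(input_size, kernel=4)                # Conv2D(64, kernel_size=4)
    size = conv_out(size, kernel=pool1, stride=pool1)    # first MaxPooling2D
    size = conv_out(size, kernel=3, padding="same")      # Conv2D(32, kernel_size=3, padding="same")
    size = conv_out(size, kernel=pool2, stride=pool2)    # second MaxPooling2D
    return size * size * 32                              # Flatten over 32 filters

print(flattened_features(4, 3))  # 14112 (21 x 21 x 32)
print(flattened_features(2, 2))  # 127008 (63 x 63 x 32)
```

Gentler pooling keeps more information, but it also makes the first dense layer roughly nine times larger (127008 x 32 weights instead of 14112 x 32), which can itself encourage overfitting.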

real oyster Aug 5, 2022, 2:05 PM

#

Ok gotcha

#

what else would you recommend thank you btw

mild dirge Aug 5, 2022, 2:05 PM

#

But this is mostly just gut feeling, from trying out stuff with datasets

real oyster Aug 5, 2022, 2:05 PM

#

`model.add(Conv2D(64, kernel_size=4, activation="relu", input_shape = (256, 256, 3)))
model.add(MaxPooling2D(2,2))

model.add(Conv2D(32, kernel_size=3, activation="relu", padding="same"))
model.add(MaxPooling2D(2,2))

model.add(Flatten())
model.add(Dense(32))
model.add(Dense(train_generator.num_classes, activation='sigmoid', kernel_regularizer=tf.keras.regularizers.l2(0.0001)))`

#

changed it to 2,2 now

real oyster Aug 5, 2022, 2:06 PM

#

mild dirge But this is mostly just gut feeling, from trying out stuff with datasets

O ye like my dataset has lots of colors just saying

mild dirge Aug 5, 2022, 2:06 PM

#

So normally you want to increase the receptive field (each layer should be able to detect patterns from larger areas of the image)

real oyster Aug 5, 2022, 2:06 PM

#

mild dirge So normally you want to increase the receptive field (each layer should be able...

Ok

mild dirge Aug 5, 2022, 2:06 PM

#

And you want to increase the amount of filters further in the model

real oyster Aug 5, 2022, 2:07 PM

#

gotcha

#

so how would I do that programmatically

real oyster Aug 5, 2022, 2:07 PM

#

real oyster `model.add(Conv2D(64, kernel_size=4, activation="relu", input_shape = (256, 256,...

from this

mild dirge Aug 5, 2022, 2:07 PM

#

What do you mean?

#

Do what?

real oyster Aug 5, 2022, 2:07 PM

#

mild dirge And you want to increase the amount of filters further in the model

How would I do this

#

Like what would you change in my model

#

Can you show me programatically

#

code snippet

mild dirge Aug 5, 2022, 2:08 PM

#

The amount of filters for the conv2d layers

real oyster Aug 5, 2022, 2:08 PM

#

So increase that amount?

mild dirge Aug 5, 2022, 2:08 PM

#

They are just parameters

#

incrementally over the layers yes

#

Later layers have more filters, earlier layers less
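
As an illustration of "fewer filters early, more later", the sketch below walks a made-up layer list (16, 32, 64 filters with 2x2 pooling) and tracks the receptive field and the number of convolution parameters. The layer list is an assumption, not the model from the screenshot.

```python
# (name, kernel, stride, filters); filters is None for pooling layers.
layers = [
    ("conv1", 3, 1, 16),
    ("pool1", 2, 2, None),
    ("conv2", 3, 1, 32),
    ("pool2", 2, 2, None),
    ("conv3", 3, 1, 64),
    ("pool3", 2, 2, None),
]

def summarize(layers, in_channels=3):
    """Return (name, receptive_field, channels, parameters) for each layer."""
    receptive_field, jump, channels = 1, 1, in_channels
    rows = []
    for name, kernel, stride, filters in layers:
        receptive_field += (kernel - 1) * jump  # grows faster after each pooling
        jump *= stride
        params = 0
        if filters is not None:
            params = (kernel * kernel * channels + 1) * filters  # weights + biases
            channels = filters
        rows.append((name, receptive_field, channels, params))
    return rows

for row in summarize(layers):
    print(row)
```

Each output position of the last layer sees a 22 x 22 patch of the input, while the filter count doubles at every convolution.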

real oyster Aug 5, 2022, 2:08 PM

#

O I see

#

So like this??

Screen_Shot_2022-08-05_at_10.09.04_AM.png

#

Is 16 a good starting point?

mild dirge Aug 5, 2022, 2:09 PM

#

Yeah

#

could be

real oyster Aug 5, 2022, 2:09 PM

#

Ok gotcha thank you

#

Any other recommendations?

mild dirge Aug 5, 2022, 2:09 PM

#

There isn't a set rule, this is just common sense considering there are typically not that many small patterns, and more large patterns

real oyster Aug 5, 2022, 2:09 PM

#

O ye I am also using the Adam optimizer
tf.keras.optimizers.Adam(learning_rate=0.01)

mild dirge Aug 5, 2022, 2:10 PM

#

Yeah, could be okay

#

Just another hyper parameter you can tune

#

A lot of this is trial and error

real oyster Aug 5, 2022, 2:10 PM

#

Do you really think changing the maxpooling to (2,2) and changing the filter to that is going to make a difference?

mild dirge Aug 5, 2022, 2:10 PM

#

You should try a lot of cross validation with different parameters to see what works best

real oyster Aug 5, 2022, 2:10 PM

#

mild dirge You should try a lot of cross validation with different parameters to see what w...

How would I do that

mild dirge Aug 5, 2022, 2:10 PM

#

real oyster Do you really think changing the maxpooling to (2,2) and changing the filter to ...

It will make some difference probably yeah

#

Do you know k-fold cross validation?

real oyster Aug 5, 2022, 2:11 PM

#

mild dirge Do you know k-fold cross validation?

sadly no

mild dirge Aug 5, 2022, 2:11 PM

#

You should look it up then, you need it to find out which model gives good results

#

You can also just have training/validation/test split

#

But that means you would use the same data to train and validate on for each set of hyper parameters
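
A bare-bones sketch of k-fold cross validation, to show what it does mechanically. The nearest-centroid "model" and the two-blob data are toy assumptions standing in for a real network and dataset; `kfold_indices` and `cross_validate_mean` are hypothetical helpers.

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    order = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]

def cross_validate_mean(fit_and_score, X, y, k=5):
    scores = [fit_and_score(X[tr], y[tr], X[va], y[va])
              for tr, va in kfold_indices(len(X), k)]
    return float(np.mean(scores))

def nearest_centroid_accuracy(X_tr, y_tr, X_va, y_va):
    # Toy "model": assign each validation point to the closest class mean.
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in (0, 1)])
    dists = ((X_va[:, None, :] - centroids[None]) ** 2).sum(axis=2)
    return float((dists.argmin(axis=1) == y_va).mean())

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 1.0, (50, 2)), rng.normal(10.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
score = cross_validate_mean(nearest_centroid_accuracy, X, y, k=5)
print(score)
```

To compare hyperparameter settings, the same folds are reused for each setting and the mean scores are compared, with the test set still held back.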

real oyster Aug 5, 2022, 2:12 PM

#

Hmm ok

mild dirge Aug 5, 2022, 2:12 PM

#

In any case, make sure to not use your test set while you are still working on your model

real oyster Aug 5, 2022, 2:12 PM

#

Btw thank you

#

It is looking way better..btw I just ran it

mild dirge Aug 5, 2022, 2:13 PM

#

I would first just try it out on unaugmented data, because it seemed that gave some issues the last time

#

and when you seem to hit a wall, you can try augmentation again to decrease overfitting

#

You could also add a dropout layer between two dense layers that you have

#

And maybe add another dense layer, because you only have 2 right now

real oyster Aug 5, 2022, 2:14 PM

#

So ur saying don't use data augmentation cause its messing it up and do it without. And when it overfits add more dense layers or dropout

real oyster Aug 5, 2022, 2:14 PM

#

mild dirge And maybe add another dense layer, because you only have 2 right now

Is that bad for what I am doing right now

mild dirge Aug 5, 2022, 2:14 PM

#

No, I would just add another dense layer, because 2 dense layers isn't often enough to predict the class using the convolutional layer features

#

And you can add a dropout layer to decrease overfitting

real oyster Aug 5, 2022, 2:14 PM

#

mild dirge No, I would just add another dense layer, because 2 dense layers isn't often eno...

Ok I will do that

mild dirge Aug 5, 2022, 2:15 PM

#

A dropout layer means that some of the activations passed between two layers get randomly zeroed out during training

#

This means the model can't just base its decision on a set of features, it will need to make use of all/more of them
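A rough sketch of what a dropout layer does, written without any framework. It assumes the "inverted dropout" variant, where the surviving values are scaled up by 1 / (1 - rate) during training and the layer does nothing at inference time. The function name `dropout` and the sample activations are made up for illustration.

```python
import random


def dropout(activations, rate, training=True, rng=None):
    """Inverted dropout on a list of activations."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At inference time dropout is a no-op.
        return list(activations)
    rng = rng or random.Random()
    keep_prob = 1.0 - rate
    # Zero each unit with probability `rate`; scale survivors so the
    # expected value of each output matches the input.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


layer_output = [0.5, 1.2, 0.0, 2.0, 0.7, 0.3]  # made-up activations
print(dropout(layer_output, rate=0.2, rng=random.Random(0)))
print(dropout(layer_output, rate=0.2, training=False))
```

Because a different random subset is zeroed on every training step, the next layer cannot rely on any single feature always being present.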

real oyster Aug 5, 2022, 2:16 PM

#

Ok gotcha

#

Thank you so much for the help

mild dirge Aug 5, 2022, 2:16 PM

#

yeah nw

real oyster Aug 5, 2022, 2:17 PM

#

I will ask again. Right now I will take all ur advice and then test it all out. And then come back again. Thanks again!

#

Gonna first try without augmentation

real oyster Aug 5, 2022, 2:22 PM

#

mild dirge No, I would just add another dense layer, because 2 dense layers isn't often eno...

So I would add another one like this:

#

model.add(Dense(32))

#

But how many neurons would I have? Still 32?

wooden sail Aug 5, 2022, 2:29 PM

#

try a number between the previous layer and the next one

#

somewhere between 32 and the size of the output

real oyster Aug 5, 2022, 2:30 PM

#

Hmm ok

wooden sail Aug 5, 2022, 2:31 PM

#

that's where you're putting the dense layer, between the 32 one and the output, right?

real oyster Aug 5, 2022, 2:32 PM

#

model.add(Dense(32))
model.add(Dense(16))
model.add(Dense(train_generator.num_classes, activation='sigmoid', kernel_regularizer=tf.keras.regularizers.l2(0.0001)))

#

So like this? I added the model.add(Dense(16))

#

inbetween

wooden sail Aug 5, 2022, 2:32 PM

#

sure

real oyster Aug 5, 2022, 2:32 PM

#

Gotcha thanks

#

will update on the progress. i have a bad feeling it's going to overfit... again

mild dirge Aug 5, 2022, 2:36 PM

#

You haven't added a dropout in between, which may also help

#

And you're still using sigmoid with 2 neurons for a binary prediction problem

wooden sail Aug 5, 2022, 2:41 PM

#

ah yeah you can boil that down to a single neuron output. but i'd say it's good to change one thing at a time and see what improvements you get
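To illustrate why a single output neuron is enough for two classes: a softmax over two logits gives the same probability as one sigmoid applied to the difference of those logits. The logit values below are made up, and `softmax2` is just a helper name for this sketch.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax2(z0, z1):
    """Softmax over two logits, shifted by the max for numerical stability."""
    m = max(z0, z1)
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)


z0, z1 = 0.4, 1.9  # made-up logits for class 0 and class 1
p0, p1 = softmax2(z0, z1)

# A single sigmoid unit fed the logit difference gives the same P(class 1).
print(p1, sigmoid(z1 - z0))

# Two independent sigmoid units, by contrast, need not sum to 1.
print(sigmoid(z0) + sigmoid(z1))
```

In practice, switching to one sigmoid output would probably also mean using binary labels and a binary cross-entropy loss, which is one more reason to change it as a separate step.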

real oyster Aug 5, 2022, 2:42 PM

#

I am planning on changing it one thing at a time and then updating my progress in this chat. Thank you so much @mild dirge

#

And thank you @wooden sail for pitching in

#

Ok wait are you seeing this:

Screen_Shot_2022-08-05_at_10.45.01_AM.png

#

Again, it is starting to overfit

#

This is without data augmentation and my model looks like this:
`model.add(Conv2D(16, kernel_size=4, activation="relu", input_shape = (256, 256, 3)))
model.add(MaxPooling2D(2,2))

model.add(Conv2D(32, kernel_size=3, activation="relu", padding="same"))
model.add(MaxPooling2D(2,2))

model.add(Conv2D(64, kernel_size=3, activation="relu", padding="same"))
model.add(MaxPooling2D(2,2))

model.add(Flatten())
model.add(Dense(32))
model.add(Dense(train_generator.num_classes, activation='sigmoid', kernel_regularizer=tf.keras.regularizers.l2(0.0001)))`

After that runs I will add another dense layer and dropout

wooden sail Aug 5, 2022, 2:46 PM

#

but both the acc and val acc are increasing together, i wouldn't say that's overfitting, at least not all that bad

real oyster Aug 5, 2022, 2:46 PM

#

wooden sail but both the acc and val acc are increasing together, i wouldn't say that's over...

Ok but my graph when I graph it out looks really messed up

wooden sail Aug 5, 2022, 2:46 PM

#

can you show it?

real oyster Aug 5, 2022, 2:47 PM

#

Oh no I will show it after it runs completely

#

but I have done it before and played around and the graph does not look nice. Like the val_loss and loss difference is too great

#

eventually

#

@wooden sail the difference is getting very large

Screen_Shot_2022-08-05_at_10.54.07_AM.png

#

loss and val_loss

wooden sail Aug 5, 2022, 2:55 PM

#

well, i've certainly seen worse models lol

real oyster Aug 5, 2022, 2:55 PM

#

Ok but I want this model to be good, not just better than worse

wooden sail Aug 5, 2022, 2:55 PM

#

the loss is not normalized, so the raw number doesn't mean much to me

real oyster Aug 5, 2022, 2:55 PM

#

Are you sure?

#

@mild dirge can u look at the image too

#

Cause I was told it was overfitting

wooden sail Aug 5, 2022, 2:55 PM

#

if the loss has a dynamic range of thousands, for example, this is a tiny percentage difference

real oyster Aug 5, 2022, 2:56 PM

#

wooden sail if the loss has a dynamic range of thousands, for example, this is a tiny percen...

Not sure what you mean

mild dirge Aug 5, 2022, 2:56 PM

#

real oyster @mild dirge can u look at the image too

Edd probably knows this stuff better than I do lol

#

Why do you have the validation accuracy rounded to 1 decimal

#

Is it rounded, or floored?

real oyster Aug 5, 2022, 2:56 PM

#

It isnt

#

It is cut off into the next line

#

Look at the next line, it's 0.8272 or whatnot

mild dirge Aug 5, 2022, 2:57 PM

#

Well it is overfitting, but not by that much

real oyster Aug 5, 2022, 2:57 PM

#

But it's overfitting

#

dammit

wooden sail Aug 5, 2022, 2:58 PM

#

let it get to like 10 or 15 epochs and if you're not happy with the performance, add in the dropout that pccamel recommended

real oyster Aug 5, 2022, 2:58 PM

#

Ok so next step looking at the current model:
`model.add(Conv2D(16, kernel_size=4, activation="relu", input_shape = (256, 256, 3)))
model.add(MaxPooling2D(2,2))

model.add(Conv2D(32, kernel_size=3, activation="relu", padding="same"))
model.add(MaxPooling2D(2,2))

model.add(Conv2D(64, kernel_size=3, activation="relu", padding="same"))
model.add(MaxPooling2D(2,2))

model.add(Flatten())
model.add(Dense(32))
model.add(Dense(train_generator.num_classes, activation='sigmoid', kernel_regularizer=tf.keras.regularizers.l2(0.0001)))`

I will add dropout, another dense layer inbetween, and decrease epochs to 10 or 15?

#

Is doing the following good:

#

I will add dropout, another dense layer inbetween, and decrease epochs to 10 or 15?

wooden sail Aug 5, 2022, 2:59 PM

#

sounds ok

#

but really, validation accuracy is normalized, but loss isn't

#

looking at the loss alone is not always enough to say what the performance is like. the validation was looking ok (even though it could be better)
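One way to read the point about unnormalized loss is to look at the gap relative to the size of the loss instead of the raw difference. The helper `relative_gap` and all the numbers below are made up for illustration; the actual values from the screenshots are not available in the chat text.

```python
def relative_gap(train_loss, val_loss):
    """Gap between validation and training loss as a fraction of training loss."""
    return (val_loss - train_loss) / train_loss


# Made-up numbers: the same absolute gap of 0.2 at two different loss scales.
print(f"{relative_gap(0.30, 0.50):.1%}")      # large relative gap
print(f"{relative_gap(1500.0, 1500.2):.3%}")  # negligible relative gap
```

Accuracy is already on a 0 to 1 scale, which is why the validation accuracy is the easier number to judge at a glance.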

mild dirge Aug 5, 2022, 3:00 PM

#

Not sure if an even kernel size is useful/common

#

Also, your first flattened layer has 65,000 ish neurons

#

So that will also not be too good for overfitting I'd think

wooden sail Aug 5, 2022, 3:02 PM

#

yeah i was thinking of that as well, regarding the size of the flattened layer. maybe use like dense 2048, 512, 128, 32, 1 or something like that. with dropout before the first dense and maybe between some of the other dense layers

mild dirge Aug 5, 2022, 3:02 PM

#

That's about 2 million weights

#

Between the first and second dense layer
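The sizes quoted here can be checked by hand from the model posted above. This sketch assumes the usual Keras defaults (stride 1 and `padding="valid"` when no padding is given, and 2x2 pooling that floors odd sizes); the helper names `conv_size` and `pool_size` are made up.

```python
def conv_size(size, kernel, padding="valid"):
    """Spatial output size of a stride-1 convolution."""
    return size if padding == "same" else size - kernel + 1


def pool_size(size, pool=2):
    """Spatial output size of non-overlapping max pooling (floors odd sizes)."""
    return size // pool


size = 256
size = pool_size(conv_size(size, 4))          # 253 -> 126
size = pool_size(conv_size(size, 3, "same"))  # 126 -> 63
size = pool_size(conv_size(size, 3, "same"))  # 63 -> 31

flattened = size * size * 64     # 64 filters in the last conv layer
dense_weights = flattened * 32   # Flatten -> Dense(32), biases excluded

print(flattened)      # 61504
print(dense_weights)  # 1968128
```

So the flattened layer has about 61,500 values, and connecting it to `Dense(32)` takes just under 2 million weights, in line with the estimates in the chat.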

wooden sail Aug 5, 2022, 3:03 PM

#

as for the even kernel, it's fine, but it's not symmetric, which might introduce some unexpected or difficult to interpret behaviors

mild dirge Aug 5, 2022, 3:03 PM

#

It's probably better to add another conv+maxpool

#

Or maybe a convolutional layer with a stride higher than 1

real oyster Aug 5, 2022, 3:03 PM

#

real oyster I will add dropout, another dense layer inbetween, and decrease epochs to 10 or ...

Ok so should I implement these changes?

mild dirge Aug 5, 2022, 3:04 PM

#

The biggest problem right now is having 2 million weights between two of your dense layers

#

That would be the first thing to fix

real oyster Aug 5, 2022, 3:08 PM

#

How can I fix that

#

I know I am asking a lot but would it be ok to hop on a quick call

#

I will share my screen

#

If not that's completely ok

mild dirge Aug 5, 2022, 3:10 PM

#

mild dirge It's probably better to add another conv+maxpool

.

real oyster Aug 5, 2022, 3:11 PM

#

mild dirge .

what

mild dirge Aug 5, 2022, 3:11 PM

#

I replied to a message with a tip to reduce the number of neurons

#

I can't go on voice right now

real oyster Aug 5, 2022, 3:12 PM

#

Oh ok

wooden sail Aug 5, 2022, 3:14 PM

#

try adding another conv with max pool, but don't increase the number of filters anymore

#

that'll cut the size of the output roughly in half before going into the dense layers

#

and put a dropout there, too

mild dirge Aug 5, 2022, 3:14 PM

#

wooden sail that'll cut the size of the output roughly in half before going into the dense l...

it will be cut into a quarter

wooden sail Aug 5, 2022, 3:15 PM

#

i flattened in my head in the wrong order, oops

#

1/4 is right, yeah
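A quick check of the "quarter" figure, assuming the 31x31x64 output that the posted model produces under Keras defaults, and assuming the extra block is a `padding="same"` convolution with 64 filters followed by 2x2 max pooling:

```python
before = 31 * 31 * 64                 # flattened size after three conv + pool blocks
side_after = 31 // 2                  # the 'same' conv keeps 31, the 2x2 pool floors to 15
after = side_after * side_after * 64  # filter count kept at 64

print(before, after)   # 61504 14400
print(after / before)  # a bit under a quarter
print(after * 64)      # weights into a following Dense(64): 921600
```

Pooling halves both the height and the width, so the number of values drops by a factor of about four, not two.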

real oyster Aug 5, 2022, 3:18 PM

#

`model.add(Conv2D(16, kernel_size=4, activation="relu", input_shape = (256, 256, 3)))
model.add(MaxPooling2D(2,2))

model.add(Conv2D(32, kernel_size=3, activation="relu", padding="same"))
model.add(Dropout(0.2))
model.add(MaxPooling2D(2,2))

model.add(Conv2D(64, kernel_size=3, activation="relu", padding="same"))
model.add(Dropout(0.2))
model.add(MaxPooling2D(2,2))

model.add(Conv2D(64, kernel_size=3, activation="relu", padding="same"))
model.add(MaxPooling2D(2,2))

model.add(Flatten())
model.add(Dense(64))
model.add(Dropout(0.2))
model.add(Dense(32))
model.add(Dropout(0.2))
model.add(Dense(train_generator.num_classes, activation='sigmoid', kernel_regularizer=tf.keras.regularizers.l2(0.0001)))`

#

Ok so like this?

wooden sail Aug 5, 2022, 3:20 PM

#

i take back my 2048 recommendation, that'll make the problem monstrous lol

#

try 64 or keep it at 32 as you already had it, i had neglected the size of the conv output earlier

real oyster Aug 5, 2022, 3:22 PM

#

I updated my message

#

does that look good

wooden sail Aug 5, 2022, 3:22 PM

#

yeah

dusty valve Aug 5, 2022, 3:33 PM

#

modest onyx <@879807617260716143>what encoding/decoding method did you use?

all i did was
`vocab = ''.join(sorted(set(text)))
char2int = {c: i for i, c in enumerate(vocab)}
int2char = {i: c for i, c in enumerate(vocab)}`

#

tbh that .join was not necessary
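A small round-trip check can help narrow down encoding problems like this. The sketch below reuses the same two dictionaries; the sample `text` and the helper names `encode` and `decode` are assumptions, since the real corpus and model code are not shown in the chat.

```python
text = "hello world"  # stand-in corpus; the real text is not shown in the chat

vocab = sorted(set(text))  # a sorted list works; no need to join into a string
char2int = {c: i for i, c in enumerate(vocab)}
int2char = {i: c for i, c in enumerate(vocab)}


def encode(s):
    """Map each character to its integer id (KeyError on unseen characters)."""
    return [char2int[c] for c in s]


def decode(ids):
    """Map integer ids back to a string."""
    return "".join(int2char[i] for i in ids)


ids = encode("hello")
print(ids)
print(decode(ids))  # hello
```

If `decode(encode(text)) == text` holds, the mapping itself is fine and the problem is more likely somewhere else, for example a vocabulary built from different text than the one being encoded.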

dusty valve Aug 5, 2022, 3:52 PM

#

yes something is up with my encoding

dusky storm Aug 5, 2022, 4:17 PM

#

Can someone help me? I don't know where I went wrong. I gave the correct import file, but it's showing me a no module error!!! Any fixes?

iron tusk Aug 5, 2022, 4:24 PM

#

You should try adding a dot before Backend in line 1

hidden fern Aug 5, 2022, 4:27 PM

#

i need a pseudo code for:

if column[a] has values: X, Y, Z then create a new column with values that say “Alphabet” else impute values “not alphabet”
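One possible way to write this in pandas, as a sketch: the data is made up, the column name `a` follows the question, and the new column name `label` is an assumption.

```python
import pandas as pd

# Made-up data; the column name "a" follows the question.
df = pd.DataFrame({"a": ["X", "Q", "Z", "Y", "M"]})

# True where the value is one of X, Y, Z; False everywhere else.
is_alphabet = df["a"].isin(["X", "Y", "Z"])
df["label"] = is_alphabet.map({True: "Alphabet", False: "not alphabet"})

print(df)
```

`isin` builds the boolean mask in one vectorized step, so no explicit loop over the rows is needed.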

desert oar Aug 5, 2022, 4:59 PM

#

please don't post screenshots of your IDE. explain your problem in words and post your error output & file structure as text using a code block.
this isn't a data science question. please carefully read #❓｜how-to-get-help.

dusky storm Aug 5, 2022, 4:59 PM

#

Uhm sorry by the way !!! I will delete it

tidal bough Aug 5, 2022, 5:49 PM

#

How can I, after something like df.groupby("username"), apply a custom function to each group that collapses each group into one row?

#

I thought .groupby().apply did that, but it seems to be called on each row, not on each group.

wooden sail Aug 5, 2022, 5:51 PM

#

how about df.loc[some_condition_on_some_col, some_other_col].apply()? though i guess that requires repeating the process several times, hmm

tidal bough Aug 5, 2022, 5:52 PM

#

Not sure how that'd help - yeah, I want to collapse every group (all rows with the same username) into one row.

wooden sail Aug 5, 2022, 5:52 PM

#

you want to actually collapse it or just apply the same func to all of the ones with the same username?

tidal bough Aug 5, 2022, 5:53 PM

#

Actually collapse. So I want groupby to call my function with an entire group at a time, and it'd return one row per group.

wooden sail Aug 5, 2022, 5:55 PM

#

maybe agg? but i'm out of my depth in this one

untold bloom Aug 5, 2022, 5:58 PM

#

.apply is capable of accepting a function returning 1 row as well as .agg

#

but the thing is: there might be a better way without these if the function is not so customized

untold bloom Aug 5, 2022, 5:59 PM

#

untold bloom `.apply` is capable of accepting a function returning 1 row as well as `.agg`

e.g., df.groupby("item").agg(lambda gr: gr.iloc[0]) should reduce the dataframe to length being number of unique items, and the rows will be the first rows of each group

#

.apply will work as well, except it will retain the grouper column as a column in the result, too.

tidal bough Aug 5, 2022, 6:00 PM

#

untold bloom `.apply` is capable of accepting a function returning 1 row as well as `.agg`

The issue is that it seems to get called on each row of each group rather than each group

untold bloom Aug 5, 2022, 6:00 PM

#

need to prove it :p

tidal bough Aug 5, 2022, 6:00 PM

#

as for aggregate, hmm

untold bloom Aug 5, 2022, 6:01 PM

#

it can call your function for the first group twice to seek a fast path if possible, but no, it won't call it per row of groups; it will do for the entire group.

tidal bough Aug 5, 2022, 6:03 PM

#

With aggregate, the issue seems to be that it works separately on each column

untold bloom Aug 5, 2022, 6:04 PM

#

yes; you'd want .apply, then
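A small sketch of the `.apply` route, with made-up data and a hypothetical `summarize` function. The `calls` list records how many rows each call receives, which is one way to confirm that the function sees whole groups and not single rows.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "username": ["alice", "bob", "alice", "bob", "bob"],
    "score": [10, 3, 7, 8, 1],
    "item": ["a", "b", "c", "d", "e"],
})

calls = []  # records how many rows each call received


def summarize(group):
    """Collapse one group (a DataFrame) into a single row (a Series)."""
    calls.append(len(group))
    return pd.Series({
        "n_rows": len(group),
        "best_score": group["score"].max(),
        "items": ",".join(group["item"]),
    })


# Selecting the columns first keeps the grouping column out of each group.
result = df.groupby("username")[["score", "item"]].apply(summarize)

print(calls)   # group sizes, not one call per row
print(result)
```

Returning a `pd.Series` from the function is what turns each group into one row; the keys of that Series become the columns of the result, and the usernames become its index.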

#

afk

tidal bough Aug 5, 2022, 6:11 PM

#

this is some sort of magic, huh