#data-science-and-ml | Python | Page 390

misty flint Mar 26, 2022, 11:43 PM

#

google's crash course on GANs is pretty good

#

grave frost Mar 26, 2022, 11:46 PM

#

yeah, but "crash courses" remain what they are - a quick overview. I would discourage people from actually using them, and instead encourage deep dives in the topic

misty flint Mar 26, 2022, 11:47 PM

#

eh if you just want a quick overview then thats what they are for

#

you cant deep dive into everything

#

kekHands

#

you should deep dive into what youre actually interested in tbh

steady basalt Mar 27, 2022, 12:20 AM

#

How the hell did a pro get 0.86 accuracy. I can only manage about 0.73 after doing all the steps

#

0.86 is on training data

thin palm Mar 27, 2022, 12:55 AM

#

Hi Python gang, I have a take home assessment for this job interview and was wondering if I can get some help on your thoughts of what to look for when doing data exploration? What's everyone's top 5 things they look for when examining data? Cheers!

misty flint Mar 27, 2022, 1:01 AM

#

thin palm Hi Python gang, I have a take home assessment for this job interview and was won...

1. Errors/dirty data (i.e. things that seem erroneous or might need data cleaning for further analysis)
2. Summary statistics (mean, distribution, etc.)
3. Outliers (special instances, etc.)
4. Visualize in multiple ways (may see something unexpected)
5. Basic models (to see any relationships)
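
A rough sketch of how the first few checks in that list might look in pandas. The DataFrame and its column names (`name`, `missions`, `hours`) are made up for illustration, and the 1.5 * IQR rule is just one common outlier heuristic, not something the list prescribes. Plotting is left out.

```python
import pandas as pd

# Hypothetical toy data standing in for whatever the take-home provides.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Bob", "Cy", "Di"],
    "missions": [2, 1, 1, None, 40],
    "hours": [300.0, 120.0, 120.0, 80.0, 95.0],
})

# Errors / dirty data: missing values per column and exact duplicate rows.
missing_per_column = df.isna().sum()
duplicate_rows = int(df.duplicated().sum())

# Summary statistics for the numeric columns.
summary = df.describe()

# Outliers: flag values further than 1.5 * IQR outside the quartiles.
q1, q3 = df["missions"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["missions"] < q1 - 1.5 * iqr) | (df["missions"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

# A first look at relationships: pairwise correlations between numeric columns.
correlations = df[["missions", "hours"]].corr()
```

Printing `missing_per_column`, `summary` and `outliers` is usually enough to decide what cleaning is needed before any modelling.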

thin palm Mar 27, 2022, 1:08 AM

#

misty flint 1. Errors/dirty data (i.e. things that seem erroneous or might need data cleanin...

Man awesome! But what do you mean by basic models? For more context I'm working on astronauts data and in another file their missions information

#

So I could make a model that says how likely it was to succeed

steady basalt Mar 27, 2022, 1:27 AM

#

thin palm Hi Python gang, I have a take home assessment for this job interview and was won...

Data leak

thin palm Mar 27, 2022, 1:31 AM

#

steady basalt Data leak

uhhh Data Leak may happen if you dont take away duplicates or dont do the train split correctly

misty flint Mar 27, 2022, 1:51 AM

#

thin palm Man awesome! But what do you mean by basic models? For more context I'm working ...

anything that doesnt take too much time. if its a take home, you just want the low hanging fruit first. (i.e. youre not going to be building a neural network, but probably should look at a simple linear regression model)

#

(if the data is linear)

#

kekHands

thin palm Mar 27, 2022, 1:52 AM

#

misty flint anything that doesnt take too much time. if its a take home, you just want the l...

Man love it, thank you very much for this!
I'll do: Cleaning Data (duplicates, missing values, scaling), Data Viz, Feature engineering (encoding), feature selection (feature correlation, modeling)

misty flint Mar 27, 2022, 1:52 AM

#

best of luck bud. some of those take homes can eat up a lot of time

#

kekHands

#

🕯️

thin palm Mar 27, 2022, 1:53 AM

#

misty flint kekHands

factsssss

misty flint Mar 27, 2022, 1:53 AM

#

its good to have a standard approach

thin palm Mar 27, 2022, 1:53 AM

#

misty flint its good to have a standard approach

for sure

misty flint Mar 27, 2022, 1:53 AM

#

as that can help with this

pine flare Mar 27, 2022, 2:13 AM

#

Hey, I just joined this server, i'm 17 and wanting to get started in data science and AI, how would y'all go about doing this? Im learning matplotlib and pandas libraries right now, I only started learning python 2 and a half months ago.

serene scaffold Mar 27, 2022, 3:51 AM

#

pine flare Hey, I just joined this server, i'm 17 and wanting to get starting in data scien...

are you going to go to college/university, and if so, will you be pursuing a degree related to DS/AI?

tacit basin Mar 27, 2022, 4:49 AM

#

pine flare Hey, I just joined this server, i'm 17 and wanting to get starting in data scien...

This book is great for starters https://allendowney.github.io/ElementsOfDataScience/README.html

hollow flare Mar 27, 2022, 4:53 AM

#

serene scaffold are you going to go to college/university, and if so, will you be pursuing a deg...

For Data science, where should I start from

serene scaffold Mar 27, 2022, 4:55 AM

#

hollow flare For Data science, where should I start from

Looks like the message above yours attempts to answer the same question.

hollow flare Mar 27, 2022, 4:56 AM

#

Ok, thanks

gusty forge Mar 27, 2022, 5:32 AM

#

Hey

#

Opencv lags so much when I run a ipynb notebook. Most of the time, frame gives a not responding message

#

What do I do

mint palm Mar 27, 2022, 6:53 AM

#

#

is this good^

#

?

mint palm Mar 27, 2022, 7:14 AM

#

initially val accu is higher than train but gets better later......is it ok?

spark sonnet Mar 27, 2022, 10:00 AM

#

umm vids to learn python?

lapis sequoia Mar 27, 2022, 10:02 AM

#

spark sonnet umm vids to learn python?

!resources
please keep other non-data science questions in the help channels

arctic wedgeBOT Mar 27, 2022, 10:02 AM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

spark sonnet Mar 27, 2022, 10:02 AM

#

lapis sequoia !resources please keep other non-data science questions in the help channels

okay

steady basalt Mar 27, 2022, 10:23 AM

#

would anyone know why my model performs worse on unseen data after tuning than before?

#

initial score is say 0.7 on cv, final test scores are more like 0.68

#

cv is performed on training set

#

https://stats.stackexchange.com/questions/104713/hold-out-validation-vs-cross-validation

Cross Validated

Hold-out validation vs. cross-validation

To me, it seems that hold-out validation is useless. That is, splitting the original dataset into two-parts (training and testing) and using the testing score as a generalization measure, is somewhat

#

@mild dirge I feel as though holding back test data is making my model seem WORSE than it should be

#

especially KNN

#

dropped to 0.58

#

auc

#

altho Random forest is pretty good

#

worse still, I am getting like 0.52 for precision on target feature being the 'positive' value

#

which is really bad in cases like detecting disease

blissful fulcrum Mar 27, 2022, 10:53 AM

#

Anyone can help me in this ?

steady basalt Mar 27, 2022, 11:10 AM

#

@mild dirge another thing is the hold out testing set is using very imbalanced data, so of course precision is high for predicting the class which has like 4x more values than the other class

blissful fulcrum Mar 27, 2022, 11:21 AM

#

lapis sequoia Mar 27, 2022, 11:30 AM

#

blissful fulcrum

you could combine rank with a groupby https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rank.html
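
The original question was posted as an image that did not survive, so this is only a guess at the shape of the problem: ranking values within groups. The column names `region` and `sales` are hypothetical.

```python
import pandas as pd

# Hypothetical data: rank sales within each region.
df = pd.DataFrame({
    "region": ["east", "east", "east", "west", "west"],
    "sales": [10, 30, 20, 5, 50],
})

# groupby(...)[column].rank(...) ranks each value against the others in its group.
# ascending=False makes rank 1 the largest value; method="dense" leaves no gaps on ties.
df["rank_in_region"] = df.groupby("region")["sales"].rank(ascending=False, method="dense")
```

The result keeps the original row order, so the new column lines up with the existing rows.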

blissful fulcrum Mar 27, 2022, 11:31 AM

#

Thanks @lapis sequoia got the answer 👍

#

blissful fulcrum Mar 27, 2022, 12:17 PM

#

One More Question, I have a data frame and I wanted to generate a new column for colour codes which starts from red for the least value of Opportunity and moves toward green for the highest value of Opportunity

#

#

dataframe ☝️

#

Interpolation of colors basically

steady basalt Mar 27, 2022, 12:28 PM

#

You want to colour code the opportunity column?

#

For plotting or ?

blissful fulcrum Mar 27, 2022, 12:29 PM

#

yes

#

I need color codes for using in front end of my application so i can minimize the time for rendering cause data set so big

steady basalt Mar 27, 2022, 12:31 PM

#

You mean you’re going to categorise the column into a lower number of colour code?

blissful fulcrum Mar 27, 2022, 12:32 PM

#

yes you can say like that

steady basalt Mar 27, 2022, 12:49 PM

#

Do u have the colour codes you want?

#

You need how many?

blissful fulcrum Mar 27, 2022, 12:51 PM

#

I want values between red and green depends on Opportunity

#

top least val is dark red and top large val is dark green
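
A minimal sketch of the red-to-green interpolation described above, assuming plain linear interpolation in RGB between a dark red and a dark green. The exact endpoint colours and the function name `red_to_green` are illustrative choices, not something given in the chat.

```python
def red_to_green(value, lo, hi):
    """Map value in [lo, hi] to a hex colour: lo -> dark red, hi -> dark green."""
    # Position of the value between the two ends, clamped to [0, 1].
    t = 0.5 if hi == lo else (value - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))
    dark_red, dark_green = (139, 0, 0), (0, 100, 0)
    # Interpolate each RGB channel separately.
    r, g, b = (round(a + t * (c - a)) for a, c in zip(dark_red, dark_green))
    return f"#{r:02x}{g:02x}{b:02x}"

# Hypothetical Opportunity values.
opportunity = [5, 20, 12.5]
lo, hi = min(opportunity), max(opportunity)
colours = [red_to_green(v, lo, hi) for v in opportunity]
```

With a DataFrame, the same function could be applied with something like `df["Opportunity"].map(lambda v: red_to_green(v, lo, hi))`, so the colour codes are computed once and shipped to the front end as a column.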

wicked grove Mar 27, 2022, 2:24 PM

#

Can i penalize one of the classes of my model with this function? tf.nn.softmax_cross_entropy_with_logits

potent plank Mar 27, 2022, 2:29 PM

#

hello

#

!voiceverify

indigo cove Mar 27, 2022, 3:43 PM

#

Hello

#

Anyone willing to help with coding a special K-means algorithm?

mild dirge Mar 27, 2022, 3:44 PM

#

special?

indigo cove Mar 27, 2022, 3:44 PM

#

I am trying to code this

#

But when using

#

dist = cdist(X,np.array([D[i,:]]).T,axis=1)

#

I get an error

#

ValueError: XA and XB must have the same number of columns (i.e. feature dimension.)

mild dirge Mar 27, 2022, 3:47 PM

#

XA and XB?

#

there are no XA and XB in that snippet

indigo cove Mar 27, 2022, 3:47 PM

#

It is a common error

#

when using cdist

mild dirge Mar 27, 2022, 3:48 PM

#

alright, so print the shape of X and np.array([D[i,:]]).T and check if they are what you expect

indigo cove Mar 27, 2022, 3:55 PM

#

I am just thinking are there other ways to calculate the minimum distance between data points and clusters

#

since I only find errors

#

Using different methods

mild dirge Mar 27, 2022, 4:03 PM

#

Well it seems that there is a shape mismatch between the two points

#

so it can't calculate the distance

#

Some simple distance functions are e.g. Euclidean or Manhattan
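
A sketch of computing point-to-centroid distances with plain NumPy broadcasting, which sidesteps the `cdist` call entirely. It assumes `X` has shape (n_points, n_features) and `D` has shape (n_clusters, n_features); the values are made up. If `X` really is (n, d), then `np.array([D[i,:]]).T` has shape (d, 1), which would explain the column-count error. Also, as far as I know, `cdist` does not take an `axis` argument.

```python
import numpy as np

# Hypothetical data: 4 points and 2 centroids, both with 2 features.
X = np.array([[0.0, 1.0], [1.0, 0.0], [4.0, 5.0], [6.0, 5.0]])
D = np.array([[0.0, 0.0], [5.0, 5.0]])

# Broadcasting: (4, 1, 2) - (1, 2, 2) -> (4, 2, 2), one difference vector
# for every (point, centroid) pair.
diff = X[:, None, :] - D[None, :, :]

euclidean = np.sqrt((diff ** 2).sum(axis=2))  # shape (4, 2)
manhattan = np.abs(diff).sum(axis=2)          # shape (4, 2)

nearest = euclidean.argmin(axis=1)   # index of the closest centroid per point
min_dist = euclidean.min(axis=1)     # distance to that centroid
```

Printing `X.shape` and `D.shape` first, as suggested above, is still the quickest way to confirm the arrays match this layout.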

indigo cove Mar 27, 2022, 5:11 PM

#

Anyone that can help me with some k-means coding

#

I am stuck again

serene scaffold Mar 27, 2022, 5:44 PM

#

indigo cove Anyone that can help me with some k-means coding

Try giving some more information. What code have you written so far, and what part are you stuck with?

hardy prism Mar 27, 2022, 5:46 PM

#

Long shot - anyone here use JS & PYTHON to script/automate in Microsoft Excel?

serene scaffold Mar 27, 2022, 5:46 PM

#

hardy prism Long shot - anyone here use JS & PYTHON to script/automate in Microsoft Excel?

python data science people usually use pandas for that kind of thing.

#

you can read excel data into python code with pandas, do all the transformations, and save the result (and any intermediary parts, if you want) back to excel.

steady basalt Mar 27, 2022, 5:49 PM

#

thats why no point using R 😆

hardy prism Mar 27, 2022, 5:53 PM

#

serene scaffold python data science people usually use pandas for that kind of thing.

Thanks. I figured. Hoping to find a unicorn who can help explain pros & cons

Do you know of a better place I could ask this question?

serene scaffold Mar 27, 2022, 5:56 PM

#

hardy prism Thanks. I figured. Hoping to find a unicorn who can help explain pros & cons D...

the pros and cons of pandas vs what?

steady basalt Mar 27, 2022, 6:10 PM

#

hardy prism Thanks. I figured. Hoping to find a unicorn who can help explain pros & cons D...

I find pandas more simple than excel but that’s just cause I never used excel

haughty ibex Mar 27, 2022, 6:53 PM

#

i have strings that are formatted in different ways and i want them all to have the same format. for example i have a string that is like "7 MY STRING" however i want it to be "7th My String" using ordinal suffixes. I'm thinking the best way to do this is to split the column after the integers, add the ordinal suffixes (1st 2nd 3rd etc...), then use the title() method on the strings and then rejoin the column. I'm just not sure how i would implement it in pandas.
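
One possible way to do this in pandas, sketched under the assumption that each string starts with a bare integer followed by the text. The helper names `ordinal` and `normalise` are made up. Title-casing only the text part matters, because `"7th".title()` would give `"7Th"`.

```python
import re
import pandas as pd

def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"   # covers 11th, 12th, 13th, 111th, ...
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

def normalise(text):
    """Turn '7 MY STRING' into '7th My String'; strings without a leading number are only title-cased."""
    match = re.match(r"\s*(\d+)\s*(.*)", text)
    if match is None:
        return text.title()
    number, rest = match.groups()
    return f"{ordinal(int(number))} {rest.title()}".strip()

s = pd.Series(["7 MY STRING", "21 another one", "112 third THING", "no number"])
cleaned = s.map(normalise)
```

Strings that already carry a suffix (like "7th MY STRING") would need an extra rule, since this sketch does not handle them.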

arctic crown Mar 27, 2022, 7:02 PM

#

is this a good course? https://learning.edx.org/course/course-v1:IBM+ML0101EN+2T2021/block-v1:IBM+ML0101EN+2T2021+type@sequential+block@ffe9e3abd5094c7d935d5af42e16df21/block-v1:IBM+ML0101EN+2T2021+type@vertical+block@1b528ed7a63b4badbeecdf6a57c5a377

wicked grove Mar 27, 2022, 7:03 PM

#

Hello,i have a 3 class classification problem

#

I want to increase the loss for one of the classes

#

Can i penalize one of the classes of my model with this function? tf.nn.softmax_cross_entropy_with_logits

#

I can't understand what the logits argument does
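
On the logits question: logits are the raw, unnormalised scores from the last layer, before softmax turns them into probabilities. As far as I know, `tf.nn.softmax_cross_entropy_with_logits` takes only labels and logits and has no class-weight argument, so the weighting has to be applied to its per-example output (or passed via the `class_weight` argument of Keras `model.fit`). The NumPy sketch below only illustrates the arithmetic. The function name and the weight values are made up.

```python
import numpy as np

def weighted_softmax_cross_entropy(logits, onehot_labels, class_weights):
    """Per-example softmax cross-entropy, scaled by the weight of the true class."""
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Standard cross-entropy: minus the log-probability of the true class.
    per_example = -(onehot_labels * log_probs).sum(axis=1)
    # Pick out the weight belonging to each example's true class.
    weights = (onehot_labels * class_weights).sum(axis=1)
    return per_example * weights

logits = np.array([[2.0, 0.5, 0.1], [0.2, 0.3, 3.0]])   # raw scores, 3 classes
labels = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])   # one-hot true classes
class_weights = np.array([1.0, 1.0, 3.0])               # mistakes on class 2 cost 3x

losses = weighted_softmax_cross_entropy(logits, labels, class_weights)
unweighted = weighted_softmax_cross_entropy(logits, labels, np.ones(3))
```

Here the second example belongs to class 2, so its loss is three times the unweighted value while the first example is unchanged.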

inland mantle Mar 27, 2022, 7:19 PM

#

I feel like I asked this before but what is the difference between machine learning and deep learning

misty flint Mar 27, 2022, 7:22 PM

#

the difference between deep learning and machine learning can be blurry but you should consider that deep learning is a special type of machine learning where typically the algorithm does its own "feature extraction" vs. doing it yourself


quick eagle Mar 27, 2022, 7:22 PM

#

trying to get help: parsing data from a .fit file into pandas DF - is that something for this forum or elsewhere?

misty flint Mar 27, 2022, 7:22 PM

#

inland mantle I feel like I asked this before but what is the difference between machine learn...

typically deep learning involves a type of neural network structure which you dont usually see in traditional ML

inland mantle Mar 27, 2022, 7:24 PM

#

Which would be faster when trying to train data sets

haughty ibex Mar 27, 2022, 7:24 PM

#

if i understand correctly an example of machine learning would be like a streaming platform recommending you something based on your interests and then deep learning would be like how google developed ALPHAGO to beat the best GO player in the world.

quick eagle Mar 27, 2022, 7:28 PM

#

I'm in help-coconut, but do people just show up there, or do I 'recruit help' from here?

serene scaffold Mar 27, 2022, 7:31 PM

#

quick eagle I'm in help-coconut, but do people just show up there, or do I 'recruit help' fr...

you wait for someone to volunteer themselves to help, though you can crosspost in whichever topical channel relates to your question.

#

your question does not appear to be data science related, however

quick eagle Mar 27, 2022, 7:32 PM

#

yeah, its more about basic tuples/list/for loops.... what's the better place to start?

serene scaffold Mar 27, 2022, 7:34 PM

#

nevermind, I see that it's about pandas

wicked grove Mar 27, 2022, 7:43 PM

#

Hello, can i use a pytorch loss function with keras model.compile?

desert oar Mar 27, 2022, 7:45 PM

#

very unlikely

wicked grove Mar 27, 2022, 7:52 PM

#

desert oar very unlikely

i want to use the adaptive loss function available in pytorch but idk how i can do that

#

i have a model for 3 class classification and want to penalise one class i thought of using tf.nn.softmax_cross_entropy_with_logits

steady basalt Mar 27, 2022, 7:59 PM

#

inland mantle I feel like I asked this before but what is the difference between machine learn...

Deep learning is a subset of machine learning

hardy prism Mar 27, 2022, 8:16 PM

#

serene scaffold the pros and cons of pandas vs what?

Compared to using office.JS or node JS packages

hardy prism Mar 27, 2022, 8:16 PM

#

steady basalt I find pandas more simple than excel but that’s just cause I never used excel

Do you use JavaScript to automate or script?

agile cobalt Mar 27, 2022, 8:19 PM

#

hardy prism Do you use JavaScript to automate or script?

I'm pretty sure that nobody here uses JavaScript to automate stuff

#

and what do you mean by script?

steady basalt Mar 27, 2022, 8:25 PM

#

hardy prism Do you use JavaScript to automate or script?

I use python because pandas is python

#

The only benefit to using JavaScript for this is if you are good with it and cant use python

#

Unless I’m misinterpreting what you want to do

misty flint Mar 27, 2022, 8:28 PM

#

CL6_ThinkingIntensifies

mint palm Mar 27, 2022, 8:48 PM

#

when making a NN for classification of lets say A, B, C. is it neccessary to have number of example in ratio 1:1:1 ?

agile cobalt Mar 27, 2022, 9:13 PM

#

it doesn't have to be 1:1:1, but if it's too unbalanced, you'll have to pay extra attention to that when evaluating the model

#

Tensorflow has a tutorial about it: https://www.tensorflow.org/tutorials/structured_data/imbalanced_data

TensorFlow

Classification on imbalanced data | TensorFlow Core

mint palm Mar 27, 2022, 9:15 PM

#

I checked its 350000:120000:200000

mild dirge Mar 27, 2022, 9:16 PM

#

mint palm I checked its 350000:120000:200000

Just make sure you have a separate (possibly balanced) test set on which you can see the accuracy of the model

agile cobalt Mar 27, 2022, 9:17 PM

#

~~converting to percentages 52% ; 18% ; 30%~~

mild dirge Mar 27, 2022, 9:17 PM

#

I had a dataset with a distribution somewhat like this, it had 400 classes, most having around 120 images, but some a bit more

#

And not balancing the data gave better results

#

Than simple down-sampling to 120 images

mint palm Mar 27, 2022, 9:19 PM

#

mild dirge And not balancing the data gave better results

So just test needs to be balanced?

mild dirge Mar 27, 2022, 9:19 PM

#

If you are looking at an average accuracy over the test set then that would be good imo yes

#

or another averaged performance measure

mint palm Mar 27, 2022, 9:20 PM

#

mint palm

Maybe thats why it was like this

mild dirge Mar 27, 2022, 9:20 PM

#

Otherwise if your data has 90000:2 ratio and your test set also has a similar ratio, then obviously it can get a very high average accuracy without being able to separate the classes

#

As it would just guess the most common class every time, or at least a lot more often

mint palm Mar 27, 2022, 9:21 PM

#

I want it to be exceptional with no redundancies

mint palm Mar 27, 2022, 9:21 PM

#

mild dirge As it would just guess the most common class every time, or at least a lot more ...

Yes,

#

May be thats why test train split wasnt good enough

agile cobalt Mar 27, 2022, 9:22 PM

#

you might want to use balanced-accuracy instead of accuracy or something like that if your scoring function supports such

mint palm Mar 27, 2022, 9:22 PM

#

agile cobalt you might want to use `balanced-accuracy` instead of `accuracy` or something lik...

Dont know what that is

agile cobalt Mar 27, 2022, 9:22 PM

#

a scoring metric

mild dirge Mar 27, 2022, 9:23 PM

#

yeah, macro average vs micro average performance is something to look into

#

macro takes the average performance per class and then averages it to get a final score
micro average takes the average over the entire set, whether or not it has been balanced

#

iirc*

#

macro treats all classes as equally important
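
A small pure-Python illustration of the macro vs micro distinction, using recall as the per-class score. The labels are made up. For single-label problems the micro average is the same as plain accuracy, and macro-averaged recall is what scikit-learn calls balanced accuracy.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}

# Hypothetical imbalanced test set and a model that always answers "A".
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 10

recalls = per_class_recall(y_true, y_pred)

# Micro: pool every example, so the big class dominates.
micro = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Macro: score each class first, then average, so every class counts equally.
macro = sum(recalls.values()) / len(recalls)
```

The always-"A" model looks fine on the micro average (0.8) but drops to 0.5 on the macro average, because it never finds class "B".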

mint palm Mar 27, 2022, 9:25 PM

#

Actually a simple project has taught me a lot, no theory class can teach..

mild dirge Mar 27, 2022, 9:25 PM

#

Yeah for sure, google is a dang good professor 😛

steady basalt Mar 27, 2022, 9:46 PM

#

mild dirge Yeah for sure, google is a dang good professor 😛

Do you work as a data scientist?

mild dirge Mar 27, 2022, 9:46 PM

#

No I study AI

steady basalt Mar 27, 2022, 9:47 PM

#

I have come to conclude your strategy of taking a test set isn’t really useful in my work because the dataset doesn’t represent a population

#

I did take the set anyway to see how it acts on balanced vs unbalanced

mild dirge Mar 27, 2022, 9:47 PM

#

steady basalt I have come to conclude your strategy of taking a test set isn’t really useful i...

Why do you think this?

steady basalt Mar 27, 2022, 9:51 PM

#

Because it’s labelled data

#

And it’s showing many people with a condition and not as many without

#

So maybe in this case I should not split? Well if I had enough data I’d split and rebalance towards reality

#

But that takes a lot more data than I have

#

Also my friends lecturer said you shouldn’t do this at all

#

U shud balance test data

mild dirge Mar 27, 2022, 9:54 PM

#

Yeah but the performance of your model should not be judged on the accuracy on the real population distribution.
Say you are a doctor and people come to you for diagnosis, as they might have a disease.
99% of the time the people have no harmful disease, so saying no to all the people would lead to an accuracy of a whopping 99%!
But you want to know how many of the patients that were sick could have been diagnosed before it got bad
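
The doctor example can be checked with a few lines of arithmetic. The patient counts are made up to match the 99% figure.

```python
# Hypothetical clinic: 1000 patients, 1% actually sick (1 = sick, 0 = healthy).
y_true = [1] * 10 + [0] * 990

# A "model" that tells everyone they are healthy.
y_pred = [0] * 1000

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the sick class: how many of the sick patients were caught.
sick_found = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = sick_found / sum(y_true)
```

Accuracy comes out at 0.99 while recall on the sick patients is 0.0, which is the gap the message is pointing at.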

steady basalt Mar 27, 2022, 9:55 PM

#

So you’re saying to not balance at all? That would mean it’s scored purely against the balance of data done in the original testing which we don’t know what it was

#

Why not judge it on real population? You can see its use as a screening tool

mild dirge Mar 27, 2022, 9:56 PM

#

No, I am telling you what problems you run into when judging the performance of your model on the unbalanced population data

steady basalt Mar 27, 2022, 9:56 PM

#

In this case as you say it’s only going to be an accurate predictor of performance on people who probably have the disease

mild dirge Mar 27, 2022, 9:57 PM

#

You can use unbalanced data to test your performance on, but you should consider giving all classes equal weights towards the performance measure

steady basalt Mar 27, 2022, 9:57 PM

#

It’s binary

mild dirge Mar 27, 2022, 9:57 PM

#

then both classes, the point still holds

steady basalt Mar 27, 2022, 9:57 PM

#

That’s why I’ve balanced the data for model design

mild dirge Mar 27, 2022, 9:58 PM

#

right, thats good

steady basalt Mar 27, 2022, 9:58 PM

#

But testing it against an untouched holdout

#

That will only show performance on a group of people who mostly have the disease

#

Why not show how it does on a balanced population or better still a population likely to come to testing where more do not have disease

#

As per your advice it’s only compared against unseen data from original data which is a group that mainly has disease

#

How’s that more useful

#

Wouldn’t it be wise to give precision scores for different balances

mild dirge Mar 27, 2022, 10:00 PM

#

Because your goal is not per se to optimize the accuracy of the model on the population, but to optimize recall

steady basalt Mar 27, 2022, 10:00 PM

#

Which I do prefer in health data over accuracy anyway

mild dirge Mar 27, 2022, 10:00 PM

#

Making sure not too many people who have a disease will be told they're healthy

steady basalt Mar 27, 2022, 10:01 PM

#

The precision is quite good on diagnosing but it’s really bad on detecting those without disease

#

I’d say it’s more acceptable than other way around

mild dirge Mar 27, 2022, 10:01 PM

#

precision would be seeing how many of the people that you diagnosed as having cancer, actually have cancer

#

Which I say is less important

steady basalt Mar 27, 2022, 10:01 PM

#

Exactly

#

No though you want to make sure as many WITH cancer get treated

mild dirge Mar 27, 2022, 10:02 PM

#

I think making sure someone who is sick, will actually be diagnosed as sick and get themselves checked out

#

which is recall

steady basalt Mar 27, 2022, 10:02 PM

#

Which is why precision for positives is more important than negative

#

That’s precision ?

mild dirge Mar 27, 2022, 10:02 PM

#

precision is true positives / (true positives + false positives)

stone marlin Mar 27, 2022, 10:02 PM

#

Precision is: "I said all these people had cancer. How many of them actually did have cancer?"

#

Recall is: "Out of all of the people who had cancer, how many did I say had cancer?"
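
The two definitions above, written out as a small pure-Python function. The labels are made up, with 1 meaning "has cancer".

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # said yes, was yes
    fp = sum(t != positive and p == positive for t, p in pairs)  # said yes, was no
    fn = sum(t == positive and p != positive for t, p in pairs)  # said no, was yes
    # Precision: of everyone I flagged, how many really had it.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of everyone who really had it, how many I flagged.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical results: 4 people with cancer, 6 without.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
```

In this toy case 2 of the 3 flagged people really had cancer (precision 2/3), but only 2 of the 4 people with cancer were flagged (recall 0.5).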

mild dirge Mar 27, 2022, 10:02 PM

#

yeah that

#

that's more intuitive than showing the formulas haha

stone marlin Mar 27, 2022, 10:03 PM

#

This graphic is the one every DS has taped to their wall, pret much:

steady basalt Mar 27, 2022, 10:03 PM

#

Ah ha

#

I should replace my precision with recall then

sand cedar Mar 27, 2022, 10:03 PM

#

stone marlin This graphic is the one every DS has taped to their wall, pret much:

Hell, I had that made into a magnet and put on my fridge LOL.

stone marlin Mar 27, 2022, 10:03 PM

#

Right, you're probably going to want Recall.

steady basalt Mar 27, 2022, 10:03 PM

#

But still, I have a holdout set

#

Is it worth balancing that at all

#

Separately

#

I have tested it unbalanced

#

As a first step

mild dirge Mar 27, 2022, 10:04 PM

#

well it's not that bad when you look at recall

steady basalt Mar 27, 2022, 10:04 PM

#

And that’s showing performance on a biased population

mild dirge Mar 27, 2022, 10:04 PM

#

and the class is heavily under represented in the training data

steady basalt Mar 27, 2022, 10:04 PM

#

Over represented

mild dirge Mar 27, 2022, 10:04 PM

#

Talking about the disease class

steady basalt Mar 27, 2022, 10:04 PM

#

Yeah

#

It’s over represented

stone marlin Mar 27, 2022, 10:04 PM

#

I only skimmed this, but if you're talking about doing SMITE/SMOTE with your holdout set, that's not good. You don't want to affect "real world data" by artificially inflating.

steady basalt Mar 27, 2022, 10:04 PM

#

Compare to real life

mild dirge Mar 27, 2022, 10:05 PM

#

Right, I just mean compared to balanced data

#

But you balanced it

steady basalt Mar 27, 2022, 10:05 PM

#

stone marlin I only skimmed this, but if you're talking about doing SMITE/SMOTE with your hol...

The thing is my friends professor said you should smote the test set too

stone marlin Mar 27, 2022, 10:05 PM

#

You're literally just scoring the model on the holdout, so it's saying, "Given that I feed this 1 row, how accurately would that be classified?" If you modified your holdout in some way, you're giving your scoring (not your model) an advantage.

mild dirge Mar 27, 2022, 10:05 PM

#

I disagree, and melatonin does as well

stone marlin Mar 27, 2022, 10:05 PM

#

Which makes zero sense, because your model is unaffected.

steady basalt Mar 27, 2022, 10:05 PM

#

Ye

stone marlin Mar 27, 2022, 10:05 PM

#

You should never SM[I/O]TE the test/holdout set.

sand cedar Mar 27, 2022, 10:05 PM

#

I agree, I don't understand why SMOTE would be useful here either.

stone marlin Mar 27, 2022, 10:05 PM

#

It makes zero sense to do so. It will artificially inflate your metrics.

#

Like, you're basically saying, "I think my model is good at detecting the thing... lemme give it a lot of easy cases it can correctly classify, which will inflate my metric."

steady basalt Mar 27, 2022, 10:06 PM

#

I don’t understand why there’s no merit to re balancing data to get a more accurate view of real populations so you can see how good it is as a screening diagnostic

mild dirge Mar 27, 2022, 10:07 PM

#

because you are just giving it "the same" (or very similar) cases over and over

stone marlin Mar 27, 2022, 10:07 PM

#

To sum this up in a very concise way: Your holdout set should, as closely as possible, represent the distribution of the real data you will be feeding it.

steady basalt Mar 27, 2022, 10:07 PM

#

So instead of 50 with disease and 10 without, select as a holdout the other way around

steady basalt Mar 27, 2022, 10:07 PM

#

stone marlin To sum this up in a very concise way: _Your holdout set should, as closely as po...

Exactly

#

My point

#

Real data in real life

#

Wouldn’t be every 7/10 people have the disease

mild dirge Mar 27, 2022, 10:07 PM

#

You either balance it and take the averaged performance measure, or look at macro averaged performance measure

#

Which treats all classes as equally important

steady basalt Mar 27, 2022, 10:08 PM

#

Balance?

mild dirge Mar 27, 2022, 10:08 PM

#

yes, downsample

steady basalt Mar 27, 2022, 10:08 PM

#

I oversampled because I have a tiny set

mild dirge Mar 27, 2022, 10:08 PM

#

not upsample

steady basalt Mar 27, 2022, 10:09 PM

#

It’s going to really mess with my reliability

mild dirge Mar 27, 2022, 10:09 PM

#

reliability?

steady basalt Mar 27, 2022, 10:09 PM

#

But are you saying to undersample the test set

#

U said not to touch

stone marlin Mar 27, 2022, 10:10 PM

#

Okay, we've got a few things going on here. The data that you're given, in general, should represent the data that you expect to collect. If this is violated, nothing else matters.

steady basalt Mar 27, 2022, 10:10 PM

#

It doesn’t represent a population

#

At all

#

Nothing I can do about that have to just use the cards dealt to me

#

Unless we’re talking about a literal alcoholic hospital ward

stone marlin Mar 27, 2022, 10:12 PM

#

If your current data that you're training on does not represent the data that you expect to collect, then --- you can do some things synthetically to it, as we've noted, like SMITE or SMOTE, to TRY and make it similar to the real data. This isn't great, but it does work sometimes. You'd, then, do this before you do anything else. Then you'd split into train-test/holdout. Do not do this, see below.

#

In the past, this has worked like... 25% of the time for me, for standard datasets, but I tend to use this more for imbalance than anything.

steady basalt Mar 27, 2022, 10:13 PM

#

I have done that

#

Well not exactly no

stone marlin Mar 27, 2022, 10:13 PM

#

Actually --- hm. Actually, someone else check me here --- I don't think SMITE/SMOTE before everything works nicely because there's gonna be data-leakage.

steady basalt Mar 27, 2022, 10:13 PM

#

I took pccamels advice and did the split before smote

#

As to have a non touched test set

stone marlin Mar 27, 2022, 10:13 PM

#

Yeah, I'd do exactly that, and then test on a non-SMOTE'd test set.
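
A sketch of the order of operations agreed on here: split first, then rebalance only the training part. To stay dependency-free it uses simple random oversampling (duplicating minority rows) rather than SMOTE itself. The 50/10 class counts and the helper name `stratified_split` are illustrative.

```python
import random

random.seed(0)

# Hypothetical imbalanced data as (row_id, label): 50 positives, 10 negatives.
rows = [(i, 1) for i in range(50)] + [(i, 0) for i in range(50, 60)]

def stratified_split(rows, test_fraction):
    """Split so that train and test keep the same class proportions."""
    train, test = [], []
    for label in sorted({lab for _, lab in rows}):
        group = [r for r in rows if r[1] == label]
        random.shuffle(group)
        cut = round(len(group) * test_fraction)
        test += group[:cut]
        train += group[cut:]
    return train, test

# 1. Split first, so the holdout never sees resampled rows.
train, test = stratified_split(rows, test_fraction=0.2)

# 2. Oversample the minority class in the TRAINING set only.
minority = [r for r in train if r[1] == 0]
majority = [r for r in train if r[1] == 1]
extra = random.choices(minority, k=len(majority) - len(minority))
train_balanced = majority + minority + extra

# 3. `test` stays untouched and is used as-is for scoring.
```

Resampling before the split would let copies (or near-copies, with SMOTE) of the same row land on both sides, which is the leakage concern raised above.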

steady basalt Mar 27, 2022, 10:14 PM

#

But then we only have performance on a really unrealistic population

#

Why not inverse the balance and get a read on how it would be irl

mild dirge Mar 27, 2022, 10:15 PM

#

But again, you aren't aiming for a high accuracy on the population, you want to be able to see how many of the people with diseases you can diagnose, and how many you deem healthy that actually have a disease.

stone marlin Mar 27, 2022, 10:15 PM

#

You're attempting to classify something, and you should be able to still do so with your test set. Recall / Precision / whatever. It sucks that you don't have a lot of data, but that's how it goes.

#

If you're introducing more positive elements, then "missing" one of these elements won't be as big of a deal for your model's score.

steady basalt Mar 27, 2022, 10:15 PM

#

mild dirge But again, you aren't aiming for a high accuracy on the population, you want to ...

Accuracy also kinda matters so you don’t misdiagnose and waste resources

stone marlin Mar 27, 2022, 10:16 PM

#

But I kind of get what you're saying here. The dataset in general isn't representative.

#

You don't want accuracy, you prob want prec, recall, and f1, and look at those.

mild dirge Mar 27, 2022, 10:16 PM

#

Sure, there's some balance between false positives and false negatives you want to consider

#

Rather false positive than false negative

stone marlin Mar 27, 2022, 10:16 PM

#

You can prob find some beta for F_beta and adjust accordingly, but F_1 is usually a good inbetween.
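
For reference, the standard F_beta formula as a small function. A beta above 1 weights recall more heavily, a beta below 1 weights precision more heavily, and beta = 1 gives the usual F_1. The example numbers are made up.

```python
def f_beta(precision, recall, beta=1.0):
    """F_beta score: weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical model with modest precision but perfect recall.
f1 = f_beta(0.5, 1.0)           # balanced view
f2 = f_beta(0.5, 1.0, beta=2)   # recall counts more, as in screening
```

With recall-heavy results like these, F_2 comes out higher than F_1, which matches the "rather false positive than false negative" preference.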

steady basalt Mar 27, 2022, 10:17 PM

#

Btw, I took an accuracy read earlier on the training data before doing anything to it so it’s basically the same as test data in terms of distribution

#

My final model tuned on such performs 3% worse

stone marlin Mar 27, 2022, 10:17 PM

#

Yeah, it should be stratified.

#

What is the size of your whole dataset?

steady basalt Mar 27, 2022, 10:17 PM

#

That was with k=5

#

500+

stone marlin Mar 27, 2022, 10:17 PM

#

Like... 500 - 1000?

steady basalt Mar 27, 2022, 10:17 PM

#

Perhaps

#

Maybe just 500

#

Actually

stone marlin Mar 27, 2022, 10:18 PM

#

I wouldn't worry too much about +/-3% to whatever metric you're using there.

steady basalt Mar 27, 2022, 10:18 PM

#

So, my final report says that I literally lost accuracy after doing all this work

#

To perfect a model

#

Looks like time wasted

stone marlin Mar 27, 2022, 10:18 PM

#

Sure, but how are prec / recall? Accuracy is rarely a good metric to use.

mild dirge Mar 27, 2022, 10:19 PM

#

you didn't "lose accuracy", the metric wasn't correct when you artificially inflated your test data

#

it had little meaning

steady basalt Mar 27, 2022, 10:19 PM

#

Should you test that and auroc before training too as a comparator benchmark

stone marlin Mar 27, 2022, 10:19 PM

#

To compare what? If your problem doesn't lend itself to use the accuracy metric, there is no point.

mild dirge Mar 27, 2022, 10:19 PM

#

ehh, before training the weights are randomized, so the results will probably be as good as random

steady basalt Mar 27, 2022, 10:19 PM

#

To see how much improvement came from tuning etc

#

?

stone marlin Mar 27, 2022, 10:20 PM

#

You should try a baseline (maybe just guessing the most frequent class, or something like that) but you need to know what metrics you'll be using before scoring.

steady basalt Mar 27, 2022, 10:20 PM

#

Else we can just assume it did nothing lol

mild dirge Mar 27, 2022, 10:20 PM

#

Yeah using a baseline classifier (like a small or simple algo) tells you more about how well the model performs on the problem than using a random guesser

stone marlin Mar 27, 2022, 10:20 PM

#

For example, in this case, you care a bit about precision and you care about recall. So you can do, you know, two models that either always choose 0 or always choose 1 or whatever. Or you can do a simple linear model. That'll be an okay baseline.

#

My baseline is usually a linear model or a random forest, and I go from there.
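
A tiny sketch of the "always guess the most frequent class" baseline, in plain Python with made-up labels. It shows why the metric has to be picked first: the baseline looks decent on accuracy and useless on recall.

```python
from collections import Counter

# Hypothetical imbalanced labels: 1 is the rare positive class.
y_train = [0] * 80 + [1] * 20
y_test = [0] * 16 + [1] * 4

# The baseline ignores the features and always predicts the majority training class.
majority = Counter(y_train).most_common(1)[0][0]
y_pred = [majority] * len(y_test)

accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)
true_pos = sum(p == 1 and t == 1 for p, t in zip(y_pred, y_test))
recall = true_pos / sum(t == 1 for t in y_test)

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
```

Any real model should beat this on whichever metric was chosen, not only on accuracy.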

mild dirge Mar 27, 2022, 10:21 PM

#

It also tells you how complex the problem might be

stone marlin Mar 27, 2022, 10:21 PM

#

But to emphasize: you need to choose your metrics before you compare anything to anything.

#

Most "real world" problems do very well with recall, precision, and [their harmonic mean] F_1.

steady basalt Mar 27, 2022, 10:22 PM

#

Oh, I just tried getting a benchmark now: I fit the random forest to the training data and evaluated it on predicting the holdout

#

Scored 0.3

#

Weird

stone marlin Mar 27, 2022, 10:22 PM

#

Scored 0.3 for what? Accuracy?

steady basalt Mar 27, 2022, 10:22 PM

#

Might be because I forgot to reset kernel

#

Sec

#

Probably one of them got scaled and one didn’t

mild dirge Mar 27, 2022, 10:22 PM

#

make it guess the opposite and you get 0.7 ^^

steady basalt Mar 27, 2022, 10:22 PM

#

I’ll do without scaling

#

0.74

#

Accuracy

#

Recall 0.94

#

Lol

#

My model got wrecked by default

mild dirge Mar 27, 2022, 10:23 PM

#

yeah those seem like good metrics

steady basalt Mar 27, 2022, 10:24 PM

#

Yeah but

stone marlin Mar 27, 2022, 10:24 PM

#

Pret good recall.

steady basalt Mar 27, 2022, 10:24 PM

#

Then I go on to tune and do feature selection and scale

#

And the model then performs much worse

#

On the same holdout

mild dirge Mar 27, 2022, 10:24 PM

#

what kinda model?

steady basalt Mar 27, 2022, 10:24 PM

#

Random forest and KNN

stone marlin Mar 27, 2022, 10:25 PM

#

Here's my DS secret. Many models that I make work "just fine" out of the box. Most are like, "80% good" without too much fuss. It's the iterative optimization that's the extremely difficult part.

steady basalt Mar 27, 2022, 10:25 PM

#

So what I conclude is that essentially my entire processing stage as well as parameter optimisation and scaling and over sampling made my performance much worse

#

Tf can I fix this? Looks really bad as a conclusion lol

#

I want improvement

stone marlin Mar 27, 2022, 10:25 PM

#

Especially RFs.

#

Uh. You could try out xgboost and see if that does anythin' for you as opposed to RF.

mild dirge Mar 27, 2022, 10:26 PM

#

stone marlin Especially RFs.

rf?

steady basalt Mar 27, 2022, 10:26 PM

#

I shud use cv instead of .score right

stone marlin Mar 27, 2022, 10:26 PM

#

But honestly RF works really well right outt'a the box.

#

Random Forests, sorry Camel.

mild dirge Mar 27, 2022, 10:26 PM

#

oh random forest

misty flint Mar 27, 2022, 10:27 PM

#

RF praise

stone marlin Mar 27, 2022, 10:27 PM

#

I'm lookin' at a model right now for evaluation for work, and it's 90% feature engineering and then at the end it's like two lines of a grid search on a random forest. Works really well.

misty flint Mar 27, 2022, 10:28 PM

#

still performs well on non-random missing data

stone marlin Mar 27, 2022, 10:28 PM

#

It's not mine, but, you know.

misty flint Mar 27, 2022, 10:28 PM

#

kekHands

#

was for time series healthcare data too kekHands

steady basalt Mar 27, 2022, 10:28 PM

#

Anyone wana try fix my model

#

Maybe the problem is tuning isn’t wide enough

#

I only did about 500 searches

stone marlin Mar 27, 2022, 10:28 PM

#

Once you get time series data in the right form, it's a delight to work with. :'''] But before that? It's a gd nightmare.

steady basalt Mar 27, 2022, 10:29 PM

#

So it got beat by default

#

Should the benchmark be done after oversampling the training data

stone marlin Mar 27, 2022, 10:29 PM

#

It happens to the best of us. I get beat by my baseline model a bunch during hyperparam sweeps.

#

"I can't believe I lost to linear regression!"

mild dirge Mar 27, 2022, 10:29 PM

#

Maybe it is over-fitting on your training data if your model is complex

steady basalt Mar 27, 2022, 10:30 PM

#

It’s not really complex

steady basalt Mar 27, 2022, 10:30 PM

#

steady basalt Should the benchmark be done after oversampling the training data

?

mild dirge Mar 27, 2022, 10:30 PM

#

the bench mark should probably use the same data to train on, and the test data to test on

steady basalt Mar 27, 2022, 10:31 PM

#

I saw SMOTE as just a part of the process rather than having to be done before

#

Then you’d also say to benchmark on scaled data too

#

The only thing changing is the parameter of model

mild dirge Mar 27, 2022, 10:34 PM

#

smote is part of the process, but just the training process
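
A rough sketch of that order of operations: split first (stratified), then oversample the training rows only, so the holdout keeps the real class balance. The data is synthetic, and plain random oversampling via `sklearn.utils.resample` is used here as a simple stand-in for SMOTE, not SMOTE itself:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic, imbalanced data (roughly 80% class 0, 20% class 1).
X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)

# Stratified split: train and test keep the same class proportions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Oversample the minority class in the training set only (stand-in for SMOTE).
minority = y_tr == 1
n_extra = int((~minority).sum() - minority.sum())
X_extra, y_extra = resample(
    X_tr[minority], y_tr[minority], replace=True, n_samples=n_extra, random_state=0
)
X_bal = np.vstack([X_tr, X_extra])
y_bal = np.concatenate([y_tr, y_extra])

# Fit on (X_bal, y_bal); score on the untouched (X_te, y_te).
```

The benchmark model and the tuned model can then both be scored on the same untouched holdout.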

steady basalt Mar 27, 2022, 10:41 PM

#

The metrics function which gives things like recall on a table only works for a single predict

#

How do u cross validate and use the same table as averages
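
One possible way to get that kind of table averaged over folds is scikit-learn's `cross_validate` with a list of scorers. A minimal sketch on synthetic data; the model settings are arbitrary:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for the real dataset.
X, y = make_classification(n_samples=500, weights=[0.7, 0.3], random_state=0)

metrics = ["accuracy", "precision", "recall", "f1", "roc_auc"]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = cross_validate(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X, y, cv=cv, scoring=metrics,
)

# One row per fold, one column per metric, then mean/std across folds.
per_fold = pd.DataFrame({m: results[f"test_{m}"] for m in metrics})
summary = per_fold.agg(["mean", "std"]).T
print(summary.round(3))
```

Each metric comes back under a `test_<name>` key, so the per-fold values can be averaged however you like.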

bold timber Mar 28, 2022, 12:55 AM

#

Hi, I have a question: What is the meaning of 1 in LogSoftmax?

tall blaze Mar 28, 2022, 1:03 AM

#

bold timber Hi, I have a question: What is the meaning of 1 in LogSoftmax?

That's weird, I haven't seen softmax being used with a single output node. Typically softmax is used for multiclass classification outputs, while sigmoid is used for binary outputs

#

What is the purpose of this model.

thin palm Mar 28, 2022, 1:06 AM

#

what's the best plot for when I'm comparing countries and the top occupations in each country?

tall blaze Mar 28, 2022, 1:07 AM

#

thin palm what's the best plot for when I'm comparing countries and the top occupations in...

histogram

thin palm Mar 28, 2022, 1:08 AM

#

tall blaze histogram

I was thinking of making multiple pie charts and each representing the country and then occupation

#

it's for astronaut data

tall blaze Mar 28, 2022, 1:09 AM

#

thin palm I was thinking of making mulitple pie charts and each representing the country a...

If you can fit it, I would personally start with a stacked histogram, where each country was assigned a color.

#

But I could see the pie thing if you had a user selection to select each country

quick eagle Mar 28, 2022, 1:09 AM

#

I'm trying to slice data in pandas to look at different areas of a data frame, eg:

df['field1'][4500:9000]

however, I'm doing graphs, etc, which means if I want to look at 5000:7000, I need to change it in a lot of places.
Is there a way to define a variable "slice = '4500:9000'", and then use something like df['field1'][slice] ?

serene scaffold Mar 28, 2022, 1:10 AM

#

quick eagle I'm trying to slice data in pandas to look at different areas of a data frame, e...

with pandas, the whole dataframe is "one thing". it's not like looking up something in a list that's in a dict, where the list is a completely separate thing from the dict that it's in.

you need to use loc

#

!docs pandas.DataFrame.loc

arctic wedgeBOT Mar 28, 2022, 1:10 AM

#

pandas.DataFrame.loc


property DataFrame.loc
Access a group of rows and columns by label(s) or a boolean array.

`.loc[]` is primarily label based, but may also be used with a boolean array.

Allowed inputs are:

thin palm Mar 28, 2022, 1:10 AM

#

tall blaze If you can fit it, I would personally start with a stacked histogram, where each...

ahhh makes sense, I only have 40 countries but that may take a lot of space good call mate. Maybe even filter it with occupation and then countries filled inside

serene scaffold Mar 28, 2022, 1:11 AM

#

df.loc[4500:9000, 'field1'] is probably what you need, since it indexes by row and then by column.
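
A plain string like '4500:9000' will not work as a slice, but Python's built-in `slice` object can be stored in a variable and reused. A minimal sketch, assuming a default integer index (the frame below is made up):

```python
import pandas as pd

# Made-up frame with a default 0..9999 integer index.
df = pd.DataFrame({"field1": range(10_000)})

window = slice(4500, 9000)                  # change the range in one place

by_position = df["field1"].iloc[window]     # positional, end-exclusive like a list
by_label = df.loc[4500:9000, "field1"]      # label-based, end-inclusive

print(len(by_position), len(by_label))
```

Note the off-by-one between the two: `.loc` includes the end label, `.iloc` does not.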

tall blaze Mar 28, 2022, 1:11 AM

#

thin palm ahhh makes sense, I only have 40 countries but that may take a lot of space good...

Yea with 40 youll have to find a way to allow the user to select. Otherwise I cannot think of a way to make it not look like a mess

quick eagle Mar 28, 2022, 1:15 AM

#

something like this: ?

tzero = combined_dive_df.index[4600]
start = 4500
stop = 9000
fig = go.Figure()

fig.add_trace(go.Scatter(x=combined_dive_df.iloc['start':'stop'], y=combined_dive_df["SAC Rate (2 minute avg)_Shearwater"].loc['start':'stop'].interpolate(method='time'),
                    mode='lines', name='Shearwater'))

fig.add_trace(go.Scatter(x=combined_dive_df.iloc['start':'stop'], y=(14.7/100)*combined_dive_df["pressure_sac_Garmin"].loc['start':'stop'].interpolate(method='time'),
                    mode='lines', name='Garmin'))

serene scaffold Mar 28, 2022, 1:15 AM

#

combined_dive_df["SAC Rate (2 minute avg)_Shearwater"].loc['start':'stop'] this is wrong. the dataframe is one thing. this is treating it as two things.

#

if "SAC Rate (2 minute avg)_Shearwater" is the name of a column, it goes in the loc call after the row indexers.

#

are you picking both rows and columns, or just columns?

quick eagle Mar 28, 2022, 1:17 AM

#

just columns for the y axis, and trying to slice from row 'start' to row 'stop'

serene scaffold Mar 28, 2022, 1:17 AM

#

quick eagle just columns for the y axis, and trying to slice from row 'start' to row 'stop'

can you do print(combined_dive_df.head().to_dict('list'), combined_dive_df.head().index) and show the text (no screenshots)?

quick eagle Mar 28, 2022, 1:18 AM

#

ie, select the column to graph, and then only slice for the interesting bits

#

https://paste.pythondiscord.com/usigohoxik

serene scaffold Mar 28, 2022, 1:21 AM

#

@quick eagle these are the names of your columns. start and stop are none of them

Index(['distance_Garmin', 'enhanced_altitude_Garmin',
       'absolute_pressure_Garmin', 'depth_Garmin', 'ascent_rate_mm_s_Garmin',
       'heart_rate_Garmin', 'temperature_Garmin', 'unknown_135_Garmin',
       'unknown_136_Garmin', 'next_stop_depth_Garmin', 'next_stop_time_Garmin',
       'time_to_surface_Garmin', 'ndl_time_Garmin', 'n2_load_Garmin',
       'cns_load_Garmin', 'air_time_remaining_s_Garmin', 'pressure_sac_Garmin',
       'unknown_108_Garmin', 'timer_trigger_Garmin', 'event_Garmin',
       'event_type_Garmin', 'event_group_Garmin', 'unknown_19_Garmin',
       'unknown_20_Garmin', 'data_Garmin', 'transmitterID_Garmin',
       'pressure_100_Garmin', 'Heartrate_Garmin', 'ElapsedTime_Garmin',
       'Time (ms)_Shearwater', 'Depth_Shearwater',
       'First Stop Depth_Shearwater', 'Time To Surface (min)_Shearwater',
       'Average PPO2_Shearwater', 'Fraction O2_Shearwater',
       'Fraction He_Shearwater', 'First Stop Time_Shearwater',
       'Current NDL_Shearwater', 'Current Circuit Mode_Shearwater',
       'Current CCR Mode_Shearwater', 'Water Temp_Shearwater',
       'Gas Switch Needed_Shearwater', 'External PPO2_Shearwater',
       'Set Point Type_Shearwater', 'Circuit Switch Type_Shearwater',
       'External O2 Sensor 1 (mV)_Shearwater',
       'External O2 Sensor 2 (mV)_Shearwater',
       'External O2 Sensor 3 (mV)_Shearwater', 'Battery Voltage_Shearwater',
       'Tank 1 pressure (PSI)_Shearwater', 'Tank 2 pressure (PSI)_Shearwater',
       'Tank 3 pressure (PSI)_Shearwater', 'Tank 4 pressure (PSI)_Shearwater',
       'Gas Time Remaining_Shearwater', 'SAC Rate (2 minute avg)_Shearwater',
       'Ascent Rate_Shearwater', 'Safe Ascent Depth_Shearwater',
       'CO2mbar_Shearwater', 'moles_tank_Ideal_Garmin',
       'moles_tank_interpolate_Ideal_Garmin',
       'moles_tank_diff_interp_Ideal_Garmin',
       'liters_ambient_used_interp_Ideal_Garmin', 'moles_tank_Ideal_SW',
       'moles_tank_interpolate_Ideal_SW', 'moles_tank_diff_interp_Ideal_SW',
       'liters_ambient_used_interp_Ideal_SW'],
      dtype='object')

#

and then your rows are indexed by timestamps.

quick eagle Mar 28, 2022, 1:21 AM

#

correct - I'm trying to use 'start' and 'stop' as shortcuts for a slice, not as columns

serene scaffold Mar 28, 2022, 1:21 AM

#

as shortcuts for a slice?

quick eagle Mar 28, 2022, 1:24 AM

#

https://paste.pythondiscord.com/naranaxiba

thin palm Mar 28, 2022, 1:25 AM

#

in my Data there's 'Pilot' and 'pilot', thus my pandas is recognizing them as 2 unique values. Is this a good way to make them the same?

    return x.replace('P','p')
my code works, but want to see if this is like "okay cool", or if it's "why do that?"
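
For what it's worth, replacing 'P' with 'p' rewrites every capital P, so a value like 'PSP' would turn into 'pSp'. A small sketch of the more usual approach with pandas string methods, on made-up values (the column name is an assumption):

```python
import pandas as pd

# Made-up occupation values with inconsistent case and stray whitespace.
data = pd.DataFrame({"occupation": ["Pilot", "pilot", " PSP", "commander"]})

# Trim whitespace and lowercase everything so variants collapse to one value.
data["occupation"] = data["occupation"].str.strip().str.lower()

counts = data["occupation"].value_counts()
print(counts)
```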

quick eagle Mar 28, 2022, 1:25 AM

#

see the top - works with numbers, but when trying to 'centralize' the slice indexes into variables (so I can change it in one location, not 4), it doesn't work

serene scaffold Mar 28, 2022, 1:27 AM

#

@quick eagle I'm not following. .loc has one or two parts. the first (required) part is the row indexer, which in your case has to be a timestamp or a slice of timestamps. the second (which is optional) is the column indexer. they both go in the .loc[ ], separated by commas. any syntax that looks like df[ ][ ] or df[ ].loc[ ] is likely to be wrong.

.iloc is similar except that it's by position, regardless of how the DF is indexed.
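
A rough sketch of the difference, assuming a timestamp-indexed frame. The data here is a made-up stand-in for combined_dive_df with only one of its columns. Note that start and stop are used as variables, not as the quoted strings 'start' and 'stop':

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for combined_dive_df: one row per second, timestamp index.
idx = pd.date_range("2022-03-26 10:00:00", periods=10_000, freq=pd.Timedelta(seconds=1))
dive = pd.DataFrame(
    {"pressure_sac_Garmin": np.linspace(200.0, 50.0, len(idx))}, index=idx
)

start, stop = 4500, 9000   # row positions, defined once

# Positional: rows start..stop-1, then pick the column.
y_pos = dive.iloc[start:stop]["pressure_sac_Garmin"]

# Label-based: translate positions to timestamps, then rows and column in one .loc call.
t_start, t_stop = dive.index[start], dive.index[stop - 1]
y_lab = dive.loc[t_start:t_stop, "pressure_sac_Garmin"]

x = y_lab.index            # the matching timestamps for the x axis
```

Both selections return the same rows; the `.loc` version just needs timestamps because that is how the frame is indexed.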

tall blaze Mar 28, 2022, 1:27 AM

#

quick eagle something like this: ? ```py tzero = combined_dive_df.index[4600] start = 4500...

Yea for: "x=combined_dive_df.iloc[start:stop]" are you trying to return rows or columns?

quick eagle Mar 28, 2022, 1:28 AM

#

rows

#

from row start to row stop

tall blaze Mar 28, 2022, 1:28 AM

#

change it to loc

serene scaffold Mar 28, 2022, 1:28 AM

#

tall blaze Yea for: "x=combined_dive_df.iloc[start:stop]" are you trying to return rows or ...

I don't think this question makes sense. a dataframe always has rows and columns.

tall blaze Mar 28, 2022, 1:29 AM

#

serene scaffold I don't think this question makes sense. a dataframe always has rows and columns...

I try to use simple language

serene scaffold Mar 28, 2022, 1:29 AM

#

tall blaze I try to use simple language

how would you have asked the question if you weren't trying to hide any complexity?

quick eagle Mar 28, 2022, 1:30 AM

#

x=df.index[4500:9000], y=df["datafield"][4500:9000]

#

I'm trying to replace the above with

tall blaze Mar 28, 2022, 1:30 AM

#

serene scaffold how would you have asked the question if you weren't trying to hide any complexi...

Look at my profile and we can skip the check if I know what I am talking about

quick eagle Mar 28, 2022, 1:31 AM

#

x=df.index[4500:9000], y=df["datafield"][4500:9000]

a = 4500
b = 9000
x=df.index[a:b], y=df["datafield"][a:b]

serene scaffold Mar 28, 2022, 1:31 AM

#

tall blaze Look at my profile and we can skip the check if I know what I am talking about

I'm not trying to call that into question. I'm trying to understand what you meant.

quick eagle Mar 28, 2022, 1:32 AM

#

explicitly putting the slice indexes works, but trying to reference variables to slice with doesn't

tall blaze Mar 28, 2022, 1:32 AM

#

serene scaffold I'm not trying to call that into question. I'm trying to understand what you mea...

Oh sorry, have had some interesting people here. I am asking if he is trying to grab a specific subset of the index with the code line he pasted

quick eagle Mar 28, 2022, 1:33 AM

#

basically, I have several hours of data, and just want to look at a specific time period - although plotly has some built in slicing on graphs, etc; I'm trying to do it by slicing the data frame

#

(but the specific time period is not known a priori)

serene scaffold Mar 28, 2022, 1:34 AM

#

quick eagle basically, I have several hours of data, and just want to look at a specific tim...

taking this statement on its own, it sounds like you need to first make sure that the rows are sorted by the index (so that they're in order by timestamp), and then use loc to pick a slice for the first and last timestamp that you want.

quick eagle Mar 28, 2022, 1:35 AM

#

yep, already sorted and indexed by timestamp

#

the x axis is datetime

#

which is also the index

tall blaze Mar 28, 2022, 1:35 AM

#

quick eagle yep, already sorted and indexed by timestamp

Is the timestamp the index or have you created a new numerical one?

serene scaffold Mar 28, 2022, 1:35 AM

#

tall blaze Is the timestamp the index or have you created a new numerical one?

there's a print of the index in this pastebin: https://paste.pythondiscord.com/usigohoxik.py

tall blaze Mar 28, 2022, 1:36 AM

#

ty

quick eagle Mar 28, 2022, 1:36 AM

#

https://paste.pythondiscord.com/naranaxiba

#

that one is later on, where there's stuff actually happening

thin palm Mar 28, 2022, 1:37 AM

#

In my data there's 'NAME' and 'NATIONALITY', but sometimes the names appear twice or more because they've competed in spaceflight, my pandas will count the same person twice, how do I prevent this?

serene scaffold Mar 28, 2022, 1:37 AM

#

thin palm In my data there's 'NAME' and 'NATIONALITY', but sometimes the names appear twic...

please do print(df.head().to_dict('list')) and show the text (no screenshots) so we know what you're working with.

thin palm Mar 28, 2022, 1:38 AM

#

or is this okay?

thin palm Mar 28, 2022, 1:38 AM

#

serene scaffold please do `print(df.head().to_dict('list'))` and show the text (no screenshots) ...

it's nasty, can I print out just the head?

serene scaffold Mar 28, 2022, 1:39 AM

#

thin palm it's nasty, can I print out just the head?

yes, as long as it's followed by .to_dict('list') in the code.

tall blaze Mar 28, 2022, 1:39 AM

#

quick eagle that one is later on, where there's stuff actually happening

So unless I am wrong here @serene scaffold I think he is attempting to insert numerical index values into the loc function when he should be putting in datetime values

thin palm Mar 28, 2022, 1:39 AM

#

serene scaffold yes, as long as it's followed by `.to_dict('list')` in the code.

{'id': [1, 2, 3, 4, 5], 'number': [1, 2, 3, 3, 4], 'nationwide_number': [1, 2, 1, 1, 2], 'name': ['Gagarin, Yuri', 'Titov, Gherman', 'Glenn, John H., Jr.', 'Glenn, John H., Jr.', 'Carpenter, M. Scott'], 'original_name': ['ГАГАРИН Юрий Алексеевич', 'ТИТОВ Герман Степанович', 'Glenn, John H., Jr.', 'Glenn, John H., Jr.', 'Carpenter, M. Scott'], 'sex': ['male', 'male', 'male', 'male', 'male'], 'year_of_birth': [1934, 1935, 1921, 1921, 1925], 'nationality': ['U.S.S.R/Russia', 'U.S.S.R/Russia', 'U.S.', 'U.S.', 'U.S.'], 'military_civilian': ['military', 'military', 'military', 'military', 'military'], 'selection': ['TsPK-1', 'TsPK-1', 'NASA Astronaut Group 1', 'NASA Astronaut Group 2', 'NASA- 1'], 'year_of_selection': [1960, 1960, 1959, 1959, 1959], 'mission_number': [1, 1, 1, 2, 1], 'total_number_of_missions': [1, 1, 2, 2, 1], 'occupation': ['pilot', 'pilot', 'pilot', 'pSp', 'pilot'], 'year_of_mission': [1961, 1961, 1962, 1998, 1962], 'mission_title': ['Vostok 1', 'Vostok 2', 'MA-6', 'STS-95', 'Mercury-Atlas 7'], 'ascend_shuttle': ['Vostok 1', 'Vostok 2', 'MA-6', 'STS-95', 'Mercury-Atlas 7'], 'in_orbit': ['Vostok 2', 'Vostok 2', 'MA-6', 'STS-95', 'Mercury-Atlas 7'], 'descend_shuttle': ['Vostok 3', 'Vostok 2', 'MA-6', 'STS-95', 'Mercury-Atlas 7'], 'hours_mission': [1.77, 25.0, 5.0, 213.0, 5.0], 'total_hrs_sum': [1.77, 25.3, 218.0, 218.0, 5.0], 'field21': [0, 0, 0, 0, 0], 'eva_hrs_mission': [0.0, 0.0, 0.0, 0.0, 0.0], 'total_eva_hrs': [0.0, 0.0, 0.0, 0.0, 0.0]}

serene scaffold Mar 28, 2022, 1:40 AM

#

thin palm ```print(data.head().to_dict('list')) {'id': [1, 2, 3, 4, 5], 'number': [1, 2, 3...

thank you; one moment

#

keep in mind @thin palm that if you had only done print(df.head()), most of the columns would have been omitted, and it would be useless.

thin palm Mar 28, 2022, 1:41 AM

#

serene scaffold thank you; one moment

because if I want to find out how much time each person spent in space, it'll count duplicates. If Neil Armstrong went to space twice his first time in space was lets say 3 hours, then his next mission was 27 hours. it'll count 30 + 30 = 60. Even though it is only 30 hours in space.

quick eagle Mar 28, 2022, 1:41 AM

#

tall blaze So unless I am wrong here @serene scaffold I think he is attempting to ins...

But note that when I explicitly use numbers ([int:int]), it works just fine; the top half of https://paste.pythondiscord.com/naranaxiba shows this

tall blaze Mar 28, 2022, 1:42 AM

#

@quick eagle I think it is because you are calling the loc function vs the index function but I am not 100%. I would create a new numerical index in place of the current one:

df=df.replace_index(drop=False)

serene scaffold Mar 28, 2022, 1:43 AM

#

thin palm because if I want to find out how much time each person spent in space, it'll co...

I think you need something like df.groupby('name')['column_to_sum'].sum(), but where you replace 'column_to_sum' with a column name
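
A small sketch using the five rows pasted above (only the relevant columns). It assumes `hours_mission` is the per-mission figure and `total_hrs_sum` is the career total repeated on each of an astronaut's rows, which is what those rows suggest, and that names identify astronauts uniquely:

```python
import pandas as pd

# Subset of the pasted astronaut rows.
df = pd.DataFrame({
    "name": ["Gagarin, Yuri", "Titov, Gherman", "Glenn, John H., Jr.",
             "Glenn, John H., Jr.", "Carpenter, M. Scott"],
    "nationality": ["U.S.S.R/Russia", "U.S.S.R/Russia", "U.S.", "U.S.", "U.S."],
    "hours_mission": [1.77, 25.0, 5.0, 213.0, 5.0],
    "total_hrs_sum": [1.77, 25.3, 218.0, 218.0, 5.0],
})

# Option 1: add up the per-mission hours for each astronaut.
per_astronaut = df.groupby("name")["hours_mission"].sum()

# Option 2: the career total is already on every row, so keep one row per astronaut.
one_row_each = df.drop_duplicates(subset="name").set_index("name")["total_hrs_sum"]

print(per_astronaut)
```

Summing `total_hrs_sum` directly is what would double count, since it repeats on every mission row.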

tall blaze Mar 28, 2022, 1:43 AM

#

tall blaze @quick eagle I think it is because you are calling the loc function vs...

Hang on this is the wrong function. I just got back from vacation lol one sec

thin palm Mar 28, 2022, 1:43 AM

#

serene scaffold I think you need something like `df.groupby('name')['column_to_sum'].sum()`, but...

ahh okay, this makes sense

bold timber Mar 28, 2022, 1:43 AM

#

tall blaze What is the purpose of this model.

the purpose of the model is to predict whether diabetes or not

tall blaze Mar 28, 2022, 1:44 AM

#

tall blaze Hang on this is the wrong function. I just got back from vacation lol one sec

@quick eagle its reset_index(drop=False)

#

Then you can use numerical ranges for the loc function. Make sure that your data is sorted correctly first though

thin palm Mar 28, 2022, 1:45 AM

#

serene scaffold I think you need something like `df.groupby('name')['column_to_sum'].sum()`, but...

for plotting purposes does a box plot sound like a good idea? to see average, mean, and outliers?

tall blaze Mar 28, 2022, 1:45 AM

#

bold timber the purpose of the model is to predict whether diabetes or not

For binary classification you should be using a sigmoid output layer then

quick eagle Mar 28, 2022, 1:47 AM

#

tall blaze Then you can use numerical ranges for the loc function. Make sure that your data...

hmm.. problem is that I'm merging 4 different sensors at different sampling rates, so I have to interpolate the data when graphing (otherwise nothing appears), and it needs to be interpolated by method=time

tall blaze Mar 28, 2022, 1:47 AM

#

if the model was to predict like type of diabetes you would most likely use softmax with the number of nodes equal to the number of diabetes types

bold timber Mar 28, 2022, 1:47 AM

#

tall blaze For binary classification you should be using a sigmoid output layer then

I design the architecture as a multiclass classification

tall blaze Mar 28, 2022, 1:51 AM

#

bold timber I design the architecture as a multiclass classification

so you can do this but it is unusual and depending on the dataset it wont yield as effective of a model. How well do you know the math for the two regression models?

#

sigmoid is the same as logistic regression and is always the go-to choice for binary outputs

#

and I looked into this further, I am pretty sure you would need to have 2 output nodes with softmax regression even if you are doing a binary classifier

quick eagle Mar 28, 2022, 1:54 AM

#

tall blaze Then you can use numerical ranges for the loc function. Make sure that your data...

so I hard-coded the interpolation, and then reset the index (and used x = df['timestamp'] [a:b] ) and that works!

tall blaze Mar 28, 2022, 1:54 AM

#

quick eagle so I hard-coded the interpolation, and then reset the index (and used x = df['ti...

amazing!

bold timber Mar 28, 2022, 1:55 AM

#

tall blaze so you can do this but it is unusual and depending on the dataset it wont yield ...

I don't really know about the two regression models. I'm a beginner with neural networks and this is my first case study to learn from. Can you explain it?

bold timber Mar 28, 2022, 1:56 AM

#

tall blaze sigmoid is the same as logistic regression as is always the go to choice for bin...

In this case I use a LogSoftmax to get an Integer value to predict yes or no. If I use a Binary Classification the value is floating point in the range 0 to 1

bold timber Mar 28, 2022, 1:56 AM

#

bold timber In this case I use a LogSoftmax to get an Integer value to predict yes or no. If...

I think we can use both

tall blaze Mar 28, 2022, 1:56 AM

#

tall blaze and I looked into this further, I am pretty sure you would need to have 2 output...

without getting way too complicated: the way softmax works with 2 nodes is that the first one, 0-1 (diabetes), will represent the model's prediction for the first class and the second, 0-1, will represent the prediction for the second class (not having diabetes). Logistic predicts a single value for two mutually exclusive outcomes, so 0 will be no diabetes while 1 means that the model predicts diabetes

quick eagle Mar 28, 2022, 1:57 AM

#

tall blaze amazing!

looks like the 'timestamp' column is preserved as a datetime (so I can do x = df[timestamp][a:b] - df[timestamp][time_zero] ), but the x axis labels are '0, 0.2T, 0.4T' etc ... never seen that before. what does that 'T' notation mean? (I'm trying to basically set a start time, and then elapsed time (in min:sec) since the time_zero point.....)
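
One thing that may help is converting the timedeltas to plain seconds before plotting and building the min:sec labels by hand. The 'T' labels are possibly the plotting library showing raw nanosecond counts with an SI prefix, but that is only a guess. A sketch with made-up timestamps; `mm_ss` is a hypothetical helper:

```python
import pandas as pd

# Made-up timestamps, 45 seconds apart.
ts = pd.Series(
    pd.date_range("2022-03-26 10:00:00", periods=5, freq=pd.Timedelta(seconds=45)),
    name="timestamp",
)
time_zero = ts.iloc[1]

# Timedelta -> float seconds, which any plotting library treats as ordinary numbers.
elapsed_s = (ts - time_zero).dt.total_seconds()


def mm_ss(seconds: float) -> str:
    """Format a (possibly negative) number of seconds as m:ss."""
    sign = "-" if seconds < 0 else ""
    minutes, secs = divmod(int(abs(seconds)), 60)
    return f"{sign}{minutes}:{secs:02d}"


tick_labels = [mm_ss(s) for s in elapsed_s]
print(tick_labels)
```

The seconds can go on the x axis and the formatted strings can be used as tick text.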

tall blaze Mar 28, 2022, 2:02 AM

#

tall blaze without getting way too complicated: the way softmax works is with 2 nodes the f...

Additionally my thought is that if you dont use sigmoid it wont "let" the model know when it backward propagates that the two values are in fact mutually exclusive. So there is a possibility, in the way the math is written, for the model to predict that an individual both has diabetes and doesnt have diabetes at the same time

tall blaze Mar 28, 2022, 2:04 AM

#

tall blaze Additionally my thought is that if you dont use sigmoid it wont "let" the model ...

And although this is unlikely as the dataset prolly doesnt have those values it will lead to less efficient training and a less accurate model
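
A quick numeric check may help here. With a softmax over two output nodes, the two probabilities always sum to 1, so the classes stay mutually exclusive, and the probability of the second class equals a sigmoid applied to the difference of the two logits. A minimal sketch in plain Python (the logit values are made up):

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


z_no, z_yes = 0.3, 1.7                      # made-up logits: "no diabetes", "diabetes"
p_no, p_yes = softmax([z_no, z_yes])

# A log-softmax layer outputs log-probabilities, not integers.
log_probs = [math.log(p_no), math.log(p_yes)]

# The integer class comes from taking the argmax afterwards.
predicted = max(range(2), key=lambda i: log_probs[i])

print(p_no + p_yes, p_yes, sigmoid(z_yes - z_no), predicted)
```

So a two-node softmax and a one-node sigmoid describe the same family of predictions; the one-node version just has fewer parameters.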

tall blaze Mar 28, 2022, 2:07 AM

#

bold timber I think we can use both

one of the best ways I really learned this in my classes was writing a simple neural network from scratch: this is a fun tutorial I dug up. Try inputting both values and look at the results https://towardsdatascience.com/how-to-build-a-simple-neural-network-from-scratch-with-python-9f011896d2f3


bold timber Mar 28, 2022, 2:10 AM

#

tall blaze one of the best ways I really learned this in my classes was writing a simple ne...

Ok, thank you so much

tall blaze Mar 28, 2022, 2:11 AM

#

bold timber Ok, thank you so much

of course, this stuff is very complex. Even with a masters in it, it can seem like a blackbox that just spits out numbers. I would say really dive into the math for each of the layers. And if you want to be an actual expert get a math PhD lol.

thin palm Mar 28, 2022, 2:17 AM

#

Does this boxplot make sense to everybody? the amount of time each country has spent in space.

Screen_Shot_2022-03-27_at_8.16.47_PM.png

desert oar Mar 28, 2022, 2:18 AM

#

thin palm Does this boxplot make sense to everybody? the amount of time each country has s...

that x axis label makes no sense

#

also what is the unit here? the total time per country is one value per country

quick eagle Mar 28, 2022, 2:18 AM

#

thin palm Does this boxplot make sense to everybody? the amount of time each country has s...

I would sort from most to least. also, it should be a total - not a range?? or are you doing it by astronaut?

desert oar Mar 28, 2022, 2:18 AM

#

a boxplot represents a distribution of values

desert oar Mar 28, 2022, 2:18 AM

#

quick eagle I would sort from most to least. also, it should be a total - not a range?? or a...

this, exactly. what is each data point? per astronaut?

thin palm Mar 28, 2022, 2:18 AM

#

desert oar this, exactly. what is each data point? per astronaut?

per astronaut per country

desert oar Mar 28, 2022, 2:19 AM

#

i would at least recommend sorting by median astronaut hours or by total number of astronauts

thin palm Mar 28, 2022, 2:19 AM

#

What plots would you use?

thin palm Mar 28, 2022, 2:19 AM

#

desert oar i would at least recommend sorting by median astronaut hours or by total number ...

I like that better tbh

quick eagle Mar 28, 2022, 2:19 AM

#

then make the x axis "hours per astronaut" or similar

#

otherwise it's misleading

desert oar Mar 28, 2022, 2:20 AM

#

the title should be "Distribution of astronaut total times in space, by country"

#

and the x axis could be "Total hours spent by astronaut in space"

thin palm Mar 28, 2022, 2:20 AM

#

desert oar the title should be "Distribution of astronaut total times in space, by country"

perfect!

#

so the box plot is fine yes? Just my labels?

desert oar Mar 28, 2022, 2:20 AM

#

it's fine in that it shows something that isn't nothing. but are you trying to show something specific? or just "something"

quick eagle Mar 28, 2022, 2:21 AM

#

I would emphasize by the title "Distribution of individual astronaut total times in space, by country"

desert oar Mar 28, 2022, 2:21 AM

#

that's better

#

"individual" makes it clearer

thin palm Mar 28, 2022, 2:21 AM

#

desert oar it's fine in that it shows something that isn't nothing. but are you trying to s...

my assignment is to take this data and create something out of it. I have data on Astronauts -> so I'm figuring out which country sent the most humans, what year they were being sent, men v women being sent, etc

thin palm Mar 28, 2022, 2:22 AM

#

quick eagle I would emphasize by the title "Distribution of individual astronaut total times...

It's my job to tell a story with this data

quick eagle Mar 28, 2022, 2:22 AM

#

also, you could try a violin plot, etc. the problem you have is with the US data set - it's extremely bimodal (~2week shuttle flights and 6month ISS expeditions), which makes the boxplot have a ton of outliers

desert oar Mar 28, 2022, 2:23 AM

#

+1 for violins, great observation

quick eagle Mar 28, 2022, 2:23 AM

#

did you try just plotting individual points?

thin palm Mar 28, 2022, 2:23 AM

#

quick eagle also, you could try a violin plot, etc. the problem you have is with the US dat...

what does the violin plot tell?

desert oar Mar 28, 2022, 2:23 AM

#

it might be interesting to plot number of astronauts vs total flight hours across all astronauts on a scatterplot

thin palm Mar 28, 2022, 2:23 AM

#

quick eagle did you try just plotting individual points?

individual points in what?

quick eagle Mar 28, 2022, 2:23 AM

#

just a dot per astronaut total time per country. no boxes

thin palm Mar 28, 2022, 2:24 AM

#

quick eagle just a dot per astronaut total time per country. no boxes

so scatter plot maybe?

quick eagle Mar 28, 2022, 2:24 AM

#

I usually start with scatterplot, and if the data distribution permits, then summarize with something else (eg boxplot, etc)

thin palm Mar 28, 2022, 2:25 AM

#

quick eagle I usually start with scatterplot, and if the data distribution permits, then sum...

ok ok, data viz not my thing but it's so valuable

wicked grove Mar 28, 2022, 2:25 AM

#

Hello, i have a 3 class classification problem and i want to penalise 1 class
Does this work for that tf.nn.softmax_cross_entropy_with_logits

#

I can't understand what i should put in the place of logits and labels
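
For orientation: in `tf.nn.softmax_cross_entropy_with_logits`, `logits` are the raw model outputs before any softmax and `labels` are the target distributions (for example one-hot vectors) of the same shape. The function has no class-weight argument. If "penalise" means making mistakes on one class cost more, one common approach is to multiply each example's loss by a weight for its true class. A plain-Python sketch of that math, with made-up numbers and hypothetical weights:

```python
import math


def softmax_cross_entropy(logits, one_hot):
    """Cross-entropy between a one-hot target and softmax(logits), for one example."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_sum for z in logits]
    return -sum(t * lp for t, lp in zip(one_hot, log_probs))


class_weights = [1.0, 1.0, 3.0]   # hypothetical: errors on class 2 cost 3x

logits = [2.0, 0.5, 0.1]          # raw scores for one example, before softmax
label = [0, 0, 1]                 # one-hot target: the true class is 2

plain = softmax_cross_entropy(logits, label)

# Pick the weight of the true class and scale the loss by it.
weight = sum(w * t for w, t in zip(class_weights, label))
weighted = weight * plain

print(round(plain, 4), round(weighted, 4))
```

In TensorFlow the same idea would be the per-example loss multiplied by a weight tensor before averaging.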

quick eagle Mar 28, 2022, 2:26 AM

#

thin palm ok ok, data viz not my thing but it's so valuable

boxplots work best for normally distributed data. they can be used for non-normal distributions, but they are less useful

misty flint Mar 28, 2022, 2:26 AM

#

data viz is def something to focus on especially in industry setting

quick eagle Mar 28, 2022, 2:26 AM

#

Strongly suggest the Edward Tufte series !!!

misty flint Mar 28, 2022, 2:26 AM

#

how you persuade stakeholders is important

quick eagle Mar 28, 2022, 2:26 AM

#

(4 books)

misty flint Mar 28, 2022, 2:27 AM

#

aka half of your job sometimes kekHands

thin palm Mar 28, 2022, 2:27 AM

#

quick eagle boxplots work best for normally distributed data. they can be used for non-norma...

my approach was to see what the average time spent in space per astronaut per country. But I'll do a scatter instead

#

It's for a job hopefully I make some cool stuff here

misty flint Mar 28, 2022, 2:27 AM

#

storytelling with data by cole knaflic is another recommendation

#

praise

quick eagle Mar 28, 2022, 2:28 AM

#

thin palm my approach was to see what the average time spent in space per astronaut per co...

average is not great for spaceflight times, as they are either minutes, ~2 weeks, 6 months, or 1 year. You could try those as 3 bins

#

*4 bins
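
A minimal sketch of that 4-bin idea using `pd.cut`. The `total_days` column, the sample values, and the bin edges are all assumptions made up for illustration; they are not taken from the astronaut dataset.

```python
import pandas as pd

# Hypothetical flight durations in days (made-up values).
flights = pd.DataFrame({"total_days": [0.01, 12.0, 14.5, 180.0, 195.0, 340.0]})

# Assumed bin edges: under a day, up to ~1 month, up to ~8 months, longer.
edges = [0, 1, 30, 240, float("inf")]
labels = ["minutes", "~2 weeks", "~6 months", "~1 year"]

# Each duration is assigned to the interval (left, right] it falls into.
flights["duration_bin"] = pd.cut(flights["total_days"], bins=edges, labels=labels)

counts = flights["duration_bin"].value_counts(sort=False)
print(counts)
```

Counting per bin (and per country) avoids an average that lands between the clusters and describes no actual flight.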

thin palm Mar 28, 2022, 2:29 AM

#

quick eagle average is not great for spaceflight times, as they are are either minutes, ~2 w...

great call, I didn't process much about it. But this makes sense thank you so much

misty flint Mar 28, 2022, 2:29 AM

#

yeah so part of it is really understanding the data

#

and that comes with experience or domain knowledge

quick eagle Mar 28, 2022, 2:30 AM

#

and the obvious story that ought to pop up right away is that the Soviet flights are mostly long duration, while the US has mostly short ones due to shuttle (10-14 days, but each flight had 7 crew, whereas ISS are crews of 3 per expedition)

thin palm Mar 28, 2022, 2:31 AM

#

quick eagle average is not great for spaceflight times, as they are are either minutes, ~2 w...

how do you feel about what I have?
1.)The amount of times each country has sent someone on a space mission
2.)Plot of the amount of times only women have gone on missions (only showing which countries have sent them)
3.)Histplot showing what year each astronaut was selected for space missions (showing the year that had most declared missions)
4.)When the year of mission was actually initiated (another histplot) from 1960 - 2020

quick eagle Mar 28, 2022, 2:31 AM

#

misty flint and that comes with experience or domain knowledge

yes - knowing about the history of spaceflight lets you shortcut the wonky distributions

quick eagle Mar 28, 2022, 2:35 AM

#

thin palm how do you feel about what I have? 1.)The amount of times each country has sent...

1 could be panel A: # times by citizenship, B: total by citizen
2 - yes, although it would be 'women', not 'only women', as there haven't been any all-female crews (yet!)
3 - may be interesting when they were selected vs first flew
4- yes

#

your initial plot is 1B (with the corrections discussed)

#

"amount of times each country has sent someone on a space mission" -> "number of times a citizen of that country has gone to space" ; international partners go on US, or Russia/USSR vehicles

thin palm Mar 28, 2022, 2:40 AM

#

quick eagle 1 could be panel A: # times by citizenship, B: total by citizen 2 - yes, althoug...

I think for a final graph I will include which occupation was in space the most, we have pilot, PSP, commander, space tourist. Would be cool to see which occupation was most trusted and which went to space just for fun (space tourist)

quick eagle Mar 28, 2022, 2:42 AM

#

thin palm I think for a final graph I will include which occupation was was in space the m...

you may have to tweak that - I think you mean crew role (pilot ,commander, mission specialist, payload specialist for NASA, commander, flight engineer, spaceflight participant for RSA), not occupation/training (pilot, geologist, doctor, electrical engineer, etc)

thin palm Mar 28, 2022, 2:42 AM

#

quick eagle you may have to tweak that - I think you mean crew role (pilot ,commander, missi...

sorry by occupation I am referring to my data! Yes role in the mission

quick eagle Mar 28, 2022, 2:45 AM

#

also, commander vs pilot is a bit tricky (the 'lead pilot' was the commander, but 2 were trained as pilots for shuttle; the rest were all mission specialists, with the occasional payload specialist for shuttle). apollo was 3 pilots - lunar, command module, and commander...

mystic cloud Mar 28, 2022, 3:04 AM

#

Can someone help me with pycharm + virtualenv + jupyter notebook?
I have the venv created with inherit global options (for jupyter in global python) and tensorflow in my new virtualenv
I open the jupyter notebook, try to import tensorflow and it shows as it is not installed but it is installed ._.

desert oar Mar 28, 2022, 3:08 AM

#

quick eagle yes - knowing about the history of spaceflight lets you shortcut the wonky distr...

on the other hand, plotting the wonky distributions correctly actually helps you see the pattern

thin palm Mar 28, 2022, 4:02 AM

#

quick eagle also, commander vs pilot is a bit tricky (the 'lead pilot' was the commander, bu...

how's this puppy?

Screen_Shot_2022-03-27_at_10.02.27_PM.png

quick eagle Mar 28, 2022, 4:03 AM

#

thin palm how's this puppy?

nice! looks like you may need to try log x scale? everything is piled up on the left

thin palm Mar 28, 2022, 4:04 AM

#

quick eagle nice! looks like you may need to try log x scale? everything is piled up on the ...

yeah I moved my nationality over to the y axis because it was getting crunched up on the x axis and I couldn't figure out how to resize

quick eagle Mar 28, 2022, 4:15 AM

#

I'm trying to calculate a difference over a time period (eg 2 min), but my dataset has samples at a non-constant sample rate. is it possible to do a .diff(period=X) where X is '2min', not a set number of steps? my index is timestamp

#

(this is kind of like using .interpolate(method='time'), but with diff)

mint palm Mar 28, 2022, 5:48 AM

#

Is it ok to initially have higher accuracy than the train set....cuz they really come from the same distribution...

worn bough Mar 28, 2022, 6:44 AM

#

quick eagle I'm trying to calculate a difference over a time period (eg 2 min), but my datas...

I think you could use .rolling using a timestamp and then apply the .diff

#

(the documentation links to here: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases)
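
A rough sketch of the time-based rolling idea, assuming a Series with a `DatetimeIndex` (the timestamps and values are made up). Note that this takes the difference between the last and first sample inside each trailing 2-minute window, which is not quite the same as "value now minus value exactly 2 minutes ago".

```python
import pandas as pd

# Made-up samples at an irregular rate.
idx = pd.to_datetime([
    "2022-03-28 04:00:00",
    "2022-03-28 04:00:40",
    "2022-03-28 04:01:30",
    "2022-03-28 04:02:10",
    "2022-03-28 04:04:00",
])
s = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0], index=idx)

# An offset window such as "2min" covers (t - 2min, t] for each timestamp t.
window_diff = s.rolling("2min").apply(lambda w: w.iloc[-1] - w.iloc[0], raw=False)
print(window_diff)
```

If the exact value 2 minutes earlier is needed, interpolating onto a regular grid first and then calling `.diff()` may be a better fit.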

dusty ivy Mar 28, 2022, 7:03 AM

#

#

why does the line not fit the dots?

#


    vector<double> X = { 38, 50, 15, 30, 50, 38, 50, 20, 45, 50, 20, 35, 30, 43, 35, 37.5, 37, 35, 30, 45, 4, 37.5, 25, 46, 30, 200, 200, 30
    };
    
    // variable Y
    vector<double> Y = { 8000, 6400, 2500, 3000, 6000, 5000, 8000, 4000, 11000, 25000, 4000, 8800, 5000, 7000, 8000, 1800, 5400, 15000, 3500, 2400, 1000, 8000, 2100, 8000, 4000, 1000, 2000, 4800
    };
    
    
    double alpha = 0.0001; // learning rate
    int epoch = 1000;// number of epochs
    SimpleLinearRegression *slr = new SimpleLinearRegression(X, Y, alpha, epoch, true);
    slr->train();
    slr->print_yhat();

    
    vector<double> Y_c = slr->predict(X);

    // denormalize Y_c
    vector<double> Y_c_denormalize;
    double Y_MAX = *max_element(Y.begin(), Y.end());
    double Y_MIN = *min_element(Y.begin(), Y.end());
    double X_MAX = *max_element(X.begin(), X.end());
    double X_MIN = *min_element(X.begin(), X.end()); 

    for(int i = 0; i < Y_c.size(); i++){
        Y_c_denormalize.push_back(Y_c[i] * ((Y_MAX - Y_MIN) + Y_MIN));
    }

    
    double Y_c_MAX = *max_element(Y_c_denormalize.begin(), Y_c_denormalize.end());
    double Y_c_MIN = *min_element(Y_c_denormalize.begin(), Y_c_denormalize.end());

    // Scatter plot
    matplotlibcpp::figure_size(700, 500);
    matplotlibcpp::scatter(X, Y, 25);

    double x = 45;
    double y = slr->predict(x);
    double y_denorm = y * (Y_MAX - Y_MIN) + Y_MIN;

    cout << "Prediction of " << x << " Hours Per week is " << y_denorm << " Income" << endl;
    
    matplotlibcpp::plot({X_MIN, X_MAX}, {Y_c_MIN, Y_c_MAX}, "r");
    matplotlibcpp::xlabel("Hours per Week (x)");
    matplotlibcpp::xlim(0, 80);
    matplotlibcpp::ylabel("Income (y)");
    matplotlibcpp::title("Scatter Plot");
    matplotlibcpp::show();

#

the problem here is when I want to denormalize the value of Y_c

#

I just want to hardcode this one in C++ rather than using libraries in python
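
One thing that may be worth checking in the snippet above, assuming `SimpleLinearRegression` min-max normalizes Y internally (its code isn't shown): the loop computes `Y_c[i] * ((Y_MAX - Y_MIN) + Y_MIN)`, which simplifies to `Y_c[i] * Y_MAX`, while the single prediction further down uses `y * (Y_MAX - Y_MIN) + Y_MIN`. A small Python sketch of the difference, with made-up normalized values:

```python
def denormalize(y_norm, y_min, y_max):
    """Invert min-max scaling, i.e. y_norm = (y - y_min) / (y_max - y_min)."""
    return y_norm * (y_max - y_min) + y_min


def denormalize_as_in_loop(y_norm, y_min, y_max):
    """Parenthesization used in the loop; this equals y_norm * y_max."""
    return y_norm * ((y_max - y_min) + y_min)


y_min, y_max = 1000.0, 25000.0  # min and max of Y in the snippet

for y_norm in (0.0, 0.5, 1.0):
    print(y_norm,
          denormalize(y_norm, y_min, y_max),
          denormalize_as_in_loop(y_norm, y_min, y_max))
```

Separately, the red line is drawn from `(X_MIN, Y_c_MIN)` to `(X_MAX, Y_c_MAX)`, which only matches the fit if the slope is positive; plotting the predictions at `X_MIN` and `X_MAX` directly might be safer.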

lone drum Mar 28, 2022, 8:49 AM

#

hello i have a dataframe with a column named marks whose values are
21100 23000 25650 78550 36100 22600 22700 34550
i want to get the rows which are multiples of 100, e.g. my expected output is
21100 23000 36100 22600 22700
ping me when you reply

inland zephyr Mar 28, 2022, 9:25 AM

#

i need suggestion about image classification or related field about image processing in ML

#

is it common to consider the original image resolution and depth (eg: dpi of the image) before feeding it to the NN to build its knowledge? i'm aware that the image sizes used are generally small (between 120x120 px and 200x200 px).

#

Also, is it worth considering processing each channel separately and combining them at the end of the NN? my intuition says that for a colored image, the R, G and B channels have different values and could each tell the NN a different story about the image.

steady basalt Mar 28, 2022, 10:20 AM

#

dusty ivy why does the line not fits to the dots?

Remove outlier

maiden pelican Mar 28, 2022, 10:20 AM

#

Where can I find code for back propagation ?

dusty ivy Mar 28, 2022, 10:35 AM

#

steady basalt Remove outlier

what outlier?

maiden pelican Mar 28, 2022, 10:37 AM

#

Can somebody help me with bp neural network algorithm ?

vestal saffron Mar 28, 2022, 11:15 AM

#

lone drum hello i am having dataframe which has a column name `marks` in that values are``...

Use modulo 100
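
A short sketch of the modulo approach with pandas, using the values from the question (the DataFrame is rebuilt here so the example runs on its own):

```python
import pandas as pd

df = pd.DataFrame({"marks": [21100, 23000, 25650, 78550, 36100, 22600, 22700, 34550]})

# Keep only the rows whose marks leave no remainder when divided by 100.
multiples = df[df["marks"] % 100 == 0]
print(multiples)
```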

strange stump Mar 28, 2022, 11:17 AM

#

@misty flint i need help 😄

misty flint Mar 28, 2022, 11:45 AM

#

lol just ask. others can help too

#

it is 6am over here kekHands

mellow vapor Mar 28, 2022, 11:52 AM

#

If a single layer MLPClassifier gives me an accuracy of like 94%
does adding more layers guarantee any more precision or scope of improvement
or is it just black box testing
may or may not work?

woeful falcon Mar 28, 2022, 11:54 AM

#

!e Why is it not rounding the decimal places in the array

import numpy as np

w = np.array([9.79810329e+209,
 2.01077594e+210,
 1.57202605e+210,
 2.53363565e+210])

print(np.round(w,10))

arctic wedgeBOT Mar 28, 2022, 11:54 AM

#

@woeful falcon :white_check_mark: Your eval job has completed with return code 0.

[9.79810329e+209 2.01077594e+210 1.57202605e+210 2.53363565e+210]
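
A likely explanation, offered as a sketch: `np.round(w, 10)` rounds to 10 places after the decimal point, and numbers around 1e209 have nothing there to round, so the printed array looks unchanged. If the goal is fewer significant digits in the output, formatting is probably what's wanted instead.

```python
import numpy as np

w = np.array([9.79810329e+209,
              2.01077594e+210,
              1.57202605e+210,
              2.53363565e+210])

# Rounding to 10 decimal places leaves values of this magnitude essentially as they were.
rounded = np.round(w, 10)

# Formatting controls how many significant digits are shown.
formatted = [f"{x:.2e}" for x in w]
print(formatted)

# For printing whole arrays, the display precision can be set instead.
np.set_printoptions(precision=2)
print(w)
```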

strange stump Mar 28, 2022, 11:58 AM

#

misty flint lol just ask. others can help too

bruh i failed that

#

rip

misty flint Mar 28, 2022, 12:00 PM

#

oof

#

🕯️

#

what happened

strange stump Mar 28, 2022, 12:06 PM

#

bro

#

so

#

the data set was like

#

containership sizes

#

ok

#

i tried to see if there is some correlation between

#

the age of the ship and the size

#

sht like that

#

heatmap

#

i couldnt

#

it said there are strings in the data

#

i check on excel for strings

#

literally nothing

#

one question was to split the size of the containers into 5000 bands

#

and plot and find the distribution

#

wtf

#

"cant split strings"

#

what can i do man

#

never began i gotta work as some cleaner forever now

mild dirge Mar 28, 2022, 12:09 PM

#

You can write sentences without an enter every 3 words 😛

strange stump Mar 28, 2022, 12:09 PM

#

im sorry im just mad 😦

misty flint Mar 28, 2022, 12:09 PM

#

Blob_pat

mild dirge Mar 28, 2022, 12:09 PM

#

And if you are using pandas, you should try converting the columns to floats instead

#

if they are supposed to be floats

strange stump Mar 28, 2022, 12:10 PM

#

yeah they are

misty flint Mar 28, 2022, 12:10 PM

#

its ok bro, i failed my first takehomes as well

#

kekHands

#

it gets better with experience

strange stump Mar 28, 2022, 12:10 PM

#

well this was the analysis

#

i wrote down what i would have done

#

hopefully i can talk my way into it lol

#

tmro is the interview

#

wait do you think i can get away with trying something now after the allotted time and presenting it in the interview would it be appropriate

misty flint Mar 28, 2022, 12:12 PM

#

idk your constraints so maybe

#

kekHands

strange stump Mar 28, 2022, 12:13 PM

#

it was 30 mins

#

4 questions

misty flint Mar 28, 2022, 12:13 PM

#

thats rough

#

idk what they expect for 30 mins tbh

#

so i feel like whatever you say is fine

#

lolo

strange stump Mar 28, 2022, 12:14 PM

#

bruh i dont get it

#

30 mins

#

like

#

not enough time omg

#

its for a junior role as well

#

do they expect me to be some pro at 20 yrs old 0 experience

misty flint Mar 28, 2022, 12:15 PM

#

yeah thats def not enough time

#

for much

#

only if youre experienced would you maybe get anything valuable

strange stump Mar 28, 2022, 12:15 PM

#

oh i found out my problem

#

the fking added commas for the 10000

#

like

#

13,000

#

so the code wont see this as a number

#

i gotta split(",") like this right? or something

steady basalt Mar 28, 2022, 12:17 PM

#

dusty ivy what outlier?

There’s an outlier

#

Pulling the line up?

misty flint Mar 28, 2022, 12:19 PM

#

i think you should be able to convert datatypes

#

even with the commas

#

pithink

strange stump Mar 28, 2022, 12:19 PM

#

nah i tried man

#

😦

misty flint Mar 28, 2022, 12:20 PM

#

which function did you use

strange stump Mar 28, 2022, 12:20 PM

#

no i tried to make stuff like

#

heatmaps

#

to find correlations

#

.corr()

#

uh

#

describe()

misty flint Mar 28, 2022, 12:20 PM

#

you cant do that stuff

#

without converting datatypes first

#

you basically have dirty data

#

gotta clean it first
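
A minimal sketch of that cleaning step for numbers stored like "13,000". The column names and values are hypothetical, since the actual dataset isn't shown.

```python
import io

import pandas as pd

# Hypothetical CSV where sizes were exported with thousands separators.
raw = io.StringIO('ship,size\nA,"13,000"\nB,"8,500"\nC,"21,250"\n')

# Read as-is: the size column comes in as strings, so corr()/describe() skip or reject it.
df = pd.read_csv(raw)
df["size_clean"] = df["size"].str.replace(",", "", regex=False).astype(float)

# Alternative: let read_csv strip the separator while parsing.
raw.seek(0)
df2 = pd.read_csv(raw, thousands=",")

print(df["size_clean"].tolist())
print(df2["size"].tolist())
```

Checking `df.dtypes` right after loading usually shows this kind of problem faster than scanning the file in Excel.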

strange stump Mar 28, 2022, 12:21 PM

#

so they want me to clean it and find trends within 30 mins

#

nice

misty flint Mar 28, 2022, 12:23 PM

#

yeah its still much for a junior

#

i wouldve probably done it all in excel tbh

steady basalt Mar 28, 2022, 12:23 PM

#

Did u remove missing values

#

And encode

misty flint Mar 28, 2022, 12:23 PM

#

since you didnt really need python

steady basalt Mar 28, 2022, 12:24 PM

#

I usually at least remove nan before I do heatmap tbh

#

@strange stump yes literally everywhere is expecting 20 year olds to be pro rn

#

It’s the new meta

strange stump Mar 28, 2022, 12:25 PM

#

😦

#

i have more experience in python rex

#

i used python to analyse data for uni work

steady basalt Mar 28, 2022, 12:25 PM

#

At least u got that job dude, some of us data scientists have to claw our way up from junior roles and internships doing Excel

strange stump Mar 28, 2022, 12:25 PM

#

since it looks better on reports

steady basalt Mar 28, 2022, 12:25 PM

#

And have training in ML

strange stump Mar 28, 2022, 12:25 PM

#

steady basalt At least u got that job dude, some of us data scientists have to claw our way up...

i dont have the job this was the first half of the interview

steady basalt Mar 28, 2022, 12:26 PM

#

Ohh

#

U applied to a pretty hard job for ur skills then haha

strange stump Mar 28, 2022, 12:26 PM

#

yea i should probably just look elsewhere

steady basalt Mar 28, 2022, 12:26 PM

#

Good luck

#

Nah if u can get this done u have what it takes

strange stump Mar 28, 2022, 12:27 PM

#

probably

steady basalt Mar 28, 2022, 12:27 PM

#

But I’d not want to have to learn pandas in one day lol

#

Ok just check the data for NAN values

#

And if there’s any consider filling them with column medians or modes

#

The data is integer?

misty flint Mar 28, 2022, 12:28 PM

#

strange stump i have more experience in python rex

i see. i think you just got unlucky bud

strange stump Mar 28, 2022, 12:28 PM

#

i just converted the data into integers

#

by removing all the commas they had

misty flint Mar 28, 2022, 12:28 PM

#

ok gtg bye

strange stump Mar 28, 2022, 12:28 PM

#

cya boss

#

i checked for null values

#

isnull().sum()

#

all gave 0

#

now i try to use corr() and the output is "__"

steady basalt Mar 28, 2022, 12:29 PM

#

Cause commas?

#

Did u check data type

strange stump Mar 28, 2022, 12:29 PM

#

there arent any commas now

#

yeah

#

says int 64

steady basalt Mar 28, 2022, 12:29 PM

#

Umm

strange stump Mar 28, 2022, 12:30 PM

#

do i need them as floats

#

wait what

steady basalt Mar 28, 2022, 12:30 PM

#

Dm me a screenshot of the df and the matrix code

strange stump Mar 28, 2022, 12:30 PM

#

matrix code?

steady basalt Mar 28, 2022, 12:30 PM

#

If in doubt google ur question

strange stump Mar 28, 2022, 12:30 PM

#

do you want my python stuff?

steady basalt Mar 28, 2022, 12:31 PM

#

Did u try to google it

#

Chances are someone’s posted on stack overflow this question

strange stump Mar 28, 2022, 12:31 PM

#

"code featured in the movie matrix"

#

....

steady basalt Mar 28, 2022, 12:31 PM

#

Google pandas corr giving ___

strange stump Mar 28, 2022, 12:32 PM

#

OMG IT IS FLOATS

#

bro i kid you not i searched how to convert it to float, copied the code and it converts it into int64

steady basalt Mar 28, 2022, 12:34 PM

#

What?

#

Corr require float?

strange stump Mar 28, 2022, 12:34 PM

#

yea

steady basalt Mar 28, 2022, 12:34 PM

#

Haah TIL
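
For what it's worth, `corr()` does not need floats as far as pandas is concerned; it works on any numeric dtype, including int64. An empty result more often means the columns were still strings at that point, though that's a guess without seeing the data. A small sketch with made-up ship data:

```python
import pandas as pd

df = pd.DataFrame({
    "year_built": [2001, 2005, 2010, 2015, 2020],  # int64
    "size": [4000, 6000, 9000, 14000, 20000],      # int64
    "flag": ["PA", "LR", "PA", "MH", "LR"],         # strings, not usable in corr()
})

print(df.dtypes)

# Restrict to numeric columns, then correlate; integer columns are fine.
corr = df.select_dtypes("number").corr()
print(corr)
```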

strange stump Mar 28, 2022, 12:34 PM

#

i converted all the data into floats

#

which i thought thats what my code did when i searched for " convert column into float"

steady basalt Mar 28, 2022, 12:34 PM

#

Yeah learning process is literally how good are you at googling

strange stump Mar 28, 2022, 12:35 PM

#

nah this is bs ima complain about the time in the interview

#

but i think they will like that im trying again

#

hopefully anyway

steady basalt Mar 28, 2022, 12:35 PM

#

Well I could prob do this in under 10 mins

strange stump Mar 28, 2022, 12:35 PM

#

i feel like thats how it works for interviews

#

oh ok sorry mr pro

#

steady basalt Mar 28, 2022, 12:35 PM

#

😅

strange stump Mar 28, 2022, 12:35 PM

#

this is my heatmap i wanted

steady basalt Mar 28, 2022, 12:35 PM

#

Nice

strange stump Mar 28, 2022, 12:35 PM

#

but

steady basalt Mar 28, 2022, 12:35 PM

#

They gave u three columns?

strange stump Mar 28, 2022, 12:35 PM

#

the range is 0.992 - 1

#

they all have a strong correlation then?

steady basalt Mar 28, 2022, 12:36 PM

#

They’re highly correlated

#

Yes

strange stump Mar 28, 2022, 12:36 PM

#

just some stronger

#

nah i was just testing these columns

steady basalt Mar 28, 2022, 12:36 PM

#

Do it with all columns

#

Trust

strange stump Mar 28, 2022, 12:36 PM

#

i send a screenshot of the head of the dataset

steady basalt Mar 28, 2022, 12:36 PM

#

U can also mask half of that and save eyesore

strange stump Mar 28, 2022, 12:36 PM

#

some are worded answers

mild dirge Mar 28, 2022, 12:37 PM

#

If you haven't carefully tested something but just "played around with a lot of different configurations", how can you neatly put this in a report?
Like we empirically found that this type of model gave the best results so we used this.

steady basalt Mar 28, 2022, 12:37 PM

#

Hmm by that I mean make it a triangle
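
A sketch of the triangle idea: build a boolean mask for the upper half of the correlation matrix (diagonal included) and hide those cells. The columns and values are made up.

```python
import numpy as np
import pandas as pd

# Made-up numeric columns.
df = pd.DataFrame({
    "size": [4000, 6000, 9000, 14000],
    "age": [21, 17, 12, 7],
    "crew": [18, 20, 22, 25],
})
corr = df.corr()

# True on and above the diagonal: the cells that only repeat information.
mask = np.triu(np.ones_like(corr, dtype=bool))

# Blank out the masked cells, leaving only the lower triangle.
lower = corr.mask(mask)
print(lower)

# With seaborn installed, the same mask can be passed to the heatmap:
# sns.heatmap(corr, mask=mask, annot=True)
```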

mild dirge Mar 28, 2022, 12:37 PM

#

Or something of that nature

steady basalt Mar 28, 2022, 12:37 PM

#

I’d say that ahha

strange stump Mar 28, 2022, 12:38 PM

#

i know what youre saying yeah ill look into that later im happy i know how to do this now

steady basalt Mar 28, 2022, 12:38 PM

#

I’d just say an initial test proved certain models stronger

strange stump Mar 28, 2022, 12:38 PM

#

oh so supermoon

#

i wanna make a new column

#

Age of the ship

steady basalt Mar 28, 2022, 12:38 PM

#

Show us the matrix with all columns

strange stump Mar 28, 2022, 12:38 PM

#

2022 - the year built

#

how would i do that

steady basalt Mar 28, 2022, 12:38 PM

#

You have year built column

#

U can create a new column that passes exactly that formula
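
In pandas that formula is a one-liner. A sketch assuming the column is called `year_built` (the real column name isn't shown in the chat):

```python
import pandas as pd

# Hypothetical build years.
df = pd.DataFrame({"year_built": [1998, 2010, 2021]})

REFERENCE_YEAR = 2022

# The subtraction is applied to every row of the column at once.
df["age"] = REFERENCE_YEAR - df["year_built"]
print(df)
```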

#

Ur gona need to google syntax

#

Are u applying to data analyst?

strange stump Mar 28, 2022, 12:39 PM

#

yes

#

ok i created it nvm

steady basalt Mar 28, 2022, 12:40 PM

#

USA?

strange stump Mar 28, 2022, 12:40 PM

#

UK

steady basalt Mar 28, 2022, 12:40 PM

#

Me too

#

I was under the impression most of those jobs are rly hard but maybe it’s regional

#

I’m prob gona have to do this type of work in my first year

#

No one wants a data scientist without analyst experience anymore

#

FMl

strange stump Mar 28, 2022, 12:41 PM

#

idk man i dont think im smart enough for this

steady basalt Mar 28, 2022, 12:41 PM

#

U already done half of it

#

It doesn’t get much harder

#

Now just do some plots

#

My hint is use pairplot by seaborn to scout out areas of interest

#

I do that

#

And just use matplotlib or pandas to plot bars

#

Or distributions

rich olive Mar 28, 2022, 12:42 PM

#

Guys I'm tryna self-teach python

#

As my first language and dip into programming

strange stump Mar 28, 2022, 12:43 PM

#

oh wtf

steady basalt Mar 28, 2022, 12:43 PM

#

U are in the data science room

rich olive Mar 28, 2022, 12:43 PM

#

And I have some super basic code that's not working.

steady basalt Mar 28, 2022, 12:43 PM

#

Ahhh

#

U found the hack for graphs

rich olive Mar 28, 2022, 12:43 PM

#

You guys don't do data science in python

steady basalt Mar 28, 2022, 12:43 PM

#

I do

rich olive Mar 28, 2022, 12:44 PM

#

I am building linear regression. That is data sciency

steady basalt Mar 28, 2022, 12:44 PM

#

Nice

strange stump Mar 28, 2022, 12:44 PM

#

none of these graphs are useful imo

steady basalt Mar 28, 2022, 12:44 PM

#

They are

#

Isn’t it a key part of analysing a data set

#

U now see where all the ships were built

#

Which years got more contracts

#

Now u can plot these individually and later remove the pairplot

strange stump Mar 28, 2022, 12:45 PM

#

i got rid of the unique identifier column

#

hm

steady basalt Mar 28, 2022, 12:45 PM

#

What exactly is the task

#

If it’s general analysis what’s bad about plotting to find which years were best

#

Or correlations

#

I mean some of those are just a literal extension of ur matrix

strange stump Mar 28, 2022, 12:46 PM

#

one of the questions was to find some trends

steady basalt Mar 28, 2022, 12:46 PM

#

U can plot the trends on a graph from ur matrix

#

As u can see those highly correlated dots

strange stump Mar 28, 2022, 12:48 PM

#

ok 😄

urban marlin Mar 28, 2022, 12:56 PM

#

#

so i was trying to train a model with tensorflow object detection module and this problem came up , can anybody tell me how to change checkpoint version to V2 ?

gusty forge Mar 28, 2022, 1:43 PM

#

Is it possible to convert an opencv model to tensorflow model?

#

Ultimately I just want to use the model to run in an Android app

next phoenix Mar 28, 2022, 1:55 PM

#

Found this. For advanced Python crash course : https://medium.com/coders-mojo/python-crash-course-part-2-78acc9694997?sk=ebf1a04a790fe88f6483e0d56f12fbf2

Medium

Python Crash Course — Part 2

With Code Implementation…

tacit grail Mar 28, 2022, 1:57 PM

#

Thanks @modest mulch for response.
my application is following:
There will be an online examination system.
In the question, the attached image will be shown (the image may vary) and the student is asked to create the same in ms word.

our program collects all student's created word doc and compare with our word document.
I want to make a program that give scores based how created document is similar to provided one.

this will compare template
font size
color
font-face

steady basalt Mar 28, 2022, 1:59 PM

#

Can’t u just use ur eyes and look if it’s the same

#

To check for typos just use a text filter

#

Otherwise you’re going to need some state of the art computer vision

mild dirge Mar 28, 2022, 2:06 PM

#

tacit grail Thanks @modest mulch for response. my application is following: There w...

Wouldn't students just be able to copy the image and paste it in word?

#

What info are you planning on feeding the model?

steady basalt Mar 28, 2022, 2:10 PM

#

mild dirge Wouldn't students just be able to copy the image and paste it in word?

lol

#

This is a good example of using AI for a simple task that would require some really advanced AI or no AI

lone drum Mar 28, 2022, 2:14 PM

#

Traceback (most recent call last):
  File "D:\college_project\modules\model_train.py", line 21, in <module>
    model.add(Convolution2D(16, 3, 3, activation = 'relu'))

  File "C:\Users\shubh\anaconda3\lib\site-packages\tensorflow\python\training\tracking\base.py", line 629, in _method_wrapper
    result = method(self, *args, **kwargs)

  File "C:\Users\shubh\anaconda3\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None

  File "C:\Users\shubh\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 2013, in _create_c_op
    raise ValueError(e.message)

ValueError: Exception encountered when calling layer "conv2d_1" (type Conv2D).

Negative dimension size caused by subtracting 3 from 1 for '{{node conv2d_1/Conv2D}} = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], explicit_paddings=[], padding="VALID", strides=[1, 3, 3, 1], use_cudnn_on_gpu=true](Placeholder, conv2d_1/Conv2D/ReadVariableOp)' with input shapes: [?,1,1,8], [3,3,8,16].

Call arguments received:
  • inputs=tf.Tensor(shape=(None, 1, 1, 8), dtype=float3

how to fix this error
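
A hedged reading of that traceback: the layer received a 1x1 feature map (`[?,1,1,8]`) and a 3x3 kernel cannot fit into it. In Keras 2, `Convolution2D(16, 3, 3)` is interpreted as 16 filters, kernel size 3 and strides 3 (which matches `strides=[1, 3, 3, 1]` in the message), so the map shrinks quickly. The sketch below only reproduces the shape arithmetic for `padding="VALID"`; the 28-pixel input is an assumption, since the model code isn't shown.

```python
def valid_conv_output(size, kernel, stride=1):
    """Output length along one axis for a convolution without padding.

    A result of 0 or less means the kernel does not fit into the input.
    """
    return (size - kernel) // stride + 1


# Assumed 28-pixel input; kernel 3 with stride 3, as in the traceback.
size = 28
for layer in range(1, 5):
    size = valid_conv_output(size, kernel=3, stride=3)
    print(f"after conv layer {layer}: {size}")

# The same kernel with stride 1 shrinks the map far more slowly.
print(valid_conv_output(28, kernel=3, stride=1))
```

If a 3x3 kernel with stride 1 was intended, writing the layer as `Conv2D(16, (3, 3), activation='relu')` or adding `padding='same'` may be enough, depending on the rest of the model.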

desert oar Mar 28, 2022, 2:14 PM

#

steady basalt This a good example of using AI for a simple task that would require some really...

the old classic. so many tasks are still more or less intractable or unsolved for "AI" and machine learning, but are trivial for humans (if perhaps slow/tedious)

#

you could just look at some image similarity metric

#

but i have a feeling that will not be easy to tune and will not reliably give good results

#

you'd need to segment the image and compute similarities on various parts + some kind of graph similarity for the overall structure

mild dirge Mar 28, 2022, 2:15 PM

#

mild dirge Wouldn't students just be able to copy the image and paste it in word?

yeah but this

desert oar Mar 28, 2022, 2:16 PM

#

you'd spend 5x as long building the model as you would grading by hand

#

and yeah that too lol

mild dirge Mar 28, 2022, 2:16 PM

#

You have to use more data to make sure it's not just that

desert oar Mar 28, 2022, 2:16 PM

#

well you'd have to make it low res in the exam, too low to scale up properly to the document size

wicked grove Mar 28, 2022, 2:25 PM

#

desert oar well you'd have to make it low res in the exam, too low to scale up properly to ...

Hello, is this only for binary classification https://www.tensorflow.org/api_docs/python/tf/nn/weighted_cross_entropy_with_logits

TensorFlow

tf.nn.weighted_cross_entropy_with_logits | TensorFlow Core v2.8.0

Computes a weighted cross entropy.

misty flint Mar 28, 2022, 2:26 PM

#

steady basalt To check for typos just use a text filter

kekHands

misty flint Mar 28, 2022, 2:26 PM

#

steady basalt This a good example of using AI for a simple task that would require some really...

kekHands

#

bruh

#

first rule of google's ML

#

solve the problem without ML if you can

#

kekHands

desert oar Mar 28, 2022, 2:29 PM

#

wicked grove Hello, is this only for binary classification https://www.tensorflow.org/api_doc...

no

wicked grove Mar 28, 2022, 2:30 PM

#

desert oar no

Oh okay thank you, could you please tell me what i should put for logits in this function

desert oar Mar 28, 2022, 2:30 PM

#

wicked grove Oh okay thank you, could you please tell me what i should put for logits in this...

can you clarify this question?

#

the logits are the outputs of your model

#

basically the stuff that comes out of the final output layer, before applying softmax

#

they are called "logits" because conceptually they are the result of applying the logit function to the predicted probabilities
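
To make the logits/labels distinction concrete, here is a NumPy sketch of what a softmax cross-entropy computes conceptually. The numbers are made up, and this is an illustration of the math rather than TensorFlow's implementation: `logits` would be the raw outputs of a final 3-unit layer with no activation, and `labels` the one-hot true classes.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


logits = np.array([[2.0, 0.5, -1.0]])  # made-up raw model outputs for one sample
labels = np.array([[0.0, 1.0, 0.0]])   # one-hot: the true class is class 1

probs = softmax(logits)

# Cross-entropy: negative log-probability assigned to the true class.
loss = -(labels * np.log(probs)).sum(axis=-1)

print(probs)
print(loss)
```

Taking the log of `probs` gives the logits back up to an additive constant, which is where the name comes from.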

wicked grove Mar 28, 2022, 2:32 PM

#

desert oar basically the stuff that comes out of the final output layer, _before_ applying ...

Ohhh
Is that the y_pred?

desert oar Mar 28, 2022, 2:32 PM

#

https://stackoverflow.com/a/43577384/2954547

Stack Overflow

What is the meaning of the word logits in TensorFlow?

In the following TensorFlow function, we must feed the activation of artificial neurons in the final layer. That I understand. But I don't understand why it is called logits? Isn't that a mathemati...

desert oar Mar 28, 2022, 2:32 PM

#

wicked grove Ohhh Is that the y_pred?

no, it's the "raw" values that come out of the final layer

arctic blade Mar 28, 2022, 2:32 PM

#

What would happen if somebody made self aware ai?

serene scaffold Mar 28, 2022, 2:33 PM

#

arctic blade What would happen if somebody made self aware ai?

it would realize how horrible the world is and delete itself.

arctic blade Mar 28, 2022, 2:33 PM

#

serene scaffold it would realize how horrible the world is and delete itself.

Lol

#

I was wondering if it would be like the matrix smh😂

serene scaffold Mar 28, 2022, 2:33 PM

#

in either case, a "self-aware AI" is a long way out. the way AI is depicted in the media is just wrong.

arctic blade Mar 28, 2022, 2:34 PM

#

serene scaffold in either case, a "self-aware AI" is a long way out. the way AI is depicted in t...

Whats the most developed ai we have developed as of now?

wicked grove Mar 28, 2022, 2:34 PM

#

desert oar https://stackoverflow.com/a/43577384/2954547

Ohhh okayy,thank you so much!!
So i have a question, what do i put in the argument.. this tf.nn.weighted_cross_entropy_with_logits takes 3 arguments

serene scaffold Mar 28, 2022, 2:35 PM

#

arctic blade Whats the most developed ai we have developed as of now?

most developed ai for what? AIs are designed to solve specific problems.

wicked grove Mar 28, 2022, 2:35 PM

#

Labels,logits and pos_weight

arctic blade Mar 28, 2022, 2:35 PM

#

serene scaffold most developed ai *for what*? AIs are designed to solve specific problems.

Oh, i know nothing about ai lol, just a general curiosity

#

Idk

#

Whats the most impressive ai then

mild dirge Mar 28, 2022, 2:36 PM

#

stuff like alexa and google home is pretty impressive

serene scaffold Mar 28, 2022, 2:36 PM

#

arctic blade Whats the most impressive ai then

uhh, GPT-3 is a model that's able to generate long realistic-sounding texts, but that doesn't mean that the model actually "knows" what the text means.

agile cobalt Mar 28, 2022, 2:37 PM

#

Alpha Zero (or AlphaGo) has some nice advancements in complex-ish games
Nvidia has some crazy image manipulation stuff

desert oar Mar 28, 2022, 2:37 PM

#

yeah i'd say that the alpha-stuff is probably the most-developed for general-purpose problem solving, at least that the public knows about

arctic blade Mar 28, 2022, 2:37 PM

#

mild dirge stuff like alexa and google home is pretty impressive

Thats ai? I thought it was just a box thats told ‘heres 4000 odd different ways to ask what the weather is, if ur asked this, talk about weather’

arctic blade Mar 28, 2022, 2:37 PM

#

serene scaffold uhh, GPT-3 is a model that's able to generate long realistic-sounding texts, but...

That sounds interesting

mild dirge Mar 28, 2022, 2:37 PM

#

Ai is not machine learning per se, it's just something "intelligent"

#

very broad

desert oar Mar 28, 2022, 2:37 PM

#

arctic blade Thats ai? It thought it was just a box thats told ‘heres 4000 odd different ways...

that's literally what people thought AI would be for ~50 years. look up "expert systems" and "symbolic AI"

mild dirge Mar 28, 2022, 2:38 PM

#

But it definitely uses machine learning too

serene scaffold Mar 28, 2022, 2:38 PM

#

arctic blade Thats ai? It thought it was just a box thats told ‘heres 4000 odd different ways...

yes, there's a few AI components for those products. the intent classifier figures out what you're asking it to do. the automated question/answerer takes a question and searches for text that answers it.

desert oar Mar 28, 2022, 2:38 PM

#

the definition of "AI" is fuzzy and has been co-opted by marketing teams to sell machine learning products

arctic blade Mar 28, 2022, 2:38 PM

#

I see

wicked grove Mar 28, 2022, 2:38 PM

#


              precision    recall  f1-score   support

           0       0.96      0.97      0.97       100
           1       0.93      0.74      0.82       100
           2       0.78      0.93      0.85       100

    accuracy                           0.88       300
   macro avg       0.89      0.88      0.88       300
weighted avg       0.89      0.88      0.88       300

wicked grove Mar 28, 2022, 2:38 PM

#

desert oar no, it's the "raw" values that come out of the final layer

This is my report, and i wanted to improve class 1's recall

serene scaffold Mar 28, 2022, 2:39 PM

#

@arctic blade Google (the search engine) is an AI: it's a document retrieval system that figures out what documents (web pages) are relevant to your query.

arctic blade Mar 28, 2022, 2:39 PM

#

serene scaffold @arctic blade Google (the search engine) is an AI: it's a document retr...

I never thought about that, thats pretty cool

desert oar Mar 28, 2022, 2:40 PM

#

wicked grove Ohhh okayy,thank you so much!! So i have a question, what do i put in the argum...

pos_weight allows one to trade off recall and precision by up- or down-weighting the cost of a positive error relative to a negative error.
this is what the docs say

#

so you can try adjusting pos_weight above 1

#

the docs are pretty clear, they even have formulas

#

for one specific class, you might need to specifically assign class weights

wicked grove Mar 28, 2022, 2:40 PM

#

Yupp i saw this,but should i pass the fully connected layer as the logits

wicked grove Mar 28, 2022, 2:41 PM

#

desert oar for one specific class, you might need to specifically assign class weights

Ohhh

desert oar Mar 28, 2022, 2:41 PM

#

ah, it looks like pos_weight can be a vector

#

so you can assign different weights to different classes

#

A coefficient to use on the positive examples, typically a scalar but otherwise broadcastable to the shape of logits. Its value should be non-negative.
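
To make the quoted docs concrete, here is a minimal plain-Python sketch of the per-element loss the TensorFlow docs give for `tf.nn.weighted_cross_entropy_with_logits`: `labels * -log(sigmoid(logits)) * pos_weight + (1 - labels) * -log(1 - sigmoid(logits))`. The helper name `weighted_bce` and the example logit are made up for illustration, and this is not the TF implementation (which uses a numerically stable form).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def weighted_bce(label, logit, pos_weight=1.0):
    """Hypothetical helper: per-element weighted sigmoid cross-entropy."""
    p = sigmoid(logit)
    return pos_weight * label * -math.log(p) + (1.0 - label) * -math.log(1.0 - p)

# A missed positive (label 1, negative logit) costs more as pos_weight grows,
# which pushes the model toward higher recall on that class.
miss_plain = weighted_bce(1.0, -2.0, pos_weight=1.0)
miss_heavy = weighted_bce(1.0, -2.0, pos_weight=3.0)

# Negative examples are unaffected by pos_weight.
neg_plain = weighted_bce(0.0, -2.0, pos_weight=1.0)
neg_heavy = weighted_bce(0.0, -2.0, pos_weight=3.0)

print(miss_plain, miss_heavy, neg_plain, neg_heavy)
```

Since only the positive term is scaled, a `pos_weight` above 1 trades precision for recall, and a vector of weights does the same per class.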

wicked grove Mar 28, 2022, 2:41 PM

#

wicked grove Ohhh

I wanted to do this, i'll look at that

wicked grove Mar 28, 2022, 2:43 PM

#

desert oar > A coefficient to use on the positive examples, typically a scalar but otherwis...

Ohh okayy,thank youu!! Just the last question, how can i pass the raw outputs

desert oar Mar 28, 2022, 2:43 PM

#

what do you mean?

#

how can you access the values without applying softmax?

#

you could just not put the softmax layer on the nn, and apply it manually when generating predictions. but i'm not sure what actual tf users do, let me see if i can figure it out
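
A minimal sketch of that idea in plain Python (not TensorFlow code; the `softmax` helper and the logit values are made up): the network's last layer stays linear, the loss gets the raw logits, and softmax is applied only when probabilities are wanted. Softmax preserves ordering, so the predicted class is the same either way.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

raw_logits = [2.0, -1.0, 0.5]  # illustrative final-layer outputs, no activation
probs = softmax(raw_logits)    # applied manually at prediction time

# The argmax is identical on logits and on probabilities.
pred_from_logits = max(range(len(raw_logits)), key=raw_logits.__getitem__)
pred_from_probs = max(range(len(probs)), key=probs.__getitem__)

print(probs, pred_from_logits, pred_from_probs)
```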

wicked grove Mar 28, 2022, 2:45 PM

#

desert oar how can you access the values without applying softmax?

Yeahh
tf.nn.weighted_cross_entropy_with_logits(
labels, logits, pos_weight, name=None
) like for this, i can put y for labels ,i gotta put the raw outputs for logits right?

desert oar Mar 28, 2022, 2:45 PM

#

yes, do not apply softmax to the logits. weighted_cross_entropy_with_logits does that internally

#

this is described in the page for https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits softmax_cross_entropy_with_logits

TensorFlow

tf.nn.softmax_cross_entropy_with_logits | TensorFlow Core v2.8.0

Computes softmax cross entropy between logits and labels.

#

ah wait

#

yeah nvm

wicked grove Mar 28, 2022, 2:46 PM

#

desert oar this is described in the page for https://www.tensorflow.org/api_docs/python/tf/...

Ohhhh,yes i came across this but is it the same as weighted cross entropy loss?

desert oar Mar 28, 2022, 2:46 PM

#

weighted_cross_entropy_with_logits applies sigmoid, not softmax

#

the docs say that

#

sorry i misread

wicked grove Mar 28, 2022, 2:47 PM

#

Ohh okayy, it's alrightt

wicked grove Mar 28, 2022, 2:47 PM

#

desert oar this is described in the page for https://www.tensorflow.org/api_docs/python/tf/...

Should i use this instead? but it isn't similar to weighted cross entropy loss, is it?

modest mulch Mar 28, 2022, 2:52 PM

#

@desert oar yo man, do you have any idea on using GANs for generating objects on images (the output of the GAN could be fed directly into an object detector)

misty flint Mar 28, 2022, 3:23 PM

#

your question doesnt make any sense. you can just do one and then the other afterwards

desert oar Mar 28, 2022, 3:31 PM

#

wicked grove Should i use this instead? but it isn't similar to weighted cross entropy loss, is it?

sigmoid applies the sigmoid function to each component individually, softmax applies softmax to all components

#

sigmoid is for independent binary classes, softmax is for 1 mutually exclusive set of classes
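
The difference shows up on a small made-up example: sigmoid scores every class on its own, so the outputs need not sum to 1, while softmax makes the classes compete for a single unit of probability mass. A plain-Python sketch (helper names are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.0, 1.0, -3.0]  # made-up example values

independent = [sigmoid(z) for z in logits]  # each class judged separately
exclusive = softmax(logits)                 # one distribution over all classes

print(independent, sum(independent))  # the sum is not constrained
print(exclusive, sum(exclusive))      # the sum is 1
```

For a 3-class problem where each sample has exactly one label, the softmax form is the matching one.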

wicked grove Mar 28, 2022, 3:36 PM

#

desert oar sigmoid is for independent binary classes, softmax is for 1 mutually exclusive s...

Okayy yess correct, thank you so much ,i understood

wicked grove Mar 28, 2022, 3:37 PM

#

desert oar this is described in the page for https://www.tensorflow.org/api_docs/python/tf/...

But this doesn't have the weights options so that i can penalize one class

#

https://gist.github.com/wassname/ce364fddfc8a025bfab4348cf5de852d do you think this is incorrect

Gist

Keras weighted categorical_crossentropy (please read comments for updated version) - keras_weighted_categorical_crossentropy.py
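
For reference, the general idea of a class-weighted softmax cross-entropy can be sketched in plain Python as `-sum_c w_c * y_c * log softmax(logits)_c`. This is not the code from the gist and has not been checked against it; the helper names, weights, and logits below are assumptions for illustration.

```python
import math

def log_softmax(logits):
    m = max(logits)
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_total for z in logits]

def weighted_categorical_ce(one_hot, logits, class_weights):
    """Hypothetical helper: -sum_c w_c * y_c * log softmax(logits)_c."""
    logp = log_softmax(logits)
    return -sum(w * y * lp for w, y, lp in zip(class_weights, one_hot, logp))

class_weights = [1.0, 2.0, 1.0]  # assumption: up-weight class 1 to favour its recall
logits = [0.2, -0.4, 1.1]        # made-up raw outputs for one example

# Same example, true class 1, with and without the class weights.
loss_weighted = weighted_categorical_ce([0.0, 1.0, 0.0], logits, class_weights)
loss_unweighted = weighted_categorical_ce([0.0, 1.0, 0.0], logits, [1.0, 1.0, 1.0])

print(loss_weighted, loss_unweighted)
```

Errors on class 1 examples count double here, while examples of the other classes contribute the same loss as before.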

mint palm Mar 28, 2022, 3:42 PM

#

#

#

which is better?

grave frost Mar 28, 2022, 3:42 PM

#

arctic blade Whats the most impressive ai then

GPT3 - the fact that it can learn things is pretty insane

mint palm Mar 28, 2022, 3:44 PM

#

both were trained on something like this

grave frost Mar 28, 2022, 3:44 PM

#

on the surface, it often looks like a "stochastic parrot" (I certainly thought so too) but it's only after some digging that one actually understands how much it can do compared to previous methods

#

as much as people hate calling it "intelligent" on the internet - those are usually ones posting blogs who live in an extreme, expecting GPT3 to be skynet-like AGI

#

while in the academic community, it's mostly GPT3's meta-learning capabilities that really astound. It's completely unexpected, was never thought to be emergent, yet the model managed to do it a bit... just by being pre-trained on next-token prediction 🤔

mild dirge Mar 28, 2022, 3:50 PM

#

mint palm

This one gives better test accuracy, so if this test data represent new data well, then this one is better

#

Also using too many epochs can cause overfitting

#

The model converged way before 5 epochs, let alone 30

mint palm Mar 28, 2022, 3:51 PM

#

yup

mint palm Mar 28, 2022, 3:51 PM

#

mild dirge The model converged way before 5 epochs, let alone 30

i will try one 100 epoch run....then see how it fits.....for the report i will adjust epochs accordingly

mild dirge Mar 28, 2022, 3:52 PM

#

why try 100?

#

30 is too many

mint palm Mar 28, 2022, 3:52 PM

#

i wanna see....

mild dirge Mar 28, 2022, 3:52 PM

#

you want to see it overfitting? 😛

mint palm Mar 28, 2022, 3:52 PM

#

i want to add in a report....isnt lower epoch graph very noisy

mild dirge Mar 28, 2022, 3:52 PM

#

you can average it over multiple runs
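
Averaging over runs can be as simple as taking the per-epoch mean across a few training runs with different seeds. A small sketch with made-up numbers (the `runs` values are illustrative, not anyone's results):

```python
from statistics import mean

# Made-up validation accuracies: one list per training run (different seeds),
# one entry per epoch.
runs = [
    [0.70, 0.82, 0.85],
    [0.74, 0.80, 0.87],
    [0.72, 0.84, 0.86],
]

# zip(*runs) groups the values epoch by epoch across the runs.
mean_curve = [mean(epoch_values) for epoch_values in zip(*runs)]

print(mean_curve)
```

Plotting `mean_curve` instead of a single run smooths out the noise without needing more epochs.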

zinc sparrow Mar 28, 2022, 3:53 PM

#

Hey all! Old timer AI guy here - used to run my own C++ libraries - how are y'all running performant python code?

mint palm Mar 28, 2022, 3:53 PM

#

mint palm both were trained on something like this

and what do you guys think of this, i tried like 30 models before i settled on this

mild dirge Mar 28, 2022, 3:53 PM

#

zinc sparrow Hey all! Old timer AI guy here - used to run my own C++ libraries - how are y'al...

Using pytorch

zinc sparrow Mar 28, 2022, 3:53 PM

#

Gotcha... So precompiled code, eh?

#

Cool, thanks

mild dirge Mar 28, 2022, 3:53 PM

#

You can run it on your cpu and gpu

#

multiple gpu's / machines even

zinc sparrow Mar 28, 2022, 3:53 PM

#

Yeah, that's a given

mild dirge Mar 28, 2022, 3:54 PM

#

yeah but it has a nice api to do that, you don't have to figure that all out yourself from scratch

zinc sparrow Mar 28, 2022, 3:54 PM

#

Just needed to confirm my suspicion and settle an argument I've had with a colleague who claimed that python was as performant as C -eyeroll-

mild dirge Mar 28, 2022, 3:54 PM

#

well pytorch is mostly written in c++ iirc

zinc sparrow Mar 28, 2022, 3:55 PM

#

mild dirge yeah but it has a nice api to do that, you don't have to figure that all out you...

Yeah... I'll check the APIs, Thanks!

zinc sparrow Mar 28, 2022, 3:55 PM

#

mild dirge well pytorch is mostly written in c++ iirc

Yeah, but it's not python 😄 The argument has another background... Just a young fellow trying to flex his python skills on me saying he could write better-performing code with pure python...

#

Sorry about off topic, carry on!

mild dirge Mar 28, 2022, 3:56 PM

#

ah right haha. Well it'll run pretty fast but it's not the python code making it happen 😛

lone drum Mar 28, 2022, 4:10 PM

#

my error: https://paste.pythondiscord.com/citaneweho
my code: https://paste.pythondiscord.com/abohecepas

#

```python
Traceback (most recent call last):

  File "D:\college_project\modules\untitled0.py", line 56, in <module>
    model.fit(train_generator, epochs=5, validation_data=validation_generator)

  File "C:\Users\shubh\anaconda3\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None

  File "C:\Users\shubh\anaconda3\lib\site-packages\tensorflow\python\eager\execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,

InvalidArgumentError: Graph execution error:
```

mint palm Mar 28, 2022, 4:10 PM

#

#

this is 10 epochs....sound good?

lone drum Mar 28, 2022, 4:11 PM

#

lone drum ```python Traceback (most recent call last): File "D:\college_project\modules...

i tried searching SO and tuning the parameters but unable to fix this error ping me when u reply

mild dirge Mar 28, 2022, 4:12 PM

#

btw how does your model have 94% accuracy on epoch 0?

agile cobalt Mar 28, 2022, 4:13 PM

#

mint palm this is 10 epochs....sound good?

yeah suspiciously high
if that's for a classification task, how imbalanced are your categories?

mild dirge Mar 28, 2022, 4:13 PM

#

Assuming epoch 0 is not trained

wicked grove Mar 28, 2022, 4:14 PM

#

```python
loss = tf.nn.softmax_cross_entropy_with_logits(labels=[[1. 0. 0.] [0. 0. 1.] [0. 1. 0.]], logits=output, axis=-1, name=None)
```

#

i am getting an invalid syntax error

#

i cant understand it

mint palm Mar 28, 2022, 4:16 PM

#

agile cobalt yeah suspiciously high if that's for a classification task, how imbalanced are y...

yes classification
3 category
2 : 1.2 : 1.25

#

i have 200,000 of category A
120,000 of B
126,000 of c

agile cobalt Mar 28, 2022, 4:18 PM

#

maybe double check if there's any data leakage?

mild dirge Mar 28, 2022, 4:18 PM

#

you should probably just stop after epoch 2

#

Well with that much data it can be possible as long as the function is not too complex

mint palm Mar 28, 2022, 4:19 PM

#

agile cobalt maybe double check if there's any data leakage?

i have checked many times...i just think i'm overlooking something....its becoming a nightmare....

#

this is my code:

#

https://colab.research.google.com/drive/1P31Cf0dnlifL5RcrcIbO9OZudnZsJV_V?usp=sharing

Google Colaboratory

agile cobalt Mar 28, 2022, 4:33 PM

#

maybe see how well it works with other forms of train/test splitting

#

the simplest would be just leaving a slightly larger group out, and not shuffling the data
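
A sketch of that kind of split, assuming the rows are kept in their original order and the last chunk is held out (the helper name `holdout_split` and the 30% fraction are made up):

```python
def holdout_split(rows, test_fraction=0.3):
    """Hypothetical helper: hold out the last rows as the test set, no shuffling."""
    n_test = max(1, round(len(rows) * test_fraction))
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]

rows = list(range(10))  # stand-in for dataset rows in their original order
train, test = holdout_split(rows, test_fraction=0.3)

print(train, test)
```

If the score drops a lot compared with a shuffled split, that hints that near-duplicate or neighbouring rows were ending up on both sides of the shuffled split.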

mint palm Mar 28, 2022, 4:35 PM

#

ok

wicked grove Mar 28, 2022, 4:41 PM

#

agile cobalt maybe see how well it works with other forms of train/test splitting

Hello,could you please help me with this error

loss = tf.nn.softmax_cross_entropy_with_logits(labels=[[1. 0. 0.] [0. 0. 1.] [0. 1. 0.]], logits=output, axis=-1, name=None)

agile cobalt Mar 28, 2022, 4:44 PM

#

you cannot simply copy paste a numpy array like that

mint palm Mar 28, 2022, 4:45 PM

#

agile cobalt you cannot simply copy paste a numpy array like that

#

same

#

but i checked the dataset

#

its quite shuffled....like category a, b, c is present in chunks of 100

wicked grove Mar 28, 2022, 4:45 PM

#

agile cobalt you cannot simply copy paste a numpy array like that

it's one hot encoded,but what should i do?

#

i tried with floating points but that gave errors as well, so idk what i should do for the labels
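
The syntax error comes from the missing commas: the printed form of a NumPy array is not a Python literal. A small check in plain Python, with the same one-hot values written as a nested list (in practice the existing label variable, e.g. `y`, would be passed instead of pasted values):

```python
# The printed form of a NumPy array has no commas, so Python cannot parse it.
pasted = "[[1. 0. 0.] [0. 0. 1.] [0. 1. 0.]]"
try:
    compile(pasted, "<pasted>", "eval")
    pasted_is_valid = True
except SyntaxError:
    pasted_is_valid = False

# The same one-hot labels as a valid nested list, with commas.
labels = [[1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0],
          [0.0, 1.0, 0.0]]

print(pasted_is_valid, labels)
```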

bold timber Mar 28, 2022, 4:47 PM

#

my friend wants to install anaconda, but he gets this. how to fix this?

mild dirge Mar 28, 2022, 4:48 PM

#

whatever place they try to install it in has 2 spaces in the name which apparently anaconda does not like

misty flint Mar 28, 2022, 4:48 PM

#

mild dirge multiple gpu's / machines even

did someone say multiple gpu's ID_blurryeyes

#

look @serene scaffold

#

RunFail

bold timber Mar 28, 2022, 4:51 PM

#

mild dirge whatever place they try to install it in has 2 spaces in the name which apparent...

ok thank you

misty flint Mar 28, 2022, 5:04 PM

#

i wonder who makes that call thats like

#

this model is way too big we need to train on multiple gpu's

#

kekHands

#

maybe its more like, this is never converging

#

lets try multiple gpu's

#

kekHands

serene river Mar 28, 2022, 5:18 PM

#

Hi, sorry for bothering, i'm encountering a problem at plotting streamline in matplotlib. I plotted vectors but then i need to use Euler's method (or Runge Kutta) to trace the streamlines. I have no idea on how to start and what result I should get
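
One way to get started, as a rough sketch: pick a seed point, look up the velocity there, take a small step along it, and repeat (forward Euler). The `velocity` field below is a made-up rigid rotation standing in for the real data, and the step size and step count are arbitrary assumptions.

```python
import math

def velocity(x, y):
    """Hypothetical vector field: rigid rotation about the origin, (u, v) = (-y, x)."""
    return -y, x

def trace_streamline(x0, y0, step=0.01, n_steps=100):
    """Forward Euler: repeatedly step along the local velocity vector."""
    points = [(x0, y0)]
    x, y = x0, y0
    for _ in range(n_steps):
        u, v = velocity(x, y)
        x, y = x + step * u, y + step * v
        points.append((x, y))
    return points

path = trace_streamline(1.0, 0.0)
xs = [p[0] for p in path]
ys = [p[1] for p in path]

print(len(path), path[-1])
```

The result is a list of points that can be drawn with `plt.plot(xs, ys)` on top of the vector plot. For this field the exact streamline is a circle, and Euler drifts slowly outward, which is the error a Runge-Kutta step reduces. For velocities on a regular grid, matplotlib's `streamplot` may already do what is needed.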

inland mantle Mar 28, 2022, 5:19 PM

#

I’m still learning about a career I want to do, so I am thinking of choosing a career in deep learning with a specialty in computer vision

#

Does computer graphics include computer vision?

elfin merlin Mar 28, 2022, 5:27 PM

#

Hey guys, I have a computer vision problem. I am using OpenCV but since there is no computer vision chat I figure data science is the closest thing to the problem that I am having. I am using OpenCV in python. I have a color image and a binary mask image (0 or 255). I want to instead have a color image with the mask applied.

```python
full_mask_bgr = cv2.cvtColor(full_mask, cv2.COLOR_GRAY2BGR)
full_mask_bgr[full_mask_bgr == 255] = 1.0
img2 = np.multiply(img, full_mask_bgr)
```

I am able to do this with these three lines. First: convert from grayscale to bgr, then convert all the 255 (white) values of the mask to 1, then multiply the original bgr image by the 0/1 mask.
The only problem with this solution is that it's slow as hell. Is there a better way to do this?

misty flint Mar 28, 2022, 5:28 PM

#

inland mantle Does computer graphics include computer vision?

hmm you usually see the two going separate routes but im sure theres ways to combine them (i.e. GANs + 3D modeling, etc.)

#

maybe try doing a project in both and seeing how you feel about it

inland mantle Mar 28, 2022, 5:29 PM

#

Yeah I’ll see I’m just exploring rn

misty flint Mar 28, 2022, 5:30 PM

#

same im interested in 3 things atm

#

hopefully i decide on 1 before i graduate kekHands

#

~~which is soon~~

elfin merlin Mar 28, 2022, 5:30 PM

#

I love computer vision but Im probably going to go into app dev instead

misty flint Mar 28, 2022, 5:30 PM

#

mobile or web

elfin merlin Mar 28, 2022, 5:30 PM

#

Mobile (probably)

misty flint Mar 28, 2022, 5:31 PM

#

gotcha. theres still opportunities to apply CV in that space

#

i also dont know the answer to your question since im not really a CV guy kekHands

#

we also did our stuff with matlab, which has tons of image processing functions DoggoKek

elfin merlin Mar 28, 2022, 5:32 PM

#

I think theres got to be a way to do it in numpy or opencv but the way I did it is so roundabout and my camera is now like 1 fps

misty flint Mar 28, 2022, 5:34 PM

#

yeah someone who knows opencv well could probs answer your question

elfin merlin Mar 28, 2022, 5:34 PM

#

I got to figure this out. Our robotics competition is this friday and those 3 lines of code are slowing down the robot

inland mantle Mar 28, 2022, 5:39 PM

#

misty flint i also dont know the answer to your question since im not really a CV guy <:kekH...

Would CV and augmented reality work hand in hand

misty flint Mar 28, 2022, 5:51 PM

#

elfin merlin I got to figure this out. Our robotics competition is this friday and those 3 l...

oh shoot...i would try asking in ML-specific servers or at different times since some peeps live in dif timezones

misty flint Mar 28, 2022, 5:52 PM

#

inland mantle Would CV and augmented reality work hand in hand

hmm idk

#

one is more i believe analyzing the data, while the other is generating it

#

PikaThink

#

but maybe

inland mantle Mar 28, 2022, 5:53 PM

#

Ah I see

steady basalt Mar 28, 2022, 5:53 PM

#

arctic blade What would happen if somebody made self aware ai?

Depends if it’s a human

misty flint Mar 28, 2022, 5:53 PM

#

inland mantle Ah I see

i think if you get really good at maybe GANs, you could produce better AR stuff

#

blobhyperthink

steady basalt Mar 28, 2022, 5:54 PM

#

In my opinion if we built a neural network that’s a 1:1 replica of the brain and raised it like a child would it know if it’s inside a computer ? If so it wud wana kill itself

mild dirge Mar 28, 2022, 5:58 PM

#

supermoon, this is going to be hard to break it to you but...

steady basalt Mar 28, 2022, 6:21 PM

#

mild dirge supermoon, this is going to be hard to break it to you but...

But?

misty flint Mar 28, 2022, 6:33 PM

#

but...

tight flare Mar 28, 2022, 6:33 PM

#

elfin merlin Hey guys, I have a computer vision problem. I am using openCV but since there i...

is full_mask_bgr the binary image? The binary image that you're converting to BGR and setting all the 255 values to 1 and then multiplying it with another BGR image?

acoustic peak Mar 28, 2022, 6:43 PM

#

elfin merlin Hey guys, I have a computer vision problem. I am using openCV but since there i...

Instead of np.multiply in your third line, can you delete your second line and use cv2.bitwise_and(img, full_mask_bgr) instead? It should work if full_mask_bgr only contains 0 and 255.
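
The same bitwise-AND idea can be sketched in NumPy alone, assuming `img` is a uint8 BGR array and `full_mask` is a single-channel uint8 array holding only 0 and 255 (the tiny arrays below are made-up stand-ins). As far as I know, `cv2.bitwise_and(img, img, mask=full_mask)` also accepts the single-channel mask directly, which would skip the colour conversion too.

```python
import numpy as np

# Made-up stand-ins: a 2x2 BGR image and a single-channel 0/255 mask.
img = np.array([[[10, 20, 30], [40, 50, 60]],
                [[70, 80, 90], [100, 110, 120]]], dtype=np.uint8)
full_mask = np.array([[255, 0],
                      [0, 255]], dtype=np.uint8)

# 255 is all ones in uint8, so a bitwise AND keeps the pixel and 0 clears it.
# The added trailing axis broadcasts the mask over the three colour channels,
# so no GRAY2BGR conversion or multiply is needed.
masked = img & full_mask[:, :, np.newaxis]

# Reference result using the multiply approach from the question.
reference = img * (full_mask[:, :, np.newaxis] // 255)

print(masked)
```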

misty flint Mar 28, 2022, 6:46 PM

#

ah here are the peeps that know opencv

#

kekHands

gilded bobcat Mar 28, 2022, 6:46 PM

#

Hi all, I had a question on using "feature importance" with sklearn

#

Namely, I ran a tree and took the most important features to determine if an animal would be adopted (so 0 or 1). I get these results.

#

However my question is this: How do I know if these features are important to classify the observation as adopted (1) or not adopted (0)?

#

Like "Sex upon Intake Unknown" is def important, but is it important for classifying an obs as 1 or as 0?!

agile cobalt Mar 28, 2022, 6:49 PM

#

that depends on:

1. which model are you using
2. what is the scale of the variables
3. what is the intercept (assuming that it has one)

#

if it's a LogisticRegression with the default parameters, check the model's intercept_

gilded bobcat Mar 28, 2022, 6:51 PM

#

Good points, I am having a hard time on thinking of how to use that info to determine this. If it helps:

1. Using AdaBoost over my data
2. Nothing is scaled in any particular fashion, so no change from raw (I read that scaling data for trees does nothing, but maybe actually harms the predictive power, not too sure)
3. None, it's a tree (?)
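
For tree ensembles, the impurity-based `feature_importances_` values are non-negative and carry no sign, so on their own they do not say whether a feature pushes toward 1 or 0. One simple check is to compare the outcome rate with the feature on versus off. The rows below are made-up toy data, not real shelter records, and the helper name is hypothetical.

```python
# Made-up toy rows: (sex_unknown_at_intake, adopted). Not real shelter data.
rows = [(1, 0), (1, 0), (1, 1), (0, 1), (0, 1), (0, 0), (0, 1), (1, 0)]

def adoption_rate(rows, feature_value):
    """Hypothetical helper: share of adopted animals among rows with this flag value."""
    outcomes = [adopted for flag, adopted in rows if flag == feature_value]
    return sum(outcomes) / len(outcomes)

rate_when_set = adoption_rate(rows, 1)
rate_when_unset = adoption_rate(rows, 0)

# Negative: the flag goes with fewer adoptions. Positive: with more.
direction = rate_when_set - rate_when_unset

print(rate_when_set, rate_when_unset, direction)
```

This only looks at one feature at a time. Partial dependence plots (available in `sklearn.inspection`) are a more model-aware way to ask the same question.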

#

Very much learning

#

So sorry for obvious stupid replies lol