potent sky Jun 7, 2023, 6:23 PM

#

Using latest_checkpoint() method? I think it takes as arg the directory where you have those checkpoints

past meteor Jun 7, 2023, 6:24 PM

#

I'm not using it for anything particularly important downstream hence why I was worried about the overengineering part. At best it's just something we might use if we're doing workshops and we want to quickly generate data on a topic if we want to talk about it.

brave sand Jun 7, 2023, 6:24 PM

#

potent sky Using latest_checkpoint() method? I think it takes as arg the directory where yo...

I used this to generate the .ckpt file:

import tensorflow as tf

# Path to the .meta file
meta_path = '/home/ethan/UAS4STEM/models/checkpoints/model.ckpt.meta'

# Create a session
with tf.compat.v1.Session() as sess:
    # Restore the graph structure from the .meta file
    saver = tf.compat.v1.train.import_meta_graph(meta_path)
    # Restore (load) the variable values from the checkpoint prefix; this does not write a .ckpt file
    saver.restore(sess, '/home/ethan/UAS4STEM/models/checkpoints/model.ckpt')

potent sky Jun 7, 2023, 6:25 PM

#

past meteor I'm not using it for anything particularly important downstream hence why I was ...

are there privacy protection concerns? for the real data you're using to condition this model

potent sky Jun 7, 2023, 6:26 PM

#

brave sand I used this to generate the .ckpt file: ```cpp import tensorflow as tf # Path t...

ngl these days I want to give up as soon as I see tf.compat.v1
where are you following this tutorial from

brave sand Jun 7, 2023, 6:26 PM

#

potent sky ngl these days I want to give up as soon as I see `tf.compat.v1` where from are ...

I googled it, not a tutorial. the tutorial doesn't go into enough depth

potent sky Jun 7, 2023, 6:26 PM

#

brave sand I used this to generate the .ckpt file: ```cpp import tensorflow as tf # Path t...

this loads the ckpt file, not generates it right?

brave sand Jun 7, 2023, 6:27 PM

#

oh

#

what should it be then? also, what is a good value for num_steps?

past meteor Jun 7, 2023, 6:27 PM

#

potent sky are there privacy protection concerns? for the real data you're using to conditi...

No, it's really something like where we want to quickly generate high-quality looking BS data on, say, health: someone draws up the PGM with some realistic-looking values / relationships and we spit out the output from the VAE (or directly from the PGM)

brave sand Jun 7, 2023, 6:27 PM

#

and batchsize

potent sky Jun 7, 2023, 6:27 PM

#

brave sand what should it be then? also, what is a good value for `num_steps`?

check out the tf docs for tf.train.latest_checkpoint()

I think it takes as arg the directory where you have those checkpoints

brave sand Jun 7, 2023, 6:28 PM

#

my batch_size is 64 and my num_steps is 1000

potent sky Jun 7, 2023, 6:29 PM

#

past meteor No, it's really something like where we want to quickly generate high-quality lo...

then maybe you can use the original data with either some augmentation, rule based generation, or purely a VAE on top
your idea is exciting tho

brave sand Jun 7, 2023, 6:29 PM

#

potent sky check out the tf docs for `tf.train.latest_checkpoint()` > I think it takes as a...

I used this guide from zestar
https://stackoverflow.com/questions/41265035/tensorflow-why-there-are-3-files-after-saving-the-model

Stack Overflow

TensorFlow, why there are 3 files after saving the model?

Having read the docs, I saved a model in TensorFlow, here is my demo code:

Create some variables.

v1 = tf.Variable(..., name="v1")
v2 = tf.Variable(..., name="v2")
...

Add an op to initialize ...

past meteor Jun 7, 2023, 6:29 PM

#

potent sky then maybe you can use the original data with either some augmentation, rule bas...

There is no original data

potent sky Jun 7, 2023, 6:29 PM

#

brave sand what should it be then? also, what is a good value for `num_steps`?

that really depends on your data, your learning problem, and other hparams
there is no universally good number

potent sky Jun 7, 2023, 6:30 PM

#

brave sand and batchsize

same for this, but you don't want to keep it too small, as the gradient updates might be "volatile" (noisy) then

past meteor Jun 7, 2023, 6:30 PM

#

It's truly a case of drawing up a PGM for any topic you care about, picking values and then just simulating 😛
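
The "draw up a PGM, pick values, simulate" idea can be sketched as ancestral sampling from a tiny hand-made Bayesian network: sample the parent nodes first, then each child given its parents. Everything below is an assumption made up for illustration (the variables `smoker`, `exercises`, `systolic_bp`, `hypertension` and every probability and parameter); none of it comes from real data.

```python
import random

def sample_patient(rng):
    """Draw one synthetic record by sampling parents before children."""
    # Root nodes: sampled from their (made-up) marginal probabilities.
    smoker = rng.random() < 0.25
    exercises = rng.random() < 0.40

    # Child node: blood pressure depends on both parents.
    # The conditional is Normal(mean, 10) with a made-up shift per parent.
    mean_bp = 120.0 + (12.0 if smoker else 0.0) - (6.0 if exercises else 0.0)
    systolic_bp = round(rng.gauss(mean_bp, 10.0), 1)

    # Grandchild node: a deterministic rule on the blood pressure value.
    hypertension = systolic_bp >= 140.0

    return {
        "smoker": smoker,
        "exercises": exercises,
        "systolic_bp": systolic_bp,
        "hypertension": hypertension,
    }

def simulate(n, seed=0):
    """Generate n independent synthetic records, reproducibly."""
    rng = random.Random(seed)
    return [sample_patient(rng) for _ in range(n)]

rows = simulate(1000)
smoker_rate = sum(r["smoker"] for r in rows) / len(rows)
print(f"share of smokers: {smoker_rate:.2f}")
```

Swapping the normal distribution for any other conditional only changes the line that draws `systolic_bp`; the parents-before-children order is the part that matters.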

brave sand Jun 7, 2023, 6:30 PM

#

potent sky that really depends on your data, your learning problem, and other hparams there...

oh alright

#

yeah, i think the problem is that the .ckpt file is 0 bytes

#

which means it'll be stuck on the loading thing

#

i have no idea why it's 0 bytes though
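
A quick way to see what is actually on disk is to group the files in the checkpoint directory by prefix and flag the empty ones. This is only a sketch: it assumes the usual TF1 `Saver` layout, where `model.ckpt` is a prefix shared by `model.ckpt.index`, `model.ckpt.meta` and `model.ckpt.data-00000-of-00001` rather than a file of its own, and `inspect_checkpoints` / `empty_files` are made-up helper names, not TensorFlow functions.

```python
import os
import re
from collections import defaultdict

# Suffixes TensorFlow normally appends to a checkpoint prefix.
_SUFFIX = re.compile(r"\.(index|meta|data-\d{5}-of-\d{5})$")

def inspect_checkpoints(directory):
    """Map each checkpoint prefix to {file name: size in bytes}."""
    groups = defaultdict(dict)
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        match = _SUFFIX.search(name)
        # Files without a known suffix (e.g. a bare "model.ckpt") are kept
        # under their own name so that stray or empty files stand out.
        prefix = name[: match.start()] if match else name
        groups[prefix][name] = os.path.getsize(path)
    return dict(groups)

def empty_files(groups):
    """Return the names of all zero-byte files, sorted."""
    return sorted(
        name
        for files in groups.values()
        for name, size in files.items()
        if size == 0
    )

# Typical use (the path is a placeholder):
#   groups = inspect_checkpoints("/path/to/checkpoints")
#   print(groups, empty_files(groups))
```

If only a bare `model.ckpt` shows up as empty while the `.index`, `.meta` and `.data-...` files have content, the prefix `.../model.ckpt` is what `saver.restore` expects, and `tf.train.latest_checkpoint(checkpoint_dir)` returns that prefix (or `None` if no checkpoint state is found).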

potent sky Jun 7, 2023, 6:32 PM

#

past meteor There is no original data

ah then graphical model looks like a good idea to capture the relationships between the variables
what about rule based generation

brave sand Jun 7, 2023, 6:32 PM

#

the other 2 .ckpt files aren't 0 bytes

past meteor Jun 7, 2023, 6:32 PM

#

I haven't heard of rule based generation, I'll look it up

potent sky Jun 7, 2023, 6:32 PM

#

past meteor It's truly a case of drawing up a PGM for any topic you care about, picking valu...

also wdym by PGM

past meteor Jun 7, 2023, 6:32 PM

#

probabilistic graphical model, sorry

potent sky Jun 7, 2023, 6:34 PM

#

past meteor I haven't heard of rule based generation, I'll look it up

it won't be very robust, I think you must be aware of it under a different name
but what you could do otherwise is use a domain specific Generative model and use rule-based conditioning instead of explicit rule based generation

#

that should also make it a bit non-boring xd

#

I think there was a paper on smtg similar but I'm not sure

potent sky Jun 7, 2023, 6:35 PM

#

brave sand the other 2 .ckpt files aren't 0 bytes

how big is the .data-00000-of-00001 file

brave sand Jun 7, 2023, 6:35 PM

#

potent sky how big is the .data-00000-of-00001 file

67.5 MB

#

is that bad?

potent sky Jun 7, 2023, 6:38 PM

#

brave sand 67.5 MB

also how did you save these checkpoint files

potent sky Jun 7, 2023, 6:38 PM

#

brave sand is that bad?

no, you should be good

brave sand Jun 7, 2023, 6:38 PM

#

https://towardsdatascience.com/custom-object-detection-using-tensorflow-from-scratch-e61da2e10087
step 8

Medium

Custom Object Detection using TensorFlow from Scratch

Custom Dataset Training for Object Detection using TensorFlow | Dog Detection in Real time Videos | Perfect Guide for Object Detection

past meteor Jun 7, 2023, 6:39 PM

#

potent sky it won't be very robust, I think you must be aware of it under a different name ...

What is rule based conditioning? p(X|c_1) ~ N(a_1,b_1) and p(X|c_2) ~ N(a_2, b_2)? (sorry for the many questions)

potent sky Jun 7, 2023, 6:45 PM

#

past meteor What is rule based conditioning? p(X|c_1) ~ N(a_1,b_1) and p(X|c_2) ~ N(a_2, b_...

Yep yep
Wait I'm not sure about representing it as a normal distribution

potent sky Jun 7, 2023, 6:46 PM

#

past meteor What is rule based conditioning? p(X|c_1) ~ N(a_1,b_1) and p(X|c_2) ~ N(a_2, b_...

And np with the questions lol I learn a lot too

past meteor Jun 7, 2023, 6:46 PM

#

Yeah, I just took that one but it could be any distribution

#

It's been a while since I've done anything related to graphical models but I think that idea might be subsumed by them

potent sky Jun 7, 2023, 6:47 PM

#

past meteor What is rule based conditioning? p(X|c_1) ~ N(a_1,b_1) and p(X|c_2) ~ N(a_2, b_...

I was thinking if you have a good p(c_1|X) model and a good p(X) model then you could get p(X|c_1) as k·p(c_1|X)·p(X), where k = 1/p(c_1) is just the normalizing constant (Bayes' rule)

#

By model here I mean anything which can replicate the desired distribution
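
A minimal numeric sketch of that relation for a discrete X, where the normalizing constant is one over the sum of p(c_1|x)·p(x) across all x. The prior `p_x` and the classifier outputs `p_c_given_x` are made-up numbers, and `posterior` is a hypothetical helper name.

```python
def posterior(p_x, p_c_given_x):
    """Return p(X | c) from p(X) and p(c | X) via Bayes' rule."""
    # Unnormalised product p(c | x) * p(x) for every value x.
    joint = {x: p_c_given_x[x] * p_x[x] for x in p_x}
    # The normalising constant is 1 / p(c), with p(c) = sum of the joint.
    p_c = sum(joint.values())
    return {x: value / p_c for x, value in joint.items()}

# Hypothetical numbers: X is an age band, c is "has the condition".
p_x = {"young": 0.5, "middle": 0.3, "old": 0.2}
p_c_given_x = {"young": 0.1, "middle": 0.2, "old": 0.5}

p_x_given_c = posterior(p_x, p_c_given_x)
print(p_x_given_c)
```

For a continuous X the sum is usually not available in closed form; one common workaround is to sample from the p(X) model and keep each sample with probability p(c_1|X), which yields samples from p(X|c_1) without computing the constant.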

past meteor Jun 7, 2023, 6:48 PM

#

I think I'll just go over my old slides again but in the meantime I'll also think of conceptually simpler ways to do this 😛

potent sky Jun 7, 2023, 6:48 PM

#

past meteor It's been a while since I've done anything related to graphical models but I thi...

Maybe, my experience with graphical models is also limited

past meteor Jun 7, 2023, 6:49 PM

#

One of the many things of university that are squarely under YAGNI

potent sky Jun 7, 2023, 6:49 PM

#

past meteor I think I'll just go over my old slides again but in the meantime I'll also thin...

I always like an elegant statistical solution xd

past meteor Jun 7, 2023, 6:49 PM

#

Then you'll enjoy spending time with graphical models. They're really at the intersection of CS, stats and domain knowledge

#

https://dtai.cs.kuleuven.be/problog/ was peak YAGNI 😢

potent sky Jun 7, 2023, 6:50 PM

#

Yeah I was collaborating on a graphical models x federated learning project but I was mostly handling the fl part
Gotta dive into graphical models properly

potent sky Jun 7, 2023, 6:52 PM

#

past meteor https://dtai.cs.kuleuven.be/problog/ was peak YAGNI 😢

I hadn't come across this so far will have to check it out sometime xd

potent sky Jun 7, 2023, 6:53 PM

#

brave sand https://towardsdatascience.com/custom-object-detection-using-tensorflow-from-scr...

Yep this seems cool

#

Should work

past meteor Jun 7, 2023, 6:54 PM

#

Not a lot of value - many of my former profs peaked in the late 90s early 00s so a lot of time was spent on esoteric things like that or restricted boltzmann machines when more relevant techniques existed. Took "offense" with the RBMs because I'm a relatively recent grad and covering other generative methods in more detail would've been more relevant but I digress

potent sky Jun 7, 2023, 6:56 PM

#

past meteor Not a lot of value - many of my former profs peaked in the late 90s early 00s so...

Hmm fair enough
RBMs really kicked off the whole hierarchical representation thing tho, very impressive

#

But relevance is important yeah. Maybe a good overview of RBMs and then they could've moved on xd

past meteor Jun 7, 2023, 7:00 PM

#

RBMs are somewhat relevant but Hebbian learning and hopfield networks probably less so: https://arxiv.org/abs/2008.02217

potent sky Jun 7, 2023, 7:05 PM

#

past meteor RBMs are somewhat relevant but Hebbian learning and hopfield networks probably l...

Okay I'm not at all familiar with hopfield networks

#

Looks interesting. And that too goes onto the impossible pile of "I'll check it out"

past meteor Jun 7, 2023, 7:06 PM

#

Put it at the bottom of that pile. Either way, enjoy your day 🙂

potent sky Jun 7, 2023, 7:06 PM

#

Haha sure, you too!

brave sand Jun 7, 2023, 7:53 PM

#

@potent sky hey do you mind explaining batch size?

crimson summit Jun 7, 2023, 7:55 PM

#

thank you !!!!!

#

I would not worry about whether models have progressed or not. I would learn the basics inside and out, because then you will have a deep understanding that will allow you to use and build with the current models much better. I followed along building the neural network in a book called Make Your Own Neural Network by Tariq Rashid. It was good and got the job done, but there are def way better tutorials out there. I would also watch and understand all of the videos 3blue1brown has put out on YouTube about neural networks

potent sky Jun 7, 2023, 8:02 PM

#

brave sand @potent sky hey do you mind explaining batch size?

It's how many data instances are processed together (in batches)
It has several advantages like making better use of parallel processing, stabilizing gradient updates, introducing controlled stochasticity, etc.

brave sand Jun 7, 2023, 8:03 PM

#

oh ok, so not something i should be worrying about

potent sky Jun 7, 2023, 8:03 PM

#

You'd select your batch size depending on how your data is distributed and how many instances you want to process before taking a learning step
In case of larger models, it also depends on how much memory you have
Popular choices for batch size are 16, 32, 64 etc
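
A minimal sketch of what "processing instances in batches" means in practice, using plain Python lists instead of a real data loader. The dataset of 1000 integers is a stand-in for training examples, and `iter_batches` is a made-up helper name.

```python
import math
import random

def iter_batches(data, batch_size, shuffle=True, seed=0):
    """Yield successive mini-batches covering the data once (one epoch)."""
    indices = list(range(len(data)))
    if shuffle:
        # Shuffling is where the "controlled stochasticity" comes from.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]

dataset = list(range(1000))  # stand-in for 1000 training examples
batch_size = 64

# One learning step is taken per batch, so this is the steps per epoch.
steps_per_epoch = math.ceil(len(dataset) / batch_size)
batch_sizes = [len(b) for b in iter_batches(dataset, batch_size)]
print(steps_per_epoch, batch_sizes[-1])
```

With the batch_size of 64 and num_steps of 1000 mentioned earlier, and assuming one step means one batch, training would go through 64 × 1000 = 64,000 examples in total.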

potent sky Jun 7, 2023, 8:04 PM

#

brave sand oh ok, so not something i should be worrying about

Keep a decent batch size and no

brave sand Jun 7, 2023, 8:04 PM

#

https://www.youtube.com/watch?v=amURyS6CAaY
around 20:22, he sets the checkpoint to ckpt-0, I don't get how though, what does that represent

YouTube

techzizou

Train a Deep Learning model for custom object detection using Tenso...

This video shows step by step tutorial on how to train an object detection model for a custom dataset using TensorFlow 2.x. The custom object trained here is a face mask.


brave sand Jun 7, 2023, 8:04 PM

#

potent sky Keep a decent batch size and no

my batch_size right now is 64 with 16 gb ram

potent sky Jun 7, 2023, 8:04 PM

#

Cool enough

past meteor Jun 7, 2023, 8:05 PM

#

brave sand my batch_size right now is `64` with 16 gb ram

How much VRAM or are you not training on GPU?

brave sand Jun 7, 2023, 8:05 PM

#

I am training on GPU

#

8 GB vram

#

the process is always killed though:
2023-06-07 16:05:21.075792: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_24' with dtype resource
         [[{{node Placeholder/_24}}]]
Killed

vernal ore Jun 7, 2023, 8:25 PM

#

Is there a xgboost-gpu version for Windows 10? When I do pip install xgboost-gpu i get the error: ERROR: Could not find a version that satisfies the requirement xgboost-gpu (from versions: none)
ERROR: No matching distribution found for xgboost-gpu

past meteor Jun 7, 2023, 8:43 PM

#

vernal ore Is there a xgboost-gpu version for Windows 10? When I do pip install xgboost-gpu...

you should install regular xgboost; running it on GPU is controlled by one of its parameters
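
A sketch of what the GPU parameter mentioned above looks like. To my knowledge the names differ by release: before XGBoost 2.0 the GPU was selected with `tree_method="gpu_hist"`, while 2.0 and later use `tree_method="hist"` together with `device="cuda"`. The helper `gpu_params` is made up for illustration and only builds the parameter dict, so check the docs for your installed version.

```python
def gpu_params(xgboost_version):
    """Build GPU-related training parameters for an XGBoost version string."""
    major = int(xgboost_version.split(".")[0])
    if major >= 2:
        # XGBoost 2.x: the device is chosen separately from the tree method.
        return {"tree_method": "hist", "device": "cuda"}
    # XGBoost 1.x: the GPU histogram algorithm is its own tree method.
    return {"tree_method": "gpu_hist"}

# Typical use (requires xgboost and a CUDA-capable GPU):
#   import xgboost as xgb
#   model = xgb.XGBClassifier(**gpu_params(xgb.__version__))
print(gpu_params("1.7.5"))
print(gpu_params("2.0.3"))
```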

brave sand Jun 8, 2023, 1:33 AM

#

Can someone please help with this error:

         [[{{node Placeholder/_0}}]]
2023-06-07 21:29:23.493493: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_29' with dtype int64
         [[{{node Placeholder/_29}}]]
Killed

hollow meadow Jun 8, 2023, 2:58 AM

#

I need help with the code for paraphrasing the text, can someone here help or should I ask in another channel?

serene scaffold Jun 8, 2023, 3:15 AM

#

hollow meadow I need help with the code for paraphrasing the text, can someone here help or /i...

You can ask here

hollow meadow Jun 8, 2023, 6:31 AM

#

serene scaffold You can ask here

I need help with the code I can't input the text in the text area anymore

alpine temple Jun 8, 2023, 8:39 AM

#

Way too ambiguous if you're after anything reliable.

#

Hey room, I'm trying to think of a way in which I can analyse the results of two different classification models that I made with 37 classes.

#

Here is a confusion matrix which is a little misleading, because the scales are not the same.

#

EfficientNet:

#

This is SqueezeNet

#

I also have the confusion matrix for each of these stored in a DataFrame.

past meteor Jun 8, 2023, 8:44 AM

#

top 5 accuracy and top 1 accuracy?
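
A small sketch of how top-1 and top-k accuracy can be computed from per-class scores, so both models get comparable numbers. The scores below are made up (3 samples, 4 classes instead of 37), and `top_k_accuracy` is a hypothetical helper written here, not a library call.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; keep the first k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)

# Made-up scores: one row per sample, one column per class.
scores = [
    [0.10, 0.60, 0.20, 0.10],  # true class 1: top-1 hit
    [0.40, 0.10, 0.30, 0.20],  # true class 2: only a top-2 hit
    [0.50, 0.30, 0.15, 0.05],  # true class 3: miss even at k=2
]
labels = [1, 2, 3]

print(top_k_accuracy(scores, labels, k=1))
print(top_k_accuracy(scores, labels, k=2))
```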

alpine temple Jun 8, 2023, 8:45 AM

#

1.) I will be adjusting the scales on the confusion matrix images.
2.) Do you have any ideas on how I should review why certain breeds of animal might have been misclassified?
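
One way to approach both points, sketched under assumptions: row-normalizing each confusion matrix puts the two models on the same 0 to 1 scale, and sorting the off-diagonal rates points at the breed pairs worth inspecting by eye. The class names and counts are made up, and `row_normalize` / `most_confused` are hypothetical helper names.

```python
def row_normalize(matrix):
    """Turn counts into per-true-class rates so that each row sums to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized

def most_confused(matrix, class_names, top=3):
    """List the largest off-diagonal rates as (true, predicted, rate)."""
    rates = row_normalize(matrix)
    pairs = [
        (class_names[i], class_names[j], rates[i][j])
        for i in range(len(rates))
        for j in range(len(rates))
        if i != j and rates[i][j] > 0
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)[:top]

# Made-up 3-class counts; rows are true classes, columns are predictions.
names = ["beagle", "basset", "pug"]
counts = [
    [40, 9, 1],
    [12, 28, 0],
    [2, 1, 47],
]
for true, pred, rate in most_confused(counts, names):
    print(f"{true} -> {pred}: {rate:.2f}")
```

With the matrices already in a DataFrame, the same normalization is `df.div(df.sum(axis=1), axis=0)`, assuming rows are the true classes.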

potent sky Jun 8, 2023, 8:45 AM

#

hollow meadow I need help with the code I can't input the text in the text area anymore

!paste

arctic wedgeBOT Jun 8, 2023, 8:45 AM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

alpine temple Jun 8, 2023, 8:46 AM

#

past meteor top 5 accuracy and top 1 accuracy?

I'll def add that to the shortlist.

#

And by shortlist, I mean I'll do that.

hollow meadow Jun 8, 2023, 8:52 AM

#

potent sky !paste

I see so I can use that to paste the code?

#

thank you

potent sky Jun 8, 2023, 8:52 AM

#

hollow meadow I see so I can use that to paste the code?

yep and then u can share the link here

hollow meadow Jun 8, 2023, 9:03 AM

#

https://paste.pythondiscord.com/ucokovugib

I am not so good at this topic yet, but maybe someone here is more knowledgeable. I am working with NLTK and other NLP tools. My goal is to paraphrase the text by using grammatical rules to identify synonyms, conjugate them, and then substitute them into the text. But the conjugation rules don't work properly: the words in the text stay in their base form and don't even get replaced when a synonym was found. That's quite a big problem, but maybe someone can at least help me resolve the grammatical rules not working; maybe I went wrong somewhere when writing the algorithm

#

or advise me where I can ask about this so someone can help

alpine temple Jun 8, 2023, 10:53 AM

#

hollow meadow or advise me where I can ask about this so someone can help

I would be recommending that you look into some of the latest research into Deep Learning and Transformers with Attention.

hollow meadow Jun 8, 2023, 10:54 AM

#

alpine temple I would be recommending that you look into some of the latest research into Deep...

Okay

alpine temple Jun 8, 2023, 10:54 AM

#

I would also specifically be looking at something like BERT.

#

Which is really good at understanding context.

vital widget Jun 8, 2023, 11:42 AM

#

Anyone working with GLCM, LBP and GABOR filters? I need to ask a few questions about my homework

rose dagger Jun 8, 2023, 12:13 PM

#

Are learning curves like these a sign of overfitting? Or does it look fine and i should just add more epochs?

mild dirge Jun 8, 2023, 12:18 PM

#

It looks like the accuracy is all over the place, did you test it on a very small number of test examples?

#

I doubt you can draw very many conclusions from this

rose dagger Jun 8, 2023, 12:19 PM

#

mild dirge It looks like the accuracy is all over the place, did you test it on a very smal...

Training data consists of 1280 images (320 images, each image rotated 4 times by 90°). Validation data consists of 320 images (80 images, each image rotated 4 times by 90°)

cold osprey Jun 8, 2023, 12:20 PM

#

hmm the acc graph looks weird

rose dagger Jun 8, 2023, 12:20 PM

#

indeed it looks weird lol

mild dirge Jun 8, 2023, 12:20 PM

#

Yeah either your model changes very chaotically, or the measurement isn't very accurate

#

80 images (duplicated 3 times each) isn't that much data

#

Try running the model like 10 times, and then average over those 10 runs

#

Also think it's a bit sketchy to augment your test data, normally you want to keep it as is

rose dagger Jun 8, 2023, 12:22 PM

#

mild dirge Try running the model like 10 times, and then average over those 10 runs

Good idea.
My entire dataset consists of roughly 1600 images, so max i can do is ~1300 images as test data, then augment extra test data by rotating to get ~5000 images as train data.

mild dirge Jun 8, 2023, 12:23 PM

#

You shouldn't generally augment your test data

rose dagger Jun 8, 2023, 12:23 PM

#

Sorry i meant training data

mild dirge Jun 8, 2023, 12:24 PM

#

How have you split it now into train/validate/test?

#

Could also use k-fold cross validation

rose dagger Jun 8, 2023, 12:27 PM

#

mild dirge How have you split it now into train/validate/test?

I chose a random subset of size 400 of the dataset, augmented extra data, then split 80% into training data, 20% into validation data, i.e. no test data so far. The plots only show accuracy on training + validation data.

rose dagger Jun 8, 2023, 12:27 PM

#

mild dirge Could also use k-fold cross validation

Will do

mild dirge Jun 8, 2023, 12:27 PM

#

split it 80/20 into train and test. Perform k-fold cross-validation on the train data

#

And you could even run k-fold more times to average it over more runs
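
The split described above could look roughly like this: hold out 20% as a test set once, then run k-fold on the remaining indices. It is a plain-Python sketch with made-up helper names (`train_test_split_indices`, `k_fold`), using the 400 original images mentioned earlier; splitting before augmentation also keeps rotated copies of one image from landing on both sides of a split.

```python
import random

def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and hold out a fixed test set."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n * test_fraction))
    return indices[n_test:], indices[:n_test]

def k_fold(indices, k=5):
    """Yield (train, validation) index lists, one pair per fold."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

train_idx, test_idx = train_test_split_indices(400)
for fold_number, (tr, va) in enumerate(k_fold(train_idx, k=5)):
    # Augment only `tr` here (e.g. the 90-degree rotations); keep `va` and
    # `test_idx` as original images so the scores stay comparable.
    print(fold_number, len(tr), len(va))
```

Averaging the validation metric over the folds (and, if needed, over several seeds) gives a steadier estimate than a single noisy curve.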

rose dagger Jun 8, 2023, 12:31 PM

#

Thank you so much for the quick help!

brave sand Jun 8, 2023, 12:44 PM

#

brave sand Can someone *please* help with this error: ```2023-06-07 21:29:23.493108: I tens...

does anyone know?

uncut iris Jun 8, 2023, 12:45 PM

#

https://colab.research.google.com/drive/1Z7f2nackDDo6vmBrTT6qdWyjGekZAh-6?usp=sharing

does anyone know why the result is so bad when I try it on a new image?

Google Colaboratory

quick bay Jun 8, 2023, 1:28 PM

#

Does anyone have any experience with classification of time series data?

past meteor Jun 8, 2023, 1:33 PM

#

quick bay Does anyone have any experience with classification of time series data?

What is your setting? Do you have an entire time series and you want to classify it?

quick bay Jun 8, 2023, 1:41 PM

#

What I mean is like this.
I have a time series data set of sea level anomaly.
The feature in the dataset is 'sea level anomaly' and it has 3 classes (so the data is labeled) .
Class 0 : normal
Class 1: anomaly-nontsunami
Class 2: anomaly-tsunami
So basically, i would like to build a model to predict the classes based on the 'sea level anomaly'

past meteor Jun 8, 2023, 1:44 PM

#

Yes but it's still not clear.

Do you have a sequence of data points for which you want to make 1 prediction? For example, a full day's worth of measurements and you want to predict if an anomaly will occur the next day.

Or is your case one where you want to predict something for each subsequent input in the sequence?

quick bay Jun 8, 2023, 1:49 PM

#

Each subsequent input in the sequence

past meteor Jun 8, 2023, 1:53 PM

#

123456789 is our series

123 -> label(4) | 234 -> label(5) ...

That's how your problem is structured right?

#

What you could do is make rolling features. You take the last n values and compute some stuff on them, for example summary statistics, and that's your feature vector

#

What you have now is a standard multi-class classification problem. Packages like tsfresh do this out of the box (I'm sure sktime and others have similar functionality): https://tsfresh.readthedocs.io/en/latest/text/forecasting.html
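
A hand-rolled sketch of the rolling-feature idea, using the 1..9 series from above with a window of 3, so that values 1, 2, 3 are paired with the label at step 4. The labels and the choice of summary statistics are made up, and `rolling_examples` is a hypothetical helper, not part of tsfresh or sktime.

```python
from statistics import mean, pstdev

def rolling_examples(series, labels, window=3):
    """Pair summary features of the last `window` values with the next label."""
    examples = []
    for t in range(window, len(series)):
        past = series[t - window:t]  # values strictly before time t
        features = {
            "mean": mean(past),
            "std": pstdev(past),
            "min": min(past),
            "max": max(past),
            "last_diff": past[-1] - past[-2],
        }
        examples.append((features, labels[t]))
    return examples

series = [1, 2, 3, 4, 5, 6, 7, 8, 9]
labels = [0, 0, 0, 0, 1, 1, 2, 0, 0]  # made-up class per time step
examples = rolling_examples(series, labels, window=3)
print(examples[0])
```

Each `(features, label)` pair is then an ordinary row for any multi-class classifier.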

quick bay Jun 8, 2023, 1:58 PM

#

Would you like to discuss it in private chat? I'll send the dataset and what I've been working on, maybe you can evaluate whether what I'm doing makes sense or not?

past meteor Jun 8, 2023, 1:58 PM

#

Better that we discuss it here. I might not be available later and there's smarter people in this channel that can correct me if I make a mistake so it's in your best interest as well 🙂

quick bay Jun 8, 2023, 2:01 PM

#

this is the data set

📎 IDSL-302_MARINAJAMBU_out_tsunami.csv

#

I have a note that shows me at what times the normal, anomaly tsunami and anomaly non-tsunami events occurred

#

So i labeled it in the code

#

Do you think what i've been doing is right?

#

Please give me some enlightenment

#

the sea level anomaly is from
sea level - tide

#

it is in meters

past meteor Jun 8, 2023, 2:08 PM

#

So your features are the last 35 lags?

quick bay Jun 8, 2023, 2:09 PM

#

including the sea level anomaly 0, which is the original sea level anomaly value, so there are 37 in total

past meteor Jun 8, 2023, 2:09 PM

#

The notebook is quite long, looking at it from afar it seems good

quick bay Jun 8, 2023, 2:10 PM

#

sea level anomaly 0 - 35 and the minute

past meteor Jun 8, 2023, 2:10 PM

#

It's very close to what I proposed, the extra I proposed was essentially making more / different features on the basis of your lags. Whether or not those make sense is problem dependent

#

I'm not a fan of class weighting to solve imbalance, better to reason about the precision-recall tradeoff by yourself and pick an operating point

#

Finally, the only xgboost hyperparameter I'd tune is the number of estimators. It's the one that has the most impact

hollow meadow Jun 8, 2023, 2:12 PM

#

alpine temple Which is really good at understanding context.

Thank you, will look into it

quick bay Jun 8, 2023, 2:13 PM

#

can you explain what is 'the precision-recall tradeoff by yourself and pick an operating point'?

#

sorry, i'm a newbie

#

and english is not my first language

#

you're right, the number of estimators had a big impact on the model

past meteor Jun 8, 2023, 2:16 PM

#

Don't worry - the output of your model is a probability right? Under normal circumstances (in the 0 vs 1 case) above 0.5 => 1 and under 0.5 => 0. You used class weights to deal with the imbalance. Instead of using class weights you could, for instance, look at the distribution of the scores your model is giving. You can plot the precision and recall as a function of the threshold (the operating point), i.e. the minimum score needed to predict 1, as you raise or lower it from its default of 0.5

#

So, depending on whether you find false positives or false negatives worse, you can pick your own threshold instead of the default
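
A tiny sketch of that threshold sweep for the binary case, with made-up scores and labels. For the three-class problem above, one option (an assumption, not something from the notebook) is to treat the tsunami class as 1 and everything else as 0. `precision_recall` is a helper written here, not a library function.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when predicting 1 for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    # Convention: with no positive predictions, precision is reported as 1.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up model scores for 10 samples; label 1 is the positive class.
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.55, 0.60, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

for threshold in (0.3, 0.5, 0.7):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

Raising the threshold trades recall for precision; if a missed tsunami is far worse than a false alarm, a lower threshold than 0.5 is the natural operating point.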

brave sand Jun 8, 2023, 2:27 PM

#

does the xml file path matter for the data?

quick bay Jun 8, 2023, 2:33 PM

#

past meteor Don't worry - the output of your model is a probability right? Under normal circ...

yes it is. It seems like i need to read about precision-recall tradeoff so that i would have better understanding

#

thank you so much

karmic void Jun 8, 2023, 5:56 PM

#

Sklearn or tensorflow?

reef lantern Jun 8, 2023, 5:58 PM

#

hey, is anyone here using R or learning it?

potent sky Jun 8, 2023, 6:00 PM

#

karmic void Sklearn or tensorflow?

Both have their uses. Tensorflow is more of a deep learning framework, besides having an entire ecosystem associated with it (model formats, dataset file formats, serving, tflite, tfx, tfhub etc.)
sklearn has its own uses

past meteor Jun 8, 2023, 6:11 PM

#

reef lantern hey, is anyone here using R or learning it?

Sure what questions do you have?

serene scaffold Jun 8, 2023, 6:28 PM

#

reef lantern hey, is anyone here using R or learning it?

R is mainly for scientists (by which I include economists and mathematicians) who wouldn't necessarily consider themselves programmers. I don't think there's any way to effectively do deep learning in R. Why do you ask?

dusty valve Jun 8, 2023, 6:34 PM

#

I have a bunch of data of x and y coordinates, the shape is (n, 5000, 2), so is there any library that can render 5000 points and updates 30 times a second? Ive been told pygame will not be suitable, but maybe matplotlib will be or pyglet

past meteor Jun 8, 2023, 6:35 PM

#

serene scaffold R is mainly for scientists (by which I include economists and mathematicians) wh...

RIP in my colleagues like to use R 😩

serene scaffold Jun 8, 2023, 6:35 PM

#

past meteor RIP in my colleagues like to use R 😩

what do they do

past meteor Jun 8, 2023, 6:36 PM

#

Write unmaintainable spaghetti

serene scaffold Jun 8, 2023, 6:36 PM

#

but like what are they trying to do

past meteor Jun 8, 2023, 6:36 PM

#

Data science

#

But the R people don't do deep learning at all afaik

#

There are "core" non-DL models that are better supported in R than in Python like ARIMA and GAMs to name 2. The Python version(s) are respectively poorer ports and unmaintained. Is it worth the effort? Largely depends on what you do I guess. Especially considering the overall experience in R is way worse than Python, both syntax and semantics.

agile cobalt Jun 8, 2023, 6:40 PM

#

dusty valve I have a bunch of data of x and y coordinates, the shape is `(n, 5000, 2)`, so i...

matplotlib can sort of do it, not sure if I'd really recommend it though

lapis sequoia Jun 8, 2023, 6:42 PM

#

Anyone here think they could recreate this within matplotlib or another python plotting library?

potent sky Jun 8, 2023, 6:55 PM

#

serene scaffold R is mainly for scientists (by which I include economists and mathematicians) wh...

Yep. There is tensorflow for R and keras for R but they're community maintained only. And not that great tbh

#

I tried to write capsule Networks in R sometime ago (I'm a masochist that's why) and even tho the code was all correct it ended up not working due to some bug in keras R

#

Tried lots of stuff to get it working but didn't. It was like a deadlock situation but with errors. Annoying

#

But as much as I dislike the code style and practices and paradigms of R, ig I can see how it'd be useful to people doing pure Data Analytics

crystal obsidian Jun 8, 2023, 7:26 PM

#

Is this script good for correcting error prone text

night kernel Jun 8, 2023, 7:36 PM

#

what do you guys think are the best models available on hugging face?

#

theres many ways to filter - curious to hear what you guys think

crystal obsidian Jun 8, 2023, 8:00 PM

#

night kernel what do you guys think are the best models available on hugging face?

gpt2, grammarly coedit

lapis sequoia Jun 8, 2023, 8:19 PM

#

lapis sequoia Anyone here think they could recreate this within matplotlib or another python p...

Would it be possible or are you too limited with the layout/design to replicate this

tidal bough Jun 8, 2023, 8:22 PM

#

lapis sequoia Would it be possible or are you too limited with the layout/design to replicate ...

Placing the legend beneath the plot and using circles rather than lines in its markers might take a bit of doc-reading, but nothing else I see here looks unusual for matplotlib

brave sand Jun 8, 2023, 9:07 PM

#

how do I change this:

```py
from keras import backend
from keras import initializers
from keras.optimizers import utils as optimizer_utils
from keras.optimizers.schedules import learning_rate_schedule
from keras.utils import tf_utils
```
to
```py
from tensorflow ...
```

hasty mountain Jun 8, 2023, 9:44 PM

#

brave sand how do I change this: ```py from keras import backend from keras import initiali...

Install tensorflow in a 2.X version. It'll include keras, so you can just use

from tensorflow.keras import backend
from tensorflow.keras.optimizers import utils as optimizer_utils
...

#

It's possible that you may have to make some adjustments, but in general it's basically just adding `tensorflow.` before `keras`

night kernel Jun 8, 2023, 9:46 PM

#

crystal obsidian gpt2, grammarly coedit

perhaps a novice question but is there not a gpt3.5 or gpt4, basically an updated version, available?

verbal venture Jun 8, 2023, 9:48 PM

#

how much harder is generative AI compared to vision?

brave sand Jun 8, 2023, 9:54 PM

#

hasty mountain Install tensorflow in a 2.X version. It'll include keras, so you can just use ``...

oh ok, thank you so much. do you know what this means:

Killed

I followed this guide exactly:
https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html

hasty mountain Jun 8, 2023, 9:57 PM

#

brave sand oh ok, thank you so much. do you know what this means: ```2023-06-08 17:53:48.60...

Hm... No. I don't. Sorry.

hasty mountain Jun 8, 2023, 10:01 PM

#

verbal venture how much harder is generative AI compared to vision?

Depends on which model you want to use for generation.
If you want to use a Variational AutoEncoder... well, it's quite simple to implement, though the theory is complicated... and it's also quite hard to find a decent tutorial.
If you want to use a Generative Adversarial Network, then the theory is easy and it's easy to find a tutorial, but it's very hard to make it work, as it involves a lot of trial and error.
If you want to use a Diffusion Model, then you get a middle ground between the two above: the theory is complicated, it's easy to find a tutorial, and it's not that hard to make it work, but it also requires some trial and error.

#

~~PS: Flow-models are an aberration~~

verbal venture Jun 8, 2023, 10:01 PM

#

ok, what about transformers

hasty mountain Jun 8, 2023, 10:04 PM

#

I have never used a Transformer for generating images, but there might be a version of it for that... I know that there's a GAN model that uses Self-Attention, which is a mechanism from Transformer...

But I suppose the problems could be pretty much the same as for text: teacher forcing bias, gradients biased due to residual blocks, crazy gradients due to how the layer weights behave...

brave sand Jun 8, 2023, 10:04 PM

#

does anyone know why https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html
this guide uses cpu even though it's supposed to use GPU?

#

my cpu load is 99%

verbal venture Jun 8, 2023, 10:05 PM

#

how long do you think it would take to get a firm understanding of GANs, Transformers, VAEs and diffusion models?

hasty mountain Jun 8, 2023, 10:11 PM

#

verbal venture how long do you think it would take to get a firm understanding of GANs, Transfor...

I think that, for a GAN...maybe 3 days? 1 week, at most, I think. Really, the idea behind them is quite simple: a Generator trying to fool a Discriminator with fake images, and beware of Nash equilibrium and exploding/vanishing gradients. Follow a tutorial and you may be able to make a DCGAN work easily with the MNIST or CelebA datasets (which are the most common ones). The problem is if you try to go beyond that.

Transformers...if you already have some knowledge of NLP, maybe also 3 or 4 days. If you know nothing about NLP, vectorizing, etc., it may require some weeks. There's a course covering Transformers on Coursera from Andrew Ng which can help.

VAEs...around a week. Diffusion Models may take a bit more.
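
To make the Generator-vs-Discriminator idea above concrete, this is the standard minimax objective from the original GAN formulation, written as a sketch. Here $D$ is the Discriminator, $G$ the Generator, $p_{\text{data}}$ the real-data distribution and $p_z$ the noise prior; the symbols are the usual textbook ones, not anything specific to this conversation.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The Discriminator pushes $V$ up by scoring real images high and generated ones low, while the Generator pushes it down; the equilibrium of that game is the Nash equilibrium mentioned above.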

verbal venture Jun 8, 2023, 10:11 PM

#

no.. to make production models

#

create my own models* similar to stabilityAI

hasty mountain Jun 8, 2023, 10:13 PM

#

verbal venture create my own models* similar to stabilityAI

About the same time.

verbal venture Jun 8, 2023, 10:13 PM

#

I have 0 deep learning background

hasty mountain Jun 8, 2023, 10:14 PM

#

Oh, then you may need to add some weeks to those estimations

verbal venture Jun 8, 2023, 10:14 PM

#

so I can literally make my own production-ready GAN in 3 weeks?

#

that doesn't seem possible

hasty mountain Jun 8, 2023, 10:17 PM

#

Well, it depends on how you want to do it.
Tutorials on how to do a GAN are quite easy to find, so you may be able to make one quite fast. The only thing that may slow you down a bit will be debugging your code. But if you follow everything correctly, you may be able to have a working DCGAN on MNIST/CelebA dataset relatively fast.
However...as I said, if you want to modify the architecture of your DCGAN to generate different results, then it'll take quite some time to make it work.

#

In fact, it can take months of trial and error and studying.

brave sand Jun 8, 2023, 10:18 PM

#

Does anyone know of a way to test if my system is lacking ram?

hasty mountain Jun 8, 2023, 10:20 PM

#

hasty mountain In fact, it can take months of trial and error and studying.

And when I say months of trial and error and studying, I mean literally. I've studied GANs for more than a year in order to make my own GANs work

#

VAEs and Diffusion Models are a bit tougher to understand, but making them work is quite easy. And that's why they've been getting the favor of most people nowadays.

Transformer follows the same idea, plus the fact that you may need to learn Natural Language Processing...which can take a while to understand...especially the idea around vectorization and embedding matrices.

verbal venture Jun 8, 2023, 10:25 PM

#

hasty mountain And when I say months of trial and error and studying, I mean **literally**. *I'...

How many hours per day were you studying GANs

hasty mountain Jun 8, 2023, 10:29 PM

#

verbal venture How many hours per day were you studying GANs

Good question... I'd say...an average of 1~2 hours?

#

Some days were more...some less...

wanton sentinel Jun 9, 2023, 1:04 AM

#

Quick one and I'm feeling real dumb for having to ask it... but selecting data based on the value of a combined index in Pandas... Say I have ['FIRST','SECOND'] as my index labels, how would I select based on a filter on the values in SECOND, ignoring any matches in FIRST? This seems like what I want, but not sure how to restrict it to just the second part of the index: https://pandas.pydata.org/docs/user_guide/indexing.html#selection-by-label

agile cobalt Jun 9, 2023, 1:08 AM

#

explained in #python-discussion but leaving here too for others to know it's been answered
```pycon
>>> df
     0
a a  1
  b  2
b a  3
>>> df.loc[(slice(None), 'a'), :]
     0
a a  1
b a  3
```
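
For anyone landing here later, a minimal sketch of the same selection on a named two-level index. The level names `FIRST` and `SECOND` come from the question; the data and the column name `value` are made up for illustration.

```python
import pandas as pd

# Toy frame with a two-level index named as in the question (values are invented).
index = pd.MultiIndex.from_tuples(
    [("a", "a"), ("a", "b"), ("b", "a")], names=["FIRST", "SECOND"]
)
df = pd.DataFrame({"value": [1, 2, 3]}, index=index)

# 1) Tuple with slice(None): "anything" for FIRST, exactly "a" for SECOND.
by_loc = df.loc[(slice(None), "a"), :]

# 2) Boolean mask built from the SECOND level only; handy for non-equality filters.
mask = df.index.get_level_values("SECOND") == "a"
by_mask = df[mask]

# 3) Cross-section; drop_level=False keeps SECOND in the result's index.
by_xs = df.xs("a", level="SECOND", drop_level=False)
```

All three pick the same two rows here. The mask version is probably the most flexible if the filter on `SECOND` is something like `isin(...)` or a string match rather than a single label.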

wanton sentinel Jun 9, 2023, 1:10 AM

#

Works perfectly. Thanks muchly!

empty furnace Jun 9, 2023, 1:36 AM

#

data scientists who worked for companies where SQL was an optional requirement, how much do you use it? And if you use it, how technical is your knowledge?

broken arch Jun 9, 2023, 4:22 AM

#

_\

glad glade Jun 9, 2023, 5:59 AM

#

bro I'm just following some yt tutorial for python but for some reason it won't detect mss idk how to fix this, I thought the file might be corrupted so I deleted it and reinstalled it but as shown it did not work

lapis sequoia Jun 9, 2023, 6:12 AM

#

I need some help in visualising the counts of present vs absent per person

#

P means present and A is absent, which are stored in the respective dataset

#

And there are 79 unique IDs

past meteor Jun 9, 2023, 7:24 AM

#

potent sky But as much as I dislike the code style and practices and paradigms of R, ig I c...

Beyond just style, R is lower on the strong typing spectrum than Python, a bit like PHP and JS.

If your language is dynamically typed you need to throw a lot of runtime errors, like Python and Ruby do. R, JS, and PHP sometimes produce silently wrong results 🥴. In R it's just worse because the odds that you inherit a shitty codebase are higher than in JS or PHP.

royal dagger Jun 9, 2023, 7:24 AM

#

has anyone finetuned any model using QLoRA?

past meteor Jun 9, 2023, 7:29 AM

#

Although I think if you're not exclusively doing DL it's worth learning. I treat it like a statistics DSL.

potent sky Jun 9, 2023, 7:41 AM

#

past meteor Although I think if you're not exclusively doing DL it's worth to learn it. I tr...

Fair ig. I've taken a few uni courses and a few mini projects, but the prospect of actively coding in R wouldn't thrill me so to speak

glacial rampart Jun 9, 2023, 10:49 AM

#

lapis sequoia I need some help in visualising the counts of present vs absent per person

Can you provide a bit more information on what the df's look like?
Based on the image I now assume that every [x, y] in df_present contains an id of someone present and every [x, y] in df_absent contains an id of someone absent. Is that correct?
If so, you would first have to aggregate the dataframes into a [1, 79] df and then you can easily visualize it:

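import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in data: every cell holds one of 79 person IDs
df = pd.DataFrame(np.random.randint(0, 79, size=(2000, 12)), columns=list('ABCDEFGHIJKL')) # [2000 x 12]
df_counts = df.apply(pd.Series.value_counts) # [79 x 12], NaN where an ID never appears in a column
df_sum = df_counts.sum(axis=1) # [79 x 1], total occurrences per ID

plt.rcParams["figure.figsize"] = (100, 5)
df_sum.plot.bar()  # the Series index (the IDs) becomes the x axis
# Repeat the same for absent and plot in same image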

Please make sure to give more details on how you want to visualize the data, if I misunderstood
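
A rough sketch of the "repeat the same for absent" step, under the same assumption as above that every cell of each frame holds a person ID. The function name `attendance_counts` and the random stand-in data are hypothetical; swap in the real `df_present` / `df_absent`.

```python
import numpy as np
import pandas as pd


def attendance_counts(df_present: pd.DataFrame, df_absent: pd.DataFrame) -> pd.DataFrame:
    """Count how often each ID occurs in each frame; one row per ID."""
    present = pd.Series(df_present.to_numpy().ravel()).value_counts()
    absent = pd.Series(df_absent.to_numpy().ravel()).value_counts()
    # Building the frame from two Series aligns them on the union of IDs.
    counts = pd.DataFrame({"present": present, "absent": absent})
    # IDs missing from one side show up as NaN; treat them as zero occurrences.
    return counts.fillna(0).astype(int).sort_index()


# Stand-in data only: random IDs in [0, 79).
rng = np.random.default_rng(0)
df_present = pd.DataFrame(rng.integers(0, 79, size=(200, 12)))
df_absent = pd.DataFrame(rng.integers(0, 79, size=(50, 12)))

counts = attendance_counts(df_present, df_absent)
```

Calling `counts.plot.bar(figsize=(20, 5))` on the result should then draw the present and absent bars side by side for each ID.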

#

Separate post:
So I joined this community mainly because I'm looking for a specific kind of Python course. I think this channel is probably best suited for the question:
I'm a Data Engineer with basic Python knowledge, but I'm looking to expand that through project-based learning. I'm looking for a kind of course or training that offers such kind of learning.
Many courses seem to offer basic explanations etc. whereas I really would like to learn more advanced Python tips and tricks.
Could anyone recommend a person or company for such trainings? (Can be online, classroom based, multi-day, that's all fine)

little vector Jun 9, 2023, 11:52 AM

#

I just finished an applied machine learning course, but I want to learn more about the subject. Any tips for documentations/tutorials I could study?
I want to expand my knowledge and perhaps do some ML on my own.

karmic void Jun 9, 2023, 1:06 PM

#

For beginners, sklearn or tensorflow or any other module?

serene scaffold Jun 9, 2023, 1:07 PM

#

karmic void For beginners, sklearn or tensorflow or any other module?

you don't learn ML by "learning libraries". you have to learn ML itself.

#

but tensorflow is for neural networks, and you definitely shouldn't start with that.

karmic void Jun 9, 2023, 1:09 PM

#

serene scaffold you don't learn ML by "learning libraries". you have to learn ML itself.

Yeah, I gotta learn the terms and theory, right?

serene scaffold Jun 9, 2023, 1:09 PM

#

karmic void Yeah, I gotta learn the terms and theory, right?

yes. and none of the python ML libraries are intended to help you learn them.

#

except maybe keras for neural networks.

karmic void Jun 9, 2023, 1:10 PM

#

serene scaffold yes. and none of the python ML libraries are intended to help you learn them.

So, is there any tutorial I can follow?

serene scaffold Jun 9, 2023, 1:10 PM

#

karmic void So, is there any tutorial I can follow?

!resources data science

arctic wedge BOT Jun 9, 2023, 1:10 PM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

little vector Jun 9, 2023, 1:59 PM

#

I am familiar with most of the libraries. We did a lot of work with sklearn

subtle knot Jun 9, 2023, 2:06 PM

#

how do I change the font size of a single markdown cell in jupyter?

potent sky Jun 9, 2023, 2:06 PM

#

little vector I am familiar with most of the libraries. We did a lot of work with sklearn

Try to work on and get solid with the theory and the math

little vector Jun 9, 2023, 2:09 PM

#

serene scaffold !resources data science

Nice I think this will help a lot

little vector Jun 9, 2023, 2:11 PM

#

subtle knot how do I change the font size of a single markdown cell in jupyter?

Can't you use a HTML div within the cell?

<div style="font-size:1.5em">Lorem Ipsum.</div>

errant lake Jun 9, 2023, 3:11 PM

#

glacial rampart **Separate post:** So I joined this community mainly because I'm looking for a s...

I don't know specific trainings but we can certainly discuss for which topics to aim for as a DE today

#

Actually, the Cloud providers DE certifications are always good to have imo. So you can check what your favorite Cloud has to offer

glacial rampart Jun 9, 2023, 3:27 PM

#

errant lake I don't know specific trainings but we can certainly discuss for which topics to...

Yeah I'm also interested in that, the Cloud certifications are on my radar. However I prefer to gain more experience before taking them (we are setting up cloud in our company rn)

#

again though, I'd definitely be interested in hands-on courses for that, not just going through tons of theory 🙂

brave sand Jun 9, 2023, 4:10 PM

#

what does this mean:

```
2023-06-09 15:56:31.123032: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:392] Filling up shuffle buffer (this may take a while): 297 of 2048
2023-06-09 15:56:41.180448: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:392] Filling up shuffle buffer (this may take a while): 622 of 2048
2023-06-09 15:56:51.125136: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:392] Filling up shuffle buffer (this may take a while): 907 of 2048
```

boreal gale Jun 9, 2023, 4:15 PM

#

if i were to guess, it's a log entry that says
"i am moving data into a special zone of memory (or just an allocated and contiguous region of memory really), this is a prerequisite to do a shuffling operation (but according to the source code it says sampling??), current progress is XXX out of the maximum capacity of YYY"

ref: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/data/shuffle_dataset_op.cc#L414-L415
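
For intuition, a pure-Python sketch of how a fixed-size shuffle buffer behaves, following the description in the `tf.data.Dataset.shuffle` docs (fill a buffer, then repeatedly emit a random element and replace it with the next input). This is not TensorFlow's code; `buffered_shuffle` is a made-up name and the details are simplified.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Yield items in a locally shuffled order using a fixed-size buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # The "Filling up shuffle buffer" phase: nothing is emitted yet.
            buffer.append(item)
            continue
        # Buffer is full: emit a random element and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain whatever is left, in random order.
    rng.shuffle(buffer)
    yield from buffer


shuffled = list(buffered_shuffle(range(10), buffer_size=4))
```

The point for the log above: the buffer holds up to `buffer_size` elements in memory at once (2048 in that log), and nothing comes out until it is full, which is presumably why the message warns that it may take a while.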

quick fable Jun 9, 2023, 4:17 PM

#

Hello guys can somebody suggest me should I go for 3060 12gb or 3070ti 8gb for my pc , I will be doing dl/ml work and running models most of the time. Plus pls suggest me other components around it .

brave sand Jun 9, 2023, 4:18 PM

#

quick fable Hello guys can somebody suggest me should I go for 3060 12gb or 3070ti 8gb for m...

3060

brave sand Jun 9, 2023, 4:19 PM

#

boreal gale if i were to guess, it's a log entry that says "i am moving data into a special ...

I am using Colab and my batch_size is 8, around 150 images. how could I be out of memory?

boreal gale Jun 9, 2023, 4:21 PM

#

brave sand I am using Colab and my batch_size is 8, around 150 images. how could I be out ...

please post more details, or link to your previous posted detail if you already did.
while i probably can't answer this, someone else might be able to!

past meteor Jun 9, 2023, 4:25 PM

#

glacial rampart Yeah I'm also interested in that, the Cloud certifications are on my radar. Howe...

How is your SQL?

brave sand Jun 9, 2023, 4:29 PM

#

what are these files?
events.out.tfevents...

boreal gale Jun 9, 2023, 4:33 PM

#

https://stackoverflow.com/questions/53517857/can-i-delete-events-out-tfevents-xxxxxxxxxx-computer-name-files-from-training-fo
contains a pretty good summary of what those are, it's mostly for debugging/monitoring purposes.

quick fable Jun 9, 2023, 5:11 PM

#

brave sand 3060

Brotha any specific reason ? Cuz of vram ? So more vram > Cuda cores?
If more vram should I go amd 6900xt 16 gb ?

past meteor Jun 9, 2023, 5:35 PM

#

quick fable Brotha any specific reason ? Cuz of vram ? So more vram > Cuda cores? If more v...

I'm not sure how well DL libraries play with ROCm and AMD

#

You might have some hiccups here and there, if you're fine with that then you can go ahead with AMD

junior rain Jun 9, 2023, 5:51 PM

#

This may be a dumb question but is it bad practice to make a nested list a value in a pandas data frame? I'm analyzing EEG data and the module will output an indeterminate number of peaks (one participant could have 2 and the other 3) for each participant and each peak is a list of some values. Right now I'm thinking of storing each list in one list for the participants peaks, but is there a better way to do this?

serene scaffold Jun 9, 2023, 6:00 PM

#

junior rain This may be a dumb question but is it bad practice to make a nested list a value...

even just having dataframe elements that are flat lists is a bit problematic--nested lists are a no-go.

#

I assume when you say "a value in a dataframe", you mean the value for a single cell. not a whole row or a whole column.

junior rain Jun 9, 2023, 6:01 PM

#

yes

quick fable Jun 9, 2023, 6:01 PM

#

past meteor I'm not sure how well DL libraries play with ROCm and AMD

Not good , amd still ain't a choice for deep learning in most of the cases
I don't know if I should go for the 3060 12gb. I could be fine-tuning a BERT model, and BERT requires at least 12gb vram and the 3060 is slow for it, but it's in my budget range.
The only proper gpu could be a 3080 12gb, but I can only afford a used one in this range.
Should I go for a used one? Obviously non refurbished and non mined

serene scaffold Jun 9, 2023, 6:02 PM

#

junior rain yes

whatever the column is that is going to have these nested values, you might want to make it a separate dataframe with multiple levels of indexing.

junior rain Jun 9, 2023, 6:04 PM

#

serene scaffold whatever the column is that is going to have these nested values, you might want...

Each peak has 3 characteristics that are stored in those lists, and the number of peaks for each individual can vary from zero to usually a max of 3. So you're saying I should represent those as a new DF with columns for the characteristics, correct? If so, how could I account for the indeterminate number of peaks while making it clear that those peaks belong to one person, since in the original method they'd just be in the row for that person.

potent sky Jun 9, 2023, 6:06 PM

#

Yep that's what it's doing
The shuffling operation is already done when it's placed into the buffer. From my understanding it samples from the original tensor and places elements into the buffer, so they are "shuffled"
This buffer can then be used for further processing, i.e. it can be sampled from (which is what the comment refers to)

potent sky Jun 9, 2023, 6:07 PM

#

boreal gale if i were to guess, it's a log entry that says "i am moving data into a special ...

Replying to this^^

#

I really thought I had it on reply ;-;

junior rain Jun 9, 2023, 6:07 PM

#

lol I was very confused for a minute

potent sky Jun 9, 2023, 6:07 PM

#

Yeah mb idk why that happens on my phone

past meteor Jun 9, 2023, 6:09 PM

#

quick fable Not good , amd still ain't a choice for deep learning in most of the cases I don...

Is this for personal or professional use?

#

If it's for personal use there's nothing wrong with not getting a SoTA card and just using the cloud whenever your needs exceed your card's capacity. You can probably do some back of napkin calculations and see that this is likely the cheaper option

potent sky Jun 9, 2023, 6:12 PM

#

junior rain Each peak has 3 characteristics that are stored in those lists and then the amou...

Each of these 3 characteristics will be a single number or a list?

junior rain Jun 9, 2023, 6:12 PM

#

single number

potent sky Jun 9, 2023, 6:12 PM

#

You can build a multi index then

junior rain Jun 9, 2023, 6:13 PM

#

potent sky Each of these 3 characteristics will be a single number or a list?

they output like this: peak params: [[ 9.81511894 0.72768575 3.00442113]
[13.0427614 0.25957048 2.54345948]
[18.12826738 0.15093637 4.62563198]] with each list being one peak

potent sky Jun 9, 2023, 6:13 PM

#

One index for each patient/subject
And a lower index for each peak
With 3 columns in the df, one column for each of the "characteristics" of a peak

#

Yep should work I think

#

!d pandas.MultiIndex

arctic wedge BOT Jun 9, 2023, 6:15 PM

#

pandas.MultiIndex


```
class pandas.MultiIndex(levels=None, codes=None, sortorder=None, names=None, dtype=None, copy=False, name=None, verify_integrity=True)
```
A multi-level, or hierarchical, index object for pandas objects.

junior rain Jun 9, 2023, 6:15 PM

#

potent sky Yep should work I think

Let me read into it more but it seems like that would work. My only concern is if I could output this to a csv file and what that would look like.

#

Yea based on the documentation it seems like I'd have a 3 dimensional df, so how could I export this into a simpler/readable database format?

potent sky Jun 9, 2023, 6:18 PM

#

I think the CSV would have a column for each index

junior rain Jun 9, 2023, 6:19 PM

#

potent sky I think the CSV would have a column for each index

Interesting...How would those columns look for participants who only had 2 peaks instead of 3? Would the extra column just be blank?

potent sky Jun 9, 2023, 6:20 PM

#

No...the peak would be described by one level of index
This would translate to one column

#

So you'd have 3 rows for participants with 3 peaks, 2 rows for those with 2 etc.

#

And your first level index would presumably be some sort of participant id
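
A minimal sketch of that layout, assuming each participant's output is an (n_peaks, 3) array like the one pasted above. The participant IDs, the second participant's numbers and the column names `param_1`..`param_3` are placeholders, since the thread doesn't say what the three values are.

```python
import numpy as np
import pandas as pd

# Hypothetical per-participant output: one row per peak, three values per peak.
peaks_by_participant = {
    "P01": np.array([
        [9.81511894, 0.72768575, 3.00442113],
        [13.0427614, 0.25957048, 2.54345948],
        [18.12826738, 0.15093637, 4.62563198],
    ]),
    "P02": np.array([
        [10.2, 0.5, 2.1],
        [20.4, 0.3, 3.3],
    ]),
}
columns = ["param_1", "param_2", "param_3"]  # placeholder names

# One small frame per participant; its default 0..n-1 index is the peak number.
frames = {
    pid: pd.DataFrame(params, columns=columns)
    for pid, params in peaks_by_participant.items()
}

# Concatenating a dict uses its keys as the outer index level.
peaks = pd.concat(frames, names=["participant", "peak"])

# With no path given, to_csv returns the CSV text; each index level becomes a column.
csv_text = peaks.to_csv()
```

The CSV ends up with the header `participant,peak,param_1,param_2,param_3` and one row per peak, so a participant with two peaks simply has two rows and nothing is left blank. `peaks.loc["P02"]` gets one participant's peaks back.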

junior rain Jun 9, 2023, 6:21 PM

#

potent sky So you'd have 3 *rows* for participants with 3 peaks, 2 rows for those with 2 et...

oh I get it, that makes sense. This would work pretty well then.

junior rain Jun 9, 2023, 6:21 PM

#

potent sky And your first level index would presumably be some sort of participant id

that's exactly what it is, perfect

#

Thanks @potent sky and @serene scaffold

potent sky Jun 9, 2023, 6:23 PM

#

junior rain oh I get it, that makes sense. This would work pretty well then.

No problem! Stel gave the right idea I just explained it lol

#

oops wrong tag again ;-;

wooden forge Jun 9, 2023, 6:58 PM

#

Hey there, I am kinda confused with np.gradient. I would like to get the derivative of a 2D array (an image) but it returns two arrays x and y and I don't know what to do with these

wooden sail Jun 9, 2023, 7:06 PM

#

wooden forge Hey there, I am kinda confused with `np.gradient`. I would like to get the deriv...

what are you expecting gradient to do?

wooden forge Jun 9, 2023, 7:07 PM

#

nvm I found it

#

I didn't see the x and y were 2D arrays as well

#

so it's like the dx and dy

wooden sail Jun 9, 2023, 7:08 PM

#

yep
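
A small sketch of what comes back for a 2D array, on a made-up ramp image. One thing worth knowing: the arrays are returned in axis order, so the first one is the derivative along axis 0 (down the rows) and the second along axis 1 (across the columns).

```python
import numpy as np

# Toy "image": intensity grows by 1 per column and by 10 per row.
rows, cols = np.mgrid[0:4, 0:5]
image = (10 * rows + cols).astype(float)

# One array per axis, each with the same shape as the image.
d_rows, d_cols = np.gradient(image)

# Gradient magnitude is a common way to merge the two into one edge-strength map.
magnitude = np.hypot(d_rows, d_cols)
```

Here `d_rows` is 10 everywhere and `d_cols` is 1 everywhere, which matches how the ramp was built. If "x" means horizontal in your image convention, it is the second array, not the first.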

wooden forge Jun 9, 2023, 7:08 PM

#

debugging saves the day once again

wooden sail Jun 9, 2023, 7:08 PM

#

if you know what behavior you expect though, it might be easier to use a 2d convolution from scipy to produce the derivative you want

wooden forge Jun 9, 2023, 7:09 PM

#

gradient does exactly what I want so I'll stick with this thanks !

wooden sail Jun 9, 2023, 7:09 PM

#

all righty

subtle knot Jun 9, 2023, 7:21 PM

#

Is the audit only version of the machine learning course by Andrew ng good or should I buy the course(labs etc)?

glacial rampart Jun 9, 2023, 8:34 PM

#

past meteor How is your SQL?

It's fine, I currently write jinja-sql with the dbt package. I don't like SQL as much, but I think I know enough to be able to use it for most projects
For the current project we write all transformations in jinja-sql, so that's some nice hands-on experience I'll already get

past meteor Jun 9, 2023, 8:47 PM

#

glacial rampart It's fine, I currently write jinja-sql with the dbt package. I don't like SQL as...

For first data engineering jobs, having a great understanding of SQL is a must, because I can tell you from experience that many data engineers don't know Python 🥴

#

My issue with cloud certs is that a lot of it is just about their product and not necessarily concepts.

glacial rampart Jun 9, 2023, 8:55 PM

#

Yeah well regardless those cloud certs are becoming industry standard :/ I'll need to have one from any cloud provider at some point.
And yeah I know SQL is like the must-know for all older systems/ companies. I think I'll manage as long as I understand the idea behind SQL, which I do learn with jinja. It's just that without jinja you need to write a lot of additional sql yourself

past meteor Jun 9, 2023, 8:58 PM

#

The irony is that SQL is cutting-edge

#

As in, "modern" SQL is just a language/specification that is sent to query an S3-like datastore in a distributed way. See: Snowflake, Databricks spark-SQL, ...

That being said, it kinda sucks because queries can get long and unreadable ofc.

#

Pretty sure there's dbt adapters for Snowflake and the Spark runtime.

Re cloud certs: are they asking for that for an entry level job? Idk if cloud certificates have value before working with the stack either way. I have a couple of Azure certs because where I live they have 80 % of the market share and there's ways to get them for free. Personally wouldn't have grokked anything without having used Azure before though.

potent sky Jun 10, 2023, 6:16 AM

#

subtle knot Is the audit only version of the machine learning course by Andrew ng good or sh...

The labs bring a significant fraction of the course's value. But you don't need to buy it. Apply for financial aid and explain your reasons, they're usually pretty generous

glacial rampart Jun 10, 2023, 6:23 AM

#

past meteor Pretty sure there's DBT adapters for Snowflake and the Spark runtime. Re cloud...

No, not for entry level. But they start expecting juniors to take them, and expect them from mid-level engineers and up

glacial rampart Jun 10, 2023, 6:26 AM

#

past meteor As in, "modern" SQL is just a language/specification that is sent to query an ...

That's true. But at least with modern tools like Snowflake, Databricks you can use Python adapters (or Go, I think go is also promising, what do you think?). And on databricks you can schedule Python notebooks

past meteor Jun 10, 2023, 7:42 AM

#

glacial rampart That's true. But at least with modern tools like Snowflake, Databricks you can u...

They need to let you use python first. As I said, many data engineers don't know Python, just SQL. It's not because those features exist that you can use them on every team.

This discussion is getting a bit circular.

hoary jay Jun 10, 2023, 8:08 AM

#

So say I have floating-point data ranging from -1 to 1 and I basically want to separate it into 3 categories: +ve values, -ve values, and a 3rd one for data that is too close to 0, depending on a threshold. The problem is, how do I mathematically define and justify this threshold? I tried numpy's quantiles, but that just divides the data equally, and in my case I can have 70% negative values and 10% neutral and so on, so I can't just divide in equal ratios. I'm writing a research paper, so I can't just assume and say I divide the data at -0.05 and +0.05, because that would be arbitrary. So what can I do?

wooden sail Jun 10, 2023, 8:36 AM

#

hoary jay So say I have a data ranging from say floating val data from -1 to 1 and i basi...

well, why are you splitting it into 3 groups?

hoary jay Jun 10, 2023, 8:39 AM

#

I'm working on an NLP problem and I have some cosine similarity scores, floating-point values from -1 to 1, and I basically want to separate them into 3 categories: values closer to -1, values closer to 1, and those near 0, which are neutral

these values are basically the result of a cosine similarity b/w a sentence vector and a target vector (e.g. for "I am a man" I calculate its sentence vector and then use cos_sim with a target word, say "men" or "women", to identify which target word it is more related to)

#

its kind of like i just wanna cluster in 1d

wooden sail Jun 10, 2023, 8:40 AM

#

one way you could do this is to treat the thresholds as learnable parameters

#

or have them be hyperparams and test which one performs best. but there is no unique way of choosing them here

#

are you doing anything with the neutral values?

hoary jay Jun 10, 2023, 8:41 AM

#

wooden sail one way you could do this is to treat the thresholds as learnable parameters

this probably wouldnt be possible because im doing unsupervised learning so we will never have labelled data

hoary jay Jun 10, 2023, 8:42 AM

#

wooden sail are you doing anything with the neutral values?

actually identifying them is important because we dont want to include them in our further experiments so removing them is kinda important

hoary jay Jun 10, 2023, 8:42 AM

#

wooden sail or have them be hyperparams and test which one perform best. but there is no uni...

so like just manually observe and adjust everytime?

wooden sail Jun 10, 2023, 8:43 AM

#

you could try a grid search for that and keep the threshold that works best

past meteor Jun 10, 2023, 8:51 AM

#

hoary jay So say I have a data ranging from say floating val data from -1 to 1 and i basi...

Do you want to find 3 groups in your data and you want to know the optimal cut off points to decide the groupings?

hoary jay Jun 10, 2023, 8:51 AM

#

past meteor Do you want to find 3 groups in your data and you want to know the optimal cut o...

yep

past meteor Jun 10, 2023, 8:52 AM

#

Do you do anything with it that is quantitative or do you just qualitatively analyse what's in the groups?

hoary jay Jun 10, 2023, 8:52 AM

#

hoary jay I'm working on a NLP problem and i have some cosine similarity scores ranging fr...

.

wooden sail Jun 10, 2023, 8:52 AM

#

from what they said, i understood they use this value to remove samples from the training data

#

i might've misunderstood

hoary jay Jun 10, 2023, 8:53 AM

#

I'm separating different sentences actually on the basis of the similarity values

hoary jay Jun 10, 2023, 8:53 AM

#

wooden sail from what they said, i understood they use this value to remove samples from the...

yeah that too

wooden sail Jun 10, 2023, 8:53 AM

#

how do you do that separation?

past meteor Jun 10, 2023, 8:53 AM

#

Okay but can we phrase it under qualitative and quantitative for a second

#

Do you have a downstream task that computes something?

hoary jay Jun 10, 2023, 8:54 AM

#

wooden sail how do you do that separation?

that's what i asked, if i can group this data that will be a separation, then i can work on it further

hoary jay Jun 10, 2023, 8:54 AM

#

past meteor Do you have a downstream task that computes something?

umm no i guess

#

like wdym by that can u elaborate?

past meteor Jun 10, 2023, 8:55 AM

#

Like if you have a downstream task that computes something (quantitative) it's a hyperparameter you can tune like Edd says

wooden sail Jun 10, 2023, 8:55 AM

#

what do you do with the data after separating into these 3 groups?

past meteor Jun 10, 2023, 8:55 AM

#

If these are 3 groups and you'll kind of just look at what is inside and analysing it without a clear measure of performance then it's qualitative

hoary jay Jun 10, 2023, 8:56 AM

#

the data is useless after separation, once i separate the sentences based on their similarity values i dont use that data anywhere else

wooden sail Jun 10, 2023, 8:56 AM

#

what do you do with the data you DO use

past meteor Jun 10, 2023, 8:56 AM

#

Why do you want to separate the data?

#

Like if there's nothing you're doing with it why separate at all and not just make a histogram?

#

I'm missing something here, don't fully understand

hoary jay Jun 10, 2023, 8:58 AM

#

ok so let me explain

we have s (sentence vector) and t1 and t2 two target sets

now i calculate cos(s, t1) - cos(s, t2)

if it's -ve it will tell me s is more related to t2, then i can do more statistical tests on the "s" by analyzing its word vectors

wooden sail Jun 10, 2023, 8:58 AM

#

what kind of statistical tests?

hoary jay Jun 10, 2023, 9:00 AM

#

wooden sail what kind of statistical tests?

like im not sure if im allowed to tell, it's for a research paper im contributing to under a prof, like sorry if it sounds stupid, but I can tell you it has nothing to do with the data im talking about, I just use it for initial classification

wooden sail Jun 10, 2023, 9:01 AM

#

then we can't help you 😛 cuz the answer depends on that

#

some statistical tests turn this into a tunable hyperparameter you can optimize

hoary jay Jun 10, 2023, 9:01 AM

#

ah

wooden sail Jun 10, 2023, 9:01 AM

#

descriptive statistics do not, and all this threshold gives is a family of descriptors

#

in the former case you can optimize it. in the latter, all you can do is graph the family and the choice is kinda arbitrary

#

what you do with this info is up to you now 😛

past meteor Jun 10, 2023, 9:03 AM

#

Graphing the data, picking a threshold and then doing certain tests is a bit suspect because you can pick a cutoff to select the conclusion you want to reach, no?

hoary jay Jun 10, 2023, 9:03 AM

#

past meteor Graphing the data, picking a threshold and then doing certain tests is a bit sus...

agreed

wooden sail Jun 10, 2023, 9:03 AM

#

in the descriptive case you would NEED to plot the whole family, otherwise the results are not meaningful

#

this is something you need to discuss with your supervisor or something, we can't help much more with the amount of info you can share

hoary jay Jun 10, 2023, 9:05 AM

#

I'll confirm with him whether I'm allowed to talk about our work on online help forums. until then, thanks for the help tho

wooden sail Jun 10, 2023, 9:05 AM

#

i'm not asking you exactly which tests on which data btw

#

just what kind of statistics

past meteor Jun 10, 2023, 9:06 AM

#

Can a sentence not be related to T1 and T2 at the same time and produce a ve (whatever this is) that is near 0

wooden sail Jun 10, 2023, 9:06 AM

#

if it's descriptive statistics that's one thing

#

if you're fitting a model or comparing to some reference, that's a different kind of statistics

hoary jay Jun 10, 2023, 9:06 AM

#

not fitting a model totally unsupervised learning, I'm doing hypothesis testing later after this

wooden sail Jun 10, 2023, 9:07 AM

#

past meteor Can a sentence not be related to T1 and T2 at the same time and produce a ve (wh...

it was short for negative

#

btw subtracting cosine similarities sounds a little weird

#

related to zestar's example, you can get a t1 similarity of 0, and -1 with t2

hoary jay Jun 10, 2023, 9:07 AM

#

past meteor Can a sentence not be related to T1 and T2 at the same time and produce a ve (wh...

yes that is possible, that is a problem we have and that's why separating those cases is also important

wooden sail Jun 10, 2023, 9:08 AM

#

this gives you a positive 1 for sim(x, t1) - sim(x, t2), but the correlation is negative with t2 and 0 with t1

#

even for nonzero similarity with t1, if the negative similarity with t2 is larger you get a positive. same in the opposite direction

#

that metric does not distinguish between positive correlation with t1 and negative correlation with t2
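A minimal sketch of the point above, using made-up 2-D vectors (`t1`, `t2`, `x_a`, `x_b` and the helper `cos_sim` are illustrative names, not anything from the actual project). A vector aligned with t1 and a vector opposite to t2 get the same difference score:

```python
import numpy as np

def cos_sim(u, v):
    """Cosine of the angle between u and v."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy reference vectors (assumed, for illustration only).
t1 = np.array([1.0, 0.0])
t2 = np.array([0.0, 1.0])

x_a = np.array([1.0, 0.0])    # aligned with t1, orthogonal to t2
x_b = np.array([0.0, -1.0])   # orthogonal to t1, opposite to t2

def score(x):
    return cos_sim(x, t1) - cos_sim(x, t2)

score_a = score(x_a)  # 1 - 0
score_b = score(x_b)  # 0 - (-1)
print(score_a, score_b)
```

Both scores come out equal even though only `x_a` has anything to do with t1, which is the ambiguity being described.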

past meteor Jun 10, 2023, 9:09 AM

#

I think you're probably making something simple hard

#

Define a set of rules with arbitrary thresholds and work with that group of data

#

Do it before looking at your data and motivate the choices. Statistics is full of default parameters anyway

#

You can also perform your analysis on a different set of splits and with different significance levels and put it in a big table

hoary jay Jun 10, 2023, 9:11 AM

#

ah actually it was my bad, the data will never range from -1 to 1, cossim will always give a value b/w 0 and 1 for t1 and t2

wooden sail Jun 10, 2023, 9:11 AM

#

that too. for example if you have a reference paper that you're gonna compare to or build up on, you can just steal their hyperparams

hoary jay Jun 10, 2023, 9:12 AM

#

past meteor Define a set of rules with arbitrary thresholds and work with that group of data

okay..

hoary jay Jun 10, 2023, 9:14 AM

#

wooden sail that too. for example if you have a reference paper that you're gonna compare to...

we do have a ref paper, although they never discussed the problem of classification so we are trying to address this. I'll see what i can do now

wooden sail Jun 10, 2023, 9:14 AM

#

all righty

hoary jay Jun 10, 2023, 9:14 AM

#

thanks for the help both of ya

wooden sail Jun 10, 2023, 9:14 AM

#

i think zestar and i just dropped a lot of stuff on you, so at this point i suggest you step back and mull it over for a bit and discuss with your supervisor. otherwise we're gonna confuse or derail you because we also lack some context

hoary jay Jun 10, 2023, 9:22 AM

#

wooden sail i think zestar and i just dropped a lot of stuff on you, so at this point i sugg...

okay I'll do that!

wooden sail Jun 10, 2023, 11:27 AM

#

that categorization sounds like it should be a supervised task to me

#

let's see what zestar has to say

past meteor Jun 10, 2023, 11:32 AM

#

Before I respond, without knowing fully what they do, I think this is something @serene scaffold might be able to help at

serene scaffold Jun 10, 2023, 11:32 AM

#

I have to read all that?
Fuck

night prawn Jun 10, 2023, 11:33 AM

#

i want to use my gpu so i followed this tutorial https://learn.microsoft.com/en-us/windows/wsl/tutorials/gpu-compute#setting-up-tensorflow-directml-or-pytorch-directml (the Setting up TensorFlow-DirectML or PyTorch-DirectML part) but the third command returns conda: command not found. thank you in advance for your help

GPU accelerated ML training in WSL

Learn how to setup the Windows Subsystem for Linux with NVIDIA CUDA, TensorFlow-DirectML, and PyTorch-DirectML. Read about using GPU acceleration with WSL to support machine learning training scenarios.

wooden sail Jun 10, 2023, 11:34 AM

#

you probably didn't do the conda init at the end of the miniconda installation

#

navigate to where the miniconda executable is located, and in that directory run the command conda init. this will modify your bashrc to allow you to use the conda command from anywhere

#

then close the terminal and open a new one, or do source .bashrc from your home dir

serene scaffold Jun 10, 2023, 11:36 AM

#

serene scaffold I have to read all that? Fuck

I'll need more time to think

past meteor Jun 10, 2023, 11:40 AM

#

Does bert even give sentence embeddings or are those gotten by averaging all word embeddings or doing a max at each dimension?

#

Okay apparently I forgot about CLS pooling. I kind of don't want to answer as this is quite NLP specific and that's not my jam aside from the high level ideas 😛

#

But it does look like classification tbh.

hoary jay Jun 10, 2023, 11:56 AM

#

past meteor Does bert even give sentence embeddings or are those gotten by averaging all wo...

not exactly, I'm using Sentence Transformers, popularly known as sbert. i think it works differently from just taking the average of all word embeddings

hoary jay Jun 10, 2023, 12:06 PM

#

wooden sail that categorization sounds like it should be a supervised task to me

the idea is to analyze the conversation of a community such as reddit and then attempt to classify those comments. So we can never have a model trained on some labelled dataset that can be applied to every conversation on social media, because different communities use different words in different contexts. For example, slang used in a particular subreddit may not be used in the same context in, say, a YouTube comment section

wooden sail Jun 10, 2023, 12:08 PM

#

the idea of a training set is being representative, not to contain all text

#

how will you verify that your classification is working?

hoary jay Jun 10, 2023, 12:10 PM

#

wooden sail how will you verify that your classification is working?

we are trying to prove our results via experimentation. So we'll try to work on a 100k-row labelled dataset and try to classify it, then we can test on datasets taken from different social media communities to prove that this process is consistent

wooden sail Jun 10, 2023, 12:11 PM

#

so you have labelled data. my approach would actually be to add this as an extra layer or two in the network and spit out the class instead of that function of the cosine similarity

hoary jay Jun 10, 2023, 12:11 PM

#

but we are not implying that a labelled dataset is important for this process to work

wooden sail Jun 10, 2023, 12:12 PM

#

in that case though, you'd have to do something like plotting a family of curves as a function of this threshold param

#

still, let's see what stelercus says. i'm on the same boat as zestar, maybe there's something else at play that i can't see

hoary jay Jun 10, 2023, 12:13 PM

#

wooden sail in that case though, you'd have to do something like plotting a family of curves...

can you elaborate?

wooden sail Jun 10, 2023, 12:14 PM

#

to know if your classification is working, you compare it to something. in this case you have this labelled dataset you mentioned. the goodness of the classification will be a function of where you draw the threshold of which data is neutral

#

this alone should make you think of false negatives and false positives being affected by where you set the threshold

hoary jay Jun 10, 2023, 12:17 PM

#

wooden sail this alone should make you think of false negatives and false positives being af...

ok makes sense, the results do vary quite a bit on the basis of what threshold i set, like the accuracy of the model was ranging from all the way 56% to 78% on the basis of the initial threshold used for separation.

wooden sail Jun 10, 2023, 12:19 PM

#

very naively from my side, i would think that tuning this parameter as a hyperparam based on your labelled data and including the tuning alg and making the dataset public both justifies the approach and makes the result reproducible, which is great for a paper

#

that immediately makes it clear that the choice of the parameter depends on you having good data to go off of

#

i'm not an nlp person though

hoary jay Jun 10, 2023, 12:24 PM

#

wooden sail very naively from my side, i would think that tuning this parameter as a hyperpa...

yeahh i think this should be ok for now I hope? If i could just come up with a tuning algorithm or any process to calculate and justify what value of the threshold i can select then i think this will be more than enough for us. i Just need to find a way to tune this threshold

wooden sail Jun 10, 2023, 12:25 PM

#

idk how complicated your inference process is. the bert part you mentioned is already trained yeah?

#

cuz if that's the case you can technically differentiate through that and use your labelled data to optimize the param

#

otherwise you can do a grid search
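A minimal sketch of such a grid search, run on synthetic scores because the real data isn't shown here. Everything in it is an assumption for illustration: the label coding (0 = neutral, 1 = t1, 2 = t2), the class centers, the noise level, and the symmetric rule "score > thr means t1, score < -thr means t2".

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the labelled data: 0 = neutral, 1 = t1, 2 = t2.
labels = rng.integers(0, 3, size=3000)
centers = np.array([0.0, 0.3, -0.3])          # assumed mean score per label
scores = centers[labels] + rng.normal(0.0, 0.1, size=labels.size)

def classify(scores, thr):
    """Symmetric rule: |score| <= thr is neutral, otherwise the sign picks the class."""
    pred = np.zeros(scores.shape, dtype=int)
    pred[scores > thr] = 1
    pred[scores < -thr] = 2
    return pred

# Try every candidate threshold and keep the most accurate one.
grid = np.linspace(0.0, 0.5, 51)
accuracy = np.array([(classify(scores, t) == labels).mean() for t in grid])
best_thr = grid[accuracy.argmax()]
print(f"best threshold {best_thr:.2f}, accuracy {accuracy.max():.3f}")
```

In practice the threshold would be picked on one split of the labelled data and the accuracy reported on a held-out split, so the number isn't tuned on the same rows it is measured on.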

#

but still, do wait til someone who actually knows what they're doing lends a comment 😛

#

i'm just doing armchair nlp here lol

hoary jay Jun 10, 2023, 12:29 PM

#

wooden sail idk how complicated your inference process is. the bert part you mentioned is al...

yeah pre trained

crimson summit Jun 10, 2023, 12:36 PM

#

i am doing another course on machine learning by andrew ng and in it he divides the cost function by 2*M, where M is the number of training examples. Do yall have any idea why he does this? It seems like an additional learning rate that makes the error smaller.

wooden sail Jun 10, 2023, 12:37 PM

#

crimson summit i am doing another course on machine learning by andrew ng and in it he divides ...

quite technically, this term is not needed

#

notice that scalars factor out of derivatives, so scaling the cost by 1/2M scales the gradient by the same amount

#

it does not, however, change where minima are located (remember we're looking for the 0s of the gradient)

#

it's important on your computer, however, because floating point numbers can only be so large. this scaling factor is useful in keeping the gradient from exploding

#

if you have several terms added together in the cost function, the relative scaling of each one IS important. for a single cost term, this scaling is only here for numerical purposes on your computer

crimson summit Jun 10, 2023, 12:40 PM

#

is it kind of like making the step size smaller ?

past meteor Jun 10, 2023, 12:40 PM

#

The irony of me and NLP is that I took a course on multimodal information retrieval before I took computer vision and NLP

#

I did do computer vision proper afterwards but anything I know about NLP is from an IR perspective, not "complete" whatsoever but also not nothing

wooden sail Jun 10, 2023, 12:40 PM

#

crimson summit is it kind of like making the step size smaller ?

sure, but the cost was also made proportionally smaller, so those two factors cancel out

crimson summit Jun 10, 2023, 12:43 PM

#

wooden sail sure, but the cost was also made proportionally smaller, so those two factors ca...

the cost is divided by 2m, what part are you referring to that cancels out?

wooden sail Jun 10, 2023, 12:43 PM

#

let's put it this way

#

imagine my house is 10 meters away, and each step i take is 1 meter long. it takes me 10 steps to get there

#

what if my house is now 1 meter away, but my steps are 0.1 meters long?

#

this is the same thing that is happening here, since the gradient is computed from the cost and scalars factor out of derivatives

crimson summit Jun 10, 2023, 12:50 PM

#

doesn't the cost in this case tell you "how far away you are from your house", and you would scale other things in the formula that adjust the weights to make your steps smaller?

wooden sail Jun 10, 2023, 12:51 PM

#

you could if you wanted, i'm just explaining to you that the scaling of the cost function does not have an impact

#

the gradient scales by the same amount as the cost automatically

#

if you want to change the gradient size in a meaningful way, you'd have to change the step size, which is a scalar multiplied ONLY to the gradient, not to the cost
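A small numerical check of the argument above, as a sketch on a made-up one-parameter regression (the data, `lr`, and the function names are all assumptions for illustration). Gradient descent on the cost divided by 2M with step size `lr` follows the same path as gradient descent on the unscaled cost with step size `lr / (2M)`, and both end at the same minimizer:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 50                                    # number of training examples
x = rng.normal(size=M)
y = 3.0 * x + rng.normal(scale=0.1, size=M)

def grad_unscaled(w):
    # Derivative of sum((w*x - y)**2) with respect to w.
    return float(np.sum(2.0 * (w * x - y) * x))

def grad_scaled(w):
    # Derivative of sum((w*x - y)**2) / (2*M): the constant factors straight out.
    return grad_unscaled(w) / (2 * M)

def descend(grad, lr, steps=500, w=0.0):
    for _ in range(steps):
        w -= lr * grad(w)
    return w

lr = 0.1
w_scaled = descend(grad_scaled, lr)                # scaled cost, step lr
w_unscaled = descend(grad_unscaled, lr / (2 * M))  # raw cost, step shrunk to match
w_closed_form = float(np.dot(x, y) / np.dot(x, x)) # exact least-squares minimizer
print(w_scaled, w_unscaled, w_closed_form)
```

The location of the minimum never depended on the 1/(2M) factor; only the product of step size and gradient matters for the updates.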

crimson summit Jun 10, 2023, 1:05 PM

#

wooden sail you could if you wanted, i'm just explaining to you that the scaling of the cost...

it makes sense now thx bro

#

house analogy was 👍

broken thistle Jun 10, 2023, 2:23 PM

#

Hey everyone, I need help with outlier handling in each of the train folds, when using StratifiedKFold with GridSearchCV. Im currently using a strategy which might be causing data leakage. The full description and current code is given in my post here. Any insight would be greatly appreciated. Thanks!

https://stackoverflow.com/q/76446540/18559120

Stack Overflow

How to identify outliers and drop rows in train folds, when using S...

I'm using StratifiedKFold CV in GridSearch for AdaBoost and RandomForests Classsifiers. For Outlier anlaysis, I've identified all feature outliers on full dataset (given below). I figured there mig...

night prawn Jun 10, 2023, 2:55 PM

#

wooden sail you probably didn't do the conda init at the end of the miniconda installation

do you have to do the first part to do the second?

wooden sail Jun 10, 2023, 2:57 PM

#

wdym?

night prawn Jun 10, 2023, 3:06 PM

#

in the tutorial

wooden sail Jun 10, 2023, 3:08 PM

#

what are you calling "first" and "second" parts?

night prawn Jun 10, 2023, 3:21 PM

#

the first : Setting up NVIDIA CUDA with Docker the second : Setting up TensorFlow-DirectML or PyTorch-DirectML

wooden sail Jun 10, 2023, 3:24 PM

#

if you want to use an nvidia gpu, yes

queen cradle Jun 10, 2023, 3:33 PM

#

If I understand your setup correctly, t1 and t2 are vectors, and cos(v, w) is the cosine of the angle between them. So cos(word, t1) - cos(word, t2) or cos(sentence, t1) - cos(sentence, t2) is just telling you whether you're closer to t1 or to t2. This can also be phrased in terms of the perpendicular bisector of t1 and t2. This perpendicular bisector is a hyperplane; vectors on one side of the hyperplane are closer to t1 while vectors on the other side are closer to t2. The usual reason for taking the cosine is to get rid of the effect of document length, which is great if you want to compare two arbitrary vectors (e.g., determine if one sentence is similar to another). But if you're trying to determine whether you're closer to one vector or another, then you get the same information by computing the distance to their perpendicular bisector, which can be done by taking the dot product with (t1 - t2)/||t1 - t2|| and looking at the sign. In your case, since you want a neutral category, it's the same as looking at whether the dot product is near 0, large positive, or large negative.

#

That doesn't solve your question, of course. You were asking about how to define thresholds for the categories, and all I did was tell you about a different way to compute something for distinguishing the categories. But I think it clarifies the situation; earlier the question of why you were taking the difference of cosines was raised, and I think the resolution is that it's equivalent to something more mathematically motivated.
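A quick numerical sketch of that equivalence, on random vectors rather than real embeddings. It assumes t1 and t2 are scaled to unit length (otherwise the hyperplane is the bisector of their normalized directions), and the names `unit`, `X`, `normal` are just illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def unit(v):
    return v / np.linalg.norm(v)

# Random stand-ins for the two reference embeddings, scaled to unit length.
t1 = unit(rng.normal(size=8))
t2 = unit(rng.normal(size=8))

# Random stand-ins for sentence embeddings, one per row, also normalized.
X = rng.normal(size=(100, 8))
X_hat = X / np.linalg.norm(X, axis=1, keepdims=True)

# Difference of cosine similarities for every row.
cos_diff = X_hat @ t1 - X_hat @ t2

# Signed distance to the hyperplane through the origin with normal (t1 - t2).
normal = (t1 - t2) / np.linalg.norm(t1 - t2)
signed_dist = X_hat @ normal

# The two differ only by the constant positive factor ||t1 - t2||.
print(np.allclose(cos_diff, signed_dist * np.linalg.norm(t1 - t2)))
```

Since the factor is a positive constant, any threshold on one quantity corresponds to a rescaled threshold on the other.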

night prawn Jun 10, 2023, 3:39 PM

#

wooden sail if you want to use an nvidia gpu, yes

ok thank you

acoustic fjord Jun 10, 2023, 4:01 PM

#

hello there! anyone here ever tried animating their python graphs? if so, how was your experience? thanks!

lapis sequoia Jun 10, 2023, 5:38 PM

#

Anyone know if theres a tag in PyTorch for easy tasks like cleanup or good first task

serene scaffold Jun 10, 2023, 5:52 PM

#

lapis sequoia Anyone know if theres a tag in PyTorch for easy tasks like cleanup or good first...

https://github.com/pytorch/pytorch/issues?q=is%3Aopen+is%3Aissue+label%3A"good+first+issue"

vale idol Jun 10, 2023, 5:54 PM

#

Hi I have a quick question regarding assigning values to columns using pandas

#

df.column1[df.column2 <= 3] = 'LScores'

serene scaffold Jun 10, 2023, 5:55 PM

#

vale idol ` df.column1[df.column2 <= 3] = 'LScores'`

you should be using loc for this

vale idol Jun 10, 2023, 5:55 PM

#

Whenever I try to assign values using a function I get : SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

serene scaffold Jun 10, 2023, 5:55 PM

#

df.loc[df['column2'].le(3), 'column1'] = 'LScores'

vale idol Jun 10, 2023, 5:56 PM

#

Isn't it the same?

serene scaffold Jun 10, 2023, 5:56 PM

#

no

vale idol Jun 10, 2023, 5:56 PM

#

plus I am trying to assign the value back to the original dataframe so I thought loc wouldn't make much sense, especially since I'm running this command in a for loop

serene scaffold Jun 10, 2023, 5:57 PM

#

vale idol plus I am trying to assign the value back to the original dataframe so thoght lo...

that's what loc is for.
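A minimal sketch of the difference on a toy frame (the data is made up; the column names follow the ones in the question). `df.column1[mask] = ...` is two separate indexing steps, so pandas cannot guarantee the write reaches `df`; a single `.loc` call selects rows and column at once and writes into the original frame:

```python
import pandas as pd

df = pd.DataFrame({"column1": ["", "", "", ""], "column2": [1, 3, 5, 2]})

# One indexing operation: boolean row mask plus column label.
df.loc[df["column2"] <= 3, "column1"] = "LScores"
print(df)
```

This works the same inside a for loop, since each `.loc` assignment modifies `df` in place.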

agile cobalt Jun 10, 2023, 5:57 PM

#

you might want to enable Copy on Write by the way
https://pandas.pydata.org/docs/dev/user_guide/copy_on_write.html

serene scaffold Jun 10, 2023, 5:58 PM

#

but you should probably store the string value in a different column. or you'll have a heterogenous column

#

columns need to be homogenous

#

oh, you are

#

fixed my code example.

vale idol Jun 10, 2023, 5:59 PM

#

serene scaffold but you should probably store the string value in a different column. or you'll ...

Yeah it's just a predefined empty column I'm assigning to

hoary jay Jun 10, 2023, 6:01 PM

#

actually the formula for cosine similarity used inside scipy and pretty much everywhere else is in fact (u•v)/(|u||v|), where u and v are vectors, so essentially we are already doing w • ((t1-t2)/|t1-t2|) like you said anyways.

The problem is how to set thresholds and also how to introduce fuzzy logic in the process (or anything else that can help) because if i have a score of 0.052 and the neutral threshold is <= 0.05 then this will put it in the t1 partition which is wrong because 0.002 is not a significant difference and the sentence is actually neutral not biased.

hoary jay Jun 10, 2023, 6:01 PM

#

queen cradle That doesn't solve your question, of course. You were asking about how to define...

.

agile cobalt Jun 10, 2023, 6:02 PM

#

vale idol Yeah its just a predifined empty column I'm assigning to

if you just want a if x > something else b you could use np.where instead of assigning yourself

if you have something like -inf...0 = a, 0...3 = b, 3...inf = c you could use pd.Series.map with a dictionary but it gets a bit uglier
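A short sketch of both cases on a toy Series. The two-way case uses `np.where` as suggested; for the range case this swaps in `pd.cut` with explicit bin edges instead of the `map` approach mentioned above, which is an alternative and not what was proposed. The values and labels are made up:

```python
import numpy as np
import pandas as pd

x = pd.Series([-2.0, 0.0, 1.5, 3.0, 7.0])

# Two-way choice: "a" where x > 0, otherwise "b".
two_way = np.where(x > 0, "a", "b")

# Range-based labels. With the default right=True the bins are
# (-inf, 0], (0, 3], (3, inf), so a value exactly on an edge goes to the lower bin.
three_way = pd.cut(x, bins=[-np.inf, 0, 3, np.inf], labels=["a", "b", "c"])

print(two_way.tolist())
print(three_way.astype(str).tolist())
```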

vale idol Jun 10, 2023, 6:02 PM

#

agile cobalt if you just want `a if x > something else b` you could use `np.where` instead of...

Yeah I'm dealing with the second option

queen cradle Jun 10, 2023, 6:03 PM

#

hoary jay actually the formula for cosine similarity used inside scipy and pretty much eve...

That's the formula for the cosine of the angle, which is in fact the same as distance from the hyperplane. Either way of looking at it is fine; I just think that looking at it as distance from a hyperplane clarifies why what you're doing is okay.

vale idol Jun 10, 2023, 6:04 PM

#

Specifically, I have a multi index df where I ran the following command
df['env_measure_sorts_10'] = df.groupby(['year'])[score_type[1]].transform(lambda x: pd.qcut(x, q=10, labels=labels_dec))

hoary jay Jun 10, 2023, 6:04 PM

#

queen cradle That's the formula for the cosine of the angle, which is in fact the same as dis...

okay i see

queen cradle Jun 10, 2023, 6:04 PM

#

As far as setting thresholds, I think it depends. You said you have some labeled data. You could use that: There may be a good threshold which distinguishes "sentence about males" versus "everything else" and a second good threshold that distinguishes "sentences about females" versus "everything else".

vale idol Jun 10, 2023, 6:05 PM

#

vale idol Specifically, I have a multi index df where I ran the following command df['...

So the function I asked about is just assigning labels to the deciles made here

queen cradle Jun 10, 2023, 6:05 PM

#

And (perhaps more importantly) you can find that threshold using your labeled data.

vale idol Jun 10, 2023, 6:05 PM

#

Also thanks for the help @agile cobalt @serene scaffold, appreciate it 🙂

queen cradle Jun 10, 2023, 6:06 PM

#

My suggestion is to plot histograms of "sentences about males" and "everything else" and see if it looks like there is a threshold that distinguishes them. And similarly for "sentences about females" and "everything else". If there's a threshold, great! If not, then you'll need a different embedding.

lapis sequoia Jun 10, 2023, 6:06 PM

#

serene scaffold https://github.com/pytorch/pytorch/issues?q=is%3Aopen+is%3Aissue+label%3A%22good...

Oh geez sweet thank you

hoary jay Jun 10, 2023, 6:07 PM

#

queen cradle My suggestion is to plot histograms of "sentences about males" and "everything e...

what about the problem i discussed? basically if you test it out on a huge data then you always end up having a continuous range of data between -n to n and a strict threshold limit always end up leading to false positives all the time, like the one i mentioned above

#

the case of 0.052

queen cradle Jun 10, 2023, 6:08 PM

#

You'll always have false positives. That's how classification problems are.

#

Well, realistic ones, at least.

hoary jay Jun 10, 2023, 6:09 PM

#

queen cradle You'll always have false positives. That's how classification problems are.

in my last run, i set the threshold to 0.05 and about 200 sentences were ranging from 0.05 to 0.055 that's a huge amount which lowered the accuracy drastically

queen cradle Jun 10, 2023, 6:11 PM

#

The problem is likely to be either your embedding or your choice of the vectors t1 and t2. If your embedding isn't very good at describing masculine-topic sentences, feminine-topic sentences, and other sentences, or if t1 and t2 are not very representative of masculine- and feminine-topic sentences, then you'll get poor results no matter what you do.

vale idol Jun 10, 2023, 6:12 PM

#

Also if anyone knows, is there a difference between:
df.loc[df.column1 <= 3, 'column 2'] = 'LScores'
and
df.loc[df['column1'] <= 3, 'column 2'] = 'LScores'

#

Since they appear to give the same output but not sure if there are caveats to this

serene scaffold Jun 10, 2023, 6:12 PM

#

vale idol Also if anyone knows, is there a difference between: df.loc[df.column1 <= 3,...

they're the same, but some people dislike using dot notation because it puts columns and dataframe methods in the same namespace

#

and if you have a column with the same name as a method, you can't get to it with dot notation.

#

or if the name of the column has a space
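A tiny sketch of both caveats on a made-up frame: a column called `count` is shadowed by the `DataFrame.count` method, and a name with a space can't be written as an attribute at all, while bracket access works for both.

```python
import pandas as pd

df = pd.DataFrame({"count": [1, 5], "column 2": ["", ""]})

# Dot access finds the DataFrame.count method, not the column.
is_method = callable(df.count)

# Bracket access always refers to the column, whatever its name.
small = df["count"] <= 3
df.loc[small, "column 2"] = "LScores"

print(is_method)
print(df)
```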

vale idol Jun 10, 2023, 6:13 PM

#

serene scaffold they're the same, but some people dislike using dot notation because it puts col...

Good to know. Thanks a lot for the info!

queen cradle Jun 10, 2023, 6:14 PM

#

@hoary jay If you haven't tried plotting histograms of the scores of your sentences, then I would do that before anything else. Those plots will tell you whether or not you have a problem with your thresholds.

wooden sail Jun 10, 2023, 6:17 PM

#

queen cradle If I understand your setup correctly, t1 and t2 are vectors, and cos(v, w) is th...

ah good observation, i somehow didn't see that immediately

hoary jay Jun 10, 2023, 6:18 PM

#

queen cradle @hoary jay If you haven't tried plotting histograms of the scores of ...

plotted a kde, it's a normal distribution curve...

queen cradle Jun 10, 2023, 6:19 PM

#

For which class?

#

And centered where, and with what kind of variance?

hoary jay Jun 10, 2023, 6:20 PM

#

queen cradle For which class?

nope plotted the complete data, centred almost at 0 shifted towards the left a little almost -0.15 ig

queen cradle Jun 10, 2023, 6:21 PM

#

That's saying that most of your data is not especially similar to either of the vectors you're comparing to. Is most of your labeled data neutral?

wooden sail Jun 10, 2023, 6:22 PM

#

that'd more depend on the choice of the threshold, no?

queen cradle Jun 10, 2023, 6:22 PM

#

If the whole distribution looks normalish, though, then there isn't clear separation between the classes of interest.

hoary jay Jun 10, 2023, 6:23 PM

#

queen cradle That's saying that most of your data is not especially similar to either of the ...

yep

hoary jay Jun 10, 2023, 6:23 PM

#

queen cradle If the whole distribution looks normalish, though, then there isn't clear separa...

makes sense

queen cradle Jun 10, 2023, 6:23 PM

#

Okay. What if you plot KDEs of just the other classes?

#

Maybe a little separation is visible?

hoary jay Jun 10, 2023, 6:24 PM

#

ok so i just select a random threshold and then plot the different classes after separating with it?

queen cradle Jun 10, 2023, 6:24 PM

#

No, I'm saying take all the data with label 1, and plot a histogram (or KDE) of that data and only that data. Then plot a separate histogram for the data with label 2, etc.
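A sketch of that per-label split, on synthetic scores since the real ones aren't available here (the label coding 0/1/2, the class centers and the noise level are assumptions). It computes one histogram per label over shared bins; the counts can then be drawn with any plotting library:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in: one label and one similarity score per sentence.
labels = rng.integers(0, 3, size=600)
scores = np.array([0.0, 0.25, -0.25])[labels] + rng.normal(0.0, 0.1, size=600)

# Shared bin edges so the per-label histograms are directly comparable.
bins = np.linspace(scores.min(), scores.max(), 21)

# One histogram per ground-truth label, using only the rows with that label.
hist_by_label = {
    int(lab): np.histogram(scores[labels == lab], bins=bins)[0]
    for lab in np.unique(labels)
}

for lab, counts in hist_by_label.items():
    print(lab, counts)
```

The key point is that the grouping uses the ground-truth labels, not the output of any threshold, so the plots show how separable the classes are before a cutoff is chosen.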

hoary jay Jun 10, 2023, 6:25 PM

#

queen cradle No, I'm saying take all the data with label 1, and plot a histogram (or KDE) of ...

oh okay

wooden sail Jun 10, 2023, 6:27 PM

#

i just got that you were talking about the a priori distribution, i'm tripping today

hoary jay Jun 10, 2023, 6:28 PM

#

dist for the two labels

queen cradle Jun 10, 2023, 6:30 PM

#

Wait. This is the labeled data that you began with? It doesn't seem right.

#

It seems extremely unlikely to me that there are sudden cutoffs at zero. Especially for the first picture since it seems to have only just started to peak.

hoary jay Jun 10, 2023, 6:32 PM

#

queen cradle It seems extremely unlikely to me that there are sudden cutoffs at zero. Especia...

correct, basically i only considered the two labels here. there is another label which is the neutrals, that is where you will mostly find similarity values closer to 0

#

so u see when i plot the whole data i get a normal dist

wooden sail Jun 10, 2023, 6:33 PM

#

this is the labelled data set right? not the output of your classification for a given threshold

hoary jay Jun 10, 2023, 6:34 PM

#

wooden sail Jun 10, 2023, 6:34 PM

#

that's a bimodal normal at best, btw

hoary jay Jun 10, 2023, 6:34 PM

#

wooden sail this is the labelled data set right? not the output of your classification for a...

yeah i took the labelled dataset then computed their sentence embeddings and the similarity score and that is the output, similarity score of every sentence with t1 and t2 in the plot

hoary jay Jun 10, 2023, 6:35 PM

#

wooden sail that's a bimodal normal at best, btw

oh yeah right.. sorry the word embeddings and their similarity scores had a normal dist not the sentences, sorry im really confused since morning lol

wooden sail Jun 10, 2023, 6:36 PM

#

is this the same dataset from the reference paper? out of curiosity

hoary jay Jun 10, 2023, 6:36 PM

#

wooden sail is this the same dataset from the reference paper? out of curiosity

nope the ref paper was not about classification so they didn't have any labelled data

wooden sail Jun 10, 2023, 6:36 PM

#

ok

#

i also find it a little "too nice" that there don't appear to be any misclassified sentences

queen cradle Jun 10, 2023, 6:38 PM

#

It looks to me like (barring a bug) your embedding isn't capturing enough information about individual sentences for you to reliably draw the distinctions you want.

#

But I also think it's suspicious that there are no misclassified sentences.

hoary jay Jun 10, 2023, 6:39 PM

#

wooden sail i also find it a little "too nice" that there don't appear to be any missclassif...

wdym? there are misclassified sentences...

queen cradle Jun 10, 2023, 6:39 PM

#

hoary jay wdym? there are misclassiferd sentences...

Sure doesn't look like it.

#

One distribution is negative only. The other is positive only. No misclassification.

wooden sail Jun 10, 2023, 6:39 PM

#

i would've worded it as, especially in the first figure, one of the classes being super steeply affected by the choice of the threshold. similar to what kyle said, the classes are kinda hard to separate

hoary jay Jun 10, 2023, 6:41 PM

#

queen cradle One distribution is negative only. The other is positive only. No misclassificat...

how do you know the sentence that is in the negative is actually biased towards t1? the only way to check is manually or by going through the labels, and on doing that we find that most of the sentences u see on the left side are actually neutral and hence wrongly classified

queen cradle Jun 10, 2023, 6:42 PM

#

If the plots are showing what I thought they were, there's one plot for all the labeled data whose labels are masculine-topic and one plot for all the labeled data whose labels are feminine-topic.

#

Is that not the case?

wooden sail Jun 10, 2023, 6:43 PM

#

what labels does your labelled dataset have?

#

or what did you put in the plots rn

queen cradle Jun 10, 2023, 6:44 PM

#

> most of the sentences u see on the left side are actually neutral and hence wrongly classified
This sounds to me like the plots are showing how the data was classified, not how the data was labeled.

hoary jay Jun 10, 2023, 6:44 PM

#

wooden sail what labels does your labelled dataset have?

0 for neutral 1 and 2 for bias against t1 and t2

wooden sail Jun 10, 2023, 6:45 PM

#

ok

#

and you plotted the t1 - t2 score here, yeah? separately for the labels t1 and t2

hoary jay Jun 10, 2023, 6:46 PM

#

queen cradle > most of the sentences u see on the left side are actually neutral and hence wr...

in the above 2 plots u can see right, that neutral statements with value closer to 0 are in the majority .. of a population of t1 and t2

#

that is wrong classification right?

hoary jay Jun 10, 2023, 6:46 PM

#

wooden sail and you plotted the t1 - t2 score here, yeah? separately for the labels t1 and t...

yep

queen cradle Jun 10, 2023, 6:46 PM

#

If the plots are showing what I think they are, then no, I can't see that.

queen cradle Jun 10, 2023, 6:46 PM

#

hoary jay dist for the two labels

Are you saying that there are neutral statements in these plots?

wooden sail Jun 10, 2023, 6:47 PM

#

the plots kinda say that the t1 vs t2 classification is trivial, and that there is not much info at all regarding the neutral class

#

can you show the histogram for the 0 labelled data?

hoary jay Jun 10, 2023, 6:47 PM

#

queen cradle Are you saying that there are neutral statements in these plots?

no im saying that biased statements have a similarity score of closer to 0 when in reality they should not..

hoary jay Jun 10, 2023, 6:47 PM

#

wooden sail can you show the histogram for the 0 labelled data?

ok

wooden sail Jun 10, 2023, 6:47 PM

#

i think something weird is going on

#

in how this was labelled

#

the more i think about the plots the less sense they make lol

hoary jay Jun 10, 2023, 6:49 PM

#

wooden sail can you show the histogram for the 0 labelled data?

similarity score of only Neutral statements (0 labelled)

wooden sail Jun 10, 2023, 6:49 PM

#

ok

#

my interpretation wasn't that far off

hoary jay Jun 10, 2023, 6:50 PM

#

wooden sail in how this was labelled

perhaps i did something wrong in the code i should re calculate the embeddings and check all the var names, i make a lot of mistakes when im fatigued lol

wooden sail Jun 10, 2023, 6:50 PM

#

according to your labelled data, the 1 vs 2 classification is trivial

#

and fishing out the neutrals comes at a terrible price

#

your data set is such that the threshold of false positives means you will get a ton of false negatives

#

it could be true of the language, that idk. it could be an artifact of the choice of embedding or labelling, idk

hoary jay Jun 10, 2023, 6:52 PM

#

wooden sail and fishing out the neutrals comes at a terrible price

looks like it that is exactly what is happening lol i end up classifying the t1 and t2 class but i also end up assigning a lot of neutrals with them

wooden sail Jun 10, 2023, 6:52 PM

#

the actual interpretation of this requires domain knowledge that i don't have, this really needs knowledge of the statistics of a language

hoary jay Jun 10, 2023, 6:52 PM

#

same

wooden sail Jun 10, 2023, 6:53 PM

#

but if we put that aside, and assume these are correct

#

my suggestion would be the same as kyles. as i said before, we can treat the threshold as a hyperparam and given this info, we should be able to optimize for it

hoary jay Jun 10, 2023, 6:53 PM

#

hmm ill try that

wooden sail Jun 10, 2023, 6:54 PM

#

or pick it for a choice of false negatives per class, or something of the sort

#

immediately you notice that this thing isn't symmetric, so i would recommend to pick a different negative and positive thresh
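One way to sketch "pick it for a choice of false negatives per class" with separate positive and negative thresholds: take each cutoff from a quantile of that class's own labelled scores. The data below is synthetic and deliberately asymmetric, and the 10% target (`max_fn_rate`) and label coding are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic, deliberately asymmetric scores: 0 = neutral, 1 = t1, 2 = t2.
labels = rng.integers(0, 3, size=3000)
centers = np.array([0.0, 0.35, -0.15])
spread = np.array([0.08, 0.12, 0.05])
scores = centers[labels] + rng.normal(size=labels.size) * spread[labels]

max_fn_rate = 0.10  # accept missing at most about 10% of each biased class

# About 10% of label-1 scores fall below pos_thr; about 10% of label-2 scores lie above neg_thr.
pos_thr = np.quantile(scores[labels == 1], max_fn_rate)
neg_thr = np.quantile(scores[labels == 2], 1 - max_fn_rate)

pred = np.zeros_like(labels)
pred[scores > pos_thr] = 1
pred[scores < neg_thr] = 2

fn_rate_1 = (pred[labels == 1] != 1).mean()
fn_rate_2 = (pred[labels == 2] != 2).mean()
neutral_fp_rate = (pred[labels == 0] != 0).mean()
print(pos_thr, neg_thr, fn_rate_1, fn_rate_2, neutral_fp_rate)
```

The price of fixing the false-negative rate shows up in `neutral_fp_rate`, the share of neutral sentences pushed into one of the biased classes, which is the trade-off discussed above.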

#

about the correctness of the histogram, you'd need to check papers or contact an expert

#

but those histograms look very weird to me

queen cradle Jun 10, 2023, 6:55 PM

#

I agree that the histograms look weird. So I'm suspicious that there's a bug. But assuming there isn't, you may not be able to extract enough data from a single sentence to do what you want.

#

You might have better luck if you embed whole paragraphs.

wooden sail Jun 10, 2023, 6:56 PM

#

ah, can you change the dimensionality of the embedding?

#

increase it a little?

hoary jay Jun 10, 2023, 6:56 PM

#

wooden sail increase it a little?

no can't its 384 dim now

#

wait i think i can

#

there is another big pretrained model i believe i can try that

wooden sail Jun 10, 2023, 6:56 PM

#

i can understand technical constraints 😛

#

give it a shot, or what kyle said too. depends on how much time you have available for this

hoary jay Jun 10, 2023, 6:57 PM

#

wooden sail i can understand technical constraints 😛

that would be a slight problem but i think colab might help

hoary jay Jun 10, 2023, 6:57 PM

#

wooden sail give it a shot, or what kyle said too. depends on how much time you have availab...

abstract deadline is on 23rd of June

#

trying to submit it to a conference

wooden sail Jun 10, 2023, 6:58 PM

#

ah, pretty tight on time

hoary jay Jun 10, 2023, 6:58 PM

#

yeep

#

lets see

queen cradle Jun 10, 2023, 6:58 PM

#

Oof, good luck!

wooden sail Jun 10, 2023, 6:58 PM

#

but if it's just an abstract, all you need is 1 or 2 pretty pictures 😛

hoary jay Jun 10, 2023, 6:58 PM

#

wooden sail but if it's just an abstract, all you need is 1 or 2 pretty pictures 😛

pithink

#

this is my first ever paper so i actually dont know what to submit either 😂

queen cradle Jun 10, 2023, 6:59 PM

#

Depends on the venue. Your supervisor should be able to explain.

hoary jay Jun 10, 2023, 6:59 PM

#

ill ask the prof yea

wooden sail Jun 10, 2023, 6:59 PM

#

check the conference guidelines. many conferences that request an abstract have a condition like 1 or 2 paragraphs, 1 or 2 images, a word limit, and other specs. check that ahead of time

#

in some cases you can get away with promising the world

#

in others you need to show the results up front, ideally with pretty pictures

hoary jay Jun 10, 2023, 7:00 PM

#

wooden sail check the conference guidelines. many conferences that request an abstract have ...

ok will do

wooden sail Jun 10, 2023, 7:00 PM

#

and peer reviewed ones need the paper for review months in advance (not the case if they only request an abstract)

queen cradle Jun 10, 2023, 7:00 PM

#

For questions like that, there are often conventions that are specific to your field. Outsiders aren't going to have much luck making recommendations.

wooden sail Jun 10, 2023, 7:00 PM

#

the conference website has the explicit guidelines, go check it out!

#

btw your stuff looks kinda like 2 beta distributions and a gaussian, if you like doing parametric estimation instead of KDE. not much of a difference here tbh since it's anyway blind, but...

hoary jay Jun 10, 2023, 7:03 PM

#

wooden sail btw your stuff looks kinda like 2 beta distributions and a gaussian, if you like...

what does this mean actually tho?

#

sorry not that good in stats but i like stats

wooden sail Jun 10, 2023, 7:04 PM

#

in KDE one takes a basic shape and uses it to build up the observed shape

#

in parametric estimation, one knows the correct parametric family ahead of time and fits the parameters of that directly

hoary jay Jun 10, 2023, 7:05 PM

#

wooden sail in KDE one takes a basic shape and uses it to build up the observed shape

yeah kind of like curve fitting

wooden sail Jun 10, 2023, 7:05 PM

#

but differently

hoary jay Jun 10, 2023, 7:06 PM

#

wooden sail in parametric estimation, one knows the correct parametric family ahead of time ...

ok so can that info be useful? like knowing the distribution family? how can i use this info

wooden sail Jun 10, 2023, 7:06 PM

#

for KDE the "width" or "variance" of a kernel is chosen a priori, as well as a "model order" (number of kernels). then one finds where to shift the kernels to

#

on the other hand, if you correctly know the parametric family ahead of time, you can explicitly do model order and parameter estimation

#

the difference is that KDE in general has no physical interpretation, it just gives you a parametric representation you can evaluate anywhere

#

in parameter estimation, the estimated parameters actually represent the properties of the data

#

(assuming your choice of parametric family was correct... that's a big IF)
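A rough side-by-side of the two approaches, assuming NumPy and SciPy are available and using synthetic Beta(2, 5) draws as a stand-in for the similarity scores. One caveat on the description above: in the usual form of KDE (including `scipy.stats.gaussian_kde`) a kernel is centred on every sample and only the bandwidth is chosen; choosing a small number of kernels and shifting them is closer to a mixture model.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores = rng.beta(2.0, 5.0, size=2000)  # synthetic stand-in data

# Non-parametric: Gaussian KDE, bandwidth picked by Scott's rule (SciPy default).
kde = stats.gaussian_kde(scores)

# Parametric: fit a beta distribution with support fixed to [0, 1].
a_hat, b_hat, _loc, _scale = stats.beta.fit(scores, floc=0, fscale=1)

grid = np.linspace(0.01, 0.99, 99)
kde_density = kde(grid)                            # can be evaluated anywhere
beta_density = stats.beta.pdf(grid, a_hat, b_hat)  # described by just two numbers

print(f"fitted beta parameters: a={a_hat:.2f}, b={b_hat:.2f}")
```

The KDE only gives a curve to evaluate, while the fitted `a_hat` and `b_hat` summarise the shape directly, provided the beta family was the right choice in the first place.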

queen cradle Jun 10, 2023, 7:08 PM

#

Parametric estimation, like the normal and beta distributions Edd is describing, is particularly useful when you have limited data, when you're trying to find a simple approximation, and when theory predicts that a distribution should be close to parametric.

wooden sail Jun 10, 2023, 7:08 PM

#

that's the final kicker. the lower bound for parameter estimation requires as many samples as unknown parameters

queen cradle Jun 10, 2023, 7:08 PM

#

For example, the central limit theorem essentially says that, given enough data (and some mild technical hypotheses), most things look like they have a normal distribution.

#

At least if you choose the distribution's parameters right.

wooden sail Jun 10, 2023, 7:10 PM

#

kyle's practical implications are probably more relevant to you than my ramblings 😛

hoary jay Jun 10, 2023, 7:11 PM

#

ig you both are insightful

wooden sail Jun 10, 2023, 7:11 PM

#

you guys are doing the work, i'm just a resident rubber ducky

brave sand Jun 10, 2023, 7:12 PM

#

how are you guys using colab without running out of memory?

hoary jay Jun 10, 2023, 7:13 PM

#

wooden sail you guys are doing the work, i'm just a resident rubber ducky

lol dont say that im the one writing a paper and getting stuck on freshman statistics lol😭

#

i still dont know how u eyeballed that its a beta distribution, when i can hardly remember its formula and graph

queen cradle Jun 10, 2023, 7:14 PM

#

I guess I should add that real data is never exactly normal (despite the central limit theorem). Often it's not even hard to see the difference, particularly in the tails.

wooden sail Jun 10, 2023, 7:14 PM

#

hoary jay i still dont know how u eyeballed that its a beta distribution, when i can hardl...

through the magic of google. there are other curves that look like that too, i just picked one i recognized

#

you'd technically have to try a few and then do something like a kolmogorov-smirnov test to pick which one fits best

hoary jay Jun 10, 2023, 7:15 PM

#

wooden sail through the magic of google. there are other curves that look like that too, i j...

okay i should try that

#

is it like another hypothesis test?

wooden sail Jun 10, 2023, 7:17 PM

#

you can check it out here https://en.wikipedia.org/wiki/Kolmogorov–Smirnov_test it's a measure of distance between distributions. scipy appears to have an implementation

brave sand Jun 10, 2023, 7:17 PM

#

does anyone know what this error means?

AttributeError: module 'object_detection.protos.square_box_coder_pb2' has no attribute 'DESCRIPTOR'

queen cradle Jun 10, 2023, 7:18 PM

#

hoary jay is it like another hypothesis test?

It's a hypothesis test for equality of distributions. If you're trying to determine whether one distribution looks like another, though, then you can just use the KS statistic.
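A small sketch of using the KS statistic to compare candidate families, assuming SciPy and synthetic Beta(2, 5) data in place of the real scores. Since the parameters are fitted on the same data, the p-values from `kstest` are not reliable here, so only the statistic (smaller means closer) is compared.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
scores = rng.beta(2.0, 5.0, size=5000)  # synthetic stand-in data

# Fit each candidate family to the same data.
a_hat, b_hat, _loc, _scale = stats.beta.fit(scores, floc=0, fscale=1)
mu_hat, sigma_hat = stats.norm.fit(scores)

# KS statistic: largest gap between the empirical CDF and the fitted CDF.
ks_beta = stats.kstest(scores, "beta", args=(a_hat, b_hat)).statistic
ks_norm = stats.kstest(scores, "norm", args=(mu_hat, sigma_hat)).statistic

print(f"KS vs fitted beta:   {ks_beta:.4f}")
print(f"KS vs fitted normal: {ks_norm:.4f}")
```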

#

I would say that finding an appropriate parametric distribution is a bit of an art. It tends to work best when there's some underlying reason (physical, experimental, etc.) why that distribution is close to what you're observing.

hoary jay Jun 10, 2023, 7:20 PM

#

queen cradle It's a hypothesis test for equality of distributions. If you're trying to determ...

oh i see

queen cradle Jun 10, 2023, 7:21 PM

#

So, for example, if you say, "Oh, I got this quantity by taking a sum of a lot of other quantities," then that's maybe suggestive of something normal or normal-ish. There are a lot of distributions that look similar to a normal distribution (t-distributions, chi^2 distributions, etc.) but if you told me that you wanted to fit a big sum to something parametric, my first guess would be a normal distribution.

wooden sail Jun 10, 2023, 7:21 PM

#

one of the things i considered when i said "beta" is that it's bounded, for example, unlike other similar-looking dists. that at least fits the behavior of the values computed by your similarity measure

queen cradle Jun 10, 2023, 7:21 PM

#

If you have enough data, though, non-parametric methods will give you better results in the sense that they will come closer to describing reality.

#

Oh, and there are loads of places where people perform a normality test, see that it fails to reject the null hypothesis, and use that as justification for a hypothesis test that requires normality (e.g., a t-test). You should never do this: Failing to reject the null hypothesis does not mean your data is normal. Your data is never perfectly normal; you just may not have enough to reject normality with the test you're using.

hoary jay Jun 10, 2023, 7:25 PM

#

queen cradle Oh, and there are loads of places where people perform a normality test, see tha...

wait t-test requires the population to follow normal distribution?

#

what if it doesn't? then is ttest valid?

queen cradle Jun 10, 2023, 7:25 PM

#

The usual justification for a t-test requires normality.

#

If the data isn't normal, then it may still be close enough to normal that the t-test is okay.

hoary jay Jun 10, 2023, 7:26 PM

#

ok so how can i make sure the data is close to normal? im asking because I'm using T test later

#

something like Shapiro test?

queen cradle Jun 10, 2023, 7:27 PM

#

Well, if you want to be really careful, then there's no a priori guarantee.

hoary jay Jun 10, 2023, 7:28 PM

#

ok

queen cradle Jun 10, 2023, 7:28 PM

#

Usually we apply the t-test to something like sample means. The central limit theorem says that if we have enough data, then those are approximately normally distributed.

#

But how close you get to normal depends on other parameters of the distribution that you probably don't know.

#

This is quantified by the Berry–Esseen theorem, which shows that the rate of convergence depends on the third moment (unnormalized skewness).

#

In practice, you don't know the third moment, so you don't know how close you are to normal, so you can't really justify a normal approximation and a t-test.

#

But in practice, this rarely matters. In practice, with a decent amount of data, the skewness is almost never so extreme that it messes you up.

hoary jay Jun 10, 2023, 7:31 PM

#

my idea was a 2nd classification: after classifying a sentence as t1 or t2 related, i calculate scores of the word embeddings in that sentence and take them as a sample, then perform a t-test where the population is the scores of words that are actually biased towards t1 (say), obtained from the same dataset. then perhaps the null hypothesis can be "the sentence is biased because it contains biased words" and the alternative hypothesis could be "the sentence is not biased because the mean of the sample is not similar to the mean of the biased words".. does that make sense?

queen cradle Jun 10, 2023, 7:32 PM

#

There are various rules of thumb, like n > 30 or n > 50 or whatever. None of these are entirely reliable; you can always concoct a really bad example where they won't work. But most data isn't really bad in that way, so these rules of thumb usually work.
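A toy simulation of why such rules of thumb usually work, using only the standard library. It draws sample means from a strongly skewed distribution (exponential, chosen here purely as an example) and measures how skewed the means still are for a few sample sizes; the helper names are hypothetical.

```python
import random
import statistics

def skewness(values):
    """Population skewness: mean of the cubed standardised values."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def skew_of_sample_means(n, reps=5000, seed=0):
    """Skewness of `reps` sample means, each computed from n exponential draws."""
    rng = random.Random(seed)
    means = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]
    return skewness(means)

skews = {n: skew_of_sample_means(n) for n in (5, 30, 200)}
for n, value in skews.items():
    print(f"n={n:>3}: skewness of the sample mean is about {value:.2f}")
```

The skewness of the mean shrinks as n grows (a normal distribution has skewness 0), but how fast it shrinks depends on how skewed the underlying data is, which matches the Berry–Esseen point made earlier.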

queen cradle Jun 10, 2023, 7:35 PM

#

hoary jay my idea was as a 2nd classification, after classifying a sentence into t1 or t2 ...

I don't think I quite follow. Say you have a sentence. You embed it, and you classify it into t1 or t2 related. Great. It has a lot of words. You embed those and then score them. But I'm not sure what you propose to do with those scores.

hoary jay Jun 10, 2023, 7:40 PM

#

there's a difference b/w related to t1 and t2 and biased towards t1 and t2... because if a sentence is biased towards t1 and t2 then there must be a use of some stereotypical word or anything that contributes to biases. In my very first text, i mentioned that our ref paper can actually use these word embedding scores to find the most biased words against t1 and t2.

So i thought if a sentence is related to t1 then it can either be a normal sentence that just revolves around t1 and doesn't have any Stereotypical or gender bias in it...But then if a sentence that is related to t1 and has biased words getting used in it probably has a higher chance of being a Stereotypical or biased sentence against t1 right? So that's why i thought maybe we could look at the sentence from the perspective of words too

queen cradle Jun 10, 2023, 7:46 PM

#

hoary jay there's a difference b/w related to t1 and t2 and biased towards t1 and t2... be...

It sounds like you want to use both sentence and word embeddings. The approach you describe is a kind of hierarchical model. Those are fine; a good hierarchical model can be quite powerful. Another approach you could try (well, with enough time; I remember that you have a deadline coming up!) is making a big, combined embedding by using both a sentence and a word embedding. Maybe that would give you more information than the word embeddings alone. It has the same information as the hierarchical model, but it might also be harder to fit.

hoary jay Jun 10, 2023, 7:49 PM

#

queen cradle It sounds like you want to use both sentence and word embeddings. The approach y...

perhaps i could, if i could make time anyways thanks for your help tho!

plain jungle Jun 10, 2023, 7:51 PM

#

An AI I made to play battleship. Am very excited on how well it turned out

hoary jay Jun 10, 2023, 7:54 PM

#

plain jungle An AI I made to play battleship. Am very excited on how well it turned out

did u use pygame for engine?

plain jungle Jun 10, 2023, 7:56 PM

#

hoary jay did u use pygame for engine?

Tkinter , PIL, Matplotlib

hoary jay Jun 10, 2023, 7:58 PM

#

plain jungle Tkinter , PIL, Matplotlib

cool i like the confusion matrix on the right

plain jungle Jun 10, 2023, 7:59 PM

#

Thank you! It’s a standard population density graph! My goal has been lately to show people that AI is a very broad range and not everything needs NNs to be optimal

hoary jay Jun 10, 2023, 8:08 PM

#

love that

tidal bough Jun 10, 2023, 8:10 PM

#

plain jungle An AI I made to play battleship. Am very excited on how well it turned out

Heat Map Colorbar Label 😉

plain jungle Jun 10, 2023, 8:14 PM

#

tidal bough Heat Map Colorbar Label 😉

Lmao, thank you for the catch! Was just too excited that it was working, I forgot

tacit knot Jun 10, 2023, 8:32 PM

#

Ok, I'm really hoping someone can point me in a direction here... I'm building an open source AI based thing (called AutoVR.ai, details if ya want, but doesn't matter for this question)

Under the hood I'm using ZoeDepth that is based on MiDaS and those build off of torch.nn.Module. The point of this model is to take in an image and it puts out a depthmap. There is an ability to adjust the "precision" to use for the actual inference portion and even though the output resolution will end up matching the input image, it does seem to make a dramatic difference on the quality and details clearly present in the depthmap itself. So there is a desire to crank up that "precision" as much as possible if trying to produce high quality outputs. Now, the problem is, the max precision someone can use is going to be directly associated to the VRAM available. If that "precision" is set too high, it will just throw an out of VRAM error and I'm trying to deal with that a bit more gracefully than just letting the thing crash.

I first attempted to catch that error, automatically adjust the precision down a bit, reattempt, and keep track of the working precision combinations so it doesn't need re-determined multiple times. That process generally works, but there are some issues. I've stumbled onto some insane memory leaks that I might have introduced, but they sure seem to be inherent in this ZoeDepth thing that I'd rather not rip apart and/or update to do proper garbage collection.

#

That all said, I simply don't understand python memory management well enough at the moment. For example, without that "dynamic" functionality I'm getting this as an example error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.97 GiB (GPU 0; 11.69 GiB total capacity; 7.51 GiB already allocated; 2.06 GiB free; 7.55 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The part I'm really not understanding is why it is telling me 11.69 total, 7.55 reserved, but that would imply 4.14 free, although it is saying only 2.06. Just understanding this a bit more might help a lot and honestly, my googling/chatgpting has sent me down more rabbit holes than anything that seems actually helpful.

iron basalt Jun 10, 2023, 9:11 PM

#

tacit knot That all said, I simply don't understand python memory management well enough at...

There is a bunch of other memory used by Pytorch that is not included in this out of memory error.

#

There is also memory being used by the rest of the system (other processes).

tacit knot Jun 10, 2023, 9:12 PM

#

Gotcha, is there any way to see/find that? I can see some using nvidia-smi, but the numbers are still way different. And I'm very aware that I just don't fully understand this stuff, just need the right thing/name/concept to look into more.

iron basalt Jun 10, 2023, 9:15 PM

#

tacit knot Gotcha, is there any way to see/find that? I can see some using `nvidia-smi`, bu...

Nvidia-smi gives you some of the missing info, you also need the base amount of memory Pytorch needs.

#

2109.44 MiB (2.06 GiB free) + 7731.2 MiB (7.55 GiB reserved) + 1251 MiB (Pytorch base amount) = 11091.64 MiB = ~10.83 GiB. If you include other processes you can see how it's close to 11.69.
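The same accounting written out as a small script, using only the figures quoted in the error message and in this reply (the 1251 MiB base amount is the number given above, not something measured here). At run time the corresponding values can be read with `torch.cuda.memory_allocated()`, `torch.cuda.memory_reserved()`, and `torch.cuda.mem_get_info()`.

```python
MIB_PER_GIB = 1024

total_gib = 11.69         # "11.69 GiB total capacity"
free_gib = 2.06           # "2.06 GiB free"
reserved_gib = 7.55       # "7.55 GiB reserved in total by PyTorch"
pytorch_base_mib = 1251   # base amount quoted in the reply above

accounted_mib = free_gib * MIB_PER_GIB + reserved_gib * MIB_PER_GIB + pytorch_base_mib
accounted_gib = accounted_mib / MIB_PER_GIB
unaccounted_gib = total_gib - accounted_gib  # other processes, fragmentation, etc.

print(f"accounted for: {accounted_mib:.2f} MiB = {accounted_gib:.2f} GiB")
print(f"left for other processes and overhead: {unaccounted_gib:.2f} GiB")
```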

#

In addition, there can be memory fragmentation and other issues.

tacit knot Jun 10, 2023, 9:19 PM

#

gotcha, that helps a bit. Honestly, I'm just a bit surprised how easy it is to blow this sort of thing up. The only way I was able to get the memory leak even semi under control was to tear down the entire model and re-build it if it encounters an out of memory error. Kinda rough, but I can make do with some caching of the "determined" maximums.

#

Yea, tried the environment variable at 64mb, 128mb, 512mb, and 1024mb with no significant differences.

iron basalt Jun 10, 2023, 9:20 PM

#

Pytorch is not super memory efficient, it's set up for easy experimentation.

tacit knot Jun 10, 2023, 9:20 PM

#

Ah, gotcha, good to know and keep in mind.

iron basalt Jun 10, 2023, 9:20 PM

#

Making full use of memory requires custom implementations that are not even close to being worth the effort (a lot of work to implement all the manual allocation and custom kernels for the specific model) when you can just buy a bigger GPU or more GPUs.

#

The only applications that will try to squeeze out all of the memory of a GPU are more or less video games (and only some of them (especially if they have to run on consoles)).

tacit knot Jun 10, 2023, 9:23 PM

#

Ugh, I've done a little bit of PS1/PS2 game dev long long ago. Not a fan of writing custom memory managers lol

iron basalt Jun 10, 2023, 9:24 PM

#

In the world of ML we just throw more hardware at it, because we usually can / are running in the cloud.

tacit knot Jun 10, 2023, 9:24 PM

#

ugh, if i re-run this same thing a handful of times, the max resolution I can run at floats up/down by 10-15% without making any changes

#

Yea, I'm seeing that lol...i'm trying to build this thing to run on consumer level hardware AND i'm trying to make it pretty much auto configured, at least for a reasonable set of defaults

iron basalt Jun 10, 2023, 9:26 PM

#

Consumer level hardware ML is very rough, because of things like this.

#

The ecosystem is not really set up for it, at least not in an easy way.

tacit knot Jun 10, 2023, 9:27 PM

#

Finding that out lol...the early version of this memory leak was so bad that it would crash OTHER applications and occasionally my entire user session (Kubuntu, flavor of Ubuntu) and that kinda surprised me to say the least.

iron basalt Jun 10, 2023, 9:28 PM

#

Yeah, and it will crash some drivers too, can brick some ppl's PCs in some cases like some games do.

#

When I experiment on the GPU locally I often have to turn it off and back on again.

tacit knot Jun 10, 2023, 9:29 PM

#

I haven't worked on the infrastructure/DevOps side of things in a long time, but would a Docker container even provide any protection/sandboxing?

iron basalt Jun 10, 2023, 9:30 PM

#

We use a whole virtual machine now, and set an upper limit on memory usage, lower by a decent amount than the total. Can just restart the instance then.

tacit knot Jun 10, 2023, 9:30 PM

#

It seems like some of this stuff is almost what I used to call "bare metal" code back in my C/C++ days.

iron basalt Jun 10, 2023, 9:30 PM

#

Requires some annoying GPU pass-through work though, which your motherboard needs to support.

tacit knot Jun 10, 2023, 9:31 PM

#

hmm my understanding was that VMs don't expose the GPU very well from a performance/overhead point of view

iron basalt Jun 10, 2023, 9:31 PM

#

tacit knot hmm my understanding was that VMs don't expose the GPU very well from a performa...

It depends on the hardware, they need to support virtualization, for the GPU.

tacit knot Jun 10, 2023, 9:31 PM

#

Ahh that makes sense. Kinda a limiting factor if i'm trying to make this broadly available/usable.

iron basalt Jun 10, 2023, 9:32 PM

#

Yeah, support is spreading, but it will take a few years to be everywhere.

tacit knot Jun 10, 2023, 9:32 PM

#

Also trying to avoid the insane install/setup process that I've had to go through with some of these experimental AI/ML things, standing up instant-ngp (NeRF) was insane for me.

iron basalt Jun 10, 2023, 9:33 PM

#

These kinds of issues are one of the reasons we don't see ML on consumer hardware used in applications everywhere yet. GPU work is very buggy / annoying, and it's kind of a miracle that people manage to get video games to run on all hardware configurations (spoilers, they don't, it's endless bug reports (working with consoles is very nice by comparison / fixed hardware)).

tacit knot Jun 10, 2023, 9:35 PM

#

Fixed hardware, but at least back when I was in school (degree in game design/development from a place called Full Sail) that meant you were also going to have to write your own low level libraries and such like memory managers or basic data structures. lol I shouldn't have to write my own doubly linked list damnit...

#

Damn, that was 20 years ago, just realizing that lol

iron basalt Jun 10, 2023, 9:38 PM

#

tacit knot Fixed hardware, but at least back when I was in school (degree in game design/de...

Yeah, but at least someone could write it once, and it works. With random hardware combinations on PCs you have weird stuff like specific GPU with specific driver version bricks the PC when they plug in a specific keyboard, and you get a bug report that makes no sense, all you can do is random trial and error.

#

And then to add insult to injury, the OS is really buggy (and getting worse over time (e.g. Windows 11 can barely manage to make a window fullscreen now without 10%+ of users crashing)).

tacit knot Jun 10, 2023, 9:43 PM

#

ha, yea, the unknown combinations makes stuff a lot trickier

low delta Jun 11, 2023, 12:56 AM

#

i'm new to AI. Does anyone know the GODEL NLP model (based on T5) and can give me some pointers? stuck on two things.

serene scaffold Jun 11, 2023, 1:11 AM

#

low delta i'm new to AI. Anyone knows the GODEL NLP model (based on T5) taht can give me s...

Be sure to always ask a complete question that someone who knows the answer can start answering

low delta Jun 11, 2023, 1:18 AM

#

low delta i'm new to AI. Anyone knows the GODEL NLP model (based on T5) taht can give me s...

The big thing is I'm training the model IAW GODEL data structure of Context / Knowledge / Response using Huggingface Trainer. The code executed completely, but there doesn't seem to be any effect. I tried to query the model and it wouldn't give me any hint of the training data. The second thing is I'm working on a SQUAD metric to monitor the training process, but ran into some data structure conflict between the Trainer's evaluation output eval_pred and the SQUAD metric's input metric.add_batch.

warm copper Jun 11, 2023, 2:48 AM

#

guys

#

what do you prefer? over or undersampling?

serene scaffold Jun 11, 2023, 4:02 AM

#

warm copper what do you prefer? over or undersampling?

for what? this sounds like something where you have to consider what the model is intended to do and what the tradeoffs are for its use case.

weak lagoon Jun 11, 2023, 7:03 AM

#

I have a big data problem. I have csv files with over 300 gb. Data is in multiple files. I want to carry out EDA. Data frames memory limit is only 100gb so that is not sufficient plus it is taking too long to process a single file even after chunking. What is the solution?

weak lagoon Jun 11, 2023, 7:13 AM

#

warm copper what do you prefer? over or undersampling?

Both are wrong and will create an incorrect model. Undersampling won't leave enough training data, and oversampling will go into too much detail and get the predictions wrong

glacial rampart Jun 11, 2023, 7:26 AM

#

weak lagoon I have a big data problem. I have csv files with over 300 gb. Data is in multipl...

Well that's the problem with Big Data and why cloud processing is a thing right.. if you really want to do it locally, it simply WILL cost a lot of time. But I can add using the Threading library if you havent yet. Then you can at least process multiple chunks at the same time

#

@weak lagoon I actually did just stumble on https://docs.dask.org/en/stable/scheduling.html while looking for examples. Might be interesting for you
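For basic EDA numbers, one option is to stream the files row by row so memory stays flat regardless of file size. A minimal standard-library sketch follows; the column name `value`, the in-memory demo file, and the helper names are made up for illustration. `pandas.read_csv(..., chunksize=...)` or `dask.dataframe.read_csv` follow the same idea at a higher level.

```python
import csv
import io
import math

class RunningStats:
    """Welford's online algorithm: mean and variance in one pass, O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        """Sample standard deviation (n - 1 in the denominator)."""
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

def summarize(file_obj, column):
    """Stream one CSV file and accumulate statistics for a numeric column."""
    stats = RunningStats()
    for row in csv.DictReader(file_obj):
        stats.update(float(row[column]))
    return stats

# Tiny in-memory stand-in for a real file opened with open(path, newline="").
demo = io.StringIO("id,value\n1,2.0\n2,4.0\n3,6.0\n")
result = summarize(demo, "value")
print(result.n, result.mean, result.std)
```

The same `RunningStats` object can be reused across several files to get one summary for the whole data set.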

past meteor Jun 11, 2023, 8:31 AM

#

warm copper what do you prefer? over or undersampling?

None

#

Gonna become an evangelist for this topic.

train classifier
Make your ROC curves, precision recall plots, DET-curves, ...
Select your decision boundary based on where you want to land on these curves
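A bare-bones sketch of the last step, assuming the classifier already outputs a score per sample. The data is synthetic (950 negatives, 50 positives), the 0.8 precision target is arbitrary, and the helper names are hypothetical; in practice `sklearn.metrics.precision_recall_curve` computes the full curve.

```python
import random

def precision_recall_at(scores, labels, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    tp = fp = fn = 0
    for s, y in zip(scores, labels):
        if s >= threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def lowest_threshold_with_precision(scores, labels, min_precision):
    """Scan thresholds from low to high; return the first one meeting the target."""
    for t in sorted(set(scores)):
        precision, recall = precision_recall_at(scores, labels, t)
        if precision >= min_precision:
            return t, precision, recall
    return None

# Imbalanced synthetic scores, left untouched (no resampling).
rng = random.Random(0)
labels = [0] * 950 + [1] * 50
scores = [rng.betavariate(2, 5) for _ in range(950)] + [rng.betavariate(5, 2) for _ in range(50)]

choice = lowest_threshold_with_precision(scores, labels, min_precision=0.8)
print(choice)
```

The imbalance shows up as a trade-off on the curve rather than being hidden by resampling, which is the point being made here.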

#

Under/oversampling, class weights, ... are an opaque way to solve this problem. If there is a signal in your data you will see a difference between the negative and positive class

#

I think these are better because they're easier to reason about and you use your data, which is hopefully a representative sample of the population, as is. The one I could get most behind is class weights because at least you're not messing with your sample.

median leaf Jun 11, 2023, 9:16 AM

#

do any of you guys understand AIC and BIC scores in regression models?

past meteor Jun 11, 2023, 10:09 AM

#

median leaf do any of you guys understand AIC and BIC scores in regression models?

What is your specific question?

median leaf Jun 11, 2023, 10:09 AM

#

im tryna figure out why my AIC and BIC scores are in the 600s

past meteor Jun 11, 2023, 10:22 AM

#

median leaf im tryna figure out why my AIC and BIC scores are in the 600s

Have you looked at the formulas of AIC and BIC? They're pretty intuitive imo

wooden sail Jun 11, 2023, 10:47 AM

#

the number itself is also not that important, only that you pick the smallest one w.r.t. the model order

#

much like in most other cost functions
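For reference, the standard definitions are AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k is the number of fitted parameters, n the number of observations, and L the maximised likelihood. A small sketch with made-up numbers (n = 200 and two hypothetical least-squares fits) shows how values in the hundreds arise just from the amount of data:

```python
import math

def aic(log_likelihood, k):
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    return k * math.log(n) - 2 * log_likelihood

def gaussian_log_likelihood(rss, n):
    """Maximised log-likelihood of a least-squares fit with Gaussian errors."""
    sigma2 = rss / n  # maximum-likelihood estimate of the error variance
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)

n = 200  # hypothetical number of observations
candidates = {"3 params": (3, 180.0), "6 params": (6, 176.0)}  # (k, residual sum of squares)

table = {}
for name, (k, rss) in candidates.items():
    ll = gaussian_log_likelihood(rss, n)
    table[name] = (aic(ll, k), bic(ll, k, n))
    print(f"{name}: AIC={table[name][0]:.1f} BIC={table[name][1]:.1f}")
```

Both scores grow roughly linearly with n through the log-likelihood term, so the absolute size says little; only the difference between candidate models fitted to the same data matters.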

lapis sequoia Jun 11, 2023, 11:27 AM

#

how would you guys implement an api into a chatbot

crimson summit Jun 11, 2023, 1:28 PM

#

Does anyone know why the picture on the right says it was much, much faster, when in that image it goes through 900 iterations and in the image on the left it only goes through 9?

glacial rampart Jun 11, 2023, 2:21 PM

#

It doesn't say it's faster, it says it converges faster. In the left where you did 10 runs (which was your max iteration input), you didn't get to the point of convergence yet. And since the learning rate is small, it would take more iterations. Why don't you compare them both on 2000 iterations?
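The plots themselves are not visible in this log, so here is a toy stand-in: plain gradient descent on f(x) = x**2 with an iteration cap and a convergence tolerance. The learning rates, cap, and tolerance are arbitrary choices for illustration.

```python
def gradient_descent(lr, max_iter, tol=1e-6, start=10.0):
    """Minimise f(x) = x**2 and report (x, iterations used, converged)."""
    x = start
    for i in range(1, max_iter + 1):
        step = lr * 2 * x  # gradient of x**2 is 2x
        x -= step
        if abs(step) < tol:
            return x, i, True
    return x, max_iter, False  # hit the cap without converging

small_lr = gradient_descent(lr=0.001, max_iter=10)
large_lr = gradient_descent(lr=0.1, max_iter=2000)
print("small learning rate, cap of 10:", small_lr)
print("larger learning rate, cap of 2000:", large_lr)
```

The first run stops after 10 iterations only because it hit the cap and is still far from the minimum, while the second uses more iterations but actually converges, so the raw iteration counts are not comparable.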

cosmic harbor Jun 11, 2023, 2:26 PM

#

I cannot uninstall a package. Can anyone help?

$ vcs
bash: vcs: command not found...
Install package 'python3-vcstool' to provide command 'vcs'? [N/y] y

$ vcs
usage: vcs <command>

Most commands take directory arguments, recursively searching for repositories
in these directories.  If no arguments are supplied to a command, it recurses
on the current directory (inclusive) by default.

The available commands are:
   branch     Show the branches
   custom     Run a custom command
   diff       Show changes in the working tree
   export     Export the list of repositories
   import     Import the list of repositories
   log        Show commit logs
   pull       Bring changes from the repository into the working copy
   push       Push changes from the working copy to the repository
   remotes    Show the URL of the repository
   status     Show the working tree status
   validate   Validate the repository list file

See 'vcs <command> --help' for more information on a specific command.

$ pip uninstall vcs
WARNING: Skipping vcs as it is not installed.

$ pip uninstall python3-vcstool
WARNING: Skipping python3-vcstool as it is not installed.

tidal bough Jun 11, 2023, 2:29 PM

#

cosmic harbor I cannot uninstall a package. Can anyone help? ``` $ vcs bash: vcs: command not ...

that's not a python package i think, that's an apt-get one, or whatever package manager you're using.

glacial rampart Jun 11, 2023, 2:30 PM

#

Ye also thought its probably apt, though I'm not familiar with Bash. 'apt remove python3-vcstool' should do it. If it really is pip you should be able to see your packages with 'pip list'

past meteor Jun 11, 2023, 2:31 PM

#

pip list | grep vcs

dense crane Jun 11, 2023, 4:53 PM

#

how am i supposed to convert each sentence to tensors when making a seq2seq model? each sentence doesn't have to contain the same number of tokens, so there might be a problem with that when i train the model, don't you agree?

hasty mountain Jun 11, 2023, 5:18 PM

#

dense crane how am i suppose to convert each sentence to tensors in case of making the seq2s...

That's why you add a <pad> token to your sentences, so you can make all your sentences have the same size and convert them to tensors.

#

Just add the pad token to all your sentences until all of them have the same size as the largest one.

dense crane Jun 11, 2023, 5:20 PM

#

is this like adding zeros?

keen gust Jun 11, 2023, 5:24 PM

#

just came across datalore by jetbrains, does anyone use this? I'm working on data reports/insights for some of our staff (non-technical) and found streamlit to be limited. While looking up Dash documentation I came across this and it looked interesting. Curious if anyone has tried it or uses it currently

dense crane Jun 11, 2023, 5:26 PM

#

so i am supposed to build the vocab where each token has its own number (including <pad>, <eos> and <sos>), then i convert each sentence to a vector of those integers and from there i convert it to embeddings, right?

#

@hasty mountain

past meteor Jun 11, 2023, 5:26 PM

#

keen gust just came across datalore by jetbrains, does anyone use this? I'm working on dat...

What type of analysis do you do?

hasty mountain Jun 11, 2023, 5:27 PM

#

dense crane so i am suppose to the vocab where the each token will have his own number (incl...

Yes. And it's like adding zeros. The difference is that the zero is the vocabulary index of the <pad> token rather than a plain numeric value

#

So a sentence <how><are><you><doing><?><pad> will be something like [1, 2, 3, 4, 5, 0]
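
A minimal sketch of the padding idea discussed here, in plain Python so it does not depend on any framework. The helper names `build_vocab` and `encode_and_pad` are made up for illustration, and fixing `<pad>` at index 0 is an assumption that matches the example above.

```python
PAD, SOS, EOS = "<pad>", "<sos>", "<eos>"

def build_vocab(sentences):
    """Map every token to an integer id; <pad> is fixed at 0 (assumption)."""
    vocab = {PAD: 0, SOS: 1, EOS: 2}
    for sentence in sentences:
        for token in sentence:
            vocab.setdefault(token, len(vocab))
    return vocab

def encode_and_pad(sentences, vocab):
    """Wrap each sentence in <sos>/<eos>, then right-pad with the <pad> id."""
    encoded = [[vocab[SOS]] + [vocab[t] for t in s] + [vocab[EOS]] for s in sentences]
    max_len = max(len(ids) for ids in encoded)
    return [ids + [vocab[PAD]] * (max_len - len(ids)) for ids in encoded]

sentences = [["how", "are", "you", "doing", "?"], ["hi", "!"]]
vocab = build_vocab(sentences)
batch = encode_and_pad(sentences, vocab)  # every row now has the same length
```

The rectangular `batch` can then be turned into a tensor and fed to an embedding layer. In PyTorch, for instance, `nn.Embedding` accepts a `padding_idx` argument so the pad position does not contribute to gradients.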

dense crane Jun 11, 2023, 5:28 PM

#

hasty mountain So a sentence `<how><are><you><doing><?><pad>` will be something like`[1, 2, 3, ...

yeah and then creating the embedding from that right?

hasty mountain Jun 11, 2023, 5:28 PM

#

Yes

past meteor Jun 11, 2023, 5:29 PM

#

I'm asking because it's important context to answer your question correctly 🙂

dense crane Jun 11, 2023, 5:29 PM

#

so ok thx!

keen gust Jun 11, 2023, 5:31 PM

#

past meteor What type of analysis do you do?

it'll mostly focus on financial data like revenue (MoM,YoY,etc.) for 70+ locations and then stats related to our main product offering so stuff like # of players, bookings, maybe customer demographics. The end goal is to replace our current Google Studio reporting with something a bit more automated while maintaining a very user friendly ui

past meteor Jun 11, 2023, 5:32 PM

#

Okay the big brain answer is to focus on data prep and give business a BI tool and let them make the reports themselves

#

If that's a step too far in your org, you should still use a BI tool (Power BI, Tableau, Looker) to make the reports for them

keen gust Jun 11, 2023, 5:37 PM

#

past meteor If that's a step too far in your org, you should still use a BI tool (Power BI, ...

yeah it's still a very small op so there's no dedicated team for that unfortunately - it's likely to fall on myself to push out or at least give upper mgmt key stats at a glance. I was able to put together a basic dashboard on streamlit but it got a bit limiting as I wanted to do more so I've just been searching for alternatives. Our data isn't overwhelming and luckily a lot is self reported by our locations, it's just cleaning it up and trying to put together meaningful reporting. Not opposed to using a BI tool though

past meteor Jun 11, 2023, 5:39 PM

#

What's limiting about streamlit for your case?

keen gust Jun 11, 2023, 5:46 PM

#

past meteor What's limiting about streamlit for your case?

at the moment I have a working db for our corporate locations w/ basic stats our controller uses for her work - this likely can be scaled & adjusted for franchise locations so it's not an issue. On the other hand, staff in our corp. locations are asking if something can also be done for them so that they can skip exporting csvs, making tables, etc. for their daily reporting. I'd ideally like to keep these as private apps and right off the bat not being able to deploy more than 1 on SL is an issue. Granted, I've been mostly self teaching Python for a few months so if any workarounds or more appropriate methods exist it's completely plausible I just don't have the 'know' right now. Streamlit also re-running on a user action isn't ideal if I expect a bit of users to be on it at once.

#

also appreciate the responses & help fyi

past meteor Jun 11, 2023, 5:48 PM

#

I think there's several issues here right?

You want a way to onboard the data in a better way
you want people to see only the data they're authorized to see?

#

if that's the case, you need to find out what is generating the data for those CSVs and either have it upload data to your DB directly or at least automate exporting the CSVs and then parsing it and putting it in your DB (less robust).

zinc briar Jun 11, 2023, 5:50 PM

#

Is algebraic geometry used in machine learning

past meteor Jun 11, 2023, 5:51 PM

#

keen gust at the moment I have a working db for our corporate locations w/ basic stats our...

For point 2, Yeah, that's an issue with streamlit. There's an authentication package for streamlit. As soon as a user is authenticated you could filter the dataset to show only what they're allowed to see. That's called row level security. Means you only need to make a single app.
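
A rough sketch of what row level security boils down to once you know who the user is. Everything here is hypothetical: the `ALLOWED_LOCATIONS` mapping, the usernames, and the sample rows are made up, and the authentication step itself is assumed to have happened elsewhere.

```python
# Hypothetical mapping from an authenticated username to the locations they may see.
ALLOWED_LOCATIONS = {
    "controller": {"Austin", "Boston", "Chicago"},
    "austin_manager": {"Austin"},
}

def rows_for_user(rows, username):
    """Return only the rows whose location the user is authorized to see."""
    allowed = ALLOWED_LOCATIONS.get(username, set())  # unknown users see nothing
    return [row for row in rows if row["location"] in allowed]

revenue = [
    {"location": "Austin", "month": "2023-05", "revenue": 12000},
    {"location": "Boston", "month": "2023-05", "revenue": 9500},
    {"location": "Chicago", "month": "2023-05", "revenue": 14250},
]
manager_view = rows_for_user(revenue, "austin_manager")
```

The same filter can sit in front of every chart in a single app, so one deployment serves all locations.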

#

BI tools at least have the second part baked into it.

keen gust Jun 11, 2023, 6:02 PM

#

past meteor For point 2, Yeah, that's an issue with streamlit. There's an authentication pa...

thank you 🙂

cerulean kayak Jun 11, 2023, 6:14 PM

#

Anybody got a website that has a list, or just a list, of plugins that improve productivity/are just good to have for Data Science in Python notebooks via VSCode?
@ me if you got anything.

rose dagger Jun 11, 2023, 6:31 PM

#

If i load my data by a Tensorflow DataGenerator from a directory like this, what file formats will the datagenerator accept? I'm trying to load grayscale images represented as 2D numpy arrays, but the datagenerator seems to recognize "0 images" in the directory.

crimson summit Jun 11, 2023, 6:34 PM

#

glacial rampart It doesn't say it's faster, it says it converges faster. In the left where you d...

makes sense thx bro

#

i got another question real quick

#

Why is the graph of the cost vs iteration steeper for the picture on the right, which has a smaller learning rate than the picture on the left? I would think that since the picture on the left has a bigger learning rate the graphs would be the opposite

glacial rampart Jun 11, 2023, 6:47 PM

#

Whether a learning rate is (too) small or (too) big depends on the data. IF the learning rate is too small, a larger learning rate value will lead to faster convergence. However, if the learning rate is already good or too large, making it even larger leads to converging later or not at all. The answer lies in how the w-values develop over iterations.
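
To make that concrete, here is a toy sketch (not the course example from the pictures) that runs gradient descent on f(w) = w**2 and counts the steps needed to get close to the minimum. The function name, tolerance, and learning rates are arbitrary choices for illustration.

```python
def steps_to_converge(lr, w=1.0, tol=1e-3, max_steps=1000):
    """Run gradient descent on f(w) = w**2; return steps until |w| < tol, or None."""
    for step in range(max_steps):
        if abs(w) < tol:
            return step
        w -= lr * 2 * w       # the gradient of w**2 is 2*w
        if abs(w) > 1e6:      # the iterates blew up: treat as divergence
            return None
    return None

too_small = steps_to_converge(0.01)   # converges, but slowly
reasonable = steps_to_converge(0.4)   # converges in a handful of steps
too_large = steps_to_converge(1.1)    # overshoots further each step, never converges
```

Each update multiplies w by (1 - 2*lr), so a rate that is too small shrinks w slowly, while a rate above 1 flips the sign and grows it.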

crimson summit Jun 11, 2023, 6:48 PM

#

glacial rampart Whether a learning rate is (too) small or (too) big depends on the data. IF the ...

on the picture on the right at the bottom why does it say that it is converging slower than the picture on the left

#

the pic on the right has a much steeper cost vs iteration graph

safe lintel Jun 11, 2023, 6:49 PM

#

yo anyone here know about R programming?

past meteor Jun 11, 2023, 6:49 PM

#

safe lintel yo anyone here know about R programming?

Ask away

safe lintel Jun 11, 2023, 6:49 PM

#

have any idea about creating a subset using 2 diff datasets?

past meteor Jun 11, 2023, 6:51 PM

#

I don't understand your question

safe lintel Jun 11, 2023, 6:52 PM

#

like i have 2 diff data files and not all samples are present in both files. so we need to first subset the samples that are present in both files.

glacial rampart Jun 11, 2023, 6:54 PM

#

safe lintel like i have 2 diff data files and not all samples are present in both files. so ...

Dplyr inner join is probably what you're looking for

glacial rampart Jun 11, 2023, 6:55 PM

#

crimson summit on the picture on the right at the bottom why does it say that it is converging ...

I'm not sure what they mean by that tbh

safe lintel Jun 11, 2023, 6:55 PM

#

yes i did that , but the issue is my one data file does not have any header name and that is creating problem for me

#

if u could allow i can send u the file to have a look.

glacial rampart Jun 11, 2023, 6:57 PM

#

I'm watching a competition atm 😛 so sorry not going to do too much. Are the files comparable? E.g. should the 2nd file have the same header as file 1?

safe lintel Jun 11, 2023, 6:58 PM

#

these are headers in 1st file ("" "type" "tissue_source_site" "disease_type") but for the second file ("" "TCGA.E9.A1NI.01A" "TCGA.A1.A0SP.01A" "TCGA.E2.A14T.01A" "TCGA.AR.A24O.01A"........) it just start like this no header or something at all

past meteor Jun 11, 2023, 6:59 PM

#

Are the columns the same?

#

Why not just append them to each other

safe lintel Jun 11, 2023, 7:00 PM

#

there are no columns in 2nd file.

#

1st file

#

2nd file

#

...

glacial rampart Jun 11, 2023, 7:08 PM

#

So how would you like to subset them?

safe lintel Jun 11, 2023, 7:09 PM

#

i have no idea how subsetting works, if u can shed some light on it

glacial rampart Jun 11, 2023, 7:09 PM

#

Well, if you want to merge or use those files together, there should be a way to match the records, right?

#

Otherwise you just put random data together which is meaningless

safe lintel Jun 11, 2023, 7:10 PM

#

yes the header "type" has data which is also present in 2nd file

glacial rampart Jun 11, 2023, 7:10 PM

#

I'm not sure how it works in R, but in that case I'd remove both file headers and manually create header arrays and load the files with those

#

make sure the "type" header is on the correct location for both files and name other columns whatever you want

safe lintel Jun 11, 2023, 7:10 PM

#

if u can explain in python that works too

glacial rampart Jun 11, 2023, 7:11 PM

#

In Pandas you can specify column names when you read_csv

#

Just like this: ['name_a', 'name_b', 'yolo', 'type']
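
A small sketch of that suggestion, assuming the file is comma-separated and has no header row at all. The inline data and the column names are placeholders; `header=None` together with `names=` is what tells pandas to use your own header.

```python
import io
import pandas as pd

# Stand-in for a headerless CSV file on disk (made-up values).
raw = io.StringIO("s1,0.5,1.2,tumor\ns2,0.1,0.9,normal\n")

columns = ["name_a", "name_b", "yolo", "type"]
# header=None: the first line is data, not a header; names= supplies the header.
df = pd.read_csv(raw, header=None, names=columns)
```

For a tab-separated file the same call works with `sep="\t"` added.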

safe lintel Jun 11, 2023, 7:11 PM

#

but as i told the 2nd file looks like this , how im gonna create header ?

#

gene_expression_df = pd.read_csv('tumor_gene_expression_data.csv', 'name_a') like this?

glacial rampart Jun 11, 2023, 7:13 PM

#

Hmm that file will be a problem since it all seems to be on 1 row and only whitespace separated

safe lintel Jun 11, 2023, 7:14 PM

#

yes, i've been banging my head on that since yday, tried a lot of solns and still nothing works 😦

#

any fix for this?

agile cobalt Jun 11, 2023, 7:18 PM

#

hard to tell without having the full file to look at, but worst case scenario just parse it with normal string manipulation then pass it to pandas as a dictionary of lists

glacial rampart Jun 11, 2023, 7:18 PM

#

Yes it can be fixed, but it will involve some work. Think of 'readlines()', regular expressions and writing to csv. Those are the 3 things you will need to do is my guess.

#

If you can get it to csv format, all you need to do is specify headers yourself and you can use it however you want

safe lintel Jun 11, 2023, 7:21 PM

#

i tried this code
import pandas as pd

# Read the TSV file
df = pd.read_csv('TCGA-BRCA.htseq_fpkm-uq_gene_name.tsv', sep='\t')

# Convert to CSV
df.to_csv('output_file.csv', index=False)

#

now it looks like this

glacial rampart Jun 11, 2023, 7:24 PM

#

Ahah, that is a lot different than I expected. That makes it a lot easier though. There's just a lot of column_names but the file quality seems fine.

safe lintel Jun 11, 2023, 7:24 PM

#

what should i do next?

glacial rampart Jun 11, 2023, 7:24 PM

#

So are you now able to find the column_name you want?

safe lintel Jun 11, 2023, 7:25 PM

#

no, i wasn't able to find it

glacial rampart Jun 11, 2023, 7:25 PM

#

Oh I guess I understand what you want to do now

#

You need to pivot your data

safe lintel Jun 11, 2023, 7:26 PM

#

how exactly?

warm copper Jun 11, 2023, 7:29 PM

#

@weak lagoon then how do you solve imbalanced target variable

#

@serene scaffold I was trying to offset imbalanced target variable

#

also imbalanced target variable also creates incorrect model and you need to offset it

glacial rampart Jun 11, 2023, 7:31 PM

#

safe lintel how exactly?

Can you rename your first column to 'identifier' or whatever?

safe lintel Jun 11, 2023, 7:31 PM

#

glacial rampart Can you rename your first column to 'identifier' or whatever?

the unnamed thing?

glacial rampart Jun 11, 2023, 7:33 PM

#

safe lintel the unnamed thing?

Yes

safe lintel Jun 11, 2023, 7:34 PM

#

glacial rampart Yes

done next

glacial rampart Jun 11, 2023, 7:39 PM

#

Haven't tested it myself, but try: df_unpivot = pd.melt(df, id_vars='unnamed', value_vars=df.columns[1:-1])

safe lintel Jun 11, 2023, 7:41 PM

#

glacial rampart Haven't tested it myself, but try: df_unpivot = pd.melt(df, id_vars='unnamed', v...

glacial rampart Jun 11, 2023, 7:45 PM

#

Didn't know how to exactly write it without testing

#

df_result = df.melt(id_vars=['unnamed'], value_vars=df.loc[:, df.columns != "unnamed"])
This should work. Replace unnamed with what u called it

#

(and ofc replace df with gene_expression_df)

safe lintel Jun 11, 2023, 7:56 PM

#

glacial rampart (and ofc replace df with gene_expression_df)

#

well after fixing a bit got this

brave sand Jun 11, 2023, 9:03 PM

#

for this model ssd_resnet50_v1_fpn_640x640_coco17_tpu-8

#

what is the recommended image size?

#

my current image size from the xml files is this:

```
<size>
<width>4056</width>
<height>3040</height>
<depth>3</depth>
```

#

is that too much?

#

i'm still dealing with a filling up kernel issue on colab

agile cobalt Jun 11, 2023, 9:06 PM

#

brave sand my current image size from the xml files is this: ``` <size> <width>4056</width>...

those are some really large images

agile cobalt Jun 11, 2023, 9:06 PM

#

brave sand for this model `ssd_resnet50_v1_fpn_640x640_coco17_tpu-8`

if that 640x640 means it expects 640x640...

brave sand Jun 11, 2023, 9:06 PM

#

yeah i thought that would be the culprit

brave sand Jun 11, 2023, 9:06 PM

#

agile cobalt if that `640x640` means it expects 640x640...

so i compress them?

agile cobalt Jun 11, 2023, 9:08 PM

#

I don't know how tensorflow works very well, but it sounds like it might resize automatically: https://github.com/tensorflow/models/blob/5f0f949de9667552d85f0922191a05a0c9d0a99c/research/object_detection/configs/tf2/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.config#L46-L51

arctic wedgeBOT Jun 11, 2023, 9:08 PM

#

research/object_detection/configs/tf2/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.config lines 46 to 51

```
image_resizer {
  fixed_shape_resizer {
    height: 640
    width: 640
  }
}
```

brave sand Jun 11, 2023, 9:08 PM

#

agile cobalt I don't know how tensorflow works very well, but it sounds like it might resize ...

lol we googled the same thing, yeah the tutorial never said anything about image size, i thought it would resize on its own

#

this memory issue is really bugging me now

agile cobalt Jun 11, 2023, 9:08 PM

#

in https://www.tensorflow.org/api_docs/python/tf/image/resize it says the default method is

bilinear: Bilinear interpolation. If antialias is true, becomes a hat/tent filter function with radius 1 when downsampling.
but I do not know if that applies to that config

coral field Jun 11, 2023, 9:11 PM

#

for subwords that replace out of vocabulary words, how does the model interpret the new definition?

glacial rampart Jun 11, 2023, 9:14 PM

#

safe lintel

I'm convinced my code should work. Probably something went wrong with renaming that column.
You can indeed remove the brackets though. So you end up with:
gene_expression_df.melt(id_vars='identifier', value_vars=gene_expression_df.columns[gene_expression_df.columns != 'identifier'])
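
A self-contained sketch of the unpivot plus inner join, on toy frames. The layout is an assumption: genes as rows and sample ids as columns in the expression file, and a metadata table keyed by sample id. The `sample` and `expression` column names and the id `TCGA.XX.XXXX.01A` are invented for the example.

```python
import pandas as pd

# Toy stand-in for the wide expression file: one row per gene, one column per sample.
gene_expression_df = pd.DataFrame({
    "identifier": ["GENE_A", "GENE_B"],
    "TCGA.E9.A1NI.01A": [1.0, 2.0],
    "TCGA.A1.A0SP.01A": [3.0, 4.0],
})

# Toy stand-in for the metadata file; the second sample id is made up.
metadata_df = pd.DataFrame({
    "sample": ["TCGA.E9.A1NI.01A", "TCGA.XX.XXXX.01A"],
    "type": ["tumor", "normal"],
})

# Wide -> long: every (gene, sample) pair becomes its own row.
long_df = gene_expression_df.melt(
    id_vars="identifier", var_name="sample", value_name="expression"
)

# Inner join keeps only the samples that appear in both tables.
merged = long_df.merge(metadata_df, on="sample", how="inner")
```

When `value_vars` is left out, `melt` unpivots every column that is not listed in `id_vars`, which is the same selection as the expression above.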

brave sand Jun 11, 2023, 9:22 PM

#

agile cobalt in <https://www.tensorflow.org/api_docs/python/tf/image/resize> it says the ...

yeah it does resize it

#

im not sure what is wrong then

#

should I still resize it bc the image is still massive?

#

i reduced the dataset to 100 images only

coral bloom Jun 11, 2023, 9:44 PM

#

is it true that gpt-4 training data was only 45GB?

#

anyone?

agile cobalt Jun 11, 2023, 9:45 PM

#

almost definitely not

#

maybe the fine tuning dataset, but not the full training data (including pre-training)

warm copper Jun 11, 2023, 10:05 PM

#

@agile cobalt how do you deal with unbalanced target values

agile cobalt Jun 11, 2023, 10:07 PM

#

warm copper <@256442550683041793> how do you deal with unbalanced target values

depends on how unbalanced it is
for some things you can just use a "balanced" function for the loss (which applies weights based on the number of elements in that group)
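
For reference, a sketch of the weighting that scikit-learn documents for `class_weight='balanced'`: n_samples / (n_classes * count of that class). The function name and the 98/2 label split are made up for illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    """weight[c] = n_samples / (n_classes * count[c]), the 'balanced' heuristic."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

labels = [0] * 98 + [1] * 2        # toy, heavily imbalanced target
weights = balanced_class_weights(labels)
```

The rare class ends up with a much larger weight, so each of its samples counts for more in the loss.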

warm copper Jun 11, 2023, 10:07 PM

#

Very unbalanced

#

Oversampling or undersampling is not suitable?

#

Or SMOTE @agile cobalt

#

#

see @agile cobalt

#

rf_model_balanced = RandomForestClassifier(random_state=0, class_weight='balanced')
rf_model_balanced.fit(train_features, train_labels)
rf_model_balanced_pred = rf_model_balanced.predict(test_features)

print(classification_report(test_labels, rf_model_balanced_pred))

#

this is what I did

dusty plaza Jun 11, 2023, 11:43 PM

#

ChessDotComResponse(stats=Collection(chess_blitz=Collection(best=Collection(date=1673051280, game='', rating=310), last=Collection(date=1686273194, rating=207, rd=119), record=Collection(draw=1, loss=8, win=3)), chess_bullet=Collection(best=Collection(date=1679884176, game='', rating=451), last=Collection(date=1686027007, rating=309, rd=113), record=Collection(draw=0, loss=5, win=4)), chess_daily=Collection(last=Collection(date=1673799161, rating=400, rd=350), record=Collection(draw=0, loss=1, time_per_move=66341, timeout_percent=0, win=0)), chess_rapid=Collection(best=Collection(date=1686345903, game='', rating=651), last=Collection(date=1686362671, rating=651, rd=31), record=Collection(draw=35, loss=187, win=169)), puzzle_rush=Collection(best=Collection(score=6, total_attempts=9)), tactics=Collection(highest=Collection(date=1685588227, rating=1349), lowest=Collection(date=1675015529, rating=412))))

Anyone know how I can use this data to pull the wins, draws, losses of a specific game type? I am planning on figuring out how to differentiate between them
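
One possible approach, assuming the `Collection` objects expose their fields as attributes the way the repr suggests (not verified against the client library). The stand-in below is built with `SimpleNamespace` using numbers copied from the output above; `record_for` is a hypothetical helper.

```python
from types import SimpleNamespace as NS

# Stand-in for response.stats, trimmed to the fields used here.
stats = NS(
    chess_blitz=NS(record=NS(draw=1, loss=8, win=3)),
    chess_bullet=NS(record=NS(draw=0, loss=5, win=4)),
    chess_rapid=NS(record=NS(draw=35, loss=187, win=169)),
    puzzle_rush=NS(best=NS(score=6, total_attempts=9)),  # has no win/loss record
)

def record_for(stats, game_type):
    """Return (wins, draws, losses) for e.g. 'chess_rapid', or None if absent."""
    game = getattr(stats, game_type, None)
    record = getattr(game, "record", None)
    if record is None:
        return None
    return record.win, record.draw, record.loss

rapid = record_for(stats, "chess_rapid")
```

With the real response you would pass its `stats` attribute instead of the stand-in, and loop over the game type names you care about.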

plucky bolt Jun 12, 2023, 12:59 AM

#

Anyone here also work with C++?

zinc briar Jun 12, 2023, 2:17 AM

#

plucky bolt Anyone here also work with C++?

Of course! In the realm of C++, I find myself immersed in the intricacies of its profound abstractions and the boundless tapestry of algorithmic finesse it offers. Metaprogramming serves as a key instrument, allowing us to transcend the limitations of traditional runtime execution by deftly manipulating compile-time computation. Memory management becomes a virtuosic endeavor as I orchestrate intricate ballets of optimization, minimizing runtime overhead through the adept utilization of smart pointers, move semantics, and refined memory layouts. Within the expansive dominion of the C++ standard library, I harness an array of opulent algorithms, containers, and utilities, evoking an exquisite finesse within my code. Together, we embark on a collective pursuit of enlightenment, unraveling the enigmatic tapestry of C++ as we push the boundaries of innovation and amplify the crescendo of our software engineering prowess.

serene scaffold Jun 12, 2023, 3:08 AM

#

zinc briar Of course! In the realm of C++, I find myself immersed in the intricacies of its...

This looks like a ChatGPT response

serene scaffold Jun 12, 2023, 3:09 AM

#

plucky bolt Anyone here also work with C++?

This is the data science channel. Why do you ask about C++?

plucky bolt Jun 12, 2023, 3:12 AM

#

serene scaffold This is the data science channel. Why do you ask about C++?

The reason I asked about that is because this channel, according to the description, is also about matplotlib. I have used that and loved it. I tried to use it with C++ for easy quick plots but ran into some issues. I was hoping there might be folks on here who have similarly tried to exploit matplotlib.

serene scaffold Jun 12, 2023, 3:14 AM

#

plucky bolt The reason I asked about that is because this channel, according to the descript...

While matplotlib is on topic here, that's only if you're using it in python.

plucky bolt Jun 12, 2023, 3:15 AM

#

serene scaffold While matplotlib is on topic here, that's only if you're using it in python.

Well. I tried! 😛

queen cradle Jun 12, 2023, 4:37 AM

#

zinc briar Is algebraic geometry used in machine learning

I wish! I will celebrate the day that someone finds a non-trivial use of ample line bundles in machine learning.

coral field Jun 12, 2023, 4:44 AM

#

why can tensorflow models take in both tensors and numpy arrays, but not regular lists? why even convert to numpy?

wooden sail Jun 12, 2023, 4:55 AM

#

because you'd realistically do math, e.g. preprocessing of some sort, in numpy, and its arrays are easy to convert for many reasons: having a single type, being contiguous in memory, and having an efficient interface like being able to take in or give out their buffer. pretty much none of those are true for python lists

coral field Jun 12, 2023, 4:57 AM

#

ah

wooden sail Jun 12, 2023, 5:00 AM

#

you should be able to use a list of numpy arrays though

#

convert_to_tensor should be able to grab lists and make tf tensors out of them

zinc briar Jun 12, 2023, 5:40 AM

#

Cant numpy arrays be tensors

wooden sail Jun 12, 2023, 5:43 AM

#

sure they can, but i think they meant specifically tf tensors

lapis sequoia Jun 12, 2023, 6:50 AM

#

#

simple uncens chatgpt

#

source code in my github

worn mango Jun 12, 2023, 6:57 AM

#

anybody know why vector addition using numpy might produce different results across two machines?

#

numpy and python versions are the same ^

wooden sail Jun 12, 2023, 7:20 AM

#

worn mango anybody know why vector addition using numpy might produce different results acr...

is the backend for numpy also the same?

brazen lichen Jun 12, 2023, 7:31 AM

#

Hi, Could someone please share data science - AI learning path and topics to be covered ?

tidal bough Jun 12, 2023, 8:38 AM

#

worn mango anybody know why vector addition using numpy might produce different results acr...

one thing that comes to mind is that the default integer and float types depend on the platform

#

like, np.arange(100) may be int32 or int64 depending on the system (the default float, e.g. np.zeros(100), is float64 everywhere though)

coral bloom Jun 12, 2023, 8:47 AM

#

does anyone know

#

how to render template in a subfolder of templates in flask

potent sky Jun 12, 2023, 10:18 AM

#

tidal bough one thing that comes to mind is that the default integer and float types depend ...

Also numpy is simd optimised which has different implementations for each architecture. So maybe differences there

tidal bough Jun 12, 2023, 10:19 AM

#

I thought basic CPU operations were guaranteed to produce consistent results

potent sky Jun 12, 2023, 11:04 AM

#

That may be, but from what I understand, the vectorization process can potentially change the order of operations. Considering this manipulation is at very low level of memory management, this could lead to slightly different results, especially across different architectures as the simd implementations will also be different
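
A tiny illustration of why the order of operations matters for floats, using only the standard library. It does not reproduce any SIMD behaviour; it only shows that float addition is not associative, which is the property that reordering exposes.

```python
import math

# Same three numbers, different grouping.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
grouping_matters = left_to_right != right_to_left

# Large and small magnitudes: the small terms can be lost depending on order.
values = [1e16, 1.0, -1e16, 1.0]
naive_sum = sum(values)        # adds strictly left to right
exact_sum = math.fsum(values)  # tracks the rounding error and returns the exact result
```

A vectorized sum that splits the array into lanes adds the elements in yet another order, so small differences in the last digits between machines are plausible.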

wooden sail Jun 12, 2023, 11:07 AM

#

tidal bough I thought basic CPU operations were guaranteed to produce consistent results

as long as you do them in the same order. depending on how you install numpy tho, don't you end up with different BLAS/MKL flavors?

tidal bough Jun 12, 2023, 11:09 AM

#

potent sky That may be, but from what I understand, the vectorization process can potential...

ah, right, that makes sense

tidal bough Jun 12, 2023, 11:09 AM

#

wooden sail as long as you do them in the same order. depending on how you install numpy tho...

yeah, pypi build uses blas, anaconda's is a fancy mkl one

dense umbra Jun 12, 2023, 11:58 AM

#

brazen lichen Hi, Could someone please share data science - AI learning path and topics to be...

Hi, are you still looking for this? I can dm you some recommendations if you like.

brazen lichen Jun 12, 2023, 12:01 PM

#

dense umbra Hi, are you still looking for this? I can dm you some recommendations if you lik...

Yes please that would help.

iron basalt Jun 12, 2023, 5:16 PM

#

tidal bough I thought basic CPU operations were guaranteed to produce consistent results

Not for floating point operations across different devices (SIMD or not).

#

It's why some multiplayer games (e.g. Starcraft) use fixed point.

#

(For deterministic results across different devices)

#

There are some CPU flags that can be enabled to get the same results, but they come at a performance cost. Some physics engines support that.

#

Then there are bugs, found in many math libraries, and the hardware itself.

warm copper Jun 12, 2023, 5:57 PM

#

Does anyone here know how to deal with imbalanced response variable?

#

I have a very imbalanced response variable for categorical prediction

#

I am trying to predict the possibility of a bankruptcy of a company

#

I am using RandomForestClassifier

#

I used class_weight='balanced' parameter

#

but Im not sure that is good enough

#

I read some people do under or oversampling

#

or SMOTE

#

but some places tell me not to do those

#

so Im a bit confused

past meteor Jun 12, 2023, 6:03 PM

#

I've answered this question like 3x

warm copper Jun 12, 2023, 6:03 PM

#

you did?

#

you should have tagged me @past meteor

past meteor Jun 12, 2023, 6:05 PM

#

Instead of saying everything above 0.5 is bankrupt and under not you should look at precision recall, ROC, DET, ... curves and determine a cut-off yourself @warm copper

warm copper Jun 12, 2023, 6:06 PM

#

so like using under and oversampling? @past meteor

past meteor Jun 12, 2023, 6:06 PM

#

no, just your data as-is

warm copper Jun 12, 2023, 6:06 PM

#

how do I determine a cut off tho?

#

Ive never heard that before

past meteor Jun 12, 2023, 6:07 PM

#

For example you can simply plot the distribution of scores (a histogram) of the scores for your positive and negative class

#

And then you eyeball your data and decide where to put it

warm copper Jun 12, 2023, 6:08 PM

#

how would that even work tho when 98 percent of the data has bankrupted companies

#

lol

#

I already did that @past meteor

past meteor Jun 12, 2023, 6:09 PM

#

This is an example of what I mean

warm copper Jun 12, 2023, 6:09 PM

#

this is either 0 or 1 tho

past meteor Jun 12, 2023, 6:10 PM

#

So here you can see if I choose the score at 0.08 there would be no more false negatives

warm copper Jun 12, 2023, 6:10 PM

#

I dont know what li score is tho

past meteor Jun 12, 2023, 6:10 PM

#

In your case li score will be whatever probability you have

#

I was working on finger print recognition so the li = left index, for you it'll be "bankruptcy score" or whatever

warm copper Jun 12, 2023, 6:11 PM

#

#

I have the count bars

#

😄

past meteor Jun 12, 2023, 6:11 PM

#

your model has a .predict() and a .score() method, use the latter

#

Sorry, it's .predict_proba()

#

I'm actually giving a pretty shitty explanation I know @warm copper , part of it is me not wanting to type out what I've done x3 over the past few days but that's my fault and not yours 😅

worn mango Jun 12, 2023, 6:20 PM

#

tidal bough one thing that comes to mind is that the default integer and float types depend ...

Think i could send a bit of code and you could take a look? I can tell you which line the difference occurs

warm copper Jun 12, 2023, 6:26 PM

#

lol

#

can I dm you?

#

@past meteor just to show

past meteor Jun 12, 2023, 6:26 PM

#

I prefer if we keep it here because then other people can add stuff as well 🙂

warm copper Jun 12, 2023, 6:26 PM

#

pred_prob = rf_model.predict_proba(test_features)
print(pred_prob)

plt.figure(figsize=(16, 10))
plt.hist(pred_prob[test_labels == 0], bins=50, label='Negatives', alpha=0.5, color='b')
plt.hist(pred_prob[test_labels == 1], bins=50, label='Positives', alpha=0.7, color='r')
plt.xlabel('Probability of being Positive Class', fontsize=25)
plt.ylabel('Number of records in each bucket', fontsize=25)
plt.legend(fontsize=15)
plt.tick_params(axis='both', labelsize=25, pad=5)
plt.show()

#

am I doing something wrong here?

#

my test_features = X_test

past meteor Jun 12, 2023, 6:28 PM

#

You need to normalize the scores because I can imagine test_labels == 0 is a lot more than test_labels == 1

warm copper Jun 12, 2023, 6:28 PM

#

my test_labels = Y_test

#

I keep getting this error tho

#

ValueError: The 'color' keyword argument must have one color per dataset, but 2 datasets and 1 colors were provided

#

#

I fixed it

past meteor Jun 12, 2023, 6:33 PM

#

Now you still have to normalize the scores and then you're nearly there

warm copper Jun 12, 2023, 6:33 PM

#

they are already normalized tho

past meteor Jun 12, 2023, 6:34 PM

#

How is your y-axis then going to 5000

warm copper Jun 12, 2023, 6:34 PM

#

```
train_features_scaled = scaler.fit_transform(train_features)
test_features_scaled = scaler.transform(test_features)
```

past meteor Jun 12, 2023, 6:34 PM

#

I mean normalize your histogram

warm copper Jun 12, 2023, 6:34 PM

#

I used standard scaler

#

how do I normalize it?

#

density=True?

past meteor Jun 12, 2023, 6:35 PM

#

yes

warm copper Jun 12, 2023, 6:37 PM

#

#

question tho

#

why do I have negatives and positives legends different color

#

pithink

past meteor Jun 12, 2023, 6:40 PM

#

No idea. The general idea btw is that these plots can help you select a sensible probability to decide if something is in the negative or positive class.

#

Ties in well with the ideas behind ROC-curves, precision - recall, etc.

warm copper Jun 12, 2023, 6:40 PM

#

yeah

#

😄

past meteor Jun 12, 2023, 6:41 PM

#

Undersampling, oversampling, class weights take part of this out of your hands when it should probably be a decision you make

warm copper Jun 12, 2023, 6:41 PM

#

Im going to try to fix the graph first

#

https://towardsdatascience.com/pythons-predict-proba-doesn-t-actually-predict-probabilities-and-how-to-fix-it-f582c21d63fc

Medium

Python’s «predict_proba» Doesn’t Actually Predict Probabilities (an...

Data scientists typically evaluate their predictive models in terms of accuracy or precision, but hardly ever ask themselves: However, an accurate estimate of probability is extremely valuable from a…

#

interesting @past meteor

past meteor Jun 12, 2023, 6:47 PM

#

Yeah they're not calibrated probabilities

#

They're over and/or underestimated but for this example that doesn't matter too much

warm copper Jun 12, 2023, 6:48 PM

#

okay

#

I dont know why my graph is messing up

#

did you use seaborn for yours?

past meteor Jun 12, 2023, 6:49 PM

#

Yes, sns.histplot()

warm copper Jun 12, 2023, 6:49 PM

#

I think the probalem is that predict_proba returns a 2d array

#

problem**

#

maybe I need to convert it to one dimensional array?

#

ValueError: The 'color' keyword argument must have one color per dataset, but 2 datasets and 1 colors

#

hence this error
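
A sketch of the likely fix, with a made-up array standing in for the output of `rf_model.predict_proba(test_features)`. `predict_proba` returns one column per class, and matplotlib treats each column of a 2-D input as a separate dataset, which would explain the "2 datasets and 1 colors" message. Selecting the positive-class column first gives a 1-D array per histogram.

```python
import numpy as np

# Stand-in for predict_proba output: columns are P(class 0), P(class 1).
pred_prob = np.array([[0.90, 0.10], [0.70, 0.30], [0.20, 0.80], [0.95, 0.05]])
test_labels = np.array([0, 0, 1, 0])  # made-up labels

positive_scores = pred_prob[:, 1]                   # 1-D: P(class 1) per row
neg_scores = positive_scores[test_labels == 0]      # scores of the true negatives
pos_scores = positive_scores[test_labels == 1]      # scores of the true positives

still_2d = pred_prob[test_labels == 0]              # what the original code passed to hist
```

`neg_scores` and `pos_scores` can then go to `plt.hist(..., density=True)` with a single color each. If `test_labels` is a pandas Series, converting it with `.to_numpy()` first avoids index alignment surprises.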

zealous badger Jun 12, 2023, 6:51 PM

#

hi, i have a scatterplot in plotly and i want to display labels for the bubbles. but there's a lot of them and its not really neat. is there a good way to selectively display the labels?

warm copper Jun 12, 2023, 7:03 PM

#

#

colors are fixed @past meteor

#

increased the alpha too

#

oh wait

#

warm copper Jun 12, 2023, 7:41 PM

#

#

I found the best threshold using predict_proba @past meteor

#

Best Threshold=0.230000, F-Score=0.457

#

😄

past meteor Jun 12, 2023, 7:43 PM

#

Selecting the right one is more a business concern than a data science one. Do you find false positives or false negatives worse? 🙂
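
For anyone following along, a plain-Python sketch of a threshold sweep that picks the cut-off with the best F1 score. The scores and labels are invented, and maximizing F1 is just one possible criterion; if false negatives are worse for the business, a different metric or a cost-weighted one would move the cut-off.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 for the positive class when predicting positive if score >= threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y == 1 for p, y in zip(predicted, labels))
    fp = sum(p and y == 0 for p, y in zip(predicted, labels))
    fn = sum((not p) and y == 1 for p, y in zip(predicted, labels))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up positive-class probabilities and true labels.
scores = [0.02, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.60]
labels = [0, 0, 0, 1, 0, 0, 1, 1]

thresholds = [i / 100 for i in range(1, 100)]
best_threshold = max(thresholds, key=lambda t: f1_at_threshold(scores, labels, t))
best_f1 = f1_at_threshold(scores, labels, best_threshold)
```

Ideally the threshold is chosen on a validation split rather than the test set, so the reported score is not optimistic.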

warm copper Jun 12, 2023, 7:43 PM

#

so this is for false positive

#

should I also perform for false negative?

#

i did it for positive outcome

#

😄

past meteor Jun 12, 2023, 7:47 PM

#

It's fine like this tbh

warm copper Jun 12, 2023, 7:47 PM

#

Best Threshold=0.230000, F-Score=0.457
Best Threshold=0.110000, F-Score=0.125

#

past meteor Jun 12, 2023, 7:48 PM

#

There's a million other plots like this you can make: FPR-FRR (false positive vs false rejection rate), DET (detection error), ROC, PR curve, ...

warm copper Jun 12, 2023, 7:48 PM

#

false negative looks horrible

#

😄

#

this is for negative outcome
