umbral hatch Apr 21, 2025, 5:15 PM

#

plateau tho

viscid urchin Apr 21, 2025, 5:15 PM

#

and then there's https://course.fast.ai/

Practical Deep Learning for Coders

Practical Deep Learning for Coders - Practical Deep Learning

A free course designed for people with some coding experience, who want to learn how to apply deep learning and machine learning to practical problems.

umbral hatch Apr 21, 2025, 5:16 PM

#

down the line what should it be? should i learn like advanced math and data and shit?

viscid urchin Apr 21, 2025, 5:19 PM

#

Yeah, I see it as a repeated loop of learning a new ML idea, then going back and re-learning its math, and then going again

umbral hatch Apr 21, 2025, 5:20 PM

#

for now tho just python? as a beginner of course

viscid urchin Apr 21, 2025, 5:21 PM

#

Yeah just Python is fine. You can learn Triton or something to add to what you can do later.

umbral hatch Apr 21, 2025, 5:21 PM

#

any other languages along python later on?

viscid urchin Apr 21, 2025, 5:21 PM

#

A lot of people like to learn C++ or Rust, but to me this might be the power move second language for ML https://triton-lang.org/main/index.html

umbral hatch Apr 21, 2025, 5:21 PM

#

can you tell me why?

viscid urchin Apr 21, 2025, 5:21 PM

#

(This is what most of OpenAI's models are written in)

#

This lets you write code that runs on the GPU very efficiently, and do it cross-platform without needing a specific nVidia card.

umbral hatch Apr 21, 2025, 5:22 PM

#

ah nice

viscid urchin Apr 21, 2025, 5:22 PM

#

This would be an alternative to, say, learning CUDA directly

umbral hatch Apr 21, 2025, 5:22 PM

#

what jobs can one land with data science?

viscid urchin Apr 21, 2025, 5:22 PM

#

At this point, whatever you can dream almost; everybody wants data science.

umbral hatch Apr 21, 2025, 5:23 PM

#

highly competitve? evne more than software engineering?

viscid urchin Apr 21, 2025, 5:24 PM

#

Well, it IS software engineering, so it's hard to compare

umbral hatch Apr 21, 2025, 5:24 PM

#

oh shit...

#

well is it more competitve than other fields or?

viscid urchin Apr 21, 2025, 5:27 PM

#

It's super hot so yeah, you could say so.

#

But how competitive depends on your level really

umbral hatch Apr 21, 2025, 5:28 PM

#

super beginner

viscid urchin Apr 21, 2025, 5:28 PM

#

You'll have a lot of competition, but if you know stuff, the market is thirsty.

umbral hatch Apr 21, 2025, 5:29 PM

#

never back down never give up

viscid urchin Apr 21, 2025, 5:36 PM

#

If you're actually interested in the topic, you should be able to stand out, in my opinion

#

There are a lot of people just doing it because it's supposed to be hot

umbral hatch Apr 21, 2025, 6:50 PM

#

nah i kidna like it and interested in it

limpid dew Apr 21, 2025, 6:51 PM

#

Is it normal to get a career in data science without a degree?

serene scaffold Apr 21, 2025, 6:55 PM

#

limpid dew Is it normal to get a career in data science without a degree?

No, it pretty much never happens.

#

Data science is even more degree-requiring than software development.

umbral hatch Apr 21, 2025, 6:56 PM

#

woah

limpid dew Apr 21, 2025, 6:57 PM

#

serene scaffold No, it pretty much never happens.

Thank you, thought that would be relevant to @umbral hatch

umbral hatch Apr 21, 2025, 6:57 PM

#

limpid dew Thank you, thought that would be relevant to <@1124881023881183243>

thanks. yeah it is, but i do want to have some experience and background, alongside some real projects

limpid dew Apr 21, 2025, 6:58 PM

#

What kind of projects are you thinking of?

umbral hatch Apr 21, 2025, 7:00 PM

#

limpid dew What kind of projects are you thinking of?

for now, im barely learning loops. For a future project, it'd be something like asking the user for inputs what's your name, age...etc and then in the same project something interactive, multiplying the age, scrambling names. this is far down the line tho

limpid dew Apr 21, 2025, 7:01 PM

#

Very cool! Also smart to keep it simple at first.

umbral hatch Apr 21, 2025, 7:01 PM

#

yeah and for fun maybe start coding games and learning other languages. main reason i went into coding

fallow coyote Apr 21, 2025, 7:57 PM

#

How do you lot use databases with your ML programs? I'm trying to make my first ML program to see the likelihood of someone developing heart failure. I cleaned the data and then transferred it into an SQLite database

wooden hill Apr 21, 2025, 7:57 PM

#

ML program?

glacial root Apr 21, 2025, 8:00 PM

#

@serene scaffold do you know what the issue here is

#

sorry to ping, you're the only one here that i know is into nlp

fallow coyote Apr 21, 2025, 8:01 PM

#

wooden hill ML program?

Machine Learning program. Just something simple; using linear regression. Ill spend this week on it, send it to this channel for everyone to critique and then I'll attempt to make a proper project (which will have mini projects within it as I try to expand my knowledge and skills in ML programming)

wooden hill Apr 21, 2025, 8:02 PM

#

fallow coyote Machine Learning program. Just something simple; using linear regression. Ill sp...

OH! I should probably know that considering thats what Im going for 😅

serene scaffold Apr 21, 2025, 8:02 PM

#

when you say "pairs", do you mean "two adjacent words"?
remember that lists are not arrays.

#

corpus[i:i+2] -- if you try to do a string slice that is out of range, you'll get an empty string.

#

!e

print("hello world!!!"[100:100000])

arctic wedgeBOT Apr 21, 2025, 8:04 PM

#

serene scaffold !e ```py print("hello world!!!"[100:100000]) ```

:warning: Your 3.12 eval job has completed with return code 0.

[No output]

serene scaffold Apr 21, 2025, 8:04 PM

#

note that it did print("") rather than erroring.

glacial root Apr 21, 2025, 8:06 PM

#

serene scaffold when you say "pairs", do you mean "two adjacent words"? remember that lists are ...

to adjacent items within the list, which starts out to be all chars but then they get grouped together

glacial root Apr 21, 2025, 8:07 PM

#

serene scaffold `corpus[i:i+2]` -- if you try to do a string slice that is out of range, you'll ...

i increment until one less than the length though

serene scaffold Apr 21, 2025, 8:08 PM

#

for i in range(1000):
    pairs = dict()
    for j in range(len(corpus) - 1):
        key_list = list(pairs.keys())
        pair = ''.join(corpus[i:i+2])

do you see the problem?

#

also, look at the number of lines: https://paste.pythondiscord.com/NMA23TQ64ACSOJMZ4K5GC4OREA

glacial root Apr 21, 2025, 8:15 PM

#

serene scaffold ```py for i in range(1000): pairs = dict() for j in range(len(corpus) - ...

i end at one before the last char though and i'm joining two chars

#

i don't understand how it's going past the last character

serene scaffold Apr 21, 2025, 8:16 PM

#

glacial root i don't understand how it's going past the last character

look at what variables you're using for what.

#

for x in range(1000):
    pairs = dict()
    for y in range(len(corpus) - 1):
        key_list = list(pairs.keys())
        pair = ''.join(corpus[x:x+2])

this might make it easier to see.

glacial root Apr 21, 2025, 8:17 PM

#

oh shit

#

wait yeah i didn't see that part earlier

#

i'm such an idiot

#

thank you

serene scaffold Apr 21, 2025, 8:18 PM

#

glacial root i'm such an idiot

you are not

glacial root Apr 21, 2025, 8:20 PM

#

serene scaffold you are not

no these kinds of mistakes happen far too frequently for me

#

i should have known to check a few more times before asking for help

serene scaffold Apr 21, 2025, 8:26 PM

#

glacial root i should have known to check a few more times before asking for help

It's good to figure out what you can on your own, but don't be ashamed to ask for help.

glacial root Apr 21, 2025, 8:33 PM

#

serene scaffold It's good to figure out what you can on your own, but don't be ashamed to ask fo...

nah that's not the part i'm worried about

#

sometimes i just make so many mistakes like this

#

it's scary what's gonna happen in the future

serene scaffold Apr 21, 2025, 8:34 PM

#

You're being too hard on yourself. Everyone starts out like this.

glacial root Apr 21, 2025, 8:35 PM

#

hopefully this issue goes away as i practice more

serene scaffold Apr 21, 2025, 8:55 PM

#

glacial root hopefully this issue goes away as i practice more

I think it will.

civic elm Apr 22, 2025, 7:31 AM

#

Hi all, in machine learning, in particular feature update or (rollbacks?) how do you solve the "Any change can break everything" problem?

#

Basically I made some feature changes to my model inputs in jupyter notebook and it performed worse than an hour ago. how do we solve this?

grand minnow Apr 22, 2025, 9:01 AM

#

Has anyone tried using Helicone? I was set on using it to track the cost of tokens and requests per conversation, etc but the dashboard just kept showing the default sample data metrics 😭

I've verified that I am currently sending through their gateway tho... Does it lag?

narrow tiger Apr 22, 2025, 3:05 PM

#

If i have a rag from which I want to pass in context to an llm (to use relevent data)
should i send this in system prompt or user prompt?
how can it effect the outputs

agile cobalt Apr 22, 2025, 3:07 PM

#

how much it affects the output will vary depending on the model, but in general you'll want to put "trusted" data in the system prompt, "untrusted" data in the user prompt

you shouldn't rely on the model to distinguish between it, but it might help it understand how official or reliable that data is

narrow tiger Apr 22, 2025, 3:32 PM

#

thanks that makes sense,

untold cliff Apr 22, 2025, 5:01 PM

#

Hello! I am trying to dive deep into tokenization and understand the need of the shift towards subword based tokenization.
One of the main points I see is how hard it might be to define splitting rules, especially for complex languages. I keep seeing turkish as an example but I don't know the language so I can't tell that much, but I think I can see ot even in english of a word is a somewhat complex combination of prefixes and suffixes (but no specific examples come to mind unfortunately)
Another main point I see is ambiguity, in the sense of which representation to use. I see example like "don't", should be considered one token or be split into "do" and "n't" or "not". And "U.S.A" or "New York-based" for example, how should it be split? And I'm wondering if it's that hard to agree on one common way of doing or if it is that hard to define rules for doing so. I see arguments saying that it depends on the use case, so for some cases one way of splitting is better than the other, but I can't think of any examples.
Can you shed some light maybe? Or give some clear examples? Thank you!

final jolt Apr 22, 2025, 5:08 PM

#

I feel like that is a complex question with not at all the same set of examples. realistically an LLM would understand "don't" as a single token and breaking that up would be model dependant. (and also would it even be useful?) The same can be said for splitting prefixes and suffixes, at least in English where you would fundamentally change the meaning of a word by doing so which I say would just cause more harm than good. A stronger argument could be made for your last example but by that same degree I feel like splitting the tokens just by some set standard would yield unreliable outcomes. given that New York-based would have a different meaning than New York, based or similar depending on the model in question.

The greater support or capability for natural language interaction makes the argument even harder to justify IMO.

north plank Apr 22, 2025, 7:08 PM

#

untold cliff Hello! I am trying to dive deep into tokenization and understand the need of the...

is it possible to use fuzzyword to generate and map such subwords? as a part of tokenization. Because how long can we count and create such subword lists?

stiff night Apr 22, 2025, 7:11 PM

#

Hello, I'm here to seek guidance. I have a project in which I have to train an AI model on segmenting tumors from breast mammograms. This is my first AI project so I'm kind of lost at the moment.
The dataset that I'm working with is the INbreast 2012 dataset on kaggle. I have managed to load the DICOMs and their corresponding tumor masks and train a UNet model on them, but I did not get any promising results. All metrics like dice and IoU are very small (less than 1%).
I'd really like any help if possible. Thanks in advance.

fallow coyote Apr 22, 2025, 8:14 PM

#

I've been learning a lot of the statistics for ML. As I'm still at the beginning of my ML journey, how much of a focus should I have put in learning the linear algebra side?

iron basalt Apr 22, 2025, 9:39 PM

#

fallow coyote I've been learning a lot of the statistics for ML. As I'm still at the beginning...

A lot. It will also enhance your understanding in others parts, like statistics. It's foundational as a building block due to modern computers being designed around it for performance.

fallow coyote Apr 22, 2025, 10:12 PM

#

Ill finish off learning the stats. Tbf a lot of the stats I still can recall from high school several years ago. Just need to revise them. Then Ill go onto learning all the linear algebra stuff

lapis sequoia Apr 23, 2025, 12:17 AM

#

gans make me feel dirty about myself, they waste time, I thought they were harder than obj detection. Any of you young and good lads have any resources at your disposal for object detection?

viscid urchin Apr 23, 2025, 1:12 AM

#

https://arxiv.org/html/2502.15840v1

lapis sequoia Apr 23, 2025, 1:12 AM

#

you think rlhf is not really RL?

viscid urchin Apr 23, 2025, 1:13 AM

#

I wouldn't call it fake, but it's clearly different when humans are involved

lapis sequoia Apr 23, 2025, 1:14 AM

#

yes, the reward are still through actual tangible data

iron basalt Apr 23, 2025, 1:20 AM

#

lapis sequoia gans make me feel dirty about myself, they waste time, I thought they were harde...

As in computer vision object detection?

gray slate Apr 23, 2025, 1:22 AM

#

viscid urchin https://arxiv.org/html/2502.15840v1

We find no clear correlation between failures and the point at which the model’s context window becomes full, suggesting that these breakdowns do not stem from memory limits
Interesting. Any thoughts about it?

#

Also, arxiv entering the 21st century with HTML! 🎉
Nice work, science! 😂

viscid urchin Apr 23, 2025, 1:24 AM

#

gray slate > We find no clear correlation between failures and the point at which the model...

Yeah that part is super interesting, I'm still trying to decide what my mental model for it is. It's not like the model can get 'bored'.. is it just the 'game of telephone' it is playing with only being given the last 30,000 tokens etc? Not sure.

gray slate Apr 23, 2025, 1:26 AM

#

viscid urchin Yeah that part is super interesting, I'm still trying to decide what my mental m...

I haven't read the full thing so don't have a good mental model of it. But I've got some decent mental models in general. I like Tim Scarfe's take on it, that they tend towards the mean while we push towards chaos: https://www.mlst.ai/p/agentialism-and-the-free-energy-principle

lethal gull Apr 23, 2025, 1:33 AM

#

Does anybody that has worked with VAE's know what ways i can increase model performance?

lapis sequoia Apr 23, 2025, 1:39 AM

#

iron basalt As in computer vision object detection?

yes

gray slate Apr 23, 2025, 1:39 AM

#

lapis sequoia yes

https://www.youtube.com/watch?v=8jXIAWg_yHU&list=PLjMXczUzEYcHvw5YYSU92WrY8IwhTuq7p&index=1

^ Full university course on computer vision from the creator of YOLO. Thi sis the best resource you're likely to find anywhere

lyric furnace Apr 23, 2025, 6:47 AM

#

guys

#

could anyone suggest me a machine learning documentation:

i am a kid and i am intrested in ML/DEEP LEARNING.
i am trying to learn linear algebra and stuff

could anyone please suggest me a doc, cause the docs i find is very complicated and not well explained

grand minnow Apr 23, 2025, 7:39 AM

#

lyric furnace could anyone suggest me a machine learning documentation: i am a kid and i am i...

Have you tried a more hands-on learning course like https://kaggle.com/learn ?

Learn Python, Data Viz, Pandas & More | Tutorials | Kaggle

Practical data skills you can apply immediately: that's what you'll learn in these no-cost courses. They're the fastest (and most fun) way to become a data scientist or improve your current skills.

fallow coyote Apr 23, 2025, 1:36 PM

#

Can anyone help me try to decipher what a model matrix is and how to create one? Third fucking time I'm asking. If you cant be asked to help, at least guide me to a decent resource that can help me understand what a model/design matrix is because I cannot find anything remotely useful on the internet that tells me what I need.

final jolt Apr 23, 2025, 1:38 PM

#

fallow coyote Can anyone help me try to decipher what a model matrix is and how to create one?...

Ah yes be condescending and rude while asking for help. Excellent tactic

serene scaffold Apr 23, 2025, 2:03 PM

#

fallow coyote Can anyone help me try to decipher what a model matrix is and how to create one?...

I'm sorry you're frustrated. I've never heard of a model matrix or a design matrix.

We do our best in this server to connect knowledgeable people to newcomers, but everything is ultimately voluntary and no one is required to help or entitled to receive it.

#

This raises the question: why do you think you need to know about something for which scant resources exist?

lavish wraith Apr 23, 2025, 2:10 PM

#

Is pandas is equivalent of Excel

final jolt Apr 23, 2025, 2:13 PM

#

I mean pandas is a library for data queries and other stuff. Excel is a spreadsheet program. Yes both can be used for data query and analysis but they are very different approaches

lavish wraith Apr 23, 2025, 2:14 PM

#

For data analysis what's basic skills required ??

final jolt Apr 23, 2025, 2:16 PM

#

I feel like that heavily depends on what data you may be analyzing, what your goal of the analysis is and what exactly are you wanting to do.

serene scaffold Apr 23, 2025, 2:17 PM

#

lavish wraith For data analysis what's basic skills required ??

keep in mind that you almost certainly need a degree--employers are going to be selective about who they trust to help them make consequential business decisions.
pandas is one of the most popular tools for data analysis, but you also need to understand statistics and have some domain knowledge in what you're analyzing.

lavish wraith Apr 23, 2025, 2:23 PM

#

final jolt I feel like that heavily depends on what data you may be analyzing, what your go...

Just find job as data analysis

final jolt Apr 23, 2025, 2:23 PM

#

lavish wraith Just find job as data analysis

Ah then very much what Stelercus said. A degree most likely as a minimum

grand mantle Apr 23, 2025, 2:25 PM

#

Hello guys. I am little bit confused on data sharing

#

I provide Instagram data but i can't extend number of clients

lavish wraith Apr 23, 2025, 2:26 PM

#

I search lot of website skill required they said Excel ,sql,tableau ,numpy ,pandas,matplotlib ,seaborn

final jolt Apr 23, 2025, 2:29 PM

#

lavish wraith I search lot of website skill required they said Excel ,sql,tableau ,numpy ,pand...

I mean that is basically a catchall wordsalad but still something to start with in what of those things do you know? You are certainly going to need at least familiarity and understanding of those tools and libraries. Knowledge of usage and experience beyond that

final jolt Apr 23, 2025, 2:30 PM

#

grand mantle I provide Instagram data but i can't extend number of clients

Going to need more details like what you created your system with, what errors you are getting or basically just more details in order for someone to be able to help

grand minnow Apr 23, 2025, 3:48 PM

#

lavish wraith For data analysis what's basic skills required ??

Doesn't look like that there's a pre-requisite to learning data analytics. Why don't you give it a try? https://www.coursera.org/google-certificates/data-analytics-certificate

ornate rose Apr 23, 2025, 5:13 PM

#

Hey guys. I'm new to here (but not entirely new to python). I just wanted your opinions on this

I'm an advanced beginer in python ( I know the basics. Loops, ifs, whiles, input data types... all that kind of stuff) and i've worked on natural language processing using the NLTK library in python. so it's been quite a lot though ofcourse not being familiar with it makes me forget a lot of syntax in the library

I'm presently 17 and will be starting college pursuing a CS and Cognitive Science degree and I'll be working on a research paper on AI before resumption (September) with top level profs and graduates. This paper would be submitted to arXiv and top AI conferences like NeurIPs. I'd be aiming to pursue a FAANG internship though I'd settle for whatever I'll get but my main goal is to master Python Programming by the end of this year up to a given extent.

I'd love anyone that has inputs or advice they are willing to share so I begin working.

serene scaffold Apr 23, 2025, 5:26 PM

#

^ I responded to this in #career-advice

fallow coyote Apr 23, 2025, 6:56 PM

#

serene scaffold This raises the question: why do you think you need to know about something for ...

Apologies if I was being brash. I'm going through a book (Introduction to Statistical Learning for Python or ISLP). It mentions something about a model matrix or design matrix which I believe is to set the template for your model (i.e. defining what X and y will be). ISLP uses its own custom module to create a design matrix. I'll continue on with my project and see how it goes. I'll post it up on here when I'm done with it.

viscid urchin Apr 23, 2025, 6:59 PM

#

fallow coyote Apologies if I was being brash. I'm going through a book (Introduction to Statis...

Does this help at all? https://en.wikipedia.org/wiki/Design_matrix
Each row in the matrix is an 'observation', at least as far as I understand them currently.

#

I guess this is what you're doing? https://intro-stat-learning.github.io/ISLP/models/spec.html

#

and I guess this is where the rubber meets the road

#

Never used this API, looks fancy

fallow coyote Apr 23, 2025, 7:03 PM

#

viscid urchin I guess this is what you're doing? https://intro-stat-learning.github.io/ISLP/mo...

Yes. That book I'm going through. It feels like a decent book to go through the basics of machine learning and the stastical maths behind (not so much the linear algebra side). I've made a design matrix from the book 'manually' but I dont understand what makes their custom design matrix module better if that makes any sense

viscid urchin Apr 23, 2025, 7:03 PM

#

Not sure; my guess is that it helps you generate the matrix from the dataset, but I guess I'd have to look at their docs to know why it's cool

#

Perhaps the most common use is to extract some columns from a pd.DataFrame and produce a design matrix there we go I suppose

fallow coyote Apr 23, 2025, 7:06 PM

#

I'll continue on with my current project and see how it goes from there. Thanks for trying to help. I have this weird obsession where I must know how everything works or else i cant use it to its fullest potential. I'm slowly getting better at just needing to learn the surface details then applying it straight way and after, go deeper into the subject

final jolt Apr 23, 2025, 7:51 PM

#

viscid urchin Not sure; my guess is that it helps you generate the matrix from the dataset, bu...

yea this is about all I can glean from the terminology in that book as well. Its more of a term used to describe the matrix created with extracted data vs a 'concept' of its own

opaque condor Apr 23, 2025, 7:51 PM

#

For making my own image network convelution
Do I make a file with each reference

#

Image

lapis sequoia Apr 23, 2025, 7:53 PM

#

yo, RAG stuff, any good tid-bits or links?

lapis sequoia Apr 23, 2025, 8:03 PM

#

opaque condor For making my own image network convelution Do I make a file with each referenc...

no, it should be labeled in each folder with what the image is.

misty wraith Apr 23, 2025, 8:20 PM

#

hi im new just trying to find my way around this server. im trying to learn python with data science as one of the goals

serene scaffold Apr 23, 2025, 8:20 PM

#

misty wraith hi im new just trying to find my way around this server. im trying to learn pyth...

hello and welcome to our wonderful data science channel

misty wraith Apr 23, 2025, 8:21 PM

#

serene scaffold hello and welcome to our wonderful data science channel

hi thanks took me a while to find this channel its a big server

wooden sail Apr 23, 2025, 8:21 PM

#

fallow coyote I'll continue on with my current project and see how it goes from there. Thanks ...

a "model matrix" is a particular flavor of linear statistical model, so what you're looking for is a "statistical model" https://en.wikipedia.org/wiki/Statistical_model#Formal_definition

Statistical model

A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of sample data (and similar data from a larger population). A statistical model represents, often in considerably idealized form, the data-generating process. When referring specifically to probabilities, the corresponding term is...

#

after some discretization/sampling and/or a choice of basis in a finite dimensional vector space, a "model matrix" is roughly equivalent to the assumption that your data is described by a statistical model with deterministic but unknown parameters, and those parameters are related to the observed data via a linear transformation

opaque condor Apr 23, 2025, 8:45 PM

#

lapis sequoia no, it should be labeled in each folder with what the image is.

So like fox image one fox image 2 etc?

lapis sequoia Apr 23, 2025, 9:52 PM

#

opaque condor So like fox image one fox image 2 etc?

what do you have? Is the data in files? and in those files, is the data labeled?

opaque condor Apr 23, 2025, 10:02 PM

#

Not yet and I'm going over foxes and cats and dogs

ionic dirge Apr 23, 2025, 10:30 PM

#

Hi everyone, please, I need your help. I currently use Google colab on a mobile device to run datasets. I only just started. I need to analyze datasets from kaggle. How can I use these datasets on Google colab without downloading it?

potent meadow Apr 23, 2025, 10:30 PM

#

im doing a project for my uni and everything looks good except that the graph isnt showing anything
i cant fix it and i dont know where's the issue
can someone help? i can provide the code and other files and stuff but it's difficult to just upload everything here lol

viscid urchin Apr 23, 2025, 10:34 PM

#

ionic dirge Hi everyone, please, I need your help. I currently use Google colab on a mobile...

https://www.kaggle.com/discussions/general/74235

Easiest way to download kaggle data in Google Colab | Kaggle

Easiest way to download kaggle data in Google Colab

#

Or https://www.geeksforgeeks.org/how-to-import-kaggle-datasets-directly-into-google-colab/

ionic dirge Apr 23, 2025, 10:47 PM

#

viscid urchin Or https://www.geeksforgeeks.org/how-to-import-kaggle-datasets-directly-into-goo...

Thank you so much. Will check it out.

final jolt Apr 23, 2025, 10:52 PM

#

potent meadow im doing a project for my uni and everything looks good except that the graph is...

probably going to at least need some snippets of code and output screenshyot

potent meadow Apr 23, 2025, 11:33 PM

#

final jolt probably going to at least need some snippets of code and output screenshyot

can i dm you?

final jolt Apr 23, 2025, 11:38 PM

#

You can but I can't promise I alone could help. And probably won't have a chance to look today 🙂

potent meadow Apr 23, 2025, 11:41 PM

#

final jolt You can but I can't promise I alone could help. And probably won't have a chance...

yea all good i will appreciate just trying :D ^^
im busy atm so i will send it later but appreciate you

opaque condor Apr 23, 2025, 11:41 PM

#

lapis sequoia what do you have? Is the data in files? and in those files, is the data labeled?

The basics

wraith jay Apr 24, 2025, 12:35 AM

#

i need some pandas help

def in_prop(formula):
    parsed = _parse_formula(formula)
    bools = [el in elements for el in propeties.keys()]
    return all(data)

filtered = df.loc[lambda x: ~in_prop(x['formula'])]
df['en diff'] = filtered.map(en_diff)

i want to filter elements in the formula column of the dataframe based on the result of the in_prop function, but i cant figure out how to do it

#

this code doesnt work

viscid urchin Apr 24, 2025, 12:35 AM

#

Sorry if I'm being dumb, what is en_diff?

#

also propeties is spelled wrong

#

and where is data coming from?

wraith jay Apr 24, 2025, 12:36 AM

#

viscid urchin Sorry if I'm being dumb, what is `en_diff`?

im creating a new column for the filtered data which i appply a fucntion to

wraith jay Apr 24, 2025, 12:37 AM

#

viscid urchin and where is `data` coming from?

a json file

viscid urchin Apr 24, 2025, 12:37 AM

#

Sure, but I mean, it's not in the code you show; is that variable in scope?

wraith jay Apr 24, 2025, 12:37 AM

#

yeah

#

im doing this in jupyter notebook

#

heres the error im getting btw

TypeError: expected string or bytes-like object, got 'Series'

#

for this line filtered = df.loc[lambda x: ~in_prop(x['formula'])]

#

i just dont know how to do the filter

viscid urchin Apr 24, 2025, 12:38 AM

#

Don't you want to say return all(bools)?

#

in in_prop?

#

Like..

def in_prop(formula):
    parsed = _parse_formula(formula)
    bools = [el in properties.keys() for el in parsed]
    return all(bools)
``` ?

wraith jay Apr 24, 2025, 12:40 AM

#

ohh right

viscid urchin Apr 24, 2025, 12:40 AM

#

Or is that not what 'parsed' is all about?

wraith jay Apr 24, 2025, 12:40 AM

#

ty

viscid urchin Apr 24, 2025, 12:40 AM

#

If I understand you correctly I THINK the way to say it is:

filtered = df.loc[~df['formula'].apply(in_prop)]
``` but I'm not a pandas wizard.

wraith jay Apr 24, 2025, 12:40 AM

#

yeah data wasnt a variable. but the error still persists, i dont think its even calling the function yet

wraith jay Apr 24, 2025, 12:41 AM

#

viscid urchin If I understand you correctly I THINK the way to say it is: ```python filtered =...

oh awesome that works

#

wait

#

nvm

viscid urchin Apr 24, 2025, 12:42 AM

#

Remove the ~ if you want to invert it

#

and then you'd do df['en diff'] = filtered['formula'].map(en_diff)

wraith jay Apr 24, 2025, 12:42 AM

#

ahh i see

#

thanks for the help!

limpid dew Apr 24, 2025, 2:11 AM

#

Anyone know if there is a difference between embedding and one hot encoding?

grand minnow Apr 24, 2025, 2:34 AM

#

limpid dew Anyone know if there is a difference between embedding and one hot encoding?

One-hot encoding is simple and is used for categorical data when relationships between categories don’t matter.

Embeddings are advanced and crucial for tasks like natural language understanding in LLMs, where capturing meaning, context, and relationships is essential. Because it stores the info as vectors.

limpid dew Apr 24, 2025, 2:38 AM

#

grand minnow One-hot encoding is simple and is used for categorical data when relationships b...

But is the first layer of a DNN not encoding the one hot encoded vector into some latent space in the same way as imbedding algorithm?

grand minnow Apr 24, 2025, 2:41 AM

#

I don't know what or how a DNN works

#

Sorry

limpid dew Apr 24, 2025, 3:17 AM

#

grand minnow I don't know what or how a DNN works

oh was that a chat gpt response?

glacial root Apr 24, 2025, 3:29 AM

#

misty wraith hi thanks took me a while to find this channel its a big server

note that this is the best channel in this server, you've come to a wonderful place ‼️

quaint mulch Apr 24, 2025, 3:31 AM

#

limpid dew oh was that a chat gpt response?

This is the funniest thing I saw all day

quaint mulch Apr 24, 2025, 3:32 AM

#

limpid dew Anyone know if there is a difference between embedding and one hot encoding?

like, they are very different categories, so IDK how to answer that.
1hot can be considered pre-processing, not part of the model, and there are no learnable weights

limpid dew Apr 24, 2025, 3:33 AM

#

quaint mulch like, they are very different categories, so IDK how to answer that. 1hot can be...

but ins't enbedding just relating an index to a vector?

quaint mulch Apr 24, 2025, 3:33 AM

#

embedding is just a concept for what's hapening on the 1st layer of DNN usually
the idea is it is projecting from data space to a latent space.

quaint mulch Apr 24, 2025, 3:34 AM

#

limpid dew but ins't enbedding just relating an index to a vector?

there are many ways to do embedding. usually, it is not.
Although it can be done that way.

quaint mulch Apr 24, 2025, 3:34 AM

#

limpid dew but ins't enbedding just relating an index to a vector?

you got an example where this is the case and we can take a look?

limpid dew Apr 24, 2025, 3:36 AM

#

sadly I dont have an example

#

Just interested conspetually.

viscid urchin Apr 24, 2025, 3:37 AM

#

https://github.com/xbeat/Machine-Learning/blob/main/Embeddings Re-ranking and Vector Databases in Python.md

GitHub

Machine-Learning/Embeddings Re-ranking and Vector Databases in Pyth...

Cross Beat (xbe.at) - Your hub for python, machine learning and AI tutorials. Explore Python tutorials, AI insights, and more. - xbeat/Machine-Learning

limpid dew Apr 24, 2025, 3:39 AM

#

Fundamentally, isn't embedding just a mapping from a set of binary features to an arbitrary vector?

quaint mulch Apr 24, 2025, 3:39 AM

#

limpid dew Fundamentally, isn't embedding just a mapping from a set of binary features to a...

No. In many cases, the features are non-binary.

#

take an image or a sound file.

limpid dew Apr 24, 2025, 3:40 AM

#

Thank you that's helpful.

quaint mulch Apr 24, 2025, 3:40 AM

#

or like, stock prices

viscid urchin Apr 24, 2025, 3:40 AM

#

Oh sick, GItHub has notebook support these days? https://github.com/google-gemini/cookbook/blob/main/quickstarts/Embeddings.ipynb

peak hamlet Apr 24, 2025, 3:41 AM

#

viscid urchin Oh sick, GItHub has notebook support these days? https://github.com/google-gemin...

It’s had it for a long time
It got a few improvements recently though
Maybe a couple months back? You could search the changelog

viscid urchin Apr 24, 2025, 3:41 AM

#

Neat

limpid dew Apr 24, 2025, 3:43 AM

#

quaint mulch No. In many cases, the features are non-binary.

Does the scaling of the feature correspond to a proportional scaling of the embedded vector?

viscid urchin Apr 24, 2025, 3:44 AM

#

"learned vectors that place semantically or structurally similar items close together in high-dimensional space" is a definition I just found that I kinda like.

limpid dew Apr 24, 2025, 3:44 AM

#

I like that as a high level explanation

viscid urchin Apr 24, 2025, 3:45 AM

#

It's interesting I guess that this sense of 'embedding' is different from the broader mathematical term

limpid dew Apr 24, 2025, 3:45 AM

#

Does have a bit of an LLM slant to it though

viscid urchin Apr 24, 2025, 3:45 AM

#

Like, I was just thinking about whether a 'consistent hash ring' like you'd find in a distributed system is an 'embedding' of its nodes.. and I guess it is in the mathematical sense but not in the machine-learning sense.

limpid dew Apr 24, 2025, 3:46 AM

#

I'm interesting in embedding dota2 and league heros/champions

limpid dew Apr 24, 2025, 3:47 AM

#

viscid urchin Like, I was just thinking about whether a 'consistent hash ring' like you'd find...

I'm afraid that's a bit over my head.

viscid urchin Apr 24, 2025, 3:48 AM

#

Imagine you have a bunch of servers and you want each of them to store an even chunk of your data.. You might use an algorithm that assigns them to 'positions' on a 'ring' or 'clock face'

quaint mulch Apr 24, 2025, 3:48 AM

#

limpid dew Does the scaling of the feature correspond to a proportional scaling of the embe...

really depends on your embedding.
If it is just a matrix multiplication, then yes.
But in general, no.

limpid dew Apr 24, 2025, 3:50 AM

#

quaint mulch really depends on your embedding. If it is just a matrix multiplication, then ye...

Could you point me to an example which doesn't correspond to a simple mapping of a feature to an vector (+ some scaling if the feature is nonbinary)

lyric furnace Apr 24, 2025, 3:51 AM

#

grand minnow Have you tried a more hands-on learning course like https://kaggle.com/learn ?

i didint knew that kaggle had this types of content, any way thank you for helping me out buddy.

quaint mulch Apr 24, 2025, 3:52 AM

#

limpid dew Could you point me to an example which doesn't correspond to a simple mapping of...

classic VGG? https://arxiv.org/pdf/1409.1556v6

limpid dew Apr 24, 2025, 3:59 AM

#

quaint mulch classic VGG? https://arxiv.org/pdf/1409.1556v6

Don't think I see anything about embedding in this paper but perhaps I missed in on my first skim through. Cook paper though!

quaint mulch Apr 24, 2025, 4:00 AM

#

I mean, it is cited 140k times

#

Well, embedding is just a concept right? Usually the 1st layer we call it emebdding

#

in this paper they don't use the embedding concept, but you can think of the 1st layer as embedding

viscid urchin Apr 24, 2025, 4:01 AM

#

limpid dew Could you point me to an example which doesn't correspond to a simple mapping of...

Unless I'm misunderstanding you this qualifies right? https://jesusleal.io/2021/01/13/node2vec-tutorial-with-capitol-bikeshare-data/

#

@limpid dew Or you could say https://medium.com/%40eddiewctan/collaborative-filtering-and-embeddings-3d6a49034965 ?

#

Maybe I don't understand what you mean by 'simple mapping'

#

In the second one the embedding represents the 'latent factors' you are trying to optimize for

limpid dew Apr 24, 2025, 4:05 AM

#

quaint mulch in this paper they don't use the embedding concept, but you can think of the 1st...

Yeah, that's what I mean. The main point of my question is to say ins't one hot encoding -> the first layer the same as embedding.

viscid urchin Apr 24, 2025, 4:05 AM

#

and then there's like BERT where the embeddings are totally contextual https://medium.com/%40davidlfliang/intro-getting-started-with-text-embeddings-using-bert-9f8c3b98dee6

limpid dew Apr 24, 2025, 4:08 AM

#

I think the difference might be in how the vectors are treated after the first layer.

#

Perhaps the only difference is that, in embedding, the vectors are assumed to have the same basis, and therefore can be added together without increasing the dimensionality.

viscid urchin Apr 24, 2025, 4:11 AM

#

https://stackoverflow.com/questions/73139690/can-you-turn-a-one-hot-vector-to-a-nn-embedding-in-a-differentiable-way

Stack Overflow

Can you turn a one-hot vector to a nn.Embedding in a differentiable...

Is there a way to feed a one-hot ([batch_size, seq_len, vocab_size]) vector to torch.nn.Embedding and get the same embeddings you would get from [batch_size, seq_len] integer tokens as an input and...

#

multiplying a one-hot by the weight matrix selects that row I guess

#

But presumably in practice your embeddings would have a much more efficient way of being looked up

#

https://forums.fast.ai/t/cant-wrap-my-head-around-one-hot-encoding-vs-embeddings/50327

fast.ai Course Forums

Can't wrap my head around one hot encoding vs embeddings

In Lesson 5, Jeremy talks about how he converted user and movie IDs to one hot encoded vectors and then multiplied it with the weight matrices. I just missed the point of this. The one hot encoded matrix is just an identity matrix right? Multiplying it with the weight matrix just gives you the weight matrix again. What was the point of that? A...

limpid dew Apr 24, 2025, 4:13 AM

#

Perhaps it's more efficient but mathmatically is would be the same then.

#

that stack overflow article is helpful, thanks

#

I take that answer to mean that one hot encoding would be isomorphic to embedding.

junior venture Apr 24, 2025, 4:16 AM

#

hi guys

limpid dew Apr 24, 2025, 4:16 AM

#

Hi ccccccp

junior venture Apr 24, 2025, 4:17 AM

#

can you say where general

limpid dew Apr 24, 2025, 4:17 AM

#

were general

junior venture Apr 24, 2025, 4:17 AM

#

yeah

limpid dew Apr 24, 2025, 4:17 AM

#

nice

junior venture Apr 24, 2025, 4:18 AM

#

wait i mean where general

#

hello?

#

@limpid dew are you here?

limpid dew Apr 24, 2025, 4:20 AM

#

?

junior venture Apr 24, 2025, 4:20 AM

#

where general?

#

...

limpid dew Apr 24, 2025, 4:20 AM

#

idk what that means

#

what's this discords policy on esports betting?

viscid urchin Apr 24, 2025, 4:34 AM

#

I can't think of a rule against it as long as it's not against any terms of service, but I guess be careful?

#

Maybe send a ModMail asking? Dunno.

limpid dew Apr 24, 2025, 4:36 AM

#

I'm trying to build a model to beat betting odds for esports like dota or league.

#

looking for someone to help with the leage side of things as I only know dota.

#

each dota hero -> an embedding vector hense the questions earlier.

rich moth Apr 24, 2025, 4:38 AM

#

Holy smokes guys! I just made a univerisal complexity scoring tool that works across every digital domain I could throw at it, tabular, time series, images, text, you name it.

In a nut shell it quantifies how hard each example is, automatically sorts training data, from easy to hard, hard to easy, or random. On time series its boosted it to 192% on time series data.

limpid dew Apr 24, 2025, 4:42 AM

#

rich moth Holy smokes guys! I just made a **univerisal complexity scoring tool** that work...

Can someone explain what this means?

rich moth Apr 24, 2025, 4:46 AM

#

Sorrry to flood the channel, I just felt this was too awesome to not share 😄

time_series_damped_oscillators_classifier_comparison.png

time_series_damped_oscillators_complexity_distribution.png

time_series_damped_oscillators_curriculum_comparison.png

limpid dew Apr 24, 2025, 4:47 AM

#

Could you give a little more detail?

#

What is a complexity scoring tool

#

do you mean Kolmogorov complexity?

rich moth Apr 24, 2025, 4:48 AM

#

Sure! It bascailly quantifies how "difficult" or "complex" different data examples are for ML.

#

It repesents compleixty as a complex number Φ(x), which provides both Magnitude and Phase arg.

limpid dew Apr 24, 2025, 4:50 AM

#

Yes but how do you define complexity?

rich moth Apr 24, 2025, 4:52 AM

#

With a mathmathically formula I made

limpid dew Apr 24, 2025, 4:53 AM

#

Would you share that with us?

rich moth Apr 24, 2025, 4:53 AM

#

Not at this moment in time, no.

limpid dew Apr 24, 2025, 4:54 AM

#

Interesting, why do you chose to represent the complexity as a complex number?

#

Is it because complex is in the word complexity?

rich moth Apr 24, 2025, 4:59 AM

#

💯 😆

#

A better way would be to say Φ(x) gives us a single score that tells us both how hard a data example is and what kind of challenge it poses, so we can train models more efficiently.

limpid dew Apr 24, 2025, 5:03 AM

#

You should check out Kolmogorov complexity. I think a similar version of what you describe has been done before.

limber spear Apr 24, 2025, 5:13 AM

#

rich moth Holy smokes guys! I just made a **univerisal complexity scoring tool** that work...

Heck no I call bs 😂 so it can sort categorical data automatically blobhuh

#

Still fire tho 🔥

limpid dew Apr 24, 2025, 5:16 AM

#

limber spear Still fire tho 🔥

idk about that

limber spear Apr 24, 2025, 5:17 AM

#

limpid dew idk about that

Well yea it still has to be tested

limpid dew Apr 24, 2025, 5:17 AM

#

limber spear Well yea it still has to be tested

joking. It's a cool idea

limber spear Apr 24, 2025, 5:24 AM

#

https://tenor.com/view/simpsons-homer-simpson-gif-13518564799186478937

Tenor

#

deadge catching some Zzz’s later chat

rich moth Apr 24, 2025, 5:45 AM

#

appreciate your guys feedback, bedtime now. but we can dig deeper tomorrow

#

#

Naturally makes sense text is the hardest for ML.

#

Languages, dialects, slang. ,translations. I mean the list goes, its chaotic.

#

Ok now bed time

limber spear Apr 24, 2025, 11:36 AM

#

rich moth Languages, dialects, slang. ,translations. I mean the list goes, its chaotic.

You may have to define it more. I was looking over @limpid dew ‘s conjecture of the Kolmogorov complexity. That conjecture is from the 1960’s for example

#

I did a brief online search and went into reading about Alan Turing’s research

#

Bro wakey wakey eggs and bakey 😅 have a good day chat or night

lapis sequoia Apr 24, 2025, 12:02 PM

#

Time series is underrated. Very underrated. It requires patience.

timber trail Apr 24, 2025, 12:07 PM

#

https://github.com/WhitzardIndex/self-replication-research/blob/main/AI-self-replication-fudan.pdf

#

https://storage.googleapis.com/deepmind-media/Era-of-Experience /The Era of Experience Paper.pdf

timber trail Apr 24, 2025, 12:08 PM

#

lapis sequoia Time series is underrated. Very underrated. It requires patience.

It's not hard mate 🤣

lapis sequoia Apr 24, 2025, 12:09 PM

#

timber trail It's not hard mate 🤣

No one said “hard”, I did not at least. I said patience.

hearty token Apr 24, 2025, 1:04 PM

#

For agglutinative languages where words boundaries are not defined by spaces, and where word segmentation is needed, what kind of definition should a "word" have? For instance, there are such things as compound verbs, these are conjugation of lone verbs that is in practice used as a single word, but in terms of meaning have their own spots in the dictionary. Should these be segmented or kept together? What is the benefit of keeping them together vs segmenting them? Practically speaking, for the usage of tokenization, would keeping them together be better?

#

my understanding is that since this is word segmentation, the segmented pieces doesn't need to be morphemes, and so keeping these compound verbs together would make more sense, but then there is also the fact of variation, the lone verbs in the compound verbs can take other forms as well, which doesn't betray its own part in other variaties. Would training on more segmented be better this way?

serene scaffold Apr 24, 2025, 1:07 PM

#

hearty token For agglutinative languages where words boundaries are not defined by spaces, an...

I would tokenize them separately and let the model figure out what it means when they appear sequentially

#

What language is this?

final jolt Apr 24, 2025, 1:08 PM

#

It would depend on the language but I would say the tokenization should be done at the level where the entire word actually makes sense.

serene scaffold Apr 24, 2025, 1:08 PM

#

Even for languages like English, it's already a thing to have "sub tokens", which is really just when you tokenize at the morpheme level.

serene scaffold Apr 24, 2025, 1:09 PM

#

final jolt It would depend on the language but I would say the tokenization should be done ...

Why? If I say "misunderstand", the "mis" has discrete meaning that can be applied in other words, even if it can't stand on its own

hearty token Apr 24, 2025, 1:09 PM

#

serene scaffold I would tokenize them separately and let the model figure out what it means when...

Ah I see

hearty token Apr 24, 2025, 1:09 PM

#

serene scaffold What language is this?

this is Burmese

#

ထီးဖြူဆောင်း မယ်တော်နတ်သည် ကွမ်းဆော် မင၏မယတဖစသည ။

#

e.g.

#

Is there a case where segmenting compound words together would make sense?

hearty token Apr 24, 2025, 1:11 PM

#

final jolt It would depend on the language but I would say the tokenization should be done ...

mm

serene scaffold Apr 24, 2025, 1:11 PM

#

I would just always tokenize them separately and let the model figure it out.

final jolt Apr 24, 2025, 1:12 PM

#

serene scaffold Why? If I say "misunderstand", the "mis" has discrete meaning that can be applie...

well what would be the point of taking prefixes and suffixes off to re-use them when they have no meaning (in those cases) by themselves?

serene scaffold Apr 24, 2025, 1:13 PM

#

final jolt well what would be the point of taking prefixes and suffixes off to re-use them ...

They do have meaning by themselves, they just can't be used by themselves. You want the model to "know" what the prefix itself means

final jolt Apr 24, 2025, 1:15 PM

#

I see I misread the end of their question about training and was only thinking of interraction. yes I do agree that breaking the words apart into their repeated components is worthwhile. It could even be worth doing both?

hearty token Apr 24, 2025, 1:16 PM

#

serene scaffold Why? If I say "misunderstand", the "mis" has discrete meaning that can be applie...

I notice that in some tokenizers, when this happens, the token appears not as "mis", "understand" but something like "mis##", "##understand" Is this a standard to mark that mis doesn't stand alone, or just specific to the tokenizer i saw?

serene scaffold Apr 24, 2025, 1:16 PM

#

hearty token I notice that in some tokenizers, when this happens, the token appears not as "m...

I've seen that notation used in BERT tokenizers.

hearty token Apr 24, 2025, 1:18 PM

#

final jolt I see I misread the end of their question about training and was only thinking o...

I was thinking. Maybe one case where it would be useful to put those together is in something like an aspect based sentiment analyzer? Because this is something reader-facing for business insights and whatnot. But then training the classifier is separately perhaps

hearty token Apr 24, 2025, 1:18 PM

#

serene scaffold I've seen that notation used in BERT tokenizers.

Gotcha. So it isn't really a standard across tokenizers.

#

iirc openai uses tiktoken and they don't do that

naive axle Apr 24, 2025, 1:34 PM

#

I was training a pytorch mobilenetv2 model on limited dataset - only have 1 image for a class so I used data augmentations to make it 10, even with training accuracy around 0.96 it cannot recognize images outside of training dataset their are not in the first 10 predictions . only difference I see in new image and training image is the background and size, and I applied gradient background/resize to training images hoping to resolve this issue. is it worth to use data augmentation on same source image and train model?

rich moth Apr 24, 2025, 5:41 PM

#

rich moth Apr 24, 2025, 8:33 PM

#

https://drive.google.com/drive/folders/1XwHbWh9OEAnCoRS6_glW98BYxwZqtQYM?usp=sharing

A link to the visual results.

limber spear Apr 24, 2025, 9:30 PM

#

rich moth https://drive.google.com/drive/folders/1XwHbWh9OEAnCoRS6_glW98BYxwZqtQYM?usp=sha...

Are you planning to monetize this Plunder pikawow if it’s the real deal Holyfield deadge

Because this cuts across a ton of tech stacks. Like a butt-ton

#

Even our beloved Python stacks

#

Python is one of if not the goat 🙌

rich moth Apr 24, 2025, 10:13 PM

#

limber spear Are you planning to monetize this Plunder <:pikawow:1270942841358647399> if it’s...

That's the plan. I'm still working through all the possibilities for monetization. The cross domain applicability is what makes this kinda exciting The python language is just the begining, since the formula is language agnostic

limber spear Apr 24, 2025, 10:17 PM

#

rich moth That's the plan. I'm still working through all the possibilities for monetizat...

Yep. Be ready for the C community. A lot are boomers 😂

#

Laughing more about dynamics in the community. The C community is the old guard

#

Most don’t believe any of this data sciencey ai mumbo jumbo

glacial root Apr 24, 2025, 10:52 PM

#

limber spear Yep. Be ready for the C community. A lot are boomers 😂

nah but c is genuinely useful

#

in embedded systems applications c/cpp are pretty much necessary

limber spear Apr 24, 2025, 10:54 PM

#

glacial root in embedded systems applications c/cpp are pretty much necessary

I believe it. Conversations get very rigid very fast when I converse with the C community

#

They could easily claim that C drives Python

glacial root Apr 24, 2025, 11:34 PM

#

limber spear I believe it. Conversations get very rigid very fast when I converse with the C ...

that's probably cause people using c use it for very specific use cases, unlike python which is becoming the go to for most

karmic pond Apr 25, 2025, 12:35 AM

#

Hi everyone

#

Does anyone work in AI roles?

lapis sequoia Apr 25, 2025, 1:15 AM

#

Why are there people who talk about LLMs all of the time, but have never ever mentioned a transformer?

serene scaffold Apr 25, 2025, 1:31 AM

#

lapis sequoia Why are there people who talk about LLMs all of the time, but have never ever me...

it's not necessary to understand transformers to use LLMs, or design new use cases for them, or to evaluate their performance thereon.

serene scaffold Apr 25, 2025, 1:31 AM

#

karmic pond Does anyone work in AI roles?

Yes. Why do you ask? You'll find that you get better and more answers when you're more forthcoming.

karmic pond Apr 25, 2025, 1:34 AM

#

@serene scaffold I would like to know what technologias should dominate to work with Azure, I have the pcap and it is Ml,DL

serene scaffold Apr 25, 2025, 1:38 AM

#

karmic pond <@253696366952316929> I would like to know what technologias should dominate to ...

sorry, but I do not understand your question.

viscid urchin Apr 25, 2025, 1:38 AM

#

What's a pcap?

karmic pond Apr 25, 2025, 1:42 AM

#

@viscid urchin The certificate

#

@serene scaffold knowing python and AI

#

Can I get a job related to Azure?

serene scaffold Apr 25, 2025, 1:44 AM

#

karmic pond Can I get a job related to Azure?

what do you think Azure is?

karmic pond Apr 25, 2025, 1:47 AM

#

Ok

serene scaffold Apr 25, 2025, 1:57 AM

#

karmic pond Ok

I'm asking you a question.

#

@karmic pond I recommend you join this: https://hablemospython.dev/

Hablemos Python - Inicio

La comunidad hispanohablante de Python más grande del mundo

opaque condor Apr 25, 2025, 2:27 AM

#

So if I wanted to make my own AI data set it would go like this?:

Folders

                  Eggplant:
                               Eggplant_image1
Eggplant_image2
Eggplant_image3
Eggplant_image4```

karmic pond Apr 25, 2025, 2:30 AM

#

@serene scaffold Thanks bro

grand minnow Apr 25, 2025, 4:43 AM

#

For anyone who uses LLM, how do you currently track your tokens for input and output?

rich moth Apr 25, 2025, 5:02 AM

#

grand minnow For anyone who uses LLM, how do you currently track your tokens for input and ou...

Some people track em' using specific tokenizers for that LLM. Autotokenizer from HF loads the correct and taliorred tokenzier for whatever model you're using

#

So anyways, after the tokenzier is loaded you can use .encode() method. You give it a your text string (prompt) and it gives you back lost of numbers which are your token IDS . To finally get the actual token count you use len() function on that list.

grand minnow Apr 25, 2025, 5:09 AM

#

So basically... There's no active library/app that I can integrate to get the cost of tokens used per user?

rich moth Apr 25, 2025, 5:10 AM

#

grand minnow So basically... There's no active library/app that I can integrate to get the co...

Well, not to my knowledge, but I never bothered to search 🙂

grand minnow Apr 25, 2025, 5:10 AM

#

Alright thanks

rich moth Apr 25, 2025, 5:10 AM

#

I mean theres more than one way todo it thats for sure.

#

Maybe addtional feedback would be helpful

limber spear Apr 25, 2025, 5:41 AM

#

Wanna see what real feature engineering in machine learning looks like 😏

opaque condor Apr 25, 2025, 10:42 AM

#

opaque condor So if I wanted to make my own AI data set it would go like this?: Folders ```Ve...

Did I write this example right?

lavish wraith Apr 25, 2025, 12:15 PM

#

    figsize(8,8),
    subplots=True,
    layout=(2,2,),
    sharey=True,
    legend=False,
)
plt.show()
``` in plot method what does mean layout (2,2)  ``
(rows, columns) for the layout of subplots.```

final jolt Apr 25, 2025, 12:19 PM

#

lavish wraith ```sales_df.set_index('date').plot( figsize(8,8), subplots=True, lay...

Yes according to docs https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html

lavish wraith Apr 25, 2025, 12:27 PM

#

in this example layout=(2,2) what does 2,2 mean it create 4 subplot ??

final jolt Apr 25, 2025, 1:03 PM

#

lavish wraith in this example layout=(2,2) what does 2,2 mean it create 4 subplot ??

That is 2 rows and 2 columns of subplots. Which is 4 total, yes

limber token Apr 25, 2025, 3:42 PM

#

Do you guys know of services that host zero shot classification models and serve them as REST APIs?

serene scaffold Apr 25, 2025, 3:55 PM

#

limber token Do you guys know of services that host zero shot classification models and serve...

zero-shot classification models don't require special considerations. you should be able to deploy it from any cloud VM.

limber token Apr 25, 2025, 3:57 PM

#

serene scaffold zero-shot classification models don't require special considerations. you should...

Oh yeah we currently serve the model from an AWS EBS, we were looking to cut costs because we use NVIDIA GPU instances to be able to leverage CUDA but those are too expensive

serene scaffold Apr 25, 2025, 3:58 PM

#

limber token Oh yeah we currently serve the model from an AWS EBS, we were looking to cut cos...

so the model requires a GPU to run in a reasonable amount of time?

limber token Apr 25, 2025, 4:00 PM

#

My boss wants the reasonable amount of time to be a bit unreasonable so yeah lol

serene scaffold Apr 25, 2025, 4:01 PM

#

I'm not aware of a cloud compute service that offers GPUs at a lower price than AWS, but I run all my stuff on my company's own hardware.

limber token Apr 25, 2025, 4:02 PM

#

serene scaffold I'm not aware of a cloud compute service that offers GPUs at a lower price than ...

I was thinking of ML platforms that offer the models themselves since we use a default huggingface one

serene scaffold Apr 25, 2025, 4:03 PM

#

what model is it?

limber token Apr 25, 2025, 4:04 PM

#

valhalla/distilbart-mnli-12-9

#

Is ONNX runtime something that would make sense here to try and lower GPU costs?

slender pasture Apr 25, 2025, 5:16 PM

#

Hi all,
Hope you are all doing well.
I wanted to see if someone could help me out on something that's been cracking my skull for 3 FULL days now in matplotlib/numpy

I want to use spectogram data to plot ontop of the spectogram, the plotting part is easy, but I cant wrap my head around the data.

spectrum, frequencies, time, image = spectogram()

Is there any way I can transform this data to get x, y, z points?

example of a before and after attached

#

Any help is appreciated.
I know
time = x axis
frequencies = y axis
But how do I transform spectrum, which is a 2d array, into z values in order to decide whether to plot a point there or not?

wooden sail Apr 25, 2025, 7:04 PM

#

slender pasture Any help is appreciated. I know `time = x axis` `frequencies = y axis` But how...

the 2d array already contains the z values. the plot shows them as color, but the values are numeric

#

the row and column index of the 2d array correspond to a frequency and a time, so you can then use that to e.g. plt.scatter(time_val, freq_val) whenever the spectrum value exceeds a threshold

slender pasture Apr 25, 2025, 7:31 PM

#

Wow, thanks, I cant believe how simple that was.
I think my mistake was I was trying to implement all the logic within a the view_lim of the graph, to only calculate whats visible. And i kept mixing up the arrays.
Thanks so much @wooden sail

Now that I can get the Z data I'll try to clip spectrum to fit into the visible axis x and y limits. If you have any suggestions they are more than welcome.
Else I will post results later on.
Thanks!

wooden sail Apr 25, 2025, 7:34 PM

#

slender pasture Wow, thanks, I cant believe how simple that was. I think my mistake was I was t...

i'm not sure i understood the clipping part, can you give an example of what you wanna do?

slender pasture Apr 25, 2025, 7:38 PM

#

You know how you can zoom and pan around the graph?
Well I only want to calculate on whats visible in the current image.

I know that with the following I can get the limits of the graph that are currently visible

y0, y1 = self.ax1.get_ylim()
x0, x1 = self.ax1.get_xlim()

That way I can plot the line only on whats visible.
So i would have to find a way to clip spectrum to fit within x0, x1 and y0, y1

wheat snow Apr 25, 2025, 7:53 PM

#

hello!

i used a short python script to concatonate a few json files intoa csv and set the Timestamp column as index:

df['ts']= pd.to_datetime(df['ts'])
df=df.set_index(df['ts'])
df=df.drop(columns=['ts'])
df.index = df.index.tz_convert('Europe/Berlin')

df.to_csv('C:\\Privat\\Python_VSC\\Spotify\\MyData_2025\\Data_concat.csv')

ye i importated the new saved Data_concat into a jupyter notebook

#

unfortunatle the Dtype of the idnex is

dytpe("O")

#

so i cannot use commands like df.index.hours

viscid urchin Apr 25, 2025, 7:55 PM

#

Why are you dropping the ts column after making it be the index?

#

Don't you just want?

df['ts'] = pd.to_datetime(df['ts'])
df = df.set_index('ts')
df.index = df.index.tz_convert('Europe/Berlin')
df.to_csv(r'C:\Privat\Python_VSC\Spotify\MyData_2025\Data_concat.csv')
``` ?

wheat snow Apr 25, 2025, 7:56 PM

#

this is waht the ts column looks like in the json itself:

2021-01-06T19:04:34Z```

wheat snow Apr 25, 2025, 7:56 PM

#

viscid urchin Why are you dropping the ts column after making it be the index?

no need, unneccary column if i have it as the index

viscid urchin Apr 25, 2025, 7:57 PM

#

Hmm, is that how that works? Interesting.

wheat snow Apr 25, 2025, 7:57 PM

#

i mean i can use the index as basis for new columsn like Hours weekdays and so on

#

for x axsises

#

anyway ye, the json timestamp is UTC

#

i need it in Berlin Time so i converted it in the python script

#

yet as already mentioned when opening it in jupyter notebook i am loosing the dtype of that index

viscid urchin Apr 25, 2025, 8:02 PM

#

Well, it's a CSV, everything starts out as a string, right? How are you asking pandas to cast the values on load?

wheat snow Apr 25, 2025, 8:02 PM

#

viscid urchin Well, it's a CSV, everything starts out as a string, right? How are you asking p...

.read_csv

#

hm

#

thats true

viscid urchin Apr 25, 2025, 8:02 PM

#

Sure but you have to pass the dtype argument

#

(like, a dict mapping column names to types)

wheat snow Apr 25, 2025, 8:03 PM

#

but trying to run df.index= pd.to_datetime(df.index) returns the following

#

ValueError: Array must be all same time zone

#

and thats true, cause some conversions are made to +2 and +1

#

i assume because of szummertiem change

#

anyway i have to analyze the difference and the value_counts of the different +02:00 and +01:00 that exist

#

probably a string splitter?

viscid urchin Apr 25, 2025, 8:07 PM

#

Oh that's weird; so it won't let you tz_convert the ISO8601 strings?

wheat snow Apr 25, 2025, 8:08 PM

#

viscid urchin Oh that's weird; so it won't let you `tz_convert` the ISO8601 strings?

yup

#

but i know it worked before

#

cause it was an old project ijust picked up again

#

i mean i casted tz.convert in the pythjon script, and that works

#

trying to run it in the jupyter just returns me that i cannot use that command on the index

rich moth Apr 25, 2025, 8:09 PM

#

is it a trading bot? just curious. felt like ive dealt with this before

wheat snow Apr 25, 2025, 8:09 PM

#

cause it thinks its not a datetiem object

wheat snow Apr 25, 2025, 8:09 PM

#

rich moth is it a trading bot? just curious. felt like ive dealt with this before

nah, its csv spotify streaming history

rich moth Apr 25, 2025, 8:09 PM

#

ahh ok

wheat snow Apr 25, 2025, 8:09 PM

#

my top 1 song is still from kanye west

#

ims o cooked

#

307 plays of Stronger

#

2.3k hours is actually fine for 7 years

#

a more condensed version is found in #1365417667573583943

verbal oar Apr 25, 2025, 8:32 PM

#

gen ai course, video or book?

zealous dawn Apr 25, 2025, 8:35 PM

#

Hello, I have a little problem, got some embeddings done in clap (512vectors) and want to cluster them using HDBSCAN, I get OOM pretty quick because I've got the embeddings on 50k files. How can I fix this, it's kinda out of my league.

Tried some LLM answers was:

use k-NN to build a sparse distance matrix and metric='precomputed'

Dimensionality Reduction with PCA

verbal oar Apr 25, 2025, 8:38 PM

#

pca is compression technique

#

dont sure but can stem or lemmatize words

#

to make fewer or have shorter words

#

also chunk files to not process at once 50k

#

I mean split

zealous dawn Apr 25, 2025, 8:46 PM

#

verbal oar also chunk files to not process at once 50k

but for clustering specifically I have to run them all at once for what I know

verbal oar Apr 25, 2025, 8:46 PM

#

ah ok

zealous dawn Apr 25, 2025, 8:47 PM

#

maybe im mistaken and maybe there is a way to split them on disk, but honestly I dont have that much expertise so im looking for a solution

#

and Id like to avoid taking smaller samples because then ill have to reorder all the files I have in the correct clusters

viscid urchin Apr 25, 2025, 8:53 PM

#

zealous dawn Hello, I have a little problem, got some embeddings done in clap (512vectors) an...

I know you say you want to use HDBSCAN, but this is incremental and might be worth a look? https://scikit-learn.org/stable/modules/generated/sklearn.cluster.Birch.html
You could start with that and refine it with HDBSCAN maybe?

scikit-learn

Birch

Gallery examples: Compare BIRCH and MiniBatchKMeans Comparing different clustering algorithms on toy datasets

#

I see a paper about streaming DBSCAN but I don't, sadly, understand it yet.

zealous dawn Apr 25, 2025, 8:55 PM

#

viscid urchin I know you say you want to use HDBSCAN, but this is incremental and might be wor...

I don't want to necessarily use HDBSCAN but I was trying with DBSCAN initially and moved to HDBSCAN which was the first that worked, I kind of want some granularity between clusters where for example it can detect if something is a car or a chainsaw, it should be distinct (the embeddings are made from audio files)

#

Ill take a look

thick rapids Apr 25, 2025, 9:11 PM

#

hey guys

#

do you suggest tensorflow over pytorch for machine learning

#

general purpose machine learning

agile cobalt Apr 25, 2025, 9:17 PM

#

thick rapids do you suggest tensorflow over pytorch for machine learning

no, over the last few years pytorch has greatly overtaken tensorflow in popularity

for some things you don't need of either of them though, just sklearn could be enough if you don't need of neural networks

thick rapids Apr 25, 2025, 9:19 PM

#

agile cobalt no, over the last few years pytorch has greatly overtaken tensorflow in populari...

At the moment I don’t need complex neural networks but I’m thinking of taking an advanced machine learning class and you know, better be prepared

untold fable Apr 25, 2025, 11:35 PM

#

from flask import Flask, render_template, request, jsonify
import requests

app = Flask(name)

Replace this with your Hugging Face token

HUGGINGFACE_API_KEY = "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"

Inference API URL

API_URL = "https://api-inference.huggingface.co/models/HuggingFaceH4/zephyr-7b-beta"

headers = {
"Authorization": f"Bearer {HUGGINGFACE_API_KEY}",
"Content-Type": "application/json"
}

@app.route('/')
def home():
return render_template("index.html")

@app.route("/api/chat", methods=["POST"])
def chat():
try:
user_message = request.json.get("message")
print("User Message:", user_message)

    prompt = f"You are a helpful medical assistant named Oxy. {user_message}"

    # Generate a reply using a local model pipeline (if installed)
    # Or you can use a simple hardcoded reply for now
    # Here's a dummy response for testing:
    response_text = ""

    if "fever" in user_message.lower():
        response_text = "It sounds like you have a fever. Stay hydrated, rest, and monitor your temperature regularly."
    elif "cold" in user_message.lower():
        response_text = "Symptoms of a cold include a runny nose, sore throat, and mild fatigue. Get rest and drink fluids!"
    else:
        response_text = "Sorry, I couldn't understand. Try rephrasing your question."

    return jsonify({"reply": response_text})

except Exception as e:
    import traceback
    traceback.print_exc()
    return jsonify({"reply": "⚠️ Something went wrong. Please try again."}), 500

#

📎 app.py

arctic wedgeBOT Apr 25, 2025, 11:36 PM

#

untold fable

~~Please react with ✅ to upload your file(s) to our paste bin, which is more accessible for some users.~~

untold fable Apr 25, 2025, 11:36 PM

#

i need help why this is not workig

agile cobalt Apr 25, 2025, 11:49 PM

#

untold fable from flask import Flask, render_template, request, jsonify import requests app ...

the HUGGINGFACE_API_KEY (and anything with "key" or "secret" in the name overall) is supposed to be kept secret, API keys are used to identify who is making the request, may provide access to confidential information owned by the account that created them, and any operations that have a cost will be billed to whoever owns the API key

Be very careful not to share them.

#

Ideally shouldn't include it in the code in first place, but rather use environment variables or other ways of managing secrets

#

(go delete/revoke/regenerate it in your HuggingFace settings ASAP)

fallow coyote Apr 26, 2025, 12:50 PM

#

agile cobalt the `HUGGINGFACE_API_KEY` (and anything with "key" or "secret" in the name overa...

I remember seeing something about storing your api keys in an .env file. What exactly is an .env file and what makes them useful?

gray slate Apr 26, 2025, 12:52 PM

#

fallow coyote I remember seeing something about storing your api keys in an .env file. What ex...

it's a file with lines of text like:

NAME=value
OTHER_NAME="some other value"

you can source .env on it, and if you pip install envfile it'll load them up for you I think. at least it does in vs code

#

you can put .env into .gitignore so it doesn't get shared with anyone

fallow coyote Apr 26, 2025, 12:55 PM

#

gray slate it's a file with lines of text like: ``` NAME=value OTHER_NAME="some other value...

But what about the overall usage of an .env file? As in what would you use an env file aside from storing and preventing others from using your API keys?

calm thicket Apr 26, 2025, 12:58 PM

#

fallow coyote But what about the overall usage of an .env file? As in what would you use an en...

env means "environment", as in environment variables. that's all it does

leaden narwhal Apr 26, 2025, 1:39 PM

#

Hello everyone, this next weekend I’m going to have a coding challenge and I’m going to need to tackle docker, aws s3, lambda and ec2, flask/fast and restapi and pytest. Does anyone have a comprehensive kaggle notebook or GitHub repository link in which I can get some practical experience. Thanks!

serene scaffold Apr 26, 2025, 1:48 PM

#

leaden narwhal Hello everyone, this next weekend I’m going to have a coding challenge and I’m g...

there's not going to be a kaggle notebook that covers all of these.

#

or possibly any, since you don't really use flask, fastapi, or pytest in a notebook.

#

and docker and aws aren't part of python.

mighty grove Apr 26, 2025, 1:53 PM

#

Hello! Anybody here with experience with the Awpy library?

serene scaffold Apr 26, 2025, 1:53 PM

#

mighty grove Hello! Anybody here with experience with the Awpy library?

Remember to always ask your whole question so that someone who knows the answer can start answering it. Never ask to ask.

leaden narwhal Apr 26, 2025, 2:03 PM

#

serene scaffold and docker and aws aren't part of python.

Of course but this is integrated into your work environment framework

gritty vessel Apr 26, 2025, 2:08 PM

#

Hey guys

#

How do you find relation between continous values and categorical values?

#

i checked the distribution of data what are the ways i can see if there is linear or non linear relation between them

#

pearson coorelation will not work as we need mean as well in it

#

but we cant find mean of categorical values

leaden narwhal Apr 26, 2025, 2:10 PM

#

Cosine similarities?

#

Maybe I’m yapping

gritty vessel Apr 26, 2025, 2:12 PM

#

its not used for coorelation I think

leaden narwhal Apr 26, 2025, 2:14 PM

#

gritty vessel its not used for coorelation I think

Have you tried an ANOVA

gritty vessel Apr 26, 2025, 2:15 PM

#

annova has assumptions that data is normal distributed

leaden narwhal Apr 26, 2025, 2:15 PM

#

Normalize it then

gritty vessel Apr 26, 2025, 2:15 PM

#

#

i got three categories

#

#

#

I got three categories

#

01,2

leaden narwhal Apr 26, 2025, 2:16 PM

#

GMP is your target right?

gritty vessel Apr 26, 2025, 2:16 PM

#

and TIR!,WV,MIR are continous vars

gritty vessel Apr 26, 2025, 2:16 PM

#

leaden narwhal GMP is your target right?

yes

#

so i plotted the distribution of each TIr1,wv,mir whenever flag is 0 ,1,2

leaden narwhal Apr 26, 2025, 2:17 PM

#

Have you done modelling yet

gritty vessel Apr 26, 2025, 2:17 PM

#

no I am performing eda

#

trying to find if there is any relation between these vars with GPM Flag

#

after this I will move to Modelling

leaden narwhal Apr 26, 2025, 2:18 PM

#

Try kruskal-wallis

#

Test

gritty vessel Apr 26, 2025, 2:19 PM

#

okie

#

do you think this is normally distributed?

leaden narwhal Apr 26, 2025, 2:20 PM

#

It’s used for not normal distributions

gritty vessel Apr 26, 2025, 2:20 PM

#

yes I checked it

leaden narwhal Apr 26, 2025, 2:26 PM

#

gritty vessel yes I checked it

How were the results

gritty vessel Apr 26, 2025, 2:34 PM

#

doing it now

#

I was watching a vid about it

#

still watching I have to watch mann witney test before this

gritty vessel Apr 26, 2025, 2:36 PM

#

leaden narwhal How were the results

I will update you after going through them

quaint mulch Apr 26, 2025, 2:39 PM

#

gritty vessel

plot all 3 in one plot

#

use histtype='step'
https://matplotlib.org/stable/gallery/statistics/histogram_histtypes.html

gritty vessel Apr 26, 2025, 2:41 PM

#

okie just a min

quaint mulch Apr 26, 2025, 2:41 PM

#

so it looks like this, so you can compare https://stackoverflow.com/questions/26691836/multiple-step-histograms-in-matplotlib

Stack Overflow

Multiple step histograms in matplotlib

Dear python/matplotlib community,

I am having an issue within matplotlib: I can't seem to plot multiple overlaid histograms in the same plot space using the following:

binsize = 0.05

min_x_data_...

#

another alternative is by adjusting alpha. this is a matter of preference

#

depending on the number of sample, if you only have 3 continous variable, you can also plot 3 scatters, each scatter, will plot 2

#

https://seaborn.pydata.org/generated/seaborn.pairplot.html

#

(Actually seaborn pairplot is the way to go haha) gives you everything you need

gritty vessel Apr 26, 2025, 2:46 PM

#

#

#

#

interesting

gritty vessel Apr 26, 2025, 2:48 PM

#

quaint mulch (Actually seaborn pairplot is the way to go haha) gives you everything you need

for categorical values?

#

I bin the data based on the categories and then I create plot Like I am doing i nthese

quaint mulch Apr 26, 2025, 2:54 PM

#

gritty vessel for categorical values?

yes, like this

#

sns.pairplot(penguins, hue="species")

quaint mulch Apr 26, 2025, 2:56 PM

#

gritty vessel

no, I mean the other way around.
for each figure, you plot the same variable e.g. TIR1
but you plot 3 different histogram for differeng GPM_Flag

gritty vessel Apr 26, 2025, 2:57 PM

#

Oh ok got it

#

so we will find difference

#

#

#

#

I excluded Flag = 0 as it represents no rain

#

and its in dominance and because of that flag 1 and flag 2 were not visible properly

quaint mulch Apr 26, 2025, 3:02 PM

#

desnity=True
coz you have imbalance data

#

and you can plot all 3 if you use density=True

#

it feels so weird to do back seat EDA but also fun

#

lol

gritty vessel Apr 26, 2025, 3:04 PM

#

Why we are not considering frequency ?

quaint mulch Apr 26, 2025, 3:04 PM

#

well, we are trying to compare the distribution no?
like if they have different mean/std / peak etc2?

#

if they have different shape?

gritty vessel Apr 26, 2025, 3:04 PM

#

#

#

quaint mulch Apr 26, 2025, 3:05 PM

#

yeap

gritty vessel Apr 26, 2025, 3:05 PM

#

quaint mulch well, we are trying to compare the distribution no? like if they have different ...

yes we can say that I want to see whats the relation between them

quaint mulch Apr 26, 2025, 3:06 PM

#

gritty vessel

You can say that flag0 causes TIR1_Temp to go down?

gritty vessel Apr 26, 2025, 3:06 PM

#

no

#

for flag 0

#

TIR1_Temp is on higher side

quaint mulch Apr 26, 2025, 3:07 PM

#

gritty vessel

increasing flag increases the skew of WV_Temp?

quaint mulch Apr 26, 2025, 3:07 PM

#

gritty vessel TIR1_Temp is on higher side

sorry yes, lol

gritty vessel Apr 26, 2025, 3:07 PM

#

#

for flag 1 it goes to lower side

quaint mulch Apr 26, 2025, 3:08 PM

#

gritty vessel

i.e. negative skewness

gritty vessel Apr 26, 2025, 3:08 PM

#

yes

quaint mulch Apr 26, 2025, 3:09 PM

#

gritty vessel

maybe im blind lol, or just preference, but I prefer to look at these 3 plots, instead of that grid of 9.

gritty vessel Apr 26, 2025, 3:09 PM

#

its easier to comapre when its three in one

quaint mulch Apr 26, 2025, 3:09 PM

#

gritty vessel yes we can say that I want to see whats the relation between them

do you mean statistical test, or just visualisation?

gritty vessel Apr 26, 2025, 3:10 PM

#

just visulisation

quaint mulch Apr 26, 2025, 3:10 PM

#

gritty vessel

I mean, is this good enough, or you want more?

gritty vessel Apr 26, 2025, 3:10 PM

#

Good enough

#

it looks like Tir1 will be the major predictor

#

for flag

#

Im sending the pairplot

gritty vessel Apr 26, 2025, 3:13 PM

#

quaint mulch I mean, is this good enough, or you want more?

is it ok If i send it in some time? I have to go to mess for dinner or it will close

quaint mulch Apr 26, 2025, 3:16 PM

#

lol hahaha, you don't have to send it to me, it is for yourself to dechiper

#

enjoy dinner, I have to sleep too

gritty vessel Apr 26, 2025, 3:24 PM

#

gritty vessel Apr 26, 2025, 3:24 PM

#

quaint mulch lol hahaha, you don't have to send it to me, it is for yourself to dechiper

Haha Thank you

quaint mulch Apr 26, 2025, 3:26 PM

#

hahaha, you need to ajdust alpha, maybe use KDE, and turn on density
Might be easier to redo it in pyplot manualy
but this might give you interesting insight, and I hope you get the gist

steep cypress Apr 26, 2025, 4:08 PM

#

hello 👋
had a question, how to approach multivariate time series anomaly detection with or without ML? Actually the inputs are signals in a steel manufacturing plant -- so signals of equipment at latter stage correspond to the same metal from earlier region.
There's data drift as well i.e. whenever the plant turns on after a config change, value range of many signals change. We have over 100 signals corresponding to many regions

would really appreciate some guidance and thoughts!

gritty vessel Apr 26, 2025, 4:22 PM

#

quaint mulch hahaha, you need to ajdust alpha, maybe use KDE, and turn on density Might be ea...

oh yes i used kde but didnt turned on density I will go that got it I got the idea now

rich moth Apr 26, 2025, 4:23 PM

#

gritty vessel Apr 26, 2025, 6:25 PM

#

steep cypress hello 👋 had a question, how to approach multivariate time series anomaly detec...

you can start with checking the trends of signals over a period of time

steep cypress Apr 26, 2025, 6:27 PM

#

gritty vessel you can start with checking the trends of signals over a period of time

@gritty vessel yes we do this for some signals with rule based methods.

gritty vessel Apr 26, 2025, 6:27 PM

#

something like pinns?

#

but for signals

rose spade Apr 26, 2025, 6:34 PM

#

How do I create and train my own AI Model in scratch for a voice recognition with text translation using PyTorch?

gritty vessel Apr 26, 2025, 7:39 PM

#

https://discord.com/channels/267624335836053506/1365761594411450378 hey guys can you discuss with me what I can do on this

lapis sequoia Apr 26, 2025, 8:46 PM

#

you do ml with python or c++?

gritty vessel Apr 26, 2025, 8:49 PM

#

python

lapis sequoia Apr 26, 2025, 8:49 PM

#

gritty vessel python

dont be biased

gritty vessel Apr 26, 2025, 8:50 PM

#

c++ is faster

#

but I need more functions and all for easier workflow

wheat snow Apr 26, 2025, 10:44 PM

#

how do i check dtype of smth again?

#

i suppose it is a numpy array

viscid urchin Apr 26, 2025, 10:45 PM

#

somearray.dtype

wheat snow Apr 26, 2025, 11:10 PM

#

i remembered type(smth) exists

#

in normal python

#

can someone recommend a course/video series/webpage/book to learn about some matplotlib fundamentels and how i responsible setup plots with stuff like .subplot, .ax and that stuff.... and especially when and when to use the fig,ax setup method

glacial root Apr 26, 2025, 11:28 PM

#

for a liscence plate recognizer would using easyocr be considered cheating

viscid urchin Apr 26, 2025, 11:30 PM

#

Depends on the "rules"; it's certainly not cheating if you just need to get it done

glacial root Apr 26, 2025, 11:45 PM

#

viscid urchin Depends on the "rules"; it's certainly not cheating if you just need to get it d...

my goal is to learn as much as possible, and for that doing it from scratch would probably be best but also it would be super inefficent

#

so i think i'll stick to just doing implementations from scratch and using some libraries for projects

#

but it seems like easyocr takes off a little too much of the work

viscid urchin Apr 26, 2025, 11:47 PM

#

easyocr turns it into a text classification problem vs. an image recognition one, yeah

glacial root Apr 26, 2025, 11:48 PM

#

i see

#

my reason for doing this project is to use both

#

so the end goal is for it to be able to recognize the location of the liscence plate on an image and then print the plate number elsewhere

#

though i have a lot to learn before i can do this if i want to truly understand the concepts behind it

rustic elk Apr 27, 2025, 2:19 AM

#

I'm building a KNN using sklearn and using cross-validation to find the best value for k.
Is it overkill to test different numbers of folds and pick the most common k across them?

serene scaffold Apr 27, 2025, 2:27 AM

#

rustic elk I'm building a KNN using `sklearn` and using cross-validation to find the best v...

if you're doing k nearest neighbors and doing m fold cross validation, you can try several combinations of k and m, yes

rustic elk Apr 27, 2025, 2:27 AM

#

Yes forgot to mention, different k values as well

serene scaffold Apr 27, 2025, 2:27 AM

#

(it's usually called k fold cross validation, but we're already using k for k nearest neighbors, so I picked a different letter)

serene scaffold Apr 27, 2025, 2:27 AM

#

rustic elk Yes forgot to mention, different k values as well

pick a different letter for each variable

rustic elk Apr 27, 2025, 2:28 AM

#

rustic elk Yes forgot to mention, different k values as well

nvm it was implied in my first question

#

but anyways, thanks

serene scaffold Apr 27, 2025, 2:29 AM

#

yw

#

you can use itertools.product to loop over all the "combinations"

rustic elk Apr 27, 2025, 2:30 AM

#

I did it "manually" with the help of GridSearchCV from sklearn because I want to plot the results for each fold / k later

serene scaffold Apr 27, 2025, 2:30 AM

#

sure

toxic pilot Apr 27, 2025, 3:06 AM

#

getting really wild fluctuations while testing with the valid set, but the train set seems to be doing fine... anybody know why this might be?

#

increasing batch size seems to help a little bit, so i guess im over fitting? it's a binary classifier and my average validation epoch accuracy is hovering at around 50-60% (so it looks like pretty much random classification)

tawdry finch Apr 27, 2025, 6:49 AM

#

i dont want to get into web development i want to becoma a data analyst or fo in the field of fintech, ai ml what should i do i know some basic python and currently learning flask and web scraping etc

#

i m a student by the way just cleared of high school

lavish wraith Apr 27, 2025, 8:04 AM

#

print(my_array)
print(my_array.dtype)

my_series = pd.Series([1,np.nan,2,3])
print(my_series)```
 what is difference numpy data type and pandas numpy and pandas also same  handle missing value

wooden sail Apr 27, 2025, 8:10 AM

#

they're basically the same, pandas uses numpy

clear topaz Apr 27, 2025, 9:12 AM

#

Hello everyone!
Can anybody suggest some good resources to learn AIML? I am already good with basic data structures and have done some web scraping, so I think I am eligible to dive directly into concepts.

#

Oh... This chat seems to be really dead

grand minnow Apr 27, 2025, 9:14 AM

#

clear topaz Hello everyone! Can anybody suggest some good resources to learn AIML? I am alre...

Did you check the pinned message?

clear topaz Apr 27, 2025, 9:14 AM

#

Oh sorry forgot to mention that. And yes I did check the pinned messages, but I am still confused to find resources from https://jgreenemi.github.io/MLPleaseHelp/

grand minnow Apr 27, 2025, 9:16 AM

#

It doesn't look that confusing. Its literally a resource list

dreamy night Apr 27, 2025, 9:16 AM

#

guys is there any good course available on statistics ? for ml and dl enthusiasts

grand minnow Apr 27, 2025, 9:16 AM

#

dreamy night guys is there any good course available on statistics ? for ml and dl enthusiast...

Have you looked at the pinned messages?

clear topaz Apr 27, 2025, 9:17 AM

#

grand minnow It doesn't look that confusing. Its literally a resource list

Sorry but, there are 104 of them... And I can't seem to decide the best one for my cause.

dreamy night Apr 27, 2025, 9:17 AM

#

grand minnow Have you looked at the pinned messages?

ohhh will check.thanks

grand minnow Apr 27, 2025, 9:17 AM

#

clear topaz Sorry but, there are 104 of them... And I can't seem to decide the best one for ...

Try the first one

clear topaz Apr 27, 2025, 9:17 AM

#

Ok

verbal oar Apr 27, 2025, 12:34 PM

#

is code generation special case of nlg?

#

I searched and says nlp so maybe yes

serene scaffold Apr 27, 2025, 1:18 PM

#

verbal oar is code generation special case of nlg?

pretty much

toxic pilot Apr 27, 2025, 1:19 PM

#

lavish wraith ```my_array = np.array([1,np.nan,2,3]) print(my_array) print(my_array.dtype) my...

same same. under the hood they’re implemented the same way

serene scaffold Apr 27, 2025, 1:20 PM

#

numpy has been the backend for pandas, but that's largely an implementation detail.

verbal oar Apr 27, 2025, 1:22 PM

#

so it looks like I must sample github for code?

#

because massive amounts of data is beyond my storage

#

looks like biggest roadblock is data collection when I have data I just use some llm model train on data and predict as usual

#

Data:
AI code generation models are trained on massive datasets of public code, including open-source projects.

#

maybe there is some code dataset?

#

it will be quicker than collecting

#

just want to see how it works assuming I have already data

#

I assume its not like
prompt: write hello
code: print("hello")

result: print("hello")
because it looks like pairs or rule based

toxic pilot Apr 27, 2025, 1:52 PM

#

verbal oar maybe there is some code dataset?

check kaggle or huggingface

verbal oar Apr 27, 2025, 1:58 PM

#

ok

toxic pilot Apr 27, 2025, 1:58 PM

#

verbal oar I assume its not like prompt: write hello code: print("hello") result: print("...

natural language generation is disntincly not rule based

rustic elk Apr 27, 2025, 3:30 PM

#

How likely is that two distinct k values perform exactly the same in a knn algorithm ?

#

Precision scores (and acc) are exactly the same (15+ floating point precision)

toxic pilot Apr 27, 2025, 3:49 PM

#

rustic elk How likely is that two distinct `k` values perform **exactly** the same in a `kn...

its possible, but exceedingly unlikely

#

basically unless you have an incredibly noisy dataset or a very easily separable data set, it probably won't happen

umbral hearth Apr 27, 2025, 3:58 PM

#

tawdry finch i m a student by the way just cleared of high school

ارحب

rustic elk Apr 27, 2025, 3:59 PM

#

toxic pilot basically unless you have an incredibly noisy dataset or a very easily separable...

can it be caused by a mistake on my side ?

#

I am using sklearn, the algorithm is not my own

verbal oar Apr 27, 2025, 4:01 PM

#

looks like bad param/params

toxic pilot Apr 27, 2025, 4:01 PM

#

rustic elk can it be caused by a mistake on my side ?

likely, yeah

rustic elk Apr 27, 2025, 4:02 PM

#

shi

verbal oar Apr 27, 2025, 4:02 PM

#

compare your to some example knn

toxic pilot Apr 27, 2025, 4:03 PM

#

rustic elk shi

might be your dataset as well

umbral hearth Apr 27, 2025, 4:03 PM

#

I got a really quick question
Is it worth it to get the PCPP-32-1?
I got pcep and pcap and i feel like pcpp practices arent that important

verbal oar Apr 27, 2025, 4:04 PM

#

like not scaled, normalized?

rustic elk Apr 27, 2025, 4:05 PM

#

verbal oar compare your to some example knn

I mean, mine is pretty straighfarward as well

 def feed_data(self, train_data):
    self.X = train_data.drop(self.response_column, axis=1)
    self.y = train_data[self.response_column]
                                                                                                  
    categorical_cols = self.X.select_dtypes(include=["object", "category"]).columns.tolist()
    numeric_cols = self.X.select_dtypes(include=["int64", "float64"]).columns.tolist()
                                                                                                  
    self.preprocessor = ColumnTransformer(
        transformers=[
            ("num", StandardScaler(), numeric_cols),
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ]
    )
                                                                                                  
    self.X_train, self.X_valid, self.y_train, self.y_valid = train_test_split(
        self.X, self.y, test_size=self.test_size, random_state=self.random_state, stratify=self.y
    )

#

def fit(self):
    if self.best_k is None:
        raise ValueError("k value has not been set yet. Call find_best_k() or define it manually when creating the KNN")
    if self.X is None:
        raise ValueError("Training data has not been fed to network yet. Call feed_data() first.")
                                                                                                                                  
    self.final_model = Pipeline(
        [
            ("preprocessor", self.preprocessor),
            ("classifier", KNeighborsClassifier(n_neighbors=self.best_k)),
        ]
    )                                                                                                                           
    self.final_model.fit(self.X_train, self.y_train)
                    
    validation_predictions = self.final_model.predict(self.X_valid)
    validation_accuracy = accuracy_score(self.y_valid, validation_predictions)
    validation_precision = precision_score(self.y_valid, validation_predictions, average="macro", zero_division=0) # type: ignore
                    
    self.validation_metrics = {
        "Accuracy": validation_accuracy,
        "Precision": validation_precision,
    }
                   
    precision_yes = precision_score(self.y_valid, validation_predictions, pos_label="yes", zero_division=0) # type: ignore
    precision_no = precision_score(self.y_valid, validation_predictions, pos_label="no", zero_division=0) # type: ignore
                      
    accuracy_yes = (self.y_valid[self.y_valid == "yes"] == validation_predictions[self.y_valid == "yes"]).mean() # type: ignore
    accuracy_no = (self.y_valid[self.y_valid == "no"] == validation_predictions[self.y_valid == "no"]).mean() # type: ignore
                        
                    
    self.final_model.fit(self.X, self.y)
         
    return self

verbal oar Apr 27, 2025, 4:06 PM

#

ok so you have standardscaler

rustic elk Apr 27, 2025, 4:08 PM

#

I don't remember why I added it, I think I was getting perfect results without it

#

lemme check something

toxic pilot Apr 27, 2025, 4:12 PM

#

rustic elk ```py def fit(self): if self.best_k is None: raise ValueError("k val...

also shuffle your set for training

verbal oar Apr 27, 2025, 4:12 PM

#

yes to avoid model of remembering

toxic pilot Apr 27, 2025, 4:13 PM

#

actually idk if that impact will be huge if ur tanking your weights between each run

toxic pilot Apr 27, 2025, 4:15 PM

#

umbral hearth I got a really quick question Is it worth it to get the PCPP-32-1? I got pcep a...

definitely not

#

those certifications and stuff are pretty much entirely meaningless

rustic elk Apr 27, 2025, 4:16 PM

#

removing StandardScaler my acc and pre goes down, also my network seems to prefer the smallest k value

#

verbal oar Apr 27, 2025, 4:16 PM

#

but scaling is best practice

rustic elk Apr 27, 2025, 4:16 PM

#

(I am testing different number of folds and k's)

toxic pilot Apr 27, 2025, 4:16 PM

#

rustic elk (I am testing different number of folds and k's)

reduce your model size?

#

that might help a bit

rustic elk Apr 27, 2025, 4:17 PM

#

wdym ?

toxic pilot Apr 27, 2025, 4:17 PM

#

also it might just be your dataset

toxic pilot Apr 27, 2025, 4:17 PM

#

rustic elk wdym ?

seems like you’re under fitting

rustic elk Apr 27, 2025, 4:18 PM

#

This is with StandardScaler(), seems better I think but its weird that 3 and 7 have the exact same performance idk

#

First time tackling with this stuff so yea

rustic elk Apr 27, 2025, 4:19 PM

#

toxic pilot seems like you’re under fitting

wdym by "size" though

toxic pilot Apr 27, 2025, 4:19 PM

#

toxic pilot seems like you’re under fitting

like reduce dimensionality

#

linear discriminate analysis or whatever

verbal oar Apr 27, 2025, 4:21 PM

#

what this is about classifier of what?

toxic pilot Apr 27, 2025, 4:22 PM

#

rustic elk This is with `StandardScaler()`, seems better I think but its weird that 3 and 7...

you really just gotta tune all your parameters until it works out

#

and ofc it might be your dataset

verbal oar Apr 27, 2025, 4:22 PM

#

so looks like noisy dataset and or need still some preprocessing

rustic elk Apr 27, 2025, 4:23 PM

#

verbal oar what this is about classifier of what?

yea, its binary one, just yes / no

toxic pilot Apr 27, 2025, 4:23 PM

#

verbal oar so looks like noisy dataset and or need still some preprocessing

maybe noisy, or also could be wayyyy too well separated

rustic elk Apr 27, 2025, 4:23 PM

#

for k = 3 the results are good no ?

#

#

just weird but idk if I want to bother more with it

toxic pilot Apr 27, 2025, 4:25 PM

#

rustic elk

pretty crazy results

rustic elk Apr 27, 2025, 4:25 PM

#

why ?

toxic pilot Apr 27, 2025, 4:25 PM

#

they’re good!

rustic elk Apr 27, 2025, 4:26 PM

#

ah aright

verbal oar Apr 27, 2025, 4:26 PM

#

overfitting?

toxic pilot Apr 27, 2025, 4:26 PM

#

feel like pushing for more accuracy would be… pushing it

rustic elk Apr 27, 2025, 4:26 PM

#

Exactly the same with k = 7 but ill leave it like that

toxic pilot Apr 27, 2025, 4:26 PM

#

verbal oar overfitting?

oh possibly

#

but usually if overfitting you’d see wild fluctuations in the validation

#

and there’d be a huge gap between accuracy during train and test

clear topaz Apr 27, 2025, 4:27 PM

#

Damn, I gotta learn so much yet
My syllabus in AIML diploma hasn't even truly started ig

verbal oar Apr 27, 2025, 4:27 PM

#

yes like these classic curves

#

on plot

clear topaz Apr 27, 2025, 4:28 PM

#

Lmao

toxic pilot Apr 27, 2025, 4:28 PM

#

verbal oar yes like these classic curves

also often for over fitting, test accuracy will be a non negligible percentage higher than train accuracy

#

i don’t think it’s over fitted

verbal oar Apr 27, 2025, 4:28 PM

#

better seen on plot

rustic elk Apr 27, 2025, 4:28 PM

#

verbal oar overfitting?

I am cross validating for multiple num of folds and k values and picking the most common k among them

#

wouldn't that prevent it

toxic pilot Apr 27, 2025, 4:29 PM

#

rustic elk wouldn't that prevent it

i don’t think ur overfitting

toxic pilot Apr 27, 2025, 4:29 PM

#

toxic pilot but usually if overfitting you’d see wild fluctuations in the validation

^

verbal oar Apr 27, 2025, 4:31 PM

#

ok yes no but is it spam or else?

#

I mean what are you trying to classify?

rustic elk Apr 27, 2025, 4:32 PM

#

idk what spam is

#

I am trying to predict the if a person is a responder on a campaign based on some past data

verbal oar Apr 27, 2025, 4:33 PM

#

ok maybe now it will be easier to help

rustic elk Apr 27, 2025, 4:33 PM

#

64 female urban free never 1.00 1.00 0.00 0.00 0.00 no

#

This is what the data looks like

#

first number is age and the rest are logins the last 4 weeks, 6 months, purchases in last 4 weeks, 6 months and total purchases

#

they arent between 0-1 it just happend to be to this example i copied

verbal oar Apr 27, 2025, 4:35 PM

#

and last is no/yes i see

rustic elk Apr 27, 2025, 4:36 PM

#

I don't even know if there is something wrong, the only odd thing is the similarity between 3 and 7 k values

#

thats all

verbal oar Apr 27, 2025, 4:40 PM

#

I suggest to compare your code with some tutorial knn but watch out there will be different problem and dataset, but many things have in common

#

https://www.datacamp.com/tutorial/k-nearest-neighbor-classification-scikit-learn

K-Nearest Neighbors (KNN) Classification with scikit-learn

This article covers how and when to use k-nearest neighbors classification with scikit-learn. Focusing on concepts, workflow, and examples. We also cover distance metrics and how to select the best value for k using cross-validation.

rustic elk Apr 27, 2025, 4:46 PM

#

aright ill see what I can do

rustic elk Apr 27, 2025, 4:48 PM

#

verbal oar https://www.datacamp.com/tutorial/k-nearest-neighbor-classification-scikit-learn

(pic from website)
Seems like the acc for k = [10, 14] is the same

#

so I guess is an "ok" phenomenon

verbal oar Apr 27, 2025, 4:49 PM

#

yes looks like it could be, right

#

but you have 3 and 7 not 3 to 7 🤔

toxic pilot Apr 27, 2025, 4:51 PM

#

rustic elk (pic from website) Seems like the acc for k = [10, 14] is the same

yes but be a bit wary about these kinds of comparisons

rustic elk Apr 27, 2025, 4:51 PM

#

Ill try to copy his code see what i get

toxic pilot Apr 27, 2025, 4:51 PM

#

toxic pilot yes but be a bit wary about these kinds of comparisons

different model, different data, so accuracy could mean different things

verbal oar Apr 27, 2025, 4:59 PM

#

i saw weird jumps

rustic elk Apr 27, 2025, 5:00 PM

#

I think i did an opsie

verbal oar Apr 27, 2025, 5:01 PM

#

perhaps

toxic pilot Apr 27, 2025, 5:01 PM

#

wait i’m not seeing where the problem is

#

fluctuations are p normal

#

just not huge percentages in fluctuations

opaque condor Apr 27, 2025, 5:42 PM

#

How do I implement a stop so the network doesn't over fit to the data

serene scaffold Apr 27, 2025, 5:44 PM

#

opaque condor How do I implement a stop so the network doesn't over fit to the data

if you're doing the training in a loop, you can use an if statement to decide if the change in loss has flatlined, and break from the loop.

opaque condor Apr 27, 2025, 5:55 PM

#

Thank you

verbal oar Apr 27, 2025, 6:20 PM

#

self.final_model.fit(self.X_train, self.y_train)

...

self.final_model.fit(self.X, self.y)

is it ok?

#

not just one .fit?

toxic pilot Apr 27, 2025, 6:22 PM

#

opaque condor How do I implement a stop so the network doesn't over fit to the data

just find the derivative of the accuracy graph or the loss graph and see if it’s close to 0

#

another way to check is to just see if your epochs are seeing any improvements

toxic pilot Apr 27, 2025, 10:05 PM

#

gritty vessel c++ is faster

most of the data processing or ai/ml related libararies are implemented with C under the hood. python is a wrapper, and the overhead is not worth talking about

gray slate Apr 27, 2025, 10:06 PM

#

basically hot loops go a language that's slow to write and fast to execute, cool ones go in one that's easy to read and fast to write

toxic pilot Apr 27, 2025, 10:07 PM

#

gray slate basically hot loops go a language that's slow to write and fast to execute, cool...

i write most of my models in rust o_o

ripe rampart Apr 27, 2025, 10:08 PM

#

toxic pilot i don’t think ur overfitting

hi

toxic pilot Apr 27, 2025, 10:09 PM

#

ripe rampart hi

oh you’re here 0-0?

#

hai :D

flint onyx Apr 28, 2025, 4:31 AM

#

how would I do this question? 😭 have an exam tmr

#

would u1 be from bottom left to top right
and u2 perpendicular to that?
both passing through that black circle in the middle

arctic wedgeBOT Apr 28, 2025, 5:20 AM

#

:incoming_envelope: :ok_hand: applied timeout to @rich moth until <t:1745818246:f> (10 minutes) (reason: attachments spam - sent 8 attachments).

The <@&831776746206265384> have been alerted for review.

limber spear Apr 28, 2025, 5:34 AM

#

Oh snap. Plunder why are you spamming the channel 😂 post your plots in a single directory file or something

limber spear Apr 28, 2025, 5:36 AM

#

flint onyx how would I do this question? 😭 have an exam tmr

Are you supposed to draw vectors here 👀

#

Looks like you just have to draw 2 arrows here u1 and u2 in the direction of the PCA. Not sure about this. What do you think chat

rich moth Apr 28, 2025, 5:52 AM

#

https://drive.google.com/drive/folders/1s9G7Db1IVL1JiPmEX4lRTexWLkBcevKA?usp=sharing

Thanks @limber spear . Here yall go

limber spear Apr 28, 2025, 5:53 AM

#

This channel is too fire.

rich moth Apr 28, 2025, 6:08 AM

#

all the data I've been looking at about this makes me wonder something about data's overall structure, almost like its "dimensional constraints". so I made this formula. but in it I have this phase and it seems this phase angle is basically acting like almost some type of trace for that idea.

#

im calling it like a "structural DNA", but it seems inherent it in all data types based on the complexity measurement tool which gives you Φ(x)

#

anyone? what do you guys make of this ? honestly

limber spear Apr 28, 2025, 6:32 AM

#

rich moth anyone? what do you guys make of this ? honestly

I’m learning set theory this semester. Worth a look into. It provides a framework to develop a mathematical theory of, get this—infinity.

#

https://tenor.com/view/gloop-gif-25555044

Tenor

rich moth Apr 28, 2025, 6:38 AM

#

well at first i though just the different curriculm method you used determined by the domain was the key, but then i saw something different in all the data . something about the inherent complexity on data thats based off magnitude and phase paint a realy important picture

verbal oar Apr 28, 2025, 6:40 AM

#

can I transfer learn text to code I mean ready made text model which works on code?

#

because code is some kind of text

rich moth Apr 28, 2025, 6:41 AM

#

🤯

#

what?

verbal oar Apr 28, 2025, 6:42 AM

#

so use some text generating model to adapt to code

rich moth Apr 28, 2025, 6:43 AM

#

ok

#

Im super confused.

limber spear Apr 28, 2025, 6:43 AM

#

Text to code. Like text to speech blobhuh TTS

verbal oar Apr 28, 2025, 6:44 AM

#

I see there are code gen models but text gen models are more

rich moth Apr 28, 2025, 6:44 AM

#

Isnt Mistral a good all round one with coding?

#

Ohh i see.

verbal oar Apr 28, 2025, 6:45 AM

#

dont know yet

rich moth Apr 28, 2025, 6:45 AM

#

Text to code.

verbal oar Apr 28, 2025, 6:45 AM

#

yes

#

write hello
print("hello")

#

text to code

limber spear Apr 28, 2025, 6:46 AM

#

I don’t think even ChatGPT can do that at the moment. Or the trending DeepSeek model tbh

#

Claude. Pretty much every ‘cutting edge AI’ today 😂

verbal oar Apr 28, 2025, 6:47 AM

#

yes claude can

limber spear Apr 28, 2025, 6:48 AM

#

Explain explain pikawow

verbal oar Apr 28, 2025, 6:48 AM

#

for example can generate some snippet of sentiment analysis

limber spear Apr 28, 2025, 6:48 AM

#

Does it have speech to code

verbal oar Apr 28, 2025, 6:49 AM

#

dont know

limber spear Apr 28, 2025, 6:49 AM

#

But my point is. Your idea is innovative 👍

verbal oar Apr 28, 2025, 6:50 AM

#

I didnt tested it i just got some code from someone one year ago and I thought he wrotes but said its claude output

limber spear Apr 28, 2025, 6:51 AM

#

If big tech steals our ideas know it came from the Python community 😌

polar shard Apr 28, 2025, 7:55 AM

#

hi

verbal oar Apr 28, 2025, 8:27 AM

#

https://claude.ai/login?returnTo=%2F%3F
its already there text to code

untold fable Apr 28, 2025, 8:27 AM

#

hey man

#

how are you

verbal oar Apr 28, 2025, 8:53 AM

#

fine
This is neat
I wrote
write sentiment analysis
got text
then
write code for it and above code as in screenshot

#

I dont think my idea is innovati e text to code is already

vivid skiff Apr 28, 2025, 9:01 AM

#

what is the difference between torch.compile and torch.export?

verbal oar Apr 28, 2025, 9:03 AM

#

compile is same here as in keras?

#

oh can also share link i stead of screenshot https://claude.ai/share/e1bbc591-3719-4fa1-898e-970d4fe3a733

#

still dont know how it works I mean input text output code

#

I saw somewhere it tokenizes code but what next?

limber spear Apr 28, 2025, 10:13 AM

#

verbal oar I dont think my idea is innovati e text to code is already

You have to think text to code <-> code to text <-> text to speech <-> speech to text. When YouTube videos or Zoom and Teams meetings are transcribed to text(subtitles), they are slow and 100% not accurate.

When you apply this idea to applications of text to code, there is nothing that comes to mind that is cutting edge ‘stable’ even with sentiment analysis

#

Mind you these are ‘production grade’ products 😂

#

Ok I have to backtrack. People worked hard on these products.

verbal oar Apr 28, 2025, 11:12 AM

#

so this is scam which need developer to fix 😂 ?

#

I dont trust these tools I just want to learn

#

I dont care about it I just want to make money if its legal

#

personally for me I dont use it

#

I have mindset just do dont care

#

why I dont trust because code on which is trained is not shared

#

its like lets train model on leetcode

#

but better to train it on gh repos but still different people have different style of writing code

#

for me its little controversial

#

and also I dont trust text generation I dont just see proof it works

#

if it makes mistakes or make mistakes sometimes

toxic pilot Apr 28, 2025, 11:19 AM

#

verbal oar I saw somewhere it tokenizes code but what next?

the tokenized output is fed thru a transformer and then passed through a dense layer. the output is then decoded to plain text

verbal oar Apr 28, 2025, 11:19 AM

#

ok explained thanks

toxic pilot Apr 28, 2025, 11:20 AM

#

i suppose stacked LSTMs could also work but it’d be slow and very inaccurate

#

you need the attention layers of transformers

toxic pilot Apr 28, 2025, 11:22 AM

#

limber spear Text to code. Like text to speech <:blobhuh:658062568191033373> TTS

you’d use different methodologies for TTS and code generation

verbal oar Apr 28, 2025, 11:22 AM

#

so just look at process of generating text and generalize it to code?

toxic pilot Apr 28, 2025, 11:22 AM

#

verbal oar so just look at process of generating text and generalize it to code?

well basically

verbal oar Apr 28, 2025, 11:22 AM

#

different but similar

toxic pilot Apr 28, 2025, 11:22 AM

#

that’s a pretty big oversimplification tho

verbal oar Apr 28, 2025, 11:23 AM

#

just implementation differs

toxic pilot Apr 28, 2025, 11:24 AM

#

verbal oar just implementation differs

well no not really 0_o

toxic pilot Apr 28, 2025, 11:24 AM

#

toxic pilot you’d use different methodologies for TTS and code generation

wait no i’m stupid

#

i though you said speech to text 💀

limber spear Apr 28, 2025, 11:25 AM

#

I was about to catch some Zzz’s 👀

#

The only difference between what some of us do in this space is 2 words, custom and proprietary. That is how I look at it

verbal oar Apr 28, 2025, 11:29 AM

#

no speech to text no
Im talking about text generation and text to code

#

and similarities about it which I didnt talk

limber spear Apr 28, 2025, 11:30 AM

#

It can get very complex very fast mq. For example have you looked into abstract syntax trees

verbal oar Apr 28, 2025, 11:31 AM

#

claude gave me lstm based instead of transformer text generation
for text to code I reached limit :sweatsmile:

#

on nlp course I had sth about formal grammars etc

#

Ah ast right because it parses code

limber spear Apr 28, 2025, 11:33 AM

#

Data science is a very new field, but in my opinion the foundation of it also includes foundations in computer science

verbal oar Apr 28, 2025, 11:34 AM

#

yes better explored with context of it

limber spear Apr 28, 2025, 11:35 AM

#

My hopes are that a lot in this field study rigorously. There is a lot to explore 🫡

verbal oar Apr 28, 2025, 11:37 AM

#

yes for example feeling semantic web but why i need it and then you meet topic of nlp where you see usage of semantic web and ontology

#

yes btw wordnet is for text is there sth for code?

limber spear Apr 28, 2025, 11:39 AM

#

Maybe lsp’s

toxic pilot Apr 28, 2025, 11:52 AM

#

limber spear Data science is a very new field, but in my opinion the foundation of it also in...

data science has been around forever

toxic pilot Apr 28, 2025, 11:54 AM

#

limber spear It can get very complex very fast mq. For example have you looked into abstract ...

wait this is actually pretty smart

#

you can feed your context as an AST and have the model predict the next node. then, deterministically convert the ast to the language in question

verbal oar Apr 28, 2025, 12:08 PM

#

lsp as in language server or other meaning?

#

yes I think it was about language server protocol

#

which provides language intelligence tools

#

of course server provides not lsp

toxic pilot Apr 28, 2025, 12:30 PM

#

???

#

you dont need an lsp to do code generation

verbal oar Apr 28, 2025, 12:34 PM

#

maybe lsp's

quaint mulch Apr 28, 2025, 1:19 PM

#

flint onyx would u1 be from bottom left to top right and u2 perpendicular to that? both pas...

that sounds good.
why are you not sure?

limber spear Apr 28, 2025, 4:42 PM

#

toxic pilot you dont need an lsp to do code generation

LSP’s drive every programming language

#

This is the backend of every programming language

#

You can dig further into compilers but that goes more to foundations in computer science

limber spear Apr 28, 2025, 4:48 PM

#

toxic pilot data science has been around forever

Computer science is a more robust field. I know because I dig into kernel code lol

#

What y’all cooking chat

#

I’m handwriting a cart decision tree build this week

limber spear Apr 28, 2025, 4:54 PM

#

verbal oar which provides language intelligence tools

This is probably where you want to research. Language intelligence tools

#

I’m not sure what do you think chat. Where are the tokenizers deadge

toxic pilot Apr 28, 2025, 4:55 PM

#

limber spear This is the backend of every programming language

LLVM is the backend for many programming languages

limber spear Apr 28, 2025, 4:55 PM

#

I’m well aware of llvm. Chris is in my LinkedIn network lol

toxic pilot Apr 28, 2025, 4:56 PM

#

all lsp does is interface with the IDE to provide syntax highlighting and completions and so on

#

nothing to do with the backend of compilers

verbal oar Apr 28, 2025, 4:57 PM

#

about to make some python code generation based on some gh repos, some prototype

toxic pilot Apr 28, 2025, 4:57 PM

#

limber spear I’m not sure what do you think chat. Where are the tokenizers <:deadge:129344055...

?

#

shouldn’t really matter. just use bert_cased or smth

#

might be useful to train a tokenizer on programming languages tho

limber spear Apr 28, 2025, 4:58 PM

#

Let’s help make mq the next Steve Jobs chat

verbal oar Apr 28, 2025, 5:00 PM

#

get rid of unemployment would be enough 😄

limber spear Apr 28, 2025, 5:00 PM

#

And ethqnol his Steve Wozniak 😉

#

I can be some rando founding dev deadge

#

‘No one cares about him. He is just a founding employee’

verbal oar Apr 28, 2025, 5:11 PM

#

so frontend of app is done with streamlit or flask?

#

instead of just showing in console which is not user friendly

weary timber Apr 28, 2025, 7:24 PM

#

https://discuss.pytorch.org/t/w-gan-loss-gets-stuck/219562

PyTorch Forums

W-GAN loss gets stuck

Hello, i tried to implement a W-GAN but ran into a stubborn problem. The loss for both the generator and the critic starts at 0, and slowly the generators loss rise to 1.5 and the critics loss falls to -2.8, and after that the losses stay very close to those values. I tried everything but couldnt get it fixed. Here is the full code: import t...

#

please help me with this im desperate i tried everything 😢😢😢

toxic pilot Apr 28, 2025, 7:33 PM

#

try decreasing batch size

flint onyx Apr 28, 2025, 7:33 PM

#

quaint mulch that sounds good. why are you not sure?

ah thanks. I havent seen a question like this before thats why I wasnt sure

toxic pilot Apr 28, 2025, 7:33 PM

#

weary timber https://discuss.pytorch.org/t/w-gan-loss-gets-stuck/219562

and maybe tune up lr a bit

verbal oar Apr 28, 2025, 8:19 PM

#

this is hugging face specific train and evaluate instead of fit and predict?

#

trainer.train()

serene scaffold Apr 28, 2025, 8:27 PM

#

verbal oar `trainer.train()`

I suppose. I think that with trainer objects, you specify the training and test data in advance, whereas with sklearn style models, you pass some or all of the training data (and not the test data) to fit.

toxic pilot Apr 28, 2025, 9:32 PM

#

serene scaffold I suppose. I think that with trainer objects, you specify the training and test ...

correct me if i’m wrong but learn/train is usually associated with supervised ml, while fit is more apt for unsupervised

serene scaffold Apr 28, 2025, 9:34 PM

#

toxic pilot correct me if i’m wrong but learn/train is usually associated with supervised ml...

Terminology is not used consistently in the Ai/ML world. sklearn used fit and predict for everything, and the way they use terminology is pretty influential in the python ecosystem.

toxic pilot Apr 28, 2025, 9:35 PM

#

serene scaffold Terminology is not used consistently in the Ai/ML world. sklearn used fit and pr...

sklearn is an unsupervised ml library no?

serene scaffold Apr 28, 2025, 9:35 PM

#

No, it has lots of supervised models

toxic pilot Apr 28, 2025, 9:35 PM

#

ah it does have KNN

#

it looks like

limpid dew Apr 28, 2025, 9:58 PM

#

what is KNN?

serene grail Apr 28, 2025, 10:01 PM

#

I'm guessing K Nearest Neighbors in this context

stray gulch Apr 28, 2025, 10:11 PM

#

serene grail I'm guessing K Nearest Neighbors in this context

yes, K Nearest Neighbors is a machine learning algorithm used for classification and regression

viscid urchin Apr 28, 2025, 10:18 PM

#

Is this a safe-looking tutorial to send an ML learner re: perceptrons? It looks fine to me at first glance, but maybe someone knows of a nicer one? Buddy of mine is asking for a good reference.
https://medium.com/@becaye-balde/perceptron-building-it-from-scratch-in-python-15716806ef64

Medium

Perceptron: Building it from scratch in python

In this tutorial, we will build a custom Perceptron from scratch, then test it on the overused Iris dataset ;).

#

Maybe this one is better because it summarizes the back-story more? https://pyimagesearch.com/2021/05/06/implementing-the-perceptron-neural-network-with-python/

PyImageSearch

Adrian Rosebrock

Implementing the Perceptron Neural Network with Python - PyImageSearch

First introduced by Rosenblatt in 1958, The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain is arguably the oldest and most simple of the ANN algorithms. Following this publication, Perceptron-based techniques were all the rage in…

past meteor Apr 28, 2025, 10:30 PM

#

viscid urchin Is this a safe-looking tutorial to send an ML learner re: perceptrons? It looks ...

Why do they want to learn about perceptron? It's very niche and not directly relevant, even if you want to move over to neural nets later

viscid urchin Apr 28, 2025, 10:32 PM

#

past meteor Why do they want to learn about perceptron? It's very niche and not directly rel...

I haven't exactly asked, but my guess is just that it's not that huge, plausible to code using just basic tools, etc?

#

They are sorta "my first supervised learning", right?

#

I may just be out of date though, hence the question I guess.

past meteor Apr 28, 2025, 10:33 PM

#

It's fairly straightforward but I'd recommend just covering linear and logistic regression

limpid dew Apr 29, 2025, 12:06 AM

#

I understand how trees work but could anyone explain to me how models which use trees like random forests learn the binary operators at each node? Is there some process which is analogous to back propagation?

serene scaffold Apr 29, 2025, 12:09 AM

#

limpid dew I understand how trees work but could anyone explain to me how models which use ...

There isn't any part of random forests that is similar to backpropogation.

#

with a random forest, you use all the decision trees that you made, and take all their predictions. you can either use the most frequent prediction as the system prediction, or weigh the prediction of each tree differently, or whatever you want.

toxic pilot Apr 29, 2025, 12:37 AM

#

viscid urchin They are sorta "my first supervised learning", right?

just tel him to do mnist or smth

opaque sphinx Apr 29, 2025, 12:56 AM

#

hi is anyone here familiar with google colab? having trouble with enabling t4 gpu runtime, need some help if possible. doing unsupervised learning here and my code seems stuck, found out that Im running with cpu so i changed to gpu t4 and theres a warning that says im not utilizing gpu

serene scaffold Apr 29, 2025, 1:06 AM

#

opaque sphinx hi is anyone here familiar with google colab? having trouble with enabling t4 gp...

You did Runtime > Change runtime type. Did you reset the notebook (namely the kernel?)

opaque sphinx Apr 29, 2025, 1:07 AM

#

serene scaffold You did Runtime > Change runtime type. Did you reset the notebook (namely the ke...

yes I changed to gpu t4 and i got this message

#

I tried googling it and someone mentioned i shud install pytorch and fastai to run it, when i did this came out instead

serene scaffold Apr 29, 2025, 1:08 AM

#

What library is your code using? Torch?

#

Oh, if you've maxed out your free GPU limit, you'll have to pay or wait.

#

Please always always share code as text

#

!code

arctic wedgeBOT Apr 29, 2025, 1:09 AM

#

Formatting code on Discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

opaque sphinx Apr 29, 2025, 1:09 AM

#

serene scaffold Oh, if you've maxed out your free GPU limit, you'll have to pay or wait.

cant I use my own gpu to run it? instead of using google colab's

serene scaffold Apr 29, 2025, 1:09 AM

#

@opaque sphinx please permanently remember this ^^^^

opaque sphinx Apr 29, 2025, 1:10 AM

#

serene scaffold <@711980135859093535> please permanently remember this ^^^^

alright sorry, new here so forgot the rules here, very sorry

#

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
import time

from collections                import Counter
from imblearn.over_sampling     import SMOTE

from sklearn.model_selection    import StratifiedKFold, train_test_split
from sklearn.preprocessing      import LabelEncoder, StandardScaler
from sklearn.ensemble           import IsolationForest
from sklearn.svm                import OneClassSVM
from sklearn.neighbors          import LocalOutlierFactor
from sklearn.metrics            import (
    classification_report,
    precision_score,
    recall_score,
    f1_score,
    make_scorer
)
from sklearn.inspection         import permutation_importance

warnings.filterwarnings('ignore')

serene scaffold Apr 29, 2025, 1:10 AM

#

opaque sphinx cant I use my own gpu to run it? instead of using google colab's

When you execute code on google colab, it's running on their server, not your computer.

What kind of GPU do you have?

opaque sphinx Apr 29, 2025, 1:11 AM

#

abit of context is I am trying to run to detect anomalies in a prescription dataset to detect errors, so i been running isolation forest, SVM and localoutlier factor, but the recall is too low, hence ive been doing tuning and using SMOTE for oversampling, but bcs of this I cant do so

opaque sphinx Apr 29, 2025, 1:11 AM

#

serene scaffold When you execute code on google colab, it's running on their server, not your co...

rtx3070

#

I also have a macbook M3 pro

opaque sphinx Apr 29, 2025, 1:12 AM

#

serene scaffold When you execute code on google colab, it's running on their server, not your co...

oof so is there no way to run it now on gpu unless i pay the 9.99$ a month?

#

been doing some reading online saying I need to run the model on pytorch, the documentation says i need to run tensorflow, but when I did the error came out again

serene scaffold Apr 29, 2025, 1:13 AM

#

If they told you that you used up your free compute, they're not kidding. I think it resets every day

serene scaffold Apr 29, 2025, 1:13 AM

#

opaque sphinx been doing some reading online saying I need to run the model on pytorch, the do...

Pytorch and tensorflow are for neural networks. You use one or the other

#

And it should be pytorch.

serene scaffold Apr 29, 2025, 1:15 AM

#

opaque sphinx been doing some reading online saying I need to run the model on pytorch, the do...

What did you actually do to "run it on tensorflow"? Just importing tensorflow has no effect.

opaque sphinx Apr 29, 2025, 1:15 AM

#

opaque sphinx I tried googling it and someone mentioned i shud install pytorch and fastai to r...

I havent tried tensorflow, I chose pytorch but now I cant even change the runtime to gpu, it just shows this error

serene scaffold Apr 29, 2025, 1:16 AM

#

Don't try tensorflow if you haven't already.

opaque sphinx Apr 29, 2025, 1:16 AM

#

ok i wont

#

!ltt install torch torchvision >> /.tmp
!pip install fastai --upgrade >> /.tmp

import torch
assert torch.cuda.is_available(), "GPU not available"

#

i ran these, but the output is also gpu not available, then i try change the runtime, same error occurs

serene scaffold Apr 29, 2025, 1:17 AM

#

You might not have installed the version of torch that has the cuda driver

limpid dew Apr 29, 2025, 1:17 AM

#

You can run it on your own gpu is you install CUDA

#

https://pytorch.org/get-started/locally/

PyTorch

Start Locally

opaque sphinx Apr 29, 2025, 1:18 AM

#

lemme google how to install cuda into google colab

limpid dew Apr 29, 2025, 1:19 AM

#

You don't install it into google colab.

serene scaffold Apr 29, 2025, 1:19 AM

#

Well if colab isn't letting you use the GPU, it doesn't matter

#

It will always say that cuda isn't available until it lets you use the GPU again

viscid urchin Apr 29, 2025, 1:19 AM

#

Check out that 'start locally' page, it has a thing at the top where you can click on the versions you want and it will show you the install command.

opaque sphinx Apr 29, 2025, 1:19 AM

#

serene scaffold It will always say that cuda isn't available until it lets you use the GPU again

ah gotcha, only when the google colab resets again only then i can start doing

serene scaffold Apr 29, 2025, 1:20 AM

#

Yes, or you can move all this to your computer

opaque sphinx Apr 29, 2025, 1:20 AM

#

viscid urchin Check out that 'start locally' page, it has a thing at the top where you can cli...

ok sir i will check it out thanks @limpid dew too

limpid dew Apr 29, 2025, 1:20 AM

#

Try to run the code in a .py file locally on your pc (not using colab)

opaque sphinx Apr 29, 2025, 1:21 AM

#

ok i will try it in pycharm

#

thanks guys

toxic pilot Apr 29, 2025, 1:25 AM

#

serene scaffold Don't try tensorflow if you haven't already.

idk man Keras is pretty darn good

#

also it’s so speedy

#

the only issue i have with tensorflow is that tf code is more difficult to understand

#

people also say it’s less “pythonic” which is not something i really care about but if you do, well… that’s something to consider

limpid dew Apr 29, 2025, 1:28 AM

#

My understanding as someone who only uses pytorch, is that keras is more user friendly but pytorch gives the user more control and is therefore better for research applications.

toxic pilot Apr 29, 2025, 1:29 AM

#

limpid dew My understanding as someone who only uses pytorch, is that keras is more user fr...

i think pytorch is actually more userfriendly

#

thats just a personal opinion tho. ultimately i think the difference is minimal enough to not be worth talking about.

limpid dew Apr 29, 2025, 1:31 AM

#

People like tensorflow better for production right?

toxic pilot Apr 29, 2025, 1:31 AM

#

TF is supposed to be super versatile in a production environment, and pytorch has better tools for regular users

#

tbh i dont think it actually matters which you use. if you learn one, learning the other is trivial. personally i use pytorch tho

limpid dew Apr 29, 2025, 1:32 AM

#

That seems right. I doubt there's much you can do with one library which you CAN'T do in the other.

serene scaffold Apr 29, 2025, 1:41 AM

#

limpid dew People like tensorflow better for production right?

I've never heard of anyone use tensorflow outside of academia

glacial root Apr 29, 2025, 2:43 AM

#

yo isn't that against the server's terms/conditions

serene scaffold Apr 29, 2025, 2:51 AM

#

!cleanban 1266449020306587688 Asking for jobs after being told not to.

arctic wedgeBOT Apr 29, 2025, 2:51 AM

#

:incoming_envelope: :ok_hand: applied ban to @spiral locust permanently.

tawdry finch Apr 29, 2025, 2:54 AM

#

i heard currently pandas and bs4 etc are outdated and replaced by more powerful libraries like polar, crawl4ai etc so anyone who knows or is a data analyst can you pls guide me on which libraries to learn as well as i am completely open to learn under someone

serene scaffold Apr 29, 2025, 2:55 AM

#

tawdry finch i heard currently pandas and bs4 etc are outdated and replaced by more powerful ...

please only ask your question in one place. you can cross-post your question in relevant channels, but everything should point back to one place to get the answer.

tawdry finch Apr 29, 2025, 2:56 AM

#

serene scaffold please only ask your question in one place. you can cross-post your question in ...

ok

rich moth Apr 29, 2025, 6:27 AM

#

rich river Apr 29, 2025, 7:16 AM

#

[component_container-1] 2025-04-29 14:52:42.409273636 [W:onnxruntime:, graph.cc:1348 Graph] Initializer onnx::Conv_2881 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
[component_container-1] 2025-04-29 14:52:42.461690793 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 2 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
[component_container-1] 2025-04-29 14:52:42.462035032 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 12 Memcpy nodes are added to the graph sub_graph4 for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
[component_container-1] 2025-04-29 14:52:42.464001375 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
[component_container-1] 2025-04-29 14:52:42.464005845 [W:onnxruntime:, session_state.cc:1170 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.

#

has anyone ever used ONNX for exporting models? Im not sure how to solve this

#

import onnx
import torch
from perception.src.mask_rcnn_model import MaskRcnnModel

# Function to Convert to ONNX
def Convert_ONNX(model, target_path):
    # set the model to inference mode
    model.eval()
    target_height = 800
    target_width = 1100
    dummy_input = torch.randn(1, 4, target_height, target_width, requires_grad=True)
    dummy_input = dummy_input.cuda()
    # Export the model
    torch.onnx.export(
        model,  # model being run
        dummy_input,  # model input (or a tuple for multiple inputs)
        target_path,  # where to save the model
        export_params=True,  # store the trained parameter weights inside the model file
        opset_version=12,  # the ONNX version to export the model to
        do_constant_folding=True,  # whether to execute constant folding for optimization
        input_names=["modelInput"],  # the model's input names
        output_names=["modelOutput"],  # the model's output names
        dynamic_axes={
            "modelInput": {0: "batch_size"},  # variable length axes
            "modelOutput": {0: "batch_size"},
        },
        keep_initializers_as_inputs=True,
    )

if __name__ == "__main__":
    print(torch.cuda.is_available())
    maskrcnn_path = "maskrcnn_rgbd_2025-01-28_epoch_526.pth"
    maskrcnn_model = MaskRcnnModel(maskrcnn_path, 0)._model

    target_path = "MaskRCNNModel.onnx"
    Convert_ONNX(maskrcnn_model, target_path)
    onnx_model = onnx.load(target_path)
    onnx.checker.check_model(onnx_model)
    print(onnx.helper.printable_graph(onnx_model.graph))

ocean hinge Apr 29, 2025, 7:42 AM

#

Hello. I didn't know where else to ask this. I have experience as a game dev and a dell bhoomi dev. I am trying to switch career to data science. But idk where to start. Can anyone provide me a roadmap? Like what learn in a format? And where to learn?

grand breach Apr 29, 2025, 7:58 AM

#

the docker image i created for my document assistant RAG tool is as big as the virtual env i created for it ~ 10 gb - i tried multi stage build, minimal requirements.txt etc but nothing is reducing it's size

grand breach Apr 29, 2025, 8:36 AM

#

is it ok to deploy if the image is 10 gb

rich river Apr 29, 2025, 9:18 AM

#

rich river ```python import onnx import torch from perception.src.mask_rcnn_model import Ma...

changing from keep_initializers_as_inputs=True to False solves all warnings regarding This may prevent some of the graph optimizations, like const folding
but I still havent figured out how to solve these warnings

[component_container-1] 2025-04-29 17:14:53.086686347 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 2 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
[component_container-1] 2025-04-29 17:14:53.087215064 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 14 Memcpy nodes are added to the graph sub_graph4 for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
[component_container-1] 2025-04-29 17:14:53.089272371 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
[component_container-1] 2025-04-29 17:14:53.089276591 [W:onnxruntime:, session_state.cc:1170 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.

weary timber Apr 29, 2025, 10:07 AM

#

toxic pilot and maybe tune up lr a bit

so youre sure there isnt anything wrong with the model or the loss calculation

#

?

rich river Apr 29, 2025, 10:21 AM

#

#

I used netron to visualize the graph, maybe I cannot find which nodes go wrong and just assume it won't affect the performance and ignore it

ocean hinge Apr 29, 2025, 11:01 AM

#

ocean hinge Hello. I didn't know where else to ask this. I have experience as a game dev and...

Anyone?

verbal oar Apr 29, 2025, 11:11 AM

#

https://claude.ai/share/7a73e89a-54c8-49eb-8f2a-4347d484e901
right for code generation code is long

#

I see some usage of tree sitter

#

temperature is related to simulated annealing or its in different context?

#

generally temperature is for controlling randomness as from description of open ai

#

but honestly im confused

crimson jackal Apr 29, 2025, 11:59 AM

#

Hey guys. I created a kind of street fighter game using pygame and i want to have a ai model for my oponnent. Can someone give some insight on how to do it with reinforcement ml? I know just a little about ml. Not much.

toxic pilot Apr 29, 2025, 12:41 PM

#

weary timber so youre sure there isnt anything wrong with the model or the loss calculation

doesnt look like it.

#

you could aslo try switching to gradient penalty instead

#

and if you're getting a bunch of fluctuations, maybe add a layer norm or batch norm somewhere in the middle

#

and also like tune your parameters, like maybe try doubling n_critic

unkempt apex Apr 29, 2025, 12:47 PM

#

ocean hinge Anyone?

check the pinned messages

copper umbra Apr 29, 2025, 1:57 PM

#

Hey folks looking for advise on preferred UI

For years, I have been using anaconda spyder. I love the setup of that program. However, I need to find an alternative that has GIT repo integrations.

Which do you guys love

I hate Jupiter notebooks
Needs to have 3 windows at least- code, output (for print data check) , variables (click df and see the frame etc).
Has highlight code and run options not run whole file from command line

agile cobalt Apr 29, 2025, 1:58 PM

#

copper umbra Hey folks looking for advise on preferred UI For years, I have been using anaco...

maybe take a look at marimo

pretty sure that spyder should have a git integration though?

serene scaffold Apr 29, 2025, 2:00 PM

#

copper umbra Hey folks looking for advise on preferred UI For years, I have been using anaco...

I use pycharm, which does all the things you mentioned wanting.

gritty vessel Apr 29, 2025, 2:07 PM

#

toxic pilot most of the data processing or ai/ml related libararies are implemented with C u...

correct

iron basalt Apr 29, 2025, 2:19 PM

#

copper umbra Hey folks looking for advise on preferred UI For years, I have been using anaco...

I use vim, it has these things (except not clicking, keyboard based) as plugins (e.g. https://github.com/luk400/vim-jukit ).

GitHub

GitHub - luk400/vim-jukit: Jupyter-Notebook inspired Neovim/Vim Plugin

Jupyter-Notebook inspired Neovim/Vim Plugin. Contribute to luk400/vim-jukit development by creating an account on GitHub.

#

Variable value check is included here in it being the Python REPL, so you can just type the variable name.

verbal oar Apr 29, 2025, 2:59 PM

#

I dont like light theme of jupyter notebook, what else dark or other theme?

#

too green text boring background

#

I also discovered I can use jupyter notebook inside vs code

viscid urchin Apr 29, 2025, 4:10 PM

#

pytorch's docs actually have a tutorial about it, have you read through this yet? https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html

#

If you have I'm guessing you want something more fundamental?

agile cobalt Apr 29, 2025, 4:11 PM

#

understanding which aspect of it?

pale thunder Apr 29, 2025, 4:17 PM

#

I drew this with matplotlib like so. While it is kind of fine, is there some library that would be better suited to this kind of drawing?

fig, ax = plt.subplots()
for a, b in relevant_edges:
    ax.plot(*zip(a, b), color='grey')
for a, b in all_edges:
    ax.plot(*zip(a, b), color='blue')
for a, b in path:
    ax.plot(*zip(a, b), color='green')
ax.set_aspect('equal', 'box')
ax.axis('equal')
ax.scatter(*zip(*all_pts))
ax.scatter(*start, color='red')
ax.scatter(*end, color='green')
plt.show()

wooden sail Apr 29, 2025, 4:22 PM

#

pale thunder I drew this with matplotlib like so. While it is kind of fine, is there some lib...

maybe something like networkx? https://networkx.org/documentation/latest/tutorial.html

pale thunder Apr 29, 2025, 4:24 PM

#

the biggest reason to go with networkx is usually that it can lay out the graph nodes comfortably, but here, I know the exact positions of each node.

viscid urchin Apr 29, 2025, 4:30 PM

#

pale thunder I drew this with matplotlib like so. While it is kind of fine, is there some lib...

There's Plotly https://plotly.com/python/

Something like...

import plotly.graph_objects as go
figure = go.Figure()
for a, b in relevant_edges:
    figure.add_trace(go.Scatter(x=[a[0], b[0]], y=[a[1], b[1]], 
                         mode='lines', line=dict(color='grey'), showlegend=False))
# draw the rest of the stuff
# ..draw the points etc
x_pts, y_pts = zip(*all_pts)
figure.add_trace(go.Scatter(x=x_pts, y=y_pts, mode='markers', 
                            marker=dict(color='black'), showlegend=False))
# then some kind of figure.update_layout(...) call
# and finally figure.show()

at least that's what I get from their docs, take it with a grain of salt.

Plotly

Plotly's

#

ooh actually maybe seaborn would be the slickest here?

#

It's based on matplotlib but has cooler tricks https://github.com/mwaskom/seaborn

GitHub

GitHub - mwaskom/seaborn: Statistical data visualization in Python

Statistical data visualization in Python. Contribute to mwaskom/seaborn development by creating an account on GitHub.

#

It's got like..

x_pts, y_pts = zip(*all_pts)
seaborn.scatterplot(x=x_pts, y=y_pts, s=50)

I've never used it but I've heard it mentioned a number of times.

#

Plotly doesn't really seem less-verbose than your original code, it's just different.

verbal oar Apr 29, 2025, 6:18 PM

#

why its called seaborn?

#

heatmap from seaborn I only relate

viscid urchin Apr 29, 2025, 6:24 PM

#

Hah, I had to look it up, and apparently it's an obscure joke relating to https://en.wikipedia.org/wiki/Sam_Seaborn

Sam Seaborn

Samuel Norman Seaborn is a fictional character played by Rob Lowe on the television serial drama The West Wing. From the beginning of the series in 1999 until the middle of the fourth season in 2003, he is deputy White House Communications Director in the administration of President Josiah Bartlet played by Martin Sheen. The character departed f...

#

Hence the common import alias of import seaborn as sns :\

verbal oar Apr 29, 2025, 6:25 PM

#

so similar to python is not from snake but from monty python

teal depot Apr 29, 2025, 11:07 PM

#

Hi, I'm new to Python and want to start AI/ML, but I don't know how to get started. Please help me with some recommended courses and tutorials.

grand minnow Apr 30, 2025, 1:58 AM

#

teal depot Hi, I'm new to Python and want to start AI/ML, but I don't know how to get start...

Check the pinned message

glacial root Apr 30, 2025, 3:37 AM

#

what kinds of things are graph neural networks used for

#data-science-and-ml

Replace this with your Hugging Face token

Inference API URL