#data-science-and-ml

1 messages ยท Page 164 of 1

umbral hatch
#

plateau tho

viscid urchin
umbral hatch
#

down the line what should it be? should i learn like advanced math and data and shit?

viscid urchin
#

Yeah, I see it as a repeated loop of learning a new ML idea, then going back and re-learning its math, and then going again

umbral hatch
#

for now tho just python? as a beginner of course

viscid urchin
#

Yeah just Python is fine. You can learn Triton or something to add to what you can do later.

umbral hatch
#

any other languages along python later on?

viscid urchin
umbral hatch
#

can you tell me why?

viscid urchin
#

(This is what most of OpenAI's models are written in)

#

This lets you write code that runs on the GPU very efficiently, and do it cross-platform without needing a specific nVidia card.

umbral hatch
#

ah nice

viscid urchin
#

This would be an alternative to, say, learning CUDA directly

umbral hatch
#

what jobs can one land with data science?

viscid urchin
#

At this point, whatever you can dream almost; everybody wants data science.

umbral hatch
#

highly competitve? evne more than software engineering?

viscid urchin
#

Well, it IS software engineering, so it's hard to compare

umbral hatch
#

oh shit...

#

well is it more competitve than other fields or?

viscid urchin
#

It's super hot so yeah, you could say so.

#

But how competitive depends on your level really

umbral hatch
#

super beginner

viscid urchin
#

You'll have a lot of competition, but if you know stuff, the market is thirsty.

umbral hatch
#

never back down never give up

viscid urchin
#

If you're actually interested in the topic, you should be able to stand out, in my opinion

#

There are a lot of people just doing it because it's supposed to be hot

umbral hatch
#

nah i kidna like it and interested in it

limpid dew
#

Is it normal to get a career in data science without a degree?

serene scaffold
#

Data science is even more degree-requiring than software development.

umbral hatch
#

woah

limpid dew
umbral hatch
limpid dew
#

What kind of projects are you thinking of?

umbral hatch
# limpid dew What kind of projects are you thinking of?

for now, im barely learning loops. For a future project, it'd be something like asking the user for inputs what's your name, age...etc and then in the same project something interactive, multiplying the age, scrambling names. this is far down the line tho

limpid dew
#

Very cool! Also smart to keep it simple at first.

umbral hatch
#

yeah and for fun maybe start coding games and learning other languages. main reason i went into coding

fallow coyote
#

How do you lot use databases with your ML programs? I'm trying to make my first ML program to see the likelihood of someone developing heart failure. I cleaned the data and then transferred it into an SQLite database

wooden hill
#

ML program?

glacial root
#

@serene scaffold do you know what the issue here is

#

sorry to ping, you're the only one here that i know is into nlp

fallow coyote
# wooden hill ML program?

Machine Learning program. Just something simple; using linear regression. Ill spend this week on it, send it to this channel for everyone to critique and then I'll attempt to make a proper project (which will have mini projects within it as I try to expand my knowledge and skills in ML programming)

wooden hill
serene scaffold
#

when you say "pairs", do you mean "two adjacent words"?
remember that lists are not arrays.

#

corpus[i:i+2] -- if you try to do a string slice that is out of range, you'll get an empty string.

#

!e

print("hello world!!!"[100:100000])
arctic wedgeBOT
serene scaffold
#

note that it did print("") rather than erroring.

glacial root
glacial root
serene scaffold
#
for i in range(1000):
    pairs = dict()
    for j in range(len(corpus) - 1):
        key_list = list(pairs.keys())
        pair = ''.join(corpus[i:i+2])

do you see the problem?

glacial root
#

i don't understand how it's going past the last character

serene scaffold
#
for x in range(1000):
    pairs = dict()
    for y in range(len(corpus) - 1):
        key_list = list(pairs.keys())
        pair = ''.join(corpus[x:x+2])

this might make it easier to see.

glacial root
#

oh shit

#

wait yeah i didn't see that part earlier

#

i'm such an idiot

#

thank you

serene scaffold
glacial root
#

i should have known to check a few more times before asking for help

serene scaffold
glacial root
#

sometimes i just make so many mistakes like this

#

it's scary what's gonna happen in the future

serene scaffold
#

You're being too hard on yourself. Everyone starts out like this.

glacial root
#

hopefully this issue goes away as i practice more

serene scaffold
civic elm
#

Hi all, in machine learning, in particular feature update or (rollbacks?) how do you solve the "Any change can break everything" problem?

#

Basically I made some feature changes to my model inputs in jupyter notebook and it performed worse than an hour ago. how do we solve this?

grand minnow
#

Has anyone tried using Helicone? I was set on using it to track the cost of tokens and requests per conversation, etc but the dashboard just kept showing the default sample data metrics ๐Ÿ˜ญ

I've verified that I am currently sending through their gateway tho... Does it lag?

narrow tiger
#

If i have a rag from which I want to pass in context to an llm (to use relevent data)
should i send this in system prompt or user prompt?
how can it effect the outputs

agile cobalt
#

how much it affects the output will vary depending on the model, but in general you'll want to put "trusted" data in the system prompt, "untrusted" data in the user prompt

you shouldn't rely on the model to distinguish between it, but it might help it understand how official or reliable that data is

narrow tiger
#

thanks that makes sense,

untold cliff
#

Hello! I am trying to dive deep into tokenization and understand the need of the shift towards subword based tokenization.
One of the main points I see is how hard it might be to define splitting rules, especially for complex languages. I keep seeing turkish as an example but I don't know the language so I can't tell that much, but I think I can see ot even in english of a word is a somewhat complex combination of prefixes and suffixes (but no specific examples come to mind unfortunately)
Another main point I see is ambiguity, in the sense of which representation to use. I see example like "don't", should be considered one token or be split into "do" and "n't" or "not". And "U.S.A" or "New York-based" for example, how should it be split? And I'm wondering if it's that hard to agree on one common way of doing or if it is that hard to define rules for doing so. I see arguments saying that it depends on the use case, so for some cases one way of splitting is better than the other, but I can't think of any examples.
Can you shed some light maybe? Or give some clear examples? Thank you!

final jolt
#

I feel like that is a complex question with not at all the same set of examples. realistically an LLM would understand "don't" as a single token and breaking that up would be model dependant. (and also would it even be useful?) The same can be said for splitting prefixes and suffixes, at least in English where you would fundamentally change the meaning of a word by doing so which I say would just cause more harm than good. A stronger argument could be made for your last example but by that same degree I feel like splitting the tokens just by some set standard would yield unreliable outcomes. given that New York-based would have a different meaning than New York, based or similar depending on the model in question.

The greater support or capability for natural language interaction makes the argument even harder to justify IMO.

north plank
stiff night
#

Hello, I'm here to seek guidance. I have a project in which I have to train an AI model on segmenting tumors from breast mammograms. This is my first AI project so I'm kind of lost at the moment.
The dataset that I'm working with is the INbreast 2012 dataset on kaggle. I have managed to load the DICOMs and their corresponding tumor masks and train a UNet model on them, but I did not get any promising results. All metrics like dice and IoU are very small (less than 1%).
I'd really like any help if possible. Thanks in advance.

fallow coyote
#

I've been learning a lot of the statistics for ML. As I'm still at the beginning of my ML journey, how much of a focus should I have put in learning the linear algebra side?

iron basalt
fallow coyote
#

Ill finish off learning the stats. Tbf a lot of the stats I still can recall from high school several years ago. Just need to revise them. Then Ill go onto learning all the linear algebra stuff

lapis sequoia
#

gans make me feel dirty about myself, they waste time, I thought they were harder than obj detection. Any of you young and good lads have any resources at your disposal for object detection?

viscid urchin
lapis sequoia
#

you think rlhf is not really RL?

viscid urchin
#

I wouldn't call it fake, but it's clearly different when humans are involved

lapis sequoia
#

yes, the reward are still through actual tangible data

iron basalt
gray slate
# viscid urchin https://arxiv.org/html/2502.15840v1

We find no clear correlation between failures and the point at which the modelโ€™s context window becomes full, suggesting that these breakdowns do not stem from memory limits
Interesting. Any thoughts about it?

#

Also, arxiv entering the 21st century with HTML! ๐ŸŽ‰
Nice work, science! ๐Ÿ˜‚

viscid urchin
gray slate
lethal gull
#

Does anybody that has worked with VAE's know what ways i can increase model performance?

lapis sequoia
gray slate
lyric furnace
#

guys

#

could anyone suggest me a machine learning documentation:

i am a kid and i am intrested in ML/DEEP LEARNING.
i am trying to learn linear algebra and stuff

could anyone please suggest me a doc, cause the docs i find is very complicated and not well explained

grand minnow
fallow coyote
#

Can anyone help me try to decipher what a model matrix is and how to create one? Third fucking time I'm asking. If you cant be asked to help, at least guide me to a decent resource that can help me understand what a model/design matrix is because I cannot find anything remotely useful on the internet that tells me what I need.

final jolt
serene scaffold
#

This raises the question: why do you think you need to know about something for which scant resources exist?

lavish wraith
#

Is pandas is equivalent of Excel

final jolt
#

I mean pandas is a library for data queries and other stuff. Excel is a spreadsheet program. Yes both can be used for data query and analysis but they are very different approaches

lavish wraith
#

For data analysis what's basic skills required ??

final jolt
#

I feel like that heavily depends on what data you may be analyzing, what your goal of the analysis is and what exactly are you wanting to do.

serene scaffold
# lavish wraith For data analysis what's basic skills required ??

keep in mind that you almost certainly need a degree--employers are going to be selective about who they trust to help them make consequential business decisions.
pandas is one of the most popular tools for data analysis, but you also need to understand statistics and have some domain knowledge in what you're analyzing.

lavish wraith
final jolt
grand mantle
#

Hello guys. I am little bit confused on data sharing

#

I provide Instagram data but i can't extend number of clients

lavish wraith
#

I search lot of website skill required they said Excel ,sql,tableau ,numpy ,pandas,matplotlib ,seaborn

final jolt
final jolt
grand minnow
ornate rose
#

Hey guys. I'm new to here (but not entirely new to python). I just wanted your opinions on this

I'm an advanced beginer in python ( I know the basics. Loops, ifs, whiles, input data types... all that kind of stuff) and i've worked on natural language processing using the NLTK library in python. so it's been quite a lot though ofcourse not being familiar with it makes me forget a lot of syntax in the library

I'm presently 17 and will be starting college pursuing a CS and Cognitive Science degree and I'll be working on a research paper on AI before resumption (September) with top level profs and graduates. This paper would be submitted to arXiv and top AI conferences like NeurIPs. I'd be aiming to pursue a FAANG internship though I'd settle for whatever I'll get but my main goal is to master Python Programming by the end of this year up to a given extent.

I'd love anyone that has inputs or advice they are willing to share so I begin working.

serene scaffold
fallow coyote
# serene scaffold This raises the question: why do you think you need to know about something for ...

Apologies if I was being brash. I'm going through a book (Introduction to Statistical Learning for Python or ISLP). It mentions something about a model matrix or design matrix which I believe is to set the template for your model (i.e. defining what X and y will be). ISLP uses its own custom module to create a design matrix. I'll continue on with my project and see how it goes. I'll post it up on here when I'm done with it.

viscid urchin
#

and I guess this is where the rubber meets the road

#

Never used this API, looks fancy

fallow coyote
viscid urchin
#

Not sure; my guess is that it helps you generate the matrix from the dataset, but I guess I'd have to look at their docs to know why it's cool

#

Perhaps the most common use is to extract some columns from a pd.DataFrame and produce a design matrix there we go I suppose

fallow coyote
#

I'll continue on with my current project and see how it goes from there. Thanks for trying to help. I have this weird obsession where I must know how everything works or else i cant use it to its fullest potential. I'm slowly getting better at just needing to learn the surface details then applying it straight way and after, go deeper into the subject

final jolt
opaque condor
#

For making my own image network convelution
Do I make a file with each reference

#

Image

lapis sequoia
#

yo, RAG stuff, any good tid-bits or links?

lapis sequoia
misty wraith
#

hi im new just trying to find my way around this server. im trying to learn python with data science as one of the goals

serene scaffold
misty wraith
wooden sail
# fallow coyote I'll continue on with my current project and see how it goes from there. Thanks ...

a "model matrix" is a particular flavor of linear statistical model, so what you're looking for is a "statistical model" https://en.wikipedia.org/wiki/Statistical_model#Formal_definition

A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of sample data (and similar data from a larger population). A statistical model represents, often in considerably idealized form, the data-generating process. When referring specifically to probabilities, the corresponding term is...

#

after some discretization/sampling and/or a choice of basis in a finite dimensional vector space, a "model matrix" is roughly equivalent to the assumption that your data is described by a statistical model with deterministic but unknown parameters, and those parameters are related to the observed data via a linear transformation

opaque condor
lapis sequoia
opaque condor
#

Not yet and I'm going over foxes and cats and dogs

ionic dirge
#

Hi everyone, please, I need your help. I currently use Google colab on a mobile device to run datasets. I only just started. I need to analyze datasets from kaggle. How can I use these datasets on Google colab without downloading it?

potent meadow
#

im doing a project for my uni and everything looks good except that the graph isnt showing anything
i cant fix it and i dont know where's the issue
can someone help? i can provide the code and other files and stuff but it's difficult to just upload everything here lol

ionic dirge
final jolt
final jolt
#

You can but I can't promise I alone could help. And probably won't have a chance to look today ๐Ÿ™‚

potent meadow
wraith jay
#

i need some pandas help

def in_prop(formula):
    parsed = _parse_formula(formula)
    bools = [el in elements for el in propeties.keys()]
    return all(data)

filtered = df.loc[lambda x: ~in_prop(x['formula'])]
df['en diff'] = filtered.map(en_diff)

i want to filter elements in the formula column of the dataframe based on the result of the in_prop function, but i cant figure out how to do it

#

this code doesnt work

viscid urchin
#

Sorry if I'm being dumb, what is en_diff?

#

also propeties is spelled wrong

#

and where is data coming from?

wraith jay
wraith jay
viscid urchin
#

Sure, but I mean, it's not in the code you show; is that variable in scope?

wraith jay
#

yeah

#

im doing this in jupyter notebook

#

heres the error im getting btw

TypeError: expected string or bytes-like object, got 'Series'
#

for this line filtered = df.loc[lambda x: ~in_prop(x['formula'])]

#

i just dont know how to do the filter

viscid urchin
#

Don't you want to say return all(bools)?

#

in in_prop?

#

Like..

def in_prop(formula):
    parsed = _parse_formula(formula)
    bools = [el in properties.keys() for el in parsed]
    return all(bools)
``` ?
wraith jay
#

ohh right

viscid urchin
#

Or is that not what 'parsed' is all about?

wraith jay
#

ty

viscid urchin
#

If I understand you correctly I THINK the way to say it is:

filtered = df.loc[~df['formula'].apply(in_prop)]
``` but I'm not a pandas wizard.
wraith jay
#

yeah data wasnt a variable. but the error still persists, i dont think its even calling the function yet

wraith jay
#

wait

#

nvm

viscid urchin
#

Remove the ~ if you want to invert it

#

and then you'd do df['en diff'] = filtered['formula'].map(en_diff)

wraith jay
#

ahh i see

#

thanks for the help!

limpid dew
#

Anyone know if there is a difference between embedding and one hot encoding?

grand minnow
limpid dew
grand minnow
#

I don't know what or how a DNN works

#

Sorry

limpid dew
glacial root
quaint mulch
quaint mulch
limpid dew
quaint mulch
#

embedding is just a concept for what's hapening on the 1st layer of DNN usually
the idea is it is projecting from data space to a latent space.

quaint mulch
quaint mulch
limpid dew
#

sadly I dont have an example

#

Just interested conspetually.

viscid urchin
limpid dew
#

Fundamentally, isn't embedding just a mapping from a set of binary features to an arbitrary vector?

quaint mulch
#

take an image or a sound file.

limpid dew
#

Thank you that's helpful.

quaint mulch
#

or like, stock prices

viscid urchin
peak hamlet
viscid urchin
#

Neat

limpid dew
viscid urchin
#

"learned vectors that place semantically or structurally similar items close together in high-dimensional space" is a definition I just found that I kinda like.

limpid dew
#

I like that as a high level explanation

viscid urchin
#

It's interesting I guess that this sense of 'embedding' is different from the broader mathematical term

limpid dew
#

Does have a bit of an LLM slant to it though

viscid urchin
#

Like, I was just thinking about whether a 'consistent hash ring' like you'd find in a distributed system is an 'embedding' of its nodes.. and I guess it is in the mathematical sense but not in the machine-learning sense.

limpid dew
#

I'm interesting in embedding dota2 and league heros/champions

limpid dew
viscid urchin
#

Imagine you have a bunch of servers and you want each of them to store an even chunk of your data.. You might use an algorithm that assigns them to 'positions' on a 'ring' or 'clock face'

quaint mulch
limpid dew
lyric furnace
limpid dew
quaint mulch
#

I mean, it is cited 140k times

#

Well, embedding is just a concept right? Usually the 1st layer we call it emebdding

#

in this paper they don't use the embedding concept, but you can think of the 1st layer as embedding

viscid urchin
#

Maybe I don't understand what you mean by 'simple mapping'

#

In the second one the embedding represents the 'latent factors' you are trying to optimize for

limpid dew
viscid urchin
limpid dew
#

I think the difference might be in how the vectors are treated after the first layer.

#

Perhaps the only difference is that, in embedding, the vectors are assumed to have the same basis, and therefore can be added together without increasing the dimensionality.

viscid urchin
#

multiplying a one-hot by the weight matrix selects that row I guess

#

But presumably in practice your embeddings would have a much more efficient way of being looked up

#
limpid dew
#

Perhaps it's more efficient but mathmatically is would be the same then.

#

that stack overflow article is helpful, thanks

#

I take that answer to mean that one hot encoding would be isomorphic to embedding.

junior venture
#

hi guys

limpid dew
#

Hi ccccccp

junior venture
#

can you say where general

limpid dew
#

were general

junior venture
#

yeah

limpid dew
#

nice

junior venture
#

wait i mean where general

#

hello?

#

@limpid dew are you here?

limpid dew
#

?

junior venture
#

where general?

#

...

limpid dew
#

idk what that means

#

what's this discords policy on esports betting?

viscid urchin
#

I can't think of a rule against it as long as it's not against any terms of service, but I guess be careful?

#

Maybe send a ModMail asking? Dunno.

limpid dew
#

I'm trying to build a model to beat betting odds for esports like dota or league.

#

looking for someone to help with the leage side of things as I only know dota.

#

each dota hero -> an embedding vector hense the questions earlier.

rich moth
#

Holy smokes guys! I just made a univerisal complexity scoring tool that works across every digital domain I could throw at it, tabular, time series, images, text, you name it.

In a nut shell it quantifies how hard each example is, automatically sorts training data, from easy to hard, hard to easy, or random. On time series its boosted it to 192% on time series data.

limpid dew
rich moth
#

Sorrry to flood the channel, I just felt this was too awesome to not share ๐Ÿ˜„

limpid dew
#

Could you give a little more detail?

#

What is a complexity scoring tool

#

do you mean Kolmogorov complexity?

rich moth
#

Sure! It bascailly quantifies how "difficult" or "complex" different data examples are for ML.

#

It repesents compleixty as a complex number ฮฆ(x), which provides both Magnitude and Phase arg.

limpid dew
#

Yes but how do you define complexity?

rich moth
#

With a mathmathically formula I made

limpid dew
#

Would you share that with us?

rich moth
#

Not at this moment in time, no.

limpid dew
#

Interesting, why do you chose to represent the complexity as a complex number?

#

Is it because complex is in the word complexity?

rich moth
#

๐Ÿ’ฏ ๐Ÿ˜†

#

A better way would be to say ฮฆ(x) gives us a single score that tells us both how hard a data example is and what kind of challenge it poses, so we can train models more efficiently.

limpid dew
#

You should check out Kolmogorov complexity. I think a similar version of what you describe has been done before.

limber spear
#

Still fire tho ๐Ÿ”ฅ

limpid dew
limber spear
limpid dew
limber spear
#

deadge catching some Zzzโ€™s later chat

rich moth
#

appreciate your guys feedback, bedtime now. but we can dig deeper tomorrow

#

Naturally makes sense text is the hardest for ML.

#

Languages, dialects, slang. ,translations. I mean the list goes, its chaotic.

#

Ok now bed time

limber spear
#

I did a brief online search and went into reading about Alan Turingโ€™s research

#

Bro wakey wakey eggs and bakey ๐Ÿ˜… have a good day chat or night

lapis sequoia
#

Time series is underrated. Very underrated. It requires patience.

timber trail
lapis sequoia
hearty token
#

For agglutinative languages where words boundaries are not defined by spaces, and where word segmentation is needed, what kind of definition should a "word" have? For instance, there are such things as compound verbs, these are conjugation of lone verbs that is in practice used as a single word, but in terms of meaning have their own spots in the dictionary. Should these be segmented or kept together? What is the benefit of keeping them together vs segmenting them? Practically speaking, for the usage of tokenization, would keeping them together be better?

#

my understanding is that since this is word segmentation, the segmented pieces doesn't need to be morphemes, and so keeping these compound verbs together would make more sense, but then there is also the fact of variation, the lone verbs in the compound verbs can take other forms as well, which doesn't betray its own part in other variaties. Would training on more segmented be better this way?

serene scaffold
#

What language is this?

final jolt
#

It would depend on the language but I would say the tokenization should be done at the level where the entire word actually makes sense.

serene scaffold
#

Even for languages like English, it's already a thing to have "sub tokens", which is really just when you tokenize at the morpheme level.

serene scaffold
hearty token
#

แ€‘แ€ฎแ€ธแ€–แ€ผแ€ฐแ€†แ€ฑแ€ฌแ€„แ€บแ€ธ แ€™แ€šแ€บแ€แ€ฑแ€ฌแ€บแ€”แ€แ€บแ€žแ€Šแ€บ แ€€แ€ฝแ€™แ€บแ€ธแ€†แ€ฑแ€ฌแ€บ แ€™แ€„แแ€™แ€šแ€แ€–แ€…แ€žแ€Š แ‹

#

e.g.

#

Is there a case where segmenting compound words together would make sense?

serene scaffold
#

I would just always tokenize them separately and let the model figure it out.

final jolt
serene scaffold
final jolt
#

I see I misread the end of their question about training and was only thinking of interraction. yes I do agree that breaking the words apart into their repeated components is worthwhile. It could even be worth doing both?

hearty token
serene scaffold
hearty token
hearty token
#

iirc openai uses tiktoken and they don't do that

naive axle
#

I was training a pytorch mobilenetv2 model on limited dataset - only have 1 image for a class so I used data augmentations to make it 10, even with training accuracy around 0.96 it cannot recognize images outside of training dataset their are not in the first 10 predictions . only difference I see in new image and training image is the background and size, and I applied gradient background/resize to training images hoping to resolve this issue. is it worth to use data augmentation on same source image and train model?

rich moth
rich moth
limber spear
#

Even our beloved Python stacks

#

Python is one of if not the goat ๐Ÿ™Œ

rich moth
limber spear
#

Laughing more about dynamics in the community. The C community is the old guard

#

Most donโ€™t believe any of this data sciencey ai mumbo jumbo

glacial root
#

in embedded systems applications c/cpp are pretty much necessary

limber spear
#

They could easily claim that C drives Python

glacial root
karmic pond
#

Hi everyone

#

Does anyone work in AI roles?

lapis sequoia
#

Why are there people who talk about LLMs all of the time, but have never ever mentioned a transformer?

serene scaffold
serene scaffold
karmic pond
#

@serene scaffold I would like to know what technologias should dominate to work with Azure, I have the pcap and it is Ml,DL

serene scaffold
viscid urchin
#

What's a pcap?

karmic pond
#

@viscid urchin The certificate

#

@serene scaffold knowing python and AI

#

Can I get a job related to Azure?

serene scaffold
karmic pond
#

Ok

serene scaffold
opaque condor
#

So if I wanted to make my own AI data set it would go like this?:

Folders

                  Eggplant:
                               Eggplant_image1
Eggplant_image2
Eggplant_image3
Eggplant_image4```
karmic pond
#

@serene scaffold Thanks bro

grand minnow
#

For anyone who uses LLM, how do you currently track your tokens for input and output?

rich moth
#

So anyways, after the tokenzier is loaded you can use .encode() method. You give it a your text string (prompt) and it gives you back lost of numbers which are your token IDS . To finally get the actual token count you use len() function on that list.

grand minnow
#

So basically... There's no active library/app that I can integrate to get the cost of tokens used per user?

rich moth
grand minnow
#

Alright thanks

rich moth
#

I mean theres more than one way todo it thats for sure.

#

Maybe addtional feedback would be helpful

limber spear
#

Wanna see what real feature engineering in machine learning looks like ๐Ÿ˜

opaque condor
lavish wraith
#
    figsize(8,8),
    subplots=True,
    layout=(2,2,),
    sharey=True,
    legend=False,
)
plt.show()
``` in plot method what does mean layout (2,2)  ``
(rows, columns) for the layout of subplots.```
lavish wraith
#

in this example layout=(2,2) what does 2,2 mean it create 4 subplot ??

final jolt
limber token
#

Do you guys know of services that host zero shot classification models and serve them as REST APIs?

serene scaffold
limber token
serene scaffold
limber token
#

My boss wants the reasonable amount of time to be a bit unreasonable so yeah lol

serene scaffold
#

I'm not aware of a cloud compute service that offers GPUs at a lower price than AWS, but I run all my stuff on my company's own hardware.

limber token
serene scaffold
#

what model is it?

limber token
#

valhalla/distilbart-mnli-12-9

#

Is ONNX runtime something that would make sense here to try and lower GPU costs?

slender pasture
#

Hi all,
Hope you are all doing well.
I wanted to see if someone could help me out on something that's been cracking my skull for 3 FULL days now in matplotlib/numpy

I want to use spectogram data to plot ontop of the spectogram, the plotting part is easy, but I cant wrap my head around the data.

spectrum, frequencies, time, image = spectogram()

Is there any way I can transform this data to get x, y, z points?

example of a before and after attached

#

Any help is appreciated.
I know
time = x axis
frequencies = y axis
But how do I transform spectrum, which is a 2d array, into z values in order to decide whether to plot a point there or not?

wooden sail
#

the row and column index of the 2d array correspond to a frequency and a time, so you can then use that to e.g. plt.scatter(time_val, freq_val) whenever the spectrum value exceeds a threshold

slender pasture
#

Wow, thanks, I cant believe how simple that was.
I think my mistake was I was trying to implement all the logic within a the view_lim of the graph, to only calculate whats visible. And i kept mixing up the arrays.
Thanks so much @wooden sail

Now that I can get the Z data I'll try to clip spectrum to fit into the visible axis x and y limits. If you have any suggestions they are more than welcome.
Else I will post results later on.
Thanks!

wooden sail
slender pasture
#

You know how you can zoom and pan around the graph?
Well I only want to calculate on whats visible in the current image.

I know that with the following I can get the limits of the graph that are currently visible

y0, y1 = self.ax1.get_ylim()
x0, x1 = self.ax1.get_xlim()

That way I can plot the line only on whats visible.
So i would have to find a way to clip spectrum to fit within x0, x1 and y0, y1

wheat snow
#

hello!

i used a short python script to concatonate a few json files intoa csv and set the Timestamp column as index:

df['ts']= pd.to_datetime(df['ts'])
df=df.set_index(df['ts'])
df=df.drop(columns=['ts'])
df.index = df.index.tz_convert('Europe/Berlin')

df.to_csv('C:\\Privat\\Python_VSC\\Spotify\\MyData_2025\\Data_concat.csv')

ye i importated the new saved Data_concat into a jupyter notebook

#

unfortunatle the Dtype of the idnex is

dytpe("O")
#

so i cannot use commands like df.index.hours

viscid urchin
#

Why are you dropping the ts column after making it be the index?

#

Don't you just want?

df['ts'] = pd.to_datetime(df['ts'])
df = df.set_index('ts')
df.index = df.index.tz_convert('Europe/Berlin')
df.to_csv(r'C:\Privat\Python_VSC\Spotify\MyData_2025\Data_concat.csv')
``` ?
wheat snow
#

this is waht the ts column looks like in the json itself:

2021-01-06T19:04:34Z```
wheat snow
viscid urchin
#

Hmm, is that how that works? Interesting.

wheat snow
#

i mean i can use the index as basis for new columsn like Hours weekdays and so on

#

for x axsises

#

anyway ye, the json timestamp is UTC

#

i need it in Berlin Time so i converted it in the python script

#

yet as already mentioned when opening it in jupyter notebook i am loosing the dtype of that index

viscid urchin
#

Well, it's a CSV, everything starts out as a string, right? How are you asking pandas to cast the values on load?

viscid urchin
#

Sure but you have to pass the dtype argument

#

(like, a dict mapping column names to types)

wheat snow
#

but trying to run df.index= pd.to_datetime(df.index) returns the following

#

ValueError: Array must be all same time zone

#

and thats true, cause some conversions are made to +2 and +1

#

i assume because of szummertiem change

#

anyway i have to analyze the difference and the value_counts of the different +02:00 and +01:00 that exist

#

probably a string splitter?

viscid urchin
#

Oh that's weird; so it won't let you tz_convert the ISO8601 strings?

wheat snow
#

but i know it worked before

#

cause it was an old project ijust picked up again

#

i mean i casted tz.convert in the pythjon script, and that works

#

trying to run it in the jupyter just returns me that i cannot use that command on the index

rich moth
#

is it a trading bot? just curious. felt like ive dealt with this before

wheat snow
#

cause it thinks its not a datetiem object

wheat snow
rich moth
#

ahh ok

wheat snow
#

my top 1 song is still from kanye west

#

ims o cooked

#

307 plays of Stronger

#

2.3k hours is actually fine for 7 years

verbal oar
#

gen ai course, video or book?

zealous dawn
#

Hello, I have a little problem, got some embeddings done in clap (512vectors) and want to cluster them using HDBSCAN, I get OOM pretty quick because I've got the embeddings on 50k files. How can I fix this, it's kinda out of my league.

Tried some LLM answers was:

use k-NN to build a sparse distance matrix and metric='precomputed'

Dimensionality Reduction with PCA

verbal oar
#

pca is compression technique

#

dont sure but can stem or lemmatize words

#

to make fewer or have shorter words

#

also chunk files to not process at once 50k

#

I mean split

zealous dawn
verbal oar
#

ah ok

zealous dawn
#

maybe im mistaken and maybe there is a way to split them on disk, but honestly I dont have that much expertise so im looking for a solution

#

and Id like to avoid taking smaller samples because then ill have to reorder all the files I have in the correct clusters

viscid urchin
#

I see a paper about streaming DBSCAN but I don't, sadly, understand it yet.

zealous dawn
#

Ill take a look

thick rapids
#

hey guys

#

do you suggest tensorflow over pytorch for machine learning

#

general purpose machine learning

agile cobalt
thick rapids
untold fable
#

from flask import Flask, render_template, request, jsonify
import requests

app = Flask(name)

Replace this with your Hugging Face token

HUGGINGFACE_API_KEY = "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"

Inference API URL

API_URL = "https://api-inference.huggingface.co/models/HuggingFaceH4/zephyr-7b-beta"

headers = {
"Authorization": f"Bearer {HUGGINGFACE_API_KEY}",
"Content-Type": "application/json"
}

@app.route('/')
def home():
return render_template("index.html")

@app.route("/api/chat", methods=["POST"])
def chat():
try:
user_message = request.json.get("message")
print("User Message:", user_message)

    prompt = f"You are a helpful medical assistant named Oxy. {user_message}"

    # Generate a reply using a local model pipeline (if installed)
    # Or you can use a simple hardcoded reply for now
    # Here's a dummy response for testing:
    response_text = ""

    if "fever" in user_message.lower():
        response_text = "It sounds like you have a fever. Stay hydrated, rest, and monitor your temperature regularly."
    elif "cold" in user_message.lower():
        response_text = "Symptoms of a cold include a runny nose, sore throat, and mild fatigue. Get rest and drink fluids!"
    else:
        response_text = "Sorry, I couldn't understand. Try rephrasing your question."

    return jsonify({"reply": response_text})

except Exception as e:
    import traceback
    traceback.print_exc()
    return jsonify({"reply": "โš ๏ธ Something went wrong. Please try again."}), 500
arctic wedgeBOT
# untold fable

Please react with โœ… to upload your file(s) to our paste bin, which is more accessible for some users.

untold fable
#

i need help why this is not workig

agile cobalt
#

Ideally shouldn't include it in the code in first place, but rather use environment variables or other ways of managing secrets

#

(go delete/revoke/regenerate it in your HuggingFace settings ASAP)

fallow coyote
gray slate
#

you can put .env into .gitignore so it doesn't get shared with anyone

fallow coyote
calm thicket
leaden narwhal
#

Hello everyone, this next weekend Iโ€™m going to have a coding challenge and Iโ€™m going to need to tackle docker, aws s3, lambda and ec2, flask/fast and restapi and pytest. Does anyone have a comprehensive kaggle notebook or GitHub repository link in which I can get some practical experience. Thanks!

serene scaffold
#

or possibly any, since you don't really use flask, fastapi, or pytest in a notebook.

#

and docker and aws aren't part of python.

mighty grove
#

Hello! Anybody here with experience with the Awpy library?

serene scaffold
leaden narwhal
gritty vessel
#

Hey guys

#

How do you find relation between continous values and categorical values?

#

i checked the distribution of data what are the ways i can see if there is linear or non linear relation between them

#

pearson coorelation will not work as we need mean as well in it

#

but we cant find mean of categorical values

leaden narwhal
#

Cosine similarities?

#

Maybe Iโ€™m yapping

gritty vessel
#

its not used for coorelation I think

leaden narwhal
gritty vessel
#

annova has assumptions that data is normal distributed

leaden narwhal
#

Normalize it then

gritty vessel
#

i got three categories

#

I got three categories

#

01,2

leaden narwhal
#

GMP is your target right?

gritty vessel
#

and TIR!,WV,MIR are continous vars

gritty vessel
#

so i plotted the distribution of each TIr1,wv,mir whenever flag is 0 ,1,2

leaden narwhal
#

Have you done modelling yet

gritty vessel
#

no I am performing eda

#

trying to find if there is any relation between these vars with GPM Flag

#

after this I will move to Modelling

leaden narwhal
#

Try kruskal-wallis

#

Test

gritty vessel
#

okie

#

do you think this is normally distributed?

leaden narwhal
#

Itโ€™s used for not normal distributions

gritty vessel
#

yes I checked it

leaden narwhal
gritty vessel
#

doing it now

#

I was watching a vid about it

#

still watching I have to watch mann witney test before this

gritty vessel
quaint mulch
gritty vessel
#

okie just a min

quaint mulch
#

another alternative is by adjusting alpha. this is a matter of preference

#

depending on the number of sample, if you only have 3 continous variable, you can also plot 3 scatters, each scatter, will plot 2

#

(Actually seaborn pairplot is the way to go haha) gives you everything you need

gritty vessel
#

interesting

gritty vessel
#

I bin the data based on the categories and then I create plot Like I am doing i nthese

quaint mulch
#

sns.pairplot(penguins, hue="species")

quaint mulch
# gritty vessel

no, I mean the other way around.
for each figure, you plot the same variable e.g. TIR1
but you plot 3 different histogram for differeng GPM_Flag

gritty vessel
#

Oh ok got it

#

so we will find difference

#

I excluded Flag = 0 as it represents no rain

#

and its in dominance and because of that flag 1 and flag 2 were not visible properly

quaint mulch
#

desnity=True
coz you have imbalance data

#

and you can plot all 3 if you use density=True

#

it feels so weird to do back seat EDA but also fun

#

lol

gritty vessel
#

Why we are not considering frequency ?

quaint mulch
#

well, we are trying to compare the distribution no?
like if they have different mean/std / peak etc2?

#

if they have different shape?

gritty vessel
quaint mulch
#

yeap

gritty vessel
quaint mulch
gritty vessel
#

no

#

for flag 0

#

TIR1_Temp is on higher side

quaint mulch
quaint mulch
gritty vessel
#

for flag 1 it goes to lower side

quaint mulch
gritty vessel
#

yes

quaint mulch
# gritty vessel

maybe im blind lol, or just preference, but I prefer to look at these 3 plots, instead of that grid of 9.

gritty vessel
#

its easier to comapre when its three in one

quaint mulch
gritty vessel
#

just visulisation

quaint mulch
gritty vessel
#

Good enough

#

it looks like Tir1 will be the major predictor

#

for flag

#

Im sending the pairplot

gritty vessel
quaint mulch
#

lol hahaha, you don't have to send it to me, it is for yourself to dechiper

#

enjoy dinner, I have to sleep too

gritty vessel
quaint mulch
#

hahaha, you need to ajdust alpha, maybe use KDE, and turn on density
Might be easier to redo it in pyplot manualy
but this might give you interesting insight, and I hope you get the gist

steep cypress
#

hello ๐Ÿ‘‹
had a question, how to approach multivariate time series anomaly detection with or without ML? Actually the inputs are signals in a steel manufacturing plant -- so signals of equipment at latter stage correspond to the same metal from earlier region.
There's data drift as well i.e. whenever the plant turns on after a config change, value range of many signals change. We have over 100 signals corresponding to many regions

would really appreciate some guidance and thoughts!

gritty vessel
rich moth
gritty vessel
steep cypress
gritty vessel
#

something like pinns?

#

but for signals

rose spade
#

How do I create and train my own AI Model in scratch for a voice recognition with text translation using PyTorch?

gritty vessel
lapis sequoia
#

you do ml with python or c++?

gritty vessel
#

python

lapis sequoia
gritty vessel
#

c++ is faster

#

but I need more functions and all for easier workflow

wheat snow
#

how do i check dtype of smth again?

#

i suppose it is a numpy array

viscid urchin
#

somearray.dtype

wheat snow
#

i remembered type(smth) exists

#

in normal python

#

can someone recommend a course/video series/webpage/book to learn about some matplotlib fundamentels and how i responsible setup plots with stuff like .subplot, .ax and that stuff.... and especially when and when to use the fig,ax setup method

glacial root
#

for a liscence plate recognizer would using easyocr be considered cheating

viscid urchin
#

Depends on the "rules"; it's certainly not cheating if you just need to get it done

glacial root
#

so i think i'll stick to just doing implementations from scratch and using some libraries for projects

#

but it seems like easyocr takes off a little too much of the work

viscid urchin
#

easyocr turns it into a text classification problem vs. an image recognition one, yeah

glacial root
#

i see

#

my reason for doing this project is to use both

#

so the end goal is for it to be able to recognize the location of the liscence plate on an image and then print the plate number elsewhere

#

though i have a lot to learn before i can do this if i want to truly understand the concepts behind it

rustic elk
#

I'm building a KNN using sklearn and using cross-validation to find the best value for k.
Is it overkill to test different numbers of folds and pick the most common k across them?

serene scaffold
rustic elk
#

Yes forgot to mention, different k values as well

serene scaffold
#

(it's usually called k fold cross validation, but we're already using k for k nearest neighbors, so I picked a different letter)

serene scaffold
rustic elk
#

but anyways, thanks

serene scaffold
#

yw

#

you can use itertools.product to loop over all the "combinations"

rustic elk
#

I did it "manually" with the help of GridSearchCV from sklearn because I want to plot the results for each fold / k later

serene scaffold
#

sure

toxic pilot
#

getting really wild fluctuations while testing with the valid set, but the train set seems to be doing fine... anybody know why this might be?

#

increasing batch size seems to help a little bit, so i guess im over fitting? it's a binary classifier and my average validation epoch accuracy is hovering at around 50-60% (so it looks like pretty much random classification)

tawdry finch
#

i dont want to get into web development i want to becoma a data analyst or fo in the field of fintech, ai ml what should i do i know some basic python and currently learning flask and web scraping etc

#

i m a student by the way just cleared of high school

lavish wraith
#
print(my_array)
print(my_array.dtype)

my_series = pd.Series([1,np.nan,2,3])
print(my_series)```
 what is difference numpy data type and pandas numpy and pandas also same  handle missing value
wooden sail
#

they're basically the same, pandas uses numpy

clear topaz
#

Hello everyone!
Can anybody suggest some good resources to learn AIML? I am already good with basic data structures and have done some web scraping, so I think I am eligible to dive directly into concepts.

#

Oh... This chat seems to be really dead

grand minnow
clear topaz
grand minnow
#

It doesn't look that confusing. Its literally a resource list

dreamy night
#

guys is there any good course available on statistics ? for ml and dl enthusiasts

grand minnow
clear topaz
dreamy night
clear topaz
#

Ok

verbal oar
#

is code generation special case of nlg?

#

I searched and says nlp so maybe yes

serene scaffold
toxic pilot
serene scaffold
#

numpy has been the backend for pandas, but that's largely an implementation detail.

verbal oar
#

so it looks like I must sample github for code?

#

because massive amounts of data is beyond my storage

#

looks like biggest roadblock is data collection when I have data I just use some llm model train on data and predict as usual

#

Data:
AI code generation models are trained on massive datasets of public code, including open-source projects.

#

maybe there is some code dataset?

#

it will be quicker than collecting

#

just want to see how it works assuming I have already data

#

I assume its not like
prompt: write hello
code: print("hello")

result: print("hello")
because it looks like pairs or rule based

toxic pilot
verbal oar
#

ok

toxic pilot
rustic elk
#

How likely is that two distinct k values perform exactly the same in a knn algorithm ?

#

Precision scores (and acc) are exactly the same (15+ floating point precision)

toxic pilot
#

basically unless you have an incredibly noisy dataset or a very easily separable data set, it probably won't happen

umbral hearth
rustic elk
#

I am using sklearn, the algorithm is not my own

verbal oar
#

looks like bad param/params

toxic pilot
rustic elk
#

shi

verbal oar
#

compare your to some example knn

toxic pilot
umbral hearth
#

I got a really quick question
Is it worth it to get the PCPP-32-1?
I got pcep and pcap and i feel like pcpp practices arent that important

verbal oar
#

like not scaled, normalized?

rustic elk
# verbal oar compare your to some example knn

I mean, mine is pretty straighfarward as well

 def feed_data(self, train_data):
    self.X = train_data.drop(self.response_column, axis=1)
    self.y = train_data[self.response_column]
                                                                                                  
    categorical_cols = self.X.select_dtypes(include=["object", "category"]).columns.tolist()
    numeric_cols = self.X.select_dtypes(include=["int64", "float64"]).columns.tolist()
                                                                                                  
    self.preprocessor = ColumnTransformer(
        transformers=[
            ("num", StandardScaler(), numeric_cols),
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ]
    )
                                                                                                  
    self.X_train, self.X_valid, self.y_train, self.y_valid = train_test_split(
        self.X, self.y, test_size=self.test_size, random_state=self.random_state, stratify=self.y
    )
#
def fit(self):
    if self.best_k is None:
        raise ValueError("k value has not been set yet. Call find_best_k() or define it manually when creating the KNN")
    if self.X is None:
        raise ValueError("Training data has not been fed to network yet. Call feed_data() first.")
                                                                                                                                  
    self.final_model = Pipeline(
        [
            ("preprocessor", self.preprocessor),
            ("classifier", KNeighborsClassifier(n_neighbors=self.best_k)),
        ]
    )                                                                                                                           
    self.final_model.fit(self.X_train, self.y_train)
                    
    validation_predictions = self.final_model.predict(self.X_valid)
    validation_accuracy = accuracy_score(self.y_valid, validation_predictions)
    validation_precision = precision_score(self.y_valid, validation_predictions, average="macro", zero_division=0) # type: ignore
                    
    self.validation_metrics = {
        "Accuracy": validation_accuracy,
        "Precision": validation_precision,
    }
                   
    precision_yes = precision_score(self.y_valid, validation_predictions, pos_label="yes", zero_division=0) # type: ignore
    precision_no = precision_score(self.y_valid, validation_predictions, pos_label="no", zero_division=0) # type: ignore
                      
    accuracy_yes = (self.y_valid[self.y_valid == "yes"] == validation_predictions[self.y_valid == "yes"]).mean() # type: ignore
    accuracy_no = (self.y_valid[self.y_valid == "no"] == validation_predictions[self.y_valid == "no"]).mean() # type: ignore
                        
                    
    self.final_model.fit(self.X, self.y)
         
    return self
verbal oar
#

ok so you have standardscaler

rustic elk
#

I don't remember why I added it, I think I was getting perfect results without it

#

lemme check something

toxic pilot
verbal oar
#

yes to avoid model of remembering

toxic pilot
#

actually idk if that impact will be huge if ur tanking your weights between each run

toxic pilot
#

those certifications and stuff are pretty much entirely meaningless

rustic elk
#

removing StandardScaler my acc and pre goes down, also my network seems to prefer the smallest k value

verbal oar
#

but scaling is best practice

rustic elk
#

(I am testing different number of folds and k's)

toxic pilot
#

that might help a bit

rustic elk
#

wdym ?

toxic pilot
#

also it might just be your dataset

toxic pilot
rustic elk
#

This is with StandardScaler(), seems better I think but its weird that 3 and 7 have the exact same performance idk

#

First time tackling with this stuff so yea

rustic elk
toxic pilot
#

linear discriminate analysis or whatever

verbal oar
#

what this is about classifier of what?

toxic pilot
#

and ofc it might be your dataset

verbal oar
#

so looks like noisy dataset and or need still some preprocessing

rustic elk
toxic pilot
rustic elk
#

for k = 3 the results are good no ?

#

just weird but idk if I want to bother more with it

toxic pilot
rustic elk
#

why ?

toxic pilot
#

theyโ€™re good!

rustic elk
#

ah aright

verbal oar
#

overfitting?

toxic pilot
#

feel like pushing for more accuracy would beโ€ฆ pushing it

rustic elk
#

Exactly the same with k = 7 but ill leave it like that

toxic pilot
#

but usually if overfitting youโ€™d see wild fluctuations in the validation

#

and thereโ€™d be a huge gap between accuracy during train and test

clear topaz
#

Damn, I gotta learn so much yet
My syllabus in AIML diploma hasn't even truly started ig

verbal oar
#

yes like these classic curves

#

on plot

clear topaz
#

Lmao

toxic pilot
#

i donโ€™t think itโ€™s over fitted

verbal oar
#

better seen on plot

rustic elk
#

wouldn't that prevent it

toxic pilot
verbal oar
#

ok yes no but is it spam or else?

#

I mean what are you trying to classify?

rustic elk
#

idk what spam is

#

I am trying to predict the if a person is a responder on a campaign based on some past data

verbal oar
#

ok maybe now it will be easier to help

rustic elk
#

64 female urban free never 1.00 1.00 0.00 0.00 0.00 no

#

This is what the data looks like

#

first number is age and the rest are logins the last 4 weeks, 6 months, purchases in last 4 weeks, 6 months and total purchases

#

they arent between 0-1 it just happend to be to this example i copied

verbal oar
#

and last is no/yes i see

rustic elk
#

I don't even know if there is something wrong, the only odd thing is the similarity between 3 and 7 k values

#

thats all

verbal oar
#

I suggest to compare your code with some tutorial knn but watch out there will be different problem and dataset, but many things have in common

rustic elk
#

aright ill see what I can do

rustic elk
#

so I guess is an "ok" phenomenon

verbal oar
#

yes looks like it could be, right

#

but you have 3 and 7 not 3 to 7 ๐Ÿค”

toxic pilot
rustic elk
#

Ill try to copy his code see what i get

toxic pilot
verbal oar
#

i saw weird jumps

rustic elk
#

I think i did an opsie

verbal oar
#

perhaps

toxic pilot
#

wait iโ€™m not seeing where the problem is

#

fluctuations are p normal

#

just not huge percentages in fluctuations

opaque condor
#

How do I implement a stop so the network doesn't over fit to the data

serene scaffold
opaque condor
#

Thank you

verbal oar
#

self.final_model.fit(self.X_train, self.y_train)

...

self.final_model.fit(self.X, self.y)

is it ok?

#

not just one .fit?

toxic pilot
#

another way to check is to just see if your epochs are seeing any improvements

toxic pilot
# gritty vessel c++ is faster

most of the data processing or ai/ml related libararies are implemented with C under the hood. python is a wrapper, and the overhead is not worth talking about

gray slate
#

basically hot loops go a language that's slow to write and fast to execute, cool ones go in one that's easy to read and fast to write

toxic pilot
ripe rampart
toxic pilot
#

hai :D

flint onyx
#

how would I do this question? ๐Ÿ˜ญ have an exam tmr

#

would u1 be from bottom left to top right
and u2 perpendicular to that?
both passing through that black circle in the middle

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied timeout to @rich moth until <t:1745818246:f> (10 minutes) (reason: attachments spam - sent 8 attachments).

The <@&831776746206265384> have been alerted for review.

limber spear
#

Oh snap. Plunder why are you spamming the channel ๐Ÿ˜‚ post your plots in a single directory file or something

limber spear
#

Looks like you just have to draw 2 arrows here u1 and u2 in the direction of the PCA. Not sure about this. What do you think chat

rich moth
limber spear
#

This channel is too fire.

rich moth
#

all the data I've been looking at about this makes me wonder something about data's overall structure, almost like its "dimensional constraints". so I made this formula. but in it I have this phase and it seems this phase angle is basically acting like almost some type of trace for that idea.

#

im calling it like a "structural DNA", but it seems inherent it in all data types based on the complexity measurement tool which gives you ฮฆ(x)

#

anyone? what do you guys make of this ? honestly

limber spear
rich moth
#

well at first i though just the different curriculm method you used determined by the domain was the key, but then i saw something different in all the data . something about the inherent complexity on data thats based off magnitude and phase paint a realy important picture

verbal oar
#

can I transfer learn text to code I mean ready made text model which works on code?

#

because code is some kind of text

rich moth
#

๐Ÿคฏ

#

what?

verbal oar
#

so use some text generating model to adapt to code

rich moth
#

ok

#

Im super confused.

limber spear
#

Text to code. Like text to speech blobhuh TTS

verbal oar
#

I see there are code gen models but text gen models are more

rich moth
#

Isnt Mistral a good all round one with coding?

#

Ohh i see.

verbal oar
#

dont know yet

rich moth
#

Text to code.

verbal oar
#

yes

#

write hello
print("hello")

#

text to code

limber spear
#

I donโ€™t think even ChatGPT can do that at the moment. Or the trending DeepSeek model tbh

#

Claude. Pretty much every โ€˜cutting edge AIโ€™ today ๐Ÿ˜‚

verbal oar
#

yes claude can

limber spear
#

Explain explain pikawow

verbal oar
#

for example can generate some snippet of sentiment analysis

limber spear
#

Does it have speech to code

verbal oar
#

dont know

limber spear
#

But my point is. Your idea is innovative ๐Ÿ‘

verbal oar
#

I didnt tested it i just got some code from someone one year ago and I thought he wrotes but said its claude output

limber spear
#

If big tech steals our ideas know it came from the Python community ๐Ÿ˜Œ

polar shard
#

hi

verbal oar
untold fable
#

hey man

#

how are you

verbal oar
#

fine
This is neat
I wrote
write sentiment analysis
got text
then
write code for it and above code as in screenshot

#

I dont think my idea is innovati e text to code is already

vivid skiff
#

what is the difference between torch.compile and torch.export?

verbal oar
#

compile is same here as in keras?

#

still dont know how it works I mean input text output code

#

I saw somewhere it tokenizes code but what next?

limber spear
# verbal oar I dont think my idea is innovati e text to code is already

You have to think text to code <-> code to text <-> text to speech <-> speech to text. When YouTube videos or Zoom and Teams meetings are transcribed to text(subtitles), they are slow and 100% not accurate.

When you apply this idea to applications of text to code, there is nothing that comes to mind that is cutting edge โ€˜stableโ€™ even with sentiment analysis

#

Mind you these are โ€˜production gradeโ€™ products ๐Ÿ˜‚

#

Ok I have to backtrack. People worked hard on these products.

verbal oar
#

so this is scam which need developer to fix ๐Ÿ˜‚ ?

#

I dont trust these tools I just want to learn

#

I dont care about it I just want to make money if its legal

#

personally for me I dont use it

#

I have mindset just do dont care

#

why I dont trust because code on which is trained is not shared

#

its like lets train model on leetcode

#

but better to train it on gh repos but still different people have different style of writing code

#

for me its little controversial

#

and also I dont trust text generation I dont just see proof it works

#

if it makes mistakes or make mistakes sometimes

toxic pilot
verbal oar
#

ok explained thanks

toxic pilot
#

i suppose stacked LSTMs could also work but itโ€™d be slow and very inaccurate

#

you need the attention layers of transformers

toxic pilot
verbal oar
#

so just look at process of generating text and generalize it to code?

verbal oar
#

different but similar

toxic pilot
#

thatโ€™s a pretty big oversimplification tho

verbal oar
#

just implementation differs

toxic pilot
toxic pilot
#

i though you said speech to text ๐Ÿ’€

limber spear
#

I was about to catch some Zzzโ€™s ๐Ÿ‘€

#

The only difference between what some of us do in this space is 2 words, custom and proprietary. That is how I look at it

verbal oar
#

no speech to text no
Im talking about text generation and text to code

#

and similarities about it which I didnt talk

limber spear
#

It can get very complex very fast mq. For example have you looked into abstract syntax trees

verbal oar
#

claude gave me lstm based instead of transformer text generation
for text to code I reached limit :sweatsmile:

#

on nlp course I had sth about formal grammars etc

#

Ah ast right because it parses code

limber spear
#

Data science is a very new field, but in my opinion the foundation of it also includes foundations in computer science

verbal oar
#

yes better explored with context of it

limber spear
#

My hopes are that a lot in this field study rigorously. There is a lot to explore ๐Ÿซก

verbal oar
#

yes for example feeling semantic web but why i need it and then you meet topic of nlp where you see usage of semantic web and ontology

#

yes btw wordnet is for text is there sth for code?

limber spear
#

Maybe lspโ€™s

toxic pilot
toxic pilot
#

you can feed your context as an AST and have the model predict the next node. then, deterministically convert the ast to the language in question

verbal oar
#

lsp as in language server or other meaning?

#

yes I think it was about language server protocol

#

which provides language intelligence tools

#

of course server provides not lsp

toxic pilot
#

???

#

you dont need an lsp to do code generation

verbal oar
#

maybe lsp's

quaint mulch
limber spear
#

This is the backend of every programming language

#

You can dig further into compilers but that goes more to foundations in computer science

limber spear
#

What yโ€™all cooking chat

#

Iโ€™m handwriting a cart decision tree build this week

limber spear
#

Iโ€™m not sure what do you think chat. Where are the tokenizers deadge

toxic pilot
limber spear
#

Iโ€™m well aware of llvm. Chris is in my LinkedIn network lol

toxic pilot
#

all lsp does is interface with the IDE to provide syntax highlighting and completions and so on

#

nothing to do with the backend of compilers

verbal oar
#

about to make some python code generation based on some gh repos, some prototype

toxic pilot
#

shouldnโ€™t really matter. just use bert_cased or smth

#

might be useful to train a tokenizer on programming languages tho

limber spear
#

Letโ€™s help make mq the next Steve Jobs chat

verbal oar
#

get rid of unemployment would be enough ๐Ÿ˜„

limber spear
#

And ethqnol his Steve Wozniak ๐Ÿ˜‰

#

I can be some rando founding dev deadge

#

โ€˜No one cares about him. He is just a founding employeeโ€™

verbal oar
#

so frontend of app is done with streamlit or flask?

#

instead of just showing in console which is not user friendly

weary timber
#
#

please help me with this im desperate i tried everything ๐Ÿ˜ข๐Ÿ˜ข๐Ÿ˜ข

toxic pilot
#

try decreasing batch size

flint onyx
toxic pilot
verbal oar
#

this is hugging face specific train and evaluate instead of fit and predict?

#

trainer.train()

serene scaffold
# verbal oar `trainer.train()`

I suppose. I think that with trainer objects, you specify the training and test data in advance, whereas with sklearn style models, you pass some or all of the training data (and not the test data) to fit.

toxic pilot
serene scaffold
toxic pilot
serene scaffold
#

No, it has lots of supervised models

toxic pilot
#

ah it does have KNN

#

it looks like

limpid dew
#

what is KNN?

serene grail
#

I'm guessing K Nearest Neighbors in this context

stray gulch
viscid urchin
#

Maybe this one is better because it summarizes the back-story more? https://pyimagesearch.com/2021/05/06/implementing-the-perceptron-neural-network-with-python/

past meteor
viscid urchin
#

They are sorta "my first supervised learning", right?

#

I may just be out of date though, hence the question I guess.

past meteor
#

It's fairly straightforward but I'd recommend just covering linear and logistic regression

limpid dew
#

I understand how trees work but could anyone explain to me how models which use trees like random forests learn the binary operators at each node? Is there some process which is analogous to back propagation?

serene scaffold
#

with a random forest, you use all the decision trees that you made, and take all their predictions. you can either use the most frequent prediction as the system prediction, or weigh the prediction of each tree differently, or whatever you want.

toxic pilot
opaque sphinx
#

hi is anyone here familiar with google colab? having trouble with enabling t4 gpu runtime, need some help if possible. doing unsupervised learning here and my code seems stuck, found out that Im running with cpu so i changed to gpu t4 and theres a warning that says im not utilizing gpu

serene scaffold
opaque sphinx
#

I tried googling it and someone mentioned i shud install pytorch and fastai to run it, when i did this came out instead

serene scaffold
#

What library is your code using? Torch?

#

Oh, if you've maxed out your free GPU limit, you'll have to pay or wait.

#

Please always always share code as text

#

!code

arctic wedgeBOT
#
Formatting code on Discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

opaque sphinx
serene scaffold
#

@opaque sphinx please permanently remember this ^^^^

opaque sphinx
#
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
import time

from collections                import Counter
from imblearn.over_sampling     import SMOTE

from sklearn.model_selection    import StratifiedKFold, train_test_split
from sklearn.preprocessing      import LabelEncoder, StandardScaler
from sklearn.ensemble           import IsolationForest
from sklearn.svm                import OneClassSVM
from sklearn.neighbors          import LocalOutlierFactor
from sklearn.metrics            import (
    classification_report,
    precision_score,
    recall_score,
    f1_score,
    make_scorer
)
from sklearn.inspection         import permutation_importance

warnings.filterwarnings('ignore')

serene scaffold
opaque sphinx
#

abit of context is I am trying to run to detect anomalies in a prescription dataset to detect errors, so i been running isolation forest, SVM and localoutlier factor, but the recall is too low, hence ive been doing tuning and using SMOTE for oversampling, but bcs of this I cant do so

opaque sphinx
#

I also have a macbook M3 pro

opaque sphinx
#

been doing some reading online saying I need to run the model on pytorch, the documentation says i need to run tensorflow, but when I did the error came out again

serene scaffold
#

If they told you that you used up your free compute, they're not kidding. I think it resets every day

serene scaffold
#

And it should be pytorch.

serene scaffold
opaque sphinx
serene scaffold
#

Don't try tensorflow if you haven't already.

opaque sphinx
#

ok i wont

#
!ltt install torch torchvision >> /.tmp
!pip install fastai --upgrade >> /.tmp

import torch
assert torch.cuda.is_available(), "GPU not available"
#

i ran these, but the output is also gpu not available, then i try change the runtime, same error occurs

serene scaffold
#

You might not have installed the version of torch that has the cuda driver

limpid dew
#

You can run it on your own gpu is you install CUDA

opaque sphinx
#

lemme google how to install cuda into google colab

limpid dew
#

You don't install it into google colab.

serene scaffold
#

Well if colab isn't letting you use the GPU, it doesn't matter

#

It will always say that cuda isn't available until it lets you use the GPU again

viscid urchin
#

Check out that 'start locally' page, it has a thing at the top where you can click on the versions you want and it will show you the install command.

opaque sphinx
serene scaffold
#

Yes, or you can move all this to your computer

opaque sphinx
limpid dew
#

Try to run the code in a .py file locally on your pc (not using colab)

opaque sphinx
#

ok i will try it in pycharm

#

thanks guys

toxic pilot
#

also itโ€™s so speedy

#

the only issue i have with tensorflow is that tf code is more difficult to understand

#

people also say itโ€™s less โ€œpythonicโ€ which is not something i really care about but if you do, wellโ€ฆ thatโ€™s something to consider

limpid dew
#

My understanding as someone who only uses pytorch, is that keras is more user friendly but pytorch gives the user more control and is therefore better for research applications.

toxic pilot
#

thats just a personal opinion tho. ultimately i think the difference is minimal enough to not be worth talking about.

limpid dew
#

People like tensorflow better for production right?

toxic pilot
#

TF is supposed to be super versatile in a production environment, and pytorch has better tools for regular users

#

tbh i dont think it actually matters which you use. if you learn one, learning the other is trivial. personally i use pytorch tho

limpid dew
#

That seems right. I doubt there's much you can do with one library which you CAN'T do in the other.

serene scaffold
glacial root
#

yo isn't that against the server's terms/conditions

serene scaffold
#

!cleanban 1266449020306587688 Asking for jobs after being told not to.

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied ban to @spiral locust permanently.

tawdry finch
#

i heard currently pandas and bs4 etc are outdated and replaced by more powerful libraries like polar, crawl4ai etc so anyone who knows or is a data analyst can you pls guide me on which libraries to learn as well as i am completely open to learn under someone

serene scaffold
rich moth
rich river
#
[component_container-1] 2025-04-29 14:52:42.409273636 [W:onnxruntime:, graph.cc:1348 Graph] Initializer onnx::Conv_2881 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
[component_container-1] 2025-04-29 14:52:42.461690793 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 2 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
[component_container-1] 2025-04-29 14:52:42.462035032 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 12 Memcpy nodes are added to the graph sub_graph4 for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
[component_container-1] 2025-04-29 14:52:42.464001375 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
[component_container-1] 2025-04-29 14:52:42.464005845 [W:onnxruntime:, session_state.cc:1170 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
#

has anyone ever used ONNX for exporting models? Im not sure how to solve this

#
import onnx
import torch
from perception.src.mask_rcnn_model import MaskRcnnModel

# Function to Convert to ONNX
def Convert_ONNX(model, target_path):
    # set the model to inference mode
    model.eval()
    target_height = 800
    target_width = 1100
    dummy_input = torch.randn(1, 4, target_height, target_width, requires_grad=True)
    dummy_input = dummy_input.cuda()
    # Export the model
    torch.onnx.export(
        model,  # model being run
        dummy_input,  # model input (or a tuple for multiple inputs)
        target_path,  # where to save the model
        export_params=True,  # store the trained parameter weights inside the model file
        opset_version=12,  # the ONNX version to export the model to
        do_constant_folding=True,  # whether to execute constant folding for optimization
        input_names=["modelInput"],  # the model's input names
        output_names=["modelOutput"],  # the model's output names
        dynamic_axes={
            "modelInput": {0: "batch_size"},  # variable length axes
            "modelOutput": {0: "batch_size"},
        },
        keep_initializers_as_inputs=True,
    )

if __name__ == "__main__":
    print(torch.cuda.is_available())
    maskrcnn_path = "maskrcnn_rgbd_2025-01-28_epoch_526.pth"
    maskrcnn_model = MaskRcnnModel(maskrcnn_path, 0)._model

    target_path = "MaskRCNNModel.onnx"
    Convert_ONNX(maskrcnn_model, target_path)
    onnx_model = onnx.load(target_path)
    onnx.checker.check_model(onnx_model)
    print(onnx.helper.printable_graph(onnx_model.graph))
ocean hinge
#

Hello. I didn't know where else to ask this. I have experience as a game dev and a dell bhoomi dev. I am trying to switch career to data science. But idk where to start. Can anyone provide me a roadmap? Like what learn in a format? And where to learn?

grand breach
#

the docker image i created for my document assistant RAG tool is as big as the virtual env i created for it ~ 10 gb - i tried multi stage build, minimal requirements.txt etc but nothing is reducing it's size

grand breach
#

is it ok to deploy if the image is 10 gb

rich river
# rich river ```python import onnx import torch from perception.src.mask_rcnn_model import Ma...

changing from keep_initializers_as_inputs=True to False solves all warnings regarding This may prevent some of the graph optimizations, like const folding
but I still havent figured out how to solve these warnings

[component_container-1] 2025-04-29 17:14:53.086686347 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 2 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
[component_container-1] 2025-04-29 17:14:53.087215064 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 14 Memcpy nodes are added to the graph sub_graph4 for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
[component_container-1] 2025-04-29 17:14:53.089272371 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
[component_container-1] 2025-04-29 17:14:53.089276591 [W:onnxruntime:, session_state.cc:1170 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
weary timber
#

?

rich river
#

I used netron to visualize the graph, maybe I cannot find which nodes go wrong and just assume it won't affect the performance and ignore it

verbal oar
#

I see some usage of tree sitter

#

temperature is related to simulated annealing or its in different context?

#

generally temperature is for controlling randomness as from description of open ai

#

but honestly im confused

crimson jackal
#

Hey guys. I created a kind of street fighter game using pygame and i want to have a ai model for my oponnent. Can someone give some insight on how to do it with reinforcement ml? I know just a little about ml. Not much.

toxic pilot
#

you could aslo try switching to gradient penalty instead

#

and if you're getting a bunch of fluctuations, maybe add a layer norm or batch norm somewhere in the middle

#

and also like tune your parameters, like maybe try doubling n_critic

unkempt apex
copper umbra
#

Hey folks looking for advise on preferred UI

For years, I have been using anaconda spyder. I love the setup of that program. However, I need to find an alternative that has GIT repo integrations.

Which do you guys love

  1. I hate Jupiter notebooks
  2. Needs to have 3 windows at least- code, output (for print data check) , variables (click df and see the frame etc).
  3. Has highlight code and run options not run whole file from command line
agile cobalt
serene scaffold
iron basalt
#

Variable value check is included here in it being the Python REPL, so you can just type the variable name.

verbal oar
#

I dont like light theme of jupyter notebook, what else dark or other theme?

#

too green text boring background

#

I also discovered I can use jupyter notebook inside vs code

viscid urchin
#

If you have I'm guessing you want something more fundamental?

agile cobalt
#

understanding which aspect of it?

pale thunder
#

I drew this with matplotlib like so. While it is kind of fine, is there some library that would be better suited to this kind of drawing?

fig, ax = plt.subplots()
for a, b in relevant_edges:
    ax.plot(*zip(a, b), color='grey')
for a, b in all_edges:
    ax.plot(*zip(a, b), color='blue')
for a, b in path:
    ax.plot(*zip(a, b), color='green')
ax.set_aspect('equal', 'box')
ax.axis('equal')
ax.scatter(*zip(*all_pts))
ax.scatter(*start, color='red')
ax.scatter(*end, color='green')
plt.show()
pale thunder
#

the biggest reason to go with networkx is usually that it can lay out the graph nodes comfortably, but here, I know the exact positions of each node.

viscid urchin
# pale thunder I drew this with matplotlib like so. While it is kind of fine, is there some lib...

There's Plotly https://plotly.com/python/

Something like...

import plotly.graph_objects as go
figure = go.Figure()
for a, b in relevant_edges:
    figure.add_trace(go.Scatter(x=[a[0], b[0]], y=[a[1], b[1]], 
                         mode='lines', line=dict(color='grey'), showlegend=False))
# draw the rest of the stuff
# ..draw the points etc
x_pts, y_pts = zip(*all_pts)
figure.add_trace(go.Scatter(x=x_pts, y=y_pts, mode='markers', 
                            marker=dict(color='black'), showlegend=False))
# then some kind of figure.update_layout(...) call
# and finally figure.show()

at least that's what I get from their docs, take it with a grain of salt.

Plotly's

#

ooh actually maybe seaborn would be the slickest here?

#

It's got like..

x_pts, y_pts = zip(*all_pts)
seaborn.scatterplot(x=x_pts, y=y_pts, s=50)

I've never used it but I've heard it mentioned a number of times.

#

Plotly doesn't really seem less-verbose than your original code, it's just different.

verbal oar
#

why its called seaborn?

#

heatmap from seaborn I only relate

viscid urchin
#

Hah, I had to look it up, and apparently it's an obscure joke relating to https://en.wikipedia.org/wiki/Sam_Seaborn

Samuel Norman Seaborn is a fictional character played by Rob Lowe on the television serial drama The West Wing. From the beginning of the series in 1999 until the middle of the fourth season in 2003, he is deputy White House Communications Director in the administration of President Josiah Bartlet played by Martin Sheen. The character departed f...

#

Hence the common import alias of import seaborn as sns :\

verbal oar
#

so similar to python is not from snake but from monty python

teal depot
#

Hi, I'm new to Python and want to start AI/ML, but I don't know how to get started. Please help me with some recommended courses and tutorials.

glacial root
#

what kinds of things are graph neural networks used for