#data-science-and-ml


past meteor
#

I just misunderstood what point you were trying to make 🙂

next panther
errant lake
#

sorry if unclear.

worldly dawn
#

I am a big fan of doing some RICE scoring

#

It helps frame problems with a specific goal and can be done as a team

past meteor
#

I didn't know of this, I'll check it out, thanks a lot

worldly dawn
#

And in terms of orgs/groups/teams, I like framing problems with KPIs, so it's about converging towards solving a problem that is crisp to everyone

#

It's as important to know what to work on as it is important to know what to not work on. It avoids a lot of issues with respect to engineers being annoyed or wondering about why we ain't working on that shiny thing

past meteor
#

We rarely formulate this stuff because most people on my team have great "instinct"

worldly dawn
#

Sure and that's great! But I find that having great instinct going in the same direction has its benefits

past meteor
#

Yup, it's not an excuse

#

Lastly, when it comes to data and Python one thing I notice a lot is that classic DS/DA profiles are highly specialised to the point where it becomes annoying

past meteor
#

Sometimes the solution isn't building a new data thing or model but just rethinking the business process and maybe putting a small app in the middle

#

Most of the people that use Python in my company are fully siloed into pandas and Spark stuff

errant lake
next panther
#

i think bro loves vtube

#

btw gn guys

little bobcat
#

Hi

cedar fox
#

I am trying to train a model with tensorflow/keras and get this error:

UserWarning: Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches. You may need to use the .repeat() function when building your dataset.
self._interrupted_warning()
2025-09-20 22:35:46.493184: I tensorflow/core/framework/local_rendezvous.cc:407] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
[[{{node IteratorGetNext}}]]
[[StatefulPartitionedCall/ArgMax/_6]]
2025-09-20 22:35:46.493207: I tensorflow/core/framework/local_rendezvous.cc:426] Local rendezvous recv item cancelled. Key hash: 1198440015494271145

I preprocessed my training data and saved it to .npz files. When I try to just loop infinitely over the .npz files, training never advances from "1/15 epochs". How do I resolve this? What am I missing and how do I troubleshoot it?

I'm fitting the model like this:

def train(
    model: keras.models.Model,
    training_data: Iterable[tuple[np.ndarray, np.ndarray]],
    steps_count: int,
    validation_data: Iterable[tuple[np.ndarray, np.ndarray]],
    batch_size: int,
    output_directory: Path,
) -> keras.Model:
    typer.echo("Training model")

    model.compile(
        loss="categorical_crossentropy", optimizer="sgd", metrics=["accuracy"]
    )
    model.fit(
        batches(training_data, batch_size),
        validation_data=batches(validation_data, batch_size),
        epochs=15,
        steps_per_epoch=steps_count,
        verbose=2,
        callbacks=[BackupAndRestore(output_directory, delete_checkpoint=False)],
    )
    return model

My code is here: https://github.com/codeguru42/gobot/blob/steps_count/src/train.py

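A hedged guess at the two symptoms above: the warning means the generator given to `fit` stopped before `steps_per_epoch * epochs` batches were drawn. If both the training and the validation generators are then made to loop forever, Keras also needs `validation_steps`; without it, validation runs until `validation_data` is exhausted, which an infinite generator never is, so epoch 1 would plausibly never finish. The NumPy-only sketch below is a stand-in for the repo's `batches` helper; `repeating_batches` and `steps_for` are hypothetical names, and it assumes the samples can be re-iterated (e.g. a list, or an object that re-opens the .npz files on each `__iter__`).

```py
from typing import Iterable, Iterator

import numpy as np


def repeating_batches(
    samples: Iterable[tuple[np.ndarray, np.ndarray]], batch_size: int
) -> Iterator[tuple[np.ndarray, np.ndarray]]:
    """Yield (x, y) batches forever, re-iterating `samples` on every pass.

    `samples` must be re-iterable (e.g. a list, or an object whose __iter__
    re-opens the .npz files). A one-shot generator is empty on its second
    pass, so that case raises instead of looping silently.
    """
    while True:
        xs, ys = [], []
        produced_any = False
        for x, y in samples:
            xs.append(x)
            ys.append(y)
            if len(xs) == batch_size:
                yield np.stack(xs), np.stack(ys)
                xs, ys = [], []
                produced_any = True
        if xs:  # final partial batch of this pass
            yield np.stack(xs), np.stack(ys)
            produced_any = True
        if not produced_any:
            raise ValueError("samples yielded nothing; is it a one-shot generator?")


def steps_for(n_samples: int, batch_size: int) -> int:
    """Batches per pass, counting the final partial batch (ceiling division)."""
    return -(-n_samples // batch_size)
```

Hypothetical usage: `model.fit(repeating_batches(train_samples, batch_size), steps_per_epoch=steps_for(n_train, batch_size), validation_data=repeating_batches(val_samples, batch_size), validation_steps=steps_for(n_val, batch_size), epochs=15)`, where `train_samples`, `val_samples`, `n_train` and `n_val` are placeholders for your own data and sample counts.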

gritty vessel
#

Hey a rookie question but is there a way I can see how cnn is extracting features from an image

#

Because it's not able to capture patterns on my dataset but it memorize a sample when I train it on same like 2-10 sample for 500 epochs but when I train on like 12000 samples it's not able to capture patterns
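To make "how a CNN extracts features" concrete: each filter in a conv layer slides over the image and takes a dot product with every patch, producing a feature map that is large wherever the patch matches the filter's pattern. The NumPy sketch below applies one hand-picked vertical-edge kernel to a toy image; the kernel is an assumption for illustration, since a trained CNN learns its kernels. For a real Keras model, a common approach is to build a second model such as `keras.Model(inputs=model.inputs, outputs=model.get_layer(name).output)` for a conv layer `name` of your choosing and plot the maps it returns for a few training images.

```py
import numpy as np


def feature_map(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid-mode 2D cross-correlation: what one conv filter computes (stride 1, no bias)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Dot product of the kernel with the image patch under it.
            out[i, j] = np.sum(image[i : i + kh, j : j + kw] * kernel)
    return out


# Toy 6x6 image: dark left half, bright right half (one vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked vertical-edge detector.
vertical_edge = np.array([[-1.0, 0.0, 1.0]] * 3)

# ReLU, as usually applied after a conv layer.
fmap = np.maximum(feature_map(image, vertical_edge), 0.0)
print(fmap)  # each row is [0, 3, 3, 0]: the map only lights up around the edge
```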

waxen kindle
#

Your CNN might not be complex enough

#

Or your hyperparameters are badly tuned

#

Do you have a plot of error or loss vs. epoch?

#

That you would have recorded during training

mossy blaze
#

Hi there! I want to share a GitHub project about artificial intelligence: https://github.com/Julien-Livet/ai. I am currently thinking about natural language learning step by step, after composing numbers and expressions and dealing with relationships using Python standard types (bool, int, float, numpy.ndarray, SymPy and OpenCV functions). I am open to any constructive feedback about my work 🙂.

abstract loom
#

Hi

hollow oasis
#

sup guys

mystic heron
#

Anyone know

#

Anything about my script

#

That I can improve for better results?

serene scaffold
# mystic heron Anything about my script

no, none of us have seen your script or know what it's supposed to do or what the current results are. you have to say all of that for us to be able to help you.

mystic heron
arctic wedgeBOT
serene scaffold
mystic heron
#

Sure

#

can you see it?

serene scaffold
#

yes. you also have to say what the current results are.
I'm actually heading out, but hopefully someone will take a look.

mystic heron
#

Alr

#

The current results are... amazing

#

I got a 300% return within a 3 month period on BTCUSD

#

simulated on previous data

#

but it's the same thing as it would do if it were live, other than latency

#

that could be an issue but the model is pretrained

cedar fox
mystic heron
#

are you listening???

cedar fox
#

that's not a question

mystic heron
#

Can you please give me feedback?

cedar fox
# mystic heron Can you please give me feedback?

Can you be more specific? Maybe take some time to explain the purpose of your code. Then describe what the current results are. Are there any problems with the code that you need help with? Or are you just looking for a general code review?

mystic heron
#

I have a pdf

#

but it won't let me send it. I'll DM it to you

cedar fox
#

what does a pdf have to do with anything?

#

it's probably malware

serene scaffold
mossy blaze
stark frigate
#

Yoooo anyone codes in manim

#

Or uses manim

shy sonnet
#

Hi, I'm Francis 👋

Aspiring Data Engineer learning Python & SQL, currently building my first projects.

Excited to learn & connect 🚀

meager gate
#

I am open to contact too

fervent badger
toxic vault
calm cipher
hot otter
#

hi i am sparkling
i am excited to connect with you guys. 🙂

marsh iron
warm flame
#

heeeeeeeeeey guys

mystic heron
#

yo im gay

serene scaffold
#

This channel is for talking about data science and AI. You're welcome to participate, but don't just say "hi" or anything like that. Say something about the topic that can contribute to meaningful conversation. @warm flame @crude escarp @marsh iron @hot otter @mystic heron

hexed maple
#

do we know of any Time-Series adjusted Random Forests or Neural Networks?

serene scaffold
hexed maple
#

i need to estimate some nuisance functions in the DML framework, but i need time series adjusted methods

earnest light
#

hello i'm sheiza, nice to see you guys

serene scaffold
hot otter
serene scaffold
#

Please stop just writing greetings without saying anything about data science or AI. These messages will be treated as intentional spam!

frosty mountain
#

in pandas, how do you set negative numbers to NaN?

agile cobalt
#

usually you'd just use numpy for that, np.where(series < 0, np.nan, series)
(series being a pandas series)

#

series[series < 0] = float('nan') also works but I'd recommend against using in-place operations if you can avoid it
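Both suggestions above work. One detail worth knowing is that `np.where` returns a plain NumPy array, so the index is lost unless you wrap the result back into a Series. pandas also has `Series.mask` (and its mirror image, `series.where(series >= 0)`), which does the same thing without leaving pandas and without modifying the original. A small sketch with made-up values:

```py
import numpy as np
import pandas as pd

series = pd.Series([3, -1, 0, -7, 5])

# Option 1: np.where returns a NumPy array, so wrap it to keep the index.
via_where = pd.Series(np.where(series < 0, np.nan, series), index=series.index)

# Option 2: Series.mask puts NaN wherever the condition is True and
# returns a new Series, leaving `series` untouched.
via_mask = series.mask(series < 0)

# Option 3: boolean-index assignment, done on an explicit copy.
via_assign = series.astype(float)  # astype returns a copy; float dtype can hold NaN
via_assign[via_assign < 0] = np.nan
```

Note that an integer Series comes back as float64 in every option, because NaN is a float value.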

frosty mountain
#

Cheers

trail mist
#

hey guys, please help me get past this error

latent heath
serene scaffold
#

!code

arctic wedgeBOT
#
Formatting code on Discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

serene scaffold
latent heath
#

I cannot confirm that jupyter is working as intended, and if it is how to work with it in pycharm.

serene scaffold
#

are you still there, @latent heath?

latent heath
#

I am, sorry.

#

I was walking, I need a moment to get set up.

#

That's the new project screen, and the options for environments I have.

agile cobalt
latent heath
#

I normally just use the first. I'm in an ai course and the prof has given us a .ipynb to work with, so I'm just going through setting it up.

agile cobalt
#

oh you meant the templates?

serene scaffold
latent heath
#

Conda isn't required, but the rest of the instruction is on that file.

serene scaffold
#

what do you mean "that file"?

agile cobalt
#

does it show anything if you just try to open the jupyter notebook in it?

latent heath
serene scaffold
latent heath
urban vine
serene scaffold
urban vine
latent heath
#

And it does just open as it should. Welp. One of the dumber mistakes I've made. It says Community Edition only supports read-only mode, but I should just be able to do it in any browser?

serene scaffold
urban vine
serene scaffold
urban vine
#

Like I believe it isn't just the certificate itself that guarantees the job, but also the knowledge I gained/retained while working toward that certificate

latent heath
serene scaffold
urban vine
#

So a degree is still needed, huh?

#

So like a bachelors degree or masters I suppose

serene scaffold
urban vine
#

Well, if I were to get a masters, I must not have a social life if possible

#

Well, I should try not to socialize with anyone outside of my career interest

serene scaffold
#

uh, what?

latent heath
#

Why's that? I'm only in my undergrad rn, but I'm considering a masters in some area of discrete math, but generally wouldn't it be better to have colleagues with varied backgrounds?

urban vine
latent heath
#

I mean, if you're doing data science and ml, who are you doing it for? Like somewhere along the way you're gonna encounter people in different fields and have to work with them.

urban vine
#

I see

latent heath
#

Also, just make friends with cool people? I can't speak on the purely utilitarian aspect of how you pick your friends, but I don't see a reason to just aim for people who want the exact same career as you. You'll find those anyway.

urban vine
#

I see. So social skills are important?

latent heath
#

Correct.

#

If you want it for utilitarian purposes, modern science and professions are rarely solo or single-discipline endeavours. You are a social being. Be social.

urban vine
#

I will try

latent heath
#

๐Ÿ‘

serene scaffold
#

there's more to life than your career. it's a good thing to have varied interests and to have friends who share those interests

urban vine
#

So I shouldn't try and graduate as fast as possible?

serene scaffold
urban vine
serene scaffold
urban vine
#

Alright. Once I get the finances needed for my degree, I will go the four years

serene scaffold
urban vine
serene scaffold
# urban vine The US.

so it's pretty much impossible to pay for a degree up-front. when you say "get the finances you need", what are you talking about?

latent heath
latent heath
fallow yacht
#

has anyone got access to WRDS CRSP data via an institution subscription and would be willing to share the AAPL series?

fallow yacht
#

I'd like to compare the output of this:

crsp_monthly_query = (
  "SELECT msf.permno, date_trunc('month', msf.mthcaldt)::date AS date, "
         "msf.mthret AS ret, msf.shrout, msf.mthprc AS altprc, "
         "ssih.primaryexch, ssih.siccd "
    "FROM crsp.msf_v2 AS msf "
    "INNER JOIN crsp.stksecurityinfohist AS ssih "
    "ON msf.permno = ssih.permno AND "
       "ssih.secinfostartdt <= msf.mthcaldt AND "
       "msf.mthcaldt <= ssih.secinfoenddt "
   f"WHERE msf.mthcaldt BETWEEN '{start_date}' AND '{end_date}' "
          "AND ssih.sharetype = 'NS' "
          "AND ssih.securitytype = 'EQTY' "  
          "AND ssih.securitysubtype = 'COM' " 
          "AND ssih.usincflg = 'Y' " 
          "AND ssih.issuertype in ('ACOR', 'CORP') " 
          "AND ssih.primaryexch in ('N', 'A', 'Q') "
          "AND ssih.conditionaltype in ('RW', 'NW') "
          "AND ssih.tradingstatusflg = 'A'"
)

to the Yahoo data for AAPL, to see where the discrepancy arises
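Once both series are in hand, a sketch of the comparison itself: align CRSP's monthly returns with monthly returns computed from Yahoo prices and list the months that differ by more than a tolerance. The function name and the 0.1% tolerance are assumptions. My understanding is that CRSP's monthly return includes distributions, so a likely source of gaps is comparing it against returns from Yahoo's raw close rather than its dividend-adjusted close; check the CRSP documentation for the exact definition of `mthret`.

```py
import pandas as pd


def compare_monthly_returns(
    crsp: pd.Series, yahoo: pd.Series, tolerance: float = 1e-3
) -> pd.DataFrame:
    """Align two monthly return series on calendar month and list the months that disagree.

    Both inputs are assumed to be simple (not log) returns indexed by dates.
    """
    # Normalise both indexes to calendar months, so month-start vs month-end
    # date conventions don't prevent the join.
    crsp_m = crsp.copy()
    crsp_m.index = pd.to_datetime(crsp_m.index).to_period("M")
    yahoo_m = yahoo.copy()
    yahoo_m.index = pd.to_datetime(yahoo_m.index).to_period("M")

    both = pd.concat({"crsp": crsp_m, "yahoo": yahoo_m}, axis=1, join="inner")
    both["diff"] = both["crsp"] - both["yahoo"]
    return both[both["diff"].abs() > tolerance]
```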

opaque condor
#

How can I train a multi-model?
How do I deal with catastrophic forgetting, even with a large dataset?

bright crypt
#

I am trying to build a movie recommendation system, and I don't have much knowledge about RecSys apart from the basics of SVD. I came across Criticker; it looks like a good interface and close to what I want to do. Are there any specific resources that would come in handy? Any tips to start with the project would be highly appreciated.

viscid urchin
#

Hey folks, is anybody willing to do a neutral evaluation of a Data Science B.S. degree program I am looking at? I have some personal biases here that I would like to calibrate out.

If so: https://datascience.fsu.edu/students/combined-pathways
Specifically, what you get when you click on BS in Computer Science (BS-CS to MS-IDS)

#

The program director is an old friend of mine, and I can't really expect myself to not have some rose tint when I review his choices etc.

mossy blaze
north thistle
#

Im here to learn data science and AI. I'm a biomedical engineering student

viscid urchin
#

(re: the above, feel free to @ me if you end up taking a look, many thanks.)

agile cobalt
viscid urchin
vague heron
#

I'm working on a project where I integrate all the standard stuff I think should be in any PyTorch project: MLflow, Optuna, separation between settings and logic using config files, cross-validation, and making the core training script as generic as possible while supporting multiple model repos like Hugging Face, Ollama and MONAI. Are there any other projects that attempt to write a similar unified "template" codebase?

teal gate
#

Hello I want to start to do Machine learning and AI can anyone tell me how i do it im kinda a begginer in python

teal gate
#

Nice

spring field
#

beginner* 🙂

main citrus
waxen kindle
delicate trench
#

i need to learn, a lot of my 12th-grade marks depend on it, and i hope to make a career in AI engineering, which requires at least basic python knowledge

waxen kindle
delicate trench
waxen kindle
#

yeah we don't do that here

#

we have tons of good resources however

delicate trench
waxen kindle
#

!res

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

delicate trench
waxen kindle
#

what ?

delicate trench
#

i am serious about this python stuff dude.

waxen kindle
#

if you wanna learn, check the resources and ask when you have specific questions

#

but we don't do teaching/tutoring here

delicate trench
waxen kindle
#

it kinda falls into the rule 9

#

!rule 9

arctic wedgeBOT
#

9. Do not offer or ask for paid work of any kind.

waxen kindle
#

(and if you plan to have someone do it for free, you simply won't find anyone)

delicate trench
#

wdym? when did i offer a payment?

agile cobalt
delicate trench
#

do you think, i am son of jeff bezos?

waxen kindle
#

No one will spend hours for free to teach you

worldly kelp
#

@delicate trench i have a tutor for you... https://www.youtube.com

agile cobalt
waxen kindle
#

Why would I spend hours doing it for free when I could be paid by someone to do so

delicate trench
delicate trench
#

Thats your decision and not mine to make

#

but dont try to force others

worldly kelp
delicate trench
waxen kindle
#

Yeah really, I've never seen someone accept such a thing here, everyone always shares the resources page

#

bc that's how developers learn

delicate trench
#

Yeah dude, but like chill

#

do ur own thing

worldly kelp
#

you're entitled expecting people to give you their time and effort for free to teach you things you can very easily teach yourself

waxen kindle
#

tbh if you wanna go into dev and Data science, you'll need to learn to use resources

#

better starting now

delicate trench
#

I only listen to parents, God, teachers, and then whoever i want to listen to

#

ok?

#

And surely, i won't be paying money to anyone. So no server rules are being broken

#

so chill out, and don't play games with me

waxen kindle
#

With your attitude, I would be very unlucky to have you as a student

delicate trench
#

the ego is insane

waxen kindle
#

It's not about rights and will, it's about people telling you it's gonna happen

worldly kelp
#

<@&831776746206265384> can we perhaps get someone to tone this guy's attitude down a bit? fresh addition to the server and already being combative/rude

waxen kindle
#

I mean, we can simply stop talking and wait in the hope that someone comes and accepts, but you'd better start using the resources we gave you or you'll never learn anything

#

bc no one is coming to teach

#

realistically, no one will

delicate trench
delicate trench
#

Thats the best option

serene scaffold
#

!shh

arctic wedgeBOT
#

✅ silenced current channel for 4 minute(s).

serene scaffold
#

I need a few minutes to get caught up

#

@delicate trench in all my years here, I've never seen anyone commit to an ongoing mentor-student relationship with another user. if someone wants to do that (for free), they absolutely can, but that's so unlikely to happen that the best way to learn and get help is to use self-guided resources and ask specific questions in this server when you have them. there are lots of people here who are excited to answer one-off questions.

arctic wedgeBOT
#

✅ unsilenced current channel.

serene scaffold
#

we can now put that to bed

worldly kelp
#

thank you good sir 🙂

delicate trench
#

They think they are slick man.

serene scaffold
#

yeah, I said we're done talking about that, so they will.

delicate trench
#

alright thanks man!

#

i will use the resources, but i will still keep searching for a master

serene scaffold
#

there is no master

worldly kelp
delicate trench
serene scaffold
#

especially in AI. everyone is running around acting like they know what they're doing, but everyone is trying to figure out what's going on

delicate trench
#

!res

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

tulip drift
#

assalamualaikum guys
I'm learning AI, but the problem is my machine's compatibility. Can someone share information or any method for it, other than paid cloud-computing virtual machines?

coral hollow
#

How hard would it be to train a model to convert speech to text?

jagged jasper
coral hollow
jagged jasper
#

Good question, let me look.

coral hollow
#

Sorry, I am new to the space and I am asking as a complete beginner

#

I need to find out if this would make sense for me to do

jagged jasper
#

It only has 1.54B params

coral hollow
#

whisper should not be free for commercial use IIRC

jagged jasper
#

Ohh that's right...

coral hollow
#

Wait could be free for commercial use

jagged jasper
#

No, it is not.

#

MIT license!

coral hollow
#

Nice

jagged jasper
#

And it only has 800M params

coral hollow
#

However does it make sense to train the model?

#

I don't want it to think I am saying the wrong words

jagged jasper
#

You could if you wanted, but it would be a lot of work.

coral hollow
#

Hm..

#

It does not even have to understand a lot, it is just supposed to convert speech to the correct letters. If it sounds right, it's already enough for me

jagged jasper
agile cobalt
coral hollow
#

it would be fine if it thinks "apple" is "abble", but not fine if it thinks it's "train"

jagged jasper
#

@coral hollow Do you want me to write a script for whisper v3 large for you?

coral hollow
#

just wondering about the expected accuracy

jagged jasper
#

I'm not sure, but probably pretty good.

#

I'm testing it now

agile cobalt
coral hollow
#

Essentially what I need is whatever google is using to convert spoken words to text, like the small microphone button to talk

#

which then just converts whatever language is spoken to letters

jagged jasper
coral hollow
#

i wonder how they are doing it

coral hollow
#

text

jagged jasper
#

Ok, I've made a small script:

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Use the GPU with half precision when available, otherwise the CPU in float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "openai/whisper-small"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

# Wrap the model, tokenizer and feature extractor in an ASR pipeline.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# Passing a list of files returns a list of results, one dict per file.
results = pipe(["audio_1.mp3"], batch_size=1)
print(results[0]["text"])
#

Make sure there is an mp3 file named 'audio_1.mp3' in the same directory.

coral hollow
#

very cool

#

how well does it perform?

jagged jasper
jagged jasper
#

@coral hollow It looks like it has an accuracy of 'up to 99% in some cases'!

coral hollow
jagged jasper
#

yeah ik

#

'99%'

#

ugh now I need ffmpeg, one minute.

coral hollow
#

I'd test it with music in the background, speaking quietly and like someone who dropped out of school

#

Then we'll see how good it really is
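On measuring "how good it really is": the standard metric for speech-to-text is word error rate (WER), the number of word substitutions, deletions and insertions needed to turn the transcript into a reference transcript, divided by the number of reference words. Below is a pure-Python sketch (function names are illustrative) that you could run on a few of your own recordings with hand-written reference transcripts. Character error rate (CER) uses the same edit distance on characters and better reflects the "abble is fine, train is not" distinction from earlier, since WER counts both as one wrong word.

```py
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two token sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,  # deletion: reference token missing from the transcript
                curr[j - 1] + 1,  # insertion: extra token in the transcript
                prev[j - 1] + (r != h),  # substitution (costs 0 if the tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by the number of reference words."""
    ref_words = reference.lower().split()
    return edit_distance(ref_words, hypothesis.lower().split()) / max(len(ref_words), 1)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: the same idea at the character level."""
    ref_chars = list(reference.lower())
    return edit_distance(ref_chars, list(hypothesis.lower())) / max(len(ref_chars), 1)
```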

jagged jasper
coral hollow
#

sure

#

Try talking while eating

#

At this point it's trolling the AI though

#

I am not at home, so I can't test it myself rn

#

@jagged jasper did you test it?

jagged jasper
#

@coral hollow I'm having a problem with ffmpeg right now; I'm not on my main computer. You can try it yourself; it's not a big download.

#

You'll need to get a particular version of ffmpeg, ffmpeg 7 I think.

coral hollow
#

ok

jagged jasper
#

@coral hollow I've tested it, and it seems really good!

#

I recorded a few clips with a poor mic and it transcribed them perfectly.

#

One small mistake I made, you need to make this change to the definition of model:

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
#

Anyway, this definitely seems like a good idea for your project.

tulip drift
woven prairie
#

I want to learn RAG. I have seen a few tutorials on YouTube, but they mostly use LangChain. Are there any resources that build a RAG system from scratch, especially the retriever part?

west wing
#

Any good documentation for statistics used in data science

meager iron
#

Where can i find MIMIC-III and MIMIC-IV datasets?

#

Need these datasets for NLP model.

gusty iris
#

Hey, I'm looking for recommendations on the best LLM for generating ML code, specifically for a computer vision task. My goal is to train a facial expression recognition model that beats an old paper's accuracy by at least 1%.

I'm a novice and initially used DeepSeek Coder R1, which performed well but didn't meet the target accuracy. Are there any other powerful LLMs you guys can suggest? I'm currently torn between Claude ($20 a month), Expanse AI or OpenRouter.

woven prairie
mystic heron
#

yo guys

#

I need help with RL stuff.

serene scaffold
mystic heron
#

My bad bro

dawn wyvern
#

hloo

steel jetty
#

does anyone know a good website where i can download datasets? I'm working on a homework assignment where i need to find a real-world dataset and create a plot to display it

steel jetty
agile cobalt
#

you can also find a bunch in Hugging Face, https://datasetsearch.research.google.com/, government websites, and random places around the web though

edit; also public data in Google BigQuery though that is a bit of a pain to work with

steel jetty
agile cobalt
#

overall just remember to check the size, scope and license of any dataset before you download it

you don't want to try to download something larger than your computer's available storage space by mistake,
you probably don't want something that only covers things you are unfamiliar with,
and some datasets require attribution (and while not applicable for this, may also restrict commercial usage and redistribution)

pine prism
#

@small wedge so how would i add the weights first, so i have a better understanding of how they can influence my ants

#

i know that the weights multiplied do something

small wedge
#

that's up to you as the person designing the sim. if you want to do it with a neural network like you're describing, then you need to do two things: first, decide how to turn the inputs the ants get into a vector of numbers, and then decide how you want to interpret the output of the nn, which will also be a vector of numbers

#

for example, one of my projects taught some AIs to aim at a moving target: the numbers they were given as input were the position and velocity of the target, and their outputs were the x,y coordinates to aim their shots at

pine prism
small wedge
#

yes

#

then you score the different ants (assuming you want them all to be agents and do a genetic sim), the best ones survive and cross over then their children mutate a bit

#

neat part about this is you can kind of avoid all the math of gradient descent etc. that you would need in RL methods like deep Q-learning (value-based) or PPO (policy gradient)
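A minimal sketch of the neural-network plus genetic approach described above, assuming NumPy: each ant's "brain" is a single weight matrix mapping an input vector to an output vector, the best-scoring brains survive unchanged, and children are built by uniform crossover plus Gaussian mutation. The layer sizes and the `fitness` function are hypothetical stand-ins; in a real sim, fitness would come from running each ant and measuring what you care about (food gathered, survival time, and so on).

```py
import numpy as np

rng = np.random.default_rng(0)

N_INPUTS, N_OUTPUTS, POP_SIZE = 4, 2, 30  # hypothetical sizes


def act(weights: np.ndarray, inputs: np.ndarray) -> np.ndarray:
    """The ant's 'brain': one linear layer plus tanh, mapping inputs to outputs."""
    return np.tanh(inputs @ weights)


def fitness(weights: np.ndarray) -> float:
    """Hypothetical score: how close the outputs are to some 'good' moves (higher is better)."""
    inputs = np.array([[1.0, 0.0, 0.5, -0.5], [0.0, 1.0, -0.5, 0.5]])
    target = np.array([[1.0, -1.0], [-1.0, 1.0]])  # pretend these are the good moves
    return -float(np.mean((act(weights, inputs) - target) ** 2))


def evolve(generations: int = 50, elite: int = 6, mutation_scale: float = 0.1) -> list:
    population = [rng.normal(size=(N_INPUTS, N_OUTPUTS)) for _ in range(POP_SIZE)]
    best_scores = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        best_scores.append(fitness(ranked[0]))
        parents = ranked[:elite]  # the best survive unchanged
        children = []
        while len(parents) + len(children) < POP_SIZE:
            a, b = rng.choice(elite, size=2, replace=False)
            mask = rng.random((N_INPUTS, N_OUTPUTS)) < 0.5  # uniform crossover
            child = np.where(mask, parents[a], parents[b])
            child = child + rng.normal(scale=mutation_scale, size=child.shape)  # mutation
            children.append(child)
        population = parents + children
    return best_scores
```

Because the elite are carried over unchanged, the best score per generation can never get worse; the mutation scale trades off exploration against keeping good brains intact.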

pine prism
#

should i store their decisions somewhere in the program so that the ants remember previous decisions so they know how to work next

small wedge
#

nah, there are algorithms that do that, like Q-learning, where you score actions and then your agent basically picks the best actions based on their Q-scores, but that would be separate from this
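For contrast, a sketch of the tabular Q-learning idea mentioned above: the table of Q-scores is the ant's "memory" of how good each action was in each state, and after every step it is nudged with the standard update Q(s,a) += alpha * (r + gamma * max Q(s',a') - Q(s,a)). The action names, alpha = 0.1, gamma = 0.9 and epsilon = 0.1 are illustrative assumptions.

```py
import random
from collections import defaultdict

ACTIONS = ["left", "right", "forward"]  # hypothetical action set


def choose_action(q: dict, state, epsilon: float = 0.1) -> str:
    """Epsilon-greedy: usually the action with the best Q-score, occasionally a random one."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_update(q: dict, state, action, reward: float, next_state,
             alpha: float = 0.1, gamma: float = 0.9) -> None:
    """Tabular Q-learning update for one (state, action, reward, next_state) step."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
```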

#

another simple way of doing this that doesn't require a neural network at all is like a string genome, you could represent actions as just plain letters like L for left and R for right, you can run the same process here with choosing the best and crossing over their brains without any sort of actual weights

#

there are an infinite number of ways to do it really, you could add as many extra things as you want
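The string-genome variant can be sketched without any weights: genomes are strings of `L`/`R` moves, the best quarter of the population survives, and children come from one-point crossover plus occasional random letter changes. The target position, genome length and fitness function are hypothetical.

```py
import random

random.seed(1)

GENOME_LEN, POP_SIZE = 12, 20
TARGET_X = 4  # hypothetical goal: end up 4 steps to the right of the start


def fitness(genome: str) -> int:
    """Hypothetical score: minus the distance from the goal after following the moves (0 is best)."""
    x = genome.count("R") - genome.count("L")
    return -abs(x - TARGET_X)


def crossover(a: str, b: str) -> str:
    """One-point crossover: a prefix of one parent joined to the suffix of the other."""
    cut = random.randrange(1, GENOME_LEN)
    return a[:cut] + b[cut:]


def mutate(genome: str, rate: float = 0.05) -> str:
    """Replace each letter with a random one with probability `rate`."""
    return "".join(random.choice("LR") if random.random() < rate else g for g in genome)


population = ["".join(random.choice("LR") for _ in range(GENOME_LEN)) for _ in range(POP_SIZE)]
initial_best = max(fitness(g) for g in population)

for _ in range(30):
    population.sort(key=fitness, reverse=True)
    survivors = population[: POP_SIZE // 4]
    children = [mutate(crossover(*random.sample(survivors, 2)))
                for _ in range(POP_SIZE - len(survivors))]
    population = survivors + children

best = max(population, key=fitness)
```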

pine prism
#

all the program did was store each response or prompt i gave it to a list named memory and it output responses based on what was in its memory

gusty iris
#

Can anyone help me choose between Claude Opus and GPT-5 for writing competent ML code?

bronze wyvern
#

Hello, can someone explain what a moving average is, how it is calculated and how it differs from a "normal" average, please? I know it's maths stuff but I don't have that knowledge, would really appreciate if someone can explain.

I did google, but all I'm seeing are applications of it, like forecasting, and they don't explain why it's used there, what its benefit is, and why not just use a normal average

In real-world analysis, what can a moving average show us that a normal average can't?

Also, if we plot the moving average on a graph and each average covers multiple years, which year do we choose? I read that it's the middle year, would really appreciate if someone can explain why

left tartan
#

But, take global warming, for instance. Is the average temperature of the earth over past 1000 years useful?

#

You could chop the period into fixed-interval chunks and compute 100-year averages, sure

#

That would produce a discontinuous graph, almost appearing like the average jumps around every 100 years

#

A rolling average would produce a more intuitive view of the changing temperature: showing how the average is changing over time

left tartan
#

There are also ways to weight recent values more heavily: a weighted average where the older events are given less significance (Google 'EWMA')
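A small pandas sketch of the three options discussed above, on made-up yearly values: fixed-chunk means (which jump at block boundaries), a rolling mean, and an exponentially weighted mean. With `adjust=False`, pandas computes the EWMA recursively as y_t = alpha * x_t + (1 - alpha) * y_(t-1), with alpha = 2 / (span + 1); the 5-year chunks, 3-year window and span of 3 are arbitrary choices.

```py
import pandas as pd

# Hypothetical yearly temperature anomalies (made-up numbers for illustration).
temps = pd.Series(
    [0.0, 0.2, 0.1, 0.4, 0.3, 0.6, 0.5, 0.8, 0.7, 1.0],
    index=range(2000, 2010),
)

# Fixed chunks: one mean per 5-year block, so the curve jumps between blocks.
chunk_id = (temps.index.to_numpy() - 2000) // 5
chunked = temps.groupby(chunk_id).transform("mean")

# Rolling (moving) mean: each year gets the mean of the latest 3 years, so the
# window slides one year at a time and the curve changes gradually.
rolling = temps.rolling(window=3).mean()

# Exponentially weighted mean: every past year counts, recent years count more.
ewma = temps.ewm(span=3, adjust=False).mean()
```

The first two rolling values are NaN because a full 3-year window isn't available yet; pass `min_periods=1` to `rolling` if partial windows are acceptable.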

bronze wyvern
#

yepp, I see, question though, when we plot the graph of moving average, since we are using multiple years, how do we choose which year correspond to the computed average?

#

I mean, we just take the middle year?

left tartan
#

For example, a trailing 3-day moving average would cover Wed-Fri for Friday, Thu-Sat for Saturday, and Fri-Sun for Sunday
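On the "which year do we plot it at" question, it's a labelling convention. pandas' `rolling` labels each mean with the last point of its window by default (the trailing convention in the example above), while `center=True` labels it with the middle point, which for an odd-length window is the "middle year" you read about. Centering keeps the smoothed curve from lagging behind the raw data. A sketch with made-up values:

```py
import pandas as pd

sales = pd.Series([10, 12, 11, 15, 14], index=["Wed", "Thu", "Fri", "Sat", "Sun"])

# Trailing window (pandas default): the value shown at Fri averages Wed-Fri.
trailing = sales.rolling(window=3).mean()

# Centered window: the value shown at Thu averages Wed-Fri, so each mean sits
# at the middle of the period it covers.
centered = sales.rolling(window=3, center=True).mean()
```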

bronze wyvern
#

yepp I see

#

Thanks !

frosty mountain
#

how do you mask out noise before calculating the silhouette score for DBSCAN?
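A sketch assuming scikit-learn: DBSCAN labels noise points `-1`, so the usual approach is to filter those rows out of both the data and the labels before calling `silhouette_score`, and to check that at least two clusters remain, since the score is undefined otherwise. The blob data and the `eps`/`min_samples` values are illustrative. Excluding noise flatters the score, so it is worth reporting the noise fraction alongside it.

```py
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)
X = np.vstack([X, [[20.0, 20.0], [-20.0, -20.0]]])  # two obvious outliers

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# DBSCAN marks noise with the label -1; drop those rows before scoring.
mask = labels != -1
n_clusters = len(set(labels[mask]))
noise_fraction = 1.0 - mask.mean()

# silhouette_score needs at least 2 clusters (and fewer clusters than samples).
score = silhouette_score(X[mask], labels[mask]) if n_clusters >= 2 else None
```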

calm cipher
#

some of them disclose they're synthetic, but a lot of them don't, and you won't really know anything is wrong until you try to do anything useful with the data

calm cipher
cursive chasm
#

is he on the right path?

agile cobalt
# cursive chasm

did you open that course to check its description before asking?
It has that on the linked page

#

and yes, I'd also recommend that course

autumn perch
#

I built Data-Cent because I often need to explore CSVs quickly without firing up pandas or writing custom code. It's a Streamlit-based web app where you can:
• Upload CSV files (no setup)
• Auto-filter and explore the data
• Create interactive charts (line, bar, scatter, etc.)
• Run quick stats (mean, median, std)
• Download a PDF/HTML report of your analysis
Live demo: https://data-cent.streamlit.app/ Source: https://github.com/data-centt/Data-Analytics

Would love feedback on performance and UI/UX, especially from folks who explore data often or build Streamlit apps.

If you find it interesting please help me star the repo. TY


crude hedge
#

A normal average is a single calculation over the whole period; it doesn't get recomputed as the window moves forward the way a moving average does

pine prism
#

why did no one tell me how hard ML actually is? i thought i could do this without libraries like pytorch or numpy

#

im starting to get it now but what was i thinking last night where i could make a machine learning project by myself

#

also using cursor code editor

serene scaffold
iron basalt
#

That is in addition to the mathematical knowledge needed and then specific ML knowledge.

#

Cursor can't do that for you, it will only accelerate you if you already know what you are doing (almost all time is spent debugging, and you can't do that without understanding it all).

pine prism
# serene scaffold ML jobs pay well precisely because they're difficult to train for

yeah, i hope they do, because so far most of my time has been spent making prototype simulations without RL/ML, not to mention the note-taking and documentation reading, but i'm slowly understanding this more because the formulas are surprisingly easy to read
the hard part so far, which i didn't expect to be hard, is making the program make decisions on its own before adding the machine learning aspect to it. other than that, i'm making decently good progress in numpy; it's just a matter of whether i can understand the pytorch libraries and docs

#

my first project is training an ant colony to maintain a good, healthy state over time by making good decisions

serene scaffold
pine prism
gusty iris
#

i just asked Claude Opus to generate ML code and it cost $2.5 for a single prompt

serene scaffold
gusty iris
#

500 input tokens and around 6000 output tokens

spice tartan
#

Does anyone know a way to make f-strings highlight like normal strings in newer versions of JupyterLab?

#

I don't like that colour

agile cobalt
# gusty iris 500 input and around 6000 output

!e unless you meant R$2.5 or some other currency, that sounds off
```py
from decimal import Decimal

input_cost = Decimal("15") / Decimal("1_000_000")  # USD per input token ($15 per million)
output_cost = Decimal("75") / Decimal("1_000_000")  # USD per output token ($75 per million)
cost = 500 * input_cost + 6000 * output_cost
print(cost)
```

arctic wedgeBOT
agile cobalt
#

but yeah Claude is ridiculously expensive

agile cobalt
pallid badge
#

Hi
Could I ask for some input, please?
How could one develop an AI tool that shows me gaps or trends, for example in cooking recipes? Let's assume there are databases with a public API and some with no API (which would mean web scraping).
Now I would like to aggregate the data in a structured way; I could query the databases (or maybe later use web scraping).
But then what is next? Maybe I want to find a trend in pasta recipes: are some ingredients currently more popular than others?
My question is: once I have the data, would I first need to develop rules for when something counts as popular, missing or trending? Am I right?

agile cobalt
#

you'd need some structured way of determining what each recipe covers, then you can create some simple models to identify what "normal" looks like for each ingredient and look for outliers (values significantly above or below normal)
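A minimal sketch of that idea, assuming the recipes have already been reduced to monthly counts of how many pasta recipes use each ingredient. The ingredient names, counts and the 2-standard-deviation threshold are all made up for illustration; "normal" is just the mean and standard deviation of the earlier months.

```python
from statistics import mean, stdev

# Hypothetical data: how many pasta recipes per month used each ingredient
# (oldest month first, latest month last).
monthly_counts = {
    "garlic": [40, 42, 39, 41, 40, 41],
    "basil": [12, 11, 13, 12, 12, 30],  # sudden jump in the latest month
    "anchovy": [5, 6, 5, 4, 5, 5],
}


def z_score_of_latest(history):
    """How many standard deviations the latest value is from the earlier 'normal'."""
    baseline, latest = history[:-1], history[-1]
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return 0.0 if latest == mu else float("inf")
    return (latest - mu) / sigma


def trending(counts, threshold=2.0):
    """Ingredients whose latest count is an outlier (far above or below normal)."""
    return sorted(
        name for name, history in counts.items()
        if abs(z_score_of_latest(history)) > threshold
    )


print(trending(monthly_counts))  # ['basil']
```

A positive z-score would read as "trending up", a strongly negative one as "disappearing"; real data would need a sturdier baseline (seasonality, more history), so treat this as a starting point only.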

pallid badge
#

Hi etrotta, thank you for the reply. For example cooking utensils, number of ingredients, preparation time, type of ingredients maybe?
When I thought about it, I came to the conclusion that I would need to structure my data.

#

My introductions to ML often showed me the Iris dataset: several properties, and finally a label y. Based on those properties it was possible to classify the flowers by species.
But with recipes, is the story different? I don't have this "y" label.

agile cobalt
#

there are a lot of different 'tasks', I'd guess that most of what you have seen falls under supervised learning like regression and classification, but there are also a lot of techniques for unsupervised learning, in which you don't have clear labels

take a look at https://scikit-learn.org/stable/unsupervised_learning.html, especially https://scikit-learn.org/stable/modules/outlier_detection.html

rich walrus
#

a clause is a database query that is a command to get something out of a database?
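For reference (this was not answered in the thread): in SQL, a *clause* is one component of a statement, such as `SELECT`, `FROM`, `WHERE` or `ORDER BY`, while the whole statement is the query. A small sketch using Python's built-in `sqlite3` module with a made-up table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE recipes (name TEXT, minutes INTEGER)")
conn.executemany(
    "INSERT INTO recipes VALUES (?, ?)",
    [("carbonara", 25), ("lasagna", 90), ("aglio e olio", 15)],
)

# One query (statement) built from four clauses:
query = (
    "SELECT name "          # SELECT clause: which columns to return
    "FROM recipes "         # FROM clause: which table to read
    "WHERE minutes < 30 "   # WHERE clause: which rows to keep
    "ORDER BY minutes"      # ORDER BY clause: how to sort the result
)
rows = conn.execute(query).fetchall()
print(rows)  # [('aglio e olio',), ('carbonara',)]
```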

eager lance
#

is brocode's pandas 1h video solid?

slender crown
#

Guys, I have a question. I'm currently 15 and interested in ML. I know the math behind ML algorithms, neural networks and more, and I'm working on personal projects in Python. At university, are they only going to teach the math behind this, or also libraries like PyTorch? Also, once I graduate, is it easy to find a job in this field?

waxen kindle
#

It depends what courses you follow, but you will probably learn how to use the libraries too

#

Any decent DL course will explain how to use pytorch or tensorflow

#

also bc practicing things is part of the learning process

slender crown
#

Wanted to start 4 years ago but my math couldn't handle it.

waxen kindle
#

For the job, it's hard to answer, as it's hard to predict what the job market will be once you graduate

#

in like, 6-7 years....

#

Today I would not call it easy, because you need good grades and to show a strong interest, but the job market is (for now) quite open in this field, at least that's my feeling where I live. It will depend on where you live too

slender crown
waxen kindle
#

If you think you can't pass the exams in your country, idk what to suggest. Believe in yourself: if you are interested enough and know how to study right, you'll get it!

#

If it doesn't work for you, studying abroad is also a great opportunity

#

there are a lot of pros and cons to every decision; in the end it's up to you to make them

#

If the exams are hard and you succeed where most people fail, you won't have any issue finding a job

agile cobalt
eager lance
#

is brocode's pandas 1h video solid?

agile cobalt
bronze wyvern
#

Hello, can someone explain how image processing works in general pls.

I need to answer these questions using Pillow in Python:

b. Swap red and blue → how does the image change?
c. Extract the green channel and compute its average value.
d. Convert the image to grayscale by averaging R, G, B.
e. Image cropping: cut out the center 100×100 region.
f. Blurring: apply a Gaussian blur.

But first I wanted to understand the theory of how images are processed. I know that images are sequences of bits and are represented as multi-dimensional matrices/arrays.

I know we need to use libraries like numpy so that we can upload the image to be processed.

First question, when we upload the image into that array, do we have pixels to work with?

I know images are made of 3 colors, RGB, how do they work?
Like if I need to swap red with blue, what's the idea behind that, convey all bits holding blue into red?

The colors have an average value, what does that mean pls

spring field
# bronze wyvern Hello, can someone explain how image processing works in general pls. I need to...

First question, when we upload the image into that array, do we have pixels to work with?
Basically yes, you get either a 2D (mapped/palettized values or grayscale) or a 3D (RGB(A) values) array where the innermost dimension typically represents a particular pixel's color
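To make that concrete, a NumPy sketch with a synthetic array standing in for a real photo; with Pillow you would typically get such an array via `np.asarray(Image.open(path).convert("RGB"))`, with `path` being your own file. Because rows and columns are just array axes, task e (the centre 100×100 crop) is plain slicing:

```python
import numpy as np

# Synthetic stand-in for a loaded RGB photo: 300 rows (height) x 400 columns (width),
# 3 values per pixel (R, G, B), each an unsigned 8-bit integer in 0..255.
img = np.zeros((300, 400, 3), dtype=np.uint8)
img[:, :, 0] = 200  # some red everywhere

height, width, channels = img.shape
print(height, width, channels)  # 300 400 3
print(img[10, 20])              # the [R, G, B] values of the pixel at row 10, column 20

# Task e: cut out the centre 100x100 region by slicing rows, then columns.
top = (height - 100) // 2
left = (width - 100) // 2
center = img[top:top + 100, left:left + 100]
print(center.shape)  # (100, 100, 3)
```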

I know images are made of 3 colors, RGB, how do they work?
You can think of them as color components, you have a bit of red, a bit of green, a bit of blue and when you mix them together you get a new color (and the value of a component tells you how much it contributes to the resulting color)

Like if I need to swap red with blue, what's the idea behind that, convey all bits holding blue into red?
with swapping you'd essentially write the original values of the red channel into the blue channel and the original values of the blue channel into the red channel, i.e. overwrite each channel with the original values of the channel you're swapping it with
if you work with an array interface, you'd essentially just extract all values of a particular color channel and then inject them into the other color channel, though there might be a method with pillow that already abstracts this away from you
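In array terms, tasks b and c could look roughly like this (a NumPy sketch on a tiny made-up 2×2 image; on `Image` objects, Pillow's `split()` plus `Image.merge("RGB", (b, g, r))` does the same swap):

```python
import numpy as np

# Tiny made-up RGB image: 2x2 pixels, channel order R, G, B.
img = np.array(
    [[[255, 0, 0], [0, 0, 255]],
     [[10, 20, 30], [128, 64, 120]]],
    dtype=np.uint8,
)

# Task b: swap red and blue by reordering the last (channel) axis to B, G, R.
# Fancy indexing returns a new array, so `img` itself is left unchanged.
swapped = img[:, :, [2, 1, 0]]
print(swapped[0, 0])  # the pure-red pixel is now pure blue

# Task c: extract the green channel (a 2D array) and average it.
green = img[:, :, 1]
print(green.mean())  # (0 + 0 + 20 + 64) / 4 = 21.0
```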

The colors have an average value, what does that mean pls
In the context of grayscale, you take any single pixel and calculate the average value of its 3 components (RGB), just an arithmetic mean, for example if the pixel's value is [128, 64, 120], you get (128 + 64 + 120) / 3 = 104, so you just replace the pixel's value with [104, 104, 104]
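The same arithmetic over a whole image (task d), as a NumPy sketch; summing in a wider integer type first avoids `uint8` overflow when adding three values. Note that Pillow's own `convert("L")` uses a weighted sum (roughly 0.299 R + 0.587 G + 0.114 B) rather than a plain average, so its output differs slightly.

```python
import numpy as np

img = np.array([[[128, 64, 120], [255, 255, 255]]], dtype=np.uint8)  # a 1x2 image

# Sum in a wider type so 255 + 255 + 255 doesn't wrap around in uint8;
# integer division rounds down.
avg = img.astype(np.uint16).sum(axis=2) // 3                         # shape (1, 2)
gray = np.repeat(avg[:, :, np.newaxis], 3, axis=2).astype(np.uint8)  # same value in R, G, B

print(gray[0, 0].tolist())  # [104, 104, 104]
print(gray[0, 1].tolist())  # [255, 255, 255]
```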
in the context of blurring, for each pixel you take the average of each color channel over all pixels in a certain area around it and replace that pixel's channel value with the average (with a Gaussian blur this is a weighted average, where nearer pixels count more); you repeat this for every pixel, always reading the original values
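A from-scratch sketch of that weighted neighbourhood average for a single channel, assuming a 3×3 kernel and edge padding (both arbitrary choices; in practice Pillow's `img.filter(ImageFilter.GaussianBlur(radius))` does this for you). For an RGB image you would apply it to each channel separately.

```python
import numpy as np


def gaussian_kernel(size=3, sigma=1.0):
    """Weights that fall off with distance from the centre and sum to 1."""
    offsets = np.arange(size) - size // 2
    xx, yy = np.meshgrid(offsets, offsets)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()


def blur_channel(channel, kernel):
    """Replace each pixel with the kernel-weighted average of its neighbourhood."""
    r = kernel.shape[0] // 2
    padded = np.pad(channel.astype(float), r, mode="edge")  # repeat border pixels
    out = np.empty(channel.shape, dtype=float)
    for i in range(channel.shape[0]):
        for j in range(channel.shape[1]):
            window = padded[i:i + kernel.shape[0], j:j + kernel.shape[1]]
            out[i, j] = (window * kernel).sum()
    return out


channel = np.zeros((5, 5))
channel[2, 2] = 90.0  # a single bright pixel
blurred = blur_channel(channel, gaussian_kernel())
print(blurred.round(1))  # the bright spot is now spread over its neighbours
```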

spice tartan
#

Enough to get the ball rolling

#

There's Udemy courses for more in-depth

#

Or more on youtube

eager lance
spice tartan
#

And just watching a couple of minutes of these won't help u know about all the other cool functions that exist in pandas

wheat umbra
#

What is the common practice for pushing data-manipulation jupyter notebooks to github? Do you just push it as is or do you convert it to a python script first ? I have had some weird problems when pulling an .ipynb from a github repo.

grand minnow
#

what kind of problem are you getting when pulling one?

wheat umbra
#

Sometimes I'm having issues with the cells loading. Some take very long to appear properly. I tried this with multiple IDEs.

grand minnow
#

how complex is it? will the cells load if its something as simple as print("hello world")?

wheat umbra
#

I'm basically working with local datasets via pandas and NumPy. Also, in the GitHub repository the cell outputs are cleared, so it doesn't automatically load the outputs when I pull the notebook.

left tartan
wheat umbra
#

I'm loading a dataset from a JSON file, normalizing it, and building a relational schema. In another cell I'm using the featuretools library to extract custom features via DFS (Deep Feature Synthesis) from my relational data schema. By "long to appear properly" I mean that in VSCode, for example, the notebook is completely blank after pulling from the repo and the cells only appear one by one, very slowly, similar to loading a web page on a really bad internet connection. That's why I was wondering whether it's even common practice to push Jupyter notebooks to GitHub instead of converting them to Python scripts, which fixes all these issues at the cost of some control ofc.

agile cobalt
left tartan
#

I use a pre-commit hook to do this, so I don't forget
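The hook isn't named here; tools such as nbstripout are commonly used as a pre-commit hook for this. As a rough idea of what such a hook does, a stdlib-only sketch that clears outputs and execution counts from a notebook (an `.ipynb` file is plain JSON). The function names are hypothetical:

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Remove outputs and execution counts from every code cell, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def strip_file(path: str) -> None:
    """Rewrite a notebook file on disk without its outputs."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(strip_outputs(nb), f, indent=1, ensure_ascii=False)
        f.write("\n")


# Example on an in-memory notebook fragment:
nb = {"cells": [{"cell_type": "code", "execution_count": 3, "source": ["1 + 1"],
                 "outputs": [{"output_type": "execute_result",
                              "data": {"text/plain": ["2"]}}]}]}
print(strip_outputs(nb)["cells"][0])
```

Committing notebooks without outputs keeps diffs small and avoids pushing large embedded results, which may also help with slow-loading notebooks.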

wheat umbra
#

yeah, in my case I use it more like a script. I'm modifying a JSON file, building a new dataframe with more complex features and then saving the "clean" data to a parquet file. The actual use case would be to run it just once on a raw data lake to convert it into a clean dataset. The notebook format is more of a debugging tool to inspect dataframes etc., so I guess I'm going to convert this into a Python script for the final version of my project.

left tartan
#

I'd just be curious what step is slow though, it could be that it's loading js assets for rendering/etc. Are you opening it in Jupyter directly or via vscode? I usually open my notebooks in vscode

wheat umbra
#

I open it in vscode

lapis sequoia
#

I couldn't find single-GPU H100/A100 instances on AWS, only 8-GPU ones. Does anyone know an alternative I could use?

eager lance
bronze wyvern
#

also one thing :c, this is more of a general question: why do we represent images using multidimensional arrays, like 2D arrays? What flexibility does it give us? Is it because of the row x col structure? (If so, what is special about that?)

night cove
#

Hey everyone, I need some help with running an older ML project called ECINN (Electrochemical-Inspired Neural Network).

I've been trying to run the example code (main.py) for Fe ion detection, but I keep running into compatibility issues with TensorFlow, Keras, and Python versions.

Here's what I've tried so far:

  • Environment: WinPython 3.12.4.1

  • TensorFlow version: 2.20.0

  • Keras: the one bundled with TF

  • The codebase was originally written for TensorFlow 2.3.0 (2020 era).

  • On Windows, I keep hitting errors like:

    • TypeError: unsupported format string passed to list.__format__ (fixed manually)
    • ValueError: by_name only supports loading legacy '.h5'
    • DLL load failures for TensorFlow on WinPython
    • Pandas: "Invalid file path or buffer object type: <class 'list'>"

I even tried Colab, but it doesn't support TF 2.3.0 anymore (only ≥2.16).

Question: Whatโ€™s the best way to get ECINN running in 2025? Should I:

  1. Use Docker with an old TF 2.3.0 image?
  2. Patch the code fully for TF 2.20.0 (new Keras saving/loading API, etc.)?
  3. Or is there a smarter way to emulate the old environment?

Ultimately, I just want to run the Fe ion example (ECINN-BV for Fe Ion on GCE) and get the trained weights + plots it should output.

Any advice or working setup instructions would be amazing 🙏

https://zenodo.org/records/10246052

agile cobalt
#

docker with the original version is probably your best bet as far as compatibility goes

night cove
#

I have tried everything else except that
I will do that and see if it works
if it doesn't, I'll probably have to sit down and rewrite the whole thing

vivid nimbus
#

if anybody wants to work on modeling the economy within hypixel skyblock (minecraft), please dm me.

bronze wyvern
#

Hi, can someone explain how pre-processing techniques like Gaussian blur and grayscale conversion reduce "noise" in images pls

vague heron
#

Things like Gaussian blur filter/dampen out large variations (for example, neighboring pixels with very different grayscale values).

#

These large variations often relate to noise, but of course some of it is part of the image so it comes out blurry if you use it too aggressively.
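A small NumPy experiment illustrating this, assuming a flat grey patch with random noise added (the sizes, the noise level and the simple 3×3 mean filter are arbitrary choices): averaging each pixel with its neighbours shrinks the pixel-to-pixel variation, which here is pure noise.

```python
import numpy as np

rng = np.random.default_rng(0)
clean = np.full((64, 64), 100.0)                # flat grey patch
noisy = clean + rng.normal(0, 20, clean.shape)  # add random noise


def mean_filter_3x3(a):
    """Average each pixel with its 8 neighbours (edges padded by repetition)."""
    p = np.pad(a, 1, mode="edge")
    h, w = a.shape
    return sum(p[di:di + h, dj:dj + w] for di in range(3) for dj in range(3)) / 9


smoothed = mean_filter_3x3(noisy)
print(noisy.std(), smoothed.std())  # the spread drops to roughly a third
```

On a real photo the same averaging also softens genuine edges, which is the blurriness mentioned above.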

iron basalt
spice tartan
#

There's some extension or something I remember for GitHub that makes notebooks look cleaner and shows outputs clearly, with diffs too.

lean seal
#

yo, has anyone learned Bayesian networks / probabilistic graphical modelling?

#

I am about to do research on it with my professor, and I started learning a bit, but I don't feel comfortable with the math side of it yet; the probability side of it is just so weird. Any advice?

vague heron
#

I hear there is a good course on Coursera about it

lean seal
#

yea the stanford one

#

i am a sophomore and it's a graduate course...

vague heron
lean seal
#

yea this is the one i am doing right now

#

when it comes to the tests, i fail them

#

i feel like it doesn't help much and it just expects us to already be familiar with it

vague heron
#

I see 🙁

wooden hill
#

Good evening fellas

viscid urchin
bronze wyvern
spring field
bronze wyvern
spring field
#

well, that's similar to taking the average of all channels to convert it to grayscale(ish), but you only used the value of one of the color channels (red in this case)

#

like you went from some pixel value like [128, 64, 120] to [128, 128, 128] instead of [128, 0, 0] (or [128, 255, 255])
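A NumPy sketch of those options for the example pixel. A single channel on its own is just one number per pixel, so a viewer shows it as grayscale (Pillow's `split()` returns single-band images in mode "L", which is why `r` displays as gray); to see it as red you have to put it back into the red slot of an RGB image with the other channels zeroed. This is an illustration, not the code from the thread:

```python
import numpy as np

pixel = np.array([128, 64, 120], dtype=np.uint8)  # R, G, B
r = pixel[0]                                       # one channel: a single number

as_gray = np.array([r, r, r], dtype=np.uint8)      # same value in every slot -> gray
as_red = np.array([r, 0, 0], dtype=np.uint8)       # only the red slot -> dark red
avg_gray = np.full(3, pixel.astype(int).mean(), dtype=np.uint8)  # true average: 104

print(as_gray.tolist(), as_red.tolist(), avg_gray.tolist())
# [128, 128, 128] [128, 0, 0] [104, 104, 104]
```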

bronze wyvern
#

when I only use r, like consider this:


```py
r, g, b = cat_img.split()
```

Normally, do we have 3 instances of an image? Would each of r, g and b have 3 channels? Does displaying r show the average intensity across the channels, like [128,128,128]?

vague heron
#

You can check this by comparing the first few pixels of the original with your modified 'red' one. Then it will become clear what happens.

mellow vector
#

Can someone tell me how to collect the column headers from a polars LazyFrame? atm I'm using `column_list = list(headers_lf.collect_schema().names())`

#

`collect_schema` is throwing a bunch of warnings at me

agile cobalt
pallid badge
#

Could I ask you if you heard about SPECTER2? https://huggingface.co/allenai/specter2
To the best of my understanding, this is an encoder for scientific text. Are there maybe better ones out there?
I would like to fine-tune this on a scientific domain.
What I don't fully understand yet: do I couple this with an LLM like Llama, and then somehow query my embeddings with its help?
And how would I couple this with scikit-learn functionality, e.g. clustering?
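A rough sketch of how the pieces usually fit together, with random vectors standing in for real SPECTER2 embeddings (producing those requires the model itself, which is not shown): the query is embedded with the same encoder, documents are ranked by cosine similarity, and in a retrieval setup the top hits would then be pasted into the LLM's prompt. The same embedding matrix can be passed directly to scikit-learn, e.g. `sklearn.cluster.KMeans(n_clusters=...).fit_predict(doc_embeddings)`.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical stand-ins: 5 paper embeddings of dimension 768.
doc_embeddings = rng.normal(size=(5, 768))
# Pretend the query embedding is close to paper #3.
query_embedding = doc_embeddings[3] + 0.1 * rng.normal(size=768)


def top_k_cosine(query, docs, k=2):
    """Indices of the k documents most similar to the query (cosine similarity)."""
    docs_n = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    q_n = query / np.linalg.norm(query)
    scores = docs_n @ q_n
    return np.argsort(scores)[::-1][:k], scores


idx, scores = top_k_cosine(query_embedding, doc_embeddings)
print(idx[0])  # 3
```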

twilit geode
#

I know, I know this is a Python-based discussion, but is there a server to discuss how to get into AI, besides just YouTubing it and drifting between what's the right and wrong approach? Tutorial hell. T-T Is that a better place to ask this? JUST to start, because idk where to start with it, and I guess learn to use it, not make stuff I guess... yet.

calm cipher
spice mason
#

hi

fallen basin
warm flame
#

heeey

lost beacon
#

Hi, I'm getting a mutex.cc lock error with Hugging Face transformers. Has anyone faced this?

twilit geode
#

I'm still an absolute beginner and still learning Python; my dad was like "learn AI", which again is broad af. But I did like the concept of data analyst when I picked this up years ago, I just dunno how I could use AI to help with that.

mellow vector
agile cobalt
# mellow vector I opted to use `readline()`, though I have no idea how it compares, I read that ...

collect_schema may or may not need to load some data and execute parts of the query depending on the query

for example:
```py
>>> import polars as pl
>>> lz = pl.LazyFrame({'x': [1, 2, 3]})
>>> unknown_schema = pl.col('x').map_elements(print)
>>> informed_schema = pl.col('x').map_elements(print, return_dtype=pl.Int64)

>>> # No need to run any parts of the query (it can determine the schema without running the query itself)
>>> lz.select(informed_schema).collect_schema()
Schema({'x': Int64})

>>> # It needs to run part of the query to know what the final schema will be (unknown return_dtype for map_elements)
>>> lz.select(unknown_schema).collect_schema()
1
1
Schema({'x': Int64})
```

(not sure why it is printing 1 twice though)
mellow vector
#

It worked with collect_schema().names() which now that I think about it, worked when inferring the datatypes failed before, so it must not be terribly expensive (vertically)

bronze wyvern
# spring field no, RGB are the 3 channels

yeah I see. When I display only the image with channel r, does this mean channels g and b are a bunch of 0s? I don't understand why we get a gray image and not a red one though; what is the math here pls

crisp edge
#

Chat, I'm fed up with web development; it's boring just designing things and making them real. The resources I have learned, like Python, give me an edge for machine learning and AI..... So can anyone provide me a roadmap or structured plan for becoming an ML engineer and landing a job at MAANG companies????

serene scaffold
crisp edge
twilit geode
#

Are there some video/course to help with general knowledge how to just get started?

small wedge
#

Assuming you want ml

#

If you're just looking for like the most basic surface level intro

#

I'd recommend the 3Blue1Brown series on neural networks

astral sun
#

I have a project named NCl (it can also be called SSC) 🙂 I'm working on its parser/lexer:

arctic wedgeBOT
mellow vector
#

So I'm at a fork, altair or plotly? Until now I've used mostly matplotlib and I hate it.

#

Tempted to just flip a coin.

#

1000 times (naturally)

thorny umbra
#

Hello everyone, I'm a BS data science student. I just completed a 12-hour Python course, learnt the basic stuff and did some basic projects as well. Now I just want to ask what the next thing for me to work on should be, related to data science.

sweet verge
#

PyTorch or TensorFlow?

serene scaffold
sweet verge
#

I needed that..

agile cobalt
jovial ravine
#

i wanna make an AI chatbot with Python using the torch library

serene scaffold
#

A classifier would be more approachable to start. By orders of magnitude

jovial ravine
#

alright

viscid urchin
lapis sequoia
bronze wyvern
#

Hi, can someone suggest where I can get an image data set containing at least six of the following office items pls:

chair, bin, mug, bottle, book, keyboard, mouse, stapler, notebook, phone

viscid urchin
lapis sequoia
viscid urchin
#

Never run into it but cool.

viscid urchin
#

I'm aiming to do their "Interdisciplinary Data Science"

merry oak
#

!rule 6 | We're not a job board. Your message has been removed.

arctic wedgeBOT
#

6. Do not post unapproved advertising.

lucid elbow
arctic wedgeBOT
gritty vessel
#

Hey, I have a question: what's the difference between training a model for 200 epochs, and training it for 100 epochs and then fine-tuning it on the same data for another 100 epochs?

serene scaffold
#

"fine tuning" is just "more training, possibly for a different task"

gritty vessel
#

Got it

#

So things like optimizer momentum and the learning rate schedule will be lost in the case of fine-tuning, right?

#

Since we are starting again?

#

Compared to going for 200 epochs in one go, or saving all this info along with the best model

#

In these cases, will both be the same?

serene scaffold
#

what I'm getting at is that "fine tuning" isn't a fundamentally different thing from training

waxen kindle
#

Usually you finetune something that has been trained for a different task

#

Or with different data
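A toy illustration of the difference discussed above, in plain Python: momentum SGD on a one-parameter quadratic with an exponentially decaying learning rate, where each step stands in for one full-batch epoch (all numbers are arbitrary). Resuming from saved weights *and* optimizer state reproduces uninterrupted training exactly; keeping only the weights resets the momentum and the schedule, so the result differs.

```python
def train(steps, w=0.0, velocity=0.0, step=0, lr0=0.1, decay=0.99, momentum=0.9):
    """Momentum SGD on loss(w) = (w - 3)^2; returns the full training state."""
    for _ in range(steps):
        grad = 2 * (w - 3)
        lr = lr0 * decay**step              # schedule depends on the global step count
        velocity = momentum * velocity - lr * grad
        w += velocity
        step += 1
    return w, velocity, step


straight = train(200)                            # 200 epochs in one go

w, v, s = train(100)                             # 100 epochs, then "save a checkpoint"
resumed = train(100, w=w, velocity=v, step=s)    # restore weights AND optimizer state
restarted = train(100, w=w)                      # weights only: momentum and schedule reset

print(straight == resumed)            # True
print(straight[0] == restarted[0])    # False
```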

waxen kindle
opaque condor
#

What would the architecture for a multi model look like?

serene scaffold
opaque condor
#

A multiple model that can generate text and object detection

agile cobalt
# opaque condor A multiple model that can generate text and object detection

it varies. If by text you mean arbitrary LLM-like text messages, at one extreme you could have a "normal" multimodal LLM trained to do object detection via tool calling, representing the detection as normal text formatted as JSON

another case could be having a shared base model, then one head that predicts the text and another head that predicts the bounding box for the object
(this second case making more sense for classification with fixed text labels)
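A shape-level sketch of that second option in NumPy, with random weights and made-up sizes (a real model would use a trained backbone such as a CNN or ViT in a deep learning framework, and train both heads with a combined loss): one shared feature extractor feeds a classification head producing label probabilities and a regression head producing box coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images, in_dim, feat_dim, n_labels = 4, 3072, 256, 10  # hypothetical sizes

W_backbone = rng.normal(0, 0.01, (in_dim, feat_dim))
W_label = rng.normal(0, 0.01, (feat_dim, n_labels))  # head 1: which label
W_box = rng.normal(0, 0.01, (feat_dim, 4))           # head 2: box (x, y, w, h)


def forward(x):
    features = np.maximum(x @ W_backbone, 0)         # shared representation (ReLU)
    logits = features @ W_label
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)        # softmax over labels
    boxes = 1 / (1 + np.exp(-(features @ W_box)))    # sigmoid: coordinates in (0, 1)
    return probs, boxes


images = rng.random((n_images, in_dim))              # e.g. flattened 32x32x3 inputs
probs, boxes = forward(images)
print(probs.shape, boxes.shape)  # (4, 10) (4, 4)
```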

autumn perch
knotty dagger
#

Hi, I am new to ML and from a non-tech background. I have a question: when handling outliers and resampling, do we work with the entire dataset or just the training data?

serene scaffold
void cape
#

I'm planning to buy ISLR. I only know Python, so should I buy the R or the Python version?
Some say to buy the R version while you build it in Python, so you also pick up R comprehension along the way.

agile cobalt
sly isle
#

Does someone use GitHub Education?

serene scaffold
sly isle
bronze wyvern
#

Hello, quick question, I saw the word epoch quite frequently when we talk about training, what is that?

serene scaffold
#

usually when you train, you let the model train on each instance in the training set once.
every time you do that, that's an epoch
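In code, an epoch is one full pass of the outer loop below; with mini-batches, each epoch consists of several optimizer steps, which is where numbers like `steps_per_epoch * epochs` come from. A plain-Python sketch where the per-batch update is a hypothetical placeholder:

```python
import math
import random

dataset = list(range(1000))  # stand-in for 1000 training examples
batch_size = 32
epochs = 3

steps_per_epoch = math.ceil(len(dataset) / batch_size)  # 32 steps (the last batch has 8)
total_steps = 0
for epoch in range(epochs):
    random.shuffle(dataset)  # a new order each epoch
    for start in range(0, len(dataset), batch_size):
        batch = dataset[start:start + batch_size]
        # update_model(batch)  <- hypothetical training step would go here
        total_steps += 1

print(steps_per_epoch, total_steps)  # 32 96
```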

bronze wyvern
#

oh ok, so let's imagine I have 1000 images. I need to train my model to classify those images, let's say between cats and dogs.

1 epoch means "looking" at the dataset only once? The ML algorithm tries to infer some features during that first pass, but this 1 epoch might not be sufficient to deduce all the underlying features, so we increase the number of epochs?

(But if more epochs means better accuracy, does that mean it should be as big as possible?)

serene scaffold
bronze wyvern
#

ah ok, loss, loss function? Yeah heard that term, I know we use backpropagation and gradient descent to minimize the loss

serene scaffold
bronze wyvern
#

yeah exactly, at this point we don't really need to do more training; we assume it's a compromise, and that adding more epochs will only increase accuracy by a very small amount?

serene scaffold
#

like, if your test set has 1,000 instances, a loss change of 0.00000001 probably won't influence the model's decision for any of those 1000.

bronze wyvern
#

yeah I see

#

a lower loss translates directly to better performance
like overfitting you mean?

serene scaffold
#

is this how politicians feel?

#

anyway, if a model performs poorly on unseen data despite a steadily decreasing training loss, that would mean the model has overfit to the training data.
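A small NumPy illustration with made-up data: the true relationship is a straight line, and the training labels carry a deliberately alternating ±0.5 error as a deterministic stand-in for noise. A degree-9 polynomial drives the training error to about zero by passing through every noisy point, but swings wildly between them, so its error on held-out points is far worse than the simple line's.

```python
import numpy as np

# Training data: true line y = 2x plus an alternating +/-0.5 "noise".
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + np.where(np.arange(10) % 2 == 0, 0.5, -0.5)

# Validation points lie between the training points, labelled with the true line.
x_val = (x_train[:-1] + x_train[1:]) / 2
y_val = 2 * x_val


def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


results = {}
for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_val, y_val))
    print(degree, results[degree])  # (training MSE, validation MSE)
```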

bronze wyvern
#

yep I see, thanks !

bronze wyvern
#

Hello, quick question. Say someone understands the basics of how ML/DL works, like the theoretical concepts, but now needs to apply them. While they know the concepts, they still need to implement them in code.

So my question is, what is a correct approach here? How does that person decide which library/framework to use?

Say we pick a library/framework. Now, in order to understand, for e.g, how to implement an RNN in tensorflow, we would expect tensorflow documentation to talk about that?

serene scaffold
bronze wyvern
#

alright noted, thanks !

#

by the way, is there a reason why PyTorch is preferred over TensorFlow?

serene scaffold
#

The community has coalesced around pytorch and no one uses tensorflow except the authors of outdated tutorials.

#

I've never seen a coworker use tensorflow a single time for anything

bronze wyvern
#

yep noted, thanks !

bronze wyvern
#

Hello, quick question, how do we know that a model we have trained is ready? Like it's not overfitted etc and we can actually use it to do real stuff?

agile cobalt
#

depends on the task, for many you'll want to keep track of some metrics like its accuracy in addition to the loss, then stop training a bit after it stops improving
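A minimal sketch of "stop a bit after it stops improving" (early stopping with a patience counter), run on a made-up list of per-epoch validation accuracies instead of a real training loop. In practice you would also keep the model weights from the best epoch.

```python
def early_stopping_epoch(val_scores, patience=3):
    """Return (best_epoch, stop_epoch): stop after `patience` epochs without a new best."""
    best_score, best_epoch = float("-inf"), -1
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_scores) - 1


# Hypothetical validation accuracy per epoch: improves, then plateaus and declines.
scores = [0.61, 0.70, 0.76, 0.79, 0.80, 0.795, 0.80, 0.79, 0.78, 0.77]
print(early_stopping_epoch(scores))  # (4, 7)
```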

#

for some cases it could never become good enough to do 'real stuff' depending on what it is, or you could need to retrain it a few times using different data & hyperparameters configurations

bronze wyvern
#

yep I see, question though

#

when I was doing a project for uni, the teacher said that we should split our data into 80% training and 20% testing, I think. But I read recently that we have training, validation and testing sets

#

I'm confused, validation and testing set are different things?

agile cobalt
#

with the 3 sets, you set aside some data that will only be used once your entire project is over: you never evaluate on it until right before you decide whether or not to put the model into production / publish your results

if you 'retrain it a few times using different data & hyperparameters configurations' too much, some configurations may be better on your test data by chance, similarly to over-fitting to the train data

the separation of test & validation data helps to avoid overoptimistic results which then fail in production
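A stdlib sketch of carving out the three sets once, up front; the 70/15/15 proportions and the fixed seed are arbitrary choices, and `three_way_split` is a hypothetical helper name.

```python
import random


def three_way_split(items, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then carve out test, validation and training sets."""
    items = list(items)
    random.Random(seed).shuffle(items)   # reproducible shuffle
    n_test = round(len(items) * test_frac)
    n_val = round(len(items) * val_frac)
    test = items[:n_test]                # locked away until the very end
    val = items[n_test:n_test + n_val]   # used for tuning / model selection
    train = items[n_test + n_val:]       # used to fit the model
    return train, val, test


train, val, test = three_way_split(range(1000))
print(len(train), len(val), len(test))  # 700 150 150
```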

viscid urchin
#

I see validation as a way to check on progress, and testing as a way to check outcomes, lemme know if anyone thinks that is crazy.

#

Maybe it means something different in the data science context.

bronze wyvern
#

yeah I see, hmm I will read a bit on hyperparameter tuning and come back. But the validation data set is also unseen, no?

#

ah it's used indirectly with hyperparameters tuning?

#

with the test set, we don't do anything with that, no hyperparameters tuning etc?

agile cobalt
#

I didn't specify which is which because I sometimes get confused and swap them derp

yeah, you only run the latter a single time after it's done training: no more tuning after you get your score on it, ideally no selecting which model to use based on it, just "this is your expected score with real data" after picking the final model

untold bloom
#

course materials are the training set, past year exams are the validation set, to-be-taken exam is the test set

bronze wyvern
#

yep I see, thanks !

unkempt wigeon
worldly dawn
woven prairie
#

Has anyone worked with RAG-based memory for an LLM?

#

Instead of keeping the last 5-6 queries as conversation history, we could use a RAG-based approach for memory.
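A toy sketch of that idea: store every past turn and, for a new query, retrieve the most relevant turns rather than the most recent ones. Plain word overlap (Jaccard similarity) stands in for embedding similarity here; a real setup would embed the turns with an embedding model and use a vector store. The conversation is made up.

```python
import re


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_memory(history, query, k=2):
    """Return the k past turns sharing the most words with the query (Jaccard)."""
    q = tokens(query)

    def score(turn):
        t = tokens(turn)
        return len(q & t) / len(q | t) if q | t else 0.0

    return sorted(history, key=score, reverse=True)[:k]


history = [
    "My dog is called Biscuit and she is a beagle.",
    "I prefer answers with Python examples.",
    "The weather today is rainy.",
    "Remind me to buy dog food.",
    "What is the capital of France?",
    "Tell me a joke.",
]
print(retrieve_memory(history, "what breed is my dog?", k=1))
```

A window of only the last 5 turns would already have dropped the first turn, which is the one that answers the question.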

unkempt wigeon
worldly dawn
unkempt wigeon
#

The reward function and the agent itself

worldly dawn
#

what about it?

unkempt wigeon
#

The very small reward and how does the AI use the joints of the model

bronze wyvern
#

Hello, quick question, why are histograms vital in image processing? For example say we are plotting frequency against pixel values, what can we infer?

If, say, we have different histograms for the R, G and B colors and draw 3 bell curves on them, can we deduce which color is more dominant?
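This wasn't answered in the thread, but as a starting point: an image histogram is simply a count of how many pixels take each value 0..255, computed per channel. A NumPy sketch on a random made-up image whose red channel is brighter on average; comparing the channels' histograms (or simply their means) shows which colour dominates overall, without fitting bell curves.

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up 50x50 image: red values drawn from 100..255, green and blue from 0..155.
img = np.stack([
    rng.integers(100, 256, (50, 50)),  # R
    rng.integers(0, 156, (50, 50)),    # G
    rng.integers(0, 156, (50, 50)),    # B
], axis=2).astype(np.uint8)

hists = {}
for c, name in enumerate("RGB"):
    channel = img[:, :, c]
    hists[name] = np.bincount(channel.ravel(), minlength=256)  # pixel count per value
    print(name, "mean:", round(float(channel.mean()), 1))
```

The same counts are what tools use for things like checking exposure and contrast, or for histogram equalization.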

silk pendant
#
```py
import matplotlib.pyplot as plt
import numpy as np

plt.rcParams['text.usetex'] = True

fig = plt.figure()

ax = fig.add_subplot(projection="3d")
ax.view_init(elev=-21, azim=153, roll=-79.5)
ax.set_box_aspect((1, 1, 1), zoom=0.95)

x, y, z = np.array([[-1,0,0],[0,-1,0],[0,0,-1]])
u, v, w = np.array([[1,0,0],[0,1,0],[0,0,1]])
ax.quiver(x,y,z,u,v,w,arrow_length_ratio=0.1, color="black", length=5)

ax.text(3.9, 0.1, 0, '$x$', size='x-large')
ax.text(0, 3.9, 0.1, '$y$', size='x-large')
ax.text(0, 0.1, 3.9, '$z$', size='x-large')

ax.plot([0, 1], [0, 2], [0, 3], marker='o')

ax.set_axis_off()

plt.savefig('Figure-4.svg', bbox_inches='tight')

plt.show()
```

Why does my code above produce arrows of different lengths?

long locust
#

That is what ax.quiver produces

#

Or do you mean the main axes themselves?

#

It is likely to do with the default projection and rotation

silk pendant
#

it does seem like the scale of each axis itself changes

#

even if I comment out my code for setting the default view angle, one of the axes is still noticeably longer than the rest

long locust
#

Hmmm.

#

Could be the way you are passing parameters to the quiver function

#

From the docs:

quiver([X, Y], U, V, [C], /, **kwargs)

silk pendant
#

I think that's for 2d

long locust
#

So for 3D I would guess [X, Y, Z]

silk pendant
long locust
#

The plot thickens

silk pendant
#

as if it wasn't thick enough already

#

judging by how the text placements relative to the arrows are correct, I'd say the scale of each axis is what's changing

#

so in other words, the actual space is warping

#

🤔

long locust
#

If I turn off the rcParams it gets pretty close to what I think you want

silk pendant
#
```py
import matplotlib.pyplot as plt
import numpy as np

# plt.rcParams['text.usetex'] = True

fig = plt.figure()

ax = fig.add_subplot(projection="3d")
# ax.view_init(elev=-21, azim=153, roll=-79.5)
ax.set_box_aspect((1, 1, 1), zoom=0.95)

x, y, z = np.array([[-1,0,0],[0,-1,0],[0,0,-1]])
u, v, w = np.array([[1,0,0],[0,1,0],[0,0,1]])
ax.quiver(x,y,z,u,v,w,arrow_length_ratio=0.1, color="black", length=5)

ax.text(3.9, 0.1, 0, '$x$', size='x-large')
ax.text(0, 3.9, 0.1, '$y$', size='x-large')
ax.text(0, 0.1, 3.9, '$z$', size='x-large')

ax.plot([0, 1], [0, 2], [0, 3], marker='o')

ax.set_axis_off()

plt.savefig('Figure-4.svg', bbox_inches='tight')

plt.show()
```
long locust
#

Does the saved figure look different from your shown figure?

silk pendant
#

the same

long locust
# long locust
```py
import matplotlib.pyplot as plt
import numpy as np

# plt.rcParams['text.usetex'] = True

fig = plt.figure()

ax = fig.add_subplot(projection="3d")
ax.view_init(elev=-21, azim=153, roll=-79.5)
ax.set_box_aspect((1, 1, 1), zoom=0.95)

x, y, z = np.array([[-1,0,0],[0,-1,0],[0,0,-1]])
u, v, w = np.array([[1,0,0],[0,1,0],[0,0,1]])
ax.quiver(x,y,z,u,v,w,arrow_length_ratio=0.1, color="black", length=5)

ax.text(3.9, 0.1, 0, '$x$', size='x-large')
ax.text(0, 3.9, 0.1, '$y$', size='x-large')
ax.text(0, 0.1, 3.9, '$z$', size='x-large')

ax.plot([0, 1], [0, 2], [0, 3], marker='o')

ax.set_axis_off()

# plt.savefig('Figure-4.svg', bbox_inches='tight')

plt.show()
```
silk pendant
#

ohhhhh

#

from that view angle it looks fine

#

but moving it around you realize the z axis is absurdly longer than the other axes

#

x axis I mean

long locust
#

That is odd, but I gotta run

silk pendant
#

there goes my one ray of hope

#

time to go back down the google/stack overflow rabbit hole

#
```py
ax.set_xlim3d(0, 5)
ax.set_ylim3d(0, 5)
ax.set_zlim3d(0, 5)
```

adding this seems to work

molten hamlet
#
```py
import numpy as np
import matplotlib.pyplot as plt

# Parameter
a = 0.1

# Time array
t = np.linspace(0, 50, 400)

# Compute X and Y
X = np.sin(t * a)
Y = np.cos(t * a)

# Create 2D grid for contour plot
X_grid, Y_grid = np.meshgrid(X, Y)

# Define function F(X, Y)
F = X_grid + Y_grid

# Plot filled contour
plt.figure()
contour = plt.contourf(X_grid, Y_grid, F, levels=8, cmap='plasma')
# plt.colorbar(contour, label='F(X, Y)')
# plt.title('Filled Contour plot of F(X, Y) = X + Y')
# plt.xlabel('X = sin(t*a)')
# plt.ylabel('Y = cos(t*a)')
plt.show()
```
I think matplotlib has some bugs
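A likely cause, offered as an assumption rather than something confirmed in the thread: `contourf` expects ordered grid coordinates, but `sin(t*a)` and `cos(t*a)` for `t*a` running from 0 to 5 rise and then fall again, so the meshgrid built from them folds back over itself and the filled regions overlap. A sketch that builds the grid from evenly spaced, increasing ranges and draws the (X, Y) path on top instead:

```python
import numpy as np
import matplotlib.pyplot as plt

a = 0.1
t = np.linspace(0, 50, 400)
X = np.sin(t * a)
Y = np.cos(t * a)
print(np.all(np.diff(X) > 0), np.all(np.diff(Y) > 0))  # False False: not monotonic

# Build the contour grid from sorted, evenly spaced coordinates instead.
xs = np.linspace(-1, 1, 200)
ys = np.linspace(-1, 1, 200)
X_grid, Y_grid = np.meshgrid(xs, ys)
F = X_grid + Y_grid

fig, ax = plt.subplots()
contour = ax.contourf(X_grid, Y_grid, F, levels=8, cmap="plasma")
fig.colorbar(contour, ax=ax, label="F(X, Y) = X + Y")
ax.plot(X, Y, color="white")  # the parametric path (sin(t*a), cos(t*a))
ax.set_xlabel("X = sin(t*a)")
ax.set_ylabel("Y = cos(t*a)")
plt.show()
```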
viscid urchin
spring field
#

Indeed, especially love the exploration away from LLMs

serene scaffold
#

I wish I could explore not-LLMs

#

but that's not where the money is

opaque condor
#

Would 47 images of cats be a good data set of cat images?

waxen kindle
#

for which task ?

opaque condor
#

Image recognition of animals I still have dogs and gerbils

waxen kindle
#

I think you would need at least a few thousand images per class to get decent accuracy

opaque condor
#

I could apply transforms to all the images to test the model's robustness; plus, this is what I could get from scouring both the internet and some of Discord

waxen kindle
#

yes but still

opaque condor
#

But truly it would also train the ai for robustness

spring field
#

48, you have 48 images

#

also 48 images is how you overfit the model

#

have you considered getting a dataset from somewhere like huggingface?

waxen kindle
#

it will train it to recognize 48 images of cats, and some very similar images

#

remember that cat vs dog vs gerbils are very similar, it may be hard to spot differences even for a human if low-resolution or bad lighting

opaque condor
waxen kindle
#

I don't think you realize how many samples are required to train ML/DL/AI algorithm

spring field
#

and what does "from scratch" mean anyway? are you going around with a camera, to people's houses and taking photos of their cats?

opaque condor
#

No, what I mean by "from scratch" is taking photos that people have shared, putting them in a folder labeled cats and naming each file

And to answer your statement 2x tanguy:
I do realize how many images are needed. I tried to make an image scraper, but (mindful dev) took me off that route because he said it was against a site's policy. I know I can go to Hugging Face or Kaggle, but if I need to understand why it's so hard to train, I might as well learn a little bit of it, right?

waxen kindle
#

if you did realise, you wouldn't be saving a few dozen random pics from the internet

opaque condor
#

I'm going to add more

I don't exactly sleep, and when I do I don't exactly want to get up, so I might as well use that to my advantage: over the next two days I aim to scrape enough images that are not AI-made to make my own dataset

waxen kindle
#

You can 100% use some AI-made images

#

of course not 100% of the dataset should be made of it, but you can have some

opaque condor
#

I'm trying to use as much pure data as I can

eager lance
#

any good resources for data science?

worldly dawn
primal pewter
#

Hello, I'm thinking about doing a project that would involve training an AI model. I'm still a beginner, but a CS student, so in any case it will be a good learning experience. Because I'm a beginner, I don't really know where to start, and I was thinking about using GPT, not to code for me, but to point me in the right direction: what I need, what I must do, and generally to set me up to get going. I'm not asking out of ethical concerns, but purely whether language models like GPT are in a state good enough to do that.

serene scaffold
primal pewter
mellow vector
#

I've been going over pl.LazyFrame, working toward a memory-cheap pipeline. I have it written from CSV (-> Parquet) through the preprocessing operations and am at the point where I need to import it into PyTorch. I'm not quite sure where to start; I'm tempted to .collect() everything for torch, but I suspect there's a cheaper way to load batches.

agile cobalt
mellow vector
agile cobalt
mellow vector
agile cobalt
#

shuffling in lazy mode is also awkward, but if you can do it after collecting each batch that should work
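A minimal sketch of the "shuffle after collecting each batch" idea, assuming a hypothetical `fetch(offset, length)` callable that materialises one chunk of rows (with polars that would typically be something like `lf.slice(offset, length).collect()`). It shuffles the order of the chunks and the rows inside each chunk, so only one chunk is in memory at a time; note this is a block shuffle, not a full global shuffle.

```python
import random
from typing import Callable, Iterator, List, Sequence


def block_shuffled_batches(
    fetch: Callable[[int, int], Sequence],
    n_rows: int,
    batch_size: int,
    seed: int = 0,
) -> Iterator[List]:
    """Yield batches, shuffling chunk order and rows within each chunk.

    `fetch(offset, length)` is a hypothetical loader that returns only
    rows [offset, offset + length), e.g. one slice of a lazy frame.
    """
    rng = random.Random(seed)
    offsets = list(range(0, n_rows, batch_size))
    rng.shuffle(offsets)  # visit chunks in a random order
    for offset in offsets:
        length = min(batch_size, n_rows - offset)
        rows = list(fetch(offset, length))  # only this chunk is materialised
        rng.shuffle(rows)  # shuffle rows inside the chunk
        yield rows


# Usage with an in-memory list standing in for the lazy frame:
data = list(range(10))
batches = list(block_shuffled_batches(lambda o, n: data[o:o + n], len(data), 4, seed=42))
```

With PyTorch, one option is to run a generator like this inside an `IterableDataset.__iter__` and pass `batch_size=None` to the `DataLoader` so it doesn't re-batch what is already batched.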

runic glacier
viscid urchin
#

Instead of giving you the answer, they explain the context and then ask you a question etc.

mellow vector
# runic glacier find solution by yourself and never ask here again

this kind of "humor" isn't really appropriate for this server, if you wouldn't behave that way in a library it's probably not a good idea, someone who wants help might be turned off to the server by your behavior and that is basically the opposite of what we want here. Everyone is trying to be helpful.

long locust
#

They won't be able to reply

gray saffron
#

anybody help me develop an ai

viscid urchin
gray saffron
#

no like a proper medical chatbot ai

viscid urchin
#

Yep, that's how I would build that.

high heron
bronze wyvern
#

Hello, quick question: what's the purpose of thresholding in image processing? I was told to apply "Otsu" thresholding; what is its purpose, and how does it benefit image processing techniques pls

viscid urchin
#

My understanding is that there are two main reasons: To reduce the amount of data you are processing, and to "converge" similar images into the same result if they differ in ways that just seem "noisy".

#

But I'm not an expert, hopefully someone can improve on that.
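Thresholding turns a grayscale image into a binary (foreground/background) one, and Otsu's method picks the threshold automatically: it tries every gray level and keeps the one that maximises the between-class variance of the two resulting groups. A minimal pure-Python sketch on a flat list of 8-bit gray values (in practice you'd usually call a library routine such as OpenCV's `cv2.threshold` with the `THRESH_OTSU` flag, or scikit-image's `threshold_otsu`):

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Return the threshold t that maximises between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below t
    sum0 = 0  # sum of intensities at or below t
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split is meaningless
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2  # proportional to sigma_b^2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels: Sequence[int], t: int) -> List[int]:
    return [255 if p > t else 0 for p in pixels]


# Two clusters of intensities: dark (~30) and bright (~200)
pixels = [28, 30, 32, 31, 29, 198, 200, 202, 201, 199]
t = otsu_threshold(pixels)
mask = binarize(pixels, t)
```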

hybrid shard
mossy blaze
#

I'm pleased to share with you the final results of my approach on the ARC AGI benchmark, which are as follows:

  • Total number of tasks solved: 446/1000
  • Success rate: 44.6%
  • Total execution time (on a CPU with 12 logical processors): 15 hours
  • Data size to analyze: 170 MB
hard brook
#

someone from Indonesia?

vale umbra
barren fractal
#

hey all, is this a good place to ask a question about data science specifically (no relation to Python) or are there better places for it?

#

I like discord or other chatroom-like apps over something like stackoverflow because it's easier to make conversation

grand minnow
barren fractal
# grand minnow Im pretty sure its fine to talk about data science. It will eventually gets tran...

there's a pretty cool statistics overview about Japanese travel: https://statistics.jnto.go.jp/en/graph (official numbers by a Japanese organisation) and I was wondering if the given data is enough to get a sense of the "average" itinerary of a tourist

#

specifically, given the "breakdown by length of stay" and "overnight stays by region/prefecture" would it be possible to make any meaningful inferences about how many regions the average tourist visits over the course of their trip

grand minnow
#

interesting stats nevertheless

barren fractal
#

yeah and that info isn't available, I imagine it's very challenging to collect

grand minnow
#

I bet so too

barren fractal
#

a few people I know have gone between 10 days and two weeks, they did get around a bit more but I'm sure that for business purposes it's entirely plausible that some people stay in the same city for months on end

grand minnow
barren fractal
#

how would you go about it? I'm not a stats guy myself personally

grand minnow
# barren fractal how would you go about it? I'm not a stats guy myself personally

Me neither. Still a noob at it. But here's my take. Start with a question like "how many tourists would go to Osaka by the end of the year, and what would they be doing?" or something. Then I would dive into each stat and find correlations that may help answer that question. Organize and sort. And that should answer that question. Another hypothetical question might be, "I want to go when it's not peak tourist season but there are still events to attend". Then find the relevant data that shows and answers that.

barren fractal
#

hmm I see, that sounds like an interesting approach yeah

#

I tried ChatGPT but obviously it's not gonna teach me data science from a few questions

grand minnow
barren fractal
trail zodiac
#

Greetings. If anyone here is familiar with the img2table library, I'm getting an error that I need to install img2table[paddle] (despite it being installed). I found an issue for it here- https://github.com/xavctn/img2table/issues/243 - but I don't understand the solution. Can anyone provide some direction here?

GitHub

Hello, I got this error: Missing dependencies, please install 'img2table[paddle]' to use this class I created a virtual environment to navigate through the files, and I'm 100% sur I install...

agile cobalt
trail zodiac
#

I have done that, and I still get the error. Part of why I assume there's something here I'm not understanding.

agile cobalt
#

odds are it's just broken then

trail zodiac
#

damn. Aight, thanks. o7

mellow vector
#

It clearly works but gpt's advice is about the last place I'd take advice from if I have a choice

agile cobalt
#

it's not unreasonable though

mellow vector
#

hmmm

#

that's not a bad idea

#

Thinking I might polish this for my portfolio if I can get it streaming straight from parquet into the data loader

agile cobalt
#

you could also take the head/tail or every Nth row instead of doing it based on the ID

void stone
#

Hi,
I wanted to know what kind of projects I can build to secure a summer internship for a data scientist role.
By the way, I'm a beginner. I'm currently working on a credit card fraud system, but I'm unsure if it would be enough

viscid urchin
void stone
bronze wyvern
#

Hi, just wondering, is there any resource that explains how to train multi-class classification models using recommended AI/ML frameworks pls? Like from data cleaning, data split, hyperparameter tuning, metrics, model evaluation etc

odd meteor
bronze wyvern
#

yep, will have a look at fast.ai, ty !

bronze wyvern
#

Hi, quick question: when it comes to LLMs, how do they keep track of the latest things? For instance, say a new article is published or a new technology is released, what happens?

I know that LLMs learn from us, from our data, but here's the question: do they learn from anything we type into the LLM itself? Is there some kind of filtering before it's stored somewhere, like in a vector database? Do the AI engineers actually have time to filter that info? Seems unrealistic, no?
I know there is the concept of RAG, but even then, the database it uses is updated at some point, no?

lapis sequoia
#

In the offensive-language dataset, the one based on Twitter tweets, is the point of it that it severely lacks context? Like, the word "yellow" without attention will be classified as neutral, because it will more than likely be interpreted as a color, when it means something else based on the context (which is ignored). Is that kind of the reasoning for that dataset?

bronze wyvern
#

hmm yes in the sense that filtering does occur?

serene scaffold
bronze wyvern
#

yeah so basically, each time we see "searching the web", it's using an AI agent behind the scenes to scrape the web and look for the info?

lapis sequoia
bronze wyvern
#

ohhh ok

lapis sequoia
#

its RAG

bronze wyvern
#

no but

#

if it's RAG, it should have given us a valid answer, no?

lapis sequoia
bronze wyvern
#

yupp I see

#

so LLMs are bad for new/recent things

lapis sequoia
# bronze wyvern so LLMs are **bad** for new/recent things

no, it depends on the quality of the data. If you are using ChatGPT on the OpenAI site, it is OK. LLMs in general make stuff up, or just go on forever and spew nonsense, if they don't know the answer. LLMs with direct access to documents on your PC through an API key are pretty great if you know what you're doing.
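To make the RAG point concrete: the model itself isn't retrained; at question time relevant text is retrieved from an external store and pasted into the prompt, so keeping things current means updating the store. A toy sketch of just the retrieval and prompt-building step, assuming word-overlap scoring in place of the embedding search a real system would use (`retrieve` and `build_prompt` are hypothetical names, and no model is called):

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: List[str], k: int = 1) -> List[str]:
    """Rank documents by how many words they share with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, documents: List[str]) -> str:
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Adding a new document here is the "update"; nothing about the model changes.
docs = [
    "Release notes: version 2.0 adds a streaming API.",
    "The office cafeteria opens at 8am.",
]
prompt = build_prompt("What does version 2.0 add?", docs)
```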

serene scaffold
lapis sequoia
craggy kettle
#

fellow devs, anyone have experience working with PySpark to resolve deeply nested XML? I have XML files with different schemas that are both nested and deeply nested (struct / array / struct). I would be using a mapping CSV to resolve the data, but so far I haven't been able to make it work.

lavish skiff
#

I wanted to know if I can use deepface to train an AI

lapis sequoia
#

any of you do RL?

frosty mountain
#

When working with a dataframe, how do you deal with incorrect data points? For example, a column 'age' having values such as -1 and 225, while ordinal columns like 'Thalassemia' have values outside the allowed set of 3, 6, 7?

Like, how can you set those out-of-range values to NaN for each column?
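One way to do it in pandas, assuming you can write down a plausible range or a set of allowed codes for each column (the bounds below are made-up examples, pick ones that make sense for your data). `Series.mask` and `Series.where` both substitute NaN by default:

```python
import pandas as pd

df = pd.DataFrame({
    "age": [54, -1, 61, 225, 40],
    "Thalassemia": [3, 6, 2, 7, 9],
})

# Assumed plausibility rules: a numeric range for continuous columns ...
ranges = {"age": (0, 120)}
# ... and a set of allowed codes for ordinal/categorical columns.
allowed = {"Thalassemia": {3, 6, 7}}

for col, (lo, hi) in ranges.items():
    # mask() replaces values where the condition is True (here: outside [lo, hi]) with NaN
    df[col] = df[col].mask(~df[col].between(lo, hi))

for col, codes in allowed.items():
    # where() keeps values where the condition is True and puts NaN everywhere else
    df[col] = df[col].where(df[col].isin(codes))
```

After this the affected columns become float (NaN is a float), and you can decide separately whether to impute or drop those rows.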

carmine ridge
#

Hey I am trying to educate myself. Can someone explain to me what are

Gradient descent
loss function
learning rate

I am so confused rn. I just know they are used to optimize an algorithm but how

viscid urchin
#

The loss function is the mountain. Your altitude is how "wrong" your current position is. High means you are not close to your goal, low means you might be.

#

Gradient descent is how you find your way downhill. You stop, check out the ground where you are standing, and then figure out which way slopes down and go that direction.

#

Learning rate is how long the steps you take are. A long stride means you travel faster, but you also might step over an edge if you're not careful. A short stride means you are shuffling forward and it might be slow, and you might not make it down before nightfall, when the ice weasels come out.

carmine ridge
viscid urchin
#

If you think about a curving line drawn on a 2D plot, using its slope (calculus) to find the lowest point is essentially what is going on with "gradient descent"

#

Gradient descent is just the "extension" of that idea to more dimensions, like you end up with in machine learning

#

The โ€œslopeโ€ of the higher dimensional โ€œcurveโ€ is called the โ€œgradient vectorโ€

#

So I think the intuition of running your hand over a surface to find the lowest point is pretty ok to use

#

Just remember occasionally that it's a bunch of dimensions, not just three

#

Beyond that it's just learning what the "update" formula looks like, but all it's doing is the stuff described above.

carmine ridge
viscid urchin
#

The Y value of the curving line is the loss function

#

the gradient vector is the slope at any given point

#

the loss function isn't the declining part, that's the gradient vector. the loss function is just a value, in this case 'how high off the ground are you'

#

This paper/book is the best explanation of LLMs I've seen so far, if you want to see how the full catastrophe is currently put together: https://arxiv.org/abs/2501.09223

#

(To be clear I'm not saying you were asking about LLMs, just that they certainly use these ideas.)

#

Another way to look at this stuff is as an application of this: https://en.wikipedia.org/wiki/Free_energy_principle

The free energy principle is a mathematical principle of information physics. Its application to fMRI brain imaging data as a theoretical framework suggests that the brain reduces surprise or uncertainty by making predictions based on internal models and uses sensory input to update its models so as to improve the accuracy of its predictions. Th...

#

The "least surprise" idea here is useful etc.

carmine ridge
# viscid urchin The Y value of the curving line is the loss function

Correct me if I am wrong:
So we start with OUR weights and a fixed choice of loss function (e.g. the sum of squared differences).
We then calculate the gradient (derivative) of the loss function with respect to the weights.
This tells us which side of the slope we are on, i.e. whether to increase or decrease our weights, and the learning rate tells us by how much we change them.

And we continue this until we reach the weight that gives us the least value of the loss function

#

And now just have to apply this in a multi dimensional world

viscid urchin
#

100% yes

#

Start with the weights, calculate the gradient, update the weights, repeat

#

You got it
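The loop described above (start with weights, calculate the gradient, update, repeat), written out for the smallest possible model: one weight `w` and one bias `b` fitted to points on a line, with mean squared error as the loss. It's a minimal sketch with made-up data and a fixed step count instead of a stopping criterion; with more weights, the two partial derivatives simply become a gradient vector with one entry per weight.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ≈ w*x + b by minimising mean squared error with plain gradient descent."""
    w, b = 0.0, 0.0  # starting weights
    n = len(xs)
    for _ in range(steps):
        # Loss L = (1/n) * sum((w*x + b - y)^2); its partial derivatives are:
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        dL_dw = 2 / n * sum(e * x for e, x in zip(errors, xs))
        dL_db = 2 / n * sum(errors)
        # Step *against* the gradient (downhill), scaled by the learning rate.
        w -= learning_rate * dL_dw
        b -= learning_rate * dL_db
    loss = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n
    return w, b, loss


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]  # data generated by w=2, b=1
w, b, loss = gradient_descent(xs, ys)
```

Try `learning_rate=0.2` on the same data to see the "stepping over the edge" case: the updates overshoot and the loss grows instead of shrinking.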

carmine ridge
#

Thanks man ily

#

I would vote for you if you ever ran for the president

viscid urchin
#

Thanks! I plan to run unopposed though when the time comes. ABBATH

thick heart
#

where to find models? i dont wanna do the ml
i just want the model
im a swe

jaunty helm
waxen kindle
#

hugging face maybe

obtuse acorn
#

im trying to find the right type of chart for displaying the proportion of a group's subcategories

#

wow that actually sounds like gibberish

#

its probably easier to just show what i mean

#

i found that a sankey diagram kinda works, but only if i give the subcategories different names, which doesn't look great

#

so like if i remove the prefixes it joins the categories together

viscid urchin
#

Maybe this is a situation for a "Sunburst Chart"? You're right that Sankey isn't great at hierarchies

#

or a Treemap maybe

#

both are designed for nesting

#

plotly can just do px.sunburst(df,...) on a dataframe from pandas or similar.

waxen kindle
#

What about a network

viscid urchin
#

assuming your data was like:

```python
data = {
    'group': ['Adult', 'Adult', 'Child', 'Child', 'Child', 'Child'],
    'gender': ['Male', 'Male', 'Male', 'Male', 'Female', 'Female'],
    'speed': ['Fast', 'Slow', 'Fast', 'Slow', 'Fast', 'Slow'],
    'value': [4, 2, 1, 3, 3, 3]
}
```
#

plotly.sunburst would just "eat" that etc
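To see what the chart encodes for that example data: with a path of group → gender → speed and `value` as the size, each node's size is the sum of `value` over its rows, and a child's wedge takes up its share of the parent's wedge. A stdlib sketch computing those numbers without plotting (`totals` and `share_of_parent` are illustrative names, not part of plotly):

```python
from collections import defaultdict

data = {
    'group': ['Adult', 'Adult', 'Child', 'Child', 'Child', 'Child'],
    'gender': ['Male', 'Male', 'Male', 'Male', 'Female', 'Female'],
    'speed': ['Fast', 'Slow', 'Fast', 'Slow', 'Fast', 'Slow'],
    'value': [4, 2, 1, 3, 3, 3],
}
path = ['group', 'gender', 'speed']  # outermost level first

# Total value for every node of the hierarchy, keyed by its path prefix.
totals = defaultdict(float)
for i, v in enumerate(data['value']):
    key = ()
    totals[key] += v  # root
    for level in path:
        key = key + (data[level][i],)
        totals[key] += v

# Fraction of its parent's wedge that each node occupies.
share_of_parent = {k: totals[k] / totals[k[:-1]] for k in totals if k}
```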

waxen kindle
#

With nodes being names and the number of instances from a set to another is written on the edges ?

viscid urchin
#

Yeah, the numbers could be weights of the connections etc I guess.

#

But I think a Treemap kinda "just does that"? Not sure if they are totally equivalent.

waxen kindle
#

Yeah i think works too

obtuse acorn
#

basically im wanting to show the proportions of each subcategory

#

let me try and word this right

viscid urchin
#

I think I get what you're saying, and IMO both Sunburst and Treemap do it

#

with the 'pie wedge size' and the 'rectangle size', respectively

obtuse acorn
#

i looked at treemaps and i dont think they do

#

i might be looking at bad example tho

viscid urchin
#

How not? a treemap view of your hard drive for example makes each box sized to the file

#

(bad-looking example but you get the idea)

#

area of labeled section = size/value/whatever

#

It's old:

obtuse acorn
#

i think the sunburst one would work for what im after

#

idk if it would look great

viscid urchin
#

Yeah, might have to play with the 'style' a lot

obtuse acorn
#

oh yeah it probably helps if i share what this is actually for

viscid urchin
#

but I think it can clearly represent what you've got schema-wise

obtuse acorn
# obtuse acorn

im making a crowd crush simulator and i was using this chart as an example of how you might set the statistics of the crowd

#

basically recursively setting how different properties are distributed

#

splitting the categories into smaller subcategories

gilded pebble
#

where can i learn mathematics for AI and i have no background

viscid urchin
#

The sunburst chart is perfect because it directly maps to the nested properties of a crowd.

#

imagine you have "age group", "temperament", "goal", and "count" (number of people in each sub-category)

#
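```python
import plotly.express as px

fig = px.sunburst(
    your_df,
    path=['age_group', 'temperament', 'goal'],  # the hierarchy, outermost ring first
    values='count',  # column that sizes the wedges
    title='Crowd Distribution Sim',
)
fig.show()
```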
#

and whammo

#

Put every column except the one that should control the size of each section into 'path', and use the column that should map to area as 'values'.

somber willow
vale field
#

Hi, quick question: I started learning n-grams in NLP. I scraped 9 Wikipedia pages, e.g. one on algorithms, one on software engineering, etc. Now that I have 1-grams to 5-grams, do the n-grams need to be ordered by frequency (most common first)? Is that important if I need to make visualisations of each kind of extracted gram, e.g. 1-grams, 2-grams, etc.?

dense lava
#

what do you mean by "do they need to be ordered by frequency"?

vale field
#

Like the word that appears the most in a page e.g. artificial is on the top
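Ordering isn't needed for the n-grams themselves, but it is what you want for a "most common n-grams" bar chart, since you'd usually plot only the top k for each n. A stdlib sketch, with a toy sentence standing in for the scraped pages:

```python
import re
from collections import Counter
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-token windows, in the order they appear."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


text = "the algorithm sorts the list and the algorithm returns the list"
tokens = re.findall(r"[a-z]+", text.lower())

# Counter does the counting; most_common() gives the frequency ordering.
counts = {n: Counter(ngrams(tokens, n)) for n in range(1, 4)}
top_unigrams = counts[1].most_common(2)
top_bigrams = counts[2].most_common(2)
```

On real pages, words like "the" will dominate the 1-gram chart, so people often remove stop words before plotting.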

dense lava
#

can anyone help me with model selection for a time series forecasting. I have 20 time series with an upward trend and with a seasonality.

#

I tried using LSTMs, but the error is still too high

#

the evaluation metric is RMSLE
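Since the metric is RMSLE, it may help that it is just RMSE computed on `log1p` of the values, so errors are measured relative to the size of the target. That suggests a common trick: fit the model (LSTM or otherwise) on `log1p(y)` with an ordinary MSE loss and map predictions back with `expm1`. A stdlib sketch with made-up numbers:

```python
import math
from typing import Sequence


def rmsle(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared logarithmic error (values must be > -1, normally >= 0)."""
    sq = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))


def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


y_true = [10.0, 100.0, 1000.0]
y_pred = [12.0, 90.0, 1100.0]

# RMSLE is exactly RMSE measured on log1p-transformed values ...
log_true = [math.log1p(v) for v in y_true]
log_pred = [math.log1p(v) for v in y_pred]
same = math.isclose(rmsle(y_true, y_pred), rmse(log_true, log_pred))

# ... so a model trained on log1p(y) can have its outputs mapped back with expm1.
back = [math.expm1(v) for v in log_pred]
```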

wide carbon
#

hello

mellow vector
#

Instructor is normalizing the entire dataset in a course I'm following, does this result in a leak from the testing set?

#

He hasn't included a validation set at any point yet, I wonder if he's just loosely combining them for simplicity's sake. I should probably just complete the course before I go crazy writing a pipeline

#

was thinking it might be cool to write a train_test_split suite with marimo ui elements, as lazily as possible, I'd love to hear peoples thoughts on that

waxen kindle
#

It probably doesn't matter, but practically yes

mellow vector
#

That's what I was thinking, was nice to normalize it on the fly but it's not that hard to set some stats aside

waxen kindle
#

What is usually done is to normalize both datasets with the stats of the training one only

#

But in practice, if you get both subsets from the same dataset, they should have the same distribution

#

So you would get roughly the same result
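The leak-free version of that: compute the statistics on the training split only, then apply those same numbers to the validation/test data (scikit-learn's `StandardScaler` follows the same fit-on-train, transform-everything pattern). A stdlib sketch with made-up numbers:

```python
import statistics
from typing import List, Tuple


def fit_standardizer(train: List[float]) -> Tuple[float, float]:
    """Mean and (population) std computed from the training split only."""
    return statistics.fmean(train), statistics.pstdev(train)


def transform(values: List[float], mean: float, std: float) -> List[float]:
    return [(v - mean) / std for v in values]


train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
test = [3.0, 11.0]

mean, std = fit_standardizer(train)   # test values never influence these stats
train_z = transform(train, mean, std)
test_z = transform(test, mean, std)   # reuse the *training* stats
```

The test values can land outside the range seen in training (11.0 becomes 3.0 here); that's expected and is exactly the information a leak would hide.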

mellow vector
#

the random seeds have a pretty noticeable impact on the toy sets I'm using but I imagine that isn't an issue with larger sets

near rose
cloud apex
#

I'm looking for someone who knows machine learning and deep learning for a few coaching sessions. I'm currently learning and need help with a few things, as well as someone to review my code. If anyone's interested, hit me up in the DMs.

serene scaffold
cloud apex
#

I will pay

cloud apex
serene scaffold
# cloud apex I will pay

It is not allowed to offer payment in this server.
You can post it in a paste bin or link to the github

#

!paste

arctic wedgeBOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

queen silo
#

Hey, I was trying to implement a multilayer perceptron from scratch using NumPy on the Iris dataset, but I can't get the backpropagation part working, so I need help
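A compact sketch of backpropagation for a one-hidden-layer MLP in NumPy. The choices here are assumptions of the sketch, not the only way: a tanh hidden layer, a softmax output with cross-entropy loss, and full-batch gradient descent. It uses synthetic 3-class blobs instead of Iris so it runs without scikit-learn; swap in your Iris features `X` and integer labels `y` (standardising the features first helps, as done below).

```python
import numpy as np


def one_hot(y, k):
    out = np.zeros((y.size, k))
    out[np.arange(y.size), y] = 1.0
    return out


def train_mlp(X, y, hidden=16, lr=0.5, epochs=500, seed=0):
    """One tanh hidden layer + softmax output, full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = int(y.max()) + 1
    Y = one_hot(y, k)
    W1 = rng.normal(0.0, 0.5, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, k))
    b2 = np.zeros(k)
    losses = []
    for _ in range(epochs):
        # Forward pass
        H = np.tanh(X @ W1 + b1)                     # hidden activations, (n, hidden)
        logits = H @ W2 + b2                         # (n, k)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability only
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)            # softmax probabilities
        losses.append(float(-np.mean(np.log(P[np.arange(n), y] + 1e-12))))
        # Backward pass: chain rule from the loss back to each parameter
        dlogits = (P - Y) / n                        # grad of mean cross-entropy w.r.t. logits
        dW2 = H.T @ dlogits
        db2 = dlogits.sum(axis=0)
        dH = dlogits @ W2.T                          # send the error back through W2
        dZ1 = dH * (1.0 - H ** 2)                    # tanh'(z) = 1 - tanh(z)^2
        dW1 = X.T @ dZ1
        db1 = dZ1.sum(axis=0)
        # Gradient descent update (after all gradients are computed)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


def predict(params, X):
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)


# Toy stand-in for Iris: three Gaussian blobs in 4 dimensions.
rng = np.random.default_rng(1)
centers = np.array([[0, 0, 0, 0], [3, 3, 0, 0], [0, 3, 3, 3]], dtype=float)
y = np.repeat(np.arange(3), 50)
X = centers[y] + rng.normal(0.0, 0.5, (150, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise features
params, losses = train_mlp(X, y)
accuracy = float(np.mean(predict(params, X) == y))
```

A handy debugging step when your own version misbehaves: compare one entry of `dW1` against a finite-difference estimate of the loss.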

twilit geode
#

Are there any videos or courses to help with general knowledge of how to just get started with AI? I don't mean machine learning, though you probably have to know the basics of that too. I've also heard the term "AI agent" thrown around.

tropic edge
pine heron
bronze wyvern
#

Hello, quick question. I need to work on a multi-class image classification ML project.

I need to do some preprocessing on my dataset, and I wanted to know how I should proceed.

So first, I should perform some data cleaning, like converting categories to numbers, removing NaN values if any, etc.?

Then after that, say I have my images. My question is:

I need to perform image augmentation/preprocessing; how should I proceed?
Should I perform image rotation, transformation, grayscale, blur? All of these operations or only some specific ones, and how do I choose which?

Then after that, I would need to do my data split, say 80/10/10.

Then train, validate and test my model.

At the very end, I would need to calculate some metrics for my model? What kind of metrics should we use for such a task, the confusion matrix thing?

serene scaffold
bronze wyvern
#

concerning the image processing, is there anything I should watch out for?

serene scaffold
bronze wyvern
#

ahh, no problem :c, I was wondering: normally, should a dataset already contain grayscale/blurred images, or does that come down to the pre-processing?

#

I think it comes to the pre-processing, no?

serene scaffold
#

idk

bronze wyvern
#

:c, hopefully someone may have an answer

#

I'm looking for resources online for these but they're very limited 🥲

tropic edge
#

Look for LangChain courses

bronze wyvern
#

Hello, can someone explain why when it comes to model evaluation, we can't only rely on accuracy, what would this imply if we did so?

agile cobalt
bronze wyvern
#

yep you are right, just read that

#

ty !

bronze wyvern
bronze wyvern
#

you'd want to do the micro and macro averages of all three.
What do you mean by micro and macro averages pls :c

serene scaffold
bronze wyvern
#

yep will do so

bronze wyvern
#

from what I've understood, macro averaging treats all classes equally regardless of their size, so we average precision, recall and F1 separately over the classes.

On the other hand, for micro averaging we sum up all the individual TP/FP/FN and then calculate each metric. I didn't quite understand where this is used though: when we need an overall metric for our model? But we can achieve the same with the macro average, no? Why micro?
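To make the two averages concrete (and show why accuracy alone can mislead): macro averaging computes precision/recall/F1 per class and takes the plain mean, so every class counts equally; micro averaging pools TP/FP/FN across all classes first and computes each metric once, so frequent classes dominate. A stdlib sketch with a deliberately lazy "always predict cat" model on imbalanced toy labels (scikit-learn's `precision_recall_fscore_support` with `average="micro"` or `average="macro"` computes the same kind of numbers):

```python
from collections import Counter
from typing import Dict, List, Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_macro(y_true: List[str], y_pred: List[str]) -> Dict[str, Tuple[float, ...]]:
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but the true class was something else
            fn[t] += 1  # the true class t was missed
    # Macro: P/R/F1 per class, then the unweighted mean over classes.
    per_class = [prf(tp[c], fp[c], fn[c]) for c in classes]
    macro = tuple(sum(m[i] for m in per_class) / len(classes) for i in range(3))
    # Micro: pool the counts over all classes, then compute P/R/F1 once.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return {"macro": macro, "micro": micro}


y_true = ["cat"] * 8 + ["dog", "bird"]
y_pred = ["cat"] * 10
scores = micro_macro(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Here accuracy and micro-F1 are both 0.8 (for single-label multi-class data they coincide), while macro-F1 is about 0.30 because two of the three classes are never predicted. That gap is the usual argument for reporting macro scores on imbalanced data.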

viscid urchin
#

Anybody used "Google Vertex AI Studio" for anything yet? I'm considering doing a thing with it, because it lets you directly get feedback about the "perplexity" of your prompts. The setup is a bit tedious though so I figured I'd ask first before going through the checklist.

serene scaffold
twilit geode
bronze wyvern
#

Hello, can someone explain the difference between evaluation and regression metrics and where to use them pls.

From what I've read, evaluation metrics is used when we have built our model entirely, while, regression metrics can be used for each epoch? See how to minimize loss for e.g?

mellow vector
#

Trying to write some lazyframe code to return mean and standard deviation of the frame, it seems really clunky to produce a 1 row frame with alternating columns for the values.

gritty vessel
#

Hey, how are these big models trained? Like GPT models, Stable Diffusion and all?

#

How do they decide which arch is best as it takes so much time to train

#

So trying different combinations will take lots of time and resources

agile cobalt
mellow vector
#

still wrapping my head around dataframes tbh, everything was a vlookup during my years as an excel spec

runic parcel
#

Does anyone over here have the Machine Learning Specialization and Deep Learning Specialization courses by Andrew Ng?

jaunty helm
#

Stable Diffusion 3 also had issues on release; if you've seen the "woman lying on grass" abominations, yeah, that's SD3.
3.5 did fix some of those issues, but by then the community had moved on to Flux

cedar veldt
#

hi guys nice to meet everyone

fringe marsh
cedar veldt
#

helows , whats up?

#

I've been doing a soft robot simulator xd but sometimes it's hard to concentrate while working alone in a project

#

that's why I joined this server

fringe marsh
fringe marsh
#

it has helped me couple of times here and there

cedar veldt
#

I have a video... is it possible to share videos here? It's a little language built with Python that lets you prototype and test voxel-based robots

#

it's very hard to sell something like this, so my goal is to make it super fun to work with xd, almost like a game

fringe marsh
cedar veldt
#

his name is fernando , he likes to walk but never gets too far : P

#

hahah I have another video with the IDE but I don't want to spam so I will share later

#

it's very easy to do shapes because the shapes are defined by scalar fields, so you can do any implicit shape. I wanted to do this one first because it was the easiest

fringe marsh
cedar veldt
#

thank you!! uwu

honest obsidian
cedar veldt
#

the robot behaves like this because of different material properties and the oscillations I defined owo

#

you can change the frequency for example and that can make it move twice as fast

#

when I get something more stable I will share it so people can write their own robots (it's open source). I wanna make a contest to see who can write the best bots for specific tasks uwu

fringe marsh
#

this is a really nice presentation @cedar veldt, looks fun. Hopefully it will be open source some day. 😎

cedar veldt
#

it is already opensource

#

I haven't shared it yet because I'm embarrassed by my messy code 😛

rugged spindle
#

Are you looking for a LangChain developer?
I am a senior AI/ML engineer.

grand minnow
calm cipher
#

oof, I just traced what I thought was a bug in my data preprocessing code to a couple of bad blocks on the drive storing my data

gritty vessel
#

Using which they can speed up and try different archs

bronze wyvern
#

Hello, quick question: I know that both standardization and normalization are part of the feature scaling process in data preparation. My question is, why do we use one over the other?

Their main goal is just to convert some values into other numerical values, like values between 0 and 1.

I read that normalization is preferred when we know that our dataset doesn't follow a Gaussian distribution, so maybe when there are lots of outliers/skewness?

But what about standardization: when do we use it and why pls.
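Worth noting that neither method removes outliers by itself. Min-max normalization maps the data into [0, 1], so one extreme value squeezes everything else into a tiny range; standardization (z-scores) centres the data on 0 with unit variance and has no fixed bounds, which is why it's often the default for models trained with gradient descent or based on distances. A stdlib sketch comparing both on made-up numbers:

```python
import statistics
from typing import List


def min_max(values: List[float]) -> List[float]:
    """Normalization: rescale linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values: List[float]) -> List[float]:
    """Standardization: mean 0, (population) std 1, no fixed bounds."""
    mu, sigma = statistics.fmean(values), statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


incomes = [30.0, 32.0, 35.0, 38.0, 40.0, 500.0]  # one large outlier
mm = min_max(incomes)  # the five normal values all end up below ~0.03
zs = z_score(incomes)
```

With heavy outliers, a robust variant (centre on the median, scale by the interquartile range; scikit-learn calls it `RobustScaler`) or a log transform is often a better fit than either.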

agile cobalt
bronze wyvern
#

I want to learn the maths associated with gradient descent and stochastic gradient descent; does anyone know a reference pls? I know it's basic stuff like y = mx + c, but I don't really know which parameter represents what

#

I'm trying to understand the problem of vanishing and exploding gradient and I wanted to have an overview of the maths related to gradient first

wooden sail
#

i'd say wikipedia offers a pretty good introduction, but the notation is already a little technical

bronze wyvern
#

The thing is I know the theoretical concept but not really how the maths work

bronze wyvern
wooden sail
#

vanishing and exploding gradients are usually something that pops up in the context of using the "chain rule"

bronze wyvern
#

yup noted, I have some knowledge of the chain rule I think. I will have a look at how this gives rise to these problems, ty !
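A toy illustration of how the chain rule causes both problems, assuming a deliberately simplified network: a chain of scalar sigmoid layers that all share one weight `w` and sit at pre-activation 0. Backprop multiplies one factor `w * sigmoid'(z)` per layer, and since `sigmoid'(z) <= 0.25`, the product shrinks geometrically when that factor is below 1 (vanishing) and blows up when it is above 1 (exploding).

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def chained_gradient(depth: int, weight: float, z: float = 0.0) -> float:
    """Product of the per-layer factors w * sigmoid'(z) that the chain rule multiplies.

    Simplification: every layer has the same scalar weight and pre-activation z.
    """
    s = sigmoid(z)
    local = weight * s * (1.0 - s)  # d(layer output)/d(layer input) for one layer
    return local ** depth


for w in (1.0, 8.0):
    print(w, [chained_gradient(d, w) for d in (1, 10, 30)])
```

With `w = 1.0` the factor is 0.25, so 30 layers give roughly 1e-18; with `w = 8.0` it is 2, so 30 layers give about 1e9.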

wooden sail
#

if you want to try this out yourself by hand, something like Khan Academy should have simple examples with a step-by-step walkthrough of how it works

bronze wyvern
#

Hi, has anyone ever used YOLO for image recognition and classification? I don't understand: does it use ResNet under the hood, or is ResNet a completely different CNN architecture?

I need to train a multi-class classification model for both image recognition and classification, and I'm confused about which framework/library to use. I was told to use YOLO, though I don't know the reason. Anyone here have experience with it pls

carmine ridge
#

Hey, i am new to deep learning and i am confused about which library is best to start with. I started with TensorFlow but i also read about PyTorch and now I am confused

gritty vessel
# jaunty helm I honestly don't think they do, it's just try to make your best educated guess a...
sterile leaf
#

yo

#

is this the place to ask for reinforcement learning tutorials ?

wary geode
#

Hello everyone..
I have finished a Python tutorial video and done some exercises on every topic, and my end goal is either data science or AI engineering.
Now I am planning to do DSA in Python, but people are saying don't do DSA with Python.

What should I do? I would be pleased if someone share their insight on my problem.

serene scaffold
viscid urchin
#

Yeah, DSA isn't about micro-performance issues like language choice, it's about different orders of efficiency.

#

micro-performance is Programming Language Theory territory, not DSA

wary geode
#

What should i do?

serene scaffold
gritty vessel
#

You can do in python as well if you are gonna use python in your career ahead

viscid urchin
# wary geode What should i do?

Acquire this book and grind through it, emerge as a true warrior: https://webperso.info.ucl.ac.be/~pvr/book.html

gritty vessel
#

And yeah logic stays same between the languages

mighty lake
#

are there any websites that can teach me Python for free?

#

or DSA?

serene scaffold
bronze wyvern
#

hello quick question, say I have a system that takes multiple images as input, but these images all have different resolutions and now I need to standardize them, say make them 500 by 500 pixels.

Now my question is, does the new resolution we choose matter, like if I want to use 300 by 300 or 400 by 400?

Also, I was wondering: we would need to keep the aspect ratio the same, no?

But not all images have the same aspect ratio, so in code we can't hard-code an aspect ratio of, say, 16:9. Maybe I would need to find the aspect ratio of the original image for each sample in my dataset, then for that aspect ratio use, say, 500 by 500 pixels, no?
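You don't hard-code an aspect ratio; you compute a scale factor per image so the longer side fits the target, then pad the shorter side (letterboxing). Stretching straight to 500x500 would distort shapes instead. The arithmetic as a small stdlib sketch (`letterbox_geometry` is a hypothetical helper and the 500px target is just the example size from the question):

```python
from typing import Tuple


def letterbox_geometry(width: int, height: int, target: int = 500) -> Tuple[int, int, int, int]:
    """Scale (width, height) to fit inside target x target, keeping aspect ratio.

    Returns (new_w, new_h, pad_x, pad_y): the resized size plus the left/top
    padding needed to centre it on a square canvas.
    """
    scale = target / max(width, height)  # the longer side becomes exactly `target`
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_x = (target - new_w) // 2
    pad_y = (target - new_h) // 2
    return new_w, new_h, pad_x, pad_y


landscape = letterbox_geometry(1920, 1080)  # 16:9
portrait = letterbox_geometry(1080, 1920)   # 9:16, same maths, no special case
```

Unlike the `'800x800>'` ImageMagick geometry, this version also enlarges images smaller than the target.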

agile cobalt
viscid urchin
#

ImageMagick's CLI tools can do this in a one-liner

#

(depending on the settings you want etc)

bronze wyvern
bronze wyvern
viscid urchin
#

Actually lemme see if I can just craft that example

#

Assuming you've installed the base ImageMagick package for your OS (which has various differently-named CLI entry points)...

```
mkdir -p ./conform_output
mogrify -path conform_output -resize '800x800>' -background black -gravity center -extent 800x800 *.png
```
(This example uses 800x800 as the max size; pick whatever you need.)
I think that's right?
#

(would work for *.jpg also etc)

#

The -resize geometry syntax is powerful; you can do a lot fancier stuff than this example, which is just saying "limit the max dimension to 800 pixels"

#

montage is great too, I used this recently to lay up a directory of pngs into 4-column posterboard style:

```
montage *.png -tile 4x -geometry +10+10 posterboard.png
```
#

(+10+10 adds 10px of spacing around each tile)

bronze wyvern
#

oh ok, will have a look at that, seem really powerful and useful, ty !

lean oriole
#

hello may i ask a small question?

#

i am trying to start learning robotics and automation but i am confused about what content should i follow like which topic to be focused on and what will be a fun way to keep progressing while learning.

viscid urchin
#

my buddy Trey's title at NASA before his recent promotion was:

The Solver-in-Residence (SiR) program is a one-year detail position with the chief technologist in NASA's Office of Technology Policy and Strategy. The program enables a NASA civil servant to propose a one-year investigation on a specific technology challenge and then work to identify solutions to address those challenges.

#

"AI and Autonomy Solver-in-Residence"

#

crazy smart kid

#

My main responsibility is conducting a study I formulated on how Modular Open Systems Approaches could be used at NASA, both broadly and with a focus on how autonomy and robotics software interoperability could be improved using the Space Robot Operating System (Space ROS) framework. Conducting the study involves meeting with a broad range of experts across government, industry, and academia, organizing workshops, managing technical investigations, and briefing findings to senior NASA leadership.

#

Learning Robot Operating System stuff might be a good place to start actually.

lean oriole
#

wow

#

i'd look into it

mellow vector
#

this warning about column names is bugging me. I recall now that `collect_schema().names()` appears to do what I need, but this code is really verbose and it feels wrong:
```py
lf = lf.select([(pl.col(c) - mean_df[c][0]) / std_df[c][0] for c in lf.collect_schema().names()])
```

#

something like

```py
(lf - mean_df) / std_df
```

would be so much prettier

opaque condor
serene scaffold
mellow vector
#

Import * from *

opaque condor
#

Got it working, and I need to know what imports I might need to add so I can train an AI to do all of the abilities above

covert granite
#

are you a pytz enjoyer or a zoneinfo embracer?

opaque condor
bronze wyvern
#

hello, quick question

#

say I have trained an image classification model. During the standardization process, I converted my images to 512 x 512 pixels.

Now say I build an interface that requires us to upload the image we want to process. Behind the scenes, we must first convert this image to 512 x 512 pixels and then process it, right?

viscid urchin
#

Check out what I said above re: imagemagick and its "mogrify" command.

bronze wyvern
#

yep, so basically I can apply a vast range of transformations using that command, but the thing is I would use ImageMagick on images that are already on disk, right?

Say I have a website and a user decides to upload their own image; I would still need to do that processing... hmm, do you think there's some sort of API that would let me write code that interacts with ImageMagick (I should investigate)? For example, the idea is:

User uploads a picture.
Picture goes into an /images folder or something like that.
Before verifying/classifying the image, run the ImageMagick commands on it.
Overwrite that image and classify the new image based on what was trained.

gritty vessel
#

You can apply it before training no need for overwriting the image

bronze wyvern
#

no but, I will do so, I will apply them on my dataset, but my system will be like a website where we can upload images and these uploaded images are not preprocessed

agile cobalt
#

yeah you must pre-process it identically to how you process training images
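One way to guarantee "identically" is to put the steps in a single function and call it from both the training script and the web app, instead of running ImageMagick for training and something else for uploads (Python bindings such as Wand exist if you do want ImageMagick itself). A sketch assuming Pillow: `ImageOps.pad` does a fit-then-pad similar to the `mogrify ... -extent` command above, except that it also enlarges smaller images. `preprocess` and the 512x512 size are illustrative, and if you normalised pixel values during training, that step belongs in the same function too.

```python
from PIL import Image, ImageOps

TARGET = (512, 512)  # must match what the model was trained on


def preprocess(img: Image.Image) -> Image.Image:
    """Single preprocessing function used for BOTH training images and uploads."""
    img = ImageOps.exif_transpose(img)  # apply camera rotation metadata, if any
    img = img.convert("RGB")            # drop alpha / palette modes
    # Fit inside TARGET keeping aspect ratio, then pad with black to exactly TARGET.
    return ImageOps.pad(img, TARGET, color=(0, 0, 0))


# At inference time, e.g. inside a (hypothetical) upload handler:
upload = Image.new("RGBA", (1000, 600), (255, 0, 0, 255))  # stand-in for an uploaded file
ready = preprocess(upload)  # no need to overwrite the original file on disk
```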