#data-science-and-ml

agile cobalt
#

yeah the only reason for using it I could imagine would be portability (taking the Raspberry and BCI gear with you)

limber spear
#

Grab a base nvidia gtx 1000 series to start. That is what I am starting with. Most of my model training is on a 10 year old intel Xeon cpu and a 10 year old nvidia quadro m1000m gpu, which pretty much has like 20 CUDA cores

#

Just barely bought it

#

It’s like $60 @ Walmart ~750 cuda cores give or take

deep bough
#

Hi, I need some help. I have to build an invoice categorizer. I’ve already created one, but it breaks when I upload a PDF with a different template. Is there something I can do about that? I’m thinking I should start from scratch.

past meteor
#

I think a lot of the BCI stuff is done with SVMs and other algos running on CPU

#

You might actually be fine with a raspberry pi in that case

#

Cause the (typically) tiny sample sizes and extremely high dimensionality are where SVMs still reign supreme

#

A bigger concern is whether or not all the libraries you want to use will support ARM

#

Yeah, KNN is an algorithm you probably should never use. You should be able to train the net on Google Colab using their GPUs and do inference on your CPU. Especially for image processing, and given that you have 1000 images, you should go down that route imo

serene grail
past meteor
#

Amongst other problems

#

By default it also doesn’t consider 1 feature more important than others

serene grail
#

I see, thank you!

past meteor
#

Sure you can probably assign weights but the default implementations just do the dot product of 2 normalised vectors and that’s it. In reality the distance of 1 feature should contribute more to the overall distance than others, and we also want to learn this scoring and not hard code it. If you add this requirement then you’ve arrived at how most real ML algos work

#

The idea that we want to learn this scoring is exactly what the coefficients of logistic regression mean btw
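
A minimal sketch of that idea, using only the standard library: a tiny logistic regression trained by gradient descent on made-up data where only the first feature matters. The function name, the toy data, and the hyperparameters are all assumptions for illustration, not anything from the discussion above.

```python
import math
import random


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit weights and a bias with plain batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            err = p - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


# Toy data (invented): feature 0 decides the label, feature 1 is pure noise.
rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
y = [1 if x[0] > 0 else 0 for x in X]

weights, bias = train_logistic_regression(X, y)
print(weights)
```

The learned coefficient for the informative feature should come out much larger in magnitude than the one for the noise feature, which is the "learned scoring" a default KNN distance does not have.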

serene grail
dense pivot
#

Is it possible to train a custom AI model on virtual machines for short periods of time, or does the training have to run continuously?

serene scaffold
#

you don't just keep training a model forever, though.

dense pivot
serene scaffold
dense pivot
#

How can I train if I want to

serene scaffold
viscid urchin
dusk fractal
#

What model from https://pypi.org/project/g4f/ can I use to create creative&scientific writings of around 6000-8000 words? Considering I won't have to pay anything

#

I've tried the paid deepseek api and it can only return 1500-2000 word texts at a time

#

And I've only tried that because it is the cheapest

#

All the others are too expensive, considering I'll be needing about 400k-500k tokens a day of output, my max monthly budget is 20$, that's why I'm looking at G4F models

agile cobalt
dusk fractal
#

I don't care about ethics considering I'm poor and those companies are rich...

#

If I were to have had the money I would've just paid for the o1 api but I don't have it

viscid urchin
#

Sure but

#

!rule 5

arctic wedgeBOT
#

5. Do not provide or request help on projects that may violate terms of service, or that may be deemed inappropriate, malicious, or illegal.

agile cobalt
#

You can try using some of the "free" models under https://openrouter.ai instead, it's mostly companies voluntarily offering spare compute to test their infrastructure or trying to put their name out there to attract customers

past meteor
#

It's definitely possible but it may not be practical

#

Many ML algorithms are online by default. They're trained iteratively, think everything that uses gradient descent

#

That means you can always persist the weights to disk and resume training whenever. It depends on the library you're using to make this process nice or very annoying
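
A rough sketch of the save-and-resume pattern, kept library-free on purpose: a one-variable linear model trained with SGD, checkpointed to JSON, then resumed. The file name, the `sgd_steps` helper, and the toy data are assumptions; a real framework would use its own checkpoint format.

```python
import json
import os
import tempfile


def sgd_steps(w, b, data, lr, steps):
    """Run `steps` epochs of SGD for y = w*x + b, one sample at a time."""
    for _ in range(steps):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


data = [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]  # toy data from y = 2x + 1
ckpt = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

# Session 1: train for a while, then persist the weights to disk.
w, b = sgd_steps(0.0, 0.0, data, lr=0.05, steps=50)
with open(ckpt, "w") as f:
    json.dump({"w": w, "b": b}, f)

# Session 2 (possibly on another VM, later): load the weights and continue.
with open(ckpt) as f:
    state = json.load(f)
w_resumed, b_resumed = sgd_steps(state["w"], state["b"], data, lr=0.05, steps=50)

# Reference: the same 100 epochs in one uninterrupted run.
w_full, b_full = sgd_steps(0.0, 0.0, data, lr=0.05, steps=100)
print(w_resumed, w_full)
```

With plain SGD the resumed run matches the uninterrupted one. Optimizers that keep extra state (momentum, Adam) need that state saved alongside the weights, which is where library support starts to matter.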

pale thunder
#

people who have been in a team in kaggle and tried to get actual work done, how?

#

We're suffering.

#

the versioning locks you out while things are running, there is no meaningful merge screen that I can find, ...

glacial root
#

is this kind of progression normal for a neural network or did i do something wrong

safe agate
neat pasture
#

Can someone here familiar with BCI and AI works help me?

I’m a masters student and I’m currently looking into research ideas for my thesis topic.

Initially I wanted to use brainwaves as input data to be used for controlling the mouse cursor in a PC, using AI as the interpreter of the brain wave data and further develop it to try to form texts from imagined words

But once I saw the prices of sensors that are available especially for the high requirements I have, I realised I have no chance of doing anything.
OpenBCI Daisy sensors or any good sensors with 14-16 channels costs ~$2000 if not more and as a master’s student whose monthly allowance is that (rent and food and utilities) I can’t afford to work with sensors.

So I’ve elected to instead work with existing datasets.

Is it really not possible to buy a bunch of cheap EEG sensors and then feed them into an Arduino to accomplish what OpenBCI Cyton+Daisy can do? Essentially a DIY version that would be cheaper but of course lots of wire and more stuff to program.

Or is dataset my only option?

Currently I have a good PC with a Ryzen 7600X3D cpu and 32GB RAM and a RTX 4070 Ti Super with 16GB VRAM to be used for AI training and coding

viscid urchin
glacial root
#

is it practical to implement a cnn with just numpy?

#

also is an ocr scanner a good resume project

neat pasture
wooden sail
#

any changes to the architecture (especially number of layers) require appropriate management of the derivatives, and the optimizers would also have to be implemented by you

#

there is certainly a time and place for that, but if you like numpy and wanna do a nitty gritty low level design of a network, i'd suggest jax

#

comes with autodiff and also some optimizers (e.g. in the optax module) if you wanna make custom architectures

quick igloo
#

Should i learn python or java

grand minnow
quick igloo
#

Is it hard to learn java

serene scaffold
#

Hello @quick igloo, this is the data science channel on Python Discord.
Java is not used in data science.

karmic pond
#

But Java is not used for AI

muted vine
#

are genetic algorithms with a random entry considered reinforcement learning too?

muted vine
#

because G.A use error loss to select the better setup for the next iteration

#

and it's not supervised

knotty breach
knotty breach
tame monolith
#

Is there a roadmap I can follow for learning data science?

#

Sorry if it's asked too many times

karmic pond
knotty breach
karmic pond
karmic pond
tame monolith
#

No I know english and hindi

karmic pond
glacial root
worldly wagon
#

question about polars, so i've purely been using it over pandas for coming on a year now, and i notice 1-2 issues like tuples/lists not being the best supported compared to pandas etc

#

my question is how does polars performance compare to pandas with rapids?

#

also also to build on that question what's the viability if any of polars with rapids
@me when u reply i dont look at my laptop much

iron basalt
worldly wagon
iron basalt
#

In addition, this is better performance in terms of throughput, but worse latency (GPUs have higher latency).

#

I don't know whether it matters if you use Pandas or Polars, probably not.

#

Since both are just sending their tables to Rapids.

#

However, Polars has that integration to make it easier to use.

#

According to that page, it makes use of Polars' query optimizer, which AFAIK Pandas does not have.

agile cobalt
placid drum
#

hello!

#

can someone help me

serene scaffold
# placid drum can someone help me

Hello, remember to always always ask your actual question and never wait for someone to commit to answering before you actually ask it.

placid drum
#

ok sorry

serene scaffold
#

No problem, but be sure to always remember that every time forever

placid drum
#

alright so this is to calculate atomic mass from molecular formula

#

but when i compile it

worldly wagon
placid drum
#

its idle and doesnt prompt me for input

#

I cant debug it because I get this error Terminal exits with code 3221225786 (or similar)

worldly wagon
placid drum
#

loop_value = 0 # for infinite loop
while loop_value == 0: # indentation for loop
    atomic_number_mass = { # dictionary

worldly wagon
#

use backticks

serene scaffold
worldly wagon
#

i.e

auto_review = load_dataset(
    "McAuley-Lab/Amazon-Reviews-2023",
    "raw_review_Automotive",
    streaming=True,
    trust_remote_code=True
)
viscid urchin
crisp salmon
#

I have the following code:

# frame = cv2.resize(frame, (0,0), fx=scaleFactor, fy=scaleFactor)
overlay = frame.copy()
for idx, detection in enumerate(extracted_boxes, 1):
    bbox, text, confidence = detection
    cv2.rectangle(overlay, (self.scoreBoxCoords['x']+int(bbox[0][0]), self.scoreBoxCoords['y']+int(bbox[0][1])),
            (self.scoreBoxCoords['x']+int(bbox[2][0]), self.scoreBoxCoords['y']+int(bbox[2][1])), (255, 0, 0), thickness=2)
    cv2.putText(overlay, str(text), (int(self.scoreBoxCoords['x']+int(bbox[0][0])), int(self.scoreBoxCoords['y']+int(bbox[0][1])-8)),
                cv2.FONT_HERSHEY_PLAIN, 1.5, (0, 0, 255), 1, cv2.LINE_AA)

# Blend the overlay with the original frame
alpha = 0.6  # Transparency factor
frame = cv2.addWeighted(overlay, alpha, frame, 1 - alpha, 0)

# Show the frame with text overlay
cv2.imshow("Video Text Overlay", frame)

It works fine with one caveat: the results are for a scaled image. However, when removing the comment in the first line only the image is shown and no overlay is visible. What is the reason for this? Thanks for your help!

terse bone
#

anyone on kaggle

grand minnow
terse bone
grand minnow
terse bone
grand minnow
#

🤷

grand minnow
terse bone
terse bone
weary timber
hollow pagoda
#

is anyone familiar with fbProphet time series forecasting library

serene scaffold
glacial root
#

if i made a project where i did all the under the hood stuff with cython instead of using libraries, would that look better to recruiters or would they not care

serene scaffold
glacial root
glacial root
#

i'm trying to implement my own cnn and am using numpy but just setting up the architecture and getting the cost takes super long to run, and so i'm not sure how astronomically long the gradient descent part would take

#

and also i think it would speed up my first ann implementation, which i've never been able to test cause it never stops running

stone night
#

AI i been working on for a year and a half training constantly

serene scaffold
#

you can use a GPU for free on google colab.

hollow pagoda
serene scaffold
hollow pagoda
#

as if nobody understands the data science questions i have

hollow pagoda
#

i assume maybe people are busy, seeing if someone may have knowledge and wanna answer later

serene scaffold
hollow pagoda
#

i did

serene scaffold
#

is anyone familiar with fbProphet time series forecasting
unless you're doing a survey of who knows about what, this isn't your actual question.

hollow pagoda
#

its a data science library

serene scaffold
#

why do you care if anyone knows about it

hollow pagoda
#

because asking the question will get it overlooked and pushed out of chat

#

someone that knows is more likely to answer starting with the subject

serene scaffold
#

That is false.

#

You need to always ask your actual question. If you're not willing to do that, for whatever reason, I have to ask that you refrain from asking at all.

hollow pagoda
#

ok ill ask it in python discussion

serene scaffold
#

Okay, but the same rule applies. If you ask "does anyone know about X", I'm going to tell you again that you need to ask your actual question.

hollow pagoda
#

not if its a discussion

serene scaffold
hollow pagoda
#

how am i wasting time im not forcing anyone to do anything

#

my real questions already get ignored how can their time be wasted

serene scaffold
serene scaffold
hollow pagoda
#

so u rather me spam then see if i should even continue asking the ignored question

serene scaffold
#

I would rather that you repost your actual question than ask to ask, yes.

hollow pagoda
#

im not asking to ask im beginning a discussion

#

didnt even ask for help yet

serene scaffold
hollow pagoda
#

bro what 😭 you are mall copping

serene scaffold
#

I mean, I'm one of the directors of this community.
But I'm genuinely trying to maximize the number of questions that get answered and minimize the amount of volunteer time that gets spent not answering questions (by, for example, coercing people into asking their question).

hollow pagoda
#

tell volunteers to ignore asking-to-ask questions the same way normal questions get ignored

serene scaffold
glacial root
#

it happens to all of us sometimes, people are busy or maybe no one who knows the answer sees the question

#

and if we go by your logic which is that either one will get ignored, at least if you ask the full question there is a greater chance of it being answered

hollow pagoda
glacial root
#

?

#

that makes no sense

hollow pagoda
#

it does

#

if u cant figure something out and u send it in chat asking for help its like 200% easier to start working on something else and waiting for a response

glacial root
#

i agree, but what extra are you doing by asking the full question

hollow pagoda
#

wasting ur own time

glacial root
#

i do that too a lot

glacial root
#

you're spending an extra 5-10 seconds typing out the entire question

hollow pagoda
#

if u gonna ask a question u should provide as much context as u can

#

otherwise the person will be complaining about not seeing the examples/data/code, etc

glacial root
#

i mean do you want help or not

#

so if asking for help is a waste of time for you, why ask in the first place

hollow pagoda
#

its the type of question

glacial root
#

i somewhat get what you mean, but if you want help then you gotta ask properly

hollow pagoda
#

im assuming nobody active in chat uses the prophet library so if i ask why its forecasting strange results itll waste my time showing the results and data

glacial root
#

it happens

#

sometimes there aren't people who can answer

hollow pagoda
#

i know

glacial root
#

but if you are asking your colleagues for help, how would you ask

hollow pagoda
#

"have you done time forecasting im getting strange results from this model, expecting x but getting y and not sure if its just the model or the parameters"

#

a simple y or n then u show them

#

yall acting like i started pinging people bothering them

glacial root
#

but in all the programming servers i'm in, it seems to be a common norm and so i just follow it cause it's not really causing any inconvenience for me

#

i suggest you do the same just cause it would probably be more helpful for you

jade sinew
#

Anyone want to practice Pytorch with me?

quaint rivet
#

I am trying to convert torch.FloatTensor into cuda's type. But it's not converting. What to do?


path = os.path.join("train", "masks/**")
path_lst = glob.glob(path)

y = torch.tensor([], dtype=torch.float32)
out = torch.tensor([], dtype=torch.float32)

for path in path_lst:
    with rio.open(path) as dataset:
        img = dataset.read(1)  
        flat = torch.from_numpy(img).float().flatten()  
        y = torch.cat((y, flat), dim=0)
        out = y


np_out = out.numpy()
weights = compute_class_weight(class_weight="balanced", classes=np.unique(np_out), y=np_out)

class_weights = torch.from_numpy(weights)
c_weights = class_weights.to(device)
viscid urchin
#

I think you need to move it back to the cpu to do the numpy stuff?
e.g. np_out = out.cpu().numpy()?

#

And maybe specify device=your_device when initializing the tensors?

stoic hollow
#

Should a laplace mechanism be applied to every value in a dataset or just to the result of a query?

odd meteor
woven tinsel
#

does anyone know the best course on Data analytics online ? kindly give some suggestions

agile cobalt
# woven tinsel does anyone know the best course on Data analytics online ? kindly give some sug...

Build a job-ready data analytics skillset using industry-standard tools, including generative AI to extract insights, make decisions, and solve real-world business problems.

glacial root
#

not asking as a replacement for gpu usage, i mean how much is it used for performance optimization on top of gpu usage

serene scaffold
#

I've never created a python tool in C.

glacial root
#

so the only reason i would need to worry about using c/cpp for anything in ml is if i'm implementing a model to be used for an embedded system

serene scaffold
glacial root
#

and in that case why do many people (i know nowhere near the majority but still a decent number) use cpp for computer vision

serene scaffold
glacial root
#

oh i see

#

that's cool

#

so efficiency of c/cpp while not having to go through all the extra complications of the language

calm thicket
iron basalt
#

However, there will often still be some Python that is just there to basically connect things. Like feed the camera data to your CPP library, read config files, maybe do some networking, etc. Python shows up almost all the time as at least some general scripting tool that ties it all together.

#

This is because Python is simple to use, easy to get packages for, and has a lot of packages for everything imaginable.

#

These packages are usually all each implemented in something like C, with a small Python layer on top (sometimes that Python layer is large).

#

However, there are some cases where you may need to cut out Python entirely, these are not super common (for most ML devs).

worldly wagon
#

so i'm working with a fairly large dataset (200gb) so i moved it to parquet, then read it in using polars, however when merging/cleaning my kernel keeps dying I assume due to ram so i went from 32->48gb of ram but similar issues, haven't really been seeing success with the streaming engine

#

Is there any advice on dealing with datasets this large? I'm considering getting WSL to try using rapids with polars but idk the viability of that if it's a ram issue

#

code for context

lf_review: pl.LazyFrame = pl.scan_parquet("amazon_review_auto.parquet")
lf_meta: pl.LazyFrame = pl.scan_parquet("amazon_meta_auto.parquet")

lf_review: pl.LazyFrame = lf_review.filter(pl.col("rating").is_in([1, 2, 3, 4, 5]))
lf_review = lf_review.filter(pl.col("text").str.strip_chars().str.len_chars() > 0)


lf_review = lf_review.with_columns([
    pl.col("text").str.count_matches(r"\b\w+\b").alias("review_length"),
    (pl.col("timestamp").cast(pl.Datetime("ms")).dt.year()).alias("year")
])


lf: pl.LazyFrame = lf_review.join(lf_meta, on="parent_asin", how="left")
lf = lf.with_columns([extract_brand()])
lf = lf.unique(subset=["user_id", "text", "asin"], keep="first")

df: pl.DataFrame = lf.collect(streaming=True)
viscid urchin
#

df: pl.DataFrame = lf.collect(streaming=True) is the problem.. Yes it's streaming, but you're asking it to collect everything into a single dataframe

worldly wagon
viscid urchin
#

The example they give at the bottom is:

lf = pl.scan_csv("/path/to/my_larger_than_ram_file.csv")  
lf.sink_parquet("out.parquet")  
worldly wagon
#

the only thing is what happens when i need to merge multiple parquets, about 34, into 1

#

do i just sink it into a parquet again? (Asking btw not being condescending)

viscid urchin
#

Yeah, you should just be able to feed it more

#

It looks like there's also for batch in lf.collect_streaming_batches(): if you need to operate on the data before writing it to disk
Edit: I might be wrong about the function name, double-check me

viscid urchin
#

It looks like you can also ask for compression and stuff, that's useful:

lf.sink_parquet(
    "processed_results.parquet",
    compression="zstd",
    maintain_order=False,  # Might improve performance?
    # streaming=True is an argument of collect(); sink_parquet does not take it
)
worldly wagon
#

i'll read into it

#

some quick very very dumb questions, I'm new to ML if i was to train after reading in a large parquet would that affect my ram or mainly cpu?

worldly wagon
viscid urchin
viscid urchin
worldly wagon
#

was just wondering if the gpu methods wud help with the ram/performance issues

viscid urchin
#

I guess it depends on how intense the operations you want to perform on the data are. If they are 'lightweight' you will be memory limited on the GPU.

#

pl.Config.set_streaming_chunk_size() seems to be how you can manually adjust how big the chunks it works on are.

#

I guess it's measured in rows

worldly wagon
viscid urchin
#

Oh, aha, my earlier thing should probably be phrased as for batch in lf.collect(streaming=True, streaming_chunk_size=batch_size) where you pick the batch size. The function I mentioned earlier is something I found on Google but seems to have been a user wondering about something

hollow pagoda
#

anyone know why plotly express boxplot appearing like scatter of data points instead of like the seaborn box

#

tryna use the dash app with it but its only pltexp compatible

agile cobalt
hollow pagoda
#

They have decimal places like a float but I'll check if it's decimal type in a bit

worldly wagon
#

anyone getting this recently in vscode and aware how to turn it off? (i dont use copilot)

Generate button in notebook cells

#

nvm found it idk if anyone is dumb as me so i'll leave this here

hollow pagoda
hollow pagoda
hollow pagoda
glacial root
#

if i wanted to implement an rnn for predicting the next word in a sequence, would it be a good idea to implement a graph database with the words for better word embeddings?

hollow pagoda
#

oh specifically because i did colors on the y instead of x makes sense

glacial root
#

i thought having organization with respect to semantic relationships was helpful

serene scaffold
glacial root
#

though that would probably be an issue for larger datasets

serene scaffold
glacial root
#

how do people typically do it

serene scaffold
#

typically do what?

glacial root
#

implement graph databases for this task

#

or is it typically just not used

serene scaffold
#

I've never heard of anyone doing this.

glacial root
#

oh

#

i read somewhere that it was useful

glacial root
serene scaffold
#

you want to create a model that generates text, right?

glacial root
serene scaffold
wide sphinx
#

Can I ask, why u r called stelercus papabilissimus?

glacial root
#

so it's essentially not necessary?

glacial root
wide sphinx
#

I see…

glacial root
#

can i ask why you like monkeys and fried chicken

#

same reasoning here

wide sphinx
#

Monkeys

#

R very very cute

#

And they remind me of humans

glacial root
#

same reasoning here

#

oh

#

not quite same reasoning then

#

but similar reasoning

wide sphinx
#

Similar

glacial root
#

because with the same set of words, we can find new sequences

serene scaffold
worldly wagon
snow plume
#

im making my first CNN project and im just after some advice on what statistics i should have. So far i have:

  • accuracy vs validation accuracy before data augmentation
  • loss vs validation loss before data augmentation
  • accuracy vs validation accuracy after data augmentation
  • loss vs validation loss after data augmentation
  • time taken for epochs
  • multiclass confusion matrix
  • f1 score

is there anything else people would recommend me adding?

unkempt apex
snow plume
#

perfect, thank you!

river cape
#

guys could anyone clear up my confusion: what does LSTM(64) mean?
is it like an lstm layer with 64 units?

and could someone clarify the difference between an lstm layer, an lstm cell, and an lstm unit

viral rune
#

Yes
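
To make the terms a bit more concrete, assuming the Keras-style `LSTM(64)` notation: "units" is the size of the hidden state (64 here), a "cell" is the computation for a single time step, and the "layer" is that cell applied across the whole sequence. One way to see what 64 units means is to count the trainable parameters; the helper below and the input size of 10 are assumptions for illustration.

```python
def lstm_param_count(units, input_dim):
    """Trainable parameters in one standard LSTM layer (with biases).

    Each of the 4 gates (input, forget, cell candidate, output) has:
      - an input kernel of shape (input_dim, units)
      - a recurrent kernel of shape (units, units)
      - a bias of shape (units,)
    """
    per_gate = input_dim * units + units * units + units
    return 4 * per_gate


print(lstm_param_count(units=64, input_dim=10))  # 19200
```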

charred stag
#

```python
if object != None:
    filt = {"_id": index, object: {"$exists": True}}
    update = {"$set": {f"{object}.{to_edit}": value}}
    #* check if the field exists
    check_exist = file.find_one(filt)
    if check_exist == None:
        #* create new field
        file.update_one({"_id": index}, {"$set": {object: {}}})
elif object == None:
    filt = {}
    update = {"$set": {to_edit: value}}
file.update_one(filt, update)
```

guess the library
snow plume
#

im a little lost on what i have done wrong with my CNN. Im using the EfficientNetB0 model with the CIFAR-10 dataset.
i ran the cnn for 50 epochs without data augmentation and 50 epochs with data augmentation

after data augmentation my validation loss is slightly increasing and my validation accuracy is sitting around 0.85.

snow plume
#

yeah the accuracy is good but my validation accuracy hasnt really changed which im confused about

river cape
snow plume
#

how would that be solved then, saying this is after the data augmentation?

#

because from my understanding, data augmentation is one way to solve an overfitting problem?

snow plume
#

i have not, no. What would that do?

river cape
#

basically you stop training the model once it has gone a set number of epochs (the patience) without improving
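
A framework-free sketch of that patience rule, assuming validation loss is the monitored metric. The function name and the loss values are made up; deep learning libraries ship their own early-stopping callbacks that do the same bookkeeping.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the epoch index at which training would stop, or None.

    Training stops once the validation loss has failed to improve on the
    best value seen so far for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Invented validation losses: improving, then creeping upward (overfitting).
losses = [0.90, 0.70, 0.60, 0.58, 0.59, 0.61, 0.63, 0.66]
stop_at = early_stopping_epoch(losses, patience=3)
print(stop_at)  # 6
```

In practice the weights from the best epoch (index 3 here) are usually the ones kept, not the ones from the epoch where training stopped.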

#

check it out

snow plume
river cape
snow plume
#

Sweet, I'll check them out and try again. Thanks for your time

dry raft
#

btw, do you know any github repos or kaggle notebooks that do this?

#

most things I see are always confusing

safe agate
worldly wagon
#

I'm debating if to try ssh into amazon ec2 and just ripping it once/twice

viscid urchin
#

stream it in chunks with collect() or similar, operate on it as needed, and use sink_parquet to stream it out to disk, seems to be the way with Polars

viscid urchin
#

Oh damn I thought there was a chunk_size arg on collect, lemme look for where that is

#

aha polars.Config.set_streaming_chunk_size on that page

worldly wagon
viscid urchin
#

to get all the ids and then scan over them in chunks

worldly wagon
#

i wonder if pyspark could help curb this issue not sure of the config hopefully someone discussed it in chat since imma search

viscid urchin
#

It really seems like collect should support a 'how much to collect' arg to me, but I guess I'm not a Polars expert, maybe there's a good reason not to.

worldly wagon
worldly wagon
#

not trying to be difficult btw if going parquet->csv means i can actually use the data it might be a common sense trade off

still no idea how this is going into plotly and sklearn lol

jaunty helm
jaunty helm
# worldly wagon not trying to be difficult btw if going parquet->csv means i can actually use th...

how this is going into plotly and sklearn
ngl, I don't think it's going to
you're gonna have to sample it before plotting if you don't want plotly to combust
sklearn... well no way all of that data's fitting in your memory, so look for estimators that can incrementally learn through .partial_fit so you can give it chunks of data and don't need to fit the entire thing in memory
you might also just consider neural nets, so pytorch
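
For the "sample it before plotting" part, one option that works while streaming is reservoir sampling: it keeps a fixed-size uniform sample without ever holding the full dataset. This is a standard-library sketch; the function name and the integer stream standing in for rows read from disk are assumptions.

```python
import random


def reservoir_sample(rows, k, seed=0):
    """Keep a uniform random sample of k items from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


# Stand-in for a stream of rows read chunk by chunk from disk.
stream = (i for i in range(200_000))
points_to_plot = reservoir_sample(stream, k=5_000)
print(len(points_to_plot))  # 5000
```

The resulting few thousand rows are small enough to hand to a plotting library directly.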

#

if you're willing to consider other libraries, other than spark, also check dask for dataframes and datashader for plotting (+ hvplot if you want a higher level API)

viscid urchin
#

Dask is very good, IMO.

#

Used it at work for a major thing; we ended up mostly rewriting it in Databricks due to management pressure, but Dask worked great.

worldly wagon
worldly wagon
worldly wagon
worldly wagon
viscid urchin
#

With dask the approach is usually to divide the work up into chunks that each get processed by a dask worker, and individually fit in RAM.

#

You could pair it with a streaming thing though if you wanted to do it differently

worldly wagon
viscid urchin
#

Yeah, assuming the task can be 'chunked' in the first place, some things are really hard to break up.

worldly wagon
viscid urchin
#

Yeah, "dask cloud provider" is the cloud back-end. It's optional but pretty handy.

swift gale
#

Could you recommend a suitable free and open-source model for generating embeddings to populate a vector database?

viscid urchin
#
Unlike a normal lazy query we evaluate the query and write the output by calling sink_parquet instead of collect.
agile cobalt
#

polars also improved their streaming engine a lot recently, try updating to the latest version if you aren't using it yet

limpid dew
#

Hello, all,

serene scaffold
#

Please be transparent about what this project is by posting a link to the open-source repository.

limpid dew
serene scaffold
limpid dew
glacial root
#

what would be the best approach to encoding text data into vectors without using any libraries except numpy

viscid urchin
#

I guess you'll need to choose a word encoding and write it from scratch; people typically use a second library to do that and then jam the encoded text into numpy, but you can do it yourself also

#

"Word2Vec" is the/a classic

glacial root
#

with text though it's probably worth trying on my own if it's not too much data right

viscid urchin
#

Yeah definitely

glacial root
#

i'd probably just use python's default file i/o to parse each text file and get them all into an array, and then from there it would be pretty easy to create a vector for each word
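
A minimal numpy-only sketch of that plan, assuming whitespace tokenization and two made-up documents. It builds a vocabulary, one-hot word vectors, and bag-of-words document vectors; these are simple baselines, not learned embeddings like Word2Vec.

```python
import numpy as np


def build_vocab(texts):
    """Map each distinct lowercase token to an integer index."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def bag_of_words(text, vocab):
    """Count vector for one document; unknown tokens are ignored."""
    vec = np.zeros(len(vocab), dtype=np.float32)
    for token in text.lower().split():
        idx = vocab.get(token)
        if idx is not None:
            vec[idx] += 1.0
    return vec


texts = ["the cat sat", "the dog sat down"]  # invented documents
vocab = build_vocab(texts)
one_hot = np.eye(len(vocab), dtype=np.float32)  # row i is the vector for word i
doc_vectors = np.stack([bag_of_words(t, vocab) for t in texts])
print(doc_vectors.shape)  # (2, 5)
```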

#

also one thing, i don't know if this is some variation of imposter syndrome or something, but i get this feeling that i need to only use numpy unless there's something i really need to use a library for (like using PIL to convert images to arrays), otherwise i'm skipping learning the concepts

viscid urchin
#

I mean, you're not wrong.. that's a great way to make sure you learn it.

glacial root
#

do most ml classes in college work like this?

#

or does it vary

viscid urchin
#

Not sure in modern times.. it probably varies a lot

#

I could also see the "get it working first, then go and explain all the parts" approach being used

glacial root
#

i think part of the reason why i feel a need to stick to numpy is cause i take a very implementation based approach to learning

#

so typically i'll just watch a quick theory video, and then i'll start trying to implement it myself

#

i usually never look at code examples or even pseudo code

charred estuary
#

Does anyone know if this is how you are supposed to set the temperature??

vocal cove
charred estuary
vocal cove
#

Higher means less reliable, more hallucinations.
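
A small sketch of why that is, assuming the usual softmax-with-temperature sampling scheme (how any particular hosted model applies it internally is not confirmed here). The logits are made up; the point is that a low temperature concentrates probability on the top token and a high one flattens the distribution.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # invented scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.1)
hot = softmax_with_temperature(logits, 2.0)
print(max(cold), max(hot))
```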

viscid urchin
#

I believe 1.0 is the default for gemini-2.0 so 0.1 is pretty low I guess

charred estuary
#

Yes I am aware but is what I did the correct way to change it in python

charred estuary
viscid urchin
#

What makes you think the setting isn't working?

#

with config=types.GenerateContentConfig(...)

torpid mirage
arctic wedgeBOT
torpid mirage
#

huh.
Was it meant to work?

#

Worth a bug report IG.
Anyway.

viscid urchin
#

Cool. Are you already using any relevant libraries, or are you just starting out?

torpid mirage
#

I'm just starting out on the cleaning process. Just got pandas

viscid urchin
torpid mirage
#

What about external contextual data which can not be averaged, such as geographical locations/coordinates?

viscid urchin
#

Hmm, it looks like all the available 'scipy' interpolation algorithms are for data that is smoother than yours

torpid mirage
#

Yep. It's super rough

viscid urchin
#

So what's an example missing attribute in your data that we need to impute?

torpid mirage
#

There's a lot.
The main critical are the latitude, longitude, bird species, and the municipality

#

these are missing a lot

#

diagnosis date is another too

#

I asked for Copilot to calculate and summarize the quantity of data present in the columns, and he came back with;

Column    Missing %
focos_de_dnc    94.5%
focos_de_iaap    94.5%
doença    86.3%
número_da_investigação    86.3%
longitude    86.3%
data_do_laudo    86.3%
latitude    86.3%
espécie    86.3%
municipio    86.3%
ocorrência    19.2%
#

some of these will just have to be dropped

#

I'm fine

#

However, I'd like to be able to recover what we can

viscid urchin
#

Hmm. I guess those each kinda need a different approach. For example if the municipality is set but not lat/lon, we could just use the lat/lon of the center of that municipality. For the bird species, we might need a classifier that can figure out the most-likely bird for a location?
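
The first idea (fall back to the municipality's center when lat/lon is missing) could be sketched like this. The centroid table, the place names, and the `coords_imputed` flag are placeholders invented for illustration; the column names `municipio`, `latitude`, and `longitude` follow the table above.

```python
def fill_coordinates(rows, centroids):
    """Fill missing latitude/longitude from a municipality -> (lat, lon) lookup."""
    filled = []
    for row in rows:
        row = dict(row)  # copy so the input is left untouched
        missing = row.get("latitude") is None or row.get("longitude") is None
        centroid = centroids.get(row.get("municipio"))
        if missing and centroid is not None:
            row["latitude"], row["longitude"] = centroid
            row["coords_imputed"] = True  # keep track of which values are estimates
        else:
            row["coords_imputed"] = False
        filled.append(row)
    return filled


# Placeholder centroids and rows; names and numbers are invented.
centroids = {"Municipio A": (-10.0, -50.0), "Municipio B": (-20.0, -45.0)}
rows = [
    {"municipio": "Municipio A", "latitude": None, "longitude": None},
    {"municipio": "Municipio B", "latitude": -20.5, "longitude": -45.2},
    {"municipio": None, "latitude": None, "longitude": None},
]
filled = fill_coordinates(rows, centroids)
print(filled[0])
```

Flagging imputed rows makes it possible to exclude or down-weight them later, since a centroid is only a rough stand-in for the true location.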

torpid mirage
#

yes

#

we got contextual data about common bird migratory patterns and also possibly domestic birds such as chicken

#

how can we do something with that?

#

I can also mix in environmental and biome data

#

such as wetlands, forests, etc

viscid urchin
#

Hmm, isn't latitude/longitude totally missing here, or am I reading the columns wrong?

torpid mirage
#

but to be honest, I have no clue how to do that.

torpid mirage
#

we need to infer those somehow

viscid urchin
#

OK that's fine, we just need to calculate it from the municipality. I wonder what we can know about it when THAT isn't set though?

torpid mirage
#

I have no clue 😩

#

From the bird type, perhaps?

#

They have some set migration patterns that should narrow down the possible location
Try to mean it

viscid urchin
#

I guess let's just work on it one piece at a time, and maybe the rest will fill itself in

#

for geocoding, we can use from geopy.geocoders import Nominatim

#

and then like

geocoder = Nominatim(user_agent="avian_influenza_analysis")
geocoder.geocode(f"{municipio}, {uf}, Brazil")
#

there's a rate-limiter thing built into geopy you might need to wrap that Nominatim() instance with, I guess

#

like maybe

from geopy.extra.rate_limiter import RateLimiter
locator = Nominatim(user_agent="avian_influenza_analysis")
geocoder = RateLimiter(locator.geocode, min_delay_seconds=1)
torpid mirage
#

Do you think we could possibly enrich the data with contextual information before cleaning?

#

Would that make it simpler, perhaps?

#

Since we would have more things to infer from

viscid urchin
#

Maybe, yeah. What else do you have to join with? You mentioned bird migratory patterns, I guess that could be cross-linked via the lat/lon you determine...

torpid mirage
#

So far, I've thought about;

Weather data
Environmental/geographical data
Bird migratory patterns

#

Just these three

viscid urchin
#

Best I can think of I guess for determining municipality when given only a state is to have a list of towns in the state, and go by whichever one has the most of these rows associated with it?
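
A rough sketch of that "most common municipality per state" fallback, using only the standard library. The function name and the sample rows are made up for illustration; the `municipio` and `uf` keys follow the column names mentioned in this thread.

```python
from collections import Counter, defaultdict

def most_common_municipality_by_state(rows):
    """Map each state (uf) to the municipality that appears most often in rows."""
    counts = defaultdict(Counter)
    for row in rows:
        if row.get("municipio"):
            counts[row["uf"]][row["municipio"]] += 1
    return {uf: counter.most_common(1)[0][0] for uf, counter in counts.items()}

sample_rows = [
    {"uf": "RS", "municipio": "Porto Alegre"},
    {"uf": "RS", "municipio": "Porto Alegre"},
    {"uf": "RS", "municipio": "Pelotas"},
    {"uf": "SP", "municipio": "Santos"},
    {"uf": "SP", "municipio": None},
]
fallback = most_common_municipality_by_state(sample_rows)
```

This is a crude mode imputation: it will over-represent whichever town already dominates the data, so it is probably worth flagging the filled rows.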

torpid mirage
#

I don't know/haven't studied in depth what else I could plug in

viscid urchin
#

We could train a little classifier on the municipalities in the dataset, but the data is so small, hmm.

torpid mirage
#

Would you like the raw data?

#

With no cleaning

viscid urchin
#

Sure

#

I think a random forest might make sense to train the municipality-guesser, but we're getting into the limits of my experience now

torpid mirage
#

Just a second. I accidentally cooked my notebook...

viscid urchin
#

I've got a little implementation I'm working on, let's see if I can get it to successfully make up anything plausible

torpid mirage
#

God the raw CSV
@viscid urchin

arctic wedgeBOT
torpid mirage
#

1309 rows

viscid urchin
#
Imputed 1130 missing values in doença
Imputed 72 missing values in situação
Imputed 72 missing values in tipo_de_exploração
Imputed 1130 missing values in espécie
Imputed 72 missing values in espécie_principal

hmm, I guess that's something, let's see what it looks like

#

Successfully imputed coordinates for 179/1309 records

hmm, that's way fewer than I expected, I guess I have something to fix.

#

I wonder what the right play is in situations like this, where so much of the key data is missing. Seems really challenging to get right.

#

Oh I guess there are two columns in this for municipality? 'municipio', 'município'

torpid mirage
#

Yes

#

I guess it's just a duplicate from the combining.

#

I'm combining three spreadsheets into one here.

#

🥴

#

Yeah.
This one is going to take a while.

#

And worse, I have to turn this into a pipeline.

viscid urchin
#

I sorta have that aspect of it working, but the imputations I've got are still far from ideal

hollow pagoda
#

@weak oxide

late lichen
#

Guys... I'm curious and got a perhaps stupid idea while coding a MLP library on what will happen if you initialize the network with 0 weights and biases or any real values

delicate pivot
#

How can I start with AI/ML? What would you suggest for beginners?

viscid urchin
#

If all the neurons in a layer start with identical weights and biases, they will:
A) all calculate the same output
B) receive the same gradient during backpropagation
C) all make the same updates to their weights

So basically instead of a multi-neuron network, you just have one neuron per layer now

#

You specifically are trying to avoid symmetry
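
The symmetry argument can be checked numerically. This is a small sketch, assuming a one-hidden-layer tanh network with a linear output and squared-error loss; the function name and the numbers are illustrative only.

```python
import math

def hidden_gradients(w_hidden, b_hidden, w_out, b_out, x, target):
    """Gradient of 0.5 * (y - target)**2 w.r.t. each hidden neuron's weights."""
    # Forward pass: weighted sum + bias, then tanh, then a linear output.
    pre = [sum(w * xi for w, xi in zip(ws, x)) + b for ws, b in zip(w_hidden, b_hidden)]
    h = [math.tanh(p) for p in pre]
    y = sum(w * hi for w, hi in zip(w_out, h)) + b_out
    # Backward pass through the output weight and the tanh derivative.
    dy = y - target
    grads = []
    for j in range(len(h)):
        dpre = dy * w_out[j] * (1 - h[j] ** 2)
        grads.append([dpre * xi for xi in x])
    return grads

x, target = [1.0, -2.0], 1.0
# Every parameter starts at the same constant: both neurons get the same gradient.
same = hidden_gradients([[0.5, 0.5], [0.5, 0.5]], [0.0, 0.0], [0.5, 0.5], 0.0, x, target)
# Different starting values: the gradients differ, so the neurons can specialize.
different = hidden_gradients([[0.5, 0.5], [0.1, -0.3]], [0.0, 0.0], [0.5, -0.2], 0.0, x, target)
```

With identical starting values the two gradient rows are equal, so the neurons stay copies of each other after every update.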

serene scaffold
late lichen
#

I mean all parameters will be initialized to the same value

#

let's say zeros

#

am I doing it right???

#

look on the first attempt

viscid urchin
#

You have weigth spelled two different ways between __call__ and dif etc

late lichen
#

huh

viscid urchin
arctic wedgeBOT
#

first_attempt/FUNC.py line 42

return 1 - (tanh(x) ^ 2)
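
For reference, `^` in Python is bitwise XOR, not exponentiation, and it raises a `TypeError` on floats. A minimal sketch of the tanh derivative using `**`, assuming the standard-library `math.tanh` (the function name here is just an example):

```python
import math

def tanh_derivative(x: float) -> float:
    """d/dx tanh(x) = 1 - tanh(x)**2"""
    return 1 - math.tanh(x) ** 2
```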
late lichen
#

I saw it thanks

late lichen
#

@viscid urchin on summary am i doing it right????

#

I'm heading to something???

wintry relic
viscid urchin
#

The current situation is kinda:
No actual weight initialization
No backpropagation implementation
No loss function
No training loop

#

Usually you put your weights in a matrix instead of having an explicit Edge concept but I'm not enough of an expert to say having what you have is wrong, just less-likely to be fast on modern hardware

#

Also I think your forward method is backwards, you raise an error when values are compatible?

#

To summarize before I crash..
a neuron is a weighted sum of inputs + bias, followed by activation
forward propagation is matrix multiplication between inputs and weights
backward propagation is computing gradients and updating weights
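
The forward half of that summary can be sketched in a few lines of plain Python. This assumes one weight row per neuron; `dense_forward` is a hypothetical helper, not something from the code under review.

```python
import math

def dense_forward(inputs, weights, biases, activation=math.tanh):
    """Each neuron: weighted sum of inputs + bias, followed by the activation."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Two neurons, two inputs; an identity activation keeps the arithmetic easy to follow.
outputs = dense_forward(
    [1.0, 1.0],
    weights=[[1.0, 2.0], [3.0, 4.0]],
    biases=[0.5, -0.5],
    activation=lambda z: z,
)
```

Stacking the weight rows like this is exactly the matrix-vector product that a library such as NumPy would do in one call.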

#

Any actual-expert feel free to correct anything I've said, glad to learn.

late lichen
late lichen
#

and for debugging

viscid urchin
#

Sure, but when your modulo test returns 0, that's when things have the same shape and are compatible, right?

#

Surely you want to throw an error when that's non-zero?

late lichen
#

it will make sure if we divide the amount of the layer to the amount of value that each neuron will have same amount of input values

late lichen
#

I wanna make sure the shape of the input is compatible to the layer

viscid urchin
#

Right, so I'm saying you have it backwards, but feel free to test it
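
A small sketch of the check being described, with the error raised on the non-zero remainder. The function and argument names are hypothetical and only mirror the idea of splitting the input values evenly across the neurons of a layer.

```python
def check_input_shape(values, n_neurons):
    """Return how many inputs each neuron gets, or raise if they don't divide evenly."""
    if len(values) % n_neurons != 0:
        # A non-zero remainder means the values cannot be split evenly.
        raise ValueError(
            f"{len(values)} values cannot be split evenly across {n_neurons} neurons"
        )
    return len(values) // n_neurons
```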

arctic wedgeBOT
#

first_attempt/nuralnet.py line 18

return self.weight
grand minnow
#

oh nvm

late lichen
grand minnow
arctic wedgeBOT
#

first_attempt/nuralnet.py lines 13 to 18

    self.weigth = weigth
def __call__(self, value:float) -> float:
    """takes a value, apply to the parent node, the multiply that output to the weigth"""
    return value * self.weigth
def dif(self) -> float:
    return self.weight
grand minnow
#

Pick one: weight or weigth

late lichen
#

alr fixed btw

grand minnow
#

I know

viscid urchin
late lichen
#

wait

#

oof my bad lol

#

thanks

late lichen
#

math lib has it???

#

I know numpy has it but numpy takes so damn long to get imported

viscid urchin
late lichen
#

uhhhhh just in case...I wanna learn how to make my code use gpu to process stuff... how to do it???

#

I don't have a tensor-compatible GPU so no TensorFlow

#

I don't have a CUDA-compatible GPU either

#

what I have is Intel dual core graphics

viscid urchin
#

I mean, technically you can do it on that platform, but it's all very experimental and complex, not something I can really walk you through. On Intel the right path is to use a thing called SYCL

#

Oh maybe I'm wrong, I see Intel Core Ultra on the page that links to.. but nothing earlier.

#

Honestly I wouldn't think about it at all until you're comfortable building your neural network on the CPU

late lichen
#

that's fair

#

thanks for advice anyways

#

I also wanna learn attention block just in case I want to make transformers._. is there resource you would suggest?

serene grail
viscid urchin
# late lichen I also wanna learn attention block just in case I want to make transformers._. i...
bronze wyvern
#

Hello guys, I want to learn about ML and AI, how LLMs work and stuff like that, can someone recommend any good resources/books that cover AI in its entirety please (please ping me if anyone has something to recommend :c)

grand minnow
bronze wyvern
#

but hmm I wanted to learn a bit of the theoretical parts first, like how things work behind the scenes

grand minnow
bronze wyvern
#

Ok, will have a look at them ,ty !

grand minnow
late lichen
#

and look for the machine learning list

#

6

#

guys what will happen to a MLP if you straight up initialized its parameters to 0?

serene scaffold
late lichen
#

I have no idea

serene scaffold
late lichen
serene scaffold
late lichen
#

also I didn't just state "zero"

#

wait

serene scaffold
#

@late lichen I'm not following. Can you restate what your current question is, from the top?

late lichen
serene scaffold
late lichen
#

not sure why you replace "any" with "no"

#

it's clearly different words isn't?

serene scaffold
#

because "zero or any value" is just "any value", and the "or zero" is meaningless.

late lichen
#

in fact it's rather the opposite

late lichen
serene scaffold
late lichen
#

why

serene scaffold
#

the updated values are determined through multiplication, but x * 0 is always 0.

late lichen
#

yeah?? then??

serene scaffold
#

so if you have a neural network where all the weights are 0, they will stay as 0 no matter how much you try to train it.

#

and the network won't learn anything.

late lichen
#

uhh is there some resources where it clearly shows that??

serene scaffold
#

look into gradient descent

late lichen
#

okay I found one...

#

but how it will behave if it's all 1??

#

or 2???? or 3??
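
A toy experiment for the zero case and the "all 1" case, assuming a one-hidden-layer tanh network with a linear output trained on a single made-up example. With tanh, all-zero weights never move (only the output bias learns). With hidden units that are non-zero at 0, such as sigmoid, the output weights would get a gradient, so this sketch only covers the tanh case. With all ones the network does train, but the hidden neurons stay identical to each other.

```python
import math

def train_step(params, x, target, lr=0.1):
    """One gradient-descent step on 0.5 * (y - target)**2."""
    w1, b1, w2, b2 = params
    pre = [sum(w * xi for w, xi in zip(ws, x)) + b for ws, b in zip(w1, b1)]
    h = [math.tanh(p) for p in pre]
    y = sum(w * hi for w, hi in zip(w2, h)) + b2
    dy = y - target
    dpre = [dy * w2[j] * (1 - h[j] ** 2) for j in range(len(h))]
    new_w1 = [[w1[j][i] - lr * dpre[j] * x[i] for i in range(len(x))] for j in range(len(h))]
    new_b1 = [b1[j] - lr * dpre[j] for j in range(len(h))]
    new_w2 = [w2[j] - lr * dy * h[j] for j in range(len(h))]
    new_b2 = b2 - lr * dy
    return new_w1, new_b1, new_w2, new_b2

def train(init_value, steps=50):
    """Start every parameter at init_value and train on one toy example."""
    params = ([[init_value] * 2, [init_value] * 2], [init_value] * 2, [init_value] * 2, init_value)
    for _ in range(steps):
        params = train_step(params, x=[0.5, -1.0], target=1.0)
    return params

zero_w1, zero_b1, zero_w2, zero_b2 = train(0.0)
ones_w1, ones_b1, ones_w2, ones_b2 = train(1.0)
```

Any other shared constant (2, 3, ...) behaves like the all-ones case: the symmetry between neurons in a layer is never broken, which is why random initialization is the usual choice.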

serene scaffold
river cape
daring crystal
#

I have learned some basics of the ml, what fun projects i should work on to get started??

viscid urchin
iron basalt
#

Also has stuff on ML, deep learning, language, etc.

#

A bit of everything.

bronze wyvern
#

ok, ty !

viscid urchin
# bronze wyvern Hello guys, I want to learn about ML and AI, how LLMs work and stuff like that, ...

Here's a really good video (more stuff from the same person, also worth checking afterward)
https://www.youtube.com/watch?v=SmZmBKc7Lrs

In this video we will talk about backpropagation – an algorithm powering the entire field of machine learning and try to derive it from first principles.

OUTLINE:
00:00 Introduction
01:28 Historical background
02:50 Curve Fitting problem
06:26 Random vs guided adjustments
09:43 Derivatives
14:34 ...

untold bloom
viscid urchin
#

Do you have to use "pickle" format? It's not really the fastest way to serialize/deserialize data

#

There are some things that claim to be faster at pickle stuff, but I'm not sure how much they manage to beat dill by

#

If you can just use JSON also that's way faster

#

OK, so you can't just json.dumps(your_root_object)?

#

Aha, yeah, you may have better luck with PyArrow then.
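
Rather than guessing which format is faster, it may be easier to measure on your own data. This is a small timing harness using only the standard library; the payload is a made-up example, and the results will depend heavily on what the real object looks like. Note that JSON only round-trips basic types (dicts with string keys, lists, numbers, strings, booleans, None).

```python
import json
import pickle
import timeit

# Hypothetical payload; swap in a representative slice of the real data.
payload = {"ids": list(range(1000)), "name": "example", "nested": {"ok": True, "score": 0.5}}

json_blob = json.dumps(payload)
pickle_blob = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)

def time_roundtrip(dumps, loads, obj, number=50):
    """Seconds taken to serialize and deserialize obj `number` times."""
    return timeit.timeit(lambda: loads(dumps(obj)), number=number)

json_seconds = time_roundtrip(json.dumps, json.loads, payload)
pickle_seconds = time_roundtrip(pickle.dumps, pickle.loads, payload)
```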

flint onyx
#

can someone pls help me understand why the residual plot D is a problem. Ive read the solution and I still cant seem to get it

viscid urchin
#

Man those are close, to me, but I guess what it's saying is that you're looking to find a purely random pattern

#

whereas the one on the right has kind of a curve shape to it, where the residuals are positive as you get closer to 0 or 1, and negative as you get closer to the middle values (0.4 to 0.6)

#

but ugh I do have to stare at it to see that

#

I dunno, maybe I'm even reading it wrong, it's so subtle
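
The curved pattern is what you get when a straight line is fitted to data that is not actually linear. A small synthetic illustration (not the data from the plot in question): fit a line to a parabola on [0, 1] and look at the sign of the residuals.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

xs = [i / 10 for i in range(11)]
ys = [(x - 0.5) ** 2 for x in xs]  # curved relationship
slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
```

Here the residuals are positive near 0 and 1 and negative around 0.4 to 0.6, which is the kind of systematic shape a good residual plot should not have.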

tired otter
#

Guys

#

In a recommendation algorithm, the system analyzes the items that the user has already viewed and tries to predict what they might like. This type of AI is closest to:

A) Supervised learning.
D) Unsupervised learning.

What's the best answer here? He didn't say that the user rated the movies

viscid urchin
#

Collaborative filtering (which is what I get out of what you describe) is considered "unsupervised"

#

If you had explicit ratings it would/could be supervised

#

Arguably though there's a spectrum between unsupervised and supervised, and it's not a binary thing

#

Because what happens if you take view-counts into account.. that's suddenly "kinda" supervised...

tired otter
#

Got it

viscid urchin
#

That's just my take at least; Google search seems to back me up but I guess it's subjective.
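
To make the "no labels, just view history" point concrete, here is a toy item co-occurrence recommender. It is only a sketch with made-up users and items, not a claim about how any real system works: items are scored by how often they were viewed together with things the user already saw.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(histories):
    """Count how often each pair of items appears in the same user's history."""
    co = {}
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            co.setdefault(a, Counter())[b] += 1
            co.setdefault(b, Counter())[a] += 1
    return co

def recommend(user, histories, co, k=2):
    """Suggest up to k unseen items that co-occur most with the user's views."""
    seen = set(histories[user])
    scores = Counter()
    for item in seen:
        for other, count in co.get(item, {}).items():
            if other not in seen:
                scores[other] += count
    return [item for item, _ in scores.most_common(k)]

histories = {"ana": ["A", "B"], "bo": ["A", "B", "C"], "cy": ["B", "C"], "di": ["A"]}
co = build_cooccurrence(histories)
```

Nothing here uses a rating or a target label, which is why this family of methods is often filed under unsupervised learning.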

viscid urchin
#

I might be in over my head tonight, team:

#

It's making sense, but slower than I hoped. Oof.

sand pine
#

Hello, very new to all of this after years away from any programming. More of a system setup to take advantage of GPU, running a laptop with 4060 and when searching how there was one method that adds Visual Studio Code to advanced graphic settings and selecting high performance GPU usage and then there is the Nvidia CUDA … are these doing the same thing or are they apples and oranges. Mainly for class project so it isn’t a must but getting into this so I figured I should learn. Any advice for the rookie would be great

viscid urchin
#

Those are apples and oranges, yeah.

#

CUDA is a 'programming toolkit' from nVidia for running code on GPUs, whereas the other thing is just telling Visual Studio Code to use hardware accelerated drawing techniques etc.

#

If you can say more about what kind of projects interest you, we can give better advice about what you should look at next.

jaunty helm
# sand pine Hello, very new to all of this after years away from any programming. More of a...

advanced graphic settings -> high performance
this is telling windows that it should use GPU to draw that program instead of the cpu; if said program (in your example, VSCode) is graphics intensive, it'll boost the performance, making it look smooth, etc.
one quick example off the top of my head is RPG MV games; by default on my pc it draws using CPU, lagging it a lot; setting it to high performance makes it use the GPU which gets me way higher fps
nvidia CUDA
this is probably what you're looking for in the context of programming, but honestly you may not even have to worry about it; for example, if you want to use the popular deep learning library pytorch, you can just install the correct version of pytorch and it'll automatically install CUDA for you during the process

limber spear
#

Can we build a new stack for CUDA called CUDIE or CUDY 😏

#

Maybe COODY

serene scaffold
woeful lodge
#

Any good books i can get from amazon on data science?

sand pine
sand pine
ivory root
#

How do I make my models usable(integrated in a system). My friend asked me to design a project priority level prediction model I finished it but i tried deploying it using fastapi so as for him to access through api but am failing miserably. am developing in colab notebook

agile cobalt
ivory root
#

Am developing it to be used in a friend's system how would he go about it coz basically I develop them and then github keeps them

agile cobalt
#

How and where is your friend planning to run/use that system?

ivory root
agile cobalt
#

what exactly do you mean by 'deploy them on github'? how are you "deploying" it?

you can upload your code to GitHub, but when doing that it only stores the files, it does not run anything
or do you mean GitHub Pages? It only supports static websites (i.e. you cannot run python in the server side, at most embed in the browser)

#

they will need to host their system in some machine for users to be able to access it
usually you'll want to host your API in the same system, or something connected to it (same cloud provider if you're hosting on the cloud, or in the same network if self-hosting)

ivory root
#

In simple terms yeah I was uploading them. I was trying to host using fastapi and ngrok

agile cobalt
#

neither Google Colab nor GitHub offers compute for you to host (run) it, are you planning to run it yourself? in some cloud server? in a machine your friend owns?

ivory root
#

not really obviously I wanna host it globally not locally

agile cobalt
#

you will need to decide where to host it in first place then

ivory root
#

I was using ngrok idk if am allowed to share a link in here, I wanted to show you where am at at the moment coz you know how we used to host web locally and get like a responsive page and you are the only seeing the page unless you hosted on like heroku.

#

I tried to host it globally using ngrok but I can only access the root endpoint, no other endpoint is accessible

#

root is a get request but I can't post

#

kindly get back to me plz

austere prawn
#

Was there any recording of this marimo presentation?

safe agate
austere prawn
#

ok 🙂

serene scaffold
#

@safe agate you were on TalkPython recently, right?

sand pine
# viscid urchin If you can say more about what kind of projects interest you, we can give better...

@jaunty helm hitting both of your responses at once, thanks again for the feedback. In short, all I am working on now is class work for a graduate class on data analytics, so basic ML dealing with Logistic Regression, SVM, model comparison ... I'm sure I am not doing the summary justice. However, following this semester I want to begin taking some of what I have learned and slowly start seeing what I can do in my current role within distribution center planning and design (order management, inventory management.....). In the next couple of weeks I will be wrapping up this class, and I noticed some of the datasets are taking longer to run, and the impatient person I am began to look into these things. For example, in some exercises we use for loops with 3-4 kernel values and 3-4 C values; generally speaking "linear" always takes the longest. Like I said, not catastrophic, but looking down the road more than anything.

viscid urchin
# sand pine @jaunty helm hitting both of your responses at once, thanks again for ...

Cool, hopefully you're making good progress with your class. An SVC with a linear kernel often scales poorly as the number of samples grows, since it still goes through the general kernel solver. If you end up trying scikit-learn, it has some built-in parallelization that might help (n_jobs on the grid-search/cross-validation helpers etc)

For SVM specifically, there's a thing called LinearSVC that is supposedly zippy, but I'm not an expert: https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html

Do you have a sample dataset for sales/orders/etc you want to play with? You could make a model to predict the right inventory levels for a given product based on historical data or something?

glacial root
#

what would be a good way to get into nlp, and what models should i be focusing on to start with

viscid urchin
glacial root
glacial root
torpid mirage
#

Soooooooooooooooooooo

#

I cooked a schema.

#

Also why do we have a slowmode here? Anyway,

#

!pastebin

arctic wedgeBOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

torpid mirage
#

It took a lot of Github copy, documentation diving, and just asking around.
Came up with this.

#

You remember that CSV file that we were cleaning? So, yeah. I managed to get it... somewhat working and clean

#

(Clean as in; I manually took each column and filled it out just to test the schema)

#

Thoughts and review? This is still a lot of magic as I've never built a schema for this purpose.

viscid urchin
#

Cool, let me look

#

Honestly this looks better than I expected it to. Only a few things come to mind.

#

One is that you might want to add a confidence_score column to track how confident the prediction was that filled it in

#

If you have multiple people working on it, it might make sense to add modified_by but that's really about auditing not quality

#

the postgis stuff looks correct to me also, nice

#

ST_SetSRID(ST_MakePoint(NEW.longitude, NEW.latitude), 4326); is nardog but I believe that is a fully correct invocation for WGS84

odd meteor
lapis sequoia
#

when making a GAN, does the image size have to be pretty large? Should the image be normalized to be larger?

#

no, that would make it too big

viscid urchin
#

Actually the early/influential papers used sizes like 32x32, 64x64, so no

#

Some approaches I guess start out super small, like 4x4 and 8x8, and then progressively scale up

lapis sequoia
#

So, if it is RGB, then the image size is 32 * 32 * 3?

viscid urchin
#

Yeah, I guess width x height x color-channels

lapis sequoia
#

what would be a good latent_size if it's 32?

viscid urchin
#

Apparently the rule of thumb is that the latent space is 10x to 30x smaller than the output space, so I guess somewhere between 3072 / 30 and 3072 / 10?

#

Split the difference and call it 200 to start with maybe?
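
The arithmetic behind that suggestion, as a tiny helper. The 10x to 30x ratio is just the rule of thumb quoted above, and the function name is made up.

```python
def latent_size_range(width, height, channels, low_ratio=10, high_ratio=30):
    """Rough latent-size bounds from the flattened output size."""
    output_dim = width * height * channels  # 32 * 32 * 3 = 3072
    return output_dim // high_ratio, output_dim // low_ratio

low, high = latent_size_range(32, 32, 3)
```

That gives a range of roughly 102 to 307, so 200 sits about in the middle.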

pearl barn
#

I want to ask in which order I should take Jose Portilla's Python data analysis courses on Udemy. He has many courses on Python data analysis, and I feel the same libraries show up in each course. I don't know if they complement each other, but in which order should I take them to master these libraries?

viscid urchin
lapis sequoia
viscid urchin
#

The latent space depends on the size of the 'output space', so RGB matters in that you've got three color channels to care about.

lapis sequoia
#

yes

viscid urchin
#

Anybody tried to do anything with HiDream yet?

river cape
#

hey guys anyone aware of any open-source embedding models that works just as fine as OpenAIEmbeddings?

light cobalt
#

Hello, can anybody recommend a Discord channel on the topic of AI development (preferably in Python)?

#

I am new to AI, need to create one for my game. Just researching.

light cobalt
#

Well, then are there any articles about AI usage in gamedev? I just want to implement smart NPC enemies. I want to understand whether it is feasible in my project.

serene scaffold
shadow folio
#

Can I get Resources to learn data science

serene scaffold
arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

shadow folio
pearl barn
serene scaffold
pearl barn
#

Paid course sql

lapis sequoia
#

python c++ AI ML?
AR VR XR PYTHON C++

pallid badge
safe agate
austere prawn
flint onyx
#

p1 - low leverage + small residual (good leverage point)
p2 - low leverage + large (?) residual (outlier (?))
p3 - high leverage + large residual (bad leverage point) (influential)
is this correct?
I asked chatgpt and it gave me something completely diff
it said p1 was an outlier but I cant see why
and it said p2 had high leverage

flint onyx
#

Also am I understanding this right?

low res + high lev = influential
high res + low lev = outlier
both high = could be influential
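
For simple linear regression, leverage depends only on the x values, so it can be computed separately from the residuals. A sketch using the standard formula h_i = 1/n + (x_i - mean)^2 / Sxx on made-up data. The 2p/n cutoff (p = number of fitted parameters, here 2) is a commonly used rule of thumb, not a hard rule.

```python
def leverages(xs):
    """Leverage of each point in a simple linear regression with an intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [1 / n + (x - mean_x) ** 2 / sxx for x in xs]

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]  # the last point sits far from the rest in x
h = leverages(xs)
threshold = 2 * 2 / len(xs)  # rule of thumb: 2 * p / n with p = 2 parameters
flagged = [i for i, value in enumerate(h) if value > threshold]
```

A point far out in x gets high leverage regardless of its residual. Whether it is actually influential depends on whether the fit changes noticeably when it is removed.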

lapis sequoia
#

Are GANs just nonsense hard and unpredictable and just fry your gpu? Like, they are fighting to get what they want. They are in chaos. Does anyone casually just make GANs? At least in RL, there is some sort of sanity. You know?

viscid urchin
#

They are hard to train and resource-intensive, yeah.. but arguably with modern frameworks people are out there "casually" making them. It's just not easy to get to that level from scratch.

#

Are you doing your own by hand or are you using a library?

#

I'm feeling super dumb suddenly, what's a use-case for np.sum(array, axis=-1)? I'm trying to think of when I want to work backwards vs forwards.

serene scaffold
#

See what happens

viscid urchin
#

I mean, I understand what it's doing mechanically, I'm just trying to come up with an algorithm where I'm going to want that

#

I feel like I should be able to think of four really easily and it's not happening haha

#

Oh duh, image processing where you want to sum across the three color channels or something, I suppose.

#

I guess axis=0 is like "reduce rows", axis=1 is like "reduce columns", and axis=-1 is like "reduce innermost dimension"
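
A quick check of that intuition on a tiny fake "image" (height 2, width 3, 3 channels). The array contents are arbitrary.

```python
import numpy as np

image = np.arange(2 * 3 * 3).reshape(2, 3, 3)  # (height, width, channels)

per_pixel = image.sum(axis=-1)  # collapse the channel axis -> shape (2, 3)
per_column = image.sum(axis=0)  # collapse the row axis -> shape (3, 3)
per_row = image.sum(axis=1)     # collapse the column axis -> shape (2, 3)
```

`axis=-1` is handy because it keeps meaning "the last axis" even if a batch dimension is later added in front.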

desert oar
#

Note that this isn't just part of np.sum -- it's shared behavior for most vectorized numpy operations ("ufuncs")

desert oar
#

It's a very flexible system

viscid urchin
#

I'm trying to re-program my brain to think of the 'matrix approach' to things first; it's slightly painful. Thanks for the confirmation.

lapis sequoia
jaunty helm
#

fry your gpu
I mean all neural networks do that once you get big enough

#

llama 8b? casually eats 20gbs of vram (if you do no quantization)
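
Back-of-the-envelope arithmetic for that figure, counting only the weights. Treating the remainder as KV cache, activations, and framework overhead is an assumption here, and the helper name is made up.

```python
def weight_memory_gb(n_params, bytes_per_param):
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

fp16_gb = weight_memory_gb(8e9, 2)    # 16-bit weights
int4_gb = weight_memory_gb(8e9, 0.5)  # 4-bit quantized weights
```

So about 16 GB at 16-bit precision versus about 4 GB at 4-bit, before any runtime overhead.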

glacial root
#

i could be wrong though

hearty depot
hearty depot
glacial root
hearty depot
#

for lists, it selects the ith list starting from the inner dimension

pearl barn
#

What do you think of the Maven data analysis course with Python? Is it good, or will I just repeat the process and get stuck when I run into a real-world problem? And what about Alice Zhao, is she good? She made an advanced SQL course, but I couldn't download it because it was uploaded to rapidgator.

lapis sequoia
jaunty helm
lapis sequoia
dusty valve
umbral geyser
#

Ah soon I will be active In this group as I am taking data analytic, ai, data science/ml path later on

#

So excited to discuss with you guys n learn some cool hacks n tips

viscid urchin
#

I wonder if anybody's tried to make a meme model that just uses the mantissa bits of FP16 NaNs, and ignores all real floats

tribal kettle
#

Hey guys give me the road map to data base administration

grand minnow
grand breach
#

does huggingface allow to generate embeddings through api without downloading the model locally ?

lapis sequoia
#

AI

#

Ml

grand breach
tribal kettle
#

Thank you so much agentQ

lapis sequoia
outer cloak
#

Yo! What's up mates I am back after a LOT of work! What's goin' on?

#

I learnt using Pandas and learned how to Clean Data

#

Now what??

pallid badge
#

Hi, are there any pages good for checking for data science jobs, scientific software dev in Europe? Linkedin shows me only promoted jobs first, very annoying

outer cloak
#

umm HI mate!

#

You can do Freelancing! Its easy and good for DS and ML.

Go to Fiverr or Upwork

#

sign in, create Gig and give out your sample projects. and BAM! you are done!

#

u get it?

lapis sequoia
#

bro, like, when most people talk of RL who are not in robotics or optimal control theory EE stuff, are they just talking Q-learning? With Q-tables? Like, what is up?

limber token
#

They're more of a hosting thing than a service thing

#

Doesn't seem to support every model type though

#

Here's a snippet on how to generate embeddings:

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

result = client.feature_extraction(
    inputs="Today is a sunny day and I will get some ice cream.",
    model="intfloat/multilingual-e5-large-instruct",
)
grand minnow
warm iron
#

Guys what could cause it?

odd meteor
# warm iron

A lot of things can cause this.

From this learning curve we could infer your model is overfitting. Your model keeps fitting better to the training data, but it’s no longer generalizing better to unseen validation data after 70th-80th epoch.

  • What's the model you're training?
  • Are you using sufficient amount of data to train the model?
  • Did you apply any regularization?
  • BatchNorm, Dropout, Weight Decay... are you using any of those?
  • Have you done hyperparameter sweep on your learning rate?
  • Tried using different batch size and it's still not improving?

I'd like to hear what you've tried so far.
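
One cheap thing to try alongside the regularization options in that list is early stopping on the validation loss. This is a framework-agnostic sketch with made-up loss values; the function name and the `patience` and `min_delta` parameters are illustrative.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return (best_epoch, best_loss), stopping once val loss stalls for `patience` epochs."""
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss has not improved for `patience` epochs
    return best_epoch, best_loss

val_losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.8, 0.9]
best_epoch, best_loss = early_stopping_epoch(val_losses, patience=3)
```

In a real training loop you would also save a checkpoint at each new best epoch and restore it at the end.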

fallen cliff
#

Hello, I've made an hCaptcha solver in Python as a university project, using ML.
Don't need anything but a feedback.
That's the repo -> https://github.com/Irodavlas/HCaptchaSolver .
If anyone wants to give it a try lmk it should take few minutes since I've put tests on it.

river cape
#

hey guys i have just used the function calling in openai, and I don't understand where it could be useful. I understand that we define a function description, and if the prompt matches the description, the model extracts the parameters. Now these params could be used for an API function to fetch the real-time info, but then to pass that real-time info back, we need to send the same prompt again to get the final outcome. So it's a two-step process.

weak oxide
#

I have a question
What's your opinion on Neural Prophet?

#

Because I saw normal Facebook prophet get completely bashed

narrow tiger
#

guys bge-m3 or OpenAI, which embeddings are better?

inland fractal
hollow lake
#

Hello guys Anyone here who master using RAG framework with chatbots ?

serene scaffold
rose spade
#

How do I train my AI model with my own scratch datasets? I'm planning to use pytorch and pandas for this.

viscid urchin
rose spade
#

In a hypothetical situation, if I have a project that uses AI and Machine Learning, I should probably learn the basics first to understand the better logic and such?

viscid urchin
#

Yep. There's a lot to learn, and it's daunting at first, but you should learn it from the ground up.

#

Feel free to have a big goal in mind to motivate you of course.

#

For me that's meant re-learning a bunch of calculus I hadn't paid enough attention to the first time around.

rose spade
#

Well damn, that's a tough one. but ig I'll read the needed docs for that. hopefully I can show some progress for my project that involves machine learning and ai.

#

Supposedly, my only focus are on web-dev, but I got shifted to learning ai and such.

viscid urchin
#

That's a big shift of scope

#

Like, I don't want to belittle webdev, but ML is a bigger problem to tackle

ionic surge
#

i want to learn model training in python using yolo in kaggle, can anyone help

#

pls

viscid urchin
#

Check out the 'pinned' stuff at the top of the channel, it seems pretty good

viscid urchin
ionic surge
#

yes i have a notebook and i also create my own custom dataset for my project

ionic temple
#

Anyone?

viscid urchin
# ionic temple

This might just be your font that it's chosen to use, because I think the MongoDB console is utf-8 by default.

charred estuary
#

Has anyone played around with the HALO Hat for the Raspberry Pi 5?

#

It adds 26 TOPS and I was wondering if I should get one or save to build a dedicated rig with a 3060

limpid dew
#

Is there a fundamental difference between using an embedding layer and one‑hot encoding into a fully connected layer?
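
Mathematically the two give the same output: multiplying a one-hot vector by a weight matrix just picks out one row, which is what an embedding lookup does directly. A plain-Python sketch with a made-up 3x2 weight matrix:

```python
def one_hot(index, size):
    """Vector of zeros with a single 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def vector_times_matrix(vector, matrix):
    """(V,) vector times (V, D) matrix -> (D,) vector, i.e. a bias-free linear layer."""
    return [sum(v * row[d] for v, row in zip(vector, matrix)) for d in range(len(matrix[0]))]

def embedding_lookup(index, matrix):
    """What an embedding layer does: return row `index` directly."""
    return list(matrix[index])

W = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # vocabulary of 3, embedding size 2
via_one_hot = vector_times_matrix(one_hot(2, 3), W)
via_lookup = embedding_lookup(2, W)
```

The practical difference is cost: the lookup skips the multiplications by zero, which matters once the vocabulary is large.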

untold fable
#

Imagine I want to create a program where that will guess your facial expression and based on your facial expression place the song on Spotify

#

I don't have any Spotify premium

versed bloom
#

Are people with knowledge in Deep Learning here? If yes, please write me a DM

limber spear
versed bloom
# limber spear What is it about. We can all learn on the channel here

I am kind of confused by ResNets. I've seen the provided picture and know that the architecture on the right represents them. But I don't get how the input is provided since it is combined with more prior output as of what I've read, so how the "Residuum" really is calculated. And how the Skip Connections work

limber spear
versed bloom
limber spear
#

This looks like a convolutional neural network for image processing. With a bunch of processing layers

#

@versed bloom did you pull the equations? That is the how. Or what you may be seeking to understand

versed bloom
#

The skipping is the dotted lines I guess, ResNet is a translated form of CNNs. I don't need an equation, I don't know how the input of a layer comes about, since it is combined with some other prior output and the initial input?

limber spear
#

The ‘skipping lines’ have equations. That is how most of these ‘complex’ models work

#

Just googled it
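
For the skip-connection question above, here is a minimal sketch of one residual block in plain Python. It assumes the common form y = relu(F(x) + x), where F is two small linear layers with a ReLU in between; the weights and helper names are invented for illustration, and real ResNets use convolutions rather than these tiny matrices.

```python
def relu(vector):
    """Element-wise ReLU."""
    return [max(0.0, x) for x in vector]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x) with F(x) = w2 @ relu(w1 @ x)."""
    fx = matvec(w2, relu(matvec(w1, x)))       # residual branch F(x)
    summed = [f + xi for f, xi in zip(fx, x)]  # skip connection adds x back unchanged
    return relu(summed)


x = [1.0, -2.0]
zero = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

print(residual_block(x, zero, zero))          # F(x) is zero, so only the skip path contributes
print(residual_block(x, identity, identity))  # F(x) and the skip path are summed
```

The input to the block is simply the previous block's output. That same tensor goes down two paths: through the layers (giving F(x), the residual) and along the skip line untouched. The two are added, and the sum becomes the input to the next block.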

severe blade
#

hey, i've been struggling with this for 3-4 days now.

grand minnow
strange vault
#

I am trying to create a bot that extracts energy prices in the EU, per country, with live updates that are reasonably current. I found one website that is both free and updates its data frequently, but I can't figure out how to use their API, as they seem to be transitioning websites.

https://newtransparency.entsoe.eu/market

#

Does anyone know an alternative database for this, or how to actually access their API to extract the data live and semi-continuously?

lapis sequoia
#

I made a GAN that did not mode collapse, I have goosebumps, this is amazing and magical. GANs are magical. They really are.

#

I did not think this was possible. I love this!

spring field
limber spear
charred estuary
#

Hey does anyone know if PyTorch could take advantage of 2 GPUs? Was planning on getting x2 3050’s with 6gb of VRAM each. I know I can’t train a massive model but I want to try training something small from scratch or fine tuning a 500M - 1B model

#

Do I need 1 GPU with 12GB vram or will x2 with 6gb do the trick?
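
A rough back-of-the-envelope sketch for the VRAM question. It assumes mixed-precision training with Adam and the commonly quoted rule of thumb of about 16 bytes per parameter (fp16 weights and gradients, plus fp32 master weights and two Adam moment buffers). Activations are not included, so real usage is higher. The constant and function name are illustrative.

```python
# Rule-of-thumb assumption: 2 (fp16 weights) + 2 (fp16 grads)
# + 4 (fp32 master weights) + 4 + 4 (Adam moments) bytes per parameter.
BYTES_PER_PARAM_ADAM_MIXED = 16


def training_state_gib(num_params, bytes_per_param=BYTES_PER_PARAM_ADAM_MIXED):
    """Estimate GiB needed for weights, gradients and optimizer state only."""
    return num_params * bytes_per_param / 2**30


for label, num_params in [("500M", 500e6), ("1B", 1e9)]:
    print(f"{label}: about {training_state_gib(num_params):.1f} GiB before activations")
```

PyTorch can use two GPUs, but with plain data parallelism each GPU holds a full copy of the model, gradients and optimizer state, so 2 x 6 GB does not behave like one 12 GB card. Pooling memory needs a sharded or model-parallel setup, or a lighter method such as parameter-efficient fine-tuning.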

serene scaffold
viscid urchin
charred estuary
pseudo lagoon
#

Hey everyone, I have basic knowledge of Python. I'm mainly a web dev, but I want to learn AI since I'm not finding anyone who wants a website :-:
Can anyone guide me on how to start in the AI field and how to improve from there?
For context, I already have basic knowledge of Python, can use APIs, and have a foundation in pandas too

charred estuary
#

Building it definitely helped me learn the basics and now I’m learning PyTorch

charred estuary
#

Great tool to start learning

reef gazelle
#
import numpy as np
from PIL import Image, ImageOps

const_x_mean = 33.318421449829934
const_x_std = 78.56748998339798
epsilon = 1e-10


img_path = 'one.png'
img = Image.open(img_path).convert('L')  


pixel_mean = np.mean(img)
if pixel_mean > 127: 
    img = ImageOps.invert(img)

img = img.resize((28, 28))
img_array = np.array(img).reshape(1, 784)


img_array = (img_array - const_x_mean) / (const_x_std + epsilon)

prediction = model.predict(img_array)
predicted_class = np.argmax(prediction)

print("Predicted class:", predicted_class)


#

I need help, it loads the wrong image data

#

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential(
[
Dense(128,activation='relu', input_shape=(784,)),
Dense(128,activation='relu', ),
Dense(10,activation='softmax')

]
)

#

Can anybody help me load the image correctly when I predict?

#

The model works correctly with the test data
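
One way to narrow this down is to look at exactly what the model receives. The sketch below is a debugging aid, not a fix: it renders a grid of 0-255 grayscale values as text and checks the polarity. MNIST digits are light strokes on a dark background, roughly centred in the 28x28 frame, so a preview that looks inverted, off-centre, or nearly blank would explain bad predictions even when the test data works. The helper names are made up for illustration.

```python
def ascii_preview(pixels, threshold=0.5):
    """Render a 2D grid of 0-255 grayscale values as text ('#' marks bright pixels)."""
    lines = []
    for row in pixels:
        lines.append("".join("#" if value / 255.0 > threshold else "." for value in row))
    return "\n".join(lines)


def has_dark_background(pixels):
    """Return True when the mean pixel value is below mid-grey, as in MNIST."""
    flat = [value for row in pixels for value in row]
    return sum(flat) / len(flat) < 127


# Tiny illustrative grid; in practice pass the resized 28x28 image.
sample = [[0, 255, 0],
          [0, 255, 0],
          [0, 255, 0]]
print(ascii_preview(sample))
print(has_dark_background(sample))
```

With the code above, something like `print(ascii_preview(np.array(img).tolist()))` right after the `resize` call and before normalisation would show the 28x28 image the network sees. Comparing that with a preview of a training sample usually makes the mismatch visible.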

reef gazelle
#

hey

lapis sequoia
#

any of you remember your first GAN?

lapis sequoia
severe blade
viscid urchin
severe blade
grand minnow
severe blade
#

yes.

#

i literally bought this 4060 for this purpose. and it doesn't seem to work. 😭

grand minnow
severe blade
#

normally. pip install torch.

grand minnow
severe blade
#

yea, just noticed this.

grand minnow
#

reinstall your pytorch

#

Then try again

severe blade
#

done.

#

device = 'cuda' if torch.cuda.is_available() else 'cpu'
this too returns cpu only.

#

still doesn't work.

grand minnow
#

Try uninstalling PyTorch completely and then run your test code. If it throws a ModuleNotFoundError, then you uninstalled it from the environment your code actually uses. If it still runs, then your code is running in a different environment from the one you have been installing into.

#

You might consider setting up a virtual environment

arctic wedgeBOT
#
Virtual environments

Virtual environments are isolated Python environments, which make it easier to keep your system clean and manage dependencies. By default, when activated, only libraries and scripts installed in the virtual environment are accessible, preventing cross-project dependency conflicts, and allowing easy isolation of requirements.

To create a new virtual environment, you can use the standard library venv module: python3 -m venv .venv (replace python3 with python or py on Windows)

Then, to activate the new virtual environment:

Windows (PowerShell): .venv\Scripts\Activate.ps1
or (Command Prompt): .venv\Scripts\activate.bat
MacOS / Linux (Bash): source .venv/bin/activate

Packages can then be installed to the virtual environment using pip, as normal.

For more information, take a read of the documentation. If you run code through your editor, check its documentation on how to make it use your virtual environment. For example, see the VSCode or PyCharm docs.

Tools such as poetry and pipenv can manage the creation of virtual environments as well as project dependencies, making packaging and installing your project easier.

Note: When using PowerShell in Windows, you may need to change the execution policy first. This is only required once per user:

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
severe blade
severe blade
grand minnow
grand minnow
severe blade
#

still runs.

rich river
#
  "postCreateCommand": "cd detectron2-0.6 && python3 -m pip install -e ."

this is in my docker file
I got

Running the postCreateCommand from devcontainer.json...

[5223 ms] Start: Run in container: /bin/sh -c cd detectron2-0.6 && python3 -m pip install -e .
Defaulting to user installation because normal site-packages is not writeable
Obtaining file:///workspaces/FFS-main/detectron2-0.6
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [8 lines of output]
      running egg_info
      creating /tmp/pip-pip-egg-info-qpqnconv/detectron2.egg-info
      writing /tmp/pip-pip-egg-info-qpqnconv/detectron2.egg-info/PKG-INFO
      writing dependency_links to /tmp/pip-pip-egg-info-qpqnconv/detectron2.egg-info/dependency_links.txt
      writing requirements to /tmp/pip-pip-egg-info-qpqnconv/detectron2.egg-info/requires.txt
      writing top-level names to /tmp/pip-pip-egg-info-qpqnconv/detectron2.egg-info/top_level.txt
      writing manifest file '/tmp/pip-pip-egg-info-qpqnconv/detectron2.egg-info/SOURCES.txt'
      error: package directory 'detectron2-0.6/projects/PointRend/point_rend' does not exist
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
[7480 ms] postCreateCommand from devcontainer.json failed with exit code 1. Skipping any further user-provided commands.
Done. Press any key to close the terminal.

However

vscode ➜ /workspaces/FFS-main $ ls -l detectron2-0.6/projects/PointRend/
total 24
-rw-rw-r-- 1 vscode vscode 7467 Oct 26  2021 README.md
drwxrwxr-x 4 vscode vscode 4096 Oct 26  2021 configs
drwxrwxr-x 2 vscode vscode 4096 Oct 26  2021 point_rend
-rwxr-xr-x 1 vscode vscode 5160 Oct 26  2021 train_net.py

I do have this folder, any ideas?

grand minnow
severe blade
#

its the same.

#

im using python version 3.12.6, for the record.

grand minnow
severe blade
#
Package            Version
------------------ -----------
certifi            2025.1.31
charset-normalizer 3.4.1
filelock           3.18.0
fsspec             2025.3.2
idna               3.10
Jinja2             3.1.6
MarkupSafe         3.0.2
mpmath             1.3.0
networkx           3.4.2
numpy              2.2.4
pandas             2.2.3
pillow             11.2.1
pip                25.0.1
python-dateutil    2.9.0.post0
pytz               2025.2
regex              2024.11.6
requests           2.32.3
setuptools         78.1.0
six                1.17.0
sympy              1.13.1
tiktoken           0.9.0
torchaudio         2.6.0
torchvision        0.21.0
typing_extensions  4.13.2
tzdata             2025.2
urllib3            2.4.0
#

why the hell is there no "torch"

#

but the code still runs?
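
A package list without torch combined with code that still imports it usually means pip and the code are using different interpreters. The sketch below, run from the same place as the failing code (for example a notebook cell), reports which Python is executing and, if torch is importable, where it lives and whether it is a CUDA build. The function name is illustrative; the torch attributes used are `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()`.

```python
import importlib.util
import sys


def describe_torch_install():
    """Report the running interpreter and, if torch is importable, its build details."""
    report = {"python": sys.executable, "torch_found": False}
    spec = importlib.util.find_spec("torch")
    if spec is None:
        return report  # torch is not visible to this interpreter

    import torch

    report.update(
        torch_found=True,
        torch_path=spec.origin,                    # where this torch was loaded from
        torch_version=torch.__version__,
        cuda_build=torch.version.cuda,             # None means a CPU-only build
        cuda_available=torch.cuda.is_available(),
    )
    return report


for key, value in describe_torch_install().items():
    print(f"{key}: {value}")
```

If `python` differs from the interpreter shown by `python -m pip --version`, the installs have been going to the wrong environment. If `cuda_build` is `None`, the installed wheel is CPU-only and `torch.cuda.is_available()` will stay `False` regardless of the GPU.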

grand minnow
#

What does your "kernel" show?

severe blade
grand minnow
#

Im off to bed

severe blade
severe blade
fallow coyote
#

I use Copilot and ChatGPT to find resources and to give me an idea of how to go about starting a particular project I have in mind. Is this an effective way of using these tools? I never use AI to help me complete any of my programming projects; any issues that come up in my code I either figure out myself, ask about in the Discord here, or search for on Google (Stack Overflow, Reddit, etc.)

torpid jungle
#

crazy pfp

wild solar
#

Is there a preferred Python version to use when it comes to machine learning and AI? I used to work with Python 3.13 but had many issues with PyTorch and had to roll back to an older version, now i use 3.10 mainly and 3.12.6 occasionally.

tardy vessel
#

I'm building an AI voice agent in Python that streams audio from Twilio to AssemblyAI STT for transcription and VAD, but it takes 4 seconds to reply with the final transcript. I want it to be less than 1 second.

Can anyone help with this?

viscid urchin
inland fractal
#

Should I only compute the correlation matrix before I start feature scaling (using sklearn)?
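
On the ordering question: Pearson correlation does not change when a feature is shifted or rescaled by a positive factor, so computing it before or after standard scaling gives the same matrix. The sketch below checks that on made-up numbers using hand-written helpers; in practice the matrix would typically come from pandas or NumPy.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def standardize(xs):
    """Scale to zero mean and unit variance (what a standard scaler does)."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / std for x in xs]


# Illustrative data only.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
weights_kg = [52.0, 58.0, 69.0, 77.0, 91.0]

before = pearson(heights_cm, weights_kg)
after = pearson(standardize(heights_cm), standardize(weights_kg))
print(abs(before - after) < 1e-9)
```

The order only starts to matter for non-linear transforms (log, rank, clipping), which can change the coefficients.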

limpid dew
#

Does anyone have experience with embeddings? I'm trying to understand how they differ from one-hot encoding at a low level.

grand minnow
grand minnow
grand minnow
shadow cobalt
#

When doing PCA, increasing the number of components shouldn't affect which previous features are selected, should it? I.e. picking N features should lead to the same list of features as N-1, just with an extra feature added
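
A sketch that may help here. PCA does not pick original features; each component is a direction (a weighted mix of all features). The components are ordered by explained variance, and each one is found independently of how many more are requested, so asking for N gives the same first N-1 as asking for N-1 (up to sign and numerical noise in real libraries). The code below uses power iteration with deflation in plain Python purely for illustration; it is not how sklearn computes PCA, and the data is made up.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of a list of observations."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def leading_eigenpair(matrix, iterations=500):
    """Power iteration: return the dominant unit eigenvector and its eigenvalue."""
    d = len(matrix)
    vector = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        product = [sum(matrix[a][b] * vector[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    eigenvalue = sum(vector[a] * sum(matrix[a][b] * vector[b] for b in range(d))
                     for a in range(d))
    return vector, eigenvalue


def pca_components(rows, k):
    """Return the first k principal directions, one at a time."""
    matrix = covariance_matrix(rows)
    d = len(matrix)
    components = []
    for _ in range(k):
        vector, eigenvalue = leading_eigenpair(matrix)
        components.append(vector)
        # Deflation: remove the direction just found, then look for the next one.
        matrix = [[matrix[a][b] - eigenvalue * vector[a] * vector[b] for b in range(d)]
                  for a in range(d)]
    return components


data = [[2.0, 0.5, 0.1], [4.1, 1.9, 0.3], [6.2, 2.4, -0.2],
        [7.9, 4.2, 0.0], [10.1, 4.8, 0.4], [1.0, 3.0, -0.5]]
two = pca_components(data, 2)
three = pca_components(data, 3)
print(three[:2] == two)
```

The comparison prints `True` here because each component is computed before the loop knows how many more will follow. If a feature-selection step is layered on top (for example ranking features by loadings), that ranking can change with N even though the components themselves do not.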

grand minnow
shadow cobalt
#

maybe i need to rewatch how it works

#

i thought it was like tuning a polynomial fit like a taylor series

glacial root
#

hey guys, i just implemented a byte-pair encoding algorithm for a corpus of text, however i'm not exactly sure why this issue is happening
basically the issue is that after a certain number of iterations, rather than creating pairs it adds empty strings to the corpus, which originally started out as an array of each char from the original text used
here is my code and the corpus used, if anyone knows about this and can see the cause of the issue can you please help? thank you

#
import os

command = 'cat text_data/forms/abc/AbcPoems2AbcHkAndChinaV2Cauchy3Poembycheungshunsang.txt'
executable = os.popen(command)
corpus = list(executable.read())
executable.close()

vocabulary = []
for i in corpus:
    if i not in vocabulary:
        vocabulary.append(i)

for i in range(1000):
    pairs = dict()
    for j in range(len(corpus) - 1):
        key_list = list(pairs.keys())
        pair = ''.join(corpus[i:i+2])
        if pair in key_list:
            pairs[pair] += 1
        else:
            pairs.update({pair: 1})
    
    pairs = dict(sorted(pairs.items(), key = lambda item: item[1]))
    vocabulary.append(list(pairs)[-1])
    for j in range(len(corpus) - 1):
        chars = ''.join(corpus[i:i+2])
        if chars == vocabulary[-1]:
            corpus[i:i+2] = [chars]
#

and here's the corpus i used, which i then converted to an array of chars

#
2 ABC of H.k. and China revised vision.
Barrels tears are wines and salts.
With a whisk on goody tails!
Wiggle maces to fix the heads.
Heads in jack on boxes are ceased.
Cry to paranoid truly bosses.
Bosses are jokers take your boys.
Studs are bogs with fire apples.
True predicates worth cases.’
Descents wash in badly bands.
Wholly sales are smart with cats.
Who got tenth honors in China?
Homage grand to play and plays!
Trim the times of hearts then cry.
Tanks in steels but voice wail.
Bossy dragged by tails that whisked.
Go very timid and love the wise.
Hands are lent but laws are ends.
Cases on courts are borrowed lands.
Length long with treads to retch!
Straps on times and watch here.
Arrays tanks but all are men.
Cross all suctions steal the ends.
Cave on minds are cages on objects.
Rouser rockets powers holes.
Confine curses to stop our wounds.
Whirl your bodies and jump on grounds.
Crouch of soldiers after kicks with flings.
Block one leg and hit the middle.
Cauchy3 know the tricks to kill.
Threaten weak oppressed ill.
Surpass scores are bad in honors.
Wash to think that build the homes.
Angel sins but cauchy3 has funs.
Make ones tools when hats are found.
Worlds are drawers on bottom noses.
Singular ugly piece is rose.
Wily mores are teeth of sharks.
Saw with tooth is laws in arts.
Artful men power with grids.
Bodies stamped and wills are ridden.
Sign in forth with battles conquered.
Triumphs on candles whip the stands.
Soups are soaps and faiths not come.
We are meats in balls and rice to constants.
---Cheung Shun Sang=Cauchy3---
#

i know the poem is a little weird lol, was from a random dataset i found on kaggle

#

here's an example result of what i mean with what's going on with the corpus
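
A likely cause, reading the code above: both inner loops index the corpus with `i` (the outer merge counter) instead of `j`, so every pass counts and merges only the slice at position `i`. The corpus shrinks with each merge, and once `i` reaches the end of the list, `corpus[i:i+2]` is an empty slice, `''.join(...)` returns `''`, and that empty string is added to both the vocabulary and the corpus. Below is a minimal corrected sketch under the same character-level setup; the function names are illustrative, and it builds a new token list on each merge rather than editing the list while indexing into it.

```python
from collections import Counter


def merge_pair(tokens, pair):
    """Replace every non-overlapping occurrence of pair with the merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2  # skip both halves of the merged pair
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def train_bpe(text, num_merges):
    """Run up to num_merges byte-pair merges on a character-level corpus."""
    tokens = list(text)
    vocabulary = sorted(set(tokens))
    for _ in range(num_merges):
        counts = Counter(zip(tokens, tokens[1:]))  # adjacent pair frequencies
        if not counts:
            break  # fewer than two tokens left
        pair, frequency = counts.most_common(1)[0]
        if frequency < 2:
            break  # nothing repeats any more, so further merges add no value
        vocabulary.append(pair[0] + pair[1])
        tokens = merge_pair(tokens, pair)
    return tokens, vocabulary


tokens, vocabulary = train_bpe("abababab", 2)
print(tokens)
print(vocabulary)
```

Stopping when no pair occurs at least twice is a design choice in this sketch; it also keeps a fixed count like 1000 merges from running past the point where the corpus has anything left to merge.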

arctic wedgeBOT
shadow cobalt
#

I'm pretty sure that works that way

fallow coyote
pine heron
charred estuary
#

I’m building a new rig and I am getting into AI training and running LLMs locally. Are there any good AMD GPUs for AI devs or is it just really an NVIDIA thing? I’m finding a lot of AMD GPUs with a decent amount of VRAM are much less than NVIDIA.

serene scaffold
# charred estuary I’m building a new rig and I am getting into AI training and running LLMs locall...

I'm not aware of any non-NVIDIA hardware that's anywhere nearly as widely supported as NVIDIA hardware. You can look to see if PyTorch runs on any non-NVIDIA devices, and with what caveats.

You will find that the amount of compute resources needed to fine-tune or deploy LLMs varies by orders of magnitude. You might consider not buying any AI-specific hardware at all, and using the savings to rent cloud compute.

You can't train an LLM from scratch on consumer-grade hardware--you can only maybe fine-tune an existing one.

iron basalt
charred estuary
#

The 4B may have been cloud but I know he did the 2B himself

charred estuary
#

It’s all compatible, just asking whether or not it runs well

iron basalt
#

One GPU is not enough.

#

(15k USD)

charred estuary
#

Damn 😭

charred estuary
#

It’s more important to learn the skills IMO, and then you can judge if a device like that is worth it

iron basalt
charred estuary
#

Yk what I mean

#

Not wanting to make the next big thing

iron basalt
#

Ok, if you just want to learn some ML, any modern consumer GPU will do.

#

Except Intel or whatever.

charred estuary
#

Yea but from what I’m finding most AMD cards come nowhere near NVIDIA

iron basalt
#

No idea of the status of Intel, seems like no one cares about it.

charred estuary
iron basalt
#

AMD is chosen for price.

charred estuary
agile cobalt
#

iirc there are some programs that support inference on AMD, but for training you'll really want NVIDIA

charred estuary
charred estuary
charred estuary
#

Ima email Jensen and js be like hey you gotta have an extra H100 laying around somewhere right

charred estuary
iron basalt
#

Nvidia is the typical option, and probably what you want. If you can get one...

agile cobalt
#

there is also the option of just renting cloud compute instead of purchasing a GPU though, especially if you want to try training/fine-tuning larger models

iron basalt
charred estuary
#

It’s $3.39 an hour for an H100 rig

charred estuary
#

By not huge I mean not like DeepSeek's full 671B or whatever it is

#

Anyway any AMD cards that you think would be semi fast for training a 5B?

iron basalt
charred estuary
iron basalt
charred estuary
iron basalt
#

Well 2B is a lot smaller.

#

Also fine tuning or from scratch?

charred estuary
#

From scratch with torch

charred estuary
#

I asked if he did both locally or the 5 in the cloud and he said 3090 did both

iron basalt
charred estuary
iron basalt
#

It goes for about $1,700-1,800.

#

The Radeon RX 7900 XT has 20 GB, and about 103 TFLOPS at half precision. It goes for about $1,000-1,300.

#

The Radeon RX 6950 XT has 16 GB, and about 47 TFLOPS at half precision. It goes for about $500.

#

So the 3090 is clearly optimized around memory, likely to be able to hold a lot of texture data.

#

So games can load once and hold it all in there.

#

The conclusion here is the Nvidia prices are absurd, especially for a GPU that old.

#

Nvidia 5090 and such are way faster in terms of half precision FLOPs, but nobody can get a hold of them.

#

(And also have 32 GB)

iron basalt
#

So, important to keep that in mind if you want to go AMD, since it has less VRAM (unless you are willing to increase the price, then you can get 32 or 48 GB).

#

But on the other hand more FLOPs. So if you go smaller, you can go faster than the 3090 (e.g. 3-4B).

#

Note that the tricks used also degrade the quality, but since this is just for learning / messing around, that does not really matter.

river cape
# severe blade its the same.

Hi, I just saw this. First off, check which CUDA toolkit you have and whether it's compatible with your existing torch version

#

and also are you on windows or linux?

severe blade
severe blade
river cape
rich river
#

This is using the detectron2 framework

from detectron2.engine import launch
...
def main2(args):
  ...
  print(outputs)
  return outputs
def launch_main():
    # Create arg parser
    arg_parser = setup_arg_parser()
    # args = arg_parser.parse_args()
    args = arg_parser.parse_args(["--dataset-dir", "/workspaces/FFS-main/data",\
                                  "--test-dataset","E2E_Robotics_ood_val",\
                                  "--num-gpus", "1",\
                                  "--config-file", "/workspaces/FFS-main/Flow_Feature_Synthesis/detection/configs/AD-Detection/regnetx.yaml",\
                                  "--inference-config","/workspaces/FFS-main/Flow_Feature_Synthesis/detection/configs/Inference/standard_nms.yaml",\
                                  "--random-seed", "8",\
                                  "--image-corruption-level","0",\
                                  "--visualize","1"
                                  ])
    # Support single gpu inference only.
    args.num_gpus = 1
    # args.num_machines = 8

    print("Command Line Args:", args)

    outputs = launch(
        main2,
        args.num_gpus,
        num_machines=args.num_machines,
        machine_rank=args.machine_rank,
        dist_url=args.dist_url,
        args=(args,),
    )
    print("outputs in launch main are:")
    print(outputs)

the outputs printed inside main2 are correct
but when I try to get the result in launch_main, it shows

outputs in launch main are:
None

any ideas?

viscid urchin
#

You’re not returning anything from “launch”
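
To illustrate the point, here is a sketch that assumes the launcher calls `main_func` but does not pass its return value back, which would match the `None` printed above. `launch_stand_in` is a hypothetical stand-in written for this example, not detectron2's `launch`, and `main2` here is a dummy. The workaround shown is to capture the result in a wrapper.

```python
def launch_stand_in(main_func, args=()):
    """Hypothetical launcher: runs main_func and discards whatever it returns."""
    main_func(*args)


def main2(args):
    """Dummy worker standing in for the real inference function."""
    return {"score": 0.9}


def run_and_capture(launcher, main_func, args):
    """Run main_func through the launcher and hand its result back to the caller."""
    captured = {}

    def wrapper(*inner_args):
        captured["outputs"] = main_func(*inner_args)

    launcher(wrapper, args=args)
    return captured.get("outputs")


print(launch_stand_in(main2, args=(None,)))                 # the launcher itself returns None
print(run_and_capture(launch_stand_in, main2, (None,)))     # the wrapper recovers the result
```

This only works when the function runs in the same process, as in the single-GPU case described above. If the launcher spawns worker processes, results would need to travel some other way, for example by writing them to a file.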

upbeat prism
#

does the autograd/backprop engine in PyTorch first build a topologically sorted graph and then just runs backprop or do they somehow "merge" the two?
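
As a reference point for this question, here is a toy scalar autograd in the style of small teaching implementations. It is not PyTorch's engine and makes no claim about PyTorch's internals beyond the fact that PyTorch records the graph while the forward pass runs. The sketch records parents during the forward pass, then builds a topological order and walks it in reverse when `backward()` is called. All names are illustrative.

```python
class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents
        self.backward_fn = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            self.grad += out.grad
            other.grad += out.grad

        out.backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out.backward_fn = backward_fn
        return out

    def backward(self):
        # Step 1: depth-first traversal gives a topological order (inputs first).
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        # Step 2: apply the chain rule from the output back to the inputs.
        self.grad = 1.0
        for node in reversed(order):
            node.backward_fn()


x, y = Value(3.0), Value(4.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)
```

In this toy version the two phases are separate: sort first, then propagate. An engine could equally interleave them by tracking how many pending gradients each node still expects and running a node as soon as that count reaches zero; both give the same gradients.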

gilded sundial
#

What type of projects can I make with CNN classification?

jaunty helm
gilded sundial
gilded sundial
#

Ok

umbral hatch
#

hey guys I'm interested in data science is there any specific website in the pythondiscord.com/resources for data science? or should i just learn python for now?

viscid urchin
#

Do you have any budget for courses/websites? Some of the nicer-seeming options cost a little something.

#

(Plenty of free stuff too, but there are some nicely-structured paid things)

umbral hatch
#

nah unfortunately

#

student mainly with data science studying on the side

viscid urchin
umbral hatch