#data-science-and-ml | Python | Page 165

untold bloom Apr 30, 2025, 7:11 AM

#

wow

#

thanks for the enlightment

#

that last s has been bothering me

verbal oar Apr 30, 2025, 10:37 AM

#

for example for drug discovery

#

if I said it correctly

#

I mean I saw one offer related to graph nn

fallow coyote Apr 30, 2025, 12:21 PM

#

im struggling to understand the likelihood function. I look on one website, it says one defintion, I look at another, theres a completely different definiton. how do I define what it is

calm thicket Apr 30, 2025, 12:42 PM

#

there are many different likelihood functions. probably in machine learning you are using maximum likelihood estimation

wooden sail Apr 30, 2025, 12:46 PM

#

fallow coyote im struggling to understand the likelihood function. I look on one website, it s...

are you struggling with the general concept?

fallow coyote Apr 30, 2025, 12:59 PM

#

I've somewhat figured it out. Still seems confusing but I read up that likelihood is a topic that still confuses even the bbest mathematicians so Im not too bothered if I dont understand it fully

wooden sail Apr 30, 2025, 1:00 PM

#

the idea is kinda simple though

fallow coyote Apr 30, 2025, 1:01 PM

#

thats what I thought and then I looked up the definitions and it started confusing me

wooden sail Apr 30, 2025, 1:01 PM

#

the easiest way to see it, imo, is that you have a pdf for some set of observations, and the pdf depends on some parameter

#

for example you can say you have data X that is gaussian distributed. the gaussian distribution has 2 parameters, the mean and the variance (or standard dev, as you like)

#

we can then write the pdf as a function p(x, mean, std dev)

#

if you keep the mean and std dev constant, then p(x, mean, std dev) is a pdf describing the probability density of observing any x that follows this pdf

#

if on the other hand you observe one specific x and keep it constant, but allow the mean and std dev to vary

#

then p(x, mean, std dev) is no longer a pdf. it doesn't integrate to 1 if you integrate over the mean and std dev

#

this second case is what one calls the "likelihood function", and you can do this for any pdf

#

the difference is what is kept constant and what is allowed to vary

fallow coyote Apr 30, 2025, 1:05 PM

#

I think I get it now

verbal oar Apr 30, 2025, 2:43 PM

#

i just relate it to probability or odds, likelihood is related to bayes generally

#

when you have a priori, a posteriori and likelihood

#

but its likelihood not likelihood function

#

likelihood is B given A, B|A

#

I think this is origin of it

#

also lower, marginal is origin to marginal distribution or probability dont sure

#

yes likelihood is probability because of P

#

what do you think

#

looks like bayes is base

#

like its base for vae for example when you have priors

wooden sail Apr 30, 2025, 2:58 PM

#

that's something separate

#

the posterior you compute here via bayes' rule can itself be either a pdf or a likelihood

verbal oar Apr 30, 2025, 2:59 PM

#

ok

wooden sail Apr 30, 2025, 3:00 PM

#

the distinction is again made by what is kept constant and what is allowed to vary

verbal oar Apr 30, 2025, 3:02 PM

#

lol so I confused too because of word having two meanings likelihood

#

so depends on context

river cape Apr 30, 2025, 3:12 PM

#

Hey guys so I have spent my time learning ML , DL , transformers and right now i am learning langchain , but I dont have much knowledge on DSA . So im confused at this point , as to what to do

left tartan Apr 30, 2025, 3:12 PM

#

river cape Hey guys so I have spent my time learning ML , DL , transformers and right now i...

Which DSA do you mean? Data Structures and Algorithms?

river cape Apr 30, 2025, 3:12 PM

#

left tartan Which DSA do you mean? Data Structures and Algorithms?

Yes

verbal oar Apr 30, 2025, 3:12 PM

#

where I can find hypothesis testing inside deep learning?
with machine learning you see it for example in R statistical summary

left tartan Apr 30, 2025, 3:13 PM

#

river cape Yes

Oh, see #algos-and-data-structs . MIT 06.001 is a good start.

verbal oar Apr 30, 2025, 3:14 PM

#

looks like its hidden?

#

I dont know about relavance of it

river cape Apr 30, 2025, 3:15 PM

#

verbal oar where I can find hypothesis testing inside deep learning? with machine learning...

The thing about deep learning is that it works like a black box , so I doubt whether you can actually figure it out

verbal oar Apr 30, 2025, 3:15 PM

#

ah ok right

river cape Apr 30, 2025, 3:15 PM

#

left tartan Oh, see <#650401909852864553> . MIT 06.001 is a good start.

I see , but DSA is it actually required?

left tartan Apr 30, 2025, 3:17 PM

#

river cape I see , but DSA is it actually required?

Required for what? For a CS degree? Yah, I don't know of any that don't require it.

verbal oar Apr 30, 2025, 3:17 PM

#

so its doing it automatically

river cape Apr 30, 2025, 3:17 PM

#

left tartan Required for what? For a CS degree? Yah, I don't know of any that don't require ...

Required for a job?

left tartan Apr 30, 2025, 3:18 PM

#

river cape Required for a job?

That's a better question for another channel, like #career-advice . But, DSA basics (the content of an undergrad DSA class) are pretty fundamental and kinda assumed knowledge for software engineers.

#

The real question is "how much is needed", which is difficult to answer because it's difficult to measure.

verbal oar Apr 30, 2025, 3:20 PM

#

yes as I recall inside linear regression is hypothesis testing

limber spear Apr 30, 2025, 7:08 PM

#

river cape Hey guys so I have spent my time learning ML , DL , transformers and right now i...

Learn it all lol

#

Probably wouldn’t be able to 😅 there’s too much to learn

#

Hey chat

proper meteor Apr 30, 2025, 7:23 PM

#

river cape Hey guys so I have spent my time learning ML , DL , transformers and right now i...

Dsa is must if you want a job at good company you'll have to solve dsa problem in your technical interview regardless of the position you're applying for

feral meteor Apr 30, 2025, 8:19 PM

#

asked clude to show me the heat map

#

https://tenor.com/view/the-undertaker-aj-styles-wwe-boneyard-match-wrestle-mania36-gif-16943984

Tenor

verbal oar Apr 30, 2025, 9:15 PM

#

hey robert 🙂

hallow wagon Apr 30, 2025, 11:42 PM

#

Has anyone here tried out Google's new Agent Development Kit (ADK) yet? https://github.com/google/adk-python
Curious about giving it a shot but wondered if anyone recommends it or prefers other libraries for building agents?

GitHub

GitHub - google/adk-python: An open-source, code-first Python toolk...

An open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control. - google/adk-python

manic lion May 1, 2025, 1:05 AM

#

machine learning is difficulty to work ?

#

exemple, an guy trainning an ia to respond the questions.

serene scaffold May 1, 2025, 1:34 AM

#

manic lion machine learning is difficulty to work ?

Yes

serene scaffold May 1, 2025, 1:34 AM

#

manic lion exemple, an guy trainning an ia to respond the questions.

That's relatively easy if you start with a foundation model. Otherwise it's very difficult.

#

(and if you start with a foundation model, all the actual AI is abstracted away.)

rich moth May 1, 2025, 2:13 AM

#

Whoa! Check out these polar plots! The the timeseries one (arrowhead) is crazy because all the points line up almost perfectly on the 0 degree, 180 degree line. It makes sense though, if you think of about time series just having a 1D nature . But whats interesting is the image one is that the complexity can be represented in a 2D nature. The points are distributed across multiple angles in the complex plane, forming patterns that extend in various directions rather than being confined to a single axis. What cool though is all these patterns are merging naturally from the math formula I made.

#

It's like this hidden "dna" of data

#

There's a unique "shape" to complexity across different data types ,almost kind of hidden signature or "calling card" that reveals the fundamental nature of information itself. It doesn't just measure how difficult a sample is to learn (magnitude), but also characterizes what kind of difficulty it represents (phase).

Images below are from IRIS dataset.

rich moth May 1, 2025, 2:47 AM

#

#

limber spear May 1, 2025, 2:54 AM

#

rich moth There's a unique "shape" to complexity across different data types ,almost kind ...

Welp Plunder. I think it is time for you to publish a paper on this atp. Is this going up on GitHub

#

Test your research against the big dawgs. Llama, DeepSeek, ChatGPT etc.

#

Claude. What is the deal with Langchain. Why are folks excited about Langchain

rich moth May 1, 2025, 3:07 AM

#

limber spear Welp Plunder. I think it is time for you to publish a paper on this atp. Is this...

Honestly, I dont even know where to begin. Feels so overwhelming. I have started writing some, Im slow at it and takes me forever, dreading it already lol. As far as github its a great idea, im also slow at that, too. I'm not greatest with it. Do you know of any exceptional resources to better my GitHub skills? I once (years ago) nuked my entire hardrive because I was , well an idiot and didnt know what I was doing. But i appreciate your input, broski.

limber spear May 1, 2025, 3:11 AM

#

rich moth Honestly, I dont even know where to begin. Feels so overwhelming. I have start...

I just started making repositories and building.

rich moth May 1, 2025, 3:13 AM

#

limber spear I just started making repositories and building.

How's that been working out?

limber spear May 1, 2025, 3:17 AM

#

rich moth How's that been working out?

Ehh. I contribute to open source. It is not so much for personal gain than it is to contribute to society

rich moth May 1, 2025, 3:22 AM

#

That's a great attitude

#

This one is on the Breast Cancer Dataset. Whats interesting is Phase ascending strategy took the cake on this one

#

oops, lol its obvious hard to easy. Im looking at two different things

#

It was the WINE tabular dataset i was looking at where Phase Ascending over random +5.41 %

rich moth May 1, 2025, 3:41 AM

#

Its almost like the opposite rings, for this complexity tool. Feedings it lots of information , rather than little,. improves overall results. Which makes sense, more complexity better results. But its not always the case, sometimes another method seems to be working , but always beating random everytime. But complexity isnt defined by domains, it something else, thats what I think I found though. Well some of it, even though it works and works well. I feel like theres something else missing to the pie. It's like dark matter, I cant see it, but trail and error in our measurements show "something is there"

rich moth May 1, 2025, 4:24 AM

#

Im really curious how fractals might play a role here. like thinking about patterns like the Mandelbrot set on the complex plane .. especially the boundaries between points that stay bounded and those that escape to infinity...

#

damn! i never thought about that... they both fundamentally operate in the complex plane there might a connection here

river cape May 1, 2025, 7:39 AM

#

proper meteor Dsa is must if you want a job at good company you'll have to solve dsa problem i...

I was aiming for a start up

proper meteor May 1, 2025, 7:40 AM

#

river cape I was aiming for a start up

Then you have to hire employees with dsa and take away assignments I'd do that if I were you kek

river cape May 1, 2025, 7:41 AM

#

proper meteor Then you have to hire employees with dsa and take away assignments I'd do that i...

I meant i was aiming to work as an employee for startup

proper meteor May 1, 2025, 7:42 AM

#

river cape I meant i was aiming to work as an employee for startup

Pretty sure they'll ask some dsa releated questions too

#

It allows the recruiters to check if you have problem solving abilities or not

abstract loom May 1, 2025, 12:37 PM

#

Hey

charred estuary May 1, 2025, 1:56 PM

#

Anyone else getting 503 with the Gemini API?

charred estuary May 1, 2025, 1:57 PM

#

abstract loom Hey

howdy

verbal oar May 1, 2025, 2:02 PM

#

being little offtopic github is not backup site

#

its only version control, still need backup somewhere

limber spear May 1, 2025, 2:14 PM

#

I use GitHub for backup. Unless GitHub says no lol

verbal oar May 1, 2025, 2:57 PM

#

I say what I read in some git book

#

not my opinion

charred estuary May 1, 2025, 3:00 PM

#

limber spear I use GitHub for backup. Unless GitHub says no lol

I agree. Especially for Python since files are small.

gritty notch May 1, 2025, 4:10 PM

#

Hey everyone pls help me with this

#

import numpy as np

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.weights_input_hidden = np.random.randn(self.input_size, self.hidden_size)
        self.weights_hidden_output = np.random.randn(self.hidden_size, self.output_size)
        self.bias_hidden = np.zeros((1, self.hidden_size))
        self.bias_output = np.zeros((1, self.output_size))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        return x * (1 - x)

    def feedforward(self, X):
        self.hidden_activation = np.dot(X, self.weights_input_hidden) + self.bias_hidden
        self.hidden_output = self.sigmoid(self.hidden_activation)
        self.output_activation = np.dot(self.hidden_output, self.weights_hidden_output) + self.bias_output
        self.predicted_output = self.sigmoid(self.output_activation)
        return self.predicted_output

    def backward(self, X, y, learning_rate):
        output_error = y - self.predicted_output
        output_delta = output_error * self.sigmoid_derivative(self.predicted_output)
        hidden_error = np.dot(output_delta, self.weights_hidden_output.T)
        hidden_delta = hidden_error * self.sigmoid_derivative(self.hidden_output)
        self.weights_hidden_output += np.dot(self.hidden_output.T, output_delta) * learning_rate
        self.bias_output += np.sum(output_delta, axis=0, keepdims=True) * learning_rate
        self.weights_input_hidden += np.dot(X.T, hidden_delta) * learning_rate
        self.bias_hidden += np.sum(hidden_delta, axis=0, keepdims=True) * learning_rate

#

def train(self, X, y, epochs, learning_rate):
        for epoch in range(epochs):
            output = self.feedforward(X)
            self.backward(X, y, learning_rate)
            if epoch % 4000 == 0:
                loss = np.mean(np.square(y - output))
                print(f"Epoch {epoch}, Loss: {loss}")

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])
nn = NeuralNetwork(input_size=2, hidden_size=4, output_size=1)
nn.train(X, y, epochs=10000, learning_rate=0.1)
output = nn.feedforward(X)
print("Predictions after training:")
print(output)

viscid urchin May 1, 2025, 5:40 PM

#

So, my buddy has this awful 820-page PDF slide deck (ugh).. and he's trying to figure out which pages of it refer to a particular concept. The problem is that any individual slide may or may not mention any interesting keywords about that, so it's really a full-LLM kind of context problem, it seems to me.

I took a swing at using LangChain to feed it into OpenAI for him, and it didn't go that well; the problem is that I'm having to chunk it into 50-page slices, and each of those isn't maybe enough context about the project to make good input.

What general approach is best-suited for this kind of problem? Do I need to "fine-tune" a model on this slide deck?

#

As an example, here's a slide that needs to be a 'hit', because despite not mentioning anything by name, it is ABOUT the project he's looking for hits on:

#

and if I don't luck into having that context in the same 'chunk' as this slide, it's not gonna realize this is a hit.

verbal oar May 1, 2025, 6:40 PM

#

is ai agents related to reinforcement learning due to concept of agent in rl, can I think this way or its different meaning?

agile cobalt May 1, 2025, 7:00 PM

#

verbal oar is ai agents related to reinforcement learning due to concept of agent in rl, ca...

not related at all

an Agent (or more broadly, Agentic systems) using LLMs are when you take a language model, give it some tools (e.g. python functions it can call), then ask it to do a given task with some degree of autonomy
It does not necessarily involves reinforcement learning

RL is sometimes used for training the language models, specially to let them 'reason' on their own before writing a final answer or invoking a tool

verbal oar May 1, 2025, 7:02 PM

#

yes because I was confused about rlhf

#

thanks for explanation

agile cobalt May 1, 2025, 7:04 PM

#

viscid urchin So, my buddy has this awful 820-page PDF slide deck (ugh).. and he's trying to f...

I'd just try yeeting it into Gemini

if it's a one-off thing you don't plan to reuse, skimming through all pages manually may be faster than doing something ultra fancy with AI

viscid urchin May 1, 2025, 7:04 PM

#

agile cobalt I'd just try yeeting it into Gemini if it's a one-off thing you don't plan to r...

Gemini won't try, sadly; too many slides.

#

Yeah that's my best advice for him so far; hoping someone has a cooler plan.

verbal oar May 1, 2025, 7:05 PM

#

split them maybe?

viscid urchin May 1, 2025, 7:05 PM

#

The problem is that the first bunch of slides introduce the topic, and if you just feed it a slice of the later part of them it isn't doing a great job.

verbal oar May 1, 2025, 7:05 PM

#

ah like needs context

viscid urchin May 1, 2025, 7:05 PM

#

I tried chunking them with LangChain etc before posting, didn't really find what my buddy is looking for.

agile cobalt May 1, 2025, 7:06 PM

#

maybe run it through https://github.com/microsoft/markitdown or Mistral's OCR model first, should be small enough for most commercial LLMs afterwards

viscid urchin May 1, 2025, 7:06 PM

#

My suggestion to him was to try to use some 'tool memory' on the problem, but I don't have a specific MCP in mind for him to lean on.

#

Aah yeah I'll try that too.

agile cobalt May 1, 2025, 7:07 PM

#

viscid urchin My suggestion to him was to try to use some 'tool memory' on the problem, but I ...

it would take ages to get a memory-ish solution working well if your context is that sparse (as in, must catch loose references to other things refering to the same concept within the document)

glacial yoke May 1, 2025, 7:09 PM

#

What's a good way to prep for data science & ai uni course?

agile cobalt May 1, 2025, 7:09 PM

#

it's relatively easy to identify that a slide is making a direct reference to X thing, but finding all other slices that indirectly reference that thing is hard

as far as I know usually you'd just take fragments of things around the direct reference

agile cobalt May 1, 2025, 7:10 PM

#

glacial yoke What's a good way to prep for data science & ai uni course?

basic python syntax, basic statistics, maybe scikit-learn's MOOC, perhaps play around with numpy/pandas a bit (e.g. look at Kaggle competitions)

#

see if any of the courses list their pre-requisites / things you are expected to know

glacial yoke May 1, 2025, 7:11 PM

#

agile cobalt basic python syntax, basic statistics, maybe scikit-learn's MOOC, perhaps play a...

Any recommendations for tools to study these (except Python, got that already covered)?

agile cobalt May 1, 2025, 7:14 PM

#

not sure, maybe something like Datacamp but I never used it myself

make sure you take a look at the documentation of any libraries you use though, specially their User Guides if they have one

serene grail May 1, 2025, 7:23 PM

#

glacial yoke Any recommendations for tools to study these (except Python, got that already co...

I like the channel statquest on YouTube for basic statistics, if you search "statquest playlist" on YouTube you'll get a playlist called "statistics fundamentals", it covers a lot of concepts and it's pretty good IMO although I'm a beginner myself
Kaggle has some small (free?) courses that teach you how to use notebooks, pandas, etc.
I've seen people recommend Kaggle for that

glacial yoke May 1, 2025, 7:25 PM

#

serene grail I like the channel statquest on YouTube for basic statistics, if you search "sta...

hmm. will check those out after my a levels

viscid urchin May 1, 2025, 7:34 PM

#

agile cobalt maybe run it through <https://github.com/microsoft/markitdown> or Mistral's OCR ...

I started investigating why my markdownit output was so bad-looking, and it turns out the source presentation is just trash haha whoops. I'm just going to tell my friend to come back with better data.

verbal oar May 1, 2025, 8:07 PM

#

so just syntax?

#

not more?

#

like generators,iterators

#

for example big dataset and getting stream of data

#

datacamp looks ok imo

#

I thought python is prerequisite so looks like I'm overthinking it

#

python in the sense not just syntax but more, but seems its like you said then

#

statistics could be also khan academy but not sure

agile cobalt May 1, 2025, 8:50 PM

#

verbal oar like generators,iterators

usually you'll be working with dataframes or other abstract representations of datasets that manage the looping for you, it's relatively rare for you to have to write a generator yourself when working with data

when you're using python with libraries like pandas or polars, the way you write code is extremely different from python on its own

(of course, it's still nice to understand how more advanced python features work, but you must not go out of your way to use it when the library offers a different, more efficient way to do the same thing)

glacial root May 1, 2025, 8:57 PM

#

hi, typically what does the interview process look like for machine learning and/or computer vision roles, for both internships and full time

serene scaffold May 1, 2025, 9:07 PM

#

glacial root hi, typically what does the interview process look like for machine learning and...

Here are some example questions: #career-advice message

glacial root May 1, 2025, 9:22 PM

#

serene scaffold Here are some example questions: https://discord.com/channels/267624335836053506...

thank you, also are there any math questions or any programming questions they ask?

#

kind of like with the leetcode round with swe but with ml algorithms

serene scaffold May 2, 2025, 12:15 AM

#

glacial root thank you, also are there any math questions or any programming questions they a...

I ultimately don't know what they'll ask you. But I've only ever applied for data science and AI positions, and I was never asked a leetcode-style question, nor have I asked one to a candidate.

glacial root May 2, 2025, 12:16 AM

#

serene scaffold I ultimately don't know what they'll ask you. But I've only ever applied for dat...

not even math or algorithm questions?

#

interesting

#

and this is for full time?

serene scaffold May 2, 2025, 12:17 AM

#

glacial root not even math or algorithm questions?

they'll probably ask you theory questions. I'd be surprised if they asked you to do any live coding.

glacial root May 2, 2025, 12:17 AM

#

are there online assessments like with swe?

charred estuary May 2, 2025, 1:27 AM

#

If I were to train a model from scratch and give it lines of data structured like this (shown in gist), how many of these data points should I have?
https://gist.github.com/Tyguy047/b492cdbc68797f518a952ea41f994f96

Gist

dataset.jsonl

GitHub Gist: instantly share code, notes, and snippets.

rich moth May 2, 2025, 3:35 AM

#

#

#

This is based on the Fashion_Minst dataset.

#

Time data series Arrowhead. See how it still remains in 1D

rich moth May 2, 2025, 4:14 AM

#

Alright, I think the next experiment is to really push this thing: gonna calculate the complexity scores for all the datasets (time series, images, tabular, text) using the right domain logic for each first. Then, I'll combine all those complexity values into one giant list and sort it using my framework's magnitude score. Really want to see if the tool can create a meaningful difficulty ranking that truly works across all domains when the data types are mixed together. Basically, putting the 'Unified' in UCF to the ultimate test!

#

I've noticed patterns in this thing, the more complex the data , the better the results. Wihich makes sense, thats the whole purpose

rich moth May 2, 2025, 4:44 AM

#

Here are the cross domain results

rich moth May 2, 2025, 5:26 AM

#

https://drive.google.com/drive/folders/1ypom_1e665QzfKjI9BF7rojBIH0A3AjT?usp=sharing

verbal oar May 2, 2025, 10:56 AM

#

can I fine tune some model with dataset containing python code?

#

both are on hugging face

#

dataset https://huggingface.co/datasets/codeparrot/github-code

codeparrot/github-code · Datasets at Hugging Face

#

of course I select only python code

#

50gb of data will be slow for finetuning, cut it to some smaller size?

fierce python May 2, 2025, 2:06 PM

#

When dealing with binary classification, is it better to use torch.nn.BCELoss or torch.nn.CrossEntropyLoss and why?

From what I know, we use BCELoss to get the value close to either 0 or 1 and round it to the nearest one, but with CrossEntropyLoss we get the exact 0 or 1 value, so which one is better actually?

glacial root May 2, 2025, 3:49 PM

#

would 400 images be enough to train a liscence plate detector

#

or is that too little

agile cobalt May 2, 2025, 3:53 PM

#

might be enough to fine tune

you can also try using data augmentation

glacial root May 2, 2025, 4:22 PM

#

nevermind it doesn't have labels

#

i've been trying to find a dataset with proper labels, all of the ones i find don't state exactly what the label format is and it just says it's meant to work with yolo

#

but i need to train this model from scratch

#

are any of you aware of any datasets out there that are clear with the labeling format

#

like if it's the top left and bottom right coordinates or if it's one coordinate and the dimensions of the box

odd meteor May 2, 2025, 4:55 PM

#

fierce python When dealing with binary classification, is it better to use `torch.nn.BCELoss` ...

TLDR: It doesn't really matter which one you use since both loss functions can get the work done.

BCELoss is strictly for binary classification and expects probabilities (0 to 1), but then, you must manually apply torch.sigmoid() to your logits first because BCELoss() does not ido this for you. Your labels also need to be floats. It's the application of sigmoid that actually squashes the values close to 0 or 1 not exactly BCELoss itself.

BCEWithLogitsLoss on the other hand simplifies this by handling the sigmoid internally for you, so you just need to feed it the raw logits and float labels, no extra steps needed.

CrossEntropyLoss, designed for multi-class, also works for binary so long as your model outputs 2 logits (for class 0 and class 1)... It auto-applies Softmax to the logits and uses integer labels, unlike BCELoss().

So there's no rule I've seen that states that using one is better than the other, however, just ensure your predicted output and target are in the right format ( depending on whichever loss function you decide to use.)

Personally, for classification tasks, whether binary or multiclass, I still prefer CrossEntropyLoss because of its design consistency (no sigmoid, works with int labels)... Meanwhile, another person might like BCELoss() or BCEWithLogitLoss()

fierce python May 2, 2025, 5:08 PM

#

odd meteor TLDR: It doesn't really matter which one you use since both loss functions can g...

Thank you for the detailed explanation! That was beyond what I was expecting.

glacial root May 2, 2025, 6:53 PM

#

agile cobalt might be enough to fine tune you can also try using data augmentation

looks like i'm gonna have to go with this one, turns out it does have labels and is the only one i could find with clear labels for coordinates

#

what things should i do to augment the data and get accurate coordinates for the augmented data

agile cobalt May 2, 2025, 7:01 PM

#

crop and pan, rotate, maybe warp

#

to clarify: by augmenting I don't mean improving, but rather creating more samples

tensorflow and pytorch both have some docs on it, and overall it shouldn't take too much work to re-implement yourself in another framework if you needed to

serene grail May 2, 2025, 7:29 PM

#

agile cobalt crop and pan, rotate, maybe warp

Is this a common thing to do in machine learning? I'm guessing this is supposed to help prevent overfitting on some particular criteria, for example license plate always being horizontal/always being in the center of the screen/etc.

glacial root May 2, 2025, 8:04 PM

#

agile cobalt to clarify: by augmenting I don't mean improving, but rather creating more sampl...

would i be able to get the coordinates for the new image though

agile cobalt May 2, 2025, 8:07 PM

#

glacial root would i be able to get the coordinates for the new image though

not sure if any of the libraries do it automatically for you or if you must do it yourself, but even worst on the case scenario it should a relatively simple math calculation - just apply the same formula that's applied to the image pixels onto the bounding box

(imagine the label as a 2D image greyscale image with the same dimensions as the original image - as long as you apply the same transformations as you applied to the original image it remains in 'sync')

glacial root May 2, 2025, 8:10 PM

#

agile cobalt not sure if any of the libraries do it automatically for you or if you must do i...

oh yeah true

#

man something is wrong with me, i cannot think at all lol

lapis sequoia May 2, 2025, 10:24 PM

#

rich moth Alright, I think the next experiment is to really push this thing: gonna calcula...

You go hard! UCF! UCF!

#

Going hard in the paint. Yo!

dreamy sky May 3, 2025, 1:26 AM

#

Anyone doing the "Drawing with LLMs" Kaggle competition?

sudden wyvern May 3, 2025, 2:25 AM

#

Hi everyone I am new in AI workspace and working on one chat bot kind of functionality for odoo system I have used ollama to run deepseekr1 model locally now I want to train this model to answer odoo related queries and use our custom postgres database to give best possible answer of customer query so can anyone guide me on this how can I train deepseekr1 model with my custom data ❓

glacial root May 3, 2025, 3:11 AM

#

how much of a difference would there be in computational efficiency if i went with the region based method for classification versus training the model to both localize and classify

deep creek May 3, 2025, 5:31 AM

#

Hi,
I've prepared a ETL template.
Link: https://github.com/mglowinski93/EtlTemplate
More details: https://www.reddit.com/r/Python/comments/1kd4aib/etl_template_with_clean_architecture/

I hope you guys like it 🙂

GitHub

GitHub - mglowinski93/EtlTemplate: Template for Extract-Transform-L...

Template for Extract-Transform-Load (ETL). Contribute to mglowinski93/EtlTemplate development by creating an account on GitHub.

From the Python community on Reddit: ETL template with clean archit...

Explore this post and more from the Python community

lapis sequoia May 3, 2025, 1:55 PM

#

dreamy sky Anyone doing the "Drawing with LLMs" Kaggle competition?

What are these people doing with “LLMs”? Is this all just Q and A and RAG stuff? I’ve been seeing this all of the time.

#

My main confusion is, these “LLMs” how big are they? Are they owned by a company? Are people just making LLMs like it’s nothing? Are they using langchain? I don’t know, I’ve seen this come up a lot.

viscid urchin May 3, 2025, 4:32 PM

#

If you were learning it all again, which Calculus book would you send back to yourself?

lapis sequoia May 3, 2025, 4:49 PM

#

Heavy optimization with partials and unconstrained optimization.

#

Best place to start

viscid urchin May 3, 2025, 4:51 PM

#

Sure, but pretend you have a time portal that only one book fits through.

lapis sequoia May 3, 2025, 4:55 PM

#

Just learn derivatives and integrals. Know the unit circle.

#

understand limits. I took calc1 so long ago, like 2016. Dang. But yeah, I would suggest going to school for math

viscid urchin May 3, 2025, 5:02 PM

#

I went to school for math, yeah, that's not what I'm asking but I appreciate the feedback.

lapis sequoia May 3, 2025, 5:03 PM

#

viscid urchin I went to school for math, yeah, that's not what I'm asking but I appreciate the...

I didn’t mean to be rude. I misread your message. I apologize

viscid urchin May 3, 2025, 5:04 PM

#

To expand on my question.. I've got, for example, a 'programming paradigms' textbook in mind that would have greatly accelerated my learning if it had existed back then.. and I was just wondering today if there were a math equivalent text for that thought experiment.

lapis sequoia May 3, 2025, 5:07 PM

#

I just felt like the structure of trig calc1-3 was good how it was when I took it. Maybe, more of emphasis on differential equations. I don’t remember that class at all. Linear algebra should always be a requirement, optimization is underrated in calc1-3, it’s very important. Yeah, I think calc should focus more on optimization. I don’t remember it was so long ago.

viscid urchin May 3, 2025, 5:11 PM

#

Interesting; when I was taught, trig and calc were totally separate; did they overlap for you?

lapis sequoia May 3, 2025, 5:11 PM

#

You need trig for calc. No, I took trig separately. This was so long ago.

viscid urchin May 3, 2025, 5:11 PM

#

(I hated my trig class at the time, I hope they teach it in a different way now)

lapis sequoia May 3, 2025, 5:12 PM

#

I remember when I was 18 I grinded trig so hard and got litterally a 100 didn’t miss a point. shout out to my 18 year old self

viscid urchin May 3, 2025, 5:13 PM

#

Nice.

lapis sequoia May 3, 2025, 5:14 PM

#

I didn’t think calc was bad honestly. This was so long ago, but honestly I remember being introduced to a limit and it made more sense than a bond. This was so long ago. Honestly. I didn’t think calc1-3 was bad at all. I am serious it mad direct sense.

viscid urchin May 3, 2025, 5:18 PM

#

I do remember at one point I was given the example of a speedometer, and its relationship to distance traveled etc, and that was a WAY better guide to my intuition than what I'd been exposed to before

#

I also sorta ended up finding the 'fluxions' explanation of things more helpful than the modern one, oddly

lapis sequoia May 3, 2025, 5:28 PM

#

I remember how much I hated physics. I did well, I just didn’t find it interesting. It felt forced.

viscid urchin May 3, 2025, 5:33 PM

#

I had a really good high school physics teacher, I lucked out there.

lapis sequoia May 3, 2025, 5:35 PM

#

I just remember the labs were so boring. All of this was so long ago. I am trying to remember how I felt.

glacial root May 3, 2025, 6:04 PM

#

viscid urchin To expand on my question.. I've got, for example, a 'programming paradigms' text...

who's the author

viscid urchin May 3, 2025, 6:11 PM

#

glacial root who's the author

It's this https://webperso.info.ucl.ac.be/~pvr/book.html

Concepts, Techniques, and Models of Computer Programming

A comprehensive programming textbook that
covers all important programming paradigms in a unified framework
that is both practical and theoretically sound.
Special attention is given to concurrent programming and data abstraction.
The textbook uses the Oz multiparadigm programming language for its examples.

#

even more than some other classics, this changed how I thought about programming

pine arch May 3, 2025, 8:53 PM

#

quick question, I'm attempting at PCA and from my understanding if your data looks clustered and not varied this means your PCA isn't usable is that right?

serene grail May 3, 2025, 9:16 PM

#

viscid urchin If you were learning it all again, which Calculus book would you send back to yo...

I can't recommend a book but I REALLY recommend 3blue1brown's YouTube video series on Calculus, it really helped me grasp the basics

toxic pilot May 3, 2025, 11:06 PM

#

viscid urchin It's this https://webperso.info.ucl.ac.be/~pvr/book.html

i saw the word “techniques” and thought u were talking about the dragon book for a sec

viscid urchin May 3, 2025, 11:06 PM

#

toxic pilot i saw the word “techniques” and thought u were talking about the dragon book for...

I'm pleased to own a (red, I guess) copy of the Dragon book, but I can't say in retrospect it helped me learn much; I mostly just found it too dense, and when I finally understood it all from other sources enough for it to make sense, it was out of date.. seminal text though for sure

#

I just wasn't smart enough to get it the first time I guess

toxic pilot May 3, 2025, 11:07 PM

#

viscid urchin I'm pleased to own a (red, I guess) copy of the Dragon book, but I can't say in ...

it’s very theory oriented i will admit

#

i kind of jumped around in the book and read the parts i really cared about

viscid urchin May 3, 2025, 11:07 PM

#

I guess what it at least did was teach me the terminology, so I could go separately investigate the parts.

toxic pilot May 3, 2025, 11:08 PM

#

viscid urchin I guess what it at least did was teach me the terminology, so I could go separat...

a lot of is still pretty relevant tho. like a lot of the compiler optimization techniques

#

idk i feel like it’s a good textbook for college students to read if they’re taking a PL/compiler class

viscid urchin May 3, 2025, 11:09 PM

#

I mean, as long as you go also learn about PEGs and GLL parsing and other things it doesn't cover

#

e.g. https://dotat.at/tmp/gll.pdf

#

One probably shouldn't actually build YACC again circa 2025

#

I think the "front-end" vs. "back-end" hard distinction is out of favor too, in comparison to a long pipeline of simple transformations

#

I see a lot of people suggest this instead now https://www.amazon.com/Modern-Compiler-Design-Dick-Grune/dp/1461446988 but I haven't had the pleasure of reading it

Modern Compiler Design

"Modern Compiler Design" makes the topic of compiler design more accessible by focusing on principles and techniques of wide application. By carefully distinguishing between the essential (material that has a high chance of being useful) and the incidental (material that will be of benefit only i...

opaque condor May 3, 2025, 11:15 PM

#

https://paste.pythondiscord.com/OUSA is this code okey it graphs the acurecy over time

#

within the network

toxic pilot May 4, 2025, 1:25 AM

#

viscid urchin I think the "front-end" vs. "back-end" hard distinction is out of favor too, in ...

well, perhaps. I think there'll still be a discrete line between frontend and backend compilers as long as LLVM remains a key technology in compiler design

viscid urchin May 4, 2025, 1:29 AM

#

Yeah, it’s still probably a useful idea, I mostly was just saying it has turned out not to be sacred.

limber spear May 4, 2025, 1:34 AM

#

Compilers? Nothing moves without them. Especially code.

#

This is why I study systems level programming.

limber spear May 4, 2025, 1:58 AM

#

pine arch quick question, I'm attempting at PCA and from my understanding if your data loo...

Errr isn’t PCA breaking down the features to find the best ones meowthumbsup

#

My head hurts. Have a good day/night chat 🫡

severe blade May 4, 2025, 3:16 AM

#

this worked btw. thank you so much man. i hope you get all the success you wish for.

rich river May 4, 2025, 12:24 PM

#

from ultralytics.utils.ops import clip_boxes, scale_masks

class YoloModel(BaseModel):
    """
    This YoloModel class is for object detection and instance segmentation task
    """

    def __init__(self, model_path: str, confidence: float):
        self._model = YOLO(model_path)
        self._confidence = confidence
        self._cv_bridge = CvBridge()

    def segmentation(self) -> Tuple[list[str], list[Segmentation]]:
        model_output = self._model.predict(
            self._color_img, conf=self._confidence, iou=self._confidence
        )[0]
        self._model_output = model_output
        if model_output.masks is None or model_output.boxes is None:
            return None, None

        names = [value for _, value in sorted(model_output.names.items())]

        # the box coordinates are given in float32 but we want int32,
        # clip them again to avoid rounding issues causing the boxes
        # to be out of the image
        boxxywhs = clip_boxes(model_output.boxes.xywh.int(), model_output.orig_shape)
        scale_up_masks = (
            scale_masks(model_output.masks.data[None], model_output.orig_shape)
            .squeeze(0)
            .to(torch.uint8)
            .cpu()
        )

        segmentations = []
        for i in range(model_output.boxes.data.shape[0]):
            item = self.yolo_result_to_segmentation(
                model_output.boxes.cls[i].int().item(),
                model_output.boxes.conf[i],
                boxxywhs[i],
                scale_up_masks[i],
            )
            segmentations.append(item)

        return names, segmentations

anyone has an idea how to rewrite codes using ultralytics in C++? since I need to deploy it using C++
my current thoughts are rewriting clip_boxes and .predict function in C++, but it seems a lot of work

wild zenith May 4, 2025, 1:14 PM

#

loc or iloc which is better

serene scaffold May 4, 2025, 2:02 PM

#

wild zenith loc or iloc which is better

Why do you think one is better than the other?

toxic pilot May 4, 2025, 2:04 PM

#

viscid urchin Yeah, it’s still probably a useful idea, I mostly was just saying it has turned ...

that that is true, especially with more complex languages like Rust, the line between frontend & backend (and middle end perhaps?) isnt as clear. I'm pretty sure i read somewhere that Rust is actually planning to move of of LLVM but somebody should definitely fact check that

gaunt zephyr May 4, 2025, 5:33 PM

#

anybody tried Google-adp?

#

It’s for making chatbots with Gemini

limber spear May 4, 2025, 6:15 PM

#

gaunt zephyr anybody tried Google-adp?

What’s Google-adp pikawow is it like yt-dlp or something

leaden narwhal May 4, 2025, 6:16 PM

#

Anyone ok to go over with me my notebook so i can organize it properly? Im having a hard time doing so since this is so unorganized

#

Doing a solo project

limber spear May 4, 2025, 6:17 PM

#

Just post it here meowthumbsup

#

We should have a workspace channel for this channel. Like the movie Inception pithink

serene scaffold May 4, 2025, 6:18 PM

#

notebooks aren't amenable to sharing over Discord, so you'll want to do something like python -m jupyter nbconvert --to script --stdout your_notebook.ipynb

leaden narwhal May 4, 2025, 6:18 PM

#

on discord ? ahahha

serene scaffold May 4, 2025, 6:18 PM

#

Yes, that's where we are

leaden narwhal May 4, 2025, 6:19 PM

#

No no i mean the code line

#

📎 DareDataDP.ipynb

limber spear May 4, 2025, 6:20 PM

#

SP is recommending converting any Python scripts to Jupyter notebooks

serene scaffold May 4, 2025, 6:20 PM

#

limber spear SP is recommending converting any Python scripts to Jupyter notebooks

the opposite

serene scaffold May 4, 2025, 6:20 PM

#

leaden narwhal

someone would have to start a notebook server to read this, so it would be easier for them if you do the command to convert it to flat text.

limber spear May 4, 2025, 6:21 PM

#

I see 👍

#

Does nbviewer allow shareable notebooks

leaden narwhal May 4, 2025, 6:22 PM

#

serene scaffold notebooks aren't amenable to sharing over Discord, so you'll want to do somethin...

Im sorry if i sound like an idiot but where would i put this

#

cmd?

limber spear May 4, 2025, 6:23 PM

#

Nbviewer here: https://nbviewer.org/

nbviewer

serene scaffold May 4, 2025, 6:25 PM

#

leaden narwhal Im sorry if i sound like an idiot but where would i put this

you do not, and yes.

leaden narwhal May 4, 2025, 6:27 PM

#

serene scaffold you do not, and yes.

cmd is saying i dont have python installed, ill just go ahead and use nbviewer

leaden narwhal May 4, 2025, 6:30 PM

#

limber spear Nbviewer here: https://nbviewer.org/

This also didnt work

#

I did a colab link

#

Easier i guess

#

https://colab.research.google.com/drive/1oJZPu8W_Z4xX8z3q-6vZXQYbLoze4L_J?usp=sharing

Google Colab

#

Correct one

#

Also sorry for asking this guys, my head just hurts from looking at jupyter the whole day

rich moth May 4, 2025, 6:39 PM

#

whats up people.

#

so this curriculum learning tool i made and been playing around with and I think i stumbled on something fundamental about how data/ knowledge is structured.

basically, I found a way to measure the "learning complexity" of individual samples in ANY dataset (images, time series, tabular, text) using a single unified framework but the crazy part when I sort training data by this complexity measure, I'm seeing performance gains from 3% to 150% (!!) depending on the dataset

#

whats even more wild though is the farmwork correctly identifies when data doesn't have inherent structure ( tlike the Madelon dataset), where random is the winner

#

I tested it on 62 datasets on 4 domains the biggest increase was +149% WAFER dataset, blood tranfusion +84%, ECG data 53%, but on truly random data its 0%, as it should be

#

but what i think im finding here, theres something like a "conceptual dependency graph" hidden in data. some knowledge has prerequisites (like learning addition before multiplication), some doesn't (like learning colors)

But this framwork i made can tect which is which automatically

i feel likes theres something deeper here aout how information itself is structured

limber spear May 4, 2025, 6:53 PM

#

leaden narwhal https://colab.research.google.com/drive/1oJZPu8W_Z4xX8z3q-6vZXQYbLoze4L_J?usp=sh...

Your notebook looks fairly organized Manny. Random Forest are 1 of my favorite stacks. What is your question exactly

#

@rich moth test your stack on Manny’s dataset here 👀

rich moth May 4, 2025, 6:56 PM

#

Yum yums! Lets do !

#

idk maybe im overthinking this but it feels like there's some universal pattern here about how knowledge organizes itself? like why does it work across images AND time series AND text AND tabular? seems weird right?

limber spear May 4, 2025, 7:00 PM

#

Well we have data here. Let’s put your stack up to the test 🤔

#

Data can easily lie. A lot don’t understand that

rich moth May 4, 2025, 7:01 PM

#

this sounds aweosme let me see

#

Sao Paulo Geospatial datasets ?

leaden narwhal May 4, 2025, 7:03 PM

#

limber spear Your notebook looks fairly organized Manny. Random Forest are 1 of my favorite s...

I thought things looked disorganized and wanted to put them more organized

#

and now that i did it on jupyter my r2 scores went to shit and now linear regression is better than rfr?

leaden narwhal May 4, 2025, 7:03 PM

#

rich moth Sao Paulo Geospatial datasets ?

Ye

limber spear May 4, 2025, 7:07 PM

#

@leaden narwhal put your accuracy metrics toward the bottom maybe and group your map plots together. From what I see your Random Forest metrics have higher accuracy vs LR numbers.

A confusion matrix and F1, precision, recall metrics should give you a more reliable accuracy metric.

#

This should tell you if your models are confused or not. Basically lying to you 😂

#

I lecture my models all the time

leaden narwhal May 4, 2025, 7:11 PM

#

limber spear <@299967933063626752> put your accuracy metrics toward the bottom maybe and grou...

Let me show you my organized notebook

#

i think this shows the actuall values

#

causes i was using log income values to do a prediction on income which is stupid

#

so now i changed to linear regression

#

https://colab.research.google.com/drive/1geMb9ONPDFzXiGlckX7LVG9CmWmxd32z?usp=sharing

Google Colab

#

Im going to try xgboost

limber spear May 4, 2025, 7:13 PM

#

Your project is really cool. Dedo no cu e gritaria

#

Did I say that right

leaden narwhal May 4, 2025, 7:14 PM

#

XD

#

Im not brazilian, im portyuguese but it also applies lol

limber spear May 4, 2025, 7:15 PM

#

Someone taught me bad words mkay

leaden narwhal May 4, 2025, 7:15 PM

#

Is this an xgboost moment

#

https://tenor.com/view/clap-joe-hendry-gif-13588265419480385996

Tenor

#

Ok i dunno why but xgboost actually was goated and made some crazy predictions

#

almost perfect, some districts still havent predicted properly but most yeah. Im happy with this!

#

@limber spear

#

Check the folium at the end and tell me what you think

https://colab.research.google.com/drive/1rG_63k7PU3B-4q2gvQhThikKy_Htc4d-?usp=sharing

Google Colab

limber spear May 4, 2025, 7:21 PM

#

Xgboost is pretty nice RF is goated as well imo

leaden narwhal May 4, 2025, 7:21 PM

#

yeah but the r2 score is 10 times better

limber spear May 4, 2025, 7:23 PM

#

This is where your fine-tuning skills can come in. Some are fine-tuning goats. This is where OpenAI and Grok devs make their living

#

Billion parameter models

#

But big tech won’t tell you this. Big tech will say cutting edge AI or proprietary

versed axle May 4, 2025, 7:32 PM

#

leaden narwhal https://tenor.com/view/clap-joe-hendry-gif-13588265419480385996

say his name moment?

glacial root May 4, 2025, 7:44 PM

#

why did they do this 💀
https://www.kaggle.com/datasets/naim99/lion-image?select=lion.jpg

lion image

viscid urchin May 4, 2025, 7:46 PM

#

subtle mirage May 4, 2025, 10:06 PM

#

messing around with matplotlib and remembered you can add text to plots :D

lapis sequoia May 4, 2025, 10:23 PM

#

ok, today, this sounds dumb, I never cloned a repo that was not mine ever. I thought that was cheating or something. I would either read about or look at code for reference if I did not understand it. Cloning, makes this so much faster. I never knew this. I only cloned for my own repos to edit or someone else's I had permission to. Everyone clones?

exotic star May 5, 2025, 12:48 AM

#

i started getting into ai and i didnt know how/what's the best way to do it

#

i started learning pandas now numpy and then matplot

#

and seaborn

#

after that pytorch scikit-learn

#

and then ML methods like classification regression decision trees and so on

#

after that essential method or before that? and then deep learning

#

is this strategy decent? i kinda seperated them into different parts and i do seperate projects with all of them after learning a bit then a combined 1

#

right now learning about numpy, almost done and then starting matplot

craggy patio May 5, 2025, 2:14 AM

#

subtle mirage messing around with matplotlib and remembered you can add text to plots :D

He's roaring because of the extreme underfitting going on

rich moth May 5, 2025, 4:04 AM

#

@Manny @limber spear São Paulo geospatial dataset results

#

limber spear May 5, 2025, 5:16 AM

#

leaden narwhal Check the folium at the end and tell me what you think https://colab.research....

Looks pretty well mapped. Is this going on a dashboard?

obsidian bronze May 5, 2025, 6:24 AM

#

Hello

#

some of you guys is a begineer

opaque flower May 5, 2025, 7:12 AM

#

Is data science less saturated than other IT fields? Also, is it a good career choice for the future? I’ve heard some people say it’s a dying field.

limber spear May 5, 2025, 8:51 AM

#

opaque flower Is data science less saturated than other IT fields? Also, is it a good career c...

science is an infinite circle, and data can scale to infinity. Forever. What do you think of this conjecture

jaunty helm May 5, 2025, 9:41 AM

#

leaden narwhal Is this an xgboost moment

xgboost falls into the category of gradient boosted trees; others include lightgbm and catboost
these are usually very competitive when it comes to tabular data

subtle mirage May 5, 2025, 11:54 AM

#

opaque flower Is data science less saturated than other IT fields? Also, is it a good career c...

dying field? who the heck told you that? there will never not be data. and there will never, ever, not be the need to process it and analyze results and make something of them

#

beautiful

quaint mulch May 5, 2025, 12:04 PM

#

lapis sequoia ok, today, this sounds dumb, I never cloned a repo that was not mine ever. I tho...

Yes.
People WANT to have their repo cloned. There is a counter that says how many times their repo got cloned. A repo that is cloned a lot is a good repo, because it means many people find it useful.
If, for any reason, people don't get want to get their repo cloned, they will not make it public in the first place.

quaint mulch May 5, 2025, 12:05 PM

#

exotic star is this strategy decent? i kinda seperated them into different parts and i do se...

Many people already made may list (including me)

https://www.pythondiscord.com/resources/?topics=data-science
http://introtodeeplearning.com/
https://deep-learning-drizzle.github.io/
https://kidger.site/thoughts/just-know-stuff/
https://github.com/aprbw/ArianDLPrimer (I made the last list myself)

quaint mulch May 5, 2025, 12:06 PM

#

exotic star right now learning about numpy, almost done and then starting matplot

The list that you give looks good to me.
Sounds like you are making steady progress.
Keep it up!

agile cobalt May 5, 2025, 12:18 PM

#

quaint mulch Yes. People WANT to have their repo cloned. There is a counter that says how man...

Just check the License before you clone/install something

some will limit what you can do with it
others will force you to share under the same license
and if there is no license, then strictly speaking you have no permission to do anything with it

exotic star May 5, 2025, 12:29 PM

#

quaint mulch The list that you give looks good to me. Sounds like you are making steady progr...

thanks a lot! It's fun and i'll definitely keep it up

lapis sequoia May 5, 2025, 12:37 PM

#

quaint mulch Yes. People WANT to have their repo cloned. There is a counter that says how man...

All of this time, I would litterally book mark the actual repo if it was good and learn from it. The amount of time that would’ve been saved by simply cloning it and putting myself in their shoes…. It’s ok, the grit is there. Oh my god. I have only cloned my repo to change it or others the I had access to. Never ever cloned a repo as a guide when it’s like “oh I need a good example from their prospective “ I will remember this day. Forward onto Dawn. Let’s go. I don’t care I am glowing.

quaint mulch May 5, 2025, 12:44 PM

#

agile cobalt Just check the License before you clone/install something some will limit what ...

@lapis sequoia make sure you read the caveat by etrotta.
good luck

proven current May 5, 2025, 3:06 PM

#

What is data

gritty vessel May 5, 2025, 3:33 PM

#

Hey I am working on image segmentaion and my targets have nan values so masked loss fucntion is the only way to go?

#

Like it ignores the nan and only get trained on valid data

verbal oar May 5, 2025, 3:33 PM

#

heh I recalled "learn from it" from siraj raval data lit video 🙂

#

dont know if he uploads sth still must check

#

but he has ml on tensorflow not in pytorch as I correctly remember

#

https://github.com/llSourcell

GitHub

llSourcell - Overview

subscribe to my youtube channel!
www.youtube.com/c/sirajraval

llSourcell

gritty vessel May 5, 2025, 3:51 PM

#

lapis sequoia All of this time, I would litterally book mark the actual repo if it was good an...

yo I performed the analysis

#

that day you helped me

#

So i noticed in scatter plot when there is rain all variables get low by 10-20 kelvin and some even get low by around 40 kelvin

#

so its a good idea to include all 4 vars

#

I had one more que guys

#

In meteroloigcal data

#

weighted loss function is a good choice or not?

#

as we give more weightage lets say rain events

#

but in real scenerio no rain events will be more

#

and rain events will be less

agile cobalt May 5, 2025, 4:24 PM

#

gritty vessel as we give more weightage lets say rain events

depends on which metric matters the most for you

for example, if you were trying to predict extreme weather, you might be willing to sacrifice accuracy in exchange for a higher recall knowing you'll get some more false alerts

gritty vessel May 5, 2025, 5:02 PM

#

agile cobalt depends on which metric matters the most for you for example, if you were tryin...

I'm considering general weather,currently excluding storms,cyclones and all

#

but trainig examples for rain case are definetly very less than no rain cases

#

95% data is of no rain cases

#

and 4% of rain

#

1%others

serene scaffold May 5, 2025, 6:46 PM

#

!mute 1270417623296905301 "1 hour" This is your final warning to stop advertising.

arctic wedgeBOT May 5, 2025, 6:46 PM

#

:incoming_envelope: :ok_hand: applied timeout to @last oriole until <t:1746474360:f> (1 hour).

verbal oar May 5, 2025, 6:51 PM

#

putting 1d data to 2d is can be called embedding?

#

or unproject
I'm viewing ml teach by doing and about feature representation

limber spear May 5, 2025, 7:04 PM

#

verbal oar putting 1d data to 2d is can be called embedding?

No clue mq. That is a great question. Embedded in hardware or are you wondering about embedding in software. What do you think chat

verbal oar May 5, 2025, 7:05 PM

#

ah ok Im thinking about embedding from math

#

as is word embedding

#

2d to 1d is to project

#

but vice versa? looks like its embedding not sure just

limber spear May 5, 2025, 7:07 PM

#

I think vector embeddings have applications in cybersecurity though I am not sure.

verbal oar May 5, 2025, 7:07 PM

#

hmm there is t-sne

#

t-sn e (embedding)

#

t-distributed stochastic neighbor embedding for sure

limber spear May 5, 2025, 7:11 PM

#

I find it interesting what chunks of data can do. They dance around in our little machines 😅

#

Paint pictures. It’s fascinating

lapis sequoia May 5, 2025, 7:40 PM

#

someone said this learn principles, algorithm, architecture (as in - design your own architecture not copy someone else architecture without understanding why you are doing things that way)
is this also related to ai ml stuff? or ml is a completely different thing and architecture and stuff dont apply to it?

quartz wren May 5, 2025, 7:47 PM

#

hey anyone online
any tips for profile face detection
cant find anything anywhere
currently using cv2 and mediapipe
detection is really good for frontal
but not good for profile side
ping me if answer

verbal oar May 5, 2025, 8:40 PM

#

architecture is about big picture of system birds eye view without going into details, as I think about it

#

I assume you mean model architecture

serene scaffold May 5, 2025, 8:49 PM

#

lapis sequoia someone said this learn principles, algorithm, architecture (as in - design you...

Model architecture and software architecture are two different, largely unrelated things

limber spear May 5, 2025, 8:53 PM

#

Let’s put them together and call it smodel or modware architecture

#

Tbh this is why I love this field. Innovation is endless. Make up words. Build a new model. Been having a blast. It’s like building with lego blocks

quaint mulch May 6, 2025, 2:53 AM

#

verbal oar but vice versa? looks like its embedding not sure just

the way I see people use it, both are projections, 1d to 2d and vice versa

quaint mulch May 6, 2025, 2:54 AM

#

lapis sequoia someone said this learn principles, algorithm, architecture (as in - design you...

It is completely different thing.
But strangely, still applies.

quaint mulch May 6, 2025, 2:56 AM

#

verbal oar architecture is about big picture of system birds eye view without going into de...

No, architecture is more about the "family" of models. For example, I think of VGG as an architecture and you can get many version of it, but they are all behaving in a similar way.

quaint mulch May 6, 2025, 2:57 AM

#

quartz wren hey anyone online any tips for profile face detection cant find anything anywher...

why is it not good?

dusty forge May 6, 2025, 4:14 AM

#

I made a tool called ParquetToHuggingFace to help you upload your audio data to Hugging Face easily in Python. It takes your raw .wav files, turns them into Parquet format, and then uploads them to the Hub. The repo has clear steps on how to set everything up, where to put your files, and how to run the script. If you're working with speech data and want a quick way to share it on Hugging Face, give it a try!
GitHub Repo: https://github.com/pr0mila/ParquetToHuggingFace

GitHub

GitHub - pr0mila/ParquetToHuggingFace: ParquetToHuggingFace process...

ParquetToHuggingFace processes raw audio data, converts it into Parquet files, and uploads them to Hugging Face. The README explains how to set up the environment, configure paths, and run the scri...

#

🎉 Introducing GroqStreamChain! 🎉
A real-time AI chat application built with Python , FastAPI, WebSocket, LangChain and Groq. 💬 Seamlessly stream AI responses and interact with smarter chatbots powered by cutting-edge technology. 🤖
🚀 Features:

Real-time WebSocket communication
Streaming AI responses
Smooth and responsive UI
🔗 Check out the project on GitHub: https://github.com/pr0mila/GroqStreamChain
Join the conversation and start building your own AI-powered chat apps today! 💬

GitHub

GitHub - pr0mila/GroqStreamChain: GroqStreamChain is a real-time AI...

GroqStreamChain is a real-time AI-powered chat app using FastAPI, WebSocket, and Groq. It streams AI responses for interactive, low-latency communication with session management and a clean, respon...

rich river May 6, 2025, 5:11 AM

#

https://github.com/ultralytics/ultralytics/blob/main/examples/YOLOv8-ONNXRuntime-CPP/inference.cpp

char* YOLO_V8::WarmUpSession() {
    clock_t starttime_1 = clock();
    cv::Mat iImg = cv::Mat(cv::Size(imgSize.at(0), imgSize.at(1)), CV_8UC3);
    cv::Mat processedImg;
    PreProcess(iImg, imgSize, processedImg);
    if (modelType < 4)
    {
        float* blob = new float[iImg.total() * 3];
        BlobFromImage(processedImg, blob);
        std::vector<int64_t> YOLO_input_node_dims = { 1, 3, imgSize.at(0), imgSize.at(1) };
        Ort::Value input_tensor = Ort::Value::CreateTensor<float>(
            Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU), blob, 3 * imgSize.at(0) * imgSize.at(1),
            YOLO_input_node_dims.data(), YOLO_input_node_dims.size());
        auto output_tensors = session->Run(options, inputNodeNames.data(), &input_tensor, 1, outputNodeNames.data(),
            outputNodeNames.size());
        delete[] blob;
        clock_t starttime_4 = clock();
        double post_process_time = (double)(starttime_4 - starttime_1) / CLOCKS_PER_SEC * 1000;
        if (cudaEnable)
        {
            std::cout << "[YOLO_V8(CUDA)]: " << "Cuda warm-up cost " << post_process_time << " ms. " << std::endl;
        }
    }
...

what does WarmUpSession do here?

GitHub

ultralytics/examples/YOLOv8-ONNXRuntime-CPP/inference.cpp at main ...

Ultralytics YOLO11 🚀. Contribute to ultralytics/ultralytics development by creating an account on GitHub.

quartz wren May 6, 2025, 7:33 AM

#

quaint mulch why is it not good?

its designed for frontal view

#

theres some frames where it doesnt work

#

i waas thinking about using retinaface but in terms of costs i think i will go with opencv and mediapipe

rare bane May 6, 2025, 10:43 AM

#

Hello is there anyone who works with Tf-idf vectorization?

rich moth May 6, 2025, 2:01 PM

#

https://docs.google.com/document/d/1Cgi1XPuSTduMxGeypXdAGWabQrX8oMyEVW5WveCdcYE/edit?usp=sharing

Google Docs

UCF Deep Synthesis: Unveiling the Structure of Learnable Knowledge

Unified Complexity Framework (UCF): Deep Synthesis of Findings - Unveiling the Structure of Learnable Knowledge 1. Introduction: Beyond Optimization - A Framework for Understanding The extensive evaluation of the Unified Complexity Framework (UCF), defined by Φ(x) = N + A·e^(iθ) + ε, across ~62 d...

serene scaffold May 6, 2025, 2:08 PM

#

rare bane Hello is there anyone who works with Tf-idf vectorization?

always always ask your actual question. never ask if someone knows about your question without asking your actual question.

rare bane May 6, 2025, 2:08 PM

#

serene scaffold always always ask your actual question. never ask if someone knows about your qu...

Ok noted

#

That said does a Tf-idf vectorization always have to transform-fit with description of an unsupervised data?

lapis sequoia May 6, 2025, 2:11 PM

#

stealcrus

#

thanks bro

serene scaffold May 6, 2025, 2:11 PM

#

rare bane That said does a Tf-idf vectorization always have to transform-fit with descript...

sounds like you're using sklearn. which is a specific way of doing tfidf vectorization.
do you know the difference between fit and transform, in sklearn?

lapis sequoia May 6, 2025, 2:11 PM

#

stela are you into ml? whats the hardest thingforoy

serene scaffold May 6, 2025, 2:11 PM

#

lapis sequoia stela are you into ml? whats the hardest thingforoy

the hardest thing in ML?

lapis sequoia May 6, 2025, 2:12 PM

#

yeah

#

for you

#

?notforeveryone else for you

rare bane May 6, 2025, 2:12 PM

#

serene scaffold sounds like you're using sklearn. which is a specific way of doing tfidf vectori...

Yes I do, however I've never worked with unsupervised datasets before, so I'm just learning

serene scaffold May 6, 2025, 2:12 PM

#

nothing is hard for me, because I'm awesome.

lapis sequoia May 6, 2025, 2:12 PM

#

NICE ILOVE THAT CONFIDENCR BRO

#

THAT WAS COOL BRO THATS WHAT IM TALKING ABOUT BE CONFIDENT NOTHING IS HAARD

#

ITS TOOO EASY

#

ITS TOO EASY

serene scaffold May 6, 2025, 2:13 PM

#

I'm not actually being serious. I have problems that I don't immediately know how to solve every day.

#

when I was a beginner, the hardest part was dealing with how everyone explains things differently. not like in this server, but in books and online guides

serene scaffold May 6, 2025, 2:14 PM

#

rare bane Yes I do, however I've never worked with unsupervised datasets before, so I'm ju...

a dataset isn't inherently supervised or unsupervised. training techniques are.

rare bane May 6, 2025, 2:15 PM

#

serene scaffold a dataset isn't inherently supervised or unsupervised. training techniques are.

Hmm I see. I have used the .fit function in model.fit when I started out with linear regression models

serene scaffold May 6, 2025, 2:16 PM

#

rare bane Hmm I see. I have used the .fit function in model.fit when I started out with li...

a tfidf vectorizer encodes text, so you use the fit method to tell it what kind of text it's going to be encoding.

the tfidf vectorizer's knowledge of which words are frequent and which are not--and which words exist at all--are determined when you fit it.

rare bane May 6, 2025, 2:17 PM

#

serene scaffold a tfidf vectorizer encodes text, so you use the fit method to tell it what kind ...

Ohhhh that explains the clustering I was getting

#

Sorry if that doesn't explain too much, but I understand why it is done now

#

I just needed to know why it was fit and now for the transform part?

rare bane May 6, 2025, 2:18 PM

#

rare bane I just needed to know why it was fit and now for the transform part?

I think i may need to provide better context

serene scaffold May 6, 2025, 2:19 PM

#

rare bane I just needed to know why it was fit and now for the transform part?

you can use fit_transform if you want to encode the same text that you used to fit the vectorizer.

verbal oar May 6, 2025, 3:17 PM

#

I think ml is not hard, hard thing is to understand it then its easy

#

maybe rather its at start not intuitive and one must build some intuition

jaunty helm May 6, 2025, 3:19 PM

#

TIL how to annotate a heatmap in hvplot/holoviews with dynamic text color based on the value by hitting my head against the wall a lot of times
(why is it so hard and why can't I find docs explaining it... probably skill issue)
assuming hm is your heatmap:
hm * hv.Labels(hm).opts(text_color=-hv.dim('value'), cmap='binary')
where 'value' is literal and not something you change, and the - is there so low values are dark instead, so you can actually see on the heatmap

verbal oar May 6, 2025, 3:20 PM

#

also depends if someone thinks about classic ml as ml
or ml and deep learning as machine learning

#

for me its sometimes confusing

quaint mulch May 6, 2025, 3:35 PM

#

serene scaffold when I was a beginner, the hardest part was dealing with how everyone explains t...

Setting up environment.
Nothing is harder than this.

sudden delta May 6, 2025, 3:35 PM

#

i think the hardest thing about ML is people who know ML are bad at explaining ML

verbal oar May 6, 2025, 3:37 PM

#

agree

#

you have sth in your eyes, see patterns and can't transfer it to someone

#

but this thing could be learned right, thats where experience comes in

lapis sequoia May 6, 2025, 3:41 PM

#

can a data analyst please reach out to me? i'm facing trouble with cleaning shopify data

verbal oar May 6, 2025, 3:42 PM

#

dont sure but maybe think about cleaning data not shopify specific, try to generalize it

jaunty helm May 6, 2025, 3:58 PM

#

jaunty helm TIL how to annotate a heatmap in `hvplot/holoviews` with dynamic text color base...

oh, and to change the text formatting (e.g. only to 3 decimal places):

dim = hv.Dimension('value', value_format=lambda x: f'{x:.3f}')
hm * hv.Labels(hm, vdims=dim).opts(text_color=-hv.dim('value'), cmap='binary')

limber spear May 6, 2025, 4:49 PM

#

Someone in chat build a data cleaning stack or something 😂 it’s so boring 💀

limber spear May 6, 2025, 4:50 PM

#

lapis sequoia can a data analyst please reach out to me? i'm facing trouble with cleaning shop...

See deadge tell em playboicatering

serene grail May 6, 2025, 4:51 PM

#

limber spear Someone in chat build a data cleaning stack or something 😂 it’s so boring 💀

Which parts do you find boring? I'm only a beginner in pandas but automating some of the data cleaning sounds like an interesting project, I might try my hand at this

limber spear May 6, 2025, 4:52 PM

#

serene grail Which parts do you find boring? I'm only a beginner in pandas but automating som...

Omgersh where do I start. Feature engineering should be streamlined. But responsibly

#

Especially with medical data for example

#

I love my granny bro

serene grail May 6, 2025, 4:54 PM

#

Yeah that's probably above my head
I'm just learning basic EDA for now

limber spear May 6, 2025, 4:54 PM

#

No worries keep on cooking 🔥

verbal oar May 6, 2025, 6:13 PM

#

etl is responsible for data cleaning?

serene scaffold May 6, 2025, 6:13 PM

#

verbal oar etl is responsible for data cleaning?

what does etl stand for?

verbal oar May 6, 2025, 6:13 PM

#

extract transform load

serene scaffold May 6, 2025, 6:14 PM

#

what's the context for this?

verbal oar May 6, 2025, 6:14 PM

#

more specifically transform for data cleaning?

serene scaffold May 6, 2025, 6:14 PM

#

I've never heard of "extract transform load"

viscid urchin May 6, 2025, 6:14 PM

#

The “T” is where you typically do some cleaning yeah

#

“ETL pipeline” is a fairly common phrase

serene scaffold May 6, 2025, 6:15 PM

#

people are always inventing new terms 😠

viscid urchin May 6, 2025, 6:15 PM

#

More for regular systems than for ML though

verbal oar May 6, 2025, 6:15 PM

#

in short sth related to normalization, normal forms, there is 5nf at most

#

etl is in data warehouses course

#

no one cleans data manually?

#

I mean instead one can just use some etl tool

#

load means feed data to data warehouse

glacial root May 6, 2025, 6:47 PM

#

lapis sequoia NICE ILOVE THAT CONFIDENCR BRO

https://tenor.com/view/thats-why-hes-the-goat-the-goat-bigfellerjake-skippe-space-i-think-gif-26908844

Tenor

dense needle May 6, 2025, 7:15 PM

#

serene scaffold I've never heard of "extract transform load"

It’s all over the place in job postings I have seen

#

Seems like it is a term from data engineering?

#

I have seen it used a bunch in the small amount of time I have spent in r/dataengineering

viscid urchin May 6, 2025, 7:28 PM

#

ETL is a pretty old term. From the 80s. There used to be arguments about whether you should transform before loading or vice versa.

past meteor May 6, 2025, 7:53 PM

#

verbal oar etl is responsible for data cleaning?

It's broader than that, it's reshaping the data and putting it into a format that is fit for purpose for downstream tasks

#

For example, SAP ERP can have 100k tables (not kidding)

#

If you want to do anything with this downstream you're definitely going to want to pull this data down to some other place, consolidate, reshape etc.

viscid urchin May 6, 2025, 7:55 PM

#

SAP is wild. Some of the servers people are running it on just have comical amounts of RAM.

verbal oar May 6, 2025, 7:55 PM

#

yes normalize

past meteor May 6, 2025, 7:55 PM

#

The inverse actually denormalize

verbal oar May 6, 2025, 7:55 PM

#

yes ok

past meteor May 6, 2025, 7:55 PM

#

Normalize ==> adding more tables, denormalize ==> making it flat(ter)

verbal oar May 6, 2025, 7:56 PM

#

I meant joining tables

#

merging

#

my bad

past meteor May 6, 2025, 7:57 PM

#

No worries, it's clear you know what it means 🙂

lapis sequoia May 6, 2025, 7:59 PM

#

#

gesturing whilest learning can help you immensely

verbal oar May 6, 2025, 8:00 PM

#

I think must have muscle memory

#

so for example when understanding perceptron rotate hands?

fallow coyote May 6, 2025, 8:02 PM

#

should I install the dotenv module for interacting with .env variables? or is there another module you lot recommend? For a little bit more informastion, Ill be making my first program that uses the AlphaVantage API to extract stock market information (for this case, extracting data concerning the US Dollar index). I want to start using webscraping and learning how to use APIs particularly for data analysis

verbal oar May 6, 2025, 8:02 PM

#

to improve retention of rotating line

lapis sequoia May 6, 2025, 8:04 PM

#

past meteor May 6, 2025, 8:04 PM

#

fallow coyote should I install the dotenv module for interacting with .env variables? or is th...

Yeah!

A common pattern is that configuration is stored as environment variables. Lots of us deploy with Docker or Kubernetes, which means we can "inject" these env vars right into a specialised place where we run our app.

During local development we still need to provide some config. This is typically done in the form of a .env file that contains all the secrets (API keys and whatnot). This file is read with stuff like dotenv.

lapis sequoia May 6, 2025, 8:05 PM

#

no its true ever think about how when your thinking about somehtig youlook up in thes sky instead of doing that use yourhands as well. it might looks weird but it also might works

lapis sequoia May 6, 2025, 8:06 PM

#

verbal oar to improve retention of rotating line

kinda like that yeah think of a box but now think of a box whilesr gesturing your hands notice how when you do that the box becomes much clearer in yourhead

trim dock May 6, 2025, 8:09 PM

#

can i know does this looks gud?

spring field May 6, 2025, 10:11 PM

#

lapis sequoia gesturing whilest learning can help you immensely

did you "verify critical facts"?

spring field May 6, 2025, 10:13 PM

#

past meteor Yeah! A common pattern is that configuration is stored as **environment variabl...

you could do local development in the container as well though

past meteor May 6, 2025, 10:14 PM

#

spring field you could do local development in the container as well though

dev containers always felt super janky to me

spring field May 6, 2025, 10:15 PM

#

idk, I love 'em
don't have to worry about any env setup pretty much, just start the container and begin developing stuff

lapis sequoia May 6, 2025, 10:45 PM

#

spring field did you "verify critical facts"?

try it see if it work

#

s

glacial root May 6, 2025, 11:49 PM

#

what does it mean by manually describe

#

i don't understand how else it would do it without ocr

#

this is chatgpt by the way

agile cobalt May 6, 2025, 11:56 PM

#

glacial root this is chatgpt by the way

if ChatGPT says something that makes no sense, odds are it makes no sense and it's just the model hallucinating

Technically you could have

native multi-modal image inputs (tokenize the image and feed it directly to the model as part of the prompt)
a separate OCR tool the model can use via function calling
and it would make sense if the model tried to use the OCR tool first, then used native image inputs after it failed, but that's extremely unlikely

glacial root May 7, 2025, 12:08 AM

#

agile cobalt if ChatGPT says something that makes no sense, odds are it makes no sense and it...

by tokenize the image, do you mean it would go through the image and extract haar features and then use that to detect each character?

lapis sequoia May 7, 2025, 1:30 AM

#

Is it rad that I avoided langchain forever because I just thought it was trendy garbage and fine tuning T5 is more lit? Am I like a hipster now? RLHF is cool, but I don’t want to abandon my roots.

grand minnow May 7, 2025, 2:44 AM

#

The alternative to Langchain that I found is LiteLLM. Its nice

serene scaffold May 7, 2025, 3:49 AM

#

@proven current I removed your message because the content is disturbing for some users

quaint mulch May 7, 2025, 5:35 AM

#

glacial root this is chatgpt by the way

there are no inherent meaning behind anything ChatGPT said.

rich moth May 7, 2025, 5:42 AM

#

https://docs.google.com/document/d/1i1cShe-DFkFvCNpWng1iO4VJp7Iti6Fo_BkatFBgKb0/edit?usp=sharing

Google Docs

The Unified Complexity Framework: Revolutionizing Multi-Domain Data...

The Unified Complexity Framework: Revolutionizing Multi-Domain Data Complexity Analysis Abstract This paper introduces the Unified Complexity Framework (UCF), a revolutionary approach to quantifying data complexity across diverse domains. Unlike conventional methods that treat complexity as domai...

rich moth May 7, 2025, 6:28 AM

#

Ok It's my rough draft on my research paper .

#

This could change the game of how we design datasets. Imagine datasets with built-in complexity metadata that map optimal learning pathways and make curriculum learning effortless, eliminating the need to calculate sample complexity during training. UCF can enhance these datasets, transforming machine learning data from simple collections into structured knowledge maps with clear learning trajectories, dramatically improving training efficiency and transfer learning capabilities. This could establish a new gold standard for ML datasets where curriculum-readiness becomes a core feature rather than an afterthought, reimagine how we approach data design across all domains

#

The proof is in the pudding. It's all on the wall.

sudden delta May 7, 2025, 7:07 AM

#

a way of sorting data by complexity so you can feed hard or easy stuff first?

limber spear May 7, 2025, 10:40 AM

#

past meteor For example, SAP ERP can have 100k tables (not kidding)

I didn’t think of it this way before 🤯 that is a game changer. In the DE community a lot of the conversations move toward code but rarely fundamentals are discussed

quaint mulch May 7, 2025, 12:53 PM

#

rich moth https://docs.google.com/document/d/1i1cShe-DFkFvCNpWng1iO4VJp7Iti6Fo_BkatFBgKb0/...

I suggest you put it somewhere more "official"
The idea being, you can claim it that you have written this at a certain date with timestamp.
Ideally ArXiv, if you cannot, then github will do.

quaint mulch May 7, 2025, 1:09 PM

#

achieving performance improvements of up to 149% over random sampling baselines.
You need to use a better baseline rather than just random sampling. You need to use the latest SOTA in curriclum learning.

#

It seems that you purposedly not reveal the methods?

quaint mulch May 7, 2025, 1:10 PM

#

rich moth https://docs.google.com/document/d/1i1cShe-DFkFvCNpWng1iO4VJp7Iti6Fo_BkatFBgKb0/...

are you looking for feedback? or something else?

verbal oar May 7, 2025, 1:27 PM

#

anyone can write research paper?

quaint mulch May 7, 2025, 1:30 PM

#

Yes, anyone can.

verbal oar May 7, 2025, 1:31 PM

#

but I see its hard to start different style of language, similar to writing thesis

quaint mulch May 7, 2025, 1:34 PM

#

I mean, anyone can, not saying that it is easy.
With enough effort and resources, almost everyone can,
the question is if it is worth it.

coral apex May 7, 2025, 1:57 PM

#

so im working on benchmarking various AF to improve a NN model that is trying to learn the pattens of the sin function.

the thing is that i dont want to use any fancy methods like normalization or special optimizers just yet.

and am trying to improve the functionality just by only changing the following configurations: number of hidden layers, number of hidden neurons, learning rate, gradient clipping threshold(ik this is a fancy method but its unavoidable for now) .

so the problem im encountering is that my current model is adapting and predicting well when the training values include values only form -pi to +pi (with 500 samples)with a loss of upto 10^-5, but the moment i increase the range to lets say -100 to +100 (5000 samples) predictions of all the activation function are stagnating at a loss of 0.5 which is no where enough obviously

any idea on how to fix this or improve this ?
shoudl i send my code here or is that not allowred ?

#

using basic GD btw

limber spear May 7, 2025, 2:15 PM

#

coral apex so im working on benchmarking various AF to improve a NN model that is trying to...

What is meant by value here when you state -pi to +pi I’m not catching on

coral apex May 7, 2025, 2:23 PM

#

limber spear What is meant by value here when you state -pi to +pi I’m not catching on

im using
x = np.linspace(-np.pi,np.pi,500)

#

training samples i mean

#

should i just send the entire code ?

#

print("hello world test for syntax highlighting")

limber spear May 7, 2025, 2:25 PM

#

coral apex im using x = np.linspace(-np.pi,np.pi,500)

I went to the machines. Total guess here, you’re mapping of your inputs [-pi to +pi] to target labels of [-1 to 1], but issue maybe is your build of taking in inputs of [-100 to 100], your model isn’t designed for that. Probably just have to refactor portions of your build like functions

coral apex May 7, 2025, 2:27 PM

#

waht do you mean by my model is not designed to take in those inputs?
im pretty new to ml too btw
like should it just take the inputs and try to learn it according to its map which will be [-1,1]??

#

print("test)

📎 message.txt

arctic wedgeBOT May 7, 2025, 2:29 PM

#

coral apex ```python print("test) ```

Click here to see this code in our pastebin.

limber spear May 7, 2025, 2:29 PM

#

coral apex waht do you mean by my model is not designed to take in those inputs? im pretty ...

It was a guess. It depends what you’re targeting for your outputs. It could be binary like 0 or 1. Or a range [0 to 1], [-1 to 1]. If that makes sense

coral apex May 7, 2025, 2:30 PM

#

limber spear I went to the machines. Total guess here, you’re mapping of your inputs [-pi to ...

you are completely right here i am trying to predict the mapped range of [-1,1] values

limber spear May 7, 2025, 2:32 PM

#

Oh ok you would probably just have to do a light refactoring in your build to map everything correctly

coral apex May 7, 2025, 2:33 PM

#

how would that look like and what was my mistake?

limber spear May 7, 2025, 2:35 PM

#

coral apex im using x = np.linspace(-np.pi,np.pi,500)

Total guess here. It could be as basic as the line of code you shared here: x = np.linspace(-np.pi,np.pi,500)

coral apex May 7, 2025, 2:36 PM

#

but whatt in that though ?

limber spear May 7, 2025, 2:36 PM

#

Maybe what x does in your build. If that makes sense

#

The -pi to +pi you mentioned

coral apex May 7, 2025, 2:37 PM

#

i am so lost here, what are you trying to tell me?

#

i commented out the -pi,pi cuz thatt one works fine but the -100,100 does not

limber spear May 7, 2025, 2:38 PM

#

I’m a noob tutor mkay chat who’s a better explainer

coral apex May 7, 2025, 2:38 PM

#

T_T
alright

#

thanks though!

limber spear May 7, 2025, 2:40 PM

#

You can test ranges out. But I like breaking stuff to learn

#

👍

#

Lock in you’re on the right track bKC

#

I just suck at explaining. That means I don’t have the science down. Need to lock in as well lol

coral apex May 7, 2025, 2:45 PM

#

limber spear You can test ranges out. But I like breaking stuff to learn

yess will do
but i think -100,100 is just too much to expect from a basic NN
ig ill just have to accept that this is the best it can do
and add upgrades like normalizatoin and optimization techniques

coral apex May 7, 2025, 2:45 PM

#

limber spear I just suck at explaining. That means I don’t have the science down. Need to loc...

real real

limber spear May 7, 2025, 2:50 PM

#

i was in the middle of cooking this up for a class

umbral hearth May 7, 2025, 4:13 PM

#

coral apex i commented out the -pi,pi cuz thatt one works fine but the -100,100 does not

Why

#

Isnt it -1:1

weak oxide May 7, 2025, 4:23 PM

#

Have any of you guys used the SEC API

#

Not the one from Python itself which requires the API key but from the SEC which is the one where you request the headers

coral apex May 7, 2025, 4:31 PM

#

umbral hearth Isnt it -1:1

waht are you askingg exacctly ?

stuck brook May 7, 2025, 6:51 PM

#

Hello everyone, I have a small question, where can I find well-documented datasets that can support academic research or thesis developlment ?? Maybe some open-access platformsor even government data portals. Bonus poitns for anything that supports ML, predictive anaylitics. Thank you !! PLEASE @ ME HERE

agile cobalt May 7, 2025, 6:57 PM

#

stuck brook Hello everyone, I have a small question, where can I find well-documented datase...

Kaggle and Hugging Face are good places to start

you can also use https://datasetsearch.research.google.com/

rich moth May 8, 2025, 2:40 AM

#

quaint mulch are you looking for feedback? or something else?

Honestly, I don't know. I'm lost what I do.. I would love to monetize it. Get me out of my UPS driving career. After 17 years I'd love to jump switch into a more technical domain.

limber spear May 8, 2025, 3:18 AM

#

rich moth Honestly, I don't know. I'm lost what I do.. I would love to monetize it. Get...

Have you pinged the Hacker News community. They can Simon Cowell your stack if it’s good or not

quaint mulch May 8, 2025, 6:15 AM

#

rich moth Honestly, I don't know. I'm lost what I do.. I would love to monetize it. Get...

I guess, 1st of all, congrats for getting this far, it seems that you did some real studying and real work.

2nd, sorry to burst your bubble, but there are still some gap between your draft and something publishable. The gap is not unsurmountable, but I suppose, somewhere between few weeks to few months. The biggest issue I can spot is that you need SOTA curriculum learning as your baseline. I also cannot judge your method and it is not revealed yet. So keep up the good work and you'll get there.

3rd, the sad news is, even if this is published at top journal, there are still a huge gap between that, and monetizing it. And I don't even know what. A lot of PhD fresh grad want to convert their thesis to a startup, but very few succeed. If I know how, I would have done it myself, I want to get rich quick too.

Finally, if you put the full version (with your methods and code) on a github, you can start emailing professors and ask for collaboration. They get to be co-authors, and you get valuable feedback and even funding for conference submission, and from there, I hope it is one step easier to get into some ML jobs.

sudden delta May 8, 2025, 6:27 AM

#

rich moth Honestly, I don't know. I'm lost what I do.. I would love to monetize it. Get...

just one recommendation, when you show it to people, instead of a paragraph about how it will revolutionize the field, just say what it does

limber spear May 8, 2025, 6:43 AM

#

I have a bit of food for thought. So the father of modern genetics Gregor Mendel his work went unrecognized in the scientific community until about 16 years after his death. Sometimes no one even cares 😂 5+ centuries from now who will remember these billionaires

#

Sorry about laughing. Idk I think about these things.

rich river May 8, 2025, 7:25 AM

#

I want to rewrite this without using libtorch

  torch::Tensor pred_masks = torch::nn::functional::interpolate(
      masks.index({scores_mask, torch::indexing::Ellipsis}),
      torch::nn::functional::InterpolateFuncOptions().size(
          std::vector<int64_t>({input_height_, input_width_})));

in which masks and scores_mask are both tensors.
I don't want to use libtorch because the library introduces great space cost to my project, but I dont know how to rewrite the functions such as the torch::nn::functional::InterpolateFuncOptions() and .index by using pointers like float *
anyone has ideas about it?

lapis sequoia May 8, 2025, 7:30 AM

#

does anybody have experience analysing META data?

ashen venture May 8, 2025, 8:44 AM

#

Guys which packages and from where I should learn in python to master data science

fallow coyote May 8, 2025, 11:21 AM

#

ashen venture Guys which packages and from where I should learn in python to master data scien...

python for data analysis by wes kineey is a good book to get you started off. Thats what I used. I will say make sure you spend a lot of your time learning the maths or you wont be able to use the modules to their full effectiveness

verbal oar May 8, 2025, 11:37 AM

#

I'm impressed you are self taught, I thought you have some background, nice to hear and good luck

fallow coyote May 8, 2025, 11:44 AM

#

I swear learning the statistics for ML is fucking annoying. Half the time I'm trying to interpret the context of the notation and symbols. Like I'm learning in a section about multivariate gaussian distribution; why tf is sigma being used as a variable?! now I have to distinguish between sigma meaning 'sum of' and sigma as a variable. Apologies for the rant. Hopefully learning the linear algebra side will be a bit easier to interpret

verbal oar May 8, 2025, 11:45 AM

#

learn at first about gaussian distribution not mutlivariate would be less confusing

#

sigma is variance

#

oh sorry std - standard deviation, sigma squared is variance

#

and maybe read "statistics for machine learning" not about statistics without context its more annoying

fallow coyote May 8, 2025, 11:50 AM

#

I mean the big sigma not the small sigma that denotes the variance. i swear what were these staticians smoking when they came up with these formulas and decide to not use distinguishable notation?

quaint mulch May 8, 2025, 11:50 AM

#

fallow coyote I swear learning the statistics for ML is fucking annoying. Half the time I'm tr...

coz this is a frankenstine?
Some many people got into this field from electrical engineering and came up with a lot of signal processing ideas, some are stats, some are pure maths, some are coming from physics, ETC2

verbal oar May 8, 2025, 11:50 AM

#

maybe machine learning mastery have sth like this dont sure

wooden sail May 8, 2025, 11:52 AM

#

fallow coyote I mean the big sigma not the small sigma that denotes the variance. i swear what...

in fairness, the sigma used for covariance matrices is usually either bold or caligraphic, and the summation one has a sub- and superindex

verbal oar May 8, 2025, 11:52 AM

#

for example I learned sth about student t distribution "where it is used?", my question was then
I saw data science full archive and there ah right its in t-sne (t-distributed),
poisson distribution ah in poisson regression etc

#

I mean statistics course without context, also sth like confidence interval, dont sure if they should teach this way

fallow coyote May 8, 2025, 11:55 AM

#

wooden sail in fairness, the sigma used for covariance matrices is usually either bold or ca...

I've noticed that but even still, I've always seen the big sigma as 'sum of'. This is my first time actually seeing big sigma in another context. Im giving it another two weeks and then going to focus for a few weeks on learning linear algebra whilst going through the relational databases course on freecodecamp to learn some new skills

verbal oar May 8, 2025, 11:56 AM

#

but it was not about calculating distribution but reading some stats lookup table

wooden sail May 8, 2025, 11:56 AM

#

fallow coyote I've noticed that but even still, I've always seen the big sigma as 'sum of'. Th...

it's probably a good habit to always read the notation table at the beginning of a book. one lie that people learn in school is that math notation is somehow fixed and standardized

verbal oar May 8, 2025, 11:56 AM

#

instead of substituting in formula

wooden sail May 8, 2025, 11:56 AM

#

it isn't 😛 at all

fallow coyote May 8, 2025, 12:00 PM

#

wooden sail it isn't 😛 at all

I dont think the ISLP book has some form of notation guide. Tbf, the more I learn about statistics and the more I go through the book, it gets easier to understand the overall concepts. Im only learning the surface level understanding so I can be able to use the ML modules effectively enough. I can always at later date go indepth in the proper theory behind the cocepts

verbal oar May 8, 2025, 12:00 PM

#

question why they reduce from 768 dims to 2 dims with umap, cant be t-sne or other dim reduction method? inside nlp and trasformers book by oreily

#

hmm assuming you didnt read it its hard to explain

fallow coyote May 8, 2025, 12:06 PM

#

I couldnt tell you mate XD. Still attmepting to learn the maths

tawdry finch May 8, 2025, 12:15 PM

#

hey i am an intermediate in python i am looking for communities to join to work with anyone intrested like small projects etc

limber spear May 8, 2025, 1:30 PM

#

I asked the bots about the foundations of data science. Does this tree diagram look complete

#

Only linear algebra and calculus pikawow that is a ton of innovation baked into just 1 bubble node 👀 imagine what the mathematicians would say

elfin shadow May 8, 2025, 1:32 PM

#

This looks nice. I cant be botherd to read most of it.

limber spear May 8, 2025, 1:35 PM

#

Fair enough. Save for research purposes meowthumbsup

ashen venture May 8, 2025, 2:26 PM

#

Never thought data science was so tough 😮

#

This probably explains the salary paid to them

serene scaffold May 8, 2025, 2:29 PM

#

ashen venture This probably explains the salary paid to them

the amount that a job pays is often a function of how much training/experience is required to do it

ashen venture May 8, 2025, 2:31 PM

#

def calsal ():

frozen arch May 8, 2025, 3:04 PM

#

lapis sequoia May 8, 2025, 3:36 PM

#

limber spear I asked the bots about the foundations of data science. Does this tree diagram l...

Are you sure you need all of that?

#

because for example I know that MLOps is a whole field like ML

#

or you just need to have a shallow info about it like you do with ML?

#

plus you don't need all of these programming languages

#

just one would be enough

#

like python

maiden harbor May 8, 2025, 3:37 PM

#

limber spear I asked the bots about the foundations of data science. Does this tree diagram l...

where would AGI be?

limber spear May 8, 2025, 3:37 PM

#

As shallow as an activation function blobhuh

#

Or a Wilcoxon rank sum test

lapis sequoia May 8, 2025, 3:38 PM

#

https://roadmap.sh/ai-data-scientist

roadmap.sh

AI and Data Scientist Roadmap

Learn to become an AI and Data Scientist using this roadmap. Community driven, articles, resources, guides, interview questions, quizzes for modern backend development.

#

I am not sure if it's the best roadmap but this website is kinda popular

limber spear May 8, 2025, 3:40 PM

#

Ah yeh those guys. I know about them

limber spear May 8, 2025, 3:41 PM

#

maiden harbor where would AGI be?

No clue 💀

lapis sequoia May 8, 2025, 3:41 PM

#

lapis sequoia https://roadmap.sh/ai-data-scientist

but this roadmap is even looks overwhelming and kinda confusing as they are speaking about two fields at the same time

maiden harbor May 8, 2025, 3:42 PM

#

limber spear No clue 💀

lol, it's okay. Do you intend to expand this diagram or not?

lapis sequoia May 8, 2025, 3:42 PM

#

but maybe they share these

maiden harbor May 8, 2025, 3:43 PM

#

Maybe add, reinforcement learning, and or that thing where AI train themselves but I forgot the name

limber spear May 8, 2025, 3:43 PM

#

I am looking to perfect my craft. Nothing more nothing less tbh

#

And contribute to society which I probably suck at

#

I like the roadmap peeps. They are building something to help others learn

#

And build 👍

lapis sequoia May 8, 2025, 3:47 PM

#

try to get in touch with someone in the field you want to break in

#

tell them your background and let them guide you from there how to achieve your goal

limber spear May 8, 2025, 3:50 PM

#

lapis sequoia but this roadmap is even looks overwhelming and kinda confusing as they are spea...

I agree. It can be considered overwhelming. But the freedom of the field imho lies in the data + the science. The possibilities are endless

jaunty helm May 8, 2025, 3:52 PM

#

limber spear I asked the bots about the foundations of data science. Does this tree diagram l...

not sure what it's even trying to say tbh
like, does A -> B mean A contains B like machine learning -> deep learning?
does A -> B mean A is a prerequisite like LinAlg/Calc -> Stats?
does A -> B mean you should do A before B like EDA -> Feature Engineering?

limber spear May 8, 2025, 3:52 PM

#

Total guess here. The bot probably ran a decision tree or algorithm of some sort

jaunty helm May 8, 2025, 3:56 PM

#

also ig linear models, trees, clustering etc just don't exist anymore

limber spear May 8, 2025, 3:57 PM

#

How so

jaunty helm May 8, 2025, 3:57 PM

#

limber spear How so

not on the map anywhere
whereas deep learning gets like a quarter of the entire graph

#

also I just realized it put automl under deep learning

limber spear May 8, 2025, 3:59 PM

#

The diagram I posted? That would probably just be in 1 node of that diagram

jaunty helm May 8, 2025, 3:59 PM

#

limber spear The diagram I posted? That would probably just be in 1 node of that diagram

yeah
what I'm saying is the graph isn't great imo

limber spear May 8, 2025, 4:00 PM

#

I disagree

#

But. I understand your conjecture

#

If you frequent the Linux community. A diagram means squat

lapis sequoia May 8, 2025, 4:18 PM

#

limber spear I agree. It can be considered overwhelming. But the freedom of the field imho li...

it's always better to ask someone already in the field and has not even just a beginner

#

but it's up to you

limber spear May 8, 2025, 4:18 PM

#

I don’t think you know who I know

sudden delta May 8, 2025, 4:21 PM

#

would be interesting to see a timeline of when these different things on the map were invented

#

and how close that is to how they are chained

#

and how much is squished into the past few decades, while the math fundamentals go back centuries 😂

limber spear May 8, 2025, 4:35 PM

#

Honestly that is what I took from when I first pulled that visual. The statistics node has a timeline on its own merit.

#

Statisticians cook.

verbal oar May 8, 2025, 5:34 PM

#

i'll add graph theory

#

maybe fuzzy logic

#

maybe etl
metaheuristics, data mining

#

fuzzy logic due to fuzzy clustering c-means, neuro-fuzzy networks

#

maybe its overboard

verbal oar May 8, 2025, 8:14 PM

#

Knowledge representation and reasoning

drifting loom May 8, 2025, 9:10 PM

#

Anyone up for simple DS project? For skill up?

hollow cobalt May 8, 2025, 9:13 PM

#

I’m trying to clean and format a large raw text file. Does anyone know any methods that are best for cleaning large amounts of text?

drifting loom May 8, 2025, 9:14 PM

#

hollow cobalt I’m trying to clean and format a large raw text file. Does anyone know any metho...

In excel, right?

hollow cobalt May 8, 2025, 9:14 PM

#

.txt files

drifting loom May 8, 2025, 9:35 PM

#

hollow cobalt .txt files

I don't know sorry

drifting loom May 8, 2025, 9:36 PM

#

hollow cobalt .txt files

I suggest go for prompt in Google collab and it'll clean the data

untold dove May 8, 2025, 10:01 PM

#

maiden harbor Maybe add, reinforcement learning, and or that thing where AI train themselves b...

recursive self improvement ? or unsuprivised learning

#

the problem with RSI if you have read the STOP algo paper is it has a significant bottleneck sadly I think if you were really creative you could build off the implementation's within that paper with multiple algos maybe say beam search top p top k alteration. Or some other sort of dynamic deducing algo

limber spear May 9, 2025, 2:48 AM

#

I can’t believe I started a debate over that diagram. What I find interesting is that what if the diagram read left to right. Or right to left. Perspectives can differ thonk

short escarp May 9, 2025, 7:13 AM

#

Hi guys, anyone here is expert in machine learning. Mainly in sklearn.svm SVC (support vector classification). I want to ask some questions

fickle shale May 9, 2025, 7:26 AM

#

Don't ask to ask, just ask!!

quaint mulch May 9, 2025, 7:43 AM

#

limber spear I asked the bots about the foundations of data science. Does this tree diagram l...

I think it is very far for complete. Where would graph neural network be? How about Neural ODE? Or like contrastive learning? JEPA, Energy based model, geometric deep learning?

limber spear May 9, 2025, 8:03 AM

#

quaint mulch I think it is very far for complete. Where would graph neural network be? How ab...

Idk the deep learning node

#

You could literally draw a timeline in just the machine learning and deep learning nodes. 2 nodes.

#

But then if you turn it into a decision tree, everything changes

#

or does it

limber spear May 9, 2025, 8:24 AM

#

Perplexing. You could legit earn a doctorates degree with this research

fluid cave May 9, 2025, 8:34 AM

#

hello gys,
what are the best techniques to improve the accuracy of a classification model (tabular data with alot of categorical variables)

dusty forge May 9, 2025, 9:25 AM

#

fluid cave hello gys, what are the best techniques to improve the accuracy of a classificat...

Feature Engineering -
Create new features by combining or grouping existing ones.
Hyperparameter Tuning -
Test different model settings to boost performance.

verbal oar May 9, 2025, 9:48 AM

#

ensemble methods also

pine arch May 9, 2025, 7:51 PM

#

After PCA I had more than 2 PC that I can use for clustering, and its impossible to work with more than 3, what should I do in such cases?

limpid dew May 10, 2025, 12:49 AM

#

the tab:blue is goated

obtuse acorn May 10, 2025, 1:32 AM

#

so im using seaborn to plot data, and im using a pairplot and i can set corner=True so it doesnt have duplicate plots on one side of it

#

but is there a way to have it plot a different type of plot on one side

#

like if i set corner=true it doesnt plot the ones on the top right, but is there a way to have it plot a kde plot on the top right?
obv i could just manually edit the images so its got the other type on the top right but it would be easier if there was a way to do it programatically

#

also, any ideas for better ways to plot stuff?

jaunty helm May 10, 2025, 2:06 AM

#

obtuse acorn like if i set corner=true it doesnt plot the ones on the top right, but is there...

docs at the very bottom may be helpful

obtuse acorn May 10, 2025, 2:08 AM

#

thanks

jaunty helm May 10, 2025, 2:08 AM

#

obtuse acorn also, any ideas for better ways to plot stuff?

nothing off the top of my head; it simply looks like there's no difference between male/female when it comes to these features

#

if you use something like plotly or hvplot, then you click an item in the legend and that will be hidden
e.g. if I made this in hvplot, I can click Male then all Male data will be set to alpha 0.3 (configurable)
(hvplot is more like a higher api that can wrap matplotlib, bokeh or plotly; the latter two can do what I described)

obtuse acorn May 10, 2025, 2:21 AM

#

jaunty helm [docs](https://seaborn.pydata.org/generated/seaborn.pairplot.html) at the very b...

hmmm, it just overlays it, is there a way to hide the scatter plot on one side, setting corner=True makes it not draw the kde plot

fresh mulch May 10, 2025, 5:26 AM

#

Good day everyone, im new here and i just start learning Python its only been a month of daily solving fundamental problems for each basic topics in python (not include the OOP topics), and im almost done with studying basic topics while solving fundamental questions and i want to dive into the world of Data Science and AI but not sure what to do after im done studying Python Basic should i study Math and Statistics for Data Science or continue learning Python OOP or study Data Structure and Algorithms or just go straight into Python Data Science Libraries like NumPy and Panda?

peak thorn May 10, 2025, 8:29 AM

#

i have to build projet of a numer,-license plates detection and extract the content from the plates please guide me in this because it's my first computer vison project. Thank you

jaunty helm May 10, 2025, 9:41 AM

#

obtuse acorn hmmm, it just overlays it, is there a way to hide the scatter plot on one side, ...

actually, if you just follow the link to PairGrid, doesnt it show what you're looking for?

g = sns.PairGrid(penguins, diag_sharey=False)
g.map_upper(sns.scatterplot)
g.map_lower(sns.kdeplot)
g.map_diag(sns.kdeplot)

elfin shadow May 10, 2025, 9:41 AM

#

When was this.

#

@sterile heath this is really interesting so dose that mean that your offspring could have the same problem.

sterile heath May 10, 2025, 9:44 AM

#

I don't know about the genetics of it. It can sometimes take two to tango.

#

But also, offspring are not a likelihood in my future.

#

Nor have I any.

#

Nor have I ever.

elfin shadow May 10, 2025, 9:46 AM

#

sterile heath But also, offspring are not a likelihood in my future.

Yea it’s something to do with the chromosomes I think because it’s split into pairs. Half and half from the male and the other for the female.

sterile heath May 10, 2025, 9:46 AM

#

#voice-chat-text-0

#

Sorry guys.

fickle shale May 10, 2025, 10:43 AM

#

sterile heath Sorry guys.

forgive!! Enjoy ur life!!

verbal elbow May 10, 2025, 2:26 PM

#

I'm a data scientist who's looking for a paid internship opportunity, please how can you be of help to me? Thanks

rich moth May 10, 2025, 5:14 PM

#

man I've been working on this UCF paper for days, i think I'm almost done but i don't know exactly how to incorporate the visuals. I think I will create some type of "gallery" with the results? Also , Do I need an endorsement for a arXiv submission?

odd meteor May 10, 2025, 5:45 PM

#

verbal elbow I'm a data scientist who's looking for a paid internship opportunity, please how...

You might wanna check LinkedIn

limpid zenith May 10, 2025, 6:22 PM

#

rich moth man I've been working on this UCF paper for days, i think I'm almost done but i...

You probably won't if you have a university email.

lapis sequoia May 10, 2025, 6:39 PM

#

what is job
in coroutines
/threads

rare bane May 10, 2025, 6:59 PM

#

been working on unsupervised data and i have a problem getting two annotated labels for the visualaization to be visible, i've modified the xytest for about 2 hours and i've gotten zilch results

here is the syntax for the code, if anyone can help:
colormap = plt.get_cmap('tab20', num_clusters)
colors = [colormap(i) for i in range(num_clusters)]
plt.figure(figsize=(17, 12))

for cluster_num in range(num_clusters):
cluster_points = tfidf_matrix_reduced[df['cluster'] == cluster_num]
plt.scatter(cluster_points[:, 0], cluster_points[:, 1], color=colors[cluster_num], label=cluster_to_genre[cluster_num], alpha=0.9)

plt.title('Book Genre Clusters with 2D PCA')
plt.xlabel('PCA Component 1')
plt.ylabel('PCA Component 2')

plt.annotate('Popular / Romantic →', xy=(0.42, 0), xytext=(0.63, 0.18), fontsize=11, color='blue')
plt.annotate('← Serious / Thought-Provoking', xy=(-0.5, 0), xytext=(-1.0, 0.2), fontsize=11, color='red')
plt.annotate('↑ Literary / Award-Winning', xy=(0, 0.5), xytext=(0.05, 0.75), fontsize=11, color='purple')
plt.annotate('↓ Genre Fiction / Mass Appeal', xy=(0, -0.5), xytext=(0, -1.0), fontsize=11, color='green')

plt.legend(title='Predicted Genre', bbox_to_anchor=(0.5, -0.08), loc='upper center', borderaxespad=0., ncol=num_clusters // 2 if num_clusters > 2 else 2)

plt.tight_layout(rect=[0.08, 0.12, 0.92, 0.95])
plt.subplots_adjust(bottom=0.22)
plt.grid(True)
plt.show()

zealous hollow May 10, 2025, 8:01 PM

#

Hii!!

so i have a question regarding pyomo.

my problem has an objective, and a stationary, for ideal convex, stationary value is 0. but for any real case, stationary value can never be zero, but i want it to move towards zero. like closer it is to zero, better,

objective function, is entirely different value, but PS. stationary and objective, cant be in objective object of pyomo, as that makes them compare, which shouldnt happen in my setup.

tropic sphinx May 10, 2025, 8:39 PM

#

Hi everyone,
I'm looking for some guidance and would really appreciate your help.

I'm not a data scientist by profession, but I've been learning machine learning and working with Python. I'm currently building a tool that tracks data transformations before the data is fed into a model for training.

Right now, I'm trying to find example projects—either ones you've worked on yourself or available online—that I can use to test my tool. I'm primarily focusing on transformations using pandas, NumPy, and scikit-learn.

I can build basic pipelines myself (e.g., using fillna, one-hot encoding, PCA, etc.), but since I don’t have experience with real-world projects, I’m not confident I’m covering all the important cases. Any pointers to existing pipelines or datasets with preprocessing steps would be incredibly helpful.

Thanks in advance for your guidance!

rich moth May 10, 2025, 9:15 PM

#

https://claude.ai/public/artifacts/0baf9898-9a1e-4da4-bed8-8a6f52c23d3b

This is what I got so far.

rich moth May 10, 2025, 11:12 PM

#

I find the feature space fascinating. It's like the DNA of each dataset.

digital basin May 10, 2025, 11:21 PM

#

hi guys, im trying to getting into the AI world, but idk where to start and where to learn, can you please help me?

rich moth May 10, 2025, 11:35 PM

#

That all depends what the most effective learning strategy is for you. 😄

#

Out of all the feature spaces i looked at guys, theres hundreds. Look at this! I dont know what it is , but this one is hypnotic.

lapis sequoia May 10, 2025, 11:40 PM

#

people hiring alot for ml engineers?

rich moth May 10, 2025, 11:44 PM

#

quaint mulch I guess, 1st of all, congrats for getting this far, it seems that you did some r...

I never saw this, thanks for the feedback this is great. I've updated my paper, please feel free to critique it. I will look into what you told me todo though. Seems like github is the way to go.

iron basalt May 11, 2025, 1:18 AM

#

digital basin hi guys, im trying to getting into the AI world, but idk where to start and wher...

https://aima.cs.berkeley.edu/ The standard introduction to AI book.

warped notch May 11, 2025, 3:05 AM

#

Hello what should I as a newbie to ML use ? plotly or matplotlib for visualization

#

what do advanced users use ?

#

is there anything better than these two ?

agile cobalt May 11, 2025, 3:15 AM

#

for simple things, plotly express is very straightforward and can make interactive plots
(plotly express is a sub-module on plotly)

for more custom things you might want to look into matplotlib or seaborn

serene scaffold May 11, 2025, 3:16 AM

#

warped notch Hello what should I as a newbie to ML use ? plotly or matplotlib for visualizati...

Matplotlib is older and its API is based on a different language (namely Matlab).

People use what they prefer. I've gravitated to plotly.

agile cobalt May 11, 2025, 3:18 AM

#

iirc Vega Altair is also somewhat popular, but personally I also prefer plotly

serene scaffold May 11, 2025, 3:19 AM

#

When I do use matplotlib, it's only via pandas. Imo matplotlib has the worst API of any data science library

#

And it isn't close

warped notch May 11, 2025, 3:19 AM

#

digital basin hi guys, im trying to getting into the AI world, but idk where to start and wher...

check this out https://developers.google.com/machine-learning/crash-course/linear-regression

#

OK,thank you

gusty silo May 11, 2025, 5:50 AM

#

Wassup guys

fickle shale May 11, 2025, 5:52 AM

#

rich moth I find the feature space fascinating. It's like the DNA of each dataset.

beautiful!!

rich moth May 11, 2025, 7:20 AM

#

This image shows what happens when you rank all samples from different data types (images, text, time series, and tabular data) by a single universal complexity measure and divide them into 10 difficulty bins.

#

Its 141k+ samples from 50+ datasets.

#

I think the fact the different data types naturally separate into ten distinct difficulty regions helps bring this all home.

fresh mulch May 11, 2025, 7:41 AM

#

so i was wondering what SQL will i master(or start learning) can you help me decide?

what type of SQL is best for Python Data Science?

MySQL
PostgreSQL
Oracle
others(mention it 🙂 )

rich moth May 11, 2025, 7:44 AM

#

I like option 2

#

In the long run I think it would do you the most justice because of its analytical capabilities.

limber spear May 11, 2025, 8:08 AM

#

I was taught to develop in vanilla SQL. It codes to all 4 on that list

rich moth May 11, 2025, 8:15 AM

#

I got the idea to create a unified dataset across a smaller subset of 14k samples. results confirm the exact same stratification pattern we saw in the larger test

#

storm nexus May 11, 2025, 11:28 AM

#

Hello

#

Is there a book which discusses multi class classification techniques like OVR AND OVO

#

And explicitly states those names, and not just the math

obtuse acorn May 11, 2025, 12:02 PM

#

so ive got a dataset thats got a date column and a time column and im reading the csv into a pandas dataframe, and i can set parse_dates=['date'], date_format='%d-%b-%y' and read the date just fine

#

but i dont see how i set it to read the date and the time if they are in seperate columns

#

ah data.time = pd.to_timedelta(data.time) worked

prisma patrol May 11, 2025, 1:57 PM

#

I'm looking for someone interested in networking for Data Science. I'm very motivated and would like to meet people who are in tune with me

plush kettle May 11, 2025, 2:48 PM

#

Guys, is it possible to train a resnet50 model with 640 x 640 images?

#

Here is my collate_fn: ```def collate_fn(batch):
images = list(image.to(DEVICE) for image, _ in batch)
targets = []
for _, target in batch:
boxes = []
labels = []
for annotation in target:
bbox = annotation['bbox']
# Convert from [x, y, width, height] to [xmin, ymin, xmax, ymax]
xmin = bbox[0]
ymin = bbox[1]
xmax = bbox[0] + bbox[2]
ymax = bbox[1] + bbox[3]
boxes.append([xmin, ymin, xmax, ymax])
labels.append(annotation['category_id']) # Use 'category_id' from COCO

targets.append({
    'boxes': torch.as_tensor(boxes, dtype=torch.float32).to(DEVICE),
    'labels': torch.as_tensor(labels, dtype=torch.int64).to(DEVICE)
})

return images, targets```

#

I use my data already processed with roboflow its pretty much like this: DATA_DIR = '/content/oreo-1' # Replace with the actual path TRAIN_ANNOTATION_FILE = os.path.join(DATA_DIR, 'train_annotation/_annotations.coco.json') TRAIN_IMAGE_DIR = os.path.join(DATA_DIR, 'train') # Adjust as needed VAL_ANNOTATION_FILE = os.path.join(DATA_DIR, 'val_annotation/_annotations.coco.json') VAL_IMAGE_DIR = os.path.join(DATA_DIR, 'valid') # Adjust as needed NUM_CLASSES = 122 # Replace with the number of classes in your dataset (e.g., 80 for COCO) BATCH_SIZE = 4 LEARNING_RATE = 0.001 NUM_EPOCHS = 175 DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu') CONFIDENCE_THRESHOLD = 0.5 IOU_THRESHOLD = 0.5

verbal oar May 11, 2025, 5:23 PM

#

hmm I thought collate is for text but its for images too why?

#

I see maybe too much of epochs

#

hmm I think not, resnet is for small images

#

must resize them

#

also it is longer to process bigger images

#

dont remember what size was sth like 32 or 64, check it

verbal oar May 11, 2025, 6:21 PM

#

is temperature of llm related to simulated annealing?

serene scaffold May 11, 2025, 7:13 PM

#

verbal oar is temperature of llm related to simulated annealing?

In both contexts, the "temperature" is just a probability that the model will, intentionally, make a sub optimal decision

#

A higher temperature is a higher likelihood that the model will do that.

If you've taken chemistry, you'll remember that higher temperature systems are more chaotic, and stuff.

glossy canyon May 11, 2025, 11:21 PM

#

Hey guys, is there anyone who has an idea on how to create custom tokenization for OCR or how to train Tesseract using custom datasets?

rich moth May 12, 2025, 1:22 AM

#

glossy canyon Hey guys, is there anyone who has an idea on how to create custom tokenization f...

Which version of tesseract are you planning on using. Theres a lot going on here. You might want to really research this. Maybe gooogle it and have a chat with AI about brainstorming methods.

#

I mean the answer is right there. Dont ask us I feel, you should have been showing us results by now!

#

But what insight do you seek that you that you probably can find yourself?

#

What I've found in my research is we all learn differently. If we treat humans as individual datasets, we all have our unique learning methods, some even hidden.

#

😄

rich moth May 12, 2025, 3:07 AM

#

#

I finally got this visual working!

#

Its the culmination of processing of 62 data sets from 4 domains the 141k+ samples

lapis sequoia May 12, 2025, 3:56 AM

#

can a data analyst please reach out to me? i'm struggling in my internship with no mentor.

outer cloak May 12, 2025, 5:25 AM

#

yo @lapis sequoia

#

join the voice chat

#

"Voice Chat 0" ok

odd meteor May 12, 2025, 7:01 AM

#

plush kettle Here is my collate_fn: ```def collate_fn(batch): images = list(image.to(DEVICE...

I think ResNet requires resizing your image to 224x224

odd meteor May 12, 2025, 7:09 AM

#

lapis sequoia can a data analyst please reach out to me? i'm struggling in my internship with ...

If you had mentioned what you're struggling with in your internship someone who have most likely responded by now.

You can also use the #career-advice channel to get advise on how to surmount the challenge you're facing.

#

Meanwhile, who else is submitting their work to NeurIPS? 😛

glossy canyon May 12, 2025, 7:47 AM

#

rich moth Which version of tesseract are you planning on using. Theres a lot going on her...

Thanks for the honest feedback! I agree that most of this info is out there, and I have been researching it—I’m actually experimenting with both a custom CRNN model and now considering Tesseract 5.4.1 for comparison.
My main challenge is fine-tuning Tesseract on a custom script like Balochi, especially around generating accurate training data and understanding the lstmtraining process. I’m not looking for spoon-fed answers—just hoping someone might have hands-on experience or tips that could help speed things up a bit.
And you're totally right—we all learn differently. For me, discussing things out loud (or in chat) helps uncover blind spots and validate whether I'm on the right track. 😄

verbal oar May 12, 2025, 11:41 AM

#

yes youre right 224x224, I checked in search

#

it was 224 not 24, as I guessed 32 or 64 size, my bad
but still relatively small images

verbal oar May 12, 2025, 11:58 AM

#

I meant I thought it is 24x24 nvm

#

ah maybe you want to fine tune it, then its different thing

lapis sequoia May 12, 2025, 3:57 PM

#

Hello everyone, to study ai/ml and robotics do you need to learn about electricity and how does it work (asking as a self-taught programmer)

agile cobalt May 12, 2025, 4:13 PM

#

for ai/ml not really
for robotics you'll likely want to have a descent notion of physics though

lapis sequoia May 12, 2025, 4:52 PM

#

Aha so it's like i don't need to study it deeply right?

woven stream May 12, 2025, 5:33 PM

#

Nah I mean my CS course has robotics in it and it covers kinematics briefly and control (PID / MPC), Reinforcement learning, Markov decision processes, some sensor stuff and just general deep learning

heavy crow May 12, 2025, 7:45 PM

#

If I have large amounts of data ~5million training points for a relatively small CNN with 0.4M params, should I be running the full dataset per epoch or only a subset? How would I estimate the number of batches per epoch to try?

#

Obviously if I run the full dataset per epoch my LR scheduler will kick in way later, but are there other benefits?

serene scaffold May 12, 2025, 7:47 PM

#

heavy crow If I have large amounts of data ~5million training points for a relatively small...

an epoch is defined as a full pass over the dataset.

heavy crow May 12, 2025, 7:48 PM

#

okay, thank you! Is it at all common practice to have a LR scheduler act within an epoch?

serene scaffold May 12, 2025, 7:50 PM

#

heavy crow okay, thank you! Is it at all common practice to have a LR scheduler act within ...

I'm not sure how common it is, but you get to decide if you want to call it after each batch, or after every n batches, or once at the end of each epoch, or what

heavy crow May 12, 2025, 7:59 PM

#

Okay! thanks for your input 🙂 Looking at some literature on similar networks i see they half the LR every 2*10^5 minibatches which seems like a good starting point

#

When training residual super resolution networks, is mode collapse a problem? I.e mean and var collapsing to zero beause the low and high res images are already close to each other?

serene scaffold May 12, 2025, 8:03 PM

#

idk what that even is

heavy crow May 12, 2025, 8:07 PM

#

For image super resolution you can increase efficiency by upscaling the image with normal upscalers such as bicubic or lanczos and then learn a delta that gets added to this.

#

instead of using transposed convolutions to reach a higher resolution output from the low res input

#

https://openaccess.thecvf.com/content_cvpr_2017_workshops/w12/papers/Lim_Enhanced_Deep_Residual_CVPR_2017_paper.pdf

I belive this is the paper that introduced the approach initially.

#

Okay they were not the orignal, seems like SRResNet came before them.

final jolt May 12, 2025, 8:46 PM

#

thought id try asking here since stuff like pandas is related but anyone got some experience/recommendation on a python library to convert pdf to possibly csv? Im fine managing the data cleanup itself but looking for other options. I have tried pdfplumber and its not working well. tabula works quite well but it relies on java which im not a fan of needing to have that as part of my app

limpid zenith May 12, 2025, 8:46 PM

#

like OCR libraries?

final jolt May 12, 2025, 8:48 PM

#

limpid zenith like OCR libraries?

not that complex. basically trying to convert pdf bank statements into CSV. whole bunch of junk that isnt needed in there

limpid zenith May 12, 2025, 8:54 PM

#

if you don't mind using LLMs then this seems interesting https://pypi.org/project/llama-parse/

final jolt May 12, 2025, 8:56 PM

#

I did see that one as well as one called ThePipe and not really a fan of feeding private financial data to an LLM, especially if I ever want to release this application for others

agile cobalt May 12, 2025, 9:05 PM

#

final jolt thought id try asking here since stuff like pandas is related but anyone got som...

the two ones I remember out of the top of my head are https://github.com/microsoft/OmniParser and https://pymupdf.readthedocs.io/en/latest/

final jolt May 12, 2025, 9:08 PM

#

agile cobalt the two ones I remember out of the top of my head are https://github.com/microso...

hmm ill take a look at those as well thanks. what input formats I support will be of course heaviliy limited to what formats I need so that will make it more forgiving to try

#

huh tabula creates massive lists of what it parses. and each entry in the list is a table. Well thats notable but kinda annoying

rich moth May 13, 2025, 2:37 AM

#

So I took the UCF stuff and decided to make a trading bot. The idea here is to represent market structure as a complex number, mapping the market into this phase space where I can visualize different regimes way more clearly.

What I did was build these layers that all talk to each other. Like one part figures out what market "regime" we're in (trending, choppy, whatever), another part picks the right strategy for that regime, and another part handles risk. The cool thing is they all continuously adapt, no retraining needed.

sudden delta May 13, 2025, 2:39 AM

#

rich moth So I took the UCF stuff and decided to make a trading bot. The idea here is to ...

if you've got a trading bot, sounds like your career change is all set 😂

rich moth May 13, 2025, 2:49 AM

#

sudden delta if you've got a trading bot, sounds like your career change is all set 😂

I just started it, I imagine its gathering enough data to figure out the current market regimes. I'll know buy this morning. You know.. I hate messing with the trading logic in these things lol

#

I made a bunch of visuals, hoping to have some stuff to share.

rich moth May 13, 2025, 2:51 AM

#

glossy canyon Thanks for the honest feedback! I agree that most of this info is out there, and...

How's it going? Any luck on this stuff?

final cobalt May 13, 2025, 4:25 AM

#

https://github.com/lucaswalkeryoung/Diffusion-2/blob/master/Diffusers/DDPM.py

GitHub

Diffusion-2/Diffusers/DDPM.py at master · lucaswalkeryoung/Diffusi...

Contribute to lucaswalkeryoung/Diffusion-2 development by creating an account on GitHub.

#

Though y'all might like this

obtuse acorn May 13, 2025, 11:05 AM

#

any idea if its possible to resize the squares in a seaborn or plotly heatmap?

#

basically i saw this and thought it could be handy
the code for it is at https://github.com/ChawlaAvi/Daily-Dose-of-Data-Science/blob/main/Plotting/Size-encoded-heatmaps.ipynb

GitHub

Daily-Dose-of-Data-Science/Plotting/Size-encoded-heatmaps.ipynb at ...

A collection of code snippets from the publication Daily Dose of Data Science on Substack: http://www.dailydoseofds.com/ - ChawlaAvi/Daily-Dose-of-Data-Science

#

i got it working but it doesnt look right

#

radiant cipher May 13, 2025, 11:07 AM

#

Anyone aware of something one can let loose on a set of gut repos with a task and getting plans/code changed out of it

obtuse acorn May 13, 2025, 11:08 AM

#

obtuse acorn

like the squares dont line up properly

final jolt May 13, 2025, 12:09 PM

#

radiant cipher Anyone aware of something one can let loose on a set of gut repos with a task an...

what do you mean by 'plans' like are you just wanting to see recent changes to the repo? Git basically shows you all of that in history and such

radiant cipher May 13, 2025, 12:15 PM

#

final jolt what do you mean by 'plans' like are you just wanting to see recent changes to ...

I want to submit tasks with example code patterns and fixes and get reports on occurrence and or create prs with fixes

#

So an agent system to apply global changes to about 100 git repos

final jolt May 13, 2025, 12:26 PM

#

So you want to create like a report against a bunch of repos based on a code pattern you provide, like you are looking for vulnerabilities or improper code blocks and then change them?

radiant cipher May 13, 2025, 12:28 PM

#

final jolt So you want to create like a report against a bunch of repos based on a code pat...

@final jolt exactly

final jolt May 13, 2025, 12:32 PM

#

radiant cipher <@308963648636715011> exactly

Well thats a lot of prep to do. The only things I have encountered like that are custom things in a corporate environment. What I would suggest starting with would be simple scripts that can do pattern matching for some example code you are trying to look for. Once you get that working checking a bunch of repos will be the easier part (automatically editing them is another matter though)

agile cobalt May 13, 2025, 12:48 PM

#

radiant cipher So an agent system to apply global changes to about 100 git repos

The least worst is probably something like https://github.com/All-Hands-AI/OpenHands or just a custom RAG system, but in general agents are nowhere near reliable enough to do that well yet

#

even for a single git repo you'd get mixed results, let alone 100 at once

final jolt May 13, 2025, 12:56 PM

#

agile cobalt even for a single git repo you'd get mixed results, let alone 100 at once

yea very much this which is why Ive only seen this type of thing done custom at any level of scale. And even then its not something that goes out and actively change repos but more just data integrity, syntax and format checking and validating. And most of the time as part of the commit workflows because that is the best place to fix issues like that

#

imo

austere shore May 13, 2025, 2:48 PM

#

Can anyone teach me AI/ML

final jolt May 13, 2025, 2:52 PM

#

austere shore Can anyone teach me AI/ML

dont post the same message in multiple channels. Also someone responded to you in #python-discussion already earlier

austere shore May 13, 2025, 2:52 PM

#

Oh

#

I'm sorry

#

But can you teach me

final jolt May 13, 2025, 2:53 PM

#

I cannot

austere shore May 13, 2025, 2:54 PM

#

Oh

plush kettle May 13, 2025, 5:13 PM

#

Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.
Using 'backbone_name' and 'weights' as positional parameter(s) is deprecated since 0.13 and may be removed in the future. Please use keyword parameter(s) instead.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-30-5af1204afd6e> in <cell line: 0>()
----> 1 model = create_retinanet_model()

3 frames
/usr/local/lib/python3.11/dist-packages/torchvision/models/detection/backbone_utils.py in <lambda>(kwargs)
     63     weights=(
     64         "pretrained",
---> 65         lambda kwargs: _get_enum_from_fn(resnet.__dict__[kwargs["backbone_name"]])["IMAGENET1K_V1"],
     66     ),
     67 )

TypeError: unhashable type: 'list'``` I got this error when executing this method ```def create_retinanet_model():
    # Load a pre-trained ResNet50 backbone
  backbone = torchvision.models.resnet50(pretrained=True)
  test = list(backbone.children())[:-2]
  backbone = torch.nn.Sequential(*test)  # Remove the last two layers

    # Input channels for the feature pyramid network.  Resnet50 outputs 2048
  in_channels_list = [2048, 1024, 512]  # Channels for P3, P4, P5

    # Output channels for FPN
  out_channels = 256

    # Create the feature pyramid network.
  fpn = resnet_fpn_backbone(in_channels_list,out_channels)

    # 91 because of the background class
  num_classes = NUM_CLASSES
    # Anchor generator
  anchor_generator = AnchorGenerator(
      sizes=((32,), (64,), (128,), (256,), (512,)),
      aspect_ratios=((0.5, 1.0, 2.0),) * 5
  )
    # Put anchor generator inside the model
  model = RetinaNet(backbone,
                    num_classes=num_classes,
                    fpn=fpn,
                    anchor_generator=anchor_generator)
  return model``` why is this?

#

I just want to remove the two last layers of torchvision Resnet50 backbone

radiant cipher May 13, 2025, 5:34 PM

#

final jolt yea very much this which is why Ive only seen this type of thing done custom at ...

Its a bit of a stretch

Im looking for sensible building blocks so i can set it up as something iterative with feedback

final jolt May 13, 2025, 6:20 PM

#

radiant cipher Its a bit of a stretch Im looking for sensible building blocks so i can set it...

Well functionally you are asking for something very large and complex that doesnt really exist and is very difficult to do is what we are saying. So the better approach would be to simplify your goal to what is more feasible and start there and then try to expand on it as you improve its functionality

radiant cipher May 13, 2025, 6:30 PM

#

final jolt Well functionally you are asking for something very large and complex that doesn...

Starting point would of course be one repo at a time

There's be research/locate possibly some ondexing and eventually something that fires that chain at all the repos like a madman

I lack familiarity with concrete building blocks wrt Ai running knowledge store and state management

final jolt May 13, 2025, 7:38 PM

#

anyone familiar with pymupdf?

serene scaffold May 13, 2025, 7:49 PM

#

final jolt anyone familiar with `pymupdf`?

please remember to always--every time--ask your actual question. please never ask "does anyone know about x". just ask your actual question about x, and people will know it's about x from reading it.

final jolt May 13, 2025, 7:49 PM

#

yea sorry. got sidetracked lol

#

basically getting this error

  File "D:\scripts\pybudget\pdf_convert.py", line 57, in <module>
    pymu_pdf(pdf_path, csv_path)
  File "D:\scripts\pybudget\pdf_convert.py", line 35, in pymu_pdf
    pprint(tabs[0].extract())
TypeError: 'module' object is not callable```
when trying to just extract tables from a pdf.  One off the table parsing works but trying to iterate it is failing
```py
def pymu_pdf(pdf_path, csv_path):
    pdf = pymupdf.open(pdf_path)
    print(f"Total pages: {len(pdf)}")
    for pages in pdf: 
        if pages.number == 2:
            tabs = pages.find_tables(strategy="text")
            if tabs.tables:
                pprint(tabs[0].extract())```

serene scaffold May 13, 2025, 7:53 PM

#

final jolt basically getting this error ```Traceback (most recent call last): File "D:\s...

this actually isn't a pymupdf problem. it's a naming problem. so remember to never ask "does anyone know about x", because your problem might actually have nothing to do with x.

did you just do import pprint?

final jolt May 13, 2025, 7:54 PM

#

yup and you are right the example is wrong on their docs

serene scaffold May 13, 2025, 7:54 PM

#

because pprint is a module that contains a function that's also named pprint. so if you do import pprint, then pprint is a module. if you do from pprint import pprint, then it's a function

#

I recommend doing import pprint as pp and then pp.pprint. that way it's never a mystery which one pprint is.

final jolt May 13, 2025, 7:55 PM

#

yup that was the issue, I was originally doing print and had errors so went back to the example to test and missed the pprint from pprint

serene scaffold May 13, 2025, 7:55 PM

#

there was actually a PEP that could have fixed this, but it was rejected

final jolt May 13, 2025, 7:56 PM

#

heh, bummer, and yea I have been good about just asking until this time heh. thanks for the info

#

now I can try this again with pprint to see if this works. However either way this is major progress as I was trying with tabula before and it was very cumbersome

final jolt May 13, 2025, 8:15 PM

#

oh that is soooo much better

obtuse acorn May 13, 2025, 8:35 PM

#

obtuse acorn

got it

#

i basically drew white rectangles above the heatmap then drew colored scaled rectangles above those

scenic parcel May 14, 2025, 4:22 AM

#

serene scaffold there was actually a PEP that could have fixed this, but it was rejected

It would have prevented a module and a function from having the same name?

scenic parcel May 14, 2025, 4:23 AM

#

serene scaffold I recommend doing `import pprint as pp` and then `pp.pprint`. that way it's neve...

Why would you ever import pprint instead of from pprint import pprint though? Similar to datetime.datetime

dull mortar May 14, 2025, 4:25 AM

#

im not sure if this comes under this channel but

how are numpy arrays structured? like how does it compare to matrices and their notation? (is a 3x4 matrix the same as a numpy array with shape (3, 4)? will using functions like np.dot() on such an array yield the same results as the same operation on a 3x4 matrix (and a 4x1 vector)?)

im asking specifically for like visualisation. in math usually the first number corresponds to the number of rows and im basically just wondering if thats the same for numpy. (and if itll work the same for matrix operations)

sorry if my phrasing is slightly off. kind of new to both linear algebra and numpy

wooden sail May 14, 2025, 4:50 AM

#

dull mortar im not sure if this comes under this channel but how are numpy arrays structure...

numpy ndarrays with 2 dimensions work just like matrices, yes

dull mortar May 14, 2025, 5:08 AM

#

wooden sail numpy ndarrays with 2 dimensions work just like matrices, yes

thank you! for arrays larger than that, does the order go from the highest order array to the lowest (im not sure if im saying that right)? like would a size of (2, 3, 4) mean 2 slices, 3 rows, and 4 columns?

wooden sail May 14, 2025, 6:05 AM

#

dull mortar thank you! for arrays larger than that, does the order go from the highest order...

yeah that should be the case considering numpy is row-major by default. if you rely on functions like np.dot for multidimensional operations, the default behavior will work by treating the last 2 dimensions as rows and columns

rare bane May 14, 2025, 10:41 AM

#

rare bane May 14, 2025, 10:42 AM

#

rare bane

So you can view it if you like...I accept constructive criticism

spring field May 14, 2025, 11:23 AM

#

serene scaffold I recommend doing `import pprint as pp` and then `pp.pprint`. that way it's neve...

why not pp.pp 😁

#

4-p

final jolt May 14, 2025, 12:20 PM

#

scenic parcel Why would you ever import pprint instead of from pprint import pprint though? Si...

yea I just did it as from pprint import pprint as pp since I didnt see any reason to just import pprint and then have to do pprint.pprint()

#

Now if I could get pymupdf to be more consistent that would be cool. parsing pdfs suck for sure.

spring field May 14, 2025, 12:26 PM

#

final jolt yea I just did it as `from pprint import pprint as pp` since I didnt see any rea...

are you aware of pprint.pp ?

final jolt May 14, 2025, 12:28 PM

#

ah no, I also never really use pprint and only did here because I was following some docs. I dont functionally need pprint for anything at the end of the day

long locust May 14, 2025, 1:57 PM

#

!timeout 1079012483290890321 spam

arctic wedgeBOT May 14, 2025, 1:57 PM

#

:incoming_envelope: :ok_hand: applied timeout to @slow sleet until <t:1747234634:f> (1 hour).

obtuse acorn May 14, 2025, 1:58 PM

#

any idea when to use standard scaler vs minmax scaler in scikit learn?

final jolt May 14, 2025, 2:22 PM

#

So short version I am trying to use matplot to display gridlines on a page rendered from a pdf. I got all that working however I am trying to adjust the grid line spacing with no success. I thought the correct parameter was markevery but that seems to just be for an actual graph

    DPI = 150
    pix = chosen_page.get_pixmap(dpi=DPI)
    img = np.ndarray([pix.h, pix.w, 3], dtype=np.uint8, buffer=pix.samples_mv)
    plt.figure(dpi=DPI)  # set the figure's DPI
    plt.title("title")  # set title of image
    plt.grid(True)
    plt.grid(markevery=10)
    plt.grid(color='gray', linestyle='--', linewidth=0.5)
    _ = plt.imshow(img, extent=(0, pix.w * 72 / DPI, pix.h * 72 / DPI, 0))
    plt.show()```
code snipper here. the gridlines seem to default to every 100
*edit* I never got this to work but bruteforced what I needed but I am curious how to make this work if anyone wants to weigh in

charred ferry May 14, 2025, 2:23 PM

#

Hi guys, basically if i wanted to do a final year project that combined data analytics and machine learning, do u guys have any good resources i can use to study to get a basic understanding of both? Idk which channel to ask this. I been looking for video tutorials and rsources myself but additional resources from other people would be useful.

final jolt May 14, 2025, 2:24 PM

#

!res

arctic wedgeBOT May 14, 2025, 2:24 PM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

final jolt May 14, 2025, 2:24 PM

#

have you checked in that page? I think there was a section for ML and such but I could be wrong

charred ferry May 14, 2025, 2:26 PM

#

oh thanks

final jolt May 14, 2025, 2:26 PM

#

hmm looks like more generalized stuff there but could be a good starting point. Certainly some YT channels have other video series for more specific topics as well

charred ferry May 14, 2025, 2:26 PM

#

final jolt hmm looks like more generalized stuff there but could be a good starting point. ...

yaeah

#

thanks for the help tho

stiff crown May 14, 2025, 2:36 PM

#

interesting

final jolt May 14, 2025, 3:20 PM

#

I swear this join_tolerance doesnt do jack in PyMuPDF, heh

final jolt May 14, 2025, 4:05 PM

#

ok pages.search_for() is goat

verbal oar May 14, 2025, 6:54 PM

#

if there is crewai ok but what else to understand how it works?

#

crewai is very high-level is there sth more verbose?

agile cobalt May 14, 2025, 6:57 PM

#

verbal oar if there is crewai ok but what else to understand how it works?

I don't get what you are asking whatsoever?

#

what exactly are you trying to understand

#

if you want to understand how the models themselves work, look up Andrej Karpathy's tutorials

if you want to understand how to build a simple agent using an LLM, use the HuggingFace Transformers library directly

if you want to orchestrate multiple agents more manually, either still just Transformers or PydanticAI / LangChain

verbal oar May 14, 2025, 6:59 PM

#

in crewai you just prompting what agent must do and task description

#

but this works under the hood I assume some tokenization and nlp related things

#

maybe more manually ok so langchain then

agile cobalt May 14, 2025, 7:01 PM

#

langchain still abstracts away all "nlp related things"

if you want to see how tokenization works then start from Andrej Karpathy's videos

verbal oar May 14, 2025, 7:01 PM

#

I know tokenization

#

ok so its just llm agents this how should I think

agile cobalt May 14, 2025, 7:04 PM

#

most "Agents" just replace all NLP techniques for one giant blackbox (the LLM), instructions goes in then text, a tool call or a well formatted object comes out

verbal oar May 14, 2025, 7:04 PM

#

I see thanks, now more clear

#

so this is like learn llm
then learn llm agents
order

agile cobalt May 14, 2025, 7:07 PM

#

depending on where you look you may find normal code mixed in the orchestration though, some of which may or may not be using classical NLP techniques

For example, before sending to an agent you can use a simple metric to determine which agent to send to (or not send to one at all)

agile cobalt May 14, 2025, 7:08 PM

#

verbal oar so this is like learn llm then learn llm agents order

yeah, if you want to mess around with agents or crews I'd strongly recommend understanding what the inputs and outputs of the llms look like

ideally also how the llm (neural network) itself works, but honestly that isn't vital unless you plan to customize the underlying model

final jolt May 14, 2025, 7:18 PM

#

Well finally got this pdf parsing nearly complete and I can get back to the pandas side of thing. Still odd that it splits one of the columns randomly but what are ya gonna do I guess. other than join them I mean

rich moth May 14, 2025, 9:21 PM

#

I gotta fix the Text above, i got an overlapping problem, but man what it captured it awesome. The different unique signatures "fingerprint" of a few crypto coins.

final jolt May 14, 2025, 9:29 PM

#

rich moth I gotta fix the Text above, i got an overlapping problem, but man what it captur...

nice looking graphics though what exactly is the data points being graphed here. complexity of a crpyto coin in what sense?

rich moth May 14, 2025, 9:38 PM

#

final jolt nice looking graphics though what exactly is the data points being graphed here....

hey dude thank you! each dot represents a specific market state at a moment in time.

final jolt May 14, 2025, 9:39 PM

#

rich moth hey dude thank you! each dot represents a specific market state at a moment in ...

ah so you are interpreting like certain aspects of market change as complexity? Thats neat.

#

stuff like number of trades, price changes, etc?

rich moth May 14, 2025, 9:42 PM

#

final jolt ah so you are interpreting like certain aspects of market change as complexity? ...

exactly! instead of just, is the market going up or is it volatile? i'm measuring how structured vs random the behavior is (through phase θ), how typical vs unusual the current conditions are (component A), and the overall energy/difficulty level (magnitude |Φ|)

final jolt May 14, 2025, 9:46 PM

#

rich moth exactly! instead of just, is the market going up or is it volatile? i'm measurin...

Very clever. So is there a way to tell the timing of each data point is this more to get a better feeling of overall "volatility" of a coin in general over a given day? a fingerprint as you called it.

rich moth May 14, 2025, 9:54 PM

#

final jolt Very clever. So is there a way to tell the timing of each data point is this mo...

each dot is representing 1 min of market state over 30minutes, but rather than tracking volatility over time, im mapping their distribution in "complexity space" to hopefully reveal their structural patterns.

#

thank you 😄

rich moth May 14, 2025, 11:00 PM

#

i was thinking why not make it a continuous feedback loop also.

#

Its already learning..just in a new type of way

heavy knot May 14, 2025, 11:15 PM

#

hey, I use homebrew as my installer and I'm trying to install flake8 in jupyterbooks. Anyone have any experience in doing so cause I can't get brew to recognized jupyterlab-flake8

vague trout May 14, 2025, 11:18 PM

#

I've done foundational courses (andrew ng) and more deeplearning.ai courses but i don't understand how to start a project or what do I do, I have 0 practical knowledge. where do I start practicing ml?

rich moth May 14, 2025, 11:57 PM

#

OK so I made it automatically tunes strategy parameters every 4 hours, it analyzes win rates and profit factors for each strategy, Underperforming strategies get parameter adjustments (tighter stops, adjusted take profits), Outperforming strategies get optimized for even better results and Cooling periods prevent over-adjustment.

Every 24 hours it builds and updates the "fingerprints".
It clusters UCF states and analyzes performance by cluster and creates an asset specific "memory" f which complexity states are profitable, which in turn influces future trading via confidence adjustments and postion sizing.

I added realtime feedback stuff to boost oreduce confidence based on histroical perofrmance in similar states, it adjusts confidewnce when phase alignment is strongf and modifies position sizing base on histroical profiatblre clusters. Most importantly it saves and loads all these learned adjustments in a pickle which inclues stratergy parameters and the state checkpoints.

iron basalt May 15, 2025, 12:56 AM

#

dull mortar thank you! for arrays larger than that, does the order go from the highest order...

If you were to loop over the elements in order as they are in memory (contiguous access) and then compute the N-dimensional index, the last element of that index would be the fastest changing, and the first the slowest changing.

rich moth May 15, 2025, 1:20 AM

#

https://drive.google.com/drive/folders/1UVWdutaFWTrw9DTeRr6v3uL5NvvTgDLX?usp=sharing

Heres the complexity space images of 12 different crypto coins.

rich moth May 15, 2025, 1:22 AM

#

vague trout I've done foundational courses (andrew ng) and more deeplearning.ai courses but ...

People like to start with kaggle stuff. Try some reinforcement learning games.

vague trout May 15, 2025, 2:43 AM

#

rich moth People like to start with kaggle stuff. Try some reinforcement learning games.

thanks, can you be more specific, I don't know how I will start taking part in competition without practical knowledge

rich moth May 15, 2025, 3:43 AM

#

vague trout thanks, can you be more specific, I don't know how I will start taking part in c...

You can help me tackle this https://www.kaggle.com/competitions/stanford-rna-3d-folding

I made the tool todo it, i just need to write the code.

Stanford RNA 3D Folding

Solve RNA structure prediction, one of biology's remaining grand challenges

#

I want to build something to test this theory

rich moth May 15, 2025, 4:00 AM

#

I would need to first Transform RNA sequence data into a format suitable for the UCF

#

I thinking just adding the logic to the data preprocess for "rna_sequence" domain an potentially tailored to θ calculation, while reusing the N,A,ϵ logic where possible. it actually sounds fun. anyone got any ideas how to visualize this

rich moth May 15, 2025, 5:28 AM

#

Ok I built the pipeline for the RNA 3D structure prediction that uses the UCF to biological sequences. Im using that kaggle data set from the comp. It's basically applying mathematical complexity theory to biological structure prediction. Might be a bit for visuals but im excited 😄

#

heres a couple that came in

#

visuals need work though 😛

serene grail May 15, 2025, 5:38 AM

#

vague trout thanks, can you be more specific, I don't know how I will start taking part in c...

Kaggle has mini-courses that walk you through using notebooks on their website and how to do competitions (the Titanic dataset is used as a tutorial IIRC)
Just Google kaggle learn for the mini-courses
This is the Titanic competition just in case https://www.kaggle.com/competitions/titanic

rich moth May 15, 2025, 6:08 AM

#

Predicted RNA folding pattern visualization

naive matrix May 15, 2025, 8:08 AM

#

I’m trying to learn numpy working with opencv but there’s no good vid in YouTube that teaches about it, please help or give me advice if y’all can

crystal pier May 15, 2025, 11:06 AM

#

naive matrix I’m trying to learn numpy working with opencv but there’s no good vid in YouTube...

Read docs?

And what're you actually trying to work on? Learning library internals won't help if you're not working on a project, you'll end up forgetting the API

naive matrix May 15, 2025, 11:09 AM

#

crystal pier Read docs? And what're you actually trying to work on? Learning library interna...

I’m trying to learn how to make a server and client, so I want a video stream of the client

#

And server

crystal pier May 15, 2025, 11:11 AM

#

Shouldn't be a numpy problem. Do you by any chance mean collecting frames from a stream, say a webcam or rtsp network?

naive matrix May 15, 2025, 11:11 AM

#

Yes webcam

crystal pier May 15, 2025, 11:18 AM

#

naive matrix Yes webcam

Oh you're going to have to expose the RTSP link for your webcam to the opencv cv::VideoCapture API, shouldn't be a herculean task provided I've given some leads already.

It wouldn't be fun if you were just told what to do as well, so go ahead and break things😁

#

Also if you're doing any heavy inference of some sorts on the frames you'd also need to either:

Learn threading, python has the threading module, mutexes, GIL, so on
Or not learn anything and use the Inference library which is a pain to set up dependencies if you don't use a separate venv, you'll probably need some docker experience as well for this one

so I'd just recommend the former, cuz you'll learn things as well from the process

naive matrix May 15, 2025, 11:25 AM

#

Thank you ima chatgpt this to understand cause I don’t understand fullywhat’s a rtsp link and some things u say

#

I appreciate it

torn hill May 15, 2025, 11:28 AM

#

Topic - GLoVE Paper

Hi , so i recently started to read the GLoVE paper and there is this line in it which is confusing me which is "Since vector spaces are inher-
ently linear structures, the most natural way to do
this is with vector differences."

I dont get that how authors get to this conclusion that vector differences is a natural way? is there some logic behind it? or its pure heuristics?

Please tell me if its way less of a context I'll try to explain more

crystal pier May 15, 2025, 11:35 AM

#

I'll make out time to read the paper, but a few lines could help

#

But in this context I'd say that it means two things

#

Vector addition: vectors in a vector space can be added together to form another vector in that same vector space

I.e vectors are closed under additivity, this should be independent of the field, as vector spaces are inherently closed under additivity

#

Scalar multiplication: vectors in a vector space can be scaled to get other vectors within that same vector space,

These two properties bring about other linear properties while being linear themselves

E.g distributive properties, additive and multiplicative inverses etc

#

But vector spaces would be non linear under operations like multiplication of vectors by other vectors i.e vector squaring

#

tldr basically; all structures and operations in a vector space just respect linearity,

#

Sorry for the wall of text got slightly too into it 😅

crystal pier May 15, 2025, 11:52 AM

#

torn hill Topic - GLoVE Paper Hi , so i recently started to read the GLoVE paper and ther...

Now when I read the second part it seems more like Euclidean geometry, but I don't know what "the most natural thing" that the paper is doing is

But the vector differences just means that in a vector space all positions are relative, there is no absolution, so if I move my vector origin some (0_1, 0_2, ..., 0_n), all vectors in the same sense are moved, and a vector say x_1 - x_2 would stay the same, so all vector differences stay the same

Make any sense?

iron basalt May 15, 2025, 12:00 PM

#

torn hill Topic - GLoVE Paper Hi , so i recently started to read the GLoVE paper and ther...

It's how you do things relative to each other. If you have some position A as a vector and some position B as a vector, and you want a new vector at B (still same spot), but relative to A (it is the new origin), you can just do B - A.

#

(New vector tip is on B, and tail on A)

torn hill May 15, 2025, 12:03 PM

#

ok i understand the premise of vectors the thing is the authors are suggesting that it makese sense for them if the are taking the difference of two vectors but not addition and am not sure why , maybe if you took a quick read of page 3 of the paper it might make more sense? @crystal pier @iron basalt

iron basalt May 15, 2025, 12:04 PM

#

Example: you have a video game explosive barrel object with 3D position vector, and the player with another 3D position vector. Now to do the game logic, you want the player's position relative to the barrel, so you do player.pos - barrel.pos. Then you can check its magnitude for distance checks like if the player is in explosion hurt radius.

#

They wrote in the paper they want to encode the relative information of the probabilities.

#

First, we would like F to encode the information present the ratio Pik /Pj k in the word vector space.

#

If you use log probabilities, you get a difference instead of division...

#

(It's a morphism)

torn hill May 15, 2025, 12:08 PM

#

iron basalt If you use log probabilities, you get a difference instead of division...

so basically you are saying that log(Pik/Pij) = log(Pik)-Log(P(ij))
and thats why vector difference is making sense

iron basalt May 15, 2025, 12:09 PM

#

torn hill so basically you are saying that log(Pik/Pij) = log(Pik)-Log(P(ij)) and thats wh...

Ratio between probabilities is getting the relative info, and you want to also encode relative info in a vector space, which leaves the natural choice of difference, but yes, you can also get more into it with log probabilities, it makes it a bit more obvious.

#

Any time probabilities are involved, consider log probabilities, they make things way more clear.

torn hill May 15, 2025, 12:13 PM

#

Ratio between probabilities is getting the relative info, and you want to also encode relative info in a vector space, which leaves the natural choice of difference```

Yes this that its a natural choice to use difference , maybe i dont have the intuition yet to also understand this abstractly

Like i understand we want to encode info of a scalar value in vector space but how is that leaving us with a "natural" choice of difference, is my math pretty weak to understand this?

iron basalt May 15, 2025, 12:13 PM

#

torn hill ``` Ratio between probabilities is getting the relative info, and you want to al...

What other choice would you use in a vector space for relative info?

iron basalt May 15, 2025, 12:14 PM

#

iron basalt Example: you have a video game explosive barrel object with 3D position vector, ...

Consider this example.

#

It's physical, much less abstract.

torn hill May 15, 2025, 12:15 PM

#

iron basalt It's physical, much less abstract.

yes this actually helped in understanding the relative premise

iron basalt May 15, 2025, 12:16 PM

#

The difference vector on an abstract level encodes the relative information between the entities.

torn hill May 15, 2025, 12:16 PM

#

ok i think i understand this but while we are on the topic can you help me one more aspect?
So the paper further said that - While F could be taken to be a complicated func- tion parameterized by, e.g., a neural network, do- ing so would obfuscate the linear structure we are trying to capture

#

I tried to make sense that why neural networks wasnt the first choice here and this is what i ended up with - The GloVe paper emphasizes that while a neural network could have been used to learn word embeddings and might produce good results, such models often act as black boxes. This means they provide embeddings without a clear understanding of why certain relationships emerge. In contrast, GloVe is built on explicit statistical information derived from word co-occurrence counts, making its embeddings more interpretable. This aligns with the goal of enabling meaningful vector arithmetic (e.g., king - man + woman ≈ queen) and revealing transparent relationships between word vectors based on how frequently words appear together in a corpus.

#

is this making sense?

#

the problem was LHS and RHS were not equal , LHS was vector and RHS was scalar

iron basalt May 15, 2025, 12:17 PM

#

Simply it would mess up your ability to do simple vector operations that you want to be able to do with words.

torn hill May 15, 2025, 12:18 PM

#

iron basalt Simply it would mess up your ability to do simple vector operations that you wan...

Yeah the crux

iron basalt May 15, 2025, 12:18 PM

#

Because networks scramble things.

crystal pier May 15, 2025, 12:18 PM

#

oh hey I'm back

torn hill May 15, 2025, 12:18 PM

#

ok so i guess i am on the right path

torn hill May 15, 2025, 12:18 PM

#

crystal pier oh hey I'm back

well squiggle did solved my doubt although turns out it was pretty dumb

crystal pier May 15, 2025, 12:18 PM

#

I'm guessing sure squiggle has this on lock

torn hill May 15, 2025, 12:19 PM

#

@iron basalt Thanks man

iron basalt May 15, 2025, 12:19 PM

#

torn hill well squiggle did solved my doubt although turns out it was pretty dumb

It's not dumb, when papers just say stuff like this is "natural," it's begging to be questioned.

#

As it's often not the best choice.

torn hill May 15, 2025, 12:20 PM

#

iron basalt It's not dumb, when papers just say stuff like this is "natural," it's begging t...

so are there forums out there where people questions or tear down a paper?

iron basalt May 15, 2025, 12:20 PM

#

torn hill so are there forums out there where people questions or tear down a paper?

Yes, a lot.

#

It's also just part of the academic process.

torn hill May 15, 2025, 12:21 PM

#

hmm soooooo where should i look at, haha am kinda new to this

iron basalt May 15, 2025, 12:22 PM

#

You are already in Yannic's discord, they cover papers there every week in a call, and people there cover papers all the time in chat.

torn hill May 15, 2025, 12:23 PM

#

iron basalt You are already in Yannic's discord, they cover papers there every week in a cal...

ahh yes, I do join them occasionally

plush kettle May 15, 2025, 5:20 PM

#

Guys I need help

#

So I am trying object detection with keras resnet50, here is how i prepared my data: ```def parse_tfrecord(example_proto):
feature_description = {
'image/encoded': tf.io.FixedLenFeature([], tf.string),
'image/object/bbox/xmin': tf.io.VarLenFeature(tf.float32),
'image/object/bbox/xmax': tf.io.VarLenFeature(tf.float32),
'image/object/bbox/ymin': tf.io.VarLenFeature(tf.float32),
'image/object/bbox/ymax': tf.io.VarLenFeature(tf.float32),
'image/object/class/label': tf.io.VarLenFeature(tf.int64),
}

parsed_features = tf.io.parse_single_example(example_proto, feature_description)

# Decode and preprocess the image

image = tf.image.decode_jpeg(parsed_features['image/encoded'], channels=3)
#image = tf.image.resize(image, [HEIGHT, WIDTH])
image = tf.cast(image, tf.float32) / 255.0
labels = tf.sparse.to_dense(parsed_features['image/object/class/label'])

return image, labels def get_object_detection_dataset(tfrecords_dir, batch_size):
files = tf.io.gfile.glob(tfrecords_dir)
dataset = tf.data.TFRecordDataset(files)
dataset = dataset.map(parse_tfrecord)
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
return dataset```

#

and I built my model like this: ```def build_resnet50_fpn_backbone(input_shape=(640, 640, 3), weights='imagenet', include_top=False):
"""
Builds a ResNet50 backbone with a Feature Pyramid Network (FPN) for object detection.

Args:
    input_shape (tuple): The shape of the input images (height, width, channels).
    weights (str): The weights to load for the ResNet50 model.
        'imagenet' for pre-trained weights on ImageNet, or None for random initialization.
    include_top (bool): Whether to include the top (fully connected) layers of ResNet50.
        For feature extraction, this should be False.

Returns:
    tf.keras.Model: A Keras model representing the ResNet50 FPN backbone.  The model
                    has multiple outputs, which are the feature maps from the FPN levels (C3, C4, C5).
"""
# Ensure valid input shape
if input_shape is None or len(input_shape) != 3 or input_shape[2] != 3:
    raise ValueError("Input shape must be a tuple of (height, width, 3).")

# Ensure channels_last data format
tf.keras.backend.set_image_data_format("channels_last")

# Load ResNet50, excluding the top (fully connected) layers
resnet50 = ResNet50(
    include_top=include_top,
    weights=weights,
    input_shape=input_shape
)

# Get the outputs of the intermediate layers we need for FPN.  These are
# the activations before the pooling layers.
c3_output = resnet50.get_layer('conv3_block4_out').output  # Shape: (None, 80, 80, 512) for 640x640 input
c4_output = resnet50.get_layer('conv4_block6_out').output  # Shape: (None, 40, 40, 1024) for 640x640 input
c5_output = resnet50.get_layer('conv5_block3_out').output  # Shape: (None, 20, 20, 2048) for 640x640 input

# FPN layers.  These layers take the output of the ResNet stages and combine them
# to create feature maps at multiple scales.  This helps with detecting objects
# of different sizes.
# P5 is initialized directly from C5
p5 = layers.Conv2D(256, (1, 1), name='P5')(c5_output) # (None, 20, 20, 256)
# Upsample P5 and add it to C4
p4 = layers.Add(name='P4_add')([
    layers.Conv2D(256, (1, 1), name='P4_conv1')(c4_output), # (None, 40, 40, 256)
    layers.UpSampling2D(size=(2, 2), name='P4_upsample')(p5), # (None, 40, 40, 256)
])
p4 = layers.Conv2D(256, (3, 3), padding='same', name='P4_conv2')(p4) # (None, 40, 40, 256)

# Upsample P4 and add it to C3
p3 = layers.Add(name='P3_add')([
    layers.Conv2D(256, (1, 1), name='P3_conv1')(c3_output), # (None, 80, 80, 256)
    layers.UpSampling2D(size=(2, 2), name='P3_upsample')(p4), # (None, 80, 80, 256)
])
p3 = layers.Conv2D(256, (3, 3), padding='same', name='P3_conv2')(p3) # (None, 80, 80, 256)

# P6 and P7 are created by downsampling P5
p6 = layers.Conv2D(256, (3, 3), strides=2, padding='same', name='P6')(p5) # (None, 10, 10, 256)
p7 = layers.Conv2D(256, (3, 3), strides=2, padding='same', name='P7')(p6) # (None, 5, 5, 256)

# Define the model with multiple outputs

model = Model(inputs=resnet50.input, outputs=[p3, p4, p5, p6, p7])

#model = Model(inputs=resnet50.input, outputs=feature_map)
return model```

#

model = build_resnet50_fpn_backbone(input_shape=input_shape)``` ```losses = {'classification_output': 'sparse_categorical_crossentropy',
          'bbox_output': 'mse'
        }```  ```optimizer = Adam(learning_rate=0.001)
model.compile(optimizer=optimizer, loss=losses) # Two losses: bbox and class```

#

model.fit(training_data, epochs=NUM_EPOCHS, validation_data=validation_data) when I tried that fit, this came up: y_true and y_pred have different structures. y_true: * y_pred: ['*', '*', '*', '*', '*']

#

Does anyone have any idea how could this happen

verbal oar May 15, 2025, 6:47 PM

#

debug it

main nymph May 15, 2025, 6:50 PM

#

done

#

i debugged it

verbal oar May 15, 2025, 6:50 PM

#

outputs=[p3, p4, p5, p6, p7]
['*', '*', '*', '*', '*']

main nymph May 15, 2025, 6:50 PM

#

by print("debug code")

#

(this is all i know in python)

verbal oar May 15, 2025, 6:51 PM

#

but "easier" to put some breakpoints and watch variables

#

with debugger

#

oh as I thought you can also add verbose param to fit

#

verbose: "auto", 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch. "auto" becomes 1 for most cases. Note that the progress bar is not particularly useful when logged to a file, so verbose=2 is recommended when not running interactively (e.g., in a production environment). Defaults to "auto".

charred estuary May 15, 2025, 7:27 PM

#

For anyone looking to make their own dataset for their AI's lmk what you think of my project and I would love to see if you build off of it:
https://github.com/Tyguy047/Cluster-Dataset-Builder

GitHub

GitHub - Tyguy047/Cluster-Dataset-Builder

Contribute to Tyguy047/Cluster-Dataset-Builder development by creating an account on GitHub.

limpid dew May 16, 2025, 12:15 AM

#

what are you trying to do with that

toxic pilot May 16, 2025, 2:39 AM

#

plush kettle ```model.fit(training_data, epochs=NUM_EPOCHS, validation_data=validation_data)`...

check the their dimmensions; looks like one is a vector and the other is a scalar

charred estuary May 16, 2025, 3:15 AM

#

limpid dew what are you trying to do with that

make your own dataset program. tweak it, refine it idc

limpid dew May 16, 2025, 3:16 AM

#

like pandas?

charred estuary May 16, 2025, 3:16 AM

#

limpid dew like pandas?

no

#

read the README

limpid dew May 16, 2025, 3:20 AM

#

the readme isnt very descriptive, it's some llm training tool?

charred estuary May 16, 2025, 3:21 AM

#

I think it includes what it needs to. You build your own dataset to train or fine-tune a model. It's automated and generates data from multiple AI models to avoid inbreeding data. || @limpid dew ||

limpid dew May 16, 2025, 3:24 AM

#

have you build an LLM with it?

charred estuary May 16, 2025, 3:27 AM

#

limpid dew have you build an LLM with it?

Running it doesn't build you an LM you use the dataset it generates paired with your train.py file to train your model. You can modify the script to ask the cluster to only generate data that will help train an AI on python debugging or math or whatever you want.

limpid dew May 16, 2025, 3:29 AM

#

I think you misunderstood my question. That's okay. You say you can use it to train an AI on math? What kind of math? How do you do this?