#data-science-and-ml

toxic pilot
#

Oh nvm lmao. What was it?

#

You don't need to persist it in the context window necessarily right?

#

Like I imagine you could provide the model with some functions, such as write_to_db or read_from_db, and have the todo list retrieved instead
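A minimal sketch of that idea, assuming a tool-calling setup where the model emits a tool name plus JSON arguments. The exact message format depends on the provider, so `dispatch` and the shape of its input are hypothetical. The todo list lives in SQLite, and only what `read_from_db` returns has to go back into the context window:

```python
import json
import sqlite3

# In-memory DB for the sketch; a real setup would use a file or server DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE todos (id INTEGER PRIMARY KEY, item TEXT NOT NULL, done INTEGER DEFAULT 0)")

def write_to_db(item: str) -> str:
    """Tool the model can call to add a todo; returns JSON the model can read."""
    cur = conn.execute("INSERT INTO todos (item) VALUES (?)", (item,))
    conn.commit()
    return json.dumps({"id": cur.lastrowid})

def read_from_db() -> str:
    """Tool the model can call to fetch the current todo list as JSON."""
    rows = conn.execute("SELECT id, item, done FROM todos ORDER BY id").fetchall()
    return json.dumps([{"id": r[0], "item": r[1], "done": bool(r[2])} for r in rows])

TOOLS = {"write_to_db": write_to_db, "read_from_db": read_from_db}

def dispatch(tool_call: dict) -> str:
    # Assumed shape: {"name": "<tool name>", "arguments": {...}}
    fn = TOOLS[tool_call["name"]]
    return fn(**tool_call.get("arguments", {}))
```

The design choice here is that the state lives outside the model. Each turn, the model only sees the tool results it asks for, not the whole history of edits.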

#

Ah smart

#

Hmm tru

#

How'd you do it?

#

Interesting

#

I was thinking maybe maintaining several databases 🤔

#

Maybe running ollama or smth locally?

#

Well not locally obviously but like hosting your own model

#

Oh yeah and it's fantastic at code generation

#

I use it to write all my LaTeX these days 💀

#

Were u the person I talked to about Jax/Flax?

#

So maybe this isn't relevant to you but I've started using it

#

It's actually so good

#

Super speedy, and the syntax is a lot cleaner

#

Unfortunately I've had to implement some of the loss functions myself

#

But it's not too bad ngl

#

Like I had to implement Huber loss a few days ago
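For reference, here is a sketch of Huber loss: quadratic for residuals up to `delta`, linear beyond it. It's written with NumPy so it runs anywhere. The same calls (`abs`, `where`, `mean`) exist in `jax.numpy`, so swapping `np` for `jnp` should give a jit-able version. Optax also appears to ship an `optax.huber_loss` (elementwise, not averaged) that's worth checking before hand-rolling it:

```python
import numpy as np

def huber_loss(pred, target, delta=1.0):
    residual = np.abs(pred - target)
    quadratic = 0.5 * residual ** 2                 # used where |r| <= delta
    linear = delta * (residual - 0.5 * delta)       # used where |r| > delta; matches value and slope at |r| = delta
    return np.mean(np.where(residual <= delta, quadratic, linear))
```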

broken seal
#

hello guys, i am new to the server, i want to start with machine learning, can anyone help me with a good resource to start with.

nocturne saffron
#

how to use conda for data science and machine learning? (sorry I am a beginner)

serene scaffold
nocturne saffron
#

ouh

#

and

#

what do i have to use?

serene scaffold
#

just regular python.

nocturne saffron
#

ok thank you

#

i just use conda for ml

serene scaffold
#

don't do that.

nocturne saffron
#

what do i have to use for ml?

#

They say that conda is ideal for machine learning and data science.

#

i just watched it and believed it 😅

serene scaffold
#

I have never used conda, and I've been doing ML since about 2018.

nocturne saffron
#

ohh

#

do you use regular python?

serene scaffold
#

yes.

nocturne saffron
#

ok thank you @serene scaffold

#

you are very helpful

bold patrol
#

hello guys, i am kinda new to python and data science. i kinda learned the basics of python, but i still have some problems with classes, lists, and adding more extensions to my code. what should i do, and how can i learn better? i wanna become a data scientist, so if someone helps me i would appreciate it

nocturne saffron
# bold patrol hello guys i am kinda new to python and data science i kinda learned the basics ...

you have to learn:
- Descriptive statistics (mean, median, mode, standard deviation, distribution)
- Basic probability (chance, normal distribution, binomial)
- Linear algebra (vectors, matrices, basic operations)
- Basic calculus (derivatives, gradients, for ML later)

and the libraries:
- NumPy: numerical operations and arrays
- Pandas: read, clean, and transform data
- Matplotlib and Seaborn: data visualization

#

oh yea

#

@serene scaffold can i add you?

serene scaffold
nocturne saffron
#

oh

#

ok

bold patrol
nocturne saffron
#

do you know the library?

#

oh yea

#

i forgot to say scikit-learn

jaunty helm
short barn
#

Hello

#

How and with what language can I run ai models on CPU efficiently?

serene scaffold
#

there's no way to run an LLM on a CPU efficiently no matter what.

sharp crow
#

This regression is driving me mad

#

No matter what I do I can't get the R² score above 0.5
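For context on that number: R² = 1 - SS_res / SS_tot. That makes it at most 1 (a perfect fit). A value of 0 means the model does no better than always predicting the mean, and the value goes negative for predictions worse than the mean. Here is a small NumPy sketch of the formula:

```python
import numpy as np

def r2_score(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot
```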

short barn
#

1MB approximately

sharp crow
#

Lads, I have a doubt: my regression model is pretty bad, so I am thinking of converting it into a classification problem. Is this logical?

#

Or I am just dumb

serene scaffold
sharp crow
toxic pilot
#

thats a big oof lmao

sharp crow
sharp crow
#

I was blind

short barn
#

I've heard that pandas is slow (and seen benchmarks)

serene scaffold
short barn
#

And there's a high load expected

short barn
#

I need to run it with zero overhead

serene scaffold
#

there's no such thing as zero overhead.

short barn
#

Ok, with close to zero

#

Something that would run on a potato without major issues

serene scaffold
#

why?

#

what's the larger context here?

sharp crow
short barn
serene scaffold
#

and random forests don't need a GPU.

short barn
#

But I want minimal overhead

#

And probably want to avoid slow pandas

serene scaffold
#

idk where you're getting the idea that pandas is slow, but in either case, pandas is faster with the pyarrow backend than with the numpy one.

toxic pilot
#

supposed to be faster i think?

#

but i feel like most of these tools are already optimized

serene scaffold
#

xgboost is a different algorithm than random forest.

toxic pilot
#

oh is it?

#

i thought it was a gpu implementation of random forest

#

mb

short barn
#

Pandas spends 30 seconds on what polars does in 2 seconds

short barn
serene scaffold
short barn
#

I expect somewhat high load

#

Isn't there something like model runner or so

jaunty helm
#

and the other 2 popular libraries that you see around - lightgbm, catboost - are also GBMs and not RFs
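Here is a small scikit-learn sketch of the difference on synthetic data (the dataset and hyperparameters are arbitrary). A random forest averages independently grown trees (bagging). Gradient boosting adds shallow trees one after another, each fit to the ensemble's current errors; XGBoost, LightGBM and CatBoost all belong to this family. Both run fine on a CPU:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random forest: deep trees on bootstrap samples, trained independently, predictions averaged.
rf = RandomForestRegressor(n_estimators=200, n_jobs=-1, random_state=0).fit(X_train, y_train)

# Gradient boosting: shallow trees trained sequentially; with the default squared-error
# loss, each new tree is fit to the residuals of the ensemble so far.
gbm = GradientBoostingRegressor(n_estimators=200, max_depth=3, random_state=0).fit(X_train, y_train)

rf_r2 = rf.score(X_test, y_test)    # .score returns R² for regressors
gbm_r2 = gbm.score(X_test, y_test)
```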

devout quartz
#

hey guys

#

im a fresher as a cs major

#

i wanna choose data science in ai as my main goal

#

is it a bad option? i hear about its negative sides from different people

#

please give me ur feedback

coral pumice
#

guys i have passion in ai where do i start learning

sharp crow
#

What negative sides are you talking about tho?

broken seal
quartz turret
#

ANYONE DOING JOB?

serene scaffold
quartz turret
serene scaffold
quartz turret
#

i'm hearing it for first time , can you explain what's that?

serene scaffold
quartz turret
#

in the field of ds, is front-end dev necessary to learn? what's your pov

quartz turret
#

what should i cover in the first 3 months of my ds journey

quartz turret
serene scaffold
quartz turret
quartz turret
spring field
#

The domain of AI is soooo broad, so many things you could do
Here's an idea I think would be pretty cool (though I'm slightly biased towards making games in a way, lol)

You could make a strategy game and integrate RULER into it, making the enemy AIs learn using RL, but with a more attractive way of crafting the reward functions
You could record player's actions, then use an LLM to summarize those actions, then use an LLM to devise a strategy to be fed into RULER and then make it learn based on that, then you can set up RAG to store previous player strategies as well or something
I'm sure something can be made in that direction, for example

but like, sooo many options out there

runic parcel
#

I need to wrap a custom function (like an API call) inside a PyTorch nn.Module. Is it hard to do?

serene scaffold
worn atlas
#

hey guys, im pretty new to all this. Im currently a rising sophomore in high school and im wondering what college majors i should look into if im trying to become an ML engineer.

serene scaffold
#

a torch module is like, a layer of a network. I can't think of a situation where it would entail an API call. and it would make it take way way longer to train.
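Mechanically it isn't hard, because `nn.Module.forward` can call any Python function. Here is a hedged sketch in which `external_score` is a hypothetical stand-in for the API call. The catch is the one raised above. The output is rebuilt from plain Python floats, so autograd can't backpropagate through it, and a network call per forward pass would dominate training time:

```python
import torch
from torch import nn

def external_score(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for an arbitrary Python function / remote call.
    values = [float(v) ** 2 for v in x.detach().flatten().tolist()]
    return torch.tensor(values, dtype=x.dtype).reshape(x.shape)  # new tensor, no autograd history

class FunctionWrapper(nn.Module):
    """Wraps any callable so it can sit inside an nn.Sequential or a larger model."""

    def __init__(self, fn):
        super().__init__()
        self.fn = fn

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fn(x)
```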

serene scaffold
worn atlas
#

would that be best over data science or engineering?

serene scaffold
worn atlas
#

ah thank you

dry raft
#

Alright, so I'm working on GNNs for molecular representation, and I found two main types, imo: GNNs that learn from 2D data, and GNNs that work on 3D data. Which should I use for my project? Should I use both or stick to one? (My project is on understanding organic molecules with LLMs, for context)

worthy oasis
# worn atlas hey guys, im pretty new to all this. Im currently a rising sophmore in highschoo...

U should get CS. The reason is that almost every concept used in ML is learned in a CS degree, and it's really useful for ML. I'm currently working on some ML stuff at my job, and some AI concepts that I learned in college really help me understand ML. Also, college is not everything: you should read books and tutorials, and, very fucking important, LEARN HOW TO READ DOCUMENTATION. That really helps at work. Usually at school everybody just types "Wtf is numpy" into Google and skips the official documentation. It can be a little overwhelming, but the sooner you do it, the better for you. Good luck and wishing you success bro!

#

and dont forget: "Understand the basics very well; almost everything in programming is just basic concepts applied to complex problems"

worn atlas
#

Thanks man I appreciate the guidance

quaint mulch
#

generally, most GNNs shouldn't care if the data is 2d or 3d
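To see why, here is a minimal NumPy sketch of one GCN-style message-passing layer (symmetric normalization with self-loops, as in Kipf & Welling). The layer only sees an adjacency matrix and a per-node feature matrix. "2D" versus "3D" mostly changes what goes into the features. One caveat: raw xyz coordinates used as plain features aren't invariant to rotating the molecule, which is why 3D-specific GNNs typically build on interatomic distances or equivariant layers instead:

```python
import numpy as np

def gcn_layer(adj, h, w):
    """adj: (n, n) adjacency, h: (n, f_in) node features, w: (f_in, f_out) weights."""
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops so each node keeps its own features
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    norm = d_inv_sqrt @ a_hat @ d_inv_sqrt        # symmetric degree normalization
    return np.maximum(norm @ h @ w, 0.0)          # aggregate neighbours, transform, ReLU
```

The same function accepts a 2D-style feature matrix (e.g. one-hot atom types) or a 3D-style one (atom types plus coordinates). Only `f_in`, and therefore the shape of `w`, changes.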

odd meteor
smoky summit
#

heyyy,does anybody have a github student verified account ???
I really need it

grand minnow
devout quartz
smoky summit
smoky summit
#

no I have to be in a uni in order to get a student account, do you have one??

broken seal
runic parcel
#

requirement

serene scaffold
modest vigil
fresh sluice
#

AI is really a vast field, but thats the thing about it: you can go deep into some areas that may not be that beneficial in the future, and then things can go south for you.....so i was also looking for someone who is in this industry who can guide me on which path i should choose

#

I have some time, so if the path is new, i will get on it....i just dont wanna be left out

pseudo condor
#

Are there any websites that help with learning about neural networks?

dry raft
#

3Blue1Brown videos are very intuitive

#

So I'd start there

solar thistle
#

Is anyone knowledgeable with GRU and transformer conceptually? I (this sounds stupid but it was good for getting the toes wet) have been working on exposing myself more to at least basic ML concepts, so I tried my best to implement an ML solution that can identify single word palindromes. I initially tried an LSTM, then I moved to working on GRU, and more recently have a GRU transformer hybrid, and I learned (probably nothing impressive) a bit about how to think about and abstract the basic ways to structure your dataset to improve a model of that kind. I’ve gotten it to a point to where the current model is 99.88% accurate on a data set of 800 randomly selected as well as generated inputs.

#

But now I’m trying to further understand how to identify what the model is doing over the course of its training, and I’m having a hard time making sense of the heat maps, I’m not sure if I’m generating them right or if I just don’t understand how to read them. Truthfully, I’ve done googling, YouTubing but also have used LLMs to try to get a better understanding and I’m still falling short.

#

I feel like I get how to identify when a model is overfitting and when it’s reached its peak for identifying over training. But I want to better understand the process of what it’s doing, and how I can look at the training data to understand what correlations it’s drawing during the training.

#

If anyone’s got some insight they’d share, welcome to PM me if this isn’t seen until some later time

calm cipher
#

Hm a couple of things, first I guess there's nothing wrong with training a model to identify palindromes but it can be done very simply with a handwritten algorithm so machine learning is massive overkill
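For comparison, the handwritten version is a one-liner: a word is a palindrome if it equals its reverse. Lowercasing first is an assumption about how case should be treated:

```python
def is_palindrome(word: str) -> bool:
    w = word.lower()
    return w == w[::-1]   # compare the word with its reversed copy
```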

#

you're just experimenting so it is ok but it isn't something you would typically do with machine learning

#

I would expect massive overfitting with this problem, a dataset with 800 examples is extremely small compared to the size and complexity of the model

#

I'm curious where your 99.88% accuracy is coming from - when you're evaluating model performance, is it on a train/test/validation split of the data, or are you training and evaluating with the same data?

#

and finally I wouldn't expect looking at heat maps in general to tell you much, neural networks are famously black boxes and you won't learn much about what they're doing by peeking inside, unless you have a very specific problem where the model's attention directly corresponds with some explainable aspect of the problem

#

your model is overfitting if it achieves near-perfect results when evaluated on training data, but very poor results on test data that wasn't used for training
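Here is a minimal stdlib sketch of that check. Hold out a shuffled test split and compare accuracy on both sides; a large gap (near 100% train, much lower test) is the overfitting signature described above. `split_data` and `accuracy` are hypothetical helper names, and `predict` stands for whatever maps an input to a label:

```python
import random

def split_data(data, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and return (train, test)."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_fraction)
    return items[n_test:], items[:n_test]

def accuracy(predict, examples):
    """Fraction of (x, y) pairs where predict(x) == y."""
    return sum(predict(x) == y for x, y in examples) / len(examples)

# Usage: train only on `train`, then compare accuracy(model, train) with accuracy(model, test).
```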

serene scaffold
#

@solar thistle are you tokenizing at the letter level? Otherwise, the model is certainly overfitting.

quaint mulch
quaint mulch
quaint mulch
#

And in general, this is still a very active area of research.

solar thistle
# calm cipher Hm a couple of things, first I guess there's nothing wrong with training a model...

Oh yeah I know that its an overkill solution for an already easily solved problem. But thats one of the reasons I picked it. Its an easy problem to model out, and there's basically only 1 traditional way to check a palindrome and its really easy to do lol. That gives me an advantage, because its a problem I fully understand how to solve. I figured because of that, the ML model that solves it would, for 1, be simple enough that I can use it as a self-guided intro to ML, and also there wouldnt be any "magic" to how it works. That hasnt entirely been the case though. Since this is my first time really trying to understand how ML works, I stumbled a lot to get to where I am. I had heard some of the terms before (LSTM, GRU, RNN, etc.) but wasnt sure how you identify a problem and which solution is the best fit to apply to it.

#

Really I feel like I learned more about the importance of well structured and valid data that represents the problem youre trying to solve lol.

#

Training initially wasnt that great, but I learned (for this specific application) things I hadnt really thought of before. Cuz youre right, its an easily solvable traditional CS problem, but when I was working on both the GRU and the LSTM, I noticed things about palindromes id never considered would be important facets of what they are. For example, the LSTM version I did would consistently misclassify near-palindromes. Something like wowowiw was often misclassified

#

so that led me to generate data that would specifically expose the model to large numbers of those near-palindrome edge cases, and that improved the model's success substantially. I think the original GRU model I started with would also fail a lot when the first 2 or 3 letters symmetrically matched the last 2 or 3; it would basically ignore the middle section. It also got stumped when words had more than 2 repeated letters, no matter where they appeared
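Here is a sketch of that kind of hard-negative generation, assuming lowercase a-z words. Start from a random palindrome, then change exactly one character in the first half so that exactly one mirrored pair no longer matches. The function names are hypothetical:

```python
import random
import string

def random_palindrome(length, rng):
    half = [rng.choice(string.ascii_lowercase) for _ in range(length // 2)]
    middle = [rng.choice(string.ascii_lowercase)] if length % 2 else []
    return "".join(half + middle + half[::-1])

def near_palindrome(length, rng):
    """A palindrome with exactly one mismatched mirrored pair (requires length >= 2)."""
    chars = list(random_palindrome(length, rng))
    i = rng.randrange(length // 2)          # a position in the first half
    j = length - 1 - i                      # its mirror position
    chars[i] = rng.choice([c for c in string.ascii_lowercase if c != chars[j]])
    return "".join(chars)
```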

#

Obviously what Im using ML for isnt impressive, and I dont expect to become fully ML-capable. I just want to better understand how models work and be somewhat competent at recognizing how they work and how useful data is structured, etc.

solar thistle
#

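this is how im encoding

```python
import string

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAXLEN = 12  # every word is padded to this length

def preprocess(data, maxlen=MAXLEN):
    # map 'a'..'z' to 1..26; 0 is reserved for padding and any other character (including uppercase)
    char_to_index = {c: i + 1 for i, c in enumerate(string.ascii_lowercase)}

    def encode(word):
        return [char_to_index.get(c, 0) for c in word]

    # data is a list of (word, label) pairs
    X = [encode(word) for word, _ in data]
    y = [label for _, label in data]
    X = pad_sequences(X, maxlen=maxlen, padding='post')  # shape (num_samples, maxlen)
    y = np.array(y)
    return X, y
```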
#

so X ends up as a numpy array of shape (num_samples, maxlen), and y ends up as a binary label for whether the word is a valid palindrome or not

#

I only use words of length up to 12, so the padding fills the difference between len(word) and 12, which normalizes all words to the same array length

#

Heres an example of the attention heatmap for both classifications

#

I was reading that in the heatmap the y axis is the query, so it's the position computing attention, and the x axis is the "listener" (the key) that's being attended to or whatever. That sounds kinda fine; I guess the grid shows how much "attention" each query token paid to each listener
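That matches how scaled dot-product attention weights are usually laid out. Here is a minimal NumPy sketch: row i is query position i, column j is key position j, and each row is a softmax that sums to 1. `plt.imshow(w)` would put rows on the y axis and columns on the x axis. If every row peaks in the same column, every query is putting most of its weight on that one key. With `padding='post'` and no attention mask, it may be worth checking whether that column is a padding position, so the sketch includes an optional key mask:

```python
import numpy as np

def attention_weights(q, k, key_mask=None):
    """q: (n_queries, d), k: (n_keys, d); key_mask: optional (n_keys,) bool, True = real token."""
    scores = q @ k.T / np.sqrt(q.shape[-1])                     # (n_queries, n_keys)
    if key_mask is not None:
        scores = np.where(key_mask[None, :], scores, -np.inf)   # masked keys get zero weight
    scores = scores - scores.max(axis=-1, keepdims=True)        # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)                    # softmax over keys: rows sum to 1
```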

#

conceptually that sounds fine lol. But like, why then in the first graph did each encoded letter position pay basically what I understand to be "max" attention, all to the same letter?

#

if I'm thinking about it and saying it correctly, wouldnt you expect to see the max heat distributed like this? or rather, wherever you would expect the most "attention" to have been attached

#

thats kinda why Im asking for help, cuz im not sure if im just not reading the graph correctly, or if the graph itself isnt being generated correctly

#

And I know the model works well, so its not like its just randomly outputting some invalid data

#

The data is a mix of predefined english palindromes and non-palindromes, and randomly generated palindromes/non-palindromes made from random characters

calm cipher
#

Maybe if you try adding some positional encoding it could work, but I think the principle of how attention works isn't going to help
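One common way to add positional encoding is the fixed sinusoidal scheme from the original Transformer paper, added to the token embeddings before attention. In the sketch below, `d_model` is assumed even. Learned position embeddings are an equally common alternative:

```python
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(same angle)."""
    pos = np.arange(max_len)[:, None]                 # (max_len, 1)
    i = np.arange(d_model // 2)[None, :]              # (1, d_model // 2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                      # even dimensions
    pe[:, 1::2] = np.cos(angles)                      # odd dimensions
    return pe

# Usage: x = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```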

#

If you're interested in interpreting how neural networks work, have you studied the xor problem?
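The XOR problem in a nutshell: no single linear threshold unit can separate XOR, but one hidden layer can. Below is a hand-wired sketch; the weights are chosen by hand, not trained. One hidden unit computes OR, the other computes AND, and the output fires for "OR and not AND":

```python
import numpy as np

def step(z):
    return (z > 0).astype(int)

W1 = np.array([[1, 1], [1, 1]])   # both hidden units see x1 + x2
b1 = np.array([-0.5, -1.5])       # unit 0 fires if x1 + x2 > 0.5 (OR), unit 1 if > 1.5 (AND)
W2 = np.array([1, -1])            # output = OR minus AND
b2 = -0.5

def xor_net(x):
    h = step(x @ W1 + b1)
    return step(h @ W2 + b2)
```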

sharp crow
#

Lads I want to make some good projects. Any recommendation?

mossy pond
#

Pi as a function of the angle of every digit (0=0°, 1=36°, ..., 9=324°); every digit is one step forward in the angle of that digit.
first: 50000, second: 200000 with a window of 50000

analog leaf
#

Hey, I hope this is the right channel for my issue. I need to read some txt files with spectra data. The thing is that neither Excel nor Origin can really import the data; it just comes out wrong. So I thought about doing it in Python. I got some old lecture scripts from a friend that go in that direction, but I don't really understand them. I would either like some help writing a script that reads all the files and builds an Excel file I can properly import into Origin, or recommendations for resources so I can do it myself (preferably in an acceptable amount of time). I tried several AI tools, but the Excel files they produced all have major problems.

All the files look like the one above (I don't know if it's acceptable to upload one here; I would need to change some metadata in order not to post personal info). And I need an Excel file that looks somewhat like the following:

Mass (m/z) Value (counts)
100.3 281083.092750

can somebody help me?

#

Please also feel free to ping me

cursive totem
#

hey guys, i want to learn ml and now i am at a stage of learning pure python. Is asyncio worth spending time to learn or should i skip this step? i learned about coroutines and it says that it is useful in asynchronous programming, so i wonder if i even need it

#

and is algorithms knowledge (like in leetcode) needed for it? i will make graph neural network (some physics applied) as my diploma thesis, at this point i still didnt look up what graph neural network is (and neither my supervisor lol), so i wonder if i need to know algorithms like BFS or DFS

neon owl
#

Yes

serene scaffold
serene scaffold
#

Neither are most leetcode questions. You should probably know the main sorting algorithms and understand their asymptotic analysis. Same for graph traversal algorithms.
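For reference, here is a minimal sketch of both traversals on an adjacency-list graph (the dict format is an assumption). Both run in O(V + E):

```python
from collections import deque

def bfs(graph, start):
    """Breadth-first: visit nodes in order of distance from start, using a FIFO queue."""
    order, seen, queue = [start], {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nb in graph.get(node, []):
            if nb not in seen:
                seen.add(nb)
                order.append(nb)
                queue.append(nb)
    return order

def dfs(graph, node, seen=None, order=None):
    """Depth-first: follow one branch as far as possible before backtracking (recursive)."""
    if seen is None:
        seen, order = set(), []
    seen.add(node)
    order.append(node)
    for nb in graph.get(node, []):
        if nb not in seen:
            dfs(graph, nb, seen, order)
    return order
```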

#

"BFS and DFS" you need to understand these, and if you can't code them, you don't understand them

serene scaffold
#

I think this is the most questions I've answered in my three-stop train ride

serene scaffold
#

Yes

calm cipher
#

It looks like 3 separate tab-separated tabular files combined into one with headers separating them

analog leaf
calm cipher
#

If you manually separate them, pandas should be able to open them as is

#

Granted you'll probably have to do more cleanup, but at least you'll have it in memory in a format where that's possible

#

If you can't manually separate them you'll have to try to do it in code, which could be easy or hard depending on how much variation there is in the structure

#

Actually hmm the first two sections look like key value pairs

analog leaf
calm cipher
#

Only the third section is tabular data

calm cipher
analog leaf
analog leaf
calm cipher
#

I mean it could completely make up numbers that were never there originally

#

If you still want to use AI, try something like this in your prompt

#

You want to write a program that will read the contents of a text file into memory, but will ignore all lines that occur before "Raw Data:"

#

Once you have it in memory as a string, you can load it into Pandas
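Here is a sketch of that approach. It assumes the table after the `Raw Data:` line is two tab-separated numeric columns without a header, so check it against a real file; if there's a header row or more columns, `header=None`/`names` need adjusting. The column names come from the target Excel layout above. Writing `.xlsx` requires `openpyxl` to be installed, and `folder_to_excel` is a hypothetical helper that puts each file on its own sheet:

```python
import io
from pathlib import Path

import pandas as pd

MARKER = "Raw Data:"  # assumption: the tabular section starts on the line after this one
COLUMNS = ["Mass (m/z)", "Value (counts)"]

def parse_spectrum(text: str) -> pd.DataFrame:
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().startswith(MARKER):
            table = "\n".join(lines[i + 1:])  # keep everything after the marker line
            break
    else:
        raise ValueError(f"no line starting with {MARKER!r} found")
    # assumption: two tab-separated numeric columns, no header row
    return pd.read_csv(io.StringIO(table), sep="\t", header=None, names=COLUMNS)

def folder_to_excel(folder: str, out_path: str) -> None:
    # one sheet per .txt file; Excel limits sheet names to 31 characters
    with pd.ExcelWriter(out_path) as writer:
        for path in sorted(Path(folder).glob("*.txt")):
            df = parse_spectrum(path.read_text())
            df.to_excel(writer, sheet_name=path.stem[:31], index=False)
```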

analog leaf
#

I will try that tomorrow or the day after; ChatGPT told me I hit the limit for data analysis

calm cipher
#

Yeah give it a shot

#

If you intend for any part of this process to involve Python scripts, even with ChatGPT, you're going to need to know some Python

analog leaf
#

I know enough so I can read most of what I encountered. Just the writing process is the problem

#

And I am working on refreshing some stuff with lecture notes and books from my Uni library

fallow coyote
#

Atm, getting a bit bored with ai/ml. I feel like the projects I’m doing are not piquing my interest. I’ll still go through with them because I need the practice and experience in applying my knowledge but, I’d like suggestions in interesting machine learning projects

solar thistle
# calm cipher I am not 100% sure about this but I suspect attention isn't right mathematically...

Thats kinda like the only thing that makes sense, right? Like, I said I wasnt sure if I wasnt reading it right or if it wasnt being generated right, but maybe thats just not the way to conceptualize the model's performance. I didnt really think about that, so thank you so much for the input. And no, I havent heard of that, but I will do some research and hopefully it points me in a better direction than the one Im going in now. Thank you very much for taking the time to read and respond to me!

calm cipher
#

And because determining palindromes requires considering position, that means the RNN has to produce a sequence of vectors where, to use the outer two characters as an example, the first vector is most similar to the final vector

#

Or at least it has to produce vectors that can be transformed into a key space and query space where that is true
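To make that concrete, here is a hand-built sketch (not learned) of the pattern a "compare each position with its mirror" mechanism would produce. Queries encode position i, keys encode the mirrored position n-1-j, and the dot product is largest where j = n-1-i. The heatmap is then an anti-diagonal, not a single bright column. A trained model would have to learn query/key projections like this from positional information, which nothing forces it to do:

```python
import numpy as np

def mirror_attention(n, sharpness=10.0):
    positions = np.eye(n)                 # one-hot position vectors
    queries = positions                   # query i encodes "I am position i"
    keys = positions[::-1]                # key j encodes its mirror position n-1-j
    scores = sharpness * (queries @ keys.T)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)   # row-wise softmax
```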

#

But I also think that is probably very hard for a RNN to do because it goes into the input without knowing how long it is

#

I suspect it might be possible to make it work with a very deep multilayered bidirectional RNN, and it also might be possible that it is doing something unrelated to position that still generalizes, but at any rate it is probably just overfitting and memorizing the training data

unkempt thorn
#

any help would be amazing

nocturne whale
#

We Just Built an AI Agent without a Big A$$ Prompt

Last year we tried to bring an LLM “agent” into a real enterprise workflow. It looked easy in the demo videos. In production it was… chaos.

• Tiny wording tweaks = totally different behaviour
• Impossible to unit-test; every run was a new adventure
• One mega-prompt meant one engineer could break the whole thing
• SOC-2 reviewers hated the “no traceability” story

We wanted the predictability of a backend service and the flexibility of an LLM. So we built NOMOS: a step-based state-machine engine that wraps any LLM (OpenAI, Claude, local). Each state is explicit, testable, and independently ownable.

NOMOS supports lots of LLM providers, including OpenAI, MistralAI, Groq, Gemini/Gemma, OpenRouter, Anthropic, Ollama and Cohere. There's already lots of functionality there, and more is coming every day.

Open-source core (MIT)
• GitHub: https://github.com/dowhiledev/nomos
• Documentation: https://nomos.dowhile.dev/

Looking ahead: we’re also prototyping Kosmos, a “Vercel for AI agents” that can deploy NOMOS or other frameworks behind a single control plane. If that sounds useful, join the waitlist:
https://nomos.dowhile.dev/kosmos

Would love war stories from anyone who’s wrestled with flaky prompt agents. What hurt the most?

nocturne whale
#

yes, you can use nomos validate --config ... to check the validity

#

also can generate the schema easily using nomos schema and use it with your yaml

#

We introduced steps and transitions between steps. Transitions are fully controlled by different routes and conditions; if the conditions are not met, those routes are not taken. We have also introduced lots of self-healing techniques: for example, as soon as the LLM makes a mistake, we dynamically constrain the options it has, so the next time it tries, it has fewer options.

calm cipher
#

This is an ad for a service, isn't it? <@&831776746206265384>

south quest
#

ehhh

#

it's OSS

#

but it's edging on advertisements

#

@nocturne whale showcasing projects is okay here, but not anything which has any sort of paid offering, just as a reference

#

i think what you're posting is okay, apart from the fact you're posting it in multiple channels. showcase OSS work but don't do it in the form of lengthy walls of text, it's not appropriate for the space and violates the rules.

calm cipher
#

There's a waitlist for a control panel service

south quest
#

talking about OSS software is fine -- promoting paid offerings for upgraded versions of OSS suites is not okay and over-promoting the corporate side of OSS is not appropriate for this space

hollow pagoda
#

Those look more fun especially the game labs

#

More challenging as well

exotic star
#

is "Why Machines Learn" a good start for learning the math

#

or is khan academy better?

#

im already familiar with python and im currently learning pandas and numpy tho i was told here that learning the math is crucial for learning the libraries for data science

fallow coyote
calm cipher
#

there are some simple reinforcement learning problem formulations that might help you learn the math as you learn the RL

#

maybe try studying something like multi-armed bandits to start with

#

it's usually one of the earliest things you'd study in reinforcement learning anyway
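Here is a minimal epsilon-greedy sketch for a Bernoulli multi-armed bandit; the arm probabilities are made up for illustration. The only math is the incremental mean update Q <- Q + (r - Q) / N. Most of the time the agent exploits the arm with the best estimate, and with probability epsilon it explores a random arm:

```python
import random

def epsilon_greedy(true_probs, steps=10_000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms
    values = [0.0] * n_arms                                    # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit current best estimate
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean update
    return values, counts
```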

small wedge
#

while a lot of RL algorithms require a lot of math like qlearning and gradient policy stuff, I find things like evolution strategies and genetic algorithms to be a very easy way to jump into some fun RL projects without needing much math. Or at least the math involved is very simple and intuitive; what it means to perform crossover and mutation is completely up to you as the dev.

weak seal
#

can anyone here proof chain a sigmoid activation function all the way to an equilateral triangle?

sand herald
#

as im kinda mid-level already, I think it is time for me to specialize

#

and almost all jobs are asking for LLM experience, ugh

#

my current skillset is more towards ML/DL/optimization

vocal zealot
#

and it wasnt even allowed to use the internet, or external tools, and had the same time as the other contestants

#

PURE NATURAL LANGUAGE.

that is seriously impressive

#

It seems test time compute is indeed extremely effective. Of course, along with other algorithmic breakthroughs

sand herald
#

yeah i saw that news but uh... i dont know. seems to excel in advanced areas but struggle with other areas which are deemed more basic?

#

anyway, it seems like in industry, usually the applications of LLM are kinda... boring? like chatbots and shit

vocal zealot
hoary jay
#

hey, grade 11 student here. im lowk interested in learning abt data science and ai and i wanna be able to land a small internship next summer before going into uni for cs. any tips on how to get started?

vocal zealot
#

That is absolutely wild

#

unlike googles model

#

that was specifically made for this

#

this is fundamentally different

sand herald
#

does anyone here understand LLMs on a deep level? i've been struggling to see it past a "next word guesstimator"

vocal zealot
#

It can think for many hours, that's the difference between a model like this, and say... gpt 4o, that responds almost instantly

sand herald
#

oh, so i guess it's very different than a vanilla Transformers then?

#

i have some basic understanding of Transformers as I'm going to need that for my next project. my "understanding" of LLMs is from that

#

i dont follow the LLM space/progress closely

vocal zealot
sand herald
#

right. any idea how it's being done differently? is it a totally new model architecture or are they just adding on extra stuff to the core (which I assume to be still Transformers)?

vocal zealot
sand herald
sand herald
#

because at the end of the day, it's all numbers isn't it? are they letting the LLM loop more...? against different parts of the pre-trained dataset? ugh

#

yeah

#

i know MLPs, LSTMs on a quite deep level

#

and basics for transformers, CVAEs

#

because to me, these models are learning a set of weights, which are just numbers, from data. i think that'd be good enough to guess the next word, and the next. or even a sentence or paragraph if given the right architecture and data. but to say that they can "understand meaning" and "reason", that's a bit of a stretch to me.

but it could be because i don't understand the leap from Transformers -> the current LLMs

#

i've seen some of the basic ML/DL models achieve surprising things though

vocal zealot
#

Quick reminder: Don’t end like this idiot. He was proven wrong a day later.

calm cipher
#

Until they publish their results and methodology in a way that is reproducible by other researchers, this is just marketing

vocal zealot
#

These are all (obviously) OpenAI researchers.

vocal zealot
#

If you think they are literally just lying like this… then you are beyond saving… you do you I guess

#

Yeah that will probably happen, as it literally has with every single Open Ai release…

#

But the model will be released likely by end of year

#

The most insane part is it’s a GENERAL reasoning model 💀

This is not like Google, who creates their models specially for this.

#

THIS is what AGI is about

#

GENERAL intelligence. Google deepmind results are cool…. but far less impressive

#

What?

#

Open Ai said it passed the bar and then MIT said it didn’t?

#

It does easily pass the bar now btw lol

#

🤦‍♂️

#

I’m sure they are all lying bro.

#

But we’ll see

#

This insane distrust people have for literally no reason is so insane. Especially with a company like Open Ai which has a pretty good track record believe it or not

#

Every single open ai model ever released in the history of Open ai has had a paper along with its release.

This model obviously won’t have one as it’s not coming out yet.

calm cipher
#

They don't, they are extremely closed off and routinely hype themselves up and actively try to scare people

#

They are very closed off to the peer review process

vocal zealot
#

But I guess we’ll see. Im sure in 3 months it will come out that it was ALL A BIG LIE PERPETUATED BY EVERY SINGLE EMPLOYEE, LIKE THE LAST TIME THAT OPEN AI… did some imaginary major lie like this which apparently I’ve never heard of

#

It’s ok. We’ll see. For now, GPT 5 is incoming, which is exciting.

#

Anyways…

#

Ima go eat pizza now

#

that movie was fucking awesome btw. ok im leaving now

sand herald
#

uhh i also view these news and benchmarks with distrust

#

to me, openAI needs investors and they tend to hype things up

#

kinda distrust sam altman as a person too

#

remember a few months ago when they were hyping AGI

#

and how close we were to AGI

#

actually, what is the deal about getting gold at the math olympiad?

#

what i mean is, how are the math olympiad questions structured differently from ... i don't know, typical questions that the general public asks an LLM?

#

multiple layers of reasoning?

#

it's funny they mention that the model has no access to the internet or tools? i kinda thought it almost has the entire internet as its training data, isn't it?

#

I see

#

that's kinda strange though

#

i'd imagine

#

i'd think that math equations would be closer to code than say... natural language

#

for LLMs to perform better in natural language than math, it's interesting

#

yeah on that, has there been any research on why LLMs hallucinate?

#

yeah i've read something along this line as well. perhaps human-like reasoning is flawed, full of gaps and we tend to "hallucinate" too?

#

it could also be that the training data has much more natural language than code?

#

or that the underlying architecture, LSTMs and Transformers, seemed to be designed more for natural language rather than code or math

#

do you have any resources to quickly understand the leap from Transformers to the first-generation of GPTs?

#

i roughly understand how transformers work, but the Generative & Pre-trained part of it, I don't

#

haven't tried looking at it yet 😛

#

oh wow

#

this almost simulates how i add numbers quickly

#

but anyway, interesting, so there are multiple pathways

#

i guess that same set of weights has to be used across all functions/domains

#

not just simple addition/subtraction

#

oh yeah, i was about to suggest something similar, to have steps/layers at the start to figure out the task

#

they probably already have it though

#

this is probably a simplified diagram

#

im kinda wondering about this part as well

#

do they use the same set of weights for all tasks? or different sets of weights for different tasks? or different PARTS of the weights for different tasks (i.e. can we activate/deactivate some of the weights depending on the task, as i'd imagine the other parts of the weight to represent the "reality" that is not specific to the task)?

#

ah this is making my brain hurt

#

hahahaha

#

😮

#

maybe i should go into LLM research

#

and get 1% of that sweet sweet 200 mil package

#

ahh this is too advanced for me. i don't have the background for it

#

hahahaha

#

i see

#

OpenAI is closed-source; how about the rest, like Claude?

#

right

#

"Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters." so 32B params is considered small?

#

in the world of LLMs
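For context on the Kimi K2 numbers: in a mixture-of-experts (MoE) layer, a small router picks a few "expert" sub-networks for each token, so only those experts' weights are used for that token. That is how a model can have 1 trillion total parameters but only about 32 billion (roughly 3%) active per token. It is also a partial answer to the earlier question about activating only part of the weights, although the routing happens per token rather than per task. Below is a toy NumPy sketch of top-k routing; the sizes, the single-matrix "experts" and the name `moe_forward` are illustrative assumptions, not Kimi K2's actual architecture.

```python
import numpy as np

def moe_forward(x, router_w, expert_ws, top_k=2):
    """Toy mixture-of-experts layer (illustrative sketch, not any real model's code).

    x:         (tokens, d_model) input activations
    router_w:  (d_model, n_experts) router weights
    expert_ws: (n_experts, d_model, d_model) one linear "expert" per slot
    Returns the layer output and the expert ids each token was routed to.
    """
    logits = x @ router_w                                  # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]          # top-k expert ids per token
    top_logits = np.take_along_axis(logits, top, axis=-1)
    gates = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)             # softmax over the chosen experts only

    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for slot in range(top_k):
            e = top[t, slot]
            # Only the selected experts' weights are used for this token
            out[t] += gates[t, slot] * (x[t] @ expert_ws[e])
    return out, top

rng = np.random.default_rng(0)
d_model, n_experts = 8, 16
x = rng.normal(size=(4, d_model))
router_w = rng.normal(size=(d_model, n_experts))
expert_ws = rng.normal(size=(n_experts, d_model, d_model))
out, used = moe_forward(x, router_w, expert_ws, top_k=2)
print(out.shape, used.shape)  # (4, 8) (4, 2)
```

With `top_k=2` out of 16 experts, each token touches 1/8 of the expert weights, which is the same idea as "activated" vs "total" parameters.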

#

yeah prolly, I can run 1B params on 4 L40s

#

🤣

#

params is referring to the number of weights/ the scalar values that are in the matrices, right?

#

hmm that doesn't sound like a lot

#

i don't know, unless 1 param takes a different amount of compute depending on whether you're running an MLP vs LSTM vs Transformer, etc.

#

my own models had i think... uh

#

1GB of params

#

which was about 1B params if i remember correctly

#

which is why im a little surprised at the 50-100B number hmm
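A rough way to relate parameter count to file size: memory for the weights is about params × bytes per parameter, so the conversion depends on precision. At fp32 (4 bytes each), 1 GB holds about 250M parameters; only at 1 byte per parameter does 1 GB correspond to roughly 1B. The sketch below counts the weights alone (no activations, KV cache, optimizer state or quantization overhead), so treat its numbers as lower bounds; the function names are hypothetical.

```python
# Bytes used to store one parameter at common precisions
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}

def weight_memory_gb(n_params, precision):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Ignores activations, KV cache, optimizer state and quantization overhead
    (e.g. per-block scales), so real usage is higher.
    """
    return n_params * BYTES_PER_PARAM[precision] / 1e9

def params_that_fit(gb, precision):
    """Inverse: roughly how many parameters fit into `gb` gigabytes of weights."""
    return gb * 1e9 / BYTES_PER_PARAM[precision]

for precision in BYTES_PER_PARAM:
    print(f"1B params @ {precision}: {weight_memory_gb(1e9, precision):.1f} GB")
# 1B params @ fp32: 4.0 GB
# 1B params @ fp16/bf16: 2.0 GB
# 1B params @ int8: 1.0 GB
# 1B params @ 4-bit: 0.5 GB
```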

#

anyways

#

yeah

thick heron
#

Hey, I’ve been building a local AI assistant in Python — voice + text input, mood engine, memory with ChromaDB, and an LLM running via Ollama. It runs offline and uses a personality system that shifts tone based on mood vectors (sort of like a 12-band EQ). Also uses local TTS.

The problem is I’ve hit serious hardware limits (6GB VRAM). I’ve already optimized it to load models on-demand and split behavior vs task logic, but beyond this, the system can’t handle more development or testing. I’ve tried to keep everything modular and as light as possible, but even basic scaling breaks things.

At this point, I’m mostly just finishing up documentation. Not sure what to do next — cloud is not really an option. Has anyone here worked on something similar or found any hacks/tricks to keep these kinds of projects alive locally on limited hardware?

Appreciate any ideas. Not trying to show off, just genuinely stuck.

thick heron
tidal bough
#

I'd imagine it could have some steps to figure out what the question is asking, a step to encode the number and operations, but then why not just take advantage of the fact that it's running on a computer and that the network is just processing a bunch of vectors already ?
I don't think that's possible - remember that there's stuff like activation functions on each layer

#

the fact that everything is highly nonlinear harms it here - it has to find a combination of nonlinear operations that approximates multiplication

#

So it can't, just, say, have a subnetwork that turns a written number into a single scalar that it can then do operations on, because a single scalar can't represent a number - it'll instantly get truncated by the activation function
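A quick illustration of the truncation point, assuming a bounded activation such as tanh (unbounded ones like ReLU or GELU don't cap large positive values, so the argument applies most directly to squashing nonlinearities): once a number is pushed through tanh as a single scalar, large inputs become indistinguishable.

```python
import math

# Pretend a network tried to store a number as one scalar activation.
values = [1, 5, 50, 1000, 2000]
squashed = [math.tanh(v) for v in values]

for v, s in zip(values, squashed):
    print(f"tanh({v}) = {s!r}")

# Large inputs all collapse to 1.0 in floating point, so 1000 and 2000
# can no longer be told apart after the activation.
print(math.tanh(1000) == math.tanh(2000))  # True
```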

tidal bough
#

(vllm is more hardware-demanding, e.g. it refuses to work on my Pascal-architecture GPU, but it's significantly faster and I suspect you can configure it for low RAM usage, too)

tidal bough
#

oh, and the new Gemma models, too. It's popular for local reasoning because it lets you pack more capabilities into a limited amount of VRAM (or even RAM).

jaunty helm
tidal bough
#

Well, sure, but it doesn't instinctively seem to me that there are simpler ones than the one that paper finds?

thick heron
tidal bough
#

ollama is already using heavily quantized models, isn't it?

#

it always downloads the Q4_M gguf (IIRC) unless you specify an exact huggingface file

jaunty helm
jaunty helm
thick heron
thick heron
jaunty helm
thick heron
#

I tried phi 3 tiny lama

jaunty helm
#

some models of similar size:

  • llama 3.x 8b
  • gemma 2 9b
  • gemma 3 12b
  • mistral nemo 12b
  • qwen 3 14b
thick heron
#

They are too verbose or sound like 1800s scholars

jaunty helm
# thick heron I tried phi 3 tiny lama

I haven't heard of "phi 3 tiny llama," but I know that the phi series is more meant for single turn reasoning / instruction following
they are extremely dumb when it comes to human knowledge
so maybe not what you want

thick heron
thick heron
tidal bough
#

maybe also see phi4-reasoning:plus, somehow I missed its release entirely but it's 14B and dominates benchmarks... though depending on the GPU it might be hard to fit into 6GB

jaunty helm
thick heron
tidal bough
#

Your project is quite interesting to me because I also have only 6GB VRAM (and an old GPU that doesn't support a lot of important features like bfloat16), and I mostly concluded that all the LLMs I can run locally are too dumb to be useful, even as assistants

jaunty helm
#

there is offloading to cpu but it tanks inference speed obviously

thick heron
tidal bough
#

Isn't that already part of the training process for most small models?

#

well, I guess they weren't finetuned on this exact environment, so it might help

thick heron
#

It was a multipurpose project

#

An assistant has to be balanced in all regions

jaunty helm
#

well but for someone to finetune a model you need quite a bit more compute than you need to run it
finetuning on 6gb of vram sounds sketch

thick heron
#

I tried a multi-model style. It's too much latency

#

Like brain swapping

#

I'm trying to make this fully local. With Colab you get plenty of resources, but then I get carried away and it takes me off the goal of making this possible

#

on any laptop with a GPU and 6 GB of VRAM

jaunty helm
thick heron
#

I will try that

#

I don't know much about fine-tuning; where do I learn more about it?

#

Final question thanks for all the support guys

#

I will try youtube if I get any problems I will comeeee back 🤧 tysm for helping

cursive totem
#

Guys, is the Jupyter plugin in PyCharm a better choice than plain Jupyter? I mean, in PyCharm it has much better tools, even a fancy-looking array display, and I have PyCharm Pro because I am currently a student

acoustic barn
#

Hi all, what laptop would you recommend for starting with a data science and ai bachelor’s degree?

serene scaffold
serene scaffold
cursive totem
#

So I guess I will stick with it

oak coyote
#

I want to make a project about the skills I've learned and how to integrate them all
I have an understanding of Python, NumPy, pandas, Streamlit, SQL, Matplotlib and Power BI

#

I am new to Discord, looking for people who can help...

proven pier
#

Is optical character recognition still practical tech?

proven pier
#

Pretty interesting. Thanks for giving me the rundown

oak coyote
#

I am just a beginner

#

And stuck on tutorials

#

I wanted to make some projects but haven't been able to

#

I want to learn ML

acoustic barn
#

Thanks! I also have a MacBook, but unfortunately I need a laptop with at least an NVIDIA RTX 3070, so I need to buy a Windows laptop

oak coyote
#

And somebody told me to learn data analytics first

acoustic barn
#

But I’ll just wait for the first lessons and ask a teacher if it’s really necessary

#

Yeah it’s probably for the machine learning course

oak coyote
#

and now I'm stuck, not able to find projects

proven pier
acoustic barn
#

Ohhh that’s sick

proven pier
#

Yeah, that's the most practical thing to do...

acoustic barn
proven pier
#

A server is just another computer

#

So you have a crappy cheap laptop, that you use to SSH into (remotely log in) another computer (the beefy server) that runs all of your costly computations on

acoustic barn
#

Is it the same idea as logging into a Raspberry Pi from my MacBook?

proven pier
#

Yes they are both computers

acoustic barn
#

Ahhhh I see

proven pier
#

Do you use SSH to get into your Raspberry Pi?

acoustic barn
#

Yes

proven pier
#

Yes exactly the same

#

But instead of spending 2-3 thousand dollars on a laptop

#

You might spend 100 and then rent a server for like 10-20 bucks a month

acoustic barn
#

Where would I be able to do that?

oak coyote
#

Ohkay

proven pier
#

I'm not in the market, so I will literally just use my search engine and see what prices I see

acoustic barn
#

What should I search for?

proven pier
acoustic barn
#

Also, does it work without flaws?

proven pier
#

Just look up VPS, GPUs, machine learning, and do research from there

#

Don't just buy straight away; read what people have written

#

Not just about specific services, but the entire process

acoustic barn
#

Yess I see, thank you for your explanation 😊, I’m going to look into it

proven pier
#

Then you can weigh it against owning the GPU machine yourself

#

Idk how many semesters you're in class for. But if you can find the GPU in a laptop for $2k, divide that price by the VPS monthly rental price to see how many months of renting it would take to cost more

#

If rental fee is $50 a month, it would take 40 months of renting to cost $2k, or over 3 years
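The break-even arithmetic as two small helpers: divide the purchase price by the monthly rent to get the break-even month count, or divide it by the months you actually need to get the most rent worth paying. This is a sketch under simple assumptions (flat monthly fee, no resale value, electricity cost or per-hour billing); the function names are hypothetical.

```python
def break_even_months(purchase_price, monthly_rent):
    """Months of renting after which total rent equals the purchase price."""
    if monthly_rent <= 0:
        raise ValueError("monthly_rent must be positive")
    return purchase_price / monthly_rent

def max_monthly_rent(purchase_price, months_needed):
    """Highest monthly rent that costs no more than buying, over `months_needed`."""
    return purchase_price / months_needed

print(break_even_months(2000, 50))           # 40.0 months, i.e. about 3.3 years
print(round(max_monthly_rent(2000, 36), 2))  # 55.56 (36 = 9 months of classes x 4 years)
```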

acoustic barn
#

The degree is 4 years

proven pier
#

You won't be in class during the summer

#

and maybe a month off during winter

acoustic barn
#

True

#

Oh no only 2 weeks ):

proven pier
#

Okay so what, 9 months of classes?

#

9*4 = 36 months

acoustic barn
#

I suppose yes

proven pier
#

$2k / 36 months = $55 a month

acoustic barn
#

Hmmm

proven pier
#

So if you get something beefier than that, it would maybe be better to buy a laptop at the $2k. Once again, you would need to weigh the option of what the laptop provides

#

Your laptop will also be running other graphical things when you're using it, so you probably won't have access to all of the GPU's resources for programming

acoustic barn
#

Is there a laptop you would recommend? If I were to go for a new laptop

proven pier
#

No I have no idea

acoustic barn
#

Ahhh okay thank you again for your replies ur amazing

proven pier
#

Just looking at a VPS, here's the rental price for 2 years (monthly rental price), and here's JUST THE GPU BY ITSELF

#

So you could buy the GPU at $2700 and own it, or you could rent it for 2 years which will cost $312

#

@acoustic barn ^ just things to consider. So once again, just look at hardware and VPS providers and see if things line up and how cost efficient it is to go one way or the other

#

It almost seems ridiculous, 18 years of renting until it costs more to rent? Yeah, I would just do a bit more research. I'm just saying, what I've heard is it's better to rent servers for this type of thing

acoustic barn
#

Thank you for sharing! Definitely gonna look into it

proven pier
#

Follow up - I have ZERO idea why it's showing... Indian? currency in the GPU price on the right? It thinks I'm in Mumbai? 🤔
Whatever link I used went to amazon.in, which is India, I'm guessing

#

I was thinking what the hell is that currency symbol

proven pier
glass carbon
#

Yeah, what about a recent GPU such as an RTX 4080 or something?

acoustic barn
acoustic barn
proven pier
#

Yeah, looks like after 6-9 months it's better to own. At that point, it's probably best to own a GPU somewhere and use it, if you're going to need it for 4 years.
I suppose the whole "rent, don't buy" advice is for people who want to spin up a new application for their company for 1 month or something, train their models, then use them on "normal" computers

glass carbon
#

Ask older students what they use @acoustic barn

proven pier
#

Maybe you could have a cheaper gpu for personal use/more frequent lower end class projects, then when big projects come through you rent one of these bad boys

acoustic barn
proven pier
#

Yeah if you could get ahold of a senior or junior and talk to them about how workloads are

acoustic barn
#

I will ask someone in year two when uni starts

proven pier
#

Won't they lie about hallucinations?

#

If it already hallucinates, won't it believe its information is true?

#

Only when I directly "confront" it by saying it's wrong will it say so. And even then, it's because they want to agree so bad

#

Maybe some sort of roleplaying like "you are under oath, and perjury is a penalty that is enforced with jailtime" or some crazy shit šŸ˜‚

#

idk how practical it is to test hallucinations, or whether hallucinating scenarios are even reproducible. I feel like a lot of LLM results aren't reproducible. Or maybe I'm ignorant: can you provide static seeds to LLM applications so they always respond the same way to the same input?

#

That would be the only way I could see any sort of real test-driven solution bearing fruit. But then again, it would form an extreme bias towards that one seed, so 🤷‍♂️
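On the seed question: sampling is where the randomness enters, so with a fixed random seed (or greedy decoding, i.e. temperature 0) the same logits always produce the same tokens. The toy sampler below shows that with a made-up vocabulary and made-up logits; `sample_tokens` is a hypothetical helper, not any LLM library's API. Real deployments can still vary between runs because of GPU floating-point nondeterminism and batching, so treat this as the idealized case.

```python
import numpy as np

VOCAB = ["yes", "no", "maybe", "unsure"]
LOGITS = np.array([2.0, 1.5, 0.5, 0.1])  # hypothetical model scores for the next token

def sample_tokens(logits, n, temperature=1.0, seed=None):
    """Sample n next-token choices from toy logits.

    temperature == 0 means greedy decoding (always the highest-scoring token).
    A fixed seed makes the random draws repeatable.
    """
    if temperature == 0:
        return [VOCAB[int(np.argmax(logits))]] * n
    rng = np.random.default_rng(seed)
    probs = np.exp((logits - logits.max()) / temperature)  # numerically stable softmax
    probs /= probs.sum()
    idx = rng.choice(len(logits), size=n, p=probs)
    return [VOCAB[i] for i in idx]

a = sample_tokens(LOGITS, 10, seed=42)
b = sample_tokens(LOGITS, 10, seed=42)
print(a == b)                                   # True: same seed, same draws
print(sample_tokens(LOGITS, 3, temperature=0))  # ['yes', 'yes', 'yes']
```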

#

what are you designing here?

#

human memories are just in yaml format?

#

I'm joking, I was just referencing that image I responded to

#

That's fun. I haven't gotten deep into AI, but I have wanted to because I have some of my own opinions on the human brain. It's cool how you're tying your conception of memory together in the process of information flow

#

Just gotta program the DNA that seeds the whole process, then it can be a reproducible learning agent with access to actuators and sensors šŸ˜‚ maybe try to give a large reward mechanism towards friendly sociable, non psychotic selfish behavior šŸ˜‚

#

LLMs are just trained models that predict statistically likely responses to queries. I know a lot of tech people are skeptical of AGI possibilities, but I still keep an open mind. The brain exists in the physical universe; I don't see why it can't be recreated outside of our biological context

#

I think the human brain tries to put everything it interacts with into a sort of mental state. And we are always predicting where we expect those states to be. And as we interact with the world, it adjusts how our inner state representation looks and behaves. And as we further interact with the environment, our prediction of entities improves. You see a car driving down the street, you expect it to continue and it does. When it crashes, that is surprising. You know it can happen. Sometimes it's surprising. However, sometimes you see somebody driving erratically and you sort of expect an accident. It happens, still surprising, but not as much

#

You also expect your kitchen to be in a "state" of configuration. You leave your room, or enter your house, and are thirsty, so you move towards the state that should satisfy your quest

#

If it's a natural or instinctive decision, I can see it coming first. But some decisions you truly have to contemplate over

#

Then the act of deciding, it sort of doesn't matter which "happens" first. I would presume the "conscious" awareness must lag a bit

#

You can only be aware of something once it exists

#

Awareness is probably just your own ability to measure the state of your mind. Well, your mind reaches that state before it can be measured

#

Upon being aware, you are in a state of review, and you still have the option of "changing your mind"

#

You've trained your brain how to make decisions

#

It's been trained over a lifetime of you making decisions, right or wrong, then reviewing them

#

Even if you make a decision, you can review it and adjust it or change it completely

#

All of which requires the awareness and review process

#

You could choose to throw yourself in front of a bus, but you are not passive. It's always an option but your awareness is saying it's probably not a good one

#

Maybe sometimes you want to quit your job, but you still decide against it. Not passive at all

#

Yeah I mean we have a brain stem that breathes for us, or makes the heart pump. There needs to be an underlying order that we don't consciously focus on or we would get nothing done

#

If we had to be consciously aware of the prediction mathematics and calculus that goes into the process at all times it would be overwhelming. However, you can review and report on how you predict things using your words. It would be extremely verbose to lay it out in detail in realtime 24/7, but you can get quite introspective

#

When you say evolution, do you mean neural network architectures predisposed by DNA? Because babies still have to "learn" and train their models through experience

#

There are certainly some architectures already set up by DNA, and potentially an initial seeding of its "training set", which is why most babies act the same

#

I suppose that would come from evolution pressures, yes

#

Maybe no initial seeded training set, just randomized. Crying and all that is just lack of emotional control (training)

dull radish
#

Hey guys, I'm looking for a way to parse a pdf file into a format like:
{
  "title": "This is the title",
  "outline": [
    { "level": "H1", "text": "Introduction", "page": 1 },
    { "level": "H2", "text": "Main content", "page": 2 },
    { "level": "H3", "text": "Conclusion", "page": 3 }
  ]
}
pymupdf has been my primary tool for extracting the content from the pdf, but any ideas on what can be done further from there? Simple heuristic extraction doesn't always work for all the different types of pdfs

P.S. apologies for disturbing the existing convo
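One path worth trying before heuristics: if the PDF has bookmarks, PyMuPDF's `Document.get_toc()` returns `[level, title, page]` entries that map almost directly onto the requested format. The converter below assumes that input shape; the function name and the `max_level` cutoff are illustrative. PDFs without bookmarks give an empty list, so a fallback is still needed for those.

```python
import json

def toc_to_outline(toc, title, max_level=3):
    """Convert a PyMuPDF-style table of contents into the requested JSON shape.

    `toc` is a list of [level, text, page] entries, the shape returned by
    PyMuPDF's Document.get_toc() when the PDF has bookmarks.
    Levels deeper than `max_level` are dropped.
    """
    outline = [
        {"level": f"H{level}", "text": text.strip(), "page": page}
        for level, text, page in toc
        if level <= max_level
    ]
    return {"title": title, "outline": outline}

# With PyMuPDF installed, the inputs would come from something like:
#   doc = pymupdf.open("file.pdf")
#   toc_to_outline(doc.get_toc(), doc.metadata.get("title", ""))
sample_toc = [[1, "Introduction", 1], [2, "Main content", 2], [3, "Conclusion", 3]]
print(json.dumps(toc_to_outline(sample_toc, "This is the title"), indent=2))
```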

proven pier
dull radish
#

Especially when u take into account more complex pdfs

topaz sail
#

Hello, so I am new to coding and I want to learn data science, and I want to get the basics of math first. Where can I learn? Any recommendations?

sour blaze
# dull radish Hey guys, I'm looking for a way to parse a pdf file into a format like: { "title...

Given the mostly unstructured nature of the PDF format, everything would likely need to be done through heuristics. This could be things like detecting headers from font usage or the document outline or tables from the way items are grouped. Fortunately here, there are multiple existing tools for this, primarily for use with LLMs but perhaps useful here, such as docling, markitdown, and PyMuPDF4LLM to name a few.

Depending on where the document was sourced though, it's possible that it already includes similar information in the form of logical structure. Unfortunately, I'm not sure if anything supports a high-level interface for it though most PDF processors will let you access the StructTreeRoot key where this information is stored. So that would probably be a last resort.
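The "detect headers from font usage" idea can be sketched like this: treat the most common font size (weighted by characters) as body text and rank the larger sizes as H1, H2, H3. The input is assumed to be `(text, size, page)` tuples, for example collected from the spans in PyMuPDF's `page.get_text("dict")` output; the `min_ratio` threshold and the half-point rounding are guesses to tune, and headings that differ from body text only by bold weight will be missed.

```python
from collections import Counter

def outline_from_spans(spans, max_levels=3, min_ratio=1.15):
    """Guess headings from font sizes (a heuristic, not a general PDF parser).

    spans: iterable of (text, font_size, page) tuples.
    The most common size (weighted by characters) is treated as body text;
    larger sizes, biggest first, become H1, H2, ... up to max_levels.
    """
    # Drop empty spans and round sizes to 0.5 pt so near-equal sizes merge
    spans = [(t.strip(), round(s * 2) / 2, p) for t, s, p in spans if t.strip()]
    if not spans:
        return []
    weight = Counter()
    for text, size, _ in spans:
        weight[size] += len(text)
    body_size = weight.most_common(1)[0][0]

    heading_sizes = sorted(
        (s for s in weight if s >= body_size * min_ratio), reverse=True
    )[:max_levels]
    level_of = {s: f"H{i + 1}" for i, s in enumerate(heading_sizes)}

    return [
        {"level": level_of[size], "text": text, "page": page}
        for text, size, page in spans
        if size in level_of
    ]

spans = [
    ("Introduction", 18.0, 1),
    ("Some body text that goes on for a while.", 10.0, 1),
    ("Main content", 14.0, 2),
    ("More body text, again much longer than the headings.", 10.0, 2),
    ("Conclusion", 12.0, 3),
    ("Closing remarks in body text.", 10.0, 3),
]
for item in outline_from_spans(spans):
    print(item)
```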

dull radish
# sour blaze Given the mostly unstructured nature of the PDF format, everything would likely ...

I have explored docling and pymupdf4llm so far. docling does seem to do a decent job, although I'll probably have to switch to an ONNX runtime to fit under the <200 MB size constraint.

Just tried out pymupdf4llm earlier and it's pretty solid as well but not quite enough on its own, I'll try applying heuristics and see where this goes.

Haven't tried markitdown so will do that and see.

and I'm surprised this would ideally be done through heuristics; I thought a lightweight classifier or a visual analysis tool, maybe YOLO fine-tuned on a dataset, would be a better approach.

and yeah I wouldn't bank on that, the ones I'm using don't have that yeah.

#

thanks though I'll definitely give this a shot

tranquil jasper
#

hi
how long would you say it takes, on average, for someone to get a grip on computer vision?
someone who knows programming, but nothing about computer vision or ai in general

tepid bluff
#

claude 4.5

grand minnow
#

lol I didn't think I asked too much too often to hit these errors

raven garden
#

Hello guys, hope you are doing well. I have a problem I can't solve. I'm trying to deploy my ML model with Streamlit; it's the first time I've done this. I've been told that the model and other relevant files should be in the same directory as the Streamlit app Python file, which is what I did. However, when I run the code I get the error message (which I wrote in the code): "Model file not found! Please ensure 'los_model_complete.pkl' is in the same directory."

I do not understand what's going on; if someone can guide me on how to solve this I would appreciate it very much, thanks

#

P.S.: I made a mistake in the code screenshot; the file should be los_model_complete.pkl and not los_best_model. I changed it but it still doesn't work
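A likely cause, assuming the app opens the file with a bare relative name: relative paths are resolved against the current working directory (wherever `streamlit run` was launched from), not against the folder containing the script. Building the path from the script's own location avoids that. The helper names below are hypothetical; only the filename comes from the message. Only unpickle files you trust.

```python
import pickle
from pathlib import Path

MODEL_FILENAME = "los_model_complete.pkl"

def model_path(script_file, filename=MODEL_FILENAME):
    """Path to `filename` in the same folder as the script, regardless of cwd."""
    return Path(script_file).resolve().parent / filename

def load_model(script_file, filename=MODEL_FILENAME):
    path = model_path(script_file, filename)
    if not path.exists():
        # Report where we actually looked; this usually reveals the problem
        raise FileNotFoundError(
            f"Model file not found at {path} (current working directory: {Path.cwd()})"
        )
    with path.open("rb") as f:
        return pickle.load(f)

# In the Streamlit app itself you would call:
#   model = load_model(__file__)
```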

sage sparrow
#

I'm looking for a realistic e-commerce dataset. Does anyone know any sources outside of Kaggle? My next bet was looking into synthetic data creation with Python functions since it's just for a showcase project
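For the synthetic route, a small generator is usually enough for a showcase. The sketch below builds a toy orders table with NumPy and pandas; the schema, category mix, price levels and distributions are all made-up assumptions chosen to look plausible (right-skewed prices, a few heavy customers), not derived from any real dataset.

```python
import numpy as np
import pandas as pd

def make_orders(n_orders=1000, n_customers=200, seed=0):
    """Generate a toy e-commerce orders table (all values are synthetic)."""
    rng = np.random.default_rng(seed)
    categories = np.array(["electronics", "clothing", "home", "books", "toys"])
    # Rough per-category price levels (made up for illustration)
    base_price = {"electronics": 250.0, "clothing": 40.0, "home": 60.0, "books": 15.0, "toys": 25.0}

    category = rng.choice(categories, size=n_orders, p=[0.15, 0.3, 0.2, 0.2, 0.15])
    # Log-normal noise gives a right-skewed, always-positive price distribution
    price = np.array([base_price[c] for c in category]) * rng.lognormal(0.0, 0.3, n_orders)
    quantity = rng.poisson(1.0, n_orders) + 1  # at least one item per order
    # A few heavy customers place many orders (Zipf-like skew)
    weights = 1.0 / np.arange(1, n_customers + 1)
    customer = rng.choice(n_customers, size=n_orders, p=weights / weights.sum())
    start = np.datetime64("2024-01-01")
    order_date = start + rng.integers(0, 365, n_orders).astype("timedelta64[D]")

    df = pd.DataFrame({
        "order_id": np.arange(1, n_orders + 1),
        "customer_id": customer + 1,
        "order_date": order_date,
        "category": category,
        "unit_price": price.round(2),
        "quantity": quantity,
    })
    df["revenue"] = (df["unit_price"] * df["quantity"]).round(2)
    return df.sort_values("order_date", kind="stable").reset_index(drop=True)

orders = make_orders()
print(orders.head())
print(orders.groupby("category")["revenue"].sum().round(2))
```

Fixing the seed keeps the dataset identical between runs, which helps if the showcase includes charts or written conclusions.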

stable flower
modest badger
#

AI cannot "infinitely recycle energy" like a perpetual motion machine

naive river
modest badger
#

haha, well, I am excited to hear the thought

tepid turtle
#

hello, chat. I'm new here and wanted to share my data cleaning/visualization project. Just looking for some feedback from you, so is it ok if I share a link to the GitHub Pages preview and a repo here?

tepid turtle
#

Thanks) My project will be especially interesting for those who're interested in media, politics and journalism things

So I was looking at the Reporters Without Borders estimates of countries' press freedom index, and thought that if we could see this data on a graph across the years, it could be much more informative than the single-year snapshot they show in their infographics as a world map with countries on it.

So here I've collected their data from 2002 until now on the countries' press freedom rankings, their scores and different factors (which are very valuable, but only start from 2022).

You can play around with it, selecting different countries, years and factors. Again, it could be very informative for those who appreciate truthful journalism.

And here's the link: https://vlad-gby.github.io/rsf_index_visualization/
And a repo with files and a notebook (readme is not ready yet, working on it): https://github.com/vlad-gby/rsf_index_visualization

GitHub

I decided that the data form RSF can be structured better and in a more informative way - in a form of interactive graph - vlad-gby/rsf_index_visualization

stable flower
#

@modest badger the whole point of a PMM (perpetual motion machine) is to find a way to infinitely recycle and use heat energy to power machines or, in general, support civilization. the problem with every life form and physical object is that it's subject to entropy and loses either its structure or energy.

#

the only way we can avoid death or extinction is by finding a way to reverse or manipulate entropy and bypass the laws of thermodynamics

#

so far there doesn't seem to be a way to do it

#

it seems like we all will die with this bright starry universe

#

what's even more terrifying than death is a cyclical universe. scientists have said that our universe will never repeat and that this is likely the first lifespan or iteration of the universe. that may sound odd, since this is the only known universe, as if there were more. well, my point is scientists said that we will never be created again through another big bang or big crunch

#

it still worries me, since there is a possibility for lifeforms on earth to exist once more, with billions of animals suffering and being exploited and humans suffering as well

#

i don't wanna come to earth anymore

modest badger
#

so your angst is not just your own mortality, but the universes?

torpid mirage
#

Me when I'm not exempt from transformation and entropy as a physical object within the universe

serene scaffold
#

But how is this related to data science and AI?

serene scaffold
mossy pond
vast thunder
#

What are some good resources to get started with AI and ML? I barely know anything about it. And please a more "practical" approach. I couldn't keep reading the Microsoft AI tutorial on Github just because it was more theoretical than practical

serene scaffold
#

!resources data science

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

vast thunder
vast thunder
serene scaffold
vast thunder
serene scaffold
vast thunder
buoyant reef
#

what module should i use to make an ai

serene scaffold
#

Like what is this AI supposed to do?

buoyant reef
#

i want the ai to be a very low-level catbot

#

chatbot*

split olive
#

Hello, I'm trying to make a suspicious activity detection model, given to me as a challenge by my prof. My current plan is to make a shoplifting detection model first, then build on it for other activities. I'm going to use the yolo11-pose.pt model to get the keypoints, label each frame and feed them into an LSTM model so it can predict whether or not there is a shoplifting event. Is this a good approach? I really want some advice. I haven't started coding; I'm only looking for approaches. Any help is appreciated :)
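The preprocessing half of that plan can be sketched independently of the model: normalize each frame's 17 COCO-layout keypoints (the layout YOLO pose models output), then cut one tracked person's sequence into fixed-length windows an LSTM can consume. Everything here is an illustrative assumption: it presumes you already have one person's keypoints per frame (e.g. from tracking), the window/stride values and the "any suspicious frame labels the window" rule are choices to revisit, the track is random fake data, and missing keypoints (zero confidence) are not handled.

```python
import numpy as np

N_KEYPOINTS = 17  # COCO keypoint layout

def normalize_frame(kpts):
    """Center keypoints on their mean and scale by their spread.

    kpts: (17, 2) array of x, y pixel coordinates for one person in one frame.
    Removes where the person stands in the image and how large they appear,
    so the sequence model sees pose and motion rather than position.
    """
    center = kpts.mean(axis=0)
    scale = np.linalg.norm(kpts - center, axis=1).max()
    return (kpts - center) / (scale if scale > 0 else 1.0)

def make_windows(track, labels, window=30, stride=10):
    """Cut one person's keypoint track into fixed-length sequences for an LSTM.

    track:  (frames, 17, 2) keypoints for a single tracked person
    labels: (frames,) per-frame 0/1 labels (1 = suspicious)
    Returns X with shape (n_windows, window, 34) and y with shape (n_windows,);
    a window is labeled 1 if any of its frames is labeled 1.
    """
    frames = np.stack([normalize_frame(f) for f in track]).reshape(len(track), -1)
    X, y = [], []
    for start in range(0, len(track) - window + 1, stride):
        X.append(frames[start:start + window])
        y.append(int(labels[start:start + window].max()))
    return np.array(X), np.array(y)

rng = np.random.default_rng(0)
track = rng.uniform(0, 640, size=(100, N_KEYPOINTS, 2))  # fake detections
labels = np.zeros(100, dtype=int)
labels[70:85] = 1
X, y = make_windows(track, labels)
print(X.shape, y)
```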

reef latch
#

hola

#

anyone know how i can get a start into machine learning through python?

split olive
dull radish
stable flower
#

thank you lisan. what a perfect answer you gave me about my issues. i really agree with #3 as well as the rest of what you said. I suppose there are ways to live much longer or possibly live nearly forever, although we may never defeat the laws of thermodynamics, which govern us and energy exchange. i guess i'm just worried we will be trapped doing the same things over and over forever in a deterministic universe. i personally believe in free will, but if time really is similar to a flat circle or even a clock, i assume it would repeat. but like you said, that's incredibly far into the future and better not to worry about it

sonic ravine
#

not sure if this is the right place to ask, but does anyone have experience with publishing applied maths papers?
and if so, what plotting library did you use for graphs?
i can't decide between plotly and matplotlib, but i'm open to other suggestions too

serene scaffold
#

I used matplotlib when I was in academia, and I hate it.

#

the plotly API feels less intentionally unintuitive than the matplotlib one.

sonic ravine
#

thats really helpful thanks a lot <3

carmine nest
# split olive Hello, im trying to make a suspicious activity detection model given as a challe...

You have this project and the accompanying thesis, which is one of the most advanced in the field (the model is under a non-commercial license). It’s up to you to rewrite and train the final model. https://github.com/TeCSAR-UNCC/PoseLift

GitHub

This directory contains the PoseLift dataset published in WACV 2025 conference. - TeCSAR-UNCC/PoseLift

toxic pilot
#

well he ain't wrong. AI = 0 because that's the amount of value it provides in that specific use case

calm cipher
calm cipher
#

But it doesn't make it less clunky to use unfortunately

#

I'm going to have to learn plotly sometime

lapis sequoia
#

👀👀

split olive
lapis sequoia
#

Are GANs dead? Like, what’s going on? Pump the latent dim into robots so they can learn quicker and let’s make everything automated. Let’s go!

iron basalt
#

I guess I can add the fourth case of lack of software ecosystem (tooling) surrounding the idea.

sour hamlet
#

Hey all, I’m Ali from Code Craft — I made a beginner-friendly Linear Regression tutorial in Python. Excited to learn with you

lapis sequoia
sour hamlet
#

👋 Hey everyone!

I just finished making a short and beginner-friendly tutorial on Linear Regression in Python.

✅ You’ll learn how to:

Import and work with a clean dataset (included)

Train and visualize a simple linear regression model

Predict house prices using scikit-learn

🎯 It’s aimed at new ML learners who want a clear, step-by-step walkthrough with no fluff.

📊 Dataset (CSV): https://drive.google.com/file/d/1rZ5OhntQeJ5gA7WtWFAWbntgwznTx10X/view?usp=sharing
▶️ Tutorial (14–15 mins): https://www.youtube.com/watch?v=zBk72AV_weg&t=76s

I’d love any feedback or suggestions on what project I should do next. Thanks!

🎯 Learn how to build your first Linear Regression model in Python and use it to predict house prices — step by step!

In this beginner-friendly tutorial, we’ll cover:
✔️ Importing libraries
✔️ Loading a clean, custom dataset
✔️ Splitting the data into training and test sets
✔️ Training a simple linear regression mode...

iron basalt
#

The options are also narrowed down by available hardware (although FPGAs exist so you can put this under "too annoying to work with" too).

lapis sequoia
#

I don’t know, it used to be the big thing. But it’s just two NNs fighting. autoencoders are better by a lot.

sour hamlet
iron basalt
# sour hamlet Totally agree, but why is it like that

Computers that we have these days are a miracle of modern science and international cooperation. It's a miracle that we have any, and as many options as we do. Ideally we would have way more types/options though.

#

Then on the software side it's driven by open source, which is driven by a few very motivated individuals (working for free) that are just rare, like 1 in a million rare (combination of skill, obsession and resources (free time mostly)).

sour hamlet
#

I know, right? I can't even imagine how they created stuff like this. But do you want to check out my channel? I would love your support

#

I started today, and made my first video

#

It is about LinearRegression

next shard
sour hamlet
next shard
sour hamlet
#

Ok I will try!

#

although I am not that good

#

At tensor

#

But I will give a beginner

#

Tutorial

#

If that is ok

next shard
#

Ok brother

sour hamlet
#

If you want notifications subscribe!

#

You don't have to

serene scaffold
sour hamlet
#

Oh

#

Sorry

serene scaffold
#

Anyway, I don't recommend learning to use tensorflow. It hasn't been popular for several years

#

Just use pytorch

serene dew
#

I think i finally found the "book of everything"

gilded axle
odd stratus
#

does anyone know where i can find just a very basic premade ai network and weights to run on my home computer?
nothing too fancy, just something i can run on my cpu

grand minnow
gentle stone
peak field
#

Can anyone give me some advice on how to detect if a person is looking at their phone? Are there any libraries that could help me achieve that? I found this gaze-tracking library, but it's meant for webcams. I guess I could still use it, but it's not the greatest?

sour hamlet
restive roost
serene scaffold
#

@sour hamlet your message was removed for advertising

gentle stone
sour hamlet
#

I might do a live stream of an Ai "Hackathon"

short moth
#

anyone know how I can read a pdf table that is formatted like this?

#

i tried to use camelot but it gives me this

short moth
#

Ok I'll put it in AI

#

yup 🤣

sour hamlet
#

Hey guys

next shard
#

Hi guys please suggest to me what I should add in my GitHub profile

short moth
#

anyone have recommendations for making this code more efficient? (it works, but I don't like creating another temporary dataframe; I want it all in one statement):

import pandas as pd

def assign_id(value_list):
    # Forward-fill: every empty string takes the last non-empty value above it
    previous_value = ''
    filled = []
    for value in value_list:
        if value != '':
            previous_value = value
        filled.append(previous_value)
    return filled

def tweak_df(df_):
    # Keep columns 0, 2, 3, 4, rename them and drop the header row (index 0)
    temp_df = (df_.filter(items=[0, 2, 3, 4])
               .rename(columns={0: 'END_ID', 2: 'PART_NUMBER', 3: 'DESCRIPTION', 4: 'QTY'})
               .drop([0]))
    temp_df['END_ID'] = assign_id(temp_df['END_ID'].tolist())
    return temp_df
#

(I'm basically trying to replace the empty values in one column with the last non-empty value before them, since some values are empty)

serene scaffold
short moth
serene scaffold
short moth
#

so for example if I have:
0 hello world
1
2
3 hello
4

serene scaffold
#

Always represent missing data with NaN and not with anything else.

short moth
#

oh I did not think of that

#

my missing data is empty strings

#

so I replace the empty strings with NaN

#

np.nan

#

and then use ffill

#

very good suggestion haha

#

did not think of it
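A minimal sketch of that suggestion folded back into `tweak_df`, so the whole thing becomes one chained expression with no temporary DataFrame. It assumes the raw camelot table uses integer column labels 0 to 4, that row 0 holds the header text, and that missing IDs arrive as empty strings; the `raw` frame is made-up sample data.

```python
import numpy as np
import pandas as pd

COLUMNS = {0: "END_ID", 2: "PART_NUMBER", 3: "DESCRIPTION", 4: "QTY"}


def tweak_df(df_: pd.DataFrame) -> pd.DataFrame:
    """Select and rename columns, drop the header row, forward-fill END_ID."""
    return (
        df_.filter(items=list(COLUMNS))
        .rename(columns=COLUMNS)
        .drop([0])  # row 0 is the table's header text
        # empty string -> NaN, then carry the last seen ID down
        .assign(END_ID=lambda d: d["END_ID"].replace("", np.nan).ffill())
    )


# Made-up sample shaped like the example above.
raw = pd.DataFrame([
    ["ID", "x", "PART", "DESC", "QTY"],
    ["hello world", "", "P1", "d1", "1"],
    ["", "", "P2", "d2", "2"],
    ["", "", "P3", "d3", "1"],
    ["hello", "", "P4", "d4", "5"],
    ["", "", "P5", "d5", "3"],
])
print(tweak_df(raw)["END_ID"].tolist())
# ['hello world', 'hello world', 'hello world', 'hello', 'hello']
```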

serene scaffold
#

if the CSV is like ,,3,,, the actual empty strings should be interpreted as NaN
if the csv is like "","",3,"","", you can change the parameters of read_csv to interpret "" as NaN.
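A small sketch of both CSV cases. `na_values` and `keep_default_na` are real `read_csv` parameters; in current pandas the default NA list already includes the empty string, so the explicit arguments mostly matter when the defaults are switched off or other tokens should count as missing. The CSV strings are made up.

```python
import io

import pandas as pd

unquoted = "a,b,c,d,e\n,,3,,\n"
quoted = 'a,b,c,d,e\n"","",3,"",""\n'

# Default parsing: empty fields become NaN.
df_unquoted = pd.read_csv(io.StringIO(unquoted))

# Switch off the default NA list and say explicitly that "" means missing.
df_quoted = pd.read_csv(io.StringIO(quoted), na_values=[""], keep_default_na=False)

print(df_unquoted.isna().iloc[0].tolist())  # [True, True, False, True, True]
print(df_quoted.isna().iloc[0].tolist())
```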

short moth
#

I'm reading PDFs so it's different XD

#

pdf tables

serene scaffold
#

that sucks

short moth
#

it's a pain but I have no other choice

#

all the data at the company I work at is in PDFs

#

I mean, I could convert all the data to CSV or Excel, but it would take me years cuz I'd need to do it all by hand

#

it's a weird-ass and long-ass table to parse

#

so using camelot helped me to put it in a more readable form

lapis sequoia
#

Have any of you ever used a transformer for time series? If so, was it encoder-decoder ish?

magic dune
#

classification

#

forecasting?

lapis sequoia
magic dune
lapis sequoia
magic dune
#

@lapis sequoia also it depends on the nature of your data

lapis sequoia
lapis sequoia
magic dune
#

wanna know if it's univariate, bivariate, or multivariate

lapis sequoia
#

A good amount, not overblown

magic dune
#

ok ya TFT should be good but I would start with a subset of dataset and experiment

#

with other models as well

#

are you strictly stuck to a transformer-based model?

lapis sequoia
magic dune
lapis sequoia
magic dune
#

!pip darts

arctic wedgeBOT
#

A python library for easy manipulation and forecasting of time series.

Released on June 29, 2025.

magic dune
#

darts supports the TFT model

#

so you just got to import it

#

and you can play around with it

magic dune
lapis sequoia
lapis sequoia
lapis sequoia
#

I know

magic dune
#

from my experience

#

took me 7 days to train on a simple dataset

#

@lapis sequoia

#

this looks promising for HF based

lapis sequoia
sour hamlet
#

🚀 Just Dropped: Build Your Own AI Chatbot in Python (Part 1)
Hey everyone! 👋
I just released the first video in a free series where I teach how to build an AI chatbot from scratch using Python.
✅ No libraries. No shortcuts. Just pure Python and real learning.

📹 Watch here: https://www.youtube.com/watch?v=2p9hr53iBYY
🤖 In this part, we build the bot’s brain + memory, and it actually learns from you as you chat!

If you’re into:

Python projects

AI / machine learning

Building real-world tools
…then this is for you. Would love feedback or ideas for future parts!

Let’s build smarter tools together. 💬

🚀 Welcome to Part 1 of this beginner-friendly series where we build an AI chatbot in Python, from scratch!

In this video, you'll learn how to:
✅ Create a chatbot that responds to known phrases
✅ Teach the bot new responses during chat
✅ Store the chatbot's brain and memory using JSON
✅ Save every conversation and response fo...

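The video's actual code isn't shown here, so this is only a guess at the general idea described above: a phrase-to-response dictionary that learns new replies and persists them to a JSON file. `TinyBot` and `brain.json` are hypothetical names.

```python
import json
import tempfile
from pathlib import Path


class TinyBot:
    """Exact-phrase chatbot whose 'brain' is a JSON dict on disk (hypothetical sketch)."""

    def __init__(self, brain_path: Path):
        self.brain_path = brain_path
        # Load previously learned phrases if the file exists
        self.brain = json.loads(brain_path.read_text()) if brain_path.exists() else {}

    @staticmethod
    def _key(message: str) -> str:
        return message.strip().lower()

    def reply(self, message: str):
        """Return the learned response, or None if the phrase is unknown."""
        return self.brain.get(self._key(message))

    def learn(self, message: str, response: str) -> None:
        """Store a new phrase/response pair and save the whole brain."""
        self.brain[self._key(message)] = response
        self.brain_path.write_text(json.dumps(self.brain, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "brain.json"
    bot = TinyBot(path)
    print(bot.reply("hello"))            # None, nothing learned yet
    bot.learn("Hello", "Hi there!")
    print(TinyBot(path).reply("HELLO"))  # Hi there! (reloaded from disk)
```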
restive roost
#

Or any suggestion for low cost 'receptionist' AI agent?

dense tulip
#

I'm building a SQL Ai agent, And I'm a bit lost

#

Any public repository that can bring me some help?

jaunty helm
fresh sluice
gentle stone
#

Hi, I'm back. As I said, I'd start this course after finishing my previous courses, and now I'm going to take it. How's your progress?

cinder onyx
#

Hey, I’m a student really into finance and quant stuff, and I’ve been thinking of starting a project in that space (something hedge fund-ish, infra/research focused). Just wondering if anyone else here might be interested in teaming up; it could be a cool opportunity to build/research something legit and learn a ton along the way.
Nothing formal, just seeing who’s out there. Feel free to DM!

lapis sequoia
sour hamlet
#

My friends requested it because they want to learn about AI and machine learning. This one wasn't really AI, but my next video will have some AI in it.

toxic pilot
toxic pilot
#

encoder + decoder is good for seq2seq

jaunty helm
toxic pilot
# magic dune nope

yes, right, because they were designed for sequence data, which is essentially a superset of time series

calm cipher
#

there have been variants for visual data but I'm not very familiar with them

buoyant vine
#

that is definitely a stretch of a definition imo šŸ˜…

calm cipher
#

attention doesn't necessarily imply timeseries but it's often used in timeseries data

#

I'm using timeseries and sequential interchangeably

#

actually to drive that point home even further, a lot of language models add positional encoding to the input data, because otherwise the transformer wouldn't know where timesteps exist in the input in relation to one another

#

whereas you'd get some kind of notion of position in an RNN for free just based on how it works

#

but the basic abstraction you use to think about transformers are queries, keys, and values, which has nothing to do with position

#

actually if you want an example of non-sequential attention, check out global style tokens for tacotron

#

they do but the attention layer would not consider position in weighing each step

#

positional encoding allows it to do this

#

right

#

yes exactly

#

it's a question of who thinks it's sequential data

#

without positional encoding, you're interpreting the data as sequential, but the attention layer doesn't care

#

with positional encoding, the attention layer can treat the data as sequential, or at least account for position when computing attention scores
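A small NumPy sketch of that point, assuming a single attention head with random weights and the sinusoidal encoding from the original Transformer paper (learned or rotary encodings are other options): shuffling the inputs only shuffles the outputs, until position information is added.

```python
import numpy as np


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention, no masking."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def sinusoidal_positions(seq_len, d_model):
    """PE[pos, 2k] = sin(pos / 10000^(2k/d)), PE[pos, 2k+1] = cos(same angle)."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))


rng = np.random.default_rng(0)
seq_len, d = 5, 8
x = rng.normal(size=(seq_len, d))
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
perm = np.array([3, 0, 4, 1, 2])  # a fixed non-identity shuffle

# No positions: attention(shuffled x) == shuffled attention(x)
plain = np.allclose(self_attention(x[perm], wq, wk, wv),
                    self_attention(x, wq, wk, wv)[perm])

# Positions added after shuffling: order now changes the result
pe = sinusoidal_positions(seq_len, d)
with_pe = np.allclose(self_attention(x[perm] + pe, wq, wk, wv),
                      self_attention(x + pe, wq, wk, wv)[perm])
print(plain, with_pe)  # True False
```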

#

this is only indirectly related to the attention talk but I thought it was cool

#

some TTS models perform a small one-dimensional convolution over the input data before feeding it into an RNN

#

so if you're trying to synthesize "Hello", it has the effect of blurring nearby characters into each other

magic dune
calm cipher
#

so instead of considering H, e, l, l, o separately, it would consider H-e, e-l, l-l, l-o together

#

and I think that's neat

#

yeah

#

I think it's loosely related to the n-gram concept from NLP

#

and it works really well with speech synthesis where a character by itself isn't enough context to know how to pronounce it
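A toy version of that blurring, assuming one-hot character vectors and a fixed averaging kernel of width 2; a real TTS encoder learns its embeddings and kernel weights and usually uses wider kernels with padding. Like most deep learning libraries, this computes cross-correlation and calls it convolution.

```python
import numpy as np

text = "Hello"
chars = text.lower()
vocab = sorted(set(chars))                                     # ['e', 'h', 'l', 'o']
one_hot = np.eye(len(vocab))[[vocab.index(c) for c in chars]]  # shape (5, 4)

kernel = np.array([0.5, 0.5])  # fixed width-2 kernel: average each char with the next


def conv1d_valid(x, k):
    """'Valid' 1-D convolution over time, applied to every channel independently."""
    width = len(k)
    return np.stack([(x[t:t + width] * k[:, None]).sum(axis=0)
                     for t in range(len(x) - width + 1)])


blurred = conv1d_valid(one_hot, kernel)  # shape (4, 4): one row per adjacent pair
pairs = [text[t:t + 2] for t in range(len(text) - 1)]
for pair, row in zip(pairs, blurred):
    print(pair, row)  # e.g. "He" -> half 'h', half 'e'
```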

toxic pilot
jaunty helm
#

like, I don't think there's anything even close to a silver-bullet architecture for time series, just because of how broad that is
like the position of a pendulum swinging back and forth on a moving cart could be a time series, and so can the population of rabbits and wolves in an area

tepid bluff
#

yeah

iron basalt
# calm cipher actually to drive that point home even further, a lot of language models add pos...

It's been shown that if you just scale it up enough and train long enough it basically learns to do positional encoding on its own. Having it helps by it not having to waste a bunch of time and resources on learning this though. Interestingly, the thing it learns resembles grid cells (which positional encoding does too) (hinting that this may be a universal solution to positioning problems that biology has also found).

#

(RNNs have been shown to learn the same thing too)

calm cipher
#

I wouldn't be surprised, but it would be based on patterns it finds in the data

#

mathematically any single attention layer has no concept of position or a local space around different timesteps in the input or anything like that

iron basalt
#

(And grid cells it seems)

lime grove
#

does anyone know if cuda 12.9 plays nice with PyTorch?

calm cipher
lime grove
#

which version torch?

calm cipher
#

whatever the latest is

lime grove
#

what GPU are you using?

calm cipher
# lime grove what GPU are you using?

actually to clarify I have 12.9.1 installed at the system level, but the cuda in my virtual environment is 12.6.4.1, but they are working fine together

#

it's on a 3090

lime grove
#

your PyTorch is using the 12.6?

calm cipher
#

it's whatever the PyTorch module has listed as a dependency

#

I wouldn't be surprised if it takes many more than one block

#

this paper tested it on models with 125 million to 1.3 billion weights

#

they saw it learning positions within 4 layers

lime grove
calm cipher
#

hm I guess it must be 12.6 but it is not incompatible with a system installation of 12.9

#

you might want to disregard what I said if you're intending to run it with 12.9, although I don't know if it's compatible with the current stable version

lime grove
#

sounds like you're not sure of anything

calm cipher
#

right, like i said, disregard what I said before
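One way to settle which CUDA build PyTorch is actually using, sketched with attributes that exist in current PyTorch releases (`torch.version.cuda`, `torch.cuda.is_available()`). Pip wheels ship their own CUDA runtime, so this can differ from the system toolkit reported by `nvcc --version`; as far as I know, the driver only has to be new enough to support it.

```python
import importlib.util


def torch_cuda_report():
    """Summarise the installed PyTorch build and whether it can see a GPU."""
    if importlib.util.find_spec("torch") is None:
        return {"torch": None}  # PyTorch not installed in this environment
    import torch

    report = {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device"] = torch.cuda.get_device_name(0)
    return report


print(torch_cuda_report())
```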

magic dune
thorny zealot
#

book or online course to learn to code?

serene scaffold
arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

brisk cave
#

Hey guys!

My name is Romie and I am an aspiring Data Scientist!
No formal schooling and currently hold a position as a data analyzer!
My company is small and no one has degrees. The data literacy is very limited so I'm traveling through uncharted territory.

I just started learning Python and I am in great need of a mentor or someone to point me in the right direction. My co-workers have no interest in data, so most of my conversations are diluted to fit my audience.

Someone @ me or message me. I'm hungry to learn.

What should I learn first? Mathematics? Language fundamentals? Someone point me.

Thanks!

Romie

sour hamlet
#

🚀 New Video: Smarter Python Chatbot That Understands Typos (Part 2)
Hey everyone! 👋

Just dropped Part 2 of my free YouTube course where we’re building an AI chatbot in Python. This time, it gets smarter.

✅ Now it can understand fuzzy input: if someone types "helo" instead of "hello," it still gets it right.
This is where the AI magic starts!

📹 Watch here: https://youtu.be/gI4ftsWQSXk
🧠 Built with basic Python + fuzzy matching, no fancy libraries needed.

If you're into AI, Python projects, or building tools that learn, check it out and let me know what you'd like to see in Part 3!

In this second part of our free AI Chatbot course, we take our basic chatbot to the next level by teaching it to understand fuzzy input!

That means: if a user types “helo” instead of “hello,” the bot will still know what you meant. 🧠

✅ What you'll learn in this video:
• How to use difflib.get_close_matches() for fuzzy ...

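Going only by the description (the video's code isn't shown here), typo-tolerant lookup of this kind can be done with the standard library's `difflib.get_close_matches`. A minimal sketch; the `cutoff` of 0.75 and the `RESPONSES` table are assumptions.

```python
import difflib

RESPONSES = {"hello": "Hi there!", "how are you": "Doing well, thanks."}


def fuzzy_reply(message, brain, cutoff=0.75):
    """Answer with the closest known phrase, or None if nothing is close enough."""
    matches = difflib.get_close_matches(message.strip().lower(), list(brain), n=1, cutoff=cutoff)
    return brain[matches[0]] if matches else None


print(fuzzy_reply("helo", RESPONSES))   # Hi there!
print(fuzzy_reply("pizza", RESPONSES))  # None
```

A higher `cutoff` makes the bot stricter; too low and unrelated messages start matching something.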
lime grove
#

getting nvidia to install is fucking tedious

#

you'd imagine that a trillion dollar corp wouldn't be so inept

#

yeah

#

to be fair, Microshit is worse

#

and it has been worse for decades

#

like, why the fuck do I have to iterate over goddamn visual studio versions while installing nvidia?

#

slow mode

#

it has a "feature" called Nsight, and there are versioning issues. Added to that are whatever mysteries that happen in your own machine (GPU versions, which OS, etc)

#

Anthropic has an excuse. It is barely 4 years old

#

I just want a local CUDA for PyTorch (another versioning headache)

lime grove
#

done 🫔

#

had to install the CUDA toolkit 12.9, then find out that PyTorch only likes 12.8, so uninstall, restart, redownload 12.8, install & discover that it hung on the Nsight stuff.

#

so, I had to then wipe the computer clean of nvidia 12.8, Visual Studio 2022, then reinstall the same fucking Visual Studio version and nvidia version and then it worked

#

like WTF man

#

there is an Nvidia app that doesn't seem to work unless you have visual studio installed. The whole thing is bonkers

#

so, after installing 20 GB of Visual Studio, another 4 GB of nvidia, I can write a 3 line python script! Woohoo!

wooden sail
#

probably the visual studio c++ redistributables?

#

ah just noticed the message is pretty old

lime grove
#

not sure, tbh

#

these packages are so sprawling and overextended that it is probably unknowable

last crag
#

Hi, does anyone happen to have a prompt script for sending prompts to an LM Studio LLM with vectors from a Qdrant database that's working? I cannot seem to get mine working after days of trying...

eternal mist
#

Hey, where should I learn the NumPy library from?

#

Thanks mate

old rivet
#

Hey everyone, I’m a Biochemistry grad diving into ML for bioinformatics. Just shared two Jupyter Notebook projects analyzing ALAS1 gene expression (1000-row Bgee RNA-Seq dataset) on GitHub (MIT License):

ALAS1 Expression Analyzer: Normalizes expression & read counts to compute Expression_Combined, with stats and histogram.
ALAS1 Read Count Visualizer: Plots trends over time with linear regression.

šŸ”— Code: https://github.com/Hi-Script/biochemistry_ml_mini_projects
Why: ALAS1 is key for heme biosynthesis, prepping for ML like LSTM.
Feedback: Ideas for ML models or code improvements?


lime grove
#

@old rivet I don't understand what that trend line is supposed to be trending on

#

like, is it appropriate to toss out those apparent outliers (like the one around 150 minutes)?

old rivet
lime grove
#

you could project the points to the y-axis, and thus build a sort of probability mass function. From there assume some sort of distribution, and remove outliers. But, that is just clumsy, surely they mean something physical?

#

I am not a biochemist. But from a pure time series perspective, that trendline is somewhat problematic. It doesn't seem to have any meaning
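A minimal sketch of the "project onto the y-axis and flag outliers" idea, using Tukey's IQR fences instead of fitting a distribution. The values are made up, and flagging a point is not the same as deciding to drop it.

```python
import numpy as np


def iqr_outlier_mask(values, k=1.5):
    """Tukey fences: flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


expression = np.array([1.0, 1.2, 0.9, 1.1, 1.0, 4.8, 1.05])  # made-up toy values
print(iqr_outlier_mask(expression))  # only the 4.8 is flagged
```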

#

torch success after last night's hair pulling frustrations:

#

pretty cool that pip3 installing PyTorch brings in mpmath (arbitrary-precision floating-point computation). Also, SymPy

old rivet
# lime grove I am not a biochemist. But from a pure time series perspective, that trendline i...

Thanks for this! There's an outlier at ~150 minutes, which I kept after IQR analysis because I suspect it’s biologically meaningful (ALAS1 is key in heme biosynthesis). I want to:

  1. Improve trend analysis (e.g., handle outliers better).
  2. Predict future h.ALAS1_Combined values (time-series forecasting).
  3. Classify strains by expression patterns (e.g., high vs. low).

ML Ideas: I’m exploring:

  • Robust regression (Huber) to reduce outlier impact.
  • LSTM for forecasting ALAS1_Combined over Time_min.
  • K-means clustering to group strains by expression.

Any suggestions for robust ML models or preprocessing? Better ways to handle outliers or prep data for LSTM/clustering?
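A compact sketch of the Huber idea, fitted by iteratively reweighted least squares in NumPy on synthetic data (not the ALAS1 measurements). `delta=1.345` is the usual tuning constant; scikit-learn's `HuberRegressor` is a ready-made alternative.

```python
import numpy as np


def huber_fit(x, y, delta=1.345, n_iter=50):
    """Fit y ~ a + b*x under Huber loss with iteratively reweighted least squares."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from ordinary least squares
    for _ in range(n_iter):
        resid = y - X @ beta
        # Robust residual scale (MAD / 0.6745), so delta is in "typical residual" units
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        r = np.abs(resid) / max(scale, 1e-12)
        w = delta / np.maximum(r, delta)  # weight 1 inside the threshold, delta/r outside
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta  # [intercept, slope]


# Synthetic data: slope 2, intercept 1, small noise, one large outlier at the end.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
y = 2 * x + 1 + rng.normal(0, 0.3, size=x.size)
y[-1] += 20

ols_slope = np.polyfit(x, y, 1)[0]
huber_slope = huber_fit(x, y)[1]
print(f"OLS slope {ols_slope:.2f}, Huber slope {huber_slope:.2f}")
```

The outlier drags the ordinary least-squares slope upward, while its Huber weight shrinks to a few percent.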

lime grove
#

forecasting time series is probably best done via classical approaches (e.g., ARIMA). I am not sure an LSTM is interpretable enough. What do you plan to use K-means for?
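A full ARIMA model is usually fitted with a library such as statsmodels (`statsmodels.tsa.arima.model.ARIMA`). To show the core idea without that dependency, this sketch covers only the autoregressive part, AR(p) fitted by least squares in NumPy on a made-up series, with no differencing (the "I") or moving-average terms (the "MA").

```python
import numpy as np


def fit_ar(series, p):
    """Least-squares AR(p) fit: y_t = c + a_1*y_{t-1} + ... + a_p*y_{t-p}."""
    y = np.asarray(series, dtype=float)
    n = len(y)
    # Column k holds lag k+1 for every target y_p .. y_{n-1}
    lags = np.column_stack([y[p - k - 1:n - k - 1] for k in range(p)])
    X = np.column_stack([np.ones(n - p), lags])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef  # [c, a_1, ..., a_p]


def forecast_ar(series, coef, steps):
    """Roll the fitted recursion forward, feeding each prediction back in."""
    history = [float(v) for v in series]
    p = len(coef) - 1
    preds = []
    for _ in range(steps):
        recent = history[-1:-p - 1:-1]  # y_{t-1}, ..., y_{t-p}
        nxt = coef[0] + float(np.dot(coef[1:], recent))
        history.append(nxt)
        preds.append(nxt)
    return np.array(preds)


# Toy noiseless AR(1) series: y_t = 0.5 + 0.8 * y_{t-1}
y = [0.0]
for _ in range(19):
    y.append(0.5 + 0.8 * y[-1])

coef = fit_ar(y, p=1)
print(coef)  # close to [0.5, 0.8]
print(forecast_ar(y, coef, steps=3))
```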

#

But, again, not a biochemist.

safe agate
lavish wraith
#

Do you use SQL in data science, or mostly pandas?

lime grove
#

SQL is mandatory
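To make the SQL-vs-pandas split concrete: a toy in-memory SQLite table (made-up data) aggregated once in SQL and once in pandas. `pd.read_sql_query` accepts a `sqlite3` connection directly; for other databases you would typically go through SQLAlchemy.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('north', 10), ('north', 5), ('south', 7);
""")

# Let the database aggregate and return only the small result...
by_sql = pd.read_sql_query(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region",
    con,
)

# ...or pull every row into memory and aggregate in pandas.
raw = pd.read_sql_query("SELECT * FROM sales", con)
by_pandas = (
    raw.groupby("region", as_index=False)["amount"].sum()
    .rename(columns={"amount": "total"})
)

print(by_sql)
print(by_pandas)
```

On real tables the first pattern usually wins, because only the aggregated rows leave the database.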

barren path
#

Bro, can anyone tell me how much math is required for data science?

daring aurora
#

I'm 14 rn and I'm really interested in coding, but I'm so scared and overthinking because of AI. What do I do T-T

proven pier
#

I have a project in mind that should "leverage" AI, but I feel it's more of a systems programming problem than anything.

#

I simply want to detect, through optical character recognition, the XYZ coordinates of a minimap. The minimap is not always on the screen, however, so I will need to also detect when the minimap is available

#

Will I need to manually take a lot of static images, and label them for where the minimap is and if it is available? What sort of process do I need to follow to have a classification/identification network for this?

broken stirrup
#

hey, I have a task and I don't know how to proceed.
I have 700+ tables of pollution-related data; I stored them in BigQuery and made Excel modules to fetch them.
Now, if a user gives me a query like
"tell me the CO2 emission of the Aniak electricity plant", the source should be searched for in the tables.

I made a graph RAG for that (built metadata for the tables and stored it in JSON), fed it to Gemini, and asked for a response like "which dataset do you think this could be in"

but Gemini's response is very inaccurate. Is there a way to make it accurate, or is there any other way for me to get the source info?

proven pier
#

Don't take in everything at once. You need to learn how to program first

#

You didn't study calculus or geometry when you started school. You began with the basics, to give your mind context and a framework of logic to reason with. That will aid you in understanding more complex topics in the future

spring field
broken stirrup
#

let me send u the prompt

#

I think my metadata is a bit loose, but I'm still not sure

#

and i have these categorized based on country -> material -> sector

#

1.5 flash

#

uhm, but won't that be time-consuming, since I may have 100s of inputs

#

but my approach is fine right?

#

I mean, it can't get more optimal

#

but how would I know which dataset my source input is in

#

for example, if someone says
"tell me the CO2 emissions caused by crops in France"

then my LLM will look at the metadata -> country (France) -> datasets in France -> oh, it's a crop (Agribalyse dataset) -> output: "your source can be in 'agribalyse'"

#

then I perform a normal fuzzy match in this dataset and show the user the CO2 emission
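One way to cut down what the LLM has to reason over: filter on the structured fields first (country, sector, keywords) and only send the few surviving table descriptions to Gemini. A minimal sketch; the metadata layout, scoring weights, and table names are made up for illustration.

```python
# Hypothetical metadata: one entry per table (names and keywords are placeholders).
TABLES = [
    {"name": "agribalyse", "country": "france", "sector": "agriculture",
     "keywords": ["crop", "food", "farm"]},
    {"name": "fr_power_plants", "country": "france", "sector": "electricity",
     "keywords": ["plant", "power", "grid"]},
    {"name": "us_power_plants", "country": "usa", "sector": "electricity",
     "keywords": ["plant", "power", "utility"]},
]


def candidate_tables(query, tables, max_candidates=5):
    """Rank tables by overlap between the query and their metadata terms."""
    words = query.lower().replace("?", " ").split()
    scored = []
    for table in tables:
        terms = {table["country"], table["sector"], *table["keywords"]}
        # Prefix match so "crops" hits "crop" and "plants" hits "plant"
        score = sum(any(w.startswith(t) for w in words) for t in terms)
        if table["country"] in words:
            score += 2  # an explicit country match counts extra
        if score:
            scored.append((score, table["name"]))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [name for _, name in scored[:max_candidates]]


query = "tell me co2 emission caused because of crops in france"
print(candidate_tables(query, TABLES))  # ['agribalyse', 'fr_power_plants']
```

Only the returned shortlist's descriptions would then go into the prompt, which also saves tokens.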

#

yes, because I can't search for each source in all tables since I have 700+, so I'm using an LLM to narrow it down to one or two tables

#

so if I give a broader description of all datasets and use a reasoning model, I can expect better results

#

is there any better approach, or does this LLM thing sound good?

#

oh

#

😭 I'm short on tokens too, or else I could have used multiple prompts

errant bison
#

Gemini?

broken stirrup
#

I'm not so worried about the data since I'm using openly available datasets, but I'll still take care of it

#

thank you for your help

woven prairie
#

Has anyone worked on an application

#

Like one that can do data analysis and forecasting on user data

woven prairie
#

What was the flow of your project

#

I have to make one , can you guide me ?

errant idol
#

hey everyone

#

i am into learning ML so i wonder if someone can assist me

#

like how do I start, what should I learn, and are there any resources you recommend?

marble furnace
#

I was going to say the exact same thing as well

errant idol
#

would you like if we start together ?

marble furnace
#

well I'm pretty much as lost as you, but I don't mind at all!

solar thistle
calm cipher
#

oh nice! was that with a RNN with a transformer on top, or only transformers?

#

i'd be really curious to see the weights either way

#

and if you had to add positional encoding or it's working without it

#

also curious if you think two heads are required for this or if only one is necessary

solar thistle
#

uhhh

#

It's a transformer on top of a bidirectional GRU classifier, with, like, a little extra custom classifier injected in

#

idk if it's necessary to have 2 really, but it helped a lot

#

it was getting caught up on palindromes made up of palindromes, and that seemed to help

#

like for example "wee kek kek kek eew"
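For comparison, the exact check the model is learning to approximate is a one-liner once the string is reduced to letters; nested palindromes like the one above are still plain palindromes under it. A minimal sketch, assuming only ASCII letters matter and case, spaces, and punctuation are ignored.

```python
import string


def is_palindrome(s):
    """Exact check on letters only, ignoring case, spaces, and punctuation."""
    letters = [c for c in s.lower() if c in string.ascii_lowercase]
    return letters == letters[::-1]


print(is_palindrome("wee kek kek kek eew"))  # True
print(is_palindrome("wee kek kek eek"))      # False
```

A function like this is handy for generating or double-checking training labels.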

frail meteor
solar thistle
calm cipher
#

ah I see, I guess the GRU is learning some kind of position

solar thistle
#

I did give it this though

#

which like is kind of cheating but not really

#
```python
import string

import numpy as np


def char_reflection_score(s):
    """Calculate symmetry score based on character reflection."""
    s = ''.join(c for c in s.lower() if c in string.ascii_lowercase)
    n = len(s)
    if n < 2:
        return 1.0  # empty or single-letter strings are trivially symmetric
    half = n // 2

    if n % 2 == 0:
        left = s[:half]
        right = s[half:]
    else:
        left = s[:half]
        right = s[half+1:]  # skip the middle character

    right = right[::-1]

    def embed(c):
        return (ord(c) - 97 - 13) / 13  # roughly zero-centered: 'a' -> -1.0, 'z' -> ~0.92

    vec1 = np.fromiter(map(embed, left), dtype=float)
    vec2 = np.fromiter(map(embed, right), dtype=float)

    diff = vec1 - vec2
    return 1.0 - np.mean(np.abs(diff))  # 1.0 = perfect symmetry
```
#

it just splits a string basically, encodes it to a vector and then linear normalization

frail meteor
solar thistle
#

I just wanted to explore ML; I've never done it before

#

so i trained a model to identify palindromes

#

I was joking with someone that I should write it in JS and release it as an npm package lel

frail meteor
solar thistle
#

oh I'm done now, it's 100% accurate lol. Can only beat this horse so much

frail meteor
solar thistle
#

What’s that mean

#

I’m an electrician lol

frail meteor
solar thistle
#

I do wanna play with RAGs and LLMs though

frail meteor
solar thistle
#

Retrieval-augmented generation. Like, you can give it structured data that gets kinda injected into the prompt. It’d be cool to turn the whole electrical code book into structured data.
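A toy illustration of that idea, assuming word-count vectors in place of a real embedding model and three placeholder passages in place of actual code-book text: retrieve the closest passage, then paste it into the prompt.

```python
import math
import re
from collections import Counter

# Placeholder passages; a real setup would chunk the actual code book.
PASSAGES = [
    "Placeholder rule about grounding and bonding of service equipment.",
    "Placeholder rule about conductor ampacity and wire sizing.",
    "Placeholder rule about GFCI protection in bathrooms and kitchens.",
]


def bag_of_words(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_prompt(question, passages, k=1):
    """Retrieve the k most similar passages and paste them into the prompt."""
    q = bag_of_words(question)
    ranked = sorted(passages, key=lambda p: cosine(q, bag_of_words(p)), reverse=True)
    context = "\n".join(f"- {p}" for p in ranked[:k])
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("Where do I need GFCI protection?", PASSAGES))
```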

frail meteor
#

Based on logistic regression

solar thistle
#

What’s it ID

frail meteor
#

I'm trying to restructure it... and I don't have a server running 24/7

#

I'm also wondering whether I have to split it into 3×3 segments rather than 2×2

shadow viper
#

hey guys, hows it going?

#

does anyone have an idea of how much it'd cost to build a chatbot?
i'm not asking for a job or offering one, i just want to confirm something

serene scaffold
shadow viper
shadow viper
#

god why don't i know the marketing part of tech

frail meteor
spark kiln
#

hi guys, does anyone have a good/decent resource to learn AI? I'm kind of struggling rn

serene scaffold
foggy jay
#

Hey, can anyone suggest which NumPy reference documentation I should use?

#

Or some other resource I should use

serene scaffold
foggy jay
#

I am in data science field

woven prairie
#

How were things working? What was your input, and how were you getting the output?

frigid crane
#

how does one get into nlp

steep raft
#

guys, is 3Blue1Brown's website down?

#

@hot obsidian

shadow viper
shadow viper
lapis sequoia
#

I wanted to build a chatbot for Discord just as practice since I’m new to Python. Would you guys recommend that?

serene scaffold
lapis sequoia
serene scaffold
lapis sequoia
calm cipher
#

This is a cool project but I don't understand how they're fixing the problem of reward hacking they describe in section 3.1 of the paper

#

I guess the idea is that they're qualitatively evaluating the architecture in addition to maximizing performance, but that would just mean they're cherry-picking models that do well and look nice as opposed to just cherry-picking models that do well

#

something pretty easy to do would have been to hold out a few benchmarks that aren't part of the fitness function to show that the model generalizes as opposed to just optimizing for all the benchmarks

calm cipher
#

they're evaluating models on the training loss? wtf

frail meteor
#

I like it

#

Analysis for all life cases.. How do you input the data?

#

How do you guys keep your servers alive 24/7?

#

Keep your desktop running 24/7?

#

You use your desktop as a server?

#

Ohh

#

How much do u pay?

#

Ohh.. Okok I see

#

If it works, you can make huge stuff based on that data

#

Yes, we can discuss it in PM

#

I mean you can implement ml with it

#

For different apps

#

Are you good with the frontend programming?

#

Something like telegram bots?

#

Have you heard of telegram?

#

You can make a bot for telegram

#

Ohh okok

#

You write them from scratch or using specific libs?

#

Lmao ok😂

#

Yeah, in terms of bot creation the only one I made was in telegram

#

Yet

#

I like it😂👍

#

Good luck, man. I'll be back a little later

inner hemlock
#

no it's actually community made

marsh sage
#

Can I share Logos

calm cipher
#

Against my better judgement I spent most of today looking into their results, lol

#

I have a lot of issues with the methodology of the paper but I put some visualizations together based on the lineage of different models

#

So first off this thing generated 1771 models and only 106 of them were actually selected as being good based on some iffy criteria

#

the vast majority of generated models performed worse than their parent
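The "worse than its parent" share is simple to compute once each model records its parent. A sketch assuming a hypothetical `{model_id: (parent_id, score)}` layout where higher scores are better; the records are made up, not the paper's data.

```python
# Hypothetical lineage records: model id -> (parent id or None, score).
lineage = {
    "m0": (None, 0.50),
    "m1": ("m0", 0.48),
    "m2": ("m0", 0.55),
    "m3": ("m2", 0.53),
    "m4": ("m2", 0.52),
}


def fraction_improving(records):
    """Share of child models whose score beats their parent's."""
    children = [(score, records[parent][1])
                for parent, score in records.values() if parent is not None]
    better = sum(child > parent for child, parent in children)
    return better / len(children) if children else float("nan")


print(fraction_improving(lineage))  # 0.25
```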

#

here's a visualization that shows the lineage of each model along with its score

#

it looks to me like this is maybe more akin to a genetic model that's just throwing stuff at the wall and seeing what sticks than anything that's actually making intelligent decisions about what to try next

#

Here's a better version of the lineage visualization, no change to the data, just better axis labels

small wedge
calm cipher
#

there are a couple of things

#

all qualitative measurements of the model are done with LLMs

#

the qualitative LLM scores given to their models don't seem to be explained or explored in any depth and I don't know how it compares to something like a human rater, or code complexity measures, or anything that might show how it compares to other ways of evaluating code

#

if I take the language they use in the paper literally, they seem to be using the training loss rather than a validation loss when computing the score, which is useless

#

they don't use statistical tests to show if their results are meaningfully different from the human baselines

#

they only give a consolidated score for the 1,771 candidate models, so it's difficult to evaluate how the model performance changed as they adjusted the number of weights and training data size

#

it appears that some models might have performed worse as they scaled them up, which suggests that improvements are more due to random noise than actual improvements in the model

#

none of the training code or the code that computes metrics is available in the repo

#

lol I think that's it
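On the statistical-test point: one simple option is a permutation test on the difference in mean scores. A minimal NumPy sketch; the two score lists are made-up placeholders, not results from the paper.

```python
import numpy as np


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling of which scores count as "a"
        diff = pooled[:a.size].mean() - pooled[a.size:].mean()
        hits += abs(diff) >= abs(observed)
    return (hits + 1) / (n_permutations + 1)  # add-one so p is never exactly 0


generated = [0.61, 0.63, 0.60, 0.62, 0.64]  # placeholder scores
baseline = [0.60, 0.62, 0.61, 0.63, 0.59]   # placeholder scores
print(permutation_test(generated, baseline))
```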

small wedge
#

hmm yeah that's not ideal

#

where are you seeing them getting worse as they scale?