#data-science-and-ml
1 messages · Page 181 of 1
ollama is kind of sketch
a large part of it is built on llamacpp, but ollama really doesn't want to talk about it, and have worded it in the past like features from llamacpp were done by ollama
sometimes they also try to be the first and ship kinda broken code, iirc gpt-oss for them was a lot more inefficient on ollama than on llamacpp
I'd just use run llama-cpp's openai-compatible server and send requests to it
tera or peta bytes of data vs RL fine tuning, who wins lmao
Interesting...
post training can do a lot though
I don't think they'd equally weighed for example
and stuff like safety alignment is also done in post training to my knowledge, which has given us models which are very resistant to attempts at breaking it (even with more finetuning)
i have no idea, but apparently even tho they are fine tuning tool usage through JSON a model like gpt 5 still performs better on yaml
must be something to it
yeah... no idea from me either, never done this type of training/tuning myself
how do I run the new qwen3.5-9B with pytorch or transformers?
yay scares me...im too afraid i brick my system lol
then how do you install your python libs on arch linux?
raw pytorch probably not, could look into transformers
they prob have instructions on how to do that on the model page tho I havent checked
on the hf page it says that you can use transformers serve
Im currently setting it up, but I guess you have to use yay inevtiably? idk even how libs work yet in arch lol
how do you install transformers serve on arch linux though
I think they meant you pip install transformers[serving] and then run transformers serve ...
you can't use pip install on arch linux
in venv you can eh?
or maybe you can
but it returns this ```agnulo -> pip install
error: externally-managed-environment
× This environment is externally managed
╰─> To install Python packages system-wide, try 'pacman -S
python-xyz', where xyz is the package you are trying to
install.
If you wish to install a non-Arch-packaged Python package,
create a virtual environment using 'python -m venv path/to/venv'.
Then use path/to/venv/bin/python and path/to/venv/bin/pip.
If you wish to install a non-Arch packaged Python application,
it may be easiest to use 'pipx install xyz', which will manage a
virtual environment for you. Make sure you have python-pipx
installed via pacman.```
making a venv for your project and installing it in there is prob a good idea
i havent verified it yet, but i imagine it ought to work?
venv is virtual environment?
yes
how do I do that?
check if you have a venv running in shell, if not, python -m venv venv for example
then activate from bin
I don't know what these big words mean 😵💫
are you running an ide or bash?
I use zsh
you can see the docs for more detail, but yeah in a nutshell go into your project directory, run python -m venv ./.venv and your venv will be stored in .venv
then depending on your shell run one of these commands, after which your terminal will be in the venv, then you can pip install
literally my order
when I install transformers serve to the venv, will it also install on the normal environment
no
the idea of venv is to isolate the packages into their own little environments so it doesn't mess up other things
normal as in global, no, thats the entire point
zsh: no matches found: transformers[serving]```
why doesn't it work?
hm
maybe it needs the exact command
pip install "transformers[serving] @ git+https://github.com/huggingface/transformers.git@main" is what it says on the qwen page
oh yeah thanks
sudo pacman -Syu
sudo pacman -S python python-pip python-virtualenv
need to setup env too for pip
should work then
(just verified myself)
actually should've asked first but do you have enough vram to run the 9b or whichever one you're trying to run?
I have 16gb of vram so yeah probably
do I need to reinstall rocm or pytorch-rocm on the venv?
check if its in there?
uhh... maybe not
my guess is you need at least 18gb to run the full prec 9b model
looks like you can quantize when you serve though
not sure if venv pulls from global excl or if its custom
you can just pip it anyway and test
yes I want to quantize
uh I got the error:
Could not install packages due to an OSError: [Errno 122] Disk quota exceeded
my disk is not full:
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p3 884G 432G 407G 52% /
devtmpfs 3.8G 0 3.8G 0% /dev
tmpfs 3.8G 76M 3.8G 2% /dev/shm
efivarfs 256K 126K 126K 50% /sys/firmware/efi/efivars
tmpfs 1.6G 1.6M 1.6G 1% /run
tmpfs 3.8G 17M 3.8G 1% /tmp
/dev/nvme0n1p1 1022M 74M 949M 8% /boot
tmpfs 778M 84K 778M 1% /run/user/1000
none 1.0M 0 1.0M 0% /run/credentials/systemd-journald.service
agnulo -> ```
0 idea on that
I'm on windows
well no because I'm on / obviously
wasnt obv to me 🥹
sorry 🥺
hmm kinda tricky problem, seems it may be a lot of things
@knotty raven
im reading some obscure stuff about inodes saying to run: df -hi and see if youre capped? this is a bit beyond me tbh
I have no idea what IUse isagnulo -> df -hi Filesystem Inodes IUsed IFree IUse% Mounted on /dev/nvme0n1p3 57M 777K 56M 2% / devtmpfs 966K 680 965K 1% /dev tmpfs 972K 86 972K 1% /dev/shm efivarfs 0 0 0 - /sys/firmware/efi/efivars tmpfs 800K 1.2K 799K 1% /run tmpfs 1.0M 7.3K 1017K 1% /tmp /dev/nvme0n1p1 0 0 0 - /boot tmpfs 195K 100 195K 1% /run/user/1000 none 1.0K 1 1023 1% /run/credentials/systemd-journald.service agnulo ->
I just cleared cache and the error persists
sudo pacman -Syu and reboot, activate venv again and pip it again id say
what happened?
I rebooted and now my WM doesn't work and I have no internet
youre on your phone or other device, and your machine is in tty?
might run something lik: journalctl -b -p warning and see if you have some major issues
okay so I don't have internet because of NerworkManager
ah gotta config that
didnt you do that during installation?
maybe you wrote it on your iso 😛
NetoworkManager.service: Job NetworkManager.service/start failed with result 'dependency'
instead of chroot
gotta do it manually then i guess
because of D-Bus
did you archinstall or manual?
i screwed it up first time too lol
you should be able to troubleshoot it from tty though
it's not my first time
but I'm so confused
it worked for months til now
I need to chroot probably
boot from usb again and set it up i guess?
after lots of troubleshooting
it seems like I'm fucked 😎
i am trying to do logistic regression with multiple variable using scikit-learn any videos that can help me?
We need to try RAG
What would happen if an AI was trained on a world war II data set like music recipes etc
what would you want to happen?
what do you mean by "an AI"?
trained to do... what?
You can't train anything without a desired outcome to select for.
Every kind of machine learning requires a reward system. Otherwise how do you know what behavior to reinforce?
You could train a generative audio model on a dataset of 1940s music, for example - that has a specific goal which can be rewarded, "produce data which is recognizable as being 1940s era music."
A lm
My dad needed my help to get pellets
So generative models are basically constructed on the concept of de-noising
Basically your training set is composed of some kind of informational artifact, like an image or audio file or text or w/e
And you mess it up, then train a network to reproduce the original from the messed up input.
And then you can feed new inputs in to generate new things
So if you have a training set composed entirely of textual documents from the 1940s, then you can train a model to produce new outputs which are statistically similar to textual documents from that era.
Pretty much simple as
ingredents:
- 2 tbsp butter
- 2 tbsp flour
- 1 cup milk or cream
- 1 tsp salt
- 0.5 tsp black pepper
- 0.5 tsp mustard (optional,for flavor)
- 1 cooked chicken, slice
instructions:
1. In a saucepan, melt the butter over medium heat.
2. Stir in the flour to make a smooth roux, cooking for 1–2 minutes.
3. Gradually whisk in the milk or cream until smooth and slightly thickened.
4. Add salt, pepper, and mustard if using.
5. Pour over the cooked chicken, or return the chicken to the pan and simmer briefly so it’s coated with the sauce.```
here is one file i have as a explination
If you train a model using recipes from the 1940s, the outputs of the model will statistically resemble recipes from the given time period/culture.
I have a lot of files to write
A language model is just reproducing patterns in text it has consumed. So if it's consumed texts like this, it'll output texts with similar patterns. E.g. it will favor ingredients which were commonly used during that period.
do I have a good start
I do have music not as audio files but text files that use every bit of data that's available
I'm sure you could train a model on this kind of input if you have enough of it. If you're not very familiar with neural networks you should probably do something a bit more basic at first
But yeah there are tons of archived texts from the ww2 period, you could definitely produce a pretty good data set for training I think
A good starting point might be using a pre-designed architecture but throwing the pre-trained weights away and training new ones
could also try a technique called transfer learning
I've worked with CNN
convolutional neural network
I haven't worked with videos because videos take a lot of memory for my computer
But I understand models I know I can use a pre-trained model but I tried one they didn't come on I have to pick her up then I don't understand how hard it is to make a data set if I can't find it you know on any type of site
Do you think that is too much to try to unpack
For my current dataset I have
5 songs from 1939
2 sauce recipes 1939
Between those 7 files I have 204 lines of text I going to get more
Do you think that's a good amount or do you think I might need a lot more
training what model? from scratch or pretrained
either way probably not really enough if you want something more or less reasonable out of it
just depends on how much not enough
from scratch
How much would be needed for a generative model anyway
GPT 3 was trained on hundreds of billions of words
GPT 2 was not very good
so having 204 lines of text is tantamount to absolutely nothing
unless you're fine with the model outputting "the" forever, lol
yeah I'd be looking for like samples from major historical archives
You need a lot of data
If you want to generate recipes, 2 recipes is 2 input samples, not 204 input samples.
I'm going for music and food of the time first I am going to go for historical events I'm just trying to get the culture of that time period heck I was working on finding recipes from that time period
I trained a pretty basic CNN and used in the hundreds of thousands of training images just for pretty simple categorization
I got a chicken ala king recipe I should really be counting how many lines i have
I feel like we just hit an inflection point. Jack Dorsey laid off 4k employees because 1 entity "claude code" took the job. Fifty percent of his company... He even said most companies will do the same in within a year. Companies dont even need to justify laying us off now, lol.
Dont worry though DJT is at the helm. 
Hi i am new here
welcome!
only true for diffusion models (and flow matching with a gaussian prior)
"we're not broke, we're more efficient cuz of AI"
I can only imagine the pressure academia these days is facing. How far are we from they replacing professors?
Or... will man made goods and human articulated sessions just become more expensive and for the rich?
I think most of them wouldnt mind have that part of their job automated
academia has so many bad profs cuz a lot of them just wanted to be researchers and have no vocation for teaching
Eh, today after -Syu/boot I also had a internet problem due to faulty firmware/bridge. What exactly was your problem? Was it unable to change powerstate, faulty bridge?
My problem was with D-Bus
You really really don't want D-Bus to fail
Ah, did you manage to recover?
nope
I ended up installing debian
how do you install rocm on debian though?
Guyzz 😔
Oh nice, I have Debian 13 running
Guys, I haven’t been able to impress my crush for a long time 😔
is this about data science?
Sorry bro but I want to impress her by programming but I can't
He has been around each sub talking about his crush and the task he wants
each sub?
Hey guys, i just started with dsa ... I practically have 0 knowledge about data science or ai but I want to grow fast..
So i joined kaggle and saw it has more of a practical approach, like learning the basics and then doing titanic competitions or so...
On the other hand I have a course i brought which is pretty well rated and covers most of the topics in details ...
I am learning from the course and solving the introductory competitions from kaggle and learning as I make model for the competitions while using ai to learn from while I make the model
Is that a good approach.. If you read this msg till end.. Thank you for your time
what's your goal for learning about data science or AI?
I am in cse 2nd year.. now my main aim is to be industry ready actually... I am pretty good at maths and studies as a overall... I also aim to make products that simplify life and is cool
Glad to known 😌
Any suggestion or direction
if you go in the right way it's pretty easy
some people will learning things useless, care about that
Could you suggest me any roadmap
Guys Im trying to make an LSTM but when I add layers it gets incredibly worse. The loss doesnt go down at all and it ends up sucking. Anyone know how to fix this?
Transform a python code to apk is hard. But using Kivy and Buildozer is more easy! I'll transform my first code to apk
did you do a time series?
start with this book 'hands-on machine learning with scikit learn, tensorflow and keras'
Thank you for that, will give it a read
Hello Ive been trying to get into small Neural networks I am really interested in learning and messing with the perceptron algorithms does anyone have some good documentation that maybe i havent found ?
After doing some more research and asking AI on how to describe and explain perceptron to me I ended up with these notes
Click here to see this code in our pastebin.
What is it?
What's your opinion of zero shot forecasting for time series? Many articles claimed that it's better than ARIMA that needs fine tuning
There's been a lot of different ones like TimesFM, Reverso, and Chronic
I've not had much experience using them, so I only have observations of what others have done
in general, foundational models (FMs) that claim zero-shot across datasets are hyped a lot, yet at the same time you'll find people or other research, say this I just found, showing that they don't perform 'great'
though I suppose it's easy to just plug them in and get some good enough results
For me personally I got ok results?
that specific paper means "not great" as in, if you actually investigate your data, you can usually beat these foundational models with smaller models and/or total dataset size
doesn't say that they outright do not work in general
I would say it's performance is relatively good
well great, if it works then keep using it, no need to think that hard
From online it does decent too
What models have you used? Mainly ARIMA?
Or some type of machine learning model for time series
simply detrending by linear reg + a gbm on those residuals has worked many times for me
I see
You use R for this or do you use sktime/ Darts time series? I mentioned Darts personally
Sktime is kinda hard to use for me
neither honestly, darts I just havent gotten to it
sktime was not a good time last time I touched it
for one its processing speed is completely horrendeous for seemingly simple tasks; somehow padding and cutting all series to the same fixed length takes minutes when in polars it took 2 seconds
integration with polars isn't great either and/or poorly explained; for example only by experimentation did I find that to make sktime recognize which is the time column, I had to prefix the column with __index__
Nice a fellow polars fan
ig one issue I often run into when doing ts with gbms is memory usage
tons of variables should be 'trivial' (like, lags are just, look x rows behind!) yet you have to duplicate then shift the data into a new column for the regressors to work anyway
feels like there should and could be some library that doesn't have to do this and thus saves lots of mem
but to my knowledge said library doesnt exist
What libraries do you use ?
For time series just statsmodels?
Actually statsmodels can't handle polars either
just raw scikit and manual stuff
or pytorch or one of its abstractions if using nn's ig, do the processing in polars if needed then switch to tensors
Darts handles covariates pretty well
I'll try it next time a ts comes up
I think Darts does it for you where you don't have to manually shift it. The only tricky thing is you have to remember what you want with the input chunk length and the output chunk length
I think sktime had lag transforms too, not sure about its memory use, but like the speed was just really unbearable for a lot of things that should be fast?
oh yeah and I think I also tried to get its catch22 to be fast, in the end iirc the parallel processing options straight up dont work or something and I had to install a different catch22 library that sktime can then use instead of its native impl or smthn
I have the same problem where sktime lags are hard to understand tbh
from darts.models import RegressionModel
regr_model = RegressionModel(lags=None,
lags_past_covariates=[-5, -4, -3, -2, -1],
lags_future_covariates=[-4, -3, -2, -1, 0])
regr_model.fit(flow_train,
past_covariates=melting,
future_covariates=rainfalls)
eval_model(regr_model)
For me you just input the lags you want and Darts I guess does it for you?
You just have to know like what exogenous variables you are dealing with
Can someone help me? https://github.com/tevoshw/machine-learning/blob/main/src/projects/project_17_stock_part_2/eda_model.ipynb
In this project i have a higher R2 and lower MAE, but I think the model have data leakage, can someone help to fix that?
Hello, quick question, when it comes to linear regression, is MAE, mean absolute error enough to describe the model behaviour?
For e.g, I have used the housing californian dataset to train a predictive model to predict house prices. Would MAE be enough? How can I know that pls
You can try also the r2, rmse, mse and more
in my opinion only look to one metrics it’s bad
i prefer to use 2+
why do we have so many metrics? I mean why can't we use only MAE for instance
for example, is it the same if you were off by 100k dollars on a 10k dollar house, vs a 10m dollar house?
on a 10k dollar house, it would be a very very bad prediction since we would be off by 1000%
let’s imagine the model errors the value of a 10 milion house in 4k, the error are to small and the model things are everything ok
Now other house of 10k (kitnet) and the model errors it’s again 4k, the error are be to big, and the model will don’t understand what’s wrong
There’s differents situations, for outliers and model can adjust the weights, and the accuracy for the real data are totally wrong
so we can choose another metrics to “ignore” outliers and only see tue tru data
there are many of examples and situations for that
Yes
Whats that?
A time series Forecasting library
Darts time series forecasting
Oh
Well I mean im using pytorch
That should be pretty good
Darts has pytorch models but the API is very user friendly. It has LSTMs
I mean the problem isnt that I cant make an LSTM
Its that when I add layers to it it sorta breaks
Interesting, maybe it's how you set it up.
For Darts they implement a standard vanilla LSTM I suppose
Do you guys think pressure conditioning scales with task complexity?
i thought this was pretty remarkable and wanted to share it
explain more that
So this is the idea. We know LLMs preform inconsistently on hard task. We seem stuck on better models, more compute and longer CoT, but what if you just change the stakes? So I made an experiment where i injected three types of pressure into the systems prompt at inference., economic (your budget is limited), environmental (errors have real consequences), and competitive (you're being benchmarked against other systems). Just context framing. I did 200 trials across 8 conditions on SWE-bench Verified. Triad condition hit +77% relative improvement over baseline but the most interesting part was find the scaling realationship. The harder task benefited more from the pressure and it was predictable. And I can measure this task complexity with a formula called UCF |Φ| i made, but you can use it before hande and see if pressure it even wroth applying.
Im validating on a GAIA benchmar which is running now, but so far its looking really good.
Its gonna take some time but ill share the visual
this might be repetitve but is data science at risk of being taken by AI?
everything is my friend. Im a UPS driver and my company is intergrated AI so hard, they are preparing the future for zero critical thinking or even a human..
so very harsh takeover?
Well, no. We're Teamsters. So they can't just axe us (thank god for union protections) but its a different battlefield these days then it was 10-20-50 years ago
Unions are something we all need to gravitate towards for our own protections from incoming AI and greedy companies.
honestly i would love for this sutpid AI bubble to burst, AI isnt even good for most cases and AGI is very unlikely I feel
Unfortunally, its not going to. As you can see its growing and getting better. So much a CEO axed 4k employees because Claude code took over the job
wait really?
Twitter??
tbh makes sense since the platform is full of slop
Ex Twitter CEO.
Twitter doesnt exist. Elon Musk bought it, hence the kitchen sink video.
He made a new company apparrrently and already axed 4k people
oh shit i thought it was twitter
No , twitter is now X. Twitter doesnt exist anymore but he did invent it
i guess data science isnt gonna last?
It will last, but on much smaller scale.
wdym?
AI will do most of the work, but a few high eductated people i imagine will oversee operations.
not a good career path then?
No, dont let me steer you wrong. You should learn it never the less.
im honestly interested in it
I guess the most important I said in all this is "Unions". If the country or the rich wont protect us, then we need ourselves.
We can really all have good jobs if we actually unite.
what are unions exactly? ive heard of them but never really understood them
They have been around a long time in American history. But they actually phyically fought with managment a century ago for fair worker rights. It was a bloody past, but they laid the ground work for the rest of us., You should look into it.
Let me see if I can get you a starting point.
Union violence in the United States is physical force intended to harm employers, managers, replacement workers, union abstainers, sympathizers of the prior groups, or their families. On various occasions violence has been committed by unions or union members during labor disputes in the United States. When union violence has occurred, it has fr...
hold on lol wrong on
In my opinion, the big techs plant fear in the society to have more gains and “control”, idk if you understand my pov
so engineered fear you mean?
I agree via lobbyist I imagine. Whatever brings in the most profits
in parts yes, what i want to say it’s most people think like you “maybe im gonna be taken by AI”
Just go all in
dont think to much
most people in the world are poor and dont have access to this technology.
even despite the risks people talk about?
just see the devs on X, saying about that, the money condition don’t take off the fear
risks have in all, i just trying to say for you, don’t have fear to be taken by AI
cuz if you already have this in your mind, you cannot achieve your best
the fear will control you
i say for my own experience
im not afraid of going all in i just dont want it to go to waste
id still do it but perosnally would like a bit more clarity
nothing is wasted, everything becomes an experience and good times
in technical terms saying to you, data science are in the beat moment for me
with the grow of AI big techs there are a many of “new jobs” for data created, and so many different experiences that a year doesn’t exist
i think thats a good way that always will need a good professional behind
ok thanks
I made a section for usage within AI. Please check out doc
Im testing right now. Mind you this a local model. Unsloth/qwen3.5-35b-a3b@q5_k_xl
I'm limited here Im on a single 4090 but i can only imagine with a bigger model
16 to 24 t/ks
Is it possible to use LM Studio for a discord chat bot?
chatgpt api
yeah but you only get a couple requests
it costs money for more
And im using a more uncensored model because i wwant it to be a fun bot
well tbh anything with unlimited request would cost money
and a free AI api would probably be weak
could always make ur own api with open source models
but you would probably need a dedicated machine to run it
Device name LAPTOP-RSNUQGVQ
Processor 13th Gen Intel(R) Core(TM) i7-13650HX (2.60 GHz)
Installed RAM 24.0 GB (23.7 GB usable)
Device ID CF0C74A1-3DA6-466F-940C-46A123517B61
Product ID 00342-21498-02091-AAOEM
System type 64-bit operating system, x64-based processor
Pen and touch No pen or touch input is available for this display
has 14 cores
with rtx 5050
yeah thatll do it
I'm steering a model but its personality is so volatile
I change one word and the model is a totally different person
I'm trying to make a gen alpha LLM
is there any data analyst guy who can guide me for projects??
If I want to prep for AI ML engineer roles. Is it worth grinding leetcode?
How else can I upskill myself? I have a masters in AI
I would also like to know, as I am finishing my undergraduate degree and about to begin a master's degree in AI.
-# skibidy bam bam bam
!warn @grim acorn your message was removed for soliciting donations
:incoming_envelope: :ok_hand: applied warning to @grim acorn.
Hello, question, is handling of outliers something we do in data preprocessing section or it's more in the eda?
Like in data preprocessing we handle duplicate and null values.
Then in eda, we see trends etc, then handle outliers?
Seeking Local Alternatives for Math/OCR Pipeline (LLM Evaluation)
Current Pipeline:
OCR (Claude 3.5/4.5): Converts PDF to pipe-separated question format.
Evaluation (GPT-5): Handles math calculations, step-by-step explanations, and formatting.
Verification (Gemini 2.0): Validates answers and handles uncertainty.
Export: Pipe-separated text → Word Table → Excel/Admin Panel.
The Problem: I want to replace the GPT-5 evaluation step with a local, cost-free alternative without losing the quality of math reasoning and explanation.
Questions:
Which local model (DeepSeek-R1, Llama 3.3, Qwen) is best for Hindi/English math evaluation?
Are there any Python-based orchestration tools you recommend for running this locally with high throughput?
Any tips on maintaining the strict pipe-separated structure when using local LLMs?
@flat crown you've asked this in at least three places. please pick one so that people don't duplicate their efforts
there's also #1479124546266202223
I dont know the answer, but my coworker does something similar but in English and Norwegian, he say that DeepSeek is better than Haiku, so maybe not not directly comparable, but Deepseek did very well on the math portion
and what is high throughtput? i use ollama locally
How do we decide whether we should remove outliers? I mean what kind of questions do we need to ask ourselves? For example if we take the housing dataset for california and we have outliers based on the total number of bed rooms, what would be some reasoning pls
@fair aspen Did Ai generate that? That reads like a teenager trying to be serious with an adult but has no idea what words are, and, or, trying to be hip explaining it.
if you think it's an outlier because of a confounding variable
there's no strict order of eda / data processing so I don't think that can really be answered
like you may do eda first then after seeing trends do processing, and then maybe eda to see if you can see other stuff, ...
or maybe you started out with some basic processing, then eda, then ...
yep I see, in my uni coursework, the project is graded based on "sections", like you have a section for eda, one for data pre processing etc but in real life/project this is often mixed together based on the insights we want to discover as we go?
yeah
I mean dont the courses usually show that graph of ml steps, which is a loop
lemme see if I can find it
yup 😎
this diagram
oh ok, didn't know that, yeah make sense when I see that... in my lecture slides, they make it as if it's a sequential workflow
Small question, currently I am working with the california housing dataset and the aim is to predict house prices. Now the house prices are large and the value is capped at 500,000.
Should I use log in such scenarios pls, if so where/when should I use it, to display the data or even inside the ML model, log should be used?
Can anyone help me install ROCm on Debian 13, my GPU is the rx 9060 xt
so do you want to put all the features in the same scale?
I have standardised the other numeric features but the house price/target variable, I haven't touch it yet
this dataset it's from sklearn or you get from kaggle?
kaggle
i have work with this dataset in my github, do you want to see?
sure, I can have a look
and for your question, i prefer to scale all the data, cuz the error always go to the lower possible
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# List of numeric columns to scale
num_cols = ['housing_median_age','total_rooms','total_bedrooms','population','households','median_income','median_house_value']
# Fit scaler on train and transform
train_scaled = scaler.fit_transform(train_set[num_cols])
# Transform test set using the same scaler
test_scaled = scaler.transform(test_set[num_cols])
Currently I have this
should median_house_value be included?
standard ols, potentially - check the distribution skew: usually house prices are right skewed, i.e. there's a small but significant chunk of premium houses that cost a fortune
if you do ols on raw prices it'll focus a lot on those and thus make it poorer at predicting the standard house prices; taking the log can make the distribution more normal
alternatives like gamma regression's assumptions about data are more correct here, so you don't need to log, or trees / gbms which don't really care
It's not mandatory, but I always do it. If I were you, I would create two datasets, one with the target scaled and the other without, and train both to see which one has the best accuracy. But I always scale everything.
noted, thanks !
Hi everyone! 👋
I'm building a Python data enrichment tool as a side project to improve my coding skills and automate a boring task at work.
The goal is to take an Excel file with 4000 Italian VAT numbers (Tax IDs) and extract the standard B2B email address for each company (I'm avoiding the official certified government emails due to strict privacy and cold outreach laws).
The issue is that I have zero budget for enterprise data APIs (like Dun & Bradstreet, Atoka, etc.), and enrichment tools like Dropcontact are too expensive for a learning experiment. I want to tackle this with pure web scraping.
Here is my 2-step architecture idea:
Step 1 (VAT to Domain): Take the VAT number, query a search engine (e.g., "VAT 12345678901"), and scrape the first relevant URL to find the company's official website.
Step 2 (Domain to Email): Once I have the URL, use a crawler to visit the homepage and the /contact page, then extract standard emails using Regex.
The stack I'm considering: pandas for CSV input/output, and Playwright or BeautifulSoup / requests for the actual crawling.
My questions for the community:
SERP Scraping: Google rate-limits and blocks very fast. Do you know of any ultra-cheap/free SERP APIs (is DuckDuckGo more lenient?), or should I just integrate residential proxies directly into Playwright?
Web Crawling: For step 2, is requests + BeautifulSoup + Regex enough to find emails, or do most modern sites obfuscate emails with JS, making Playwright mandatory?
Alternative Approaches: Is there a lateral thinking approach I'm missing to match a VAT number/Tax ID to a company domain for (almost) free?
Any advice on libraries, architectural patterns, or open-source GitHub repos I can learn from would be incredibly appreciated. Thanks a lot! 🚀
@serene scaffold
Hi, sorry about that. I didn’t realize posting my Python tool sale link would be considered solicitation. I’ll follow the server rules from now on. Thanks for letting me know. 👍
quick question, when we use linear regression it assumes that the dependent and independent variable are normally distributed?
Any use pytorch ?
what would you ask someone if they did?
it saves everyone time, including yours, if you always start with your actual question. don't ask to ask.
🙋♂️
I'm trying to help you out here. are you going to ask your question?
Hello, quick question, consider this graph. The problem is the axes. It seem that the range istoo big for the x-axis. Should I use standardised axes, so standardised variables here?
or is it because of the outliers?
Standardising won't help in the slightest; the plot would look exactly the same but with different labels and ticks on the axes
It's because of the outliers, yeah. You could also try plotting in log-log axes, or at least log-x.
Running the GAIA benchmark. They both get the right answer but one is faster, econ_comp. Not only was it faster , its attention to detail improved. But look at the outputs, the one on the bottom right is slightly more focused.
this is easy
If you want to stay away from enterprise models. My next question is what kind of hardware do you have for local AI models?
Thats your first hurdle.
youll want to research sglang and vllm or even lm studio. I can help you build something.
What hardware I should have? sorry for the question but I am a very beginner in the field.
thank you, will deep dive
I don't want to ever used
LableImag
And if so how do you label a video
Dont be sorry. You're are asking the right questions
What country do you live in and what do you access too? You sound outside of the US and EU to be straight up
Well , doesnt matter. You want soemthing with decent amount of VRAM. You can pick up a 5090 or AMD has 32 gig options but ROCm is a bit choppy but lots of work is being done.
AMD AI 395+ with some cards ? I mean there are a lot of options, really need to plan this out and narrow it down, maybe you just need somthing basic.
Search for LM Studio and LocalLLM on discord for more help and knowledgably people. There are more helpful places though
no; it assumes the errors are normally distributed, or "homoscedasticity"
in reality there are probably 0 real datasets which are homoscedasticity
ideally a good gpu with lots of vram
will do that, thanks !
didn't know about that, will read on it, thanks !!
Hi, quick question. I'm working with the california housing dataset and from what I've read on the internet, people who work with this dataset says that the housing prices have been capped at $500,000, same with the housing ages which have been capped at 52. But how can they say that, I mean is there any proof or it's pure deduction?
Anyone have a suggestion for IDE use python/pandas and SQL? I have looked into PyCharm and Vim which I understand is not an IDE but have read good things about it.
plot the housing prices / plot the house ages?
yep just did that, I assume the hard cut off indicates the capped values
yeah p much
I mean unless you have some very good explanation for why there are 1000 houses all with exactly 500k dollars; but it being capped is more likely
yep noted, thanks !
I am from EU. Thanks so much for the advice.
Will do, thanks again!
I have been looking online about the evaluation metrics using linear regression for the california housing dataset. It seems that every one has metrics like that, for e.g house prices different by 40k for e.g.
This suggest linear regression was never a good algorithm to be used?
The R^2 metric means that the regression model is only explain around 57.8247% of the total variance in the data which is quite poor
yeah I need to see if I can increase that
is n >> p?
hmm what are you referring to by n and p pls
n = amount of observations(data)
p = amount of predictors(variables
yeah I believed it's much much larger
lin reg can be really good if you set up the dataset correctly, which will inevitably be a lot of work
which is why you might prefer some more complex methods that don't need that much effort, like gbms
yep noted, thanks !
I got my first visuals back, but I nmeed mroe data points, start level 2 next.
if you want interpretability you should rather stick to linear regression but if prediction accuracy is the only thing you care about other models are much better suited for the job
Would anyone know what data science projects to do with a bunch of messages and associated user IDs? I tried to make a model that would predict volatile people before becoming volatile (basically filtering out people who cussed a lot then examining only the messages where they didn't cuss) but the confusion matrix was terrible. Any simpler ideas that could actually produce useful results out of the data?
The CSV is in this form:
user_id, message_content, date_posted
you said you had a confusion matrix. do you have other information about each user in a different CSV?
hi guys i got a question what course or youtube video do You recommend to learn scikit learn?
don't try to "learn scikit learn". focus on learning concepts that may or may not use scikit-learn
like what?
!resources data science
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
I already know calculus, statistics, and linear algebra
what have you tried to implement in python that's about data science?
i cleaned datasets, visualized them, and made a model but it was so basic
like i need to improve but i don't know what to do
what model?
there's like, a jillion types of models.
XGBC classifier
have you tried doing a neural network?
no
ik what is that but i never tried to make it with pure python
what do you mean by "pure python"? because you shouldn't make a neural network that way.
nvm
why nevermind?
cuz i just realized I am stupid
you are not
the short answer is no
I am kinda lost i learnt the essential libraries, math,and python now i don't know what to do
@wide wing try making a neural network in pytorch
you'll probably still use sklearn to partition the data into train and test, and to evaluate the performance.
alright another library to learn 😭
also i forgot to tell you i am new so i don't know that much
you're not "learning a library". you're learning about neural networks, and using a library to do it.
that's even worse
are you trying to learn about data science and AI, or what?
yes, i want to become a machine learning engineer
it's important to keep a positive attitude about learning new concepts, because that will never stop
alright thanks for ur tips :)
Are there any models which I can use locally to generate photorealistic images/video/voice?
Or if there are any good paid once?
They should have good docs, I am new to ai image/video stuff
Those kinds of models require enterprise grade hardware. Especially if you're going to generate video. I don't think you'll be able to find one that can run on consumer hardware that is photorealistic
looks pretty photorealistic to me. but id search around hugging face
Just make a non transformer based one from scratch!
Do you think emergence is close? || @barren gulch ||
emergence of what
Just like in general
Do you see it being a possibility?
what do you mean by emergence
Like AGI
Do you think or see AGI emerging within like 10 years.
No matter what AI gets invented, I don't think everyone will ever agree that a certain thing is AGI
True, I think it's just a bunch of layers of additive complexity at this point trying to figure out what works and doesn't. What's missing and isn't, what's the best algorithm if it's been invented yet, maybe its a combination of Hybrid Neural Symbolic Governed AI with modern LLM Agents for tool calls. Keep it grounded using a deterministic traceable deliberate substrate you might have a shot, but that's putting it 'simply'. There's a lot of guess work still
I think its quite interesting
I wouldn't consider something to be AGI unless it least tried to model constant cognition and constant sensory input. There's nothing happening in an LLM except when you use it to generate text, but the human mind is constantly active.
Yeah I agree that text generation is definitely not enough for this, but I'm thinking of having it paired with a larger system with continuous feedback(self looping), memory, self-modeling and some kind of ongoing world reaction, and maybe a sandbox for testing and validating internally before trying it out for real, The hardest part of that would be maintaining stability at scale
Can you set a goal and have it actually invent a way to achieve that one goal.
Can it set that goal itself.
Like those are questions that need a yes for it to work out
You can really sum it up in one word, homeostasis. I feel like when the system can maintain that.
@barren gulch got any projects?
@limber plover how are yours?
Anything cool with the Matrix stuff?
Heres the flow of my setup, anyone got any suggestions or critiques?
You guys should check out eraser, i discovered it today.
Looks useful
Thanks!
yup
Not really just looks typical
It works really well qwen 27b model.
What's suppose to stand out here?
Funny cause most enterprise models get these GAIA task wrong
Well , not most but a surprising amount compared to local models
Yeah its nice it's using known approaches is that your goal?
What are you building?
Its like a hybrid retrieval system
I wouldn't even know where to begin
You can't use words to describe it? lol
Well I can I just need to be careful without revealing architecture details
seems lame
I appreciate it
Well lets see it in action. Lets see your claim to fame?
Well im off to github and license my work. I feel like you're lurking to taking peoples work.
was an interesting convo though
I built a deterministic reasoning runtime, Instead of prompts and responses, like most agent based systems, this maintains a structured knowledge state and runs planning, exploration and evaluation loops on top of it against a sandboxed world model, and the key thing here is that every time I run it produces artifacts that can be traced, replayed and verified, so that behavior is reproducible rather than being purely probabilistic
Nah, not trying to take anyone’s work. Just trying to start a conversation with people
I don't care about LLMs. or Transformer based anything. It's all garbage.
for video the newest one is ltx 2.3 - I can't run it myself, but I'm pretty sure you either require top grade consumer gpus like rtx xx90's, or some mid-high tier + tons of ram for it to offload then wait like several minutes
voice I've not looked into too much
images, yes, something like z-image-turbo can run on 8gb vram with decent speeds; flux 2 [klein]s are pretty bad at anatomy when it comes to text-to-image, but they can edit well
@half pulsar no, I am still on python trying to remember what I did. The matrix though is simple mathematics. So I have to write that out first in my reasoning before I put any code to it.
Thanks for mentioning the names,
In the past I've tried some open source Loras? Models locally they were pretty bad like year ago, will try these and see what is possible now.
Can u mention any paid models maybe that are good at small videos and images generation.
I have tried but seems like everyone says theirs js the best, and can't try em all
loras aren't models though, they're stuff you add on top of models that modify their behavior
I don't use any paid api for image/video/audio gen so I wouldn't know
noooooooope
LOL my best test result so far.
BTC R:R 1.93, ETH R:R 2.28, SOL R:R 2.44 , XRP R:R 2.54. MFE 86 %.
Unseen pairs
ADA R:R 2.03, DOT R:R 1.9, LINK R:R 1.5, MFE 75 %
PPO meta learning is the way 🙂
I'm about to finish a demo for autonomous 3D printing. It will be able to design, print, evaluates the result, and adapts the next iteration in real time. I'll probably post the results once it's done.
Yeah but its more about what you do with the Matrix that can make it interesting
Python is simple though you'll get use to it!
Hey guys
Could anyone help me understand the tfidf feature extraction.
I have a 150k row dataset and I am trying a combo of char + word tfidf vectorizer from sklearn.
I got around 35k features and 150k rows now I can't possibly feed it to lightgbm or any tree based models cuz it will take around 2-3 for 1 model.
Is there any way to make the hyperparameter tuning less painful?
Is it better to add layers or layer size?
Before I answer that, what do you think the answer is?
Id guess they probably have different advantages
where size lets the network break down and analyze more thoroughly information
While layer amount lets it find more complicated patterns
This part is correct
I don't think this really means anything
you know taylor serieses?
I know of them
With more neurons it can derive other factors out of already existing ones
To then have more to work with
guys why SKLEARN don't have XGBoost and LigthGBM natively?
I WANT TO ANNOTATE LISS 4 IMAGES AND THERE ARE 1000 IMAGES SO IS THERE IS ANY TOOL I CAN USE TO ANNOTATE JUST THE BUIDLINGS
try asking again without caps lock
also "liss"? you mean less or is that some specific term
Linear Imaging Self-Scanning Sensor-4
i have 1000 images and i have tried SAM-GEO in qgis but it is for rbg images and it should look like buildings
LableImag
Can I automate it for 1000 tiff images?
No unfortunately or maybe somebody from GitHub learned how to automate it I don't know
hii, is it possible to predict tomorrow price based on previous data of stock market using neutral networks?
yes
will it give good accuracy or not?
of course
but dont need a neural network for the model have good accuracy
models like XGBoost and RandomForest can be great in some cases
I mean, obviously a nn will have a accuracy better, but others model are good too, so depends how well you need the model
have you tried?
yes, https://github.com/tevoshw/machine-learning/blob/main/src/projects/project_17_stock_part_2/eda_model.ipynb, it's a simple one, using simple models
i dont tunning or review, so the metrics can be better in this case
but an NN always gonna be the better option
usually no
additionally accuracy is a bad metric, e.g. if your model correctly predicts if the market goes up/down 60% of the time, that doesn't tell you about how much you gained from that. there could be a crash tomorrow falling into that 40% wrong predictions and you'd lose all your money
you used test data same as train data right? why?
in general trying to predict market from only the prior stock prices is a bad idea
what could be a good idea is if you collected data from other sources that's not the stock price, but could influence the stock price
yes you are right predicting price based on only previous data want give accuracy
as a recent and easy example, when deepseek released, it demonstrated that powerful LLMs can run on less capapble chips, nvidia stocks plummeted
stuff like this, which honestly is usually pretty obvious in hindsight, but in the moment is hard to find
How did it go for you?
The Darts time series. Were you able to do the tasks easier?
True, YouTube tends to do clickbaiting videos on that topic. There's good YouTubers that do it for instructional purposes (Neural nine)
I have a Q is AI intern is good option as a fresher? ? or not
yes
one thing to be careful about is
like, some time back I spoke with someone who heavily invests in the NASDAQ and at the same time also works in AI at deepmind
the nasdaq is heavily weighted towards AI stock, so this persons whole life is basically a thematic bet on AI
well I've not run into a ts issue yet so shrug
the issue with this ofc is that if AI doesnt workout, the nasdaq crashes, wipes out a solid chunk of his savings, and his primary income source would now be moot
career choices are investments, diversification is good
just to see if overfitting, of course have others methods and algorithms to see better
Price-only models are basically fancy coin flips with extra steps. Add real signals (news, vol, macros) or prepare to explain the empty wallet to future you 😭
For image classification, is it better to use a pre trained model or should I create my own model as a way to better understand these models? For context, im doing some practice projects for image classification
good ones do exist. the few ones I know amount to providing market making services to the market. the competition for that is fierce.
for learning hands-on experience is always better
Anything I should be wary about? Also I have an nvidia gpu so should i use my gpu to help with training the model faster?
that's usually a good idea, as long as you have enough vram that you don't need to swap stuff in and out of the gpu
Got a 3060 gpu, 12gb vram. I assume thats sufficient enough?
Tbh, i shouldve bought a 3090 when i first built my pc 😂
it depends entirely on how big your problems are, but 12gb vram is enough to do a lot of small scale testing
i usually get away with testing small things on a laptop with like 4gb vram, and then i send the full scale problem to a cluster with real ML hardware
maybe some extra context that might be helpful: normally i only think of problems with lots of linalg in terms of the biggest matrix i have to store; with ML though, modules like pytorch and jax also build a computational graph to compute gradients, so 12gb vram in ML is not really comparable to 12gb ram in regular computations. you need a bit of slack. just in case you get surprised by a random OOM with a small~ish problem
in any case, don't expect to train llms locally, but you can do a lot with 12 gigs
which models do big quant firms uses? do they use there custom models?
They make their own models for sure
Assume a logistic regression model or a perceptron where we include the two features x and x².
Then I have a two-dimensional space, where I can easily draw and visualize the decision boundary. It is neither a plane nor a hyperplane.
Since the quadratic term is included, the decision boundary will be curved rather than a straight line, even though both models are still linear in the parameters?
Last year around this time they were worth less than 500 dollars
You could have bought 2 for the price of one now
Yeah, that makes sense. Market making is one of the few areas where consistent edge can exist, but the competition and infrastructure requirements there are extremely intense.
Does anyone compile all models with data such as linear containing characteristics, method, etc?
yes, what defines whether a model is linear is not the shape of the line on its graph, but rather the relationship between the weights and features (i think so)
I am trying to build a logistic regression model for a churn dataset in scikit learn, i get 0.87 on the training dataset but when i use the testing dataset i get 0.57.
I am thinking that the training data is overfitting but what should i look out for and what should i study to understand why this is happening?
it's in sklearn?
yep
how we can't see the loss in the epochs, prob it's overfitting
try to add regularization
i have tried both l1 and l2, i always forget how to do the code thing
what's the sample of the df
the shape
~440k training rows for training data
~64k test rows for testing data
15 columns total
you can
- increase the lambda of regularization (0.1/ 0.2 - 0.5)
- decrease the max_iter,
- try ensembles and svm models
- improve the data
how it's sklearn and we don't have so many options, i think don't have so many things that you can check
i only think about this 4 options. but it's good you see this dataset on kaggle, prob will have others notebooks about
precision recall
0 0.95 0.22
1 0.53 0.99
this is the testing dataset classification report hope it can help
also thanks for the help will look into all four options
well, you can tell me the distribution of classes?
prob the target it's 90/10 or something like that
training
1.0 249999
0.0 190833
testing
0 33881
1 30493
hmmm, prob it's overfitting but idk whats happen
if you fix tell me what it's wrong
ok will do
so i did a correletion plot and i can see the issue
while the training data has variable that are from 20~55 that correlate to churn, the actual test data has 5~11 at best i think it is pretty logical for the model not being able to complete perform since what the model learns from training will different patterns from the testing dataset which has lower correlation for each variable towards churn, i might still be wrong but i think this is true
so if i remove some variable that have to much of a difference in correlation from the training dataset and testing dataset then the model can see patterns that are actually useful
so the distribution of the data in train/test are skewed?
i would say so
did you use the train_test_split?
in this case the shuffels it's true
so it's more the data, you can use methods to split better the train and test set, or do something with the data like IQR
i will look into that tomorrow probably, still it is intresting and thanks for the help
🫡
learns from training will different patterns from the testing
look into something called adversarial validation btw, which can make catching this stuff easier
thanks will do
@solar arrow I removed your message because the project looks like malware, with only an executable available and no source code.
ai go brr or somthing idk
me after the ai give bad code
im starting to under stand why people pay for decent moddles as i spent 1 god dame hours trying to fix a cropping issue as the ai REFUSED to expand the croped area for eatch target enought so make sure they still fit after expanded bye a local screen distortion matrix
i have no idea why it could not fathum it
i even did the god dame mathes my self
oh and if eny one was wondeirng how i got the data uh.. gantry ssytem in scrap mecanic
i fitted the exac fov in game and using linear projection and a hand full of hand labled images dumpted points along a ray in a 3d grid to get the data points. need to use the ais output to genrate the rays and re do it as my datas very messy lol
after i get this finished as i have a few things i found in testing that will likely thruther improve the ai im try doing mindusrty scemtic auto genration if eny one has ideas as its a very hard thing for ai im fully opten to sugestions as i love keeping notes on nishe ai training methods and ai. i dont care about the bog standard stuff as iv researched the standard stuff already
if we train a model with 80 : test 20
what if its like train a model for a where we take a data of 1 year
but in some case like a year end or other case final 20 may differ
how will u over
can i use this as i learning data analyst or database which one will be fine
i need a opinion
Hello, quick question, I have generated a heat map, as uploaded. I was wondering, how do you people evaluate a heatmap, what are the insights that you check/verify?
I read about multi-collinearity where multiple dependent variables have some relationships among themselves, why is that a problem? Is it a thing here pls
cuz the model uses one and ignores the other
the model stay “lost” in how uses the weights for each feature
wdym, like it focuses on only one ? but even if we include these as predictors?
in linear models the model uses all the features presents in X, but with multi-colli, the model dont know which feature it’s the most important
so the model stay “lost”
In tree and ensemble models multi-colli dont have a big impact
ah ok, so ideally, what does linear regression model want? Like in the predictors/dependent variable, what kind of relationship should exist?
multi-colli the information between 2 features it’s pratically the same, so the model can have weights that can’t generalize well
the weights can be y = 2x1 + 3x2
or can be y = 100x1 - 95x2
look that will make the same y, but the 2 model the weights are big and complex, so any problem or modifications in the dataset can have a bad accuracy
Hi I am looking for a invitation to the GraphRAG discord? Would anybody happen to know how I can get there?
when you say "GraphRAG", are you specifically talking about the Microsoft thing?
(Microsoft should be permanently banned from being allowed to name things.)
To be honest, I am not sure. I just saw a YT video GraphRAG: The Marriage of Knowledge Graphs and RAG: Emil Eifrem from the CEO (?) of Neo4j and on his last slide is a discord invitation to graphrag, but yes, could be the case
Furthermore, I am looking for a Python library (:-) ), learn more about Graph RAG
Have you worked with this software btw?
yes
Thank you for the invitation!
yw
Ah yes
Would I need to work about scalability? I am just at the beginning of my journey
don't try to prematurely optimize when you haven't even figured out what you're trying to do
I just got my KG to scale billions!
Its extremely difficult it took me several years
And basically, I wonder how to bring multiple ontologies together into one Knowledge Graph for a GraphRAG (not necessiliary the MS thing) 😄
So good luck!
I made it a rule in my team that we have to say "MS Graphrag" if we're talking about the microsoft thing
because "RAG with a graph" isn't a concept that microsoft can just own
True, but I am alone in this project
"don't try to prematurely optimize when you haven't even figured out what you're trying to do" will put this on my screen 😄
Being alone means nothing, you can do it absolutely
also I came up with a technique that involves generation-augmented retrieval
my coworker named it SteeleGAR
which is now my fremen name also
I am a newbie in this field and they want in six month a prototype .
Just build a structure that you're happy with, the structure itself needs to be modular and well thought out before you think about scaling it
Steele because of your nickname?
my real name.
That is my aim, but I am struggle even with this .
you'll figure it out as long as you structure your efforts in a way that continues to feel exciting and worthwhile
Just stay very deliberate with it, any choices you make from the start will have a impact on future infrastructure, Keep it open ended preferably soft-coded.
If I may explain: So I got a list of heterogeneous data sources, they wish to be put into a RAG. To make it more specific for this field, I thought about a GraphRag. Ontologies were selected, but multiple ontologies probably go into 1 Knowledge Graph?
But nobody thought about cleaning the data , how to structure it, and on top of it, previous people thought Excel is the tool of choice when the claimed the created a structured database based on two source, with manual inputs partially
What does this mean please?
and what would this mean please?
When you're building it you're gonna have to do a lot of problem solving so, you need to be careful with what you solve and even what order its solved. You have to have a good idea of what your needs and goals are, and stay aligned its easy to get off track on projects like that, Just make sure you make good use of your time basically. Just have you priorities straight and you'll be fine.
Don't overengineer it, just build what you need, complexity creep is a bitch
Build something small stable then optimize from there(MVP)
when you're trying to learn about a challenging domain like machine learning and AI, you need to have small victories along the way, or you'll burn out and give up. don't start by trying to build Jarvis, because you won't, and you'll give up before you learn anything.
Yep!
Math is a big part of this, you're gonna use a lot of it.
I think that's the fun part personally.
Thank you for your kindness and advice.
I can say, the people who hired me, have no understanding of software dev whatsoever. But they want an AI driven tool in 6 month. No specs given, no requirements defined, no business plan, no USP.
I told them I would need time to learn the software stack, the available libraries . 3 months minium. I was given 0.
Also all of the non-programmers think one could wave the AI wand and a tool is there.
"good idea of what your needs and goals are" - I am working on this as well
Re math: Happy with it, no problem, I have a STEM background and some knowledge in scientific computing
Next week I should come up with a strategy, which tools, how to deal with the data sources etc.
Fun fact, the people did not think at all about RAG, just used excel, regex and also manual extractions .
@half pulsar and @serene scaffold if you happen to have good starting points, I am ears and would be very grateful. I am looking for good software libraries, best open source. I was looking into Dockling, OpenAlex etc, Ontologies in general.
I wrote mine from scratch, that was my starting point for my project personally.
Even for learning about the methods? Sorry if my question was not good. I am looking for learning materials etc
No I carried over many years of experience, before even attempting something like that, because I knew it was a long term undertaking for what I'm trying to achieve
Hm, I have 6months and maybe next year as showcasing date.
I guess in the worst case, I will look for another job
I'd say be careful with how you calculate the graphs and how you implement it from the core, because the amount of combinatorial explosion you can get very quickly is insane.
I am not even there. I would need to find out what are good libaries etc , how to deal with a lot of unstructured data, heterogenous source, then GraphRag. May I ask where you learned your knowledge from?
Books, YT videos , chats with colleagues?
are heat maps used to explore features and identify relationships or it's just a way to confirm that a particular trend/relationship exist?
I haven't kept up with it all that well as I don't use any libraries, And knowledge wise, you're always researching it doesn't matter how much experience you have. Its always research, so yeah that'd be like books, yt videos, papers, google scholars, chats with other people.
There's a lot of depth here 😂
@half pulsar : Now I am confused. Don't you use Python or did you really wrote your own library?
I use Python, but all the core logic is my own code. I don’t rely on external libraries for the main systems.
both, heatmaps are just a easier way to identify that correlations, normally when you get the dataset you dont know nothing about, even the better model, so if you identify a high correlation in heatmaps it's prob a linear model
don't have a fix ideia for what using heatmaps, normally it's to identify corr and somethings that can help you at the model
Ok, I express my wish, if anybody has good learning resources for GraphRag (not general MS), please pass them my way
You can try NetworkX, Neo4J, LlamaIndex, PyKEEN if you're exploring Graph-RAG style pipelines
Hello 👋👋👋
Many thanks @half pulsar !!!!
Please why can't i speak in this space????
Take care 👋
Today I made a big achievement just been in such a happy mood. Took YEARS
Does anyone want gemini api for free? Not sure for how long I will give for free
Ping/dm me if want
!ban 985951964779139132 Repeatedly mentioning a project that purportedly offers free API calls to gemini, but which they're distributing on github as an executable with no source code
:incoming_envelope: :ok_hand: applied ban to @solar arrow permanently.
@half pulsar do you tackle emergent behavior or study? What I mean is, from basic implementation of the algorithms. Rather than NN directly. I have been doing experiments on this by having basic robot in a room and it is given basic function such as, left, right, up, down. I then give it basic memory elements so it can remember where it was last. It then has a choice of using it or not. You keep doing this until such system has emergent behavior out of all the small subsystems added. This seems basic but it is interesting. It's also faster for me rather than training NNs for this. You can get a rather sophisticated system behavior.
Yeah, that approach makes sense. Emergent behavior from simple subsystems interacting is definitely interesting, especially with memory and feedback loops. I’ve been exploring a similar direction, looking at how small components interacting over time can produce more complex behavior without relying on heavy pretrained models. Still experimenting with it to see how far it can generalize.
One of the tests I ran was letting it observe objects through a live camera feed in real time. After repeated drops it converged on a consistent downward acceleration from the trajectories, basically rediscovering gravity from observation. Still early experiments though mostly seeing what kinds of structure emerge from simple components interacting with the environment. But I'm soon moving it to larger scale testing on my 3D printer with a full feedback loop.
@half pulsar that is actually fantastic on what you are describing. What I did was restrict it's movement as if missing a leg to see if could figure out an effective way of moving around the room with just 3 legs. Which it did, it eventually emerged the same pattern after some time as if it had 4 legs. Not quite the same but close so that it could move more effectively around the room avoiding objects. It did make a large data set for it's position and choice of possible positions. Now I guess what I could do is some how feed the data back to it. There is also the matter of pruning as I don't want it to keep data that has not been used for some time as a reference.
That’s a neat experiment. Pruning problems is always quite tricky once the system starts accumulating a lot of experience. How are you planning to decide what gets kept versus dropped?
@half pulsar it would be based on least used coordinates. Or position in this case based on x y. With in certain amount of time. Every 30 sec it might delete it. Sort of like stacked. The bottom most stack gets a snip for now. Remember, I am still learning python.
Hello guys any inside ir35 contract roles in uk for ai engineer/ data scientist
I want to pretrain a embedding network for 2d signal data to produce meaningfull embeddings before using them for a downstream task, what are some techniques for this? I've tried nt-xent which works well for producing a nicely spread out latent space but im unsure if it is actually teaching any meaningful features or relying on specific (to the downstream task meaningless) outliers. Thanks! I've also looked into boyl and simclr.
Are there any broader literature review papers for this type of work? Seems like mostly its image based.
Hi guys just started with data science and ai.. know a bit of python and sql... Can you guys give me some project idea
How have your day been guy's i just finished an Internship search agent with python
I will like you guy's to recommend a good platform to kickstart my AI Engineering journey
do i use .py or .cpp to teach an ai about codeing?
Can you rephrase that question?
if i gave an ai a .py or a .cpp or a file that allows code to be ran could the ai understand the formating over time and learn to generate the .file of that type?
Before I answer, what do you think the answer is?
As long as it's good quality over quantity
And AI understands patterns and you don't have good quality data the AI spits out day that that could be incorrect or dangerous
We already know that it's possible for generative AI to be trained to generate python and cpp. But just having a mountain of python or cpp code isn't enough. There also has to be a way for the model to know how to associate the code with descriptions of what it does in a human language
do MNIST project (you train a model to predict numbers photos)
I have a list of datasets of closed models generation from hugging face. I need to look at each individual schema and maybe a single sample for each dataset so that I can convert it to a universal format, but I noticed I would have to download the whole dataset. Is it possible to only gather the schema and a single sample for this use case?
That sounds cool... I will try doing that too
I second Stel's suggestion. Further, I highly recommend building a MNIST autoencoder
Autoencoders are absolutely fundamental. The basic idea is this: take an image and run it through a network to produce an embedding, then run that embedding through a second machine to rebuild the original image
You might ask - why do something so pointless? Well - if the embedding created in the middle contains enough information to rebuild the original perfectly, you can be certain your embedding contains all of the information about the image. This principle can be applied to anything. If you can encode something such that it can be perfectly rebuilt, you've captured all the information about it. Now you can pass that embedding to other models for manipulation, information extraction, prediction, whatever you need
If you're going to do ML, autoencoders are, in my opinion, the place to start
Hi, I need a quick advice. I recently showed my work to my supervisor for my final year project. He told me to add a novelty feature. What my app is about, it's about animal welfare. So basically, users can create posts and have a little chat, posting images etc.
Now, my supervisor told me to add an NLP stuff to my posts. This is where I wanted some ideas. What kind of things can I add here?
Maybe I can try to train a model that will identify if posts are urgent?
What could be other possibilities? My supervisor told me to categorise positive vs negative post. But don't know if this will fit my context.
Any idea pls
Beyond that, is there a recommended tutorial for NLP just to get me started pls
The more I am here, the more my interest keeps growing in this field... I will try to make it and come back soon for suggestions.
can anyone tell me why my model of eminst is giving around 82% accuracy only? i tried using basic neutral network and cnn getting same on both.
nope
you need to show more of your process (what you do to the data, model specifications, training loops, etc)
https://colab.research.google.com/drive/1LTok7unr6oWjZNSwlQBQ0zG8F6YBIwXY?usp=sharing
this is notebook where i implement in cnn can you tell whats wrong
Normalization might help - I'd also recommend going with softmax
It also looks like you're doing rotations. Three layers should be enough for unrotated, unflipped, otherwise unmodified numbers, but you're introducing a very high degree of variance if you allow for those sorts of transforms
Three small layers might just not be enough
i already did normalization by /255. is it works or it should be in 0 to 1?
ohh got it
Another approach would be to convolve/pool all the way down to a 1x1 with 62 channels, convert that to a vector, and softmax that
Both approaches should work, in theory
yes let me try applying softmax
My recommendation - add the normalization, and remove any transforms/augments on the data to check if it's just a model capacity issue
how many layer is good?
If that doesn't work - bring it to ChatGPT. Your model looks correct to me, if perhaps a little small. But ChatGPT is very good at spotting eenie-weenie bugs that are hard to spot
sure
Well, for tiny images like the one you're working with, three should be enough if the dataset isn't too variable
More variation = more information = need more neurons to capture it all and identify features
Oh! You should also instantiate your weights. I'm a little rusty, I havn't built a model in a while
So you'll have to look it up, but, it can make a difference. That said, 82% is pretty good. Your model is learning. It's just that at some point, the signal either stops getting through to or the model simply doesn't have the capacity. You can tell which by examining the gradients. If the manitudes of the gradients explode or vanish to zero, you've got a structural issue
Not sure how you could with such a small model
Your learning rate is also a little high
Normally you wouldn't expect to see an LR above 0.0002
I couldn't make a file reader if it has the extension or I could just put py
It was also need to have some type of coach that understands basic writing and maybe a coach that understands python just like how catchy bt3 was created they took Chachi bt2 which only cared about the fundamentals with the what will call them morality coach telling it what I can and cannot say
I do not understand what you are saying.
What are your thoughts on progressive dropout to improve generalization and convergence speed? So starting at 0% dropout and increasing over time.
Heres a paper on it https://arxiv.org/pdf/1703.06229 (Curriculum dropout)
will have a look, thanks !
you can also see in kaggle, there are a lot of famous datasets and projects from beginners to seniors
https://www.kaggle.com/search?q=machine+learning+in%3Adatasets
Thank you for that
any question about the ml world, u can ask me 🙂
What is ML
machine learning
Sure that 💝
And with Machine Learning I can build AI like Grok and DeepSeek
no
well you can try to, but you'll hit a wall in either compute and/or data
machine learning it's a big world, and have subsets, one of them it's called deep learning, that neural networks, and grok and deep seek, are a billions of neural networks, so yes you can, but in the presents days this it's the high level of machine learning
I agree with Purplys, "but you'll hit a wall in either compute and/or data"
You will hit compute limitations and data limits before you even get far enough into it.
what is ur project im also a new learner
if u like we may work together
Hello, can someone explain why do we have to one-hot encode the label/target in NLP pls, what if we omit that and use a word index for e.g?
explain more
One-hot encoding is just a way to turn something that isn't a number (such as a word) into something that is (a vector)
["cat", "dog", "bird"]
cat -> 0
dog -> 1
bird -> 2
dog -> [0, 1, 0] (length of vocab)
Lets say our NLP stuff ends with giving probabilities:
Predicted probabilities:
[0.2, 0.7, 0.1]
P(cat) = 0.2
P(dog) = 0.7
P(bird) = 0.1
This lines up nicely with our one-hot vector (if we want to compare them and then update our system).
Other problem with having it being a single number is that it implies an ordinal relationship that does not exist: bird > dog > cat.
Other problem is that if you had say 3 neurons that you want to each respond to a different animal, it becomes much harder than if they have 3 inputs (from the one-hot) with 3 weights (instead of 1). In that case the weights become easy to learn, [1, 0, 0] to only respond to cat, [0, 1, 0] to respond to only dog, [1, 1, 0] to respond to both.
is it a cat?
is it a dog?
it's a catdog

Hi! I’m just wondering if it’s a good idea to start learning ML using roadmap sh?
depends on the roadmap
but cat being 0, dog being 1 and bird being 2, this are word indices, normally we would use embeddings directly, no? word_index is always a thing?
yep, but with word indices, these are already numbers, why should we one-hot encode them pls.... when we one hot encode, the shape of the list obtained is similar to the one the softmax layer produces, is this the only reason?
Hello, can someone confirm if the following workflow is correct in NLP when it comes to classification task pls:
So first step is to take the raw text and tokenize them.
This result in a sequence of text.
Next step is to convert this sequence of text into a sequence of numbers, word -> index so that they can be mapped to a vector embedding.
(Now, when we say map to a vector embedding, are these embeddings initialize at random?)
We are now at the embedding layer where we have a sequence of embeddings. These embeddings are feed to neural net to learn complex pattern. Here, does the embeddings changes? Kind of like when we change weights during backpropagation?
Then last layer would be a sigmoid for binary classification or softmax for multi classification.
When it comes to the labels used for inference, the labels are also in a numeric format, like 1,2,3. Then these are one hot encoded.
Why are they one hot encoded though? Is it just to fit the shape of the list of probabilities we obtained?
If you use the indices directly you introduce ordering that is not intended. E.g. Cat (index 5) being "smaller" than dog (index 10)
Generally its nicer to have everything be arond the same magnitude. But with indices you have wildly different magnitudes. Some models have a vocab size of over 200k. So the model would need to predict the number 1 and also the number 200k.
The embeddings are not always initialized randomly. Sometimes you use pretrained embeddings such as from Bert or word2vec
Yep I see, thanks !
Hello guys
Hope everyone is doing well
I just wanted to know what's the best cloud space to deploy aka hosting llms in?
And the best one to train them
Best also equal cheapest for me 😢
But still i need something that works great not just cheap
The reason that Squiggle said:
Other problem with having it being a single number is that it implies an ordinal relationship that does not exist: bird > dog > cat.
yup I see, thanks !
I'm finishing an "article" about ml for beginners and how they can understand it, when I'm finished, could you give me some feedback?
You can post it here, but I can't guarantee anything.
Don't reply to messages to say something completely unrelated to that message.
I was making this stock trading ai, firstly probably should explain how it works i took abt 10 years of stock market days and made the ai simulate trading for like 100 days at a time, then take the wins and losses to build a new ai on a infinite loop, but a problem im running into is the ai seems to take only the data from the most recent tade data which causes it to build an ai that only works well and has a high winrate in that specific time/dataset, i am curious abt how to make it more versatile and durable in any environment?
Yoo guys, I'm here to understand on how ai are able to like be able to get hold of the images for example being able to see a image of a certain skin issue and being able to identify it, how does it do that and what math is required behind it?, does it compare the images vectors of each pixel with the test picture? Or something else.
there are probably several ways that could be achieved. but the most traditional approach is to have a set of images of skin issues, where there's a record of which images depict which skin issues (this is called labeling--the images are "labeled" as belonging to a certain class, such as a skin issue, even if the "label" isn't an intrinsic part of the image (like a caption))
and then you train a convolutional neural network on those images, with their respective labels
in this approach, when you go to use the model "for real", there's no direct comparison between new images and the ones that they were trained on
The idea is that in a descending fashion each layer of a transform removes one heading so we have four on four layers by the second to last layer you arrive at 1
Which is somewhat similar to how a visual cortex behaves
But if the prediction is wrong it adds intention head to the layer
Incorrect layer by layer making it more like a human brain
Wtf is this yo... 😭
So I've been working away
And I think I've figured out how to do diffusion on a graph
hey everyone ! , my self om and i am new in this server...
A ai formula
😔✌️
I am thinking of making a visual recognition program is anyone here experienced with this
I sort of described how that works up here: #data-science-and-ml message
you'll want to look into convolutional neural networks.
Thanks alot pope
I absolve thee 🖖
So, I am reading the book Hands-on machine learning and that is already explained
what I really wanna is the programming part
How do I prepare a dataset, do I even need to prepare one?? Do I need to train a model and all of that stuff
I can't really walk you through, like, all of what you'd need to code to do it. but if the book is "hands-on", does it not have code examples?
you should find an existing dataset from a website such as kaggle.
you do need to train the model--that's the point here.
It is, it has a lot of code examples but I wanna make my own project that is not in the book and that actually trains myself
Okay thanks alot
I will check and see what I can find
The Ancient Secrets of Computer Vision
https://pjreddie.com/courses/computer-vision/
An introductory course on computer vision originally held Spring 2018 at the University of Washington.
@half pulsar : I wanted to say thank you! I am reading about the libraries you told me. NetworkX looks very interesting
@half pulsar Also PyKEEN . I was wondering if one could combine this with ModernBert
Of course!
Yes you can combine them they can complement each other
With adamW and a decaying LR should the gradient norm decrease over time? What does it mean if the gradient is more or less constant? And the loss is decreasing slowly. Is it circling a minimum?
I’m building a crypto trading AI and its learning speed is hella slow idk its supposed to be like this but its has like a 100x slower learning rate compared to my stock market trading AI which was converging in only a 8 hours of training. While this one is growing it its at a much much slower rate its been over 5 hours and its winrate only increased by a measly 0.2.
😭 ima take the laughing reaction as its not a good sign
Atleast its only going up😭 ✌️
Why ppl be putting that laughing reaction pls someone enlighten me
✌️
how long will ts take 😭
i believe 24 minutes and 31 seconds
if the inputs to a model during training are scaled should scaling also be used for inference?
what is it?
Yes
i think its working
😭 ✌️
Book for linear algebra?
check the pins
the pinned message from zestar has an mml book whose first 5 sections are on linear algebra fundamentals
Hi, quick question. I was reading about symmetric and asymmetric semantic search.
When it comes to symmetric semantic search, I thought that the query and the document should be of the same length. To some extent it's true but can also differ by 1 or 2 words. What matters is the natural flow/intent of the sentence?
For e.g, if as query I typed: "How to learn Machine learning", in the document I expect something like "Learning machine learning"?
In contrast with asymmetric semantic search, we try to compare the content of the query and the document, like if query is "What is machine learning", as document I can have: "Machine learning is a subset of Artificial intelligence which involves..."
Can someone confirm if the above statements are correct pls... would really appreciate if someone can add anything up to this if there is any clarification missing.
can someone explain the difference between CNNs and RNNs?
there's no similarity - a CNN is a neural network with convolution layers (see pytorch's Conv2d for an example), a RNN involves a hidden state that gets repeatedly fed back into the network.
in a MLP all the neurons in the layer are connected with others neurons in the before and after layer, and CNNs are just a few, right?
you could sort of represent a convolutional layer as a sparse ordinary layer, but it's a bit of an unnatural way to look at it
a convolutional layer is for data which has locality, so to say - where nearby cells are related. images, video, occasionally audio
that's why it makes sense to do what a convolutional layer does (have each cell of the output only depend on a small neighbourhood of it), whereas dense layers are used where there's no reason to expect locality.
but it's a bit of an unnatural way to look at it
(specifically, because a CNN usually involves not just conv2d layers, but also some non-learable ones like MaxPool2d)
hello, can anybody help me to fixing my error? i want to use gemini api key but the terminal say "the api key is not provided"
try google ai studio
Yo, so how does numpy even work
how can it convert a dataframe into arrays that can be understood by a computer
There is basically no conversion to be done. The dataframe stores the data in arrays, which are the columns.
It just gives a reference to the array to Numpy.
dataframes from pandas use numpy internally. the idea is that the data is already stored in the form of an array in memory, and numpy/pandas just wrap that array
Slight addition to that, Apache Arrow stores those columns in chunks.
But that does not change much (process them one at a time or in parallel as intended).
Oh
to add, the way it is implemented, if an object has __array__ method, numpy just calls it when creating array from it, and pd.DataFrame has it
yeah exactly, pandas basically wraps numpy arrays under the hood.
so most of the time when you convert a dataframe to numpy you're not really "cconverting", it's more like exposing the underlying array in memory.
that's also why operations with numpy/pandas are fast compared to pure python lists
im curious about some statistics - how many people out here are into deep learning, how many of them are using pytorch and how many are using tf in contrast to classic ml with things like pandas, sklearn, sql and whatever there are
Hey y'all, I've been researching the limitations and capabilities of my 1660 Ti 6Gb GPU. I don't mind using existing LLMs for specific purposes, but I would like to build (and maybe train??) a model which can at bare minimum maintain short conversational english. It doesn't need to have thinking or reasoning, tool calls or agentic functionality. I am hoping this is possible, either via a custom Python implementation or using existing solutions which can be modified for the aforementioned basic conversations. Does anyone have ANY tips for me? My previous attempts technically worked, but from my uneducated perspective the results were quite poor. I don't know exactly what to be expecting or what to look for yet. Any tidbits of info are appreciated. Thanks!
you might try using this, the smallest version of GPT-2 (the original ChatGPT was based on GPT-3): https://huggingface.co/openai-community/gpt2
and if you can use quantization to represent each parameter in 16 bits, that would help even more.
Before I go on a deep dive, is that a pretrained model, or what specifically is the link for? I was able to run Qwen 3.5-9b in LM Studio, but as I mentioned I really would like a custom solution. Does that link get me closer to that goal? And sorry if i'm asking questions which the link answers, lol
Lol i opened the link, second sentence first words "Pretained model"
you won't be able to train (ie, from scratch) an LLM that can even form coherent sentences on a consumer-grade GPU.
and even if you could, your hard drive probably can't store enough training data to do it, and you'd have to leave it running for weeks to train
That does make sense
you'd do better with colab, bigger gpu on free tier. (T4 15gb - lmited time, depends on availability)
as long as your carful about how you manage things... short runs, checkpoint often etc.etc.
but your STILL NEVER gonna make a LLM from scratch that can do anything worth even speaking about other than being able to say you made one...
not on anything consumer or free..
you could spend a bunch on A100/H100 time.. but that gets costly... fast.. and i wouldnt even consider that until you have some experience or you'll just waste credits..
there are other options also.. but im not so familiar with free offerings or pricing and availability.
but on the 1660 alone your essentially limited to just running inference on small models
(you could probably do some stuff with 1b-3b models.. but i dont think it would be worth it)
LLMs suck why would you want to make one from scratch.
There's much better things to build, world is too stuck on it for quick money.
that's a pretty subjective view..
if you have interests in ML/AI then it's actually quite interesting to look at doing SFT finetuning (i do it myself personally.. only at 7B-12B size though)
making you own is a big step though
But as a learning experience if your into the field..
why not..?
Transformers suck
I have no interest in profiting or making "quick money" from a custom LLM (if it WERE possible to do). I want to learn. I like Python.
I am now looking into fine-tuning, do you have recommendations for where to learn about it?
Just stick to simple stuff first, learn how the algorithms work, experiment and develop a fundamental understanding then scale if you find something interesting.
You don't need LLMs or Transformers or Pretrained Models.
The whole root of my original question was regarding all of those, though, so what are you referring to in the message above this?
i dont really have resources at hand.. but there's plenty of if no around if you search about for it...
unsloth have details on how to use the tools if you use that.. and some basic guides on how to do SFT.
else just search around.. some guides/posts on huggingface that are useful also.
I will do some digging online, thanks!
Just diving into the deep end like that is not good practice if you're trying to learn, Don't expect meaningful results anytime soon, keep the wins small and realistic.
yeah but thats not even what you have been saying..
youve been going.. "why llm's, llms suck, transformers suck, therefore dont bother.."
I learn best when facing challenges, so that's inherently incorrect in my case. I appreciate the advice regarding small and realistic wins, though. I do realize that now, since folks have been talking about the very low probability of being able to do what I originally intended.
yeah im the same.. i learn best by setting a challange, a target and learning to do it..
which is why i find the above grating.. it's not helpful.. it's actively demotivating.
That’s not what I’m saying. I'm not saying "LLMs suck so don't even bother" I'm just saying that jumping into building a full transformer/LLM scratch is a terrible starting point if the goal is to learn. Those systems are the result of many years of work between many fields, If someone skips the fundamentals and goes straight to LLM, they'll end up with just a bunch of Libraries without actually understanding what's happening.. If the goal is learning ML/AI, it’s far more productive to start with the underlying algorithms, gradient descent, simple neural nets, attention mechanisms, optimization, etc and experiment there first. Once you actually build and understand those pieces, building and modifying to achieve better results makes more sense. Fundamentals then thinking out the box is where the magic happens.
i must be hallucinating then.. i see you say it.. at least 2 maybe 3 times..
anyway it's pointless to argue..
i actually agree with that last point..
but everyone learns differently
@serene scaffold
it's better to just ping <@&831776746206265384>
!clban 1482509195642146869 spam account
:incoming_envelope: :ok_hand: applied ban to @grim turtle permanently.
!cleanban 1482509195642146869
:x: User is already permanently banned (#108860).
lol
no experience needed😆 Wow
Ello
been using it a bit lately.. more than before..
But these things are still not really reliable enough IMO, they can help for "the super common things" we all get tired of doing..
and they are improving. but still, better to do things yourself often
copilot/codex do make good doccomment writers though...
well.. untill they dont
I mainly use it to discuss and review my projects, but yeah it's not good enough for a whole project. I overused it in a project once and in the end had to fix half the script because Codex mixed stuff up
Claude in my experience is better
generally my position is.. if your learning.. not really a good idea,
doesnt really help you learn and they still make to many mistakes, if you do learn you might just learn bad things...
but if you already know, and can properly review stuff. they can be helpful
My dad works for cybersecurity company and are trying grok right now😅
I hadn't have any luck with Codex, I've tried in in VS but maybe that's just not a good place to use it?
Ooof, I don't think Grok is that good 🤣
I think you learn nothing at all, I mean it just does everything for you and that not very good...
gpt5.4-codex in codex in vsc codex app is not bad (recently)
though personally i do thing through RooCode -- which is a great harness for it much better prompt management IMO
That's what he said lol🤣
yes.. exactly..
if you learn anything it's from seeing the code and how it was used..
but if the codes no good.. then... you not learning anything good 🤣
Yeah I've liked the new 5.4 Model, It's been quite good at documentation
But if it is a long code or has some niche stuff in it, it gets confused real fast, I would love to not use it anymore, not even for reviewing, but I paid for this month already😔
Maybe but is it worth 20 bucks a month, I dont think so
Yep, and using Claude is also expensive but it seems the best for handling complexity.
I just pay via usage.
fair
Github Copilot 5.4 + Claude has been great for documenting my Large Codebase hasn't gotten confused yet, only started using it a few months ago, saves a lot of time and that's quite valuable to me. But I'd never let it make modifications especially alone, It needs to be "steered" often.
well some of thats down to how you frame tasks and your prompting skills.
but yeah honestly.. thats still the weak point of a lot of coding agents
i DO have a few somewhat complex projects ive been doing on the side, which are entirely Codex 5.4 just to test how good things are now (every model release i give one a project to do to see how things are shaping up)
and 5.4-codex has been surprisingly competent
But i still don't think these models are ready to be trusted yet..
IMO AI shouldnt "replace" you doing your work anyway.. when they get good they will be great as "accelerators" lets you do things faster, but i dont think they will ever really match a human dev with real world experience.
Totally agree
do be a language, and reading a language translates train of thought 
Documentation has always slowed me down the most, I'd prefer to not skip that part
I made some stuff for aviation where there was a lot of niche stuff in it and I really had a bad experience with it, with 5.4
For learning it might be trusted enough as a "review pal"
But you definitely shouldn't use to learn
Ah yeah that make sense though I don't think OpenAI really even has that as a target for this model.
that might be out of distribution. i mean, i dunno if codex has ever trained on that kinda stuff..
Probably not, Aviation nerds isn't the largest community I think, especially in coding😂
That's the problem you run into is when it doesn't actually know, it's just going to make it up, make it sound convincing after all its a text predictor not a thinker.
It doesn't know what it doesn't know
What it handled surprisingly well though is the aerodynamics presets
big problem with ai in general, not jsut codex, but has gotten better too with 5.4
Maybe some day ChatGPT will learn what "I don't know" means 🤣
the other thing we all have to remember is that, it's a bell curve
it trains on a huge distribution of code.. a LOT of that code comes from public repositories and code katas and similar.
this means that the output your getting is the most statistically likely.. and on the bell curve.. thats usualy a fair distance from the best stuff..
They forgot to put that into the training dataset 🤣
NEVER!😂
ohh that one is entirely OUR fault.. (i mean humans)
RHLF trains models not to show to much uncertainty. this biases them toward provide an answer.. NOT ask for more information..
"the most statistically likely", sounds genius and catastrophical at the same time tbh
they do try to offset it somewhat..
but if it's asking to much. or showing to much ambiguity.. thats bad for the powerpoints 🤣
LLMs are architecturally flawed and no company wants to face that, we're just watching them try to work around and band-aid "fix" its flaws.
They hope that it'll somehow somewhat work🤣
i dunno if i agree with that entirely
as yet they have been one of the most effective ways to do things...
is it perfect, yeah no...
but, it's better than a lot of the older methods in the ways that the leaders in the field seem to care about.. 🤷
I do personally think that we settled on things and put a whole lot of faith in them though..
i dont think they will be the answer to AGI and other such things.. not really
Good at natural language and that's how LLMs demonstrate emergence. But we can't try to make it out to do more unless you want to change the architecture. I believe that LLMs are just the "workers" while it still needs a true brain.
neither do I, I meant it as a joke, if they would hope that, ai would be cooked
I agree that they are not a answer to AGI not even close!, AGI wouldn't need a pretrained model.
But a big problem in the ai industry is the ai overuse, I mean Meta, we do not want Meta ai in fucking whatsapp
i think we share the same thought here.
the way i imagine things in future.. i do think that an LLM (of some form) will be an important component in a larger multimodal system with other kinds of models aswell. the llm will be the language processing part.. but i dont think tranformers alone can solve the other problems.
LLMs are enough to fabricate the appearance of Intelligence but it's not the Intelligence itself. They look cool so its quick money for them but its still afar from the true goal we all want.
We need Colossus and Guardian
I totally agree
Well fellas gotta go, nighty nighty😴
Night dude, good talk.
same, was kinda shy at first lol
for nn, its better to starting in tf and keras, or pytorch?
pytorch. forget that tensorflow even exists.
rn i'm read the hands on with ml, and in the book the main focus was tf
then the book is outdated.
but like, i can see the tf code and put at pytorch
the neural network concepts will still be up-to-date, but no one actually uses tensorflow anymore.
do you think it would be a good idea for me to continue reading the book, and just ignore the tf code and look up how it would be done in pytorch?
if you've already started reading the book, then yes.
i'm on chapter 10, the part that goes into nn
sure. keep reading the book, but don't focus too much on the tensorflow part, because you'll want to switch to pytorch asap.
ok, thanks
Hello im building an AI diffusion LLM similiar to mercury 2 by Inception; if you want to help or partecipate in this project, dm me!
Go way smaller. Try training on the TinyStories dataset and getting it to generate little stories.
Man I can't believe 3090s last year were worth 500 dollars and DDR4 was borderline Ewaste and now they're Gold again. Horrible time to buy hardware
Can't even build a Xeon with 256GB RAM(Either DDR3 or DDR4) Server for a few hundred dollars anymore.
Infinite money loop for datacenters though now that I have to rent 🥴
That was my initial direction, I was just struggling trying to balance number of parameters with the actual throughput potential of the card, so that neither one is the bottleneck, and I just can't find a good number yet.
Does anybody have a resource I can read about generating embedding vectors for Attention? Everything I read about has to deal with language/word processing specifically. In this case, you can define your embedding dimension as the length of all vocabulary. My feature space instead deals with only continuous numbers (which I normalize)
do you even need embeddings if you feature space is reals? like the reason you use embeddings is so that you can convert text into a numerical value, but if you already have the numbers, you don't need to embed them
also, regarding dimension matching vocab size, that sounds like one-hot encoding, which you wouldn't really use for text anyway, that'd be far too many dimensions, text embeddings are like, idk 256 to 4096 dimensions
for categories, sure
but again, if you already have numerical values, you don't need to embed them separately
I was looking into adding attention to my LSTM and what I've read is it uses embedded dimensionality which 99% of written text I've found assumes language processing
I think you should be able to just skip the embedding part and pass in vectors of your numerical features
Thank you, I'll keep this in mind as I continue in my review
What's the best in class unsupervised clustering algorithm. I have very noisy data, a lot of it (2 million) with like 16 features. (also pretty sure that there must be atleast a 100 clusters in it)
My current findings using GMM, reveals k=13 for layer 1 and then redoing clustering on these parent clusters resulted in on average 10 more subclusters. (often sharing common semantics with other subclusters of other parent clusters)
what should i learn so i can able to get good rank in kaggle? anyone having any idea?
I don't have an answer to that, but employers won't care about your kaggle rank
manifold is basically the 'shape' that data has when placed in a visualization space? Or is there something more to it?
Someone can correct me, but I think manifold describes any abstract mathematical object with dimensions exceeding what humans are familiar with visualizing
Hi, I was reading a bit about why casing does matters in NLP, especially with the example of Apple vs apple. I was wondering, for a news classification, would casing matters?
it's hypothesized that most real world data lie on shapes with specific properties, called manifolds