#data-science-and-ml
1 messages · Page 182 of 1
!warn 1398328649857372222 your message was removed for hiring, which is not allowed
:incoming_envelope: :ok_hand: applied warning to @dull glade.
how long is going through "10 minutes to pandas" actually supposed to take
have been working on ts for the past 2h
10 minutes my ass 😭
I'm curious about parallelizing training experiments, and the effects/bottlenecks it has on performance/training time
For my current models, I have an excessive amount of VRAM - I could train maybe 6-8 at a time. Right now I'm training 2, but it seems to have slowed down quite a bit
I know very little about GPU hardware. At least when it comes to CPU, I can monitor performance and I understand there's a difference between "normal" threads and hyperthreads. When you start maxing out bandwidth and all of the hyperthreads, you're going to start noticing a nonlinear drop in performance (I think). I have no idea how to monitor my gpu besides nvidia-smi
All that gives is memory usage, temperature, power, "Volatile GPU-Util", and process by process memory usage
Oh, it appears "Volatile GPU-Util" may be akin to what CPU thread usage is. Oh boy, i do see that nearing 100% 😂
if you're on windows, you can open task manager, go to the performance tab, click on your gpu, and see whether 3D is maxing out (iirc)
if it's an nvidia gpu you can switch one of the tabs to "cuda" and check that instead
so the data when placed in a multi dimensional graph, the data get a shape, but we can’t visualize, so we put in a way that humans can see (2d or 3d)?
I'm on linux, nvidia-smi is the only tool I am aware of
Agreed
I don't know about a good reading, for that it's gonna take a bit of thinking outside of the box. But you can definitely do it.
It's high in Complexity and the risk of it losing stability at scale is high.
yes
however I stress that it's not those manifolds are 2d or 3d, they can be much higher dims, just that to visualize ppl draw them as 2d or 3d
Manifolds are great
At least practical experience gives me a little more perspective, if hardware prices ever come down again so I can build a dream server 🥲
Non Manifold vs Manifold(My Software)
You're pretty much right with not being able to ever fully visualize it. I can only draw it in 3D space. But It runs in 96 Dimensions
can you explain more about that
the first picture I understand, but the second one i get lost
How are y'all today
Me waiting 5 hours for 2 models to train in parallel because I guess my GPU's compute throughput is a lot worse than I thought
(without knowing metrics that's meaningless, but usually 1 model takes <= 2 hours)
me crying in 1660 Ti watching any model train, at all 😭
dang idk, my last card was a 1050 Ti. My buddy gave me the 1660 when he upgraded his setup
yeah it's tough out here, but at least the card keeps up with most modern games which is fairly surprising for only 6 GB VRAM
wym by nf, i'm kinda lost lol
whaaaaa
I mean idk what you're trying to train but obviously can't train anything like an LLM. There's always a limit
No front, it's not very common so no problem
Mine does seem to have better hardware specs though
out here, sounds like a hard life, lol
I was originally trying to train a small LLM, but the time to train was unbearable
And the output even when training on TinyStories was very low quality
Yeah dude, training an LLM is not worth your time
I know D: it is sad
According to techpowerup benchmark, 1660 Ti is roughly 71% performant to mine. So I would still have a terrible time training LLMs
Why? LLM training sounds boring
I just want to do it, that's literally all
The literal entire industry runs on consuming 90% of electricity on the grid
That's like being interested in racing yachts
Not something I could ever have any interest in
That's the great thing about personal interests, everyone has different ones!
I am training a gemini api backed llm, my original goal was to run it fully local now it's gonna be mostly gemini because it's really time cosuming. Privacy gone but I have an ai assistant😂
I have local LLMs I didn't train them at all
You can download open source ones and just let them do inference without training
I don't think the point is having an LLM that is already trained, at least not for me
I can run models locally no problem already
There usually not very good sadly
at least I had that experience
You still need like 128GB RAM to run a decent model locally.
If their models can't run good locally, I presume anything we'd train would run even worse
What do you consider a decent model??
Unless you had like 5 million dollars to blow
LMAO
bro i am on a 1660 Ti. A decent model to me is one that runs on my machine without exploding. Appreciate the input though
also true😆
I mean for actual work. That it can do for you.
ELIZA
Yeah. I'm pretty sure that also depends on the intended use case.
Can one run pytorch with AMD gpu's or is it all cuda based
Yes.
Asterisk.
Need modern AMD GPUs, and support is sometimes shaky, but fixed mostly by the OSS community. Vulkan versions (not ROCm) exist too and have been making a lot of progress.
Specific machines with 128GB RAM needed are sold for this purpose. They often use the AMD Ryzen AI 395+.
Unified RAM/VRAM.
They go for about 2600-4000 USD.
GPU, price at launch: $1200, 2022
Price today: $1600

A gemini api runs on raspberry pi 5 with 8 gig ram, right?😌
You can also make your own setup with a bunch of dedicated GPUs and this can work, but I recommend SoC for lower power. With 4 AMD GPUs you still have less RAM (64 GB) and your electricity bill will cost you probably at least 300 more a year.
Which is why AMD, Nvidia, Apple, etc are selling these SoC boxes with a lot of RAM now.
You don't need as much compute, just a lot of RAM (for inference).
(Big model, and big context window)
Also they run a lot louder, which will get annoying unless you have a room just for it.
Also it's pretty big then, need a giant box for the case.
Man you guys lucked out
Last year was the last year you could get 3090s for 500 bucks, now they're double that
are there any open source llm models I can run on 1660ti (6gb) gpu, I remember trying some using ollama but it was very bad,
I mainly want to use it to generate text for maybe a description for youtube video or a tweet maybe U know basic stuff but to automate it.
I am in the same boat. It depends on how much system ram you have, and the model size really impacts the speed of output, but I am able to personally run Qwen3 1.7B at Quantization 4
I have 16gb ram,
not sure what/how quantization works. though..
Is Qwen3 decent?
last time I ran small model locally it hillucinated pretty bad like it was drunk
I think for the purpose of what you are trying to do it should be plenty? I'm honestly 50/50 on it, i've only used the model a short while so far. Quantization is a bit out of my repotoire for readily available explanations, if someone else can explain
the one I am using has reasoning and tool calls, so if that helps you decide
nw, thanks I'll google quantization iirc it has something to do with floats sizes like convert them 16b -> 8B but will search if i have to do this manually or itis something handled by ollama/models
No no no
you don't need to do manually
you can download the model in that format, but yes it does work with the float sizes!
"Qwen3-30B-A3B" this is the exact model People say is best
I cannot run that model at a realistic output rate to make it worth running to begin with, but yes that is a solid model for the card you and i have
Thanks alot will check this one out
Sounds good
Yay i fixed it i spit the ai into multiple AI’s that each control one part of the trading logic and it basically works now
9+
@limber plover How's it been?
I am finding the sklearn.feature_selection.SequentialFeatureSelector to be incredibly sensitive to the choice in its hyperparameters. Is it right to basically start brute forcing the aspect of the machine learning problem and accept whatever gives you the best outcomes?
Mainly, you have the scorer, the estimator, the direction of the search, a tolerance. But there are dozens of scorers, and dozens of estimators, and decisions of which is best rely too much on over-interpretitive thought processes.
This is a sample
df_work_trn, df_work_tst, df_trgt_trn, df_trgt_tst = skl.model_selection.train_test_split(
df_work,
df_trgt,
test_size=0.2,
random_state=42
)
selector = skl.feature_selection.SequentialFeatureSelector(
skl.naive_bayes.GaussianNB(),
n_features_to_select='auto',
direction='backward',
scoring='balanced_accuracy',
tol=0.0001,
cv=5,
)
selector.fit(df_work_trn, df_trgt_trn)
feature_names = f_work.columns[selector.get_support()]
print(feature_names)
print(len(feature_names))
there are a total of 38 features, but I can end up with anything between 3 and 33 features depending on these hyperparameter selections
for instance, I can replace 'balanced_accuracy' w/ 'roc_auc', or replace GaussianNB with XGBoost. And so on
and note that prior to this I performed an alternative feature selection using statistical tests. Got p-values from Logistic Regression and chi-squared tests for independence. Those produced features that resulted in a certain f1-score found in the high 80s
But, again, the feature set was as seemingly arbitrary as anything produced by the SFS approach
Hi, quick question, is there a recommended asymmetric semantic search model? Currently I'm using my base model of MSMARCO but just wondering if there is a newer/better one?
https://huggingface.co/spaces/mteb/leaderboard
filter by your usecase and hardware
noted, will give that a look, ty !
well one, "description for video" and "description for text tweet" are in wildly different parks of complexity; the former requires vision capabilities. esp. since you're on very constrained hardware you should try to narrow down the scope then search for specialized models
and 2, personally I advise moving away from ollama to just llamacpp, the former sits on top of the latter anyways and uses different api from everyone else for some reason
do we have a technical term for "similarity search"? I;m trying to filter by that but no results found 😭
retrieval
noted, ty !
or well, probably - what are you trying to do exactly?
like, I have an app, an animal welfare app, currently I look for exact keywords to return some results. For e.g, user type "dogs", anything that has keyword dogs is retrieved.
But if I write labrador, I want to also have dogs to be retrieved since they are semantically the same
yeah probably retrieval then
noted
question, have a look at this picture, this answer was generated by a LLM. Am I reading wrong or it's the LLM which is wrong? Normally as size of dataset increases, shouldn't we increase number of epoch?
I'm training a model based on a triplet dataset.
the idea is they'll have seen the same amount of like "total data"
so you don't have to recycle the previously seen data over and over (and risk overfitting)
yeah I see
does this differ by the task we are performing? For e.g, say we are classifying images, in this case, from what I have been doing so far, I always tried to increase number of epoch and training data to have a better performance
well you didn't only increase the number of epochs did you? data augmentation means that technically it's not really the 'same' image
yeah I see
Could someone correct me if I'm wrong, please? Basically, what a neural nn does is the following:
Each layer has its neurons, and each neuron in that layer generates a hyperplane with its own activation intensity. If the intensity is high, the hyperplane remains on the map; if it's low, it's deactivated, meaning it disappears. In the end, we would have several intersecting hyperplanes.
After that, for better visualization, we ignore the finiteness of the hyperplanes (but they would still be infinite), that is, where they intersect, they end there (so having a shape that represents how all the hyperplanes would look together) The final neurons observe which hyperplanes (neurons) are active and perform a linear function to check if these hyperplanes belong to them or not (basically, if the shape of the hyperplanes represents their "village"). If the output neuron perceives that certain neurons are active (that is, the shape represents its village), it understands that it belongs to it
XOR problem.
If you combine a hyperplane with a threshold (basically an if-statement to make a decision), you get a binary classifier ("above" or "below" the hyperplane), where the hyperplane is the decision boundary. Using non-linearities (and more than 1 layer) this decision boundary is a warped curving thing.
And to solve XOR this is required, since a simple line can't separate it correctly.
XOR problem being to have the model mimic an XOR gate.
(One of the original problems that caused the search for something better and ultimately multi-layer, nonlinear, solutions (followed by deep learning (backpropagation) as a way to train it))
That's basically what I was trying to say: each neuron produces a line/hyperplane with its own slope, and by joining them we can form curves and more abstract figures
for regression its same the logic I think, but like, how a group of lines and hyperplanes can do a regression problem?
in classification I understand that we need to separe the date with the lines, but in regression i get lost
Hi quick question, when building a neural network in pytorch for nlp task multiclass classification, how do we determine how the mode should look like along with the hidden layers?
seeing what has worked for other people. and trial and error.
yep noted, ty !
Remove the threshold, it's the shape itself.
Passing through/near the points.
Warping lets you pass through them all perfectly. But now you have another problem, how does it handle new points? A very warped thing wiggles and jumps all over the place, maybe the new points are just simply in between the others (linearly for example).
Also what happens if you keep adding more parameters?
I understand in math concepts, but I can’t visualize geometrically on the graph
https://www.desmos.com/calculator/v4lsww73sl (Polynomial regression)
You can drag around the points btw.
So each neuron create a region, and each region have a different function (ik just change the inclination of the line)
And each layer transform more the region
Just think of it as forming one big function that is very bendy and can bend to fit the points.
The neurons are just part of the computation of that big function.
You could write it out as a big function.
The graphical representation with "neurons" is actually just a visual way to represent that function, specifically what is sometimes called a "compute graph" (a DAG).
For example you could represent f(x, y) = x + y like that in symbolic algebraic form. Or as a 2D visual graph with 3 nodes ("""neurons""" in a deep learning model), the first 2 are the x and y inputs, and the third is the plus. So it looks like two nodes feeding into a third (x and y into plus).
You can also represent it as a tree, which is what a parser of that expression does. Plus is the parent, and x and y are children.
Execution is then just left child, right child, parent in that order. Where encountering x means push x's value into the stack, same for y, and then when you get to plus it pops the top two off the stack, adds them, and pushes the result.
The compute graph is a more generalized form of that, using a directed acyclic graph, which is more flexible than a regular tree graph (both don't have cycles, but one is directed).
f(a, b) = e = c * d = (a + b) * (b + 1)
The "neurons" are just part of the expression/function.
Finally we're getting some math in here. Best way to think of it, only way.
I didn't know that if I wanna build an app AI like Grok and ChatGPT I must use React Native.
I really thought that with python I can build my AI app multi-platform hybrid
no you dont
Hi, quick question, padding and truncation in nlp text classification pipeline, when does it occur? I thought it was at the tokenization level but I think I was wrong, it occurs at another stage?
Hey guys quink question how do you remember Python codes I'm having problem in remembering code if anyone can guide
padding, like for an array?
yeah like before we feed into the model
I understand that we need it to have the same size/same tensor size
hello, this is the data science channel. your question doesn't seem to be about that. try #python-discussion
Oh okay
the text usually has to be tokenized before it can be expressed as an array.
and by "usually", I can't think of a counterexample. but things in AI usually aren't absolute.
yep I see, small question, when do we add the unknown/ out of vocabularly token?
during tokenization
why is ocean s two words in text?
in the dataset it's, it's written as "ocean s" instead of "ocean's"
ocean s twelve raids box office ocean s twelve the crime caper sequel starring george clooney brad pitt and julia roberts has gone straight to number one in the us box office chart. it took $40.8m (£21m) in weekend ticket sales according to studio estimates. the sequel follows the master criminals as they try to pull off three major heists across europe. it knocked last week s number one national treasure into third place. wesley snipes blade: trinity was in second taking $16.1m (£8.4m). rounding out the top five was animated fable the polar express starring tom hanks and festive comedy christmas with the kranks. ocean s twelve box office triumph marks the fourth-biggest opening for a december release in the us after the three films in the lord of the rings trilogy. the sequel narrowly beat its 2001 predecessor ocean s eleven which took $38.1m (£19.8m) on its opening weekend and $184m (£95.8m) in total. a remake of the 1960s film starring frank sinatra and the rat pack ocean s eleven was directed by oscar-winning director steven soderbergh. soderbergh returns to direct the hit sequel which reunites clooney pitt and roberts with matt damon andy garcia and elliott gould. catherine zeta-jones joins the all-star cast. it s just a fun good holiday movie said dan fellman president of distribution at warner bros. however us critics were less complimentary about the $110m (£57.2m) project with the los angeles times labelling it a dispiriting vanity project . a milder review in the new york times dubbed the sequel unabashedly trivial .
are you sure you didn't remove the ' when you were data cleaning?
yep pretty sure, here is the link:
https://www.kaggle.com/datasets/yufengdev/bbc-fulltext-and-category/data
is that an indication of a poor dataset? 😭
Should I switch to AG news instead of this dataset?
you know how 's at the end of the word indicates posession? it's useful to treat that as its own word, so that the model can essentially learn that that's what it means. I think they should have kept the apostrophe.
yeah I see, I can use regEx to replace a space-separated s as being treated as 's, can this be helpful?
depends: what kind of model are you going to train?
multi class classification... I was reading the rows of text, the rows feel strange tbh
have a look at this one:
s korean credit card firm rescued south korea s largest credit card firm has averted liquidation following a one trillion won ($960m; £499m) bail-out. lg card had been threatened with collapse because of its huge debts but the firm s creditors and its former parent have stepped in to rescue it. a consortium of creditors and lg group a family owned conglomerate have each put up $480m to stabilise the firm. lg card has seven million customers and its collapse would have sent shockwaves through the country s economy. the firm s creditors - which own 99% of lg card - have been trying to agree a deal to secure its future for several weeks. they took control of the company in january when it avoided bankruptcy only through a $4.5bn bail-out. they had threatened to delist the company a move which would have triggered massive debt redemptions and forced the company into bankruptcy unless agreement was reached on its future funding. lg card will not need any more financial aid after this laah chong-gyu executive director of korea development bank - one of the firm s creditors - said. the agreement will see some 12 trillion won of debt converted into equity. the purpose of the capital injection is to avoid delisting and the goal will be met david kim an analyst at sejong securities told reuters. south korea s consumer credit market has been slowly recovering from a crisis in 2002 when a credit bubble burst and millions of consumers fell behind on their debt repayments. lg card returned to profit in september but needed further capital to avoid being thrown off the market. south korea s stock exchange can delist any firm if its debt exceeds its assets two years running.
that's what the model is going to do. but what kind of model is it? BERT?
I mean sometime it doesn't make sense, like one of the firm s creditors, it seems it removed all 's
ahhh no I'm going to use a LSTM, no transformer for now
word2vec followed by a lstm
it looks like in this dataset, when "trailing s" is used for pluralization (rather than posession), there's no space separation. so you don't need to worry about "isolated s" being overloaded (to mean both posesssion and pluralization)
so adding the apostrophe back doesn't make a difference.
yep I see, will just give it a try see what results I get later on
how do i make a counter on openCV so that as soon as I start the camera, the counter goes until the stopkey?
like a time counter mb
!warn 1443252853874495528 your messages were removed for offering a form of employment, which is not allowed.
:incoming_envelope: :ok_hand: applied warning to @calm cargo.
If I want to teach an AI colors like the name of certain colors I would need to give it every color combination and type
Is this a question or a statement?
Question
Let's say to make x color
You need this this and this do I have to type out each and every combination so it has multiple references to a variable that can change
@opaque condor what does it mean to "teach a color to AI"? be as specific as possible.
It really depends on what the input is. I'll just say, this is a fascinating little problem that would be perfect as a beginners project. I'm going to take and it flesh it out a bit, so the next time I encounter someone asking "what's a getting beginner ML project" I can show them
So for starters, in the most practical sense, you don't need an AI for this. A color is just a number (usually RGB). Imagine a 3D space just like the X, Y, Z coordinates we live in. A color is just a set of three coordinates just like X, Y, and Z - it's RGB
Right? So in a purely informational / numerical sense, to pair "a color" the number with "a color" the name, you just need a dictionary
And if you want a little wiggle room, you just snap to the nearest named color
I think Mechanical Fox wants to train a generative LLM to be able to answer the question "what colored paints would I need to mix together to create color x?"
Well I was circling in on this. So @opaque condor in your own words
It was going to be for
Mixing paints
Or subtractive colors
So I can mix pigments or paints
CMYK?
Yes
Well, I can't speak to the exact nature of pigments - I recommand you take what I'm about to tell you to someone with actual training in color theory
I do painting but I'm wondering since this is going to be in text do I have to type down each individual pigment color separately to mix together etc
You don't need a generative LLM to achieve that
^
Honestly, the standard approach to representing a color is by writing in RGB or some other format
Well I wanted to make a specific machine learning algorithm that could tell me how many milliliters so I get the perfect shade
That color, by definition, is the recipe
And if you need a mapping from color recipe to color name, just treat it as a vector space and snap to nearest name in the space for each set of coordinates (or vice versa)
This is just color theory + math. CMY/CMYK can approximate it, and you solve for pigment ratios with optimization.
Then again you may as well use Physical Pigment Models.
I wouldn't trust a LLM or Transformer here, as if you want real reliable results then you're gonna have to actually model it physically
hi - I'm trying to build some things tangentially around AI/ML workloads: what is it that is important to have in a language or elsewhere that would be of interest? I mean things that python or even C does not either make accessible or doesn't have as a built-in thing.
Secondly - where can I find code that these people running long-job inferrence or other things are running? I want to test things with comparable jobs to the real world
sorry for the delay in responding lol, btw I think I got it, so basically it's that?
(1 input, 2 hidden, 1 output)
Try making just f(x)=max(0, ax + b) in Desmos, and mess around with a and b as sliders.
Then combine multiple and mess around with the sliders.
i didn’t put the bias for makes it more simple to visualize, but now that’s correct, right?
The bias plays an important role. ax is linear, but ax+b is affine.
In the linear algebra sense. In calculus both are linear (contextual meaning).
yeah ik
finally I understand
thks
I'm pleased
I've joined up on a server for AI researchers
And as best I can tell, these are the people publishing the groundbreaking work. As a general rule, they don't let people who don't publish join up unless their willing to just lurk
So I shared with them the architecture I'm working on for an MtG AI
And for the most part, they tell me there's no obvious flaws
where can i learn about fine-tuning a model? Yes there are a lot of videos out there but what resources do you recommend?
I don't know of a good video about that, but, can you explain in your own words what the difference is between training and fine tuning, according to your current understanding, so that I know what your current understanding is?
The only wrong answer to this question is one that you look up.
(when we get to the end of this, you might even understand why there probably isn't a "good video" about fine tuning in general.)
Training is generating weights by giving the model data to train on.
Fine tuning is something you do later, For better results.
Am i right?
Hey folks, I have moved past the desire for a custom model of any kind, and am now looking to generate flowcharts or diagrams via python with Machine Learning or Reinforcement Learning based on human approval or rejection mechanics, does anything like this already exist, or can I theoretically build it in Python?
Basically the goal is to allow any amount of data to be input, in either .md, .txt and (maybe?) image formats, to enable the user (me) to programmatically generate or quickly iterate through designs with generative functionality, specifically for the purpose of creating layout diagrams for branching pathways of decision trees
ML and RL would come in (and maybe I am mistaken in what these are actually defined as, so please, correct me if I am wrong!) handy with constructing background memory (possibly a poor term) or context for the rest of the layout diagram, and allow me to quickly append, say, 3 branches of a decision's possible outcomes and then either approve of or reject the output of the generator (this, i think will be the easier part. I have a GUI for generating layouts in a different context so I can apply those lessons here). The approvals and rejections would "reinforce" things to avoid in that project's context, and things to lean-on more heavily.
Am i talking about a pipe dream/schizo or is this possible?
there are several tools that can already generate flowcharts and diagrams from md or md-like inputs in a deterministic fashion
I have tried mermaid, and various mcp-servers for LM Studio to integrate with an existing LLM interface, what kind of tools should I search for? Literally just "deterministic flowchart generator"?
an easy variation of what you're describing is asking an llm to generate, as an example, the latex source for a diagram based on markdown/text and images you provide
what does latex source get me? I am unfamiliar with it
latex (and the newer typst, if you're interested) allow you to generate pdf files with text, maths, diagrams, figures, etc based on text commands
it's a so-called "what you see is what you mean" pdf generator
oh wow! okay cool, I will do some experimentation with that before i ask anymore about my idea
thanks a ton!
let's see if this example works
oof
maybe something smaller
.latex
\begin{tikzpicture}[
% select an arrow style
= latex',
%
% set nodes to be 2cm wide, 0.8cm tall rectangles
every node/.style={
draw,
minimum width=2cm,
minimum height=0.8cm,
},
]
% place and name the nodes
\node (S1) {Step 1};
\node[below=of S1] (S2) {Step 2};
\node[below=of S2] (S3) {Step 3};
% draw connecting arrows
\draw[->]
(S1) edge (S2)
(S2) edge (S3);
\end{tikzpicture}
oooh that's just what i need!
@dusky hemlock something like that
maybe typst is easier to work with, but latex is more "well-established"
describing your desired diagram to an llm and asking it to generate latex for it is a pretty standard task. typst might not work so well because it's newer, but latex has decades of stackoverflow content on which llms have trained
Okay wonderful I am gonna give this a try
before i do though
you were mentioning that it could use markdown as input already? is that the case with LLMs in the context of what we're discussing?
I know normal LLMs use markdown as a standard, i mean for input do I have to do anything special for the markdown to be accepted as an input for the latex to function?
i'm fairly confident an llm should be able to handle it
Fantastic thanks again for the tips
though i think pandoc lets you convert markdown to latex by specifying some sort of configuration
emacs org mode does something similar
I am coming across an issue
when scaling data (StandardScaler, RobustScaler, whatever)
is it best practice to scale the entire dataset, i.e. before the train/test split, or is it best practice to scale the individual sub-datasets after the split?
it looks like scaling before train/test split might cause data leakage, but this is model dependent
you would usually determine either a scaling factor based solely on the training data, or you would apply some procedure that independently brings any dataset you'll feed into the network into an expected distribution/scale
e.g. you can normalize every input so that the largest value is 1 or the magnitude is 1
or each data set to be normally distributed with a given mean and variance (e.g. standard values or values determined from the training set)
the approach that works best depends on the nature of the data and the network you use
can you clarify?
it sounds like you are just saying "it depends"
the usual workflow is
- clean up the data
- train/test split
- scale the data
- apply the classification / regression model
I am just asking if I should change the order of steps 2 & 3
it really seems that the order above is right, because model evaluation should be performed on test data that is nominally independent from the training data. Scaling the whole dataset in one go removes that independence
this is indeed the answer
what you should never do is scale the training data based on the test data. in that sense, you can keep your numbered points in the order they already are
your options will rather be whether to scale the data completely independently, or to make a scaling scheme based on the training data and apply it to the test data (possibly without ever looking at the test data's properties)
How should I start learning ML any recommended videos and reps to start?
Hi everyone, i have a good knowledge of python, streamlit(dashboards) , ML models and now i want to learn more and explore through projects so if anyone of you have any projects where i can contribute please share
Checkout #1468524576479641744 and see if there is any project that seem interesting to you
Hey guys, are there any important certifications for machine and deep learning?
I work with latex thoroughly
I can assure you it increases stamina and throughput, as well as lowers risk
Definitally penetrates the market in ways that need more exploration
No, but a degree in computer science with AI coursework is usually a must. And it usually needs to be a master's degree.
Hey guys.
I want to know how much I know, and how much I dont. I have done courses in data science, gone through andrew ng course on coursea. I want to focus on machine learning or computer vision, What topics should I check out. So far, the projects I have done are, Image classification using qwen, model training for detecting an embryo quality by using tensorflow. I learnt about yolo, paddleOCR,easyOCR.
So do I know atleast 10% of the subject? I can perform EDA, perform analytics by using pandas, matplotlib, seaborn, numpy. Currently learning how to scrape data from a site using beautifulsoup and playwright.
I just started out on that.
what is your end goal? why does it matter how much you know? also percentages like that are not useful or really possible to evaluate to begin with
end goal would be to learn everything, such that I don't have to be dependent on others for help. So I just want to know the topics I should check out and learn.
It seems like you may want deep knowledge. So let me explain what I mean by that. I generally split knowledge into two categories, deep and shallow. Shallow knowledge is like memorizing certain libraries/APIs, or memorizing how to add two numbers on paper using the standard algorithm learned in schools for addition. It's what is needed to actually get real work done, and do so efficiently/quickly (practicing the actual real world things). Deep knowledge is like understanding the math behind the models you are using, or knowing why the addition algorithm learned in school works, the structure of numbers as strings of digits. Deep knowledge is needed to generalize well, and be able to do things like invent new types of models, and algorithms. Right now you listed what can be categorized as a lot shallow knowledge. You can accomplish many real practical tasks using many existing tools. But if your end goal is to "learn everything," then you probably want a ton of deep knowledge too. To that end I recommend getting books that dig into how all those tools you have been using work, from the ground up. If you feel like you could have invented some of these tools yourself, you have acquired deep knowledge.
Note that complete lack of shallow knowledge can hurt your attempt at getting deep knowledge, as you lack a bunch of practical tools to test your deep knowledge understanding.
This is why in schools they often teach arithmetic first via just memorization and lots of practice before getting into any deep mathematical knowledge (if you struggle with multiplying two numbers in your head your journey through that deep knowledge will be a slog).
Take a look at RL if you want to see how much you don't know 😂 I'm diving into that
I can't resist the idea of pushing the front-line of research
I think im like beethoven here with my bachelors comp sci degree and lifetime nerd hacking exp
@serene scaffold in what sense do you think yesn't is being used?
i mean n't is a negation clitic right and in english a negation clitic can only be used with auxiliary verbs, yes is not an auxiliary verb its not even a verb so any assumptions that construed: * yesn't is a valid phonological word
are False!!?
Certifications are meaningless.
They are a dime in a dozen and mean nothing.
Hiring is focused on two things experience and degrees.
There are no shortcuts.
Buddy, i have done masters in computer science. That degree doesn't mean jackshit when it comes to my knowledge. Only helps me to get a job.
A degree is what you make of it
They are there to give students opportunities, but it's up to them to take them or to limit themselves as a way to only get a job
Nvm, this is off-topic. Anyone?
It depends on your goals. See AIMA for instance for the more traditional approach.
There are also many statistical approaches to machine learning beyond the few popular models you mentioned.
So it depends on the applications you want to focus on and how deep in the rabbit hole you want to go
if you want to know how much you know or don't, it may be useful to look at some textbooks or papers published in your area(s) of interest
Can you elaborate what you mean by statistical approach? I want to focus on computer vision mainly. Anything related to that position. And deep enough such that I don't have to depend on a gpt or something else to write my own code.
Any books and papers in mind?
Also, isn't it better to have a wide range of options? Focusing on only one position doesn't seem right, what if there are no openings for that position? I'd have to wait months for one to appear.
Hello!! Someone use Kaggle for data science?
I'm not on-call to answer questions, even if they happen to relate to my focus area.
"yesn't" is only said as a joke
also the problem with "yesn't" isn't phonological. it's that the semantics are incoherent.
i struggle with calculating central tendency on paper. I know the formula an everything but i make alot of careless mistakes when doing it using a paper
is that a problem
what do you mean by central tendency? that includes a few things.
mean median mode
data scientists usually calculate those on a scale where it would be impractical to calculate them on paper.
the point is whether you understand what they mean
i struggle with doing long calculations using +,-,* and /, when it requires a lot of steps
i somehow end up making a mistake
you need to be good at algebra.
in terms of figuring out what formula represents a problem
what kind of problems should i solve and practice for tht
I'm not sure. look into ways to practice your mathematical reasoning skills.
does basic arithmetic operations fall under it?
you use basic arithmetic operations when you apply mathematical reasoning
I might know how to solve it but i mess up the calculation part
especially when its tedious
Still not answered. Anyone?
a few people responded. And Squiggle is especially knowledgeable, so I'd give what they said at least one additional read.
end goal would be to learn everything
you pretty much can't learn everything about ML. Just pick a direction that looks interesting and run towards it.
I will, but it isn't the answer I am looking for. I want to get the topics I am missing.
you can scroll through the indices of several books to get a feeling, but i think you're failing to realize the depth and breadth of your question
suffice to say getting a masters or phd in mathematics can be a good starting point to go in full depth in the theory of some ML topics
Ok. I get what you are saying.
But are there topics you think every newbie needs to know? Should be a handful no?
that's a completely different question, yes
what you asked before encompassed like 100 years of mathematics and computer science
for newbies wanting to understand what they're doing, linear algebra, multivariable calculus and statistics are the starting point to gently move into optimization
Done. I went through this roadmap to be exact. You can check the sub topics too.
https://roadmap.sh/ai-data-scientist
so you can do stuff like compute covariance matrices and do eigenvalues decomps, low rank approximation, statistical parameter estimation?
Idk what parameter estimation is, can elaborate. I can do others
You mean like weights, bias etc estimation?
well, not only
that's one version of it, the more modern deep learning approach
most problems don't require deep learning though
so convex and nonconvex optimization are generally useful topics to study, which can include deep learning but also include other stuff
the question to ask here would be: do you know when the MSE is a good choice of cost function? for what type of problem is it a good idea (i.e. problems with which statistical properties?)
what properties does your solution inherit if you use regularizers of different kinds
why do cnns work well for image processing
Yes, these are the questions I am looking for. Thanks alot. Idk the answers to these and I will find out.
you can look into books on optimization
stephen boyd has great free resources on convex optimization
Is it okay to add you as friend? Only if you are comfortable. I just have these kind of questions.
i'd rather not. i'm usually lurking here anyway
i also like louis scharf's statistical signal processing
very enlightening
Anything you can think of that would take a whole day or even more to learn? I am unemployed so i got time to kill.
Lol, if its flexing time lmk so I don't flex at the wrong time
@ocean hinge I'm not in the ML industry, but it all comes down to the knockout competition of winning a round of interviews. So whatever you can do to impress your employer and beat any code tests they throw at you will help you. A degree is in the impress category. So is having a portfolio of works you've done. I'm not willing to waste extra years pursing a degree in a field moving so fast, so I'm going with portfolio and self study
You could probably get together your vision projects and make a portfolio of them and apply to jobs for a year and hopefully land one :p Also I'd say you should be getting AI to write your code, not avoiding it. Embrace it :p (but maximize your understanding and knowledge)
how do I install rocm and pytorch for gfx 1200 (rx 9060 xt) on a python venv?
I use a docker container with rocm and pytorch on it and add my code as a volume
this way I don't have to mess my systems drivers / torch up
torchbox-custom-models:
build:
context: .
dockerfile: Dockerfile
devices:
- /dev/kfd:/dev/kfd
- /dev/dri:/dev/dri
volumes:
- /sytem/code/dir:/path/in/container
environment:
- HSA_OVERRIDE_GFX_VERSION=10.3.0
- HIP_VISIBLE_DEVICES=0
ipc: host
network_mode: host
shm_size: 16g
cap_add:
- SYS_PTRACE
security_opt:
- seccomp:unconfined
group_add:
- video
FROM rocm/pytorch:latest
# Set working directory
WORKDIR /root/dockerx/custom_models/
# Install Python dependencies
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
CMD ["/usr/bin/bash"]
oh okay then I'll switch to a docker container, thanks
You probably wanna check the card + rocm version I'm not sure if that HSA_OVERRIDE_GFX_VERSION is the right one
yeah I think that's an rx 6800 xt or something
who uses math libraries in python? have you ever implemented a bayesian hierarchical model with a horseshoe prior using variational inference and assess convergence without mcmc?
that is soooooo specific
but you can probably just ask your questions, even if people have nt done exactly the same thing, they will be able to answer
I trying to build iris biometric verification system using python without any physical fingerprint sensor I m planning to capture images from camera detect fingerprint edges, store them and them compare them for verification but current version is not working fine I mean accuracy is very low
If anyone now something please help me here by giving me solutions and suggestions It would be very helpful
Wait, so you are using fingerprints or irises ?
I think he meant capturing the fingerprint using the camera
If that's even feasible, I think it's going to largely depend on the camera ability
I don't think that what they meant
And I don't think that what they want really is possible. Irises are not that different from human to human, at least in the light spectrum. I think you should look online for how iris biometric verification is done in practice, but I am pretty confident it is far more complicated than a 1-NN with RBG array
(Which is what I assume you are doing)
Bro i am final year ai and data science, i got a job customer care associate, should i accept that job or is any company providing ai job roles for freshers and what are IT roles should i prepare for the job? I am confusing with python development ,java development, data analyst for learning course 😶🌫️
Take the job and look for a better one
Hello
I wrote a model that detects which species is there in a given image. I am getting an accuracy of 86%. How can I fine-tune it? Just increase my dataset?
I don't think you're using "fine tuning" correctly.
can you create a confusion matrix for its current performance and show us?
@serene scaffold
Sorry, I forgot to mention butterfly species.
this confusion matrix indicates perfect performance.
so either the confusion matrix is wrong, or "accuracy of 86%" is wrong.
Hello, quick question... pytorch is the way to go now when it comes to DL/ML/NLP stuff.
So I was wondering, what would be the proper way to start learning how to implement a linear regression model using pytorch for e.g.
For example, say I learnt linear regression and now I want to implement it through code. How would you people recommed me to learn pytorch pls because it seems too big, I don't really know where to start.
start by making a feed-forward neural network for classifying MNIST digits.
noted, will keep that in mind, ty !
if you're familiar enough to do the math manually, you can look up how to do the individual operations in pytorch. if you're already familiar with numpy, you can also use jax instead of pytorch so you don't have to learn a whole new module just to do ml
yep noted
im not the greatest at ai ml and things of that nature, but can somebody assure me if my understanding is correct?
I want to get the cosine similarity between A and B, B lets say is our target, and A is our input which we are unsur about. If say B is one word, and A is say 15 words, but is inclusive of the word found in B. Due to fact that there are 14 other words in A, the similarity between these 2 vectors will be lesser right? supposedly this is called dilluation. Is that correct? I want an answer from a human not AIs as this is a domain im not well versed in and i feel like AIs could hallucinate too hard.
we need to be very precise about terms. You said that A is 15 words and B is one word. Does that mean A is a matrix of shape (15, n) and B is a vector of shape (n,)? Otherwise, what does it mean for A to be 15 words?
I want an answer from a human not AIs
It is against the rules to copy and paste "answers" directly from AI in this server.
assume that both A and B have the same dimension (384 for example), but lets say that before the the vector embedding is done, B is literally "Hello" and A would be "Hello, to you there too... " etc, generally more words, that is the idea
i made a new memory system, i just got it up on webui today, its been in a terminal since development. But it connects via api using open at endpoints. Im putting the finishing touches on the fast mcp server, but you can use cloud agents too
well it blow Letta/Mem0 off the shelves. but its rust and python but i need to figure out to package it
Hi friends
ok
why
How do I start making AI and stuff?
in this case ig yeah? I mean if you took the cosine sim of the embeddings for "hello" and "hello" it's obviously 1, and "hello" and "hello, to you there too..." is obviously less than 1
however imagine something like "apple" and "fruit" - increasing the latter to "red fruit" might increase the cosine sim
do you wanna make one from scratch and understand fundamentals or just be able to use tools?
any 1 have recommended tutorial vids or reps to able to learn about ML?
i have text book with reps
does any of anyone have a background of (cv2, yolo, ctranslate2, edge-tts, piper-tts), I am building voice AI agent, with integration of os level navigation. If you interested in reach me on DM.
@grim wolf it's not appropriate to use this server to recruit people into secret projects.
Hello
Can anyone explain what PCA really is? I am not able to understand the google definition. I get you are decreasing the features, but how? You just ignore it?
Covariance:
Measurement that indicates how random variables change together, this gives us the direction and value of the relationship.
Covariance matrix:
Relationships among the features
Derived numbers from that for PCA selection -
Eigenvector: direction of component
Eigenvalue: amount of variance explained
You want the largest eigen values. These are the principle components. Hope that’s put simply enough
Covariance measures the relation between 2 features right? So if there are 10 features we calculate 10! times?
Yes features. if you have 10, It will become a 10x10 matrix, so yes to calculation. 100 values. The objective is to retain as much of the data structure as possible.
Load a basic dataset in Python and run
Object = np.cov(data, rowvar=False)
pd.DataFrame(Object)
Then compare that generated table to the shape of the dataset. You’ll see this behavior.
Edited: confused more traditional stats. So syntax, would have rowvar=False in cov()
the process of transforming images into matrices, does this preprocessing have a specific name?
What do you mean by "transforming images into matrices?"
when we want to do image recognition, we need to turn the images into numbers, and they organize themselves into matrices, right?
Images are already numbers.
Unless you mean capturing an image physically with a camera?
so if you give a .jpg to a neural network, without turning it into a number will it work?
JPEG is a file format that contains a compressed image. It requires some processing to extract just the image itself. You want to do operations on the uncompressed data (probably).
It's still numbers, just compressed.
u don't say, my point is that a model can't perform linear algebra on a compressed file header. It needs a structured numerical input., so I was asking for the industry term for that 'file-to-matrix' bridge, is it decompression? or other?
So there are multiple things going on. The first is loading the file into main memory, then there is parsing the file, and then there is performing decompression to get the image data itself in uncompressed form. Then you are associating that data with an object in the programming language. You are giving that object ownership of that data. That object has associated operations (methods) it can perform on/with the data it references. So it looks something like: load image with Pillow, it reads the file into main memory (RAM), parses it using its JPEG parser, decompresses the image, and returns a Pillow image object that has ownership of that data. I am now going to make a bit of guess here to what I think you are referring to. You then give that Pillow object to Numpy and now you have a Numpy array that references that image data. This process does not exactly have a single name, it's just passing around a reference to the data to another library.
The type of object from a data structures POV is called a multi-dimensional array, or N-D array for short as numpy calls its arrays.
Matrices are specifically a 2D table of things. Most typically used as notation for linear transforms in linear algebra.
Images don't fit that, unless they are specifically greyscale, or binary images formats (or any single channel format).
They have a third dimension, which are the color channels.
thks, can I have your opinion for a project that im doing?
another data science workflow question
earlier I asked about scaling the data prior to the train-test split, and the outcome of that query is that no, you do not scale the data before you perform any sort of n-fold validation
scaling the data before n-folding would cause leakage between the folds, which is malpractice
now, a different and slightly related question has arisen
when you are performing feature engineering, e.g. taking a numerical feature, and applying transformations on it. Multiplication of features with each other, or taking exponentials, or powers, etc., all in an effort to increase the number of features
should this done before the train-test split, or after ?
when answering this question, what you always have to consider is "does this operation give away the answer?"
also, if the operation affects how you represent data that goes into the model, you need to be able to perform the same operation on the X data of both train and test
otherwise, you won't be able to run the test data.
I heard some people say that anaconda will be replaced by uv, is it true?
anaconda was replaced by regular python at least 10 years ago.
there were many years after anaconda should have completely died, before uv was even an idea
I've been working in data science and AI since 2017, and I've never used anaconda a single time, and I've never needed to use it in any way, for any reason, ever.
even on Windows.
@rich river is that clear?
I see, I just wondered if I can just uninstall it
delete all remnant of the corruption from your computer
oh I think it is because of this post
That's a separate issue. Like I said, anaconda should have gone away many years ago.
I'd reckon that 2016 was the last year that using anaconda wasn't embarrassing.
I have a kind of peculiar problem; I have 2 datasets that's mostly categorical values. That in of itself isn't much of an issue, but the problem is: the unique values in the training and testing datasets are quite different.
The scoring metric is accuracy, so getting a decent score isn't much of an issue, but whenever I tried training the models, the validation accuracy comes out to 1.0, and the test score is ~0.94. I have no idea how to improve the model and get an actual useful result as the validation score is always max, even for base models...
So far, I tried one hot and ordinal for linear models, and ordinal for tree models. One hot with linear, and ordinal with tree models have a val score of 1.0, while ordinal with linear ones overfitted to a val score of ~0.99, but the test scores were ~0.89
is 0.94 a bad score for you?
I mean, the leaderboard has a score of 1 after an hour of submissions...
how large is the test set
plus, it kinda feels like cheating as in, i didn't even do anything but train a base logistic regression
it's rather small, around 1100 rows
the train set is 7k rows, but after dropping some missing data, it's around 6.8k
I wouldn't call that 'cheating' unless you mean you wanted to learn some other deeper techniques ig
that just means the dataset's easy to separate
I'd start by looking at the ones your model classified incorrectly
I don't know what the cutoff score is gonna be, but it's not gonna be under 0.85 considering the number of people who scored over .9 after less than 24 hours
that's the thing, my train and test datasets have a low number of common features. in almost every feature, the test set's distribution is different to the train set
what kind of data is it and what features do you have?
or can you not say due to competition rules or sth (if so, it's gonna be a bit harder to help)
it's an open competition, the scoring is for crossing different thresholds, not relative to the top score
there's a possibility that those who got perfect scores just cheated
looks like the uci mushrooms dataset: https://archive.ics.uci.edu/dataset/73/mushroom
if the hosts didn't do anything to change it then you probably can find the exact rows + target in here
Discover datasets around the world!
ehh might be slightly different actually
number_of_bruises is new
ah, then there's no way I can get a higher score as I can't use external data 😅
well it's not that you have to use external data to train your model to get better scores
but it probably will be a lot harder
esp. for the data that only appears in the testing set
yeah I guessed so
looking at the uci dataset, I'm pretty sure they just added the number_of_bruises as a numerical feature later on
I'm assuming that there's no way to cover the missing unique data from the test dataset?
I'd say unlikely
for the 200 rows which have missing data, how likely is it that you might be able to make educated guesses to the missing features, and that those missing features might be those currently you dont have? (e.g. a brown veil-colored shroom?)
looking at it, I might be able to reverse engineer how they made the test dataset
at which point it's less a data science exercise, but sure if you want to, or its important to score high for whatever reason
not for the cases of purely missing data, but in cases where the unique values in the test are a subset of the training set, it's a possibility
agreed lol, I'm not sure why they decided to mess with the data so much
thank you for finding out the real dataset and the help
Make one from scratch and understand the fundamentals.
Hello, just wanted to ask something.... I've seen all the influencers out there talking about programmers/developers job is over. I wanted to know to what extent is this true? A friend of mine told me that Node creator even stated a claim like that. For the influencers out there I didn't really bother but if creators like even for Node run time start to talk about that, I wanted to know how "bad" it is for developers, can we expect this to be worse?
In a time like that, do you people recommend to learn specific skills so that we can stand out from the others? I would really appreciate some advice 🙏
"influencers" say stuff to attract attention, because attention farming is their actual business.
I think the AI bubble is going to pop in a few months. but regardless, the worst case scenario is probably that there are fewer entry-level positions.
yepp noted
guys can anyone help me , im currently working with an anonymized dataset with huge distribution shift, its hft data, shuffled and we are not supposed to order it by time. its a as a regression task on shuffled row-level samples.
does anyone know what i can do? I have tried regression and lgbm but my LB score is like -0.331
please ping me if replying
if a regression and boosting models can't get the relationship, i think you need an preprocessing trying to improve the data or a NN to get that relationship
I am interested in mushroom data 😁
I’ve recently started a small internship with the Florida Department of Transportation and the skill is merely transitioning to thought process and not raw coding ability to be able to offload stuff like data cleaning/transformation to entry. It just no longer takes the same amount of time for that extra wage. Those help gain domain knowledge for new people though. So, more limited. At least with our current economy.
Pipelines, workflows, etc? AI still booty cheeks at creativity. Let alone complex ensemble predictive models.
Maybe DDR4/DDR5 would be cheap again 🙏
I'm seriously thinking at this rate we will not be getting DDR6 on consumer builds.
It will probably take a few years for hardware prices to normalize
I regret selling my high capacity servers last year, that's all I gotta say
DDR4 was basically borderline ewaste last year
In ai engineering what would u guyz suggest to learn first c++ or python
Obviously, you'll encounter bias for python on this server but the data ecosystem is top notch. Before you're writing models though, you'll want to spend a couple months, minimum, going over core python. NN code is very dense, you'll want to know how to read it going in. That said, once you're able to parse the code, you can whip up a working MNIST model in like half an hour and dive into AI coding.
Ok thanks for the advice
give me a couple of days and I'll dm you the link
Start with C++.
C/C#/C++ Lang's is some of the most fundamental languages one must know, You can easily translate your knowledge from there.
Later you will also find it VERY powerful, because Python has limits that C lang solves.
I say learn rust, python and go but thats me 🙂
Separate yourself from the pack. Find your own path, I'd research it. No one can tell where you path will be in 5-10 years.
Not in these times.
i learnt python then jvascript then cpp then c then asm then java then rust then go
altho i mightve forgot rust and go
since i dont use them anymore
Looking for a beginner level ML/DS study partner. (3-5 only)
We’ll study for at least an hour in VC at night (IST) daily.
It's okay if you can't open your mic
Just dm me those who want to join
i live in india but wont be able to join at night
thanks
guys who has the right tutorial which can teach me data sciences i know nothing
but i know python
hey developers,i am in my freshman yr and i started to learn python and its libraries.i wanted to ask few questions.like after learning single concept like if i learned oops today so should i try to build something with it or just the assignments i am doing with lectures is fine and building project when its right time(like after learning a lot and you get an idea)
yeah the more u create projects the better u can get
and can project ideas come from ai?does it work?i cant think something cool
even just an idea to project?
like i am not asking it to code
just prompting i know oops and basic python give me some projects ideas to build on these topics
Could start from here: https://kaggle.com/learn
💯 after learning something, you should definitely practice what you learn. So that it sticks. Doing assignments helps too but try to run your own exercises. I would prioritize the assignments first, and get that out of the way. And then extra exercises after that.
Don't ask the same question in multiple channels. It causes duplication of effort.
for a coursework i have trained a DQN to balance a frictionless cart pole (the usual one from gymnasyium https://gymnasium.farama.org/environments/classic_control/cart_pole/ with teh usual rules (episode terminates after 500, or if angle is above a 24 degree or if cart position is +-4.8)
when generating a greedy policy slice of how my dqn acts: cart position frozen at 0 + cart velocity frozen into 4 values (each get a subplot) i struggle to understand for the lower 2 plots thoose push right islands... could sm1 explain them to me?
https://github.com/TheTom/turboquant_plus
what in the fuuuuuuuuuuuuuuu
i solved the cart problem for dynamical systems with a bigass loss function
was fun
didnt even use deep learning
my loss function is given
i can share teh coursework later
but i have more bad news... :/
mind helping me out in a bit? @cedar tusk
whats the problem?
i share the stuff in a bit
more like the slices
are super inconsistant
and i do not know hwo to answer teh problem the best
this is first up given for teh loss
this is the coursework
bruh i cant sent .pdf
I'm researching a bit about TurboQuant and was quite amazed with their work.
On their 1-bit error correction, do they add the attention score (using quantized query and key Q^ . K^) with their respective error correction vector (Q^ . K^) + ΔQ^ + ΔK^? Or just (Q^ . K^) + ΔQ^. Where ΔQ^ andΔK^ are the scalar corrections?
Oh wow...
You need to do math for AI ????
yes
to use AI no, but to build it yes
It's interesting on how they dequantize the vectors back to their approximate original form instead of adding an error correction after calculating the attention scores.
I think they can even further reduce the KV-Cache size by calculating error after attention scores are calculated. Since in the current implementation with dequantization, they require the projection matrix matching the Key/Query dimension which is dxd.
Theoretically, they can reduce the projection matrix to m x d where m is far smaller than d by calculating a scalar correction and modifying the attention scores instead.
without numbers :(
Can you share a roadmap or something ?
idk anything about that but the library for ai is tensorflow btw
wdym no
tensorflow is outdated. everyone in industry uses pytorch or JAX
ok well i use tensorflow
it's too mainstream
everyone in industry switched to pytorch and tutorial authors didn't get the memo.
basically both tensorflow and pytorch are good but there really isn't one standard
I've never seen a coworker use tensorflow ever in the last five years.
just use pytorch. be on the winning team.
team?
the team of people who use pytorch
i give up
you don't need to give up. you can start winning by using pytorch.
i give up with arguing
it doesn't really matter, if you like the library then use it, there's nothing wrong with tensorflow
got nothing for that one huh
I wasn't looking at this channel, but yeah, I agree
Every soul is free to choose their life and what they'll be.
except when it comes to politics
how so?
because i hate the first guy in the moderator list
yk
im from israel n stuff
we have some unwanted problems over here..
@austere marsh I'm muting you if this continues. you can have whatever opinion you want, and you can put any country flag in your nickname, but we're not discussing this in the server.
well you asked i answered but ok..
saying "except when it comes to politics" doesn't indicate that you're about to say you hate a member of the staff.
politics is hating
like people are hating each other and idk why..
let's just live all happily together.
Great, we'll leave it at that.
also you did say "how so?" to the obvious bait so kinda not my fault..
I said it because I didn't understand what you were trying to say.
Everything that you say is "your fault".
Send a message to @sonic vapor if you have any other questions or comments about this.
!ban 932187617288667138 antisemitism
:incoming_envelope: :ok_hand: applied ban to @warped vault permanently.
All the algorithms are linear algebra and multivariable calc
You don't need to get insanely deep into those subjects to start, but it's worth having the basics in order to understand the properties of NNs and how they're designed.
i.e. in order to understand the effect of using different loss functions in a network, you have to understand its partial derivatives for a given input/output as a function of the weights and biases in the network.
So being able to do basic vector math and take derivatives is really useful.
Google graveyard/abandonware.
Jax is the new shiny toy for them.
TF 1.0 was a broken rigid mess, then Pytorch came in got all those users. They tried to make a comeback with TF 2.0, but it was too late by then.
(And now Google has moved on)
(Final nail in the coffin)
is this accurate?
https://platform.openai.com/tokenizer
Yes it should give you the precise tokens used by those models - why do you ask? What do you mean by "accurate"?
Someone has a good tutorial on binary classification, more particularly on svm and kernel trick ?
i have hands-on machine learning 2023 with scikit-learn, tensorflow and keras, is this still one of the golden texts for machine learning even though everything i've seen is saying that pytorch is the dominant framework now? just curious if a lot of the material is relevant and would help me picking up pytorch while getting the fundamentals a lot quicker
you're right about pytorch, but the fundamentals of how neural networks work are just a mathematical fact that won't really change, even if people come up with more creative architectures or applications.
awesome. thank you so much. i'm about to dive into this then.
i've had some experience with pytorch before, but i think this book will set the ground stage for everything and then i can tackle pytorch specifically once i finish this with some background
I got a question How much should ik about MLOps if i want to become a machine learning engineer?
it's going to depend on what role you end up getting and how that company/team decides to distribute labor. I have a coworker who on paper has the same job as me, but he pretty much only does MLOps.
Be prepared to learn as much about MLOps, or about whatever else as ends up being needed of you in the future. But for the moment, I would get comfortable with Docker and its core concepts.
So... when it comes to training LLMs and inference, is MoE just a matter of having the autoregression take on the task of picking the feature-space of the embeddings to apply adjustments?
If you don’t mind, I’d also like to ask if
you know any excellent resources for learning NLP?
I’m already familiar with the underlying mathematics and theory I just need help to implement and code
what do you think of as NLP? because pretty much no one cares about pre-2022 NLP right now.
and current stuff ("""""agentic""""""""") isn't really NLP
I am really confused on what to specialize in
I finisehd Machine Learning basics Now I want to specialize What is your reccomendation?
I was asking about NLP because i have choosen to specialize in LLMs
do you want to actually know how LLMs work, or do you just want to build agents from existing LLMs?
I already know do how they Work, but I lack the ability to write Code because i didn't learn how to Create an LLM model yet
there's no way that you really undersatnd how LLMs work if you just finished ML basics.
you also won't be able to create your own LLM unless you get a job at an exceptionally powerful company. LLMs cost millions of dollars to create.
I do know
I know that too
So you know ML basics but you don't know how to code ? That's probably something you need to get into before any thing else
i am having trouble recreating john cramers sonification of post planck epoch radiowave data from 2013
because if you rotate the data by 90 degrees, and cross reference declassified 2003 gateway, theres already quite the uh, that, before you even have to cross reference current classified gateway
basically im tryna do what cramer did, but, finer detail for wave inspection
oh, you're currently inside a black hole btw. rather, EVERYTHING is a black hole. that much has been hashed, now im just tryna confirm if its if its recursive black holes.
this proposed formula for 'why everything' wasn't supposed to end up being a black hole annulus intentionally. /but/...
one is one instance, the next 100k instances, the last one million instances, you can think of the 100k as sort of a theta wave oscillator of the hippocampus, but staring down the cyclical bangs, the color is just phase density being used as a heatmap, the last is one million
one trillion.
oh, sorry, forgot the 100k for probability, and here's the full proposed formula for 'Chaos sequence'
almost had it earlier
How do LLMs work?
token system, its akin to a brains weighted affinity network data clustering, you can use the program obsidian as a loose metaphor
I know how LLMs work
They claimed that they don't know how to code, but have taken ML basics and know how they work. I was trying to gauge their level of understanding of the architecture through asking
And ty 🙂
the topic of generalized ai, and the one im posting about, may uh, may intersect interest wise
I'll chime in more about my thoughts and the mechanisms after I hear from them, haha. Don't want to ask a question and give the answer away
Oh no 🙁
!unmute 115751921813422082
:incoming_envelope: :ok_hand: pardoned infraction timeout for @peak lark.
!paste
So that everyone can easily read your code, you can paste it in this website:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.
you can attach more than one file at once in this paste bin.
ty
i was trying to just convey module 36 via crop alterative
I do still see some roles that are looking for NLP. Rag based stuff, vector databases, mainly when you need something grounded in reality and deployable efficiently at scale.
But it's definitely not what all the hype is centered around rn. (So I holistically agree with you)
On a different topic of handling missing data, how do you guys feel about model-based imputation? I know stacked models are a thing, but using a model to solve a problem with a model just seems..... maybe this is a me problem but it seems cursed, lol. Maybe it's just something I need to get over.
I have no problem with teacher/student modelling or transfer learning, for whatever reason
what is model autophagy degeneracy for 600$
(For anyone unaware about model-based imputation, it's a more recent method of handling missing data by having.... a model try to guess at the missing value based on the other features.)
fair
I would personally advocate for removing records with missing data, but if there's a systemic reason you're missing data you're going to bias your results
i gotinto this wholemess chasing 3 body vector problem off a hunch for, at the time, a buckshot whim for orthogonal superphase.
now the only thing it DOESNT involve is string theory
Wasn't the three body problem solved, even if it was ugly?
I haven't followed it too closely
there has been found, only to be a handful of stable orbits to date, believe it or not the forefront for phase vector sims is flatearthsociety, they believe in flat/holographic universe, which, spoiler, accurate, #hyper donut
No, I'm not talking about like... specific orbits. I mean I thought there was a closed form solution that was ugly but worked.
Oh. The solve is contested
if there is i'm not familiar,
the reason why it happens i believe, is the universe is inherently uneven, 'lobsided' both the cia model and my own conclude it to be product of phi
Seems like we first have to agree on if the three masses are equal
it's just the precedent of 3 for why, specifically 3
I mean, I think the most famous example of 3 body problem is the sun, the moon, and the earth
Because it could, you know, kill us
and then theres the cosmological axis of evil topic
\
i had that formula hashed BEFORE gettin this stuff,
sorry took me a second to fetch, ever wonder why 3 is the most present number in physics?
the brain too is a 3 body system
pi/phi my guy, planck era took place essentially because pre non-deterministic reality following chaos sequence just, ran out of anything else that adhered to the demand,
this could be paraphrased as 'infinite novelty'
anything else would be a closed deterministic loop, in full, just one thing defined by nothing, resulting in 'final event horizon scenario' all over again, planck era.
in essence this would be clarifying boltzmann brain as not quite the case and expanding on hawkings otherwise monopole formula for time/space/miwkowski space
it even clarifies antimatter/weak nuclear force violations
bump
fantastic, i managed to corrupt python
Newton came up with solutions for a system of 2, then tried 3
i tried using cramers output data to reverse engineer how he approached the data and got about this far before my awful laptop said no
numpy._core._exceptions._ArrayMemoryError: Unable to allocate 2.06 GiB for an array with shape (2048, 2817, 48) and data type float64
yeah you can go as many as you like past three and be just fine, or up to 3, the moment its 3, no more fun times
so, we try something lighter
Wait.... are you suggesting closed form solutions exist for 4+ n_body problems?
this is uh, big bang related. technically, thats under the umbrella
There may be some specific high-symmetry special cases, but generally I'm pretty sure it doesn't get easier at 4+, it's just 3 where it gets hard. Since 2 is solvable
mm, bout symmetry
sec
theyhave the right results so to speak but it wasnt the first break rather the second
you're a sharp lad il give you that
as reminder, phase is the cia word for toroidal miwkowski space
Yeah its symmetry in n>=3. You cant do it otherwise.
i didnt make it very far, um, would anybody mind lending a hand if they have more ram than i do?
Click here to see this code in our pastebin.
Someone on ts server has a 256gb ram hobbyists datascience setup
just checked that file, it do be there too
take a look, id turn down your volume
Hi, I'm about to finish CS50 introduction to artificial intelligence and I started studying calculus 1-2. What should I study (and what resources could I use) for data analysis and machine learning? Ty
Optimize it, RAM shouldn't be the limiting factor here.
did, found out theres more missing from the details about how john cramer did that, infact, MOST EVERYTHING related to this is just 404 links not found
https://library.wolfram.com/infocenter/MathSource/5083/
https://wayback.archive-it.org/21834/20250903013453/https:/map.gsfc.nasa.gov/
between the classified stuff, these details, and the fact even hawking said it was the most hype of his entire career, then proceeding to make zero revision to his otherwise monopole formula? not hard to see a gag order when it happens, this isnt historically a novel concept, back when the iron curtain was a thing, you practically werent allowed to discuss water being wet
nasa even commisioned it as a bronze statue. whys this the first youre seeing it?
What are you even trying to do here?
well originally it was just killing that wave form. more you stab at it the more it hashed out tho
/currently/ im tryna figure out if common logic be the case, and its black holes all the way up/down/bifurification'd
What?
2003 declassified depiction of consciousness energy grid throughout a brain compared to bang wave, care to see the stuff regarding consciousness in here? they even go into what quantifies death and identity down to a geometry
Man did you take too many shrooms or something
no
not for like 8 years at least i mean
been awhile
i got this stuff like, not even a day after hashing my own formula as posted earlier
You're thinking aloud here, sure I give you that they are pretty patterns, but I want to see a cool demo or something not speculative out of context stuff.
sure
volume warning
There's is no structure or form to what you're talking about from my perspective. Keep things Concise
which part, theres a lot.
Nice its like a virtual oscilloscope
vector scope, basically looking down the barrel of an osciollo
its SUPPOSED to be used for doing stereoscopic phase relations/cancellation management,
this had a hella setback due to a depopulation mistake, but this is like, almost half of it, 'theres a lot'
Okay but for me I don't know what that has to do with FL Studio what is that a KG?
What's the relation here? I don't know what you're talking about
sonification of radiowave data,
380k years post bang to be specific, redshifted,shortened to 100/500 seconds
personally i found the notion of black hole theory rather edgy/cringe, and was not intending to end up with that as the take, however, run the formula for a bang cycles matter/probability, accounting for inflaton dark energy swap as a sort of threshold about a million times and you get a black hole annulus. then i get the cia stuff, which is literally just 'yeah black hole my dood'
they REALLY rely on geometry for everything though, fair i guess but, im more of a frequency guy, though the two are intergangable
two of the files most predominantly conveying this
visual wise i mean
is this really about data science/AI?
yeah, if youre trying to make one with qualia at least,
otherwise its just data related science, physics
if you want context for that i gotta dm, pdfs aint allowed
I think all your messages and images are likely to displace messages that are questions about implementing data science and AI in Python, which is what this channel is mostly for, so I'd appreciate if you tone it down.
that's impressive without any context,but sure
thanks!
wish i could do that
Hello
Can anyone explain me what confidence interval is? Its the 95% of times I get mean right?
It's the bound within X% (usually 95%, can be something else) of the experiments results
Bound within x? I didn't understand. It says mean among 95% of data in a Google search.
It's not necessarily 95% of confidence, you can make intervals with a confidence of 99.5% if you want, or 75% for example
Usually we pick 95% but we don't have to
It's the interval within which X% (can be 95%, can be something else) of the random experiments ends
For confidence intervals, are you referring to the bars on graphs?
A confidence interval is a range that shows our uncertainty about our estimate of a true population value, since in practice we generally don't have the luxury of getting that value.
Like let's say we sample 1000 python developers and ask their hourly pay and the average is $60/hr and we compute a 95% confidence interval of 50-70. This suggests that the true average pay for python developers is somewhere within that range.
The interval can be wider or narrower depending on sample size and variability within our sample.
^
(There’s more technical nuance to confidence intervals, confidence levels, and p-values, but I think this is a good enough high-level summary.)
Hi, I'm about to finish CS50 introduction to artificial intelligence and I started studying calculus 1-2. What should I study (and what resources could I use) for data analysis and machine learning? Ty
Hey @summer plover
I am going through gradient descent, and was wondering, are there any particular cases where we are supposed to use gradient descent and others where we use batch, mini-batch or stochastic gradient descent? Thanks!
start from stats 101, then mathematical stats, then statistical learning, after you are comfortable with statistics and loss functions and their usages then you can start with torch.
stochastic gradient descent is better for bigger data but takes more time to converge. The newton method aka small gradient descent requires to calculate (X^T X)^-1 for each iteration and this has exponential cost
but converges MUCH faster
time / compute tradeoff basically
if u understand cost function then you can literally create models for ANYTHING
For batch the number of batches is random right? So if I take less batches the graph might converge faster, does this cause any error in readings?
there is no difference between batch gd and normal gd, mini batch is used when you cant fit the data to ram
both gd methods can be used with batch or mini batch
oh wait
mini batch is normal gd but since smaller batches the size of X does not get too big so is fast to compute the inverse solution
it still requires the data to load to ram
lbfgs is the one that you dont load all data to ram
for example this is the lbfgs method for linear regression
Thank you, I really didn't know where to start
thats not lbfgs tho
yea apparently not, i thought after seeing lbfgs is low memory version that it would use a d&r approach
but apparently it just uses less memory FOR the compute and not the data that compute uses
I never heard of this method. Is it for optimization?
Limited-memory BFGS (L-BFGS or LM-BFGS) is an optimization algorithm in the collection of quasi-Newton methods that approximates the Broyden–Fletcher–Goldfarb–Shanno algorithm (BFGS) using a limited amount of computer memory. It is a popular algorithm for parameter estimation in machine learning. The algorithm's target problem is to minimi...
Hi, I am a hs student who is going to uni next year for cs. I want to know what should I prepare for if want do ai. I am currently are experimenting with decision trees
have you looked to see if you can do a major concentration in AI?
yes but I feel like could be to hard
I mean I can, but would it require lots knowledge in cs theory and math?
yes, but if you want to get a job doing AI, that's what you have to do.
alr thanks
could it be theoretically possible to compress some random string of digits by repeatedly 1) doing some reversible transform (and marking it down so we can reverse it on the decompression) and 2) applying some sort of naive (or not) compression on the resulting string? (obviously we want the step 1 to help step 2 as much as possible) im asking this because i need to compress some random string of a couple mb worth of integers into a couple of kb
The problem is how/where you store the compression algorithms for reversal. Like if your goal is to shrink a string and rebuild it, but you need to pass the blueprints with the shrunken string and it takes up more space than what it reduced then it's unhelpful
You can try public compression algorithms, there are more complex compression algorithms, but the naive approaches are collapsing repeats (if your data has a lot of repeats) and/or looking for common substrings and attaching a translation dictionary to the file to decode it
i'm making an open source llm inference product, if anybody would be open to me asking them questions to flesh it out/validate it please dm me. 🙏
You have to be willing to do that in the server if you want to talk about it here.
what in the TimeCube nonsense is this?
your compression algorithm will have to map every possible random string of digits into a smaller string, if smaller string is only allowed to have digits, it is not possible because there are less smaller strings than there are larger strings. But if you are allowed to use other symbols you could use them, like you could use your digits to index into ascii characters
but the only reason indexing into ascii characters would reduce size of your string is because string was not an efficient way to store digits in the first place
Any recommendations to where I can learn how to create AI?
learn math
Making an AI is super difficult. But you can get your feet wet in machine learning at https://kaggle.com/learn is or check the pinned messages
its not that difficult. What is your project?
we have one person saying "making an AI is super difficult" and another saying "it's not that difficult" because they're using two definitions of "to make AI".
@atomic glade the word "AI" is way overloaded at this point. Describe as specifically as possible what you want to create, if you had to describe it to someone who had heard of computers but not AI.
kinda why i asked what kind of project 🙂
What do you guys know about Net2Net type of Ais?
to avoid any confusion, we should call those neural networks instead of AIs.
the idea is that you use the output of one model, the "teacher", to be the target of another model, the "student". the teacher model is usually larger and more complex than the student.
this is useful if the training data for the teacher is no longer available, or if you want to "compress" the teacher.
let me give you context. : I'm setting up my stock AI to learn continuously from the live market, but I have to expand its neural network first so it has room for the new data. I'm using a technique called Net2Net, which literally lets me add an extra layer to the neural network without wiping its memory. Instead of starting over, Net2Net mathematically copies the AI's existing knowledge into the new layers, giving it a bigger brain instantly. This lets the AI adapt to current market trends in real-time with its new capacity, while I still feed it historical data so it retains its memory of massive market crashes
When you talk about the "AI" as something that has layers, you want to call it a neural network. Or just "a network". People in industry don't refer to individual things as "an AI".
let me know how that approach works out. alternatively, you could just keep training the existing network on more recent examples.
I was kinda wondering if anyone here has done something like this?
I've copied layers directly into other models before, but I haven't done anything with time series data.
how did it turn out?
not great, but it was a toy project.
what were the problems you ran into?
the model didn't really improve. I was trying to implement this paper, but with different data. https://ojs.aaai.org/index.php/AAAI/article/view/17600
yeah seen that paper.
Thanks for your help. Will look into things.
it's always been an abstract concept.
Hello
Can someone please explain backpropagation in a neural network? It is described as we are going back with the new weights. But going back where? We are in the same layer. it doesnt make any sense.
I’m trying to learn how to build systems that analyze data and make predictions or classifications automatically. For example, a program that can take in a set of inputs text, numbers, or other data and reccognize patterns in that data, and output a decision, score, or prediction based on what it learned.
@jagged stratus
Don't ping people who aren't already talking to you to ask them to answer your question. Nobody is on-call to answer questions.
Oh sorry, I see.
Thought you and courageandfire were the same person for a moment.
Oh its fine lol
I might be able to explain it more later when I'm not on mobile, but backprop involves updating all the layers.
Look into how to train a classifier with CSV data using a library like sklearn
Okay, I will. Do I need to worry about using PyTorch and TensorFlow right now, or are those things I should learn later after I get more comfortable with sklearn?
Pytorch and tensorflow are both for neural networks. You only need to learn one, and it should be pytorch.
But there are more kinds of models than just neutral networks. Non-neural models are easier to wrap your head around.
Also don't think about "being comfortable with sklearn". It's a general purpose toolbox. Focus on ML concepts.
All layers? U calculate y_hat = wx +b for a single layer right? And then get the loss function, gradient descent and then you implement backprop. I thought it happens like step by step. Am I wrong?
Single neuron*
Thanks, that actually makes things a lot clearer.
Something about you twos' PFPs activates the same neurons in my brain
Maybe I only have one neuron for that.
Dude...now I am second guessing everything. Haha
They're both guys with similar haircuts looking in roughly the same direction
Dw I keep doing the same exact thing 🤣
something that really helped me understand neural networks is understanding how they're actually really big composite functions. if you expanded out all the multiplications, summations, and activations, you'd have one really big function with lots of deeply nested parentheticals. Those parentheticals will repeat many times throughout the function, because of how interconnected the layers are.
and each time you want to adjust the weights, you have to compute the gradient for that whole function. which is a calculus derivative thing.
backpropogation is is just a math trick to make it easier to compute the gradient at each layer, because the gradient at a given layer depends on the next layer. so you start with the last layer and work your way back.
and by "you", I don't really mean you. the whole point of libraries like pytorch or JAX is that they do that work automatically.
how would one encode enneagrams, memories, in bits. i never stopped to think about that.
you don't. you do it in geometrical alignments/color,
why store every value for every concept or memory as bulky data when you can just recall a single shape for say, a [red] [bouncing] [ball] for familiarity at the same time, represented as a geometric connection
the shape never changes but the color for affinity rep does, probably would even support gradients
I mean you're sort of describing an encoder. Transforming the input into a contextual representation of the input.
But the model you use is going to be highly dependent on what type of problem you're trying to solve. We've seen that decoder only models are sufficient for coming off convincingly as human created text through LLMs
If you're trying to explore existing concepts/data and group them together and use distance measurements, that are cleaner and more effective models
there are*
And there's no reason someone couldn't create a more contextualized distance measurement. We have Gower's distance for being able to work with a mix of categorical and numberical data, for instance
ridiculous, lol
Local Setup
well again, theres reasons ya cant make one with qualia, but the pitch angle for geometric/color based angle for consolidation, not the worst idea i think
enneagrams are already inherently encoding, the more you can consolidate the less itd take processing load wise
Hello Im an AI researcher and I currently need a team, if you're interested text me please, I'm currently working on an algorithm that can significantly lower both the energy comsumption and the compute cost of ai training
like a paper?
not yet, thats what im working about
@hasty lynx what would you tell them if they DMed you, so that they can decide if they're able to participate?
Do you have any prototype, partial implementation, or early results yet, or is it still conceptual?
im working on the demo, i need other's opinion for that to work
Yeah but what's actually working right now?
im working on the core logic of this algorithm, currently im searching for similiar reverse engenneer algorithms, because my idea is that backpropagation is one of the ways to solve the complex problem of ai training, if we find a way to reverse engeneer a model's weights, past the relu activation, we can give human qas to the algorithm and get the human's weights
this is really simply talking
How are you reversing weights past ReLU, and what does that look like in practice? Have you tested it on a small model or dataset yet? Like what equations or process are you using to reverse weights, and do you have any experiment showing it works?
it works on a small model by looking at other's documentation, the relu is currently the problem, relu is zeroing numbers below 0 so to know if that weight is -1, -100 or -0.005 we take different questions in batches, its a system of equations, its like training a model, if we make the batch higher the model understands deeper and better concepts
I'm kind of confused. Why not just use a linear activation? (If all activations are linear, the network collapses to a single linear transform, so I'm unsure if you're saying your goal is to try to accomplish this)
ReLU zeroes out negative activations by design
and it became popular because it helps mitigate vanishing gradients compared to other activations
Also, by AI training what are you referring to? Machine learning? Deep learning? LLM/transformers?
It seems like you're referring to LLMs
But backprop is not so simple in practice for transformers. Transformers go all in on the attention mechanism, and in the case of LLM use, allowing them to weight previous tokens in a sequence at different amounts in order to predict the next one.
That said, backprop is still the training mechanism. I'm not sure what alternative you're proposing.
I'm a little confused on your statement on usingbackprop to solve ai training. Or what human qas means in this context. Human involvement is often used at the fine tuning step already.
Can you define what you're doing mathematically, perhaps?
(Edits for clarity)
If it would help to explain it more clearly, feel free to go into more technical detail.
I'm also a bit unclear on how any of this reduces either energy consumption or compute costs
That said, backprop is still the training mechanism. I'm not sure what alternative you're proposing.
I'm not sure if they know either lol
isn't the whole point of backprop that the cost/loss of the activation for a given set of inputs can be expressed as a function of the trainable params?
I try to give the benefit of the doubt. I am a little confused on if the goal it to improve the model's "understanding" or if we're trying to reduce compute costs during training.
Like maybe avoiding full backprop in some layers or fewer parameters or maybe are trying to optimize the attention mechanism in some way since it is expensive, I believe it's O(n^2)
Or better hardware for training, some way to reduce training steps, etc
It would help with clarity if we could isolate which computationally expensive step we're targeting and what the improvement would look like.
But also, there are tradeoffs to not doing the full process. And the only things that immediately come to mind for LLMs specifically are...... limiting the context window (truncating the amount of data that it's considering during token prediction) or freezing most of the weights and just doing like final layer tuning.
(And the tradeoffs are that the LLM will not be able to refer to earlier parts of the conversations and final layer retuning will not change its internal representations.)
But I'll stop pontificating until we hear more 🙂
Oh, sorry, thought I responded to this but I didn't. Essentially yes. And if someone can find a way to maintain similar performance while cutting down on the computational costs of this process they'll become rich
Especially if they manage to do so in a way that the fronteir AI companies have not implemented
There is some sort of R&D secrecy act in the US that triggers on the condition of exceeding 20% performance range of existing tech beyond what the meta is. (Its pretty fringe, ik)
That being said...
Use nomials, and prime numbers to generate polynomials, leverage causal masking on the symbolically purposed nomial participation, and bam, you can just use that instead of BPE character/word token embeddings.
Something something... makes it turn into AGI.
Then you can encode skip grams with nomials purposed for skip gram index identifiers, role, anything really. You can also have nomial nucleation where the heads have training stages just permuting which poly/nomial theyre targeting and if one sticks, keep it, as nucleation. i hypothesize thats way better than using strictly just skipgram multihead attention.
If there was some sort of R&D secrecy act with that sort of metric we wouldn't be discussing LLMs at all. Are you referring to the Invention Secrecy Act of 1951? That only covers inventions deemed to be "detrimental to national security"
Yeah that one. So i had shower thoughts about exactly what youre saying and I figured, the only reason LLMs are toyed around with by the public right now, is because other nations are using them too at this point.
But leveraging an even more optimal implementation would be that key factor
But I wouldnt jump into fringe topics further than that on these types of discords for posterity. You know, to preserve my credibility and all XD
It is important to maintain credibility, absolutely.
Im trying to figure out though, using nomials, whether to stick with embeds and weights and softmax, or use a twist on vectorization and nomials, and use that as a poor man's LLM
I threw around the idea of having embeddings that werent strictly word token based, so you could have tokens that represented semantics and roles/group membership. This way, you would have some prospect of performing inference on those tokens, and fill the rest in later. The context could be condensed due to chaining high level tokens instead of just a huge list of words
What if we've been thinking about AGI all wrong? Artificial implies we're manufacturing something that doesn't exist in nature, but intelligence is natural. Pattern completion, feedback loops, reinformcement, decay. They are all natural things we all experience every day. I'm calling it "Augmented General Intelligence", instead.. Real continual learning can only come from the topology, not the weights but it needs external correction pressure. It needs to work with use, not gradient decent training each time, its inefficient. You can't teach a neuron to be a different kind of neuron. but you can change which neurons connect to which, how strongly, and what gets reinforced vs what decays.
Are you proposing automated human feedback retraining loops? Human feedback retraining already happens, but only comes into new model versions and I think that’s a sane practice to prevent model drift for existing implementations. (If my program is behaving less accurately today than yesterday, I don’t want changes in the model to be something I need to consider alongside other possible sources of drift)
What you’re describing with neurons connection to which neurons is represented by the attention mechanism and don’t quote me on this but I believe is the most computationally expensive to train.
We also have final layer retraining, but this just affects the final output mapping, it does not fundamentally change the contextual representation of sentence structure and language that are part of the models contextualized “understanding” of the data
Not retraining loops. That's still gradient descent, still a new model version. What I'm describing happens at runtime, no training required. Attention is close but its still fixed after training. Im proposing something closer to synaptic plasticity. Edges between nodes strengthen with co-use and decay from neglect. "Neurons that fire together wire together"
Hebbian learning
Sounds like it will increase inference time. What’s the payoff and what are you trying to replace/improve? Are you wanting an instance if the model to do this or should this affect the global model?
Of the
The overhead is tiny using graph traversal compared to matrix multiplication. Spreading action over few thousand nodes is microseconds compared to seconds iin a training pass cost. Its a look up augmenting the inference, not replacing it.
So you’re proposing skip connections for… which use case? To help an instance better adapt to a single user/chat/progrsm?
But the payoff is huge, continuity without retraining. Thats just a tip of the iceberg
Can you define continuity in this context?
Sure, so the model doesnt remember you between sessions and every conversation starts cold. Continuity here means the memory graph persists, so the next session it already know your projects, terminology, concepts. Even when other agents in the shared graph use tools, its all saved to the graph.
Okay so you want it consistent across a single user.
OpenAI has a few features that support this. They would be your competition.
They allow for certain “memories” to be stored to a long term bank (that humans can edit/remove) and also uses vectorized summaries of prior chats. The benefit of this approach is continuity in a way that is interpretable and modifiable.
Do you believe your approach has benefits to this approach?
Ya I already made it.
Thats my local qwen 3.5 model, but I also have gemini, codex and claude connected to the same shared graph using fast MCP server
Still tinkering though, ESN Reservoir needs tweaking.
So the idea is that you are storing contextualized memory in a shared graph that you are connecting multiple instances to? What is being stored in the nodes?
ok sorry if I wasn't really clear, I don't talk English that much.
My goal currently is researching a way to skip backpropagation completely, speeding up the model's training exponentially.
Backpropagation is one of the methods to achieve the AI weights, but its really slow, think about it every step we move a bit closer to the final weights of the model after training.
I was reading a paper on reverse engeneering a model's weights, so like you give an LLM a set of Questions and Awnsers and you get with an algorithm a map of the discovered weights, this is still being limited by the ReLU activations, but I think i found a way to go past this: normally you would give a set of qas (questions and anwsers) and get the weights, the change I made its that we send those qas in batches of 8, 16, 32 so we can correlate the activations of the ReLU in group of 16, my example of the AI training referred to this: when we train a model, every step we give it 16, 32 or 64 groups of data so that the training algorithm can find the most efficient way to lower the loss, same here, we have more than one qa set per step of the algorithm.
Also I'm refferring to this as algorithm, but I'm currently considering the Idea of switching to ML, it's easier to develop in my free time.
This isn't about making a step of the training faster, this would impact the whole training system
gemini's opinion
Click here to see this code in our pastebin.
Gemini says that about any AI project
alright but i was clear, even if gemini isnt right, you can prove im wrong
Its important to first understand why backpropagation is used, and what it is ingesting to perform it
Sounds like youre going to steer towards an approach that leverages dijkstra heuristics from node to node in cluster scopes (with multi-head attention principles carried over into conditionally gating nodes/clusters)
guys, in machine learning are there specializations in problems or is it more general?
I was seeing some job openings and one of them said "+2 years of experience with Customer Churn"
are there such things, or was it a specific case?
There are specializations in problems (like time series analysis, classification, regression) but this is also a common regression/time series problem that many businesses want to tackle
so depending on the context of the company, will the problem be different? which are the main ones nowadays?
computer vision? nlp?
(minor disclaimer that I'm in medtech so while I'm aware of other roles in fields the examples will bend in that direction)
Computer vision is still pretty important. Medical imaging companies, retail theft companies, a lot of research labs all rely on computer vision.
NLP is still important for some contexts. Medical billing, anything where you need language processing against a ground truth so scanning medical insurance policy to try to find the relevant section of info vs what you're doing.
Time series processing/forecasting is also pretty big. Fin tech, business intelligence, even patient monitoring devices will rely on this.
Edge deployment is another thing that can come up that is important. Knowing how to deploy an optimized inference model with time or memory constraints.
Those are the ones that immediately come to mind.
Oh, consumer/Saas companies will often use unsupervised learning to better understand their customer base to determine how best to market them
And ofc classical/simple regression/classification problems are still useful in a variety of contexts
Do you know what sorts of companies/roles you're targetting? Or do you want a breadth of expertise that can flexibly be applied/target multiple companies?
So you work with ml in the medical field?
Yep!
i know how and why is backpropagation used, its a long solution to a complex problem with an easier solution
so cool
I'm still studying (I'm currently in LSTM) I plan to start looking in July, and since I've never had experience I don't know the area well yet
Gotcha. LSTMs are very useful for time series forecasting. Stock prices, weather data, medical data etc can use them.
The specific model that is best for any given problem will depend on problem constraints such as what type of data we're working with, how large the data set is, what sort of long-range dependencies we're looking at, etc.
And understanding RNNs and LSTMs help you understand Transformers
Since the reason eachof those models became popular is in part because they addressed the weakness in the prior model line
yes yes, I think like this, I want to study all kinds of models first
from classic ml to neural networks (MLP to transformers), and then when I have mastery of them and understand, then I intend to choose the area in machine learning, do you think it is a good one?
Definitely. And knowing a wide range of models also makes you flexible: you don't have to only apply to time series jobs if you know more than the temporal sequence models, you could apply to other roles as well.
A guy once told me that a degree in the field is an important form, which most companies ask for, and I joined a college this year (I'm 18), and I'll only graduate in 4 years, is it a problem to be studying or not to have a problem?
like, has to be graduated, or can be studying yet?
Tech is in a rough place right now. It's hard to predict, but all I can say is that as of this moment specifically a degree is essentially a requirement for a job in tech and ML/AI specifically tends to be more degree gated than most. Enough so that I've gone back for my masters, but that might just be because medtech/medical research expects it. I am not 100% sure if this is all ML/AI domains or just medtech, though.
But I've heard enough chatter from others that it's pretty much all ML/AI domains, just haven't confirmed it myself directly
how old are u?
Old, like fossil old, archaeologists study me to understand how a decrepit skeleton is able to move and walk around
(Late 30s)
Im using spreading activation for graph traversal, not Dijkstra.
is there here any faang employee
!warn 1417961964319015074 don't spam your program across channels, especially since it's not a python program
:incoming_envelope: :ok_hand: applied warning to @chilly junco.
Hi people
what would you ask them? is it about data science or ml?
Im curious how the amount of neurons in the ESN should scale with the graph.
Anyone have any insights?
@spark kernel your message was removed for asking for a job, which is not allowed
what the hell is that kind of screenshot
stop that
guys, I was reviewing classic ML models, and when I came across Naive Bayes, a question arose.
I understand the model behind it and its advantages and disadvantages.
I'd like to better understand the process of transforming words into numbers. I've heard of bag of words and TF something; can someone explain it to me?
Do you understand one hot vectors?
i think so, it's when you have a non numeric feature, and you transform all the classes of that features into new features with 0 and 1?
That's basically right.
Suppose you have 132 words that you want your model to be able to know. So you assign all of those words an arbitrary number between 0 and 131. Then you represent that word as a vector with 132 0s, except that the element for its index is 1.
This is a very basic way of doing it, since the vector is kind of meaningless, but it at least gives you a unique representation for each word.
And if you represented each word as just a number, it would seem like word #43 is "more" than word #25
gotcha, this was bag of words, right?
Yes
so we created a matrix (samples, n of words in 0 or 1)
Then if you wanted to represent a whole sentence, you can just have a vector where there's a 1 in the index for each word that appears at least once.
Can you think of a reason for why that is helpful, but still not amazing?
i just dont get a thing, when we created the features, the number for each sample it's 0 and 1 (like if have) or it's a count of how many times the words have in the sample
It's usually 1 if the word is present at all, else 0, no matter how many times it appears
i thought it was how many times the words appears, have a type of that, right?
No, it's usually just 1 or 0.
like for using the multinominal naive bayes, i saw a video that using how many times each words appears
I mean that's fine if they have a reason for doing it that way
What do you guys think of NLP? (Gen AI is a separate specialization, for clarity)
I'm looking over the elective courses for my masters and trying to determine which specializations I want to pick up.
I suppose I should try to select the most critical specializations first and then worry about extras afterwards, I may not have any spare. Still curious to hear your thoughts on classical NLP and how it pertains to stuff done today. (I know it's useful in RAG-type architecture when looking for similarity in vectorized large document DBs)
(as you probably know) I work in the language technology department of my company, and we often remark that it feels like we don't do actual NLP anymore, because that simply isn't in demand currently.
Very fair. You were one of the people I was hoping would answer. I appreciate the candor.
a few months ago, a coworker gave a presentation to the department about improving generic model performance for language with non-concatenative morphologies such as arabic, and our department head was like "wow, we actually talked about linguistics"
Hahaha. Out of curiosity, why do you think classical NLP isn't in demand anymore? Is it comparitive familiarity with LLMs vs NLP? Is it that there isn't compute costs/inference time bottlenecks? I admit that I haven't deployed anything LLM related profesionally so am not sure about how the token costs scale to compute costs of hosting a simpler NLP model. (Apologies if terminology is incorrect with NLP, I'm fairly new to it)
Because many many fewer people need to truly understand NLP to produce LLMs, relative to the number of people who can effectively utilize them without knowing anything about NLP.
I found a way to completely skip backpropagation, i tried on small and medium transformer models, my generator model performs 99.8% with 8 layers only, it can generate weights of models now purely with sets of questions and awnsers
I need bigger testers to find out if this is truly bulletproof, I've tried models with 90 million parameters max for now.