#data-science-and-ml
gradient descent is like climbing down a hill with a blindfold on. it's "stochastic" when you feel around a few spots with your foot before deciding which way to go.
@tidal bough do you like my analogy or no
it's okay if you don't
I barely get to think about neural networks anymore now that everything is """""agentic"""""
ah so regular is just taking the trip on a guess and then finding out that there could've been a more efficient route? vs stochastic is taking one step, looking around for the next like downhill slope and going that way?
also it occurs to me that my definition of "not stochastic" might be inverted
It's easy to show that if you take the limit as the learning rate approaches zero, the average direction of the SGD gradient will match the true gradient (and the expectation value of the SGD gradient just is the true gradient, always). But as for why it works in practice for sizable learning rates... One way to explain that is that SGD doesn't go along the direction of the true gradient; it randomly deviates from it. The thing is that, for tasks with many local optima, like training neural networks with backpropagation, SGD typically works better than true gradient descent because of this. The randomness makes it better at global optimization, the same way many global search algorithms involve randomness.
i think I remember reading some blogpost that showed a cool result relating SGD to a more complex algorithm but I'm not sure how to find it...
right, so instead of assumed/ill-informed planned routing, it's improvised routing?
Like choosing a route on Google Maps that appears to have less traffic, but then taking another exit/road ad hoc to avoid the evident traffic and arriving at the same destination?
Sure, that's maybe a decent analogy
i know StatQuest is a really simplified version, but this still confuddles me.
the way he explained regular descent, it didn't go back to the initial guesses either; the steps built on top of each other?
I may have been thinking of https://distill.pub/2017/momentum/ , though it only covers SGD briefly and mostly talks about momentum
Consider taking that dataset, and plotting the MSE as a function of slope and intercept. That's the error function you're trying to minimize, and GD and SGD both do this by "walking" this landscape in small steps, evaluating the gradient at every location. That's maybe more understandable than looking at the dataset itself in (x,y) space with the curve shown.
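To make that concrete, here's a minimal numpy sketch of the loss landscape. It assumes a made-up dataset generated from y = 2x + 1 plus a little noise (not the StatQuest data); each entry of `loss` is the MSE for one (slope, intercept) pair, and the lowest grid point lands near the values the data came from. Passing `slopes`, `intercepts`, and `loss` to matplotlib's `contour` would draw the bowl that GD and SGD walk down.

```python
import numpy as np

# Hypothetical toy dataset: y = 2x + 1 plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=50)

def mse(slope, intercept):
    """Mean squared error of the line y = slope * x + intercept on the toy data."""
    return np.mean((slope * x + intercept - y) ** 2)

# Evaluate the loss on a grid: rows index intercepts, columns index slopes.
slopes = np.linspace(-1, 5, 121)
intercepts = np.linspace(-2, 4, 121)
loss = np.array([[mse(s, b) for s in slopes] for b in intercepts])

# The lowest point of the landscape sits near (slope=2, intercept=1).
i, j = np.unravel_index(np.argmin(loss), loss.shape)
print(f"grid minimum near slope={slopes[j]:.2f}, intercept={intercepts[i]:.2f}")
```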
Were you thinking of NAdam optimizer?
I don't think so
so in regular descent, the derivative of the loss function was calculated with ALL data in mind. a guess was taken for slope of loss function and intercept, multiplied into the learning rate and rinse and repeat until the loss function is close to zero.
i can understand how that would be a lot of computation for a LOT of data points for the parameters.
whereas stochastic takes only one data point for the parameters, finds a line of best fit, and outputs the parameters for step 1. then the subsequent steps take the previous parameter outputs, multiply by the learning rate, and do the same thing, which leads to a more efficient path to local optima?
i can sort of understand it as: for SGD, each data point is allowed to pull the line of best fit toward itself until it closely fits in an arrangement where all data points are "happy", whereas RGD just shotguns a guess for the slope for all data points and keeps adjusting that until "happy"
so i can see why SGD would be faster.
Sure, note that for SGD you're effectively doing much less computation per step, so you can afford many more steps.
E.g. you have a dataset of N=10**6 points. You could do one step of GD, computing the gradient on the entire dataset. Or you could do SGD on minibatches of m=1000 samples each. 1000 samples is enough that the gradient computed this way would be a pretty good approximation of the true one, and for the same amount of compute, you'll be able to do N/m = 1000 steps instead of one.
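A scaled-down sketch of that compute argument, assuming a synthetic linear-regression problem with N = 100,000 and m = 100 so it runs quickly (the same 1000-steps-per-pass ratio as above). One full pass over the data buys either one GD step or N/m minibatch SGD steps; the data, learning rate, and sizes are illustrative choices, not tuned values.

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 100_000, 100   # dataset size and minibatch size (illustrative values)
X = rng.normal(size=N)
y = 3.0 * X - 2.0 + rng.normal(scale=0.5, size=N)

def grad(w, b, xb, yb):
    """Gradient of the MSE loss for the line w * x + b on the batch (xb, yb)."""
    err = w * xb + b - yb
    return 2 * np.mean(err * xb), 2 * np.mean(err)

lr = 0.05

# Full-batch GD: one pass over all N points buys exactly one step.
w_gd, b_gd = 0.0, 0.0
gw, gb = grad(w_gd, b_gd, X, y)
w_gd, b_gd = w_gd - lr * gw, b_gd - lr * gb

# Minibatch SGD: the same single pass buys N // m = 1000 steps.
w_sgd, b_sgd = 0.0, 0.0
perm = rng.permutation(N)
for start in range(0, N, m):
    idx = perm[start:start + m]
    gw, gb = grad(w_sgd, b_sgd, X[idx], y[idx])
    w_sgd, b_sgd = w_sgd - lr * gw, b_sgd - lr * gb

print(f"GD after 1 step:      w={w_gd:.3f}, b={b_gd:.3f}")
print(f"SGD after 1000 steps: w={w_sgd:.3f}, b={b_sgd:.3f}")
```

With these settings the single GD step only moves part of the way toward (3, -2), while the minibatch run ends up close to it for the same amount of compute.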
so a compromise of optimization between accuracy and computational power
(you could spend some of that speedup on lowering the learning rate for SGD to make it more like GD, but that's not necessarily what you want, because of that consideration I mentioned where randomness can improve convergence)
i feel like my interpretation here makes sense, could you correct me if I am wrong?
sorta like throwing one thing out of balance in favor of the other until you get close, as opposed to finding the best arrangement for the initial data point and then optimizing off that to get a close arrangement that works for all.
Yeah, that describes SGD, sure
However I should note that... 2d dataset fitting is a toy problem, and for it, SGD is, I think, objectively not at all better than GD.
because the loss function here is smooth and convex. GD would just go directly to the minimum, whereas SGD would wander a little. SGD has pretty much no advantage here, so trying to understand why SGD can be good by only studying this task won't go well.
this was the only video I could find that didn't provide an unnecessarily collegiate, detailed explanation without an actual conceptual explanation.
like it didn't just spout variables and calculus but rather explained the underlying reasoning/process.
but thanks for the concise explanations whilst i stumbled through my thought process!
I'll be back with more questions...soonish.
SGD wiggles around which lets it generalize better since it will jump out of steep local minima holes. Not having to go over the whole dataset matters for performance reasons. In practice batches are used in deep learning which is kind of in between the two, and this is done there because doing just one sample at a time would be slow as computers want to work on chunks of things at a time. So the batches act as chunks for better performance.
In non-toy, non-convex problems SGD is used for these properties.
gotcha
how does one determine what kind of cross validation method to use based on the amount of training data?
for example if it goes 50,100,200,400,500,1000 etc?
gm
Please, is there a paper or an article that explains how word embeddings capture meaning from training? I recently finished learning linear and logistic regression and multiclass classification with softmax, so I'm planning on building a sentiment analysis model. I'm planning on Word2vec embeddings. Training on numerical data is simpler because your X is the input data, but from what I've learned so far, the linear model takes the parameter h as an input, which is the average of all the vectors of each word in a sentence.
And it trains and trains and changes the word vectors (the beginning of the confusion): how will changing the vectors make the model understand the words? Since the input is a mean vector, to find the h gradient you won't use the parameters of the h you first passed.
I know I can use packages for embeddings, but it's something I want to write from scratch, so if any paper or blog can help, please share, because I don't even think I understand what I'm trying to ask any longer.
Thank you.
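For the from-scratch question, here's a minimal numpy sketch of the setup described above: the sentence vector h is the mean of its word vectors, a logistic-regression head predicts sentiment from h, and the gradient with respect to h is passed back to every word vector in that sentence (each gets 1/n of it, because h is an average of n vectors). The vocabulary, sentences, and hyperparameters are hypothetical, and note this learns the vectors through the sentiment loss directly, not through word2vec's context-prediction objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: label 1 = positive, 0 = negative.
sentences = [
    ("great movie", 1), ("loved it", 1), ("great acting", 1),
    ("awful movie", 0), ("hated it", 0), ("awful acting", 0),
]
vocab = sorted({word for text, _ in sentences for word in text.split()})
word_to_id = {word: i for i, word in enumerate(vocab)}

dim = 4
E = rng.normal(scale=0.1, size=(len(vocab), dim))  # trainable word vectors, one row per word
w = np.zeros(dim)                                  # logistic-regression weights
b = 0.0
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(text):
    """Probability that `text` is positive."""
    ids = [word_to_id[word] for word in text.split()]
    return sigmoid(w @ E[ids].mean(axis=0) + b)

for epoch in range(200):
    for text, label in sentences:
        ids = [word_to_id[word] for word in text.split()]
        h = E[ids].mean(axis=0)   # sentence vector = average of its word vectors
        p = sigmoid(w @ h + b)
        dz = p - label            # d(cross-entropy) / d(logit)
        dh = dz * w               # gradient w.r.t. h, taken before w is updated
        w -= lr * dz * h
        b -= lr * dz
        # h averages n word vectors, so each one receives dh / n.
        for i in ids:
            E[i] -= lr * dh / len(ids)

print(predict("great movie"), predict("awful acting"))
```

Words that only appear in positive sentences (like "great") get nudged along the direction of `w`, and negative-only words get nudged the opposite way; that movement is how the vectors end up encoding sentiment for this task.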
@tidal bough is there a ML course you would recommend?
my current class is an elective that is...poorly taught IMO. would like something with more clarity and structure.
They only represent meaning in the sense that a given word's vector is expected to be closer to vectors for words with similar meanings
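A minimal sketch of what "closer" usually means in practice: cosine similarity between vectors. The three 3-d vectors below are made up for illustration, not taken from a trained word2vec model.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 3-d "embeddings", hand-picked for illustration only.
vectors = {
    "good": np.array([0.9, 0.1, 0.0]),
    "great": np.array([0.8, 0.2, 0.1]),
    "awful": np.array([-0.7, 0.1, 0.2]),
}

print(cosine_similarity(vectors["good"], vectors["great"]))  # close to 1
print(cosine_similarity(vectors["good"], vectors["awful"]))  # negative
```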
Practice more math, it helps build a better mental model for that kind of stuff
A good way to start is to take a specific algorithm and work through it end-to-end, from the derivation of the equations to a full implementation in code. Translating the math into something executable builds intuition much faster than passively watching lectures, imo.
Hi
Guys, I just completed Python and I want to enter the data science field. I don't know what to do now.
Can Anyone help me ?
What formal education do you have and what country are you in?
Wait can we put links to GitHub repos for data science projects in this channel?
There's #1468524576479641744
Ok then noted and thanks
For the more data-science-inclined Python devs: when working with model predictions, is it a procedural thing, or do you have to work out an entirely new brand of logic to get what you want?
For example: if I'm working with a small dataset, I usually use a linear regression model: I load the dataset, clean/wrangle it, then select my features and my target, run some metric scores, and visualize. And I go about it that way almost every time. Is that the standard case? I know different datasets and features to predict require nuance and different predictive models, as it's not a one-size-fits-all scenario, so I'm just asking if data science is as procedural as making an omelette, where you know exactly what to do and the process doesn't change
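For reference, the routine described above looks roughly like this as a scikit-learn sketch. The dataset is synthetic (hypothetical `size`, `rooms`, and `price` columns) and the cleaning step is just `dropna`; a real dataset will need its own wrangling, which is the point the replies below make.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# 1. Load: a hypothetical synthetic dataset stands in for a real CSV.
rng = np.random.default_rng(0)
df = pd.DataFrame({"size": rng.uniform(50, 200, 300), "rooms": rng.integers(1, 6, 300)})
df["price"] = 3.0 * df["size"] + 10.0 * df["rooms"] + rng.normal(0, 20, 300)
df.loc[::25, "size"] = np.nan  # inject a few missing values to clean up

# 2. Clean / wrangle: here, simply drop rows with missing values.
df = df.dropna()

# 3. Select features and target.
X = df[["size", "rooms"]]
y = df["price"]

# 4. Split, fit, and score on held-out rows.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)
print(f"MAE: {mean_absolute_error(y_test, pred):.1f}, R^2: {r2_score(y_test, pred):.3f}")

# 5. Visualize: e.g. scatter y_test against pred with your plotting library of choice.
```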
Is there anyone who is looking for a dev?
Hey guys! I got very interested in coding and especially data science in the past year. I learnt python pretty decently and started learning other tools and libraries with kaggle.
I am ambitious but the path is unclear. I would be happy to get a little clarification about the best way to build out decent data science skills. Like a roadmap.
Thanks in advance
There are well described patterns/workflows for standard ML work, yes
Logically, you:
Understand the ask, and determine an approach
Identify relevant data sources, if you're working with more than one source of data
Understand the data, clean and reshape for your purposes
And determine if you need an ML model
If so, establish a baseline, and note general transformations that you may need (one-hot encoding, missing-value handling, standardization)
Test different models and feature engineering steps
See if performance and complexity are justified
Determine deployment strategy
Like you noted, these steps are not all-encompassing, and every problem has its own details
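A small sketch of the "establish a baseline" and "general transformations" steps, assuming scikit-learn and a synthetic dataset with hypothetical `sqft` and `city` columns. A mean-predicting `DummyRegressor` is the baseline and a `Ridge` model is the candidate; both sit behind the same imputation, standardization, and one-hot encoding and get the same 5-fold cross-validation, so the only difference is the model.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.dummy import DummyRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical synthetic data: one numeric and one categorical column.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "sqft": rng.uniform(500, 3000, n),
    "city": rng.choice(["A", "B", "C"], n),
})
y = 0.2 * df["sqft"] + df["city"].map({"A": 0, "B": 50, "C": 120}) + rng.normal(0, 30, n)
df.loc[rng.choice(n, 20, replace=False), "sqft"] = np.nan  # some missingness

# General transformations: impute + standardize numerics, one-hot encode categoricals.
preprocess = ColumnTransformer([
    ("num", make_pipeline(SimpleImputer(strategy="median"), StandardScaler()), ["sqft"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

baseline = Pipeline([("prep", preprocess), ("reg", DummyRegressor(strategy="mean"))])
model = Pipeline([("prep", preprocess), ("reg", Ridge())])

# cross_val_score clones each pipeline, so sharing `preprocess` is safe.
scoring = "neg_mean_absolute_error"
base_mae = -cross_val_score(baseline, df, y, cv=5, scoring=scoring).mean()
model_mae = -cross_val_score(model, df, y, cv=5, scoring=scoring).mean()
print(f"baseline MAE: {base_mae:.1f}  model MAE: {model_mae:.1f}")
```

If a model can't clearly beat the dummy baseline, that's a hint the extra complexity isn't justified yet.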
Huh, so basically if the output is looking somewhat reasonable for a complex dataset, after using multiple models, then that is probably ok?
Well I appreciate the detailed answer if anything. T for thanks
If you've tested various models and feature engineering steps, then you may determine that, with the current dataset, you've reached a performance limit
You'll make the call if performance is acceptable
I guess that is why it is data science. It's still in a discovery phase of sorts
Usually within industry you can tie e.g. dollar amounts to the events you're trying to predict to help you determine if the model is worth implementing
This may potentially violate rule 9, but I think I'd like to look into potential options for learning AI by paying for a curriculum or some sort of teaching service. - I suppose I'd say, not asking for anybody here to do it specifically, but perhaps somebody here knows of some reputable sources for such things? I think what I need is a good curriculum to follow then I could be more confident in the directions I'm taking
You can ask about suggestions for learning resources that one has to pay for, as long as you don't try to solicit an exchange of money between people on this server.
!resources data science
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
Hello can anyone look into my issue on python-help?
you could at least link it
you know exactly what to do and the process doesn't change
generally each dataset has its own quirks you have to work with, so nah
I mean as a very exaggerated example - would you treat a tabular dataset the same way you do an image dataset? I hope not
it's also seldom 'procedural' as in one-and-done, after a model is trained you inspect its predictions and if it's not good enough, go to one of the previous many steps to try and improve it
I just can't get over this concept. Easy to grasp W_(target node, source node) and easy to compute Wx+b.
But I've seen some other example online that takes into account W_(source node, target node) and then transposes the weight matrix before the dot product. Why would we need to do that? Is that a special case based on the input's dimension or something else?
if I had to guess, it just depends on how the author chose to construct the weight matrix in the first place and has no additional meaning
in the second image the weights of each "neuron" are organized in column vectors instead of rows
there's stuff about row vs column vectors which may be meaningful if you're in some advanced text, but I haven't seen it a lot outside of research papers
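A quick numpy check of why both layouts give the same output: storing the weights as (target, source) and computing `W @ x`, or storing them as (source, target) and transposing first (or multiplying a row vector, `x @ W`), is the same arithmetic. The shapes and random values are arbitrary. For reference, PyTorch's `nn.Linear` documents y = xA^T + b with its weight A of shape (out_features, in_features).

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 3, 2
x = rng.normal(size=n_in)   # one input vector
b = rng.normal(size=n_out)

# Layout 1: W[target, source], shape (n_out, n_in); each row is one neuron's weights.
W_ts = rng.normal(size=(n_out, n_in))
y1 = W_ts @ x + b

# Layout 2: the same weights stored as W[source, target], shape (n_in, n_out);
# each column is one neuron's weights, so it is transposed before the product.
W_st = W_ts.T.copy()
y2 = W_st.T @ x + b

# Equivalent: treat x as a row vector and multiply on the left.
y3 = x @ W_st + b

print(np.allclose(y1, y2), np.allclose(y1, y3))  # True True
```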
https://en.wikipedia.org/wiki/Dual_space
but yea, not worth digging into this if you're just learning grad descent and other basics
In mathematics, any vector space V has a corresponding dual vector space (or just dual space for short) consisting of all linear forms on V, together with the vector space structure of pointwise addition and scala...
I'll stick with W_(target, source). It's in accordance with matrix multiplication rules. The same approach is used in PyTorch and Keras. The other example is just verbose in my opinion: you take the transpose again; that's an example for maybe a teaching exercise or something.
I've got experience in DL, ADL. I was just wondering the "WHY" of doing it that way.
yea I agree, this is what makes sense and what is done by convention as far as I know
authors preferences, context of the text, or plain random chance somehow
This was spot on. The transpose comes from the dual space view; I'll stick with the primal view and ctrl + alt + del the fact that I ever saw the dual space way of doing neural networks lol.
@final kiln Thanks.
any time
Currently I'm doing a BCA in computer science, and I'm from India.
What are the most common data visualisation tools? Is it Power BI and Tableau?
If I were to learn one, which should I choose?
Has anyone played around with WorkClaw?
!warn @hardy fractal your message was removed for listing a job, which is not allowed.
:incoming_envelope: :ok_hand: applied warning to @hardy fractal.
Ok, I am sorry.
What should I do for that?
Coding or non coding?
Tableau, Power BI, Looker Studio are the top 3 most common, but there are others
Programming languages can make data visualizations too though
go to a hiring website
I built a "brain" map of my AI system. I just wanted to see the inner workings a bit, but it will be really helpful for debugging. Since its initial origin point, it's built and connected these nodes over roughly 2 weeks; it's organizing its thoughts. But so far so good, no hairball. It's currently integrating itself into WorkClaw; it's really fun to watch it just go. I don't even watch TV anymore.
it's pretty incredible what some of these local models can do these days
Yes, and even Excel; however, I really dislike having to choose one as an ultimatum. It's like going to a party with a random assortment of food and deciding you'll only take one item, despite no explicit instruction saying you should do so. So go ahead and have fun with both
does SMOTE work with high dimensional sparse matrices like TF-IDF matrices?
If not, then should I reduce the dimension of the TF-IDF matrix by limiting max_features, or use Truncated SVD to reduce the dimensions? What's a good rule of thumb to the max dimensions to use for SMOTE?
neither SMOTE nor SVD keeps sparse data sparse
general linear combinations aren't sparsity-preserving
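To illustrate the sparsity point: SMOTE makes a synthetic sample by interpolating between a minority sample and one of its nearest neighbours, x + gap * (neighbour - x) with gap drawn from [0, 1). The two rows below are hypothetical TF-IDF rows over a 10-word vocabulary; the synthetic row is nonzero wherever either parent is.

```python
import numpy as np

# Two hypothetical TF-IDF rows over a 10-word vocabulary; each uses only 3 words.
x = np.zeros(10)
x[[0, 3, 7]] = [0.5, 0.2, 0.8]
neighbor = np.zeros(10)
neighbor[[1, 3, 9]] = [0.4, 0.6, 0.3]

# SMOTE-style synthetic sample: a random point on the segment between x and its neighbour.
rng = np.random.default_rng(0)
gap = rng.uniform(0, 1)
synthetic = x + gap * (neighbor - x)

print(np.count_nonzero(x), np.count_nonzero(neighbor), np.count_nonzero(synthetic))  # 3 3 5
```

With a realistic vocabulary, each synthetic row carries the union of two documents' terms rather than the handful of terms a single document has.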
I'm guessing there's no oversampling method for text data that uses just ML then?
I’m not sure if this is something I should be asking here, but do you think that having written my own programs to conduct materials analysis during university and graduate school could be considered one of my strengths when it comes to job hunting?
yes
Hlo
how's health?
better
Absolutely
guys, has anyone explored the idea of tokens that only the LLM can generate, which aren't displayed to the user but are used for reasoning? (not just ordinary reasoning tokens, but tokens that aren't in the dataset or in human text and aren't mapped to real text; just unique-ID tokens)
any bioinformatics people here?
I'm thinking about doing a little programming project for my workplace where I'll make a program that can automatically detect and measure wear on a tool. My company is a tooling company, and when I've been tasked with recording tool wear for a project, it's a tedious task: taking a photo with a specialised microscope cam, positioning the tool, and then trying to get a decent enough measurement. I want to first automate identifying significant tool wear and obtaining the measurement for it. How do I go about it?
I feel like you're describing Coconut by Meta? https://arxiv.org/pdf/2412.06769
is your emphasis on hiding stuff from the user?
if so just host your own chat interface that the users must use to interact with your model
Hi Guys
I have been experimenting with sentence relevancy for the past few weeks.
So I made Scout , its an experimental attention model that slightly modifies the standard Transformer attention architecture to learn directional relevance between sentences instead of tokens. Instead of asking "are these similar?", it asks "does sentence B actually help sentence A?"
Still a small model trained on ~4,500 synthetic pairs. The deeper question that I'm trying to answer is whether attention mechanics can encode functional utility rather than just contextual compatibility. But early results are interesting.
Do check it out and tell me what you think
no, I think just tokens would push the model to be more token-efficient
then the other person linked you something relevant I think
a quick search also found this survey on latent reasoning which may or may not be of interest
Large Language Models (LLMs) have demonstrated impressive reasoning capabilities, especially when guided by explicit chain-of-thought (CoT) reasoning that verbalizes intermediate steps. While CoT improves both interpretability and accuracy, its dependence on natural language reasoning limits the model's expressive bandwidth. Latent reasoning tac...
that's something different; what I'm suggesting is adding tokens to the vocab that can only be generated by the LLM and aren't present in any dataset
okay, not only that, but also forcing the LLM to use only them for reasoning, by showing the user all tokens except these
latent reasoning as I just said is something different
can you explain how that would work exactly? if you want:
- 'reasoning' with values not in the vocabulary
- humans can't see it, the model 'reasons internally'
how is that different than the normal no reasoning mode, where magical stuff happens internally in the llm, with intermediate vectors not corresponding to any particular word in the vocabulary, and also humans can't really 'see' it
it allows saving progress between forward passes
normal non-reasoning LLMs can only "think" inside a single forward pass, which is extremely limited
so you want stateful llms? a model where some internal state is modified based on input, and said state also affects outputs?
actually before transformers stateful models were more prominent, like recurrent models, but they largely fell out of favor due to being hard to train/scale
you can check RWKV which is a novel architecture that's been devved for some years now (though to no larger scale or adoption), I think it has some close ideas
no more stateful than they currently are
elaborate?
My idea doesn't add any new state
the only state is previous tokens
and it is used anyway
actually, I wouldn't call it state, that's just part of LLM input (LLMs are auto-regressive)
and these 'tokens' are invisible to the outside?
how would these invisible tokens be meaningfully different from internal states?
internal states reset after each forward pass
one forward pass is only one token
not necessarily, but they should not be mapped to any text
any data can be mapped to text; like if you opened a png in notepad you'd see text characters
I feel like I'm just trying and failing to guess what you're actually trying to describe
can you like say exactly what you mean and how its different from existing architectures
Ok 👍
anyone got experience with GPUs and OOM errors? I'm doing a project where we have to teach a neural network to play from a baseline of data and then improve through self-play (the baseline is generated by an MCTS agent, then a PUCT agent trains on it and starts the self-play phase). The problem is that when I got to training the neural net, I set a batch size of 64 and it keeps causing an OOM error. I've only managed to train the first model on 10k games from the MCTS agent; then I generated another 10k games using the PUCT agent, but now when I go to train the 2nd neural net on all the previous data, it keeps causing an OOM error, and I'm not sure why
Hello, quick question: our lecturer told us to split our data at the very beginning and to always work on the train data. Does anyone have any idea why? What I'm used to doing is preprocessing everything and only splitting the data at the very end.
if the test data is unprocessed and we test our algorithm, say linear regression, is that a problem?
specifically, anything that needs to be 'learned' - you can only learn it from training data
this ranges from the obvious, like your linear regression model, to something not as obvious, such as StandardScaler needing to learn the mean and the std of the dataset - you can only use the training set to find the mean and std
when using StandardScaler, e.g., this should normally be applied on the whole dataset, no? Like to scale everything to a specific range?
it's in 2 steps
you should scaler.fit() only on the training set, but then you can do scaler.transform() on whatever
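so e.g. you may do
```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

train_X, test_X = train_test_split(X)
scaler = StandardScaler()
train_X = scaler.fit_transform(train_X)  # learn mean/std from the training rows only
test_X = scaler.transform(test_X)        # reuse the training mean/std on the test rows
...  # more processing
```
however you cannot do
```python
scaler = StandardScaler()
X = scaler.fit_transform(X)  # the mean/std now include the test rows
train_X, test_X = train_test_split(X)
```
because your scaler had the information from the test set to learn a better mean and std from, your model will seem better than it is on true, unseen data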
I recommend using an sklearn Pipeline if you have a lot of steps
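A minimal sketch of that, assuming scikit-learn's `make_pipeline` and synthetic data in place of the California housing features: the pipeline fits the scaler inside `.fit()`, so it only ever sees the training rows, and `.score()` reuses those training statistics on the test rows.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data standing in for the housing features.
rng = np.random.default_rng(0)
X = rng.normal(loc=50, scale=10, size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_train, y_train)
print(f"test R^2: {pipe.score(X_test, y_test):.3f}")

# The scaler's learned mean comes from the training split alone.
scaler = pipe.named_steps["standardscaler"]
assert np.allclose(scaler.mean_, X_train.mean(axis=0))
```

Passing `pipe` to `cross_val_score` works the same way: the scaler is refit inside each training fold.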
oh ok, I see
also the above extends to manual processing as well
e.g. for some unsplit numerical dataset arr you cannot do:
```python
arr_standardized = (arr - arr.mean()) / arr.std()  # mean/std computed over every row, test rows included
...  # split later
```
because again the `.mean()` and `.std()` used information from what should be the test set
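The leak-free version of that manual standardization, as a small numpy sketch (the data and the 80/20 split are hypothetical): split first, compute the mean and std on the training part, then apply those same two numbers to both parts.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.normal(loc=100, scale=15, size=1000)  # hypothetical unsplit numeric feature

# Split first: a simple random 80/20 split via shuffled indices.
idx = rng.permutation(len(arr))
train, test = arr[idx[:800]], arr[idx[800:]]

# Learn the statistics from the training part only...
mu, sigma = train.mean(), train.std()

# ...then apply those same numbers to both parts.
train_std = (train - mu) / sigma
test_std = (test - mu) / sigma
```

The test part will typically not have a mean of exactly 0 or a std of exactly 1 afterwards, and that's expected: it's being judged with the training set's yardstick.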
it's the California housing prices dataset; it's not really a big dataset with lots of steps, more a dataset to learn the fundamentals
yep I see, I will read a bit on what you mentioned and come back, but I understand the gist of it: we want the test data to be truly unseen
yeah
again, anything that requires stuff to be learned from data, you must do after splitting
there are other operations that don't, in which case you can do it whenever
for example, it's often you train the model on the log of house prices (then exp back the model predictions, for the actual predictions), because usually house prices are very skewed and taking the log can make it fall into a nicer distribution
since log doesn't need to know anything about the dataset, you can do y = log(y) before splitting it into train_y and test_y no problem
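A sketch of the log-target trick with scikit-learn, assuming synthetic right-skewed prices: option 1 does the log/exp by hand, option 2 uses `TransformedTargetRegressor`, which applies `func` to y when fitting and `inverse_func` to the predictions. Both should produce the same predictions.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical right-skewed "prices": log(price) is linear in the feature.
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(300, 1))
y = np.exp(10 + 0.4 * X[:, 0] + rng.normal(scale=0.2, size=300))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Option 1: by hand. np.log uses no statistics of the data, so taking the log
# before or after the split gives exactly the same numbers.
model = LinearRegression().fit(X_train, np.log(y_train))
pred_price = np.exp(model.predict(X_test))  # undo the log to get prices back

# Option 2: let scikit-learn apply log on fit and exp on predict.
wrapped = TransformedTargetRegressor(
    regressor=LinearRegression(), func=np.log, inverse_func=np.exp
)
wrapped.fit(X_train, y_train)
print(np.allclose(wrapped.predict(X_test), pred_price))
```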
oh ok
Will come back; I will dig a bit into what you mentioned and read up on it first. Thanks for the insights
yeah I have a better understanding now, thanks !
Can I share an idea I had about an AI system I was building, and ask how sci-fi this is or whether it could work?
Anyone here?
no need to wait for permission.
I just want to be courteous to the topic at hand they are having and not spam.
A long while ago, when I was doing research, I had this idea, though I don't know how practical it would be to implement. Parallel Forest: A Mesh Network-Enabled Model for Diverse Task Management
Overview of Model Architecture
The model utilizes a unique architecture that combines forest clusters with a mesh network to efficiently handle a diverse set of tasks. Each cluster, resembling a revolver drum, contains dense trees arranged in a matrix formation. This matrix arrangement enables parallel processing and efficient computation within each cluster, allowing for simultaneous data processing and pattern recognition.
In addition, the (random) forests are interconnected through a mesh network, which provides redundancy and fault tolerance. This interconnected nature ensures that the model can continue functioning even if one forest goes down, as data can be rerouted and processed through alternative paths, maintaining the overall functionality of the model. This scalability allows the random forest to expand based on the tasks given.
By leveraging the matrix formation for parallel processing and the mesh network for fault tolerance and redundancy, the model aims to achieve robustness, efficiency, and continuous operation across a wide range of tasks. This is just a general overview...
can you give an example of something a model of this architecture could be trained to do?
also why would one forest go down?
I have more of this, and I did work on some code: I started out with a basic binary classification problem for valid and invalid words, for which I was training an NN, though it also had decision trees. When I say go down, I mean a power outage or a problem with a node. If I remember right, I simply wanted a system using a mesh network for certain ML tasks; for example, if one node is down, it switches to another. Sorry, this was A LONG while ago.
I don't have an example, as this was a roadmap for me to start on, but I never got properly started on it. The idea is basically multi-core processing: one cluster would handle just math, so when math is needed, that cluster would be activated FOR that task; it would then revolve again for, say, language processing. Sort of like an LLM with a multi-hat prompt; however, each node does ONE task only. The master controller then manages which node is used, acting as input and output. I have more of this, but I'm afraid of spamming.
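The routing and failover part of this idea can at least be sketched in plain Python. Everything here is hypothetical (the `Node` and `Controller` names and the toy task functions), and there's no forest or real mesh network involved: a controller keeps a list of nodes per task and, if one node is down, falls through to the next replica for that task.

```python
from typing import Callable, Dict, List

class Node:
    """A hypothetical worker that handles one kind of task only."""

    def __init__(self, name: str, task: str, fn: Callable[[str], str]):
        self.name, self.task, self.fn = name, task, fn
        self.up = True  # set to False to simulate an outage

    def handle(self, payload: str) -> str:
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return self.fn(payload)

class Controller:
    """A hypothetical master controller: routes each request to a node for its
    task and falls back to the next replica if a node fails."""

    def __init__(self, nodes: List[Node]):
        self.by_task: Dict[str, List[Node]] = {}
        for node in nodes:
            self.by_task.setdefault(node.task, []).append(node)

    def route(self, task: str, payload: str) -> str:
        for node in self.by_task.get(task, []):
            try:
                return node.handle(payload)
            except ConnectionError:
                continue  # try the next replica for this task
        raise RuntimeError(f"no live node for task {task!r}")

def add(expr: str) -> str:
    """Toy 'math' task: sums an expression like '2+3'."""
    return str(sum(int(t) for t in expr.split("+")))

math1 = Node("math-1", "math", add)
math2 = Node("math-2", "math", add)
text1 = Node("text-1", "upper", str.upper)
controller = Controller([math1, math2, text1])

math1.up = False                         # simulate math-1 going down
print(controller.route("math", "2+3"))   # 5, served by math-2
print(controller.route("upper", "dog"))  # DOG
```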
I unfortunately have no math and no code at the moment that I can find. However here is more info on this.
Decision Tree Details
The dense trees arranged in a matrix formation within each forest would appear as a grid or array of interconnected nodes, representing the individual decision trees. Each row in the matrix could correspond to a specific feature or attribute, while each column could represent a different split or decision point within the tree. The nodes within the matrix would be interconnected to facilitate parallel processing and information exchange, allowing for efficient computation and pattern recognition within each cluster. This matrix arrangement enables simultaneous data processing and collaborative learning among the dense trees, contributing to the model's overall performance and robustness.
Just for the decision tree.
Basically, if I remember right, LLMs have a hard time with math since they are mostly language models. Suppose you want a math problem solved: the language node and the math node would talk and be able to get you the solver for the problem at hand. I used a matrix because it's the best for math processing. At the time I was working on a chip that was called MALU (matrix arithmetic logic unit); I thought I came up with it, but apparently this has existed for some time now, since the 70s or earlier.
sorry not sure what is going on and double ttt textt, my keyboard being wireless is having problems just a sec.
Hi!
So I am new to Python. My teacher told us that the second chance at the exam is in May/June. I suddenly got a notification, and now our exam is on Friday. The exam is a multiple-choice exam: you pick which code is right. Some of them you need to fill in the blanks, and some are math Python stuff. I have had a rough half year and I had to move across the country, and I really don't need a good grade. I just need to pass. He said everything is allowed except talking to people and AI. Do you guys have any tips for me? It would really help me; I need all the tips I can get, even the basic ones. I am not good at coding, and our teacher has been very absent. The whole class is failing and we know almost nothing.
had to move across*
Eden, do you have a question we can help you with?
Not really a question, I am sorry! More like, do you know any websites that could help me during the exam? Something like that. He said we can use the internet but not AI
Right, to help you with what exactly? I am not sure what will be on your exam; do you have notes telling you what you should study?
I have Python for beginners. Not anything difficult. The exam is hard because it's full of very long code, and we need to either fill in blanks or pick between code lines. Give me 1 minute and I will show you how far we have gotten in Python
Ok. Otherwise this reads as if... "my teacher told me to do something with the room and he wants it to look a certain way, but I am not sure what to do about the room; we have been doing stuff in the room, but he said we cannot talk to people or use AI to do something with the room"...
Oh ok! Well I am sorry, never mind just forget it
I am not discouraging you; I am trying to ask you to ask me or us about specifics. What was your topic about?
Oh good, now I am the bad guy...
@glacial socket Still there?
Check the syllabus / exam guide if you've been given one and look up the topics.
If you like books, Automate the Boring Stuff is decent and free. If you like videos, I think FreeCodeCamp has a Python intro.
Since you're asking in the data science channel, let us know if your exam is specifically about data science content
Maybe I am just bad with people. Thanks Twiibz
That said, is anyone interested in chiming in on what I wrote regarding my roadmap for making this system? For example, I can see maybe doing small nodes for certain tests. Perhaps making a binary decision tree but breaking up how it is accessed, with a master controller managing it. Perhaps have a node that just stores words, and then have a binary decision tree tell you whether said words are valid or not, if you ask, for example, whether ABC or bob is valid.
@serene scaffold Any input on this topic?
@serene scaffold Do you have no opinion on this? Should I move this to some other place for talks? I am open to any ideas or questions.
I'm working right now and can't get into the level of depth that your topic requires
this is the best channel on this server for your topic.
@serene scaffold Ah ok my apology
Does not look like they are active there. Maybe I will comeback this evening.
That server sucks absolute shit
Instant banned for starting a talk in General chat
Redditor level mods
What the hell?
I was having a conversation about my idea I had while back and I got banned for it, never told why until I had to ask.
You were pasting LLM Text
Clear as day
LLM text? you mean REGULAR text? what is LLM text?
Just don't talk to me you just got me banned for even being near you
I never used GPT to think out my thoughts. Yes, I was pasting a large block of text, BUT I ASKED if I could.
Unlike you I actually care, build and have had a passion for it.
Holy hell, how childish.
I was A) asking if this was something possible and B) having a conversation about it. What did I have to gain by using GPT for this?
Whatever. That was on you. Don't copy and paste ChatGPT shit.
I got banned in Affiliation
BS it was a wall of text now don't argue no more
YES it was a wall of text, I ASKED IF I CAN DO THIS. You said sure.
I could have done it line by line why would I though?
I was getting my thoughts out. What a weird reaction.
I said SURE cause I just was looking for conversation in something I'm passionate about, I was literally about to tell you to stop the LLM Spam.
It was already getting on my Nerves just type it out next time.
Ok, lets do this, what part of it was LLM specifically.
The idea of multi core system for specific tasks?
How about we pick up where we left off? It was about KGs (knowledge graphs).
Ok tell me more about KGs
All right, I won't do that, though I was not trying to spam. I actually asked if this was ok, and it seemed like it was.
I did build something, though I no longer have the code. I started out, as I was saying, using a simple binary decision tree classifier, if I remember right, for valid and not-valid words. I was training it to look at input like dgo and dog and tell me whether each was valid or not. I was going to try to make it part of my multi-core system, but I abandoned the idea.
Just a reminder: put yourself in their shoes. If you were running a server about AI, you'd quickly be flooded with bots. There's no tolerance for that shit in AI communities, and no second chances in servers like that when you first join.
Fine, I am no longer interested in ever being near that server. It was suggested to me for deeper talks about this.
That was all, since the person I was having a conversation with did not have the time for it.
I kept spamming text to get my full thought out. I could have typed it out line by line, but that would have been really strange. The mod who handled this could have reached out and talked to me, but nope.
Fine by me. Never using that server.
Build it again. Trust me, you'd keep building it over and over for years, realizing how hard it is. The main issue is stability at scale when we're talking KGs.
RIGHT, that is why I stopped, as it would require a lot of work and processing. That said, it would be interesting to implement it on a smaller scale, like using simple binary classification, with nodes communicating in a mesh network, each doing specific tasks. I did not say it was a good idea; I was curious how practical it was, and I even asked this.
I can't tell now if anyone was even reading anything I posted in between.
If I can do it you probably can
Like I said, if you want to build it you're gonna need to learn a lot more about "experimental" equations for AI.
If you're not good at math it's going to be a struggle.
Tell me more about the system you built.
It’s a governed hybrid system focused on determinism and bounded reasoning at scale.
Telling any more than that would be revealing.
Yes, I did not know you could make a hybrid system; that is why I wanted to talk about it. What did yours solve when you built it?
Sorry, I'm having a conversation with the mod right now and they are REALLY weird. I honestly cannot tell if I am talking to an AI.
The one that banned us..
I think that's as much as I'm going to share, so don't expect much more, 'cause anything else would be systems-level revealing.
Are the nodes hardware, or did you make them in software? I was thinking the cluster would be servers, more or less each doing a certain task, if I were to scale this.
I mostly started with emergent behavior systems.
I mean you can network it.
Okay enough is enough 🤷♂️
And if you wanted to see a visualization of it (limited, though, 'cause it'd crash)
That is EXACTLY how I saw it in my head, though I also viewed it from the top.
To give you an idea, what I had was similar to this kind of processing: https://www.parallax.com/propeller-multicore-concept/
honestly that is incredible
Thanks
Man, I'm telling you, the mod that banned us is REALLY odd. I feel like they are an AI. We were talking about how I knew a lot of the stuff I talked about, and they said "we like cs50p". WHAT?
It's some Harvard course, but it's not relevant to the topic...
Doesn't matter to me anyway, better if I'm not on there.
Yeah dude, I don't know, really odd. Anyway, stick around, I do want to talk more about it. Let me ask this: was the terminology in my "spam" wrong? Was there something I said the wrong way?
Let me be real with you here: I'm done talking about KG stuff. I don't want to say something revealing.
Ah ok, your own private project then?
That's as much as I'll say aloud publicly. Yeah
KGs are a hot topic right now
If I truly had something stable at scale and everything I said it was, I'm sitting on gold.
If you're interested in KGs then start now
If you just want a hobby project just take it easy
All right, anyway sorry for getting you banned but that was really odd.
@half pulsar Hey I found this, this seems like what you were talking about https://pmc.ncbi.nlm.nih.gov/articles/PMC11316662/?
hello, quick question, when we normalize our independent variables, say for a linear regression algorithm, do we need to normalize the dependent variable also, that is the target variable?
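For plain least-squares linear regression this can be checked numerically: standardizing the target and undoing the scaling on the predictions gives the same predictions as fitting the raw target, so it is optional there. It mainly matters for gradient-based training, where the target's scale affects loss and gradient magnitudes. A minimal NumPy sketch on made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data: 3 features on very different scales and a target in the thousands.
X = rng.normal(size=(200, 3)) * np.array([1.0, 50.0, 0.01])
y = X @ np.array([2.0, 0.3, 400.0]) + 5000.0 + rng.normal(scale=10.0, size=200)


def fit_ols(X, y):
    """Least-squares fit with an intercept column; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef


# Standardize the features (statistics from the training data only).
x_mu, x_sd = X.mean(axis=0), X.std(axis=0)
Xs = (X - x_mu) / x_sd

# Option 1: raw target.
pred_raw = fit_ols(Xs, y)(Xs)

# Option 2: standardized target, then undo the scaling on the predictions.
y_mu, y_sd = y.mean(), y.std()
pred_scaled = fit_ols(Xs, (y - y_mu) / y_sd)(Xs) * y_sd + y_mu

print(np.allclose(pred_raw, pred_scaled))  # True: OLS predictions do not change
```

If you do scale the target, remember to invert the transform before reporting errors like MSE in the original units.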
@thin sky Can we have a conversation? I am really confused about what you do not get. I am actually really curious about this. Perhaps a call?
What did you make? I made a KG too.
Yeah, I know nothing about KGs; I just got introduced to them.
I presented my idea, and it turns out the concept is similar to what I had in mind. I have not built anything yet. Long road there.
Now scale it to 500 million nodes. In Seconds. No LLM.
OK this may sound dumb but are you guys interested in having this compressed into smaller nodes?
Could one use a matrix for this? I am trying to remember this again.
I'm using Postgres + pgvector with a typed graph schema. The data model already maps to what enterprise KG systems use, but the main thing I haven't done yet is flip Postgres from mirror to primary store.
@half pulsar Is there no way to do this?
# Compress ONE text into a 2x2 matrix, then reconstruct it
text = "HELLO"
# Split & store in the matrix (list of lists of dicts); 5 chars in 4 cells, so the last cell holds "LO"
matrix = [
    [{"char": "H", "pos": 0}, {"char": "E", "pos": 1}],
    [{"char": "L", "pos": 2}, {"char": "LO", "pos": 3}],
]
# RECONSTRUCT (concatenate the cells row by row)
original = "".join(cell["char"] for row in matrix for cell in row)
print(f"Original: '{original}'")  # Original: 'HELLO'
But on a larger scale and with nodes?
I am thinking of a node in hardware terms here, as a sort of second brain, like that one chip.
Pretty common stack
Sorry, can't a matrix be used for storing data and then concatenating it back out? If so, can't you use that, maybe not for the node itself, but for the data it holds?
It wouldn't hold up in scale.
Would that be just too much to compute?
Not only just that I wouldn't expect it to even be stable.
Why?
I thought matrices are designed for fast processing, specifically of numerical data, no?
There is a lot more to it than just trying to store information in a huge "matrix".
And why would you store something useless like Hello?
The hello was an example
Get better examples, it'd help you build a better structure, you need a goal.
I also thought it's breaking it up so that it's not processing everything at once, but in pieces.
Though I guess overhead would be the problem when trying to concatenate it back out. Is that why it would not scale?
Yeah overhead would be the biggest problem there.
Yeah, I remember trying to do something with compression, and even compressing all the indices you need to keep track of what you compressed was a problem.
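One way to see the index-overhead point: if text is packed into a fixed-width 2D array, each character's position is implied by its row and column, so no per-cell "pos" field has to be stored; only the original length is kept so the padding can be stripped. A minimal NumPy sketch (function names are hypothetical, and this is packing, not compression):

```python
import numpy as np


def pack(text, width):
    """Pack text into a 2D uint8 array of the given row width, zero-padded."""
    data = np.frombuffer(text.encode("utf-8"), dtype=np.uint8)
    rows = -(-len(data) // width)  # ceiling division
    grid = np.zeros(rows * width, dtype=np.uint8)
    grid[: len(data)] = data
    return grid.reshape(rows, width), len(data)


def unpack(grid, length):
    """Flatten row by row and drop the padding; position = row * width + col."""
    return grid.reshape(-1)[:length].tobytes().decode("utf-8")


grid, n = pack("HELLO", width=2)
print(grid.shape)       # (3, 2)
print(unpack(grid, n))  # HELLO
```

The only bookkeeping left is one integer (the length), instead of one index per cell.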
This is partly why I left the project; I simply did not have the time to learn the exact math for it.
You need an algorithm.
That is true but what algo would be good to work with that?
I'm not telling you that XD
So need to make a custom one then?
Pretty much
Interesting...
Think in layers here
Yeah, but for a matrix, would it not be x and y and z? You would basically need to make a cube?
If you layer it, that is what it would become yes?
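Stacking 2D matrices as layers does give a "cube": in NumPy terms it is a 3D array (often called a tensor) indexed by (layer, row, column). A small sketch with made-up layers:

```python
import numpy as np

# Three hypothetical 2x2 "layers" (e.g., the same grid of nodes at three levels).
layer0 = np.array([[1, 2], [3, 4]])
layer1 = layer0 * 10
layer2 = layer0 * 100

cube = np.stack([layer0, layer1, layer2])  # shape (layers, rows, cols)
print(cube.shape)     # (3, 2, 2)
print(cube[1, 0, 1])  # layer 1, row 0, col 1 -> 20
print(cube[:, 1, 0])  # one cell through every layer -> 3, 30, 300
```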
These are questions you can find answers for online, there's a lot of research papers on that type of stuff.
I have read them but a long while ago, I am just remembering some stuff I read a long time ago.
Yeah this stuff has been around since the beginning
Layering would be an algo you would have to use.
I am just thinking out loud... it would be best if I was in a voice chat group about this.
I have no experience with this or not much of it, but I am just thinking this out visually what it would look like and what you could possibly use for it.
The problem is that you're overthinking the wrong thing. It's not as simple as "how do I store information"; it's more "how can I retain that information at scale, with stability, and do it efficiently". It takes many "layers" to get to that point, and even then you'd hit a wall of complexity creep, every time.
It has to be thought of as layers and systems. It's pure math here. If you need to, I highly recommend refreshing your math from the basics. It's about repetition: once you know the basics of math, you can build up to actually implementing it in structured code.
Nobody is good at math, You need to practice regularly.
Ok, I am reading this again as it's been a while, so humor me: I was using an arithmetic logic unit for this, to be implemented in hardware form, something like: "The core of the MAC is the Matrix Arithmetic Logic Unit (MALU). The overall functionality of the MALU is to perform the matrix operations and write the output to memory." Could one not do something like this same idea for nodes?
Once again, overhead. Tell me clearly, what do you want to do here?
What I have saved was this link: https://www.ece.ualberta.ca/~elliott/ee552/projects/1998f/matrix_calculator/MALU.htm Now I get that this is NOT quite the same thing, but the concept seems like you could do it... I mean, high-level abstract computation all comes down to binary anyway...
Like give me a demo of what your project would do. In text
The link I gave you is what I made a while back, but using Verilog (I think that is the name), implementing a chip for a 16-bit system.
I wanted this because an ALU would be much faster for numbers specifically. I then wondered if one could use this for higher levels. Mostly I wanted to see if one can fit ML into a 16-bit system and what it can do.
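Fitting ML arithmetic into a 16-bit system usually means fixed-point numbers. A minimal sketch, assuming a Q8.8 format (an int16 holding 8 integer bits and 8 fractional bits) and a wider accumulator for the multiply-accumulate step; the format and function names are illustrative choices, not the MALU design from the link:

```python
import numpy as np

FRAC_BITS = 8  # Q8.8: value = int16 / 2**8, range roughly [-128, 128)


def to_q88(x):
    """Quantize floats to Q8.8 int16, saturating at the int16 limits."""
    q = np.round(np.asarray(x, dtype=np.float64) * (1 << FRAC_BITS))
    return np.clip(q, -32768, 32767).astype(np.int16)


def q88_matmul(a, b):
    """Multiply two Q8.8 matrices using a wide accumulator, then rescale."""
    acc = a.astype(np.int64) @ b.astype(np.int64)  # products carry 16 fractional bits
    out = acc >> FRAC_BITS                          # back to 8 fractional bits (floor)
    return np.clip(out, -32768, 32767).astype(np.int16)  # saturate to 16 bits


def from_q88(q):
    return q.astype(np.float64) / (1 << FRAC_BITS)


w = np.array([[0.5, -1.25], [2.0, 0.75]])
x = np.array([[1.5], [-0.5]])
y = from_q88(q88_matmul(to_q88(w), to_q88(x)))
print(y.ravel())        # exact here: 1.375, 2.625
print((w @ x).ravel())  # float reference
```

These inputs happen to be exactly representable in Q8.8; in general each value picks up a rounding error of up to 1/512, which is the price of fitting into 16 bits.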
I never did finish the 16-bit computer, as I got distracted thinking about this... then AI... My mind loves to move fast, a lot.
I'm so confused right now.
So you're building a 16 bit computer?
Yeah sorry topic jump, its my history why I am so hung up on matrix.
That makes more sense now. Personally, I don't see the value in proceeding with that.
I was building it, I have a lot of ideas like these as I am expressing and I get half way through them and then leave them....
Like if you want to build AI you don't need to build a 16 bit computer
We're talking about two different things here, KGs and 16-bit computing. It's got me confused.
No, but I do want to build one, since I would know exactly what it's doing. I would then try to fit a FORM of LM into this system to see how strained it could get. Perhaps it would then need more RAM, but I'm curious what sort of algo you would use to compress it further down.
Think of it this way: you can implement a KG ON a computer, yes, clearly?
How few bits can we use and still fit it?
Maybe even just one node.
I don't see the benefit
Think of the moon lander...
They had to make do with very little, yet they were able to get A LOT done. Now, what sort of algo did they have to use? It's impressive. Same idea, though we are NOT landing on the moon.
Giving me 80s/90s vibes, NOT a good thing.
I mean, this is what Nvidia is doing: they are able to get large processing tasks done on a chip.
You could implement your KGs much better on their chips, I bet.
I am sort of thinking along those lines, though I thought a matrix would be involved.
Anyway, I digress....
My main question is, HOW small can you make the nodes, while still doing what you need, so they fit on embedded systems?
Why not? A node does not have to be with KGs..it could be some other processing node for say classification problem yes?
Even me telling you my project does hundreds of nodes in seconds is revealing
Hundreds of nodes in seconds, on how large a system? How much RAM do you need?
You don't have to answer that, but suppose you do it with some other simple problem
96 GB is a lot of RAM, though the context here is how much processing you are doing with the data, so it must be impressive. Now assume you try to do this with even less RAM. Do you think you could?
Answering that would tell you the magic behind the curtain
Is that the active project then you are working on?
Ah see, we think alike. I am just approaching it from another way. Clearly you cannot out-engineer complexity at scale; eventually you do need to add more RAM.
So I guess I am not crazy or stupid in this yeah?
Can you tell me what task you are trying to solve with AI?
Ok let me ask this, can you access your nodes procedurally?
Meaning, only within a certain task is it given access to said nodes, and when it's done, it does not proceed with the data until it needs more?
Not sure if I made that right.
No... hold on, let me think on this...
hi
You have a node, and its doing some data or needs to access some data yes?
Ok wait, could you build nodes procedurally instead of by hand? if you know what I mean...
I feel like that is something you would want....
If I need more tasks done I would extend nodes but I have an algo that just builds them as needed yes?
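Building nodes procedurally "as needed" can be sketched as lazy creation: a registry that only constructs a node the first time a task asks for it, using a factory function instead of hand-written nodes. Everything here (class names, keys, the factory) is hypothetical and says nothing about anyone's actual architecture:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.data = []  # whatever this node is responsible for


class NodeRegistry:
    """Creates nodes on first access via a factory; reuses them afterwards."""

    def __init__(self, factory=Node):
        self.factory = factory
        self.nodes = {}
        self.created = 0

    def get(self, key):
        node = self.nodes.get(key)
        if node is None:
            node = self.factory(key)  # built only when a task needs it
            self.nodes[key] = node
            self.created += 1
        return node


reg = NodeRegistry()
reg.get("classify:words").data.append("dog")
reg.get("classify:words").data.append("bob")  # same node, not rebuilt
reg.get("classify:numbers")
print(reg.created)                     # 2
print(reg.get("classify:words").data)  # ['dog', 'bob']
```

The overhead concern from the thread shows up directly: the first access to a key pays the construction cost, so if that cost is high you would want to predict upcoming keys and build those nodes ahead of time.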
I'm not going to describe the architecture
Sure, you don't have to, but you get the idea of what I am saying. Am I on the right track in thinking that, generally?
I am not looking at your project or needing some info from yours, I am just reasoning all of this out, I am not even doing any math about it lol
The problem is the overhead of having to build these nodes as needed; the processing would be hard, so I guess you would have to control rendering for said task. I could think of using a piecewise function for this that predicts future use needs, or some algo along those lines.
Like this is as much as I'll let myself say aloud, You just gotta take it and make what you want from it
This seems strange that I am actually that close to your project surely this all seems BS to you?
No, you're way off-track, but that's fine. Don't try to do what I did; it's up to you to build what's right.
Oh ok good
Don't try to follow others, don't poison yourself. Just like I said, the answers become clear when you know the math.
Are you afraid that if you correct me, it will leak how you are thinking, so I might start developing along your lines?
I'm not going to tell you what ways or any other pointers, except for scale and stability. watch for complexity creep.
That's the only thing you should focus on after learning the math behind it
No, because everything I've said is just so vague and high level and you can make any thing out of it.
Let me ask this: can you make procedural nodes? Suppose I do not care about processing for now.
Were you using some other architecture besides KGs. Meaning did you start the project with that in mind or was there some other you made before but did not work so you rebuilt it?
Anyway, I am actually not going to pursue this, its interesting to think about. I am just going to do some more python programming for whatever ideas I have, like procedural nodes that interests me now.
This is actually how I got my project here done, I was thinking this much and just implemented it, it worked but not sure how efficient it was, though I am not really good at programming.
This is something I've been working on since 2014. I don't expect people to try and pursue it.
is it possible to convert chemical structures and their information into vectors for machine learning
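Yes. The usual route is a cheminformatics toolkit; for example, RDKit can turn a SMILES string into a Morgan (circular) fingerprint bit vector. To keep this sketch dependency-free it uses a much cruder stand-in that hashes character n-grams of a SMILES string into a fixed-length count vector. It ignores real chemistry and only illustrates the "structure in, fixed-size vector out" idea; the function name is hypothetical:

```python
import zlib

import numpy as np


def smiles_ngram_vector(smiles, n=3, dim=64):
    """Toy featurizer: hash overlapping character n-grams into `dim` buckets."""
    vec = np.zeros(dim, dtype=np.float32)
    padded = f"^{smiles}$"  # mark start/end so short strings still produce n-grams
    for i in range(len(padded) - n + 1):
        gram = padded[i : i + n].encode("utf-8")
        vec[zlib.crc32(gram) % dim] += 1.0  # crc32 is stable across runs, unlike hash()
    return vec


ethanol = smiles_ngram_vector("CCO")
benzene = smiles_ngram_vector("c1ccccc1")
print(ethanol.shape, ethanol.sum())  # (64,) 3.0
```

Every molecule maps to a vector of the same length, so the results can be stacked into a matrix and fed to any standard model.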
With regards to this, Anthropic has followed suit and removed safety from its mission statements as well, and the current chatter is that it's due to Anthropic's upper hierarchy frowning at the usage of their AI in military operations. Of course they use Grok and OpenAI's ChatGPT, but it seems Claude is a cut above the rest. And with Pete Hegseth leaning towards labelling Anthropic a "supply chain risk", I guess it was bound to happen.
https://www.linkedin.com/news/story/anthropic-shifts-stance-on-ai-safety-7047916
It's a sad day, all integrity out the window with corporate America and the Trump admin.
The military world demand this regardless of who's the President.
Like a parade?
I get it, the USA and CHINA are in a new space race: AI. Everything the admin does is political. But it's not presidential, the way it's going down.
Theres no normalizing this dude.
anyways! im done
It's just the nature of the Pentagon. I'm not saying I agree with this.
Consider nukes, for instance. There's no guarantee that they will not use them at some point, and any external control over their ability to use them would be unthinkable to them.
This whole topic is f****d
Currently 21/400 ARC training tasks solved with unified approach! https://github.com/Julien-Livet/aicpp
I'm running a flow for my chess model where I let codex: 1. validate lichess datasets (using python-chess) 2. upload to hf 3. start a runpod to train the model, 4. get it back 5. start and run the model with a bot account to compete against other bots on lichess. (all this done by codex running commands, scripts, and following my specs)
Anyone interested in giving me some pointers or talking about this?
The model I'm training is a phase-aware (early game, mid game, late game) LSTM next-move model, trained on elite game PGNs (from lichess), capped at 4 random moves in the game.
guys i need help with something
I am building NER for the NCBI Disease corpus. My text abstracts, i.e. the input sequences, are variable in length. I plan to use an LSTM for this task initially, and I am using TensorFlow as the framework. The problem is:
How do I handle the variable-length input sequences?
You can pad the variable-length sequences to the max sequence length?
ok i can give this a shot
Also, I could be wrong, but you can ignore the added info (padding) during training.
ok, how do i do that? isn't that a feature available in PyTorch?
another thing, i am having issues tokenizing the sequences. i tokenized the labels, but for the input i still have no idea what the right approach is
I don't have any good resources for that but when I train something like convlstm or unet on images
Sometimes satellites don't capture data, so it gives fill values or NaNs.
I remember there is a way to ignore these fill values during training.
I will share it as I find it
criterion = torch.nn.CrossEntropyLoss(ignore_index=255)
Something like this
yeah i researched it a bit, apparently you pad the sequences with <PAD> and then mask them so they don't affect the training
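The pad-then-mask approach, sketched framework-agnostically in NumPy: build a word vocabulary with reserved ids for `<PAD>` and `<UNK>`, convert each abstract to ids, pad to the longest sequence, and use a mask so padded positions add nothing to the loss. In Keras the same idea is usually expressed with `Embedding(..., mask_zero=True)`, which passes the mask on to downstream layers such as `LSTM`. The whitespace tokenizer and function names below are assumptions for illustration:

```python
import numpy as np

PAD, UNK = "<PAD>", "<UNK>"


def build_vocab(sentences):
    """Map each token to an id; 0 is reserved for padding, 1 for unknown words."""
    vocab = {PAD: 0, UNK: 1}
    for sent in sentences:
        for tok in sent.split():  # assumption: whitespace tokenization
            vocab.setdefault(tok.lower(), len(vocab))
    return vocab


def encode_and_pad(sentences, vocab):
    """Return an (n, max_len) id matrix and a boolean mask of real tokens."""
    ids = [[vocab.get(t.lower(), vocab[UNK]) for t in s.split()] for s in sentences]
    max_len = max(len(x) for x in ids)
    batch = np.zeros((len(ids), max_len), dtype=np.int64)  # pre-filled with PAD id 0
    for i, seq in enumerate(ids):
        batch[i, : len(seq)] = seq
    return batch, batch != vocab[PAD]


def masked_mean_loss(per_token_loss, mask):
    """Average a per-token loss over real tokens only (padding contributes 0)."""
    return float((per_token_loss * mask).sum() / mask.sum())


train = ["BRCA1 mutations cause breast cancer", "Cystic fibrosis"]
vocab = build_vocab(train)
X, mask = encode_and_pad(train, vocab)
print(X.shape, mask.sum())  # (2, 5) 7
```

The NER tag sequences get padded the same way, with a dedicated label id that the loss skips, which is what `ignore_index` does in PyTorch's `CrossEntropyLoss`.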
Great
there is another thing i worked on recently, but it was never cleared up. let's say i train an RNN model on patient EEG sessions. in testing i pass variable-length input, like patient 1 with 5 sessions, patient 2 with 20 sessions, and so on. based on that, how do you predict the rest of the sequence?
What do you think is the best way to track these types of stuff?
I tried doing something like cup tracking, but the ResNet18 model I tried using to track the coordinates of every cup in all frames is having trouble with the occluded samples (returning coordinates of nothing because it can't see the 2 hidden cups). Because of this, the network I trained to connect the coordinates per frame is making mistakes.
ResNet18 might be lacking in resolution, but I can't use a heavier model, as this is meant to run in real time.
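One common mitigation for occlusion is to not trust the detector blindly: keep a motion model per cup and, on frames where the detector returns nothing, predict the position from the previous velocity instead. A minimal constant-velocity sketch (a Kalman filter is the more principled version of the same idea); the function name and the use of `None` for "not detected" are assumptions:

```python
def track(detections):
    """Fill missing detections (None) by constant-velocity extrapolation.

    detections: list of (x, y) tuples or None, one per frame, for one object.
    Returns one (x, y) estimate per frame (None until the first detection).
    """
    estimates = []
    pos = vel = None
    for det in detections:
        if det is not None:
            if pos is not None:
                vel = (det[0] - pos[0], det[1] - pos[1])  # velocity from last estimate
            pos = det
        elif pos is not None and vel is not None:
            pos = (pos[0] + vel[0], pos[1] + vel[1])  # occluded: coast on last velocity
        # otherwise: hidden with no velocity yet, so keep the last position
        estimates.append(pos)
    return estimates


# A cup moving right by 2 px per frame, hidden on frames 2 and 3.
print(track([(0, 5), (2, 5), None, None, (8, 5)]))
# [(0, 5), (2, 5), (4, 5), (6, 5), (8, 5)]
```

Because the prediction is cheap arithmetic, it adds essentially nothing to the per-frame cost, which suits the real-time constraint.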
Maybe autoregressively? Like taking the past few inputs x1, x2, x3 to predict x4, then x2, x3, x4 to predict x5, and so on.
so if i train on 60 sessions and in the testing phase i pass only 20 sessions, will the LSTM throw any kind of error for not having a fixed dimension? Linear regression and XGBoost did, unless i am understanding something different here
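The autoregressive suggestion also addresses the fixed-dimension error: models like linear regression or XGBoost need a fixed number of input columns, so you train on fixed-size windows (here 3 past values predict the next one) and then roll the model forward, feeding each prediction back in. An LSTM can consume sequences of any length step by step, but the same windowing works for it too. A minimal NumPy sketch with a linear model standing in for the LSTM and a made-up series:

```python
import numpy as np


def make_windows(series, k):
    """Turn one sequence into (k past values -> next value) training pairs."""
    X = np.array([series[i : i + k] for i in range(len(series) - k)])
    y = np.array(series[k:])
    return X, y


def rollout(coef, history, steps):
    """Predict `steps` future values, feeding each prediction back in."""
    window = list(history[-len(coef):])
    preds = []
    for _ in range(steps):
        nxt = float(np.dot(coef, window))
        preds.append(nxt)
        window = window[1:] + [nxt]  # slide the window forward by one
    return preds


series = [float(t) for t in range(1, 21)]  # made-up series: 1, 2, ..., 20
X, y = make_windows(series, k=3)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # linear AR(3) model, no intercept

# At test time any history length >= 3 works, e.g. only the first 10 values.
preds = rollout(coef, series[:10], steps=3)
print(np.round(preds, 6))  # close to 11, 12, 13
```

Whatever the patient's history length, only the last `k` values enter the model, so the input dimension never changes.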
I will stop commenting, as I am not qualified enough to comment on this. I work on forecasting, where I take the past 8 images and then predict the next 4 images, so it's a little bit similar, but still not enough to guide you. I will let someone else take over from here.
Sessions? What is each session made up of? I am curious.
an average of an EEG recording in a single session, i believe
hello I made a data cleaner program but I need someone to test it... can anyone help?
https://github.com/Mohammed-Musab/Lazy-Data-Cleaner/releases
(Note that it might be unstable since I added GUI recently)
Hii everyone I'm new here
hello
This may be a stupid question, but I want to use LinearRegression() from sklearn and its fit function. My teacher has written for us to use model.fit(X=x_train, y=x_train), not model.fit(X=x_train, y=y_train). Is this a typo, or am I misunderstanding something? Also, is there a reason why the MSE of model.fit(X=x_train, y=x_train) is 500+?
That's definitely a typo.
Do you understand x and y, and train and test?
Yes, also forget the question about MSE, I used the y_val set.
Great
Thank you
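For reference, the corrected call fits features to targets; `fit(X=x_train, y=x_train)` would just teach the model to reproduce its own inputs. A minimal scikit-learn sketch with made-up data (variable names follow the question; the split sizes are assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=(100, 1))            # made-up feature
y = 3.0 * x[:, 0] + 2.0 + rng.normal(0, 1, 100)  # made-up target

x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.2, random_state=0)

model = LinearRegression()
model.fit(X=x_train, y=y_train)  # features -> target, not x_train -> x_train
mse = mean_squared_error(y_val, model.predict(x_val))  # compare against y_val
print(model.coef_, model.intercept_, mse)
```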
anyone worked with Equinox?
just ask your actual question
Has anyone ever used the hand-drawn number and letter dataset to generate a message?
Guyzz listen, I want to create a bot for an Instagram group chat. Does anybody know how to make one?? So plzz help me
I want to impress my crush
If you want to impress your crush, be yourself. You don't need some type of machine.
Because people will just think it's too good to be true, and when they find out, the resentment of being hurt is worse than just being yourself.
Yuppp bro but I want to do something crazy for her
I want to make something by programming
How long have you been working with python?
Just a few months 😔
I ask specifically because that gives us more understanding of the question.
Do you want it in AI, or something like a regular program?
You gonna let that stop you?
This is your crush we're talking about. Take a page out of Nikes book and just do it. 🙂
Ahh got it
I'm sorry I just need more information before I can really give a response
AI-based is also fine, but I want it to impress her and look cute
Haha 😅 you’re right, I should just go for it
I’ll give you more details then.
So you were thinking of making a large language model
Or an ai that can generate a poem from a image?
I actually wanted to make a welcome bot for an Instagram group chat, but it’s quite difficult
But now I’m thinking of making that trending blooming-flower thing from Reels for my crush and hosting it on something like Netlify
I'm sorry I wish I could help you
I don't really know how to use social media I wish I could help
can anyone suggest a neural network project for my resume?
@lyric vale How simple or complex do you want the project? Why not make a simple binary classifier and train the NN on dictionary words as valid or not valid? Something like dgo and dog: one is valid and one is not valid. You could then extend the list to add more words. Kinda overkill to use an NN, but why not.
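A minimal version of that suggestion: describe each word by which letter pairs (bigrams) it contains, and train a single sigmoid neuron (logistic regression) with gradient descent to score valid vs. scrambled words. The word lists are tiny and made up, and bigram features are an assumption; a real project would use a proper dictionary and a held-out test set:

```python
import numpy as np

valid = ["dog", "cat", "bird", "fish", "house", "tree", "book", "word"]
invalid = ["dgo", "tca", "bdri", "hsfi", "uoseh", "rtee", "obko", "wdro"]
words = valid + invalid
labels = np.array([1.0] * len(valid) + [0.0] * len(invalid))


def bigrams(w):
    w = f"^{w}$"  # boundary markers so first/last letters count too
    return [w[i : i + 2] for i in range(len(w) - 1)]


vocab = sorted({g for w in words for g in bigrams(w)})
index = {g: i for i, g in enumerate(vocab)}


def featurize(w):
    x = np.zeros(len(vocab))
    for g in bigrams(w):
        if g in index:  # bigrams never seen in training are ignored
            x[index[g]] = 1.0
    return x


def loss_and_probs(X, y, w_vec, b):
    p = 1.0 / (1.0 + np.exp(-(X @ w_vec + b)))  # sigmoid neuron
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)), p


X = np.array([featurize(w) for w in words])
w_vec, b = np.zeros(X.shape[1]), 0.0
start_loss, _ = loss_and_probs(X, labels, w_vec, b)
for _ in range(500):  # plain full-batch gradient descent, learning rate 0.5
    _, p = loss_and_probs(X, labels, w_vec, b)
    grad = p - labels
    w_vec -= 0.5 * X.T @ grad / len(words)
    b -= 0.5 * grad.mean()
end_loss, p = loss_and_probs(X, labels, w_vec, b)
print(f"loss {start_loss:.3f} -> {end_loss:.3f}")
```

With only 16 words this mostly memorizes; growing the word lists is what turns it into a real exercise.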
i just started learning a month ago, so not too complex would be better
@tardy haven You can make something better. If you are really into them, you can simply say: here is a thing I tried making; I was going to program this whole complex thing, but I am not really good at it, but I tried my best. I am rather sure they will appreciate the effort.
right now i am working on human written text predictions which is almost over
@lyric vale Projects start with what you know. What do you know?
basic concepts of neural network
Predicting just text, like "1 2 _ 4, what is missing?" Or something context-aware?
i implemented this without any built-in functions so i can understand how it works
I had made something before and they really liked it, but now I want to do something crazy, which is beyond my limits
@lyric vale That is fine. Since you know basic things about NNs, you can just look up what a binary classification problem is. It is not any more complicated than predicting text.
just predicting 0...9 a...z A...Z
@tardy haven If it is beyond your limits then how can you possibly make it?
after completing this i want to make it into smth more advanced, like converting a complete page of human-written text to PDF
is it good idea or should i make smth else
@tardy haven Start with what you know and then make it creative. "Something amazing" is very subjective to the individual; calling it amazing means nothing except what you see in it.
That’s why I’ve been giving my best for 2 weeks, but it’s still not working
Ahh got it , I’ll start with what I know and try to make it creative
@lyric vale Well, how close is that to what you know? Try it and see. However, a binary classification problem is not much different: you are classifying between two choices. Is 0..9 a number? Yes, then valid. Is abc a number? No, then not valid. I am being abstract here, but this can help you with other topics later on.
ohh got it
@lyric vale That is just one way of using NNs; you can also use them to build basic logic, something like NAND/AND/OR/XOR gates.
yes i know binary classification
yes
Now this is simple, right? But try building an ALU using just NNs. It's a ridiculous project, but an interesting exercise.
arithmetic logic unit right
Yes
Which you can build from the previous gates. You can start with XOR or use NAND. NAND is mostly used because it is functionally complete: every other gate can be built from it.
You can then rearrange NANDs into any basic gate, and from there you build your structure for things like DMUX and MUX and so on. However, in your case you are using an NN, or several of them, doing just that.
It's not going to be efficient, but that is not the point.
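The gate idea can be sketched with single threshold neurons whose weights are set by hand (learning those weights with training is the actual exercise). One NAND neuron is enough, since other gates, and from there a half adder (the first building block of an ALU's adder), can be wired out of NANDs. Weights and function names are illustrative:

```python
def neuron(inputs, weights, bias):
    """Single threshold unit: fires (1) when the weighted sum plus bias is > 0."""
    return int(sum(i * w for i, w in zip(inputs, weights)) + bias > 0)


def nand(a, b):
    return neuron((a, b), (-2, -2), 3)  # only a=b=1 gives -4 + 3 < 0


def xor(a, b):
    # Classic 4-NAND construction; a single neuron cannot compute XOR.
    c = nand(a, b)
    return nand(nand(a, c), nand(b, c))


def and_(a, b):
    c = nand(a, b)
    return nand(c, c)  # NAND of a signal with itself is NOT


def half_adder(a, b):
    """Adds two bits: returns (sum bit, carry bit)."""
    return xor(a, b), and_(a, b)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", half_adder(a, b))
```

Chaining two half adders plus an OR gives a full adder, and a row of full adders gives the adding part of an ALU.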
yes you are right. i will try making this
should i make it from scratch or use built-in functions?
Make a simple one. You can try training a single NN on several chips, or use several of them. You will have to combine them eventually, one after the other. NN-based computers have been done before, back in 1960.
Depends on what you are interested in. Are built-in functions going to abstract too much away from your learning?
probably i should use built-in functions, cause if i make it from scratch then it will take too much time
@lyric vale True, but you would learn more. However, if you know in general what the built-in functions do, then it's ok to have that abstracted for you and treat it like a black box. In computer engineering, the engineer is not really interested in exactly how the transistors are arranged inside a chip; they are only interested in what the chip is doing. You leave the rest to the hardware engineer. So you can think of it that way in this small case.
@limber plover thank you. btw where do you work and what is your role if you like to share.
@lyric vale It's a hobby for me. I don't work anywhere; I do like collecting knowledge, though.
Mostly I set out to learn how to learn, but real applications I leave to someone else.
wow.
I have a lot of info but very little depth.
are you working on any project?
@lyric vale I have, yes, though I never finish them. Mostly because the current question I have about something gets answered, then I don't really continue it or have to.
For example, I have built an ALU before, but I never used it for anything. I got the general idea of what it was doing, but I lost interest in the rest of the 16-bit system I was using.
nice but you should make smth that's useful to people
That is subjective. I cannot possibly know what is useful to people unless they say so. I can make something for me, that is useful and then hope someone finds it interesting.
it is not a change in arch, only in vocab, add N (I would try 256) tokens without mapping them to text, that's all
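Mechanically, adding N new tokens only grows the embedding table by N rows (plus the output layer, if its weights are not tied to the embeddings); the rest of the network is unchanged. A NumPy sketch, assuming new rows are initialized to the mean of the existing embeddings plus small noise, which is one common heuristic rather than the only option. In Hugging Face Transformers the equivalent step is `model.resize_token_embeddings(...)` after adding tokens to the tokenizer:

```python
import numpy as np


def add_tokens(embeddings, n_new, noise=1e-3, seed=0):
    """Append n_new rows to a (vocab, dim) embedding matrix.

    New rows start near the mean existing embedding; their ids are
    vocab_size .. vocab_size + n_new - 1 and map to no text.
    """
    rng = np.random.default_rng(seed)
    mean = embeddings.mean(axis=0, keepdims=True)
    new_rows = mean + noise * rng.standard_normal((n_new, embeddings.shape[1]))
    return np.vstack([embeddings, new_rows])


old = np.random.default_rng(1).standard_normal((1000, 64))  # made-up 1000-token vocab
new = add_tokens(old, n_new=256)
print(old.shape, "->", new.shape)  # (1000, 64) -> (1256, 64)
new_ids = range(old.shape[0], new.shape[0])  # ids 1000..1255, no text mapping
```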
you are right
This also helps me keep my mind steady and not criticize myself too much, since I can see the imperfections, like how efficient it really is to build an NN computer... but I am too close to the subject. A layman might think it is impressive, and someone might want to do something with it. Next thing you know, you are selling a product you had no idea had this sort of use. Someone found out, though.
Anyway, I digress. Try that project and see how far it goes. Use built-in functions or don't, who cares.
Guyzz help me what should I do to impress my crush
That is a loaded question. Sounds like you don't know your crush well to impress her. For example does she like programming? If not then why do you want to use programming as a tool to impress. If you are telling her, look how smart I am on what I did, then that is a bit egocentric and you might have problems later down the road. Maybe just ask her out. And let her ask the questions on what you like then she might be impressed. Since you are not the one saying hey look I code.
Just know that this is the AI and data science section. If you want advice, ask in the general Python group, which is active.
She liked it last time, that’s the thing. I made something for her and she really liked it, and now I want to do something similar again. Can you help me?
How would I know what you made? And what she liked. I have no idea what I would help you with but then ask yourself this. When you impress her, should I also step in and say oh yeah I made this as well.
I gave you some advice, good or bad, that is the best I can do.
According to you, give me something nice that girls would like. I can make it in a way that she will definitely like it 100%
Sure, though you can just go to Python discussion and ask for help there. Show your previous work and ask how you can improve on it.
My previous work wasn’t done very well, that’s why I’m asking for your help, sir
That is fine; however, this is not the right sub unless you are asking about AI and data science. Python discussion is what that is for. There are Python experts there to help, so they say, so ask them. But you have to be exact: what is your goal, what have you made, and how do you see it improving?
not sure if this is the right place for this question: Is anyone here familiar with Reinforcement Learning on Farama Foundation's highway-env ?
I'm having trouble getting decent results using discrete actions with DQN.
heyo
do economic, environmental, and competitive pressures improve llm code patch quality? im about to run a controlled study with contamination auditing using the qwen3.5-35b-a3b model to test this theory.
i honestly think it will, but i was wondering about your guys opinions
ill share the visuals it produces nevertheless, for science!
is there no demand for Data Science / Data Analysis in the healthcare sector? i haven't seen a single DA/DS job in healthcare
am i cooked?
Probably yes, I am curious to know your methodology to test this
Market is tough
You can look for data scientist jobs outside of healthcare
yeh, but healthcare/medicine is the only thing that sparks my interest and sustains long-term attention
so perhaps i should use healthcare datasets to really learn DS and then apply to DS jobs outside of healthcare
Yeah, or if you really wanna stay in healthcare, look at non data scientist jobs
right. i think i will do just that, use healthcare datasets to learn Data Science and after having learnt enough of DS, will apply outside of healthcare
skills would be transferable right? (my head says so, but i think i still need confirmation)
Yes
alright, thanks!
No but the demand is currently growing though as we speak
I seee, how much is it expected to grow in the next 2 years? like till 2028 😅
It'd probably be another 1 - 2 years before you'll start seeing things pop up for it
I seeeee, apparently this is the perfect time for me to start learning DS for the healthcare sector then. It will undoubtedly take me like 2 years to learn enough data science using healthcare datasets for a job
thank you!
Am I doing something wrong here?
I'm following along with the Python One-Liners book, and I got to the neural network section. In his examples he gets a low Finxter score for 0 hours of Python coding in the input data, while mine gets a higher Finxter score for 0 hours and a lower one for more hours. I'm getting the complete opposite behavior from his, with the same dataset and inputs.
My first two responses were from running the model twice at 0 hours, the next was once at 20, and the next was once at 50. The Finxter score goes down the more weekly hours of Python I input, but the book has the complete opposite behavior.
But the code is the same and so is the dataset
I get why it thinks more hours is a less score since the lowest score is someone who says they code 35 hours a week
@lusty rune Were you able to resolve it?
How important is it knowing when to actually use dataframes or series? Cause the syntax is murdering my sanity. Example:
print(cars.loc[:, 'drives_right'])
print(cars['drives_right'])
print(cars[['drives_right']])
(python, pandas)
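A minimal sketch of the three selections above, on a hypothetical toy `cars` frame (values made up for illustration): `.loc[:, label]` and `[label]` both return a one-dimensional Series, while a list of labels, `[[label]]`, keeps a two-dimensional DataFrame. Roughly: use the Series when you want one column's values, and the DataFrame when the next step expects a table.

```python
import pandas as pd

# Hypothetical toy frame mirroring the `cars` example (values made up).
cars = pd.DataFrame(
    {"cars_per_cap": [809, 731, 588], "drives_right": [True, False, False]},
    index=["US", "AUS", "JPN"],
)

col_loc = cars.loc[:, "drives_right"]   # Series: all rows, one column label
col_bracket = cars["drives_right"]      # Series: same thing, shorter syntax
col_frame = cars[["drives_right"]]      # DataFrame: a list of labels keeps 2-D shape

print(type(col_loc).__name__, type(col_bracket).__name__, type(col_frame).__name__)
# Series Series DataFrame
```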
Not yet
Feel like there are too many ways to do something, or I'm doing something too many ways
This looks a lot more fun than fundamentals 😭
@odd shell I would use AI to answer that question if you want to research it. You don't need to let it code for you, but having it answer basic questions will give you a general idea. Also, you could look at the sources it's using.
I've limited my GPT on purpose using study methods, enforcing docs/community assistance feedback 😉
@lusty rune What book are you reading?
It's so fun but I already have the fundamentals down I've been coding for a couple of years now just not very consistent
I've leaned on it too heavily before and realised it was damaging my learning. Though, I agree, GPT/LLMs can be potent if used right!
All I can think is that I put in the wrong data
I am not familiar with that book; I would have to look into the exercise.
But I spent 2 hours last night rechecking the dataset
I love their books..
This is the code example (I took these so I can go over the problem at lunch while I'm at my construction job not trying to pirate)
Yeah, just looking at it, the dataset you used seems to be the problem; I don't see any problem with the code...
I wonder if capital X is used due to grammar or purposefully
it's just a thing that X is capitalized and y isn't. but I don't capitalize it in my code.
I re-entered the data like 4 times last night. The only other thing I can think of is that the MLPRegressor algorithm has been updated since the book was published
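One possibility worth ruling out before blaming a library update (an assumption, since the book's code isn't shown here): `MLPRegressor` starts from random weights, so unless `random_state` is fixed, two fits on identical data can land on different solutions. A minimal sketch on hypothetical stand-in data (not the book's dataset); `fit_and_predict` is an illustrative helper name.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPRegressor

# Hypothetical stand-in data (NOT the book's dataset): hours of Python per week -> score.
X = np.array([[0], [5], [10], [20], [35], [50]], dtype=float)
y = np.array([300, 450, 600, 900, 1200, 1500], dtype=float)

def fit_and_predict(seed):
    """Fit an MLPRegressor with a fixed seed and predict at 0, 20 and 50 hours."""
    model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=seed)
    with warnings.catch_warnings():
        # Tiny, unscaled data may not fully converge; irrelevant to the point here.
        warnings.simplefilter("ignore", ConvergenceWarning)
        model.fit(X, y)
    return model.predict(np.array([[0.0], [20.0], [50.0]]))

run_a = fit_and_predict(seed=0)
run_b = fit_and_predict(seed=0)
run_c = fit_and_predict(seed=1)
print(np.allclose(run_a, run_b))  # same seed + same data -> same model
print(run_a, run_c)               # a different seed may give noticeably different predictions
```

MLPs are also sensitive to feature scaling, so putting a `StandardScaler` on the inputs is usually worth trying as well.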
I don't work with scikit, but this is honestly interesting lol
except that array shouldn't even be called X because it has the y in the last column
I love their books
Also got the automate the boring stuff with python book
The secret life of programs is probably my favorite
Of course SQL is the thickest.... ☠️
The powershell one I bought for my buddy's birthday
whenever I see powershell, I think, maybe I should switch to Linux completely? 😂
This one is pretty thick too
Yea, I know they're useful books, but too general 🙁
@lusty rune That is actually what I was thinking too; I was about to ask when this book was published... is there a way to look up the library and its updates?
I think they often come with repos?
Apparently it has been. I thought that was the issue as well when I ran into problems with the KMeans algorithm, but it was actually my data input; this one, however, seems to be the algorithm
I finished looking at the array but I do not see a discrepancy between yours and theirs...so it has to be the library
Automate the Boring Stuff has a Python package made by Al that you use
I'll play with the data a bit to see if I can get similar behavior with different training inputs
That's the best thing to do, I suspect. But you said you know why it's behaving the way it is, so you generally understand what the book is talking about; I would not get too hung up on it.
I once played with scikit on data I pulled from a video game (EVE Online), 4 million rows or so, and still ended up with a lot of overfitting
not truly familiar with the math/stats of how to use it properly
pandas doesn't like that amount of data either lol
Yea I was going crazy last night before bed making sure the dataset was the same 😭 I'll play with the inputs for an hour when I get home before I move onto the next section to get more familiar with the algorithm
It's still super fun
I don't think I've ever dealt with that much data before
ML is always interesting... I wish I had kept my code from when I was doing classification problems
It was my first project 2 months after career switching from art 😂
I was in way over my head, but had a lot of fun
I got to a point where I kept building the dictionary data set in text and then parsing it for training so that then it could tell me valid and not valid words based on examples.
For example, cta and cat: cta is not valid but cat is. However, it was interesting that when I tried tac, which was not part of the binary decision, it still said not valid. Which makes sense, but I did not program it for that.
This lib seems pretty interesting on say historical market data 🤔
Unsupervised learning is so cool
When I get done with this section I wanna teach an AI how to play blackjack or poker
Yeah you do get emergent behaviors from them.
I'm working on a little RPG game and it would be cool to teach my NPC enemies how to make the best moves based on players decisions
The first exercise in the ML section was using linear regression to predict a little stock market sample, it's been really cool learning about the different algorithms. This stuff used to be so intimidating to me but the way the book breaks it down is easy to understand and when I don't understand something too well I do more research on it
That sounds like you will need a lot of data for that as players play. I can see them getting smarter over time, but it will take a while for that.
Isn't that pretty deterministic?
I'm trying this method now on a pokemon dataset to see if I can match later generations using only earlier-generation data
My favorite book I used was Grokking Algorithms..
Yea it would be simple to hard code
I'm pretty sure that book has been mentioned in this one
I might have to check it out
Alright, wonder what happens if I do this regression consecutively for gens? 😄
i guess meta-shifts from designers prolly make it harder, unless they stick to their philosophy/methodology?
or lack of data and we get goey? 😂
I am more interested in the library directly and its math. I never used it, I just set out making my own at some point.
yea, i get that. i need to start digging into math more
instead of building stuff i dont understand in the end truly
I did my own manual implementation of the KMeans algorithm to learn it better but I think it's fine to use libraries if you understand what it's doing under the hood
Yeah, though I have not done this in a while so I forgot a lot about it now.
I do electronics mostly, so I hardly have to deal with this high level programming. Mostly low embedded system programming.
I am only coming back because I don't have the budget to continue it and software is a lot simpler to get into.
yeah.. the tools to do data analysis are so damn accessible
I loved learning about hardware in the secret life of programming book
Whoops
Wrong reply
The fact that python is free is amazing to me
Let me know how it goes!
Wait are there paid programming languages? 
something.microsoft?
Well, yes, for licenses.
If you plan on using it in a commercial setting, then yeah. For example, there is the programming language Forth. Well, SwiftForth, that is, which is rather limited until you pay.
the problem is rather where do you store all the data from the languages 😂
Same with something like True BASIC
If I remember right some charge you for compilers and such...language dependent
@lusty rune Did a "prediction" on type, if x stats = fire or water? then added +1 on every consecutive generation to see if it improved actual confidence. and eh, yea lol. they changed how they defined types. but also missing lots of nuance. ofc. anyhow, fun stuff
could improve the model (or worsen) if we consider every type, or more
Guys I'm trying to make an LSTM model but for some reason the loss flatlines and the outputs end up being all the same
(or very similar)
Also while training
are you training a sensor on different weather types?
Train loss decreases slowly
No its supposed to be a timeseries forecaster
Basically I give it the last, say, 20 days, and it tells me the weather of tomorrow
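The "last 20 days in, tomorrow out" setup can be sketched as a sliding-window split. `make_windows` is a hypothetical helper, not from the thread's code; it returns inputs in `(batch, seq, features)` order, the layout that a later `seq.permute(1, 0, 2)` converts to seq-first.

```python
import numpy as np

def make_windows(series, window=20):
    """Hypothetical helper: turn a (T, n_features) array into LSTM inputs/targets.

    X[i] holds rows i .. i+window-1 (the "last 20 days"),
    y[i] is row i+window (the "tomorrow" to predict).
    """
    series = np.asarray(series, dtype=np.float32)
    if series.ndim == 1:
        series = series[:, None]                      # a single column becomes 1 feature
    n = len(series) - window
    X = np.stack([series[i:i + window] for i in range(n)])   # (n, window, features)
    y = series[window:]                                       # (n, features)
    return X, y

X, y = make_windows(np.arange(30), window=20)
print(X.shape, y.shape)        # (10, 20, 1) (10, 1)
print(X[0, -1, 0], y[0, 0])    # 19.0 20.0 -> the target is the day right after the window
```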
val loss goes up
No real improvement overall
Wow, that is going to be hard but interesting.
well it's supposedly a textbook application of LSTMs
But for some reason it doesnt work
Yeah, I am not sure , I would have to look at the book you are using
"textbook application" doesn't mean it comes from a textbook
It means it's common/classic
OH I thought you were working it out from textbook sorry.
Are you using a library for this?
Has PyTorch been updated recently?
Probably
in this AI era I really doubt it's left to itself
OpenAI uses it
All big AI firms do too
You can look up recent updates; it might be something with this if you are sure your data is right
No I doubt they messed up LSTMs
Not sure then, I have never used it, but from a systems approach I thought it might be that.
@rancid thorn Have you tried asking AI about this?
what do you mean?
Asking AI about the problem you're having; you said something like "loss flatlines and the outputs end up being all the same"?
Well, what I got, not sure if makes any sense to you but "the model has collapsed to a trivial solution, like predicting the mean or mode of targets across all timesteps"
Using ScienceDirect as its source, though
Maybe that is too general?
no that doesn't really seem to be the issue
Were you just testing it or you know for sure?
I mean another one I got was "Unnormalized inputs or targets cause exploding/vanishing gradients, forcing the model to output safe constant values" uncertain if this helps.
I'm clipping the values and the inputs are already normalized
All right what about "Learning Rate Problems
Too-low LR traps the optimizer in flat loss regions; too-high causes oscillations ending in constant predictions"
This source is Reddit, so... take that with a grain of salt
Nope learning rate is a normal one
Have you ever made LSTM from scratch?
this is my first time
No, I mean the library that you are using uses LSTMs, yes?
yeah pytorch
NNs fail silently all the time, I doubt anyone can tell you what's wrong only looking at the predictions and losses
there's a very nice though a bit outdated recipe on training NNs, maybe something in there can help you
Musings of a Computer Scientist.
Well, the other suggestion I got was dead neurons; not sure how accurate that is, and I don't even know if you can see or test for this.
Sorry, not really helping as I don't know much about them. I just never use libraries for this and take apart what LSTM is. Maybe the math on how it actually does it.
For me it's more of "it's fine if you don't want to know what a brick is to lay it down, but if you want to know why it keeps crumbling, you'd better get to know its chemistry"
Why not look into pytorch forums if they have any, maybe they had a problem like yours
I don't think you should make AI libraries yourself
I mean understand what you're doing, necessary
Making it yourself from scratch, will probably end up doing worse
also on the topic of lstm (or deep learning in general) for time series:
every other week, some new hot sophisticated dl architecture for ts will come out boasting sota performance
but also, don't sleep on "traditional and outdated" methods like arima/ets, which are still surprisingly competitive in certain scenarios
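As a sketch of how cheap such a baseline can be: simple exponential smoothing, the most basic member of the ETS family, in plain Python. The helper names are hypothetical; for real work statsmodels provides `statsmodels.tsa.holtwinters.ExponentialSmoothing` and `statsmodels.tsa.arima.model.ARIMA`. Comparing an LSTM's validation error against a baseline like this, on the same scale, is a quick way to tell whether it has learned anything at all.

```python
def ses_forecast(values, alpha=0.3):
    """Simple exponential smoothing (no trend, no season).

    level_t = alpha * y_t + (1 - alpha) * level_{t-1};
    the final level is the forecast for the next, unseen step.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = float(values[0])                 # initialise the level with the first observation
    for y in values[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

def ses_one_step_mse(values, alpha=0.3):
    """Mean squared error of rolling one-step-ahead forecasts: a baseline to beat."""
    errors = [(values[t] - ses_forecast(values[:t], alpha)) ** 2 for t in range(1, len(values))]
    return sum(errors) / len(errors)

history = [12.1, 13.4, 12.8, 14.0, 15.2, 14.7, 15.9]   # made-up daily values
print(ses_forecast(history, alpha=0.3))                 # next-step forecast
print(ses_one_step_mse(history, alpha=0.3))             # error to compare a model against
```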
buzzword buzzword buzzword
What are "sota" and "arima"?
@rancid thorn TRUE, but it would help you see the picture better.
.
hello
"state of the art," which means best of the best
and "autoregressive integrated moving average," a very very important traditional method of doing time series forecasting
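To make the acronym a little more concrete, a stripped-down sketch that keeps only the "AR" and "I" parts: difference the series once, then regress each difference on its p previous values by least squares. It is an ARIMA(p, 1, 0) without the moving-average term and without the maximum-likelihood fitting a library would use, so treat it as illustration only; `fit_ari` and `forecast_next` are hypothetical names.

```python
import numpy as np

def fit_ari(y, p=2):
    """Least-squares AR(p) on first differences: a rough ARIMA(p, 1, 0) sketch."""
    y = np.asarray(y, dtype=float)
    d = np.diff(y)                                   # "I": difference once to remove a trend
    # Row t of the design matrix is [1, d[t-1], ..., d[t-p]], used to predict d[t].
    lags = [d[p - k - 1:len(d) - k - 1] for k in range(p)]
    X = np.column_stack([np.ones(len(d) - p)] + lags)
    coef, *_ = np.linalg.lstsq(X, d[p:], rcond=None)
    return coef                                      # [intercept, lag-1 coef, ..., lag-p coef]

def forecast_next(y, coef):
    """One-step forecast: predict the next difference, then undo the differencing."""
    y = np.asarray(y, dtype=float)
    p = len(coef) - 1
    recent = np.diff(y)[::-1][:p]                    # d[-1], d[-2], ..., d[-p]
    return y[-1] + coef[0] + coef[1:] @ recent

rng = np.random.default_rng(1)
series = np.cumsum(rng.normal(0.5, 1.0, size=200))  # made-up trending series
coef = fit_ari(series, p=2)
print(coef, forecast_next(series, coef))
```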
describing them in detail on discord is probably not very effective, you can search them up when you want to learn more
oh okay i will thanks
This is the source data (a slice)
Bigger slice
From a purely visual standpoint, I'd say there's some pattern
So it can work
So that is the original data?
What is the predicted then?
Sorry, I am going to ask basic questions, as I am now reading up on what LSTM is
I am reading that the limit for them is "Manual Optimization: Requires tuning for best performance" How do you know you optimized it well?
It's supposed to take the features of a series of days and predict the next day/sequence of days
before tuning you need to have a model that works at least a bit
If it doesn't, then that's not the issue
What do you mean a bit? Why not indefinite?
Is there a way to plot the predicted data vs the original?
One of the things I see is "PyTorch provides a clean and flexible API to build and train LSTM models" Yet they also state "Version Gaps: API changes may affect older code"
what?
yeah
Also you can check how good the model is from loss
@rancid thorn I did not understand the way you phrased it. "before tuning you need to have a model that works at least a bit" what does this mean exactly? works at least a bit is like I guess it should work....seems uncertain..
If the model has some fundamental flaw that renders it completely useless fine tuning is useless
You need rough tuning before fine tuning
Ahh ok, and you have to do this all manually?
I thought tuning meant specific weight adjustments?
You have it coded so would there not be some constants you can adjust?
i thought 'tuning' was about preprocessing,
to 'improve' the data so the model can get better accuracy
Yeah, never mind, I got them confused, it's been a while. You would not adjust the weights, as that is what the NN is doing; you would adjust the learning rate and so on...
Also, the last NN I worked with that I can remember right now was a VERY simple XOR problem, so I could do it manually
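An XOR network is small enough to wire by hand, which also ties into the logic-gate framing that comes up further down: one hidden unit acts as OR, the other as AND, and the output fires for "OR and not AND". The weights below are hand-picked for illustration, not learned.

```python
import numpy as np

def step(z):
    """Threshold activation: 1 where z > 0, else 0."""
    return (np.asarray(z) > 0).astype(int)

# Hand-picked weights (no training): hidden unit 1 is OR, hidden unit 2 is AND.
W_hidden = np.array([[1, 1],      # OR:  fires if x1 + x2 > 0.5
                     [1, 1]])     # AND: fires if x1 + x2 > 1.5
b_hidden = np.array([-0.5, -1.5])
w_out = np.array([1, -1])         # output: OR and not AND
b_out = -0.5

def xor_net(x):
    h = step(W_hidden @ x + b_hidden)
    return int(step(w_out @ h + b_out))

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(np.array([x1, x2])))
```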
Anyway, I am losing interest now, since I am doing some other project. Hope someone can help you.
i dont get the problem
are u trying to improve a model?
I am not the one that had the problem StraReal did, you will have to scroll up to read their specific problem.
lol I hadn't seen it
Well, you did join in midway so there is that.
@rancid thorn Do you know if you can express an LSTM like this: F = x'y' + x'y + xy' = y'(x' + x) + x'y? Similar to sum of products in digital logic?
Sorry I just had to ask, as I am curious
An LSTM has more operations than that, I'm pretty sure
Yeah I know based on how large it is.
I mean there are two main variables, the short-term and long-term memory
+the input
Then they go through the forget gate, the input gate and the output gate
I was trying to see based on what I could find, it looks like something you can express similar to logic gates
I know a bit more about digital electronics and if I can connect my thinking that way, maybe I can understand it more
This is an LSTM expressed as a mathematical formula
Interesting
is an LSTM a type of RNN that avoids the vanishing gradient?
Yeah
And exploding gradient
And in doing that it adds long term memory which is really cool
So similar to programmable memory?
Not sure what that is
how does that work?
is it like RMSProp?
Basically long term memory gets carried from each LSTM to the next and it passes through a forget gate, which decides what % of it to remember, and an input gate, which decides what to add to the long term memory
It is never outright deleted
And short-term memory is carried from one LSTM to the next, but it doesn't go on to the one after that
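The description above (a forget gate scaling the long-term memory, an input gate adding to it, short-term memory handed to the next step) corresponds to the standard LSTM cell equations. A minimal NumPy sketch of one time step, assuming the gate layout input/forget/cell/output (the order PyTorch documents for its stacked weights); the weights are random placeholders, not a trained model. Because the forget gate is a sigmoid, it scales memory by a value strictly between 0 and 1 rather than deleting it outright.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias,
    stacked in the order input, forget, cell candidate, output.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much of the candidate to write
    f = sigmoid(z[H:2 * H])        # forget gate: fraction (0..1) of long-term memory kept
    g = np.tanh(z[2 * H:3 * H])    # candidate values to add
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much of the memory to expose
    c = f * c_prev + i * g         # new long-term memory (cell state)
    h = o * np.tanh(c)             # new short-term memory / output (hidden state)
    return h, c

# Run 20 steps with random placeholder weights (illustration only).
rng = np.random.default_rng(0)
H, D = 4, 3                        # hidden size and number of input features (arbitrary)
W, U, b = rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)), np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(20, D)):
    h, c = lstm_cell(x_t, h, c, W, U, b)   # c and h carry forward to the next step
print(h.shape, c.shape)            # (4,) (4,)
```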
so it's like a nn to remember things?
got it
rn i'm in the optimizers
don't see the NN
Basically it programs its own memory; it can't really be expressed with Boolean logic, but if you build it up over time you can sort of get it. The expression I showed was F = x'y' + x'y + xy' = y'(x' + x) + x'y. You can use this kind of simplification to minimize gate use, reducing to, say, 3 AND gates instead of 10. However, an LSTM is not exactly like this, though it does have something like an expression that evolves over time. It seems to use sigmoid a lot.
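The Boolean simplification can be checked exhaustively with a truth table. Factoring y' out of x'y' + xy' leaves x'y as the remaining term, and the whole expression reduces to x' + y', i.e. NAND. A small sketch (function names are illustrative):

```python
from itertools import product

def F(x, y):
    """x'y' + x'y + xy'  (a prime means NOT)."""
    return ((not x) and (not y)) or ((not x) and y) or (x and (not y))

def simplified(x, y):
    """y'(x' + x) + x'y  =  y' + x'y  =  x' + y'  (NAND)."""
    return (not x) or (not y)

for x, y in product([False, True], repeat=2):
    assert F(x, y) == simplified(x, y)       # both forms agree on every input
    print(int(x), int(y), int(F(x, y)))
```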
i get that it's the LSTM, i just don't understand yet what it is
is it like an optimizer for RNNs?
to resolve the vanishing and exploding gradient problem
Well, I am not accurate; this is just my understanding. It's not pure, static logic like that, so you cannot really use it. You cannot use digital logic gate expressions from input to output for this. It changes over time as needed: in digital logic the input is 1 or 0, but with this it's way more complicated than that.
It's an improvement on basic RNNs
But it is an RNN
Ok anyway, I really need to stop thinking about this for now.
Basic recurrent neural networks are great, because they can handle different amounts of sequential data, but even relatively small sequences of data can make them difficult to train. This is where Long Short-Term Memory (LSTM) saves the day. Long Short-Term Memory is a type of recurrent neural network that can handle much larger sequences of dat...
You should watch this
Its really good
in this case time series forecasting
Which is basically having a series of values and predicting the next
real project or for study?
Also if you dont know exactly what RNNs are watch this
https://www.youtube.com/watch?v=AsNTP8Kwu80
When you don't always have the same amount of data, like when translating different sentences from one language to another, or making stock market predictions from different companies, Recurrent Neural Networks come to the rescue. In this StatQuest, we'll show you how Recurrent Neural Networks work, one step at a time, and then we'll show you th...
Well I was hoping to apply it to the market
But in the worst case scenario I guess it will be study
oh yes
i'm not at that level yet.
i want to start studying NNs by May, cuz currently I'm focusing more on the intermediate level, such as gradient problems, gradient descent variants and their optimizers (LR, regularization, Adam, RMSprop, LR scheduler)
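Since Adam and RMSprop both come up here: a minimal plain-Python sketch of the Adam update on a single scalar parameter. The `v` line is the RMSprop-style running average of squared gradients and the `m` line adds momentum; the defaults are the values commonly used with Adam. The toy objective f(x) = x² and the helper name `adam_minimize` are assumptions for illustration.

```python
import math

def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Hypothetical helper: minimise a 1-D function with Adam, given its gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g          # running mean of gradients (momentum)
        v = beta2 * v + (1 - beta2) * g * g      # running mean of squared gradients (RMSprop part)
        m_hat = m / (1 - beta1 ** t)             # bias correction for the zero initialisation
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

x_min = adam_minimize(lambda x: 2 * x, x0=5.0)   # f(x) = x**2, so f'(x) = 2x
print(round(x_min, 4))                           # ends up close to the minimum at 0
```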
Honestly I think you should do the opposite
Learn the basics of NNs
Backpropagation, the chain rule, RNNs, what an NN even is
And then go into the details of every step
i know backpropagation, the chain rule, how it's calculated with derivatives and more
Oh yeah then youre good lol
activations functions and more
but just for simple ones like linear models
i've never coded an NN
just doing model with 1 layer (linear models)
you should code as soon as you learn something new
really get it printed into memory
Input
Output...
so the model's predictions are bad for rainfall and sunshine?
but it's worse on those 2 features, right?
not really
is the date feature the day? or something else?
the date feature is the period of the year
at the start and end of the year it's 0
and towards the middle its 1
Just a cosine wave made to wrap from 0 to 1
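A sketch of the encoding described (0 at the start and end of the year, 1 mid-year), assuming it is computed from the actual calendar date rather than the row index, so uneven gaps between snapshots don't change the formula. One caveat with a single cosine: dates equally far from mid-year (early April and early October, say) get nearly the same value, which is why a sin/cos pair is a common alternative. Both function names are hypothetical.

```python
import numpy as np
import pandas as pd

def day_of_year_wave(dates):
    """0 on Jan 1, ~1 mid-year, back toward 0 on Dec 31 (a cosine mapped into [0, 1])."""
    dates = pd.to_datetime(pd.Series(dates))
    frac = (dates.dt.dayofyear - 1) / 365.25          # position within the year, 0..~1
    return (1 - np.cos(2 * np.pi * frac)) / 2

def day_of_year_sin_cos(dates):
    """Alternative: two columns (sin, cos) so that e.g. April and October get distinct codes."""
    dates = pd.to_datetime(pd.Series(dates))
    angle = 2 * np.pi * (dates.dt.dayofyear - 1) / 365.25
    return np.column_stack([np.sin(angle), np.cos(angle)])

sample = ["2023-01-01", "2023-04-01", "2023-07-02", "2023-10-01"]
print(np.round(day_of_year_wave(sample).to_numpy(), 3))
print(np.round(day_of_year_sin_cos(sample), 3))
```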
do you have the number for loss?
in train set
I have these
Epoch 02 | train loss: 1.00803 | val loss: 0.96275
did you check for overfitting?
that's not a bad loss, but we can improve it
The thing is it doesn't improve
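One quick sanity check, assuming the targets are z-score normalised and the criterion is MSE (neither is shown in the thread): a model that ignores its input and always predicts the mean scores an MSE equal to the target variance, i.e. 1.0. A loss sitting around 1.0 together with near-identical outputs is consistent with that kind of collapse. The data below is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.normal(loc=15.0, scale=6.0, size=10_000)   # hypothetical raw target (made up)
z = (target - target.mean()) / target.std()             # z-score normalisation

constant_pred = np.zeros_like(z)                        # "always predict the mean"
mse = np.mean((z - constant_pred) ** 2)
print(round(mse, 6))   # 1.0, the variance of a standardised target
```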
is the epoch 2 loss the lowest?
no, I just took this one as a sample
Here
can you send the first, the middle and the last?
there's a little bit of overfitting happening
No no
with overfitting the train loss would go down by a lot
It would end up looking right
Sure it wouldn't be useful
But it would end up looking right
But it doesn't
but at epoch 98
the train loss goes down and the val loss increases
isn't that overfitting?
over time, it slightly is, but overall it's not
What's really happening here
Is that it's as if the model wasn't even being trained
It's as if it was back to the start at every epoch
using pytorch right?
yes
did you forget some code?
checked it a thousand times
```py
import numpy as np
import torch

# model, criterion, optimizer, train_loader, val_loader, NUM_EPOCHS and the
# cprint helper all come from earlier in the original script (not shown)
best_val = float("inf")  # must be initialised before the first comparison below

for epoch in range(1, NUM_EPOCHS + 1):
    model.train()
    train_losses = []
    for seq, target in train_loader:
        seq = seq.permute(1, 0, 2)  # (batch, seq, feat) -> (seq, batch, feat) for batch_first=False
        optimizer.zero_grad()
        pred = model(seq)
        loss = criterion(pred, target)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.5)
        optimizer.step()
        train_losses.append(loss.item())

    # ---- validation ----
    model.eval()
    val_losses = []
    with torch.no_grad():
        for seq, target in val_loader:
            seq = seq.permute(1, 0, 2)
            pred = model(seq)
            loss = criterion(pred, target)
            val_losses.append(loss.item())

    avg_train = np.mean(train_losses)
    avg_val = np.mean(val_losses)
    print(f'Epoch {epoch:02d} | train loss: {avg_train:.5f} | val loss: {avg_val:.5f}')

    # keep the weights with the best validation loss so far
    if avg_val < best_val:
        best_val = avg_val
        torch.save(model.state_dict(), 'best_weather_lstm.pt')
        cprint(' -> checkpoint saved', 'm')
```
i saw somewhere a way of thinking through this
it's like:
the loss doesn't decrease, so the weights don't change
the weights don't change, so we need to check the weight values (check the optimizer)
why don't the weights change? (LR too low/high, the model found a local minimum) and more
i think in this case the model found a local minimum, did you check that?
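Rather than guessing between a local minimum and a learning-rate problem, you can measure directly whether gradients reach each parameter and whether the weights actually move after a step. `parameter_update_report` is a hypothetical helper; the `nn.Linear` model and random batch are stand-ins for the real LSTM and a real batch.

```python
import torch
from torch import nn

def parameter_update_report(model, optimizer, loss_fn, inputs, targets):
    """Hypothetical helper: run one optimisation step and return, per parameter,
    (gradient norm, size of the weight change)."""
    before = {name: p.detach().clone() for name, p in model.named_parameters()}
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    grad_norms = {
        name: (p.grad.norm().item() if p.grad is not None else 0.0)  # None: no gradient reached it
        for name, p in model.named_parameters()
    }
    optimizer.step()
    return {
        name: (grad_norms[name], (p.detach() - before[name]).norm().item())
        for name, p in model.named_parameters()
    }

# Stand-ins: swap in the real LSTM, optimizer, criterion and one real (permuted) batch.
torch.manual_seed(0)
model = nn.Linear(3, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
report = parameter_update_report(model, optimizer, nn.MSELoss(), torch.randn(8, 3), torch.randn(8, 1))
for name, (grad_norm, change) in report.items():
    print(f"{name:8s} grad norm {grad_norm:.3e}   weight change {change:.3e}")
```

A `None` or zero gradient would suggest that parameter isn't connected to the loss; non-zero gradients and weight changes mean training is at least moving the weights, which shifts suspicion towards things like the learning rate or the data.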
I tried making the LR crazy high amounts
doesn't work
what optimizer?
Also not how it works
Adam
try the AdamW
i really think the problem is a local minimum
No, if it was that it would do a little optimization before flatlining
here it flatlines at the start and just stops
the train loss decreases over time but the val loss doesn't
well
try increasing the batch size
or add an LR scheduler
or idk, try increasing the alpha regularization term
already tried changing all the hyperparameters
does the data have outliers?
I really don't know what the issue could possibly be
no
does the LR have a scheduler?
like starting at 0.00001 and going to 0.001
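The warm-up described (0.00001 ramping up to 0.001) can be expressed with PyTorch's built-in `LinearLR` scheduler. A minimal sketch with a stand-in model, stepping the scheduler once per epoch; the 10-epoch ramp length is an arbitrary choice.

```python
import torch
from torch import nn

model = nn.Linear(4, 1)                                     # stand-in for the LSTM
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # LR reached after warm-up
# Start at 0.01 * 1e-3 = 1e-5 and ramp linearly to 1e-3 over 10 scheduler steps.
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.01, end_factor=1.0, total_iters=10
)

lrs = []
for epoch in range(15):
    lrs.append(optimizer.param_groups[0]["lr"])
    # ... the usual per-batch loop (zero_grad / backward / step) goes here ...
    optimizer.step()      # stands in for the per-batch steps in this sketch
    scheduler.step()      # advance the warm-up once per epoch
print([f"{lr:.1e}" for lr in lrs])
```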
no, but it doesn't really matter if it doesn't work with either high or low values
is the data scaled?
yep, it's normalized
i saw this on reddit
" In your model (in the LSTM/RNN definition), is the batch_first parameter set to False? If it's set to True and you perform this permute, you are training the model with batch instead of time. This would explain why the validation loss never stabilizes: the model is trying to find temporal patterns in dimensions that are, in fact, different samples. "
"This would explain why the validation loss never stabilizes" not the issue at hand
Also it's set to False
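For reference, the shapes involved with `batch_first=False` (PyTorch's default): the permute turns the loader's `(batch, seq, features)` into `(seq, batch, features)`, and the output comes back seq-first too. The sizes below are arbitrary. A related pitfall worth ruling out (an assumption, since the model's `forward` isn't shown): with seq-first output the last time step is `out[-1]`, whereas `out[:, -1]` selects the last sample in the batch.

```python
import torch
from torch import nn

batch, seq_len, n_features, hidden = 32, 20, 5, 64
x = torch.randn(batch, seq_len, n_features)          # typical DataLoader layout

lstm_sf = nn.LSTM(n_features, hidden)                 # batch_first=False: expects (seq, batch, feat)
out, (h_n, c_n) = lstm_sf(x.permute(1, 0, 2))         # input is now (20, 32, 5)
print(out.shape, h_n.shape)                           # torch.Size([20, 32, 64]) torch.Size([1, 32, 64])
last_step = out[-1]                                   # (batch, hidden): last time step for every sample

lstm_bf = nn.LSTM(n_features, hidden, batch_first=True)   # alternative: no permute needed
out_bf, _ = lstm_bf(x)
print(out_bf.shape)                                   # torch.Size([32, 20, 64]); last step is out_bf[:, -1]
```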
the best epoch is 37
and after that it goes down
idk if it's an invisible overfitting problem or something
Can someone help with my help thread in #1035199133436354600 ?
@rancid thorn Still on that problem, huh? Have you tried asking the PyTorch community? I am sure they have a Discord.
Ooh do we reckon openai has a massive lifeline, now that they've signed with the US secretary of defense?
I know they were leaking money bad, but surely that's on the up for them from here
Working on Kaggle right now, didn't even know this could happen lol
hi everyone, Tooba here. I am a SWE Junior, currently in 6th semester.
Need your help: I am currently studying a "Data Science for SE" course, and for its 2nd assignment I have to effectively visualize big data: hourly data for 4 parameters (NO2, O3, PM2.5, PM10) from 100 stations across the world for the year 2025. The problem is that the data is so huge that I am unable to visualize it in a way that conveys anything meaningful. See the attached image. I'd appreciate your suggestions, or maybe YT tutorials on effectively visualizing such big data... thank you : )
find the important features with respect to each other and plot them; check which correlations are higher or lower for effective visualization
find their RFE
if it's linear data
you can also check their VIF scores to find multicollinearity
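A minimal NumPy sketch of both suggestions: a correlation matrix (which can be drawn as a heatmap) and VIF scores, where VIF_j = 1 / (1 - R_j²) from regressing column j on the others. The data below is synthetic and made up purely so the example runs; it is not air-quality data. `vif_scores` is a hypothetical helper name.

```python
import numpy as np

def vif_scores(X):
    """Variance inflation factor per column: 1 / (1 - R_j^2), where R_j^2 comes
    from an ordinary least-squares fit of column j on all other columns."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    scores = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])  # with intercept
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1 - resid.var() / y.var()
        scores.append(1.0 / (1.0 - r2) if r2 < 1 else np.inf)
    return np.array(scores)

# Synthetic stand-in data (NOT real measurements): PM10 deliberately tied to PM2.5.
rng = np.random.default_rng(42)
pm25 = rng.gamma(2.0, 10.0, size=500)
pm10 = 1.6 * pm25 + rng.normal(0, 5, size=500)
no2 = rng.gamma(3.0, 8.0, size=500)
o3 = 60 - 0.5 * no2 + rng.normal(0, 8, size=500)
X = np.column_stack([no2, o3, pm25, pm10])

print(np.round(np.corrcoef(X, rowvar=False), 2))   # 4x4 correlation matrix
print(np.round(vif_scores(X), 1))                  # large values flag multicollinearity
```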
thank you!
What it's trying to predict
Cyclically repeats, always in the exact same way
What it did
So the code must be wrong
Because i do know for a fact that an LSTM can predict this
I would think that you'd want a single "time" dimension, not separate date & time.
What do you mean?
And, even then, if the intervals are constant, that dimension doesn't provide anything useful since it's ordered.
there's no separate date and time dimension
Theres one feature
date
Then the dataframe is ordered, of course, but that doesn't indicate seasonality in any way: if the weather snapshots were taken, say, 2 days apart, then the formula would have to change
Is anyone working as a data engineer here?
Did you share your code anywhere?
So that everyone can easily read your code, you can paste it in this website:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.
Could anyone here help with a computer vision task? I'm supposed to perform transfer learning on a pretrained model like YOLOv8-seg, which I did, and the end result is very bad: I lowered the confidence threshold to 0.05 and it only gave me a single prediction of a small piece of a dendrite, as seen in the image.
The labels it was trained on were made manually by me and my friend, and we took a good amount of time to do them correctly, so that isn't the problem either. I suspect the biggest problem is the image size I set for training and inference, which is 1024, but the real image dimensions are:
(2188, 3072, 3) h, w, channels
(1094, 1536, 3)
Maybe because the dendrites are so thin in some areas or faint it ends up squishing it out of existence when resizing.
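Some rough arithmetic supports that suspicion, assuming the longer side is scaled down to `imgsz` (letterbox-style resizing): at 1024 the 3072-wide images shrink to a third, so a branch 2 px wide ends up under 1 px. One common workaround is to cut full-resolution tiles and train/infer on those instead. A sketch of the tiling arithmetic; the helper names, the 2 px example width and the tile/overlap sizes are all hypothetical.

```python
import numpy as np

def downscale_factor(height, width, imgsz=1024):
    """Scale applied if the longer side is resized to imgsz (letterbox-style)."""
    return imgsz / max(height, width)

for h, w in [(2188, 3072), (1094, 1536)]:
    s = downscale_factor(h, w)
    # 2 px is an arbitrary example width for a thin dendrite branch
    print(f"{h}x{w}: scale {s:.3f}, a 2 px wide branch becomes ~{2 * s:.2f} px")

def tile_starts(length, tile=1024, overlap=128):
    """Start offsets so that tiles of size `tile` cover [0, length) with some overlap."""
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)          # last tile sits flush with the far edge
    return starts

def tile_image(img, tile=1024, overlap=128):
    """Yield (y0, x0, crop); predictions on a crop are shifted back by (y0, x0)."""
    h, w = img.shape[:2]
    for y0 in tile_starts(h, tile, overlap):
        for x0 in tile_starts(w, tile, overlap):
            yield y0, x0, img[y0:y0 + tile, x0:x0 + tile]

image = np.zeros((2188, 3072, 3), dtype=np.uint8)   # placeholder for a real micrograph
print(len(list(tile_image(image))))                  # number of full-resolution tiles
```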
So, I reduced your problem down to 1 dimension: max temp
Did you change the actual ai code to get this?
I think your lr was probably the biggest problem
what'd you do to it?
Changed it to 1e-3, from 1e-1
With 1e-1: Epoch 20 | train loss: 0.98093 | val loss: 1.14488
At 1e-3: Epoch 20 | train loss: 0.45296 | val loss: 0.49629
Oh, 1e-5 is also Epoch 20 | train loss: 0.97913 | val loss: 1.11007
There's a bunch of hyperparameters to play around with here
layers, dropout, lr, etc
Well, but the LR was 1e-3 in the code I sent
I'm looking at your code from yesterday https://paste.pythondiscord.com/KYORVZPCXVGMSN6PWSRHGWPQSE
Anyway, all I'm saying is, reduce to a single parameter: features = ['MaxTemp']
Can you send the code to plot it like you did?
Yup, one sec
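A minimal matplotlib sketch for this kind of plot, assuming you have already collected the validation targets and predictions into 1-D arrays (for example by concatenating `pred` and `target` from the validation loop). `plot_predictions` is a hypothetical helper name and the arrays in the example are made up.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

def plot_predictions(actual, predicted, title="Validation: actual vs predicted"):
    """Overlay two 1-D series on one axis and return the figure (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(np.asarray(actual), label="actual")
    ax.plot(np.asarray(predicted), label="predicted", alpha=0.8)
    ax.set_xlabel("time step")
    ax.set_ylabel("value (normalised)")
    ax.set_title(title)
    ax.legend()
    fig.tight_layout()
    return fig

# Made-up arrays; a flat prediction line is what a collapsed model looks like.
t = np.linspace(0, 6 * np.pi, 300)
fig = plot_predictions(np.sin(t), np.zeros_like(t))
# fig.savefig("actual_vs_predicted.png")  # or plt.show() in an interactive session
```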
my learning rate is set to 0.01 by default and I have early stopping after 5 epochs if the loss doesn't change, and it always stops at 5 epochs, like it's not learning anything
@left tartan ?
I gotcha, I needed sustenance
oh sure lol
I made a few small changes to your code, mainly changing the lr... but I also had to set the device in a few places for CPU/GPU. The main thing in my case is that I changed features to features = ["MaxTemp"].
Hey friends. I'm developing a stocks trading app that learns from trading behavior. It can describe any market condition. It is something like a state machine. But I have no experience with ML. If someone is interested in having a look. I'd love some feedback.
By getting the device
that PyTorch can run on, is there a way of limiting how much of it is used?
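If the question is how to pick the device and cap how much of it PyTorch uses, a minimal sketch: `torch.cuda.set_per_process_memory_fraction` caps the memory this process may allocate on a GPU, and `torch.set_num_threads` limits CPU threads. The 0.5 fraction and 2 threads are arbitrary example values.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

if device.type == "cuda":
    # Cap this process at ~50% of GPU 0's memory; allocations beyond that raise OOM.
    torch.cuda.set_per_process_memory_fraction(0.5, device=0)
else:
    # On CPU, limit the number of intra-op threads PyTorch uses.
    torch.set_num_threads(2)

model = torch.nn.Linear(4, 1).to(device)       # stand-in model
x = torch.randn(8, 4, device=device)
print(device, model(x).shape)
```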
Can I? dm
Hi, is leetcode good for practicing for interviews?
It's good to practice leetcode questions, which may or may not be asked by interviewers
this is the data science and AI channel. I was never asked a single leetcode-style question when I interviewed for DS/AI positions.
you can ask for job hunting advice in #career-advice, and you can ask leetcode questions in #algos-and-data-structs
okay thanks mate
okay thanks
https://github.com/GriffinCanCode/Callosum anyone like this?
what is it? why should people like it?
personality DSL for agents, compatible w langchain, lets you be deterministic about ai personalities
compiler's in OCaml and is lightning fast
quickly skimming through it, I'm not sure how this isn't just a glorified system prompt swapper
I doubt those presets work as well as advertised too, instead of helpfulness: 0.90 you might as well just say high helpfulness and the latter is probably way more understandable
The output is a system prompt because that's the interface LLMs expose, and calling it a "prompt swapper" is like calling TypeScript a "glorified JS writer." Behind that output is an OCaml compiler with a real lexer/parser, typed AST, semantic analysis (cycle detection, conflicting modifiers, contradictory rules), and multi-target codegen (JSON, Lua, SQL, Cypher, not just prompts). The numeric values aren't for the LLM to interpret literally; however, they drive the DSL's rule system: behavioral conditionals, cross-trait interactions, evolution deltas, and compile-time validation that "high helpfulness" can't participate in.
behavioral conditionals, cross-trait interactions, evolution deltas, and compile-time validation
literally what does that mean? I mean I can guess, but it sounds like you understand it more, so please go ahead?
I guess my thoughts are like, ts is a lot more complicated than js, but it provides very visible benefits (like, well I mean, types)
the dsl thing is a lot more complicated, but in the end needs me to do the same amount of work as just writing a system prompt without it anyway?
ig if it works for you great, though currently I'm not seeing like too much benefit
https://github.com/plunder707/muon-curiosity/tree/main
I had an experiment on my AI system that I wanted to run. Anyone have the resources?
hey
woah i saw your question yesterday and wow its so good
yeah there was some loss function explosion when i fixed it and fixed my data augmentation it went well 🕺
<@&831776746206265384> looks like an ad ^?
!cleanban 906481045044625428 ads
:incoming_envelope: :ok_hand: applied ban to @pliant temple permanently.
and your problem? did u fix it?
nah
Where can I find data for code of multiple programming langs, preferably labelled in large amounts?
github?
Are you looking for the same logic implemented in multiple languages?
i need ones outside github
nope any random source code
why?
the guy I'm working with said so not really sure why
sounds like something worth asking
y did they have to choose json for tool definition in LLMs and for structured output
is supa token intensive
even if the LLM is cheap it just reduces performance cuz of all the cluttering in the input prompt
why does it have to be structured at all? does it make a difference to the LLM?
maybe it does make a difference if you want to parse the output
(and subsequently deterministically change said output to feed it back to the LLM)
I'm building a QA agent and I'm handling the context, so how can I split the codebase and index it to be used later on? Or is there an MCP server that handles all of that?
in general, how do you handle the context when it contains hundreds of lines of code?
well first I'd imagine if to answer some of these questions, you need to understand all those hundred lines
if yes... well I mean that means you must send all of that into llm context
if not then you can think about it more
there have been code-oriented embedding models coming out lately that you could try to RAG with
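For indexing a codebase, one common approach is to chunk along syntactic boundaries (functions, classes) rather than fixed-size windows, then embed each chunk. A minimal sketch for Python files using the standard `ast` module; the embedding and storage steps are left out, and `chunk_python_source` is a hypothetical name.

```python
import ast

def chunk_python_source(source, path="<memory>"):
    """Split a module into top-level function/class chunks with line ranges,
    ready to be embedded and indexed for retrieval (hypothetical helper)."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            start = node.lineno
            if node.decorator_list:                       # include decorators in the chunk
                start = min(d.lineno for d in node.decorator_list)
            end = node.end_lineno                         # available on Python 3.8+
            chunks.append({
                "path": path,
                "name": node.name,
                "start": start,
                "end": end,
                "text": "\n".join(lines[start - 1:end]),
            })
    return chunks

example = "def area(r):\n    return 3.14159 * r * r\n\nclass Shape:\n    sides = 0\n"
for chunk in chunk_python_source(example, "shapes.py"):
    print(chunk["path"], chunk["name"], chunk["start"], chunk["end"])
```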
I'd imagine json is one of the most trained formats for modern llms; anything custom and accuracy probably worsens
additionally I think integration with other tools is less of a headache when you just have plain ol' json
that said tho I went looking and landed here
I might try yaml instead 🤔
I'm impressed that yaml is actually better than json
tho tbf they did use gpt 4.1, I'd be more interested in the latest gen of models
wonder if they have the code for their bench
interesting stuff
well actually yeah, tool integration is gonna be a headache
everyone including the providers and the frameworks seem to have json only 😔
I might see if I can hack it into pydantic-ai somehow
that would be awesome though I'm not sure how it would work exactly
considering the providers also only use json
ig just prompt for yaml output? probably a lot less reliable
local might work ig
llamacpp grammar keeps winning
-# never used the others so wouldn't know about them
yea, I'd have to have a custom prompt type thing, both for structured output and for tool usage
long term Id probably just use json cuz they are actively training the models for it
honestly I wouldn't be surprised if newer models have/will have tokenizers specifically optimizing for json token length
hey guys
what's the best way to run llms locally with python?
I've tried many different things but none of them worked (vllm, transformers)
You have to have enough RAM, preferably on a GPU. Otherwise, you can't.
Saying that something "didn't work" doesn't communicate anything. What did you try to do, and what happened that was different from what you expected?
I have 16gb of vram
When I say it didn't work, I mean that I kept getting error after error and got burned out
You have to show the code and the whole error message for people to be able to help you.
Ollama
(but if you just try to run them with ollama, and you don't know why you were getting errors before, you'll probably get errors with ollama.)
is this a support channel?
sure
yay
I think I'm having an issue with ROCm
print(torch.cuda.is_available()) returns false
I have an RX 9060 XT by the way
How did you install pytorch?
from the AUR:
yay -S python-pytorch
that seems like you've only installed the CPU only version
The official docs show how to install it with ROCm, with CUDA, and so on
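A quick way to see which backend an installed PyTorch was built for: `torch.version.hip` is a version string on ROCm builds and `None` otherwise, and ROCm GPUs are reported through the `torch.cuda` API. A minimal sketch (`describe_torch_build` is a hypothetical helper name):

```python
import torch

def describe_torch_build():
    """Report which GPU backend (if any) this PyTorch build was compiled for."""
    return {
        "torch": torch.__version__,
        "hip": torch.version.hip,                    # version string on ROCm builds, else None
        "cuda": torch.version.cuda,                  # version string on CUDA builds, else None
        "gpu_available": torch.cuda.is_available(),  # ROCm GPUs also report here
    }

info = describe_torch_build()
print(info)
if info["hip"] is None and info["cuda"] is None:
    print("CPU-only build: install a ROCm (or CUDA) build of PyTorch to use the GPU")
```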
ya
I figure YAML is currently better for LLMs cuz humans understand it better and hence, there's more internet text correctly discussing big complex yaml than big complex json
is the whole "LLMs are a reflection of ourselves" typa thing
at least me personally, I'd rank yaml > XML > json as for ease of understanding
oh it's because I should've installed python-pytorch-rocm, now it works
thanks
honestly, shrug
to my knowledge LLMs are getting more and more RL and other post-training alignment, who knows what goes in there