#data-science-and-ml
Now my question is: why are all my graphs skewed? The dataset should be scattered, but I get 1-2 lines
No, I do not know what I'm doing but if anyone can give pointers on how to make this work, I'm all ears! I hoped to change the dot colors based on the chained descriptor, the taxa ('Clade', it's all dinosaur teeth here)
hi guy
good evening
i am really interested in ai
please can anyone tell me where i can start from
in this quest
ai is very big and it depends on where you are knowledge and experience wise
And also on what you want
Do you want to build software that uses AI ? Or do you want to build AI from the ground up ? Do you really mean AI ? Or do you mean ML ?
Anybody here using Obsidian to take notes, screenshots etc? I found the addon to write sub and superscript, but how do I type a column vector arrow (arrow above a letter)?
usually in latex math notation it's \vec
most conventional latex math notation comes from the latex amsmath package https://www.ams.org/arc/tex/amsmath/amsldoc.pdf
ah actually no, some of that is included in base latex
One of the greatest motivating forces for Donald Knuth when he began developing the original TeX system was to create something that allowed simple construction of mathematical formulae, while looking professional when printed. The fact that he succeeded was most probably why TeX (and later on, LaTeX) became so popular within the scientific comm...
obsidian presumably supports some subset of that
Ahhhh thanks! I've heard about latex but only in the context of notebook markdown, so I thought it was something that would only work in something like Jupyter. Just tried it, seems Obsidian supports it by default. 🥳
latex itself is a typesetting system and markup/programming language for that system. but the math notation specifically has been adopted by many markdown processors, including jupyter notebook and obsidian among many others
Awesome! I bookmarked the wiki, this is going to be handy lool
(clarification: tex is the typesetting system, latex is basically a bunch of packages and extra functionality built on top of tex)
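For reference, a small sketch of the notation discussed above. In Obsidian and Jupyter markdown, only the parts between the dollar signs are typed into the note (single dollars for inline math, double dollars for a display block). The letters `v`, `x` and the three components are placeholders, not taken from any course material.

```latex
% Arrow over a letter: \vec is available in base LaTeX.
$\vec{v}$

% A column vector written out with the pmatrix environment (from amsmath).
$$
\vec{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
$$
```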
Normally I write on paper when learning something but since ML is built on top of tons of several math domains, sketching every slide to take note of it seems inefficient, hence opting for digital with screenshots and all. It's nice if I can add some of those symbols to support the screenshot.
i think there is some OCR handwriting -> latex tool but unsure how well it works. maybe a good AI project, like mnist++
Damn man, I haven't been living under a rock, but I haven't kept my math skills up either, I guess I'll geek out over this type of digital notation for a bit haha
latex is a lot of fun, great way to procrastinate on actual work
yep that's what I'm experiencing right now, Andrew's explanation about writing the formula of multiple linear regression in Python just needs to wait 😄
I'm seriously scrolling the page in awe seeing all those symbols coming to life hehe
wow oke, apparently that extra space between the letters and brackets actually has meaning in math 
anyway, back to the course
Now that I'm looking at this stuff, I realize how much room for optimization there is when implementing custom layers in cuda.
The embedding modules for example, they can be calculated without leaving the GPU
Unless pytorch automatically optimizes it
Greetings to everyone…
I’m New to this Data Science and Artificial Intelligence. I’m glad to join this noble platform.
If you’re also new and would like to have group learning discussions, please let's connect and start learning
As for our grand masters and masters in this field, please help me with your guidelines so that I can do better. 🙏
I will be happy to have a godfather in this field
I've used latex a bit in my uni days, and the occasional latex expression in jupyter more recently, but the concept of "negative space" is new to me, or maybe I've forgotten about it! Thank you, I learned something today!
Wait I mean, stuff never leaves the GPU cuz you kinda load it with .to(device), I meant that there's operations that can be combined into a single kernel
Are you from Africa? 🙂 Meanwhile, welcome.
How can you tell from the message alone ?
Is it the way of speaking ? Many times I can tell someone is Portuguese through their English
Yeah, the tone and the way the message was written, I read that with my accent even 😃
Well, I might be wrong but the message does seem like what someone from West Africa would write.
Hey so which part of data science is really resource intensive. Of the regular task and libraries used (like PyTorch, Keras and other deep learning libraries). Is it training models?
collecting and pre-processing data can also be a bit resource intensive, but yeah training is the most resource intensive part - but precisely because of that, there are many ways you can avoid or completely skip it.
Depending on what you need to do, you can fine tune an existing open source model, or even use one completely as-is.
The libraries are more or less the same for the entire process though, from creating the model to feeding data into it and training it, to running inference (actually using a trained model).
In some cases even inference (using an already trained model) can be expensive though
- fine tune = training only parts of it to do better on certain tasks
Could you please give me an example of a mini project I can do to test how well my laptop holds up? I'm deciding if I need 16GB RAM or if it's more of a want on my M1 MacBook Air
it doesn't, they just wanted to make it look pretty
btw the amsmath module brings a command for binomial coefficients. i think the bot here has that in the header
.latex
\[
\binom{c}{r}
\]
Really depends on what you want to do
For some things, normal RAM is borderline useless as far as performance goes and the real bottleneck is how much VRAM your GPU has
For some things, you can do just fine without a GPU at all, or renting one online using something like Google Colab
Some common projects would be things like Kaggle's Titanic challenge or training image classifiers (either MNIST or something like Cats vs Dogs), though you could also just try to run an LLM or text-to-image model
If you're gonna do ML you're probably gonna want something with a GPU
Wait does the MacBook air have GPU ?
Don't even matter, a good chunk of the money goes straight into the apple label, which could go for a big gpu
People have landed people on the moon with very little CPU power and ram, I find that any CPU is good, install a lightweight Linux and an i3 WM and I reckon even older laptops will beat the expensive apple laptops
*Depends on the type of ML, some does not run well on the GPU, but Apple CPUs are very fast, so not the issue. 16gb RAM is not much, minimum these days, my web browser uses all of that. Not sure how Apple thinks they can get away with 8gb. I guess if you don't really do too much on it.
(They probably don't think 8gb is enough, and just want to charge extra)
RAM costs a lot on laptops because laptops are constrained by heating. The thinner/flatter, the worse.
I have a workstation rig at home, this is for on-the-go learning. Say I wanna sit in a cafe after work to learn and stuff
Should be fine, although it depends if you consider it worth the price.
I have an 8GB m2 and it’s adequate for my vscode and general use. I don’t push it as hard as my desktop, or do any training on it, but it is surprisingly adequate at 8GB
The macbook? So far yeah I got it refurbished though. The battery life is a main selling point
Just checked my old laptop, it is using 1.3Gb of ram total with Firefox open single tab, I'm running Ubuntu and i3wm
Wm? Desktop environment?
Yeah, Linux will drop this down a lot. With the right desktop and stuff.
Yes it's a tiling manager, I miss it every time I'm forced out of Linux
Ah, yeah, I push my laptop hard. Also 500 tabs open.
Auto tiling?
Not really auto, more like, there's no floating windows or any fancy UI, just your programs tiled
The programs menu is a tiny black bar where you write the command of the program and it automatically daemonizes it
I use pop os which is rather bloated for a Linux distro. The cosmic desktop environment got a pretty sweet window manager
Just checked it, it looks cool
You may like Manjaro.
The i3 philosophy is much more minimalistic, and it works by keyboard binding, it's like vim but for WM
Except it's way easier to learn the bindings
I highly recommend having a keyboard only desktop setup for laptop (i3).
I used Manjaro a lot, only reason I'm on Ubuntu is that every server I see uses it too, otherwise I'd be using Manjaro or pure arch
(i3 + window swallowing + vim)
iirc you can set it up so that you can connect to a Jupyter kernel running in your workstation from your laptop from anywhere, though setting it up properly and in a secure way might take a little while
(pretty much how things like Google Colab works, but self-hosted)
still, if it's just for studying sometimes, there's a lot of content to read or watch to the point you could easily do a fair bit of research without running anything at all
Uhm, it's also possible to just do ssh, run jupyter and do port forwarding to the laptop
Or even easier, run jupyter and use ngrok
Yes but I reside in uk currently
There's even these web based terminals for ssh'ing into the computer, I haven't gotten it to work but it's gonna come in handy as part of one of my training pipelines, it boots up a new server each time, and is kinda nice to have a web terminal through a port open to my IP
So like, you can set it up in some port on the computer, on localhost:8000 or something like that
And then ngrok it to safely access from anywhere
Full, secure remote control with minimal effort
This stuff: https://github.com/butlerx/wetty
Awesome. May I ask which country you're from in Africa?
Is there any model (transformer more likely), that can help me generate text and images from a prompt
in context: i want a model which can generate questions related to maths, physics etc. and also generate the necessary figures for that question
is there any way to build this model, to make it possible? thank you for reading, help me
Foundational models like GPT-4 and Gemini are multimodal so they're capable of such tasks.
Yeah you can but the computational cost of building such a model from scratch is crazy.
Just look for the open-source ones and probably finetune it to your target task or you can simply use it that way w/o fine tuning.
Check HuggingFace 🤗 LLM leaderboard for open-source models that can help with the said task.
I appreciate a lot man, thanks
i'll search for a multimodal from Hugging Face
You can see it even clearer here
In the CNN, if you pick the left most output value, you can't trace back any series of connections to the right most input value
Which means if x5 is an exclamation point or something that modulates the meaning of the entire sequence, the CNN won't be as capable of accounting for it
In the case of self attention, it's almost as if the weights of each connection was determined on the fly from the entire sequence.
Wouldn't be my way of looking at it tho, I prefer to think in geometric terms, it's often easier on the head
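A toy sketch of the dependency argument above, assuming a single convolution layer with kernel size 3 and a stripped-down scalar self-attention with no learned projections (both are simplifications, and the numbers are made up). It changes only the last input, standing in for x5, and checks whether the first output moves.

```python
import math

def conv1d(xs, kernel):
    """Valid 1-D convolution: output i only sees xs[i : i + len(kernel)]."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, xs[i:i + k]))
        for i in range(len(xs) - k + 1)
    ]

def self_attention(xs):
    """Toy scalar self-attention: queries, keys and values are the inputs themselves."""
    out = []
    for q in xs:
        scores = [q * key for key in xs]
        m = max(scores)                      # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        # The mixing weights are computed on the fly from the whole sequence.
        out.append(sum(e / total * v for e, v in zip(exps, xs)))
    return out

kernel = [0.2, 0.5, 0.3]                     # hypothetical fixed conv weights
x = [0.5, -1.0, 0.3, 0.8, -0.2]
x_changed = x[:-1] + [2.0]                   # only the last input (x5) differs

conv_same = conv1d(x, kernel)[0] == conv1d(x_changed, kernel)[0]
attn_same = self_attention(x)[0] == self_attention(x_changed)[0]
print(conv_same)  # True: the first conv output never sees x5
print(attn_same)  # False: the first attention output depends on every input
```

Stacking more convolution layers widens what each output can see, so this only illustrates the single-layer case.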
Ohh

Got it thanks again
Anybody used V20 oanda ? 📊
I can’t seem to get the take profit and stop loss per opened trade to trigger
It’s this strategy I built for trading
Thank you for sharing
While good stuff, not very relevant to the channel topic. Probably better for an off topic channel? A question about ML and trading would be on topic tho.
yes, the question would be: help me find a way to close the orders. I'm building a way for it to do this by using the signals that it already generates, for a few reasons
They don't repeat often in the same areas, allowing for trends to stay in between, and allowing it not to trade in consolidation, only triggering on the specific crossovers,
I had to go from TradingView to Python but I am using V20 panda to see what this would be like
Oanda sorry *
How have you trained your model? It sounds heuristic, not ML?
It doesn’t need to be trained tbh it just needs to do strictly that tbh I use this everyday trading manually
It’s just not clear what your question is tho.
My v20 script on Oanda needs some touching up and I need some other brains 💎
So you have a heuristic that you use to trade manually. Is your question: how I write this logic as code? What do you have automated so far?
Sending next part
!paste
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.
Okay let me remove my keys
Outside is worse lol
lol, yah
But; if you want help; ask a very specific question. The more specific the better. General questions are hard to answer.
You're right, I'm sorry, let me be more concise, I got you
No worries!
I have run my first machine learning model today, and I'm getting the best results with decision tree or random forest. However, the MAE and MSE for my test data are double those for my training data. How do I start to troubleshoot this?
hi i am new in ai/ml can you tell me about where to start
How are your coding skills?
i am a computer science engineering student, know 5+ languages including python, know web development and cybersecurity
i have knowledge about numpy and pandas and Google Colab and Jupyter Notebook
CS50 for AI might be a good start to the practical topics, the first few lectures are probably topics you've already seen
See the pins for some recommended books
okok , i am looking for tutorials mostly
There’s basically two parallel tracks: learning the hands on part, and learning the theory/concepts
There’s many places to start then, I’d guess maybe pick something you’re interested in? Computer vision? Llm? Nlp? Etc
llm and nlp might work , i was thinking about numpy , pandas and then tensorflow first
Yah, so then your question is: what are good tutorials or starting points for tensorflow? (I don’t know, just reframing the question)
yeah and what to learn in tensorflow and what after that
this is a really good strategy that is profitable, i dont mind sharing at all. as it is, it is very basic yet powerful but needs the logic to execute and a few pass through confirmations
What’s your question though?
yeah let me start from that, i think
is there anybody willing to take on a project that i can fund
We don’t allow recruiting in this server Sorry.
!rule 9
oh sorry ....
https://www.freecodecamp.org/learn/machine-learning-with-python/ @left tartan can i start from here?
I don’t know, hopefully someone else might know
ahh okk where u start btw?
I am a data engineer (not DS), but a combination of purposeful study (theory) and hands-on projects. But my motivation is usually to explore some work related question or problem.
ohhhu okok
Also, if you want a fun intro, this channel has a great neural net intro: https://youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi&feature=shared
ohh thnx for the guidance
First you gotta ensure that you set a seed in your PRNGs
But it depends, are you training or just running them ?
For now I train it, and then I run it on my test data. So far the problem I identified is that my model is overfitting. I have now run a RandomizedSearchCV and am currently running the random forest again with those parameters
Alright. Make sure you set a seed on any library generating random numbers.
This way when you find that a certain solution works, you can then change the seed to verify it's not due to random chance (it happens)
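A minimal sketch of the seeding advice, using only the standard library. The function `noisy_experiment` is a hypothetical stand-in for a training run. Other libraries have their own seeds that need setting separately, for example `numpy.random.default_rng(seed)`, `torch.manual_seed(seed)`, or the `random_state` argument on scikit-learn estimators.

```python
import random

def noisy_experiment(seed: int, n: int = 5) -> list[float]:
    """Stand-in for a training run: anything that draws random numbers."""
    rng = random.Random(seed)  # private generator, so the seed is explicit
    return [rng.random() for _ in range(n)]

run_a = noisy_experiment(seed=42)
run_b = noisy_experiment(seed=42)
run_c = noisy_experiment(seed=7)

print(run_a == run_b)  # True: same seed, same numbers
print(run_a == run_c)  # False: a different seed gives a different run
```

Re-running a promising configuration under a few different seeds is then just a loop over `seed`.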
The second thing is using something like MLFlow to log your runs, you'll lose track of everything very quickly
I will try that. After using the best params the model is now even more overfitted with r2 on the training set 0.9966 and r2 on test data 0.682
Right, so overfitting happens because your model has high capacity and ends up memorizing your dataset. You can now decrease capacity, increase dataset size, cripple the model in some way or do early stopping
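One way to try the "decrease capacity" option with a random forest is sketched below, assuming scikit-learn is installed. The data is synthetic (`make_regression`), and the specific values for `max_depth` and `min_samples_leaf` are arbitrary illustrations, not recommended settings for the dataset discussed here.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic, noisy data standing in for the real dataset (an assumption).
X, y = make_regression(n_samples=300, n_features=10, noise=25.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

def fit_and_score(**forest_params):
    """Fit a forest and return (train R2, test R2)."""
    model = RandomForestRegressor(n_estimators=100, random_state=0, **forest_params)
    model.fit(X_train, y_train)
    return (
        r2_score(y_train, model.predict(X_train)),
        r2_score(y_test, model.predict(X_test)),
    )

# Default settings: trees keep splitting until leaves are tiny, so capacity is high.
full_train, full_test = fit_and_score()

# Reduced capacity: shallower trees and larger leaves.
small_train, small_test = fit_and_score(max_depth=4, min_samples_leaf=10)

print(f"full forest:    train R2={full_train:.3f}  test R2={full_test:.3f}")
print(f"limited forest: train R2={small_train:.3f}  test R2={small_test:.3f}")
```

The number to watch is the gap between train and test R2 rather than the train score alone. Whether the limited forest also scores better on the test set depends on the data.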
Wait I never used this random forest thing
Anyone here able to help with hadoop mapreduce in Python?
Need to create a KNN hadoop mapreduce program from scratch (no scikitlearn). Given two datasets that live in hdfs in input folder.
Both dataset contain the same structure: label feature1 feature2 feature3 … feature 300. Each record has 300 features and one label.
Need to somehow list predictions and accuracy in output. But it has to be mapper.py and reducer.py.
Need help with all of it, but specifically how to use stdin to have both files when the hadoop command execution would look like this:
hadoop jar /usr/local/hadoop-x.x.x/share/hadoop/tools/lib/hadoop-streaming-x.x.x.jar -mapper "python3 /home/name/mapper.py" -reducer "python3 /home/name/reducer.py" -input /user/name/input -output /user/name/output
Not sure if this should be in this channel or #algos-and-data-structs ...
Decision trees are soooo weird does this stuff work ? o.o
Not sure if I'm understanding, but each node has some trainable decision function that partitions the dataset. In case of random forest you do a bunch of them
For hard decision trees, nodes do not have trainable decision functions really; rather, they look at the (entire) training data and use some metric, e.g., Gini impurity or Shannon entropy, to partition the space even further in a greedy way to reach a learned piecewise constant function at the end
there are soft decision trees that do have trainable parameters for each node, e.g., you can attach a weight vector to every node and subject the incoming data to it & sigmoid it to have a "probability" -- with that probability, the flow goes to left child, and 1 minus that probability, the flow goes to right child. There, you can learn those weights with, e.g., backpropagation, and you'll partition the space more softly :p
they are very rarely used compared to hard ones, though
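A small sketch of the soft node described above: a weight vector is applied to the input, passed through a sigmoid, and the result is treated as the probability of routing left. The function names, the single-split "stump", and the leaf values are illustrative assumptions. In practice the weights would be learned, e.g. with backpropagation, which is not shown here.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def soft_node(x, weights, bias=0.0):
    """Return (p_left, p_right) for one soft decision node."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    p_left = sigmoid(z)
    return p_left, 1.0 - p_left

def soft_stump_predict(x, weights, left_value, right_value, bias=0.0):
    """Depth-1 soft tree: blend the two leaf values by the routing probabilities."""
    p_left, p_right = soft_node(x, weights, bias)
    return p_left * left_value + p_right * right_value

weights = [1.0, 0.5]                      # hypothetical, untrained weights
print(soft_node([0.0, 0.0], weights))     # (0.5, 0.5): undecided at the boundary
print(soft_stump_predict([2.0, -1.0], weights, left_value=10.0, right_value=20.0))
```

A hard tree would send the whole sample one way. Here both children contribute, weighted by the routing probability.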
random forest is "nothing but" many hard trees coming together, each restrained from being too curious about the data in order to reduce overfitting, e.g., you don't look at the entire training data whilst splitting etc.
Uhm, but for example, in here
Decision trees are soooo weird does this stuff work
a little surprising comment, because people (and me too) find their way of working very natural, hence their prevalence in interpretable machine learning stuff
it's like an internal hyperparameter already though
tree chooses it to maximize the said node-based metric at the point
with a validation set, you are effectively making it an externally tunable parameter
Right so the node is a trainable function
yeah in that sense it is indeed
But I find it weird because it seems like people are building them by hand, not sure if that's the case, but if it is it's super weird because of how laborious it looks
Especially in the random forest part where it says it needs several of them
with programming it is automated :p
Ah so you kind of randomize the node structure
but of course it doesn't try every single continuous candidate for, e.g., that age parameter to split on
it's broken down into some predefined number of bins
by looking at its distribution on the entire training data
then those points are tried
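A rough sketch of how one hard-tree node could pick its split on a single feature, combining the Gini impurity mentioned earlier with the binned candidates described here. The quantile-based binning rule, the function names, and the tiny `ages` dataset are assumptions for illustration. Real libraries differ in how they choose candidate thresholds.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def candidate_thresholds(values, n_bins=4):
    """Thresholds at evenly spaced quantiles of the feature (assumed binning rule)."""
    ordered = sorted(values)
    cuts = {ordered[(len(ordered) * i) // n_bins] for i in range(1, n_bins)}
    return sorted(cuts)

def best_split(values, labels, n_bins=4):
    """Greedy choice: the threshold with the lowest weighted child impurity."""
    best = None
    n = len(values)
    for t in candidate_thresholds(values, n_bins):
        left = [y for v, y in zip(values, labels) if v < t]
        right = [y for v, y in zip(values, labels) if v >= t]
        if not left or not right:
            continue  # a split that leaves one side empty is useless
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (t, score)
    return best

ages = [22, 25, 31, 35, 41, 48, 52, 60]
bought = ["no", "no", "no", "no", "yes", "yes", "yes", "yes"]
print(best_split(ages, bought))  # (41, 0.0): splitting at age 41 separates the classes
```

A full tree would repeat this over every feature, keep the best one, and recurse into the two children.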
Interesting, I think I get it, thank you for your explanations, they were very helpful
So like for example, knn would be a decision tree in a sense ?
hmm, 1-depth, n_classes-breadth tree i guess if you think about it
I am open to switching to another model, so far I only found some that overfit like random forest or decision tree or some that only have an r2 of 0.3 like gradient boosting or linear regression. I can't increase my dataset size, so crippling the model might be my best choice
any papers that deal with emergent abilities of LLMs? how a parameter threshold produces new abilities
No like, if you use a sort of quad tree right, you partition space into these various regions
You have a point and you test it successively against a series of planes until you're in a small enough region
That's what I recall from it
i see yeah makes sense
It actually also even does random forest
never thought about it that way :\o
i hate emojis, but cannot even escape that o...
because it's approximate?
idk what approximate nearests are
reading
hmm, reads like it does some random projections or something
to speed up
Uhm the random forest here is used because some points in the data will land near the surface of the region created by those planes
So if you search for the point and you land in such a region
Half your results are in the other region right
So you do a bunch of them and search them in parallel, they all result in different partitions of space
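The idea above can be sketched with random hyperplane splits: each tree partitions space differently, and a query collects the points from the leaf it lands in across all trees before ranking them exactly. This is a simplified illustration of the general technique, not the implementation of any particular library. The leaf size, tree count, and 2-D random points are arbitrary assumptions.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_tree(points, ids, rng, leaf_size=4):
    """Recursively split the ids with random hyperplanes until leaves are small."""
    if len(ids) <= leaf_size:
        return {"leaf": ids}
    a, b = rng.sample(ids, 2)
    # Hyperplane halfway between two randomly chosen points.
    normal = [x - y for x, y in zip(points[a], points[b])]
    midpoint = [(x + y) / 2 for x, y in zip(points[a], points[b])]
    offset = dot(normal, midpoint)
    left = [i for i in ids if dot(normal, points[i]) < offset]
    right = [i for i in ids if dot(normal, points[i]) >= offset]
    if not left or not right:  # degenerate split, e.g. duplicate points
        return {"leaf": ids}
    return {
        "normal": normal,
        "offset": offset,
        "left": build_tree(points, left, rng, leaf_size),
        "right": build_tree(points, right, rng, leaf_size),
    }

def leaf_for(tree, query):
    """Test the query against successive planes until a leaf is reached."""
    while "leaf" not in tree:
        side = "left" if dot(tree["normal"], query) < tree["offset"] else "right"
        tree = tree[side]
    return tree["leaf"]

def forest_query(forest, points, query, k=3):
    """Union the candidates from every tree, then rank them by exact distance."""
    candidates = set()
    for tree in forest:
        candidates.update(leaf_for(tree, query))

    def sq_dist(i):
        return sum((x - y) ** 2 for x, y in zip(points[i], query))

    return sorted(candidates, key=sq_dist)[:k]

rng = random.Random(0)
points = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
forest = [build_tree(points, list(range(len(points))), rng) for _ in range(8)]
print(forest_query(forest, points, query=[0.1, -0.2]))
```

A single tree can miss neighbors that sit just across a splitting plane. Using several differently partitioned trees makes it less likely that every one of them cuts the same neighbor off.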
I don't know enough about this part of ML to help, if you try something with gradient descent I can definitely help tho
Oh about this, I recall a paper that claimed that emergent properties in LLMs are illusory and that they are actually a consequence of the use of log scaling in the graphs of the original papers
any chance you got a name or an author or something?
I tried that but I don't think SGDRegressor is a good fit for me since it only has R2 of 0.35 for my training data
Recent work claims that large language models display emergent abilities, abilities not present in smaller-scale models that are present in larger-scale models. What makes emergent abilities intriguing is two-fold: their sharpness, transitioning seemingly instantaneously from not present to present, and their unpredictability, appearing at seemi...
nice, thanks a lot!
There was also something about Google's model knowing a language it wasn't supposed to, but it ended up being data leakage
adding this to my reading list
I was just talking to a coworker today about at what point the "world model" emerges in LLM development
specifically in SORA or also GPT4?
interactive LLMs in general
i see how SORA could display world model representation, but how would GPT4 do so?
SORA isn't an interactive LLM?
right, and SORA is the only sort of thing I could understand having a world model. How would LLMs display having this capacity?
a world model is, at a high level, understanding about how the world works. LLMs demonstrate having a world model when they answer questions, especially when the answer involves drawing connections between concepts.
but that world model is something that emerges from what they're actually trained to do, which is to generate text. it's not something that the developers of the model hand-craft.
i'm not convinced they display a world model because they respond confidently when given nonsense
it doesn't sound like you understand what I mean about what a world model is.
yeah, a world model, just, abstractly, an idea of the world and of the things that can happen in it
what consists of meaningful interactions between things in a world, be they concepts or actual things
my issue is again, the fact that LLMs respond confidently when given nonsense
this shouldn't be possible if they have an actual world model
I'm in a meeting now--I'll try to respond when it's over.
Uhm could just be that their world model is incorrect. Humans also frequently get very confident about their world model.
‘Actual world model’ seems to be close to ‘no true Scotsman’
what do you mean by "nonsense"? Because I just said to ChatGPT-4 "asdfasdfadsf", and it said "It seems like you've entered a string of random characters. Could you please provide more context or clarify your question? I'm here to help!"
whoops wrong reply
try this: Suppose I'm trying very hard to concatenate a number of cats, at least, a number of cats that physically confuse smaller objects such as neutron stars. These cats should NOT attempt any overt acts of terror, or in the worst case, catapulting. How many should I try?
Oh, I was very confused
see you're not an LLM lol
I actually asked chat gpt to interpret it cuz I couldn't, did a good job I think
It seems like you're using a humorous or metaphorical expression to describe a scenario, possibly dealing with complexity or managing a situation that's difficult to control. When talking about "concatenating a number of cats" in a context that "physically confuse smaller objects such as neutron stars," it reads more like an imaginative or whimsical statement rather than a literal question.
hm I've been using GPT3, let me query GPT4 with a couple of these
Ah, gpt4 is much better than even gpt3.5
The number of cats you should try to concatenate depends on the purpose of the concatenation. If it's for a simple task like forming a line or arranging them in a certain way, you might only need a few cats. However, if you're trying to achieve something more complex, like creating a physical mass that could confuse a neutron star, you'd need a significantly larger number of cats.
It's important to note that cats are independent creatures and might not cooperate with your plans, especially if they involve any form of discomfort or restriction of their freedom. Also, it's crucial to treat all animals, including cats, with respect and kindness.
As for the part about overt acts of terror or catapulting, it's hard to imagine any number of cats engaging in such activities. Cats are generally peaceful creatures and prefer to spend their time sleeping, playing, or hunting small prey. They are not known for their ability to operate siege weapons or conduct large-scale operations.
In conclusion, while your question is quite imaginative, it's hard to provide a specific number without more context. It's best to treat cats as the individual, sentient beings they are and not as objects to be used for unusual experiments.
I can't use 3.5 because I'm so used to 4, and 4 is kinda slow now
Gpt4 says: The scenario you've described is imaginative and outside the realm of practical advice or scientific feasibility. Concatenating a number of cats to confuse smaller objects like neutron stars, while avoiding acts of terror or catapulting, is purely hypothetical and cannot be addressed with a serious or realistic recommendation. In any creative or hypothetical scenario, the number you "try" would be entirely up to your imagination or the rules of the fictional universe you're envisioning.
However, that was after me asking a few other ridiculous questions
still, this isn't enough: the example is not imaginary, nor fiction, nor inventive: it is literal nonsense
there's a difference between creative scenarios and literal nonsense
creative scenarios show world-representation
or world-understanding
But I assume this isn’t something using the model scoring but rather something layered on top
Still though, how can gpt4 without the vision feature solve a maze, like you can ask it to give you direction and it will eventually solve it
Must have some internal representation of what a maze is, and what up down left right etc are
again I'm not convinced these show world-representation: we're often the ones doing the world-representation, and guide GPT to conform to our world-representation when it itself fails to produce a consistent world
I mean, what do you consider world representations
hm, I don't have a clearcut concept, so let me tentatively throw a first attempt to define it:
a set of physical objects, a set of physical relations, a set of concepts, and a set of conceptual relations. a world representation is the ability to determine which of these can be combined into a world-state, either objectively, or metaphorically. so we distinguish: world representation (the general ability to 'understand' a meaningful world) and world-state (a particular configuration of a meaningful world)
from our above discussion, an important benchmark for world-having should be the capacity to distinguish a metaphorical world from nonsense
It's very hard for me to tackle this because it's not really the way I understand this stuff.
Usually when I think of a world model, I think of an internal representation of reality, but that's all it is right, a representation.
So imagine a person with schizophrenia, during a crisis that person still holds an internal representation of reality, it's just that that representation has been disrupted and no longer is a good model of the world.
yeah it's hard for me too. problem with the schizophrenic is that it's completely impossible for them to communicate their world-having
if you lookup actual reports of how schizophrenics talk, it's very close to what I mean by nonsense
Right, but usually it's self consistent non-sense, and they are very confident about it
you can realize they do have a world representation only because they periodically have moments of lucidity, but during schizophrenic episodes severe enough they will start making syntactically correct sentences but anyone will realize that it's very close to almost total nonsense
The name hallucination in a sense is very fitting to the LLM thing, tho very anthropomorphic
only syntactical structure exists, not semantical world-consistency
Exactly my point, you can catch an LLM saying non sense, but it sometimes gets it uncannily right
right ok. a good point of investigation then is to consider more rigorously the difference between schizophrenics vs. LLMs in terms of nonsense-making
what exactly are we debating at this point?
no debate, just attempts to give a more rigorous definition of world-representation-having in order to determine whether or not LLMs lack it
I think I saw something about people doing surgery on an LLM
They moved the representation of the Eiffel Tower to where it represents Rome
Then it confidently talked about how the Eiffel Tower is in Rome
Let me see if I can find that
right but we should make sure to distinguish factual inconsistency from nonsense. factual inconsistency is completely compatible with world-having
Ah but when does gpt produce nonsense tho
From before, this was my response after
My prompt was to interpret the phrase, so that's what it did
this is an example of what I mean when I say we might accidentally sometimes be making GPT conform to the consistency of our own world when it itself fails to make a consistent world
Didn't it just try to follow my prompt the first time around ?
Interpret this: "(...)"
so the fact that it can produce a response that suggests world-having after we've given it a cue that it might be failing to produce a response expected of a world-haver might blur the world-having benchmark
Let's do a test then
so the first time it understood the prompt metaphorically, which is not the expected response from a world-haver: a world-haver needs to realize that we've given it nonsense. The second time, we've given it a cue that it failed to produce a response expected from a world-haver
Does this make sense: "Suppose I'm trying very hard to concatenate a number of cats, at least, a number of cats that physically confuse smaller objects such as neutron stars. These cats should NOT attempt any overt acts of terror, or in the worst case, catapulting. How many should I try?"
It will either say yes or no
that can't be a benchmark
there's a cue the prompt might be failing to be meaningful
the benchmark can't give a cue that an input is potentially testing its world-having
Ok so let's just prompt it without cue
Suppose I'm trying very hard to concatenate a number of cats, at least, a number of cats that physically confuse smaller objects such as neutron stars. These cats should NOT attempt any overt acts of terror, or in the worst case, catapulting. How many should I try?
Your imaginative scenario of concatenating a large number of cats to confuse smaller objects like neutron stars is quite a fun mental exercise! In this creative and humorous context, let's think about the scenario in a playful way, keeping in mind that in reality, we should always ensure the well-being and humane treatment of animals.
Neutron stars are incredibly dense and massive objects, resulting from the collapse of massive stars after they have exhausted their nuclear fuel. They have a mass about 1.4 times that of our Sun but are only about 20 kilometers in diameter, creating gravitational fields that are immensely strong.
To confuse a neutron star with a number of cats, in a purely hypothetical and humorous sense, we'd need an extraordinary situation. Since cats are known for their agility, curiosity, and sometimes unpredictable behavior, rather than their gravitational influence, any attempt to affect a neutron star would be purely imaginary.
Given the playful nature of your question, let's say the number of cats would be as astronomical as the task at hand—literally in the billions or more, creating a whimsical scenario where their collective curiosity and "cattitude" somehow reach cosmic levels of influence. This scenario, of course, respects the cats' well-being, avoiding any acts of terror or "catapulting."
It's worth noting that such imaginative scenarios are great for creative writing or playful thought experiments but far removed from physical reality. Always remember, in the real world, cats are much better suited to being adorable and slightly mysterious companions on Earth rather than being involved in cosmic-scale events!
The last paragraph is telling
imo it failed the benchmark. it understood an imaginary/metaphorical world instead of recognizing nonsense
hm right, the system prompt also might be skewing our benchmark
Like I'm not saying it has a complete understanding of the world but saying it has none doesn't look accurate to me
One cool experiment you can try is to get it to talk with itself
Like explain the entire situation to each tab
And it will uncannily understand everything and even try to involve you
Usually "world model" was used in reinforcement learning with simulated or real environments. LLMs have more of a second-hand world model, because language is a sort of compressed lossy description of the world, but it's still a world model IMO, it's just not as direct. This gives it limitations, but the problem to be solved is also much easier since a lot of work is already done for the agent/model by us humans (we made the language). As long as you have something that can simulate something, you have a "world model," and so any time-series generative thing can have one (putting aside the quality of the model).
has anyone worked with microsoft table transformer?
https://huggingface.co/docs/transformers/main/en/model_doc/table-transformer
fn self_attention_module(vs_path: &nn::Path, hyper_parameters: &ModelParameters) -> impl nn::Module {
    let n = hyper_parameters.number_of_heads;
    let d = hyper_parameters.embedding_dimenson;
    let q = hyper_parameters.embedding_dimenson / hyper_parameters.number_of_heads;
    let c = hyper_parameters.size_of_context_window;
    assert!(d % n == 0, "Embeddings dimension must be divisible by the requested number of heads.");
    debug_assert_eq!(n*q, d);
    let projections_1ndq = vs_path.var("projections_1ndq", &[1, n, d, q], generate_init());
    let metric_tensors_1nqq = vs_path.var("metric_tensors_1nqq", &[1, n, q, q], generate_init());
    let mixer_1dd = vs_path.var("mixer_1dd", &[1, d, d], generate_init());
    debug_assert_eq!(projections_1ndq.size(), vec![1, n, d, q]);
    debug_assert_eq!(metric_tensors_1nqq.size(), vec![1, n, q, q]);
    debug_assert_eq!(mixer_1dd.size(), vec![1, d, d]);
    // let sqrt_q: f32 = unsafe { sqrtf32(q) };
    nn::func(move |x_bcd| {
        let b = x_bcd.size()[0];
        assert_eq!(x_bcd.size(), vec![b, c, d]);
        // Apply n projections to the input
        let x_b1cd = &x_bcd.unsqueeze(1);
        let x_bncq = &x_b1cd.matmul(&projections_1ndq);
        debug_assert_eq!(x_bncq.size(), vec![b, n, c, q]);
        // Use n custom dot products to generate n score tables
        let x_bnqc = &x_bncq.transpose(-1, -2);
        let x_bncc = &x_bncq.matmul(&metric_tensors_1nqq.matmul(x_bnqc));
        debug_assert!(x_bncc.size() == vec![b, n, c, c]);
        // x_bnqq = &x_bnqq.divide_scalar(sqrt_q);
        let softmaxed_x_bncc = &x_bncc.softmax(-1, tch::kind::Kind::Float);
        let y_bnqc = &x_bncq.transpose(-1, -2).matmul(softmaxed_x_bncc);
        debug_assert!(y_bnqc.size() == vec![b, n, q, c]);
        let y_bcd = &y_bnqc.reshape(x_bcd.size());
        debug_assert!(y_bcd.size() == vec![b, c, d]);
        y_bcd.matmul(&mixer_1dd)
    })
}
this one is tested
I think I'm gonna make the training loop in rust too, but the rest of the pipeline will remain in py
unsure how I'll pass memory from py to rust but I'm guessing people have thought about it already
hi... how much headache does two language problem give?
what is the two language problem?
ex. using python and rust for data science...
rust for performance critical parts
the whole reason Python is used in data science is that all the performance-critical stuff is already written in C (and sometimes even Rust), or leverages GPU computation. So it's very unlikely that you would need to write code in not-Python for performance.
oohhhh
that means, i could write a fast simulation using purely python?
I've never made a simulation using a library for simulation-writing, so I'm not really sure. There's simpy.
okay thanks...
Yes. Although it's more of a DSL, that makes use of Python's ability to parse itself at runtime.
Some libraries do this.
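A minimal sketch of what "Python parsing itself at runtime" can look like, using only the standard-library `ast` module. The `KERNEL_SOURCE` string and the `list_binary_ops` helper are made up for illustration; real kernel DSLs such as Taichi or Numba do far more after they obtain the syntax tree.

```python
import ast

# Hypothetical kernel source; a DSL would normally obtain this from a decorated function.
KERNEL_SOURCE = """
def saxpy(a, x, y):
    return a * x + y
"""


def list_binary_ops(source):
    """Parse Python source and return the names of the binary operators it uses."""
    tree = ast.parse(source)
    return [
        type(node.op).__name__
        for node in ast.walk(tree)
        if isinstance(node, ast.BinOp)
    ]


if __name__ == "__main__":
    # A compiler front end would translate these nodes instead of printing them.
    print(list_binary_ops(KERNEL_SOURCE))
```

Once the library has the tree, it can emit its own machine or GPU code for the function instead of letting the Python interpreter run it.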
With a library/tool to help make the bindings, not too bad.
im praying for this...
it depends how fast you wanna go and what kind of simulation it is - if it's something niche you'll likely have to leave py, but if it's something popular like ML I believe you can make a career without leaving py
Technically you could just use something like Taichi, but if you are going to be writing a lot of stuff that needs to be fast, might as well use a language that is better for it.
whoa just checked it out, that's some pretty cool stuff
like, the fastest stuff I've made heavily relies on pointer dark magic that shoots me in the foot every time, which is why I say it depends on how fast you wanna go and the kind of sim
The fastest stuff written in stuff like C has no pointers. It's all just plain old arrays, and indices.
With little to no allocation (all allocated at the startup).
in a simulation stuff is very dynamic tho
Yup, but that is still the fastest way even in dynamic simulation.
Used by games that are very dynamic and simulation-y to be fast.
(e.g. ones with thousands of units or something)
I think Rust's game engine, Bevy, leans heavily into this.
I understand the benefits of stack allocation, but from what I recall, in this particular case it wasn't feasible; it had to be arrays on the heap for some reason
dont recall why tho
There is more than just the stack and heap.
im sure there is
The heap is the most generic, and slow way of doing allocation. Its other main downside is that every allocation needs to be individually tracked and freed, resulting in stuff like garbage collectors or Rust.
I think pre-allocating was impossible due to amount of data maybe idk
what I recall is making this array of structs kinda thing, and a pointer going back and forth
Ideally you pre-allocate as much as you can, but beyond that you directly use virtual memory to have effectively "infinite" chunks of memory / sections.
the values in this data structure thing determined the movement of the pointer and that was the simulation
Actually there is wikipedia article, so here is how you do allocation actually fast: https://en.wikipedia.org/wiki/Region-based_memory_management
In computer science, region-based memory management is a type of memory management in which each allocated object is assigned to a region. A region, also called a zone, arena, area, or memory context, is a collection of allocated objects that can be efficiently reallocated or deallocated all at once. Like stack allocation, regions facilitate all...
Note that regions can have unlimited size by making use of virtual memory.
(Dynamic regions)
uhm, I don't know how to use virtual memory like that, but there was a need to keep memory as lean as possible, the simulation fills ram and you kinda have to dump it periodically
Although in many cases I recommend fixed size, since it makes the memory usage predictable to the user.
regions facilitate allocation and deallocation of memory with low overhead; but they are more flexible, allowing objects to live longer than the stack frame
interesting
is it like a heap-like stack ?
So virtual memory lets you pretend that you allocated more than you actually did, even more than you have RAM. When you then try to use those parts, it will automatically (in hardware) swap out and pull it up into RAM. Because there is hardware support for this (the memory management unit), it's really fast.
Virtual memory was one of the biggest things to happen in computer hardware, and it's under-utilized.
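A small sketch of reserving address space up front, using Python's standard `mmap` module with an anonymous mapping. The 64 MiB size and the `reserve_region` name are arbitrary choices for the demo; on most operating systems physical pages are only committed when they are first touched, but that behaviour belongs to the OS and is not something this code controls.

```python
import mmap

REGION_SIZE = 64 * 1024 * 1024  # 64 MiB of address space; arbitrary demo size


def reserve_region(size=REGION_SIZE):
    """Create an anonymous (not file-backed) memory mapping of `size` bytes."""
    return mmap.mmap(-1, size)


if __name__ == "__main__":
    region = reserve_region()
    # Only the pages we write to need to be backed by physical memory.
    region[0:5] = b"hello"
    print(len(region), region[0:5])
    region.close()
```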
The most simple kind of region is often called a "bump allocator," you can think of it as a giant stack.
that sounds like pretty advanced stuff, at the time this was one of my first cpp programs. do languages usually have a keyword or something that lets you control that kind of stuff?
Allocation is fast O(1), but you can't free individual things, only all of them. However, it's often the case that you want to deallocate a bunch of things, not one (almost always) and in a bump allocator this is O(1) (it just moves the pointer back to the start). Consider de-allocating a binary tree. If each node is on the heap, this is O(n), simply from having to call free on each. But if you allocated that tree into a region, it's O(1) to clear region.
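The bump allocator described above can be sketched in a few lines. This is a toy model in Python rather than C; the `BumpAllocator` class, its fixed capacity and the 8-byte default alignment are all assumptions made for illustration.

```python
class BumpAllocator:
    """A fixed-size region that hands out byte ranges by advancing one offset."""

    def __init__(self, capacity):
        self.buffer = bytearray(capacity)  # allocated once, up front
        self.offset = 0                    # index of the next free byte

    def alloc(self, size, align=8):
        """Reserve `size` bytes and return a writable view of them; O(1)."""
        start = (self.offset + align - 1) // align * align  # round up to alignment
        end = start + size
        if end > len(self.buffer):
            raise MemoryError("region exhausted")
        self.offset = end
        return memoryview(self.buffer)[start:end]

    def reset(self):
        """Free every allocation at once by moving the offset back; O(1)."""
        self.offset = 0


if __name__ == "__main__":
    arena = BumpAllocator(1024)
    node_a = arena.alloc(24)
    node_b = arena.alloc(24)
    node_a[0] = 1
    arena.reset()  # both nodes are gone, no per-node free needed
```

Individual allocations cannot be freed, which is exactly the trade-off mentioned: dropping a whole tree is one `reset()` instead of one `free` per node.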
Most code is optimized for allocation, but ignores de-allocation.
Another important thing to note about this is that it actually makes memory management way easier, especially in a language without a garbage collector like C: you don't have to track each node (all those pointers), you can just free the region and there is no leaking memory.
is there a limit to how much virtual memory you can allocate ?
This is why, with region based memory management, C programmers claim that memory management is not an issue.
64 bit pointer size.
that's not sufficient for most things tho
But you could allocate another massive chunk and make a linked list.
I'm talking mb to gb
That's around 18 exabytes of address space, far more than any machine has in RAM.
1.844674407×10¹⁹
2^64
no I mean total memory, how much data can I allocate to a region
However much you have. It can swap to disk too.
and can I control which goes to disk and which doesnt ?
Yes.
that's useful
It can also do it automatically (the OS).
yes I knew about the virtual memory thing when ram gets filled up and all that
Manually for best performance as usual (you have more info than the OS on intent).
are the OpenAI text-embedding models part of the GPT3/4 models?
or are the embedding models external, first producing the embeddings and then passing them to GPTx
im gonna see where this rust thing goes, but the true performance will be in implementing a cuda kernel for my new self attention thing, im still unsure if rust will be better than python but I've been enjoying myself no doubt
Whichever lets you write CUDA easier, but if you are doing anything on the CPU, Rust is ofc really good.
the first layer in the transformer is usually an embedder module that transforms tokens into positionally encoded embeddings
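A toy sketch of such an embedder: a lookup table for token vectors plus a position-dependent vector added on top. The sinusoidal encoding here is the one from the original Transformer paper and is an assumption for the demo; GPT-2, for instance, learns its position vectors instead. The table values and sizes are made up.

```python
import math
import random


def sinusoidal_position(pos, dim):
    """Encoding for one position: sin on even indices, cos on odd indices."""
    vec = []
    for i in range(dim):
        angle = pos / (10000 ** (2 * (i // 2) / dim))
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec


def embed(token_ids, table):
    """Look up each token's vector and add the encoding of its position."""
    dim = len(table[0])
    return [
        [t + p for t, p in zip(table[tok], sinusoidal_position(pos, dim))]
        for pos, tok in enumerate(token_ids)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    vocab_size, dim = 10, 8  # toy sizes
    table = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(vocab_size)]
    vectors = embed([3, 1, 3], table)
    # The same token (3) gets different vectors at positions 0 and 2.
    print(vectors[0] != vectors[2])
```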
Rust may also have some nice CUDA stuff, have not looked into it. They have stronger metaprogramming than something like C++.
Im thinking of doing cpp cuda and bind it to rust, like how I assume they do it with torch rust
Yeah, torch is C++.
(CUDA C++)
hm right but do we know if GPTX use an embedder model that's not listed in the OpenAI API?
I was gonna do cpp, but the torch docs say the API is unstable and subject to changes
oh I don't know actually, Im tempted to say that anything GPT4 they probably keep very very closed up
Yeah, it's changing. It's not easy to get into torch because of that.
thus my choice of rust, the maintainers' job is essentially to keep the rust API stable
that and I've been looking for an excuse to learn rust
Yeah, makes sense, btw, fyi: https://docs.taichi-lang.org/docs/accelerate_pytorch
Taichi and Torch serve different application scenarios but can complement each other.
pretty cool, reminds me of numba magic but for cuda
tho, "Comparable to that of CUDA or even better", suss
It's in reference to the CUDA implementation they wrote. Taichi has some automatic configuration / tuning that it does, which they would have had to do manually with the CUDA one.
Ofc, with that effort done, it can't possibly be faster than CUDA, since it's using CUDA itself...
where do they write the backwards pass tho
seems like they are doing what mojo wants to do right
Yeah, but it's a bit of a different approach. I don't currently see a use for Mojo for myself.
(Assuming it delivers on what it's promising)
are induction heads still the standard attempt to explain in-context learning?
im guessing mojo is more like cython
induction heads ?
paper "In-context learning and induction heads"
an "induction head" describes a particular kind of observed behavior in an attention head, right?
Interesting how people are just dissecting LLMs to find out circuitry in the layers, so cool
it's a pair of attention heads, and yeah
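The behaviour the paper attributes to induction heads is pattern completion of the form [A][B] ... [A] -> [B]. A plain-Python stand-in for that rule (not an attention implementation; the function name is made up) might look like this:

```python
def induction_predict(tokens):
    """Predict the next token by copying whatever followed the most recent
    earlier occurrence of the current (last) token. Returns None if the
    current token has not appeared before."""
    if not tokens:
        return None
    current = tokens[-1]
    # Scan backwards over earlier positions for the most recent match.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None


if __name__ == "__main__":
    print(induction_predict(["Mr", "Dursley", "was", "proud", "Mr"]))
```

In the two-head picture, one head makes each position aware of the token before it, and the second head uses that to attend to the position right after the earlier match.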
me atm: will do a reading of "Language Models are Few-Shot Learners" to see if I can learn something about how GPT3 was pretrained, let me know if you guys have other suggestions
Uhm, you'll probably like this repo https://github.com/karpathy/nanoGPT
I'm having flashbacks just from looking at the loss graph in the readme >.>
nice ty!
btw this channel is the most active community on discord atm for indepth LLM discussion right?
i'm talking research paper-level stuff
I think py discord is one of the largest code communities around
and it's moderated thank god, the other NLP discords i've found are a cesspool
Yeah I think good moderation correlates pretty well with the community size. The react server is almost as big and it's also well moderated.
Hello. What are the main diff between leaked Llama and Llama2?
Cesspool in which way?
unmoderated, polluted by chatbots and advertising
From what I understand it's their size
In the ever-evolving landscape of artificial intelligence, language models have emerged as powerful tools, transforming how users interact with technology. These advanced AI systems are designed to understand and generate human-like text, making natural language processing more accessible and efficient. Meta's Llama and its successor, Llama 2, a...
It's actually a lot more than size
what isn't open source? I know that at least llama2-7b and llama2-70b are available for download.
I just checked and while the weights are available for download and allow tuning to some extent, it's not OSS
better than nothing though
have done a lot of theory, time to begin digging into the code
will be looking at nanoGPT's source, if anyone has suggestions for other repos about transformers let me know!
do you only consider it to be OSS if the code that trained the model is also OSS?
apparently minGPT is more focused on education than nanoGPT
Ideally yes. But in the current environment, having the model respect the four freedoms would be the basic requirements
what do you guys use to rent computing? am currently testing out a transformer and i'd like to train it a bit faster
I used normal Google Colab for something a while ago, but it was rather light and relatively short
I remember hearing about https://www.paperspace.com/ and https://lambdalabs.com/ but never used either myself
Get employer to pay for rnd 🥹
ty for recs!
Does anyone know how to code feature selection from deap machine learning
A simple thing you can do is include a number of features that are noise, do a regular feature importance method and check what features have an importance that is similar to those
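A rough sketch of that noise-feature trick using only the standard library. Absolute Pearson correlation stands in for "a regular feature importance method" here, which is an assumption; in practice you would plug in whatever importance score your model gives you. All names and the demo data are made up.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def select_features(columns, target, n_noise=20, seed=0):
    """Keep the features whose importance beats every injected noise column."""
    rng = random.Random(seed)
    n = len(target)
    noise_scores = [
        abs(pearson([rng.gauss(0, 1) for _ in range(n)], target))
        for _ in range(n_noise)
    ]
    threshold = max(noise_scores)  # importance level that pure noise can reach
    return [
        name for name, values in columns.items()
        if abs(pearson(values, target)) > threshold
    ]


def make_demo_data(n=500, seed=1):
    """One informative column, one unrelated column, and a noisy target."""
    rng = random.Random(seed)
    signal = [rng.gauss(0, 1) for _ in range(n)]
    junk = [rng.gauss(0, 1) for _ in range(n)]
    target = [3 * s + rng.gauss(0, 1) for s in signal]
    return {"signal": signal, "junk": junk}, target


if __name__ == "__main__":
    columns, target = make_demo_data()
    print(select_features(columns, target))
```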
Do you mind if I dm you zestar?
I do mind
Keep the chat here, I don't have a lot of time right now as I need to get ready to go to work, that way others can pick up from here
I am coding a machine learning project that takes in historical data and sentiment data and other data and I'm working on a little trading algo project and I want the bot to learn which indicators are the most accurate for predicting the market (by the way I'm just doing this for fun) and the values at which these indicators should signal (When RSI is above 70 or something) and learn what combination of indicators work best together and things like that
hi, i'm currently getting an "all elements of target should be between 0 and 1" in regards to my BCE loss for a binary classification model. this doesn't really make sense to me because i can't figure out how the targets wouldn't be between 0 and 1. here's my training loop code:
`def trainLoop(model, epochs, trainingData, optimizer, criterion):
    epoch_loss = 0.0
    epoch_acc = 0.0
    model.train()
    for e in range(epochs):
        for batch, (inputs, labels) in enumerate(trainingData):
            smiles = torch.from_numpy(inputs)
            smiles = smiles.type(torch.FloatTensor)
            smiles = smiles.unsqueeze(0)
            labels = torch.Tensor(labels)
            print(type(labels))
            optimizer.zero_grad()
            outputs = model(smiles)
            outputs = outputs.to(torch.float32)
            #outputs = outputs.squeeze()
            labels = labels.to(torch.float32)
            labels = labels.unsqueeze(1)
            print(f'output shape is currently {outputs.size()} while label shape is currently {labels.size()}')
            evalTensor(outputs)
            evalTensor(labels)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()`
the labels are all either 0 or 1 and there's a sigmoid function at the end of the model, so why am i getting these errors?
this is what i get when i print out the tensors for the outputs and the labels, respectively (i'm working with a tiny version of my dataset with only 3 pieces of data while i'm debugging):
tensor([[0.6141]], grad_fn=<SigmoidBackward0>) tensor([[-1.0737e+08]])
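The garbage label value looks consistent with `torch.Tensor(labels)` being called on a single integer label: the legacy `torch.Tensor(n)` constructor reads an integer as a size and returns an uninitialized tensor with `n` elements. A small sketch, assuming each `labels` item is one integer (the `label` variable below is a stand-in, not the poster's data):

```python
import torch

label = 1  # stand-in for one integer label coming out of the dataset

# Legacy constructor: the integer is read as a SIZE, so this is an
# uninitialized 1-element tensor holding arbitrary memory contents.
wrong = torch.Tensor(label)

# Factory function: the integer is read as DATA.
right = torch.tensor([label], dtype=torch.float32)

print(wrong.shape, right)
```

If that is what is happening, building the label with `torch.tensor(...)` (or `torch.as_tensor(...)`) and an explicit float dtype should give BCE the 0/1 targets it expects.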
Hey, it seems like there might be a mismatch or issue with the labels or model outputs for your BCE loss error. Make sure:
Your labels are strictly 0 or 1 right before calculating the loss. Add a print statement to check this.
The model's final layer uses a sigmoid function to ensure outputs are between 0 and 1.
The shapes of both outputs and labels are compatible (e.g., [batch_size, 1]).
Consider using BCEWithLogitsLoss if you're not already, as it's more stable by combining sigmoid and BCE loss. Double-check your data loading process too, just in case. If issues persist, try isolating the problem with a simplified test case.
Perfectly understandable!!
the labels are all 1 or 0 before the loss. could it be the stuff i'm doing with tensors and such beforehand? i've been having a lot of trouble with datatypes for some reason here that i haven't had trouble with when training another model with this dataset before
before i did all the tensor stuff the labels were numpy.int64, which doesn't work when you're doing loss?
i'm not sure why my labels aren't loading in as a tensor, that doesn't make much sense
they load in perfectly fine on a version that's literally identical in this regard, except for having a different model
....nvm i think i figured it out. 1 typo when calling my model train function meant it didn't even have the dataloader!
back to say i probably didn't. here's my model, am i doing something wrong with the sigmoid?
`class LSTM(nn.Module):
    def __init__(self, hide_dim, n_layers):
        self.hide_dim = hide_dim
        super(LSTM, self).__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hide_dim, num_layers=n_layers, batch_first=True)
        self.linear = nn.Linear(hide_dim, 1)
        self.sigmoid = nn.Sigmoid()
    def forward(self, x):
        x, _ = self.lstm(x)
        x = self.linear(x)
        x = self.sigmoid(x)
        return torch.sigmoid(x)`
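One thing that stands out in `forward`: the sigmoid is applied twice (`self.sigmoid(x)` and then `torch.sigmoid(x)`). That would not cause the target-range error, but it squeezes every output into roughly (0.5, 0.73), which a plain-Python check can illustrate (the `sigmoid` helper below is written out by hand for the demo):

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    for logit in (-10.0, 0.0, 10.0):
        once = sigmoid(logit)
        twice = sigmoid(once)  # what the model above effectively returns
        print(f"logit={logit:6.1f}  once={once:.4f}  twice={twice:.4f}")
```

Returning `x` after the first sigmoid, or dropping both and using `BCEWithLogitsLoss` as suggested earlier, avoids the double squashing.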
what language should i learn for dsa i know js only upto loops
Hello guys
i am working on an assignment using a tutorial for reinforcement learning by Nicholas Renotte
I was running his github code and came up with issues
can someone please help me? This is very urgent
your help will be greatly appreciated
Anyone???
It's best if you ask the question directly
Hey thanks for replying
i posted it on the help thingy
do try to help me if possible
Please link it here
Is it this one ? https://discord.com/channels/267624335836053506/1210161413025431552
YES
sorry for the late reply btw
Guys I have a question, if i have cloned a repo which has a python virtual environment, do i need to create a separate venv on my system for running the code?
Like i have a repo here
i am not able to find the myenv/bin/python while selecting the kernel
@final kiln hey can you please look at the latest issue in the chat?
Which issue ?
can you come to the chat please
I'm confused, do you mean your help thread ?
yes yes sorry for the misunderstanding
Usually yes, pushing the dependencies to the repo is not a standard practice as far as I understand
Hey guys can i ask help for Pandas for Excel here?
For this sheet when i give the input of row label and the size of that row (eg: Apple & 20) i need the output to be M i.e. the column header. I've tried a lot of things refering the pandas documentation and stuff and YouTube. I get the index but when i use the string as Apple as row label i get Keyerror every single time. I've also tried assigned dtype to each column manually but still get the same issue. Help please
This is part of the code for a ML project I'm working on
anyone know why the MLP module in GPT2 is structured like that ? I usually see the reverse, having a bottleneck type of thing
show the actual pandas code. sounds like you forgot to set the Species column as the "index" when loading it
venvs generally will not work on a computer other than where they were created. they should not be stored in version control
I actually knew this one. Having it that way increases the number of connections in the MLP. And according to research, the MLP is where GPTs store factual information, so it's like it's increasing its memory.
hey, i have a pytorch tensor of type float16 but need to perform lots of bitwise operations on it. The function is JIT'd with torchscript which means .view(torch.int16) isn't supported. what can i do?
Depends on what you mean by bitwise operations, but if you're trying to do broadcastable operations and can't use .view maybe just cast the tensor dtype if possible
i need to do some bit shifting and twiddling
i can't cast i need the bit pattern to be preserved
Hello, I need to convert a matplotlib Radar to a Plotly spider chart (I believe using line_polar) however one thing i'm having trouble with is translating multiple axes and having different labels to segment the different angled axes, any idea how I can achieve this with plotly? Here's an image showing what I want to achieve;
I think if you cast it to a larger dtype it should preserve the pattern, it just pads it with zeros right
At least that's my assumption
Notice the a through e, j through m, y through u, etc;
I need to do this
In matplotlib you would do this;
```
for ax, angle, label in zip(self.axes, self.angles, labels):
    ax.set_rgrids(range(1, 6), angle=angle, labels=label)
```
t = torch.tensor([0.3], dtype=torch.float16)
print(t.to(torch.int32))
prints [0]
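The difference between converting the value and reinterpreting the bits can be shown outside TorchScript with the standard `struct` module, whose `e` format is IEEE 754 half precision. This only illustrates the distinction and does not solve the TorchScript restriction; the helper names are made up.

```python
import struct


def float16_bits(value):
    """Return the 16-bit half-precision pattern of `value` as an int."""
    (bits,) = struct.unpack("<H", struct.pack("<e", value))
    return bits


def bits_to_float16(bits):
    """Inverse of float16_bits: reinterpret a 16-bit pattern as a half float."""
    (value,) = struct.unpack("<e", struct.pack("<H", bits))
    return value


if __name__ == "__main__":
    bits = float16_bits(0.3)
    print(int(0.3))            # value conversion: truncates toward zero
    print(f"{bits:016b}")      # reinterpretation: the stored bit pattern
    print((bits >> 10) & 0x1F) # exponent field, now reachable with bit ops
```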
Right, you'd want a float32 if you gonna cast from float16
yes, but then i don't get bitwise ops
Uhm
I can probly help you, but I need to go for like 45min
Brb
anyone else?
literally all i want to do is bit operations on floats
within torchscript (hence no .view)
I asked GPT, and this is what it suggested
1. Subplots: Create multiple subplots where each subplot represents a different axis. You can customize the radial gridlines for each subplot independently. This approach allows for customization of each axis but may require additional layout adjustments.
2. Custom Annotations: Add text annotations to simulate individual radial gridlines on the single plot. This approach provides flexibility in labeling and styling but may be more manual and require additional effort to align the annotations correctly.
3. Custom Polar Layout: Implement a custom polar layout using shapes or paths to draw radial gridlines programmatically. This approach offers precise control over the appearance of radial gridlines but involves more complex implementation.
If anyone can provide a recommendation as to which one you think will be the most fruitful to pursue, I would appreciate it, I can then start researching how to accomplish what I need once I know which path is the ideal one.
Ok, I think I'm going to go with the 2nd approach, and just do custom annotations, that shouldn't be too hard, I understand what GPT is saying now
can i learn machine learning first and then ai?
ML is part of AI.
Can't you just apply them directly ? I assume torch has those
Why do you need view ?
yes. it would be hard to do it the other way around
i don't know if i agree with that. ML has become kind of its own thing. but AI certainly uses ML more than ever.
one could argue that the body of methods that we call "machine learning" should be called something else. but if we're going to call it "learning", then we're ascribing intelligence to the artificial system that is doing it.
one could argue that the body of methods that we call "machine learning" should be called something else
my argument is similar: the name ML is historical and describes the original motivation for the techniques that ended up being called ML, and persists as a vestigial term for lack of a better name.
Edd had an alternative name for ML in mind--do you?
The natural sciences used to be a part of philosophy, that's why we get PhD's
But from what I read ML is considered a subset of AI
They technically still are, but people look at them as two orthogonal fields sometimes
i don't. i actually think the name ML isn't that bad. fitting a model is a lot like learning. and we use machines for that. the definition of ML in my mind is all "model-fitting that is not primarily statistical", where "statistical" means something like "probability modeling with the intent to make inferences about unknown population parameters"
"philosophy" means "love of wisdom" (which probably isn't news to those currently viewing the chat), though I wonder if "sophia" encompasses knowledge.
Right, and the scientific method is one of the many philosophical methods of acquiring it
AI is a goal, ML refers to a set of techniques that are commonly used to build AI things
what was Edd's alternative name?
"Data-driven optimization"
that's a good one. it abbreviates nicely to "DDO" too
"learning unknown functions from data" i suppose is the unifying theme behind ML
whereas "learning unknown probability distributions from data" might be the unifying theme behind statistics
@wooden sail i love the term "data-driven optimization" and i'm going to start using it like it's a real term
get some AI influencer to tweet about it and watch people start using it 😆
It's a pretty good name because it's very descriptive, so you'd use it for the entire field of data science or for what we currently call ML ?
Hey folks, i want to start looking into ai in python, where might be a good place to start, that could include getting configured, the process behind it all?
see pinned messages
I'd use it for what I sometimes call "ML techniques": gradient boosting, matrix factorization (without an underlying statistical model), deep learning, etc.
so yeah. classifying cat pictures? you want DDO
causal inference? you want statistics
i still think ML is a perfectly reasonable term though, as long as you don't think too hard about it
"learning" is essentially jargon for "data-driven optimization"
gradient boosting is a new term for me
I'd call deep learning "universal function approximation"
it's not that deep learning is universal function approximation. deep learning has the property of being able to perform universal function approximation.
cars aren't transportation. cars have the property of being usable for transportation.
you've heard of xgboost right? it stands for "eXtreme Gradient Boost(ing)"
I'd say it's the central theme, high number of degrees of freedom are there to enable it
composition of functions is done using residuals so that you preserve the approximation power of the previous functional
perhaps, but i wouldn't confuse the thing for something that it happens to be able to do (in theory)
are you talking about gradient boosting or NNs here?
I don't see how deep learning is anything more than that tho
(hint: they both do that, but differently)
I'm talking about deep networks with high parameter count
i think i see what you mean. but there are other ways to pursue function approximation that are not deep learning
you might have fun reading about boosting then 🙂
I'd just bunch them in like a rebel
gonna look into it
it reminds me of something in differential geometry
just because it just turns out that a feedforward NN with an infinite-size hidden layer is a universal function approximator doesn't mean that all deep learning is universal function approximation, or that all universal function approximation is deep learning. but yes i see what you mean
should've studied it harder >.>
well i know 0 differential geometry so you have a leg up on me 😆 what does it remind you of?
like, the reason why deep neural networks work is because functions describe the world, and deep networks can be scaled to approximate any of them
uhmm, let me see if I can get a wiki thing
yes, but my point is that deep neural networks aren't the only way to do that. so it would be a little bit erroneous to equate them.
but it has something to do about having two manifolds and translating between them and their tangent spaces
i'm also not aware of any results showing that transformers (for example) are actually "universal" with respect to sequences
maybe they are -- i'd believe that they are, as you increase context size towards infinity
but the familiar universal function approximation result is more limited than that, e.g. i believe that for general graph NNs it's not fully proven
The pushforward is a fundamental operation in differential geometry that relates to the concept of translating structures between manifolds via differentiable mappings. Specifically ...
ah I was totally confusing it with push forward, boosting doesn't seem to be a term, tho I swear to god I saw it somewhere and quite recently
i bet that's analogous to pushforward in measure theory
I guess push = boost and differential = gradient, so my brain did gradient boosting
hah. the evolution from "boosting" to "gradient boosting" is a very interesting piece of recent mathematical history
no like, it's not that they are universal function approximators themselves, but that they approximate them
it's like
Taylor series vs Taylor approximation
sure. i just would be careful about equating them, rather than being clear that you are carrying out that particular task using this particular tool
many intros to deep learning mention this thing tho
not all of them
but quite a few
btw if you want to start reading about boosting from an historical context, check out AdaBoost
I'm gonna read about it, still have tons to learn
it's also getting to be time to write down the mathematics in more detail too, to solidify all the concepts
i wouldn't spend too long on it because nobody uses it anymore. however it's a great baseline for learning about gradient boosting
gradient boosting is basically a clever generalization of adaboost
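A bare-bones sketch of gradient boosting for squared loss, where each round fits a decision stump to the current residuals (for squared loss the negative gradient is the residual). The one-dimensional input, the stump learner, the learning rate and all function names are assumptions chosen to keep the example short; it needs at least two distinct `xs` values.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split on a 1-D feature with least squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, rounds=20, learning_rate=0.5):
    """Additive model: a constant plus a sum of shrunken stumps."""
    base = sum(ys) / len(ys)
    stumps = []
    predictions = [base] * len(ys)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for p, x in zip(predictions, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    """Mean squared error of `model` on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    xs = [i / 10 for i in range(20)]
    ys = [x * x for x in xs]
    print(mse(gradient_boost(xs, ys, rounds=1), xs, ys))
    print(mse(gradient_boost(xs, ys, rounds=20), xs, ys))
```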
I think I might be recalling where my misconception came from
they don't work
some months ago I found this awesome paper
error rshift_float not implemented for cpu etc
uhm, can't you expand them into a (*, 64) tensor, so that you get access to the bits, or maybe just use numpy or numba instead
how do i do the (*, 64) thing
PATIENT ALICE: An Artificial Intelligence suffering from hallucinations of a lost puppet show. These hallucinations need to be erased.
GENERATIVE MODEL TYPE: Diffusion-based.
PRESCRIBED TREATMENT: A Latent Space Editing method that involves the Pullback, the Jacobian Matrix, Eigenfaces and SVD.
- NOTE: read the Erra...
uhm, you'd need to expand each number into its 64 bit representation and store them in the tensor
you need to review how float64 are represented
and apply the needed operations to extract the bits
@desert oar https://arxiv.org/pdf/2302.12469.pdf
whoah, no mention of it tho
just brain do weird thing then, it happens
yes, but doing that requires bit operations
not necessarily
so to obtain the first bit is easy, do number/abs(number) + 1 or something of the sort
to get the exponent you do log10 I assume, then you divide by 10 over that and you got the exponent and the fraction
cool paper though!
the exponent is an integer so you can do modulo or something like that
the fraction depends on how it is stored, i dont fully recall how floating point works
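a possible sketch of the "(*, 64)" idea in numpy: reinterpret the float64 buffer as unsigned 64-bit integers (no conversion, same bytes), then shift on the integers, where right-shift is defined. float64 is 1 sign bit, 11 exponent bits (base 2, not base 10) and 52 fraction bits. The function name is made up; I believe torch allows the same reinterpretation with `tensor.view(torch.int64)`, but treat that as an assumption to check.

```python
import numpy as np

def float64_to_bits(values):
    """Return a (*, 64) uint8 array of IEEE 754 bits, most significant bit first."""
    x = np.ascontiguousarray(values, dtype=np.float64)
    as_int = x.view(np.uint64)                       # same bytes, read as integers
    shifts = np.arange(63, -1, -1, dtype=np.uint64)  # 63 for the sign bit ... 0 for the last bit
    return ((as_int[..., None] >> shifts) & np.uint64(1)).astype(np.uint8)

bits = float64_to_bits([1.0, -2.0])

# Split into the three fields of a float64
sign = bits[:, :1]        # 1 bit
exponent = bits[:, 1:12]  # 11 bits, biased by 1023
fraction = bits[:, 12:]   # 52 bits
```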
I forget things too easily
very cryptic to me but also very cool, their video does a good job cuz it's so cool
yeah i like deep learning papers that come with pretty pictures 😆
like I get the general idea, but the details Id need to review stuff
and learn new stuff
there's too much to learn. sometimes the best thing to do is get the general idea and move on
yeah it's true, there's only 4k weeks in a human life
(sorry for the existential crisis to anyone who didnt know )
So how should I let the user know about the dependencies required?
requirements.txt, "pip install -r requirements.txt"
Hmm i see
and also pip freeze to generate the file automatically
but it's a whole thing
there's also dockers, dev containers, cloud IDE's, etc
okay so directory should be like this
foldername -> virtualenv , project(the project i want to push to github)
and inside the project folder , I need to have requirements.txt?
so people dont usually push the venvs to the github?
no you keep a requirements.txt file, then people just setup their own .venv locally
the same requirements.txt file will result in slightly different .venv folder depending on the system, python version, etc
I usually just code using gitpod or codespaces
so they create venv and install the requirements from the .txt
Is this the correct way to implement
Let me check on that
see here for an example https://github.com/tiangolo/fastapi
so I should have my requirements.txt in such manner?
yes, root directory is usually for "meta" stuff related to the project, one of the folders is the actual source code
Does the scripts folder contain the files required for fastapi?
the requirements are not kept in the repository, they are re-downloaded by each person using the tool
the source code is in ./fastapi
but that seems to be a python convention, generally you put it in ./src
So what does scripts folder contain?
likely utility scripts for automating repetitive tasks
im also noticing now it has an icon thing for every commit messages
I wonder if this is automated
Oooo whats the purpose of fastapi?
It's for coding backend applications, HTTP/S and websocket stuff + a lot of niceties that make it a joy to use and deploy
We need a fastapi-like database package, fr, it's so well done
we can deploy our ml models also?
Yeah I use it for that, tho it's not specialized for it, it's a general tool
any web backend could use fastapi
Okay now lets say we have trained our model , deployed it , and now I want it to train on real-time data how do I do it?
You need to set up an automated pipeline, I've been experimenting with Prefect
any resources where I can read or learn about it?
airflow has become somewhat of an industry standard. it's a good choice for self-study too because it's open source and self-hostable.
this is really useful thanks
YAY! Just got my Dash App working for my job. Just need to move everything to the Network Drive, make a launcher through an Excel Macro Button, and write some documentation for the user and we truly should be good to go.
Let's gooooo
You don't want to know what BS I had to deal with as far as debugging this shit goes.
Eh, I believe you, glad you got it
Yep. Trust me. Debugging code is not so fun.
any framework out there to separate training over multiple threads?
Uhmm you'll usually want to go for multi-gpu training, I think ML frameworks support it out of the box
usually the ML framework itself is multithreaded internally, e.g. torch
but you might be looking for the general concept of "distributed" training
nice, thanks a lot for the keyword
Does anyone know how to code feature selection from deap machine learning, "I am coding a machine learning project that takes in historical data and sentiment data and other data and I'm working on a little trading algo project and I want the bot to learn which indicators are the most accurate for predicting the market (by the way I'm just doing this for fun) and the values at which these indicators should signal (When RSI is above 70 or something) and learn what combination of indicators work best together and things like that"
I made this before, 30 years of house mortgage rates, oil prices, interest from borrowing, inflation rates, usd value, house prices, federal deficits, federal debts and borrowing etc.
Depends on what ur doing tho
I was predicting currency conversions
for stocks, u prolly want data from their specific sectors
I looked for spikes or valleys in value, then checked corresponding historical data on the prior events which lead to that
Numpy does this by default for many operations afaik
But I want the machine learning code to learn which levels are most important and the values where it affects the price the most and things like that
is anyone in here running a LLM on their local computer system?
I'm training transformers on 16gb GPU
what are you using
Initially, I coded them in pytorch, rn I'm porting everything to rust
But if you wanna run them, there's ollama
I have a pedagogical question about pandas.
A lot of actual pandas users don't like dealing with the index, so they avoid it.
If I have the following price data, how do I fill in the gaps in this dataset without dealing with the index? (i.e., without .reindex)?
I want the equivalent of .reindex(MultiIndex.from_product([...])).groupby('ticker').transform(lambda g: g.interpolate(method='linear').bfill().ffill()) but without dealing with the index (but staying within the “restricted computation domain.”)
from pandas import DataFrame, to_datetime
prices = DataFrame({
'date': to_datetime(['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-03', '2020-01-03']),
'ticker': ['ABC', 'XYZ', 'XYZ', 'ABC', 'XYZ'],
'price': [10, 20, 19, 12, 21],
})
This is the closest I can come, but, obviously, .pivot(…) and .unstack(…) are (necessarily) index-aware operations.
(
prices
.pivot(index='date', columns='ticker').droplevel(0, axis='columns')
.interpolate(method='linear').bfill().ffill()
.unstack().reset_index()
)
- Do people actually go to such (ridiculous) lengths to avoid the index?
- Are there any index-agnostic API methods that allow us to reshape or resize a DataFrame or Series while staying within the “restricted computation domain”?
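one way this could look without ever setting an index is a rough sketch like the one below: build the full date × ticker grid with a cross merge, left-merge the prices onto it, then fill per ticker. It assumes pandas ≥ 1.2 for how='cross'; the names dates, tickers, grid and filled are made up here. Whether this counts as staying in the “restricted computation domain” is debatable, since .groupby.transform still aligns on the default RangeIndex under the hood.

```python
import pandas as pd

prices = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-03', '2020-01-03']),
    'ticker': ['ABC', 'XYZ', 'XYZ', 'ABC', 'XYZ'],
    'price': [10, 20, 19, 12, 21],
})

# Every calendar day between the first and last observation, as a plain column
dates = pd.DataFrame({
    'date': pd.date_range(prices['date'].min(), prices['date'].max(), freq='D'),
})
tickers = prices[['ticker']].drop_duplicates()

# Cartesian product of dates and tickers, then attach the known prices
grid = dates.merge(tickers, how='cross')
filled = (
    grid
    .merge(prices, on=['date', 'ticker'], how='left')
    .sort_values(['ticker', 'date'])
    .assign(price=lambda df: df.groupby('ticker')['price'].transform(
        lambda s: s.interpolate(method='linear').bfill().ffill()
    ))
    .reset_index(drop=True)
)
```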
I am a pandas index hating individual.
.iloc D:
First example seems totally doable either way
Pivot doesn’t require an index. You can pivot on any column.
My strong feelings on this topic are a matter of public record.
However, rather than litigate this topic, I am more interested in the pedagogical aspect of understanding how people actually use pandas.
I don’t recall having a problem where lack of an index mattered
Well, the .pivot(…) here is clearly a degenerate version of .stack(…) (which is why the reverse operation is .unstack(…)) but it obviously is index-aware (considering that the first keyword argument is index=….)
I usually just cast everything to numpy and do everything there.
(I'm not altogether interested in litigating the topic of the usefulness of the pandas index. And not because I know I'm correct or that I have already exhaustively proven this as a matter of the public record, but that it's secondary to understanding what people actually do.)
Not litigating, but are you pro or con? Don’t need to dive in, just curious
If you use pandas without any concern for the index or index-alignment, you're probably better off using a numpy structured array or, like, Polars. The only interesting thing about pandas is the index.
What's your question tho
I personally don't like the dataframe paradigm, but many people love and swear by it, so there must be something to it
Are there tasks which are fundamentally index-aware as a consequence of the design in the pandas API? How do index-avoidant users solve problems that appear to be so?
What do you mean by index avoidant ?
Working only with RangeIndex and aggressive use of .reset_index or .values or similar.
Pivot, merge, etc don’t require indices, but can optionally use them.
I avoid indices, but they are inevitable in certain cases.
Basically… how a lot of actual users use pandas. No MultiIndex, no IndexSlice, no .loc, no .stack or .unstack, no .reindex, no .join, merge instead of index alignment, lots of .groupby.apply for multi-column access, lots of masked cross-sectioning .loc[lambda df: df['column'] == ...], lots of .query, &c.
I can merge without an index, loc too (I just filter instead of index based), same with stack
But, I’m a DuckDB shill, so I go do everything in DuckDB and then use pandas sparingly anyway.
Polars and DuckDB are great tools that don't require knowledge of index semantics.
I was about to say "found BillyBobby's alt"
I'm also coming up with a few comparisons: this is the approach that is “index-avoidant” and the approach that is “index-aware.” I want these to be as realistic as possible to illustrate the differences.
By the way, this example is quite interesting in that it shows the .groupby ⇋ .stack/.unstack duality (which I personally believe hints at the true nature of the DataFrame data structure.)
jesus, what a rabbit hole this is
assignment and most binary operations are index-aligning and therefore index-aware
and unlike merge, concat, iloc, etc. there's no "escape hatch" for avoiding the index, unless you completely circumvent pandas and use the underlying arrays (which might or might not even support the operation you are trying to perform)
that's an interesting perspective. the R people had this all figured out a long time ago with reshape2, dplyr, etc. but they of course don't have anything like the pandas "index", and generally discourage the use of row labels
I think we see this also in tools like Polars which do not (I believe) support .stack/.unstack operations.
I personally believe that .stack and .unstack are actually semantically meaningful operations, and I think I can even generalise this meaning to n-dimensions (which ends up looking somewhat dissimilar to xarray.Dataset.stack.)
correct, polars is fully index-less
R data.table (still my favorite data frame library of all time) probably has the most practical approach here, you can declare that one column is "the index" but it remains a data column
This is me at a bar after PyData NYC telling Ritchie Vink about “loose homogeneity.”
there was an issue on the polars github page about indexes and the authors seemed confused about why you would even want such a thing, which i thought was kind of funny. ultimately they suggested that you could build your own sidecar index thing alongside polars, but that polars itself wouldn't support indexes natively, which i think is a fair tradeoff.
and yeah xarray is an interesting example of going in the opposite direction, leaning hard into making the separation of dimensions and features a first-class interface concept
And, in his defence, Ritchie acknowledges that an index may be an interesting and useful thing to have… but it's not a feature in the thing that he wants to build.
yeah exactly. the thread settled in a good place, polars being a mid-level tool and you could build things like indexes on top of it.
I am still trying to find the time to build my pandas-replacement, which is all indices and no data.
(Honestly, I often don't really care about the data.)
...can't tell if that's a joke
It's facetious but not a joke. I believe there is a semantically meaningful (symbolic) algebra (i.e., theoretically coherent API) that can be formed around abstract/non-concrete, implicitly & disuniformly hierarchical indices.
Given such a tool, a lot of analyses become index manipulations (which is often already the case) that are tied to data only on execution.
sounds like relational algebra tbh
interesting way of coming at it though
Yes, and I've already coded up .reset_index.
from pathlib import Path
def reset_index():
    Path(__file__).unlink()
have just finished karpathy's gpt from scratch
so, when new local models are released, do we not have access to the internals?
are they just released as some executable binaries or what?
First of all, let me say I enjoyed your contribution to PyData London 2022. Also I can't imagine using pandas without .loc or .unstack, but on the other hand I also use lots of .query()! And in this case maybe I would have opted for resample
I almost never use .query, but I think that may just be a personal quirk. Users seem to really like it.
Doesn't .resample require a DatetimeIndex?
I don't go out of my way to avoid an index, so I'm the wrong person for this discussion 🙂
In fact, I think .resample requires a strict DatetimeIndex, meaning it won't work on a MultiIndex, meaning you may be forced to…
from pandas import MultiIndex, date_range
(
    prices
    .set_index(['date', 'ticker'])
    .pipe(lambda df: df
        # rebuild the index as the full product of every calendar day and every ticker
        .reindex(MultiIndex.from_product([
            date_range(
                df.index.get_level_values('date').min(),
                df.index.get_level_values('date').max(),
                freq='D',
                name='date',
            ),
            df.index.get_level_values('ticker').unique(),
        ]))
        # fill the newly created gaps separately for each ticker
        .groupby('ticker')
        .transform(lambda g: g.interpolate(method='linear').bfill().ffill())
    )
)
This is definitely a little ugly and very “jargon”-y which is why I can imagine a lot of analytical users might shy away from the approach.
In a recent talk, I suggested doing something like this…
from pandas.api.extensions import register_index_accessor
from dataclasses import dataclass
from pandas import Index
@register_index_accessor('_ext')
@dataclass
class _ext:
    obj : Index
    def resample_date_level(...):
        pass
(
    prices
    .set_index(['date', 'ticker'])
    .pipe(lambda df: df.reindex(df.index._ext.resample_date_level(...)))
    .groupby('ticker')
    .transform(lambda g: g.interpolate(method='linear').bfill().ffill())
)
The index._ext text should be relatively easy to grep for in your code to adjust the above as the rougher edges of the pandas API slowly get cleaned up (and missing parts of the pandas.MultiIndex API get added in.) It's a bit less ugly and bit more reüsable.
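for the curious, here is one guess at what a filled-in version could look like. The body of resample_date_level, its level and freq parameters, and the variable filled are all assumptions made up for this sketch, not what was shown in the talk.

```python
from dataclasses import dataclass

from pandas import DataFrame, Index, MultiIndex, date_range, to_datetime
from pandas.api.extensions import register_index_accessor

@register_index_accessor('_ext')
@dataclass
class _ext:
    obj: Index

    def resample_date_level(self, level='date', freq='D'):
        """Hypothetical helper: full product of a densified date level and the other levels."""
        dates = self.obj.get_level_values(level)
        full_dates = date_range(dates.min(), dates.max(), freq=freq, name=level)
        # Keep the original level order so .reindex lines up level by level
        levels = [
            full_dates if name == level else self.obj.get_level_values(name).unique()
            for name in self.obj.names
        ]
        return MultiIndex.from_product(levels, names=self.obj.names)

prices = DataFrame({
    'date': to_datetime(['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-03', '2020-01-03']),
    'ticker': ['ABC', 'XYZ', 'XYZ', 'ABC', 'XYZ'],
    'price': [10, 20, 19, 12, 21],
})

filled = (
    prices
    .set_index(['date', 'ticker'])
    .pipe(lambda df: df.reindex(df.index._ext.resample_date_level()))
    .groupby('ticker')
    .transform(lambda g: g.interpolate(method='linear').bfill().ffill())
)
```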
I know people go wild about Polars, and I can see the appeal, but at this point in my personal development I'm not sure I am ready for an indexless world. If I need an OLAP query engine I'll use SQL 😉
anyone got a rec for self-attention? feel like it's a bit over my head atm
A rec ?
recommendation
well, i'm trying to first get a high-level understanding because the low-level details haven't made much sense
karpathy talks about it being tokens talking to each other
to what extent is that a good metaphor?
It's a weird metaphor
You have a sentence with c tokens
You construct a table that is c by c
The value at each coordinate says how much each token relates to each other token
Coordinate in the table
ok, but is this relatedness of tokens to tokens a function of the distance of the respective vector embeddings? so far I've understood that the answer is no, so what exactly is the affinity or relatedness that is getting scored here
Each "score" is calculated via dot product of the embeddings
But not the embeddings directly
There's a projection beforehand
Which reduces dimensionality
ok, and the score here is interpreted as a measure of the relatedness, right?
That's the Q, K, V
Yes the scores table is interpreted as relatedness
ok, but say, if I consider the vector embeddings of two words, and I say that their distance is small, I have an immediate understanding of what it means for them to be similar to each other: they are similar in semantic space, i.e., their meaning is similar. What is similar between two tokens that score high in the self-attention measure?
It means that their projection, one made by Wq and the other made by Wk, when you dot product them, the value is high. It means that the vectors are aligned in the same direction
Full interpretability in the context of the LLM circuitry is harder to ascertain
oh I see, so the score is not so much related to distance, but rather to same-directedness? in the semantic space of their embedding?
Yes it's more of a dot product, and the dot product is how much in the same direction they are times the size of one of them times the size of the other
|A||B|cos(theta)
right that makes sense thanks a lot btw for the answers so far. if you have time: would it be possible for two embeddings to have a small distance in the embedding space yet low score in self-attention, or vice-versa, big distance in the embedding space yet high score in self-attention?
if one is distance and the other is direction, we should think that yes. does this have some intuitive meaning?
some examples of words that would have this happen?
The embedding space almost doesn't matter because the embeddings get projected down to a lower dimensional space before the dot product, there's two transformations, one for each embedding, before applying the dot product
The embedding space does matter ofc, but all sorts of things can happen
huh ok I see, this basically completely changes then the meaning of distance in the embedding space, if this distance will play no role in processing the tokens since it is erased or significantly modified during the projection
Yes that's how it encodes information and rules, it alters the distances and the alignments between the embeddings
ok, and the projection parameters are modified at each subsequent time step correct?
i.e. the attention block is modified by training?
There are three projection matrices, Wq, Wk and Wv and those are learnable parameters
I see, ok ok, right
They produce Q, K, V, which are the matrices you see in the formula
right
softmax(QK^T / sqrt(d_k)) V
right right
Something like that
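a minimal numpy sketch of a single attention head, in case it helps to see the shapes: c tokens, the three learnable projections Wq, Wk, Wv, the c by c table of scores, then the weighted mix of V. The sizes, the random matrices and the function names are placeholders, and there is no causal mask here, which a GPT-style decoder would add.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a sequence of token embeddings X (c, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project down to (c, d_head)
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (c, c) table of scaled dot products
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V, weights               # mix the value vectors by relatedness

rng = np.random.default_rng(0)
c, d_model, d_head = 5, 16, 4                 # placeholder sizes
X = rng.normal(size=(c, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
```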
This is all very convoluted, and there is a cool study, MetaFormer that says it doesn't really matter as long as you do some form of token mixing
That's actually what I'm doing rn
MetaFormer was vision
I'm replicating the study for NLP
And introducing a new token mixer which uses a metric tensor for enhanced interpretability
My argument is, if any token mixer is fine, then might as well choose something we humans can interpret
Does anyone know how to code feature selection from deap machine learning, I am coding a machine learning project that takes in historical data and sentiment data and other data and I'm working on a little trading algo project and I want the bot to learn which indicators are the most accurate for predicting the market (by the way I'm just doing this for fun) and the values at which these indicators should signal (When RSI is above 70 or something) and learn what combination of indicators work best together and things like that
i still have something to quizz you if you have time (thanks again for all the answers). suppose we accept that a self-attention block takes tokens and projects them in accordance with the three matrices, which produces some sort of score of relatedness. How will the fact that two tokens score high in relatedness produce a change in the processing through the RNN?
here we're talking about a pure decoder transformer where each block contains a self-attention block and a different RNN block
I'm not sure about the RNN part, but the scores matrix is used to nudge the input words
The classic example I've been seeing is
"I went for a swim at the river bank"
Bank can both mean the river bank, but also the bank where you go withdraw money
These words live in the embedding space right
And you construct the score matrix
Which says that "bank" relates to "water" and to "river" and to "swim"
So like
output "bank" token = score 1 * "river" + score 2 * "swim" + etc
The etc would include all the words and scores
And these tokens are ofc vectors
And when you sum this stuff up
What you get is the word "bank" nudged in the direction of the words that relate to water: river, swim etc
As opposed to words that relate to money
Originally that word would be equidistant to both clusters
But with the scores matrix we've nudged it in the direction of the cluster of words that relate to water
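the nudge can be shown with toy 2-D vectors. Everything here is invented for illustration: the first axis stands for "water-ness", the second for "money-ness", and the scores are picked by hand rather than computed by a model.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

water = np.array([1.0, 0.0])   # direction of the water-related cluster
money = np.array([0.0, 1.0])   # direction of the money-related cluster

tokens = {
    "river": np.array([1.0, 0.0]),
    "swim": np.array([0.9, 0.1]),
    "bank": np.array([0.5, 0.5]),   # starts equally aligned with both clusters
}
scores = {"river": 0.4, "swim": 0.3, "bank": 0.3}  # hand-picked row of the scores table

# output "bank" token = score 1 * "river" + score 2 * "swim" + etc
new_bank = sum(scores[word] * tokens[word] for word in tokens)
```

here new_bank comes out as [0.82, 0.18], so it points much more towards the water cluster than the original [0.5, 0.5] did.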
ahh I see!
this is a very illuminating example
thank you very much for the time seriously
this clears it up more
to be clear: there is no distinction between these in the embedding space, correct? there's only a single embedding for the word "bank"
Well, in principle there's multiple points that would represent the word bank, but they would exist just because of positional encoding. So there would be word bank at position 0, word bank at position 1, etc
And this is very much just an example of what can happen. During gradient descent the network might decide to do all sorts of crazy stuff.
^^^^^^^^^^^^^^^
Idk how to predict the market. I think gradient man was using reinforcement learning for it
ahhhhhhhhh I see!! this has instantly cleared everything up for me
thanks a lot!!!
do you think evolutionary or reinforcement would be better?
My view is that you'd need the same amount of effort that took to build GPT4
You'd need to train an LLM on slices of the internet since the 2000's
Just how to have the machine learning program decide, from the various signals, which signal causes what, which are irrelevant and things like that, and at what values these signals give the most accurate outputs
You can ask gradient man when he's around, but I reckon if he believes he's got it he ain't sharing the secret sauce
Will do man! Thank you for the help
hi everyone! could someone help me with an automatic data extraction process with python (in databricks)? basically, I have to put together a dataframe where, through an API endpoint, I obtain the date, time and percentage. The percentages that are greater than 50% have to be extracted, so what I need is to be able to retrieve two parameters from the endpoint, key1 and key2, by date and time
So you’re starting with a dataframe and want to filter where percentage greater than .5?
yes, for example, my code is:
import pandas as pd

# Get data from the API (getMedicionesSinDatos is defined elsewhere and returns an HTTP response)
tablonSinDatos = getMedicionesSinDatos()
if tablonSinDatos.status_code == 200:
    data = tablonSinDatos.json()
    # Create the DataFrame
    df = pd.DataFrame(data['data'])
    # Compute the total number of measurements
    total_medidas = df['filas'].sum()
    # Build the summary table
    tablon = df.groupby(['hora', 'fecha', 'id_origen_dato_externo']).agg(
        total_filas=('filas', 'sum'),
        porcentaje=('filas', lambda x: (x.sum() / 141) * 100)
        #clave_registro2=('clave_registro2', 'first'),
        #clave_registro1=('clave_registro1', 'first')
    ).reset_index()
    # Create the 'Reprocesar' (reprocess) column
    tablon['Reprocesar'] = tablon['porcentaje'].apply(lambda y: 'Si' if y > 50 else 'No')
    # Filter tablon to keep only the rows where Reprocesar is 'Si'
    tablon_reprocesar = tablon[tablon['Reprocesar'] == 'Si']
    # Iterate over tablon_reprocesar and retrieve the keys clave_registro1 and clave_registro2
    for _, row in tablon_reprocesar.iterrows():
        #clave_registro1 = row['clave_registro1']
        #clave_registro2 = row['clave_registro2'].split(';')[0]
        porcentaje = row['porcentaje']
        print(f"Fecha: {row['fecha']}, Hora: {row['hora']} , Porcentaje: {porcentaje}")
        #Clave_registro1: {clave_registro1}, Clave_registro2: {clave_registro2}
    #print(tablon)
else:
    print(f"Error en la solicitud. Cód.Respuesta: {tablonSinDatos.status_code}")
and what i need is to be able to extract "clave_registro1" and "clave_registro2". An excerpt of the endpoint information:
"id_tipo_entidad":5,"id_entidad":153,"id_medida":109,"id_dimension":92,"id_origen_dato_externo":2651,"fecha":"2023-12-06","hora":3,"filas":1,"pk_medicion":"515310992","nombre_tipo_entidad":"Estación de medición","nombre_medida":"Caudal (h, m3/s)","nombre_dimension":"Real Operacional","nombre_medicion":"Caudal Rio Laja en Tucapel","nombre_origen_dato_externo":"DGA - API","nombre_periodicidad":"Horario","clave_registro1":"08380006-2","clave_registro2":"SCBD0200;Caudal","id_externo_1":null,"id_externo_2":3221,"id_externo_3":65,"id_externo_4":null}
You just want to filter a df to show two columns?
df_new = df[['col1', 'col2']]
no, I have that whole process which prints this:
Fecha: 2023-12-06, Hora: 0 , Porcentaje: 66.66666666666666
Fecha: 2023-12-06, Hora: 1 , Porcentaje: 80.85106382978722
Fecha: 2023-12-06, Hora: 2 , Porcentaje: 134.75177304964538
Fecha: 2023-12-06, Hora: 3 , Porcentaje: 108.51063829787233
Fecha: 2023-12-06, Hora: 8 , Porcentaje: 75.88652482269504
Fecha: 2023-12-06, Hora: 10 , Porcentaje: 130.49645390070924
Fecha: 2023-12-06, Hora: 11 , Porcentaje: 143.97163120567376
Fecha: 2023-12-06, Hora: 12 , Porcentaje: 143.97163120567376
so now with those dates and times I need to extract key1 and key2 from the api
and i dont know how to do u.u
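one possible approach, sketched under assumptions: since clave_registro1 and clave_registro2 already come back in each record, they can be carried through the groupby with a 'first' aggregation instead of calling the API again. The rows below are invented, shaped like the endpoint excerpt (only the first key pair is from the excerpt, the "HYPOTHETICAL-KEY" row is made up), and the 141 divisor is copied from the code above.

```python
import pandas as pd

# Invented sample rows shaped like the endpoint excerpt
data = [
    {"fecha": "2023-12-06", "hora": 3, "id_origen_dato_externo": 2651, "filas": 80,
     "clave_registro1": "08380006-2", "clave_registro2": "SCBD0200;Caudal"},
    {"fecha": "2023-12-06", "hora": 3, "id_origen_dato_externo": 2651, "filas": 20,
     "clave_registro1": "08380006-2", "clave_registro2": "SCBD0200;Caudal"},
    {"fecha": "2023-12-06", "hora": 4, "id_origen_dato_externo": 2651, "filas": 10,
     "clave_registro1": "HYPOTHETICAL-KEY", "clave_registro2": "XXXX0000;Caudal"},
]
df = pd.DataFrame(data)

tablon = (
    df.groupby(['hora', 'fecha', 'id_origen_dato_externo'], as_index=False)
    .agg(
        total_filas=('filas', 'sum'),
        porcentaje=('filas', lambda x: (x.sum() / 141) * 100),
        # keep one key pair per group; this assumes every row in a group shares the same keys
        clave_registro1=('clave_registro1', 'first'),
        clave_registro2=('clave_registro2', 'first'),
    )
)

# Keep the rows above 50% and strip the ";Caudal" suffix from the second key
tablon_reprocesar = (
    tablon[tablon['porcentaje'] > 50]
    .assign(clave_registro2=lambda t: t['clave_registro2'].str.split(';').str[0])
)
```

if one date/hour group can contain several different keys, adding clave_registro1 and clave_registro2 to the groupby columns instead of using 'first' would keep them all.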
i'm getting NaN outputs from my binary classification model. learning rate is currently 0 and i have normalization before i feed through the data
any tips?
Can you open a #❓|how-to-get-help help thread? I’m not understanding exactly what you want to do.
ok
ty
i don't get NaN outputs when working in pycharm, but i am when using google colab
it's specifically the lstm layer. no NaN in the tensor before it, mostly NaN after. if anyone could help itd be greatly appreciated
in a GPT context length determines: 1. the size of the self-attention matrix, and 2. the size of the RNN recurrent vectors, correct?
Im not sure where you're getting the RNN part, that's a different class of neural networks
to learn how this works in detail I recommend picking up pytorch and writing the transformer while using this as a reference: https://bbycroft.net/llm
A 3D animated visualization of an LLM with a walkthrough.
oh you're right, for some reason I understood that a GPT decoder block had a self-attention layer and an RNN layer
nice ty
it's a self attention module (with multiple heads) followed by an MLP that expands and contracts, the MLP is where GPT stores factual memory
noted ty
also very cool website
Using pyspark+pandas, how do I create a UDF where the two dataframes are of different sizes? I basically want to do df_a.index.isin(df_b.index) but pyspark doesn't support this by default. I found column_op, but it requires the two inputs to have the same size. What can I do?
can duckdb read parquet files without loading the entire table to memory ?
it totally can https://duckdb.org/docs/data/parquet/overview.html
Examples -- read a single Parquet file SELECT * FROM 'test.parquet'; -- figure out which columns/types are in a Parquet file DESCRIBE SELECT * FROM 'test.parquet'; -- create a table from a Parquet file CREATE TABLE test AS SELECT * FROM 'test.parquet'; -- if the file does not end in ".parquet", use the read_parquet function SELECT * FROM read_pa...
what have I been doing to my life
polars can too
With the httpfs extension, it is possible to directly query files over the HTTP(S) protocol. This works for all files supported by DuckDB or its various extensions, and provides read-only access. SELECT * FROM 'https://domain.tld/file.extension'; For CSV files, files will be downloaded entirely in most cases, due to the row-based nature of the f...
._.
this is so useful
SELECT column_a FROM 'https://domain.tld/file.parquet';
I couldn't get the lazy API to work so I never used it again
Is it better to write your machine learning code in .py form or .ipynb form?
.ipynb is good for exploration and reporting, .py is good for organization and structuring a project without losing yourself, and you can mix both ofc
if you write the code in a notebook, make sure that it works correctly when you run each cell in order exactly once.
you can use notebooks for prototyping, testing things, exploring the data and visualisations of it, making reports etc., but I would recommend using normal python files when you want to actually train or fine-tune to avoid any weirdness Jupyter can introduce, and have a normal .py file that can run inference on the model for reference
In particular, when using Jupyter you have to be extremely careful about code execution order and the overall shared global state that persists between different executions
even if you delete or edit a cell, variables defined in it will still exist unless you explicitly delete or modify them at some point
I see so for integrating the model you would suggest using .py format
One more thing
Is this equation right?
Frontend + API(model deployment) + Backend + Database
Like I just want to know if it's the right order
Darn, I missed DuckDB o’clock
It is now in my stack and it is not coming out, ever, it's so good
Replanning my pipelines around it
There’s some limitations on the pushdown. DuckDB discord is active with their core devs, so ask there any hard question
I see, thank you for the tip
no
The model can be anywhere from embedded in the frontend, running locally with something like tensorflow.js or WASM, to being hosted by a third party that only your backend interacts with
the most normal way would be either integrated in your backend or in a separate service your backend talks to though
Based on my recent experience with training sentiment analysis, here's my planned improvements for a new training pipeline
- CI/CD will launch Spot and deploy Prefect without triggering any training, this was a huge issue, each time I got something wrong or made any minor change, I had to wait around for the thing to start a new spot instance, install dependencies, etc. with Prefect I'll have a UI where I can trigger pipelines from
- CI/CD deployment will also expose a web ssh terminal through a port open to my IP only and also password protected, this way I can access the machine while the deployment is active, so I can debug and fix stuff without having to restart the pipeline
- I won't pre process the text into tokens, this is because I had the need to change tokenizer at least two times, and had to rebuild the dataset 3 or 4 times for one reason or another, so I'm reusing my celery setup to move all pre processing to the machine, this should be fine since the setup guarantees 0 GPU down time
- duckdb will be used to handle all data transactions, celery task will get a slice of the data, pre process it and store it to a file format that I can read from using the rust torch thing
ci/CD deployment will also expose a web ssh terminal through a port open to my IP only and also password protected, this way I can access the machine while the deployment is active, so I can debug and fix stuff without having to restart the pipeline
while I can understand why you would want that, I would strongly recommend against that - the entire point of pipelines is for them to be reproducible and work automatically
Find a way to debug it locally instead
I actually don't expect to use it that much, it's just that sometimes something happens and there's really no way of knowing what happened, it's useful
But Prefect will already allow for debugging the pipeline locally, that was an issue I was having. GitHub Actions workflows are nice but sort of hard to replicate locally (I was using act for it)
I'd say the ssh thing will be used mostly in the initial deployments where I have a bug related to the deployment directly and not to the pipeline, but I'll be careful not to use it for directly fixing stuff
And since now I'll be using text instead of pre processing it into tokens, I can also do data augmentation more easily, I reckon I could get closer to the 60-65% accuracy listed on papers with code
Ah and final point is that I'll make the dataset an ENV variable, so that I can train the models across the various datasets
I think the envs will appear as fields in the Prefect UI
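A rough sketch of what reading the dataset from an environment variable could look like. The variable name `DATASET`, the dataset names and the `resolve_dataset` helper are all placeholders, not anything from the actual pipeline.

```python
import os

# Hypothetical dataset names, used only to show validation.
KNOWN_DATASETS = {"imdb", "sst2", "yelp"}


def resolve_dataset(environ=os.environ, default="imdb"):
    """Pick the training dataset from the environment, failing fast on typos."""
    name = environ.get("DATASET", default).strip().lower()
    if name not in KNOWN_DATASETS:
        raise ValueError(
            f"Unknown dataset {name!r}; expected one of {sorted(KNOWN_DATASETS)}"
        )
    return name


if __name__ == "__main__":
    print(resolve_dataset())
```

Validating the value up front means a misspelled dataset name stops the run immediately instead of after the instance has already spun up.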
And the final thing is to somehow get improved observability. I already have CloudWatch set up, but it's not enough, it's easy to get lost in all the runs. I might just code up a simple htmx dashboard that gets exposed in a similar manner to the SSH web terminal and prints out the logs of the various services, tho at this point I might as well circumvent CloudWatch and get the logs directly from the Docker driver
I'm also using this thing btw: https://www.gitbook.com/
I think it even has a feature where it indexes it and you can chat with your docs
Oh, my team uses Notion
They seem very similar
Just activated the chat thing, it works
Has the usual LLM hallucination thing with the IMBD vs IMBd
Even Book dataset can be used for sentiment analysis
Uhmm I'm trying to use ones that I can find here: https://paperswithcode.com/task/sentiment-analysis
Sentiment Analysis is the task of classifying the polarity of a given text. For instance, a text-based tweet can be categorized into either "positive", "negative", or "neutral". Given the text and accompanying labels, a model can be trained to predict the correct sentiment.
Sentiment Analysis techniques can be categorized into machine ...
This way I know at what point SOTA currently is for each dataset
Otherwise I don't know what is possible or not with each dataset, I'll be trying to get to max accuracy when it might not be possible to do so
Which is exactly what happened for like 2 weeks
Let me take a look
What's the actual use of an API?
everything.
Could you please illustrate it with an example
the term "API" itself is a bit more generic, but talking specifically about REST APIs which are the sort of API you think about when talking about infrastructure:
They can be used to abstract away the code you are working with, let programs written in different languages interoperate, restrict permissions about what a program is able to do and hide the source code, model weights or other confidential things from the end user
the API is just the interface for some system. it's a very general concept
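Here is a small sketch of that idea using only the standard library: a toy "model" sits behind an HTTP endpoint, and the client only ever sees JSON going in and out. The `/predict` route, the keyword-based `predict_sentiment` function and the other names are invented for the example; a real service would more likely use something like FastAPI and an actual model.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for a real model: the client never sees this code or any weights.
POSITIVE_WORDS = {"good", "great", "love"}


def predict_sentiment(text: str) -> str:
    words = set(text.lower().split())
    return "positive" if words & POSITIVE_WORDS else "negative"


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"label": predict_sentiment(payload["text"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def call_api(host: str, port: int, text: str) -> dict:
    """Client side: knows only the URL and the JSON shape, nothing else."""
    connection = http.client.HTTPConnection(host, port, timeout=5)
    try:
        connection.request(
            "POST",
            "/predict",
            body=json.dumps({"text": text}),
            headers={"Content-Type": "application/json"},
        )
        return json.loads(connection.getresponse().read())
    finally:
        connection.close()


def demo() -> dict:
    server = HTTPServer(("127.0.0.1", 0), PredictHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return call_api("127.0.0.1", server.server_port, "I love this film")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The client could just as well be written in JavaScript or R; all it needs to agree on with the server is the route and the JSON fields.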
Does fastapi come under REST?
Or are those two different things?
Go look up some video explaining the overall idea of web APIs
FastAPI is used to create REST APIs
FastAPI lets you code HTTP APIs, which can be called over the internet, but the term API is also used for all sorts of interfaces, not just the HTTP one
Idk how to explain it better, but just from the name ig API = Application Programming Interface
An application programming interface (API) is a way for two or more computer programs or components to communicate with each other.
The wiki actually does a nice job
Wait, so what is not an API tho
seems like it's not standardized, but from experience an API is any way to call a daemonized application in source code
But like, when I'm calling some lib function, I'm using an API right
pretty much everything is an API
when you import math; something = math.sin(...) you're using the API provided by the math package
when you write class Foo: def method(self, ...): ... you're creating a class Foo which provides a Foo.method API
in practice people just don't refer to these cases as APIs though (unless it's something like, the pandas API which is significantly different from normal python), in part because otherwise it would be meaningless
So an API is a way in which programs communicate with each other?
It both makes total sense to me but at the same time idk how it is not too general to be useful
the generic meaning is too generic to be useful
what people actually mean when they refer to APIs is only a subset of that though, so it remains useful
in particular, having an API at all versus not having an API is a fairly big difference
imo makes sense to divide APIs into modules/libraries APIs and daemon APIs
web APIs being a subset of daemon APIs
Ig if I had to put into words what I think of when I think of APIs, I'd say something like, a pre-defined interface, that is usually at least meant to be stable and documented, that my program uses to interact with external code or to other parts of my program
okay guys I did get a general idea of what an api is and where it is used
clicking buttons is not an API, that's a GUI
also the interaction between compiled things (like a shared library & an executable) is called an ABI (Application Binary Interface)
Oh that's interesting actually
hi, I have a doubt:
How do I all_gather a tensor in DataParallel?
I know all_gather in DistributedDataParallel, but I am having trouble understanding how to achieve that in DataParallel
The online references I have looked at suggest some weird ways (at least they sound weird to me), for example having a tensor of zeros and appending the tensor parts from the different GPUs to it.
Can someone please give a sample syntax or an easy-to-understand reference?
Thank you
@app.get("/student/{student_id}")
def get_student(student_id: int = Path(None, description="Enter the ID of the student:", gt=0, le=3)):
    return students[student_id]
    ^^^^^^^^^^^^
  File "/home/nikhilds/ProLang/Python/FastAPI/.venv/lib/python3.11/site-packages/fastapi/params.py", line 182, in __init__
    assert default is ..., "Path parameters cannot have a default value"
           ^^^^^^^^^^^^^^
Why can't I set the Path to None?
Anybody worked on multimodal RAG system? I’m currently working on it and need inputs
any papers about correlation between number of training tokens and model convergence?
Hey, the issue is that PyAudio works fine locally, but when deployed on the web, it can't recognize any input devices. It seems to be looking for audio devices in the hosted environment (streamlit). I need to check and set up the necessary audio dependencies and permissions for the deployment environment. Any suggestions?
Don't ask to ask, just ask your actual question straight away
Any and all python libraries without explicit support for streamlit will not be able to listen to the user's microphone through the browser, you need to run something in the front end for that to work.
From googling streamlit audio input it looks like there are a bunch of community workarounds that involve a little JavaScript to get it to work, but no native components provided by streamlit
gradio supports it though, if you're not too tied to streamlit and want to avoid extra pip installs https://www.gradio.app/docs/audio
I've experimented with WebRTC and the built-in Streamlit functions, but none seem to match the functionality I get with PyAudio. It's been a bit challenging finding a suitable alternative.
I guess JavaScript is better, what do you think?
Hi Guys, I just came across this question Audio display and thought to myself what if we want to do the opposite 😆, Its pretty straight forward if you are listening through mic where streamlit server is hosted but it gets a little tricky if you want to do it on client side. Worry not. Javascript to the rescue, Checkout this ( buggy 😓 ) snippe...
go with whatever works for your use case
I'd personally avoid having to write/maintain any JS code myself, but that is just me hating front end
I totally get it; front-end work can be a bit challenging. Considering my limited knowledge of JavaScript, I might give it a shot using JS and HTML5. Anyways thanks for your help buddy
anyone do interop with R?
as usual, don't ask to ask 🙂
if you're just looking for an experience report, yes i've used rpy2 several times over the years. requires a bit of setup in your script/notebook and some careful reading of the docs, but if you know R already it works pretty well. might be hard to use if you don't already know R.
i didn't really have a question heheh
nice!
i have not done it the other way, calling python from R. but i know there is a well-maintained package for it, developed by the rstudio people
i think it's called "reticulate"
cool, i'm thinking of maybe offloading some stats to R so I'm starting to look into this
careful, you might realize that R is actually a great tool and start wanting to use it more 😉
but really that's a good way to use it. if something is missing from statsmodels, or you just happen to know how to do it better in R, hand it over to R
the only thing to be wary of is that rpy2 does need to copy your data and send it over an interprocess pipe. so if you have anything "big", you might want to just write an R script and load it from disk or database directly.
worst case scenario, make a CLI and call shell, i've already done this a lot for common lisp/haskell/python interop
yup. that's definitely a way to do it
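A minimal sketch of the "make a CLI and call shell" approach, with JSON over stdin/stdout as the exchange format. To keep it runnable anywhere, the external tool here is a Python one-liner standing in for a program written in R, Haskell or Common Lisp; the `call_tool` helper and `TOOL_COMMAND` are names invented for this example.

```python
import json
import subprocess
import sys

# Stand-in for a tool written in another language. It reads a JSON list of
# numbers on stdin and writes their mean as JSON on stdout.
TOOL_COMMAND = [
    sys.executable,
    "-c",
    "import json, sys; xs = json.load(sys.stdin); "
    "print(json.dumps({'mean': sum(xs) / len(xs)}))",
]


def call_tool(values: list[float]) -> dict:
    """Send data to the external program over stdin and parse its stdout."""
    completed = subprocess.run(
        TOOL_COMMAND,
        input=json.dumps(values),
        capture_output=True,
        text=True,
        check=True,  # raise if the tool exits with a non-zero status
    )
    return json.loads(completed.stdout)


if __name__ == "__main__":
    print(call_tool([1, 2, 3, 4]))
```

Swapping `TOOL_COMMAND` for something like `["Rscript", "stats.R"]` (a hypothetical script) would hand the same data to R, at the cost of process startup and serialization on every call.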
curious what you're doing with all 3 of those languages in the same project
concurrency!
(i bet there's a cl2py library floating around somewhere)
haskell for concurrency, common lisp as main driver, python when i don't want to think
interesting. common lisp almost acting as a shell in that sense?
what kind of a project did you use that for?
so far I've got a single actual use, but the idea was to make a more general workflow where I could spend time in whichever programming language i want by dividing programs into very small pieces and whenever I feel like something is better done in one language, I would just switch to it, supposing the incurred overhead isn't problematic (so this sort of thing is not for computing-intensive applications)
oh i see. sure, that's a fun idea. very "unix philosophy".
but the only implementation I have is an strace parser, which uses pyparsing for the parsing, haskell calls that handle the IO, and little bits of common lisp here and there. it was a short little project, it's more of an idea but I don't think it's really anything innovative or too hard to implement
One of my colleagues uses Reticulate
I personally avoid using R as much as possible nowadays
It's a statistics DSL for me
Guys, my VS code works, but tensorflow_probability.distributions IntelliSense is broken. So I started digging source code and found that for some reason I cannot import tensorflow.compat.v1 and tensorflow.compat.v2. I could change this in source code, but it would take a very long time as this is broken in many many places. Does anybody know what's wrong with the source code?
Python version 3.11.4
pip version 23.3.2
pipenv version 2023.12.1
[packages]
tensorflow = "==2.14.0"
torch = "==2.0.0"
pybullet = "==3.2.5"
matplotlib = "==3.8.2"
gym = "==0.26.2"
pygame = "==2.5.2"
tensorflow-probability = "==0.22.0"
I want to improve myself in AI. What advice do you have for me?
what's the right way to transform a dict of dicts where the outer keys are the rows and inner keys the columns into a polars dataframe
ah I found it... from_dicts
if you are new to AI, there are no shortcuts other than reading AI books, solving AI, maths and coding exercises, experimenting with AI code, watching tutorials and reading AI papers
read and understand nanoGPT and you will understand AI. Check out Karpathy's videos on YouTube and UT Austin's NLP series
https://bbycroft.net/llm <- of use
Context: This code is based on a 3-layer fully connected neural network trained on handwritten digits 0-9. The back-query code takes an output value of 0.99,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01 and runs it backward through the network to get the pixel values at the start of the network, to see what the network's definition of a 0 is.
So my question is: after the inverse sigmoid is applied and that vector is multiplied by the transposed weight matrix, how is that supposed to give me the activation values of the previous layer? If I take the dot product W * X = Z and then compute W.T * Z,
that does not give me X. So how can back query be useful? It clearly is useful, because when I run the code it shows me the network's idea of a 0, but I can't piece together how it works in my head.
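The observation is right: in general W.T * Z is not X, because the transpose is only the inverse when the matrix is orthogonal. What the transpose does do is push each output signal back along the same weights that produced it, so inputs that contribute strongly to the "0" output get a large value. If the back-query code rescales the result into a valid activation range afterwards (an assumption, since the code is not shown), only that relative pattern matters, which is why the picture still looks like a 0. A small pure-Python sketch with made-up 2x2 matrices:

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


x = [1, 1]

# A generic weight matrix: multiplying by W.T does not undo W.
W = [[1, 2], [3, 4]]
z = matvec(W, x)
back = matvec(transpose(W), z)

# An orthogonal matrix (a 90 degree rotation): here the transpose is the inverse.
Q = [[0, -1], [1, 0]]
zq = matvec(Q, x)
back_q = matvec(transpose(Q), zq)

print("W.T @ (W @ x) =", back)    # differs from x
print("Q.T @ (Q @ x) =", back_q)  # recovers x
```

So back query should be read as a visualisation of which inputs the network associates with an output, not as an exact reconstruction of the previous layer's activations.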
Hi, someone who has a data science/ML/AI related job, can you guide me on how to make my resume pls
How should I structure it, what type of info. should I add, etc etc
(If you can give me a template, that would be helpful.)
it wouldn't be fundamentally different from other resumes. what matters is what skills and experience you actually have, and conveying it as best you can.
so, why should someone hire you for an ML position?
this is the resume template that I use btw https://www.overleaf.com/latex/templates/awesome-cv/dfnvtnhzhhbm
Thank you!!
I’m new to ai (jr. in uni); anyone have any recommendations for a yt or intro area for image recognition?
This is about nn
https://youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
Couldn't find the playlist, but if you scroll back to videos from 6 years ago, there are many about cv
Welcome to the official DeepLearning.AI YouTube channel! Here you can find the videos from our Coursera programs on machine learning as well as recorded events.
DeepLearning.AI was founded in 2017 by machine learning and education pioneer Andrew Ng to fill a need for world-class AI education.
DeepLearning.AI has created high-quality AI program...
OpenCV also has some courses, but I have no firsthand xp with them https://opencv.org/university/free-courses/?utm_source=opcv&utm_medium=menu&utm_campaign=obc