#data-science-and-ml
1 messages · Page 133 of 1
And you could construct the algo however you wanted
And just like you, EAs are things I use at work but super infrequently
I think they are underrated and DL has taken quite a bit of the spotlight. But they are still worth knowing and having in your toolbox
Also it was cool to see some of the papers on EA applied to DL architectures
or even the evolution of weights
To me they occupy a very different space
Needing data vs not needing data
Interesting
When I need to fit some curve and I have no idea of the properties (convexity or so) + hyperparameter tuning
But I agree, they're important to have in your toolbox
My colleagues would try and solve typical problems like TSP with machine learning
Or nurse scheduling, (3D) knapsack etc.
oh yeah they are fun
Check the pinned posts 😄
ok
check out the pinned resources
Follow-up on this: What benchmarks you look for when evaluating a DL/ML classification paper. I was wondering things such as dataset, accuracy, confusion matrix. Is there anything else that can give me insights about their attempt? Thanks man 😄
which social media site is good for a sentiment analysis project? i was going for twitter but its api is paid now'
accuracy may not be great when dealing with imbalanced data
e.g. say you have a house fire dataset where only 1/100 houses actually caught fire, you can get 99% accuracy by always guessing no fire, but that model's terrible in practice (it's more important to correctly predict those that may catch fire to deal with it preemptively, than to correctly predict those that don't catch fire)
so you have 3 more stats, precision, recall, and f1.
to put it simply, if I were hunting a group of 100 ducks and got 60, I'd have 60% recall(which measures how many sheep I got out of the total); but in the process I also mistakenly shot 60 geese, then I'd have a 50% precision(which measures how good I was at shooting actual sheep and not something else)
I could be more cautious and only shoot those that I'm sure are ducks, maybe then I'd get 20 ducks and 0 goose, then I'd have 20% recall and 100% precision
f1 is like a healthy blend of precision and recall
Can't I directly calculate precision, recall and f1 from the confusion matrix?
you can also directly calculate accuracy from the confusion matrix, what's your point?
I get what you are saying. I was reffering to having cm is sufficient enough to calculate those parameters
Like why would I explicitly look for f1,recall and precision when I can see them from the cm?
So I've finished my Youtube Dislikes Predictor with linear regression, but I'd like to expand upon it. In particular, I am looking at these two unused features. From a glance, am I likely to get any possible insight from these variables at a significant level? If so, what techniques do you recommend I try?
fair enough
Also research papers might be pretty faked. Am i missing some metrics that can evaluate attempt
And help me identify weak links within the model
if i am using ollama locally
does it restart the specified model each time i make a request?
or are all the models that I pulled always running
i can see that ollama is always running
there is a keep_alive parameter which "controls how long the model will stay loaded into memory following the request (default: 5m)" (src: https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-chat-completion)
How does go mod tidy search for the module? Like in what order?
I want to transform large amount of data from kaggle to the cloud in database how can I do it I want to build pipeline ??
is it fine to learn nlp even i dont master python yet
learn python first!
i already know python but not mastered yet
I wonder how good it is to fine tune a small English pre trained model for a non-english language task
Or if it would just be better to pre train a very model entirely on the other language and specialize the task
Depends on how close the language is to English, but I'd say it's probably better to train a model from scratch
Hi everyone, Im Anna. I'm currently finishing my MBA with a focus on quantitative finance and for my dissertation I'd like to do a paper on "Machine Learning for Financial Market Forecasting: Unveiling Hidden Patterns for Informed Decisions" using LLM. I have a grounding in data science and I'm currently completing a course on the subject. However, I need help with the practical part of the model and training. If someone is interested in the subject and wants to help me, feel free to write me inbox so we can talk about it. Thanks
Could someone help me get started with the first task please? 🙂
This is how far I've gotten 🤦:
class Interval:
def __init__(…)
is it meant to say that c or d can't be 0 or that there can't be a 0 in that range? cuz I don't see why there couldn't be a 0 somewhere in that range
mmm, well, what exact next steps are you looking for? like, what exactly is not clear to you? I personally find the operator rules translate really well to Python
That's a interesting inquiry, I can check with my teacher.
Basically I don't know what the theory is even covering, like what am I looking at, it's just a bunch of jargon
well, let's break it down a bit
well, first, how much do you know about classes in general?
how many instance attributes would a single interval need?
or even earlier, how many arguments would __init__ need to take? (excluding self and assuming a simple case like the task suggests)
i was referring to the math related stuff, what in it do i need to understand to tackle the first task
it's just a bunch of information, would help if it was broken down
well, naively speaking, you pretty much need to grasp this bit
like, what part of the math do you not understand? cuz I find it explained relatively clearly 
why do min and max get involved for the latter two?
tbf, they are technically involved for the first two as well, but they can be simplified in those two cases
This reminds my of last years Advent of Code, wasn't there a task one could solve with this approach?
for multiplication I can think of cases where the smallest (left) values can multiply and return a larger value than other combinations
same for division
like, you can't just simplify them to simple arithmetic because there are several possible outcomes depending on the values used
I suppose they can be alternatively written as piecewise functions as well
I don't even know where to begin breaking it down 
like, do you have at least some intuition on what an interval is or what +/- is?
yes, [a, b] could mean that there's some parameter t that is within the bounds of a and b, e.g. a≤t≤b
and yes, i know how arithmetic operators like + and - works
does the task have to do with error margins?
This is about the butterfly effect
cuz division by 0
tbf, task 1 is simply asking you to create a class that takes in two values and that's about it
it's an "or" question 
I don't understand why there can't be a 0 somewhere in the range (a, b)
It’s marginal error
i don't see any question at all, the first image is all definitions
well, you replied to my question
He wants it explained
naturally if you want to divide a by b, b can't be 0
Then what the fuck was all that theory laden jargon about
Overwhelming the student
Teachers need to learn to condense the content
if 0 is in the interval, you'd have to split it in two because 0 wouldn't be in the domain anyway
well, you should have understood that after reading that whole thing 
you'd also have half open intervals
It gave you everything upfront for future tasks.
Care to elaborate with an EASY example?
students need to learn to read and extract important info
mmm, is it to do with how division is weird with stuff? like, how you can't just simplify some stuff sometimes because there's division, I think we had a discussion about this at some point
division by 0 is undefined and is always problematic
When you do allow division by zero, it's usually boring (e.g. all numbers are now equal to zero).
I’m abouta give my teacher some problems
They gave the F=ma example, it's basically trying to keep track of what the possible values could be (smallest to largest).
(An interval)
TLDR here’s how you can find out how to calculate the error margin, or the range within which the value you’re seeking is in?
(Or with more dimensions, a bounding box)
It’s a marginal difference between the actual size/weight between then actual figure unit
It come out to play in a very big way
Like in the stock market
Go to bbc channel then you find it
“The butterfly effect”
It could be for errors, but also just like for example "it could end up anywhere in between these two."
can it be interpreted as all [infinitely many? uncountably many?] values from one interval being divided by all values from the other interval? and that happens to include 0, so that bit has to be excluded (with say half open intervals)
Like if I throw a ball, and want to say it could end up in this interval, depending on this interval of mass, starting velocity, etc. And I want to calculate that interval.
It’s complex but you can find more there
It’s not okay if I post the link here
It sounds familiar. The butterfly effect, isn’t it that a butterfly could cause a typhoon by flapping its wings since it has a negligible ,but nevertheless a contribution, to the typhoon occurring?
sure, but I'm not sure how that's exactly related to the topic at hand
me neither
Yeah something like that
As it flaps it wings it’s could cause a tornado/hurricane in somewhere like in the US if the butterfly if it’s in Mexico or there about
Alright bro but what does it have to do with the theory
all possible combinations, yes
Yeah it is
It’s a bit complex that what makes final error is about that is minor but comes out to play a big difference
That just explaining I didn’t mean it
I see, that's cool
it's just formulated in a way you're not used to
you've done this all along whenever you got those questions about domain and range of functions
elementary arithmetic operators are binary functions too, and this shows one way of studying the domain and range
yeah, but using two ranges arithmetically threw me off a bit
@cloud flower
It’s proportional to the actual figure by a slight difference
When you guys are done chatting about the math stuff I will repost my question and what progress I’ve made🖐️
Just remembered 🧠 my brains now working harder
why does tensorflow upwards of 2.10 not support native gpu support on windows
so i have this very simple code: print(data["rank"]) raster["rank"] = data["rank"] print(raster["rank"]) which somehow outputs```idx
0 65.686275
1 77.450980
2 80.392157
3 37.254902
4 68.627451
...
576 68.000000
577 51.000000
578 46.000000
579 83.000000
580 84.000000
Name: rank, Length: 581, dtype: float64
0001_U_0018_2010 NaN
0010_L_0002_2010 NaN
0012_L_0048_2010 NaN
0013_U_0016_2010 NaN
0015_L_0050_2010 NaN
..
0072_L_0031_2010 NaN
0008_U_0012_2010 NaN
0009_L_0086_2010 NaN
0009_U_0009_2010 NaN
0009_U_0034_2010 NaN
damn guys, I dont think its posssible to running on my system. I've tried a lot of different things, but I would have to find something with more GPU memory.
Traceback (most recent call last):
File "/home/plunder/CATMANDODO63.py", line 692, in <module>
main()
File "/home/plunder/CATMANDODO63.py", line 687, in main
model = train(model, train_dataloader, optimizer, criterion, tokenizer, device, epochs, val_dataloader, num_frames)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/plunder/CATMANDODO63.py", line 423, in train
scaler.step(optimizer)
File "/home/plunder/miniconda3/envs/qusar/lib/python3.11/site-packages/torch/cuda/amp/grad_scaler.py", line 416, in step
retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/plunder/miniconda3/envs/qusar/lib/python3.11/site-packages/torch/cuda/amp/grad_scaler.py", line 315, in _maybe_opt_step
retval = optimizer.step(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/plunder/miniconda3/envs/qusar/lib/python3.11/site-packages/torch/optim/optimizer.py", line 373, in wrapper
out = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/plunder/miniconda3/envs/qusar/lib/python3.11/site-packages/torch/optim/optimizer.py", line 76, in _use_grad
ret = func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/plunder/miniconda3/envs/qusar/lib/python3.11/site-packages/torch/optim/adam.py", line 163, in step
adam(
File "/home/plunder/miniconda3/envs/qusar/lib/python3.11/site-packages/torch/optim/adam.py", line 311, in adam
func(params,
File "/home/plunder/miniconda3/envs/qusar/lib/python3.11/site-packages/torch/optim/adam.py", line 565, in _multi_tensor_adam
exp_avg_sq_sqrt = torch._foreach_sqrt(device_exp_avg_sqs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 GiB. GPU 0 has a total capacty of 23.99 GiB of which 0 bytes is free. Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use. Of the allocated memory 66.75 GiB is allocated by PyTorch, and 15.41 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
(qusar) plunder@localhost:~$```
I feel like the code is right there, I just dont have the resources to run it.
I tried so many different things. But I i need commerical equipment at this point I guess. Switching the batch size, number of frames, accumlation steps in the training.. I can get it to go for a bit, but runs out of memory.
!paste
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.
Heres the code. https://paste.pythondiscord.com/MV3A
Anyone have any suggestions?
Maybe I can use a gpt2 distill instead...
how long are your videos?
like 10-15 seconds.
how many fps?
25/s
so you are effectively working with a batch size of 375, using an extremely large model in consumer grade GPUs?
I'm not sure if I understood your loading code
I had a batch size of 16, tried to reduce it all the way down to 2. It runs for a bit. Input to VideoEncoder: batch_size=2, num_frames=16, channels=3, height=128, width=128
So no 375 batch sizes 🙂
I must have misunderstood how it integrates with the loader then
is this the correct way to do sentiment analysis
from textblob import TextBlob
def polarity(text):
return TextBlob(text).polarity
df['polarity'] = df['lyric'].apply(polarity)
def sentiment(label):
if label < 0:
return "Negative"
elif label == 0:
return "Neutral"
elif label >= 0:
return "Positive"
df['sentiment'] = df['polarity'].apply(sentiment)
I think I got it to work incorporating gardiuent accumulation and tensor management.
Keep my fingers crossed! Epoch 1/1: 16%|█████████ | 231/1402 [15:59<1:33:01, 4.77s/batch, Batch Loss=0.00133]Input to VideoEncoder: batch_size=4, num_frames=8, channels=3, height=128, width=128 After view reshape: torch.Size([4, 24, 128, 128]) After conv2d_layers: torch.Size([4, 512, 128, 128]) After view reshape before fc: torch.Size([4, 8388608]) After fc layer: torch.Size([4, 512]) Input to VectorQuantizer: torch.Size([4, 512]) After flattening and reshaping: torch.Size([32, 64]) Distances shape: torch.Size([32, 512]) Encoding indices shape: torch.Size([32]) Quantized tensor shape: torch.Size([4, 512]) Commitment loss: 1.738540959195234e-05, Codebook loss: 1.738540959195234e-05, VQ loss: 2.1731761080445722e-05 Input to VideoDecoder: torch.Size([4, 512]) After fc layer: torch.Size([4, 131072]) After view reshape: torch.Size([4, 512, 16, 16]) After conv_reduce: torch.Size([4, 512, 16, 16]) After conv2d_transpose_layers: torch.Size([4, 24, 128, 128]) Channels: 3, Expected size: 1572864, Actual size: 1572864 Final output shape: torch.Size([4, 8, 3, 128, 128]) Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation. Epoch 1/1: 17%|█████████ | 232/1402 [16:04<1:32:57, 4.77s/batch, Batch Loss=0.00223]
How much math, is enough math
to get started
i've studied a bit calculas in high-school also linear algebra,
do u have any specific resource for Stats and Probability to be specific? @final kiln
the problem with khan academy is
they have tooo toooo tooo deept in content
it literally took me idk how many hours to just do 1 out of 3 unit of Linear Algebra
also,
_ practicing frequently with py and jupyter notebooks_
how can i practice math with kaggle 🧐 i mean from python/coding itself
how can i practice math from coding
ohh
how can i do stats and probability from python
ohh
okay
ohh
Hey folks - I have to routinely run some data manipulation of my postgres database (likely once per day), I am considering making a python script deployed on k8 as a cronjob, but it feels a bit like overkill. Are there simpler methods?
Why do peeps code AI in notebook, and not just python files like every programmer does?
Hey there im using Autogen AI conversable agents to generate images. In my output i have a few urls that i need to get out of the conversation but i can't seem to find a solution for this problem. In the omage you can see one of the agents talking and giving an output url "ai_generated_img" with the link that i want to save in a variable
Can we consider supervised learning, as where result is known, and unsupervised where the output is not known?
it's not necessarily "known" / "unknown", but rather labelled or not
take outlier detection an example - you may train a model without telling it which records are outliers, then validate that it is catching all records you know that are outliers
another example would be the unsupervised pre-training step of GPT models and alike, I wouldn't consider there to be any unknowns, but it's still considered unsupervised
I was reading a book "100 pages ML book" and it has this formula of Support Vector Machine (SVM)
y = sign(wx - b)
Then it continues to say that ```math
wx - b >= +1 if y = +1, and
wx - b <= -1 if y = ≠1
But shouldn't it be using limits here?
Since, `sign(0.01) = +1` and `sign(-0.01) = -1`
@agile cobalt i would actually call known vs unknown as you did, since the output of a network is not always the label, and some unsupervised and self supervised approaches treat the input data as its own label too
Do you have to remember lots of formulas for ML?
I'm not an ML guy, but I think in math in general what matters is understanding the concepts behind the formula, not memorizing the formila
o.O
i agree with that
i was also about to write this as well. it just happens inevitably
So... I have two training/learning question. So my IT (A) and a co-worker (B) would like to learn python data entry, transformation, and clean up.
The packages I know about are scikitlearn, scipy, and pandas.
Q1) I primarily write in R and python mix (using both together), but I use pipe notation in R (see link). My boss also uses R notation, but does not know pandas. Only myself and IT person A have used pandas. https://www.r-bloggers.com/2021/05/the-new-r-pipe/
Does the scientific python packages have a general pipe operator or function for general python and data clean up? Because I like to convert my code over pure to python so they can learn better.
R 4.1.0 is out! And if version 4.0.0 made history with the revolutionary change of stringAsFactors = FALSE, the big splashing news in this next version is the implementation of a native pipe. The new pipe The “pipe” is one of the most distinctive qualities of tidyverse/dplyr code. I’m sure you’ve used or seen something like this: library(dplyr) ...
Q2) Are there any new scientific packages or resources since 2022 that people recommend to make coding easier?
how does YOLO object detection predict bounding boxes? i understand that it uses a grid and each cell predicts a bounding box(es) and class but i don't understand how its possible. does each grid cell run its own classifier so to speak?
Not sure if this answers your question but I found a related blog post: www.analyticsvidhya.com/blog/2018/12/practical-guide-object-detection-yolo-framewor-python/
Is it possible to fine-tune a model using 100 classes, in a laptop with a RTX 4060?
what is the least jarring way to step into learning more about the engineering side of AI?
or do I just need to go headfirst into learning the scary half of calculus?
I know a little bit about the big three (most comfortable in Calc, okay in stats and shaky in linear algebra)
i would argue that the linalg is the most important component, and the stats right after
the calc is often used more as a tool in helping out with getting nice results for the other two
okay thank you
Can running neural networks dismantle your pc?
And, when is imagedatagenerstor better? Under what circumstances?
Calculus is very easy. Just don’t let people get in your head.
doesn't it get really wild as you start sinking deeper?
Define "the engineering side"?
Being a consumer of GenAI libraries and services needs absolutely 0 math
And it's, for better or for worse, what many companies consider to be the "engineering side of AI"
I don’t know. Whatever is “hard” is dependent on the person. There is nothing that cannot be learned. I thought Game Theory was harder than like the most insane math classes ever.
idealistically working towards being able to make my own from the ground up
whether it ever makes sense to actually do that or not is an entirely seperate question
I think I might be mixing it up with something else at this point
Just, learn it, listen to no one’s opinion on it who isn’t suited to teach it, and decide for yourself.
I thought calc 3 was a walk in the park. Like, dirt easy.
Ok, like what then?
Who said I didn’t?
No, I just took so many partials and optimization problems with constraints to the point it was easy when I took it.
Bro, I took adanced calculus 1 and grad level optimization class. What I meant by hard, depends on the person. I don’t understand finance at all, but calculus never ever gave me a problem.
Ok, give me a problem. I don’t know how to answer that.
Like, end of calc 3 with cramers rule? I don’t know. Not even bad
All I am saying is it just depends on the person. I agree, abstract algebra is cancer, I hate Game Theory, but that is because there are not as many books on it as there are for calculus, matrix and linear algebra, stats/probability. I just think people should decide for themselves.
I don’t remember much, this was a while ago. I took calc1-3 in 2016-2017, I just remember game theory was insanely hard to me.
Sounds tough as hell.
But I've also heard that real analysis is very hard in general, personally I have no idea what it is
That sucks
Real analysis is stupid hard tho in all honesty
Stats, like I took stats 1-3, metrics 1-3, did well in all of them and I don’t remember anything from it. Like, basic concepts.
It’s not that I don’t get it, it’s just easier to explain concepts. I don’t know calc1-3, matrix and linear algebra, optimization were always much easier to me compared to even like finance stuff. I swear, it took me so long to understand the concept of a bond.
Is udemy free course on AI/ML is recognized as advertisement?
I mean I am new here, not much aware of rules, hence asking...
Can you remove this message? ads are against the rule and it's an advertisement
Any of you ever take Game Theory? It is a literal branch of mathematics.
The decoder in a transformer is trained on a whole sentence at once. It has a mask during self attention to prevent the an earlier timestep from looking forward and directly seeing what it should be outputting.
As far as I am aware the feedforward layer near the end is just an MLP, and they work by having every neuron in a subsequent layer sees all the outputs of the previous layer.
How is looking forward also prevented in the feedforward layer?
Thx did not know that
I assume the encoder is also only by embedding?
Thx
Thank you for acknowledging it as a branch of mathematics. It was until the 70s or something and all the pioneers of game, pretty much had an insane influence or partial differential equations.
@iron basalt
It's a field of mathematics, yes.
Invented / pushed by the cold war to beat the soviets.
(Its modern form (although like all math, you can find it waaaay earlier))
Industrial Organization!!!!! Yo! Bertrand games are so fun!
Bertrand Duopolies were just limits pretty much, but for price wars. Industrial Organization is my favorite topic of all time.
It's really fun and will change your perspective on all the actions taken by nations and such (a better understanding of why they are doing what they do (they calculate things, especially in wars)).
Yes
Was confused me were sequential move games when the second agent/player had multiple moves at a node when the player before already moved. I never understood that logic at all. And Bayesian Nash equilibrium, pretty much just sequential games, not simultaneous games
Bertrand and Cournout. I love game, just really wish I understood when similar about games turn sequential and then players have more moves for some reason.
Combinatorial game theory.
Do you know what I am talking about tho? I get they are using combinations of all possible moves, but, the person could not move at a specific node so I never understood roll they got second mover advantage in rollback equilibrium.
I did. I was taught Game Theory & Gambler's Ruin in my Statistics course.
Not sure, the sentence is a bit hard to follow. Do you mean how second-mover advantage can be a thing / under what circumstances?
so i am trying to create custom tool for agent but why does it get stuck like this?
what does this even mean. it runs the tool perfectly
@tool
def get_prices(query:str):
"""Can be used to get current market prices of any crypto asset"""
return 69.69
prices_tool = Tool(name = 'crypto prices',func= get_prices,description="Can be used to get current prices of crypto assets")
x = [prices_tool]
agent = initialize_agent(x,llm,AgentType.ZERO_SHOT_REACT_DESCRIPTION,verbose= True, handle_parsing_errors =True)```
This is how i am using it (hardcoded for now bcz it is faster)
LLMs have totally replaced regex for me
because you ask the LLM to write the regex, or what?
no like most of the times llms are just using natural language string to get some some data from that string
used to do regex for that and now everyone wants to use llm for it
also llm write regex really good for some reason,
🤨
No, when the rollback from an extensive for game changes because it was fist a simultaneous move game.
I drew out a example very quick
Anyone understand why the d2 vector for language is 0, and why on d1 “what” is 0.25 and candy 0.125 if they both show up once?
If it becomes sequential the second to move player can condition their move on the first player-to-move's move.
Now I know. It started out as a simultaneous move game and the row player had two plans of action but since it was originally a simultaneous game, the sequential move benefits the second against because he now has 4 options instead of 2. Yeah. It’s just 8 plans of action now. While the row player still only has two the column player picks the highest payoff.
Depending on the payoffs even after going sequential it could be first to move advantage (manipulate the possible moves for the second to move player) or second to move advantage (condition on first move made (more information (which may mean more moves))).
In this game, if the row player were to move second, would it be the same scenario and what is the rollback for that game? I am trying to remember the reasoning behind this. I’ve been relearning game for fun.
Are you referring to the information set? Separating equilibrium? And grim trigger and fit-for tat, was the made solely for collusion? Infinitely repeated prisoners dilemma.
So we don't have to refer to them as row and column player I found this diagram, with same payoffs:
(Up,low)
White plays low if black plays up, and white plays high if black plays down. This is the only subgame perfect equilibrium.
When you install a library through pip on Windows, sometimes you may encounter this error:
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
This means the library you're installing has code written in other languages and needs additional tools to install. To install these tools, follow the following steps: (Requires 6GB+ disk space)
1. Open https://visualstudio.microsoft.com/visual-cpp-build-tools/.
2. Click Download Build Tools >. A file named vs_BuildTools or vs_BuildTools.exe should start downloading. If no downloads start after a few seconds, click click here to retry.
3. Run the downloaded file. Click Continue to proceed.
4. Choose C++ build tools and press Install. You may need a reboot after the installation.
5. Try installing the library via pip again.
In that sequential game it would have to be (up,low). If black plays down, white plays high, this gives black a lower payoff. Black always chooses up and white always chooses low, unless the game becomes simultaneous. Then, it would depend in the second mover.
Black plays down.
It's first mover advantage.
This has second mover advantage:
Oh, white has the payoffs to the right in that game.
Yeah, I was reading it (white, black) in terms of payoffs
I was a bit confused too, used to the chess order
That game is weird. But yeah, Bernard would have: (zig,zig), (zig,zag), (zag, zig), (zag,zag) and Amy would only have up and down
Yeah.
I think you got it.
I do get it, it is just weird when the game starts out sequential and turns simultaneous.
I drew it out
I drew the starting game table and did when each person moved first. They have the same payoffs demo ending on the order of the moves and there is no pure strategy Nash equilibrium in the original game.
Hi! Apologies if this is a dumb question, I'm not a data scientist, just a programmer 🙂
I have two 2D spaces. In each space there are some number of points, each identified by (x,y) and some (text) label. The labels are not guaranteed to be unique (there might be two points in the same space with the same label), and there's no guarantee that every label in one space also exists in the other, but there will be a lot of correlation.
What I want to do is find a best-match transformation matrix to convert coordinates from one space into the other. This seems like the sort of thing that ought to exist (image processing has to have this covered, right?), but I'm not sure what I should be looking for. Can someone point me in the right direction, please?
not trying to blow you up about game man, I just kind forgot how much fun it was (and just doing math in general on paper was instead of programming all of the time) but like, with grim trigger, can that be used to detect if a a company will collude or defect on a collusion data set?
how are convolutional layers (output tensor) flattened into fully connected layers?
for example, in here, are there really 1690 * 10 * classes (10) weights to learn just for the final layer?
everyone is!
its 1690 * 10 weights
its just 13x13x10 flattened into 1690 inputs, and 10 outputs, so the weights are 1690*10 matrix
Hi guys,
I've integrated a RAG LLM-based OpenAI chatbot into my app. Now, I want to implement algorithms within the chatbot, like procedural steps that my chatbot will do sequentially. For example, the chatbot first asks the user for name, then save that information, then ask for their age, then again saves it, and finally when he has all the necessary informations he computes something based on this and then provides the result.
Is there a tool or method to achieve this within Langchain framework? Or can you point me to some keyword name. Thanks 😄
Hi everyone,
a very basic question from me, to decide between two implementation of Tensorflow: C API or Python?
I understand that Python is still implementing code written in C, but I am wondering if the size of data I am dealing with, makes it more time-consuming? I am trying to simulate a recursive iteration process of the type:
x[i+1] = f(x[i])
where x[i] is a vector with millions of components. Conventional iterative methods lead to "stagnation", kind of like vanishing gradient. So x[i+1] ~ x[i] after a while. I have tried updating the iteration scheme itself, include higher-order terms, but it is already slower.
So I was thinking maybe with some architecture of NNs, it might be possible to "jump ahead" in the iteration scheme, to accelerate convergence and get out of stagnating solutions.
Now comes the issue of what implementation to use: the standard, well-documenter Python Tensorflow, or the C API that barely had any documentation, just a loose clump of github pages. I thought that even if training is done in C/C++, for a NN defined in Python, maybe it would be faster? Or is it not so? Even if there's some degree of speed-up in C++, is it possible to implement it as quickly and consistently as in Python (using Python 3, if that matters). All CPU too, btw, no GPU. Got a whole cluster of hundreds of CPUs.
Thank you for any help/insights/suggestions!
Are you asking whether there's an appreciable performance difference between using tensorflow in c++ vs Python?
Yes, especially with the millions of datapoints here. I found a stackoverflow post that first said that OP found C++ API was somehow slower than Python. Not much responses there and later OP found some way to get speed-up, but no clarity on if it is appreciably faster than Python.
I don't have direct experience, so I'm speaking to generalities: the Python interface to tensorflow is the primary one, and most used one. I doubt you'd notice any improvements, and certainly would have a worse development experience.
But, perhaps in some isolated use case, one might find ways of improving... but I'd think you'd be at some expert level and a year or more of experience
Yeah, I just need to read it from others with more experience. I have tried Keras here and there, and now I have to go a level lower and use Tensorflow directly. I hope whatever model is implemented in Python can handle the data. Of course, people do image processing and all in Tensorflow Python, but this recursive iteration scheme might need some matrix multiplication operations (that are already millions x millions ~ quadrillions of unique elements).
matrix multiplication never happens directly in python
If the main thing that the iteration scheme does is matrix multiplication, that's evidence towards there not being a speedup from doing it in a compiled language, because multiplying matrices on the python side is already very efficient.
But doing these kind of multiplications is very slow. Not even enough RAM. Then I have to do some multiprocessing parallelisation, which is still way slower than openmp for loops in C.
...what do you mean? Are you writing your own matric multiplication function from scratch?
Since numpy's matmul implementation is already in a compiled language, and can use multiple cores if you have the right numpy build (I think it has to be the MKL one and not the BLAS one).
No, I cut the matrix into chunks, do matrix multiplication using numpy routines and put the result back together
Ah, I see. Is your matrix sparse?
Nah, pretty dense matrix, hence the quadrillions of elements.
Hmm, and yet it works better to manually split it into chunks than to let numpy handle it?
It doesn't even fit in the RAM. TBs of matrix alone. So I cut it into pieces and do batch-wise multiplication.
Oh wait, quadrillions of elements - so you can't even- yeah, okay, that makes sense
I wonder if something like dask has a streaming matmul implementation that'd work here, but it makes sense that you made your own. Anyway, naively it seems to me like there's nothing there that can't be done efficiently in python (do a partial matmul via numpy, afterwards (or even in the process) start loading the next chunk, etc), but I might not be thinking about some details of your implementation.
Using for loops to cut chunks then parallise the chunks. For loops already bad enough. OTOH, for loops in C are just simple, all static data types. Another advantage of C is now I don't need to define thr whole matrix. Just parallise the for loops, do some aggressive compilation, and it works out. Was hoping to get that compiler magic for NN training too.
For loops already bad enough
I don't really get why. The overhead of things like looping in python becomes noticable when the body of the loop takes very little time - but for you it's a pretty big matmul, so it seems to me that it shouldn't matter.
I did the chunks thing in python first. It is painful. I was thinking dask etc, but too much abstraction is going to make it slower. Trying to keep it simple. I wanted to try and "strip" all the OOP stuff on numpy matrices, if that made it faster. Was thinking of Cython because of it. Had some discussion here, found even Cython is at best a bit slower than pure C, so switched to C.
I am guessing it's the checking for datatypes and each loop has to then start the multiprocessing part. JIT and numba were other ideas to make the loops faster, but if I can get fastest-ish in C, why the extra effort?
oh, If you're using multiprocessing I actually have a guess what was happenning - it was probably the fact that arguments to functions generally need to be serialized in order to send them to another process. So you might have been eating the serialization overhead on all of that data, which is quite a lot.
Now with tensorflow, situation is different. Not seeing much discussion on the C API, just bits and pieces scattered around. Official Tensorflow website has a barebones as well on C API. So have to contend with this in Python.
I don't know much about the tensorflow C API (when I tried using libtorch my experience was that there was basically no docs about the C++ side and I had to read the python ones instead, so it's not much better either), but since you're basically just working with raw arrays, all you really need is to extract the pointer to the data and some shape information, and then work on that.
I was sending some index markers and a some small vectors from which the massive array would ve constructed (meshgrid without using meshgrid because of memory constraints). So each thread in the pool just made the piece it needed, did the math, and gave its part of the final output vector.
The indexing and slicing part becomes expensive. But if there's no other option...
I was thinking that maybe I will try to avoid the matrix stuff altogether. The tensors in the DNNs are kind of emulating that part anyway. So I could maybe try to emulate a massive matrix with quadrillion elements with say, 200 neurons with some 10-100 rank tensors? If such a thing is possible. It will be approximate, but the hope is that the DNNs find the most essential features to sufficiently emulate just right enough information to feed into the recursive iteration.
Maybe? I'd expect that if it's the kind of task that it needs an iteration scheme to compute, it would also be unstable and amplify the approximation errors exponentially as the iterations go on. But if that's not the case for yours, maybe it'll work.
Yeah, that is one of the problems. It's compounded by other kind of specific errors as well. I was just using the NN aspect as a "quick fix/jump" to the main method. I guess that is why this is still an open area of research, otherwise someone would've done it already lol. Thanks for taking the time though.
in pytorch, is there any disadvantage to using the lazy modules vs. the non-lazy ones? (e.g. nn.LazyLinear vs nn.Linear)
I see... to me right now they just feel like nicer-to-use counterparts to their non-lazy siblings
When you use it the warning tells you the downside 🙂
The API is unstable
I typically use non-lazy as it's a sanity check while I'm implementing the net
Hey Guys! Consider you are implementing a fashion recommender system which recommends user what to wear on that day on the basis of various factors which will make user look better on that particular day ... What are the requirements I need to have in my mind already for creating one
uh, you'd need to somehow obtain a dataset of "various factors + clothes worn -> 'how good they look'", which, good luck with that.
And what about the weather and surrounding factors coz they too will have an impact on the looks of what we wear i guess
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
# Kodlar
# Veri Yükleme
veriler = pd.read_csv('D:\\Project\\eksikVeriler.csv')
print(veriler)
ulke = veriler.iloc[:, 0:1].values
print(ulke)
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
ulke[:,0] = le.fit_transform(veriler.iloc[:,0])
print(ulke)
ohe = preprocessing.OneHotEncoder()
ulke = ohe.fit_transform(ulke).toarray()
print(ulke)
print(list(range(22)))
sonuc = pd.DataFrame(data=ulke, index=range(22), columns=['fr', 'tr', 'us',''])
print(sonuc)
sonuc2 = pd.DataFrame(data=ulke, index=range(22), columns=['boy', 'kilo', 'yas',''])
print(sonuc2)
cinsiyet = veriler.iloc[:,-1].values
print(cinsiyet)
sonuc3 = pd.DataFrame(data=ulke, index=range(22), columns=['cinsiyet','','',''])
print(sonuc3)
s=pd.concat([sonuc, sonuc2], axis=1)
print(s)
s2=pd.concat([s, sonuc3], axis=1)
print(s2)
so im trying to train a nn using tensorflow, but it just returns the same output for all inputs. what could be causing this?
ulke,boy,kilo,yas,cinsiyet
tr,130,30,10,e
tr,125,36,11,e
tr,135,34,10,k
tr,133,30,9,k
tr,129,38,12,e
tr,180,90,30,e
tr,190,80,25,e
tr,175,90,35,e
tr,177,60,22,k
us,185,105,33,e
us,165,55,27,k
us,155,50,44,k
us,160,58,39,k
Us,162,59,41,k
us,167,62,55,k
fr,174,70,47,e
fr,193,90,23,e
fr,187,80,27,e
fr,183,88,28,e
fr,159,40,29,k
fr,164,66,32,k
fr,166,56,42,k
Please look my projects file
fr tr us boy kilo yas cinsiyet
0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0
1 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0
2 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0
3 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0
4 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0
5 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0
6 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0
7 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0
8 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0
9 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0
10 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0
11 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0
12 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0
13 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0
14 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0
15 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0
16 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
17 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
18 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
19 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
20 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
21 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
This is my result
Why are my results like this
hi i need help, im trying to clean some text but i get this error:
import nltk import nltk.corpus from nltk.corpus import stopwords from nltk.tokenize import word_tokenize from nltk.stem import WordNetLemmatizer import re import matplotlib def preprocessing_text(text): text = text.lower() text = re.sub(r"(@\[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)|^rt|http.+?", "", text) tokens = word_tokenize(text) return tokens def remove_stopwords(tokens): stop_words = set(stopwords.words('english')) filtered_tokens = [word for word in tokens if word not in stop_words] return filtered_tokens def cleaning_text(text): tokens = preprocessing_text(text) filtered_tokens = remove_stopwords(tokens) lemmatized_tokens = lemmatization(filtered_tokens) cleaned_text = ' '.join(lemmatized_tokens) return cleaned_text text = '/Users/avatarvaleria/UCSD/NLP/HH_english_transcripts.txt' with open(text, 'r') as file: text = file.read() print(text) cleaned_data = cleaning_text(text) print(cleaned_data)
`TypeError Traceback (most recent call last)
Cell In[59], line 1
----> 1 cleaned_data = cleaning_text(text)
2 print(cleaned_data)
Cell In[57], line 4, in cleaning_text(text)
2 tokens = preprocessing_text(text)
3 filtered_tokens = remove_stopwords(tokens)
----> 4 lemmatized_tokens = lemmatization(filtered_tokens)
5 cleaned_text = ' '.join(lemmatized_tokens)
6 return cleaned_text
Cell In[56], line 3, in lemmatization(tokens)
1 def lemmatization(tokens):
2 lemmatizer = WordNetLemmatizer()
----> 3 lemmatized_tokens = [lemmatizer.lemmatize(tokens) for token in tokens]
4 return lemmatized_tokens
Cell In[56], line 3, in <listcomp>(.0)
1 def lemmatization(tokens):
2 lemmatizer = WordNetLemmatizer()
----> 3 lemmatized_tokens = [lemmatizer.lemmatize(tokens) for token in tokens]
4 return lemmatized_tokens
File /opt/anaconda3/lib/python3.11/site-packages/nltk/stem/wordnet.py:45, in WordNetLemmatizer.lemmatize(self, word, pos)
33 def lemmatize(self, word: str, pos: str = "n") -> str:
34 """Lemmatize word using WordNet's built-in morphy function.
35 Returns the input word unchanged if it cannot be found in WordNet.
36
(...)
43 :return: The lemma of word, for the given pos.
44 """
---> 45 lemmas = wn._morphy(word, pos)
46 return min(lemmas, key=len) if lemmas else word
File /opt/anaconda3/lib/python3.11/site-packages/nltk/corpus/reader/wordnet.py:2096, in WordNetCorpusReader._morphy(self, form, pos, check_exceptions)
2094 # 0. Check the exception lists
2095 if check_exceptions:
-> 2096 if form in exceptions:
2097 return filter_forms([form] + exceptions[form])
2099 # 1. Apply rules once to the input to get y1, y2, y3, etc.
TypeError: unhashable type: 'list'`
You need to pass it as a single string not a list
Make a new def that handles that lemmization to each token in the list
im trying to change the learning rate in tf as such: ```py
from keras import backend as K
K.set_value(model.optimizer.learning_rate, 0.01)
but i get an error ```
AttributeError: module 'keras.api.backend' has no attribute 'set_value'
K.set_vaulue(model.optimizer.lr.assign(0.01) try that
same error
im on keras v3.3.3
how to find gtx 1650 gpu device plugin for tensorflow gpu support
latest ig and yes windows 11
i think i need device plugin for tensorflow to access my gpu
To use a particular device, like one would a native device in TensorFlow, users only have to install the device plug-in package for that device.
what is WSL2?
also is tensorflow intel works as device plugin? not sure
oh ic
so like
does pytorch provides all the functionality of tensorflow?
hmm lemme check
its so easy to make sequential models on tensorflow
i finally got the evaluation stage working correctly, i couldnt be happier! Epoch 1/1: 100%|█████████████████████████████████████████████| 351/351 [5:15:18<00:00, 53.90s/batch, Batch Loss=3.68e-5] Evaluation: 0%| | 0/88 [00:00<?, ?it/s]Input to VideoEncoder: batch_size=16, num_frames=16, channels=3, height=128, width=128 After view reshape: torch.Size([16, 48, 128, 128]) After conv2d_layers: torch.Size([16, 512, 128, 128]) After view reshape before fc: torch.Size([16, 8388608]) Input to VideoDecoder: torch.Size([16, 512]) After fc layer: torch.Size([16, 131072]) After view reshape: torch.Size([16, 512, 16, 16]) After conv_reduce: torch.Size([16, 512, 16, 16]) After conv2d_transpose_layers: torch.Size([16, 48, 128, 128]) Channels: 3, Expected size: 12582912, Actual size: 12582912 Final output shape: torch.Size([16, 16, 3, 128, 128]) Evaluation: 1%|▊ | 1/88 [00:10<15:15, 10.52s/it]Input to VideoEncoder: batch_size=16, num_frames=16, channels=3, height=128, width=128
Epoch 1 Metrics: {'Total Loss': tensor(8.0492e-07, device='cuda:0'), 'PSNR': 2.810752446743289, 'SSIM': 0.06163982837892718}
Seems alright for the first epoch.
Jesus the pytorch model is 16.6 gigs
yeah, there are models now that are hundreds of GBs
Right? I mean the falcon 180b is like 350gigs, I was just so surprised it was so big from the first epoch.
models don't grow over each epoch.
This is good to know.
@rich moth in fact, the size of a model is constant. more parameters -> larger model.
epochs are just a complete pass over the training data. More epochs only means more chances to adjust the parameters. It doesn't add additional parameters.
Time to quanitize
Nooo
What does quantizing even do? I've heard that it makes models smaller/possible to run on worse hardware but also makes them perform worse.
Is it like lowering the resolution of the model in a way?
ur kinda right on lowering the resolution, basically it just makes the weights use smaller floating point precision values
it's lowering the number of bits used to represent weights and biases in the model
so if standard model uses fp 64 u can lower it to like fp16
Ah I see I see, thank you
quantization can even happen on ridiculously small bit precisions like 8, 4, 3, 2, 1
at that point you don't even use float though
they just use ints instead
ye, i think thats how they got llama running on pico iirc
Wow, pretty cool
ye it's very interesting there been a lot of cool system machine learning papers as of late
ah, right, didn't see them cause I suppress warnings 😅
great answers above already
here's a pull request from 1 year ago (so maybe a bit outdated) adding K-quantization to llama.cpp
from the first graph you can see that mid-high quantization(quants that keep more bits, compressing the model less) actually doesn't hurt the quality too much(lower perplexity is better) but lowers the hardware reqs significantly(note that the x-axis is in log scale)
so from a running-LLMs-locally perspective at least, there's almost no reason not to run a quantized model
Thank you!
In NN training, the validation and test dataset can have any batch size right? Like I can just use the biggest allowed by my pc to accelerate the learning
I don't think that is relevant for the test/validation batch size @final kiln . If you calculate some measure like accuracy/f1-score etc. on the entire test/validation dataset, it doesn't matter what batch size you use, other than a larger batch size will likely be more efficient for your computer (time wise).
And in case you calculate the loss on the test/validation set, make sure you calculate the average loss per item, such that if you run it with a different batch size, you can still compare it.
Does that make sense? @hard nest
I mean, I calculate the loss mean, like add the loss every batch and then divide it by the number of batches
That's what I was thinking, since it appeared that it won't affect anything except the time, big batches are the best
does anyone here use mmdetection?
Looks like tutorials are sparse; though last release was January.
https://mmdetection.readthedocs.io/en/latest/
I just wanted to know if anyone tried the demo, I'm getting trash fps on videos while using rtmdet tiny model. I thought it'd be as fast as yolo
I don’t think it is geared toward English-speaking users as much, and it feels like something that should have started with a HuggingFace ‘Space’.
hello
i wanted to ask if someone can help me with a tad bit of issue that i am running into
how do you go about doing str.extract
in pyspark.pandas ?
it is saying that is not supported in the documentation
search the docs for 'substr' ; there are Regex and SQL fixes for slicing and extracting columns from a db
Why is the SVD and Laplace considered 'hard'? The kids in the math subReddits would eat that stuff for a snack
It's not math tho
Im using vector quantization durning the training. It consist of a codebook with learnable embeddings. In the forward pass the input features are mapped to the clostest embeddings in the codebook, quantizing the features. It actually allows me to preform this on my system by compressing it to lower dimensions. I imagine if they were in their orginal format this would take an insane amount of time.
do we still don't have a better way other than gradient descent for training AI? since from what I learned it seems to not even guarantee you to get the lowest valley, it just guaranteed you getting to the lowest point of the nearest valley
SGD is one optimization algorithm. There are others.
I see... I guess i need to learn more first 
but in general, I don't think there even can be an optimization algorithm that is guaranteed to find the global minimum
I see... 😔
just guaranteed you getting to the lowest point of the nearest valley
that's only true for normal gradient descent; the fancy ones are a bit better about it (or worse, if you're unlucky)
but generally speaking, yes. optimization is hard.
Are we talking of neural networks?
Because if the problem is convex you have strong guarantees with basic SGD
yeah
numerical optimization is the only math where i ran out of symbols 😂
acc can get very messay
hey guys so I am currently working on a project surrounding training my own Faster RCNN model and it's running as we speak but it's tages ages and I have no refernce for when it's going to sotp traning it. Do you guys know any way that on google colab I am either able to monitor it's traning speed as it runs through or if there is a free / cheap way to speak up the speed of it
yea it doesnt guarantee to find the global minimum. So we implement a temperature to the algorithm. The temperature is hot at the beginning(more prone to explore even if it finds a minima) and it gets colder with each step the algorithm makes
is it taking a lot of time per epoch? If no you can verbose the epochs and reduce them accordingly
[07/11 16:25:03 d2.engine.train_loop]: Starting training from iteration 0
[07/11 16:35:38 d2.utils.events]: eta: 2:26:54 iter: 19 total_loss: 1.874 loss_cls: 0.6037 loss_box_reg: 0.5858 loss_mask: 0.6831 loss_rpn_cls: 0.005265 loss_rpn_loc: 0.009363 time: 31.3418 last_time: 25.0513 data_time: 0.0441 last_data_time: 0.0071 lr: 1.6068e-05
[07/11 16:45:58 d2.utils.events]: eta: 2:13:44 iter: 39 total_loss: 1.661 loss_cls: 0.4723 loss_box_reg: 0.5756 loss_mask: 0.6377 loss_rpn_cls: 0.007119 loss_rpn_loc: 0.01208 time: 31.0541 last_time: 36.0582 data_time: 0.0089 last_data_time: 0.0081 lr: 3.2718e-05
these are what it's outputing currently and you can see the eta per 10 iterations
That's when the tildes and hats come in 😛
10 minutes for 20 epochs(30 s/epochs)
I am currently trying to train it on 300 iterations so that it can serve as a base point for where I continue from and what I'm doing wrong but if it's going to take 30 seconds per iteration I'm gonna have to leave it over night than because that sounds horrid
how big is the model being trained
it's around 120 images in the training portion
this is my first real attempt I would say at working with training my own model as well
how many layers are in your model?
does it also make sense every time it's returning me these eta's that it gets shorter and shoter?
import detectron2
from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg
from detectron2 import model_zoo
# Setup configuration
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("my_dataset_train",)
cfg.DATASETS.TEST = ("my_dataset_val",)
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml") # Use pre-trained weights
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 300 # Adjust the number of iterations as needed
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1 # Number of classes (excluding background)
# Use CPU for training
cfg.MODEL.DEVICE = "cpu"
# Create output directory
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
# Train the model
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
yea eta is estimated time of arrival(completion)
that's the configurations for the training
so yeah the total time left is decrasing everytime
and you are using a cpu instead of cuda?
if you have any idea of how I could decrease this please let me know
lmfao yes because I'm on a mac
idk if there was another way around it
you are on colab no?
colab has a gpu
runtime>>change runtime type>> T4 GPU
Use CPU for training
cfg.MODEL.DEVICE = "cuda"
Use metal if u can
Okay
what is that?
I will make sure to do so
make sure to do what i said? if yes, ping me if you have any issue 🙂
It’s basically apples version of cuda, since m1 is both cpu/gpu u use metal to write kernels for it
so like you use device = "cuda" for nvidia?
does metal use nvidia architecture?
No it’s apples custom architecture
For metal it’s like mps iirc
I’d check documentation tho for the library u r using
oh
Yeah metal is p neat, apple made their own dl library recently for metal
i guess ill check it out sometime
i thought there was no hardware accelation for my m1
lol
I have an M2 Mac so would that change anything
I’ll make sure to let u know later I’m taking a break for now
never thought NN would be this complex 🥹
where to learn how to make NN like this?
how to know it's ok to literally add a sin or cos formula to a bias
^ Transformers and NNs are 2 different things.
how do they even know self-attention can be made like that
i wanted to ask
if pyspark.pandas is worth using
or should i use pyspark directly
it's different? what do people call transformer then?
is there something similar to it?
It's fine for large databases- you probably don't need to add pandas, at least not right away
https://www.linkedin.com/pulse/leveraging-pyspark-integrating-diverse-data-sources-guide-ramesh-0ljnc/
thx
ok :3
thing is i know pandas and i have tasked to do the work i have done in pandas to be done in pyspark
and found out there is this pyspark.pandas
but like i do not know if it is i should use it or not
and i tried to use it
i mean i'd argue transformers r subsets of nn, i mean they have feed forward layers after all
i only worse and worse errors
Try just pandas then. Because PySpark is for Apache I think
is there maybe others NN model other than Transformer?
This is right. They are a specific form of NLP. Not very different from a Neural Net, you're right.
there r like apis to perform pandas
function on spark df
yes that only i was using
it is pandas api
for pyspark
transformers ARE neural networks, just a particular kind
Definition & Types of Neural Networks: There are 7 types of Neural Networks, know the advantages and disadvantages of each thing on mygreatlearning.com
thx ❤️
ohh i wanted to ask are there any courses that you would reccoment for deep leanring
i have done the coursera deep learning specialization
this aren't on same level as transformer tho
as for the answer to "how do people come up with this" and "how to know when to use which activation function", it's "by studying a lot of math"
mm I think I would look to Amazon or your local bookstores
you usually need a very good statistical or optimization or other math-based motivation to come up with a new architecture that works well and know why it works
if you want to make new stuff without stumbling around in the dark, yes
You are completely right that is a bad article- BERT and GPT are CNN architecture; not transformers (afair)
watch 3b1b video
big bro can u teach me how to make the next KAN
if you jus want to use stuff, you can just fish up the hottest stuff used atm
for ??
oh! they have the vid, thx for telling
texts on Deep Learning.
if ur starting off learn linear, calc first then get understanding of statistics
from there u can get more niche math like optimization and analysis
like are there any other resources thing is i have adhd and i have problems with books i cannot seem to finish them
ok 👀
for linear algebra if there is any i would love to buy i tend to my deep leaning on laptop where i can like code it out a lil
yeah its on deep learning a 6-7 part series
I feel like each connection between neural is just a linear algebra tbh
it is, u compute the layers with whole bunch of dot products
can you share some rescources for linear algebra and calc
how much do you know?
hm what's ur level of mathematical maturity? r u comfortables with proofs?
Well you got through the Coursera course- so maybe ask your instructor or advisor for continuing education resources. Deep Learning isn't really my specialty, I would just recommend keep searching the web and try to stay consistent with a learning schedule, and you will retain a lot.
yes
i am college graduate level
I love this paper
watch andrew ng deeplearning vids
This is 'legit'
There are calculus and linear algebra videos by 3blue1brown on YouTube, I can't say whether they are good or not since I'm just a beginner
But I like the way he explains stuff so far, a lot of visual explanations
and i wanted to ask like HMMMM
how do you think a neural network reaches the optimal solution honestly whenever to try to sum it over in my head and explain i simply cannot because it is just dot product happening and lot of it is random
ok then i would suggest gilbert strang or axler linear alg
strang is a lot more computational whereas axler is more proof based and formal
as for calculus paul's online math notes is a good intro and if u want to do deep dive read spivak
damn... that is matrix... does this mean I need to learn FFT 😭
I never understand FFT (Fast Fourier Transform)
you use derivatives to see whether changing a parameter will make the cost go up or down, then move a little in the downward direction
its useful to know imo
also ur prob gonna learn in college
at some pooint
that's GD but obviously there are lots of other optimization algorithms
https://youtu.be/VMj-3S1tku0?si=yQn4FO-9U1_jsBB9
this is the video imo
This is the most step-by-step spelled-out explanation of backpropagation and training of neural networks. It only assumes basic knowledge of Python and a vague recollection of calculus from high school.
Links:
- micrograd on github: https://github.com/karpathy/micrograd
- jupyter notebooks I built in this video: https://github.com/karpathy/nn-z...
i mean most neural networks have a convex
loss function so loss is guaranteed to be suffecient decrase in certain cases
you use Gradient descent for that no? or it just don't always work in real practice? 
yes gradient descent
hmmmmmmm idk it just seems odd to me how it all works out
you can never an output being reached with a singlular unit of neuron what exaclty is the point of stacking them up
like if you were to spread a neural network with same amount of neurons with that of hidden layers will it work the same ?
if not then what fucntionality is the hidden layer adding ?
i know the question seems stupid but i cannot understand it
plain gradient descent is the basis for a lot of optimization functions, but usually we don't use traditional gradient descent by itself and add lots of fancy stuff to it (mini batches, momentum, adaptive learning rates, scheduling, etc)
I see... this stuff is very complex... 🙂
yeah like waterfall said there different ways top optimize models such as adam and adagrad
we can its the Universal approximation theorem
some of the methods such as adagrad r p simple
you can get lost in the sauce with optimization algorithms (quasi-hyperbolic momentum 🥴 )
every unit will approximate the function to some extent
@hearty depot & @vagrant root if I hook up my google colab to a local runtime do you guys think it would run faster
adding more units will make it more precise
that sound hard 💀
but we can have a non linear activation regardless of hidden layer or not
or dont use gradient descent and use lbfgs instead 😂
no not really m1 isnt accelerated as the nvidia gpu on colab is
^
okay thank you
p sure colab has tpu
I'm lazy I just stick to my genetic algorithms so I don't have to do math 
the nvidia gpu(cuda) is designed for ai matrix multiplication
so if a neural network is a one dim array with non liner activation and the number of neuron matches the number of neuron units with the hidden layer the performace will be the same ?
tbf even stuff like pytorch abstract tf out of the math, like i met phd students that dont know how autodiff work cuz they just use pytorch w/o thinking
true bc writing it all from scratch is fucking pain 
yo @final kiln how is the job search going?
lessgooo
congrats
goodluck
I like how everyone just accepts this as is even tho there haven’t been formal proof on it
The continuous case right?
isn't one hidden layer just mean it only pass through 2 linear function? that mean it's should behave like X^2 polynomial graph no?
Lowk a blessing and a curse, some of this beliefs make for the worst papers
Mfers be writing papers on emergent properties of llms and then be using mcq for the metric 😭
No shit there is a sharp linear increase in accuracy, now try that a non linear metric
I see...
I just watch this, it's seems the main factor is because of ReLU?
they r technically equal in expressive powers
I'm a noob but I've never even heard of that, is KAN new?
like u make kan from mlp layers
yeah its a supposes
to be an alt to mlp
I don't know much math yet, sounds like something similar to a Fourier Transform?
kan adjusts the activation function along with the weights making it more verssatile
Interesting, thanks
I still can't understand Fourier and I have studying it for like 4 years 😭
I get the basic but once you get into the compression and accelerating stuff 💀
I understand that image, but putting it into practice is another things
It's from here; you got this.
https://stemporium.blog/2023/04/13/what-is-the-fourier-transform-and-how-is-it-used-in-image-processing/

It's a more recent hyped paper that in practice is just the same thing again but worse, several of these pop up over time in ML. You can also show it to be the same mathematically.
You might get some neat concepts from such papers, but don't let the hype get to you.
(Kolmogorov's work)
I mean, I like that people way smarter than me are investigating approaches that are different from the current ones
Yeah, but it's a bit different when they go on Twitter and start spamming that it's the death of current deep learning without any practical evidence / demonstration.
Oh yeah for sure
You can find a ton of these "I solved AI" types that don't really have anything to show ever.
But if there is a demonstration, that I can reasonably reproduce, I am very interested.
try it for yourself
wow I don't know such things exist
🙂
notice how all different waves are made on same frequencies with different altitudes
thats what fourier transform does it deconstructs a waveform into multiple frequencies waves
science
this book is nice imp
also a lot of good examples in code
Why is ReLU mostly used in the hidden layers of a neural network than compared to Sigmoid?
Well for one
One problem w sigmoids r vanishing gradients, relu r less prone to this
Hello. I am having difficulty conceptualizing specific parts of neural networks. I have went over the math, and I understand how the math works in terms of calculating the values. Here is my question:
What is the significance of the prediction any one neuron makes (linear regression predictions and activation in this case RelU). So like, when that value is produced and then passed into another layer with the dot product of further weights, at the end of the day, how do all of those numbers come together to form a cohesive output.
Essentially what does each part mean to that final whole? Cause i dont get how each neuron relates to its output.
nothing you can interpret as a person, really
But, I cant grasp that. Because the person who made it mustve realized, "this computation provided me a prediction... so if i do this, this, and this, it will give me a nonlinear more well-rounded prediction", no?
nope
in fact, especially if you approach it from the perspective that NNs were inspired by the brain, which we also don't understand, the idea is that complex organized behavior "emerges" from simple interactions in "inexplicable" manners
the theorems involved in justifying neural networks are claims of existence of good approximators, but they are not constructive (they don't tell you exactly how to build such a network)
you can read into explainable AI if you like
so what ur saying is, they did linear regression and got a prediction. and then used that prediction in tandem with tons of other predictions and were like hm so if we make this a big chain and we do tons of dot products with tons of weights then we can get better predictions?
then with that, how would RelU make it non-linear? just by adding holes in the data?
except in general the individual layers are not predictions at all, and will not work if you consider them alone, so no
relu is nonlinear in the sense that it does not satisfy the definition of linearity 😛
sure, "adding holes in the data" in this case is a nonlinearity, but it isn't always
all nonlinearity means here is that you applied a function for which it is NOT true that T(aB + cD) = aT(B) + cT(D), where T is a transformation, a and c are scalars, and B and D are vectors
"punching holes in the data" can also be done by multiplying with binary matrices, but this is a linear transformation, so the idea has to be defined more carefully
but, it uses the equation of a prediction tho, no? dot product of weights with corresponding features
like how it impacts decision making process?
one of problems with a lot of nns r that they r flexible
why is a matrix multiplication the equation of prediction?
but hard to interpret when compared to classic strategies like linear reg
the composition of all of the layers of a network makes a prediction
true, but how does that apply to the data? like how does getting rid of negative numbers result in a better prediction?
you're trying to interpret stuff that makes no sense
i just associate it that way because thats the equation for linear regression which produces a prediction
i mean linear regression makes sense
you are producing a line of best fit
you have already assigned (a weird) meaning to these things in your head, that's what's throwing you off
this only makes sense if you had a line you were predicting. here, you don't. you're not optimizing to fit a particular line with each layer
im p sure zeroing the value has more to do w preserving the gradients iirc
the layers are not doing linear regression
hm
oh. so is there any classification to what they are doing?
no
how so?
if you find one you win a prize, cuz researchers haven't so far
a lot of deep learning is "lmao it worked, look"
thats so mind boggling. how would people know then that doing that math, and doing that math in layers, produces a prediction?
because there are severa ltheorems saying that if you compose some number of nonlinear functions, you can get arbitrarily close to any other function
i js cant conceptualize that
it just helps prevent the gradient from converging to zero
😭
it doesn't tell you which functions nor how many to compose
nor what each of them mean
also iirc non-linearity is another reason why relus exist iirc
read about the universal approximation theorem
those theorems motivate you to try composing simple functions. the training procedure does not give you nicely interpretable layers, they do arbitrary shit
wait... so that means that RelU does have a meaning and isnt just meant to apply non-linearity
not a "meaning". it has particular properties when you compose it with itself several times and then differentiate through it
properties that are nice for some optimization strategies, but not others
hm
also calculation for relu r a lot easier in terms of flops compared to like sigmoid
my professor showed us how neural nets worked for XOR, and he said that neural networks for XOR produce a line (when using non-linearity), but it produces a fat line
our professor didnt explain why to use sigmoid. just that it introduces non-linearity
i mean from what hes conveyed it seems rather simple to implement like a XOR neural network with no libraries
for one, it has the nice property of producing outputs in the range 0 to 1, which you want in this case
What makes a line fat?
Imagine you get some input and you compute a ton of random functions on that input. If you have enough of those one or more of them will probably compute the correct answer to the problem. There are neural networks that operate on this as a foundation itself. So you can see why big networks would generally work, even if your method is random init except for the last bit.
u dont even nn tbh for xor, u can solve it with nested perceptron or manually calculating the weights
You need non-linearity to get all kinds of combinations.
and what's the difference between nn and nested perceptron
he showed us a graph of a line with a slope of id say approximately -x and he conveyed it as two lines that are parallel with different y intercepts, and the space between those two lines are corresponding to two XOR values and the space outside of those two lines correspond to two other XOR values
Sigmoid was specifically chosen for several nice properties and also it kind of mimics real neuron activations which is why we still call them neural networks even though it's far removed from that at this point.
damn. so i was trying to conceptualize something that isnt conceptualize-able
not in the way you wanted, no
So the fat line is the space between two lines I guess?
but, in the end, the neural network produced three values. those three values were probabilities of dog, cat, and then smth else
ig in theory they technically r nn
i read abt the neurologist who developed them first. quite interesting.
i was dumbfounded to find that after reading 10-20 articles none of them address why
so ur assertions make sense
they only address the math to do so
which IMO is relatively simple considering I learned the calculus and linear algebra for it two days ago
so @wooden sail the dot product acting as a method of assigning similarity in terms of vector math isnt relevant at all to how the neural net produces these predictions?
It's that explaining each part does not mean anything to a human, there is nothing to explain because explaining something requires you to simplify it and so it need a nice simplifcation.
not in any way that is useful to you
so, could u write a neural net without dot products?
yes
interesting
you could use other functions instead. what even IS a neural network?
thats rlly interesting actually
lol computational neuroscience is acc so insane, i read some neurodynamic papers and i was like wtf
it's often a lot more useful to just think of it as function composition
idk ive always read its js meant to mimic a brain
this doesn't help at all tbh
huh? didnt the creator mimic a human brain
thats how he developed the concept
he was a neurologist or smth
yeah but we don't use them as they were originally envisioned, and the original analysis was also sorely lacking
Like when you look at a puddle of water, and you splash it, and that causes some output. You can give some general properties and even explain how each individual particle works on its own, but if I gave you some random state in that dynamical process and asked you what it "means" for the output, there is nothing to really say other than it will cause the output to happen eventually / has a lot of random information.
i mean ig if u look at the earliest models, they imitated neuron in the sense that they made certain decisions based on whether a certain thershold value was outputted or not
kinda like how synapses fire
carrying that forward will only hurt you
oh. so we evolved the original idea into a more fleshed out jumble of... nonsense?
hm
hm
i find it ironic that the tool thats meant to make sense out of things that dont always make sense itself doesnt make sense
😐
if u want to something closer to brain model, look at ml papers on assembly calculus
that is prob what ur looking for
idk enough calc for that. i just learned what we needed to do linear algebra 😭
im js taking a class
AI was never a tool to make sense of things, it's a tool for replacing one problem with another
huh
wdym
i mean isnt it inherently meant to solve things that arent classicly solvable using standard decision trees?
it addressed the problem of not knowing what a function is, and replaces it with "maybe if i have enough examples of inputs and outputs, i can get something similar to it"
except with 0 guarantees of any kind
well its not really calc heavy, its more focused on graph theory/prob
(not technically true about the 0 guarantees, but the guarantees are usually not of the kind one would want. for some architectures, you can explicitly when under which conditions you'll fit the training data exactly and get overfitting)
ig but it clearly is right enough to have made a big difference on our world
i didnt realize how present ML is everywhere
now transformers are going crazy
yes, and also for inexplicable reasons, which is why it's not allowed in many safety-critical applications
lol tbf, the guarantees in the sense of pac learning and rademacher complexity do get p messy
To understand more about what the problem is and why humans struggle with this, see Wolfram's work (https://www.wolframscience.com/nks/ ).
(It's about complexity)
what are some examples?
ty
so we dont even know how neural networks come up with patterns.
100% autonomous vehicles are illegal in most places
do we even know if they come up with patterns?
true but they are also newer. much newer and need a lot of work
my cousin is building a new autopilot system with his company and he was telling me abt it
i'm just trying to make sure you're not romanticizing AI as something it isn't
it was rlly interesting the problems and also solutions
lol trust me i do not romanticize it 😭
but im also more wrapping my head around this
i mean we been getting a bit better at it w explainable ai but often times when use nn u r gaining flexibility in lieu of interpretability
so theres no need to deflect its usefulness. it obviously has downsides too
huh? so there is explainable?
(besides linear regression btw)
yeah, there are certain techniques to see which features influence decision making process
examples being shap and saliency maps
Well the problem is what "how" means here. I can give you a general overview, but what you are probably asking for is a step by step demonstration, which gets back to the thing where I could say what each node in the network is computing, but I can't assign a label to that that has meaning to a human at a high level.
😐
what is the general overview u would give? im wondering if its what i currently know
The calculation happening is the best description I can give.
ah well i know that already
i understand conceptually how they work how you have your features and with your data, you run the NN over and over again to get the weights and bias that minimize the error. then at the end u can do other mathematical techniques to get the answer into different formats
and i know the mathematical functions used
After training (assuming it did well on that), we can say that it has found some way to consistently do the dataset correctly. So we can say that it kind of contains some of that in itself (lossily encoded). And maybe, it can answer correctly outside of that original data (hard to tell if it's doing well at this or not and if it just needs more data or not).
for stuff like cnn, u can visualize the convolutional layers by the overall values of the convolutions when doing forward step
like this kind hacky but it can help u know what ur netowrk think is improtant or not
well isnt it merely just the weights and bias it trains. then it uses those weights on new data and it shld predict close to what the actual thing shld be relative to the data being related to the training data
(like same category yk what i mean)
ah. well im learning abt CNNs today so ill get back to u abt that
personally i find linear aggression the most intuitive
that was rlly intuitive to me and made a lot of sense
If you have sparse neural networks, and especially not distributed, it becomes a lot easier to tell because unlike dense not everything is hooked up to everything / a giant soup. Convolutions (networks) have sparse weights (shared weights basically). This makes them better for visualization / understanding, but still not great due to dense activations.
Even with sparsity, while better, it's still not explainable when complex enough.
to tell what?
What is going on / what it learned.
Sometimes u just be gambling for lottery tickets LOL
wdym what is going on. in terms of what
lol ppl who do a lot of statistics must be in love with NNs 😭
Are you referring to the lottery ticket hypothesis?
(It's probably right and seems to hold so far from people testing various pruned networks)
In a CNN you may actually see it picking up on detecting eyes in an image for example.
I want to write a simple AI and I want to write the logic that runs it, my biggest question is how I will do gradient optimization?
sure, but we see the result. not how its working o the inside
I previously made DAGs neuron network but now I want to try MLP
gradient descent?
Yeah
do you know calculus?
okay. so do u know the sum of squared residuals that will produce the curve that gradient descent will attempt to optimize?
But it for some reason I still don't understand it maybe because he didn't explain how to work with biases to
Yes
well biases are like a y intercept
What I mean is that if you have a fully connected layer like in an MLP, you will find it hard to point at same part of it and go "see? eyes!"
its just another value that can help make the prediction more correct
so instead of training just weights with features, you can have an additional value that is determined and factored into the overall calculation
they arent always necessary
But that is a human problem, because it's about what means something to a human.
hm
interesting
does this make sense @late lichen
I got the idea of back propagation but I feel like it only update the weights and never with biases
gradient descent comes in once u understand sum of squared residuals
you perform the gradient calculation for biases as well
which updates it as the model is trained
i can find the formula for u if u need tho
like the math associated with updating it
Let's say we got the perfect weights but with that isn't will already work? I mean if we ramp up the bias ,the descendant node will recognize that node to be active isn't?
Yeah, lora and pruning black magic
it can still work without bias. they arent necessary
Yeah, it's nice. Although you can also start sparse too, but since deep learning tools already exist it can often be easier to use that and then prune after.
Unless you have to start sparse or it won't run fast enough.
@iron basalt btw, I read here that this is a thing, but ive never heard of this in my studies so far
Each node connects to others, and has its own associated weight and threshold. If the output of any individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network.
is this js another form of an activation function?
like relu?
looks to be that way
That's the original MLP.
is it talking abt an activation function tho?
Yes.
ok ty
But I was also thinking that when the node is completely inactive it will be 0 no matter what
Wait I think I get it, so we perform back prop on weights and after that we do it on biases?
Was connected to a camera with a 400 pixel image to recognize images.
First perceptron implementation.
I was thinking that even we have perfect weights we always lack something if we don't tell how much the node want to be activated
Then they added more layers, although as you can see the physical implementation is not really a great idea.
And the sole reason is to introduce non linearity?
For the perceptron there were no layers, so it was just at the end to decide 1, -1 (class).
Oh. But for like modern Anna’s
NNs
Yes, if you don't add non-linearity you can collapse the whole thing.
Yeah like with XOR.
?@harsh sun
Biases are more weights, but input always set to 1.
But isn't if we ramp the bias it will increase the final val more than just weights?
wdym backprop on the biases?
they r performed at the same time via chain rule
If we need to increase the gradient of a single node how we will know that we need to increase the weights instead of bias?
How it works?
we have a convex loss function
we calculate gradient in respect to that and update weights and bias layerwise using an algo called autodiff
basically top down approach
I don't know much on "convex loss function"
Too dark i can't se...
this is a basic definition and not very precise but convex loss means u can draw a line between any two points and every points in between the two points would be below the line
Cool
Let's say we have 2 nodes hidden and output nodes
What the formula looks like on the bias and weights to update it's value?
@hearty depot
LangChainDeprecationWarning: The class `LLMChain` was deprecated in LangChain 0.1.17 and will be removed in 1.0. Use RunnableSequence, e.g., `prompt | llm` instead.```
how do i resolve these
prompt | llm is going to be your input prompt as a class instance.
Thats supposed to be English enough....
Eh that doesn't show up in the docs very much though. They have OpaquePrompts but it's supposed to work with API calls more I think
Nevermind; I don't know
btw, why do neural networks not solve non-linear things without a non-linear activation. doesnt it discover patterns based off of the values it calculates?
because you can't fit a linear function to a nonlinear function
neural networks without nonlinear activations are just linear function estimators
Maybe add more layer
¯_(ツ)_/¯
adding more layers does nothing if you don't add nonlinearity, it's just making a bigger linear function
Make more sence here
gelu is an activation function made to introduce a form of deterministic regularization into the activation function instead of as a separate, stochastic process (like dropout for example). It preformed pretty well compared to ReLU on the benchmarks in the original paper https://arxiv.org/pdf/1606.08415v5
neural networks dont result in a line tho. i dont even get how neural networks work, and ppl said no one does. but if its all about these random things coming together to create an output, how does non-linearity achieve that
im having trouble conceptualizing NNs
Neural networks don't result in a line because we add nonlinearity to them. https://www.youtube.com/watch?v=s-V7gKrsels this might be a good watch for you, this topic is often described using the "boundary problem".
when people say "no one knows how neural networks work", they don't really mean what it sounds like that means at face value.
^^
people understand neural networks enough to build them, what we don't understand is the actual data stored inside of massive models after training; the values of the weights and biases and what they relate to are viewed as a black box. Say you are given a single parameter from GPT4's weights and you can change it up or down, we don't have a way to know exactly how that change will effect the output without testing that output. If we did, we could do "AI brain surgery" and fine tune models by hand instead of having to train them further through things like RLHF to align them.
It might be worth picking a resource on neural networks too.
For instance https://www.deeplearningbook.org/contents/mlp.html is free and goes over the non-linearity part, but there are other resources which might be more to your liking as well
i mean thats what 3 ppl were telling me earlier today (i wont say who)
im in a class with a professor who didnt describe how they work besides from the mathematical functions
and ive read 10-20 articles and it never explained how each node is significant to the total output
yeah no ive known that.
ill send the thread
When they said "no one knows how neural networks work", that statement can be true or false depending on what is meant by "work".
my thread conveys the sort of things i was wondering
theres a lot so its hard to rephrase it now
i just cant conceptualize that neural networks are linear inherently if u didnt have the activation
you could try to take a simple network and write down the equation
okay so why don't we take a simple model and lay it out as a single function that will show you what's happening
exactly 
i know the equation is linear
i know the math for the neural networks
but like, the patterns that the neural net discovers as it trains. i dont see how it results in a line
to be honest, I am not clear on the hold up for you
it has nothing to do with the patterns it learns, the neural network without nonlinearity is only able to output a line. it's job is just to find the best line that fits.
if I give you a linear function like x * w + b how could you ever represent anything other than a line by changing w and b? (these are scalars in the example)
arent there a bunch of lines though? like a line per node?
because the equation is linear per node
s(x) is the activation function, change it to just x and see what happens.
holy shit
so the non-linearity allows the weights to create a form like this
otherwise no matter what its a straight line
The decision boundary curves.
huh?
This is XOR, the boundary between the two classes is that curve.
Depending on which side the point lies, it spits out that class.
i see. so as the model trains, this whole blob can take many different forms right?
depending on the task
You can change the weights, move the sliders, see what happens.
Try making up your own points in the table, can you manually find the solution (weights)?
say we add another layer to my example l1 = x * w1 + b1 l2 = l1 * w2 + b2 we can break this down to (x * w1 + b1) * w2 + b2, no matter how many multiplication or additions you add to this function the output will always be linear
why has no one else recommended this tool?
this is like ground breaking
A lot of people know about desmos.
so @iron basalt by making is non-linear, your output would inherently be shaped weirdly. training it just manipulates the "thing" so that it fits the solutions?
but using it to display the result of an nn
Not sure what the thing is, but it does make it not a straight line.
Try ReLU btw.
You can visually see the line-ness of ReLU.
so weights are essentially molding this into a solution
Yeah, that is what parameters do, they morph stuff to do whatever.
how does each specific node contribute to this final thing?
It's in the desmos page.
wdym
like i know mathematically how each node contributes. the dot product of the nodes of that layer with the weight of the node that its being applied to
yeah but like how does it contribute to the solution. like how does doing the dot product between the current layer and the weight of a specific node, how does that value come to impact this in the end.
That is also in the code.
... i know mathematically.....
conceptually is what im asking
I'm not sure how to answer that / what you are asking for. The math shown there is conceptually that. Math is the best way to describe it.
Are you asking why, not how?
actually i think i know why. lmk if this is true. the value you get from the dot product is weighted. so when it trains you can shape the data. then plugging that in again to another node above means that the value passed into that node is a reflection of many previous weights. therefore, by changing the weight, you are having an even greater impact because just one node later in the nn uses many weights.
does that sound right?
It's just feeding one into another. Make a new blank desmos and play around with random functions and their combinations. Get a feel for how to make up functions / shape and such.
Try some stuff like f(x)=x, f(x)=cos(x), f(x)=cos(2x), f(x)=x+cos(2x). Try adding parameter sliders, like f(x)=ax+bcos(cx). Then try making another function, g(x), and try making it make use of f(x).
Then know that a NN is just a bunch of these combinations of various functions to solve the problem. But instead of having cos(x) in there (although some NNs do), we have just something simple, like ReLU(dot product and bias), and by composing a ton of them we can also get what we want.
So many functions combine into one function that is supposed to approximate the data?
hey yall, i'm new here and i was just wondering how long yall have been programming/working with AI and what its like in your opinion 🙂
yeah so start with you!
It's a rapidly progressing technology that seems to be making strides in a small amount of time, considering its capacity and its forward progress. What's not to like? I've read in headlones AI is a boom like the dot.com, but i think those people are fools that dont understand enough.
Ok ty
Started like two days ago.
Depending on which AI method u use is differing levels of complexity
I’d say linear regression is pretty easy
Neural networks, now that I get them, are a bit more complex. CNNs are weird I’m still working em out.
is there any point in accounting for multicollinearity when using methods like ridge and lasso?
-
Does lasso avoid multicollinearity?
it CAN, but not always. it depends on whether the matrix involved in the computation has a large enough "kruskal rank" -
if there are two highly related terms, wouldn't lasso just pick one, and shrink the other to zero, essentially rendering techniques like vif useless?
related to the previous point, it favors terms that result in a lower L1 norm when deciding which terms to ignore. it can only do this in a useful way if the kruskal rank is high enough, and the result you get is only useful if what you want is a sparse solution. VIF is a statistical criterion based on covariance. if you interpret LASSO as the maximum a posteriori estimator, what it does is assume the solution vector follows a Laplace distribution, but says nothing about the covariance directly. they do different things, you have to decide which approach works for your problem. -
Lasso is strong for a large number of predictors, but what's the limit?
idk what you mean with this. usually what matters is the ratio of nonzero predictors vs the total, which is also related to the kruskal rank. there's a thing called "phase transition maps" which plot out at what level of sparsity L1 regularization methods break down. -
What stops me from simply "feature engineering" tons of useless variables on the basis that one of them might be good for the model? After all, anything deemed useless will shrink to zero given a good lambda value.
the more correlated the features are with each other, the worse L1 regularization works, so adding extra features is pretty much always bad no matter how you look at it
Does google colab offer GPUs for free access?
yes but they're shared and sometimes you don't get immediate access
it can tell you to wait if you've been using too much compute time
but how do I know if I have access to one or not
I am using device = torch.device("cuda" if torch.cuda.is_available() else "cpu") and it says CPU
sure
I mean what is the difference between TPU v2 and T4 GPU?
tpu is not a gpu
i've never played with tpu's myself so i'm not familiar with the specs and i can't really comment on what is better here
try them out and see what happens
tpu is google's gpu specific for ai mat muls
gpu is specific for matmuls
oh
ye
Thx
Noted but what does deprecated mean?
not quite right, they're way more power efficient
i heard google were pulling off tpus?
i would also imagine they work great for anything using XLA (which pytorch does not use by default)
a new one was announced like 3 months ago
moving towards tensor chips?
oh i will look into that
isn't that what TPUs are?
I know some of them but where can I learn the others?
I only know Input, Hidden, Output, and Recurrent cell
don't know others things
let me check
they are similar
inputs/different types of input notation
neurons/different notations
outputs/different notaions
recuurent/different
convolution you can learn via
Discrete convolutions, from probability to image processing and FFTs.
Video on the continuous case: https://youtu.be/IaSGqQa5O-M
Help fund future projects: https://www.patreon.com/3blue1brown
Special thanks to these supporters: https://3b1b.co/lessons/convolutions#thanks
An equally valuable form of support is to simply share the videos.
Other v...
thx
FROM CLAUDE:
Input Cell: The entry point for data into a neural network.
Backfed Input Cell: An input cell that receives feedback from later layers.
Noisy Input Cell: Adds random noise to input data for improved generalization.
Hidden Cell: Processes and transforms data in intermediate network layers.
Probablistic Hidden Cell: Incorporates randomness for modeling uncertainty.
Spiking Hidden Cell: Mimics biological neurons by firing at specific thresholds.
Capsule Cell: Groups neurons to represent entities and preserve hierarchical relationships.
Output Cell: Produces the final prediction or result of the network.
Match Input Output Cell: Attempts to recreate specific input patterns in its output.
Recurrent Cell: Processes sequential data using self-looping connections.
Memory Cell: Stores information over time in recurrent networks.
Gated Memory Cell: Controls information flow with additional mechanisms.
Kernel: A small matrix of weights used in convolutional operations.
Convolution or Pool: Applies kernels or reduces spatial dimensions of data.
thx ❤️
apparently tensor chips are mobile only, my bad
I think I had a meta overfitting problem 🥴
The optimization algorithm you use for hyperparam tuning can also overfit if the tuning set is fixed. After some time it may find hyperparams that don't generalize to other datasets
That's really interesting
if some one want data science and coding related courses dm me
I have a 3 way split
It's on the holdout test you can see if the hyperparam tuning "overfit"
This is 100% how you should do it
But
I don't cross validate neural nets
All the rest, yes
Takes too much time
Because you mostly do DL right?
Doing the entire thing is way better, improves skin quality, sleep at night, blood pressure etc.
But it takes too long for deep nets
I'd argue that not training a model means you still need to evaluate
Unless it's low stakes stuff
That's not what I meant 😮
If I were doing RAGs in the real world I'd spend a lot of time thinking about how to quantify their performance
That's the most high value literature out there right now for "industry ML"
You're not training anything but you do need to iterate and be able to quantify the performance gain somehow, otherwise it's anecdotal and small sample based
How will you evaluate a RAG?
Evaluation is the thing I take the most seriously 😄
There was talk of collaborating with the emergency services to make a model to detect people falling
95-99% of such a project is data collection + evaluation framework stuff
I don't think it's something that data people take seriously tho
My colleague just had some videos of him falling and was making the models on that
Are you talking about hypothesis testing?
no research
I don't want to sound annoying but your question isn't specific enough 🙂
would you consider a research on 2000 element dataset valid? if it provides good result