worldly dawn Jul 8, 2024, 7:52 AM

#

that sounds like a class I would have loved

past meteor Jul 8, 2024, 7:52 AM

#

And you could construct the algo however you wanted

#

And just like you, EAs are things I use at work but super infrequently

worldly dawn Jul 8, 2024, 7:55 AM

#

I think they are underrated and DL has taken quite a bit of the spotlight. But they are still worth knowing and having in your toolbox

#

Also it was cool to see some of the papers on EA applied to DL architectures

#

or even the evolution of weights

past meteor Jul 8, 2024, 7:59 AM

#

To me they occupy a very different space

worldly dawn Jul 8, 2024, 8:00 AM

#

which is?

#

and how do you use them at work?

past meteor Jul 8, 2024, 8:00 AM

#

Needing data vs not needing data

worldly dawn Jul 8, 2024, 8:01 AM

#

Interesting

past meteor Jul 8, 2024, 8:02 AM

#

worldly dawn and how do you use them at work?

When I need to fit some curve and I have no idea of the properties (convexity or so) + hyperparameter tuning

#

But I agree, they're important to have in your toolbox

#

My colleagues would try and solve typical problems like TSP with machine learning

#

Or nurse scheduling, (3D) knapsack etc.

worldly dawn Jul 8, 2024, 8:05 AM

#

oh yeah they are fun

lethal pendant Jul 8, 2024, 8:05 AM

#

hello

#

can anyone tell me what should i do to start

#

ill do any suggestion

past meteor Jul 8, 2024, 8:06 AM

#

lethal pendant can anyone tell me what should i do to start

Check the pinned posts 😄

lethal pendant Jul 8, 2024, 8:06 AM

#

ok

worldly dawn Jul 8, 2024, 8:06 AM

#

lethal pendant ill do any suggestion

check out the pinned resources

toxic mortar Jul 8, 2024, 9:53 AM

#

Follow-up on this: What benchmarks you look for when evaluating a DL/ML classification paper. I was wondering things such as dataset, accuracy, confusion matrix. Is there anything else that can give me insights about their attempt? Thanks man 😄

hoary merlin Jul 8, 2024, 9:59 AM

#

which social media site is good for a sentiment analysis project? i was going for twitter but its api is paid now'

jaunty helm Jul 8, 2024, 10:23 AM

#

toxic mortar Follow-up on this: What benchmarks you look for when evaluating a DL/ML classif...

accuracy may not be great when dealing with imbalanced data
e.g. say you have a house fire dataset where only 1/100 houses actually caught fire, you can get 99% accuracy by always guessing no fire, but that model's terrible in practice (it's more important to correctly predict those that may catch fire to deal with it preemptively, than to correctly predict those that don't catch fire)
so you have 3 more stats, precision, recall, and f1.
to put it simply, if I were hunting a group of 100 ducks and got 60, I'd have 60% recall(which measures how many sheep I got out of the total); but in the process I also mistakenly shot 60 geese, then I'd have a 50% precision(which measures how good I was at shooting actual sheep and not something else)
I could be more cautious and only shoot those that I'm sure are ducks, maybe then I'd get 20 ducks and 0 goose, then I'd have 20% recall and 100% precision
f1 is like a healthy blend of precision and recall

toxic mortar Jul 8, 2024, 10:24 AM

#

jaunty helm accuracy may not be great when dealing with imbalanced data e.g. say you have a ...

Can't I directly calculate precision, recall and f1 from the confusion matrix?

jaunty helm Jul 8, 2024, 10:24 AM

#

toxic mortar Can't I directly calculate precision, recall and f1 from the confusion matrix?

you can also directly calculate accuracy from the confusion matrix, what's your point?

toxic mortar Jul 8, 2024, 10:25 AM

#

jaunty helm you can also directly calculate accuracy from the confusion matrix, what's your ...

I get what you are saying. I was reffering to having cm is sufficient enough to calculate those parameters

#

Like why would I explicitly look for f1,recall and precision when I can see them from the cm?

ionic valley Jul 8, 2024, 10:25 AM

#

So I've finished my Youtube Dislikes Predictor with linear regression, but I'd like to expand upon it. In particular, I am looking at these two unused features. From a glance, am I likely to get any possible insight from these variables at a significant level? If so, what techniques do you recommend I try?

jaunty helm Jul 8, 2024, 10:26 AM

#

toxic mortar Like why would I explicitly look for f1,recall and precision when I can see them...

fair enough

toxic mortar Jul 8, 2024, 10:27 AM

#

Also research papers might be pretty faked. Am i missing some metrics that can evaluate attempt

#

And help me identify weak links within the model

narrow tiger Jul 8, 2024, 12:26 PM

#

if i am using ollama locally
does it restart the specified model each time i make a request?
or are all the models that I pulled always running
i can see that ollama is always running

sick raft Jul 8, 2024, 1:11 PM

#

narrow tiger if i am using ollama locally does it restart the specified model each time i ma...

there is a keep_alive parameter which "controls how long the model will stay loaded into memory following the request (default: 5m)" (src: https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-chat-completion)

GitHub

ollama/docs/api.md at main · ollama/ollama

Get up and running with Llama 3, Mistral, Gemma 2, and other large language models. - ollama/ollama

hallow sphinx Jul 8, 2024, 3:29 PM

#

How does go mod tidy search for the module? Like in what order?

arctic silo Jul 8, 2024, 4:08 PM

#

I want to transform large amount of data from kaggle to the cloud in database how can I do it I want to build pipeline ??

spare magnet Jul 8, 2024, 4:57 PM

#

is it fine to learn nlp even i dont master python yet

unkempt apex Jul 8, 2024, 5:30 PM

#

spare magnet is it fine to learn nlp even i dont master python yet

learn python first!

spare magnet Jul 8, 2024, 5:30 PM

#

unkempt apex learn python first!

i already know python but not mastered yet

unkempt apex Jul 8, 2024, 5:31 PM

#

no one have mastered yet !

#

go then start NLP!

hearty token Jul 8, 2024, 5:59 PM

#

I wonder how good it is to fine tune a small English pre trained model for a non-english language task

#

Or if it would just be better to pre train a very model entirely on the other language and specialize the task

small wedge Jul 8, 2024, 6:18 PM

#

Depends on how close the language is to English, but I'd say it's probably better to train a model from scratch

valid basalt Jul 8, 2024, 7:51 PM

#

Hi everyone, Im Anna. I'm currently finishing my MBA with a focus on quantitative finance and for my dissertation I'd like to do a paper on "Machine Learning for Financial Market Forecasting: Unveiling Hidden Patterns for Informed Decisions" using LLM. I have a grounding in data science and I'm currently completing a course on the subject. However, I need help with the practical part of the model and training. If someone is interested in the subject and wants to help me, feel free to write me inbox so we can talk about it. Thanks

cloud flower Jul 8, 2024, 8:03 PM

#

Could someone help me get started with the first task please? 🙂

#

This is how far I've gotten 🤦:

class Interval:
    def __init__(…)

spring field Jul 8, 2024, 8:13 PM

#

is it meant to say that c or d can't be 0 or that there can't be a 0 in that range? cuz I don't see why there couldn't be a 0 somewhere in that range

spring field Jul 8, 2024, 8:14 PM

#

cloud flower This is how far I've gotten 🤦: ```py class Interval: def __init__(…) ```

mmm, well, what exact next steps are you looking for? like, what exactly is not clear to you? I personally find the operator rules translate really well to Python

cloud flower Jul 8, 2024, 8:22 PM

#

spring field is it meant to say that c or d can't be 0 or that there can't be a 0 in that ran...

That's a interesting inquiry, I can check with my teacher.

cloud flower Jul 8, 2024, 8:23 PM

#

spring field mmm, well, what exact next steps are you looking for? like, what exactly is not ...

Basically I don't know what the theory is even covering, like what am I looking at, it's just a bunch of jargon

spring field Jul 8, 2024, 8:24 PM

#

well, let's break it down a bit
well, first, how much do you know about classes in general?
how many instance attributes would a single interval need?

#

or even earlier, how many arguments would __init__ need to take? (excluding self and assuming a simple case like the task suggests)

cloud flower Jul 8, 2024, 8:37 PM

#

spring field well, let's break it down a bit well, first, how much do you know about classes ...

i was referring to the math related stuff, what in it do i need to understand to tackle the first task

#

it's just a bunch of information, would help if it was broken down

spring field Jul 8, 2024, 8:38 PM

#

well, naively speaking, you pretty much need to grasp this bit

#

like, what part of the math do you not understand? cuz I find it explained relatively clearly pithink

serene scaffold Jul 8, 2024, 8:41 PM

#

spring field well, naively speaking, you pretty much need to grasp this bit

why do min and max get involved for the latter two?

spring field Jul 8, 2024, 8:46 PM

#

tbf, they are technically involved for the first two as well, but they can be simplified in those two cases

pallid badge Jul 8, 2024, 8:46 PM

#

cloud flower Could someone help me get started with the first task please? 🙂

This reminds my of last years Advent of Code, wasn't there a task one could solve with this approach?

spring field Jul 8, 2024, 8:47 PM

#

for multiplication I can think of cases where the smallest (left) values can multiply and return a larger value than other combinations

#

same for division

#

like, you can't just simplify them to simple arithmetic because there are several possible outcomes depending on the values used

#

I suppose they can be alternatively written as piecewise functions as well

cloud flower Jul 8, 2024, 8:50 PM

#

@spring field could you dumb this down for me

#

for starters

spring field Jul 8, 2024, 8:55 PM

#

I don't even know where to begin breaking it down sobbing
like, do you have at least some intuition on what an interval is or what +/- is?

cloud flower Jul 8, 2024, 8:56 PM

#

yes, [a, b] could mean that there's some parameter t that is within the bounds of a and b, e.g. a≤t≤b

#

and yes, i know how arithmetic operators like + and - works

#

does the task have to do with error margins?

calm osprey Jul 8, 2024, 8:58 PM

#

cloud flower <@670379095951147019> could you dumb this down for me

This is about the butterfly effect

wooden sail Jul 8, 2024, 8:58 PM

#

spring field is it meant to say that c or d can't be 0 or that there can't be a 0 in that ran...

cuz division by 0

spring field Jul 8, 2024, 8:58 PM

#

cloud flower does the task have to do with error margins?

tbf, task 1 is simply asking you to create a class that takes in two values and that's about it

spring field Jul 8, 2024, 8:59 PM

#

wooden sail cuz division by 0

it's an "or" question waaaaaaaaaahhhhhh

#

I don't understand why there can't be a 0 somewhere in the range (a, b)

calm osprey Jul 8, 2024, 8:59 PM

#

spring field I don't understand why there can't be a 0 somewhere in the range (a, b)

It’s marginal error

wooden sail Jul 8, 2024, 8:59 PM

#

i don't see any question at all, the first image is all definitions

spring field Jul 8, 2024, 9:00 PM

#

well, you replied to my question

calm osprey Jul 8, 2024, 9:00 PM

#

wooden sail i don't see any question at all, the first image is all definitions

He wants it explained

wooden sail Jul 8, 2024, 9:00 PM

#

naturally if you want to divide a by b, b can't be 0

cloud flower Jul 8, 2024, 9:00 PM

#

spring field tbf, task 1 is simply asking you to create a class that takes in two values and ...

Then what the fuck was all that theory laden jargon about

#

Overwhelming the student

#

Teachers need to learn to condense the content

wooden sail Jul 8, 2024, 9:00 PM

#

if 0 is in the interval, you'd have to split it in two because 0 wouldn't be in the domain anyway

spring field Jul 8, 2024, 9:01 PM

#

cloud flower Then what the fuck was all that theory laden jargon about

well, you should have understood that after reading that whole thing pithink

wooden sail Jul 8, 2024, 9:01 PM

#

you'd also have half open intervals

iron basalt Jul 8, 2024, 9:01 PM

#

cloud flower Then what the fuck was all that theory laden jargon about

It gave you everything upfront for future tasks.

cloud flower Jul 8, 2024, 9:01 PM

#

calm osprey This is about the butterfly effect

Care to elaborate with an EASY example?

wooden sail Jul 8, 2024, 9:02 PM

#

cloud flower Teachers need to learn to condense the content

students need to learn to read and extract important info

spring field Jul 8, 2024, 9:02 PM

#

wooden sail if 0 is in the interval, you'd have to split it in two because 0 wouldn't be in ...

mmm, is it to do with how division is weird with stuff? like, how you can't just simplify some stuff sometimes because there's division, I think we had a discussion about this at some point

wooden sail Jul 8, 2024, 9:02 PM

#

spring field mmm, is it to do with how division is weird with stuff? like, how you can't just...

division by 0 is undefined and is always problematic

cloud flower Jul 8, 2024, 9:03 PM

#

This task is problematic

#

The course too

iron basalt Jul 8, 2024, 9:03 PM

#

When you do allow division by zero, it's usually boring (e.g. all numbers are now equal to zero).

cloud flower Jul 8, 2024, 9:03 PM

#

I’m abouta give my teacher some problems

iron basalt Jul 8, 2024, 9:04 PM

#

cloud flower Care to elaborate with an EASY example?

They gave the F=ma example, it's basically trying to keep track of what the possible values could be (smallest to largest).

#

(An interval)

cloud flower Jul 8, 2024, 9:05 PM

#

TLDR here’s how you can find out how to calculate the error margin, or the range within which the value you’re seeking is in?

iron basalt Jul 8, 2024, 9:05 PM

#

(Or with more dimensions, a bounding box)

calm osprey Jul 8, 2024, 9:06 PM

#

cloud flower Care to elaborate with an EASY example?

It’s a marginal difference between the actual size/weight between then actual figure unit
It come out to play in a very big way
Like in the stock market

Go to bbc channel then you find it
“The butterfly effect”

iron basalt Jul 8, 2024, 9:06 PM

#

cloud flower TLDR here’s how you can find out how to calculate the error margin, or the range...

It could be for errors, but also just like for example "it could end up anywhere in between these two."

spring field Jul 8, 2024, 9:07 PM

#

wooden sail division by 0 is undefined and is always problematic

can it be interpreted as all [infinitely many? uncountably many?] values from one interval being divided by all values from the other interval? and that happens to include 0, so that bit has to be excluded (with say half open intervals)

iron basalt Jul 8, 2024, 9:08 PM

#

Like if I throw a ball, and want to say it could end up in this interval, depending on this interval of mass, starting velocity, etc. And I want to calculate that interval.

calm osprey Jul 8, 2024, 9:08 PM

#

It’s complex but you can find more there
It’s not okay if I post the link here

cloud flower Jul 8, 2024, 9:08 PM

#

calm osprey It’s a marginal difference between the actual size/weight between then actual f...

It sounds familiar. The butterfly effect, isn’t it that a butterfly could cause a typhoon by flapping its wings since it has a negligible ,but nevertheless a contribution, to the typhoon occurring?

spring field Jul 8, 2024, 9:08 PM

#

sure, but I'm not sure how that's exactly related to the topic at hand

cloud flower Jul 8, 2024, 9:09 PM

#

me neither

calm osprey Jul 8, 2024, 9:10 PM

#

cloud flower It sounds familiar. The butterfly effect, isn’t it that a butterfly could cause ...

Yeah something like that
As it flaps it wings it’s could cause a tornado/hurricane in somewhere like in the US if the butterfly if it’s in Mexico or there about

cloud flower Jul 8, 2024, 9:10 PM

#

Alright bro but what does it have to do with the theory

wooden sail Jul 8, 2024, 9:11 PM

#

spring field can it be interpreted as all [infinitely many? uncountably many?] values from on...

all possible combinations, yes

calm osprey Jul 8, 2024, 9:11 PM

#

spring field sure, but I'm not sure how that's exactly related to the topic at hand

Yeah it is
It’s a bit complex that what makes final error is about that is minor but comes out to play a big difference

calm osprey Jul 8, 2024, 9:11 PM

#

cloud flower Alright bro but what does it have to do with the theory

That just explaining I didn’t mean it

spring field Jul 8, 2024, 9:11 PM

#

wooden sail all possible combinations, yes

I see, that's cool

wooden sail Jul 8, 2024, 9:12 PM

#

it's just formulated in a way you're not used to

#

you've done this all along whenever you got those questions about domain and range of functions

#

elementary arithmetic operators are binary functions too, and this shows one way of studying the domain and range

spring field Jul 8, 2024, 9:14 PM

#

yeah, but using two ranges arithmetically threw me off a bit

calm osprey Jul 8, 2024, 9:15 PM

#

@cloud flower
It’s proportional to the actual figure by a slight difference

cloud flower Jul 8, 2024, 9:15 PM

#

When you guys are done chatting about the math stuff I will repost my question and what progress I’ve made🖐️

calm osprey Jul 8, 2024, 9:15 PM

#

Just remembered 🧠 my brains now working harder

mental rampart Jul 8, 2024, 9:16 PM

#

why does tensorflow upwards of 2.10 not support native gpu support on windows

calm osprey Jul 8, 2024, 9:16 PM

#

cloud flower When you guys are done chatting about the math stuff I will repost my question a...

https://tenor.com/view/nouns-nounish-nounsdao-abacus-counting-gif-16562575250327679873

Tenor

umbral delta Jul 8, 2024, 10:02 PM

#

so i have this very simple code: print(data["rank"]) raster["rank"] = data["rank"] print(raster["rank"]) which somehow outputs```idx
0 65.686275
1 77.450980
2 80.392157
3 37.254902
4 68.627451
...
576 68.000000
577 51.000000
578 46.000000
579 83.000000
580 84.000000
Name: rank, Length: 581, dtype: float64
0001_U_0018_2010 NaN
0010_L_0002_2010 NaN
0012_L_0048_2010 NaN
0013_U_0016_2010 NaN
0015_L_0050_2010 NaN
..
0072_L_0031_2010 NaN
0008_U_0012_2010 NaN
0009_L_0086_2010 NaN
0009_U_0009_2010 NaN
0009_U_0034_2010 NaN

left tartan Jul 8, 2024, 10:42 PM

#

Because the indices aren't aligned

#

Reset the index of raster then see what happens

rich moth Jul 9, 2024, 1:00 AM

#

damn guys, I dont think its posssible to running on my system. I've tried a lot of different things, but I would have to find something with more GPU memory.

Traceback (most recent call last):
  File "/home/plunder/CATMANDODO63.py", line 692, in <module>
    main()
  File "/home/plunder/CATMANDODO63.py", line 687, in main
    model = train(model, train_dataloader, optimizer, criterion, tokenizer, device, epochs, val_dataloader, num_frames)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/plunder/CATMANDODO63.py", line 423, in train
    scaler.step(optimizer)
  File "/home/plunder/miniconda3/envs/qusar/lib/python3.11/site-packages/torch/cuda/amp/grad_scaler.py", line 416, in step
    retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/plunder/miniconda3/envs/qusar/lib/python3.11/site-packages/torch/cuda/amp/grad_scaler.py", line 315, in _maybe_opt_step
    retval = optimizer.step(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/plunder/miniconda3/envs/qusar/lib/python3.11/site-packages/torch/optim/optimizer.py", line 373, in wrapper
    out = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/plunder/miniconda3/envs/qusar/lib/python3.11/site-packages/torch/optim/optimizer.py", line 76, in _use_grad
    ret = func(self, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/plunder/miniconda3/envs/qusar/lib/python3.11/site-packages/torch/optim/adam.py", line 163, in step
    adam(
  File "/home/plunder/miniconda3/envs/qusar/lib/python3.11/site-packages/torch/optim/adam.py", line 311, in adam
    func(params,
  File "/home/plunder/miniconda3/envs/qusar/lib/python3.11/site-packages/torch/optim/adam.py", line 565, in _multi_tensor_adam
    exp_avg_sq_sqrt = torch._foreach_sqrt(device_exp_avg_sqs)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 GiB. GPU 0 has a total capacty of 23.99 GiB of which 0 bytes is free. Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use. Of the allocated memory 66.75 GiB is allocated by PyTorch, and 15.41 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
(qusar) plunder@localhost:~$```

I feel like the code is right there, I just dont have the resources to run it.

#

I tried so many different things. But I i need commerical equipment at this point I guess. Switching the batch size, number of frames, accumlation steps in the training.. I can get it to go for a bit, but runs out of memory.

AJIvXivpvsr1l2sywCuMWRAEh_43WvFhHi6PGKOkhkdGxOiwJ2vEJPR93JFfge-6Ur-E6mVxyfYooBzxlH277p8Zys1BPxUiOZHnN-M1gsscw2OHo5G6vJIGGitE-_HOBNexhFaqFD5DeGWxmtQSOdyk8Jxe7I-eRC0siK2ck_rmVeWzqCyu38_ieNDI9BZans24OeTBq2wGf2I32JiX56pESJBmTIu1kFeNtFKRbUZf_3k1UBlm96u9BYBci3T74HTshWKqHG4I-KnqZKQ5wfus_AZr8qjL648KIwd.png

#

!paste

arctic wedgeBOT Jul 9, 2024, 1:07 AM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

rich moth Jul 9, 2024, 1:07 AM

#

Heres the code. https://paste.pythondiscord.com/MV3A

#

Anyone have any suggestions?

#

Maybe I can use a gpt2 distill instead...

agile cobalt Jul 9, 2024, 1:13 AM

#

rich moth Heres the code. https://paste.pythondiscord.com/MV3A

how long are your videos?

rich moth Jul 9, 2024, 1:14 AM

#

like 10-15 seconds.

agile cobalt Jul 9, 2024, 1:14 AM

#

how many fps?

rich moth Jul 9, 2024, 1:14 AM

#

25/s

agile cobalt Jul 9, 2024, 1:15 AM

#

so you are effectively working with a batch size of 375, using an extremely large model in consumer grade GPUs?

#

I'm not sure if I understood your loading code

rich moth Jul 9, 2024, 1:17 AM

#

I had a batch size of 16, tried to reduce it all the way down to 2. It runs for a bit. Input to VideoEncoder: batch_size=2, num_frames=16, channels=3, height=128, width=128

#

So no 375 batch sizes 🙂

agile cobalt Jul 9, 2024, 1:21 AM

#

I must have misunderstood how it integrates with the loader then

lapis sequoia Jul 9, 2024, 1:24 AM

#

is this the correct way to do sentiment analysis

#

from textblob import TextBlob

def polarity(text):
return TextBlob(text).polarity

df['polarity'] = df['lyric'].apply(polarity)

def sentiment(label):
if label < 0:
return "Negative"
elif label == 0:
return "Neutral"
elif label >= 0:
return "Positive"

df['sentiment'] = df['polarity'].apply(sentiment)

rich moth Jul 9, 2024, 4:38 AM

#

I think I got it to work incorporating gardiuent accumulation and tensor management.

#

Keep my fingers crossed! Epoch 1/1: 16%|█████████ | 231/1402 [15:59<1:33:01, 4.77s/batch, Batch Loss=0.00133]Input to VideoEncoder: batch_size=4, num_frames=8, channels=3, height=128, width=128 After view reshape: torch.Size([4, 24, 128, 128]) After conv2d_layers: torch.Size([4, 512, 128, 128]) After view reshape before fc: torch.Size([4, 8388608]) After fc layer: torch.Size([4, 512]) Input to VectorQuantizer: torch.Size([4, 512]) After flattening and reshaping: torch.Size([32, 64]) Distances shape: torch.Size([32, 512]) Encoding indices shape: torch.Size([32]) Quantized tensor shape: torch.Size([4, 512]) Commitment loss: 1.738540959195234e-05, Codebook loss: 1.738540959195234e-05, VQ loss: 2.1731761080445722e-05 Input to VideoDecoder: torch.Size([4, 512]) After fc layer: torch.Size([4, 131072]) After view reshape: torch.Size([4, 512, 16, 16]) After conv_reduce: torch.Size([4, 512, 16, 16]) After conv2d_transpose_layers: torch.Size([4, 24, 128, 128]) Channels: 3, Expected size: 1572864, Actual size: 1572864 Final output shape: torch.Size([4, 8, 3, 128, 128]) Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation. Epoch 1/1: 17%|█████████ | 232/1402 [16:04<1:32:57, 4.77s/batch, Batch Loss=0.00223]

peak ridge Jul 9, 2024, 9:02 AM

#

How much math, is enough math
to get started

#

i've studied a bit calculas in high-school also linear algebra,
do u have any specific resource for Stats and Probability to be specific? @final kiln

#

the problem with khan academy is

#

they have tooo toooo tooo deept in content

#

it literally took me idk how many hours to just do 1 out of 3 unit of Linear Algebra

#

also,
_ practicing frequently with py and jupyter notebooks_
how can i practice math with kaggle 🧐 i mean from python/coding itself

#

how can i practice math from coding

#

ohh

#

how can i do stats and probability from python

#

ohh

#

okay

#

ohh

devout python Jul 9, 2024, 11:19 AM

#

Hey folks - I have to routinely run some data manipulation of my postgres database (likely once per day), I am considering making a python script deployed on k8 as a cronjob, but it feels a bit like overkill. Are there simpler methods?

hallow sphinx Jul 9, 2024, 12:43 PM

#

Why do peeps code AI in notebook, and not just python files like every programmer does?

mint goblet Jul 9, 2024, 1:18 PM

#

Hey there im using Autogen AI conversable agents to generate images. In my output i have a few urls that i need to get out of the conversation but i can't seem to find a solution for this problem. In the omage you can see one of the agents talking and giving an output url "ai_generated_img" with the link that i want to save in a variable

hallow sphinx Jul 9, 2024, 2:00 PM

#

Can we consider supervised learning, as where result is known, and unsupervised where the output is not known?

agile cobalt Jul 9, 2024, 2:28 PM

#

it's not necessarily "known" / "unknown", but rather labelled or not

take outlier detection an example - you may train a model without telling it which records are outliers, then validate that it is catching all records you know that are outliers

another example would be the unsupervised pre-training step of GPT models and alike, I wouldn't consider there to be any unknowns, but it's still considered unsupervised

hallow sphinx Jul 9, 2024, 2:34 PM

#

I was reading a book "100 pages ML book" and it has this formula of Support Vector Machine (SVM)

y = sign(wx - b)

Then it continues to say that ```math
wx - b >= +1 if y = +1, and
wx - b <= -1 if y = ≠1


But shouldn't it be using limits here?
Since, `sign(0.01) = +1` and `sign(-0.01) = -1`

wooden sail Jul 9, 2024, 2:58 PM

#

hallow sphinx Can we consider supervised learning, as where result is known, and unsupervised ...

@agile cobalt i would actually call known vs unknown as you did, since the output of a network is not always the label, and some unsupervised and self supervised approaches treat the input data as its own label too

hallow sphinx Jul 9, 2024, 3:04 PM

#

Do you have to remember lots of formulas for ML?

serene grail Jul 9, 2024, 3:06 PM

#

I'm not an ML guy, but I think in math in general what matters is understanding the concepts behind the formula, not memorizing the formila

hallow sphinx Jul 9, 2024, 3:06 PM

#

o.O

wooden sail Jul 9, 2024, 3:07 PM

#

i agree with that

#

i was also about to write this as well. it just happens inevitably

tepid pecan Jul 9, 2024, 3:20 PM

#

So... I have two training/learning question. So my IT (A) and a co-worker (B) would like to learn python data entry, transformation, and clean up.
The packages I know about are scikitlearn, scipy, and pandas.

#

Q1) I primarily write in R and python mix (using both together), but I use pipe notation in R (see link). My boss also uses R notation, but does not know pandas. Only myself and IT person A have used pandas. https://www.r-bloggers.com/2021/05/the-new-r-pipe/
Does the scientific python packages have a general pipe operator or function for general python and data clean up? Because I like to convert my code over pure to python so they can learn better.

R-bloggers

Code R

The new R pipe | R-bloggers

R 4.1.0 is out! And if version 4.0.0 made history with the revolutionary change of stringAsFactors = FALSE, the big splashing news in this next version is the implementation of a native pipe. The new pipe The “pipe” is one of the most distinctive qualities of tidyverse/dplyr code. I’m sure you’ve used or seen something like this: library(dplyr) ...

#

Q2) Are there any new scientific packages or resources since 2022 that people recommend to make coding easier?

stable rover Jul 9, 2024, 3:59 PM

#

how does YOLO object detection predict bounding boxes? i understand that it uses a grid and each cell predicts a bounding box(es) and class but i don't understand how its possible. does each grid cell run its own classifier so to speak?

tepid pecan Jul 9, 2024, 4:13 PM

#

stable rover how does YOLO object detection predict bounding boxes? i understand that it uses...

Not sure if this answers your question but I found a related blog post: www.analyticsvidhya.com/blog/2018/12/practical-guide-object-detection-yolo-framewor-python/

frigid cove Jul 9, 2024, 5:29 PM

#

Is it possible to fine-tune a model using 100 classes, in a laptop with a RTX 4060?

cosmic lynx Jul 9, 2024, 6:26 PM

#

what is the least jarring way to step into learning more about the engineering side of AI?
or do I just need to go headfirst into learning the scary half of calculus?
I know a little bit about the big three (most comfortable in Calc, okay in stats and shaky in linear algebra)

wooden sail Jul 9, 2024, 6:34 PM

#

i would argue that the linalg is the most important component, and the stats right after

#

the calc is often used more as a tool in helping out with getting nice results for the other two

cosmic lynx Jul 9, 2024, 6:35 PM

#

okay thank you

lapis sequoia Jul 9, 2024, 6:47 PM

#

Can running neural networks dismantle your pc?

#

And, when is imagedatagenerstor better? Under what circumstances?

lapis sequoia Jul 9, 2024, 6:49 PM

#

cosmic lynx what is the least jarring way to step into learning more about the engineering s...

Calculus is very easy. Just don’t let people get in your head.

cosmic lynx Jul 9, 2024, 6:49 PM

#

lapis sequoia Calculus is very easy. Just don’t let people get in your head.

doesn't it get really wild as you start sinking deeper?

past meteor Jul 9, 2024, 6:50 PM

#

cosmic lynx what is the least jarring way to step into learning more about the engineering s...

Define "the engineering side"?

#

Being a consumer of GenAI libraries and services needs absolutely 0 math

#

And it's, for better or for worse, what many companies consider to be the "engineering side of AI"

lapis sequoia Jul 9, 2024, 6:51 PM

#

cosmic lynx doesn't it get really wild as you start sinking deeper?

I don’t know. Whatever is “hard” is dependent on the person. There is nothing that cannot be learned. I thought Game Theory was harder than like the most insane math classes ever.

cosmic lynx Jul 9, 2024, 6:53 PM

#

past meteor Define "the engineering side"?

idealistically working towards being able to make my own from the ground up
whether it ever makes sense to actually do that or not is an entirely seperate question

cosmic lynx Jul 9, 2024, 6:54 PM

#

lapis sequoia I don’t know. Whatever is “hard” is dependent on the person. There is nothing th...

I think I might be mixing it up with something else at this point

lapis sequoia Jul 9, 2024, 6:56 PM

#

cosmic lynx I think I might be mixing it up with something else at this point

Just, learn it, listen to no one’s opinion on it who isn’t suited to teach it, and decide for yourself.

#

I thought calc 3 was a walk in the park. Like, dirt easy.

#

Ok, like what then?

#

Who said I didn’t?

#

No, I just took so many partials and optimization problems with constraints to the point it was easy when I took it.

#

Bro, I took adanced calculus 1 and grad level optimization class. What I meant by hard, depends on the person. I don’t understand finance at all, but calculus never ever gave me a problem.

#

Ok, give me a problem. I don’t know how to answer that.

#

Like, end of calc 3 with cramers rule? I don’t know. Not even bad

#

All I am saying is it just depends on the person. I agree, abstract algebra is cancer, I hate Game Theory, but that is because there are not as many books on it as there are for calculus, matrix and linear algebra, stats/probability. I just think people should decide for themselves.

#

I don’t remember much, this was a while ago. I took calc1-3 in 2016-2017, I just remember game theory was insanely hard to me.

serene grail Jul 9, 2024, 7:26 PM

#

Sounds tough as hell.
But I've also heard that real analysis is very hard in general, personally I have no idea what it is

#

That sucks

lapis sequoia Jul 9, 2024, 7:27 PM

#

Real analysis is stupid hard tho in all honesty

#

Stats, like I took stats 1-3, metrics 1-3, did well in all of them and I don’t remember anything from it. Like, basic concepts.

#

It’s not that I don’t get it, it’s just easier to explain concepts. I don’t know calc1-3, matrix and linear algebra, optimization were always much easier to me compared to even like finance stuff. I swear, it took me so long to understand the concept of a bond.

obsidian mesa Jul 9, 2024, 7:39 PM

#

Is udemy free course on AI/ML is recognized as advertisement?

#

I mean I am new here, not much aware of rules, hence asking...

past meteor Jul 9, 2024, 7:44 PM

#

Can you remove this message? ads are against the rule and it's an advertisement

lapis sequoia Jul 9, 2024, 7:46 PM

#

Any of you ever take Game Theory? It is a literal branch of mathematics.

wary cosmos Jul 9, 2024, 7:57 PM

#

The decoder in a transformer is trained on a whole sentence at once. It has a mask during self attention to prevent the an earlier timestep from looking forward and directly seeing what it should be outputting.

As far as I am aware the feedforward layer near the end is just an MLP, and they work by having every neuron in a subsequent layer sees all the outputs of the previous layer.

How is looking forward also prevented in the feedforward layer?

#

Thx did not know that

I assume the encoder is also only by embedding?

#

Thx

lapis sequoia Jul 9, 2024, 8:04 PM

#

Thank you for acknowledging it as a branch of mathematics. It was until the 70s or something and all the pioneers of game, pretty much had an insane influence or partial differential equations.

wary cosmos Jul 9, 2024, 8:05 PM

#

Isn’t game theory just a complicated optimization function

#

Broadly speaking

past meteor Jul 9, 2024, 8:06 PM

#

lapis sequoia Any of you ever take Game Theory? It is a literal branch of mathematics.

@iron basalt

iron basalt Jul 9, 2024, 8:08 PM

#

lapis sequoia Any of you ever take Game Theory? It is a literal branch of mathematics.

It's a field of mathematics, yes.

#

Invented / pushed by the cold war to beat the soviets.

#

(Its modern form (although like all math, you can find it waaaay earlier))

lapis sequoia Jul 9, 2024, 8:12 PM

#

iron basalt It's a field of mathematics, yes.

Industrial Organization!!!!! Yo! Bertrand games are so fun!

#

Bertrand Duopolies were just limits pretty much, but for price wars. Industrial Organization is my favorite topic of all time.

iron basalt Jul 9, 2024, 8:15 PM

#

It's really fun and will change your perspective on all the actions taken by nations and such (a better understanding of why they are doing what they do (they calculate things, especially in wars)).

lapis sequoia Jul 9, 2024, 8:15 PM

#

Yes

lapis sequoia Jul 9, 2024, 8:16 PM

#

iron basalt It's really fun and will change your perspective on all the actions taken by nat...

Was confused me were sequential move games when the second agent/player had multiple moves at a node when the player before already moved. I never understood that logic at all. And Bayesian Nash equilibrium, pretty much just sequential games, not simultaneous games

lapis sequoia Jul 9, 2024, 8:19 PM

#

iron basalt It's really fun and will change your perspective on all the actions taken by nat...

Bertrand and Cournout. I love game, just really wish I understood when similar about games turn sequential and then players have more moves for some reason.

iron basalt Jul 9, 2024, 8:21 PM

#

lapis sequoia Bertrand and Cournout. I love game, just really wish I understood when similar a...

Combinatorial game theory.

lapis sequoia Jul 9, 2024, 8:24 PM

#

iron basalt Combinatorial game theory.

Do you know what I am talking about tho? I get they are using combinations of all possible moves, but, the person could not move at a specific node so I never understood roll they got second mover advantage in rollback equilibrium.

odd meteor Jul 9, 2024, 9:36 PM

#

lapis sequoia Any of you ever take Game Theory? It is a literal branch of mathematics.

I did. I was taught Game Theory & Gambler's Ruin in my Statistics course.

iron basalt Jul 9, 2024, 9:53 PM

#

lapis sequoia Do you know what I am talking about tho? I get they are using combinations of al...

Not sure, the sentence is a bit hard to follow. Do you mean how second-mover advantage can be a thing / under what circumstances?

narrow tiger Jul 9, 2024, 10:15 PM

#

so i am trying to create custom tool for agent but why does it get stuck like this?

#

what does this even mean. it runs the tool perfectly

#

@tool
def get_prices(query:str):
    """Can be used to get current market prices of any crypto asset"""
    return 69.69

prices_tool = Tool(name = 'crypto prices',func= get_prices,description="Can be used to get current prices of crypto assets")

x = [prices_tool]
agent = initialize_agent(x,llm,AgentType.ZERO_SHOT_REACT_DESCRIPTION,verbose= True, handle_parsing_errors =True)```
This is how i am using it (hardcoded for now bcz it is faster)

#

LLMs have totally replaced regex for me ducky_concerned

serene scaffold Jul 9, 2024, 10:35 PM

#

narrow tiger LLMs have totally replaced regex for me<:ducky_concerned:1178032077514477629>

because you ask the LLM to write the regex, or what?

narrow tiger Jul 9, 2024, 10:50 PM

#

no like most of the times llms are just using natural language string to get some some data from that string

#

used to do regex for that and now everyone wants to use llm for it

#

also llm write regex really good for some reason,

buoyant vine Jul 9, 2024, 11:35 PM

#

🤨

lapis sequoia Jul 9, 2024, 11:48 PM

#

iron basalt Not sure, the sentence is a bit hard to follow. Do you mean how second-mover adv...

No, when the rollback from an extensive for game changes because it was fist a simultaneous move game.

lapis sequoia Jul 9, 2024, 11:53 PM

#

iron basalt Not sure, the sentence is a bit hard to follow. Do you mean how second-mover adv...

I drew out a example very quick

junior ibex Jul 10, 2024, 12:00 AM

#

Anyone understand why the d2 vector for language is 0, and why on d1 “what” is 0.25 and candy 0.125 if they both show up once?

iron basalt Jul 10, 2024, 12:09 AM

#

lapis sequoia No, when the rollback from an extensive for game changes because it was fist a s...

If it becomes sequential the second to move player can condition their move on the first player-to-move's move.

lapis sequoia Jul 10, 2024, 12:10 AM

#

iron basalt Not sure, the sentence is a bit hard to follow. Do you mean how second-mover adv...

Now I know. It started out as a simultaneous move game and the row player had two plans of action but since it was originally a simultaneous game, the sequential move benefits the second against because he now has 4 options instead of 2. Yeah. It’s just 8 plans of action now. While the row player still only has two the column player picks the highest payoff.

iron basalt Jul 10, 2024, 12:12 AM

#

lapis sequoia Now I know. It started out as a simultaneous move game and the row player had tw...

Depending on the payoffs even after going sequential it could be first to move advantage (manipulate the possible moves for the second to move player) or second to move advantage (condition on first move made (more information (which may mean more moves))).

lapis sequoia Jul 10, 2024, 12:14 AM

#

iron basalt Depending on the payoffs even after going sequential it could be first to move a...

In this game, if the row player were to move second, would it be the same scenario and what is the rollback for that game? I am trying to remember the reasoning behind this. I’ve been relearning game for fun.

lapis sequoia Jul 10, 2024, 12:17 AM

#

iron basalt Depending on the payoffs even after going sequential it could be first to move a...

Are you referring to the information set? Separating equilibrium? And grim trigger and fit-for tat, was the made solely for collusion? Infinitely repeated prisoners dilemma.

iron basalt Jul 10, 2024, 12:26 AM

#

lapis sequoia In this game, if the row player were to move second, would it be the same scenar...

So we don't have to refer to them as row and column player I found this diagram, with same payoffs:

lapis sequoia Jul 10, 2024, 12:27 AM

#

(Up,low)

iron basalt Jul 10, 2024, 12:28 AM

#

White plays low if black plays up, and white plays high if black plays down. This is the only subgame perfect equilibrium.

lapis sequoia Jul 10, 2024, 12:28 AM

#

Yes.

#

!but that is purely sequential. I am talking when it becomes simultaneous

arctic wedgeBOT Jul 10, 2024, 12:29 AM

#

Microsoft Visual C++ Build Tools

When you install a library through pip on Windows, sometimes you may encounter this error:

error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/

This means the library you're installing has code written in other languages and needs additional tools to install. To install these tools, follow the following steps: (Requires 6GB+ disk space)

1. Open https://visualstudio.microsoft.com/visual-cpp-build-tools/.
2. Click Download Build Tools >. A file named vs_BuildTools or vs_BuildTools.exe should start downloading. If no downloads start after a few seconds, click click here to retry.
3. Run the downloaded file. Click Continue to proceed.
4. Choose C++ build tools and press Install. You may need a reboot after the installation.
5. Try installing the library via pip again.

lapis sequoia Jul 10, 2024, 12:32 AM

#

iron basalt White plays low if black plays up, and white plays high if black plays down. Thi...

In that sequential game it would have to be (up,low). If black plays down, white plays high, this gives black a lower payoff. Black always chooses up and white always chooses low, unless the game becomes simultaneous. Then, it would depend in the second mover.

iron basalt Jul 10, 2024, 12:33 AM

#

lapis sequoia In that sequential game it would have to be (up,low). If black plays down, white...

Black plays down.

#

It's first mover advantage.

#

This has second mover advantage:

lapis sequoia Jul 10, 2024, 12:35 AM

#

Oh, white has the payoffs to the right in that game.

iron basalt Jul 10, 2024, 12:35 AM

#

Yeah.

#

(Black, White)

lapis sequoia Jul 10, 2024, 12:36 AM

#

Yeah, I was reading it (white, black) in terms of payoffs

serene grail Jul 10, 2024, 12:36 AM

#

I was a bit confused too, used to the chess order

lapis sequoia Jul 10, 2024, 12:39 AM

#

iron basalt This has second mover advantage:

That game is weird. But yeah, Bernard would have: (zig,zig), (zig,zag), (zag, zig), (zag,zag) and Amy would only have up and down

iron basalt Jul 10, 2024, 12:39 AM

#

lapis sequoia That game is weird. But yeah, Bernard would have: (zig,zig), (zig,zag), (zag, zi...

Yeah.

#

I think you got it.

lapis sequoia Jul 10, 2024, 12:40 AM

#

I do get it, it is just weird when the game starts out sequential and turns simultaneous.

lapis sequoia Jul 10, 2024, 12:48 AM

#

iron basalt Yeah.

I drew it out

#

I drew the starting game table and did when each person moved first. They have the same payoffs demo ending on the order of the moves and there is no pure strategy Nash equilibrium in the original game.

iron matrix Jul 10, 2024, 1:10 AM

#

Hi! Apologies if this is a dumb question, I'm not a data scientist, just a programmer 🙂

#

I have two 2D spaces. In each space there are some number of points, each identified by (x,y) and some (text) label. The labels are not guaranteed to be unique (there might be two points in the same space with the same label), and there's no guarantee that every label in one space also exists in the other, but there will be a lot of correlation.

#

What I want to do is find a best-match transformation matrix to convert coordinates from one space into the other. This seems like the sort of thing that ought to exist (image processing has to have this covered, right?), but I'm not sure what I should be looking for. Can someone point me in the right direction, please?

lapis sequoia Jul 10, 2024, 1:56 AM

#

iron basalt Yeah.

not trying to blow you up about game man, I just kind forgot how much fun it was (and just doing math in general on paper was instead of programming all of the time) but like, with grim trigger, can that be used to detect if a a company will collude or defect on a collusion data set?

stable rover Jul 10, 2024, 3:00 AM

#

how are convolutional layers (output tensor) flattened into fully connected layers?

#

for example, in here, are there really 1690 * 10 * classes (10) weights to learn just for the final layer?

opal magnet Jul 10, 2024, 3:39 AM

#

Anyone here

#

And would be willing to help

unkempt apex Jul 10, 2024, 9:23 AM

#

everyone is!

lapis sequoia Jul 10, 2024, 12:05 PM

#

stable rover for example, in here, are there really 1690 * 10 * classes (10) weights to learn...

its 1690 * 10 weights

#

its just 13x13x10 flattened into 1690 inputs, and 10 outputs, so the weights are 1690*10 matrix

toxic mortar Jul 10, 2024, 12:19 PM

#

Hi guys,

I've integrated a RAG LLM-based OpenAI chatbot into my app. Now, I want to implement algorithms within the chatbot, like procedural steps that my chatbot will do sequentially. For example, the chatbot first asks the user for name, then save that information, then ask for their age, then again saves it, and finally when he has all the necessary informations he computes something based on this and then provides the result.

Is there a tool or method to achieve this within Langchain framework? Or can you point me to some keyword name. Thanks 😄

fiery vigil Jul 10, 2024, 12:52 PM

#

Hi everyone,

a very basic question from me, to decide between two implementation of Tensorflow: C API or Python?

I understand that Python is still implementing code written in C, but I am wondering if the size of data I am dealing with, makes it more time-consuming? I am trying to simulate a recursive iteration process of the type:

x[i+1] = f(x[i])

where x[i] is a vector with millions of components. Conventional iterative methods lead to "stagnation", kind of like vanishing gradient. So x[i+1] ~ x[i] after a while. I have tried updating the iteration scheme itself, include higher-order terms, but it is already slower.

So I was thinking maybe with some architecture of NNs, it might be possible to "jump ahead" in the iteration scheme, to accelerate convergence and get out of stagnating solutions.

Now comes the issue of what implementation to use: the standard, well-documenter Python Tensorflow, or the C API that barely had any documentation, just a loose clump of github pages. I thought that even if training is done in C/C++, for a NN defined in Python, maybe it would be faster? Or is it not so? Even if there's some degree of speed-up in C++, is it possible to implement it as quickly and consistently as in Python (using Python 3, if that matters). All CPU too, btw, no GPU. Got a whole cluster of hundreds of CPUs.

Thank you for any help/insights/suggestions!

left tartan Jul 10, 2024, 1:09 PM

#

fiery vigil Hi everyone, a very basic question from me, to decide between two implementatio...

Are you asking whether there's an appreciable performance difference between using tensorflow in c++ vs Python?

fiery vigil Jul 10, 2024, 1:11 PM

#

left tartan Are you asking whether there's an appreciable performance difference between usi...

Yes, especially with the millions of datapoints here. I found a stackoverflow post that first said that OP found C++ API was somehow slower than Python. Not much responses there and later OP found some way to get speed-up, but no clarity on if it is appreciably faster than Python.

left tartan Jul 10, 2024, 1:13 PM

#

fiery vigil Yes, especially with the millions of datapoints here. I found a stackoverflow po...

I don't have direct experience, so I'm speaking to generalities: the Python interface to tensorflow is the primary one, and most used one. I doubt you'd notice any improvements, and certainly would have a worse development experience.

#

But, perhaps in some isolated use case, one might find ways of improving... but I'd think you'd be at some expert level and a year or more of experience

fiery vigil Jul 10, 2024, 1:16 PM

#

left tartan I don't have direct experience, so I'm speaking to generalities: the Python inte...

Yeah, I just need to read it from others with more experience. I have tried Keras here and there, and now I have to go a level lower and use Tensorflow directly. I hope whatever model is implemented in Python can handle the data. Of course, people do image processing and all in Tensorflow Python, but this recursive iteration scheme might need some matrix multiplication operations (that are already millions x millions ~ quadrillions of unique elements).

wooden sail Jul 10, 2024, 1:20 PM

#

matrix multiplication never happens directly in python

tidal bough Jul 10, 2024, 2:55 PM

#

fiery vigil Yeah, I just need to read it from others with more experience. I have tried Kera...

If the main thing that the iteration scheme does is matrix multiplication, that's evidence towards there not being a speedup from doing it in a compiled language, because multiplying matrices on the python side is already very efficient.

fiery vigil Jul 10, 2024, 2:56 PM

#

wooden sail matrix multiplication never happens directly in python

But doing these kind of multiplications is very slow. Not even enough RAM. Then I have to do some multiprocessing parallelisation, which is still way slower than openmp for loops in C.

tidal bough Jul 10, 2024, 2:56 PM

#

...what do you mean? Are you writing your own matric multiplication function from scratch?

#

Since numpy's matmul implementation is already in a compiled language, and can use multiple cores if you have the right numpy build (I think it has to be the MKL one and not the BLAS one).

fiery vigil Jul 10, 2024, 2:57 PM

#

tidal bough ...what do you mean? Are you writing your own matric multiplication function fro...

No, I cut the matrix into chunks, do matrix multiplication using numpy routines and put the result back together

tidal bough Jul 10, 2024, 2:57 PM

#

Ah, I see. Is your matrix sparse?

fiery vigil Jul 10, 2024, 2:57 PM

#

tidal bough Ah, I see. Is your matrix sparse?

Nah, pretty dense matrix, hence the quadrillions of elements.

tidal bough Jul 10, 2024, 2:58 PM

#

Hmm, and yet it works better to manually split it into chunks than to let numpy handle it?

fiery vigil Jul 10, 2024, 2:58 PM

#

tidal bough Hmm, and yet it works better to manually split it into chunks than to let numpy ...

It doesn't even fit in the RAM. TBs of matrix alone. So I cut it into pieces and do batch-wise multiplication.

tidal bough Jul 10, 2024, 2:59 PM

#

Oh wait, quadrillions of elements - so you can't even- yeah, okay, that makes sense

#

I wonder if something like dask has a streaming matmul implementation that'd work here, but it makes sense that you made your own. Anyway, naively it seems to me like there's nothing there that can't be done efficiently in python (do a partial matmul via numpy, afterwards (or even in the process) start loading the next chunk, etc), but I might not be thinking about some details of your implementation.

fiery vigil Jul 10, 2024, 3:01 PM

#

Using for loops to cut chunks then parallise the chunks. For loops already bad enough. OTOH, for loops in C are just simple, all static data types. Another advantage of C is now I don't need to define thr whole matrix. Just parallise the for loops, do some aggressive compilation, and it works out. Was hoping to get that compiler magic for NN training too.

tidal bough Jul 10, 2024, 3:02 PM

#

For loops already bad enough
I don't really get why. The overhead of things like looping in python becomes noticable when the body of the loop takes very little time - but for you it's a pretty big matmul, so it seems to me that it shouldn't matter.

fiery vigil Jul 10, 2024, 3:04 PM

#

I did the chunks thing in python first. It is painful. I was thinking dask etc, but too much abstraction is going to make it slower. Trying to keep it simple. I wanted to try and "strip" all the OOP stuff on numpy matrices, if that made it faster. Was thinking of Cython because of it. Had some discussion here, found even Cython is at best a bit slower than pure C, so switched to C.

fiery vigil Jul 10, 2024, 3:06 PM

#

tidal bough > For loops already bad enough I don't really get why. The overhead of things li...

I am guessing it's the checking for datatypes and each loop has to then start the multiprocessing part. JIT and numba were other ideas to make the loops faster, but if I can get fastest-ish in C, why the extra effort?

tidal bough Jul 10, 2024, 3:08 PM

#

oh, If you're using multiprocessing I actually have a guess what was happenning - it was probably the fact that arguments to functions generally need to be serialized in order to send them to another process. So you might have been eating the serialization overhead on all of that data, which is quite a lot.

fiery vigil Jul 10, 2024, 3:08 PM

#

Now with tensorflow, situation is different. Not seeing much discussion on the C API, just bits and pieces scattered around. Official Tensorflow website has a barebones as well on C API. So have to contend with this in Python.

tidal bough Jul 10, 2024, 3:10 PM

#

I don't know much about the tensorflow C API (when I tried using libtorch my experience was that there was basically no docs about the C++ side and I had to read the python ones instead, so it's not much better either), but since you're basically just working with raw arrays, all you really need is to extract the pointer to the data and some shape information, and then work on that.

fiery vigil Jul 10, 2024, 3:11 PM

#

tidal bough oh, If you're using multiprocessing I actually have a guess what was happenning ...

I was sending some index markers and a some small vectors from which the massive array would ve constructed (meshgrid without using meshgrid because of memory constraints). So each thread in the pool just made the piece it needed, did the math, and gave its part of the final output vector.

fiery vigil Jul 10, 2024, 3:14 PM

#

tidal bough I don't know much about the tensorflow C API (when I tried using `libtorch` my e...

The indexing and slicing part becomes expensive. But if there's no other option...
I was thinking that maybe I will try to avoid the matrix stuff altogether. The tensors in the DNNs are kind of emulating that part anyway. So I could maybe try to emulate a massive matrix with quadrillion elements with say, 200 neurons with some 10-100 rank tensors? If such a thing is possible. It will be approximate, but the hope is that the DNNs find the most essential features to sufficiently emulate just right enough information to feed into the recursive iteration.

tidal bough Jul 10, 2024, 3:25 PM

#

Maybe? I'd expect that if it's the kind of task that it needs an iteration scheme to compute, it would also be unstable and amplify the approximation errors exponentially as the iterations go on. But if that's not the case for yours, maybe it'll work.

fiery vigil Jul 10, 2024, 3:28 PM

#

tidal bough Maybe? I'd expect that if it's the kind of task that it needs an iteration schem...

Yeah, that is one of the problems. It's compounded by other kind of specific errors as well. I was just using the NN aspect as a "quick fix/jump" to the main method. I guess that is why this is still an open area of research, otherwise someone would've done it already lol. Thanks for taking the time though.

jaunty helm Jul 10, 2024, 4:20 PM

#

in pytorch, is there any disadvantage to using the lazy modules vs. the non-lazy ones? (e.g. nn.LazyLinear vs nn.Linear)

#

I see... to me right now they just feel like nicer-to-use counterparts to their non-lazy siblings

past meteor Jul 10, 2024, 6:10 PM

#

jaunty helm in `pytorch`, is there any disadvantage to using the lazy modules vs. the non-la...

When you use it the warning tells you the downside 🙂

#

The API is unstable

#

I typically use non-lazy as it's a sanity check while I'm implementing the net

mild yarrow Jul 10, 2024, 6:43 PM

#

Hey Guys! Consider you are implementing a fashion recommender system which recommends user what to wear on that day on the basis of various factors which will make user look better on that particular day ... What are the requirements I need to have in my mind already for creating one

tidal bough Jul 10, 2024, 7:05 PM

#

mild yarrow Hey Guys! Consider you are implementing a fashion recommender system which recom...

uh, you'd need to somehow obtain a dataset of "various factors + clothes worn -> 'how good they look'", which, good luck with that.

mild yarrow Jul 10, 2024, 7:06 PM

#

tidal bough uh, you'd need to somehow obtain a dataset of "various factors + clothes worn ->...

And what about the weather and surrounding factors coz they too will have an impact on the looks of what we wear i guess

split bone Jul 10, 2024, 7:37 PM

#

import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder


# Kodlar



# Veri Yükleme

veriler = pd.read_csv('D:\\Project\\eksikVeriler.csv')
print(veriler)


ulke = veriler.iloc[:, 0:1].values
print(ulke)

from sklearn import preprocessing

le = preprocessing.LabelEncoder()

ulke[:,0] = le.fit_transform(veriler.iloc[:,0])

print(ulke)

ohe = preprocessing.OneHotEncoder()
ulke = ohe.fit_transform(ulke).toarray()
print(ulke)

print(list(range(22)))
sonuc = pd.DataFrame(data=ulke, index=range(22), columns=['fr', 'tr', 'us',''])
print(sonuc)

sonuc2 = pd.DataFrame(data=ulke, index=range(22), columns=['boy', 'kilo', 'yas',''])
print(sonuc2)

cinsiyet = veriler.iloc[:,-1].values
print(cinsiyet)

sonuc3 = pd.DataFrame(data=ulke, index=range(22), columns=['cinsiyet','','',''])
print(sonuc3)

s=pd.concat([sonuc, sonuc2], axis=1)
print(s)

s2=pd.concat([s, sonuc3], axis=1)
print(s2)

umbral delta Jul 10, 2024, 7:37 PM

#

so im trying to train a nn using tensorflow, but it just returns the same output for all inputs. what could be causing this?

split bone Jul 10, 2024, 7:37 PM

#

split bone ```m import pandas as pd import numpy as np from sklearn.impute import SimpleImp...

ulke,boy,kilo,yas,cinsiyet
tr,130,30,10,e 
tr,125,36,11,e 
tr,135,34,10,k 
tr,133,30,9,k 
tr,129,38,12,e 
tr,180,90,30,e 
tr,190,80,25,e 
tr,175,90,35,e 
tr,177,60,22,k 
us,185,105,33,e 
us,165,55,27,k 
us,155,50,44,k 
us,160,58,39,k 
Us,162,59,41,k 
us,167,62,55,k 
fr,174,70,47,e 
fr,193,90,23,e 
fr,187,80,27,e 
fr,183,88,28,e 
fr,159,40,29,k 
fr,164,66,32,k 
fr,166,56,42,k

#

Please look my projects file

#

     fr   tr   us       boy  kilo  yas       cinsiyet               
0   0.0  1.0  0.0  0.0  0.0   1.0  0.0  0.0       0.0  1.0  0.0  0.0
1   0.0  1.0  0.0  0.0  0.0   1.0  0.0  0.0       0.0  1.0  0.0  0.0
2   0.0  1.0  0.0  0.0  0.0   1.0  0.0  0.0       0.0  1.0  0.0  0.0
3   0.0  1.0  0.0  0.0  0.0   1.0  0.0  0.0       0.0  1.0  0.0  0.0
4   0.0  1.0  0.0  0.0  0.0   1.0  0.0  0.0       0.0  1.0  0.0  0.0
5   0.0  1.0  0.0  0.0  0.0   1.0  0.0  0.0       0.0  1.0  0.0  0.0
6   0.0  1.0  0.0  0.0  0.0   1.0  0.0  0.0       0.0  1.0  0.0  0.0
7   0.0  1.0  0.0  0.0  0.0   1.0  0.0  0.0       0.0  1.0  0.0  0.0
8   0.0  1.0  0.0  0.0  0.0   1.0  0.0  0.0       0.0  1.0  0.0  0.0
9   0.0  0.0  1.0  0.0  0.0   0.0  1.0  0.0       0.0  0.0  1.0  0.0
10  0.0  0.0  1.0  0.0  0.0   0.0  1.0  0.0       0.0  0.0  1.0  0.0
11  0.0  0.0  1.0  0.0  0.0   0.0  1.0  0.0       0.0  0.0  1.0  0.0
12  0.0  0.0  1.0  0.0  0.0   0.0  1.0  0.0       0.0  0.0  1.0  0.0
13  0.0  0.0  1.0  0.0  0.0   0.0  1.0  0.0       0.0  0.0  1.0  0.0
14  0.0  0.0  1.0  0.0  0.0   0.0  1.0  0.0       0.0  0.0  1.0  0.0
15  0.0  0.0  0.0  1.0  0.0   0.0  0.0  1.0       0.0  0.0  0.0  1.0
16  1.0  0.0  0.0  0.0  1.0   0.0  0.0  0.0       1.0  0.0  0.0  0.0
17  1.0  0.0  0.0  0.0  1.0   0.0  0.0  0.0       1.0  0.0  0.0  0.0
18  1.0  0.0  0.0  0.0  1.0   0.0  0.0  0.0       1.0  0.0  0.0  0.0
19  1.0  0.0  0.0  0.0  1.0   0.0  0.0  0.0       1.0  0.0  0.0  0.0
20  1.0  0.0  0.0  0.0  1.0   0.0  0.0  0.0       1.0  0.0  0.0  0.0
21  1.0  0.0  0.0  0.0  1.0   0.0  0.0  0.0       1.0  0.0  0.0  0.0

#

This is my result

#

Why are my results like this

abstract wasp Jul 10, 2024, 7:48 PM

#

hi i need help, im trying to clean some text but i get this error:
import nltk import nltk.corpus from nltk.corpus import stopwords from nltk.tokenize import word_tokenize from nltk.stem import WordNetLemmatizer import re import matplotlib def preprocessing_text(text): text = text.lower() text = re.sub(r"(@\[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)|^rt|http.+?", "", text) tokens = word_tokenize(text) return tokens def remove_stopwords(tokens): stop_words = set(stopwords.words('english')) filtered_tokens = [word for word in tokens if word not in stop_words] return filtered_tokens def cleaning_text(text): tokens = preprocessing_text(text) filtered_tokens = remove_stopwords(tokens) lemmatized_tokens = lemmatization(filtered_tokens) cleaned_text = ' '.join(lemmatized_tokens) return cleaned_text text = '/Users/avatarvaleria/UCSD/NLP/HH_english_transcripts.txt' with open(text, 'r') as file: text = file.read() print(text) cleaned_data = cleaning_text(text) print(cleaned_data)

#

`TypeError Traceback (most recent call last)
Cell In[59], line 1
----> 1 cleaned_data = cleaning_text(text)
2 print(cleaned_data)

Cell In[57], line 4, in cleaning_text(text)
2 tokens = preprocessing_text(text)
3 filtered_tokens = remove_stopwords(tokens)
----> 4 lemmatized_tokens = lemmatization(filtered_tokens)
5 cleaned_text = ' '.join(lemmatized_tokens)
6 return cleaned_text
Cell In[56], line 3, in lemmatization(tokens)
1 def lemmatization(tokens):
2 lemmatizer = WordNetLemmatizer()
----> 3 lemmatized_tokens = [lemmatizer.lemmatize(tokens) for token in tokens]
4 return lemmatized_tokens
Cell In[56], line 3, in <listcomp>(.0)
1 def lemmatization(tokens):
2 lemmatizer = WordNetLemmatizer()
----> 3 lemmatized_tokens = [lemmatizer.lemmatize(tokens) for token in tokens]
4 return lemmatized_tokens
File /opt/anaconda3/lib/python3.11/site-packages/nltk/stem/wordnet.py:45, in WordNetLemmatizer.lemmatize(self, word, pos)
33 def lemmatize(self, word: str, pos: str = "n") -> str:
34 """Lemmatize word using WordNet's built-in morphy function.
35 Returns the input word unchanged if it cannot be found in WordNet.
36
(...)
43 :return: The lemma of word, for the given pos.
44 """
---> 45 lemmas = wn._morphy(word, pos)
46 return min(lemmas, key=len) if lemmas else word
File /opt/anaconda3/lib/python3.11/site-packages/nltk/corpus/reader/wordnet.py:2096, in WordNetCorpusReader._morphy(self, form, pos, check_exceptions)
2094 # 0. Check the exception lists
2095 if check_exceptions:
-> 2096 if form in exceptions:
2097 return filter_forms([form] + exceptions[form])
2099 # 1. Apply rules once to the input to get y1, y2, y3, etc.
TypeError: unhashable type: 'list'`

rich moth Jul 10, 2024, 8:35 PM

#

abstract wasp `TypeError Traceback (most recent call last) Cel...

You need to pass it as a single string not a list

#

Make a new def that handles that lemmization to each token in the list

umbral delta Jul 10, 2024, 8:51 PM

#

im trying to change the learning rate in tf as such: ```py
from keras import backend as K
K.set_value(model.optimizer.learning_rate, 0.01)

#

but i get an error ```
AttributeError: module 'keras.api.backend' has no attribute 'set_value'

rich moth Jul 10, 2024, 8:54 PM

#

umbral delta im trying to change the learning rate in tf as such: ```py from keras import bac...

K.set_vaulue(model.optimizer.lr.assign(0.01) try that

umbral delta Jul 10, 2024, 8:57 PM

#

rich moth K.set_vaulue(model.optimizer.lr.assign(0.01) try that

same error

umbral delta Jul 10, 2024, 9:07 PM

#

rich moth K.set_vaulue(model.optimizer.lr.assign(0.01) try that

im on keras v3.3.3

mental rampart Jul 10, 2024, 11:06 PM

#

how to find gtx 1650 gpu device plugin for tensorflow gpu support

mental rampart Jul 10, 2024, 11:30 PM

#

latest ig and yes windows 11

#

i think i need device plugin for tensorflow to access my gpu
To use a particular device, like one would a native device in TensorFlow, users only have to install the device plug-in package for that device.

#

what is WSL2?

#

also is tensorflow intel works as device plugin? not sure

#

oh ic

#

so like

#

does pytorch provides all the functionality of tensorflow?

#

hmm lemme check

#

its so easy to make sequential models on tensorflow

rich moth Jul 10, 2024, 11:59 PM

#

i finally got the evaluation stage working correctly, i couldnt be happier! Epoch 1/1: 100%|█████████████████████████████████████████████| 351/351 [5:15:18<00:00, 53.90s/batch, Batch Loss=3.68e-5] Evaluation: 0%| | 0/88 [00:00<?, ?it/s]Input to VideoEncoder: batch_size=16, num_frames=16, channels=3, height=128, width=128 After view reshape: torch.Size([16, 48, 128, 128]) After conv2d_layers: torch.Size([16, 512, 128, 128]) After view reshape before fc: torch.Size([16, 8388608]) Input to VideoDecoder: torch.Size([16, 512]) After fc layer: torch.Size([16, 131072]) After view reshape: torch.Size([16, 512, 16, 16]) After conv_reduce: torch.Size([16, 512, 16, 16]) After conv2d_transpose_layers: torch.Size([16, 48, 128, 128]) Channels: 3, Expected size: 12582912, Actual size: 12582912 Final output shape: torch.Size([16, 16, 3, 128, 128]) Evaluation: 1%|▊ | 1/88 [00:10<15:15, 10.52s/it]Input to VideoEncoder: batch_size=16, num_frames=16, channels=3, height=128, width=128

#

Epoch 1 Metrics: {'Total Loss': tensor(8.0492e-07, device='cuda:0'), 'PSNR': 2.810752446743289, 'SSIM': 0.06163982837892718}

#

Seems alright for the first epoch.

#

Jesus the pytorch model is 16.6 gigs

serene scaffold Jul 11, 2024, 12:38 AM

#

rich moth Jesus the pytorch model is 16.6 gigs

yeah, there are models now that are hundreds of GBs

rich moth Jul 11, 2024, 12:47 AM

#

serene scaffold yeah, there are models now that are hundreds of GBs

Right? I mean the falcon 180b is like 350gigs, I was just so surprised it was so big from the first epoch.

serene scaffold Jul 11, 2024, 12:47 AM

#

rich moth Right? I mean the falcon 180b is like 350gigs, I was just so surprised it was so...

models don't grow over each epoch.

rich moth Jul 11, 2024, 12:50 AM

#

serene scaffold models don't grow over each epoch.

This is good to know.

serene scaffold Jul 11, 2024, 1:42 AM

#

@rich moth in fact, the size of a model is constant. more parameters -> larger model.
epochs are just a complete pass over the training data. More epochs only means more chances to adjust the parameters. It doesn't add additional parameters.

hearty depot Jul 11, 2024, 2:18 AM

#

rich moth Jesus the pytorch model is 16.6 gigs

Time to quanitize

serene scaffold Jul 11, 2024, 2:26 AM

#

hearty depot Time to quanitize

Nooo

serene grail Jul 11, 2024, 2:40 AM

#

What does quantizing even do? I've heard that it makes models smaller/possible to run on worse hardware but also makes them perform worse.
Is it like lowering the resolution of the model in a way?

hearty depot Jul 11, 2024, 2:46 AM

#

serene grail What does quantizing even do? I've heard that it makes models smaller/possible t...

ur kinda right on lowering the resolution, basically it just makes the weights use smaller floating point precision values

small wedge Jul 11, 2024, 2:46 AM

#

it's lowering the number of bits used to represent weights and biases in the model

hearty depot Jul 11, 2024, 2:46 AM

#

so if standard model uses fp 64 u can lower it to like fp16

serene grail Jul 11, 2024, 2:47 AM

#

hearty depot ur kinda right on lowering the resolution, basically it just makes the weights u...

Ah I see I see, thank you

small wedge Jul 11, 2024, 2:47 AM

#

quantization can even happen on ridiculously small bit precisions like 8, 4, 3, 2, 1

#

at that point you don't even use float though

#

they just use ints instead

hearty depot Jul 11, 2024, 2:47 AM

#

small wedge quantization can even happen on ridiculously small bit precisions like 8, 4, 3, ...

ye, i think thats how they got llama running on pico iirc

serene grail Jul 11, 2024, 2:48 AM

#

hearty depot ye, i think thats how they got llama running on pico iirc

Wow, pretty cool

hearty depot Jul 11, 2024, 2:49 AM

#

serene grail Wow, pretty cool

ye it's very interesting there been a lot of cool system machine learning papers as of late

jaunty helm Jul 11, 2024, 3:49 AM

#

past meteor When you use it the warning tells you the downside 🙂

ah, right, didn't see them cause I suppress warnings 😅

jaunty helm Jul 11, 2024, 4:04 AM

#

serene grail What does quantizing even do? I've heard that it makes models smaller/possible t...

great answers above already
here's a pull request from 1 year ago (so maybe a bit outdated) adding K-quantization to llama.cpp
from the first graph you can see that mid-high quantization(quants that keep more bits, compressing the model less) actually doesn't hurt the quality too much(lower perplexity is better) but lowers the hardware reqs significantly(note that the x-axis is in log scale)
so from a running-LLMs-locally perspective at least, there's almost no reason not to run a quantized model

serene grail Jul 11, 2024, 4:05 AM

#

jaunty helm great answers above already [here](https://github.com/ggerganov/llama.cpp/pull/1...

Thank you!

hard nest Jul 11, 2024, 8:03 AM

#

In NN training, the validation and test dataset can have any batch size right? Like I can just use the biggest allowed by my pc to accelerate the learning

mild dirge Jul 11, 2024, 8:30 AM

#

hard nest In NN training, the validation and test dataset can have any batch size right? L...

I don't think that is relevant for the test/validation batch size @final kiln . If you calculate some measure like accuracy/f1-score etc. on the entire test/validation dataset, it doesn't matter what batch size you use, other than a larger batch size will likely be more efficient for your computer (time wise).

mild dirge Jul 11, 2024, 8:31 AM

#

hard nest In NN training, the validation and test dataset can have any batch size right? L...

And in case you calculate the loss on the test/validation set, make sure you calculate the average loss per item, such that if you run it with a different batch size, you can still compare it.

#

Does that make sense? @hard nest

hard nest Jul 11, 2024, 9:30 AM

#

mild dirge And in case you calculate the loss on the test/validation set, make sure you cal...

I mean, I calculate the loss mean, like add the loss every batch and then divide it by the number of batches

hard nest Jul 11, 2024, 9:31 AM

#

mild dirge I don't think that is relevant for the test/validation batch size <@935270247366...

That's what I was thinking, since it appeared that it won't affect anything except the time, big batches are the best

plucky island Jul 11, 2024, 10:31 AM

#

does anyone here use mmdetection?

severe hare Jul 11, 2024, 12:10 PM

#

plucky island does anyone here use mmdetection?

Looks like tutorials are sparse; though last release was January.
https://mmdetection.readthedocs.io/en/latest/

plucky island Jul 11, 2024, 12:36 PM

#

severe hare Looks like tutorials are sparse; though last release was January. https://mmdet...

I just wanted to know if anyone tried the demo, I'm getting trash fps on videos while using rtmdet tiny model. I thought it'd be as fast as yolo

severe hare Jul 11, 2024, 1:04 PM

#

plucky island I just wanted to know if anyone tried the demo, I'm getting trash fps on videos ...

I don’t think it is geared toward English-speaking users as much, and it feels like something that should have started with a HuggingFace ‘Space’.

ember pawn Jul 11, 2024, 2:05 PM

#

hello

#

i wanted to ask if someone can help me with a tad bit of issue that i am running into

#

how do you go about doing str.extract

#

in pyspark.pandas ?

#

it is saying that is not supported in the documentation

austere agate Jul 11, 2024, 2:31 PM

#

whole lotta shit that is out of my knowledge that I dont understand

#

But soon

vagrant root Jul 11, 2024, 3:55 PM

#

https://www.deep-ml.com/

#

has anyone tried this?

severe hare Jul 11, 2024, 4:00 PM

#

ember pawn it is saying that is not supported in the documentation

search the docs for 'substr' ; there are Regex and SQL fixes for slicing and extracting columns from a db

severe hare Jul 11, 2024, 4:01 PM

#

vagrant root https://www.deep-ml.com/

Why is the SVD and Laplace considered 'hard'? The kids in the math subReddits would eat that stuff for a snack

vagrant root Jul 11, 2024, 4:02 PM

#

It's not math tho

rich moth Jul 11, 2024, 4:20 PM

#

hearty depot Time to quanitize

Im using vector quantization durning the training. It consist of a codebook with learnable embeddings. In the forward pass the input features are mapped to the clostest embeddings in the codebook, quantizing the features. It actually allows me to preform this on my system by compressing it to lower dimensions. I imagine if they were in their orginal format this would take an insane amount of time.

haughty cradle Jul 11, 2024, 4:21 PM

#

do we still don't have a better way other than gradient descent for training AI? since from what I learned it seems to not even guarantee you to get the lowest valley, it just guaranteed you getting to the lowest point of the nearest valley

serene scaffold Jul 11, 2024, 4:24 PM

#

haughty cradle do we still don't have a better way other than gradient descent for training AI?...

SGD is one optimization algorithm. There are others.

haughty cradle Jul 11, 2024, 4:24 PM

#

I see... I guess i need to learn more first pithink

serene scaffold Jul 11, 2024, 4:25 PM

#

but in general, I don't think there even can be an optimization algorithm that is guaranteed to find the global minimum

haughty cradle Jul 11, 2024, 4:26 PM

#

I see... 😔

tidal bough Jul 11, 2024, 4:27 PM

#

just guaranteed you getting to the lowest point of the nearest valley
that's only true for normal gradient descent; the fancy ones are a bit better about it (or worse, if you're unlucky)

#

but generally speaking, yes. optimization is hard.

past meteor Jul 11, 2024, 4:31 PM

#

Are we talking of neural networks?

#

Because if the problem is convex you have strong guarantees with basic SGD

hearty depot Jul 11, 2024, 4:32 PM

#

yeah

hearty depot Jul 11, 2024, 4:33 PM

#

tidal bough but generally speaking, yes. optimization is hard.

numerical optimization is the only math where i ran out of symbols 😂
acc can get very messay

wild loom Jul 11, 2024, 4:46 PM

#

hey guys so I am currently working on a project surrounding training my own Faster RCNN model and it's running as we speak but it's tages ages and I have no refernce for when it's going to sotp traning it. Do you guys know any way that on google colab I am either able to monitor it's traning speed as it runs through or if there is a free / cheap way to speak up the speed of it

vagrant root Jul 11, 2024, 4:53 PM

#

haughty cradle do we still don't have a better way other than gradient descent for training AI?...

yea it doesnt guarantee to find the global minimum. So we implement a temperature to the algorithm. The temperature is hot at the beginning(more prone to explore even if it finds a minima) and it gets colder with each step the algorithm makes

vagrant root Jul 11, 2024, 4:55 PM

#

wild loom hey guys so I am currently working on a project surrounding training my own Fast...

is it taking a lot of time per epoch? If no you can verbose the epochs and reduce them accordingly

wild loom Jul 11, 2024, 4:55 PM

#

[07/11 16:25:03 d2.engine.train_loop]: Starting training from iteration 0
[07/11 16:35:38 d2.utils.events]:  eta: 2:26:54  iter: 19  total_loss: 1.874  loss_cls: 0.6037  loss_box_reg: 0.5858  loss_mask: 0.6831  loss_rpn_cls: 0.005265  loss_rpn_loc: 0.009363    time: 31.3418  last_time: 25.0513  data_time: 0.0441  last_data_time: 0.0071   lr: 1.6068e-05  
[07/11 16:45:58 d2.utils.events]:  eta: 2:13:44  iter: 39  total_loss: 1.661  loss_cls: 0.4723  loss_box_reg: 0.5756  loss_mask: 0.6377  loss_rpn_cls: 0.007119  loss_rpn_loc: 0.01208    time: 31.0541  last_time: 36.0582  data_time: 0.0089  last_data_time: 0.0081   lr: 3.2718e-05

#

these are what it's outputing currently and you can see the eta per 10 iterations

tidal bough Jul 11, 2024, 4:56 PM

#

hearty depot numerical optimization is the only math where i ran out of symbols 😂 acc can g...

That's when the tildes and hats come in 😛

vagrant root Jul 11, 2024, 4:56 PM

#

10 minutes for 20 epochs(30 s/epochs)

wild loom Jul 11, 2024, 4:56 PM

#

I am currently trying to train it on 300 iterations so that it can serve as a base point for where I continue from and what I'm doing wrong but if it's going to take 30 seconds per iteration I'm gonna have to leave it over night than because that sounds horrid

vagrant root Jul 11, 2024, 4:56 PM

#

how big is the model being trained

wild loom Jul 11, 2024, 4:56 PM

#

it's around 120 images in the training portion

#

this is my first real attempt I would say at working with training my own model as well

vagrant root Jul 11, 2024, 4:57 PM

#

how many layers are in your model?

wild loom Jul 11, 2024, 4:57 PM

#

does it also make sense every time it's returning me these eta's that it gets shorter and shoter?

#

import detectron2
from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg
from detectron2 import model_zoo

# Setup configuration
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("my_dataset_train",)
cfg.DATASETS.TEST = ("my_dataset_val",)
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")  # Use pre-trained weights
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 300    # Adjust the number of iterations as needed
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # Number of classes (excluding background)

# Use CPU for training
cfg.MODEL.DEVICE = "cpu"

# Create output directory
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

# Train the model
trainer = DefaultTrainer(cfg) 
trainer.resume_or_load(resume=False)
trainer.train()

vagrant root Jul 11, 2024, 4:58 PM

#

yea eta is estimated time of arrival(completion)

wild loom Jul 11, 2024, 4:58 PM

#

that's the configurations for the training

vagrant root Jul 11, 2024, 4:58 PM

#

so yeah the total time left is decrasing everytime

wild loom Jul 11, 2024, 4:59 PM

#

shit

#

thanks for you're help lmfao

vagrant root Jul 11, 2024, 4:59 PM

#

and you are using a cpu instead of cuda?

wild loom Jul 11, 2024, 4:59 PM

#

if you have any idea of how I could decrease this please let me know

wild loom Jul 11, 2024, 4:59 PM

#

vagrant root and you are using a cpu instead of cuda?

lmfao yes because I'm on a mac

#

idk if there was another way around it

vagrant root Jul 11, 2024, 5:00 PM

#

wild loom lmfao yes because I'm on a mac

you are on colab no?

#

colab has a gpu

#

runtime>>change runtime type>> T4 GPU

#

Use CPU for training

cfg.MODEL.DEVICE = "cuda"

hearty depot Jul 11, 2024, 5:00 PM

#

wild loom lmfao yes because I'm on a mac

Use metal if u can

wild loom Jul 11, 2024, 5:02 PM

#

Okay

vagrant root Jul 11, 2024, 5:02 PM

#

hearty depot Use metal if u can

what is that?

wild loom Jul 11, 2024, 5:02 PM

#

I will make sure to do so

vagrant root Jul 11, 2024, 5:03 PM

#

wild loom I will make sure to do so

make sure to do what i said? if yes, ping me if you have any issue 🙂

hearty depot Jul 11, 2024, 5:03 PM

#

vagrant root what is that?

It’s basically apples version of cuda, since m1 is both cpu/gpu u use metal to write kernels for it

vagrant root Jul 11, 2024, 5:04 PM

#

hearty depot It’s basically apples version of cuda, since m1 is both cpu/gpu u use metal to w...

so like you use device = "cuda" for nvidia?
does metal use nvidia architecture?

hearty depot Jul 11, 2024, 5:06 PM

#

vagrant root so like you use device = "cuda" for nvidia? does metal use nvidia architecture?

No it’s apples custom architecture

#

For metal it’s like mps iirc

#

I’d check documentation tho for the library u r using

vagrant root Jul 11, 2024, 5:06 PM

#

oh

hearty depot Jul 11, 2024, 5:07 PM

#

Yeah metal is p neat, apple made their own dl library recently for metal

https://github.com/ml-explore/mlx

GitHub

GitHub - ml-explore/mlx: MLX: An array framework for Apple silicon

MLX: An array framework for Apple silicon. Contribute to ml-explore/mlx development by creating an account on GitHub.

vagrant root Jul 11, 2024, 5:14 PM

#

hearty depot I’d check documentation tho for the library u r using

i guess ill check it out sometime

#

i thought there was no hardware accelation for my m1

#

lol

wild loom Jul 11, 2024, 5:51 PM

#

I have an M2 Mac so would that change anything

wild loom Jul 11, 2024, 5:51 PM

#

vagrant root make sure to do what i said? if yes, ping me if you have any issue 🙂

I’ll make sure to let u know later I’m taking a break for now

haughty cradle Jul 11, 2024, 5:52 PM

#

never thought NN would be this complex 🥹

where to learn how to make NN like this?

#

how to know it's ok to literally add a sin or cos formula to a bias

severe hare Jul 11, 2024, 5:53 PM

#

^ Transformers and NNs are 2 different things.

haughty cradle Jul 11, 2024, 5:53 PM

#

how do they even know self-attention can be made like that

ember pawn Jul 11, 2024, 5:53 PM

#

severe hare search the docs for 'substr' ; there are Regex and SQL fixes for slicing and ext...

i wanted to ask

#

if pyspark.pandas is worth using

#

or should i use pyspark directly

haughty cradle Jul 11, 2024, 5:53 PM

#

severe hare ^ Transformers and NNs are 2 different things.

it's different? what do people call transformer then?

#

is there something similar to it?

severe hare Jul 11, 2024, 5:54 PM

#

ember pawn or should i use pyspark directly

It's fine for large databases- you probably don't need to add pandas, at least not right away
https://www.linkedin.com/pulse/leveraging-pyspark-integrating-diverse-data-sources-guide-ramesh-0ljnc/

Leveraging PySpark for Integrating Diverse Data Sources: A Guide

In today’s data-driven landscape, the ability to seamlessly integrate data from various sources is a vital skill for data professionals. Apache Spark, particularly its Python API PySpark, offers robust capabilities for handling large-scale data processing in a distributed computing environment.

severe hare Jul 11, 2024, 5:55 PM

#

haughty cradle it's different? what do people call transformer then?

https://huggingface.co/learn/nlp-course/en/chapter1/3

Transformers, what can they do? - Hugging Face NLP Course

haughty cradle Jul 11, 2024, 5:56 PM

#

thx

ember pawn Jul 11, 2024, 5:56 PM

#

severe hare It's fine for large databases- you probably don't need to add pandas, at least n...

ok :3
thing is i know pandas and i have tasked to do the work i have done in pandas to be done in pyspark

#

and found out there is this pyspark.pandas

#

but like i do not know if it is i should use it or not

#

and i tried to use it

hearty depot Jul 11, 2024, 5:57 PM

#

severe hare ^ Transformers and NNs are 2 different things.

i mean i'd argue transformers r subsets of nn, i mean they have feed forward layers after all

ember pawn Jul 11, 2024, 5:57 PM

#

i only worse and worse errors

severe hare Jul 11, 2024, 5:57 PM

#

ember pawn but like i do not know if it is i should use it or not

Try just pandas then. Because PySpark is for Apache I think

haughty cradle Jul 11, 2024, 5:57 PM

#

is there maybe others NN model other than Transformer?

severe hare Jul 11, 2024, 5:58 PM

#

hearty depot i mean i'd argue transformers r subsets of nn, i mean they have feed forward lay...

This is right. They are a specific form of NLP. Not very different from a Neural Net, you're right.

hearty depot Jul 11, 2024, 5:58 PM

#

ember pawn ok :3 thing is i know pandas and i have tasked to do the work i have done in pan...

there r like apis to perform pandas

#

function on spark df

ember pawn Jul 11, 2024, 5:58 PM

#

hearty depot there r like apis to perform pandas

yes that only i was using

#

it is pandas api

#

for pyspark

wooden sail Jul 11, 2024, 5:59 PM

#

haughty cradle thx

transformers ARE neural networks, just a particular kind

severe hare Jul 11, 2024, 5:59 PM

#

haughty cradle is there maybe others NN model other than Transformer?

https://www.mygreatlearning.com/blog/types-of-neural-networks/

Great Learning Blog: Free Resources what Matters to shape your Career!

Types of Neural Networks and Definition of Neural Network

Definition & Types of Neural Networks: There are 7 types of Neural Networks, know the advantages and disadvantages of each thing on mygreatlearning.com

haughty cradle Jul 11, 2024, 6:00 PM

#

severe hare https://www.mygreatlearning.com/blog/types-of-neural-networks/

thx ❤️

severe hare Jul 11, 2024, 6:00 PM

#

haughty cradle is there maybe others NN model other than Transformer?

https://machine-learning-made-simple.medium.com/what-are-the-different-types-of-transformers-in-ai-5085275664e8

Medium

What are the Different Types of Transformers in AI

Understanding the biggest neural network in Deep Learning

ember pawn Jul 11, 2024, 6:00 PM

#

ohh i wanted to ask are there any courses that you would reccoment for deep leanring
i have done the coursera deep learning specialization

haughty cradle Jul 11, 2024, 6:00 PM

#

severe hare https://www.mygreatlearning.com/blog/types-of-neural-networks/

this aren't on same level as transformer tho

wooden sail Jul 11, 2024, 6:00 PM

#

as for the answer to "how do people come up with this" and "how to know when to use which activation function", it's "by studying a lot of math"

severe hare Jul 11, 2024, 6:01 PM

#

ember pawn ohh i wanted to ask are there any courses that you would reccoment for deep lean...

mm I think I would look to Amazon or your local bookstores

wooden sail Jul 11, 2024, 6:01 PM

#

you usually need a very good statistical or optimization or other math-based motivation to come up with a new architecture that works well and know why it works

haughty cradle Jul 11, 2024, 6:02 PM

#

damn... so I need to learn the math

#

I see...

#

pithink I guess I need to learn more

wooden sail Jul 11, 2024, 6:02 PM

#

if you want to make new stuff without stumbling around in the dark, yes

severe hare Jul 11, 2024, 6:02 PM

#

haughty cradle this aren't on same level as transformer tho

You are completely right that is a bad article- BERT and GPT are CNN architecture; not transformers (afair)

vagrant root Jul 11, 2024, 6:02 PM

#

haughty cradle how do they even know self-attention can be made like that

watch 3b1b video

hearty depot Jul 11, 2024, 6:02 PM

#

wooden sail you usually need a very good statistical or optimization or other math-based mot...

big bro can u teach me how to make the next KAN

wooden sail Jul 11, 2024, 6:02 PM

#

if you jus want to use stuff, you can just fish up the hottest stuff used atm

ember pawn Jul 11, 2024, 6:03 PM

#

severe hare mm I think I would look to Amazon or your local bookstores

for ??

haughty cradle Jul 11, 2024, 6:03 PM

#

vagrant root watch 3b1b video

oh! they have the vid, thx for telling

severe hare Jul 11, 2024, 6:04 PM

#

ember pawn for ??

texts on Deep Learning.

hearty depot Jul 11, 2024, 6:04 PM

#

haughty cradle damn... so I need to learn the math

if ur starting off learn linear, calc first then get understanding of statistics
from there u can get more niche math like optimization and analysis

ember pawn Jul 11, 2024, 6:05 PM

#

severe hare texts on Deep Learning.

like are there any other resources thing is i have adhd and i have problems with books i cannot seem to finish them

haughty cradle Jul 11, 2024, 6:05 PM

#

hearty depot if ur starting off learn linear, calc first then get understanding of statistics...

ok 👀

ember pawn Jul 11, 2024, 6:05 PM

#

for linear algebra if there is any i would love to buy i tend to my deep leaning on laptop where i can like code it out a lil

vagrant root Jul 11, 2024, 6:06 PM

#

haughty cradle oh! they have the vid, thx for telling

yeah its on deep learning a 6-7 part series

haughty cradle Jul 11, 2024, 6:06 PM

#

I feel like each connection between neural is just a linear algebra tbh

hearty depot Jul 11, 2024, 6:07 PM

#

haughty cradle I feel like each connection between neural is just a linear algebra tbh

it is, u compute the layers with whole bunch of dot products

ember pawn Jul 11, 2024, 6:07 PM

#

hearty depot if ur starting off learn linear, calc first then get understanding of statistics...

can you share some rescources for linear algebra and calc

vagrant root Jul 11, 2024, 6:07 PM

#

ember pawn can you share some rescources for linear algebra and calc

how much do you know?

hearty depot Jul 11, 2024, 6:07 PM

#

ember pawn can you share some rescources for linear algebra and calc

hm what's ur level of mathematical maturity? r u comfortables with proofs?

severe hare Jul 11, 2024, 6:07 PM

#

ember pawn like are there any other resources thing is i have adhd and i have problems with...

Well you got through the Coursera course- so maybe ask your instructor or advisor for continuing education resources. Deep Learning isn't really my specialty, I would just recommend keep searching the web and try to stay consistent with a learning schedule, and you will retain a lot.

ember pawn Jul 11, 2024, 6:08 PM

#

yes
i am college graduate level

small wedge Jul 11, 2024, 6:08 PM

#

ember pawn can you share some rescources for linear algebra and calc

http://arxiv.org/pdf/1802.01528

#

I love this paper

vagrant root Jul 11, 2024, 6:08 PM

#

ember pawn yes i am college graduate level

watch andrew ng deeplearning vids

severe hare Jul 11, 2024, 6:08 PM

#

small wedge http://arxiv.org/pdf/1802.01528

This is 'legit'

serene grail Jul 11, 2024, 6:09 PM

#

ember pawn for linear algebra if there is any i would love to buy i tend to my deep leaning...

There are calculus and linear algebra videos by 3blue1brown on YouTube, I can't say whether they are good or not since I'm just a beginner
But I like the way he explains stuff so far, a lot of visual explanations

ember pawn Jul 11, 2024, 6:09 PM

#

and i wanted to ask like HMMMM
how do you think a neural network reaches the optimal solution honestly whenever to try to sum it over in my head and explain i simply cannot because it is just dot product happening and lot of it is random

hearty depot Jul 11, 2024, 6:09 PM

#

ember pawn yes i am college graduate level

ok then i would suggest gilbert strang or axler linear alg
strang is a lot more computational whereas axler is more proof based and formal

as for calculus paul's online math notes is a good intro and if u want to do deep dive read spivak

haughty cradle Jul 11, 2024, 6:10 PM

#

damn... that is matrix... does this mean I need to learn FFT 😭
I never understand FFT (Fast Fourier Transform)

small wedge Jul 11, 2024, 6:10 PM

#

ember pawn and i wanted to ask like HMMMM how do you think a neural network reaches the opt...

you use derivatives to see whether changing a parameter will make the cost go up or down, then move a little in the downward direction

hearty depot Jul 11, 2024, 6:10 PM

#

haughty cradle damn... that is matrix... does this mean I need to learn FFT 😭 I never understa...

its useful to know imo

#

also ur prob gonna learn in college

#

at some pooint

small wedge Jul 11, 2024, 6:10 PM

#

small wedge you use derivatives to see whether changing a parameter will make the cost go up...

that's GD but obviously there are lots of other optimization algorithms

vagrant root Jul 11, 2024, 6:10 PM

#

ember pawn and i wanted to ask like HMMMM how do you think a neural network reaches the opt...

https://youtu.be/VMj-3S1tku0?si=yQn4FO-9U1_jsBB9
this is the video imo

YouTube

Andrej Karpathy

The spelled-out intro to neural networks and backpropagation: build...

This is the most step-by-step spelled-out explanation of backpropagation and training of neural networks. It only assumes basic knowledge of Python and a vague recollection of calculus from high school.

Links:

micrograd on github: https://github.com/karpathy/micrograd
jupyter notebooks I built in this video: https://github.com/karpathy/nn-z...

▶ Play video

hearty depot Jul 11, 2024, 6:11 PM

#

ember pawn and i wanted to ask like HMMMM how do you think a neural network reaches the opt...

i mean most neural networks have a convex

#

loss function so loss is guaranteed to be suffecient decrase in certain cases

haughty cradle Jul 11, 2024, 6:12 PM

#

you use Gradient descent for that no? or it just don't always work in real practice? pithink

vagrant root Jul 11, 2024, 6:13 PM

#

haughty cradle you use Gradient descent for that no? or it just don't always work in real pract...

yes gradient descent

ember pawn Jul 11, 2024, 6:13 PM

#

hmmmmmmm idk it just seems odd to me how it all works out
you can never an output being reached with a singlular unit of neuron what exaclty is the point of stacking them up
like if you were to spread a neural network with same amount of neurons with that of hidden layers will it work the same ?
if not then what fucntionality is the hidden layer adding ?

#

i know the question seems stupid but i cannot understand it

small wedge Jul 11, 2024, 6:14 PM

#

haughty cradle you use Gradient descent for that no? or it just don't always work in real pract...

plain gradient descent is the basis for a lot of optimization functions, but usually we don't use traditional gradient descent by itself and add lots of fancy stuff to it (mini batches, momentum, adaptive learning rates, scheduling, etc)

haughty cradle Jul 11, 2024, 6:14 PM

#

small wedge plain gradient descent is the basis for a lot of optimization functions, but usu...

I see... this stuff is very complex... 🙂

hearty depot Jul 11, 2024, 6:15 PM

#

haughty cradle you use Gradient descent for that no? or it just don't always work in real pract...

yeah like waterfall said there different ways top optimize models such as adam and adagrad

vagrant root Jul 11, 2024, 6:15 PM

#

ember pawn hmmmmmmm idk it just seems odd to me how it all works out you can never an outp...

we can its the Universal approximation theorem

hearty depot Jul 11, 2024, 6:15 PM

#

haughty cradle I see... this stuff is very complex... 🙂

some of the methods such as adagrad r p simple

small wedge Jul 11, 2024, 6:15 PM

#

you can get lost in the sauce with optimization algorithms (quasi-hyperbolic momentum 🥴 )

vagrant root Jul 11, 2024, 6:15 PM

#

vagrant root we can its the Universal approximation theorem

every unit will approximate the function to some extent

wild loom Jul 11, 2024, 6:16 PM

#

@hearty depot & @vagrant root if I hook up my google colab to a local runtime do you guys think it would run faster

vagrant root Jul 11, 2024, 6:16 PM

#

adding more units will make it more precise

haughty cradle Jul 11, 2024, 6:16 PM

#

small wedge you can get lost in the sauce with optimization algorithms (quasi-hyperbolic mom...

that sound hard 💀

ember pawn Jul 11, 2024, 6:16 PM

#

but we can have a non linear activation regardless of hidden layer or not

hearty depot Jul 11, 2024, 6:16 PM

#

small wedge you can get lost in the sauce with optimization algorithms (quasi-hyperbolic mom...

or dont use gradient descent and use lbfgs instead 😂

vagrant root Jul 11, 2024, 6:16 PM

#

wild loom <@451561544883634186> & <@807909714192760832> if I hook up my google colab to a...

no not really m1 isnt accelerated as the nvidia gpu on colab is

hearty depot Jul 11, 2024, 6:17 PM

#

^

wild loom Jul 11, 2024, 6:17 PM

#

okay thank you

hearty depot Jul 11, 2024, 6:17 PM

#

p sure colab has tpu

small wedge Jul 11, 2024, 6:17 PM

#

I'm lazy I just stick to my genetic algorithms so I don't have to do math peepoSit

vagrant root Jul 11, 2024, 6:17 PM

#

the nvidia gpu(cuda) is designed for ai matrix multiplication

ember pawn Jul 11, 2024, 6:18 PM

#

so if a neural network is a one dim array with non liner activation and the number of neuron matches the number of neuron units with the hidden layer the performace will be the same ?

hearty depot Jul 11, 2024, 6:18 PM

#

small wedge I'm lazy I just stick to my genetic algorithms so I don't have to do math <:peep...

tbf even stuff like pytorch abstract tf out of the math, like i met phd students that dont know how autodiff work cuz they just use pytorch w/o thinking

small wedge Jul 11, 2024, 6:18 PM

#

hearty depot tbf even stuff like pytorch abstract tf out of the math, like i met phd students...

true bc writing it all from scratch is fucking pain SWEATSTINY

vagrant root Jul 11, 2024, 6:20 PM

#

https://x.com/Hamptonism/status/1796111788292866468

@ember pawn

ₕₐₘₚₜₒₙ — e/acc (@Hamptonism) on X

The universal approximation theorem states that a neural network with one hidden layer can approximate continuous functions on compact sets with any desired precision.

#

yo @final kiln how is the job search going?

#

lessgooo

#

congrats

#

goodluck

hearty depot Jul 11, 2024, 6:21 PM

#

vagrant root https://x.com/Hamptonism/status/1796111788292866468 <@729600430710980618>

I like how everyone just accepts this as is even tho there haven’t been formal proof on it

#

The continuous case right?

haughty cradle Jul 11, 2024, 6:24 PM

#

isn't one hidden layer just mean it only pass through 2 linear function? that mean it's should behave like X^2 polynomial graph no?

hearty depot Jul 11, 2024, 6:25 PM

#

Lowk a blessing and a curse, some of this beliefs make for the worst papers
Mfers be writing papers on emergent properties of llms and then be using mcq for the metric 😭
No shit there is a sharp linear increase in accuracy, now try that a non linear metric

haughty cradle Jul 11, 2024, 6:27 PM

#

I see...

haughty cradle Jul 11, 2024, 6:27 PM

#

vagrant root https://x.com/Hamptonism/status/1796111788292866468 <@729600430710980618>

I just watch this, it's seems the main factor is because of ReLU?

vagrant root Jul 11, 2024, 6:28 PM

#

haughty cradle I just watch this, it's seems the main factor is because of ReLU?

yea

haughty cradle Jul 11, 2024, 6:32 PM

#

I found this meme on that X link, what does this mean?

#

KAN superior?

hearty depot Jul 11, 2024, 6:35 PM

#

haughty cradle KAN superior?

they r technically equal in expressive powers

serene grail Jul 11, 2024, 6:35 PM

#

I'm a noob but I've never even heard of that, is KAN new?

hearty depot Jul 11, 2024, 6:35 PM

#

like u make kan from mlp layers

hearty depot Jul 11, 2024, 6:35 PM

#

serene grail I'm a noob but I've never even heard of that, is KAN new?

yeah its a supposes

#

to be an alt to mlp

serene grail Jul 11, 2024, 6:37 PM

#

I don't know much math yet, sounds like something similar to a Fourier Transform?

vagrant root Jul 11, 2024, 6:37 PM

#

haughty cradle I found this meme on that X link, what does this mean?

kan adjusts the activation function along with the weights making it more verssatile

serene grail Jul 11, 2024, 6:38 PM

#

Interesting, thanks

haughty cradle Jul 11, 2024, 6:39 PM

#

I still can't understand Fourier and I have studying it for like 4 years 😭

#

I get the basic but once you get into the compression and accelerating stuff 💀

severe hare Jul 11, 2024, 6:47 PM

#

haughty cradle I still can't understand Fourier and I have studying it for like 4 years 😭

haughty cradle Jul 11, 2024, 6:51 PM

#

I understand that image, but putting it into practice is another things

severe hare Jul 11, 2024, 6:57 PM

#

haughty cradle I understand that image, but putting it into practice is another things

It's from here; you got this.
https://stemporium.blog/2023/04/13/what-is-the-fourier-transform-and-how-is-it-used-in-image-processing/

STEMporium

wongthegreat

What is the Fourier Transform and how is it used in Image processing?

Nearly everything in our life can be represented as a waveform. From the images displayed on our phone screens to sound waves coming from our headphones, they can all be represented as a waveform. …

haughty cradle Jul 11, 2024, 6:57 PM

#

pithink

iron basalt Jul 11, 2024, 7:02 PM

#

haughty cradle KAN superior?

It's a more recent hyped paper that in practice is just the same thing again but worse, several of these pop up over time in ML. You can also show it to be the same mathematically.

#

You might get some neat concepts from such papers, but don't let the hype get to you.

#

(Kolmogorov's work)

serene grail Jul 11, 2024, 7:05 PM

#

I mean, I like that people way smarter than me are investigating approaches that are different from the current ones

iron basalt Jul 11, 2024, 7:06 PM

#

serene grail I mean, I like that people way smarter than me are investigating approaches that...

Yeah, but it's a bit different when they go on Twitter and start spamming that it's the death of current deep learning without any practical evidence / demonstration.

serene grail Jul 11, 2024, 7:06 PM

#

iron basalt Yeah, but it's a bit different when they go on Twitter and start spamming that i...

Oh yeah for sure

iron basalt Jul 11, 2024, 7:07 PM

#

serene grail Oh yeah for sure

You can find a ton of these "I solved AI" types that don't really have anything to show ever.

#

But if there is a demonstration, that I can reasonably reproduce, I am very interested.

vagrant root Jul 11, 2024, 7:21 PM

#

haughty cradle I understand that image, but putting it into practice is another things

https://phet.colorado.edu/sims/html/fourier-making-waves/latest/fourier-making-waves_all.html

‪Fourier: Making Waves‬

#

try it for yourself

haughty cradle Jul 11, 2024, 7:25 PM

#

wow I don't know such things exist

vagrant root Jul 11, 2024, 7:25 PM

#

🙂

#

notice how all different waves are made on same frequencies with different altitudes

#

thats what fourier transform does it deconstructs a waveform into multiple frequencies waves

ornate bronze Jul 11, 2024, 7:38 PM

#

science

hearty depot Jul 11, 2024, 7:51 PM

#

haughty cradle I understand that image, but putting it into practice is another things

https://www.dspguide.com/pdfbook.htm

#

this book is nice imp

#

also a lot of good examples in code

river cape Jul 11, 2024, 8:32 PM

#

Why is ReLU mostly used in the hidden layers of a neural network than compared to Sigmoid?

hearty depot Jul 11, 2024, 8:41 PM

#

river cape Why is ReLU mostly used in the hidden layers of a neural network than compared t...

Well for one

#

One problem w sigmoids r vanishing gradients, relu r less prone to this

harsh sun Jul 11, 2024, 8:44 PM

#

Hello. I am having difficulty conceptualizing specific parts of neural networks. I have went over the math, and I understand how the math works in terms of calculating the values. Here is my question:

What is the significance of the prediction any one neuron makes (linear regression predictions and activation in this case RelU). So like, when that value is produced and then passed into another layer with the dot product of further weights, at the end of the day, how do all of those numbers come together to form a cohesive output.

#

Essentially what does each part mean to that final whole? Cause i dont get how each neuron relates to its output.

wooden sail Jul 11, 2024, 9:03 PM

#

harsh sun Essentially what does each part mean to that final whole? Cause i dont get how e...

nothing you can interpret as a person, really

harsh sun Jul 11, 2024, 9:04 PM

#

wooden sail nothing you can interpret as a person, really

But, I cant grasp that. Because the person who made it mustve realized, "this computation provided me a prediction... so if i do this, this, and this, it will give me a nonlinear more well-rounded prediction", no?

wooden sail Jul 11, 2024, 9:05 PM

#

nope

#

in fact, especially if you approach it from the perspective that NNs were inspired by the brain, which we also don't understand, the idea is that complex organized behavior "emerges" from simple interactions in "inexplicable" manners

#

the theorems involved in justifying neural networks are claims of existence of good approximators, but they are not constructive (they don't tell you exactly how to build such a network)

#

you can read into explainable AI if you like

harsh sun Jul 11, 2024, 9:10 PM

#

wooden sail you can read into explainable AI if you like

so what ur saying is, they did linear regression and got a prediction. and then used that prediction in tandem with tons of other predictions and were like hm so if we make this a big chain and we do tons of dot products with tons of weights then we can get better predictions?

#

then with that, how would RelU make it non-linear? just by adding holes in the data?

wooden sail Jul 11, 2024, 9:14 PM

#

harsh sun so what ur saying is, they did linear regression and got a prediction. and then ...

except in general the individual layers are not predictions at all, and will not work if you consider them alone, so no

wooden sail Jul 11, 2024, 9:14 PM

#

harsh sun then with that, how would RelU make it non-linear? just by adding holes in the d...

relu is nonlinear in the sense that it does not satisfy the definition of linearity 😛

#

sure, "adding holes in the data" in this case is a nonlinearity, but it isn't always

#

all nonlinearity means here is that you applied a function for which it is NOT true that T(aB + cD) = aT(B) + cT(D), where T is a transformation, a and c are scalars, and B and D are vectors

#

"punching holes in the data" can also be done by multiplying with binary matrices, but this is a linear transformation, so the idea has to be defined more carefully

harsh sun Jul 11, 2024, 9:17 PM

#

wooden sail except in general the individual layers are not predictions at all, and will not...

but, it uses the equation of a prediction tho, no? dot product of weights with corresponding features

hearty depot Jul 11, 2024, 9:17 PM

#

harsh sun Essentially what does each part mean to that final whole? Cause i dont get how e...

like how it impacts decision making process?

#

one of problems with a lot of nns r that they r flexible

wooden sail Jul 11, 2024, 9:18 PM

#

harsh sun but, it uses the equation of a prediction tho, no? dot product of weights with c...

why is a matrix multiplication the equation of prediction?

hearty depot Jul 11, 2024, 9:18 PM

#

but hard to interpret when compared to classic strategies like linear reg

wooden sail Jul 11, 2024, 9:18 PM

#

the composition of all of the layers of a network makes a prediction

harsh sun Jul 11, 2024, 9:18 PM

#

wooden sail relu is nonlinear in the sense that it does not satisfy the definition of linear...

true, but how does that apply to the data? like how does getting rid of negative numbers result in a better prediction?

wooden sail Jul 11, 2024, 9:19 PM

#

you're trying to interpret stuff that makes no sense

harsh sun Jul 11, 2024, 9:19 PM

#

wooden sail why is a matrix multiplication the equation of prediction?

i just associate it that way because thats the equation for linear regression which produces a prediction

#

i mean linear regression makes sense

#

you are producing a line of best fit

wooden sail Jul 11, 2024, 9:19 PM

#

you have already assigned (a weird) meaning to these things in your head, that's what's throwing you off

wooden sail Jul 11, 2024, 9:20 PM

#

harsh sun you are producing a line of best fit

this only makes sense if you had a line you were predicting. here, you don't. you're not optimizing to fit a particular line with each layer

hearty depot Jul 11, 2024, 9:20 PM

#

harsh sun true, but how does that apply to the data? like how does getting rid of negative...

im p sure zeroing the value has more to do w preserving the gradients iirc

wooden sail Jul 11, 2024, 9:20 PM

#

the layers are not doing linear regression

harsh sun Jul 11, 2024, 9:20 PM

#

wooden sail you have already assigned (a weird) meaning to these things in your head, that's...

hm

harsh sun Jul 11, 2024, 9:20 PM

#

wooden sail the layers are not doing linear regression

oh. so is there any classification to what they are doing?

wooden sail Jul 11, 2024, 9:20 PM

#

no

harsh sun Jul 11, 2024, 9:20 PM

#

hearty depot im p sure zeroing the value has more to do w preserving the gradients iirc

how so?

wooden sail Jul 11, 2024, 9:20 PM

#

if you find one you win a prize, cuz researchers haven't so far

#

a lot of deep learning is "lmao it worked, look"

harsh sun Jul 11, 2024, 9:21 PM

#

wooden sail no

thats so mind boggling. how would people know then that doing that math, and doing that math in layers, produces a prediction?

wooden sail Jul 11, 2024, 9:22 PM

#

because there are severa ltheorems saying that if you compose some number of nonlinear functions, you can get arbitrarily close to any other function

harsh sun Jul 11, 2024, 9:22 PM

#

i js cant conceptualize that

hearty depot Jul 11, 2024, 9:22 PM

#

harsh sun how so?

it just helps prevent the gradient from converging to zero

harsh sun Jul 11, 2024, 9:22 PM

#

wooden sail because there are severa ltheorems saying that if you compose some number of non...

😭

wooden sail Jul 11, 2024, 9:22 PM

#

it doesn't tell you which functions nor how many to compose

#

nor what each of them mean

hearty depot Jul 11, 2024, 9:22 PM

#

also iirc non-linearity is another reason why relus exist iirc

hearty depot Jul 11, 2024, 9:23 PM

#

harsh sun thats so mind boggling. how would people know then that doing that math, and doi...

read about the universal approximation theorem

wooden sail Jul 11, 2024, 9:23 PM

#

those theorems motivate you to try composing simple functions. the training procedure does not give you nicely interpretable layers, they do arbitrary shit

harsh sun Jul 11, 2024, 9:23 PM

#

hearty depot it just helps prevent the gradient from converging to zero

wait... so that means that RelU does have a meaning and isnt just meant to apply non-linearity

wooden sail Jul 11, 2024, 9:23 PM

#

harsh sun wait... so that means that RelU does have a meaning and isnt just meant to apply...

not a "meaning". it has particular properties when you compose it with itself several times and then differentiate through it

#

properties that are nice for some optimization strategies, but not others

harsh sun Jul 11, 2024, 9:23 PM

#

hm

hearty depot Jul 11, 2024, 9:24 PM

#

also calculation for relu r a lot easier in terms of flops compared to like sigmoid

harsh sun Jul 11, 2024, 9:24 PM

#

my professor showed us how neural nets worked for XOR, and he said that neural networks for XOR produce a line (when using non-linearity), but it produces a fat line

harsh sun Jul 11, 2024, 9:24 PM

#

hearty depot also calculation for relu r a lot easier in terms of flops compared to like sigm...

our professor didnt explain why to use sigmoid. just that it introduces non-linearity

#

i mean from what hes conveyed it seems rather simple to implement like a XOR neural network with no libraries

wooden sail Jul 11, 2024, 9:25 PM

#

for one, it has the nice property of producing outputs in the range 0 to 1, which you want in this case

serene grail Jul 11, 2024, 9:25 PM

#

What makes a line fat?

iron basalt Jul 11, 2024, 9:26 PM

#

harsh sun i js cant conceptualize that

Imagine you get some input and you compute a ton of random functions on that input. If you have enough of those one or more of them will probably compute the correct answer to the problem. There are neural networks that operate on this as a foundation itself. So you can see why big networks would generally work, even if your method is random init except for the last bit.

hearty depot Jul 11, 2024, 9:26 PM

#

harsh sun i mean from what hes conveyed it seems rather simple to implement like a XOR neu...

u dont even nn tbh for xor, u can solve it with nested perceptron or manually calculating the weights

iron basalt Jul 11, 2024, 9:26 PM

#

iron basalt Imagine you get some input and you compute a ton of random functions on that inp...

You need non-linearity to get all kinds of combinations.

wooden sail Jul 11, 2024, 9:26 PM

#

hearty depot u dont even nn tbh for xor, u can solve it with nested perceptron or manually ca...

and what's the difference between nn and nested perceptron

harsh sun Jul 11, 2024, 9:27 PM

#

serene grail What makes a line fat?

he showed us a graph of a line with a slope of id say approximately -x and he conveyed it as two lines that are parallel with different y intercepts, and the space between those two lines are corresponding to two XOR values and the space outside of those two lines correspond to two other XOR values

iron basalt Jul 11, 2024, 9:27 PM

#

Sigmoid was specifically chosen for several nice properties and also it kind of mimics real neuron activations which is why we still call them neural networks even though it's far removed from that at this point.

harsh sun Jul 11, 2024, 9:27 PM

#

iron basalt Imagine you get some input and you compute a ton of random functions on that inp...

damn. so i was trying to conceptualize something that isnt conceptualize-able

wooden sail Jul 11, 2024, 9:27 PM

#

not in the way you wanted, no

serene grail Jul 11, 2024, 9:28 PM

#

harsh sun he showed us a graph of a line with a slope of id say approximately -x and he co...

So the fat line is the space between two lines I guess?

harsh sun Jul 11, 2024, 9:28 PM

#

but, in the end, the neural network produced three values. those three values were probabilities of dog, cat, and then smth else

hearty depot Jul 11, 2024, 9:28 PM

#

wooden sail and what's the difference between nn and nested perceptron

ig in theory they technically r nn

harsh sun Jul 11, 2024, 9:28 PM

#

iron basalt Sigmoid was specifically chosen for several nice properties and also it kind of ...

i read abt the neurologist who developed them first. quite interesting.

#

i was dumbfounded to find that after reading 10-20 articles none of them address why

#

so ur assertions make sense

#

they only address the math to do so

#

which IMO is relatively simple considering I learned the calculus and linear algebra for it two days ago

#

so @wooden sail the dot product acting as a method of assigning similarity in terms of vector math isnt relevant at all to how the neural net produces these predictions?

iron basalt Jul 11, 2024, 9:30 PM

#

harsh sun damn. so i was trying to conceptualize something that isnt conceptualize-able

It's that explaining each part does not mean anything to a human, there is nothing to explain because explaining something requires you to simplify it and so it need a nice simplifcation.

wooden sail Jul 11, 2024, 9:30 PM

#

harsh sun so <@467435887236612106> the dot product acting as a method of assigning similar...

not in any way that is useful to you

harsh sun Jul 11, 2024, 9:30 PM

#

wooden sail not in any way that is useful to you

so, could u write a neural net without dot products?

wooden sail Jul 11, 2024, 9:31 PM

#

yes

harsh sun Jul 11, 2024, 9:31 PM

#

interesting

wooden sail Jul 11, 2024, 9:31 PM

#

you could use other functions instead. what even IS a neural network?

harsh sun Jul 11, 2024, 9:31 PM

#

thats rlly interesting actually

hearty depot Jul 11, 2024, 9:31 PM

#

harsh sun i read abt the neurologist who developed them first. quite interesting.

lol computational neuroscience is acc so insane, i read some neurodynamic papers and i was like wtf

wooden sail Jul 11, 2024, 9:31 PM

#

it's often a lot more useful to just think of it as function composition

harsh sun Jul 11, 2024, 9:31 PM

#

wooden sail you could use other functions instead. what even IS a neural network?

idk ive always read its js meant to mimic a brain

wooden sail Jul 11, 2024, 9:31 PM

#

harsh sun idk ive always read its js meant to mimic a brain

this doesn't help at all tbh

harsh sun Jul 11, 2024, 9:31 PM

#

ik

#

but it makes me feel happier inside

wooden sail Jul 11, 2024, 9:32 PM

#

not to mention it's not even true (anymore)

#

double whamie

harsh sun Jul 11, 2024, 9:32 PM

#

wooden sail double whamie

huh? didnt the creator mimic a human brain

#

thats how he developed the concept

#

he was a neurologist or smth

wooden sail Jul 11, 2024, 9:32 PM

#

harsh sun huh? didnt the creator mimic a human brain

yeah but we don't use them as they were originally envisioned, and the original analysis was also sorely lacking

iron basalt Jul 11, 2024, 9:32 PM

#

Like when you look at a puddle of water, and you splash it, and that causes some output. You can give some general properties and even explain how each individual particle works on its own, but if I gave you some random state in that dynamical process and asked you what it "means" for the output, there is nothing to really say other than it will cause the output to happen eventually / has a lot of random information.

hearty depot Jul 11, 2024, 9:32 PM

#

harsh sun idk ive always read its js meant to mimic a brain

i mean ig if u look at the earliest models, they imitated neuron in the sense that they made certain decisions based on whether a certain thershold value was outputted or not
kinda like how synapses fire

wooden sail Jul 11, 2024, 9:32 PM

#

carrying that forward will only hurt you

harsh sun Jul 11, 2024, 9:33 PM

#

wooden sail yeah but we don't use them as they were originally envisioned, and the original ...

oh. so we evolved the original idea into a more fleshed out jumble of... nonsense?

harsh sun Jul 11, 2024, 9:33 PM

#

iron basalt Like when you look at a puddle of water, and you splash it, and that causes some...

hm

harsh sun Jul 11, 2024, 9:33 PM

#

hearty depot i mean ig if u look at the earliest models, they imitated neuron in the sense th...

hm

#

i find it ironic that the tool thats meant to make sense out of things that dont always make sense itself doesnt make sense

#

😐

hearty depot Jul 11, 2024, 9:34 PM

#

harsh sun hm

if u want to something closer to brain model, look at ml papers on assembly calculus
that is prob what ur looking for

harsh sun Jul 11, 2024, 9:34 PM

#

hearty depot if u want to something closer to brain model, look at ml papers on assembly calc...

idk enough calc for that. i just learned what we needed to do linear algebra 😭

#

im js taking a class

wooden sail Jul 11, 2024, 9:35 PM

#

harsh sun 😐

AI was never a tool to make sense of things, it's a tool for replacing one problem with another

harsh sun Jul 11, 2024, 9:35 PM

#

wooden sail AI was never a tool to make sense of things, it's a tool for replacing one probl...

huh

#

wdym

#

i mean isnt it inherently meant to solve things that arent classicly solvable using standard decision trees?

wooden sail Jul 11, 2024, 9:36 PM

#

it addressed the problem of not knowing what a function is, and replaces it with "maybe if i have enough examples of inputs and outputs, i can get something similar to it"

wooden sail Jul 11, 2024, 9:36 PM

#

harsh sun i mean isnt it inherently meant to solve things that arent classicly solvable us...

except with 0 guarantees of any kind

hearty depot Jul 11, 2024, 9:37 PM

#

harsh sun idk enough calc for that. i just learned what we needed to do linear algebra 😭

well its not really calc heavy, its more focused on graph theory/prob

wooden sail Jul 11, 2024, 9:37 PM

#

(not technically true about the 0 guarantees, but the guarantees are usually not of the kind one would want. for some architectures, you can explicitly when under which conditions you'll fit the training data exactly and get overfitting)

harsh sun Jul 11, 2024, 9:37 PM

#

wooden sail except with 0 guarantees of any kind

ig but it clearly is right enough to have made a big difference on our world

#

i didnt realize how present ML is everywhere

#

now transformers are going crazy

wooden sail Jul 11, 2024, 9:38 PM

#

harsh sun ig but it clearly is right enough to have made a big difference on our world

yes, and also for inexplicable reasons, which is why it's not allowed in many safety-critical applications

hearty depot Jul 11, 2024, 9:38 PM

#

wooden sail (not technically true about the 0 guarantees, but the guarantees are usually not...

lol tbf, the guarantees in the sense of pac learning and rademacher complexity do get p messy

iron basalt Jul 11, 2024, 9:39 PM

#

harsh sun i find it ironic that the tool thats meant to make sense out of things that dont...

To understand more about what the problem is and why humans struggle with this, see Wolfram's work (https://www.wolframscience.com/nks/ ).

Stephen Wolfram: A New Kind of Science | Online—Table of Contents

The latest on exploring the computational universe, with free online access to Stephen Wolfram's classic 1,200-page breakthrough book.

#

(It's about complexity)

harsh sun Jul 11, 2024, 9:40 PM

#

wooden sail yes, and also for inexplicable reasons, which is why it's not allowed in many sa...

what are some examples?

harsh sun Jul 11, 2024, 9:41 PM

#

iron basalt To understand more about what the problem is and why humans struggle with this, ...

ty

#

so we dont even know how neural networks come up with patterns.

wooden sail Jul 11, 2024, 9:41 PM

#

100% autonomous vehicles are illegal in most places

harsh sun Jul 11, 2024, 9:41 PM

#

do we even know if they come up with patterns?

harsh sun Jul 11, 2024, 9:41 PM

#

wooden sail 100% autonomous vehicles are illegal in most places

true but they are also newer. much newer and need a lot of work

#

my cousin is building a new autopilot system with his company and he was telling me abt it

wooden sail Jul 11, 2024, 9:42 PM

#

i'm just trying to make sure you're not romanticizing AI as something it isn't

harsh sun Jul 11, 2024, 9:42 PM

#

it was rlly interesting the problems and also solutions

harsh sun Jul 11, 2024, 9:42 PM

#

wooden sail i'm just trying to make sure you're not romanticizing AI as something it isn't

lol trust me i do not romanticize it 😭

#

but im also more wrapping my head around this

hearty depot Jul 11, 2024, 9:43 PM

#

harsh sun do we even know if they come up with patterns?

i mean we been getting a bit better at it w explainable ai but often times when use nn u r gaining flexibility in lieu of interpretability

harsh sun Jul 11, 2024, 9:43 PM

#

so theres no need to deflect its usefulness. it obviously has downsides too

harsh sun Jul 11, 2024, 9:44 PM

#

hearty depot i mean we been getting a bit better at it w explainable ai but often times when ...

huh? so there is explainable?

#

(besides linear regression btw)

hearty depot Jul 11, 2024, 9:45 PM

#

harsh sun huh? so there is explainable?

yeah, there are certain techniques to see which features influence decision making process

#

examples being shap and saliency maps

iron basalt Jul 11, 2024, 9:45 PM

#

harsh sun so we dont even know how neural networks come up with patterns.

Well the problem is what "how" means here. I can give you a general overview, but what you are probably asking for is a step by step demonstration, which gets back to the thing where I could say what each node in the network is computing, but I can't assign a label to that that has meaning to a human at a high level.

harsh sun Jul 11, 2024, 9:45 PM

#

hearty depot examples being shap and saliency maps

😐

harsh sun Jul 11, 2024, 9:45 PM

#

iron basalt Well the problem is what "how" means here. I can give you a general overview, bu...

what is the general overview u would give? im wondering if its what i currently know

iron basalt Jul 11, 2024, 9:46 PM

#

The calculation happening is the best description I can give.

harsh sun Jul 11, 2024, 9:47 PM

#

iron basalt The calculation happening is the best description I can give.

ah well i know that already

#

i understand conceptually how they work how you have your features and with your data, you run the NN over and over again to get the weights and bias that minimize the error. then at the end u can do other mathematical techniques to get the answer into different formats

#

and i know the mathematical functions used

iron basalt Jul 11, 2024, 9:48 PM

#

harsh sun what is the general overview u would give? im wondering if its what i currently ...

After training (assuming it did well on that), we can say that it has found some way to consistently do the dataset correctly. So we can say that it kind of contains some of that in itself (lossily encoded). And maybe, it can answer correctly outside of that original data (hard to tell if it's doing well at this or not and if it just needs more data or not).

hearty depot Jul 11, 2024, 9:49 PM

#

harsh sun ah well i know that already

for stuff like cnn, u can visualize the convolutional layers by the overall values of the convolutions when doing forward step
like this kind hacky but it can help u know what ur netowrk think is improtant or not

harsh sun Jul 11, 2024, 9:49 PM

#

iron basalt After training (assuming it did well on that), we can say that it has found some...

well isnt it merely just the weights and bias it trains. then it uses those weights on new data and it shld predict close to what the actual thing shld be relative to the data being related to the training data

#

(like same category yk what i mean)

harsh sun Jul 11, 2024, 9:50 PM

#

hearty depot for stuff like cnn, u can visualize the convolutional layers by the overall valu...

ah. well im learning abt CNNs today so ill get back to u abt that

#

personally i find linear aggression the most intuitive

#

that was rlly intuitive to me and made a lot of sense

iron basalt Jul 11, 2024, 9:52 PM

#

If you have sparse neural networks, and especially not distributed, it becomes a lot easier to tell because unlike dense not everything is hooked up to everything / a giant soup. Convolutions (networks) have sparse weights (shared weights basically). This makes them better for visualization / understanding, but still not great due to dense activations.

#

Even with sparsity, while better, it's still not explainable when complex enough.

harsh sun Jul 11, 2024, 9:52 PM

#

iron basalt If you have sparse neural networks, and especially not distributed, it becomes a...

to tell what?

iron basalt Jul 11, 2024, 9:52 PM

#

harsh sun to tell what?

What is going on / what it learned.

hearty depot Jul 11, 2024, 9:53 PM

#

iron basalt Even with sparsity, while better, it's still not explainable when complex enough...

Sometimes u just be gambling for lottery tickets LOL

harsh sun Jul 11, 2024, 9:53 PM

#

iron basalt What is going on / what it learned.

wdym what is going on. in terms of what

harsh sun Jul 11, 2024, 9:53 PM

#

hearty depot Sometimes u just be gambling for lottery tickets LOL

lol ppl who do a lot of statistics must be in love with NNs 😭

iron basalt Jul 11, 2024, 9:53 PM

#

hearty depot Sometimes u just be gambling for lottery tickets LOL

Are you referring to the lottery ticket hypothesis?

#

(It's probably right and seems to hold so far from people testing various pruned networks)

iron basalt Jul 11, 2024, 9:54 PM

#

harsh sun wdym what is going on. in terms of what

In a CNN you may actually see it picking up on detecting eyes in an image for example.

late lichen Jul 11, 2024, 10:01 PM

#

I want to write a simple AI and I want to write the logic that runs it, my biggest question is how I will do gradient optimization?

harsh sun Jul 11, 2024, 10:01 PM

#

iron basalt In a CNN you may actually see it picking up on detecting eyes in an image for ex...

sure, but we see the result. not how its working o the inside

late lichen Jul 11, 2024, 10:01 PM

#

I previously made DAGs neuron network but now I want to try MLP

harsh sun Jul 11, 2024, 10:01 PM

#

late lichen I want to write a simple AI and I want to write the logic that runs it, my bigge...

gradient descent?

late lichen Jul 11, 2024, 10:02 PM

#

Yeah

harsh sun Jul 11, 2024, 10:02 PM

#

late lichen Yeah

do you know calculus?

late lichen Jul 11, 2024, 10:02 PM

#

Yes

#

I have watched 3brown1blue vid

harsh sun Jul 11, 2024, 10:03 PM

#

okay. so do u know the sum of squared residuals that will produce the curve that gradient descent will attempt to optimize?

late lichen Jul 11, 2024, 10:03 PM

#

But it for some reason I still don't understand it maybe because he didn't explain how to work with biases to

harsh sun Jul 11, 2024, 10:04 PM

#

sum of squared residuals is what im trying to reference

#

do you know what that is

late lichen Jul 11, 2024, 10:04 PM

#

Yes

harsh sun Jul 11, 2024, 10:04 PM

#

late lichen But it for some reason I still don't understand it maybe because he didn't expla...

well biases are like a y intercept

iron basalt Jul 11, 2024, 10:04 PM

#

harsh sun sure, but we see the result. not how its working o the inside

What I mean is that if you have a fully connected layer like in an MLP, you will find it hard to point at same part of it and go "see? eyes!"

harsh sun Jul 11, 2024, 10:04 PM

#

its just another value that can help make the prediction more correct

#

so instead of training just weights with features, you can have an additional value that is determined and factored into the overall calculation

#

they arent always necessary

iron basalt Jul 11, 2024, 10:05 PM

#

But that is a human problem, because it's about what means something to a human.

harsh sun Jul 11, 2024, 10:05 PM

#

iron basalt What I mean is that if you have a fully connected layer like in an MLP, you will...

hm

#

interesting

harsh sun Jul 11, 2024, 10:06 PM

#

harsh sun they arent always necessary

does this make sense @late lichen

late lichen Jul 11, 2024, 10:06 PM

#

I got the idea of back propagation but I feel like it only update the weights and never with biases

harsh sun Jul 11, 2024, 10:06 PM

#

gradient descent comes in once u understand sum of squared residuals

harsh sun Jul 11, 2024, 10:07 PM

#

late lichen I got the idea of back propagation but I feel like it only update the weights an...

you perform the gradient calculation for biases as well

#

which updates it as the model is trained

#

i can find the formula for u if u need tho

#

like the math associated with updating it

late lichen Jul 11, 2024, 10:09 PM

#

Let's say we got the perfect weights but with that isn't will already work? I mean if we ramp up the bias ,the descendant node will recognize that node to be active isn't?

hearty depot Jul 11, 2024, 10:09 PM

#

iron basalt Are you referring to the lottery ticket hypothesis?

Yeah, lora and pruning black magic

harsh sun Jul 11, 2024, 10:09 PM

#

late lichen Let's say we got the perfect weights but with that isn't will already work? I me...

it can still work without bias. they arent necessary

iron basalt Jul 11, 2024, 10:10 PM

#

hearty depot Yeah, lora and pruning black magic

Yeah, it's nice. Although you can also start sparse too, but since deep learning tools already exist it can often be easier to use that and then prune after.

#

Unless you have to start sparse or it won't run fast enough.

harsh sun Jul 11, 2024, 10:11 PM

#

@iron basalt btw, I read here that this is a thing, but ive never heard of this in my studies so far

Each node connects to others, and has its own associated weight and threshold. If the output of any individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network.
is this js another form of an activation function?

#

like relu?

#

looks to be that way

iron basalt Jul 11, 2024, 10:12 PM

#

harsh sun <@119925597395877889> btw, I read here that this is a thing, but ive never heard...

That's the original MLP.

harsh sun Jul 11, 2024, 10:12 PM

#

iron basalt That's the original MLP.

is it talking abt an activation function tho?

iron basalt Jul 11, 2024, 10:12 PM

#

harsh sun is it talking abt an activation function tho?

Yes.

harsh sun Jul 11, 2024, 10:12 PM

#

ok ty

late lichen Jul 11, 2024, 10:13 PM

#

harsh sun it can still work without bias. they arent necessary

But I was also thinking that when the node is completely inactive it will be 0 no matter what

#

Wait I think I get it, so we perform back prop on weights and after that we do it on biases?

iron basalt Jul 11, 2024, 10:14 PM

#

iron basalt That's the original MLP.

Was connected to a camera with a 400 pixel image to recognize images.

#

First perceptron implementation.

late lichen Jul 11, 2024, 10:15 PM

#

I was thinking that even we have perfect weights we always lack something if we don't tell how much the node want to be activated

iron basalt Jul 11, 2024, 10:15 PM

#

Then they added more layers, although as you can see the physical implementation is not really a great idea.

harsh sun Jul 11, 2024, 10:16 PM

#

iron basalt Yes.

And the sole reason is to introduce non linearity?

harsh sun Jul 11, 2024, 10:17 PM

#

iron basalt Was connected to a camera with a 400 pixel image to recognize images.

Hm

iron basalt Jul 11, 2024, 10:17 PM

#

harsh sun And the sole reason is to introduce non linearity?

For the perceptron there were no layers, so it was just at the end to decide 1, -1 (class).

harsh sun Jul 11, 2024, 10:18 PM

#

iron basalt For the perceptron there were no layers, so it was just at the end to decide 1, ...

Oh. But for like modern Anna’s

#

NNs

iron basalt Jul 11, 2024, 10:18 PM

#

harsh sun Oh. But for like modern Anna’s

Yes, if you don't add non-linearity you can collapse the whole thing.

harsh sun Jul 11, 2024, 10:18 PM

#

iron basalt Yes, if you don't add non-linearity you can collapse the whole thing.

Yeah like with XOR.

late lichen Jul 11, 2024, 10:19 PM

#

late lichen Wait I think I get it, so we perform back prop on weights and after that we do i...

?@harsh sun

iron basalt Jul 11, 2024, 10:19 PM

#

late lichen ?<@520741459478052886>

Biases are more weights, but input always set to 1.

late lichen Jul 11, 2024, 10:21 PM

#

But isn't if we ramp the bias it will increase the final val more than just weights?

hearty depot Jul 11, 2024, 10:23 PM

#

late lichen Wait I think I get it, so we perform back prop on weights and after that we do i...

wdym backprop on the biases?

#

they r performed at the same time via chain rule

late lichen Jul 11, 2024, 10:29 PM

#

If we need to increase the gradient of a single node how we will know that we need to increase the weights instead of bias?

late lichen Jul 11, 2024, 10:30 PM

#

hearty depot they r performed at the same time via chain rule

How it works?

hearty depot Jul 11, 2024, 10:30 PM

#

late lichen How it works?

we have a convex loss function

#

we calculate gradient in respect to that and update weights and bias layerwise using an algo called autodiff

#

basically top down approach

late lichen Jul 11, 2024, 10:31 PM

#

I don't know much on "convex loss function"

late lichen Jul 11, 2024, 10:32 PM

#

hearty depot basically top down approach

Too dark i can't se...

hearty depot Jul 11, 2024, 10:34 PM

#

late lichen I don't know much on "convex loss function"

this is a basic definition and not very precise but convex loss means u can draw a line between any two points and every points in between the two points would be below the line

late lichen Jul 11, 2024, 10:47 PM

#

Cool

#

Let's say we have 2 nodes hidden and output nodes

What the formula looks like on the bias and weights to update it's value?

#

@hearty depot

narrow tiger Jul 11, 2024, 11:38 PM

#

LangChainDeprecationWarning: The class `LLMChain` was deprecated in LangChain 0.1.17 and will be removed in 1.0. Use RunnableSequence, e.g., `prompt | llm` instead.```
how do i resolve these

severe hare Jul 11, 2024, 11:50 PM

#

prompt | llm is going to be your input prompt as a class instance.

late lichen Jul 11, 2024, 11:53 PM

#

Thats supposed to be English enough....

severe hare Jul 11, 2024, 11:54 PM

#

Eh that doesn't show up in the docs very much though. They have OpaquePrompts but it's supposed to work with API calls more I think

severe hare Jul 11, 2024, 11:54 PM

#

narrow tiger ``` LangChainDeprecationWarning: The class `LLMChain` was deprecated in LangChai...

Nevermind; I don't know

harsh sun Jul 12, 2024, 12:28 AM

#

btw, why do neural networks not solve non-linear things without a non-linear activation. doesnt it discover patterns based off of the values it calculates?

small wedge Jul 12, 2024, 12:34 AM

#

harsh sun btw, why do neural networks not solve non-linear things without a non-linear act...

because you can't fit a linear function to a nonlinear function

#

neural networks without nonlinear activations are just linear function estimators

late lichen Jul 12, 2024, 12:35 AM

#

harsh sun btw, why do neural networks not solve non-linear things without a non-linear act...

Maybe add more layer

#

¯_(ツ)_/¯

small wedge Jul 12, 2024, 12:36 AM

#

adding more layers does nothing if you don't add nonlinearity, it's just making a bigger linear function

late lichen Jul 12, 2024, 12:36 AM

#

Hmmm

#

Kinda make sence

#

gelu is good function for hidden layers instead of relu?

late lichen Jul 12, 2024, 12:38 AM

#

small wedge neural networks without nonlinear activations are just linear function estimator...

Make more sence here

small wedge Jul 12, 2024, 12:49 AM

#

late lichen gelu is good function for hidden layers instead of relu?

gelu is an activation function made to introduce a form of deterministic regularization into the activation function instead of as a separate, stochastic process (like dropout for example). It preformed pretty well compared to ReLU on the benchmarks in the original paper https://arxiv.org/pdf/1606.08415v5

harsh sun Jul 12, 2024, 12:52 AM

#

small wedge because you can't fit a linear function to a nonlinear function

neural networks dont result in a line tho. i dont even get how neural networks work, and ppl said no one does. but if its all about these random things coming together to create an output, how does non-linearity achieve that

#

im having trouble conceptualizing NNs

small wedge Jul 12, 2024, 12:56 AM

#

harsh sun neural networks dont result in a line tho. i dont even get how neural networks w...

Neural networks don't result in a line because we add nonlinearity to them. https://www.youtube.com/watch?v=s-V7gKrsels this might be a good watch for you, this topic is often described using the "boundary problem".

serene scaffold Jul 12, 2024, 12:56 AM

#

harsh sun neural networks dont result in a line tho. i dont even get how neural networks w...

when people say "no one knows how neural networks work", they don't really mean what it sounds like that means at face value.

small wedge Jul 12, 2024, 12:56 AM

#

^^

#

people understand neural networks enough to build them, what we don't understand is the actual data stored inside of massive models after training; the values of the weights and biases and what they relate to are viewed as a black box. Say you are given a single parameter from GPT4's weights and you can change it up or down, we don't have a way to know exactly how that change will effect the output without testing that output. If we did, we could do "AI brain surgery" and fine tune models by hand instead of having to train them further through things like RLHF to align them.

worldly dawn Jul 12, 2024, 1:05 AM

#

harsh sun im having trouble conceptualizing NNs

It might be worth picking a resource on neural networks too.
For instance https://www.deeplearningbook.org/contents/mlp.html is free and goes over the non-linearity part, but there are other resources which might be more to your liking as well

harsh sun Jul 12, 2024, 1:10 AM

#

serene scaffold when people say "no one knows how neural networks work", they don't really mean ...

i mean thats what 3 ppl were telling me earlier today (i wont say who)

harsh sun Jul 12, 2024, 1:10 AM

#

worldly dawn It might be worth picking a resource on neural networks too. For instance <https...

im in a class with a professor who didnt describe how they work besides from the mathematical functions

#

and ive read 10-20 articles and it never explained how each node is significant to the total output

harsh sun Jul 12, 2024, 1:11 AM

#

small wedge people understand neural networks enough to build them, what we don't understand...

yeah no ive known that.

#

ill send the thread

#

#data-science-and-ml message

serene scaffold Jul 12, 2024, 1:13 AM

#

harsh sun i mean thats what 3 ppl were telling me earlier today (i wont say who)

When they said "no one knows how neural networks work", that statement can be true or false depending on what is meant by "work".

harsh sun Jul 12, 2024, 1:14 AM

#

serene scaffold When they said "no one knows how neural networks work", that statement can be tr...

my thread conveys the sort of things i was wondering

#

theres a lot so its hard to rephrase it now

harsh sun Jul 12, 2024, 1:18 AM

#

small wedge people understand neural networks enough to build them, what we don't understand...

i just cant conceptualize that neural networks are linear inherently if u didnt have the activation

worldly dawn Jul 12, 2024, 1:18 AM

#

harsh sun i just cant conceptualize that neural networks are linear inherently if u didnt ...

you could try to take a simple network and write down the equation

small wedge Jul 12, 2024, 1:19 AM

#

okay so why don't we take a simple model and lay it out as a single function that will show you what's happening

small wedge Jul 12, 2024, 1:19 AM

#

worldly dawn you could try to take a simple network and write down the equation

exactly this

harsh sun Jul 12, 2024, 1:19 AM

#

worldly dawn you could try to take a simple network and write down the equation

i know the equation is linear

#

i know the math for the neural networks

#

but like, the patterns that the neural net discovers as it trains. i dont see how it results in a line

worldly dawn Jul 12, 2024, 1:22 AM

#

to be honest, I am not clear on the hold up for you

small wedge Jul 12, 2024, 1:23 AM

#

it has nothing to do with the patterns it learns, the neural network without nonlinearity is only able to output a line. it's job is just to find the best line that fits.

#

if I give you a linear function like x * w + b how could you ever represent anything other than a line by changing w and b? (these are scalars in the example)

harsh sun Jul 12, 2024, 1:24 AM

#

small wedge if I give you a linear function like `x * w + b` how could you ever represent an...

arent there a bunch of lines though? like a line per node?

#

because the equation is linear per node

iron basalt Jul 12, 2024, 1:26 AM

#

harsh sun arent there a bunch of lines though? like a line per node?

https://www.desmos.com/calculator/xm6x1obhry

Desmos

Desmos | Graphing Calculator

#

s(x) is the activation function, change it to just x and see what happens.

harsh sun Jul 12, 2024, 1:27 AM

#

iron basalt https://www.desmos.com/calculator/xm6x1obhry

holy shit

harsh sun Jul 12, 2024, 1:28 AM

#

iron basalt https://www.desmos.com/calculator/xm6x1obhry

so the non-linearity allows the weights to create a form like this

#

otherwise no matter what its a straight line

iron basalt Jul 12, 2024, 1:28 AM

#

harsh sun so the non-linearity allows the weights to create a form like this

The decision boundary curves.

harsh sun Jul 12, 2024, 1:29 AM

#

iron basalt The decision boundary curves.

huh?

iron basalt Jul 12, 2024, 1:29 AM

#

This is XOR, the boundary between the two classes is that curve.

#

Depending on which side the point lies, it spits out that class.

harsh sun Jul 12, 2024, 1:30 AM

#

iron basalt Depending on which side the point lies, it spits out that class.

i see. so as the model trains, this whole blob can take many different forms right?

#

depending on the task

iron basalt Jul 12, 2024, 1:30 AM

#

harsh sun i see. so as the model trains, this whole blob can take many different forms rig...

You can change the weights, move the sliders, see what happens.

#

Try making up your own points in the table, can you manually find the solution (weights)?

small wedge Jul 12, 2024, 1:31 AM

#

harsh sun because the equation is linear per node

say we add another layer to my example l1 = x * w1 + b1 l2 = l1 * w2 + b2 we can break this down to (x * w1 + b1) * w2 + b2, no matter how many multiplication or additions you add to this function the output will always be linear

harsh sun Jul 12, 2024, 1:31 AM

#

iron basalt You can change the weights, move the sliders, see what happens.

why has no one else recommended this tool?

#

this is like ground breaking

serene scaffold Jul 12, 2024, 1:32 AM

#

A lot of people know about desmos.

harsh sun Jul 12, 2024, 1:33 AM

#

so @iron basalt by making is non-linear, your output would inherently be shaped weirdly. training it just manipulates the "thing" so that it fits the solutions?

harsh sun Jul 12, 2024, 1:33 AM

#

serene scaffold A lot of people know about desmos.

but using it to display the result of an nn

iron basalt Jul 12, 2024, 1:37 AM

#

harsh sun so <@119925597395877889> by making is non-linear, your output would inherently b...

Not sure what the thing is, but it does make it not a straight line.

#

Try ReLU btw.

#

#

You can visually see the line-ness of ReLU.

harsh sun Jul 12, 2024, 1:44 AM

#

iron basalt You can visually see the line-ness of ReLU.

so weights are essentially molding this into a solution

iron basalt Jul 12, 2024, 1:45 AM

#

harsh sun so weights are essentially molding this into a solution

Yeah, that is what parameters do, they morph stuff to do whatever.

harsh sun Jul 12, 2024, 1:46 AM

#

iron basalt Yeah, that is what parameters do, they morph stuff to do whatever.

how does each specific node contribute to this final thing?

iron basalt Jul 12, 2024, 1:46 AM

#

harsh sun how does each specific node contribute to this final thing?

It's in the desmos page.

harsh sun Jul 12, 2024, 1:46 AM

#

iron basalt It's in the desmos page.

wdym

harsh sun Jul 12, 2024, 1:46 AM

#

iron basalt It's in the desmos page.

like i know mathematically how each node contributes. the dot product of the nodes of that layer with the weight of the node that its being applied to

iron basalt Jul 12, 2024, 1:47 AM

#

harsh sun wdym

Yeah, that is the "how."

#

It's the literal code.

harsh sun Jul 12, 2024, 1:48 AM

#

iron basalt Yeah, that is the "how."

yeah but like how does it contribute to the solution. like how does doing the dot product between the current layer and the weight of a specific node, how does that value come to impact this in the end.

iron basalt Jul 12, 2024, 1:48 AM

#

harsh sun yeah but like how does it contribute to the solution. like how does doing the do...

That is also in the code.

harsh sun Jul 12, 2024, 1:49 AM

#

iron basalt That is also in the code.

... i know mathematically.....

#

conceptually is what im asking

iron basalt Jul 12, 2024, 1:49 AM

#

harsh sun ... i know mathematically.....

I'm not sure how to answer that / what you are asking for. The math shown there is conceptually that. Math is the best way to describe it.

#

Are you asking why, not how?

harsh sun Jul 12, 2024, 1:53 AM

#

iron basalt I'm not sure how to answer that / what you are asking for. The math shown there ...

actually i think i know why. lmk if this is true. the value you get from the dot product is weighted. so when it trains you can shape the data. then plugging that in again to another node above means that the value passed into that node is a reflection of many previous weights. therefore, by changing the weight, you are having an even greater impact because just one node later in the nn uses many weights.

#

does that sound right?

iron basalt Jul 12, 2024, 1:56 AM

#

harsh sun actually i think i know why. lmk if this is true. the value you get from the dot...

It's just feeding one into another. Make a new blank desmos and play around with random functions and their combinations. Get a feel for how to make up functions / shape and such.

#

Try some stuff like f(x)=x, f(x)=cos(x), f(x)=cos(2x), f(x)=x+cos(2x). Try adding parameter sliders, like f(x)=ax+bcos(cx). Then try making another function, g(x), and try making it make use of f(x).

#

Then know that a NN is just a bunch of these combinations of various functions to solve the problem. But instead of having cos(x) in there (although some NNs do), we have just something simple, like ReLU(dot product and bias), and by composing a ton of them we can also get what we want.

serene grail Jul 12, 2024, 2:07 AM

#

So many functions combine into one function that is supposed to approximate the data?

ionic valley Jul 12, 2024, 2:15 AM

#

https://discord.com/channels/267624335836053506/1261143782179606619

Any help?

snow garden Jul 12, 2024, 3:19 AM

#

hey yall, i'm new here and i was just wondering how long yall have been programming/working with AI and what its like in your opinion 🙂

unkempt apex Jul 12, 2024, 3:21 AM

#

yeah so start with you!

rich moth Jul 12, 2024, 3:25 AM

#

snow garden hey yall, i'm new here and i was just wondering how long yall have been programm...

It's a rapidly progressing technology that seems to be making strides in a small amount of time, considering its capacity and its forward progress. What's not to like? I've read in headlones AI is a boom like the dot.com, but i think those people are fools that dont understand enough.

harsh sun Jul 12, 2024, 3:53 AM

#

iron basalt It's just feeding one into another. Make a new blank desmos and play around with...

Ok ty

harsh sun Jul 12, 2024, 3:53 AM

#

snow garden hey yall, i'm new here and i was just wondering how long yall have been programm...

Started like two days ago.

#

Depending on which AI method u use is differing levels of complexity

#

I’d say linear regression is pretty easy

#

Neural networks, now that I get them, are a bit more complex. CNNs are weird I’m still working em out.

ionic valley Jul 12, 2024, 4:48 AM

#

is there any point in accounting for multicollinearity when using methods like ridge and lasso?

wooden sail Jul 12, 2024, 6:09 AM

#

ionic valley https://discord.com/channels/267624335836053506/1261143782179606619 Any help?

Does lasso avoid multicollinearity?
it CAN, but not always. it depends on whether the matrix involved in the computation has a large enough "kruskal rank"
if there are two highly related terms, wouldn't lasso just pick one, and shrink the other to zero, essentially rendering techniques like vif useless?
related to the previous point, it favors terms that result in a lower L1 norm when deciding which terms to ignore. it can only do this in a useful way if the kruskal rank is high enough, and the result you get is only useful if what you want is a sparse solution. VIF is a statistical criterion based on covariance. if you interpret LASSO as the maximum a posteriori estimator, what it does is assume the solution vector follows a Laplace distribution, but says nothing about the covariance directly. they do different things, you have to decide which approach works for your problem.
Lasso is strong for a large number of predictors, but what's the limit?
idk what you mean with this. usually what matters is the ratio of nonzero predictors vs the total, which is also related to the kruskal rank. there's a thing called "phase transition maps" which plot out at what level of sparsity L1 regularization methods break down.
What stops me from simply "feature engineering" tons of useless variables on the basis that one of them might be good for the model? After all, anything deemed useless will shrink to zero given a good lambda value.
the more correlated the features are with each other, the worse L1 regularization works, so adding extra features is pretty much always bad no matter how you look at it

deep sleet Jul 12, 2024, 6:50 AM

#

Does google colab offer GPUs for free access?

wooden sail Jul 12, 2024, 6:51 AM

#

yes but they're shared and sometimes you don't get immediate access

#

it can tell you to wait if you've been using too much compute time

deep sleet Jul 12, 2024, 6:54 AM

#

but how do I know if I have access to one or not

#

I am using device = torch.device("cuda" if torch.cuda.is_available() else "cpu") and it says CPU

wooden sail Jul 12, 2024, 6:54 AM

#

the code won't run 😛

#

you have to change the runtime type to be able to see the gpu

deep sleet Jul 12, 2024, 6:54 AM

#

oh

#

T4?

wooden sail Jul 12, 2024, 6:55 AM

#

sure

deep sleet Jul 12, 2024, 6:56 AM

#

wooden sail sure

I mean what is the difference between TPU v2 and T4 GPU?

wooden sail Jul 12, 2024, 6:56 AM

#

tpu is not a gpu

#

i've never played with tpu's myself so i'm not familiar with the specs and i can't really comment on what is better here

#

try them out and see what happens

deep sleet Jul 12, 2024, 6:57 AM

#

oh

#

I think I will google what they are first

vagrant root Jul 12, 2024, 6:57 AM

#

deep sleet I mean what is the difference between TPU v2 and T4 GPU?

tpu is google's gpu specific for ai mat muls

#

gpu is specific for matmuls

deep sleet Jul 12, 2024, 6:58 AM

#

oh

vagrant root Jul 12, 2024, 6:58 AM

#

ye

deep sleet Jul 12, 2024, 6:58 AM

#

Thx

vagrant root Jul 12, 2024, 6:59 AM

#

go for t4 gpus

#

tpus are deprecated

deep sleet Jul 12, 2024, 6:59 AM

#

Noted but what does deprecated mean?

vagrant root Jul 12, 2024, 6:59 AM

#

not up to the technology

#

left to die projects basically

wooden sail Jul 12, 2024, 7:00 AM

#

not quite right, they're way more power efficient

vagrant root Jul 12, 2024, 7:01 AM

#

wooden sail not quite right, they're way more power efficient

i heard google were pulling off tpus?

wooden sail Jul 12, 2024, 7:01 AM

#

i would also imagine they work great for anything using XLA (which pytorch does not use by default)

wooden sail Jul 12, 2024, 7:02 AM

#

vagrant root i heard google were pulling off tpus?

a new one was announced like 3 months ago

vagrant root Jul 12, 2024, 7:02 AM

#

moving towards tensor chips?

vagrant root Jul 12, 2024, 7:02 AM

#

wooden sail a new one was announced like 3 months ago

oh i will look into that

wooden sail Jul 12, 2024, 7:02 AM

#

vagrant root moving towards tensor chips?

isn't that what TPUs are?

haughty cradle Jul 12, 2024, 7:02 AM

#

I know some of them but where can I learn the others?

#

I only know Input, Hidden, Output, and Recurrent cell

#

don't know others things

vagrant root Jul 12, 2024, 7:03 AM

#

wooden sail isn't that what TPUs are?

let me check

vagrant root Jul 12, 2024, 7:05 AM

#

haughty cradle I know some of them but where can I learn the others?

they are similar
inputs/different types of input notation
neurons/different notations
outputs/different notaions
recuurent/different
convolution you can learn via

#

https://youtu.be/KuXjwB4LzSA?si=oRF9p9DBUizjGvYZ

YouTube

3Blue1Brown

But what is a convolution?

Discrete convolutions, from probability to image processing and FFTs.
Video on the continuous case: https://youtu.be/IaSGqQa5O-M
Help fund future projects: https://www.patreon.com/3blue1brown
Special thanks to these supporters: https://3b1b.co/lessons/convolutions#thanks
An equally valuable form of support is to simply share the videos.

Other v...

▶ Play video

haughty cradle Jul 12, 2024, 7:05 AM

#

thx

vagrant root Jul 12, 2024, 7:09 AM

#

haughty cradle thx

FROM CLAUDE:
Input Cell: The entry point for data into a neural network.
Backfed Input Cell: An input cell that receives feedback from later layers.
Noisy Input Cell: Adds random noise to input data for improved generalization.
Hidden Cell: Processes and transforms data in intermediate network layers.
Probablistic Hidden Cell: Incorporates randomness for modeling uncertainty.
Spiking Hidden Cell: Mimics biological neurons by firing at specific thresholds.
Capsule Cell: Groups neurons to represent entities and preserve hierarchical relationships.
Output Cell: Produces the final prediction or result of the network.
Match Input Output Cell: Attempts to recreate specific input patterns in its output.
Recurrent Cell: Processes sequential data using self-looping connections.
Memory Cell: Stores information over time in recurrent networks.
Gated Memory Cell: Controls information flow with additional mechanisms.
Kernel: A small matrix of weights used in convolutional operations.
Convolution or Pool: Applies kernels or reduces spatial dimensions of data.

haughty cradle Jul 12, 2024, 7:09 AM

#

thx ❤️

vagrant root Jul 12, 2024, 7:12 AM

#

wooden sail isn't that what TPUs are?

apparently tensor chips are mobile only, my bad

past meteor Jul 12, 2024, 7:28 AM

#

I think I had a meta overfitting problem 🥴

#

The optimization algorithm you use for hyperparam tuning can also overfit if the tuning set is fixed. After some time it may find hyperparams that don't generalize to other datasets

#

That's really interesting

lapis sequoia Jul 12, 2024, 7:33 AM

#

if some one want data science and coding related courses dm me

past meteor Jul 12, 2024, 7:39 AM

#

I have a 3 way split

#

It's on the holdout test you can see if the hyperparam tuning "overfit"

#

This is 100% how you should do it

#

But

#

I don't cross validate neural nets

#

All the rest, yes

#

Takes too much time

#

Because you mostly do DL right?

#

Doing the entire thing is way better, improves skin quality, sleep at night, blood pressure etc.

#

But it takes too long for deep nets

#

I'd argue that not training a model means you still need to evaluate

#

Unless it's low stakes stuff

#

That's not what I meant 😮

#

If I were doing RAGs in the real world I'd spend a lot of time thinking about how to quantify their performance

#

That's the most high value literature out there right now for "industry ML"

#

You're not training anything but you do need to iterate and be able to quantify the performance gain somehow, otherwise it's anecdotal and small sample based

#

How will you evaluate a RAG?

#

Evaluation is the thing I take the most seriously 😄

#

There was talk of collaborating with the emergency services to make a model to detect people falling

#

95-99% of such a project is data collection + evaluation framework stuff

#

I don't think it's something that data people take seriously tho

#

My colleague just had some videos of him falling and was making the models on that

vagrant root Jul 12, 2024, 7:55 AM

#

is a 2000 sample data valid in your opinion?

#

to form a hypothesis

past meteor Jul 12, 2024, 7:55 AM

#

vagrant root is a 2000 sample data valid in your opinion?

Are you talking about hypothesis testing?

vagrant root Jul 12, 2024, 7:56 AM

#

past meteor Are you talking about hypothesis testing?

no research

past meteor Jul 12, 2024, 7:56 AM

#

I don't want to sound annoying but your question isn't specific enough 🙂

vagrant root Jul 12, 2024, 7:56 AM

#

would you consider a research on 2000 element dataset valid? if it provides good result

past meteor Jul 12, 2024, 7:56 AM

#

"To form a hypothesis" what do you exactly mean with this?

#

Are you training models? What are you doing

#

Or statistical tests? Or data analysis?

vagrant root Jul 12, 2024, 7:57 AM

#

i have a method proposal which takes 2000 samples and trains the model on it

#

the model performs with an accuracy of 85 on testing and 94 on validation

#

would you consider the method to be a valid research or is the data to small to infer