#data-science-and-ml | Python | Page 155

mortal bolt Dec 29, 2024, 2:49 PM

#

thanks billy, i will take a look

wooden sail Dec 29, 2024, 2:49 PM

#

a while back, someone i know asked similar questions regarding indexing and slicing subarrays on stack overflow. maybe my answer helps you some https://stackoverflow.com/questions/76627832/understanding-the-behaviour-of-advanced-multi-dimensional-indexing-on-a-4d-ndarr

Stack Overflow

Understanding the behaviour of advanced multi-dimensional indexing ...

Scenario
I have a 4D ndarray consisting of multiple 3D images/voxels with dimensions (voxels, dim1, dim2, dim3), let's say (12 voxels, 96 pixels, 96 pixels, 96 pixels). My goal is to sample a range...

thorny geode Dec 29, 2024, 3:06 PM

#

thank you 🙂

thorny geode Dec 29, 2024, 5:00 PM

#

wooden sail a while back, someone i know asked similar questions regarding indexing and slic...

thank you 🙂

thorny geode Dec 29, 2024, 5:01 PM

#

wooden sail a while back, someone i know asked similar questions regarding indexing and slic...

It's a great explanation! I've also learned that in ISLP, so I'd say I'm on a good path with learning data science

#

ah, is that a good thing or a bad thing? sorry, I want to clarify what you are trying to convey

left tartan Dec 29, 2024, 5:04 PM

#

thorny geode ah, is that a good thing or a bad thing? sorrys i want to clarify what are you t...

I'm saying that's a hard example to understand for a beginner

#

So it's ok to be confused

thorny geode Dec 29, 2024, 5:17 PM

#

left tartan So it's ok to be confused

yay 😄 I'm sure there's nothing else to be confused about

pale sierra Dec 29, 2024, 8:17 PM

#

Hi, Stelercus. What do you work on?

serene scaffold Dec 29, 2024, 8:18 PM

#

pale sierra Hi. Stelercus. What do you work on?

I can't really go into specifics, but it often involves fine-tuning interactive LLMs from huggingface and creating pipelines for them to complete specific tasks.

pale sierra Dec 29, 2024, 8:21 PM

#

serene scaffold I can't really go into specifics, but it often involves fine-tuning interactive ...

How long have you been coding?

serene scaffold Dec 29, 2024, 8:21 PM

#

pale sierra How long have you been coding?

seven years, I guess?

pale sierra Dec 29, 2024, 8:25 PM

#

serene scaffold seven years, I guess?

me too

#

but it's been on and off

fallen gorge Dec 29, 2024, 8:35 PM

#

Hey guys, I'm a new member and a beginner in the coding world, and I was looking for some guidance from you guys. Where do I begin?

serene scaffold Dec 29, 2024, 8:39 PM

#

fallen gorge Hey guys, I'm a new member,, I'm a beginner in the coding world, I was looking f...

!resources

arctic wedgeBOT Dec 29, 2024, 8:39 PM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

fallen gorge Dec 29, 2024, 8:41 PM

#

@serene scaffold Thanks 😊

rich moth Dec 29, 2024, 10:56 PM

#

Starting to see some really strange learning dynamics. It learns in like three phases from what I can tell. First, it quickly learns basic patterns (epochs 0-50), then adapts its quantum "inspired" features to tackle more complex stuff (epochs 50-100). Then it seems to stabilize and dynamically transition between states based on market complexity, looking for an optimal balance. But I thought you guys might be interested to see it; I don't think I've run into intersecting lines like that before. I had an idea for visualizing the plots: I was going to save them every epoch and combine them in chronological order to make like a "flip book" animation. Or does anyone know if there are any modules for this?
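Matplotlib can build the "flip book" directly instead of saving one image per epoch: `matplotlib.animation.FuncAnimation` redraws a figure once per frame, and `PillowWriter` writes the frames to a GIF. A minimal sketch, assuming the per-epoch training and validation losses are kept in two lists (`save_loss_animation` and its arguments are hypothetical names, not from the chat):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation, PillowWriter


def save_loss_animation(train_loss, val_loss, path="training.gif", fps=10):
    """Animate two loss curves epoch by epoch and write them to a GIF."""
    fig, ax = plt.subplots()
    # Fix the axes up front so the frames don't rescale and "jump".
    ax.set_xlim(0, len(train_loss) - 1)
    lo = min(min(train_loss), min(val_loss))
    hi = max(max(train_loss), max(val_loss))
    ax.set_ylim(lo, hi)
    ax.set_xlabel("epoch")
    ax.set_ylabel("loss")
    (train_line,) = ax.plot([], [], label="train")
    (val_line,) = ax.plot([], [], label="validation")
    ax.legend()

    def update(frame):
        # Reveal both curves up to and including epoch `frame`.
        xs = list(range(frame + 1))
        train_line.set_data(xs, train_loss[: frame + 1])
        val_line.set_data(xs, val_loss[: frame + 1])
        return train_line, val_line

    anim = FuncAnimation(fig, update, frames=len(train_loss), blit=True)
    anim.save(path, writer=PillowWriter(fps=fps))
    plt.close(fig)
    return path
```

Calling `save_loss_animation(train_history, val_history)` after training produces one GIF; if you really want one PNG per epoch, `fig.savefig(f"epoch_{e:03d}.png")` inside the training loop is the manual alternative.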

lapis sequoia Dec 30, 2024, 2:19 AM

#

Does being stupid cracked at game theory, like, being able to find the actual best response to every response, help with GANs and reinforcement learning? Pic related; it was years ago, but still, does game theory help a lot with RL and GANs?

simple plinth Dec 30, 2024, 6:45 AM

#

where can I start learning DS + AI?

devout cloak Dec 30, 2024, 6:49 AM

#

I mean a GAN is literally a two-player zero-sum game where the goal is to reach the Nash equilibrium, at which the discriminator can no longer tell the generated images apart from the real images.

You could employ optimization strategies based on game theory, like alternating the gradient-descent updates of the two players to tame oscillations within the system
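To see why the update order matters, here is a toy sketch (not a GAN) on the bilinear game f(x, y) = x·y, whose only equilibrium is (0, 0). With simultaneous gradient steps the iterates spiral outward; with alternating steps they stay on a bounded orbit. Note that they still circle instead of converging, so alternation keeps the oscillation in check rather than eliminating it. The function name `play` and the step size are illustrative choices:

```python
import math


def play(eta=0.1, steps=200, alternating=True, x=1.0, y=1.0):
    """Gradient play on the toy zero-sum game f(x, y) = x * y.

    x minimises f (descent on df/dx = y), y maximises f (ascent on df/dy = x).
    Returns the distance from the equilibrium (0, 0) after every step.
    """
    radii = [math.hypot(x, y)]
    for _ in range(steps):
        if alternating:
            x = x - eta * y  # x moves first...
            y = y + eta * x  # ...and y reacts to the *new* x
        else:
            x, y = x - eta * y, y + eta * x  # both players use the old values
        radii.append(math.hypot(x, y))
    return radii


sim = play(alternating=False)
alt = play(alternating=True)
print(f"simultaneous: start {sim[0]:.3f}, end {sim[-1]:.3f}")
print(f"alternating : max {max(alt):.3f}, end {alt[-1]:.3f}")
```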

rich river Dec 30, 2024, 8:19 AM

#

it's really annoying that there are CPU tensors and CUDA tensors

thorny geode Dec 30, 2024, 9:27 AM

#

hello, short question here,

Auto_re.loc[lambda df: df['year'] > 80, ['weight', 'origin']]

how does Python know that df takes its argument from Auto_re?

#

it's property-specific 😑
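The lambda's parameter name is arbitrary: when `.loc` receives a callable, pandas calls it with the object `.loc` was accessed on and uses the return value as the indexer, so pandas itself binds `df` to `Auto_re`. A small sketch with made-up data standing in for `Auto_re`:

```python
import pandas as pd

# A tiny stand-in for the Auto data (made-up values for illustration).
auto = pd.DataFrame({
    "year": [70, 78, 81, 82],
    "weight": [3504, 2720, 2130, 2295],
    "origin": [1, 3, 3, 2],
})

# .loc sees that the row indexer is callable and calls it with `auto`,
# so inside the lambda `df` *is* `auto`.
recent = auto.loc[lambda df: df["year"] > 80, ["weight", "origin"]]

# Equivalent, with the DataFrame written out explicitly:
same = auto.loc[auto["year"] > 80, ["weight", "origin"]]
print(recent)
```

The callable form is mostly useful in method chains, where the intermediate DataFrame has no variable name you could refer to.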

#

this Python lab chapter is almost done!!!!

tawdry sundial Dec 30, 2024, 1:39 PM

#

Are RNNs only able to have 3 unique weights in 1 layer?

serene scaffold Dec 30, 2024, 1:40 PM

#

tawdry sundial Are rnn only able to have 3 unique weights in 1 layer?

no

tawdry sundial Dec 30, 2024, 1:41 PM

#

I can't think of a way of fitting more weights

#

w1, w2 and w3

#

they are the same just unrolled

serene scaffold Dec 30, 2024, 1:44 PM

#

an RNN is just "a neural network with a cyclic computation graph and a hidden state". any architecture that does those things is an RNN.
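The three weights in the usual toy diagram (w1, w2, w3) are scalars only because that example uses one input and one hidden unit. A minimal numpy sketch of one vanilla (Elman) RNN layer, with illustrative sizes and names, shows that each of those roles is really a matrix, and that unrolling reuses the same matrices at every time step:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out, T = 4, 8, 2, 5  # illustrative sizes

# One layer's parameters: each "weight" is a whole matrix, and the same
# matrices are reused at every time step (that's what unrolling shows).
W_xh = rng.normal(size=(n_hidden, n_in))      # input  -> hidden
W_hh = rng.normal(size=(n_hidden, n_hidden))  # hidden -> hidden (the cycle)
W_hy = rng.normal(size=(n_out, n_hidden))     # hidden -> output
b_h = np.zeros(n_hidden)
b_y = np.zeros(n_out)


def rnn_forward(xs):
    """Run a vanilla RNN over a sequence xs of shape (T, n_in)."""
    h = np.zeros(n_hidden)  # hidden state carried from step to step
    ys = []
    for x_t in xs:
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        ys.append(W_hy @ h + b_y)
    return np.stack(ys), h


ys, h_last = rnn_forward(rng.normal(size=(T, n_in)))
n_params = sum(p.size for p in (W_xh, W_hh, W_hy, b_h, b_y))
print(ys.shape, n_params)
```

Gated variants such as LSTMs and GRUs add further weight matrices per layer, one set per gate.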

tawdry sundial Dec 30, 2024, 1:45 PM

#

oof

#

that's a bit of a broad definition

serene scaffold Dec 30, 2024, 1:45 PM

#

it's less broad than "neural network".

#

you don't need to know about all possible RNNs.

eager hamlet Dec 30, 2024, 2:07 PM

#

hey, just trying to start out with learning ML

would it be better to go through a course like the one Andrew Ng teaches or to read through something like "Dive into Deep Learning"?

serene scaffold Dec 30, 2024, 2:08 PM

#

eager hamlet hey, just trying to start out with learning ML would it be better to go through...

between those two, I would start with Andrew Ng's beginner course.

#

and if the course says that you need to know something before taking the course (like some kind of math), they mean it.

eager hamlet Dec 30, 2024, 2:09 PM

#

yeah I'm familiar with undergraduate level math

eager hamlet Dec 30, 2024, 2:10 PM

#

serene scaffold between those two, I would start with Andrew Ng's beginner course.

cool thanks, and also I saw in the pinned messages here that Columbia's course is better than Andrew Ng's, what would you recommend?

serene scaffold Dec 30, 2024, 2:11 PM

#

eager hamlet cool thanks, and also I saw in the pinned messages here that Columbia's course i...

which pinned message is that? I don't have a preference between the two.

eager hamlet Dec 30, 2024, 2:11 PM

#

#data-science-and-ml message

#

^ this one

serene scaffold Dec 30, 2024, 2:12 PM

#

that was ages ago bing_shrug I don't even know that person

eager hamlet Dec 30, 2024, 2:12 PM

#

ah lol okay

serene scaffold Dec 30, 2024, 2:12 PM

#

but I hope they're doing great, whoever they are.

thorny geode Dec 30, 2024, 2:31 PM

#

Stelercus, have you tried doing kaggle competitions

serene scaffold Dec 30, 2024, 2:35 PM

#

thorny geode Stelercus, have you tried doing kaggle competitions

No.

thorny geode Dec 30, 2024, 2:35 PM

#

serene scaffold No.

Okay. I hope I'll get you interested in Kaggle

serene scaffold Dec 30, 2024, 2:36 PM

#

~~I hope you, too.~~ refers to a corrected typo

thorny geode Dec 30, 2024, 2:36 PM

#

serene scaffold ~~I hope you, too.~~ refers to a corrected typo

noooo 😱 i misclicked

serene scaffold Dec 30, 2024, 2:36 PM

#

thorny geode Okay. i hope ill make you interested in kaggle

I have a full time job and I'd rather spend the rest of my time doing other things, including helping people on this server.

thorny geode Dec 30, 2024, 2:38 PM

#

serene scaffold I have a full time job and I'd rather spend the rest of my time doing other thin...

alrighty, I'm glad you have time to help people 😁 including me of course

past meteor Dec 30, 2024, 2:39 PM

#

thorny geode Stelercus, have you tried doing kaggle competitions

I used to do Kaggle's tabular playground series

#

It's good fun but imo Kaggle is far from representative of data science / ML

thorny geode Dec 30, 2024, 2:48 PM

#

past meteor It's good fun but imo Kaggle is far from representative of data science / ML

is it because they only deal with predictions?

past meteor Dec 30, 2024, 2:49 PM

#

That's one reason; the other one is more subtle and harmful

serene scaffold Dec 30, 2024, 2:49 PM

#

thorny geode is it because they only deal with predictions?

probably that the problems you're solving on kaggle are very artificial and neat.

tawdry sundial Dec 30, 2024, 2:53 PM

#

rich moth Starting to see some really strange learning dynamics. It learns in like thr...

how does your validation loss keep decreasing while the training loss stays pretty much flat (at the end)?

past meteor Dec 30, 2024, 2:53 PM

#

thorny geode is it because they only deal with predictions?

I'll give you the technical definition because it's more concise 😄

Say you have a training distribution and a testing distribution.

For the training set you have access to

X ~ P(X), the distribution of the independent variables
Y ~ P(Y), the distribution of the dependent variables
P(Y|X), their relationships.

For the test set you have access to:

X ~ P(X) <----------- this is the problem (!)

In the real world you simply do not have a priori access to the independent variables you will be predicting for in the future. Kagglers abuse this information a lot, e.g. by doing a PCA on the entire dataset instead of on just the training set.

Also, lots of models in the real world fail because of drift: covariate (data) drift is precisely when P(X) changes over time, and concept drift is when P(Y|X) itself changes
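A minimal numpy sketch of the PCA leak described above, on random data with a deliberately shifted test set (all names are illustrative): the honest version fits the projection on the training rows only and merely applies it to the test rows, while the leaky version lets the test rows shape the projection before they are "predicted".

```python
import numpy as np


def fit_pca(X, k):
    """Return the mean and the top-k principal directions of X (rows = samples)."""
    mean = X.mean(axis=0)
    # The right singular vectors of the centred data are the principal axes.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def transform(X, mean, components):
    return (X - mean) @ components.T


rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))
X_test = rng.normal(loc=2.0, size=(20, 5))  # the "future", shifted data

# Honest: the projection is learned from the training set only.
mean, comps = fit_pca(X_train, k=2)
Z_test = transform(X_test, mean, comps)

# Leaky: the test rows influence the projection itself.
mean_leak, comps_leak = fit_pca(np.vstack([X_train, X_test]), k=2)
Z_test_leak = transform(X_test, mean_leak, comps_leak)
```

In scikit-learn the same discipline usually means calling `fit` on the training split only and `transform` on the test split, or wrapping the steps in a `Pipeline`.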

#

Basically, in all of the competitions I did you had to exploit leakage to get ahead

fickle shale Dec 30, 2024, 2:54 PM

#

past meteor I'll give you the technical definition because it's more concise 😄 Say you hav...

so how can someone who is a fresher create a portfolio for an entry-level job?

past meteor Dec 30, 2024, 2:55 PM

#

Or in the real world you have a different issue, where introducing the model will actually influence P(X) and send you to places you've never trained on, etc.

past meteor Dec 30, 2024, 2:57 PM

#

fickle shale so how someone who is fresher ,create protfolio for entry level job?

You can do Kaggle, just don't think of it as a panacea 🙂

thorny geode Dec 30, 2024, 2:57 PM

#

serene scaffold probably that the problems you're solving on kaggle are very artificial and neat...

ah, so in the real world, the data is messier and needs to be cleaned up

rich moth Dec 30, 2024, 2:58 PM

#

tawdry sundial how does your validation loss keep on decreasing while training loss stay pretty...

It's actually by design. The training loss stabilizes because the model has learned the basic patterns, but the validation loss keeps improving because the model is still refining its uncertainty estimates and complexity handling. The negative validation loss isn't an error - it's a feature of how the loss function rewards both accurate predictions and well-calibrated confidence estimates.
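The chat doesn't say which loss is used, but one common loss that rewards both accuracy and calibrated confidence, and can legitimately go negative, is the Gaussian negative log-likelihood, where the model predicts a mean and a standard deviation for each target. A minimal sketch, assuming that loss (PyTorch's `torch.nn.GaussianNLLLoss` implements a variant of it):

```python
import math


def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under a Normal(mu, sigma**2) prediction."""
    return 0.5 * (math.log(2 * math.pi * sigma**2) + ((y - mu) / sigma) ** 2)


# Confident and accurate: the predicted density at y exceeds 1, so the loss is < 0.
print(gaussian_nll(y=1.00, mu=1.02, sigma=0.1))
# The same error with a wide sigma is penalised for being under-confident.
print(gaussian_nll(y=1.00, mu=1.02, sigma=1.0))
# Confident but wrong is penalised heavily.
print(gaussian_nll(y=1.00, mu=1.50, sigma=0.1))
```

A negative value only means the predicted density at the target is above 1; on its own it is not evidence that the model is good.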

past meteor Dec 30, 2024, 2:58 PM

#

Personally, I learnt a lot from doing tabular playground so I'd recommend doing a bit of Kaggle

fickle shale Dec 30, 2024, 2:58 PM

#

past meteor You can do Kaggle, just don't think of it as a panacea 🙂

What's a good way to learn data science?

past meteor Dec 30, 2024, 2:58 PM

#

As for portfolios? I had one, nobody ever asked in interviews, ever 😄

past meteor Dec 30, 2024, 2:59 PM

#

fickle shale What's the good way to learn datascience?

university + books (check the pinned thread, I recommend a bunch)

thorny geode Dec 30, 2024, 2:59 PM

#

past meteor I'll give you the technical definition because it's more concise 😄 Say you hav...

...oh... that's why some competitions with a prize pool hide their test dataset to avoid abuse, and even base the leaderboard on only a fraction of the test dataset

past meteor Dec 30, 2024, 3:00 PM

#

thorny geode ...oh... thats why some competitions with prize pool hide their test dataset to...

I heard nowadays some competitions have you submit a model instead of predictions, that effectively solves this problem

#

And then they run inference against your model without ever giving you samples of the test set => a lot fairer

thorny geode Dec 30, 2024, 3:00 PM

#

past meteor Basically, in all of the competitions I did you had to leak to get ahead

i see

thorny geode Dec 30, 2024, 3:01 PM

#

past meteor I heard nowadays some competitions have you submit a model instead of prediction...

ah, yes, i see some competitions do that instead of submitting csv files

tawdry sundial Dec 30, 2024, 3:02 PM

#

rich moth It's actually by design. The training loss stabilizes because the model has lear...

so it is purely from rewarding more simplistic patterns?

past meteor Dec 30, 2024, 3:02 PM

#

Anyway, if you want to learn ML/AI and you have the ability to just go to uni that's my recommendation

tawdry sundial Dec 30, 2024, 3:02 PM

#

validation loss is way below the training loss

#

didn't know that was a thing

past meteor Dec 30, 2024, 3:03 PM

#

tawdry sundial validation loss is way below the training loss

common if you use dropout for example

fickle shale Dec 30, 2024, 3:03 PM

#

past meteor Anyway, if you want to learn ML/AI and you have the ability to just go to uni th...

uni student, but pursuing OR!

past meteor Dec 30, 2024, 3:03 PM

#

fickle shale uni student but pursuing or!

operations research? That's a good choice

fickle shale Dec 30, 2024, 3:04 PM

#

past meteor operations research? That's a good choice

yeah!

#

lot of math!

tawdry sundial Dec 30, 2024, 3:04 PM

#

past meteor common if you use dropout for example

dropout? how is it common? does the algorithm minimize the validation loss more than the training loss while backpropagating on the training loss?

past meteor Dec 30, 2024, 3:06 PM

#

tawdry sundial dropout? how is it common? algorithm minimizes validation loss more than trainin...

Not quite 🙂 it's basically because, depending on the implementation, neurons are zeroed out in the forward pass during training but not in eval mode, when the validation loss is being computed
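A minimal numpy sketch of inverted dropout (the variant PyTorch's `nn.Dropout` uses), with illustrative sizes: in training mode a random subset of units is zeroed and the survivors are scaled by 1/(1-p), while in eval mode the layer is the identity. The training loss is therefore computed on a handicapped, noisier network, which is one reason it can sit above the validation loss.

```python
import numpy as np

rng = np.random.default_rng(0)


def dropout(x, p, training):
    """Inverted dropout: zero units with probability p, rescale the survivors."""
    if not training:
        return x  # eval mode: every unit is kept and nothing is rescaled
    keep = rng.random(x.shape) >= p  # each unit survives with probability 1 - p
    return x * keep / (1.0 - p)


h = np.ones(10_000)
train_out = dropout(h, p=0.5, training=True)
eval_out = dropout(h, p=0.5, training=False)
print((train_out == 0).mean(), train_out.mean(), eval_out.mean())
```

The 1/(1-p) rescaling keeps the expected activation the same in both modes, which is why nothing has to change at eval time.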

thorny geode Dec 30, 2024, 3:09 PM

#

past meteor Anyway, if you want to learn ML/AI and you have the ability to just go to uni th...

okay, I'll try finishing ISLP before going to uni

past meteor Dec 30, 2024, 3:09 PM

#

thorny geode okay, ill try finishing ISLP before going to uni

You're still in secondary school?

thorny geode Dec 30, 2024, 3:09 PM

#

past meteor You're still in secondary school?

yeah

past meteor Dec 30, 2024, 3:10 PM

#

thorny geode yeah

That means you're miles ahead of where I was when I was your age, keep it up ❤️

thorny geode Dec 30, 2024, 3:10 PM

#

past meteor That means you're miles ahead of where I was when I was your age, keep it up ❤️

thank you 🥲

#

although honestly I can pursue the math aspects of data science, since I was lucky enough to do competitive math and programming beforehand

rich moth Dec 30, 2024, 3:58 PM

#

No, it's more sophisticated than that. It isn't just learning simpler patterns; it's actually learning to balance prediction accuracy with uncertainty estimation. The decreasing validation loss shows the model getting better at both predicting AND knowing how confident it should be in each prediction. That V-shaped error pattern in the plots shows it's learning proper market behavior. It's understanding that larger moves inherently have more uncertainty.

lapis sequoia Dec 30, 2024, 4:59 PM

#

I meant well*

eager hamlet Dec 30, 2024, 5:51 PM

#

serene scaffold between those two, I would start with Andrew Ng's beginner course.

hey sorry just confirming

#

you're talking about cs229 right?

serene scaffold Dec 30, 2024, 5:52 PM

#

eager hamlet you're talking about cs229 right?

Idk

eager hamlet Dec 30, 2024, 5:52 PM

#

https://www.youtube.com/playlist?list=PLoROMvodv4rMiGQp3WXShtMGgzqpfVfbU

YouTube

Stanford CS229: Machine Learning Full Course taught by Andrew Ng | ...

Led by Andrew Ng, this course provides a broad introduction to machine learning and statistical pattern recognition. Topics include: supervised learning (gen...

serene scaffold Dec 30, 2024, 5:54 PM

#

eager hamlet https://www.youtube.com/playlist?list=PLoROMvodv4rMiGQp3WXShtMGgzqpfVfbU

If that's his most entry-level ML course then go for it

devout cloak Dec 30, 2024, 6:48 PM

#

lapis sequoia

There’s no way to objectively define an answer, as this is a subjective question. What are your goals, and what kinds of things are you trying to build?

odd meteor Dec 30, 2024, 7:08 PM

#

rich river it is really annoying there are CPU tensors and cuda tensors

I think it's probably gonna be less annoying if you look at it this way 😀

By default, when we create a tensor, it'll reside on the CPU. However, you're free to relocate this tensor to a new residence (the GPU) if you wanna optimize for speed, since GPUs usually have far more cores than CPUs.

It's pretty easy to move a tensor back and forth between the GPU and the CPU. You just have to pay close attention so you don't perform an operation on two tensors that reside on different devices.
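A common way to keep this manageable in PyTorch is to pick the device once and move everything to it; mixing a CUDA tensor with a CPU tensor in one operation raises a `RuntimeError`. A minimal device-agnostic sketch (it also runs on a machine without a GPU, in which case everything simply stays on the CPU):

```python
import torch

# Pick the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.arange(3.0)              # created on the CPU by default
b = torch.ones(3, device=device)   # created directly on `device`

a = a.to(device)                   # move `a` so both operands live on one device
c = a + b                          # safe: same device

c_cpu = c.cpu()                    # bring the result back, e.g. for NumPy or matplotlib
print(c_cpu.device, c_cpu.tolist())
```

The same `.to(device)` call works on a whole model (`model.to(device)`), so in practice you move the model once and each batch as it comes out of the data loader.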

candid hornet Dec 30, 2024, 7:32 PM

#

hi everyone
I am new to this server
does anyone know about GGUF?

#

what are the system requirements for this?

#

hi

random dune Dec 30, 2024, 7:44 PM

#

is this the right channel for pandas questions?

mystic peak Dec 30, 2024, 7:53 PM

#

how do I make a reward system for an AI learning system? I saw a Mario Kart video of a person making an AI that improves the longer it plays, and I wondered how it works

#

I want to see if it's possible to make one for a 2d fighting game and see if it can beat actual people

serene scaffold Dec 30, 2024, 7:57 PM

#

random dune is this the right channel for pandas questions?

Yep! Go ahead and ask your question as specifically as you can, and please also put an example of the dataframe in the paste bin. Do `print(df.head().to_dict('list'))`
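The point of `to_dict('list')` is that the output is plain text that anyone helping can turn back into the same DataFrame with one call. A small sketch using a few values from the sample posted later in the thread:

```python
import pandas as pd

df = pd.DataFrame({
    "koi_period": [9.48803557, 54.4183827, 2.525591777],
    "koi_teq": [793.0, 443.0, 1406.0],
    "koi_prad": [2.26, 2.83, 2.75],
})

# A plain dict of column -> list of values: easy to paste as text...
sample = df.head().to_dict("list")
print(sample)

# ...and easy for someone else to rebuild exactly.
rebuilt = pd.DataFrame(sample)
```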

#

!paste

arctic wedgeBOT Dec 30, 2024, 7:57 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

odd meteor Dec 30, 2024, 8:01 PM

#

fickle shale so how someone who is fresher ,create protfolio for entry level job?

While the number of ML job openings has exploded in recent years, the number of applicants has grown even faster, roughly 10x.

This means landing an ML job today is WAAAY harder than it was five years ago.

If this makes you feel anxious or triggers those "I’m not good enough" thoughts, just take a deep breath, because everyone has them. E-V-E-R-Y-O-N-E.

Grab your favorite drink and shift your perspective:

What are companies hiring ML engineers trying to solve?
These companies face overwhelming noise in the AI space, and they desperately need technical experts (like you) to cut through it. They need people who can design and build ML systems that turn raw data into smart decisions.

What does this mean for you?

You need to demonstrate that you can take a real-world business problem, frame it as an ML problem, and solve it by:

Building a feature pipeline (feature engineering).
Creating a training pipeline (training or fine-tuning models).
Serving predictions (inference pipeline).
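A deliberately tiny sketch of the three-pipeline split, assuming a made-up house-price problem with a linear model standing in for XGBoost or an LLM (all names and data are hypothetical). In a real project each function would be its own job, and the inference one would sit behind an API; the key design point is that inference reuses the exact same feature code as training.

```python
import numpy as np


# Feature pipeline: raw records -> numeric feature matrix.
def build_features(records):
    return np.array([[r["sqft"], r["rooms"], 1.0] for r in records])  # 1.0 = bias term


# Training pipeline: features + targets -> model parameters.
def train(X, y):
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights


# Inference pipeline: new raw records -> predictions, via the SAME feature code.
def predict(weights, records):
    return build_features(records) @ weights


history = [
    {"sqft": 1, "rooms": 1}, {"sqft": 2, "rooms": 1},
    {"sqft": 3, "rooms": 2}, {"sqft": 4, "rooms": 3},
]
prices = np.array([10.0, 12.0, 17.0, 22.0])
w = train(build_features(history), prices)
print(predict(w, [{"sqft": 5, "rooms": 2}]))
```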

Package this solution into a Docker container and deploy it to a compute platform like AWS Lambda or Kubernetes.

This is the essence of real-world ML engineering—no more, no less.

How do you show this? Build a professional side project.

What makes it professional?

Gone are the days when a polished Jupyter notebook on GitHub was enough to land a job. Today, you need to go further.

Solve a specific business problem by:

Picking a real-world problem that excites you.
Using a data API to ingest and transform data into ML features.
Training a model (e.g., an XGBoost model, an LLM agent, or fine-tuning a base LLM).
Building an API to serve the model’s predictions.

Finish it with a clear, professional README file in your repo.

This is what hiring managers need to see—and it’s absolutely within your reach.

Remember:
You don’t learn first and then build. You learn by building.

To your success 🥂✌️

random dune Dec 30, 2024, 8:03 PM

#

My question is: I have a dataframe with three columns, and I want to use these values to calculate a new value for every row in this dataframe. The formula is in the image above, and n = 3, just like the three columns. I have the values for x[i0] as a tuple of three tuples of doubles. I want to use the results later as the x-axis of a plot. Would it be better to keep this new column as a separate Series or to add it to the dataframe as another column?
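The image with the formula isn't preserved here. Reconstructed from how it is implemented further down in the thread (reference value x_{i0}, weight w_i used in the exponent, n = 3 properties), it has the form of the Earth Similarity Index, computed per planet (per row):

```latex
\mathrm{ESI} = \prod_{i=1}^{n} \left( 1 - \left| \frac{x_i - x_{i0}}{x_i + x_{i0}} \right| \right)^{w_i / n}
```

where x_i is the planet's value for property i (period, equilibrium temperature, radius), x_{i0} is the reference value for that property, and w_i is its weight.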

#

values = confirmed_exoplanets.loc[:, ["koi_period", "koi_teq", "koi_prad"]]

#

confirmed_exoplanets is a subset of a larger dataset

serene scaffold Dec 30, 2024, 8:05 PM

#

random dune My question is: I have a dataframe with three columns. I want to use these value...

can you show samples of the dataframe(s)?

random dune Dec 30, 2024, 8:07 PM

#

{'koi_period': [9.48803557, 54.4183827, 2.525591777, 11.09432054, 4.13443512], 'koi_teq': [793.0, 443.0, 1406.0, 835.0, 1160.0], 'koi_prad': [2.26, 2.83, 2.75, 3.9, 2.77]} is the sample of values

serene scaffold Dec 30, 2024, 8:11 PM

#

@random dune I would organize it like this.

In [75]: koi
Out[75]:
      period     teq  prad
0   9.488036   793.0  2.26
1  54.418383   443.0  2.83
2   2.525592  1406.0  2.75
3  11.094321   835.0  3.90
4   4.134435  1160.0  2.77

In [76]: coef
Out[76]:
          0  1
period  365  1
teq     254  2
prad      1  4

In [77]: koi * coef[1]
Out[77]:
      period     teq   prad
0   9.488036  1586.0   9.04
1  54.418383   886.0  11.32
2   2.525592  2812.0  11.00
3  11.094321  1670.0  15.60
4   4.134435  2320.0  11.08

#

note how I changed the names of the columns in koi, and also how doing koi * coef[1] multiplies the teq values by 2 and the prad values by 4.
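The session above doesn't show how `koi` and `coef` were built. One way to construct them from the posted sample, assuming the weights 1, 2, 4 are just the placeholder values used in the session (`str.removeprefix` needs Python 3.9+):

```python
import pandas as pd

sample = {
    "koi_period": [9.48803557, 54.4183827, 2.525591777, 11.09432054, 4.13443512],
    "koi_teq": [793.0, 443.0, 1406.0, 835.0, 1160.0],
    "koi_prad": [2.26, 2.83, 2.75, 3.9, 2.77],
}

# Strip the "koi_" prefix so the column names match coef's index labels.
koi = pd.DataFrame(sample).rename(columns=lambda c: c.removeprefix("koi_"))

# One row per property: column 0 holds x0, column 1 holds the weight.
coef = pd.DataFrame(
    {0: [365, 254, 1], 1: [1, 2, 4]},
    index=["period", "teq", "prad"],
)

# DataFrame * Series matches the Series' index against the frame's columns.
scaled = koi * coef[1]
print(scaled)
```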

random dune Dec 30, 2024, 8:15 PM

#

i see, can this be extended into a full expression like the equation I sent? so pow(1 - abs((koi[0] - coef[0][0]) / (koi[0] + coef[0][0])), coef[1][0]/3)

#

or something of the sort, ofc

serene scaffold Dec 30, 2024, 8:17 PM

#

@random dune idk if this is what you want

In [81]: (koi - coef[0]) / (koi + coef[1])
Out[81]:
       period       teq      prad
0  -33.896907  0.677987  0.201278
1   -5.604307  0.424719  0.267936
2 -102.812359  0.818182  0.259259
3  -29.262138  0.694146  0.367089
4  -70.283401  0.779690  0.261448

In [82]: np.prod(1 - np.abs((koi - coef[0]) / (koi + coef[1])))
Out[82]:
period   -3.019632e+07
teq       2.269543e-03
prad      2.024582e-01
dtype: float64

#

also I know it's missing the exponent. idk what w is.

random dune Dec 30, 2024, 8:18 PM

#

oh, well the x[i0] in the equation would be the first column of your version of coef, and each w is the second column

serene scaffold Dec 30, 2024, 8:22 PM

#

@random dune this?

In [93]: coef
Out[93]:
          0  w
period  365  1
teq     254  2
prad      1  4

In [94]: koi
Out[94]:
      period     teq  prad
0   9.488036   793.0  2.26
1  54.418383   443.0  2.83
2   2.525592  1406.0  2.75
3  11.094321   835.0  3.90
4   4.134435  1160.0  2.77

In [95]: np.prod((1 - np.abs((koi - coef[0]) / (koi + coef[0]))) ** (coef['w'] / len(koi)))
Out[95]:
period    0.047383
teq       0.201104
prad      0.071537
dtype: float64

random dune Dec 30, 2024, 8:24 PM

#

is there a way to get the product for each individual row in koi? or rather what is prod multiplying?

serene scaffold Dec 30, 2024, 8:28 PM

#

random dune is there a way to get the product for each individual row in koi? or rather what...

In [98]: np.abs((koi - coef[0]) / (koi + coef[0])) ** (coef['w'] / len(koi))
Out[98]:
     period       teq      prad
0  0.989654  0.766755  0.467436
1  0.941685  0.593324  0.553862
2  0.997236  0.864048  0.543507
3  0.987912  0.777785  0.657297
4  0.995479  0.836896  0.546142

np.prod does the product of all the elements along whichever axis gets collapsed.

#

You can do axis=1 instead

In [100]: np.prod((1 - np.abs((koi - coef[0]) / (koi + coef[0]))) ** (coef['w'] / len(koi)), axis=1)
Out[100]:
0    0.278978
1    0.400076
2    0.159780
3    0.204347
4    0.187055

random dune Dec 30, 2024, 8:30 PM

#

I see! so this is a list of those values! thank you very much, now I know operations like this can be executed on dataframes and series

untold bloom Dec 30, 2024, 8:30 PM

#

In [55]: coeffs_df = pd.DataFrame.from_dict(coefficients, orient="index", columns=["x0", "w"])

In [56]: df.sub(coeffs_df["x0"]).div(df.add(coeffs_df["x0"])).abs().rsub(1).pow(coeffs_df["w"].div(3)).prod(axis=1)
Out[56]:
0    0.000074
1    0.000034
2    0.000028
3    0.000093
4    0.000015
5    0.000031
6    0.000000
7    0.000016
dtype: float64

#

pandas objects have methods for each step of your formula: sub for subtraction, abs for absolute value, pow for power, etc.

#

you can chain them to build it

#

s.rsub(1) does 1 - s

#

all element-wise, except for .prod which is an aggregator

random dune Dec 30, 2024, 8:33 PM

#

Oh okay!

untold bloom Dec 30, 2024, 8:33 PM

#

we say to collapse each row by taking the product over the columns belonging to that row (axis=1 means that; 0 is the default axis and collapses the other way)

#

as an aside, numpy's aggregators by default aggregate the entire thing, i.e., give back a scalar; however, when passed a pandas object, numpy hands the call over to that object's own method (here DataFrame.prod), so pandas' default axis=0 is what's in action

random dune Dec 30, 2024, 8:37 PM

#

untold bloom ```py In [55]: coeffs_df = pd.DataFrame.from_dict(coefficients, orient="index", ...

what is the df here? that's the dataframe with the 3 columns, right?

untold bloom Dec 30, 2024, 8:38 PM

#

yes your main dataframe

#

it's assumed to have the same column names as the keys of the coefficients dictionary: period etc.

#

so that the alignment will work as intended

random dune Dec 30, 2024, 8:39 PM

#

Ok, so the chaining follows PEMDAS correct?

untold bloom Dec 30, 2024, 8:39 PM

#

it follows your formula from the inside out

#

subtract x0 first
then divide that by x + x0
then take absolute value etc.

#

operations are element-wise so they happen to every element of the frame

#

and the x0 and w values will be "broadcast" appropriately to happen to each row as intended

#

because the frame has shape (N, 3), and x0 and w have shape (3,) each

#

it's as if x0 and w are repeated N times to have (N, 3), then operations are done

#

and which coefficient goes to which column is determined by matching their names

#

both broadcasting and alignment happen automatically for us
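A self-contained sketch of the chained version, with sample numbers taken from the thread; `coefficients` is deliberately written in a different order than the frame's columns to show that alignment goes by label, not by position:

```python
import pandas as pd

df = pd.DataFrame({
    "period": [9.49, 54.42, 2.53],
    "teq": [793.0, 443.0, 1406.0],
    "prad": [2.26, 2.83, 2.75],
})

# Coefficients listed in a *different* order than df's columns.
coefficients = {"prad": (1, 4), "period": (365, 1), "teq": (254, 2)}
coeffs_df = pd.DataFrame.from_dict(coefficients, orient="index", columns=["x0", "w"])

# x0 and w have shape (3,); each is broadcast down all N rows of df
# and matched to df's columns by label.
chained = (
    df.sub(coeffs_df["x0"])
    .div(df.add(coeffs_df["x0"]))
    .abs()
    .rsub(1)
    .pow(coeffs_df["w"].div(len(df.columns)))
    .prod(axis=1)
)
print(chained)
```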

random dune Dec 30, 2024, 8:44 PM

#

and by N you mean the length of the frame not the n in the formula, which we take as 3?

iron basalt Dec 30, 2024, 8:45 PM

#

random dune Ok, so the chaining follows PEMDAS correct?

https://docs.python.org/3/reference/expressions.html#operator-precedence

Python documentation

6. Expressions

This chapter explains the meaning of the elements of expressions in Python. Syntax Notes: In this and the following chapters, extended BNF notation will be used to describe syntax, not lexical anal...

untold bloom Dec 30, 2024, 8:45 PM

#

random dune and by N you mean the length of the frame not the n in the formula, which we tak...

yes

#

N = 8 in the example above

#

that link is a wrong answer

random dune Dec 30, 2024, 8:46 PM

#

so the resulting series should have a length of N too, right?

untold bloom Dec 30, 2024, 8:46 PM

#

indeed

random dune Dec 30, 2024, 8:48 PM

#

coeffs = pd.DataFrame.from_dict(values, orient="index", columns=["x0", "w"])
exoplanets = confirmed_exoplanets.loc[:, ["koi_period", "koi_teq", "koi_prad"]]
print(exoplanets.sub(coeffs["x0"]).div(df.add(coeffs["x0"])).abs().rsub(1).pow(coeffs["w"].div(3)).prod(axis=1))
using this code, I get the following output:
0       1.0
1       1.0
2       1.0
3       1.0
4       1.0
       ... 
9559    1.0
9560    1.0
9561    1.0
9562    1.0
9563    1.0
Length: 9564, dtype: float64

#

which is weird, since it seems it tripled or even more

#

it probably is due to some misuse

untold bloom Dec 30, 2024, 8:49 PM

#

you have "df" in the code, what is it?

random dune Dec 30, 2024, 8:50 PM

#

the exoplanets would be that, since it is the subset of the main dataframe but only taking those 3 columns

untold bloom Dec 30, 2024, 8:50 PM

#

so you have df = exoplanets or something

#

because the code you shared uses both exoplanets and df

#

also these have koi_ in front: ["koi_period", "koi_teq", "koi_prad"]

#

coeffs don't, so there is a mismatch

#

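you can do coeffs = pd.DataFrame....add_prefix("koi_", axis=0) to remedy that (the names live in coeffs' index, and plain add_prefix on a DataFrame prefixes the columns instead; the axis argument needs pandas 2.0+, otherwise use .rename(index=lambda name: "koi_" + name))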
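The all-1.0 output earlier is a symptom of exactly this name mismatch. A minimal sketch (the `esi` helper and the two-row frame are hypothetical): when no labels match, alignment produces all-NaN columns, and `.prod` skips NaN by default, so every row becomes an empty product, i.e. 1.0.

```python
import pandas as pd

exoplanets = pd.DataFrame(
    {"koi_period": [9.49, 54.42], "koi_teq": [793.0, 443.0], "koi_prad": [2.26, 2.83]}
)
coeffs = pd.DataFrame.from_dict(
    {"period": (365, 1), "teq": (254, 2), "prad": (1, 4)},
    orient="index", columns=["x0", "w"],
)


def esi(frame, coeffs):
    n = len(coeffs)  # number of properties
    return (frame.sub(coeffs["x0"]).div(frame.add(coeffs["x0"]))
            .abs().rsub(1).pow(coeffs["w"].div(n)).prod(axis=1))


# Mismatched names: six all-NaN columns, NaNs skipped, empty product = 1.0.
broken = esi(exoplanets, coeffs)

# Fix: prefix coeffs' *index*, where the property names live.
fixed = esi(exoplanets, coeffs.rename(index=lambda name: "koi_" + name))
print(broken, fixed, sep="\n")
```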

random dune Dec 30, 2024, 8:52 PM

#

can i also just add koi_ manually to the values set?

untold bloom Dec 30, 2024, 8:52 PM

#

that also works

random dune Dec 30, 2024, 8:55 PM

#

Okay! it works finally! So as a last question, when I want to plot using this new series as the x and another column as the y, I can extract y and just plot it, correct?

untold bloom Dec 30, 2024, 8:56 PM

#

yes absolutely

#

you can even do df.plot(x="teq", y="something")

#

or you can use whichever plotting library you want to use by passing df["stuff"] to them as x/y values

random dune Dec 30, 2024, 8:58 PM

#

okay so the x is the series from that formula, just named that way yes? and y would be my extracted other series

#

or no?

#

oh wait i see df there

untold bloom Dec 30, 2024, 8:58 PM

#

yes

#

well if everything is in a dataframe as columns, we access them using df[col_name] syntax right?

#

then you can do, e.g., plt.plot(df["some_col"], df["some_other"])

#

the first one was a convenience function of dataframes to plot "quickly" as calling a method

random dune Dec 30, 2024, 9:00 PM

#

ah i see so df is the combined x and y

untold bloom Dec 30, 2024, 9:00 PM

#

then you pass column names as strings there and it knows which dataframe to look at because that's what it's called on

#

yes df is still exoplanets

#

i kind of assumed this newly calculated thing also went into this as a new column

#

it doesn't have to

random dune Dec 30, 2024, 9:06 PM

#

untold bloom then you can do, e.g., `plt.plot(df["some_col"], df["some_other"])`

Okay! i have decided i will keep them as series and just plot them using this method

#

so I used plt.plot(star_mass, exoplanets_similarity) (star_mass is a series) and got this. I assume it's because star_mass, being the x values, isn't sorted, which caused this mess

#

i guess ill have to combine the two series into one and then sort it?

#

by star_mass ofc

untold bloom Dec 30, 2024, 9:12 PM

#

scatter plot maybe?

#

plt.scatter

#

sorting also works if it's reasonable to do for the quantities
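A minimal sketch of both options with made-up values standing in for `star_mass` and `exoplanets_similarity` (the names and numbers are hypothetical). `plt.plot` joins the points with lines in the order given, so unsorted x values zig-zag; a scatter plot has no lines, so order doesn't matter.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; use plt.show() interactively instead
import matplotlib.pyplot as plt
import pandas as pd

star_mass = pd.Series([0.92, 1.10, 0.17, 0.89, 1.01], name="koi_smass")
similarity = pd.Series([0.12, 0.05, 0.25, 0.89, 0.11], name="esi")

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Option 1: scatter, so the order of the points is irrelevant.
ax1.scatter(star_mass, similarity, s=10)

# Option 2: a line plot, after sorting both columns together by x.
both = pd.concat([star_mass, similarity], axis=1).sort_values("koi_smass")
ax2.plot(both["koi_smass"], both["esi"])
fig.tight_layout()
```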

random dune Dec 30, 2024, 9:20 PM

#

star_mass = confirmed_exoplanets["koi_smass"]
plot1 = pd.concat([star_mass, exoplanets_similarity], axis=1)
plot1.sort_values(by=["koi_smass"])
print(plot1)
and the result is:
      koi_smass         0
0         0.919  0.119111
1         0.919  0.217223
4         1.095  0.047048
5         1.053  0.070895
6         1.053  0.061181
...         ...       ...
8817      0.169  0.245184
8956      0.169  0.160354
9014      0.892  0.889360
9083      1.010  0.109329
9181      0.698  0.235039

#

which seems unsorted still

#

and when I added plot1.sort_values(by=["koi_smass"], ascending=True, inplace=True), I got:
      koi_smass         0
6020 0.096 0.192808
3043 0.096 0.241544
655 0.132 0.127794
652 0.132 0.118477
653 0.132 0.071698
... ... ...
1633 2.646 0.008439
519 2.736 0.001650
5968 3.573 0.004942
2210 NaN 0.626638
8571 NaN 0.332637

untold bloom Dec 30, 2024, 9:22 PM

#

random dune which seems unsorted still

because sort_values returns a new dataframe, leaving the current one unaffected

#

unless you pass inplace=True

#

another way is to re-assign, i.e., `df = df.sort_values(...)` (df being a generic frame here)
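A minimal sketch of the three behaviours just described, on a made-up frame. Note also that `sort_values` puts NaN last by default (`na_position="last"`), which is why the NaN rows ended up at the bottom of the sorted output above.

```python
import pandas as pd

# Hypothetical frame with an unsorted "koi_smass" column.
df = pd.DataFrame({"koi_smass": [0.919, 0.169, 1.095],
                   "similarity": [0.12, 0.25, 0.05]})

# 1) sort_values returns a NEW DataFrame and leaves df untouched.
sorted_copy = df.sort_values(by="koi_smass")
print(df["koi_smass"].tolist())           # [0.919, 0.169, 1.095]  (unchanged)
print(sorted_copy["koi_smass"].tolist())  # [0.169, 0.919, 1.095]

# 2) Re-assign the result to keep it.
df = df.sort_values(by="koi_smass")

# 3) Or mutate in place; with inplace=True the method returns None,
#    so don't write df = df.sort_values(..., inplace=True).
other = df.copy()
result = other.sort_values(by="similarity", inplace=True)
```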

random dune Dec 30, 2024, 9:23 PM

#

is the NaN a result of some rows being left null or just a number range issue?

untold bloom Dec 30, 2024, 9:25 PM

#

that may be due to `pd.concat([star_mass, exoplanets_similarity], axis=1)`

#

if star_mass and exoplanets_similarity don't have exactly the same index, then at every label present in one but missing from the other, NaN is filled in for the missing one

#

and the resulting index is the union of the indexes of the passed Series

#

if they have the same index, then NaN is coming from somewhere else

#

i.e., inherent to the data
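A small sketch of that index alignment, using two made-up Series whose indexes only partly overlap. `symmetric_difference` lists the labels that exist in only one of them, which is a quick way to tell alignment NaNs from NaNs already in the data; `join="inner"` keeps only the shared labels.

```python
import pandas as pd

# Hypothetical Series whose indexes only partly overlap (labels 1 and 2 are shared).
star_mass = pd.Series([0.9, 1.1, 0.7], index=[0, 1, 2], name="koi_smass")
similarity = pd.Series([0.12, 0.25, 0.05], index=[1, 2, 3], name="similarity")

# Outer join on the index (the default): the result index is the union {0, 1, 2, 3},
# and each Series gets NaN at the labels it doesn't have.
outer = pd.concat([star_mass, similarity], axis=1)
print(outer)

# Quick diagnostic: labels present in one index but not the other.
print(star_mass.index.symmetric_difference(similarity.index).tolist())  # [0, 3]

# join="inner" keeps only the shared labels, so alignment introduces no NaN.
inner = pd.concat([star_mass, similarity], axis=1, join="inner")
print(inner.index.tolist())  # [1, 2]
```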

random dune Dec 30, 2024, 9:34 PM

#

could it be because `exoplanets_parameters = confirmed_exoplanets.loc[:, ["koi_period", "koi_teq", "koi_prad"]]` uses loc and `star_mass = confirmed_exoplanets["koi_smass"]` doesn't? they're both column subsets of the same dataframe, it seems

#

if not then it seems it would be inherent

untold bloom Dec 30, 2024, 9:36 PM

#

the two are different ways of selecting columns that achieve the same thing, so that wouldn't be the root cause, yeah

#

might need to look at the koi_smass column in the source to see if the NaNs were there to begin with

#

a formula may give NaNs too, e.g., 0/0 is NaN in floating-point (NumPy/pandas) arithmetic
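A sketch of the checks discussed here, on a made-up three-row slice. The `koi_depth / koi_prad` ratio is just a hypothetical formula chosen to produce a 0/0. Bracket selection and `.loc` give identical results, `isna().sum()` shows NaNs already present in the source, and float 0/0 in pandas yields NaN instead of raising.

```python
import numpy as np
import pandas as pd

# Hypothetical slice of the Kepler table; one koi_smass value is missing in the source.
df = pd.DataFrame({
    "koi_smass": [0.919, np.nan, 1.095],
    "koi_prad": [2.0, 1.5, 0.0],
    "koi_depth": [1.0, 3.0, 0.0],
})

# Bracket selection and .loc select the same column(s): same values, same index.
print(df["koi_smass"].equals(df.loc[:, "koi_smass"]))                               # True
print(df[["koi_smass", "koi_prad"]].equals(df.loc[:, ["koi_smass", "koi_prad"]]))  # True

# 1) NaNs that were already in the source data, counted per column:
print(df.isna().sum())

# 2) NaNs produced by a formula: float 0/0 is NaN in pandas (no exception raised).
ratio = df["koi_depth"] / df["koi_prad"]
print(ratio.isna().tolist())  # [False, False, True]
```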

random dune Dec 30, 2024, 9:39 PM

#

Yep, turns out the dataset doesn't have it, weird. I'll have to redownload it

limber belfry Dec 30, 2024, 10:21 PM

#

Does anyone have code for a real-time object detection program using a custom yolo11n model? The frames come from my screen. It would be perfect if I could get like 10-15 fps (ips). Please help. @ me if you answer

flat hawk Dec 30, 2024, 11:50 PM

#

Does anybody have any idea why my training loss seems to systematically dip?

weak oxide Dec 31, 2024, 12:45 AM

#

random dune could it be because ```exoplanets_parameters = confirmed_exoplanets.loc[:, ["koi...

What dataset are you using?

odd meteor Dec 31, 2024, 2:09 AM

#

flat hawk Anybody has any idea why does my training loss seem to systematically dip?

It's most likely learning rate (lr) related. A well-chosen learning rate allows steady movement towards the global minimum without getting trapped in local minima or experiencing bumpy fluctuations.

If lr is too high, the loss will jump around erratically (yours isn't that crazy, though).
If lr is too low, your model can get stuck in local minima or take too long to converge.

If you have the time for further experiments, try to figure out whether it's really an lr issue. You could:

1. Use the good old manual strategy: start from an lr that's too large, say lr = 1.0, then lower it until it's too small, say lr = 1e-4, then compare and contrast the resulting learning curves from your experiments.
2. Look for a paper that solved a similar task, then use the same learning rate they used (a shortcut that works like a charm).
3. If you don't want to take the shortcut in #2, try an automatic learning rate finder (the Lightning framework has a cool option that helps find a good learning rate automatically).
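A toy illustration of the manual sweep in option 1, assuming plain gradient descent on the one-parameter loss f(w) = w², whose gradient is 2w. A real network's loss surface is far messier, so this only shows the qualitative pattern: too large an lr bounces (lr = 1.0) or diverges (lr = 1.1), a moderate one converges quickly, and a tiny one barely moves in the same number of steps.

```python
def gradient_descent(lr: float, steps: int = 50, w0: float = 1.0) -> float:
    """Minimise the toy loss f(w) = w**2 (gradient 2w) and return the final loss."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w  # each step multiplies w by (1 - 2 * lr)
    return w * w

# Sweep from "too large" down to "too small" and compare the final losses.
for lr in (1.1, 1.0, 0.1, 1e-2, 1e-4):
    print(f"lr={lr:<7} final loss={gradient_descent(lr):.3e}")
```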

You could also try other, more advanced techniques like:

- Learning rate schedulers like StepLR, ReduceLROnPlateau (decay on plateau), and cosine annealing.
- Adding a momentum parameter to your optimizer to help dampen those oscillations.
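A framework-free sketch of the last two ideas: a cosine annealing schedule (without restarts) driving heavy-ball momentum updates on a toy quadratic loss, (w - 3)². In PyTorch you would normally get this from `torch.optim.SGD(params, lr=..., momentum=0.9)` together with `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=...)`; the hand-written version below just shows what those compute, under the stated toy assumptions.

```python
import math

def cosine_annealing_lr(step: int, total_steps: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Cosine annealing without restarts: lr_max at step 0, lr_min at total_steps."""
    progress = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

def sgd_momentum(grad_fn, w0: float, total_steps: int, lr_max: float, momentum: float = 0.9) -> float:
    """Heavy-ball SGD on one scalar parameter, with a cosine-annealed learning rate."""
    w, velocity = w0, 0.0
    for step in range(total_steps):
        lr = cosine_annealing_lr(step, total_steps, lr_max)
        # The velocity keeps a decaying memory of past gradients, which damps oscillations.
        velocity = momentum * velocity - lr * grad_fn(w)
        w += velocity
    return w

# Toy loss f(w) = (w - 3)**2, whose gradient is 2 * (w - 3); the minimum is at w = 3.
w_final = sgd_momentum(lambda w: 2 * (w - 3), w0=0.0, total_steps=200, lr_max=0.1)
print(w_final)
```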

rich moth Dec 31, 2024, 3:17 AM

#

flat hawk Anybody has any idea why does my training loss seem to systematically dip?

Learning rate would be a good start. I suggest Optuna: set some parameter ranges, create a study, let it run for a while and check your results.

rich moth Dec 31, 2024, 3:31 AM

#

flat hawk Anybody has any idea why does my training loss seem to systematically dip?

I suggest a two-stage optimization approach using Optuna: start with an initial broad search, followed by a refined search around the best parameters.

#

This is my go to strategy
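A library-free sketch of the two-stage idea, using plain log-uniform random search as a stand-in for Optuna's sampler. In Optuna itself you would write an objective that calls `trial.suggest_float("lr", low, high, log=True)` and run it with `optuna.create_study(direction="minimize").optimize(objective, n_trials=...)`. The `toy_val_loss` function is hypothetical: in practice it would train the model and return validation loss.

```python
import math
import random

def toy_val_loss(lr: float) -> float:
    """Hypothetical stand-in for 'train with this lr, return validation loss'.
    It is smallest at lr = 1e-3."""
    return (math.log10(lr) + 3.0) ** 2 + 0.1

def random_search(low_exp, high_exp, n_trials, rng, best=(None, math.inf)):
    """Sample lr log-uniformly in [10**low_exp, 10**high_exp]; keep the best (lr, loss)."""
    best_lr, best_loss = best
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(low_exp, high_exp)
        loss = toy_val_loss(lr)
        if loss < best_loss:
            best_lr, best_loss = lr, loss
    return best_lr, best_loss

rng = random.Random(0)
LOW, HIGH = -5.0, 0.0  # search lr in [1e-5, 1]

# Stage 1: broad search across the whole range.
coarse_lr, coarse_loss = random_search(LOW, HIGH, n_trials=20, rng=rng)

# Stage 2: refined search within half a decade of the stage-1 winner (clamped to the range).
center = math.log10(coarse_lr)
fine_lr, fine_loss = random_search(max(LOW, center - 0.5), min(HIGH, center + 0.5),
                                   n_trials=20, rng=rng, best=(coarse_lr, coarse_loss))

print(f"stage 1: lr={coarse_lr:.2e}, loss={coarse_loss:.4f}")
print(f"stage 2: lr={fine_lr:.2e}, loss={fine_loss:.4f}")
```

Passing the stage-1 result in as `best` means stage 2 can only keep or improve on it.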

#

!paste

arctic wedgeBOT Dec 31, 2024, 3:40 AM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

rich moth Dec 31, 2024, 3:41 AM

#

flat hawk Anybody has any idea why does my training loss seem to systematically dip?

Something like this. https://paste.pythondiscord.com/ABYA

rich river Dec 31, 2024, 3:45 AM

#

odd meteor I think it's probably gonna be less annoying if you look at it this way 😀 By d...

there are too many operations to be done, like `.cuda()` and `.cpu()`, and I have to ensure the tensors being operated on are on the same device

midnight swallow Dec 31, 2024, 5:01 AM

#

hi guys, i am just starting out in data science and ai, can someone give me something like a mind map to follow?

fickle shale Dec 31, 2024, 5:46 AM

#

odd meteor While the number of ML job openings has exploded in recent years, the number of ...

Thanks for the advice! Great advice!

random dune Dec 31, 2024, 6:40 AM

#

weak oxide What dataset your using

https://www.kaggle.com/datasets/nasa/kepler-exoplanet-search-results?resource=download

Kepler Exoplanet Search Results

10000 exoplanet candidates examined by the Kepler Space Observatory

paper oracle Dec 31, 2024, 7:11 AM

#

Hi guys, I'm comfortable coding in Python but would like to start creating an AI, and I don't know where to start. Could anyone DM me how you guys started?

#

I've completed Harvard's CS50 Python course

odd meteor Dec 31, 2024, 7:33 AM

#

rich river there are too many operations to be done like `.cuda()`, `.cpu()` and I have to ...

I've always preferred verbosity over elegance. It helps me a lot to understand better when I'm learning new stuff. So if that helps, add comments on your code especially around areas that aren't looking 100% "customer friendly" yet.

You'll get used to it soon trust me 😀

odd meteor Dec 31, 2024, 7:33 AM

#

fickle shale Thanks for advice! Great Advice!

You're welcome

odd meteor Dec 31, 2024, 7:38 AM

#

midnight swallow hi guys, i am just starting data science and ai, can someone give me like a mind...

I usually recommend starting from https://kaggle.com/learn. You can also check the pinned post.

Learn Python, Data Viz, Pandas & More | Tutorials | Kaggle

Practical data skills you can apply immediately: that's what you'll learn in these no-cost courses. They're the fastest (and most fun) way to become a data scientist or improve your current skills.

clear condor Dec 31, 2024, 10:26 AM

#

so i made a working first order ODE solver

#

(for orbits)

#

now i'm trying to do 4th-order Runge-Kutta and it's just not working. i understand the math but i don't understand why it's not working

flat hawk Dec 31, 2024, 12:02 PM

#

rich moth I suggest a two stage optimization apprroach using optuna. Start with an intia...

thanks for the tips. Of what you wrote, I'm already using cosine annealing. I have yet to implement a grid search to iterate through learning rates

clear condor Dec 31, 2024, 12:12 PM

#

hELP

devout cloak Dec 31, 2024, 12:57 PM

#

clear condor now im trying to do Runge-Kutta 4th order and its just not working. i understand...

Without knowing your code or what you’re working with, a good place to start is your step size and whether what you’re seeing could be caused by it (the local error per step should be of order h^5, and the cumulative error of order h^4)

#

[test smaller dt and try to see if that helps, if so try and find a good or optimal balance between compute time and error reduction]
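A minimal RK4 sketch to check an implementation against, since the original code wasn't shared. Assumptions: a 2D two-body problem with GM = 1 and a circular orbit of radius 1 (speed 1, period 2π), with state stored as (x, y, vx, vy). Integrating exactly one period should return the body to its starting point, so the distance from (1, 0) is the global error. Halving dt should shrink that error by roughly 2^4 = 16 if the method really is fourth order. If it shrinks by only about 2, common RK4 slips include evaluating all four slopes at the same state, forgetting the dt/2 in stages 2 and 3, using weights other than 1-2-2-1 over 6, or advancing position and velocity with slopes from different stages.

```python
import math

def deriv(state):
    """state = (x, y, vx, vy); returns d(state)/dt for gravity toward the origin, GM = 1."""
    x, y, vx, vy = state
    r3 = (x * x + y * y) ** 1.5
    return (vx, vy, -x / r3, -y / r3)

def rk4_step(state, dt):
    """One classic RK4 step: four slopes, each evaluated at a different trial state."""
    k1 = deriv(state)
    k2 = deriv(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = deriv(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = deriv(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def orbit_error(n_steps):
    """Integrate one period of the circular orbit; return the distance from the start point."""
    state = (1.0, 0.0, 0.0, 1.0)
    dt = 2 * math.pi / n_steps
    for _ in range(n_steps):
        state = rk4_step(state, dt)
    return math.hypot(state[0] - 1.0, state[1])

e1, e2 = orbit_error(100), orbit_error(200)
print(e1, e2, e1 / e2)  # halving dt should shrink the error by roughly 2**4 = 16
```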

#

Also, depending on what you’re doing, you could consider an implementation like this https://www.sciencedirect.com/science/article/abs/pii/S0378475420300604

Again, I'm very unsure without seeing your code or which ODE or system of ODEs you’re working with

#

unrelated to above

I’m looking to start working on some new techniques for hyperparameter optimization in GANs, and I wondered if anyone has any interesting papers I should consider. I’m familiar with Optuna, but I’m looking to do some paper implementations for practice and experimentation

odd meteor Dec 31, 2024, 4:11 PM

#

SIGIR 2025 is happening in Europe as well. If you work on low-resource languages or information retrieval, you can submit a 2-page proposal to this venue and go present your work.

https://sigir2025.dei.unipd.it/index.html

SIGIR 2025

SIGIR 2025, Padua, 13-18 July | Home

The 48th International ACM SIGIR Conference on Research and Development in Information Retrieval | July 13-18, 2025 in Padua, Italy

lapis sequoia Dec 31, 2024, 4:59 PM

#

Poll: What should I learn? (FYI, I learnt Python very well before)

Winning answer: 2 votes (4 votes total)

fading wigeon Dec 31, 2024, 10:08 PM

#

lapis sequoia

Context? What are you trying to do? And just to clarify, what do JS and ML stand for? (I'm assuming JavaScript and machine learning but don't want to assume)

#

Also, I have a question. When you're trying to diagnose a model to see if getting more training data will help, what do you do?

My first thought would be to do a learning curve, train the model on subsets of your training set and look at your cross validation and training set error to determine if new data helps. But that can be computationally expensive so I understand that it's not done in practice. So how would you determine if expanding your data set would help? (I've worked in fields where it was exorbitantly expensive to acquire more data, so would only do it if it could be proven to help beforehand)
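A minimal learning-curve sketch with NumPy only, under loud assumptions: synthetic linear data with unit noise, ordinary least squares as the model, and a fixed held-out validation set. If validation error is still falling steeply at the largest training size and the train/validation gap is large, more data is likely to help; if the curve has flattened close to training error, it likely won't. For an expensive model, a handful of subset sizes is often enough to see the trend. scikit-learn's `sklearn.model_selection.learning_curve` automates the same loop with cross-validation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 10
true_w = rng.normal(size=n_features)

def make_data(n):
    """Synthetic linear data with unit-variance noise (hypothetical)."""
    X = rng.normal(size=(n, n_features))
    return X, X @ true_w + rng.normal(size=n)

def fit_ols(X, y):
    """Least-squares fit with an intercept column."""
    coef, *_ = np.linalg.lstsq(np.c_[np.ones(len(X)), X], y, rcond=None)
    return coef

def mse(coef, X, y):
    return float(np.mean((np.c_[np.ones(len(X)), X] @ coef - y) ** 2))

X_pool, y_pool = make_data(240)  # the data you already have
X_val, y_val = make_data(1000)   # held-out validation set

sizes = [15, 30, 60, 120, 240]
train_err, val_err = [], []
for n in sizes:
    coef = fit_ols(X_pool[:n], y_pool[:n])
    train_err.append(mse(coef, X_pool[:n], y_pool[:n]))
    val_err.append(mse(coef, X_val, y_val))

for n, tr, va in zip(sizes, train_err, val_err):
    print(f"n={n:4d}  train={tr:.3f}  val={va:.3f}  gap={va - tr:.3f}")
```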

fading wigeon Dec 31, 2024, 10:41 PM

#

Hey, I finally got a chance to go back and review this, thank you!

My only question is, what do you mean by training multiple output layers? I thought there was only a single output layer? Or am I missing something?

wheat merlin Dec 31, 2024, 11:35 PM

#

midnight swallow hi guys, i am just starting data science and ai, can someone give me like a mind...

Ok. So I feel pretty qualified to answer this as a current PhD student. I started from scratch, about 2 years ago, with a very rudimentary math background (basic calc).

What I did was try to learn and understand everything in the scikit-learn library. Just go through it, watch videos and look at examples. Then, in your own code, I would practice regression (linear, logit, and maybe fixed/random effects time series) -> then I would go to classification (learn random forest, logit, SVM, and other advanced versions you find interesting) -> then I would go to clustering and dimensionality reduction (k-means and PCA; you could go deeper, but I don't find this stuff as important/interesting personally) -> then I would learn some basic preprocessing (return to the previous regressions or whatever, and learn imputation, feature extraction, and normalization techniques)

Once you have learned all this, I feel like you can stand on your own in the machine learning space. Everything else you want to learn should come easier. Personally, from this point, I directed all my time into learning natural language processing. But if you find video/audio/image stuff more interesting, you could do that. The nice part about these more advanced techniques is that the majority of them run on the same 1-2 model architectures and can therefore be understood relatively easily.

#

From a natural language processing perspective, what I did was specialize in PyTorch (I don't like the Keras/TensorFlow libraries). Then I implemented my first model to classify my text dataset by positive/negative sentiment using a base model included in PyTorch. Once I learned the math and architecture behind these base models, I dedicated a ton of time to the older (2018) BERT models (not generative AI/LLMs). I taught myself, using the PyTorch documentation, to build RoBERTa from scratch, and implemented all the code for the tokenizer, attention mechanism, feedforward, dataloader, etc. This was the most informative project I did for sure. In the process, I made sure to understand all the function parameters to the best of my ability, which was definitely a really good thing. From here, I would say you are more than qualified to start reading research papers and digging into all the nascent advancements.

Sorry, this was a very long post, but I wish I had this when I started learning. The biggest thing I would say I learned about machine learning in general, is that it is a ton of different fields working together. You have to be an effective statistician, mathematician, programmer, data analyst, and data scientist to truly understand all the intricacies/complexities the field is moving to.

#

Obviously this was my experience as a research-focused individual; others may have had different experiences

#

I would also HIGHLY recommend that you work with data you find interesting and intriguing. You can answer any question about the data, and to really engage with the learning, I think it's super important you choose datasets you have questions about. Kaggle is a good website for this as is Nature's database, or Harvard Dataverse for academic papers.

wheat merlin Jan 1, 2025, 12:20 AM

#

fading wigeon Also, I have a question. When you're trying to diagnose a model to see if getti...

You could perhaps bootstrap the data and see how the test error looks; if it's far off the training accuracy, I would say that indicates a need for more data

#

Would be slightly more computationally feasible. I'm not sure of any other interesting methods, but there are probably some new advancements

odd meteor Jan 1, 2025, 12:21 AM

#

fading wigeon Hey, I finally got a chance to go back and review this, thank you! My only ques...

You're welcome.

Yeah, we have just 1 output layer. What I meant by "multiple output layers" is the MLP part: say, updating the last 2 or 3 layers, i.e., going beyond fine-tuning just the last layer.

fading wigeon Jan 1, 2025, 12:21 AM

#

What do you mean by bootstrapping the data in this context?

wheat merlin Jan 1, 2025, 12:22 AM

#

Using the dataset you have and creating extra rows, maybe 20% more if it's a few thousand.

#

I think that would be a good method for survey methodologists or social scientists to predict cost/benefit

#

I did some reading

#

I guess you could also just cross-validate on your set and look at the variability between folds
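A sketch of that fold-variability check, with NumPy only. The synthetic linear data and the ordinary least squares model are assumptions for illustration. A large spread of validation error across folds, together with validation error well above training error, hints that the model is data-limited; it's a heuristic, not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = rng.normal(size=5)

def make_data(n):
    """Synthetic linear data with unit-variance noise (hypothetical)."""
    X = rng.normal(size=(n, 5))
    return X, X @ true_w + rng.normal(size=n)

def kfold_mse(X, y, k=5, seed=0):
    """Validation MSE of least squares (with intercept) on each of k folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i, val in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        X_tr = np.c_[np.ones(len(train)), X[train]]
        X_va = np.c_[np.ones(len(val)), X[val]]
        coef, *_ = np.linalg.lstsq(X_tr, y[train], rcond=None)
        scores.append(float(np.mean((X_va @ coef - y[val]) ** 2)))
    return np.array(scores)

# Compare the spread across folds for a small and a larger dataset.
for n in (40, 400):
    X, y = make_data(n)
    scores = kfold_mse(X, y)
    print(f"n={n}: mean={scores.mean():.3f}, std across folds={scores.std():.3f}")
```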

fading wigeon Jan 1, 2025, 12:54 AM

#

Yeah I suppose

fading wigeon Jan 1, 2025, 1:20 AM

#

A learning curve still seems to be the ideal way, but... I suppose it can sometimes be hard to justify the computational resources

#

or you could be in a field where it's cheaper to get more data than it is to do the learning curve

#

(I worked in neuroscience and more data meant conducting an expensive study that costs like 40k per patient)

dusty viper Jan 1, 2025, 1:24 AM

#

Happy new year guys. I’m a sixth form student in the UK and am confident in my basic Python coding skills. I would just like some advice on how to get started with my AI journey. My aim by the end of next year is to build a chatbot that helps with finance, well, that's the end goal😓. Can you guys direct me to some courses or something that’ll help me learn how to apply my skills to AI and start my journey? It would be great to receive advice from you guys who have been doing this for a while.

fading wigeon Jan 1, 2025, 1:25 AM

#

Can you give some examples of what you might want it to do?

#

My first thought is to think of it as two separate projects. A chatbot interface then a finance model/algorithms for specific financial problems

#

Unless you want it to just give generic/general finance advice I suppose

dusty viper Jan 1, 2025, 1:30 AM

#

It’s for my NEA, and I want it to be based around investing in stocks (because I have a lot of prior knowledge about that field) and choosing the best ETFs to invest in over the long term. But initially I just want to start learning how to code and actually make a chatbot. Can you give me a starting point, such as a course or a video that’ll help me?

#

I’ve done a decent amount of game dev and got kinda bored of it so i wanted to move over to something i find more interesting such as AI

cold goblet Jan 1, 2025, 3:51 AM

#

pivot according to my course slides 💀

#

I feel like I am losing my mind

#

can someone confirm that this is in fact not pivot

wheat merlin Jan 1, 2025, 3:55 AM

#

i think of pivot as changing from wide to long (or long to wide), but I guess you could argue there's like "pivot", "pivot_wider" and "pivot_longer"

#

? maybe idk lol

#

definitely feels confusing though

cold goblet Jan 1, 2025, 3:58 AM

#

I am honestly looking at it and, like this isn't actually doing anything but then it's in the slides

earnest canyon Jan 1, 2025, 3:59 AM

#

Happy New Year to Everyone ✨💫

serene scaffold Jan 1, 2025, 4:01 AM

#

cold goblet pivot according to my course slides 💀

If I just read that definition without context, I'd think it's about transposing
That said, I'd have to concentrate to come up with a good definition of pivoting that's easy to understand

cold goblet Jan 1, 2025, 4:07 AM

#

serene scaffold If I just read that definition without context, I'd think it's about transposing...

it's not pivot though, right?

#

I might still have to write it as pivot on the answer sheet but still

serene scaffold Jan 1, 2025, 4:08 AM

#

cold goblet it's not pivot though, right?

The terms used in this space aren't absolute

cold goblet Jan 1, 2025, 4:09 AM

#

serene scaffold The terms used in this space aren't absolute

yeah, but it's supposed to mean a database pivot I think. idk

#

like a group by, I guess??

serene scaffold Jan 1, 2025, 4:09 AM

#

Uh no

#

What is the question?

#

When the rows become columns and the columns become rows, that's called transposing. And then pivoting is a specific thing that is completely different from that.

But if your instructor uses those words differently than everyone I know, I can't make them stop.
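A small pandas sketch of the distinction above, on made-up sales data. Transposing just swaps rows and columns. Pivoting (long to wide) turns the distinct values of one column into new column headers. `melt` goes back from wide to long.

```python
import pandas as pd

# Hypothetical long-format data: one row per (city, year) pair.
long_df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Lima", "Lima"],
    "year": [2023, 2024, 2023, 2024],
    "sales": [10, 12, 7, 9],
})

# Transposing: rows become columns and columns become rows; nothing is regrouped.
transposed = long_df.T
print(transposed.shape)  # (3, 4)

# Pivoting (long -> wide): the distinct "year" values become column headers,
# with one row per city and "sales" filling the cells.
wide_df = long_df.pivot(index="city", columns="year", values="sales")
print(wide_df)

# Melting (wide -> long) undoes the pivot.
back_to_long = (wide_df.rename_axis(columns=None)
                .reset_index()
                .melt(id_vars="city", var_name="year", value_name="sales"))
print(back_to_long)
```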

cold goblet Jan 1, 2025, 4:12 AM

#

The question is "what are the possible operations in OLAP?", and the operations are drill up, drill down, slice, dice and pivot

serene scaffold Jan 1, 2025, 4:12 AM

#

Idk what olap is

cold goblet Jan 1, 2025, 4:12 AM

#

ahh. okay

serene scaffold Jan 1, 2025, 4:13 AM

#

Sounds like it's a certain way of conceptualizing large stores of structured data

cold goblet Jan 1, 2025, 4:14 AM

#

I'm thinking that too, but there's no mention of what exactly the operation is supposed to accomplish or what the possible use case is.

wheat merlin Jan 1, 2025, 4:17 AM

#

I think it's just to display the information to a viewer differently in an eventual table or something

neat sparrow Jan 1, 2025, 4:28 AM

#

Is HuggingFace good for dataset gathering and NLP libraries such as NLTK?

iron basalt Jan 1, 2025, 4:34 AM

#

cold goblet pivot according to my course slides 💀

https://www.ibm.com/think/topics/olap

What is OLAP? | IBM

Learn more about OLAP, a core component of data warehousing implementations to enable fast, flexible data analysis.

#

It's basically an implementation detail for databases that want to support operations that touch a lot of the data, as opposed to transactional ones.

#

Example database that supports OLAP: https://duckdb.org/

DuckDB

An in-process SQL OLAP database management system

DuckDB is an in-process SQL OLAP database management system. Simple, feature-rich, fast & open source.

#

Basically, OLTP oriented is lots of small (and simple) queries (that often edit state). OLAP oriented is a few very large (and complex) queries that are for analysis (often read only). Technically you can have some database that can do both well.

#

All the table libraries used in data science would usually fall under OLAP or be OLAP-adjacent. OLAP is a specific thing (the cube), though maybe also not so specific; it's a bit hand-wavy. Either way, these libraries are for analysis, not for doing a bunch of transactions.

#

(However, OLAP's specific terms for stuff like pivot mean something else; it's internal terminology)
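A rough pandas analogue of the five OLAP operations from the course question, on a made-up fact table. Real OLAP engines work on cubes, so treat this only as an intuition aid. In the OLAP sense, "pivot" (also called rotate) shows the same aggregates with the axes of the view swapped, which is why it can look like a transpose.

```python
import pandas as pd

# Hypothetical fact table: one row per sale, with three dimensions and one measure.
sales = pd.DataFrame({
    "year":    [2023, 2023, 2023, 2024, 2024, 2024],
    "quarter": ["Q1", "Q2", "Q1", "Q1", "Q2", "Q2"],
    "region":  ["EU", "EU", "US", "EU", "US", "US"],
    "amount":  [100, 150, 80, 120, 90, 60],
})

# Drill down: aggregate at a finer level (year + quarter).
by_quarter = sales.groupby(["year", "quarter"])["amount"].sum()

# Drill up (roll-up): aggregate at a coarser level (year only).
by_year = sales.groupby("year")["amount"].sum()

# Slice: fix one dimension to a single value.
eu_only = sales[sales["region"] == "EU"]

# Dice: restrict several dimensions at once, giving a sub-cube.
dice = sales[(sales["year"] == 2024) & sales["quarter"].isin(["Q1", "Q2"]) & (sales["region"] == "US")]

# Pivot (rotate): the same aggregates, with the axes of the view swapped.
view = sales.pivot_table(index="region", columns="year", values="amount", aggfunc="sum")
rotated = sales.pivot_table(index="year", columns="region", values="amount", aggfunc="sum")
print(view)
print(rotated)
```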

neat sparrow Jan 1, 2025, 4:44 AM

#

neat sparrow Is HuggingFace good for dataset gathering and NLP libraries such as NLTK?

(This is for a text based ai that can respond to a user)

iron basalt Jan 1, 2025, 4:44 AM

#

(Like how you don't need to care about what ACID is; the database just does the thing you want without you needing to know that)

wheat merlin Jan 1, 2025, 4:55 AM

#

neat sparrow (This is for a text based ai that can respond to a user)

yeah for sure, a lot of the datasets used in current research articles are uploaded to huggingface

neat sparrow Jan 1, 2025, 5:08 AM

#

wheat merlin yeah for sure, a lot of the datasets used in current research articles are uploa...

Are there better choices? Or do you have any suggestions...?

wheat merlin Jan 1, 2025, 5:10 AM

#

The best datasets that are used to train current models are on huggingface

#

if you want like "fun" datasets, you could look at Kaggle

#

if you want social science datasets like political information you could try Harvard Dataverse

neat sparrow Jan 1, 2025, 5:11 AM

#

What about math or history ones?

#

Thank you for the info by the way

wheat merlin Jan 1, 2025, 5:13 AM

#

Ya, idk about history exactly, but probably on Harvard Dataverse

#

if you are interested in, like, the history of wars or something, there is Correlates of War, which is a dataset you can find

#

https://www.dataverse.pitt.edu/external/datasets.php

#

here are some other history ones too

neat sparrow Jan 1, 2025, 5:16 AM

#

Thanks. One more thing: according to Claude AI, I should use spaCy over NLTK for natural language processing. Is that a good choice in my case, based on what it seems I'm using?

wheat merlin Jan 1, 2025, 5:24 AM

#

I actually havent used either, I have only used PyTorch and HuggingFace

#

It looks like NLTK has more options and may be a bit more customizable

rich moth Jan 1, 2025, 7:04 AM

#

!paste


gritty vessel Jan 1, 2025, 9:04 AM

#

Hey, is there any tutorial for training an autoencoder on a custom dataset?

#

My Input shape is 453,958

#

And it always comes out as 456,960

#

As output

unkempt apex Jan 1, 2025, 9:36 AM

#

gritty vessel Hey is there any tutorial to train auto encoder on custom dataset?

https://www.tensorflow.org/tutorials/generative/autoencoder
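A likely explanation of the 453×958 → 456×960 mismatch, as shape arithmetic only. The architecture wasn't shared, so this assumes the encoder has three stride-2 downsampling layers with "same" padding (each rounds the size up, ceil(n / 2)) and the decoder upsamples by 2 three times. Under that assumption the numbers match exactly: 453 → 227 → 114 → 57 → 456 and 958 → 479 → 240 → 120 → 960.

```python
import math

def same_padding_down(size: int, n_layers: int, stride: int = 2) -> int:
    """Spatial size after n stride-s layers with 'same' padding: ceil(size / s) each time."""
    for _ in range(n_layers):
        size = math.ceil(size / stride)
    return size

def autoencoder_output_size(size: int, n_layers: int = 3, stride: int = 2) -> int:
    """Encoder downsamples n times; decoder upsamples by the same stride n times."""
    return same_padding_down(size, n_layers, stride) * stride ** n_layers

def pad_to_multiple(size: int, multiple: int) -> int:
    """Smallest size >= `size` that divides evenly by `multiple`."""
    return math.ceil(size / multiple) * multiple

for dim in (453, 958):
    print(dim, "->", autoencoder_output_size(dim), "| pad input to", pad_to_multiple(dim, 2 ** 3))
```

If the assumption holds, two common fixes are padding the input up to a multiple of 8 (456×960) before training, or cropping the decoder output back to 453×958 before computing the loss.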

neat sparrow Jan 1, 2025, 5:50 PM

#

wheat merlin yeah for sure, a lot of the datasets used in current research articles are uploa...

Do you (or anyone who can answer this) believe this dataset: https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu can be processed on a high-end household computer at decent speed (if it has the storage space and cpu for it)?

HuggingFaceFW/fineweb-edu · Datasets at Hugging Face

wheat merlin Jan 1, 2025, 5:51 PM

#

No, I think you would need multiple GPUs and like 1 TB of RAM lol

#

it's 9.45 TB of hard drive space

neat sparrow Jan 1, 2025, 5:53 PM

#

wheat merlin its 9.45 TB of harddrive space

I'm aware...

neat sparrow Jan 1, 2025, 5:54 PM

#

wheat merlin No, I think you would need multiple GPU's and like 1tb of RAM lol

Dang, alright.

wheat merlin Jan 1, 2025, 5:54 PM

#

You could try and see if there is a subset, or you could randomly sample the data

neat sparrow Jan 1, 2025, 5:54 PM

#

wheat merlin You could try and see if there is a subset, or you could random sample the data

What do you mean exactly?

wheat merlin Jan 1, 2025, 5:55 PM

#

Like randomly select 1 million rows out of the 1 billion, so that you have a smaller dataset to work with

#

Is there anything you like about that dataset in particular? I could try and help you find a more reasonable one for home compute

#

https://www.reddit.com/r/LocalLLaMA/comments/1dl1k61/comment/l9m89zo/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

mark-lord's comment on "FineWeb-Edu is actually nuts"

Explore this conversation and more from the LocalLLaMA community

#

I would recommend trying this

#

So they import the dataset with huggingface and they are selecting specific keywords in the rows

#

it would probably take a long time to process, but you could get a few million rows to play with

#

just replace the keywords with stuff that you are interested in
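A sketch of keyword filtering plus random sampling over a stream, so the full dataset never has to sit in memory. It uses reservoir sampling (Algorithm R), which is a swapped-in technique, not what the linked comment does. Assumptions: rows are dicts with a `"text"` field, and a synthetic generator stands in for the real stream. With the Hugging Face `datasets` library, such an iterator can come from `load_dataset(..., streaming=True)`, which is not called here.

```python
import random
from typing import Iterable, Iterator

def keyword_filter(rows: Iterable[dict], keywords: list[str], field: str = "text") -> Iterator[dict]:
    """Yield only rows whose `field` contains at least one keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    for row in rows:
        text = row.get(field, "").lower()
        if any(k in text for k in lowered):
            yield row

def reservoir_sample(rows: Iterable[dict], k: int, seed: int = 0) -> list[dict]:
    """Uniform random sample of k rows from a stream of unknown length (Algorithm R),
    keeping at most k rows in memory."""
    rng = random.Random(seed)
    sample: list[dict] = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample

# Hypothetical stand-in for a streamed dataset.
fake_stream = ({"text": f"row {i} about {'history' if i % 3 == 0 else 'cooking'}"} for i in range(10_000))
subset = reservoir_sample(keyword_filter(fake_stream, ["history", "math"]), k=100)
print(len(subset))
```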

neat sparrow Jan 1, 2025, 5:59 PM

#

I'm looking for a dataset that's good enough to be able to give the model enough data to be able to answer most daily questions.

#

And questions with specific grammatical words

#

And also questions for programming, math, and history.

#

(Not all of those in one though. I'm okay with multiple small datasets that can accommodate those.)

wheat merlin Jan 1, 2025, 6:04 PM

#

As far as academic stuff goes, the dataset you found would be good. You can use that code and replace the keywords with history or math questions

#

I think for more everyday stuff, you would want to use this dataset, which is the general FineWeb, without the educational filtering used for FineWeb-Edu:
https://huggingface.co/datasets/HuggingFaceFW/fineweb

HuggingFaceFW/fineweb · Datasets at Hugging Face

#

And for programming, there are separate datasets like: https://huggingface.co/datasets/Programming-Language/codeagent-python

Programming-Language/codeagent-python · Datasets at Hugging Face

#

This Python dataset would probably not take an obscene amount of time to model; the train split is 300k rows

#

https://github.com/markhliu/DGAI/blob/main/README.md

GitHub

DGAI/README.md at main · markhliu/DGAI

Learn Generative AI with PyTorch (Manning Publications, 2024) - markhliu/DGAI

#

I found this book too, it looks like the end chapter (chapter 16) teaches you how to "use the LangChain library to combine pre-trained large language models (LLMs) with Wolfram Alpha and Wikipedia APIs to create a zero-shot know-it-all personal assistant."

#

which seems kinda like what you want

#

If you are interested in this book, i would recommend doing chapter 8 through 12, then 16 if you want to converge your models

neat sparrow Jan 1, 2025, 6:35 PM

#

wheat merlin I think for more daily stuff, yoiu would want to use this dataset, which is just...

I already looked at that one... and I'm not using it because you should probably check how much space it uses...

wheat merlin Jan 1, 2025, 6:36 PM

#

yeah, its crazy

#

I think with the huggingface library though you can import the dataset iteratively so you don't have to download it or anything

#

if you wanted to skim through all the rows to get the data you want it would take forever of course

neat sparrow Jan 1, 2025, 6:38 PM

#

wheat merlin I found this book too, it looks like the end chapter (chapter 16) teaches you ho...

Ah. Seems interesting. I'll check out that one as well. I'm going to most likely use a pre-trained model, but I was thinking about using that to make a zero-shot.

#

I already found some good wikipedia datasets that are cleaned

wheat merlin Jan 1, 2025, 6:39 PM

#

Nice. Yeah, I would start with that. You can always just replace the dataset if you find a better one. The model architecture would stay basically the same

neat sparrow Jan 1, 2025, 6:39 PM

#

wheat merlin if you wanted to skim through all the rows to get the data you want it would tak...

Yeah... 😂

wheat merlin Jan 1, 2025, 6:40 PM

#

Have you done any implementation of pre-trained models before?

neat sparrow Jan 1, 2025, 6:40 PM

#

wheat merlin Have you done any implementation of pre-trained models before?

Yea, once.

wheat merlin Jan 1, 2025, 6:41 PM

#

Cool. If you need any help or have questions about the basic premises of the math behind neural nets you can @ me

neat sparrow Jan 1, 2025, 6:41 PM

#

I used BERT to pre-train a model with some medium datasets

wheat merlin Jan 1, 2025, 6:42 PM

#

cool, I did that over the summer for a polling firm I was working for

#

BERT is a great way to learn neural nets for sure

neat sparrow Jan 1, 2025, 6:46 PM

#

wheat merlin cool, I did that over the summer for a polling firm I was working for

Nice, man. I haven't tried this yet, but can I pre-train this using multiple models?

wheat merlin Jan 1, 2025, 6:47 PM

#

What do you mean

#

Like train multiple models for classification and linking them together? Or something else

neat sparrow Jan 1, 2025, 6:47 PM

#

wheat merlin Like train multiple models for classification and linking them together? Or some...

Essentially, yes.

wheat merlin Jan 1, 2025, 6:48 PM

#

I think you can do that. I haven't done it personally. But it appears the LangChain library is what you would want to research

#

I know they use LangChain to make the multi-modal models that have image, chat, and other technologies all in one

#

Not entirely sure about classification models though

neat sparrow Jan 1, 2025, 6:50 PM

#

Alright, thanks, I think I've got enough answers for now. Is it okay if I dm you next time to not use up the chat?

wheat merlin Jan 1, 2025, 6:50 PM

#

yeah 👍

solemn silo Jan 1, 2025, 8:58 PM

#

Please can someone help me understand the chain rule, because I swear I have spent forever, like a few months, and I still don't understand it 💢 ‼️

serene scaffold Jan 1, 2025, 9:02 PM

#

solemn silo Please can someone help me understand the chain rule because i swear I have spen...

do you understand recursion?

#

also, can you tell me the derivative (wrt x) of 4x^3?
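A small numerical check of the chain rule, building on the 4x^3 example. Take the outer function g(u) = 4u^3 and the inner function u(x) = sin(x); the choice of sin is just an example. The chain rule says f'(x) = g'(u(x)) * u'(x) = 12 sin(x)^2 * cos(x). The code compares that with a finite-difference estimate. Backpropagation in neural networks applies this same "multiply the local derivatives" step, layer by layer.

```python
import math

def numerical_derivative(f, x: float, h: float = 1e-6) -> float:
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def g(u: float) -> float:        # outer function
    return 4 * u ** 3

def g_prime(u: float) -> float:  # power rule: d/du 4u^3 = 12u^2
    return 12 * u ** 2

def f(x: float) -> float:        # composition: f(x) = g(u(x)) with u(x) = sin(x)
    return g(math.sin(x))

def chain_rule_derivative(x: float) -> float:
    """Chain rule: f'(x) = g'(u(x)) * u'(x), with u'(x) = cos(x)."""
    return g_prime(math.sin(x)) * math.cos(x)

x = 0.7
print(chain_rule_derivative(x), numerical_derivative(f, x))  # the two should agree closely
```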

paper cove Jan 1, 2025, 11:22 PM

#

hi

#

is there any place to start learning tokenization and neural networks

#

more on fundamental or mathematical level

wheat merlin Jan 1, 2025, 11:25 PM

#

paper cove more on fundamental or mathematical level

https://www.youtube.com/@statquest

YouTube

StatQuest with Josh Starmer

Statistics, Machine Learning and Data Science can sometimes seem like very scary topics, but since each technique is really just a combination of small and simple steps, they are actually quite simple. My goal with StatQuest is to break down the major methodologies into easy to understand pieces. That said, I don't dumb down the material. Instea...

#

this guy is REALLY good if you need visuals like I do

serene scaffold Jan 1, 2025, 11:28 PM

#

paper cove is there any place to start learning tokenization and neural networks

tokenization isn't mathematical. it's just "what is considered a word in this context, and where are the word boundaries?"

paper cove Jan 1, 2025, 11:28 PM

#

i meant for neural networks

serene scaffold Jan 1, 2025, 11:29 PM

#

I know.

#

you don't really need to "learn tokenization". you just use an existing tokenizer. making your own tokenizer is very advanced and unusual.

paper cove Jan 1, 2025, 11:29 PM

#

but it is hard to tokenize things in other languages

serene scaffold Jan 1, 2025, 11:30 PM

#

what are you trying to do?

paper cove Jan 1, 2025, 11:30 PM

#

like in Japanese, we don't use spaces to separate words

#

it is all continuous

wheat merlin Jan 1, 2025, 11:30 PM

#

it's not that hard to make a tokenizer

serene scaffold Jan 1, 2025, 11:30 PM

#

paper cove like in japanese, we don't use space to differentiate in words

and I maintain that you do not need to "learn tokenization". you can look for a Japanese tokenizer.

paper cove Jan 1, 2025, 11:31 PM

#

serene scaffold and I maintain that you do not need to "learn tokenization". you can look for a ...

i see

#

by the way, is it right to understand a tokenizer as a function that converts a string into an n-dimensional vector?

#

where there are 'n' words in entire dictionary

#

and less than 'n' words in the string

wheat merlin Jan 1, 2025, 11:34 PM

#

Right, for older models. For newer models there is text embedding which adds some more complexity

rich moth Jan 1, 2025, 11:34 PM

#

Looking for some feedback on this idea. What am I missing here? I’ve been brainstorming a generalized phase encoding system that adapts to different data types and computes a complexity score. The core equation is Φ(x) = x + A * exp(iθ(x)), where θ(x) captures intrinsic structure depending on the data type:

Time Series: θ(x) = ω * t + φ(volatility, trend_strength) (e.g., trends, seasonality).
Images: θ(x) = Σ(spatial_frequency * position + texture_density) (patterns + textures).
Text: θ(x) = semantic_embedding * syntactic_structure (word embeddings + grammar).
Tabular: θ(x) = Σ(feature_importance * value_normalization) (relationships between columns).
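A minimal sketch of the time-series case of the formula above, assuming a constant phase offset in place of φ(volatility, trend_strength) (the message leaves that function undefined) and made-up defaults for A and ω. It only computes Φ(x); the complexity score isn't specified, so it is left out.

```python
import cmath
import math


def phase_encode(values, amplitude=1.0, omega=2 * math.pi / 12, phase_offset=0.0):
    """Phi(x_t) = x_t + A * exp(i * theta_t), with theta_t = omega * t + phase_offset.

    Hypothetical: phase_offset is a constant stand-in for phi(volatility, trend_strength),
    and the default omega (a 12-step cycle) is an arbitrary choice.
    """
    encoded = []
    for t, x in enumerate(values):
        theta = omega * t + phase_offset
        encoded.append(x + amplitude * cmath.exp(1j * theta))
    return encoded


print(phase_encode([0.0, 2.0, -1.0], amplitude=0.5))
```

Each encoded value sits at distance A from the original x in the complex plane; only the angle carries the θ(x) structure.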

paper cove Jan 1, 2025, 11:35 PM

#

wheat merlin Right, for older models. For newer models there is text embedding which adds som...

but shouldn't there also be something like word order?

#

so how can i preserve the order of words in a vector if i use it?

wheat merlin Jan 1, 2025, 11:37 PM

#

So I think methods like word2vec keep an order of words, but to reduce computational costs they reduce the words to a lower-dimensional vector

#

so if your vocab was 1000 words, you can set word2vec to compress each word down to 300 dimensions

#

so you are left with a vector of length 300 that still preserves the order of words, just more efficiently

paper cove Jan 1, 2025, 11:39 PM

#

what does the vector consist of, btw?

wheat merlin Jan 1, 2025, 11:40 PM

#

I also think the vectors are grouped by context with the embeddings, so they aren't necessarily in the same order they came in

serene scaffold Jan 1, 2025, 11:40 PM

#

paper cove by the way understanding a tokenizer as a function that converts a string into a...

No, that's an incorrect understanding. At its core, a tokenizer just splits something into individual words (tokens). what you're describing is a one-hot representation of a token.

paper cove Jan 1, 2025, 11:41 PM

#

serene scaffold No, that's an incorrect understanding. At its core, a tokenizer just splits some...

i see, so i just get a list of words from a string

#

in the same order

serene scaffold Jan 1, 2025, 11:42 PM

#

paper cove i see, so i just get a list of words from a string

pretty much. if you want the tokenizer (as an object in your code) to return those tokens as one-hot vectors, you can do it like that, but that's extra functionality in addition to the actual tokenization.
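To make the distinction concrete: tokenizing produces the list of strings, and one-hot encoding is a separate step on top of it. A small sketch with a made-up five-word vocabulary; note that a sentence becomes a sequence of `len(vocab)`-dimensional vectors, one per token, rather than a single vector.

```python
def one_hot(tokens, vocab):
    """Return one vector per token: all zeros except a 1 at that token's index."""
    vectors = []
    for tok in tokens:
        vec = [0] * len(vocab)
        vec[vocab[tok]] = 1
        vectors.append(vec)
    return vectors


# Hypothetical vocabulary and an already-tokenized sentence
vocab = {"this": 0, "is": 1, "a": 2, "cat": 3, ".": 4}
tokens = ["this", "is", "a", "cat", "."]

for tok, vec in zip(tokens, one_hot(tokens, vocab)):
    print(tok, vec)
```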

wheat merlin Jan 1, 2025, 11:43 PM

#

right, so the current method is embedding which looks like this:

#

so you can limit to the nth dimension, instead of the vocab size

serene scaffold Jan 1, 2025, 11:43 PM

#

wheat merlin so you can limit to the nth dimension, instead of the vocab size

it sounds like you're conflating unrelated concepts

wheat merlin Jan 1, 2025, 11:44 PM

#

and vectors incorporate context by having values similar to each other

paper cove Jan 1, 2025, 11:44 PM

#

wheat merlin right, so the current method is embedding which looks like this:

sorry i don't get it, i meant how to convert strings to vectors while preserving the order of words
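One way to see the word-order question concretely, as a sketch with a made-up three-word vocabulary: collapsing a sentence into a single count vector (bag of words) throws the order away, while keeping one id (or one vector) per position preserves it. Transformer-style models keep that per-position sequence and add position information on top.

```python
from collections import Counter


def bag_of_words(tokens):
    """Order-free representation: only how often each token occurs."""
    return Counter(tokens)


def to_ids(tokens, vocab):
    """Order-preserving representation: one id per position."""
    return [vocab[tok] for tok in tokens]


vocab = {"dog": 0, "bites": 1, "man": 2}
a = "dog bites man".split()
b = "man bites dog".split()

print(bag_of_words(a) == bag_of_words(b))  # True: counts can't tell them apart
print(to_ids(a, vocab), to_ids(b, vocab))  # [0, 1, 2] [2, 1, 0]
```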

serene scaffold Jan 1, 2025, 11:44 PM

#

if you have a vocabulary of m words and you want to represent them as n dimensional vectors, m and n are unrelated.
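A minimal sketch of that point, assuming randomly initialized values: an embedding is an m × n lookup table with one n-dimensional row per vocabulary entry, so the vocabulary size m and the vector size n are chosen independently. In a trained model the rows are learned parameters, not random numbers.

```python
import random


def make_embedding_table(vocab_size, dim, seed=0):
    """An (m x n) table: row i is the n-dimensional vector for token id i.

    Random init here; a real model learns these values during training.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]


def embed(ids, table):
    """Look up one row per token id."""
    return [table[i] for i in ids]


m, n = 14, 3  # 14-word vocabulary, 3-dimensional vectors
table = make_embedding_table(m, n)
vectors = embed([3, 7, 10], table)
print(len(table), len(table[0]))  # 14 3
```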

paper cove Jan 1, 2025, 11:46 PM

#

for example, the sentence is "This is a cat."

wheat merlin Jan 1, 2025, 11:46 PM

#

are you asking specifically for purposes like translation?

paper cove Jan 1, 2025, 11:46 PM

#

tokenizer will convert it into ["This", "is", "a", "cat."]

wheat merlin Jan 1, 2025, 11:46 PM

#

yes

paper cove Jan 1, 2025, 11:46 PM

#

wheat merlin are you asking specifically for purposes like translation?

no, i am asking for generative models

serene scaffold Jan 1, 2025, 11:47 PM

#

it would probably have the "." as its own token, because you don't want "cat" and "cat." to be different

paper cove Jan 1, 2025, 11:47 PM

#

i see

paper cove Jan 1, 2025, 11:47 PM

#

paper cove tokenizer will convert it into ["This", "is", "a", "cat."]

["This", "is", "a", "cat", "."]

#

so i'd get this output from the tokenizer, ig

serene scaffold Jan 1, 2025, 11:47 PM

#

and it might also treat "This" and "this" as the same token
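A toy word-level tokenizer that does both things mentioned here (splitting off punctuation and lowercasing), sketched with a regular expression. It's only for illustration; real models ship with their own trained tokenizers.

```python
import re


def simple_tokenize(text, lowercase=True):
    """Split text into runs of word characters and single punctuation marks."""
    if lowercase:
        text = text.lower()
    return re.findall(r"\w+|[^\w\s]", text)


print(simple_tokenize("This is a cat."))  # ['this', 'is', 'a', 'cat', '.']
```

On Japanese text with no spaces, `\w+` would grab a whole run of characters as a single token, which is one reason a dedicated Japanese tokenizer is the practical choice there.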

paper cove Jan 1, 2025, 11:47 PM

#

now i assign them numbers?

paper cove Jan 1, 2025, 11:48 PM

#

serene scaffold and it might also treat "This" and "this" as the same token

i see

#

["this", "is", "a", "cat", "."]

serene scaffold Jan 1, 2025, 11:48 PM

#

paper cove now i assign them numbers?

for generative models like the GPT family, you have to use the tokenizer that comes with the model.

wheat merlin Jan 1, 2025, 11:48 PM

#

yeah I think GPT uses RoPE for its embeddings

#

but other models use different things

rich moth Jan 1, 2025, 11:49 PM

#

Sometimes words like unbreakable will get split
["un", "break", "able"]
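A toy version of that kind of subword split: greedy longest-match against a small, hypothetical hand-written vocabulary. This is similar in spirit to how WordPiece applies its vocabulary at inference time, but real subword tokenizers such as BPE or WordPiece learn their vocabularies from data and usually mark word-internal pieces, which this sketch skips.

```python
def greedy_subword_split(word, vocab):
    """Split `word` into the longest vocabulary pieces, left to right.

    Returns None if some position can't be matched by any piece.
    """
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                pieces.append(piece)
                start = end
                break
        else:
            return None  # no vocabulary piece starts at this position
    return pieces


vocab = {"un", "break", "able"}
print(greedy_subword_split("unbreakable", vocab))  # ['un', 'break', 'able']
```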

paper cove Jan 1, 2025, 11:49 PM

#

i see

wheat merlin Jan 1, 2025, 11:50 PM

#

that's called lemmatization? or stemming?

serene scaffold Jan 1, 2025, 11:50 PM

#

wheat merlin that's called lemmatization? or stemming?

I've heard it called subtokenization. when a word is composed of individual parts that have their own meaning, those parts are called morphemes.

wheat merlin Jan 1, 2025, 11:51 PM

#

yeah that makes sense

paper cove Jan 1, 2025, 11:51 PM

#

if my dictionary contains the following words:

1. a
2. an
3. the
4. this
5. that
6. these
7. those
8. is
9. are
10. am
11. cat
12. dog
13. .
14. ,

wheat merlin Jan 1, 2025, 11:51 PM

#

Stemming appears to be removing common suffixes and lemmatization reduces words to their root form
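A rough illustration of that difference, assuming a made-up suffix list and a tiny hand-written lemma table (real tools such as NLTK's Porter stemmer or spaCy's lemmatizer are far more thorough). Stemming chops suffixes mechanically and can leave non-words, while lemmatization maps a word to its dictionary form, including irregular forms.

```python
SUFFIXES = ("ing", "ed", "ly", "s")  # hypothetical, deliberately crude


def crude_stem(word):
    """Strip the first matching suffix; the result need not be a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


LEMMAS = {"running": "run", "ran": "run", "better": "good", "cats": "cat"}  # toy table


def lookup_lemma(word):
    """Dictionary lookup to the base form; unknown words pass through unchanged."""
    return LEMMAS.get(word, word)


for w in ["running", "ran", "cats"]:
    print(w, crude_stem(w), lookup_lemma(w))
# running runn run
# ran ran run
# cats cat cat
```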

paper cove Jan 1, 2025, 11:51 PM

#

so i use a 14-dimensional vector?

serene scaffold Jan 1, 2025, 11:52 PM

#

paper cove if my dictionary contains the following words. 1. a 2. an 3. the 4. this 5. that...

most of those words are considered "stop words", which are words that don't carry any intrinsic meaning.

serene scaffold Jan 1, 2025, 11:52 PM

#

paper cove so i use a 14-dimensional vector?

you're assuming that you're going to one-hot encode each token. which isn't a foregone conclusion.

paper cove Jan 1, 2025, 11:54 PM

#

so models play a very important role in what input they take

#

i was thinking of converting a string into some form that won't lose any information and is computable, so that any model can use it

serene scaffold Jan 1, 2025, 11:56 PM

#

paper cove so models play a very important role in what input they take

models don't "decide" what input they take, if that's what you mean. the structure of the input is something you have to decide when you decide on the model architecture.

serene scaffold Jan 1, 2025, 11:57 PM

#

paper cove i was thinking of converting a string into some form that won't lose any information...

that's not possible, no.

paper cove Jan 1, 2025, 11:57 PM

#

i see

serene scaffold Jan 1, 2025, 11:58 PM

#

when you map tokens to integers (like assigning "cat" to 42069), that mapping is arbitrary. it's only significant for the model insofar as you always use the same integer for the same token.
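A small sketch of that idea with a made-up vocabulary: the ids carry no meaning of their own, but the same dictionary has to be reused for every input. Mapping unseen words to a reserved `<unk>` id is a common convention for word-level vocabularies and is an assumption here.

```python
UNK = "<unk>"


def encode(tokens, vocab):
    """Map tokens to ids with a fixed vocabulary; unseen tokens get the <unk> id."""
    return [vocab.get(tok, vocab[UNK]) for tok in tokens]


# The particular numbers are arbitrary; consistency is what matters.
vocab = {UNK: 0, "this": 1, "is": 2, "a": 3, "cat": 4, ".": 5}
print(encode(["this", "is", "a", "cat", "."], vocab))  # [1, 2, 3, 4, 5]
print(encode(["this", "is", "a", "dog", "."], vocab))  # [1, 2, 3, 0, 5]
```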

paper cove Jan 2, 2025, 12:02 AM

#

so after doing that, i need to prepare a neural network that can take the list of integers converted from the tokens

wheat merlin Jan 2, 2025, 12:04 AM

#

yeah, for RoBERTa specifically the "main" steps are encoder -> word embedding -> position embedding -> attention -> encoder

#

other models are different ofc

paper cove Jan 2, 2025, 12:05 AM

#

what is word embedding now?

wheat merlin Jan 2, 2025, 12:07 AM

#

word being converted to a dense vector

#

["I", "love", "coding"]

[[0.1, 0.3, 0.4], [0.5, 0.2, 0.7], [0.8, 0.1, 0.6]]

#

then does similar for position embeddings

#

then it adds the two together to get the full embedding

#

then that value goes to the attention
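A minimal sketch of those steps in plain Python, assuming small random tables in place of learned weights and BERT/RoBERTa-style learned absolute position embeddings: look up a vector for each token id, look up a vector for each position, and add them elementwise. Because positions differ, the same token at two positions ends up with two different vectors, which is how order information reaches the attention layers.

```python
import random


def random_table(rows, dim, rng):
    """Stand-in for a learned (rows x dim) embedding matrix."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


def embed_sequence(ids, word_table, pos_table):
    """Word vector for each id plus the vector for its position."""
    return [
        [w + p for w, p in zip(word_table[tok_id], pos_table[pos])]
        for pos, tok_id in enumerate(ids)
    ]


rng = random.Random(0)
vocab_size, max_len, dim = 10, 8, 4
word_table = random_table(vocab_size, dim, rng)  # learned in a real model
pos_table = random_table(max_len, dim, rng)      # one row per position

out = embed_sequence([5, 2, 5], word_table, pos_table)
print(len(out), len(out[0]))  # 3 4
```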

paper cove Jan 2, 2025, 12:09 AM

#

are these values pre assigned or?

wheat merlin Jan 2, 2025, 12:09 AM

#

when you create the tokenizer you train it on a dictionary of words I think

#

so yeah I think it is trained and knows what to give each word

paper cove Jan 2, 2025, 12:10 AM

#

but the tokenizer just converts strings into a list of words?

#

i have a dictionary, i have assigned a number to each word, and i converted the tokens into integers

#

after that i don't get what the word embedding is representing here

wheat merlin Jan 2, 2025, 12:11 AM

#

yes, my understanding is you do that in the initial encoding block, then the word embedding creates the dense vector

#

def forward(self, prompts):
    # accept a single string or a list of strings
    if isinstance(prompts, str):
        prompts = [prompts]

    # tokenize each prompt into a list of token ids
    encoded = [self.tokenizer.encode(prompt) for prompt in prompts]

    # right-pad every sequence to the longest one so the batch is rectangular
    pad_id = self.tokenizer.get_pad_token_id()
    max_len = max(len(seq) for seq in encoded)
    padded = [seq + [pad_id] * (max_len - len(seq)) for seq in encoded]

    # (batch, seq_len) tensor of ids; assumes `import torch` at module level
    input_ids = torch.tensor(padded, device=self.device)
    if input_ids.dim() == 1:
        input_ids = input_ids.unsqueeze(0)

    # word embedding (token id -> dense vector) plus position embedding
    word_embeds = self.word_embedding.get_embeddings(input_ids)
    pos_embeds = self.positional_embedding(input_ids.size(1))
    embeddings = word_embeds + pos_embeds

    # 1 for real tokens, 0 for padding, then 0 / -10000 so padding gets ~zero attention
    attention_mask = (input_ids != pad_id).float()
    extended_attention_mask = attention_mask.unsqueeze(1).unsqueeze(2)  # (batch, 1, 1, seq_len)
    extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype)
    extended_attention_mask = (1.0 - extended_attention_mask) * -10000.0

    encoder_outputs = self.encoder(embeddings, attention_mask=extended_attention_mask)
    sequence_output = encoder_outputs[0]

    # use the first token's hidden state as the summary of the whole sequence
    pooled_output = sequence_output[:, 0, :]

    logits = self.classifier(pooled_output)
    return logits

#

code helps me visualize

paper cove Jan 2, 2025, 12:13 AM

#

#

got an image from internet

#

not sure if this is what word embedding means

wheat merlin Jan 2, 2025, 12:13 AM

#

that looks inaccurate to me

paper cove Jan 2, 2025, 12:13 AM

#

i see

wheat merlin Jan 2, 2025, 12:13 AM

#

at least with bert models

#

i think they are conflating multiple steps into the word embedding

#

it doesn't go straight from word to final vector; there are multiple steps, as you can see in the code above

paper cove Jan 2, 2025, 12:15 AM

#

any recommendations for docs or video to learn word embedding first?

wheat merlin Jan 2, 2025, 12:16 AM

#

https://youtu.be/viZrOnJclY0?si=841i7IXd3_AmETaY

YouTube

StatQuest with Josh Starmer

Word Embedding and Word2Vec, Clearly Explained!!!

Words are great, but if we want to use them as input to a neural network, we have to convert them to numbers. One of the most popular methods for assigning numbers to words is to use a Neural Network to create Word Embeddings. In this StatQuest, we go through the steps required to create Word Embeddings, and show how we can visualize and validat...


paper cove Jan 2, 2025, 12:16 AM

#

thanks

paper cove Jan 2, 2025, 12:29 AM

#

wheat merlin https://youtu.be/viZrOnJclY0?si=841i7IXd3_AmETaY

at 5:55 in this video, i just want to ask: is the activation function nothing but a value assigner for each category?

flat token Jan 2, 2025, 12:29 AM

#

paper cove i was thinking of converting a string into some form that won't lose any information...

Lexicographical encodings are legitimate, but they are not bijective, so you can't guarantee you can recover what you originally sent in

paper cove Jan 2, 2025, 12:29 AM

#

like f_noun("cat") = 1
f_noun("is") = 0
where f_noun is an activation function???

wheat merlin Jan 2, 2025, 12:36 AM

#

I think its just a value assigner yeah

#

or perhaps a selector is a better description

neat sparrow Jan 2, 2025, 1:41 AM

#

paper cove no, i am asking for generative models

Just use an existing tokenizer... I read through this and I don't see why you can't.

rich moth Jan 2, 2025, 2:34 AM

#

Okay, this is pretty cool, guys. You can actually see the model shifting between using the base dimensions and expanding its dimensions as the complexity changes. This is what I hoped would happen: adapting on the fly. I think the quantum-inspired part is still the key to all this, but the adaptive-dimensions stuff is something else. It seems to be learning really well, and it only uses the extra capacity when it needs it and chills out when it doesn't. Check these out

wheat merlin Jan 2, 2025, 2:41 AM

#

what's the code for the adaptive dimension?

#

is it the Adaptive library?

rich moth Jan 2, 2025, 2:47 AM

#

This one is indeed my highest score, -256. What is pretty amazing is that the base dimension is only 64, but it's modified with an expansion factor. I found it ideal to allow it to be small or modest; big isn't necessary, since the vector space is much more optimized. Switching between its dims doesn't seem to affect its accuracy at all. I believe this could be a foundation for time series data or other chronologically organized data.

@wheat merlin It's a custom job, not a library. Basically, I've got this "Market Complexity Detector" that checks out the market's vibe - you know, volatility, trends, that kinda stuff. Then, based on that complexity score, the transformer blocks can "expand" or "compress" their dimensions.

wheat merlin Jan 2, 2025, 2:48 AM

#

That's really cool

#

Is it proprietary or is it based off a paper implementation?

rich moth Jan 2, 2025, 2:49 AM

#

Thanks man, it's been a labor of love for so long, it feels like

#

It all started with an idea of trying to predict prime numbers

#

Obviously didn't have the compute capacity to see it through, but learned some cool stuff hehe

rich moth Jan 2, 2025, 2:55 AM

#

wheat merlin Is it proprietary or is it based off a paper implementation?

I believe it's proprietary in the sense that I didn't base it on any specific paper or implementation. I'm sure there are papers out there with similar ideas; it's a big field hehe. My friends did give me the idea in our ML chat channel, but I theorized the rest.

#

It didn't happen overnight, though, and I stumbled upon a lot of stuff through experimentation more than anything.

wheat merlin Jan 2, 2025, 2:57 AM

#

Yeah, I'm kinda looking into similar papers; it looks like this stuff started getting implemented around 2021-22

#

But you may have done something unique idk... could be worth looking to see if you could write a publication if you are interested in that

#

Cool idea though, definitely going to look into it more with regard to graph theory

gritty vessel Jan 2, 2025, 3:06 AM

#

unkempt apex <https://www.tensorflow.org/tutorials/generative/autoencoder>

Thank you it worked

rich moth Jan 2, 2025, 3:09 AM

#

wheat merlin Yeah im kinda looking into similar papers, it looks like this stuff kinda starte...

Thanks, dude. I appreciate that. I'm just an old UPS driver who loves tinkering with this stuff in my spare time. Writing a paper sounds cool, but honestly kind of intimidating, and I've read my fair share. The whole academic process is a whole other beast, you know? I found my secret sauce in life is brainstorming and tinkering, I guess like problem solving on the whole, but everyone is a problem solver I guess lol. With that said, I have started compiling a lot of this data into a report, which I hope to refine using the visuals I've gathered along with other metrics. I'm sure I can eventually tackle that.

#

Zapbot is a beast.

plush kettle Jan 2, 2025, 9:20 AM

#

Is opencv good for image augmentation? I would like to create random faces with various perspectives like for example: looking right, looking left, etc.

wooden sail Jan 2, 2025, 11:30 AM

#

plush kettle Is opencv good for image augmentation? I would like to create random faces with ...

this is not an augmentation task. you would need a generative model that is itself trained on real pictures from various perspectives

plush kettle Jan 2, 2025, 11:31 AM

#

Can you give me such example/s?

wooden sail Jan 2, 2025, 11:34 AM

#

plush kettle Can you give me such example/s?

stylegan2 is one that comes to mind, but this is old by now. i'm sure nvidia must have something for this task as well

plush kettle Jan 2, 2025, 11:35 AM

#

Alright, thanks

wooden sail Jan 2, 2025, 11:36 AM

#

plush kettle Alright, thanks

https://research.nvidia.com/labs/nxp/lp3d/ here's one from nvidia

#

but yeah my main point is that that is not an augmentation task, it's a very difficult modeling/inversion problem

#

if you use this to train another network, you now have several points of failure because the new images cannot be trusted (you have a dirty training set)

solemn silo Jan 2, 2025, 12:07 PM

#

serene scaffold also, can you tell me the derivative (wrt x) of `4x^3`?

i understand recursion and it is 12x^2

wooden sail Jan 2, 2025, 12:09 PM

#

solemn silo i understand recursion and it is 12x^2

the chain rule is usually applied recursively, which is why stelercus asked you that

#

what about the chain rule is troubling you?

young granite Jan 2, 2025, 3:57 PM

#

guys i want to compare rows in my feature df and see which ids have a high unity.

           col_0  col_1  col_2  col_3  col_4  col_5  col_6  col_7  col_8  \
412788399      0      0      1      1     55  41351      1  47333      1   
412763015      0      0      1      1     62  92000     99  47999      5   

           col_9  ...  col_37  col_38  col_39  col_40  col_41  col_42  col_43  \
412788399    0.0  ...     1.0     1.0     0.0     0.5     0.5     0.5     1.0   
412763015    0.0  ...     1.0     0.0     0.5     0.5     0.5     0.5     1.0   

           col_44  col_45  col_46  
412788399     1.0     1.0    21.0  
412763015     1.0     1.0    12.0

currently i use the last col (sum of features) but i would love a better approach, not sure how that would look tho :D.
I did feature dist., corr and relationship plots already.

Any ideas?

serene scaffold Jan 2, 2025, 4:01 PM

#

young granite guys i want to compare rows in my feature df and see which ids have a high unity...

what does it mean for two IDs to have high unity?

young granite Jan 2, 2025, 4:02 PM

#

serene scaffold what does it mean for two IDs to have high unity?

i would hope to see similar samples having a high unity

serene scaffold Jan 2, 2025, 4:02 PM

#

young granite i would hope to see similar samples having a high unity

I'm asking what unity means.

young granite Jan 2, 2025, 4:04 PM

#

serene scaffold I'm asking what unity means.

share similar feature set

serene scaffold Jan 2, 2025, 4:11 PM

#

@young granite for each pair of rows `x, y`, you can calculate the elementwise `|x - y|`, I guess

young granite Jan 2, 2025, 4:12 PM

#

serene scaffold @young granite for each pair of rows `x, y`, you can calculate the element...

i think the result would be similar to my sum approach, won't it?
Also i would love to visualize it directly

serene scaffold Jan 2, 2025, 4:13 PM

#

young granite i think the result would be similar to my sum approach, wont it? Also i would lo...

it wouldn't be the same as the sum approach.
look into manhattan distance.

young granite Jan 2, 2025, 4:17 PM

#

serene scaffold it wouldn't be the same as the sum approach. look into manhattan distance.

but then i would have to do that for each row, thought of a more direct approach

serene scaffold Jan 2, 2025, 4:33 PM

#

young granite but then i would have to do that for each row, thought of a more direct approach

taking the sum of each row and comparing that sum to other rows is not the same as the manhattan distance. [1, 2, 3] and [3, 2, 1] have the same sum, but their manhattan distance is 4.

#

and [2, 2, 2] also has the same sum, but manhattan([1, 2, 3], [2, 2, 2]) < manhattan([1, 2, 3], [3, 2, 1])
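The same comparison as a quick sketch in plain Python, using the three example rows from the messages above:

```python
from itertools import combinations


def manhattan(x, y):
    """Sum of absolute elementwise differences (L1 / cityblock distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))


rows = {"a": [1, 2, 3], "b": [3, 2, 1], "c": [2, 2, 2]}

print({k: sum(v) for k, v in rows.items()})  # {'a': 6, 'b': 6, 'c': 6}
for (i, x), (j, y) in combinations(rows.items(), 2):
    print(i, j, manhattan(x, y))
# a b 4
# a c 2
# b c 2
```

For the real feature table, standardize the columns first: the raw scales differ a lot (e.g. `col_5` is in the tens of thousands while `col_40` is around 0.5), and the large columns would otherwise dominate. `scipy.spatial.distance.pdist(X, metric="cityblock")` computes all pairwise Manhattan distances in one call.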

wheat merlin Jan 2, 2025, 4:42 PM

#

You could look at pairwise similarity using cosine or jaccard similarity

#

you could also use graph theory with those pairwise similarity measures to visualize the ids close to each other
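A sketch of both suggestions in plain Python: pairwise cosine similarity, then keeping only the pairs above a threshold as graph edges. The rows and the 0.9 threshold are made up. Note that cosine similarity ignores magnitude, so [1, 1, 1] and [1.1, 1.1, 1.1] come out as identical.

```python
import math
from itertools import combinations


def cosine_similarity(x, y):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def similarity_edges(rows, threshold):
    """(id_i, id_j, similarity) for every pair whose cosine similarity >= threshold."""
    edges = []
    for (i, x), (j, y) in combinations(rows.items(), 2):
        sim = cosine_similarity(x, y)
        if sim >= threshold:
            edges.append((i, j, round(sim, 3)))
    return edges


rows = {"p": [1, 1, 1], "q": [1.1, 1.1, 1.1], "r": [1, 1, 0]}
print(similarity_edges(rows, threshold=0.9))  # [('p', 'q', 1.0)]
```

The resulting edge list can be handed to a graph library such as networkx (`Graph.add_weighted_edges_from`) to draw clusters of similar ids.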

serene scaffold Jan 2, 2025, 4:46 PM

#

wheat merlin You could look at pairwise similarity using cosine or jaccard similarity

jaccard? they're not sets

wheat merlin Jan 2, 2025, 4:52 PM

#

https://www.geeksforgeeks.org/how-to-calculate-jaccard-similarity-in-r/

GeeksforGeeks

How to Calculate Jaccard Similarity in R? - GeeksforGeeks

A Computer Science portal for geeks. It contains well written, well thought and well explained computer science and programming articles, quizzes and practice/competitive programming/company interview Questions.

#

crazy they are used for recommendation systems... with data that looks exactly like what he is working with...

serene scaffold Jan 2, 2025, 4:54 PM

#

wheat merlin crazy they are used for recommendation systems... with data that looks exactly l...

jaccard is used for calculating the similarity of two sets of discrete items. it doesn't work for continuous values like the ones Greenleek has.

#

(or you can use jaccard anyway, but you'll get something meaningless)
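A short sketch of why: Jaccard compares sets of discrete items, so it suits things like "which items did each user buy", but applied to continuous values it treats any two unequal floats as completely different items. The example inputs are made up.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


print(jaccard({"pizza", "sushi", "tacos"}, {"sushi", "tacos", "ramen"}))  # 0.5
# Continuous rows treated as sets: 0.5 and 0.50001 count as unrelated items
print(jaccard([0.5, 1.0], [0.50001, 1.0]))  # 0.3333333333333333
```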

#

I recommend not using geeksforgeeks. Their information is almost always questionable, and there are so many free resources out there.

wheat merlin Jan 2, 2025, 5:01 PM

#

yeah you are right that is for binary data; there would be a better approach for sure

#

but there is a fuzzy jaccard index that can take continuous data if you were really bent on using jaccard

#

https://arxiv.org/pdf/2008.02216
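One widely used generalization for non-negative continuous vectors is the weighted Jaccard (Ruzicka) similarity: the sum of elementwise minimums over the sum of elementwise maximums. It may not be the exact fuzzy formulation in the linked paper; this is only a sketch of the general idea.

```python
def weighted_jaccard(x, y):
    """sum(min(a, b)) / sum(max(a, b)) over aligned non-negative features."""
    if any(v < 0 for v in list(x) + list(y)):
        raise ValueError("weighted Jaccard assumes non-negative values")
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    return num / den if den else 1.0


print(weighted_jaccard([1, 1, 0], [1, 1, 1]))         # 0.6666666666666666
print(weighted_jaccard([0.5, 1.0], [0.50001, 1.0]))   # close to 1
```

Unlike plain set Jaccard, nearly equal values now count as nearly the same.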

solemn silo Jan 2, 2025, 5:31 PM

#

wooden sail what about the chain rule is troubling you?

I don't really know. I think it's the notation, but then again I don't really understand it well conceptually. Sorry, I just said a bunch of very loose things.

wooden sail Jan 2, 2025, 6:02 PM

#

solemn silo I don't really know I think it is the notation but then again I don't really und...

have you tried with some simple examples and then build up from there?

#

maybe an illustrative example for you: x^6, whose derivative you already know is 6x^5, can be written as (x^3)^2, where we can think of f(x) = x^3 and g(z) = z^2, and then take g(f(x)). applying the chain rule should yield 6x^5, so you can check yourself whether you did it right

#

and similarly for polynomials with higher powers, say x^8, noticing that x^8 = ((x^2)^2)^2
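The x^6 = (x^3)^2 example can be checked in a few lines of Python: compute the derivative with the chain rule, g'(f(x)) · f'(x), compare it with 6x^5, and confirm both against a finite-difference estimate. The test point x = 1.5 is arbitrary.

```python
def f(x):  # inner function
    return x ** 3


def df(x):
    return 3 * x ** 2


def g(z):  # outer function
    return z ** 2


def dg(z):
    return 2 * z


def chain_rule_derivative(x):
    """d/dx g(f(x)) = g'(f(x)) * f'(x)."""
    return dg(f(x)) * df(x)


def numeric_derivative(h_fn, x, eps=1e-6):
    """Central finite difference, as an independent check."""
    return (h_fn(x + eps) - h_fn(x - eps)) / (2 * eps)


x = 1.5
print(chain_rule_derivative(x), 6 * x ** 5)       # 45.5625 45.5625
print(numeric_derivative(lambda t: g(f(t)), x))  # approximately 45.5625
```

For x^8 = ((x^2)^2)^2 the same pattern applies twice: the outer derivative is evaluated at the inner result, then multiplied by the inner derivative, recursively.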

dry raft Jan 2, 2025, 7:01 PM

#

hey guys

#

I am trying to use a vision transformer for binary classification

#

however, at some point, it keeps guessing "1" and gets stuck at this

#

how can i fix this? (broad tips)

wheat merlin Jan 2, 2025, 7:56 PM

#

dry raft how can i fix this(broad tips)

what model are you using and how much data do you have?

#

I think there are a lot of potential options:

Make sure your class balance is good, try changing learning rate, try regularization

dry raft Jan 2, 2025, 7:57 PM

#

wheat merlin what model are you using and how much data do you have?

The model is known as TinyViT and I am using something called PneumoniaMNIST from MedMNIST

#

There are GitHub repositories for this by the way

wheat merlin Jan 2, 2025, 7:57 PM

#

OK, I can look at it. I've only used one ViT before, though

#

have you tried using a different size of the model? it looks like they have 5M, 11M, and 21M parameter models

#

but I would also look at: class balance, learning rate, and regularization
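A quick sketch of the class-balance check in plain Python. The labels are hypothetical (a made-up 3:1 split, not the actual PneumoniaMNIST counts); the weight formula n_samples / (n_classes × n_c) is the same heuristic scikit-learn uses for `class_weight="balanced"`.

```python
from collections import Counter


def class_balance(labels):
    """Fraction of samples in each class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in sorted(counts.items())}


def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * n_c): rarer classes get larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in sorted(counts.items())}


labels = [1] * 75 + [0] * 25  # hypothetical 3:1 imbalance
print(class_balance(labels))           # {0: 0.25, 1: 0.75}
print(balanced_class_weights(labels))  # {0: 2.0, 1: 0.6666666666666666}
```

Weights like these can be passed to a weighted loss, for example the `weight` argument of PyTorch's `torch.nn.CrossEntropyLoss`, so mistakes on the minority class cost more and always predicting "1" stops being a cheap solution.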

boreal gale Jan 2, 2025, 10:17 PM

#

young granite share similar feature set

similar in what sense?

is (1, 1, 1) similar to (1.1, 1.1, 1.1), since they are close in magnitude away from 0?
or just (1, 1, 0) and (1, 1, 1), since they share the two preceding 1s?

empty wing Jan 3, 2025, 12:39 AM

#

i've been trying to make a neural network combined with a sort of macro, where it would scan a group of images and click on a specific one for a set period of time, and if it doesn't see the images it would skip to the next set of pictures. i've been using pytorch with cuda and anaconda, and i was confused about how i can give my network pictures to learn from if i'm confined to a terminal

dry raft Jan 3, 2025, 1:54 AM

#

wheat merlin but I would also look at: class balance, learning rate, and regularization

Ale

#

Alr

upbeat prism Jan 3, 2025, 11:42 AM

#

empty wing ive been trying to make a neuronetwork combined with a sort of macro mixed into ...

Well how do you read the pictures in your code?

balmy grail Jan 3, 2025, 12:38 PM

#

Hey, does anyone have a course recommendation for ML?

upbeat prism Jan 3, 2025, 1:34 PM

#

balmy grail Hey, does anyone have a course recommendation for ML?

never did it but maybe https://www.kaggle.com/learn

Learn Python, Data Viz, Pandas & More | Tutorials | Kaggle

Practical data skills you can apply immediately: that's what you'll learn in these no-cost courses. They're the fastest (and most fun) way to become a data scientist or improve your current skills.

fickle shale Jan 3, 2025, 3:32 PM

#

During validation in seq2seq, what if the decoder outputs a sequence shorter than the true output sequence? How is the loss calculated in such cases? Will categorical cross-entropy work?

wheat merlin Jan 3, 2025, 3:47 PM

#

fickle shale ```During the validation in seq 2 seq , what if the decoder outputs an output se...

I'm pretty sure models pad sequences as the first step

#

to standardize lengths
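Padding alone doesn't settle the loss question from above: the padded positions also have to be excluded from it. A minimal sketch in plain Python, assuming the decoder's per-step probabilities have been produced for as many steps as the padded target (with teacher forcing, or by padding a shorter generated output) and that id 0 is a hypothetical pad id. Categorical cross-entropy then works as usual on the remaining positions.

```python
import math

PAD_ID = 0  # hypothetical padding id


def masked_cross_entropy(probs, targets, pad_id=PAD_ID):
    """Mean negative log-likelihood over non-pad target positions.

    probs[t][k] is the predicted probability of token k at step t;
    targets[t] is the true token id at step t.
    """
    total, count = 0.0, 0
    for step_probs, target in zip(probs, targets):
        if target == pad_id:
            continue  # padded positions don't contribute to the loss
        total += -math.log(step_probs[target])
        count += 1
    return total / count if count else 0.0


# 3-token vocabulary (id 0 = pad); the true sequence [1, 2] is padded to length 3
probs = [[0.1, 0.8, 0.1], [0.1, 0.2, 0.7], [0.9, 0.05, 0.05]]
targets = [1, 2, PAD_ID]
print(masked_cross_entropy(probs, targets))  # mean of -log(0.8) and -log(0.7)
```

In PyTorch the same effect comes from the `ignore_index` argument of `torch.nn.CrossEntropyLoss`.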

#

at least in the models I'm familiar with that's the process; I can't speak for generative models necessarily

#

https://stackoverflow.com/questions/57393033/why-do-we-need-padding-in-seq2seq-network

Stack Overflow

Why do we need padding in seq2seq network

To handle sequences of different, I would like to know.
Why do we need padding the sequence the word to the same length?
If the answer is "Yes, you need padding.". Can I set the padding in other

fickle shale Jan 3, 2025, 3:56 PM

#

wheat merlin Im pretty sure models pad sequences as the first step

in translation, sometimes a word is made of 2-3 words. how are they processed in the encoder?

wheat merlin Jan 3, 2025, 3:56 PM

#

I think that's part of the tokenizers function

#

you would train the tokenizer on a dictionary, it is able to then assign values to words

#

so multiple word words would be known if it was in the tokenizer dictionary when it was trained

#

I could be wrong on that last part

#

it may just treat each word as separate, and it wouldn't matter much because the word and positional embeddings would still find similarities

#

but in sum, it's part of the tokenizer's function to make sure it's processed correctly

serene scaffold Jan 3, 2025, 4:00 PM

#

wheat merlin Im pretty sure models pad sequences as the first step

If you have an NLP model that processes sequences (such as sentences) of varying length, and it can do more than one sequence in a batch, you have to pad all but the longest one, so that the array/tensor is "rectangular"

#

But it's not the model that does this. "You" have to do it before passing the tensor into the model.
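A sketch of that preprocessing step in plain Python, with made-up token ids and 0 as a hypothetical pad id: right-pad every sequence to the longest one and build the matching 1/0 attention mask. The additive form (0 for real tokens, a large negative number for padding) mirrors the `-10000.0` trick in the `forward()` code shared earlier in the channel.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad token-id lists to the longest length; also return a 1/0 attention mask."""
    max_len = max(len(seq) for seq in sequences)
    padded = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return padded, mask


def additive_mask(mask, neg=-10000.0):
    """Turn 1/0 into 0/neg, the form added to attention scores before the softmax."""
    return [[(1 - m) * neg for m in row] for row in mask]


batch = [[5, 8, 2], [7, 3], [9]]
ids, mask = pad_batch(batch)
print(ids)   # [[5, 8, 2], [7, 3, 0], [9, 0, 0]]
print(mask)  # [[1, 1, 1], [1, 1, 0], [1, 0, 0]]
```

Building the mask from the original lengths, rather than by comparing ids to the pad id, avoids problems if a real token ever shares the pad id.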

wheat merlin Jan 3, 2025, 4:03 PM

#

You are right. I was defining model as the whole ipynb file. The padding would be part of the preprocessing before the data is sent to the attention head of the actual model

fickle shale Jan 3, 2025, 4:04 PM

#

serene scaffold If you have an NLP model that processes sequences (such as sentences) of varying...

what if we use ohe?

serene scaffold Jan 3, 2025, 4:04 PM

#

wheat merlin You are right. I was defining model as the whole ipynb file. The padding would b...

That's a very... Expansive definition of "model". Be sure to banish it from your vocabulary.

limpid zenith Jan 3, 2025, 4:05 PM

#

yeah, generally a model is the layer definitions and/or its weights
a jupyter notebook isn't really a model

wheat merlin Jan 3, 2025, 4:07 PM

#

we call BERT a model right

#

but when we say that, we know it has its own unique tokenizer and preprocessing steps

#

a model in my mind is the preprocessing, tokenizer, embedding, attention, and encoder outputs

#

if we say that the model is only the attention onwards, that's also misleading, because you could make an entirely different "model" with a new tokenizer and preprocessing steps

fickle shale Jan 3, 2025, 4:13 PM

#

fickle shale what if we use ohe?

.

wheat merlin Jan 3, 2025, 4:15 PM

#

fickle shale .

not sure exactly, but ohe is rarely used now in nlp models

empty wing Jan 3, 2025, 5:16 PM

#

upbeat prism Well how do you read the pictures in your code?

That’s what I’m trying to figure out, how do I get my code to read pictures

upbeat prism Jan 3, 2025, 6:27 PM

#

empty wing That’s what I’m trying to figure out, how do I get my code to read pictures

https://pytorch.org/vision/0.19/generated/torchvision.io.read_image.html or whatever other tool you use (scikit-image, etc.). an image is just an array, or rather an RGB tensor

gray slate Jan 3, 2025, 9:57 PM

#

Anyone here trained smaller language models? I'm looking for advice on making something small enough to run offline on ordinary machines. Project background for context:
https://bitplane.net/log/2025/01/uh-halp-data/

I'm looking for advice on what to train on top of, how much data I should be generating, and how small I can expect to get it. And if anyone wants to join in and help, that'd also be cool

shrewd mountain Jan 4, 2025, 4:00 AM

#

If I want to start studying AI technology, where should I start? It still kinda confuses me

serene scaffold Jan 4, 2025, 4:07 AM

#

shrewd mountain If I want to start studying for ai technology where should I start still kinda c...

it's really challenging to get your bearings. every resource makes different assumptions about what you already know.

what I would avoid for sure are "tutorials" on websites like Medium. They're not written to be helpful--they're just portfolio fodder for the authors.

what is your goal for learning about AI?

shrewd mountain Jan 4, 2025, 4:17 AM

#

serene scaffold it's really challenging to get your bearings. every resource makes different ass...

I have always been very interested in programming, with no specific topic, but lately I have been very invested in AI. I am planning to major in AI

#

I have really basic knowledge on python and planning to expand on it

serene scaffold Jan 4, 2025, 4:18 AM

#

shrewd mountain I always have been very interested in programming but with no specific topic but...

are you in the US or where?

shrewd mountain Jan 4, 2025, 4:18 AM

#

serene scaffold are you in the US or where?

Indonesia

#

Southeast Asia

#

However taking a university somewhere else

serene scaffold Jan 4, 2025, 4:21 AM

#

shrewd mountain Southeast Asia

I know where Indonesia is. Are you used to talking to people who don't?
I usually recommend that people start by learning how to manipulate data with pandas. (or you can try using polars, I guess.)
that doesn't actually involve AI, but it's important to get a sense for what "data" is like.

shrewd mountain Jan 4, 2025, 4:21 AM

#

serene scaffold I know where Indonesia is. Are you used to talking to people who don't? I usuall...

Sorry but wdym by talking to people who don't

serene scaffold Jan 4, 2025, 4:22 AM

#

shrewd mountain Sorry but wdym by talking to people who don't

you specified that Indonesia is in southeast asia, so I thought you thought I didn't know.

shrewd mountain Jan 4, 2025, 4:22 AM

#

serene scaffold you specified that Indonesia is in southeast asia, so I thought you thought I di...

Ah i see mb

#

And yes ppl usually don't know where that is

serene scaffold Jan 4, 2025, 4:22 AM

#

wtf

shrewd mountain Jan 4, 2025, 4:24 AM

#

The problem I'm having is that my school doesn't teach comp science, sadly, so I need to study it myself. On top of that I have very little time, so ye. Btw, rq: what is manipulating data with pandas?

#

Searched it up and not 100% sure that I understand it

serene scaffold Jan 4, 2025, 4:25 AM

#

shrewd mountain The problems that I am having is that school does not teaches comp science sadl...

tabular data is where you have rows and columns of data
manipulating it is where you change it or aggregate it or spread it out to get different perspectives

shrewd mountain Jan 4, 2025, 4:26 AM

#

serene scaffold tabular data is where you have rows and columns of data manipulating it is where...

I see

serene scaffold Jan 4, 2025, 4:27 AM

#

like, if you have sales data where each row represents one transaction, you can transform it so that each row represents a month and each column represents a year. and then you can see if there are annual trends.
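A minimal pandas sketch of that reshaping, assuming a hypothetical table with `date` and `amount` columns (names and numbers are made up for illustration): one row per transaction goes in, and `pivot_table` gives back one row per month and one column per year.

```python
import pandas as pd

# Hypothetical transaction-level data: one row per sale (values made up).
sales = pd.DataFrame({
    "date": pd.to_datetime([
        "2023-01-05", "2023-01-20", "2023-02-11",
        "2024-01-03", "2024-02-14", "2024-02-28",
    ]),
    "amount": [100.0, 50.0, 80.0, 120.0, 60.0, 40.0],
})

# Derive the grouping keys from the timestamp.
sales["year"] = sales["date"].dt.year
sales["month"] = sales["date"].dt.month

# Rows become months, columns become years, each cell is that month's total.
by_month = sales.pivot_table(
    index="month", columns="year", values="amount", aggfunc="sum", fill_value=0
)
print(by_month)
```

Reading across a row compares the same month year over year, which is where annual trends show up.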

shrewd mountain Jan 4, 2025, 4:27 AM

#

Ah alright then. Welp ig I'll be learning from scratch again lol. Any recommendations for websites or vids to study this?

serene scaffold Jan 4, 2025, 4:27 AM

#

ML is basically all about finding trends/patterns in data (or rather, making the computer figure out what the pattern is)

iron basalt Jan 4, 2025, 4:29 AM

#

shrewd mountain If I want to start studying for ai technology where should I start still kinda c...

What do you think "AI" is? And what do you want to make?

shrewd mountain Jan 4, 2025, 4:32 AM

#

iron basalt What do you think "AI" is? And what do you want to make?

A tool that will most likely help with efficiency and with figuring out small errors so the machines can be adjusted. What I want to make with it? I just really like automation, and I've always been very interested in how it works. (I probably want to make chatbots or automation bots)

#

Idk if that is a good answer or not so ye

iron basalt Jan 4, 2025, 4:33 AM

#

shrewd mountain Idk if that is a good answer or not so ye

There is no wrong answer to this. I just wanted to know what you know. And what you want.

shrewd mountain Jan 4, 2025, 4:33 AM

#

I see

gray slate Jan 4, 2025, 4:33 AM

#

have you got a decent gfx card? or money?

shrewd mountain Jan 4, 2025, 4:33 AM

#

gray slate have you got a decent gfx card? or money?

I would say so

serene scaffold Jan 4, 2025, 4:33 AM

#

gray slate have you got a decent gfx card? or money?

not everyone is trying to fine-tune llama.

gray slate Jan 4, 2025, 4:33 AM

#

'cause both of those things will help you if you want to run long jobs

gray slate Jan 4, 2025, 4:34 AM

#

serene scaffold not everyone is trying to fine-tune llama.

loll fair. I really don't want to 😦
I wanna find a tiny model that will work with my data

iron basalt Jan 4, 2025, 4:34 AM

#

shrewd mountain A tool that will most likely help with like efficiency and figuring out small er...

What you are looking for sounds like machine learning (ML), which is heavily used in AI too. Being able to mess around with data, as was already suggested, is a good starting point.

shrewd mountain Jan 4, 2025, 4:35 AM

#

Alrighty

iron basalt Jan 4, 2025, 4:35 AM

#

I recommend getting a book on the topic, or a course.

#

(Or both)

gray slate Jan 4, 2025, 4:36 AM

#

sentdex has a pretty good youtube channel for doing ML from scratch in python, he's a good guy too

shrewd mountain Jan 4, 2025, 4:39 AM

#

iron basalt I recommend getting a book on the topic, or a course.

Well, I tried, but it's kinda a pain to find books in my area, and I joined a course before that was very underwhelming. It taught me basically nothing...

#

Probably my best bet is to find an eBook

iron basalt Jan 4, 2025, 4:40 AM

#

shrewd mountain Probably my best bet is to find a eBook

There are online courses too.

#

You will need two things in general for machine learning. First, you need to be very comfortable with programming: being able to make small to medium-sized practical programs (manipulating files and such; a good resource for this is https://automatetheboringstuff.com/ ), and also the basics of data structures and algorithms (you only really need the basics here, but if you like it, you can dig further, it will only make you better). The second thing you need is mathematical knowledge; the usual recommendations are calculus, linear algebra, and statistics.

#

Also additional for programming is data manipulation / analysis with stuff like Pandas as was already mentioned.

#

(tabular data)

shrewd mountain Jan 4, 2025, 4:42 AM

#

Calculus, my favourite... The teacher taught it to us for about 1 week and then gave us a test. Horrible results for the whole class

gray slate Jan 4, 2025, 4:42 AM

#

I really enjoyed this video as a high level overview: https://youtu.be/0QczhVg5HaI

YouTube

Emergent Garden

Why Neural Networks can learn (almost) anything

A video about neural networks, how they work, and why they're useful.


SOURCES
Neural network playground: https://playground.tensorflow.org/

Universal Function Approximation:
Proof: https://cognitivemedium.com/magic_paper/assets/Hornik.pdf
Covering ReLUs: https://proceedings.neurips.cc/paper/2017/hash...


iron basalt Jan 4, 2025, 4:44 AM

#

shrewd mountain Calculus my favourite.. Teacher taught us it about 1 week and gave us a test wit...

If you can find some book online then I would go with that.

#

I don't really know of a good recommendation for calculus books.

#

Maybe someone here has one.

wooden sail Jan 4, 2025, 4:44 AM

#

Spivak 💀 (don't. it's a great book, but it's more meant for people going down the maths route)

gray slate Jan 4, 2025, 4:44 AM

#

Khan Academy goes all the way up to calculus and beyond, it's interactive

iron basalt Jan 4, 2025, 4:45 AM

#

gray slate Khan Academy goes all the way up to calculus and beyond, it's interactive

Alternative to books would be these new online tutor-like methods.

#

This includes Khan Academy or brilliant.org.

gray slate Jan 4, 2025, 4:46 AM

#

I personally think the "learn by play" is the best way to learn anything, and the new LLM methods of exploring knowledge are really powerful too

iron basalt Jan 4, 2025, 4:46 AM

#

There are additional materials that can help, but I would treat them as additional. https://www.youtube.com/watch?v=WUvTyaaNkzM&list=PLZHQObOWTQDMsr9K-rj53DwVRMYO3t5Yr

YouTube

3Blue1Brown

The essence of calculus

What might it feel like to invent calculus?

In this first video of the series, we see how unraveling the nuances of a simple geometry que...


gray slate Jan 4, 2025, 4:47 AM

#

When I learn a new programming language or technology I want to understand, I get ChatGPT to take the role of personal tutor. First constrain it by making it come up with a plan. Then give me examples, get me to explain what I think is going on, then have it correct me while I ask questions

#

then once I've covered the topic, get it to move on to the next stage. It worked really well for learning golang

shrewd mountain Jan 4, 2025, 4:49 AM

#

Kk

wooden sail Jan 4, 2025, 4:50 AM

#

a quick search reveals that George Simmons' Calculus with Analytic Geometry seems to be used at MIT for engineering courses. Stewart's calculus books are also standard engineering books

#

that's around the level to aim for imo, for the sake of practicality

iron basalt Jan 4, 2025, 4:52 AM

#

I'm going to give some more random resources, so you have options. https://www.youtube.com/watch?v=TjZBTDzGeGg&list=PLnvKubj2-I2LhIibS8TOGC42xsD3-liux&index=2

YouTube

MIT OpenCourseWare

1. Introduction and Scope

MIT 6.034 Artificial Intelligence, Fall 2010
View the complete course: http://ocw.mit.edu/6-034F10
Instructor: Patrick Winston

In this lecture, Prof. Winston introduces artificial intelligence and provides a brief history of the field. The last ten minutes are devoted to information about the course at MIT.

License: Creative Commons BY-NC-SA


#

https://www.amazon.com/Artificial-Intelligence-A-Modern-Approach/dp/0134610997


Artificial Intelligence: A Modern Approach (Pearson Series in Artifical Intelligence)

#

https://mml-book.github.io/book/mml-book.pdf

wooden sail Jan 4, 2025, 4:56 AM

#

there's also the python one, i think it's in the pins

#

https://www.statlearning.com/ An Introduction to Statistical Learning, with Applications in Python

An Introduction to Statistical Learning

shrewd mountain Jan 4, 2025, 6:11 AM

#

Alright tysm also sorry for late reply

shrewd mountain Jan 4, 2025, 7:12 AM

#

iron basalt There are additional materials that can help, but I would treat them as addition...

Btw, would you say the 12-hour Python course by Bro Code is a good YT vid for reviewing the whole Python language?

fickle shale Jan 4, 2025, 2:05 PM

#

https://getfluently.app/

Fluently: AI-powered English coach

Improve your English fluency with personalized feedback based on your real speech. Refine your accent, perfect your grammar, and expand your vocabulary.

fickle shale Jan 4, 2025, 2:07 PM

#

fickle shale https://getfluently.app/

check this, I really like the AI in this. I want to create something like it; can anyone tell me how I can create it? Or does anyone want to collaborate?

tawdry sundial Jan 4, 2025, 2:08 PM

#

whats the best llm to run locally?

#

(using rtx 4070super)

#

currently looking at hugging face leaderboard

#

it's quite odd that Mistral or Llama isn't on the list

#

is falcon3 really the best option?

lapis sequoia Jan 4, 2025, 2:14 PM

#

Hii

digital hatch Jan 4, 2025, 2:16 PM

#

My friend works at APD bank

solemn venture Jan 4, 2025, 2:57 PM

#

tawdry sundial whats the best llm to run locally?

task dependent but i hear good things about qwen2.5

warped harness Jan 4, 2025, 3:26 PM

#

Guys, I want to make my own AI chatbot, what should I study?

#

Like any roadmap?

fickle shale Jan 4, 2025, 3:26 PM

#

warped harness Guy I want to make my own ai chat bot, what should I study?

using llm!

gray slate Jan 4, 2025, 3:27 PM

#

download Ollama, run it, then pick a model. then do prompt-hacking and use `requests` to call the API

fickle shale Jan 4, 2025, 3:27 PM

#

rasa

warped harness Jan 4, 2025, 3:28 PM

#

fickle shale using llm!

Can u elaborate a bit

fickle shale Jan 4, 2025, 3:28 PM

#

gray slate download ollama, run it, then pick a model. then do prompt-hacking and `requests...

.

gray slate Jan 4, 2025, 3:29 PM

#

warped harness Can u elaborate a bit

Got a decent GPU in your machine?

warped harness Jan 4, 2025, 3:32 PM

#

gray slate Got a decent GPU in your machine?

3080ti?

#

Is it good enough?

placid ravine Jan 4, 2025, 3:33 PM

#

warped harness Is it good enough?

you have a BEAST with you

#

it will be excellent for learning and running a decently large model
idk how large coz i dont do llms
but yea @gray slate and @fickle shale might tell you that
but you can start working

gray slate Jan 4, 2025, 3:42 PM

#

warped harness 3080ti?

Same as my laptop I think. 16GB? Should be good enough yeah. Here's an example:
https://asciinema.org/a/696998

#

basically, to configure it you set the system prompt, either:

- edit the default system prompt by editing the manifest file (I haven't tried this), or
- send in the system prompt like I did in the `curl` command

to use it, you can either:

- chat with it directly on the command line, after running the model,
- pipe data into it from a script, like `echo hello! | ollama run model`, and capture the output (doesn't work well on Windows, adds junk on the end for some reason), or
- use `requests` or `curl` like I did, and join the response bits together. (Use `json.loads` on each line as they come back from `requests.post`, so you get the typing-out effect if you need it.)
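A sketch of that last option, using only the standard library (`urllib`) so it doesn't depend on `requests`. It assumes Ollama is running on its default port 11434, that a model named `llama3.2` has been pulled, and that `/api/generate` takes `model`, `prompt`, `system` and `stream` fields and streams back one JSON object per line with a `response` piece and a `done` flag; worth double-checking against the Ollama API docs for your version.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def join_stream(lines, on_token=None):
    """Join the "response" pieces of a newline-delimited JSON stream."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        chunk = json.loads(line)  # each line is one complete JSON object
        piece = chunk.get("response", "")
        parts.append(piece)
        if on_token is not None:
            on_token(piece)  # e.g. print(piece, end="", flush=True) for a typing effect
        if chunk.get("done"):
            break
    return "".join(parts)


def generate(prompt, system, model="llama3.2", on_token=None):
    """Send a prompt plus a system prompt to a locally running Ollama server."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "system": system,  # overrides the model's default system prompt
        "stream": True,
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        # The HTTP response iterates line by line as bytes; decode each one.
        return join_stream((raw.decode("utf-8") for raw in response), on_token)


# Needs a running server, so it's left commented out here:
# print(generate("Suggest a name for a cat.", system="Answer in one word."))
```

Splitting the parsing into `join_stream` keeps it testable without a server, and the `on_token` callback is where the typing-out effect would go.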

gray slate Jan 4, 2025, 3:52 PM

#

placid ravine it will be excellent for learning and running a decently large model idk how lar...

Unfortunately 16GB isn't really enough. Quantized `mixtral` will just about run, but it's dog slow. It's a pretty clever model though. `llama3.3` hasn't been quantized on ollama.com yet, and that is slow as hell on my 64GB Orin too, but it's a pretty powerful model.

warped harness Jan 4, 2025, 3:53 PM

#

Damnn

gray slate Jan 4, 2025, 3:53 PM

#

llama3.2 is fast enough, and good enough for most tasks though

#

And if you ever need to use something smarter, you can run a larger model on vast.ai for 10 cents an hour

warped harness Jan 4, 2025, 3:54 PM

#

Thank you so much for your time brother

gray slate Jan 4, 2025, 3:54 PM

#

You're welcome 🙂

#

If it helps, here's an example in Python. It isn't doing a line by line approach though:
https://github.com/bitplane/uh-halp-data/blob/master/scripts/02.popularity_contest.py

I'm basically using the LLM to say "order these 10 commands by likelihood of a user typing them into a terminal", and I loop over them until all 40,000 commands have been sorted by "what the LLM thinks are the most useful commands"

#

steal the function from line 71 and hack it to do whatever you need 🙂
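The linked script has its own approach; what follows is just one simple, hypothetical way to turn many "order these 10" answers into a single global order: reshuffle the items into new batches each round and sort by each item's average normalised position (a Borda-style average). `fake_llm` is a stand-in for the real LLM call so the example runs offline.

```python
import random
from collections import defaultdict


def aggregate_batch_rankings(items, rank_batch, batch_size=10, rounds=20, seed=0):
    """Combine many small "best first" rankings into one global order.

    rank_batch(batch) must return the same items, best first. Every round the
    items are reshuffled into new batches; each item is scored by its average
    position (0 = best, 1 = worst), so consistently early items rise to the top.
    """
    rng = random.Random(seed)
    positions = defaultdict(list)
    for _ in range(rounds):
        shuffled = list(items)
        rng.shuffle(shuffled)
        for start in range(0, len(shuffled), batch_size):
            batch = shuffled[start:start + batch_size]
            for pos, item in enumerate(rank_batch(batch)):
                # Normalise so a short final batch uses the same 0..1 scale.
                positions[item].append(pos / max(len(batch) - 1, 1))
    return sorted(items, key=lambda item: sum(positions[item]) / len(positions[item]))


# Offline stand-in for the LLM: ranks a batch by a hidden "true" score.
true_score = {f"cmd{i}": i for i in range(40)}


def fake_llm(batch):
    return sorted(batch, key=true_score.get, reverse=True)


ranking = aggregate_batch_rankings(list(true_score), fake_llm)
print(ranking[:5])
```

More rounds cost more LLM calls but smooth out the luck of which batch an item lands in.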

#

lol just ran mixtral and qwen locally, same GFX card:
https://asciinema.org/a/FKOrYUoebjXJOxJ0FNWXc2uO8

You need to do some serious prompt hacking if you want to get them to work for you!

#

^ @warped harness same gfx card as yours

wheat merlin Jan 4, 2025, 4:14 PM

#

anyone have experience with graph neural nets

#

stuck on a loss of .693 which is just random guessing basically lol

gray slate Jan 4, 2025, 4:15 PM

#

nope but interested to learn. how do you evaluate?

wheat merlin Jan 4, 2025, 4:15 PM

#

so what im doing is im trying to predict edges between nodes

#

which is a binary classification

#

so it uses `binary_cross_entropy_with_logits` for loss

#

my understanding is a loss of .693 with this function is just as good as 50/50 guessing
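Where 0.693 comes from: a model that outputs logit 0 for every example predicts sigmoid(0) = 0.5, and the cross-entropy of a 0.5 guess is -ln(0.5) = ln 2 ≈ 0.693 whatever the label. The pure-Python function below sketches the per-example formula behind `binary_cross_entropy_with_logits` (mean reduction, no weights) so the number can be checked by hand; it is not PyTorch's implementation.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed straight from raw logits.

    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)),
    which equals -[y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))].
    """
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Logit 0 means p = 0.5 for every example, whatever the labels are.
print(bce_with_logits([0.0, 0.0, 0.0, 0.0], [1, 0, 1, 0]))  # ~0.6931, i.e. ln 2
print(math.log(2))
# Confident, correct predictions push the loss towards 0.
print(bce_with_logits([10.0, -10.0], [1, 0]))
```

So a loss parked at 0.693 usually means the logits have collapsed to roughly 0, not that the model is "almost" learning.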

gray slate Jan 4, 2025, 4:16 PM

#

yeah sounds like it. does it not shift at all?

wheat merlin Jan 4, 2025, 4:17 PM

#

#

starts insanely high and gets down over the 100 epochs

#

for my model I HAD to add gradient clipping
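What clipping by global norm does, sketched in plain Python. In PyTorch the usual call is `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)`, placed between `loss.backward()` and `optimizer.step()`; the function below only mirrors the idea (PyTorch also adds a tiny epsilon to the norm), and `max_norm=1.0` in the example is an arbitrary choice.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale gradient vectors so their combined L2 norm is at most max_norm.

    One norm is computed over all parameters together; if it is too large,
    every gradient is multiplied by the same factor, so the update keeps its
    direction and only its length shrinks. Returns (grads, norm_before).
    """
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total_norm <= max_norm:
        return grads, total_norm
    scale = max_norm / total_norm
    return [[g * scale for g in vec] for vec in grads], total_norm


# Two parameters' gradients with a combined norm of 5, clipped down to 1.
clipped, norm = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
print(norm)     # 5.0
print(clipped)  # roughly [[0.6], [0.8]]
```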

gray slate Jan 4, 2025, 4:18 PM

#

how big is your data and your model?

wheat merlin Jan 4, 2025, 4:19 PM

#

only 1300 edges, which is probably the problem

#

I have been using event history analysis with a network component, it's called NEHA, but wanted to try with neural net

gray slate Jan 4, 2025, 4:20 PM

#

yeah, it can't generalize if your model isn't way smaller than your data. from what I understand anyway -- i'm no expert

#

can you augment your data to make realistic fake stuff and pass in a load?

wheat merlin Jan 4, 2025, 4:21 PM

#

yeah actually, that's part of the data preprocessing

#

i've never heard of `negative_sampling` before but it's super cool

#

like you said, it just makes fake networks so the model can classify between true and fake
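A plain-Python sketch of the idea: draw random (sender, receiver) pairs that are not real edges and label them 0, while real edges get label 1. PyTorch Geometric's `torch_geometric.utils.negative_sampling` does this on `edge_index` tensors; the function below is a simplified stand-in (directed edges, no self-loops, uniform sampling), not that implementation.

```python
import random


def sample_negative_edges(num_nodes, positive_edges, num_samples, seed=0):
    """Draw node pairs that are NOT real edges, to use as label-0 examples.

    Edges are treated as directed (sender, receiver) pairs; self-loops are skipped.
    """
    rng = random.Random(seed)
    existing = set(positive_edges)
    max_possible = num_nodes * (num_nodes - 1) - len(existing)
    if num_samples > max_possible:
        raise ValueError("not enough non-edges to sample from")
    negatives = set()
    while len(negatives) < num_samples:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if u != v and (u, v) not in existing:
            negatives.add((u, v))
    return sorted(negatives)


positive = [(0, 1), (1, 2), (2, 3), (3, 0)]
negative = sample_negative_edges(num_nodes=5, positive_edges=positive, num_samples=4)

# Training pairs and binary labels for edge classification.
pairs = positive + negative
labels = [1] * len(positive) + [0] * len(negative)
print(negative)
```

Sampling as many negatives as positives also keeps the two classes balanced, which matters for the 0.693 problem above.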

gray slate Jan 4, 2025, 4:22 PM

#

nor me, what's that?
I'm a noob really, lots of software experience not much ML experience

#

But if you take the view that every weight and bias is just a line, `y = mx + b` style, and every activation function is just a way to crop it so you get line segments rather than a single linear slope and offset, then all you're doing is fitting a function to some points... Getting the data right should be mostly about getting the fake data on the same line as the real stuff
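A tiny illustration of that picture: each hidden unit is a line `w*x + b` cropped at zero by ReLU, and the output layer adds the pieces up. With two units you already get |x|, two straight segments with a kink at 0; more units give more kinks, so more complicated shapes. The weights here are hand-picked, not trained.

```python
def relu(x):
    """Keep the positive part of the line, flatten everything below zero."""
    return max(0.0, x)


def tiny_net(x):
    """Two hidden units, each a line w*x + b cropped by ReLU, summed by the output."""
    h1 = relu(1.0 * x + 0.0)    # slope +1, zero for x < 0
    h2 = relu(-1.0 * x + 0.0)   # slope -1, zero for x > 0
    return 1.0 * h1 + 1.0 * h2  # output layer: another weighted sum, here |x|


print([tiny_net(x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)])  # [2.0, 1.0, 0.0, 1.0, 2.0]
```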

gray slate Jan 4, 2025, 4:28 PM

#

wheat merlin yeah actually, that's part of the data preprocessing

ChatGPT reckons that if your F1 score isn't moving even a tiny bit, then your model probably isn't predicting both of the binary classes

wheat merlin Jan 4, 2025, 4:30 PM

#

yeah im thinking I need to mess with the data before I mess with the model anymore

#

it is imbalanced and not scaled
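Two common first fixes, sketched in plain Python with made-up numbers: standardise each numeric feature (z-scores), and weight the rare positive class more heavily. PyTorch's `BCEWithLogitsLoss` takes a `pos_weight` argument for the second part; negatives divided by positives is a common starting value, not a rule.

```python
import statistics


def standardize(column):
    """Z-score one feature: subtract the mean, divide by the population std dev."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(x - mean) / std if std else 0.0 for x in column]


def positive_weight(labels):
    """Negatives per positive, a common starting value for pos_weight."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    return negatives / positives


gdp = [50.0, 150.0, 250.0, 350.0]  # made-up raw feature on a large scale
labels = [1, 0, 0, 0]              # 1 real edge for every 3 non-edges
print([round(z, 3) for z in standardize(gdp)])  # [-1.342, -0.447, 0.447, 1.342]
print(positive_weight(labels))                   # 3.0
```

In practice the mean and std should come from the training split only, then be reused on validation and test data.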

gray slate Jan 4, 2025, 4:31 PM

#

so with a graph network, do the graphs all have to be the same depth? or do you like.... truncate them or something?

wheat merlin Jan 4, 2025, 4:33 PM

#

im doing edge classification, so I don't think I have to deal with graphs of different sizes

#

but from what I understand, GNNs can handle graphs of varying sizes when it comes to classifying the graph or nodes

gray slate Jan 4, 2025, 4:37 PM

#

oh cool, okay yeah I was confused and thinking "how the hell do you feed different length graphs into a network" - but looks like you don't, you feed stats about nodes or edges in, so it's a data formatting thing rather than a network architecture?

wheat merlin Jan 4, 2025, 4:38 PM

#

yeah, I think its essentially an edge list with node attribute data columns

#

so for me, im looking at US states

#

so it would be like:

`receiver | sender | GDP | %White | etc.`
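How a table like that becomes GNN input, as a plain-Python sketch with hypothetical states and placeholder numbers: node attributes go into one fixed-width feature matrix `x`, and the edges become an `edge_index` of shape 2 × num_edges (the layout PyTorch Geometric's `Data(x=..., edge_index=...)` uses, converted to tensors in practice). A bigger graph only means more rows, which is why graphs of different sizes work.

```python
# Hypothetical states with placeholder attribute values (not real statistics).
node_attrs = {
    "state_a": {"gdp": 1.2, "pct_white": 0.70},
    "state_b": {"gdp": 3.4, "pct_white": 0.45},
    "state_c": {"gdp": 0.9, "pct_white": 0.60},
}
# The edge list: one row per (sender, receiver) pair.
edges = [("state_a", "state_b"), ("state_c", "state_a")]

# 1. Give every node a stable integer index.
index = {name: i for i, name in enumerate(node_attrs)}

# 2. Node feature matrix: one row per node, same column order for every node.
feature_names = ["gdp", "pct_white"]
x = [[node_attrs[name][f] for f in feature_names] for name in node_attrs]

# 3. edge_index with shape 2 x num_edges:
#    row 0 holds source (sender) indices, row 1 holds target (receiver) indices.
edge_index = [
    [index[sender] for sender, _ in edges],
    [index[receiver] for _, receiver in edges],
]

print(x)           # [[1.2, 0.7], [3.4, 0.45], [0.9, 0.6]]
print(edge_index)  # [[0, 2], [1, 0]]
```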

gray slate Jan 4, 2025, 4:39 PM

#

ah okay and you're building a good ole fashioned discriminator lol
is that legal?

#

I worked at a loan company doing devops for linear regression models and they (data scientists) had to be really careful about that sort of thing

wheat merlin Jan 4, 2025, 4:41 PM

#

yeah, im not doing anything crazy lol. Im not trying to predict like crime rates and blaming it on a certain demographic or something

#

basically im taking a developed theory from my field and using a different method

#

not really changing the question or anything

gray slate Jan 4, 2025, 4:42 PM

#

stuff like "when we predict chance of a default by geographical location, the location often encodes ethnicity as a by-product"

wheat merlin Jan 4, 2025, 4:42 PM

#

yeah I could see stuff like that being problematic

#

thats why theory is so important, cuz you have to be able to explain and understand the relationships you are modeling

gray slate Jan 4, 2025, 4:44 PM

#

it's harder than it looks! I enjoy the data thing and reasoning about data but I've never actually built a full end to end model, keep getting stuck overthinking the feature engineering part, or failing to generalize

wheat merlin Jan 4, 2025, 4:45 PM

#

I highly recommend running through PyTorch if you are interested. PyTorch would probably be pretty easy to code with a devops background

#

people and companies love to use pretrained models now and just import them from Hugging Face

#

but imo the most fun part is building the architecture and stuff with torch

gray slate Jan 4, 2025, 4:46 PM

#

I've played with other people's models and that, trained Tacotron 2 models, and I tried to make a model that predicted "distance by road" given geographical locations, but never made it out of the workbook

#

currently I'm generating data for my uh tool (`pip install uh-halp`) - I want to fine-tune a tiny model and have it run locally

wheat merlin Jan 4, 2025, 4:47 PM

#

that's cool, haven't heard of that before

#

when I get a house, im totally going to get a server rack and run models locally and stuff

#

the computer engineering stuff is so cool and fun to play with

gray slate Jan 4, 2025, 4:48 PM

#

I think the uh-halp thing should have small enough data that it can run locally. at least I think so anyway. I mean, a few hundred MB maybe?

wheat merlin Jan 4, 2025, 4:48 PM

#

rn im running python through Google colab. I have pro, so im using a Nvidia A100. It runs so fast that I thought my code was broken lol

gray slate Jan 4, 2025, 4:48 PM

#

nice! does it still disconnect you?

wheat merlin Jan 4, 2025, 4:49 PM

#

haven't had any problems recently

gray slate Jan 4, 2025, 4:49 PM

#

colab annoyed the crap out of me, deleting my data and progress every few hours

wheat merlin Jan 4, 2025, 4:49 PM

#

I was running a model, and it was running at 50gb of ram for 17.5 hours lmao

#

and then google shutdown my runtime

gray slate Jan 4, 2025, 4:49 PM

#

I shifted to vast.ai, which has its own problems but it's a docker container you run yourself

wheat merlin Jan 4, 2025, 4:49 PM

#

ooo ill check that out

gray slate Jan 4, 2025, 4:50 PM

#

i mean you run it on other people's computers for money, pay by the hour

wheat merlin Jan 4, 2025, 4:50 PM

#

my workplace has a cluster where I can get a gpu, 12 cores, and 250gb of ram. But it wasn't working earlier, so that's why im running on colab

gray slate Jan 4, 2025, 4:51 PM

#

but it's like 5 cents an hour for something like my laptop, and it doesn't set my legs on fire or need me to leave it open

wheat merlin Jan 4, 2025, 4:51 PM

#

yeah that sounds really nice

gray slate Jan 4, 2025, 4:51 PM

#

I bought an Orin too, lots of RAM but not the fastest thing

wheat merlin Jan 4, 2025, 4:52 PM

#

I have a question about Orin and the related small computers

#

my uncle is a penetration tester and does all the cybersecurity for a fortune 500 company

#

so he uses a lot of data and stuff

#

but he was looking to get a small workstation like an orin

gray slate Jan 4, 2025, 4:52 PM

#

it's 64gb shared between CPU and GPU but the CPU is 8 cores ARM and the GPU is Tegra... so .. yeah, not the fastest

wheat merlin Jan 4, 2025, 4:52 PM

#

do you have any recommendations? I think it would need a good cpu as opposed to a good gpu

gray slate Jan 4, 2025, 4:53 PM

#

I tend to rent boxes when I need speed, and use a lightweight gaming laptop as my daily driver

#

but in work we have macbook pro laptops, they're surprisingly good. 30GB of RAM on them, again, shared between CPU and GPU. battery life is insane

wheat merlin Jan 4, 2025, 4:55 PM

#

yeah im looking at upgrading my laptop to the new base m4 macbook

#

the battery life is insane

#

my uncle swears by the macbook cuz the terminal is really easy I guess for devops stuff

gray slate Jan 4, 2025, 4:56 PM

#

yeah it's nice to have a BSD shell. not as nice as Linux IMO, but it's usable unlike Windows

#

They're fast enough to run Linux in a VM and run Kali and stuff as images I think

#

but the CPU architecture stuff can be a bit of a pain if people haven't published aarch64 binaries

wheat merlin Jan 4, 2025, 4:58 PM

#

yeah

#

my uncle has the macbook with the intel cpu

#

and he was saying that to upgrade he couldn't transfer his VMs and stuff or something

#

so he had to remake all the stuff he was using

gray slate Jan 4, 2025, 4:59 PM

#

yeah, or ... 🐌

wheat merlin Jan 4, 2025, 4:59 PM

#

I dont know a ton about terminal stuff lol

gray slate Jan 4, 2025, 5:00 PM

#

I barely use the GUI! only for the web really

wheat merlin Jan 4, 2025, 5:01 PM

#

do you do fullstack stuff or what kinda work do you do usually

gray slate Jan 4, 2025, 5:02 PM

#

lifelong developer, games and stuff, c/c++, python back-end, did years in load testing, bit of full stack and devops, currently doing SRE work

#

I basically live in tmux. Use vscode sometimes, but vim mostly

austere rock Jan 4, 2025, 5:03 PM

#

what a giga chad

gray slate Jan 4, 2025, 5:03 PM

#

lol

#

in work it's all shitty web apps 😦

#

so I have to get my terminal fix at home

austere rock Jan 4, 2025, 5:04 PM

#

building my own conversational AI with open source libraries is my latest obsession

gray slate Jan 4, 2025, 5:04 PM

#

cool whatya building?

warped harness Jan 4, 2025, 5:04 PM

#

gray slate ^ @warped harness same gfx card as yours

Can I add you as a friend? Like if I need help in the future I can just contact you directly. Will that be alright?

wheat merlin Jan 4, 2025, 5:04 PM

#

austere rock building my own conversational AI with open source libraries is my latest obsess...

broo im so obsessed with ml rn, im working like 80 hour weeks lmao

gray slate Jan 4, 2025, 5:04 PM

#

warped harness Can i add you as freind? Like if I need help in future so i can just contact you...

yeah go for it 🙂

gray slate Jan 4, 2025, 5:05 PM

#

wheat merlin broo im so obssessed with ml rn, im working like 80 hour weeks lmao

status: coding in R

shudders

wheat merlin Jan 4, 2025, 5:06 PM

#

LOL

#

I do my ML in python, don't worry

austere rock Jan 4, 2025, 5:06 PM

#

my job isn't in tech, I'm a Domino's delivery driver with no college education. I've been job hunting and starting my own AI company in my free time

#

never touched R

gray slate Jan 4, 2025, 5:06 PM

#

I wrote a Pydantic to R object conversion thingy for a client, and... well, R really upsets me

wheat merlin Jan 4, 2025, 5:07 PM

#

bro R is the best program i've ever touched

#

pls dm me if you need help with R

austere rock Jan 4, 2025, 5:07 PM

#

I don't wanna touch R unless there's no better alternatives

wheat merlin Jan 4, 2025, 5:08 PM

#

I highly recommend R for data transformation, visualizations, and regression stuff

#

but for ML definitely use python or use the python in R thing

gray slate Jan 4, 2025, 5:08 PM

#

you've got keywords that make it so parameters don't get dereferenced until they're inside the function. and all the modern R code with the filters and stuff is slow as hell because it's basically mixing compiled and interpreted code

wheat merlin Jan 4, 2025, 5:09 PM

#

the dplyr tidyverse stuff? yeah I was told it is really bad on ram

gray slate Jan 4, 2025, 5:09 PM

#

yes

wheat merlin Jan 4, 2025, 5:09 PM

#

there is a package built by Apache called Arrow

#

it just masks all the dplyr functions and makes it significantly faster

gray slate Jan 4, 2025, 5:09 PM

#

oh wow didn't know about that. I filed a bug with dplyr saying "hey every function is slow" and they basically said "yeah, not fixing"

wheat merlin Jan 4, 2025, 5:10 PM

#

https://github.com/apache/arrow

GitHub


Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics - apache/arrow

gray slate Jan 4, 2025, 5:10 PM

#

spent about 3 days trying to figure out how to optimize it but it needed a partial rewrite. ended up just replacing aggregation functions with R ones

wheat merlin Jan 4, 2025, 5:11 PM

#

yeah it's really cool, it has the same feature that Hugging Face's `datasets` library has in Python

#

where you can load part of the data but it doesn't go into RAM somehow

#

idk exactly how it works but it's pretty cool
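The "doesn't go into RAM" trick is mostly memory mapping: Arrow files (and Hugging Face `datasets`, which keeps its data as Arrow files on disk) are mapped into memory, so the OS only reads the pages you actually touch. This stdlib `mmap` example only illustrates that mechanism with a made-up fixed-width file; it is not Arrow's API.

```python
import mmap
import os
import tempfile

# Write a stand-in "big" file of fixed-width records (100 bytes each).
RECORD = 100
path = os.path.join(tempfile.mkdtemp(), "records.bin")
with open(path, "wb") as f:
    for i in range(10_000):
        f.write(f"row {i}".encode().ljust(RECORD, b" "))

# Map the file instead of reading it: the OS pages in only what we touch.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    i = 4_242
    record = mm[i * RECORD:(i + 1) * RECORD].rstrip()  # copies just these bytes
    print(record.decode())  # row 4242
```

Columnar formats add more on top (only the columns you ask for get touched), but the "open a 50 GB file without loading it" part is this.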

gray slate Jan 4, 2025, 5:12 PM

#

the state we were in is, we had data scientists who knew databases, and they wrote their extract code in SQL-flavoured R, big rectangular chunks of data. then train their models on it

#

then they wanted to do it live rather than on a batch job

#

so naturally when prepping data coming into the system, you need to run it through the same aggregation pipeline, fastapi -> R and back again

#

the R part wasn't threadsafe, didn't perform, the data types are a free for all, and the model ran in Python lol

#

but the only way to do it safely was to have the in and out of R step, "it's okay if it takes a second" and I was like my code runs in < 40ms and I'm disgusted by how slow it is!

#

shakes fist at dplyr

wheat merlin Jan 4, 2025, 5:17 PM

#

lmao

#

no one in my field cares about speed, it's pretty funny

#

we don't usually use super big data, but all the replication data and stuff is always so inefficient

placid ravine Jan 4, 2025, 5:18 PM

#

gray slate Unfortunately 16GB isn't really enough. quantized `mixtral` will just about run,...

ouch

gray slate Jan 4, 2025, 5:19 PM

#

wheat merlin we don't usually use super big data, but all the replication data and stuff is a...

I'm just anal about it because of the games programming stuff I think. in batch jobs I commit all the sins though

wheat merlin Jan 4, 2025, 5:19 PM

#

yeah, that makes sense

#

(although no modern game developers seem to care about optimization lol)

gray slate Jan 4, 2025, 5:20 PM

#

that was the best bit for me, the optimization

#

These things:
https://bitplane.net/dev/c++/grass/
https://bitplane.net/dev/c++/impostors/
https://bitplane.net/dev/c++/ply-loader/

#

oops my "emojis as bullet points" trick has messed up my site

wheat merlin Jan 4, 2025, 5:24 PM

#

https://bitplane.net/dev/c++/irrvaders/

#

this is cool, I like the three-breasted alien lmfao

gray slate Jan 4, 2025, 5:28 PM

#

inspired by total recall's "you make me wish i had threeee hands"

wheat merlin Jan 4, 2025, 5:30 PM

#

No one has seen this yet, but it's part of my dissertation. I'm not going to say what I'm modeling, but I'll say that it's related to state-level policies

#

#

need to share it cuz I think it's super cool lol

gray slate Jan 4, 2025, 5:30 PM

#

wow yeah looks nice

wheat merlin Jan 4, 2025, 5:31 PM

#

the thing with R is it has all the network stuff, python really lacks in that department

gray slate Jan 4, 2025, 5:43 PM

#

it does? I thought it was mostly for rectangular SQL-like data

#

though I guess I only did one project in it!

gray slate Jan 4, 2025, 6:01 PM

#

@wheat merlin do you use any sort of runner to run your data creation stuff? Makefiles are grating on me

#

it'd be really nice if there was a docker/make style hash(inputs) cached job thingy I could use and run things on other hosts by just adding scripts and output names, and spinning up a runner with tags on it
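A minimal sketch of the "hash the inputs, skip if unchanged" part of that idea in plain Python. `run_cached`, the stamp directory and the JSON stamp format are all made up for illustration, and remote runners with tags aren't attempted. If edits to a script should invalidate the cache, list the script file among the inputs.

```python
import hashlib
import json
import os
import tempfile


def run_cached(step, inputs, output, stamp_dir):
    """Run step(inputs, output) only if an input file (or the output name) changed.

    The cache key is a SHA-256 over each input's path and bytes. A small JSON
    stamp remembers which key produced `output`. Returns True if the step ran.
    """
    h = hashlib.sha256(output.encode() + b"\0")
    for path in sorted(inputs):
        h.update(path.encode() + b"\0")
        with open(path, "rb") as f:
            h.update(f.read())
    key = h.hexdigest()

    os.makedirs(stamp_dir, exist_ok=True)
    stamp = os.path.join(stamp_dir, os.path.basename(output) + ".json")
    if os.path.exists(output) and os.path.exists(stamp):
        with open(stamp) as f:
            if json.load(f).get("key") == key:
                return False  # inputs unchanged and output present: skip

    step(inputs, output)
    with open(stamp, "w") as f:
        json.dump({"key": key}, f)
    return True


# Demo with throwaway files.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "commands.txt")
out = os.path.join(workdir, "commands_upper.txt")
stamps = os.path.join(workdir, ".stamps")
with open(src, "w") as f:
    f.write("ls\ncd\n")


def uppercase(inputs, output):
    with open(inputs[0]) as fin, open(output, "w") as fout:
        fout.write(fin.read().upper())


first = run_cached(uppercase, [src], out, stamps)   # runs the step
second = run_cached(uppercase, [src], out, stamps)  # same inputs: skipped
print(first, second)  # True False
```

Hashing contents rather than comparing timestamps (as Make does) means a `git checkout` that touches files without changing them won't trigger a rebuild.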

wheat merlin Jan 4, 2025, 6:45 PM

#

gray slate @wheat merlin do you use any sort of runner to run your data creation st...

I don't, no. Don't really know what makefiles are exactly; I don't do any model deployment for businesses or anything.

#

I just have everything in my R script and press play

#

have a working directory set and stuff

#

and my data is small enough that I just write to csv and import the csv in my model execute file

#

im kinda just now learning git, forking, and pull request stuff; so that's where I am at with data management lol

rancid sorrel Jan 4, 2025, 7:57 PM

#

you mean the graph network diagraming?

#

or you talking about raw network stack?

wheat merlin Jan 4, 2025, 8:15 PM

#

wdym

#

Network as in graph network diagramming and modeling

wild sluice Jan 4, 2025, 8:35 PM

#

is Malmo for reinforcement learning dead?

solemn venture Jan 4, 2025, 11:34 PM

#

Never heard of Malmo

pine heron Jan 5, 2025, 2:09 AM

#

Hello everyone, I wrote optimizers for TensorFlow and Keras, and they are used in the same way as Keras optimizers.

https://github.com/NoteDance/optimizers

GitHub

GitHub - NoteDance/optimizers: Optimizers for TensorFlow and Keras.


rancid sorrel Jan 5, 2025, 2:41 AM

#

fun times

neat sparrow Jan 5, 2025, 4:06 AM

#

I need some help; I'm trying to download a parquet dataset from Hugging Face with `ds = load_dataset("lighteval/natural_questions_clean", split="train")`. I've been trying to figure out for most of the day why the script only ever downloads the README.md file and then logs a 'tags' error, no matter what dataset I use. I'm sure it's just a really easy fix but I can't quite figure it out. I've looked through the documentation and videos as well.

wild sluice Jan 5, 2025, 6:05 AM

#

solemn venture Never heard of Malmo

damn, it's like an environment in Minecraft for training AI agents

flat token Jan 5, 2025, 6:41 AM

#

wheat merlin No one has seen this yet, but its part of my dissertation. I'm not going to say ...

Using deep learning to optimize political demarcations to avoid excessive gerrymandering. There are a few people who model this as a massive convex optimization problem and solve it as an LP. I think there is a guy at Penn and a guy at UNC doing it

gritty vessel Jan 5, 2025, 7:30 AM

#

Hey everyone, I wanted to ask: how can I add temporal features to an autoencoder?

#

I have multiple events and from each event I am taking the first 10 time steps

#

So the shape of my data comes to

#

There are 45 batches

#

And each batch is of size ([1,10,453,958])

#

And whole dataset

#

Is of shape ([450,453,958])

#

Am I progressing correctly?

#

I am trying to train an autoencoder

#

But I want it to capture temporal features as well
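A numpy sketch of regrouping the flat frame stack into clips, with small stand-in sizes (the real data would be 45 events × 10 steps of 453 × 958). It assumes the 450 frames are stored event by event, with each event's 10 steps contiguous and in time order. The result is laid out as (N, C, T, H, W), which is the input layout PyTorch's `Conv3d` expects, so 3D convolutions can pick up patterns across time as well as space.

```python
import numpy as np

# Small stand-in sizes so this runs instantly; the real data is
# 45 events x 10 steps = 450 frames of 453 x 958.
n_events, n_steps, height, width = 45, 10, 6, 8

frames = np.arange(n_events * n_steps * height * width, dtype=np.float32)
frames = frames.reshape(n_events * n_steps, height, width)  # (450, H, W), like the full dataset

# Group consecutive frames back into events, then add a channel axis:
# (events, steps, H, W) -> (events, channels=1, steps, H, W)
clips = frames.reshape(n_events, n_steps, height, width)[:, np.newaxis]
print(clips.shape)  # (45, 1, 10, 6, 8)

# Sanity check: event 1 holds frames 10..19 in their original order.
assert np.array_equal(clips[1, 0], frames[10:20])
```

If the frames were not stored event by event, the reshape would silently mix events, so a check like the last line is worth keeping on the real data.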

rancid sorrel Jan 5, 2025, 9:36 AM

#

Try using liquid time-constant (LTC) networks if you're varying the time step

#

It requires some minor modification to the Keras library, but Keras LTC/NCP works great

umbral star Jan 5, 2025, 12:55 PM

#

How to extract clothes from images? For example, imagine a picture of a person wearing a T-shirt and I want to isolate every pixel of the shirt and save it as its own image. How can I do that? I know that opencv can do this but with things like faces and maybe animals but how can you train it to detect things like say guns or any object?
I have CC-ed this question in #media-processing in case that channel would be a better place to ask this.

wheat merlin Jan 5, 2025, 2:00 PM

#

flat token Using deep learning to optimize political demarcations to avoid excessive gerrym...

That's cool. Do u do poli sci?

gray slate Jan 5, 2025, 2:07 PM

#

umbral star How to extract clothes from images? For example, imagine a picture of a person w...

Segmentation

mild dirge Jan 5, 2025, 2:13 PM

#

umbral star How to extract clothes from images? For example, imagine a picture of a person w...

It's called semantic segmentation. It's where you classify each pixel into one of several classes (such as clothing, weaponry, etc.)
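Once a segmentation model has produced a per-pixel class map, isolating the shirt is just masking. A minimal sketch, assuming `class_map` is an `(H, W)` integer array produced by some semantic-segmentation model (not shown) and that the shirt class has the hypothetical id `SHIRT_ID = 5`. Pixels outside the class become fully transparent in an RGBA output.

```python
import numpy as np

SHIRT_ID = 5  # hypothetical class id; depends on the model's label set

def extract_class(image_rgb: np.ndarray, class_map: np.ndarray, class_id: int) -> np.ndarray:
    """Return an RGBA image keeping only the pixels labelled `class_id`."""
    if image_rgb.shape[:2] != class_map.shape:
        raise ValueError("image and class map must have the same height/width")
    mask = class_map == class_id
    rgba = np.zeros((*class_map.shape, 4), dtype=np.uint8)
    rgba[..., :3] = image_rgb
    rgba[..., 3] = np.where(mask, 255, 0)  # alpha: opaque on the class, transparent elsewhere
    rgba[~mask, :3] = 0  # also blank the colour channels outside the mask
    return rgba

image = np.full((4, 4, 3), 200, dtype=np.uint8)
class_map = np.zeros((4, 4), dtype=np.int64)
class_map[1:3, 1:3] = SHIRT_ID  # a 2x2 "shirt" in the middle
shirt = extract_class(image, class_map, SHIRT_ID)
print(int((shirt[..., 3] == 255).sum()))  # 4
```

The resulting array can then be written out as a PNG, for example with Pillow's `Image.fromarray(shirt).save("shirt.png")`.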

gritty vessel Jan 5, 2025, 2:14 PM

#

rancid sorrel Try use liquid time if your varying the time step

Yes, there are varying time steps. I will look into this

rancid sorrel Jan 5, 2025, 2:17 PM

#

`from ncps import wirings` and `from ncps.tf import LTC as LTC`. ncps is a sparse-wiring library

#

it's very quick

umbral star Jan 5, 2025, 2:53 PM

#

Thanks @gray slate and @mild dirge

gritty vessel Jan 5, 2025, 3:22 PM

#

rancid sorrel `from ncps import wirings from ncps.tf import LTC as LTC` ncps is sparse wiring ...

Can I ask one more thing? I always get shape mismatches with unusual image shapes. Currently I am using dimensions (4, 1, 10, 453, 958)

#

Batch size 4, 1 channel, 10 time steps, height 453, width 958

#

But in the output I always get different shapes

rancid sorrel Jan 5, 2025, 3:27 PM

#

well, sadly that's the issue with neural networks

#

you need to specify the input and output shapes

#

#

my best advice to you is to load up tensorboard so you can see the NN and what its doing

#

one of the first things to do with images is to reshape the input using an image-processing layer

#

https://www.kaggle.com/code/zeeshanlatif/image-processing-basics-with-opencv-for-beginners good thing to look at

Image Processing Basics with OpenCV for Beginners

Explore and run machine learning code with Kaggle Notebooks | Using data from No attached data sources
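A likely reason the output shape drifts with sizes like 453 × 958: each stride-2 downsampling step has to round odd sizes, while the matching transposed convolutions simply double, so the decoder lands on a multiple of 2**depth rather than on the original size. A minimal sketch using the output-size formulas from the PyTorch `Conv3d`/`ConvTranspose3d` documentation (per spatial axis, dilation 1). The layer settings (kernel 3, stride 2, padding 1, output_padding 1, depth 3) are assumptions, not taken from the model above. Padding the input up to a multiple of 2**depth, and cropping back afterwards if needed, is one common fix.

```python
def conv_out(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output length of a strided convolution along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

def deconv_out(size: int, kernel: int = 3, stride: int = 2, padding: int = 1,
               output_padding: int = 1) -> int:
    """Output length of a transposed convolution along one axis (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

def round_trip(size: int, depth: int) -> int:
    """Size after `depth` encoder downsamplings followed by `depth` decoder upsamplings."""
    for _ in range(depth):
        size = conv_out(size)
    for _ in range(depth):
        size = deconv_out(size)
    return size

def padded_size(size: int, depth: int) -> int:
    """Smallest size >= `size` that is divisible by 2**depth."""
    factor = 2 ** depth
    return -(-size // factor) * factor

for size in (453, 958):
    print(size, "->", round_trip(size, depth=3), "| padded", padded_size(size, 3),
          "->", round_trip(padded_size(size, 3), depth=3))
# 453 -> 456 | padded 456 -> 456
# 958 -> 960 | padded 960 -> 960
```

With other padding or output_padding settings the decoder can instead come back smaller than the input; either way the sizes only round-trip exactly when they are divisible by 2**depth.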

gritty vessel Jan 5, 2025, 3:50 PM

#

Thanks

#

I managed to make input and output size same

toxic mortar Jan 5, 2025, 3:58 PM

#

In this article on understanding LSTM networks, it says "forget" gates can decide when to forget some information for future propagation. Is this really what happens, or is this just a naive attempt to interpret these networks?

rancid sorrel Jan 5, 2025, 4:00 PM

#

`return_sequences=True` is what i think it means

#

it's a TensorFlow option

#

but essentially a forget gate just resets its state randomly

#

it's another random behaviour you can add

toxic mortar Jan 5, 2025, 4:03 PM

#

Why would you continue to randomize the weight matrix, if you have already initialized with random values?

rancid sorrel Jan 5, 2025, 4:03 PM

#

another thing to prevent overfitting, most likely

#

and tbh that sounds almost exactly like the dropout feature

toxic mortar Jan 5, 2025, 4:07 PM

#

Really? Dropout initializes percentages to see how relevant these weights are to the output, and based on the result during backpropagation they are updated according to the p or 1-p probabilities. Sounds very different to me

#

If you do not set an explicit metric, how can you track its effect?
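For comparison, this is standard (inverted) dropout as it is usually implemented. During training, each activation is zeroed independently with probability `p`, and the survivors are scaled by `1 / (1 - p)` so that nothing needs rescaling at inference time. The weights themselves are not re-randomized. A minimal NumPy sketch (the function name is hypothetical):

```python
import numpy as np

def dropout(x: np.ndarray, p: float, training: bool, rng: np.random.Generator) -> np.ndarray:
    """Inverted dropout: zero each element with probability p while training."""
    if not training or p == 0.0:
        return x  # identity at inference time
    keep = rng.random(x.shape) >= p  # True with probability 1 - p
    return np.where(keep, x / (1.0 - p), 0.0)

rng = np.random.default_rng(0)
x = np.ones(100_000)
y = dropout(x, p=0.3, training=True, rng=rng)
print(float(y.mean()))         # expected to be close to 1.0: the mean activation is preserved
print(float((y == 0).mean()))  # expected to be close to 0.3: fraction of units dropped
```

Because the mask is resampled on every forward pass, dropout acts as a regularizer on activations, which is a different mechanism from the learned, input-dependent forget gate.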

rancid sorrel Jan 5, 2025, 4:08 PM

#

i have been known to be wrong, and am also learning this stuff too 😉

#

you use a tool like tensorboard to track the optimization

wheat merlin Jan 5, 2025, 4:09 PM

#

toxic mortar Why would you continue to randomize the weight matrix, if you have already initi...

IDK, but this COULD be related to preventing exploding or vanishing gradients? If I'm understanding correctly

fickle shale Jan 5, 2025, 4:13 PM

#

wheat merlin IDK but this COULD be related to preventing exploding or vanishing gradients? if...

LSTM solved the vanishing gradient problem!

fickle shale Jan 5, 2025, 4:14 PM

#

toxic mortar Why would you continue to randomize the weight matrix, if you have already initi...

maybe it randomizes the weight matrix to find a better local minimum, i.e. a better solution!

wheat merlin Jan 5, 2025, 4:14 PM

#

that could be it, good thought

toxic mortar Jan 5, 2025, 4:17 PM

#

Guys, the name of the gate is "forget", not "randomize". It applies a sigmoid activation function (codomain [0, 1]) to gate the previous cell state, deciding what is retained or discarded from the previous memory cell context

#

LSTM solves vanishing gradients with cell memory context and GRUs

#

I am asking whether we can build some intuition for how it picks the things we want to forget

jaunty helm Jan 5, 2025, 4:18 PM

#

toxic mortar In this article for understanding LSTM networks, it says "forget" gates can deci...

maybe that's the thought behind it when it was designed
if you look at the implementation it's just some coefficients that the internal states get modified by

fickle shale Jan 5, 2025, 4:19 PM

#

toxic mortar I am asking wether we can introduce some intuition of how we pick those things w...

it determines which part is relevant from previous state!

#

it's trained on a lot of data; an LSTM is not always correct, it learns

#

it trains on a lot of training data, then it knows which words are relevant to your problem, so it predicts the next word correctly (not always)!

flat token Jan 5, 2025, 4:31 PM

#

wheat merlin Thats cool. Do u do poli sci?

No, PhD in applied mathematics

tawdry sundial Jan 5, 2025, 4:41 PM

#

are there any agent libraries that offer a wide range of models (TTS, STT, VAD and other models like these)? For example

#

a bunch of models that can be stacked on one another to perform tasks

#

this picture is from LiveKit Agents: https://github.com/livekit/agents

#

but it uses this architecture, so i can't just make a script and run the agent locally

neat sparrow Jan 5, 2025, 5:09 PM

#

neat sparrow I need some help; I'm trying to download a parquet dataset from hugging face, ds...

Turns out that was an easy fix. I just needed to update the Hugging Face and datasets packages with `pip install`
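If someone else hits the same README-only download, a quick first check is which versions are actually installed in the environment the script runs in, before and after upgrading (for example with `pip install -U datasets huggingface_hub`). This sketch uses only the standard library's `importlib.metadata`; the helper name is hypothetical, and the thread does not say which version fixed the error.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

for pkg in ("datasets", "huggingface_hub"):
    print(pkg, installed_version(pkg) or "not installed")
```

Running this from the same interpreter as the failing script matters, since notebooks and virtual environments often point at a different install than the shell's `pip`.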

gritty vessel Jan 5, 2025, 6:31 PM

#

flat token No PhD in applied Mathematics

Tell me about your research!

flat token Jan 5, 2025, 6:37 PM

#

gritty vessel Tell me about your research!

I'm working on a paper, almost finished, for the Journal of Algorithms. It's basically the optimal traversal of n-nomial tree structures for path generation, and it solves an open problem: how to algorithmically generate all the combinations of an n-long integer with no permutations. I also work on 2 problems in DMARL (deep multi-agent reinforcement learning) and on the control of multi-agent aquatic rover systems
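To make "all the combinations of an n-long integer with no permutations" concrete, one reading is: keep exactly one representative (the non-decreasing digit string) per permutation class. Python's standard library already enumerates that set with `itertools.combinations_with_replacement`. This is only a brute-force baseline under that interpretation, not the tree-traversal method described above. For decimal digits the count is C(10 + n - 1, n).

```python
from itertools import combinations_with_replacement
from math import comb

def digit_multisets(n: int, base: int = 10) -> list[str]:
    """All n-digit strings with non-decreasing digits (one per permutation class).

    Intended for base <= 10 so that each digit is a single character.
    """
    return ["".join(map(str, combo))
            for combo in combinations_with_replacement(range(base), n)]

two_digit = digit_multisets(2)
print(len(two_digit), comb(10 + 2 - 1, 2))  # 55 55
print(two_digit[:5])  # ['00', '01', '02', '03', '04']
```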

#

I'm also fooling around with a problem in K-theory and another one in stochastic path generation as a solution for Monte Carlo convergence failures, but those aren't really serious pursuits yet, just stuff I'm fooling with

gritty vessel Jan 5, 2025, 6:38 PM

#

What is deep multi-agent reinforcement learning?

#

So instead of a single agent we will have multiple agents, and then their experience will be combined together?

flat token Jan 5, 2025, 6:39 PM

#

gritty vessel So instead of single agent we will have multiple agents and then there experienc...

combined... not exactly. this problem can be super multifaceted, but in my case i consider that yes, their experiences can be shared, but they are still naturally destructive agents because there are external force vectors acting on all the agents

#

and the construction of my environment is critical, because if i don't reflect this, none of the agents can learn at all

#

however multi-agent RL can also be something where agents cannot share experiences but must still interact within the same system

#

or the system may become a POMDP (partially observable Markov decision process), where not everything is known about the system except for what is observed at your current time step, all past time steps, and maybe some number n < ∞ of time steps in the future (i.e. maybe you know 3 or maybe 10 time steps into the future, but you don't know all of them)
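A toy illustration of that observation model, not taken from the research described here: at step `t` the agent sees the full history of some exogenous signal (say, the external forcing) plus only a finite lookahead of `k` future values. The helper name and signal values are hypothetical.

```python
from typing import List, Sequence, Tuple

def observe(signal: Sequence[float], t: int, lookahead: int) -> Tuple[List[float], List[float]]:
    """Return (history up to and including t, the next `lookahead` values).

    Steps beyond the end of the signal are simply unavailable.
    """
    if not 0 <= t < len(signal):
        raise IndexError("t outside the signal")
    history = list(signal[: t + 1])
    future = list(signal[t + 1 : t + 1 + lookahead])
    return history, future

forcing = [0.1, 0.4, -0.2, 0.0, 0.3, 0.5]
past, ahead = observe(forcing, t=2, lookahead=3)
print(past)   # [0.1, 0.4, -0.2]
print(ahead)  # [0.0, 0.3, 0.5]
```

Everything beyond `t + k` stays hidden, which is what makes the decision process only partially observable from the agent's point of view.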

#

and these agents need to learn to work together given this

#

so there are a lot of different ways in which DMARL can be formulated it's just a use case thing i would say

#

#

and this is what this would look like in a holding pattern with M agents

gritty vessel Jan 5, 2025, 6:52 PM

#

Half of this stuff is going over my head

#

But it's interesting

#

Any papers regarding this?

#

That you would suggest

flat token Jan 5, 2025, 7:28 PM

#

gritty vessel Any papers regarding this?

Google "control of stochastic multi-agent systems"

#

And then you can apply deep learning to any of those things

#

To learn how to apply the math to deep learning

gritty vessel Jan 5, 2025, 7:44 PM

#

Will check it out thank you

iron basalt Jan 5, 2025, 8:33 PM

#

toxic mortar I am asking wether we can introduce some intuition of how we pick those things w...

It's a direct multiplicative filter on the previous state. Without it, it can only be modified through addition in the next part.

#

This makes its control of the cell state more powerful.

#

With just addition it could struggle to forget things (quickly).

#

Imagine that you have some problem that requires you to modify the cell state only slightly based on the current input or it will explode. So maybe the added values are around 0.01 in magnitude. But you want to forget something instantly based on the current input you just got. So maybe you have some cell state value of 1.0 and you want it instantly to go to 0. With just the addition you can't do that without breaking this whole "small added value from the input" requirement. But multiplication can do so. It can be like 0.0001 (near zero) and is separate from the addition part. So basically it gives the network better flexibility / options on the cell state control. Just addition is pretty limited and may not work at all for certain problems.

#

(Just multiplication would not work due to being 0 to 1 (only same or decreasing))
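A minimal NumPy sketch of one LSTM step in the standard formulation (random small weights; sizes and names are illustrative). It shows the two separate paths described above: the forget gate `f` multiplies the previous cell state, while the input gate adds a small candidate. Pushing the forget gate's bias very negative drives `f` toward 0, which wipes a cell value of 1.0 in a single step even though the additive part stays around 0.01 in magnitude.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev, x] to the stacked (f, i, o, g) pre-activations."""
    z = W @ np.concatenate([h_prev, x]) + b
    f_pre, i_pre, o_pre, g_pre = np.split(z, 4)
    f = sigmoid(f_pre)   # forget gate in (0, 1): multiplicative filter on c_prev
    i = sigmoid(i_pre)   # input gate: how much of the candidate to add
    o = sigmoid(o_pre)   # output gate
    g = np.tanh(g_pre)   # candidate values in (-1, 1)
    c = f * c_prev + i * g   # multiply-then-add update of the cell state
    h = o * np.tanh(c)
    return h, c, f

rng = np.random.default_rng(0)
hidden, inputs = 3, 2
W = rng.normal(scale=0.01, size=(4 * hidden, hidden + inputs))  # tiny weights
x, h_prev, c_prev = rng.normal(size=inputs), np.zeros(hidden), np.ones(hidden)

b_keep = np.zeros(4 * hidden)
b_keep[:hidden] = 10.0     # forget-gate bias high -> f close to 1 (remember)
b_forget = np.zeros(4 * hidden)
b_forget[:hidden] = -10.0  # forget-gate bias low -> f close to 0 (forget)

_, c_keep, f_keep = lstm_step(x, h_prev, c_prev, W, b_keep)
_, c_forget, f_forget = lstm_step(x, h_prev, c_prev, W, b_forget)
print(np.round(c_keep, 3))    # stays near 1.0
print(np.round(c_forget, 3))  # collapses to near 0.0
```

In a trained network the forget gate's pre-activation depends on `[h_prev, x]` rather than on a fixed bias, so the "forget now" decision is learned from the input, not random.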

unkempt wigeon Jan 5, 2025, 8:43 PM

#

What PyTorch book should I get, so I can put it in my cart?

odd meteor Jan 5, 2025, 9:24 PM

#

unkempt wigeon What book for pytorch do I need so I can put it in my cart

https://www.amazon.com/Machine-Learning-PyTorch-Scikit-Learn-scikit-learn-ebook-dp-B09NW48MR1/dp/B09NW48MR1?&linkCode=sl1&tag=rasbt03-20&linkId=2da544f92871fcfa1ed83a1b3c72e659&language=en_US&ref_=as_li_ss_tl

https://www.manning.com/books/build-a-large-language-model-from-scratch?utm_source=raschka&utm_medium=affiliate&utm_campaign=book_raschka_build_12_12_23&a_aid=raschka&a_bid=4c2437a0&chan=mm_website

If you prefer free resources, check out https://d2l.ai


Machine Learning with PyTorch and Scikit-Learn: Develop machine learning and deep learning models with Python

Manning Publications

Build a Large Language Model (From Scratch)

Learn how to create, train, and tweak large language models (LLMs) by building one from the ground up!

In Build a Large Language Model (from Scratch) bestselling author Sebastian Raschka guides you step by step through creating your own LLM. Each stage is explained with clear text, diagrams, and examples. You’ll go from the initial design and c...

gray slate Jan 5, 2025, 9:59 PM

#

I'd really like some kind of learning that has insights rather than the mathematics. Like why do they use "mean squared distance from prediction" rather than "absolute distance from predicted value".

Is it for algorithmic performance (no sqrt call), because of the tradition of standard deviation, or because it's the output of a multiplication, so squared distance is the natural thing to use because it's on the same scale? Why not "absolute cubic distance from prediction"?

It's like ML is filled with algorithms that seem arbitrary, or work because they have been tested empirically, or have complex mathematical reasons for why they work but not a reason you can just grok

#

Probably my mathematical illiteracy talking here, but I find that maths in general has a kind of... uh... it's like it's a way to logically deduce far deeper than humans are capable of understanding. It searches for proofs, not understanding, like an exploration of a tree in what is essentially a tautology-space.
So the outputs of it tend to feel opaque, like circular reasoning and process-based faith. Dunno if it's easier if you're a mathematician, but I'd guess it isn't, based on the outputs.

Apologies for the unprompted philosophical noise lol

tidal bough Jan 5, 2025, 10:15 PM

#

gray slate I'd really like some kind of learning that has insights rather than the mathemat...

There's a nice explanation of why mean squared error is common in statistics in Koks' "Explorations in Mathematical Physics" (the arithmetic mean forms a natural pair with the sum of squares and with the normal distribution, whereas the sum of absolute distances is associated with the median rather than the mean), and in general you might enjoy that book to get some deeper insight into the math. Here are a few pages (note: f(x)=1 is probably a typo and should be f(x)=x).

#

(It's not the first time I encountered this idea though, just a recent one, but I'm not sure I can remember where I originally read about it)

gray slate Jan 5, 2025, 10:23 PM

#

tidal bough There's a nice explanation of why mean squared error is common in statistics in ...

the sum of absolute distances is associated with the median rather than the mean
That's interesting. So if you want to measure distance from the median you'd use the sum of absolute distances from it, but for the mean you'd square, and so have to then take the square root to bring it back to the same scale?

tidal bough Jan 5, 2025, 10:23 PM

#

(I'm not actually sure if anything goes wrong if you use mean-absolute-distance as your error metric in ML, though. In 1d it would cause the problem that there'll be an entire region where the gradient is 0, but I think not in any higher dimension)

tidal bough Jan 5, 2025, 10:24 PM

#

gray slate > the sum of absolute distances is associated with the median rather than the me...

So if you want to measure distance from median you'd use the sum of absolute distances from it
Sort of. What I mean is that if you try minimizing ∑ |x_i-m| over a 1d dataset, you'll get that the optimal solution is the median (specifically, for an odd number of points it's the middle point and for an even number of points it's the entire region between the two middle points). Whereas for the MSE (Σ (x_i-m)^2), the optimal solution is always the mean.
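A quick numerical check of that claim: a brute-force grid search over candidate centers `m` for a small, made-up 1-D dataset (illustrative only). The sum of squared deviations bottoms out at the mean, while the sum of absolute deviations is flat across the whole interval between the two middle points of an even-sized dataset.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 10.0])      # made-up data with one outlier
candidates = np.linspace(0.0, 12.0, 12001)  # grid with step 0.001

diffs = data[None, :] - candidates[:, None]
sq_loss = (diffs ** 2).sum(axis=1)
abs_loss = np.abs(diffs).sum(axis=1)

best_sq = candidates[np.argmin(sq_loss)]
flat_abs = candidates[np.isclose(abs_loss, abs_loss.min())]

print(best_sq, data.mean())            # both about 4.0
print(flat_abs.min(), flat_abs.max())  # about 2.0 and 3.0: the two middle points
```

The outlier at 10 drags the squared-error optimum up to 4.0, but leaves the absolute-error optimum sitting between 2 and 3, which is the usual argument for MAE being more robust to outliers.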

gray slate Jan 5, 2025, 10:25 PM

#

cool thank you. though I'm reading those pages and find it pretty mentally exhausting as my math-fu is pretty weak!

iron basalt Jan 5, 2025, 10:28 PM

#

gray slate Probably my mathematical illiteracy talking here, but I find that maths in gener...

https://www.youtube.com/watch?v=B-eh2SD54fM

YouTube

PankaZz

Feynman-"what differs physics from mathematics"

A simple explanation of physics vs mathematics by RICHARD FEYNMAN


gray slate Jan 5, 2025, 10:31 PM

#

iron basalt https://www.youtube.com/watch?v=B-eh2SD54fM

Feynman is a pretty good example of someone who valued understanding!

#

This is kinda how I feel about comp.sci as someone who started out in procedural, imperative programming. I kinda have that (steps, processes, operations) baked into my intuition (rather than relationships etc)

iron basalt Jan 5, 2025, 10:34 PM

#

iron basalt https://www.youtube.com/watch?v=B-eh2SD54fM

This video is missing the last line where he says "and later it always turns out that the poor physicist has to come back and say excuse me, when you wanted to tell me about the 4 dimensions..."

#

As for ML, it's in its 'Babylonian' phase. We know certain parts and how they are connected, but not the whole thing from which it can be built (like axioms). And this is why most ML papers feel like loosely connected guesses that are only somewhat mathematically justified (this varies, but those that seem most immediately applicable probably are like this). https://www.youtube.com/watch?v=YaUlqXRPMmY

YouTube

TehPhysicalist

Feynman: 'Greek' versus 'Babylonian' mathematics

Richard Feynman explains the main differences in the traditions of how mathematical reasoning is employed between mathematicians and physicists.


gray slate Jan 5, 2025, 10:36 PM

#

Re: dimensions, I've spent the past year or so trying to wrap my head around what the hell a dimension even is. And I kept coming back to pi and sine, and became convinced that rather than being this "infinitely long, transcendental number" it was "a simple process where you start off with something and collapse back to some ratio"

#

Then I discovered that "playing pool with pi" thing and was blown away by it

iron basalt Jan 5, 2025, 10:37 PM

#

iron basalt As for ML, it's in its 'Babylonian' phase. We know certain parts and how they ar...

The 'Babylonian' phase always comes first in new inventions, because it's the fastest way to make progress and the most natural process of randomly poking at things that humans do.

gray slate Jan 5, 2025, 10:40 PM

#

iron basalt The 'Babylonion' phase always comes first in new inventions, because it's the fa...

I honestly don't think that mathematics actually produces understanding; it's kinda orthogonal to proof. From a philosophical perspective I always go back to "try to write a simple program that implements Pythagoras' theorem and would pass code review"
I can't do it yet. One day I hope to be able to

iron basalt Jan 5, 2025, 10:40 PM

#

gray slate I honestly don't think that mathematics actually produces understanding, it's ki...

It does produce understanding, but only after it has been reduced via special case, as when it's applied to the real world.

#

This is a limitation of humans with complexity.

#

What we can "understand" is a small subset.

rancid sorrel Jan 5, 2025, 10:42 PM

#

tidal bough There's a nice explanation of why mean squared error is common in statistics in ...

any idea what a "good" MSE is for deep learning?

iron basalt Jan 5, 2025, 10:42 PM

#

We understand the simple cases, and then we take a leap of faith on process to generalize.

#

With careful rigor.

gray slate Jan 5, 2025, 10:43 PM

#

iron basalt This is a limitation of humans with complexity.

But also the process, right? Like, if you optimize for proof alone, and you have this tree-search that goes really deep and narrow, and you can keep going indefinitely. But if you don't use that tool, you can't go very deep - you have to go to the next level, invent concepts then pull those back out to the level above. You get understanding that way, but it's a much less precise and efficient way of working, takes generations to change language and so on

rancid sorrel Jan 5, 2025, 10:43 PM

#

i think my model is overfitting, but it's working great on test data

#

so i really got no idea how much i trust it

#

esp. as it gets there in 3 epochs

iron basalt Jan 5, 2025, 10:45 PM

#

gray slate But also the process, right? Like, if you optimize for proof alone, and you have...

This is where heuristic search comes in or "intuition." However it's very limited for humans, but we still try. And it can go decently far after much practice and exposure, but math is endless, so you have to give up at some point.

#

We then have computers (machines), which can also use heuristics and are much faster. We have automated provers and such, which have already done much that humans seem to not be able to (in a reasonable amount of time).

rancid sorrel Jan 5, 2025, 10:47 PM

#

```json
{
    "train_mse": 0.0037,
    "test_mse": 0.0057,
    "train_rmse": 0.61,
    "test_rmse": 0.75,
    "train_mae": 0.50,
    "test_mae": 0.53,
    "test_mape": 3500687731287.13,
    "train_r2": 99.95,
    "test_r2": 99.93
}
```

#

these are % btw

#

and i have no idea how much i should trust results like these after 3 epochs
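Two quick sanity checks on numbers like these, as a sketch with made-up values. First, RMSE is the square root of MSE, so the pairs can be cross-checked: reading the MSE values as percentages, sqrt(0.0037 %) ≈ 0.61 % and sqrt(0.0057 %) ≈ 0.75 %, which matches the reported RMSE. Second, MAPE divides by the true value, so a single target at or near zero (common after min-max scaling) makes it explode. That could explain a value like 3.5e12 even when the other metrics look fine; whether the data above is actually scaled that way is an assumption.

```python
import numpy as np

def mape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute percentage error, in percent (no protection against zeros)."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

# Hypothetical min-max-scaled targets: the smallest value scales to (almost) exactly 0.
y_true = np.array([0.0, 0.25, 0.5, 1.0]) + 1e-12  # tiny offset avoids a literal divide-by-zero
y_pred = y_true + 0.01  # every prediction is off by the same small amount

print(mape(y_true, y_pred))          # enormous, driven entirely by the near-zero target
print(mape(y_true[1:], y_pred[1:]))  # about 2.33 once that point is excluded

mse = 0.0037 / 100  # reported train MSE, read as a percentage
print(round(float(np.sqrt(mse)) * 100, 2))  # 0.61, matching the reported train RMSE
```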

gray slate Jan 5, 2025, 10:47 PM

#

iron basalt This is where heuristic search comes in or "intuition." However it's very limite...

I think maybe not having the ability to use heuristics, by being somewhat blind to it and leaning much harder on method and symbolic reasoning, many lifetimes were spent chasing things that are essentially arguing over how many angels can dance on the head of a pin

gray slate Jan 5, 2025, 10:48 PM

#

rancid sorrel ```json { "train_mse": 0.0037, "test_mse": 0.0057, "train_rmse"...

what's your data and what you're trying to solve? I'm a ML noob but have a decent nose for data and stuff

tidal bough Jan 5, 2025, 10:48 PM

#

rancid sorrel any idea what is a "good" mse for depe learning?

I think for ML regression problems people mostly use just MSE. I've seen fancier loss functions like Huber loss but not sure they're used in deep learning.
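For reference, Huber loss is quadratic for small residuals and linear for large ones, which makes it less sensitive to outliers than MSE. The major frameworks do ship it, e.g. `torch.nn.HuberLoss` in PyTorch and `tf.keras.losses.Huber` in TensorFlow. A minimal NumPy sketch of the element-wise formula with threshold `delta`:

```python
import numpy as np

def huber(residual: np.ndarray, delta: float = 1.0) -> np.ndarray:
    """Element-wise Huber loss: 0.5*r**2 if |r| <= delta, else delta*(|r| - 0.5*delta)."""
    abs_r = np.abs(residual)
    quadratic = 0.5 * residual ** 2
    linear = delta * (abs_r - 0.5 * delta)
    return np.where(abs_r <= delta, quadratic, linear)

r = np.array([0.5, -1.0, 3.0])
print(huber(r).tolist())  # [0.125, 0.5, 2.5]
```

At `|r| == delta` both branches give `0.5 * delta**2`, so the loss and its slope are continuous where they join.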

gray slate Jan 5, 2025, 10:49 PM

#

gray slate Then I discovered that "playing pool with pi" thing and was blown away by it

https://prajwalsouza.github.io/Experiments/Colliding-Blocks.html

Colliding Blocks Counting PI - 3Blue1Brown

A block collision simulation based on a 3Blue1Brown video.
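The counting in that simulation is easy to reproduce without any physics engine, because only the order of collisions matters, not the positions. A minimal sketch under the usual idealized assumptions: frictionless, perfectly elastic collisions, a wall on the left, the small block at rest and the heavy block sliding toward it. With mass ratios 1, 100, 10 000, ... the count reproduces the leading digits of pi.

```python
def count_collisions(mass_ratio: float) -> int:
    """Count collisions for a small block (mass 1) between a wall and a heavy block."""
    m1, m2 = 1.0, float(mass_ratio)
    v1, v2 = 0.0, -1.0  # small block at rest; heavy block moving left toward it
    count = 0
    while True:
        if v2 < v1:
            # Blocks are approaching: 1-D elastic collision (momentum and energy conserved).
            v1, v2 = (
                ((m1 - m2) * v1 + 2 * m2 * v2) / (m1 + m2),
                ((m2 - m1) * v2 + 2 * m1 * v1) / (m1 + m2),
            )
        elif v1 < 0:
            v1 = -v1  # small block bounces off the wall
        else:
            break  # both moving right and separating: no more collisions
        count += 1
    return count

for exponent in range(4):
    print(100 ** exponent, count_collisions(100 ** exponent))
# 1 3
# 100 31
# 10000 314
# 1000000 3141
```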

iron basalt Jan 5, 2025, 10:49 PM

#

gray slate I think maybe not having the ability to use heuristics, by being somewhat blind ...

Well the neat thing about math is that even if it's not where you wanted to end up, as long as you have rigor, you did prove something. And so some progress was made. Who knows if it's useful or not.

rancid sorrel Jan 5, 2025, 10:49 PM

#

time series data, predicting a specific column

iron basalt Jan 5, 2025, 10:50 PM

#

iron basalt Well the neat thing about math is that even if it's not where you wanted to end ...

This is in contrast to physics, where you could have wasted a lifetime on something not real (e.g. string theory).

rancid sorrel Jan 5, 2025, 10:50 PM

#

that's the LSTM; i've got another model with random dropout and it's still over 95% accuracy

#

honestly i just feel suspicious when i get any correlation this good

tidal bough Jan 5, 2025, 10:50 PM

#

rancid sorrel ```json { "train_mse": 0.0037, "test_mse": 0.0057, "train_rmse"...

99.93% test accuracy? IMO that means that either your dataset is trivial, or something is wrong and your test dataset leaks into the training
(or your training process implicitly does this, if you're doing something unusual like genetic algorithms)

rancid sorrel Jan 5, 2025, 10:51 PM

#

it's a month of stock data at 1-minute trade increments

gray slate Jan 5, 2025, 10:51 PM

#

iron basalt Well the neat thing about math is that even if it's not where you wanted to end ...

Well a lot of it is framing and is built into the assumptions. Like take Cantor's diagonal argument, he was proving the infinite infinity of the king of kings. When he could have been disproving the possibility of the reals

rancid sorrel Jan 5, 2025, 10:52 PM

#

data is shape (8601, 6)

#

for one month, one stock

gray slate Jan 5, 2025, 10:52 PM

#

rancid sorrel its a month of stock data at 1m trade incraments

I think the problem with stock data is you're always gonna have other people's prediction models built into it, and they'll have more parameters than you, right?

iron basalt Jan 5, 2025, 10:53 PM

#

gray slate Well a lot of it is framing and is built into the assumptions. Like take Cantor'...

Yes, you need to state your assumptions. That is the game you are playing. Mathematicians are interested in playing such games, regardless if they apply to reality. It's an art form, and needs no use case like other art forms.

rancid sorrel Jan 5, 2025, 10:53 PM

#

#

for 500 different stocks and 500 trainings, the validation loss goes under 0.5% in 3 or fewer epochs

tidal bough Jan 5, 2025, 10:55 PM

#

hmm, 0.5% of what?

rancid sorrel Jan 5, 2025, 10:55 PM

#

it's an LSTM, MSE loss, Adam as the optimizer

iron basalt Jan 5, 2025, 10:55 PM

#

iron basalt Yes, you need to state your assumptions. That is the game you are playing. Mathe...

Note that in math, you just state the rules. But in physics, the entire point is to try to figure out what they are, and you can never be sure that your guesses are correct, only that they are not shown wrong yet.

rancid sorrel Jan 5, 2025, 10:56 PM

#

so the val loss is the MSE evaluated per epoch

tidal bough Jan 5, 2025, 10:57 PM

#

iron basalt Note that in math, you just state the rules. But in physics, the entire point is...

The other way famously worked out quite fine for Einstein :p

gray slate Jan 5, 2025, 10:57 PM

#

iron basalt Yes, you need to state your assumptions. That is the game you are playing. Mathe...

Yeah you have a huge number of possible axioms, and the space of questions you can ask is.. I dunno, factorial of that? then you have cultural leanings and fashions that lead you into the space, much of which is a psychological trap

rancid sorrel Jan 5, 2025, 10:57 PM

#

but this is honestly why i am deeply suspicious; it's a standard 80% split for the data and seed 42

#

so nothing unusual there

#

80:20

tidal bough Jan 5, 2025, 10:58 PM

#

rancid sorrel but this is honestly why i am deeply suspious, its a standard 80% split for data...

Yeah, I'd be suspicious too. Maybe try graphing real vs predicted prices for one stock, to see if maybe the loss is higher than you think
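One more check worth running alongside that plot is a naive "persistence" baseline that predicts each minute's price as the previous minute's. Price levels move slowly relative to their overall range, so this baseline alone typically scores an R² very close to 1 on level prediction, which means a near-perfect R² on raw prices says little by itself. A minimal sketch on a synthetic random walk (made-up data, not the stock data above); R² is computed by hand to avoid extra dependencies.

```python
import numpy as np

def r2_score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

rng = np.random.default_rng(42)
prices = 100.0 + np.cumsum(rng.normal(scale=0.05, size=8601))  # synthetic 1-minute "closes"

y_true = prices[1:]    # price at minute t
y_naive = prices[:-1]  # "forecast": price at minute t - 1
print(r2_score(y_true, y_naive))  # typically very close to 1 for a random walk
```

If a trained model cannot clearly beat this baseline on the same test window, its high R² is mostly coming from the persistence of the series rather than from learned structure.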

gray slate Jan 5, 2025, 10:58 PM

#

rancid sorrel but this is honestly why i am deeply suspious, its a standard 80% split for data...

are you gonna use it for trading? 'cause it might be better to try to predict Elliott Wave or something rather than raw price

rancid sorrel Jan 5, 2025, 10:58 PM

#

i can hit month 4 with the model trained on 3 months of data and see if that's going ok

rancid sorrel Jan 5, 2025, 10:59 PM

#

gray slate are you gonna use it for trading? 'cause it might be better to try to predict El...

not yet, no. i am evaluating different models for efficiency

#

and accuracy

tidal bough Jan 5, 2025, 10:59 PM

#

if they really do match, then it's somehow training on the test data- huh, wait a minute. This is an LSTM, right? Possibly you need some special way of splitting the data for LSTMs, since otherwise your "training" set might accidentally cover the entire dataset (just not all possible subsets of it)

iron basalt Jan 5, 2025, 10:59 PM

#

tidal bough The other way famously worked out quite fine for Einstein :p

Yeah, you still guess the rules, but unlike math where that is what it is (these are the rules I feel like playing with), in physics you need to check it against reality.

#

(Experiments)

rancid sorrel Jan 5, 2025, 11:00 PM

#

tidal bough if they really do match, then it's somehow training on the test data- huh, wait ...

```python
import pandas as pd

# split_train_test, tensorboard_callback and fit_model are the poster's own helpers (not shown)
def pipeline_1d(input_file, model_creation_func):
    data = pd.read_parquet(input_file)
    X_train, X_test, y_train, y_test = split_train_test(data, 'Close')
    model = model_creation_func(X_train)
    tensorboard_cb = tensorboard_callback(input_file, model.name)
    model, history = fit_model(X_train, X_test, y_train, y_test, model, input_file)
```

iron basalt Jan 5, 2025, 11:00 PM

#

iron basalt Yeah, you still guess the rules, but unlike math where that is what it is (these...

https://www.youtube.com/watch?v=EYPapE-3FRw

YouTube

seabala

Feynman on Scientific Method.

Physicist Richard Feynman explains the scientific and unscientific methods of understanding nature.


#

(All of this in the same lecture btw, covers a lot)

tidal bough Jan 5, 2025, 11:01 PM

#

rancid sorrel ```python def pipeline_1d(input_file, model_creation_func): data = pd.read_p...

Does split_train_test make sure not to shuffle the points?

rancid sorrel Jan 5, 2025, 11:01 PM

#

it really shouldn't be using the testing data in training, given it's not being passed that way:
`y_train_pred, y_test_pred = model.predict(X_train), model.predict(X_test)`

iron basalt Jan 5, 2025, 11:01 PM

#

iron basalt https://www.youtube.com/watch?v=EYPapE-3FRw

(note that current ML feels a lot more like this than math in many cases)

rancid sorrel Jan 5, 2025, 11:01 PM

#

```python
from sklearn.model_selection import train_test_split

def split_train_test(data, target_column):
    X, y = split_features_target(data, target_column)  # poster's own helper (not shown)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Reshape X_train and X_test to 3D (samples, timesteps, features)
    X_train = X_train.values.reshape((X_train.shape[0], 1, X_train.shape[1]))
    X_test = X_test.values.reshape((X_test.shape[0], 1, X_test.shape[1]))

    return X_train, X_test, y_train, y_test
```
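Two details in that function are worth checking. `train_test_split` shuffles by default (`shuffle=True`), so with minute-level rows most test minutes end up sitting right next to training minutes. The reshape to `(samples, 1, features)` also gives the LSTM a single time step per sample, so it never actually sees a sequence. Below is a minimal sketch of the usual alternative, assuming `X` and `y` are NumPy arrays already in time order: build sliding windows of `lookback` past steps, then split chronologically so every test target comes after every training target. The `lookback=30` value, the toy data and the helper names are illustrative assumptions. For the split alone, `train_test_split(..., shuffle=False)` gives the same chronological cut.

```python
import numpy as np

def make_windows(X: np.ndarray, y: np.ndarray, lookback: int):
    """Turn time-ordered rows into (samples, lookback, features) windows.

    Window i holds rows i .. i+lookback-1 and is paired with the target
    at row i+lookback, so each sample only sees the past.
    """
    n_samples = len(X) - lookback
    X_win = np.stack([X[i : i + lookback] for i in range(n_samples)])
    y_win = y[lookback:]
    return X_win, y_win

def chronological_split(X_win: np.ndarray, y_win: np.ndarray, test_frac: float = 0.2):
    """Split without shuffling: the last `test_frac` of samples form the test set."""
    cut = int(len(X_win) * (1 - test_frac))
    return X_win[:cut], X_win[cut:], y_win[:cut], y_win[cut:]

# Toy stand-in for the (8601, 6) frame: 5 feature columns plus a target column.
rng = np.random.default_rng(0)
X = rng.normal(size=(8601, 5))
y = rng.normal(size=8601)

X_win, y_win = make_windows(X, y, lookback=30)
X_train, X_test, y_train, y_test = chronological_split(X_win, y_win)
print(X_train.shape, X_test.shape)
```

The first few test windows still use late training-period rows as their input history, which is fine: only past inputs are shared, never targets.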

gray slate Jan 5, 2025, 11:01 PM

#

tidal bough The other way famously worked out quite fine for Einstein :p

Einstein worked in the patent office and had the idea that patent time registration was relative due to speed of communication - that insight was a real-world intuition right?

rancid sorrel Jan 5, 2025, 11:01 PM

#

did i fuckup?