#data-science-and-ml


mortal bolt
#

thanks billy, i will take a look

wooden sail
#

a while back, someone i know asked similar questions regarding indexing and slicing subarrays on stack overflow. maybe my answer helps you some https://stackoverflow.com/questions/76627832/understanding-the-behaviour-of-advanced-multi-dimensional-indexing-on-a-4d-ndarr

thorny geode
#

thank you 🙂

thorny geode
#

ah, is that a good thing or a bad thing? sorry, i want to clarify what you are trying to convey

left tartan
#

So it's ok to be confused

thorny geode
pale sierra
#

Hi. Stelercus. What do you work on?

serene scaffold
serene scaffold
pale sierra
#

but its been on and off

fallen gorge
#

Hey guys, I'm a new member and a beginner in the coding world. I was looking for some guidance from you guys. Where do I begin?

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

fallen gorge
#

@serene scaffold @serene scaffold Thanks 😊

rich moth
#

Starting to see some really strange learning dynamics. It learns in like three phases from what I can tell. First, it quickly learns basic patterns (epochs 0-50), then adapts its quantum "inspired" features to tackle more complex stuff (epochs 50-100). Then it seems to stabilize and dynamically transition between states based on market complexity, looking for an optimal balance. Anyway, I thought you guys might be interested to see it, I don't think I've run into intersecting lines like that before. I had an idea for visualizing the plots: I was gonna save them every epoch and combine them in chronological order to make like a "flip book" animation. Or does anyone know if there are any modules for this?

lapis sequoia
#

Does being stupid cracked at game theory, like, being the actual best response to all responses help with GANs and reinforcement learning? Pic related, it was years ago, but still, does game theory help a lot with RL and GANS?

simple plinth
#

from where can i start learning DS+AI?

devout cloak
#

I mean a GAN is literally a two-player zero-sum game where the goal is to reach the Nash equilibrium, where the discriminator cannot tell the difference between the generated images and the real images.

You could employ optimization strategies based on game theory, like alternating the gradient descent updates to prevent oscillations within the system
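
To make the alternating-update idea concrete, here is a toy sketch. It is not a GAN: it is the bilinear game f(x, y) = x * y, where x tries to minimize f and y tries to maximize it, with the equilibrium at (0, 0). The function names, step size and starting point are assumptions picked for illustration.

```python
import math

def simultaneous_gda(x, y, lr, steps):
    """Both players update from the same old values."""
    for _ in range(steps):
        grad_x, grad_y = y, x                      # df/dx = y, df/dy = x
        x, y = x - lr * grad_x, y + lr * grad_y    # descent for x, ascent for y
    return x, y

def alternating_gda(x, y, lr, steps):
    """The second player reacts to the first player's fresh update."""
    for _ in range(steps):
        x = x - lr * y     # minimizer moves first
        y = y + lr * x     # maximizer already sees the new x
    return x, y

# Distance from the equilibrium (0, 0) after 1000 updates.
sim_norm = math.hypot(*simultaneous_gda(1.0, 1.0, lr=0.1, steps=1000))
alt_norm = math.hypot(*alternating_gda(1.0, 1.0, lr=0.1, steps=1000))
print(f"simultaneous: {sim_norm:.2f}  alternating: {alt_norm:.2f}")
```

On this toy game the simultaneous version spirals outward, while the alternating version stays on a bounded orbit around the equilibrium. Real GAN losses are not bilinear, so treat this as intuition only.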

rich river
#

it is really annoying there are CPU tensors and cuda tensors

thorny geode
#

hello, short question here,

Auto_re.loc[lambda df: df['year'] > 80, ['weight', 'origin']]

how does python know that df takes its argument from Auto_re?

#

its property specific 😑
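
For anyone with the same question: it is not Python guessing anything. When `.loc` is given a callable, pandas calls it with the object `.loc` was invoked on and uses the return value as the indexer. A small sketch, where `auto` is a made-up stand-in for `Auto_re`:

```python
import pandas as pd

# Hypothetical stand-in for the Auto_re frame from the question.
auto = pd.DataFrame(
    {"year": [70, 81, 82], "weight": [3504, 2130, 2295], "origin": [1, 3, 2]}
)

seen = []

def newer_than_80(df):
    seen.append(df)           # record what pandas handed to the callable
    return df["year"] > 80    # boolean mask, one entry per row

# pandas calls newer_than_80(auto) for us and indexes with the result.
via_callable = auto.loc[newer_than_80, ["weight", "origin"]]

# The same selection with the mask written out explicitly.
via_mask = auto.loc[auto["year"] > 80, ["weight", "origin"]]
```

The callable form is mostly useful in method chains, where the intermediate frame has no name you could refer to.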

#

this python lab chapter is almost done !!!!

tawdry sundial
#

Are RNNs only able to have 3 unique weights in 1 layer?

tawdry sundial
#

I cant think of a way of fitting more weights

#

w1, w2 and w3

#

they are the same just unrolled
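
A minimal NumPy sketch of the "same weights, just unrolled" point, assuming w1, w2 and w3 refer to the input-to-hidden, hidden-to-hidden and hidden-to-output matrices of a vanilla single-layer RNN. The sizes and names are made up, and variants such as LSTM or GRU cells carry more matrices than this.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out, n_steps = 4, 8, 2, 5

# Three weight matrices, created once.
W_xh = rng.normal(size=(n_hidden, n_in))      # input  -> hidden
W_hh = rng.normal(size=(n_hidden, n_hidden))  # hidden -> hidden (the cycle)
W_hy = rng.normal(size=(n_out, n_hidden))     # hidden -> output

xs = rng.normal(size=(n_steps, n_in))         # one sequence of 5 inputs
h = np.zeros(n_hidden)                        # initial hidden state
outputs = []
for x in xs:                                  # "unrolling" is just this loop
    h = np.tanh(W_xh @ x + W_hh @ h)          # same matrices at every step
    outputs.append(W_hy @ h)
outputs = np.stack(outputs)

# The parameter count does not depend on the sequence length.
n_params = W_xh.size + W_hh.size + W_hy.size
```

Each of the three is a whole matrix, so the number of individual weights grows with the layer sizes, not with the number of time steps.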

serene scaffold
#

an RNN is just "a neural network with a cyclic computation graph and a hidden state". any architecture that does those things is an RNN.

tawdry sundial
#

oof

#

that's a bit of a broad definition

serene scaffold
#

it's less broad than "neural network".

#

you don't need to know about all possible RNNs.

eager hamlet
#

hey, just trying to start out with learning ML

would it be better to go through a course like what Andrew NG taught or read through something like "Dive into Deep Learning"?

serene scaffold
#

and if the course says that you need to know something before taking the course (like some kind of math), they mean it.

eager hamlet
#

yeah I'm familiar with undergraduate level math

eager hamlet
serene scaffold
eager hamlet
#

^ this one

serene scaffold
#

that was ages ago bing_shrug I don't even know that person

eager hamlet
#

ah lol okay

serene scaffold
#

but I hope they're doing great, whoever they are.

thorny geode
#

Stelercus, have you tried doing kaggle competitions

thorny geode
serene scaffold
#

I hope you, too. refers to a corrected typo

thorny geode
serene scaffold
thorny geode
past meteor
#

It's good fun but imo Kaggle is far from representative of data science / ML

thorny geode
past meteor
#

That's one reason, the other one is more subtle and harmful

serene scaffold
tawdry sundial
past meteor
# thorny geode is it because they only deal with predictions?

I'll give you the technical definition because it's more concise 😄

Say you have a training distribution and a testing distribution.

For the training set you have access to

X ~ P(X), the distribution of the independent variables
Y ~ P(Y), the distribution of the dependent variables
P(Y|X), their relationships.

For the test set you have access to:

X ~ P(X) <----------- this is the problem (!)

In the real world you totally do not have access a priori to the independent variables you will be predicting for in the future. Kagglers abuse this information a lot by doing stuff like a PCA on the entire dataset instead of on just the training set etc.

Also, lots of models in the real world fail because of data drift (covariate shift), which is precisely when P(X) changes over time. Concept drift is the related case where P(Y|X) changes.
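
A sketch of the leak-free version of that PCA step: the mean and the components are estimated from the training rows only and then applied unchanged to the test rows. The data is random and the helper names `fit_pca` and `transform` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X_train, X_test = X[:80], X[80:]

def fit_pca(X, k):
    """Estimate the centering vector and top-k principal axes from X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def transform(X, mean, components):
    """Project X using statistics that were fitted elsewhere."""
    return (X - mean) @ components.T

# Fit on the training set only ...
mean, components = fit_pca(X_train, k=2)
Z_train = transform(X_train, mean, components)
# ... and reuse the fitted statistics for the test set, never refit on it.
Z_test = transform(X_test, mean, components)
```

Calling `fit_pca(X, k=2)` on the full array instead is the leak described above: the test rows would then influence both the centering and the axes.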

#

Basically, in all of the competitions I did you had to leak to get ahead

fickle shale
past meteor
#

Or in the real world you have a different issue, where introducing the model actually influences P(X) and sends you to places you've never trained on etc.

past meteor
thorny geode
rich moth
past meteor
#

Personally, I learnt a lot from doing tabular playground so I'd recommend doing a bit of Kaggle

fickle shale
past meteor
#

As for portfolios? I had one, nobody ever asked in interviews, ever 😄

past meteor
thorny geode
past meteor
#

And then they run inference against your model without ever giving you samples of the test set => a lot fairer

thorny geode
tawdry sundial
past meteor
#

Anyway, if you want to learn ML/AI and you have the ability to just go to uni that's my recommendation

tawdry sundial
#

validation loss is way below the training loss

#

didnt know that was a thing

past meteor
past meteor
fickle shale
#

lot of math!

tawdry sundial
past meteor
thorny geode
past meteor
thorny geode
past meteor
# thorny geode yeah

That means you're miles ahead of where I was when I was your age, keep it up ❤️

thorny geode
#

although honestly i can pursue the math aspects of data science since i am lucky enough to experience competitive math and programming beforehand

rich moth
#

No, it's more sophisticated than that. It isn't just learning simpler patterns, it's actually learning to balance prediction accuracy with uncertainty estimation. The decreasing validation loss shows the model getting better at both predicting AND knowing how confident it should be in each prediction. That V-shaped error pattern in the plots shows it's learning proper market behavior. It's understanding that larger moves inherently have more uncertainty.

lapis sequoia
#

I ment well*

eager hamlet
#

you're talking about cs229 right?

serene scaffold
eager hamlet
serene scaffold
devout cloak
# lapis sequoia

There’s no way to objectively define an answer as this is a subjective question. What are your goals and what kinds of things are you trying to build

odd meteor
# rich river it is really annoying there are CPU tensors and cuda tensors

I think it's probably gonna be less annoying if you look at it this way 😀

By default, when we create a tensor, it resides on the CPU. However, you're free to move it to a new residence (the GPU) if you want to optimize for speed, since GPUs usually have far more cores than CPUs.

It's pretty easy to move a tensor back and forth between GPU and CPU. You just have to pay attention so you don't perform an operation on two tensors that reside on different devices.

candid hornet
#

hi everyone
i am new to this server
does any one know about the gguf

#

what are the system requirements for this

#

hi

random dune
#

is this the right channel for pandas questions?

mystic peak
#

how do I make a reward system for an AI learning system? I saw a mario kart vid of a person making an AI that improves the longer it plays and I wondered how it works

#

I want to see if it's possible to make one for a 2d fighting game and see if it can beat actual people

serene scaffold
#

!paste

arctic wedgeBOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

odd meteor
# fickle shale so how someone who is fresher ,create protfolio for entry level job?

While the number of ML job openings has exploded in recent years, the number of applicants has grown even faster, around 10x.

This means landing an ML job today is WAAAY harder than it was five years ago.

If this makes you feel anxious or triggers those "I’m not good enough" thoughts, just take a deep breath cos everyone has them. E-V-E-R-Y-O-N-E.

Grab your favorite drink and shift your perspective:

  1. What are companies hiring ML engineers trying to solve?
    These companies face overwhelming noise in the AI space, and they desperately need technical experts (like you) to cut through it. They need people who can design and build ML systems that turn raw data into smart decisions.

What does this mean for you?

You need to demonstrate that you can take a real-world business problem, frame it as an ML problem, and solve it by:

  • Building a feature pipeline (feature engineering).
  • Creating a training pipeline (training or fine-tuning models).
  • Serving predictions (inference pipeline).

Package this solution into a Docker container and deploy it to a compute platform like AWS Lambda or Kubernetes.

This is the essence of real-world ML engineering—no more, no less.

How do you show this? Build a professional side project.

What makes it professional?

Gone are the days when a polished Jupyter notebook on GitHub was enough to land a job. Today, you need to go further.

Solve a specific business problem by:

  • Picking a real-world problem that excites you.
  • Using a data API to ingest and transform data into ML features.
  • Training a model (e.g., an XGBoost model, an LLM agent, or fine-tuning a base LLM).
  • Building an API to serve the model’s predictions.

Finish it with a clear, professional README file in your repo.

This is what hiring managers need to see—and it’s absolutely within your reach.

Remember:
You don’t learn first and then build. You learn by building.

To your success 🥂✌️

random dune
#

My question is: I have a dataframe with three columns. I want to use these values to calculate a new value for every row in this dataframe. The formula is seen in the image above, and n = 3 just like the three columns. I have the values for x[i0] as a tuple of 3 double tuples. I want to use these values later as the x-axis of a plot. Would it be better to make this new column into a series or to add it to the dataframe as another column?

#
```
values = confirmed_exoplanets.loc[:, ["koi_period", "koi_teq", "koi_prad"]]
```
#

confirmed_exoplanets is a subset of a larger dataset

serene scaffold
random dune
#

{'koi_period': [9.48803557, 54.4183827, 2.525591777, 11.09432054, 4.13443512], 'koi_teq': [793.0, 443.0, 1406.0, 835.0, 1160.0], 'koi_prad': [2.26, 2.83, 2.75, 3.9, 2.77]} is the sample of values

serene scaffold
#

@random dune I would organize it like this.

In [75]: koi
Out[75]:
      period     teq  prad
0   9.488036   793.0  2.26
1  54.418383   443.0  2.83
2   2.525592  1406.0  2.75
3  11.094321   835.0  3.90
4   4.134435  1160.0  2.77

In [76]: coef
Out[76]:
          0  1
period  365  1
teq     254  2
prad      1  4

In [77]: koi * coef[1]
Out[77]:
      period     teq   prad
0   9.488036  1586.0   9.04
1  54.418383   886.0  11.32
2   2.525592  2812.0  11.00
3  11.094321  1670.0  15.60
4   4.134435  2320.0  11.08
#

note how I changed the names of the columns in koi, and also how doing koi * coef[1] multiplies the teq values by 2 and the prad values by 4.

random dune
#

i see, can this be extended into a full expression like the equation i sent? so pow(1 - abs((koi[0] - coef[0][0])/koi[0] + coef[0][0])), coef[1][0]/3)

#

or something of the sort, ofc

serene scaffold
#

@random dune idk if this is what you want

In [81]: (koi - coef[0]) / (koi + coef[1])
Out[81]:
       period       teq      prad
0  -33.896907  0.677987  0.201278
1   -5.604307  0.424719  0.267936
2 -102.812359  0.818182  0.259259
3  -29.262138  0.694146  0.367089
4  -70.283401  0.779690  0.261448

In [82]: np.prod(1 - np.abs((koi - coef[0]) / (koi + coef[1])))
Out[82]:
period   -3.019632e+07
teq       2.269543e-03
prad      2.024582e-01
dtype: float64
#

also I know it's missing the exponent. idk what w is.

random dune
#

oh, well the x[i0] in the equation would be the first column of your version of coef, and each w is the second column

serene scaffold
#

@random dune this?

In [93]: coef
Out[93]:
          0  w
period  365  1
teq     254  2
prad      1  4

In [94]: koi
Out[94]:
      period     teq  prad
0   9.488036   793.0  2.26
1  54.418383   443.0  2.83
2   2.525592  1406.0  2.75
3  11.094321   835.0  3.90
4   4.134435  1160.0  2.77

In [95]: np.prod((1 - np.abs((koi - coef[0]) / (koi + coef[0]))) ** (coef['w'] / len(koi)))
Out[95]:
period    0.047383
teq       0.201104
prad      0.071537
dtype: float64
random dune
#

is there a way to get the product for each individual row in koi? or rather what is prod multiplying?

serene scaffold
#

You can do axis=1 instead

In [100]: np.prod((1 - np.abs((koi - coef[0]) / (koi + coef[0]))) ** (coef['w'] / len(koi)), axis=1)
Out[100]:
0    0.278978
1    0.400076
2    0.159780
3    0.204347
4    0.187055
random dune
#

I see! so this is a list of those values! thank you very much, now I know operations like this can be executed on dataframes and series

untold bloom
#
In [55]: coeffs_df = pd.DataFrame.from_dict(coefficients, orient="index", columns=["x0", "w"])

In [56]: df.sub(coeffs_df["x0"]).div(df.add(coeffs_df["x0"])).abs().rsub(1).pow(coeffs_df["w"].div(3)).prod(axis=1)
Out[56]:
0    0.000074
1    0.000034
2    0.000028
3    0.000093
4    0.000015
5    0.000031
6    0.000000
7    0.000016
dtype: float64
#

pandas has methods for its objects to perform your formula, sub for subtraction, abs for absolute value, pow for power etc.

#

you can chain them to build it

#

s.rsub(1) does 1 - s

#

all element-wise, except for .prod which is an aggregator
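
A runnable sketch of that chain next to the plain-operator spelling, using the five sample rows and the x0/w table posted earlier in the conversation. The exponent uses n = 3, the number of columns; the variable names are only for this example.

```python
import pandas as pd

koi = pd.DataFrame({
    "period": [9.48803557, 54.4183827, 2.525591777, 11.09432054, 4.13443512],
    "teq": [793.0, 443.0, 1406.0, 835.0, 1160.0],
    "prad": [2.26, 2.83, 2.75, 3.9, 2.77],
})
# Reference values x0 and weights w, indexed by the same names as koi's columns.
coef = pd.DataFrame(
    {"x0": [365, 254, 1], "w": [1, 2, 4]}, index=["period", "teq", "prad"]
)
n = koi.shape[1]  # 3 columns

# Operator spelling of prod_i (1 - |(x_i - x0_i) / (x_i + x0_i)|) ** (w_i / n).
with_operators = (
    (1 - ((koi - coef["x0"]) / (koi + coef["x0"])).abs()) ** (coef["w"] / n)
).prod(axis=1)

# Method-chain spelling, read from the innermost operation outwards.
with_methods = (
    koi.sub(coef["x0"])          # x - x0
    .div(koi.add(coef["x0"]))    # ... / (x + x0)
    .abs()                       # |...|
    .rsub(1)                     # 1 - ...
    .pow(coef["w"].div(n))       # ... ** (w / n)
    .prod(axis=1)                # multiply across the columns of each row
)
```

Both spellings give one value per row; which one to use is a matter of taste.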

random dune
#

Oh okay!

untold bloom
#

we say to collapse each row by taking the product over the columns belonging to each row (axis=1 means that; 0 is the default axis, which collapses the other way)

#

as an aside, numpy's aggregators by default aggregate the entire thing, i.e., give back a scalar; however, when passed a pandas object, since they implement appropriate numpy dunders, the default axis=0 is in action
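
The axis behaviour on a plain NumPy array, as a small sketch. It deliberately avoids passing a pandas object, since what happens there depends on the pandas version.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

total = np.prod(a)               # no axis: one scalar for the whole array -> 720.0
per_column = np.prod(a, axis=0)  # collapse the rows    -> [ 4. 10. 18.]
per_row = np.prod(a, axis=1)     # collapse the columns -> [  6. 120.]
```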

random dune
untold bloom
#

yes your main dataframe

#

it's assumed to have column names same as the keys of coefficients' dictionary; period etc.

#

so that the alignment will work as intended

random dune
#

Ok, so the chaining follows PEMDAS correct?

untold bloom
#

it follows your formula from inner to outer side

#
  • subtract x0 first
  • then divide that by x + x0
  • then take absolute value etc.
#

operations are element-wise so they happen to every element of the frame

#

and the x0 and w values will be "broadcast" appropriately to happen to each row as intended

#

because frame is of shape (N, 3), x0 and w (3,) each

#

it's as if x0 and w are repeated N times to have (N, 3), then operations are done

#

and which coefficient goes to which column is determined by matching their names

#

both broadcasting and alignment happen automatically for us
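
A small sketch separating the two mechanisms, with made-up numbers. Alignment matches the Series labels to the column names, so a shuffled Series still lands on the right columns. Broadcasting is the "repeat it N times" part, spelled out here with `np.tile` for comparison.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    {"period": [10.0, 20.0], "teq": [300.0, 600.0], "prad": [2.0, 4.0]}
)
x0 = pd.Series({"period": 365.0, "teq": 254.0, "prad": 1.0})

# Alignment: the label order of the Series does not matter.
shuffled = x0[["prad", "period", "teq"]]
# pandas may sort the column labels when the two orders differ,
# so select the original order again before comparing.
aligned = (frame - shuffled)[frame.columns]

# Broadcasting by hand: repeat x0 once per row to get shape (N, 3).
repeated = np.tile(x0.to_numpy(), (len(frame), 1))
manual = frame.to_numpy() - repeated
```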

random dune
#

and by N you mean the length of the frame not the n in the formula, which we take as 3?

iron basalt
untold bloom
#

N = 8 in the example above

#

that link is a wrong answer

random dune
#

so the resulting series should have a length of N too, right?

untold bloom
#

indeed

random dune
#
```
coeffs = pd.DataFrame.from_dict(values, orient="index", columns=["x0", "w"])
exoplanets = confirmed_exoplanets.loc[:, ["koi_period", "koi_teq", "koi_prad"]]
print(exoplanets.sub(coeffs["x0"]).div(df.add(coeffs["x0"])).abs().rsub(1).pow(coeffs["w"].div(3)).prod(axis=1))
```
using this code, i get the following output:
```
0       1.0
1       1.0
2       1.0
3       1.0
4       1.0
       ... 
9559    1.0
9560    1.0
9561    1.0
9562    1.0
9563    1.0
Length: 9564, dtype: float64
```
#

which is weird, since it seems it tripled or even more

#

it probably is due to some misuse

untold bloom
#

you have "df" in the code what is it

random dune
#

the exoplanets would be that, since it is the subset of the main dataframe but only taking those 3 columns

untold bloom
#

so you have df = exoplanets or something

#

because code shared uses exoplanets and df both

#

also these have ["koi_period", "koi_teq", "koi_prad"] koi_ in front

#

coeffs don't, so there is a mismatch

#

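you can do coeffs = pd.DataFrame....add_prefix("koi_", axis=0) to remedy that (axis=0 needs pandas 2.0+; without it add_prefix renames the x0/w columns instead of the index)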
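
A small reproduction of the all-ones output, as a sketch with made-up rows. When the coefficient labels and the column names share nothing, alignment produces NaN in every cell, and `prod` skips NaN by default, so every row collapses to the empty product 1.0. The fix below relabels the index with `rename`, which works across pandas versions; on a DataFrame, `add_prefix` prefixes the column labels unless `axis=0` is passed (pandas 2.0+).

```python
import pandas as pd

exo = pd.DataFrame(
    {"koi_period": [9.49, 54.42], "koi_teq": [793.0, 443.0], "koi_prad": [2.26, 2.83]}
)
coeffs = pd.DataFrame(
    {"x0": [365, 254, 1], "w": [1, 2, 4]}, index=["period", "teq", "prad"]
)

def similarity(frame, coeffs):
    """The chained formula from above, with n = 3."""
    return (
        frame.sub(coeffs["x0"])
        .div(frame.add(coeffs["x0"]))
        .abs()
        .rsub(1)
        .pow(coeffs["w"].div(3))
        .prod(axis=1)
    )

# Labels "period" vs "koi_period" never match: all NaN, then prod() gives 1.0.
mismatched = similarity(exo, coeffs)

# Relabel the coefficient index so it matches the frame's column names.
fixed = similarity(exo, coeffs.rename(index=lambda name: "koi_" + name))
```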

random dune
#

can i also just add koi_ manually to the values set?

untold bloom
#

that also works

random dune
#

Okay! it works finally! So as a last question, when i will want to plot using this new series as the x and another column as the y i can extract y and just plot it correct?

untold bloom
#

yes absolutely

#

you can even do df.plot(x="teq", y="something")

#

or you can use whichever plotting library you want to use by passing df["stuff"] to them as x/y values

random dune
#

okay so the x is the series from that formula, just named that way yes? and y would be my extracted other series

#

or no?

#

oh wait i see df there

untold bloom
#

yes

#

well if everything is in a dataframe as columns, we access them using df[col_name] syntax right?

#

then you can do, e.g., plt.plot(df["some_col"], df["some_other"])

#

the first one was a convenience function of dataframes to plot "quickly" as calling a method

random dune
#

ah i see so df is the combined x and y

untold bloom
#

then you pass column names as strings there and it knows which dataframe to look at because that's what it's called on

#

yes df is still exoplanets

#

i kind of assumed this newly calculated thing also went into this as a new column

#

it doesn't have to

random dune
#

so i used plt.plot(star_mass, exoplanets_similarity) (star_mass is a series) and got this. I assume its because star_mass being the index and not sorted has caused this mess

#

i guess ill have to combine the two series into one and then sort it?

#

by star_mass ofc

untold bloom
#

scatter plot maybe?

#

plt.scatter

#

sorting also works if it's reasonable to do for the quantities

random dune
#
```
star_mass = confirmed_exoplanets["koi_smass"]
plot1 = pd.concat([star_mass, exoplanets_similarity], axis=1)
plot1.sort_values(by=["koi_smass"])
print(plot1)
```
and the result is
```
      koi_smass         0
0         0.919  0.119111
1         0.919  0.217223
4         1.095  0.047048
5         1.053  0.070895
6         1.053  0.061181
...         ...       ...
8817      0.169  0.245184
8956      0.169  0.160354
9014      0.892  0.889360
9083      1.010  0.109329
9181      0.698  0.235039
```
#

which seems unsorted still

#

and when i added plot1.sort_values(by=["koi_smass"], ascending=True, inplace=True) i got
```
      koi_smass         0
6020      0.096  0.192808
3043      0.096  0.241544
655       0.132  0.127794
652       0.132  0.118477
653       0.132  0.071698
...         ...       ...
1633      2.646  0.008439
519       2.736  0.001650
5968      3.573  0.004942
2210        NaN  0.626638
8571        NaN  0.332637
```

untold bloom
#

unless you pass inplace=True

#

another way is re-assign, i.e., df = df.sort_values(...) (df generic frame here)
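
A tiny sketch of the difference, with made-up values: `sort_values` returns a new sorted frame and leaves the original untouched unless the result is assigned back (or `inplace=True` is passed).

```python
import pandas as pd

plot1 = pd.DataFrame(
    {"koi_smass": [1.095, 0.919, 0.169], "similarity": [0.05, 0.12, 0.25]}
)

plot1.sort_values(by="koi_smass")           # result is discarded here
unchanged = plot1["koi_smass"].tolist()     # still in the original order

plot1 = plot1.sort_values(by="koi_smass")   # re-assign to keep the sorted frame
sorted_mass = plot1["koi_smass"].tolist()
```

The index labels travel with their rows, so after sorting the index reads 2, 1, 0 rather than 0, 1, 2.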

random dune
#

is the NaN a result of some rows being left null or just a number range issue?

untold bloom
#

that may be due to pd.concat([star_mass, exoplanets_similarity], axis=1)

#

if star_mass and exoplanets_similarity don't have the exact same index, then for index labels that exist in only one of them, NaN is filled in for the other

#

and the resulting index is the union of the indexes of the passed Series

#

if they have the same index, then NaN is coming from somewhere else

#

inherent
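
A sketch of the index-union effect with made-up values: the two Series share only labels 1 and 2, so concatenating them column-wise leaves one NaN in each column.

```python
import pandas as pd

star_mass = pd.Series([0.919, 1.095, 0.698], index=[0, 1, 2], name="koi_smass")
similarity = pd.Series([0.12, 0.05, 0.24], index=[1, 2, 3], name="similarity")

# The result index is the union {0, 1, 2, 3}:
# row 0 has no similarity value, row 3 has no mass.
combined = pd.concat([star_mass, similarity], axis=1)

# Rows with NaN in the sort column are placed last by default
# (na_position="last"), which is why NaN shows up at the bottom after sorting.
ordered = combined.sort_values(by="koi_smass")
```

If both Series come from the same frame without any filtering in between, their indexes are identical and this effect cannot be the cause.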

random dune
#

could it be because exoplanets_parameters = confirmed_exoplanets.loc[:, ["koi_period", "koi_teq", "koi_prad"]] uses loc and star_mass = confirmed_exoplanets["koi_smass"] doesnt? they are both a column subset of the same dataframe, it seems

#

if not then it seems it would be inherent

untold bloom
#

the two are different ways of selecting columns, achieving the same thing, so that wouldn't be the root cause yeah

#

might need to look at smass' source to see if it was there to begin with

#

formula may give NaNs too, e.g., 0/0 is NaN

random dune
#

Yep, turns out the dataset doesnt have it, weird, ill have to redownload it

limber belfry
#

Does anyone have a code for a real time object detection program using a custom yolo11n model? The frames are from my screen. It would be perfect if i can get like 10-15 fps (ips). Please help. @ me if you answer

flat hawk
#

Anybody has any idea why does my training loss seem to systematically dip?

odd meteor
# flat hawk Anybody has any idea why does my training loss seem to systematically dip?

It's most likely learning rate (lr) related. A well-chosen learning rate allows for steady movement towards the global minimum w/o getting trapped in local minima or experiencing bumpy fluctuations.

  • If lr is too high, the loss will jump erratically (yours isn't so crazy though).

  • If lr is too low, your model gets stuck in local minima or takes too long to converge.

If you have the time for further experiments, try to figure out if it's really a lr issue. You could use

  1. Good old manual strategy (start from using a lr that's too large, say, lr= 1.0, then lower it until it's too small, say, lr= 1e-4 then compare and contrast with the resulting learning curve from your experiments)

  2. Look for a paper that solved a similar task, then use the same learning rate they used (a shortcut that works like a charm)

  3. If you don't wanna take the shortcut in #2, then try using automatic learning rate finder (Lightning framework has this cool option that helps one automatically find an optimal learning rate).

You can as well try other advanced concepts like:

  1. Learning rate schedulers like StepLR, Decay on Plateau, and Cosine Annealing.

  2. Adding momentum parameter in your optimizer to help dampen those oscillations.
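
A toy sketch of the three learning-rate regimes on f(x) = x**2, plus a hand-written step-decay rule in the spirit of a StepLR-style scheduler. The function names and numbers are made up for illustration and are not taken from any framework.

```python
def gradient_descent(lr, steps=50, x0=5.0):
    """Minimize f(x) = x**2 (gradient 2*x); return the loss after each step."""
    x, losses = x0, []
    for _ in range(steps):
        x -= lr * 2 * x
        losses.append(x * x)
    return losses

def step_lr(base_lr, epoch, step_size=10, gamma=0.5):
    """Step decay: multiply the rate by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

too_high = gradient_descent(lr=1.1)    # overshoots further on every step
too_low = gradient_descent(lr=1e-4)    # barely moves in 50 steps
reasonable = gradient_descent(lr=0.1)  # shrinks steadily towards 0

print(f"lr=1.1  -> final loss {too_high[-1]:.3g}")
print(f"lr=1e-4 -> final loss {too_low[-1]:.3g}")
print(f"lr=0.1  -> final loss {reasonable[-1]:.3g}")
```

A real loss surface is not a parabola, so the thresholds differ, but sweeping the rate and comparing the curves works the same way.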

rich moth
rich moth
#

This is my go to strategy

#

!paste


rich river
midnight swallow
#

hi guys, i am just starting data science and ai, can someone give me like a mindmap to follow?

fickle shale
random dune
paper oracle
#

Hi guys I'm comfortable coding in python but would like to start creating an ai but i do not know where to start so could anyone dm how did you guys start

#

I've completed harvard's course CS50 Python

odd meteor
odd meteor
odd meteor
clear condor
#

so i made a working first order ODE solver

#

(for orbits)

#

now im trying to do Runge-Kutta 4th order and its just not working. i understand the math but i dont understand why its not working

flat hawk
clear condor
#

hELP

devout cloak
#

[test smaller dt and try to see if that helps, if so try and find a good or optimal balance between compute time and error reduction]
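
Since the failing code wasn't posted, here is only a reference sketch to compare against: a classic RK4 step for a two-body orbit written as a first-order system, and a check of how far the body ends up from its starting point after one period for two step sizes. The units (GM = 1, circular orbit of radius 1) and all names are assumptions.

```python
import numpy as np

def orbit_rhs(state, gm=1.0):
    """Time derivative of [x, y, vx, vy] for a body orbiting a fixed mass."""
    x, y, vx, vy = state
    r3 = (x * x + y * y) ** 1.5
    return np.array([vx, vy, -gm * x / r3, -gm * y / r3])

def rk4_step(f, state, dt):
    """One classic 4th-order Runge-Kutta step for d(state)/dt = f(state)."""
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)   # each stage uses the previous stage
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def closure_error(n_steps):
    """Integrate one full period; return the distance from the start point."""
    start = np.array([1.0, 0.0, 0.0, 1.0])  # radius 1, speed 1 -> period 2*pi
    dt = 2 * np.pi / n_steps
    state = start.copy()
    for _ in range(n_steps):
        state = rk4_step(orbit_rhs, state, dt)
    return float(np.linalg.norm(state[:2] - start[:2]))

coarse = closure_error(100)
fine = closure_error(1000)
print(f"100 steps: {coarse:.2e}   1000 steps: {fine:.2e}")
```

Typical things to compare against: all four stages must update position and velocity together as one state vector, and k2 to k4 must each be evaluated at the state built from the previous stage rather than from k1.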

#

unrelated to above

I’m looking to start working on some new techniques for hyperparameter optimization in GANs and I wondered if anyone has any interesting papers I should consider, I am familiar with optuna, but I’m looking to do some paper implementations for practice and experimenting

odd meteor
#

SIGIR 2025 is happening in Europe as well. If you work on low resource languages or information retrieval, you can submit a 2-pages proposal to this venue and go present your work.

https://sigir2025.dei.unipd.it/index.html

SIGIR 2025

The 48th International ACM SIGIR Conference on Research and Development in Information Retrieval | July 13-18, 2025 in Padua, Italy

lapis sequoia
# lapis sequoia
Poll: What should I learn? (FYI, I learnt Python very well before)

Winning answer: 2 of 4 votes

fading wigeon
# lapis sequoia

Context? What are you trying to do and JS and ML just to clarify what do they stand for? (I'm assuming javascript and machine learning but don't want to assume)

#

Also, I have a question. When you're trying to diagnose a model to see if getting more training data will help, what do you do?

My first thought would be to do a learning curve, train the model on subsets of your training set and look at your cross validation and training set error to determine if new data helps. But that can be computationally expensive so I understand that it's not done in practice. So how would you determine if expanding your data set would help? (I've worked in fields where it was exorbitantly expensive to acquire more data, so would only do it if it could be proven to help beforehand)
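
For reference, a learning curve can be sketched in a few lines when the model is cheap to fit. The example below uses synthetic data and ordinary least squares, so every name and number in it is an assumption; with an expensive model the same loop is what gets costly.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 10
true_w = rng.normal(size=n_features)

def make_data(n):
    """Synthetic linear data with unit-variance noise."""
    X = rng.normal(size=(n, n_features))
    y = X @ true_w + rng.normal(size=n)
    return X, y

X_train, y_train = make_data(400)
X_val, y_val = make_data(1000)

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

# Fit on growing prefixes of the training set; score on train and validation.
sizes = [12, 25, 50, 100, 200, 400]
curve = []
for n in sizes:
    w, *_ = np.linalg.lstsq(X_train[:n], y_train[:n], rcond=None)
    curve.append((n, mse(X_train[:n], y_train[:n], w), mse(X_val, y_val, w)))

for n, train_err, val_err in curve:
    print(f"n={n:4d}  train MSE={train_err:.3f}  val MSE={val_err:.3f}")
```

If the validation error is still falling at the largest size, more data is likely to help; if it has flattened out close to the training error, it probably won't.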

fading wigeon
#

Hey, I finally got a chance to go back and review this, thank you!

My only question is what do you mean by training multiple output layers? I thought there was only a single output layer? Or am I missing something?

wheat merlin
# midnight swallow hi guys, i am just starting data science and ai, can someone give me like a mind...

Ok. So I feel pretty qualified to answer this as a current PhD student. I started from scratch, about 2 years ago, with a very rudimentary math background (basic calc).

What I did was try to learn and understand everything in the scikit-learn library. Just go through it, watch videos and look at examples. Then in your own code, I would practice regression (linear, logit, and maybe fixed/random effects time series) -> then I would go to classification (learn random forest, logit, svm, other advanced versions you find interesting) -> then I would go to clustering and dimensionality reduction (k-means and PCA; you could go deeper, but I don't find this stuff as important/interesting personally) -> then I would learn some basic preprocessing (return to previous regressions or whatever, and learn imputation, feature extraction, and normalization techniques)

Once you have learned all this, I feel like you can now stand up within the machine learning space. Everything else you want to learn, should come easier. Personally, from this point, I directed all my time into learning natural language processing stuff. But if you find video/audio/image stuff more interesting you could do that. The nice part about these more advanced techniques, is that a majority of them run on the same 1-2 model architectures and therefore can be understood relatively easy.

#

From a natural language processing perspective what I did was I specialized with PyTorch (I don't like Keras/tensorflow libraries). Then, I implemented my first model to classify my text dataset by positive/negative sentiment using a base model included in PyTorch. Once I learned the math and architecture behind these base models, I dedicated a ton of time to the older (2018) BERT models (not generative AI/LLM). I taught myself using PyTorch documentation to build RoBERTa from scratch, and implemented all the code for the tokenizer, attention mechanism, feedforward, dataloader, etc. This was the most informative project I did for sure. In the process, I made sure to understand all the function parameters to the best of my ability, which was definitely a really good thing. From here, I would say you are more than qualified to start reading research papers and digging into all the nascent advancements.

Sorry, this was a very long post, but I wish I had this when I started learning. The biggest thing I would say I learned about machine learning in general, is that it is a ton of different fields working together. You have to be an effective statistician, mathematician, programmer, data analyst, and data scientist to truly understand all the intricacies/complexities the field is moving to.

#

Obviously this was my experience as a research focused individual, others may have had different experiences

#

I would also HIGHLY recommend that you work with data you find interesting and intriguing. You can answer any question about the data, and to really engage with the learning, I think it's super important you choose datasets you have questions about. Kaggle is a good website for this as is Nature's database, or Harvard Dataverse for academic papers.

wheat merlin
#

Would be slightly more computationally feasible. I'm not sure of any other interesting methods, but there probably are some new advancements

odd meteor
fading wigeon
#

What do you mean by bootstrapping the data in this context?

wheat merlin
#

Using the dataset you have, and creating an extra amount of rows. Maybe (20% more) if it's a few thousand.

#

I think that would be a good method for survey methodologists or social scientists to predict cost/benefit

#

I did some reading

#

I guess you could also just cross-validate on your set and look at the variability between folds
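
A sketch of the fold-variability check on synthetic data with a least-squares model. The data, the choice of k = 5 and the names are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(size=200)

k = 5
# Shuffle the row indices once, then cut them into k roughly equal folds.
folds = np.array_split(rng.permutation(len(X)), k)

fold_mse = []
for i in range(k):
    val_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    w, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
    fold_mse.append(float(np.mean((X[val_idx] @ w - y[val_idx]) ** 2)))

# A spread that is large relative to the mean suggests the fit is sensitive
# to which rows it saw, which is the situation where more data tends to help.
mean_mse, spread = float(np.mean(fold_mse)), float(np.std(fold_mse))
print(f"fold MSEs: {np.round(fold_mse, 3)}  mean={mean_mse:.3f}  std={spread:.3f}")
```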

fading wigeon
#

Yeah I suppose

fading wigeon
#

A learning curve still seems to be the ideal way, but... I suppose it can sometimes be hard to justify the computational resources

#

or you could be in a field where it's cheaper to get more data than it is to do the learning curve

#

(I worked in neuroscience and more data meant conducting an expensive study that costs like 40k per patient)

dusty viper
#

Happy new year guys. I’m a sixth form student in the UK and am confident in my basic Python coding skills. I would js like some advice on how to get started with my AI journey. My aim by the end of next year is to build a chat bot that helps with finance, well that's the end goal😓. Can you guys direct me to some courses or smth that’ll help me learn how to apply my skills to AI and start my journey? It would be great to receive advice from u guys who have been doing this for a while.

fading wigeon
#

Can you give some examples of what you might want it to do?

#

My first thought is to think of it as two separate projects. A chatbot interface then a finance model/algorithms for specific financial problems

#

Unless you want it to just give generic/general finance advice I suppose

dusty viper
#

It’s for my NEA and I want it to be based around investing in stocks (because i have a lot of prior knowledge abt that field) and choosing the best ETFs to invest in over the long term, but initially i just want to start learning how to code and actually make a chat bot. Can you give me a starting point such as a course or a video that’ll help me?

#

I’ve done a decent amount of game dev and got kinda bored of it so i wanted to move over to something i find more interesting such as AI

cold goblet
#

pivot according to my course slides 💀

#

I feel like I am losing my mind

#

can someone confirm that this is in fact not pivot

wheat merlin
#

i think of pivot as changing from wide or long, but I guess you could argue there is like "pivot" "pivot_wider" and "pivot_longer"

#

? maybe idk lol

#

definitely feels confusing though

cold goblet
#

I am honestly looking at it and, like this isn't actually doing anything but then it's in the slides

earnest canyon
#

Happy New Year to Everyone ✨💫

serene scaffold
cold goblet
#

I might still have to write it as pivot on the answer sheet but still

serene scaffold
cold goblet
#

like a group by, I guess??

serene scaffold
#

Uh no

#

What is the question?

#

When the rows become columns and the columns become rows, that's called transposing. And then pivoting is a specific thing that is completely different from that.

But if your instructor uses those words differently than everyone I know, I can't make them stop.
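To make the difference concrete, here is a minimal pure-Python sketch. The table and its column meanings (city, year, value) are made up for illustration. In pandas the equivalent operations would be `DataFrame.T` for the transpose and `DataFrame.pivot` for the reshape.

```python
# Hypothetical long-format table: one row per (city, year) observation.
rows = [
    ("Oslo", 2023, 10),
    ("Oslo", 2024, 12),
    ("Rome", 2023, 20),
    ("Rome", 2024, 25),
]

def transpose(table):
    """Swap rows and columns: element [i][j] moves to [j][i]."""
    return [list(col) for col in zip(*table)]

def pivot(table):
    """Reshape long -> wide: one entry per city, one sub-entry per year."""
    wide = {}
    for city, year, value in table:
        wide.setdefault(city, {})[year] = value
    return wide

transposed = transpose(rows)
pivoted = pivot(rows)
print(transposed)  # three rows: all cities, all years, all values
print(pivoted)     # one entry per city, keyed by year
```

Transposing only flips the axes and never looks at the values. Pivoting uses the values in one column (here the year) to decide which new column each number lands in.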

cold goblet
#

what are the possible operations in an OLAP? and the operations are drill up, down, slice, dice and pivot

serene scaffold
#

Idk what olap is

cold goblet
#

ahh. okay

serene scaffold
#

Sounds like it's a certain way of conceptualizing large stores of structured data

cold goblet
#

I'm thinking that too. but there's no mention of what exactly the operation is supposed to accomplish or what's the possible use case.

wheat merlin
#

I think it's just to display the information to a viewer differently in an eventual table or something

neat sparrow
#

Is HuggingFace good for dataset gathering and NLP libraries such as NLTK?

iron basalt
#

It's basically an implementation detail for databases that want to support operations that affect a lot of the data, not transactional.

#

Basically, OLTP oriented is lots of small (and simple) queries (that often edit state). OLAP oriented is a few very large (and complex) queries that are for analysis (often read only). Technically you can have some database that can do both well.

#

All the table libraries used in data science would usually fall under OLAP or OLAP adjacent (OLAP is a specific thing (the cube) (but maybe also not so specific, it's a bit hand wavy), but either way these libraries are for analysis, not doing a bunch of transactions).

#

(However, its specific terms for stuff like pivot mean something else (internal terminology))

neat sparrow
iron basalt
#

(Like how you don't need to care about what ACID is, the database just does the thing you want without knowing that)

wheat merlin
neat sparrow
wheat merlin
#

The best datasets that are used to train current models are on huggingface

#

if you want like "fun" datasets, you could look at Kaggle

#

if you want social science datasets like political information you could try Harvard Dataverse

neat sparrow
#

What about math or history ones?

#

Thank you for the info by the way

wheat merlin
#

Ya, idk about history exactly, but probably on Harvard Dataverse

#

if you are interested in like the history of wars or something, there is correlates of war which is a dataset you can find

#

here is some other history ones too

neat sparrow
#

Thanks. One more thing, according to Claude AI, I should use spaCy over NLTK for Natural Language Processing. Is that a good choice in my case, based on what it seems I'm using?

wheat merlin
#

I actually havent used either, I have only used PyTorch and HuggingFace

#

It looks like NLTK has more options and may be a bit more customizable

rich moth
#

!paste

arctic wedgeBOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

gritty vessel
#

Hey is there any tutorial to train auto encoder on custom dataset?

#

My Input shape is 453,958

#

And it always comes out as 456,960

#

As output

neat sparrow
wheat merlin
#

No, I think you would need multiple GPU's and like 1tb of RAM lol

#

its 9.45 TB of harddrive space

neat sparrow
wheat merlin
#

You could try and see if there is a subset, or you could random sample the data

wheat merlin
#

Like randomly select 1 million rows out of the 1 billion, so that you have a smaller dataset to work with
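One way to take that kind of random subset without loading everything is reservoir sampling, which only needs a single pass over the rows. This is a minimal sketch, not tied to any particular dataset: `big_stream` is a stand-in generator, and the function name and seed are my own choices.

```python
import random

def reservoir_sample(rows, k, seed=0):
    """Keep a uniform random sample of k items from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Row i replaces a kept row with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample

# Stand-in for a huge dataset: a generator, so nothing is held in memory.
big_stream = (f"row-{n}" for n in range(100_000))
subset = reservoir_sample(big_stream, k=1_000)
print(len(subset))
```

Memory use stays at `k` rows no matter how long the stream is, which is the point when the full dataset is terabytes.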

#

Is there anything you like about that dataset in particular? I could try and help you find a more reasonable one for home compute

#

I would recommend trying this

#

So they import the dataset with huggingface and they are selecting specific keywords in the rows

#

it would probably take a long time to process, but you could get a few million rows to play with

#

just replace the keywords with stuff that you are interested in

neat sparrow
#

I'm looking for a dataset that's good enough to be able to give the model enough data to be able to answer most daily questions.

#

And questions with specific grammatical words

#

And also questions for programming, math, and history.

#

(Not all of those in one though. I'm okay with multiple small datasets that can accommodate those.)

wheat merlin
#

As far as academic stuff goes, the dataset you found would be good. You can use that code and replace the keywords with history or math questions

#

This python dataset would probably not take an obscene amount of time to model, the train is 300k rows

#

I found this book too, it looks like the end chapter (chapter 16) teaches you how to "use the LangChain library to combine pre-trained large language models (LLMs) with Wolfram Alpha and Wikipedia APIs to create a zero-shot know-it-all personal assistant."

#

which seems kinda like what you want

#

If you are interested in this book, i would recommend doing chapter 8 through 12, then 16 if you want to converge your models

neat sparrow
wheat merlin
#

yeah, its crazy

#

I think with the huggingface library though you can import the dataset iteratively so you don't have to download it or anything

#

if you wanted to skim through all the rows to get the data you want it would take forever of course

neat sparrow
#

I already found some good wikipedia datasets that are cleaned

wheat merlin
#

Nice. Yeah, I would start with that. You can always just replace the dataset if you find a better one. The model architecture would stay basically the same

wheat merlin
#

Have you done any implementation of pre-trained models before?

wheat merlin
#

Cool. If you need any help or have questions about the basic premises of the math behind neural nets you can @ me

neat sparrow
#

I used BERT to pre-train a model with some medium datasets

wheat merlin
#

cool, I did that over the summer for a polling firm I was working for

#

BERT is a great way to learn neural nets for sure

neat sparrow
wheat merlin
#

What do you mean

#

Like train multiple models for classification and linking them together? Or something else

wheat merlin
#

I think you can do that. I haven't done it personally. But it appears the LangChain library is what you would want to research

#

I know they use LangChain to make the multi-modal models that have image, chat, and other technologies all in the one

#

Not entirely sure about classification models though

neat sparrow
#

Alright, thanks, I think I've got enough answers for now. Is it okay if I dm you next time to not use up the chat?

wheat merlin
#

yeah 👍

solemn silo
#

Please can someone help me understand the chain rule because i swear I have spent forever, like a few months, and I still don't understand it 💢 ‼️

serene scaffold
#

also, can you tell me the derivative (wrt x) of 4x^3?

paper cove
#

hi

#

is there any place to start learning tokenization and neural networks

#

more on fundamental or mathematical level

wheat merlin
#

this guy is REALLY good if you need visuals like I do

serene scaffold
paper cove
#

i meant for neural networks

serene scaffold
#

I know.

#

you don't really need to "learn tokenization". you just use an existing tokenizer. making your own tokenizer is very advanced and unusual.

paper cove
#

but it is hard to tokenize things in other languages

serene scaffold
#

what are you trying to do?

paper cove
#

like in japanese, we don't use space to differentiate in words

#

it is all continuous

wheat merlin
#

it's not that hard to make a tokenizer

serene scaffold
paper cove
#

by the way understanding a tokenizer as a function that converts a string into a nth dimension vector is good?

#

where there are 'n' words in entire dictionary

#

and less than 'n' words in the string

wheat merlin
#

Right, for older models. For newer models there is text embedding which adds some more complexity

rich moth
#

Looking for some feedback on this idea , what am I missing here? I’ve been brainstorming a generalized phase encoding system that adapts to different data types and computes a complexity score. The core equation is Φ(x) = x + A * exp(iθ(x)), where θ(x) captures intrinsic structure depending on the data type:

Time Series: θ(x) = ω * t + φ(volatility, trend_strength) (e.g., trends, seasonality).
Images: θ(x) = Σ(spatial_frequency * position + texture_density) (patterns + textures).
Text: θ(x) = semantic_embedding * syntactic_structure (word embeddings + grammar).
Tabular: θ(x) = Σ(feature_importance * value_normalization) (relationships between columns).

paper cove
#

so how can i preserve the order of words into a vector if i use it?

wheat merlin
#

So I think methods like word2vec keep an order of words, but to reduce computational costs they reduce the words to a lower dimensional vector

#

so if your vocab was 1000 words, you can set a cap on word2vec to minimize it to the 300th word

#

so you are left with a vector of 300 length, that still preserves order of words, just is more efficient

paper cove
#

what the vector consists of btw?

wheat merlin
#

I also think the vectors are grouped by context with the embeddings, so they aren't necessarily in the same order they came in on

serene scaffold
paper cove
#

in the same order

serene scaffold
wheat merlin
#

right, so the current method is embedding which looks like this:

#

so you can limit to the nth dimension, instead of the vocab size

serene scaffold
wheat merlin
#

and vectors incorporate context by having values similar to each other

paper cove
serene scaffold
#

if you have a vocabulary of m words and you want to represent them as n dimensional vectors, m and n are unrelated.

paper cove
#

for example, the sentence is "This is a cat."

wheat merlin
#

are you asking specifically for purposes like translation?

paper cove
#

tokenizer will convert it into ["This", "is", "a", "cat."]

wheat merlin
#

yes

paper cove
serene scaffold
#

it would probably have the "." as its own token, because you don't want "cat" and "cat." to be different

paper cove
#

i see

paper cove
#

so i got this output ig from tokenizer

serene scaffold
#

and it might also treat "This" and "this" as the same token
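A toy tokenizer that does both of those things (punctuation as its own token, case folded) can be sketched with a regular expression. This is only an illustration for space-separated text; it will not segment languages written without spaces, such as Japanese, and real tokenizers are more involved.

```python
import re

def tokenize(text):
    """Lowercase, then split into words and standalone punctuation marks."""
    # \w+ grabs runs of word characters; [^\w\s] grabs one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text.lower())

tokens = tokenize("This is a cat.")
print(tokens)  # ['this', 'is', 'a', 'cat', '.']
```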

paper cove
#

now i assign them numbers?

paper cove
#

["this", "is", "a", "cat", "."]

serene scaffold
wheat merlin
#

yeah I think GPT uses RoPE for its embeddings

#

but other models use different things

rich moth
#

Sometimes words like unbreakable will get split
["un", "break", "able"]

paper cove
#

i see

wheat merlin
#

thats called lemmatization? or stemming?

serene scaffold
wheat merlin
#

yeah that makes sense

paper cove
#

if my dictionary contains the following words.

  1. a
  2. an
  3. the
  4. this
  5. that
  6. these
  7. those
  8. is
  9. are
  10. am
  11. cat
  12. dog
  13. .
  14. ,
wheat merlin
#

Stemming appears to be removing common suffixes and lemmatization reduces words to their root form

paper cove
#

so i use 14th dimension vector?

serene scaffold
serene scaffold
paper cove
#

so models play a very important role in what input they take

#

i was thinking to convert string into some form which won't lose any information and is computational so that any model can use it

serene scaffold
paper cove
#

i see

serene scaffold
#

when you map tokens to integers (like assigning "cat" to 42069), that mapping is arbitrary--it's only significant for the model insofar as you always use the same integer for the same token.
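As a small sketch of that mapping (the helper names `build_vocab` and `encode` are made up here), ids can simply be handed out in first-seen order. Any other consistent assignment would work equally well.

```python
def build_vocab(tokens):
    """Assign each distinct token the next free integer id, in first-seen order."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def encode(tokens, vocab):
    """Look up each token's id; the same token always gets the same id."""
    return [vocab[tok] for tok in tokens]

corpus = ["this", "is", "a", "cat", ".", "that", "is", "a", "dog", "."]
vocab = build_vocab(corpus)
ids = encode(["this", "is", "a", "dog", "."], vocab)
print(ids)  # [0, 1, 2, 6, 4]
```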

paper cove
#

so after doing that, i need to prepare a neural network which can take that list of integers which are converted from tokens

wheat merlin
#

yeah, for RoBERTa specifically the "main" steps are encoder -> word embedding -> position embedding -> attention -> encoder

#

other models are different ofc

paper cove
#

what is word embedding now?

wheat merlin
#

word being converted to a dense vector

#

["I", "love", "coding"]

[[0.1, 0.3, 0.4], [0.5, 0.2, 0.7], [0.8, 0.1, 0.6]]

#

then does similar for position embeddings

#

then it adds the two together to get the full embedding

#

then that value goes to the attention
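A rough sketch of those steps in plain Python, assuming BERT-style lookup tables for both words and positions. The sizes and the random values are illustrative only; in a trained model both tables are learned parameters.

```python
import random

random.seed(0)
VOCAB_SIZE, MAX_LEN, DIM = 5, 8, 3  # illustrative sizes, not from any real model

def random_table(rows, cols):
    """Stand-in for a learned embedding matrix."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

word_table = random_table(VOCAB_SIZE, DIM)  # one row per token id
pos_table = random_table(MAX_LEN, DIM)      # one row per position in the sequence

def embed(token_ids):
    """Look up each token's vector and add the vector for its position."""
    out = []
    for pos, tok in enumerate(token_ids):
        out.append([w + p for w, p in zip(word_table[tok], pos_table[pos])])
    return out

embeddings = embed([0, 3, 3])
print(embeddings)
```

Token id 3 appears twice but ends up with two different vectors, because the position part differs. Note also that the vector length (`DIM`) is chosen independently of the vocabulary size.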

paper cove
#

are these values pre assigned or?

wheat merlin
#

when you create the tokenizer you train it on a dictionary of words I think

#

so yeah I think it is trained and knows what to give each word

paper cove
#

but tokenizer just converted strings into a list of words?

#

i have dictionary, and i have assigned a number to each word, and converted the token into integer

#

after that i don't get what word embedding is representing here

wheat merlin
#

yes, my understanding is you do that in the initial encoding block, then the word embedding creates the dense vector

#

def forward(self, prompts):
    if isinstance(prompts, str):
        prompts = [prompts]

    encoded = [self.tokenizer.encode(prompt) for prompt in prompts]
    max_len = max(len(seq) for seq in encoded)
    padded = [seq + [self.tokenizer.get_pad_token_id()] * (max_len - len(seq)) for seq in encoded]
    
    input_ids = torch.tensor(padded, device=self.device)
    if input_ids.dim() == 1:
        input_ids = input_ids.unsqueeze(0)
    
    word_embeds = self.word_embedding.get_embeddings(input_ids)
    pos_embeds = self.positional_embedding(input_ids.size(1))
    
    embeddings = word_embeds + pos_embeds
    
    attention_mask = (input_ids != self.tokenizer.get_pad_token_id()).float()
    extended_attention_mask = attention_mask.unsqueeze(1).unsqueeze(2)
    extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype)
    extended_attention_mask = (1.0 - extended_attention_mask) * -10000.0
    
    encoder_outputs = self.encoder(embeddings, attention_mask=extended_attention_mask)
    sequence_output = encoder_outputs[0]
    
    pooled_output = sequence_output[:, 0, :]
    
    logits = self.classifier(pooled_output)
    
    return logits
#

code helps me visualize

paper cove
#

got an image from internet

#

not sure if this is what word embedding means

wheat merlin
#

that looks inaccurate to me

paper cove
#

i see

wheat merlin
#

at least with bert models

#

i think they are conflating multiple steps into the word embedding

#

it doesn't just go word to perfect, there is multiple steps as you can see in the above code

paper cove
#

any recommendations for docs or video to learn word embedding first?

wheat merlin
#

Words are great, but if we want to use them as input to a neural network, we have to convert them to numbers. One of the most popular methods for assigning numbers to words is to use a Neural Network to create Word Embeddings. In this StatQuest, we go through the steps required to create Word Embeddings, and show how we can visualize and validat...

paper cove
#

thanks

paper cove
flat token
paper cove
#

like f_noun("cat") = 1
f_noun("is")=0
where f_noun is an activation function???

wheat merlin
#

I think its just a value assigner yeah

#

or perhaps a selector is a better description

neat sparrow
rich moth
#

Okay, this is pretty cool, guys. You can actually see the model shifting between using the base dimensions, and expanding its dimensions as the complexity changes. This is what I hoped would happen, adapting on the fly. I think the quantum-inspired is still the key to all this but adaptive dimensions stuff is something else. It seems to be learning really well, and only uses the extra capacity when it needs it, and chills out when it doesn't. Check these out

wheat merlin
#

what's the code for the adaptive dimension?

#

is it the Adaptive library?

rich moth
#

This one is indeed my highest score -256. What is pretty amazing is that the base dimension is only 64, but it's modified with an expansion factor. I found it ideal to keep it small or modest; big isn't necessary because the vector space is much more optimized. Switching between its dims doesn't seem to affect its accuracy at all. I believe this could be a foundation for time series data or other data that's chronologically organized.

@wheat merlin It's a custom job, not a library. Basically, I've got this "Market Complexity Detector" that checks out the market's vibe - you know, volatility, trends, that kinda stuff. Then, based on that complexity score, the transformer blocks can "expand" or "compress" their dimensions.

wheat merlin
#

That's really cool

#

Is it proprietary or is it based off a paper implementation?

rich moth
#

Thanks man, it's been a labor of love for so long, it feels like

#

It all started with an idea trying to predict prime numbers

#

Obviously didn't have the compute capacity to see it through but learned some cool stuff hehe

rich moth
#

Didn't happen overnight though, and I stumbled upon a lot of stuff through experimentation more than anything.

wheat merlin
#

Yeah im kinda looking into similar papers, it looks like this stuff kinda started getting implemented around 21-22

#

But you may have done something unique idk... could be worth looking to see if you could write a publication if you are interested in that

#

Cool idea though, definitely going to look into it more in regards to Graph theory

gritty vessel
rich moth
# wheat merlin Yeah im kinda looking into similar papers, it looks like this stuff kinda starte...

Thanks, dude. I appreciate that. I’m just an old UPS driver who loves tinkering with this stuff in my spare time. Writing a paper sounds cool but honestly kind of intimidating, and I've read my fair share. The whole academic process is a whole other beast, you know? I found my secret sauce in life is brainstorming and tinkering. I guess it's problem solving as a whole, but everyone is a problem solver I guess lol. That said, I have started compiling a lot of this data into a report that I hope to refine, and with the visuals I've gathered along with other metrics, I'm sure I can eventually tackle that.

#

Zapbot is a beast.

plush kettle
#

Is opencv good for image augmentation? I would like to create random faces with various perspectives like for example: looking right, looking left, etc.

wooden sail
plush kettle
#

Can you give me such example/s?

wooden sail
plush kettle
#

Alright, thanks

wooden sail
#

but yeah my main point is that that is not an augmentation task, it's a very difficult modeling/inversion problem

#

if you use this to train another network, you now have several points of failure because the new images cannot be trusted (you have a dirty training set)

solemn silo
wooden sail
#

what about the chain rule is troubling you?

young granite
#

guys i want to compare rows in my feature df and see which ids have a high unity.

           col_0  col_1  col_2  col_3  col_4  col_5  col_6  col_7  col_8  \
412788399      0      0      1      1     55  41351      1  47333      1   
412763015      0      0      1      1     62  92000     99  47999      5   

           col_9  ...  col_37  col_38  col_39  col_40  col_41  col_42  col_43  \
412788399    0.0  ...     1.0     1.0     0.0     0.5     0.5     0.5     1.0   
412763015    0.0  ...     1.0     0.0     0.5     0.5     0.5     0.5     1.0   

           col_44  col_45  col_46  
412788399     1.0     1.0    21.0  
412763015     1.0     1.0    12.0

currently i use the last col (sum of features) but i would love a better approach, not sure how that would look tho :D.
I did feature dist., corr and relationship plots already.

Any ideas?

serene scaffold
young granite
serene scaffold
young granite
serene scaffold
#

@young granite for each pair of rows x, y, you can calculate the elementwise |x - y|, I guess

young granite
serene scaffold
young granite
serene scaffold
#

and [2, 2, 2] also has the same sum, but manhattan([1, 2, 3], [2, 2, 2]) < manhattan([1, 2, 3], [3, 2, 1])
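Here is that comparison as a quick sketch, using the same three example rows: the sums are identical, but the Manhattan distance (sum of elementwise |x - y|) tells them apart.

```python
def manhattan(x, y):
    """Sum of absolute elementwise differences between two rows."""
    return sum(abs(a - b) for a, b in zip(x, y))

a, b, c = [1, 2, 3], [2, 2, 2], [3, 2, 1]
print(sum(a), sum(b), sum(c))            # 6 6 6
print(manhattan(a, b), manhattan(a, c))  # 2 4
```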

wheat merlin
#

You could look at pairwise similarity using cosine or jaccard similarity

#

you could also use graph theory with those pairwise similarity measures to visualize the ids close to each other
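A minimal sketch of the pairwise cosine idea. The ids and feature values below are hypothetical, and I'm assuming the features were already scaled to comparable ranges; otherwise a column with values in the tens of thousands would dominate the score.

```python
import math
from itertools import combinations

def cosine_similarity(x, y):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

rows = {  # hypothetical ids with already-scaled feature values
    "id_a": [1.0, 0.0, 0.5, 1.0],
    "id_b": [1.0, 0.0, 0.5, 0.9],
    "id_c": [0.0, 1.0, 0.5, 0.0],
}

# Score every unordered pair of ids, then pick the most similar pair.
scores = {(p, q): cosine_similarity(rows[p], rows[q]) for p, q in combinations(rows, 2)}
best_pair = max(scores, key=scores.get)
print(best_pair)
```

The `scores` dict can double as a weighted edge list if you want to draw the ids as a graph afterwards.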

serene scaffold
wheat merlin
#

crazy they are used for recommendation systems... with data that looks exactly like what he is working with...

serene scaffold
#

(or you can use jaccard anyway, but you'll get something meaningless)

#

I recommend not using GeeksforGeeks--their information is almost always questionable, and there are so many free resources out there.

wheat merlin
#

yeah you are right that is for binary data; there would be a better approach for sure

#

but there is a fuzzy jaccard index that can take continuous data if you were really bent on using jaccard

solemn silo
wooden sail
#

maybe an illustrative example for you is that x^6, for which you already know the derivative is 6 x^5, can be written as (x^3)^2, where we can think of f(x) = x^3, and g(z) = z^2, and then take g(f(x)). applying the chain rule should yield 6x^5, so you can check yourself if you did it right

#

and similarly for polynomials with higher powers, say x^8, noticing that x^8 = ((x^2)^2)^2
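Written out, the x^6 check goes like this (same f and g as above):

```latex
\begin{aligned}
f(x) &= x^3, & f'(x) &= 3x^2,\\
g(z) &= z^2, & g'(z) &= 2z,\\
\frac{d}{dx}\, g(f(x)) &= g'(f(x))\, f'(x) = 2x^3 \cdot 3x^2 = 6x^5 .
\end{aligned}
```

The outer derivative is evaluated at the inner function, g'(f(x)) = 2x^3, and only then multiplied by the inner derivative.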

dry raft
#

hey guys

#

I am trying to use a vision transformer for binary classification

#

however, at some point, it keeps guessing "1" and gets stuck at this

#

how can i fix this(broad tips)

wheat merlin
#

I think there are a lot of potential options:

Make sure your class balance is good, try changing learning rate, try regularization

dry raft
#

There are GitHub repositories for this by the way

wheat merlin
#

Ok I can look at it, I only have used one ViT before though

#

have you tried using a different size of the model, it looks like they have 5m, 11m, and 21m parameters models

#

but I would also look at: class balance, learning rate, and regularization

boreal gale
# young granite share similar feature set

similar in what sense?

is (1,1,1) and (1.1, 1.1, 1.1) similar? since they are close in magnitude away from 0
or just (1,1,0) and (1,1,1) ? since they share the 2 preceding 1s

empty wing
#

I've been trying to make a neural network combined with a sort of macro: it would scan a group of images and click on a specific one for a set period of time, and if it doesn't see the image it would skip to the next set of pictures. I've been using PyTorch with CUDA and Anaconda, and I'm confused about how I can give my network pictures to learn from if I'm confined to a terminal

upbeat prism
balmy grail
#

Hey, does anyone have a course recommendation for ML

upbeat prism
fickle shale
#

During validation in seq2seq, what if the decoder produces an output sequence shorter than the true output sequence? How is the loss calculated in such cases? Will categorical cross entropy work in such cases?

wheat merlin
#

to standardize lengths

#

at least in the models im familiar with that's the process, I can't speak for generative models necessarily

fickle shale
wheat merlin
#

I think that's part of the tokenizer's function

#

you would train the tokenizer on a dictionary, it is able to then assign values to words

#

so multiple word words would be known if it was in the tokenizer dictionary when it was trained

#

I could be wrong on that last part

#

it may just treat each word as separate, and it wouldn't matter much because the word and positional embeddings would still find similarities

#

but in sum, it's part of the tokenizer's function to make sure it's processed correctly

serene scaffold
#

But it's not the model that does this. "You" have to do it before passing the tensor into the model.

wheat merlin
#

You are right. I was defining model as the whole ipynb file. The padding would be part of the preprocessing before the data is sent to the attention head of the actual model
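For the earlier loss question, one common approach is to pad every sequence to the same length during preprocessing and then skip the padded positions when averaging the loss. The sketch below assumes that setup; `PAD = 0` and both function names are hypothetical, and real frameworks do this on tensors.

```python
import math

PAD = 0  # hypothetical token id reserved for padding

def pad_batch(seqs, pad_id=PAD):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_id] * (max_len - len(s)) for s in seqs]

def masked_cross_entropy(probs, targets, pad_id=PAD):
    """Mean negative log-likelihood over the non-pad target positions.

    probs[t] is the predicted distribution over the vocabulary at step t.
    """
    total, count = 0.0, 0
    for dist, tgt in zip(probs, targets):
        if tgt == pad_id:
            continue  # padded steps contribute nothing to the loss
        total += -math.log(dist[tgt])
        count += 1
    return total / count

batch = pad_batch([[2, 1], [1, 2, 1, 2]])
uniform = [[1 / 3, 1 / 3, 1 / 3]] * 4  # a model that guesses uniformly over 3 tokens
loss = masked_cross_entropy(uniform, batch[0])
print(batch[0], loss)
```

With a uniform guess over three tokens the loss comes out to ln 3, and the two padded steps do not change it.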

serene scaffold
limpid zenith
#

yeah generally a model is the layer definitions and/or its weights
a jupyter notebook isn't really a model

wheat merlin
#

we call BERT a model right

#

but when we say that, we know it has its own unique tokenizer and preprocessing steps

#

a model in my mind is the preprocessing, tokenizer, embedding, attention, and encoder outputs

#

if we say that the model is only the attention onwards, that's also misleading, because you could make an entirely different "model" with a new tokenizer and preprocessing steps

fickle shale
wheat merlin
empty wing
upbeat prism
gray slate
#

Anyone here trained smaller language models? I'm looking for advice on making something small enough to run offline on ordinary machines. Project background for context:
https://bitplane.net/log/2025/01/uh-halp-data/

I'm looking for advice on what to train on top of, how much data I should be generating, and how small I can expect to get it. And if anyone wants to join in and help, that'd also be cool

shrewd mountain
#

If I want to start studying AI technology, where should I start? It still kinda confuses me

serene scaffold
shrewd mountain
#

I have really basic knowledge on python and planning to expand on it

shrewd mountain
#

Southeast Asia

#

However taking a university somewhere else

serene scaffold
# shrewd mountain Southeast Asia

I know where Indonesia is. Are you used to talking to people who don't?
I usually recommend that people start by learning how to manipulate data with pandas. (or you can try using polars, I guess.)
that doesn't actually involve AI, but it's important to get a sense for what "data" is like.

shrewd mountain
serene scaffold
shrewd mountain
#

And yes ppl usually don't know where that is

serene scaffold
#

wtf

shrewd mountain
#

The problem I'm having is that my school does not teach comp science sadly, so I need to study it myself. On top of that I have very little time so ye. Btw rq what is manipulating data with pandas

#

Searched it up and not 100% sure that I understand it

serene scaffold
serene scaffold
#

like, if you have sales data where each row represents one transaction, you can transform it so that each row represents a month and each column represents a year. and then you can see if there are annual trends.
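To show the mechanics of that reshaping, here is a sketch using only the standard library. The transactions are made up, and in pandas the same result would come from a single `pivot_table` call.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (date, amount), one row per sale.
transactions = [
    (date(2023, 1, 5), 100.0),
    (date(2023, 1, 20), 50.0),
    (date(2023, 2, 3), 70.0),
    (date(2024, 1, 9), 130.0),
    (date(2024, 2, 14), 90.0),
]

# table[month][year] -> total sales for that month of that year
table = defaultdict(lambda: defaultdict(float))
for day, amount in transactions:
    table[day.month][day.year] += amount

years = sorted({d.year for d, _ in transactions})
print("month", *years)
for month in sorted(table):
    print(month, *(table[month][y] for y in years))
```

Each printed row is a month and each column a year, so reading across a row compares the same month from one year to the next.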

shrewd mountain
#

Ah alright then. Welp ig I'll be learning from scratch again lol. Any web recommendations of vids to study this

serene scaffold
#

ML is basically all about finding trends/patterns in data (or rather, making the computer figure out what the pattern is)

iron basalt
shrewd mountain
# iron basalt What do you think "AI" is? And what do you want to make?

A tool that will most likely help with like efficiency and figuring out small errors that can adjust the machines. What I want to make with it? I just really like automation in addition i always been very interested in it how it works. (I probably want to make like chatting bots or automation bots)

#

Idk if that is a good answer or not so ye

iron basalt
shrewd mountain
#

I see

gray slate
#

have you got a decent gfx card? or money?

shrewd mountain
serene scaffold
gray slate
#

'cause both of those things will help you if you want to run long jobs

gray slate
iron basalt
shrewd mountain
#

Alrighty

iron basalt
#

I recommend getting a book on the topic, or a course.

#

(Or both)

gray slate
#

sentdex has a pretty good youtube channel for doing ML from scratch in python, he's a good guy too

shrewd mountain
#

Probably my best bet is to find a eBook

iron basalt
#

You will need two things in general for machine learning. You need to be very comfortable with programming, being able to make small to medium sized practical programs (manipulating files and such, a good resource for this is https://automatetheboringstuff.com/ ), and also the basics of data structures and algorithms (you only really need the basics here, but if you like it, you can dig further, it will only make you better). The second thing you need is mathematical knowledge, the usual recommendations are calculus, linear algebra, and statistics.

#

Also additional for programming is data manipulation / analysis with stuff like Pandas as was already mentioned.

#

(tabular data)

shrewd mountain
#

Calculus my favourite.. Teacher taught us it about 1 week and gave us a test with horrible results in it. For the whole class

gray slate
#

I really enjoyed this video as a high level overview: https://youtu.be/0QczhVg5HaI

A video about neural networks, how they work, and why they're useful.

My twitter: https://twitter.com/max_romana

SOURCES
Neural network playground: https://playground.tensorflow.org/

Universal Function Approximation:
Proof: https://cognitivemedium.com/magic_paper/assets/Hornik.pdf
Covering ReLUs: https://proceedings.neurips.cc/paper/2017/hash...

iron basalt
#

I don't really know of a good recommendation for calculus books.

#

Maybe someone here has one.

wooden sail
#

spivak 💀 (don't. it's a great book, but it's more meant for people going down the maths route)

gray slate
#

Khan Academy goes all the way up to calculus and beyond, it's interactive

iron basalt
gray slate
#

I personally think the "learn by play" is the best way to learn anything, and the new LLM methods of exploring knowledge are really powerful too

iron basalt
#

There are additional materials that can help, but I would treat them as additional. https://www.youtube.com/watch?v=WUvTyaaNkzM&list=PLZHQObOWTQDMsr9K-rj53DwVRMYO3t5Yr

What might it feel like to invent calculus?
Help fund future projects: https://www.patreon.com/3blue1brown
An equally valuable form of support is to share the videos.
Special thanks to these supporters: http://3b1b.co/lessons/essence-of-calculus#thanks

In this first video of the series, we see how unraveling the nuances of a simple geometry que...

gray slate
#

When I learn a new programming language or technology I want to understand, I get ChatGPT to take the role of personal tutor. First constrain it by making it come up with a plan. Then give me examples, get me to explain what I think is going on, then have it correct me while I ask questions

#

then once I've covered the topic, get it to move on to the next stage. It worked really well for learning golang

shrewd mountain
#

Kk

wooden sail
#

a quick search reveals that george simmons' calculus with analytic geometry seems to be used in MIT for engineering courses. stewart's calculus books are also standard engineering books

#

that's around the level to aim for imo, for the sake of practicality

iron basalt
#

I'm going to give some more random resources, so you have options. https://www.youtube.com/watch?v=TjZBTDzGeGg&list=PLnvKubj2-I2LhIibS8TOGC42xsD3-liux&index=2

MIT 6.034 Artificial Intelligence, Fall 2010
View the complete course: http://ocw.mit.edu/6-034F10
Instructor: Patrick Winston

In this lecture, Prof. Winston introduces artificial intelligence and provides a brief history of the field. The last ten minutes are devoted to information about the course at MIT.

License: Creative Commons BY-NC-SA
...

wooden sail
#

there's also the python one, i think it's in the pins

shrewd mountain
#

Alright tysm also sorry for late reply

shrewd mountain
fickle shale
fickle shale
tawdry sundial
#

whats the best llm to run locally?

#

(using rtx 4070super)

#

currently looking at hugging face leaderboard

#

its quite odd that mistral or llama isnt on the list

#

is falcon3 really the best option?

lapis sequoia
#

Hii

digital hatch
#

My friend work in APD bank

solemn venture
warped harness
#

Guys I want to make my own ai chat bot, what should I study?

#

Like any roadmap?

gray slate
#

download ollama, run it, then pick a model. then do prompt-hacking and requests to call the API

fickle shale
warped harness
gray slate
warped harness
#

Is it good enough?

placid ravine
#

it will be excellent for learning and running a decently large model
idk how large coz i dont do llms
but yea @gray slate and @fickle shale might tell you that
but you can start working

gray slate
#

basically, to configure it you set the system prompt, either:

  1. edit the default system prompt by editing the manifest file (I haven't tried this)
  2. or send in the system prompt like I did in the curl command

to use it, you can either:

  1. chat with it directly on the command line, after running the model
  2. pipe data into it from a script (doesn't work well in Windows, adds junk on the end for some reason), like echo hello! | ollama run model and capture the output, or
  3. use requests or curl like I did, and join the response bit together. (use json.loads on each line as they come back from requests.post, so you get the typing out effect if you need it)
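
A minimal sketch of option 3, assuming a local Ollama server on its default port (11434) and its `/api/generate` endpoint, which streams one JSON object per line. The model name and the helper names `iter_response_text` and `ask` are placeholders, and stdlib `urllib` stands in for `requests` here so the snippet needs no extra installs.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint


def iter_response_text(lines):
    """Yield the text fragment from each streamed JSON line."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines between chunks
        chunk = json.loads(raw)
        yield chunk.get("response", "")
        if chunk.get("done"):
            break  # the final chunk is flagged with "done": true


def ask(prompt, system, model="llama3.2"):
    """Send a prompt plus system prompt and print fragments as they arrive."""
    payload = json.dumps(
        {"model": model, "system": system, "prompt": prompt, "stream": True}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    pieces = []
    with urllib.request.urlopen(request) as response:
        for piece in iter_response_text(response):
            print(piece, end="", flush=True)  # the "typing out" effect
            pieces.append(piece)
    return "".join(pieces)
```

Calling `ask("hello!", "You are a terse assistant.")` only works while the server is running and the model has been pulled; the parsing helper can be tried on its own with any list of JSON lines.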
gray slate
warped harness
#

Damnn

gray slate
#

llama3.2 is fast enough, and good enough for most tasks though

#

And if you ever need to use something smarter, you can run a larger model on vast.ai for 10 cents an hour

warped harness
#

Thank you so much for your time brother

gray slate
#

You're welcome 🙂

#

steal the function from line 71 and hack it to do whatever you need 🙂

#

^ @warped harness same gfx card as yours

wheat merlin
#

anyone have experience with graph neural nets

#

stuck on a loss of .693 which is just random guessing basically lol

gray slate
#

nope but interested to learn. how do you evaluate?

wheat merlin
#

so what im doing is im trying to predict edges between nodes

#

which is a binary classification

#

so it uses binary_cross_entropy_with_logits for loss

#

my understanding is a loss of .693 with this function is just as good as 50/50 guessing
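
A quick check of that number, as a sketch: 0.693 is ln(2), which is what binary cross-entropy gives when the model outputs a logit of 0 (probability 0.5) for every example. The helper below is a hand-written version of the standard stable formula, not PyTorch's implementation.

```python
import math


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy on a raw logit."""
    # Equivalent to -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(logit).
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


# A model that always outputs logit 0 predicts p = 0.5 for every edge.
loss_pos = bce_with_logits(0.0, 1.0)
loss_neg = bce_with_logits(0.0, 0.0)
print(loss_pos, loss_neg, math.log(2))
```

So a loss that settles at about 0.693 is consistent with a model that has collapsed to predicting 0.5 everywhere, whatever the labels are.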

gray slate
#

yeah sounds like it. does it not shift at all?

wheat merlin
#

starts insanely high and gets down over the 100 epochs

#

for my model I HAD to add gradient clipping

gray slate
#

how big is your data and your model?

wheat merlin
#

only 1300 edges, which is probably the problem

#

I have been using event history analysis with a network component, it's called NEHA, but wanted to try with neural net

gray slate
#

yeah, it can't generalize if your model isn't way smaller than your data. from what I understand anyway -- i'm no expert

#

can you augment your data to make realistic fake stuff and pass in a load?

wheat merlin
#

yeah actually, that's part of the data preprocessing

#

i've never heard of negative_sampling before but it's super cool

#

like you said, it just makes fake networks so the model can classify between true and fake
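
The idea can be sketched in plain Python: keep the real edges as positives and draw random node pairs that are not real edges as negatives. This is only an illustration, assuming directed edges with no self-loops; it is not the library routine, and `sample_negative_edges` is a made-up name.

```python
import random


def sample_negative_edges(num_nodes, positive_edges, num_samples, seed=0):
    """Draw (sender, receiver) pairs that are NOT in the observed edge set."""
    rng = random.Random(seed)
    existing = set(positive_edges)
    max_possible = num_nodes * (num_nodes - 1) - len(existing)
    if num_samples > max_possible:
        raise ValueError("not enough non-edges to sample from")
    negatives = set()
    while len(negatives) < num_samples:
        sender = rng.randrange(num_nodes)
        receiver = rng.randrange(num_nodes)
        if sender == receiver:
            continue  # no self-loops
        if (sender, receiver) in existing:
            continue  # a real edge, so it cannot be a negative
        negatives.add((sender, receiver))
    return sorted(negatives)


positives = [(0, 1), (1, 2), (2, 3)]  # toy edge list
negatives = sample_negative_edges(5, positives, num_samples=3)
labelled = [(edge, 1) for edge in positives] + [(edge, 0) for edge in negatives]
print(labelled)
```

Sampling as many negatives as positives gives the classifier a balanced true/fake problem to start from.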

gray slate
#

nor me, what's that?
I'm a noob really, lots of software experience not much ML experience

#

But, if you take the view that "every weight and bias is just a line, y = mx + b style, and every activation function is just a way to crop it so you get line segments rather than a linear slope and offset", then all you're doing is fitting a function to some points... getting data right should be mostly about getting the fake data on the same line as the real stuff
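
That "cropped lines" picture can be made concrete with a tiny ReLU network. The weights below are hand-picked for illustration, not learned; the output is a sum of two cropped lines, so it comes out as connected line segments.

```python
def relu(value):
    return max(0.0, value)


def tiny_network(x):
    """Two hidden ReLU units with hand-picked (hypothetical) weights."""
    h1 = relu(1.0 * x + 0.0)  # line y = x, cropped below 0
    h2 = relu(1.0 * x - 1.0)  # line y = x - 1, cropped below 0
    return h1 - 2.0 * h2      # output: weighted sum of the cropped lines


for x in [-1.0, 0.0, 0.5, 1.0, 1.5, 2.0]:
    print(x, tiny_network(x))
```

The printed values are flat for negative x, rise with slope 1 up to x = 1, then fall with slope -1: three straight segments from two units.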

gray slate
wheat merlin
#

yeah im thinking I need to mess with the data before I mess with the model anymore

#

it is imbalanced and not scaled

gray slate
#

so with a graph network, do the graphs all have to be the same depth? or do you like.... truncate them or something?

wheat merlin
#

im doing edge classification, so I don't think I have to deal with graphs of different sizes

#

but from what I understand, GNN's can handle graphs of varying sizes when it comes to classifying the graph or nodes

gray slate
#

oh cool, okay yeah I was confused and thinking "how the hell do you feed different length graphs into a network" - but looks like you don't, you feed stats about nodes or edges in, so it's a data formatting thing rather than a network architecture?

wheat merlin
#

yeah, I think its essentially an edge list with node attribute data columns

#

so for me, im looking at US states

#

so it would be like:

receiver sender GDP %White etc.

gray slate
#

ah okay and you're building a good ole fashioned discriminator lol
is that legal?

#

I worked at a loan company doing devops for linear regression models and they (data scientists) had to be really careful about that sort of thing

wheat merlin
#

yeah, im not doing anything crazy lol. Im not trying to predict like crime rates and blaming it on a certain demographic or something

#

basically im taking a developed theory from my field and using a different method

#

not really changing the question or anything

gray slate
#

stuff like "when we predict chance of a default by geographical location, the location often encodes ethnicity as a by-product"

wheat merlin
#

yeah I could see stuff like that being problematic

#

thats why theory is so important, cuz you have to be able to explain and understand the relationships you are modeling

gray slate
#

it's harder than it looks! I enjoy the data thing and reasoning about data but I've never actually built a full end to end model, keep getting stuck overthinking the feature engineering part, or failing to generalize

wheat merlin
#

I highly recommend running through PyTorch if you are interested. PyTorch would probably be pretty easy to code with a devops background

#

people and companies love to use pretrained models now and just importing them from huggingface

#

but imo the most fun part is building the architecture and stuff with torch

gray slate
#

I've played with other people's models and that, trained tacotron 2 models, and I tried to make a model that predicted "distance by road" given geographical locations, but never made it out of the workbook

#

currently I'm generating data for my uh tool (pip install uh-halp) - I want to fine tune a tiny model and have it run locally

wheat merlin
#

that's cool, haven't heard of that before

#

when I get a house, im totally going to get a server rack and run models locally and stuff

#

the computer engineering stuff is so cool and fun to play with

gray slate
#

I think the uh-halp thing should have small enough data that it can run locally. at least I think so anyway. I mean, a few hundred MB maybe?

wheat merlin
#

rn im running python through Google colab. I have pro, so im using a Nvidia A100. It runs so fast that I thought my code was broken lol

gray slate
#

nice! does it still disconnect you?

wheat merlin
#

haven't had any problems recently

gray slate
#

colab annoyed the crap out of me, deleting my data and progress every few hours

wheat merlin
#

I was running a model, and it was running at 50gb of ram for 17.5 hours lmao

#

and then google shutdown my runtime

gray slate
#

I shifted to vast.ai, which has its own problems but it's a docker container you run yourself

wheat merlin
#

ooo ill check that out

gray slate
#

i mean you run it on other people's computers for money, pay by the hour

wheat merlin
#

my workplace has a cluster where I can get a gpu, 12 cores, and 250gb of ram. But it wasn't working earlier, so that's why im running on colab

gray slate
#

but it's like 5 cents an hour for something like my laptop, and it doesn't set my legs on fire or need me to leave it open

wheat merlin
#

yeah that sounds really nice

gray slate
#

I bought an Orin too, lots of RAM but not the fastest thing

wheat merlin
#

I have a question about Orin and the related small computers

#

my uncle is a penetration tester and does all the cybersecurity for a fortune 500 company

#

so he uses a lot of data and stuff

#

but he was looking to get a small workstation like an orin

gray slate
#

it's 64gb shared between CPU and GPU but the CPU is 8 cores ARM and the GPU is Tegra... so .. yeah, not the fastest

wheat merlin
#

do you have any recommendations? I think it would need a good cpu as opposed to a good gpu

gray slate
#

I tend to rent boxes when I need speed, and use a lightweight gaming laptop as my daily driver

#

but in work we have macbook pro laptops, they're surprisingly good. 30GB of RAM on them, again, shared between CPU and GPU. battery life is insane

wheat merlin
#

yeah im looking at upgrading my laptop to the new base m4 macbook

#

the battery life is insane

#

my uncle swears by the macbook cuz the terminal is really easy I guess for devops stuff

gray slate
#

yeah it's nice to have a BSD shell. not as nice as Linux IMO, but it's usable unlike Windows

#

They're fast enough to run Linux in a VM and run Kali and stuff as images I think

#

but the CPU architecture stuff can be a bit of a pain if people haven't published aarch64 binaries

wheat merlin
#

yeah

#

my uncle has the macbook with the intel cpu

#

and he was saying that to upgrade he couldn't transfer his vm's and stuff or something

#

so he had to remake all the stuff he was using

gray slate
#

yeah, or ... 🐌

wheat merlin
#

I dont know a ton about terminal stuff lol

gray slate
#

I barely use the GUI! only for the web really

wheat merlin
#

do you do fullstack stuff or what kinda work do you do usually

gray slate
#

lifelong developer, games and stuff, c/c++, python back-end, did years in load testing, bit of full stack and devops, currently doing SRE work

#

I basically live in tmux. Use vscode sometimes, but vim mostly

austere rock
#

what a giga chad

gray slate
#

lol

#

in work it's all shitty web apps 😦

#

so I have to get my terminal fix at home

austere rock
#

building my own conversational AI with open source libraries is my latest obsession

gray slate
#

cool whatya building?

warped harness
wheat merlin
gray slate
wheat merlin
#

LOL

#

I do my ML in python, don't worry

austere rock
#

my job isn't in tech, I'm a Domino's delivery driver, no college education, been job hunting and starting my own AI company in my free time

#

never touched R

gray slate
#

I wrote a Pydantic to R object conversion thingy for a client, and... well, R really upsets me

wheat merlin
#

bro R is the best program i've ever touched

#

pls dm me if you need help with R

austere rock
#

I don't wanna touch R unless there's no better alternatives

wheat merlin
#

I highly recommend R for data transformation, visualizations, and regression stuff

#

but for ML definitely use python or use the python in R thing

gray slate
#

you've got keywords that make it so parameters don't get dereferenced until they're inside the function. and all the modern R code with the filters and stuff is slow as hell because it's basically mixing compiled and interpreted code

wheat merlin
#

the dplyr tidyverse stuff? yeah I was told it is really bad on ram

gray slate
#

yes

wheat merlin
#

there is a package built by Apache called Arrow

#

it just masks all the dplyr functions and makes it significantly faster

gray slate
#

oh wow didn't know about that. I filed a bug with dplyr saying "hey every function is slow" and they basically said "yeah, not fixing"

wheat merlin
gray slate
#

spent about 3 days trying to figure out how to optimize it but it needed a partial rewrite. ended up just replacing aggregation functions with R ones

wheat merlin
#

yeah it's really cool, it has the same feature that datasets from huggingface in python has

#

where you can load part of the data but it doesn't go into RAM somehow

#

idk exactly how it works but its pretty cool

gray slate
#

the state we were in is, we had data scientists who knew databases, and they wrote their extract code in SQL-flavoured R, big rectangular chunks of data. then train their models on it

#

then they wanted to do it live rather than on a batch job

#

so naturally when prepping data coming into the system, you need to run it through the same aggregation pipeline, fastapi -> R and back again

#

the R part wasn't threadsafe, didn't perform, the data types are a free for all, and the model ran in Python lol

#

but the only way to do it safely was to have the in and out of R step, "it's okay if it takes a second" and I was like my code runs in < 40ms and I'm disgusted by how slow it is!

#

shakes fist at dplyr

wheat merlin
#

lmao

#

no one in my field cares about speed, it's pretty funny

#

we don't usually use super big data, but all the replication data and stuff is always so inefficient

gray slate
wheat merlin
#

yeah, that makes sense

#

(although no modern game developers seem to care about optimization lol)

gray slate
#

that was the best bit for me, the optimization

#

oops my "emojis as bullet points" trick has messed up my site

wheat merlin
#

this is cool, I like the three-breasted alien lmfao

gray slate
#

inspired by total recall's "you make me wish i had threeee hands"

wheat merlin
#

No one has seen this yet, but its part of my dissertation. I'm not going to say what im modeling, but ill say that its related to state level policies

#

need to share it cuz I think it's super cool lol

gray slate
#

wow yeah looks nice

wheat merlin
#

the thing with R is it has all the network stuff, python really lacks in that department

gray slate
#

it does? I thought it was mostly for rectangular SQL-like data

#

though I guess I only did one project in it!

gray slate
#

@wheat merlin do you use any sort of runner to run your data creation stuff? Makefiles are grating on me

#

it'd be really nice if there was a docker/make style hash(inputs) cached job thingy I could use and run things on other hosts by just adding scripts and output names, and spinning up a runner with tags on it

wheat merlin
#

I just have everything in my rscript and press play

#

have a working directory set and stuff

#

and my data is small enough that I just write to csv and import the csv in my model execute file

#

im kinda just now learning git, forking, and pull request stuff; so that's where I am at with data management lol

rancid sorrel
#

you mean the graph network diagraming?

#

or you talking about raw network stack?

wheat merlin
#

wdym

#

Network as in graph network diagramming and modeling

wild sluice
#

is Malmo for reinforcement learning dead?

solemn venture
#

Never heard of Malmo

pine heron
rancid sorrel
#

fun times

neat sparrow
#

I need some help; I'm trying to download a parquet dataset from hugging face, ds = load_dataset("lighteval/natural_questions_clean", split="train"). I've been trying to figure out for most of the day why the script is always only downloading the ReadMe.md file and then logging a 'tags' error no matter what dataset I use. I'm sure it's just a really easy fix but I can't quite figure it out. I've looked through documentation and videos as well.

wild sluice
flat token
gritty vessel
#

Hey everyone I wanted to ask how can I add temporal features in auto encoder?

#

I have multiple events and from each events I am taking first 10 time steps

#

So shape of my data comes to

#

There are 45 batches

#

And each batch is of size ([1,10,453,958])

#

And whole dataset

#

Is of shape ([450,453,958])

#

Am I, like, progressing correctly?

#

I am trying to train auto encoder

#

But want it to capture temporal features as well

rancid sorrel
#

Try using liquid time if you're varying the time step

#

It requires some minor modification to the keras library but keras LTC NCP works great

umbral star
#

How to extract clothes from images? For example, imagine a picture of a person wearing a T-shirt and I want to isolate every pixel of the shirt and save it as its own image. How can I do that? I know that opencv can do this but with things like faces and maybe animals but how can you train it to detect things like say guns or any object?
I have CC-ed this question in #media-processing in case that channel would be a better place to ask this.

wheat merlin
mild dirge
gritty vessel
rancid sorrel
#

from ncps import wirings
from ncps.tf import LTC as LTC
ncps is a sparse wiring library

#

its very quick

umbral star
#

Thanks @gray slate and @mild dirge

gritty vessel
#

Batch size 4 1channel 10 time steps 453 height 958 width

#

But in the output I am always getting different shapes

rancid sorrel
#

well sadly that's the issue with neural networks

#

you need to specify in and out

#

my best advice to you is to load up tensorboard so you can see the NN and what its doing

#

one of the first things to do with images is to reshape the input using an image processing layer

gritty vessel
#

Thanks

#

I managed to make input and output size same

toxic mortar
#

In this article for understanding LSTM networks, it says "forget" gates can decide when to forget some information for future propagation. Is this really what happens or this is just a naive attempt to interpret these networks?

rancid sorrel
#

return_sequences=True is what i think it means

#

its a tensorflow option

#

but essentially a forget gate just resets its state randomly

#

its another random behaviour you can add

toxic mortar
#

Why would you continue to randomize the weight matrix, if you have already initialized with random values?

rancid sorrel
#

another thing to prevent overfitting most likely

#

and tbh that sounds almost exactly like the dropout feature

toxic mortar
#

Really? The dropout feature initializes percentages to see how relevant these weights are to the output, and based on the output result during back propagation they are updated based on the p or 1-p probabilities. Sounds very different to me
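
For comparison with the forget gate, here is a minimal sketch of dropout as it is usually described (the "inverted" variant): during training each activation is zeroed at random with probability p and the survivors are rescaled, and nothing is dropped at inference. The function name and arguments are illustrative, not a particular framework's API.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout: zero each unit with probability drop_prob while training."""
    if not training or drop_prob == 0.0:
        return list(activations)  # inference: pass through unchanged
    keep_prob = 1.0 - drop_prob
    # Survivors are scaled by 1/keep_prob so the expected value is unchanged.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)
print(dropout([1.0, 1.0, 1.0, 1.0], drop_prob=0.5, rng=rng))
print(dropout([1.0, 1.0, 1.0, 1.0], drop_prob=0.5, rng=rng, training=False))
```

The mask is fresh random noise on every call and is not learned, which is the main contrast with a gate whose values are computed from the input.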

#

If you do not set explicit metric, how can you track its effect?

rancid sorrel
#

i have been known to be wrong, and am also learning this stuff too 😉

#

you use a tool like tensorboard to track the optimization

wheat merlin
fickle shale
fickle shale
wheat merlin
#

that could be it, good thought

toxic mortar
#

Guys, the name of the gate is "forget", not randomize. It applies a sigmoid activation function (codomain [0,1]) to gate the previous cell state, deciding what is retained or discarded from the previous memory cell context

#

LSTM solves vanishing gradients with cell memory context and GRUs

#

I am asking whether we can introduce some intuition of how we pick those things that we want to forget

jaunty helm
fickle shale
#

it's trained with a lot of data. an lstm is not always correct, it learns

#

it trains with a lot of training data, then it knows which words are relevant for your problem, so it predicts the next word correctly (not always)!

flat token
tawdry sundial
#

are there any agent libraries that offer a wide range of models TTS, STT, VAD and other models like this for example

#

a bunch of models that can be stacked on one another to perform tasks

#

but it uses this architecture, so i cant just make a script and run agent locally

neat sparrow
gritty vessel
flat token
# gritty vessel Tell me about your research!

im working on a paper that is almost finished for the journal of algorithms which is basically the optimal traversal of n-nomial tree structure path generation that solves an open problem of how to algorithmically generate all the combinations of n-long integer with no permutations. I also work on 2 problems in DMARL (deep multi agent reinforcement learning) and the control of multi agent aquatic rover systems

#

im also fooling around with a problem in K-theory and another problem in stochastic path generation as a solution for monte-carlo convergence failures but those aren't really serious pursuits yet just stuff im fooling with

gritty vessel
#

What is deep multi agent reinforcement learning?

#

So instead of a single agent we will have multiple agents and then their experience will be combined together?

flat token
#

and the construction of my environment is critical because if i don't reflect this none of the agents can learn at all

#

however multi-agent RL can also be something where agents cannot share experiences but must still interact within the same system

#

or the system may become a POMDP (partially observable* markov decision process) where not everything is known about the system except for what is observed at your current time step, all past time steps, and maybe some amount of n < \infty time steps in the future (i.e. maybe you know 3 time steps or maybe 10 in the future, but you dont know all of them)

#

and these agents need to learn to work together given this

#

so there are a lot of different ways in which DMARL can be formulated it's just a use case thing i would say

#

and this is what this would look like in a holding pattern with M agents

gritty vessel
#

Half of stuff is going up my head

#

But it's interesting

#

Any papers regarding this?

#

That you would suggest

flat token
#

And then you can apply deep learning to any of those things

#

To learn how to apply the math to deep learning

gritty vessel
#

Will check it out thank you

iron basalt
#

This makes its control of the cell state more powerful.

#

With just addition it could struggle to forget things (quickly).

#

Imagine that you have some problem that requires you to modify the cell state only slightly based on the current input or it will explode. So maybe the added values are around 0.01 in magnitude. But you want to forget something instantly based on the current input you just got. So maybe you have some cell state value of 1.0 and you want it instantly to go to 0. With just the addition you can't do that without breaking this whole "small added value from the input" requirement. But multiplication can do so. It can be like 0.0001 (near zero) and is separate from the addition part. So basically it gives the network better flexibility / options on the cell state control. Just addition is pretty limited and may not work at all for certain problems.

#

(Just multiplication would not work due to being 0 to 1 (only same or decreasing))
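
The argument above can be sketched with a scalar version of the usual LSTM cell-state update, c_new = f * c + i * g. As a simplifying assumption the gate logits are passed in as plain numbers here; in a real LSTM they are computed from the current input and previous hidden state through learned weights, and the candidate g is a tanh output.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def update_cell_state(cell, forget_logit, input_logit, candidate):
    """One scalar LSTM cell-state update: c_new = f * c + i * g."""
    forget_gate = sigmoid(forget_logit)  # in (0, 1): how much old state to keep
    input_gate = sigmoid(input_logit)    # in (0, 1): how much new info to admit
    return forget_gate * cell + input_gate * candidate


cell = 1.0
small_update = 0.01  # the "small added value" from the example above

kept = update_cell_state(cell, forget_logit=10.0, input_logit=0.0, candidate=small_update)
wiped = update_cell_state(cell, forget_logit=-10.0, input_logit=0.0, candidate=small_update)
print(kept, wiped)
```

With the forget gate near 1 the old state survives almost unchanged, and with it near 0 the state collapses in a single step, while the additive term stays small in both cases.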

unkempt wigeon
#

What book for pytorch do I need so I can put it in my cart

odd meteor
# unkempt wigeon What book for pytorch do I need so I can put it in my cart
Manning Publications

Learn how to create, train, and tweak large language models (LLMs) by building one from the ground up!

In Build a Large Language Model (from Scratch) bestselling author Sebastian Raschka guides you step by step through creating your own LLM. Each stage is explained with clear text, diagrams, and examples. You’ll go from the initial design and c...

gray slate
#

I'd really like some kind of learning that has insights rather than the mathematics. Like why do they use "mean squared distance from prediction" rather than "absolute distance from predicted value".

Is it for algorithmic performance (no sqrt call), because of the tradition of std deviation, or because it's the output of a multiplication, so squared distance is the natural thing to use because it's on the same scale? Why not "absolute cubic distance from prediction"

It's like ML is filled with algorithms that seem arbitrary, or work because they have been tested empirically, or have complex mathematical reasons for why they work but not a reason you can just grok

#

Probably my mathematical illiteracy talking here, but I find that maths in general has a kind of... uh... it's like it's a way to logically deduce far deeper than humans are capable of understanding. It searches for proofs, not understanding, like an exploration of a tree in what is essentially a tautology-space.
So the outputs of it tend to feel opaque, like circular reasoning, and process-based faith. Dunno if it's easier if you're a mathematician, but I'd guess it isn't based on the outputs.

Apologies for the unprompted philosophical noise lol

tidal bough
# gray slate I'd really like some kind of learning that has insights rather than the mathemat...

There's a nice explanation of why mean squared error is common in statistics in Koks' "Explorations in Mathematical Physics" (the arithmetical mean forms a natural pair with the sum-of-squares and with the normal distribution, whereas the sum of absolute distances is associated with the median rather than the mean), and in general you might enjoy that book to get some deeper insight into the math. Here's a few pages (note: f(x)=1 is probably a typo and should be f(x)=x).

#

(It's not the first time I encountered this idea though, just a recent one, but I'm not sure I can remember where I originally read about it)

gray slate
tidal bough
#

(I'm not actually sure if anything goes wrong if you use mean-absolute-distance as your error metric in ML, though. In 1d it would cause the problem that there'll be an entire region where the gradient is 0, but I think not in any higher dimension)

tidal bough
# gray slate > the sum of absolute distances is associated with the median rather than the me...

So if you want to measure distance from median you'd use the sum of absolute distances from it
Sort of. What I mean is that if you try minimizing ∑ |x_i-m| over a 1d dataset, you'll get that the optimal solution is the median (specifically, for an odd number of points it's the middle point and for an even number of points it's the entire region between the two middle points). Whereas for the MSE (Σ (x_i-m)^2), the optimal solution is always the mean.
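
A small numerical check of that claim, as a sketch on made-up data: search a grid of candidate values m and see which one minimizes each error.

```python
import statistics


def sum_abs(points, m):
    return sum(abs(x - m) for x in points)


def sum_sq(points, m):
    return sum((x - m) ** 2 for x in points)


points = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up data with one outlier
candidates = [i / 10 for i in range(0, 1001)]  # grid search over m in [0, 100]

best_abs = min(candidates, key=lambda m: sum_abs(points, m))
best_sq = min(candidates, key=lambda m: sum_sq(points, m))

print(best_abs, statistics.median(points))  # absolute error picks the median
print(best_sq, statistics.mean(points))     # squared error picks the mean
```

The outlier drags the squared-error minimizer far from most of the points, while the absolute-error minimizer stays on the middle point.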

gray slate
#

cool thank you. though I'm reading those pages and find it pretty mentally exhausting as my math-fu is pretty weak!

gray slate
#

This is kinda how I feel about comp.sci as someone who started out in procedural, imperative programming. I kinda have that (steps, processes, operations) baked into my intuition (rather than relationships etc)

iron basalt
#

As for ML, it's in its 'Babylonian' phase. We know certain parts and how they are connected, but not the whole thing from which it can be built (like axioms). And this is why most ML papers feel like loosely connected guesses that are only somewhat mathematically justified (this varies, but those that seem most immediately applicable probably are like this). https://www.youtube.com/watch?v=YaUlqXRPMmY

Richard Feynman explains the main differences in the traditions of how mathematical reasoning is employed between mathematicians and physicists.

gray slate
#

Re: dimensions, I've spent the past year or so trying to wrap my head around what the hell a dimension even is. And I kept coming back to pi and sine, and became convinced that rather than being this "infinitely long, transcendental number" it was "a simple process where you start off with something and collapse back to some ratio"

#

Then I discovered that "playing pool with pi" thing and was blown away by it

iron basalt
gray slate
iron basalt
#

This is a limitation of humans with complexity.

#

What we can "understand" is a small subset.

rancid sorrel
iron basalt
#

We understand the simple cases, and then we take a leap of faith on process to generalize.

#

With careful rigor.

gray slate
# iron basalt This is a limitation of humans with complexity.

But also the process, right? Like, if you optimize for proof alone, and you have this tree-search that goes really deep and narrow, and you can keep going indefinitely. But if you don't use that tool, you can't go very deep - you have to go to the next level, invent concepts then pull those back out to the level above. You get understanding that way, but it's a much less precise and efficient way of working, takes generations to change language and so on

rancid sorrel
#

i think my model is overfitting, but is working great on test data

#

so i really got no idea how much i trust it

#

esp as it gets there in 3 epochs

iron basalt
#

We then have computers (machines), which can also use heuristics and are much faster. We have automated provers and such, which have already done much that humans seem to not be able to (in a reasonable amount of time).

rancid sorrel
#
```
{
    "train_mse": 0.0037,
    "test_mse": 0.0057,
    "train_rmse": 0.61,
    "test_rmse": 0.75,
    "train_mae": 0.50,
    "test_mae": 0.53,
    "test_mape": 3500687731287.13,
    "train_r2": 99.95,
    "test_r2": 99.93
}
```
#

these are % btw

#

and i have no idea how much i should trust results like these in 3 epochs

gray slate
gray slate
tidal bough
iron basalt
rancid sorrel
#

time series data, predicting a specific column,

iron basalt
rancid sorrel
#

that's the LSTM, i've got another model with random drop and it's still over 95% accuracy

#

honestly i just feel suspicious when i get any correlation this good

tidal bough
rancid sorrel
#

it's a month of stock data at 1m trade increments

gray slate
rancid sorrel
#

data is shape (8601, 6)

#

for one month, one stock

gray slate
iron basalt
rancid sorrel
#

for 500 different stocks and 500 trainings the validation goes to under 0.5% loss in 3 or fewer epochs

tidal bough
#

hmm, 0.5% of what?

rancid sorrel
#

it's LSTM with MSE, adam as the optimizer

iron basalt
rancid sorrel
#

so the val loss is the MSE evaluated per epoch

tidal bough
gray slate
rancid sorrel
#

but this is honestly why i am deeply suspicious, it's a standard 80% split for data and seed 42

#

so nothing unusual there

#

80:20

tidal bough
gray slate
rancid sorrel
#

i can hit month 4 with the model trained on 3 months of data and see if thats going ok

rancid sorrel
#

and accuracy

tidal bough
#

if they really do match, then it's somehow training on the test data- huh, wait a minute. This is an LSTM, right? Possibly you need some special way of splitting the data for LSTMs, since otherwise your "training" set might accidentally cover the entire dataset (just not all possible subsets of it)
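
One way to see the concern, as a sketch: with a shuffled split on time-ordered rows, most test rows sit directly between two training rows, so the model has effectively seen their neighbours. scikit-learn's `train_test_split` shuffles by default; passing `shuffle=False`, or cutting by time as below, keeps the test period strictly after the training period. The helper names are made up for illustration.

```python
import random


def shuffled_split(num_rows, test_fraction=0.2, seed=42):
    """Random split: test rows end up scattered between training rows."""
    indices = list(range(num_rows))
    random.Random(seed).shuffle(indices)
    cut = int(num_rows * (1 - test_fraction))
    return sorted(indices[:cut]), sorted(indices[cut:])


def chronological_split(num_rows, test_fraction=0.2):
    """Time-ordered split: every test row comes after every training row."""
    cut = int(num_rows * (1 - test_fraction))
    return list(range(cut)), list(range(cut, num_rows))


def neighbours_in_train(train_rows, test_rows):
    """Count test rows whose immediately adjacent rows are both in training."""
    train = set(train_rows)
    return sum(1 for t in test_rows if t - 1 in train and t + 1 in train)


rows = 8601  # one month of 1-minute bars, matching the shape quoted earlier

train_rows, test_rows = shuffled_split(rows)
print("shuffled:", neighbours_in_train(train_rows, test_rows))

train_rows, test_rows = chronological_split(rows)
print("chronological:", neighbours_in_train(train_rows, test_rows))
```

For minute-level prices, adjacent rows are nearly identical, so a high score on a shuffled test set says little about predicting a period the model has not seen.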

iron basalt
#

(Experiments)

rancid sorrel
iron basalt
#

(All of this in the same lecture btw, covers a lot)

tidal bough
rancid sorrel
#

it really shouldn't be using the testing data in training, given it's not being passed that way
y_train_pred, y_test_pred = model.predict(X_train), model.predict(X_test)

iron basalt
rancid sorrel
#
```
def split_train_test(data, target_column):
    X, y = split_features_target(data, target_column)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Reshape X_train and X_test to 3D (samples, timesteps, features)
    X_train = X_train.values.reshape((X_train.shape[0], 1, X_train.shape[1]))
    X_test = X_test.values.reshape((X_test.shape[0], 1, X_test.shape[1]))

    return X_train, X_test, y_train, y_test
```
gray slate
rancid sorrel
#

did i fuckup?