#data-science-and-ml

wooden sail
#

data driven only refers to how the optimization is done

#

i.e. analytically vs. stochastically from measurements

past meteor
#

why does every subdomain use the same basic terminology to mean different things 😩

wooden sail
#

any form of stochastic grad desc is data driven, regardless of what you're optimizing

wooden sail
#

i kinda trust this paper cuz this yonina eldar is a well known mathematician

past meteor
iron basalt
past meteor
#

They have model-based on a spectrum with data-driven on opposite ends

wooden sail
past meteor
#

I think they use "model" in the same way as reinforcement learning uses "model" (the MPC-kind of "model")

past meteor
#

(I don't know who they are)

wooden sail
#

lemme go check 😩 smh

past meteor
#

I lifted this from the paper

wooden sail
#

i've been bamboozled

wooden sail
iron basalt
wooden sail
#

cuz you can have model-based and data-driven at the same time

iron basalt
#

That's like calling my layer with a non-linear activation function a "linear nonlinear layer."

wooden sail
#
hybrid model-based/data-driven systems...
#

the "model-based" part refers to architecture, while "data-driven" refers to how the parameters of the architecture are learned

#

they're letting people off easy by not calling it black-box + data-driven

past meteor
#

I agree with this. The "fully" model driven (whatever that is) still need to estimate a handful or parameters

#

But what they can do is way more constrained

wooden sail
#

yeah. and you can either do that from data stochastically or analytically

#

right, the one model will only work for a specific type of problem, usually

#

it can't adapt by just changing the parameters

past meteor
#

The predictions are as good as the chosen model

#

So if the model used in the domain is already a massive oversimplification of reality

#

Like the ones that try to model human stuff in silico

wooden sail
#

that's one take on what model means though, not the only one

#

the other one is to take an optimizer which has convergence guarantees but involves expensive steps

#

then replace those expensive steps with a black-box network

#

this is independent of whether what you're "modelling" is modelled with a network or not

past meteor
#

Ah yes, okay this makes sense

#

So you're not using it as an emulator - you're basically using it to converge your expensive model

#

Correct?

wooden sail
#

you'll see a lot of ADMM on crack done this way

#

kinda, yeah

#

a good combo of these things is to take a network that learns a model for something complicated, and its parameters are learned by grabbing a well established optimization routine and intertwining it with the network

#

these usually end up somewhat like autoencoders

#

like input -> modelling network -> output -> optimization routine turned into network -> the parameters we care about

#

and you learn the parameters for the forward and inverse networks together using some fancy cost function, e.g. possibly enforcing the model network to solve a differential equation as part of its cost instead of only fitting data

past meteor
#

I think this makes sense to me yes 🤔

#

Do you have a paper on this as well?

wooden sail
#

not off the top of my head, but many recent papers solving inverse problems with physics-informed neural networks should be doing something similar

rugged comet
#

Yeah, columns like zip code make that tough.

rugged comet
#

I am writing a DecisionTreeClassifier from scratch for fun and to help me understand the algorithm better.
How are model weights/decision nodes saved within a DecisionTreeClassifier object? I'm thinking about perhaps creating a new class for the nodes similar to the Tree data structure.
I think that's how they might do it.
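
A node class along those lines could look roughly like the sketch below. The names (`Node`, `predict_one`) are made up for illustration and are not scikit-learn internals; scikit-learn itself keeps the fitted tree as parallel arrays on `clf.tree_` (`children_left`, `children_right`, `feature`, `threshold`, `value`) rather than as linked node objects.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """One node of a binary decision tree (hypothetical layout)."""
    feature: Optional[int] = None      # index of the feature to split on
    threshold: Optional[float] = None  # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prediction: Optional[int] = None   # class label, set only on leaves

    def is_leaf(self) -> bool:
        return self.prediction is not None


def predict_one(node: Node, x: Sequence[float]) -> int:
    """Walk from the root to a leaf and return that leaf's class."""
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction


# A hand-built stump: feature 0 <= 2.5 -> class 0, otherwise class 1.
root = Node(feature=0, threshold=2.5,
            left=Node(prediction=0), right=Node(prediction=1))
```

Fitting would then be a recursive function that picks the best split and builds `Node`s; the linked-object form is easier to read, while the flat-array form is mostly a speed and memory choice.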

placid cedar
#

guys, how do i solve this error

storm smelt
#

Actually, I'm just confused about how to ask the question

wild wadi
#

Hello everyone! Is there any experienced python webscraper around whos got 5 minutes for a few questions from a noob??? :$$$

serene scaffold
serene scaffold
wild wadi
#

Done, thank you!

storm smelt
serene scaffold
#

But if you already have a more specific question than that that you can put words to, please go ahead.

pure palm
#

I am actually looking for a team to participate in ML hackathons
I have worked on LLMs, autogen, diffusion models, VAEs
If anyone is interested pls let me know.

mild dirge
#

@iron basalt I am now reading conscious MIND Resonant BRAIN, but I am coming across a lot of tiny mistakes (referencing the wrong images, mixing up rows/columns, saying someone lived from 1869 until 1854 etc.). Am I reading an old version or something, or is this something you found as well? (asking you because you recommended this book earlier, the book is really interesting so far)

west cloak
#

I have a question on classifiers and the bagging method. Which classifiers gain from this method?

scarlet helm
#

Hello, I am creating a sentiment analyzer with Python and a Keras vanilla RNN. The dataset consists of two columns, sentence and label (positive and negative). I tokenize these sentences, eliminate stop words and convert them to numbers. Currently the accuracy of my model is 52%. How can I improve this?

#

Here is my model definition

serene scaffold
arctic wedge BOT
#
Formatting code on discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

scarlet helm
#
def vainilla_rnn():
    model = Sequential()
    model.add(Embedding(vocab_size, 200, input_length=maxlen))
    model.add(SimpleRNN(200, input_shape=(maxlen,1),return_sequences=False))
    model.add(Dense(num_classes))
    model.add(Activation('sigmoid'))
    model.summary()

    adam= optimizers.Adam(lr=0.001)
    #model.compile(loss='sparse_categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
    model.compile(loss='sparse_categorical_crossentropy',optimizer='adam',metrics=['accuracy']  )
    return model
#

Here is the model definition

serene scaffold
scarlet helm
#

10 but If I increment the number of epochs I don't get a big increase in the performance

#

For example I tried with 30 and the result was similar

desert oar
#

and as a baseline, did you try something like logistic regression with tfidf features, or hashed features, or a simpler word embedding model like cbow/skipgram ?

iron basalt
desert oar
#

also how big is the dataset? maybe you don't have enough data to learn a useful embedding space, maybe you want to consider pre-trained word vectors from a bigger model & data set

mild dirge
iron basalt
mild dirge
#

He talks about a few presentations on yt, probably going to watch one to get an idea

desert oar
#

"bagging" is just a cute abbreviation of "bootstrap aggregating"

#
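
As a rough illustration of bootstrap aggregating: resample the training set with replacement, fit one model per resample, and let the models vote. This is a pure-Python sketch, and `fit_stump` is a toy base learner invented here, not any library's API.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Toy base learner (hypothetical): threshold at the midpoint of the class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    if not pos or not neg:                       # resample contained a single class
        label = 1 if pos else 0
        return lambda x: label
    pos_mean, neg_mean = sum(pos) / len(pos), sum(neg) / len(neg)
    cut = (pos_mean + neg_mean) / 2
    flip = pos_mean < neg_mean                   # class 1 lies below the cut
    return lambda x: int((x > cut) != flip)


def bagging_fit(xs, ys, n_models=25, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap: sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    votes = Counter(m(x) for m in models)            # aggregate by majority vote
    return votes.most_common(1)[0][0]


xs = [0.1, 0.4, 0.5, 0.9, 2.1, 2.4, 2.8, 3.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
models = bagging_fit(xs, ys)
```

Averaging over resamples mostly reduces variance, so the usual expectation is that unstable, high-variance learners such as deep decision trees gain the most, while already-stable models gain less.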
scarlet helm
scarlet helm
agile owl
#

what's the right way to find a model that fits a time series best on average across all segments if you split it up into, say 20ths

#

if I fit it overall there are certain periods that dominate and lead it to performing quite poorly in others and I want it to perform more consistently over different time periods

#

I started by just leaving out 10% of the data as test but nothing fits both train and test that well doing it the way I'm doing it

#

I wasn't just going to do average but average/std because just average would lead to the same result as fitting it the way I'm doing now I believe

#

I want a model to consistently perform well across all the segmentations without being necessarily optimal for any one of them

peak thorn
#

hi guys, i'm new to ml. can you just tell me the difference between weights, parameters and hyperp... in simple words? the internet is only making me more confused...

agile owl
#

weights are endogenously fitted by the model

#

hyperparameters are exogenous to the model

#

parameters I believe can refer to both

past meteor
agile owl
#

if my question isn't clear happy to clarify btw

serene scaffold
agile owl
#

I like the ones who post poll questions about Python or ML and their answers are WRONG

left tartan
past meteor
agile owl
#

I have a lot of ppl in my linkedin feeds posting "Python" or "ML" quizzes and their questions are either totally impractical or the answers are flat out wrong

past meteor
agile owl
#

@past meteor right, so this is making a bunch of train and test sets

#

where the test set for each train set comes after the train set

peak thorn
#

i still don't get it sorry help me init

agile owl
#

I guess it's better than before but I was worried that you might get different results just depending on how you split it up

#

so I wanted the training data to contain everything before a certain date

#

but control for performance on different splits of the train data

#

I guess I'll try this approach

#

and see how it works

past meteor
agile owl
#

thx will look into it more

past meteor
#

In reality what you do is this:

  1. Decide how large your test set is in %
  2. Find the date that aligns with this %
  3. Split everything before this date into the training set. Everything after goes into the test set.
  4. Use time series validation to make N training sets that only contain data before the validation fold each one is evaluated on.
  5. Do 1 final evaluation on the test set
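
The steps above can be sketched in plain Python as follows. The helper names are hypothetical; the fold logic follows the same expanding-window idea as scikit-learn's `TimeSeriesSplit` but is not guaranteed to reproduce its exact fold sizes.

```python
def holdout_split(n_samples, test_fraction=0.1):
    """Indices for the chronological split into training data and a final hold-out."""
    cut = n_samples - round(n_samples * test_fraction)
    return list(range(cut)), list(range(cut, n_samples))


def expanding_window_splits(n_train, n_splits=4):
    """Yield (fit_idx, val_idx) pairs where the validation fold always follows the fit window."""
    fold = n_train // (n_splits + 1)
    for k in range(1, n_splits + 1):
        fit_end = fold * k
        val_end = fold * (k + 1) if k < n_splits else n_train
        yield list(range(fit_end)), list(range(fit_end, val_end))


# 100 time-ordered samples: the last 10% is held out, the rest is cross-validated.
train_idx, holdout_idx = holdout_split(100, test_fraction=0.1)
folds = list(expanding_window_splits(len(train_idx), n_splits=4))
```

Model selection would look only at the scores on `folds`; the hold-out indices are touched once, at the very end.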

agile owl
#

gotcha

past meteor
agile owl
#

there are some numerical values or boolean values you might need to provide to the model

#

so it performs a certain way when doing its fitting to the data

#

that's another way to think about it

#

the model can't generate it itself because it's an assumption
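
A tiny sketch of the distinction, assuming a one-feature linear model fitted by gradient descent (all names are illustrative): `w` and `b` are the weights the fitting procedure finds from the data, while `learning_rate` and `n_steps` are hyperparameters handed in from outside.

```python
def fit_line(xs, ys, learning_rate=0.05, n_steps=500):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0                       # weights: learned from the data
    n = len(xs)
    for _ in range(n_steps):              # n_steps, learning_rate: hyperparameters
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]                 # generated from y = 2x + 1
w, b = fit_line(xs, ys)
```

Changing `learning_rate` changes how the fit behaves, but nothing inside `fit_line` ever updates it; that is what makes it a hyperparameter here.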

serene scaffold
#

you can also think of hyperparameters as the parameters of the parameters

#

should have been called metaparameters

agile owl
#

yes

serene scaffold
#

then facebook will say that only they can do ml

agile owl
#

hah

peak thorn
#

@past meteor thanks

placid cedar
#

hi guys, are there any great ways to improve my CNN model's performance?

#

i have already done data augmentation, regularisation

agile owl
#

one common example is like, if you want to regularize the coefficients of a model (push them closer to zero to avoid overfitting) the penalty you apply to the sum of the weights is a hyperparameter

placid cedar
#

but my validation accuracy just can't seem to increase

#

always around the 0.72 - 0.77 range

agile owl
#

technically the sum of the abs of the weights or the square of the weights
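
In code, the penalty being described might look like the sketch below, where `alpha` (a name chosen here) is the hyperparameter that scales either the sum of absolute weights (L1) or the sum of squared weights (L2). It is a minimal illustration rather than any particular library's implementation.

```python
def penalized_loss(errors, weights, alpha=0.1, kind="l2"):
    """Mean squared error plus a weight penalty scaled by alpha (the hyperparameter)."""
    mse = sum(e * e for e in errors) / len(errors)
    if kind == "l1":                               # lasso-style: sum of |w|
        penalty = sum(abs(w) for w in weights)
    elif kind == "l2":                             # ridge-style: sum of w^2
        penalty = sum(w * w for w in weights)
    else:
        raise ValueError(f"unknown penalty kind: {kind}")
    return mse + alpha * penalty


errors = [1.0, -1.0]
weights = [3.0, -4.0]
l1 = penalized_loss(errors, weights, alpha=0.1, kind="l1")   # 1 + 0.1 * 7
l2 = penalized_loss(errors, weights, alpha=0.1, kind="l2")   # 1 + 0.1 * 25
```

A larger `alpha` pushes the fitted weights harder toward zero; the optimizer never changes `alpha` itself.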

serene scaffold
#

!paste please never show code as text

arctic wedge BOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

placid cedar
#

ah alright

placid cedar
#
model_6 = models.Sequential()

model_6.add(layers.Conv2D(58, (3, 3), activation='relu',
                        input_shape=(img_size, img_size, 3))) # 3 colours rgb
model_6.add(layers.MaxPooling2D((2, 2)))

model_6.add(layers.Conv2D(116, (3, 3), kernel_regularizer=regularizers.l2(0.0001), activation='relu'))
model_6.add(layers.MaxPooling2D((2, 2)))

model_6.add(layers.Conv2D(232, (3, 3), kernel_regularizer=regularizers.l2(0.0001), activation='relu'))
model_6.add(layers.MaxPooling2D((2, 2)))

model_6.add(layers.Conv2D(232, (3, 3), kernel_regularizer=regularizers.l2(0.0001), activation='relu'))
model_6.add(layers.MaxPooling2D((2, 2)))
model_6.add(layers.Dropout(0.5))

model_6.add(layers.Flatten())
model_6.add(layers.Dropout(0.5))

model_6.add(layers.Dense(10, activation='softmax'))

model_6.compile(loss='categorical_crossentropy',
              optimizer=optimizers.Adam(learning_rate= 5e-4),
              metrics=['acc'])

model_6.summary()
#

current number of parameters is 903070

#

the current results

past meteor
#

The link I sent you are good tips

#

I could tell you what I do but it's more or less what's in the link

placid cedar
#

mmm i see

placid cedar
#

seems that the training data is starting to show a slower learning rate as the number of epochs increases

past meteor
#

Start by looking at the mistakes you're making

agile owl
#

@past meteor "Note that unlike standard cross-validation methods, successive training sets are supersets of those that come before them." Does this not bias the weights towards fitting the starting period the best since it is included the most times?

past meteor
#

when I was doing comp vision in the past I noticed cases where the model was failing and I was like "damn I couldn't get that right either"

past meteor
#

I don't think there's anything you can do about this

agile owl
#

I guess part of my problem is also that the overfitting is being done by DEAP so I'm not sure how to reincorporate the validation results into the parameters explicitly

#

but I can work on that

#

I wanted to see if you can use this for online training

#

but wasn't able to find anything

past meteor
#

Why DEAP?

agile owl
#

because it's for trading strategies and I didn't want to make any assumptions that are necessary for likelihood based methods

#

to be fair part of it was just that I learned about it and thought it was exciting and wanted to explore this approach too

#

not necessarily extremely deliberate

#

but I know other people do this too so there's probably some basis

past meteor
#

Evolutionary algorithms are typically a waste of compute imho 😄

agile owl
#

why do you say that

past meteor
#

Because they are black-box optimization methods that use little to no information of the problem being solved

agile owl
#

it also avoids false precision from false assumptions though

past meteor
#

Yesn't

#

You still need a fitness function

agile owl
#

my thought process behind it is that a lot of people are making unjustified assumptions

#

the fitness function is much easier to think about though

past meteor
#

If you have a fitness function that is say the MSE you've devolved into just maximum likelihood

agile owl
#

just do something like total_profit * (total_profit / max_drawdown) or something like that; you know what you want to get out of it

agile cobalt
agile owl
#

right

#

that's why I'm trying to avoid overfitting

#

I've realized it's a problem

past meteor
#

I did a full course on EA's in uni. Fun stuff, really enjoyed it. Waste of compute for most problems.

agile owl
#

where would you say they are best suited then

past meteor
#
  1. Multi objective optimization
  2. Combinatorial optimization
#

So actually I'd say: use it

#

It's fun to play around with because then you'll see the limitations more clearly after a while 😄

agile cobalt
echo mesa
#

Guys, do you guys know a good paper that explains linear regression from scratch both mathematically and having examples in python with it?

desert oar
agile owl
#

lots of what they do isn't actually that advanced at all

buoyant vine
#

no need :P Just need to be faster than you

agile owl
#

well exactly

#

I'm not playing that game I'm not trying to be an algo market maker

#

I'm going for superior signals

buoyant vine
#

Often they simply can afford what you cannot, and since most of the time they're doing HFT and are as close to the exchange as possible they just snuff out the little guys

agile owl
#

it's not HFT to be clear

#

I know that is not a space you want to compete with bigger players

#

momentum and value exist on different timescales though

#

I've been doing strategies based on analyzing the degree of mean reversion via hurst exponents

#

and using deap to fit parameters around how these things are calculated, windows, etc.

#

cutoff levels for different behaviors

#

like if it's trending then go with the trend unless the trend coefficient goes above a certain number then bet against the trend continuing

#

etc.

#

It wasn't clear to me that there's a framework that is meant for fitting parameters in this way, which is why I went for DEAP: it requires hardly any assumptions

#

you just need to give it the fitness function
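
To make the "just give it a fitness function" point concrete, here is a bare-bones evolutionary loop in pure Python. It deliberately does not use DEAP's API; `evolve`, the bounds, and the toy fitness are all assumptions for illustration, and a real fitness function would run a backtest instead.

```python
import random


def evolve(fitness, bounds, pop_size=30, n_generations=40, sigma=0.3, seed=0):
    """Maximize `fitness` over real-valued genomes: truncation selection + Gaussian mutation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(n_generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                 # keep the fitter half (elitism)
        children = []
        for _ in range(pop_size - len(parents)):
            parent = rng.choice(parents)
            child = [min(hi, max(lo, g + rng.gauss(0, sigma)))   # mutate, clip to bounds
                     for g, (lo, hi) in zip(parent, bounds)]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


# Toy fitness with a known optimum at (1, -2).
def toy_fitness(genome):
    x, y = genome
    return -((x - 1) ** 2 + (y + 2) ** 2)


best = evolve(toy_fitness, bounds=[(-5, 5), (-5, 5)])
```

Note that the loop only ever calls `fitness(genome)`: it needs no gradients and no likelihood, which is both the appeal and the reason it can burn a lot of evaluations.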

jagged hedge
#

i have a macbook pro 13 inch M2 with 8GB memory how can i run LLMs like 66B parameters or 100B parameters on it is there a way i can do so on cloud like aws or colab pro and are they really worth and can run such big models??

desert oar
#

just keep in mind you're not the only person doing this, certainly not the first to have this idea

agile owl
#

the reason why trading strategies remain profitable over time is this: constrained behavior

#

lots of the biggest players have constrained or dumb behavior due to operational constraints

#

so you are siphoning money off of mutual funds and pensions and insurance companies that are not as concerned about short term profits

#

and do suboptimal behaviors

#

here's an example

#

the Goldman Sachs Commodities Index

desert oar
#

right. if you have a model that seems to work, go for it. just remember that backtesting only gets you so far. it sounds like you have more domain knowledge than the usual person who wanders in here asking about algo trading bots, so maybe you know what you're doing

agile owl
#

I have decent knowledge of markets I could get better at algos though

desert oar
#

but then i'm sure you know that you're not the only person attempting to profit off of well-known behavior

agile owl
#

yeah but it's about money-weighted views not people-weighted views

#

if the people with the most money act like whales then you can be the barnacle

#

or the fish that eat their dead skin

#

if the big companies do things on a scale that such things don't matter to them

#

then you can profit from the inefficiency it creates

#

every monthend they rebalance their portfolios etc.

#

just the fact that they wait until the end of the month and do it at the end of the month every time is an exploitable inefficiency

agile cobalt
desert oar
agile owl
#

I have some idea, some vision

#

I think saying I know what I'm doing is a stretch 😄

#

I do have an idea of where inefficiencies exist

#

and why they exist

#

but how to exploit them most effectively is what I'm trying to discover

#

if there's a better way to fit parameters for trading strategies than DEAP I'd like to find it too

jagged hedge
agile owl
#

I just assumed the cost function's topology was too ill-defined to use other techniques

#

but that might not be true

#

things like what combination of weights on moving averages, length of moving average windows, number of lags in hurst exponent calculations, etc.

#

there is no "y"-value

#

in a traditional sense

#

I think what I would use if I weren't using DEAP is reinforcement learning

#

but I still need to learn more about that and it takes more setup

desert oar
agile owl
#

it looks at the alignment of different periodicities of hurst exponents and moving averages

desert oar
#

maybe you can adjust or constrain it somehow, or reparameterize it to be something that can be optimized more easily

agile owl
#

and decides to go with or against trends

#

and scales bracket orders to conditional volatility

#

and the objective function itself is an accumulation of profit

#

that arises from its actions

#

overfitting is a serious problem though because what will generate the most profit changes over time periods

desert oar
#

i don't know what a hurst exponent is, i'm not a finance person myself. but it sounds like you have some kind of iterative thing, where the procedure collects data for some fixed period of time, then takes an action, then repeats?

agile owl
#

yes it's iterative

#

it takes an action given a signal that exists at one time

#

and we don't know what the value of that action is until some indeterminate time in the future

#

when it meets an exit criteria

#

whether that is a stop out or a take profit

desert oar
#

i see, yeah that definitely sounds like it could be a problem. it sounds like you were on the "high variance" side of the bias-variance tradeoff

agile owl
#

yes

desert oar
#

first things first, i'm not familiar with the technicalities of these models, but i know there is quite an extensive literature of reinforcement learning for exactly this kind of iterative agent scenario

#

so you first might want to just check to see what already exists to avoid reinventing the wheel

agile owl
#

I was avoiding the investment into RL because you have to design the game

#

which is a bit more involved than setting up DEAP

#

but a bullet I will have to bite eventually

desert oar
#

but in general, there are two broad approaches in machine learning and statistics to avoid overfitting: reduce the amount of information the model can obtain from the training data set, or try to generate a large collection of realistic synthetic data sets, and average across many model fits on those data sets

agile owl
#

first approach seems more feasible here

desert oar
agile owl
#

one thing I was thinking about doing is actually inverting the relationship between test and train

#

I train on the smaller, later dataset

#

and then see if it performed adequately on the larger one

#

in case we revert to a different historical regime

#

afaik I have no theoretical basis for doing that though

desert oar
#

so what are the actual parameters in this model? what is being learned/fitted here?

#

fitting on a smaller data set might just make the overfitting worse, hard to say

agile owl
#

trigger criteria, window lengths, and weights that sum to one

#

and a couple scaling values

#

toolbox.attr_fast_period,
toolbox.attr_med_period,
toolbox.attr_slow_period,
toolbox.attr_ma_signal_period,
toolbox.attr_hurst_signal_period,
toolbox.attr_hurst_lags,
toolbox.attr_trend_fast_weight,
toolbox.attr_trend_med_weight,
toolbox.attr_trend_slow_weight,
toolbox.attr_reversion_fast_weight,
toolbox.attr_reversion_med_weight,
toolbox.attr_reversion_slow_weight,
toolbox.attr_reversion_sigma_open,
toolbox.attr_reversion_sigma_close,
toolbox.attr_trend_sigma_open,
toolbox.attr_trend_sigma_close,
toolbox.attr_trend_sigma_stop,
toolbox.attr_reversion_sigma_stop,
toolbox.attr_hurst_kill_period_reversion,
toolbox.attr_hurst_kill_period_trend,
toolbox.attr_hurst_trending_trigger,
toolbox.attr_hurst_reversion_trigger,
toolbox.attr_trend_fundamental_scaler,
toolbox.attr_reversion_fundamental_scaler,
toolbox.attr_fast_fundamental_period,
toolbox.attr_med_fundamental_period,
toolbox.attr_slow_fundamental_period,
toolbox.attr_reversion_fast_fundamental_weight,
toolbox.attr_reversion_med_fundamental_weight,
toolbox.attr_reversion_slow_fundamental_weight,
toolbox.attr_trend_fast_fundamental_weight,
toolbox.attr_trend_med_fundamental_weight,
toolbox.attr_trend_slow_fundamental_weight,
toolbox.attr_hurst_bottomout_trigger,
toolbox.attr_hurst_topout_trigger

#

things like this

desert oar
#

i see

agile owl
#

it behaves differently in trending and reverting environments so they have independent parameters

desert oar
#

are these all or mostly continuous numerical values bounded in some known range? you might want to try something in the category of bayesian blackbox optimization instead of evolutionary algorithm

agile owl
#

some of them are ints

#

some of them are real numbers

desert oar
#

I don't have much experience with the latter, but I have modest experience with the former for hyperparameter tuning of machine learning models

past meteor
#

both of them can be used for the same set of problems

#

Bayesian optimization is sequential in nature

#

EA are made for parallelism

agile owl
#

parallelism is strong here

#

the time complexity is high

desert oar
#

in bayes opt you can still use parallelism to get better exploration of the parameter space at each step

past meteor
#

Bayes opt also tries to be really efficient in the amount of iterations it needs to find the optimum

#

I think bayes opt is closer to exploitation on the exploration - exploitation scale

desert oar
#

my thinking is that you might get better regularization out of it, fitting something closer to a smooth curve over the parameter space

#

not sure if that intuition is off

agile owl
#

I'll look into it

#

is there a lib you can recommend

desert oar
#

optuna and hyperopt in python. i've had good results with the latter specifically

#

i also wonder if you can simplify this somewhat by learning individual sub models instead of trying to optimize the entire thing all at once

agile owl
#

I think all of the parameters interact with each other though

desert oar
#

for example, if you need to forecast something in order to make a decision, you can fit a separate probabilistic model for that specific thing, and use the distribution of predicted forecasts as input to some decision component

past meteor
#

I'm still thinking of the EA vs bayesian opt

agile owl
#

like if I add a set of genes to the EA to do something

#

they will change the optimal values of the other genes

past meteor
#

If evaluating your f isn't expensive I'd always take EA's

desert oar
#

yeah, i'm just thinking of heuristics that might get you closer to something that works well without overfitting, and might be faster/easier to iterate on

desert oar
#

which in this case doesn't seem to require intensive numerical computation, so maybe it's cheap to evaluate, in which case you can take advantage of the high exploration potential of EA?

#

i'm curious in that case if there are just some parameters you can tune to reduce sensitivity to data

agile owl
#

yeah I have one that scales fundamental signals based on a separate model

#

I tried making it a bool at first

#

and the version where it was nonzero always won

past meteor
#

You can turn it into a multi-objective optimization problem and add an objective tangentially related to overfitting if regularization isn't enough
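
A small sketch of what the multi-objective view means in practice: instead of one best genome you keep the Pareto front, i.e. every candidate that no other candidate beats on all objectives at once. The objective pair and the numbers below are made up; for real runs DEAP ships NSGA-II selection as `tools.selNSGA2`.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and better on one (maximizing)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


# (in-sample profit, negative train/validation gap), both to be maximized.
candidates = [(10.0, -5.0), (8.0, -1.0), (6.0, -0.5), (7.0, -3.0), (9.0, -6.0)]
front = pareto_front(candidates)
```

Here `(7.0, -3.0)` and `(9.0, -6.0)` drop out because another candidate is better on both counts; choosing among the remaining three is a judgment call about how much profit to trade for stability.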

agile owl
#

so I changed it to scale freely

#

i see

#

I tried to do that to some extent by making it scale to the overall number/size of transactions in the fitness function

#

but it was insufficient

#

I think the problem is the market switches dynamics

#

over time

#

so either have to figure out a way to segment the time series beforehand using something like an HMM state prediction

#

(but that presumes your HMM has the data that explains the switch in regimes)

#

but that will also produce its own form of overfitting I think

#

where it assumes everything is too much like the other datapoints that are assigned that state

#

i'll look into bayes though thanks everyone

#

btw if you work with time series that exhibit heteroskedasticity or other types of heterogeneous behavior over time I think hurst exponents can be a valuable piece of information for signal processing

#

might be valuable in other domains too

#

it's just a measure of how much something is like brownian noise

#

vs trending against the mean vs reverting to a mean

#

The Hurst exponent is a statistical measure of long-term memory of time series. The existence and form of such memory are of great interest in financial markets, as financial returns are not generally governed by random walks. The Hurst exponent is a single scalar value that indicates if a time series is purely random, trending, […]
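
For anyone who wants to try it, one simple estimator is sketched below: take the standard deviation of lagged differences for a range of lags and fit the slope on a log-log scale. This is an assumption-laden illustration (the function name, the lag range and the test series are chosen here) and not necessarily the calculation used in the strategy discussed above. Values near 0.5 suggest a random walk, above 0.5 trending, below 0.5 mean reversion.

```python
import math
import random


def hurst_exponent(series, max_lag=20):
    """Estimate H as the slope of log(std of lagged differences) against log(lag)."""
    log_lags, log_stds = [], []
    for lag in range(2, max_lag + 1):
        diffs = [series[i + lag] - series[i] for i in range(len(series) - lag)]
        mean = sum(diffs) / len(diffs)
        std = math.sqrt(sum((d - mean) ** 2 for d in diffs) / len(diffs))
        log_lags.append(math.log(lag))
        log_stds.append(math.log(std))
    # Ordinary least-squares slope of log_stds on log_lags.
    mx = sum(log_lags) / len(log_lags)
    my = sum(log_stds) / len(log_stds)
    num = sum((x - mx) * (y - my) for x, y in zip(log_lags, log_stds))
    den = sum((x - mx) ** 2 for x in log_lags)
    return num / den


rng = random.Random(42)
walk = [0.0]
for _ in range(5000):
    walk.append(walk[-1] + rng.gauss(0, 1))     # random walk: expect H near 0.5

noise = [rng.gauss(0, 1) for _ in range(5000)]  # snaps back to 0 every step: expect H near 0
h_walk = hurst_exponent(walk)
h_noise = hurst_exponent(noise)
```

The estimate is noisy on short windows and depends on the lag range, so it is probably best treated as a rough regime indicator rather than a precise measurement.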

#

probably has applications to fraud detection, etc.

#

when something begins to trend that wasn't before

#

like is the number of users registering for your app from north macedonia trending hard

#

although you could probably catch something like that more easily

#

“…[The concept] was originally developed in hydrology for the practical matter of determining the optimum size of the dam for the Nile river by Harold Edwin Hurst [but] can help us classify the pattern of time series of prices under a certain time horizon.” [Bui and Ślepaczuk]

desert oar
#

in my usual understanding of RL, part of what the algorithm does is adapt dynamically to the environment, for example the multi-armed bandit. part of why i suggested looking into that literature is because maybe you could get ideas for how to make your model adaptive, rather than fitting every parameter exactly to whatever happens to be in your historical data, which i suspect will always overfit to some extent

#

or maybe that's what those parameters you showed me are doing, i don't know enough about the models and domain to comment on that
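
As a minimal picture of that kind of adaptivity, here is an epsilon-greedy multi-armed bandit in pure Python. The arms and their reward distributions are invented for the example; the point is only that the estimates are updated after every action instead of being fitted once on history.

```python
import random


def epsilon_greedy(reward_fns, n_rounds=5000, epsilon=0.1, seed=0):
    """Play a multi-armed bandit, keeping a running mean reward per arm."""
    rng = random.Random(seed)
    counts = [0] * len(reward_fns)
    means = [0.0] * len(reward_fns)
    for _ in range(n_rounds):
        if rng.random() < epsilon:                      # explore a random arm
            arm = rng.randrange(len(reward_fns))
        else:                                           # exploit the best estimate so far
            arm = max(range(len(reward_fns)), key=means.__getitem__)
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]   # incremental mean update
    return counts, means


# Three made-up arms with Gaussian rewards; arm 2 has the highest mean.
arms = [lambda r: r.gauss(0.2, 0.5), lambda r: r.gauss(0.5, 0.5), lambda r: r.gauss(1.5, 0.5)]
counts, means = epsilon_greedy(arms)
```

If the reward distributions drift over time, the plain running mean would need to be replaced by something that forgets old data, for example a constant step size, which is closer to the regime-switching problem described here.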

agile owl
#

right I wanted to find a way to make this online

#

There is a layer of conditionality that is imposed by my a priori assumptions of what might be good

#

and the DEAP algo results either confirm or reject my hypothesis (at least on the same data)

#

the problem is generalizability which I think online learning would solve

#

at least as well as one could expect to solve it

#
```py
if self.trending_bull:
    self.state_history.append(1)
    fundamental_factor = vol_est * self.fundamental_signal_trend * self.p.trend_fundamental_scaler
    if self.data.close < self.target_trend - self.p.trend_sigma_open * vol_est - fundamental_factor:
        stop_px = self.target_trend - self.p.trend_sigma_stop * vol_est - fundamental_factor
        limit_px = self.target_trend - self.p.trend_sigma_close * vol_est - fundamental_factor
        trade = self.sell_bracket(limitprice=limit_px, stopprice=stop_px)
        self.log(f'TRENDING BULL - ENTRY {trade[0].price} - STOP LOSS {stop_px} - TAKE PROFIT {limit_px}')
```
so for example all of this logic was something I proposed a priori and simply fit the parameters the logic uses using DEAP
#

and I've been iteratively adding to it depending on what improves fitness and what doesn't

#

anything that is under the p attribute is a parameter exposed to DEAP directly

#

an example genome looks something like this

```py
champ = (
    [18, 52, 100, 6, 55, 9, 55, 94, 60, 15, 47, 149, 3.342307816105754, 1.094156334970098, 1.13173368372055, 10,
     -0.7274971304499664, 58, 24, 4, 0.6331356749123428, 0.3652205723791574, 7.9251262018994435, 39, 60, 134, 239,
     3, 38, 57, 123, 97, 9, 0.17026288765418376, 0.7678126560406366]
)
```
#

the ints are either windows or weights I designed to sum to one by taking the sum of multiple parameters as the denominator

#

I figured ints are fine for that since it would be false precision to use floats anyway

#

it's already producing a float at the end

#

where each int is the numerator and the sum is the denominator
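
In code, that normalization is roughly the following; the helper name is hypothetical and the three integers are just example gene values.

```python
def normalize_weights(raw):
    """Turn positive integer genes into weights that sum to one."""
    total = sum(raw)
    return [r / total for r in raw]


# Three integer genes, e.g. fast / medium / slow weights.
weights = normalize_weights([55, 94, 60])
```

Because only the ratios matter, `[55, 94, 60]` and `[110, 188, 120]` encode the same weights, so the search space contains many equivalent genomes.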

#

this is one of the fitness functions I played around with to try to reduce overfitting by increasing the number of actions by the agent
fitness = profit * (profit / (max_dd if max_dd else 1)) ** 2 * np.sqrt(no_trades)

#

where no_trades is the number of actions that have been consummated

#

dd = drawdown
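
A self-contained version of that fitness could look like the sketch below. It assumes drawdown is measured in absolute currency units on an equity curve, which may differ from how the backtest reports it; `max_drawdown` and the sample curve are illustrative.

```python
import math


def max_drawdown(equity):
    """Largest peak-to-trough drop of an equity curve, in currency units."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, peak - value)
    return worst


def fitness(equity, no_trades):
    """profit * (profit / max_dd)^2 * sqrt(no_trades), as in the formula above."""
    profit = equity[-1] - equity[0]
    max_dd = max_drawdown(equity)
    return profit * (profit / (max_dd if max_dd else 1)) ** 2 * math.sqrt(no_trades)


equity = [100.0, 110.0, 105.0, 120.0, 112.0, 130.0]   # made-up account values
score = fitness(equity, no_trades=4)                  # profit 30, max drawdown 8
```

Since the ratio is squared, the sign of the score follows the sign of `profit`, so losing strategies still rank below winning ones; the `sqrt(no_trades)` factor rewards genomes that trade more often.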

#

they get instantiated from these prior distributions:

    toolbox.register("attr_fast_period", random.randint, 10, 51)
    toolbox.register("attr_med_period", random.randint, 15, 101)
    toolbox.register("attr_slow_period", random.randint, 20, 201)
    toolbox.register("attr_hurst_signal_period", random.randint, 1, 101)
    toolbox.register("attr_ma_signal_period", random.randint, 1, 101)
    toolbox.register("attr_hurst_lags", random.randint, 7, 20)
    toolbox.register("attr_trend_fast_weight", random.randint, 1, 100)
    toolbox.register("attr_trend_med_weight", random.randint, 1, 100)
    toolbox.register("attr_trend_slow_weight", random.randint, 1, 100)
    ...
    toolbox.register("attr_fast_fundamental_period", random.randint, 20, 120)
    toolbox.register("attr_med_fundamental_period", random.randint, 60, 240)
    toolbox.register("attr_slow_fundamental_period", random.randint, 180, 360)
    toolbox.register("attr_reversion_fast_fundamental_weight", random.randint, 1, 101)
    toolbox.register("attr_reversion_med_fundamental_weight", random.randint, 1, 101)
    toolbox.register("attr_reversion_slow_fundamental_weight", random.randint, 1, 101)
    toolbox.register("attr_trend_fast_fundamental_weight", random.randint, 1, 101)
    toolbox.register("attr_trend_med_fundamental_weight", random.randint, 1, 101)
    toolbox.register("attr_trend_slow_fundamental_weight", random.randint, 1, 101)
    toolbox.register("attr_hurst_bottomout_trigger", random.uniform, 0.1, 0.3)
    toolbox.register("attr_hurst_topout_trigger", random.uniform, 0.55, 0.8)
#

when I do booleans I use random.choice on a tuple containing 0 and 1
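A stdlib-only sketch of how one individual could be drawn from priors like the ones above. It uses `functools.partial` as a stand-in for `toolbox.register` (which, as far as I know, binds default arguments in much the same way); the three gene names and `make_individual` are illustrative assumptions.

```python
import random
from functools import partial

# Stand-ins for registered attribute generators.
attr_fast_period = partial(random.randint, 10, 51)      # int gene, both bounds inclusive
attr_hurst_trigger = partial(random.uniform, 0.1, 0.3)  # float gene
attr_flag = partial(random.choice, (0, 1))              # boolean gene stored as 0/1


def make_individual():
    """Draw one genome: a flat list with one value per gene."""
    return [attr_fast_period(), attr_hurst_trigger(), attr_flag()]


random.seed(0)
individual = make_individual()
print(individual)
```

Note that `random.randint` includes both endpoints, so `(10, 51)` can produce 51.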

#

the overfitting in practice

#

the topline should become more positive and should be large in comparison to the red line beneath it if things are going well

desert oar
#

yeah, i think my only suggestion at this point is to look at reinforcement learning to see how they do it

#

it sounds like you have a good conceptual framework, which is probably the most important thing

agile owl
lapis sequoia
#

I am so bored with supervised learning. Need new ideas

#

Do any of you think that putting 3000 hours into Python and data stuff is kind of insane for the time span of a single year?

agile cobalt
#

3000 hours in a year sounds pretty insane for anything at all

torpid quartz
#

Anyone got practical ML/AI resources for beginners? Something that shows the process of making an ML project and doesn't get too deep into math.

torpid quartz
#

My mind kind of blanks when I look at a math formula

agile cobalt
lapis sequoia
#

Yes.

#

There was a period of 3 months straight where I would program 14 hours a day every single day and go through dataset after dataset.

agile cobalt
#

is your issue just the notation, or the math itself?

torpid quartz
#

I dunno calculus though.

lapis sequoia
#

Dude, math is a very vast thing. The most vast thing ever

lapis sequoia
#

it is not that bad

agile cobalt
#

you could try keeping a reference sheet with the meaning of common symbols, or just spend some time properly learning it, but you'll have to get used to it one way or the other

lapis sequoia
#

@torpid quartz What is the highest math course you have ever taken?

lapis sequoia
#

ok, not bad

#

take trig, then calc

#

If you want. I took calc1-3 before I touched stats

torpid quartz
#

I don't think trig is a course where I am

lapis sequoia
#

which, I do not know about

lapis sequoia
torpid quartz
lapis sequoia
#

Like I took insane grad level optimization classes and I barely use scipy.optimize

lapis sequoia
torpid quartz
#

The tutorials online only seem to touch the surface of opencv and CNNs

lapis sequoia
#

Ok, what have you done so far in ML?

torpid quartz
#

Uhh… train a decision tree with scikit-learn and train on the MNIST dataset

#

Basically nothing

#

Bit of face detection, but not with a NN

lapis sequoia
#

what do you use the most for ML?

torpid quartz
#

Huh? Like what language?

#

I know a bit of scikit learn I guess

lapis sequoia
torpid quartz
#

I know python and rust pretty well in terms of general programming

#

Bit of c++, bit of Haskell, bit of ts

lapis sequoia
#

just do whatever you want. No one is stopping you

torpid quartz
#

Ok I guess

#

I'm wondering if there's a way to recognize hand gestures using ML

lapis sequoia
#

just do it

#

that is it

#

you just do it

torpid quartz
#

With… what?

#

I've jumped headfirst into stuff before, but I have no idea how to even start this

lapis sequoia
#

Straight up, I would get more comfortable with basic stuff before jumping in

torpid quartz
lapis sequoia
#

I do not know, you are confusing me.

torpid quartz
#

Ok sorry

agile owl
#

reinforcement learning

agile owl
#

reinforcement learning I think addresses a much more interesting class of problems than supervised learning

#

not "what should the next value be" or "what class does this individual belong to" but "how should I behave over time to optimize some metric"

#

I haven't gotten into it as much as I'd like myself

iron basalt
lapis sequoia
agile owl
#

ok

lapis sequoia
#

Not now at least

umbral charm
#
import numpy as np
import matplotlib.pyplot as plt
#do not truncate
np.set_printoptions(threshold=np.inf)
x, y, z  = np.loadtxt(fname = 'data.csv', unpack = True, delimiter = ',', skiprows = 1,) #load data
X, Y = np.meshgrid(x, y)
Z = (2/5)*np.e**(-X**2/2) + (2/5)*np.e**(-Y**2/2) - (3/5)
r = (Z - z)**2
print(np.max(r))
print(np.min(r))
print(np.where(r == 1.7784100967330392e-16))
print(r[107, 107])
#

this returns

#

0.2544509833800965
1.7784100967330392e-16
(array([ 4, 36], dtype=int64), array([107, 107], dtype=int64))
0.0015725117183072214

#

how come i can't find what index 1.77...E-16 is at

arctic wedgeBOT
#

```py
numpy.argmin(a, axis=None, out=None, *, keepdims=<no value>)
```
Returns the indices of the minimum values along an axis.
desert oar
#

!e ```python
import numpy as np
x = np.array([0, -1e-16, 2, 4, 6, 8])
i_min = np.argmin(x)
print((i_min, x[i_min]))
```

umbral charm
arctic wedgeBOT
#

@desert oar :white_check_mark: Your 3.12 eval job has completed with return code 0.

(1, -1e-16)
umbral charm
#

on 2d arrays

desert oar
umbral charm
#

So what was wrong with np.where?

desert oar
desert oar
#

well, the problem is that you're looking for exact floating point equality, which is squirrely

umbral charm
#

just realised (array([ 4, 36], dtype=int64), array([107, 107], dtype=int64)) means it is at index [4, 107] (and also at [36, 107])

desert oar
#

argmin is going to be less fussy
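For the 2D case asked about above, a small sketch with made-up data: `np.argmin` without `axis` returns an index into the flattened array, and `np.unravel_index` turns it back into a (row, column) pair. If a specific value really has to be searched for, comparing with a tolerance via `np.isclose` is safer than `==`. The array `r` and the tolerance here are assumptions for illustration.

```python
import numpy as np

r = np.array([[0.5, 0.2, 0.9],
              [0.7, 1e-16, 0.3]])

flat_index = np.argmin(r)                          # position in the flattened array
row, col = np.unravel_index(flat_index, r.shape)   # back to 2D coordinates
print(row, col, r[row, col])

# Searching for a known value: compare with a tolerance instead of ==.
matches = np.argwhere(np.isclose(r, 1e-16, rtol=0.0, atol=1e-18))
print(matches)
```

`np.where` returns one array per axis (all row indices, then all column indices), which is why its output has to be read column-wise.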

#

also, the axis keyword is a bit funky. it tells you which axis/dimension is "consumed" by the operation. so axis=0 means that it will find the argmin by "consuming" the 0th (outermost) axis, returning a result with the other axes intact

#

!e ```python
import numpy as np
x = np.arange(9).reshape((3, 3))
print(np.argmin(x, axis=0))
```

arctic wedgeBOT
#

@desert oar :white_check_mark: Your 3.12 eval job has completed with return code 0.

[0 0 0]
desert oar
#

!e ```python
import numpy as np
import numpy.random

x_flat = np.arange(3 * 4 * 5)
np.random.shuffle(x_flat)

x = x_flat.reshape((3, 4, 5))

print(np.argmin(x, axis=-1))
```

etc.
arctic wedgeBOT
#

@desert oar :white_check_mark: Your 3.12 eval job has completed with return code 0.

[[4 0 4 1]
 [1 4 1 1]
 [3 1 1 0]]
desert oar
#

(oops, it doesn't like argmin over multiple axes... TIL)

umbral charm
#

I'm guessing argmax

#

is also a thing

desert oar
#

indeed, but it's better to check the docs than guess 😉

agile owl
#

@past meteor what should the number of generations and size of the population be functions of when deciding those hyperparameters for EA? the crossover prob and mutation probs seem like pure shots in the dark but I imagine that you can reason about what you want here. Specifically, I'm wondering if having too many generations leads to overfitting.

#

seems like it should to me

#

seems like NGEN should be picked relative to the size of the dataset and number of codons in an individual but not sure if there's some rule of thumb here

lapis sequoia
#

What IDE do you guys use?

agile owl
#

pycharm

#

I'm thinking one way to get around this problem is exponential decay of profit so that profits from a long time ago add less to the fitness
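A rough sketch of the exponential-decay idea, assuming profits are recorded per period (oldest first) and that the decay is expressed as a half-life in periods. Both the function and the parameterization are assumptions for illustration.

```python
def decayed_profit(profits, half_life):
    """Sum per-period profits, weighting older periods less.

    profits is ordered oldest first; a period that is `half_life`
    periods old counts half as much as the most recent one.
    """
    decay = 0.5 ** (1.0 / half_life)
    n = len(profits)
    return sum(p * decay ** (n - 1 - i) for i, p in enumerate(profits))


print(decayed_profit([100.0, 100.0, 100.0], half_life=1))
```

With a half-life of one period the weights are 0.25, 0.5 and 1, so three equal profits of 100 contribute 175 instead of 300.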

#

the other idea I had is to use two fitness functions, one that calculates a reduced result over several segments and one that operates just on the final test

#

which isn't ideal but it seems to be a very direct way to get the existing algorithm to do what i want

agile owl
#
```py
    inner_fitnesses.append(inner_fitness)
    fitness = np.median(inner_fitnesses)
```
let's see where this black magic leads
#

it just werks

#

that's the other nice thing about EAs

#

I can do whatever I want and it's up to me to decide whether it makes sense or I like it

#

I can make it more conservative just by switching out median for min or some percentile below 50
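A sketch of swapping the reduction over segment fitnesses, using only the standard library. The `how` argument and the 25th-percentile option are assumptions standing in for "some percentile below 50".

```python
import statistics


def reduce_fitness(inner_fitnesses, how="median"):
    """Collapse per-segment fitness values into one number."""
    if how == "median":
        return statistics.median(inner_fitnesses)
    if how == "min":
        return min(inner_fitnesses)  # most conservative: worst segment wins
    if how == "p25":
        # quantiles(n=4) returns the three quartile cut points; [0] is the 25th percentile
        return statistics.quantiles(inner_fitnesses, n=4)[0]
    raise ValueError(f"unknown reduction: {how}")


segment_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
print(reduce_fitness(segment_scores))
print(reduce_fitness(segment_scores, "min"))
print(reduce_fitness(segment_scores, "p25"))
```

The lower the percentile, the more a single bad segment drags the fitness down, which penalizes genomes that only work in one regime.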

#

the numbers are so much smaller it makes me sad but that's proof it's working

#

it takes 90 minutes across 16 high end Ryzen mobile cores from 2021

#

idk if that's considered expensive to you or not for EA

lapis sequoia
#

why is there like a facebook link under one of my repos?

lapis sequoia
cold osprey
#

wheres that

lapis sequoia
#

Under traffic

agile owl
#

I ended up finding a way to reduce the variance by fitting two separate models based on an innate state segmentation of the training data (i.e., whether the central bank is raising interest rates or not as opposed to doing some latent state model)

#

I also did the segmentation into different time periods and calculating the median fitness thing

#

I consider it a workable solution for now but will probably follow up on the other suggestions you guys made here

rugged comet
agile owl
#

is the idea that you want to write one that will be plug and play with the rest of scikit learn

#

or just a decision tree classifier in general

#

because I think that class name is specifically one from sklearn isn't it

#

the decision tree algorithm in general there's a lot of resources

rugged comet
#

So sklearn already has a DecisionTreeClassifier. I'm trying to create my own without looking at sklearn's source code.
Just a decision tree classifier in general.

agile owl
#

this might be helpful

#

I was just asking because oftentimes people want to create custom versions of library classes and want them to play nicely with the rest of the library

#

which is a lot harder to do than just writing something

rugged comet
agile owl
#

I skimmed it and it seems as good as any other discussion of how DTs work inside

rugged comet
agile owl
#

well part of the joy of writing custom classes is you can decide how to do that yourself

rugged comet
#

ah

agile owl
#

if you want to see how sklearn does it you can try to go into their source code but I'm sure that it's an implementation of a base class and you'll have to jump all over to find the full picture

#

sounded like you had an interesting idea why not go for it and see what works and what doesn't

rugged comet
#

I'm thinking about how to store the decision at each node in the tree. Writing something like feature < threshold for a continuous column wouldn't work because it just gets evaluated and doesn't save the condition itself.

#

Perhaps I could store the operator and the operands separately.

iron basalt
rugged comet
#

Wouldn't I have to overload the operator in whatever class the feature is and whatever class the threshold is? Or maybe I could create a new Decision class...

rugged comet
iron basalt
rugged comet
#

If I don't need that, I can't think of another way to save a conditional statement without it getting evaluated.

iron basalt
#

!e ```py
def decision(a, b):
    def foo():
        return a < b
    return foo

d = decision(10, 20)
print(d())
```

arctic wedgeBOT
#

@iron basalt :white_check_mark: Your 3.12 eval job has completed with return code 0.

True
rugged comet
#

Yes, that's what I was thinking for making a new class, more or less.

iron basalt
#

!e ```py
import ast
print(ast.dump(ast.parse("x < y"), indent=2))
```

arctic wedgeBOT
#

@iron basalt :white_check_mark: Your 3.12 eval job has completed with return code 0.

Module(
  body=[
    Expr(
      value=Compare(
        left=Name(id='x', ctx=Load()),
        ops=[
          Lt()],
        comparators=[
          Name(id='y', ctx=Load())]))],
  type_ignores=[])
rugged comet
iron basalt
rugged comet
#

Something like this perhaps

class Decision():
    def __init__(self, left_operand, right_operand, operator):
        self.left_operand = left_operand
        self.right_operand = right_operand
        self.operator = operator

    def evaluate(self):
        return self.left_operand.operator(self.right_operand)

iron basalt
#
op(left, right)```
rugged comet
#
d = Decision(7, 8, __eq__)

It's saying that __eq__ isn't defined.
Oh, probably because it's a method.

iron basalt
#

Can use lambda here too.

rugged comet
#

That's a good idea.

iron basalt
#

!e ```py
less = lambda x, y: x < y
print(less(10, 20))
```

arctic wedgeBOT
#

@iron basalt :white_check_mark: Your 3.12 eval job has completed with return code 0.

True
rugged comet
#
class Decision():
    def __init__(self, left_operand, right_operand, operator):
        self.left_operand = left_operand
        self.right_operand = right_operand
        self.operator = operator

    def evaluate(self):
        return self.operator(self.left_operand, self.right_operand)
d = Decision(7, 8, lambda x, y: x == y)
print(d.evaluate())
False
#

I think this will work but it feels kind of weird.

#

Using a function to evaluate an operator.

iron basalt
#

Yup, and if you prefer __call__ instead of evaluate.

#

This is functional programming. It's cumbersome in Python since it does not support functional programming well, but it works.
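A sketch of the `__call__` variant, under one assumption that goes beyond the class in the thread: the left operand is stored as a feature index so the same decision can be applied to any row later. The standard `operator` module already provides named functions such as `operator.lt` and `operator.eq`, so no lambda is required.

```python
import operator


class Decision:
    """Store a comparison without evaluating it until it is called on a row."""

    def __init__(self, feature_index, op, threshold):
        self.feature_index = feature_index
        self.op = op
        self.threshold = threshold

    def __call__(self, row):
        return self.op(row[self.feature_index], self.threshold)


is_small = Decision(0, operator.lt, 2.5)
print(is_small([1.0, 7.0]))
print(is_small([3.0, 7.0]))
```

Calling the object looks like calling a function, which keeps tree-traversal code short: `if node.decision(row): go left`.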

rugged comet
#

I didn't know __call__ existed. Thanks.
I guess this is rather new to me.

iron basalt
rugged comet
#

Neat.

iron basalt
#

In mathematics and computer science, currying is the technique of translating the evaluation of a function that takes multiple arguments into evaluating a sequence of functions, each with a single argument. For example, currying a function f that takes three arguments creates a nested ...
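A tiny illustration of currying in Python. `curry3` and `volume` are made-up names for this sketch; `functools.partial` is shown alongside because it is the closest built-in tool, although partial application is not quite the same thing as currying.

```python
from functools import partial


def curry3(f):
    """Curry a three-argument function into a chain of one-argument functions."""
    return lambda a: lambda b: lambda c: f(a, b, c)


def volume(length, width, height):
    return length * width * height


curried = curry3(volume)
print(curried(2)(3)(4))

# Partial application fixes some arguments up front and takes the rest in one call.
half_done = partial(volume, 2, 3)
print(half_done(4))
```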

desert oar
rugged comet
#

I think I'm making an n-ary decision tree.

desert oar
#

with arbitrary n?

rugged comet
#

Yeah

desert oar
#

i'd say even then you probably can just store the split points in a tuple/list, no?
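One way to read the tuple-of-split-points suggestion, as a sketch: for a numeric feature, keep the thresholds sorted and use `bisect` to find which child a value falls into. The thresholds and the function name are assumptions for illustration.

```python
import bisect

split_points = (2.0, 5.0, 9.0)  # three sorted thresholds give four children


def child_index(value, split_points):
    """Return which child (0..len(split_points)) the value is routed to."""
    return bisect.bisect_right(split_points, value)


for v in (1.0, 2.0, 5.5, 10.0):
    print(v, "->", child_index(v, split_points))
```

With a single split point this reduces to the usual binary "left or right" decision.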

rugged comet
#

Can you elaborate on what you mean by "split points"?

desert oar
#

which decision tree algorithm are you implementing?

rugged comet
#

It's a classification one if that's what you're asking. If not, then I don't know.

desert oar
#

respectfully, i recommend clarifying the actual algorithm you want to implement before trying to implement it

#

there are standard algorithms for decision trees in ML

rugged comet
#

https://www.youtube.com/watch?v=_L39rN6gz7Y
For the most part, I'm trying to do this. But with multiple classes instead of two classes.

Decision trees are part of the foundation for Machine Learning. Although they are quite simple, they are very flexible and pop up in a very wide variety of situations. This StatQuest covers all the basics and shows you how to create a new tree from scratch, one step at a time.

NOTE: This is an updated and revised version of the Decision Tree St...

desert oar
#

i see. let me at least skim the video to see what they're presenting

rugged comet
#

Okay.

desert oar
#

this looks very much like a binary decision tree

#

i think you might be confused between the arity of the tree and the number of classes being predicted

rugged comet
#

Can I have a binary decision tree with 3+ classes?

#

I was thinking about that. For creating the split points, I could do "if it is this class, go to the left. all other classes go to the right."

desert oar
#

sure. the class score at each node is just the % of data points with that class in the node
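A minimal sketch of that class score for one node, which works the same for two classes or ten. The labels are made up and the function names are illustrative.

```python
from collections import Counter


def class_scores(labels):
    """Fraction of the node's data points that belong to each class."""
    n = len(labels)
    return {cls: count / n for cls, count in Counter(labels).items()}


def predict_class(labels):
    """Predict the class with the highest score in this node."""
    scores = class_scores(labels)
    return max(scores, key=scores.get)


node_labels = ["cat", "dog", "dog", "bird"]
print(class_scores(node_labels))
print(predict_class(node_labels))
```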

#

well you can't use the classes to create the split points... otherwise, how could you make predictions on new data?

rugged comet
#

I oversimplified a little bit. You would still determine which class is the best predictor for each split point using the Gini Impurity.

desert oar
#

i don't follow, sorry

#

let's reserve the term "class" for the thing we are trying to predict - the output of the model

#

and let's use the term "category" for the inputs to the model

rugged comet
# desert oar i don't follow, sorry

So to build the tree, you determine which category gives the best split, i.e. the lowest Gini impurity. Then that category becomes the root. You send all rows for which the condition is true to the left and the rest to the right. With your new set of rows for the left, you again determine the category that gives the best split. You do the same for the right. You continue until all leaves are pure, i.e. they only contain samples of one class.

desert oar
rugged comet
desert oar
#

a feature is a column in the input data. a category is... a category. e.g. a feature might be "eye color" and some categories of eye color might be "blue" and "brown"

rugged comet
#

I was just confirming.

desert oar
#

got it. i'm not sure if you're learning this material in english or another language

#

and you don't need to keep splitting until the tree is perfectly pure. scikit-learn for example provides several stopping criteria

#

e.g. you can refuse to split any leaf that's below a certain size. or you can refuse to split any leaf where the purity gain is lower than some threshold.

#

or you can refuse to split beyond a certain maximum depth

rugged comet
#

So are you saying that instead of finding the feature that gives the best split, you have to find the category that gives the best split among all features? I assume continuous features are also taken into consideration. Like which numeric threshold gives the best split. And then compare that with the rest of the categorical splits?

#

Say I have two features that each have 3+ categories, I would need to find the best category in order to determine the root node or any nodes thereafter?

desert oar
rugged comet
#

I assumed that the video I linked was implying that each split happens feature-wise, not category-wise.

desert oar
#

you only split on one feature at a time

#

but how do you choose which feature to split on? and how do you choose where to split?

#

the answer is that you choose the best split for each feature, and then you choose the feature whose best split is the best overall
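A sketch of that two-level search for numeric features only, using weighted Gini impurity as the score (lower is better). Candidate thresholds are taken as midpoints between adjacent distinct values, which is a common choice but an assumption here; categorical features would need their own candidate splits.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(left, right):
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


def best_split_for_feature(rows, labels, feature):
    """Best (impurity, threshold) for one feature, or None if it has one value."""
    values = sorted({row[feature] for row in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] < threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] >= threshold]
        score = weighted_gini(left, right)
        if best is None or score < best[0]:
            best = (score, threshold)
    return best


def best_split(rows, labels):
    """Best split per feature, then the best of those overall."""
    candidates = []
    for feature in range(len(rows[0])):
        result = best_split_for_feature(rows, labels, feature)
        if result is not None:
            candidates.append((result[0], feature, result[1]))
    return min(candidates) if candidates else None


rows = [(1.0, 10.0), (2.0, 20.0), (3.0, 10.0), (4.0, 20.0)]
labels = ["a", "a", "b", "b"]
print(best_split(rows, labels))  # (impurity, feature index, threshold)
```

In this toy data feature 0 separates the classes perfectly at 2.5, while the best split on feature 1 leaves both sides mixed, so feature 0 wins.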

rugged comet
# desert oar can you clarify what you mean by "feature-wise"?

I mean you pick a feature, such as Loves Soda, a binary feature, send all rows for which the condition is true to the left and the rest to the right.
Another example with 3 categories: eye color. Send the blue-eyed rows to the left and the brown and green eyed rows to the right.
You'd determine which feature to split on by calculating the Gini Impurity for that split.

desert oar
rugged comet
desert oar
rugged comet
#

That's how I assume it would be done, but I don't have any evidence.

desert oar
desert oar
past meteor
agile owl
#

they can't make me redundant if I'm the only one who knows how to tune the hyperparameters 😎

#

./s

past meteor
#

You don't need to implement decision trees like this, but it's good to remember they're also called recursive partitioning. If you know this, you can conceptually simplify the problem to "how do I split in one node?" and "when do I stop splitting?"
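The two questions map fairly directly onto a recursive function. The sketch below is one possible shape, not a reference implementation: it handles numeric features only, uses Gini impurity, stores nodes as plain dicts, and the `max_depth` and `min_samples` stopping rules are assumptions modeled on the kinds of criteria mentioned earlier.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def find_split(rows, labels):
    """'How do I split in one node?' Returns (impurity, feature, threshold) or None."""
    best = None
    for feature in range(len(rows[0])):
        values = sorted({row[feature] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[feature] < threshold]
            right = [y for r, y in zip(rows, labels) if r[feature] >= threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best


def build_tree(rows, labels, depth=0, max_depth=3, min_samples=2):
    majority = Counter(labels).most_common(1)[0][0]
    # 'When do I stop splitting?'
    if depth >= max_depth or len(labels) < min_samples or gini(labels) == 0.0:
        return {"leaf": majority}
    split = find_split(rows, labels)
    if split is None:  # every row is identical, nothing left to split on
        return {"leaf": majority}
    _, feature, threshold = split
    left = [(r, y) for r, y in zip(rows, labels) if r[feature] < threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[feature] >= threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [y for _, y in left],
                           depth + 1, max_depth, min_samples),
        "right": build_tree([r for r, _ in right], [y for _, y in right],
                            depth + 1, max_depth, min_samples),
    }


def predict(tree, row):
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] < tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


rows = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (6.0,)]
labels = ["a", "a", "b", "b", "c", "c"]
tree = build_tree(rows, labels)
print([predict(tree, (v,)) for v in (1.5, 3.5, 5.5)])
```

The tree stays binary even though there are three classes: the number of classes only affects the impurity calculation and the leaf prediction.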

rugged comet
#

This project is fun so far. I'm glad I started it.

desert oar
#

the point i'm trying to emphasize here is that "splitting on feature" itself is an ill-defined concept

#

(also, terminology note: usually we think of every feature having its own distinct categories. so the categories of "eye color" are eye colors, the categories of "hair color" are hair colors, etc)

rugged comet
desert oar
rugged comet
past meteor
#

In uni we had great slides on DTs, I can send them to you @rugged comet

past meteor
#

Maybe it goes into a bit too much detail towards the end. I think it's fine to go to slide 57, implement the tree and then go back and do the other 40

wooden sail
#

send that to me as well :x

blissful meadow
#

I have this AI model I'm using Flask to build. Our other backend is in Node. The model takes 3 hours to get back with a response. How should I architect a request?

Should the Node backend send an async POST request with the data and wait for the Flask backend to respond? Or should Node just post, and Flask says OK, does the processing, and posts the results to the Node backend separately?
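The second option (accept the request, answer immediately, deliver the result later) can be sketched without any web framework. Everything below is a stand-in: `slow_model` fakes the multi-hour pipeline, the in-memory `jobs` dict replaces a real database or queue, and `on_done` is where a callback to the Node backend would go. For a job that really runs for hours, a thread inside the web process is fragile, so treat this purely as the shape of the idea.

```python
import threading
import time
import uuid

jobs = {}  # job_id -> {"status": ..., "result": ...}; in-memory only, for illustration
jobs_lock = threading.Lock()


def slow_model(payload):
    """Placeholder for the long-running pipeline."""
    time.sleep(0.1)
    return {"echo": payload}


def run_job(job_id, payload, on_done=None):
    result = slow_model(payload)
    with jobs_lock:
        jobs[job_id] = {"status": "done", "result": result}
    if on_done is not None:
        on_done(job_id, result)  # e.g. POST the result back to the other backend


def submit(payload, on_done=None):
    """Return a job id immediately; the work continues in the background."""
    job_id = str(uuid.uuid4())
    with jobs_lock:
        jobs[job_id] = {"status": "running", "result": None}
    worker = threading.Thread(target=run_job, args=(job_id, payload, on_done))
    worker.start()
    return job_id, worker


def status(job_id):
    """What a polling endpoint would return."""
    with jobs_lock:
        return dict(jobs[job_id])


job_id, worker = submit({"text": "hello"})
worker.join()  # only so the example can print the finished state
print(status(job_id))
```

The caller can either poll `status(job_id)` or rely on the callback; in both cases no HTTP connection has to stay open for hours.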

cold osprey
#

3 hours wut

#

Inference time of 3 hours?

blissful meadow
#

Let's just say, it continuously outputs what is needed and the end is a long list of things

#

I even wonder if I should go serverless

zealous tartan
#

hey, do you have suggestions for getting started in data science and AI with Python? I am only a high school student who will start college next summer. I know the basics of Python like defining a function, lists, loops, file functions, conditions, etc. I also know a few basic modules like math, mysql.connector, random. I would only invest in paid courses if they are actually worth it and don't have a prerequisite like uni-level maths, because I can't really cram uni-level maths in a few months, can I? Or maybe I can, who knows. Your help is appreciated, just ping me or DM me the suggestions. Thanks :D Also, I don't understand GitHub even a bit, so a tutorial for that would be nice too. I am dumb like a 5 year old excited about magicians (aka CS engineers)

past meteor
past meteor
blissful meadow
# past meteor Can you explain what this model is? Why is it taking 3 hours? Why do you have a ...

the node backend is our normal application backend. Our AI model is in Python. While it is possible to run Python in node etc., we don't want to do that. So the other choice was to make another EC2 instance and deploy the REST API using Flask.
It takes 3 hours because it does a lot of work. One output is fed to another and in total it takes a lot of time. Since the outputs depend on each other, the work is sequential. Honestly, someone else wrote it and I'm sure it could be made faster, but it is what it is right now

#

Its a large language model

past meteor
#

Is it a neural network?

#

ah okay, that's an important thing to mention

#

CPU or GPU?

cold osprey
#

has to be gpu rightt

blissful meadow
#

yeah

#

we're calling an already built LLM. Each call takes a minute. The number of calls ends up taking hours. It is not resource intensive on our end

past meteor
blissful meadow
#

Hence the serverless consideration

past meteor
#

Is this work you can do in-batch "whenever" and then give it to your main app? Does it need to be a web app?

brisk sage
#

Hey guys, I'm trying to analyze a relatively small dataset (128 observations with subgroups as small as 20). This data is continuous, but very nonlinear (independent vs dependent variable).
Do you have any suggestions for a good model to analyze this?

OLS and GLM won't work since it's nonlinear. I've read about and implemented a random forest analysis, but with such a small sample size it might be prone to overfitting. Would a reduction of trees (say to 20/40 instead of 100/1000) work?

past meteor
# brisk sage Hey guys, Iโ€™m trying to analyze a relatively small dataset (128 observations wit...
  1. polynomial features and transformations with GLMs
  2. regularisation on your random forest (tune the cost complexity parameter). Reducing trees on random forest will not help you, it averages the trees. Reducing the amount of trees will likely cause more overfitting.
  3. Consider using gradient boosting. Usually it has stronger defaults than RF. You can tune it by reducing the amount of trees
  4. Try RBF-SVMs. They only have 2 or 3 hyperparameters, for classification and regression respectively. They're more or less the best model on small datasets but they just scale poorly 🙂
blissful meadow
past meteor
#

You might be already doing that, in that case: I don't know 🤷 .

blissful meadow
#

I'm going to look into webhooks, pub/sub and the like.

past meteor
#

But that won't work if it's 3 hours ofc 🤣

brisk sage
# past meteor 1. polynomial features and transformations with GLMs 2. regularisation on your...

Which of these would you recommend the most?
I have adapted the random forest and created an SVM model, both of which return more or less the same results. I have just been creating so many models lately and every new one is returning new results that I'm quite confused which is the best to use 😅
A friend who knows a thing or two about statistics recommended GLM, however with nonlinear data, a linear model wouldn't do well, thus I came up with random forest

past meteor
#

GLMs work well if you transform your input

left tartan
# zealous tartan hey, do you have suggestion for getting started in data science and ai with pyth...

This is a better question for #career-advice : it sounds like you probably need to first practice your Python programming a bit and reinforce what you've learned, but you can also tackle some basic ML coding projects (see Kaggle.com/learn and CS50 for AI) to learn some of the coding basics. For preparing for college: make sure your math fundamentals are strong. Calculus is the typical first year course, and it's not hard to prepare for if you put a little time in. I wouldn't worry about AI/ML math if you haven't started college yet. Just be ready for calculus.

brisk sage
brisk sage
# past meteor GLMs work well if you transform your input

Since my data is right skewed, you're referring to a logarithmic transformation of the input?
This might be difficult, since it contains many 0 values.
Is there no other way?
SVM seemed to work, what did you mean by scaling poorly?
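On the zeros: one commonly used option for non-negative, right-skewed data is `log(1 + x)`, which is defined at zero and maps 0 to 0. Whether it suits this particular dataset is a separate statistical question, so the sketch below (with made-up values) only shows the mechanics.

```python
import math

amplitudes = [0.0, 0.5, 3.0, 42.0]  # hypothetical right-skewed values containing zeros

transformed = [math.log1p(a) for a in amplitudes]   # log(1 + a), safe at a == 0
recovered = [math.expm1(t) for t in transformed]    # inverse: exp(t) - 1

print(transformed)
print(recovered)
```

Large values are compressed much more than small ones, and the inverse is available for reporting results on the original scale.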

weak mortar
#

hello i have a quick question about interacting with HuggingFace API. you think here is the place to ask or some other room more appropriate ?

past meteor
past meteor
# serene scaffold what about xgboost

We'd have to run a benchmark. The appeal of SVMs is that they are universal approximators (contingent on tuning, which needs just 2 hyperparameters). Gradient boosting is a different beast, lots of dials you can turn.

In practice I always use Xgboost, CatBoost or HistGradientBoostingClassifier and I rarely use SVMs because:

  1. Gradient boosting performs really strongly with 0 hyperparam tuning, same can't be said for SVMs. I rarely can be bothered to tune them in reality.
  2. I rarely have datasets small enough for support vector machines not to OOM.
#

On smallish datasets in the past I've run into situations where I couldn't outperform SVMs with gradient boosting tho

brisk sage
past meteor
brisk sage
#

I'm trying to perform a multivariate analysis on my data, measuring the impact of something like age/time/temperature on amplitude values (given in percentage, hence the 0s).

Using these systems of course I get MSE values and have calculated a p value using a t test on the residuals.

I'm trying to publish a manuscript in medicine and the readers are rather fond of p values

past meteor
#

Okay, that's a very important detail.

#

I'd use a GLM then as well. I'd ask this question to a statistician as well, not people doing ML.

west cloak
#

I have a code

#

Can anyone look over it?

#

It is reinforcement learning and project is balls in bins

#

I am on b

brisk sage
#

Do you think a SVM would work here?

desert oar
#

p-values are problematic for several reasons, but i agree you are definitely looking for a more "statistical" approach

desert oar
#

so i suggest figuring out some kind of modeling strategy first, and then getting it to work in code

#

you'll have to do that iteratively. start with a modeling strategy, then get it working in code. if the model is no good, repeat. don't try to do both simultaneously if you aren't confident with the code.

#

in statistical inference it's usually good practice anyway to have a model in mind before trying to fit a model. otherwise you end up just digging around for results, and that's how you get spurious invalid non-replicable results

buoyant vine
#

How can I convert a pytorch tensor from 1D to 2D?

Effectively, I have a single record which is a 1D tensor, but the model expects a batch, making it 2D which is shape of (N, 25) so how can I convert my 1D to be effectively a [[1, 2, 3...]] instead of a [1, 2, 3...]

serene scaffold
#

Or .reshape(1, -1)

#

Negative one gets solved for whatever integer completes the product
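A quick illustration of how the `-1` is inferred, using NumPy as a stand-in so it runs without PyTorch. `torch.Tensor.reshape` takes the same arguments, and `tensor.unsqueeze(0)` is another way to add the batch dimension.

```python
import numpy as np

record = np.arange(25)              # shape (25,), a single record
batch = record.reshape(1, -1)       # -1 is inferred as 25, giving shape (1, 25)
also_batch = record[np.newaxis, :]  # same result by inserting a new leading axis

print(record.shape, batch.shape, also_batch.shape)
```

Only one dimension can be `-1`, since it is computed as the total number of elements divided by the product of the other dimensions.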

buoyant vine
#

ah

#

I see, let me try this

#

I had (1, 25) originally

#

ah, I think I might be dumb and forgetting my input is actually already 2D and it wants 3D

#

FacePalm I forget this is a GloVe model rather than 1D embeddings

serene scaffold
#

I'm on mobile or I'd explain better

buoyant vine
#

yeah, I got it working, thanks!

last lodge
#

Two questions:

  1. How can I add only the highlighted axis lines?
  2. How can I add padding to the axis labels?
#

found how to add padding, just need 1.

umbral charm
#
import numpy as np
import matplotlib.pyplot as plt
x, y, z = np.loadtxt(fname = 'data.csv', unpack = True, delimiter = ',', skiprows = 1) #load data
ax = plt.axes(projection='3d')
ax.scatter(x, y, z, cmap = 'turbo', s = 25, c = z, edgecolors = 'black') #creates a 3d scatter plot
plt.title('3d scatter plot of data.csv')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.colorbar()
plt.savefig('data.png', dpi = 300)
plt.show()
#

why won't my colour bar show

#

i get

#

raise RuntimeError('No mappable was found to use for colorbar '
RuntimeError: No mappable was found to use for colorbar creation. First define a mappable such as an image (with imshow) or a contour set (with contourf).
NVM I FIXED IT
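The fix was not posted, so the following is only a likely candidate: `plt.colorbar()` with no arguments looks for a "current" mappable, and calling `ax.scatter` directly on the axes does not register one. Keeping the object returned by `scatter` and passing it explicitly avoids the error. The random data stands in for `data.csv`, and the `Agg` backend and in-memory buffer are used only so the sketch runs without a display or writing files.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x, y, z = rng.normal(size=(3, 50))  # stand-in for the three columns of data.csv

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
points = ax.scatter(x, y, z, c=z, cmap="turbo", s=25, edgecolors="black")
colorbar = fig.colorbar(points, ax=ax)  # pass the scatter as the mappable
colorbar.set_label("z")

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")

buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # replace with a filename to write to disk
```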

brisk sage
agile owl
agile owl
#

like n~=500

#

but of course it all depends on the specific use case

#

I don't really have any motivation to use SVM's anymore though

#

they are slow and usually don't work well without tuning which wastes even more time

#

am I wrong for dismissing SVMs?

past meteor
#

Tuning SVMs is not hard, it's 2 hyperparameters

agile owl
#

it's not hard but it can be tedious

past meteor
#

How?

agile owl
#

i'm a perfectionist I guess

#

I am never satisfied and keep trying to tune

past meteor
#

Then you'd love them. Just make a grid of parameters on a log scale and search
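A sketch of such a log-scale grid for the two RBF-SVM hyperparameters, built with the standard library only. The exponent ranges are arbitrary assumptions; the two value lists could be handed to a grid-search tool such as scikit-learn's `GridSearchCV` as `{"C": C_values, "gamma": gamma_values}`.

```python
import itertools


def log_grid(low_exp, high_exp):
    """Powers of ten from 10**low_exp to 10**high_exp, inclusive."""
    return [10.0 ** e for e in range(low_exp, high_exp + 1)]


C_values = log_grid(-2, 3)      # 0.01 ... 1000
gamma_values = log_grid(-4, 1)  # 0.0001 ... 10

grid = [{"C": c, "gamma": g} for c, g in itertools.product(C_values, gamma_values)]
print(len(grid), grid[0], grid[-1])
```

A coarse grid like this is usually followed by a finer one around the best cell.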

agile owl
#

XD

past meteor
#

Grab coffee while it's running

agile owl
#

it's slow to iterate

past meteor
#

huh

#

ime they're fast

agile owl
#

I had a problem where an SVM took like 50x longer to fit than LightGBM and it performed way worse

#

and then to tune it on top of that?

past meteor
#

How much data did you have?

agile owl
#

it was wide data but n=~1000 or so

#

m ~= 100

past meteor
#

They're pretty much invariant to the amount of columns. At least, a lot more than other methods.

#

That's actually the main appeal of the method

#

As mentioned, the reason why people don't use them is that you need to compute the kernel matrix which is a N x N data structure

agile owl
#

my LightGBM estimates were actually correlated with the test data too and the SVM formed a cross on the scatterplot XD

#

I had never seen anything like that before

brisk sage
past meteor
#

Using 32-bit floating point you can compute the upper limit of how many data points you can use with SVMs based on your RAM, it's quite low
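The back-of-the-envelope calculation, assuming the full dense N x N kernel matrix is held in memory at 4 bytes per entry (real solvers may cache only part of it, so this is an upper-bound style estimate). The 16 GiB figure is just an example.

```python
import math


def kernel_matrix_bytes(n_samples, bytes_per_value=4):
    """Memory needed for a dense n x n kernel matrix."""
    return n_samples * n_samples * bytes_per_value


def max_samples(ram_bytes, bytes_per_value=4):
    """Largest n whose dense kernel matrix fits in ram_bytes."""
    return math.isqrt(ram_bytes // bytes_per_value)


print(kernel_matrix_bytes(100_000) / 1e9, "GB for 100k samples")
print(max_samples(16 * 1024**3), "samples fit in 16 GiB")
```

Memory grows with the square of the sample count, so doubling the data needs four times the RAM.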

agile owl
#

SVM y_hat vs y_test plot be looking like +

#

admittedly without tuning

#

but when I saw that I didn't have any motivation to tune it when the boosting model already was better

past meteor
#

Yeah, that's the issue 😩

agile owl
#

Is there a reason why you don't use LightGBM?

past meteor
#

I also use LightGBM 🙂 LightGBM, Xgboost, CatBoost, HistGradientBoosting, ...

agile owl
#

I found this new thing called Natural Gradient Boosting out of Stanford

#

I thought it was really interesting but the variance estimates aren't that great compared to dedicated variance models like GARCH

#

one of the appeals is it estimates mean and variance at the same time

#

these gifs sold me

#

but then it didn't seem to really be great on large n data anyway vs the others

#

also it performs worse on the scale parameter than machine learning algorithms regressing y^2 on lagged y

#

for time series data at least

#

I tried using it in the same way I would use LightGBM or a RandomForestRegressor and it did comparably but the allure was not having to have a separate model for the variance with this approach and that didn't pan out

#

(it drastically underestimated true variance)
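For reference, a minimal sketch of the "regress y^2 on lagged values" baseline mentioned above. It is not the setup used in the chat: the data is a simulated, hypothetical ARCH(1)-style series, the feature is the lagged squared value, and plain least squares stands in for whatever ML regressor you would actually use:

```py
import numpy as np

rng = np.random.default_rng(0)

# Simulate a hypothetical series whose variance depends on the previous squared value.
omega, alpha, n = 0.5, 0.3, 20_000
y = np.zeros(n)
for t in range(1, n):
    variance_t = omega + alpha * y[t - 1] ** 2
    y[t] = np.sqrt(variance_t) * rng.standard_normal()

# Regress y_t^2 on a constant and y_{t-1}^2 with ordinary least squares.
target = y[1:] ** 2
features = np.column_stack([np.ones(n - 1), y[:-1] ** 2])
(omega_hat, alpha_hat), *_ = np.linalg.lstsq(features, target, rcond=None)

# One-step-ahead variance forecast from the fitted regression.
next_variance = omega_hat + alpha_hat * y[-1] ** 2
print(omega_hat, alpha_hat, next_variance)
```

The fitted values of this regression are the variance estimates, which is what you would compare against the scale parameter predicted by a joint mean-and-variance model.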

lapis sequoia
#

is there a way to run quantization in transformers on an amd gpu

serene scaffold
lapis sequoia
#

AMD only has rocm

serene scaffold
#

Then no.

lapis sequoia
#

oh

#

would there be a way to quantize the model from, e.g google colab then download to my system?

serene scaffold
#

Yes

lapis sequoia
#

ah

agile owl
#

I am very disappointed that people don't demand an alternative to Nvidia

#

are we just going to keep lining Jensen's pockets forever

serene scaffold
#

There should definitely be more options, yeah

agile owl
#

I think Intel is ironically probably closer to supporting more GPU compute than AMD last I checked

#

idk what AMD is doing

lapis sequoia
#

yeah

#

only disadvantage of an amd gpu

#

the price and performance is amazing but no development stuff

agile owl
#

I think they overfocus on the budget and gaming markets

#

prob bc they figured there's no way to beat Nvidia at CUDA

lapis sequoia
#

yea lol

agile owl
#

but they seem to just chronically underinvest in ROCm

lapis sequoia
#

i think i might get intel next, oneapi seems cool

#

BROO my google colab environment reset

#

Now i have to upload llama 2 7b AGAIN

#

i hate this

agile owl
#

this is why I just bit the bullet and made my own ML workstation although being able to do that makes me a bit privileged

serene scaffold
lapis sequoia
#

i have a gpu

#

well it wont recognize it

serene scaffold
#

Does it have cuda

lapis sequoia
#

bruh

serene scaffold
#

Because if you don't have cuda, that's the same as not having a GPU as far as ML is concerned

lapis sequoia
#

apparently someone was able to get rocm working with transformers

#

let me see how

agile owl
#

there are SOME things that support Rocm

#

you can do SOME ML

serene scaffold
#

Hmm, like what?

agile owl
#

I used it years ago i don't remember

#

I did get frustrated though

serene scaffold
#

I'm not doubting you. This is just the first I've heard

agile owl
#

there were lots of caveats

lapis sequoia
#

ive used tensorflow with rocm before i think

agile owl
#

and only the most popular and basic algorithms had any support

lapis sequoia
#

wait but i also need my gpu to be recognized in wsl

agile owl
#

I had some surplus AMD gpus

#

and I wanted to see if I could use them in a machine of redundant parts

#

to get some value for ML purposes

#

the project was aborted when I realized how limiting ROCm is

#

but it's not zero

lapis sequoia
#

my gpu in wsl is a Microsoft Corporation Device 008e

#

so i need to make it recognize my amd gpu

agile owl
#

is using WSL worth it if you have a L1 hypervisor Linux VM or baremetal

#

I still haven't used it

lapis sequoia
#

ok well i finally have my llama 2 7b back

#

but i have 1gb left

#

ayyy its loading shards

#

i just hope it doesnt take disk

#

it works guys

agile owl
#

glad it worked out for you

lapis sequoia
#

thanks!

keen quartz
#

hey can anyone help me ?
need a database containing real resumes for a model, can't get any dataset for it. can you all upload some on a form that i will share?

iron basalt
# agile owl I think Intel is ironically probably closer to supporting more GPU compute than ...

This is not really AMD's fault in any way, you can do ML just fine on an AMD GPU. The issue is that everyone has locked into Nvidia by writing everything with CUDA. AMD has been working on conversion layers that let you effectively run CUDA on AMD GPUs. They do work, but do not have everything implemented. If the open source community wanted to, they could implement everything with OpenCL or Vulkan and then we would not be GPU vendor locked.

agile owl
#

AMD has NOT done the marketing or outreach

#

most people don't even know ROCm exists

#

it is totally their fault

#

Stelercus didn't even know it did anything at all

#

that's AMD's fault

iron basalt
#

(Until recently)

agile owl
#

you are blaming the coding community for not doing the thing but AMD didn't really try to make it happen either

#

they could have invested more into subverting NVIDIA's dominance but they shied away to focus on gaming and budget computers

iron basalt
agile owl
#

I think ROCm also made some weird design decision

#

where everything has to be an atomic operation

#

whereas in CUDA that isn't the case

#

at least that was what was going on last time I used it

iron basalt
#

Nvidia did give them more support, which was the whole plan. Nvidia knew that they had an opportunity here, and they took it.

iron basalt
agile owl
#

ROCm was very ambitious I think

iron basalt
#

(Then we can even do FPGAs)

agile owl
#

trying to make it so you can use an array of any kind of different GPUs

#

and to make that work they had to enforce atomicity

#

but CUDA code isn't written like that

#

so not only do they provide less resources but they raised the bar

#

so many problems with that project

iron basalt
#

ROCm has many problems, I don't think it should be used, but there are other options that work fine.

agile owl
#

to be clear I think ROCm's way is better in theory

#

but in practice it will never get there

#

like GNU Hurd

iron basalt
#

Yes, in theory, but it's AMD and AMD flops when it comes to SDKs.

#

(Even in gaming)

#

Nvidia does now have a lock-in monopoly on deep learning, but AMD also did not really care / cared too late. And Intel is just doing their thing, not sure what they are really going for.

#

Many ML libs used to have OpenCL actually for a while.

past meteor
#

I know ROCm exists and stel probably as well but the issue remains if you want to deal with that uncertainty of it being runnable or not on AMD

iron basalt
#

OpenCL would be really nice to have again, since it's any GPU, CPU, or even other things like FPGAs.

past meteor
#

If you get an NVIDIA card you know ahead of time it'll run

#

If you go down the AMD ROCm route sooner or later you'll hit a brick wall. This can be after 1 day or after 1 year.

wooden sail
#

intel has something similar to nvidia's chokehold through mkl

#

lots of computing software runs better on intel processors thanks to it

agile owl
#

Stel actually said he didn't know ROCm was usable at all

#

which to be fair is a close approximation

#

xD

iron basalt
#

The best options are those CUDA-ROCm layers, like the one torch has.

#

I forgot their acronym, it's part of that whole family of HPC stuff.

past meteor
#

I went for an AMD CPU with an NVIDIA gpu 🥴

iron basalt
#

(I remember getting an Nvidia GPU that said it supported OpenCL on the box! But it did not)

wooden sail
#

that would make things like matlab chug on amd processors

past meteor
#

Damn is that why I can't run avx on our servers

wooden sail
#

it could be

past meteor
#

I'll check tomorrow

wooden sail
#

you may need to replace mkl with openblas

agile owl
#

what happened to OpenCL

#

did everyone just abandon it to live in Jensen Huang's world

iron basalt
iron basalt
#

Just deep learning specifically uses CUDA for everything.

#

Because the popular frameworks are built on it.

agile owl
#

Honestly someone should just antitrust them

#

I'm sure they're lobbying against it hard

iron basalt
agile owl
#

it's not the size it's the lack of competition

#

you can be a huge conglomerate in every market and that's fine

#

it's when you maliciously take control of one market

#

like Intel and NVIDIA try to do

iron basalt
#

Yeah, the problem is that computer hardware is really hard to get into.

agile owl
#

and MSFT

iron basalt
#

To make HPC stuff.

wooden sail
#

intel arc to the rescue 😩

agile owl
#

I understand why that was the case in the past

iron basalt
#

And it does not help that nobody has interest in adopting anything else, the advice is still just "buy an Nvidia GPU."

agile owl
#

but we are in a new era

#

that demands more competition

#

this isn't bleeding edge anymore it's mainstream

#

the industrial organization has become a major hindrance

#

Nvidia just captures all the economic surplus

iron basalt
#

Computer hardware is very complicated, involves geopolitics.

#

(Since it directly translates to military power)

agile owl
#

they should simply force Nvidia to split its software division off from the hardware one

iron basalt
#

Probably, and split in other ways.

agile owl
#

I think it's amazing they got everyone worried about SkyNet instead of their massive monopoly you gotta wonder if that's a PR campaign

#

"don't look here, look THERE"

#

anyway this is why I've long been a skeptic of intellectual property

#

if everyone could freely use available knowledge we'd all be better off

echo mesa
#

Guys I've gone through the introduction to ml andrew ng course and have read the data science first principles with python book. What would the next step be? Should I learn more statistics and then read the statistical learning with python book or what should I do, to deepen my knowledge? Perhaps I should learn more mathematics overall?

agile owl
#

try to do projects and try to learn how to improve their results

#

it's always best if you have something to work on

iron basalt
past meteor
echo mesa
agile owl
#

having something to work on gives you material to self-direct your education in a way that is most meaningful to you, raises important questions that often won't arise in coursework, and gives you more satisfaction (at least in my experience) and therefore motivation

rugged comet
#

What are the tradeoffs of using a binary tree vs. an n-ary tree for decision tree classifiers?
Are most decision tree classifier binary trees?

wary vortex
#

Hello, how can I create a text to text chatbot using pytorch and a dataset consisting of questions and answers? The chatbot should respond to questions asked(it is going to be a mental help chatbot specifically). I am new to pytorch and I can not figure out how to do it.

agile owl
#

is there any problem that an n-ary tree can solve that a binary tree can't?

#

What if you have a problem where only one split actually increases information?

#

you have to be careful with mental health but I'm sure you know that

agile owl
#

It seems to me like the tradeoff depends on whether more than one split per node is worth it

#

and you will only know that for a specific problem

#

I don't think it will end up mattering

#

but I might be wrong

#

my intuition says that it will collapse into the same solution and may perform slightly better or worse

#

the tree will be more shallow

#

so if you would have otherwise run into the depth limit that might be avoided which will lead to more total calculations running

#

that's my best guess

#

I think it's a bit easier to reason about a binary tree
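A small sketch of why the two tend to collapse into the same solution: any multi-way split on one feature can be rewritten as nested binary splits, just one level deeper. The thresholds and function names here are hypothetical, for illustration only:

```py
def three_way_split(x: float, low: float, high: float) -> str:
    """One n-ary node: route x to one of three children in a single step."""
    if x < low:
        return "left"
    if x < high:
        return "middle"
    return "right"

def nested_binary_split(x: float, low: float, high: float) -> str:
    """The same partition built from two binary nodes, one level deeper."""
    if x < high:          # first binary node
        if x < low:       # second binary node
            return "left"
        return "middle"
    return "right"

samples = [-5.0, 0.0, 1.0, 2.5, 4.0, 10.0]
print([three_way_split(x, 1.0, 4.0) for x in samples])
print([nested_binary_split(x, 1.0, 4.0) for x in samples])
```

So the n-ary tree is shallower for the same partition, but the binary tree can express it as long as the depth limit allows the extra level.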

edgy frost
#

I'm currently fine-tuning/training a sentiment analysis model DistilBERT with a dataset with 60,000+ entries using K-Folds, is it normal for it to take a while? (i tried 36,000+ before it took 1 hr and 30 minutes for 1 epoch alone) Just using my personal computer to run the training code.

agile owl
#

does anyone know of a somewhat frequently updated retail commerce prices paid dataset that's available at a reasonable cost or free

#

I imagine that this data is extremely valuable so the answer is probably hell no

boreal nest
#

hello everyone, has anyone tried prefect-flow for etl pipelines?

odd meteor
edgy frost
agile owl
#

trying to do BERT without a GPU is a fool's errand

cold osprey
#

LOL

#

Just funny haha

#

Can use colab or kaggle free if the data fits in the free gpu

primal ice
#

Does anyone know if the chatgpt dan prompt still works?

past meteor
#

The nuance is that you want to train on GPU yes, but doing inference on CPU makes a lot of sense. BERT only needs 400 MB ram or so.
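A back-of-envelope check of that memory figure, assuming BERT-base has roughly 110 million parameters stored as float32. The parameter count is an assumption not stated in the chat, and activations and framework overhead are ignored:

```py
def weights_megabytes(num_parameters: int, bytes_per_parameter: int = 4) -> float:
    """Memory taken by the weights alone, in megabytes (10**6 bytes)."""
    return num_parameters * bytes_per_parameter / 1e6

bert_base_parameters = 110_000_000  # assumed approximate size of BERT-base
print(weights_megabytes(bert_base_parameters))                         # float32 weights
print(weights_megabytes(bert_base_parameters, bytes_per_parameter=1))  # 8-bit weights
```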

agile owl
#

he is training though

#

he's fine tuning

#

that's what I meant

past meteor
#

Alright! 😄 it's still a statement I'd try and nuance as much as possible. It's kind of important that especially beginners know CPU inference is possible with a decent latency

#

GPU stuff just costs so much more so it's a good one to know. Base Bert takes like sub 150ms but you can get it to sub 10 if you try hard.

cold osprey
#

Any example of something that outright won't work on CPU?

odd meteor
#

I think anything that works on GPU can as well work on CPU. The major question would be, at what computational cost?

Some task are better off done in GPU than CPU (and vice versa)

cold osprey
#

Ah ok

zealous tartan
wooden sail
#

the short answer is yes. the long answer is yeeeeeeeees. almost everything in ML and data science either straight up IS statistics, or involves it in some way. it's easier to pick up after calc and linalg though

left tartan
left tartan
gloomy parrot
#

Hello everyone, I'm currently using detectron2 for object detection, but I'm having a problem when it comes to predicting: it gives me a wrong prediction. How can I solve this?

lapis sequoia
#

is there a way to download a 4bit converted model from colab

lapis sequoia
#

when i try push_to_Hub (huggingface) it says "NotImplementedError: You are calling save_pretrained on a 4-bit converted model. This is currently not supported"

dense crane
#

what can be other ways to deal with that loss instead of changing the generator architecture?

#

i mean i will improve the generator but is there something else what can i do?

simple shuttle
#

Anyone has any suggestions on how to start learning machine learning please? Should I simply go to youtube ?

serene scaffold
simple shuttle
#

I think I am capable of the math required for machine learning as I did computational finance for my master degree

serene scaffold
simple shuttle
serene scaffold
#

okay, because I wouldn't expect that degree to include linear algebra. but I wouldn't know, either.

there are these three textbooks: #data-science-and-ml message

and then there are more suggestions on our website

#

!resources data science

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

simple shuttle
#

Thanks a lot man

#

I'll get right on that

odd meteor
echo mesa
#

Guys, besides gradient descent, what other methods are there to find the best fitting line for a data set? Why is gradient descent so popular, is it the "best" and most effective method?

#

Also is it any helpful that as I go thru different statistics concepts i would try to implement them in python?

lapis sequoia
#

C? Data Science? Why not?

echo mesa
lapis sequoia
#

No. I just wondered if that would just be complete nonsense.

#

like, can the economy support 40,000 devs? I do not know

lapis sequoia
echo mesa
lapis sequoia
#

No, just went to school. Uh, actually, I cannot remember the name of the youtube channel, but they wrote a book called something like 'Intro to statistical learning' or something. I read that entire text and watched all of their videos. Also, a couple of other good channels: Data School, StatQuest, Zedstatistics

#

That is all you need really

lapis sequoia
#

yes. Damn.

#

Mine was in R. Damn, that was my introduction to machine learning years ago

echo mesa
lapis sequoia
#

R is just so bad and I am not saying that because this is a python server

echo mesa
lapis sequoia
#

user-defined functions in R are just the worst

#

Youll be fine then

echo mesa
#

I mean its kinda general to say this, but in the book "data science first principles in python" he said that he doesnt like r and he would rather focus on python

lapis sequoia
#

That text just changed my life in such a drastic way, That hit me hard.

echo mesa
willow sorrel
#

hey theree guys i need a little help around computer vision for a hackathon, its probably a basic thing to work around with but i got no experience with computer vision or such libraries, i'd be really grateful if someone can help me around it. its basically detecting the gun at the first stage, at the second stage detect the gun and the person holding it and capture the person, in the third stage we need to detect if the person is holding it in a stance of firing or holding it neutrally, im done with the first stage of the problem cant figure anything from there on, i'd be really grateful if anyone can guide me towards the second stage atleast

lapis sequoia
#

I meant it in a good way

odd meteor
odd meteor
# echo mesa Also is it any helpful that as I go thru different statistics concepts i would t...

Most of those statistical concepts are already implemented in a lot of libraries like SciPy, statsmodels, sklearn, PyTorch, TensorFlow etc... So yeah it's cool to implement them in Python.

You might find the latest edition of ISL very useful since its last edition is in Python.

I guess my only let down is that, it doesn't cover conformal prediction yet. Hopefully they'll add a chapter on Conformal Prediction in their subsequent edition.

The free pdf Version of ISL can be downloaded via https://www.statlearning.com/

odd meteor
lapis sequoia
#

I don't know, I used to use R a lot but learned more in python because there was a snobby Data Science community on a different discord server that just spit on R

lapis sequoia
#

Reading that text and knowing everything they are talking about makes me want to cry. Like, I don't know. The hardest course I ever took was an optimization grad class which was so hard that I suggest that no one does it. I don't, lol. I was shooting heroin two years ago for five years. I was sick of that life and literally replaced it with DS when I decided to go to grad school and get my masters. I was so inspired by that R version of that text. It is hard to explain how much that means to me. Really nothing means more to me than that. And the person who introduced me to this. Your mind is powerful and you can do whatever you want if your conviction is true. No one has ever failed when they genuinely tried. Sorry for that long text, just had to say that. You can do whatever you want and your mind is reality.

past meteor
# echo mesa Guys, besides gradient descent, what other methods are there to find the best fi...

The reason why gradient descent works so well is that people prioritize loss functions that are convex, basically U-shaped. If the optimization surface has this shape, you can easily use gradients to iteratively move to the "bottom" of the U, where the derivative is 0. That's the idea.

Now, there are equivalences between doing this and maximizing the likelihood (in practice, minimizing the negative log-likelihood). Maximizing the likelihood is basically choosing the weights such that P(y|X) is as high as possible. In words: "the probability of observing the target variable given the data is as high as possible". As emyrs said, there is also an equivalence between gradient descent, maximum likelihood and certain matrix decomposition methods that come out of linear algebra. OLS, for instance, gives you a solution that 1) maximizes the likelihood (assuming Gaussian noise) and 2) has a gradient of 0, and it comes with a closed-form solution.

Last but not least, there are fun touchpoints with computer science. (Stochastic) gradient descent is very efficient in that it doesn't use a lot of memory, so it scales well to large datasets. If your dataset is small you can use second derivatives, conjugate gradient, BFGS and so on. Matrix decomposition also uses way more memory and thus doesn't scale as well to large datasets. Also good to know: algorithms like SVM have a "problem" that is a constrained optimization (still convex), so they use more exotic things like quadratic programming.
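A minimal sketch of that equivalence for a best-fit line, on made-up data: the closed-form least-squares solution (computed through a matrix decomposition inside `np.linalg.lstsq`) and plain gradient descent on the mean squared error end up at the same weights. The data, learning rate and step count are arbitrary choices for illustration:

```py
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1.0, 1.0, size=n)
y = 2.0 * x + 0.5 + 0.1 * rng.normal(size=n)

# Design matrix: one column for the slope, one column of ones for the intercept.
X = np.column_stack([x, np.ones(n)])

# Closed-form least squares.
w_closed, *_ = np.linalg.lstsq(X, y, rcond=None)

# Plain (full-batch) gradient descent on the mean squared error.
w_gd = np.zeros(2)
learning_rate = 0.1
for _ in range(5000):
    residual = X @ w_gd - y
    gradient = (2.0 / n) * (X.T @ residual)
    w_gd -= learning_rate * gradient

print("closed form:     ", w_closed)
print("gradient descent:", w_gd)
```

The closed form needs the whole design matrix at once, while the gradient step only needs the rows it is currently looking at, which is why the stochastic variant scales to datasets that do not fit in memory.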

echo mesa
wooden sail
#

i would round this out by mentioning that only linear least squares is this nicely behaved

#

if you formulate the maximum likelihood problem using a neural network, for example, the cost function is not convex. in these cases, the solution you get from gradient descent depends on how close you were to a particular local minimum or saddle point
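A toy illustration of that dependence on the starting point, using a hypothetical one-dimensional non-convex function rather than an actual neural network. The same gradient descent routine lands in different minima depending on where it starts:

```py
def f(x: float) -> float:
    # toy non-convex function with two minima of different depth
    return x**4 - 3 * x**2 + x

def gradient(x: float) -> float:
    # derivative of f
    return 4 * x**3 - 6 * x + 1

def gradient_descent(start: float, learning_rate: float = 0.01, steps: int = 2000) -> float:
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x

left = gradient_descent(-2.0)
right = gradient_descent(2.0)
print(left, f(left))
print(right, f(right))
```

Both runs stop where the gradient is zero, but the run started at 2.0 ends in the shallower of the two minima.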

oblique quarry
#

does anyone know why I'm getting negative eigenvalues? ```py
import numpy as np

def sqrtOfMatrix(data):
    # eigh assumes a symmetric matrix and returns eigenvalues in ascending order
    eigenValues, eigenVectors = np.linalg.eigh(data)
    assert (eigenValues >= 0).all(), "Matrix should be positive semi-definite"
    # V @ diag(sqrt(lambda)) @ V.T (the last factor needs the transpose)
    return (eigenVectors * np.sqrt(eigenValues)) @ eigenVectors.T

matrix = np.random.randn(5,5)
matrix = matrix - matrix.mean(axis=0, keepdims=True)
covMatrix = matrix.T.dot(matrix)
sqrtMatrix = sqrtOfMatrix(covMatrix)```

this book, these matrices, however, will be assumed to be positive definite. In view of this
assumption, these matrices will also admit their respective inverses."
wooden sail
#

the original matrix you made has no special properties other than having only nonnegative entries

viscid socket
#

hey can anyone give me some nlp project ideas?

oblique quarry
# wooden sail the original matrix you made has no special properties other than having only no...

Thank you, changed the code accordingly but it still produces the same error. I assume that it has to be numerical imprecision. I assume that this is because the majority of the variance is explained by the first principal component, so that eventually the magnitude represented by the remaining eigenvalues becomes so small that it is virtually zero. Maybe adding a regularization term will do?

#

yeah it seemed to have worked, but if there's a better way than just outright increasing the determinant, let me know ```py
import numpy as np

def sqrtOfMatrix(data):
    # small diagonal loading so round-off-sized negative eigenvalues become positive
    # (builds a new array instead of modifying the caller's matrix in place)
    data = data + np.eye(len(data)) * 1e-12
    eigenValues, eigenVectors = np.linalg.eigh(data)
    assert (eigenValues >= 0).all(), "Matrix should be positive semi-definite"
    # V @ diag(sqrt(lambda)) @ V.T (the last factor needs the transpose)
    return (eigenVectors * np.sqrt(eigenValues)) @ eigenVectors.T

matrix = np.random.randn(5,5)
matrix = matrix - matrix.mean(axis=0, keepdims=True)
covMatrix = matrix.T.dot(matrix)
sqrtMatrix = sqrtOfMatrix(covMatrix)```
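One possible alternative to loading the diagonal, sketched under the assumption that the negative eigenvalue is pure round-off: subtracting the column means from a 5 x 5 matrix makes every column sum to zero, so the matrix has rank at most 4 and one eigenvalue of its covariance is exactly zero in exact arithmetic. You can then clip tiny negatives to zero and reject anything genuinely negative. The function name and tolerance are hypothetical choices:

```py
import numpy as np

def sqrt_psd(matrix: np.ndarray, tol: float = 1e-10) -> np.ndarray:
    """Symmetric square root of a PSD matrix, tolerating round-off-sized negative eigenvalues."""
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)
    # Accept only negatives that are tiny relative to the largest eigenvalue.
    floor = -tol * max(eigenvalues.max(), 1.0)
    if eigenvalues.min() < floor:
        raise ValueError("matrix has a genuinely negative eigenvalue")
    clipped = np.clip(eigenvalues, 0.0, None)
    return (eigenvectors * np.sqrt(clipped)) @ eigenvectors.T

rng = np.random.default_rng(0)
data = rng.standard_normal((5, 5))
data = data - data.mean(axis=0, keepdims=True)
cov = data.T @ data

root = sqrt_psd(cov)
print(np.allclose(root @ root, cov))
```

Unlike diagonal loading, this leaves the other eigenvalues untouched, at the cost of returning a singular square root when the input is rank deficient.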

wooden sail
#

random matrices should have exponentially decaying eigenvalues, off the top of my head. there should be some papers discussing this... at least for the case of matrices with gaussian entries. then when squaring, you get a rather poor condition number unless you load the main diagonal

oblique quarry
#

[-1.84880995e-15 1.78080773e-01 1.89576151e+00 4.99760224e+00
1.34696135e+01]

oblique quarry
wooden sail
oblique quarry
#

👍

wooden sail
#

the high level idea being that, even in the best case (depending on the distribution), you will only get a diagonal covariance for infinitely long vectors or with infinitely many realizations averaged out. in the finite case this means the vectors in the matrix are not orthogonal, which will have an impact on the eigenvalues

oblique quarry
#

Much appreciated

buoyant vine
#

\o/ For the first time, I have actually connected a training dashboard to my AI runs so I can actually see what the model is doing, neptune is cool, but i'm wondering if MLflow is a better cheaper alternative

buoyant vine
#

Has anyone used the XLM-R BERT model before, btw, on larger datasets (800k+ points)? My loss seems to be higher than I expected and it doesn't really seem to be decreasing, and I can't quite work out if I should stop it or not...

It has early termination setup, but you can see the loss is changing quite aggressively

#

I think part of the reason might be the dataset itself isn't shuffled (which maybe I should do 😅 )

#

so very similar pieces of text are already likely part of the same batch
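A toy sketch of that effect with made-up data: when samples are stored grouped by topic, each batch sees only one topic, while shuffling the order before batching mixes them. In PyTorch the usual way to get this is `shuffle=True` on the `DataLoader`; the plain-Python version below is only for illustration:

```py
import random

# Hypothetical toy data: 12 samples stored grouped by topic, as in an unshuffled dump.
topics = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
batch_size = 4

def make_batches(items, size):
    """Split a list into consecutive batches of the given size."""
    return [items[i:i + size] for i in range(0, len(items), size)]

unshuffled_batches = make_batches(topics, batch_size)

shuffled_topics = topics[:]
random.Random(0).shuffle(shuffled_topics)  # in training you would reshuffle every epoch
shuffled_batches = make_batches(shuffled_topics, batch_size)

print([sorted(set(batch)) for batch in unshuffled_batches])
print([sorted(set(batch)) for batch in shuffled_batches])
```

With grouped data every gradient step is computed from a single topic, which can make the loss jump around as the run moves from one group to the next.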

narrow tiger
#

any free gpt like repo i can clone and use for basic chating like reply with true or false
"sun sets in west"
"is message "xxxxxx" a spam"

buoyant vine
#

requires a reasonably decent machine to run the smallest models still tho

narrow tiger
#

found this they say u can use local host running a model as replacment for openai API too

buoyant vine
#

yes there are lots of alternatives mostly build on llama

#

that being said, you need a very big machine to run the bigger models

#

and realistically its the biggest models which are the ones more comparable to openai

narrow tiger
#

thanks

#

this is insane literally free

#

also saw a video/image generative model

narrow tiger
#

is llama2 different then ollama?

buoyant vine
#

ollama is a tool for running llama and other llms

#

Install ollama then you can do ollama run llama2

#

which will run the 7B param model

narrow tiger
#

thanks

narrow tiger
#

it's too frank 🤣

#

is it supposed to be this frank out of the box lol?

#

ok is codellama better? is it almost as good as phind or chatgpt?

#

what are some cool things you guys are building/testing using these models?

echo mesa
#

Guys, I'm about to read the statistical learning with applications in python book, I'm just wondering whether I should read a book about just statistics first so that I can understand the concepts in it? Or would it make sense to just read about general statistics and implement the concepts in python and then move on to the statistical learning book?