#data-science-and-ml
why does every subdomain use the same basic terminology to mean different things
any form of stochastic grad desc is data driven, regardless of what you're optimizing
yeah it's dumb shit
i kinda trust this paper cuz this yonina eldar is a well known mathematician
"Perceptron" is a specific thing though, you can't change its activation function. #data-science-and-ml message
They have model-based on a spectrum with data-driven on opposite ends
i'm not sure i fully agree with that
I think they use "model" in the same way as reinforcement learning uses "model" (the MPC-kind of "model")
Do you mean you disagree with yonina eldar?
(I don't know who they are)
lemme go check smh
I lifted this from the paper
oh i totally missed it said perceptron
i've been bamboozled
i mean, in the text it says "purely data-driven" which is different from how it's shown in the figure
You might find "gaussian perceptron" but that is abusing the naming.
cuz you can have model-based and data-driven at the same time
That's like calling my layer with a non-linear activation function a "linear nonlinear layer."
hybrid model-based/data-driven systems...
the "model-based" part refers to architecture, while "data-driven" refers to how the parameters of the architecture are learned
they're letting people off easy by not calling it black-box + data-driven
Yes
I agree with this. The "fully" model driven (whatever that is) still needs to estimate a handful of parameters
But what they can do is way more constrained
yeah. and you can either do that from data stochastically or analytically
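To make the "stochastically or analytically" point concrete, here is a minimal sketch that estimates the single slope of a toy model `y = a * x` two ways: with the closed-form least-squares solution and with plain SGD. The data, the learning rate, and the function names are all made up for illustration.

```python
import random

def fit_analytic(xs, ys):
    """Closed-form least-squares slope for y = a * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def fit_sgd(xs, ys, lr=0.01, epochs=200, seed=0):
    """The same slope, estimated by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    a = 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)                      # visit samples in random order
        for i in order:
            grad = 2 * (a * xs[i] - ys[i]) * xs[i]  # d/da of (a*x - y)^2
            a -= lr * grad
    return a

# Toy data generated from a known slope of 3 plus a little noise
rng = random.Random(42)
xs = [i / 10 for i in range(1, 21)]
ys = [3.0 * x + rng.gauss(0, 0.05) for x in xs]

a_closed = fit_analytic(xs, ys)
a_sgd = fit_sgd(xs, ys)
print(a_closed, a_sgd)
```

Both routes are data-driven in the sense discussed above; they only differ in how the parameter is obtained from the data.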
right, the one model will only work for a specific type of problem, usually
it can't adapt by just changing the parameters
The predictions are as good as the chosen model
So if the model used in the domain is already a massive oversimplification of reality
Like the ones that try to model human stuff in silico
that's one take on what model means though, not the only one
the other one is to take an optimizer which has convergence guarantees but involves expensive steps
then replace those expensive steps with a black-box network
this is independent of whether what you're "modelling" is modelled with a network or not
Ah yes, okay this makes sense
So you're not using it as an emulator - you're basically using it to converge your expensive model
Correct?
you'll see a lot of ADMM on crack done this way
kinda, yeah
a good combo of these things is to take a network that learns a model for something complicated, and its parameters are learned by grabbing a well established optimization routine and intertwining it with the network
these usually end up somewhat like autoencoders
like input -> modelling network -> output -> optimization routine turned into network -> the parameters we care about
and you learn the parameters for the forward and inverse networks together using some fancy cost function, e.g. possibly enforcing the model network to solve a differential equation as part of its cost instead of only fitting data
not off the top of my head, but many recent papers solving inverse problems with physics-informed neural networks should be doing something similar
Yeah, columns like zip code make that tough.
I am writing a DecisionTreeClassifier from scratch for fun and to help me understand the algorithm better.
How are model weights/decision nodes saved within a DecisionTreeClassifier object? I'm thinking about perhaps creating a new class for the nodes similar to the Tree data structure.
I think that's how they might do it.
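For the from-scratch version, a separate node class is a reasonable way to go. Below is a minimal sketch of what that could look like; the `Node` fields and `predict_one` helper are hypothetical names, not scikit-learn's API. As far as I know, scikit-learn itself does not use node objects: the fitted estimator keeps the tree in parallel arrays on its `tree_` attribute (e.g. `children_left`, `children_right`, `feature`, `threshold`, `value`).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """One node of a binary decision tree (hypothetical from-scratch layout)."""
    feature: Optional[int] = None       # index of the feature to split on
    threshold: Optional[float] = None   # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prediction: Optional[int] = None    # class label, set only on leaves

    def is_leaf(self) -> bool:
        return self.prediction is not None

def predict_one(node: Node, x) -> int:
    """Walk from the root down to a leaf and return that leaf's class label."""
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction

# A hand-built stump: split on feature 0 at 2.5
root = Node(feature=0, threshold=2.5,
            left=Node(prediction=0), right=Node(prediction=1))
print(predict_one(root, [1.0]), predict_one(root, [3.0]))
```

The "model weights" of the classifier are then just the `feature`/`threshold` pairs on the internal nodes plus the labels on the leaves.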
print("Hi")
Actually, I'm just confused about how to ask the question
Hello everyone! Is there any experienced python webscraper around who's got 5 minutes for a few questions from a noob??? :$$$
This channel isn't about web scraping. So please open a thread in #1035199133436354600. And give enough information that people can start answering your question
Just say as much as you can about what you're trying to do
Done, thank you!
I want to make modeling using the KNN method but I can't understand the material yet. Is there anyone who wants to give me material or explain the material to me?
Try reading a different guide or watching a different video, and use this channel if you have a specific question. If someone writes up an explanation for you, it won't be fundamentally different from explanations you can already find online
But if you already have a more specific question than that that you can put words to, please go ahead.
I am actually looking for a team to participate in ML hackathons
I have worked on LLMs, autogen, diffusion models, VAEs
If anyone is interested pls let me know.
@iron basalt I am now reading conscious MIND Resonant BRAIN, but I am coming across a lot of tiny mistakes (referencing the wrong images, mixing up rows/columns, saying someone lived from 1869 until 1854 etc.). Am I reading an old version or something, or is this something you found as well? (asking you because you recommended this book earlier, the book is really interesting so far)
I have a question on classifiers and the bagging method. What classifiers gain from this method?
Hello, I am creating a sentiment analyzer with Python and keras vanilla RNN, the dataset consists of two columns sentence and label (Positive and negative), I tokenize these sentences, eliminate stop words and convert them to a number, currently the accuracy of my model is 52% How can I improve this?
Here is my model definition
!code
def vainilla_rnn():
    # Embedding -> single SimpleRNN layer -> dense classification head
    model = Sequential()
    model.add(Embedding(vocab_size, 200, input_length=maxlen))
    model.add(SimpleRNN(200, return_sequences=False))
    model.add(Dense(num_classes))
    # softmax (not sigmoid) so the outputs form a distribution over classes,
    # which is what sparse_categorical_crossentropy expects
    model.add(Activation('softmax'))
    model.summary()
    # pass the configured optimizer object so the learning rate is actually used
    adam = optimizers.Adam(learning_rate=0.001)
    model.compile(loss='sparse_categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
    return model
Here is the model definition
how many epochs did you run?
10 but If I increment the number of epochs I don't get a big increase in the performance
For example I tried with 30 and the result was similar
unless this is a known data set with known benchmark accuracy with various model types, there's always the possibility that the data are not clearly separated by sentiment. where are you getting the sentiment labels from?
and as a baseline, did you try something like logistic regression with tfidf features, or hashed features, or a simpler word embedding model like cbow/skipgram ?
It has several mistakes and could use a new edition. It also tends to go all over the place at times.
also how big is the dataset? maybe you don't have enough data to learn a useful embedding space, maybe you want to consider pre-trained word vectors from a bigger model & data set
Exactly haha. Really interesting book, but kinda hard to follow at certain points.
Grossberg is known for having confusing presentations too.
He talks about a few presentations on yt, probably going to watch one to get an idea
you're asking what's the benefit of using bagging? are you familiar with the more general concept of bootstrapping in statistics? bagging is more or less just bootstrapping for a predictive model
"bagging" is just a cute abbreviation of "bootstrap aggregating"
This example illustrates and compares the bias-variance decomposition of the expected mean squared error of a single estimator against a bagging ensemble. In regression, the expected mean squared e...
Ensemble methods combine the predictions of several base estimators built with a given learning algorithm in order to improve generalizability / robustness over a single estimator. Two very famous ...
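As a rough illustration of "bootstrap aggregating" for a classifier: the sketch below resamples the training data with replacement, fits one deliberately high-variance model (1-nearest-neighbour) per resample, and takes a majority vote. The toy data and all function names are assumptions for the example; in practice scikit-learn's ensemble module would do this for you.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) points with replacement."""
    return [rng.choice(data) for _ in data]

def fit_1nn(train):
    """Return a 1-nearest-neighbour classifier over (x, label) pairs."""
    def predict(x):
        nearest = min(train, key=lambda point: abs(point[0] - x))
        return nearest[1]
    return predict

def bagged_predict(data, x, n_models=25, seed=0):
    """Fit one model per bootstrap resample and return the majority vote."""
    rng = random.Random(seed)
    models = [fit_1nn(bootstrap_sample(data, rng)) for _ in range(n_models)]
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Toy 1-D data: label 0 on the left half, label 1 on the right half
data = [(0.0, 0), (0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1), (1.0, 1)]
print(bagged_predict(data, 0.05))
```

What the classifier gains is mostly lower variance: each individual model sees a slightly different dataset, and averaging their votes smooths out the quirks of any single fit.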
The dataset used is "Sentiment Labelled Sentences Dataset", from the UC Irvine Machine Learning Repository.
The sentences come from three different websites/fields:
amazon.com
imdb.com
yelp.com
https://archive.ics.uci.edu/dataset/331/sentiment+labelled+sentences
I also implemented a Dummy Classifier and it got an accuracy of 51%
what's the right way to find a model that fits a time series best on average across all segments if you split it up into, say 20ths
if I fit it overall there are certain periods that dominate and lead it to performing quite poorly in others and I want it to perform more consistently over different time periods
I started by just leaving out 10% of the data as test but nothing fits both train and test that well doing it the way I'm doing it
I wasn't just going to do average but average/std because just average would lead to the same result as fitting it the way I'm doing now I believe
I want a model to consistently perform well across all the segmentations without being necessarily optimal for any one of them
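One way to express the "average/std across segments" idea as a score is sketched below. It assumes you already have some per-time-step score for a candidate model (higher is better); `split_segments` and `consistency_score` are hypothetical helper names.

```python
from statistics import mean, pstdev

def split_segments(values, n_segments):
    """Split a sequence into n contiguous, near-equal segments (order preserved)."""
    size, extra = divmod(len(values), n_segments)
    segments, start = [], 0
    for i in range(n_segments):
        end = start + size + (1 if i < extra else 0)
        segments.append(values[start:end])
        start = end
    return segments

def consistency_score(per_step_scores, n_segments=20):
    """Mean of the segment scores divided by their spread (higher = good and stable)."""
    seg_scores = [mean(seg) for seg in split_segments(per_step_scores, n_segments)]
    spread = pstdev(seg_scores)
    return mean(seg_scores) / spread if spread > 0 else float("inf")

# Two candidates with the same overall average score of 1.1
steady = [1.0, 1.2] * 50           # similar in every segment
spiky = [0.0] * 95 + [22.0] * 5    # everything comes from one short period
print(consistency_score(steady), consistency_score(spiky))
```

Both toy candidates have the same plain average, so fitting on the average alone could not tell them apart; the ratio prefers the steady one.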
hi guys, i'm new to ml. can you tell me the difference between weights, parameters and hyperp... in simple words? the internet is just making me more confused...
weights are endogenously fitted by the model
hyperparameters are exogenous to the model
parameters I believe can refer to both
weights = parameters = coefficients
hyperparameters are the "settings" of your model.
if my question isn't clear happy to clarify btw
you know wannabe linkedin influencers who post bullshit programming suggestions?
I saw one who referred to keyword arguments of a pandas method as hyperparameters 
that's just ... I have no words
I like the ones who post poll questions about Python or ML and their answers are WRONG
I was trying to find an emoji that expressed my feelings.
Time series cross validation exists
I have a lot of ppl in my linkedin feeds posting "Python" or "ML" quizzes and their questions are either totally impractical or the answers are flat out wrong
Examples using sklearn.model_selection.TimeSeriesSplit: Time-related feature engineering L1-based models for Sparse Signals Visualizing cross-validation behavior in scikit-learn
@past meteor right, so this is making a bunch of train and test sets
where the test set for each train set comes after the train set
i still don't get it sorry help me init
I guess it's better than before but I was worried that you might get different results just depending on how you split it up
so I wanted the training data to contain everything before a certain date
but control for performance on different splits of the train data
I guess I'll try this approach
and see how it works
yes, time series split does this
thx will look into it more
In reality what you do is this:
Decide how large your final hold-out set is in %
Find the date that aligns with this %
Split everything before this date into the training set. Everything after goes into the hold-out (validation) set.
Use time series cross-validation on the training set to make N train/test folds, where each training fold only contains data from before the test fold it's evaluated on.
Do 1 final evaluation on the hold-out (validation) set
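The steps above can be sketched with plain index arithmetic. This is a hand-rolled illustration and not scikit-learn's implementation, although `expanding_window_splits` is meant to behave like `TimeSeriesSplit` with default settings; the function names and the 100-sample size are assumptions.

```python
def holdout_split(n_samples, holdout_frac):
    """Earlier indices are for model development; the last chunk is the final hold-out."""
    cutoff = n_samples - round(n_samples * holdout_frac)
    return list(range(cutoff)), list(range(cutoff, n_samples))

def expanding_window_splits(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs where each test block follows its training data."""
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    for start in range(first_test_start, n_samples, test_size):
        yield list(range(start)), list(range(start, start + test_size))

# 100 time-ordered samples, last 20% kept aside for the single final evaluation
dev_idx, final_idx = holdout_split(100, 0.2)
folds = list(expanding_window_splits(len(dev_idx), n_splits=4))
for train_idx, test_idx in folds:
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

Model selection only ever looks at `folds`; `final_idx` is touched once at the very end.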
gotcha
parameters = "the model", hyper parameters = settings used to create the model
there are some numerical values or boolean values you might need to provide to the model
so it performs a certain way when doing its fitting to the data
that's another way to think about it
the model can't generate it itself because it's an assumption
you can also think of hyperparameters as the parameters of the parameters
should have been called metaparameters
yes
then facebook will say that only they can do ml
hah
@past meteor thanks
hi guys, are there any great ways to improve my CNN model's performance?
i have already done data augmentation, regularisation
one common example is like, if you want to regularize the coefficients of a model (push them closer to zero to avoid overfitting) the penalty you apply to the sum of the weights is a hyperparameter
but my validation accuracy just can't seem to increase
always around the 0.72 - 0.77 range
technically the sum of the abs of the weights or the square of the weights
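A tiny worked example of that distinction, assuming a one-weight model `y = w * x` with a squared (L2) penalty: the penalty strength is chosen by us (hyperparameter), the weight is what gets fitted (parameter). The data and the name `ridge_slope` are made up.

```python
def ridge_slope(xs, ys, penalty):
    """Closed-form minimiser of sum((w*x - y)^2) + penalty * w^2 for a single weight w."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# Same data, different hyperparameter settings -> different fitted weight
weights = {penalty: ridge_slope(xs, ys, penalty) for penalty in (0.0, 1.0, 10.0, 100.0)}
for penalty, w in weights.items():
    print(f"penalty={penalty:>5}: w={w:.3f}")
```

As the penalty grows, the fitted weight is pushed closer to zero, which is the regularisation effect described above.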
there's no way to answer that question in general. you have to say what the model does, how you're currently going about training it, how it's currently performing, and some high-level properties of your training data. without at least all that information, people can only give random suggestions that might be useless.
!paste please never show code as text
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.
ah alright
Musings of a Computer Scientist.
model_6 = models.Sequential()
model_6.add(layers.Conv2D(58, (3, 3), activation='relu',
                          input_shape=(img_size, img_size, 3)))  # 3 colours rgb
model_6.add(layers.MaxPooling2D((2, 2)))
model_6.add(layers.Conv2D(116, (3, 3), kernel_regularizer=regularizers.l2(0.0001), activation='relu'))
model_6.add(layers.MaxPooling2D((2, 2)))
model_6.add(layers.Conv2D(232, (3, 3), kernel_regularizer=regularizers.l2(0.0001), activation='relu'))
model_6.add(layers.MaxPooling2D((2, 2)))
model_6.add(layers.Conv2D(232, (3, 3), kernel_regularizer=regularizers.l2(0.0001), activation='relu'))
model_6.add(layers.MaxPooling2D((2, 2)))
model_6.add(layers.Dropout(0.5))
model_6.add(layers.Flatten())
model_6.add(layers.Dropout(0.5))
model_6.add(layers.Dense(10, activation='softmax'))
model_6.compile(loss='categorical_crossentropy',
                optimizer=optimizers.Adam(learning_rate=5e-4),
                metrics=['acc'])
model_6.summary()
current number of parameters is 903070
the current results
The link I sent you has good tips
I could tell you what I do but it's more or less what's in the link
mmm i see
should i try to make my model more complex till it reaches an accuracy of above 0.9, then start to regularise?
seems that training is starting to show slower learning as the number of epochs increases
it's all in the link
Start by looking at the mistakes you're making
@past meteor "Note that unlike standard cross-validation methods, successive training sets are supersets of those that come before them." Does this not bias the weights towards fitting the starting period the best since it is included the most times?
when I was doing comp vision in the past I noticed cases where the model was failing and I was like "damn I couldn't get that right either"
Oh that's an interesting remark
I don't think there's anything you can do about this
I guess part of my problem is also that the overfitting is being done by DEAP so I'm not sure how to reincorporate the validation results into the parameters explicitly
but I can work on that
I wanted to see if you can use this for online training
but wasn't able to find anything
Why DEAP?
because it's for trading strategies and I didn't want to make any assumptions that are necessary for likelihood based methods
to be fair part of it was just that I learned about it and thought it was exciting and wanted to explore this approach too
not necessarily extremely deliberate
but I know other people do this too so there's probably some basis
Evolutionary algorithms are typically a waste of compute imho
why do you say that
Because they are black-box optimization methods that use little to no information of the problem being solved
it also avoids false precision from false assumptions though
my thought process behind it is that a lot of people are making unjustified assumptions
the fitness function is much easier to think about though
If you have a fitness function that is say the MSE you've devolved into just maximum likelihood
just do something like total profit * (total_profit/max_drawdown) or something like that you know what you want to get out of it
the AI may as well still learn a bunch of unjustified assumptions that only apply to the training data and will not transfer at all to the real world
I did a full course on EA's in uni. Fun stuff, really enjoyed it. Waste of compute for most problems.
where would you say they are best suited then
- Multi objective optimization
- Combinatorial optimization
So actually I'd say: use it
It's fun to play around with because then you'll see the limitations more clearly after a while
one way or the other, just remember that you'll be competing with a few billion dollar companies if you are thinking about algorithmic trading.
Be fully prepared to lose any and all money you throw into it.
Guys, do you guys know a good paper that explains linear regression from scratch both mathematically and having examples in python with it?
It sounds like you want a textbook, not a paper. Look up Introduction to Statistical Learning with Python, it should be freely available on the author's site as PDF (although I personally enjoy having hardcopy books when I can)
gotcha, thanks
I've worked for billion dollar companies and I see the opportunity for disruption
lots of what they do isn't actually that advanced at all
no need :P Just need to be faster than you
well exactly
I'm not playing that game I'm not trying to be an algo market maker
I'm going for superior signals
Often they simply can afford what you cannot, and since most of the time they're doing HFT and are as close to the exchange as possible they just snuff out the little guys
it's not HFT to be clear
I know that is not a space you want to compete with bigger players
momentum and value exist on different timescales though
I've been doing strategies based on analyzing the degree of mean reversion via hurst exponents
and using deap to fit parameters around how these things are calculated, windows, etc.
cutoff levels for different behaviors
like if it's trending then go with the trend unless the trend coefficient goes above a certain number then bet against the trend continuing
etc.
It wasn't clear to me that there's a framework that is meant for fitting parameters in this way which is why I went for DEAP because it doesn't require hardly any assumptions
you just need to give it the fitness function
i have a macbook pro 13 inch M2 with 8GB memory how can i run LLMs like 66B parameters or 100B parameters on it is there a way i can do so on cloud like aws or colab pro and are they really worth and can run such big models??
just keep in mind you're not the only person doing this, certainly not the first to have this idea
the reason why trading strategies remain profitable over time is this: constrained behavior
lots of the biggest players have constrained or dumb behavior due to operational constraints
so you are siphoning money off of mutual funds and pensions and insurance companies that are not as concerned about short term profits
and do suboptimal behaviors
here's an example
the Goldman Sachs Commodities Index
right. if you have a model that seems to work, go for it. just remember that backtesting only gets you so far. it sounds like you have more domain knowledge than the usual person who wanders in here asking about algo trading bots, so maybe you know what you're doing
I have decent knowledge of markets I could get better at algos though
but then i'm sure you know that you're not the only person attempting to profit off of well-known behavior
yeah but it's about money-weighted views not people-weighted views
if the people with the most money act like whales then you can be the barnacle
or the fish that eat their dead skin
if the big companies do things on a scale that such things don't matter to them
then you can profit from the inefficiency it creates
every monthend they rebalance their portfolios etc.
just the fact that they wait until the end of the month and do it at the end of the month every time is an exploitable inefficiency
locally: I don't think so
(also, remember that RAM memory and GPU memory are two different things)
on the cloud: Yes, but it can get pretty expensive
"are they worth it": depends. If you need to ask, probably not.
for sure. like i said you seem to know what you're doing, so best of luck
I have some idea, some vision
I think saying I know what I'm doing is a stretch
I do have an idea of where inefficiencies exist
and why they exist
but how to exploit them most effectively is what I'm trying to discover
if there's a better way to fit parameters for trading strategies than DEAP I'd like to find it too
yup i am asking if they are worth it cause i am a student trying to test their capability and cannot bear monthly costs of 200-300USD merely to run a model for learning
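For a sense of scale on the local-vs-cloud question, here is some back-of-the-envelope arithmetic for the memory needed just to hold the weights at a few common precisions. It ignores activations, the KV cache, and framework overhead, so real requirements are higher; the function name is made up.

```python
def weights_memory_gb(n_params, bytes_per_param):
    """Memory needed just to hold the weights, in GB (10^9 bytes)."""
    return n_params * bytes_per_param / 1e9

for n_params in (66e9, 100e9):
    for label, nbytes in (("fp16", 2), ("int8", 1), ("4-bit", 0.5)):
        print(f"{n_params / 1e9:.0f}B params @ {label}: "
              f"{weights_memory_gb(n_params, nbytes):.0f} GB")
```

Even the most aggressive setting here is several times larger than 8 GB, which is why the answer for running these locally on that machine is "I don't think so".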
I just assumed the cost function's topology was too ill-defined to use other techniques
but that might not be true
things like what combination of weights on moving averages, length of moving average windows, number of lags in hurst exponent calculations, etc.
there is no "y"-value
in a traditional sense
I think what I would use if I weren't using DEAP is reinforcement learning
but I still need to learn more about that and it takes more setup
is there a tldr for how your algorithm actually works?
it looks at the alignment of different periodicities of hurst exponents and moving averages
maybe you can adjust or constrain it somehow, or reparameterize it to be something that can be optimized more easily
and decides to go with or against trends
and scales bracket orders to conditional volatility
and the objective function itself is an accumulation of profit
that arises from its actions
overfitting is a serious problem though because what will generate the most profit changes over time periods
i don't know what a hurst exponent is, i'm not a finance person myself. but it sounds like you have some kind of iterative thing, where the procedure collects data for some fixed period of time, then takes an action, then repeats?
yes it's iterative
it takes an action given a signal that exists at one time
and we don't know what the value of that action is until some indeterminate time in the future
when it meets an exit criteria
whether that is a stop out or a take profit
i see, yeah that definitely sounds like it could be a problem. it sounds like you were on the "high variance" side of the bias-variance tradeoff
yes
first things first, i'm not familiar with the technicalities of these models, but i know there is quite an extensive literature of reinforcement learning for exactly this kind of iterative agent scenario
so you first might want to just check to see what already exists to avoid reinventing the wheel
I was avoiding the investment into RL because you have to design the game
which is a bit more involved than setting up DEAP
but a bullet I will have to bite eventually
but in general, there are two broad approaches in machine learning and statistics to avoid overfitting: reduce the amount of information the model can obtain from the training data set, or try to generate a large collection of realistic synthetic data sets, and average across many model fits on those data sets
first approach seems more feasible here
i believe RL is a broader literature than the stuff we've been seeing in the last few years with "AI plays Mario" type of thing. there are traditional algorithms like the "multi armed bandit" that as far as I know fall under the category of RL
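For reference, the multi-armed bandit mentioned here can be sketched in a few lines. This is a generic epsilon-greedy agent on made-up Gaussian arms, not anything specific to trading; the arm means, `epsilon`, and the function name are assumptions.

```python
import random

def epsilon_greedy(true_means, n_rounds=5000, epsilon=0.1, seed=0):
    """Play a Gaussian multi-armed bandit; return per-arm pull counts and value estimates."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
    return counts, estimates

counts, estimates = epsilon_greedy([0.2, 0.5, 1.5])
print(counts, [round(e, 2) for e in estimates])
```

The point of interest is that the agent keeps updating its estimates as rewards arrive, instead of fixing everything from one historical fit.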
one thing I was thinking about doing is actually inverting the relationship between test and train
I train on the smaller, later dataset
and then see if it performed adequately on the larger one
in case we revert to a different historical regime
afaik I have no theoretical basis for doing that though
so what are the actual parameters in this model? what is being learned/fitted here?
fitting on a smaller data set might just make the overfitting worse, hard to say
trigger criteria, window lengths, and weights that sum to one
and a couple scaling values
toolbox.attr_fast_period,
toolbox.attr_med_period,
toolbox.attr_slow_period,
toolbox.attr_ma_signal_period,
toolbox.attr_hurst_signal_period,
toolbox.attr_hurst_lags,
toolbox.attr_trend_fast_weight,
toolbox.attr_trend_med_weight,
toolbox.attr_trend_slow_weight,
toolbox.attr_reversion_fast_weight,
toolbox.attr_reversion_med_weight,
toolbox.attr_reversion_slow_weight,
toolbox.attr_reversion_sigma_open,
toolbox.attr_reversion_sigma_close,
toolbox.attr_trend_sigma_open,
toolbox.attr_trend_sigma_close,
toolbox.attr_trend_sigma_stop,
toolbox.attr_reversion_sigma_stop,
toolbox.attr_hurst_kill_period_reversion,
toolbox.attr_hurst_kill_period_trend,
toolbox.attr_hurst_trending_trigger,
toolbox.attr_hurst_reversion_trigger,
toolbox.attr_trend_fundamental_scaler,
toolbox.attr_reversion_fundamental_scaler,
toolbox.attr_fast_fundamental_period,
toolbox.attr_med_fundamental_period,
toolbox.attr_slow_fundamental_period,
toolbox.attr_reversion_fast_fundamental_weight,
toolbox.attr_reversion_med_fundamental_weight,
toolbox.attr_reversion_slow_fundamental_weight,
toolbox.attr_trend_fast_fundamental_weight,
toolbox.attr_trend_med_fundamental_weight,
toolbox.attr_trend_slow_fundamental_weight,
toolbox.attr_hurst_bottomout_trigger,
toolbox.attr_hurst_topout_trigger
things like this
i see
it behaves differently in trending and reverting environments so they have independent parameters
are these all or mostly continuous numerical values bounded in some known range? you might want to try something in the category of bayesian blackbox optimization instead of evolutionary algorithm
I don't have much experience with the latter, but I have modest experience with the former for hyperparameter tuning of machine learning models
both of them can be used for the same set of problems
Bayesian optimization is sequential in nature
EA are made for parallelism
in bayes opt you can still use parallelism to get better exploration of the parameter space at each step
Bayes opt also tries to be really efficient in the amount of iterations it needs to find the optimum
I think bayes opt is closer to exploitation on the exploration - exploitation scale
my thinking is that you might get better regularization out of it, fitting something closer to a smooth curve over the parameter space
not sure if that intuition is off
optuna and hyperopt in python. i've had good results with the latter specifically
i also wonder if you can simplify this somewhat by learning individual sub models instead of trying to optimize the entire thing all at once
I think all of the parameters interact with each other though
for example, if you need to forecast something in order to make a decision, you can fit a separate probabilistic model for that specific thing, and use the distribution of predicted forecasts as input to some decision component
I'm still thinking of the EA vs bayesian opt
like if I add a set of genes to the EA to do something
they will change the optimal values of the other genes
If evaluating your f isn't expensive I'd always take EA's
yeah, i'm just thinking of heuristics that might get you closer to something that works well without overfitting, and might be faster/easier to iterate on
that would be like training your model and evaluating its performance?
which in this case doesn't seem to require intensive numerical computation, so maybe it's cheap to evaluate, in which case you can take advantage of the high exploration potential of EA?
i'm curious in that case if there are just some parameters you can tune to reduce sensitivity to data
yeah I have one that scales fundamental signals based on a separate model
I tried making it a bool at first
and the version where it was nonzero always won
You can turn it into a multi-objective optimization problem and add an objective tangentially related to overfitting if regularization isn't enough
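A minimal sketch of what "multi-objective" means in practice: instead of one fitness number, each candidate gets a tuple of objectives, and you keep the candidates that nothing else beats on every objective at once (the Pareto front). The second objective used here, a negated train/validation gap, is only an assumed stand-in for "tangentially related to overfitting".

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates (all objectives maximised)."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other is not c)]

# (in-sample profit, negative train/validation gap); made-up numbers
scores = [(10.0, -8.0), (7.0, -2.0), (6.0, -3.0), (3.0, -0.5)]
front = pareto_front(scores)
print(front)
```

The most profitable candidate stays on the front, but so do less profitable ones that generalise better, which leaves the final trade-off to you.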
so I changed it to scale freely
i see
I tried to do that to some extent by making it scale to the overall number/size of transactions in the fitness function
but it was insufficient
I think the problem is the market switches dynamics
over time
so either have to figure out a way to segment the time series beforehand using something like an HMM state prediction
(but that presumes your HMM has the data that explains the switch in regimes)
but that will also produce its own form of overfitting I think
where it assumes everything is too much like the other datapoints that are assigned that state
i'll look into bayes though thanks everyone
btw if you work with time series that exhibit heteroskedasticity or other types of heterogeneous behavior over time I think hurst exponents can be a valuable piece of information for signal processing
might be valuable in other domains too
it's just a measure of how much something is like brownian noise
vs trending against the mean vs reverting to a mean
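A rough sketch of one common way to estimate it, assuming the "spread of lagged differences" method: compute the standard deviation of `series[t + lag] - series[t]` for several lags and take the slope of log(spread) against log(lag). This is not necessarily the estimator used in the strategy above; the lag range and function name are assumptions.

```python
import math
import random
from statistics import pstdev

def hurst_exponent(series, max_lag=20):
    """Estimate H as the slope of log(std of lagged differences) versus log(lag)."""
    log_lags, log_spread = [], []
    for lag in range(2, max_lag):
        diffs = [series[i + lag] - series[i] for i in range(len(series) - lag)]
        log_lags.append(math.log(lag))
        log_spread.append(math.log(pstdev(diffs)))
    # ordinary least-squares slope
    mx = sum(log_lags) / len(log_lags)
    my = sum(log_spread) / len(log_spread)
    return (sum((x - mx) * (y - my) for x, y in zip(log_lags, log_spread))
            / sum((x - mx) ** 2 for x in log_lags))

rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(5000)]  # snaps back to its mean every step
walk, level = [], 0.0
for step in noise:
    level += step
    walk.append(level)                           # random walk (Brownian-like)

h_noise = hurst_exponent(noise)
h_walk = hurst_exponent(walk)
print(round(h_noise, 2), round(h_walk, 2))
```

With this estimator a random walk lands near 0.5, strongly mean-reverting series land well below it, and persistent trending series land above it.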
The Hurst exponent is a statistical measure of long-term memory of time series. The existence and form of such memory are of great interest in financial markets, as financial returns are not generally governed by random walks. The Hurst exponent is a single scalar value that indicates if a time series is purely random, trending, […]
probably has applications to fraud detection, etc.
when something begins to trend that wasn't before
like is the number of users registering for your app from north macedonia trending hard
although you could probably catch something like that more easily
"…[The concept] was originally developed in hydrology for the practical matter of determining the optimum size of the dam for the Nile river by Harold Edwin Hurst [but] can help us classify the pattern of time series of prices under a certain time horizon." [Bui and Ślepaczuk]
in my usual understanding of RL, part of what the algorithm does is adapt dynamically to the environment, for example the multi arm bandit. part of why i suggested looking into that literature is because maybe you could get ideas for how to make your model adaptive, rather than fitting every parameter exactly to whatever happens to be in your historical data, which i suspect will always overfit to some extent
or maybe that's what those parameters you showed me are doing, i don't know enough about the models and domain to comment on that
right I wanted to find a way to make this online
There is a layer of conditionality that is imposed by my a priori assumptions of what might be good
and the DEAP algo results either confirm or reject my hypothesis (at least on the same data)
the problem is generalizability which I think online learning would solve
at least as well as one could expect to solve it
```python
if self.trending_bull:
    self.state_history.append(1)
    fundamental_factor = vol_est * self.fundamental_signal_trend * self.p.trend_fundamental_scaler
    if self.data.close < self.target_trend - self.p.trend_sigma_open * vol_est - fundamental_factor:
        stop_px = self.target_trend - self.p.trend_sigma_stop * vol_est - fundamental_factor
        limit_px = self.target_trend - self.p.trend_sigma_close * vol_est - fundamental_factor
        trade = self.sell_bracket(limitprice=limit_px, stopprice=stop_px)
        self.log(f'TRENDING BULL - ENTRY {trade[0].price} - STOP LOSS {stop_px} - TAKE PROFIT {limit_px}')
```
so for example all of this logic was something I proposed a priori and simply fit the parameters the logic uses using DEAP
and I've been iteratively adding to it depending on what improves fitness and what doesn't
anything that is under the p attribute is a parameter exposed to DEAP directly
an example genome looks something like this
```python
champ = (
    [18, 52, 100, 6, 55, 9, 55, 94, 60, 15, 47, 149, 3.342307816105754, 1.094156334970098, 1.13173368372055, 10,
     -0.7274971304499664, 58, 24, 4, 0.6331356749123428, 0.3652205723791574, 7.9251262018994435, 39, 60, 134, 239,
     3, 38, 57, 123, 97, 9, 0.17026288765418376, 0.7678126560406366]
)
```
the ints are either windows or weights I designed to sum to one by taking the sum of multiple parameters as the denominator
I figured ints are fine for that since it would be false precision to use floats anyway
it's already producing a float at the end
where each int is the numerator and the sum is the denominator
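In code, the numerator/denominator trick described here amounts to something like the sketch below. The helper name is hypothetical, and the three integers are just taken from the example genome as a plausible fast/med/slow triple.

```python
def normalise_weights(int_weights):
    """Turn positive integer genes into float weights that sum to one."""
    total = sum(int_weights)
    return [w / total for w in int_weights]

# three integer genes -> three weights, each int divided by the sum of all three
trend_weights = normalise_weights([55, 94, 60])
print(trend_weights, sum(trend_weights))
```

One side effect of this encoding is that scaling all three genes by the same factor gives the same weights, so several different genomes map to the same strategy.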
this is one of the fitness functions I played around with to try to reduce overfitting by increasing the number of actions by the agent
fitness = profit * (profit / (max_dd if max_dd else 1)) ** 2 * np.sqrt(no_trades)
where no_trades is the number of actions that have been consummated
dd = drawdown
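For reference, the fitness expression above wrapped in a standalone function. This is only a sketch: it uses `math.sqrt` instead of `np.sqrt` so it runs without NumPy, and the input numbers are invented.

```python
import math

def fitness(profit, max_dd, no_trades):
    """profit * (profit / max_dd)^2 * sqrt(no_trades), as in the message above.

    A zero drawdown is replaced by 1 to avoid dividing by zero.
    """
    dd = max_dd if max_dd else 1
    return profit * (profit / dd) ** 2 * math.sqrt(no_trades)

# Made-up numbers, purely for illustration.
print(fitness(profit=200.0, max_dd=50.0, no_trades=16))
print(fitness(profit=-200.0, max_dd=50.0, no_trades=16))  # a loss keeps its negative sign
```

Since the ratio is squared, the sign of the result always follows the sign of `profit`, so a losing genome can never score as positive.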
they get instantiated from these prior distributions:
toolbox.register("attr_fast_period", random.randint, 10, 51)
toolbox.register("attr_med_period", random.randint, 15, 101)
toolbox.register("attr_slow_period", random.randint, 20, 201)
toolbox.register("attr_hurst_signal_period", random.randint, 1, 101)
toolbox.register("attr_ma_signal_period", random.randint, 1, 101)
toolbox.register("attr_hurst_lags", random.randint, 7, 20)
toolbox.register("attr_trend_fast_weight", random.randint, 1, 100)
toolbox.register("attr_trend_med_weight", random.randint, 1, 100)
toolbox.register("attr_trend_slow_weight", random.randint, 1, 100)
...
toolbox.register("attr_fast_fundamental_period", random.randint, 20, 120)
toolbox.register("attr_med_fundamental_period", random.randint, 60, 240)
toolbox.register("attr_slow_fundamental_period", random.randint, 180, 360)
toolbox.register("attr_reversion_fast_fundamental_weight", random.randint, 1, 101)
toolbox.register("attr_reversion_med_fundamental_weight", random.randint, 1, 101)
toolbox.register("attr_reversion_slow_fundamental_weight", random.randint, 1, 101)
toolbox.register("attr_trend_fast_fundamental_weight", random.randint, 1, 101)
toolbox.register("attr_trend_med_fundamental_weight", random.randint, 1, 101)
toolbox.register("attr_trend_slow_fundamental_weight", random.randint, 1, 101)
toolbox.register("attr_hurst_bottomout_trigger", random.uniform, 0.1, 0.3)
toolbox.register("attr_hurst_topout_trigger", random.uniform, 0.55, 0.8)
when I do booleans I use random.choice on a tuple containing 0 and 1
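A rough stdlib-only picture of what those registrations amount to, assuming each `toolbox.register(name, func, *args)` behaves like a function with its arguments already bound. The four genes chosen here are just a sample, not the full genome.

```python
import random
from functools import partial

# Each generator mirrors one toolbox.register(...) line: a sampling function
# with its arguments already bound. The selection of genes is illustrative.
attribute_generators = [
    partial(random.randint, 10, 51),    # a window length; randint includes both ends
    partial(random.randint, 1, 100),    # an integer weight numerator
    partial(random.uniform, 0.1, 0.3),  # a float trigger level
    partial(random.choice, (0, 1)),     # a boolean flag
]

def make_individual(generators):
    """Draw one value from every generator to form a genome."""
    return [gen() for gen in generators]

random.seed(0)  # seeded only so the example is repeatable
individual = make_individual(attribute_generators)
print(individual)
```

One thing to keep in mind: `random.randint(a, b)` can return `b` itself, so `randint(1, 100)` and `randint(1, 101)` are not the same range.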
the overfitting in practice
train
test
the topline should become more positive and should be large in comparison to the red line beneath it if things are going well
yeah, i think my only suggestion at this point is to look at reinforcement learning to see how they do it
it sounds like you have a good conceptual framework, which is probably the most important thing
https://arxiv.org/pdf/1504.08168.pdf I found this on EA overfitting
I am so bored with supervised learning. Need new ideas
Do any of you think that putting 3000 hours into python and data stuff is kind of insane for the time span of a single year?
3000 hours in a year sounds pretty insane for anything at all
Anyone got practical ML/AI resources for beginners? Something that shows the process of making an ML project and doesn't get too deep into math.
Why?
My mind kind of blanks when I look at a math formula
that's more than 8 hours per day every single day?
Yes.
There was a time period of 3 months straight where I would program 14 hours a day every single day and go through dataset after dataset.
machine learning is literally built upon math/statistics
you'll have to get used to math to some extent if you want to build machine learning models
is your issue just the notation, or the math itself?
Mostly the notation. A lot of ML math seems to be pretty simple, but the formulas have all sorts of weird symbols and stuff. I guess I'm not used to them
I dunno calculus though.
Dude, math is a very vast thing. The most vast thing ever
you could try keeping a reference sheet with the meaning of common symbols, or just spend some time properly learning it, but you'll have to get used to it one way or the other
K ig
@torpid quartz What is the highest math course you have ever taken?
Geometry
ok, not bad
take trig, then calc
If you want. I took calc1-3 before I touched stats
I don't think trig is a course where I am
which, I do not know about
What kind of ML are you trying to do?
Idk, I think computer vision is pretty cool
Like I took insane grad level optimization classes and I barely use scipy.optimize
just find a way to do it
The tutorials online only seem to touch the surface of opencv and CNNs
Ok, what have you done so far in ML?
Uhh… train a decision tree with scikit-learn and train on the MNIST dataset
Basically nothing
Bit of face detection, but not with a NN
what do you use the most for ML?
yes
I know python and rust pretty well in terms of general programming
Bit of c++, bit of Haskell, bit of ts
just do whatever you want. No one is stopping you
Ok I guess
I'm thinking of if there's a way to recognize hand gestures using ML
With… what?
I've jumped headfirst into stuff before, but I have no idea how to even start this
Straight up, I would get more comfortable with basic stuff before jumping in
Basic stuff as in math, or concepts, or simple algorithms that aren't NNs?
I do not know, you are confusing me.
Ok sorry
try iterative agents
reinforcement learning
kaggle.com is good
reinforcement learning I think addresses a much more interesting class of problems than supervised learning
not "what should the next value be" or "what class does this individual belong to" but "how should I behave over time to optimize some metric"
I haven't gotten into it as much as I'd like myself
Not insane, just probably not something most people have the time to do.
No
ok
Not now at least
import numpy as np
import matplotlib.pyplot as plt
#do not truncate
np.set_printoptions(threshold=np.inf)
x, y, z = np.loadtxt(fname = 'data.csv', unpack = True, delimiter = ',', skiprows = 1,) #load data
X, Y = np.meshgrid(x, y)
Z = (2/5)*np.e**(-X**2/2) + (2/5)*np.e**(-Y**2/2) - (3/5)
r = (Z - z)**2
print(np.max(r))
print(np.min(r))
print(np.where(r == 1.7784100967330392e-16))
print(r[107, 107])
this returns
0.2544509833800965
1.7784100967330392e-16
(array([ 4, 36], dtype=int64), array([107, 107], dtype=int64))
0.0015725117183072214
how come i cant find what index 1.77....E-16 is
!d numpy.argmin
```
numpy.argmin(a, axis=None, out=None, *, keepdims=<no value>)
```
Returns the indices of the minimum values along an axis.
!e ```python
import numpy as np
x = np.array([0, -1e-16, 2, 4, 6, 8])
i_min = np.argmin(x)
print((i_min, x[i_min]))
```
Would this work
@desert oar :white_check_mark: Your 3.12 eval job has completed with return code 0.
(1, -1e-16)
on 2d arrays
yes, but check the docs. the axis= argument controls how it works on arrays of > 1 dimension
will do
So what was wrong with np.where?
nothing, but read the docs carefully to see what that output means
well, the problem is that you're looking for exact floating point equality, which is squirrely
just realised (array([ 4, 36], dtype=int64), array([107, 107], dtype=int64)) means it's at index [4, 107] (and also at [36, 107], since the first array holds the rows and the second the columns)
argmin is going to be less fussy
also, the axis keyword is a bit funky. it tells you which axis/dimension is "consumed" by the operation. so axis=0 means that it will find the argmin by "consuming" the 0th (outermost) axis, returning a result with the other axes intact
!e ```python
import numpy as np
x = np.arange(9).reshape((3, 3))
print(np.argmin(x, axis=0))
```
@desert oar :white_check_mark: Your 3.12 eval job has completed with return code 0.
[0 0 0]
!e ```python
import numpy as np
import numpy.random
x_flat = np.arange(3 * 4 * 5)
np.random.shuffle(x_flat)
x = x_flat.reshape((3, 4, 5))
print(np.argmin(x, axis=-1))
```
etc.
@desert oar :white_check_mark: Your 3.12 eval job has completed with return code 0.
[[4 0 4 1]
 [1 4 1 1]
 [3 1 1 0]]
(oops, it doesn't like argmin over multiple axes... TIL)
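For the original question (finding the 2-D position of the smallest value), one option is to take the flat `argmin` and convert it back with `np.unravel_index`. A small sketch on a made-up array standing in for `r`:

```python
import numpy as np

# Small made-up 2-D array standing in for the squared residuals `r`.
r = np.array([[0.5, 0.2, 0.9],
              [0.7, 1e-16, 0.3]])

# With axis=None (the default), argmin returns an index into the flattened array...
flat_index = np.argmin(r)
# ...which unravel_index converts back into (row, column).
row, col = np.unravel_index(flat_index, r.shape)
print(row, col, r[row, col])

# If an equality-style lookup is really needed, compare with a tolerance
# rather than against a value copied from printed output.
rows, cols = np.where(np.isclose(r, r.min(), rtol=0.0, atol=1e-20))
print(rows, cols)
```

The tolerance in `np.isclose` is set explicitly here because the default absolute tolerance (1e-8) would treat every value that small as "equal" to the minimum.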
@past meteor what should the number of generations and size of the population be functions of when deciding those hyperparameters for EA? the crossover prob and mutation probs seem like pure shots in the dark but I imagine that you can reason about what you want here. Specifically, I'm wondering if having too many generations leads to overfitting.
seems like it should to me
seems like NGEN should be picked relative to the size of the dataset and number of codons in an individual but not sure if there's some rule of thumb here
What IDE do you guys use?
pycharm
I'm thinking one way to get around this problem is exponential decay of profit so that profits from a long time ago add less to the fitness
the other idea I had is to use two fitness functions, one that calculates a reduced result over several segments and one that operates just on the final test
which isn't ideal but it seems to be a very direct way to get the existing algorithm to do what i want
```python
inner_fitnesses.append(inner_fitness)
fitness = np.median(inner_fitnesses)
```
let's see where this black magic leads
it just werks
that's the other nice thing about EAs
I can do whatever I want and it's up to me to decide whether it makes sense or I like it
I can make it more conservative just by switching out median for min or some percentile below 50
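A sketch of that switch, assuming the per-segment fitness values have already been computed. The function name `robust_fitness` and the numbers are invented for the example, and it uses the stdlib `statistics` module rather than NumPy.

```python
import statistics

def robust_fitness(segment_fitnesses, how="median"):
    """Reduce per-segment fitness values to one conservative score."""
    values = sorted(segment_fitnesses)
    if how == "median":
        return statistics.median(values)
    if how == "min":
        return values[0]
    if how == "p25":
        # First quartile cut point; needs at least two segments.
        return statistics.quantiles(values, n=4)[0]
    raise ValueError(f"unknown reduction: {how}")

# Made-up fitness values for five time segments of the training data.
inner_fitnesses = [12.0, 3.5, -1.0, 8.0, 20.0]
for how in ("median", "p25", "min"):
    print(how, robust_fitness(inner_fitnesses, how))
```

The lower the percentile, the more a single bad segment drags the score down, which is the "more conservative" behaviour described above.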
the numbers are so much smaller it makes me sad but that's proof it's working
it takes 90 minutes across 16 high end Ryzen mobile cores from 2021
idk if that's considered expensive to you or not for EA
why is there like a facebook link under one of my repos?
wdym lel
wheres that
Under traffic
I ended up finding a way to reduce the variance by fitting two separate models based on an innate state segmentation of the training data (i.e., whether the central bank is raising interest rates or not as opposed to doing some latent state model)
I also did the segmentation into different time periods and calculating the median fitness thing
I consider it a workable solution for now but will probably follow up on the other suggestions you guys made here
?
Also bumping my question.
is the idea that you want to write one that will be plug and play with the rest of scikit learn
or just a decision tree classifier in general
because I think that class name is specifically one from sklearn isn't it
the decision tree algorithm in general there's a lot of resources
So sklearn already has a DecisionTreeClassifier. I'm trying to create my own without looking at sklearns source code.
Just a decision tree classifier in general.
this might be helpful
I was just asking because oftentimes people want to create custom versions of library classes and want them to play nicely with the rest of the library
which is a lot harder to do than just writing something
That shouldn't be necessary.
I skimmed it and it seems as good as any other discussion of how DTs work inside
From a quick glance-over, I don't think this goes into how model weights/decisions are stored within the object.
well part of the joy of writing custom classes is you can decide how to do that yourself
ah
if you want to see how sklearn does it you can try to go into their source code but I'm sure that it's an implementation of a base class and you'll have to jump all over to find the full picture
sounded like you had an interesting idea why not go for it and see what works and what doesn't
I'm thinking about how to store the decision at each node in the tree. Writing something like feature < threshold for a continuous column wouldn't work because it just gets evaluated and doesn't save the condition itself.
Perhaps I could store the operator and the operands separately.
You can do something cute here in Python and use operator overloading.
Wouldn't I have to overload the operator in whatever class the feature is and whatever class the threshold is? Or maybe I could create a new Decision class...
Yes.
Yes to making a new class or yes to overloading in the operands' classes?
Classes and overloads, but you don't really need that, it's just so you can do this fancy thing of writing feature < threshold in Python.
If I don't need that, I can't think of another way to save a conditional statement without it getting evaluated.
!e ```py
def decision(a, b):
    def foo():
        return a < b
    return foo

d = decision(10, 20)
print(d())
```
@iron basalt :white_check_mark: Your 3.12 eval job has completed with return code 0.
True
Yes, that's what I was thinking for making a new class, more or less.
!e ```py
import ast
print(ast.dump(ast.parse("x < y"), indent=2))
```
@iron basalt :white_check_mark: Your 3.12 eval job has completed with return code 0.
Module(
  body=[
    Expr(
      value=Compare(
        left=Name(id='x', ctx=Load()),
        ops=[
          Lt()],
        comparators=[
          Name(id='y', ctx=Load())]))],
  type_ignores=[])
Can you explain what's going on here?
Directly using Python's parser to parse the code into an abstract syntax tree.
Something like this perhaps
class Decision():
    def __init__(self, left_operand, right_operand, operator):
        self.left_operand = left_operand
        self.right_operand = right_operand
        self.operator = operator
    def evaluate(self):
        return self.left_operand.operator(self.right_operand)
Works too, you have a bunch of options.
op(left, right)```
d = Decision(7, 8, __eq__)
It's saying the __eq__ isn't defined.
Oh probably because it's a method.
Can use lambda here too.
That's a good idea.
!e py
less = lambda x, y: x < y
print(less(10, 20))
@iron basalt :white_check_mark: Your 3.12 eval job has completed with return code 0.
True
class Decision():
    def __init__(self, left_operand, right_operand, operator):
        self.left_operand = left_operand
        self.right_operand = right_operand
        self.operator = operator
    def evaluate(self):
        return self.operator(self.left_operand, self.right_operand)

d = Decision(7, 8, lambda x, y: x == y)
print(d.evaluate())
False
I think this will work but it feels kind of weird.
Using a function to evaluate an operator.
Yup, and if you prefer __call__ instead of evaluate.
This is functional programming, It's cumbersome in Python since it does not support functional programming well, but it works.
I didn't know __call__ existed. Thanks.
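One possible variation on the class above, combining the two suggestions: the stdlib `operator` module provides function versions of the comparison operators (which is what the bare `__eq__` attempt was reaching for), and `__call__` lets the object be used like a function. The attribute layout (`feature`, `op`, `threshold`) is an assumption made for this sketch, not the layout from the conversation.

```python
import operator

class Decision:
    """Store a comparison as data so it can be evaluated later, on any row."""

    def __init__(self, feature, op, threshold):
        self.feature = feature      # key/column to read from a row
        self.op = op                # a two-argument function such as operator.lt
        self.threshold = threshold  # value the feature is compared against

    def __call__(self, row):
        # Makes the instance callable: decision(row) instead of decision.evaluate(row)
        return self.op(row[self.feature], self.threshold)

    def __repr__(self):
        return f"Decision({self.feature!r} {self.op.__name__} {self.threshold!r})"

is_young = Decision("age", operator.lt, 25)
print(is_young)
print(is_young({"age": 19}))
print(is_young({"age": 40}))
```

Storing the operator as a named function rather than a lambda also means the node can print something readable, which helps when debugging a tree.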
I guess this is rather new to me.
Haskell example: ```haskell
ghci> add a b = a + b
ghci> add 10 20
30
ghci> add_five x = add x 5
ghci> add_five 10
15
ghci> add_ten_and_five = add_five 10
ghci> add_ten_and_five
15
```
Neat.
In mathematics and computer science, currying is the technique of translating the evaluation of a function that takes multiple arguments into evaluating a sequence of functions, each with a single argument. For example, currying a function f that takes three arguments creates a nested ...
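The same idea in Python, for comparison with the Haskell snippet. This is a sketch showing two approaches: `functools.partial` (partial application) and a hand-written curried function.

```python
from functools import partial

def add(a, b):
    return a + b

# Partial application: fix one argument now, supply the other later.
add_five = partial(add, 5)
print(add_five(10))

# Manual currying: a function that returns a function, one argument at a time.
def curried_add(a):
    def inner(b):
        return a + b
    return inner

print(curried_add(10)(20))
```

Python functions are not curried automatically the way Haskell's are, so one of these two patterns has to be written out explicitly.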
you don't need to save the entire conditional statement in a binary decision tree, you just need to save the split point
I think I'm making an n-ary decision tree.
with arbitrary n?
Yeah
i'd say even then you probably can just store the split points in a tuple/list, no?
Can you elaborate on what you mean by "split points"?
which decision tree algorithm are you implementing?
It's a classification one if that's what you're asking. If not, then I don't know.
respectfully, i recommend clarifying the actual algorithm you want to implement before trying to implement it
there are standard algorithms for decision trees in ML
https://www.youtube.com/watch?v=_L39rN6gz7Y
For the most part, I'm trying to do this. But with multiple classes instead of two classes.
Decision trees are part of the foundation for Machine Learning. Although they are quite simple, they are very flexible and pop up in a very wide variety of situations. This StatQuest covers all the basics and shows you how to create a new tree from scratch, one step at a time.
NOTE: This is an updated and revised version of the Decision Tree St...
i see. let me at least skim the video to see what they're presenting
Okay.
this looks very much like a binary decision tree
i think you might be confused between the arity of the tree and the number of classes being predicted
Can I have a binary decision tree with 3+ classes?
I was thinking about that. For creating the split points, I could do "if it is this class, go to the left. all other classes go to the right."
sure. the class score at each node is just the % of data points with that class in the node
well you can't use the classes to create the split points... otherwise, how could you make predictions on new data?
I oversimplified a little bit. You would still determine which class is the best predictor for each split point using the Gini Impurity.
i don't follow, sorry
let's reserve the term "class" for the thing we are trying to predict - the output of the model
and let's use the term "category" for the inputs to the model
So to build the tree, you determine which category gives the best split, the lowest Gini impurity. Then that category becomes the root. You send all rows that are True to the condition to the left and the rest to the right. WIth your new set of rows, for the left, you again determine the category that gives the best split. You do the same for the right. You continue until all leaves are pure, they only contain samples of one class.
sure. you also need to consider that if you have multiple features, you need to choose the best category to split on from among all features
How do you differentiate feature and category?
i'm admittedly surprised you're asking this question because i feel like you've been doing machine learning related things for a while
a feature is a column in the input data. a category is... a category. e.g. a feature might be "eye color" and some categories of eye color might be "blue" and "brown"
I was just confirming.
got it. i'm not sure if you're learning this material in english or another language
and you don't need to keep splitting until the tree is perfectly pure. scikit-learn for example provides several stopping criteria
e.g. you can refuse to split any leaf that's below a certain size. or you can refuse to split any leaf where the purity gain is lower than some threshold.
or you can refuse to split beyond a certain maximum depth
So are you saying that instead of finding the feature that gives the best split, you have to find the category that gives the best split among all features? I assume continuous features are also taken into consideration. Like which numeric threshold gives the best split. And then compare that with the rest of the categorical splits?
Say I have two features that each have 3+ categories, I would need to find the best category in order to determine the root node or any nodes thereafter?
yes, that's how a typical decision tree works
I assumed that the video I linked was implying that each split happens feature-wise, not category-wise.
can you clarify what you mean by "feature-wise"?
you only split on one feature at a time
but how do you choose which feature to split on? and how do you choose where to split?
the answer is that you choose the best split for each feature, and then you choose the feature whose best split is the best overall
I mean you pick a feature, such as Loves Soda, a binary feature, send all rows for which the condition is true to the left and the rest to the right.
Another example with 3 classes: Eye color. Send the blue-eyed rows to the left and the brown and green eyed rows to the right.
You'd determine which feature to split on by calculating the Gini Impurity for that split.
be more specific. if i choose eye color, how do i decide that blue goes left and brown/green goes right? what if i have eye color, hair color, leg length, and forearm length all in the same dataset? how do i decide which feature to use for splitting each node?
if i choose eye color, how do i decide that blue goes left and brown/green goes right?
Would you not calculate the Gini Impurity for blue-left, green-left, and brown-left with the others going right? Whichever eye color gives the lowest total Gini Impurity, you select that as the Gini Impurity to represent that feature as you go on to compare it with the other features.
yes, that's right. but then how do i decide to split on eye color, or hair color, or one of the other features?
Say blue left was the best split for eye color. The Gini Impurity for that split is X. You then find the hair color that gives the best split. The Gini Impurity for that split is Y. You do this for all features. You now have a value for Gini Impurity that represents each feature. You select the feature who had the lowest impurity. You then split that feature by its best category.
That's how I assume it would be done, but I don't have any evidence.
right, it's good to be very clear about what the algorithm is, before you try to implement it!
right, exactly. but that's equivalent to just looking at all splits of all categories of all features
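The procedure agreed on above (score every one-vs-rest category split of every feature, keep the lowest weighted Gini impurity) could be sketched like this. The function names and the tiny dataset are made up for illustration; rows are plain dicts.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(left, right):
    """Size-weighted average impurity of the two child nodes."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_categorical_split(rows, labels, features):
    """Try every (feature, category) one-vs-rest split; return the lowest impurity."""
    best = None  # (impurity, feature, category)
    for feature in features:
        for category in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] == category]
            right = [y for row, y in zip(rows, labels) if row[feature] != category]
            if not left or not right:
                continue  # this split does not separate anything
            score = split_impurity(left, right)
            if best is None or score < best[0]:
                best = (score, feature, category)
    return best

# Tiny made-up dataset with three classes.
rows = [
    {"eye": "blue", "hair": "dark"},
    {"eye": "blue", "hair": "light"},
    {"eye": "brown", "hair": "dark"},
    {"eye": "green", "hair": "light"},
    {"eye": "brown", "hair": "light"},
]
labels = ["A", "A", "B", "C", "B"]
print(best_categorical_split(rows, labels, ["eye", "hair"]))
```

Note that the class labels are only used to score the splits. The split itself is defined purely by an input category, so it can still be applied to new, unlabeled rows.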
No idea. This was also one of my takeaways: EAs are very sensitive to hyperparameters and they're problem specific.
they can't make me redundant if I'm the only one who knows how to tune the hyperparameters
/s
You don't need to implement decision trees like this but it's good to remember they're also called recursive partitioning. If you know this you can conceptually simplify the problem to "how do I split in one node" and "when do I stop splitting?"
I was struggling while attempting to implement recursive node creation. I think I'll be able to do it though. I should be able to set the stop condition to be when a leaf is pure. I can add max depth and min samples later probably.
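A compact sketch of the recursive part, under the assumptions used in this conversation: categorical features only, one-vs-rest splits, Gini impurity, and a tree stored as nested dicts. The `max_depth` and `min_samples` parameters are the optional stopping rules mentioned earlier, and all names are illustrative.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def find_split(rows, labels):
    """Best one-vs-rest (feature, category) split by weighted Gini, or None."""
    best = None
    for feature in rows[0]:
        for category in sorted({row[feature] for row in rows}):
            mask = [row[feature] == category for row in rows]
            left = [y for y, m in zip(labels, mask) if m]
            right = [y for y, m in zip(labels, mask) if not m]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, category)
    return best

def build(rows, labels, depth=0, max_depth=5, min_samples=1):
    """Recursive partitioning: returns nested dicts; leaves hold class proportions."""
    counts = Counter(labels)
    leaf = {"leaf": {cls: n / len(labels) for cls, n in counts.items()}}
    # Stop when the node is pure, too small, or too deep.
    if len(counts) == 1 or len(labels) <= min_samples or depth >= max_depth:
        return leaf
    split = find_split(rows, labels)
    if split is None:
        return leaf  # no split separates the rows
    _, feature, category = split
    left = [(r, y) for r, y in zip(rows, labels) if r[feature] == category]
    right = [(r, y) for r, y in zip(rows, labels) if r[feature] != category]
    return {
        "feature": feature,
        "category": category,
        "left": build(*zip(*left), depth + 1, max_depth, min_samples),
        "right": build(*zip(*right), depth + 1, max_depth, min_samples),
    }

def predict(tree, row):
    """Walk down to a leaf and return the most frequent class there."""
    while "leaf" not in tree:
        go_left = row[tree["feature"]] == tree["category"]
        tree = tree["left"] if go_left else tree["right"]
    return max(tree["leaf"], key=tree["leaf"].get)

rows = [
    {"eye": "blue", "hair": "dark"},
    {"eye": "blue", "hair": "light"},
    {"eye": "brown", "hair": "dark"},
    {"eye": "green", "hair": "light"},
    {"eye": "brown", "hair": "light"},
]
labels = ["A", "A", "B", "C", "B"]
tree = build(rows, labels)
print(tree)
print(predict(tree, {"eye": "green", "hair": "dark"}))
```

The two questions from the message above map directly onto `find_split` ("how do I split in one node") and the `if` at the top of `build` ("when do I stop splitting").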
Have to change my code to split on category instead of feature though.
This project is fun so far. I'm glad I started it.
you can still write your code to find the best split of each feature, and then choose the feature with the best split
the point i'm trying to emphasize here is that "splitting on feature" itself is an ill-defined concept
(also, terminology note: usually we think of every feature having its own distinct categories. so the categories of "eye color" are eye colors, the categories of "hair color" are hair colors, etc)
Every categorical feature or every feature including continuous numeric ones?
in general, only categorical features have categories... it's in the name!
in the case of a decision tree specifically, we artificially create categories by splitting.
in general, only categorical features have categories... it's in the name!
Just clarifying.
in the case of a decision tree specifically, we artificially create categories by splitting.
That makes sense. So like saying age > 25 creates categories out of continuous data.
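For a continuous feature, one common convention (assumed here, not the only option) is to try the midpoints between adjacent sorted values as candidate thresholds and score each with the same weighted Gini impurity. The data below is invented.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Scan midpoints between sorted distinct values; return (impurity, threshold)."""
    distinct = sorted(set(values))
    best = None
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, threshold)
    return best

ages = [18, 22, 24, 30, 41, 55]
likes = ["yes", "yes", "yes", "no", "no", "no"]
print(best_threshold(ages, likes))
```

The resulting impurity is on the same scale as the categorical splits, so the best threshold split can be compared directly against the best category split of every other feature.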
In uni we had great slides on DT's I can send them to you @rugged comet
I would appreciate that.
Maybe it goes into a bit too much detail towards the end. I think it's fine to go to slide 57, implement the tree and then go back and do the other 40
send that to me as well :x
I have this AI model I'm using Flask to build. Our other backend is in node. The model takes 3 hours to get back with a response. How should I architect a request?
Should the node backend send an async post request with the data and wait for the flask backend to respond? Or should the node just post, and the flask says ok, does the process, and posts the results to the node backend separately?
Let's just say it continuously outputs what is needed and the end is a long list of things
I even wonder if I should go serverless
hey, do you have suggestions for getting started in data science and ai with python. I am only a high school student who will start college next summer. I know the basics of python like defining a function, lists, loops, file functions, conditions, etc. i also know a few basic modules like math, mysql.connector, random. I would only invest in some paid courses if they are actually worth it and don't really have a prerequisite like uni level maths, cause i can't really cram uni level maths in a few months can i? or maybe i can who knows. Well your help is appreciated, just ping me or dm me the suggestions. Thanks :D Also I don't understand GitHub even a bit, a tutorial for that would be nice too. I am dumb like a 5 year old excited about magicians (aka cs engineers)
If it takes three hours you should absolutely not go serverless you'll burn money
Can you explain what this model is? Why is it taking 3 hours? Why do you have a backend in flask and one in node?
the node backend is our normal application backend. Our AI model is in python. While it is possible to run python in node etc, we don't want to do that. So, the other choice was to make another ec2 instance and deploy the rest api using flask.
It takes 3 hours because it does a lot of work. One output is fed to another and in total takes a lot of time. Since the outputs depend on each other, the work is sequential. Honestly, someone else wrote it and I'm sure it could be made faster but it is what it is right now
Its a large language model
has to be gpu right
yeah
we're calling an already built LLM. Each call takes a minute. The number of calls ends up taking hours. It's not resource intensive on our end
Stuff like BERT is still possible on CPU hence the question
Hence the serverless consideration
Is this work you can do in-batch "whenever" and then give it to your main app? Does it need to be a web app?
Hey guys, I'm trying to analyze a relatively small dataset (128 observations with subgroups as small as 20). This data is continuous, but very nonlinear (independent vs dependent variable).
Do you have any suggestions for a good model to analyze this?
OLS and GLM won't work since it's nonlinear. I've read about and implemented a random forest analysis, but with such a small sample size it might be prone to overfitting. Would a reduction of trees (say to 20/40 instead of 100/1000) work?
- polynomial features and transformations with GLMs
- regularisation on your random forest (tune the cost complexity parameter). Reducing trees on random forest will not help you, it averages the trees. Reducing the amount of trees will likely cause more overfitting.
- Consider using gradient boosting. Usually it has stronger defaults than RF. You can tune it by reducing the amount of trees
- Try RBF-SVMs. They only have 2 or 3 hyper parameters for respectively classification and regression. They're more or less the best model on small datasets but they just scale poorly
no need for web app. I was going to create rest apis. Yeah, its essentially python code making requests to llm models and processing the results
If you're able to do things in batch: in real-world settings doing 5 vector-vector multiplications is slower than 1 matrix vector multiplication. Same idea applies for tensors. Basically, if you can pool requests and evaluate a bunch of them at once it'll be faster than doing them one-by-one.
You might be already doing that, in that case: I don't know.
Im going to look into web hooks, pub/sub and the like.
Yeah. I don't know the product you're building but you can always "answer" immediately from node and give them an endpoint to the Flask "worker" in this case which they can poll for the model's results.
But that won't work if it's 3 hours ofc
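The "answer immediately, then poll" pattern can be sketched without any web framework. Everything here is a stand-in: `run_model` fakes the slow chain of LLM calls, and `submit_job` / `get_status` are hypothetical names for what a POST handler and a status endpoint would do. For a multi-hour job the job table would need to live somewhere durable (a database or queue) rather than in process memory, which is where webhooks or pub/sub come in.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)
jobs = {}  # job_id -> Future; a real service would persist this somewhere durable

def run_model(payload):
    """Stand-in for the slow chain of LLM calls."""
    time.sleep(0.1)
    return [f"result for {payload}"]

def submit_job(payload):
    """What a POST handler would do: start the work and return an id immediately."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = executor.submit(run_model, payload)
    return job_id

def get_status(job_id):
    """What a status endpoint would do: report progress without blocking."""
    future = jobs[job_id]
    if not future.done():
        return {"state": "running"}
    return {"state": "done", "result": future.result()}

job_id = submit_job("some input")
print(get_status(job_id))  # most likely still running at this point
while get_status(job_id)["state"] != "done":
    time.sleep(0.05)
print(get_status(job_id))
executor.shutdown(wait=True)
```

The key property is that the caller never holds a connection open for the duration of the job. It only needs the job id.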
Which of these would you recommend the most?
I have adapted the random forest and created an SVM model, both of which return more or less the same results. I have just been creating so many models lately and every new one is returning new results that I'm quite confused which is the best to use
A friend who knows a thing or two about statistics recommended GLM, however with nonlinear data, a linear model wouldn't do well, thus I came up with random forest
Have you tuned them or are you just using the default parameters?
GLMs work well if you transform your input
This is a better question for #career-advice : it sounds like you probably need to first practice your Python programming a bit and reinforce what you've learned, but you can also tackle some basic ML coding projects (see Kaggle.com/learn and CS50 for AI) to learn some of the coding basics. For preparing for college: make sure your math fundamentals are strong. Calculus is the typical first year course, and it's not hard to prepare for if you put a little time in. I wouldn't worry about AI/ML math if you haven't started college yet. Just be ready for calculus.
I have adapted the hyperparameters max_depth, min sample split and min samples at each leaf. Also, I have used the sklearn cross validation for the MSE which at times does result in a rather big difference from the one calculated by the random forest
Since my data is right skewed, you're referring to a logarithmic transformation of the input?
This might be difficult, since it contains many 0 values.
Is there no other way?
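On the zeros specifically: one commonly used workaround is `log(1 + x)`, which is defined at zero. A small sketch with invented amplitude values follows. Whether this transformation is appropriate for the analysis is a statistical judgment that the code does not settle.

```python
import math

amplitudes = [0.0, 0.0, 2.5, 10.0, 80.0]  # made-up percentages, including zeros

# log1p(x) = log(1 + x): defined at x = 0, and it maps 0 to 0.
transformed = [math.log1p(a) for a in amplitudes]
print(transformed)

# expm1 is the inverse, for reporting on the original scale.
restored = [math.expm1(t) for t in transformed]
print(restored)
```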
SVM seemed to work, what did you mean by scaling poorly?
hello i have a quick question about interacting with HuggingFace API. you think here is the place to ask or some other room more appropriate ?
Yeah it doesn't need to be logarithmic. You can also take arbitrary polynomials https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html or even splines https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.SplineTransformer.html#sklearn.preprocessing.SplineTransformer
If you scroll down to the docs there's typically examples of how to use them.
SVMs are possibly the most powerful model there is, but they're not spoken of / used a lot because they don't work if you have a lot of data. You don't, so it's perfect here. Be sure to tune its parameters as well though
what about xgboost
We'd have to run a benchmark. The appeal of SVMs is that they're a universal approximator (contingent on tuning, which just needs 2 hyperparameters). Gradient boosting is a different beast, lots of dials you can turn.
In practice I always use Xgboost, CatBoost or HistGradientBoostingClassifier and I rarely use SVMs because:
- Gradient boosting performs really strongly with 0 hyperparam tuning, same can't be said for SVMs. I rarely can be bothered to tune them in reality.
- I rarely have datasets small enough for support vector machines not to OOM.
On smallish datasets in the past I've run into situations where I couldn't outperform SVMs with gradient boosting tho
Xgboosting is a lot easier to implement than data adaption via polynomials or splines.
However, now I have GLM, random forest, SVM, xgboosting, gradient boosting from sklearn and decision trees - how do I figure out which of all those p-values is the correct one?
p-values? What are you doing?
I'm trying to perform a multivariate analysis on my data, measuring the impact of something like age/time/temperature on amplitude values (given in percentage, hence the 0s).
Using these systems of course I get MSE values and have calculated a p value using a t test in the residuals.
I'm trying to publish a manuscript in medicine and the readers are rather fond of p values
Okay, that's a very important detail.
I'd use a GLM then as well. I'd ask this question to a statistician as well, not people doing ML.
I have a code
Can anyone look over it?
It is reinforcement learning and project is balls in bins
I am on b
The problem is with the nonlinearity of my data. All attempts of adapting the code have resulted in endless errors, so I was looking for an alternative
Do you think a SVM would work here?
do you have some scientific model in mind, or are you just guessing it's nonlinear because the model didn't seem to fit well?
p-values are problematic for several reasons, but i agree you are definitely looking for a more "statistical" approach
it sounds like you're having trouble with the code, but it also sounds like you're struggling with the modeling. not a good combination, it can be overwhelming and hard to figure out what your problem is
so i suggest first figuring out some kind of modeling strategy first, and then getting it to work in code
you'll have to do that iteratively. start with a modeling strategy, then get it working in code. if the model is no good, repeat. don't try to do both simultaneously if you aren't confident with the code.
in statistical inference it's usually good practice anyway to have a model in mind before trying to fit a model. otherwise you end up just digging around for results, and that's how you get spurious invalid non-replicable results
How can I convert a pytorch tensor from 1D to 2D?
Effectively, I have a single record which is a 1D tensor, but the model expects a batch, making it 2D which is shape of (N, 25) so how can I convert my 1D to be effectively a [[1, 2, 3...]] instead of a [1, 2, 3...]
.reshape(-1, 25)
Or .reshape(1, -1)
Negative one gets solved for whatever integer completes the product
ah
I see, let me try this
I had (1, 25) originally
ah, I think I might be dumb and forgetting my input is actually already 2D and it wants 3D
I forget this is a GloVe model rather than 1D embeddings
You can also do (1, 1, -1)
I'm on mobile or I'd explain better
yeah, I got it working, thanks!
Two questions:
- How can I add only the highlighted axis lines?
- How can I add padding to the axis labels?
found how to add padding, just need 1.
import numpy as np
import matplotlib.pyplot as plt
x, y, z = np.loadtxt(fname = 'data.csv', unpack = True, delimiter = ',', skiprows = 1) #load data
ax = plt.axes(projection='3d')
ax.scatter(x, y, z, cmap = 'turbo', s = 25, c = z, edgecolors = 'black') #creates a 3d scatter plot
plt.title('3d scatter plot of data.csv')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.colorbar()
plt.savefig('data.png', dpi = 300)
plt.show()
why won't my colour bar show
i get
raise RuntimeError('No mappable was found to use for colorbar '
RuntimeError: No mappable was found to use for colorbar creation. First define a mappable such as an image (with imshow) or a contour set (with contourf).
NVM I FIXED IT
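The fix itself was not posted. One common cause of this error is calling `plt.colorbar()` with no arguments after drawing through `ax.scatter`, so pyplot has no current mappable to use. A sketch of the usual remedy, assuming that is what happened here, is to keep the object returned by `scatter` and pass it explicitly. The random data is a stand-in for `data.csv`, and the `Agg` backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x, y, z = rng.random((3, 50))  # stand-in for the three columns of data.csv

fig = plt.figure()
ax = fig.add_subplot(projection="3d")

# Keep the return value: it is the "mappable" that the colorbar is built from
sc = ax.scatter(x, y, z, c=z, cmap="turbo", s=25, edgecolors="black")
cbar = fig.colorbar(sc, ax=ax)
cbar.set_label("z")

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
```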
I was told to perform a multivariate analysis on my data and so I tried. According to the scatterplot and histplot it is non-linear data (e.g. integer grades from 1-6 on the x axis, with amplitudes on the y axis). It's not possible to fit a valid regression line through it, hence I'm looking for an alternative to GLM
what do you mean by "valid regression line"?
I've found that even on small datasets LightGBM usually beats SVM hard without the arduous tuning process
like n~=500
but of course it all depends on the specific use case
I don't really have any motivation to use SVM's anymore though
they are slow and usually don't work well without tuning which wastes even more time
am I wrong for dismissing SVMs?
Tuning SVMs is not hard, it's 2 hyperparameters
it's not hard but it can be tedious
How?
Then you'd love them. Just make a grid of parameters on a log scale and search
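A rough sketch of that log-scale grid search with scikit-learn. The chat does not name the two hyperparameters; this assumes an RBF-kernel `SVC`, where they would be `C` and `gamma`. The synthetic dataset and the grid ranges are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small synthetic problem so the search finishes in seconds
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Candidate values spaced evenly in log10, as suggested in the chat
param_grid = {
    "svc__C": np.logspace(-2, 2, 5),      # 0.01 ... 100
    "svc__gamma": np.logspace(-3, 1, 5),  # 0.001 ... 10
}

# Scaling inside the pipeline keeps each CV fold free of leakage
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
search = GridSearchCV(model, param_grid, cv=3)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

With 25 grid points and 3 folds this fits 75 models, which is where the "grab coffee" part comes from on larger datasets.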
XD
Grab coffee while it's running
it's slow to iterate
I had a problem where an SVM took like 50x longer to fit than LightGBM and it performed way worse
and then to tune it on top of that?
How much data did you have?
They're pretty much invariant to the number of columns. At least, a lot more so than other methods.
That's actually the main appeal of the method
As mentioned, the reason why people don't use them is that you need to compute the kernel matrix, which is an N x N data structure
my LightGBM estimates were actually correlated with the test data too and the SVM formed a cross on the scatterplot XD
I had never seen anything like that before
The datapoints are scattered widely, you just can't fit a regression line through it. Imagine x axis 1-6 and y axis 0-2.5. The regression line would be horizontal
Using 32-bit floating point you can compute the upper limit of how many data points you can use with SVMs based on your RAM, it's quite low
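That back-of-the-envelope limit can be sketched as below. It assumes a dense N x N kernel matrix stored in full at 4 bytes per entry and ignores everything else held in memory; solvers that cache only part of the kernel matrix can go further. The function names are made up for this example.

```python
import math

def max_points_for_kernel(ram_bytes: int, bytes_per_entry: int = 4) -> int:
    """Largest N such that a dense N x N kernel matrix fits in ram_bytes."""
    return math.isqrt(ram_bytes // bytes_per_entry)

def kernel_matrix_gib(n: int, bytes_per_entry: int = 4) -> float:
    """Size in GiB of a dense n x n kernel matrix."""
    return n * n * bytes_per_entry / 2**30

ram = 16 * 2**30                       # 16 GiB of RAM
n_max = max_points_for_kernel(ram)     # 65536 points
print(n_max, kernel_matrix_gib(100_000))  # 100k points need about 37 GiB
```

Memory grows with the square of N, so doubling the RAM only buys about 41% more data points.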
SVM y_hat vs y_test plot be looking like +
admittedly without tuning
but when I saw that I didn't have any motivation to tune it when the boosting model already was better
Yeah, that's the issue
.
Is there a reason why you don't use LightGBM?
I also use LightGBM. LightGBM, XGBoost, CatBoost, HistGradientBoosting, ...
I found this new thing called Natural Gradient Boosting out of Stanford
I thought it was really interesting but the variance estimates aren't that great compared to dedicated variance models like GARCH
one of the appeals is it estimates mean and variance at the same time
NGBoost: Natural Gradient Boosting for Probabilistic Prediction.
these gifs sold me
but then it didn't seem to really be great on large n data anyway vs the others
also it performs worse on the scale parameter than machine learning algorithms regressing y^2 on lagged y
for time series data at least
I tried using it in the same way I would use LightGBM or a RandomForestRegressor and it did comparably but the allure was not having to have a separate model for the variance with this approach and that didn't pan out
(it drastically underestimated true variance)
is there a way to run quantization in transformers on an amd gpu
You should be able to do it on any gpu with CUDA.
AMD only has rocm
Then no.
oh
would there be a way to quantize the model on, e.g., google colab and then download it to my system?
Yes
ah
I am very disappointed that people don't demand an alternative to Nvidia
are we just going to keep lining Jensen's pockets forever
There should definitely be more options, yeah
I think Intel is ironically probably closer to supporting more GPU compute than AMD last I checked
idk what AMD is doing
yeah
only disadvantage of an amd gpu
the price and performance is amazing but no development stuff
I think they overfocus on the budget and gaming markets
prob bc they figured there's no way to beat Nvidia at CUDA
yea lol
but they seem to just chronically underinvest in ROCm
i think i might get intel next, oneapi seems cool
BROO my google colab environment reset
Now i have to upload llama 2 7b AGAIN
i hate this
this is why I just bit the bullet and made my own ML workstation although being able to do that makes me a bit privileged
Tbh, even after you fine-tune llama2-7b on colab, you probably won't even be able to deploy it locally without a gpu
Does it have cuda
bruh
Because if you don't have cuda, that's the same as not having a GPU as far as ML is concerned
Hmm, like what?
I'm not doubting you. This is just the first I've heard
there were lots of caveats
ive used tensorflow with rocm before i think
and only the most popular and basic algorithms had any support
wait but i also need my gpu to be recognized in wsl
I had some surplus AMD gpus
and I wanted to see if I could use them in a machine of redundant parts
to get some value for ML purposes
the project was aborted when I realized how limiting ROCm is
but it's not zero
my gpu in wsl is a Microsoft Corporation Device 008e
so i need to make it recognize my amd gpu
is using WSL worth it if you have a L1 hypervisor Linux VM or baremetal
I still haven't used it
ok well i finally have my llama 2 7b back
but i have 1gb left
ayyy its loading shards
i just hope it doesnt take disk
it works guys
glad it worked out for you
thanks!
hey can anyone help me ?
need a data base containg real resumes for a model cant get any data set for it can you all upload some on a form that i will share ?
This is not really AMD's fault in any way, you can do ML just fine on an AMD GPU. The issue is that everyone has locked into Nvidia by writing everything with CUDA. AMD has been working on conversion layers that let you effectively run CUDA on AMD GPUs. They do work, but do not have everything implemented. If the open source community wanted to, they could implement everything with OpenCL or Vulkan and then we would not be GPU vendor locked.
AMD has NOT done the marketing or outreach
most people don't even know ROCm exists
it is totally their fault
Stelercus didn't even know it did anything at all
that's AMD's fault
They have not actively pursued the market, but it's totally doable.
(Until recently)
you are blaming the coding community for not doing the thing but AMD didn't really try to make it happen either
they could have invested more into subverting NVIDIA's dominance but they shied away to focus on gaming and budget computers
I'm not blaming anyone, they just wrote their stuff for CUDA, and now they are locked in because it's a lot of work to rewrite it all.
I think ROCm also made some weird design decision
where everything has to be an atomic operation
whereas in CUDA that isn't the case
at least that was what was going on last time I used it
Nvidia did give them more support, which was the whole plan. Nvidia knew that they had an opportunity here, and they took it.
Does not even need ROCm, I would prefer something cross platform, like OpenCL.
ROCm was very ambitious I think
(Then we can even do FPGAs)
trying to make it so you can use an array of any kind of different GPUs
and to make that work they had to enforce atomicity
but CUDA code isn't written like that
so not only do they provide less resources but they raised the bar
so many problems with that project
ROCm has many problems, I don't think it should be used, but there are other options that work fine.
to be clear I think ROCm's way is better in theory
but in practice it will never get there
like GNU Hurd
Yes, in theory, but it's AMD and AMD flops when it comes to SDKs.
(Even in gaming)
Nvidia does now have a lock-in monopoly on deep learning, but AMD also did not really care / cared too late. And Intel is just doing their thing, not sure what they are really going for.
Many ML libs used to have OpenCL actually for a while.
I know ROCm exists and stel probably as well but the issue remains if you want to deal with that uncertainty of it being runnable or not on AMD
OpenCL would be really nice to have again, since it's any GPU, CPU, or even other things like FPGAs.
If you get an NVIDIA card you know ahead of time it'll run
If you go down the AMD ROCm route sooner or later you'll hit a brick wall. This can be after 1 day or after 1 year.
intel has something similar to nvidia's chokehold through mkl
lots of computing software runs better on intel processors thanks to it
Stel actually said he didn't know ROCm was usable at all
which to be fair is a close approximation
xD
The best option is one of those CUDA-ROCm layers, like the one torch has.
I forgot their acronym, it's part of that whole family of HPC stuff.
I went for an AMD CPU with an NVIDIA gpu
At least Intel does not intentionally make the other options run slower... (Nvidia intentionally crippled their OpenCL drivers back when OpenCL was still 1.0 to get everyone onto CUDA; now they don't care and it works ok).
(I remember getting an Nvidia GPU that said it supported OpenCL on the box! But it did not)
don't they? until recently, anything using mkl and detecting it's not running on intel would immediately turn off avx2 even if it was available
that would make things like matlab chug on amd processors
Damn is that why I can't run avx on our servers
it could be
I'll check tomorrow
you may need to replace mkl with openblas
That is a bit different, basically Intel has paid for features that are unlocked. Nvidia has this too, but it's not the same as saying you support something only to intentionally make it slower so your thing (CUDA in this case) looks better.
It's still there, still works fine, still is heavily used.
Just deep learning specifically uses CUDA for everything.
Because the popular frameworks are built on it.
They almost got ARM, so they are pretty much as mega-giant a corporation as it gets.
it's not the size it's the lack of competition
you can be a huge conglomerate in every market and that's fine
it's when you maliciously take control of one market
like Intel and NVIDIA try to do
Yeah, the problem is that computer hardware is really hard to get into.
and MSFT
To make HPC stuff.
intel arc to the rescue
I understand why that was the case in the past
And it does not help that nobody has interest in adopting anything else, the advice is still just "buy an Nvidia GPU."
but we are in a new era
that demands more competition
this isn't bleeding edge anymore it's mainstream
the industrial organization has become a major hindrance
Nvidia just captures all the economic surplus
Computer hardware is very complicated, involves geopolitics.
(Since it directly translates to military power)
they should simply force Nvidia to split its software division off from the hardware one
Probably, and split in other ways.
I think it's amazing they got everyone worried about SkyNet instead of their massive monopoly. you gotta wonder if that's a PR campaign
"don't look here, look THERE"
anyway this is why I've long been a skeptic of intellectual property
if everyone could freely use available knowledge we'd all be better off
Guys I've gone through the introduction to ml andrew ng course and have read the data science first principles with python book. What would the next step be? Should I learn more statistics and then read the statistical learning with python book or what should I do, to deepen my knowledge? Perhaps I should learn more mathematics overall?
try to do projects and try to learn how to improve their results
it's always best if you have something to work on
"During a gold rush, sell shovels."
That's true
Alternate reading and doing. I highly recommend getting on www.kaggle.com and doing the "basic" courses there and the basic projects.
Yeah that's true I should keep learning and improving my knowledge while working on things and try to implement my learnt knowledge
having something to work on gives you material to self-direct your education in a way that is most meaningful to you, raises important questions that often won't arise in coursework, and gives you more satisfaction (at least in my experience) and therefore motivation
exactly
What are the tradeoffs of using a binary tree vs. an n-ary tree for decision tree classifiers?
Are most decision tree classifier binary trees?
Hello, how can I create a text-to-text chatbot using pytorch and a dataset consisting of questions and answers? The chatbot should respond to the questions asked (it is going to be a mental health chatbot specifically). I am new to pytorch and I cannot figure out how to do it.
is there any problem that an n-ary tree can solve that a binary tree can't?
What if you have a problem where only one split actually increases information?
you have to be careful with mental health but I'm sure you know that
I don't know.
It seems to me like the tradeoff depends on whether more than one split per node is worth it
and you will only know that for a specific problem
I don't think it will end up mattering
but I might be wrong
my intuition says that it will collapse into the same solution and may perform slightly better or worse
the tree will be more shallow
so if you would have otherwise run into the depth limit that might be avoided which will lead to more total calculations running
that's my best guess
I think it's a bit easier to reason about a binary tree
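On the "can an n-ary tree solve anything a binary tree can't" question: for threshold splits on a numeric feature, a multiway node can always be rewritten as a chain of binary nodes, at the cost of extra depth. A toy sketch follows; the tuple-based tree encoding and the thresholds are invented for illustration and are not how any particular library stores trees. For what it's worth, scikit-learn's decision trees are binary.

```python
# A node is either a leaf label (str) or a pair (thresholds, children):
# children[i] receives x below thresholds[i]; the last child takes the rest.
nary_tree = ([2.0, 4.0], ["low", "mid", "high"])          # one 3-way node
binary_tree = ([2.0], ["low", ([4.0], ["mid", "high"])])  # two stacked 2-way nodes

def predict(node, x):
    """Walk from the root to a leaf and return its label."""
    while not isinstance(node, str):
        thresholds, children = node
        # index of the first threshold x falls below, else the last child
        idx = next((i for i, t in enumerate(thresholds) if x < t), len(thresholds))
        node = children[idx]
    return node

def depth(node):
    """Number of split levels between the root and the deepest leaf."""
    if isinstance(node, str):
        return 0
    return 1 + max(depth(child) for child in node[1])

for x in [0.0, 1.9, 2.0, 3.5, 4.0, 10.0]:
    assert predict(nary_tree, x) == predict(binary_tree, x)

print(depth(nary_tree), depth(binary_tree))  # the binary version is one level deeper
```

This matches the intuition in the thread: the same partition, with the n-ary tree being shallower, which matters mostly when a depth limit is in play.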
I'm currently fine-tuning/training a sentiment analysis model DistilBERT with a dataset with 60,000+ entries using K-Folds, is it normal for it to take a while? (i tried 36,000+ before it took 1 hr and 30 minutes for 1 epoch alone) Just using my personal computer to run the training code.
does anyone know of a somewhat frequently updated retail commerce prices paid dataset that's available at a reasonable cost or free
I imagine that this data is extremely valuable so the answer is probably hell no
hello everyone, has anyone tried prefect-flow for etl pipelines?
It's quite common for it to take time if you're doing this on a CPU. However, if you'd like to speed up things, try using a GPU if possible.
If you're using PyTorch, it's even easier to push your model and data to GPU, and do your fine-tuning from there.
i see, i just finished my data set with 60k entries a single epoch took 3 hrs and 11 minutes altho not really in a rush to train this. will look into trying using my gpu.
trying to do BERT without a GPU is a fool's errand
Does anyone know if the chatgpt dan prompt still works
This is not actually true
The nuance is that you want to train on GPU yes, but doing inference on CPU makes a lot of sense. BERT only needs 400 MB ram or so.
Alright! It's still a statement I'd try to nuance as much as possible. It's kind of important that beginners especially know CPU inference is possible with decent latency
GPU stuff just costs so much more so it's a good one to know. Base Bert takes like sub 150ms but you can get it to sub 10 if you try hard.
Any example of something that outright won't work on CPU?
I think anything that works on GPU can as well work on CPU. The major question would be, at what computational cost?
Some tasks are better off done on GPU than CPU (and vice versa)
Ah ok
thanks. My calculus practice is pretty strong since my last year of hs has it in the syllabus. Also should i look in statistics too as a maths fundamental??
the short answer is yes. the long answer is yeeeeeeeees. almost everything in ML and data science either straight up IS statistics, or involves it in some way. it's easier to pick up after calc and linalg though
See the pins for some reading material but I'm just advising on how to succeed as an undergraduate studying CS: the main topics are Calc, Linear Algebra, Statistics, and Discrete Math. Some programs let you do 3-4 semesters of Calc OR 2+Linear; you should choose linear. After undergrad, there's a separate question of preparing for data science which other folks can answer
from a "preparing for college" perspective: doesn't really matter what you study. From a learning-something-interesting-and-fun perspective, I'd say stats, no question.
Hello everyone, I'm currently using detectron2 for object detection but I'm having a problem when it comes to predicting: it gives me a wrong prediction. How can I solve this?
Ohh okie
is there a way to download a 4bit converted model from colab
thanks
when i try push_to_Hub (huggingface) it says "NotImplementedError: You are calling save_pretrained on a 4-bit converted model. This is currently not supported"
what can be other ways to deal with that loss instead of changing the generator architecture?
i mean i will improve the generator but is there something else i can do?
Does anyone have any suggestions on how to start learning machine learning, please? Should I simply go to youtube?
Machine learning is where you optimize a mathematical function based on data. So it's all applied math. How well do you understand differential calculus, probability and statistics, and arithmetic using vectors/arrays/matrices?
I think I am capable of the math required for machine learning as I did computational finance for my master degree
does that include at least all of the things I said?
thanks for replying btw
Yes, I know all the maths
okay, because I wouldn't expect that degree to include linear algebra. but I wouldn't know, either.
there are these three textbooks: #data-science-and-ml message
and then there are more suggestions on our website
!resources data science
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
Adding to what pope Stelercus said, you can also check https://Kaggle.com/learn
Practical data skills you can apply immediately: that's what you'll learn in these no-cost courses. They're the fastest (and most fun) way to become a data scientist or improve your current skills.
Guys, besides gradient descent, what other methods are there to find the best fitting line for a data set? Why is gradient descent so popular, is it the "best" and most effective method?
Also is it any helpful that as I go thru different statistics concepts i would try to implement them in python?
C? Data Science? Why not?
not sure what you mean, if this was meant for me
No. I just wondered if that would just be complete nonsense.
like, can the economy support 40,000 devs? I do not know
Yes. Honestly, I took calc 1-3 before I took stats and barely paid attention and got an A, and I literally had to relearn everything. Yes you should do that.
Alright, thanks- Any good statistics book you've read?
No, just went to school. Uh, actually, I cannot remember the name of the youtube channel, but they wrote a book called something like 'Intro to statistical learning' or something. I read that entire text and watched all of their videos. Also, a couple of other good channels: Data School, Statquest, Zedstatistics
That is all you need really
Yeah thats this book: https://www.statlearning.com/
Gotcha thanks very much
yes. Damn.
Mine was in R. Damn, that was my introduction to machine learning years ago
yeah there are two versions, I think I'll start with the R one
R is just so bad and I am not saying that because this is a python server
but the other one which is "...applications in python" is literally the same but in python so the content of the book mathematically does not change
Is it? I havent actually tried it yet, but ive heard people dont really like it
I mean its kinda general to say this, but in the book "data science first principles in python" he said that he doesnt like r and he would rather focus on python
That text just changed my life in such a drastic way, That hit me hard.
Jeez, thats not too positive, you think i shouldnt get into it?
hey there guys, i need a little help with computer vision for a hackathon. it's probably a basic thing to work around, but i've got no experience with computer vision or such libraries, so i'd be really grateful if someone can help me with it. it's basically detecting the gun at the first stage; at the second stage, detect the gun and the person holding it and capture the person; in the third stage we need to detect if the person is holding it in a firing stance or holding it neutrally. i'm done with the first stage of the problem but can't figure anything out from there on. i'd be really grateful if anyone can guide me towards the second stage at least
How was that not positive?
I meant it in a good way
Gradient Descent isn't the only method. You can use OLS as well.
Although some people would argue that OLS isn't an optimization algorithm but an estimation technique. However, OLS and Gradient Descent find the line of best fit using different approaches.
Most of those statistical concepts are already implemented in a lot of libraries like SciPy, statsmodels, scikit-learn, PyTorch, TensorFlow etc... So yeah it's cool to implement them in Python.
You might find the latest edition of ISL very useful since it is in Python.
I guess my only letdown is that it doesn't cover conformal prediction yet. Hopefully they'll add a chapter on it in a subsequent edition.
The free pdf Version of ISL can be downloaded via https://www.statlearning.com/
When it comes to stats and better viz, leave it for R. I was taught R in school but I left it and moved to Python (I don't even have any concrete reason. I guess I find python more 'customer-friendly')
I don't know, I used to use R a lot but learned more in python because there was a snobby Data Science community on a different discord server that just spit on R
Reading that text and knowing everything they are talking about makes me want to cry. Like, I don't know. The hardest course I ever took was an optimization grad class which was so hard that I suggest that no one does it. I don't, lol. I was shooting heroin two years ago for five years. I was sick of that life and literally replaced it with DS when I decided to go to grad school and get my masters. I was so inspired by that R version of that text. It is hard to explain how much that means to me. Really nothing means more to me than that. And the person who introduced me to this. Your mind is powerful and you can do whatever you want if your conviction is true. No one has ever failed when they genuinely tried. Sorry for that long text, just had to say that. You can do whatever you want and your mind is reality.
The reason why gradient descent works so well is people prioritize loss functions that are convex, basically U shaped. If the optimization surface has this shape you can easily use gradients to iteratively move to the "bottom" of the U, that's where the derivative is 0, that's the idea.
Now, there are equivalences between doing this and maximizing the likelihood (equivalently, minimizing the negative log-likelihood). Maximising the likelihood is basically choosing the weights such that P(y|X) is as high as possible. In words, "the probability of observing the target variable given the data is as high as possible". As emyrs said, there is also an equivalence between gradient descent, maximum likelihood and certain matrix decomposition methods that come out of linear algebra. OLS for instance gives you a solution that 1) maximizes the likelihood (assuming Gaussian noise) 2) has a gradient of 0 with a closed-form solution.
Last but not least, there are fun touchpoints with computer science. (Stochastic) gradient descent is very efficient in that it doesn't use a lot of memory, so it scales well to large datasets. If your dataset is small you can use second derivatives, conjugate gradient, BFGS and so on. Matrix decomposition also uses way more memory and thus doesn't scale as well to large datasets. Also good to know: there are algorithms like SVM whose "problem" is convex but comes with constraints, so they use more exotic things like quadratic programming.
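To make the OLS versus gradient descent equivalence concrete, here is a small sketch on synthetic data. The true slope and intercept, the noise level, the learning rate and the step count are all arbitrary choices for illustration. Both routes minimize the same mean squared error, so they should land on the same line.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1, 1, size=n)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=n)  # noisy line: slope 3, intercept 0.5

X = np.column_stack([x, np.ones(n)])  # design matrix with a bias column

# Route 1: closed-form least squares (matrix decomposition under the hood)
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Route 2: plain gradient descent on the mean squared error
w_gd = np.zeros(2)
lr = 0.1
for _ in range(5000):
    grad = (2.0 / n) * X.T @ (X @ w_gd - y)  # gradient of mean((Xw - y)^2)
    w_gd -= lr * grad

print(w_ols, w_gd)
```

The closed form needs the whole design matrix at once, while the loop only needs a gradient, which can be estimated from a mini-batch. That is the memory argument made above.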
Thanks very much. In the book, are the statistics concepts explained or are they just presented with python? The reason I'm asking is whether I have to read a statistics book first to get started with it.
Thanks very much
i would round this out by mentioning that only linear least squares is this nicely behaved
if you formulate the maximum likelihood problem using a neural network, for example, the cost function is not convex. in these cases, the solution you get from gradient descent depends on how close you were to a particular local minimum or saddle point
does anyone know why I'm getting negative eigenvalues? ```py
import numpy as np

def sqrtOfMatrix(data):
    # eigh assumes a symmetric input and returns eigenvalues in ascending order
    eigenValues, eigenVectors = np.linalg.eigh(data)
    assert (eigenValues >= 0).all(), "Matrix should be positive semi-definite"
    # V diag(sqrt(w)) V^T; the transpose on the right is required
    return (eigenVectors * np.sqrt(eigenValues)) @ eigenVectors.T

matrix = np.random.randn(5, 5)
matrix = matrix - matrix.mean(axis=0, keepdims=True)
covMatrix = matrix.T.dot(matrix)
sqrtMatrix = sqrtOfMatrix(covMatrix)
```
this book, these matrices, however, will be assumed to be positive definite. In view of this
assumption, these matrices will also admit their respective inverses."
you're looking for the sqrt of covMatrix, not matrix
the original matrix you made has no special properties other than having only nonnegative entries
hey can anyone give me some nlp project ideas?
Thank you, changed the code accordingly but it still produces the same error. I assume that it has to be numerical imprecision. I assume that this is because the majority of the variance is explained by the first principal component, so that eventually the magnitude represented by the eigenvalues becomes so small that it virtually becomes zero. Maybe adding a regularization term will do?
yeah it seems to have worked, but if there's a better way than just outright increasing the determinant, let me know ```py
import numpy as np

def sqrtOfMatrix(data):
    # small diagonal loading; builds a new array instead of mutating the caller's
    data = data + np.eye(len(data)) * 1e-12
    eigenValues, eigenVectors = np.linalg.eigh(data)
    assert (eigenValues >= 0).all(), "Matrix should be positive semi-definite"
    # V diag(sqrt(w)) V^T; the transpose on the right is required
    return (eigenVectors * np.sqrt(eigenValues)) @ eigenVectors.T

matrix = np.random.randn(5, 5)
matrix = matrix - matrix.mean(axis=0, keepdims=True)
covMatrix = matrix.T.dot(matrix)
sqrtMatrix = sqrtOfMatrix(covMatrix)```
can you show an example of the eigvals you get? you can always make them arbitrarily large by adding a scaled identity matrix
random matrices should have exponentially decaying eigenvalues, off the top of my head. there should be some papers discussing this... at least for the case of matrices with gaussian entries. then when squaring, you get a rather poor condition number unless you load the main diagonal
[-1.84880995e-15 1.78080773e-01 1.89576151e+00 4.99760224e+00
1.34696135e+01]
do you have any resources which i can use to read up on that?
i'll have to look around. as for your example, notice that the largest and smallest eigenvalues are roughly 1e16 apart, which puts the dynamic range on the order of the machine epsilon. the smallest eigval is 0 to within machine precision
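One more detail worth spelling out: subtracting the column means from a 5 x 5 matrix makes its rows sum to zero, so its rank is at most 4 and the Gram matrix has one eigenvalue that is exactly 0 in exact arithmetic. Round-off can push that eigenvalue slightly below zero. As an alternative to diagonal loading, a common approach is to clip eigenvalues that are negative only within a tolerance. The sketch below assumes that is acceptable for the use case; the helper name and the relative tolerance of 1e-12 are illustrative choices.

```python
import numpy as np

def sqrt_psd(mat: np.ndarray, rtol: float = 1e-12) -> np.ndarray:
    """Symmetric square root of a numerically positive semi-definite matrix."""
    w, v = np.linalg.eigh(mat)
    tol = rtol * np.abs(w).max()  # tolerance relative to the largest eigenvalue
    if w.min() < -tol:
        raise ValueError("matrix has a genuinely negative eigenvalue")
    w = np.clip(w, 0.0, None)     # round-off negatives become exact zeros
    return (v * np.sqrt(w)) @ v.T

rng = np.random.default_rng(0)
m = rng.standard_normal((5, 5))
m -= m.mean(axis=0, keepdims=True)  # centering drops the rank to at most 4
cov = m.T @ m
root = sqrt_psd(cov)

print(np.linalg.eigvalsh(cov)[0])   # smallest eigenvalue: zero up to round-off
```

Unlike adding a scaled identity, clipping leaves the other eigenvalues untouched, and the check still rejects matrices that are truly indefinite.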
section 3 here https://link.springer.com/content/pdf/10.1155/2007/71953.pdf and this other paper https://arxiv.org/pdf/2101.02928.pdf may shed some light
the high level idea being that, even in the best case (depending on the distribution), you will only get a diagonal covariance for infinitely long vectors or with infinitely many realizations averaged out. in the finite case this means the vectors in the matrix are not orthogonal, which will have an impact on the eigenvalues
Much appreciated
\o/ For the first time, I have actually connected a training dashboard to my AI runs so I can actually see what the model is doing, neptune is cool, but i'm wondering if MLFlow is a better cheaper alternative 
Has anyone used the XLMR bert model before btw on larger datasets (800k+ points)? my loss seems to be higher than I expected and it doesn't really seem to be decreasing, and I can't quite work out if I should stop it or not...
It has early termination setup, but you can see the loss is changing quite aggressively
I think part of the reason might be the dataset itself isn't shuffled (which maybe I should do)
so very similar pieces of text are already likely part of the same batch
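A minimal sketch of shuffling texts and labels together so that similar examples no longer sit in the same batch. The toy lists stand in for the real dataset. Many training frameworks also offer this as an option, for example `shuffle=True` on PyTorch's `DataLoader`.

```python
import random

# Stand-in dataset, grouped by topic the way an unshuffled export might be
texts = ["a1", "a2", "a3", "b1", "b2", "b3"]
labels = [0, 0, 0, 1, 1, 1]

# Shuffle one list of indices and apply it to both lists,
# so every text keeps its own label
order = list(range(len(texts)))
random.Random(42).shuffle(order)  # fixed seed for reproducibility

texts_shuffled = [texts[i] for i in order]
labels_shuffled = [labels[i] for i in order]

print(list(zip(texts_shuffled, labels_shuffled)))
```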
any free gpt-like repo i can clone and use for basic chatting, like replying with true or false
"sun sets in west"
"is message "xxxxxx" a spam"
look at ollama with llama2
requires a reasonably decent machine to run the smallest models still tho
found this, they say you can use a model running on localhost as a replacement for the openai API too
yes there are lots of alternatives, mostly built on llama
that being said, you need a very big machine to run the bigger models
and realistically its the biggest models which are the ones more comparable to openai
wait very noob question but how do i try this out? ig i can use only the very basic with 8b
is llama2 different than ollama?
ollama is a tool for running llama and other llms
Install ollama then you can do ollama run llama2
which will run the 7B param model
thanks
it's too frank
is it supposed to be this frank out of the box lol?
ok is codellama better, almost as good as phind or chatgpt?
what are some cool things you guys are building/testing using these models?
Guys, I'm about to read the statistical learning with applications in python book. I'm just wondering whether I should first read a book about just statistics so that I can understand the concepts in that book? Or would it make sense to just read about general statistics and implement the concepts in python and then move on to the statistical learning book?