#Help !! Which statistical tool do I use? [for research, beginner]

107 messages · Page 1 of 1 (latest)

split blade
#

How can I use data about past occupied horizontal land area to create an equation that predicts the value 20 years from now? And which statistical tool is most appropriate? Thankss!! :3

teal kindle
#

For starters, linear regression

teal kindle
#

You have data points X and you want to predict Y, in the form <w, X> + b

split blade
#

what if the points are too scattered tho? that means i cant rely on the regression model to write an equation for prediction ryt??

teal kindle
#

Where w is a vector and < . , . > is the dot product

teal kindle
#

The goal is to find the plane that best fits your data

#

If you feel that a linear model is not appropriate, there must be something else that can be used
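
One rough way to judge whether the points are "too scattered" for a straight line is the coefficient of determination, R². Here is a minimal sketch in plain Python. The two datasets are made up for illustration, and `fit_line` and `r_squared` are hypothetical helper names, not something from this thread.

```python
def fit_line(xs, ys):
    """Least-squares fit of y ~ w * x + b for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b

def r_squared(xs, ys):
    """Share of the variance in ys explained by the fitted line (1.0 = perfect fit)."""
    w, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Made-up example data: one nearly linear set, one heavily scattered set.
xs = [1, 2, 3, 4, 5]
tight = r_squared(xs, [2.1, 3.9, 6.2, 7.8, 10.1])
scattered = r_squared(xs, [5.0, 1.0, 6.0, 2.0, 4.0])

print(f"R^2 (nearly linear): {tight:.3f}")
print(f"R^2 (scattered):     {scattered:.3f}")
```

An R² close to 1 suggests a line describes the data well. A value near 0 suggests the line explains very little, which is a hint to try a different model or different inputs.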

#

I'm just saying linear regression because it is fairly simple and beginner friendly

#

Otherwise in general you look for a feature map f so that you compute predictions Y ~ <w, f(X)> + b
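
To make the feature map idea concrete, here is a small sketch in plain Python. It assumes a single input and the hypothetical choice f(x) = x², with made-up data. Which f to use is a modelling decision, not something fixed by the method.

```python
# Made-up data that follows y = 2 * x**2 + 1 exactly, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 9.0, 19.0, 33.0]

def feature_map(x):
    """Hypothetical feature map f; here simply the square of the input."""
    return x ** 2

def fit_line(zs, ys):
    """Least-squares fit of y ~ w * z + b for one (transformed) input."""
    n = len(zs)
    mean_z = sum(zs) / n
    mean_y = sum(ys) / n
    szy = sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
    szz = sum((z - mean_z) ** 2 for z in zs)
    w = szy / szz
    b = mean_y - w * mean_z
    return w, b

# Transform the inputs first, then fit an ordinary line in the new feature.
zs = [feature_map(x) for x in xs]
w, b = fit_line(zs, ys)

def predict(x):
    """Prediction of the form w * f(x) + b."""
    return w * feature_map(x) + b

print(f"w = {w:.3f}, b = {b:.3f}, prediction at x = 5: {predict(5.0):.2f}")
```

The model is still linear in w and b, so the fitting step does not change. Only the inputs are transformed before the fit.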

split blade
#

we just had to state which tool we're gonna use

teal kindle
#

Two decades ago, I think they used SVMs and whatnot

#

Decision trees too

#

It may sound a bit daunting but machine learning offers many other methods of regression

#

That are mostly more flexible than linear regression

#

That being said, usually the simpler the model the better

split blade
#

i rlly dont know stuff so im sorry xDD but likeee

split blade
#

or nahh ?

#

our research topic rlly got us

#

making equations and allat

#

lol

split blade
#

inevitably we have to write one

teal kindle
#

For an SVM?

#

This is the complicated version

split blade
#

i'm def gonna consider ittt

#

in what ways is it better than just plain linear ?

#

i trust u thooo that it's more flexible n stuff : D

#

i just gootta know how to justify it to my group and my adviser too

teal kindle
#

So essentially, the idea is as follows

#

You have two random variables $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}$, you want to find out how they are related

#

So what you usually do is an estimation: $\hat{Y} = f(X)$, and you want your estimation not to be too far from $Y$

#

So the entire question is, how do you build $f$ ?

#

What we call a model here, you can think of as a set of assumptions about how $X$ and $Y$ are distributed and how they relate to each other

#

For instance, the linear regression model assumes that $Y = \langle w, X \rangle + b + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma^2)$

#

In other words, a linear relationship with X, plus a small error

#

So you obviously want to find the w and b that fit best

#

So, here, your estimation of $Y$ is $\hat{Y} = \langle w, X \rangle + b$

#

And this is a fairly simple model

#

because it's just linear

#

$\hat{Y} = f_{w,b}(X) = \langle w, X \rangle + b$
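
For the land-area question, a minimal sketch of this in plain Python could look like the following. It assumes a single input (the year), and the numbers are invented purely for illustration. `fit_line` and `predict` are hypothetical helper names.

```python
# Hypothetical survey data: year and occupied land area in km^2 (made-up values).
years = [2000, 2005, 2010, 2015, 2020]
areas = [10.0, 12.1, 13.9, 16.2, 18.0]

def fit_line(xs, ys):
    """Least-squares estimate of w and b in y ~ w * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is covariance(x, y) divided by variance(x).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b

w, b = fit_line(years, areas)

def predict(year):
    """Estimated area for a given year: w * year + b."""
    return w * year + b

# Extrapolate 20 years past the last observation.
prediction_2040 = predict(2040)
print(f"area ~ {w:.3f} * year + ({b:.2f})")
print(f"predicted area in 2040: {prediction_2040:.2f} km^2")
```

Keep in mind that predicting 20 years ahead is extrapolation: the line is only trustworthy if the past trend keeps holding.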

#

But in general, there are models that come up with more complicated estimations of Y

#

In which case, without loss of generality, you can denote them:
$\hat{Y} = f_{\theta}(X)$

#

Where $\theta$ is the parameter of your model, which you want to fit

#

Just like the line slope and intercept

#

Do you understand up to this point?

split blade
#

yahhh ey

teal kindle
#

Ok, that's very good

#

Because if you understand up to now

#

you basically understand machine learning

#

We make a model for the data, and we optimize the parameter $\theta$ so that it fits the data as well as possible
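
As a rough sketch of what "optimizing the parameter" can mean, here is gradient descent on the mean squared error for the linear case, in plain Python. The squared-error loss, the learning rate, and the step count are assumptions chosen for this toy example, and the data is made up.

```python
# Made-up data that follows y = 2 * x + 1 exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

def mse(theta):
    """Mean squared error of the model f_theta(x) = w * x + b."""
    w, b = theta
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(theta):
    """Partial derivatives of the MSE with respect to w and b."""
    w, b = theta
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db

theta = (0.0, 0.0)        # initial guess for (w, b)
learning_rate = 0.05      # assumed step size, small enough for this data

for _ in range(2000):
    dw, db = gradient(theta)
    # Move theta a small step against the gradient to reduce the error.
    theta = (theta[0] - learning_rate * dw, theta[1] - learning_rate * db)

w, b = theta
print(f"w = {w:.4f}, b = {b:.4f}, final MSE = {mse(theta):.2e}")
```

For linear regression there is also a closed-form solution, so gradient descent is not required there. The same loop carries over to models where no closed form exists.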

#

And here, support vector machine is one such model

#

with its own specific set of parameters $\theta$

#

And there are many other models, with their own sets of assumptions and their $f_\theta$

#

It just turns out that in linear regression, $\theta = (w, b)$ and that the estimation $f_\theta$ is fairly simple

#

That's why, if linear regression does not suffice, perhaps you need a better model

#

that isn't so simple, but may capture the variability in your data more accurately

split blade
#

yeahhh : D

teal kindle
#

And a couple of decades ago, before neural networks were popular, people liked SVMs a lot

#

But since there are more modern techniques for regression nowadays, SVM is mostly a good math textbook exercise

#

by no means is it bad, just to be clear

#

definitely an approachable method to look at if you just want to explore

#

but of course there are other things than SVM too

#

Gaussian mixture models

#

linear discriminant analysis (though mainly for classification)

split blade
#

ooooo~ i'll look into themm i can understand if i rlly tried xDD

#

nice knowing regressions r rlly the ones for thiss

#

i thought mayb i was geeking T__T

teal kindle
#

If my 10-minute crash course got through to you, then you basically understand what regression using ML is like

split blade
#

ooooo~ realll !

#

i get ittt

#

i think i got what i needed to know

teal kindle
#

I do advise you to talk with your teacher or professor about this too

#

Maybe they will have better insights to provide

#

I'm very biased because I do research in AI so there's that

split blade
#

thanksss brahh

teal kindle
#

You're welcome

split blade
#

stuff was fun gwahahahaha

#

i'll be sleeping now it's almost 4am in here xD

#

THANKS AGAINNN RAHHHHH