#Linear regression model code in jupyter with sklearn


foggy sapphire
#

Hi, wondering if anyone can help.

I'm trying to get my head around how to create linear regression models, and I'm getting some low scores. Could anyone show me some code and explain how to get better scores?

I'm using the KC housing data from Kaggle. Any help would be much appreciated.

foggy sapphire
#

In my R-squared result. I'm wondering which parameters I can change to increase it.

granite sedge
#

Could you show us what you've done?

#

And any plots of your data you've made?

foggy sapphire
#

import matplotlib.pyplot as plt
import seaborn as sb
import pandas as pd
import numpy as np
import sklearn
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn import metrics

dataset = pd.read_csv('kc_house_data.csv')

dataset.fillna(0, inplace=True)

for i in dataset.columns:
    dataset[i] = dataset[i].astype(float)

dataset['house_age'] = 2023 - dataset['yr_built']

dataset['reno_age'] = 2023 - dataset['yr_renovated']

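# Keep reno_age only for houses that were actually renovated (yr_renovated > 0); otherwise use 0.
# The earlier digit-count check also zeroed out single-digit ages (renovations from 2014 onwards).
dataset['reno_age'] = dataset['reno_age'].where(dataset['yr_renovated'] > 0, 0.0)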

dataset.drop(['yr_built', 'yr_renovated'], axis=1, inplace=True)

X = dataset.iloc[ : , 1: ]
Y = dataset.iloc[ : , 0]

X_train, X_test, Y_train, Y_test = train_test_split(X,Y,test_size=0.1, random_state=3)

regressor = LinearRegression()
regressor.fit(X_train, Y_train)

svregressor = SVR(kernel = 'linear' )
svregressor.fit(X_train, Y_train)

dtregressor = DecisionTreeRegressor()
dtregressor.fit(X_train, Y_train)

rfregressor = RandomForestRegressor()
rfregressor.fit(X_train, Y_train)

pred = regressor.predict(X_test)
predsv = svregressor.predict(X_test)
preddt = dtregressor.predict(X_test)
predrf = rfregressor.predict(X_test)
print(Y_test)
print(pred)
print(predsv)
print(preddt)
print(predrf)

r_square = metrics.r2_score(Y_test, pred)
r_squaresv = metrics.r2_score(Y_test, predsv)
r_squaredt = metrics.r2_score(Y_test, preddt)
r_squarerf = metrics.r2_score(Y_test, predrf)
print(r_square)
print(r_squaresv)
print(r_squaredt)
print(r_squarerf)

granite sedge
#

I can see a few issues with your code and your process

#

Have you done any explorations or visualisations of your data?

foggy sapphire
#

Nothing yet.

foggy sapphire
#

Where would you suggest I start with visualising?

granite sedge
#

Well I would first have a look at what columns you have and based on that, some data visualisations may be useful
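As a rough sketch of that first look at the columns, something like the following could work. The helper name `summarise_columns` and the tiny demo frame are made up for illustration, and the column names are assumptions rather than a confirmed match for your file; with the real data you would pass your own `dataset` instead.

```python
import numpy as np
import pandas as pd


def summarise_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, and number of distinct values."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "n_missing": df.isna().sum(),
            "n_unique": df.nunique(),
        }
    )


# Tiny made-up frame standing in for the real dataset (values and column names are assumptions).
demo = pd.DataFrame(
    {
        "price": [221900.0, 538000.0, 180000.0, 604000.0],
        "bedrooms": [3, 3, 2, 4],
        "yr_renovated": [0.0, 1991.0, np.nan, 0.0],
    }
)

summary = summarise_columns(demo)
print(summary)          # per-column overview
print(demo.describe())  # basic statistics for the numeric columns
```

Checking the missing counts before calling `fillna(0)` shows which columns that fill would actually change.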

#

For example, for numeric features, you may want to plot scatterplots to see if your data fits the assumptions for a linear regression
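A minimal sketch of those scatterplots, assuming the target column is called `price` and using synthetic stand-in data so it runs without the CSV. The helper name `scatter_against_target` and the column `sqft_living` are assumptions for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def scatter_against_target(df, features, target):
    """Draw one scatterplot per feature, with the target on the y-axis."""
    fig, axes = plt.subplots(
        1, len(features), figsize=(4 * len(features), 3.5), squeeze=False
    )
    for ax, feature in zip(axes[0], features):
        ax.scatter(df[feature], df[target], s=8, alpha=0.4)
        ax.set_xlabel(feature)
        ax.set_ylabel(target)
    fig.tight_layout()
    return fig


# Synthetic stand-in data: price depends linearly on sqft_living and not at all on house_age.
rng = np.random.default_rng(0)
sqft = rng.uniform(500, 4000, size=300)
demo = pd.DataFrame(
    {
        "sqft_living": sqft,
        "house_age": rng.uniform(0, 100, size=300),
        "price": 150 * sqft + rng.normal(0, 50_000, size=300),
    }
)

fig = scatter_against_target(demo, ["sqft_living", "house_age"], "price")
```

In a Jupyter cell the figure should render once the cell finishes. A roughly straight band of points suggests a linear fit is reasonable for that feature, while a curve or a fan shape suggests it may not be.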

#

Or if there may be multicollinearity
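One simple way to look for multicollinearity is to list feature pairs with a high absolute correlation. This is only a sketch: the helper name `highly_correlated_pairs`, the 0.8 threshold, and the synthetic columns are all assumptions, and pairwise correlation will not catch every form of multicollinearity.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(df, threshold=0.8):
    """List feature pairs whose absolute Pearson correlation exceeds the threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair appears once and the diagonal is skipped.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    return pairs[pairs > threshold].sort_values(ascending=False)


# Synthetic stand-in features: sqft_above is built to track sqft_living closely.
rng = np.random.default_rng(0)
sqft = rng.uniform(500, 4000, size=300)
features = pd.DataFrame(
    {
        "sqft_living": sqft,
        "sqft_above": 0.8 * sqft + rng.normal(0, 100, size=300),
        "house_age": rng.uniform(0, 100, size=300),
    }
)

pairs = highly_correlated_pairs(features)
print(pairs)
```

Pass only the feature columns here, not the target, since a feature that correlates strongly with the target is what you want.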

foggy sapphire
#

I have run sb.pairplot(dataset) and looked it over. Would you suggest dropping anything that doesn't show a linear relationship?

granite sedge
#

Not necessarily

#

You can always apply transformations to data if it's not linear
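As an illustration of what a transformation can do, here is a small sketch on synthetic data where the target depends on the logarithm of a feature. The data, the helper `fit_and_score`, and the choice of a log transform are assumptions for the example; which transformation suits a real column depends on what its plots look like.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: y is linear in log(x), not in x itself.
rng = np.random.default_rng(0)
x = rng.uniform(1, 100, size=500)
y = 3.0 * np.log(x) + rng.normal(0, 0.5, size=500)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)


def fit_and_score(transform):
    """Fit a linear regression on the transformed feature and return the test R-squared."""
    model = LinearRegression()
    model.fit(transform(x_train).reshape(-1, 1), y_train)
    predictions = model.predict(transform(x_test).reshape(-1, 1))
    return r2_score(y_test, predictions)


r2_raw = fit_and_score(lambda values: values)  # feature used as-is
r2_log = fit_and_score(np.log)                 # feature log-transformed first

print(f"R-squared with raw feature: {r2_raw:.3f}")
print(f"R-squared with log feature: {r2_log:.3f}")
```

Both scores are measured against the same untransformed target, so they can be compared directly. If you transform the target instead, the scores are on different scales and are harder to compare.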

#

It's kind of hard for me to suggest what you can do because I don't have the data nor can I see what you have/your outputs

#

Are you following any guides/tutorials for this project, or are you just trying it out yourself?

foggy sapphire
#

We have been asked the following:

#

Task 1: Regression Task (50% - 100 marks)
Create a regression model using algorithm 1 – state which algorithm and submit working source code (20 marks)
Create a regression model using algorithm 2 – state which algorithm and submit working source code (20 marks)
Report (60 marks) – you should use the points below as a guide, you may also write about different areas of what you have submitted regarding your code.
Showing accuracies of each model. Which did better? (10 marks)
Explanation of how the two models work (include references). Why did one perform better than the other for your dataset? (25 marks)
Suggest different methods to improve your models. E.g., Remove/add data? If so, which features would you build on? Think about visualising the data first and seeing which features do not help the prediction. (25 marks)
Task 2: Classification Task (predicting a class for new input) (50% - 100 marks)
Create a classification model using algorithm 1 – state which algorithm and submit working source code (20 marks)
Create a classification model using algorithm 2 – state which algorithm and submit working source code (20 marks)
Report (60 marks) – you should use the points below as a guide, you may also write about different areas of what you have submitted regarding your code.
Showing accuracies of each model. Which did better? (10 marks)
Explanation of how the two models work (include references). Why did one perform better than the other for your dataset? (25 marks)
Suggest different methods to improve your models. E.g. Remove/add data? If so, which features would you build on? Think about visualising the data first and seeing which features do not help the prediction. (25 marks)

#

I'm using the KC housing data from Kaggle for the regression.

#

The two models I was going to focus on are SVR and random forest.