Regression in PyCaret!!!

Accredian Publication
7 min read · Sep 25, 2020

Let us first understand what Regression analysis is.

Regression analysis is a statistical process that establishes a relationship between the dependent and independent features.
One of the most common forms of Regression analysis is Linear Regression.

→ Linear regression was the first type of regression analysis to be studied rigorously and to be used extensively in practical applications.
→ Linear Regression works on building a “Linear Relationship” between the independent and dependent variables in the data.
* When only one independent variable is present then the Linear regression can be said to be “Simple Linear Regression”.
* In the case of multiple independent features, the Linear Regression can be said to be “Multiple Linear Regression”.
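
To make the idea of a "linear relationship" concrete, here is a minimal sketch of Simple Linear Regression fitted by ordinary least squares in plain Python. The toy data and the function name `fit_simple_linear_regression` are illustrative assumptions and do not come from any library used in this article.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data generated from y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)
```

With a single feature there is one slope; Multiple Linear Regression generalizes this to one coefficient per independent feature.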

Also, check out our articles on:

Complete Guide to PyCaret
Classification using PyCaret
Anomaly Detection using PyCaret
Clustering using PyCaret

Regression Using Scikit-learn

→ We start by importing the necessary Libraries

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error,r2_score,mean_absolute_error
from catboost import CatBoostRegressor
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor
from sklearn.tree import DecisionTreeRegressor

→ Then we load the data and check for the presence of missing values

data = pd.read_csv('/contents/boston.csv')
data.isnull().sum()

Handling outliers is also an important task. However, for the sake of simplicity, I will skip that step here.

→ Then we separate the data into independent and dependent features and split it into train and test sets.

x = data.drop('medv',axis = 1)
y = data['medv']
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size = 0.3,random_state = 123)

→ Building Models

linear_model = LinearRegression().fit(x_train,y_train)
decision_tree = DecisionTreeRegressor().fit(x_train,y_train)
random_model = RandomForestRegressor().fit(x_train,y_train)
xgb_model = XGBRegressor().fit(x_train,y_train)
cat_model = CatBoostRegressor().fit(x_train,y_train)

→ Predicting and evaluating different models

def evaluate_Regression_models(model,x_test,y_test):
    prediction = model.predict(x_test)
    print("Mean Absolute Error:",
          mean_absolute_error(y_test,prediction))
    print("Mean Squared Error : ",
          mean_squared_error(y_test,prediction))
    print("Root Mean Squared Error : ",
          np.sqrt(mean_squared_error(y_test,prediction)))
    print("R2 Score : ",r2_score(y_test,prediction))

The function is then called once per model (the output screenshots are not reproduced here):
  • Linear Regression
  • Decision Tree
  • RandomForest
  • XGBoost
  • CatBoost
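
For readers who want to see what the four numbers printed by the evaluation function measure, here is a rough sketch of the same metrics in plain Python. The short function names (`mae`, `mse`, `rmse`, `r2`) and the toy values are assumptions made for illustration; in practice the scikit-learn functions imported earlier should be used.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the MSE, in the target's units."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true, y_pred):
    """R2 score: 1 minus (residual sum of squares / total sum of squares)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical true values and predictions
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]
print(mae(y_true, y_pred), mse(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

Lower is better for MAE, MSE, and RMSE, while an R2 closer to 1 indicates a better fit.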

Put together in a single frame, this comes to around 30 lines of code, even though hyperparameter tuning and outlier handling were not done.

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error,r2_score,mean_absolute_error
from catboost import CatBoostRegressor
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor
from sklearn.tree import DecisionTreeRegressor
data = pd.read_csv('/contents/boston.csv')
data.isnull().sum()
x = data.drop('medv',axis = 1)
y = data['medv']
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size = 0.3,random_state = 123)
linear_model = LinearRegression().fit(x_train,y_train)
decision_tree = DecisionTreeRegressor().fit(x_train,y_train)
random_model = RandomForestRegressor().fit(x_train,y_train)
xgb_model = XGBRegressor().fit(x_train,y_train)
cat_model = CatBoostRegressor().fit(x_train,y_train)
def evaluate_Regression_models(model,x_test,y_test):
    prediction = model.predict(x_test)
    print("Mean Absolute Error:",
          mean_absolute_error(y_test,prediction))
    print("Mean Squared Error : ",
          mean_squared_error(y_test,prediction))
    print("Root Mean Squared Error : ",
          np.sqrt(mean_squared_error(y_test,prediction)))
    print("R2 Score : ",r2_score(y_test,prediction))

What if I told you that these roughly 30 lines of code could be reduced to about 10, hyperparameter tuning included?

Getting Started with Regression in PyCaret!!

If you are not familiar with PyCaret, I suggest going through the article below before moving on.

Complete Guide to PyCaret.

→ Reading the data in the PyCaret library.

import pandas as pd
import pycaret
from pycaret.regression import *
data = pd.read_csv('/contents/boston.csv')

We will work with the Boston housing data.

Setting up the PyCaret environment

Before moving on with any kind of experimentation using PyCaret we need to set up the environment.
It is a mandatory step that should be done before any machine learning experiment.

reg = setup(data = data, target = 'medv')

PyCaret also helps with model deployment: every step of the experiment is saved in a pipeline, and this pipeline can be deployed into production with ease.

After running setup, PyCaret shows the data types it has inferred; press Enter to confirm them, and a summary of the experiment settings is displayed.

Compare models

This function trains and compares every model available in PyCaret for the given problem type.
Each model is trained with its default hyperparameters, and its performance metrics are evaluated using cross-validation.
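
As a rough illustration of what cross-validated scoring means, the sketch below splits the data into k folds, "trains" on all folds but one, scores on the held-out fold, and averages the fold scores. It is plain Python with a trivial baseline that predicts the training mean; the function names and data are hypothetical, and PyCaret's own fold strategy may differ (for example in shuffling).

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate_mean_model(ys, k):
    """Average MAE over k folds for a baseline that predicts the training mean."""
    scores = []
    for test_idx in k_fold_indices(len(ys), k):
        held_out = set(test_idx)
        train = [y for i, y in enumerate(ys) if i not in held_out]
        prediction = sum(train) / len(train)  # "fit" on the remaining folds
        fold_mae = sum(abs(ys[i] - prediction) for i in test_idx) / len(test_idx)
        scores.append(fold_mae)
    return sum(scores) / len(scores)


# Hypothetical target values
ys = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
folds = k_fold_indices(len(ys), 3)
cv_mae = cross_validate_mean_model(ys, 3)
print(folds, cv_mae)
```

Because every sample is held out exactly once, the averaged score is a fairer estimate of performance on unseen data than a score computed on the training data itself.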

compare_models()

Creating Models

Creating a model in PyCaret is one of the simplest tasks.

The “create_model” function takes in just the model ID as a string and performs the task.

CBR = create_model('catboost')

For regression, the output reports the following metrics for each fold: MAE, MSE, RMSE, R2, RMSLE, MAPE.
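
RMSLE and MAPE are the two metrics in this list that were not computed in the scikit-learn section. The sketch below shows their usual definitions in plain Python; the function names and sample values are assumptions for illustration, and MAPE is written here as a fraction rather than a percentage.

```python
import math


def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error; all values must be greater than -1."""
    squared_log_errors = [
        (math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error as a fraction; y_true must not contain zeros."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true values and predictions
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]
print(rmsle(y_true, y_pred), mape(y_true, y_pred))
```

Both metrics measure relative rather than absolute error, so a miss of 10 on a target of 100 counts the same in MAPE as a miss of 20 on a target of 200.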

Model ID for Regression Models.

+------------+-----------------------------------+
| ID         | Name                              |
+------------+-----------------------------------+
| 'lr'       | Linear Regression                 |
| 'lasso'    | Lasso Regression                  |
| 'ridge'    | Ridge Regression                  |
| 'en'       | Elastic Net                       |
| 'lar'      | Least Angle Regression            |
| 'llar'     | Lasso Least Angle Regression      |
| 'omp'      | Orthogonal Matching Pursuit       |
| 'br'       | Bayesian Ridge                    |
| 'ard'      | Automatic Relevance Determination |
| 'par'      | Passive Aggressive Regressor      |
| 'ransac'   | Random Sample Consensus           |
| 'tr'       | TheilSen Regressor                |
| 'huber'    | Huber Regressor                   |
| 'kr'       | Kernel Ridge                      |
| 'svm'      | Support Vector Machine            |
| 'knn'      | K Neighbors Regressor             |
| 'dt'       | Decision Tree                     |
| 'rf'       | Random Forest                     |
| 'et'       | Extra Trees Regressor             |
| 'ada'      | AdaBoost Regressor                |
| 'gbr'      | Gradient Boosting Regressor       |
| 'mlp'      | Multi Level Perceptron            |
| 'xgboost'  | Extreme Gradient Boosting         |
| 'lightgbm' | Light Gradient Boosting           |
| 'catboost' | CatBoost Regressor                |
+------------+-----------------------------------+

Tune Model

PyCaret provides a one-line function to tune the hyperparameters of any model in the library.

It tunes the hyperparameters of the model passed as the estimator using a random grid search over pre-defined grids that are fully customizable.
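
The idea behind a random grid search can be sketched in a few lines of plain Python: sample a fixed number of random combinations from the grid, score each one, and keep the best. The grid, the toy error function standing in for a cross-validated error, and the function name `random_grid_search` are all hypothetical; this is not PyCaret's internal implementation.

```python
import random


def random_grid_search(param_grid, score_fn, n_iter, seed=0):
    """Try n_iter random combinations from param_grid and keep the lowest score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_iter):
        # Draw one value per hyperparameter, independently and at random
        params = {name: rng.choice(values) for name, values in param_grid.items()}
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid of candidate values
param_grid = {"depth": [2, 4, 6, 8], "learning_rate": [0.01, 0.03, 0.1, 0.3]}


def toy_error(params):
    """Toy stand-in for a cross-validated error (lower is better)."""
    return (params["depth"] - 6) ** 2 + (params["learning_rate"] - 0.1) ** 2


best_params, best_score = random_grid_search(param_grid, toy_error, n_iter=50)
print(best_params, best_score)
```

The number of sampled combinations plays the same role as the n_iter argument: more iterations cover more of the grid at the cost of more training runs.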

tuned_CBR = tune_model(CBR,n_iter = 50)

Plot a Model

It helps in checking the performance of a model with different graphs in one line of code.

model = create_model('Model_name')
plot_model(model)

Plot ID for Regression Models

+-----------------------------+-------------+
| Name                        | Plot        |
+-----------------------------+-------------+
| Residuals Plot              | 'residuals' |
| Prediction Error Plot       | 'error'     |
| Cooks Distance Plot         | 'cooks'     |
| Recursive Feature Selection | 'rfe'       |
| Learning Curve              | 'learning'  |
| Validation Curve            | 'vc'        |
| Manifold Learning           | 'manifold'  |
| Feature Importance          | 'feature'   |
| Model Hyperparameter        | 'parameter' |
+-----------------------------+-------------+

Interpret Model

After building a model, one of the most important tasks is to interpret the results.

Model Interpretability helps debug the model by analyzing what the model really thinks is important.

interpret_model(tuned_CBR)

Predict Model

predict_model(tuned_CBR)

Finalize Model

It is the last step of building a model in PyCaret.

This function takes a trained model object and returns a model that has been trained on the entire dataset.

model = create_model('Model_name')
finalize_model(model)
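
Conceptually, finalizing means refitting the same kind of model on all available data once experimentation is over. The plain-Python sketch below illustrates this with a simple least-squares line; the helper `fit_line` and the toy data are assumptions for illustration and have nothing to do with PyCaret's internals.

```python
def fit_line(points):
    """Least-squares (slope, intercept) for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical toy data, split into a training part and a hold-out part
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1), (6, 11.9)]
train, holdout = data[:4], data[4:]

# During experimentation the model only sees the training part
experiment_model = fit_line(train)

# The finalized model is refit on the training and hold-out data together
final_model = fit_line(train + holdout)
print(experiment_model, final_model)
```

The two fits differ slightly because the finalized model has seen more data, which is the point of the step: the hold-out rows are no longer needed for evaluation, so they are used for training.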

Save Models

Saving a trained model in PyCaret is as simple as writing save_model. The function takes a trained model object and saves the entire transformation pipeline and trained model object as a transferable binary pickle file for later use.

save_model(tuned_CBR,'final_model')
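
For readers unfamiliar with pickle files, here is a minimal sketch of the general mechanism using Python's standard pickle module: an object is serialized to a binary file and later loaded back unchanged. The dictionary standing in for a pipeline and the file name are hypothetical; this shows the idea only, not what save_model does internally.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a pipeline: preprocessing settings plus model parameters
pipeline = {"scaler_mean": 3.5, "scaler_std": 1.7, "slope": 2.0, "intercept": 1.0}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "final_model.pkl"
    # Serialize the whole object to a binary file
    with open(path, "wb") as f:
        pickle.dump(pipeline, f)
    # Load it back later, for example in a different process
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == pipeline)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.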

If we put all the code in a single frame, we can see how much PyCaret reduces the time needed to build and compare models.

import pandas as pd
import pycaret
from pycaret.regression import *
data = pd.read_csv('/contents/boston.csv')
reg = setup(data = data, target = 'medv')
compare_models()
CBR = create_model('catboost')
tuned_CBR = tune_model(CBR,n_iter = 50)
interpret_model(tuned_CBR)
predict_model(tuned_CBR)
finalize_model(tuned_CBR)

It is clear how PyCaret's low-code approach can speed up experimentation for data scientists and help them reach a solution without spending much time on writing code.
