Anomaly Detection Using PyCaret!!!

Accredian Publication
4 min read · Mar 12, 2021

  • Anomaly detection involves identifying the differences, deviations, and exceptions from the norm in a dataset.
    For example, credit card companies collect data on everything we purchase, including the amount of money we spend, where we spend it, what we spend it on, how frequently we make purchases, and more. A purchase that departs sharply from this usual pattern can then be flagged as a possible anomaly.
  • The goal of anomaly detection is to identify cases that are unusual within data that otherwise looks homogeneous.
  • Anomaly detection is an important tool for detecting fraud, network intrusion, and other rare events that may have great significance but are hard to find.
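To make the idea concrete, here is a minimal sketch, independent of PyCaret, that flags unusual purchase amounts with a modified z-score built on the median. The sample amounts, the function name flag_outliers, and the 3.5 cutoff are illustrative assumptions and not part of any library.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to compare against
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical card purchases; one is far larger than the rest.
purchases = [25.0, 30.5, 27.8, 22.1, 31.0, 980.0, 26.4]
print(flag_outliers(purchases))
```

The models covered later in this article do the same job on many columns at once, where a simple per-column rule like this one is no longer enough.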

Getting Started with Anomaly Detection!!

If you are not familiar with PyCaret, I suggest you first go through the link below before moving on.

Complete Guide to PyCaret.

Also, check out our articles on:

Complete Guide to PyCaret
Classification using PyCaret
Regression using PyCaret
Clustering using PyCaret

Reading Data

PyCaret ships with a set of pre-loaded datasets, which we read here.
You can also use your own external datasets.

from pycaret.datasets import get_data
get_data('index')

Scrolling down the index, we can find the datasets available for each task, including anomaly detection.

import pycaret
from pycaret.anomaly import *
data = get_data('anomaly')

We will use the anomaly dataset.

Setting up the PyCaret environment

Before moving on with any kind of experimentation using PyCaret we need to set up the environment.
It is a mandatory step that should be done before any machine learning experiment.

ano = setup(data = data)
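As a hedged sketch, setup also accepts optional arguments. The two used here, normalize and session_id, are existing setup parameters, but the values chosen are only illustrative. The helper name prepare_experiment is hypothetical; it wraps the call so that PyCaret is imported only when the function is actually used.

```python
def prepare_experiment(data, seed=123):
    """Initialise a PyCaret anomaly-detection experiment on a DataFrame."""
    # Imported lazily so this helper can be defined before PyCaret is loaded.
    from pycaret.anomaly import setup

    # normalize=True rescales numeric features before modelling;
    # session_id fixes the random seed so repeated runs are reproducible.
    return setup(data=data, normalize=True, session_id=seed)
```

Calling ano = prepare_experiment(data) would then take the place of the plain setup call.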

PyCaret also helps with model deployment: every step of the experiment is saved in a pipeline, and this pipeline can be deployed into production with ease.

After running this, press Enter when prompted and PyCaret will display the setup results.

Creating Models

Creating a model in PyCaret is one of the simplest tasks.

The create_model function takes just the model ID as a string and trains the corresponding model.

create_model('Model_id')

Model IDs for the anomaly models:

+-------------+-----------------------------------+
| ID          | Name                              |
+-------------+-----------------------------------+
| 'abod'      | Angle-based Outlier Detection     |
| 'iforest'   | Isolation Forest                  |
| 'cluster'   | Clustering-Based Local Outlier    |
| 'cof'       | Connectivity-Based Outlier Factor |
| 'histogram' | Histogram-based Outlier Detection |
| 'knn'       | k-Nearest Neighbors Detector      |
| 'lof'       | Local Outlier Factor              |
| 'svm'       | One-class SVM detector            |
| 'pca'       | Principal Component Analysis      |
| 'mcd'       | Minimum Covariance Determinant    |
| 'sod'       | Subspace Outlier Detection        |
| 'sos'       | Stochastic Outlier Selection      |
+-------------+-----------------------------------+
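For instance, training an Isolation Forest could look roughly like the sketch below. It assumes setup has already been run in the same session. The helper name train_isolation_forest and the 0.05 value are illustrative assumptions; fraction is the create_model argument for the expected share of outliers in the data.

```python
def train_isolation_forest(contamination=0.05):
    """Train an Isolation Forest on the data that was passed to setup()."""
    # Imported lazily so this helper can be defined before PyCaret is loaded.
    from pycaret.anomaly import create_model

    # 'iforest' is one of the IDs listed in the table above; fraction is the
    # share of rows the model is told to treat as outliers.
    return create_model("iforest", fraction=contamination)
```

Any other ID from the table can be swapped in for 'iforest' in the same way.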

Plot a Model

The plot_model function lets us inspect a trained model through different plots with one line of code.

plot_model(model)

By default, a t-SNE plot is drawn.

UMAP plot:
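A small sketch of how the plot type might be switched: plot_model accepts a plot argument, where 'tsne' is the default and 'umap' requests the UMAP projection. The helper name show_projection and the SUPPORTED_PLOTS tuple are hypothetical and exist only for this illustration.

```python
# Plot types accepted by plot_model in the anomaly module.
SUPPORTED_PLOTS = ("tsne", "umap")


def show_projection(model, kind="umap"):
    """Draw a low-dimensional projection of the data for a trained model."""
    if kind not in SUPPORTED_PLOTS:
        raise ValueError(f"kind must be one of {SUPPORTED_PLOTS}, got {kind!r}")
    # Imported lazily so the check above works even before PyCaret is loaded.
    from pycaret.anomaly import plot_model

    plot_model(model, plot=kind)
```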

Predict Model

After our model is built, we can use the predict_model function to generate predictions on a hold-out / test set, or on any other dataset passed through the data argument.

prediction = predict_model(model, data=data)

Checking all the rows with label 1 (the predicted anomalies):
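One way to do this filtering is sketched below. It relies on the Anomaly column that predict_model adds to the returned DataFrame, where 1 marks an outlier and 0 an inlier. The helper name anomalous_rows is an assumption made for illustration.

```python
def anomalous_rows(prediction):
    """Return only the rows that the model labelled as anomalies."""
    # predict_model adds an 'Anomaly' column (1 = outlier, 0 = inlier)
    # alongside an 'Anomaly_Score' column; keep the rows where it is 1.
    return prediction[prediction["Anomaly"] == 1]
```

For example, anomalous_rows(prediction).head() would show the first few flagged rows.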

Save Models

Saving a trained model in PyCaret is as simple as calling save_model. The function takes a trained model object and saves the entire transformation pipeline together with the trained model as a transferable binary pickle file for later use.
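A minimal sketch of saving and reloading is given below. The file name anomaly_model and the helper name save_and_reload are hypothetical; save_model and load_model are the PyCaret functions doing the work.

```python
def save_and_reload(model, name="anomaly_model"):
    """Save a trained model with its pipeline, then load it back."""
    # Imported lazily so this helper can be defined before PyCaret is loaded.
    from pycaret.anomaly import load_model, save_model

    # save_model writes '<name>.pkl' holding the pipeline and the model.
    save_model(model, name)
    # load_model takes the same name, without the .pkl extension.
    return load_model(name)
```

The reloaded object can then be passed to predict_model in the same way as the original model.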

Follow us for more upcoming articles on Data Science, Machine Learning, and Artificial Intelligence.

Also, do give us a Clap👏 if you find this article useful; your encouragement inspires us and helps us create more content like this.

Visit us on https://www.insaid.co/

Accredian Publication

One of India’s leading institutions providing world-class Data Science & AI programs for working professionals with a mission to groom Data leaders of tomorrow!