Model Deployment using Flask: AI End-to-End Series (Part — 4)

Accredian Publication
Dec 28, 2021


By Hiren Rupchandani, Abhinav Jangir, and Ashish Lepcha

Now that our model is ready to be deployed, this article covers how to deploy it locally using Flask. Before proceeding, let’s refresh the basics of deployment.

  • Deployment is defined as a process through which you integrate a machine learning model into an existing production environment to obtain effective business decisions based on data.
  • It is one of the last steps in the machine learning life cycle.
  • The purpose of deploying your model is so that you can make the predictions from a trained ML model available to others, whether that be users, management, or other systems.

Necessary Steps in Model Deployment

The four steps to machine learning deployment include:

  1. Develop and create a model in a training environment.
  2. Test and clean the code to prepare it for deployment.
  3. Prepare for container deployment.
  4. Plan for continuous monitoring and maintenance after machine learning deployment.

Types of Model Deployment

The ways to deploy a model can be grouped into three categories:

1. One-off

  • There can be cases where you don’t need to re-train your model regularly once it is in production.
  • This is where one-off deployment comes in: you train your model just once before pushing it to production.
  • The model requires re-training only when the pipeline needs to be fixed or the model’s performance has deteriorated (for example, because of concept drift).
One-off deployment (Credit: Spongebob)
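As a minimal sketch of the "re-train only when the model has deteriorated" idea, the check below compares recent accuracy against the accuracy measured at deployment time. The function name, the accuracy values, and the 0.05 tolerance are illustrative assumptions, not part of the original pipeline.

```python
def needs_retraining(baseline_accuracy, recent_accuracy, tolerance=0.05):
    """Return True when accuracy has dropped by more than `tolerance`.

    All three arguments are hypothetical: in practice you would pick the
    metric and tolerance that match your own model and business needs.
    """
    return (baseline_accuracy - recent_accuracy) > tolerance


# Example values (made up for illustration)
print(needs_retraining(0.95, 0.85))  # large drop in accuracy
print(needs_retraining(0.95, 0.93))  # small drop in accuracy
```

A check like this could run on freshly labeled production data; only when it fires would the one-off model be trained again.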

2. Batch

  • Batch deployment allows you to constantly have an up-to-date version of your model.
  • In batch processing, a series of tasks is clearly defined so that they execute at certain triggers or intervals.
  • The model is re-trained at regular intervals with a new batch of data collected recently during the production phase.
A batch prediction architecture on Google Cloud (Source)
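The interval-based trigger described above can be sketched with the standard library alone. The function name and the seven-day interval are assumptions chosen for illustration; a real system would usually rely on a scheduler or workflow tool.

```python
from datetime import datetime, timedelta


def retraining_due(last_trained, now, interval=timedelta(days=7)):
    """Return True when at least `interval` has passed since the last training run.

    The 7-day default is a hypothetical value, not taken from the article.
    """
    return now - last_trained >= interval


# Example dates (made up for illustration)
last_run = datetime(2021, 12, 1)
print(retraining_due(last_run, datetime(2021, 12, 28)))
```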

3. Real-time

  • There can be times when you need to make predictions in real-time.
  • For example: Determining fraudulent transactions or spam email checks.
  • This can be made possible using highly efficient pipelines consisting of models with low latency and good throughput.
Low Latency Model using GCP (Source)
  • Many systems use a hybrid of real-time and batch deployment.
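To make "low latency and good throughput" concrete, here is a rough sketch of how the two could be measured for any prediction function. The helper name and the stand-in prediction function are hypothetical; a real measurement would call your own model.

```python
import time


def measure_latency(predict_fn, sample, runs=100):
    """Return (average seconds per call, calls per second) for predict_fn."""
    start = time.perf_counter()
    for _ in range(runs):
        predict_fn(sample)
    elapsed = time.perf_counter() - start
    latency = elapsed / runs
    # Guard against a zero reading on very coarse clocks
    throughput = runs / elapsed if elapsed > 0 else float("inf")
    return latency, throughput


# Stand-in for a real model's predict function (assumption for illustration)
def fake_predict(x):
    return x * 2


latency, throughput = measure_latency(fake_predict, 3, runs=1000)
print("avg latency (s):", latency)
print("throughput (calls/s):", throughput)
```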

What is Flask?

  • Flask is a micro web framework written in Python. It is classified as a microframework because it does not require particular tools or libraries.
  • It has no database abstraction layer, form validation, or any other components where pre-existing third-party libraries provide common functions.
  • Flask depends on the Jinja template engine and the Werkzeug WSGI toolkit.
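To give a feel for how small a Flask application can be, here is a minimal sketch with a single route. The app name, the "/health" route, and its payload are illustrative assumptions and are not part of the face mask project. The example uses Flask's built-in test client, so no server needs to be started.

```python
from flask import Flask, jsonify

# Hypothetical app used only for this illustration
demo_app = Flask(__name__)


@demo_app.route("/health")
def health():
    # Return a small JSON payload to confirm the app responds
    return jsonify(status="ok")


# The test client sends a request to the app without starting a server
with demo_app.test_client() as client:
    response = client.get("/health")
    print(response.status_code, response.get_json())
```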

What is Ngrok?

  • Ngrok is a free program that exposes a process running on your localhost to the public internet.
  • It does this by creating a secure, publicly accessible tunnel URL to the application running on localhost.
  • You can share the tunnel URL so that others can view your work.
  • We’ll be using the pyngrok library for this. pyngrok is a Python wrapper for ngrok that makes ngrok available via a convenient Python API.

Installing Flask and Pyngrok

  • The Flask and pyngrok libraries can be installed using pip:
!pip install -U Flask
!pip install pyngrok
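  • We will also import the necessary packages for the deployment:
import os
import threading
import numpy as np
from PIL import Image
from pyngrok import ngrok
from flask import Flask, redirect, url_for, request, render_template, jsonify
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.image import img_to_array
# preprocess_input: import it from the keras.applications module that matches your model's backbone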

Model Deployment Using Flask

  • Creating a development environment:
os.environ["FLASK_ENV"] = "development"
  • Initialize the flask web app and set a port number:
app = Flask(__name__)
port = 5000
  • Setting an ngrok authorization token allows us to open multiple tunnels at the same time. The call that sets the token is not shown here.
  • Open an ngrok tunnel to the HTTP server:
public_url = ngrok.connect(port).public_url
print(" * ngrok tunnel \"{}\" -> \"{}\"".format(public_url, port))
  • The output shows the public ngrok URL, which you can set as your base URL.

P.S. Every user gets their own ngrok token. Kindly generate your own token on the ngrok website by signing in.
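One way to set your token without pasting it into the notebook is to read it from an environment variable, as in this sketch. The variable name NGROK_AUTH_TOKEN and the helper function are assumptions made for illustration; ngrok.set_auth_token is the pyngrok call that registers the token.

```python
import os


def configure_ngrok_token(env_var="NGROK_AUTH_TOKEN"):
    """Register the ngrok auth token found in `env_var`, if any.

    Returns True when a token was found and set, False otherwise.
    The environment variable name is a hypothetical choice.
    """
    token = os.environ.get(env_var)
    if not token:
        return False
    # Imported here so the helper can be defined even before pyngrok is installed
    from pyngrok import ngrok
    ngrok.set_auth_token(token)
    return True


if not configure_ngrok_token():
    print("No ngrok token found; set NGROK_AUTH_TOKEN before opening tunnels.")
```

Keeping the token out of the notebook also makes it safer to share the notebook with others.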

  • Update any base URLs to use the public ngrok URL:
app.config["BASE_URL"] = public_url
  • Load the pre-trained Keras model:
model = load_model('/content/drive/MyDrive/AI Deployment materials /Deploy Face mask Detection/3. Model Building/Mask_detection_model(3).h5')
  • Rendering Our Main Template:

We had already created an “index1.html” page where we can upload our image and send it to our model for predictions.

@app.route('/', methods=['GET'])
def index():
    # Main page
    return render_template('index1.html')
  • Get the file from post request, pre-process the file, and make the prediction using the model.
@app.route('/predict', methods=['POST'])
def upload():
    if request.method == 'POST':
        # Get the file from the POST request
        f = request.files['file']
        # Open the uploaded file (converted to RGB so it always has 3 channels)
        image = Image.open(f.stream).convert('RGB')
        # Resize the image to the model's input size
        image = image.resize((224, 224))
        # Convert the image to an array of pixels
        image = img_to_array(image)
        # Preprocess the image
        image = preprocess_input(image)
        image = np.expand_dims(image, axis=0)
        # Predict on the uploaded image
        predictions = model.predict(image)
        # The threshold is not defined in the original snippet; 0.5 is an assumed value
        threshold = 0.5
        y_pred = np.where(predictions >= threshold, 'Non Mask', 'Mask')
        # A Flask view must return a string, not a NumPy array
        result = str(y_pred.ravel()[0])
        print('[PREDICTED CLASSES]: {}'.format(y_pred))
        print('[RESULT]: {}'.format(result))
        return result
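The thresholding step in the route above can be tried on its own, without a model. In this sketch the probabilities are made-up values and the 0.5 threshold is an assumption; the label strings follow the ones used in the route.

```python
import numpy as np


def to_labels(predictions, threshold=0.5):
    """Map model scores to class labels with np.where.

    Scores at or above `threshold` become 'Non Mask', the rest 'Mask'.
    The 0.5 default is an assumed value, not one stated in the article.
    """
    return np.where(np.asarray(predictions) >= threshold, 'Non Mask', 'Mask')


# Made-up scores with the (batch, 1) shape a sigmoid output layer would produce
sample_scores = [[0.91], [0.08]]
labels = to_labels(sample_scores)
print(labels.ravel().tolist())
```

np.where returns an array with the same shape as its input, which is why the route flattens it before returning a single string.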
  • We will use another thread to start our Flask server.
  • Threading enables more than one task to run almost simultaneously within the same process. It is one of the concurrent programming methods.
  • Running the server in its own thread keeps the notebook cell from blocking, so we can continue working while the server runs.
  • Start the Flask server in a new thread:
threading.Thread(target=app.run, kwargs={"use_reloader": False}).start()
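The effect of starting a blocking call in a separate thread can be seen in this small sketch, which uses only the standard library. The blocking_server function is a hypothetical stand-in for app.run: it blocks until it is told to stop, while the main thread carries on.

```python
import threading

events = []


def blocking_server(stop_event):
    # Stand-in for app.run(): blocks until the stop event is set
    events.append("server started")
    stop_event.wait()
    events.append("server stopped")


stop_event = threading.Event()
worker = threading.Thread(target=blocking_server, args=(stop_event,), daemon=True)
worker.start()

# The main thread is not blocked by the worker and continues right away
events.append("main thread continues")

# Ask the stand-in server to stop and wait for the thread to finish
stop_event.set()
worker.join()
print(events)
```

A real Flask server has no such stop event; in a notebook it keeps running in its thread until the runtime is restarted or the process ends.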
  • Go to the URL provided in the output. For our system, the URL is given as:
ngrok tunnel "" -> ""
  • On navigating to the URL, we see the following page:
Index Page
  • You can now select any image (masked/unmasked) and make predictions:
Making Prediction
  • Hurray! We have finally deployed our model!
Model Deployment

What’s Next?

In the next article of this series, we will see how we can deploy our model on a server using GCP.

Follow us for upcoming articles related to Data Science, Machine Learning, and Artificial Intelligence.

Also, do give us a Clap👏 if you find this article useful. Your encouragement inspires us and helps us create more cool stuff like this.


