Productizing Machine Learning Models - An Introduction

  Feb 11, 2020 4:36:36 AM

The typical machine learning (ML) model lifecycle comprises training and scoring pipelines. In the training pipeline, multiple models are evaluated using different algorithms, with the best model(s) selected at the end. The scoring pipeline then uses the selected models to generate predictions on new data in production. Depending on the end use case or industry application being served, these training and scoring pipelines may have to be accessed and updated at varying frequencies: monthly, daily, hourly, or even every few minutes.

This post will explore some of the challenges in placing ML models in a production environment and introduce a few of the alternatives available for hosting different types of ML models.

Challenges in Productizing ML Models
The following are typical challenges faced while porting ML models from a development environment to a production environment:

[Image: Typical challenges in productizing ML models]

Types of ML Model Formats
Once a model has been generated, it needs to be saved or exported so that it can be used for scoring and prediction. Some of the common options available are as follows:

  1. Model object
    The model variable or object can be held in memory, with its prediction function called whenever scoring of new data is required. This works well for rapid development and faster model iterations, but isn’t suitable for production scenarios, since the model lives only in the current process.
  2. Pickle
    This method serializes a model to a byte stream so that it can be stored and reloaded later. It is primarily a Python mechanism; R provides analogous serialization through RDS files.
  3. JSON & YAML
    This approach saves and exports the model as a JSON object (as with SparkML) or as a YAML object (as with Keras). These formats also capture metadata about model parameters and training configuration, and they are compatible with deep learning models.
  4. PMML
    Predictive Model Markup Language (PMML) is a standardized method of converting models into a format that is compatible across multiple languages. It has limited support for predictive algorithms, with support for more complex algorithms being a work in progress for future releases.
  5. POJO and MOJO
    Plain Old Java Object (POJO) and Model Object, Optimized (MOJO) formats provide a standardized way of embedding models into a Java-based environment or application. At present, they are primarily supported by the H2O ML platform.
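Options 1–3 above can be sketched with Python's standard library alone. The `ThresholdModel` class below is an illustrative stand-in for a trained model, not a class from any real ML library:

```python
import json
import pickle

class ThresholdModel:
    """Illustrative stand-in for a trained model (not a real library class)."""
    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, xs):
        # Score each input against the learned threshold.
        return [1 if x >= self.threshold else 0 for x in xs]

model = ThresholdModel(threshold=0.5)

# Pickle: serialize the fitted model object to a byte stream,
# which can be written to disk and reloaded in a scoring job.
blob = pickle.dumps(model)
restored = pickle.loads(blob)
print(restored.predict([0.2, 0.9]))  # [0, 1]

# JSON: export model parameters and training configuration as text,
# which any language can parse back into a model.
spec = json.dumps({
    "model_type": "threshold_classifier",
    "parameters": {"threshold": model.threshold},
    "training_config": {"metric": "accuracy"},
})
rebuilt = ThresholdModel(**json.loads(spec)["parameters"])
print(rebuilt.predict([0.2, 0.9]))  # [0, 1]
```

Note the trade-off the list describes: the pickled byte stream is Python-specific, while the JSON export is portable but requires code on the receiving side to rebuild the model from its parameters.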

[Image: Productizing ML Models – A Brief Overview]

ML Model Deployment Options

The following are some of the viable choices for hosting ML models:

1. Pub-Sub (Publisher-Subscriber) Approach
A publisher-subscriber approach allows models to be triggered by events or on a schedule. This can be achieved by combining services such as Azure Event Hubs with Azure Functions, AWS Kinesis with AWS Lambda, or Apache Kafka with Spark, among others.
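The pattern can be illustrated with a minimal in-process sketch, where a standard-library queue stands in for the managed broker (Kafka, Kinesis, Event Hubs) and a worker thread stands in for the serverless function or streaming job; the event fields and scoring logic are illustrative:

```python
import queue
import threading

events = queue.Queue()   # stand-in for the message broker
predictions = []

def score(event):
    # Stand-in for invoking the deployed model on an incoming event.
    return {"id": event["id"], "prediction": event["value"] * 2}

def subscriber():
    # Consume events until a sentinel arrives, scoring each one.
    while True:
        event = events.get()
        if event is None:
            break
        predictions.append(score(event))

worker = threading.Thread(target=subscriber)
worker.start()

# Publisher side: emit events as they arrive.
for i, value in enumerate([1.0, 2.5]):
    events.put({"id": i, "value": value})
events.put(None)  # sentinel to stop the worker
worker.join()

print(predictions)  # [{'id': 0, 'prediction': 2.0}, {'id': 1, 'prediction': 5.0}]
```

In a real deployment the publisher and subscriber run in separate services, and the broker provides the durability and fan-out that a local queue cannot.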

2. Web Services
ML models can also be exposed through RESTful APIs packaged as Docker containers and orchestrated with platforms such as Kubernetes or OpenShift. This provides scalability and load balancing as and when required.
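A minimal sketch of the web-service approach, using only Python's standard library (real deployments typically use a framework such as Flask or FastAPI packaged into a container image); the `/predict` route and the scoring function are illustrative:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    # Stand-in for the trained model's scoring function.
    return sum(features)

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Parse the JSON request body and return the model's prediction.
        length = int(self.headers["Content-Length"])
        body = json.loads(self.rfile.read(length))
        payload = json.dumps({"prediction": predict(body["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), PredictHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: score new data with a POST request.
req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/predict",
    data=json.dumps({"features": [1.0, 2.0]}).encode(),
    headers={"Content-Type": "application/json"},
)
response = json.loads(urllib.request.urlopen(req).read())
print(response)  # {'prediction': 3.0}
server.shutdown()
```

Because the model is behind a plain HTTP interface, replicas of the container can be added behind a load balancer without any change to callers.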

3. Web Applications
Similar to the web services approach, ML models can be surfaced through web applications built with frameworks such as React, Dash, or Angular, among others. These applications, in turn, call a REST API hosting the model on the backend. This approach has the advantage of providing an interactive user interface for end users.

4. Integration with Analytics Data Mart
This approach is typically used in batch systems, as seen in credit card approvals, customer churn management, etc. It is ideal in situations where recommendations are expected in bulk and the volume of data is predictable.
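A minimal sketch of batch scoring against an analytics data mart, using an in-memory SQLite database as a stand-in for the warehouse; the table and column names, and the churn rule, are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the analytics data mart
conn.execute("CREATE TABLE customers (id INTEGER, monthly_spend REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, 120.0), (2, 15.0), (3, 60.0)])

def churn_score(monthly_spend):
    # Stand-in for the trained model: flag low spenders as churn risks.
    return 1 if monthly_spend < 50.0 else 0

# Score the whole table in bulk and write predictions back to the mart,
# where downstream reports and campaigns can pick them up.
conn.execute("CREATE TABLE churn_predictions (id INTEGER, churn_risk INTEGER)")
rows = conn.execute("SELECT id, monthly_spend FROM customers").fetchall()
conn.executemany("INSERT INTO churn_predictions VALUES (?, ?)",
                 [(cid, churn_score(spend)) for cid, spend in rows])

results = conn.execute("SELECT * FROM churn_predictions ORDER BY id").fetchall()
print(results)  # [(1, 0), (2, 1), (3, 0)]
```

In production the scoring job would run on a schedule (e.g. nightly), reading and writing warehouse tables rather than an in-memory database.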

Sasken has been engaging with its clients in deploying a wide range of solutions. This ranges from setting up Information Management Systems, implementing Big data platforms to providing data discovery services and building predictive models via Machine Learning and Deep Learning frameworks. We work with our clients across various domains such as Industrial IoT, Manufacturing, Automotive, Transportation, and Telecommunications to help address their diverse needs.

Posted by:
Vignesh Radhakrishnan
Data Scientist – Applications and Data Services Practice
