
Quality Assurance for Machine Learning Models - Part 2

Written by Vignesh Radhakrishnan | Dec 9, 2019 12:30:53 PM

After covering the importance of QA in the context of AI and ML models in Part 1, this post covers the Black box testing techniques appropriate for ML models. It also sheds some light on best practices for QA testing of such models and on Sasken’s unique approach.

Black box Testing Techniques for ML/AI models

Black box testing is a methodology wherein the internal functional design and implementation are not known to the QA engineer. In the context of ML/AI models, Black box testing means that details such as the algorithms used, model tuning parameters, data transformations, and model features are not made available to the QA engineer. This enables an objective evaluation of the model, eliminates developer bias to a large extent, and exposes discrepancies in the ML/AI application being tested. The following Black box testing techniques are adopted for QA evaluation of ML models:

  • Validation Testing of Deployed Model

In this method, we evaluate the performance of the model post-deployment on test data sets and on new data from production scenarios. Current model performance (measured using metrics such as accuracy, recall, and MAPE) is compared on an ongoing basis against the performance recorded at the time of deployment. Any significant difference in these metrics is highlighted and triggers a re-evaluation of the deployed model.
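As an illustration, the sketch below compares a deployed model’s current metrics against baselines recorded at deployment time. The baseline values, tolerance, and function names are assumptions for illustration, not part of any specific tooling.

```python
# Minimal sketch: ongoing validation of a deployed model against
# deployment-time baselines. BASELINE and TOLERANCE are illustrative.
from sklearn.metrics import accuracy_score, recall_score

BASELINE = {"accuracy": 0.91, "recall": 0.88}  # metrics recorded at deployment
TOLERANCE = 0.05                               # maximum acceptable drop

def validate_deployed_model(model, X_new, y_new):
    """Compare current performance with the deployment-time baseline."""
    y_pred = model.predict(X_new)
    current = {
        "accuracy": accuracy_score(y_new, y_pred),
        "recall": recall_score(y_new, y_pred, average="macro"),
    }
    degraded = {m: (BASELINE[m], v) for m, v in current.items()
                if BASELINE[m] - v > TOLERANCE}
    if degraded:
        # A significant drop in any metric triggers a re-evaluation
        print(f"Re-evaluation triggered: {degraded}")
    return current
```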

  • Validation Against Multiple Alternate Models

The intent of this technique is to test all models built during the model development process, along with the final model selected for deployment. This ensures ongoing performance evaluation of all candidate models and gives better insight into their behavior. For example, a data scientist might have evaluated multiple algorithms such as Random Forest, Support Vector Machine, and Gradient Boosted Trees before settling on Random Forest for deployment. The performance of these models might improve or degrade as additional data is provided to them. If the results of the alternate models vary significantly from those of the finalized model, a defect is logged for further analysis by the Data Scientists.
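A minimal sketch of this comparison is shown below. The accuracy metric, the 0.05 divergence threshold, and the model names are illustrative assumptions.

```python
# Minimal sketch: validate the deployed model against the alternate
# models built during development; log a defect on large divergence.
from sklearn.metrics import accuracy_score

def compare_against_alternates(champion, alternates, X_test, y_test, tol=0.05):
    """Return defect records for alternates diverging sharply from the champion."""
    champ_acc = accuracy_score(y_test, champion.predict(X_test))
    defects = []
    for name, model in alternates.items():
        acc = accuracy_score(y_test, model.predict(X_test))
        if abs(acc - champ_acc) > tol:
            defects.append({"model": name, "acc": acc, "champion_acc": champ_acc})
    return defects  # handed to the Data Science team for analysis

# Usage (illustrative): the deployed Random Forest vs. the SVM and GBT
# candidates evaluated during development:
# defects = compare_against_alternates(rf, {"svm": svm, "gbt": gbt}, X, y)
```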

  • Performance Testing of Models

This involves testing model performance in terms of the compute resources required for processing, the memory required to host the model (model size), latency of response from model execution, and so on. Performance degradation can also trigger optimization of the existing model or development of a more efficient alternative.
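The sketch below measures two of these dimensions, average prediction latency and serialized model size. The run count and pickle-based sizing are assumptions for illustration.

```python
# Minimal sketch: profile prediction latency and model hosting size.
import os
import pickle
import time

def profile_model(model, X_sample, runs=100):
    # Latency: average prediction time over repeated runs
    start = time.perf_counter()
    for _ in range(runs):
        model.predict(X_sample)
    latency_ms = (time.perf_counter() - start) / runs * 1000

    # Size: bytes needed to host the serialized model
    with open("model.pkl", "wb") as f:
        pickle.dump(model, f)
    size_mb = os.path.getsize("model.pkl") / 1e6

    return {"latency_ms": latency_ms, "size_mb": size_mb}
```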

  • Metamorphic Testing

This method involves data mutation to augment the test data sets used for evaluating model performance and accuracy. Mutations can include changing the values of certain features, adding or removing attributes, duplicating samples, removing samples, applying affine transformations, and so on. Using this method, we can simulate scenarios not covered in the training data and thereby increase model stability.
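The sketch below applies a few such mutations and checks prediction consistency. The specific mutations and the invariance check are illustrative; which metamorphic relations actually hold depends on the domain and the model.

```python
# Minimal sketch: metamorphic testing via data mutation.
import numpy as np

def mutate(X, rng):
    """Generate mutated variants of a test set."""
    return {
        "duplicated": np.vstack([X, X]),                # duplicate samples
        "perturbed": X + rng.normal(0, 0.01, X.shape),  # small feature noise
        "affine": X * 2.0 + 1.0,                        # affine transformation
    }

def metamorphic_test(model, X, seed=0):
    rng = np.random.default_rng(seed)
    baseline = model.predict(X)
    for name, X_mut in mutate(X, rng).items():
        preds = model.predict(X_mut[: len(X)])
        # Where the relation is expected to be invariant, predictions
        # should match the baseline; large mismatches flag instability
        mismatch = np.mean(preds != baseline)
        print(f"{name}: {mismatch:.1%} predictions changed")
```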

  • Coverage Guided Fuzzing

This technique prepares test data that ensures all features in a model, or neurons in a neural network, are activated and tested in turn. Based on the feedback obtained from the model, further test datasets are generated, with guided test scenarios refined over time.
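A simplified fuzzing loop in this spirit is sketched below. The neuron-coverage definition, the activation accessor get_activations, and the mutation scheme are assumptions for illustration; tools such as TensorFuzz implement the idea more rigorously.

```python
# Simplified sketch: coverage-guided fuzzing driven by neuron coverage.
# get_activations(x) is assumed to return a flat vector of hidden-layer
# activations for a single input x.
import numpy as np

def neuron_coverage(activations, threshold=0.5):
    """Indices of neurons activated above the threshold."""
    return set(np.where(activations > threshold)[0])

def fuzz(get_activations, seed_inputs, iterations=1000, seed=0):
    rng = np.random.default_rng(seed)
    covered, corpus = set(), list(seed_inputs)
    for _ in range(iterations):
        x = corpus[rng.integers(len(corpus))]
        x_mut = x + rng.normal(0, 0.05, x.shape)  # mutate a corpus input
        cov = neuron_coverage(get_activations(x_mut))
        if not cov <= covered:                    # activates new neurons?
            covered |= cov                        # keep inputs that expand coverage
            corpus.append(x_mut)
    return corpus, covered
```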


Business Benefits of Black box Testing
  • Improved decision making by validating the robustness of models
  • Greater focus of the Data Science team on model development, by reducing the testing and maintenance overhead on the core team
  • Better auditability and transparency of the model as a result of third-party validation


Best Practices for ML/ DL Model Testing
  • It is recommended to expose the models under test as a RESTful service so they can be called quickly and predictions for new incoming data can be made easily to identify defects, if any (a minimal sketch follows this list)
  • To assist in the ongoing validation of a model, it is necessary to maintain a store of “Error Records” in a repository, updated automatically with the results of predictions vs. actuals for any new tests carried out
  • Defects should be sorted by metrics such as prediction probability or error percentage so that test engineers start with the most informative test cases
  • To enable faster validation, it is important to create a test dataset based on historical data, derived from previously known defects and successful scenarios, i.e. verified results from the model
  • If new bugs are verified and their correct label values are known, such records should be marked as new training data and stored in the repository for further evaluation and testing
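As a minimal illustration of the first two practices above, the sketch below exposes a model as a RESTful service with Flask and appends prediction-vs-actual error records to a simple CSV store. The endpoint names, file paths, and record format are assumptions.

```python
# Minimal sketch: a REST endpoint for predictions plus a feedback endpoint
# that appends prediction-vs-actual "Error Records" to a CSV store.
import csv
import pickle
from flask import Flask, jsonify, request

app = Flask(__name__)
with open("model.pkl", "rb") as f:   # illustrative model path
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.json["features"]
    prediction = model.predict([features])[0]
    return jsonify({"prediction": str(prediction)})

@app.route("/feedback", methods=["POST"])
def feedback():
    # Once the actual outcome is known, log prediction vs. actual so the
    # error-record store stays current for ongoing validation
    rec = request.json  # e.g. {"features": [...], "predicted": ..., "actual": ...}
    with open("error_records.csv", "a", newline="") as f:
        csv.writer(f).writerow([rec["features"], rec["predicted"], rec["actual"]])
    return jsonify({"status": "recorded"})
```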


Sasken’s Approach to ML Black box testing

To create an effective Black box testing strategy, the following components are essential in Sasken’s approach:

  • Establishing a QA Center of Excellence (CoE)/team, independent of the development team, to ensure unbiased evaluation of the deployed models
  • Documentation of the business case and sources of data, detailing the modelling methodology followed by the Data Science team and the results of data analysis
  • Availability of the training and testing datasets used in model development, as well as access to data generated post-production deployment of these models
  • Black box testing of models using the five methodologies detailed earlier
  • Detailed documentation of results, test bed creation, analysis of failed test cases, etc., provided as feedback to the Data Scientists


[Figure: Sasken’s approach to Black Box testing]

Read more about Sasken's expertise in providing efficient Digital Testing strategies.