Drive Business Value with an Agile Approach to Developing and Operationalizing Machine Learning (ML) Models
Business and technology professionals continue to face challenges in operationalizing ML for effective development, deployment and governance. Many of us still view the operationalization process as more of an art than a systematic approach, which creates significant challenges in scaling and maintaining ML models. Why? Because ML initiatives differ from the traditional IT product development process. They are highly experimental and require skills from many more domains, for example statistical analysis, data analysis, platform engineering and application development. There is also often a lack of process understanding, a communication gap between the teams involved, and an unwillingness by development and operations teams to engage in each other's domains to align the development and operationalization of ML models.
It is recommended that responsible business and IT professionals reframe their thinking and focus on:
1. Treating machine learning development as its own life cycle. For example, establish a Machine Learning Development Life Cycle (MLDLC) by leveraging ML, DevSecOps and DataOps skills and capabilities, where data architecture comes first.
2. Establishing an ML platform to support AI use cases.
3. Once implemented, putting management and governance in place for continuous management and monitoring of the ML models, the associated data and the business value they deliver.
In the following, we will discuss the critical capabilities, such as data, platform and processes, needed for an effective operationalization framework that addresses the above challenges. The MLDLC is derived from the Cross-Industry Standard Process for Data Mining (CRISP-DM). It is simple, yet it applies the discipline needed for data and engineering integrity and robustness. The result is an ML model development life cycle comprising four main cycles: development, quality assurance, deployment, and management and governance.
A typical machine learning architecture includes five functional stages:

Planning and Development:
In contrast to a static algorithm coded by a software developer, an ML model is an algorithm that learns and is dynamically updated. You can think of a software application as an amalgamation of algorithms, defined by design patterns and coded by software engineers, that perform planned tasks. Once an application is released to production, it may not perform as planned, prompting developers to rethink, redesign and rewrite it (continuous integration / continuous delivery). ML models, on the other hand, are essentially dynamic algorithms. This dynamism presents a host of new challenges for planners, who work in conjunction with product owners and quality assurance (QA) teams. For example, how should the QA team test and report on a model whose behavior changes as it learns?
Unlike other IT projects, the appropriate moment to seed a project portfolio approach is during the planning phase, and it should start with creating a set of business problems to solve:
Insights and problem identification. Focus on problems that would be difficult to solve with traditional programming. For example, consider Smart Reply. The Smart Reply team recognized that users spend a lot of time replying to emails and messages; a product that can predict likely responses can save users time. Imagine trying to create a system like Smart Reply with conventional programming: there isn't a clear approach. By contrast, machine learning can solve such problems by examining patterns in data and adapting to them. Think of ML as just one of the tools in your toolkit, and only bring it out when appropriate.
With this example in mind, ask yourself the following questions:
1. What problem is my product facing?
2. Would it be a good problem for ML?
Don’t ask the questions the other way around!
Data source discovery: The use cases for solving the selected problems could come from industry best practices. Another approach is to create a "data map" to explore existing analytical and data assets that have remained untapped. The analytical process should also cover business processes that can be automated to further existing business ideas and uncover the need for new data sets. Partner with business lines and IT organizations to tap into high-volume, data-generating applications and look for untapped data sources and insights. In general, continuous innovation with AI is driven by data, ideas, ML models and cross-functional teams.
Data transformation: Build a data pipeline to ingest data from various sources, with varying structures, into a Logical Data Warehouse (LDW). The data integration component supports the ML pipeline's needs, for example real-time, near-real-time and batch data streams. It should be based on the preliminary use cases and evolve to support upcoming problems to solve. Machine learning and the Apache Kafka® ecosystem are a great combination for training and deploying analytic models at scale:
(Figure: Machine learning with the Apache Kafka ecosystem. Source: Confluent)

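To make the streaming ingestion step concrete, below is a minimal sketch of a Kafka consumer that pulls raw events from a topic and hands them to a downstream loader. The broker address, topic name and load_into_ldw() helper are illustrative assumptions, not part of any specific product.

```python
# Minimal sketch of a streaming ingestion step (assumes the confluent-kafka client;
# broker, topic and load_into_ldw() are hypothetical placeholders).
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "broker:9092",   # assumed broker address
    "group.id": "ldw-ingestion",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["raw-events"])        # assumed topic name

def load_into_ldw(record: dict) -> None:
    """Placeholder for writing a cleaned record into the Logical Data Warehouse."""
    print(record)

try:
    while True:
        msg = consumer.poll(1.0)          # wait up to 1 s for the next message
        if msg is None or msg.error():
            continue
        load_into_ldw(json.loads(msg.value()))
finally:
    consumer.close()
```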
Feature engineering, or feature analysis, is the process during which the features that describe the structures inherent in your data are analyzed and selected. Much of the ingested data may include variables that are redundant or irrelevant. Sometimes feature analysis is part of the sample selection process. It is an important subcomponent that helps filter out data that may violate privacy conditions or promote unethical predictions.
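As a hedged illustration, the sketch below drops near-constant and highly correlated numeric columns using pandas and scikit-learn; the file name, thresholds and column handling are assumptions made only for the example.

```python
# Sketch of a simple feature-analysis pass: drop near-constant and highly
# correlated numeric columns (file name and thresholds are illustrative).
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

df = pd.read_csv("features.csv")              # assumed extract from the LDW
numeric = df.select_dtypes(include="number")

# Remove near-constant features.
selector = VarianceThreshold(threshold=0.01)
selector.fit(numeric)
kept = numeric.columns[selector.get_support()]

# Remove one of each pair of highly correlated features.
corr = numeric[kept].corr().abs()
to_drop = {
    col
    for i, col in enumerate(corr.columns)
    for prev in corr.columns[:i]
    if corr.loc[prev, col] > 0.95
}
selected = [c for c in kept if c not in to_drop]
print("Selected features:", selected)
```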
Model engineering, or data modeling, includes the data model designs and machine learning algorithms used in ML data processing (including clustering and training algorithms). The modeling portion of the architecture is where algorithms are selected and adapted to address the problem to be examined in the execution phase. For example, if the learning application involves cluster analysis, data clustering algorithms will be part of the ML data model used here. If the learning to be performed is supervised, training algorithms will be involved as well. Extensible algorithms are available as part of Apache MXNet, Apache Spark MLlib and similar libraries.
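For instance, the following sketch fits a k-means clustering model with Apache Spark MLlib, one of the libraries mentioned above; the input path, feature columns and value of k are assumptions chosen for illustration.

```python
# Sketch of a clustering model built with Apache Spark MLlib
# (file path, feature columns and k are illustrative assumptions).
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("model-engineering").getOrCreate()
df = spark.read.parquet("ldw/customer_features")   # assumed LDW extract

# Assemble raw columns into the single vector column MLlib expects.
assembler = VectorAssembler(
    inputCols=["recency", "frequency", "monetary"],  # assumed feature columns
    outputCol="features",
)
features = assembler.transform(df)

# Fit a k-means model and attach cluster assignments to each row.
kmeans = KMeans(k=4, seed=42, featuresCol="features", predictionCol="cluster")
model = kmeans.fit(features)
clustered = model.transform(features)
clustered.groupBy("cluster").count().show()
```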
Model validation: There is no single validation method that works in all scenarios. It is important to understand whether you are dealing with groups or time-indexed data, and whether you are leaking data in your validation procedure. So which validation method is right for me? It depends! Discussions of validation techniques typically stop at k-fold cross-validation. To minimize sampling bias, we can approach validation slightly differently: what if, instead of making a single split, we make many splits and validate on all combinations of those splits? This is where k-fold cross-validation comes in. It splits the data into k folds, trains the model on k-1 folds and tests it on the fold that was left out. It does this for all combinations and averages the results, so each instance is used for validation exactly once.

The above illustration is sourced from towardsdatascience.com. The advantage is that all observations are used for both training and validation, and each observation is used once for validation.
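A minimal sketch of k-fold cross-validation with scikit-learn is shown below; the logistic regression model, the synthetic dataset and the choice of five folds are assumptions made only to illustrate the procedure.

```python
# Sketch of k-fold cross-validation (model choice, k=5 and the synthetic
# dataset are illustrative assumptions).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Train on k-1 folds, validate on the held-out fold, repeat for every fold.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

print("Per-fold accuracy:", scores)
print("Mean accuracy:", scores.mean())
```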
Model execution: Execution is the environment where the processed and training data is forwarded for use in the execution of ML routines (such as A/B testing and tuning). Depending on how advanced the ML routines are, the performance needed for execution may be significant. Hence, one key consideration in this area is the amount of processing power that will be needed to execute ML routines effectively, whether that infrastructure is hosted on-premises or consumed as a service from a cloud provider. For example, a relatively simple neural net with only four or five inputs (or "features") can be handled by a regular CPU on a desktop server or laptop, whereas a large, complex model will demand far more compute. ML and data science teams will often look to test and debug ML models or algorithms prior to deployment. The testing of ML models is typically multidimensional: developers must test for data quality, proper model fit and proper execution. This can be challenging, and it is recommended to design a test environment that mimics production as closely as possible to avoid issues when operationalizing the entire workflow.
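The three test dimensions mentioned above can be expressed as lightweight pre-deployment checks. The sketch below uses pytest; the expected schema, accuracy threshold and the toy model are assumptions for illustration only.

```python
# Sketch of multidimensional pre-deployment tests: data, model fit and
# execution (pytest style; thresholds and the toy model are assumed).
import numpy as np
import pytest
from sklearn.linear_model import LogisticRegression

EXPECTED_FEATURES = 20          # assumed input schema

@pytest.fixture
def trained_model():
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, EXPECTED_FEATURES))
    y = (X[:, 0] > 0).astype(int)
    return LogisticRegression(max_iter=1000).fit(X, y), X, y

def test_data_schema(trained_model):
    _, X, _ = trained_model
    assert X.shape[1] == EXPECTED_FEATURES
    assert not np.isnan(X).any()

def test_model_fit(trained_model):
    model, X, y = trained_model
    assert model.score(X, y) >= 0.9   # assumed minimum acceptable accuracy

def test_execution(trained_model):
    model, X, _ = trained_model
    preds = model.predict(X[:5])
    assert set(preds).issubset({0, 1})
```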
Quality Assurance:
Model release. The model is promoted to the release step, ready to be taken over by the operationalization team, and is labeled a "candidate" model: development-vetted, but not yet fully production ready. It is registered within the registry of the model management system.
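The article does not prescribe a specific model management system; as one possible sketch, MLflow's model registry could record a candidate version, assuming an MLflow tracking server is available and noting that the run name, model name and training data below are placeholders.

```python
# Sketch of registering a candidate model (assumes MLflow as the model
# management system; run name, model name and data are illustrative).
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

with mlflow.start_run(run_name="candidate-release") as run:
    mlflow.sklearn.log_model(model, artifact_path="model")
    mlflow.log_metric("train_accuracy", model.score(X, y))

# Register the run's model as a new "candidate" version in the registry.
result = mlflow.register_model(
    model_uri=f"runs:/{run.info.run_id}/model",
    name="propensity_to_buy",        # assumed model name
)
print("Registered version:", result.version)
```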
Endpoint identification. This is the validation of the decision points where the model will deliver its insight. In general, the implementation takes the form of a REST API or container images, depending on the deployment. The endpoint information is captured for the models that serve AI-based systems (for example, chatbots, image classification systems and so on).
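As a sketch of what such a REST endpoint might look like, the snippet below wraps a serialized model in a small Flask service; the route, payload shape and model file name are assumptions for illustration.

```python
# Sketch of a REST prediction endpoint (Flask; the route, payload format
# and model file are illustrative assumptions).
import pickle
from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:      # assumed serialized candidate model
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()
    features = [payload["features"]]    # assumed payload: {"features": [..numbers..]}
    prediction = model.predict(features)[0]
    return jsonify({"prediction": int(prediction)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```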
Parameter testing. Target business processes might be subject to technical constraints where the velocity, shape, volume and quality of the input data in the production environment might not exactly align with the data in sandboxes used to develop the models. This step is aimed at testing the alignment of the data once the model is part of a real-world application, dashboard or an analytics report.
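One lightweight way to test that alignment is to compare production input against training-time reference statistics; the sketch below checks schema and basic distribution drift with pandas, and the columns, reference values and tolerance are assumptions.

```python
# Sketch of a parameter/data-alignment check: compare a production sample
# against training-time reference statistics (columns and tolerance assumed).
import pandas as pd

REFERENCE_STATS = {                      # captured from the training sandbox
    "age":    {"mean": 41.2, "std": 12.5},
    "income": {"mean": 58000.0, "std": 21000.0},
}
TOLERANCE = 0.25                         # assumed allowed drift, in std devs

def check_alignment(prod_df: pd.DataFrame) -> list:
    issues = []
    for col, ref in REFERENCE_STATS.items():
        if col not in prod_df.columns:
            issues.append(f"missing column: {col}")
            continue
        drift = abs(prod_df[col].mean() - ref["mean"]) / (ref["std"] + 1e-9)
        if drift > TOLERANCE:
            issues.append(f"{col}: mean drifted by {drift:.2f} std devs")
    return issues

prod_sample = pd.read_csv("prod_sample.csv")   # assumed production extract
print(check_alignment(prod_sample) or "input data aligned with training data")
```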
Integration testing. If the expected data matches the development assumptions, the integration assumptions (that is, REST APIs, microservice calls and code integration) must also be tested to ensure proper performance.
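A small integration test against the prediction endpoint sketched earlier might look like the following; the URL, payload and latency budget are assumptions.

```python
# Sketch of an integration test against the prediction REST API
# (endpoint URL, payload and latency budget are illustrative assumptions).
import time
import requests

ENDPOINT = "http://localhost:8080/predict"   # assumed test environment URL

def test_prediction_endpoint():
    payload = {"features": [0.1] * 20}       # assumed feature vector
    start = time.monotonic()
    response = requests.post(ENDPOINT, json=payload, timeout=5)
    latency = time.monotonic() - start

    assert response.status_code == 200
    assert "prediction" in response.json()
    assert latency < 0.5                     # assumed response-time budget
```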
Instantiation validation. As models in production are often part of model ensembles, even slight variations in those elemental models (such as propensity-to-buy models instantiated across multiple states or regions in the same country) can produce radically different results.
KPI validation. Model performance should be measured not only against technical parameters (such as precision) but also against the estimated business OKRs (Objectives and Key Results) set forth as part of the business understanding work described earlier in the planning phase.
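The sketch below shows one hedged way to express such a check in code, combining a technical metric with an estimated business contribution; the value-per-conversion figure and OKR targets are purely illustrative assumptions.

```python
# Sketch of a KPI validation check: technical metric plus an estimated
# business contribution versus OKR targets (all figures are assumptions).
from sklearn.metrics import precision_score

OKR_TARGETS = {"precision": 0.80, "estimated_value": 50_000}  # assumed targets
VALUE_PER_TRUE_POSITIVE = 120.0                               # assumed value

def validate_kpis(y_true, y_pred) -> dict:
    precision = precision_score(y_true, y_pred)
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    estimated_value = true_positives * VALUE_PER_TRUE_POSITIVE
    return {
        "precision_ok": precision >= OKR_TARGETS["precision"],
        "value_ok": estimated_value >= OKR_TARGETS["estimated_value"],
        "precision": precision,
        "estimated_value": estimated_value,
    }

# Example usage with placeholder labels and predictions.
print(validate_kpis([1, 0, 1, 1, 0, 1], [1, 0, 1, 0, 0, 1]))
```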
Deployment:
Once the model has been tested and confirmed to perform within the predefined parameters, it is ready for deployment. The objective in the deployment phase is to activate the model within existing business processes across the organization, at the endpoints we previously discussed. From this point forward the model remains active as long as it meets the business needs and its performance goals. In the deployment phase, there are seven key steps to operationalize the model development process:
Management and governance. Once the model is ready for activation, it should be included in the catalogue, documented and versioned. The model management system should act as the single point of reference for managing and governing the model. In general, the business owns the model, just as it owns the data that was used to build and train it.
Model activation. In this step, the validated model is transitioned to an activated model: it is "production ready," fully documented, and meets enterprise and governance rules and policies.
Model deployment. Based on the architecture, the models are then executed, for example on-premises, on AWS/Azure, or in a hybrid setup. Measures should be taken to guarantee the model's effective transaction processing.
Application integration. In this step, the model joins the AI-based system, for example a chatbot framework. Application developers or data analysts come into play to integrate the model within a production application or analytical platform. The model is finally expected to deliver its business value.
Production audit procedures. Once the model is deployed, it is very important to monitor its performance. For this, model analytics must be implemented to gather the necessary data to monitor models in production. Metrics like accuracy, response time, input data variations and infrastructure performance should be operationalized to keep the model's production behavior under control.
Model behavior tracking. Performance thresholds and notification mechanisms are implemented in this step. Model behavior tracking, together with the production audit procedures, systematically flags any divergence or questionable behavior.
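A minimal sketch of what these last two steps could look like in code follows, exposing prediction metrics and raising a notification when a threshold is breached; the prometheus_client library, metric names, latency threshold and notify() hook are all assumptions.

```python
# Sketch of production audit metrics and behavior tracking
# (prometheus_client, metric names, thresholds and notify() are assumptions).
import time
from prometheus_client import Counter, Histogram, start_http_server

PREDICTIONS = Counter("model_predictions_total", "Predictions served")
LATENCY = Histogram("model_latency_seconds", "Prediction latency")
LATENCY_THRESHOLD = 0.5          # assumed response-time threshold in seconds

def notify(message: str) -> None:
    """Placeholder notification hook (email, chat, pager, etc.)."""
    print("ALERT:", message)

def tracked_predict(model, features):
    start = time.monotonic()
    prediction = model.predict([features])[0]
    elapsed = time.monotonic() - start

    PREDICTIONS.inc()
    LATENCY.observe(elapsed)
    if elapsed > LATENCY_THRESHOLD:
        notify(f"prediction latency {elapsed:.2f}s exceeded threshold")
    return prediction

# Expose metrics for the production audit dashboard to scrape.
start_http_server(9100)          # assumed metrics port
```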
KPI validation. Continuing from the QA cycle and fed by the last two steps, the KPI validation step consistently measures the business contribution of the models in production AI-based systems. The goal is to estimate the business value that can be attributed to the model. The production audit data and procedures feed into the KPI validation process.
ML Organization and Governance:
Tim Fountaine and his colleagues presented two cases from their research in Harvard Business Review (HBR). One company consolidated its AI and analytics teams in a central hub, with all analytics staff reporting to the chief data and analytics officer and being deployed to business units as needed. The second decentralized nearly all its analytics talent, having teams reside in and report to the business units. Both firms developed AI at a scale at the top of their industry; the second organization grew from 30 to 200 profitable AI initiatives in just two years. And both selected their model after considering their organizations' structure, capabilities, strategy, and unique characteristics.
The hub. A small handful of responsibilities are always best handled by a hub and led by the chief analytics or chief data officer. These include data governance, AI recruiting and training strategy, and work with third-party providers of data and AI services and software. Hubs should nurture AI talent, create communities where AI experts can share best practices, and lay out processes for AI development across the organization. Hubs should also be responsible for systems and standards related to AI. These should be driven by the needs of a firm’s initiatives.
The spokes. Another handful of responsibilities should almost always be owned by the spokes, because they’re closest to those who will be using the AI systems. Among them are tasks related to adoption, including end-user training, workflow redesign, incentive programs, performance management, and impact tracking.

Organizing for scale. AI-enabled companies divide key roles between a hub and spokes. A few tasks are always owned by the hub, and the spokes always own execution. The rest of the work falls into a gray area, and a firm’s individual characteristics determine where it should be done.
The gray area. Much of the work in successful AI transformations falls into a gray area in terms of responsibility. Deciding where responsibility should lie within an organization is not an exact science, but it should be influenced by three factors:
The maturity of ML capabilities. When a company is early in its AI journey, it often makes sense for analytics executives, data scientists, data engineers, user interface designers, visualization specialists who graphically interpret analytics findings, and the like to sit within a hub and be deployed as needed to the spokes. Working together, these players can establish the company’s core AI assets and capabilities, such as common analytics tools, data processes, and delivery methodologies. But as time passes and processes become standardized, these experts can reside within the spokes just as (or more) effectively.
Business model complexity. The greater the number of business functions, lines of business, or geographies AI tools will support, the greater the need to build guilds of AI experts (of, say, data scientists or designers). Companies with complex businesses often consolidate these guilds in the hub and then assign them out as needed to business units, functions, or geographies.
The pace and level of technical innovation required. When they need to innovate rapidly, some companies put more gray-area strategy and capability building in the hub, so they can monitor industry and technology changes better and quickly deploy AI resources to head off competitive challenges.
Note:
1) The term was popularized by Gartner, Google and Microsoft.