Drive Business Value with an Agile Approach to Developing and Operationalizing Machine Learning (ML) Models

--

Business and technology professionals continue to face challenges in operationalizing ML for effective development, deployment, and governance. Many of us still treat the operationalization process as more of an art than a systematic approach, which creates significant challenges in scaling and maintaining ML models. Why? Because ML initiatives are different from traditional IT product development initiatives. ML initiatives are highly experimental and require skills from many more domains, for example, statistical analysis, data analysis, platform engineering, and application development. There is also often a lack of process understanding, a communication gap between the teams involved, and an unwillingness by development and operations teams to engage in each other's domains to align ML model development and operationalization effectively.

Experts recommend that responsible business and IT professionals should reframe their thinking and focus on:

1. Treat machine learning development as its own life cycle. For example, establish a Machine Learning Development Life Cycle¹ (MLDLC) by leveraging ML, DevSecOps, and DataOps skills and capabilities, where data architecture comes first.

2. Establish an ML platform to support AI use cases.

3. Once implemented, put management and governance in place for continuous management and monitoring of the ML models, the associated data, and the business value delivered.

In the following sections, we will discuss the critical capabilities, such as data, platform, and processes, needed for a practical operationalization framework that addresses the above challenges. The MLDLC is derived from the Cross-Industry Standard Process for Data Mining (CRISP-DM). It is simple, yet it applies the discipline needed for data and engineering integrity and robustness. The result is an ML model development life cycle comprising four primary phases: development, quality assurance, deployment, and management and governance.

A typical machine learning architecture includes five functional stages:

Planning and Development:

In contrast to a static algorithm coded by a software developer, an ML model is an algorithm that learns and is dynamically updated. You can think of a software application as an amalgamation of algorithms, defined by design patterns and coded by software engineers, that perform planned tasks. Once an application is released to production, it may not function as intended, prompting developers to rethink, redesign, and rewrite it (continuous integration / continuous delivery). ML models, on the other hand, are essentially a set of dynamic algorithms. This dynamism presents a host of new challenges for a planner who works in conjunction with product owners and quality assurance (QA) teams. For example, how should the QA team test and report?

Unlike other IT projects, the appropriate moment for seeding a project portfolio approach is during the planning phase, and it should start with creating a set of business problems to solve for:

Insights and problem identification. Focus on problems that would be difficult to solve with traditional programming. For example, consider Smart Reply. The Smart Reply team recognized that users spend a lot of time replying to emails and messages; a product that can predict likely responses can save users time. Imagine trying to create a system like Smart Reply with conventional programming: there is no straightforward approach. By contrast, machine learning can solve these problems by examining patterns in data and adapting to them. Think of ML as just one of the tools in your toolkit, and only bring it out when appropriate.

With these examples in mind, ask yourself the following questions:

1. What problem is my product facing?

2. Would it be a good problem for ML?

Don’t ask the questions the other way around!

Data source discovery: The use cases for solving the selected problems could come from industry best practices. Another approach is to create a "data map," exploring existing analytical and data assets that have remained untapped. Business processes that can be automated, ideas that can further existing business initiatives, and needs for new data sets should all be included in the analytical process. The product owner should partner with business lines and IT organizations to tap into high-volume, data-generating applications to look for untapped data sources and insights. In general, continuous innovation with AI is driven by data, ideas, ML models, and cross-functional teams.

Data transformation: Build a data pipeline to ingest data with varying structures from various sources into a Logical Data Warehouse (LDW). The data integration component supports ML pipeline needs, for example, real-time, near-real-time, and batch data streams. It should be based on preliminary use cases and evolve to support the upcoming problems to solve. Machine learning and the Apache Kafka® ecosystem are an excellent combination for training and deploying analytic models at scale:

[Figure: Training and deploying analytic models at scale with the Apache Kafka ecosystem. Source: Confluent]
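To make the ingestion side concrete, the sketch below shows a minimal Python consumer that pulls raw events from a Kafka topic into a pandas DataFrame for downstream feature engineering. The broker address, topic name, and record schema are illustrative assumptions, not part of the original architecture.

```python
# A minimal sketch of ingesting events from Kafka into a DataFrame for
# downstream ML processing. Broker, topic, and schema are hypothetical.
import json
import pandas as pd
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # assumed broker address
    "group.id": "ml-ingest",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["transactions"])          # assumed topic name

records = []
for _ in range(1000):                         # bounded poll loop for illustration
    msg = consumer.poll(timeout=1.0)
    if msg is None:
        continue
    if msg.error():
        continue                              # skip malformed messages in this sketch
    records.append(json.loads(msg.value().decode("utf-8")))

consumer.close()
df = pd.DataFrame(records)                    # hand off to the feature pipeline
```

In practice the same stream could also feed real-time scoring, but a bounded batch keeps the sketch simple.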

Feature engineering, or feature analysis, is the stage where the features that describe the structures inherent in your data are analyzed and selected. Much of the ingested data may include variables that are redundant or irrelevant. Sometimes feature analysis is part of the sample selection process. It is a critical subcomponent that helps filter out data that may violate privacy conditions or promote unethical predictions.
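As a rough sketch of this filtering step, the Python snippet below drops near-constant and highly correlated (redundant) columns from a numeric feature table. The thresholds and the assumption that all columns are numeric are illustrative choices, not prescriptions.

```python
# A minimal sketch of filtering redundant or irrelevant features.
# Assumes a purely numeric DataFrame; thresholds are illustrative.
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

def select_features(df: pd.DataFrame, corr_threshold: float = 0.95) -> pd.DataFrame:
    # Drop near-constant features that carry little information.
    selector = VarianceThreshold(threshold=0.01)
    kept = df.columns[selector.fit(df).get_support()]
    df = df[kept]

    # Drop one of each pair of highly correlated (redundant) features.
    corr = df.corr().abs()
    to_drop = set()
    for i, col_a in enumerate(corr.columns):
        for col_b in corr.columns[i + 1:]:
            if corr.loc[col_a, col_b] > corr_threshold:
                to_drop.add(col_b)
    return df.drop(columns=list(to_drop))
```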

Model engineering, or data modeling, includes the data model designs and machine learning algorithms used in ML data processing (including clustering and training algorithms). The modeling portion of the architecture is where algorithms are selected and adapted to address the problem examined in the execution phase. For example, if the learning application involves cluster analysis, data clustering algorithms will be part of the ML data model used here. If the learning to be performed is supervised, training algorithms will be involved as well. Extensible algorithms are available as part of Apache MXNet, Apache Spark MLlib, and similar libraries.
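For the clustering example mentioned above, a minimal Spark MLlib sketch might look like the following. The input path, feature column names, and the choice of k are assumptions made purely for illustration.

```python
# A minimal sketch of k-means clustering with Apache Spark MLlib.
# Input location, feature columns, and k are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("clustering-sketch").getOrCreate()
df = spark.read.parquet("s3://example-bucket/features/")    # assumed location

assembler = VectorAssembler(inputCols=["f1", "f2", "f3"],   # assumed feature columns
                            outputCol="features")
features = assembler.transform(df)

kmeans = KMeans(k=5, seed=42, featuresCol="features")       # k chosen for illustration
model = kmeans.fit(features)
clusters = model.transform(features)                        # adds a "prediction" column
clusters.groupBy("prediction").count().show()
```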

Model validation: There is no single validation method that works in all scenarios. It is essential to understand whether you are dealing with groups, time-indexed data, or data leakage in your validation procedure. So, which validation method is right for me? It depends! But validation techniques typically stop at k-fold cross-validation. To minimize sampling bias, we can approach validation slightly differently: what if, instead of doing a single split, we do many splits and validate all combinations of those splits? This is where k-fold cross-validation comes in. It splits the data into k folds, trains on k-1 folds, and tests on the one fold left out. It does this for all combinations and averages the results. (An illustration of this process can be found on towardsdatascience.com.) The advantage is that all observations are used for both training and validation, and each observation is used exactly once for validation.
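A minimal sketch of k-fold cross-validation with scikit-learn is shown below. The synthetic dataset, the logistic regression model, and k=5 are illustrative assumptions.

```python
# A minimal sketch of k-fold cross-validation with scikit-learn.
# Dataset, model choice, and k=5 are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000)

# Each observation is used k-1 times for training and exactly once for validation.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"fold accuracies: {scores}")
print(f"mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```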

Model execution: Execution is the environment where the processed and training data is forwarded for use in the execution of ML routines (such as A/B testing and tuning). Depending on how advanced the ML routines are, the performance needed for execution may be significant. Hence, one key consideration is the amount of processing power required to execute ML routines effectively, whether that infrastructure is hosted on-premises or consumed as a service from a cloud provider. For example, for a relatively simple neural net with only four or five inputs (or "features"), the processing could be handled by a regular CPU on a desktop, server, or laptop computer, whereas larger, more complex networks require far more compute. ML and data science teams will often look to test and debug ML models or algorithms before deployment. The testing of ML models is typically multidimensional: developers must test for data, proper model fit, and good execution. This can be challenging, and it is recommended to design a test environment that mimics production as closely as possible to avoid issues when operationalizing the entire workflow.
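One way to make that multidimensional testing tangible is a small pytest-style suite that checks the data, the model fit, and the execution budget. The dataset, model, and latency budget below are illustrative assumptions, not a prescribed test plan.

```python
# A minimal pytest-style sketch of multidimensional model testing:
# data checks, model-fit checks, and execution-time checks.
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

def test_data_quality():
    # Data dimension: no missing values, labels aligned with features.
    assert not np.isnan(X).any()
    assert len(X) == len(y)

def test_model_fit_beats_baseline():
    # Model-fit dimension: accuracy must beat the majority-class baseline.
    accuracy = (model.predict(X_val) == y_val).mean()
    baseline = np.bincount(y_val).max() / len(y_val)
    assert accuracy > baseline

def test_execution_latency_budget():
    # Execution dimension: predictions stay within an assumed latency budget.
    start = time.perf_counter()
    model.predict(X_val)
    assert time.perf_counter() - start < 1.0
```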

Quality Assurance:

Model release. The model is promoted to the release step, ready to be taken up by the operationalization team, and labeled a "candidate" model: development-vetted, but not yet fully production-ready. It is registered in the model management system's registry.
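One common way to implement this registration step is a model registry such as MLflow. The sketch below assumes an MLflow tracking server; the tracking URI, metric value, and model name are illustrative placeholders rather than a prescribed setup.

```python
# A minimal sketch of registering a development-vetted "candidate" model in a
# model registry, here MLflow. URI, metric, and model name are illustrative.
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression

mlflow.set_tracking_uri("http://mlflow.example.internal:5000")  # assumed tracking server

candidate_model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])  # stand-in for the dev-phase model

with mlflow.start_run() as run:
    mlflow.sklearn.log_model(candidate_model, artifact_path="model")
    mlflow.log_metric("val_accuracy", 0.91)  # illustrative development metric

# Register the logged model as a named, versioned "candidate" in the registry.
result = mlflow.register_model(f"runs:/{run.info.run_id}/model", "propensity-candidate")
print(f"registered version: {result.version}")
```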

Endpoint identification. This is the validation of the decision points where the model will deliver its insights. In general, the implementation takes the form of a REST API or container image, depending on the deployment. The endpoint identification information points to the AI-based systems that consume the model (for example, chatbots, image classification systems, etc.).
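For the REST API case, a minimal serving sketch with FastAPI might look like the following. The model file, feature names, and route are illustrative assumptions.

```python
# A minimal sketch of exposing a model as a REST endpoint with FastAPI.
# Model path, feature names, and route are illustrative assumptions.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")        # assumed serialized candidate model

class Features(BaseModel):
    f1: float
    f2: float
    f3: float

@app.post("/predict")
def predict(features: Features):
    score = model.predict([[features.f1, features.f2, features.f3]])[0]
    return {"prediction": float(score)}
```

Such a service would typically be run with an ASGI server such as uvicorn and packaged as a container image for deployment.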

Parameter testing. Target business processes might be subject to technical constraints. The velocity, shape, volume, and quality of the input data in the production environment might not align with the sandbox data used to develop the models. This step aims to test the data's alignment once the model is part of a real-world application, dashboard, or analytics report.
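A simple way to automate this alignment check is to compare incoming batches against the schema and value ranges assumed during development. The expected columns, dtypes, and ranges below are hypothetical placeholders.

```python
# A minimal sketch of checking that production input data aligns with
# development-time assumptions. Expected schema and ranges are illustrative.
import pandas as pd

EXPECTED_COLUMNS = {"f1": "float64", "f2": "float64", "f3": "float64"}
EXPECTED_RANGES = {"f1": (0.0, 1.0), "f2": (-5.0, 5.0)}

def validate_batch(df: pd.DataFrame) -> list:
    issues = []
    for col, dtype in EXPECTED_COLUMNS.items():
        if col not in df.columns:
            issues.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            issues.append(f"unexpected dtype for {col}: {df[col].dtype}")
    for col, (lo, hi) in EXPECTED_RANGES.items():
        if col in df.columns and not df[col].between(lo, hi).all():
            issues.append(f"values outside development range for {col}")
    return issues
```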

Integration testing. Even if the expected data matches the development assumptions, the integration points (REST APIs, microservice calls, and code integration) need to be tested to ensure proper performance.

Instantiation validation. As models in production are often part of model ensembles, even slight variations in those elemental models (such as propensity-to-buy models instantiated across multiple states or regions in the same country) can produce radically different results.

KPI validation. We should measure model performance against technical parameters (such as precision) and test the hypothesis against the estimated business OKRs (Objectives and Key Results) set forth as part of the business understanding step indicated earlier in the prework step's ideation section. A minimal sketch of such a check follows.
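The sketch below compares technical metrics and the estimated business contribution against pre-agreed thresholds. All names and numbers are hypothetical and would come from the earlier business understanding step in practice.

```python
# A minimal sketch of KPI validation: compare technical metrics and the
# estimated business OKR contribution against pre-agreed thresholds.
# All names and numbers are illustrative assumptions.
TECHNICAL_THRESHOLDS = {"precision": 0.80, "recall": 0.70}
OKR_TARGETS = {"estimated_monthly_savings": 50_000}

def validate_kpis(metrics: dict, business_estimate: dict) -> bool:
    technical_ok = all(metrics.get(k, 0.0) >= v for k, v in TECHNICAL_THRESHOLDS.items())
    business_ok = all(business_estimate.get(k, 0.0) >= v for k, v in OKR_TARGETS.items())
    return technical_ok and business_ok

# Example usage with hypothetical QA results:
print(validate_kpis({"precision": 0.84, "recall": 0.72},
                    {"estimated_monthly_savings": 62_000}))
```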

Deployment:

Once the model has been tested and confirmed to perform within the pre-defined parameters, it is ready for deployment. The deployment phase aims to activate the model within existing business processes across the organization at the endpoints we previously discussed. From this point forward, the model remains active as long as it meets the business needs and performance goals. In the deployment phase, there are seven key steps to operationalize the model development process:

Management and governance. Once the model is ready for activation, it should be cataloged, documented, and versioned. The model management system should act as the single point of reference for managing and governing the model. In general, the business owns the model, similar to how it owns the data used to build and train the model.

Model activation. In this step, the validated model transitions to an activated model. The model is "production-ready," fully documented, and compliant with enterprise and governmental rules and policies.

Model deployment. The models are then executed on the target architecture, for example, on-premises, AWS/Azure, or hybrid. Proper steps should be taken to guarantee the model's effective transaction processing.

Application integration. In this step, the model joins the AI-based system, e.g., a chatbot framework. Application developers or data analysts come into play to integrate the model into a production application or analytical platform. The model is finally expected to deliver its business value.
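From the application side, integration is often just a call to the prediction endpoint established earlier. The URL, payload fields, and business action below are illustrative assumptions.

```python
# A minimal sketch of integrating a deployed prediction endpoint into an
# application. Endpoint URL and payload fields are illustrative assumptions.
import requests

def get_prediction(f1: float, f2: float, f3: float) -> float:
    response = requests.post(
        "https://ml.example.internal/predict",        # assumed endpoint
        json={"f1": f1, "f2": f2, "f3": f3},
        timeout=2.0,
    )
    response.raise_for_status()
    return response.json()["prediction"]

# Example usage inside a business workflow:
if get_prediction(0.4, 1.2, -0.3) > 0.5:
    print("route to the high-propensity campaign")    # hypothetical business action
```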

Production audit procedures. Once the model is deployed, it is essential to monitor its performance. For this, model analytics must be implemented to gather the data needed to watch models in production. Metrics like accuracy, response time, input data variations, and infrastructure performance should be operationalized to keep the model's production output under control.

Model behavior tracking. Performance thresholds and notification mechanisms are implemented in this step. Model behavior tracking, together with the production audit procedures, systematically flags any divergence or questionable behavior, as in the sketch below.
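The sketch takes metrics gathered by the production audit procedures and raises a notification when any of them breach a threshold. The metric names, limits, and the logging-based alert are illustrative assumptions; a real setup would page an on-call team or open a ticket.

```python
# A minimal sketch of model behavior tracking: compare production metrics
# against thresholds and raise a notification on divergence.
# Thresholds and the alerting hook are illustrative assumptions.
import logging

THRESHOLDS = {"accuracy": 0.75, "p95_latency_ms": 200, "null_rate": 0.05}

def check_model_behavior(metrics: dict) -> None:
    for name, limit in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            continue
        # Accuracy must stay above its limit; latency and null rate below theirs.
        breached = value < limit if name == "accuracy" else value > limit
        if breached:
            logging.warning("model behavior alert: %s=%s breaches limit %s",
                            name, value, limit)

# Example usage with hypothetical audit data:
check_model_behavior({"accuracy": 0.71, "p95_latency_ms": 180, "null_rate": 0.02})
```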

KPI validation. Continuing from the QA cycle and fed by the previous two steps, the KPI validation step consistently measures the model's business contribution in production AI-based systems. The goal is to verify the estimated business value that can be attributed to the model. The production audit data and procedures feed into the KPI validation process.

ML Organization and Governance:

Tim Fountaine and his colleagues presented two cases in their HBR research. One firm consolidated its AI and analytics teams in a central hub, with all analytics staff reporting to the chief data and analytics officer and being deployed to business units as needed. The second decentralized nearly all its analytics talent, having teams reside in and report to the business units. Both firms developed AI at a scale at the top of their industry; the second organization grew from 30 to 200 profitable AI initiatives in just two years. And both selected their model after considering their organization's structure, capabilities, strategy, and unique characteristics.

The hub. A small handful of responsibilities are always best handled by a hub and led by the chief analytics or chief data officer. These include data governance, AI recruiting and training strategy, and work with third-party providers of data, AI services, and software. Hubs should nurture AI talent, create communities where AI experts can share best practices, and lay out processes for AI development across the organization. The hub should also be responsible for AI-related systems and standards, driven by the needs of the firm's initiatives.

The spokes. Another handful of responsibilities should almost always be owned by the spokes, because they’re closest to those who will be using the AI systems. Among them are tasks related to adoption, including end-user training, workflow redesign, incentive programs, performance management, and impact tracking.

Organizing for scale. AI-enabled companies divide key roles between a hub and spokes. The hub always owns a few tasks, and the spokes always own execution. The rest of the work falls into a gray area, and a firm’s characteristics determine where it should be done.

The gray area. Much of the work in successful AI transformations falls into a gray area in terms of responsibility. Deciding where responsibility should lie within an organization is not an exact science, but three factors should influence it:

The maturity of ML capabilities. Early in the AI journey, it often makes sense for the hub to house analytics executives, data scientists, data engineers, user interface designers, and visualization specialists who graphically interpret analytics findings, and to deploy these resources as needed to the spokes. Working together, these players can establish the company's core AI assets and capabilities, such as standard analytics tools, data processes, and delivery methodologies. But as time passes and processes become standardized, these experts can reside within the spokes just as (or more) effectively.

Business model complexity. The greater the number of business functions, lines of business, or geographies AI tools will support, the greater the need to build guilds of AI experts (of, say, data scientists or designers). Companies with complex businesses often consolidate these guilds in the hub and then assign them out as needed to business units, functions, or geographies.

The pace and level of technical innovation required. When an organization needs to innovate rapidly, it may put more gray-area strategy and capability building in the hub so it can better monitor industry and technology changes and quickly deploy AI resources to head off competitive challenges.

Note:

1) Gartner, Google, and Microsoft popularized the term.


Imran Salahuddin, ICF PCC, SPC5, CSP