Data science & machine learning are the latest buzzwords in most companies. From the CEO down, people at every level want to use data to gain magical insights about customers, the business, and operations. The result is often a collection of small, disjointed data science models scattered across the company. This in turn leads to engineering rework, wasted effort, high costs, and lower ROI, none of which serves the company's overall goals.
Product Managers can maximize the impact of data science and machine learning efforts.
Why Product Managers?
Data science provides intelligence and improves the performance of many departments. With data science, product organizations learn more about their customers' behavior. Corporate leaders gain insights into business metrics. Customer success teams learn about potential renewal risks and price increase opportunities. IT teams gain operational insights into the infrastructure. Security teams detect and avoid potential threats. Marketing teams gain insights to better target their campaigns. These are just a few examples of the potential benefits. Product managers can drive these cross-functional efforts in a coordinated manner. Because of their cross-functional relationships, product managers understand the needs of different groups. They can bring these efforts together, rank them by business impact, and have the engineering team focus on the right priorities.
How to maximize the impact?
This is a strategic decision with a long-term impact on the company, and product managers should approach it as such. Consider the following three aspects to maximize the impact of the Data Science & Machine Learning efforts.
Stakeholders:
Start by focusing on the stakeholders, i.e., the departments within your company that are requesting data science & machine learning models. First, understand what they are trying to do. Do they need data science or machine learning? Ask questions about the availability of data sets. How large are those data sets? Can we get by with PowerPivot, or do we need Hadoop storage? Is the data structured, unstructured, or a combination? Can we train/build a model using this data? Does the data have enough velocity and veracity?
Most engineering groups will be able to decide what they need, but you will have to guide the others down the right path.
Once you decide that the problem requires data science, add the stakeholder to your list of potential users and estimate the impact it will have on the company.
Platform:
Build a platform that stores all the data science & machine learning models developed within the organization, with a focus on encouraging reuse.
Goal –
The goal is to build an economical platform that serves the data science & machine learning model needs of all stakeholders. The goal also includes reusing models, minimizing rework, and maximizing benefits for the company. Needless to say, the platform should be easy to use.
Reusable Models –
This is easier said than done, because it is difficult to find two data sets that are exactly alike. For a model to be reused as-is, the new data set needs to match the model's training data set, which is rare.
Nonetheless, a model can still be reused if the purpose it was built for matches the new usage scenario. To achieve this, start with good specifications for each trained model. These specifications should detail the purpose of the model, the training data set used, and, where available, the results the model achieved. Any user of the platform should be able to read a specification and determine whether the model meets their requirements.
Once users find the right model, they should be able to easily retrain it with a new data set. Retraining may change the model's behavior, so offer a way for users to check the retrained model back in with its own specification describing how it differs from the original.
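A specification can be captured as a small structured record, so users can rule models in or out mechanically before reading the details. The sketch below is hypothetical: the `ModelSpec` fields, the `meets_requirements` check, and the example values are illustrations under assumed requirements (matching purpose, available input features, a minimum quality score), not a prescribed schema.

```python
from dataclasses import dataclass, field


@dataclass
class ModelSpec:
    """Hypothetical specification record for a model in the shared library."""
    name: str
    purpose: str            # the problem the model was built to solve
    training_data: str      # human-readable description of the training data set
    features: set[str]      # input columns the model expects
    results: dict[str, float] = field(default_factory=dict)  # metric name -> score


def meets_requirements(spec: ModelSpec, purpose: str, available_features: set[str],
                       metric: str, min_score: float) -> bool:
    """True if the model's purpose matches, its inputs are available, and it clears the bar."""
    return (
        spec.purpose == purpose
        and spec.features <= available_features               # every required input exists
        and spec.results.get(metric, float("-inf")) >= min_score
    )


# Illustrative values only, not real results.
churn = ModelSpec(
    name="churn-risk",
    purpose="renewal risk",
    training_data="two years of subscription and support-ticket history",
    features={"tenure_months", "tickets_90d", "seats"},
    results={"auc": 0.87},
)
print(meets_requirements(churn, "renewal risk",
                         {"tenure_months", "tickets_90d", "seats", "region"}, "auc", 0.8))
```

A structured check like this only narrows the candidates; the free-text purpose and training data description still need a human read.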
Library –
Once you have a small library with a few reusable models, provide a way for different users to change and enhance these models. Make sure the models are properly versioned, with an updated specification for every check-in. Also make sure that users can access and deploy any version of the same model. This ensures that the right version of the model can serve the needs of each stakeholder.
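The versioning rules above can be sketched as an in-memory library in which every check-in must carry its own specification and change notes, and any version stays retrievable. `ModelLibrary`, `ModelVersion`, and the artifact paths are hypothetical names for illustration; a real platform would back this with a database and artifact storage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelVersion:
    version: int
    artifact_uri: str   # where the trained model is stored (hypothetical location)
    spec: str           # specification text for this version
    change_notes: str   # how this version differs from the previous one


class ModelLibrary:
    """In-memory sketch of a versioned model library."""

    def __init__(self) -> None:
        self._models: dict[str, list[ModelVersion]] = {}

    def check_in(self, name: str, artifact_uri: str, spec: str, change_notes: str) -> int:
        """Store a new version and return its number; spec and notes are mandatory."""
        if not spec or not change_notes:
            raise ValueError("each check-in needs a specification and change notes")
        history = self._models.setdefault(name, [])
        version = len(history) + 1
        history.append(ModelVersion(version, artifact_uri, spec, change_notes))
        return version

    def get(self, name: str, version: Optional[int] = None) -> ModelVersion:
        """Return a specific version, or the latest one when version is None."""
        history = self._models[name]
        if version is None:
            return history[-1]
        if not 1 <= version <= len(history):
            raise KeyError(f"{name} has no version {version}")
        return history[version - 1]

    def versions(self, name: str) -> list[int]:
        """List every version number available for deployment."""
        return [v.version for v in self._models.get(name, [])]


lib = ModelLibrary()
lib.check_in("churn-risk", "models/churn-risk/v1", "spec v1", "initial version")
lib.check_in("churn-risk", "models/churn-risk/v2", "spec v2", "retrained on EMEA data")
print(lib.versions("churn-risk"), lib.get("churn-risk", 1).artifact_uri)
```

Keeping old versions immutable (`frozen=True`) means a stakeholder pinned to version 1 is unaffected when someone else checks in version 2.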
REST API –
Build a well-structured and secure REST API for these models. Using this REST API, users can feed a model with data from different sources (self-hosted or cloud-hosted). They should also be able to run the model in different environments (cloud, datacenter, serverless containers). Make the use of this REST API mandatory for all models. This ensures that the models are easily accessible across different departments in the company.
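To make the idea concrete, here is a minimal sketch of a single prediction endpoint written against Python's standard WSGI interface. Everything specific is an assumption for illustration: the `/models/<name>/predict` path, the bearer-token check against `API_TOKENS`, the `MODELS` registry, and the toy scoring function standing in for a real trained model.

```python
import json

# Hypothetical registry: model name -> callable that scores a list of records.
MODELS = {
    "churn-risk": lambda rows: [min(1.0, r.get("tickets_90d", 0) / 10) for r in rows],
}
API_TOKENS = {"example-token"}  # placeholder; use your company's identity provider


def app(environ, start_response):
    """WSGI app: POST /models/<name>/predict with a JSON body {"rows": [{...}, ...]}."""
    def respond(status, payload):
        body = json.dumps(payload).encode("utf-8")
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(body)))])
        return [body]

    # Reject callers without a valid bearer token before doing any work.
    if environ.get("HTTP_AUTHORIZATION", "") not in {f"Bearer {t}" for t in API_TOKENS}:
        return respond("401 Unauthorized", {"error": "missing or invalid token"})

    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if (environ.get("REQUEST_METHOD") != "POST" or len(parts) != 3
            or parts[0] != "models" or parts[2] != "predict"):
        return respond("404 Not Found", {"error": "unknown endpoint"})

    model = MODELS.get(parts[1])
    if model is None:
        return respond("404 Not Found", {"error": f"no model named {parts[1]}"})

    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        rows = json.loads(environ["wsgi.input"].read(length))["rows"]
        if not isinstance(rows, list) or not all(isinstance(r, dict) for r in rows):
            raise TypeError("rows must be a list of objects")
    except (ValueError, KeyError, TypeError):
        return respond("400 Bad Request", {"error": "body must be JSON with a 'rows' list"})

    return respond("200 OK", {"model": parts[1], "predictions": model(rows)})
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves the app on port 8000. Because every model sits behind the same request and response shape, any department can call any model without learning a model-specific interface; a production platform would add a proper framework, gateway, and authentication service.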
Newer Models –
Allow users to add newer models to the library. Specify your requirements for the REST API, data sources, auth/login, and runtime environments for each model. This ensures that the model fits well into the platform and that other stakeholders can securely reuse it.
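One way to enforce these requirements is to have contributors submit a manifest with each new model and reject submissions that do not conform. The sketch below assumes a hypothetical manifest format; the required fields, the allowed auth schemes, the allowed runtimes, and the `/models/` path rule are example policies, not a standard.

```python
# Example platform policy (hypothetical values; set these to your own requirements).
REQUIRED_FIELDS = {"name", "owner", "rest_api", "data_sources", "auth", "runtime"}
ALLOWED_AUTH = {"oauth2", "api_key"}
ALLOWED_RUNTIMES = {"cloud", "datacenter", "serverless"}


def validate_manifest(manifest: dict) -> list[str]:
    """Return a list of problems; an empty list means the model may be added."""
    problems = []

    missing = REQUIRED_FIELDS - manifest.keys()
    if missing:
        problems.append(f"missing fields: {', '.join(sorted(missing))}")

    api = manifest.get("rest_api")
    if api is not None and not (isinstance(api, dict)
                                and str(api.get("predict_path", "")).startswith("/models/")):
        problems.append("rest_api.predict_path must live under /models/")

    if "data_sources" in manifest and not manifest["data_sources"]:
        problems.append("declare at least one data source")

    if "auth" in manifest and manifest["auth"] not in ALLOWED_AUTH:
        problems.append(f"auth must be one of: {', '.join(sorted(ALLOWED_AUTH))}")

    runtimes = manifest.get("runtime", [])
    if isinstance(runtimes, str):  # accept a single runtime given as a string
        runtimes = [runtimes]
    unsupported = set(runtimes) - ALLOWED_RUNTIMES
    if unsupported:
        problems.append(f"unsupported runtimes: {', '.join(sorted(unsupported))}")

    return problems


print(validate_manifest({
    "name": "upsell-propensity", "owner": "customer-success",
    "rest_api": {"predict_path": "/models/upsell-propensity/predict"},
    "data_sources": ["crm"], "auth": "oauth2", "runtime": ["cloud", "serverless"],
}))
```

Returning every problem at once, rather than failing on the first, lets a contributor fix a submission in one pass.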
Programming Languages –
Add support for the different programming languages used within your company. To understand which languages you need to support, survey both the data scientists and the engineers in your company. The data scientists might need R, Python, or C++ support, while other engineers might need Java, Node.js, or .NET support. To make the models on the platform reusable, you need to support the programming languages used by both groups.
Search –
Now think about search and ways to make it easier for users to find the right models. Use both business and technical tags to describe the models, along with intuitive search features (auto-suggest, filters, etc.). This could be the most challenging part of the platform, because the specifications, training data set, and achieved results will not fully convey whether a model can be reused in a new scenario. So you have to develop different strategies to make search work for all users on the platform.
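As a starting point, business and technical tags can drive a simple ranked search with hard filters and prefix auto-suggest. This is a sketch under stated assumptions: `CatalogEntry`, `search`, `suggest`, the lowercase tag convention, and the example catalog are all hypothetical, and a real platform would likely use a dedicated search engine.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    name: str
    business_tags: set[str]   # e.g. "renewal", "pricing" (assumed lowercase)
    technical_tags: set[str]  # e.g. "python", "classification" (assumed lowercase)

    def tags(self) -> set[str]:
        return self.business_tags | self.technical_tags


def search(catalog: list[CatalogEntry], terms: list[str],
           required_tags: frozenset[str] = frozenset()) -> list[str]:
    """Apply hard filters, then rank models by how many search terms match their tags."""
    wanted = {t.lower() for t in terms}
    scored = []
    for entry in catalog:
        if not required_tags <= entry.tags():   # filter, e.g. must support Python
            continue
        score = len(wanted & entry.tags())
        if score:
            scored.append((score, entry.name))
    scored.sort(key=lambda s: (-s[0], s[1]))    # best match first, ties by name
    return [name for _, name in scored]


def suggest(catalog: list[CatalogEntry], prefix: str, limit: int = 5) -> list[str]:
    """Auto-suggest: known tags that start with what the user has typed."""
    prefix = prefix.lower()
    all_tags = {t for entry in catalog for t in entry.tags()}
    return sorted(t for t in all_tags if t.startswith(prefix))[:limit]


catalog = [
    CatalogEntry("churn-risk", {"renewal", "customer success"}, {"python", "classification"}),
    CatalogEntry("upsell", {"pricing", "customer success"}, {"r", "regression"}),
    CatalogEntry("threat-detect", {"security"}, {"python", "anomaly detection"}),
]
print(search(catalog, ["customer success", "classification"]))
print(search(catalog, ["customer success"], required_tags=frozenset({"python"})))
print(suggest(catalog, "c"))
```

Tag overlap alone cannot tell a user whether a model transfers to their data, which is why the specification and retraining history remain part of the decision.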
Ownership & Controls –
This platform hosts models developed and used by different stakeholders, serving the specific needs of your company. Ensure that the stakeholders feel a sense of ownership of the platform, because this drives higher usage and contribution. But to keep the platform scalable and to drive reuse, tightly control the REST API specifications and the overall platform user interface.
This platform could be the biggest cross-functional product you have ever developed. Ensure that there is enough alignment and support to make it a success.
Economics:
Now that you have a platform, you have to make sure that its models do not drain the company's resources (i.e., compute, engineering time, or budget). Inform users about the expected compute costs of the models in the library. This helps users pick the model that meets both their needs and their budget.
Create a business case template to help stakeholders decide whether to invest in developing newer models. Sometimes this business case will deter users from developing new models and encourage them to reuse existing ones.
For some specialized cases, discourage model development from scratch altogether, and instead leverage the reusable algorithms and models provided by companies like Algorithmia.
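A business case template can be as simple as comparing options on net value over a fixed horizon. The sketch below is a hypothetical template: the `Option` fields, the undiscounted three-year horizon, and all dollar figures are assumptions chosen for illustration, not benchmarks.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One way to meet a stakeholder's need, e.g. reuse an existing model or build a new one."""
    label: str
    one_time_cost: float        # engineering effort to build, adapt, or retrain
    annual_compute_cost: float  # estimated hosting and compute spend per year
    annual_benefit: float       # estimated business value per year

    def net_value(self, years: int) -> float:
        """Benefit minus cost over the evaluation horizon (no discounting)."""
        return (self.annual_benefit - self.annual_compute_cost) * years - self.one_time_cost


def recommend(options: list[Option], years: int = 3) -> Option:
    """Pick the option with the highest net value over the horizon."""
    return max(options, key=lambda o: o.net_value(years))


# Hypothetical figures for illustration only.
reuse = Option("retrain existing churn model", 20_000, 10_000, 100_000)
build = Option("build new model from scratch", 150_000, 15_000, 120_000)
best = recommend([reuse, build])
print(best.label, best.net_value(3))  # retrain existing churn model 250000
```

Here the new model promises more annual benefit, but its up-front cost means reuse still wins over three years; making that trade-off explicit is what lets the template steer users toward existing models.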
Build the platform so that it tracks its overall cost to the company and can demonstrate its ROI. This ensures that the platform gets long-term support and that your company benefits from the data intelligence.
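ROI here is taken as (total benefit minus total cost) divided by total cost. The sketch below assumes the platform logs per-stakeholder usage with a compute cost and an estimated benefit, and adds a fixed platform cost (engineering and infrastructure). `UsageRecord`, `roi_report`, and the numbers are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageRecord:
    stakeholder: str          # department that ran the model
    model: str
    compute_cost: float       # metered cost of serving the model for this usage
    estimated_benefit: float  # value attributed by the stakeholder's business case


def roi_report(records: list[UsageRecord], platform_fixed_cost: float) -> dict:
    """Aggregate costs and benefits per stakeholder and compute overall platform ROI."""
    cost_by_team: dict[str, float] = defaultdict(float)
    benefit_by_team: dict[str, float] = defaultdict(float)
    for r in records:
        cost_by_team[r.stakeholder] += r.compute_cost
        benefit_by_team[r.stakeholder] += r.estimated_benefit

    total_cost = platform_fixed_cost + sum(cost_by_team.values())
    if total_cost <= 0:
        raise ValueError("total cost must be positive to compute ROI")
    total_benefit = sum(benefit_by_team.values())
    return {
        "total_cost": total_cost,
        "total_benefit": total_benefit,
        "roi": (total_benefit - total_cost) / total_cost,
        "cost_by_team": dict(cost_by_team),
    }


# Hypothetical usage data for illustration only.
records = [
    UsageRecord("marketing", "campaign-targeting", 5_000.0, 40_000.0),
    UsageRecord("security", "threat-detect", 10_000.0, 60_000.0),
]
report = roi_report(records, platform_fixed_cost=35_000.0)
print(report["roi"])  # 1.0
```

Breaking costs down by stakeholder also shows which departments are getting value from the platform, which helps when arguing for continued funding.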
Leave a comment to get in touch with me on any topic related to Product Management, Innovation and Strategy.