By: Mahmood-Ali Parker (Intern Praelexis 2020-2021)
As a data scientist working on a product team (or simply looking to build a product that incorporates machine learning), coming up with a good idea can be tough. There are brainstorming processes dedicated to finding a product idea worth investigating. With machine learning, we also have to consider factors outside the usual product development process, such as data collection, data quality and data quantity.
Now, let’s assume you or your team have found an area worth investigating in your domain of choice. Presumably, we’re seeking to improve a process with machine learning or to create value where it would otherwise not be possible.
Whether it involves computer vision, natural language processing or predictive modelling with tabular data, there is usually a way to apply some form of machine learning to most use cases where good data can be collected. Given the right data (another problem entirely), you can find a way to solve, or significantly improve, solutions to many problems using machine learning.
The power of the internet and free and open-source software (FOSS) has made developing machine learning models for most well-defined tasks relatively easy today. For instance:
- Many computer vision problems can be solved effectively, even with smaller datasets, in a few lines of code using a pretrained model such as a variation of ResNet.
- Most forms of regression and classification problems with tabular data can be solved effectively in many ways with tools such as deep learning and boosted decision tree modelling.
- Pre-trained language models like BERT and T5 can be used for a large variety of natural language processing tasks such as chatbots and text classification.
While machine learning models may be the more glamorous part of a data science product, they only form a small, albeit essential, part of a complete product. As a data scientist, it’s very easy to get stuck in the trap of thinking that applying some revolutionary state-of-the-art model to a new use case will drive a product to success in the market.
“The hardest thing in machine learning is to find how to productively leverage it in your product. The second hardest thing is to collect and annotate the right dataset. Building and training models is relatively straightforward by comparison.” – François Chollet (Deep Learning Researcher at Google, Creator of Keras)
In practice, having an effective machine learning model is far from a guarantee of success. It’s almost an assumption that a data scientist can create an effective model given the right data. An effective model is more of a prerequisite for success than its driving force. Being able to leverage a machine learning model is much more important than the process of developing it. This is where product development comes into play.
It’s relatively uncommon to come across product development skills on a list of must-haves for the data science profession, yet many positions, particularly at start-ups, involve the development of products. If you intend to work primarily as a consultant or in an operational capacity, then this isn’t something that will concern you. The creativity that goes into making good products can be valuable in other domains though.
If you are not explicitly involved with the product development process as a data scientist working in product, which is highly unlikely, you will almost certainly have to work with people who are (the product manager or product owner springs to mind). Understanding the product developer’s process is extremely important for staying aligned and efficient and for helping the team make informed decisions. There is a real risk of going off on a tangent or hitting unnecessary stumbling blocks when you are not completely aware of what is expected of your development process in the grand scheme of the product. This could lead to wasted time and wasted resources, and might cause friction in the team. Of course, your team should be familiar with the data science process at a high level too, to prevent similar issues from cropping up, but that’s something a product manager/owner can elaborate on.
In a start-up environment, a product team will often find themselves creating products completely from scratch. As a data scientist involved in this process, your input and feedback are important for creating a more accurate roadmap and plan for the product. Ensuring everyone remains aligned on the aspects of the product relevant to you is imperative. Some of the more obvious questions you might want to ensure are answered early on include:
- How are we going to collect data?
- What does high-quality data look like for us?
- Where’s the data going to be stored?
- What does an effective machine learning model look like?
- How is our model going to be deployed?
The data scientist’s job is to ensure everyone else in the team understands the significance of these questions and how they need to be answered. At the same time, you need to understand their process to better deliver your feedback and concerns.
While product managers/owners are usually central to a product’s lifecycle process, a successful product will always come as the result of a collaborative effort. Each member of the product team oversees their own areas of expertise and needs to consult with the rest of the team to stay aligned. You may need to explain your work in different ways to convey the message to different players on your team. A software engineer will need to apply your input to their process differently to someone who may be in charge of design.
Product development skills will go a long way towards helping you respond to critiques from your colleagues, give criticism constructively and leverage machine learning effectively in a product.
There are multiple frameworks that product teams can use to go about product development, but you’re most likely to encounter Agile methodologies. Agile is also what we use here at Praelexis. Getting acquainted with it can help prepare you to work in most product teams due to its widespread adoption.
Agile project management is an iterative approach to software product development. It focuses on consistent releases and integrating customer feedback into every iteration. Teams work in brief sessions called ‘sprints’ and regroup regularly to evaluate the state of things and make adjustments as necessary. The four Agile values are:
- Individuals and Interactions over Processes and Tools
- Working Software over Comprehensive Documentation
- Customer Collaboration over Contract Negotiation
- Responding to Change over Following a Plan
A good background in Agile will prepare a data scientist to integrate well into any product team, or even to take on more responsibility for a product over and above the machine learning requirements if there’s a desire to move in that direction. There are many paths to learning Agile, including some expensive certifications. These certifications can be helpful for certain jobs in product, but for the most part they aren’t necessary for a data scientist role. It’s more than sufficient to learn through alternatives like YouTube, Udemy and LinkedIn Learning to get a working proficiency in Agile methodologies.
To get a higher-level view of the product development process, it’s also important to understand the product development life cycle (PDLC), which gives a feel for how a product will progress in the grand scheme of things. The PDLC has four stages which a product may go through depending on how it’s managed. A brief overview:
- Introduction: This is when the product enters the market. Public awareness is essential to the success of the product at this stage.
- Growth: The stage where a product’s customer base (and market share) expands and most of the profits are made.
- Maturity: When sales and market share start to stabilise. Sales volumes typically peak in this stage before going into decline unless development is revisited.
- Decline: Gradual decline as the product becomes less economically viable.
Each stage has a lot more detail and nuance. Managing a product effectively at each stage is key to a successful product. You can read more about the PDLC here.
There are many more aspects of product development to explore outside of Agile and the PDLC, and I implore anyone interested to read further. As a starting point, good Agile skills and a thorough understanding of the product development life cycle provide an excellent toolkit. Data scientists can flourish in the product space with the right preparation.