How to Create and Mature your Data Science Practice

Data science is the art of using data to inform your actions. The recent spate of digital disruption, and the growing success of companies that harness data, has created a new wave of roles such as the Chief Data Officer or Chief Data Science Officer, and has pushed every company to step back and focus on an organizational data strategy. Data science covers a broad spectrum: it can span anywhere from basic operational data to advanced artificial intelligence computing. So where do you really start?


It starts with the basics. Step 1 is a data warehouse, which was in vogue well before data science. Analyze all the available systems, processes, and data sets, then build a data warehouse, add the ability to visualize the data, and make it available to stakeholders. At the end of this step, your business should have data-driven intelligence to optimize its operations. For example, HR should be able to report headcount and retention/attrition risks; Supply Chain should be able to predict cost of goods and inform location and sourcing strategy; Finance should be able to see patterns and predict revenues; Sales should be able to see the closing price of the last comparable sale; and business leaders should have reasonable insights to lead the business.
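As a rough illustration of the HR example, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a real warehouse. The `employees` table, its columns, and the sample rows are all hypothetical; an actual warehouse would have its own schema and far more data.

```python
import sqlite3


def attrition_by_department(conn):
    """Return {department: (total_employees, leavers, attrition_rate)}."""
    rows = conn.execute(
        """
        SELECT department,
               COUNT(*) AS total_employees,
               SUM(CASE WHEN left_company = 1 THEN 1 ELSE 0 END) AS leavers
        FROM employees
        GROUP BY department
        ORDER BY department
        """
    ).fetchall()
    return {dept: (total, left, left / total) for dept, total, left in rows}


# Hypothetical sample data standing in for an extract from an HR system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (name TEXT, department TEXT, left_company INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ana", "Sales", 0),
        ("Ben", "Sales", 1),
        ("Chi", "Finance", 0),
        ("Dev", "Finance", 0),
    ],
)

report = attrition_by_department(conn)
for dept, (total, leavers, rate) in report.items():
    print(f"{dept}: employees={total}, leavers={leavers}, attrition={rate:.0%}")
```

The point is less the query itself than the habit: once the data sits in one place, a question like "where is attrition highest?" becomes a few lines rather than a project.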

This will be one of the hardest things to do. Expect to struggle with many things you thought would be easy, such as regulations, disconnected systems, disconnected processes, and a lot more. In return, it gives you the ability to build a roadmap for the future.


While data warehouses work really well for traditional, fixed data sets that can be structured, they become an expensive proposition in a world where data is everywhere. Step 2 is expanding your data warehouse into a data lake. This involves bringing in solutions like Hadoop that are made for processing large volumes of both structured and unstructured data. But why unstructured data? We are humans; we don't talk like databases. Much of our day-to-day work is unstructured data, the paperwork we fill out is unstructured data, and there is a lot of it around. Getting the right infrastructure in place and training the organization's muscles to look at both structured and unstructured data is the next logical step. This step should build on the one above and train the organization to do simple things. For example, HR should be able to look at job postings, pick out keywords, and see whether they are aligned to business strategy; business leaders should be able to gauge the employee climate by harnessing social media discussions; manufacturing units should be able to read manuals and extract insights; and a lot more.
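The job-postings example can be sketched in a few lines of plain Python. This is only a toy, assuming single-word, lower-case strategy keywords; the function name `keyword_coverage`, the sample postings, and the keyword list are all made up for illustration, and real text analysis would need more care (phrases, synonyms, stemming).

```python
import re
from collections import Counter


def keyword_coverage(postings, strategy_keywords):
    """Count how many postings mention each strategy keyword at least once."""
    counts = Counter({kw: 0 for kw in strategy_keywords})
    for text in postings:
        # Lower-case the posting and split it into a set of distinct words.
        words = set(re.findall(r"[a-z]+", text.lower()))
        for kw in strategy_keywords:
            if kw in words:
                counts[kw] += 1
    return counts


# Hypothetical job postings and strategy keywords.
postings = [
    "Seeking a data engineer to build cloud pipelines.",
    "Sales associate wanted for retail stores.",
    "Cloud architect with analytics experience.",
]
strategy_keywords = ["cloud", "analytics", "automation"]

coverage = keyword_coverage(postings, strategy_keywords)
for kw in strategy_keywords:
    print(f"{kw}: mentioned in {coverage[kw]} of {len(postings)} postings")
```

A keyword that the strategy emphasizes but that no posting mentions is a cheap early signal that hiring and strategy may be drifting apart.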

Your organization’s ability to compute and its need for data scientists become much more prominent here.


While data warehouses and data lakes get you off the ground, there is a lot more to specialize in, and that is the third step. Data in today’s world can be structured (e.g. she ate 5 mangoes), unstructured (e.g. she loves mangoes), visual (e.g. 100 photos of her eating a mango), or verbal (e.g. she is describing mangoes). As you evolve through Step 1 and Step 2, you will end up finding various data sources and types in the organization that you would not otherwise have thought about. Based on your business, prioritize building expertise in each of these domains. This specialization is hard, but it can be aided by the constant innovation in services like Watson. At the end of this step, your organization’s ability to use various sources of data should become a differentiator.

Your organization’s ability to compute and its need for data scientists become profound here, and the need for domain experts and larger teams may become essential.


In the final step, almost all of your organization’s processes should run on data, almost annoyingly so. Big data is great, but it is the small data that is most valuable. The key performance indicators, the leading indicators, the predictive analytics, and the evolution of your organization’s data into a decision-making tool should all become part of your organization’s operating system. At the end of this step, that operating system should be able to build complex models, compute relationships, converse with users, communicate with leaders, deliver predictive information and recommendations, and simply become a habit. The governance processes, regulatory readiness, integration of systems, functioning of the data warehouse and data lake, meetings, and time to procure data should all become seamless. Simply put, “data becomes a habit”.


The fundamentals of data don’t change. At almost every step, an organization should master the art of collecting data, analyzing it, deriving insights, and using those insights to inform the next steps. Just as in any product management exercise, step-by-step growth has to be balanced with depth of capability at every step to avoid creating a messy data science organization.

Have you created a data science organization, or are you looking to create one? Please share your insights in the comments below.
