Every aspect of our lives, from life-saving medical advancements and transportation safety and security to economic viability and even the convenience of selecting a movie or a book, can be improved through better data analytics built with Data Science.

A Guide

Data is an ever-growing byproduct of our daily lives. A multiplying number of devices and processes collect and store bits of data by the second: from your phone and thermostat, to your car, to the roadway on which you drive, to all aspects of your daily life at work.

As we move from fragmentation and silos to an ever-connected and recorded future, data is becoming the new currency and a vital manufactured resource. The power, importance, and responsibility such incredible data stewardship will demand of us in the coming decades are hard to imagine, and yet most will fail to fully appreciate the insights data can provide.

Businesses that do not rise to the occasion and garner insights from this new resource are destined for failure. 

What is Data Science?

  • It is the art of converting data to intelligence.
  • It creates layers of abstraction between the complexities of data, math and statistics, and algorithms, and the final decision support product. 
  • It extends the traditional backward-looking analytical views of data with a forward-looking, inductive model of data analysis. 
  • Data Science capabilities are built iteratively over time. You mature through your experiences working in the three domains: Features, Models, and Decisions. Each stage allows for growth and improvement, tackling growing complexities with a wider breadth of analytic capabilities. Significant gains can be found in every stage, and every stage offers value.  
  • Data Science is a group effort bringing together capabilities, expertise, and data from all over the enterprise. 

Why Now?

Opportunity Costs | Data Science is an emerging field; opportunity costs arise when a competitor implements it and generates value from data before you do. When competitors successfully leverage Data Science to gain insights, they can drive differentiated customer value propositions and lead their industries as a result.

Abundance of Data | Huge amounts of data are being generated and stored every instant. Data Science can be used to transform data into intelligence that helps improve existing processes. Operating costs can be driven down dramatically by effectively incorporating the complex interrelationships in data like never before.

Advancements in Technology and Algorithms | Processing power from faster CPUs and GPUs, along with nearly weekly advancements in supporting libraries and algorithms, makes development much faster and cheaper than ever before.

How does it work?

Truth be told, Data Science is a complex field. It is difficult, intellectually taxing work, which requires the sophisticated integration of talent, tools, and techniques. To make it approachable, we break the process into three simple domains with simplified activities, so that the application can be easily understood and executed. 

Features | This domain focuses on preparing the data which you will feed into the learning models. 
  • Q&A - It starts with understanding what questions are at the top of your mind, for which the answers elude you. This becomes the baseline for the direction we take. The art of Machine Learning has an added benefit: answering questions you never had or thought of. 
  • Acquire Data - Acquiring data is more complex than it seems on the surface. There are the obvious issues of non-existence, low quality, and low volume, and then there are the institutional issues of ownership and access rights. Add to that all the external data you might want. In this step, we focus on creating as smooth an access path as possible to any and all data we might need, driving towards the creation of data lakes.
  • Prepare Data - Preparing data begins with analyzing the data, which is the most difficult and most time-consuming part of the process. This includes all aspects of dealing with missing data, scaling, normalizing, creating features from existing features, balancing features, and finally creating tensors for evaluation by the learning models. 
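The preparation steps above can be sketched in a few lines. This is a minimal illustration, assuming numeric features in a NumPy array; the `prepare` function name and the choices of mean imputation and min-max scaling are illustrative, not a prescription:

```python
import numpy as np

def prepare(features: np.ndarray) -> np.ndarray:
    """Impute missing values with column means, then min-max scale to [0, 1]."""
    X = features.astype(float).copy()
    # Replace each NaN with the mean of its column (computed over non-NaN values)
    col_means = np.nanmean(X, axis=0)
    nan_rows, nan_cols = np.where(np.isnan(X))
    X[nan_rows, nan_cols] = col_means[nan_cols]
    # Min-max scaling so every feature lies in [0, 1]
    mins, maxs = X.min(axis=0), X.max(axis=0)
    ranges = np.where(maxs > mins, maxs - mins, 1.0)  # avoid divide-by-zero
    return (X - mins) / ranges

raw = np.array([[1.0, 200.0],
                [2.0, np.nan],   # missing value to impute
                [3.0, 600.0]])
tensor = prepare(raw)  # ready to feed into a learning model
```

Real pipelines typically layer on categorical encoding, outlier handling, and class balancing, but the shape of the work is the same: raw records in, clean numeric tensors out.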
Models | This domain focuses on consuming the tensors and working through models, selecting the best without allowing perfection to be the enemy of good. Perfection does not exist in machine learning.  
  • Build Models - At the start, many ML models are viable, and it's best to select a grouping and build on top of the tensors, working towards a smaller subset worth testing. 
  • Compare and Test Models - With a smaller subset of working models, it's time to test performance and scalability, further reducing the workable set. At this stage, it's good to have one to three viable models. 
  • Fine Tune a Selection - With the final set in hand, it's time to fine-tune: adjusting hyperparameters, evaluating performance once again, and reducing the set to a final model. 
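The compare-tune-select loop above can be illustrated with a toy sketch. Here the candidate "models" are polynomial fits of different degrees (a stand-in for real ML models), the "hyperparameter" swept is the degree, and selection is by error on held-out data; all names and the synthetic dataset are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = 3 * x**2 + rng.normal(0, 0.05, x.size)  # quadratic signal plus noise

# Holdout split: train on even indices, validate on odd indices
x_tr, y_tr = x[::2], y[::2]
x_va, y_va = x[1::2], y[1::2]

def val_error(degree: int) -> float:
    """Fit a polynomial of the given degree, score it on the held-out data."""
    coeffs = np.polyfit(x_tr, y_tr, degree)
    preds = np.polyval(coeffs, x_va)
    return float(np.mean((preds - y_va) ** 2))

# Hyperparameter sweep: keep the candidate with the lowest validation error.
# Degree 1 underfits the quadratic signal; degree 8 risks overfitting the noise.
candidates = [1, 2, 3, 8]
best = min(candidates, key=val_error)
```

The same pattern scales up: swap the polynomial fits for real model families, the single degree for a grid of hyperparameters, and the holdout split for cross-validation.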
Decisions |  This domain focuses on the deployment and consumption of the final model.  
  • Deployment Wrapper - This step focuses on the best way to deploy. It could be as simple as running a command line tool on your computer and feeding it a spreadsheet to produce an output file, all the way to building an application around the model or converting it to a cloud delivery system. 
  • Deployment and Consumption - This step encompasses the physical deployment, usage training, and consumption. Again, it could be as simple as dealing with one person and one PC, or as involved as a larger group with its underlying organizational structure and related complexities. 
  • Evaluation and Refinement - This step covers the ongoing evaluation and refinement of the model as new data is collected, lessons are learned, and new features are born. 
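The simplest deployment wrapper described above, a command line tool that reads a spreadsheet and writes an output file, might look like the following sketch. The `score` function is a hypothetical placeholder for your trained model's predict call, and the column and file names are examples only:

```python
import argparse
import csv

def score(row: dict) -> float:
    """Placeholder model: replace with your trained model's predict call."""
    return float(row["feature"]) * 2.0  # hypothetical single-feature model

def run(in_path: str, out_path: str) -> None:
    """Read a spreadsheet (CSV), append a prediction column, write results."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=list(reader.fieldnames) + ["prediction"])
        writer.writeheader()
        for row in reader:
            row["prediction"] = score(row)
            writer.writerow(row)

def main(argv=None):
    parser = argparse.ArgumentParser(description="Batch-score a CSV with the final model")
    parser.add_argument("input", help="path to the input CSV")
    parser.add_argument("output", help="path for the scored CSV")
    args = parser.parse_args(argv)
    run(args.input, args.output)
```

Invoked as, say, `python score_csv.py input.csv scored.csv` (a hypothetical file name), this is the one-person, one-PC end of the spectrum; the same `run` logic can later be wrapped in an application or a cloud service.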

What does it take?

... To do it right. To do it effectively. To do it efficiently. 

  • Curiosity is necessary to peel apart a problem and examine the interrelationships between data that may appear superficially unrelated.
  • Creativity is required to invent and try new approaches to solving a problem, which often have never been applied in such a context before.
  • Focus is required to design and test a technique over days and weeks, find it doesn’t work, learn from the failure, and try again.
  • Attention to Detail is needed to maintain rigor, and to detect and avoid over-reliance on intuition when examining data.

information (at)
biteconomics (dot) com

© Copyright 2018-2020, Bit Economics LLC - All Rights Reserved