
“The difference between the most dissimilar characters, between a philosopher and a common street porter, for example, seems to arise not so much from nature, as from habit, custom, and education.” – Adam Smith, An Inquiry into the Nature and Causes of the Wealth of Nations

In this compelling passage, we can replace “philosopher” with “data scientist.” As Smith highlights, it is through repetition, constant exposure, focused effort, and study that we come to understand and learn things, and that is what differentiates our professions. The same holds true for data science. By nature, anyone can be a data scientist. If you want to truly learn and succeed in this field, however, you must commit yourself to spending the time to learn the concepts, tools, and talks that interest you.

Data science is a vast field with many topics, various areas of focus, and dozens of different applications. The number of opportunities is significant, but if you have no experience in data science, you need a plan to determine how best to grow and succeed.

There are three things about data science you should understand clearly from the beginning:

  1. Types of business problems that align to different data science model categories
  2. Different types of data science algorithms that help solve these problems
  3. Tools and programming languages used to implement these algorithms and solutions

 

#1: Understand the Alignment Between Business and Data Science

It is best to start by understanding how data science applies to real-world problems. Data science is deeply rooted in statistics, but before you can simply dive in, you need to understand how the business problem determines the type of statistical model to apply. Define the business problem first, then choose the appropriate data science methodology. Different business problems will be studied or resolved via prediction, classification or regression, exploration and description, pattern recognition, clustering, correlation analysis, and so on. Learn the alignment between business problem and data science application (statistical model) inside and out; it forms the foundation of your understanding and helps you set your target goals.
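As a minimal sketch of this alignment, the snippet below pairs three kinds of business questions with three model categories mentioned above. It assumes scikit-learn and its bundled toy datasets; the specific datasets and estimators are illustrative choices, not prescriptions.

```python
# Illustrative only: the business question decides the model category.
# Assumes scikit-learn is installed; datasets/models are example choices.
from sklearn.datasets import load_iris, load_diabetes
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans

# "Which category does this case belong to?" -> classification
X_cls, y_cls = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X_cls, y_cls)

# "How large will this quantity be?" -> regression
X_reg, y_reg = load_diabetes(return_X_y=True)
reg = LinearRegression().fit(X_reg, y_reg)

# "What natural groups exist in the data?" -> clustering (no labels needed)
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_cls)
```

The point is not the particular estimators but the mapping: the shape of the question (category, quantity, or structure) comes first, and the model family follows from it.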

 

#2: Understand the Algorithms Within Each Data Science Modeling Category

Once you understand the alignment between business problems and data science applications, the next step is to fully understand the different algorithms that reside within each data science modeling category. To clarify: algorithms are procedures for performing calculations on the datasets you have on hand. When you apply an algorithm to your data, you create a model whose fitted parameters produce specific outputs, thereby providing targeted insights into your data’s behavior. For example, in clustering (one data science category), there are different algorithms such as K-Means, Hierarchical, DBSCAN, and more. Each of these can help resolve a specific business problem, depending on the problem’s characteristics and resolution requirements.
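To make the K-Means vs. DBSCAN distinction concrete, here is a small sketch (assuming scikit-learn) that runs both on the same synthetic data. Note how each algorithm asks for different parameters, which is exactly why the problem's characteristics drive the choice.

```python
# Sketch comparing two of the clustering algorithms named above.
# Assumes scikit-learn; synthetic data stands in for a real dataset.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, DBSCAN

# Synthetic 2-D data with three well-separated groups
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

# K-Means: you must choose the number of clusters up front
km_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# DBSCAN: clusters emerge from density; eps and min_samples are the
# knobs instead, and points that fit nowhere are marked as noise (-1)
db_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
```

If your business problem fixes the number of segments in advance (say, three customer tiers), K-Means fits naturally; if the number of groups is unknown and outliers matter, a density-based method like DBSCAN may suit better.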

 

#3: Understand the Tools and Programming Languages to Create Algorithms and Models

There are countless languages and tools out there to help you build your first models and algorithms, so you don’t need to start each project from scratch. Languages such as Python, R, SQL, and Julia are widely used for data analysis, and it is worth learning a couple of them. There are also libraries and frameworks, such as TensorFlow and Spark, that will help you complete an array of common data science problem-solving tasks. You can learn a language or framework by studying its documentation and reading books about it, or by implementing specific things and building your skills from there. Studying and experimenting are two guaranteed ways to improve your data science skillset. All you need is the motivation and the time committed to the effort.
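As one small example of the kind of everyday task these tools cover, here is a sketch in Python using pandas. The column names and values are made up purely for illustration.

```python
# Minimal sketch of a common first task: summarizing a dataset with pandas.
# The "sales" data below is invented for illustration only.
import pandas as pd

sales = pd.DataFrame({
    "region": ["north", "south", "north", "south", "west"],
    "revenue": [120.0, 95.5, 130.2, 80.0, 60.3],
})

# Group-and-aggregate: a pattern you will repeat across almost every project
summary = sales.groupby("region")["revenue"].agg(["count", "mean", "sum"])
print(summary)
```

A handful of such patterns (loading data, grouping, aggregating, joining) carry you a long way before you ever need a specialized framework.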

Humans are capable of many things. Among them are abstraction and analogy, which means we can learn from tutorials, workshops, and use case talks. To better understand the fundamentals of data science, we can draw insights from use case or business case talks, because the speaker walks you through the business problem they face and shares how they approached and resolved it using different algorithms and models. Start-to-end explanations of problem solving are very valuable.

To this end, think about it: when was the last time you went to a company and asked the people involved in an important project, similar to yours, to tell you how they solved it? That kind of access is very difficult to come by. These types of talks and use case explanations solve exactly that problem.

Conferences can also help you understand the three aforementioned components in a holistic manner. At your next conference, choose a balanced mix of use cases, workshops, and tutorials according to your interests. Schedule the talks you want to attend in advance, connect with like-minded people, and take plenty of notes, because data scientists are made not by nature but by exposure to, and practice of, the topics that matter to them.

One conference to consider is the Open Data Science Conference series. The ODSC East Conference takes place in May 2018. Take the chance to attend a 4-day conference and jump-start your career in data science!


Diego Arenas, ODSC

I've worked in BI, DWH, and data mining, and hold an MSc in Data Science. Experience with multiple BI and data science tools, always thinking about how to meet information needs and add value to organisations from the data available. Experience with Business Objects, Pentaho, Informatica Power Center, SSAS, SSIS, SSRS, MS SQL Server (2000 through 2017) and other DBMSs, Tableau, Hadoop, Python, R, and SQL. Predictive modelling. My interests are in information systems, data modeling, predictive and descriptive analysis, machine learning, data visualization, and open data. Specialties: data modeling, data warehousing, data mining, performance management, business intelligence.