

5 Mistakes to Avoid as a Beginner Data Scientist
Career Insights | Posted by Ralabs | December 18, 2018

With the recent growth of AI, data science has become an even more attractive career option. It is usually well paid, and the assignments are almost always fascinating. What should newcomers do to succeed in this field? There are a few things to pay extra attention to in order to avoid some common mistakes.
1. Do not study without practice
Many people starting a career in this field make the same mistake: they take a lot of online courses and learn many concepts, but never put them into practice. Knowledge alone is not enough. When you learn an algorithm, find out its pros and cons, its limitations, and how to apply it to practical situations. There is a catch: when you are learning high-level libraries such as R’s ggplot2, for example, you rarely understand what is going on behind the scenes. It is better to apply what you have learned in an experiment and gain a deeper understanding of the process.
2. Learn math
Algebra, statistics, probability, and calculus: you need these four areas to dive into the deeper parts of data science. It is a big mistake to code algorithms from scratch without learning the prerequisites. As you take your first steps in data science, you don’t need to create every algorithm from scratch, but if you plan to do so, learn the underlying math as you go along. As you go deeper into data science, make sure you fill the gaps in your knowledge of the basic mathematical concepts.
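As one illustration of why the prerequisites matter, here is a minimal sketch of fitting a one-parameter model y = w * x in two ways: with gradient descent (which relies on a derivative from calculus) and with the closed-form least-squares solution (which relies on algebra). The data points, learning rate, and function names are made up for this example and are not taken from any particular library.

```python
# Toy data, invented for illustration: y is roughly 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

def mse_gradient(w, xs, ys):
    """Derivative of the mean squared error of y = w * x with respect to w."""
    n = len(xs)
    return (-2.0 / n) * sum(x * (y - w * x) for x, y in zip(xs, ys))

def fit_gradient_descent(xs, ys, learning_rate=0.01, steps=1000):
    """Repeatedly step against the gradient, starting from w = 0."""
    w = 0.0
    for _ in range(steps):
        w -= learning_rate * mse_gradient(w, xs, ys)
    return w

def fit_closed_form(xs, ys):
    """Setting the derivative to zero gives w = sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

w_gd = fit_gradient_descent(xs, ys)
w_closed = fit_closed_form(xs, ys)
print(w_gd, w_closed)
```

Both approaches should land on essentially the same slope. Knowing the math explains why they agree and what can go wrong, for example a learning rate that is too large and makes the updates diverge.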
3. Validate and re-validate your models
If you think you have built a perfect machine learning model, the first thing to do is check it again. Even if the predictive power of your model is very high, you are only halfway to success. The model fits the observed data perfectly? Great! However, you still need to re-validate it at set intervals. The relationships it captures may change over time, and its predictive power can collapse as a result. This problem can be avoided: check the model against fresh data regularly, at a frequency that reflects how quickly the underlying relationships change. The predictive power of models is influenced by many factors, and in some situations data scientists have to rebuild their models. It’s good practice to build a few candidate models and to characterize the distributions of the variables.
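The idea of periodic re-validation can be sketched as follows. This is a minimal, assumption-laden example: the data is simulated, the model is a simple straight-line fit, and the helper names (fit_line, mean_abs_error, needs_retraining) and the 1.5x tolerance are hypothetical choices for illustration, not a standard recipe.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def mean_abs_error(model, xs, ys):
    """Average absolute prediction error of the fitted line on a batch."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)) / len(xs)

def needs_retraining(model, xs, ys, baseline_error, tolerance=1.5):
    """Flag the model when error on new data exceeds tolerance * baseline."""
    return mean_abs_error(model, xs, ys) > tolerance * baseline_error

rng = random.Random(42)

def make_batch(slope, n=500):
    """Simulated data: y = slope * x plus Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [slope * x + rng.gauss(0, 1) for x in xs]
    return xs, ys

# Train once, then record a baseline error on held-out data.
model = fit_line(*make_batch(slope=2.0))
baseline = mean_abs_error(model, *make_batch(slope=2.0))

# Later batches: one where nothing changed, one where the relationship shifted.
stable_flag = needs_retraining(model, *make_batch(slope=2.0), baseline)
drift_flag = needs_retraining(model, *make_batch(slope=3.0), baseline)
print(stable_flag, drift_flag)
```

In a real project the batches would come from newly collected data at each interval, and the error metric and tolerance would depend on the problem.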
4. Watch the difference between correlation and causation
Even some experienced data scientists make this mistake: they confuse correlation with causation. Correlation means that two factors tend to vary together, while causation means that the first factor actually produces the second. Data scientists often ignore this difference, which leads to huge mistakes. Data is often used to explain the correlation between variables. But in practice, if two variables are related to each other, it does not mean that one causes the other. So, if you make a decision based on correlation without understanding the cause, expect faulty results.
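A small simulation can make the distinction concrete. In the sketch below, two variables are both driven by a hidden third factor and never influence each other, yet they come out strongly correlated. The scenario (temperature, ice cream sales, sunburn cases) and all numbers are invented for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(xs, ys):
    """What is left of ys after removing its straight-line dependence on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

rng = random.Random(0)

# Hidden common cause (hypothetical): daily temperature.
temperature = [rng.gauss(20, 8) for _ in range(5000)]
# Both quantities depend on temperature, not on each other.
ice_cream_sales = [2.0 * t + rng.gauss(0, 5) for t in temperature]
sunburn_cases = [0.5 * t + rng.gauss(0, 2) for t in temperature]

# Strong correlation, even though neither variable causes the other.
r_raw = pearson(ice_cream_sales, sunburn_cases)

# After accounting for temperature, the correlation largely disappears.
r_controlled = pearson(residuals(temperature, ice_cream_sales),
                       residuals(temperature, sunburn_cases))
print(r_raw, r_controlled)
```

Banning ice cream would not reduce sunburn here. Controlling for a suspected common cause is one way to probe a correlation, although it only helps when the confounder is known and measured.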
5. Formulate clear questions
Without the right question, you can’t collect the right datasets. Data science requires structure and well-defined questions. It is a common mistake to focus on the data without understanding the question that the analysis is meant to answer. A huge number of data science projects answer only “what” questions, which yields numbers without explanations. This happens when data scientists lose sight of their main goal. Our task is to answer the “why” questions, to understand something that was not clear before. Also, keep your question in mind when you choose visualization techniques to present the results. Sometimes this choice is driven by aesthetic taste rather than by the characteristics of the dataset. So, a clearly defined goal for your model is a big part of success.