5 SQL Data Wrangling Techniques Every Expert Should Know
Data wrangling is an essential job function in data engineering, data science, and machine learning roles. As knowledgeable coders, many of these professionals can rely on their programming skills and on libraries like Pandas to wrangle data. However, it can often be optimal to manipulate...
Repurposing Binary Serialized Data Structures During the Process of Data Ingestion
There are a great many binary formats that data might live in. Every widely used format has good open-source libraries, but you may encounter some legacy or in-house format for which this is not true. Good general advice is that unless there is an ongoing and/or...
Brace Yourself, Data Cleaning is Coming
If you are all too familiar with This Crazy Thing Called Data Cleaning and with both the classical and psychological tricks that help, if your hair has already gone grey because of it, if you are simply seeking fast, fun, and furious nontrivial tricks, I encourage you...
Automating Data Wrangling – The Next Machine Learning Frontier
Editor’s note: Be sure to check out Alex’s talk at ODSC West 2019 this November, “The Last Frontier of Machine Learning – Data Wrangling.” Up to 95% of a data scientist’s time is spent on data wrangling. At the same time, about 99% of data scientists hate data wrangling. That’s problematic....
The Importance of Preprocessing Data the Right Way
Many different aspects of training a neural network affect its performance. Many data scientists spend too much time thinking about learning rates, neuron structures, and epochs before making sure their data is correctly prepared. Without properly formatted data, your neural network will be useless,...
Top Data Wrangling Skills Required for Data Scientists
Whatever you want to call it (data wrangling, data munging, or data transformation), the part of the data science process that sits between data acquisition and exploratory data analysis (EDA) is one of the core skills a data scientist must have. It includes a set...
No Need for Deciphering – Learn How to Make Your Own Dataset Instead
Key Takeaways: By creating, capturing, and curating data, you can practice “data creationism” and get creative with data to make your own dataset. While Iris and Titanic are well-known datasets for experimenting with machine learning and data science, challenge yourself to create your own dataset....
Bayesian Estimation, Group Comparison, and Workflow
Over the past year, having learned about Bayesian inference methods, I finally see how estimation, group comparison, and model checking build upon each other into a really elegant framework for data analysis. Parameter estimation: the foundation of this framework is “estimating a parameter.” In a typical situation,...
How Well Did Data Scientists Predict the 2018 World Cup? (Hint: Not Very)
This year’s World Cup in Russia was the most-watched sporting event in history. GlobalWebIndex reports that up to 3.4 billion people, around half of the world’s population, watched some part of the tournament. As with past World Cups, a global prediction market emerged...
The Best Mario Kart Character According to Data Science
Mario Kart was a staple of my childhood: my friends and I would spend hours after school as Mario, Luigi, and other characters from the Nintendo universe, racing around cartoonish tracks and lobbing pixelated bananas at each other. One thing that always vexed our little group of...