The Importance of Processing Data the Right Way

Data Wrangling
posted by Caspar Wylie, ODSC, October 16, 2018

There are so many different aspects of training a neural network that affect its performance. Many data scientists spend too much time thinking about learning rates, neuron structures, and epochs before actually using correctly optimized data. Without properly formatting data, your neural network will be useless, regardless of the... Read more
Top Data Wrangling Skills Required for Data Scientists
Whatever you want to call it (data wrangling, data munging, or data transformation), the part of the Data Science Process that sits between data acquisition and exploratory data analysis (EDA) is one of the core skills a data scientist must have. It includes a set of tasks you... Read more
No Need for Deciphering – Learn How to Make Your Own Dataset Instead
Key Takeaways: By creating, capturing, and curating data, you can practice “data creationism” and be creative with data to make your own dataset. While Iris and Titanic are well-known datasets for experimenting with machine learning and data science, challenge yourself to create your own dataset. Anything can be... Read more
Bayesian Estimation, Group Comparison, and Workflow
Over the past year, having learned about Bayesian inference methods, I finally see how estimation, group comparison, and model checking build upon each other into this really elegant framework for data analysis. Parameter Estimation: The foundation of this is “estimating a parameter”. In a typical situation, we are most... Read more
How Well Did Data Scientists Predict the 2018 World Cup? (Hint: Not Very)
This year’s World Cup in Russia was the most watched sporting event in history. GlobalWebIndex reports that up to 3.4 billion people – around half of the world’s population – watched some part of the tournament. As with past World Cups, a global prediction market emerged, allowing spectators to... Read more
The Best Mario Kart Character According to Data Science
Mario Kart was a staple of my childhood — my friends and I would spend hours after school as Mario, Luigi, and other characters from the Nintendo universe racing around cartoonish tracks and lobbing pixelated bananas at each other. One thing that always vexed our little group of would-be speedsters was... Read more
Starting a Data Science Project
I spoke in a webinar this past Saturday about how to get into Data Science. One of the questions asked was “What does a typical day look like?” I think there is a big opportunity to explain what really happens before any machine learning takes place for a large... Read more
Predicting the Truncated xorshift32* Random Number Generator
Software programmers need random number generators. For this purpose, they often use functions with outputs that appear random. Gerstmann has a nice post about Better C++ Pseudo Random Number Generator. He investigates the following generator: uint32_t xorshift(uint64_t *m_seed) { uint64_t result = *m_seed * 0xd989bcacc137dcd5ull; *m_seed ^= *m_seed >> 11;... Read more
How Quickly Can You Compute the Dot Product Between Two Large Vectors?
A dot (or scalar) product is a fairly simple operation that sums the element-wise products of two vectors: float sum = 0; for (size_t i = 0; i < len; i++) { sum += x1[i] * x2[i]; } return sum; It is nevertheless tremendously important. You know these fancy machine learning... Read more
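For readers who want to try the loop quoted in the teaser above, here is a minimal self-contained C sketch. The function name `dot_product` and its signature are assumptions made for illustration; the original post may wrap the loop differently.

```c
#include <stddef.h>

/* Hypothetical wrapper around the loop from the teaser:
 * multiplies the two vectors element by element and sums the products. */
float dot_product(const float *x1, const float *x2, size_t len) {
    float sum = 0.0f;
    for (size_t i = 0; i < len; i++) {
        sum += x1[i] * x2[i]; /* index both vectors with the loop counter */
    }
    return sum;
}
```

Calling it on the vectors {1, 2, 3} and {4, 5, 6} with `len` set to 3 adds 4 + 10 + 18. Note that a single `float` accumulator can lose precision on very long vectors, so this sketch favors clarity over numerical robustness.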
Data Processing on Modern Hardware
If you had to design a new database system optimized for the hardware we have today, how would you do it? And what is the new hardware you should care about? This was the topic of a seminar I attended last week in Germany at Dagstuhl. Here are some thoughts:... Read more
Open Data Science - Your News Source for AI, Machine Learning & more