Repurposing Binary Serialized Data Structures During the Process of Data Ingestion
There are a great many binary formats that data might live in. Everything very popular has grown good open-source libraries, but you may encounter some legacy or in-house format for which this is not true. Good general advice is that unless there is an ongoing and/or...
Brace Yourself, Data Cleaning is Coming
If you are just too familiar with This Crazy Thing Called Data Cleaning, with both the classical and psychological tricks that help, if your hair has already gone grey because of it, if you are simply seeking fast, fun, and furious nontrivial tricks, I encourage you...
The Un-Sexy Data Science: Data Cleaning
The Data Science profession, while often misunderstood (even by the field itself, possibly a topic for another post), was famously labeled the “sexiest job of the 21st century” by the Harvard Business Review in October 2012. Since that time, businesses, education providers, and potential talent...
Searching for a Data Colander for Automatic Data Cleaning
Editor’s note: The following is an article written by Devavrat Shah of MIT and Christina Lee Yu of Cornell University. Be sure to check out their presentation at ODSC East 2019, “Predictions in Excel through Estimating Missing Values”. The stated mission of Data Science or Data-Driven...