The Latest Advances in Classification With Too Many Labels
At ODSC London 2018, Dr. Michael Swarbrick Jones gave a technical lecture on the latest in large-scale multilabel classification and how practitioners should manage their hyperparameters and data to get the most out of their models, focusing specifically on problems where there are thousands or tens of... Read more
3 Sought-After Data Science Skills to Get Hired in 2019
Getting a job in data science is a wide-open prospect. Standing out and being successful is another story. Don’t languish in mediocre data science hell. Get to work on these essential skills to stand out in the data science field, get hired, and thrive, and find out what these... Read more
How to Fix Data Leakage – Your Model’s Greatest Enemy
At ODSC London 2018, Yuriy Guts of DataRobot gave a talk on data leakage, including potential sources of the problem and how it can be remedied. Data leakage – also sometimes referred to as data snooping – is a phenomenon in machine learning that occurs when a model is... Read more
Going to the Bank: Using Deep Learning For Banking and the Financial Industry
At ODSC London 2018, Pavel Shkadzko explained to the audience how Gini GmbH, where he works as a semantics engineer, uses deep learning to automate information extraction from financial documents, such as invoices. By applying deep learning to tasks historically handled by optical character recognition and clever regular expression... Read more
Reviewing Amazon’s Machine Learning University – Is it Worth All of the Hype?
As an educator in the field of data science, I’m always interested in new learning resources for machine learning. The industry needs a new crop of data scientists to fill the rising demand. This is why I was pleased to learn of the recent announcement of free access to... Read more
Monthly Summary of Selected Trends, Activities, and Insights for R – November 2018
In November, activity across the R ecosystem continued to rise beyond the levels recorded since July. This was most notable in events and in downloads of R packages. Total package downloads from a single CRAN mirror in a single year hit half a billion this November for the first... Read more
Create Your First Face Detector in Minutes Using Deep Learning
Face detection is one of the most in-demand subfields of computer vision. With the advent of deep learning, computer vision has advanced significantly in the last few years, and this trend is only going to continue. There are more and more people using computer vision... Read more
Most Influential Data Science Research Papers for 2018
As an academic researcher in a previous life, I like to maintain ties to the research community while working in the data science field. I feel that a firm understanding of the origins of the technologies I use in my consulting work (AI, machine learning, and deep learning) helps... Read more
10 Tips to Get Started with Kaggle
Kaggle is a well-known community website where data scientists compete in machine learning challenges. Competitive machine learning can be a great way to hone your skills as well as demonstrate them. In this article, I will provide 10 useful tips to get started with Kaggle and get... Read more
The Data Scientist’s Holy Grail – Labeled Data Sets
The Holy Grail for data scientists is the ability to obtain labeled data sets for the purpose of training a supervised machine learning algorithm. An algorithm’s ability to “learn” is based on training it using a labeled training set – having known response variable values that correspond to a... Read more