Preparations for a Post-Pandemic Retail Environment
Covid-19 has challenged us to redesign multiple aspects of our lives, and this has inevitably had a wide-ranging impact on businesses across many sectors. The retail industry has been especially disrupted as people seek convenience from the safety and comfort of their homes. As the... Read more
Brand Voice: Deep Learning for Speech Synthesis
The production of artificial, natural-sounding human speech is a fascinating topic due to its complexity and surprising results, with applications that range from chatbots to the automation of audio content in news media. One obvious example of a Text-to-Speech (TTS) application for news media is a... Read more
Building a Robust Data Pipeline with the “dAG Stack”: dbt, Airflow, and Great Expectations
Data quality has become a much-discussed topic in the fields of data engineering and data science, and it has become clear that ensuring data quality is absolutely crucial to avoiding a case of “garbage in – garbage out”. Apache Airflow and dbt (data build tool) are... Read more
Building a Holistic Risk Profile: Near Real-Time Approach to Insider Threat Detection
Each year, organizations and research firms publish cybersecurity threat and breach predictions. Due to the Covid-19 pandemic, a global shift has occurred, forcing organizations to adapt to a “new normal” that includes a more distributed remote workforce. In turn, this has greatly affected data theft.... Read more
PyTorch Lightning: From Research to Production, Minus the Boilerplate
The following post introduces PyTorch Lightning, outlines its core design philosophy, and provides inline examples of how this philosophy enables more reproducible and production-capable deep learning code. What is PyTorch Lightning? PyTorch Lightning is a lightweight PyTorch wrapper for high-performance AI research. Simply put, PyTorch Lightning... Read more
AI and the Fight against Fake News & Fake Stats
One of the biggest challenges facing society today is the proliferation of fake news and fake stats. Today it is relatively easy to come up with statistics and charts that can bolster dubious claims, but it is much more difficult to counter such claims. And it... Read more
Dataset Management for Computer Vision
When building computer vision solutions, the emphasis is usually on the modeling side and on leveraging the latest algorithm. While the model is important, in my experience an even more important component of delivering a successful solution is to build and maintain... Read more
Scaling LightGBM with Dask
LightGBM is an open-source framework for solving supervised learning problems with gradient-boosted decision trees (GBDTs). It ships with built-in support for distributed training, which just means “using multiple machines at the same time to train a model”. Distributed training can allow you to train on larger... Read more
Measure Twice, Model Once
Supervised machine learning is essentially classification: ball vs strike; dog vs cat vs horse vs cow; etc. For these types of problems, the most fundamental question is always: can I create an accurate and generalized model (classifier) from the data I have collected? Today, the only... Read more
Data Visualization for Data Scientists – Choosing the Right Tool for the Job
Photo for Data Visualization for Data Scientists by David Pisnoy on Unsplash. Data visualization sometimes gets categorized as a field separate from machine learning or data science. Skill in designing effective, attractive plots and graphs doesn’t show up in job descriptions in the same way as... Read more