Millennials are Less Likely to Divorce

Millennials are getting married later than previous generations, as I wrote about here.  But the ones who get married are no more likely to divorce during the first 10 years, and after that they might be substantially less likely to get divorced. The following figure shows estimates for the fraction of people who have not […]

2 Visualizations

Editor’s note: This is the second of a series of posts on the caret package. The featurePlot function is a wrapper for different lattice plots to visualize the data. For example, the following figures show the default plot for continuous outcomes generated using the featurePlot function. For classification data sets, the iris data are used for illustration. […]

Containers for Data Science

Containers represent a simple way of creating pipelines for data analysis or even data science architectures. In this post, I will explain some of the container features and suggest a microservices architecture for data science professionals. People often describe containers as “lightweight virtual machines”, but that is a fallacy. Virtual machines emulate the hardware and OS […]

Write tests

Tests are important for community-driven open source software. This post contains brief reasons why you should test your code, particularly if you submit changes to existing open source projects. This work originally appeared at matthewrocklin.com and is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project. Why we don’t test: A […]

What Is Predictive Analytics (and Why Do You Need It)?

Try this statistic on for size: The 500 petabytes of digital healthcare data that existed in 2012 is predicted to reach 25,000 petabytes by the year 2020. That’s a 50-fold increase in just eight years! Healthcare marketers may be swimming in data, but what’s important is to […]

NLP with NLTK – Part 1

Introduction: The idea of using a structured programming language to interact with computers is being challenged by Natural Language Processing (NLP) and Natural Language Understanding methods. NLP holds great promise for making computer interfaces accessible to a wide range of audiences – as humans would be able to talk to computers in their own native […]

Will My Kiva Loan Get Funded?

Web Scraping Project contributed by Christian Holmes – Data Science Student in the NYC Data Science Academy Bootcamp. Kiva Basics: Microlending has become increasingly popular in recent years. If you’ve never heard of the concept before, microlending is a method of poverty alleviation implemented in the developing world. Small amounts of capital are loaned to people who would […]

Representation Learning Bonus Material

This post is part of a three-part series: Notes on Representation Learning; Notes on Representation Learning Continued; Representation Learning Bonus Material. Using GANs to Generate Images Based On Text Descriptions: Below are some neat pictures demonstrating the use of GANs to generate images based on text descriptions. All the images below are generated by a […]

Mixed-mode Estimation in Petersburg

A couple of months ago I posted an overview of simple estimation of hierarchical events using Python and petersburg. At the time it probably seemed a little trivial: just building a structured frequency model and drawing samples from it. But I have finally implemented the next step to complete the intended functionality. This post […]