Student Loans: a Subprime Time-bomb for the US Government?

Exploratory Data Analysis Visualization Project contributed by James Stebbins – Data Science Student in the NYC Data Science Academy Bootcamp. Are Student Loans a Subprime Time-bomb for the US Government? There is overwhelming concern among politicians, professionals, and students that the current student loan market may be the next soaring hot air balloon primed to […]

Visualizing Professional Tennis Upsets: ATP 2012-2014 Men’s Singles Matches

Exploratory Data Analysis Visualization Project contributed by Tyler Knutson – Data Science Student in the NYC Data Science Academy Bootcamp. Context: Men’s professional tennis is unique in that, despite the dominance of a select few competitors at the top of the ATP world rankings, upsets occur regularly. How dominant are these top players? Consider […]

Inside Serial Killer Data: Part Two

This is the second part of a two-part series on serial killer data. To learn more about the origins of this data, read Inside Serial Killer Data: Part One. One of the best things about this dataset is that it includes detailed information on the victims, not just the killers. The data […]

Inside Serial Killer Data: Part One

Have you wondered about serial killer data? Have you asked yourself “What do the demographics of serial killers look like?” or “Are there correlations between certain killing methods and motivations for killing?” Well, you’re in luck, because we’ve gotten our hands on some juicy serial killer data featuring pretty much anything you’ve ever wanted […]

Visualizing the Relationship Between Infant Mortality Rates & Resource Availability

Introduction: We know that war and civil unrest account for a significant proportion of deaths every year, but how much of the mortality rate can be attributed to a simple lack of basic resources and amenities, and what relationship do mortality rates have with such factors? That’s what I set out to uncover using World Bank data that […]

Beyond One-hot: an Exploration of Categorical Variables

In machine learning, data is king. The algorithms and models used to make predictions from the data are important, and very interesting, but ML is still subject to garbage-in, garbage-out. With that in mind, let’s look at one small subset of input data: categorical variables. Categorical variables (wiki) are those that represent a […]

The Pressure Cooker: Population Density and Crime

Do Higher Population Densities Increase Crime? Crime, particularly violent crime, is a constant presence in the public consciousness. At the same time, the UN reported in 2014 that population densities and the share of people living in urban areas continue to increase, with more than half the world’s population living in urban areas for the first time in history. The […]

Spatial Analysis: Is Airbnb Even Legal in NYC?

Editor’s note: Opinions expressed in this post do not necessarily reflect the views of #ODSC. Airbnb boasts almost two million listings in 34,000 cities and, according to data from Inside Airbnb, an independent data analysis website, listed about 36,000 apartments in New York as of July 5, 2016. This data exploration sets out to visualize how Airbnb […]

Distributed Dask Arrays #3

In this post we analyze weather data across a cluster using NumPy in parallel with dask.array. We focus on the following: how to set up the distributed scheduler with a job scheduler like Sun GridEngine; how to load NetCDF data from a network file system (NFS) into distributed RAM; how to manipulate data with dask.arrays […]