
Every week we bring you a selection of the best data science articles we find in cyberspace. If you want to dig deeper into these data science and machine learning topics, do not miss the next Open Data Science Conference in Boston, May 20-22. With over 100 talks, workshops and tutorials, the ODSC conference is a must-attend event for all data scientists.

We start with a couple of amazing data visualizations, learn about the history of AI in gaming, program a drone in Python and dive into 300TB of LHC data.

We often see network analysis applied to the exploration of our social networks: Twitter, LinkedIn and so forth. This time it has been applied to another type of object: galaxies.


Using data from 24,000 galaxies, Kim Albrecht (@kimay), a visual researcher at the Center for Complex Network Research in Boston, gives us a staggering view of the cosmos, using network analysis models to visualize the cosmic web.

The demo is built around three types of neighbor aggregation: a fixed-length model, one based on the size of the galaxy and, finally, a K-Means model. This interactive visualization lets you select the model parameter, zoom in and out, or rotate the galaxy networks.
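To make the fixed-length idea concrete, here is a minimal sketch, assuming it simply links any two galaxies that lie within a chosen distance of each other. The function name `fixed_length_edges`, the `max_length` parameter and the coordinates are hypothetical illustrations; they are not taken from the demo's code or from the survey data.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def fixed_length_edges(positions, max_length):
    """Link every pair of points whose Euclidean distance is at most max_length.

    Returns a list of (i, j) index pairs with i < j.
    """
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(positions), 2)
        if dist(p, q) <= max_length
    ]


# Hypothetical 3D coordinates in arbitrary units, not real galaxy positions.
galaxies = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.5, 0.0),
    (5.0, 5.0, 5.0),
]

edges = fixed_length_edges(galaxies, max_length=1.6)
print(edges)
```

Raising `max_length` adds links and lowering it removes them, which is roughly the effect of adjusting the model parameter in the visualization. This brute-force version compares every pair, so it is only practical for small samples.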

A fascinating look into the fundamental structure of the universe.

This truly amazing interactive map displays the movements of the global merchant fleet over the course of 2012. Duncan Clark and Robin Houston (@robinhouston) from Kiln, a data visualization studio in London, used WebGL to plot over 250 million data points on a map of the world made specifically for the project. Shipping positions come from exactEarth AIS data (location and speed), and static vessel information comes from the Clarksons Research UK World Fleet Register. The CO2 emissions for each type of vessel are also calculated.

Removing the background map still leaves a very accurate outline of the world’s coastlines. The Panama and Suez canals stand out with queues of ships waiting for their turn to cross into a different ocean or sea, while the main ports stand out as clusters of ships.

Programming a drone to be autonomous is today’s version of the remote-controlled police car you loved as a kid. A dream come true for any self-respecting nerd (such as myself). Well, the team at Yhat has done it by programming a semi-autonomous drone with Python and node.js, and they show you how in this article.


The drone is first programmed to perform a simple sequence of actions: connecting to wifi, taking off, flying, landing. The actions then get more and more complex, with back flips and other moves. To make the drone semi-autonomous, the idea was to combine OpenCV for image tracking with Yhat’s own model deployment software, ScienceOps. Together they allowed the drone to follow a red piece of cloth like a bull would in a corrida. With code, videos and useful tips to make the whole thing work, this article will get you started on your first step toward Skynet. And you may well be tempted to tune up your own drone with lasers and a robotic voice à la Terminator.
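The color-tracking idea can be sketched in a few lines. This is not Yhat’s code and it does not use OpenCV or ScienceOps: it is a dependency-free stand-in that assumes a frame is a grid of RGB tuples, finds the centroid of the "red enough" pixels, and turns the horizontal offset into a steering hint. The function names, thresholds and command strings are all hypothetical.

```python
def red_centroid(frame, min_red=150, max_other=90):
    """Return the (x, y) centroid of red pixels, or None if there are none.

    frame is a list of rows; each row is a list of (r, g, b) tuples.
    The thresholds are illustrative guesses, not tuned values.
    """
    xs, ys = [], []
    for y, row in enumerate(frame):
        for x, (r, g, b) in enumerate(row):
            if r >= min_red and g <= max_other and b <= max_other:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return sum(xs) / len(xs), sum(ys) / len(ys)


def steering_command(frame, dead_zone=0.1):
    """Map the target's horizontal offset to a hypothetical drone command."""
    target = red_centroid(frame)
    if target is None:
        return "hover"  # nothing red in view: stay put
    width = len(frame[0])
    # Offset of the centroid from the image center, in the range [-0.5, 0.5].
    offset = (target[0] + 0.5) / width - 0.5
    if offset < -dead_zone:
        return "left"
    if offset > dead_zone:
        return "right"
    return "forward"


# Tiny synthetic 8x4 frame with a red patch near the right edge.
GREY, RED = (80, 80, 80), (220, 30, 30)
frame = [[GREY] * 8 for _ in range(4)]
frame[1][6] = frame[2][6] = frame[1][7] = RED

command = steering_command(frame)
print(command)
```

In a real setup the frame would come from the drone’s camera and the thresholding would more likely be done in HSV space with a vision library, since plain RGB thresholds are sensitive to lighting.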

The historic milestone of AlphaGo’s win over the world-class Go champion Lee Sedol was 60 years in the making.

This excellent series of articles by Andrey Kurenkov retraces the evolution of Artificial Intelligence attempts and successes at playing human games.

From the first minimax-based games (checkers and chess), built on the work of Shannon and Turing in the 1950s, to games such as backgammon and Go, from winning against inexperienced players to beating world champions, from trees to neural networks and reinforcement learning, Andrey Kurenkov walks us through decades of ideas and personalities that resulted in what game AI is today.


CERN releases 300TB of Large Hadron Collider data into open access

Now THIS is Big Data! In fact, massive Big Data, very much in keeping with the gigantism of the experiments it represents.

This 300TB dataset was collected at the LHC by the CMS (Compact Muon Solenoid) detector in 2011. According to the CERN Open Data site:

The CMS (Compact Muon Solenoid) experiment is a large particle physics detector built on the Large Hadron Collider (LHC) at CERN in Switzerland and France. Its goal is to investigate a wide range of physics, including properties of the recently discovered Higgs boson as well as searches for extra dimensions and particles that could make up dark matter.

This ginormous dataset is composed of primary datasets, made of the raw data, and derived datasets, which are simpler to use. CERN provides tools dedicated to the analysis of this data, including a CERN Virtual Machine that comes preloaded with the software environment needed to analyse the CMS data, as well as many tutorials and code samples.

This is not the first time CERN has released such a gigantic dataset: in 2014, it released 27TB of research data collected in 2010.

The best way to start would be with the CMS learning resources page where you’ll find many tutorials such as “a basic introduction to fundamental concepts of data analysis in High-Energy Physics experiments”. All you need to become a better particle physicist.

To read more from Alex, sign up for our newsletter or follow him on Twitter @alexip.

Alex Perrier


Lead Data Scientist focused on Natural Language Processing and Predictive Modeling, a background in stochastic processes and signal processing and extensive experience in agile software development. I recently authored a book on AWS Machine Learning with Packt Pub. I am a creative start-up co-founder with clear communication skills, project management and business development experience. Team lead and team builder.