Show Me the Data: 8 Awesome Time Series Sources

Thanks to the Internet of Things, smart cities, e-health, autonomous machines, and other innovations, time series datasets are being produced in ever more massive quantities. Time series data can be used for econometrics, trend detection, pattern recognition, and prediction, and it is an essential ingredient in statistics, machine learning, and even deep learning models.

Learning time-series techniques will become increasingly important to any serious data scientist or machine learning engineer. Here are a few things to consider and some datasets to get you started.

What is time series?

The essential characteristic of time series data is that it is a collection of observations stored in order of the time at which they were recorded. These timestamped observations are often collected alongside a target variable to build basic regression models. However, time series models go beyond simply attaching timestamps to data. Time series analysis has a long history and is used to diagnose past behavior as well as to predict future behavior. Newly developed neural network architectures have taken time series analysis to a new level.
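To make the idea of time-ordered observations feeding a regression model concrete, here is a minimal sketch in plain Python. It assumes a common setup in which the previous few observations serve as features for predicting the next one. The function name `make_lagged_rows`, the choice of two lags, and the values are all illustrative and do not come from any of the sources below.

```python
from datetime import date, timedelta


def make_lagged_rows(values, n_lags):
    """Turn an ordered series into (lag features, target) pairs."""
    rows = []
    for t in range(n_lags, len(values)):
        features = values[t - n_lags:t]  # the n_lags observations just before time t
        target = values[t]               # the observation we want to predict
        rows.append((features, target))
    return rows


# Synthetic daily observations, ordered by timestamp (values are made up).
start = date(2022, 1, 1)
raw = [10, 12, 13, 15, 18, 21]
series = [(start + timedelta(days=i), float(v)) for i, v in enumerate(raw)]

values = [v for _, v in series]
lagged = make_lagged_rows(values, n_lags=2)
for features, target in lagged:
    print(features, "->", target)
```

Each row pairs a short window of history with the value that followed it, which is the shape most regression libraries expect. Note that the order of the observations matters: shuffling the series before building the rows would destroy the information the model is supposed to learn from.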

Examples of time series datasets

Federal Reserve Economic Data – FRED

When it comes to time series datasets, FRED is the mother lode. It contains over 750,000 data series from over 70 sources and is entirely free. Drill down into a host of economic and research data from many countries, including the USA, Germany, and Japan, to name a few. Each time series is easily downloadable, and many include graphs for quick reference.
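Once a series has been downloaded as a CSV file, loading it takes only a few lines. The sketch below uses only the Python standard library and assumes a two-column file with a date column and a value column. The column names `DATE` and `VALUE`, the `.` marker for missing observations, and the sample rows are assumptions made for illustration, so check them against the file you actually download.

```python
import csv
import io
from datetime import datetime

# Made-up sample in the shape of a two-column date/value export.
SAMPLE_CSV = """DATE,VALUE
2021-01-01,100.0
2021-04-01,.
2021-07-01,103.5
"""


def load_series(text, date_col="DATE", value_col="VALUE", missing="."):
    """Parse CSV text into a time-ordered list of (date, float) pairs."""
    observations = []
    for row in csv.DictReader(io.StringIO(text)):
        raw = row[value_col].strip()
        if raw == missing or raw == "":
            continue  # skip observations with no recorded value
        when = datetime.strptime(row[date_col], "%Y-%m-%d").date()
        observations.append((when, float(raw)))
    observations.sort(key=lambda obs: obs[0])  # guarantee chronological order
    return observations


observations = load_series(SAMPLE_CSV)
for when, value in observations:
    print(when.isoformat(), value)
```

For a real file, read its contents with `open(path, newline="")` and pass the text to `load_series`. Dropping missing rows is only one option; depending on the analysis, interpolating or forward-filling them may be more appropriate.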


GitHub

GitHub has perhaps the widest and most diverse set of time series datasets available anywhere. The downside is that you’ll need to do a bit of legwork to find what you need. Luckily, a few kind souls have done some of the work for us. You’ll find open source directories listed on GitHub itself, such as the Awesome Time Series Database. Awesome Public Datasets is another incredible resource that includes many time series sets.

Kaggle

Where there is a Kaggle competition, there is a dataset to go with it. Given the popularity of time series models, it’s no surprise that Kaggle is a great place to find this data. Some notable sets include Walmart Sales in Stormy Weather, Wikipedia Web Traffic Forecasting, Favorita Grocery Sales Forecasting, Recruit Restaurant Visitor Forecasting, and COVID19 Global Forecasting. If that’s not enough, just query Kaggle’s dataset search and you’ll find 1,682 listed (last time we checked).

Google

Sure, you use Google for all manner of searches, but less well known is its dataset search engine. Let’s say your new startup is predicting airfare prices – you can simply key in “average USA airfares” and Google will return datasets and related searches. The datasets tend to be smaller but are useful nonetheless. Helpful features include the ability to filter by last update, download format, topic, and license (free vs. paid).

The UEA & UCR Time Series Classification Repository

The UEA and UCR Time Series Classification Repository provides a common set of time series data for experimentation and research in time series classification tasks. The collection is also very diverse.

Data Portal

Hosted and run by the Open Knowledge Foundation, the Data Portal currently lists over 590 data portals. Many are national, state, city, or local government portals, but the list also includes portals run by various institutions.

The University of California, Irvine (UCI)

UCI’s Center for Machine Learning and Intelligent Systems keeps a machine learning dataset repository that allows you to explore over 500 datasets through a searchable interface. The datasets cover many topics and vary in size, from only a few cases (or “instances”) up to over 43 million, and from only 1 or 2 variables (or “attributes”) to over a million. Currently, there are 121 time series datasets available across a range of domains.

CompEngine

Last but certainly not least is the very interesting and insightful Time Series CompEngine. Not only does it give you access to time series data sets as the name suggests; it’s also a comparison engine for time-series data. The website allows you to upload time-series data and interactively visualize how your data relates to the time series that others have measured or generated.

It works like this: you upload a new time series dataset, and CompEngine computes the set’s properties, or “features.” It then uses these features to find similar types of data that are already in the CompEngine database. You can then interactively explore where your data sits in this broader context to help with your research.
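The general idea of feature-based comparison can be illustrated with a toy sketch. This is not CompEngine’s actual feature set or matching method. It assumes just three hand-picked features (mean, standard deviation, and lag-1 autocorrelation) and plain Euclidean distance between feature vectors. The function names and the tiny `library` of series are hypothetical.

```python
import math


def features(series):
    """Summarize a series as (mean, standard deviation, lag-1 autocorrelation)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    std = math.sqrt(var)
    if var == 0:
        lag1 = 0.0  # autocorrelation is undefined for a constant series; use 0
    else:
        cov = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
        lag1 = cov / (n * var)
    return (mean, std, lag1)


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(query, library):
    """Return the name of the library series whose features are closest to the query's."""
    query_features = features(query)
    return min(library, key=lambda name: distance(query_features, features(library[name])))


# A tiny stand-in for a database of previously uploaded series (values are made up).
library = {
    "rising": [1, 2, 3, 4, 5, 6, 7, 8],
    "alternating": [1, -1, 1, -1, 1, -1, 1, -1],
    "flat": [5, 5, 5, 5, 5, 5, 5, 5],
}

query = [2, 3, 4, 5, 6, 7, 8, 9]
match = most_similar(query, library)
print(match)
```

Here the steadily increasing query lands closest to the "rising" series because the two share the same spread and autocorrelation. A more careful version would normalize each feature before computing distances so that no single feature dominates because of its scale.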

Get started with machine learning for data science and add it to your skillset at ODSC West 2022

If you’re looking to add an in-demand, evergreen, and broad-use skill to your repertoire, then maybe it’s time to learn machine learning or other core data science skills. At ODSC West 2022, we’ll have an entire mini bootcamp track where you can start with core beginner skills and work your way up to more advanced data science skills, such as working with NLP or neural networks. By registering now, you’ll also gain access to Ai+ Training on demand for a year. Sign up now, start learning today!

Sheamus McGovern

Founder of ODSC and software architect specializing in complex multi-platform systems across multiple industries, including finance, healthcare, and education.
