Show Me the Data: 8 Awesome Time Series Sources

Thanks to the Internet of Things, smart cities, e-health, autonomous machines, and other innovations, time series datasets are being produced in ever more massive quantities. They can be used for econometrics, trend detection, pattern recognition, and prediction, and they are an essential ingredient in statistics, machine learning, and even deep learning models.

Learning time-series techniques will become increasingly important to any serious data scientist or machine learning engineer. Here are a few things to consider and some datasets to get you started.

What is a time series?

An essential characteristic of time series data is that it is a collection of observations stored with respect to time. These timestamped observations are often collected alongside target variables to build basic regression models. However, time series models go beyond simple timestamped data. Time series analysis has a long history and is used to diagnose past behavior as well as to predict future behavior. Newly developed neural network architectures have taken time-series analysis to a new level.
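As a rough illustration of the idea above, here is a minimal sketch of observations stored with their timestamps, sorted by time, and turned into (previous value, current value) pairs of the kind a basic regression model could be fit on. The readings and the helper name `make_lagged_pairs` are made up for illustration.

```python
from datetime import date

# Hypothetical daily readings as (timestamp, value) pairs, deliberately unsorted.
observations = [
    (date(2022, 1, 3), 12.0),
    (date(2022, 1, 1), 10.0),
    (date(2022, 1, 2), 11.0),
    (date(2022, 1, 4), 13.5),
]


def make_lagged_pairs(series, lag=1):
    """Sort by timestamp, then pair each value with the value `lag` steps earlier."""
    ordered = sorted(series, key=lambda point: point[0])
    values = [value for _, value in ordered]
    return [(values[i - lag], values[i]) for i in range(lag, len(values))]


pairs = make_lagged_pairs(observations)
print(pairs)
```

Each pair treats the earlier observation as the input and the later one as the target, which is why the time ordering matters: shuffle the rows and the pairs stop meaning anything.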

Examples of time series datasets

Federal Reserve Economic Data – FRED

When it comes to time-series datasets, FRED is the mother lode. It contains over 750,000 data series from over 70 sources and is entirely free. Drill down into a host of economic and research data from many countries, including the USA, Germany, and Japan, to name a few. Each time series is easily downloadable, and many include graphs for quick reference.
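Once a series has been downloaded, loading it takes only a few lines. The sketch below assumes a two-column CSV with a date in the first column and a value in the second, and it skips rows whose value is not numeric. The sample text, the column names, and the helper name `load_series` are all made up for illustration, so check the layout of the file you actually download.

```python
import csv
import io
from datetime import date

# Stand-in for a downloaded file; column names and values are invented.
SAMPLE_CSV = """observation_date,EXAMPLE_SERIES
2021-01-01,100.5
2021-04-01,.
2021-07-01,102.25
"""


def load_series(csv_text):
    """Parse two-column CSV text into a list of (date, float) observations."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    series = []
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed rows
        try:
            value = float(row[1])
        except ValueError:
            continue  # skip non-numeric placeholders for missing values
        series.append((date.fromisoformat(row[0]), value))
    return series


series = load_series(SAMPLE_CSV)
print(series)
```

To read a real file, pass its contents in with something like `load_series(open("my_series.csv").read())`, where the filename is a placeholder.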

GitHub

GitHub has perhaps the widest and most diverse set of time series datasets available anywhere. The downside is that you’ll need to do a bit of legwork to find them. Luckily, a few kind souls have done some of the work for us. You’ll find open source directories listed on GitHub itself, such as Awesome Time Series Database. Awesome Public Datasets is another incredible resource, with many time series sets included.

Kaggle

Where there is a Kaggle competition, there will be a dataset to go with it. Given the popularity of time series models, it’s no surprise that Kaggle is a great place to find this data. Some notable sets include: Walmart Sales in Stormy Weather, Wikipedia Web Traffic Forecasting, Favorita Grocery Sales Forecasting, Recruit Restaurant Visitor Forecasting, and COVID19 Global Forecasting. If that’s not enough, just query Kaggle’s dataset engine and you’ll find 1,682 listed (last time we checked).

Google

Sure, you use Google for all manner of searches, but less well known is its dataset search engine. Say your new startup is predicting airfare prices: you can simply key in “average USA airfares” and Google will return datasets and related searches. The datasets tend to be smaller but are useful nonetheless. Handy features include the ability to filter by last update, download format, topic, license (free vs. paid), and more.

The UEA & UCR Time Series Classification Repository

The UEA and UCR Time Series Classification Repository provides a common set of time series data for experimentation and research in time series classification tasks. The collection is also very diverse.

Data Portal

Hosted and run by the Open Knowledge Foundation, the Data Portal currently lists over 590 data portals. Many are national, state, city, or local government portals, but the list also includes portals from various institutions.

The University of California, Irvine (UCI)

UCI’s Center for Machine Learning and Intelligent Systems keeps a machine learning dataset repository that lets you explore over 500 datasets through a searchable interface. The datasets span many topics and vary in size, from only a few cases (or “instances”) to over 43 million, and from only 1 or 2 variables (or “attributes”) to over a million. Currently, there are 121 time series datasets available across a range of domains.

CompEngine

Last but certainly not least is the very interesting and insightful Time Series CompEngine. Not only does it give you access to time series datasets; as the name suggests, it’s also a comparison engine for time-series data. The website lets you upload time-series data and interactively visualize how your data relates to time series that others have measured or generated.

It works like this: you upload a new time-series dataset, and CompEngine computes its properties, or “features.” It then uses these features to find similar data already in the CompEngine database. You can then interactively explore where your data sits in this broader context to help with your research.
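To make the feature-based comparison idea concrete, here is a minimal sketch under stated assumptions. It is not CompEngine’s actual feature set or matching method: the three features (mean, standard deviation, lag-1 autocorrelation), the Euclidean distance, and the names `features` and `most_similar` are all chosen purely for illustration.

```python
import math
import statistics


def features(series):
    """Summarize a series as (mean, standard deviation, lag-1 autocorrelation)."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    # Lag-1 autocorrelation; defined as 0.0 for a constant series.
    lag1 = sum(a * b for a, b in zip(centered, centered[1:])) / denom if denom else 0.0
    return (mean, std, lag1)


def most_similar(query, library):
    """Return the name of the library series whose features are closest to the query's."""
    query_features = features(query)
    return min(library, key=lambda name: math.dist(query_features, features(library[name])))


# Hypothetical "database" of previously stored series.
library = {
    "steady_trend": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "alternating": [1.0, 6.0, 1.0, 6.0, 1.0, 6.0],
}
query = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(most_similar(query, library))
```

Comparing feature vectors instead of raw values means series of different lengths or sampling rates can still be matched. A real system would use many more features and would normalize them so that no single one dominates the distance.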

Learn more about machine learning platforms and skills at ODSC Europe 2022

We just listed quite a few machine learning engineering platforms, skills, and frameworks. You’re not expected to know every single thing mentioned above, but knowing a good chunk of them, and how to apply them in business settings, will help you get a job or become better at your current one.

At ODSC Europe 2022, we have an entire track devoted to machine learning and deep learning. Learn ML engineering skills and platforms like the ones listed above. Here are a few sessions scheduled so far:

  • Beyond the Basics: Data Visualization in Python
  • A Hands-on Guide to Machine Learning with TensorFlow
  • Introduction to Machine Learning
  • Rule Induction and Reasoning in Knowledge Graphs
  • Digital Twins: Not All Digital Twins are Identical
  • Time-Series in Python – Preprocessing and Machine Learning
  • The Bayesian Revolution in Online Marketing
  • How to Teach Our World Knowledge to a Neural Network?
  • Dynamic and Context-Dependent Stock Price Prediction Using Attention Modules and News Sentiment
  • Open Source Explainability – Understanding Model Decisions Using Alibi
  • PyTorch 101: Building a Model Step-by-step
  • GANs N’ Roses: Understanding Generative Models
  • How to Write a Scikit-learn Compatible Estimator
  • Explainability by Design: a Methodology to Support Explanations in Decision-making Systems
  • Diffusion Models for Text-to-Image Generation
  • Visually Inspecting Data Profiles for Data Distribution Shifts
  • Computer Perception Challenges in Drone Applications Using Quality Data Annotation
  • Next Generation Web Apps: Create a Machine Learning Powered Smart Cam in the Browser with TensorFlow.js
  • Optimizing Your Analytics Life Cycle with Machine Learning and Open Source
  • Machine Learning for Economics and Finance in TensorFlow 2
  • Revealing the Inner Self: Automatic Differentiation (Autodiff) Clearly Explained
  • Moving into the Frequency Domain with the Fourier Transform

Sheamus McGovern

Founder of ODSC and Software Architect specializing in complex multi-platform systems across multiple industries, including finance, healthcare, and education.
