Thanks to the Internet of Things, smart cities, e-health, autonomous machines, and other innovations, time series datasets are being produced in ever more massive quantities. Time series data can be used for econometrics, trend detection, pattern recognition, and prediction, and it is an essential ingredient in statistics, machine learning, and even deep learning models.
Learning time-series techniques will become increasingly important to any serious data scientist or machine learning engineer. Here are a few things to consider and some datasets to get you started.
What is a time series?
An essential characteristic of time series data is that it’s a collection of observations stored in the order in which they were recorded. These observations, with their consecutive timestamps, are often collected along with target variables to build basic regression models. However, time series models go beyond simple timestamped data. Time series analysis has a long history and is used to diagnose past behavior as well as to predict future behavior. Newly developed neural network architectures have taken time-series analysis to a new level.
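To make the idea of pairing timestamped observations with target variables concrete, here is a minimal sketch in plain Python. It assumes one common way of framing the problem: the previous few values ("lags") become the inputs and the current value becomes the target. The function name `make_lagged_rows` and the daily readings are invented for illustration and do not come from any of the datasets below.

```python
from datetime import date, timedelta


def make_lagged_rows(series, n_lags):
    """Turn (timestamp, value) pairs into (timestamp, lags, target) rows.

    Each row holds the n_lags values that came just before the timestamp,
    plus the value observed at the timestamp itself as the regression target.
    """
    # Time series rely on ordering, so sort by timestamp first.
    ordered = sorted(series, key=lambda pair: pair[0])
    rows = []
    for i in range(n_lags, len(ordered)):
        timestamp, target = ordered[i]
        lags = [value for _, value in ordered[i - n_lags:i]]
        rows.append((timestamp, lags, target))
    return rows


# Made-up daily readings, purely for illustration.
start = date(2022, 1, 1)
values = [10.0, 12.0, 11.0, 13.0, 14.0]
readings = [(start + timedelta(days=d), v) for d, v in enumerate(values)]

rows = make_lagged_rows(readings, n_lags=2)
for timestamp, lags, target in rows:
    print(timestamp, lags, target)
```

Rows built this way can be handed to any ordinary regression model. The first `n_lags` observations produce no row because they do not have enough history behind them.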
Examples of time series datasets
Federal Reserve Economic Data – FRED
When it comes to time-series datasets, FRED is the mother lode. It contains over 750,000 data series from over 70 sources and is entirely free. Drill down into the host of economic and research data from many countries, including the USA, Germany, and Japan, to name a few. Each time series dataset is easily downloadable, and many include time series graphs for quick reference.
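Once a series has been downloaded as a CSV file, loading it takes only a few lines. The sketch below assumes a simple two-column layout, with an ISO date in the first column and a value in the second, and skips rows whose value is not numeric. The header names and the numbers in `sample` are hypothetical placeholders, so check the layout of the file you actually download.

```python
import csv
import io
from datetime import date


def parse_two_column_csv(text):
    """Parse 'date,value' CSV text into a list of (date, float) pairs."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row
    observations = []
    for row in reader:
        if len(row) < 2:
            continue
        try:
            day = date.fromisoformat(row[0].strip())
            value = float(row[1])
        except ValueError:
            # Unparseable date or missing-value marker: skip the row.
            continue
        observations.append((day, value))
    return observations


# Hypothetical file contents; the header and values are made up.
sample = """DATE,VALUE
2020-01-01,3.5
2020-02-01,.
2020-03-01,4.4
"""

observations = parse_two_column_csv(sample)
print(observations)
```

Dropping bad rows keeps the sketch short. For real analysis you may prefer to keep the gaps explicit so that missing months stay visible.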
GitHub has perhaps the widest and most diverse set of time series datasets available anywhere. However, the downside is you’ll need to do a bit of legwork to find them. Luckily, a few kind souls have done some of the work for us. You’ll find some open source directories listed on GitHub itself, such as the Awesome Time Series Database. Awesome Public Datasets is another incredible resource with many time series sets included.
Where there is a Kaggle competition, there is a dataset to go with it. Given the popularity of time series models, it’s no surprise that Kaggle is a great source for this data. Some notable sets include Walmart Sales in Stormy Weather, Wikipedia Web Traffic Forecasting, Favorita Grocery Sales Forecasting, Recruit Restaurant Visitor Forecasting, and COVID19 Global Forecasting. If that’s not enough, just query Kaggle’s dataset engine and you’ll find over 1,682 listed (last time we checked).
Sure, you use Google for all manner of searches, but less well known is its dataset search engine. Let’s say your new startup is predicting airfare prices: you can simply key in “average USA airfares” and Google will return datasets and related searches. The datasets tend to be smaller but are useful nonetheless. Useful features include the ability to filter by last update, download format, topic, license (free vs. paid), and more.
The UEA & UCR Time Series Classification Repository
The UEA and UCR Time Series Classification Repository provides a common set of time series data for experimentation and research in time series classification tasks. The data it holds is also very diverse.
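If time series classification is new to you, the task is to assign a label to a whole series rather than to a single data point. A simple baseline is a one-nearest-neighbor classifier with Euclidean distance: a new series gets the label of the training series it sits closest to. The sketch below shows that baseline on made-up toy series. The labels and values are invented for illustration and are not taken from the repository.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_1nn(train, query):
    """Return the label of the training series closest to the query.

    train is a list of (series, label) pairs.
    """
    _, label = min(train, key=lambda item: euclidean(item[0], query))
    return label


# Toy training set, made up for illustration.
train = [
    ([0.0, 1.0, 2.0, 3.0], "rising"),
    ([3.0, 2.0, 1.0, 0.0], "falling"),
    ([1.0, 1.0, 1.0, 1.0], "flat"),
]

prediction = predict_1nn(train, [0.2, 0.9, 2.1, 2.8])
print(prediction)
```

This version requires all series to have the same length. Real datasets often need resampling or a more flexible distance measure before a comparison like this makes sense.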
Hosted and run by the Open Knowledge Foundation, the Data Portal currently lists over 590 data portals. Many are national, state, city, or local government portals, but the list also includes portals from various institutions.
The University of California, Irvine (UCI)
UCI’s Center for Machine Learning and Intelligent Systems keeps a machine learning dataset repository that allows you to explore over 500 datasets through a searchable interface. Datasets span many topics and vary in size, from only a few cases (or “instances”) up to over 43 million, and from only 1 or 2 variables (or “attributes”) to over a million. Currently, there are 121 time series datasets available across a range of domains.
Last but certainly not least is the very interesting and insightful Time Series CompEngine. Not only does it give you access to time series datasets; as the name suggests, it’s also a comparison engine for time-series data. The website allows you to upload time-series data and interactively visualize how your data relates to the time series that others have measured or generated.
Here is how it works: you upload a new time-series dataset, and CompEngine computes the set’s properties, or “features.” It then uses these features to find similar types of data already in the CompEngine database. You can then interactively explore how your data is placed in this broader context to help with your research.
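The general idea of feature-based comparison can be sketched in a few lines of Python. This is only an illustration of the concept, not CompEngine’s actual method: the three features used here (mean, standard deviation, and lag-1 autocorrelation), the function names, and the tiny in-memory “database” are all assumptions made for the example.

```python
import math
from statistics import mean, pstdev


def features(series):
    """Summarize a series as (mean, std dev, lag-1 autocorrelation)."""
    mu = mean(series)
    sigma = pstdev(series)
    denom = sum((x - mu) ** 2 for x in series)
    if denom == 0:
        ac1 = 0.0  # a constant series has no meaningful autocorrelation
    else:
        num = sum(
            (series[i] - mu) * (series[i + 1] - mu)
            for i in range(len(series) - 1)
        )
        ac1 = num / denom
    return (mu, sigma, ac1)


def most_similar(query, database):
    """Return the name of the stored series whose features are closest."""
    q = features(query)
    return min(database, key=lambda name: math.dist(q, features(database[name])))


# Hypothetical stored series, made up for illustration.
database = {
    "smooth_trend": [1, 2, 3, 4, 5, 6, 7, 8],
    "alternating": [1, -1, 1, -1, 1, -1, 1, -1],
    "constant": [2, 2, 2, 2, 2, 2, 2, 2],
}

query = [2, 3, 4, 5, 6, 7, 8, 9]
match = most_similar(query, database)
print(match)
```

Comparing features rather than raw values means series of different lengths can still be matched. A more careful version would rescale the features first so that no single one dominates the distance.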