Time series forecasting with TensorFlow involves building models such as convolutional and recurrent neural networks (CNNs and RNNs). This tutorial covers forecasting a single time step for one feature or for all features, as well as multi-step forecasting using single-shot and autoregressive approaches.
Setup and Dataset
We'll use a weather time series dataset from the Max Planck Institute for Biogeochemistry. This dataset contains 14 features collected every 10 minutes between 2009 and 2016.
First, import the necessary libraries and load the data:
import os  # needed for os.path.splitext below
import tensorflow as tf
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Download and extract the dataset, then load the CSV into a DataFrame
zip_path = tf.keras.utils.get_file(
    origin='https://storage.googleapis.com/tensorflow/tf-keras-datasets/jena_climate_2009_2016.csv.zip',
    fname='jena_climate_2009_2016.csv.zip',
    extract=True)
csv_path, _ = os.path.splitext(zip_path)
df = pd.read_csv(csv_path)
df = df[5::6] # Subsample to hourly data
date_time = pd.to_datetime(df.pop('Date Time'), format='%d.%m.%Y %H:%M:%S')
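To see why df[5::6] yields hourly data: readings arrive every 10 minutes, so keeping every sixth row starting at index 5 leaves one reading per hour. The sketch below applies the same slice to a synthetic list of timestamps. The start time is an assumption chosen for illustration, not a value read from the dataset.

```python
from datetime import datetime, timedelta

# Hypothetical start time; readings are spaced 10 minutes apart.
start = datetime(2009, 1, 1, 0, 10)
readings = [start + timedelta(minutes=10 * i) for i in range(24)]

# Same slice as df[5::6]: begin at index 5, then keep every sixth row.
hourly = readings[5::6]

for ts in hourly:
    print(ts.strftime('%d.%m.%Y %H:%M:%S'))
```

With this start time, every retained timestamp falls on the hour, and consecutive rows are exactly one hour apart.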
Let's visualize some of the features:
plot_cols = ['T (degC)', 'p (mbar)', 'rho (g/m**3)']
plot_features = df[plot_cols]
plot_features.index = date_time
plot_features.plot(subplots=True)
Data Cleaning and Feature Engineering
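Clean the wind velocity data by replacing the erroneous -9999.0 readings with zero:
# Assign through df.loc so the DataFrame itself is updated,
# not a temporary copy of the column.
bad_wv = df['wv (m/s)'] == -9999.0
df.loc[bad_wv, 'wv (m/s)'] = 0.0
bad_max_wv = df['max. wv (m/s)'] == -9999.0
df.loc[bad_max_wv, 'max. wv (m/s)'] = 0.0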
Convert wind direction and velocity to wind vectors:
wv = df.pop('wv (m/s)')
max_wv = df.pop('max. wv (m/s)')
wd_rad = df.pop('wd (deg)') * np.pi / 180
df['Wx'] = wv * np.cos(wd_rad)
df['Wy'] = wv * np.sin(wd_rad)
df['max Wx'] = max_wv * np.cos(wd_rad)
df['max Wy'] = max_wv * np.sin(wd_rad)
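The motivation for this conversion is that angles wrap around: 359 degrees and 1 degree point in almost the same direction, yet their raw values are far apart. A minimal sketch with made-up readings shows that the vector components stay close. The to_vector helper is hypothetical and introduced only for illustration.

```python
import math

def to_vector(speed, direction_deg):
    """Convert a speed and a direction in degrees to (x, y) components."""
    rad = math.radians(direction_deg)
    return speed * math.cos(rad), speed * math.sin(rad)

# Two made-up readings with the same speed and nearly the same direction.
a = to_vector(5.0, 359.0)
b = to_vector(5.0, 1.0)

# The raw angles differ by 358 degrees, but the vectors are close together.
distance = math.dist(a, b)
print(f"angle gap: {359.0 - 1.0:.0f} deg, vector distance: {distance:.3f} m/s")
```

A further benefit of the vector form is that direction stops mattering when the wind is calm: a speed of zero maps to (0, 0) whatever the angle.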
Create time-based features:
timestamp_s = date_time.map(pd.Timestamp.timestamp)
day = 24*60*60
year = (365.2425)*day
df['Day sin'] = np.sin(timestamp_s * (2 * np.pi / day))
df['Day cos'] = np.cos(timestamp_s * (2 * np.pi / day))
df['Year sin'] = np.sin(timestamp_s * (2 * np.pi / year))
df['Year cos'] = np.cos(timestamp_s * (2 * np.pi / year))
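Encoding time as a sine and cosine pair gives the model a signal that repeats every period, so 23:00 and 00:00 end up adjacent instead of a full day apart. A small sketch with hand-picked times illustrates this. The day_signal helper is hypothetical and not part of the tutorial code.

```python
import math

DAY = 24 * 60 * 60  # seconds per day, as in the code above

def day_signal(seconds):
    """Return the (sin, cos) time-of-day encoding for a timestamp in seconds."""
    angle = seconds * (2 * math.pi / DAY)
    return math.sin(angle), math.cos(angle)

midnight = day_signal(0)
late_evening = day_signal(23 * 3600)
noon = day_signal(12 * 3600)
next_midnight = day_signal(DAY)

# 23:00 and 00:00 are one hour apart, and their encodings are close too.
print(math.dist(midnight, late_evening))
# Noon is as far from midnight as the encoding allows.
print(math.dist(midnight, noon))
# The encoding repeats after exactly one day (up to floating-point error).
print(math.dist(midnight, next_midnight))
```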
Data Splitting and Normalization
Split the data into training, validation, and test sets:
n = len(df)
train_df = df[0:int(n*0.7)]
val_df = df[int(n*0.7):int(n*0.9)]
test_df = df[int(n*0.9):]
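The split is chronological and unshuffled, so the validation and test sets contain only data recorded after the training data. A sketch of the same 70/20/10 slicing on a stand-in list (the length of 100 is arbitrary):

```python
# Stand-in for the rows of df, in time order.
rows = list(range(100))
n = len(rows)

train = rows[0:int(n * 0.7)]
val = rows[int(n * 0.7):int(n * 0.9)]
test = rows[int(n * 0.9):]

print(len(train), len(val), len(test))
# Every training row precedes every validation row, which precedes every test row.
print(train[-1], val[0], val[-1], test[0])
```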
Normalize the data:
train_mean = train_df.mean()
train_std = train_df.std()
train_df = (train_df - train_mean) / train_std
val_df = (val_df - train_mean) / train_std
test_df = (test_df - train_mean) / train_std
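Computing the mean and standard deviation from the training set alone keeps information about the validation and test data out of the model's inputs. The following is a minimal sketch on made-up numbers that uses the standard library's statistics module in place of pandas. Like the pandas default, statistics.stdev computes the sample standard deviation.

```python
import statistics

# Made-up values for one feature, already split chronologically.
train = [2.0, 4.0, 6.0, 8.0]
val = [10.0, 12.0]

# Statistics come from the training data only.
train_mean = statistics.mean(train)
train_std = statistics.stdev(train)

def normalize(values):
    """Scale values with the training mean and standard deviation."""
    return [(v - train_mean) / train_std for v in values]

train_norm = normalize(train)
val_norm = normalize(val)

# The training set is centered on zero; the validation set need not be.
print(statistics.mean(train_norm), statistics.mean(val_norm))
```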
Data Windowing
Create a WindowGenerator class to handle data windowing:
class WindowGenerator():
    def __init__(self, input_width, label_width, shift,
                 train_df=train_df, val_df=val_df, test_df=test_df,
                 label_columns=None):
        # Store the raw data
        self.train_df = train_df
        self.val_df = val_df
        self.test_df = test_df
        # Map column names to positions; split_window uses this lookup
        self.label_columns = label_columns
        self.column_indices = {name: i for i, name in
                               enumerate(train_df.columns)}
        # Store the window parameters
        self.input_width = input_width
        self.label_width = label_width
        self.shift = shift
        # Calculate indices for input and label windows
        self.total_window_size = input_width + shift
        self.input_slice = slice(0, input_width)
        self.input_indices = np.arange(self.total_window_size)[self.input_slice]
        self.label_start = self.total_window_size - self.label_width
        self.labels_slice = slice(self.label_start, None)
        self.label_indices = np.arange(self.total_window_size)[self.labels_slice]

    def split_window(self, features):
        # features has shape (batch, time, features)
        inputs = features[:, self.input_slice, :]
        labels = features[:, self.labels_slice, :]
        if self.label_columns is not None:
            # Keep only the requested label columns
            labels = tf.stack(
                [labels[:, :, self.column_indices[name]] for name in self.label_columns],
                axis=-1)
        return inputs, labels

    def make_dataset(self, data):
        data = np.array(data, dtype=np.float32)
        # Slide a window of total_window_size over the data, one step at a time
        ds = tf.keras.utils.timeseries_dataset_from_array(
            data=data,
            targets=None,
            sequence_length=self.total_window_size,
            sequence_stride=1,
            shuffle=True,
            batch_size=32)
        # Split each batch of windows into (inputs, labels) pairs
        ds = ds.map(self.split_window)
        return ds
This WindowGenerator class allows for flexible creation of input-label pairs for various forecasting tasks.
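As an illustration of the index arithmetic in __init__, the sketch below recomputes the input and label indices for two configurations without TensorFlow. The window_indices function is a hypothetical stand-in for the class, and the widths are example values rather than settings taken from the tutorial.

```python
def window_indices(input_width, label_width, shift):
    """Return (input_indices, label_indices) for one window."""
    total_window_size = input_width + shift
    input_indices = list(range(total_window_size))[:input_width]
    label_start = total_window_size - label_width
    label_indices = list(range(total_window_size))[label_start:]
    return input_indices, label_indices

# 24 hours of history, predict the single value 24 hours ahead.
inputs, labels = window_indices(input_width=24, label_width=1, shift=24)
print(len(inputs), labels)  # prints: 24 [47]

# 6 hours of history, predict the next hour.
inputs, labels = window_indices(input_width=6, label_width=1, shift=1)
print(inputs, labels)  # prints: [0, 1, 2, 3, 4, 5] [6]
```

The labels always occupy the last label_width positions of the window, so shift controls how far into the future the final label lies relative to the end of the inputs.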
Conclusion
With these preprocessing steps and the WindowGenerator class, we've laid the groundwork for building time series forecasting models using TensorFlow. The next steps would involve creating and training models for single-step and multi-step forecasting tasks.