
Time Series Features

Preprocessing

Timeseries features are preprocessed in the same way as sequence features. The only difference is that the matrix in the HDF5 preprocessing file stores floats instead of integers.

Because the data is continuous, no vocabulary is built, so the JSON file that normally stores vocabulary mappings isn't needed.

Ludwig supports two data formats for timeseries:

  • Row-major (default): each row in the dataset is already a space-separated sequence of floats representing one complete window. Use timeseries_length_limit to cap the window size.
  • Column-major: each row is a single scalar observation. Ludwig automatically converts it to row-major format using a sliding window controlled by preprocessing.window_size (for inputs) or preprocessing.horizon (for outputs).

Input Features

Preprocessing

missing_value_strategy: fill_with_const
tokenizer: space
timeseries_length_limit: 256
padding_value: 0.0
padding: right
fill_value: ''
computed_fill_value: ''
window_size: 0

Parameters:

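  • missing_value_strategy (default: fill_with_const): What strategy to follow when a row of data is missing. Options: fill_with_const, fill_with_mode, bfill, ffill, drop_row.
  • tokenizer (default: space): How the raw string in each row is split into individual values. With space, a row-major row is parsed as a space-separated sequence of floats.
  • timeseries_length_limit (default: 256): Maximum length of a timeseries window. Longer series are truncated to this length.
  • padding_value (default: 0.0): Float value used to pad series that are shorter than the padded length.
  • padding (default: right): Side on which padding is added. Options: left, right.
  • fill_value (default: ''): The value used to replace missing values when missing_value_strategy is fill_with_const.
  • computed_fill_value (default: ''): The fill value Ludwig computes internally to replace missing values when the missing value strategy derives one from the data (for example, fill_with_mode).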
  • window_size (default: 0): Optional lookback window size used to convert a column-major dataset (one observation per row) into a row-major dataset (each row has a timeseries window of observations). Starting from a given observation, a sliding window is taken going window_size - 1 rows back to form the timeseries input feature. If this value is left as 0, then it is assumed that the dataset has been provided in row-major format (i.e., it has already been preprocessed such that each row is a timeseries window).
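For row-major data, the tokenizer, timeseries_length_limit, padding, and padding_value settings combine roughly as shown in the NumPy sketch below. preprocess_row_major is a hypothetical helper, not part of Ludwig's API. The sketch makes two assumptions: it keeps the first timeseries_length_limit values of a long series, and it pads to the longest capped series in the batch. Ludwig's actual truncation side and padded width may differ.

```python
import numpy as np

def preprocess_row_major(rows, timeseries_length_limit=256, padding_value=0.0, padding="right"):
    """Turn space-separated strings into a fixed-width float32 matrix (illustrative sketch)."""
    # Space tokenizer: split each row on whitespace and parse every token as a float.
    series = [[float(tok) for tok in row.split()] for row in rows]
    # Cap each series at timeseries_length_limit values (assumption: keep the first ones).
    series = [s[:timeseries_length_limit] for s in series]
    # Assumption: pad to the longest capped series rather than always to the limit.
    width = max(len(s) for s in series)
    out = np.full((len(series), width), padding_value, dtype=np.float32)
    for i, s in enumerate(series):
        if padding == "right":
            out[i, :len(s)] = s
        else:  # "left": values are right-aligned and padding goes first
            out[i, width - len(s):] = s
    return out

print(preprocess_row_major(["1 2 3", "4 5"]))
```

The shorter row ends with 0.0 (the default padding_value) when padding is right, and starts with it when padding is left.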

Column-major preprocessing with window_size

When your dataset has one observation per row (column-major), set window_size to the number of past observations each input window should span:

input_features:
  - name: temperature
    type: timeseries
    preprocessing:
      window_size: 24   # use the last 24 observations as context
      padding_value: 0.0

Ludwig will slide a window of length window_size over the column and produce one row-major window per observation. Where fewer than window_size observations are available, at the beginning of the series, the window is padded with padding_value.
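The column-major to row-major conversion can be sketched in NumPy. column_to_windows is a hypothetical helper, not Ludwig's implementation. It follows the behavior described above: row t holds observations t - window_size + 1 through t, and the start of the series is front-padded.

```python
import numpy as np

def column_to_windows(values, window_size, padding_value=0.0):
    """Sketch: one lookback window per observation, front-padded for the earliest rows."""
    values = np.asarray(values, dtype=np.float32)
    # Prepend window_size - 1 padding values so the first rows still get a full window.
    pad = np.full(window_size - 1, padding_value, dtype=np.float32)
    padded = np.concatenate([pad, values])
    # Row t covers original observations t - window_size + 1 .. t (inclusive).
    return np.stack([padded[t:t + window_size] for t in range(len(values))])

print(column_to_windows([1, 2, 3, 4], window_size=3))
```

With window_size: 3, four observations become four windows. The first two windows are partly filled with padding_value.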

Encoders

Sequence Encoders

Time series encoders are the same as for Sequence Features, with one exception:

Time series features don't have an embedding layer at the beginning, so the b x s placeholders (where b is the batch size and s is the sequence length) are directly mapped to a b x s x 1 tensor and then passed to the different sequential encoders.

The encoder parameters specified at the feature level are:

  • tied (default null): name of another input feature to tie the weights of the encoder with. It needs to be the name of a feature of the same type and with the same encoder parameters.

Example timeseries input feature:

name: timeseries_column_name
type: timeseries
tied: null
encoder:
    type: parallel_cnn

PatchTST

PatchTST (type: patchtst) splits the time series into fixed-length patches (overlapping when patch_stride is smaller than patch_size), projects each patch to a learned embedding, and encodes the sequence of patch embeddings with a Transformer. Processing is channel-independent: each variate is encoded separately before being combined by the combiner.

Reference: Nie et al., "A Time Series is Worth 64 Words: Long-term Forecasting with Transformers", ICLR 2023.

encoder:
  type: patchtst
  patch_size: 16        # length of each patch in timesteps
  patch_stride: 8       # step between consecutive patches (overlap when < patch_size)
  d_model: 128          # patch embedding dimension
  num_heads: 8          # number of Transformer attention heads
  num_layers: 3         # number of Transformer encoder layers
  ffn_dim: 256          # feed-forward hidden dimension
  output_size: 256      # size of the final encoder output vector
  reduce_output: mean   # how to aggregate patch outputs: mean | last | first

Parameters:

| Parameter | Default | Description |
|---|---|---|
| patch_size | 16 | Length of each patch in timesteps |
| patch_stride | 8 | Step between consecutive patch start positions. Set equal to patch_size for non-overlapping patches |
| d_model | 128 | Dimension of the patch embedding and Transformer hidden states |
| num_heads | 8 | Number of multi-head attention heads |
| num_layers | 3 | Number of Transformer encoder layers |
| ffn_dim | 256 | Hidden dimension of the Transformer feed-forward sub-layer |
| output_size | 256 | Dimension of the final output vector |
| reduce_output | mean | Aggregation strategy across patch outputs: mean, last, or first |
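The patching step can be sketched with NumPy. This shows how patch_size and patch_stride determine the number of patches, N = floor((L - patch_size) / patch_stride) + 1 for a series of length L. The helper names are hypothetical. The sketch assumes no end padding: the PatchTST paper pads the end of the series before patching, and this page does not say whether Ludwig's encoder does the same. sliding_window_view requires the series to be at least patch_size long.

```python
import numpy as np

def make_patches(series, patch_size=16, patch_stride=8):
    """Split a 1-D series into patches of length patch_size (overlapping if stride < size)."""
    series = np.asarray(series, dtype=np.float32)
    # Every length-patch_size window, then keep every patch_stride-th start position.
    windows = np.lib.stride_tricks.sliding_window_view(series, patch_size)
    return windows[::patch_stride]

def num_patches(length, patch_size=16, patch_stride=8):
    # Closed form for the number of patches make_patches produces (no end padding).
    return (length - patch_size) // patch_stride + 1

patches = make_patches(np.arange(64))  # shape (7, 16) with the default settings
# Each patch would then be projected to d_model dimensions, e.g. patches @ W with W of shape (16, 128).
```

Overlapping patches (stride 8, size 16) roughly double the number of tokens the Transformer sees compared with non-overlapping ones, while still keeping the token count far below the raw series length.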

N-BEATS

N-BEATS (type: nbeats) is a pure MLP architecture with doubly residual stacking. Each block produces two outputs: a backcast, which is the block's reconstruction of its input and is subtracted from that input before it is passed to the next block, and a forecast contribution. The forecast contributions from all blocks are summed to produce the final output. The design requires no time-series-specific inductive biases and achieves strong results on univariate forecasting benchmarks.

Reference: Oreshkin et al., "N-BEATS: Neural basis expansion analysis for interpretable time series forecasting", ICLR 2020.

encoder:
  type: nbeats
  num_stacks: 2         # number of stacks (groups of blocks)
  num_blocks: 3         # number of blocks per stack
  num_layers: 4         # number of FC layers inside each block
  layer_size: 256       # hidden dimension of each FC layer
  output_size: 256      # size of the final encoder output vector

Parameters:

| Parameter | Default | Description |
|---|---|---|
| num_stacks | 2 | Number of stacks (each stack is a group of residual blocks) |
| num_blocks | 3 | Number of blocks per stack |
| num_layers | 4 | Number of fully-connected layers inside each block |
| layer_size | 256 | Hidden dimension of each fully-connected layer |
| output_size | 256 | Dimension of the final output vector |
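The doubly residual stacking can be sketched in NumPy with random, untrained weights. The sketch makes several simplifying assumptions: generic blocks only (no trend or seasonality bases), no bias terms, ReLU activations, and the summed forecast treated as the encoder output of size output_size. The function names are hypothetical, and Ludwig's encoder may differ in all of these details.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_block(input_size, forecast_size, layer_size=256, num_layers=4):
    """Random-weight block: an FC trunk plus linear backcast and forecast heads."""
    sizes = [input_size] + [layer_size] * num_layers
    trunk = [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
    w_back = rng.normal(scale=0.1, size=(layer_size, input_size))
    w_fore = rng.normal(scale=0.1, size=(layer_size, forecast_size))
    return trunk, w_back, w_fore

def run_block(block, x):
    trunk, w_back, w_fore = block
    h = x
    for w in trunk:
        h = np.maximum(h @ w, 0.0)  # fully-connected layer followed by ReLU
    return h @ w_back, h @ w_fore   # (backcast, forecast contribution)

def nbeats_forward(x, blocks, forecast_size):
    residual = x
    forecast = np.zeros(x.shape[:-1] + (forecast_size,))
    for block in blocks:
        backcast, contribution = run_block(block, residual)
        residual = residual - backcast       # remove the part of the input this block explained
        forecast = forecast + contribution   # accumulate this block's forecast contribution
    return forecast, residual

# Default-sized example: num_stacks=2 x num_blocks=3 blocks, lookback of 24 steps, batch of 8.
blocks = [make_block(24, 256) for _ in range(2 * 3)]
forecast, residual = nbeats_forward(rng.normal(size=(8, 24)), blocks, 256)
```

Each block therefore only has to model what earlier blocks left unexplained in residual, which is the core idea of the doubly residual design.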

Passthrough Encoder

Diagram: an input sequence (12, 7, 43, 65, 23, 4, 1) is cast to float32 and then passed to an aggregation (reduce) operation.

The passthrough encoder casts each input value to a float and adds a dimension to the input tensor, creating a b x s x 1 tensor, where b is the batch size and s is the length of the sequence. If reduce_output is set (for example to mean or sum), the tensor is reduced along the s dimension to obtain a single vector of size h for each element of the batch (h = 1 when the elements are scalars). With reduce_output: null, which is the default, the full b x s x h tensor is output. This is useful for timeseries features when you want to pass the raw window directly to a downstream combiner such as the sequence combiner.

encoder:
    type: passthrough
    encoding_size: null
    reduce_output: null
    skip: false
    adapter: null

Parameters:

  • encoding_size (default: null): The size of the encoding vector, or None if sequence elements are scalars.
  • reduce_output (default: null): How to reduce the output tensor along the s sequence length dimension if the rank of the tensor is greater than 2. Options: last, sum, mean, avg, max, concat, attention, attention_pooling, none, None, null.
  • skip (default: false):
  • adapter (default: null):
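The passthrough behavior amounts to a cast, a reshape, and an optional reduction, as in the NumPy sketch below. passthrough_encode is a hypothetical helper that covers only a few of the reduce_output options. Its "last" option simply takes the final position, whereas Ludwig may account for padding when selecting the last element.

```python
import numpy as np

def passthrough_encode(batch, reduce_output=None):
    """Sketch: cast to float32, add a trailing dimension, optionally reduce over s."""
    x = np.asarray(batch, dtype=np.float32)[..., np.newaxis]  # b x s -> b x s x 1
    if reduce_output is None:
        return x  # full b x s x 1 tensor, e.g. for a sequence combiner
    if reduce_output == "sum":
        return x.sum(axis=1)
    if reduce_output in ("mean", "avg"):
        return x.mean(axis=1)
    if reduce_output == "max":
        return x.max(axis=1)
    if reduce_output == "last":
        return x[:, -1, :]  # naive: ignores padding
    raise ValueError(f"reduce_output not covered by this sketch: {reduce_output}")

print(passthrough_encode([[12, 7, 43], [65, 23, 4]], reduce_output="sum"))
```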

Output Features

Ludwig supports timeseries as an output feature for forecasting tasks. The decoder projects the combined representation to a vector of length horizon, with one predicted value per future timestep. All steps are predicted simultaneously in a single forward pass (direct multi-step forecasting).

Preprocessing

missing_value_strategy: drop_row
tokenizer: space
timeseries_length_limit: 256
padding_value: 0.0
padding: right
fill_value: ''
computed_fill_value: ''
horizon: 0

Parameters:

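  • missing_value_strategy (default: drop_row): What strategy to follow when a row of data is missing. Options: fill_with_const, fill_with_mode, bfill, ffill, drop_row.
  • tokenizer (default: space): How the raw string in each row is split into individual values. With space, a row-major row is parsed as a space-separated sequence of floats.
  • timeseries_length_limit (default: 256): Maximum length of a timeseries window. Longer series are truncated to this length.
  • padding_value (default: 0.0): Float value used to pad series that are shorter than the padded length.
  • padding (default: right): Side on which padding is added. Options: left, right.
  • fill_value (default: ''): The value used to replace missing values when missing_value_strategy is fill_with_const.
  • computed_fill_value (default: ''): The fill value Ludwig computes internally to replace missing values when the missing value strategy derives one from the data (for example, fill_with_mode).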
  • horizon (default: 0): Optional forecasting horizon used to convert a column-major dataset (one observation per row) into a row-major dataset (each row has a timeseries window of observations). Starting from a given observation, a sliding window is taken going horizon rows forward in time, excluding the observation in the current row. If this value is left as 0, then it is assumed that the dataset has been provided in row-major format (i.e., it has already been preprocessed such that each row is a timeseries window).

Column-major preprocessing with horizon

When using column-major data, set horizon on the output feature to tell Ludwig how many steps ahead each training target spans:

output_features:
  - name: temperature
    type: timeseries
    preprocessing:
      horizon: 12   # predict the next 12 observations
    decoder:
      type: projector

The input feature must share the same column name (Ludwig uses it to align input windows with output targets):

input_features:
  - name: temperature
    type: timeseries
    preprocessing:
      window_size: 24
output_features:
  - name: temperature
    type: timeseries
    preprocessing:
      horizon: 12
    decoder:
      type: projector
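The alignment between input windows and output targets on the same column can be sketched as follows. make_training_pairs is a hypothetical helper. The input window for row t ends at observation t, and the target covers observations t + 1 through t + horizon, excluding the current row as the horizon description states. The sketch skips the final rows, which lack a full horizon of future values; this is an assumption, since this page does not say how Ludwig handles them.

```python
import numpy as np

def make_training_pairs(values, window_size, horizon, padding_value=0.0):
    """Sketch: pair each lookback window with the next `horizon` observations."""
    values = np.asarray(values, dtype=np.float32)
    pad = np.full(window_size - 1, padding_value, dtype=np.float32)
    padded = np.concatenate([pad, values])
    inputs, targets = [], []
    # Assumption: rows without horizon future observations are skipped.
    for t in range(len(values) - horizon):
        inputs.append(padded[t:t + window_size])       # observations t-window_size+1 .. t
        targets.append(values[t + 1:t + 1 + horizon])  # observations t+1 .. t+horizon
    return np.array(inputs), np.array(targets)

X, Y = make_training_pairs([1, 2, 3, 4, 5, 6], window_size=3, horizon=2)
```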

Decoders

Projector

The projector decoder is a fully-connected layer (or stack of FC layers) that maps the combiner output to a vector of size horizon. This is the recommended decoder for timeseries output.

output_features:
  - name: temperature
    type: timeseries
    decoder:
      type: projector
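Direct multi-step forecasting with a projector reduces to a single linear map from the combiner output to horizon values, as sketched below with random weights. The sketch assumes one linear layer with no activation. Ludwig's projector may stack several FC layers, and projector_decode is a hypothetical name.

```python
import numpy as np

def projector_decode(combined, weight, bias):
    """One linear layer: b x hidden -> b x horizon, all future steps in one pass."""
    return combined @ weight + bias

rng = np.random.default_rng(0)
batch, hidden, horizon = 4, 256, 12
combined = rng.normal(size=(batch, hidden))  # stand-in for the combiner output
weight = rng.normal(scale=0.01, size=(hidden, horizon))
bias = np.zeros(horizon)
forecast = projector_decode(combined, weight, bias)  # shape (4, 12): one value per future step
```

Because each output column has its own weights, every future step is predicted independently from the same representation. No prediction is fed back in, unlike the recursive strategy used by model.forecast().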

Metrics

Ludwig computes the following metrics for timeseries output features during training and evaluation.

Mean Absolute Scaled Error (MASE)

MASE normalizes the mean absolute error by the mean absolute error of the in-sample naive one-step forecast (i.e. the mean absolute difference between consecutive observations). Because the scale cancels out, MASE is comparable across datasets with different units or magnitudes. A MASE of 1.0 means the model performs identically to the naive baseline; values below 1.0 indicate the model outperforms it.

output_features:
  - name: temperature
    type: timeseries
    validation_metric: mean_absolute_scaled_error
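Written out, MASE = mean_t |y_t - ŷ_t| / ( (1 / (T - 1)) Σ_{t=2..T} |y_t - y_{t-1}| ). The NumPy sketch below computes the naive-baseline denominator from the target series itself. Whether Ludwig uses the targets or the input window for the denominator is not stated here, so treat that choice as an assumption. The metric is undefined for a constant series, where the denominator is zero.

```python
import numpy as np

def mean_absolute_scaled_error(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(y_true - y_pred))
    # In-sample naive one-step forecast: predict each value with the previous one.
    naive_mae = np.mean(np.abs(np.diff(y_true)))
    return mae / naive_mae

print(mean_absolute_scaled_error([1, 2, 3, 4], [1, 2, 3, 5]))
```

Multiplying both series by 10 leaves the result unchanged, which is the scale independence described above.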

Symmetric Mean Absolute Percentage Error (sMAPE)

sMAPE computes the percentage error symmetrically, using the average of the absolute predicted and actual values as the denominator. This is intended to reduce the asymmetry of standard MAPE (which penalizes over-forecasts more heavily than under-forecasts), and it is bounded between 0% and 200%.

output_features:
  - name: temperature
    type: timeseries
    validation_metric: symmetric_mean_absolute_percentage_error
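A NumPy sketch of sMAPE in percent, using the average of |y| and |ŷ| as the denominator as described above. The handling of terms where both values are zero (counted as zero error) is an assumption of this sketch, not documented Ludwig behavior.

```python
import numpy as np

def symmetric_mape(y_true, y_pred):
    """sMAPE in percent, bounded in [0, 200]."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    diff = np.abs(y_true - y_pred)
    # Assumption: a term where both values are zero contributes 0 error.
    ratio = np.divide(diff, denom, out=np.zeros_like(diff), where=denom != 0)
    return 100.0 * np.mean(ratio)

print(symmetric_mape([100], [110]))  # 10 / 105, about 9.52%
```

The 200% upper bound is reached when exactly one of the two values is zero.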

Available metrics summary

| Metric key | Description |
|---|---|
| mean_squared_error | Mean squared error |
| mean_absolute_error | Mean absolute error |
| mean_absolute_scaled_error | Scale-free MAE normalized by the naive baseline |
| symmetric_mean_absolute_percentage_error | Symmetric percentage error (0% to 200%) |
| r2 | Coefficient of determination |

Loss

The default loss for timeseries output features is Huber loss. It is less sensitive to outliers than MSE because it grows quadratically for small errors but only linearly for large ones. You can override it:

output_features:
  - name: temperature
    type: timeseries
    loss:
      type: mean_squared_error
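The difference between Huber loss and MSE on outliers can be seen in a short NumPy sketch. The threshold delta=1.0 is an assumption, since this page does not state Ludwig's default.

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    err = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    quadratic = 0.5 * err ** 2             # used for small errors (|e| <= delta)
    linear = delta * (err - 0.5 * delta)   # used for large errors: grows linearly
    return np.mean(np.where(err <= delta, quadratic, linear))

print(huber_loss([0.0], [10.0]), np.mean((0.0 - 10.0) ** 2))  # 9.5 vs 100.0
```

An error of 10 costs 9.5 under Huber loss but 100 under MSE, so a few extreme points have far less influence on the gradients.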

Forecasting with model.forecast()

After training a model with timeseries input and output features, use model.forecast() to generate multi-step predictions from a seed dataset:

import pandas as pd
from ludwig.api import LudwigModel

model = LudwigModel.load("results/experiment_run/model")

# Seed data — must contain enough rows to fill the input window_size.
# Only the last window_size rows are used as context.
seed_df = pd.read_csv("recent_observations.csv")

# Predict 48 steps ahead, iteratively sliding the window.
forecast_df = model.forecast(seed_df, horizon=48)
print(forecast_df)
# Returns a DataFrame with one column per timeseries output feature,
# and one row per forecasted timestep.

model.forecast() uses an efficient incremental strategy: it preprocesses the initial window once, then slides each new prediction into the window in O(1) per step rather than re-running full preprocessing.
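The sliding-window strategy can be illustrated with a fixed-length deque, sketched below. This is not Ludwig's implementation. iterative_forecast and predict_next are hypothetical names, with predict_next standing in for a trained model that returns one next value per call. Updating the window is O(1) per step. Copying the window into a list for each model call costs O(window_size) in this sketch.

```python
from collections import deque

def iterative_forecast(seed, window_size, horizon, predict_next):
    """Recursive multi-step forecast with a fixed-length sliding window."""
    # Only the last window_size seed values are used; deque(maxlen) evicts the oldest in O(1).
    window = deque(seed[-window_size:], maxlen=window_size)
    forecasts = []
    for _ in range(horizon):
        next_value = predict_next(list(window))  # one-step prediction from the current window
        forecasts.append(next_value)
        window.append(next_value)  # slide: newest prediction in, oldest value out
    return forecasts

print(iterative_forecast([1, 2, 3], window_size=2, horizon=2, predict_next=sum))
```

With a window of 2, the first prediction is 2 + 3 = 5, and the second uses the window [3, 5] to give 8. Each prediction becomes context for the next one.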

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| dataset | DataFrame / path | required | Seed data containing at least window_size rows |
| horizon | int | 1 | Number of timesteps to forecast ahead |
| data_format | str | "auto" | Dataset format (csv, parquet, etc.) |
| output_directory | str | None | If set, saves forecast results here |
| output_format | str | "parquet" | Format for saved results |

Note

model.forecast() requires the model to have at least one timeseries input feature and at least one timeseries output feature. If the output feature column name matches the input feature column name, Ludwig automatically feeds each predicted value back as input for the next step.