Solution Architecture: MDL Structural Break Detector
1. System Purpose & Vision
The MDL Structural Break Detector is a software system built for the ADIA Lab Structural Break Challenge. It rests on a single hypothesis: a structural break in a time series can be detected as a fundamental change in the series’ underlying causal dynamics.
The system’s core strategy is to:
- Pre-train a Transformer-based autoencoder on a curated dataset of rule-based systems (Elementary Cellular Automata, ECA) so that it learns to recognize and compress abstract dynamics.
- Formalize this training through the Minimum Description Length (MDL) principle: the model learns to find the most compact “dynamical fingerprint” for any given sequence.
- Deploy the trained model at inference time: convert segments of a real-world time series into a comparable symbolic format, encode each segment, and calculate a “break score” from the distance between their respective dynamical fingerprints.
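To make the first step concrete, the following is a minimal, dependency-free sketch of how one Elementary Cellular Automaton run could be generated. The function names (`eca_step`, `generate_eca`), the wrap-around boundary, and the random initial row are assumptions for illustration only; the actual configuration of ECADataGenerator is not specified in this document.

```python
import random


def eca_step(cells: list[int], rule: int) -> list[int]:
    """Apply one ECA update using Wolfram rule numbering.

    Assumption: periodic (wrap-around) boundary conditions.
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        left, center, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        # Encode the 3-cell neighborhood as a number in 0..7.
        neighborhood = (left << 2) | (center << 1) | right
        # Bit `neighborhood` of the rule number is the cell's next state.
        nxt.append((rule >> neighborhood) & 1)
    return nxt


def generate_eca(rule: int, width: int, steps: int, seed: int = 0) -> list[list[int]]:
    """Return `steps` rows of an ECA run, starting from a random row."""
    rng = random.Random(seed)
    row = [rng.randint(0, 1) for _ in range(width)]
    history = [row]
    for _ in range(steps - 1):
        row = eca_step(row, rule)
        history.append(row)
    return history


if __name__ == "__main__":
    for row in generate_eca(rule=110, width=32, steps=8, seed=1):
        print("".join("#" if c else "." for c in row))
```

In a setup like this, the rule number could serve as the class label for each generated run, though that labeling scheme is also an assumption.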
2. C4 View: System Context (Level 1)
Shows how our system fits into its environment.
Diagram:
               +-----------------+
               |  Quantitative   |
               |   Researcher    |
               |    (Person)     |
               +-----------------+
                        |
                        | Initiates Training & Inference
                        v
+------------------------------------------------+
|                                                |
|         MDL Structural Break Detector          |
|             (Our Software System)              |
|                                                |
|    Analyzes time series using a pre-trained    |
|    dynamical model to predict the probability  |
|    of a structural break.                      |
|                                                |
+------------------------------------------------+
              ^                   |
              |                   |
   Provides   |                   | Submits
   Train/Test |                   | Predictions
   Data       |                   |
              |                   v
         +-----------------------------+
         |                             |
         |   ADIA Challenge Platform   |
         |      (External System)      |
         |                             |
         +-----------------------------+
3. C4 View: Container Diagram (Level 2)
Shows the major, high-level, independently runnable parts of our system.
Diagram:
+-----------------------------------------------------------------------------------+
|                                                                                   |
|  System Boundary: MDL Structural Break Detector                                   |
|                                                                                   |
|    +-----------------------------+       +------------------------------------+   |
|    |    Pre-Training Pipeline    |------>|            Model Store             |   |
|    |       (Python Script)       |       | (File System: .pth, .joblib files) |<--+
|    |    (Implements train())     |       +------------------------------------+   |
|    +-------------^---------------+             ^                                  |
|                  |                             | Reads Trained Encoder            |
|           Writes |                             |                                  |
|           /Reads v                             |                                  |
|    +-----------------------------+       +-----+-----------------------+          |
|    |    Synthetic ECA Dataset    |       |     Inference Pipeline      |--------->|
|    |   (File System: .pt file)   |       |    (Implements infer())     |          |
|    +-----------------------------+       +-------------^---------------+          |
|                                                        | Reads Test Data          |
|                                                        |                          |
+--------------------------------------------------------+--------------------------+
                                                         |
                                                         v
                                               +-------------------+
                                               |  ADIA Data Store  |
                                               | (data/*.parquet)  |
                                               +-------------------+
4. C4 View: Component Diagram (Level 3)
Zooms into each container to show its logical modules and their responsibilities.
4.1. Pre-Training Pipeline Components
Diagram:
+---------------------------------------------------------------------------+
|                                                                           |
|  Container: Pre-Training Pipeline                                         |
|                                                                           |
|     +---------------------+    Requests Data    +----------------------+  |
|     |                     |-------------------->|                      |  |
|     |     MDLTrainer      |<--------------------|   ECADataGenerator   |  |
|     | (Manages training   |   Provides Batch    | (Creates synthetic   |  |
|     |  loop, dual loss)   |                     |  ECA data)           |  |
|     +----------+----------+                     +----------------------+  |
|                |                                                          |
|  Passes Data,  | Computes Loss                                            |
|  Returns       |                                                          |
|  Logits/Recons v                                                          |
|    +--------------------------+  Provides   +--------------------------+  |
|    |   DynamicalAutoencoder   |---Trained-->|       EncoderSaver       |  |
|    |   (Encoder, Decoder,     |    Model    | (Extracts and saves just |  |
|    |    Classification Head)  |             |  the encoder state)      |  |
|    +--------------------------+             +-------------+------------+  |
|                                                           |               |
|                                                           | Writes .pth   |
|                                                           v               |
+---------------------------------------------------------------------------+
                                                            |
                                               To: Model Store (External)
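The “dual loss” managed by MDLTrainer is not spelled out in this document. One plausible form, given that the autoencoder returns both reconstructions and classification logits, is a weighted sum of a reconstruction term and a classification term. The sketch below assumes that the decoder outputs a per-step distribution over symbols, that the classification head predicts the label of the generating ECA rule, and that a weight \lambda balances the two terms. The symbols z, p_\theta, q_\theta, and \lambda are introduced here for illustration only.

```latex
\begin{aligned}
z &= \mathrm{Encoder}_\theta(x_{1:T}) && \text{(the dynamical fingerprint)} \\
\mathcal{L}_{\text{recon}} &= -\frac{1}{T}\sum_{t=1}^{T} \log p_\theta(x_t \mid z) \\
\mathcal{L}_{\text{cls}} &= -\log q_\theta(y \mid z) \\
\mathcal{L}_{\text{total}} &= \mathcal{L}_{\text{recon}} + \lambda\,\mathcal{L}_{\text{cls}}
\end{aligned}
```

Under this reading, the reconstruction term is the average code length (in nats) needed to describe the sequence given its fingerprint, which is how the MDL view would connect to training: a fixed-size fingerprint that keeps this code length low is a compact description of the sequence's dynamics.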
4.2. Inference Pipeline Components
Diagram:
+---------------------------------------------------------------------------------+
|                                                                                 |
|  Container: Inference Pipeline                                                  |
|                                                                                 |
|  +--------------------+   Loads State    +--------------------------------+     |
|  |   EncoderLoader    |<-----------------|     Model Store (External)     |     |
|  | (Loads .pth file   |                  +--------------------------------+     |
|  |  into nn.Module)   |                                   |                     |
|  +---------+----------+                                   |                     |
|            |                                              |                     |
|  Provides  | Trained Encoder                              |                     |
|  Loaded    v                                              |                     |
|  Model  +----------------------+  Feeds Symbol Seq  +----------------------+    |
|         |                      |------------------->|                      |    |
|         |    Fingerprinter     |                    |   SeriesProcessor    |    |
|         | (Generates stable    |                    | (Applies full data   |    |
|         |  fingerprint for a   |<-------------------| transformation pipe) |    |
|         |  time series)        | Returns Symbol Seq +---------^------------+    |
|         +-----------+----------+          |                   |                 |
|                     |                     |         Reads Raw | Series Data     |
|  Returns            | Fingerprint         |                   v                 |
|  Before/After       v                     |         +-------------------+       |
|         +----------------------+          |         |  ADIA Data Store  |       |
|         | BreakScoreCalculator |                    | (data/*.parquet)  |       |
|         | (Computes distance   |                    +-------------------+       |
|         | between fingerprints)|                                                |
|         +-----------+----------+                                                |
|                     |                                                           |
|                     | Final [0,1] Score                                         |
|                     v                                                           |
+---------------------------------------------------------------------------------+
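The distance used by BreakScoreCalculator is left open above. As a minimal sketch, assuming fingerprints are plain numeric vectors, one option is cosine distance rescaled so the result lands in [0, 1]. Both the choice of metric and the rescaling are assumptions for illustration, and `break_score` is a hypothetical helper rather than part of the interfaces defined in this document.

```python
import math


def break_score(fp_before: list[float], fp_after: list[float]) -> float:
    """Map the cosine distance between two fingerprints to a score in [0, 1].

    0.0 means the fingerprints point in the same direction,
    1.0 means they point in opposite directions.
    """
    if len(fp_before) != len(fp_after):
        raise ValueError("fingerprints must have the same length")
    dot = sum(a * b for a, b in zip(fp_before, fp_after))
    norm_before = math.sqrt(sum(a * a for a in fp_before))
    norm_after = math.sqrt(sum(b * b for b in fp_after))
    if norm_before == 0.0 or norm_after == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    # Clamp to guard against floating-point drift outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_before * norm_after)))
    # Cosine distance (1 - cosine) lies in [0, 2]; halve it to reach [0, 1].
    return (1.0 - cosine) / 2.0


if __name__ == "__main__":
    print(break_score([0.9, 0.1, 0.0], [0.1, 0.9, 0.0]))
```

A bounded metric like this keeps the output directly usable as the final [0,1] score. An unbounded distance such as Euclidean would need an extra squashing step, which is a design choice this document does not settle.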
5. C4 View: Code (Level 4)
Defines the key classes and their public interfaces (the “contracts”) for implementation and testing.
File: model_architecture.py
import torch
import torch.nn as nn

class TransformerEncoder(nn.Module):
    # Maps a batch of sequences to latent fingerprints.
    def __init__(self, input_dim: int, model_dim: int, num_heads: int, num_layers: int, latent_dim: int): ...
    def forward(self, sequence_batch: torch.Tensor) -> torch.Tensor: ...

class TransformerDecoder(nn.Module):
    # Reconstructs a sequence of length seq_len from each fingerprint.
    def __init__(self, latent_dim: int, model_dim: int, num_heads: int, num_layers: int, output_dim: int): ...
    def forward(self, fingerprint_batch: torch.Tensor, seq_len: int) -> torch.Tensor: ...

class DynamicalAutoencoder(nn.Module):
    # Combines the encoder, the decoder, and the classification head.
    def __init__(self, encoder: TransformerEncoder, decoder: TransformerDecoder, num_classes: int, latent_dim: int): ...
    def forward(self, sequence_batch: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]: ...
    def encode(self, sequence_batch: torch.Tensor) -> torch.Tensor: ...
File: data_processing.py
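import numpy as np
import pandas as pd
import torch  # required by the torch.Tensor annotation in SeriesProcessor.process

class ECADataGenerator:
    # Creates the synthetic ECA training data.
    def __init__(self, config: dict): ...
    def generate_training_data(self) -> tuple[np.ndarray, np.ndarray]: ...

class PermutationSymbolizer:
    # Converts a numeric series into a sequence of symbols.
    def __init__(self, embedding_dim: int): ...
    def transform_series(self, series: pd.Series) -> np.ndarray: ...

class SeriesProcessor:
    # Applies the full data transformation pipeline to a raw series.
    def __init__(self, symbolizer: PermutationSymbolizer, sequence_length: int): ...
    def process(self, series: pd.Series) -> torch.Tensor | None: ...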
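The name PermutationSymbolizer and its `embedding_dim` parameter suggest ordinal-pattern (permutation) symbolization, although this document does not confirm the exact scheme. The sketch below shows one common convention: each sliding window of `embedding_dim` values is replaced by the index of the permutation that sorts it. The function name `ordinal_symbols`, the unit time delay, and the tie-breaking rule are assumptions for illustration.

```python
from itertools import permutations


def ordinal_symbols(values: list[float], embedding_dim: int = 3) -> list[int]:
    """Replace each sliding window with the index of its ordinal pattern.

    Returns an empty list when the series is shorter than `embedding_dim`.
    """
    if embedding_dim < 2:
        raise ValueError("embedding_dim must be at least 2")
    # Enumerate all embedding_dim! orderings and give each a symbol id.
    pattern_to_symbol = {
        pattern: symbol
        for symbol, pattern in enumerate(permutations(range(embedding_dim)))
    }
    symbols = []
    for start in range(len(values) - embedding_dim + 1):
        window = values[start:start + embedding_dim]
        # Indices that sort the window; sorted() is stable, so ties are
        # broken by position (an assumption, other conventions exist).
        pattern = tuple(sorted(range(embedding_dim), key=lambda k: window[k]))
        symbols.append(pattern_to_symbol[pattern])
    return symbols


if __name__ == "__main__":
    print(ordinal_symbols([1.0, 2.0, 3.0, 2.0, 1.0], embedding_dim=3))
```

Because only the relative order inside each window matters, the symbols are unchanged by any strictly increasing rescaling of the series. The alphabet size is the factorial of `embedding_dim`, so small values keep the symbol set manageable.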
File: pipelines.py
import torch
import torch.nn as nn  # required by the nn.Module annotations in MDLTrainer
import pandas as pd
from model_architecture import DynamicalAutoencoder, TransformerEncoder
from data_processing import ECADataGenerator, SeriesProcessor

class MDLTrainer:
    # Manages the training loop and the dual loss.
    def __init__(self, model: DynamicalAutoencoder, data_generator: ECADataGenerator, config: dict): ...
    def train(self) -> nn.Module: ...
    def save_encoder(self, encoder: nn.Module, path: str): ...

class InferencePipeline:
    # Scores a candidate break from the series segments before and after it.
    def __init__(self, encoder: TransformerEncoder, series_processor: SeriesProcessor): ...
    def calculate_break_score(self, series_before: pd.Series, series_after: pd.Series) -> float: ...
Together, these views give a top-to-bottom picture of the system: the overall plan, the responsibilities of each component, and the explicit contracts between components that implementation and testing can build on.