Cybernetic Intelligence

C4 Architecture: Level 1 - System Context (Revised)

Purpose: This revised diagram accurately shows that our system is a self-contained notebook which is executed by the ADIA platform. The platform is not just a data source, but the host execution environment for our code.

Scope: The system is the notebook itself. Its “user” is the ADIA platform’s execution engine, which provides the necessary inputs and consumes the outputs directly.

Diagram:

C4Context
    title Revised System Context Diagram for MDL Structural Break Detector

    Person(researcher, "Quantitative Researcher", "Develops and tests the system locally.")
    System_Ext(adia_platform, "ADIA Challenge Platform", "The cloud environment that executes our notebook, provides data, and evaluates the results.")

    System(mdl_detector, "MDL Structural Break Detector Notebook", "A self-contained Python notebook that implements a transfer-learning strategy to detect structural breaks.")

    Rel(researcher, mdl_detector, "Develops & Submits")

    Rel_U(adia_platform, mdl_detector, "Executes `train()` and `infer()` functions")
    Rel_D(mdl_detector, adia_platform, "Yields predictions from `infer()`")

Revised Elements and Relationships:

MDL Structural Break Detector Notebook (Our System):
- Description: A self-contained Python notebook that implements a transfer-learning strategy to detect structural breaks.
- Responsibilities: Encapsulates all logic for data processing, model training (pre-training and fine-tuning), and inference within the train() and infer() functions as required.
Quantitative Researcher (Person):
- Description: The developer and operator of the system.
- Interaction: The researcher’s primary role is to Develops & Submits the notebook file itself. They run the system locally for testing but do not run it on the platform directly.
ADIA Challenge Platform (External System):
- Description: The cloud environment that executes our notebook, provides data, and evaluates the results.
- Revised Interactions:
  - Executes train() and infer() functions: The platform is the active agent. It imports our notebook as a module and calls the required functions, providing the data (X_train, y_train, X_test) as arguments in memory.
  - Receives Predictions: Our system doesn’t write a file for submission. Instead, the infer() function Yields predictions one by one directly back to the platform’s runner, which collects them for scoring.

C4 Architecture: Level 2 - Container Diagram

Purpose: The Container Diagram breaks down the overall MDL Structural Break Detector Notebook into its major, high-level, logical blocks. These are not necessarily separate files but represent distinct, cohesive areas of responsibility within our single notebook. It shows the primary “moving parts” of our system and how they interact.

Scope: This view focuses on the logical separation of concerns inside our notebook. It distinguishes between the training logic, the inference logic, and the persistent model artifact that connects them.

Diagram:

C4Container
    title Container Diagram for MDL Structural Break Detector Notebook

    System_Ext(adia_platform, "ADIA Challenge Platform", "Host execution environment")

    Container_Boundary(notebook_boundary, "MDL Structural Break Detector Notebook") {
        
        Container(training_pipeline, "Training Pipeline Logic", "Python Code", "Orchestrates the two-stage model training process defined in the train() function.")
        
        Container(inference_pipeline, "Inference Pipeline Logic", "Python Code", "Orchestrates model loading and prediction generation as defined in the infer() function.")
        
        Container(core_library, "Core Services Library", "Python Code", "A library of reusable, foundational components like data processors and model architectures.")
        
        ContainerDb(model_store, "Model Store", "File System Directory", "Persists the trained model artifact between the train and infer steps.")
    }

    Rel(adia_platform, training_pipeline, "Calls train()", "Provides X_train, y_train")
    Rel(adia_platform, inference_pipeline, "Calls infer()", "Provides X_test")
    
    Rel(training_pipeline, model_store, "Writes", "Saves the final trained artifact")
    Rel(inference_pipeline, model_store, "Reads", "Loads the final trained artifact")
    
    Rel(training_pipeline, core_library, "Uses", "Imports and uses components")
    Rel(inference_pipeline, core_library, "Uses", "Imports and uses components")

Container Descriptions and Collaborations

Container Name	Type	Responsibilities	Key Collaborators
Training Pipeline Logic	Code Container	- Orchestrate the `train()` function: Manage the entire two-stage training flow. - Execute Pre-training: Control the generation of synthetic data and the initial training of the `DynamicalAutoencoder`. - Execute Fine-tuning: Control the processing of real training data and the fine-tuning of the `StructuralBreakClassifier`. - Manage Model Persistence: Ensure the final, trained encoder and its configuration are saved correctly.	- `Core Services Library`: Imports and uses model and data processing components. - `Model Store`: Writes the final trained artifact to it. - `ADIA Challenge Platform`: Is called by the platform and receives `X_train`/`y_train` data from it.
Inference Pipeline Logic	Code Container	- Orchestrate the `infer()` function: Manage the entire prediction generation flow. - Load the Model: Control the loading of the trained encoder from the `Model Store`. - Process Test Data: Manage the transformation of the incoming `X_test` data stream. - Generate Predictions: Use the loaded model to calculate and `yield` the final structural break scores.	- `Core Services Library`: Imports and uses model and data processing components. - `Model Store`: Reads the final trained artifact from it. - `ADIA Challenge Platform`: Is called by the platform, receives `X_test` data from it, and yields predictions back to it.
Core Services Library	Code Container	- Provide Foundational Building Blocks: Define all reusable, core components of the system. - Define Data Processors: Contain the classes for synthetic data generation (`ECADataGenerator`) and real data transformation (`SeriesProcessor`, `PermutationSymbolizer`). - Define Model Architectures: Contain the `torch.nn.Module` classes for all neural network models (`TransformerEncoder`, `DynamicalAutoencoder`, etc.).	- `Training Pipeline Logic`: Is used by this container. - `Inference Pipeline Logic`: Is used by this container.
Model Store	Database Container	- Persist the Trained Artifact: Act as the durable storage layer connecting the training and inference phases. - Store Model Weights: Reliably hold the `.pth` file containing the final encoder’s parameters. - Store Model Configuration: Reliably hold the `.joblib` file containing the hyperparameters needed to reconstruct the model.	- `Training Pipeline Logic`: Is written to by this container. - `Inference Pipeline Logic`: Is read from by this container.

This provides a complete and accurate Level 2 view of our system, detailing the responsibilities of each major logical block and how they collaborate. We are now ready to zoom into each of these containers.

C4 Architecture: Level 3 - Component Diagrams (Cleaned, Corrected, and Comment-Free)

Here are the universally compatible Mermaid diagrams.

Container 1: Training Pipeline Logic

C4Component
    title Component Diagram for Training Pipeline

    Container_Boundary(training_container, "Training Pipeline Logic") {
        Component(data_gen, "ECADataGenerator", "Creates synthetic ECA data for pre-training.")
        Component(pre_trainer, "MDLPreTrainer", "Manages the pre-training loop.")
        Component(autoencoder, "DynamicalAutoencoder", "The full model for pre-training.")
        
        Component(series_proc, "SeriesProcessor", "Processes real ADIA data.")
        Component(fine_tuner, "BreakClassifierFinetuner", "Manages the fine-tuning loop.")
        Component(break_classifier, "StructuralBreakClassifier", "The model for fine-tuning.")
        
        Component(saver, "EncoderSaver", "Saves the final model artifact.")
    }
    
    System_Ext(model_store, "Model Store")

    Rel_R(data_gen, pre_trainer, "Provides ECA data to")
    Rel_R(pre_trainer, autoencoder, "Trains")
    
    Rel_D(autoencoder, break_classifier, "Provides Pre-trained Encoder")

    Rel_R(series_proc, fine_tuner, "Provides processed real data to")
    Rel_R(fine_tuner, break_classifier, "Fine-tunes")
    
    Rel_D(break_classifier, saver, "Provides Final Encoder")
    Rel_R(saver, model_store, "Writes artifact to")

Container 2: Inference Pipeline Logic

C4Component
    title Component Diagram for Inference Pipeline

    System_Ext(model_store, "Model Store")
    System_Ext(adia_platform, "ADIA Platform Runner")

    Container_Boundary(inference_container, "Inference Pipeline Logic") {
        Component(loader, "EncoderLoader", "Loads model from the Model Store.")
        Component(encoder, "TransformerEncoder", "The loaded, fine-tuned model artifact.")
        Component(series_proc, "SeriesProcessor", "Transforms raw test data into symbolic sequences.")
        Component(fingerprinter, "Fingerprinter", "Generates a stable fingerprint for a data segment.")
        Component(scorer, "BreakScoreCalculator", "Computes the final distance score.")
    }

    Rel_R(loader, model_store, "Reads artifact from")
    Rel_D(loader, encoder, "Instantiates")
    
    Rel_D(fingerprinter, series_proc, "Uses")
    Rel_D(fingerprinter, encoder, "Uses")
    
    Rel_D(scorer, fingerprinter, "Gets 'before' and 'after' fingerprints from")
    
    Rel_R(scorer, adia_platform, "Yields Prediction to")

Container 3: Model Store

C4Component
    title Component Diagram for Model Store

    Container_Boundary(model_store_container, "Model Store (File System Directory)") {
        Component(model_weights, "final_encoder.pth", "PyTorch State Dictionary", "The learned numerical weights of the final encoder.")
        Component(model_config, "model_config.joblib", "Configuration File", "Hyperparameters needed to build the model architecture before loading weights.")
    }

Complete Component Registry

Component Name	Container	Role	Responsibilities	Key Collaborators
ECADataGenerator	Training Pipeline	Synthetic Data Factory	- Generate Elementary Cellular Automata simulations for a curated set of complex rules. - Slice simulations into `(sequence, rule_label)` pairs for pre-training.	`MDLPreTrainer`
MDLPreTrainer	Training Pipeline	Pre-training Orchestrator	- Manage the training loop for the autoencoder. - Calculate and apply the dual MDL-inspired loss (reconstruction + classification). - Perform backpropagation to learn general dynamics.	`ECADataGenerator`, `DynamicalAutoencoder`
DynamicalAutoencoder	Training Pipeline	General Dynamics Model	- Encapsulate the `TransformerEncoder`, `TransformerDecoder`, and a classification head. - Learn to compress and reconstruct ECA sequences during pre-training.	`MDLPreTrainer`
SeriesProcessor	Training / Inference	Real Data Transformer	- Transform raw ADIA time series into symbolic vector sequences using a `PermutationSymbolizer`. - Handle all windowing, embedding, and batching logic.	`BreakClassifierFinetuner` (in train), `Fingerprinter` (in infer)
BreakClassifierFinetuner	Training Pipeline	Fine-tuning Orchestrator	- Manage the fine-tuning loop using real `X_train`/`y_train` data. - Calculate and apply the Binary Cross-Entropy loss. - Update weights of the classification head and unfrozen encoder layers.	`SeriesProcessor`, `StructuralBreakClassifier`
StructuralBreakClassifier	Training Pipeline	Specialized Break Model	- Encapsulate the pre-trained, partially-frozen `TransformerEncoder` and a new classification head designed to predict a break from two fingerprints.	`BreakClassifierFinetuner`
EncoderSaver	Training Pipeline	Artifact Persistence	- Receive the final, fine-tuned `TransformerEncoder` module and its configuration. - Save the `state_dict` and config to the Model Store.	`StructuralBreakClassifier`, `Model Store`
EncoderLoader	Inference Pipeline	Artifact Hydration	- Load the encoder `state_dict` and configuration from the Model Store. - Instantiate a `TransformerEncoder` object that is ready for inference.	`Model Store`, `TransformerEncoder`
TransformerEncoder	Inference Pipeline	The Core Model Artifact	- A `torch.nn.Module` instantiated by the `EncoderLoader`. - Its sole responsibility is to take a batch of symbolic sequences and produce a batch of fingerprint vectors.	`Fingerprinter`
Fingerprinter	Inference Pipeline	Vector Generator	- Orchestrate the use of the `Encoder` and `SeriesProcessor` to generate a single, stable fingerprint vector for a given time series segment (`before` or `after`).	`TransformerEncoder`, `SeriesProcessor`
BreakScoreCalculator	Inference Pipeline	Prediction Calculator	- Compute the final `[0,1]` score, typically by calculating the cosine distance between the `before` and `after` fingerprint vectors provided by the `Fingerprinter`.	`Fingerprinter`
final_encoder.pth	Model Store	Trained Model Weights	- A binary file containing the learned numerical weights and biases of the final `TransformerEncoder`.	`EncoderSaver`, `EncoderLoader`
model_config.joblib	Model Store	Model Architecture Config	- A file storing essential hyperparameters (e.g., `latent_dim`, `model_dim`, `num_layers`) needed to correctly instantiate the `TransformerEncoder` class.	`EncoderSaver`, `EncoderLoader`

Of course. This is the final and most detailed level of the C4 model, where we define the “code contracts” for our components. Each class will be presented with its purpose, key responsibilities, and collaborators, providing a clear blueprint for implementation.

C4 Architecture: Level 4 - Code View

This level details the primary classes within each of our logical modules.

Module 1: `core_library/data_processing.py`

Module Purpose: Contains all classes related to data creation and transformation. These are the foundational building blocks for handling both synthetic and real data.

Class Name	Role & Responsibilities	Key Collaborators
`PermutationSymbolizer`	Symbolic Converter. - Is initialized with an `embedding_dim`. - Its core responsibility is to take a single numeric vector and convert it into a discrete ordinal pattern (permutation symbol), correctly handling ties.	(None - Foundational)
`SeriesProcessor`	Real Data Transformer. - Is initialized with a `PermutationSymbolizer` and a `sequence_length`. - Manages the full pipeline for real data: time-delay embedding, symbolization via its collaborator, and windowing into sequences for the model.	`PermutationSymbolizer`
`ECADataGenerator`	Synthetic Data Factory. - Is initialized with a configuration dictionary (`rules_to_use`, `width`, `steps`, etc.). - Responsible for simulating Elementary Cellular Automata and packaging the output into `(sequence, label)` pairs for pre-training.	(None - Foundational)

Code Skeleton:

# In core_library/data_processing.py

import numpy as np
import pandas as pd
import torch

class PermutationSymbolizer:
    def __init__(self, embedding_dim: int): ...
    def symbolize_vector(self, vector: np.ndarray) -> np.ndarray: ...

class SeriesProcessor:
    def __init__(self, symbolizer: PermutationSymbolizer, sequence_length: int): ...
    def process(self, series: pd.Series) -> torch.Tensor | None: ...

class ECADataGenerator:
    def __init__(self, config: dict): ...
    def generate_training_data(self) -> tuple[np.ndarray, np.ndarray]: ...

Module 2: `core_library/model_architecture.py`

Module Purpose: Contains all torch.nn.Module definitions. These are the structural blueprints for our neural network models.

Class Name	Role & Responsibilities	Key Collaborators
`TransformerEncoder`	Sequence Encoder. - A `nn.Module` primitive. - Takes a sequence of vectors and compresses it into a single “fingerprint” vector.	(None - Primitive)
`TransformerDecoder`	Sequence Decoder. - A `nn.Module` primitive. - Takes a fingerprint vector and attempts to reconstruct the original sequence.	(None - Primitive)
`DynamicalAutoencoder`	Pre-training Model. - A composite `nn.Module`. - Combines an encoder, a decoder, and a classification head for the dual-loss pre-training stage.	`TransformerEncoder`, `TransformerDecoder`
`StructuralBreakClassifier`	Fine-tuning Model. - A composite `nn.Module`. - Combines a (partially frozen) pre-trained encoder with a new classification head to predict a break from two fingerprints.	`TransformerEncoder`

Code Skeleton:

# In core_library/model_architecture.py

import torch
import torch.nn as nn

class TransformerEncoder(nn.Module):
    def __init__(self, input_dim: int, model_dim: int, num_heads: int, num_layers: int, latent_dim: int): ...
    def forward(self, sequence_batch: torch.Tensor) -> torch.Tensor: ...

class TransformerDecoder(nn.Module):
    def __init__(self, latent_dim: int, model_dim: int, num_heads: int, num_layers: int, output_dim: int): ...
    def forward(self, fingerprint_batch: torch.Tensor, seq_len: int) -> torch.Tensor: ...

class DynamicalAutoencoder(nn.Module):
    def __init__(self, encoder: TransformerEncoder, decoder: TransformerDecoder, num_classes: int, latent_dim: int): ...
    def forward(self, sequence_batch: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]: ...

class StructuralBreakClassifier(nn.Module):
    def __init__(self, encoder: TransformerEncoder, latent_dim: int, freeze_encoder_body: bool = True): ...
    def forward(self, processed_seq_before: torch.Tensor, processed_seq_after: torch.Tensor) -> torch.Tensor: ...

Module 3: `training_pipeline.py`

Module Purpose: Contains the high-level orchestration logic for the train() function.

Class Name	Role & Responsibilities	Key Collaborators
`MDLPreTrainer`	Pre-training Orchestrator. - Manages the training loop for the `DynamicalAutoencoder`. - Uses `ECADataGenerator` to get data. - Calculates and backpropagates the dual MDL loss.	`ECADataGenerator`, `DynamicalAutoencoder`
`BreakClassifierFinetuner`	Fine-tuning Orchestrator. - Manages the training loop for the `StructuralBreakClassifier`. - Uses `SeriesProcessor` to handle real `X_train` data. - Calculates and backpropagates the BCE loss against `y_train`.	`SeriesProcessor`, `StructuralBreakClassifier`
`EncoderSaver`	Artifact Manager. - A simple helper class or function. - Takes the final fine-tuned encoder and its config, and saves them to the `Model Store` directory.	`StructuralBreakClassifier`, `Model Store` (filesystem)

Code Skeleton:

# In training_pipeline.py

import pandas as pd
# Imports from core_library
from .core_library.model_architecture import DynamicalAutoencoder, StructuralBreakClassifier
from .core_library.data_processing import SeriesProcessor, ECADataGenerator

class MDLPreTrainer:
    def __init__(self, model: DynamicalAutoencoder, config: dict): ...
    def pretrain(self, data_generator: ECADataGenerator) -> DynamicalAutoencoder: ...

class BreakClassifierFinetuner:
    def __init__(self, model: StructuralBreakClassifier, config: dict): ...
    def finetune(self, X_train_df: pd.DataFrame, y_train: pd.Series, series_processor: SeriesProcessor) -> StructuralBreakClassifier: ...

class EncoderSaver:
    def save(self, model: nn.Module, config: dict, path: str): ...

Module 4: `inference_pipeline.py`

Module Purpose: Contains the high-level orchestration logic for the infer() function.

Class Name	Role & Responsibilities	Key Collaborators
`EncoderLoader`	Artifact Loader. - A helper class or function. - Reads the config and weights from the `Model Store` and instantiates a ready-to-use `TransformerEncoder`.	`Model Store` (filesystem), `TransformerEncoder`
`Fingerprinter`	Vector Generator. - A helper class or function. - Orchestrates the use of `SeriesProcessor` and the loaded `TransformerEncoder` to produce a stable fingerprint for a given time series segment.	`SeriesProcessor`, `TransformerEncoder`
`BreakScoreCalculator`	Prediction Calculator. - A simple helper function. - Takes two fingerprint vectors and computes their cosine distance to get the final `[0,1]` score.	(None - simple math)

Code Skeleton:

# In inference_pipeline.py

import typing
import pandas as pd
# Imports from core_library
from .core_library.model_architecture import TransformerEncoder
from .core_library.data_processing import SeriesProcessor

class EncoderLoader:
    def load(self, path: str) -> tuple[TransformerEncoder, dict]: ...

class Fingerprinter:
    def __init__(self, encoder: TransformerEncoder, processor: SeriesProcessor): ...
    def generate(self, series: pd.Series) -> torch.Tensor | None: ...

class BreakScoreCalculator:
    def calculate(self, fp_before: torch.Tensor, fp_after: torch.Tensor) -> float: ...

This completes the C4 model, providing a clear, top-to-bottom architectural design from system context down to the specific classes and their responsibilities.

C4 Architecture: Level 1 - System Context (Revised)

C4 Architecture: Level 2 - Container Diagram

Container Descriptions and Collaborations

C4 Architecture: Level 3 - Component Diagrams (Cleaned, Corrected, and Comment-Free)

Container 1: Training Pipeline Logic

Container 2: Inference Pipeline Logic

Container 3: Model Store

Complete Component Registry

C4 Architecture: Level 4 - Code View

Module 1: core_library/data_processing.py

Module 2: core_library/model_architecture.py

Module 3: training_pipeline.py

Module 4: inference_pipeline.py

Module 1: `core_library/data_processing.py`

Module 2: `core_library/model_architecture.py`

Module 3: `training_pipeline.py`

Module 4: `inference_pipeline.py`