Solution Architecture v4

This document outlines the complete C4 architecture for our solution to the ADIA challenge. This version (v4) is a self-contained blueprint incorporating all design decisions and corrections, providing a definitive guide for implementation.

Level 1: System Context

This view shows our system in relation to the external platform that hosts it. Our solution is a self-contained library that is called by the ADIA Platform Runner, which expects train() and infer() entrypoints.

C4Context
    title System Context Diagram for ADIA Challenge

    System_Ext(adia_platform, "ADIA Platform Runner", "The environment that calls our train() and infer() functions.")
    
    System(our_system, "Our Solution", "A two-stage deep learning pipeline to detect structural breaks in time series.")

    Rel(adia_platform, our_system, "Calls train() and infer(), providing data.")

Level 2: Container Diagram

This view breaks down our solution into its major, high-level structural blocks or “containers.”

C4Container
    title System Container Diagram for ADIA Challenge

    System_Ext(adia_platform, "ADIA Platform Runner", "The environment that calls our train() and infer() functions.")
    
    Container_Boundary(our_system, "Our Solution") {
        Container(core_lib, "Core Services Library", "Python Module", "Contains all foundational, reusable code for data processing and model definitions.")
        Container(training_pipeline, "Training Pipeline Logic", "Python Module", "Orchestrates the two-stage training process (pre-training and fine-tuning).")
        Container(inference_pipeline, "Inference Pipeline Logic", "Python Module", "Orchestrates the prediction process using the trained model.")
        ContainerDb(model_store, "Model Store", "File System Directory", "Stores the final, trained model artifact and its configuration.")
    }

    Rel(adia_platform, training_pipeline, "Calls train()")
    Rel(adia_platform, inference_pipeline, "Calls infer()")
    
    Rel(training_pipeline, core_lib, "Uses data processing & model components from")
    Rel(training_pipeline, model_store, "Writes final model artifact to")

    Rel(inference_pipeline, core_lib, "Uses data processing components from")
    Rel(inference_pipeline, model_store, "Reads final model artifact from")
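The two entrypoints can stay thin and hand all work to the pipeline modules. Here is a minimal sketch of that container wiring. It assumes the runner calls train(X_train, y_train, model_directory_path) and a generator-style infer(X_test, model_directory_path) that yields once when ready and then yields one prediction per dataset. Verify both signatures against the challenge's starter code. run_training_pipeline and make_predictor are hypothetical stand-ins, and plain lists replace the pandas objects the platform passes.

```python
from typing import Any, Callable, Iterable, Iterator, Optional


def run_training_pipeline(X_train: Any, y_train: Any, model_directory_path: str) -> None:
    """Hypothetical hook into training_pipeline.py (pre-train, fine-tune, save)."""
    # The real version would run MDLPreTrainer, BreakClassifierFinetuner and
    # EncoderSaver, writing the artifact into model_directory_path.


def make_predictor(model_directory_path: str) -> Callable[[Any], float]:
    """Hypothetical hook into inference_pipeline.py."""
    # The real version would load the encoder from the Model Store and return a
    # function mapping one dataset to a break score; a constant stands in here.
    return lambda dataset: 0.5


def train(X_train: Any, y_train: Any, model_directory_path: str) -> None:
    """Entrypoint called once by the runner; all work is delegated."""
    run_training_pipeline(X_train, y_train, model_directory_path)


def infer(X_test: Iterable[Any], model_directory_path: str) -> Iterator[Optional[float]]:
    """Generator entrypoint: load once, signal readiness, then yield one score per dataset."""
    predict = make_predictor(model_directory_path)
    yield None  # assumed readiness signal expected by the runner
    for dataset in X_test:
        yield predict(dataset)
```

Loading the model before the first yield means the Model Store is read only once, however many datasets the runner streams afterwards.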

Level 3: Component Diagrams

This level shows the components inside each container.

Container 1: Core Services Library

This container holds all the foundational, reusable building blocks of our system.

C4Component
    title Component Diagram for Core Services Library

    Container_Boundary(core_container, "Core Services Library") {
        Component(perm_sym, "PermutationSymbolizer", "Python class", "Converts a vector to a symbolic permutation.")
        Component(series_proc, "SeriesProcessor", "Python class", "Transforms a full time series into symbolic sequences.")
        Component(eca_gen, "ECADataGenerator", "Python class", "Creates synthetic ECA data for pre-training.")

        Component(trans_enc, "HierarchicalDynamicalEncoder", "PyTorch nn.Module", "Model primitive for encoding sequences.")
        Component(trans_dec, "HierarchicalDynamicalDecoder", "PyTorch nn.Module", "Model primitive for decoding sequences.")
        Component(dyn_ae, "MDL_AU_Net_Autoencoder", "PyTorch nn.Module", "Composite model for pre-training.")
        Component(break_class, "StructuralBreakClassifier", "PyTorch nn.Module", "Composite model for fine-tuning.")
    }

    Rel(series_proc, perm_sym, "Uses")
    Rel(dyn_ae, trans_enc, "Is composed of")
    Rel(dyn_ae, trans_dec, "Is composed of")
    Rel(break_class, trans_enc, "Is composed of")
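ECADataGenerator hands the simulation itself to cellpylib. The pure-Python sketch below shows what that simulation computes. In an elementary cellular automaton, each cell's next state is the bit of the Wolfram rule number indexed by its three-cell neighborhood. The function names eca_step and evolve, the periodic boundary, and the seeded random initial row are all assumptions made for illustration. This is not the cellpylib API, and it does not cover composite rules.

```python
import random
from typing import List


def eca_step(cells: List[int], rule: int) -> List[int]:
    """Apply one update of elementary CA `rule` (0-255) with periodic boundaries."""
    n = len(cells)
    out = []
    for i in range(n):
        # cells[i - 1] wraps to the last cell when i == 0 (periodic boundary)
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        neighborhood = (left << 2) | (centre << 1) | right  # value in 0..7
        out.append((rule >> neighborhood) & 1)  # look up that bit of the rule number
    return out


def evolve(rule: int, width: int, steps: int, seed: int) -> List[List[int]]:
    """Return `steps` rows of the space-time diagram from a seeded random first row."""
    rng = random.Random(seed)  # seeding makes the generated data reproducible
    row = [rng.randint(0, 1) for _ in range(width)]
    history = [row]
    for _ in range(steps - 1):
        row = eca_step(row, rule)
        history.append(row)
    return history
```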

Container 2: Training Pipeline Logic

This container’s components orchestrate the two-stage training flow and persist its result; the data and model logic they drive lives in the Core Services Library.

C4Component
    title Component Diagram for Training Pipeline

    Container_Boundary(core_container, "Core Services Library") {
        Component(eca_gen, "ECADataGenerator")
        Component(dyn_ae, "MDL_AU_Net_Autoencoder")
        Component(series_proc, "SeriesProcessor")
        Component(break_class, "StructuralBreakClassifier")
    }

    Container_Boundary(training_container, "Training Pipeline Logic") {
        Component(pre_trainer, "MDLPreTrainer", "Python class", "Manages the pre-training loop.")
        Component(fine_tuner, "BreakClassifierFinetuner", "Python class", "Manages the fine-tuning loop.")
        Component(saver, "EncoderSaver", "Python class", "Saves the final model artifact.")
    }

    ContainerDb_Ext(model_store, "Model Store", "File System Directory")
    
    Rel(pre_trainer, eca_gen, "Uses")
    Rel(pre_trainer, dyn_ae, "Trains")

    Rel(fine_tuner, series_proc, "Uses")
    Rel(fine_tuner, break_class, "Fine-tunes")
    
    Rel_D(break_class, saver, "Provides Final Encoder to")
    Rel_R(saver, model_store, "Writes artifact to")
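The key hand-off in this diagram is a shared encoder. The encoder trained inside MDL_AU_Net_Autoencoder is the same object that StructuralBreakClassifier fine-tunes and EncoderSaver persists. The sketch below shows that flow using hypothetical plain-Python stand-ins (Encoder, Autoencoder, Classifier, pretrain, finetune, run_two_stage) in place of the PyTorch modules and training loops. A dict stands in for the Model Store.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Encoder:
    """Hypothetical stand-in for HierarchicalDynamicalEncoder; `weights` mimics a state dict."""
    weights: Dict[str, float] = field(default_factory=lambda: {"w": 0.0})


@dataclass
class Autoencoder:  # hypothetical stand-in for MDL_AU_Net_Autoencoder
    encoder: Encoder


@dataclass
class Classifier:  # hypothetical stand-in for StructuralBreakClassifier
    encoder: Encoder


def pretrain(model: Autoencoder) -> None:
    model.encoder.weights["w"] += 1.0  # placeholder for the MDL pre-training loop


def finetune(model: Classifier) -> None:
    model.encoder.weights["w"] += 0.1  # placeholder for the fine-tuning loop


def run_two_stage(store: Dict[str, Dict[str, float]]) -> Classifier:
    autoencoder = Autoencoder(Encoder())
    pretrain(autoencoder)                          # stage 1: synthetic ECA data
    classifier = Classifier(autoencoder.encoder)   # reuse the same encoder object
    finetune(classifier)                           # stage 2: labeled real series
    store["final_encoder"] = dict(classifier.encoder.weights)  # EncoderSaver stand-in
    return classifier
```

Both stages update one shared encoder, so the saved weights reflect pre-training and fine-tuning together. Building a fresh encoder for the classifier would silently discard the pre-training.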

Container 3: Inference Pipeline Logic

This container’s components load the final model and use core services to generate predictions.

C4Component
    title Component Diagram for Inference Pipeline

    ContainerDb_Ext(model_store, "Model Store", "File System Directory")
    System_Ext(adia_platform, "ADIA Platform Runner")

    Container_Boundary(inference_container, "Inference Pipeline Logic") {
        Component(loader, "EncoderLoader", "Python class", "Loads the model from the Model Store.")
        Component(encoder, "HierarchicalDynamicalEncoder", "PyTorch nn.Module", "The loaded, fine-tuned model artifact.")
        Component(series_proc, "SeriesProcessor", "Python class", "Transforms raw test data into symbolic sequences.")
        Component(fingerprinter, "Fingerprinter", "Python class", "Generates a stable fingerprint for a data segment.")
        Component(scorer, "BreakScoreCalculator", "Python class", "Computes the final distance score.")
    }

    Rel_R(loader, model_store, "Reads artifact from")
    Rel_D(loader, encoder, "Instantiates")
    
    Rel_D(fingerprinter, series_proc, "Uses")
    Rel_D(fingerprinter, encoder, "Uses")
    
    Rel_D(scorer, fingerprinter, "Gets 'before' and 'after' fingerprints from")
    
    Rel_R(scorer, adia_platform, "Yields Prediction to")
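The inference data flow has three steps: split one series into its before and after segments, fingerprint each segment, and score the pair. The sketch below makes this concrete. It assumes each observation arrives as a (value, period) pair, with period 0 meaning before and period 1 meaning after; check this against the actual test-data schema. split_by_period and predict_break_score are hypothetical names. The fingerprint and distance callables stand in for Fingerprinter.generate and BreakScoreCalculator.calculate.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def split_by_period(series: Sequence[Tuple[float, int]]) -> Tuple[List[float], List[float]]:
    """Split (value, period) pairs into before (period 0) and after (period 1) values."""
    before = [value for value, period in series if period == 0]
    after = [value for value, period in series if period == 1]
    return before, after


def predict_break_score(
    series: Sequence[Tuple[float, int]],
    fingerprint: Callable[[List[float]], Vector],
    distance: Callable[[Vector, Vector], float],
) -> float:
    """Wire the inference components: split, fingerprint each side, then score."""
    before, after = split_by_period(series)
    return distance(fingerprint(before), fingerprint(after))
```

Injecting the fingerprint and distance functions keeps this wiring independent of how the encoder is loaded.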

Container 4: Model Store

This container represents the persistence layer: the directory given by model_directory_path.

C4Component
    title Component Diagram for Model Store

    Container_Boundary(model_store_container, "Model Store (File System Directory)") {
        Component(model_weights, "final_encoder.pth", "PyTorch State Dictionary", "The learned numerical weights of the final encoder.")
        Component(model_config, "model_config.joblib", "Configuration File", "Hyperparameters needed to build the model architecture before loading weights.")
    }
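Reading this store is order-sensitive. The architecture must be rebuilt from model_config.joblib before final_encoder.pth can be loaded into it. The standard-library sketch below shows that round trip, under several stand-in assumptions:
- pickle replaces joblib.dump and torch.save.
- ToyEncoder replaces HierarchicalDynamicalEncoder.
- The latent_dim and num_levels config keys are hypothetical.

In the PyTorch version, the final step would be encoder.load_state_dict(torch.load(...)).

```python
import os
import pickle
from typing import Dict, List

WEIGHTS_FILE = "final_encoder.pth"   # file names taken from the Model Store diagram
CONFIG_FILE = "model_config.joblib"


class ToyEncoder:
    """Hypothetical stand-in for HierarchicalDynamicalEncoder: its parameter
    names depend on the hyperparameters it is built with."""

    def __init__(self, latent_dim: int, num_levels: int):
        self.state: Dict[str, List[float]] = {
            f"level{i}.weight": [0.0] * latent_dim for i in range(num_levels)
        }

    def load_state(self, state: Dict[str, List[float]]) -> None:
        if state.keys() != self.state.keys():
            raise KeyError("weights do not match the architecture built from the config")
        self.state = state


def save(encoder: ToyEncoder, config: dict, path: str) -> None:
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, CONFIG_FILE), "wb") as f:
        pickle.dump(config, f)  # stand-in for joblib.dump(config, ...)
    with open(os.path.join(path, WEIGHTS_FILE), "wb") as f:
        pickle.dump(encoder.state, f)  # stand-in for torch.save(encoder.state_dict(), ...)


def load(path: str) -> ToyEncoder:
    with open(os.path.join(path, CONFIG_FILE), "rb") as f:
        config = pickle.load(f)  # 1. read the hyperparameters
    encoder = ToyEncoder(**config)  # 2. rebuild the architecture
    with open(os.path.join(path, WEIGHTS_FILE), "rb") as f:
        encoder.load_state(pickle.load(f))  # 3. only then load the weights
    return encoder
```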

Level 4: Code View (The Blueprint for Implementation)

This level details the primary classes and their code contracts: the public methods each class exposes and the signatures and return types it must honor.

Module 1: core_library/data_processing.py

PermutationSymbolizer (Symbolic Converter)
- Converts a single numeric vector into a discrete ordinal-pattern symbol.
- Uses randomized tie-breaking for robustness.
Key public methods: __init__(embedding_dim, seed), symbolize_vector(vector)
Key collaborators: none (foundational)

SeriesProcessor (Real Data Transformer)
- Manages the full pipeline: time-delay embedding, symbolization, and windowing into sequences.
- Handles edge cases such as series that are too short.
Key public methods: __init__(symbolizer, sequence_length), process(series)
Key collaborators: PermutationSymbolizer

ECADataGenerator (Synthetic Data Factory)
- Simulates Elementary Cellular Automata to create a labeled dataset.
- Handles composite rules and ensures reproducibility.
Key public methods: __init__(config), generate_training_data()
Key collaborators: none internal (uses the external cellpylib package)
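Here is a minimal sketch of the two symbolic classes. It rests on several assumptions that this blueprint does not record:
- a delay of 1 sample, exposed through a hypothetical `delay` parameter;
- Lehmer-code numbering of ordinal patterns, so symbols fall in [0, embedding_dim!);
- non-overlapping windows;
- an empty result for any series too short to fill one window.

```python
import random
from math import factorial
from typing import List, Sequence


class PermutationSymbolizer:
    """Maps a vector to the index of its ordinal pattern in [0, embedding_dim!)."""

    def __init__(self, embedding_dim: int, seed: int = 0):
        self.embedding_dim = embedding_dim
        self.rng = random.Random(seed)  # seeded, so tie-breaking is reproducible

    def symbolize_vector(self, vector: Sequence[float]) -> int:
        if len(vector) != self.embedding_dim:
            raise ValueError("vector length must equal embedding_dim")
        # Rank positions by value; a random secondary key breaks ties.
        jitter = [self.rng.random() for _ in vector]
        perm = sorted(range(len(vector)), key=lambda i: (vector[i], jitter[i]))
        # Lehmer code (assumed numbering): map the permutation to a unique integer.
        symbol = 0
        for i, p in enumerate(perm):
            smaller_later = sum(1 for q in perm[i + 1:] if q < p)
            symbol += smaller_later * factorial(len(perm) - 1 - i)
        return symbol


class SeriesProcessor:
    """Embeds a series, symbolizes each delay vector, and cuts the symbols into windows."""

    def __init__(self, symbolizer: PermutationSymbolizer, sequence_length: int, delay: int = 1):
        self.symbolizer = symbolizer
        self.sequence_length = sequence_length
        self.delay = delay  # hypothetical parameter; the contract above fixes no delay

    def process(self, series: Sequence[float]) -> List[List[int]]:
        d, tau = self.symbolizer.embedding_dim, self.delay
        n_vectors = len(series) - (d - 1) * tau
        if n_vectors < self.sequence_length:
            return []  # too short for even one window (assumed policy)
        symbols = [
            self.symbolizer.symbolize_vector([series[t + k * tau] for k in range(d)])
            for t in range(n_vectors)
        ]
        L = self.sequence_length
        # Non-overlapping windows (assumed); a trailing partial window is dropped.
        return [symbols[i:i + L] for i in range(0, len(symbols) - L + 1, L)]
```

The tie-breaking jitter comes from a seeded generator, so results are reproducible across runs. Within a single run, however, two tied vectors may still receive different symbols.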

Module 2: core_library/model_architecture.py

HierarchicalDynamicalEncoder (Sequence Encoder, Contracting Path)
- An nn.Module that compresses a sequence into a final “fingerprint” sequence.
- Its forward pass MUST return a tuple: (fingerprint_sequence, residuals_list).
Key public methods: __init__(args), forward(sequence_batch)
Key collaborators: none (primitive)

HierarchicalDynamicalDecoder (Sequence Decoder, Expanding Path)
- An nn.Module that reconstructs the original sequence.
- Its forward pass MUST accept two arguments: (fingerprint_seq, residuals).
Key public methods: __init__(args, transitions), forward(fingerprint_seq, residuals)
Key collaborators: none (primitive)

MDL_AU_Net_Autoencoder (Pre-training Model)
- A composite nn.Module that combines the Encoder, Decoder, and a classification head.
- Its internal logic MUST handle the tuple returned by the encoder, passing both the fingerprint sequence and the residuals list to the decoder.
Key public methods: __init__(args), forward(sequence_batch), encode(sequence_batch)
Key collaborators: HierarchicalDynamicalEncoder, HierarchicalDynamicalDecoder

StructuralBreakClassifier (Fine-tuning Model)
- A composite nn.Module that predicts whether a structural break occurs between the processed before and after periods.
- Its forward pass MUST accept two lists of tensors: (before_seqs, after_seqs).
- Its internal logic MUST correctly unpack the (fingerprint, _) tuple when calling its encoder.
Key public methods: __init__(encoder, latent_dim, ...), forward(before_seqs, after_seqs)
Key collaborators: HierarchicalDynamicalEncoder
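The tuple contract between encoder, decoder, and classifier can be shown without PyTorch. The sketch below uses a toy Haar-style transform as a hypothetical stand-in for the learned model:
- The contracting path repeatedly averages neighboring pairs and keeps the differences as residuals.
- The expanding path consumes those residuals in reverse order.

This is not the HierarchicalDynamicalEncoder itself, and toy_classifier returns a plain distance rather than a learned prediction. Sequence lengths are assumed to be powers of two.

```python
from typing import List, Tuple

Seq = List[float]


def toy_encoder(seq: Seq) -> Tuple[Seq, List[Seq]]:
    """Toy contracting path returning (fingerprint_sequence, residuals_list).

    Halves the sequence per level; assumes len(seq) is a power of two.
    """
    residuals: List[Seq] = []
    current = list(seq)
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        residuals.append([(a - b) / 2 for a, b in pairs])  # detail lost by pooling
        current = [(a + b) / 2 for a, b in pairs]          # coarser pooled sequence
    return current, residuals


def toy_decoder(fingerprint_seq: Seq, residuals: List[Seq]) -> Seq:
    """Toy expanding path: apply residuals coarsest-first, like U-Net skip connections."""
    current = list(fingerprint_seq)
    for detail in reversed(residuals):
        current = [x for m, r in zip(current, detail) for x in (m + r, m - r)]
    return current


def toy_classifier(before_seqs: List[Seq], after_seqs: List[Seq]) -> float:
    """Fine-tuning contract: lists of sequences in, encoder tuple unpacked."""
    def summarise(seqs: List[Seq]) -> float:
        fingerprints = []
        for s in seqs:
            fingerprint, _ = toy_encoder(s)  # MUST unpack: the encoder returns a tuple
            fingerprints.append(fingerprint[0])
        return sum(fingerprints) / len(fingerprints)
    return abs(summarise(before_seqs) - summarise(after_seqs))
```

The decoder reconstructs its input exactly only because it receives the residuals. That is why MDL_AU_Net_Autoencoder must forward both tuple elements. The classifier, by contrast, needs only the fingerprint and discards the rest.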

Module 3: training_pipeline.py

MDLPreTrainer (Pre-training Orchestrator)
- Manages the training loop for the MDL_AU_Net_Autoencoder.
Key public methods: __init__(model, config), pretrain(data_generator)
Key collaborators: MDL_AU_Net_Autoencoder, ECADataGenerator

BreakClassifierFinetuner (Fine-tuning Orchestrator)
- Manages the training loop for the StructuralBreakClassifier.
- Its implementation must pass lists of tensors to the classifier.
Key public methods: __init__(model, config), finetune(X_train, y_train, processor)
Key collaborators: StructuralBreakClassifier, SeriesProcessor

EncoderSaver (Artifact Manager)
- Saves the final fine-tuned encoder and its configuration.
Key public methods: save(model, config, path)
Key collaborators: StructuralBreakClassifier
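To save only the encoder (without the classification head), the encoder's entries must be pulled out of the classifier's parameters. In PyTorch, model.encoder.state_dict() returns them directly. The dict-based sketch below shows the equivalent key filtering on a parent state dict. It assumes the classifier stores its encoder under the attribute name encoder, which makes PyTorch prefix those keys with "encoder.". extract_encoder_state is a hypothetical helper name.

```python
from typing import Dict, List

StateDict = Dict[str, List[float]]


def extract_encoder_state(classifier_state: StateDict, prefix: str = "encoder.") -> StateDict:
    """Hypothetical helper: keep only the encoder's entries and strip the prefix,
    so the loader can apply them to a freshly built HierarchicalDynamicalEncoder."""
    encoder_state = {
        key[len(prefix):]: value
        for key, value in classifier_state.items()
        if key.startswith(prefix)
    }
    if not encoder_state:
        raise KeyError(f"no parameters found under prefix {prefix!r}")
    return encoder_state
```

Failing loudly on an empty result catches a renamed attribute before it produces an empty artifact.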

Module 4: inference_pipeline.py

EncoderLoader (Artifact Loader)
- Reads the config and weights from the Model Store.
Key public methods: load(path)
Key collaborators: HierarchicalDynamicalEncoder

Fingerprinter (Vector Generator)
- Produces a single, stable fingerprint for a time-series segment.
- Its implementation must handle lists of sequences and average the resulting fingerprints.
Key public methods: __init__(encoder, processor), generate(series)
Key collaborators: SeriesProcessor, HierarchicalDynamicalEncoder

BreakScoreCalculator (Prediction Calculator)
- Takes two fingerprint vectors and computes their cosine distance.
Key public methods: calculate(fp_before, fp_after)
Key collaborators: none (simple math)
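The Fingerprinter's averaging step and the BreakScoreCalculator's cosine distance can be sketched in plain Python. Here, encode is an assumed callable that already unpacks the encoder's (fingerprint, residuals) tuple and pools the fingerprint sequence into one vector. average_fingerprint is a hypothetical name. Returning 0.0 for an all-zero fingerprint is an assumed convention, not a decision recorded in this blueprint.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def average_fingerprint(sequences: Sequence[Sequence[int]],
                        encode: Callable[[Sequence[int]], Vector]) -> Vector:
    """Fingerprinter.generate sketch: encode each window, then take the element-wise mean."""
    if not sequences:
        raise ValueError("no sequences to fingerprint (series too short?)")
    vectors = [encode(seq) for seq in sequences]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_distance(fp_before: Vector, fp_after: Vector) -> float:
    """BreakScoreCalculator.calculate sketch: 1 - cosine similarity, in [0, 2]."""
    dot = sum(a * b for a, b in zip(fp_before, fp_after))
    norm = math.sqrt(sum(a * a for a in fp_before)) * math.sqrt(sum(b * b for b in fp_after))
    if norm == 0.0:
        return 0.0  # assumed convention for an all-zero fingerprint
    return 1.0 - dot / norm
```

Identical directions score 0, orthogonal fingerprints score 1, and opposite fingerprints score 2. A larger score therefore signals a larger change between the before and after periods.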