Internal Technical Whitepaper

Title: From Theory to Submission: Implementing CIv8r Structural Break Detection on ADIA Challenge Data (Colab Edition)


1. Purpose & Implementation Goals

This internal whitepaper translates the theoretical constructs of the CIv8r hypothesis suite (CIv8-ECA, CIv8r-LLM, and CIv8-unified) into a concrete, reproducible, and high-performance implementation for the ADIA Lab Structural Break Challenge.

Goals:

  • Achieve a top-tier ROC AUC score for structural break detection.
  • Implement a symbolic-latent detection pipeline using open-source repositories.
  • Leverage our Colab-based distributed task orchestration layer.
  • Ensure reproducibility and checkpointed training within Colab runtime constraints.

2. Architecture Overview

A. Symbolic Pipeline (CIv8-ECA Layer)

  • Detects motif transitions and compressibility shifts across the break boundary.
  • Uses ECA rules (e.g. 30, 110) and symbolic grammar features.
  • Repos:

    • eca-rule-transform
    • glenford-symbolic-ts

B. Latent Embedding Pipeline (CIv8r-LLM Layer)

  • Learns latent topologies and detects geometric discontinuities.
  • Applies an attention-based encoder and UMAP-style topology modeling.
  • Repos:

    • timeseries-transformer
    • topo-ml

C. Unified Scoring Layer (CIv8-unified)

  • Fuses symbolic and latent signals into a final structural break score.
  • Implements score fusion using time-windowed alignment of the two signals.
  • Repos:

    • tsflex

3. Repository Integration Map

Each module integrates key GitHub repositories:

Module                  | Repository                               | Source        | Used For
------------------------+------------------------------------------+---------------+-----------------------------------
symbolic_features.py    | eca-rule-transform, glenford-symbolic-ts | Braun et al.  | ECA motif extraction
latent_encoder.py       | timeseries-transformer                   | Shani et al.  | Transformer-based latent drift
topo_analysis.py        | topo-ml (UMAP)                           | Walch et al.  | Latent curvature divergence
fusion_score.py         | tsflex                                   | Predict IDLab | Feature fusion and ROC optimization
orchestration_server.py | Custom                                   | Algoplexity   | Task scheduler & checkpointing

4. Module-by-Module Implementation

4.1 symbolic_features.py

  • Accepts a per-series dataframe (with a period column).
  • Applies ECA rules (30, 110, and randomized rules) to the symbolically discretized series.
  • Outputs: motif frequency delta, entropy difference, compression delta.
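
The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code of eca-rule-transform or glenford-symbolic-ts: median thresholding stands in for the symbolic discretization, zlib stands in for the compressor, and the function names, the motif length k=3, and the number of ECA steps are hypothetical choices. Each segment is assumed to hold at least k points.

```python
import math
import zlib
from collections import Counter


def binarize(values):
    """Assumed discretization: 1 where a value exceeds the segment median, else 0."""
    median = sorted(values)[len(values) // 2]
    return [1 if v > median else 0 for v in values]


def eca_step(cells, rule):
    """One synchronous update of an elementary cellular automaton (wrap-around edges)."""
    n = len(cells)
    nxt = []
    for i in range(n):
        # Encode (left, center, right) as a 3-bit index into the rule number.
        idx = (cells[i - 1] << 2) | (cells[i] << 1) | cells[(i + 1) % n]
        nxt.append((rule >> idx) & 1)
    return nxt


def motif_distribution(bits, k=3):
    """Relative frequency of every length-k motif in a bit sequence."""
    counts = Counter(tuple(bits[i:i + k]) for i in range(len(bits) - k + 1))
    total = sum(counts.values())
    return {motif: c / total for motif, c in counts.items()}


def entropy(dist):
    """Shannon entropy in bits of a motif distribution."""
    return -sum(p * math.log2(p) for p in dist.values())


def compression_ratio(bits):
    """Compressed size over raw size, using zlib as a stand-in compressor."""
    raw = bytes(bits)
    return len(zlib.compress(raw)) / len(raw)


def symbolic_features(before, after, rule=30, steps=4, k=3):
    """Compare the ECA-evolved symbol streams on either side of the boundary."""
    evolved = []
    for segment in (before, after):
        cells = binarize(segment)
        for _ in range(steps):
            cells = eca_step(cells, rule)
        evolved.append(cells)
    dist_b, dist_a = (motif_distribution(c, k) for c in evolved)
    motifs = set(dist_b) | set(dist_a)
    return {
        # L1 distance between the two motif distributions.
        "motif_freq_delta": sum(abs(dist_a.get(m, 0.0) - dist_b.get(m, 0.0)) for m in motifs),
        # After minus before.
        "entropy_delta": entropy(dist_a) - entropy(dist_b),
        "compression_delta": compression_ratio(evolved[1]) - compression_ratio(evolved[0]),
    }
```

Other rules, including randomized ones, would be passed as any `rule` value in the range 0 to 255.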

4.2 latent_encoder.py

  • Applies a pretrained transformer to the before/after segments.
  • Measures latent space shift via cosine distance and attention flow.
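
The cosine-distance part of this measurement might look like the sketch below. It assumes the encoder has already produced one hidden-state vector per time step for each segment; the encoder itself and the attention-flow measure are not shown, and the names `mean_pool` and `latent_shift` are hypothetical. Mean pooling is one possible way to reduce a segment to a single embedding, not necessarily the one timeseries-transformer uses.

```python
import math


def mean_pool(hidden_states):
    """Average a sequence of per-timestep vectors into one segment embedding."""
    n = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(vec[j] for vec in hidden_states) / n for j in range(dim)]


def cosine_distance(u, v):
    """1 - cosine similarity: 0 for parallel vectors, 2 for opposite ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def latent_shift(before_states, after_states):
    """Cosine distance between the pooled embeddings of the two segments."""
    return cosine_distance(mean_pool(before_states), mean_pool(after_states))
```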

4.3 topo_analysis.py

  • Projects the residual stream or hidden states via UMAP.
  • Computes manifold curvature before and after the boundary.
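
The text does not say how curvature is measured. One possible stand-in, assuming the hidden states have already been projected to ordered 2-D points (for example by UMAP, not shown here), is the mean Menger curvature over consecutive point triples. The function names below are hypothetical, and this proxy is a swapped-in technique rather than the method of topo-ml.

```python
import math


def menger_curvature(a, b, c):
    """Curvature (1 / circumradius) of the circle through three 2-D points."""
    # The cross product gives twice the signed area of triangle abc.
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    denom = math.dist(a, b) * math.dist(b, c) * math.dist(c, a)
    if denom == 0.0:
        return 0.0  # Coincident points: treat as flat.
    # Menger curvature is 4 * area / (product of side lengths).
    return 2.0 * abs(cross) / denom


def mean_curvature(points):
    """Average Menger curvature over consecutive triples of an ordered point path."""
    triples = list(zip(points, points[1:], points[2:]))
    if not triples:
        raise ValueError("need at least three points")
    return sum(menger_curvature(a, b, c) for a, b, c in triples) / len(triples)


def curvature_delta(before_points, after_points):
    """Change in mean path curvature across the boundary (after minus before)."""
    return mean_curvature(after_points) - mean_curvature(before_points)
```

Points on a circle of radius R give a curvature of 1/R and collinear points give 0, which makes the proxy easy to sanity-check on synthetic data.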

4.4 fusion_score.py

  • Aligns symbolic and latent feature vectors.
  • Trains a lightweight model (logistic regression or XGBoost) or uses calibrated weighting.
  • Outputs a break probability between 0 (no break) and 1 (break).
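
A minimal sketch of the weighted-fusion variant follows. It assumes each pipeline emits one scalar per window, keyed by the window's start index, and that the weights have been calibrated elsewhere; the key format, the default weights, and the function names are all hypothetical. It does not use tsflex.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def align(symbolic, latent):
    """Pair the two {window_start: value} maps, keeping windows present in both."""
    shared = sorted(set(symbolic) & set(latent))
    return [(symbolic[w], latent[w]) for w in shared]


def fuse(symbolic, latent, w_sym=1.0, w_lat=1.0, bias=0.0):
    """Average each signal over the aligned windows, then squash to a probability."""
    pairs = align(symbolic, latent)
    if not pairs:
        raise ValueError("no overlapping windows to fuse")
    sym_mean = sum(s for s, _ in pairs) / len(pairs)
    lat_mean = sum(l for _, l in pairs) / len(pairs)
    return sigmoid(w_sym * sym_mean + w_lat * lat_mean + bias)
```

Swapping the hand-set weights for a trained logistic regression or XGBoost model would leave `align` unchanged and replace only the last line of `fuse`.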

5. Train/Infer Adaptation

train()

  • Generates lookup tables or classifier weights.
  • Optionally saves pre-computed symbolic/latent feature sets.

infer()

  • Receives each time series in the ADIA test format.
  • Runs symbolic and latent feature extraction.
  • Scores the features with the fusion model.
  • Returns the prediction through the interface ADIA requires.
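
The train/infer split described above could be organized roughly as in the sketch below. The signatures are assumptions for illustration only: the actual interface is defined by the challenge quickstarter. Each series is assumed to be a list of (value, period) pairs with period 0 before the boundary and 1 after it, the file name `fusion.json` is hypothetical, and a single mean-shift feature stands in for the symbolic and latent extractors.

```python
import json
import math
import os


def extract_features(series):
    """Placeholder extractor: absolute shift in mean across the boundary."""
    before = [v for v, p in series if p == 0]
    after = [v for v, p in series if p == 1]
    return [abs(sum(after) / len(after) - sum(before) / len(before))]


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _score(weights, bias, x):
    return _sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(train_series, labels, model_dir, epochs=200, lr=0.5):
    """Fit logistic-regression weights by batch gradient descent and save them."""
    X = [extract_features(s) for s in train_series]
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(X, labels):
            err = _score(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    with open(os.path.join(model_dir, "fusion.json"), "w") as fh:
        json.dump({"weights": weights, "bias": bias}, fh)


def infer(test_series, model_dir):
    """Load the saved weights and yield one break probability per series."""
    with open(os.path.join(model_dir, "fusion.json")) as fh:
        model = json.load(fh)
    for series in test_series:
        yield _score(model["weights"], model["bias"], extract_features(series))
```

Keeping all learned state in files under the model directory means infer() depends only on what train() saved, which also makes it straightforward to cache pre-computed feature sets alongside the weights.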

6. Evaluation Strategy

  • Local testing using crunch.test()
  • ROC AUC maximization through score fusion tuning
  • False positive control via symbolic confidence thresholds
  • Checkpoint and resume logic embedded in worker process
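
To make the fusion-tuning item concrete, the sketch below computes ROC AUC directly from its pairwise definition and grid-searches a single mixing weight between a symbolic score and a latent score. The function names and the convex-combination form of the fusion are assumptions for illustration. In practice a library routine such as scikit-learn's roc_auc_score would normally replace the hand-written metric.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    # Ties count as half a win (the usual Mann-Whitney convention).
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tune_fusion_weight(labels, symbolic, latent, steps=11):
    """Grid-search alpha in [0, 1] for fused = alpha * symbolic + (1 - alpha) * latent."""
    best_alpha, best_auc = 0.0, -1.0
    for i in range(steps):
        alpha = i / (steps - 1)
        fused = [alpha * s + (1.0 - alpha) * l for s, l in zip(symbolic, latent)]
        auc = roc_auc(labels, fused)
        if auc > best_auc:  # Keep the first alpha that reaches the best AUC.
            best_alpha, best_auc = alpha, auc
    return best_alpha, best_auc
```

The weight should be tuned on a held-out split, since selecting it on the same series used to report AUC will overstate the score.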

7. Task Sequence for Implementation

Phase                     | Depends On        | Description
--------------------------+-------------------+-------------------------------------------------------------
Baseline Forking & Setup  | None              | Reproduce and verify ADIA starter baseline
Symbolic Module           | Baseline          | Implement ECA rule encoding, motif deltas
Latent Module             | Symbolic Module   | Train transformer + derive latent deltas
Topological Divergence    | Latent Module     | Add UMAP/curvature-based fault geometry
Unified Fusion            | Symbolic + Latent | Score fusion model with symbolic-latent alignment
ROC Optimization          | Unified Fusion    | Tune classifier thresholds, improve AUC
Final Submission Wrapping | All modules       | Convert to single submission-ready notebook with backup

8. Appendix A: GitHub Citation Table

Component              | Repo                    | Paper         | Used For
-----------------------+-------------------------+---------------+---------------------
Symbolic ECA           | eca-rule-transform      | Braun et al.  | ECA motif deltas
Grammar Extraction     | glenford-symbolic-ts    | Shani & Braun | Symbolic sequences
Latent Attention       | timeseries-transformer  | Shani et al.  | Latent flows
Topological Divergence | topo-ml                 | Walch et al.  | Curvature shifts
Feature Alignment      | tsflex                  | IDLab         | Time-windowed fusion
ADIA Baseline          | crunchdao/quickstarters | ADIA Lab      | Base pipeline

9. Appendix B: Submission Checklist

  • Notebook conforms to train() and infer() API
  • Local ROC AUC exceeds the baseline (target: 0.8 or higher)
  • Colab runtimes checkpoint every 10 hours
  • All workers authenticated to central coordinator
  • Backup notebook ready for Sept 30
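
The checkpointing item could be backed by a resume loop like the sketch below. It assumes work is split into an ordered task list with JSON-serializable results and that the checkpoint path lives on storage that survives a runtime reset; the state layout and function names are hypothetical. The sketch checkpoints by task count, so a time-based interval would need a clock check in place of the modulo test.

```python
import json
import os


def save_checkpoint(path, state):
    """Write state atomically: dump to a temp file, then rename it over the target."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(state, fh)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh one when no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"next_index": 0, "results": []}
    with open(path) as fh:
        return json.load(fh)


def run_with_resume(tasks, process, path, every=1):
    """Process tasks in order, skipping those a previous run already finished."""
    state = load_checkpoint(path)
    for i in range(state["next_index"], len(tasks)):
        state["results"].append(process(tasks[i]))
        state["next_index"] = i + 1
        if state["next_index"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["results"]
```

Writing to a temporary file and renaming it means an interrupted save leaves the previous checkpoint intact rather than a half-written one.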

End of Whitepaper