Internal Technical Whitepaper
Title: From Theory to Submission: Implementing CIv8r Structural Break Detection on ADIA Challenge Data (Colab Edition)
1. Purpose & Implementation Goals
This internal whitepaper translates the theoretical constructs of the CIv8r hypothesis suite (CIv8-ECA, CIv8r-LLM, and CIv8-unified) into a concrete, reproducible, high-performance implementation for the ADIA Lab Structural Break Challenge.
Goals:
- Achieve a top-tier ROC AUC score for structural break detection.
- Implement the symbolic-latent detection pipeline using open-source repositories.
- Leverage our Colab-based distributed task orchestration layer.
- Ensure reproducibility and checkpointed training within Colab runtime constraints.
2. Architecture Overview
A. Symbolic Pipeline (CIv8-ECA Layer)
- Detects motif transitions and compressibility shifts across the break boundary.
- Uses ECA rules (e.g. 30, 110) and symbolic grammar features.
Repos:
- eca-rule-transform
- glenford-symbolic-ts
B. Latent Embedding Pipeline (CIv8r-LLM Layer)
- Learns latent topologies and detects geometric discontinuities.
- Applies attention-based encoder and UMAP-style topology modeling.
Repos:
- timeseries-transformer
- topo-ml
C. Unified Scoring Layer (CIv8-unified)
- Fuses symbolic and latent signals into a final structural break score.
- Implements scoring fusion logic using time-windowed alignment.
Repos:
- tsflex
3. Repository Integration Map
Each module integrates key GitHub repositories:
Module | Repository | Source | Used For |
---|---|---|---|
symbolic_features.py | eca-rule-transform, glenford-symbolic-ts | Braun et al. | ECA motif extraction |
latent_encoder.py | timeseries-transformer | Shani et al. | Transformer-based latent drift |
topo_analysis.py | topo-ml (UMAP) | Walch et al. | Latent curvature divergence |
fusion_score.py | tsflex | Predict IDLab | Feature fusion and ROC optimization |
orchestration_server.py | Custom | Algoplexity | Task scheduler & checkpointing |
4. Module-by-Module Implementation
4.1 symbolic_features.py
- Accepts a per-series dataframe (with a period column).
- Applies ECA rules (30, 110, randomized) to the symbolically discretized series.
- Outputs: motif frequency delta, entropy difference, compression delta.
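The entropy and compression deltas above can be sketched with a numpy-only toy. The helper names `eca_step` and `symbolic_deltas` are illustrative and not the eca-rule-transform API; binarization by the segment median is also an assumption:

```python
import zlib
import numpy as np

def eca_step(state: np.ndarray, rule: int = 30) -> np.ndarray:
    """One elementary cellular automaton update with periodic boundaries."""
    table = np.array([(rule >> i) & 1 for i in range(8)], dtype=np.int8)
    left, right = np.roll(state, 1), np.roll(state, -1)
    return table[4 * left + 2 * state + right]  # neighborhood (l, c, r) -> bit 4l+2c+r

def symbolic_deltas(series: np.ndarray, boundary: int,
                    rule: int = 30, steps: int = 8) -> dict:
    """Entropy and compressibility shift of ECA-evolved symbols across the boundary."""
    def features(segment):
        state = (segment > np.median(segment)).astype(np.int8)  # crude binarization
        for _ in range(steps):
            state = eca_step(state, rule)
        probs = np.bincount(state, minlength=2) / len(state)
        entropy = -sum(p * np.log2(p) for p in probs if p > 0)
        ratio = len(zlib.compress(state.tobytes())) / len(state)  # bytes per symbol
        return entropy, ratio
    e0, c0 = features(series[:boundary])
    e1, c1 = features(series[boundary:])
    return {"entropy_delta": e1 - e0, "compression_delta": c1 - c0}
```

A large magnitude in either delta is the raw signal that feeds the fusion layer; motif-frequency deltas would be computed analogously over fixed-length symbol windows.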
4.2 latent_encoder.py
- Applies a pretrained transformer encoder to the before/after segments.
- Measures latent-space shift via cosine distance and attention flow.
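The cosine-distance part of the measurement can be sketched as below (attention-flow scoring omitted). `embed` stands in for the pretrained encoder; treating it as any callable returning a `(n_windows, d)` array is an assumption about the timeseries-transformer interface:

```python
import numpy as np

def latent_shift(embed, series: np.ndarray, boundary: int) -> float:
    """Cosine distance between the mean latent vectors of the pre/post segments.

    `embed` maps a 1-D segment to an (n_windows, d) array of latent vectors,
    e.g. a pretrained transformer encoder applied over sliding windows.
    """
    pre = embed(series[:boundary]).mean(axis=0)
    post = embed(series[boundary:]).mean(axis=0)
    cos = pre @ post / (np.linalg.norm(pre) * np.linalg.norm(post))
    return 1.0 - float(cos)  # 0 = identical direction, up to 2 = opposite
```

For a quick sanity check, a sliding-window view of the raw series works as a stand-in embedding before the real encoder is wired in.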
4.3 topo_analysis.py
- Projects the residual stream or hidden states via UMAP.
- Computes manifold curvature on the pre- and post-boundary projections.
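The projection step is typically umap-learn's `UMAP` class; for the curvature step, the sketch below uses a numpy-only proxy (residual variance off the leading local PCA axis of each point's neighborhood) since the exact topo-ml API is not pinned down here. Function names are illustrative:

```python
import numpy as np

def curvature_proxy(points: np.ndarray, k: int = 10) -> float:
    """Mean local non-linearity of a point cloud: for each point, fit a line
    (top PCA axis) to its k nearest neighbours and measure the fraction of
    variance left off that axis. Flat manifolds score near 0."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    scores = []
    for i in range(len(points)):
        nbrs = points[np.argsort(d2[i])[1:k + 1]]   # skip the point itself
        centered = nbrs - nbrs.mean(axis=0)
        evals = np.linalg.eigvalsh(centered.T @ centered)  # ascending
        scores.append(evals[:-1].sum() / max(evals.sum(), 1e-12))
    return float(np.mean(scores))

def curvature_divergence(pre_pts: np.ndarray, post_pts: np.ndarray, k: int = 10) -> float:
    """Signed change in manifold curvature across the boundary."""
    return curvature_proxy(post_pts, k) - curvature_proxy(pre_pts, k)
```

`pre_pts` and `post_pts` would be the UMAP embeddings of the hidden states before and after the candidate break.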
4.4 fusion_score.py
- Aligns symbolic and latent feature vectors.
- Trains a lightweight model (logistic regression or XGBoost) or uses calibrated weighting.
- Outputs a probability score from 0 (no break) to 1 (break).
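A minimal version of the logistic option (XGBoost and calibrated weighting left out) can be written with plain numpy gradient descent; the feature layout over aligned symbolic+latent vectors is assumed:

```python
import numpy as np

def fit_fusion(X: np.ndarray, y: np.ndarray, lr: float = 0.1, epochs: int = 500) -> np.ndarray:
    """Tiny logistic regression over the aligned symbolic+latent feature matrix."""
    Xb = np.c_[np.ones(len(X)), X]            # prepend a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))     # current break probabilities
        w -= lr * Xb.T @ (p - y) / len(y)     # gradient of the mean log-loss
    return w

def break_probability(w: np.ndarray, features: np.ndarray) -> float:
    """Score one series: fused features -> probability in (0, 1)."""
    z = w[0] + np.dot(w[1:], features)
    return float(1.0 / (1.0 + np.exp(-z)))
```

The same interface accommodates a swap to XGBoost or a calibrated weighted sum without touching the callers.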
5. Train/Infer Adaptation
train()
- Generates lookup tables or classifier weights.
- Optionally saves pre-computed symbolic/latent feature sets.
infer()
- Receives a time series in the ADIA test format.
- Runs symbolic and latent feature extraction.
- Scores using the fusion model.
- Outputs a prediction via the ADIA-required interface.
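The train()/infer() pair can be sketched as below. The argument names and return format must be matched against the ADIA notebook template, and `extract_features` here is a deliberately crude stand-in (variance ratio across the series midpoint) for the full symbolic+latent stack:

```python
import os
import pickle
import numpy as np

def extract_features(series: np.ndarray) -> np.ndarray:
    """Placeholder feature: post/pre variance ratio across the midpoint."""
    h = len(series) // 2
    return np.array([np.var(series[h:]) / (np.var(series[:h]) + 1e-12)])

def train(X_train, y_train, model_dir: str) -> None:
    """Fit a toy threshold classifier on the extracted features and persist it."""
    os.makedirs(model_dir, exist_ok=True)
    feats = np.array([extract_features(s)[0] for s in X_train])
    y = np.asarray(y_train)
    thr = (feats[y == 1].mean() + feats[y == 0].mean()) / 2.0
    with open(os.path.join(model_dir, "model.pkl"), "wb") as f:
        pickle.dump({"threshold": thr}, f)

def infer(X_test, model_dir: str) -> list:
    """Load the persisted model and emit a break probability per series."""
    with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
        m = pickle.load(f)
    # squash distance-to-threshold into a (0, 1) score
    return [1.0 / (1.0 + np.exp(m["threshold"] - extract_features(s)[0]))
            for s in X_test]
```

Swapping `extract_features` for the real fused feature vector and the threshold for the fusion model leaves the persistence and interface logic unchanged.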
6. Evaluation Strategy
- Local testing using crunch.test()
- ROC AUC maximization through score fusion tuning
- False positive control via symbolic confidence thresholds
- Checkpoint and resume logic embedded in worker process
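The checkpoint-and-resume bullet can be implemented with an atomic-write pattern so a preempted Colab worker never leaves a torn state file. The file name and state schema below are illustrative:

```python
import json
import os

def save_checkpoint(state: dict, path: str = "ckpt.json") -> None:
    """Write the worker state to a temp file, then atomically swap it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic rename: a reader sees the old or new file, never a partial one

def load_checkpoint(path: str = "ckpt.json") -> dict:
    """Resume from the last checkpoint, or start fresh if none exists."""
    if not os.path.exists(path):
        return {"next_series": 0, "features_done": []}
    with open(path) as f:
        return json.load(f)
```

A worker calls `load_checkpoint()` on startup, processes from `next_series`, and calls `save_checkpoint()` on a fixed cadence well inside the Colab runtime limit.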
7. Task Sequence for Implementation
Phase | Depends On | Description |
---|---|---|
Baseline Forking & Setup | None | Reproduce and verify ADIA starter baseline |
Symbolic Module | Baseline | Implement ECA rule encoding, motif deltas |
Latent Module | Symbolic Module | Train transformer + derive latent deltas |
Topological Divergence | Latent Module | Add UMAP/curvature-based fault geometry |
Unified Fusion | Symbolic + Latent | Score fusion model with symbolic-latent alignment |
ROC Optimization | Unified Fusion | Tune classifier thresholds, improve AUC |
Final Submission Wrapping | All modules | Convert to single submission-ready notebook with backup |
8. Appendix A: GitHub Citation Table
Component | Repo | Paper | Used For |
---|---|---|---|
Symbolic ECA | eca-rule-transform | Braun et al. | ECA motif deltas |
Grammar Extraction | glenford-symbolic-ts | Shani & Braun | Symbolic sequences |
Latent Attention | timeseries-transformer | Shani et al. | Latent flows |
Topological Divergence | topo-ml | Walch et al. | Curvature shifts |
Feature Alignment | tsflex | IDLab | Time-windowed fusion |
ADIA Baseline | crunchdao/quickstarters | ADIA Lab | Base pipeline |
9. Appendix B: Submission Checklist
- Notebook conforms to the train() and infer() API
- ROC AUC locally exceeds baseline (0.8+ target)
- Colab runtimes checkpoint every 10 hours
- All workers authenticated to central coordinator
- Backup notebook ready for Sept 30
End of Whitepaper