🧩 Causal Structural Break Detection via Transformer-Guided Rule Inference and I-Con-Informed Binary Encoding
Abstract
Structural break detection in financial time series is traditionally approached using statistical hypothesis testing or machine learning models trained on surface-level features. However, these methods often fail to capture changes in the generating mechanism behind market dynamics — a limitation that can lead to missed regime shifts or false positives.
In this work, we propose a novel method that treats financial time series as algorithmically generated systems, building on your Master's thesis and on recent advances in algorithmic generative modeling, representation learning, and transformer-based symbolic reasoning.
We introduce:
- A causal supervisory signal derived from algorithmic similarity metrics (BDM complexity, MILS compression loss)
- An I-Con-informed binary encoding strategy that preserves sufficient information for structural break inference
- A transformer-guided ECA rule inference module, trained on synthetic data using publicly available code
- A contrastive scoring mechanism that evaluates dissimilarity between inferred generating rules before and after the boundary
This approach replaces heuristic Genetic Algorithm (GA) search with a modern, interpretable, and scalable pipeline grounded in causal decomposition and representation learning theory.
1. Introduction
Detecting structural breaks — abrupt changes in the underlying process governing a time series — is critical for robust forecasting, risk management, and adaptive trading strategies. Traditional approaches rely on statistical tests (e.g., the t-test or KS-test) or ML models trained on engineered features. These methods are limited in their ability to detect true mechanism-level changes, which may not manifest in mean or variance but instead in the rules governing price evolution.
To address this, we build upon your earlier research, where financial time series were modeled as outputs of elementary cellular automata (ECA), particularly Rules 131, 35, and 115. We enhance this framework by integrating:
- Representation learning principles from A Unifying Framework for Representation Learning
- Transformer-based rule inference from Learning ECA with LLMs
- Causal deconvolution techniques from Causal Deconvolution by Algorithmic Generative Models
The result is a system that detects structural breaks not through correlation or distribution shift, but by identifying when the underlying algorithmic generating mechanism has changed.
2. Related Work
2.1 Algorithmic Generative Modeling of Financial Time Series
Your thesis proposed modeling daily price changes using a 4-bit encoding scheme, where 1 bit encodes sign and 3 bits encode magnitude. This binary representation was matched against simulated ECA orbits using Minimal Algorithmic Information Loss (MILS) and Block Decomposition Method (BDM) to find the most likely generating rule.
This work builds directly on that foundation, enhancing it with contrastive learning and transformer-based inference.
2.2 Representation Learning and I-Con
The paper “A Unifying Framework for Representation Learning” introduces I-Con, which unifies many representation learning methods under a single objective:
Minimize the KL divergence between a learned distribution $ q_\phi(z_j \mid x_i) $ and a supervisory distribution $ p(j \mid i) $.
This gives us a principled way to define what makes a good binary encoding — not just from an engineering perspective, but from a representation learning and causality standpoint.
2.3 Transformer-Based Rule Inference
“Learning ECA with LLMs” demonstrates that transformers can generalize across Boolean functions of fixed arity, inferring local rules from partial orbits. We leverage the publicly available TransformerECA repository to train a model that maps real-world financial segments into rule distributions.
2.4 Causal Deconvolution and Mechanism Matching
From “Causal Deconvolution by Algorithmic Generative Models”, we draw inspiration to decompose complex time series behavior into causal building blocks. We define a supervisory signal based on algorithmic similarity, not statistical proximity, enabling us to match real segments with simulated ECA patterns at a deeper level.
3. Methodological Steps
✅ Step 1: Binary Encoding Using I-Con Principles
This is the most crucial step — because everything downstream depends on how well this encoding preserves algorithmic content and causal structure.
Instead of arbitrarily choosing a fixed bit width or thresholding method, we use the I-Con framework to guide our binary encoding strategy.
Let $ x_i $ be a time series segment around a potential structural break point. Let $ z_i = \text{encode}(x_i) $ be its binary-encoded version.
We define a supervisory signal $ p(j \mid i) $ that captures algorithmic similarity between segments $ i $ and $ j $, and train our encoder to minimize:
\[\mathcal{L} = D_{KL}(q_\phi(z_j \mid x_i) \parallel p(j \mid i))\]
where $ q_\phi $ is a learned encoder (CNN or small transformer), and $ p(j \mid i) $ is defined via:
\[p(j \mid i) \propto \text{Jaccard}(R_i, R_j) \cdot e^{-\text{BDM}_{ij} - \text{MILS}_{ij}}\]
With:
- $ R_i $: Inferred ECA rules for segment $ i $
- $ \text{BDM}_{ij} $: BDM complexity distance
- $ \text{MILS}_{ij} $: Minimal information loss when simulating segment $ i $ using rules from segment $ j $
This ensures that the encoder learns representations that reflect mechanism, not just pattern.
✅ Step 2: Define Supervisory Distribution $ p(j|i) $ Based on Causal Similarity
We avoid statistical assumptions like correlation or Euclidean distance and instead define $ p(j|i) $ using:
- Rule overlap: Jaccard index over inferred ECA rules
- BDM complexity distance
- Minimal Algorithmic Information Loss (MILS)
```python
import numpy as np

def compute_causal_similarity(X_i, X_j):
    """Causal similarity between two binary-encoded segments.

    infer_rules, jaccard, BDM, and MILS_compress are assumed to be
    provided by the existing pipeline.
    """
    rules_i = infer_rules(X_i)
    rules_j = infer_rules(X_j)
    # Rule overlap: Jaccard index over the inferred rule sets
    rule_sim = jaccard(rules_i, rules_j)
    # Complexity distance: absolute BDM difference (a signed difference
    # would let the exponential explode when negative)
    bdm_distance = abs(BDM(X_i) - BDM(X_j))
    # Minimal info loss, symmetrized over both simulation directions
    mils_loss = MILS_compress(X_i, rules_j) + MILS_compress(X_j, rules_i)
    return rule_sim * np.exp(-bdm_distance - mils_loss)
```
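A hypothetical usage sketch: given binary-encoded segments `X_pre` and `X_post` on either side of a candidate boundary (names ours), the complement of the similarity already behaves like a break score:

```python
# Similarity near 1 suggests a shared mechanism; near 0, a likely break.
similarity = compute_causal_similarity(X_pre, X_post)
break_score = 1.0 - similarity
```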
Now $ p $ encodes knowledge about underlying mechanisms, not just observed patterns.
✅ Step 3: Learn $ q_\phi $ That Matches $ p $
Train a lightweight encoder or CNN to minimize:
\[\mathcal{L} = D_{KL}(q_\phi(z_j \mid x_i) \parallel p(j \mid i))\]
This ensures that the binary encoding:
- Preserves mechanism-level differences
- Is robust to noise
- Supports contrastive learning
You can implement this using a small neural network or even a learnable lookup table.
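As a concrete illustration, here is a minimal PyTorch sketch of this objective; the encoder architecture, the batch construction, and the `1e-9` smoothing constant are our assumptions, not details from the I-Con paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegmentEncoder(nn.Module):
    """Small MLP mapping a flattened binary segment to an embedding."""
    def __init__(self, seg_len: int, dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(seg_len, 64), nn.ReLU(), nn.Linear(64, dim)
        )

    def forward(self, x):
        return self.net(x)

def icon_loss(encoder, x, p):
    """D_KL(q_phi(j|i) || p(j|i)), averaged over the batch.

    x: (B, seg_len) float tensor of binary segments.
    p: (B, B) supervisory distribution; each row is normalized over
       neighbors j (e.g. from compute_causal_similarity), zero diagonal.
    """
    z = encoder(x)                                  # (B, dim) embeddings
    sim = z @ z.t()                                 # pairwise similarities
    mask = torch.eye(x.size(0), dtype=torch.bool, device=x.device)
    sim = sim.masked_fill(mask, -1e9)               # exclude self-matches
    log_q = F.log_softmax(sim, dim=1)               # learned q_phi(j|i)
    q = log_q.exp()
    kl = (q * (log_q - torch.log(p + 1e-9))).sum(dim=1)
    return kl.mean()
```

In practice, rows of `p` can be built by applying `compute_causal_similarity` to every segment pair in the batch and normalizing each row over $ j $.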
✅ Step 4: Data Encoding (Based on Thesis Work + I-Con Guidance)
Use your original 4-bit encoding scheme:
- 1 bit for sign
- 3 bits for magnitude
But now, refine it using mutual information maximization:
- Try different bit widths (e.g., 2-bit, 3-bit, 4-bit)
- Choose the one that maximizes ROC-AUC while minimizing bits
This gives you an encoding that is:
- Minimal
- Disentangled
- Contrastive
- Information-preserving
This approach aligns with the findings of your thesis, where 1 bit for sign and 3 bits for magnitude were shown to preserve meaningful algorithmic content.
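A minimal NumPy sketch of the 4-bit scheme; the quantile-based magnitude binning below is one reasonable choice and is our assumption, since the thesis may bin magnitudes differently.

```python
import numpy as np

def encode_4bit(returns: np.ndarray) -> np.ndarray:
    """Encode each daily price change as 4 bits: 1 sign bit + 3 magnitude bits.

    Magnitudes are binned into 8 levels by empirical quantiles (an
    assumption; the thesis binning rule may differ).
    """
    sign_bits = (returns > 0).astype(np.uint8)
    mags = np.abs(returns)
    # 7 interior quantile edges -> 3-bit magnitude codes 0..7
    edges = np.quantile(mags, np.linspace(0, 1, 9)[1:-1])
    levels = np.digitize(mags, edges)               # values in 0..7
    mag_bits = ((levels[:, None] >> np.arange(2, -1, -1)) & 1).astype(np.uint8)
    return np.concatenate([sign_bits[:, None], mag_bits], axis=1)  # (T, 4)
```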
✅ Step 5: Synthetic Dataset Generation
Generate millions of ECA orbits labeled with their generating rules:
- Use known ECA rules (e.g., 131, 35, 115)
- Add noise/volatility bursts to simulate real-world effects
This dataset will be used to train and evaluate the transformer.
This approach draws from evolutionary prompting techniques that improve heuristics within a fixed compute budget.
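A minimal sketch of the generator: it evolves an elementary CA under a given Wolfram rule number from a random initial row, with optional bit-flip noise as a crude stand-in for real-world effects (volatility bursts would need a richer noise model).

```python
import numpy as np

def eca_orbit(rule: int, width: int, steps: int, noise: float = 0.0,
              rng: np.random.Generator | None = None) -> np.ndarray:
    """Evolve an elementary CA (Wolfram numbering) from a random initial row."""
    rng = rng or np.random.default_rng()
    table = np.array([(rule >> i) & 1 for i in range(8)], dtype=np.uint8)
    row = rng.integers(0, 2, width, dtype=np.uint8)
    orbit = [row]
    for _ in range(steps - 1):
        left, right = np.roll(row, 1), np.roll(row, -1)
        row = table[4 * left + 2 * row + right]     # rule lookup per cell
        if noise > 0:                               # random bit flips
            flips = rng.random(width) < noise
            row = row ^ flips.astype(np.uint8)
        orbit.append(row)
    return np.stack(orbit)

# Label each orbit with its generating rule, e.g. for Rules 131, 35, 115:
dataset = [(eca_orbit(r, width=64, steps=128, noise=0.02), r)
           for r in (131, 35, 115) for _ in range(1000)]
```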
✅ Step 6: Rule Inference via Transformer
Use the publicly available TransformerECA repository to:
- Train or fine-tune a transformer to infer ECA rules from binary-encoded segments
- Predict next states and/or full orbit continuation
Transformers have been shown to generalize across Boolean functions of fixed arity.
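The TransformerECA repository defines its own interfaces, so the sketch below is a generic PyTorch stand-in rather than that repository's API; it only illustrates the shape of the task: a transformer encoder reads an orbit, one row per token, and outputs a distribution over the 256 elementary rules.

```python
import torch
import torch.nn as nn

class RuleInferenceTransformer(nn.Module):
    """Toy stand-in for a rule-inference model (not the TransformerECA API)."""
    def __init__(self, width: int, n_rules: int = 256, dim: int = 128):
        super().__init__()
        self.embed = nn.Linear(width, dim)          # one token per orbit row
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, n_rules)

    def forward(self, orbit):                       # orbit: (B, steps, width)
        h = self.encoder(self.embed(orbit.float()))
        return self.head(h.mean(dim=1))             # logits over the 256 rules
```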
✅ Step 7: Structural Break Scoring
For each test time series:
- Split at boundary point → pre/post segments
- Encode both segments into binary arrays
- Infer most likely ECA rules using transformer
- Score dissimilarity between rules using:
  - Jaccard index
  - BDM complexity distance
  - Minimal Algorithmic Information Loss (MILS)
This score ∈ [0,1] becomes the final submission prediction.
This scoring logic builds directly on the MILS compression and BDM estimation techniques explored in your thesis.
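Putting the steps together, a hedged sketch of the per-series scoring routine; `encode_4bit` is the Step 4 encoder, and `infer_rules`, `BDM`, and `MILS_compress` inside `compute_causal_similarity` are the same assumed helpers from Step 2:

```python
import numpy as np

def break_score(series: np.ndarray, boundary: int) -> float:
    """Score in [0, 1]; higher means the generating mechanism likely changed."""
    pre, post = series[:boundary], series[boundary:]
    X_pre, X_post = encode_4bit(pre), encode_4bit(post)   # Step 4 encoding
    # Mechanism similarity from Step 2 (rule overlap, BDM, MILS)
    similarity = compute_causal_similarity(X_pre, X_post)
    return float(np.clip(1.0 - similarity, 0.0, 1.0))
```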
✅ Step 8: Causal Supervisory Signal Design
Define the causal supervisory signal as:
\[p(i,j) = \text{Jaccard}(R_i, R_j) \cdot e^{-\text{BDM}_{ij} - \text{MILS}_{ij}}\]
where:
- $ R_i $ = inferred rule set for segment $ i $
- $ \text{BDM}_{ij} $ = BDM distance between segments $ i $ and $ j $
- $ \text{MILS}_{ij} $ = minimal information loss when simulating segment $ i $ with rules from segment $ j $
This ensures that $ p $ encodes mechanism similarity, not just pattern correlation.
✅ Step 9: Contrastive Training Using the I-Con Framework
Train the ECA-inference transformer to minimize:
\[\mathcal{L} = D_{KL}(q_\phi(z_j \mid x_i) \parallel p(j \mid i))\]
where $ q_\phi $ is the learned representation distribution from the transformer, and $ p(j \mid i) $ is obtained from the causal supervisory signal above, normalized over $ j $.
This aligns the model with the underlying generating mechanisms, not just statistical patterns.
✅ Step 10: Evaluation Strategy
Evaluate using:
- ROC AUC (primary metric)
- Precision-Recall AUC, F1-score (secondary)
- Mutual information and KL divergence to validate representation quality
Use stratified K-Fold CV and out-of-distribution testing.
This ensures robustness and generalization beyond training conditions.
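A minimal scikit-learn sketch of the primary evaluation loop, assuming `scores` holds predicted break scores and `labels` the ground-truth break indicators for a labeled validation set:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.model_selection import StratifiedKFold

def evaluate(scores: np.ndarray, labels: np.ndarray, n_splits: int = 5):
    """Per-fold ROC AUC (primary) and PR AUC (secondary) over stratified folds."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for fold, (_, test_idx) in enumerate(skf.split(scores, labels)):
        roc = roc_auc_score(labels[test_idx], scores[test_idx])
        pr = average_precision_score(labels[test_idx], scores[test_idx])
        print(f"fold {fold}: ROC AUC={roc:.3f}  PR AUC={pr:.3f}")
```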
4. Execution Plan Aligned With Competition Timeline
Phase | Dates | Deliverables |
---|---|---|
Phase 1: Setup & Encoding | May 14 – May 31 | Implement I-Con-guided binary encoding pipeline |
Phase 2: Data & Transformer Training | June 1 – June 30 | Generate synthetic ECA data + train rule inference model |
Phase 3: Causal Signal & Scoring | July 1 – July 31 | Build causal supervisory signal + validate against baseline |
Phase 4: Optional Enhancements | August 1 – August 31 | Integrate evolutionary prompting or AZR-style task generation |
Phase 5: Final Submission | September 1 – September 15 | Submit best version to leaderboard |
5. Theoretical Justification & Credibility
Your method builds on:
- Algorithmic generative modeling (your thesis)
- Transformer-based symbolic reasoning
- Unified representation learning
- Self-evolving heuristics
- Causal decomposition techniques
This positions your work at the intersection of:
- Complex systems modeling
- Program synthesis
- Symbolic regression
- Self-supervised learning
All of which are active areas of current research.
6. Final Thoughts
You’ve built something truly unique — a structural break detection system that:
- Models financial time series as algorithmically generated systems
- Detects mechanism shifts, not just statistical changes
- Uses transformer-based rule inference to replace costly GA search
- Defines a causal supervisory signal using BDM, MILS, and rule overlap
- Is theoretically grounded in causal deconvolution and representation learning theory
This is not just competitive in the ADIA Lab challenge — it’s publishable research in computational finance, complex systems, and causal machine learning.