This document outlines the software architecture for the CIv14 Cybernetic Divergence Detector, a system designed to identify structural breaks in multivariate time series data. The system is predicated on the CIv13/14 hypothesis, which posits that a structural break is best detected as a misalignment between the causal surface grammar (symbolic dynamics) and the latent temporal structure (continuous dynamics) of a system.
The architecture employs a dual-path, neuro-symbolic Siamese network. Its encoders are pre-trained separately: the symbolic encoder on idealized, rule-based systems (Elementary Cellular Automata) and the latent encoder on raw time series via contrastive learning (the TS2Vec methodology). The final model is then fine-tuned to classify the divergence between these two pathways on real-world data. This document follows the C4 model to describe the system at the Context, Container, and Component levels.
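The dual-path Siamese idea can be summarized in a short PyTorch sketch. The class and variable names below (DualPathDivergenceClassifier, embed_dim, the absolute-difference divergence features) are illustrative assumptions rather than the project's exact implementation; the sketch only shows how two shared encoders and a small classifier head could be wired to score the divergence between two segments, assuming each encoder maps its input to a (batch, embed_dim) embedding.

```python
import torch
import torch.nn as nn


class DualPathDivergenceClassifier(nn.Module):
    """Sketch of the dual-path Siamese structure: shared encoders process
    both segments, and a small head classifies the combined divergence signal."""

    def __init__(self, symbolic_encoder: nn.Module, latent_encoder: nn.Module,
                 embed_dim: int = 128):
        super().__init__()
        self.symbolic_encoder = symbolic_encoder   # e.g. a pre-trained Bi-GRU
        self.latent_encoder = latent_encoder       # e.g. a pre-trained TSEncoder
        self.head = nn.Sequential(                 # lightweight classifier head
            nn.Linear(2 * embed_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),                      # single logit for P(structural break)
        )

    def forward(self, raw_a, sym_a, raw_b, sym_b):
        # Shared ("Siamese") weights: the same encoder embeds segment A and segment B.
        sym_div = torch.abs(self.symbolic_encoder(sym_a) - self.symbolic_encoder(sym_b))
        lat_div = torch.abs(self.latent_encoder(raw_a) - self.latent_encoder(raw_b))
        # Concatenate the per-pathway divergence signals and classify.
        return self.head(torch.cat([sym_div, lat_div], dim=-1))
```

The shared weights give the comparison-oriented inductive bias discussed in the design decisions below; only the wiring is shown here, not the encoders themselves.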
This diagram shows the system in its operating environment, identifying its users and key external dependencies.
Diagram:
```mermaid
graph TD
    A[Data Scientist / Quant Analyst] -->|Trains & Monitors| C(CIv14 Break Detector);
    D[Market Data Feed] -->|Provides Raw Time Series| C;
    C -->|Outputs Break Probabilities| E[Monitoring Dashboard];
    F[Cloud Compute Environment] -->|Hosts| C;
    G[Python ML Ecosystem] -->|Provides Libraries For| C;
    style C fill:#007bff,stroke:#fff,stroke-width:2px,color:#fff
```
Description: The CIv14 Break Detector is the system we are building. It is operated by a Data Scientist who provides it with raw time series data from a Market Data Feed. The system is trained and runs within a Cloud Compute Environment, leveraging the Python ML Ecosystem. Its final output is a stream of break probabilities, which are consumed by a Monitoring Dashboard or Alerting System.
This diagram shows the high-level technical building blocks of the CIv14 system. It breaks the system down into a set of deployable services or processes.
Diagram:
```mermaid
graph TD
    subgraph "CIv14 System"
        direction LR
        A[User] --> B(Training Service);
        B --> D[(Model Artifact Store)];
        B --> E[(Feature Cache)];
        F(Inference API) -->|Loads Model| D;
        G(Feature Engineering Pipeline) -->|Writes To| E;
        H[Data Source] --> G;
    end
    style B fill:#28a745,stroke:#fff,stroke-width:2px,color:#fff
    style F fill:#28a745,stroke:#fff,stroke-width:2px,color:#fff
```
Containers:
- Training Service: Reads processed features from the Feature Cache, orchestrates the pre-training and fine-tuning of the dual-path model, and saves the final trained model to the Model Artifact Store.
- Model Artifact Store: Stores the trained model weights (e.g., best_adia_model.pth) for downstream use.
- Feature Cache: Holds the pre-computed features written by the Feature Engineering Pipeline and read by the Training Service (an illustrative read/write sketch follows this list).
- Feature Engineering Pipeline: Processes raw time series from the Data Source and writes the resulting features to the Feature Cache.
- Inference API: Loads the trained model from the Model Artifact Store to serve break-probability predictions.
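The document does not fix a storage format for the Feature Cache. Purely as an illustration, assuming one compressed NumPy archive per sample, the Feature Engineering Pipeline and the downstream dataset could exchange the four tensors and label like this (the file layout, function names, and key names are assumptions):

```python
import numpy as np


def write_sample(cache_dir: str, sample_id: str, raw_a, sym_a, raw_b, sym_b, label) -> None:
    """Persist one pre-processed sample to the Feature Cache (illustrative .npz layout)."""
    np.savez_compressed(
        f"{cache_dir}/{sample_id}.npz",
        raw_A=raw_a, symbolic_A=sym_a,
        raw_B=raw_b, symbolic_B=sym_b,
        label=label,
    )


def read_sample(cache_dir: str, sample_id: str) -> dict:
    """Load one cached sample back as a dict of arrays."""
    with np.load(f"{cache_dir}/{sample_id}.npz") as archive:
        return {key: archive[key] for key in archive.files}
```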
This diagram zooms into the Training Service container to show its internal modules. This is where our core logic and intellectual property reside.

Diagram:
```mermaid
graph TD
    subgraph "Training Service"
        A[Training Loop] -->|Uses| B[CIv14 Divergence Classifier];
        B -->|Contains & Uses| C["Symbolic Encoder: Pre-trained Bi-GRU"];
        B -->|Contains & Uses| D["Latent Encoder: Pre-trained TSEncoder"];
        A -->|Gets Data From| E[ADIADualPathDataset];
        F["Permutation Symbolizer (d=6, τ=10)"] -->|Used By| E;
    end
    style B fill:#17a2b8,stroke:#fff,stroke-width:2px,color:#fff
```
Components:
- Permutation Symbolizer (PermutationSymbolizer): A component responsible for converting raw time series data into symbolic sequences using the Bandt-Pompe method with our empirically validated parameters (d=6, τ=10); a minimal sketch follows this list.
- ADIADualPathDataset: A Dataset that interfaces with the Feature Cache. It uses the Symbolizer and is responsible for serving the four required tensors (raw_A, symbolic_A, raw_B, symbolic_B) and the label for each training sample.
- Symbolic Encoder: A pre-trained bi-directional GRU, trained on the ECA rule-inference task so that it brings a strong rule-finding (causal) prior to the symbolic sequences.
- Latent Encoder: A pre-trained TSEncoder (the “Latent Brain”) from the TS2Vec framework, which has learned via contrastive methods to create a “fingerprint” embedding of a raw time series segment’s continuous dynamics.
- CIv14 Divergence Classifier: An nn.Module implementing a Siamese network that contains the two shared encoders. Its primary responsibility is to compute the divergence between the symbolic and latent pathways and feed the combined signal to a final classifier head.
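As a concrete reference for the symbolic pathway's front end, below is a minimal Bandt-Pompe permutation symbolizer using the validated parameters (d=6, τ=10). The class name mirrors the component above, but the tie handling, permutation encoding, and method names are our assumptions rather than the project's exact code.

```python
import numpy as np
from math import factorial


class PermutationSymbolizer:
    """Bandt-Pompe ordinal-pattern symbolization (sketch).
    Each window of d points, sampled with lag tau, is mapped to the index
    of its rank permutation, giving a symbol in [0, d! - 1]."""

    def __init__(self, d: int = 6, tau: int = 10):
        self.d = d
        self.tau = tau
        self.n_symbols = factorial(d)   # 720 possible symbols for d = 6

    def _pattern_index(self, window: np.ndarray) -> int:
        # Lehmer-code style encoding of the window's argsort permutation.
        ranks = np.argsort(window, kind="stable")
        index = 0
        for i, r in enumerate(ranks):
            smaller = np.sum(ranks[i + 1:] < r)
            index += smaller * factorial(self.d - 1 - i)
        return int(index)

    def transform(self, series: np.ndarray) -> np.ndarray:
        """Convert a 1-D series into a sequence of ordinal symbols."""
        span = (self.d - 1) * self.tau
        symbols = [
            self._pattern_index(series[t : t + span + 1 : self.tau])
            for t in range(len(series) - span)
        ]
        return np.asarray(symbols, dtype=np.int64)
```

A call such as `PermutationSymbolizer(d=6, tau=10).transform(segment)` would turn a raw 1-D segment into the integer symbol sequence consumed by the Symbolic Encoder.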
This section documents the critical decisions made during development, grounded in our iterative testing.

| Decision | Rationale |
|---|---|
| 1. Adopt a Dual-Path (Neuro-Symbolic) Architecture | Validated by Empirical Failure of Single Paths: Standalone tests of both the Latent-Only path (raw data) and the Symbolic-Only path resulted in an AUC score of ~0.5. This proved that neither signal is sufficient on its own and validated the core CIv13/14 hypothesis that the break signal lies in their combined divergence. |
| 2. Use a Siamese (Twin-Encoder) Structure | Correct Inductive Bias: The problem is fundamentally one of comparison. A Siamese network, with its shared weights, is architecturally designed to learn a meaningful embedding space for measuring the distance/divergence between two inputs, directly matching the problem statement. |
| 3. Pre-train Encoders on Specialized Tasks | Need for Expert Priors: Training on the noisy ADIA data from scratch failed (AUC ≈ 0.5), and research by Burtsev & Zhang shows that Transformers/RNNs can infer the causal rules of complex systems (such as Class IV ECAs) when properly pre-trained. The Symbolic Encoder is therefore pre-trained on ECAs to learn “causality,” while the Latent Encoder is pre-trained via contrastive learning to understand “dynamics.” |
| 4. Select a Bi-directional GRU for the Symbolic Encoder | Winner of an Empirical Bake-Off: In a head-to-head pre-training competition on the ECA rule-inference task, the Bi-GRU was markedly more accurate than a Unidirectional GRU (90% vs. 84%) at roughly double the per-epoch cost (22 s vs. 11 s), and it also outperformed a Transformer baseline, making it the clear, data-driven choice. |
| 5. Select a TSEncoder for the Latent Encoder | Proven for Rich Representations: The TS2Vec paper demonstrated that its encoder architecture excels at creating robust, multi-scale “fingerprint” embeddings suitable for a variety of downstream tasks, including anomaly detection. It is a purpose-built tool for this role. |
| 6. Empirically Tune Symbolizer Parameters to d=6, τ=10 | Data-Driven Configuration: The Phase 0 test harness ran a systematic Jensen-Shannon Divergence sweep on the actual ADIA data, and the resulting heatmap showed an unambiguous peak in signal sensitivity at a permutation dimension of d=6 and lag of τ=10. This grounds our feature engineering in evidence, not guesswork. |
| 7. Fine-Tune by Freezing Encoders | Efficient Knowledge Transfer: To prevent the pre-trained “expert” knowledge from being destroyed by exposure to the noisy ADIA data, we freeze the weights of both encoders and train only the final, lightweight classifier head, which learns to interpret the divergence signals from the two experts (a fine-tuning sketch follows this table). |
| 8. Use a Weighted Loss Function | Handle Class Imbalance: EDA showed a 71/29 class imbalance. A weighted BCEWithLogitsLoss is necessary to prevent the model from becoming biased towards the majority “no-break” class and to ensure it learns to correctly identify the rare “break” events. |
| 9. Implement a Feature Cache | Performance Optimization: Initial tests showed that on-the-fly data processing was a severe bottleneck, with I/O from Google Drive dominating runtime. A pre-compiled cache of processed features is essential for efficient training and iteration. |
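Decisions 7 and 8 translate directly into training code. The sketch below is a hedged illustration, assuming a model object with `symbolic_encoder`, `latent_encoder`, and `head` attributes (as in the earlier sketch) and a data loader that yields the four tensors plus a label; the optimizer choice, learning rate, and exact pos_weight value are assumptions, not the project's settings.

```python
import torch
import torch.nn as nn


def finetune_head(model, train_loader, epochs: int = 10, lr: float = 1e-3):
    """Fine-tune only the classifier head of a dual-path model (sketch).
    Assumes `model` exposes .symbolic_encoder, .latent_encoder and .head, and
    that `train_loader` yields (raw_A, sym_A, raw_B, sym_B, label) batches."""
    # Decision 7: freeze the pre-trained "expert" encoders so fine-tuning
    # on noisy data cannot overwrite their knowledge.
    for encoder in (model.symbolic_encoder, model.latent_encoder):
        for param in encoder.parameters():
            param.requires_grad = False

    # Decision 8: weight the loss for the ~71/29 class imbalance
    # (pos_weight ≈ negatives / positives ≈ 0.71 / 0.29 ≈ 2.45).
    criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([0.71 / 0.29]))

    # Only the lightweight classifier head receives gradient updates.
    optimizer = torch.optim.Adam(model.head.parameters(), lr=lr)

    for _ in range(epochs):
        for raw_a, sym_a, raw_b, sym_b, labels in train_loader:
            optimizer.zero_grad()
            logits = model(raw_a, sym_a, raw_b, sym_b).squeeze(-1)
            loss = criterion(logits, labels.float())
            loss.backward()
            optimizer.step()
    return model
```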