V4 Architecture Implementation Plan
Critical Implementation Details
1. PermutationSymbolizer - The Foundation
```python
import numpy as np

class PermutationSymbolizer:
    def __init__(self, embedding_dim, seed):
        # Key insight: ordinal patterns capture local dynamics and are
        # invariant to amplitude scaling.
        self.embedding_dim = embedding_dim
        self.rng = np.random.RandomState(seed)

    def symbolize_vector(self, vector):
        # Bandt-Pompe symbolization with randomized tie-breaking for robustness.
        return self._compute_ordinal_pattern(vector)

    def _compute_ordinal_pattern(self, vector):
        # A tiny random jitter breaks ties at random (one simple scheme);
        # the argsort permutation of the values is the ordinal pattern.
        jitter = self.rng.uniform(-1e-12, 1e-12, size=len(vector))
        return np.argsort(vector + jitter)
```
Why this works: Ordinal patterns capture the relative ordering of values, making the approach robust to noise and amplitude variations while preserving temporal structure.
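To make the invariance concrete, here is a minimal usage sketch of the class above (the numbers are illustrative): scaling a window by a positive constant leaves its ordinal pattern unchanged.

```python
import numpy as np

sym = PermutationSymbolizer(embedding_dim=3, seed=0)
window = np.array([0.4, 1.2, 0.7])

p1 = sym.symbolize_vector(window)         # ranks: array([0, 2, 1])
p2 = sym.symbolize_vector(10.0 * window)  # same pattern after amplitude scaling
assert np.array_equal(p1, p2)
```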
2. HierarchicalDynamicalEncoder - The Core Innovation
```python
import torch.nn as nn

class HierarchicalDynamicalEncoder(nn.Module):
    def forward(self, sequence_batch):
        # CRITICAL: must return (fingerprint_sequence, residuals_list).
        # The residuals enable perfect reconstruction in the decoder.
        fingerprint_seq, residuals = self.encode_hierarchically(sequence_batch)
        return fingerprint_seq, residuals
```
Key Design Decision: The tuple return format ensures that the decoder can perfectly reconstruct the input, which is essential for the MDL objective.
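One shape `encode_hierarchically` could take, written here as a free function over a hypothetical list of downsampling modules (`levels`) and a resolution-matching callable (`upsample`); these names are illustrative assumptions, not the actual V4 internals. The point is only that each level stores the detail it discards.

```python
def encode_hierarchically(x, levels, upsample):
    # Each level coarsens the sequence and keeps the discarded detail as a
    # residual, so the decoder can invert the path level by level.
    residuals = []
    for level in levels:                        # e.g. downsampling nn.Modules
        coarse = level(x)                       # summary at this temporal scale
        detail = x - upsample(coarse, x.shape[1])  # what the summary misses
        residuals.append(detail)
        x = coarse
    return x, residuals                         # (fingerprint_sequence, residuals_list)
```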
3. MDL_AU_Net_Autoencoder - The Pre-training Model
```python
import torch.nn as nn

class MDL_AU_Net_Autoencoder(nn.Module):
    def forward(self, sequence_batch):
        # Encode with residuals.
        fingerprint_seq, residuals = self.encoder(sequence_batch)
        # Decode for reconstruction.
        reconstructed = self.decoder(fingerprint_seq, residuals)
        # Classify for rule identification.
        rule_logits = self.classifier(fingerprint_seq)
        return reconstructed, rule_logits
```
MDL Objective: The model learns to compress (encode) and decompress (decode) while maintaining the ability to classify the underlying rule. This forces it to learn meaningful, structured representations.
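The plan does not spell the objective out as code; a minimal two-term version might look like the following, where `lambda_cls` is an illustrative weight to be tuned, not a value from the plan.

```python
import torch.nn.functional as F

def mdl_loss(reconstructed, target, rule_logits, rule_labels, lambda_cls=0.1):
    # Cost of describing the data given the code: reconstruction error.
    recon = F.mse_loss(reconstructed, target)
    # Cost of describing the generating rule: classification error.
    cls = F.cross_entropy(rule_logits, rule_labels)
    return recon + lambda_cls * cls
```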
4. StructuralBreakClassifier - The Fine-tuning Model
```python
import torch
import torch.nn as nn

class StructuralBreakClassifier(nn.Module):
    def forward(self, before_seqs, after_seqs):
        # Process both periods; [0] extracts the fingerprint from the
        # (fingerprint, residuals) tuple the encoder returns.
        before_fingerprints = [self.encoder(seq)[0] for seq in before_seqs]
        after_fingerprints = [self.encoder(seq)[0] for seq in after_seqs]
        # Average fingerprints across windows for stability.
        avg_before = torch.stack(before_fingerprints).mean(dim=0)
        avg_after = torch.stack(after_fingerprints).mean(dim=0)
        # Compare the two period-level fingerprints.
        return self.classifier(torch.cat([avg_before, avg_after], dim=-1))
```
Key Insight: Averaging multiple fingerprints from the same period increases robustness to noise and provides a more stable representation.
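A quick numerical illustration of that claim, on synthetic vectors rather than model output: averaging n noisy copies of a fingerprint shrinks i.i.d. noise by roughly a factor of sqrt(n).

```python
import numpy as np

rng = np.random.default_rng(0)
true_fp = rng.normal(size=128)                          # idealized fingerprint
noisy = true_fp + rng.normal(scale=0.5, size=(8, 128))  # 8 noisy window copies

err_single = np.linalg.norm(noisy[0] - true_fp)
err_avg = np.linalg.norm(noisy.mean(axis=0) - true_fp)
print(err_single / err_avg)   # roughly sqrt(8), about 2.8
```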
Implementation Priorities
Phase 1: Core Components (Weeks 1-2)
- PermutationSymbolizer
  - Implement ordinal pattern computation
  - Add robust tie-breaking
  - Validate on synthetic data
- SeriesProcessor (see the sketch after this list)
  - Time-delay embedding
  - Sliding window extraction
  - Edge case handling
- Basic Encoder-Decoder
  - Simple transformer blocks
  - Residual connections
  - Tuple return format
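A minimal sketch of the SeriesProcessor named above (the class name comes from the plan; the parameter names and the non-overlapping stride are assumptions):

```python
import numpy as np

class SeriesProcessor:
    def __init__(self, embedding_dim, delay, window_len):
        self.embedding_dim = embedding_dim
        self.delay = delay
        self.window_len = window_len

    def delay_embed(self, series):
        # Time-delay embedding: row t is (x[t], x[t+d], ..., x[t+(m-1)d]).
        m, d = self.embedding_dim, self.delay
        n = len(series) - (m - 1) * d
        if n <= 0:  # edge case: series too short to embed
            raise ValueError("series too short for this embedding")
        return np.stack([series[i:i + n] for i in range(0, m * d, d)], axis=1)

    def sliding_windows(self, embedded):
        # Non-overlapping windows for simplicity; the stride is a design choice.
        w = self.window_len
        return [embedded[i:i + w] for i in range(0, len(embedded) - w + 1, w)]
```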
Phase 2: Advanced Architecture (Weeks 2-3)
- Hierarchical Attention
  - Multi-scale processing
  - Skip connections
  - Residual preservation
- ECA Data Generation
  - Diverse rule synthesis
  - Composite rule handling
  - Balanced dataset creation
- MDL Training Loop (see the sketch after this list)
  - Reconstruction loss
  - Classification loss
  - Proper loss weighting
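A skeletal training step for that loop, reusing the `mdl_loss` sketch from Section 3 (the batch format and optimizer handling are assumptions):

```python
def train_step(model, batch, optimizer, lambda_cls=0.1):
    sequences, rule_labels = batch    # assumed (inputs, labels) batch format
    reconstructed, rule_logits = model(sequences)
    loss = mdl_loss(reconstructed, sequences, rule_logits, rule_labels, lambda_cls)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```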
Phase 3: Fine-tuning and Optimization (Weeks 3-4)
- Structural Break Classifier
  - Fingerprint averaging
  - Comparison mechanisms
  - Calibration for probability output (see the sketch after this list)
- Pipeline Integration
  - End-to-end training
  - Model persistence
  - Inference optimization
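For calibration, one standard option is temperature scaling fit on a held-out validation split; this sketch assumes the classifier emits a single break logit per before/after pair.

```python
import torch
import torch.nn.functional as F

def calibrate_temperature(val_logits, val_labels, steps=200, lr=0.01):
    # Fit one scalar temperature T by minimizing NLL on validation data.
    log_t = torch.zeros(1, requires_grad=True)
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.binary_cross_entropy_with_logits(
            val_logits / log_t.exp(), val_labels.float())
        loss.backward()
        opt.step()
    return log_t.exp().item()

# At inference: p_break = torch.sigmoid(logit / T)
```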
Critical Success Factors
1. Representation Quality
- The encoder must learn meaningful, transferable representations
- Test on diverse synthetic datasets before real data
- Validate that similar dynamics produce similar fingerprints
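One concrete form of that last check (cosine similarity and any pass threshold are illustrative choices, not from the plan):

```python
import torch.nn.functional as F

def fingerprint_similarity(encoder, seq_a, seq_b):
    # Sequences generated by the same rule should map to nearby fingerprints.
    fp_a = encoder(seq_a)[0].flatten()
    fp_b = encoder(seq_b)[0].flatten()
    return F.cosine_similarity(fp_a, fp_b, dim=0).item()

# e.g. expect same-rule similarity (rule30_run_a vs rule30_run_b) to exceed
# cross-rule similarity by a clear margin
```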
2. Stability and Robustness
- Averaging multiple fingerprints is crucial for noisy data
- Proper normalization at each stage
- Robust handling of edge cases (short series, missing values)
3. Computational Efficiency
- Model size must fit competition constraints
- Inference time must be reasonable
- Memory usage optimization for long sequences
Validation Strategy
Synthetic Data Tests
- ECA Rule Transitions: Test on known rule changes
- Noise Robustness: Add varying levels of noise (see the sweep sketch after this list)
- Scale Invariance: Test with different amplitude scales
- Temporal Robustness: Vary sequence lengths
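A noise-robustness sweep could look like the following; here `encoder` stands for any callable mapping a series to a (fingerprint, residuals) pair with array-like fingerprints, and the noise scales are illustrative.

```python
import numpy as np

def noise_robustness_sweep(encoder, clean_series, scales=(0.01, 0.1, 0.5)):
    # Fingerprint drift should grow gracefully, not abruptly, with noise.
    rng = np.random.default_rng(0)
    base = encoder(clean_series)[0]
    drift = {}
    for s in scales:
        noisy = clean_series + rng.normal(scale=s, size=clean_series.shape)
        drift[s] = float(np.linalg.norm(encoder(noisy)[0] - base))
    return drift
```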
Real Data Validation
- Cross-validation: Split training data properly
- Ablation Studies: Test individual components
- Comparison: Benchmark against statistical methods
- Interpretability: Analyze learned representations
Risk Mitigation
Technical Risks
- Overfitting to ECA: Use diverse synthetic data
- Poor Transfer: Validate on held-out real data early
- Computational Cost: Profile and optimize bottlenecks
Architectural Risks
- Tuple Format Issues: Extensive unit testing
- Fingerprint Averaging: Validate mathematical correctness
- Loss Function Balance: Systematic hyperparameter search
Expected Advantages Over Baseline
- Temporal Context: Captures sequential dependencies
- Learned Representations: Adapts to data characteristics
- Multi-scale Processing: Handles different break timescales
- Robustness: Ordinal patterns + averaging increase stability
- Transferability: Pre-trained representations generalize better