C4 Level 3: Component View – `symbolic_features.py`

This component view describes the internal structure of the symbolic_features.py module in the CIv8r mesoscope system. It is responsible for extracting symbolic features from univariate time series using cellular automata (ECA), symbolic tokenization, and motif-level analysis.

📦 Container: `symbolic_features.py`

Purpose: Generate symbolic representations and features from pre-break and post-break segments of a time series for downstream fusion and scoring.

Source Repositories:

TransformerECA (TransformerECA_ipynb.txt)
Intelligence_at_the_edge_of_chaos (Intelligence_at_the_edge_of_chaos_py.txt)

GitHub Traceback License:

Apache 2.0 (confirmed for both uploaded sources)

🧩 Internal Components

Component	Description	GitHub Source	Reuse Type
`generate_symbolic_sequence(series, rule, steps)`	Converts raw time series into symbolic bit string using ECA dynamics	`cellular_automaton()` in `utils.py`【66†Intelligence_at_the_edge_of_chaos】	🛠️ Adapt
`tokenize_symbolic_sequence(symbolic_sequence)`	Applies 1D tokenizer to generate motif tokens	`SimpleTokenizer`【65†TransformerECA_ipynb】	♻️ Reuse
`compute_entropy_delta(pre_tokens, post_tokens)`	Measures change in symbolic entropy across boundary	New logic (entropy stats)	✳️ Build
`compute_compression_delta(pre_seq, post_seq)`	Uses LZ or dictionary compression size delta	Adapted from motif stack ideas	✳️ Build
`extract_motif_frequencies(tokens)`	Extracts motif frequency table from token list	Implicit in GPT2-style prep【66†Intelligence_at_the_edge_of_chaos】	🛠️ Adapt
`compute_motif_kl_divergence(pre_freq, post_freq)`	Computes KL divergence or JS divergence between pre/post frequency tables	New logic	✳️ Build
`generate_feature_vector()`	Combines all symbolic metrics into output dict	New code	✳️ Build

🔗 Dependencies

numpy, collections.Counter, math.log2
Optional: lzma or custom LZ complexity estimator
tqdm, re for tokenizer sanitation (from TransformerECA)

🧪 Test Strategy

Component	Test Case
`generate_symbolic_sequence()`	Compare output bits for known ECA rules (e.g. Rule 30, Rule 90)
`tokenize_symbolic_sequence()`	Validate consistent tokens from symbolic strings
`compute_entropy_delta()`	Assert entropy(pre) < entropy(post) for injected drift
`compute_motif_kl_divergence()`	KL(p	q) ≠ KL(q	p); verify symmetry using JS

📤 Output Format

{
    'entropy_delta': 0.23,
    'compression_delta': 0.18,
    'kl_divergence': 1.09,
    'js_divergence': 0.65,
    'pre_freq': {"010": 8, "110": 3},
    'post_freq': {"010": 2, "001": 6}
}

This dictionary will be passed to the fusion_score.py module for integration with latent features.

✅ Status Summary

Core symbolic engine fully backed by uploaded GitHub repos
Divergence and delta metrics need to be implemented
All components Colab-compatible and checkpointable

Next Steps:

Integrate this component view into CIv8r internal whitepaper as Appendix C
Begin prototyping symbolic runtime in Colab with minimal inputs
Validate outputs on ADIA sample test data

End of Component View

C4 Level 3: Component View – `symbolic_features.py`

📦 Container: `symbolic_features.py`

Purpose: Generate symbolic representations and features from pre-break and post-break segments of a time series for downstream fusion and scoring.

Source Repositories:

TransformerECA (TransformerECA_ipynb.txt)
Intelligence_at_the_edge_of_chaos (Intelligence_at_the_edge_of_chaos_py.txt)

GitHub Traceback License:

Apache 2.0 (confirmed for both uploaded sources)

🧩 Internal Components

Component	Description	GitHub Source	Reuse Type
`generate_symbolic_sequence(series, rule, steps)`	Converts raw time series into symbolic bit string using ECA dynamics	`cellular_automaton()` in `utils.py`【66†Intelligence_at_the_edge_of_chaos】	🛠️ Adapt
`tokenize_symbolic_sequence(symbolic_sequence)`	Applies 1D tokenizer to generate motif tokens	`SimpleTokenizer`【65†TransformerECA_ipynb】	♻️ Reuse
`compute_entropy_delta(pre_tokens, post_tokens)`	Measures change in symbolic entropy across boundary	New logic (entropy stats)	✳️ Build
`compute_compression_delta(pre_seq, post_seq)`	Uses LZ or dictionary compression size delta	Adapted from motif stack ideas	✳️ Build
`extract_motif_frequencies(tokens)`	Extracts motif frequency table from token list	Implicit in GPT2-style prep【66†Intelligence_at_the_edge_of_chaos】	🛠️ Adapt
`compute_motif_kl_divergence(pre_freq, post_freq)`	Computes KL divergence or JS divergence between pre/post frequency tables	New logic	✳️ Build
`generate_feature_vector()`	Combines all symbolic metrics into output dict	New code	✳️ Build

🔗 Dependencies

numpy, collections.Counter, math.log2
Optional: lzma or custom LZ complexity estimator
tqdm, re for tokenizer sanitation (from TransformerECA)

🧪 Test Strategy

Component	Test Case
`generate_symbolic_sequence()`	Compare output bits for known ECA rules (e.g. Rule 30, Rule 90)
`tokenize_symbolic_sequence()`	Validate consistent tokens from symbolic strings
`compute_entropy_delta()`	Assert entropy(pre) < entropy(post) for injected drift
`compute_motif_kl_divergence()`	KL(p	q) ≠ KL(q	p); verify symmetry using JS

📤 Output Format

{
    'entropy_delta': 0.23,
    'compression_delta': 0.18,
    'kl_divergence': 1.09,
    'js_divergence': 0.65,
    'pre_freq': {"010": 8, "110": 3},
    'post_freq': {"010": 2, "001": 6}
}

This dictionary will be passed to the fusion_score.py module for integration with latent features.

✅ Status Summary

Core symbolic engine fully backed by uploaded GitHub repos
Divergence and delta metrics need to be implemented
All components Colab-compatible and checkpointable

Next Steps:

Integrate this component view into CIv8r internal whitepaper as Appendix C
Begin prototyping symbolic runtime in Colab with minimal inputs
Validate outputs on ADIA sample test data

End of Component View

C4 Level 3: Component View – symbolic_features.py

📦 Container: symbolic_features.py

🧩 Internal Components

🔗 Dependencies

🧪 Test Strategy

📤 Output Format

✅ Status Summary

C4 Level 3: Component View – symbolic_features.py

📦 Container: symbolic_features.py

🧩 Internal Components

🔗 Dependencies

🧪 Test Strategy

📤 Output Format

✅ Status Summary

C4 Level 3: Component View – `symbolic_features.py`

📦 Container: `symbolic_features.py`

C4 Level 3: Component View – `symbolic_features.py`

📦 Container: `symbolic_features.py`