NESM-LM v2: Source Code for "A Neuro-Evolutionary System Model Integrating Perception, Action, and Energetic Homeostasis in Artificial Agents"

This repository contains the complete source code for NESM-LM v2 (Neuro-Evolutionary System Model Language Modelling, version 2.0), a bio-inspired language model architecture submitted for publication in Neural Computing and Applications (Springer).

NESM-LM v2 is a biologically grounded substitute for the Transformer architecture. It replaces the O(n²) self-attention mechanism with an O(n) thalamic gating module, and the KV-cache (O(n·H·L)) with a cerebellar LSTM at constant memory O(H). Its seventh native module, the Hippocampus, integrates Elastic Weight Consolidation (EWC), reservoir-sampled episodic replay, and Gradient Episodic Memory (GEM) as unconditional architectural components. This makes NESM-LM v2, to the authors' knowledge, the first language model whose architecture structurally prevents catastrophic forgetting.

Architecture: Seven Biological Modules

| # | Module | Biological homologue | Function |
|---|--------|----------------------|----------|
| ① | SensoryCortex | Sensory cortex | Token embedding + Welford online normalisation |
| ②③ | ThalamoCorticalBlock × N | Thalamus + Associative cortex | O(n) contextual gating + deep MLP |
| ④ | Cerebellum (LSTM) | Cerebellum | Sequential memory; replaces KV-cache at O(H) constant |
| ⑤ | Brainstem | Brainstem / locus coeruleus | Homeostatic gain regulation ∈ [0.5, 1.5] |
| ⑥ | Hippocampus | Hippocampus | EWC + Episodic replay + GEM; native anti-forgetting |
| ⑦ | BasalGanglia | Basal ganglia | Vocabulary projection + weight tying |

Key Results - Controlled Benchmark

Iso-parameter benchmark vs GPT-2 (41,527,296 params; identical corpus, schedule, and precision):

| Model | Parameters | Best val perplexity | Epoch | Throughput |
|-------|------------|---------------------|-------|------------|
| NESM-LM v2 | 41,069,825 | 170.48 ★ | 12 / 20 | ≈ 10,500 tok/s |
| GPT-2 (matched) | 41,527,296 | 499.99 | 20 / 20 | ≈ 11,267 tok/s |

NESM-LM v2 achieves 2.9× lower validation perplexity than the iso-parameter GPT-2 baseline and converges 37% earlier (epoch 12 vs 20+). Both models were trained from scratch on the same corpus (43,616,698 characters) with AdamW, lr = 3×10⁻⁴, cosine schedule, batch 32, seq_len 512, fp16.
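The SensoryCortex module is described as applying Welford online normalisation. The block below is a minimal sketch of that general technique on a scalar stream, not the repository's implementation: the class name `WelfordNormalizer`, the `eps` parameter, and the choice of population variance are assumptions made for illustration.

```python
import math


class WelfordNormalizer:
    """Running mean and variance via Welford's online algorithm.

    Hypothetical helper for illustration; the actual SensoryCortex code
    may operate on tensors and use different conventions.
    """

    def __init__(self, eps: float = 1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.eps = eps

    def update(self, x: float) -> None:
        # One-pass update: no past samples need to be stored.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance (assumption); 0.0 before any sample is seen.
        return self.m2 / self.count if self.count > 0 else 0.0

    def normalize(self, x: float) -> float:
        # Standardise x with the statistics accumulated so far.
        return (x - self.mean) / math.sqrt(self.variance + self.eps)


stats = WelfordNormalizer()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)

print(stats.mean, stats.variance, stats.normalize(7.0))
```

The appeal of this scheme for a streaming model is that the statistics are updated in constant memory per feature and remain numerically stable, which is why it is often preferred over accumulating raw sums of squares.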
Repository Structure

    NESM EVOLUTION/
    │
    ├── script v3/                          ← Production codebase (v2 FINAL)
    │   ├── nesm_lm_core_v2_FINAL.py        # Full 7-module architecture (Hippocampus included)
    │   ├── cerebellum_pretrain_lm.py       # Cerebellar pre-training
    │   ├── train_nesm_lm_v2.py             # Standard training loop
    │   ├── train_continual.py              # Continual learning on sequential corpora
    │   ├── benchmark_nesm_vs_gpt2.py       # Iso-parameter benchmark vs GPT-2
    │   ├── benchmark_full.py               # Extended benchmark suite
    │   ├── README.md                       # Quick-start guide
    │   ├── GUIDE.md                        # Complete research guide
    │   └── TESTING_GUIDE.md                # Falsifiability test suite (7 claims)
    │
    ├── NESM_LM_Functional_Architecture_v2_EN.drawio
    ├── NESM_LM_Technical_Architecture_v2_EN.drawio
    ├── NESM_LM_Architecture_Fonctionnelle_v2.drawio
    ├── NESM_LM_Architecture_Technique_v2.drawio
    │
    └── Old/                                ← Previous iterations (v1, v2.1, v2.2), kept for traceability

Recommended Workflow

Step 1: Cerebellar pre-training

    python cerebellum_pretrain_lm.py \
        --corpus data/corpus.txt --tokenizer_type bpe \
        --hidden_dim 512 --epochs 10 --output pretrained_cerebellum.pt

Step 2: Full NESM-LM v2 training

    python train_nesm_lm_v2.py \
        --corpus data/corpus.txt --hidden_dim 512 --num_blocks 4 \
        --seq_len 512 --batch_size 32 --epochs 30 --lr 3e-4 \
        --cereb_loss_weight 0.1 --cereb_pretrain pretrained_cerebellum.pt \
        --fp16 --output_dir runs/nesm_lm_v2

Step 3: Continual learning across corpora

    python train_continual.py \
        --corpora data/corpus_A.txt data/corpus_B.txt --fp16

Step 4: Benchmark vs GPT-2

    python benchmark_nesm_vs_gpt2.py \
        --corpus data/corpus.txt --hidden_dim 512 --num_blocks 4 \
        --seq_len 512 --batch_size 32 --epochs 20 --fp16

Dependencies

    pip install "torch>=2.0" tiktoken
    pip install wandb    # optional: experiment tracking (--wandb flag)

The version specifier is quoted so that the shell does not interpret `>` as an output redirection.

Tested on: Python 3.12 · PyTorch 2.3.1 · Ubuntu 24.04 LTS · NVIDIA Tesla V100 32 GB (fp16) · NVIDIA RTX 3080 10 GB.
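The four workflow steps can also be chained from a single driver script. The following is a hedged sketch rather than part of the repository: the function names `build_workflow` and `main` are hypothetical, the script and flag names are copied from the commands above, and it assumes the script is launched from the directory containing the training scripts (presumably `script v3/`). By default it only prints the commands (dry run).

```python
import shlex
import subprocess

CORPUS = "data/corpus.txt"
PRETRAINED = "pretrained_cerebellum.pt"


def build_workflow(corpus=CORPUS, pretrained=PRETRAINED):
    """Return the four documented steps as (label, argv) pairs."""
    model_args = ["--hidden_dim", "512", "--num_blocks", "4",
                  "--seq_len", "512", "--batch_size", "32"]
    return [
        ("Step 1: cerebellar pre-training",
         ["python", "cerebellum_pretrain_lm.py",
          "--corpus", corpus, "--tokenizer_type", "bpe",
          "--hidden_dim", "512", "--epochs", "10",
          "--output", pretrained]),
        ("Step 2: full NESM-LM v2 training",
         ["python", "train_nesm_lm_v2.py", "--corpus", corpus,
          *model_args, "--epochs", "30", "--lr", "3e-4",
          "--cereb_loss_weight", "0.1", "--cereb_pretrain", pretrained,
          "--fp16", "--output_dir", "runs/nesm_lm_v2"]),
        ("Step 3: continual learning across corpora",
         ["python", "train_continual.py", "--corpora",
          "data/corpus_A.txt", "data/corpus_B.txt", "--fp16"]),
        ("Step 4: benchmark vs GPT-2",
         ["python", "benchmark_nesm_vs_gpt2.py", "--corpus", corpus,
          *model_args, "--epochs", "20", "--fp16"]),
    ]


def main(dry_run=True):
    for label, argv in build_workflow():
        print(f"# {label}")
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    main(dry_run=True)
```

Calling `main(dry_run=False)` would execute the steps in order. Step 2 depends on the checkpoint written by Step 1, so stopping on the first failure avoids training against a missing file.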
Loss Function

    L_total = L_lm + λ · L_cereb + L_ewc

    L_lm    = CrossEntropy(logits[:, :-1], token_ids[:, 1:])
    L_cereb = MSE(normalize(pred_emb), normalize(next_emb))    [scale-invariant]
    L_ewc   = (λ_ewc/2) · Σᵢ Fᵢ · (θᵢ − θ*ᵢ)²                  [active post-consolidation]

    λ = 0.1 (cerebellar weight) · λ_ewc = 5,000 (EWC penalty)

Experimental Infrastructure

| System | GPU | RAM | OS |
|--------|-----|-----|----|
| HPE ProLiant ML350 Gen10 (20 cores) | NVIDIA Tesla V100 32 GB | 96 GB DDR4 | Ubuntu 24.04 LTS |
| PC Aorus Model S (Intel i9, 5.1 GHz) | NVIDIA RTX 3080 10 GB | 32 GB DDR4 | Ubuntu 24.04 LTS |
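To make the loss terms concrete, here is a small pure-Python sketch of how L_cereb, L_ewc, and L_total combine. It is illustrative only: the helper names are hypothetical, `normalize` is assumed to mean L2 normalisation, MSE is assumed to average over vector elements, and the language-modelling loss is passed in as a precomputed number rather than derived from logits.

```python
import math

LAMBDA_CEREB = 0.1    # λ, cerebellar weight (from the text above)
LAMBDA_EWC = 5000.0   # λ_ewc, EWC penalty strength (from the text above)


def l2_normalize(vec, eps=1e-12):
    """Scale vec to unit L2 norm (assumed meaning of normalize)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def cerebellar_loss(pred_emb, next_emb):
    """MSE between unit-normalised embeddings; invariant to their scale."""
    p, n = l2_normalize(pred_emb), l2_normalize(next_emb)
    return sum((a - b) ** 2 for a, b in zip(p, n)) / len(p)


def ewc_penalty(theta, theta_star, fisher, lambda_ewc=LAMBDA_EWC):
    """(λ_ewc / 2) · Σᵢ Fᵢ · (θᵢ − θ*ᵢ)² over flattened parameters."""
    return 0.5 * lambda_ewc * sum(
        f * (t - s) ** 2 for f, t, s in zip(fisher, theta, theta_star)
    )


def total_loss(l_lm, l_cereb, l_ewc, lam=LAMBDA_CEREB):
    """L_total = L_lm + λ · L_cereb + L_ewc."""
    return l_lm + lam * l_cereb + l_ewc


# Toy values: prediction and target point the same way, so L_cereb is 0.
l_cereb = cerebellar_loss([1.0, 0.0], [2.0, 0.0])
# Only the second parameter moved away from its consolidated value.
l_ewc = ewc_penalty([1.0, 2.0], [1.0, 1.0], [0.5, 0.002])
print(total_loss(2.0, l_cereb, l_ewc))
```

With λ_ewc = 5,000, even a small Fisher weight on a drifting parameter contributes noticeably to the total, which is the mechanism by which EWC discourages changes to parameters that mattered for earlier corpora.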

Related Organizations

Université Virtuelle de Côte d'Ivoire
Côte d'Ivoire

Keywords

Energetic homeostasis, Multimodal sensorimotor integration, catastrophic forgetting, Perception–action loop, hippocampus module, Neuro-evolutionary systems, Biologically inspired artificial intelligence, Computational neuroscience, Brain-inspired architectures, Reinforcement learning, elastic weight consolidation, episodic replay, Adaptive agents, O(n) sequence modelling
