
This repository contains the complete source code for NESM-LM v2 (Neuro-Evolutionary System Model Language Modelling, version 2.0), a bio-inspired language model architecture submitted for publication in Neural Computing and Applications (Springer).

NESM-LM v2 is a biologically grounded substitute for the Transformer architecture. It replaces the O(n²) self-attention mechanism with an O(n) thalamic gating module, and the KV-cache (O(n·H·L)) with a cerebellar LSTM at constant memory O(H). Its seventh native module, the Hippocampus, integrates Elastic Weight Consolidation (EWC), reservoir-sampled episodic replay, and Gradient Episodic Memory (GEM) as unconditional architectural components, making NESM-LM v2, to the authors' knowledge, the first language model whose architecture structurally prevents catastrophic forgetting.

Architecture: Seven Biological Modules

| # | Module | Biological homologue | Function |
|---|---|---|---|
| ① | SensoryCortex | Sensory cortex | Token embedding + Welford online normalisation |
| ②③ | ThalamoCorticalBlock × N | Thalamus + Associative cortex | O(n) contextual gating + deep MLP |
| ④ | Cerebellum (LSTM) | Cerebellum | Sequential memory; replaces the KV-cache at constant O(H) |
| ⑤ | Brainstem | Brainstem / locus coeruleus | Homeostatic gain regulation ∈ [0.5, 1.5] |
| ⑥ | Hippocampus | Hippocampus | EWC + episodic replay + GEM (native anti-forgetting) |
| ⑦ | BasalGanglia | Basal ganglia | Vocabulary projection + weight tying |

Key Results: Controlled Benchmark

Iso-parameter benchmark vs GPT-2 (41,527,296 parameters; identical corpus, schedule, and precision):

| Model | Parameters | Best val perplexity | Epoch | Throughput |
|---|---|---|---|---|
| NESM-LM v2 | 41,069,825 | 170.48 ★ | 12 / 20 | ≈ 10,500 tok/s |
| GPT-2 (matched) | 41,527,296 | 499.99 | 20 / 20 | ≈ 11,267 tok/s |

NESM-LM v2 achieves 2.9× lower validation perplexity than the iso-parameter GPT-2 baseline and converges 37% earlier (epoch 12 vs 20+). Both models were trained from scratch on the same corpus (43,616,698 characters) with AdamW (lr = 3×10⁻⁴), a cosine schedule, batch size 32, seq_len 512, and fp16.
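To make the benchmark figures above easier to interpret, here is a minimal sketch of how perplexity relates to per-token cross-entropy and how the reported 2.9× ratio follows from the two best validation perplexities. The function name `perplexity` is illustrative and is not taken from the repository; the only inputs are the two values reported in the table.

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Best validation perplexities as reported in the benchmark table.
PPL_NESM = 170.48
PPL_GPT2 = 499.99

# How many times lower the NESM-LM v2 perplexity is.
ratio = PPL_GPT2 / PPL_NESM

# The same gap expressed as a difference in mean cross-entropy (nats per token).
ce_gap = math.log(PPL_GPT2) - math.log(PPL_NESM)

print(f"perplexity ratio:  {ratio:.2f}x")
print(f"cross-entropy gap: {ce_gap:.3f} nats/token")
```

Because perplexity is an exponential of the loss, a multiplicative gap in perplexity corresponds to an additive gap in cross-entropy, which is often the more convenient quantity when comparing training curves.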
Repository Structure

    NESM EVOLUTION/
    │
    ├── script v3/                        ← Production codebase (v2 FINAL)
    │   ├── nesm_lm_core_v2_FINAL.py      # Full 7-module architecture (Hippocampus included)
    │   ├── cerebellum_pretrain_lm.py     # Cerebellar pre-training
    │   ├── train_nesm_lm_v2.py           # Standard training loop
    │   ├── train_continual.py            # Continual learning over sequential corpora
    │   ├── benchmark_nesm_vs_gpt2.py     # Iso-parameter benchmark vs GPT-2
    │   ├── benchmark_full.py             # Extended benchmark suite
    │   ├── README.md                     # Quick-start guide
    │   ├── GUIDE.md                      # Complete research guide
    │   └── TESTING_GUIDE.md              # Falsifiability test suite (7 claims)
    │
    ├── NESM_LM_Functional_Architecture_v2_EN.drawio
    ├── NESM_LM_Technical_Architecture_v2_EN.drawio
    ├── NESM_LM_Architecture_Fonctionnelle_v2.drawio
    ├── NESM_LM_Architecture_Technique_v2.drawio
    │
    └── Old/                              ← Previous iterations (v1, v2.1, v2.2), kept for traceability

Recommended Workflow

Step 1: Cerebellar pre-training

    python cerebellum_pretrain_lm.py \
        --corpus data/corpus.txt --tokenizer_type bpe \
        --hidden_dim 512 --epochs 10 --output pretrained_cerebellum.pt

Step 2: Full NESM-LM v2 training

    python train_nesm_lm_v2.py \
        --corpus data/corpus.txt --hidden_dim 512 --num_blocks 4 \
        --seq_len 512 --batch_size 32 --epochs 30 --lr 3e-4 \
        --cereb_loss_weight 0.1 --cereb_pretrain pretrained_cerebellum.pt \
        --fp16 --output_dir runs/nesm_lm_v2

Step 3: Continual learning across corpora

    python train_continual.py \
        --corpora data/corpus_A.txt data/corpus_B.txt --fp16

Step 4: Benchmark vs GPT-2

    python benchmark_nesm_vs_gpt2.py \
        --corpus data/corpus.txt --hidden_dim 512 --num_blocks 4 \
        --seq_len 512 --batch_size 32 --epochs 20 --fp16

Dependencies

    pip install "torch>=2.0" tiktoken
    pip install wandb    # optional: experiment tracking (--wandb flag)

The version specifier is quoted so that the shell does not treat `>` as an output redirection.

Tested on: Python 3.12 · PyTorch 2.3.1 · Ubuntu 24.04 LTS · NVIDIA Tesla V100 32 GB (fp16) · NVIDIA RTX 3080 10 GB.
Loss Function

    L_total = L_lm + λ · L_cereb + L_ewc

    L_lm    = CrossEntropy(logits[:, :-1], token_ids[:, 1:])
    L_cereb = MSE(normalize(pred_emb), normalize(next_emb))    [scale-invariant]
    L_ewc   = (λ_ewc / 2) · Σᵢ Fᵢ · (θᵢ − θ*ᵢ)²                [active post-consolidation]

    λ = 0.1 (cerebellar weight) · λ_ewc = 5,000 (EWC penalty)

Experimental Infrastructure

| System | GPU | RAM | OS |
|---|---|---|---|
| HPE ProLiant ML350 Gen10 (20 cores) | NVIDIA Tesla V100 32 GB | 96 GB DDR4 | Ubuntu 24.04 LTS |
| PC Aorus Model S (Intel i9, 5.1 GHz) | NVIDIA RTX 3080 10 GB | 32 GB DDR4 | Ubuntu 24.04 LTS |
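As an illustration of the loss terms listed in the Loss Function section above, the following is a minimal pure-Python sketch of the cerebellar and EWC terms and of how they combine. It is not the repository's implementation: the function names are hypothetical, `normalize` is assumed to mean L2 normalisation, `MSE` is assumed to average over vector elements, and plain lists stand in for tensors. Only the two weights (λ = 0.1, λ_ewc = 5,000) come from the text.

```python
import math

LAMBDA_CEREB = 0.1     # cerebellar weight, as stated in the README
LAMBDA_EWC = 5_000.0   # EWC penalty strength, as stated in the README

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (assumed meaning of `normalize`)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]

def cereb_loss(pred_emb, next_emb):
    """MSE between unit-normalised vectors, so overall scale cancels out."""
    p, n = l2_normalize(pred_emb), l2_normalize(next_emb)
    return sum((a - b) ** 2 for a, b in zip(p, n)) / len(p)

def ewc_penalty(theta, theta_star, fisher, lam=LAMBDA_EWC):
    """(lam / 2) * sum_i F_i * (theta_i - theta*_i)^2."""
    return 0.5 * lam * sum(
        f * (t - s) ** 2 for f, t, s in zip(fisher, theta, theta_star)
    )

def total_loss(l_lm, l_cereb, l_ewc, lam=LAMBDA_CEREB):
    """L_total = L_lm + lam * L_cereb + L_ewc."""
    return l_lm + lam * l_cereb + l_ewc

# Rescaling a prediction does not change the cerebellar loss.
same_direction = cereb_loss([1.0, 2.0], [2.0, 4.0])

# Parameters with a larger Fisher value are penalised more for moving.
penalty = ewc_penalty(theta=[1.0, 1.0], theta_star=[0.0, 0.0], fisher=[0.002, 0.0])
```

In this sketch the second parameter has a Fisher value of zero, so it can drift freely, while the first is pulled back toward its consolidated value θ*. Before any consolidation has happened, the EWC term would simply be passed as 0.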
Energetic homeostasis, Multimodal sensorimotor integration, catastrophic forgetting, Perception–action loop, hippocampus module, Neuro-evolutionary systems, Biologically inspired artificial intelligence, Computational neuroscience, Brain-inspired architectures, Reinforcement learning, elastic weight consolidation, episodic replay, Adaptive agents, O(n) sequence modelling
