
Neural network architectures have traditionally evolved along two largely independent axes: improvements in learning mechanisms and improvements in representational capacity. Recurrent neural networks rely on backpropagation through time (BPTT) to model temporal dependencies, while convolutional neural networks (CNNs) exploit spatial locality and are applied primarily to visual domains. In this work, we decouple these axes and investigate two orthogonal extensions, one along each. First, we introduce a persistent-memory sequence model that replaces BPTT with localized updates and lateral propagation. Second, we demonstrate that convolution can be extended beyond visual domains to multimodal structured data. Experimental results show that localized propagation captures training-set structure but fails to generalize in the tested single-modal configuration, whereas multimodal convolution improves held-out classification performance as data scale increases. Although limitations remain, these results suggest that neural architectures can be redesigned at a higher level of abstraction by modifying learning and representation mechanisms independently.
