Traces Propagation: Rethinking Spatial Credit Assignment in Spiking Neural Networks Through Local Contrastive Learning
Spiking Neural Networks (SNNs) represent a fundamental departure from conventional artificial neural networks. By employing discrete spike events rather than continuous activations, SNNs offer a computational framework that mirrors the energy efficiency and temporal processing capabilities of biological neural systems. Yet this biological fidelity introduces a formidable training challenge: credit must be assigned across both spatial dimensions (layer depth) and temporal dimensions (spike timing), which has historically tethered SNN optimization to Backpropagation Through Time (BPTT). BPTT, while effective, requires storing complete activation histories and propagating gradients backward through both space and time. The resulting memory footprint grows prohibitively with sequence length and network depth, rendering BPTT unsuitable for edge deployment and biologically implausible as a model of brain learning.
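The discrete spike events at the heart of this distinction can be made concrete with a minimal leaky integrate-and-fire (LIF) neuron. This is a generic textbook model, not code from the paper; the parameter names and values (`beta`, `threshold`) are illustrative.

```python
def lif_neuron(inputs, beta=0.9, threshold=1.0):
    """Simulate one LIF neuron over a sequence of input currents.

    beta: membrane leak factor per timestep (closer to 1 = slower decay).
    Returns the binary spike train.
    """
    v = 0.0                      # membrane potential
    spikes = []
    for x in inputs:
        v = beta * v + x         # leaky integration of the input current
        if v >= threshold:       # threshold crossing emits a discrete spike
            spikes.append(1)
            v = 0.0              # hard reset after spiking
        else:
            spikes.append(0)
    return spikes

spike_train = lif_neuron([0.4, 0.4, 0.4, 0.0, 0.9, 0.9])
# sub-threshold inputs accumulate silently; only threshold crossings
# produce output, which is what makes the activations event-driven
```

Because the output is a binary event train rather than a differentiable activation, naively training such a network requires unrolling these dynamics through time, which is exactly where BPTT's memory cost originates.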
Recent advances in local learning rules have addressed the temporal component through eligibility traces. These mechanisms allow neurons to maintain local records of recent activity, enabling temporal credit assignment without storing full history. However, spatial credit assignment has remained problematic. Existing approaches typically require auxiliary layer-wise matrices or feedback weights that increase memory overhead precisely when scalability matters most. The paper "Traces Propagation: Memory-Efficient and Scalable Forward-Only Learning in Spiking Neural Networks" introduces a compelling alternative. By reimagining spatial credit assignment as a local contrastive problem rather than a global optimization task, Traces Propagation (TP) achieves fully local learning with favorable memory scaling.
The Mechanics of Forward-Only Learning
The technical architecture of Traces Propagation rests on two pillars: eligibility traces for temporal credit and layer-wise contrastive losses for spatial credit. Eligibility traces operate as decaying accumulators of presynaptic activity. When a postsynaptic spike occurs, the trace provides a local estimate of which presynaptic inputs contributed to that event, eliminating the need to unroll computational graphs through time. This mechanism aligns with established models of synaptic plasticity in neuroscience, where recent pre- and postsynaptic activity determines synaptic modification.
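A decaying accumulator of this kind takes only a few lines to express. The decay constant, the placeholder learning signal, and the way the trace gates the weight update below are illustrative assumptions, not the paper's exact dynamics.

```python
def update_trace(trace, pre_spike, decay=0.8):
    # exponential decay plus accumulation of presynaptic activity
    return decay * trace + pre_spike

trace = 0.0
for pre in [1, 0, 0, 1, 0]:          # a short presynaptic spike train
    trace = update_trace(trace, pre)

# when a postsynaptic spike (or layer-local learning signal) arrives,
# the trace estimates which recent presynaptic inputs contributed:
lr = 0.1                             # illustrative learning rate
local_error = 1.0                    # placeholder local learning signal
dw = lr * local_error * trace        # no computational graph unrolled in time
```

The key property is that the trace is updated forward in time with O(1) state per synapse, so the temporal credit information is available at the moment of the update rather than reconstructed from a stored history.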
What distinguishes TP is its treatment of spatial credit. Rather than computing gradients through weight matrices, TP employs a contrastive loss at each layer. Neurons learn by distinguishing between target patterns and negative samples, using eligibility traces to modulate this comparison. Crucially, this approach requires no auxiliary feedback matrices or stored activation histories. The memory complexity scales with the number of neurons rather than the product of sequence length and network depth, a critical advantage for embedded applications.
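The paper's exact layer-wise loss is not reproduced here; as a stand-in, the sketch below uses a Forward-Forward-style "goodness" contrast to illustrate the general idea of a single layer learning to separate a positive (target-consistent) trace from a negative sample using only quantities local to that layer. All function names, the threshold `theta`, and the toy dimensions are assumptions.

```python
import numpy as np

def layer_goodness(W, x):
    """Layer response and its squared-activity 'goodness'."""
    out = np.maximum(0.0, W @ x)          # ReLU response of this layer only
    return out, float(np.sum(out ** 2))

def local_contrastive_step(W, pos_trace, neg_trace, theta=2.0, lr=0.01):
    """One layer-local contrastive update: raise goodness for the positive
    trace, lower it for the negative sample. Every quantity lives in this
    layer; nothing propagates backward through other layers."""
    out_p, g_p = layer_goodness(W, pos_trace)
    out_n, g_n = layer_goodness(W, neg_trace)
    # hand-written gradient of softplus(theta - g_p) + softplus(g_n - theta)
    sig_p = 1.0 / (1.0 + np.exp(g_p - theta))    # sigmoid(theta - g_p)
    sig_n = 1.0 / (1.0 + np.exp(theta - g_n))    # sigmoid(g_n - theta)
    dW = lr * 2.0 * (sig_p * np.outer(out_p, pos_trace)
                     - sig_n * np.outer(out_n, neg_trace))
    return W + dW

W = np.full((3, 4), 0.1)                  # tiny illustrative layer
pos = np.array([1.0, 0.5, 0.0, 0.2])      # target-consistent trace
neg = np.array([0.0, 0.0, 1.0, 0.0])      # negative sample
g_p0, g_n0 = layer_goodness(W, pos)[1], layer_goodness(W, neg)[1]
for _ in range(50):
    W = local_contrastive_step(W, pos, neg)
g_p1, g_n1 = layer_goodness(W, pos)[1], layer_goodness(W, neg)[1]
# goodness rises for the positive trace and falls for the negative one
```

Note the memory profile: the only persistent state is the layer's own weights and its input trace, matching the O(neurons) scaling described above rather than O(sequence length × depth).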
This architectural choice yields significant empirical benefits. On the NMNIST and SHD datasets, TP outperforms existing fully local rules such as Decoupled Contrastive Local Synaptic learning. For more complex benchmarks including DVS-GESTURE and DVS-CIFAR10, TP achieves competitive accuracy while scaling to deeper architectures like VGG-9. The memory advantages become particularly pronounced for datasets with numerous classes, where auxiliary matrix methods face quadratic scaling challenges that TP avoids entirely.
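The class-count scaling argument can be illustrated with back-of-the-envelope arithmetic. Assume, as a simplification, that an auxiliary-matrix method stores one C × N feedback matrix per layer (C classes, N neurons), while a trace-based scheme stores one value per neuron; the layer widths below are hypothetical.

```python
def aux_matrix_memory(n_classes, layer_widths):
    # one (n_classes x n_neurons) auxiliary matrix per layer
    return sum(n_classes * n for n in layer_widths)

def trace_memory(layer_widths):
    # one trace value per neuron per layer
    return sum(layer_widths)

widths = [1024, 512, 256]        # hypothetical hidden-layer widths
aux_10 = aux_matrix_memory(10, widths)      # 10-class dataset
aux_1000 = aux_matrix_memory(1000, widths)  # 1000-class dataset
tp = trace_memory(widths)
# auxiliary storage grows 100x as classes go from 10 to 1000;
# the per-neuron trace storage is unchanged
```

Under these assumptions the auxiliary-matrix overhead is proportional to the class count, while the trace storage depends only on network width, which is the advantage the paper reports for many-class datasets.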
Empirical Validation and Biological Parallels
The practical implications of TP extend beyond benchmark performance to real-world deployment scenarios. The paper demonstrates effective fine-tuning on the Google Speech Commands dataset, suggesting utility for keyword spotting applications at the edge. This capability addresses a critical gap in neuromorphic computing: while BPTT produces accurate models, its memory requirements prevent on-device adaptation to new speakers or acoustic environments. TP enables continuous learning scenarios in which devices adapt to user-specific patterns without offloading data to centralized servers.
The biological plausibility of TP merits particular attention. The algorithm loosely mirrors morphogenetic processes in developmental biology. Embryonic cells respond to local chemical gradients that, much like eligibility traces, accumulate signals over time, and cell fate decisions emerge from local concentration comparisons rather than global coordination across the entire embryo. TP implements an analogous computational strategy, suggesting that forward-only learning rules naturally exhibit the scaling properties observed in biological systems. This connection implies that the constraints driving biological evolution, specifically metabolic efficiency and local computation, may converge on the same algorithmic solutions that engineering optimization finds for edge devices.
Critical Analysis and Limitations
The innovation of treating spatial credit assignment as contrastive learning rather than gradient propagation represents a conceptual shift with broad implications. In standard deep learning, spatial credit flows backward through weight