March 30, 2026 · 6 min read

Selective Memory as Geometry: Coreset Methods for Continual Medical Reinforcement Learning on Edge Devices

Introduction

The deployment of artificial intelligence in clinical radiology faces a fundamental tension. While deep learning models are typically trained on centralized, high-performance computing clusters with abundant GPU resources, actual medical imaging occurs in distributed, resource-constrained environments. Edge devices in mobile MRI units, point-of-care scanners, and legacy hospital systems operate under strict memory, bandwidth, and power limitations. Compounding this challenge, imaging environments evolve continuously; scanner protocols change, patient populations shift, and hardware degrades or upgrades. This creates a need for lifelong learning systems that can adapt to new data distributions without forgetting previous tasks, all while operating on hardware that lacks the capacity to store complete replay buffers or run full gradient descent on historical data.

The paper "A framework for dynamically training and adapting deep reinforcement learning models to different, low-compute, and continuously changing radiology deployment environments" addresses this gap by reframing the experience replay mechanism in deep reinforcement learning (DRL). Rather than treating memory constraints as a storage problem to be solved with compression or cloud offloading, the authors approach it as a geometric covering problem. They demonstrate that for anatomical landmark localization in full-body DIXON MRI sequences, carefully curated subsets of experiences can outperform full replay buffers while achieving 27x compression. This result challenges the assumption that more data always yields better policies, and instead suggests that the quality of the training distribution, measured by information geometry, determines adaptation performance.

Coreset Algorithms as Geometric Selectors

The authors implement three distinct coreset selection strategies, each representing a different philosophical approach to memory compression. The neighborhood averaging coreset aggregates similar observations locally, effectively constructing a smoothed manifold of the state space. The neighborhood sensitivity-based sampling coreset weights samples by their gradient impact, preserving experiences that most significantly influence policy updates. Most notably, the maximum entropy coreset selects observations that maximize diversity in the latent representation space.
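Two of these strategies can be sketched in a few lines. The snippet below is illustrative only, not the paper's exact implementation: neighborhood averaging is approximated with a plain k-means pass (Lloyd's algorithm), and sensitivity-based sampling with gradient-norm-proportional sampling.

```python
import numpy as np

def neighborhood_average(obs: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Aggregate similar observations locally: a k-means pass whose
    centroids form a smoothed, compressed view of the state space."""
    rng = np.random.default_rng(seed)
    centers = obs[rng.choice(len(obs), size=k, replace=False)]
    for _ in range(iters):
        # Assign every observation to its nearest center ...
        dists = np.linalg.norm(obs[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # ... then replace each center with its neighborhood average.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = obs[labels == j].mean(axis=0)
    return centers

def sensitivity_sample(grad_norms: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Keep transitions with probability proportional to their
    gradient impact on the policy update."""
    rng = np.random.default_rng(seed)
    probs = grad_norms / grad_norms.sum()
    return rng.choice(len(grad_norms), size=k, replace=False, p=probs)

# Toy usage: compress 200 16-D observations down to 8 representatives.
obs = np.random.default_rng(1).normal(size=(200, 16))
centers = neighborhood_average(obs, k=8)
kept = sensitivity_sample(np.abs(obs).sum(axis=1), k=8)
```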

This maximum entropy approach deserves particular attention. In the context of full-body DIXON water and fat MRI images, consecutive slices exhibit high inter-slice redundancy: adjacent images contain nearly identical anatomical structures with minor spatial offsets. A standard replay buffer would store hundreds of redundant transitions from the kidney or liver regions, creating clusters in the latent space that bias the policy gradient estimates. By contrast, maximum entropy selection treats the replay buffer as a geometric covering of the task manifold. It selects the minimal set of observations whose latent representations span the relevant subspace, effectively ensuring that the policy network receives training signals from the full anatomical spectrum without over-sampling common configurations.
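A common way to realize such a geometric covering is greedy farthest-point (k-center) selection in the latent space. The sketch below is our own illustration of the idea, not the paper's exact criterion: it repeatedly picks the observation farthest from everything already selected, so dense clusters of redundant slices contribute few representatives while rare configurations are retained.

```python
import numpy as np

def max_entropy_coreset(latents: np.ndarray, k: int) -> list:
    """Greedy farthest-point selection: choose k latent vectors that
    spread out over the embedding space, a standard proxy for
    diversity-maximizing (maximum-entropy) selection."""
    # Seed with the vector farthest from the dataset mean.
    first = int(np.argmax(np.linalg.norm(latents - latents.mean(axis=0), axis=1)))
    selected = [first]
    # dist[i] = distance from point i to its nearest selected point.
    dist = np.linalg.norm(latents - latents[first], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(dist))  # the most novel remaining observation
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(latents - latents[nxt], axis=1))
    return selected

# Toy example: a dense cluster of redundant slices plus five
# distinct "anatomies" pushed out along different latent axes.
rng = np.random.default_rng(0)
latents = rng.normal(size=(300, 8)) * 0.1
for i in range(5):
    latents[i, i] += 5.0
picked = max_entropy_coreset(latents, k=11)
# All five rare configurations survive the 300 -> 11 compression.
```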

The technical implementation relies on selective experience replay within a lifelong learning framework. Rather than accumulating all historical transitions, the system maintains a fixed-size coreset that is dynamically updated as the imaging environment changes. When the DRL agent encounters new DIXON sequences from a different scanner or protocol, it compresses these experiences through the coreset algorithm before incorporating them into the training buffer. This allows the model to adapt to new environments while maintaining representational capacity for previously encountered anatomical variations.
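The update loop described above can be sketched as a fixed-size buffer that re-runs coreset selection whenever a new batch of experiences arrives. This is a minimal sketch, assuming the selection rule is injected as a function; a random selector stands in here for the paper's actual criteria.

```python
import numpy as np

class CoresetReplayBuffer:
    """Fixed-capacity replay buffer: instead of accumulating all
    transitions, re-compress the pool each time new experiences
    arrive from a changed imaging environment."""

    def __init__(self, capacity: int, select):
        self.capacity = capacity
        self.select = select          # (pool, k) -> indices to keep
        self.latents = None

    def add(self, new_latents: np.ndarray) -> None:
        pool = new_latents if self.latents is None else np.vstack([self.latents, new_latents])
        if len(pool) > self.capacity:
            pool = pool[self.select(pool, self.capacity)]
        self.latents = pool

# Placeholder selector; swap in a diversity- or sensitivity-based rule.
rng = np.random.default_rng(1)
random_select = lambda pool, k: rng.choice(len(pool), size=k, replace=False)

buf = CoresetReplayBuffer(capacity=32, select=random_select)
for _ in range(5):                    # five "new scanner sessions"
    buf.add(rng.normal(size=(100, 16)))
```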

Performance Implications and the Paradox of Compression

The experimental results reveal a counterintuitive finding that merits careful analysis. On the task of localizing five anatomical landmarks (left knee, right trochanter, left kidney, spleen, and lung), the maximum entropy coreset achieved an average distance error of 11.97±12.02 mm, compared to the conventional lifelong learning framework's 19.24±50.77 mm. The reduction in mean error is substantial, but the reduction in variance is perhaps more significant. The standard deviation drops from over 50 mm to approximately 12 mm, indicating that the coreset-based training produces dramatically more stable policies.

This stability suggests that conventional experience replay buffers in medical RL suffer from gradient confusion caused by redundant or noisy transitions. When an agent trains on unfiltered MRI sequences, redundant slices from homogeneous tissue regions (such as subcutaneous fat or muscle compartments) create conflicting gradient signals. The policy network receives multiple, slightly contradictory instructions for similar states, destabilizing convergence. Maximum entropy coresets eliminate this noise by ensuring that each training batch covers the latent space uniformly, providing consistent gradient directions that reflect the true geometry of anatomical localization.

The 27x compression ratio has immediate practical implications for edge deployment. A standard full-body MRI sequence might contain 200-300 slices; compressing this by 27x reduces the memory footprint to roughly 7-11 representative images per scan. This enables continual learning on devices with limited RAM, such as embedded systems in portable MRI units or older hospital workstations that cannot accommodate full deep learning frameworks. Furthermore, the compression happens at the experience level, not the model level, meaning the architecture itself remains unmodified. This avoids the accuracy degradation typically associated with model quantization or pruning techniques.
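The back-of-envelope arithmetic is straightforward; the slice dimensions below are assumptions for illustration, not figures from the paper.

```python
slices_per_scan = 250                 # typical full-body sequence (200-300 slices)
compression = 27                      # ratio reported in the paper
bytes_per_slice = 256 * 256 * 4       # assumed 256x256 float32 observation

kept_slices = round(slices_per_scan / compression)          # ~9 representatives
full_buffer_mb = slices_per_scan * bytes_per_slice / 2**20  # 62.5 MiB
coreset_mb = kept_slices * bytes_per_slice / 2**20          # 2.25 MiB
```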

Original Insights and Broader Context

This work connects to a broader shift in machine learning from "big data" to "right data". In medical imaging, where annotation is expensive and patient privacy restricts data pooling, the ability to identify maximally informative subsets carries significant value. The maximum entropy coreset approach resembles active learning strategies, but with a crucial difference: it operates retrospectively on the agent's own experience rather than requiring an oracle to label unselected samples. This makes it suitable for unsupervised or weakly supervised adaptation scenarios common in clinical deployment.

However, several limitations warrant consideration. The 27x compression factor is specific to volumetric MRI with high inter-slice redundancy. The technique may not generalize to modalities with sparse spatial sampling, such as interventional radiology with fluoroscopy or sparse-view CT reconstruction, where consecutive frames capture fundamentally different information. Additionally, the maximum entropy criterion assumes a stable feature extractor; if the imaging environment changes drastically (for example, switching from 1.5T to 7T MRI with fundamentally different contrast mechanisms), the latent space geometry itself may shift, rendering the coreset selection criterion obsolete.

From a systems perspective, this framework suggests that federated learning architectures for medical AI should focus not just on model aggregation, but on experience curation. If edge devices compress their local experiences using maximum entropy criteria before transmitting to a central server, the communication bandwidth requirements drop proportionally to the compression ratio, while potentially improving the quality of the aggregated global model by filtering out locally redundant observations.
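As a sketch of what that could look like (the federated extension is our extrapolation, not something the paper evaluates), each site might compress its experience pool by the coreset ratio before upload:

```python
import numpy as np

rng = np.random.default_rng(2)

def edge_payload(local_obs: np.ndarray, compression: int) -> np.ndarray:
    """Coreset-compress a site's experience before transmission.
    Random selection stands in for the maximum-entropy rule."""
    k = max(1, len(local_obs) // compression)
    idx = rng.choice(len(local_obs), size=k, replace=False)
    return local_obs[idx]

# Three edge sites, each holding 270 locally collected observations.
sites = [rng.normal(size=(270, 16)) for _ in range(3)]
uploads = [edge_payload(s, compression=27) for s in sites]
pooled = np.vstack(uploads)           # server-side aggregate for curation
bandwidth_saving = sum(len(s) for s in sites) / len(pooled)   # 27.0x
```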

Conclusion and Open Questions

The finding that maximum entropy coresets outperform full replay buffers raises fundamental questions about the nature of catastrophic forgetting in deep reinforcement learning. Perhaps forgetting is not merely the loss of previously learned information, but the interference caused by redundant or poorly selected training examples. If a carefully chosen subset of experiences preserves task performance better than the complete dataset, then the pathology of forgetting may lie in the buffer's information geometry rather than the plasticity of the neural network itself.

Looking forward, several questions remain unanswered. How do these coreset algorithms perform under adversarial distribution shifts, such as the introduction of pathological anatomy not present in the initial training distribution? Can the coreset selection mechanism itself be made adaptive, updating its notion of diversity as the policy improves and explores new regions of the state space? And finally, can we extend this geometric framework to other medical AI modalities, such as histopathology whole-slide images or genomic sequence data, where the concept of "consecutive slices" does not apply?

The transition from centralized, batch-trained models to distributed, continually adapting systems represents a necessary evolution for clinical AI. By treating memory constraints as an opportunity for geometric optimization rather than a limitation to be overcome with hardware, this framework points toward a future where medical AI systems become more intelligent not by growing larger, but by remembering better.
