Report · 2023
License: CC BY
Data sources: ZENODO

Ultra-low-latency FPGA-accelerated neural network inference at 40 MHz at CMS

Authors: Choudhury, Diptarko; Ardino, Rocco; Owen James, Thomas

Abstract

In the realm of data processing and physics analysis at the Large Hadron Collider (LHC), deep-learning-based algorithms have in certain cases proven more advantageous than traditional physics-based algorithms [2]. This study explores cutting-edge methodologies for low-latency neural network inference on Field Programmable Gate Array (FPGA) devices. Specifically, it focuses on muon primitive recalibration and fake/real muon pair classification at a rate of 40 MHz within the CMS Level-1 (L1) trigger system. The primary objective of this work is to develop a low-latency neural network model, strategically combining techniques such as quantization-aware training, knowledge distillation, transfer learning, and pruning schedules to reduce the computational footprint relative to the preexisting baseline while simultaneously improving reconstruction performance. Using this strategy, the models were compressed by more than a factor of four while still achieving significantly lower error rates than the given baselines.
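The abstract combines several compression techniques. As a minimal illustrative sketch (not the authors' actual model or hyperparameters), the snippet below shows how quantization-aware training and a pruning schedule are typically combined for FPGA-targeted L1-trigger studies, using QKeras layers together with the TensorFlow Model Optimization pruning wrapper; layer widths, bit widths, and sparsity targets are assumptions for illustration only.

```python
# Sketch: quantization-aware MLP with a polynomial-decay pruning schedule.
# All sizes, bit widths, and sparsity values below are illustrative assumptions.
import tensorflow as tf
import tensorflow_model_optimization as tfmot
from qkeras import QDense, QActivation, quantized_bits, quantized_relu


def build_quantized_mlp(n_inputs, n_outputs, bits=8, int_bits=0):
    """Small fully connected network with quantized weights and activations."""
    inputs = tf.keras.Input(shape=(n_inputs,))
    x = QDense(32,
               kernel_quantizer=quantized_bits(bits, int_bits, alpha=1),
               bias_quantizer=quantized_bits(bits, int_bits))(inputs)
    x = QActivation(quantized_relu(bits))(x)
    x = QDense(16,
               kernel_quantizer=quantized_bits(bits, int_bits, alpha=1),
               bias_quantizer=quantized_bits(bits, int_bits))(x)
    x = QActivation(quantized_relu(bits))(x)
    outputs = QDense(n_outputs,
                     kernel_quantizer=quantized_bits(bits, int_bits, alpha=1),
                     bias_quantizer=quantized_bits(bits, int_bits))(x)
    return tf.keras.Model(inputs, outputs)


model = build_quantized_mlp(n_inputs=24, n_outputs=1)

# Ramp sparsity up gradually during training (illustrative 50% final sparsity).
pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0, final_sparsity=0.5,
    begin_step=0, end_step=10000)
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(
    model, pruning_schedule=pruning_schedule)
pruned_model.compile(optimizer="adam", loss="mse")
# Training would also pass tfmot.sparsity.keras.UpdatePruningStep() as a callback.
```

In a setup like this, the quantizers constrain weights and activations to fixed-point precision during training so the deployed FPGA firmware sees no post-training quantization loss, while the pruning schedule removes low-magnitude weights to shrink the resource footprint; knowledge distillation and transfer learning, as mentioned in the abstract, would act on the training loss and initialization rather than on the layer definitions shown here.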
