
# ALA: Asynchronous LLM Advisor

**Bounded Logit Perturbation Channels for LLM-Guided Reinforcement Learning**

Author: Cahlen Humphreys, Enfuse Labs

## Abstract

A novel architecture that enables large language models to provide real-time strategic guidance to reinforcement learning agents via bounded logit perturbation channels, with importance sampling correction for unbiased PPO training.

## Key Innovations

- Time-bounded bias expiration
- Multi-advisor voting
- Importance sampling correction for unbiased policy gradients

## License

CC BY 4.0

## Links

- Website: https://mc.enfuse.ai
- Paper: PDF included in release assets
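The core mechanism described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's actual implementation: it assumes the advisor supplies a per-action bias vector that is clipped to a bound `[-beta, beta]` before being added to the policy logits, and that the ratio between the unperturbed policy and the advisor-guided behavior policy is recorded as an importance weight so the PPO gradient remains unbiased. All function and parameter names here are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def advised_action(logits, advisor_bias, beta=1.0, rng=None):
    """Sample an action from a policy whose logits are perturbed by a
    bounded advisor bias, and return the importance weight
    pi_base(a) / pi_behavior(a) used to correct the policy gradient.

    Illustrative sketch only; names do not reflect the paper's API.
    A time-bounded scheme would additionally zero `advisor_bias`
    once its expiration step has passed.
    """
    rng = rng or np.random.default_rng()
    bias = np.clip(advisor_bias, -beta, beta)      # bounded perturbation channel
    p_base = softmax(logits)                        # unperturbed base policy
    p_behavior = softmax(logits + bias)             # advisor-guided behavior policy
    a = rng.choice(len(p_behavior), p=p_behavior)   # act under the guided policy
    is_weight = p_base[a] / p_behavior[a]           # importance sampling correction
    return a, is_weight
```

With a zero bias the two distributions coincide and the importance weight is exactly 1, so the correction vanishes when the advisor is silent; a nonzero bias shifts action selection while the recorded weight reweights the gradient back toward the base policy.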
