CarperAI/trlx: v0.4

Summary of release notes:

Along with many improvements to experiment tracking, rollout logging, and configuration flexibility, new highlight features include:

- Support for T5-based student models. See the accompanying example, which shows how to fine-tune a FLAN-T5 model on CNN/DailyMail for summarization.
- Support for parameter-efficient tuning methods. Some of our preliminary results have shown LoRA to be a promising technique for scaling RLHF under low-resource settings, and we hope users get the chance to explore its potential. We've seen a ~30% reduction in memory usage and a ~20% reduction in wallclock time for the same performance (see the quick report).
- Out-of-the-box support for 8-bit Adam(W) optimizers via TimDettmers/bitsandbytes, leading to a 15% decrease in memory allocation in one of our baseline examples (see the related report).

Other interesting examples are in the works, so stay tuned!

What's Changed

- ILQL indicies on wrong device by @cat-state in https://github.com/CarperAI/trlx/pull/105
- Fix ppo ratio inaccuracy by @reciprocated in https://github.com/CarperAI/trlx/pull/108
- Set RNG seeds across multiple dependencies by @jon-tow in https://github.com/CarperAI/trlx/pull/113
- Set seed after default config instantiation by @jon-tow in https://github.com/CarperAI/trlx/pull/114
- Move queries on the device by @reciprocated in https://github.com/CarperAI/trlx/pull/115
- Add ppo randomwalks example by @reciprocated in https://github.com/CarperAI/trlx/pull/119
- Add unit tests to ensure valid example configs by @jon-tow in https://github.com/CarperAI/trlx/pull/120
- updating gptj-config by @Dahoas in https://github.com/CarperAI/trlx/pull/109
- Fix get distributed config by @reciprocated in https://github.com/CarperAI/trlx/pull/122
- Add local rollout logging by @thomfoster in https://github.com/CarperAI/trlx/pull/124
- Add support for more CausalLMs by @jon-tow in https://github.com/CarperAI/trlx/pull/103
- Add hydra head support for GPTNeo by @jon-tow in https://github.com/CarperAI/trlx/pull/126
- Add BloomModel hydra support by @jon-tow in https://github.com/CarperAI/trlx/pull/129
- Simplifying logic to merge configs by @leshanbog in https://github.com/CarperAI/trlx/pull/134
- add: load function for AccelerateRLModel by @dongs0104 in https://github.com/CarperAI/trlx/pull/136
- Add OptimizerConfig and SchedulerConfig by @jon-tow in https://github.com/CarperAI/trlx/pull/135
- Remove incorrect default config settings by @jon-tow in https://github.com/CarperAI/trlx/pull/137
- Update TRL acknowledgement by @osanseviero in https://github.com/CarperAI/trlx/pull/138
- Fix context overflow by @reciprocated in https://github.com/CarperAI/trlx/pull/131
- Fix seeding per process by @reciprocated in https://github.com/CarperAI/trlx/pull/141
- Set device-specific seeding with global rank by @jon-tow in https://github.com/CarperAI/trlx/pull/143
- Freeze hydra model branches by @jon-tow in https://github.com/CarperAI/trlx/pull/140
- Refactor RL model wrapper into a trainer module by @jon-tow in https://github.com/CarperAI/trlx/pull/144
- Logging learning rate by @leshanbog in https://github.com/CarperAI/trlx/pull/147
- Fix instantiating base transformer from a custom config by @reciprocated in https://github.com/CarperAI/trlx/pull/149
- Linear LR scheduler by @leshanbog in https://github.com/CarperAI/trlx/pull/150
- Update pre-commit version and add isort by @jon-tow in https://github.com/CarperAI/trlx/pull/152
- fix: configure flake8, fix errors, add trackers config by @Mistobaan in https://github.com/CarperAI/trlx/pull/157
- Features/use-python-3.8-in-ci by @Mistobaan in https://github.com/CarperAI/trlx/pull/159
- Add bitsandbytes optimizer support by @aicrumb in https://github.com/CarperAI/trlx/pull/133
- initial commit for trlx LORA support by @ethankim00 in https://github.com/CarperAI/trlx/pull/110
- Fix default delta_kwargs handling by @jon-tow in https://github.com/CarperAI/trlx/pull/171
- Add T5 model by @PhungVanDuy in https://github.com/CarperAI/trlx/pull/145
- Fix wandb.errors.RequireError as reported in #162 by @ayulockin in https://github.com/CarperAI/trlx/pull/167
- Update README.md by @LouisCastricato in https://github.com/CarperAI/trlx/pull/180
- Update ILQL details by @reciprocated in https://github.com/CarperAI/trlx/pull/156
- Add OpenAI Summarize RLHF with trlX by @PhungVanDuy in https://github.com/CarperAI/trlx/pull/175
- Fix HuggingFace model.save_pretrained for DDP by @jon-tow in https://github.com/CarperAI/trlx/pull/181
- Update generation utilities by @reciprocated in https://github.com/CarperAI/trlx/pull/172

New Contributors

- @thomfoster made their first contribution in https://github.com/CarperAI/trlx/pull/124
- @leshanbog made their first contribution in https://github.com/CarperAI/trlx/pull/134
- @dongs0104 made their first contribution in https://github.com/CarperAI/trlx/pull/136
- @osanseviero made their first contribution in https://github.com/CarperAI/trlx/pull/138
- @Mistobaan made their first contribution in https://github.com/CarperAI/trlx/pull/157
- @aicrumb made their first contribution in https://github.com/CarperAI/trlx/pull/133
- @ethankim00 made their first contribution in https://github.com/CarperAI/trlx/pull/110
- @PhungVanDuy made their first contribution in https://github.com/CarperAI/trlx/pull/145

Full Changelog: https://github.com/CarperAI/trlx/compare/v0.3...v0.4
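
The memory savings mentioned in the highlights for LoRA and 8-bit Adam(W) can be illustrated with some back-of-envelope arithmetic. The sketch below is not trlx code and does not reproduce the linked reports. It only counts the parameters in a rank-r LoRA update and the bytes needed for Adam's two moment estimates. The function names, the hidden size of 4096, the rank of 8, and the 1B parameter count are all hypothetical values chosen for illustration.

```python
# Back-of-envelope arithmetic only. The sizes used below are hypothetical
# and are not taken from the trlx examples or reports.


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in a rank-`rank` LoRA update B @ A for a d_out x d_in weight."""
    # B has shape (d_out, rank) and A has shape (rank, d_in).
    return rank * (d_out + d_in)


def adam_state_bytes(num_params: int, bits_per_state: int = 32) -> int:
    """Bytes for Adam's two moment estimates, stored at `bits_per_state` bits each."""
    # Ignores the small per-block quantization constants that 8-bit
    # optimizers keep alongside the quantized states.
    return 2 * num_params * bits_per_state // 8


if __name__ == "__main__":
    d = 4096  # hypothetical hidden size of one square projection matrix
    full = d * d
    lora = lora_trainable_params(d, d, rank=8)
    print(f"full matrix: {full:,} params, LoRA r=8: {lora:,} params ({lora / full:.2%})")

    n = 1_000_000_000  # hypothetical count of trainable parameters
    fp32 = adam_state_bytes(n, bits_per_state=32)
    int8 = adam_state_bytes(n, bits_per_state=8)
    print(f"Adam state: {fp32 / 1e9:.0f} GB at 32-bit vs {int8 / 1e9:.0f} GB at 8-bit")
```

These component-level ratios are much larger than the end-to-end figures quoted in the release notes (~30% and 15%), which is expected: model weights, gradients, and activations also occupy memory and are not shrunk by either technique in this simplified accounting.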

Related Organizations

Samsung Electronics (South Korea)
Korea (Republic of)
Samsung (South Korea)
Korea (Republic of)

Research products

EleutherAI/lm-evaluation-harness: v0.3.0 (2022, IsAmongTopNSimilarDocuments)
CarperAI/trlx: v0.5.0: Initial NeMo integration, HH example, and improved Hugging Face integration (2023, IsAmongTopNSimilarDocuments)
EleutherAI/lm-evaluation-harness: lm-eval v0.4.9.2 Release Notes (2025, IsAmongTopNSimilarDocuments)
CarperAI/trlx: v0.7.0: NeMO PPO, PEFT Migration, and Fixes (2023, IsAmongTopNSimilarDocuments)
CarperAI/trlx: v0.7.0: NeMO PPO, PEFT Migration, and Fixes (2023, IsVersionOf)

Impact by BIP!

selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.