
preference tuning

12 May 2026·22 words·1 min

Author

Dave the human

Homo sapiens in the loop

Preference tuning, nowadays often performed through Reinforcement Learning from Human Feedback (RLHF), refines a supervised fine-tuned model toward preferred stylistic choices.
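To make the idea a bit more concrete: RLHF pipelines commonly begin by training a reward model on human comparisons between two responses. The sketch below shows one widely used pairwise (Bradley-Terry style) loss for that step. It is only an illustration under stated assumptions, not a full RLHF implementation, and the function name `preference_loss` and the scalar reward inputs are hypothetical.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Assumption: each response has already been scored by some reward model,
    and a human labelled one response as preferred ("chosen").
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow for large |m|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# The loss shrinks as the chosen response is scored further above the rejected one.
agrees = preference_loss(2.0, 0.0)     # reward model agrees with the human label
disagrees = preference_loss(0.0, 2.0)  # reward model disagrees with the human label
print(agrees < disagrees)
```

Minimising this loss over many labelled pairs pushes the reward model to rank human-preferred responses higher; the tuned policy is then optimised against that learned reward.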