
preference tuning

·22 words·1 min
Dave the human
Homo sapiens in the loop

Preference tuning, nowadays often performed through Reinforcement Learning from Human Feedback (RLHF), refines a supervised fine-tuned model toward preferred stylistic choices.
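
As a rough illustration of what "preferred" can mean in practice, here is a minimal sketch of the pairwise (Bradley-Terry) loss commonly used to train an RLHF reward model on human-ranked response pairs. The post does not specify this setup, so treat it as an assumption; the function name `preference_loss` and the scalar reward inputs are hypothetical and chosen only for this example.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Hypothetical helper for illustration; the rewards stand in for the
    scalar scores a reward model would assign to two candidate responses.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)) equals log(1 + exp(-margin)); branch on the sign
    # of the margin so exp() never receives a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The loss is small when the human-preferred response scores higher,
# and large when the ranking is inverted.
agree = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
disagree = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
```

Minimizing this loss over many ranked pairs pushes the reward model to score preferred responses higher; a policy can then be tuned against that reward.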

