preference tuning
Preference tuning is nowadays often performed through Reinforcement Learning from Human Feedback (RLHF), which refines a model that has already undergone supervised fine-tuning so that its outputs reflect preferred stylistic choices.
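As an illustration of how human preferences enter the process: RLHF pipelines commonly first train a reward model on pairs of responses where people marked one as preferred. The minimal sketch below shows the pairwise (Bradley-Terry style) loss often used for that step, assuming the reward scores are plain floats standing in for a reward model's outputs. The function names are hypothetical, and the subsequent policy-optimization stage (for example PPO) is not shown.

```python
import math


def pairwise_preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Loss for one comparison: -log(sigmoid(r_chosen - r_rejected)).

    Hypothetical helper: r_chosen / r_rejected stand in for reward-model
    scores of the preferred and the rejected response.
    """
    margin = r_chosen - r_rejected
    # Numerically stable softplus(-margin) == -log(sigmoid(margin)),
    # avoiding overflow in exp() for large negative margins.
    return max(0.0, -margin) + math.log1p(math.exp(-abs(margin)))


def mean_preference_loss(pairs: list[tuple[float, float]]) -> float:
    """Average loss over (chosen_score, rejected_score) pairs."""
    return sum(pairwise_preference_loss(c, r) for c, r in pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up scores: the first pair is ranked correctly, the second is not,
    # so the second pair contributes the larger loss.
    pairs = [(1.5, 0.2), (0.3, 0.9)]
    print(mean_preference_loss(pairs))
```

Minimizing this loss pushes the score of the preferred response above that of the rejected one; the trained reward model then provides the signal used to tune the fine-tuned model toward the preferred behaviour.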