reinforcement learning from human feedback

Author
Dave the human
Homo sapiens in the loop

Reinforcement Learning from Human Feedback (RLHF) uses human evaluations or rankings of model outputs as the reward signal. Humans stay in the loop to steer the model toward the behaviors people actually prefer.
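To make the idea of rankings as reward signals concrete, here is a minimal sketch. It assumes the common setup in which human rankings are expanded into pairwise comparisons and scored with a Bradley-Terry style loss. The post does not specify this, so treat it as one plausible instance rather than the definition of RLHF. The names `ranking_to_pairs` and `preference_loss` are hypothetical and chosen for illustration only.

```python
import math


def ranking_to_pairs(ranked_outputs):
    """Expand a human ranking (best first) into (preferred, rejected) pairs."""
    return [
        (ranked_outputs[i], ranked_outputs[j])
        for i in range(len(ranked_outputs))
        for j in range(i + 1, len(ranked_outputs))
    ]


def preference_loss(score_preferred, score_rejected):
    """Bradley-Terry style loss: -log(sigmoid(score_preferred - score_rejected)).

    The scores are assumed to come from some reward model (not shown here).
    The loss is small when the human-preferred output already scores higher.
    """
    margin = score_preferred - score_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# A human ranked three candidate replies, best first.
pairs = ranking_to_pairs(["reply_a", "reply_b", "reply_c"])

# Hypothetical reward-model scores for one preferred/rejected pair.
loss_agrees = preference_loss(2.0, 0.0)     # model agrees with the human
loss_disagrees = preference_loss(0.0, 2.0)  # model disagrees with the human
```

Minimizing a loss of this kind pushes the scoring model to agree with the human ordering. That model's scores can then stand in for direct human judgment during reinforcement learning.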

By contrast, reinforcement learning for reasoning relies on automated or environment-based reward signals, such as checking a final answer against a known solution. These signals are more objective but potentially less aligned with human preferences.
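For comparison, an automated reward can be as simple as a programmatic check with no human involved. The sketch below assumes a task with a single known reference answer and uses exact matching after light normalization. The function name `exact_match_reward` and the normalization rules are illustrative assumptions, not something the post prescribes.

```python
def exact_match_reward(model_answer: str, reference_answer: str) -> float:
    """Return 1.0 if the answers match after normalization, else 0.0."""

    def normalize(text: str) -> str:
        # Lowercase and collapse runs of whitespace so formatting
        # differences do not affect the reward.
        return " ".join(text.strip().lower().split())

    return 1.0 if normalize(model_answer) == normalize(reference_answer) else 0.0


reward_correct = exact_match_reward(" 42 ", "42")
reward_wrong = exact_match_reward("41", "42")
```

A check like this is objective and cheap to run at scale, but it only rewards what it can verify. It says nothing about whether the response is helpful, clear, or otherwise something a person would prefer.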

