reinforcement learning from human feedback

Reinforcement Learning From Human Feedback (or RLHF) involves human evaluations or rankings of model outputs as reward signals; this means that humans are involved in the process to guide the model toward human-preferred behaviors.

This is in contrast with reinforcement learning in the context of reasoning, where the models rely on automated or environment-based reward signals (more objective but potentially less aligned with human preference).