RLHF
Also known as: reinforcement learning from human feedback
Using human preference signals, such as rankings of candidate responses, to align model outputs with policy and tone goals; a common stage in modern LLM post-training stacks.
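For illustration only, a minimal sketch of the pairwise preference (Bradley-Terry style) loss commonly used to train the reward model in an RLHF pipeline; the model, tensor names, and sizes below are hypothetical placeholders, not taken from any particular framework.

    # Illustrative sketch: train a scalar reward head from human preference pairs.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ToyRewardModel(nn.Module):
        """Maps a pooled response embedding to a scalar preference score (placeholder)."""
        def __init__(self, hidden_size: int = 16):
            super().__init__()
            self.score_head = nn.Linear(hidden_size, 1)

        def forward(self, pooled_embedding: torch.Tensor) -> torch.Tensor:
            return self.score_head(pooled_embedding).squeeze(-1)

    # Toy batch: embeddings for human-preferred ("chosen") and dispreferred ("rejected") responses.
    batch, hidden = 4, 16
    chosen = torch.randn(batch, hidden)
    rejected = torch.randn(batch, hidden)

    model = ToyRewardModel(hidden)
    chosen_scores = model(chosen)
    rejected_scores = model(rejected)

    # Bradley-Terry style objective: push the chosen score above the rejected score.
    loss = -F.logsigmoid(chosen_scores - rejected_scores).mean()
    loss.backward()
    print(f"pairwise preference loss: {loss.item():.4f}")

In a full RLHF stack, the trained reward model would then score sampled generations during a reinforcement learning stage (e.g., PPO-based fine-tuning) to steer the policy toward preferred outputs.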
Contact us if you need a term added for a security or procurement review.