Distillation (also called knowledge distillation) transfers the behavior learned by a larger model, including complex reasoning patterns, into a smaller one. In classic deep learning, a smaller “student” model is trained to match the outputs (logits or predicted probabilities) of a larger “teacher” model; for Large Language Models, distillation typically means supervised fine-tuning on a high-quality instruction dataset whose responses were generated by a more capable LLM.
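
To make the student/teacher idea concrete, here is a minimal sketch of a logit-based distillation loss in plain Python. It assumes the common soft-target formulation (temperature-softened softmax, KL divergence, and a T² scaling factor); the function names, the temperature value, and the example logits are illustrative assumptions rather than part of the definition above.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor is an assumption taken from the usual soft-target
    recipe; it keeps the loss scale comparable across temperatures.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        p * (math.log(p) - math.log(q))
        for p, q in zip(p_teacher, p_student)
        if p > 0
    )
    return kl * temperature ** 2


# Hypothetical logits for a 3-class problem.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # disagrees with the teacher

loss_close = distillation_loss(close_student, teacher)
loss_far = distillation_loss(far_student, teacher)
print(loss_close < loss_far)
```

A student whose logits track the teacher's gets a smaller loss than one that disagrees, which is the signal minimized during training. The LLM variant described above usually skips logit matching: the teacher's generated responses become ordinary fine-tuning targets, trained with a standard next-token cross-entropy loss.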