Contents

InstructGPT

NeurIPS 2022 · OpenAI · arXiv:2203.02155

TL;DR

InstructGPT aligns language models with human intent by combining supervised fine-tuning (SFT) with reinforcement learning from human feedback (RLHF), improving instruction following while reducing harmful outputs.

Motivations & Innovations

Approach

Model

GPT-3 pretrained language models.

Training Recipe

assets/images/2026-01-15-15-02-29.webp

Supervised Fine-tuning (SFT)
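In the SFT stage, GPT-3 is fine-tuned on labeler-written demonstrations with the standard next-token cross-entropy objective. A minimal sketch of the per-example loss (the function name and inputs are illustrative, not from the paper):

```python
import math

def sft_loss(token_logprobs):
    """Mean negative log-likelihood of the demonstration tokens.

    token_logprobs: log-probabilities the model assigns to each target
    token of one demonstration (higher is better).
    """
    return -sum(token_logprobs) / len(token_logprobs)
```

Minimizing this loss pushes the model to reproduce the labelers' demonstrated responses token by token.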

Reward Modeling (RM)
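The reward model is trained on labeler rankings of K responses per prompt, using a pairwise loss over all K-choose-2 (winner, loser) pairs: minimize −log σ(r(x, y_w) − r(x, y_l)). A minimal sketch with plain Python (scalar scores stand in for reward-model outputs):

```python
import math
from itertools import combinations

def rm_pairwise_loss(r_chosen, r_rejected):
    # -log sigmoid(r_w - r_l): small when the chosen response outscores the rejected one
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

def rm_loss_for_prompt(rewards_ranked):
    # rewards_ranked: RM scores for one prompt's K responses, ordered best -> worst
    pairs = list(combinations(rewards_ranked, 2))  # every (winner, loser) pair
    return sum(rm_pairwise_loss(w, l) for w, l in pairs) / len(pairs)
```

In the paper all pairs from one prompt are processed in a single batch element, which avoids overfitting from treating correlated pairs as independent examples.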

Reinforcement Learning (RL)
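In the RL stage, the SFT model is optimized with PPO against the reward model, with a per-token KL penalty toward the SFT policy so the optimized policy does not drift into reward-model exploits (the PPO-ptx variant also mixes in pretraining gradients). A minimal sketch of the KL-shaped reward; the beta value here is an illustrative default, not the paper's setting:

```python
def ppo_reward(rm_score, logp_rl, logp_sft, beta=0.02):
    """RM score minus a KL penalty keeping the policy near the SFT model.

    rm_score:  reward-model score for the sampled response
    logp_rl:   log-prob of the response under the current RL policy
    logp_sft:  log-prob of the same response under the frozen SFT policy
    beta:      KL coefficient (illustrative value)
    """
    return rm_score - beta * (logp_rl - logp_sft)
```

When the RL policy assigns much higher probability to a response than the SFT policy does, the penalty grows, discounting reward gained purely by moving away from the SFT distribution.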

Data Recipe

Step 1: Collect demonstration data, and train a supervised policy.

Step 2: Collect comparison data, and train a reward model.

Step 3: Optimize a policy against the reward model using PPO.

Experiments