OpenPipe’s client/server loop for agent trajectories, rewards, and GRPO-style updates.
Agent Reinforcement Trainer (ART)
Reinforcement learning becomes more complex when the “agent” is an LLM. Instead of choosing a simple action such as moving left or right, an LLM agent produces multi-step reasoning traces, tool calls, conversations and plans.
Training such agents requires a system that can collect these trajectories, assign rewards and update the model reliably.
ART (Agent Reinforcement Trainer), built by OpenPipe, provides that system.
It is an open-source framework designed specifically for training agentic LLMs from experience, and it handles the pieces that are difficult to engineer manually.
ART uses a lightweight client that wraps your existing agent with minimal changes. The client communicates with an ART training server, which manages rollouts, reward computation, batching and optimization.
A key feature is ART’s support for Group Relative Policy Optimization (GRPO), an RL algorithm widely used for training LLMs. GRPO allows the model to learn from trajectory-level rewards rather than token-level labels, which is essential for improving behaviors like planning, correction and tool use.
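To make "trajectory-level rewards" concrete, here is a minimal sketch of the group-relative scoring step that gives GRPO its name. Several rollouts of the same task are scored once each, and every reward is normalized against the mean and standard deviation of its group. The function name, the `eps` constant and the example rewards are illustrative assumptions. This is not ART's internal code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group's mean and spread.

    `eps` (an assumed constant) avoids division by zero when every
    rollout in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts of the same task, each scored once at the end of the episode.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Rollouts that beat their group's average get a positive advantage and the rest get a negative one, so no per-token label is needed. A group whose rewards are all identical yields all-zero advantages and therefore contributes no learning signal.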
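The workflow is a loop: the agent, wrapped by the ART client, produces trajectories; the server computes rewards, batches the results and applies a GRPO update; the updated model then generates the next round of trajectories.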
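The loop can be sketched end to end with a toy stand-in for the LLM. Everything here is hypothetical and chosen for illustration: `Trajectory`, `ToyPolicy`, `rollout` and `train_step` are not ART's API, the "model" is a two-action softmax, and the update is a plain policy-gradient step using group-normalized advantages rather than a full GRPO implementation.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """One recorded episode: the steps taken and a single final reward (hypothetical type)."""
    steps: list = field(default_factory=list)
    reward: float = 0.0


class ToyPolicy:
    """Stand-in for the LLM: a softmax over two canned actions."""
    ACTIONS = ("call_tool", "guess")

    def __init__(self):
        self.logits = [0.0, 0.0]

    def probs(self):
        exps = [math.exp(x) for x in self.logits]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self, rng):
        return rng.choices(range(len(self.ACTIONS)), weights=self.probs())[0]


def rollout(policy, rng):
    """Client side: run the agent once, record what it did, and score the episode."""
    action = policy.sample(rng)
    reward = 1.0 if policy.ACTIONS[action] == "call_tool" else 0.0  # toy reward rule
    return Trajectory(steps=[action], reward=reward)


def train_step(policy, group, lr=0.5):
    """Server side: group-relative advantages, then a policy-gradient update."""
    rewards = [t.reward for t in group]
    mu = sum(rewards) / len(rewards)
    sigma = math.sqrt(sum((r - mu) ** 2 for r in rewards) / len(rewards))
    p = policy.probs()  # probabilities under the policy that produced the group
    for t in group:
        advantage = (t.reward - mu) / (sigma + 1e-6)
        action = t.steps[0]
        for i in range(len(policy.logits)):
            # Gradient of log softmax w.r.t. logit i for the sampled action.
            grad = (1.0 if i == action else 0.0) - p[i]
            policy.logits[i] += lr * advantage * grad / len(group)


rng = random.Random(0)
policy = ToyPolicy()
for _ in range(20):
    group = [rollout(policy, rng) for _ in range(8)]  # collect a group of trajectories
    train_step(policy, group)                         # update from their rewards
```

After a few iterations the toy policy shifts probability toward the rewarded action. In a real setup the rollout would be a full agent episode and the update would run on the training server, but the collect, score, update cycle has the same shape.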
By handling rollout execution, reward processing and policy optimization, ART lets developers focus on designing effective reward signals and agent strategies rather than building RL infrastructure.