The Triangle of Correction: How Expert Annotators Generate Better DPO Pairs
Standard DPO data has two elements: a chosen response and a rejected response. The model learns to prefer one over the other. Simple. Effective. Limited.
The Triangle of Correction adds a third element that changes how models learn from preference data. It’s a format in which every training pair carries an explicit annotation of what went wrong in the rejected response, so the pair delivers a richer learning signal than a bare preference label.
The Three Points
Every Triangle of Correction has three components: Drift, Point, and Reframe.
Drift is the rejected response. But it’s not just any bad response. It’s a response that demonstrates a specific, identifiable cognitive pattern — a deviation from optimal reasoning. Maybe it’s reactive. Maybe it’s avoidant. Maybe it’s overconfident. The drift has a direction, and that direction matters.
Point is the annotation. A single sentence, sometimes two, that identifies exactly what happened in the drift. Not a judgment. Not a correction. Just a precise identification of the cognitive pattern. “The response collapsed uncertainty into false confidence.” “The response avoided the difficult part of the question.” “The response became reactive to emotional content.”
Reframe is the chosen response. But it’s not just a better answer to the same question. It’s a response that demonstrates what appropriate engagement looks like given the specific cognitive pattern identified in the point.
This three-part structure creates a learning signal that standard chosen/rejected pairs can’t match.
Why the Third Element Matters
In standard DPO, the model learns that response A is better than response B. But it doesn’t learn why. The gradient pushes the model away from B and toward A, but the model has to figure out what differentiates them on its own.
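To make that concrete, here is a minimal sketch of the standard per-example DPO objective, written in plain Python over sequence log-probabilities. The function name, argument names, and the default beta are illustrative choices, not taken from any particular library. Notice that the only inputs are log-probabilities of the two responses: nothing in the loss says which property of the rejected response made it worse.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    beta=0.1 is an illustrative default, not a recommendation.
    """
    # How much more the policy favors chosen over rejected,
    # measured relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)), computed in a numerically stable way.
    if z > 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

# Policy identical to the reference: margin is zero, loss is log(2).
baseline = dpo_loss(-10.0, -10.0, -10.0, -10.0)
# Policy has shifted toward the chosen response: the loss drops.
improved = dpo_loss(-8.0, -12.0, -10.0, -10.0)
```

The loss falls as the margin grows, which is the "push toward A, away from B" described above. Whatever distinguishes A from B has to be inferred from many such pairs.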
The Point element changes this. By explicitly naming the cognitive pattern in the drift, the annotation creates a conceptual bridge between the rejected and chosen responses. The model doesn’t just learn preference. It learns the specific dimension along which the correction operates.
Think of it this way. Standard DPO is like showing someone two paintings and saying “this one is better.” The Triangle of Correction is like saying “this painting lacks depth in the foreground — here’s one that handles it well.” The learner extracts far more from the second form of feedback.
How Expert Annotators Generate These
This approach requires annotators trained to identify cognitive patterns with precision. Not the content of responses, but the structural patterns of reasoning. Reactivity. Aversion. Risk aversion. Attention narrowing. Overconfidence. Uncertainty avoidance.
When an expert annotator evaluates an AI response, they don’t just assess whether it’s good or bad. They identify what the response is doing cognitively. Is it contracting around certainty when uncertainty would be more appropriate? Is it expanding into abstraction when concreteness is needed? Is it avoiding emotional content by retreating into technical language?
These observations become the Point element. And because expert annotators can identify these patterns with specificity, the resulting annotations are far more informative than standard quality judgments.
A typical annotator might say: “Response B is more helpful.” An expert annotator says: “Response A demonstrates cognitive overreach around the user’s emotional state, producing premature solutions instead of allowing space for the problem to be fully articulated.”
The specificity of the second annotation creates a dramatically richer training signal.
Data Format
Each Triangle of Correction is stored as a structured object with these fields:
context: The prompt or conversation history that generated the responses.
drift: The rejected response, tagged with the primary cognitive pattern it exhibits (from a taxonomy of ~30 patterns we’ve developed).
point: One to two sentences identifying the specific drift pattern. Written in neutral, observational language. No judgment, no prescription.
reframe: The chosen response, demonstrating appropriate engagement given the identified drift.
dimensions: Multi-dimensional scores across five axes: awareness, stability, proportionality, integration, and precision.
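As a rough sketch, the fields above could be represented like this in Python. The class name, the separate drift_pattern field for the tag, the pattern label, and the 0-to-1 score scale are all assumptions made for illustration, since the format description does not pin them down. The context, drift, and reframe texts are invented examples; the point sentence is one of the examples quoted earlier.

```python
from dataclasses import dataclass
from typing import Dict

# The five scoring axes named in the format description.
AXES = ("awareness", "stability", "proportionality", "integration", "precision")

@dataclass(frozen=True)
class Triangle:
    context: str                  # prompt or conversation history
    drift: str                    # rejected response
    drift_pattern: str            # hypothetical field holding the taxonomy tag
    point: str                    # one to two neutral, observational sentences
    reframe: str                  # chosen response
    dimensions: Dict[str, float]  # one score per axis (scale is an assumption)

    def __post_init__(self):
        # Require exactly the five named axes, no more and no fewer.
        if set(self.dimensions) != set(AXES):
            raise ValueError(f"dimensions must contain exactly: {AXES}")

example = Triangle(
    context="Will our seed round close by March?",
    drift="Yes, it will definitely close by March.",
    drift_pattern="false_confidence",  # illustrative label, not from the real taxonomy
    point="The response collapsed uncertainty into false confidence.",
    reframe=("I can't know that from here. Timing depends on where the term "
             "sheets stand. What stage are those conversations at?"),
    dimensions={axis: 0.5 for axis in AXES},  # placeholder scores
)
```

Validating the axes at construction time is a design choice: it catches annotation records with missing or misspelled dimensions before they reach a training run.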
This format is compatible with standard DPO training: you can use just the drift/reframe pair as the rejected/chosen pair. But the full triangle enables richer training approaches. Some teams are experimenting with using the point element as an auxiliary loss signal, training the model to also predict what was wrong with the rejected response.
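Both uses can be sketched in a few lines of Python. The prompt/chosen/rejected keys follow a common convention for preference datasets, while the wording of the auxiliary prompt, the function names, and the aux_weight value are assumptions for illustration rather than a description of any team's actual setup.

```python
def to_dpo_record(triangle):
    """Drop the point and keep a plain preference pair."""
    return {
        "prompt": triangle["context"],
        "chosen": triangle["reframe"],   # reframe is the chosen response
        "rejected": triangle["drift"],   # drift is the rejected response
    }

def to_point_record(triangle):
    """Auxiliary example: given context and drift, predict the point."""
    prompt = (
        f"{triangle['context']}\n\n"
        f"Response:\n{triangle['drift']}\n\n"
        "What pattern does this response show?"  # hypothetical instruction text
    )
    return {"prompt": prompt, "completion": triangle["point"]}

def combined_loss(dpo_loss_value, point_nll, aux_weight=0.1):
    """Weighted sum of the preference loss and the point-prediction loss."""
    return dpo_loss_value + aux_weight * point_nll

# Invented example triangle; only the point sentence comes from the article.
triangle = {
    "context": "Will our seed round close by March?",
    "drift": "Yes, it will definitely close by March.",
    "point": "The response collapsed uncertainty into false confidence.",
    "reframe": "I can't know that from here. What stage are the term sheets at?",
}

dpo_record = to_dpo_record(triangle)
point_record = to_point_record(triangle)
```

Keeping the two records separate means the same annotated data can feed an unmodified DPO pipeline today and an auxiliary-loss experiment later.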
Results So Far
Early experiments show that models trained on Triangle of Correction data improve in a more targeted way than models trained on standard DPO pairs. Instead of broadly shifting toward “better” responses, they show specific improvement on the cognitive dimensions that were annotated.
A model trained on triangles annotated primarily for reactivity patterns shows reduced reactivity without losing engagement. A model trained on triangles annotated for false confidence shows better calibrated uncertainty without becoming excessively hedged.
The specificity of the training signal produces specific behavioral change. That’s the power of the third element.
Standard DPO is a blunt instrument. The Triangle of Correction is a scalpel. Both have their uses. But when you need precision alignment, meaning targeted modification of specific cognitive patterns, the triangle format outperforms standard pairs.
Learn more about the Triangle of Correction format at Laeka Research.