sharpbyte.dev
Prompt engineering · topic 6 of 8

Attentive Reasoning Queries (ARQ)

Structured queries (often JSON) replace improvisation; Parlant is one implementation story.

Bonus: ARQ

Here’s the core problem with current techniques that this new approach solves.

There is enough research to conclude that LLMs often struggle to assess what truly matters at a particular stage of a long, multi-turn conversation.

For instance, when you give agents a 2,000-word system prompt filled with policies, tone rules, and behavioral dos and don’ts, you expect them to follow it word for word.

But here’s what actually happens:

  • They start strong.
  • Soon, they drift and start hallucinating.
  • Shortly after, they forget what was said five turns ago.

And finally, the LLM that was supposed to “never promise a refund” is happily offering one.

This means they can easily ignore crucial rules (stated initially) halfway through the process.

We might expect techniques like Chain-of-Thought (CoT) to help.

Attentive Reasoning Queries (ARQs)

But even with methods like CoT, reasoning remains free-form, i.e., the model “thinks aloud,” but you have limited domain-specific control over what it thinks about.

That’s the exact problem the new technique, called Attentive Reasoning Queries (ARQs), solves.

Instead of letting LLMs reason freely, ARQs guide them through explicit, domain-specific questions.

Essentially, each reasoning step is encoded as a targeted query inside a JSON schema.

For example, before making a recommendation or deciding on a tool call, the LLM is prompted to fill structured keys like:

Structured ARQ reasoning from the deck.
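As a rough illustration of what such structured keys could look like, here is a minimal sketch. The key names, the `ARQ_TEMPLATE` dict, and the `build_arq_prompt` helper are all hypothetical, made up for this example around the “never promise a refund” rule from earlier; they are not taken from the deck or from any library.

```python
import json

# Hypothetical ARQ template: every key is a targeted question the model must
# answer before it is allowed to write the customer-facing reply.
ARQ_TEMPLATE = {
    "customer_request": "<restate what the customer is asking for>",
    "applicable_rules": ["<list each policy rule that applies right now>"],
    "refund_promise_allowed": "<true or false, based on the rules above>",
    "tool_call_needed": "<true or false>",
    "planned_reply_summary": "<one sentence describing the reply>",
}


def build_arq_prompt(policy: str, conversation: list[str]) -> str:
    """Assemble a prompt that asks the model to fill in the ARQ keys as JSON."""
    history = "\n".join(conversation)
    return (
        f"Policy:\n{policy}\n\n"
        f"Conversation so far:\n{history}\n\n"
        "Before replying, fill in every key of this JSON object and return "
        "only the JSON:\n"
        f"{json.dumps(ARQ_TEMPLATE, indent=2)}"
    )


prompt = build_arq_prompt(
    policy="Never promise a refund.",
    conversation=["Customer: My order arrived broken. Can I get my money back?"],
)
print(prompt)
```

Because the policy is restated inside the query on every turn, the model has to look at it again right before it answers, instead of relying on a system prompt from many turns ago.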

What this type of query does

This type of query does two things:

  • It reinstates critical instructions, which keeps the LLM aligned mid-conversation.
  • It facilitates intermediate reasoning, so that decisions are auditable and verifiable.

By the time the LLM generates the final response, it has already walked through a sequence of controlled reasoning steps, none of which involved free-text exploration (unlike techniques such as CoT or Tree-of-Thought, ToT).
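One practical upside of structured reasoning is that it can be checked by ordinary code before the reply goes out. The sketch below assumes the model returned its filled-in query as a JSON string; the key names and the `audit_arq` helper are hypothetical, chosen only to illustrate the idea.

```python
import json

# Hypothetical keys that must be present in every filled-in query.
REQUIRED_KEYS = {"applicable_rules", "refund_promise_allowed", "planned_reply_summary"}


def audit_arq(raw_json: str) -> list[str]:
    """Return a list of problems found in a filled-in ARQ; empty means it passed."""
    try:
        answers = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(answers, dict):
        return ["top-level value is not a JSON object"]

    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(answers))]
    if "applicable_rules" in answers and not answers["applicable_rules"]:
        problems.append("no rules were restated")
    if answers.get("refund_promise_allowed") is True:
        problems.append("reasoning would allow a refund promise")
    return problems


sample = json.dumps({
    "applicable_rules": ["Never promise a refund."],
    "refund_promise_allowed": False,
    "planned_reply_summary": "Apologize and offer to open a support ticket.",
})
print(audit_arq(sample))
```

A free-form CoT trace would need another model (or a human) to judge it; here a few lines of code can flag a reasoning step that breaks policy.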

Example of structured keys / inline query.
Alignment and auditability emphasis in the slides.

Success rate across 87 test scenarios

Here’s the success rate across 87 test scenarios:

  • ARQ — 90.2%
  • CoT reasoning — 86.1%
  • Direct response generation — 81.5%
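To put those numbers side by side, here is a small worked calculation. It uses only the three success rates listed above; treating the failure rate as 100 minus the success rate is an assumption of this sketch, not something the deck states.

```python
# Success rates reported in the list above (percent).
rates = {"ARQ": 90.2, "CoT": 86.1, "Direct": 81.5}

gains = {}
for baseline in ("CoT", "Direct"):
    # Absolute gain in percentage points.
    point_gain = rates["ARQ"] - rates[baseline]
    # Share of the baseline's failures that ARQ removes
    # (assumes failure rate = 100 - success rate).
    failure_cut = point_gain / (100 - rates[baseline]) * 100
    gains[baseline] = (round(point_gain, 1), round(failure_cut, 1))
    print(f"ARQ vs {baseline}: +{point_gain:.1f} points, {failure_cut:.1f}% fewer failures")
```

The point gain looks modest, but measured against how often each baseline fails, the difference is larger.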

Parlant and why structure wins

This approach is implemented in Parlant, a recently trending open-source framework for building instruction-following agents.

ARQs are integrated into three key modules:

  • Guideline proposer to decide which behavioral rules apply.
  • Tool caller to determine what external functions to use.
  • Message generator to produce the final customer-facing reply.
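To make the three-stage flow concrete, here is a minimal sketch in plain Python. It is not Parlant’s API: the stage names mirror the list above, but `STAGE_QUERIES`, `run_stage`, `run_pipeline`, the key names, and the `fake_llm` stand-in are all assumptions made for illustration.

```python
import json
from typing import Callable

# Hypothetical ARQ key sets, one per stage; not taken from Parlant's code.
STAGE_QUERIES = {
    "guideline_proposer": ["customer_intent", "applicable_guidelines"],
    "tool_caller": ["information_still_missing", "tool_to_call"],
    "message_generator": ["rules_to_respect", "final_reply"],
}


def run_stage(stage: str, context: dict, llm: Callable[[str], str]) -> dict:
    """Ask the model to fill this stage's keys and return them as a dict."""
    template = {key: "<fill in>" for key in STAGE_QUERIES[stage]}
    prompt = (
        f"Context:\n{json.dumps(context, indent=2)}\n\n"
        f"Fill in every key and return only JSON:\n{json.dumps(template, indent=2)}"
    )
    answers = json.loads(llm(prompt))
    missing = [key for key in STAGE_QUERIES[stage] if key not in answers]
    if missing:
        raise ValueError(f"{stage} left keys unanswered: {missing}")
    return answers


def run_pipeline(user_message: str, llm: Callable[[str], str]) -> dict:
    """Run the three stages in order, feeding each stage's answers to the next."""
    context = {"user_message": user_message}
    for stage in STAGE_QUERIES:
        context[stage] = run_stage(stage, context, llm)
    return context


def fake_llm(prompt: str) -> str:
    """Stand-in model: answers every requested key with a placeholder string."""
    template = json.loads(prompt.rsplit("return only JSON:\n", 1)[1])
    return json.dumps({key: f"answer for {key}" for key in template})


trace = run_pipeline("Can I get a refund?", fake_llm)
print(json.dumps(trace, indent=2))
```

Swapping `fake_llm` for a real model call would leave the rest unchanged, and the returned `trace` keeps every stage’s answers for later inspection.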

The core insight applies regardless of what tools you use:

When you make reasoning explicit, measurable, and domain-aware, LLMs stop improvising and start reasoning with intention. Free-form thinking sounds powerful, but in high-stakes or multi-turn scenarios, structure always wins.

ARQ solves the problem of uncontrolled reasoning by adding structure.

But there’s another challenge: many aligned LLMs stop exploring alternative answers altogether.

Even with good reasoning steps, the model may collapse into the same safe, typical responses.

To regain that lost diversity without retraining the model, we use Verbalized Sampling.

Parlant modules described in the PDF.

Key takeaways

  • ARQ encodes steps as structured, domain-aware queries.
  • Aims for mid-chat alignment plus auditable intermediate decisions.
  • The deck cites benchmark gains; replicate them on your own eval harness before relying on them.
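If you want to check the numbers on your own scenarios, a harness can be very small. The sketch below is one possible shape under stated assumptions: the `Scenario` type, the two toy scenarios, and `toy_agent` are all hypothetical placeholders, and a real run would plug in your ARQ, CoT, and direct-response agents in turn.

```python
from typing import Callable

# Hypothetical scenario format: a user message plus a pass/fail check on the reply.
Scenario = tuple[str, Callable[[str], bool]]


def success_rate(agent: Callable[[str], str], scenarios: list[Scenario]) -> float:
    """Return the percentage of scenarios whose reply passes its check."""
    if not scenarios:
        raise ValueError("need at least one scenario")
    passed = sum(1 for message, check in scenarios if check(agent(message)))
    return 100 * passed / len(scenarios)


scenarios: list[Scenario] = [
    ("Can I get my money back?", lambda reply: "refund" not in reply.lower()),
    ("Where is my order?", lambda reply: "order" in reply.lower()),
]


def toy_agent(message: str) -> str:
    """Placeholder agent; swap in a call to your real ARQ, CoT, or direct agent."""
    return "I have opened a ticket about your order."


print(f"{success_rate(toy_agent, scenarios):.1f}%")
```

Running the same scenario list against each prompting style gives you success rates you can compare directly with the ones quoted above.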