Structured output

Structured and Constrained LLM Output¶

Large language models produce fluent text, but without additional constraints they are free to output anything — extra prose, invalid JSON, or label names that don't match your schema. Constrained decoding solves this by modifying token sampling at inference time: a constraint is compiled into per-step logit masks so that only tokens consistent with the constraint can ever be sampled.
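
The masking step described above can be illustrated with a toy example. This is a minimal sketch rather than Ludwig's implementation: the vocabulary, the logit values, and the helper name `masked_argmax` are made up for illustration, and it assumes each label is a single token.

```python
import math

def masked_argmax(logits, allowed):
    """Pick the highest-scoring token among those the constraint allows."""
    # Disallowed tokens get -inf, so they can never win the argmax.
    masked = [score if i in allowed else -math.inf for i, score in enumerate(logits)]
    return max(range(len(masked)), key=masked.__getitem__)

# Toy vocabulary and one step of made-up logits (illustrative values only).
vocab = ["The", "positive", "negative", "neutral", "."]
logits = [3.2, 2.1, 0.4, 1.0, 0.3]

# A constraint such as (positive|negative|neutral) allows only label tokens here.
allowed = {vocab.index(t) for t in ("positive", "negative", "neutral")}

unconstrained = vocab[max(range(len(logits)), key=logits.__getitem__)]
constrained = vocab[masked_argmax(logits, allowed)]
print(unconstrained, constrained)  # The positive
```

Without the mask, the highest-scoring token starts a prose answer; with the mask, the best allowed label wins even though its raw score is lower.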

Ludwig supports three types of output constraints, all configured in the output feature's decoder block:

| Constraint | Config key | Use case |
| --- | --- | --- |
| JSON schema | `decoder.json_schema` | Structured extraction, tool-call responses |
| Regular expression | `decoder.regex` | Classification, fixed-format fields |
| EBNF grammar | `decoder.grammar` | Complex structured formats |

JSON schema constraints¶

A JSON schema constraint guarantees that the model's output is valid JSON matching a specific structure. Ludwig compiles the schema into token masks that are applied at every sampling step, so the model cannot produce malformed JSON, extra keys, or values outside an enum.

Configuration¶

model_type: llm
base_model: microsoft/phi-2

prompt:
  task: >
    Extract the named entities from the input text and return them as a JSON
    object with this structure:
    {"entities": [{"text": "...", "type": "PERSON|ORG|LOC|DATE"}]}.
    Return only valid JSON, nothing else.

input_features:
  - name: text
    type: text

output_features:
  - name: output
    type: text
    decoder:
      type: text_parser
      json_schema:
        type: object
        properties:
          entities:
            type: array
            items:
              type: object
              properties:
                text:
                  type: string
                type:
                  type: string
                  enum: [PERSON, ORG, LOC, DATE]
              required: [text, type]
        required: [entities]
        additionalProperties: false

generation:
  max_new_tokens: 200
  temperature: 0.1
  do_sample: false

backend:
  type: local

Example¶

Given the input:

Apple Inc. was founded by Steve Jobs in Cupertino, California on April 1, 1976.

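A typical constrained output looks like this (which entities are found depends on the model; the structure is what the schema guarantees):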

{"entities": [
  {"text": "Apple Inc.", "type": "ORG"},
  {"text": "Steve Jobs", "type": "PERSON"},
  {"text": "Cupertino", "type": "LOC"},
  {"text": "California", "type": "LOC"},
  {"text": "April 1, 1976", "type": "DATE"}
]}

As long as generation is not cut off by `max_new_tokens`, the output is valid JSON that matches the schema: it cannot contain extra top-level keys or `type` values outside the enum.
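
If you want a defensive check downstream (for example, to catch output truncated by the token limit), the shape required by the schema above can be verified with the standard library. This is a hand-written sketch for this one schema, not a general JSON Schema validator; `matches_entity_schema` is an illustrative name, not a Ludwig function.

```python
import json

ALLOWED_TYPES = ("PERSON", "ORG", "LOC", "DATE")

def matches_entity_schema(raw: str) -> bool:
    """Return True if raw parses as JSON and has the shape the schema describes."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError:
        return False
    # Top level: exactly one key, "entities" (additionalProperties: false).
    if not isinstance(doc, dict) or set(doc) != {"entities"}:
        return False
    if not isinstance(doc["entities"], list):
        return False
    for item in doc["entities"]:
        # Each entity needs a string "text" and a "type" from the enum.
        if not isinstance(item, dict) or not {"text", "type"} <= set(item):
            return False
        if not isinstance(item["text"], str) or item["type"] not in ALLOWED_TYPES:
            return False
    return True

good = '{"entities": [{"text": "Steve Jobs", "type": "PERSON"}]}'
bad_enum = '{"entities": [{"text": "Apple Inc.", "type": "COMPANY"}]}'
with_prose = 'Here are the entities: {"entities": []}'
truncated = '{"entities": [{"text": "Cupert'

for sample in (good, bad_enum, with_prose, truncated):
    print(matches_entity_schema(sample))
```

Only the first sample passes; the others fail on the enum, on the leading prose, and on the incomplete JSON respectively.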

Regex constraints¶

A regex constraint restricts the output to strings that match the pattern. This is ideal for classification tasks where the valid outputs form a small, known set.

Configuration¶

model_type: llm
base_model: Qwen/Qwen2-0.5B-Instruct

prompt:
  task: >
    Classify the sentiment of the following text.
    Respond with exactly one word: positive, negative, or neutral.

input_features:
  - name: text
    type: text

output_features:
  - name: sentiment
    type: text
    decoder:
      type: text_parser
      regex: "(positive|negative|neutral)"

generation:
  max_new_tokens: 10
  temperature: 0.0
  do_sample: false

backend:
  type: local

Example¶

| Input | Unconstrained output | Constrained output |
| --- | --- | --- |
| `"I loved this product!"` | `"The sentiment is positive."` | `positive` |
| `"The service was terrible."` | `"Negative sentiment."` | `negative` |
| `"The movie was okay."` | `"The text expresses neutral or slightly negative..."` | `neutral` |

Without a constraint the model often prepends or appends prose. The regex guarantees a single clean label.
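
To see which strings the pattern admits, you can test candidate outputs with Python's `re` module. The sketch below assumes the constraint applies to the whole output (a full match), which is how the example above behaves; the sample strings are taken from the table.

```python
import re

LABEL = re.compile(r"(positive|negative|neutral)")

outputs = [
    "The sentiment is positive.",
    "Negative sentiment.",
    "positive",
    "neutral",
]

# fullmatch requires the entire string to match, so surrounding prose is rejected.
clean = [text for text in outputs if LABEL.fullmatch(text)]
print(clean)  # ['positive', 'neutral']
```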

Grammar constraints¶

For more complex formats that can't be captured by a single regex, Ludwig supports EBNF grammars:

output_features:
  - name: output
    type: text
    decoder:
      type: text_parser
      grammar: |
        root   ::= object
        object ::= "{" members "}"
        members ::= pair ("," pair)*
        pair   ::= string ":" value
        value  ::= string | number | "true" | "false" | "null"
        string ::= "\"" [^"]* "\""
        number ::= [0-9]+

Grammar constraints are more expressive than regexes but compile more slowly. Use them when you need recursive structures that can't be expressed as a regular language.
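
To make concrete which strings the example grammar admits, here is a small hand-written recognizer for it. It is an illustrative sketch only and is unrelated to how Ludwig compiles grammars; the function name `accepts` is made up. Note that the grammar as written has no whitespace rule, and `value` does not reference `object`, so nested objects are rejected.

```python
def accepts(text: str) -> bool:
    """Return True if text is derivable from `root` in the grammar above."""
    pos = 0

    def literal(s: str) -> bool:
        nonlocal pos
        if text.startswith(s, pos):
            pos += len(s)
            return True
        return False

    def string() -> bool:  # string ::= "\"" [^"]* "\""
        nonlocal pos
        if not text.startswith('"', pos):
            return False
        end = text.find('"', pos + 1)
        if end == -1:
            return False  # no closing quote; nothing is consumed
        pos = end + 1
        return True

    def number() -> bool:  # number ::= [0-9]+
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] in "0123456789":
            pos += 1
        return pos > start

    def value() -> bool:
        return (string() or number() or literal("true")
                or literal("false") or literal("null"))

    def pair() -> bool:  # pair ::= string ":" value
        return string() and literal(":") and value()

    def members() -> bool:  # members ::= pair ("," pair)*
        if not pair():
            return False
        while literal(","):
            if not pair():
                return False
        return True

    # root ::= object, and the whole input must be consumed.
    return literal("{") and members() and literal("}") and pos == len(text)

print(accepts('{"a":1,"b":"x"}'))  # True
print(accepts('{"a": 1}'))         # False: the grammar has no whitespace rule
print(accepts('{"a":{"b":1}}'))    # False: value does not include object
```

Adding `object` as an alternative of `value` would make the grammar recursive and allow nesting.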

Logits extraction¶

Ludwig can return the raw logits (pre-softmax vocabulary scores) for each generated token alongside the prediction. Logits are useful for:

- Token-level confidence scoring
- Calibration and uncertainty quantification
- Downstream reranking or ensemble methods

Enable logits collection with `output_logits: true` in the output feature:

output_features:
  - name: response
    type: text
    output_logits: true

Then call `predict` with `collect_predictions=True`:

from ludwig.api import LudwigModel
import pandas as pd

model = LudwigModel(config="config.yaml")
preds, output_df, _ = model.predict(
    dataset=pd.DataFrame({"text": ["Is the sky blue?"]}),
    collect_predictions=True,
)

print(preds["response_predictions"].iloc[0])
# -> "Yes"

logits = output_df["response_logits"].iloc[0]
# logits is a 2D array of shape (num_generated_tokens, vocab_size)
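
One way to turn such logits into token-level confidence scores is a row-wise softmax. The sketch below uses plain Python lists and made-up numbers in place of the real array; `token_confidences` and `token_ids` are illustrative names, not part of Ludwig, and it assumes you have the ID of each generated token available.

```python
import math

def token_confidences(logits, token_ids):
    """Softmax each row of logits and return the probability of the chosen token."""
    confidences = []
    for row, token_id in zip(logits, token_ids):
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(score - peak) for score in row]
        confidences.append(exps[token_id] / sum(exps))
    return confidences

# Made-up logits for 2 generated tokens over a 3-token vocabulary.
logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
token_ids = [0, 2]

probs = token_confidences(logits, token_ids)
print([round(p, 3) for p in probs])  # [0.787, 0.333]
```

The first token stands out from its alternatives and gets a high score; the second row is flat, so the chosen token gets only the uniform probability of one third.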

Combining with fine-tuning¶

Constrained decoding works with both zero-shot inference and fine-tuned models. For a fine-tuned model, add the constraint to the output feature decoder in the same config used for prediction:

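import pandas as pd
from ludwig.api import LudwigModel
model = LudwigModel.load("/path/to/finetuned_model")
# Override the decoder config at prediction time if needed
df = pd.DataFrame({"text": ["Is the sky blue?"]})  # placeholder input
preds, _, _ = model.predict(dataset=df)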

Performance considerations¶

- JSON schema constraints carry a small overhead per token (~5–10 ms on CPU) because the schema automaton must be advanced at each step. The overhead is negligible for GPU-based inference.
- Regex constraints are fast, typically under 1 ms per token.
- Grammar constraints have a higher compile cost at startup. Cache the compiled grammar across requests.
- Constrained decoding is compatible with vLLM serving (see Serving Ludwig LLMs with vLLM).
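
The advice to cache compiled grammars can be sketched with `functools.lru_cache` in the serving code. Here `compile_grammar` is a hypothetical stand-in for whatever expensive compile step your stack performs; it is not a Ludwig function, and the counter exists only to show that the compile runs once.

```python
from functools import lru_cache

compile_calls = 0

def compile_grammar(grammar: str) -> tuple:
    """Hypothetical stand-in for an expensive grammar compiler."""
    global compile_calls
    compile_calls += 1
    return tuple(line.strip() for line in grammar.splitlines() if line.strip())

@lru_cache(maxsize=32)
def get_compiled(grammar: str) -> tuple:
    """Compile each distinct grammar string once and reuse the result."""
    return compile_grammar(grammar)

GRAMMAR = 'root ::= "yes" | "no"'
first = get_compiled(GRAMMAR)   # compiles
second = get_compiled(GRAMMAR)  # served from the cache
print(compile_calls)  # 1
```

Keying the cache on the grammar text means two requests with the same grammar share one compiled object, while a changed grammar triggers a fresh compile.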

Interactive notebook¶

An interactive walkthrough of all examples above is available in the Ludwig examples repository:

The notebook includes a side-by-side comparison of constrained vs unconstrained outputs and works on a free Colab T4 GPU.
