
Add LogitProcessor interface for pre-sampling logit transforms (#19517)

Merged

meta-codesync[bot] merged 1 commit into main from export-D104767967 on May 15, 2026

Conversation

@kirklandsign
Contributor

@kirklandsign kirklandsign commented May 12, 2026

Summary:

Introduces a LogitProcessor abstract interface that lets callers mutate
logits in place between the model forward pass and the sampler. Enables
grammar-constrained decoding, logit biasing, repetition penalties, and
similar transforms without touching the core generation loop.

Interface (extension/llm/sampler/logit_processor.h, ~15 lines):

  • Single virtual method process(::executorch::aten::Tensor logits) that
    returns Error::Ok or aborts the chain on a non-Ok return.
  • Tensor passed by value (handle-typed ATen idiom; mutations propagate
    through the shared underlying buffer).
  • Each implementation declares its own dtype expectations -- the chain
    runner does not cast or copy the tensor. Typical implementations check
    logits.scalar_type() and either dispatch to a kernel or return
    InvalidArgument.
  • Tensor shape contract (rank 2 = [batch, vocab], rank 3 =
    [batch, seq, vocab] advanced to last sequence position) mirrors
    sample_from_logits.

Wiring (extension/llm/runner/text_token_generator.h):

  • New public methods add_logit_processor, clear_logit_processors,
    num_logit_processors.
  • Inside generate(), between step() and logits_to_token(), the loop
    invokes each registered processor:
      for (auto& processor : logit_processors_) {
        ET_CHECK_OK_OR_RETURN_ERROR(processor->process(logits_tensor));
      }
  • Empty chain is the existing fast path; no behavior change for callers
    that don't register processors.

Configure processors before calling generate() -- concurrent
modification during generation is not safe.
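The interface and wiring above can be sketched without the executorch headers. This is a dependency-free stand-in, not the shipped code: the real interface takes an `::executorch::aten::Tensor` and returns `Error`, simplified here to a raw float span and a bool; `TokenBanProcessor` and `apply_chain` are hypothetical names.

```cpp
#include <cstddef>
#include <limits>
#include <memory>
#include <vector>

// Simplified stand-in for the real interface (Tensor in, Error out).
class LogitProcessor {
 public:
  virtual ~LogitProcessor() = default;
  // Returns false to abort the chain (stand-in for a non-Ok Error).
  virtual bool process(float* logits, std::size_t vocab_size) = 0;
};

// Hypothetical processor: make one token unsampleable.
class TokenBanProcessor : public LogitProcessor {
 public:
  explicit TokenBanProcessor(std::size_t token_id) : token_id_(token_id) {}
  bool process(float* logits, std::size_t vocab_size) override {
    if (token_id_ >= vocab_size) return false;  // stand-in for InvalidArgument
    logits[token_id_] = -std::numeric_limits<float>::infinity();
    return true;
  }

 private:
  std::size_t token_id_;
};

// Mirrors the loop in generate(): run each processor in order, abort on failure.
bool apply_chain(const std::vector<std::shared_ptr<LogitProcessor>>& chain,
                 float* logits, std::size_t vocab_size) {
  for (const auto& p : chain) {
    if (!p->process(logits, vocab_size)) return false;
  }
  return true;
}
```

As in the PR, an empty `chain` is a no-op, and the first failing processor stops the rest from running.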

Reviewed By: Gasoonjia

Differential Revision: D104767967

cc @larryliu0820 @mergennachin @cccclai @helunwencser @jackzhxng

Copilot AI review requested due to automatic review settings May 12, 2026 18:04
@pytorch-bot

pytorch-bot Bot commented May 12, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19517

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 6 Pending, 1 Unclassified Failure

As of commit 7d57b0a with merge base 4c474af:

UNCLASSIFIED FAILURE - DrCI could not classify the following job because the workflow did not run on the merge base. The failure may be pre-existing on trunk or introduced by this PR:

  • Check Labels / Check labels (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
    RuntimeError: GraphQL query

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 12, 2026
@meta-codesync
Contributor

meta-codesync Bot commented May 12, 2026

@kirklandsign has exported this pull request. If you are a Meta employee, you can view the originating Diff in D104767967.

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@kirklandsign
Contributor Author

@claude review

@claude

claude Bot commented May 12, 2026

Claude finished @kirklandsign's task in 1m 31s


Code Review: Add LogitProcessor interface for pre-sampling logit transforms

  • Read changed files
  • Analyze interface design
  • Check integration in TextTokenGenerator
  • Review tests
  • Post review feedback

Overall this is a clean, well-scoped addition. The interface is minimal, the integration point is correct, and tests cover the key behaviors. A few observations:

Design — looks good

  • The LogitProcessor interface (extension/llm/sampler/logit_processor.h) is appropriately minimal — a single pure virtual process(float*, int32_t) method. The ET_EXPERIMENTAL annotation is a sensible choice for a new API surface.
  • Using std::shared_ptr for processor ownership in TextTokenGenerator is reasonable given processors may be shared across generators or managed externally.

Potential issues

  1. apply_logit_processors_ handles dim==3 but not dim>3 (text_token_generator.h:217-219): The guard checks dim >= 2 but only has special-case offset logic for dim==3. If a model produced a 4D tensor, the logits pointer would point to the beginning of the last dimension without the correct sequence offset. This mirrors sample_from_logits in util.h:48-50 which also only handles dim==3, so this is consistent with existing behavior — but worth noting.

  2. Float-only restriction (text_token_generator.h:209-213): The processor chain enforces ScalarType::Float, but sample_from_logits supports Half, BFloat16, and UInt16. If a model produces non-Float logits, the processor chain will return an error rather than silently skip. This is the safer choice (fail loudly), but callers using quantized logit outputs won't be able to use processors without an explicit cast. A comment noting this design decision would be helpful for future maintainers.

  3. No test for the integration path (apply_logit_processors_): The unit tests in test_logit_processor.cpp test the LogitProcessor interface in isolation (which is good), but there's no integration test verifying apply_logit_processors_ correctly extracts the last-position logits from a 3D tensor and passes them to the chain. This is the trickiest logic in the PR. Consider adding a test that constructs a mock Tensor with shape [1, seq_len, vocab] and verifies only the final position's logits are modified.

  4. Minor: prev_token assigned but not used (text_token_generator.h:128): prev_token = cur_token is set before the processor chain runs but isn't used by the processors. This is pre-existing code (used later for decode), just noting it's not a new issue.
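The integration test suggested in point 3 can be sketched without executorch test utilities by modeling a contiguous `[batch=1, seq_len, vocab]` buffer with a plain array; `last_position_offset` and `bump_all` are illustrative names, not the real test code.

```cpp
#include <cstddef>
#include <vector>

// Offset of the last sequence position's logits in a contiguous
// [batch=1, seq_len, vocab] buffer, as apply_logit_processors_ is
// described to compute for rank-3 tensors.
std::size_t last_position_offset(std::size_t seq_len, std::size_t vocab) {
  return (seq_len - 1) * vocab;
}

// Illustrative "processor": add 1.0 to every logit it is handed.
void bump_all(float* logits, std::size_t vocab) {
  for (std::size_t i = 0; i < vocab; ++i) {
    logits[i] += 1.0f;
  }
}
```

A test in this spirit fills the buffer with zeros, applies the processor at the computed offset, and checks that only the final position's `vocab` entries changed.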

Buck targets

  • logit_processor.h is exported from the sampler target (sampler/targets.bzl:10) — correct placement.
  • text_token_generator gains a direct dep on sampler (runner/targets.bzl:71) — already present, so the new #include is covered.
  • Test target (sampler/test/targets.bzl:27-34) uses the non-aten sampler dep, which is appropriate since the test doesn't need torch tensors.

Nits

  • text_token_generator.h:45: The null check (if (processor)) silently drops null shared_ptrs. This is fine as defensive programming, but an ET_CHECK_MSG would surface misuse earlier during development. Matter of taste.

Summary

Clean PR, well-designed interface. The main suggestion is adding an integration test for the 3D tensor offset logic in apply_logit_processors_. The Float-only restriction is a reasonable starting point that can be relaxed later if needed.


Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a LogitProcessor interface to enable in-place mutation of model logits between the decoder forward pass and sampling, allowing features like constrained decoding and logit biasing without changing the core generation loop.

Changes:

  • Adds LogitProcessor interface under extension/llm/sampler/.
  • Extends TextTokenGenerator with a configurable processor chain applied pre-sampling.
  • Updates Buck targets to export the new header and adds a unit test for the interface.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Summary per file:

  • extension/llm/sampler/test/test_logit_processor.cpp: Adds unit tests validating basic LogitProcessor behavior and ordering semantics.
  • extension/llm/sampler/test/targets.bzl: Adds a Buck test target for the new logit processor tests.
  • extension/llm/sampler/targets.bzl: Exports logit_processor.h from the sampler library target.
  • extension/llm/sampler/logit_processor.h: Introduces the LogitProcessor pure virtual interface.
  • extension/llm/runner/text_token_generator.h: Adds processor registration APIs and applies the processor chain to logits before sampling.
  • extension/llm/runner/targets.bzl: Adds a runner dependency on the sampler target (for LogitProcessor).


Comment on lines +216 to +218 (extension/llm/runner/text_token_generator.h):

    const auto vocab_size = logits_tensor.size(logits_tensor.dim() - 1);
    if (logits_tensor.dim() == 3) {
      const auto num_tokens = logits_tensor.size(1);

Comment on lines +215 to +223:

    auto* logits = logits_tensor.mutable_data_ptr<float>();
    const auto vocab_size = logits_tensor.size(logits_tensor.dim() - 1);
    if (logits_tensor.dim() == 3) {
      const auto num_tokens = logits_tensor.size(1);
      logits += (num_tokens - 1) * vocab_size;
    }
    for (auto& processor : logit_processors_) {
      processor->process(logits, static_cast<int32_t>(vocab_size));
    }

Comment on lines +209 to +213:

    ET_CHECK_OR_RETURN_ERROR(
        logits_tensor.scalar_type() == ::executorch::aten::ScalarType::Float,
        InvalidArgument,
        "LogitProcessor chain only supports Float logits; got dtype %d",
        static_cast<int>(logits_tensor.scalar_type()));

Comment on lines +130 to +132:

    if (!logit_processors_.empty()) {
      ET_CHECK_OK_OR_RETURN_ERROR(apply_logit_processors_(logits_tensor));
    }

Comment thread on extension/llm/sampler/logit_processor.h (Outdated)

Comment on lines +43 to +46:

     * @param vocab_size Number of logits in the buffer (size of the model's
     *   output vocabulary for the current step).
     */
    virtual void process(float* logits, int32_t vocab_size) = 0;
@meta-codesync meta-codesync Bot changed the title Add LogitProcessor interface for pre-sampling logit transforms Add LogitProcessor interface for pre-sampling logit transforms (#19517) May 12, 2026
meta-codesync Bot pushed a commit that referenced this pull request May 12, 2026
Summary:

Introduces a `LogitProcessor` abstract interface that allows callers to mutate logits in place between the model forward pass and the sampler. This enables grammar-constrained decoding, logit biasing, repetition penalties, and similar pre-sampling transforms without modifying the core generation loop.

Changes:
- `LogitProcessor` (new): pure virtual interface with a single `process(float*, int32_t)` method, placed in `extension/llm/sampler/`.
- `TextTokenGenerator`: gains `add_logit_processor()`, `clear_logit_processors()`, and `num_logit_processors()`. The processor chain runs after the model step and before `logits_to_token()`. When no processors are registered, behavior is identical to before.
- `apply_logit_processors_()`: private helper that validates Float dtype, advances to the last-position logits for 3D tensors (mirroring `logits_to_token`), and invokes each processor in order.
- Buck: `logit_processor.h` exported from the sampler target; `text_token_generator` gains a direct dep on sampler; test target added.

Processors must be configured before calling `generate()` — concurrent modification during generation is not safe.

Differential Revision: D104767967
@meta-codesync meta-codesync Bot force-pushed the export-D104767967 branch from 3b3862f to 6ebfdf6 Compare May 12, 2026 19:15
meta-codesync Bot pushed a commit that referenced this pull request May 13, 2026
Copilot AI review requested due to automatic review settings May 13, 2026 21:48
@meta-codesync meta-codesync Bot force-pushed the export-D104767967 branch from 6ebfdf6 to d03e9db Compare May 13, 2026 21:48
@kirklandsign kirklandsign review requested due to automatic review settings May 13, 2026 21:48
meta-codesync Bot pushed a commit that referenced this pull request May 13, 2026
Copilot AI review requested due to automatic review settings May 13, 2026 22:11
@meta-codesync meta-codesync Bot force-pushed the export-D104767967 branch from d03e9db to cac76f9 Compare May 13, 2026 22:11
@kirklandsign kirklandsign review requested due to automatic review settings May 13, 2026 22:11
meta-codesync Bot pushed a commit that referenced this pull request May 13, 2026
Summary:

Introduces a `LogitProcessor` abstract interface that allows callers to mutate logits in place between the model forward pass and the sampler. This enables grammar-constrained decoding, logit biasing, repetition penalties, and similar pre-sampling transforms without modifying the core generation loop.

Changes:
- `LogitProcessor` (new): abstract class with a constructor that takes `vocab_size` and a pure virtual `process(float*)` method, placed in `extension/llm/sampler/`. The `vocab_size` is fixed per model and stored as a member, avoiding redundant per-call arguments.
- `TextTokenGenerator`: gains `add_logit_processor()`, `clear_logit_processors()`, and `num_logit_processors()`. The processor chain runs after the model step and before `logits_to_token()`. When no processors are registered, behavior is identical to before.
- `apply_logit_processors_()`: private helper that advances to the last-position logits for 3D tensors (mirroring `logits_to_token`), and invokes each processor in order. Supports Float, Half, BFloat16, and UInt16 dtypes: Float logits are processed in place (zero-copy); for other dtypes, logits are cast to a temporary float buffer, processed, then cast back to the original dtype.
- Buck: `logit_processor.h` exported from the sampler target; `text_token_generator` gains a direct dep on sampler; test target added.

Processors must be configured before calling `generate()` -- concurrent modification during generation is not safe.

Differential Revision: D104767967
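The cast-to-float/process/cast-back path described above for non-Float dtypes can be illustrated for BFloat16, whose widening conversion is a 16-bit shift into the high half of an IEEE-754 binary32 pattern. This is a generic sketch under that assumption, not the PR's `ET_SWITCH_THREE_TYPES` code; `process_bf16` is a hypothetical name, and the narrowing here truncates where production kernels may round.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// bf16 is the top 16 bits of a binary32, so widening is a shift.
float bf16_to_float(uint16_t h) {
  uint32_t bits = static_cast<uint32_t>(h) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Narrowing by truncation (real kernels may round-to-nearest-even).
uint16_t float_to_bf16(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return static_cast<uint16_t>(bits >> 16);
}

// Cast to a temporary float buffer, run a float-only processor, cast back.
template <typename ProcessFn>
void process_bf16(uint16_t* logits, std::size_t vocab, ProcessFn process) {
  std::vector<float> tmp(vocab);
  for (std::size_t i = 0; i < vocab; ++i) tmp[i] = bf16_to_float(logits[i]);
  process(tmp.data(), vocab);
  for (std::size_t i = 0; i < vocab; ++i) logits[i] = float_to_bf16(tmp[i]);
}
```

Float logits skip this round trip entirely, which is why the in-place path stays zero-copy.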
@meta-codesync meta-codesync Bot force-pushed the export-D104767967 branch from cac76f9 to 7676c50 Compare May 13, 2026 22:13
@kirklandsign
Contributor Author

@claude review

@claude

claude Bot commented May 13, 2026

Claude finished @kirklandsign's task in 1m 5s


Code Review: Add LogitProcessor interface for pre-sampling logit transforms

  • Read changed files
  • Analyze interface design and implementation
  • Check integration in TextTokenGenerator
  • Review tests and Buck targets
  • Post review feedback

Clean, well-scoped PR. The interface is minimal and the integration point is correct. A few observations:

Strengths

  • LogitProcessor interface is appropriately minimal — a single pure virtual process(float*) method with ET_EXPERIMENTAL annotation.
  • Multi-dtype support via ET_SWITCH_THREE_TYPES with float-conversion for non-Float types is the right tradeoff (zero-copy for the common case, correct for others).
  • The num_tokens > 0 check at line 214-217 of text_token_generator.h properly addresses the empty-sequence-dimension concern.
  • Tests cover single processor mutation, chain ordering, masking, and out-of-range safety.

Issues

  1. vocab_size truncation from ssize_t to int32_t (text_token_generator.h:210):

    logits_tensor.size() returns ssize_t (typically 64-bit), but vocab_size is stored as int32_t. While no current model has vocab > 2B, the silent narrowing could be flagged by -Wshorten-64-to-32. A static_cast<int32_t> would make the truncation explicit — or better, assert vocab_size <= INT32_MAX before proceeding. The same type mismatch exists in the LogitProcessor interface itself (int32_t vocab_size constructor param). Since sample_from_logits uses ssize_t internally, this creates an inconsistency at the API boundary.

  2. Offset arithmetic for dim==3 also uses int32_t (text_token_generator.h:218):

    (num_tokens - 1) * vocab_size is computed as int32_t * int32_t. If vocab_size is large (e.g., 128K for some LLMs) and num_tokens is large (long sequences without KV cache), this could overflow int32_t. The equivalent code in util.h:50 uses ssize_t for both num_tokens and vocab_size, avoiding this. Consider matching the existing pattern:

    const ssize_t vocab_size = logits_tensor.size(logits_tensor.dim() - 1);
    ssize_t offset = 0;
    if (logits_tensor.dim() == 3) {
      const ssize_t num_tokens = logits_tensor.size(1);
      ...
      offset = (num_tokens - 1) * vocab_size;
    }

    Then cast to int32_t only when passing to processor->process().

  3. prev_token is unused by the processor chain but assigned before it (text_token_generator.h:128):

    This is pre-existing, but worth noting: if a future processor wants context (e.g., repetition penalty needs token history), the interface would need extension. The current design intentionally avoids this complexity, which is fine for v1.
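The overflow in point 2 is easy to demonstrate numerically (the figures below are illustrative, not from a specific model): with a 128K vocab, a 32-bit offset wraps once the sequence length passes roughly 16K positions, while 64-bit arithmetic stays correct.

```cpp
#include <cstdint>
#include <limits>

// Offset of the last position's logits, computed in 64-bit as the
// ssize_t arithmetic in util.h would be.
int64_t safe_offset(int64_t num_tokens, int64_t vocab_size) {
  return (num_tokens - 1) * vocab_size;
}

// Whether the value survives a narrowing cast to int32_t.
bool fits_in_int32(int64_t v) {
  return v <= std::numeric_limits<int32_t>::max() &&
         v >= std::numeric_limits<int32_t>::min();
}
```

With `num_tokens = 20000` and `vocab_size = 131072`, the 64-bit offset is about 2.62e9, past INT32_MAX, so the same product in `int32_t * int32_t` would overflow (undefined behavior for signed types).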

Nits

  • text_token_generator.h:45: The null check on add_logit_processor silently drops null shared_ptrs. This is fine defensively, but an ET_CHECK_MSG would catch misuse earlier during development.
  • The LogitProcessor stores vocab_size_ but apply_logit_processors_ doesn't verify that registered processors' vocab_size() matches the tensor's vocab dimension. A mismatch would cause silent buffer overruns. Consider either: (a) asserting processor->vocab_size() == vocab_size in the loop, or (b) documenting that the caller is responsible for ensuring consistency.

Buck targets — correct

  • logit_processor.h exported from sampler target.
  • text_token_generator already depends on sampler via text_decoder_runner's transitive dep, plus the explicit dep added in runner/targets.bzl:71.
  • Test target uses non-aten sampler dep appropriately.

Summary

The main actionable item is the int32_t overflow risk in the offset arithmetic (issue #2) — this is a real correctness bug for large vocab models in non-KV-cache mode. The type mismatch with ssize_t (issue #1) is lower priority but worth fixing for consistency with util.h. Everything else looks good.



meta-codesync Bot pushed a commit that referenced this pull request May 13, 2026
Copilot AI review requested due to automatic review settings May 13, 2026 22:22
Copilot AI review requested due to automatic review settings May 14, 2026 07:09
@meta-codesync meta-codesync Bot force-pushed the export-D104767967 branch from a7cdd2e to a9add25 Compare May 14, 2026 07:09
@kirklandsign kirklandsign review requested due to automatic review settings May 14, 2026 07:09
meta-codesync Bot pushed a commit that referenced this pull request May 14, 2026
Copilot AI review requested due to automatic review settings May 14, 2026 07:13
@meta-codesync meta-codesync Bot force-pushed the export-D104767967 branch from a9add25 to b25bbf5 Compare May 14, 2026 07:13
@kirklandsign kirklandsign review requested due to automatic review settings May 14, 2026 07:13
meta-codesync Bot pushed a commit that referenced this pull request May 14, 2026
@meta-codesync meta-codesync Bot force-pushed the export-D104767967 branch from b25bbf5 to fa26f94 Compare May 14, 2026 07:13
meta-codesync Bot pushed a commit that referenced this pull request May 14, 2026
@meta-codesync meta-codesync Bot force-pushed the export-D104767967 branch from fa26f94 to b416b30 Compare May 14, 2026 07:17
Copilot AI review requested due to automatic review settings May 14, 2026 07:17
@kirklandsign kirklandsign review requested due to automatic review settings May 14, 2026 07:17
meta-codesync Bot pushed a commit that referenced this pull request May 14, 2026
Copilot AI review requested due to automatic review settings May 14, 2026 20:42
@meta-codesync meta-codesync Bot force-pushed the export-D104767967 branch from b416b30 to c08e1e3 Compare May 14, 2026 20:42
@kirklandsign kirklandsign review requested due to automatic review settings May 14, 2026 20:42
meta-codesync Bot pushed a commit that referenced this pull request May 14, 2026
Copilot AI review requested due to automatic review settings May 14, 2026 21:14
@meta-codesync meta-codesync Bot force-pushed the export-D104767967 branch from c08e1e3 to 0e37b09 Compare May 14, 2026 21:14
@kirklandsign kirklandsign review requested due to automatic review settings May 14, 2026 21:14
@meta-codesync meta-codesync Bot force-pushed the export-D104767967 branch from 0e37b09 to 7d57b0a Compare May 14, 2026 21:17
Copilot AI review requested due to automatic review settings May 14, 2026 21:17
@kirklandsign kirklandsign review requested due to automatic review settings May 14, 2026 21:17
@kirklandsign kirklandsign added the module: llm Issues related to LLM examples and apps, and to the extensions/llm/ code label May 14, 2026
@kirklandsign kirklandsign added the release notes: llm Changes to llm utilities label May 14, 2026
@meta-codesync meta-codesync Bot merged commit 174d3ad into main May 15, 2026
177 of 199 checks passed
@meta-codesync meta-codesync Bot deleted the export-D104767967 branch May 15, 2026 01:35