Add AWS Comprehend PII redactor for JS SDK #562

Open
GaruBrothers wants to merge 2 commits into arakoodev:ts from GaruBrothers:codex-aws-comprehend-redactor

@GaruBrothers GaruBrothers commented May 27, 2026

Implements the AWS Comprehend PII redaction utility requested in #290 for the JavaScript SDK.

/claim #290

What changed

  • Added AwsComprehendRedactor under the AI package.
  • Uses Amazon Comprehend DetectPiiEntities to detect PII spans.
  • Redacts prompts using stable AWS offsets, including Unicode-safe code point offset conversion for JavaScript strings.
  • Supports filtering by entity type and confidence score.
  • Adds helpers for prompt strings and chat message arrays.
  • Exports the redactor from @arakoodev/edgechains.js/ai.
  • Adds mocked unit tests for offset redaction, filtering, chat message handling and emoji/code-point offsets.
  • Adds a working example under examples/aws-comprehend-redaction.
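
To make the filtering and offset handling above concrete, here is a minimal sketch. It is not the code from this PR: the names `PiiEntity`, `RedactOptions` and `redactPii` are hypothetical, the entity shape is modeled on the `Type`, `Score`, `BeginOffset` and `EndOffset` fields that DetectPiiEntities returns, and it assumes offsets count Unicode code points and that detected spans do not overlap.

```typescript
// Hypothetical shape, modeled on the fields DetectPiiEntities returns.
interface PiiEntity {
  Type: string;
  Score: number;
  BeginOffset: number; // assumed to count Unicode code points
  EndOffset: number; // exclusive
}

// Hypothetical options; the PR's real option names may differ.
interface RedactOptions {
  entityTypes?: string[]; // redact only these types; undefined means all
  minScore?: number; // ignore detections below this confidence
}

function redactPii(
  text: string,
  entities: PiiEntity[],
  options: RedactOptions = {}
): string {
  const { entityTypes, minScore = 0 } = options;
  // Array.from splits a string by code point, so an emoji fills one slot.
  const codePoints = Array.from(text);

  const selected = entities
    .filter((e) => e.Score >= minScore)
    .filter((e) => entityTypes === undefined || entityTypes.includes(e.Type))
    // Skip spans that fall outside the text instead of corrupting it.
    .filter(
      (e) =>
        e.BeginOffset >= 0 &&
        e.BeginOffset < e.EndOffset &&
        e.EndOffset <= codePoints.length
    )
    // Work from the last span to the first so earlier offsets stay valid.
    .sort((a, b) => b.BeginOffset - a.BeginOffset);

  for (const e of selected) {
    codePoints.splice(e.BeginOffset, e.EndOffset - e.BeginOffset, `[${e.Type}]`);
  }
  return codePoints.join("");
}

// Usage with hand-written detections (no AWS call is made here).
const prompt = "Mail 😀 bob@example.com or call 555-0100";
const detections: PiiEntity[] = [
  { Type: "EMAIL", Score: 0.99, BeginOffset: 7, EndOffset: 22 },
  { Type: "PHONE", Score: 0.4, BeginOffset: 31, EndOffset: 39 },
];
console.log(redactPii(prompt, detections, { minScore: 0.5 }));
// prints "Mail 😀 [EMAIL] or call 555-0100"
```

Replacing from the end of the text backwards is one way to keep the remaining offsets usable after each substitution; sorting ascending and tracking a running length delta would work as well.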

Verification

npx jest src/ai/src/tests/awsComprehendRedactor.test.ts --runInBand
npx tsc -p tsconfig.json --noEmit
npx tsc -b

Results:

  • awsComprehendRedactor.test.ts: 6 tests passed.
  • tsc --noEmit: passed.
  • tsc -b: passed.
  • GitHub Actions js: passed.
  • GitHub Actions CLAAssistant: passed.
  • npm run build: did not complete on Windows because the existing script uses the Unix command rm -rf.

Note

The real AWS call requires AWS credentials and region configured in the environment. Tests mock the Comprehend client so CI does not need AWS credentials.

@github-actions

github-actions Bot commented May 27, 2026

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: be6959bcce

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +205 to +207
redactedText.slice(0, entity.BeginOffset) +
replacement +
redactedText.slice(entity.EndOffset)

P1 Badge Convert Comprehend code-point offsets before slicing

When the prompt contains supplementary Unicode characters before the PII span, this slices at the wrong positions because Comprehend reports offsets in Unicode code points while JavaScript slice and text.length use UTF-16 code units. For example, an emoji earlier in the text shifts every following AWS offset by one code unit, so the redactor can leave part of an email/phone number unredacted or remove adjacent non-PII text. Convert code-point offsets to JS string indices before validating and slicing.
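
The mismatch and the suggested conversion can be sketched as follows. This is an illustration, not the fix committed to the PR: `codePointOffsetToIndex` is a hypothetical helper name, and the offsets are written by hand on the assumption that they count code points.

```typescript
// Convert an offset counted in Unicode code points into a UTF-16 code unit
// index that String.prototype.slice can use.
function codePointOffsetToIndex(text: string, codePointOffset: number): number {
  let index = 0; // position in UTF-16 code units
  let seen = 0; // code points consumed so far
  while (seen < codePointOffset && index < text.length) {
    const cp = text.codePointAt(index)!;
    // Characters above U+FFFF are stored as a surrogate pair (two units).
    index += cp > 0xffff ? 2 : 1;
    seen += 1;
  }
  return index;
}

const sample = "😀 a@b.co";
// Hand-written span for "a@b.co" in code points: begin 2, end 8 (exclusive).
const beginOffset = 2;
const endOffset = 8;

// Slicing with the raw offsets eats the space and leaves a stray "o".
const naive =
  sample.slice(0, beginOffset) + "[EMAIL]" + sample.slice(endOffset);

// Converting first gives UTF-16 indices 3 and 9, which cover the email.
const begin = codePointOffsetToIndex(sample, beginOffset);
const end = codePointOffsetToIndex(sample, endOffset);
const fixed = sample.slice(0, begin) + "[EMAIL]" + sample.slice(end);

console.log(naive); // prints "😀[EMAIL]o"
console.log(fixed); // prints "😀 [EMAIL]"
```

For text without supplementary characters both approaches give the same indices, which is why the bug only appears once an emoji or similar character precedes the PII span.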

@GaruBrothers

Author

I have read the Arakoo CLA Document and I hereby sign the CLA

@GaruBrothers

Author

recheck

@GaruBrothers

Author

Fixed the Unicode offset issue in dd70025 by converting Comprehend code point offsets to JavaScript UTF-16 code unit indices before slicing. Added an emoji regression test. Local verification: awsComprehendRedactor.test.ts now passes 6/6 tests, and npx tsc -p tsconfig.json --noEmit passes.
