
Text-to-Code Lambda

Overview
Source-Bucket Directed Architecture
Pipeline Behavior
Outputs
IAM Requirements
Logging
Environment Variables
Tests

Overview

The Text-to-Code (TTC) Lambda infers structured medical codes, such as LOINC, from free text in eICRs when the original document is missing standard coded values.

It is triggered by SQS messages that wrap S3 EventBridge notifications. Each record points to an incoming TTC submission object in S3. The Lambda loads the related schematron validation response, loads the original eICR, evaluates candidate free-text values, embeds selected text, queries OpenSearch, reranks returned code suggestions, and writes TTC output artifacts back to S3.

The TTC output is consumed by downstream augmentation workflows.

Source-Bucket Directed Architecture

The TTC Lambda does not use a static, environment-variable-configured S3 bucket. Instead, it extracts the bucket name directly from the incoming S3 event payload at detail.bucket.name.

All reads and writes for a given invocation target the bucket that triggered the event.

This design enables a single deployed Lambda to serve multiple, independent data pipelines without reconfiguration. In the case of AIMS, the same TTC Lambda can be used for:

- eCR Pipeline (ecr-data-repository bucket): production processing of eICRs that fail TTC schematron validation.
- TTC Training Pipeline (ecr-ttc-training bucket): offline evaluation of TTC model performance against anonymized, baseline-tagged data.

Because the Lambda follows the event to whichever bucket produced it, adding a new pipeline only requires wiring the new bucket's EventBridge rule to the existing SQS queue and granting the Lambda's execution role access to that bucket (see IAM Requirements). No Lambda code or bucket environment-variable change is required.

If an event does not include detail.bucket.name, the Lambda raises an error instead of falling back to a static bucket.
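The bucket resolution described above can be sketched in a few lines. This is a minimal illustration, not the Lambda's actual code: the function name `extract_bucket_and_key` and the use of `ValueError` are assumptions, and the `detail.object.key` path is assumed from the standard S3 EventBridge notification shape rather than stated in this README.

```python
import json
from typing import Any


def extract_bucket_and_key(sqs_record: dict[str, Any]) -> tuple[str, str]:
    """Return (bucket, key) from an SQS record that wraps an EventBridge S3 event.

    Hypothetical helper: the name and error type are illustrative only.
    """
    # The SQS message body is the JSON-encoded EventBridge event.
    event = json.loads(sqs_record["body"])
    detail = event.get("detail") or {}

    bucket = (detail.get("bucket") or {}).get("name")
    if not bucket:
        # No static fallback bucket exists, so a missing name is an error.
        raise ValueError("S3 event is missing detail.bucket.name")

    # Assumed location of the object key in the S3 EventBridge payload.
    key = (detail.get("object") or {}).get("key")
    if not key:
        raise ValueError("S3 event is missing detail.object.key")

    return bucket, key
```

Every later S3 read and write for the record would then use the returned bucket, which is what lets one deployment serve several pipelines.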

Pipeline Behavior

For each SQS record, the Lambda:

1. Parses the SQS body as an EventBridge S3 event.
2. Extracts the triggering object key and source bucket.
3. Extracts the persistence_id from the object key using TTC_INPUT_PREFIX.
4. Loads schematron validation responses from S3.
5. Extracts relevant schematron data fields for TTC processing.
6. Loads the original eICR from S3.
7. Extracts eICR metadata.
8. Evaluates free-text candidates from the eICR.
9. Selects the most relevant candidate text for each schematron error.
10. Embeds the selected text.
11. Queries OpenSearch using vector search.
12. Reranks OpenSearch results.
13. Builds NonstandardCodeInstance outputs for matched results.
14. Tracks unmatched schematron errors and reasons.
15. Saves TTC output for augmentation.
16. Saves TTC metadata output for analysis and evaluation.
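The persistence_id extraction step above might look roughly like the sketch below. It assumes the persistence ID is everything in the object key after `TTC_INPUT_PREFIX`; the helper name `persistence_id_from_key` and the error handling are hypothetical, not taken from the package.

```python
import os

# Default taken from the Environment Variables table in this README.
TTC_INPUT_PREFIX = os.environ.get("TTC_INPUT_PREFIX", "TextToCodeSubmissionV2/")


def persistence_id_from_key(key: str, prefix: str = TTC_INPUT_PREFIX) -> str:
    """Strip the TTC input prefix from an S3 key to get the persistence ID.

    Assumption: the persistence ID is the full remainder of the key.
    """
    if not key.startswith(prefix):
        raise ValueError(f"Key {key!r} does not start with prefix {prefix!r}")

    persistence_id = key[len(prefix):]
    if not persistence_id:
        raise ValueError(f"Key {key!r} has no persistence ID after the prefix")

    return persistence_id
```

Rejecting keys outside the prefix keeps an unrelated object event from being processed as a TTC submission.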

If no relevant schematron fields are found, the Lambda writes TTC metadata explaining why processing was skipped and returns a successful no-match result.

If relevant fields are found but no code matches are selected, the Lambda still writes outputs and returns a successful no-match result.

Outputs

The Lambda writes two S3 artifacts.

TTC augmentation output

Written to:

<TTC_OUTPUT_PREFIX><persistence_id>

Default prefix:

TTCAugmentationMetadataV2/

This output is consumed by the Augmentation Lambda. It includes:

- persistence_id
- eicr_metadata
- matched schematron_errors
- unmatched_schematron_errors

TTC metadata output

Written to:

<TTC_METADATA_PREFIX><persistence_id>.json

Default prefix:

TTCMetadataV2/

This output is used for TTC analysis, debugging, and model evaluation. It includes:

- persistence_id
- eicr_metadata
- processed schematron error details
- OpenSearch result metadata
- reranker result metadata
- processed_at
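As a small illustration of the two key patterns above, the sketch below builds both output keys for one persistence ID. The function name `build_output_keys` is hypothetical; the prefixes are the documented defaults, and the `.json` suffix is applied only to the metadata key because that is how the patterns appear in this README.

```python
def build_output_keys(
    persistence_id: str,
    output_prefix: str = "TTCAugmentationMetadataV2/",
    metadata_prefix: str = "TTCMetadataV2/",
) -> tuple[str, str]:
    """Return (augmentation_key, metadata_key) for one persistence ID."""
    # <TTC_OUTPUT_PREFIX><persistence_id>
    augmentation_key = f"{output_prefix}{persistence_id}"
    # <TTC_METADATA_PREFIX><persistence_id>.json
    metadata_key = f"{metadata_prefix}{persistence_id}.json"
    return augmentation_key, metadata_key
```

Both keys are written to the bucket that produced the triggering event, not to a fixed bucket.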

IAM Requirements

The Lambda's execution role must have s3:GetObject and s3:PutObject permissions on every bucket that may produce events for it.

This is required by the source-bucket directed model: the Lambda reads inputs from, and writes outputs back to, whichever bucket the event originated from.

When onboarding a new bucket, update the Lambda's IAM policy to grant read/write access to that bucket.

The Lambda also needs permissions to access the configured OpenSearch cluster.
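For onboarding a new bucket, the S3 portion of the policy could be generated along the lines of the sketch below. This is an assumption-laden example: the helper name `build_s3_policy` and the statement `Sid` are made up for illustration, the ARNs assume the standard `aws` partition, and OpenSearch permissions are not covered.

```python
import json


def build_s3_policy(bucket_names: list[str]) -> dict:
    """Build an IAM policy document granting object read/write on each bucket.

    Hypothetical helper; adapt the result to however the policy is managed.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "TtcReadWriteEventBuckets",  # illustrative Sid
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                # Object-level actions apply to objects, hence the /* suffix.
                "Resource": [f"arn:aws:s3:::{name}/*" for name in bucket_names],
            }
        ],
    }


if __name__ == "__main__":
    policy = build_s3_policy(["ecr-data-repository", "ecr-ttc-training"])
    print(json.dumps(policy, indent=2))
```

Listing every event-producing bucket in one statement keeps the onboarding change to a single added entry.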

Logging

Every TTC invocation logs the record count at the start.

For each record, the Lambda logs the event bucket, triggering object key, and derived persistence ID as structured fields. It also carries the bucket name, persistence ID, and trigger S3 key as structured context through downstream log lines during record processing.

This makes it possible to filter CloudWatch logs by bucket, object key, or persistence ID when debugging cross-pipeline issues.
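One way to carry per-record context through log lines is sketched below, using only the standard library. It is not necessarily how the Lambda does it (the real code may use a different logging library); the field names `bucket`, `s3_key`, and `persistence_id` and both helper names are assumptions.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object, including context fields."""

    CONTEXT_FIELDS = ("bucket", "s3_key", "persistence_id")

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        for field in self.CONTEXT_FIELDS:
            # Context fields are present only when logged through the adapter.
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)


def record_logger(
    base: logging.Logger, bucket: str, s3_key: str, persistence_id: str
) -> logging.LoggerAdapter:
    """Return a logger that attaches the record's context to every line."""
    context = {"bucket": bucket, "s3_key": s3_key, "persistence_id": persistence_id}
    return logging.LoggerAdapter(base, context)
```

With a handler that uses `JsonFormatter`, each line emitted through the adapter becomes a JSON object that can be filtered on `bucket`, `s3_key`, or `persistence_id`.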

Environment Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| `SCHEMATRON_ERROR_PREFIX` | No | `ValidationResponseV2/` | S3 key prefix for schematron validation responses |
| `TTC_INPUT_PREFIX` | No | `TextToCodeSubmissionV2/` | S3 key prefix for TTC submission triggers |
| `TTC_OUTPUT_PREFIX` | No | `TTCAugmentationMetadataV2/` | S3 key prefix for TTC augmentation output |
| `TTC_METADATA_PREFIX` | No | `TTCMetadataV2/` | S3 key prefix for TTC analysis metadata |
| `AWS_REGION` | No | Auto-provided by Lambda | AWS region used by shared AWS client helpers |
| `S3_ENDPOINT_URL` | No | (none) | Optional custom S3 endpoint for local or mocked environments |
| `OPENSEARCH_ENDPOINT_URL` | Yes | (none) | OpenSearch cluster endpoint |
| `OPENSEARCH_INDEX` | No | `ttc-index` | OpenSearch index name for vector search |
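The variables above could be read into a single settings object as in the following sketch. The `TtcConfig` dataclass and `load_config` function are hypothetical names, not part of the package; the defaults mirror the table, and only `OPENSEARCH_ENDPOINT_URL` is treated as required.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TtcConfig:
    """Illustrative container for the documented environment variables."""

    schematron_error_prefix: str
    ttc_input_prefix: str
    ttc_output_prefix: str
    ttc_metadata_prefix: str
    opensearch_endpoint_url: str
    opensearch_index: str
    s3_endpoint_url: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> TtcConfig:
    """Read settings from the environment, applying the documented defaults."""
    endpoint = env.get("OPENSEARCH_ENDPOINT_URL")
    if not endpoint:
        # The only variable the table marks as required.
        raise RuntimeError("OPENSEARCH_ENDPOINT_URL is required")

    return TtcConfig(
        schematron_error_prefix=env.get("SCHEMATRON_ERROR_PREFIX", "ValidationResponseV2/"),
        ttc_input_prefix=env.get("TTC_INPUT_PREFIX", "TextToCodeSubmissionV2/"),
        ttc_output_prefix=env.get("TTC_OUTPUT_PREFIX", "TTCAugmentationMetadataV2/"),
        ttc_metadata_prefix=env.get("TTC_METADATA_PREFIX", "TTCMetadataV2/"),
        opensearch_endpoint_url=endpoint,
        opensearch_index=env.get("OPENSEARCH_INDEX", "ttc-index"),
        s3_endpoint_url=env.get("S3_ENDPOINT_URL"),  # None outside local/mocked runs
    )
```

Note that no bucket name appears here, consistent with the source-bucket directed design.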

Tests

Run the package tests with:

just test all packages/text-to-code-lambda/tests
