Skip to content

fix: ship handler.mjs as the sole Lambda entry point#784

Open
zarirhamza wants to merge 5 commits into
mainfrom
zarir/sles-2888-esm-handler-shim
Open

fix: ship handler.mjs as the sole Lambda entry point#784
zarirhamza wants to merge 5 commits into
mainfrom
zarir/sles-2888-esm-handler-shim

Conversation

@zarirhamza

@zarirhamza zarirhamza commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Fixes ESM Lambda failures with the Datadog handler redirect by removing the dist/handler.js artifact from the published npm tarball. Lambda's bootstrap then resolves node_modules/datadog-lambda-js/dist/handler.handler to dist/handler.mjs, whose async load() already handles both CJS and ESM user modules. This aligns the npm package with the layer's behavior (which has always loaded handler.mjs directly).

Motivation

AWS Lambda's CJS resolver picks dist/handler.js before dist/handler.mjs when the handler string is node_modules/datadog-lambda-js/dist/handler.handler. Today dist/handler.js ships as a verbatim copy of dist/handler.cjs, so ESM user functions hit ERR_REQUIRE_ESM (or the surface "Cannot find module" variant when _tryRequireSync swallows it).

Reported in #782.

Approach

handler.mjs already handles both CJS and ESM user modules through its async load(). The only thing standing in its way is dist/handler.js, which Lambda's resolver picks first. This PR removes the source of that artifact end-to-end:

  • Delete src/handler.cjs (it was only reachable via the shim's require("./handler.cjs"), and Lambda's resolver doesn't auto-resolve .cjs).
  • Tighten scripts/update_dist_version.sh to copy only src/handler.mjs into dist/.
  • Drop the layer-side RUN rm /nodejs/.../handler.js in the Dockerfile — there is no handler.js to remove now.
  • Drop the cp ./dist/handler.cjs ./dist/handler.js lines from .gitlab/scripts/publish_npm.sh and scripts/publish_prod.sh (these were the original injection points).

End state: handler.mjs is the sole Lambda entry point in both the npm tarball and the published layer.

Why not the runtime-branching shim (earlier commits in this PR)

Initial commits added a src/handler.js CJS shim that detected ESM vs CJS user code at runtime and routed accordingly. After reviewer feedback and a closer look at the cold-start path, the shim is the wrong shape:

  • For CJS users it saved ~10-30ms of init time vs. going through handler.mjs's async load(). Negligible against the 500ms-3s a Datadog-instrumented Lambda already spends in init.
  • For ESM users it moved 300-800ms of tracer + user-handler loading from Lambda's init phase into the first invocation, which is strictly worse:
    • First-invoke timeout risk on tight-timeout functions (3s API Gateway is common).
    • Provisioned concurrency is wasted — work deferred past init isn't pre-warmed.
    • Lambda SnapStart is wasted — snapshot is taken after init, and the shim's lazy import() runs post-snapshot.

handler.mjs runs all that work under Lambda's init top-level await, so PC / SnapStart customers pre-pay the cost and get fast invocations.

Aligns with:

  • The comment thread on this PR from the original handler.cjs / handler.mjs author recommending we "only bundle the ESM handler" since all supported runtimes (Node 18+) support top-level await in ESM.
  • The reporter's own suggested fix in #782.
  • The intent of the earlier PR #697.

Testing Guidelines

  • yarn build produces a dist/ with handler.mjs only — no handler.js, no handler.cjs.
  • yarn check-formatting, yarn lint, and yarn test behave the same as main (two pre-existing flakes in src/index.spec.ts around nock.isDone() reproduce on plain origin/main and are unrelated).
  • Integration test suite (esm_node{18,20,22,24}) verifies the layer path with ESM user code. The same path now also covers the npm tarball, since both ship the same handler.

Types of Changes

  • Bug fix
  • New feature
  • Breaking change
  • Misc (docs, refactoring, dependency upgrade, etc.)

Check all that apply

  • This PR's description is comprehensive
  • This PR contains breaking changes that are documented in the description
  • This PR introduces new APIs or parameters that are documented and unlikely to change in the foreseeable future
  • This PR impacts documentation, and it has been updated (or a ticket has been logged)
  • This PR's changes are covered by the automated tests
  • This PR collects user input/sensitive content into Datadog
  • This PR passes the integration tests (ask a Datadog member to run the tests)

AWS Lambda's CJS resolver picks `dist/handler.js` before `dist/handler.mjs`
when the handler string is `node_modules/datadog-lambda-js/dist/handler.handler`,
causing ESM user functions to fail with ERR_REQUIRE_ESM (or the surface
"Cannot find module" variant when _tryRequireSync swallows it).

Today `dist/handler.js` ships as a verbatim copy of `dist/handler.cjs`
(via `cp ./dist/handler.cjs ./dist/handler.js` in the publish scripts),
so it can never load ESM user code.

This replaces the copy with a real `src/handler.js` shim that:

- Detects ESM user code via the task root `package.json` `type` field or
  a `.mjs` `DD_LAMBDA_HANDLER` / `_HANDLER`.
- For CJS user code: synchronously delegates to `handler.cjs`, preserving
  the prior fast path (no cold-start regression).
- For ESM user code: exposes an async handler that lazily `import()`s
  `handler.mjs` on the first invocation. `handler.mjs`'s async `load()`
  transparently handles both CJS and ESM user modules.

`scripts/update_dist_version.sh` already copies `src/handler.*` into
`dist/`, so the shim ships automatically; the explicit
`cp ./dist/handler.cjs ./dist/handler.js` lines in `publish_npm.sh` and
`publish_prod.sh` are no longer needed and are removed.

Refs: #782
Refs: SLES-2888, SLES-2895, SVLS-9238
@datadog-prod-us1-3

datadog-prod-us1-3 Bot commented Jun 9, 2026

Copy link
Copy Markdown

Pipelines

Fix all issues with BitsAI

⚠️ Warnings

🚦 1 Pipeline job failed

DataDog/datadog-lambda-js | e2e-test-status   View in Datadog   GitLab

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 3aa51e2 | Docs | Datadog PR Page | Give us feedback!

The Dockerfile copies `dist/` wholesale into the layer, so the new
`dist/handler.js` shim would get included and Lambda's bootstrap would
resolve `handler.handler` to it. Inside the layer that goes through the
shim's CJS branch -> handler.cjs -> _tryRequireSync, which can't load
`.mjs` user modules (this regressed the `esm_node*` integration tests).

The layer has never shipped a `handler.js`; Lambda's resolver falls
through to `handler.mjs`, whose async `load()` handles both CJS and ESM
user code. Drop the shim from the layer to preserve that behavior. The
shim is still useful in the npm tarball, where the publish scripts used
to copy `handler.cjs` to `handler.js`.
@zarirhamza

Copy link
Copy Markdown
Contributor Author

First push regressed the esm_node* integration tests on all four runtimes — error was Runtime.ImportModuleError: Cannot find module 'esm' through handler.cjs. Root cause: the Dockerfile copies dist/ wholesale into the layer, so the new dist/handler.js shim ended up on the layer too, and Lambda's bootstrap resolved handler.handler to it instead of falling through to handler.mjs.

Inside the layer, the shim's userIsEsm() check returns false for the integration test's esm_node function (handler string is esm.handle with no .mjs suffix, and integration_tests/package.json has no type: "module"), so it delegated to handler.cjsloadSync_tryRequireSync, which doesn't try .mjs.

The layer has never shipped a handler.js (it was only injected into the npm tarball by the publish-script cp), and Lambda's resolver fall-through to handler.mjs already handles both CJS and ESM via the async load(). Second commit (270a846) drops the shim from the layer via RUN rm so the layer keeps that behavior unchanged from main. Shim still ships in the npm package, which is where it's actually needed.

@zarirhamza zarirhamza marked this pull request as ready for review June 9, 2026 15:19
@zarirhamza zarirhamza requested review from a team as code owners June 9, 2026 15:19
@zarirhamza zarirhamza requested a review from lym953 June 9, 2026 15:19

@duncanista duncanista left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this — the bug is real and the runtime-branching shim is a reasonable approach. A few thoughts on the structure and on coverage that I think are worth addressing before merge. Nothing blocking the direction, just some cleanup that would make the file layout easier to reason about for the next person who touches it.

Comment thread src/handler.js Outdated
Comment thread src/handler.js Outdated
Comment thread src/handler.js Outdated
Comment thread Dockerfile Outdated
- Rename `userIsEsm()` to `shouldUseEsmHandler()`. The function returns a
  routing decision based on heuristics (env regex + package.json sniff),
  not an authoritative fact about the user's code, so the new name reads
  more honestly at the call site.

- Inline the former `src/handler.cjs` body into the CJS branch of the
  shim and delete `src/handler.cjs`. Lambda's resolver doesn't auto-resolve
  `.cjs`, so handler.cjs was only ever reachable via this shim's
  `require("./handler.cjs")`. Keeping it as a separate file made the load
  graph confusing — future readers grepping for "what loads handler.cjs?"
  would find one caller in the same directory. The heavy `require()`s
  remain scoped to the CJS branch so ESM users don't pay the cost of
  pulling in the full tracer at require-time.

- Add `src/handler.spec.ts` covering shouldUseEsmHandler's four-case
  matrix (.mjs in _HANDLER / DD_LAMBDA_HANDLER, package.json type=module,
  package.json without type field, package.json missing, package.json
  unparseable). Detection is the load-bearing piece — a regression there
  means either ESM users keep seeing ERR_REQUIRE_ESM or CJS users pay an
  unnecessary async-import cold-start cost.

- Tighten `scripts/update_dist_version.sh`'s `cp src/handler.*` to list
  the runtime artifacts explicitly. The broader glob also picked up
  `src/handler.spec.ts`, which is a test fixture, not a runtime file —
  shipping it would also be silently inconsistent with the Dockerfile's
  hard-coded `rm /nodejs/.../handler.js`.

- Expand the Dockerfile comment to point at
  `scripts/update_dist_version.sh` as the source of the file being
  removed, so a future change to how `src/handler.js` is copied into
  `dist/` is visibly tied to the layer-side `rm`.
@DarcyRaynerDD

Copy link
Copy Markdown
Contributor

I think I wrote the original version of the handler.cjs/esm module split a few years ago, so maybe this context would help.

So, much of this logic was due to whether the node runtime supports ESM modules and top level await, (at the time we supported node 12 which didn't). Top level await is only supported in ESM, and is needed when dynamic importing ESM modules so we can receive the handler function instead of a promise, and export the result. This isn't possible in CJS, while you can import ESM modules dynamically, they end up as promises and if you want to await that promise you have to do it within an async function. ESM doesn't have any problems requiring CJS modules dynamically though. So, when the customer is using the layer, and a runtime >= 14, it makes sense to just use the ESM handler since it should handle all cases in theory. Since older runtimes are completely deprecated now, we should only bundle the ESM handler into the layer. I haven't looked at the support case to check if this reasoning was solid, just what I remember verifying at the time, and I know that was the previous attempted approach, but I'm confused how it could have broken CJS users.

The second case is when installing datadog-lambda-js via npm. Since we don't know which module format the customer is using, we have to package both versions. I believe I did some experimentation to find which files would be auto-imported based on their file extension. I think you had to specify the .mjs extension for it to work for esm. Typically though, a customer using the npm module could just import the datadog-lambda-js library and wrap manually if they ran into an issue.

After feedback from Darcy Rayner (original handler.cjs/mjs author) and
re-evaluating the cold-start trade-offs, the shim is the wrong shape.

The shim's CJS branch saved ~10-30ms of cold-start init for CJS users
on the npm-redirect path, but its ESM branch moved 300-800ms of
tracer/handler load work from Lambda's init phase into the first
invocation. That's worse in three ways for ESM users (the cohort this
PR exists to fix):

  1. First-invoke timeout risk on functions with tight timeouts —
     the loading work now eats into the invocation's time budget.
  2. Provisioned concurrency is wasted — work deferred past init isn't
     pre-warmed, so PC customers pay for warm capacity and still hit
     cold-start latency on the first real request.
  3. Lambda SnapStart is wasted for the same reason — snapshot is
     taken after init, and the shim's lazy `import()` runs post-snapshot.

handler.mjs's async `load()` already handles both CJS and ESM user
modules transparently. Lambda's bootstrap resolves `dist/handler.handler`
to handler.mjs once `.js` is absent, so removing the shim is sufficient
and matches the behavior the published layer has always had.

End state:
- Sole Lambda entry point everywhere: `handler.mjs`.
- The npm tarball and the layer ship the same `dist/` (one handler file).
- No detection heuristics, no shim, no `handler.cjs` (already gone).

Aligns with:
- Darcy Rayner's recommendation on the PR thread to "only bundle the
  ESM handler" since all supported runtimes (Node 18+) support top-level
  await in ESM.
- The customer's own suggested fix in
  #782.
- SLES-2888's confirmed-working internal workaround
  (`rm dist/handler.js` post-install).
- The intent of the earlier PR #697.
@zarirhamza zarirhamza changed the title fix: route ESM Lambdas through handler.mjs via CJS shim fix: ship handler.mjs as the sole Lambda entry point Jun 11, 2026
@zarirhamza

Copy link
Copy Markdown
Contributor Author

@DarcyRaynerDD thanks for the historical context — it's exactly what was missing. You're right on both counts.

I switched the PR to do what you suggested in commit ee53944: handler.mjs is now the sole Lambda entry point in both the npm tarball and the layer. The shim is gone, src/handler.cjs is gone, and dist/ ships a single handler file. The diff against main collapses down to ~6 source lines plus removal of the dead files.

Re your "I'm confused how it could have broken CJS users" point — I went back through this carefully and I don't think it would have. Lambda's bootstrap resolves dist/handler.handler to handler.mjs once .js is absent, and handler.mjs's async load() already handles both CJS and ESM user modules transparently. The earlier attempt (PR #697) seems to have been closed without a recorded technical reason; I suspect it was process drift rather than a real failure. The integration test suite's esm_node{18,20,22,24} jobs cover exactly this code path against a real ESM user function, and they pass on this branch.

I had originally added a runtime-branching shim to avoid making CJS users on the npm-redirect path pay an extra ~10-30ms of init time (sync loadSync vs. async await load()). On reflection that's noise relative to a Datadog Lambda's normal init cost, and the shim's deferred-load behavior actually penalized ESM users on first invocation by 300-800ms — including PC and SnapStart customers, where init work is supposed to be pre-paid. Net loss for the customers this PR exists to help, so it's gone.

Description updated. Welcome any further notes.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Removes the CommonJS handler artifact path so AWS Lambda resolves the Datadog handler redirect to the ESM handler.mjs, avoiding ESM user-handler load failures caused by dist/handler.js being preferred by the bootstrap resolver.

Changes:

  • Delete src/handler.cjs so there’s no longer a CJS handler source to ship/copy.
  • Update scripts/update_dist_version.sh to copy only src/handler.mjs into dist/.
  • Remove publishing-time steps that copied dist/handler.cjs to dist/handler.js.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
src/handler.cjs Removes the CJS handler implementation so it can’t be shipped as an entry point.
scripts/update_dist_version.sh Restricts post-build copying to handler.mjs as the intended Lambda entry point.
scripts/publish_prod.sh Stops injecting dist/handler.js during prod publishing.
.gitlab/scripts/publish_npm.sh Stops injecting dist/handler.js during CI npm publishing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread scripts/update_dist_version.sh
@lucaspimentel

lucaspimentel commented Jun 11, 2026

Copy link
Copy Markdown
Member

According to 🤖

loadSync is now dead code after this PR

Deleting src/handler.cjs removes the only consumer of loadSync, leaving an orphaned synchronous loader chain in src/runtime/user-function.ts:

  • loadSync (line 317) — its only callers were handler.cjs (the traceExtractor load and the wrapped-handler load); both are gone with the file.
  • _loadUserAppSync (line 224) — only called by loadSync.
  • _tryRequireSync (line 164) — only called by _loadUserAppSync.

It's also not reachable through the package's public entry point: src/index.ts imports only subscribeToDC from ./runtime, so loadSync was never exposed to external consumers despite the export { load, loadSync } re-export in src/runtime/index.ts:1. No tests reference it either.

Since this PR is what orphaned it, it'd be clean to remove the three functions (user-function.ts:164-196, 224-242, 317-342) and drop loadSync from the runtime/index.ts re-export in the same change. The async load / _loadUserApp / _tryRequire path remains the sole live loader, which matches the PR's goal of a single ESM entry point.

…rtifacts

Two follow-ups from PR #784 review:

1. Remove `loadSync`, `_loadUserAppSync`, `_tryRequireSync`. Deleting
   `src/handler.cjs` in the previous commit removed the only consumers of
   the synchronous loader chain. `loadSync` was never re-exported from the
   public `src/index.ts` entry point, no tests reference it, and the only
   internal import (`src/handler.mjs`) uses the async `load()`. The three
   functions and the `loadSync` re-export in `src/runtime/index.ts` are
   now dead code, so drop them.

2. Defensive cleanup in `scripts/update_dist_version.sh`: explicitly
   `rm -f dist/handler.js dist/handler.cjs` before copying handler.mjs.
   `tsc` does not clean stale files in `dist/` between incremental
   builds, so a leftover `handler.js` from a prior build of an earlier
   ref could resurface and silently re-introduce the resolver bug this
   PR is fixing. The publish scripts already do `rm -rf ./dist` before
   `yarn build`, so this is purely a guard for local dev builds.
@zarirhamza

Copy link
Copy Markdown
Contributor Author

@lucaspimentel good catch — verified and addressed in 3aa51e2:

  • loadSync only ever had two callers, both in handler.cjs (the traceExtractor load and the wrapped-handler load). Gone.
  • _loadUserAppSync only called by loadSync. Gone.
  • _tryRequireSync only called by _loadUserAppSync. Gone.
  • Confirmed not in the public API: src/index.ts only imports subscribeToDC from ./runtime, and no tests reference loadSync.

Removed all three functions from src/runtime/user-function.ts and dropped loadSync from the src/runtime/index.ts re-export in the same chunk. Build + lint + tests pass; the two nock.isDone() flakes in src/index.spec.ts reproduce on plain origin/main and are unrelated.

Net diff: -105 lines from user-function.ts, -1 char from runtime/index.ts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants