fix: ship handler.mjs as the sole Lambda entry point by zarirhamza · Pull Request #784 · DataDog/datadog-lambda-js

zarirhamza · 2026-06-09T14:20:18Z

What does this PR do?

Fixes ESM Lambda failures with the Datadog handler redirect by removing the dist/handler.js artifact from the published npm tarball. Lambda's bootstrap then resolves node_modules/datadog-lambda-js/dist/handler.handler to dist/handler.mjs, whose async load() already handles both CJS and ESM user modules. This aligns the npm package with the layer's behavior (which has always loaded handler.mjs directly).

Motivation

AWS Lambda's CJS resolver picks dist/handler.js before dist/handler.mjs when the handler string is node_modules/datadog-lambda-js/dist/handler.handler. Today dist/handler.js ships as a verbatim copy of dist/handler.cjs, so ESM user functions hit ERR_REQUIRE_ESM (or the surface "Cannot find module" variant when _tryRequireSync swallows it).

Reported in #782.

Approach

handler.mjs already handles both CJS and ESM user modules through its async load(). The only thing standing in its way is dist/handler.js, which Lambda's resolver picks first. This PR removes the source of that artifact end-to-end:

Delete src/handler.cjs (it was only reachable via the shim's require("./handler.cjs"), and Lambda's resolver doesn't auto-resolve .cjs).
Tighten scripts/update_dist_version.sh to copy only src/handler.mjs into dist/.
Drop the layer-side RUN rm /nodejs/.../handler.js in the Dockerfile — there is no handler.js to remove now.
Drop the cp ./dist/handler.cjs ./dist/handler.js lines from .gitlab/scripts/publish_npm.sh and scripts/publish_prod.sh (these were the original injection points).

End state: handler.mjs is the sole Lambda entry point in both the npm tarball and the published layer.

Why not the runtime-branching shim (earlier commits in this PR)

Initial commits added a src/handler.js CJS shim that detected ESM vs CJS user code at runtime and routed accordingly. After reviewer feedback and a closer look at the cold-start path, the shim is the wrong shape:

For CJS users it saved ~10-30ms of init time vs. going through handler.mjs's async load(). Negligible against the 500ms-3s a Datadog-instrumented Lambda already spends in init.
For ESM users it moved 300-800ms of tracer + user-handler loading from Lambda's init phase into the first invocation, which is strictly worse:
- First-invoke timeout risk on tight-timeout functions (3s API Gateway is common).
- Provisioned concurrency is wasted — work deferred past init isn't pre-warmed.
- Lambda SnapStart is wasted — snapshot is taken after init, and the shim's lazy import() runs post-snapshot.

handler.mjs runs all that work under Lambda's init top-level await, so PC / SnapStart customers pre-pay the cost and get fast invocations.

Aligns with:

The comment thread on this PR from the original handler.cjs / handler.mjs author recommending we "only bundle the ESM handler" since all supported runtimes (Node 18+) support top-level await in ESM.
The reporter's own suggested fix in #782.
The intent of the earlier PR #697.

Testing Guidelines

yarn build produces a dist/ with handler.mjs only — no handler.js, no handler.cjs.
yarn check-formatting, yarn lint, and yarn test behave the same as main (two pre-existing flakes in src/index.spec.ts around nock.isDone() reproduce on plain origin/main and are unrelated).
Integration test suite (esm_node{18,20,22,24}) verifies the layer path with ESM user code. The same path now also covers the npm tarball, since both ship the same handler.

Types of Changes

Bug fix
New feature
Breaking change
Misc (docs, refactoring, dependency upgrade, etc.)

Check all that apply

This PR's description is comprehensive
This PR contains breaking changes that are documented in the description
This PR introduces new APIs or parameters that are documented and unlikely to change in the foreseeable future
This PR impacts documentation, and it has been updated (or a ticket has been logged)
This PR's changes are covered by the automated tests
This PR collects user input/sensitive content into Datadog
This PR passes the integration tests (ask a Datadog member to run the tests)

AWS Lambda's CJS resolver picks `dist/handler.js` before `dist/handler.mjs` when the handler string is `node_modules/datadog-lambda-js/dist/handler.handler`, causing ESM user functions to fail with ERR_REQUIRE_ESM (or the surface "Cannot find module" variant when _tryRequireSync swallows it). Today `dist/handler.js` ships as a verbatim copy of `dist/handler.cjs` (via `cp ./dist/handler.cjs ./dist/handler.js` in the publish scripts), so it can never load ESM user code. This replaces the copy with a real `src/handler.js` shim that: - Detects ESM user code via the task root `package.json` `type` field or a `.mjs` `DD_LAMBDA_HANDLER` / `_HANDLER`. - For CJS user code: synchronously delegates to `handler.cjs`, preserving the prior fast path (no cold-start regression). - For ESM user code: exposes an async handler that lazily `import()`s `handler.mjs` on the first invocation. `handler.mjs`'s async `load()` transparently handles both CJS and ESM user modules. `scripts/update_dist_version.sh` already copies `src/handler.*` into `dist/`, so the shim ships automatically; the explicit `cp ./dist/handler.cjs ./dist/handler.js` lines in `publish_npm.sh` and `publish_prod.sh` are no longer needed and are removed. Refs: #782 Refs: SLES-2888, SLES-2895, SVLS-9238

datadog-prod-us1-3 · 2026-06-09T14:27:52Z

✨ Fix all issues with BitsAI

⚠️ Warnings

🚦 1 Pipeline job failed

DataDog/datadog-lambda-js | e2e-test-status

Useful? React with 👍 / 👎

_{This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 3aa51e2 | Docs | Datadog PR Page | Give us feedback!}

The Dockerfile copies `dist/` wholesale into the layer, so the new `dist/handler.js` shim would get included and Lambda's bootstrap would resolve `handler.handler` to it. Inside the layer that goes through the shim's CJS branch -> handler.cjs -> _tryRequireSync, which can't load `.mjs` user modules (this regressed the `esm_node*` integration tests). The layer has never shipped a `handler.js`; Lambda's resolver falls through to `handler.mjs`, whose async `load()` handles both CJS and ESM user code. Drop the shim from the layer to preserve that behavior. The shim is still useful in the npm tarball, where the publish scripts used to copy `handler.cjs` to `handler.js`.

zarirhamza · 2026-06-09T14:43:28Z

First push regressed the esm_node* integration tests on all four runtimes — error was Runtime.ImportModuleError: Cannot find module 'esm' through handler.cjs. Root cause: the Dockerfile copies dist/ wholesale into the layer, so the new dist/handler.js shim ended up on the layer too, and Lambda's bootstrap resolved handler.handler to it instead of falling through to handler.mjs.

Inside the layer, the shim's userIsEsm() check returns false for the integration test's esm_node function (handler string is esm.handle with no .mjs suffix, and integration_tests/package.json has no type: "module"), so it delegated to handler.cjs → loadSync → _tryRequireSync, which doesn't try .mjs.

The layer has never shipped a handler.js (it was only injected into the npm tarball by the publish-script cp), and Lambda's resolver fall-through to handler.mjs already handles both CJS and ESM via the async load(). Second commit (270a846) drops the shim from the layer via RUN rm so the layer keeps that behavior unchanged from main. Shim still ships in the npm package, which is where it's actually needed.

duncanista

Thanks for tackling this — the bug is real and the runtime-branching shim is a reasonable approach. A few thoughts on the structure and on coverage that I think are worth addressing before merge. Nothing blocking the direction, just some cleanup that would make the file layout easier to reason about for the next person who touches it.

- Rename `userIsEsm()` to `shouldUseEsmHandler()`. The function returns a routing decision based on heuristics (env regex + package.json sniff), not an authoritative fact about the user's code, so the new name reads more honestly at the call site. - Inline the former `src/handler.cjs` body into the CJS branch of the shim and delete `src/handler.cjs`. Lambda's resolver doesn't auto-resolve `.cjs`, so handler.cjs was only ever reachable via this shim's `require("./handler.cjs")`. Keeping it as a separate file made the load graph confusing — future readers grepping for "what loads handler.cjs?" would find one caller in the same directory. The heavy `require()`s remain scoped to the CJS branch so ESM users don't pay the cost of pulling in the full tracer at require-time. - Add `src/handler.spec.ts` covering shouldUseEsmHandler's four-case matrix (.mjs in _HANDLER / DD_LAMBDA_HANDLER, package.json type=module, package.json without type field, package.json missing, package.json unparseable). Detection is the load-bearing piece — a regression there means either ESM users keep seeing ERR_REQUIRE_ESM or CJS users pay an unnecessary async-import cold-start cost. - Tighten `scripts/update_dist_version.sh`'s `cp src/handler.*` to list the runtime artifacts explicitly. The broader glob also picked up `src/handler.spec.ts`, which is a test fixture, not a runtime file — shipping it would also be silently inconsistent with the Dockerfile's hard-coded `rm /nodejs/.../handler.js`. - Expand the Dockerfile comment to point at `scripts/update_dist_version.sh` as the source of the file being removed, so a future change to how `src/handler.js` is copied into `dist/` is visibly tied to the layer-side `rm`.

DarcyRaynerDD · 2026-06-11T14:47:28Z

I think I wrote the original version of the handler.cjs/esm module split a few years ago, so maybe this context would help.

So, much of this logic was due to whether the node runtime supports ESM modules and top level await, (at the time we supported node 12 which didn't). Top level await is only supported in ESM, and is needed when dynamic importing ESM modules so we can receive the handler function instead of a promise, and export the result. This isn't possible in CJS, while you can import ESM modules dynamically, they end up as promises and if you want to await that promise you have to do it within an async function. ESM doesn't have any problems requiring CJS modules dynamically though. So, when the customer is using the layer, and a runtime >= 14, it makes sense to just use the ESM handler since it should handle all cases in theory. Since older runtimes are completely deprecated now, we should only bundle the ESM handler into the layer. I haven't looked at the support case to check if this reasoning was solid, just what I remember verifying at the time, and I know that was the previous attempted approach, but I'm confused how it could have broken CJS users.

The second case is when installing datadog-lambda-js via npm. Since we don't know which module format the customer is using, we have to package both versions. I believe I did some experimentation to find which files would be auto-imported based on their file extension. I think you had to specify the .mjs extension for it to work for esm. Typically though, a customer using the npm module could just import the datadog-lambda-js library and wrap manually if they ran into an issue.

After feedback from Darcy Rayner (original handler.cjs/mjs author) and re-evaluating the cold-start trade-offs, the shim is the wrong shape. The shim's CJS branch saved ~10-30ms of cold-start init for CJS users on the npm-redirect path, but its ESM branch moved 300-800ms of tracer/handler load work from Lambda's init phase into the first invocation. That's worse in three ways for ESM users (the cohort this PR exists to fix): 1. First-invoke timeout risk on functions with tight timeouts — the loading work now eats into the invocation's time budget. 2. Provisioned concurrency is wasted — work deferred past init isn't pre-warmed, so PC customers pay for warm capacity and still hit cold-start latency on the first real request. 3. Lambda SnapStart is wasted for the same reason — snapshot is taken after init, and the shim's lazy `import()` runs post-snapshot. handler.mjs's async `load()` already handles both CJS and ESM user modules transparently. Lambda's bootstrap resolves `dist/handler.handler` to handler.mjs once `.js` is absent, so removing the shim is sufficient and matches the behavior the published layer has always had. End state: - Sole Lambda entry point everywhere: `handler.mjs`. - The npm tarball and the layer ship the same `dist/` (one handler file). - No detection heuristics, no shim, no `handler.cjs` (already gone). Aligns with: - Darcy Rayner's recommendation on the PR thread to "only bundle the ESM handler" since all supported runtimes (Node 18+) support top-level await in ESM. - The customer's own suggested fix in #782. - SLES-2888's confirmed-working internal workaround (`rm dist/handler.js` post-install). - The intent of the earlier PR #697.

zarirhamza · 2026-06-11T15:06:44Z

@DarcyRaynerDD thanks for the historical context — it's exactly what was missing. You're right on both counts.

I switched the PR to do what you suggested in commit ee53944: handler.mjs is now the sole Lambda entry point in both the npm tarball and the layer. The shim is gone, src/handler.cjs is gone, and dist/ ships a single handler file. The diff against main collapses down to ~6 source lines plus removal of the dead files.

Re your "I'm confused how it could have broken CJS users" point — I went back through this carefully and I don't think it would have. Lambda's bootstrap resolves dist/handler.handler to handler.mjs once .js is absent, and handler.mjs's async load() already handles both CJS and ESM user modules transparently. The earlier attempt (PR #697) seems to have been closed without a recorded technical reason; I suspect it was process drift rather than a real failure. The integration test suite's esm_node{18,20,22,24} jobs cover exactly this code path against a real ESM user function, and they pass on this branch.

I had originally added a runtime-branching shim to avoid making CJS users on the npm-redirect path pay an extra ~10-30ms of init time (sync loadSync vs. async await load()). On reflection that's noise relative to a Datadog Lambda's normal init cost, and the shim's deferred-load behavior actually penalized ESM users on first invocation by 300-800ms — including PC and SnapStart customers, where init work is supposed to be pre-paid. Net loss for the customers this PR exists to help, so it's gone.

Description updated. Welcome any further notes.

Copilot

Pull request overview

Removes the CommonJS handler artifact path so AWS Lambda resolves the Datadog handler redirect to the ESM handler.mjs, avoiding ESM user-handler load failures caused by dist/handler.js being preferred by the bootstrap resolver.

Changes:

Delete src/handler.cjs so there’s no longer a CJS handler source to ship/copy.
Update scripts/update_dist_version.sh to copy only src/handler.mjs into dist/.
Remove publishing-time steps that copied dist/handler.cjs to dist/handler.js.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File	Description
`src/handler.cjs`	Removes the CJS handler implementation so it can’t be shipped as an entry point.
`scripts/update_dist_version.sh`	Restricts post-build copying to `handler.mjs` as the intended Lambda entry point.
`scripts/publish_prod.sh`	Stops injecting `dist/handler.js` during prod publishing.
`.gitlab/scripts/publish_npm.sh`	Stops injecting `dist/handler.js` during CI npm publishing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

lucaspimentel · 2026-06-11T18:45:50Z

According to 🤖

loadSync is now dead code after this PR

Deleting src/handler.cjs removes the only consumer of loadSync, leaving an orphaned synchronous loader chain in src/runtime/user-function.ts:

loadSync (line 317) — its only callers were handler.cjs (the traceExtractor load and the wrapped-handler load); both are gone with the file.
_loadUserAppSync (line 224) — only called by loadSync.
_tryRequireSync (line 164) — only called by _loadUserAppSync.

It's also not reachable through the package's public entry point: src/index.ts imports only subscribeToDC from ./runtime, so loadSync was never exposed to external consumers despite the export { load, loadSync } re-export in src/runtime/index.ts:1. No tests reference it either.

Since this PR is what orphaned it, it'd be clean to remove the three functions (user-function.ts:164-196, 224-242, 317-342) and drop loadSync from the runtime/index.ts re-export in the same change. The async load / _loadUserApp / _tryRequire path remains the sole live loader, which matches the PR's goal of a single ESM entry point.

…rtifacts Two follow-ups from PR #784 review: 1. Remove `loadSync`, `_loadUserAppSync`, `_tryRequireSync`. Deleting `src/handler.cjs` in the previous commit removed the only consumers of the synchronous loader chain. `loadSync` was never re-exported from the public `src/index.ts` entry point, no tests reference it, and the only internal import (`src/handler.mjs`) uses the async `load()`. The three functions and the `loadSync` re-export in `src/runtime/index.ts` are now dead code, so drop them. 2. Defensive cleanup in `scripts/update_dist_version.sh`: explicitly `rm -f dist/handler.js dist/handler.cjs` before copying handler.mjs. `tsc` does not clean stale files in `dist/` between incremental builds, so a leftover `handler.js` from a prior build of an earlier ref could resurface and silently re-introduce the resolver bug this PR is fixing. The publish scripts already do `rm -rf ./dist` before `yarn build`, so this is purely a guard for local dev builds.

zarirhamza · 2026-06-11T18:56:10Z

@lucaspimentel good catch — verified and addressed in 3aa51e2:

loadSync only ever had two callers, both in handler.cjs (the traceExtractor load and the wrapped-handler load). Gone.
_loadUserAppSync only called by loadSync. Gone.
_tryRequireSync only called by _loadUserAppSync. Gone.
Confirmed not in the public API: src/index.ts only imports subscribeToDC from ./runtime, and no tests reference loadSync.

Removed all three functions from src/runtime/user-function.ts and dropped loadSync from the src/runtime/index.ts re-export in the same chunk. Build + lint + tests pass; the two nock.isDone() flakes in src/index.spec.ts reproduce on plain origin/main and are unrelated.

Net diff: -105 lines from user-function.ts, -1 char from runtime/index.ts.

zarirhamza marked this pull request as ready for review June 9, 2026 15:19

zarirhamza requested review from a team as code owners June 9, 2026 15:19

zarirhamza requested a review from lym953 June 9, 2026 15:19

duncanista reviewed Jun 10, 2026

View reviewed changes

Comment thread src/handler.js Outdated

Comment thread src/handler.js Outdated

Comment thread src/handler.js Outdated

Comment thread Dockerfile Outdated

zarirhamza changed the title ~~fix: route ESM Lambdas through handler.mjs via CJS shim~~ fix: ship handler.mjs as the sole Lambda entry point Jun 11, 2026

lucaspimentel requested a review from Copilot June 11, 2026 18:37

Copilot started reviewing on behalf of lucaspimentel June 11, 2026 18:38 View session

Copilot AI reviewed Jun 11, 2026

View reviewed changes

Comment thread scripts/update_dist_version.sh

samchungy mentioned this pull request Jun 14, 2026

Revert Lambda ESM changes seek-oss/skuba#2456

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: ship handler.mjs as the sole Lambda entry point#784

fix: ship handler.mjs as the sole Lambda entry point#784
zarirhamza wants to merge 5 commits into
mainfrom
zarir/sles-2888-esm-handler-shim

zarirhamza commented Jun 9, 2026 •

edited

Loading

Uh oh!

datadog-prod-us1-3 Bot commented Jun 9, 2026 •

edited by datadog-official Bot

Loading

Uh oh!

zarirhamza commented Jun 9, 2026

Uh oh!

duncanista left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

DarcyRaynerDD commented Jun 11, 2026

Uh oh!

zarirhamza commented Jun 11, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

lucaspimentel commented Jun 11, 2026 •

edited

Loading

Uh oh!

zarirhamza commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

zarirhamza commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Motivation

Approach

Why not the runtime-branching shim (earlier commits in this PR)

Testing Guidelines

Types of Changes

Check all that apply

Uh oh!

datadog-prod-us1-3 Bot commented Jun 9, 2026 • edited by datadog-official Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ Warnings

Uh oh!

zarirhamza commented Jun 9, 2026

Uh oh!

duncanista left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

DarcyRaynerDD commented Jun 11, 2026

Uh oh!

zarirhamza commented Jun 11, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

lucaspimentel commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zarirhamza commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

zarirhamza commented Jun 9, 2026 •

edited

Loading

datadog-prod-us1-3 Bot commented Jun 9, 2026 •

edited by datadog-official Bot

Loading

lucaspimentel commented Jun 11, 2026 •

edited

Loading