module: add clearCache for CJS and ESM#61767
90303e6 to
1d0accc
Compare
|
The
notable-change
Please suggest a text for the release notes if you'd like to include a more detailed summary, then proceed to update the PR description with the text or a link to the notable change suggested text comment. Otherwise, the commit will be placed in the Other Notable Changes section. |
|
I’m relatively +1 on having this in Node.js, but I recall having a lot of discussions about this with @GeoffreyBooth and the @nodejs/loaders team, and it would massively break the spec, expectations, and invariants regarding ESM. (Note, this is what people have been asking us to add for a long time). My personal objection to this API is that it would inadvertently leak memory at every turn, so while this sounds good in theory, in practice it would significantly backfire in long-running scenarios. An option could be to expose it only behind a flag, putting the user in charge of choosing this behavior. Every single scenario where I saw HMR in Node.js ended up in memory leaks. This is the reason why I had so much interest in and hopes for ShadowRealm. |
benjamingr
left a comment
I am still +1 on the feature from a user usability point of view. Code lgtm.
We're giving users a tool, it may be seen as a footgun by some but hopefully libraries that use the API correctly and warn users about incorrect usage emerge. |
|
@mcollina Thanks for the feedback. I agree the ESM semantics concerns are real. This API doesn’t change the core ESM invariants (single instance per URL); it only removes Node's internal cache entries to allow explicit reloads in opt‑in workflows. Even with that, existing references (namespaces, listeners, closures) can keep old graphs alive, so this is still potentially leaky unless the app does explicit disposal. I’ll make sure the docs call out the risks and the fact that this only clears Node’s internal caches, and I’d like loader team input on the final shape of the API. This commit should address some of your concerns. b3bd79a
Thanks for the review @benjamingr. Would you mind re-reviewing so I can trigger CI? |
|
Thanks a lot for this ❤️ |
|
Rather than violating ESM invariants, can't node just provide a function that imports a url? i.e. While the given example of:

const url = new URL('./mod.mjs', import.meta.url);
await import(url.href);
clearCache(url);
await import(url.href); // re-executes the module

is indeed not spec compliant, it's perfectly legal to have something like:

import { clearCache, importModule } from "node:module";
await importModule(someUrl);
clearCache();
await importModule(someUrl); // reexecute |
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #61767 +/- ##
==========================================
- Coverage 89.72% 89.71% -0.02%
==========================================
Files 676 677 +1
Lines 206065 206553 +488
Branches 39508 39597 +89
==========================================
+ Hits 184897 185301 +404
- Misses 13315 13383 +68
- Partials 7853 7869 +16
|
While I am +1 to the idea in general, I am afraid the current API may bring more problems than it solves... see the comments.
(Granted, it isn't really a problem unique to this specific design. I think the issue is more that this is not a very well solved problem so far. I don't really know what it should look like, though I think I might be able to point out what it should not look like, to avoid adding/re-introducing leaks/use-after-frees that userland workarounds can already manage.)
|
I was the one requesting this while sitting next to yagiz today. We take advantage of Module Federation, which allows us to distribute code at runtime. However, when parts of the distributed system are updated, the old code gets stuck in the module cache. I've had some workarounds, like attempting to purge the require cache; however, when it comes to ESM, it's a difficult problem. Since we do this distribution primarily in production, and there can be thousands of updates a day, I block ESM from being supported because it'll leak memory, which was fine for several years but is becoming more problematic with modern tooling. On Lambda we cannot just exit a process and bring a new one up without triggering an empty deploy, and cold starting a new Lambda has generally been a bigger perf hit than trying to "reset" the module cache for primitive hot reload. Now, I know this might be controversial, or not recommended, but the reality is that many large companies use federation; most Fortune 50 companies use it heavily. All of them are relying on userland cobbling I've created. If there is a solution, it would be greatly appreciated by all of my users. I believe this would also be very useful in general for tooling like rspack etc. where we have universal dev servers. If invalidation of specific modules causes complexity, I'd be more than happy with a nuclear option like resetModuleCache() which just clears everything entirely. It would be a little slower, but nothing is slower than killing a process and bringing up a new one: a "soft restart" of Node without killing it. I don't have much opinion on spec compliance etc.; this can go through NAPI as well if that would avoid any spec concerns or pushback. |
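For readers unfamiliar with the CommonJS workaround mentioned above, a rough sketch of a "nuclear" reset of the require cache follows. It is an illustration only, not the Module Federation implementation: `resetRequireCache` and the temporary `counter.js` module are hypothetical names invented for this example. It only drops entries from `require.cache`; objects that are still referenced elsewhere stay alive.

```javascript
// Sketch only: clear the whole CommonJS module cache and observe a reload.
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: drop every entry from require.cache so that the next
// require() of a file re-reads and re-executes it.
function resetRequireCache() {
  let removed = 0;
  for (const key of Object.keys(require.cache)) {
    delete require.cache[key];
    removed++;
  }
  return removed;
}

// Demo module that counts how many times it has been executed.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cjs-reset-'));
const file = path.join(dir, 'counter.js');
fs.writeFileSync(file, 'globalThis.__loads = (globalThis.__loads || 0) + 1;\n');

require(file); // executes the module
require(file); // cache hit, no re-execution
const loadsBeforeReset = globalThis.__loads;

resetRequireCache();
require(file); // executes the module again after the reset
const loadsAfterReset = globalThis.__loads;
```

There is no equivalent public handle on the ESM loader's cache, which is the gap this PR is about.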
|
Chiming in to say that re-loading a module is very helpful in tests. We can do this with the fabulous CJS paradigm, but ESM does not have a viable equivalent and it should. |
|
I think there are still quite a few places that need updates/tests - I tried my best to find them, but there are some dusty corners in the module loader that I have never poked at, you might want to take a heap snapshot or write more tests with
|
|
I think I addressed all of your concerns @joyeecheung. Let me know if I missed anything! |
Just pinging @guybedford to speak on the spec concerns. I think we should wait for him or someone similarly knowledgeable about the spec to comment before landing. In general I'm +1 on the feature, assuming it can be safely implemented. My (dim) recollection was that the last time we considered it, it was impossible to modify an ES module after it had been loaded into V8. Has that changed in recent years? How do you handle cases like |
|
I checked out the existing HMR solutions a bit and I think this API may be enough:

/**
* @param {string|URL} specifier // This is what would've been passed into import(specifier) or require(specifier)
* @param {{
* parentURL: string | URL, // Mandatory, because parent identity is part of the resolution cache key
* importAttributes?: Record<string, string>, // Optional, only meaningful when resolver is "import"
* resolver: "import" | "require", // Specifies how resolution should be performed
* caches: "resolution" | "module" | "all", // resolution: only clear resolution cache; module: clear cache for the module everywhere in Node.js (not counting JS level references)
* }} options
*/
function clearCache(specifier, options) {}

Clearing resolution cache is still useful for HMR solutions that do cache busting URLs - which I think may actually be the more spec-compliant way to implement it. The spec violation technically doesn't come from evaluation, but from module mapping specified by HostLoadImportedModule:
The cache clearing API makes it possible for the same referrer + module request to get different module records in return, but it does not mean this must be violating the spec by nature, it just delegates the responsibility of correctness to whoever uses this API, similar to how V8 delegates this to Node.js. One way to ensure this is correctly implemented is to use a cache busting referrer (i.e. A minimal example of using this (ignoring some complexities from e.g. fs) can be

import fs from 'node:fs';
import http from 'node:http';
import module from 'node:module';

let app, rev = 0;
const reload = async () => {
  const prev = rev ? `./app.mjs?hmr=${rev}` : null;
  await app?.dispose?.(); // clear side effects
  if (prev) {
    module.clearCache(prev, {
      parentURL: import.meta.url,
      resolver: "import",
      caches: "all",
    });
  }
  app = await import(`./app.mjs?hmr=${++rev}`);
};
await reload();
http.createServer((req, res) => app.handle(req, res)).listen(3000);
fs.watch(new URL(import.meta.resolve('./app.mjs')), reload); |
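The cache-busting half of the example above already works on released Node.js without any new API, because a different query string is a different module URL. The sketch below shows only that part; `loadRevision`, `demo`, and the temporary `app.mjs` are hypothetical names for illustration. Without some form of cache clearing, every old revision stays in the loader's cache, which is the leak the proposed `clearCache` is meant to address.

```javascript
// Sketch only: re-evaluating an ES module by changing its query string.
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { pathToFileURL } = require('node:url');

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'hmr-bust-'));
const file = path.join(dir, 'app.mjs');
fs.writeFileSync(file, 'export const state = { createdAt: Date.now() };\n');
const base = pathToFileURL(file).href;

// Hypothetical helper: import a specific revision of the module.
function loadRevision(rev) {
  return import(`${base}?hmr=${rev}`);
}

async function demo() {
  const first = await loadRevision(1);
  const again = await loadRevision(1);  // same URL: same module instance
  const second = await loadRevision(2); // new URL: module evaluated again
  return {
    sameRevisionReused: first.state === again.state,
    newRevisionIsFresh: first.state !== second.state,
  };
}
```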
|
@joyeecheung In that example, if |
|
@ScriptedAlchemy @Nsttt the meeting is today at 4pm UTC. If nobody sent you the link yet, feel free to email me at the email on my GitHub profile and I'll send you the meeting URL. |
This is a minimal example, but if dependencies need to be supported, the HMR solution can just append hmr parameter to all the dependencies via a loader hook that track the graph through context.parentURL and manage the lifecycle of them, which IIUC is what they already do for CJS anyway (because |
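A rough sketch of the kind of resolve hook described above: it copies the parent's `hmr` query parameter onto each file dependency and records the edge so the graph can be managed later. `resolveWithHmr` and `dependencyGraph` are hypothetical names, and the function follows the synchronous `(specifier, context, nextResolve)` hook shape; on a Node.js version that provides `module.registerHooks`, it could presumably be passed as `registerHooks({ resolve: resolveWithHmr })`, which is not exercised here.

```javascript
// Sketch only: propagate the parent's hmr revision to its dependencies.

// Hypothetical bookkeeping: parent URL -> set of resolved dependency URLs.
const dependencyGraph = new Map();

function resolveWithHmr(specifier, context, nextResolve) {
  const result = nextResolve(specifier, context);
  const { parentURL } = context;

  // Leave entry points and non-file modules (e.g. node: builtins) untouched.
  if (!parentURL || !result.url.startsWith('file:')) return result;

  // Only rewrite dependencies of modules that were loaded with a revision.
  const rev = new URL(parentURL).searchParams.get('hmr');
  if (rev === null) return result;

  const url = new URL(result.url);
  url.searchParams.set('hmr', rev);

  // Track the edge so a whole subgraph can be invalidated together later.
  if (!dependencyGraph.has(parentURL)) dependencyGraph.set(parentURL, new Set());
  dependencyGraph.get(parentURL).add(url.href);

  return { ...result, url: url.href };
}
```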
d4fb1b4 to
7e9a7c5
Compare
|
I think one way to test this more robustly (i.e. V8 can actually garbage collect it) might be something like this:

// Flags: --expose-internals
const { internalBinding } = require('internal/test/binding');
const { ModuleWrap } = internalBinding('module_wrap');
const { queryObjects } = require('node:v8'); // Let's run the test in CJS to reduce the noise from queryObjects
const { clearCache } = require('node:module');
const { pathToFileURL } = require('node:url');
let app, rev = 0;
const reload = async () => {
  const prev = rev ? `./app.mjs?hmr=${rev}` : null;
  if (prev) {
    clearCache(prev, {
      parentURL: pathToFileURL(__filename), // import.meta is not available in CJS
      resolver: "import",
      caches: "all",
    });
  }
  app = await import(`./app.mjs?hmr=${++rev}`);
};
(async () => {
  await reload(); // first load
  await reload(); // second
  const result = queryObjects(ModuleWrap, { format: 'summary' });
  // Validate that result no longer includes module with a wrap whose .url includes `app.mjs?hmr=1`
})();

(Or use checkIfCollectableByCounting with ModuleWrap) |
|
@joyeecheung Pushed a new test according to your recommendations. |
|
@joyeecheung @mcollina would you mind re-reviewing? |
|
@guybedford @joyeecheung @GeoffreyBooth I'd like to land this on Monday, but I want to make sure we are all aligned with this change. Would you mind reviewing or leaving a comment with your thoughts? Just don't forget that this is an "active development" API which we can iterate on over time. |
|
Just wanted to chime in that since going full ESM a few years ago, this has been one of the biggest things I've been missing over CJS, for live reloading NodeJS code for local development based workflows (been using Worker Threads as a fallback). Very much looking forward to this one! 💚 |
| } | ||
|
|
||
| // Clear resolution cache. Only ESM has a structured resolution cache; | ||
| // CJS resolution results are not separately cached. |
I don't think this is true, there is relativeResolveCache. Other than that, we also have the stat cache in CJS. When the resolution cache needs to be cleared we should clear them too - otherwise they can also slowly leak or go stale when the file layout changes, leading to an incorrect resolution the next time. Can you add some tests for them?
We also have package.json caches, though that might be a bit harder to clean - can you add a test to check that if package.json gets updated such that the exports condition points to a different resolution, after clearing the cache, the second load correctly resolves to the file pointed to by the updated export condition?
| * @returns {string} | ||
| */ | ||
| function resolveClearCacheURL(specifier, parentURL) { | ||
| const parsedURL = getURLFromClearCacheSpecifier(specifier); |
Why is it bypassing the hooks here? I think for ESM, simply cascadedLoader.resolveSync(parentURL, specifier).url should be enough. The special cases seem to introduce the inconsistency that for absolute paths, the hooks are bypassed, which could break user code that imports full URLs with hooks that redirect them - then the module being cleared is incorrectly resolved. Can you add a test to check that when a hook is registered, say one that redirects a full path to another path, it's the redirected path's module cache that gets cleared when caches is all?
| let request = specifier; | ||
| if (parsedURL) { | ||
| if (parsedURL.protocol !== 'file:' || parsedURL.search !== '' || parsedURL.hash !== '') { | ||
| return null; |
What does returning null mean here? I think for CJS, creating a fake parent and then resolveForCJSWithHooks is enough. Similarly, if we bypass the hook for absolute paths, this may fail to clear the module cache when the resolution is customized.
| const cascadedLoader = getOrInitializeCascadedLoader(); | ||
| let deleteResolveCalls = 0; | ||
| const originalDeleteResolveCacheEntry = cascadedLoader.deleteResolveCacheEntry; | ||
| cascadedLoader.deleteResolveCacheEntry = function(...args) { |
Patching a method to check how it's called can be somewhat brittle - I think we can simply expose the resolve cache and check that it's cleared instead?
| * @param {string} filename | ||
| * @returns {boolean} true if any entries were deleted. | ||
| */ | ||
| deleteResolveCacheByFilename(filename) { |
Is this used anywhere? If unused, this can be deleted. Since the resolution cache clearing is differentiated based on resolver, this doesn't seem to be needed (if resolver is import, then just clear that exact same request + parentURL + import attribute entry. If resolver is require, this is not used at all).
| const file = path.join(__dirname, 'mod.js'); | ||
| require(file); | ||
|
|
||
| clearCache(file, { |
Can you use the snippet in #61767 (comment) to demonstrate it beyond clearing only the module cache with an absolute path? That would be a more realistic example (even though it's still relatively naive).
Also for the import path, I think the documentation deserves a reference to the ECMA262 spec referenced in that comment - if the user re-imports the exact same module request (without updating the search parameters), it technically breaks a spec invariant, so it's recommended to change the search parameter for the next load. Otherwise the behavior should be considered undefined, and we can't guarantee its correctness (we are sort of relying on the fact that V8 isn't technically following the spec right now for this to not fail - if V8 gets refactored to follow the spec more closely, not respecting the idempotency requirement in the spec might lead to a CHECK/crash).
|
If you wanted to follow the spec perfectly you would replace invalidated module records with a tombstone record. Then if the user imports a previously-cleared module it should throw. The idempotency invariant is only required in the normal completion case.
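To make the tombstone idea concrete, here is a small sketch of a module map in which invalidated entries are replaced by a tombstone, so importing them again throws instead of returning a different record. `TombstoneModuleMap` is entirely hypothetical and is not how Node.js or V8 store module records; it only illustrates the bookkeeping.

```javascript
// Sketch only: a module map where cleared entries become tombstones.
const TOMBSTONE = Symbol('tombstone');

class TombstoneModuleMap {
  #records = new Map();

  // Return the cached record for `url`, or create it with `evaluate(url)`.
  load(url, evaluate) {
    const existing = this.#records.get(url);
    if (existing === TOMBSTONE) {
      throw new Error(`Module ${url} was invalidated and cannot be re-imported`);
    }
    if (existing !== undefined) return existing;
    const record = evaluate(url);
    this.#records.set(url, record);
    return record;
  }

  // Replace the record with a tombstone instead of deleting the key, so the
  // same request can never complete normally with a different record.
  invalidate(url) {
    if (!this.#records.has(url)) return false;
    this.#records.set(url, TOMBSTONE);
    return true;
  }
}
```

A reload would then have to use a new URL (for example a new search parameter), which matches the cache-busting recommendation earlier in the thread. The tombstones themselves are never removed in this sketch, so the map still grows by one small entry per invalidated URL.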
| } | ||
|
|
||
| /** | ||
| * Remove load cache entries for a URL and its file-path variants. |
This seems a bit too broad for resolver: "import" - it could be surprising that by clearCache('./foo?t=1', ...) you also clear the cache for ./foo?t=2 - intuitively, one might expect that the latter would remain and won't see a re-load unless you specifically clear it. I think this behavior either needs a callout in the documentation, or should become configurable in the API.
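The difference between the two behaviors can be sketched with a plain `Map` standing in for the load cache. This is a hypothetical model, not the PR's implementation; `clearExact`, `clearAllVariants`, and `withoutSearchAndHash` are names made up for the illustration.

```javascript
// Sketch only: exact-URL clearing versus clearing every query/hash variant.

// Strip the search and hash parts so variants of one file compare equal.
function withoutSearchAndHash(href) {
  const url = new URL(href);
  url.search = '';
  url.hash = '';
  return url.href;
}

// Narrow behavior: only the entry for this exact URL is removed.
function clearExact(cache, href) {
  return cache.delete(href);
}

// Broad behavior: every entry that points at the same file is removed.
function clearAllVariants(cache, href) {
  const target = withoutSearchAndHash(href);
  let removed = 0;
  for (const key of [...cache.keys()]) {
    if (withoutSearchAndHash(key) === target) {
      cache.delete(key);
      removed++;
    }
  }
  return removed;
}
```

With entries for `foo.mjs?t=1` and `foo.mjs?t=2`, `clearExact` on the first leaves the second cached, while `clearAllVariants` removes both, which is the surprise being pointed out.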
GeoffreyBooth
left a comment
@guybedford @joyeecheung @GeoffreyBooth I'd like to land this on Monday, but I want to make sure we are all aligned with this change. Would you mind reviewing or leaving a comment with your thoughts? Just don't forget that this is an "active development" API which we can iterate on over time.
I defer to @joyeecheung on the technical review and to @guybedford on the spec compliance, so if both of them approve then it seems good to me!
One thing I'm wondering about is what about people using this in production. It's obviously designed only for development, but nothing stops someone from using this API anywhere, or for dependencies calling it; is that a concern? Could it be a security concern if a malicious dependency calls this in production? I assume not since what this does is basically already possible in CommonJS, but it might be something worth addressing if only in a comment.
| `resolver` is `'import'`. | ||
| Clears module resolution and/or module caches for a module. This enables | ||
| reload patterns similar to deleting from `require.cache` in CommonJS, and is useful for HMR. |
I don't think we spell out the acronym before this.
| reload patterns similar to deleting from `require.cache` in CommonJS, and is useful for HMR. | |
| reload patterns similar to deleting from `require.cache` in CommonJS, and is useful for | |
| hot module reload. |
| caches: 'module', | ||
| }); | ||
| require(file); // re-executes the module | ||
| ``` |
This example is good albeit minimal; should we add an expanded example along the lines of “if you want to implement HMR, this is how you do it”?
Introduce Module.clearCache() to invalidate CommonJS and ESM module caches with optional resolution context, enabling HMR-like reloads. Document the API and add tests/fixtures to cover cache invalidation behavior.