Skip to content

Commit 36d440e

Browse files
fix: repo-wide correctness, security & filesystem-safety hardening pass (v3.2.0) (#92)
* fix: repo-wide correctness, security & filesystem-safety hardening pass (v3.2.0) Reviewed every source file in both crates line by line, fixed the bugs found, and added regression tests throughout. Highlights: Security - patch/package.rs: path-traversal via validate-before-normalize (package//etc/passwd escaped the package tree) - patch/diff.rs: clamp unbounded Vec preallocation from untrusted bsdiff target-size header (OOM/abort on a hostile delta) - vex/verify.rs: omit zero-file patches instead of emitting an evidence-free not_affected attestation Filesystem safety / atomicity / rollback - apply: DirWriteGuard for read-only dirs, chown-before-chmod to keep setuid/setgid, parent-dir fsync after rename - cow: atomic rename-over symlink (no pre-unlink), stage cleanup - rollback: delegate to hardened apply_file_patch; AlreadyOriginal before blob check; read-only-dir new-file delete - file_hash/git_sha256: open-once + fstat (TOCTOU), regular-file guard, size/body mismatch detection - cargo/nuget sidecars: hardened writes/deletes in read-only caches - cleanup_blobs: symlink-tolerant, accurate counts - apply_lock: classify genuine flock errors as Io, clamp timeout sleep Crawlers (on-disk layout & metadata) - composer v-prefix + malformed-entry tolerance + on-disk check - go cache-at-root, version case-encoding, GOPATH list, module directive - npm symlink following + nested-recursion guard - nuget global-cache version casing - python macOS framework layout + dist-info dir-name fallback - deno macOS cache path, XDG_CACHE_HOME, empty DENO_DIR - maven XML-comment stripping + skip-section depth - cargo TOML header tolerance + dir-name version split - shared utils/fs::entry_is_dir follows symlinks API client, commands & misc - proxy-url override on binary downloads; deterministic org/title/batch flag; case-insensitive hash compare - USER_AGENT + telemetry version track CARGO_PKG_VERSION (was 1.0.0) - apply release-variant NotFound spurious-failure fix - get/scan/remove char-safe truncation (UTF-8 panic) - setup/repair honest non-zero exit codes + telemetry - rollback no-op miscount; unlock released-snapshot; vex qualified PURLs - package.json non-object/dedup/glob/key-order (preserve_order) - json_envelope status invariant + oldUuid; list ordering; fuzzy_match tie-break; lock_cli sub-second timeout; vex schema/product fixes Updated stale repair/python_crawler e2e expectations to the corrected contracts. Bumped version to 3.2.0 and added the scripts/study-crates.ts audit harness used to drive the review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test: fix Windows-only file_hash directory-rejection assertion `File::open` on a directory fails outright on Windows (different OS error kind), whereas on Unix it opens and the is_file() guard rejects it with InvalidInput. The production code rejects directories on both platforms; only pin the specific InvalidInput kind off-Windows. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent d1108cb commit 36d440e

114 files changed

Lines changed: 11196 additions & 1506 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

.gitignore

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -148,3 +148,6 @@ npm/socket-patch/bin/socket-patch-*
148148
crates/socket-patch-cli/README.md
149149
npm/socket-patch/README.md
150150
pypi/socket-patch/README.md
151+
152+
# Generated by scripts/study-crates.ts
153+
study-output/

CHANGELOG.md

Lines changed: 124 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,130 @@ in this file — see `.github/workflows/release.yml` (`version` job).
1414

1515
## [Unreleased]
1616

17+
## [3.2.0] — 2026-05-29
18+
19+
A repo-wide correctness, security, and filesystem-safety hardening pass: every
20+
source file in both crates was reviewed line by line, the bugs found were fixed,
21+
and regression tests were added throughout (the lib + integration suites grow by
22+
~10k lines of mostly tests). The audit harness used to drive the review lives in
23+
`scripts/study-crates.ts`.
24+
25+
### Security
26+
27+
- **Path-traversal in archive extraction.** `read_archive_to_map`
28+
(`patch/package.rs`) validated the raw tar entry path but returned the
29+
`package/`-stripped path, so an entry like `package//etc/passwd` passed every
30+
check and then resolved to an absolute `/etc/passwd` that `Path::join`
31+
writes outside the package tree. Validation now runs on the normalized path
32+
actually written to disk.
33+
- **Unbounded preallocation from an untrusted delta header.** `apply_diff`
34+
(`patch/diff.rs`) reserved a `Vec` sized from the bsdiff target-size header,
35+
which qbsdiff never validates — a tiny hostile delta could claim up to
36+
`i64::MAX` and abort the process. The hint is now clamped to 64 MiB.
37+
- **Evidence-free VEX attestation.** `verify_patch_record` (`vex/verify.rs`)
38+
returned `applied` for a patch touching zero files, producing a
39+
`not_affected` statement with no on-disk evidence; zero-file records are now
40+
omitted (`no_files`).
41+
42+
### Fixed — filesystem safety, atomicity & rollback
43+
44+
- **`apply` could not write into read-only directories** (Go module cache marks
45+
dirs `0o555`); added a `DirWriteGuard` that temporarily grants write on the
46+
parent dir around the CoW-break + atomic rename and restores its exact mode.
47+
- **`apply` stripped setuid/setgid bits** on every patched file because `chown`
48+
ran after `chmod`; reordered to chown-before-chmod, plus a parent-dir `fsync`
49+
so the rename survives a crash.
50+
- **Non-atomic symlink break** (`patch/cow.rs`) removed the file before staging
51+
its replacement, destroying it with no rollback on a failed write; now
52+
rename-over the link, matching the hardlink path. Stage files are cleaned up
53+
on every error arm.
54+
- **`rollback` used an unsafe in-place write**; it now delegates to the hardened
55+
`apply_file_patch` (atomic, CoW-safe, validate-before-write, permission
56+
restore). Also: a GC'd before-blob no longer shadows the already-original
57+
short-circuit, and new-file deletion works inside read-only directories.
58+
- **Hash integrity:** `compute_file_git_sha256` (`patch/file_hash.rs`) opened
59+
and stat'd the path separately (TOCTOU) and never checked the target was a
60+
regular file (a directory hashed as the empty blob); now opens once, fstats
61+
the descriptor, and rejects non-regular files. `compute_git_sha256_from_reader`
62+
now errors when the streamed byte count disagrees with the declared size.
63+
- **Sidecar writes in read-only caches:** the cargo `.cargo-checksum.json`
64+
rewrite and the NuGet `.nupkg.metadata` delete used bare, non-atomic I/O that
65+
failed `EACCES` in the locked-down registry trees they exist to serve; both
66+
now go through the hardened write/`DirWriteGuard` paths.
67+
- **Blob cleanup** (`utils/cleanup_blobs.rs`) aborted the whole sweep on one
68+
dangling symlink and inflated the "checked" count with subdirs/dotfiles; now
69+
uses `symlink_metadata`, skips stat errors, and counts only real blobs.
70+
- **Lock acquisition** (`patch/apply_lock.rs`) mapped every `flock` error to
71+
`Held` (masking `ENOLCK`/`EACCES`/unsupported-FS and busy-waiting through the
72+
whole timeout) and overshot sub-100 ms waits; genuine faults now surface
73+
immediately and the sleep is clamped to the remaining budget.
74+
75+
### Fixed — crawlers (on-disk layout & metadata)
76+
77+
- **Composer:** normalize the `v`-prefixed `installed.json` version against bare
78+
PURLs, tolerate a single malformed entry instead of dropping the file, and
79+
skip packages absent on disk.
80+
- **Go:** only skip `cache/` at the module-cache root (not at any depth),
81+
decode/encode case-escaped versions (`v1.0.0-RC1``…-!r!c1`), treat `GOPATH`
82+
as a path list, and reject malformed/empty `module` directives.
83+
- **npm:** follow symlinked directories during the global-fallback walk
84+
(`DirEntry::metadata()` doesn't follow links) and guard nested recursion so it
85+
doesn't descend through symlinked packages.
86+
- **NuGet:** lowercase the version directory (not just the id) when resolving the
87+
global packages folder, so prerelease-cased versions resolve.
88+
- **Python:** the macOS framework `Versions/` layout uses bare `3.11` dirs, and a
89+
package with missing/malformed `METADATA` now falls back to its
90+
`<name>-<version>.dist-info` directory name instead of vanishing.
91+
- **Deno:** correct the macOS cache path (`~/Library/Caches/deno`), honor
92+
`XDG_CACHE_HOME` on Linux, and treat an empty `DENO_DIR` as unset.
93+
- **Maven:** strip XML comments before tag matching and handle self-closing /
94+
inline skip-sections so a commented or oddly-formatted POM can't leak a
95+
plugin's coordinates as the project's.
96+
- **Cargo:** tolerate `[package]` headers with comments/whitespace and split
97+
`<name>-<version>` dirs at the dotted version (handles numeric pre-releases).
98+
- **Shared:** `utils/fs::entry_is_dir` now follows symlinks, fixing symlinked
99+
package-dir discovery across every dir-walking crawler at once.
100+
101+
### Fixed — API client, commands & misc
102+
103+
- **API client:** honor a `--proxy-url` override on binary downloads (was
104+
re-derived from env), and make org selection, patch titles, and the
105+
individual-query batch capability flag deterministic / order-independent;
106+
hash comparison is now case-insensitive.
107+
- **Version reporting:** `USER_AGENT` and telemetry `context.version` were
108+
hardcoded to `1.0`/`1.0.0`; both now derive from `CARGO_PKG_VERSION`.
109+
- **`apply`** no longer emits a spurious `Failed` envelope event for a
110+
release-variant whose first file is `NotFound`.
111+
- **UTF-8 safety:** `get`/`scan`/`remove` truncated display strings with raw
112+
byte slices that panic on multi-byte API text; all use char-safe truncation.
113+
- **Exit codes:** `setup` now exits non-zero (not `already_configured`) when a
114+
`package.json` fails to parse, and `repair` exits non-zero and fires failure
115+
telemetry on a partial download failure (also gates the offline dry-run
116+
"would download" event and threads through `bytes_freed`).
117+
- **`rollback`** no longer miscounts zero-file records as already-original or
118+
double-counts no-ops in dry-run; **`unlock`** reports `released` from a
119+
pre-`acquire` snapshot so a probe-created lock file isn't reported as removed.
120+
- **`vex`** resolves qualified PyPI/Gem/Maven PURLs via the rollback-aware
121+
resolver so those patches are no longer dropped as `package_not_found`.
122+
- **`package.json` handling:** no longer panics on a non-object root or
123+
non-object `scripts`, de-dups overlapping workspace patterns, handles bare
124+
`*`/`**`/deep globs, strips inline YAML comments, and preserves top-level key
125+
order (enabled `serde_json`'s `preserve_order`).
126+
- Smaller fixes: deterministic `list` output ordering, case-insensitive
127+
`fuzzy_match` tie-break, `json_envelope` status-invariant enforcement +
128+
`oldUuid` field, `lock_cli` sub-second timeout message, blob-fetcher
129+
all-skipped formatting, VEX `Statement.timestamp` made optional per OpenVEX
130+
0.2.0, and VEX git-remote `url` parsing.
131+
132+
### Tests & tooling
133+
134+
- Hundreds of regression tests added across the patch engine, crawlers, API
135+
client, manifest, `package.json`, VEX, and CLI command layers; the stale
136+
`repair`/`python_crawler` e2e expectations were updated to the corrected
137+
contracts. Full suite green (`--features cargo`).
138+
- Added the `scripts/study-crates.ts` per-file audit harness (with an example
139+
prompt config) used to drive this review.
140+
17141
## [3.1.0] — 2026-05-26
18142

19143
### Added

Cargo.lock

Lines changed: 3 additions & 2 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

Cargo.toml

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -3,16 +3,16 @@ members = ["crates/socket-patch-core", "crates/socket-patch-cli"]
33
resolver = "2"
44

55
[workspace.package]
6-
version = "3.1.0"
6+
version = "3.2.0"
77
edition = "2021"
88
license = "MIT"
99
repository = "https://github.com/SocketDev/socket-patch"
1010

1111
[workspace.dependencies]
12-
socket-patch-core = { path = "crates/socket-patch-core", version = "=3.1.0" }
12+
socket-patch-core = { path = "crates/socket-patch-core", version = "=3.2.0" }
1313
clap = { version = "=4.5.60", features = ["derive", "env"] }
1414
serde = { version = "=1.0.228", features = ["derive"] }
15-
serde_json = "=1.0.149"
15+
serde_json = { version = "=1.0.149", features = ["preserve_order"] }
1616
sha2 = "=0.10.9"
1717
hex = "=0.4.3"
1818
reqwest = { version = "=0.12.28", features = ["rustls-tls", "json"], default-features = false }

crates/socket-patch-cli/src/args.rs

Lines changed: 117 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -284,3 +284,120 @@ impl Default for GlobalArgs {
284284
}
285285
}
286286
}
287+
288+
#[cfg(test)]
289+
mod tests {
290+
use super::*;
291+
292+
/// `api_client_overrides` must forward every populated value verbatim.
293+
#[test]
294+
fn api_client_overrides_forwards_set_values() {
295+
let args = GlobalArgs {
296+
api_url: "https://api.example.com".to_string(),
297+
api_token: Some("tok123".to_string()),
298+
org: Some("acme".to_string()),
299+
proxy_url: "https://proxy.example.com".to_string(),
300+
..GlobalArgs::default()
301+
};
302+
let o = args.api_client_overrides();
303+
assert_eq!(o.api_url.as_deref(), Some("https://api.example.com"));
304+
assert_eq!(o.api_token.as_deref(), Some("tok123"));
305+
assert_eq!(o.org_slug.as_deref(), Some("acme"));
306+
assert_eq!(o.proxy_url.as_deref(), Some("https://proxy.example.com"));
307+
}
308+
309+
/// `GlobalArgs::default()` leaves `api_url`/`proxy_url` empty and the
310+
/// optional fields `None`, so every override must come back `None` —
311+
/// this is what lets integration tests set `SOCKET_*` env vars *after*
312+
/// constructing args and still have env-var resolution win downstream.
313+
#[test]
314+
fn api_client_overrides_default_is_all_none() {
315+
let o = GlobalArgs::default().api_client_overrides();
316+
assert!(o.api_url.is_none(), "empty api_url must not be forwarded");
317+
assert!(o.proxy_url.is_none(), "empty proxy_url must not be forwarded");
318+
assert!(o.api_token.is_none());
319+
assert!(o.org_slug.is_none());
320+
}
321+
322+
/// Empty strings for url/token/org are filtered out, not forwarded as
323+
/// `Some("")` — otherwise an empty CLI value would mask env-var fallback.
324+
#[test]
325+
fn api_client_overrides_filters_empty_strings() {
326+
let args = GlobalArgs {
327+
api_url: String::new(),
328+
api_token: Some(String::new()),
329+
org: Some(String::new()),
330+
proxy_url: String::new(),
331+
..GlobalArgs::default()
332+
};
333+
let o = args.api_client_overrides();
334+
assert!(o.api_url.is_none());
335+
assert!(o.api_token.is_none());
336+
assert!(o.org_slug.is_none());
337+
assert!(o.proxy_url.is_none());
338+
}
339+
340+
/// A relative `manifest_path` is resolved against `cwd`.
341+
#[test]
342+
fn resolved_manifest_path_joins_relative_against_cwd() {
343+
let args = GlobalArgs {
344+
cwd: PathBuf::from("/work/project"),
345+
manifest_path: ".socket/manifest.json".to_string(),
346+
..GlobalArgs::default()
347+
};
348+
assert_eq!(
349+
args.resolved_manifest_path(),
350+
PathBuf::from("/work/project/.socket/manifest.json"),
351+
);
352+
}
353+
354+
/// An absolute `manifest_path` ignores `cwd` and passes through unchanged.
355+
#[test]
356+
fn resolved_manifest_path_passes_absolute_through() {
357+
let args = GlobalArgs {
358+
cwd: PathBuf::from("/work/project"),
359+
manifest_path: "/etc/socket/manifest.json".to_string(),
360+
..GlobalArgs::default()
361+
};
362+
assert_eq!(
363+
args.resolved_manifest_path(),
364+
PathBuf::from("/etc/socket/manifest.json"),
365+
);
366+
}
367+
368+
/// `apply_env_toggles` mirrors `--debug` / `--no-telemetry` into the env
369+
/// vars core code reads directly, and is a no-op when the flags are off.
370+
/// `#[serial]` because it mutates process-global env state.
371+
#[test]
372+
#[serial_test::serial]
373+
fn apply_env_toggles_mirrors_flags_into_env() {
374+
let saved_debug = std::env::var("SOCKET_DEBUG").ok();
375+
let saved_telemetry = std::env::var("SOCKET_TELEMETRY_DISABLED").ok();
376+
std::env::remove_var("SOCKET_DEBUG");
377+
std::env::remove_var("SOCKET_TELEMETRY_DISABLED");
378+
379+
// Flags off: no-op, env stays unset.
380+
apply_env_toggles(&GlobalArgs::default());
381+
assert!(std::env::var("SOCKET_DEBUG").is_err());
382+
assert!(std::env::var("SOCKET_TELEMETRY_DISABLED").is_err());
383+
384+
// Flags on: mirrored into the env.
385+
let args = GlobalArgs {
386+
debug: true,
387+
no_telemetry: true,
388+
..GlobalArgs::default()
389+
};
390+
apply_env_toggles(&args);
391+
assert_eq!(std::env::var("SOCKET_DEBUG").as_deref(), Ok("1"));
392+
assert_eq!(std::env::var("SOCKET_TELEMETRY_DISABLED").as_deref(), Ok("1"));
393+
394+
match saved_debug {
395+
Some(v) => std::env::set_var("SOCKET_DEBUG", v),
396+
None => std::env::remove_var("SOCKET_DEBUG"),
397+
}
398+
match saved_telemetry {
399+
Some(v) => std::env::set_var("SOCKET_TELEMETRY_DISABLED", v),
400+
None => std::env::remove_var("SOCKET_TELEMETRY_DISABLED"),
401+
}
402+
}
403+
}

0 commit comments

Comments
 (0)