dissect: add --manifest switch to generate a UAPI.16 manifest file fr…#41990
dissect: add --manifest switch to generate a UAPI.16 manifest file fr…#41990poettering wants to merge 5 commits into
Conversation
|
Do not merge until the spec is also good to be merged. |
Claude review of PR #41990 (97448c7)Must fix
Suggestions
Nits
|
|
Making this as draft, as this has the do-not-merge tag. |
The comments in rm-rf.h suggested to do this already manually. Let's add a way to make this less manual.
…om a directory tree The UAPI.16 spec is being discussed here: uapi-group/specifications#213
| r = sd_json_buildo( | ||
| &j, | ||
| SD_JSON_BUILD_PAIR_CONDITION(is_root, "mediaType", JSON_BUILD_CONST_STRING("application/vnd.uapi.16.manifest")), | ||
| SD_JSON_BUILD_PAIR_CONDITION(!is_root, "name", SD_JSON_BUILD_STRING(path)), |
There was a problem hiding this comment.
Claude: must-fix: The name field is built from the raw filename via SD_JSON_BUILD_STRING(path), but sd_json_variant_new_stringn() rejects any string that is not valid UTF-8 with -EUCLEAN (src/libsystemd/sd-json/sd-json.c:424). Linux filenames are arbitrary byte sequences and the image / directory tree may be untrusted, so a single non-UTF-8 (or mojibake) name anywhere in the tree makes sd_json_buildo() fail, manifest_print_item() return the error, and recurse_dir() abort — the entire --manifest run dies with "Failed to build JSON object" on the first bad name. The --mtree path tolerates such names (it prints them raw / escapes the symlink target). Consider validating or escaping the name (or skipping with a warning) so an untrusted image cannot abort manifest generation. The same applies if any intermediate path component is non-UTF-8.
| } | ||
|
|
||
| /* Suppress the type on the top-level entry and on regular files. */ | ||
| type = !is_root && !S_ISREG(sx->stx_mode) ? inode_type_to_string(sx->stx_mode) : NULL; |
There was a problem hiding this comment.
Claude: suggestion: The type value comes from inode_type_to_string(), which yields the systemd-internal tokens dir/lnk/chr/blk/fifo/sock (note these differ from the --mtree vocabulary dir/file/link/block/char/socket). Please confirm these exact strings match the type enum defined by the UAPI.16 Manifest Format spec (uapi-group/specifications#213) — a manifest using a different vocabulary than the spec will not validate. The CI test in the last commit hardcodes the same tokens, so it would pass even if they diverged from the spec.
| SD_JSON_BUILD_PAIR_CONDITION(!is_root, "name", SD_JSON_BUILD_STRING(path)), | ||
| SD_JSON_BUILD_PAIR_CONDITION(!!type, "type", SD_JSON_BUILD_STRING(type)), | ||
| SD_JSON_BUILD_PAIR_CONDITION(have_size, "size", SD_JSON_BUILD_UNSIGNED(size)), | ||
| SD_JSON_BUILD_PAIR_CONDITION(!!data, "contents", SD_JSON_BUILD_ARRAY(SD_JSON_BUILD_OBJECT(SD_JSON_BUILD_PAIR("literal", SD_JSON_BUILD_BASE64(data, size))))), |
There was a problem hiding this comment.
Claude: suggestion: For symlinks the target is emitted as contents: [{ "literal": base64(target) }]. (The base64 size argument is correct here — it is strlen(target) set in the S_IFLNK branch, not the stale UINT64_MAX default, and contents is gated on !!data so the two cases are mutually exclusive.) The open conformance question is whether the UAPI.16 spec represents a symlink target through the same contents/literal mechanism it uses for regular-file contents, or via a dedicated field. Please confirm the spec sanctions reusing contents/literal for symlink targets.
| (void) rm_rf_children(TAKE_FD(hd->db_fd), REMOVE_PHYSICAL|REMOVE_REWIND, /* root_dev= */ NULL); | ||
|
|
||
| if (hd->db_path) | ||
| hd->db_path = rmdir_and_free(hd->db_path); |
There was a problem hiding this comment.
Claude: nit: rmdir_and_free(NULL) is a no-op that returns NULL, so the if (hd->db_path) guard is redundant. The surrounding systemd idiom is to call such freers unconditionally: hd->db_path = rmdir_and_free(hd->db_path);
| <term><option>--manifest</option></term> | ||
|
|
||
| <listitem><para>Generates a file manifest of the specified disk image or directory, following the | ||
| <ulink url="https://uapi-group.org/specifications/specs/manifest_format/">UAPI.16 Manifest |
There was a problem hiding this comment.
Claude: nit: This links to https://uapi-group.org/specifications/specs/manifest_format/, but the UAPI.16 Manifest Format spec is still an open, unmerged proposal (uapi-group/specifications#213, and this PR carries a do-not-merge label). That URL likely does not resolve until the spec PR merges. Consider pointing at the spec PR until it is published, or holding the docs until the canonical URL exists, so the man page does not ship a dead link.
| # Unix socket — bind() it with python3 if available (process exits, file persists) | ||
| HAVE_SOCKET=0 | ||
| if command -v python3 >/dev/null 2>&1; then | ||
| python3 -c "import socket,sys; s=socket.socket(socket.AF_UNIX); s.bind(sys.argv[1])" "$MANIFEST_ROOT/socket" |
There was a problem hiding this comment.
Claude: suggestion: The socket coverage is gated only on command -v python3, but the bind() itself runs unguarded under errexit. If python3 is present but the bind fails for any reason (sandboxed environment, AF_UNIX restricted, an unexpected error), the whole test aborts even though the socket case is meant to be best-effort. Consider folding the bind into the guard so a failure only leaves HAVE_SOCKET=0, e.g. if command -v python3 >/dev/null 2>&1 && python3 -c '...' "$MANIFEST_ROOT/socket"; then HAVE_SOCKET=1; fi.
| # Capture the JSON-seq stream and convert to JSON-Lines for jq filtering. Each | ||
| # record is already on its own line in compact form, so stripping the RS (0x1E) | ||
| # byte gives us a valid JSON-Lines stream. | ||
| systemd-dissect --manifest "$MANIFEST_ROOT" >"$MANIFEST_ROOT.seq" | ||
| tr -d '\036' <"$MANIFEST_ROOT.seq" >"$MANIFEST_ROOT.jsonl" | ||
| test -s "$MANIFEST_ROOT.jsonl" | ||
|
|
||
| # Helper: emit the entry whose "name" matches $1 (root entry has no name) | ||
| get_entry() { jq -c --arg n "$1" 'select(.name == $n)' <"$MANIFEST_ROOT.jsonl"; } | ||
|
|
||
| # Root entry is the first record, has mediaType and no name | ||
| head -n1 "$MANIFEST_ROOT.jsonl" | \ |
There was a problem hiding this comment.
Claude: nit: The tr -d '\036' -> JSON-Lines conversion and the head -n1 root-record extraction both rely on each manifest record being emitted as a single compact line. That holds only because the implementation dumps with no PRETTY flag; if the manifest output ever became multi-line, head -n1 would no longer capture the full root record (jq reading the whole stream would still cope, but this one root assertion would not). Selecting the root record with jq over the full stream (e.g. jq -s '.[0]' over the stripped output) would be more robust than head -n1.
…om a directory tree
The UAPI.16 spec is being discussed here:
uapi-group/specifications#213