Rec group builder#13687
Conversation
It previously matched the `I16` variant instead of `ValType`, so it returned the wrong answer for every storage type. The method is otherwise unused inside the tree, which is why this went unnoticed.
Adds `RecGroupBuilder`, which lets embedders declare kind-typed labels (`PendingStructId`/`PendingArrayId`/`PendingFuncId`), use them as forward references while defining other types via a small build-time "template" family, and register the whole group at once with `build()`. This makes it possible to construct self-referential and mutually-recursive struct/array/func types directly from the embedder API, which previously required plucking such types out of a module's imports/exports. The builder lowers its members to module-canonical `WasmSubType`s (using 0-based `Module` indices for intra-group references and `Engine` indices for already-registered types) and reuses the existing rec-group registration path, so hash-consing, runtime canonicalization, supertype lists, and GC layouts all come for free. Implements the embedder API requested in bytecodealliance#10176.
e4e67d9 to
ff45908
Compare
|
@somdoron given that you just posted an AI plan verbatim in this PR's associated issue, I am assuming you also used AI to implement this PR. Can you confirm that you have already reviewed the AI tool's work in detail before posting it here and asking reviewers to spend their time on it? And are you taking full responsibility for the code in this PR? https://github.com/bytecodealliance/governance/blob/main/AI_TOOL_POLICY.md |
|
@fitzgen I confirmed that I reviewed the AI tool. I reviewed the code, and I'm taking full responsibility for the code in this PR. The PR is for a real problem I'm suffering from, and is focused and small. |
Subscribe to Label Actioncc @fitzgen DetailsThis issue or pull request has been labeled: "wasm-proposal:gc", "wasmtime:api", "wasmtime:docs"Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
fitzgen
left a comment
There was a problem hiding this comment.
First of all, I want to say that this feature is going to involve a lot of API design, and probably won't land super quickly because of that. If you really need to be able to define rec groups in the host now, you can do that via building Modules (either via WAT strings or via wasm_encoder) that define the rec groups you need and export a global of each of the rec groups types so you can exfiltrate the GlobalType and the global type's ValType on the host via Module::exports. So that should unblock you for the time being, if necessary.
Some high-level thoughts on the design:
-
I don't think we need to distinguish between pending structs versus pending arrays at the type level. I don't think it is buying us much in terms of correctness by construction or whatever, but it makes ramping up on the API that much harder.
-
I don't love adding
FieldTemplate/HeapTypeTemplate/etc... types to Wasmtime's public API. I think we could avoid that (and also not force callers to construct slices of them, potentially allocating when they otherwise wouldn't) by having a struct type builder that borrows the rec group builder and exposes a method to define fields one at a time, something like this:impl RecGroupBuilder<'_> { pub fn define_struct(&mut self, ty: PendingType) -> StructTypeBuilder<'_> { ... } } pub struct StructTypeBuilder<'a> { rec_group: &'a mut RecGroupBuilder, // ... } impl StructTypeBuilder<'_> { /// Define a normal, non-forward-reference field. pub fn field(&mut self, ty: FieldType) { ... } /// Define a field that is a forward reference to another type defined in this rec group. pub fn forward_ref_field(&mut self, ty: PendingType) -> ForwardRefFieldBuilder<'_> { ... } } pub struct ForwardRefFieldBuilder<'a> { parent: StructOrArrayTypeBuilder<'a>, // ... } impl ForwardRefFieldBuilder<'_> { pub fn nullable(&mut self, nullable: bool) -> &mut Self { ... } pub fn shared(&mut self, shared: bool) -> &mut Self { ... } pub fn finish(self) -> &mut StructTypeBuilder<'_> { ... } }
Replace the public surface with a fluent nested-builder API: - Use a single kind-agnostic `PendingType` handle instead of separate `PendingStructId`/`PendingArrayId`/`PendingFuncId`. - Drop the public `*Template` value types. Concrete fields/params take the existing `FieldType`/`ValType` directly; forward references to in-group siblings use `forward_ref_field`/`forward_ref_element`/`forward_ref_param`/ `forward_ref_result`, which take plain `(mutability, is_nullable, target)` arguments (no sub-builders, no slice allocation). - Finality and supertypes are set via builder methods: `finality(Finality)`, `supertype(<concrete>)` for already-registered supertypes, and `forward_supertype(PendingType)` for in-group siblings. `build()` performs the same structural/finality validation the one-off constructors do, after registration so that forward supertypes resolve. Getters are `get_struct`/`get_array`/`get_func -> Option<..>`.
|
Thanks, I have a workaround for now (similar to what you suggested). I applied your suggestion, using a builder API for struct/array/func. I haven't built the ForwardRefFieldBuilder yet, I went with an API similar to StructType::new, taking both nullability and mutability as parameters. It does lack the shared field at the moment. Let me know if you think the field builder API is preferred. |
Yeah the builder API is preferred because it is less brittle as the Wasm spec evolves (e.g. with the addition of |
f786458 to
8f26178
Compare
|
Done, now with the builder API for fields / array elements / function params. I didn't add |
fitzgen
left a comment
There was a problem hiding this comment.
Thanks, I think we are on the right track. Some more comments below.
(Don't want to start getting too detailed with the review until the skeleton / overall design is in the final shape.)
| types: impl ExactSizeIterator<Item = WasmSubType>, | ||
| ) -> Result<Vec<RegisteredType>, OutOfMemory> { | ||
| let len = types.len(); | ||
| assert!(len >= 1, "a rec group must contain at least one type"); |
There was a problem hiding this comment.
This isn't true at the spec level, and is easy to support (just remove this assertion and everything should work, AFAICT) so I don't think we should require it.
| /// A struct field or array element being defined: either an already-known | ||
| /// concrete/abstract type, or a forward reference to a sibling in this group. | ||
| enum FieldDef { | ||
| Concrete(FieldType), | ||
| Forward { | ||
| target: u32, | ||
| nullable: bool, | ||
| mutable: bool, | ||
| }, | ||
| } | ||
|
|
||
| /// A function parameter or result being defined: either an already-known | ||
| /// concrete/abstract type, or a forward reference to a sibling in this group. | ||
| enum ValDef { | ||
| Concrete(ValType), | ||
| Forward { target: u32, nullable: bool }, | ||
| } | ||
|
|
||
| /// A supertype being defined: either a sibling in this group or an | ||
| /// already-registered concrete type. Generic over the concrete kind. | ||
| enum SuperDef<T> { | ||
| Forward(u32), | ||
| Known(T), | ||
| } | ||
|
|
||
| /// The in-progress definition of one member of a rec group. | ||
| enum MemberDef { | ||
| Struct { | ||
| finality: Finality, | ||
| supertype: Option<SuperDef<StructType>>, | ||
| fields: Vec<FieldDef>, | ||
| }, | ||
| Array { | ||
| finality: Finality, | ||
| supertype: Option<SuperDef<ArrayType>>, | ||
| element: Option<FieldDef>, | ||
| }, | ||
| Func { | ||
| finality: Finality, | ||
| supertype: Option<SuperDef<FuncType>>, | ||
| params: Vec<ValDef>, | ||
| results: Vec<ValDef>, | ||
| }, | ||
| } |
There was a problem hiding this comment.
Rather than duplicating all these types for the forward-or-not cases, could we reuse the types from wasmtime_environ::types::* and have the not-forward cases be EngineOrModuleTypeIndex::Engine and references within the rec group be EngineOrModuleTypeIndex::RecGroup or something like that?
| pub struct RecGroupBuilder { | ||
| engine: Engine, | ||
| builder_id: usize, | ||
| members: Vec<Option<MemberDef>>, |
There was a problem hiding this comment.
Re: my previous comment, we would ideally make this become
| members: Vec<Option<MemberDef>>, | |
| inner: WasmRecGroup, |
Although unfortunately that would require defining all struct vs array vs function types up front for all the members, leading to an awkward API for the builder, so I guess we probably actually want something like this:
| members: Vec<Option<MemberDef>>, | |
| members: Vec<Option<WasmSubType>>, |
Although I guess we could do the first if we had different RecGroupBuilder::declare_struct vs RecGroupBuilder::declare_array vs RecGroupBuilder::decalre_func, as you originally formulated it... I didn't realize the implications that change would have here. I think we actually do want to have those separately-typed declare methods so we can have this builder wrap a WasmRecGroup. Sorry for the churn.
There was a problem hiding this comment.
I think Vec<Option<WasmSubType>> might be better. The reason is that TypeRegistryInner::register_rec_group_types takes an iterator of WasmSubType, so we don't get much benefit from wrapping a WasmRecGroup here.
In any case, we can't construct the underlying WasmSubTypes for the WasmRecGroup until the struct/array/func builder finishes. We could create a placeholder of the correct composite kind when declare_* is called, but we'd still have to replace it once the struct/array/func builder finishes. Since that overwrite is needed either way, I don't think the separately-typed declare methods are required to wrap a WasmRecGroup — unless you'd prefer them for API reasons.
There was a problem hiding this comment.
Correction: we do need the separate declare functions — they're required to build the WasmHeapType for a forward reference, whose concrete variant (ConcreteStruct / ConcreteArray / ConcreteFunc) encodes the composite kind. So the kind has to be known at declare time, even though the body isn't filled in yet.
There was a problem hiding this comment.
Some conclusions after digging in:
- We can use multiple declare functions, one per type.
- We can use
WasmHeapTypewithEngineOrModuleTypeIndex::EngineandEngineOrModuleTypeIndex::Modulefor forward references. That would removeFieldDef,ValDef, andSuperDef. - Using
WasmRecGrouporVec<WasmSubType>for theRecGroupBuilder's internal type isn't very elegant, since they aren't mutable. - We can use
Vec<WasmSubType>: create a placeholder ondeclare_*, and create the real type when we callRecGroupBuilder::buildorArray/Func/StructBuilder::finish.
Anyway, I think keeping the MemberDef enum is more elegant than trying to use WasmRecGroup or Vec<WasmSubType>, since it's mutable and allows accumulation.
| /// Declare a new type in this rec group, returning a handle that can be used | ||
| /// as a forward reference before the type is defined. |
There was a problem hiding this comment.
This should document that the order of declare calls determines the order that the types are defined within the rec group, which has semantically visible implications. In the following example, $f != $f' and $s != $s', but you might otherwise not realize this when using the RecGroupBuilder, declare, and PendingType.
(rec (type $f (func))
(type $s (struct)))
(rec (type $s' (struct))
(type $f' (func)))
There was a problem hiding this comment.
done, added documentation to each declare function and RecGroupBuilder.
|
The new commit stores At the moment only For example, two structs where let mut builder = RecGroupBuilder::new(&engine);
let a = builder.declare_struct();
let b = builder.declare_struct();
builder
.define_struct(a)
.forward_ref_field(b)
.nullable(true)
.finish() // commit the field, back to the struct builder
.finish(); // commit the struct to the rec group
builder
.define_struct(b)
.field(FieldType::new(
Mutability::Const,
StorageType::ValType(ValType::I32),
))
.finish();
let group = builder.build()?;
let a: StructType = group.get_struct(a).unwrap();
let b: StructType = group.get_struct(b).unwrap(); |
Implements the embedder API requested in issue #10176.
Embedders can currently only build one-off struct/array/func types, so types that reference
themselves or each other can't be constructed directly. This adds RecGroupBuilder: declare
kind-typed labels, use them as forward references, and build() the whole recursion group at
once. It lowers to module-canonical WasmSubTypes and reuses the existing rec-group
registration path.
Also fixes an unrelated pre-existing bug in StorageType::is_val_type (matched I16 instead of
ValType), in its own commit.