800: Allow declaring that multiple Entities should be created/updated by lindsay-stevens · Pull Request #818 · XLSForm/pyxform

lindsay-stevens · 2026-02-11T06:18:16Z

Closes #800
Closes #819

Why is this the best possible solution? Were any other approaches considered?

Initial implementation that passes existing relevant tests and a handful of new ones but still needs:

validations to be added
error messages tidied up
much more + thorough tests
- mainly solver regression suite remaining but current tests cover many common patterns
- remaining existing tests in test_create_repeat.py may be better covered by solver suite
more docstrings / comments
- could be more thorough still but is reasonably clear at the moment
resolve todo's
probably some restructuring to improve readability / maintainability e.g.
- replace mysterious dict/tuple/etc with types (with class methods etc)
- redundant data processing / call paths

As noted in below comments, the index to this PR is the docstring at the top of test_entities.py which assigns a short code to each requirement from the XForm spec, for validations, and for general behaviour. These short codes are then used to tag the relevant test cases.

What are the regression risks?

New feature but main regression risk is breaking forms for the old entity specs

Does this change require updates to documentation? If so, please file an issue here and include the link below.

Yes, there's a spec doc update already but XLSForm side not documented yet

Before submitting this PR, please make sure you have:

included test cases for core behavior and edge cases in tests
run python -m unittest and verified all tests pass
run ruff format pyxform tests and ruff check pyxform tests to lint code
verified that any code or assets from external sources are properly credited in comments

- survey_element.py: add "type" (if available) to the repr for debugging - entity_declaration.py: add missing "type" field which appears in the dict prepared by entity_parsing.py, and in general set that type argument by default - survey.py: similar to the above add type field to init signature, and remove unicode/repr in favour of survey_element implementation

- works the same but could allow builder.py to be compatible with Survey/SurveyElement tree as well as the workbook_to_json output dict

- allowed to put entities in repeats (v1) - allowed to declare more than one entity (v2) - pyxform places the entity instead of user-specified (v2)

- previous equality check would only emit for a direct repeat parent - also nest under `if` to avoid potentially adding `None` to children

- NAMES_011 is about period in the name - NAMES_012 is about reserved words name/label

- test plan not fully implemented yet but the idea is to do a chunk of it to check that the approach seems sensible and usable in practice

- old: test_entities_columns__multiple_unexpected - new: test_unexpected_column__multiple__error

- old: test_entities_columns__one_unexpected - new: test_unexpected_column__single__error

- old: test_dataset_with_reserved_prefix__errors - new: test_dataset_name__reserved_prefix__error

- old: test_dataset_with_period_in_name__errors - new: test_dataset_name__period__error

- old: test_dataset_with_invalid_xml_name__errors - new: test_dataset_name__xml_identifier__error

- old: test_worksheet_name_close_to_entities__produces_warning - new: test_sheet_name_misspelling__warning

- old: test_saveto_without_entities_sheet__errors - new: test_no_entity_declarations__error

- old: test_reference_name_not_found__error - new: test_unresolved_variable_reference__error

- needed to refactor/move workbook_to_json survey type parsing because previous saveto check would raise a false positive error if the type contained "group" or "repeat" but that can occur for selects etc, so control type parsing is moved earlier so that result can be used, and so it's possible to catch a save_to in an end_group / end_repeat.

- converted `{dataset_name: list[ReferenceSource]}` to an object: - used in many functions so it's now clearer what that object is - allows it can carry the row_number for error messages - may make sense to move some function(s) to it as class methods - test cases are somewhat light since the intention is to cover the solver behaviour more thoroughly in the topology test suite

- fold allocate_entities_to_paths into allocate_entities_to_containers since it doesn't get used elsewhere and relies on the external all_allocations dict anyway - also replace somewhat confusing `while` loop that just is based on ints with hopefully clearer range loop over requested_path - replace get_allocation_request repeated max/min calls with equivalent single loop over the references, clearer and avoids repeated loops

lognaturel · 2026-02-27T00:41:25Z

This is feeling really magical, I must say! I think users will find it "just works." I'm starting to get myself familiar with the code and coming up with more interesting forms to see how they match up with your tests. Let me know if there's anything I should pay particular attention to.

lindsay-stevens · 2026-02-27T11:25:00Z

@ln thanks! I think something that would be good to review initially is the traceability test suite plan at the top of test_entities.py. I'm still refining it, but the gist of it is there. What I was going for is a way to organise the test suite for entities, in a lightweight way that can map each test to the relevant requirement(s) and vice versa. There are fancy ways to do this sort of thing with method decorators or gherkin libraries etc. - but as it is currently, at least it allows (in Pycharm or similar) to double-click or search each requirement tag (e.g ES001) to see where the relevant tests are.

Another thing that would be good to review early is the ENTITY_009, ENTITY_011, and ENTITY_012 errors for topology problems. With those cases, it seemed like the sort of thing where it may be quite confusing for users to resolve for non-trivial forms - even if the message includes every bit of context available (which may be too verbose) where the error is raised - because ultimately pyxform can't know exactly what the user intended to do. Maybe there is a good docs URL that can do the heavy lifting on explaining, rather than a paragraph-sized error? etc.

- as described in XLSForm#819 if an entity_id is defined then the default setvalue node which targets the entity_id should not be emitted. - included cases that show behaviour when a dynamic default for the referenced entity_id is included i.e. the setvalue for the default targets the question rather than the entity_id attribute - as discussed on forum thread 55163, in Collect this duplicate setvalue/bind sees the bind "win" whereas in Webforms the setvalue "wins" so it is a functionality problem as well as being redundant. - added / moved test cases for implicit update/upsert modes - many of the tests that are still in the old modules include assertions for save_to which is not part of test_entities yet - added requirement and assertion for the instance csv since while that instance is essential for updating, it's not currently intended to be emitted by default - added requirements for not emitting namespace/version since these were covered in previous test cases

- L560: the allocations are derived from the entity declarations so the dataset_name should always be found - L567: it seems like the has_repeat_ancestor flag is sufficient - L568: the id_attr is added by default on L286 and the actions attr defaults to an empty list so it's safe to assume both exist - to help keep the default function in sync though, the action value for the repeat setvalue is copied from the top-level action - the top-level action is only added if there is no entity_id L234.

- L424: the container path was being calculated from the stack for each row, but it is a property of each stack entry which can be calculated in advance. - added a ContainerPath class to represent the collection of container nodes used in many places, and used it to calculate the container path for each stack entry when adding to the stack - also the path can assume the previous stack entry has a container path as well, so that value is used instead of re-iterating

- L696: deleted branch since it's not reachable - if there are multiple entity declarations, then the code would go through L690 -> L691

lognaturel · 2026-02-27T20:51:19Z

tests/entities/test_create_repeat.py

+        | | end_group    |       |       |         |
        | | end_repeat   |       |       |         |

        | entities |


🤯 You really went for the most diabolical cases!

lognaturel · 2026-02-27T21:06:49Z

tests/entities/test_entities.py

+- dataset_2: xpath - where the meta/entity block should be for requestor_2
+
+
+*Stress test*


Wow, nice one. That's a very big survey, if things are working for that, we're in great shape.

(I think) I'm most of the way through the traceability suite, but this stress test is part of the solver suite, so I'm not sure it passes yet, but then again the expectation for the stress test is pretty loose.

lognaturel · 2026-02-27T22:47:32Z

Thanks for the guidance on how to approach early review. I've looked at tests and messages ~~and one thing that's jumping out to me is that there's one common pattern that's not yet supported: savetos for the same list across inner groups within top level or repeats.~~ Sample forms:

Edit: these do work!! I've updated them with some minor unrelated fixes that made conversion failed. I must have forgotten to install.

lindsay-stevens · 2026-03-02T11:25:30Z

savetos for the same list across inner groups within top level or repeats

It may not have worked earlier but using the latest commit 6b525bb, conceptually this is OK. There's a test test_save_to_scope_breach__depth_1_group__save_to_only__ok which shows saveto inside and outside a group. But I'll add some more cases that follow the patterns in these XLSX files to lock them in. I was able to convert all 3 of the files, except that two of them had a missing referenced name error - member_hh perhaps should be named hhid, because there was no other question named hhid and the entity labels referenced ${hhid} - after fixing that they converted.

- add a test for the error that odk_validate raises when the instance referenced in update/upsert mode isn't there (EB019) - update other such tests to use csv-external so that the instance is present - not a core part of the test as such, but required for update/upsert to work without an odk_validate error

- depth 1: survey/group - depth 2: repeat/group - depth 3: repeat/group + var outside repeat

lognaturel · 2026-03-02T14:25:04Z

That’s amazing to read, thank you! I wonder whether I forgot to rebuild? I’ll try to get to the bottom of it and keep working through the test cases. 🚀

- previously, not adding any references to or from an entity would result in them being silently ignored, but now such entities are considered as targeting the survey - allocation currently only considers ancestors so only one such unreferenced entity can exist in the form, i.e. even if there are spare valid groups the entities won't be pushed down into them - removed EB011 since it is covered by EB010 in that the row number is what makes it stable by breaking ties using a distinct value

- locally the @baseVersion seemed to reliably be the one in the error message, but on GH Actions it seems to prefer @branchId, so the test assertion uses generic message fragments, which should still be selective enough for this error and not match others

lognaturel · 2026-03-02T20:39:11Z

Confirmed those three work and updated the source above to fix the issues you mentioned. So good!

I'm continuing my review looking specifically at risk to existing functionality we may not have explicitly tested and common patterns we expect. I think we should merge a version sooner than later so QA and the broader community can start pushing on it.

- combinations of: - create, update, multiple properties, multiple entities - survey, group, repeat, repeat/group, group/repeat - delete tests covered by these newer ones

- previously did not handle a blank list_name / dataset where other columns had a value, but now there is an error for that - the dataset_name checks assumed only one entity is possible so had hardcoded row=2 in all errors, but now it uses the actual row number

- should help keep what is parsed in sync with current entity cols - still need outer get_entity_variable_references to index on question_name to avoid having to do nested iteration on every row

- fixes odk_validate failures

lindsay-stevens · 2026-03-03T15:35:18Z

Maybe it's from looking at it for so long, but this PR seems to be in decent enough shape for merging. The remaining work is mainly the solver suite which is searching for edge cases.

lognaturel

Maybe it's from looking at it for so long, but this PR seems to be in decent enough shape for merging.

No, it's that you did it! What a tricky algorithmic puzzle.

I've identified a couple of edge cases but nothing serious. I'll file those as separate issues.

lognaturel · 2026-03-04T01:11:38Z

tests/entities/test_entities.py

+    - EB007: implicit entity_id=1, create_if=0, update_if=1 (update / if)
+    - EB008: implicit entity_id=1, create_if=1, update_if=0 (error, EV010)
+    - EB009: implicit entity_id=1, create_if=1, update_if=1 (upsert / if)
+    - EB010: Meta/entity allocations are stable/deterministic


I'm not seeing a test where multiple allocations are possible that would fail if the allocation rule changed. Something like

md = """ | survey | | | type | name | label | save_to | | | begin_group | g1 | G1 | | | | text | q1 | Q1 | e1p1 | | | begin_group | g2 | G2 | | | | text | q2 | Q2 | e1p2 | | | end_group | g2 | | | | | end_group | g1 | | | | entities | | | list_name | label | | | e1 | E1 | """

I do think this form illustrates what I wanted because both g1 and toplevel would be valid placements for the declaration. I just realized it also produces an invalid XForm because it currently places it in g2: #823

lindsay-stevens added 3 commits February 11, 2026 16:57

chg: loosen dict requirement for builder.py to Mapping

ef0fd5a

- works the same but could allow builder.py to be compatible with Survey/SurveyElement tree as well as the workbook_to_json output dict

chg: initial implementation of entities spec 2025.1.0 (v2)

19e5ea3

- allowed to put entities in repeats (v1) - allowed to declare more than one entity (v2) - pyxform places the entity instead of user-specified (v2)

lognaturel moved this to ✏️ in progress in ODK Central Feb 11, 2026

lognaturel added this to ODK Central Feb 11, 2026

fix: emit repeat setvalue if entity target has a repeat ancestor

2348363

- previous equality check would only emit for a direct repeat parent - also nest under `if` to avoid potentially adding `None` to children

lindsay-stevens force-pushed the pyxform-800 branch from c28cb5f to 2348363 Compare February 12, 2026 05:57

lindsay-stevens added 2 commits February 17, 2026 04:16

fix: save_to validation raising wrong error message

813ed13

- NAMES_011 is about period in the name - NAMES_012 is about reserved words name/label

add: detailed xpath assertions for entities create repeat test cases

0dd6578

lognaturel mentioned this pull request Feb 18, 2026

Remove or replace XLSForms with repeat column on entities sheet getodk/central#1658

Open

lindsay-stevens added 20 commits February 21, 2026 01:04

add: new entities test module with test plans and poc test cases

ab8c080

- test plan not fully implemented yet but the idea is to do a chunk of it to check that the approach seems sensible and usable in practice

dev: delete test covered by new test_entities module

2a0672d

- old: test_entities_columns__multiple_unexpected - new: test_unexpected_column__multiple__error

dev: delete test covered by new test_entities module

b8ee322

- old: test_entities_columns__one_unexpected - new: test_unexpected_column__single__error

dev: delete test covered by new test_entities module

d604c58

- old: test_dataset_with_reserved_prefix__errors - new: test_dataset_name__reserved_prefix__error

dev: delete test covered by new test_entities module

305247b

- old: test_dataset_with_period_in_name__errors - new: test_dataset_name__period__error

dev: delete test covered by new test_entities module

cf5b22b

- old: test_dataset_with_invalid_xml_name__errors - new: test_dataset_name__xml_identifier__error

dev: delete test covered by new test_entities module

efc0ee6

- old: test_worksheet_name_close_to_entities__produces_warning - new: test_sheet_name_misspelling__warning

dev: delete test covered by new test_entities module

bd500d1

- old: test_saveto_without_entities_sheet__errors - new: test_no_entity_declarations__error

dev: delete test covered by new test_entities module

f231f54

- old: test_reference_name_not_found__error - new: test_unresolved_variable_reference__error

fix: docstring typo in validate_question_group_repeat_name

da3eb47

add: validation for a missing entity declaration

b65baf3

chg: update message and tests for missing entity create label error

49bd637

dev: tidy up markdown table formatting / alignment

12c5339

chg: update message and tests for missing entity upsert update_if error

990bd4d

chg: update missing entity update/upsert entity_id error

13b7d41

add: validation for missing entity save_to prefix with multiple entities

12b5fab

chg: delete EV013 since it's covered by EV008 unresolved save_to prefix

4722df0

add: tests and clearer error for save_to scope breach check

a4eee45

lindsay-stevens added 4 commits February 28, 2026 00:59

dev: resolve todo item in entities_parsing.py

6b525bb

- L696: deleted branch since it's not reachable - if there are multiple entity declarations, then the code would go through L690 -> L691

lognaturel reviewed Feb 27, 2026

View reviewed changes

lindsay-stevens added 2 commits March 2, 2026 23:10

add: tests for savetos within same boundary but different containers

465e1dc

- depth 1: survey/group - depth 2: repeat/group - depth 3: repeat/group + var outside repeat

lindsay-stevens added 2 commits March 3, 2026 03:31

lindsay-stevens force-pushed the pyxform-800 branch from fe4457b to 6eb21a2 Compare March 2, 2026 16:45

lindsay-stevens added 4 commits March 3, 2026 19:57

add: test cases for save_to scenarios

86ec42f

- combinations of: - create, update, multiple properties, multiple entities - survey, group, repeat, repeat/group, group/repeat - delete tests covered by these newer ones

chg: move variable parsing to get_entity_declaration

85eee0b

- should help keep what is parsed in sync with current entity cols - still need outer get_entity_variable_references to index on question_name to avoid having to do nested iteration on every row

fix: add instance to save_to with update mode tests

1342843

- fixes odk_validate failures

lindsay-stevens marked this pull request as ready for review March 3, 2026 15:30

lognaturel approved these changes Mar 3, 2026

View reviewed changes

lognaturel merged commit 3906d14 into XLSForm:master Mar 3, 2026
14 checks passed

github-project-automation bot moved this from ✏️ in progress to ✅ done in ODK Central Mar 3, 2026

lognaturel reviewed Mar 4, 2026

View reviewed changes

lognaturel mentioned this pull request Mar 4, 2026

Test multi-entity functionality across Central + Collect + XLSForm/pyxform + Web Forms getodk/central#1598

Open

lindsay-stevens deleted the pyxform-800 branch March 6, 2026 10:25

		- dataset_2: xpath - where the meta/entity block should be for requestor_2


		Stress test

Conversation

lindsay-stevens commented Feb 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why is this the best possible solution? Were any other approaches considered?

What are the regression risks?

Does this change require updates to documentation? If so, please file an issue here and include the link below.

Before submitting this PR, please make sure you have:

Uh oh!

lognaturel commented Feb 27, 2026

Uh oh!

lindsay-stevens commented Feb 27, 2026

Uh oh!

lognaturel Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

lognaturel Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

lindsay-stevens Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

lognaturel commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lindsay-stevens commented Mar 2, 2026

Uh oh!

lognaturel commented Mar 2, 2026

Uh oh!

lognaturel commented Mar 2, 2026

Uh oh!

lindsay-stevens commented Mar 3, 2026

Uh oh!

lognaturel left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

lognaturel Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

lognaturel Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

lindsay-stevens commented Feb 11, 2026 •

edited

Loading

lognaturel commented Feb 27, 2026 •

edited

Loading

lognaturel Mar 4, 2026 •

edited

Loading