800: Allow declaring that multiple Entities should be created/updated#818
800: Allow declaring that multiple Entities should be created/updated#818lognaturel merged 46 commits intoXLSForm:masterfrom
Conversation
- survey_element.py: add "type" (if available) to the repr for debugging - entity_declaration.py: add missing "type" field which appears in the dict prepared by entity_parsing.py, and in general set that type argument by default - survey.py: similar to the above add type field to init signature, and remove unicode/repr in favour of survey_element implementation
- works the same but could allow builder.py to be compatible with Survey/SurveyElement tree as well as the workbook_to_json output dict
- allowed to put entities in repeats (v1) - allowed to declare more than one entity (v2) - pyxform places the entity instead of user-specified (v2)
- previous equality check would only emit for a direct repeat parent - also nest under `if` to avoid potentially adding `None` to children
c28cb5f to
2348363
Compare
- NAMES_011 is about period in the name - NAMES_012 is about reserved words name/label
- test plan not fully implemented yet but the idea is to do a chunk of it to check that the approach seems sensible and usable in practice
- old: test_entities_columns__multiple_unexpected - new: test_unexpected_column__multiple__error
- old: test_entities_columns__one_unexpected - new: test_unexpected_column__single__error
- old: test_dataset_with_reserved_prefix__errors - new: test_dataset_name__reserved_prefix__error
- old: test_dataset_with_period_in_name__errors - new: test_dataset_name__period__error
- old: test_dataset_with_invalid_xml_name__errors - new: test_dataset_name__xml_identifier__error
- old: test_worksheet_name_close_to_entities__produces_warning - new: test_sheet_name_misspelling__warning
- old: test_saveto_without_entities_sheet__errors - new: test_no_entity_declarations__error
- old: test_reference_name_not_found__error - new: test_unresolved_variable_reference__error
- needed to refactor/move workbook_to_json survey type parsing because previous saveto check would raise a false positive error if the type contained "group" or "repeat" but that can occur for selects etc, so control type parsing is moved earlier so that result can be used, and so it's possible to catch a save_to in an end_group / end_repeat.
- converted `{dataset_name: list[ReferenceSource]}` to an object:
- used in many functions so it's now clearer what that object is
- allows it can carry the row_number for error messages
- may make sense to move some function(s) to it as class methods
- test cases are somewhat light since the intention is to cover the
solver behaviour more thoroughly in the topology test suite
- fold allocate_entities_to_paths into allocate_entities_to_containers
since it doesn't get used elsewhere and relies on the external
all_allocations dict anyway
- also replace somewhat confusing `while` loop that just is based on
ints with hopefully clearer range loop over requested_path
- replace get_allocation_request repeated max/min calls with equivalent
single loop over the references, clearer and avoids repeated loops
|
This is feeling really magical, I must say! I think users will find it "just works." I'm starting to get myself familiar with the code and coming up with more interesting forms to see how they match up with your tests. Let me know if there's anything I should pay particular attention to. |
|
@ln thanks! I think something that would be good to review initially is the traceability test suite plan at the top of Another thing that would be good to review early is the |
- as described in XLSForm#819 if an entity_id is defined then the default setvalue node which targets the entity_id should not be emitted. - included cases that show behaviour when a dynamic default for the referenced entity_id is included i.e. the setvalue for the default targets the question rather than the entity_id attribute - as discussed on forum thread 55163, in Collect this duplicate setvalue/bind sees the bind "win" whereas in Webforms the setvalue "wins" so it is a functionality problem as well as being redundant. - added / moved test cases for implicit update/upsert modes - many of the tests that are still in the old modules include assertions for save_to which is not part of test_entities yet - added requirement and assertion for the instance csv since while that instance is essential for updating, it's not currently intended to be emitted by default - added requirements for not emitting namespace/version since these were covered in previous test cases
- L560: the allocations are derived from the entity declarations so
the dataset_name should always be found
- L567: it seems like the has_repeat_ancestor flag is sufficient
- L568: the id_attr is added by default on L286 and the actions attr
defaults to an empty list so it's safe to assume both exist
- to help keep the default function in sync though, the action value
for the repeat setvalue is copied from the top-level action
- the top-level action is only added if there is no entity_id L234.
- L424: the container path was being calculated from the stack for each
row, but it is a property of each stack entry which can be calculated
in advance.
- added a ContainerPath class to represent the collection of
container nodes used in many places, and used it to calculate
the container path for each stack entry when adding to the stack
- also the path can assume the previous stack entry has a container
path as well, so that value is used instead of re-iterating
- L696: deleted branch since it's not reachable - if there are multiple entity declarations, then the code would go through L690 -> L691
| | | end_group | | | | | ||
| | | end_repeat | | | | | ||
|
|
||
| | entities | |
There was a problem hiding this comment.
🤯 You really went for the most diabolical cases!
| - dataset_2: xpath - where the meta/entity block should be for requestor_2 | ||
|
|
||
|
|
||
| *Stress test* |
There was a problem hiding this comment.
Wow, nice one. That's a very big survey, if things are working for that, we're in great shape.
There was a problem hiding this comment.
(I think) I'm most of the way through the traceability suite, but this stress test is part of the solver suite, so I'm not sure it passes yet, but then again the expectation for the stress test is pretty loose.
|
Thanks for the guidance on how to approach early review. I've looked at tests and messages Edit: these do work!! I've updated them with some minor unrelated fixes that made conversion failed. I must have forgotten to install. |
It may not have worked earlier but using the latest commit 6b525bb, conceptually this is OK. There's a test |
- add a test for the error that odk_validate raises when the instance referenced in update/upsert mode isn't there (EB019) - update other such tests to use csv-external so that the instance is present - not a core part of the test as such, but required for update/upsert to work without an odk_validate error
- depth 1: survey/group - depth 2: repeat/group - depth 3: repeat/group + var outside repeat
|
That’s amazing to read, thank you! I wonder whether I forgot to rebuild? I’ll try to get to the bottom of it and keep working through the test cases. 🚀 |
- previously, not adding any references to or from an entity would
result in them being silently ignored, but now such entities are
considered as targeting the survey
- allocation currently only considers ancestors so only one such
unreferenced entity can exist in the form, i.e. even if there are
spare valid groups the entities won't be pushed down into them
- removed EB011 since it is covered by EB010 in that the row number is
what makes it stable by breaking ties using a distinct value
- locally the @baseVersion seemed to reliably be the one in the error message, but on GH Actions it seems to prefer @branchId, so the test assertion uses generic message fragments, which should still be selective enough for this error and not match others
fe4457b to
6eb21a2
Compare
|
Confirmed those three work and updated the source above to fix the issues you mentioned. So good! I'm continuing my review looking specifically at risk to existing functionality we may not have explicitly tested and common patterns we expect. I think we should merge a version sooner than later so QA and the broader community can start pushing on it. |
- combinations of: - create, update, multiple properties, multiple entities - survey, group, repeat, repeat/group, group/repeat - delete tests covered by these newer ones
- previously did not handle a blank list_name / dataset where other columns had a value, but now there is an error for that - the dataset_name checks assumed only one entity is possible so had hardcoded row=2 in all errors, but now it uses the actual row number
- should help keep what is parsed in sync with current entity cols - still need outer get_entity_variable_references to index on question_name to avoid having to do nested iteration on every row
- fixes odk_validate failures
|
Maybe it's from looking at it for so long, but this PR seems to be in decent enough shape for merging. The remaining work is mainly the solver suite which is searching for edge cases. |
lognaturel
left a comment
There was a problem hiding this comment.
Maybe it's from looking at it for so long, but this PR seems to be in decent enough shape for merging.
No, it's that you did it! What a tricky algorithmic puzzle.
I've identified a couple of edge cases but nothing serious. I'll file those as separate issues.
| - EB007: implicit entity_id=1, create_if=0, update_if=1 (update / if) | ||
| - EB008: implicit entity_id=1, create_if=1, update_if=0 (error, EV010) | ||
| - EB009: implicit entity_id=1, create_if=1, update_if=1 (upsert / if) | ||
| - EB010: Meta/entity allocations are stable/deterministic |
There was a problem hiding this comment.
I'm not seeing a test where multiple allocations are possible that would fail if the allocation rule changed. Something like
md = """
| survey |
| | type | name | label | save_to |
| | begin_group | g1 | G1 | |
| | text | q1 | Q1 | e1p1 |
| | begin_group | g2 | G2 | |
| | text | q2 | Q2 | e1p2 |
| | end_group | g2 | | |
| | end_group | g1 | | |
| entities |
| | list_name | label |
| | e1 | E1 |
"""
There was a problem hiding this comment.
I do think this form illustrates what I wanted because both g1 and toplevel would be valid placements for the declaration. I just realized it also produces an invalid XForm because it currently places it in g2: #823
Closes #800
Closes #819
Why is this the best possible solution? Were any other approaches considered?
Initial implementation that passes existing relevant tests and a handful of new ones but still needs:
As noted in below comments, the index to this PR is the docstring at the top of
test_entities.pywhich assigns a short code to each requirement from the XForm spec, for validations, and for general behaviour. These short codes are then used to tag the relevant test cases.What are the regression risks?
New feature but main regression risk is breaking forms for the old entity specs
Does this change require updates to documentation? If so, please file an issue here and include the link below.
Yes, there's a spec doc update already but XLSForm side not documented yet
Before submitting this PR, please make sure you have:
testspython -m unittestand verified all tests passruff format pyxform testsandruff check pyxform teststo lint code