Scott/store#375
Conversation
…ll data had returned
There was a problem hiding this comment.
Code Review
This pull request transitions the application from mock queries to a robust, real-time SSE streaming query system integrated with a Zustand-based state store (useAtlasStore) and Tldraw canvas synchronization. It introduces dynamic HTML parsing for tables and notes, a facet selector for charts, and developer state import/export capabilities. The review feedback highlights a critical XSS vulnerability in the HTML parser where event handlers are not stripped, along with several performance and robustness optimizations. These include hoisting Intl.NumberFormat instantiations out of render loops, deduplicating selected entity DCIDs, avoiding redundant store subscription executions, and guarding against infinite loops in ancestor chain resolution.
…the subscribe callback in sync_store.ts
nick-nlb
left a comment
There was a problem hiding this comment.
Thank you for all this Scott! Looks great, nice to see the store take shape.
A few comments below. Some of them we could roll out into another PR to prevent this one from getting to large/complex (i.e. the rethinking of how query results/cards are connected, unit tests, etc).
For any that you would prefer to address separately, we can place them in the tracker and do them as a follow-up.
| title: string; | ||
| variables: ChartVariable[]; | ||
| entities: ChartEntity[]; | ||
| metadata: ChartMetadata[]; |
There was a problem hiding this comment.
This isn't part of this PR directly (but does have a bearing on code in the PR that uses this interface):
We are bundling observation/facet/entity information into an interface called ChartMetadata, stored in a key labeled metadata.
Most of this information isn't actually metadata, but rather the actual observations themselves (the data/value sequence that makes up the time series), facet/series keys such as measurement method, observationPeriod, unit, etc, (which are a form of metadata, but here are the keys that distinguish a facet).
We will soon have endpoints by available through which Data Weaver will be able to get full, enriched metadata, that would have a matching interface here
For this particular section of the payload, something broader like data or timeSeries would fit better. Each entry of this array is really a TimeSeries. Most likely it will be rendered as a chart by the frontend (although even now we have the tabular view as well), but here is really a collection of facet-grouped observations.
We can envision how the relationship between the time series and the chart might be more complex when we think about how the frontend might combine time series into charts.
Right now, each chart renders a single time series, but we will soon have single charts that have multiple time series. This could be either a single stat var, multiple entities (GDP, Canada/USA) or a multiple variables, single stat var (GDP/Life Expectancy).
|
|
||
| export interface QueryResult { | ||
| id: string; | ||
| chartType: ChartType; |
There was a problem hiding this comment.
Following up on the previous comment about metadata to time series:
I see that the chart type was removed here. We only have line charts right now, but we will need some way for the front end to know "given this particular set of time series, group them together this way, and use this suggested chart type".
We have the array of time series (currently called metadata), but we don't yet have the suggestion of how these end up being grouped together in cards (if at all) and then what the appropriate charting tool is to display them.
We could have a separate attribute in the QueryResult that represents a suggested grouping of stat vars and entities into individual cards, with each grouping being given a suggested chart type.
| id: string; | ||
| chartType: ChartType; | ||
| title: string; | ||
| variables: ChartVariable[]; |
There was a problem hiding this comment.
Similarly to the current ChartMetadata, the ChartVariable and ChartEntity aren't really yet specific to charts (although they could be grouped that way later. This is more just a naming things, but these could be just "statVars", "entities", and they could actually be grouped into a metadata interface here (the linking of DCIDs to human-readable names could fit into the broad category of metadata).
| places = | ||
| selectedEntityDcids.length > 0 ? selectedEntityDcids : [query]; | ||
| } | ||
| if (places.length === 0) places = [query]; |
There was a problem hiding this comment.
If there is no place, why does the query get used? (If I search for "Tell me about GDP", this will end up being slotted into the places section here.
| if (domNode instanceof Element) { | ||
| for (const attr of Object.keys(domNode.attribs)) { | ||
| if (attr.toLowerCase().startsWith('on')) { | ||
| delete domNode.attribs[attr]; |
There was a problem hiding this comment.
There is a degree of sanitization happening here, but it could be hardened. Instead of deleting elements that start with "on", could we take a whitelist approach and remove all attributes except for the few that we need (i.e., we may only need class, href, target, rel etc, but not id, name, onX).
We are also dropping a particular set of node types (script, style) but it's hard to be very sure that we have covered every possible edge case in custom sanitization. I.e., form tags and svgs might be a vector.
We could keep this sanitization (while hardening it) but if we are handling and passing HTML, we should use a production grade sanitization library.
| colorScheme={BUTTON_COLOR_SCHEME} | ||
| onClick={exportState} | ||
| > | ||
| Export State |
There was a problem hiding this comment.
This is a very helpful feature! At some point, it might be nice to have this viewable inline (rather than downloading and opening) but for now, this is great.
|
|
||
| export interface CardEntry { | ||
| shapeId: string; | ||
| historyNodeId: string; |
There was a problem hiding this comment.
We are tightly coupling data retrieval (our query history) and presentation (the cards on the canvas and how they decide what data to display). In particular, a card is locked to a single query round and a single entity/variable combination (with the first one displaying by default).
This works for simple tables where users click to spawn individual charts. However, it limits the more complex possibilities that we will need.
For example, f I search for "Life Expectancy and Age Pyramid in North America", I might expect to get multiple stat vars and multiple time series, combined in various ways, and displayed over multiple chart cards, all opened right away.
In that example, I might get one chart (a line chart) that contains time series for the various countries in NA, as well as a second chart that contains a histograph of the age pyramid (once bar charts are available).
Relatedly, if we want to manipulate charts directly on the front end (let's say combining two line charts generated over two different queries into one), this tight coupling limits us. Combining two charts on the canvas shouldn't alter or merge the underlying history.
Thinking aloud about how we could approach this decoupling:
We could introduce a pointer structure in the store:
DataSeriesSpec {
historyNodeId: string;
placeDcid: string;
variableDcid: string;
}
Each chart then holds a series array of these DataSeriesSpec entries (which would allow us to merge, split, or otherwise customize charts on the canvas by manipulating the series arrays, leaving the underlying query history and data records untouched).
|
|
||
| // Register table card. | ||
| const tableId = `shape:${active.nodeId}__${entityDcid}__table`; | ||
| cardRegister(tableId, active.nodeId, 'table', entityDcid); |
There was a problem hiding this comment.
Maybe a batch registration of multiple cards for sequences like this?
| querySetStatus: (val) => | ||
| set({ currentStatus: val }, undefined, 'querySetStatus'), | ||
|
|
||
| getAncestorChain: (nodeId) => { |
There was a problem hiding this comment.
Can we start to introduce unit tests for these functions (this could be a separate, follow-up PR).
scott/store— Zustand Store & Query Pipeline IntegrationSummary
Introduces a centralized Zustand store (
AtlasStore) as the single source of truth for atlas state, replacing the previously scattered local state. Connects the streaming query pipeline end-to-end so queries produce visible cards on the canvas with chart, table, and notes variants.Key Changes
Store (
src/store/)AtlasStore(Zustand + devtools) managing history nodes, a card registry, and transient UI state (processing flag, status message).queryStart,nodeAddResult,queryComplete,queryFail,queryCancel,cardRegister/cardUnregister, etc.Store ↔ Canvas Sync (
scopes/atlas/sync_store.ts)AtlasContent(chart / table / text) from rawQueryResultdata in the store.Query Pipeline
QueryProviderrefactored to dispatch streaming events (STREAM_EVENT) into the store rather than managing cards locally.QueryResult, parsed viamarked, so the client receives pre-rendered HTML.STATUS.*) through the store.UI
HtmlParsedprimitive — renders trusted server-generated HTML with clickable internal links (triggers new chart queries).Misc
markedto the client bundle).validate_hrefutility for safe link handling within parsed HTML.use_dev_modehook for toggling developer tooling.