Move SQLite writes to a background thread and buffer read state #3716
Conversation
📝 Walkthrough
Reworks shape metadata storage to an ETS-backed shape_meta_table and adds a WriteBuffer GenServer that batches shape metadata mutations (add/remove/snapshot) before persisting to SQLite. Query and connection layers are made buffer-aware; supervisors, telemetry, config, and tests are updated for buffered semantics and labeled checkouts. Public signatures are largely preserved.
Sequence diagram:

```mermaid
sequenceDiagram
    participant Client
    participant ShapeDb
    participant WriteBuffer
    participant SQLite
    Client->>ShapeDb: add_shape(handle, shape, hash, relations)
    ShapeDb->>WriteBuffer: enqueue :add (buffer shape, relations, hash)
    WriteBuffer-->>ShapeDb: {:ok, shape_hash}
    Note right of WriteBuffer: background or explicit flush will persist batched ops
    WriteBuffer->>SQLite: batch apply (add/remove/snapshot ops)
    SQLite-->>WriteBuffer: ok
    WriteBuffer-->>ShapeDb: flush completion (cleanup tombstones)
    ShapeDb-->>Client: {:ok, shape_hash}
```
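The enqueue-then-flush pattern the diagram describes can be sketched as a small GenServer owning an ETS ops table. This is an illustrative sketch only; `BufferSketch` and its API are hypothetical names, not the PR's actual `WriteBuffer` module:

```elixir
defmodule BufferSketch do
  use GenServer

  # Illustrative only: buffer ops in an ETS ordered_set, flush them in one batch.
  def start_link(flush_fun), do: GenServer.start_link(__MODULE__, flush_fun, name: __MODULE__)

  def enqueue(op), do: GenServer.call(__MODULE__, {:enqueue, op})
  def flush_sync, do: GenServer.call(__MODULE__, :flush)

  @impl true
  def init(flush_fun) do
    table = :ets.new(:ops, [:ordered_set, :private])
    {:ok, %{table: table, flush_fun: flush_fun}}
  end

  @impl true
  def handle_call({:enqueue, op}, _from, state) do
    # Key by a unique monotonic value so the ordered_set preserves arrival order.
    key = {System.monotonic_time(), :erlang.unique_integer([:monotonic, :positive])}
    :ets.insert(state.table, {key, op})
    {:reply, :ok, state}
  end

  def handle_call(:flush, _from, state) do
    # Drain the table in key (arrival) order and apply the whole batch at once.
    ops = state.table |> :ets.tab2list() |> Enum.map(fn {_key, op} -> op end)
    state.flush_fun.(ops)
    :ets.delete_all_objects(state.table)
    {:reply, {:ok, length(ops)}, state}
  end
end

{:ok, _pid} = BufferSketch.start_link(fn _ops -> :ok end)
:ok = BufferSketch.enqueue({:add, "handle-1"})
:ok = BufferSketch.enqueue({:remove, "handle-2"})
{:ok, 2} = BufferSketch.flush_sync()
```

In the real PR the flush also runs on a timer and on shutdown; this sketch only shows the explicit-flush path.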
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main    #3716      +/-   ##
==========================================
+ Coverage   83.40%   87.36%    +3.96%
==========================================
  Files          35       23       -12
  Lines        2380     2011      -369
  Branches      530      533        +3
==========================================
- Hits         1985     1757      -228
+ Misses        393      252      -141
  Partials        2        2
```
benchmark this
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex`:
- Around lines 106-125: The fallback clause `handle_checkout({:checkout, _label}, _from, conn, pool_state)` is unreachable because `handle_enqueue({:checkout, label}, pool_state)` always enqueues `{:checkout, label, now()}`, so remove the dead clause (the one matching the two-element command tuple) from connection.ex, or replace it with a clear comment if you intentionally want defensive behavior; update tests accordingly to ensure no behavior changes.
🔇 Additional comments (18)

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex (2)

- 6-8: LGTM! The new `list_shape_meta` query correctly selects the required metadata fields (handle, hash, snapshot_state) for the ETS cache population.
- 249-253: LGTM! The stream function correctly transforms the database's integer `snapshot_state` to a boolean representing "snapshot started" status, consistent with the `snapshot_started?` function logic at line 199.

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (3)

- 12-15: LGTM! The updated imports correctly reflect the new function signatures that accept action labels for telemetry tracking.
- 35-47: LGTM! Action labels are consistently applied across all checkout operations, enabling meaningful telemetry tracking per operation type.
- 100-106: LGTM! The renamed `reduce_shape_meta/3` correctly uses the new `list_shape_meta_stream`, and the name accurately reflects the richer metadata being processed.

packages/sync-service/test/electric/shape_cache/shape_status/shape_db_test.exs (1)

- 176-199: LGTM! The test comprehensively validates the renamed `reduce_shape_meta/3` function, correctly verifying the {handle, hash, snapshot_started} tuple structure with proper setup of both started and non-started snapshots.

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex (3)

- 132-140: LGTM! The reduced SQLite page cache (512KB) is appropriate since hot-path lookups now go through the ETS cache. The comment clearly explains the negative value convention.
- 114-118: LGTM! The telemetry emission provides valuable observability for pool checkout latency, enabling monitoring and debugging of potential connection pool bottlenecks per operation type.
- 159-180: LGTM! The updated function signatures correctly propagate the label through to NimblePool, enabling the telemetry instrumentation in `handle_checkout`.

packages/sync-service/lib/electric/shape_cache/shape_status.ex (9)

- 32-34: LGTM! The tuple structure `{handle, hash, snapshot_started, last_read_time}` and position constant are clearly documented, making the code self-explanatory.
- 69-75: LGTM! The ETS insertion correctly uses `insert_new` to prevent race conditions and properly sequences the persistent store write before the cache write.
- 169-175: LGTM! The hash validation via ETS lookup is an effective optimization, avoiding SQLite queries for shape validation on the hot path.
- 182-186: LGTM! The update correctly sequences the persistent write before the cache update and properly uses `update_element` with the correct tuple position.
- 190-193: LGTM! Reading `snapshot_started?` from the ETS cache provides fast hot-path access while maintaining a safe default of `false` for missing entries.
- 203-205: Verify: `snapshot_complete?` is not cached in ETS. While `snapshot_started?` now reads from the ETS cache (lines 190-193), `snapshot_complete?` still queries SQLite directly. This asymmetry may be intentional if `snapshot_complete?` is called less frequently, but it creates inconsistent performance characteristics. If `snapshot_complete?` is also on a hot path, consider extending the meta tuple to `{handle, hash, snapshot_started, snapshot_complete, last_read_time}` and caching both states.
- 231-267: LGTM! The LRU algorithm using `gb_trees` is an efficient approach (O(n) scan with O(log k) tree operations) and correctly handles the new 4-tuple structure.
- 297-310: LGTM! The population logic correctly transforms the 3-tuple from `ShapeDb.reduce_shape_meta` into the 4-tuple required by ETS, initializing all shapes with the same `start_time` for a fair LRU baseline after restart.
- 283-295: LGTM! The ETS table configuration with `read_concurrency: true` and `write_concurrency: :auto` is well-suited for a cache that's primarily read-heavy with occasional writes.
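The gb_trees-based LRU approach mentioned above (O(n) scan, O(log k) tree operations) can be sketched in isolation. `LruSketch.oldest/2` is a hypothetical stand-in, not the PR's code: it keeps only the k least-recently-read handles in a bounded tree while folding over the entries:

```elixir
# Illustrative sketch: find the k least-recently-read handles from a list of
# {handle, last_read_time} pairs using a bounded :gb_trees accumulator.
defmodule LruSketch do
  def oldest(entries, k) do
    entries
    |> Enum.reduce(:gb_trees.empty(), fn {handle, last_read}, tree ->
      # Key by {last_read, handle} so ties are still unique keys.
      tree = :gb_trees.enter({last_read, handle}, handle, tree)

      if :gb_trees.size(tree) > k do
        # Evict the most recently read candidate so only the k oldest remain.
        {_key, _val, tree} = :gb_trees.take_largest(tree)
        tree
      else
        tree
      end
    end)
    # values/1 returns handles ordered by last_read_time, oldest first.
    |> :gb_trees.values()
  end
end

["b", "c"] = LruSketch.oldest([{"a", 3}, {"b", 1}, {"c", 2}], 2)
```

The bounded tree is what keeps the tree operations O(log k) rather than O(log n) while the fold itself stays O(n).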
Concurrent shape creation looks significantly slower. Do you think this might cause timeouts with high load?
benchmark this
benchmark this
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@packages/sync-service/config/runtime.exs`:
- Around lines 199-206: The config decodes JSON into atoms using `Jason.decode!(..., keys: :atoms)`, which can create new atoms and exhaust the BEAM atom table. Update the decode call for max_concurrent_requests to use `keys: :atoms!` (i.e., replace `Jason.decode!(json, keys: :atoms)` with `Jason.decode!(json, keys: :atoms!)`) so only existing atoms are used.
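The difference comes down to the primitives Jason uses under the hood (assuming standard Jason behavior: `keys: :atoms` maps to `String.to_atom/1`, while `keys: :atoms!` maps to `String.to_existing_atom/1`). A minimal demonstration with those primitives directly:

```elixir
# Ensure this atom already exists (atoms in source are created at compile time).
_ = :max_concurrent_requests

# The :atoms! behavior: succeeds only for atoms that already exist...
:max_concurrent_requests = String.to_existing_atom("max_concurrent_requests")

# ...and raises for arbitrary input instead of silently growing the atom table,
# which is exactly the failure mode you want for untrusted JSON keys.
:rejected =
  try do
    String.to_existing_atom("some_attacker_controlled_key_xyz")
  rescue
    ArgumentError -> :rejected
  end
```

Since the BEAM atom table is never garbage collected, `String.to_atom/1` on untrusted input is an unbounded resource leak; failing fast is the safer default.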
In
`@packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex`:
- Around lines 401-425: The batch write silently ignores non-`:ok` returns, so failed DB ops are dropped. Update `do_batch_write` so each call to `Query.remove_shape`, `Query.mark_snapshot_started`, and `Query.mark_snapshot_complete` is enforced with a pattern match (i.e., expect `:ok` and let any other return raise), mirroring the existing `:ok = Query.add_shape` line, so failures propagate to the rescue block instead of being ignored during the `Enum.each` over entries.
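The enforcement pattern the instruction describes, matching on `:ok` so any other return raises and reaches the rescue block, can be sketched like this (`FlushSketch.persist!/2` is an illustrative name, not the PR's code):

```elixir
# Sketch: matching on :ok turns a silently-ignored failure into a crash that
# surrounding rescue/retry logic can observe.
defmodule FlushSketch do
  def persist!(entries, apply_op) do
    Enum.each(entries, fn entry ->
      # A non-:ok return raises MatchError instead of being dropped.
      :ok = apply_op.(entry)
    end)
  end
end

# Happy path: every op returns :ok, Enum.each returns :ok.
:ok = FlushSketch.persist!([1, 2], fn _ -> :ok end)

# Failure path: the {:error, _} return now raises and can be rescued upstream.
{:error, %MatchError{}} =
  try do
    FlushSketch.persist!([1], fn _ -> {:error, :disk_full} end)
  rescue
    e in MatchError -> {:error, e}
  end
```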
- Around lines 339-381: `flush_until_empty` currently leaves entries marked `flushing=true` when `do_batch_write` returns `:error`, preventing retries. After detecting a non-`:ok` result from `do_batch_write(stack_id, entries)`, iterate the collected entries and reset their flushing flag back to `false` in the ops_table so they can be retried later (use the same key shape used by `mark_and_collect_entries`, updating the flushing field via `:ets.update_element` or a select/replace on ops_table for each `{ts, _op}` in entries). Ensure this logic runs only on failure paths, so successful paths are unchanged.
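The reset step can be sketched with `:ets.update_element/3`, assuming the illustrative row layout `{key, op, flushing}` with the flag in position 3 (the real table layout may differ):

```elixir
# Sketch: on a failed batch write, flip `flushing` back to false for each
# collected entry so a later flush attempt picks them up again.
ops_table = :ets.new(:ops, [:ordered_set, :public])
:ets.insert(ops_table, {1, {:add, "h1"}, true})
:ets.insert(ops_table, {2, {:remove, "h2"}, true})

# These are the {key, op} pairs collected by the (hypothetical) mark step.
entries = [{1, {:add, "h1"}}, {2, {:remove, "h2"}}]

Enum.each(entries, fn {key, _op} ->
  # {3, false} means: set tuple position 3 (the flushing flag) to false.
  true = :ets.update_element(ops_table, key, {3, false})
end)

[{1, {:add, "h1"}, false}, {2, {:remove, "h2"}, false}] = :ets.tab2list(ops_table)
```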
- Around lines 56-58: The ops table is currently keyed only by `timestamp()`, which can collide. Add a function `op_key/0` that returns a composite of a monotonic timestamp plus a unique integer (e.g., `{System.monotonic_time(), :erlang.unique_integer([:positive])}`), keeping `timestamp/0` if needed for readability, then replace all uses of `ts = timestamp()` in `add_shape`, `remove_shape`, `queue_snapshot_started`, and `queue_snapshot_complete` with `ts = op_key()` so each op key is ordered and unique, preventing overwrites in the `:ordered_set`.
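A sketch of the composite-key idea. Note this sketch adds the `:monotonic` modifier to `:erlang.unique_integer/1`, going slightly beyond the suggestion above: `[:positive]` alone guarantees uniqueness but not ordering, while `[:monotonic, :positive]` also breaks ties in arrival order when two ops land on the same monotonic-clock tick:

```elixir
# Composite key: strictly ordered and collision-free, even for ops created
# within the same System.monotonic_time/0 tick.
op_key = fn -> {System.monotonic_time(), :erlang.unique_integer([:monotonic, :positive])} end

table = :ets.new(:ops, [:ordered_set, :public])
for op <- [:a, :b, :c], do: :ets.insert(table, {op_key.(), op})

# All three survive; a bare-timestamp key could have collapsed them to one row
# because :ordered_set overwrites on equal keys.
3 = :ets.info(table, :size)
[:a, :b, :c] = table |> :ets.tab2list() |> Enum.map(fn {_k, op} -> op end)
```

Tuples compare element by element, so `{t, u}` keys sort first by timestamp and then by the unique integer, which is exactly the ordering the flush loop needs.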
In
`@packages/sync-service/test/electric/replication/shape_log_collector/request_batcher_test.exs`:
- Around lines 117-120: Replace the fixed `Process.sleep(5)` with deterministic synchronization: wait (via `assert_receive`) for the RequestBatcher to confirm the shape is fully enqueued, or poll a public status function on RequestBatcher until it returns true, before calling `RequestBatcher.remove_shape(stack_id, @shape_handle_2)`. Add (or reuse) a test hook/message, such as a `:shape_enqueued` notification from the add_shape handling, so the test can reliably `assert_receive` that notification instead of sleeping.
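The message-based synchronization suggested above can be sketched with plain processes; `:shape_enqueued` and `:remove_shape` are illustrative hook names, not the RequestBatcher's real messages:

```elixir
# Sketch: replace Process.sleep with an explicit ready message from the process
# under test, so the add/remove ordering is deterministic.
parent = self()

worker =
  spawn(fn ->
    receive do
      {:add_shape, from} ->
        # Explicit test hook: confirm the add is fully processed before anything else.
        send(from, :shape_enqueued)

        receive do
          :remove_shape -> send(parent, :removed)
        end
    end
  end)

send(worker, {:add_shape, parent})

# Deterministic: only issue the remove once the add is confirmed enqueued.
:shape_enqueued =
  receive do
    msg -> msg
  end

send(worker, :remove_shape)

:removed =
  receive do
    msg -> msg
  end
```

In an ExUnit test the two `receive` blocks would be `assert_receive` calls, which also give a bounded timeout instead of an arbitrary sleep.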
🧹 Nitpick comments (2)

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex (1)

- 123-125: Emit checkout telemetry for non-queued checkouts too. Fast-path checkouts skip telemetry, which can skew queue-time metrics. Consider emitting the same event with `queue_time_μs: 0` in this clause for consistency (note the head must bind `label` rather than `_label` for the metadata to compile).

♻️ Suggested adjustment:

```diff
-def handle_checkout({:checkout, _label}, _from, conn, pool_state) do
+def handle_checkout({:checkout, label}, _from, conn, pool_state) do
+  OpenTelemetry.execute(
+    [:electric, :shape_db, :pool, :checkout],
+    %{queue_time_μs: 0},
+    %{label: label, stack_id: Keyword.get(pool_state, :stack_id)}
+  )
+
   {:ok, conn, conn, pool_state}
 end
```
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex (1)

- 223-229: Align `snapshot_started?` with bitmask semantics. The comment defines bit 0 as "started." Using `> 0` treats any nonzero state as started. Consider using a bitmask check to match the stated semantics and future-proof if `snapshot_state` ever becomes `2`.

♻️ Suggested adjustment:

```diff
-snapshot_started? = snapshot_state > 0
+snapshot_started? = Bitwise.band(snapshot_state, 1) == 1
```
📚 Learning (from magnetised, PR 3716, connection.ex lines 106-126, 2026-01-15): In NimblePool, when a worker is immediately available, the pool skips `handle_enqueue/2` and calls `handle_checkout/4` directly with the original checkout command. When no worker is available, the request goes through `handle_enqueue/2` (where state can be transformed) before `handle_checkout/4` is called. This means both patterns must be handled in `handle_checkout/4` to support both immediate and queued checkouts.
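A shape-only sketch of the two clauses this learning implies `handle_checkout/4` must provide (an illustrative module, not the real Electric connection worker; the enqueue-time stamping follows the `{:checkout, label, now()}` transformation described earlier in the thread):

```elixir
defmodule CheckoutSketch do
  # Queued path: handle_enqueue already stamped the command with an enqueue time,
  # so checkout can compute how long the request waited for a worker.
  def handle_checkout({:checkout, label, enqueued_at}, _from, conn, pool_state) do
    queue_time = System.monotonic_time(:microsecond) - enqueued_at
    {:ok, {conn, label, queue_time}, conn, pool_state}
  end

  # Fast path: a free worker means handle_enqueue was skipped entirely, so the
  # original two-element command arrives here; treat it as zero queue time.
  def handle_checkout({:checkout, label}, _from, conn, pool_state) do
    {:ok, {conn, label, 0}, conn, pool_state}
  end
end

# Fast path: no enqueue timestamp, zero queue time.
{:ok, {:conn, :read, 0}, :conn, []} =
  CheckoutSketch.handle_checkout({:checkout, :read}, nil, :conn, [])
```

This is why the "dead clause" flagged in the first review round is arguably defensive rather than unreachable: which clause fires depends on whether the pool had a free worker.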
🧬 Code graph analysis (4)
packages/sync-service/lib/electric/monitored_core_supervisor.ex (1)
packages/sync-service/lib/electric/telemetry/sentry.ex (1)
set_tags_context(23-25)
packages/sync-service/lib/electric/shape_cache/shape_status.ex (2)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex (2)
add_shape(100-131)mark_snapshot_started(176-183)packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (1)
mark_snapshot_started(67-73)
packages/sync-service/test/electric/shape_cache/shape_status/shape_db_test.exs (4)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (2)
add_shape(205-221)remove_shape(224-240)packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex (7)
add_shape(100-131)mark_snapshot_started(176-183)shape_for_handle(145-149)handle_for_shape(139-143)mark_snapshot_complete(185-192)remove_shape(194-211)handle_exists?(151-156)packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (2)
mark_snapshot_started(67-73)mark_snapshot_complete(75-81)packages/sync-service/lib/electric/shapes.ex (1)
mark_snapshot_started(123-127)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (3)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex (4)
add_shape(100-131)remove_shape(194-211)mark_snapshot_started(176-183)mark_snapshot_complete(185-192)packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex (1)
checkout_write!(169-181)packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (2)
mark_snapshot_started(67-73)mark_snapshot_complete(75-81)
🔇 Additional comments (26)

packages/sync-service/lib/electric/monitored_core_supervisor.ex (1)

- 17: No action needed: `tweaks` is guaranteed to be a keyword list by schema validation. The codebase explicitly defines `tweaks` as `type: :keyword_list` in the schema validation (stack_supervisor.ex lines 104-110) with a default of `[]`. This type constraint is enforced across all code paths. Additionally, core_supervisor.ex uses `Keyword.fetch!(opts, :tweaks)`, which would fail immediately if tweaks were ever a map. All existing uses of tweaks in the codebase employ `Keyword.get()` or `Keyword.fetch!()` without any defensive logic, confirming that the type is safe and consistent. Likely an incorrect or invalid review comment.

packages/sync-service/lib/electric/stack_supervisor/telemetry.ex (2)

- 43-62: Pending-write metric wiring looks good. Adds the metric and schedules periodic reporting, improving visibility into buffered writes.
- 84-96: Defensive write-buffer telemetry emission looks good. Gracefully handles the buffer not being available yet during early startup.

packages/sync-service/test/electric/shape_cache/shape_status/shape_db_test.exs (4)

- 176-202: Nice coverage for reduce_shape_meta + snapshot_started propagation. Flushing before reduction makes the SQLite-backed state deterministic in the test.
- 223-250: Snapshot start/complete tests look good. These updates align with the new snapshot lifecycle APIs.
- 273-277: Helper keeps snapshot assertions compact. Makes the new snapshot-state checks easier to follow.
- 336-415: Good write-buffer behavior coverage. The suite exercises buffered visibility, flush behavior, tombstones, and snapshot transitions well.

packages/sync-service/lib/electric/shape_cache/shape_status.ex (5)

- 32-61: shape_meta_table initialization looks solid. The new meta-table lifecycle (create/populate) is clear and coherent.
- 69-139: Add/remove/reset keep the meta cache in sync. The ETS updates align with the buffered write path.
- 157-194: Handle validation + snapshot_started caching aligns with the new tuple. Consistent use of the meta table makes these paths straightforward.
- 216-263: LRU/last_read_time updates align with the new tuple layout. The fold now correctly pulls the last_read_time element.
- 275-304: Meta-table helpers are clean. Creation and population helpers look good.

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (7)

- 2-23: Updated ShapeDb doc + buffering wiring look good. The module-level intent is clear.
- 39-86: Buffered mutation path looks consistent. Add/remove/snapshot/reset all route through the buffer cleanly.
- 88-131: Buffer-first reads with tombstone checks look good. The fallback to SQLite is clear and safe.
- 133-176: Merged buffered/SQLite listing with tombstone filtering looks good. Deduping by handle makes sense here.
- 184-209: Reduction/count paths align with the buffer model. The updated flow matches the new read/write semantics.
- 216-248: Buffer-aware handle checks and pre-validation flush are good. This keeps validation consistent with persisted state.
- 256-260: Pending-buffer metric accessor is a good addition. Useful for telemetry and load testing.

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (7)

- 1-62: Module structure and high-level API look good. The doc + API layout is easy to follow.
- 64-69: flush_sync API is handy for tests and shutdown. Nice addition for deterministic flushes.
- 73-127: Lookup helpers and tombstone gating read clearly. The semantics are easy to reason about.
- 129-153: Buffered shape listing/count helpers look good. They match the expected buffered semantics.
- 155-200: Relation filtering and tombstone accounting look consistent. The logic aligns with the Query-level expectations.
- 258-315: Init and ETS table setup look good. The table configuration and manual_flush_only flag are clear.
- 317-333: Poll/flush/terminate wiring looks good. The periodic flush loop is straightforward.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex`:
- Around lines 207-224: When re-adding a handle in `add_shape`, an earlier tombstone operation can still hide the new add until flush. Update `add_shape` (which inserts into ops_table) to clear any pending tombstone/remove op for that handle before queuing the `{:add, ...}` entry; for example, remove matching `{:remove, handle, ...}` entries from ops_table (using `:ets.match_delete` or lookup+delete) so immediate reads (`lookup_*`/`has_handle?`) reflect the new add.
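The tombstone-clearing step can be sketched with `:ets.match_delete/2`, assuming the illustrative row layout `{key, op, flushing}` (the real ops entries carry more fields):

```elixir
# Sketch: before buffering a new {:add, handle, ...} op, delete any pending
# {:remove, handle, ...} rows so immediate reads see the re-added shape.
ops_table = :ets.new(:ops, [:ordered_set, :public])
:ets.insert(ops_table, {1, {:remove, "h1"}, false})
:ets.insert(ops_table, {2, {:remove, "h2"}, false})

handle = "h1"

# Bound variables in the pattern match by value; :_ wildcards the rest.
:ets.match_delete(ops_table, {:_, {:remove, handle}, :_})
:ets.insert(ops_table, {3, {:add, handle}, false})

# Only h2's remove survives; h1 now reads as a fresh add.
[{2, {:remove, "h2"}, false}, {3, {:add, "h1"}, false}] = :ets.tab2list(ops_table)
```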
🧹 Nitpick comments (1)

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (1)

- 24-27: Clarify the ops table key shape in the docs. Line 24 describes a single `timestamp`, but the key is now a composite from `op_key/0`. Update the doc to prevent confusion.

♻️ Suggested doc tweak:

```diff
- `{timestamp, operation, flushing}`
+ `{op_key, operation, flushing}` (where op_key is a composite monotonic key)
```
🔇 Additional comments (4)

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (3)

- 74-110: Lookup paths look solid. Tombstone-first filtering and buffer lookups are clear and consistent.
- 269-331: Lifecycle wiring is clean. Init, polling, and synchronous flush hooks read clearly and are easy to reason about.
- 342-433: Flush + cleanup flow is cohesive. Batching, cleanup, and retry behavior are well-structured.

.changeset/gorgeous-readers-return.md (1)

- 1-5: Changeset summary aligns with the implementation.
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex
Show resolved
Hide resolved
Quoted from integration-tests/tests/crash-recovery.lux:

```
## Crash electric
[shell electric]
    # flush pending writes so that the shape we just created is definitely in the db
    !Electric.ShapeCache.ShapeStatus.ShapeDb.WriteBuffer.flush_sync("single_stack")
```
If we are first flushing the changes then we are not simulating crash recovery: the test should cover the behaviour of a crash, including unflushed write-buffer changes, and if that breaks Electric we should fix it, no?
Will give it a more detailed look tomorrow. Main comments are: 1) many full table scans and tab2list calls on the buffer, but it's not guaranteed to be small (?); 2) the crash test now doesn't actually test a crash (?): how do we handle the in-flight requests being lost?
Will leave the Opus review below this as well.
```elixir
tombstones = tombstones_table_name(stack_id)

shapes_table_name(stack_id)
|> :ets.tab2list()
```
I would rather avoid tab2list where possible and instead do a reduce/fold: less memory pressure, and scale-independent?
```elixir
tombstones = tombstones_table_name(stack_id)

shapes_table_name(stack_id)
|> :ets.tab2list()
```
same as above - prefer reduce over tab2list
```elixir
end

defp has_buffered_add?(ops_table, handle) do
  :ets.match_object(ops_table, {:_, {:add, handle, :_, :_, :_, :_}, :_}) != []
```
Also a full table scan (per tombstone entry). This is for telemetry only, and again the buffer should be small, but it is a bit scary to do a full scan per tombstone entry; surely there are more efficient ways of doing this (since we're doing tab2list on the tombstones anyway). We don't need to traverse the whole table every time. Having some sort of index might be worth it?
```elixir
@doc "Look up a handle by comparable shape binary. Returns {:ok, handle} or :not_found."
def lookup_handle(stack_id, comparable_binary) do
  tombstones = tombstones_table_name(stack_id)
  pattern = {:_, :_, comparable_binary}
```
To clarify, this is a full table scan. I guess the idea is that the buffer should be small and we can afford this, but if write intensity is high won't this be an issue? And it's on the critical path as well.
```elixir
# Write functions

@doc "Add a shape to the buffer"
def add_shape(stack_id, handle, shape_binary, comparable_binary, hash, relations) do
```
To clarify - there is no limit to the size of this buffer, where it would force a flush and apply back pressure forcefully, right?
msfstef left a comment:
Code Review
This is a well-structured performance optimization that introduces an ETS-based write buffer to prevent SQLite timeout cascades. The latest commit addresses some initial concerns (flushing flag reset, operation key uniqueness), but a few issues remain.
🔴 Critical: Crash Recovery Test Semantics
integration-tests/tests/crash-recovery.lux:50-52
The test now explicitly flushes before crashing:

```elixir
!Electric.ShapeCache.ShapeStatus.ShapeDb.WriteBuffer.flush_sync("single_stack")
```
This defeats the purpose of crash recovery testing. The test should verify that Electric handles unflushed writes correctly on restart (even if that means the shape doesn't exist). This is a fundamental change in crash semantics that should be addressed in the code/documentation, not papered over in tests.
🔴 Critical: Data Loss on Crash
Acknowledged in the PR description, but worth emphasizing: shapes created just before a crash may not exist after restart. This changes the durability guarantees and should be clearly documented for operators.
🟠 Moderate: TOCTOU Race in handle_for_shape
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex:72-88
```elixir
case checkout!(stack_id, :handle_for_shape, &Query.handle_for_shape(&1, comparable_shape)) do
  {:ok, handle} ->
    # Race: tombstone could be added here
    if WriteBuffer.is_tombstoned?(stack_id, handle) do
      :error
    else
      {:ok, handle}
    end
```
Between the SQLite lookup and the tombstone check, a concurrent remove_shape could tombstone the handle. The window is small but exists under high concurrency.
🟠 Moderate: Linear Scans in Buffer Lookups
write_buffer.ex:136-152 (lookup_handle) and write_buffer.ex:171-172 (has_buffered_add?) use full table scans with match_object. Under the exact high-load conditions this PR targets (many concurrent shape creations), these could become bottlenecks.
Consider a secondary index for comparable_binary -> handle lookups.
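One way to sketch that secondary index: maintain a second ETS set keyed by the comparable binary alongside the main buffer table, turning the linear `match_object` scan into a point lookup (table names and layout here are illustrative, not the PR's code):

```elixir
# Main buffer table: handle => {shape_binary, comparable_binary}
shapes = :ets.new(:shapes, [:set, :public])
# Secondary index: comparable_binary => handle (O(1) hash lookup)
by_comparable = :ets.new(:idx, [:set, :public])

add = fn handle, binary, comparable ->
  :ets.insert(shapes, {handle, binary, comparable})
  :ets.insert(by_comparable, {comparable, handle})
end

add.("h1", "bin1", "cmp1")
add.("h2", "bin2", "cmp2")

# Point lookup replaces the full-table match_object scan:
[{"cmp2", "h2"}] = :ets.lookup(by_comparable, "cmp2")
```

The cost is that remove/tombstone paths must delete from both tables to keep the index consistent, which is the usual trade-off for a secondary index.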
🟡 Minor: Test Flakiness
request_batcher_test.exs:118
```elixir
# Small delay to ensure the add_shape GenServer.call is processed before remove
Process.sleep(5)
```

Adding `Process.sleep` to fix ordering suggests there's a real race condition that could manifest under CI load.
✅ Fixed in Latest Commit
- Flushing flag reset on error: now properly allows retry on failed batches
- Operation key uniqueness: uses a `{monotonic_time, unique_integer}` tuple to prevent collisions
Summary
| Severity | Issue | Status |
|---|---|---|
| 🔴 Critical | Crash recovery test doesn't test actual crash recovery | Open |
| 🔴 Critical | Data loss on crash (needs documentation) | Open |
| 🟠 Moderate | TOCTOU race in handle lookups | Open |
| 🟠 Moderate | Linear scans in buffer lookups | Open |
| 🟡 Minor | Test flakiness with Process.sleep | Open |
The crash recovery test concern is the most pressing - the current test is not validating what it claims to validate.
```elixir
assert_receive :adding_shape_2
# Small delay to ensure the add_shape GenServer.call is processed before remove
Process.sleep(5)
```
Why not wait for the :processor_called message here as well? After that we wait for the processor_delay set above, so the deduplication should work fine. Generally we patch the processor call entirely, so I'm unclear on why we need a sleep here.
Reduce load on sqlite by moving the core metadata checked by most read requests into ETS and prevent timeouts during sustained parallel shape creation by writing to sqlite asynchronously in a background process. Writes are now done directly to ETS and flushed to disk in batches.
Only shape -> handle and handle -> shapes lookups are now handled by reading directly from sqlite. All other (small) metadata is held in an ETS write-through cache and populated from sqlite on boot.
Pending writes are flushed to disk on shutdown. This is quick, ~1,000 writes/s, but could cause delays to shutdown in an overloaded system. There is a risk of loss of in-flight writes if the system crashes.
Associated changes: