
Conversation

@magnetised
Contributor

@magnetised magnetised commented Jan 15, 2026

Reduce load on SQLite by moving the core metadata checked by most read requests into ETS, and prevent timeouts during sustained parallel shape creation by writing to SQLite asynchronously in a background process. Writes now go directly to ETS and are flushed to disk in batches.

Only the shape -> handle and handle -> shape lookups still read directly from SQLite. All other (small) metadata is held in an ETS write-through cache populated from SQLite on boot.

Pending writes are flushed to disk on shutdown. This is quick (~1,000 writes/s), but could delay shutdown in an overloaded system, and in-flight writes may be lost if the system crashes.
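The buffering scheme described above can be sketched as a small GenServer; module, table, and function names here are illustrative, not the actual Electric API:

```elixir
defmodule WriteBufferSketch do
  # Illustrative only: writes land in ETS immediately (so reads see
  # them), and a background loop flushes them to SQLite in batches.
  use GenServer

  @flush_interval_ms 50

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  # Client-side write: goes straight into the ordered ETS buffer.
  def add_shape(handle, shape) do
    :ets.insert(:pending_writes, {System.monotonic_time(), {:add, handle, shape}})
    :ok
  end

  @impl true
  def init(_opts) do
    :ets.new(:pending_writes, [:ordered_set, :public, :named_table])
    Process.send_after(self(), :flush, @flush_interval_ms)
    {:ok, %{}}
  end

  @impl true
  def handle_info(:flush, state) do
    flush_batch()
    Process.send_after(self(), :flush, @flush_interval_ms)
    {:noreply, state}
  end

  # Drain remaining writes on shutdown, as the PR description notes.
  @impl true
  def terminate(_reason, _state), do: flush_batch()

  defp flush_batch do
    # Collect pending ops in timestamp order, then apply them to
    # SQLite in one batch (persistence call elided in this sketch).
    entries = :ets.tab2list(:pending_writes)
    Enum.each(entries, fn {ts, _op} -> :ets.delete(:pending_writes, ts) end)
    # apply_batch_to_sqlite(entries)
  end
end
```

A real implementation also needs collision-safe op keys, tombstones for removes, and error handling around the batch write.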

Associated changes:

  • Move ShapeCleaner up in the supervisor hierarchy so it's available before the ShapeStatus process attempts to clean up incomplete/invalid shapes
  • Allow load shedding to be configured via an environment variable, which makes it easier to run load tests locally without maintaining a patch on the default configuration

Summary by CodeRabbit

  • New Features

    • Write-buffered shape metadata with background flushes to reduce DB timeouts
    • Optional max concurrent requests configuration
    • New telemetry for pending shape write buffer size
  • Bug Fixes

    • More accurate snapshot state tracking and consistency during add/remove/flush
  • Chores

    • Adjusted shape-cleaner supervision and startup wiring
  • Tests

    • Updated tests and integration flows to exercise buffering and snapshot semantics


@coderabbitai

coderabbitai bot commented Jan 15, 2026

📝 Walkthrough

Walkthrough

Reworks shape metadata storage to an ETS-backed shape_meta_table and adds a WriteBuffer GenServer that batches shape metadata mutations (add/remove/snapshot) before persisting to SQLite. Query and connection layers made buffer-aware; supervisors, telemetry, config, and tests updated for buffered semantics and labeled checkouts. Public signatures largely preserved.

Changes

Cohort / File(s) Summary
ETS metadata refactor
packages/sync-service/lib/electric/shape_cache/shape_status.ex
Replace shape_last_used_table with shape_meta_table storing 4-tuples (handle, hash, snapshot_started, last_read_time) and update all ETS ops to the new tuple shape.
Shape DB core & queries
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex, packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex
Add buffer-aware read/write paths that consult the WriteBuffer before SQLite; enqueue adds/removes/snapshot markers; rename streaming/query helpers to expose shape metadata; rename reduce_shape_handles/3 -> reduce_shape_meta/3.
WriteBuffer implementation
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex
New GenServer providing ETS-backed buffering: pending operation queue, tombstones, buffered shape lookups, snapshot queueing, periodic/background flush, synchronous flush API, batching to SQLite, and many inspection helpers (new public API).
Connection pool & telemetry labeling
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex
Checkout/check-out-write APIs extended to accept a label; enqueue/checkout flows carry label; emit OpenTelemetry queue metrics and label explain calls.
Supervisor wiring
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/supervisor.ex, packages/sync-service/lib/electric/monitored_core_supervisor.ex, packages/sync-service/lib/electric/core_supervisor.ex, packages/sync-service/lib/electric/shapes/supervisor.ex
Add ShapeDb.WriteBuffer child to ShapeDb supervisor; add CleanupTaskSupervisor child via tweaks; remove legacy shape_cleaner from some supervisors.
Telemetry periodic metrics
packages/sync-service/lib/electric/stack_supervisor/telemetry.ex
Add periodic reporting for pending write-buffer size and emit telemetry [:electric, :shape_db, :write_buffer, :pending_writes].
Config and runtime
packages/sync-service/config/runtime.exs
Add optional max_concurrent_requests config read from ELECTRIC_MAX_CONCURRENT_REQUESTS (JSON or nil).
Tests & test helpers
packages/sync-service/test/.../shape_db_test.exs, packages/sync-service/test/support/component_setup.ex, integration-tests/tests/crash-recovery.lux, packages/sync-service/test/.../request_batcher_test.exs
Update tests to use reduce_shape_meta/3, flush write buffer where needed, adjust snapshot state expectations, add small sync delay in one test, and set manual_flush_only: true in test setup.
Changelog / changeset
.changeset/gorgeous-readers-return.md
New changeset documenting the ETS-based write-through cache and asynchronous SQLite writes.
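The shape_meta_table refactor above can be pictured as plain ETS operations on 4-tuple rows; a hypothetical sketch, with table name and tuple positions assumed from the summary:

```elixir
# Rows are {handle, hash, snapshot_started, last_read_time}; the handle
# (position 1) is the key.
table = :ets.new(:shape_meta_table, [:set, :public, read_concurrency: true])

# insert_new avoids clobbering a row added concurrently for the same handle
true = :ets.insert_new(table, {"handle-1", "hash-1", false, System.monotonic_time()})

# flip snapshot_started (position 3) once the snapshot begins
true = :ets.update_element(table, "handle-1", {3, true})

# bump last_read_time (position 4) on the read hot path, feeding the LRU
true = :ets.update_element(table, "handle-1", {4, System.monotonic_time()})

[{_handle, _hash, started, _last_read}] = :ets.lookup(table, "handle-1")
# started is now true
```

update_element mutates a single tuple position in place, which is why per-read bookkeeping like last_read_time stays cheap.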

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant ShapeDb
  participant WriteBuffer
  participant SQLite

  Client->>ShapeDb: add_shape(handle, shape, hash, relations)
  ShapeDb->>WriteBuffer: enqueue :add (buffer shape, relations, hash)
  WriteBuffer-->>ShapeDb: {:ok, shape_hash}
  Note right of WriteBuffer: background or explicit flush will persist batched ops
  WriteBuffer->>SQLite: batch apply (add/remove/snapshot ops)
  SQLite-->>WriteBuffer: ok
  WriteBuffer-->>ShapeDb: flush completion (cleanup tombstones)
  ShapeDb-->>Client: {:ok, shape_hash}

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Suggested reviewers

  • icehaunter

Poem

🐰 I buffered shapes in an ETSy nest,
Queued snaps and tombs, then let them rest.
Batches woke and scurried to write,
Labels kept checkouts cozy and bright.
Hop, hop—now metadata sleeps through the night.

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): docstring coverage is 43.75%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): the title accurately summarizes the main architectural change, moving SQLite writes to a background thread and buffering read state in ETS, which aligns with the PR's core objective.



@codecov

codecov bot commented Jan 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.36%. Comparing base (6b7b453) to head (9af4ea1).
⚠️ Report is 11 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3716      +/-   ##
==========================================
+ Coverage   83.40%   87.36%   +3.96%     
==========================================
  Files          35       23      -12     
  Lines        2380     2011     -369     
  Branches      530      533       +3     
==========================================
- Hits         1985     1757     -228     
+ Misses        393      252     -141     
  Partials        2        2              
Flag Coverage Δ
electric-telemetry ?
elixir ?
packages/experimental 87.73% <ø> (ø)
packages/react-hooks 86.48% <ø> (ø)
packages/start 82.83% <ø> (ø)
packages/typescript-client 93.47% <ø> (ø)
packages/y-electric 56.05% <ø> (ø)
typescript 87.36% <ø> (ø)
unit-tests 87.36% <ø> (+3.96%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@blacksmith-sh

This comment has been minimized.

@magnetised force-pushed the magnetised/sqlite-ets-cache branch from 1cdec3c to 2734d7b on January 15, 2026 at 12:34
@magnetised
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Jan 15, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@magnetised
Contributor Author

benchmark this


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex`:
- Around line 106-125: The fallback clause handle_checkout({:checkout, _label},
_from, conn, pool_state) is unreachable because handle_enqueue({:checkout,
label}, pool_state) always enqueues {:checkout, label, now()}, so remove the
dead clause (the two-arity pattern) from connection.ex (or instead replace it
with a clear comment if you intentionally want defensive behavior); update tests
accordingly to ensure no behavior changes.
📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c25526c and 2734d7b.

📒 Files selected for processing (5)
  • packages/sync-service/lib/electric/shape_cache/shape_status.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex
  • packages/sync-service/test/electric/shape_cache/shape_status/shape_db_test.exs
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:05.838Z
Learning: Define shapes in server/proxy – no client-defined tables/WHERE clauses
📚 Learning: 2026-01-14T14:45:05.838Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:05.838Z
Learning: Avoid old Electric patterns (bidirectional SQLite sync, `electrify()` API) – use Electric HTTP streaming with TanStack DB collections instead

Applied to files:

  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex
🧬 Code graph analysis (2)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex (3)
packages/sync-service/lib/electric/connection/manager.ex (1)
  • pool_name (180-182)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (1)
  • explain (182-186)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex (2)
  • explain (263-265)
  • explain (267-269)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (1)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex (2)
  • checkout_write! (168-180)
  • checkout! (159-166)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Check and build examples/linearlite-read-only
  • GitHub Check: Check and build examples/tanstack-db-web-starter
  • GitHub Check: Check and build examples/linearlite
  • GitHub Check: Check and build examples/nextjs
  • GitHub Check: Check and build examples/react
  • GitHub Check: Check and build examples/encryption
  • GitHub Check: Check and build examples/proxy-auth
  • GitHub Check: Run Lux integration tests
🔇 Additional comments (18)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex (2)

6-8: LGTM!

The new list_shape_meta query correctly selects the required metadata fields (handle, hash, snapshot_state) for the ETS cache population.


249-253: LGTM!

The stream function correctly transforms the database's integer snapshot_state to a boolean representing "snapshot started" status, consistent with the snapshot_started? function logic at Line 199.

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (3)

12-15: LGTM!

The updated imports correctly reflect the new function signatures that accept action labels for telemetry tracking.


35-47: LGTM!

Action labels are consistently applied across all checkout operations, enabling meaningful telemetry tracking per operation type.


100-106: LGTM!

The renamed reduce_shape_meta/3 correctly uses the new list_shape_meta_stream and the name accurately reflects the richer metadata being processed.

packages/sync-service/test/electric/shape_cache/shape_status/shape_db_test.exs (1)

176-199: LGTM!

The test comprehensively validates the renamed reduce_shape_meta/3 function, correctly verifying the {handle, hash, snapshot_started} tuple structure with proper setup of both started and non-started snapshots.

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex (3)

132-140: LGTM!

The reduced SQLite page cache (512KB) is appropriate since hot-path lookups now go through the ETS cache. The comment clearly explains the negative value convention.


114-118: LGTM!

The telemetry emission provides valuable observability for pool checkout latency, enabling monitoring and debugging of potential connection pool bottlenecks per operation type.


159-180: LGTM!

The updated function signatures correctly propagate the label through to NimblePool, enabling the telemetry instrumentation in handle_checkout.

packages/sync-service/lib/electric/shape_cache/shape_status.ex (9)

32-34: LGTM!

The tuple structure {handle, hash, snapshot_started, last_read_time} and position constant are clearly documented, making the code self-explanatory.


69-75: LGTM!

The ETS insertion correctly uses insert_new to prevent race conditions and properly sequences the persistent store write before the cache write.


169-175: LGTM!

The hash validation via ETS lookup is an effective optimization, avoiding SQLite queries for shape validation on the hot path.


182-186: LGTM!

The update correctly sequences the persistent write before the cache update and properly uses update_element with the correct tuple position.


190-193: LGTM!

Reading snapshot_started? from ETS cache provides fast hot-path access while maintaining a safe default of false for missing entries.


203-205: Verify: snapshot_complete? not cached in ETS.

While snapshot_started? now reads from the ETS cache (Line 190-193), snapshot_complete? still queries SQLite directly. This asymmetry may be intentional if snapshot_complete? is called less frequently, but it creates inconsistent performance characteristics.

If snapshot_complete? is also on a hot path, consider extending the meta tuple to {handle, hash, snapshot_started, snapshot_complete, last_read_time} and caching both states.


231-267: LGTM!

The LRU algorithm using gb_trees is an efficient approach (O(n) scan with O(log k) tree operations) and correctly handles the new 4-tuple structure.


297-310: LGTM!

The population logic correctly transforms the 3-tuple from ShapeDb.reduce_shape_meta into the 4-tuple required by ETS, initializing all shapes with the same start_time for a fair LRU baseline after restart.


283-295: LGTM!

The ETS table configuration with read_concurrency: true and write_concurrency: :auto is well-suited for a cache that's primarily read-heavy with occasional writes.



@github-actions
Contributor

github-actions bot commented Jan 15, 2026

Benchmark results, triggered for 2734d

  • write fanout completed

write fanout results

  • unrelated shapes one client latency completed

unrelated shapes one client latency results

  • many shapes one client latency completed

many shapes one client latency results

  • concurrent shape creation completed

concurrent shape creation results

  • diverse shape fanout completed

diverse shape fanout results

@magnetised force-pushed the magnetised/sqlite-ets-cache branch from 2734d7b to bb41c9a on January 15, 2026 at 13:49
@balegas
Contributor

balegas commented Jan 19, 2026

Concurrent shape creation looks significantly slower. Do you think this might cause timeouts with high load?

@magnetised force-pushed the magnetised/sqlite-ets-cache branch 2 times, most recently from fee08a1 to 62d685a on January 19, 2026 at 15:32

@magnetised force-pushed the magnetised/sqlite-ets-cache branch from 62d685a to 67a2d9f on January 19, 2026 at 15:42
@magnetised
Contributor Author

benchmark this

@github-actions
Contributor

github-actions bot commented Jan 19, 2026

Benchmark results, triggered for 55864

  • write fanout completed

write fanout results

  • unrelated shapes one client latency completed

unrelated shapes one client latency results

  • many shapes one client latency completed

many shapes one client latency results

  • concurrent shape creation completed

concurrent shape creation results

  • diverse shape fanout completed

diverse shape fanout results


@magnetised
Contributor Author

benchmark this

@github-actions
Contributor

github-actions bot commented Jan 20, 2026

Benchmark results, triggered for b02c0

  • write fanout completed

write fanout results

  • unrelated shapes one client latency completed

unrelated shapes one client latency results

  • many shapes one client latency completed

many shapes one client latency results

  • concurrent shape creation completed

concurrent shape creation results

  • diverse shape fanout completed

diverse shape fanout results

@magnetised self-assigned this on January 20, 2026

@magnetised
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Jan 20, 2026

✅ Actions performed

Review triggered.


@magnetised force-pushed the magnetised/sqlite-ets-cache branch from b9b0fce to fceff29 on January 20, 2026 at 13:52
@magnetised changed the title from "Put ets cache in front of sqlite" to "Move SQLite writes to a background thread and buffer read state" on January 20, 2026
@magnetised marked this pull request as ready for review on January 20, 2026 at 14:29

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@packages/sync-service/config/runtime.exs`:
- Around line 199-206: The current config decodes JSON into atoms using
Jason.decode!(..., keys: :atoms) which can create new atoms and exhaust the BEAM
atom table; update the decode call for max_concurrent_requests to use keys:
:atoms! instead (i.e., replace Jason.decode!(json, keys: :atoms) with
Jason.decode!(json, keys: :atoms!)) so only existing atoms are used; locate the
max_concurrent_requests assignment and the Jason.decode! invocation to make this
change.
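For reference, the difference between the two decode options (illustrative snippet; `:atoms` and `:atoms!` are Jason's documented `keys` values):

```elixir
# keys: :atoms calls String.to_atom/1 on every key, creating new atoms
# on demand. Atoms are never garbage collected, so attacker-controlled
# JSON can exhaust the BEAM atom table.
Jason.decode!(~s({"max": 10}), keys: :atoms)
# => %{max: 10}

# keys: :atoms! uses String.to_existing_atom/1 instead: it only
# succeeds if the atom already exists, raising ArgumentError otherwise,
# which makes it safe for untrusted input.
Jason.decode!(~s({"max": 10}), keys: :atoms!)
```

Since the config already references the expected keys elsewhere, the atoms exist by the time runtime.exs is evaluated, so `:atoms!` is a drop-in hardening.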

In
`@packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex`:
- Around line 401-425: The batch write silently ignores non-:ok returns causing
failed DB ops to be dropped; update do_batch_write so each call to
Query.remove_shape, Query.mark_snapshot_started, and
Query.mark_snapshot_complete is enforced with pattern matching (e.g., expect :ok
and let a non-:ok return raise) similar to the existing :ok = Query.add_shape
line, so failures propagate to the rescue block instead of being ignored during
the Enum.each over entries.
- Around line 339-381: flush_until_empty currently leaves entries marked with
flushing=true when do_batch_write returns :error, preventing retries; after
detecting a non-:ok result from do_batch_write(stack_id, entries) you should
iterate the collected entries and reset their flushing flag back to false in the
ops_table so they can be retried later (use the same key shape used by
mark_and_collect_entries/ops_table entries, updating the flushing field via
:ets.update_element or a select/replace on ops_table for each {ts, _op} in
entries). Ensure this logic runs only on failure paths and references the local
variables ops_table, entries, and the flushing flag so successful paths are
unchanged.
- Around line 56-58: The ops table currently keys only by timestamp() which can
collide; add a new function op_key/0 that returns a composite monotonic
timestamp plus a unique integer (e.g., {System.monotonic_time(),
:erlang.unique_integer([:positive])}) and keep timestamp/0 if needed for
readability, then replace all uses of ts = timestamp() in add_shape,
remove_shape, queue_snapshot_started, and queue_snapshot_complete with ts =
op_key() so each op key is ordered and unique, preventing overwrites in the
:ordered_set.
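The suggested composite key could look like the following sketch (flag choice per the comment above; adding `:monotonic` is an optional extra):

```elixir
defmodule OpKeySketch do
  # Composite op key: a monotonic timestamp tie-broken by a unique
  # integer, so two ops enqueued in the same tick never overwrite each
  # other in the :ordered_set. Include :monotonic in the flags if the
  # tie-break order must also match insertion order.
  def op_key do
    {System.monotonic_time(), :erlang.unique_integer([:positive])}
  end
end
```

Because tuples compare element-wise, keys remain sorted by timestamp first, preserving the flush order the buffer relies on.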

In
`@packages/sync-service/test/electric/replication/shape_log_collector/request_batcher_test.exs`:
- Around line 117-120: Replace the fixed Process.sleep(5) with a deterministic
synchronization: wait (via assert_receive) for the RequestBatcher to confirm the
shape is fully enqueued or poll a public/readable status function on
RequestBatcher (e.g., RequestBatcher.enqueued?/2 or similar) until it returns
true before calling RequestBatcher.remove_shape(stack_id, `@shape_handle_2`);
ensure you add or use an existing test hook/message (like an :shape_enqueued
notification) from the add_shape handling so the test can reliably
assert_receive that notification instead of sleeping.
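A deterministic version of that test could look like this sketch; the `notify:` option and `:shape_enqueued` message are hypothetical hooks the batcher would need to gain, and `stack_id` / `@shape_handle_2` are assumed in scope from the existing test:

```elixir
# Instead of Process.sleep/1, have the batcher message the test pid
# once the shape is fully enqueued, then block on that message.
test "remove_shape after add_shape is deterministic", %{stack_id: stack_id} do
  RequestBatcher.add_shape(stack_id, @shape_handle_2, notify: self())

  # assert_receive waits up to the timeout and fails loudly on expiry,
  # rather than racing the batcher like a fixed sleep does
  assert_receive {:shape_enqueued, @shape_handle_2}, 1_000

  RequestBatcher.remove_shape(stack_id, @shape_handle_2)
end
```

assert_receive pins the module attribute's value in the pattern, so the test only proceeds once that specific handle is enqueued.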
🧹 Nitpick comments (2)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex (1)

123-125: Emit checkout telemetry for non-queued checkouts too.

Fast-path checkouts skip telemetry, which can skew queue-time metrics. Consider emitting the same event with queue_time_μs: 0 in this clause for consistency.

♻️ Suggested adjustment
-  def handle_checkout({:checkout, _label}, _from, conn, pool_state) do
+  def handle_checkout({:checkout, label}, _from, conn, pool_state) do
+    OpenTelemetry.execute(
+      [:electric, :shape_db, :pool, :checkout],
+      %{queue_time_μs: 0},
+      %{label: label, stack_id: Keyword.get(pool_state, :stack_id)}
+    )
     {:ok, conn, conn, pool_state}
   end

(Note the pattern must bind label rather than _label so the telemetry metadata can use it.)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex (1)

223-229: Align snapshot_started? with bitmask semantics.

The comment defines bit 0 as “started.” Using > 0 treats any nonzero state as started. Consider using a bitmask check to match the stated semantics and future-proof if snapshot_state ever becomes 2.

♻️ Suggested adjustment
-    snapshot_started? = snapshot_state > 0
+    snapshot_started? = Bitwise.band(snapshot_state, 1) == 1
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2734d7b and fceff29.

📒 Files selected for processing (15)
  • integration-tests/tests/crash-recovery.lux
  • packages/sync-service/config/runtime.exs
  • packages/sync-service/lib/electric/core_supervisor.ex
  • packages/sync-service/lib/electric/monitored_core_supervisor.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/supervisor.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex
  • packages/sync-service/lib/electric/shapes/supervisor.ex
  • packages/sync-service/lib/electric/stack_supervisor/telemetry.ex
  • packages/sync-service/test/electric/replication/shape_log_collector/request_batcher_test.exs
  • packages/sync-service/test/electric/shape_cache/shape_status/shape_db_test.exs
  • packages/sync-service/test/support/component_setup.ex
💤 Files with no reviewable changes (2)
  • packages/sync-service/lib/electric/core_supervisor.ex
  • packages/sync-service/lib/electric/shapes/supervisor.ex
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-14T14:45:05.838Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:05.838Z
Learning: Avoid old Electric patterns (bidirectional SQLite sync, `electrify()` API) – use Electric HTTP streaming with TanStack DB collections instead

Applied to files:

  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex
📚 Learning: 2026-01-15T13:54:51.778Z
Learnt from: magnetised
Repo: electric-sql/electric PR: 3716
File: packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex:106-126
Timestamp: 2026-01-15T13:54:51.778Z
Learning: In NimblePool, when a worker is immediately available, the pool skips `handle_enqueue/2` and calls `handle_checkout/4` directly with the original checkout command. When no worker is available, the request goes through `handle_enqueue/2` (where state can be transformed) before `handle_checkout/4` is called. This means both patterns must be handled in `handle_checkout/4` to support both immediate and queued checkouts.

Applied to files:

  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex
🧬 Code graph analysis (4)
packages/sync-service/lib/electric/monitored_core_supervisor.ex (1)
packages/sync-service/lib/electric/telemetry/sentry.ex (1)
  • set_tags_context (23-25)
packages/sync-service/lib/electric/shape_cache/shape_status.ex (2)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex (2)
  • add_shape (100-131)
  • mark_snapshot_started (176-183)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (1)
  • mark_snapshot_started (67-73)
packages/sync-service/test/electric/shape_cache/shape_status/shape_db_test.exs (4)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (2)
  • add_shape (205-221)
  • remove_shape (224-240)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex (7)
  • add_shape (100-131)
  • mark_snapshot_started (176-183)
  • shape_for_handle (145-149)
  • handle_for_shape (139-143)
  • mark_snapshot_complete (185-192)
  • remove_shape (194-211)
  • handle_exists? (151-156)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (2)
  • mark_snapshot_started (67-73)
  • mark_snapshot_complete (75-81)
packages/sync-service/lib/electric/shapes.ex (1)
  • mark_snapshot_started (123-127)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (3)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex (4)
  • add_shape (100-131)
  • remove_shape (194-211)
  • mark_snapshot_started (176-183)
  • mark_snapshot_complete (185-192)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex (1)
  • checkout_write! (169-181)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (2)
  • mark_snapshot_started (67-73)
  • mark_snapshot_complete (75-81)
🔇 Additional comments (26)
packages/sync-service/lib/electric/monitored_core_supervisor.ex (1)

17-17: No action needed — tweaks is guaranteed to be a keyword list by schema validation.

The codebase explicitly defines tweaks as type: :keyword_list in the schema validation (stack_supervisor.ex lines 104–110) with a default of []. This type constraint is enforced across all code paths. Additionally, core_supervisor.ex uses Keyword.fetch!(opts, :tweaks), which would fail immediately if tweaks were ever a map. All existing uses of tweaks in the codebase employ Keyword.get() or Keyword.fetch!() without any defensive logic, confirming that the type is safe and consistent.

Likely an incorrect or invalid review comment.

packages/sync-service/lib/electric/stack_supervisor/telemetry.ex (2)

43-62: Pending-write metric wiring looks good.
Adds the metric and schedules periodic reporting, improving visibility into buffered writes.


84-96: Defensive write-buffer telemetry emission looks good.
Gracefully handles the buffer not being available yet during early startup.

packages/sync-service/test/electric/shape_cache/shape_status/shape_db_test.exs (4)

176-202: Nice coverage for reduce_shape_meta + snapshot_started propagation.
Flushing before reduction makes the SQLite-backed state deterministic in the test.


223-250: Snapshot start/complete tests look good.
These updates align with the new snapshot lifecycle APIs.


273-277: Helper keeps snapshot assertions compact.
Makes the new snapshot-state checks easier to follow.


336-415: Good write-buffer behavior coverage.
The suite exercises buffered visibility, flush behavior, tombstones, and snapshot transitions well.

packages/sync-service/lib/electric/shape_cache/shape_status.ex (5)

32-61: shape_meta_table initialization looks solid.
The new meta-table lifecycle (create/populate) is clear and coherent.


69-139: Add/remove/reset keep the meta cache in sync.
The ETS updates align with the buffered write path.


157-194: Handle validation + snapshot_started caching aligns with the new tuple.
Consistent use of the meta table makes these paths straightforward.


216-263: LRU/last_read_time updates align with the new tuple layout.
The fold now correctly pulls the last_read_time element.


275-304: Meta-table helpers are clean.
Creation and population helpers look good.

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (7)

2-23: Updated ShapeDb doc + buffering wiring look good.
The module-level intent is clear.


39-86: Buffered mutation path looks consistent.
Add/remove/snapshot/reset all route through the buffer cleanly.


88-131: Buffer-first reads with tombstone checks look good.
The fallback to SQLite is clear and safe.


133-176: Merged buffered/SQLite listing with tombstone filtering looks good.
Deduping by handle makes sense here.


184-209: Reduction/count paths align with the buffer model.
The updated flow matches the new read/write semantics.


216-248: Buffer-aware handle checks and pre-validation flush are good.
This keeps validation consistent with persisted state.


256-260: Pending-buffer metric accessor is a good addition.
Useful for telemetry and load testing.

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (7)

1-62: Module structure and high-level API look good.
The doc + API layout is easy to follow.


64-69: flush_sync API is handy for tests and shutdown.
Nice addition for deterministic flushes.


73-127: Lookup helpers and tombstone gating read clearly.
The semantics are easy to reason about.


129-153: Buffered shape listing/count helpers look good.
They match the expected buffered semantics.


155-200: Relation filtering and tombstone accounting look consistent.
The logic aligns with the Query-level expectations.


258-315: Init and ETS table setup look good.
The table configuration and manual_flush_only flag are clear.


317-333: Poll/flush/terminate wiring looks good.
The periodic flush loop is straightforward.


@blacksmith-sh

This comment has been minimized.

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex`:
- Around lines 207-224: when re-adding a handle in add_shape, an earlier tombstone can still hide the new add until the next flush. Update add_shape (which resolves its tables via shapes_table_name/ops_table and inserts into ops_table) to clear any pending tombstone/remove op for that handle before queuing the {:add, ...} entry, e.g. delete matching {:remove, handle, ...} entries from ops_table (using :ets.match_delete or lookup + delete), so that immediate reads (lookup_*/has_handle?) reflect the new add.
🧹 Nitpick comments (1)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (1)

24-27: Clarify ops table key shape in docs.

Line 24 describes a single timestamp, but the key is now a composite from op_key/0. Update the doc to prevent confusion.

♻️ Suggested doc tweak
-  `{timestamp, operation, flushing}`
+  `{op_key, operation, flushing}` (where op_key is a composite monotonic key)
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fceff29 and 9af4ea1.

📒 Files selected for processing (2)
  • .changeset/gorgeous-readers-return.md
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:05.838Z
Learning: Define shapes in server/proxy – no client-defined tables/WHERE clauses
📚 Learning: 2026-01-14T14:45:05.838Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:05.838Z
Learning: Avoid old Electric patterns (bidirectional SQLite sync, `electrify()` API) – use Electric HTTP streaming with TanStack DB collections instead

Applied to files:

  • .changeset/gorgeous-readers-return.md
🧬 Code graph analysis (1)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (3)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/supervisor.ex (3)
  • name (6-8)
  • start_link (10-12)
  • init (14-52)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex (3)
  • add_shape (100-131)
  • mark_snapshot_started (176-183)
  • mark_snapshot_complete (185-192)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (2)
  • mark_snapshot_started (67-73)
  • mark_snapshot_complete (75-81)
🔇 Additional comments (4)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (3)

74-110: Lookup paths look solid.

Tombstone-first filtering and buffer lookups are clear and consistent.


269-331: Lifecycle wiring is clean.

Init, polling, and synchronous flush hooks read clearly and are easy to reason about.


342-433: Flush + cleanup flow is cohesive.

Batching, cleanup, and retry behavior are well-structured.

.changeset/gorgeous-readers-return.md (1)

1-5: Changeset summary aligns with the implementation.


## Crash electric
[shell electric]
# flush pending writes so that the shape we just created is definitely in the db
!Electric.ShapeCache.ShapeStatus.ShapeDb.WriteBuffer.flush_sync("single_stack")
Contributor

If we are first flushing the changes then we are not simulating crash recovery: the test should cover the behaviour of a crash, including unflushed write buffer changes, and if that breaks Electric we should fix it, no?

Contributor

@msfstef msfstef left a comment

Will give it a more detailed look tomorrow. Main comments: 1) many full table scans and tab2lists on the buffer, but it's not guaranteed to be small (?); 2) the crash test now doesn't actually test a crash (?) - how do we handle the in-flight requests being lost?

Will leave the Opus review below as well.

tombstones = tombstones_table_name(stack_id)

shapes_table_name(stack_id)
|> :ets.tab2list()
Contributor

I would rather avoid tab2list where possible and instead do a reduce/fold - less memory pressure and scale independent?
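As a sketch of the suggestion above, a fold can replace the tab2list materialisation. This is illustrative only: the module name and the `{handle, shape_bin, comparable, hash, relations}` tuple layout are assumptions, not the actual buffer code.

```elixir
# Hypothetical sketch: stream over the buffered shapes with :ets.foldl/3
# instead of materialising the whole table via :ets.tab2list/1.
# The 5-element row layout is an assumption for illustration.
defmodule WriteBufferFoldSketch do
  def list_buffered_shapes(shapes_table, tombstones_table) do
    :ets.foldl(
      fn {handle, _bin, _cmp, _hash, _rels} = row, acc ->
        # Skip rows hidden by a pending tombstone; :ets.member/2 is a
        # constant-time key check, not a scan.
        if :ets.member(tombstones_table, handle), do: acc, else: [row | acc]
      end,
      [],
      shapes_table
    )
  end
end
```

The accumulator holds only the surviving rows, so peak memory tracks the result size rather than the full table.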

tombstones = tombstones_table_name(stack_id)

shapes_table_name(stack_id)
|> :ets.tab2list()
Contributor

same as above - prefer reduce over tab2list

end

defp has_buffered_add?(ops_table, handle) do
:ets.match_object(ops_table, {:_, {:add, handle, :_, :_, :_, :_}, :_}) != []
Contributor

Also a full table scan (per tombstone). This is for telemetry only and again the buffer should be small, but it is a bit scary to do a full scan per tombstone entry; surely there are more efficient ways of doing this (since we're doing tab2list on the tombstones anyway), and we don't need to traverse the whole table every time. Having some sort of index might be worth it?

@doc "Look up a handle by comparable shape binary. Returns {:ok, handle} or :not_found."
def lookup_handle(stack_id, comparable_binary) do
tombstones = tombstones_table_name(stack_id)
pattern = {:_, :_, comparable_binary}
Contributor

To clarify, this is a full table scan. I guess the idea is that the buffer should be small and we can afford this, but if write intensity is high won't this be an issue? And it's on the critical path as well.

# Write functions

@doc "Add a shape to the buffer"
def add_shape(stack_id, handle, shape_binary, comparable_binary, hash, relations) do
Contributor

To clarify - there is no limit to the size of this buffer, where it would force a flush and apply back pressure forcefully, right?
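For illustration, a size-based backpressure check could look like the sketch below. The threshold and all names are made up; nothing like this exists in the PR.

```elixir
# Hypothetical bound on the write buffer: when the ops table crosses the
# threshold, the caller is told to flush synchronously before writing more,
# turning unbounded growth into backpressure on writers.
defmodule BoundedBufferSketch do
  @max_buffered_ops 10_000

  def check_pressure(ops_table) do
    # :ets.info(table, :size) reads the row count ETS already tracks,
    # so this check is cheap even on a large buffer.
    if :ets.info(ops_table, :size) >= @max_buffered_ops,
      do: :flush_now,
      else: :ok
  end
end
```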

Contributor

@msfstef msfstef left a comment

Code Review

This is a well-structured performance optimization that introduces an ETS-based write buffer to prevent SQLite timeout cascades. The latest commit addresses some initial concerns (flushing flag reset, operation key uniqueness), but a few issues remain.

🔴 Critical: Crash Recovery Test Semantics

integration-tests/tests/crash-recovery.lux:50-52

The test now explicitly flushes before crashing:

!Electric.ShapeCache.ShapeStatus.ShapeDb.WriteBuffer.flush_sync("single_stack")

This defeats the purpose of crash recovery testing. The test should verify that Electric handles unflushed writes correctly on restart (even if that means the shape doesn't exist). This is a fundamental change in crash semantics that should be addressed in the code/documentation, not papered over in tests.


🔴 Critical: Data Loss on Crash

Acknowledged in the PR description, but worth emphasizing: shapes created just before a crash may not exist after restart. This changes the durability guarantees and should be clearly documented for operators.


🟠 Moderate: TOCTOU Race in handle_for_shape

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex:72-88

case checkout!(stack_id, :handle_for_shape, &Query.handle_for_shape(&1, comparable_shape)) do
  {:ok, handle} ->
    if WriteBuffer.is_tombstoned?(stack_id, handle) do  # Race: tombstone could be added here
      :error
    else
      {:ok, handle}
    end

Between the SQLite lookup and the tombstone check, a concurrent remove_shape could tombstone the handle. The window is small but exists under high concurrency.


🟠 Moderate: Linear Scans in Buffer Lookups

write_buffer.ex:136-152 (lookup_handle) and write_buffer.ex:171-172 (has_buffered_add?) use full table scans with match_object. Under the exact high-load conditions this PR targets (many concurrent shape creations), these could become bottlenecks.

Consider a secondary index for comparable_binary -> handle lookups.
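A minimal sketch of that suggestion, assuming a plain ETS set per direction; none of these module or table names exist in the PR.

```elixir
# Hypothetical secondary index: keep a second ETS set keyed on the comparable
# binary so lookup_handle becomes a constant-time :ets.lookup instead of a
# full match_object scan. Both tables must be mutated together.
defmodule BufferIndexSketch do
  def new do
    shapes = :ets.new(:buffer_shapes, [:set, :public])
    by_cmp = :ets.new(:buffer_by_comparable, [:set, :public])
    {shapes, by_cmp}
  end

  def add({shapes, by_cmp}, handle, cmp_binary) do
    :ets.insert(shapes, {handle, cmp_binary})
    :ets.insert(by_cmp, {cmp_binary, handle})
  end

  def remove({shapes, by_cmp}, handle, cmp_binary) do
    :ets.delete(shapes, handle)
    :ets.delete(by_cmp, cmp_binary)
  end

  def lookup_handle({_shapes, by_cmp}, cmp_binary) do
    case :ets.lookup(by_cmp, cmp_binary) do
      [{^cmp_binary, handle}] -> {:ok, handle}
      [] -> :not_found
    end
  end
end
```

Since all writes already go through one GenServer, keeping the two tables in step does not need extra locking.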


🟡 Minor: Test Flakiness

request_batcher_test.exs:118

# Small delay to ensure the add_shape GenServer.call is processed before remove
Process.sleep(5)

Adding Process.sleep to fix ordering suggests there's a real race condition that could manifest under CI load.


✅ Fixed in Latest Commit

  • Flushing flag reset on error - Now properly allows retry on failed batches
  • Operation key uniqueness - Uses {monotonic_time, unique_integer} tuple to prevent collisions
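The key scheme described in that bullet can be sketched as follows (module name assumed):

```elixir
# Sketch of a collision-free operation key: monotonic time first, with a
# strictly increasing unique integer as tie-breaker, so two ops created in
# the same monotonic-time unit still compare in creation order.
defmodule OpKeySketch do
  def op_key do
    {System.monotonic_time(), System.unique_integer([:monotonic])}
  end
end
```

Because Erlang compares tuples element by element, ordering an ordered_set ETS table on this key yields flush batches in creation order.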

Summary

| Severity | Issue | Status |
| --- | --- | --- |
| 🔴 Critical | Crash recovery test doesn't test actual crash recovery | Open |
| 🔴 Critical | Data loss on crash (needs documentation) | Open |
| 🟠 Moderate | TOCTOU race in handle lookups | Open |
| 🟠 Moderate | Linear scans in buffer lookups | Open |
| 🟡 Minor | Test flakiness with Process.sleep | Open |

The crash recovery test concern is the most pressing - the current test is not validating what it claims to validate.


assert_receive :adding_shape_2
# Small delay to ensure the add_shape GenServer.call is processed before remove
Process.sleep(5)
Contributor

Why not wait for the :processor_called message here as well? After that we wait for the processor_delay set above, so the deduplication should work fine. Generally we patch the processor call entirely, so I'm unclear on why we need a sleep here.


4 participants