Skip to content

Add Server.readlines_to_eof_timed for bulk trace-stream drain (#70)#71

Open
fdcastel wants to merge 1 commit into
FirebirdSQL:masterfrom
fdcastel:feat/readlines-to-eof-timed
Open

Add Server.readlines_to_eof_timed for bulk trace-stream drain (#70)#71
fdcastel wants to merge 1 commit into
FirebirdSQL:masterfrom
fdcastel:feat/readlines-to-eof-timed

Conversation

@fdcastel
Copy link
Copy Markdown
Member

Resolves #70.

What

Adds Server.readlines_to_eof_timed(timeout) — a bulk-drain counterpart to readline_timed that uses the isc_info_svc_to_eof request mode (the same one Firebird's fbtracemgr uses internally). Returns the multi-line chunk delivered in one services-API round-trip, distinguishes end-of-service (isc_info_end) from transient idleness (isc_info_svc_timeout / isc_info_data_not_ready / isc_info_truncated), and lets callers drain a busy trace stream at fbtracemgr-equivalent rates.

Why

readline_timed sends bytes([SrvInfoCode.LINE]) and gets one line per call. Under sustained trace-stream load a Python consumer is throttled to roughly 480 events/sec (10k lines/sec × ~21 lines per event of typical OLTP trace output). fbtracemgr (which sends isc_info_svc_to_eof) reaches roughly 3000 events/sec on the same workload; the gap is purely the request-mode round-trip overhead. When the Python reader falls behind, events accumulate in Firebird's server-side trace buffer until the engine discards them — and the consumer never knows.

Concrete numbers from a matched-workload comparison (Firebird 5.0.3 on Windows; HammerDB TPC-C 200-call transaction mix against a pre-loaded 1-warehouse schema; identical trace.conf for both readers):

Reader Request mode START_TX captured Total events Window
fbtracemgr isc_info_svc_to_eof 201 10 170 3.3 s
Python via readline_timed isc_info_svc_line 58 2 605 1.1 s
Python via the new readlines_to_eof_timed isc_info_svc_to_eof 201 10 226 3.4 s

Same Python process, same python3-driver connection — the only delta between rows 2 and 3 is the request item. Full investigation write-up in #70.

Design choices

  1. Additive, not destructive. readline_timed and its one-line semantics stay exactly as-is. Server.readline, stdin handling, and any third-party callers depending on per-line behavior keep their current behavior. The new method is opt-in for consumers (like trace-stream readers) that need bulk throughput.

  2. Trailing-tag-aware EOF. The Firebird engine (src/jrd/svc.cpp) emits one of four trailing indicators after a to_eof response:

    • isc_info_svc_timeout (64) — server timed out, session live.
    • isc_info_data_not_ready (4) — no payload available, session live.
    • isc_info_truncated (2) — buffer truncated, more data on next call.
    • isc_info_end (1, implicit) — service has finished.

    readline_timed only checks for isc_info_svc_timeout and conflates the other three into None/EOF. The new method explicitly checks for isc_info_end as the EOF signal and treats everything else as "session live, may have more data". This was the second half of the issue — a transient idle round was prematurely terminating long-lived consumers.

  3. Chunk delivery, not line splitting. readlines_to_eof_timed returns the server's multi-line buffer verbatim. Per-line consumers split with splitlines(keepends=True). The driver doesn't dictate line semantics on a path designed for trace-stream consumers who often want the raw bytes anyway (to dump verbatim to a file, the way fbtracemgr does).

Tests

Two new integration tests in tests/test_server.py, matching the existing service-test style:

I have not added a unit test that mocks the services connection because the existing tests are all integration-style against a live Firebird; happy to add mock-driven tests if the project prefers that direction.

Compatibility

  • Public API addition only; no signature changes.
  • Server.readline (which internally calls readline_timed via _read_output) keeps the same behavior. No callers were modified.
  • Tested against the existing test_server.py suite shape; no other tests were modified.

Notes

Section heading in CHANGELOG.md added as [Unreleased] to be retitled when the maintainers tag the next release.

Happy to iterate on naming (readlines_to_eof_timed vs read_to_eof_timed vs something else), splitting semantics, or test coverage. I am also fine reshaping this as a replacement for readline_timed rather than an addition if the maintainers prefer that route, though it would be a breaking change for any callers relying on the one-line semantics.

…rdSQL#70)

`readline_timed` requests one line per services-API round-trip
(`isc_info_svc_line`). Under sustained trace-stream throughput the
per-call overhead caps a Python consumer at roughly 480 trace
events/sec, while Firebird's own `fbtracemgr` reaches roughly 3000
trace events/sec because it uses `isc_info_svc_to_eof` (drain many
lines per round-trip). The slow-drain path causes server-side trace
buffer overruns and event loss — invisible to the client.

This commit adds `readlines_to_eof_timed(timeout)` as a parallel
method, mirroring fbtracemgr's request shape:

- Sends `bytes([SrvInfoCode.TO_EOF])` as the request item.
- Inspects the trailing indicator after the sized-string payload:
    isc_info_end (1)            -> end of service, return None
    isc_info_svc_timeout (64)   -> transient, return TIMEOUT
    isc_info_data_not_ready (4) -> transient, return data or TIMEOUT
    isc_info_truncated (2)      -> more data coming, return data
  This explicitly distinguishes "service finished" from "no data
  right now". `readline_timed` only checks for `isc_info_svc_timeout`
  and treats every other empty payload as `None` (EOF), which
  prematurely terminates long-running consumers on transient idle
  rounds.
- Returns the multi-line chunk verbatim; callers split with
  `.splitlines(keepends=True)`.

`readline_timed` is left unchanged because `Server.readline`,
stdin/STDOUT handlers, and any third-party callers depend on its
one-line semantics. The new method is opt-in for consumers that need
fbtracemgr-equivalent throughput.

Tests:
- `test_readlines_to_eof_timed_drain` — runs `database.get_statistics`
  twice (once via `readlines()`, once via the new method) and asserts
  identical normalised output, confirming the bulk path captures the
  full service response.
- `test_readlines_to_eof_timed_returns_timeout_when_idle` — starts a
  trace session against an idle database; the very first read MUST
  return `TIMEOUT`, not `None`. Pins the "transient idle round != EOF"
  contract that the bug fix depends on.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

readline_timed drops trace stream events under load (uses isc_info_svc_line per-call instead of isc_info_svc_to_eof bulk drain)

1 participant