Summary
`Server.readline_timed` (the API documented for consuming a Firebird trace stream from Python) drops the majority of trace events under sustained load. The cause is the request mode it sends to the services API: `isc_info_svc_line` (= `SrvInfoCode.LINE`, value 62), which requests one line per round-trip. Under realistic OLTP throughput the per-call overhead caps the consumer at roughly 480 trace events per second, while Firebird's `fbtracemgr.exe` (the reference C++ consumer the engine ships) keeps up with the full stream because it uses `isc_info_svc_to_eof` (= `SrvInfoCode.TO_EOF`, value 63) and drains many lines per round-trip.
I have evidence from a five-experiment baseline (matched workload, same `python3-driver` build, only the request mode varying) showing the gap is purely the request-mode choice: a Python reader using `_svc.query(..., bytes([SrvInfoCode.TO_EOF]), ...)` against the same trace session captures exactly as many events as `fbtracemgr` does.
Reproduction
- Start a Firebird 5.0.x server with a workload that emits trace at a non-trivial rate. I used HammerDB's TPC-C driver (200-call transaction mix run via `tclsh` from https://github.com/fdcastel/HammerDB/tree/feature/firebird-support), but the same effect should appear with `gbak`, a sysbench-style workload, or even a tight loop of `EXECUTE_STATEMENT_*` events.
- Open a long-running trace session via `python3-driver`:

  ```python
  import sys
  import firebird.driver as fb

  conn = fb.connect_server(server="localhost/3050", user="SYSDBA", password="masterkey")
  # EVERYTHING_CONFIG is the trace configuration text (all event classes enabled);
  # see "Quantitative evidence" below.
  session_id = conn.trace.start(config=EVERYTHING_CONFIG, name="repro")
  while True:
      line = conn.readline_timed(1)
      if line is None:
          break
      sys.stdout.write(line)
  ```
- Run the same trace.conf against the same workload using `fbtracemgr -se localhost:service_mgr -user SYSDBA -password masterkey -start -name repro2 -config trace.conf > fbtracemgr.out`.
- Count `START_TRANSACTION` (or any frequent event type) in each capture; a minimal counting sketch follows this list.

Expected: identical counts. Observed: `fbtracemgr` captures roughly 6× more events on a run-only workload, and ~17× more when the trace session is also observing a heavy bulk-load phase.
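For the counting step, any line-count will do; a minimal self-contained sketch (file names are placeholders for whatever the two consumers wrote):

```python
# Count occurrences of an event marker in each capture file.
# File names are placeholders; substitute the actual output paths from the repro.
def count_events(path: str, marker: str = "START_TRANSACTION") -> int:
    with open(path, encoding="utf-8", errors="replace") as f:
        return sum(marker in line for line in f)

for capture in ("python_reader.out", "fbtracemgr.out"):
    print(capture, count_events(capture))
```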
Quantitative evidence
Five experiments under matched conditions (Firebird 5.0.3 on Windows Server 2022; HammerDB TPC-C 200-call transaction mix against a pre-loaded 1-warehouse schema; trace config = `EVERYTHING_CONFIG` mirrored from the project's defaults):
| Experiment | Consumer | Request mode | START_TX captured | Total events | Trace bytes |
|---|---|---|---|---|---|
| E1 | `fbtracemgr` | `isc_info_svc_to_eof` | 756 (load+run) | 1 209 620 | 1.34 GB |
| E2 | `fbtracemgr` | `isc_info_svc_to_eof` | 201 (run only) | 10 170 | 8.1 MB |
| E3 | `python3-driver` via `Server.readline_timed` | `isc_info_svc_line` | 58 (run only) | 2 605 | 2.1 MB |
| E4 | `python3-driver` bare `readline_timed` loop | `isc_info_svc_line` | 58 (run only) | 2 977 | 2.4 MB |
| E5 | `python3-driver` low-level `_svc.query(..., TO_EOF, ...)` | `isc_info_svc_to_eof` | 201 (run only) | 10 226 | 8.4 MB |
E5 is the same Python process, same connection class, and the same Firebird session; only the SPB request item differs. It captures the full stream identically to `fbtracemgr`. Per-call verbose instrumentation on E3/E4 confirmed the Python reader is not throughput-bound on its own: it consistently processes incoming lines at ~10k lines/sec. The events it misses are dropped server-side by Firebird's trace buffer, because LINE-mode round-trips cannot drain it fast enough; events accumulate until the server discards them to protect itself.
The harness produced raw traces, per-call timeline logs, and response-tag histograms. I can share specific bytes or a self-contained reproducer if helpful.
Root cause (single line)
`Server.readline_timed` sends `bytes([SrvInfoCode.LINE])` as the request item. Compare with Firebird's own reference trace consumer at `src/utilities/fbtracemgr/traceMgrMain.cpp:216`: `const char query[] = {isc_info_svc_to_eof, isc_info_end};`. Server-side handling for the two request items goes through the same `get()` function in `src/jrd/svc.cpp`, but with different flags (`GET_LINE` vs `GET_EOF`), and the bulk-drain path is the only one fast enough for a busy trace stream.
A secondary issue lives in the same method: when the server returns `isc_info_data_not_ready` (transient buffer-empty, but session still active), `readline_timed` returns `None`. Callers conventionally treat `None` as EOF and tear down the iterator, which means a brief idle moment can prematurely terminate trace capture. The behavior is consistent with the current code:

```python
data = self.response.read_sized_string(encoding=self.encoding, errors=self.encoding_errors)
if self.response.get_tag() == SrvInfoCode.TIMEOUT:
    return TIMEOUT
return data if data else None
```
`SrvInfoCode.TIMEOUT` is checked explicitly, but `isc_info_data_not_ready` (value 4) is not; both should signal "no data right now, session still alive" rather than EOF. `fbtracemgr`'s loop treats both indicators identically: it keeps polling until either Ctrl-C or a genuine end-of-service (which the server signals by ending the response with `isc_info_end` and no trailing indicator).
Proposed fix
Adding a new public method is the least disruptive shape: existing `readline_timed` callers (notably `Server.readline`, stdin handling, and any third-party code relying on per-line semantics) keep their current behavior.
```python
def readlines_to_eof_timed(self, timeout: int) -> str | Sentinel | None:
    """Drain available trace/services text from the server in one round-trip.

    Returns the (possibly multi-line) chunk, `TIMEOUT` when the server-side
    timeout elapsed with no payload, or `None` when the service has reported
    `isc_info_end` (end of stream).

    Distinct from `readline_timed` in that:

    - the request item is `isc_info_svc_to_eof`, not `isc_info_svc_line`;
      the server returns up to MAXBUF bytes per call instead of one line.
    - the trailing indicator is inspected: `isc_info_svc_timeout` and
      `isc_info_data_not_ready` both return `TIMEOUT` (idle, keep going);
      `isc_info_truncated` is silently absorbed (more data coming);
      only `isc_info_end` (no trailing indicator) returns `None`.

    Designed for trace-session consumers that need fbtracemgr-equivalent
    throughput; callers that need one-line semantics should keep using
    `readline_timed`.
    """
    self.response.clear()
    self._svc.query(self._make_request(timeout),
                    bytes([SrvInfoCode.TO_EOF]),
                    self.response.raw)
    if (tag := self.response.get_tag()) != SrvInfoCode.TO_EOF:
        raise InterfaceError(f"Service responded with error code: {tag}")
    data = self.response.read_sized_string(encoding=self.encoding,
                                           errors=self.encoding_errors)
    trailing = self.response.get_tag()
    if trailing == ISC_INFO_END:  # value 1
        return data if data else None
    if data:
        return data
    return TIMEOUT
```
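To make the calling convention concrete, here is a sketch of a trace consumer's loop against this method (hypothetical caller code; `conn` is the `Server` instance and `TIMEOUT` the driver's timeout sentinel from the snippets above):

```python
import sys

# Hypothetical consumer loop for the proposed method (not an existing driver API):
# None means the service reported end-of-stream; TIMEOUT means "idle, keep polling".
while True:
    chunk = conn.readlines_to_eof_timed(1)
    if chunk is None:
        break
    if chunk is TIMEOUT:
        continue
    sys.stdout.write(chunk)
```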
Server.readline could optionally call this internally and split the chunk into per-line yields, preserving its existing signature while picking up the throughput improvement automatically; the per-line buffering pattern would resemble the line-buffer in the existing Server.readline flow.
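As an illustration of that buffering pattern, a minimal sketch only (not the driver's actual internals; it assumes the proposed method above and yields complete lines while holding back any trailing partial line):

```python
# Sketch of per-line splitting on top of the proposed bulk read.
# `server` is a Server instance; TIMEOUT is the driver's timeout sentinel.
def iter_trace_lines(server, timeout: int = 1):
    buffer = ""
    while True:
        chunk = server.readlines_to_eof_timed(timeout)
        if chunk is None:            # end of service stream
            if buffer:
                yield buffer         # flush a final unterminated line
            return
        if chunk is TIMEOUT:         # idle; keep any partial line buffered
            continue
        buffer += chunk
        *complete, buffer = buffer.split("\n")
        for line in complete:
            yield line + "\n"
```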
Tests should cover at least:
- a busy mock services connection where `_svc.query` returns multi-line responses with the various trailing indicators
- the `isc_info_end` termination case (final round with no payload, no trailing indicator)
- the `isc_info_data_not_ready` idle case (does NOT return `None`); a fake-object sketch of that case follows below
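As a sketch of how the idle case could be unit-tested without a live server (hand-rolled fakes in pytest style; `FakeResponse` and the unbound-method call are illustrative assumptions, and the driver names `Server`, `SrvInfoCode`, `TIMEOUT` are assumed importable):

```python
import types

class FakeResponse:
    """Replays a scripted tag sequence plus one payload string (test double only)."""
    def __init__(self, payload, trailing_tag):
        self._tags = [SrvInfoCode.TO_EOF, trailing_tag]
        self._payload = payload
        self.raw = bytearray()
    def clear(self): pass
    def get_tag(self): return self._tags.pop(0)
    def read_sized_string(self, **_): return self._payload

def test_data_not_ready_is_not_eof():
    # Empty payload with trailing isc_info_data_not_ready (value 4) must keep
    # polling, i.e. return TIMEOUT rather than None.
    fake_server = types.SimpleNamespace(
        response=FakeResponse("", 4),
        _svc=types.SimpleNamespace(query=lambda *args: None),
        _make_request=lambda timeout: b"",
        encoding="utf-8",
        encoding_errors="strict",
    )
    assert Server.readlines_to_eof_timed(fake_server, 1) is TIMEOUT
```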
I'm happy to send a PR with the implementation and tests if the maintainers prefer that route.
Context
I hit this while diagnosing low event-capture rates in a long-running Python trace consumer on a busy multi-host deployment. The five-experiment baseline above pinned the root cause to the request-mode choice. As a workaround I call `_svc.query` directly with the `TO_EOF` request item (private-API access in user code; a rough sketch follows) and the throughput gap closes: the same Python process now captures the full trace stream identically to `fbtracemgr`. I would much rather depend on a public driver method than maintain the workaround.
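For completeness, the workaround amounts to running the body of the proposed method from user code against the private attributes; a minimal sketch, assuming those attribute names (taken from the proposed-method body above) are stable across driver versions:

```python
# Workaround in user code (private API, fragile by design): request TO_EOF directly.
# Attribute names mirror the proposed-method body above; none of this is public API.
conn.response.clear()
conn._svc.query(conn._make_request(1),          # 1-second server-side timeout
                bytes([SrvInfoCode.TO_EOF]),
                conn.response.raw)
if conn.response.get_tag() == SrvInfoCode.TO_EOF:
    chunk = conn.response.read_sized_string(encoding=conn.encoding,
                                            errors=conn.encoding_errors)
```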
Happy to provide more diagnostics, sample captures, or a draft PR — whichever shape the maintainers find easiest to move forward.
Thanks for the work on this library.