# High Performance IO

This guide explains how to achieve optimal performance when using `io-stream` by understanding and controlling flush behavior.

## Overview

The key to high-performance IO with `io-stream` is understanding when and how to flush your write buffer. Improper flush timing can significantly impact throughput, latency, and CPU usage. This guide helps you choose the right buffering strategy for your application.

## Why Buffering Matters

Every write to an underlying IO object (file, socket, pipe) involves a system call, which has overhead:

- **Mode switching**: Each call transfers control from user space into the kernel and back again.
- **Per-call overhead**: The kernel performs a fixed amount of work for every call, no matter how few bytes are written.
- **Network packet overhead**: For sockets, each small write may be sent as a separate packet.

Buffering solves this by accumulating data in memory and performing larger, less frequent writes. However, buffering introduces latency: data sits in memory until it is flushed.

Use buffering when you need:
- **High throughput**: Maximize data transfer rate for bulk operations.
- **Reduced CPU usage**: Minimize system call overhead when writing many small pieces.
- **Efficient network utilization**: Avoid sending many tiny packets.
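
To get a feel for the per-call overhead, the following sketch uses only Ruby's standard library (it does not use `io-stream`) to write 100,000 16-byte records to the null device: first with one `syswrite` (one system call) per record, then by accumulating records in a string and writing it out whenever it holds 64KB. The record size and count are arbitrary; timings depend on your machine and operating system, so the call counts are the part to focus on.

```ruby
RECORD = "x" * 16   # illustrative record size
COUNT = 100_000     # illustrative number of records
CHUNK = 64 * 1024   # write out the buffer once it holds 64KB

# Measure the time taken by a block using the monotonic clock.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

unbuffered_calls = 0
buffered_calls = 0

File.open(File::NULL, "w") do |null|
  # Unbuffered: one system call per record.
  unbuffered_time = elapsed do
    COUNT.times do
      null.syswrite(RECORD)
      unbuffered_calls += 1
    end
  end

  # Buffered: accumulate records in memory, one system call per 64KB.
  buffered_time = elapsed do
    buffer = +""
    COUNT.times do
      buffer << RECORD
      next if buffer.bytesize < CHUNK

      null.syswrite(buffer)
      buffered_calls += 1
      buffer.clear
    end

    # Write out whatever is left, as closing a buffered stream would.
    unless buffer.empty?
      null.syswrite(buffer)
      buffered_calls += 1
    end
  end

  puts format("unbuffered: %6d calls, %.3fs", unbuffered_calls, unbuffered_time)
  puts format("buffered:   %6d calls, %.3fs", buffered_calls, buffered_time)
end
```

On this input the buffered loop makes 25 system calls instead of 100,000.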

## The Flush/Throughput Tradeoff

There's a fundamental tradeoff between responsiveness and throughput:

```mermaid
graph LR
    A[Immediate Flush] -->|Low Latency| B[Responsive]
    A -->|Many System Calls| C[Lower Throughput]
    D[Delayed Flush] -->|Higher Latency| E[Buffered]
    D -->|Fewer System Calls| F[Higher Throughput]
```

**Immediate flushing** (after every write):
- ✅ Data is sent immediately, so latency is low.
- ✅ Simple mental model with predictable behavior.
- ❌ High system call overhead.
- ❌ Lower maximum throughput.
- ❌ More CPU usage.
- ❌ Network inefficiency (many small packets).

**Buffered flushing** (accumulate before sending):
- ✅ Fewer system calls, so throughput is higher.
- ✅ Better CPU efficiency.
- ✅ More efficient network packet utilization.
- ❌ Data is delayed, so latency is higher.
- ❌ Requires careful flush management.

## Automatic Flush Behavior

`io-stream` automatically flushes in these situations:

~~~ ruby
# 1. Buffer reaches minimum_write_size (default: 64KB)
stream.write("x" * 65536) # Automatically flushes

# 2. Using puts() always flushes
stream.puts("This is flushed immediately")

# 3. Closing the stream
stream.close # Flushes any remaining data
~~~
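
To make these rules concrete, here is a deliberately simplified model of the three triggers. `ToyStream` is a hypothetical class written for this guide, not `io-stream`'s implementation, and it ignores details such as encodings and error handling. It writes into a `StringIO` so you can see exactly when data reaches the underlying IO.

```ruby
require "stringio"

# Hypothetical, simplified model of the three automatic flush triggers.
# It is NOT io-stream's implementation; it only mirrors the rules listed above.
class ToyStream
  attr_reader :flushes

  def initialize(io, minimum_write_size: 64 * 1024)
    @io = io
    @minimum_write_size = minimum_write_size
    @buffer = +""
    @flushes = 0
  end

  def write(string)
    @buffer << string
    # Trigger 1: the buffer has reached minimum_write_size.
    flush if @buffer.bytesize >= @minimum_write_size
    string.bytesize
  end

  def puts(line)
    @buffer << line << "\n"
    # Trigger 2: puts always flushes.
    flush
  end

  def flush
    return if @buffer.empty?

    @io.write(@buffer)
    @buffer.clear
    @flushes += 1
  end

  def close
    # Trigger 3: closing writes out anything still buffered.
    flush
    @io.close
  end
end

output = +""
stream = ToyStream.new(StringIO.new(output), minimum_write_size: 8)

stream.write("abc")    # 3 bytes buffered: below the threshold, nothing written
stream.write("defgh")  # 8 bytes buffered: threshold reached, "abcdefgh" written
stream.write("ij")     # buffered again
stream.puts("klm")     # puts flushes "ijklm\n"
stream.write("z")      # buffered
before_close = output.dup
stream.close           # close flushes "z"
```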

## Choosing Your Flush Strategy

### Strategy 1: Let Automatic Flushing Handle It

Best for: Bulk data transfer, file processing, log writing.

~~~ ruby
require 'io/stream'

# Default behavior: automatic flush at 64KB.
stream = IO::Stream::Buffered.open("large_file.dat", "w")

begin
  # Write lots of data; each time the buffer reaches 64KB it is flushed.
  1000.times do |i|
    stream.write("Record #{i}\n" * 1000)
  end
ensure
  # Final flush on close, even if an exception is raised above.
  stream.close
end
~~~

**When to use:**
- Writing large amounts of data continuously.
- Throughput is more important than latency.
- You don't need interactive feedback.

### Strategy 2: Manual Flush at Logical Boundaries

Best for: Request/response protocols, transaction processing, structured logging.

~~~ ruby
require 'io/stream'
require 'socket'

socket = TCPSocket.new("example.com", 80)
stream = IO::Stream(socket)

# Build the complete HTTP request in the write buffer:
stream.write("GET / HTTP/1.1\r\n")
stream.write("Host: example.com\r\n")
stream.write("Connection: close\r\n")
stream.write("\r\n")

# Flush once the request is complete, so it is written out together:
stream.flush
~~~

**When to use:**
- Message-based protocols (HTTP, Redis, etc.).
- You need to send complete "units" of data.
- Each logical message should be written out together rather than piece by piece.
- You want a balance between throughput and responsiveness.

### Strategy 3: Immediate Flush for Interactive Applications

Best for: Chat applications, streaming responses, real-time dashboards.

~~~ ruby
require 'io/stream'

# Assumes `socket` is an already-connected socket and `message` is a String.

# Use a smaller buffer for more frequent automatic flushes:
stream = IO::Stream::Buffered.new(
  socket,
  minimum_write_size: 512 # Smaller buffer = more responsive.
)

# Or flush after every message:
stream.write(message)
stream.flush # Hand the message to the OS immediately.
~~~

**When to use:**
- Real-time user interaction is required.
- Low latency is critical.
- Data arrives in small, discrete chunks.

### Strategy 4: Time-Based Flushing

Best for: Streaming data, progress updates, monitoring.

~~~ ruby
require 'io/stream'

stream = IO::Stream::Buffered.open("stream.log", "w")
# Use the monotonic clock so wall-clock adjustments can't skew the interval:
last_flush = Process.clock_gettime(Process::CLOCK_MONOTONIC)

begin
  loop do
    # `generate_log_entry` is a placeholder for your own code that produces data.
    stream.write(generate_log_entry)

    # Flush if more than a second has passed since the last flush; larger
    # bursts are still flushed automatically once they reach minimum_write_size.
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    if now - last_flush > 1.0
      stream.flush
      last_flush = now
    end
  end
ensure
  stream.close
end
~~~

**When to use:**
- Ensuring regular progress visibility.
- Limiting how much data is lost if the process crashes. Flushing hands data to the operating system; surviving a power failure additionally requires syncing the underlying file to disk (for example with Ruby's `IO#fsync`).
- Streaming applications with real-time monitoring.

### Strategy 5: Readiness-Based Flushing

Best for: Interactive protocols, terminal applications, chat servers.

~~~ ruby
require 'io/stream'

# Assumes `socket` is a connected socket and `queue` is a Thread::Queue that
# other threads push outgoing chunks onto.
stream = IO::Stream::Buffered.new(socket, minimum_write_size: 1024)

loop do
  # Blocking read from a queue of messages to send:
  chunk = queue.pop
  stream.write(chunk)

  if queue.empty?
    # Flush when we are likely to block on the queue:
    stream.flush
  end
end
~~~

**When to use:**
- You have unpredictable message arrival patterns.
- You want to keep latency low while still benefiting from buffering when messages arrive in bursts.
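
The effect of the `queue.empty?` check can be seen without a socket. This sketch replaces the stream with a hypothetical `RecordingStream` that records each flushed batch, and wraps the loop in a hypothetical `drain` helper that uses a closed `Queue` so it terminates; neither is part of `io-stream`.

```ruby
# Hypothetical stand-in for a buffered stream: instead of writing to a socket,
# it records each flushed batch so we can count what the peer would receive.
class RecordingStream
  attr_reader :batches

  def initialize
    @buffer = []
    @batches = []
  end

  def write(chunk)
    @buffer << chunk
  end

  def flush
    return if @buffer.empty?

    @batches << @buffer.join
    @buffer.clear
  end
end

# The loop from above, except that it stops once the queue is closed and empty
# (Queue#pop returns nil in that case).
def drain(queue, stream)
  while (chunk = queue.pop)
    stream.write(chunk)
    # Flush only when no more data is immediately available:
    stream.flush if queue.empty?
  end
  stream.flush
end

# A burst: five messages are already waiting when the writer wakes up.
queue = Queue.new
%w[a b c d e].each { |message| queue << message }
queue.close

stream = RecordingStream.new
drain(queue, stream)
p stream.batches
```

The five queued messages leave as a single batch, `"abcde"`. If each message instead arrives while the queue is empty, `queue.empty?` is true after every `write`, so the same loop flushes once per message and latency stays low.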

## Buffer Size Configuration

The `minimum_write_size` parameter controls when automatic flushing occurs:

~~~ ruby
# Very small buffer: more responsive, lower throughput.
stream = IO::Stream::Buffered.new(io, minimum_write_size: 1024)

# Default: balanced (64KB).
stream = IO::Stream::Buffered.new(io)

# Large buffer: maximum throughput, higher latency.
stream = IO::Stream::Buffered.new(io, minimum_write_size: 512 * 1024)
~~~
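
To see how `minimum_write_size` trades throughput against latency, here is a back-of-the-envelope model under stated assumptions: every `write` appends one 100-byte record, the buffer is written out as soon as it holds at least `minimum_write_size` bytes, and one final write happens on close. The record size, the record count, and the trickle rate of 10 records per second are arbitrary illustrative values, not measurements of `io-stream`.

```ruby
RECORD_SIZE = 100        # bytes per write call (illustrative assumption)
RECORDS = 1_000_000      # total records written (illustrative assumption)
TRICKLE_RATE = 10        # records per second under light traffic (illustrative)

# Records needed before the buffer reaches the threshold (ceiling division).
def records_per_flush(minimum_write_size)
  [(minimum_write_size + RECORD_SIZE - 1) / RECORD_SIZE, 1].max
end

# Underlying writes: one per full buffer, plus one on close for any remainder.
def write_calls(minimum_write_size)
  full, remainder = RECORDS.divmod(records_per_flush(minimum_write_size))
  full + (remainder.positive? ? 1 : 0)
end

# Longest wait for the first record in a buffer if nobody calls flush.
def threshold_delay(minimum_write_size)
  (records_per_flush(minimum_write_size) - 1).to_f / TRICKLE_RATE
end

[1024, 64 * 1024, 512 * 1024].each do |size|
  puts format("%7d bytes: %6d writes, first record waits up to %.1fs",
              size, write_calls(size), threshold_delay(size))
end
```

Under these assumptions the default 64KB buffer turns 1,000,000 writes into 1,525 underlying writes, but when traffic trickles in at 10 records per second the first record in a buffer can wait 65.5 seconds for an automatic flush. This is why the interactive strategies above call `flush` explicitly instead of relying on the threshold alone.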

### Choosing Buffer Size

**Small buffers (1-8KB):**
- Interactive protocols (terminal, chat).
- Real-time data visualization.
- Tradeoff: lower throughput.

**Medium buffers (8-64KB):**
- Web servers (the default is a good fit).
- Application servers.
- Database connections.
- Balance of throughput and responsiveness.

**Large buffers (64KB-1MB):**
- File processing.
- Bulk data transfer.
- Video encoding.
- Logging systems.
- Suitable only for latency-insensitive applications.