Thread-safety: data race in ImagingDefaultArena block cache (memory_get_block / memory_return_block) #9600

@ctkhanhly

Description

What did you do?

Used asyncio.to_thread() / ThreadPoolExecutor to offload PIL Image operations (create, load, resize, encode) to multiple threads concurrently.

What did you expect to happen?

Thread-safe access to PIL Image allocation/deallocation, or documentation stating that multi-threaded usage requires set_blocks_max(0).

What actually happened?

ImagingDefaultArena in src/libImaging/Storage.c uses shared mutable state (blocks_cached, blocks_pool) without any synchronization. When multiple threads call memory_get_block / memory_return_block concurrently (which happens when PIL's GIL-releasing C operations run in parallel), data races can occur:

// memory_get_block (line 310) — no lock
if (arena->blocks_cached > 0) {
    arena->blocks_cached -= 1;                          // ← read-modify-write, no lock
    block = arena->blocks_pool[arena->blocks_cached];   // ← concurrent read, no lock
}

// memory_return_block (line 347) — no lock
if (arena->blocks_cached < arena->blocks_max) {
    arena->blocks_pool[arena->blocks_cached] = block;   // ← concurrent write, no lock
    arena->blocks_cached += 1;                          // ← read-modify-write, no lock
} else {
    free(block.ptr);
}

Race scenario (two threads returning blocks simultaneously):

Thread A: reads blocks_cached = 19 (< blocks_max 20)
Thread B: reads blocks_cached = 19 (same stale value — no lock)
Thread A: writes blocks_pool[19] = block_A, increments to 20
Thread B: writes blocks_pool[19] = block_B (OVERWRITES block_A)

Result: block_A.ptr is permanently lost — malloc'd, never free'd.

Similarly, memory_get_block can hand the same cached block to two threads simultaneously (use-after-free / double-free risk).

The race window is narrow under the GIL (requires concurrent C-level execution during GIL-released operations like resize/encode), making it hard to reproduce in simple tests. Under sustained high-concurrency production workloads with longer GIL-released operations, the race triggers more frequently.
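The lost-update interleaving described above can be reproduced deterministically in a pure-Python model of the cache (a sketch of the race mechanics only, not Pillow's actual C code): a barrier forces both threads to read the stale `blocks_cached` value before either one writes, so the second writer always overwrites the first and exactly one block is lost.

```python
import threading

class ArenaModel:
    """Pure-Python stand-in for ImagingDefaultArena's block cache."""
    def __init__(self, blocks_max=20):
        self.blocks_max = blocks_max
        self.blocks_pool = [None] * blocks_max
        self.blocks_cached = 19          # 19 blocks already cached

def return_block_unsynchronized(arena, block, barrier):
    idx = arena.blocks_cached            # both threads read the stale value 19
    barrier.wait()                       # force both reads before either write
    if idx < arena.blocks_max:
        arena.blocks_pool[idx] = block   # second writer overwrites the first
        arena.blocks_cached = idx + 1

def demonstrate_race():
    arena = ArenaModel()
    barrier = threading.Barrier(2)
    threads = [
        threading.Thread(target=return_block_unsynchronized,
                         args=(arena, name, barrier))
        for name in ("block_A", "block_B")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Both blocks were "returned", but only one survives in the pool.
    survivors = {"block_A", "block_B"} & set(arena.blocks_pool)
    return {"block_A", "block_B"} - survivors   # the leaked block(s)

print("leaked:", demonstrate_race())
```

In the real code the barrier is replaced by scheduler timing, which is why the leak is probabilistic there; the model just pins the unlucky interleaving in place.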

Workaround: Image.core.set_blocks_max(0) disables the cache entirely, eliminating the shared mutable state.
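A minimal way to apply the workaround is to call `set_blocks_max(0)` once at startup, before any worker threads touch PIL (the `disable_block_cache` helper name is illustrative, not part of Pillow's API):

```python
def disable_block_cache():
    """Disable Pillow's shared block cache so every allocation goes
    straight to malloc/free, removing the unsynchronized pool."""
    try:
        from PIL import Image
    except ImportError:
        return "pillow-missing"          # nothing to configure
    Image.core.set_blocks_max(0)
    return "disabled"

# Call this once, before spawning the ThreadPoolExecutor.
disable_block_cache()
```

The trade-off is that every allocation now pays the full malloc/free cost instead of reusing cached blocks.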

What are your OS, Python and Pillow versions?

  • OS: Linux x86_64
  • Python: 3.12
  • Pillow: 10.4.0

Reproduction script:
import asyncio
from concurrent.futures import ThreadPoolExecutor
from PIL import Image

# This triggers concurrent access to ImagingDefaultArena from multiple threads.
# The race is probabilistic and more likely under sustained production load
# with longer GIL-released operations (resize, encode).
async def main():
    loop = asyncio.get_running_loop()  # get_event_loop() is deprecated inside coroutines
    loop.set_default_executor(ThreadPoolExecutor(max_workers=20))

    resolutions = [(512, 512), (1024, 1024), (1536, 1536), (2048, 2048)]

    async def process(res):
        def _work():
            img = Image.new("RGB", res)
            img.load()
            img.resize((res[0] // 2, res[1] // 2))  # GIL-released C operation
        await asyncio.to_thread(_work)

    for _ in range(1000):
        await asyncio.gather(*[process(resolutions[i % 4]) for i in range(20)])

asyncio.run(main())

Additional context:

PIL's C extensions release the GIL during image operations, enabling true parallelism in thread pools. However, ImagingDefaultArena is a process-global struct with no mutex or atomic operations protecting its state. When asyncio.to_thread dispatches PIL work to a ThreadPoolExecutor, multiple threads can call memory_get_block / memory_return_block concurrently during the GIL-released portions of operations like Image.resize() or Image.save().

This is particularly relevant for:

  • Python's increasing use of asyncio.to_thread for GIL-releasing C extensions
  • The upcoming free-threaded CPython (PEP 703), which will remove the GIL entirely and make this race trivially triggerable

Suggested fixes:

  1. Add a pthread_mutex around blocks_cached / blocks_pool access in memory_get_block and memory_return_block
  2. Or use per-thread arenas (eliminates contention)
  3. Or document that the block cache is not thread-safe and recommend set_blocks_max(0) for multi-threaded usage
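Suggested fix 1 can be sketched in the same pure-Python model (an assumption about shape only; the real patch would take a `pthread_mutex` in Storage.c): every read-modify-write of `blocks_cached` / `blocks_pool` happens under one lock, so concurrent returns can never overwrite the same pool slot and no block is silently lost.

```python
import threading

class LockedArena:
    """Python model of the block cache with the proposed mutex."""
    def __init__(self, blocks_max=20):
        self.blocks_max = blocks_max
        self.blocks_pool = []
        self.lock = threading.Lock()     # stands in for the pthread_mutex

    def return_block(self, block):
        with self.lock:                  # read-modify-write is now atomic
            if len(self.blocks_pool) < self.blocks_max:
                self.blocks_pool.append(block)   # cached for reuse
                return None
        return block                     # cache full: caller frees it

    def get_block(self):
        with self.lock:
            if self.blocks_pool:
                return self.blocks_pool.pop()
        return object()                  # stands in for a fresh malloc

arena = LockedArena(blocks_max=20)
blocks = [f"block_{i}" for i in range(40)]
freed = []                               # blocks the "callers" had to free

def worker(block):
    overflow = arena.return_block(block)
    if overflow is not None:
        freed.append(overflow)           # list.append is atomic in CPython

threads = [threading.Thread(target=worker, args=(b,)) for b in blocks]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every returned block is either cached or freed, never lost.
print(len(arena.blocks_pool), len(freed))
```

With 40 concurrent returns into a 20-slot cache, exactly 20 blocks end up cached and 20 are handed back for freeing; the unsynchronized version loses blocks whenever two returns interleave.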
