
configure_azure_monitor() deadlocks on macOS when performance counters enabled #44604

@rads-1996

Description
  • Package Name: azure-monitor-opentelemetry
  • Package Version: 1.8.3
  • Operating System: macOS 26.2 (arm64)
  • Python Version: 3.11.14


Bug report: configure_azure_monitor() deadlocks on macOS when performance counters are enabled

Summary

Calling azure.monitor.opentelemetry.configure_azure_monitor() on macOS with default settings (enable_performance_counters=True) hangs indefinitely inside configure_azure_monitor().

Workaround: pass enable_performance_counters=False.
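
For reference, the workaround call looks like the sketch below. This is a minimal configuration fragment, not a full application: it assumes the azure-monitor-opentelemetry package is installed, and the connection string is a placeholder, not a real resource.

```python
from azure.monitor.opentelemetry import configure_azure_monitor

# Workaround: opt out of performance counters so configuration can complete.
# The connection string below is an illustrative placeholder.
configure_azure_monitor(
    connection_string="InstrumentationKey=00000000-0000-0000-0000-000000000000",
    enable_performance_counters=False,
)
```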

Environment

  • OS: macOS 26.2 (Build 25C56), arm64
  • Python: 3.11.14 (/opt/homebrew/opt/python@3.11/bin/python3.11)
  • Packages (venv):
    • azure-monitor-opentelemetry==1.8.3
    • azure-monitor-opentelemetry-exporter==1.0.0b46
    • azure-core==1.37.0
    • azure-core-tracing-opentelemetry==1.0.0b12
    • opentelemetry-api==1.39.0
    • opentelemetry-sdk==1.39.0
    • opentelemetry-instrumentation==0.60b0 (and related instrumentations pulled in by the distro)
    • psutil==7.2.1

Repro steps

From this repo root:

  1. Create venv + install deps:

     /opt/homebrew/bin/python3.11 -m venv .venv-repro
     source .venv-repro/bin/activate
     python -m pip install -U pip
     python -m pip install 'azure-monitor-opentelemetry~=1.6' python-dotenv

  2. Repro hang (Ctrl+C to stop and print a traceback):

     source .venv-repro/bin/activate
     python repro_configure_azure_monitor_perf_counters_mac.py

  3. Workaround (succeeds):

     source .venv-repro/bin/activate
     python repro_configure_azure_monitor_perf_counters_mac.py --disable-performance-counters

Optional: python run_repro.py runs both paths and interrupts the hang after ~10s to capture a traceback.

Expected behavior

configure_azure_monitor() should return normally on macOS. If a specific performance counter is unsupported on the platform, it should be skipped without blocking configuration.

Actual behavior

configure_azure_monitor() hangs (deadlock) on macOS when enable_performance_counters=True (default).

run_repro.py captures this traceback (SIGINT after a timeout):

WARNING:azure.monitor.opentelemetry.exporter._performance_counters._manager:Process I/O Rate performance counter is not available on this platform.
Traceback (most recent call last):
  File "/Users/nagkumar/Documents/msft.nosync/enable_perf_counter_langchain_tracer/repro_configure_azure_monitor_perf_counters_mac.py", line 93, in <module>
    raise SystemExit(main())
                     ^^^^^^
  File "/Users/nagkumar/Documents/msft.nosync/enable_perf_counter_langchain_tracer/repro_configure_azure_monitor_perf_counters_mac.py", line 74, in main
    configure_azure_monitor(
  File "/Users/nagkumar/Documents/msft.nosync/enable_perf_counter_langchain_tracer/.venv-repro/lib/python3.11/site-packages/azure/monitor/opentelemetry/_configure.py", line 141, in configure_azure_monitor
    _setup_metrics(configurations)
  File "/Users/nagkumar/Documents/msft.nosync/enable_perf_counter_langchain_tracer/.venv-repro/lib/python3.11/site-packages/azure/monitor/opentelemetry/_configure.py", line 286, in _setup_metrics
    enable_performance_counters(meter_provider=meter_provider)
  File "/Users/nagkumar/Documents/msft.nosync/enable_perf_counter_langchain_tracer/.venv-repro/lib/python3.11/site-packages/azure/monitor/opentelemetry/exporter/_performance_counters/_manager.py", line 657, in enable_performance_counters
    _PerformanceCountersManager(meter_provider)
  File "/Users/nagkumar/Documents/msft.nosync/enable_perf_counter_langchain_tracer/.venv-repro/lib/python3.11/site-packages/azure/monitor/opentelemetry/exporter/_utils.py", line 407, in __call__
    instance = super().__call__(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nagkumar/Documents/msft.nosync/enable_perf_counter_langchain_tracer/.venv-repro/lib/python3.11/site-packages/azure/monitor/opentelemetry/exporter/_performance_counters/_manager.py", line 601, in __init__
    _logger.warning("Process I/O Rate performance counter is not available on this platform.")
  File "/opt/homebrew/Cellar/python@3.11/3.11.14_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/logging/__init__.py", line 1501, in warning
    self._log(WARNING, msg, args, **kwargs)
  File "/opt/homebrew/Cellar/python@3.11/3.11.14_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/logging/__init__.py", line 1634, in _log
    self.handle(record)
  File "/opt/homebrew/Cellar/python@3.11/3.11.14_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/logging/__init__.py", line 1644, in handle
    self.callHandlers(record)
  File "/opt/homebrew/Cellar/python@3.11/3.11.14_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/logging/__init__.py", line 1706, in callHandlers
    hdlr.handle(record)
  File "/opt/homebrew/Cellar/python@3.11/3.11.14_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/logging/__init__.py", line 978, in handle
    self.emit(record)
  File "/Users/nagkumar/Documents/msft.nosync/enable_perf_counter_langchain_tracer/.venv-repro/lib/python3.11/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 579, in emit
    logger.emit(self._translate(record))
  File "/Users/nagkumar/Documents/msft.nosync/enable_perf_counter_langchain_tracer/.venv-repro/lib/python3.11/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 665, in emit
    self._multi_log_record_processor.on_emit(writable_record)
  File "/Users/nagkumar/Documents/msft.nosync/enable_perf_counter_langchain_tracer/.venv-repro/lib/python3.11/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 345, in on_emit
    lp.on_emit(log_record)
  File "/Users/nagkumar/Documents/msft.nosync/enable_perf_counter_langchain_tracer/.venv-repro/lib/python3.11/site-packages/azure/monitor/opentelemetry/exporter/_performance_counters/_processor.py", line 17, in on_emit
    pcm = _PerformanceCountersManager()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nagkumar/Documents/msft.nosync/enable_perf_counter_langchain_tracer/.venv-repro/lib/python3.11/site-packages/azure/monitor/opentelemetry/exporter/_utils.py", line 404, in __call__
    with cls._lock:
KeyboardInterrupt

Root cause analysis (suspected)

This appears to be a self-deadlock caused by logging during singleton initialization:

  1. configure_azure_monitor() sets up logging before metrics.
  2. With performance counters enabled, logging setup installs _PerformanceCountersLogRecordProcessor.
  3. During metrics setup, enable_performance_counters() instantiates _PerformanceCountersManager (a Singleton guarded by a non-reentrant threading.Lock).
  4. On macOS, ProcessIORate is not available (_IO_AVAILABLE == False), so _PerformanceCountersManager.__init__() emits:
    • _logger.warning("Process I/O Rate performance counter is not available on this platform.")
  5. That warning is processed by OpenTelemetry’s LoggingHandler, which invokes _PerformanceCountersLogRecordProcessor.on_emit().
  6. on_emit() calls _PerformanceCountersManager() again, but the singleton lock is already held by the in-progress initialization → deadlock.

This likely impacts any platform where ProcessIORate is unavailable (macOS, and possibly some Linux distros).
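
The cycle above can be sketched with the standard library alone. The names below (Singleton, Manager, ReentrantHandler) are illustrative stand-ins for the SDK's metaclass, manager, and log-record processor, not its real code; `acquire(blocking=False)` is used so the sketch reports the re-entry instead of actually hanging.

```python
import logging
import threading

class Singleton(type):
    _lock = threading.Lock()   # non-reentrant, as suspected in the SDK
    _instance = None

    def __call__(cls, *args, **kwargs):
        # A plain Lock cannot be re-acquired by the thread that holds it;
        # blocking=False turns the would-be deadlock into a detectable error.
        if not cls._lock.acquire(blocking=False):
            raise RuntimeError("re-entrant construction: would deadlock on a plain Lock")
        try:
            if cls._instance is None:
                cls._instance = super().__call__(*args, **kwargs)
            return cls._instance
        finally:
            cls._lock.release()

class ReentrantHandler(logging.Handler):
    def emit(self, record):
        Manager()  # mirrors _PerformanceCountersLogRecordProcessor.on_emit()

logger = logging.getLogger("deadlock-sketch")
logger.addHandler(ReentrantHandler())

class Manager(metaclass=Singleton):
    def __init__(self):
        # mirrors the "Process I/O Rate ... not available" warning in __init__
        logger.warning("unsupported performance counter on this platform")

try:
    Manager()
except RuntimeError as exc:
    print(f"cycle detected: {exc}")
```

With the real non-blocking `with cls._lock:` the inner call would simply wait forever on the same thread, which matches the observed hang in `_utils.py` line 404.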

Suggested fixes

Any of these should break the cycle:

  • Make the singleton lock re-entrant (threading.RLock) so re-entry from the same thread can proceed.
  • Avoid logging from inside _PerformanceCountersManager.__init__() (or defer until after initialization).
  • Make _PerformanceCountersLogRecordProcessor avoid instantiating _PerformanceCountersManager() during initialization (e.g., only record if already created).
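
The third option can be sketched with the standard library as follows; the names (Manager, PeekingProcessor, `peek()`) are illustrative and do not claim to match the SDK's real classes. The processor only uses an already-created manager instead of constructing one, which breaks the construction/logging cycle.

```python
import logging
import threading

class Manager:
    """Illustrative stand-in for the singleton manager (not the SDK's code)."""
    _lock = threading.Lock()
    _instance = None

    @classmethod
    def instance(cls):
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls()
            return cls._instance

    @classmethod
    def peek(cls):
        # Non-constructing accessor: None until a fully initialized instance exists.
        return cls._instance

    def __init__(self):
        self.records_seen = 0
        # Logging during __init__ is now harmless: the processor only peeks.
        logging.getLogger("fix-sketch").warning("unsupported performance counter")

class PeekingProcessor(logging.Handler):
    """Stand-in for the log-record processor, using the non-constructing accessor."""
    def emit(self, record):
        mgr = Manager.peek()   # was: Manager(), which re-entered the singleton lock
        if mgr is not None:
            mgr.records_seen += 1

logging.getLogger("fix-sketch").addHandler(PeekingProcessor())
mgr = Manager.instance()       # completes instead of deadlocking
logging.getLogger("fix-sketch").warning("a later log record")
print(mgr.records_seen)        # prints 1
```

The `__init__`-time warning is dropped by the processor (peek() still returns None at that point), which is arguably the desired behavior: that warning is about the host platform, not application telemetry.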

Notes

The repro script uses APPLICATION_INSIGHTS_CONNECTION_STRING if set, otherwise a valid-format fallback connection string; the deadlock occurs before any telemetry content is required.

To Reproduce

Contents of run_repro.py (the helper referenced above):
#!/usr/bin/env python3
"""
Helper to run the repro in both modes:

  1. enable_performance_counters=False (should succeed)
  2. default enable_performance_counters=True (hangs on macOS; we interrupt to collect a traceback)

Run from the venv used to install azure-monitor-opentelemetry:
python run_repro.py
"""

from __future__ import annotations

import os
import signal
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    cmd: list[str]
    returncode: int | None
    stdout: str
    stderr: str
    timed_out: bool


def _run(cmd: list[str], *, timeout_s: float) -> RunResult:
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        env={**os.environ, "PYTHONUNBUFFERED": "1"},
    )
    try:
        stdout, stderr = proc.communicate(timeout=timeout_s)
        return RunResult(cmd, proc.returncode, stdout, stderr, False)
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGINT)
        try:
            stdout, stderr = proc.communicate(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()
            stdout, stderr = proc.communicate()
        return RunResult(cmd, proc.returncode, stdout, stderr, True)


def main() -> int:
    repro = "repro_configure_azure_monitor_perf_counters_mac.py"

    good = _run(
        [sys.executable, "-u", repro, "--disable-performance-counters", "--sleep-seconds", "0.1"],
        timeout_s=30,
    )
    print("\n=== disable-performance-counters (expected: succeeds) ===")
    print(good.stdout, end="")
    if good.stderr:
        print(good.stderr, end="", file=sys.stderr)
    if good.returncode != 0:
        print(f"Unexpected non-zero exit: {good.returncode}", file=sys.stderr)
        return 1

    bad = _run([sys.executable, "-u", repro, "--sleep-seconds", "0.1"], timeout_s=10)
    print("\n=== performance counters enabled (expected: hangs on macOS) ===")
    print(bad.stdout, end="")
    if bad.stderr:
        print(bad.stderr, end="", file=sys.stderr)

    if bad.timed_out:
        print("\nResult: process did not exit within timeout (expected on macOS).")
        return 0

    print("\nResult: process exited without timing out (unexpected if bug reproduces).")
    print(f"Exit code: {bad.returncode}", file=sys.stderr)
    return 2


if __name__ == "__main__":
    raise SystemExit(main())

