Description
- Package Name: azure-monitor-opentelemetry
- Package Version: 1.8.3
- Operating System: macOS 26.2 (Build 25C56), arm64
- Python Version: 3.11.14
Describe the bug
Bug report: configure_azure_monitor() deadlocks on macOS when performance counters enabled
Summary
Calling azure.monitor.opentelemetry.configure_azure_monitor() on macOS with default settings (enable_performance_counters=True) hangs indefinitely; the call never returns.
Workaround: pass enable_performance_counters=False.
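A minimal sketch of the workaround, assuming the connection string is supplied by the caller. The helper names workaround_kwargs and configure_without_perf_counters are hypothetical and exist only for illustration; configure_azure_monitor and its enable_performance_counters argument are the ones named in this report.

```python
from typing import Any


def workaround_kwargs(connection_string: str) -> dict[str, Any]:
    """Build keyword arguments that opt out of performance counters.

    Hypothetical helper for illustration; not part of azure-monitor-opentelemetry.
    """
    return {
        "connection_string": connection_string,
        # Skipping performance counters avoids the code path that hangs on macOS.
        "enable_performance_counters": False,
    }


def configure_without_perf_counters(connection_string: str) -> None:
    """Call configure_azure_monitor() with performance counters disabled."""
    # Imported lazily so this module loads even where the distro is not installed.
    from azure.monitor.opentelemetry import configure_azure_monitor

    configure_azure_monitor(**workaround_kwargs(connection_string))
```

This only sidesteps the hang; performance counter metrics are not collected while the flag is off.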
Environment
- OS: macOS 26.2 (Build 25C56), arm64
- Python: 3.11.14 (/opt/homebrew/opt/python@3.11/bin/python3.11)
- Packages (venv):
  - azure-monitor-opentelemetry==1.8.3
  - azure-monitor-opentelemetry-exporter==1.0.0b46
  - azure-core==1.37.0
  - azure-core-tracing-opentelemetry==1.0.0b12
  - opentelemetry-api==1.39.0
  - opentelemetry-sdk==1.39.0
  - opentelemetry-instrumentation==0.60b0 (and related instrumentations pulled in by the distro)
  - psutil==7.2.1
Repro steps
From this repo root:
- Create venv + install deps:
    /opt/homebrew/bin/python3.11 -m venv .venv-repro
    source .venv-repro/bin/activate
    python -m pip install -U pip
    python -m pip install 'azure-monitor-opentelemetry~=1.6' python-dotenv
- Repro hang (Ctrl+C to stop and print a traceback):
    source .venv-repro/bin/activate
    python repro_configure_azure_monitor_perf_counters_mac.py
- Workaround (succeeds):
    source .venv-repro/bin/activate
    python repro_configure_azure_monitor_perf_counters_mac.py --disable-performance-counters

Optional: python run_repro.py runs both paths and interrupts the hang after ~10s to capture a traceback.
Expected behavior
configure_azure_monitor() should return normally on macOS. If a specific performance counter is unsupported on the platform, it should be skipped without blocking configuration.
Actual behavior
configure_azure_monitor() hangs (deadlock) on macOS when enable_performance_counters=True (default).
run_repro.py captures this traceback (SIGINT after a timeout):
WARNING:azure.monitor.opentelemetry.exporter._performance_counters._manager:Process I/O Rate performance counter is not available on this platform.
Traceback (most recent call last):
File "/Users/nagkumar/Documents/msft.nosync/enable_perf_counter_langchain_tracer/repro_configure_azure_monitor_perf_counters_mac.py", line 93, in <module>
raise SystemExit(main())
^^^^^^
File "/Users/nagkumar/Documents/msft.nosync/enable_perf_counter_langchain_tracer/repro_configure_azure_monitor_perf_counters_mac.py", line 74, in main
configure_azure_monitor(
File "/Users/nagkumar/Documents/msft.nosync/enable_perf_counter_langchain_tracer/.venv-repro/lib/python3.11/site-packages/azure/monitor/opentelemetry/_configure.py", line 141, in configure_azure_monitor
_setup_metrics(configurations)
File "/Users/nagkumar/Documents/msft.nosync/enable_perf_counter_langchain_tracer/.venv-repro/lib/python3.11/site-packages/azure/monitor/opentelemetry/_configure.py", line 286, in _setup_metrics
enable_performance_counters(meter_provider=meter_provider)
File "/Users/nagkumar/Documents/msft.nosync/enable_perf_counter_langchain_tracer/.venv-repro/lib/python3.11/site-packages/azure/monitor/opentelemetry/exporter/_performance_counters/_manager.py", line 657, in enable_performance_counters
_PerformanceCountersManager(meter_provider)
File "/Users/nagkumar/Documents/msft.nosync/enable_perf_counter_langchain_tracer/.venv-repro/lib/python3.11/site-packages/azure/monitor/opentelemetry/exporter/_utils.py", line 407, in __call__
instance = super().__call__(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nagkumar/Documents/msft.nosync/enable_perf_counter_langchain_tracer/.venv-repro/lib/python3.11/site-packages/azure/monitor/opentelemetry/exporter/_performance_counters/_manager.py", line 601, in __init__
_logger.warning("Process I/O Rate performance counter is not available on this platform.")
File "/opt/homebrew/Cellar/python@3.11/3.11.14_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/logging/__init__.py", line 1501, in warning
self._log(WARNING, msg, args, **kwargs)
File "/opt/homebrew/Cellar/python@3.11/3.11.14_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/logging/__init__.py", line 1634, in _log
self.handle(record)
File "/opt/homebrew/Cellar/python@3.11/3.11.14_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/logging/__init__.py", line 1644, in handle
self.callHandlers(record)
File "/opt/homebrew/Cellar/python@3.11/3.11.14_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/logging/__init__.py", line 1706, in callHandlers
hdlr.handle(record)
File "/opt/homebrew/Cellar/python@3.11/3.11.14_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/logging/__init__.py", line 978, in handle
self.emit(record)
File "/Users/nagkumar/Documents/msft.nosync/enable_perf_counter_langchain_tracer/.venv-repro/lib/python3.11/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 579, in emit
logger.emit(self._translate(record))
File "/Users/nagkumar/Documents/msft.nosync/enable_perf_counter_langchain_tracer/.venv-repro/lib/python3.11/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 665, in emit
self._multi_log_record_processor.on_emit(writable_record)
File "/Users/nagkumar/Documents/msft.nosync/enable_perf_counter_langchain_tracer/.venv-repro/lib/python3.11/site-packages/opentelemetry/sdk/_logs/_internal/__init__.py", line 345, in on_emit
lp.on_emit(log_record)
File "/Users/nagkumar/Documents/msft.nosync/enable_perf_counter_langchain_tracer/.venv-repro/lib/python3.11/site-packages/azure/monitor/opentelemetry/exporter/_performance_counters/_processor.py", line 17, in on_emit
pcm = _PerformanceCountersManager()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nagkumar/Documents/msft.nosync/enable_perf_counter_langchain_tracer/.venv-repro/lib/python3.11/site-packages/azure/monitor/opentelemetry/exporter/_utils.py", line 404, in __call__
with cls._lock:
KeyboardInterrupt
Root cause analysis (suspected)
This appears to be a self-deadlock caused by logging during singleton initialization:
- configure_azure_monitor() sets up logging before metrics.
- With performance counters enabled, logging setup installs _PerformanceCountersLogRecordProcessor.
- During metrics setup, enable_performance_counters() instantiates _PerformanceCountersManager (a Singleton guarded by a non-reentrant threading.Lock).
- On macOS, ProcessIORate is not available (_IO_AVAILABLE == False), so _PerformanceCountersManager.__init__() emits: _logger.warning("Process I/O Rate performance counter is not available on this platform.")
- That warning is processed by OpenTelemetry’s LoggingHandler, which invokes _PerformanceCountersLogRecordProcessor.on_emit().
- on_emit() calls _PerformanceCountersManager() again, but the singleton lock is already held by the in-progress initialization → deadlock.
This likely impacts any platform where ProcessIORate is unavailable (macOS, and possibly some Linux distros).
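The sequence above can be imitated with the standard library alone. The following is a simplified sketch, not the SDK's code: Singleton, Manager, ReentrantHandler and reproduce are hypothetical stand-ins, and the lock is acquired with a timeout so the demo reports the re-entry instead of hanging the way a plain "with cls._lock:" would.

```python
import logging
import threading


class Singleton(type):
    """Metaclass that builds one instance per class under a non-reentrant lock."""

    _instances: dict = {}
    _lock = threading.Lock()

    def __call__(cls, *args, **kwargs):
        # A blocking acquire would hang forever here; the timeout keeps the demo finite.
        if not cls._lock.acquire(timeout=0.2):
            raise RuntimeError("singleton lock already held: re-entrant construction")
        try:
            if cls not in cls._instances:
                cls._instances[cls] = super().__call__(*args, **kwargs)
            return cls._instances[cls]
        finally:
            cls._lock.release()


class Manager(metaclass=Singleton):
    def __init__(self):
        # Logging during construction is what triggers the re-entry.
        logger.warning("counter is not available on this platform")


class ReentrantHandler(logging.Handler):
    """Handler that asks for the singleton on every record it receives."""

    def emit(self, record):
        Manager()


logger = logging.getLogger("deadlock_demo")
logger.propagate = False
logger.addHandler(ReentrantHandler())


def reproduce() -> str:
    """Construct the singleton and report whether re-entry was detected."""
    try:
        Manager()
    except RuntimeError as exc:
        return str(exc)
    return "no re-entry detected"
```

Calling reproduce() returns the RuntimeError message: the second Manager() call, made from the handler on the same thread, cannot take the lock that the first call still holds.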
Suggested fixes
Any of these should break the cycle:
- Make the singleton lock re-entrant (threading.RLock) so re-entry from the same thread can proceed.
- Avoid logging from inside _PerformanceCountersManager.__init__() (or defer until after initialization).
- Make _PerformanceCountersLogRecordProcessor avoid instantiating _PerformanceCountersManager() during initialization (e.g., only record if already created).
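As an illustration of the third option, here is a simplified sketch in which the log handler only looks at an already-built manager and never constructs one. Manager, GuardedHandler, get_or_create, peek and records_seen are hypothetical names, not the SDK's. One caveat that may apply to the RLock option: if the singleton stores its instance only after __init__() returns, a re-entrant call could find no instance and start constructing again, so that fix might need a guard like this one as well.

```python
import logging
import threading
from typing import Optional


class Manager:
    """Simplified stand-in for the performance counters manager."""

    _instance: Optional["Manager"] = None
    _lock = threading.Lock()  # non-reentrant, as described in the report

    def __init__(self) -> None:
        self.records_seen = 0
        # This warning reaches GuardedHandler while _lock is still held.
        logger.warning("counter is not available on this platform")

    @classmethod
    def get_or_create(cls) -> "Manager":
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls()
            return cls._instance

    @classmethod
    def peek(cls) -> Optional["Manager"]:
        """Return the instance if construction has finished, without locking."""
        return cls._instance


class GuardedHandler(logging.Handler):
    """Counts records only once the manager exists; never constructs it."""

    def emit(self, record: logging.LogRecord) -> None:
        manager = Manager.peek()
        if manager is not None:
            manager.records_seen += 1


logger = logging.getLogger("guarded_demo")
logger.propagate = False
logger.addHandler(GuardedHandler())

manager = Manager.get_or_create()  # returns instead of hanging
logger.warning("after initialization")  # counted, because the manager now exists
```

The trade-off in this sketch is that records emitted while the manager is still initializing are skipped rather than counted.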
Notes
The repro script uses APPLICATION_INSIGHTS_CONNECTION_STRING if set, otherwise a valid-format fallback connection string; the deadlock occurs before any telemetry content is required.
To Reproduce
Steps to reproduce the behavior:
    #!/usr/bin/env python3
    """
    Helper to run the repro in both modes:
    - enable_performance_counters=False (should succeed)
    - default enable_performance_counters=True (hangs on macOS; we interrupt to collect a traceback)

    Run from the venv used to install azure-monitor-opentelemetry:
        python run_repro.py
    """
    from __future__ import annotations

    import os
    import signal
    import subprocess
    import sys
    from dataclasses import dataclass


    @dataclass(frozen=True)
    class RunResult:
        cmd: list[str]
        returncode: int | None
        stdout: str
        stderr: str
        timed_out: bool


    def _run(cmd: list[str], *, timeout_s: float) -> RunResult:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
            env={**os.environ, "PYTHONUNBUFFERED": "1"},
        )
        try:
            stdout, stderr = proc.communicate(timeout=timeout_s)
            return RunResult(cmd, proc.returncode, stdout, stderr, False)
        except subprocess.TimeoutExpired:
            # SIGINT makes the hung child raise KeyboardInterrupt and print a traceback.
            proc.send_signal(signal.SIGINT)
            try:
                stdout, stderr = proc.communicate(timeout=5)
            except subprocess.TimeoutExpired:
                # The child ignored SIGINT; kill it and collect whatever it printed.
                proc.kill()
                stdout, stderr = proc.communicate()
            return RunResult(cmd, proc.returncode, stdout, stderr, True)


    def main() -> int:
        repro = "repro_configure_azure_monitor_perf_counters_mac.py"

        good = _run(
            [sys.executable, "-u", repro, "--disable-performance-counters", "--sleep-seconds", "0.1"],
            timeout_s=30,
        )
        print("\n=== disable-performance-counters (expected: succeeds) ===")
        print(good.stdout, end="")
        if good.stderr:
            print(good.stderr, end="", file=sys.stderr)
        if good.returncode != 0:
            print(f"Unexpected non-zero exit: {good.returncode}", file=sys.stderr)
            return 1

        bad = _run([sys.executable, "-u", repro, "--sleep-seconds", "0.1"], timeout_s=10)
        print("\n=== performance counters enabled (expected: hangs on macOS) ===")
        print(bad.stdout, end="")
        if bad.stderr:
            print(bad.stderr, end="", file=sys.stderr)
        if bad.timed_out:
            print("\nResult: process did not exit within timeout (expected on macOS).")
            return 0

        print("\nResult: process exited without timing out (unexpected if bug reproduces).")
        print(f"Exit code: {bad.returncode}", file=sys.stderr)
        return 2


    if __name__ == "__main__":
        raise SystemExit(main())