27 changes: 27 additions & 0 deletions README.md
@@ -362,6 +362,33 @@ Save this to `dmesg_custom_config.json` and run:
node-scraper --plugin-configs=dmesg_custom_config.json run-plugins DmesgPlugin
```

### Regex helper and `RegexSearchPlugin`

The `nodescraper.regex_patterns` module provides a small set of common regex patterns
(`ipv4`, `mac`, `uuid`, `iso8601_ts`, `email`) and a `build_error_regex_dicts()` helper that
turns them into analyzer-friendly `error_regex` dicts. This is useful when composing configs
for `RegexSearchPlugin` or for other analyzers that accept `error_regex` lists.

Python example (programmatic usage):

```py
from nodescraper import regex_patterns
from nodescraper.models.systeminfo import SystemInfo
from nodescraper.plugins.regex_search.regex_search_analyzer import RegexSearchAnalyzer
from nodescraper.plugins.regex_search.regex_search_data import RegexSearchData

# Build an error_regex list from named common patterns
rules = regex_patterns.build_error_regex_dicts(["ipv4", "email"], message_template="Found {name}")

# Prepare the content to scan and the analyzer arguments
data = RegexSearchData(content="2026-05-01T12:00:00,000+00:00 connect from 192.0.2.1")
args = {"error_regex": rules}

# The analyzer expects a SystemInfo instance (as in the unit tests), not None
analyzer = RegexSearchAnalyzer(system_info=SystemInfo())
result = analyzer.analyze_data(data, args)
print(result.events)
```
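
To get a feel for what happens to these rule dicts, the sketch below compiles each `regex` string and scans text line by line, recording the rule's `message` for every match. It is a simplified stand-in, not the actual `RegexSearchAnalyzer` logic: the `scan` helper and its return shape are hypothetical, and it only assumes, as the `build_error_regex_dicts()` docstring states, that the analyzer compiles each `regex` string into a pattern.

```py
import re
from typing import Dict, List, Tuple


def scan(content: str, rules: List[Dict[str, str]]) -> List[Tuple[int, str, str]]:
    """Hypothetical helper: apply error_regex-style dicts to text line by line.

    Returns (line_number, rule_message, matched_text) tuples, grouped by line, then by rule.
    """
    # Compile every rule's regex string once before scanning.
    compiled = [(re.compile(rule["regex"]), rule["message"]) for rule in rules]
    hits: List[Tuple[int, str, str]] = []
    for line_no, line in enumerate(content.splitlines(), start=1):
        for pattern, message in compiled:
            for match in pattern.finditer(line):
                hits.append((line_no, message, match.group(0)))
    return hits


# One rule shaped like the output of
# build_error_regex_dicts(["ipv4"], message_template="Found {name}").
rules = [
    {
        "regex": r"\b(?:25[0-5]|2[0-4]\d|1?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|1?\d?\d)){3}\b",
        "message": "Found ipv4",
        "event_category": "UNKNOWN",
        "event_priority": "ERROR",
    }
]

print(scan("2026-05-01T12:00:00,000+00:00 connect from 192.0.2.1", rules))
# [(1, 'Found ipv4', '192.0.2.1')]
```

The real analyzer turns matches into events and accepts further arguments (the unit tests pass `num_timestamps` and `interval_to_collapse_event`); none of that behaviour is modelled here.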

CLI note: when run from the CLI or configured through a plugin config JSON, `RegexSearchPlugin` accepts `--data`, pointing to a file or directory to scan, and `--error-regex` entries for the patterns.

#### **'compare-runs' subcommand**
The `compare-runs` subcommand compares datamodels from two run log directories (e.g. two
`nodescraper_log_*` folders). By default, all plugins with data in both runs are compared.
1 change: 1 addition & 0 deletions docs/PLUGIN_DOC.md
@@ -4,6 +4,7 @@

| Plugin | Collection | Analyzer Args | Collection Args | DataModel | Collector | Analyzer |
| --- | --- | --- | --- | --- | --- | --- |
| RegexHelper | - | **Utility:** common regex patterns and `build_error_regex_dicts()` to produce analyzer-friendly `error_regex` dicts; useful when composing `RegexSearchPlugin` configs programmatically. | - | - | - | - |
| AmdSmiPlugin | bad-pages<br>firmware --json<br>list --json<br>metric -g all<br>partition --json<br>process --json<br>ras --cper --folder={folder}<br>ras --afid --cper-file {cper_file}<br>static -g all --json<br>static -g {gpu_id} --json<br>topology<br>version --json<br>xgmi -l<br>xgmi -m | **Analyzer Args:**<br>- `check_static_data`: bool — If True, run static data checks (e.g. driver version, partition mode).<br>- `expected_gpu_processes`: Optional[int] — Expected number of GPU processes.<br>- `expected_max_power`: Optional[int] — Expected maximum power value (e.g. watts).<br>- `expected_driver_version`: Optional[str] — Expected AMD driver version string.<br>- `expected_memory_partition_mode`: Optional[str] — Expected memory partition mode (e.g. sp3, dp).<br>- `expected_compute_partition_mode`: Optional[str] — Expected compute partition mode.<br>- `expected_firmware_versions`: Optional[dict[str, str]] — Expected firmware versions keyed by amd-smi fw_id (e.g. PLDM_BUNDLE).<br>- `l0_to_recovery_count_error_threshold`: Optional[int] — L0-to-recovery count above which an error is raised.<br>- `l0_to_recovery_count_warning_threshold`: Optional[int] — L0-to-recovery count above which a warning is raised.<br>- `vendorid_ep`: Optional[str] — Expected endpoint vendor ID (e.g. for PCIe).<br>- `vendorid_ep_vf`: Optional[str] — Expected endpoint VF vendor ID.<br>- `devid_ep`: Optional[str] — Expected endpoint device ID.<br>- `devid_ep_vf`: Optional[str] — Expected endpoint VF device ID.<br>- `sku_name`: Optional[str] — Expected SKU name string for GPU.<br>- `expected_xgmi_speed`: Optional[list[float]] — Expected xGMI speed value(s) (e.g. link rate).<br>- `analysis_range_start`: Optional[datetime.datetime] — Start of time range for time-windowed analysis.<br>- `analysis_range_end`: Optional[datetime.datetime] — End of time range for time-windowed analysis. | **Collection Args:**<br>- `analysis_firmware_ids`: Optional[list[str]] — amd-smi fw_id values to record in analysis_ref.firmware_versions<br>- `cper_file_path`: Optional[str] — Path to CPER folder or file for RAS AFID collection (ras --afid --cper-file). | [AmdSmiDataModel](#AmdSmiDataModel-Model) | [AmdSmiCollector](#Collector-Class-AmdSmiCollector) | [AmdSmiAnalyzer](#Data-Analyzer-Class-AmdSmiAnalyzer) |
| BiosPlugin | sh -c 'cat /sys/devices/virtual/dmi/id/bios_version'<br>wmic bios get SMBIOSBIOSVersion /Value | **Analyzer Args:**<br>- `exp_bios_version`: list[str] — Expected BIOS version(s) to match against collected value (str or list).<br>- `regex_match`: bool — If True, match exp_bios_version as regex; otherwise exact match. | - | [BiosDataModel](#BiosDataModel-Model) | [BiosCollector](#Collector-Class-BiosCollector) | [BiosAnalyzer](#Data-Analyzer-Class-BiosAnalyzer) |
| CmdlinePlugin | cat /proc/cmdline | **Analyzer Args:**<br>- `required_cmdline`: Union[str, List] — Command-line parameters that must be present (e.g. 'pci=bfsort').<br>- `banned_cmdline`: Union[str, List] — Command-line parameters that must not be present.<br>- `os_overrides`: Dict[str, nodescraper.plugins.inband.cmdline.cmdlineconfig.OverrideConfig] — Per-OS overrides for required_cmdline and banned_cmdline (keyed by OS identifier).<br>- `platform_overrides`: Dict[str, nodescraper.plugins.inband.cmdline.cmdlineconfig.OverrideConfig] — Per-platform overrides for required_cmdline and banned_cmdline (keyed by platform). | - | [CmdlineDataModel](#CmdlineDataModel-Model) | [CmdlineCollector](#Collector-Class-CmdlineCollector) | [CmdlineAnalyzer](#Data-Analyzer-Class-CmdlineAnalyzer) |
52 changes: 52 additions & 0 deletions nodescraper/regex_patterns.py
@@ -0,0 +1,52 @@
"""Common regex patterns and helpers for building analyzer error rules.

Keep these lightweight and dependency-free so other modules can import them
without circular imports.
"""
from typing import Iterable, List

COMMON_PATTERNS: dict[str, str] = {
    "ipv4": r"\b(?:25[0-5]|2[0-4]\d|1?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|1?\d?\d)){3}\b",
    "mac": r"\b(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}\b",
    "uuid": r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b",
    "iso8601_ts": r"\b\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?\b",
    "email": r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b",
}


def get_pattern(name: str) -> str:
    """Return the raw regex string for a named common pattern.

    Raises KeyError if the name is unknown.
    """
    return COMMON_PATTERNS[name]


def build_error_regex_dicts(
    names: Iterable[str],
    message_template: str = "{name} matched",
    event_category: str = "UNKNOWN",
    event_priority: str = "ERROR",
) -> List[dict]:
    """Create a list of dicts compatible with RegexAnalyzer._convert_and_extend_error_regex.

    Each dict contains the keys 'regex' (string), 'message', 'event_category' and 'event_priority'.
    The analyzer compiles the regex strings into patterns.
    """
    out: List[dict] = []
    for name in names:
        pat = COMMON_PATTERNS.get(name)
        if not pat:
            raise KeyError(f"Unknown pattern name: {name}")
        out.append(
            {
                "regex": pat,
                "message": message_template.format(name=name),
                "event_category": event_category,
                "event_priority": event_priority,
            }
        )
    return out


__all__ = ["COMMON_PATTERNS", "get_pattern", "build_error_regex_dicts"]
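
A quick way to see what the `ipv4` entry in `COMMON_PATTERNS` accepts is to run the same regex against a few edge cases. The sketch copies the pattern string verbatim so it runs without nodescraper installed; `first_ipv4` is a hypothetical helper. Because the pattern only anchors on `\b`, it finds the first four parts of a longer dotted string, and it accepts two-digit octets with a leading zero.

```py
import re
from typing import Optional

# Copied verbatim from COMMON_PATTERNS["ipv4"] so this sketch runs on its own.
IPV4 = re.compile(r"\b(?:25[0-5]|2[0-4]\d|1?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|1?\d?\d)){3}\b")


def first_ipv4(text: str) -> Optional[str]:
    """Hypothetical helper: return the first IPv4-looking substring in text, or None."""
    match = IPV4.search(text)
    return match.group(0) if match else None


for sample in [
    "connect from 192.0.2.1",  # ordinary address
    "mask 255.255.255.255",  # every octet at the 255 limit
    "bad 256.1.1.1",  # first octet out of range: no match
    "bad 192.168.1.300",  # last octet out of range: no match
    "ver 1.2.3.4.5",  # longer dotted string: the first four parts match
    "pad 01.02.03.04",  # two-digit octets with a leading zero are accepted
]:
    print(f"{sample!r:28} -> {first_ipv4(sample)}")
```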
22 changes: 22 additions & 0 deletions test/unit/test_regex_patterns.py
@@ -0,0 +1,22 @@
import re

from nodescraper import regex_patterns


def test_ipv4_pattern_matches():
    pat = regex_patterns.get_pattern("ipv4")
    compiled = re.compile(pat)
    assert compiled.search("address 192.0.2.1")


def test_mac_and_uuid_patterns_match():
    mac = regex_patterns.get_pattern("mac")
    uuid = regex_patterns.get_pattern("uuid")
    assert re.search(mac, "found MAC 00:1A:2B:3C:4D:5E")
    assert re.search(uuid, "id: 123e4567-e89b-12d3-a456-426655440000")


def test_build_error_regex_dicts_works():
    rules = regex_patterns.build_error_regex_dicts(["ipv4", "email"], message_template="got {name}")
    assert isinstance(rules, list) and len(rules) == 2
    assert all("regex" in r and "message" in r for r in rules)
29 changes: 29 additions & 0 deletions test/unit/test_regex_search_analyzer_extra.py
@@ -0,0 +1,29 @@
from nodescraper.models.systeminfo import SystemInfo
from nodescraper.plugins.regex_search.regex_search_analyzer import RegexSearchAnalyzer
from nodescraper.plugins.regex_search.regex_search_data import RegexSearchData
from nodescraper.plugins.regex_search.analyzer_args import RegexSearchAnalyzerArgs

from nodescraper import regex_patterns


def test_regex_search_analyzer_detects_ipv4():
    system_info = SystemInfo()
    analyzer = RegexSearchAnalyzer(system_info=system_info)

    # Content includes an ISO-like timestamp and an IPv4 address
    content = "2026-05-01T12:00:00,000+00:00 Something happened at 192.0.2.123\n"
    data = RegexSearchData(content=content, data_root="regex_search")

    args = {
        "error_regex": regex_patterns.build_error_regex_dicts(["ipv4"], message_template="Found {name}"),
        "num_timestamps": 2,
        "interval_to_collapse_event": 60,
    }

    result = analyzer.analyze_data(data, args)

    assert result is not None
    assert len(result.events) >= 1
    ev = result.events[0]
    # matched content should include the IPv4
    assert "192.0.2.123" in str(ev.data.get("match_content", ""))
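
The test content above uses a comma as the fractional-seconds separator (`12:00:00,000`). The `iso8601_ts` entry in `COMMON_PATTERNS` only allows a dot before the fraction, so on this input it matches the date and time up to the seconds and stops at the comma, leaving out both the fraction and the `+00:00` offset. The sketch below copies that pattern verbatim to show this; the `[.,]` variant is a hypothetical alternative for comparison only and is not part of the PR.

```py
import re

CONTENT = "2026-05-01T12:00:00,000+00:00 Something happened at 192.0.2.123\n"

# Copied verbatim from COMMON_PATTERNS["iso8601_ts"]: only "." may introduce a fraction.
ISO8601_TS = re.compile(
    r"\b\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?\b"
)

# Hypothetical variant (not in the PR) that also accepts "," before the fraction.
ISO8601_TS_COMMA = re.compile(
    r"\b\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:[.,]\d+)?(?:Z|[+-]\d{2}:?\d{2})?\b"
)

for name, pattern in [("iso8601_ts", ISO8601_TS), ("comma variant", ISO8601_TS_COMMA)]:
    match = pattern.search(CONTENT)
    print(f"{name:14} -> {match.group(0) if match else None}")
# iso8601_ts     -> 2026-05-01T12:00:00
# comma variant  -> 2026-05-01T12:00:00,000+00:00
```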