Add memory benchmarks for scan pipeline #3001

Merged
liquidsec merged 6 commits into 3.0 from additional-memory-benchmarks
Apr 1, 2026

@liquidsec
Contributor

Summary

  • Adds three memory benchmarks that measure RSS and data retention through the real scan pipeline:
    • HTTP_RESPONSE body retention: 200 responses with 500KB bodies, measures how much body data survives after scan completion
    • High-volume pipeline: 5000 DNS_NAME events, measures per-event RSS cost and dedup tracker growth
    • Recursive discovery chain: DNS_NAME → URL → HTTP_RESPONSE chains 4 levels deep, measures parent chain retention
  • All benchmarks wire into pytest-benchmark (extra_info) for --benchmark-save / --benchmark-compare across branches
  • Establishes baselines for evaluating future memory optimizations
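
As a rough illustration of how a memory figure can be attached to a pytest-benchmark result through `extra_info`, here is a minimal sketch. It is not the code from this PR: `run_workload` and `record_memory` are hypothetical names, the workload is a stand-in for a real scan, and it records the `tracemalloc` peak rather than RSS to stay portable.

```python
import tracemalloc


def run_workload(n_events):
    """Hypothetical stand-in for a scan: builds n_events small event dicts."""
    return [{"type": "DNS_NAME", "data": f"host{i}.example.com"} for i in range(n_events)]


def record_memory(benchmark, workload, n_events):
    """Run `workload` once and attach memory figures to the benchmark result.

    `benchmark` is expected to be the pytest-benchmark fixture.
    """

    def measured():
        tracemalloc.start()
        try:
            events = workload(n_events)
            _, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()
        return len(events), peak

    # One round, one iteration: a slow workload is run exactly once.
    count, peak = benchmark.pedantic(measured, rounds=1, iterations=1)

    # extra_info is a plain dict stored with the timing data, so it ends up
    # in the JSON written by --benchmark-save.
    benchmark.extra_info["peak_bytes"] = peak
    benchmark.extra_info["bytes_per_event"] = peak / count
    return benchmark.extra_info


# In a test module this would be called from a test that receives the
# fixture, for example:
#
#     def test_memory_use_example(benchmark):
#         record_memory(benchmark, run_workload, 5000)
```

Keeping the measurement inside the benchmarked callable means the timing and the memory figure come from the same single run.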

@github-actions
Contributor


Thank you for your submission; we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a pull request comment in the format below.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

@github-actions
Contributor

github-actions bot commented Mar 31, 2026

📊 Performance Benchmark Report

Comparing 3.0 (baseline) vs additional-memory-benchmarks (current)

📈 Detailed Results (All Benchmarks)

📋 Complete results for all benchmarks - includes both significant and insignificant changes

| 🧪 Test Name | 📏 Base | 📏 Current | 📈 Change | 🎯 Status |
|---|---|---|---|---|
| Bloom Filter Dns Mutation Tracking Performance | 4.27ms | 4.19ms | -1.8% | |
| Bloom Filter Large Scale Dns Brute Force | 17.53ms | 17.29ms | -1.4% | |
| Large Closest Match Lookup | 351.39ms | 340.92ms | -3.0% | |
| Realistic Closest Match Workload | 187.48ms | 190.77ms | +1.8% | |
| Event Memory Medium Scan | 1776 B/event | 1776 B/event | +0.0% | |
| Event Memory Large Scan | 1760 B/event | 1760 B/event | +0.0% | |
| Event Validation Full Scan Startup Small Batch | 405.67ms | 419.24ms | +3.3% | |
| Event Validation Full Scan Startup Large Batch | 580.37ms | 579.83ms | -0.1% | |
| Make Event Autodetection Small | 30.87ms | 31.36ms | +1.6% | |
| Make Event Autodetection Large | 317.61ms | 316.74ms | -0.3% | |
| Make Event Explicit Types | 14.00ms | 14.06ms | +0.4% | |
| Excavate Single Thread Small | 3.962s | 3.905s | -1.4% | |
| Excavate Single Thread Large | 9.587s | 9.822s | +2.4% | |
| Excavate Parallel Tasks Small | 4.152s | 4.100s | -1.3% | |
| Excavate Parallel Tasks Large | 7.240s | 7.205s | -0.5% | |
| Is Ip Performance | 3.18ms | 3.17ms | -0.3% | |
| Make Ip Type Performance | 11.45ms | 11.60ms | +1.3% | |
| Mixed Ip Operations | 4.51ms | 4.55ms | +0.9% | |
| Memory Use Web Crawl | - | 681ns | New 🆕 | 🆕 |
| Memory Use Subdomain Enum | - | 651ns | New 🆕 | 🆕 |
| Typical Queue Shuffle | 62.89µs | 59.80µs | -4.9% | |
| Priority Queue Shuffle | 722.42µs | 687.79µs | -4.8% | |

🎯 Performance Summary

No significant performance changes detected (all changes <10%)

🆕 New Tests

  • Memory Use Web Crawl: 681ns, 1468.4K ops/sec
  • Memory Use Subdomain Enum: 651ns, 1536.1K ops/sec

🐍 Python Version 3.11.15

@aconite33
Contributor

recheck

@aconite33
Contributor

recheck

@codecov

codecov bot commented Mar 31, 2026

Codecov Report

❌ Patch coverage is 20.86957% with 91 lines in your changes missing coverage. Please review.
✅ Project coverage is 91%. Comparing base (8b02acb) to head (32aa5a1).
⚠️ Report is 11 commits behind head on 3.0.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| bbot/test/benchmarks/_scan_memory_web_crawl.py | 0% | 49 Missing ⚠️ |
| ...bot/test/benchmarks/_scan_memory_subdomain_enum.py | 0% | 27 Missing ⚠️ |
| bbot/test/benchmarks/test_scan_memory.py | 50% | 15 Missing ⚠️ |

Additional details and impacted files
@@          Coverage Diff           @@
##             3.0   #3001    +/-   ##
======================================
- Coverage     91%     91%    -0%     
======================================
  Files        436     439     +3     
  Lines      37072   37184   +112     
======================================
+ Hits       33677   33711    +34     
- Misses      3395    3473    +78     

@aconite33 aconite33 closed this Mar 31, 2026
@aconite33 aconite33 reopened this Mar 31, 2026
@github-actions github-actions bot locked and limited conversation to collaborators Mar 31, 2026
@blacklanternsecurity blacklanternsecurity unlocked this conversation Mar 31, 2026
@aconite33 aconite33 closed this Mar 31, 2026
@aconite33 aconite33 reopened this Mar 31, 2026
@github-actions github-actions bot locked and limited conversation to collaborators Mar 31, 2026
@aconite33 aconite33 closed this Mar 31, 2026
@aconite33 aconite33 reopened this Mar 31, 2026
@liquidsec liquidsec force-pushed the additional-memory-benchmarks branch from 1825523 to cb5940b Compare March 31, 2026 13:52
@blacklanternsecurity blacklanternsecurity unlocked this conversation Mar 31, 2026
Scanner construction allocates 400+ MB in pytest (presets, module
loading, etc.) which was setting the tracemalloc peak before any
scan events existed, masking real differences between branches.

Split scanner init out of the tracemalloc window so we measure
only scan execution memory.
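
The effect of moving setup out of the measurement window can be sketched as follows. This is an illustrative example only: `build_scanner` and `run_scan` are hypothetical stand-ins, not bbot functions, and the sizes are arbitrary.

```python
import tracemalloc


def build_scanner():
    """Hypothetical expensive setup, standing in for presets and module loading."""
    return {"modules": [bytes(1024) for _ in range(20_000)]}  # roughly 20 MB


def run_scan(scanner, n_events=1000):
    """Hypothetical scan body that allocates far less than the setup does."""
    return [f"event-{i}" for i in range(n_events)]


def peak_including_init():
    """Tracing starts before setup, so the setup allocations dominate the peak."""
    tracemalloc.start()
    try:
        scanner = build_scanner()
        run_scan(scanner)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def peak_scan_only():
    """Setup happens before tracing starts, so only scan allocations are counted."""
    scanner = build_scanner()
    tracemalloc.start()
    try:
        run_scan(scanner)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    print("with init:", peak_including_init())
    print("scan only:", peak_scan_only())
```

Because `tracemalloc` only counts allocations made while tracing is active, objects created before `tracemalloc.start()` do not contribute to the reported peak.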

Also separate "new tests" from "significant changes" in benchmark
report output.

pytest's own allocations (~200 MB) contaminate tracemalloc peak
measurements when scans run in-process, masking real differences
between branches. Run each benchmark scan as a subprocess instead
so measurements reflect only the scan's own memory use.

Also rename tests to test_memory_use_* for clarity.
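
A minimal sketch of the subprocess approach, under stated assumptions: the child script, its JSON output format, and the name `measure_in_subprocess` are all hypothetical, and the workload is a placeholder rather than a real scan. The script is inlined here only to keep the example self-contained.

```python
import json
import subprocess
import sys

# Hypothetical child script: it traces only its own allocations and reports
# the result as one JSON line on stdout.
_CHILD_SCRIPT = """
import json, tracemalloc
tracemalloc.start()
events = [{"type": "DNS_NAME", "data": "host%d.example.com" % i} for i in range(5000)]
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(json.dumps({"events": len(events), "peak_bytes": peak}))
"""


def measure_in_subprocess(script=_CHILD_SCRIPT):
    """Run `script` in a fresh interpreter and return its reported measurements."""
    proc = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
        check=True,  # raise if the child exits with a non-zero status
    )
    # The last stdout line carries the JSON payload; earlier lines are ignored.
    return json.loads(proc.stdout.strip().splitlines()[-1])


if __name__ == "__main__":
    print(measure_in_subprocess())
```

The parent process (pytest, in the case described above) never shares a heap with the child, so its allocations cannot affect the number the child reports.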
@liquidsec liquidsec force-pushed the additional-memory-benchmarks branch from aa7e2bc to 590e979 Compare March 31, 2026 19:48
liquidsec and others added 2 commits March 31, 2026 16:51
IP addresses and DNS record type strings (A, AAAA, CNAME, etc.)
repeat heavily across events. sys.intern() deduplicates them so
all events sharing the same IPs/rdtypes reference the same string
object, reducing memory ~10-30% on those fields.
…interning

Intern repeated strings in resolved_hosts and dns_children
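
The idea behind the interning change can be illustrated with a small standalone sketch. It does not reproduce the bbot code: `parse_value` and `intern_all` are hypothetical helpers, and the sample data is invented.

```python
import sys


def parse_value(raw: bytes) -> str:
    """Decode a value at runtime, as a DNS or network parser would."""
    return raw.decode()


def intern_all(values):
    """Return the same strings, each replaced by its interned instance."""
    return [sys.intern(v) for v in values]


# The same IP address and record type parsed many times over.
raw_ips = [b"192.0.2.10"] * 1000
raw_rdtypes = [b"AAAA"] * 1000

ips = intern_all(parse_value(r) for r in raw_ips)
rdtypes = intern_all(parse_value(r) for r in raw_rdtypes)

# After interning, every equal string is one shared object.
unique_ip_objects = len({id(s) for s in ips})
unique_rdtype_objects = len({id(s) for s in rdtypes})

if __name__ == "__main__":
    print(unique_ip_objects, unique_rdtype_objects)
```

`sys.intern()` returns the canonical copy of a string, so a thousand events that carry the same IP hold a thousand references to one object instead of a thousand separate strings.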
# 1) Web crawl -- httpx visits many pages, excavate processes bodies
# ---------------------------------------------------------------------------

_WEB_CRAWL_SCRIPT = """
Collaborator

Can we break this out into a file?

Collaborator

@TheTechromancer TheTechromancer left a comment

,

@liquidsec liquidsec merged commit 6b359ac into 3.0 Apr 1, 2026
15 of 16 checks passed
@liquidsec liquidsec deleted the additional-memory-benchmarks branch April 1, 2026 16:22
3 participants