Add memory benchmarks for scan pipeline by liquidsec · Pull Request #3001 · blacklanternsecurity/bbot

liquidsec · 2026-03-31T02:00:23Z

Summary

Adds three memory benchmarks that measure RSS and data retention through the real scan pipeline:
- HTTP_RESPONSE body retention: 200 responses with 500KB bodies, measures how much body data survives after scan completion
- High-volume pipeline: 5000 DNS_NAME events, measures per-event RSS cost and dedup tracker growth
- Recursive discovery chain: DNS_NAME → URL → HTTP_RESPONSE chains 4 levels deep, measures parent chain retention
- All benchmarks wire into pytest-benchmark (extra_info) for --benchmark-save / --benchmark-compare across branches
- Establishes baselines for evaluating future memory optimizations
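
As a rough illustration of the extra_info wiring described in the summary, the sketch below records a per-event memory figure on a pytest-benchmark `benchmark` fixture. `make_events`, `measure_bytes_per_event`, and `bench_high_volume_memory` are hypothetical names, the workload is a stand-in rather than a real scan, and tracemalloc is used instead of RSS to keep the example portable; the benchmarks in this PR may be structured differently.

```python
import tracemalloc


def make_events(count):
    """Hypothetical stand-in for a scan: builds `count` small event-like dicts."""
    return [{"type": "DNS_NAME", "data": f"host{i}.example.com"} for i in range(count)]


def measure_bytes_per_event(count):
    """Return (events, average traced bytes still held per event)."""
    tracemalloc.start()
    try:
        events = make_events(count)
        current, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return events, current / count


def bench_high_volume_memory(benchmark):
    """Body of a would-be pytest test that takes the `benchmark` fixture."""
    # pedantic() with rounds=1, iterations=1 runs the target exactly once
    # and returns whatever the target returned.
    events, per_event = benchmark.pedantic(
        measure_bytes_per_event, args=(5000,), rounds=1, iterations=1
    )
    # extra_info is a plain dict that pytest-benchmark writes into saved JSON,
    # so the figure is available to --benchmark-save / --benchmark-compare runs.
    benchmark.extra_info["bytes_per_event"] = round(per_event)
    assert len(events) == 5000
```

Renaming the function with a `test_` prefix would let pytest collect it once pytest-benchmark is installed.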

github-actions · 2026-03-31T02:00:36Z

Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a pull request comment in the format below.

I have read the CLA Document and I hereby sign the CLA

You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

github-actions · 2026-03-31T02:26:46Z

📊 Performance Benchmark Report

Comparing 3.0 (baseline) vs additional-memory-benchmarks (current)

📈 Detailed Results (All Benchmarks)

📋 Complete results for all benchmarks - includes both significant and insignificant changes

🧪 Test Name	📏 Base	📏 Current	📈 Change	🎯 Status
Bloom Filter Dns Mutation Tracking Performance	`4.27ms`	`4.19ms`	-1.8% ⚪	✅
Bloom Filter Large Scale Dns Brute Force	`17.53ms`	`17.29ms`	-1.4% ⚪	✅
Large Closest Match Lookup	`351.39ms`	`340.92ms`	-3.0% ⚪	✅
Realistic Closest Match Workload	`187.48ms`	`190.77ms`	+1.8% ⚪	✅
Event Memory Medium Scan	`1776 B/event`	`1776 B/event`	+0.0% ⚪	✅
Event Memory Large Scan	`1760 B/event`	`1760 B/event`	+0.0% ⚪	✅
Event Validation Full Scan Startup Small Batch	`405.67ms`	`419.24ms`	+3.3% ⚪	✅
Event Validation Full Scan Startup Large Batch	`580.37ms`	`579.83ms`	-0.1% ⚪	✅
Make Event Autodetection Small	`30.87ms`	`31.36ms`	+1.6% ⚪	✅
Make Event Autodetection Large	`317.61ms`	`316.74ms`	-0.3% ⚪	✅
Make Event Explicit Types	`14.00ms`	`14.06ms`	+0.4% ⚪	✅
Excavate Single Thread Small	`3.962s`	`3.905s`	-1.4% ⚪	✅
Excavate Single Thread Large	`9.587s`	`9.822s`	+2.4% ⚪	✅
Excavate Parallel Tasks Small	`4.152s`	`4.100s`	-1.3% ⚪	✅
Excavate Parallel Tasks Large	`7.240s`	`7.205s`	-0.5% ⚪	✅
Is Ip Performance	`3.18ms`	`3.17ms`	-0.3% ⚪	✅
Make Ip Type Performance	`11.45ms`	`11.60ms`	+1.3% ⚪	✅
Mixed Ip Operations	`4.51ms`	`4.55ms`	+0.9% ⚪	✅
Memory Use Web Crawl	`-`	`681ns`	New 🆕	🆕
Memory Use Subdomain Enum	`-`	`651ns`	New 🆕	🆕
Typical Queue Shuffle	`62.89µs`	`59.80µs`	-4.9% ⚪	✅
Priority Queue Shuffle	`722.42µs`	`687.79µs`	-4.8% ⚪	✅

🎯 Performance Summary

✅ No significant performance changes detected (all changes <10%)
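
For readers checking the Change column by hand, here is a minimal sketch of the arithmetic, assuming the percentage is the relative difference against the baseline and that the 10% cutoff quoted above is applied to its absolute value. `percent_change`, `classify`, and the status labels are hypothetical and not taken from the report generator.

```python
def percent_change(base, current):
    """Relative change of `current` against `base`, in percent."""
    return (current - base) / base * 100.0


def classify(base, current, threshold_pct=10.0):
    """Return (change, label); the 10% threshold mirrors the report text."""
    change = percent_change(base, current)
    if abs(change) < threshold_pct:
        return change, "insignificant"
    # For timings, a larger value is slower, so a positive change is a regression.
    return change, "regression" if change > 0 else "improvement"


# Values from the "Event Validation Full Scan Startup Small Batch" row.
change, label = classify(405.67, 419.24)
print(f"{change:+.1f}% {label}")
```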

🆕 New Tests

Memory Use Web Crawl: 681ns, 1468.4K ops/sec
Memory Use Subdomain Enum: 651ns, 1536.1K ops/sec

🐍 Python Version 3.11.15

aconite33 · 2026-03-31T02:54:07Z

recheck

aconite33 · 2026-03-31T02:55:23Z

recheck

codecov · 2026-03-31T02:56:05Z

Codecov Report

❌ Patch coverage is 20.86957% with 91 lines in your changes missing coverage. Please review.
✅ Project coverage is 91%. Comparing base (8b02acb) to head (32aa5a1).
⚠️ Report is 11 commits behind head on 3.0.

Files with missing lines	Patch %	Lines
bbot/test/benchmarks/_scan_memory_web_crawl.py	0%	49 Missing ⚠️
...bot/test/benchmarks/_scan_memory_subdomain_enum.py	0%	27 Missing ⚠️
bbot/test/benchmarks/test_scan_memory.py	50%	15 Missing ⚠️

Additional details and impacted files

@@          Coverage Diff           @@
##             3.0   #3001    +/-   ##
======================================
- Coverage     91%     91%    -0%     
======================================
  Files        436     439     +3     
  Lines      37072   37184   +112     
======================================
+ Hits       33677   33711    +34     
- Misses      3395    3473    +78


Scanner construction allocates 400+ MB in pytest (presets, module loading, etc.) which was setting the tracemalloc peak before any scan events existed, masking real differences between branches. Split scanner init out of the tracemalloc window so we measure only scan execution memory. Also separate "new tests" from "significant changes" in benchmark report output.
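
A minimal sketch of the idea, assuming the fix amounts to starting tracemalloc only after setup has finished. `build_scanner` and `run_scan` are hypothetical stand-ins with made-up sizes, not BBOT functions.

```python
import tracemalloc


def build_scanner():
    """Hypothetical stand-in for expensive scanner construction (8 MiB here)."""
    return bytearray(8 * 1024 * 1024)


def run_scan(scanner):
    """Hypothetical stand-in for scan execution (about 1 MiB of transient data)."""
    chunks = [bytes(1024) for _ in range(1024)]
    return len(chunks)


def measure_scan_peak():
    # Built before tracing starts, so this allocation is not counted.
    scanner = build_scanner()
    tracemalloc.start()
    try:
        run_scan(scanner)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    print(f"scan-only peak: {measure_scan_peak() / 1024:.0f} KiB")
```

Because tracemalloc only tracks allocations made while tracing is active, the reported peak reflects the scan stand-in and not the 8 MiB setup object.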

pytest's own allocations (~200 MB) contaminate tracemalloc peak measurements when scans run in-process, masking real differences between branches. Run each benchmark scan as a subprocess instead so measurements reflect only the scan's own memory use. Also rename tests to test_memory_use_* for clarity.
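
The subprocess approach could look roughly like the sketch below, assuming the child prints its result as JSON on stdout. `_CHILD_SCRIPT` and `measure_in_subprocess` are hypothetical, and the child workload is a placeholder rather than a real scan.

```python
import json
import subprocess
import sys

# Placeholder child script: traces its own allocations and prints the peak.
_CHILD_SCRIPT = """
import json, tracemalloc
tracemalloc.start()
data = [bytes(1024) for _ in range(2048)]  # stand-in for the scan workload
_current, peak = tracemalloc.get_traced_memory()
print(json.dumps({"peak_bytes": peak}))
"""


def measure_in_subprocess(script=_CHILD_SCRIPT):
    """Run `script` in a fresh interpreter and return its JSON result."""
    proc = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
        check=True,  # raise if the child exits non-zero
    )
    # The last stdout line carries the JSON payload.
    return json.loads(proc.stdout.strip().splitlines()[-1])


if __name__ == "__main__":
    print(measure_in_subprocess())
```

The child starts from a clean interpreter, so allocations made by the parent test runner never enter its measurement. The script is inlined here only to keep the sketch compact.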

IP addresses and DNS record type strings (A, AAAA, CNAME, etc.) repeat heavily across events. sys.intern() deduplicates them so all events sharing the same IPs/rdtypes reference the same string object, reducing memory ~10-30% on those fields.
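
To illustrate the effect, this small sketch shows two equal strings built at runtime becoming a single shared object after `sys.intern()`. `make_ip` is a hypothetical helper, and the example does not reproduce how BBOT applies interning to its event fields.

```python
import sys


def make_ip(octets):
    """Build an IP string at runtime, as a parser or resolver would."""
    return ".".join(str(o) for o in octets)


a = make_ip([192, 168, 1, 1])
b = make_ip([192, 168, 1, 1])
distinct_before = a is not b  # equal values, but two separate objects

a = sys.intern(a)
b = sys.intern(b)
shared_after = a is b  # both names now reference the interned copy

print(distinct_before, shared_after)
```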


TheTechromancer · 2026-04-01T15:33:47Z

+# 1) Web crawl -- httpx visits many pages, excavate processes bodies
+# ---------------------------------------------------------------------------
+
+_WEB_CRAWL_SCRIPT = """


can we break this out into a file?


aconite33 closed this Mar 31, 2026

aconite33 reopened this Mar 31, 2026

github-actions bot locked and limited conversation to collaborators Mar 31, 2026

blacklanternsecurity unlocked this conversation Mar 31, 2026

aconite33 closed this Mar 31, 2026

aconite33 reopened this Mar 31, 2026

github-actions bot locked and limited conversation to collaborators Mar 31, 2026

aconite33 closed this Mar 31, 2026

aconite33 reopened this Mar 31, 2026

Add scan memory benchmarks and MB support in benchmark report

cb5940b

liquidsec force-pushed the additional-memory-benchmarks branch from 1825523 to cb5940b Compare March 31, 2026 13:52

blacklanternsecurity unlocked this conversation Mar 31, 2026

liquidsec added 2 commits March 31, 2026 11:34

liquidsec force-pushed the additional-memory-benchmarks branch from aa7e2bc to 590e979 Compare March 31, 2026 19:48

liquidsec and others added 2 commits March 31, 2026 16:51

Merge pull request #3006 from blacklanternsecurity/additional-string-…

b5816e4

…interning Intern repeated strings in resolved_hosts and dns_children

TheTechromancer reviewed Apr 1, 2026


Extract embedded benchmark scripts into standalone files

32aa5a1

TheTechromancer approved these changes Apr 1, 2026


liquidsec merged commit 6b359ac into 3.0 Apr 1, 2026
15 of 16 checks passed

liquidsec deleted the additional-memory-benchmarks branch April 1, 2026 16:22

liquidsec merged 6 commits into 3.0 from additional-memory-benchmarks

3 participants
