feat: entropy-based secret detection for exception code variables by ablaszkiewicz · Pull Request #692 · PostHog/posthog-python

ablaszkiewicz · 2026-06-23T11:29:16Z

Currently we detect secrets based on keys and values having common secret phrases like api-key, password, etc...

@hpouillot got an amazing idea to do an entropy based secrets detection. This PR does that and extends our detection with popular secrets format like sk-ant, ey..., gh_..., glpat_....

There is a problem however - sometimes high entropy strings are classified as secrets. We do our best to detect genuine high entropy non-secrets and stop the entropy detection process if it's:

UUID,
file or URL path,
something [in these brackets] or in {these brackets}.

There are much more cases and rules - it's in the PR.

I also tightened our search limits based on real world exceptions.

Did a benchmark on 3 real exceptions. This computes how long does it take to capture variables from frames

Case	Before (master) (ms)	After (ms)	Δ (ms)
eventA	0.062	0.083	+0.021 (+34%)
eventB	0.040	0.053	+0.013 (+31%)
eventC	0.040	0.053	+0.012 (+31%)

greptile-apps · 2026-06-23T11:35:32Z

Comments Outside Diff (3)

posthog/test/test_code_variables.py, line 684-688 (link)

_is_high_entropy_secret called outside its documented pre-condition

_is_high_entropy_secret is documented as "Assumes len(value) >= _SECRET_MIN_LENGTH" and the only call site in production code respects this (the length guard in _looks_like_secret runs first). The test calls _is_high_entropy_secret("xK9#mP2$") — an 8-character string — directly, bypassing that guard. The function happens to return False because the entropy of 8 unique characters is ≈ 3.0 bits/char (below the 3.8 threshold), but the test is exercising behaviour that lies outside the function's contract. A future maintainer who tightens the generic gate could inadvertently shift the function's short-string behaviour without any failing tests.

Consider testing the cheap-exit via the public _looks_like_secret instead, which is the only intended caller and enforces the length precondition explicitly.
posthog/test/test_code_variables.py, line 726-744 (link)

No test for the per-context override of detect_secrets

The per-context toggle set_code_variables_detect_secrets_context is a new public API entry point wired through contexts.py and client.py, but TestSecretDetection only exercises the client-level option (code_variables_detect_secrets=False in make_client). A test that wraps a capture in a with posthog.new_context(): set_code_variables_detect_secrets_context(False) block and asserts the secret leaks through would close the remaining gap and mirror the coverage pattern the other context-level toggles already have.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
posthog/test/test_code_variables.py, line 679-692 (link)

Duplicate coverage of already-parameterised values

test_uuids_and_object_ids_are_never_flagged and test_pure_hex_is_treated_as_an_id_not_a_secret assert on exact string values (the UUID, Mongo ObjectId, and md5 hex digest) that are already present in NON_SECRETS and exercised by the parameterised test_non_secrets_are_not_flagged. This violates the OnceAndOnlyOnce rule and the team preference for parameterised tests over dedicated single-case tests. Either remove these methods and let the parameterised suite carry the coverage, or pull these "must-never-flag" entries into a named sub-list with their own parameterised test and remove the duplicates from NON_SECRETS.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

_{Reviews (1): Last reviewed commit: "feat: entropy-based secret detection for..." | Re-trigger Greptile}

github-actions · 2026-06-23T11:38:42Z

posthog-python Compliance Report

Date: 2026-06-23 16:33:49 UTC
Duration: 530093ms

✅ All Tests Passed!

45/45 tests passed

Capture Tests

✅ 29/29 tests passed

View Details

Test	Status	Duration
Format Validation.Event Has Required Fields	✅	516ms
Format Validation.Event Has Uuid	✅	10007ms
Format Validation.Event Has Lib Properties	✅	10007ms
Format Validation.Distinct Id Is String	✅	10007ms
Format Validation.Token Is Present	✅	10007ms
Format Validation.Custom Properties Preserved	✅	10008ms
Format Validation.Event Has Timestamp	✅	10007ms
Retry Behavior.Retries On 503	✅	18020ms
Retry Behavior.Does Not Retry On 400	✅	12003ms
Retry Behavior.Does Not Retry On 401	✅	10009ms
Retry Behavior.Respects Retry After Header	✅	16013ms
Retry Behavior.Implements Backoff	✅	30018ms
Retry Behavior.Retries On 500	✅	13009ms
Retry Behavior.Retries On 502	✅	16011ms
Retry Behavior.Retries On 504	✅	16011ms
Retry Behavior.Max Retries Respected	✅	30017ms
Deduplication.Generates Unique Uuids	✅	7002ms
Deduplication.Preserves Uuid On Retry	✅	16017ms
Deduplication.Preserves Uuid And Timestamp On Retry	✅	23019ms
Deduplication.Preserves Uuid And Timestamp On Batch Retry	✅	16005ms
Deduplication.No Duplicate Events In Batch	✅	10003ms
Deduplication.Different Events Have Different Uuids	✅	10006ms
Compression.Sends Gzip When Enabled	✅	10009ms
Batch Format.Uses Proper Batch Structure	✅	10007ms
Batch Format.Flush With No Events Sends Nothing	✅	5005ms
Batch Format.Multiple Events Batched Together	✅	10005ms
Error Handling.Does Not Retry On 403	✅	12009ms
Error Handling.Does Not Retry On 413	✅	10007ms
Error Handling.Retries On 408	✅	14014ms

Feature_Flags Tests

✅ 16/16 tests passed

View Details

Test	Status	Duration
Request Payload.Request With Person Properties Device Id	✅	9502ms
Request Payload.Flags Request Uses V2 Query Param	✅	10007ms
Request Payload.Flags Request Hits Flags Path Not Decide	✅	10006ms
Request Payload.Flags Request Omits Authorization Header	✅	10007ms
Request Payload.Token In Flags Body Matches Init	✅	10007ms
Request Payload.Groups Round Trip	✅	10007ms
Request Payload.Groups Default To Empty Object	✅	10006ms
Request Payload.Person Properties Distinct Id Auto Populated When Caller Omits It	✅	10007ms
Request Payload.Disable Geoip False Propagates As Geoip Disable False	✅	10007ms
Request Payload.Disable Geoip Omitted Defaults To False	✅	10006ms
Request Payload.Flag Keys To Evaluate Contains Only Requested Key	✅	10007ms
Request Lifecycle.No Flags Request On Init Alone	✅	5004ms
Request Lifecycle.No Flags Request On Normal Capture	✅	10507ms
Request Lifecycle.Two Flag Calls Produce Two Remote Requests	✅	9512ms
Request Lifecycle.Mock Response Value Is Returned To Caller	✅	10001ms
Side Effect Events.Get Feature Flag Captures Feature Flag Called Event	✅	10509ms

Add a last-resort entropy-based detector that redacts high-entropy, secret-looking values (API keys, tokens, strong passwords) sitting in innocuously-named code variables, after the existing name-pattern and URL-credential checks. - Known vendor key formats (OpenAI, Anthropic, AWS, Stripe, GitHub, GitLab, Slack, Google, JWT, PEM private keys) are matched directly. - Structured identifiers (UUIDs, Mongo ObjectIds, hashes), object reprs, file paths and URLs are never flagged. - Exposed as the `code_variables_detect_secrets` option (default True) with a per-context override, threaded through client/contexts. - Tighten the masking size caps to keep capture cost bounded. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

ablaszkiewicz · 2026-06-23T12:07:30Z

+
+# Synthetic, format-correct fakes (no real credentials). Vendor keys are assembled from
+# prefix + body so no complete secret literal lives in source (which trips secret scanners).
+def _key(prefix, body):


I had to split secrets into two parts. Otherwise GitHub wasn't happy

greptile-apps · 2026-06-23T12:16:04Z

_{Reviews (2): Last reviewed commit: "feat: entropy-based secret detection for..." | Re-trigger Greptile}

cat-ph

love this!! 🚀🚀

An opaque object whose __repr__ returns a bare high-entropy token bypassed detection, since _safe_repr only checked the name/keyword mask. Run _looks_like_secret on the rendered repr too; normal reprs (with parens/brackets/ angle-brackets) are rejected by the repr-punctuation guard, so only bare-token reprs are caught. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

marandaneto · 2026-06-23T14:10:31Z

i've used https://github.com/Yelp/detect-secrets/tree/master/detect_secrets/plugins (see regexes) in the past, not sure if its up to date tho

Add high-confidence, distinctive-prefix patterns adapted from the gitleaks / detect-secrets rule sets: Hugging Face, Google OAuth, DigitalOcean, Square, Grafana, Twilio, SendGrid, Mailgun, Mailchimp, npm, PyPI, Databricks, Doppler, Postman, Linear, Notion, Shopify, New Relic, and more GitLab/Slack variants. The entropy gate already catches most high-entropy keys generically; the known list mainly adds deterministic guarantees and covers low-character-class formats (e.g. AWS AKIA) the entropy gate rejects. Runs only on the fallback path, so no measurable perf change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

ablaszkiewicz · 2026-06-23T16:33:06Z

Thanks @marandaneto I borrowed some stuff from them 8)

ablaszkiewicz force-pushed the code-variables-entropy-detection branch from a245e6e to d206ea9 Compare June 23, 2026 11:51

ablaszkiewicz force-pushed the code-variables-entropy-detection branch from d206ea9 to 615244d Compare June 23, 2026 12:07

ablaszkiewicz commented Jun 23, 2026

View reviewed changes

ablaszkiewicz marked this pull request as ready for review June 23, 2026 12:08

ablaszkiewicz requested a review from a team as a code owner June 23, 2026 12:08

ablaszkiewicz requested review from a team, cat-ph and hpouillot June 23, 2026 12:08

ablaszkiewicz mentioned this pull request Jun 23, 2026

docs: automatic secret detection for Python code variables PostHog/posthog.com#17848

Merged

cat-ph approved these changes Jun 23, 2026

View reviewed changes

Comment thread posthog/exception_utils.py

hpouillot approved these changes Jun 23, 2026

View reviewed changes

marandaneto approved these changes Jun 23, 2026

View reviewed changes

ablaszkiewicz merged commit 42ff4ca into main Jun 23, 2026
31 checks passed

ablaszkiewicz deleted the code-variables-entropy-detection branch June 23, 2026 17:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: entropy-based secret detection for exception code variables#692

feat: entropy-based secret detection for exception code variables#692
ablaszkiewicz merged 3 commits into
mainfrom
code-variables-entropy-detection

ablaszkiewicz commented Jun 23, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Jun 23, 2026 •

edited

Loading

Comments Outside Diff (3)

Uh oh!

github-actions Bot commented Jun 23, 2026 •

edited

Loading

Uh oh!

ablaszkiewicz Jun 23, 2026

Uh oh!

greptile-apps Bot commented Jun 23, 2026

Uh oh!

cat-ph left a comment

Uh oh!

Uh oh!

marandaneto commented Jun 23, 2026 •

edited

Loading

Uh oh!

ablaszkiewicz commented Jun 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

ablaszkiewicz commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

greptile-apps Bot commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Comments Outside Diff (3)

Uh oh!

github-actions Bot commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

posthog-python Compliance Report

✅ All Tests Passed!

Capture Tests

Feature_Flags Tests

Uh oh!

ablaszkiewicz Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot commented Jun 23, 2026

Uh oh!

cat-ph left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

marandaneto commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ablaszkiewicz commented Jun 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

ablaszkiewicz commented Jun 23, 2026 •

edited

Loading

greptile-apps Bot commented Jun 23, 2026 •

edited

Loading

github-actions Bot commented Jun 23, 2026 •

edited

Loading

marandaneto commented Jun 23, 2026 •

edited

Loading