Redact sensitive SQL password phrases#18754
Conversation
4169833 to
fb9d19f
Compare
fb9d19f to
137caa9
Compare
There was a problem hiding this comment.
Pull request overview
This PR tightens SQL query sanitization to better redact plaintext passwords in administrative/non-data SQL statements (e.g., SAP HANA ... PASSWORD ... and Oracle-style IDENTIFIED BY ...) while preserving more useful sanitized query text and query summaries.
Changes:
- Update SQL sanitizer lexers to truncate (redact remainder) after
PASSWORDandIDENTIFIED BYin non-data contexts. - Adjust handling of
USERto improve CREATE/ALTER USER query summary extraction. - Expand regression tests to cover additional password-bearing admin forms and add controls for non-sensitive
password/identified_byidentifiers.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| instrumentation-api-incubator/src/main/jflex/SqlSanitizer.jflex | Adds PASSWORD/IDENTIFIED BY truncation logic and a dedicated USER keyword handler. |
| instrumentation-api-incubator/src/main/jflex/SqlSanitizerWithSummary.jflex | Mirrors the truncation logic in the summary-producing sanitizer and updates USER handling. |
| instrumentation-api-incubator/src/test/java/io/opentelemetry/instrumentation/api/incubator/semconv/db/SqlQueryAnalyzerTest.java | Extends sensitive-statement test coverage for additional password forms and non-sensitive controls. |
Comments suppressed due to low confidence (2)
instrumentation-api-incubator/src/main/jflex/SqlSanitizer.jflex:517
shouldSanitizeRemainderAfterSensitivePhrase()is used to trigger early truncation when encountering the tokenPASSWORD. Because it only checks the operation type, any non-DML statement (e.g.,CREATE TABLE ...) that contains an unquoted identifier namedpasswordwill now be truncated at that point, even when no secret value follows. This looks like an overbroad match for a keyword that is also a common identifier; consider adding additional context checks (e.g., only trigger at top level likeparenLevel == 0, and/or only when the grammar position indicates a password clause) and add a regression test forCREATE TABLE users (password VARCHAR(...)).
"PASSWORD" {
appendCurrentFragment();
if (shouldSanitizeRemainderAfterSensitivePhrase()) {
builder.append(" ?");
return YYEOF;
}
instrumentation-api-incubator/src/main/jflex/SqlSanitizerWithSummary.jflex:1038
- Similar to the non-summary sanitizer, truncating on the bare token
PASSWORDfor any non-DML operation is likely overbroad for query-summary mode too (e.g.,CREATE TABLE users (password VARCHAR(...))would stop at the column name). Consider tightening the condition (e.g.,parenLevel == 0and/or only in known admin/DDL password-clause contexts) and adding a regression test that ensures common schema DDL with apasswordcolumn is not truncated.
"PASSWORD" {
appendCurrentFragment();
if (shouldSanitizeRemainderAfterSensitivePhrase()) {
builder.append(" ?");
return YYEOF;
}
| } | ||
| if (isOverLimit()) return YYEOF; | ||
| } | ||
| "IDENTIFIED" {WHITESPACE}+ "BY" { |
There was a problem hiding this comment.
I guess you could have a comment between identified and by, but I think we can ignore that
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
instrumentation-api-incubator/src/main/jflex/SqlSanitizer.jflex:474
- [Security] The admin-keyword rules only enable PASSWORD/IDENTIFIED BY remainder redaction when
operation == NoOp.INSTANCE. After a semicolon,operationis intentionally not reset (old-semconv extraction sticks to the first statement), so a sensitive admin/DDL command that appears in a subsequent statement (e.g.,SELECT 1; CREATE USER u PASSWORD secret) will not enable redaction and can leak unquoted passwords in the sanitized query text. Consider tracking “start of statement” separately (reset on;outside comments, ignoring whitespace) and using that to gate these flags, instead ofoperation == NoOp.INSTANCE.
"CONNECT" | "VALIDATE" | "CHECK" | "EXPORT" | "IMPORT" | "RECOVER" {
if (!insideComment && operation == NoOp.INSTANCE) {
passwordSanitizationEnabled = true;
identifiedBySanitizationEnabled = true;
}
Redact values following sensitive SQL password phrases in non-data statements so SAP HANA administrative commands and Oracle-style IDENTIFIED BY clauses do not expose unquoted or double-quoted passwords in sanitized query text. CREATE/ALTER USER statements now redact at PASSWORD or IDENTIFIED BY instead of stopping at USER, and regression coverage includes the additional password-bearing forms plus controls for normal password column usage.