Experimental: Symbol-based usage detection (opt-in) #135

Draft
dfederm wants to merge 1 commit into main from dfederm/symbol-based-usage-detection

dfederm commented Apr 10, 2026

Summary

Adds an experimental symbol-based analysis mode behind ReferenceTrimmerUseSymbolAnalysis (opt-in, defaults to false). The legacy GetUsedAssemblyReferences code path is preserved as the default.

Motivation

GetUsedAssemblyReferences over-reports usage by treating transitive assembly dependencies as "used" even when the project's code doesn't reference them directly.

Approach

Uses RegisterSymbolAction + RegisterOperationAction to track which assemblies contain symbols that code actually references. Safety measures for runtime deps: RT0001 uses conservative transitive closure for bare References; RT0002 respects DisableTransitiveProjectReferences; RT0003 uses precise detection (NuGet handles transitive deps).
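
To make the approach concrete, here is a minimal sketch of how an analyzer might combine RegisterSymbolAction and RegisterOperationAction to record which assemblies the code touches. It is not the code from this PR: the class name, the `DEMO0001` diagnostic ID, and the messages are hypothetical, and the tracking is deliberately incomplete (it ignores generic type arguments, array element types, attributes, and implicitly required framework references).

```csharp
#nullable enable
using System.Collections.Concurrent;
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Diagnostics;
using Microsoft.CodeAnalysis.Operations;

// Hypothetical analyzer for illustration only; not the implementation in this PR.
[DiagnosticAnalyzer(LanguageNames.CSharp)]
public sealed class UsedAssemblySketchAnalyzer : DiagnosticAnalyzer
{
    // Placeholder descriptor; the ID and messages are made up for this sketch.
    private static readonly DiagnosticDescriptor UnusedReference = new DiagnosticDescriptor(
        id: "DEMO0001",
        title: "Reference appears unused",
        messageFormat: "Reference '{0}' appears unused",
        category: "Usage",
        defaultSeverity: DiagnosticSeverity.Warning,
        isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
        => ImmutableArray.Create(UnusedReference);

    public override void Initialize(AnalysisContext context)
    {
        context.EnableConcurrentExecution();
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.Analyze);

        context.RegisterCompilationStartAction(start =>
        {
            // Assemblies whose symbols were seen; shared across concurrent callbacks.
            var used = new ConcurrentDictionary<AssemblyIdentity, byte>();

            void Track(ISymbol? symbol)
            {
                IAssemblySymbol? assembly = symbol?.ContainingAssembly;
                if (assembly != null) used.TryAdd(assembly.Identity, 0);
            }

            // Declarations: base types, interfaces, and member signatures.
            start.RegisterSymbolAction(ctx =>
            {
                switch (ctx.Symbol)
                {
                    case INamedTypeSymbol type:
                        Track(type.BaseType);
                        foreach (var iface in type.Interfaces) Track(iface);
                        break;
                    case IMethodSymbol method:
                        Track(method.ReturnType);
                        foreach (var p in method.Parameters) Track(p.Type);
                        break;
                    case IFieldSymbol field: Track(field.Type); break;
                    case IPropertySymbol property: Track(property.Type); break;
                }
            }, SymbolKind.NamedType, SymbolKind.Method, SymbolKind.Field, SymbolKind.Property);

            // Executable code: calls, object creation, and member references.
            start.RegisterOperationAction(ctx =>
            {
                switch (ctx.Operation)
                {
                    case IInvocationOperation call: Track(call.TargetMethod); break;
                    case IObjectCreationOperation creation: Track(creation.Constructor); break;
                    case IMemberReferenceOperation member: Track(member.Member); break;
                }
            }, OperationKind.Invocation, OperationKind.ObjectCreation,
               OperationKind.FieldReference, OperationKind.PropertyReference,
               OperationKind.MethodReference, OperationKind.EventReference);

            // After all callbacks have run, flag references that were never seen.
            start.RegisterCompilationEndAction(end =>
            {
                foreach (MetadataReference reference in end.Compilation.References)
                {
                    if (end.Compilation.GetAssemblyOrModuleSymbol(reference) is IAssemblySymbol assembly
                        && !used.ContainsKey(assembly.Identity))
                    {
                        end.ReportDiagnostic(
                            Diagnostic.Create(UnusedReference, Location.None, reference.Display));
                    }
                }
            });
        });
    }
}
```

A sketch this simple would flag any reference whose symbols never appear by name in the code, which is why the conservative handling of runtime dependencies described above matters.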

Opt-in

```xml
<PropertyGroup>
  <ReferenceTrimmerUseSymbolAnalysis>true</ReferenceTrimmerUseSymbolAnalysis>
</PropertyGroup>
```

Testing

All E2E tests run in both modes via DataRow parameterization (91 pass). New test UnusedDirectReferenceUsedTransitively validates the key improvement.

Rollout plan

  1. Ship as opt-in in 3.5
  2. Enable on key repos, fix bugs
  3. If successful, make default in a future major version

Replace the GetUsedAssemblyReferences approach with a Roslyn analyzer that tracks
symbol usage at finer granularity, behind the ReferenceTrimmerUseSymbolAnalysis
MSBuild property (opt-in, defaults to false).

The new approach uses RegisterSymbolAction and RegisterOperationAction to track
which assemblies contain symbols that the code actually references, rather than
relying on the compiler's broader 'used assembly' heuristic which over-reports
usage by treating transitive assembly dependencies as used.

Key design decisions:
- RT0001 (bare Reference): always uses conservative transitive closure to avoid
  breaking runtime dependencies that lack automatic transitive resolution
- RT0002 (ProjectReference): uses transitive closure only when
  DisableTransitiveProjectReferences is set; otherwise uses precise detection
- RT0003 (PackageReference): always uses precise symbol-based detection since
  NuGet handles transitive package deps automatically
- Attribute constructor/named arguments (including typeof) are tracked
- Early exit optimization when all reference assemblies are already tracked
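
The per-diagnostic rules above can be restated as a small decision function. This is only a sketch of the stated policy, not code from the PR; `ReferenceKind`, `DetectionPolicy`, and `UseTransitiveClosure` are hypothetical names.

```csharp
using System;

// Hypothetical names throughout; this only restates the rules listed above.
public enum ReferenceKind { Reference, ProjectReference, PackageReference }

public static class DetectionPolicy
{
    // Returns true when the conservative transitive closure should be used,
    // false when precise symbol-based detection is used instead.
    public static bool UseTransitiveClosure(ReferenceKind kind, bool disableTransitiveProjectReferences)
    {
        switch (kind)
        {
            // RT0001: bare References have no automatic transitive resolution.
            case ReferenceKind.Reference:
                return true;
            // RT0002: closure only when transitive project references are disabled.
            case ReferenceKind.ProjectReference:
                return disableTransitiveProjectReferences;
            // RT0003: NuGet resolves transitive package dependencies itself.
            case ReferenceKind.PackageReference:
                return false;
            default:
                throw new ArgumentOutOfRangeException(nameof(kind));
        }
    }
}
```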

The legacy GetUsedAssemblyReferences code path is preserved as the default.
All E2E tests run in both modes via DataRow parameterization (91 pass).
Version bumped from 3.4 to 3.5.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
// Build mappings from reference assembly identities to their metadata reference display paths.
// These are used both for symbol tracking and for the transitive closure computation.
var assemblyToPath = new Dictionary<AssemblyIdentity, string>();
var pathToAssembly = new Dictionary<string, IAssemblySymbol>(StringComparer.OrdinalIgnoreCase);

Consider using a case-insensitive comparer on Windows and a case-sensitive comparer on non-Windows platforms.


This also applies to the other comparers used in this project.
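
One way the suggestion could look, as a sketch only: `PathComparers` and its members are hypothetical names, not part of this project. It uses `RuntimeInformation.IsOSPlatform`, which is available to netstandard2.0 analyzers. Note that the OS is only a heuristic for file-system case sensitivity (for example, the default macOS file system is case-insensitive).

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper; the names are illustrative and not taken from the project.
public static class PathComparers
{
    // Pure selection logic, kept separate so it can be exercised on any OS.
    public static StringComparer For(bool isWindows)
        => isWindows ? StringComparer.OrdinalIgnoreCase : StringComparer.Ordinal;

    // Comparer for the platform the build is currently running on.
    public static StringComparer Current
        => For(RuntimeInformation.IsOSPlatform(OSPlatform.Windows));
}
```

The dictionary above could then be created as `new Dictionary<string, IAssemblySymbol>(PathComparers.Current)`.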


stan-sz commented Apr 14, 2026

Linking to dotnet/roslyn#625 as the original issue in the Roslyn repo. Tagging @AlekseyTs (the author of GetUsedAssemblyReferences) for comments on this new approach.

@AlekseyTs

> Approach
>
> Uses RegisterSymbolAction + RegisterOperationAction to track which assemblies contain symbols that code actually references.

If I interpret the approach correctly, I think it is likely to under-report the references needed for a successful build. There are situations in which the compiler needs information from types or assemblies that aren't explicitly mentioned in code.
