Enhance seeder with additional cipher types and architectural refactorings#6935
Enhance seeder with additional cipher types and architectural refactorings#6935theMickster merged 6 commits intomainfrom
Conversation
| return value; | ||
| } | ||
| } | ||
| return _buckets[^1].Value; |
There was a problem hiding this comment.
📝 Return last bucket for out-of-range indices (graceful degradation). This handles rounding errors where cumulative counts don't exactly match total.
| /// The workaround is a known test workaround that doesn't affect production code. | ||
| /// </summary> | ||
| [Fact] | ||
| public void SecureNote_SameSeed_ProducesSameOutput() |
There was a problem hiding this comment.
📝 Bogus 35.0+ has an edge case where Password() can infinite loop with certain seed/index combinations in WiFi/Database note categories. Limiting iterations is a test-only workaround that doesn't affect production usage where variety is high.
There was a problem hiding this comment.
♻️ Upcoming work
TODO: Add password override parameter
Currently all seeded users share the same password for dev convenience.
Future: Support per-user or pattern-based passwords (e.g., "Pass-{username}") for more realistic testing scenarios.
| @@ -32,16 +32,16 @@ public class CipherViewDto | |||
| public LoginViewDto? Login { get; set; } | |||
There was a problem hiding this comment.
♻️ Potential upcoming work
TODO: Add TOTP (2FA) support to LoginViewDto
Real logins often have Time-based One-Time Password seeds.
Need TotpSeed property (string, optional) that generators can populate with realistic authenticator seeds (e.g., "otpauth://totp/...")
This enables testing of TOTP generation/validation in vault scenarios.
No clue how to do this yet, but together we can figure it out.
There was a problem hiding this comment.
There are a good number of ways to use a seed to make TOTPs. This feels especially low-priority to need to test to me, as we're not testing the user's security itself. I suppose some use cases will "need" it but not totally sure what those are, in a hard-required sense.
There was a problem hiding this comment.
Sure thing; I wasn't sure where it fit in the priority list. Thanks for the info 😃
|
New Issues (1)Checkmarx found the following issues in this Pull Request
Fixed Issues (2)Great job! The following issues were fixed in this Pull Request
|
There was a problem hiding this comment.
Before this class, recipes had 7+ fields for generators and a long InitializeGenerators() method. Now recipes have 1 field (_ctx) andgenerators are accessed via properties.
Design goal: Make adding generators a 3-line change (field, property, done) without touching recipe constructor or initialization logic.
|
|
||
| namespace Bit.Seeder.Data.Generators; | ||
|
|
||
| /// <summary> |
There was a problem hiding this comment.
📝 SSH keys use a pre-generated pool because RSA.Create() uses system RNG (not our seed). Keys are marked with -----BEGIN FAKE RSA PRIVATE KEY----- headers to prevent accidental production use. Cross-run reproducibility is not guaranteed for SSH key content.
We really just want nice looking fake data; we don't want some testing tool using them.
That is so far out of scope right now; it's not even on the horizon
There was a problem hiding this comment.
Does the "fake" addition really stop anyone from using it though? And what do scanners say? I would want it to pass and validators our user experience might have first.
There was a problem hiding this comment.
No, I don't think it would stop anyone from using them. I will look closer into our static code analyzers to see if anything barks. In terms of the UI, it 'seems' to pass validation on the frontend (as in I don't see any toaster messages) when I open the modal.
There was a problem hiding this comment.
♻️Upcoming work
No organizational policies are currently configured. We'll add policy seeding (2FA requirements, Master Password policies, etc.).
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #6935 +/- ##
==========================================
- Coverage 60.01% 56.23% -3.79%
==========================================
Files 1971 1976 +5
Lines 87090 87523 +433
Branches 7759 7809 +50
==========================================
- Hits 52266 49216 -3050
- Misses 32928 36478 +3550
+ Partials 1896 1829 -67 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Claude finished @theMickster's task in 3m 42s —— View job Code Review CompleteI've conducted a thorough review of this PR which enhances the Seeder library with additional cipher types and architectural refactorings. Overall, this is a well-structured PR with good separation of concerns and comprehensive test coverage. Below are my findings organized by priority. ✅ What's Good
🟡 Suggestions (Non-Blocking)1. FolderNameGenerator Pool Size IssueFile: The _folderNames = Enumerable.Range(0, _namePoolSize)
.Select(_ => faker.Commerce.Department())
.Distinct()
.ToArray();Consider generating more names initially (e.g., 100) to ensure the target unique count after deduplication, or document the expected minimum. 2. Distribution.GetCounts SimplificationFile: Per @MGibson1's review comment, the last bucket handling could be simplified. The current implementation is correct but slightly more complex than necessary. 3. Double Mangling in OrganizationWithVaultRecipeFile: The email is mangled explicitly, then passed to var email = $"user{i}@{options.Domain}";
var mangledEmail = manglerService.Mangle(email);
var userKeys = RustSdkService.GenerateUserKeys(mangledEmail, UserSeeder.DefaultPassword);
var memberUser = UserSeeder.Create(mangledEmail, passwordHasher, manglerService, keys: userKeys);Since 🔴 Issues to Address1. Hardcoded Password - Checkmarx FindingFile: internal const string DefaultPassword = "asdfasdfasdf";This is flagged by Checkmarx as
The current approach is acceptable for a seeder utility, but the warning should be acknowledged or suppressed with documentation. 2. Potential Thread Safety Issue in SshKeyDataGeneratorFile: The lazy initialization is thread-safe, but if multiple threads access
This is minor since it only affects first access. 📝 Architectural NotesThe discussion between reviewers about mangling responsibilities has been well-resolved:
The SummaryThis is a solid PR that significantly improves the Seeder library's architecture. The main actionable item is addressing the Checkmarx warning for the hardcoded password - even if just with documentation/suppression. The other suggestions are non-blocking improvements. Recommended: Address the hardcoded password warning, then this PR is ready to merge. |
MGibson1
left a comment
There was a problem hiding this comment.
I think the discrepency on mangling is really the only thing I'd like to get aligned with.
As I mention in one of the comments these seeders have two use-cases
- to create development environments that are human-usable
- to create single-use, parallelizable test data where magic strings exist only to make tests more readable and the actual value doesn't matter
I'd like our seeders to handle both cases because they are the location of all the expertise in creating the object, moving that out of seeders just makes things more distributed and muddied.
I like the proposed solution of a ManglerService that a given seeder can always call, but only mangles based on execution context. I believe it is true that we will either want everything mangled or nothing mangled.
| /// <summary> | ||
| /// Creates a distribution from percentage buckets. | ||
| /// </summary> | ||
| /// <param name="buckets">Value-percentage pairs that must sum to 1.0 (within 0.001 tolerance).</param> | ||
| /// <exception cref="ArgumentException">Thrown when percentages don't sum to 1.0.</exception> | ||
| public Distribution(params (T Value, double Percentage)[] buckets) | ||
| { | ||
| var total = buckets.Sum(b => b.Percentage); | ||
| if (Math.Abs(total - 1.0) > 0.001) | ||
| { | ||
| throw new ArgumentException($"Percentages must sum to 1.0, got {total}"); | ||
| } | ||
| _buckets = buckets; | ||
| } |
There was a problem hiding this comment.
todo: just normalize the buckets when you get them to remove the restriction
There was a problem hiding this comment.
Will you please explain and clarify what you mean here? I'm not tracking ya
| /// <summary> | ||
| /// Returns all values with their calculated counts for a given total. | ||
| /// The last bucket receives any remainder from rounding. | ||
| /// </summary> | ||
| /// <param name="total">Total number of items to distribute.</param> | ||
| /// <returns>Sequence of value-count pairs.</returns> | ||
| public IEnumerable<(T Value, int Count)> GetCounts(int total) | ||
| { | ||
| var remaining = total; | ||
| for (var i = 0; i < _buckets.Length - 1; i++) | ||
| { | ||
| var count = (int)(total * _buckets[i].Percentage); | ||
| yield return (_buckets[i].Value, count); | ||
| remaining -= count; | ||
| } | ||
| yield return (_buckets[^1].Value, remaining); | ||
| } |
There was a problem hiding this comment.
nit: I don't think the last bucket is a special case
| /// <summary> | |
| /// Returns all values with their calculated counts for a given total. | |
| /// The last bucket receives any remainder from rounding. | |
| /// </summary> | |
| /// <param name="total">Total number of items to distribute.</param> | |
| /// <returns>Sequence of value-count pairs.</returns> | |
| public IEnumerable<(T Value, int Count)> GetCounts(int total) | |
| { | |
| var remaining = total; | |
| for (var i = 0; i < _buckets.Length - 1; i++) | |
| { | |
| var count = (int)(total * _buckets[i].Percentage); | |
| yield return (_buckets[i].Value, count); | |
| remaining -= count; | |
| } | |
| yield return (_buckets[^1].Value, remaining); | |
| } | |
| /// <summary> | |
| /// Returns all values with their calculated counts for a given total. | |
| /// The last bucket receives any remainder from rounding. | |
| /// </summary> | |
| /// <param name="total">Total number of items to distribute.</param> | |
| /// <returns>Sequence of value-count pairs.</returns> | |
| public IEnumerable<(T Value, int Count)> GetCounts(int total) | |
| { | |
| for (var i = 0; i < _buckets.Length; i++) | |
| { | |
| var count = (int)(total * _buckets[i].Percentage); | |
| yield return (_buckets[i].Value, count); | |
| } | |
| } |
There was a problem hiding this comment.
Does it still work as it does now with the same distributions?
If so then I'm fine with the change 😃
| /// Realistic distribution based on breach data and security research. | ||
| /// 25% VeryWeak, 30% Weak, 25% Fair, 15% Strong, 5% VeryStrong |
There was a problem hiding this comment.
Claude.
I didn't waste time researching nor really mind what the total are in the end as I just wanted something that smelled reasonable to make realistic vault data.
Eventually, we'll have these will be inputs to the recipe.
There was a problem hiding this comment.
A ⛏️ like my other comment -- and this can just be a future PR to tweak -- but I think / hope this is skewed more toward fair. This makes me want to suggest something more aligned with your "developer" option above and to have additional distributions for "power users" or similar.
There was a problem hiding this comment.
I agree that with these distributions we can have all sorts of different distributions for all different types of user vaults and organization vaults. And yes, this is not something to spend too much thought time on at this point.
We can have the 'boomer' distribution that leans towards the less secure side and then the 'bitwarden' distribution that's on the way secure side. And these distributions can all be easily added and maintained all outside the main recipes & steps. 😃
| /// </remarks> | ||
| public class EmailMangler | ||
| { | ||
| private readonly MangleId _mangleId = new(); |
There was a problem hiding this comment.
This should probably just be the play id, do we want to ever mangle things that aren't associated with a play?
Works for me. I'm for a chat. Also, what other data points other than the email address must be mangled? |
withinfocus
left a comment
There was a problem hiding this comment.
A few low-priority comments.
I tend to agree with Mick's approach that we'd have specific manglers and use those sometimes or not at all; ome use cases will have mangling and some data just doesn't need it or would be harmless.
Despite some inefficiencies are the "ingredients" necessary for further breakdown? What are the scenarios we envision implementing that require a different composition?
| { | ||
| /// <summary> | ||
| /// Realistic enterprise mix based on typical usage patterns. | ||
| /// 60% Login, 15% SecureNote, 12% Card, 10% Identity, 3% SshKey |
There was a problem hiding this comment.
ℹ️ Seems ... inaccurate to me. Something to ask about internally.
There was a problem hiding this comment.
Fair point and likely correct.
Without any access to environments outside my own, I took a random guess to get some data that 'looked' normal with plenty of examples of things other than logins.
| /// Realistic distribution based on breach data and security research. | ||
| /// 25% VeryWeak, 30% Weak, 25% Fair, 15% Strong, 5% VeryStrong |
There was a problem hiding this comment.
A ⛏️ like my other comment -- and this can just be a future PR to tweak -- but I think / hope this is skewed more toward fair. This makes me want to suggest something more aligned with your "developer" option above and to have additional distributions for "power users" or similar.
|
|
||
| namespace Bit.Seeder.Data.Generators; | ||
|
|
||
| /// <summary> |
There was a problem hiding this comment.
Does the "fake" addition really stop anyone from using it though? And what do scanners say? I would want it to pass and validators our user experience might have first.
| @@ -32,16 +32,16 @@ public class CipherViewDto | |||
| public LoginViewDto? Login { get; set; } | |||
There was a problem hiding this comment.
There are a good number of ways to use a seed to make TOTPs. This feels especially low-priority to need to test to me, as we're not testing the user's security itself. I suppose some use cases will "need" it but not totally sure what those are, in a hard-required sense.
| internal FolderNameGenerator(int seed) | ||
| { | ||
| var faker = new Faker { Random = new Randomizer(seed) }; | ||
|
|
There was a problem hiding this comment.
🎨 SUGGESTED: The .Distinct() call on line 19 could result in fewer unique names if Commerce.Department() returns duplicates. This means _folderNames.Length may be less than _namePoolSize (50), potentially causing less diverse folder names than expected.
Consider either:
- Generating more names initially (e.g., 100) to ensure 50+ unique values after deduplication
- Documenting the expected minimum unique count
This is a minor edge case - the current behavior is functional, just may produce less variety than the constant suggests.
MGibson1
left a comment
There was a problem hiding this comment.
Commenting for your response to MangleId class access modifier.
| // When keys are provided, caller owns email/key consistency - don't mangle | ||
| var mangledEmail = keys == null ? manglerService.Mangle(email) : email; | ||
|
|
||
| keys ??= RustSdkService.GenerateUserKeys(mangledEmail, DefaultPassword); |
There was a problem hiding this comment.
Non-blocking question: Under what circumstances are you expecting user keys to be provided?
Who is meant to produce those keys and how to they mangle to ensure tests can be executed in parallel?
There was a problem hiding this comment.
We needed the symmetric key in the Recipe to make folders (at least I'm fairly certain we do).
And, it just felt a little odd for the Seeder to return a UserWithKeys or some tuple.
withinfocus
left a comment
There was a problem hiding this comment.
Works for me, as for Gibson it seems.
939635b


💡 Before starting your review, remember that Rome wasn't built in a day nor will the Seeder be in a single pull request
🎟️ Tracking
📔 Objective
Implement required refactoring to decouple and standardize many of the components that make up the Seeder library.
CipherSeeder.csCode Quality
- Refactored all factories to consistent
Create()naming pattern- Unified seeder visibility to
internalwith properInternalsVisibleTofor tests- Split relationship entities into dedicated seeders (CollectionUserSeeder, GroupUserSeeder)
- Extracted organizational structure data to
Data/Static/Documentation
- Updated CLAUDE.md to focus on quick reference and pattern decisions
- Enhanced README.md with Helpers section and folder structure guide
- Created comprehensive Data/README.md with generator/distribution tables
- Added DbSeederUtility/CLAUDE.md for CLI-specific guidance
📝 I executed 7-10 local code reviews while working through this pull request using the out-of-the-box Claude Code Review command & our Bitwarden Claude Code command. I resolved various findings and architectural suggestions as I went.