perf(l1): reuse BAL for witness generation#6406
perf(l1): reuse BAL for witness generation#6406ArshLabs wants to merge 1 commit intolambdaclass:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR optimizes L1/L2 execution witness generation by removing the DatabaseLogger wrapper and instead deriving accessed state (accounts/storage slots) from the Block Access List (BAL) produced by execute_block() for Amsterdam+ blocks, reducing per-DB-call locking overhead.
Changes:
- Stop wrapping the VM database with
DatabaseLogger; execute blocks with a plainStoreVmDatabase. - Build the
state_accessedmap frombal.accounts()+all_storage_slots()instead of logger-captured accesses. - Adjust witness assembly to (a) gather bytecodes via account state/code lookups and (b) include a conservative range of ancestor headers to cover
BLOCKHASH.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Look up the code hash from the account state in storage | ||
| if let Ok(Some(account_state)) = | ||
| self.storage.get_account_state(parent_hash, ac.address) | ||
| { | ||
| if account_state.code_hash != *EMPTY_TRIE_HASH | ||
| && seen_code_hashes.insert(account_state.code_hash) | ||
| { |
There was a problem hiding this comment.
self.storage.get_account_state(parent_hash, ac.address) appears to be calling Store::get_account_state, but in crates/storage/store.rs that method is async fn get_account_state(block_number, address), not a sync lookup by block hash. As written this should not compile (wrong argument type + missing .await). Use a synchronous API here (e.g., get_account_state_by_root(parent_header.state_root, ...), or open the state trie for the relevant root) and pass the correct empty-code sentinel (EMPTY_KECCACK_HASH, not EMPTY_TRIE_HASH).
| // Include codes from accounts that have code changes in the BAL | ||
| for ac in bal.as_ref().into_iter().flat_map(|b| b.accounts()) { | ||
| if !ac.code_changes.is_empty() { | ||
| // Look up the code hash from the account state in storage | ||
| if let Ok(Some(account_state)) = | ||
| self.storage.get_account_state(parent_hash, ac.address) | ||
| { | ||
| if account_state.code_hash != *EMPTY_TRIE_HASH | ||
| && seen_code_hashes.insert(account_state.code_hash) | ||
| { | ||
| if let Ok(Some(code)) = | ||
| self.storage.get_account_code(account_state.code_hash) | ||
| { | ||
| codes.push(code.bytecode.to_vec()); | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
This code currently looks up code hashes from the parent state (parent_hash/parent root). That will miss bytecodes for contracts created or updated in the current block (and then used by later blocks in the batch). Instead, collect deployed/updated code from account_updates (it already carries code: Option<Code> / info.code_hash) or query against the post-state root after applying updates.
| if let Ok(Some(code)) = | ||
| self.storage.get_account_code(account_state.code_hash) | ||
| { | ||
| codes.push(code.bytecode.to_vec()); | ||
| } | ||
| } |
There was a problem hiding this comment.
Bytecode collection silently ignores missing account states/codes (if let Ok(Some(...))), whereas the previous logger.code_accessed path failed witness generation if a referenced code hash was missing. Silent omission can produce an incomplete witness (guest execution will default missing bytecode to empty), masking real correctness issues. Prefer propagating a ChainError::WitnessGeneration when a code hash is expected but the code cannot be loaded.
| for account in state_accessed.keys() { | ||
| if let Ok(Some(account_state)) = | ||
| self.storage.get_account_state(parent_hash, *account) | ||
| { | ||
| if account_state.code_hash != *EMPTY_TRIE_HASH | ||
| && seen_code_hashes.insert(account_state.code_hash) | ||
| { |
There was a problem hiding this comment.
Same issue as above: self.storage.get_account_state(parent_hash, *account) does not match the Store::get_account_state API (async + block number), so this should not compile as written. Also, checking code_hash != *EMPTY_TRIE_HASH is incorrect for code hashes; use EMPTY_KECCACK_HASH instead.
| @@ -1407,21 +1419,18 @@ impl Blockchain { | |||
| .header | |||
| .clone(); | |||
|
|
|||
| // Headers from latest - 1 until we reach first block header we need. | |||
| // We do it this way because we want to fetch headers by hash, not by number | |||
| while current_header.hash() != first_needed_block_hash { | |||
| // Walk backwards from the last block's parent to the oldest needed header. | |||
| while current_header.number > first_needed_block_number { | |||
| let parent_hash = current_header.parent_hash; | |||
| let current_number = current_header.number - 1; | |||
|
|
|||
| current_header = self | |||
| let Some(parent_header) = self | |||
| .storage | |||
| .get_block_header_by_hash(parent_hash)? | |||
| .ok_or_else(|| { | |||
| ChainError::WitnessGeneration(format!( | |||
| "Failed to get block {current_number} header" | |||
| )) | |||
| })?; | |||
| else { | |||
| break; // Reached the oldest stored header | |||
| }; | |||
There was a problem hiding this comment.
The “up to 256 blocks” logic is off by one: saturating_sub(257) makes the loop include 257 ancestors (and potentially many more headers when the batch spans multiple blocks). Also, breaking on missing headers can yield a witness that’s missing block headers that the guest DB may need for BLOCKHASH, without returning an error. Consider (a) using saturating_sub(256) if you intend exactly the BLOCKHASH window, and (b) returning an explicit error if a header within the intended range can’t be fetched (or adjust the comment/logic to make the completeness guarantees explicit).
| // Pre-Amsterdam: fall back to DatabaseLogger path. | ||
| // This branch should not normally be reached since | ||
| // generate_witness is typically called for recent blocks, | ||
| // but we keep it for backward compatibility. | ||
| return Err(ChainError::WitnessGeneration( | ||
| "Witness generation without BAL requires DatabaseLogger; \ | ||
| use generate_witness_for_blocks for pre-Amsterdam blocks" |
There was a problem hiding this comment.
The pre-Amsterdam bal == None branch returns an error, but the inline comment says “fall back to DatabaseLogger path” and the error text suggests calling generate_witness_for_blocks, which just calls this same function. Please update the comment/error message to reflect the actual behavior (either implement an actual fallback here, or clearly state that pre-Amsterdam witness generation is unsupported from this path and point to the correct alternative API).
| // Pre-Amsterdam: fall back to DatabaseLogger path. | |
| // This branch should not normally be reached since | |
| // generate_witness is typically called for recent blocks, | |
| // but we keep it for backward compatibility. | |
| return Err(ChainError::WitnessGeneration( | |
| "Witness generation without BAL requires DatabaseLogger; \ | |
| use generate_witness_for_blocks for pre-Amsterdam blocks" | |
| // Pre-Amsterdam: BAL is not available from the execution engine. | |
| // This API does not support witness generation without BAL and | |
| // will return an error for pre-Amsterdam blocks. Use a | |
| // DatabaseLogger-backed witness generation API for those blocks. | |
| return Err(ChainError::WitnessGeneration( | |
| "Witness generation without BAL is not supported from this API; \ | |
| use a DatabaseLogger-backed witness generation mechanism for pre-Amsterdam blocks" |
| for ac in bal.accounts() { | ||
| let storage_keys: Vec<H256> = ac | ||
| .all_storage_slots() | ||
| .map(|slot| H256::from_uint(&slot)) | ||
| .collect(); | ||
| accessed.insert(ac.address, storage_keys); | ||
| } |
There was a problem hiding this comment.
AccountAccess::all_storage_slots() can include duplicates (it chains reads and writes). Previously this code deduplicated storage keys per account; now it collects them verbatim, which can cause redundant storage-trie lookups/work during witness generation. Consider deduplicating the collected storage_keys (e.g., with a HashSet) before inserting into accessed.
Greptile SummaryThis PR removes the Key changes and observations:
Confidence Score: 4/5Safe to merge for Amsterdam+ deployments; the pre-Amsterdam code path now hard-errors with a misleading error message but is not expected to be triggered in production. The core logic (BAL consumption, trie access, BLOCKHASH range) is correct and well-reasoned. The one P1 is the misleading error message for pre-Amsterdam blocks, which won't affect Amsterdam+ production paths but could confuse future callers. The redundant first bytecode pass is harmless but noisy. crates/blockchain/blockchain.rs – specifically the pre-Amsterdam error fallback (lines 1248-1257) and the first bytecode collection loop (lines 1312-1330).
|
| Filename | Overview |
|---|---|
| crates/blockchain/blockchain.rs | Replaces DatabaseLogger wrapping with direct BAL consumption in generate_witness_for_blocks_with_fee_configs; one misleading error message (incorrect fallback guidance for pre-Amsterdam path) and one redundant first-pass bytecode loop; BLOCKHASH walk-back logic is correct. |
Sequence Diagram
sequenceDiagram
participant Caller
participant generate_witness as generate_witness_for_blocks_with_fee_configs
participant Evm
participant Storage
participant BAL as BlockAccessList
Caller->>generate_witness: blocks, fee_configs
loop For each block
generate_witness->>Storage: new StoreVmDatabase(parent_header)
generate_witness->>Evm: new_evm(vm_db) [L1] or new_for_l2(...) [L2]
generate_witness->>Evm: execute_block(block)
Evm-->>generate_witness: (execution_result, Some(bal)) [Amsterdam+]
generate_witness->>Evm: get_state_transitions()
Evm-->>generate_witness: account_updates
generate_witness->>BAL: bal.accounts()
BAL-->>generate_witness: [AccountChanges] with address, storage_reads, storage_changes
generate_witness->>generate_witness: Build state_accessed map
generate_witness->>Storage: get_account_state(parent_hash, addr) per accessed account
Storage-->>generate_witness: account_state.code_hash
generate_witness->>Storage: get_account_code(code_hash)
Storage-->>generate_witness: bytecode -> codes[]
generate_witness->>Storage: apply_account_updates_from_trie_with_witness(...)
generate_witness->>Storage: store_block(block, account_updates, execution_result)
end
generate_witness->>Storage: Walk back up to 256 ancestor headers
Storage-->>generate_witness: block_headers_bytes[]
generate_witness-->>Caller: ExecutionWitness { codes, state_trie, storage_tries, headers }
Prompt To Fix All With AI
This is a comment left during a code review.
Path: crates/blockchain/blockchain.rs
Line: 1249-1257
Comment:
**Incorrect fallback recommendation in error message**
The error message advises callers to "use `generate_witness_for_blocks` for pre-Amsterdam blocks", but `generate_witness_for_blocks` is defined as:
```rust
pub async fn generate_witness_for_blocks(&self, blocks: &[Block]) -> Result<ExecutionWitness, ChainError> {
self.generate_witness_for_blocks_with_fee_configs(blocks, None).await
}
```
It simply delegates to the exact same function that raises this error, so following the guidance will immediately hit the same `WitnessGeneration` error again. Any pre-Amsterdam caller expecting the old behaviour will be stuck with no working exit path.
The message should either indicate there is no pre-Amsterdam support, or point to a concrete alternative (e.g. `generate_witness_from_account_updates`, or the old `DatabaseLogger`-based path if it is still available elsewhere).
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: crates/blockchain/blockchain.rs
Line: 1312-1330
Comment:
**First bytecode collection pass is redundant with the second pass**
This first loop iterates over accounts that have `code_changes` in the BAL and attempts to include their pre-state bytecode (at `parent_hash`). However, there are two reasons this loop is either ineffective or redundant:
1. **Newly deployed contracts (the common case):** `code_changes` is populated when new code is deployed (`CREATE`/`CREATE2`). For these accounts the pre-state at `parent_hash` has `code_hash == EMPTY_TRIE_HASH`, so the guard `account_state.code_hash != *EMPTY_TRIE_HASH` is false and nothing is added.
2. **Existing accounts whose code is overwritten (self-destruct + re-deploy):** The account would appear in `bal.accounts()`, so its address is already in `state_accessed`. The second pass (lines 1332–1346) iterates over `state_accessed.keys()` and would find exactly the same pre-state code hash, deduplicated by the shared `seen_code_hashes` set.
In practice, every address covered by this first pass is also unconditionally covered by the second pass. The first loop can be removed to simplify the code without any change in behaviour.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: crates/blockchain/blockchain.rs
Line: 1413
Comment:
**`first_needed_block_number` is anchored to the first block, not the last**
`first_needed_block_number` is computed from `first_block_header.number`, which is intentionally the OLDEST block in the batch (any block in the batch can reference BLOCKHASH as far back as 256 predecessors of the first block). This is correct for a multi-block batch.
However, this line might be slightly confusing because the walk **starts** from `blocks.last().header` while the stopping condition is derived from `first_block_header`. A short inline comment explaining this relationship (e.g. "anchored to the first block so we cover all BLOCKHASH lookups across the entire batch") would help future readers follow the intent without having to re-derive it.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "perf(l1): reuse BAL for witness generati..." | Re-trigger Greptile
| // Pre-Amsterdam: fall back to DatabaseLogger path. | ||
| // This branch should not normally be reached since | ||
| // generate_witness is typically called for recent blocks, | ||
| // but we keep it for backward compatibility. | ||
| return Err(ChainError::WitnessGeneration( | ||
| "Witness generation without BAL requires DatabaseLogger; \ | ||
| use generate_witness_for_blocks for pre-Amsterdam blocks" | ||
| .to_string(), | ||
| )); |
There was a problem hiding this comment.
Incorrect fallback recommendation in error message
The error message advises callers to "use generate_witness_for_blocks for pre-Amsterdam blocks", but generate_witness_for_blocks is defined as:
pub async fn generate_witness_for_blocks(&self, blocks: &[Block]) -> Result<ExecutionWitness, ChainError> {
self.generate_witness_for_blocks_with_fee_configs(blocks, None).await
}It simply delegates to the exact same function that raises this error, so following the guidance will immediately hit the same WitnessGeneration error again. Any pre-Amsterdam caller expecting the old behaviour will be stuck with no working exit path.
The message should either indicate there is no pre-Amsterdam support, or point to a concrete alternative (e.g. generate_witness_from_account_updates, or the old DatabaseLogger-based path if it is still available elsewhere).
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/blockchain/blockchain.rs
Line: 1249-1257
Comment:
**Incorrect fallback recommendation in error message**
The error message advises callers to "use `generate_witness_for_blocks` for pre-Amsterdam blocks", but `generate_witness_for_blocks` is defined as:
```rust
pub async fn generate_witness_for_blocks(&self, blocks: &[Block]) -> Result<ExecutionWitness, ChainError> {
self.generate_witness_for_blocks_with_fee_configs(blocks, None).await
}
```
It simply delegates to the exact same function that raises this error, so following the guidance will immediately hit the same `WitnessGeneration` error again. Any pre-Amsterdam caller expecting the old behaviour will be stuck with no working exit path.
The message should either indicate there is no pre-Amsterdam support, or point to a concrete alternative (e.g. `generate_witness_from_account_updates`, or the old `DatabaseLogger`-based path if it is still available elsewhere).
How can I resolve this? If you propose a fix, please make it concise.| // Include codes from accounts that have code changes in the BAL | ||
| for ac in bal.as_ref().into_iter().flat_map(|b| b.accounts()) { | ||
| if !ac.code_changes.is_empty() { | ||
| // Look up the code hash from the account state in storage | ||
| if let Ok(Some(account_state)) = | ||
| self.storage.get_account_state(parent_hash, ac.address) | ||
| { | ||
| if account_state.code_hash != *EMPTY_TRIE_HASH | ||
| && seen_code_hashes.insert(account_state.code_hash) | ||
| { | ||
| if let Ok(Some(code)) = | ||
| self.storage.get_account_code(account_state.code_hash) | ||
| { | ||
| codes.push(code.bytecode.to_vec()); | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
First bytecode collection pass is redundant with the second pass
This first loop iterates over accounts that have code_changes in the BAL and attempts to include their pre-state bytecode (at parent_hash). However, there are two reasons this loop is either ineffective or redundant:
-
Newly deployed contracts (the common case):
code_changesis populated when new code is deployed (CREATE/CREATE2). For these accounts the pre-state atparent_hashhascode_hash == EMPTY_TRIE_HASH, so the guardaccount_state.code_hash != *EMPTY_TRIE_HASHis false and nothing is added. -
Existing accounts whose code is overwritten (self-destruct + re-deploy): The account would appear in
bal.accounts(), so its address is already instate_accessed. The second pass (lines 1332–1346) iterates overstate_accessed.keys()and would find exactly the same pre-state code hash, deduplicated by the sharedseen_code_hashesset.
In practice, every address covered by this first pass is also unconditionally covered by the second pass. The first loop can be removed to simplify the code without any change in behaviour.
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/blockchain/blockchain.rs
Line: 1312-1330
Comment:
**First bytecode collection pass is redundant with the second pass**
This first loop iterates over accounts that have `code_changes` in the BAL and attempts to include their pre-state bytecode (at `parent_hash`). However, there are two reasons this loop is either ineffective or redundant:
1. **Newly deployed contracts (the common case):** `code_changes` is populated when new code is deployed (`CREATE`/`CREATE2`). For these accounts the pre-state at `parent_hash` has `code_hash == EMPTY_TRIE_HASH`, so the guard `account_state.code_hash != *EMPTY_TRIE_HASH` is false and nothing is added.
2. **Existing accounts whose code is overwritten (self-destruct + re-deploy):** The account would appear in `bal.accounts()`, so its address is already in `state_accessed`. The second pass (lines 1332–1346) iterates over `state_accessed.keys()` and would find exactly the same pre-state code hash, deduplicated by the shared `seen_code_hashes` set.
In practice, every address covered by this first pass is also unconditionally covered by the second pass. The first loop can be removed to simplify the code without any change in behaviour.
How can I resolve this? If you propose a fix, please make it concise.| .copied() | ||
| }) | ||
| .unwrap_or(first_block_header.parent_hash); | ||
| let first_needed_block_number = first_block_header.number.saturating_sub(257); |
There was a problem hiding this comment.
first_needed_block_number is anchored to the first block, not the last
first_needed_block_number is computed from first_block_header.number, which is intentionally the OLDEST block in the batch (any block in the batch can reference BLOCKHASH as far back as 256 predecessors of the first block). This is correct for a multi-block batch.
However, this line might be slightly confusing because the walk starts from blocks.last().header while the stopping condition is derived from first_block_header. A short inline comment explaining this relationship (e.g. "anchored to the first block so we cover all BLOCKHASH lookups across the entire batch") would help future readers follow the intent without having to re-derive it.
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/blockchain/blockchain.rs
Line: 1413
Comment:
**`first_needed_block_number` is anchored to the first block, not the last**
`first_needed_block_number` is computed from `first_block_header.number`, which is intentionally the OLDEST block in the batch (any block in the batch can reference BLOCKHASH as far back as 256 predecessors of the first block). This is correct for a multi-block batch.
However, this line might be slightly confusing because the walk **starts** from `blocks.last().header` while the stopping condition is derived from `first_block_header`. A short inline comment explaining this relationship (e.g. "anchored to the first block so we cover all BLOCKHASH lookups across the entire batch") would help future readers follow the intent without having to re-derive it.
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
9a3e3fd to
43f8cda
Compare
| "Failed to get account code".to_string(), | ||
| ))?; | ||
| codes.push(code.bytecode.to_vec()); | ||
| let mut seen_code_hashes = HashSet::new(); |
There was a problem hiding this comment.
This uses parent_state_root (block N-1's state) to look up account code hashes. Contracts deployed during block N's execution won't exist in the parent state — get_account_state_by_root returns None and their bytecode is silently missed from the witness.
The account_updates from vm.get_state_transitions() should have the code_hash for newly created accounts — you could collect those as a second pass.
There was a problem hiding this comment.
This uses
parent_state_root(block N-1's state) to look up account code hashes. Contracts deployed during block N's execution won't exist in the parent state —get_account_state_by_rootreturnsNoneand their bytecode is silently missed from the witness.The
account_updatesfromvm.get_state_transitions()should have thecode_hashfor newly created accounts — you could collect those as a second pass.
pl check now and lemme know if anything is required on my end
43f8cda to
37782b1
Compare
closes #6388
replaces DatabaseLogger wrapping in generate_witness_for_blocks_with_fee_configs with the BAL that execute_block already produces for Amsterdam+ blocks. the BAL contains exactly the same info that DatabaseLogger captures (touched accounts, storage slots, code changes), so we use it directly instead of intercepting every DB call through mutex-guarded fields.
changes: