perf(l1): reuse BAL for witness generation by ArshLabs · Pull Request #6406 · lambdaclass/ethrex

ArshLabs · 2026-03-26T11:24:18Z

replaces DatabaseLogger wrapping in generate_witness_for_blocks_with_fee_configs with the BAL that execute_block already produces for Amsterdam+ blocks. the BAL contains exactly the same info that DatabaseLogger captures (touched accounts, storage slots, code changes), so we use it directly instead of intercepting every DB call through mutex-guarded fields.

changes:

creates a plain VM instead of wrapping with DatabaseLogger
uses bal.accounts() and all_storage_slots() to build state access map
collects bytecodes from account states in storage (deduplicated)
conservatively includes up to 256 ancestor block headers since BAL does not track BLOCKHASH opcode references
returns error for pre-Amsterdam blocks (BAL is None), since this path targets post-Amsterdam

Copilot

Pull request overview

This PR optimizes L1/L2 execution witness generation by removing the DatabaseLogger wrapper and instead deriving accessed state (accounts/storage slots) from the Block Access List (BAL) produced by execute_block() for Amsterdam+ blocks, reducing per-DB-call locking overhead.

Changes:

Stop wrapping the VM database with DatabaseLogger; execute blocks with a plain StoreVmDatabase.
Build the state_accessed map from bal.accounts() + all_storage_slots() instead of logger-captured accesses.
Adjust witness assembly to (a) gather bytecodes via account state/code lookups and (b) include a conservative range of ancestor headers to cover BLOCKHASH.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-26T11:31:17Z

+                        // Look up the code hash from the account state in storage
+                        if let Ok(Some(account_state)) =
+                            self.storage.get_account_state(parent_hash, ac.address)
+                        {
+                            if account_state.code_hash != *EMPTY_TRIE_HASH
+                                && seen_code_hashes.insert(account_state.code_hash)
+                            {


self.storage.get_account_state(parent_hash, ac.address) appears to be calling Store::get_account_state, but in crates/storage/store.rs that method is async fn get_account_state(block_number, address), not a sync lookup by block hash. As written this should not compile (wrong argument type + missing .await). Use a synchronous API here (e.g., get_account_state_by_root(parent_header.state_root, ...), or open the state trie for the relevant root) and pass the correct empty-code sentinel (EMPTY_KECCACK_HASH, not EMPTY_TRIE_HASH).

Copilot · 2026-03-26T11:31:17Z

+                // Include codes from accounts that have code changes in the BAL
+                for ac in bal.as_ref().into_iter().flat_map(|b| b.accounts()) {
+                    if !ac.code_changes.is_empty() {
+                        // Look up the code hash from the account state in storage
+                        if let Ok(Some(account_state)) =
+                            self.storage.get_account_state(parent_hash, ac.address)
+                        {
+                            if account_state.code_hash != *EMPTY_TRIE_HASH
+                                && seen_code_hashes.insert(account_state.code_hash)
+                            {
+                                if let Ok(Some(code)) =
+                                    self.storage.get_account_code(account_state.code_hash)
+                                {
+                                    codes.push(code.bytecode.to_vec());
+                                }
+                            }
+                        }
+                    }


This code currently looks up code hashes from the parent state (parent_hash/parent root). That will miss bytecodes for contracts created or updated in the current block (and then used by later blocks in the batch). Instead, collect deployed/updated code from account_updates (it already carries code: Option<Code> / info.code_hash) or query against the post-state root after applying updates.

Copilot · 2026-03-26T11:31:17Z

+                                if let Ok(Some(code)) =
+                                    self.storage.get_account_code(account_state.code_hash)
+                                {
+                                    codes.push(code.bytecode.to_vec());
+                                }
+                            }


Bytecode collection silently ignores missing account states/codes (if let Ok(Some(...))), whereas the previous logger.code_accessed path failed witness generation if a referenced code hash was missing. Silent omission can produce an incomplete witness (guest execution will default missing bytecode to empty), masking real correctness issues. Prefer propagating a ChainError::WitnessGeneration when a code hash is expected but the code cannot be loaded.

Copilot · 2026-03-26T11:31:17Z

+                for account in state_accessed.keys() {
+                    if let Ok(Some(account_state)) =
+                        self.storage.get_account_state(parent_hash, *account)
+                    {
+                        if account_state.code_hash != *EMPTY_TRIE_HASH
+                            && seen_code_hashes.insert(account_state.code_hash)
+                        {


Same issue as above: self.storage.get_account_state(parent_hash, *account) does not match the Store::get_account_state API (async + block number), so this should not compile as written. Also, checking code_hash != *EMPTY_TRIE_HASH is incorrect for code hashes; use EMPTY_KECCACK_HASH instead.

Copilot · 2026-03-26T11:31:18Z

@@ -1407,21 +1419,18 @@ impl Blockchain {
            .header
            .clone();

-        // Headers from latest - 1 until we reach first block header we need.
-        // We do it this way because we want to fetch headers by hash, not by number
-        while current_header.hash() != first_needed_block_hash {
+        // Walk backwards from the last block's parent to the oldest needed header.
+        while current_header.number > first_needed_block_number {
            let parent_hash = current_header.parent_hash;
-            let current_number = current_header.number - 1;

-            current_header = self
+            let Some(parent_header) = self
                .storage
                .get_block_header_by_hash(parent_hash)?
-                .ok_or_else(|| {
-                    ChainError::WitnessGeneration(format!(
-                        "Failed to get block {current_number} header"
-                    ))
-                })?;
+            else {
+                break; // Reached the oldest stored header
+            };


The “up to 256 blocks” logic is off by one: saturating_sub(257) makes the loop include 257 ancestors (and potentially many more headers when the batch spans multiple blocks). Also, breaking on missing headers can yield a witness that’s missing block headers that the guest DB may need for BLOCKHASH, without returning an error. Consider (a) using saturating_sub(256) if you intend exactly the BLOCKHASH window, and (b) returning an explicit error if a header within the intended range can’t be fetched (or adjust the comment/logic to make the completeness guarantees explicit).

Copilot · 2026-03-26T11:31:18Z

+                // Pre-Amsterdam: fall back to DatabaseLogger path.
+                // This branch should not normally be reached since
+                // generate_witness is typically called for recent blocks,
+                // but we keep it for backward compatibility.
+                return Err(ChainError::WitnessGeneration(
+                    "Witness generation without BAL requires DatabaseLogger; \
+                     use generate_witness_for_blocks for pre-Amsterdam blocks"


The pre-Amsterdam bal == None branch returns an error, but the inline comment says “fall back to DatabaseLogger path” and the error text suggests calling generate_witness_for_blocks, which just calls this same function. Please update the comment/error message to reflect the actual behavior (either implement an actual fallback here, or clearly state that pre-Amsterdam witness generation is unsupported from this path and point to the correct alternative API).

Suggested change

// Pre-Amsterdam: fall back to DatabaseLogger path.

// This branch should not normally be reached since

// generate_witness is typically called for recent blocks,

// but we keep it for backward compatibility.

return Err(ChainError::WitnessGeneration(

"Witness generation without BAL requires DatabaseLogger; \

use generate_witness_for_blocks for pre-Amsterdam blocks"

// Pre-Amsterdam: BAL is not available from the execution engine.

// This API does not support witness generation without BAL and

// will return an error for pre-Amsterdam blocks. Use a

// DatabaseLogger-backed witness generation API for those blocks.

return Err(ChainError::WitnessGeneration(

"Witness generation without BAL is not supported from this API; \

use a DatabaseLogger-backed witness generation mechanism for pre-Amsterdam blocks"

Copilot · 2026-03-26T11:31:18Z

+                for ac in bal.accounts() {
+                    let storage_keys: Vec<H256> = ac
+                        .all_storage_slots()
+                        .map(|slot| H256::from_uint(&slot))
+                        .collect();
+                    accessed.insert(ac.address, storage_keys);
+                }


AccountAccess::all_storage_slots() can include duplicates (it chains reads and writes). Previously this code deduplicated storage keys per account; now it collects them verbatim, which can cause redundant storage-trie lookups/work during witness generation. Consider deduplicating the collected storage_keys (e.g., with a HashSet) before inserting into accessed.

greptile-apps · 2026-03-26T11:33:21Z

Greptile Summary

This PR removes the DatabaseLogger wrapper from generate_witness_for_blocks_with_fee_configs and instead consumes the Block Access List (BAL) that execute_block already produces for Amsterdam+ blocks. The refactor eliminates per-DB-call mutex overhead and makes the data flow more direct.

Key changes and observations:

BAL-based state access map: All touched accounts and storage slots come from bal.accounts() / all_storage_slots(), which is correct and complete for Amsterdam+ blocks.
Pre-Amsterdam fallback now hard-errors: The else branch returns Err(ChainError::WitnessGeneration(...)) and advises callers to use generate_witness_for_blocks, but that function is a thin wrapper that calls the same function and will hit the same error immediately. The error message is incorrect and leaves pre-Amsterdam callers with no working path.
Redundant bytecode first pass: The loop that collects bytecodes for accounts with code_changes looks up their pre-state code at parent_hash. For newly deployed contracts (the common case) the pre-state code hash is EMPTY_TRIE_HASH so nothing is added. In the rare self-destruct+re-deploy case the second pass covers the same account anyway. The first loop can be safely removed.
Conservative BLOCKHASH header range: The new approach includes all 256 ancestor headers rather than only the ones referenced via the BLOCKHASH opcode. This is correct and a deliberate trade-off.

Confidence Score: 4/5

Safe to merge for Amsterdam+ deployments; the pre-Amsterdam code path now hard-errors with a misleading error message but is not expected to be triggered in production.

The core logic (BAL consumption, trie access, BLOCKHASH range) is correct and well-reasoned. The one P1 is the misleading error message for pre-Amsterdam blocks, which won't affect Amsterdam+ production paths but could confuse future callers. The redundant first bytecode pass is harmless but noisy.

crates/blockchain/blockchain.rs – specifically the pre-Amsterdam error fallback (lines 1248-1257) and the first bytecode collection loop (lines 1312-1330).

Important Files Changed

Filename	Overview
crates/blockchain/blockchain.rs	Replaces `DatabaseLogger` wrapping with direct BAL consumption in `generate_witness_for_blocks_with_fee_configs`; one misleading error message (incorrect fallback guidance for pre-Amsterdam path) and one redundant first-pass bytecode loop; BLOCKHASH walk-back logic is correct.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant generate_witness as generate_witness_for_blocks_with_fee_configs
    participant Evm
    participant Storage
    participant BAL as BlockAccessList

    Caller->>generate_witness: blocks, fee_configs
    loop For each block
        generate_witness->>Storage: new StoreVmDatabase(parent_header)
        generate_witness->>Evm: new_evm(vm_db) [L1] or new_for_l2(...) [L2]
        generate_witness->>Evm: execute_block(block)
        Evm-->>generate_witness: (execution_result, Some(bal)) [Amsterdam+]
        generate_witness->>Evm: get_state_transitions()
        Evm-->>generate_witness: account_updates
        generate_witness->>BAL: bal.accounts()
        BAL-->>generate_witness: [AccountChanges] with address, storage_reads, storage_changes
        generate_witness->>generate_witness: Build state_accessed map
        generate_witness->>Storage: get_account_state(parent_hash, addr) per accessed account
        Storage-->>generate_witness: account_state.code_hash
        generate_witness->>Storage: get_account_code(code_hash)
        Storage-->>generate_witness: bytecode -> codes[]
        generate_witness->>Storage: apply_account_updates_from_trie_with_witness(...)
        generate_witness->>Storage: store_block(block, account_updates, execution_result)
    end
    generate_witness->>Storage: Walk back up to 256 ancestor headers
    Storage-->>generate_witness: block_headers_bytes[]
    generate_witness-->>Caller: ExecutionWitness { codes, state_trie, storage_tries, headers }

Prompt To Fix All With AI

This is a comment left during a code review.
Path: crates/blockchain/blockchain.rs
Line: 1249-1257

Comment:
**Incorrect fallback recommendation in error message**

The error message advises callers to "use `generate_witness_for_blocks` for pre-Amsterdam blocks", but `generate_witness_for_blocks` is defined as:

```rust
pub async fn generate_witness_for_blocks(&self, blocks: &[Block]) -> Result<ExecutionWitness, ChainError> {
    self.generate_witness_for_blocks_with_fee_configs(blocks, None).await
}
```

It simply delegates to the exact same function that raises this error, so following the guidance will immediately hit the same `WitnessGeneration` error again. Any pre-Amsterdam caller expecting the old behaviour will be stuck with no working exit path.

The message should either indicate there is no pre-Amsterdam support, or point to a concrete alternative (e.g. `generate_witness_from_account_updates`, or the old `DatabaseLogger`-based path if it is still available elsewhere).

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: crates/blockchain/blockchain.rs
Line: 1312-1330

Comment:
**First bytecode collection pass is redundant with the second pass**

This first loop iterates over accounts that have `code_changes` in the BAL and attempts to include their pre-state bytecode (at `parent_hash`). However, there are two reasons this loop is either ineffective or redundant:

1. **Newly deployed contracts (the common case):** `code_changes` is populated when new code is deployed (`CREATE`/`CREATE2`). For these accounts the pre-state at `parent_hash` has `code_hash == EMPTY_TRIE_HASH`, so the guard `account_state.code_hash != *EMPTY_TRIE_HASH` is false and nothing is added.

2. **Existing accounts whose code is overwritten (self-destruct + re-deploy):** The account would appear in `bal.accounts()`, so its address is already in `state_accessed`. The second pass (lines 1332–1346) iterates over `state_accessed.keys()` and would find exactly the same pre-state code hash, deduplicated by the shared `seen_code_hashes` set.

In practice, every address covered by this first pass is also unconditionally covered by the second pass. The first loop can be removed to simplify the code without any change in behaviour.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: crates/blockchain/blockchain.rs
Line: 1413

Comment:
**`first_needed_block_number` is anchored to the first block, not the last**

`first_needed_block_number` is computed from `first_block_header.number`, which is intentionally the OLDEST block in the batch (any block in the batch can reference BLOCKHASH as far back as 256 predecessors of the first block). This is correct for a multi-block batch.

However, this line might be slightly confusing because the walk **starts** from `blocks.last().header` while the stopping condition is derived from `first_block_header`. A short inline comment explaining this relationship (e.g. "anchored to the first block so we cover all BLOCKHASH lookups across the entire batch") would help future readers follow the intent without having to re-derive it.

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "perf(l1): reuse BAL for witness generati..." | Re-trigger Greptile}

greptile-apps · 2026-03-26T11:33:24Z

+                // Pre-Amsterdam: fall back to DatabaseLogger path.
+                // This branch should not normally be reached since
+                // generate_witness is typically called for recent blocks,
+                // but we keep it for backward compatibility.
+                return Err(ChainError::WitnessGeneration(
+                    "Witness generation without BAL requires DatabaseLogger; \
+                     use generate_witness_for_blocks for pre-Amsterdam blocks"
+                        .to_string(),
+                ));


Incorrect fallback recommendation in error message

The error message advises callers to "use generate_witness_for_blocks for pre-Amsterdam blocks", but generate_witness_for_blocks is defined as:

pub async fn generate_witness_for_blocks(&self, blocks: &[Block]) -> Result<ExecutionWitness, ChainError> { self.generate_witness_for_blocks_with_fee_configs(blocks, None).await }

It simply delegates to the exact same function that raises this error, so following the guidance will immediately hit the same WitnessGeneration error again. Any pre-Amsterdam caller expecting the old behaviour will be stuck with no working exit path.

The message should either indicate there is no pre-Amsterdam support, or point to a concrete alternative (e.g. generate_witness_from_account_updates, or the old DatabaseLogger-based path if it is still available elsewhere).

Prompt To Fix With AI

This is a comment left during a code review. Path: crates/blockchain/blockchain.rs Line: 1249-1257 Comment: **Incorrect fallback recommendation in error message** The error message advises callers to "use `generate_witness_for_blocks` for pre-Amsterdam blocks", but `generate_witness_for_blocks` is defined as: ```rust pub async fn generate_witness_for_blocks(&self, blocks: &[Block]) -> Result<ExecutionWitness, ChainError> { self.generate_witness_for_blocks_with_fee_configs(blocks, None).await } ``` It simply delegates to the exact same function that raises this error, so following the guidance will immediately hit the same `WitnessGeneration` error again. Any pre-Amsterdam caller expecting the old behaviour will be stuck with no working exit path. The message should either indicate there is no pre-Amsterdam support, or point to a concrete alternative (e.g. `generate_witness_from_account_updates`, or the old `DatabaseLogger`-based path if it is still available elsewhere). How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-03-26T11:33:25Z

+                // Include codes from accounts that have code changes in the BAL
+                for ac in bal.as_ref().into_iter().flat_map(|b| b.accounts()) {
+                    if !ac.code_changes.is_empty() {
+                        // Look up the code hash from the account state in storage
+                        if let Ok(Some(account_state)) =
+                            self.storage.get_account_state(parent_hash, ac.address)
+                        {
+                            if account_state.code_hash != *EMPTY_TRIE_HASH
+                                && seen_code_hashes.insert(account_state.code_hash)
+                            {
+                                if let Ok(Some(code)) =
+                                    self.storage.get_account_code(account_state.code_hash)
+                                {
+                                    codes.push(code.bytecode.to_vec());
+                                }
+                            }
+                        }
+                    }
+                }


First bytecode collection pass is redundant with the second pass

This first loop iterates over accounts that have code_changes in the BAL and attempts to include their pre-state bytecode (at parent_hash). However, there are two reasons this loop is either ineffective or redundant:

Newly deployed contracts (the common case): code_changes is populated when new code is deployed (CREATE/CREATE2). For these accounts the pre-state at parent_hash has code_hash == EMPTY_TRIE_HASH, so the guard account_state.code_hash != *EMPTY_TRIE_HASH is false and nothing is added.

Existing accounts whose code is overwritten (self-destruct + re-deploy): The account would appear in bal.accounts(), so its address is already in state_accessed. The second pass (lines 1332–1346) iterates over state_accessed.keys() and would find exactly the same pre-state code hash, deduplicated by the shared seen_code_hashes set.

In practice, every address covered by this first pass is also unconditionally covered by the second pass. The first loop can be removed to simplify the code without any change in behaviour.

Prompt To Fix With AI

This is a comment left during a code review. Path: crates/blockchain/blockchain.rs Line: 1312-1330 Comment: **First bytecode collection pass is redundant with the second pass** This first loop iterates over accounts that have `code_changes` in the BAL and attempts to include their pre-state bytecode (at `parent_hash`). However, there are two reasons this loop is either ineffective or redundant: 1. **Newly deployed contracts (the common case):** `code_changes` is populated when new code is deployed (`CREATE`/`CREATE2`). For these accounts the pre-state at `parent_hash` has `code_hash == EMPTY_TRIE_HASH`, so the guard `account_state.code_hash != *EMPTY_TRIE_HASH` is false and nothing is added. 2. **Existing accounts whose code is overwritten (self-destruct + re-deploy):** The account would appear in `bal.accounts()`, so its address is already in `state_accessed`. The second pass (lines 1332–1346) iterates over `state_accessed.keys()` and would find exactly the same pre-state code hash, deduplicated by the shared `seen_code_hashes` set. In practice, every address covered by this first pass is also unconditionally covered by the second pass. The first loop can be removed to simplify the code without any change in behaviour. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-03-26T11:33:26Z

-                    .copied()
-            })
-            .unwrap_or(first_block_header.parent_hash);
+        let first_needed_block_number = first_block_header.number.saturating_sub(257);


first_needed_block_number is anchored to the first block, not the last

first_needed_block_number is computed from first_block_header.number, which is intentionally the OLDEST block in the batch (any block in the batch can reference BLOCKHASH as far back as 256 predecessors of the first block). This is correct for a multi-block batch.

However, this line might be slightly confusing because the walk starts from blocks.last().header while the stopping condition is derived from first_block_header. A short inline comment explaining this relationship (e.g. "anchored to the first block so we cover all BLOCKHASH lookups across the entire batch") would help future readers follow the intent without having to re-derive it.

Prompt To Fix With AI

This is a comment left during a code review. Path: crates/blockchain/blockchain.rs Line: 1413 Comment: **`first_needed_block_number` is anchored to the first block, not the last** `first_needed_block_number` is computed from `first_block_header.number`, which is intentionally the OLDEST block in the batch (any block in the batch can reference BLOCKHASH as far back as 256 predecessors of the first block). This is correct for a multi-block batch. However, this line might be slightly confusing because the walk **starts** from `blocks.last().header` while the stopping condition is derived from `first_block_header`. A short inline comment explaining this relationship (e.g. "anchored to the first block so we cover all BLOCKHASH lookups across the entire batch") would help future readers follow the intent without having to re-derive it. How can I resolve this? If you propose a fix, please make it concise.

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

ElFantasma · 2026-03-26T16:10:29Z

-                        "Failed to get account code".to_string(),
-                    ))?;
-                codes.push(code.bytecode.to_vec());
+                let mut seen_code_hashes = HashSet::new();


This uses parent_state_root (block N-1's state) to look up account code hashes. Contracts deployed during block N's execution won't exist in the parent state — get_account_state_by_root returns None and their bytecode is silently missed from the witness.

The account_updates from vm.get_state_transitions() should have the code_hash for newly created accounts — you could collect those as a second pass.

This uses parent_state_root (block N-1's state) to look up account code hashes. Contracts deployed during block N's execution won't exist in the parent state — get_account_state_by_root returns None and their bytecode is silently missed from the witness.

The account_updates from vm.get_state_transitions() should have the code_hash for newly created accounts — you could collect those as a second pass.

pl check now and lemme know if anything is required on my end

ArshLabs requested a review from a team as a code owner March 26, 2026 11:24

Copilot AI review requested due to automatic review settings March 26, 2026 11:24

Copilot started reviewing on behalf of ArshLabs March 26, 2026 11:24 View session

Copilot AI reviewed Mar 26, 2026

View reviewed changes

greptile-apps Bot reviewed Mar 26, 2026

View reviewed changes

ArshLabs force-pushed the perf/6388-reuse-bal-for-witness branch from 9a3e3fd to 43f8cda Compare March 26, 2026 12:01

ElFantasma requested changes Mar 27, 2026

View reviewed changes

perf(l1): reuse BAL for witness generation

37782b1

ArshLabs force-pushed the perf/6388-reuse-bal-for-witness branch from 43f8cda to 37782b1 Compare March 27, 2026 16:37

ArshLabs requested a review from ElFantasma March 27, 2026 16:38

Conversation

ArshLabs commented Mar 26, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot commented Mar 26, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps Bot Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

ElFantasma Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

ArshLabs Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants