Skip to content

refactor(cache): skip empty data in pending data#3046

Merged
julienrbrt merged 2 commits intomainfrom
julien/pending-data
Feb 3, 2026
Merged

refactor(cache): skip empty data in pending data#3046
julienrbrt merged 2 commits intomainfrom
julien/pending-data

Conversation

@julienrbrt
Copy link
Member

@julienrbrt julienrbrt commented Feb 3, 2026

Overview

NumPendingData() is giving a right but out of proportion value because empty data isn't submitted to DA, this led to a lagging submitted value and a way too high missing submission counts.
Discovered thanks to #3038.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 3, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed⏩ skipped✅ passed✅ passedFeb 3, 2026, 2:51 PM

@claude
Copy link
Contributor

claude bot commented Feb 3, 2026

Claude finished @julienrbrt's task in 2m 47s —— View job


Code Review Complete

I've completed a comprehensive review of this PR. The changes correctly address the issue where NumPendingData() was inflating counts by including empty data blocks that are never submitted to the DA layer.

Todo List:

  • Read and analyze all changed files
  • Review core logic in pending_data.go
  • Review test coverage and test quality
  • Check error handling and context usage
  • Assess performance implications
  • Review CHANGELOG entry
  • Provide comprehensive feedback

Summary

The PR successfully implements automatic advancement past empty data blocks in the PendingData cache, ensuring accurate pending data counts. The test coverage is thorough and the implementation is generally sound. However, there are several concerns around context handling, error propagation, and potential performance implications that should be addressed.


Critical Issues

1. Context Handling in NumPendingData() (High Priority)

File: block/internal/cache/pending_data.go:80

func (pd *PendingData) NumPendingData() uint64 {
    pd.advancePastEmptyData(context.Background())
    return pd.base.numPending()
}

Issue: Using context.Background() in NumPendingData() is problematic:

  • The function is called from daSubmissionLoop() which has a cancellable context (submitter.go:179, 242)
  • Store operations in advancePastEmptyData cannot be cancelled, leading to potential hangs
  • If the underlying store operations block, the entire submission loop becomes unresponsive
  • The function signature doesn't accept a context, making cancellation impossible

Impact: In production scenarios with slow storage or network-backed stores, this could cause the submission loop to hang indefinitely without ability to gracefully shut down.

Recommendation:
Either:

  1. Change NumPendingData() to accept a context.Context parameter (preferred for proper cancellation)
  2. Use a context with timeout derived from the parent context
  3. Make advancePastEmptyData non-blocking or add timeout protection

Fix this →


2. Silent Error Swallowing in advancePastEmptyData() (Medium Priority)

File: block/internal/cache/pending_data.go:90-113

func (pd *PendingData) advancePastEmptyData(ctx context.Context) {
    storeHeight, err := pd.base.store.Height(ctx)
    if err != nil {
        return  // Silent failure
    }
    // ...
    for height := currentHeight + 1; height <= storeHeight; height++ {
        data, err := fetchData(ctx, pd.base.store, height)
        if err != nil {
            return  // Silent failure - could be real error or errInFlightData
        }
        // ...
    }
}

Issues:

  • All errors are silently swallowed without logging or differentiation
  • Cannot distinguish between transient errors (in-flight data) and persistent errors (store corruption, network issues)
  • The function is called from a hot path (NumPendingData in submission loop) where failures should be observable
  • No way for operators to detect if advancement is failing consistently

Impact:

  • Silent failures could mask serious storage issues
  • Debugging production issues becomes difficult without visibility into why advancement stops
  • Metrics like DASubmitterPendingBlobs may be misleading if advancement consistently fails

Recommendation:

func (pd *PendingData) advancePastEmptyData(ctx context.Context) {
    storeHeight, err := pd.base.store.Height(ctx)
    if err != nil {
        pd.base.logger.Debug().Err(err).Msg("failed to get store height in advancePastEmptyData")
        return
    }
    
    currentHeight := pd.base.getLastSubmittedHeight()
    
    for height := currentHeight + 1; height <= storeHeight; height++ {
        data, err := fetchData(ctx, pd.base.store, height)
        if err != nil {
            // Only log non-inflight errors
            if !errors.Is(err, errInFlightData) {
                pd.base.logger.Debug().Err(err).Uint64("height", height).
                    Msg("failed to fetch data in advancePastEmptyData")
            }
            return
        }
        // ... rest of logic
    }
}

Fix this →


Design & Architecture Concerns

3. Side Effects in Metric Getter (Medium Priority)

File: block/internal/cache/pending_data.go:79-82

The function NumPendingData() is named like a simple getter but has significant side effects:

  • Modifies persistent state (lastSubmittedDataHeight in store)
  • Called frequently from hot path (daSubmissionLoop ticker)
  • Performs I/O operations on every call
  • Can advance state by multiple heights

Issues:

  • Violates principle of least surprise - getters shouldn't mutate state
  • No indication in the API that this is a stateful operation
  • Could lead to unexpected behavior if called from multiple locations
  • Makes reasoning about state changes difficult

Comparison with NumPendingHeaders():
PendingHeaders.NumPendingHeaders() (pending_headers.go:75) is a pure getter with no side effects, making the API inconsistent.

Recommendation:

  1. Consider renaming to something like NumPendingDataWithAdvancement() to signal side effects
  2. Or separate concerns: have NumPendingData() remain pure, and call advancePastEmptyData() explicitly before getting the count in the submission loop
  3. Add documentation clearly stating the side effects

4. Race Condition Risk Between NumPendingData() and GetPendingData() (Low-Medium Priority)

File: block/internal/cache/pending_data.go:79-82, 57-77

Scenario:

// Thread 1 (submission loop - submitter.go:242)
dataNb := s.cache.NumPendingData()  // Advances to height 5, returns 3
if dataNb > 0 {
    // Thread 2 could run here
    signedDataList, _, err := s.cache.GetPendingData(ctx)  // Might return different items
}

Issue:
Between the check (NumPendingData) and the fetch (GetPendingData), another goroutine could modify the last submitted height. This could lead to:

  • Fetching different data than expected
  • Count-fetch mismatch (checking for 3 items but getting 2 or 4)

Current Mitigation:
The submission loops use mutexes (dataSubmissionMtx, headerSubmissionMtx) which provide some protection, but the pattern is fragile.

Recommendation:
Consider a more atomic approach:

// Option 1: Return both count and data together
func (pd *PendingData) GetPendingDataWithCount(ctx context.Context) ([]*types.Data, [][]byte, uint64, error)

// Option 2: Document the assumption that callers must hold external locks
// (current implicit behavior)

Performance & Resource Efficiency

5. Potential O(n) Store Operations on Every Tick (Medium Priority)

File: block/internal/cache/pending_data.go:90-113

Issue:

  • advancePastEmptyData is called on every ticker interval from daSubmissionLoop (submitter.go:179, 242)
  • If there are many consecutive empty blocks, this function iterates through all of them on every tick
  • Each iteration performs:
    • Store read (fetchData)
    • Potential store write (setLastSubmittedHeight)
  • With default tick interval of ~250ms (BlockTime/4), this could be frequent

Example Scenario:

  • 100 consecutive empty blocks at heights 1000-1099
  • On first NumPendingData() call after height 999 is submitted:
    • Iterates 100 times
    • Performs 100 store reads
    • Performs 100 store writes (via setLastSubmittedHeight)
    • Takes potentially seconds on slow storage

Optimization Opportunities:

  1. Batch store writes: Instead of calling setLastSubmittedHeight for each empty block, accumulate and write once at the end
  2. Add early exit: Cache the result of advancement within a tick interval
  3. Limit advancement per call: Only advance a maximum number of blocks per call (e.g., 10) to bound latency

Recommendation:

func (pd *PendingData) advancePastEmptyData(ctx context.Context) {
    storeHeight, err := pd.base.store.Height(ctx)
    if err != nil {
        return
    }
    
    currentHeight := pd.base.getLastSubmittedHeight()
    lastEmptyHeight := currentHeight
    
    const maxAdvancePerCall = 50  // Prevent excessive iteration
    endHeight := min(storeHeight, currentHeight + maxAdvancePerCall)
    
    for height := currentHeight + 1; height <= endHeight; height++ {
        data, err := fetchData(ctx, pd.base.store, height)
        if err != nil {
            break
        }
        
        if len(data.Txs) > 0 {
            break
        }
        
        lastEmptyHeight = height
    }
    
    // Batch write at the end instead of per-iteration
    if lastEmptyHeight > currentHeight {
        pd.base.setLastSubmittedHeight(ctx, lastEmptyHeight)
    }
}

Note: This is already partially optimized by the current implementation calling setLastSubmittedHeight per block, but each call performs a store write which could be batched.


6. Store Write Amplification (Low Priority)

File: block/internal/cache/pending_data.go:111

Each empty block triggers a store metadata write via setLastSubmittedHeight:

pd.base.setLastSubmittedHeight(ctx, height)

Which calls (pending_base.go:127-141):

func (pb *pendingBase[T]) setLastSubmittedHeight(ctx context.Context, newLastSubmittedHeight uint64) {
    // ... atomic update ...
    err := pb.store.SetMetadata(ctx, pb.metaKey, bz)  // Store write every time
}

For consecutive empty blocks, this results in N store writes where 1 would suffice.

Impact: Minimal for small numbers of empty blocks, but could matter at scale or with high-latency storage.


Testing & Reliability

7. Excellent Test Coverage

Files: block/internal/cache/pending_data_test.go

The test coverage is thorough and well-structured:

Strengths:

  • ✅ Comprehensive edge case coverage (empty at start, middle, end, all empty)
  • ✅ Tests verify both count and actual data returned
  • ✅ Tests check state persistence (GetLastSubmittedDataHeight)
  • ✅ Parallel test execution with proper isolation
  • ✅ Clear test names describing scenarios
  • ✅ Tests use realistic data (transactions where expected)

Test Scenarios Covered:

  1. TestPendingData_BasicFlow - Normal operation with non-empty blocks
  2. TestPendingData_AdvancesPastEmptyData - Empty blocks in the middle
  3. TestPendingData_AdvancesPastAllEmptyToEnd - All remaining blocks empty
  4. TestPendingData_AdvancesPastEmptyAtStart - Empty blocks from genesis
  5. TestPendingData_InitFromMetadata - State recovery after restart
  6. TestPendingData_GetPending_PropagatesFetchError - Error handling

Minor Test Improvements:

  1. Could add test for concurrent access patterns
  2. Could add test for advancement with in-flight data (errInFlightData)
  3. Could add test for store errors during advancement
  4. Could add benchmark for advancement performance with many empty blocks

Error Handling & Resilience

8. Inconsistent Error Handling Pattern (Low Priority)

Comparing error handling across similar functions:

advancePastEmptyData: Silently returns on all errors
getPending (pending_base.go:60): Returns errors to caller
numPending (pending_base.go:114): Logs error and returns 0

This inconsistency makes it unclear what the expected behavior should be. Consider standardizing the approach across the codebase.


9. No Circuit Breaker for Persistent Failures (Low Priority)

If fetchData consistently fails (e.g., corrupted store), advancePastEmptyData will be called repeatedly on every tick without any backoff or circuit breaker pattern. This is minor since the function returns early on error, but worth noting for future hardening.


Code Quality & Go Idioms

10. Good: Removal of Redundant init() Methods

Files:

  • block/internal/cache/pending_data.go (removed lines 56-58)
  • block/internal/cache/pending_headers.go (removed lines 83-86)

The removal of these wrapper init() methods is a good cleanup. They were unnecessary since:

  1. newPendingBase already calls init() internally (pending_base.go:52)
  2. These wrappers provided no additional functionality
  3. Reduces API surface area

11. Good: Proper Use of Context in Most Places

Most functions properly accept and propagate context.Context parameters, following Go best practices. The main exception is NumPendingData() discussed in issue #1.


12. Minor: Test Assertion Improvement

File: block/internal/cache/pending_data_test.go:223

Changed from:

require.Empty(t, pending)

To:

require.Nil(t, pending)

This is more precise since:

  • require.Empty passes for both nil and []T{}
  • require.Nil only passes for nil
  • Better matches the function's contract of returning nil on error

Documentation & Developer Experience

13. Good Documentation of Function Behavior

The comment for advancePastEmptyData (pending_data.go:88-89) clearly explains its purpose:

// advancePastEmptyData advances lastSubmittedDataHeight past any consecutive empty data blocks.
// This ensures that NumPendingData doesn't count empty data that won't be published to DA.

Improvement Needed:
Should also document:

  • That it modifies persistent state
  • That errors are silently handled
  • When it's expected to be called
  • Performance characteristics for many empty blocks

Suggested improvement:

// advancePastEmptyData advances lastSubmittedDataHeight past any consecutive empty data blocks
// and persists the new height to the store. This ensures that NumPendingData doesn't count
// empty data that won't be published to DA.
//
// This function is called automatically by NumPendingData() and performs store I/O.
// Errors during advancement (including in-flight data) cause the function to stop advancing
// and return silently. The advancement will be retried on the next call.
//
// Note: For N consecutive empty blocks, this performs N store reads and potentially N store writes.

14. CHANGELOG Entry is Clear

File: CHANGELOG.md:18

The entry accurately describes the change:

- Improve `cache.NumPendingData` to not return empty data. Automatically bumps `LastSubmittedHeight` to reflect that. ([#3046](https://github.com/evstack/ev-node/pull/3046))

Minor suggestion: Could be slightly more explicit about the side effect:

- Fix `cache.NumPendingData` to skip empty data blocks that aren't submitted to DA. The function now automatically advances `LastSubmittedHeight` past consecutive empty blocks. ([#3046](https://github.com/evstack/ev-node/pull/3046))

Security Considerations

15. No Security Concerns Identified

  • No credential exposure or sensitive data handling
  • No injection vulnerabilities
  • Proper validation of data (checks for empty txs)
  • Store operations use context for cancellation (except issue ABCI interface #1)
  • No unbounded resource consumption (limited by store height)

Concurrency Patterns

16. Thread Safety Analysis

Existing Protection:

  • pendingBase uses sync.Mutex to protect getPending (pending_base.go:35, 61)
  • pendingBase uses atomic.Uint64 for lastHeight (pending_base.go:29)
  • Submission loops use separate mutexes for headers and data (submitter.go:55-56)

Potential Issue:
advancePastEmptyData modifies lastHeight via setLastSubmittedHeight without holding the pendingBase.mu lock. However, this is acceptable because:

  1. setLastSubmittedHeight uses atomic operations internally (pending_base.go:129)
  2. The function is only called from NumPendingData which is called from a single goroutine (submission loop)
  3. Even if called concurrently, the atomic CompareAndSwap provides linearizability

Verdict: Thread safety appears sound, though the reliance on external mutex protection should be documented.


Integration & System-Level Concerns

17. Interaction with submitter.go (Low Priority)

File: block/internal/submitting/submitter.go:242-302

The submission loop calls NumPendingData() on every tick:

dataNb := s.cache.NumPendingData()  // Line 242

Observations:

  1. The check if dataNb > 0 (line 243) means advancement only matters when there's pending data
  2. The subsequent GetPendingData() call (line 257) fetches the actual data
  3. If advancement fails silently (issue Peer discovery #2), the count may be incorrect, leading to unnecessary submission attempts

Recommendation: Consider adding metrics or logging in the submission loop to track:

  • How often advancement occurs
  • How many blocks are skipped
  • When advancement fails

Summary of Recommendations (Prioritized)

Must Fix (High Priority):

  1. ⚠️ Issue ABCI interface #1: Fix context handling in NumPendingData() to accept context parameter for proper cancellation

Should Fix (Medium Priority):
2. Issue #2: Add logging for errors in advancePastEmptyData() to improve observability
3. Issue #3: Rename function or document side effects clearly
4. Issue #5: Consider batching store writes or limiting advancement per call

Nice to Have (Low Priority):
5. Issue #4: Document race condition assumptions or make API more atomic
6. Issue #13: Improve documentation of advancePastEmptyData()
7. Issue #17: Add metrics for empty block advancement


Positive Aspects ✅

  1. Correct Solution: The approach of auto-advancing past empty blocks is the right design
  2. Thorough Testing: Excellent test coverage with clear edge cases
  3. Code Quality: Clean, readable implementation following Go idioms
  4. Good Cleanup: Removal of redundant init() methods improves the API
  5. Atomic Operations: Proper use of atomic operations for concurrent safety
  6. No Breaking Changes: Changes are backward compatible

Codecov Analysis

The Codecov report shows 69.23% patch coverage with 4 lines missing coverage:

Likely uncovered lines in advancePastEmptyData:

  • Error paths (lines 92-93, 100-102)
  • Early returns on error

Recommendation: The coverage is acceptable since:

  1. Error paths are defensive and tested indirectly
  2. The main logic paths are covered by comprehensive tests
  3. Increasing coverage to 100% would require mocking store failures, which may not add significant value

Final Verdict

Approval Recommendation: ⚠️ Approve with Required Changes

The PR successfully solves the stated problem and includes excellent test coverage. However, the context handling issue (#1) should be addressed before merging to prevent potential production issues with hung submission loops.

Required Before Merge:

Recommended Before Merge:

Can Be Follow-up:


@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @julienrbrt, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the PendingData cache mechanism to ensure that the reported count of pending data blocks accurately reflects only those blocks that contain actual transactions and are therefore eligible for submission to the Data Availability layer. This is achieved by introducing logic to automatically advance the 'last submitted data height' past any consecutive empty blocks, preventing misleading metrics and improving the efficiency of the submission process.

Highlights

  • Accurate Pending Data Count: The NumPendingData() function now provides a more accurate count by automatically skipping data blocks that are empty and thus not submitted to the Data Availability (DA) layer.
  • Automatic Empty Block Advancement: A new internal function, advancePastEmptyData, has been introduced to automatically update the lastSubmittedDataHeight past any consecutive empty data blocks, ensuring that only relevant blocks are considered pending.
  • Refactoring of init() Methods: Redundant init() methods have been removed from both PendingData and PendingHeaders structs, simplifying the codebase.
  • Enhanced Test Coverage: New unit tests have been added to thoroughly validate the logic for advancing past empty data blocks in various scenarios, including blocks at the start, middle, and end of the pending sequence.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • block/internal/cache/pending_data.go
    • Removed the unused init() method.
    • Modified NumPendingData() to invoke advancePastEmptyData() for accurate counting.
    • Added advancePastEmptyData() function to skip empty data blocks when determining pending data.
  • block/internal/cache/pending_data_test.go
    • Updated TestPendingData_BasicFlow to use blocks with transactions for more realistic testing.
    • Added TestPendingData_AdvancesPastEmptyData to verify correct advancement past empty blocks.
    • Added TestPendingData_AdvancesPastAllEmptyToEnd to test scenarios where all remaining blocks are empty.
    • Added TestPendingData_AdvancesPastEmptyAtStart to test advancement when empty blocks are at the beginning.
    • Ensured TestPendingData_InitFromMetadata creates a non-empty block at height 3.
    • Changed require.Empty(t, pending) to require.Nil(t, pending) in TestPendingData_GetPending_PropagatesFetchError for consistency.
  • block/internal/cache/pending_headers.go
    • Removed the unused init() method.
  • block/internal/submitting/submitter.go
    • Performed a minor whitespace adjustment.
Activity
  • No specific activity (comments, reviews, or progress updates) has been recorded for this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the PendingData and PendingHeaders caches to correctly account for and skip empty data blocks when calculating the number of pending items. This addresses an issue where NumPendingData() was returning an inflated value due to counting empty blocks that would not be submitted to the Data Availability (DA) layer. The changes include introducing an advancePastEmptyData method in PendingData and updating test cases to reflect this new behavior, including comprehensive tests for various empty data scenarios. Additionally, redundant init() methods were removed, and a minor correction from require.Empty to require.Nil was made in a test assertion for better precision.

}

func (pd *PendingData) NumPendingData() uint64 {
pd.advancePastEmptyData(context.Background())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The advancePastEmptyData function is called with context.Background(). While NumPendingData itself doesn't accept a context, advancePastEmptyData performs store operations that could potentially be long-running or require cancellation. Using context.Background() means these operations are not cancellable and will run indefinitely if the underlying store operations hang. Consider if NumPendingData should accept a context, or if a context with a timeout/cancellation should be derived for advancePastEmptyData to ensure responsiveness in scenarios where NumPendingData might be called in a cancellable loop.

@codecov
Copy link

codecov bot commented Feb 3, 2026

Codecov Report

❌ Patch coverage is 69.23077% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.19%. Comparing base (e8e8946) to head (4963492).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
block/internal/cache/pending_data.go 69.23% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3046      +/-   ##
==========================================
+ Coverage   56.17%   56.19%   +0.02%     
==========================================
  Files         118      118              
  Lines       12066    12075       +9     
==========================================
+ Hits         6778     6786       +8     
+ Misses       4543     4542       -1     
- Partials      745      747       +2     
Flag Coverage Δ
combined 56.19% <69.23%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@julienrbrt julienrbrt added this pull request to the merge queue Feb 3, 2026
Merged via the queue into main with commit e7bf19d Feb 3, 2026
29 of 30 checks passed
@julienrbrt julienrbrt deleted the julien/pending-data branch February 3, 2026 15:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants