Fix lexicographic search by rgsl888prabhu · Pull Request #969 · NVIDIA/cuopt

rgsl888prabhu · 2026-03-17T21:24:44Z

Description

Fix GES lexicographic search cudaErrorInvalidValue (test_lex_smoke)

Summary

Fixes cudaErrorInvalidValue in the guided ejection search (GES) lexicographic kernel for PDP routing. The failure was caused by launching with a block count computed for the initial k_max while the loop could reduce k_max (and thus the valid block range) before launch.

Root cause

n_blocks_lexico was set once using the initial k_max (e.g. 5).
The loop that finds a valid shared-memory size can decrement k_max (e.g. to 2–4) when set_shmem_of_kernel fails.
The kernel was still launched with the original block count (e.g. 77 for k_max = 5) but received the reduced k_max (e.g. 2).
The kernel uses delivery_insertion_idx_in_permutation = blockIdx.x / n_lexico_positions and expects the grid size to be n_lexico_positions * max_neighbors(k_max). With too many blocks, blockIdx.x went out of the intended range and led to invalid behavior and cudaErrorInvalidValue.

Changes

cpp/src/routing/ges/lexicographic_search/lexicographic_search.cu
- Compute only the number of insertion positions (n_lexico_positions) before the loop.
- After the loop, set n_blocks_lexico = n_lexico_positions * max_neighbors(k_max) using the final k_max so the launch config matches the kernel.
- If n_blocks_lexico <= 0, return false and skip the kernel launch.

Testing

pytest python/cuopt/cuopt/tests/routing/test_warnings.py::test_lex_smoke passes after rebuilding libcuopt.

Issue

Checklist

I am familiar with the Contributing Guidelines.
Testing
- New or existing tests cover these changes
- Added tests
- Created an issue to follow-up
- NA
Documentation
- The documentation is up to date with these changes
- Added new documentation
- NA

copy-pr-bot · 2026-03-17T21:24:48Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-03-18T16:17:01Z

📝 Walkthrough

Walkthrough

The change relocates the computation of n_blocks_lexico from before a while-loop to after it completes, ensuring the grid launch parameter uses the finalized k_max value. A duplicate pre-loop calculation is removed.

Changes

Cohort / File(s)	Summary
Kernel Grid Parameter Timing `cpp/src/routing/ges/lexicographic_search/lexicographic_search.cu`	Moved `n_blocks_lexico` computation post-loop to use final `k_max` value; removed pre-loop duplicate calculation.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Fix lexicographic search' directly refers to the main change: fixing a cudaErrorInvalidValue bug in the lexicographic search kernel by reordering block count computation.
Description check	✅ Passed	The description is detailed and clearly related to the changeset, explaining the root cause, changes made, and testing confirmation for the lexicographic search fix.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

📝 Coding Plan

Generate coding plan for human review comments

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

cpp/src/routing/ges/lexicographic_search/lexicographic_search.cu (1)
701-707: The fix correctly addresses the root cause by computing n_blocks_lexico after k_max is finalized.

The relocation of this computation ensures the grid launch parameter uses the final k_max value determined by the shared-memory adjustment loop.

However, the PR description mentions adding a check: "If n_blocks_lexico <= 0, return false and skip the kernel launch." This defensive check appears to be missing. While practically unlikely given k_max >= 2 at this point, adding the guard would prevent launching kernels with invalid grid dimensions in edge cases.
🛡️ Suggested defensive check
   // Compute n_blocks_lexico after k_max is finalized by the while-loop above
   i_t n_blocks_lexico = (solution_ptr->get_num_orders() + solution_ptr->get_n_routes() -
                          solution_ptr->problem_ptr->order_info.depot_included_);
   if constexpr (REQUEST == request_t::PDP) {
     n_blocks_lexico *= max_neighbors<i_t, REQUEST>(k_max);
   }
+
+  if (n_blocks_lexico <= 0) { return false; }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/routing/ges/lexicographic_search/lexicographic_search.cu` around
lines 701 - 707, The code now computes n_blocks_lexico after k_max is finalized
but is missing the defensive guard promised in the PR; add a check after
computing n_blocks_lexico that if n_blocks_lexico <= 0 then return false
(skipping the kernel launch), ensuring this check applies for both generic
REQUEST and the PDP branch (where n_blocks_lexico is multiplied by
max_neighbors<i_t, REQUEST>(k_max)); reference the symbols n_blocks_lexico,
k_max, max_neighbors, and request_t::PDP when adding the early-return guard to
prevent invalid grid dimensions.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@cpp/src/routing/ges/lexicographic_search/lexicographic_search.cu`:
- Around line 701-707: The code now computes n_blocks_lexico after k_max is
finalized but is missing the defensive guard promised in the PR; add a check
after computing n_blocks_lexico that if n_blocks_lexico <= 0 then return false
(skipping the kernel launch), ensuring this check applies for both generic
REQUEST and the PDP branch (where n_blocks_lexico is multiplied by
max_neighbors<i_t, REQUEST>(k_max)); reference the symbols n_blocks_lexico,
k_max, max_neighbors, and request_t::PDP when adding the early-return guard to
prevent invalid grid dimensions.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5a68901e-6066-4030-a060-a399f2352f77

📥 Commits

Reviewing files that changed from the base of the PR and between 591ac5f and 27f087a.

📒 Files selected for processing (1)

cpp/src/routing/ges/lexicographic_search/lexicographic_search.cu

rgsl888prabhu · 2026-03-19T14:50:07Z

/merge

fix

b91f70e

fix

27f087a

rgsl888prabhu changed the title ~~fix~~ Fix lexicographic search Mar 18, 2026

rgsl888prabhu self-assigned this Mar 18, 2026

rgsl888prabhu added bug Something isn't working non-breaking Introduces a non-breaking change labels Mar 18, 2026

rgsl888prabhu marked this pull request as ready for review March 18, 2026 16:13

rgsl888prabhu requested a review from a team as a code owner March 18, 2026 16:13

rgsl888prabhu requested review from Bubullzz, akifcorduk and rg20 and removed request for Bubullzz and akifcorduk March 18, 2026 16:13

coderabbitai bot reviewed Mar 18, 2026

View reviewed changes

rg20 approved these changes Mar 18, 2026

View reviewed changes

rapids-bot bot merged commit 04891d8 into NVIDIA:main Mar 19, 2026
507 of 520 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix lexicographic search#969

Fix lexicographic search#969
rapids-bot[bot] merged 2 commits intoNVIDIA:mainfrom
rgsl888prabhu:fix_lex_search

rgsl888prabhu commented Mar 17, 2026 •

edited

Loading

Uh oh!

copy-pr-bot bot commented Mar 17, 2026

Uh oh!

coderabbitai bot commented Mar 18, 2026

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Uh oh!

rgsl888prabhu commented Mar 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

rgsl888prabhu commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Fix GES lexicographic search cudaErrorInvalidValue (test_lex_smoke)

Summary

Root cause

Changes

Testing

Issue

Checklist

Uh oh!

copy-pr-bot bot commented Mar 17, 2026

Uh oh!

coderabbitai bot commented Mar 18, 2026

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

rgsl888prabhu commented Mar 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rgsl888prabhu commented Mar 17, 2026 •

edited

Loading