Warm start barrier by taking the PDLP solution, moving it to the interior, and running PDHG to find a point on the central path #955
📝 Walkthrough

The PR introduces a …

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~70 minutes

🚥 Pre-merge checks: ❌ Failed checks (1 warning) | ✅ Passed checks (2 passed)
Actionable comments posted: 9
🧹 Nitpick comments (2)
cpp/src/barrier/barrier.hpp (1)
79-84: Bundle the PDHG warm start into a single validated object.

These five vectors are one logical state, but exposing them independently makes partial initialization easy and forces initial_point() to infer validity from x_pdhg alone. A small warm-start struct plus setter/clearer would reduce the coupling between PDHG and barrier and give you one place to validate dimensions and interiority.

Based on learnings: Reduce tight coupling between solver components (presolve, simplex, basis, barrier); increase modularity and reusability of optimization algorithms.
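Under the reviewer's suggestion, such a bundle might look like the following sketch. All names here are illustrative (the real code is templated on i_t/f_t, and the actual member names may differ); the point is that sizes and strict positivity are checked in one place before the warm start can be observed as "set":

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical warm-start bundle, not the actual cuOpt API.
struct PDHGWarmStart {
  std::vector<double> x, w, y, z, v;
  bool set_ = false;

  // Accept the vectors only if they match the presolved problem sizes and
  // every entry that must stay interior is strictly positive.
  bool set(std::vector<double> x_in, std::vector<double> w_in,
           std::vector<double> y_in, std::vector<double> z_in,
           std::vector<double> v_in,
           std::size_t n_free, std::size_t n_upper, std::size_t m_eq)
  {
    auto interior = [](const std::vector<double>& u) {
      for (double u_i : u)
        if (!(u_i > 0.0)) return false;
      return true;
    };
    if (x_in.size() != n_free || z_in.size() != n_free ||
        w_in.size() != n_upper || v_in.size() != n_upper || y_in.size() != m_eq)
      return false;
    if (!interior(x_in) || !interior(w_in) || !interior(z_in) || !interior(v_in))
      return false;
    x = std::move(x_in); w = std::move(w_in); y = std::move(y_in);
    z = std::move(z_in); v = std::move(v_in);
    set_ = true;
    return true;
  }

  void clear() { x.clear(); w.clear(); y.clear(); z.clear(); v.clear(); set_ = false; }
  bool is_set() const { return set_; }
};
```

With this shape, initial_point() only needs to ask is_set() instead of inferring validity from x_pdhg alone.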
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/barrier/barrier.hpp` around lines 79 - 84, Replace the five separate PDHG warm-start vectors (x_pdhg, w_pdhg, y_pdhg, z_pdhg, v_pdhg) with a single PDHGWarmStart struct that owns those vectors and provides validate(), set(const PDHGWarmStart&), clear(), and isSet() methods; update initial_point() and any code that referenced the individual vectors to accept/ask the PDHGWarmStart object and call validate() (checking dimensions and interiority) before using its contents; ensure the setter performs dimension checks against the problem sizes and that clear() resets the isSet flag so partial initialization cannot occur.

cpp/src/dual_simplex/crossover.cu (1)
1884-1906: Remove the per-iteration stream synchronizations from this hot path.

These kernels are already issued on the same stream, so the sync_stream() calls serialize every PDHG iteration and will dominate runtime long before the math does. Synchronize only where host-visible data is actually needed.

As per coding guidelines, "Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/dual_simplex/crossover.cu` around lines 1884 - 1906, Remove the per-iteration host synchronizations: delete the lp.handle_ptr->sync_stream() calls surrounding the cusparse_view.transpose_spmv and the two cub::DeviceTransform::Transform invocations (the sync before transpose_spmv and the two syncs after the transforms). These kernels are launched on the same stream so the sync_stream() calls serialize the hot path; rely on the stream for ordering and only synchronize later at points where host-visible data is actually required (or replace with stream/event-based sync there), leaving the cusparse_view.transpose_spmv, both cub::DeviceTransform::Transform calls, and the RAFT_CHECK_CUDA checks intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/barrier/barrier.cu`:
- Around line 1974-1980: Validate that all warm-start vectors x_pdhg, w_pdhg,
y_pdhg, z_pdhg, v_pdhg are present and exactly match the current presolved
problem dimensions (match data.x.size(), data.w.size(), data.y.size(),
data.z.size(), data.v.size() or the problem n/m counts) before assigning them to
data.x/data.w/data.y/data.z/data.v; if any size mismatches occur, skip the warm
start and leave existing data or reset to a safe default. After a successful
full-size overwrite, re-establish strict interiority by pushing any non‑positive
entries in data.x/data.z and data.w/data.v to a small positive tolerance (e.g.,
eps) so subsequent divisions in gpu_compute_search_direction() cannot divide by
zero or negative values. Ensure you reference and check the five symbols x_pdhg,
w_pdhg, y_pdhg, z_pdhg, v_pdhg and update data.x, data.w, data.y, data.z, data.v
atomically only when all validations pass.
In `@cpp/src/dual_simplex/crossover.cu`:
- Around line 1974-2013: The branch that handles barrier_status after
barrier_solver.solve currently returns based on barrier_status but never
copies/postsolves barrier_solution into the caller-facing solution nor
initializes vstatus, and it collapses TIME_LIMIT/CONCURRENT_LIMIT into
NUMERICAL_ISSUES; update the code after barrier_solver.solve to (1) when
barrier_status == OPTIMAL, postsolve/copy barrier_solution into solution and set
vstatus appropriately (initialize any variable-status arrays the caller
expects), and return crossover_status_t::OPTIMAL; (2) when barrier_status ==
TIME_LIMIT return the corresponding crossover_status_t TIME_LIMIT (preserving
the termination reason), when barrier_status == CONCURRENT_LIMIT return
CONCURRENT_LIMIT, and for any other non-OPTIMAL status return NUMERICAL_ISSUES;
use the existing symbols barrier_solution, barrier_solver.solve, barrier_status,
solution, and vstatus to locate where to perform the postsolve/copy and status
propagation.
- Around line 1865-1967: The PDHG main for-loop (for (i_t iter = 0; iter <
iter_max; iter++)) never checks time_limit or the concurrent_halt flag, so
long-running/cancelled solves keep using the GPU; add termination checks inside
the loop (e.g., at the top or immediately after the periodic sync/printf block)
that compute elapsed time via toc(start_time) and break if elapsed >=
time_limit, and also check a shared concurrent_halt (or similar cancellation
token) and break if set; update the same cleanup/raft::copy-to-host and
converged handling (x,y,xbar) when breaking due to timeout/cancel to ensure
consistent state.
- Around line 1817-1823: norm_c or barrier_lp.A.norm2_estimate can be zero
causing primal_weight, eta, tau or sigma to become non-finite; update the
initialization around norm_b, norm_c, primal_weight, eta, tau, sigma (and use
mu_target) to guard against zero/near-zero values by introducing a small epsilon
(e.g. eps = numeric_limits<f_t>::epsilon() or a configurable tiny constant),
clamp norm_b/norm_c and the norm2_estimate denominator to >= eps, compute
primal_weight = (norm_b > eps) ? norm_c / norm_b : 1.0, compute eta = 0.998 /
max(barrier_lp.A.norm2_estimate(1e-4), eps), then set tau = eta /
max(primal_weight, eps) and sigma = eta * max(primal_weight, eps); finally
ensure tau/sigma are finite (fallback to safe defaults if not) so PDHG never
starts with NaN/Inf.
In `@cpp/src/dual_simplex/presolve.cpp`:
- Around line 838-855: The presolve negation branch (when
settings.barrier_presolve && no_lower_bound > 0) flips variable bounds and A(:,
j) but fails to negate the linear objective and any quadratic/postsolve data;
update problem.objective[j] to -problem.objective[j] (not *= 1) and
record/negate any quadratic terms and postsolve mappings associated with column
j so that presolve_info.negated_variables and uncrush_solution can restore signs
correctly; ensure all places that consume negated_variables (e.g.,
uncrush_solution, postsolve reconstruction, and any quadratic Hessian or local
quadratic coefficients) apply the same sign flip.
- Around line 1155-1205: crush_solution_to_presolve_space currently hard-aborts
on unsupported presolve mappings (removed_constraints, removed_variables,
free_variable_pairs) and assumes removed_lower_bounds/negated_variables align
with original_x, which can kill the process or produce mis-sized outputs; change
the function signature to return a status (bool or enum) instead of void, remove
printf/exit, and return false when encountering unsupported transforms (preserve
existing behavior for negated_variables/removed_lower_bounds only if all index
checks pass); add bounds checks before indexing (verify
presolve_info.negated_variables[k] < crushed_x.size() and
removed_lower_bounds.size() == crushed_x.size() before subtracting) and document
that callers must short-circuit warm-starting when this function returns false.
In `@cpp/src/dual_simplex/random.hpp`:
- Around line 33-37: The template function random_value<T> should be constrained
to floating-point types and use a distribution parameterized by T: add a
static_assert(std::is_floating_point<T>::value, "...") in random_value to
prevent integral misuse, change the distribution type to
std::uniform_real_distribution<T> so the generated type matches T, and ensure
<type_traits> is included; update the call sites if any rely on implicit double
behavior.
In `@cpp/src/dual_simplex/sparse_matrix.cpp`:
- Around line 867-894: The loop currently divides by Sx_norm and norm_x using
exact-zero checks which can still divide by tiny values; guard these
normalizations with a relative tolerance instead. In the power-iteration section
(symbols: matrix_vector_multiply, matrix_transpose_vector_multiply,
vector_norm2, random_t and variables Sx_norm, norm_x, e, x) replace the
Sx_norm==0.0 and norm_x==0.0 checks with tests like Sx_norm <= max(tol,
std::numeric_limits<f_t>::epsilon())*reference_norm (or tol * e0) and similarly
for norm_x, and if the norm is below that threshold regenerate Sx (or
reinitialize x) and skip the division (or scale safely); ensure divisions x[j]
/= norm_x and e = norm_x / Sx_norm only occur when both norms exceed the chosen
tolerance to avoid inf/NaN contamination.
In `@cpp/src/pdlp/solve.cu`:
- Around line 623-624: The change replaces the classical crossover with
dual_simplex::central_path which returns an interior-point solution (via
barrier_solver.solve()) and does not produce a basis nor correctly-populated
vstatus; either revert to the original classical crossover when
settings.crossover is set or add an explicit crossover step after central_path
to convert the interior solution into a full basic solution and populate
vstatus. Specifically, restore or call the previous crossover routine (e.g.,
classical_crossover / simplex_crossover) instead of relying on central_path
alone, or after central_path invoke a crossover routine that takes
vertex_solution and fills vstatus and returns a proper
dual_simplex::crossover_status_t; also update the switch/handling of
dual_feasible/other statuses to map all possible central_path/crossover outputs
to existing error enums and add a short comment documenting why
central_path+explicit crossover is safe if you choose to keep it.
---
Nitpick comments:
In `@cpp/src/barrier/barrier.hpp`:
- Around line 79-84: Replace the five separate PDHG warm-start vectors (x_pdhg,
w_pdhg, y_pdhg, z_pdhg, v_pdhg) with a single PDHGWarmStart struct that owns
those vectors and provides validate(), set(const PDHGWarmStart&), clear(), and
isSet() methods; update initial_point() and any code that referenced the
individual vectors to accept/ask the PDHGWarmStart object and call validate()
(checking dimensions and interiority) before using its contents; ensure the
setter performs dimension checks against the problem sizes and that clear()
resets the isSet flag so partial initialization cannot occur.
In `@cpp/src/dual_simplex/crossover.cu`:
- Around line 1884-1906: Remove the per-iteration host synchronizations: delete
the lp.handle_ptr->sync_stream() calls surrounding the
cusparse_view.transpose_spmv and the two cub::DeviceTransform::Transform
invocations (the sync before transpose_spmv and the two syncs after the
transforms). These kernels are launched on the same stream so the sync_stream()
calls serialize the hot path; rely on the stream for ordering and only
synchronize later at points where host-visible data is actually required (or
replace with stream/event-based sync there), leaving the
cusparse_view.transpose_spmv, both cub::DeviceTransform::Transform calls, and
the RAFT_CHECK_CUDA checks intact.
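The sparse_matrix.cpp tolerance guard in the prompts above can be illustrated on a dense toy matrix. This is a hypothetical stand-alone sketch, not the cuOpt power iteration itself: it estimates ||A||_2 by iterating x ← A'Ax and, as the review asks, only divides when the norms clear a tolerance, so an all-zero matrix yields 0 rather than NaN:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative power iteration for the 2-norm of a small dense matrix.
double norm2_estimate(const std::vector<std::vector<double>>& A, int iters)
{
  const double tol = 1e-300;  // stand-in for eps * reference_norm
  std::size_t n = A[0].size();
  std::vector<double> x(n, 1.0), Ax(A.size());
  double e = 0.0;
  for (int k = 0; k < iters; k++) {
    // Ax = A * x
    for (std::size_t i = 0; i < A.size(); i++) {
      Ax[i] = 0.0;
      for (std::size_t j = 0; j < n; j++) Ax[i] += A[i][j] * x[j];
    }
    // x = A' * Ax, accumulating its norm
    double norm_x = 0.0;
    for (std::size_t j = 0; j < n; j++) {
      x[j] = 0.0;
      for (std::size_t i = 0; i < A.size(); i++) x[j] += A[i][j] * Ax[i];
      norm_x += x[j] * x[j];
    }
    norm_x = std::sqrt(norm_x);
    double norm_Ax = 0.0;
    for (double v : Ax) norm_Ax += v * v;
    norm_Ax = std::sqrt(norm_Ax);
    // Guard: only divide when both norms clear the tolerance.
    if (norm_x <= tol || norm_Ax <= tol) break;
    for (std::size_t j = 0; j < n; j++) x[j] /= norm_x;
    e = norm_x / norm_Ax;  // ||A'Ax|| / ||Ax|| -> sigma_max(A)
  }
  return e;
}
```

Because the ratio ||A'Ax|| / ||Ax|| is scale-invariant, skipping the normalization when a norm is tiny is safe rather than NaN-producing.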
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3c7ebf4a-64ad-424b-ba57-13c2def561c6
📒 Files selected for processing (12)
cpp/src/barrier/barrier.cu
cpp/src/barrier/barrier.hpp
cpp/src/dual_simplex/CMakeLists.txt
cpp/src/dual_simplex/crossover.cu
cpp/src/dual_simplex/crossover.hpp
cpp/src/dual_simplex/presolve.cpp
cpp/src/dual_simplex/presolve.hpp
cpp/src/dual_simplex/random.hpp
cpp/src/dual_simplex/sparse_matrix.cpp
cpp/src/dual_simplex/sparse_matrix.hpp
cpp/src/pdlp/solve.cu
cpp/src/pdlp/translate.hpp
  if (x_pdhg.size() > 0) {
    data.x = x_pdhg;
    data.w = w_pdhg;
    data.y = y_pdhg;
    data.z = z_pdhg;
    data.v = v_pdhg;
  }
Validate the PDHG warm start before overwriting the barrier point.
x_pdhg.size() > 0 is the only gate here, so a partially populated or mis-sized warm start can shrink data.w/y/z/v to the wrong dimensions. Also, because this overwrite happens after the positivity adjustments above, zeros or negatives can flow into the later z / x and v / w divisions in gpu_compute_search_direction(). Please validate all five vector sizes against the current presolved problem and re-establish strict interiority after the overwrite.
Suggested guard
- if (x_pdhg.size() > 0) {
+ if (!x_pdhg.empty()) {
+ if (x_pdhg.size() != data.x.size() || w_pdhg.size() != data.w.size() ||
+ y_pdhg.size() != data.y.size() || z_pdhg.size() != data.z.size() ||
+ v_pdhg.size() != data.v.size()) {
+ settings.log.printf("Invalid PDHG warm-start dimensions\n");
+ return -1;
+ }
data.x = x_pdhg;
data.w = w_pdhg;
data.y = y_pdhg;
data.z = z_pdhg;
data.v = v_pdhg;
+ data.w.ensure_positive(epsilon_adjust);
+ data.x.ensure_positive(epsilon_adjust);
+ data.v.ensure_positive(epsilon_adjust);
+ data.z.ensure_positive(epsilon_adjust);
 }

Based on learnings: Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases, and ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations.
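The "re-establish strict interiority" step in the guard is small enough to sketch directly. This is a hedged stand-in: ensure_positive here is a hypothetical free function over a host vector, whereas the real code would apply the same clamp to device vectors:

```cpp
#include <cassert>
#include <vector>

// Clamp non-positive entries to a small positive epsilon so later
// divisions (z / x, v / w) in the search direction stay finite.
void ensure_positive(std::vector<double>& u, double eps)
{
  for (double& u_i : u)
    if (u_i <= 0.0) u_i = eps;
}
```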
  f_t norm_b = vector_norm2<i_t, f_t>(barrier_lp.rhs);
  f_t norm_c = vector_norm2<i_t, f_t>(barrier_lp.objective);
  f_t primal_weight = norm_b > 0.0 ? norm_c / norm_b : 1.0;
  f_t eta = 0.998 / barrier_lp.A.norm2_estimate(1e-4);
  f_t tau = eta / primal_weight;
  f_t sigma = eta * primal_weight;
  f_t mu = mu_target;
Guard PDHG step-size initialization against zero norms.
norm_c == 0 makes primal_weight zero, and a zero/empty A makes norm2_estimate() zero; either case sends tau or sigma non-finite before the first iteration. Zero-objective or fully-presolved LPs are valid inputs here.
🛡️ Suggested guard
- f_t norm_b = vector_norm2<i_t, f_t>(barrier_lp.rhs);
- f_t norm_c = vector_norm2<i_t, f_t>(barrier_lp.objective);
- f_t primal_weight = norm_b > 0.0 ? norm_c / norm_b : 1.0;
- f_t eta = 0.998 / barrier_lp.A.norm2_estimate(1e-4);
+ f_t norm_b = vector_norm2<i_t, f_t>(barrier_lp.rhs);
+ f_t norm_c = vector_norm2<i_t, f_t>(barrier_lp.objective);
+ f_t primal_weight = (norm_b > 0.0 && norm_c > 0.0) ? norm_c / norm_b : 1.0;
+ f_t norm_A = barrier_lp.A.norm2_estimate(1e-4);
+ if (!(norm_A > 0.0)) { return crossover_status_t::NUMERICAL_ISSUES; }
+ f_t eta = 0.998 / norm_A;
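The guarded initialization the comment describes can be sketched as a self-contained helper. This is an illustrative version only; norm_b, norm_c, and norm_A stand in for the values computed from the actual LP data, and the clamping policy follows the prompt rather than any existing cuOpt function:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

struct StepSizes { double primal_weight, eta, tau, sigma; bool ok; };

StepSizes init_step_sizes(double norm_b, double norm_c, double norm_A)
{
  const double eps = std::numeric_limits<double>::epsilon();
  StepSizes s{};
  // A (near-)zero matrix norm means no meaningful PDHG step exists.
  if (!(norm_A > eps)) { s.ok = false; return s; }
  // Fall back to a neutral weight when either norm is (near) zero.
  s.primal_weight = (norm_b > eps && norm_c > eps) ? norm_c / norm_b : 1.0;
  s.eta   = 0.998 / norm_A;
  s.tau   = s.eta / std::max(s.primal_weight, eps);
  s.sigma = s.eta * std::max(s.primal_weight, eps);
  s.ok = std::isfinite(s.tau) && std::isfinite(s.sigma);
  return s;
}
```

Zero-objective or fully-presolved LPs then start PDHG with finite tau/sigma instead of NaN/Inf.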
  for (i_t iter = 0; iter < iter_max; iter++) {
    // Dual update
    // delta_y = -sigma * A * xbar + sigma * b = sigma * (b - A * xbar)
    raft::copy(d_delta_y.data(), d_b.data(), m, lp.handle_ptr->get_stream());
    cusparse_view.spmv(-1, d_xbar, 1, d_delta_y);
    cub::DeviceTransform::Transform(
      cuda::std::make_tuple(d_delta_y.data(), d_y.data()),
      d_y.data(),
      d_delta_y.size(),
      [sigma] HD(f_t delta_y_i, f_t dy_i) -> f_t { return dy_i + delta_y_i * sigma; },
      lp.handle_ptr->get_stream().value());
    RAFT_CHECK_CUDA(lp.handle_ptr->get_stream());

    raft::copy(d_xold.data(), d_x.data(), n, lp.handle_ptr->get_stream());

    // xold = x
    // Primal gradient step
    // v = x + tau * A' * y
    lp.handle_ptr->sync_stream();
    cusparse_view.transpose_spmv(1.0, d_y, 0.0, d_v);
    cub::DeviceTransform::Transform(
      cuda::std::make_tuple(d_x.data(), d_v.data()),
      d_v.data(),
      d_v.size(),
      [tau] HD(f_t x_i, f_t v_i) -> f_t { return x_i + tau * v_i; },
      lp.handle_ptr->get_stream().value());
    RAFT_CHECK_CUDA(lp.handle_ptr->get_stream());
    lp.handle_ptr->sync_stream();

    cub::DeviceTransform::Transform(
      cuda::std::make_tuple(d_v.data(), d_c.data(), d_x.data(), d_xold.data()),
      thrust::make_zip_iterator(d_x.data(), d_xbar.data()),
      d_x.size(),
      [tau, mu] HD(f_t v_j, f_t c_j, f_t x_j, f_t xold_j) -> thrust::tuple<f_t, f_t> {
        f_t w_j = v_j - tau * c_j;
        f_t new_x_j = (w_j + std::sqrt(w_j * w_j + 4.0 * tau * mu)) / 2.0;
        return {new_x_j, 2.0 * new_x_j - xold_j};
      },
      lp.handle_ptr->get_stream().value());
    RAFT_CHECK_CUDA(lp.handle_ptr->get_stream());
    lp.handle_ptr->sync_stream();

    if (iter % 1000 == 0) {
      raft::copy(x.data(), d_x.data(), n, lp.handle_ptr->get_stream());
      raft::copy(d_primal_residual.data(), d_b.data(), m, lp.handle_ptr->get_stream());
      cusparse_view.spmv(1.0, d_x, -1.0, d_primal_residual);

      cub::DeviceTransform::Transform(
        cuda::std::make_tuple(d_xbar.data(), d_c.data()),
        d_dual_residual.data(),
        d_xbar.size(),
        [mu_target] HD(f_t xbar_j, f_t c_j) -> f_t { return mu_target / xbar_j - c_j; },
        lp.handle_ptr->get_stream().value());
      RAFT_CHECK_CUDA(lp.handle_ptr->get_stream());
      lp.handle_ptr->sync_stream();

      cusparse_view.transpose_spmv(1.0, d_y, 1.0, d_dual_residual);

      f_t vector_norm_2_primal =
        std::sqrt(thrust::inner_product(rmm::exec_policy(lp.handle_ptr->get_stream()),
                                        d_primal_residual.data(),
                                        d_primal_residual.data() + m,
                                        d_primal_residual.data(),
                                        0.0));
      f_t vector_norm_2_dual =
        std::sqrt(thrust::inner_product(rmm::exec_policy(lp.handle_ptr->get_stream()),
                                        d_dual_residual.data(),
                                        d_dual_residual.data() + n,
                                        d_dual_residual.data(),
                                        0.0));

      f_t primal_relative = vector_norm_2_primal / (1.0 + norm_b);
      f_t dual_relative   = vector_norm_2_dual / (1.0 + norm_c);

      f_t primal_obj = thrust::inner_product(rmm::exec_policy(lp.handle_ptr->get_stream()),
                                             d_c.data(),
                                             d_c.data() + n,
                                             d_xbar.data(),
                                             0.0);
      f_t dual_obj = thrust::inner_product(rmm::exec_policy(lp.handle_ptr->get_stream()),
                                           d_b.data(),
                                           d_b.data() + m,
                                           d_y.data(),
                                           0.0);
      f_t time = toc(start_time);
      printf("%8d %+12.6e %+12.6e %8.2e %8.2e %8.1fs\n",
             iter,
             primal_obj,
             dual_obj,
             primal_relative,
             dual_relative,
             time);

      if (primal_relative < 1e-5 && dual_relative < 1e-5) {
        converged = true;
        raft::copy(x.data(), d_x.data(), n, lp.handle_ptr->get_stream());
        raft::copy(y.data(), d_y.data(), m, lp.handle_ptr->get_stream());
        raft::copy(xbar.data(), d_xbar.data(), n, lp.handle_ptr->get_stream());
        break;
      }
    }
  }
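The closed-form primal update inside the loop above is the prox step of a log-barrier term: it minimizes c'x - mu * log(x) + (1/2 tau) * (x - v)^2 per coordinate, whose stationarity condition is the quadratic x^2 - w*x - tau*mu = 0 with w = v - tau*c. This small standalone check (illustrative, mirroring the formula in the review excerpt) verifies the root formula:

```cpp
#include <cassert>
#include <cmath>

// new_x = (w + sqrt(w*w + 4*tau*mu)) / 2 is the positive root of
// x^2 - w*x - tau*mu = 0, i.e. the log-barrier prox step.
double prox_log_barrier(double w, double tau, double mu)
{
  return (w + std::sqrt(w * w + 4.0 * tau * mu)) / 2.0;
}
```

Note the root is always strictly positive for tau*mu > 0, which is what keeps the iterates interior.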
Honor time_limit and concurrent_halt inside the PDHG loop.
This loop can run up to 1,000,000 iterations without checking either termination condition, so a timed-out or cancelled solve keeps burning GPU time until PDHG finishes or fails.
⏱️ Suggested termination check
for (i_t iter = 0; iter < iter_max; iter++) {
+ if ((iter % 256) == 0) {
+ if (toc(start_time) > settings.time_limit) { return crossover_status_t::TIME_LIMIT; }
+ if (settings.concurrent_halt != nullptr && *settings.concurrent_halt == 1) {
+ return crossover_status_t::CONCURRENT_LIMIT;
+ }
+ }
   // Dual update
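The periodic-termination pattern suggested above can be sketched end to end. This is a hedged illustration: `toc` mimics the elapsed-seconds helper used in the solver, and the `halt` pointer stands in for settings.concurrent_halt; the real loop body would contain the PDHG updates:

```cpp
#include <cassert>
#include <chrono>

using clock_t_ = std::chrono::steady_clock;

double toc(clock_t_::time_point start)
{
  return std::chrono::duration<double>(clock_t_::now() - start).count();
}

enum class loop_exit { TIME_LIMIT, CANCELLED, ITER_LIMIT };

loop_exit run_loop(int iter_max, double time_limit, const int* halt)
{
  auto start = clock_t_::now();
  for (int iter = 0; iter < iter_max; iter++) {
    if ((iter % 256) == 0) {  // cheap periodic check, not every iteration
      if (toc(start) > time_limit) return loop_exit::TIME_LIMIT;
      if (halt != nullptr && *halt == 1) return loop_exit::CANCELLED;
    }
    // ... PDHG updates would go here ...
  }
  return loop_exit::ITER_LIMIT;
}
```

Checking only every few hundred iterations keeps the host-side clock reads off the hot path while still bounding how long a cancelled solve can keep the GPU busy.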
  if (converged) {
    printf("PDHG found point on central path\n\n\n");
    // Solve using barrier
    lp_solution_t<i_t, f_t> barrier_solution(presolved_lp.num_rows, presolved_lp.num_cols);

    barrier_solver_t<i_t, f_t> barrier_solver(presolved_lp, presolve_info, barrier_settings);

    // Convert the converged solution to (x, w, y, z, v)
    std::vector<f_t> x_pdhg(n - n_upper_bounds);
    std::vector<f_t> w_pdhg(n_upper_bounds);
    std::vector<f_t> y_pdhg(m - n_upper_bounds);
    std::vector<f_t> z_pdhg(n - n_upper_bounds);
    std::vector<f_t> v_pdhg(n_upper_bounds);

    for (i_t j = 0; j < n - n_upper_bounds; j++) {
      x_pdhg[j] = x[j];
      z_pdhg[j] = mu_target / x[j];
    }

    for (i_t j = 0; j < n_upper_bounds; j++) {
      w_pdhg[j] = x[n - n_upper_bounds + j];
      v_pdhg[j] = mu_target / x[n - n_upper_bounds + j];
    }

    for (i_t i = 0; i < m - n_upper_bounds; i++) {
      y_pdhg[i] = y[i];
    }

    barrier_solver.x_pdhg = x_pdhg;
    barrier_solver.w_pdhg = w_pdhg;
    barrier_solver.y_pdhg = y_pdhg;
    barrier_solver.z_pdhg = z_pdhg;
    barrier_solver.v_pdhg = v_pdhg;

    barrier_solver_settings_t<i_t, f_t> barrier_solver_settings;
    lp_status_t barrier_status =
      barrier_solver.solve(start_time, barrier_solver_settings, barrier_solution);

    return barrier_status == lp_status_t::OPTIMAL ? crossover_status_t::OPTIMAL
                                                  : crossover_status_t::NUMERICAL_ISSUES;
Propagate the barrier result and preserve the barrier termination code.
This branch returns based on barrier_status, but it never copies/postsolves barrier_solution into solution or initializes vstatus, so callers can observe stale outputs after an OPTIMAL return. It also collapses TIME_LIMIT and CONCURRENT_LIMIT into NUMERICAL_ISSUES, which loses the actual termination reason.
Based on learnings, in cpp/src/linear_programming/solve.cu, the barrier solver does not currently return INFEASIBLE or UNBOUNDED status. It only returns OPTIMAL, TIME_LIMIT, NUMERICAL_ISSUES, or CONCURRENT_LIMIT.
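The status propagation part of this fix is mechanical enough to sketch. Enum values below follow the names used in the comment, not necessarily the real headers; the postsolve/copy of barrier_solution into the caller's solution would happen before this mapping is returned:

```cpp
#include <cassert>

// Illustrative status mapping preserving the barrier termination reason.
enum class lp_status_t { OPTIMAL, TIME_LIMIT, CONCURRENT_LIMIT, NUMERICAL_ISSUES };
enum class crossover_status_t { OPTIMAL, TIME_LIMIT, CONCURRENT_LIMIT, NUMERICAL_ISSUES };

crossover_status_t map_barrier_status(lp_status_t s)
{
  switch (s) {
    case lp_status_t::OPTIMAL:          return crossover_status_t::OPTIMAL;
    case lp_status_t::TIME_LIMIT:       return crossover_status_t::TIME_LIMIT;
    case lp_status_t::CONCURRENT_LIMIT: return crossover_status_t::CONCURRENT_LIMIT;
    default:                            return crossover_status_t::NUMERICAL_ISSUES;
  }
}
```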
  if (settings.barrier_presolve && no_lower_bound > 0) {
    presolve_info.negated_variables.reserve(no_lower_bound);
    for (i_t j = 0; j < problem.num_cols; j++) {
      if (problem.lower[j] == -inf && problem.upper[j] < inf) {
        presolve_info.negated_variables.push_back(j);

        problem.lower[j] = -problem.upper[j];
        problem.upper[j] = inf;
        problem.objective[j] *= 1;

        const i_t col_start = problem.A.col_start[j];
        const i_t col_end = problem.A.col_start[j + 1];
        for (i_t p = col_start; p < col_end; p++) {
          problem.A.x[p] *= -1.0;
        }
      }
    }
  }
Finish the x'_j = -x_j substitution.
This block updates bounds and flips A(:, j), but the linear term is left unchanged (problem.objective[j] *= 1) and the quadratic/postsolve pieces are not updated to match. For any -inf < x_j <= u_j column, barrier presolve will otherwise solve a different objective and uncrush_solution() will return the wrong sign for x/z.
Minimum linear-term fix
- problem.objective[j] *= 1;
+ problem.objective[j] *= -1.0;

Based on learnings: Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results, and ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations.
  template <typename i_t, typename f_t>
  void crush_solution_to_presolve_space(const lp_problem_t<i_t, f_t>& original_lp,
                                        presolve_info_t<i_t, f_t>& presolve_info,
                                        const std::vector<f_t>& original_x,
                                        const std::vector<f_t>& original_y,
                                        const std::vector<f_t>& original_z,
                                        std::vector<f_t>& crushed_x,
                                        std::vector<f_t>& crushed_y,
                                        std::vector<f_t>& crushed_z)
  {
    crushed_x = original_x;
    crushed_y = original_y;
    crushed_z = original_z;

    if (presolve_info.negated_variables.size() > 0) {
      for (i_t j = 0; j < presolve_info.negated_variables.size(); j++) {
        crushed_x[presolve_info.negated_variables[j]] *= -1.0;
        crushed_z[presolve_info.negated_variables[j]] *= -1.0;
      }
    }

    // Repeat all presolve steps to get the crushed solution
    if (presolve_info.removed_lower_bounds.size() > 0) {
      // We had l_j <= x_j <= u_j
      // And we transformed it into
      // 0 <= v_j = x_j - l_j
      // So we need to subtract the removed lower bounds from the original solution
      for (i_t j = 0; j < original_x.size(); j++) {
        crushed_x[j] -= presolve_info.removed_lower_bounds[j];
      }
    }

    // Handle empty rows
    if (presolve_info.removed_constraints.size() > 0) {
      printf("Can't handle removed constraints %ld\n", presolve_info.removed_constraints.size());
      exit(1);
    }

    // Handle empty cols
    if (presolve_info.removed_variables.size() > 0) {
      printf("Can't handle removed variables %ld\n", presolve_info.removed_variables.size());
      exit(1);
    }

    // Handle free variables
    if (presolve_info.free_variable_pairs.size() > 0) {
      printf("Can't handle free variable pairs %ld\n", presolve_info.free_variable_pairs.size());
      exit(1);
    }
  }
Do not hard-abort on unsupported presolve mappings.
presolve() above can legitimately remove empty rows/cols, expand free variables, and fold the system. This helper either exits the process when those states appear or silently ignores them (for example, folding), so a valid model can turn the warm-start path into a process kill or a mis-sized crushed solution. Please return a status/fallback and short-circuit warm-starting when the presolve transform set is not fully supported.
Based on learnings: Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations, and reduce tight coupling between solver components (presolve, simplex, basis, barrier); increase modularity and reusability of optimization algorithms.
```cpp
template <typename T>
T random_value(T min, T max)
{
  std::uniform_real_distribution<> distrib(min, max);
  return distrib(gen);
}
```
Constrain random_value() to real types and parameterize the distribution.
`std::uniform_real_distribution<>` defaults to `double` regardless of the template argument `T`. This means that if an integral type is passed, the function still compiles but silently truncates each `double` draw to the integral type instead of producing uniformly distributed integers. The current call site (sparse_matrix.cpp) only uses floating-point types, but the API should enforce this constraint to prevent future misuse.
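A standalone illustration of the pitfall (the helper name `random_value_unconstrained` is made up for this demo): with the unparameterized distribution, an integral `T` still compiles, but every draw is a `double` truncated toward zero.

```cpp
#include <random>

std::mt19937 gen(42);  // fixed seed so the demo is reproducible

// Mirrors the reviewed helper: the distribution's result type is double
// no matter what T is, so an integral T truncates the double draw.
template <typename T>
T random_value_unconstrained(T min, T max)
{
  std::uniform_real_distribution<> distrib(min, max);  // always produces double
  return distrib(gen);                                 // implicit double -> T conversion
}
```

With `T = int`, a call like `random_value_unconstrained<int>(0, 10)` compiles, but each value is a truncated draw from [0, 10), so 10 is never returned and the results are not uniformly distributed; `std::uniform_int_distribution` would be the right tool for integers.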
Use std::uniform_real_distribution<T> with a static_assert to restrict T to floating-point types:
🔧 Proposed fix

```diff
 #include <random>
+#include <type_traits>
@@
 template <typename T>
 T random_value(T min, T max)
 {
-  std::uniform_real_distribution<> distrib(min, max);
+  static_assert(std::is_floating_point_v<T>,
+                "random_value requires a floating-point type");
+  std::uniform_real_distribution<T> distrib(min, max);
   return distrib(gen);
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/dual_simplex/random.hpp` around lines 33 - 37, The template function
random_value<T> should be constrained to floating-point types and use a
distribution parameterized by T: add a
static_assert(std::is_floating_point<T>::value, "...") in random_value to
prevent integral misuse, change the distribution type to
std::uniform_real_distribution<T> so the generated type matches T, and ensure
<type_traits> is included; update the call sites if any rely on implicit double
behavior.
```cpp
f_t e = vector_norm2<i_t, f_t>(x);
if (e == 0.0) { return 0.0; }

for (i_t j = 0; j < n; ++j) {
  x[j] /= e;
}

f_t e0 = 0.0;

i_t iter = 0;
const i_t max_iter = 100;
while (std::abs(e - e0) > tol * e) {
  e0 = e;
  matrix_vector_multiply(*this, 1.0, x, 0.0, Sx);
  f_t Sx_norm = vector_norm2<i_t, f_t>(Sx);
  if (Sx_norm == 0.0) {
    random_t<i_t, f_t> rng(0);
    for (i_t i = 0; i < m; ++i) {
      Sx[i] = rng.random_value(0.0, 1.0);
    }
    Sx_norm = vector_norm2<i_t, f_t>(Sx);
  }
  matrix_transpose_vector_multiply(*this, 1.0, Sx, 0.0, x);
  f_t norm_x = vector_norm2<i_t, f_t>(x);
  e = norm_x / Sx_norm;
  for (i_t j = 0; j < n; ++j) {
    x[j] /= norm_x;
  }
```
Guard the norm normalizations with a tolerance, not `== 0.0`.
`Sx_norm` and `norm_x` can legitimately become tiny on rank-deficient or very small matrices. With the current exact-zero checks, `e = norm_x / Sx_norm` and `x[j] /= norm_x` can produce inf/NaN values before either norm is exactly zero, corrupting the estimate used downstream.
🛡️ Proposed fix

```diff
 #include <cstdio>
+#include <limits>
@@
-  f_t e = vector_norm2<i_t, f_t>(x);
-  if (e == 0.0) { return 0.0; }
+  const f_t min_norm = std::max(std::abs(tol), std::numeric_limits<f_t>::epsilon());
+  f_t e = vector_norm2<i_t, f_t>(x);
+  if (e <= min_norm) { return 0.0; }
@@
     matrix_vector_multiply(*this, 1.0, x, 0.0, Sx);
     f_t Sx_norm = vector_norm2<i_t, f_t>(Sx);
-    if (Sx_norm == 0.0) {
+    if (Sx_norm <= min_norm) {
       random_t<i_t, f_t> rng(0);
       for (i_t i = 0; i < m; ++i) {
-        Sx[i] = rng.random_value(0.0, 1.0);
+        Sx[i] = rng.random_value(f_t{0}, f_t{1});
       }
       Sx_norm = vector_norm2<i_t, f_t>(Sx);
+      if (Sx_norm <= min_norm) { return 0.0; }
     }
     matrix_transpose_vector_multiply(*this, 1.0, Sx, 0.0, x);
     f_t norm_x = vector_norm2<i_t, f_t>(x);
+    if (norm_x <= min_norm) { return 0.0; }
     e = norm_x / Sx_norm;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/dual_simplex/sparse_matrix.cpp` around lines 867 - 894, The loop
currently divides by Sx_norm and norm_x using exact-zero checks which can still
divide by tiny values; guard these normalizations with a relative tolerance
instead. In the power-iteration section (symbols: matrix_vector_multiply,
matrix_transpose_vector_multiply, vector_norm2, random_t and variables Sx_norm,
norm_x, e, x) replace the Sx_norm==0.0 and norm_x==0.0 checks with tests like
Sx_norm <= max(tol, std::numeric_limits<f_t>::epsilon())*reference_norm (or tol
* e0) and similarly for norm_x, and if the norm is below that threshold
regenerate Sx (or reinitialize x) and skip the division (or scale safely);
ensure divisions x[j] /= norm_x and e = norm_x / Sx_norm only occur when both
norms exceed the chosen tolerance to avoid inf/NaN contamination.
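For reference, the guarded-normalization pattern can be sketched on a dense matrix. This is a standalone illustration under stated assumptions, not the repository's sparse implementation; `power_iteration_norm` and `norm2` are invented names:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using mat = std::vector<std::vector<double>>;

double norm2(const std::vector<double>& v)
{
  double s = 0.0;
  for (double x : v) s += x * x;
  return std::sqrt(s);
}

// Estimates ||S||_2 by power iteration on S^T S. Every normalization is
// guarded by a relative floor (min_norm) instead of an exact-zero test,
// so tiny norms bail out cleanly rather than producing inf/NaN.
double power_iteration_norm(const mat& S, double tol = 1e-6)
{
  const std::size_t m = S.size(), n = S[0].size();
  const double min_norm = std::max(tol, std::numeric_limits<double>::epsilon());
  std::vector<double> x(n, 1.0), Sx(m, 0.0);

  double e = norm2(x);
  if (e <= min_norm) { return 0.0; }
  for (double& xj : x) { xj /= e; }

  double e0 = 0.0;
  for (int iter = 0; iter < 100 && std::abs(e - e0) > tol * e; ++iter) {
    e0 = e;
    for (std::size_t i = 0; i < m; ++i) {  // Sx = S * x
      Sx[i] = 0.0;
      for (std::size_t j = 0; j < n; ++j) { Sx[i] += S[i][j] * x[j]; }
    }
    const double Sx_norm = norm2(Sx);
    if (Sx_norm <= min_norm) { return 0.0; }  // x is (numerically) in the null space
    for (std::size_t j = 0; j < n; ++j) {  // x = S^T * Sx
      x[j] = 0.0;
      for (std::size_t i = 0; i < m; ++i) { x[j] += S[i][j] * Sx[i]; }
    }
    const double x_norm = norm2(x);
    if (x_norm <= min_norm) { return 0.0; }
    e = x_norm / Sx_norm;  // estimate of ||S||_2
    for (double& xj : x) { xj /= x_norm; }
  }
  return e;
}
```

On `S = diag(3, 1)` this converges to 3 (the largest singular value), and on an all-zero matrix it returns 0 without ever dividing by a tiny norm.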
```cpp
dual_simplex::crossover_status_t crossover_status = dual_simplex::central_path(
  lp, dual_simplex_settings, initial_solution, timer.get_tic_start(), vertex_solution, vstatus);
```
API contract violation: central_path() produces an interior-point solution, not a basic solution.
The function dispatch change at line 623 replaces classical crossover (which produces a complete basic solution with vstatus properly classified) with central_path(), which runs a barrier method with PDHG and returns an interior point on the central path. This is a silent semantic change: the signatures are identical, but the algorithm and the meaning of the output differ fundamentally.
central_path() terminates by calling barrier_solver.solve(), not by simplex pivoting. It does not produce a complete basis, so vstatus will not correctly indicate which variables are basic or nonbasic for downstream simplex operations; code expecting a basic solution will break.
The mapping of crossover_status_t::DUAL_FEASIBLE to NumericalError in the switch statement suggests incomplete handling of possible output states from the new algorithm.
Required: Document why this API contract change is safe, or restore the original crossover behavior for the settings.crossover branch.
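One way to make the fallback explicit is an exhaustive status mapping. The enum values below are hypothetical mirrors of the names discussed here, not the actual cuOpt types:

```cpp
// Hypothetical mirrors of the statuses discussed above, not the actual cuOpt enums.
enum class crossover_status_t { OPTIMAL, PRIMAL_FEASIBLE, DUAL_FEASIBLE, NUMERICAL, TIME_LIMIT };
enum class solve_outcome_t { Optimal, NumericalError, TimeLimit };

// Exhaustive switch: every central_path outcome maps to an existing outcome,
// and anything short of OPTIMAL is surfaced as a recoverable error so the
// caller can rerun a classical crossover instead of trusting a non-basic point.
solve_outcome_t map_status(crossover_status_t s)
{
  switch (s) {
    case crossover_status_t::OPTIMAL: return solve_outcome_t::Optimal;
    case crossover_status_t::TIME_LIMIT: return solve_outcome_t::TimeLimit;
    case crossover_status_t::PRIMAL_FEASIBLE:
    case crossover_status_t::DUAL_FEASIBLE:
    case crossover_status_t::NUMERICAL:
      return solve_outcome_t::NumericalError;
  }
  return solve_outcome_t::NumericalError;  // unreachable; silences -Wreturn-type
}
```

Because the switch enumerates every case, adding a new status to the enum triggers a compiler warning at this site, which is exactly where the incomplete DUAL_FEASIBLE handling would have been caught.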
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/pdlp/solve.cu` around lines 623 - 624, The change replaces the
classical crossover with dual_simplex::central_path which returns an
interior-point solution (via barrier_solver.solve()) and does not produce a
basis nor correctly-populated vstatus; either revert to the original classical
crossover when settings.crossover is set or add an explicit crossover step after
central_path to convert the interior solution into a full basic solution and
populate vstatus. Specifically, restore or call the previous crossover routine
(e.g., classical_crossover / simplex_crossover) instead of relying on
central_path alone, or after central_path invoke a crossover routine that takes
vertex_solution and fills vstatus and returns a proper
dual_simplex::crossover_status_t; also update the switch/handling of
dual_feasible/other statuses to map all possible central_path/crossover outputs
to existing error enums and add a short comment documenting why
central_path+explicit crossover is safe if you choose to keep it.
Uses parts of Algorithm 1 from Rothberg's https://arxiv.org/pdf/2603.03150