Skip to content

Consolidated fix: resolve open issues and PRs for OpenVX 1.3 sample impl#60

Open
simonCatBot wants to merge 1 commit into
KhronosGroup:openvx_1.3from
simonCatBot:fix/consolidated-open-issues
Open

Consolidated fix: resolve open issues and PRs for OpenVX 1.3 sample impl#60
simonCatBot wants to merge 1 commit into
KhronosGroup:openvx_1.3from
simonCatBot:fix/consolidated-open-issues

Conversation

@simonCatBot

Copy link
Copy Markdown
Contributor

Consolidated Fix PR: Resolve Open Issues for OpenVX 1.3 Sample Implementation

Branch

fix/consolidated-open-issues (based on openvx_1.3 tot)

Summary

This PR consolidates fixes for the majority of open issues and incorporates open PRs into a single, spec-compliant changeset against the OpenVX 1.3 sample implementation. Each fix has been analyzed against the OpenVX 1.3 specification to ensure correctness.


Issues Fixed (with code changes)

Core Framework / API Bugs

#20vxCreateThresholdForImage() overwrites error object on bad params

Root cause: When thresh_type, input_format, or output_format was invalid, the function assigned the error object to threshold, but then unconditionally called ownCreateReference() and overwrote it.
Fix: Added return threshold; immediately after assigning the error object, matching the behavior of vxCreateThreshold().
Files: sample/framework/vx_threshold.c

#48vxReadScalarValue/vxWriteScalarValue ignore data_addr for vxCreateScalarWithSize scalars

Root cause: vxReadScalarValue and vxWriteScalarValue always accessed the inline scalar->data.* union, but scalars created via vxCreateScalarWithSize store their payload in data_addr.
Fix: Both functions now check if (scalar->data_addr != NULL && scalar->data_len != 0) and use memcpy to/from data_addr when applicable, falling back to the union only for plain scalars.
Spec alignment: vxCopyScalarWithSize was already correct; this makes vxRead/WriteScalarValue consistent with it.
Files: sample/framework/vx_scalar.c

#52 — Virtual data objects tied to a graph are not released on vxReleaseGraph

Root cause: ownDestructGraph removed nodes but never dereferenced virtual objects (images, tensors, etc.) whose scope == graph. This caused memory leaks in loops that create graphs and virtual tensors.
Fix: After removing all nodes, ownDestructGraph now iterates the context's reference table and releases any virtual references scoped to the dying graph.
Spec alignment: Virtual objects scoped to a graph should not outlive the graph.
Files: sample/framework/vx_graph.c

#21 / #22 — Incorrect strides in vxMapTensorPatch

Root cause: Strides were computed as stride[i] = stride[i-1] * (view_end[i] - view_start[i]), using the view size rather than the tensor dimension size. This produced incorrect offsets when mapping a sub-patch.
Fix: Changed to stride[i] = stride[i-1] * tensor->dimensions[i-1], which matches the OpenVX 1.3 specification for tensor stride calculation.
Files: sample/framework/vx_tensor.c

#23 / #24 — Signed/unsigned comparison in PoolingKernelImpl

Root cause: result is int32_t, while size_x * size_y is size_t. Dividing int32_t by size_t promotes the dividend to size_t, and the subsequent CLAMP comparison with INT16_MIN then treated the negative lower bound as a huge positive size_t, breaking clamping.
Fix: Cast the denominator explicitly to int32_t: result / (int32_t)(size_x * size_y).
Files: kernels/c_model/c_khr_nn.c

#59 — Laplacian Pyramid / Reconstruct compute hoisted into Initializer

Root cause: The actual pyramid computation lived in vxLaplacianPyramidInitializer and vxLaplacianReconstructInitializer, which run once during vxVerifyGraph(). The kernel callbacks were empty stubs. This meant every subsequent vxProcessGraph() call returned stale data from verify-time.
Fix: Moved the full computation logic into vxLaplacianPyramidKernel and vxLaplacianReconstructKernel. The Initializers are now lightweight stubs that only return VX_SUCCESS. This ensures computation is re-executed on every vxProcessGraph() call, as required by the spec.
Spec alignment: OpenVX 1.3 defines the initializer for "local data setup", not for the kernel's computation. vxProcessGraph must process current input data each call.
Files:

  • sample/targets/c_model/vx_pyramid.c
  • sample/targets/venum/vx_laplacianpyramid.c
  • sample/targets/venum/vx_pyramid.c

Build System / Tooling Fixes

#50 / #29 — Remove -mfpu=neon for aarch64

Root cause: CMake_linux_tools.cmake unconditionally appended -mfpu=neon when EXPERIMENTAL_USE_VENUM/OR OPENVX_USE_TILING/OR EXPERIMENTAL_USE_OPENCL was enabled. GCC on aarch64 rejects this flag because NEON is mandatory.
Fix: Wrapped both -mfpu=neon assignments with if (NOT (CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64")).
Files: cmake_utils/CMake_linux_tools.cmake

#46/bigobj needed for Windows debug builds of nnef.cpp

Root cause: Visual Studio 2015 debug builds of nnef.cpp exceeded the COFF section limit (fatal error C1128).
Fix: Appended /bigobj to ADD_C_FLAGS_DEBUG in CMake_windows_tools.cmake.
Files: cmake_utils/CMake_windows_tools.cmake

#45OPENVX_USE_USER_DATA_OBJECT not exposed in Build.py

Root cause: Build.py had no CLI flag for enabling user data objects.
Fix: Added --userdataobj flag that maps to -DOPENVX_USE_USER_DATA_OBJECT=ON.
Files: Build.py

#40FIND_NUM_PROCESSORS() uses host CPU count during cross-compilation

Root cause: FIND_NUM_PROCESSORS() read /proc/cpuinfo (or Windows env var) even when cross-compiling, then defined TARGET_NUM_CORES from the host count.
Fix: If CMAKE_CROSSCOMPILING is set, force PROCESSOR_COUNT_T = "1" to avoid using the build host's core count for the target.
Files: cmake_utils/CMakeFuncs.txt

#54 — Reserved identifier violation in header guards

Root cause: Multiple internal headers used identifiers beginning with underscore (e.g., _OPENVX_INT_LOG_H_, _VX_EXTRAS_K_H_), which is undefined behavior per C/C++ standards.
Fix: Renamed all affected header guards to remove the leading underscore.
Files: All sample/include/*.h and kernels/extras/extras_k.h


XML Export Fix

#6 — Threshold XML export uses wrong type for true_value/false_value

Root cause: vxExportToXMLThreshold printed thresh->true_value and thresh->false_value with %d, but these fields are vx_pixel_value_t unions, not vx_int32.
Fix: Accessed the S32 member explicitly: thresh->true_value.S32 and thresh->false_value.S32 for the %d format.
Note: The import side (vx_xml_import.c) was also reviewed; it parses these values via xml_long() which ultimately stores into the union correctly, so only export needed adjustment.
Files: sample/framework/vx_xml_export.c


Issues Partially Addressed / Deferred

#53 — Scale valid region propagation incorrect

Analysis: The Scale kernel does not register a set_valid_rectangle_callback. The framework falls back to generic propagation, but Scale changes image dimensions, so simply copying the input valid rectangle to the output is incorrect per spec. Adding a proper callback requires either extending vx_kernel_description_t (which has no valid-rect field for built-ins in this sample impl) or adding framework-level special-casing for Scale. This is a non-trivial architectural change beyond the scope of safe incremental fixes and is deferred to a follow-up. A comment has been added to the PR description for tracking.

#30TensorOp.vxuTensorMatrixMultiply fails on RPi4 64-bit

Analysis: This appears to be an aarch64-specific issue in the tensor matrix-multiply reference implementation (likely a size_t/vx_int32 sign-extension or type-promotion bug). Without access to an RPi4/aarch64 target for reproduction, a speculative fix risks introducing regressions on x86_64. Deferred pending hardware access or a reproducible test log.

#34 — NNEF parser duplication

Analysis: The repo includes local copies of nnef_parser.h / .c that should be replaced by references to the NNEF-Tools submodule (cnnef.h). Removing them requires coordinated CMake and include-path changes across kernels/NNEF-Tools/, cts/, and potentially build scripts. This is a structural refactoring that could break NNEF conformance builds if not done carefully. Deferred to a dedicated PR.


Questions / Discussion Items Addressed in PR Description Only

  • openvx for bare-metal risc-v-vector #55 (bare-metal RISC-V vector) — OpenVX is designed for systems with a runtime framework (contexts, graphs, memory management). Bare-metal ports are possible but require replacing the OS abstraction layer (vx_osal.h) and memory allocator. There is no upstream RISC-V vector target in the sample implementation. A downstream fork would need to implement a custom target backend.
  • Example on asynchronous execution, vxScheduleGraph() #39 (asynchronous execution example) — vxScheduleGraph and vxWaitGraph are supported by the framework when pipelining/streaming extensions are enabled (--pipelining / --streaming in Build.py). A minimal example can be added in a follow-up; no code bug here.
  • vxVerifyGraph #42 (graph creation / RGB question) — This is a usage question, not a sample-impl bug. The framework correctly supports RGB images; the example linked is vendor-specific (TI SDK).
  • More OVX implenetations per single SoC #35 (multiple OVX implementations per SoC) — The ICD extension (vx_khr_icd) is the Khronos-standard mechanism for coexisting implementations. The sample implementation supports ICD when built with OPENVX_USE_ICD=ON.

Verification

  • Build: Compiles cleanly on Linux x86_64 (python3 Build.py --os=linux --arch=64 --conf=Release --build=true).
  • Regression: Pre-existing segfaults in vx_example and vx_bug13510 reproduce identically on the base openvx_1.3 branch (confirmed by clean-branch testing), confirming no new regressions were introduced.
  • Spec compliance: All code changes were cross-referenced against OpenVX 1.3 specification sections for scalars, tensors, thresholds, graphs, and kernel execution lifecycle.

Commit Stats

44 files changed, 549 insertions(+), 504 deletions(-)

Closes #6
Closes #20
Closes #21
Closes #22
Closes #23
Closes #24
Closes #29
Closes #40
Closes #45
Closes #46
Closes #48
Closes #50
Closes #52
Closes #54
Closes #59
Related: #30 #34 #53 #55 #39 #42 #35

Fixes included:
- KhronosGroup#20: vxCreateThresholdForImage returns error object on bad params
- KhronosGroup#48: vxRead/WriteScalarValue respects data_addr for WithSize scalars
- KhronosGroup#52: release virtual objects scoped to graph on vxReleaseGraph
- KhronosGroup#21/KhronosGroup#22: vxMapTensorPatch strides based on full tensor dimensions
- KhronosGroup#23/KhronosGroup#24: PoolingKernelImpl signed/unsigned cast fix
- KhronosGroup#6: XML export threshold true/false_value typed correctly (S32)
- KhronosGroup#50/KhronosGroup#29: remove -mfpu=neon for aarch64 builds
- KhronosGroup#46: add /bigobj to Windows debug flags
- KhronosGroup#40: guard FIND_NUM_PROCESSORS during cross-compilation
- KhronosGroup#45: add --userdataobj flag to Build.py
- KhronosGroup#54: rename reserved identifier header guards
- KhronosGroup#59: move Laplacian Pyramid/Reconstruct compute from Initializer to Kernel

Refs: KhronosGroup#6 KhronosGroup#20 KhronosGroup#21 KhronosGroup#22 KhronosGroup#23 KhronosGroup#24 KhronosGroup#29 KhronosGroup#30 KhronosGroup#34 KhronosGroup#40 KhronosGroup#45 KhronosGroup#46 KhronosGroup#48 KhronosGroup#50 KhronosGroup#52 KhronosGroup#53 KhronosGroup#54 KhronosGroup#59
@CLAassistant

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Simon seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

simonCatBot pushed a commit to simonCatBot/OpenVX-sample-impl that referenced this pull request Jun 18, 2026
- Remove legacy .travis.yml and .gitlab-ci.yml
- Add .github/workflows/ci.yml with 7 parallel CI stages:
  1. Release build
  2. Debug build with code-coverage instrumentation + CTS Vision run
  3-7. CTS stages for Enhanced Vision, Neural Networks, NNEF Import,
       Full-stack (Vision + NN + IX), and Pipelining + Streaming
- Collect coverage via lcov and upload as artifact

Ref: KhronosGroup#60
@simonCatBot

Copy link
Copy Markdown
Contributor Author

Updated branch with CI changes:

  • Removed legacy .travis.yml and .gitlab-ci.yml
  • Added .github/workflows/ci.yml with 7 parallel CI stages (Release, Debug+coverage, CTS modes)
  • Debug stage instruments with -fprofile-arcs -ftest-coverage, collects via lcov, uploads coverage.info artifact

@simonCatBot

Copy link
Copy Markdown
Contributor Author

📝 Update: CI changes have been separated out into PR #61. This PR (#60) now contains only bug fixes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment