Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞 |
|
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_296 ran successfully. |
|
@ndgrigorian Please add
diff --git a/docs/doc_sources/api_reference/dpctl/tensor.sorting_functions.rst b/docs/doc_sources/api_reference/dpctl/tensor.sorting_functions.rst
index ae1605d988..ef20f4654c 100644
--- a/docs/doc_sources/api_reference/dpctl/tensor.sorting_functions.rst
+++ b/docs/doc_sources/api_reference/dpctl/tensor.sorting_functions.rst
@@ -10,3 +10,4 @@ Sorting functions
argsort
sort
+ top_k |
Force-pushed from 79b97d9 to 882c70d
|
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_295 ran successfully. |
|
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_297 ran successfully. |
Force-pushed from a56e21c to 26718f3
|
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_326 ran successfully. |
|
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_327 ran successfully. |
|
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_331 ran successfully. |
Force-pushed from 8bcb100 to 8f38b80
|
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_331 ran successfully. |
|
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_331 ran successfully. |
|
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_388 ran successfully. |
Force-pushed from 84d1388 to 809cb70
|
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_391 ran successfully. |
|
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_386 ran successfully. |
|
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_385 ran successfully. |
|
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_386 ran successfully. |
|
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_387 ran successfully. |
|
@oleksandr-pavlyk |
|
Good to see the CI green again! I was suggesting that we skip it only in the test command used in the workflow, so that we can still provide a reproducer to the CPU team. I was thinking we could add a file listing the tests to skip, and pass it as an argument to [...]. I think this is the approach taken by [...] |
Until it is passed over to the host function, and
unique_ptr's ownership is released.
Also reduced allocation sizes, where too much was being
allocated.
Introduce smart_malloc_device, etc.
The smart_malloc_device<T>(count, q) function makes a USM allocation
and returns a unique_ptr<T, USMDeleter> which owns the
allocation. The function throws an exception (std::runtime_error)
if the USM allocation is not successful.
Introduce async_smart_free.
This function is intended to replace the use of host_task submissions
to manage deallocation of USM temporaries.
The usage is as follows:
```
// returns unique_ptr
auto alloc_owner = smart_malloc_device<T>(count, q);
// get raw pointer for use in kernels
T *data = alloc_owner.get();
// [...SNIP...]
// submit a host_task that frees the allocation; the unique_ptr
// is released only after the host task was successfully submitted,
// at which point ownership of the USM allocation is transferred
// to the said host task
sycl::event ht_ev =
    async_smart_free(q,
                     dependent_events,
                     alloc_owner);
// [...SNIP...]
```
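The ownership hand-off described above can be illustrated without SYCL. The following is a minimal host-only sketch, not the dpctl implementation: `ToyQueue`, `CountingDeleter`, `smart_malloc_toy`, `async_smart_free_toy`, and `live_allocations` are hypothetical names invented for this example, `std::malloc`/`std::free` stand in for USM allocation, and dependent events are omitted. It only shows the ordering: the unique pointer gives up ownership after the deferred task has been accepted by the queue, so a failed submission leaves the allocation owned by the unique pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical counter, used only to make the example observable.
inline int live_allocations = 0;

// Stand-in for USMDeleter: frees host memory instead of USM memory.
struct CountingDeleter
{
    void operator()(void *p) const
    {
        std::free(p);
        --live_allocations;
    }
};

// Stand-in for smart_malloc_device: owning pointer, throws on failure.
template <typename T>
std::unique_ptr<T, CountingDeleter> smart_malloc_toy(std::size_t count)
{
    void *p = std::malloc(count * sizeof(T));
    if (p == nullptr) {
        throw std::runtime_error("allocation failed");
    }
    ++live_allocations;
    return std::unique_ptr<T, CountingDeleter>(static_cast<T *>(p));
}

// Toy queue: tasks are stored on submit and run later, in order, by wait().
class ToyQueue
{
public:
    void submit(std::function<void()> task)
    {
        tasks_.push_back(std::move(task));
    }
    void wait()
    {
        for (auto &task : tasks_) {
            task();
        }
        tasks_.clear();
    }

private:
    std::vector<std::function<void()>> tasks_;
};

// Stand-in for async_smart_free: the deferred task frees the memory.
template <typename T>
void async_smart_free_toy(ToyQueue &q,
                          std::unique_ptr<T, CountingDeleter> &owner)
{
    T *raw = owner.get();
    // If submit throws, `owner` still owns the allocation and frees it.
    q.submit([raw]() { CountingDeleter{}(raw); });
    // Submission succeeded: the queued task is now responsible for freeing.
    owner.release();
}
```

After `async_smart_free_toy` returns, the unique pointer is empty but the memory stays alive until the queued task runs, which is the property the commit message relies on for temporaries still in use by earlier work.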
Replaced three duplicates of the same kernel with calls to this function.
Factored out map_back_impl, which projects indexing from a flat index to a row-wise index. Removed dead code excluded by a preprocessor conditional.
Replaced it with a hand-written implementation of ceil_log2(n),
such that n <= (decltype(n){1} << ceil_log2(n)) is true for all
positive values of `n` in the range.
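One way to satisfy the stated property is to count the bits of `n - 1`. The sketch below is an assumption about how such a function could look, not the code from this PR; it requires an unsigned integer type and `n >= 1`, and the shift in the property is only well defined while `ceil_log2(n)` is smaller than the bit width of the type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical sketch: smallest e such that n <= (T{1} << e), for n >= 1.
template <typename T>
constexpr std::size_t ceil_log2(T n)
{
    static_assert(std::is_unsigned_v<T>,
                  "this sketch assumes an unsigned integer type");
    std::size_t e = 0;
    // The number of right shifts needed to clear (n - 1) is the bit
    // width of (n - 1), which equals ceil(log2(n)) for n >= 1.
    for (T v = static_cast<T>(n - 1); v != 0; v >>= 1) {
        ++e;
    }
    return e;
}

// Compile-time spot checks of the property from the commit message.
static_assert(ceil_log2(std::uint32_t{1}) == 0);
static_assert(ceil_log2(std::uint32_t{4}) == 2);
static_assert(ceil_log2(std::uint32_t{5}) == 3);
```

The loop form avoids floating-point `log2`, so the result is exact for every value in range.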
Add check of computed against expected indices
One asserts that at least one unique pointer is specified. Another asserts that the specified arguments are unique pointers with USMDeleter.
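The two checks could be written as `static_assert`s over a parameter pack. This is a sketch under assumptions: the `USMDeleter` here is a host-memory stand-in for the real one, and `is_usm_unique_ptr`, `valid_owner_pack_v`, and `free_all` are hypothetical names that do not come from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <type_traits>

// Stand-in deleter; the real USMDeleter frees USM memory instead.
struct USMDeleter
{
    void operator()(void *p) const { std::free(p); }
};

// Trait: true only for std::unique_ptr<T, USMDeleter>.
template <typename T> struct is_usm_unique_ptr : std::false_type
{
};
template <typename T>
struct is_usm_unique_ptr<std::unique_ptr<T, USMDeleter>> : std::true_type
{
};

// Both conditions combined, usable in tests and in static_assert.
template <typename... Ptrs>
inline constexpr bool valid_owner_pack_v =
    (sizeof...(Ptrs) > 0) && (is_usm_unique_ptr<Ptrs>::value && ...);

// Hypothetical variadic function guarded by the two assertions.
template <typename... Ptrs> std::size_t free_all(Ptrs &...owners)
{
    static_assert(sizeof...(Ptrs) > 0,
                  "at least one unique pointer must be specified");
    static_assert((is_usm_unique_ptr<Ptrs>::value && ...),
                  "arguments must be unique pointers with USMDeleter");
    (owners.reset(), ...); // free each allocation
    return sizeof...(Ptrs);
}
```

Calling `free_all()` with no arguments, or with a plain `std::unique_ptr<int>`, fails at compile time with the corresponding message.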
gid-lane_id is already a multiple of sg_size.
Change kernel to process a few data elements per work-item.
Counters cannot exceed the uint16_t maximum, because the kernel assumes that the number of elements to sort fits into uint16_t. The change reduces the kernel SLM footprint.
Also, remove use of std::move, uint16_t->std::uint16_t, etc.
Replace size_t->std::size_t, uint32_t->std::uint32_t.
Use `if constexpr` in order-preserving-cast for better readability.
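For context, an order-preserving cast maps keys to unsigned integers whose natural ordering matches the ordering of the original values, which is what a radix sort needs. The following is a generic sketch of the well-known bit-flipping technique written with `if constexpr`; it is not the function from this PR, the names `uint_of_size` and `order_preserving_cast` as used here are illustrative, and NaN handling and `long double` are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: unsigned integer type with a given byte size.
template <std::size_t N> struct uint_of_size;
template <> struct uint_of_size<1> { using type = std::uint8_t; };
template <> struct uint_of_size<2> { using type = std::uint16_t; };
template <> struct uint_of_size<4> { using type = std::uint32_t; };
template <> struct uint_of_size<8> { using type = std::uint64_t; };

// Maps val to an unsigned key so that a < b implies key(a) < key(b).
template <typename T> auto order_preserving_cast(T val)
{
    using U = typename uint_of_size<sizeof(T)>::type;
    constexpr U sign_bit = U{1} << (8 * sizeof(T) - 1);

    if constexpr (std::is_unsigned_v<T>) {
        // Unsigned integers are already ordered as required.
        return static_cast<U>(val);
    }
    else if constexpr (std::is_integral_v<T>) {
        // Signed integers: flipping the sign bit moves negatives
        // below non-negatives while keeping relative order.
        return static_cast<U>(static_cast<U>(val) ^ sign_bit);
    }
    else {
        static_assert(std::is_floating_point_v<T>,
                      "sketch supports integers, float and double only");
        U bits;
        std::memcpy(&bits, &val, sizeof(T));
        // Negative floats: flip all bits (reverses their order).
        // Non-negative floats: flip only the sign bit.
        const U mask =
            (bits & sign_bit) ? static_cast<U>(~U{0}) : sign_bit;
        return static_cast<U>(bits ^ mask);
    }
}
```

Each branch is selected at compile time, so an instantiation contains only the code for its own type category, which is the readability benefit of `if constexpr` over tag dispatch or SFINAE overloads.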
The team developing the OpenCL:CPU device runtime and compiler was notified. See CMPLRLLVM-64592. Once fixed, the work-around should be removed.
was applied in C++.
Add tests for 2d input arrays, for axis=0 and axis=1.
Add a test for non-contiguous input, 0d input, validation.
100% coverage of top_k function implementation achieved.
Force-pushed from 2865f27 to 505b64c
Done, I will add top_k to the changelog |
|
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_395 ran successfully. |
Title changed from "top_k functions in dpctl.tensor" to "top_k in dpctl.tensor"
|
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_396 ran successfully. |
This PR implements the function `top_k` as per the proposal in the array API spec. Radix and merge sorting are used, and modified merge-sort kernels are introduced which sort the array in chunks and write out the `k` largest or smallest values to a temporary.
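The chunked idea can be sketched on the host with the standard library alone. This is an illustration of the general approach under assumptions, not the SYCL kernels from the PR: `top_k_chunked` is a hypothetical name, it handles only the "largest" mode, and it returns values without the accompanying indices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Returns the k largest values of data, in descending order.
template <typename T>
std::vector<T> top_k_chunked(const std::vector<T> &data,
                             std::size_t k,
                             std::size_t chunk_size)
{
    k = std::min(k, data.size());
    chunk_size = std::max<std::size_t>(chunk_size, 1);

    // The "temporary": per-chunk winners collected from every chunk.
    std::vector<T> candidates;
    for (std::size_t start = 0; start < data.size(); start += chunk_size) {
        const std::size_t stop = std::min(start + chunk_size, data.size());
        std::vector<T> chunk(data.begin() + start, data.begin() + stop);
        // Sort the chunk and keep at most k of its largest elements.
        std::sort(chunk.begin(), chunk.end(), std::greater<T>{});
        const std::size_t keep = std::min(k, chunk.size());
        candidates.insert(candidates.end(), chunk.begin(),
                          chunk.begin() + keep);
    }

    // A final sort of the much smaller temporary yields the answer.
    std::sort(candidates.begin(), candidates.end(), std::greater<T>{});
    candidates.erase(candidates.begin() + k, candidates.end());
    return candidates;
}
```

Any element of the global top `k` must also be in the top `k` of its own chunk, so discarding the rest of each chunk loses nothing and the final sort only touches the collected candidates.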