
Closes #5335: Bug: pd.array fails on Arkouda-backed Categorical with dtype specified #5448

Merged
ajpotts merged 1 commit into Bears-R-Us:main from
ajpotts:5335_pd.array_fails_on_Arkouda-backed_Categorical_with_dtype_specified
Mar 3, 2026

@ajpotts ajpotts commented Feb 27, 2026

Summary

Fixes an issue where calling pd.array() on an Arkouda-backed
Categorical with an explicit dtype would fail with a
NotImplementedError.

The failure occurred because pd.array() routed through
ArkoudaArray._from_sequence, which ultimately attempted to iterate
over the Categorical. Arkouda intentionally disallows iteration on
Categorical objects to prevent implicit data transfer from the server.

This change introduces a safe conversion path that:

  • Detects Arkouda-backed Categorical inputs in _from_sequence
  • Extracts the server-side categorical codes
  • Casts the codes if a dtype is provided
  • Constructs the ArkoudaArray directly from the server-side pdarray

This avoids iteration entirely and preserves server-side semantics.
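
The conversion path described above can be sketched roughly as follows. This is a toy model, not the code from _arkouda_array.py: ToyCategorical and ToyArray are hypothetical stand-ins that run without an Arkouda server, the codes live in a local list rather than a server-side pdarray, and dtype is modeled as a plain callable instead of a specifier such as "ak_int64".

```python
class ToyCategorical:
    """Hypothetical stand-in for an Arkouda-backed Categorical."""

    def __init__(self, values):
        # Toy encoding: codes index into the sorted unique categories.
        self.categories = sorted(set(values))
        self.codes = [self.categories.index(v) for v in values]

    def __iter__(self):
        # Mirrors the behavior described in this PR: iteration is refused.
        raise NotImplementedError("iteration is disabled for Categorical")


class ToyArray:
    """Hypothetical stand-in for ArkoudaArray."""

    def __init__(self, data):
        self.data = data

    @classmethod
    def _from_sequence(cls, scalars, dtype=None):
        # Special case: never iterate over the categorical itself.
        if isinstance(scalars, ToyCategorical):
            codes = scalars.codes
            if dtype is not None:
                # Assumption: in Arkouda this cast would be a server-side
                # operation; here it is a local element-wise conversion.
                codes = [dtype(c) for c in codes]
            return cls(codes)
        # Generic path: materialize the input by iterating over it.
        return cls(list(scalars))


cat = ToyCategorical(["a", "a", "b"])
arr = ToyArray._from_sequence(cat, dtype=int)
print(arr.data)
```

The design point is the early return: the categorical branch hands back the wrapped codes before the generic path has a chance to call list() on the input.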


Root Cause

pd.array(cat, dtype="ak_int64") triggered:

  1. ArkoudaArray._from_sequence
  2. ak_array(scalars, ...)
  3. list(scalars) inside ak_array
  4. Categorical.__iter__, which raises NotImplementedError

Since iteration is intentionally blocked for Categoricals, a direct
server-side conversion path was required.
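
The failure mode in the call chain above can be reproduced in miniature without Arkouda. In this sketch, NoIterCategorical and naive_ak_array are hypothetical names standing in for Categorical and the list(scalars) step inside ak_array; only the shape of the failure is meant to match.

```python
class NoIterCategorical:
    """Hypothetical stand-in for a server-backed categorical."""

    def __init__(self, codes):
        self.codes = codes

    def __iter__(self):
        # Iteration is refused so that data is never pulled implicitly.
        raise NotImplementedError("iteration would transfer data from the server")


def naive_ak_array(scalars):
    """Hypothetical stand-in for a generic path that calls list() on its input."""
    return list(scalars)


cat = NoIterCategorical(codes=[0, 0, 1])
try:
    naive_ak_array(cat)
    failed = False
except NotImplementedError:
    # list() calls __iter__, so the error surfaces here.
    failed = True

print(failed)
```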


Changes

_arkouda_array.py

  • Added special-case handling for Arkouda Categorical in
    _from_sequence
  • Extract categorical codes directly
  • Cast codes when a dtype is provided
  • Return cls(codes) without invoking ak_array

Tests

Added:

def test_pd_array_with_dtype_on_ak_categorical_should_not_iterate(self):

This verifies that:

  • pd.array(cat, dtype="ak_int64") succeeds
  • The resulting values match cat.codes
  • No iteration occurs
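
One way to check the "no iteration occurs" property is to count calls to __iter__ with a spy object. The sketch below is not the test added in this PR: SpyCategorical and convert are hypothetical, and the conversion is a local stand-in for the server-side path.

```python
import unittest


class SpyCategorical:
    """Hypothetical categorical that records any attempt to iterate."""

    iter_calls = 0

    def __init__(self, codes):
        self.codes = list(codes)

    def __iter__(self):
        type(self).iter_calls += 1
        raise NotImplementedError("iteration is disabled")


def convert(cat, dtype=None):
    """Hypothetical conversion that reads only cat.codes, never cat itself."""
    codes = cat.codes
    if dtype is not None:
        return [dtype(c) for c in codes]
    return list(codes)


class TestNoIteration(unittest.TestCase):
    def test_conversion_does_not_iterate(self):
        SpyCategorical.iter_calls = 0
        cat = SpyCategorical([0, 0, 1])
        result = convert(cat, dtype=int)
        # The converted values match the codes ...
        self.assertEqual(result, cat.codes)
        # ... and __iter__ on the categorical was never called.
        self.assertEqual(SpyCategorical.iter_calls, 0)
```

Because the spy also raises, a regression that reintroduces iteration would fail twice over: once with NotImplementedError and once on the call count.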

Behavioral Impact

Before:

  • pd.array(Categorical(...), dtype=...) raised NotImplementedError

After:

  • Returns an ArkoudaArray backed by categorical codes
  • No implicit data transfer
  • Fully server-side conversion path


Example

import arkouda as ak
import pandas as pd
from arkouda.pandas import Categorical

cat = Categorical(ak.array(["a", "a", "b"]))
arr = pd.array(cat, dtype="ak_int64")

# arr now contains the categorical codes

Closes #5335: Bug: pd.array fails on Arkouda-backed Categorical with dtype specified

@ajpotts ajpotts force-pushed the 5335_pd.array_fails_on_Arkouda-backed_Categorical_with_dtype_specified branch from 4c76949 to 5252427 February 27, 2026 21:54
@ajpotts ajpotts force-pushed the 5335_pd.array_fails_on_Arkouda-backed_Categorical_with_dtype_specified branch from 5252427 to 9088871 February 27, 2026 22:02
@ajpotts ajpotts requested a review from jaketrookman February 27, 2026 22:11
@ajpotts ajpotts marked this pull request as ready for review February 27, 2026 22:11

@jaketrookman jaketrookman left a comment


Looks good

@ajpotts ajpotts added this pull request to the merge queue Mar 3, 2026
Merged via the queue into Bears-R-Us:main with commit df543f6 Mar 3, 2026
19 checks passed


Development

Successfully merging this pull request may close these issues.

Bug: pd.array fails on Arkouda-backed Categorical with dtype specified

2 participants