[Bug] micro_auroc always NaN and CPU-bound label tensor in Classification Transformers #164

@PuneetKumar1790

Description

Two bugs in DeepLense_Classification_Transformers_Archil_Srivastava affect evaluation metric correctness and device placement efficiency.

Bug 1: micro_auroc always returns NaN

In eval.py, micro_auroc is initialized as an empty list on line 47 but is never computed or appended to. When np.mean(micro_auroc) is called on line 69, the list is still empty, so the call returns nan (NumPy also emits a "Mean of empty slice" RuntimeWarning).

# Line 47: initialized as empty list
loss, accuracy, class_auroc, micro_auroc, macro_auroc = [], [], [], [], []

# Line 69: np.mean([]) = nan
"micro_auroc": np.mean(micro_auroc),  # Always nan!

Impact: Every evaluation reports nan for micro_auroc, corrupting W&B experiment logs.
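A minimal reproduction of the behaviour described above, independent of the repository. The value 0.9 is made up purely for illustration; only np.mean and the list name come from the report.

```python
import warnings

import numpy as np

micro_auroc = []  # initialized but never appended to, as described above

with warnings.catch_warnings():
    # np.mean on an empty list emits a RuntimeWarning; silence it for the demo
    warnings.simplefilter("ignore", category=RuntimeWarning)
    empty_mean = np.mean(micro_auroc)

print(np.isnan(empty_mean))  # True

micro_auroc.append(0.9)  # made-up per-batch value, for illustration only
filled_mean = np.mean(micro_auroc)
print(np.isnan(filled_mean))  # False
```

As soon as at least one value is appended, the mean is well defined, which suggests the metric is simply never computed rather than computed incorrectly.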

Bug 2: labels.type(torch.LongTensor) creates CPU tensor

In both train.py (lines 47-49) and eval.py (lines 54-55), labels are converted using labels.type(torch.LongTensor). torch.LongTensor is the CPU tensor type (its CUDA counterpart is torch.cuda.LongTensor), so the result is always a CPU tensor, regardless of the current device.

# train.py lines 47-49
images, labels = images.to(device, dtype=torch.float), labels.type(
    torch.LongTensor  # Always CPU!
).to(device)

# eval.py lines 54-55
batch_X, batch_y = batch_X.to(device, dtype=torch.float), batch_y.type(
    torch.LongTensor  # Always CPU, never moved to device
)

In train.py, the intermediate CPU tensor is immediately moved to the device with .to(device), so every batch pays for an extra CPU allocation before the transfer. In eval.py, batch_y is never moved to the device at all: it stays on the CPU, and the logits are moved to the CPU for comparison. This runs without error, but the comparison happens on the CPU instead of the GPU.

Impact: Unnecessary CPU tensor allocation on every batch; inefficient for GPU training.
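The device behaviour can be checked with a short sketch. The label values are made up, and the device falls back to the CPU when CUDA is unavailable, in which case both conversions trivially land on the CPU.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Made-up labels standing in for what a DataLoader would yield
labels = torch.tensor([0, 2, 1], dtype=torch.int32)

# Legacy cast: torch.LongTensor is the CPU tensor type, so the result is on the CPU
legacy = labels.type(torch.LongTensor)

# Even a tensor that already lives on the device comes back on the CPU
round_trip = labels.to(device).type(torch.LongTensor)

# Device-aware conversion: dtype change and placement in one call
direct = labels.to(device, dtype=torch.long)

print(legacy.device, round_trip.device, direct.device)
```

On a CUDA machine this should print cpu for the first two tensors and the CUDA device for the last one.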

Proposed Fix

Bug 1: Remove micro_auroc from the initialization list and compute it directly:

micro_auroc = auroc_fn(logits, y, num_classes=NUM_CLASSES, average="weighted")

Bug 2: Replace labels.type(torch.LongTensor) with device-aware conversion:

labels.to(device, dtype=torch.long)

Environment

  • Python 3.10, PyTorch 2.10.0, torchmetrics 1.8.2
