Description
Two bugs in DeepLense_Classification_Transformers_Archil_Srivastava affect evaluation metric correctness and device placement efficiency.
Bug 1: micro_auroc always returns NaN
In eval.py, micro_auroc is initialized as an empty list on line 47, but nothing is ever computed or appended to it. When np.mean(micro_auroc) is called on line 69, it returns nan (with a RuntimeWarning) because np.mean([]) is nan.
# Line 47: initialized as empty list
loss, accuracy, class_auroc, micro_auroc, macro_auroc = [], [], [], [], []
# Line 69: np.mean([]) = nan
"micro_auroc": np.mean(micro_auroc), # Always nan!
Impact: Every evaluation reports nan for micro_auroc, corrupting W&B experiment logs.
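The failure mode is easy to reproduce in isolation. A minimal sketch (variable names mirror eval.py; the loop body is a stand-in, not the actual evaluation code):

```python
import numpy as np

# Mirrors the bug: micro_auroc starts as an empty list and nothing
# ever appends to it, while its siblings are populated normally.
loss, micro_auroc = [], []
for batch_loss in [0.7, 0.5, 0.4]:  # stand-in for the eval loop
    loss.append(batch_loss)          # loss is populated ...
    # ... but micro_auroc never is

print(np.mean(loss))         # finite mean, as expected
print(np.mean(micro_auroc))  # nan, plus a RuntimeWarning
```

Because np.mean([]) is nan rather than an exception, the corrupted metric flows straight into the W&B logs without any visible error.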
Bug 2: labels.type(torch.LongTensor) creates CPU tensor
In both train.py (lines 47-49) and eval.py (lines 54-55), labels are converted with labels.type(torch.LongTensor). torch.LongTensor always creates a CPU tensor, regardless of the tensor's current device.
# train.py line 47-49
images, labels = images.to(device, dtype=torch.float), labels.type(
torch.LongTensor # Always CPU!
).to(device)
# eval.py line 54-55
batch_X, batch_y = batch_X.to(device, dtype=torch.float), batch_y.type(
torch.LongTensor # Always CPU, never moved to device
)
In train.py, the CPU tensor is immediately moved back to the GPU with .to(device), so every batch pays for a needless CPU allocation and an extra host-to-device copy. In eval.py, batch_y is never moved to the device at all: it stays on the CPU, and the code only works because the logits are also moved to the CPU before comparison, which masks the device mismatch.
Impact: Unnecessary CPU tensor allocation on every batch; inefficient for GPU training.
Proposed Fix
Bug 1: Remove micro_auroc from the initialization list and compute it directly (note: torchmetrics' multiclass AUROC does not support average="micro", so "weighted" is the closest available aggregate here):
micro_auroc = auroc_fn(logits, y, num_classes=NUM_CLASSES, average="weighted")
Bug 2: Replace labels.type(torch.LongTensor) with device-aware conversion:
labels.to(device, dtype=torch.long)
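A minimal sketch of the difference, runnable on CPU (on a CUDA machine the same code makes the extra host-to-device hop in the buggy path explicit):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
labels = torch.tensor([0.0, 1.0, 2.0])  # e.g. float labels from a DataLoader

# Buggy pattern: .type(torch.LongTensor) always allocates on the CPU first,
# then .to(device) copies the fresh tensor to the target device.
buggy = labels.type(torch.LongTensor).to(device)

# Fix: one call converts the dtype and moves to the device in a single step.
fixed = labels.to(device, dtype=torch.long)

assert fixed.dtype == torch.long
assert torch.equal(buggy, fixed)  # same values, without the CPU detour
```

The single .to(device, dtype=...) call lets PyTorch fuse the conversion and the transfer, and it also fixes the eval.py path, where the buggy version left batch_y on the CPU entirely.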
Environment
- Python 3.10, PyTorch 2.10.0, torchmetrics 1.8.2