Skip to content

fix: prevent duplicate uploads with replace-or-cancel confirmation (#20)#400

Open
Padmajakachare1911 wants to merge 2 commits into
param20h:devfrom
Padmajakachare1911:fix/duplicate-upload-conflict
Open

fix: prevent duplicate uploads with replace-or-cancel confirmation (#20)#400
Padmajakachare1911 wants to merge 2 commits into
param20h:devfrom
Padmajakachare1911:fix/duplicate-upload-conflict

Conversation

@Padmajakachare1911

Copy link
Copy Markdown
Contributor

🔗 Related Issue

Closes #20

📋 Summary

Uploading a PDF with the same filename as an existing document silently
created a duplicate DB row and left orphaned vectors in ChromaDB. This PR
intercepts the duplicate at the upload route, returns a structured 409,
and gives the user a modal to either cancel or fully replace the existing
document (file on disk + ChromaDB vectors + DB row reset + ingest
re-queued) - all in one atomic flow.


🛠️ Changes Made

backend/app/routes/documents.py

  • Added a duplicate check before Document() creation — queries for an
    existing row matching original_name scoped to user_id
  • Returns 409 Conflict with a structured body:
  { "conflict": true, "existing_id": 42, "original_name": "report.pdf" }
  • Added PUT /{doc_id}/replace endpoint that:
    1. Returns 423 Locked when document.status == "processing" with
      message: "This document is still being processed. Please wait
      before replacing it."
    2. Deletes old file from disk
    3. Purges old vectors from ChromaDB
      (collection.delete(where={"document_id": doc_id}))
    4. Resets the existing DB row — status → "pending", updates
      filename, file_size, created_at (no schema changes)
    5. Re-queues the ingest background task via the same code path as
      a fresh upload
  • Duplicate check is scoped per user_id — two different users
    uploading the same filename is not a conflict

frontend/src/components/document/ConfirmReplaceModal.tsx (new)

  • Confirmation modal built with existing Tailwind + lucide-react
    (no new dependencies)
  • Shows filename, warning about permanent removal of document and
    chat history
  • Displays inline error message for 423 without closing the modal
  • Cancel button focused by default (safe action first)
  • Closes on Escape keydown
  • Spinner state while replace request is in-flight

frontend/src/components/document/DocumentUpload.tsx

  • Intercepts 409 response before the normal success handler — stores
    existingId, filename, and formData in conflictMeta state,
    bails early without adding a ghost card to the document list
  • handleReplace calls PUT /api/v1/documents/{existingId}/replace
    with the same FormData
  • 423 sets replaceError inside the open modal (does not close it)
  • On success: closes modal, calls refreshDocuments()
  • Escaped quotes fix (ESLint: ' in ConfirmReplaceModal.tsx)

frontend/e2e/duplicate-upload.spec.ts (new)

  • Playwright tests covering the full user-facing flow

backend/tests/test_document_upload_validation.py (updated)

  • Updated existing test that previously asserted duplicates were allowed
  • Added new tests for the replace flow and 423 guard

✅ Test Results

Backend run with PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 due to a global
web3 pytest plugin conflict on the dev machine (unrelated to this PR).

Duplicate upload (409)

Test Result
test_upload_document_returns_409_for_duplicate_original_name ✅ Pass
test_upload_returns_409_when_original_name_already_exists (HTTP API) ✅ Pass

Replace flow

Test Result
test_replace_document_resets_row_and_requeues_ingest ✅ Pass
test_replace_document_returns_423_while_indexing ✅ Pass

Full suite

108 passed — entire backend/tests/

Playwright UI (390px + desktop)

Test Result
Duplicate upload → modal appears (409) ✅ Pass
Replace → PUT .../replace → modal closes ✅ Pass
423 while processing → error stays in modal ✅ Pass

7/7 passed in test_document_upload_validation.py


🔒 Edge Cases Covered

Scenario Behaviour
Same filename, different users No conflict — check scoped to user_id
Document mid-processing (status == "processing") 423 returned, modal shows error, stays open
ChromaDB purge fails try/except — vector failure does not block DB reset or file replace
User clicks Cancel Modal dismissed, original document untouched, no state change
Ghost card on failed upload Early return on 409 prevents card ever appearing in list

🖥️ Desktop / Mobile Regression

Zero. All changes are contained to the upload flow. No layout, sidebar,
or chat components were touched.


📦 Dependencies Added

None — uses Tailwind CSS and lucide-react already present in the
project.


🧪 How to Test Locally

# Backend
cd backend
pytest backend/tests/test_document_upload_validation.py -v

# Frontend
cd frontend
npx playwright test e2e/duplicate-upload.spec.ts

# Manual
# 1. Upload any PDF
# 2. Upload the exact same filename again
#    → "Replace existing document?" modal appears
# 3. Click Cancel → modal closes, original document untouched
# 4. Upload same file again, click Replace
#    → spinner, modal closes, document re-indexes from scratch
# 5. Trigger replace while a doc shows "processing"
#    → red error in modal: "This document is still being processed..."
# 6. Repeat step 1-4 as two different users with same filename
#    → no conflict (each user's uploads are independent)

📝 Notes

  • PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 used in CI command to avoid a
    pre-existing global web3 plugin conflict unrelated to this PR.
    All 108 tests pass cleanly with this flag.
  • Status string aligned to "processing" (not "indexing") to match
    the actual values set by the ingest pipeline in this codebase.

param20h
param20h previously approved these changes Jun 6, 2026
@param20h

param20h commented Jun 7, 2026

Copy link
Copy Markdown
Owner

Merge COnflicts @Padmajakachare1911

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: Re-uploading the same PDF creates duplicate document entries

2 participants