A full-stack intelligent knowledge management platform that lets you chat with your documents using RAG (Retrieval-Augmented Generation). Upload PDFs, ask questions, and get AI-powered answers from your personal knowledge base.
Project planning and notes are in this Google Doc: https://docs.google.com/document/d/1fSG7LxXG498K0HGJsts1rKcrZRbEhe30KTweYEudFBA/edit?usp=sharing
This project goes beyond simple document search by tracking how you engage with your knowledge base and surfacing your most-referenced topics on a personalized dashboard. It's designed for students, researchers, and knowledge workers who want to make their documents searchable and conversational.
- Document Upload: Upload PDF documents to your personal knowledge base
- RAG-Powered Chat: Ask questions and get answers grounded in your documents
- Smart Retrieval: Vector similarity search finds the most relevant content
- Usage Analytics: Track which documents you reference most frequently
- Personalized Dashboard: See your top 5 most-retrieved documents on login
- User Authentication: Secure email/password authentication
- Multiple file format support (DOCX, TXT, Markdown)
- OAuth authentication
- Advanced retrieval strategies (re-ranking, hybrid search)
- Spaced repetition suggestions for learning
- Document tagging and organization
- Frontend: Streamlit (MVP) → React (later)
- Backend: FastAPI (Python)
- Database: PostgreSQL with pgvector extension
- Embeddings: sentence-transformers/all-MiniLM-L6-v2
- LLM: OpenAI GPT-4 / Anthropic Claude
- Task Queue: Redis (for async document processing)
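The embedding model in the stack can be exercised on its own; here is a minimal sketch using the sentence-transformers package (the example texts are illustrative):

```python
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 produces 384-dimensional sentence embeddings.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

texts = [
    "RAG combines retrieval with generation.",
    "pgvector stores embeddings inside PostgreSQL.",
]
embeddings = model.encode(texts, normalize_embeddings=True)
print(embeddings.shape)  # (2, 384)
```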
User uploads PDF →
Extract text → Chunk into segments → Generate embeddings →
Store in vector database
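A minimal sketch of the "chunk into segments" step, assuming token counts are approximated by whitespace-split words; the parameter defaults mirror the config values documented below, and the real pipeline may differ:

```python
def chunk_text(
    text: str,
    chunk_size: int = 512,        # tokens per chunk (config default)
    overlap: int = 50,            # token overlap between chunks
    min_chunk_length: int = 100,  # drop tiny trailing chunks
) -> list[str]:
    """Split text into overlapping chunks; words stand in for tokens here."""
    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        piece = " ".join(words[start : start + chunk_size])
        # Keep short pieces only if they are the sole chunk of the document.
        if len(piece.split()) >= min_chunk_length or not chunks:
            chunks.append(piece)
    return chunks
```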
User asks question →
Embed question → Search similar chunks → Send to LLM →
Return contextualized answer → Log query (async)
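The "search similar chunks" step maps onto a pgvector distance query. Below is a sketch using SQLAlchemy and pgvector's `<=>` cosine-distance operator; the connection string is a placeholder and the chunks table follows the schema shown later:

```python
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg://user:pass@localhost/secondbrain")  # placeholder DSN

def search_similar_chunks(query_embedding: list[float], top_k: int = 5):
    """Return the top-k chunks by cosine similarity using pgvector."""
    vec_literal = "[" + ",".join(f"{x:.6f}" for x in query_embedding) + "]"
    sql = text(
        "SELECT id, content, 1 - (embedding <=> CAST(:q AS vector)) AS similarity "
        "FROM chunks "
        "ORDER BY embedding <=> CAST(:q AS vector) "
        "LIMIT :k"
    )
    # A min_similarity filter (from the retrieval config) could be applied
    # on the computed similarity column before returning results.
    with engine.connect() as conn:
        return conn.execute(sql, {"q": vec_literal, "k": top_k}).fetchall()
```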
Dashboard loads →
Aggregate most-retrieved documents from past 7 days →
Display top 5 with retrieval counts
second-brain/
├── api/ # FastAPI routes and endpoints
├── core/ # RAG logic (chunking, embedding, retrieval, LLM)
├── database/ # SQLAlchemy models and migrations
├── config/ # YAML configs for tunable parameters
├── frontend/ # Streamlit UI
└── tests/ # Unit tests and evaluation framework
- Python 3.10+
- PostgreSQL 15+ with pgvector extension
- Redis (optional, for background tasks)
- OpenAI API key or Anthropic API key
- Clone the repository

```bash
git clone https://github.com/yourusername/second-brain.git
cd second-brain
```

- Create a virtual environment

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies

```bash
pip install -r requirements.txt
```

- Set up PostgreSQL with pgvector

```sql
-- Install the pgvector extension
CREATE EXTENSION vector;
```

- Configure environment variables

```bash
cp .env.example .env
# Edit .env with your settings:
# - DATABASE_URL
# - OPENAI_API_KEY or ANTHROPIC_API_KEY
# - SECRET_KEY for JWT
```

- Run database migrations

```bash
alembic upgrade head
```

- Start the API server

```bash
uvicorn api.main:app --reload
```

- Start the frontend (in a new terminal)

```bash
streamlit run frontend/app.py
```

All tunable parameters are in the config/ directory:
Chunking:

```yaml
chunk_size: 512        # tokens per chunk
overlap: 50            # token overlap between chunks
min_chunk_length: 100  # minimum chunk size
```

Retrieval:

```yaml
top_k: 5              # number of chunks to retrieve
min_similarity: 0.3   # minimum similarity threshold
```

Prompts: system prompts and RAG templates for the LLM.

To experiment, just change values in the YAML files and restart the server; no code changes are needed.
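For illustration, a sketch of how such a config could be read at startup with PyYAML; the file names (config/chunking.yaml, config/retrieval.yaml) are assumptions based on the values above:

```python
from pathlib import Path

import yaml

def load_config(name: str) -> dict:
    """Read a YAML file from the config/ directory, e.g. load_config("chunking")."""
    return yaml.safe_load(Path("config", f"{name}.yaml").read_text())

chunking = load_config("chunking")    # {'chunk_size': 512, 'overlap': 50, 'min_chunk_length': 100}
retrieval = load_config("retrieval")  # {'top_k': 5, 'min_similarity': 0.3}
```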
users
- id, email, password_hash
documents
- id, user_id, filename, upload_date, processing_status
chunks
- id, document_id, content, embedding (vector), chunk_index
query_logs
- id, user_id, query, retrieved_chunk_ids, timestamp, latency_ms, user_feedback
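A sketch of how the chunks table's vector column could be declared with SQLAlchemy and the pgvector bindings; 384 dimensions matches all-MiniLM-L6-v2, and the exact model code here is illustrative rather than the project's actual models:

```python
from pgvector.sqlalchemy import Vector
from sqlalchemy import ForeignKey, Text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class Chunk(Base):
    __tablename__ = "chunks"

    id: Mapped[int] = mapped_column(primary_key=True)
    document_id: Mapped[int] = mapped_column(ForeignKey("documents.id"))
    content: Mapped[str] = mapped_column(Text)
    # 384 = output dimension of sentence-transformers/all-MiniLM-L6-v2
    embedding: Mapped[list[float]] = mapped_column(Vector(384))
    chunk_index: Mapped[int]
```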
- User uploads PDF via frontend
- API stores file and creates a `Document` record with status "processing"
- Background task:
  - Extracts text from PDF
  - Chunks text based on config
  - Generates embeddings in batch
  - Stores chunks with vectors in DB
  - Updates document status to "ready"
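A condensed sketch of that background task, assuming pypdf for text extraction and the chunk_text helper sketched earlier; database writes and the status update are left to the caller:

```python
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def process_document(path: str, batch_size: int = 32) -> list[tuple[str, list[float]]]:
    # 1. Extract text from the PDF.
    reader = PdfReader(path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)

    # 2. Chunk according to config (chunk_text: the splitter sketched above).
    chunks = chunk_text(text)

    # 3. Generate embeddings in batches rather than one call per chunk.
    embeddings = model.encode(chunks, batch_size=batch_size)

    # 4. Caller stores (content, vector) rows and flips the document status to "ready".
    return [(chunk, emb.tolist()) for chunk, emb in zip(chunks, embeddings)]
```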
- User submits question
- Embed question using same model
- Vector similarity search for top-k chunks
- Build context from retrieved chunks
- Send context + question to LLM
- Return answer to user
- Async: Log query and retrieved chunks
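A sketch of the context-building and LLM call, using the OpenAI client as one of the two options in the tech stack; the prompt wording and model name are illustrative:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer_question(question: str, retrieved_chunks: list[str]) -> str:
    # Build the context block from the retrieved chunks.
    context = "\n\n".join(retrieved_chunks)
    prompt = (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4",  # or an Anthropic Claude model, per the tech stack
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```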
- Query aggregates `query_logs` from the past 7 days
- Groups by document and counts retrievals
- Returns top 5 documents
- Frontend displays with retrieval counts
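That aggregation can be expressed as a single SQL query; a sketch assuming retrieved_chunk_ids is stored as a Postgres integer array, with a placeholder connection string:

```python
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg://user:pass@localhost/secondbrain")  # placeholder DSN

TOP_DOCUMENTS_SQL = text("""
    SELECT d.id, d.filename, COUNT(*) AS retrieval_count
    FROM query_logs ql
    CROSS JOIN LATERAL unnest(ql.retrieved_chunk_ids) AS chunk_id
    JOIN chunks c ON c.id = chunk_id
    JOIN documents d ON d.id = c.document_id
    WHERE ql.user_id = :user_id
      AND ql.timestamp >= NOW() - INTERVAL '7 days'
    GROUP BY d.id, d.filename
    ORDER BY retrieval_count DESC
    LIMIT 5
""")

def top_documents(user_id: int):
    """Return the user's five most-retrieved documents over the past week."""
    with engine.connect() as conn:
        return conn.execute(TOP_DOCUMENTS_SQL, {"user_id": user_id}).fetchall()
```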
Evaluation test cases are located in tests/evaluation/:
```json
{
  "question": "What are the main components of RAG?",
  "expected_chunks": ["chunk_id_123"],
  "expected_answer_contains": ["retrieval", "generation"],
  "difficulty": "easy"
}
```

- Retrieval Accuracy: Precision@k, Recall@k
- Answer Quality: LLM-based evaluation or manual review
- End-to-End Latency: Time from query to response
- Cost per Query: API costs for embeddings + LLM
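Precision@k and Recall@k can be computed directly from the test-case fields; a minimal sketch (the id lists are illustrative, field names follow the JSON example above):

```python
def precision_recall_at_k(
    retrieved_ids: list[str], expected_ids: list[str], k: int = 5
) -> tuple[float, float]:
    """Precision@k and Recall@k for a single evaluation case."""
    top_k = retrieved_ids[:k]
    hits = len(set(top_k) & set(expected_ids))
    precision = hits / k
    recall = hits / len(expected_ids) if expected_ids else 0.0
    return precision, recall

# With expected_chunks = ["chunk_id_123"] from the example above:
print(precision_recall_at_k(["chunk_id_123", "chunk_id_999"], ["chunk_id_123"]))  # (0.2, 1.0)
```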
Run the evaluation with:

```bash
python tests/evaluation/run_evaluation.py
```

- Async document processing: User doesn't wait for chunking/embedding
- Batch embedding: Process 32+ chunks in one API call
- Indexed vector search: pgvector with proper indexing keeps search <300ms
- Async logging: Query logging doesn't block user response
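One way to keep logging off the request path is FastAPI's BackgroundTasks; a sketch in which the endpoint shape and payload model are assumptions:

```python
from fastapi import BackgroundTasks, FastAPI
from pydantic import BaseModel

app = FastAPI()

class QueryRequest(BaseModel):
    user_id: int
    question: str

def log_query(user_id: int, question: str, chunk_ids: list[int], latency_ms: int) -> None:
    """Insert a row into query_logs; runs after the response has been sent."""
    ...  # e.g. INSERT INTO query_logs (user_id, query, retrieved_chunk_ids, latency_ms) VALUES (...)

@app.post("/query")
async def ask(request: QueryRequest, background_tasks: BackgroundTasks):
    # Placeholder values; the real handler runs retrieval + the LLM call first.
    answer, chunk_ids, latency_ms = "stub answer", [1, 2, 3], 120
    # Schedule logging so it never blocks the user's response.
    background_tasks.add_task(log_query, request.user_id, request.question, chunk_ids, latency_ms)
    return {"answer": answer}
```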
- Self-hosted embedding model (free) vs OpenAI embeddings ($0.00002/1K tokens)
- Claude Haiku ($0.003/query) vs GPT-4 ($0.015/query)
- Estimated: $3-15/month for 1000 queries
- Single server (MVP) → handles 100s of users
- Add load balancer → multiple API servers for 1000s of users
- Migrate to Qdrant/Weaviate → if vector search becomes bottleneck
- Add caching layer → Redis for frequently asked questions
- Simpler architecture (one database vs two systems)
- Easier to join user data with vectors
- Sufficient for <100K documents
- Lower operational complexity
- Simple, predictable, fast
- Semantic chunking can be added later if needed
- Easy to reason about and debug
- User gets immediate feedback ("processing...")
- Can upload multiple documents without waiting
- Scales better under load
- Easy experimentation (change chunk_size without touching code)
- Version control configuration alongside code
- Clear documentation of current settings