Architecture Guide - Vector Bot
Overview
System Components
Data Flow
Storage Architecture
Configuration System
CLI Architecture
Multi-Environment Support
Executable Distribution
Performance Considerations
Security Model
Vector Bot is a Retrieval-Augmented Generation system that combines:
Document Processing: Converting files into searchable chunks
Vector Embeddings: Semantic representation of text
Similarity Search: Finding relevant content
Language Generation: Answering questions using retrieved context
graph TB
A[Documents] --> B[Document Ingestion]
B --> C[Text Chunking]
C --> D[Embedding Generation]
D --> E[Vector Index]
F[User Query] --> G[Query Embedding]
G --> H[Similarity Search]
E --> H
H --> I[Context Retrieval]
I --> J[Language Model]
J --> K[Generated Answer]
| Component | Technology | Role |
|-----------|------------|------|
| LLM Runtime | Ollama | Hosts and runs language models locally |
| RAG Framework | LlamaIndex | Orchestrates indexing, retrieval, and generation |
| Embeddings | nomic-embed-text | Converts text to semantic vectors |
| Chat Model | llama3.1/llama3.2/etc | Generates answers from context |
| CLI Interface | Rich + argparse | User interaction and commands |
| Configuration | python-dotenv | Environment-based settings |
┌─────────────────────────────────────────┐
│ CLI Layer │ ← User Interface
├─────────────────────────────────────────┤
│ Configuration Layer │ ← Environment Management
├─────────────────────────────────────────┤
│ RAG Core Layer │ ← Business Logic
├─────────────────────────────────────────┤
│ LlamaIndex Framework │ ← RAG Orchestration
├─────────────────────────────────────────┤
│ Ollama Client │ ← Model Communication
├─────────────────────────────────────────┤
│ Local File System │ ← Document & Index Storage
└─────────────────────────────────────────┘
1. Document Discovery
├── Scan DOCS_DIR for supported files
├── Filter by extension (.pdf, .txt, .md, .json, .csv)
└── Skip files > 20MB
2. Document Processing
├── Load document content
├── Extract text (PDF parsing, etc.)
└── Split into chunks (typically 512-1024 chars)
3. Embedding Generation
├── Send chunks to nomic-embed-text model
├── Generate 768-dimensional vectors
└── Store embeddings with metadata
4. Index Construction
├── Build searchable vector index
├── Create document store mapping
└── Persist to index_storage/
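Steps 1 and 2 of the ingestion pipeline can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the function names, the recursive directory scan, and the `overlap` parameter are assumptions, and the real system delegates loading and splitting to LlamaIndex.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".md", ".json", ".csv"}
MAX_FILE_BYTES = 20 * 1024 * 1024  # the 20 MB limit from step 1


def discover_documents(docs_dir: Path) -> list[Path]:
    """Return supported files under docs_dir that are within the size limit."""
    found = []
    for path in sorted(docs_dir.rglob("*")):  # recursive scan is an assumption
        if not path.is_file():
            continue
        if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
            continue
        if path.stat().st_size > MAX_FILE_BYTES:
            continue  # skip oversized files
        found.append(path)
    return found


def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-size character chunks; neighbours share `overlap` chars."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

Overlapping chunks are a common way to avoid cutting a sentence off from its context at a chunk boundary; whether Vector Bot configures an overlap is not stated here.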
1. Query Processing
├── Receive user question
├── Apply preprocessing/normalization
└── Generate query embedding vector
2. Similarity Search
├── Compare query vector to document vectors
├── Calculate cosine similarity scores
└── Retrieve top-k most similar chunks
3. Context Assembly
├── Collect retrieved document chunks
├── Add metadata (source files, scores)
└── Format for language model input
4. Answer Generation
├── Send context + query to chat model
├── Generate response using retrieved information
└── Return answer to user
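The similarity search in step 2 comes down to scoring every stored vector against the query vector and keeping the best k. The sketch below shows that idea in plain Python with toy two-dimensional vectors; the function names are hypothetical, and in the real system LlamaIndex performs this step over the 768-dimensional embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: list[float],
    chunk_vecs: dict[str, list[float]],
    k: int = 2,
) -> list[tuple[str, float]]:
    """Return the k (chunk_id, score) pairs most similar to the query."""
    scored = [
        (chunk_id, cosine_similarity(query_vec, vec))
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


chunks = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
results = top_k([1.0, 0.0], chunks, k=2)  # "a" ranks first, then "c"
```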
project/
├── vector-bot.exe # Standalone executable
├── configs/ # Environment configurations
│ ├── development.env # Dev settings
│ ├── production.env # Prod settings
│ └── docker.env # Container settings
├── docs/ # User documents (input)
│ ├── project-spec.pdf
│ ├── user-manual.md
│ └── api-docs.json
├── index_storage/ # Generated index (output)
│ ├── docstore.json # Document chunks & metadata
│ ├── index_store.json # Index structure
│ ├── vector_store.json # Embedding vectors
│ └── graph_store.json # Relationships
└── .env # Local configuration override
{
  "docstore": {
    "docs": {
      "doc_1": {
        "text": "The project requirements include...",
        "metadata": {
          "file_name": "requirements.pdf",
          "file_path": "/docs/requirements.pdf",
          "creation_date": "2024-01-15"
        }
      }
    }
  }
}
{
  "embedding_dict": {
    "doc_1": [0.123, -0.456, 0.789, ...],  // 768 dimensions
    "doc_2": [-0.234, 0.567, -0.891, ...]
  },
  "metadata_dict": {
    "doc_1": {"similarity_score": 0.95}
  }
}
1. Command Line Arguments (highest priority)
├── --env production
└── --config-info
2. Environment Variables
├── OLLAMA_CHAT_MODEL=llama3.1
└── SIMILARITY_TOP_K=6
3. Environment Config Files
├── configs/production.env
└── configs/development.env
4. Local .env File
└── .env
5. Built-in Defaults (lowest priority)
└── hardcoded in config.py
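The priority order above amounts to a first-match lookup across layers. The following sketch models each layer as a plain dictionary and resolves a setting with `collections.ChainMap`; it only illustrates the precedence rule. The function name and the default values are placeholders, not values taken from `config.py`.

```python
from collections import ChainMap

# Placeholder defaults for illustration only.
DEFAULTS = {"OLLAMA_CHAT_MODEL": "llama3.1", "SIMILARITY_TOP_K": "4"}


def resolve_settings(
    cli_args: dict[str, str],
    env_vars: dict[str, str],
    env_file: dict[str, str],
    local_file: dict[str, str],
    defaults: dict[str, str] = DEFAULTS,
) -> dict[str, str]:
    """Merge the five layers; the earliest layer that defines a key wins."""
    return dict(ChainMap(cli_args, env_vars, env_file, local_file, defaults))


settings = resolve_settings(
    cli_args={},
    env_vars={"SIMILARITY_TOP_K": "6"},
    env_file={"SIMILARITY_TOP_K": "8", "LOG_LEVEL": "INFO"},
    local_file={"LOG_LEVEL": "DEBUG"},
)
# SIMILARITY_TOP_K comes from the environment variable, LOG_LEVEL from the
# environment config file, and OLLAMA_CHAT_MODEL from the defaults.
```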
Configuration Loading Process
from pathlib import Path
from dotenv import load_dotenv

def load_config(env_name=None):
    # 1. Determine executable location
    executable_dir = get_executable_dir()
    # 2. Build search paths for config files (named environment first, then .env)
    config_paths = []
    if env_name:
        config_paths += [
            executable_dir / "configs" / f"{env_name}.env",
            Path("configs") / f"{env_name}.env",
        ]
    config_paths += [executable_dir / ".env", Path(".env")]
    # 3. Load the first config found (don't override existing env vars)
    for config_file in config_paths:
        if config_file.is_file():
            load_dotenv(config_file, override=False)
            break
    # 4. Build final configuration with validation
    return validate_and_build_config()
vector-bot [global_options] <command> [command_options]
Global Options:
├── --env ENV # Environment selection
├── --config-info # Show configuration
└── --version # Show version
Commands:
├── doctor # Health check
├── ingest # Index documents
└── query "question" # Ask questions
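The command layout above maps naturally onto `argparse` subparsers. The sketch below is one plausible way to declare it, assuming global options precede the command; the help strings and the version string are placeholders, and the project's actual parser may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare global options and the doctor / ingest / query subcommands."""
    parser = argparse.ArgumentParser(prog="vector-bot")
    parser.add_argument("--env", help="environment name, e.g. development or production")
    parser.add_argument("--config-info", action="store_true", help="show configuration")
    # Placeholder version string for illustration.
    parser.add_argument("--version", action="version", version="vector-bot 0.0.0")

    commands = parser.add_subparsers(dest="command")
    commands.add_parser("doctor", help="health check")
    commands.add_parser("ingest", help="index documents")
    query = commands.add_parser("query", help="ask a question")
    query.add_argument("question")
    return parser


args = build_parser().parse_args(["--env", "production", "query", "What is the deadline?"])
# args.env == "production", args.command == "query"
```

Leaving the subcommand optional lets `vector-bot --config-info` run without a command, in which case `args.command` is `None`.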
def main():
    # 1. Parse arguments
    args = parse_arguments()
    # 2. Handle global options
    if args.config_info:
        show_configuration()
        return
    # 3. Load environment-specific config
    config = load_config(args.env)
    # 4. Execute command
    if args.command == "doctor":
        doctor(config)
    elif args.command == "ingest":
        ingest(config)
    elif args.command == "query":
        query(args.question, config)
Multi-Environment Support
| Environment | Use Case | Characteristics |
|-------------|----------|-----------------|
| Development | Local dev work | Verbose logging, local paths, debug mode |
| Production | Server deployment | Optimized settings, absolute paths, minimal logs |
| Docker | Containerized | Container networking, mounted volumes |
Environment-Specific Settings
# Development
DOCS_DIR=./docs # Relative paths
LOG_LEVEL=DEBUG # Verbose logging
ENABLE_VERBOSE_OUTPUT=true # Debug info
REQUEST_TIMEOUT=60 # Standard timeout
# Production
DOCS_DIR=/data/documents # Absolute paths
LOG_LEVEL=INFO # Minimal logging
ENABLE_VERBOSE_OUTPUT=false # No debug
REQUEST_TIMEOUT=120 # Longer timeout
EMBED_BATCH_SIZE=5 # Smaller batches
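Values loaded from `.env` files arrive as strings, so settings such as `REQUEST_TIMEOUT` and `ENABLE_VERBOSE_OUTPUT` need converting before use. The helpers below are a hypothetical sketch of that conversion; their names and the accepted truthy spellings are assumptions, not the project's API.

```python
import os
from collections.abc import Mapping

TRUE_VALUES = {"1", "true", "yes", "on"}  # assumed truthy spellings


def env_int(name: str, default: int, environ: Mapping[str, str] = os.environ) -> int:
    """Read an integer setting, falling back to default when unset or empty."""
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None


def env_bool(name: str, default: bool, environ: Mapping[str, str] = os.environ) -> bool:
    """Read a boolean setting; anything outside TRUE_VALUES counts as False."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUE_VALUES


production = {"REQUEST_TIMEOUT": "120", "ENABLE_VERBOSE_OUTPUT": "false"}
timeout = env_int("REQUEST_TIMEOUT", 60, production)           # 120
verbose = env_bool("ENABLE_VERBOSE_OUTPUT", True, production)  # False
```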
PyInstaller Bundle Structure
vector-bot.exe
├── Python Runtime # Embedded Python 3.10+
├── Dependencies # All pip packages
│ ├── llama-index-core
│ ├── llama-index-llms-ollama
│ ├── llama-index-embeddings-ollama
│ ├── requests
│ ├── rich
│ └── python-dotenv
├── Application Code # Our Python modules
│ └── rag/
│ ├── cli.py
│ ├── config.py
│ ├── ingest.py
│ ├── query.py
│ └── ollama_check.py
├── Configuration Files # Environment configs
│ └── configs/
│ ├── development.env
│ ├── production.env
│ └── docker.env
└── Entry Point # Bootstrap script
└── rag_main.py
import sys
from pathlib import Path

def get_executable_dir():
    if getattr(sys, "frozen", False):
        # Running as PyInstaller executable
        return Path(sys.executable).parent
    else:
        # Running as Python script
        return Path(__file__).parent.parent.parent

# This allows configs to be found whether running as:
# 1. vector-bot.exe --config-info
# 2. python -m rag.cli --config-info
Performance Considerations
Ingestion:

| Factor | Impact | Optimization |
|--------|--------|--------------|
| Document Size | Linear with content | Chunk large files |
| Embedding Model | Batch processing | Use EMBED_BATCH_SIZE=10 |
| File Format | PDF parsing overhead | Prefer text/markdown |
| Storage Type | I/O bottleneck | Use SSD for index_storage/ |

Query:

| Factor | Impact | Optimization |
|--------|--------|--------------|
| Index Size | Search complexity | Consider index pruning |
| Similarity K | Linear with K | Use minimal --k needed |
| Chat Model | Generation time | Balance model size vs speed |
| Context Length | Token processing | Limit chunk sizes |
Component Memory Usage:
├── Ollama Models
│ ├── llama3.1 (8B): ~4-8 GB RAM
│ └── nomic-embed: ~500 MB RAM
├── Vector Index
│ ├── ~3 KB per chunk (768 float32 values, before JSON overhead)
│ └── Scales with document count
└── Application
└── ~50-100 MB Python runtime
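The vector index figure depends on how each of the 768 dimensions is stored. The short calculation below makes that explicit; the function name is hypothetical, and it counts raw vector payload only, so a JSON-serialized index on disk will be larger.

```python
def index_vector_bytes(num_chunks: int, dimensions: int = 768, bytes_per_value: int = 4) -> int:
    """Raw size of the embedding vectors: chunks x dimensions x bytes per value."""
    return num_chunks * dimensions * bytes_per_value


per_chunk_float32 = index_vector_bytes(1)                  # 768 * 4 bytes
per_chunk_int8 = index_vector_bytes(1, bytes_per_value=1)  # 768 * 1 byte
ten_thousand_chunks_mb = index_vector_bytes(10_000) / 1_000_000
```

With 4-byte floats, 10,000 chunks come to roughly 31 MB of vector data, which is small next to the memory used by the models themselves.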
The system is designed for local-only operation with these assumptions:
Protected:
✅ Documents never leave the machine
✅ No network calls except to localhost:11434
✅ All processing happens offline
✅ User controls all data and models
Attack Vectors:
⚠️ Malicious documents (PDF exploits, etc.)
⚠️ Model tampering (if Ollama compromised)
⚠️ File system permissions
⚠️ Input injection in queries
# Input validation
def validate_config(config):
    # Validate paths can be created
    # Check URL formats
    # Sanitize user inputs
    pass

# File handling
def load_documents(docs_dir):
    # Skip files > 20MB (DoS protection)
    # Validate file extensions
    # Handle parsing errors gracefully
    pass

# Network isolation
OLLAMA_BASE_URL = "http://localhost:11434"  # Only localhost
# No external API calls
# No telemetry or analytics
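The localhost-only assumption can be enforced at configuration time by rejecting any base URL whose host is not a loopback name. This is a sketch of one such check using `urllib.parse`; the function name and the set of accepted hosts are assumptions, and the stub `validate_config` above does not necessarily work this way.

```python
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}  # assumed loopback names


def is_local_url(url: str) -> bool:
    """True only for http(s) URLs whose host is a loopback name or address."""
    parsed = urlparse(url)
    return parsed.scheme in {"http", "https"} and parsed.hostname in LOCAL_HOSTS


ok = is_local_url("http://localhost:11434")            # loopback host
spoofed = is_local_url("http://localhost.example.com") # different host, rejected
```

Comparing the parsed hostname, rather than searching the URL string for "localhost", avoids accepting look-alike hosts such as the second example.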
| Data Type | Storage Location | Network Access |
|-----------|------------------|----------------|
| User Documents | Local docs/ folder | Never transmitted |
| Generated Embeddings | Local index_storage/ | Never transmitted |
| Chat History | Not stored | Never transmitted |
| Configuration | Local env files | Never transmitted |
| Model Weights | Ollama local cache | Downloaded once |
Vector Bot
├── Ollama Server (required)
│ ├── HTTP API on localhost:11434
│ ├── Model management
│ └── Inference execution
├── File System (required)
│ ├── Document storage
│ ├── Index persistence
│ └── Configuration files
└── Operating System (required)
├── Process management
├── Network stack (localhost only)
└── File permissions
User Input → CLI → Config → RAG Core → LlamaIndex → Ollama API
↓
File System ← Index Storage
try:
    # 1. Validate configuration
    config = load_config()
    # 2. Check Ollama connectivity
    if not check_server():
        raise ConnectionError("Ollama not running")
    # 3. Validate models
    ensure_models_available()
    # 4. Execute command
    result = execute_command()
except ConfigurationError as e:
    console.print(f"Config error: {e}")
    return 1
except ConnectionError as e:
    console.print(f"Ollama error: {e}")
    return 1
except Exception as e:
    console.print(f"Unexpected error: {e}")
    return 1
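A connectivity check like `check_server()` can be as small as one HTTP request against Ollama's model-listing endpoint, `GET /api/tags`. The version below is a sketch using only the standard library; the project itself appears to use `requests`, and the signature, timeout, and proxy handling here are assumptions.

```python
import urllib.error
import urllib.request


def check_server(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if the Ollama HTTP API answers GET /api/tags with status 200."""
    # An empty ProxyHandler keeps the request off any configured HTTP proxy,
    # so the check really talks to the local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"{base_url}/api/tags", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Returning a boolean instead of raising keeps the caller's error handling simple: the caller decides whether an unreachable server is fatal.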
tests/
├── unit/ # Isolated unit tests
│ ├── test_cli.py # CLI interface tests
│ ├── test_config.py # Configuration tests
│ ├── test_ingest.py # Ingestion logic tests
│ ├── test_query.py # Query processing tests
│ └── test_ollama_check.py # Health check tests
├── integration/ # End-to-end tests
│ └── test_integration.py
├── conftest.py # Shared fixtures
└── README.md # Testing documentation
| Test Type | Purpose | Characteristics |
|-----------|---------|-----------------|
| Unit Tests | Test individual functions | Fully mocked, < 1s execution |
| Integration Tests | Test component interaction | Partial mocking, < 5s execution |
| Smoke Tests | Validate basic functionality | Real execution, requires Ollama |
from pathlib import Path
from unittest.mock import patch

# External dependencies are fully mocked.
# Decorators apply bottom-up, so the mocks arrive in reverse decorator order.
@patch("requests.get")
@patch("llama_index.llms.ollama.Ollama")
@patch.object(Path, "exists")
def test_function(mock_path, mock_ollama, mock_requests):
    # Test runs completely offline
    # No network calls or file system access
    pass
Core Logic: 90%+ coverage
Error Paths: All exceptions tested
Edge Cases: Empty inputs, large files, timeouts
Platform Independence: Tests pass on all OS
Adding New Document Types
# In ingest.py, extend supported extensions:
extensions = [".txt", ".md", ".pdf", ".json", ".csv", ".docx"]  # Add .docx

# Add custom parser:
def load_docx(file_path):
    # Custom document loading logic
    pass
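One way to keep new document types from turning into a growing chain of `if` statements is a registry that maps file extensions to loader functions. The sketch below illustrates that pattern with hypothetical names (`register_loader`, `load_document`); it is not how `ingest.py` is known to be organized.

```python
from pathlib import Path
from typing import Callable

Loader = Callable[[Path], str]
LOADERS: dict[str, Loader] = {}  # extension -> loader function


def register_loader(*extensions: str) -> Callable[[Loader], Loader]:
    """Decorator that registers a loader for one or more file extensions."""
    def decorator(func: Loader) -> Loader:
        for ext in extensions:
            LOADERS[ext.lower()] = func
        return func
    return decorator


@register_loader(".txt", ".md")
def load_plain_text(path: Path) -> str:
    return path.read_text(encoding="utf-8")


def load_document(path: Path) -> str:
    """Dispatch to the loader registered for the file's extension."""
    loader = LOADERS.get(path.suffix.lower())
    if loader is None:
        raise ValueError(f"unsupported file type: {path.suffix}")
    return loader(path)
```

Supporting `.docx` would then mean adding one decorated function, with no change to `load_document`.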
# In config.py, support new embedding models:
SUPPORTED_EMBED_MODELS = [
    "nomic-embed-text",
    "mxbai-embed-large",
    "your-custom-model",  # Add new model
]

# Current: Ollama only
# Future: Could extend to support OpenAI API, local transformers, etc.
def get_llm(config):
    if config["LLM_BACKEND"] == "ollama":
        return Ollama(...)
    elif config["LLM_BACKEND"] == "openai":
        return OpenAI(...)
This architecture provides a secure, offline, and extensible RAG system that can scale from personal use to enterprise deployment while maintaining complete data privacy and control.