| title | Glossary |
|---|---|
| description | Definitions of terms and concepts used in Vector Bot and related technologies |
| audience | all |
| level | reference |
| keywords | |
| related_docs | |

Comprehensive definitions of terms and concepts used in Vector Bot and related technologies.
A machine learning system trained to perform specific tasks, such as generating text or understanding language. Vector Bot uses AI models from Ollama to generate answers based on your documents.
A set of protocols and tools for building software applications. Vector Bot communicates with Ollama through its REST API.
Vector Bot's ability to automatically discover and use available AI models without explicit configuration. Used as the default for OLLAMA_CHAT_MODEL.
Processing multiple items together as a group rather than individually. Vector Bot processes document chunks in batches during embedding generation for efficiency.
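As a rough illustration of batching, the sketch below groups a list of chunks into fixed-size batches. The helper name `batched` and the batch size of 3 are hypothetical choices for this example; they are not taken from Vector Bot's code or configuration.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size items (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Seven placeholder chunks grouped three at a time; the last batch is smaller.
chunks = [f"chunk-{i}" for i in range(7)]
batches = list(batched(chunks, 3))
```

Each batch could then be sent to the embedding model in a single request instead of one request per chunk.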
The root URL for an API service. For Vector Bot, OLLAMA_BASE_URL specifies where to find the Ollama server (default: http://localhost:11434).
An AI model specifically designed for conversational interactions and text generation. Examples include llama3.1, mistral, and qwen2.5.
A small segment of text extracted from a larger document. Vector Bot splits documents into chunks to create searchable pieces that can be efficiently processed and retrieved.
The number of characters that consecutive chunks share to maintain context continuity. Default is 200 characters.
The maximum number of characters in each document chunk. Default is 1000 characters.
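To make the relationship between chunk size and chunk overlap concrete, here is a minimal character-based sketch using the defaults stated above (1000 and 200). It is a simplification for illustration only: the function name `split_into_chunks` is hypothetical, and the splitter Vector Bot actually uses may also respect sentence or token boundaries.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into character windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new chunk advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


# A 2500-character sample with varying letters so the overlap is visible.
sample = "".join(chr(97 + i % 26) for i in range(2500))
pieces = split_into_chunks(sample)
```

With these defaults each chunk starts 800 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next.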
A text-based interface for interacting with software. Vector Bot provides a CLI with commands like doctor, ingest, and query.
A file containing settings and parameters for an application. Vector Bot uses .env files and environment profiles in the configs/ directory.
A setting that controls how Vector Bot behaves. Examples include DOCS_DIR, OLLAMA_CHAT_MODEL, and SIMILARITY_TOP_K.
A measure of similarity between two vectors based on the cosine of the angle between them, commonly used in vector databases to find relevant content. Vector Bot uses cosine similarity to match queries with document chunks.
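The measure can be written in a few lines of plain Python. This is a sketch of the standard formula (dot product divided by the product of the vector lengths), not Vector Bot's own implementation; returning 0.0 for a zero-length vector is a convention chosen for this example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return dot(a, b) / (|a| * |b|), or 0.0 if either vector has zero length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for this sketch; the value is undefined mathematically
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])
```

Vectors pointing the same way score close to 1, unrelated (orthogonal) vectors score 0, and opposite vectors score close to -1.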
A platform for developing, shipping, and running applications in containers. Vector Bot supports Docker deployment with specialized configuration.
See Vector Index.
The process of loading, processing, and indexing documents to make them searchable. Performed by the vector-bot ingest command.
A numerical representation (vector) of text that captures semantic meaning. Vector Bot creates embeddings for both document chunks and queries to enable semantic search.
An AI model that converts text into numerical vectors (embeddings). Vector Bot's default is nomic-embed-text.
A predefined set of configuration settings for specific deployment scenarios (development, production, docker). Stored in configs/{profile}.env files.
A system-level setting that programs can read to modify their behavior. Vector Bot reads environment variables like DOCS_DIR and OLLAMA_CHAT_MODEL.
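A minimal sketch of reading such variables with fallbacks might look like the following. The two defaults shown are the ones stated in this glossary for OLLAMA_BASE_URL and INDEX_DIR; the `load_settings` function and the `DEFAULTS` table are hypothetical names for illustration and do not describe Vector Bot's actual configuration loader.

```python
import os
from typing import Dict, Mapping

# Defaults as documented in this glossary; other variables are omitted
# because their defaults are not stated here.
DEFAULTS: Dict[str, str] = {
    "OLLAMA_BASE_URL": "http://localhost:11434",
    "INDEX_DIR": "./index_storage",
}


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return each known setting, preferring the environment over the default."""
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}


# Passing an explicit mapping keeps the example independent of the real environment.
settings = load_settings({"INDEX_DIR": "/data/index"})
```

Here `settings["INDEX_DIR"]` comes from the supplied environment, while `settings["OLLAMA_BASE_URL"]` falls back to its default.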
A collection of common questions and their answers. Vector Bot's FAQ covers installation, usage, and troubleshooting topics.
Specialized hardware that can accelerate AI model processing. Ollama can use GPUs to run models faster than CPU-only processing.
A pattern-matching syntax for specifying groups of files. Examples: *.pdf (all PDF files), docs/**/*.md (all Markdown files in docs and subdirectories).
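The two example patterns can be tried out with Python's standard `pathlib` module. This sketch builds a throwaway directory tree so it can run anywhere; the file names and the `find_files` helper are made up for the example, and Vector Bot's own pattern matching may differ in details.

```python
import tempfile
from pathlib import Path
from typing import List


def find_files(root: Path, pattern: str) -> List[str]:
    """Return sorted, root-relative paths of files matching a glob pattern."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs" / "guides").mkdir(parents=True)
    for name in ("report.pdf", "docs/index.md", "docs/guides/setup.md", "docs/notes.txt"):
        (root / name).write_text("example")

    pdf_files = find_files(root, "*.pdf")              # PDFs directly under root
    markdown_files = find_files(root, "docs/**/*.md")  # Markdown in docs and below
```

In `pathlib`, `**` matches the directory itself and all of its subdirectories, which is why both `docs/index.md` and `docs/guides/setup.md` are found.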
A diagnostic test to verify that all system components are working correctly. Vector Bot's doctor command performs comprehensive health checks.
The protocol used for web communication. Vector Bot communicates with Ollama over HTTP/HTTPS.
A search approach that combines multiple techniques, such as keyword matching and semantic similarity. Vector Bot primarily uses semantic search via embeddings.
See Vector Index.
The location where Vector Bot stores the vector index files. Controlled by the INDEX_DIR configuration variable (default: ./index_storage).
See Document Ingestion.
A lightweight data format commonly used for configuration files and data exchange. Vector Bot can process JSON documents and uses JSON for some configuration.
A security standard for transmitting information securely between parties. Can be used for Vector Bot authentication in enterprise deployments.
An algorithm that finds the k items in a dataset that are most similar to a given item. Vector Bot uses k-NN to find the most relevant document chunks for a query.
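A brute-force version of this search is easy to sketch: score every stored vector against the query and keep the k best. The example below assumes cosine similarity as the score and uses tiny made-up two-dimensional vectors; real embeddings have hundreds of dimensions, and a production index may use faster approximate methods. The names `top_k` and `index` are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]], k: int) -> List[Tuple[str, float]]:
    """Return the k (chunk_id, score) pairs most similar to the query, best first."""
    scored = ((cosine(query, vector), chunk_id) for chunk_id, vector in index.items())
    return [(chunk_id, score) for score, chunk_id in heapq.nlargest(k, scored)]


# Made-up embeddings for three chunks.
index = {
    "chunk-a": [1.0, 0.0],
    "chunk-b": [0.7, 0.7],
    "chunk-c": [0.0, 1.0],
}
results = top_k([1.0, 0.1], index, k=2)
```

The value of `k` plays the same role as the SIMILARITY_TOP_K setting: it limits how many chunks are returned for each query.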
A container orchestration platform. Vector Bot can be deployed in Kubernetes clusters with appropriate configuration.
A type of AI model trained on vast amounts of text data to understand and generate human language. Examples include the Llama family of models.
The framework Vector Bot uses for document processing, indexing, and retrieval. LlamaIndex provides the core RAG functionality.
Performing computation on the user's own hardware rather than sending data to external services. Vector Bot operates entirely locally for privacy and security.
Additional information about data, such as file creation date, author, or document type. Vector Bot extracts and uses metadata during document processing.
See AI Model.
The ability to use different configurations for different deployment scenarios (development, staging, production). Vector Bot supports multi-environment configurations.
The field of AI focused on enabling computers to understand and work with human language. Vector Bot uses NLP techniques for document understanding and query processing.
A machine learning architecture inspired by biological neural networks. Modern AI models, including those used by Vector Bot, are based on neural networks.
A package manager for JavaScript/Node.js applications. One of Vector Bot's distribution methods is through npm packages.
Performing tasks without requiring an internet connection. Vector Bot works entirely offline after initial setup.
An open-source platform for running large language models locally. Vector Bot requires Ollama for both chat and embedding models.
Software deployed and run on the user's own hardware rather than in the cloud. Vector Bot is designed for on-premises deployment.
The process of converting relative paths to absolute paths. Vector Bot resolves relative paths based on the executable's location.
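The idea can be sketched with `pathlib`: relative paths are joined to a base directory, while absolute paths pass through unchanged. In this example the base directory is the system temp directory so the code runs anywhere; an application that resolves against its executable would instead pass something like `Path(sys.executable).parent`. The `resolve_path` helper is a hypothetical name, not Vector Bot's API.

```python
import tempfile
from pathlib import Path


def resolve_path(value: str, base_dir: Path) -> Path:
    """Return value as an absolute path, treating relative paths as relative to base_dir."""
    path = Path(value)
    if path.is_absolute():
        return path
    # resolve() also normalizes segments such as "." and ".."
    return (base_dir / path).resolve()


# Stand-in for the directory that contains the executable.
base = Path(tempfile.gettempdir()).resolve()

relative_result = resolve_path("./index_storage", base)
absolute_result = resolve_path(str(base / "docs"), base)
```

Resolving against a fixed base rather than the current working directory means the same configuration yields the same locations regardless of where the command is launched from.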
A file format commonly used for documents. Vector Bot can extract text from PDF files for indexing.
Data storage that survives between application runs. Vector Bot uses persistent storage for the vector index.
The package installer for Python. One of Vector Bot's distribution methods is through pip packages.
The input text or question provided to an AI model. Vector Bot combines your query with relevant document content to create effective prompts.
A question or request for information submitted to Vector Bot. Processed by the vector-bot query command.
The component responsible for finding relevant content and generating answers. Vector Bot uses LlamaIndex's query engine.
An AI technique that combines information retrieval with text generation. Vector Bot is a RAG system that retrieves relevant document content and uses it to generate informed answers.
The process of finding and extracting relevant information from a dataset. The "R" in RAG refers to retrieving relevant document chunks.
A type of web API that follows REST architectural principles. Ollama provides a REST API that Vector Bot uses for communication.
Search based on meaning rather than exact keyword matches. Vector Bot uses semantic search to find contextually relevant content even when exact words don't match.
A numerical measure of how closely related two pieces of content are. Vector Bot uses similarity scores to rank document chunks by relevance to queries.
The number of most similar document chunks to retrieve for each query. Controlled by the SIMILARITY_TOP_K configuration variable.
The original document from which information was retrieved. Vector Bot can show source documents using the --show-sources flag.
A unit of text that AI models process, roughly a word or part of a word. The term also refers to authentication tokens in security contexts.
See Similarity Top K.
A neural network architecture that revolutionized natural language processing. Most modern language models, including those used by Vector Bot, are based on transformers.
A web address that specifies the location of a resource. Vector Bot uses URLs to connect to Ollama servers.
The path a user takes when learning and using a system. Vector Bot's documentation is organized around common user journeys.
A numerical representation of data, typically as a list of numbers. In Vector Bot, text is converted to vectors (embeddings) for similarity comparison.
A specialized database for storing and searching vector embeddings. Vector Bot uses LlamaIndex's vector storage capabilities.
A data structure containing document embeddings organized for efficient similarity search. Vector Bot stores this in the index directory.
A mathematical space where vectors exist and can be compared. Vector Bot performs searches in embedding vector space.
Detailed information about processing steps and system behavior. Enabled with --verbose flags or ENABLE_VERBOSE_OUTPUT=true.
A sequence of steps or processes to accomplish a task. Vector Bot supports various workflows for different use cases.
A human-readable data serialization format. Vector Bot documentation uses YAML frontmatter for metadata.
| Acronym | Full Form | Description |
|---|---|---|
| AI | Artificial Intelligence | Computer systems that can perform tasks typically requiring human intelligence |
| API | Application Programming Interface | Set of protocols for building software applications |
| CLI | Command Line Interface | Text-based user interface |
| CPU | Central Processing Unit | The main processor in a computer |
| CSV | Comma-Separated Values | A simple file format for tabular data |
| FAQ | Frequently Asked Questions | Collection of common questions and answers |
| GPU | Graphics Processing Unit | Specialized processor for graphics and parallel computing |
| HTTP | Hypertext Transfer Protocol | Protocol for web communication |
| JSON | JavaScript Object Notation | Lightweight data interchange format |
| JWT | JSON Web Token | Security standard for information transmission |
| LLM | Large Language Model | AI model trained on large amounts of text |
| NLP | Natural Language Processing | AI field focused on human language understanding |
| PDF | Portable Document Format | File format for documents |
| RAG | Retrieval-Augmented Generation | AI technique combining retrieval and generation |
| REST | Representational State Transfer | Architectural style for web services |
| TLS | Transport Layer Security | Cryptographic protocol for secure communication |
| URL | Uniform Resource Locator | Web address format |
| YAML | YAML Ain't Markup Language | Human-readable data serialization format |
- AI Model, Chat Model, Embedding Model
- Large Language Model (LLM), Neural Network, Transformer
- Natural Language Processing (NLP)
- RAG (Retrieval-Augmented Generation), LlamaIndex, Ollama
- Vector Index, Embedding, Chunk
- Query Engine, Retrieval
- Environment Variable, Configuration File, Environment Profile
- Multi-Environment, Path Resolution
- Docker, On-Premises
Need more specific information? Check the FAQ or Configuration Variables Reference for detailed explanations.