# DarshJDB

License: MIT · Built with Rust · PostgreSQL 16+ · Status: Alpha · CI Tests: 1090 · Release: v0.4.0


Postgres + Redis + Pinecone + LangChain Memory + MCP in one Rust binary.

Self-hosted Backend-as-a-Service. Triple-store EAV over PostgreSQL. Tiered agent memory with LLM summariser. RESP3-compatible cache. pgvector hybrid search. Model Context Protocol server. TimescaleDB hypertables. Merkle audit chain. One binary. One config. One port per service.

Quick Start | Feature Grid | DarshJQL | Agent Memory | Redis Drop-in | MCP | SDKs | Roadmap


## What is DarshJDB

DarshJDB is a self-hosted backend that ships as one Rust binary. It replaces the stack most modern apps end up stitching together -- Postgres for data, Redis for cache, Pinecone for vectors, a bespoke "agent memory" service, and an MCP bridge -- with a single process that speaks REST, WebSocket, RESP3, JSON-RPC, and Server-Sent Events. You connect from TypeScript, React, Next.js, Angular, Python, or PHP. It runs on a machine with 512MB of RAM.

This is alpha software built using AI-assisted development tools (Claude Code). It has over 1,090 integration tests across Rust, TypeScript, Python, and PHP. It works. It is not production-hardened yet.


## Why DarshJDB Exists

Started in 2021 in Ahmedabad, India. Darshankumar Joshi was building apps for clients at his company (GraymatterOnline) and got frustrated -- every project needed auth, real-time, storage, permissions, and a query engine. Then every AI project needed a vector DB, an agent memory service, and a way to expose tools to Claude Desktop or Cursor. Firebase was locked-in, Supabase was Postgres but heavy, Convex was cloud-only, Pinecone was a separate bill, every "agent memory" product was a thin wrapper over someone else's infra.

He wanted something he could run on a $5 VPS, own completely, and extend without limits. So he started building DarshJDB -- one Rust binary that handles everything. You are not fighting the rowboat when you build a backend in 2026. You are fighting the aircraft carrier of services that boot alongside it.

Five years in, DarshJDB absorbs ideas from PostgreSQL, GraphDB, Redis, pgvector, TimescaleDB, Bitcoin/Solana anchoring, InstantDB, Convex, and the Model Context Protocol -- and still runs on a machine with 512MB of RAM.


## Feature Grid

Every capability below is code that shipped in the Grand Transformation integration. No roadmap items. No fake features.

| Phase | Capability | What it replaces |
|---|---|---|
| 0 -- Security | Real admin RBAC, magic-link email auth, 24h session cap (max 5 per user), refresh-token SHA256 hashing, exponential login rate limit (lock after 10 fails / 3600s), WS mutation transactional broadcast, SSE subscription re-evaluation, path-traversal-safe uploads | Hand-rolled auth middleware, ad-hoc rate limiters |
| 1 -- Redis Superset | `ddb-cache` crate (L1 DashMap + lz4, L2 Postgres + zstd) and `ddb-cache-server` RESP3 binary on port 7701. GET/SET/DEL/EXPIRE/TTL/KEYS, HSET/HGET/HGETALL/HDEL/HLEN, LPUSH/RPUSH/LPOP/RPOP/LRANGE, ZADD/ZRANGE/ZRANGEBYSCORE/ZRANK/ZREM/ZSCORE, XADD/XREAD/XRANGE, BFADD/BFEXISTS, PFADD/PFCOUNT, SUBSCRIBE/UNSUBSCRIBE/PUBLISH, HELLO 3, AUTH, INFO. Mirrored at `/api/cache/*` | Redis, RedisBloom, Dragonfly, KeyDB |
| 2 -- Agent Memory | `ddb-agent-memory` crate with 3-tier hierarchy (working DashMap / episodic Postgres / semantic pgvector HNSW) plus `agent_facts` injection. tiktoken-aware ContextBuilder with reverse-chron recall, semantic top-K, budget-bounded assembly. LLM-backed episodic-to-semantic summariser firing at 50 / 100 / 200 thresholds | Pinecone, Weaviate, LangChain Memory, Zep, Mem0 |
| 3 -- Hybrid Search | pgvector HNSW + IVFFlat, ts_rank full-text, Reciprocal Rank Fusion (k=60, Cormack et al.) at `/api/search/hybrid` | Elasticsearch + Pinecone + custom re-ranker |
| 4 -- Self-Contained | Embedded Postgres 16 via pg-embed behind the `embedded-db` feature. React admin dashboard bundled via `include_dir!`. One-line installer at `scripts/install.sh`. GH Actions release matrix: x86_64-unknown-linux-musl, aarch64-apple-darwin, x86_64-pc-windows-msvc | Docker compose stacks, "bring your own Postgres" docs |
| 5 -- Time + Graph + Anchor | TimescaleDB hypertable `time_series` at `/api/ts/*` (insert/range/agg/latest). Graph `edges` table with BFS at `/api/graph/*`. Merkle anchor receipts with optional IPFS/Ethereum submission behind `anchor-ipfs` / `anchor-eth` feature flags | TimescaleDB instance, Neo4j, anchor-as-a-service |
| 6 -- MCP + Streaming | JSON-RPC 2.0 Model Context Protocol at `POST /api/mcp` exposing 10 tools. SSE streaming agent endpoint `/api/agent/stream`. Works with Claude Desktop, Cursor, and any MCP client | Custom MCP bridges per project |
| 7 -- Multimodal | Chunked/resumable uploads at `/api/storage/upload/{init,chunk,status}`. Image transforms pipeline (resize/crop/format/quality) with lz4 byte cache. WebSocket diff engine emits `{added, removed, updated}` buckets instead of full replays | tus, uppy, imgproxy, hand-rolled diff logic |
| 9 -- SurrealDB Parity | Strict schema mode (`schema_definitions`, 422 on violation). LIVE SELECT auto-registers WS subscriptions from DarshanQL. SQL passthrough at `POST /api/sql` (whitelisted DML, admin-only, audit-logged) | A second database for "just the SQL case" |
| 10 -- Observability | Prometheus metrics at `/metrics` (IP-allowlisted). `/health`, `/ready`, `/live` probes. Structured JSON logging with request_id propagation. Legacy rich health moved to `/health/full` | A Grafana stack you don't have time to configure |

## Agent Memory: Unlimited LLM Context

DarshJDB ships a tiered agent memory model that gives any LLM agent (Claude, GPT, Gemini, open-source) effectively unlimited working memory without shelling out to Pinecone, Weaviate, Qdrant, or a bespoke MCP memory server.

| Tier | Storage | Purpose | Latency |
|---|---|---|---|
| Working | In-process DashMap window | Hot turns for the current conversation | <1ms |
| Episodic | Postgres `memory_entries` rows | Recent turns evicted from the working window | <10ms |
| Semantic | pgvector HNSW + LLM-backed summaries | Long-term recall, compressed by the summariser at 50/100/200 turn thresholds | <50ms |
| Facts | `agent_facts` key/value per session | Structured facts injected directly into the context budget | <10ms |

ContextBuilder uses a tiktoken counter to assemble a budget-bounded prompt: reverse-chronological working window, semantic top-K recall against the current query, and optional fact injection. When the episodic tier grows past 50, 100, or 200 entries, the summariser fires, asks the configured LLM provider (OpenAI, Ollama, Anthropic, or None) to compress the oldest block into a semantic summary, writes the compressed row, and deletes the source rows in a single transaction. A 500-turn conversation stays inside a 4k-token prompt window. A 5000-turn conversation does too.
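
For intuition, here is a minimal Python sketch of that budget-bounded assembly. The function and field names are illustrative rather than the server's internals; only the tiktoken counting and the tier ordering mirror the behaviour described above:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def build_context(working, semantic_hits, facts, max_tokens=4096):
    """Assemble a prompt from the memory tiers without exceeding the budget.

    working:       newest-first list of {"role", "content"} turns
    semantic_hits: top-K summaries, already ranked by similarity
    facts:         key/value pairs injected verbatim
    """
    budget, messages = max_tokens, []

    # Facts and semantic recall go in first, as system context.
    for block in ([f"{k}: {v}" for k, v in facts.items()]
                  + [hit["summary"] for hit in semantic_hits]):
        cost = len(enc.encode(block))
        if cost > budget:
            break
        messages.append({"role": "system", "content": block})
        budget -= cost

    # The working window fills the remainder, newest turns kept preferentially.
    recent = []
    for turn in working:
        cost = len(enc.encode(turn["content"]))
        if cost > budget:
            break
        recent.append(turn)
        budget -= cost

    return messages + list(reversed(recent))  # chronological order for the LLM
```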


### What each piece actually does

- **Data path**: `POST /api/data/users -d '{"name":"Alice"}'` writes triples to Postgres. `GET /api/data/users` reads them back. Round-trip proven by integration tests across all SDKs.
- **Auth**: Signup hashes passwords with Argon2id (64MB memory, 3 iterations). Signin returns a JWT. Every protected route validates the token before touching data. (A parameter sketch follows this list.)
- **Permissions**: Every request evaluates row-level rules. Users see only their own data. Admins bypass. Rules are stored as data (triples), not config files.
- **Query engine**: DarshanQL — a purpose-built query language that parses, generates an execution plan, and runs against Postgres. Not SQL, not GraphQL, not a toy.
- **WebSocket subscriptions**: Clients subscribe to queries. When a mutation changes matching data, the server broadcasts diffs to connected clients.
- **Admin dashboard**: React + Vite + Tailwind. Shows live data from the API. Manages collections, users, permissions.
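
The Argon2id parameters above map directly onto the standard Python binding. A sketch with `argon2-cffi`, purely to illustrate the stated cost settings (the server itself does this in Rust):

```python
from argon2 import PasswordHasher
from argon2.exceptions import VerifyMismatchError

# 64 MB memory cost (argon2-cffi takes KiB) and 3 iterations,
# matching the parameters quoted above.
ph = PasswordHasher(memory_cost=64 * 1024, time_cost=3)

hashed = ph.hash("changeme123")        # "$argon2id$v=19$m=65536,t=3,..."
try:
    ph.verify(hashed, "changeme123")   # raises on mismatch
except VerifyMismatchError:
    print("wrong password")
```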

### What's not done yet

- Server functions still default to the subprocess runtime (an embedded V8 isolate exists behind `--features v8`, but is not yet the default)
- Published npm/crates.io packages
- Hosted documentation site
- Performance benchmarks against Firebase/Supabase/Convex
- Horizontal scaling / multi-node

---

## DarshQL

DarshQL is the query language purpose-built for DarshJDB. It borrows the clarity of SQL, the traversal power of graph query languages, and the expressiveness of document query builders — then unifies them under one syntax that works across every data model.

### Define

```sql
-- Namespace and database
USE NS production DB myapp;

-- Define a schemafull table
DEFINE TABLE user SCHEMAFULL;
DEFINE FIELD name ON user TYPE string ASSERT $value != NONE;
DEFINE FIELD email ON user TYPE string ASSERT string::is::email($value);
DEFINE FIELD created ON user TYPE datetime DEFAULT time::now();
DEFINE INDEX idx_email ON user FIELDS email UNIQUE;

-- Define a schemaless table (anything goes)
DEFINE TABLE event SCHEMALESS;

-- Define a graph edge table
DEFINE TABLE follows SCHEMAFULL TYPE RELATION IN user OUT user;
DEFINE FIELD since ON follows TYPE datetime DEFAULT time::now();

```

### Create

```sql

-- Create with auto-generated ID
CREATE user SET name = 'Darsh', email = 'darsh@navsari.dev';

-- Create with specific ID
CREATE user:darsh SET name = 'Darsh Joshi', email = 'darsh@navsari.dev';

-- Insert multiple records
INSERT INTO user [
  { name: 'Alice', email: 'alice@example.com' },
  { name: 'Bob', email: 'bob@example.com' }
];

```

### Query

```sql

-- Simple select with conditions
SELECT * FROM user WHERE email CONTAINS 'example.com' ORDER BY created DESC LIMIT 10;

-- Nested field access
SELECT name, settings.theme, settings.notifications.email FROM user;

-- Aggregations
SELECT count() AS total, math::mean(age) AS avg_age FROM user GROUP BY country;

```

### Graph Relations

```sql

-- Create a relationship between two records
RELATE user:darsh -> follows -> user:alice SET since = time::now();
RELATE user:darsh -> wrote -> article:rust_is_great SET published = true;

-- Traverse the graph — who does Darsh follow?
SELECT ->follows->user.name FROM user:darsh;

-- Reverse traversal — who follows Alice?
SELECT <-follows<-user.name FROM user:alice;

-- Multi-hop — friends of friends
SELECT ->follows->user->follows->user.name FROM user:darsh;

-- Graph with conditions
SELECT ->follows->user WHERE age > 25 FROM user:darsh;

```

### LIVE SELECT — Real-Time Subscriptions

```sql

-- Subscribe to all changes on a table
LIVE SELECT * FROM user;

-- Subscribe with filters — only get notified about relevant changes
LIVE SELECT * FROM user WHERE country = 'IN';

-- Subscribe to graph changes
LIVE SELECT * FROM follows WHERE in = user:darsh;

-- Subscribe with diff mode — only receive the changed fields
LIVE SELECT DIFF FROM user;

```

When a mutation matches a LIVE SELECT, DarshJDB pushes the change to all subscribed clients over WebSocket. No polling. No webhooks. No external message broker.

### Embedded Functions

```sql

-- String functions
SELECT string::uppercase(name), string::slug(title) FROM article;

-- Math functions
SELECT math::mean(scores), math::median(scores) FROM student;

-- Time functions
SELECT * FROM event WHERE created > time::now() - 7d;

-- Crypto functions
SELECT crypto::argon2::generate(password) AS hash FROM $input;

-- Vector search — semantic similarity
SELECT * FROM document WHERE embedding <|4|> $query_vector;

-- Geo functions
SELECT * FROM restaurant WHERE geo::distance(location, $user_location) < 5km;

-- HTTP functions (server-side)
SELECT http::get('https://api.example.com/data') AS response;

-- Custom functions
DEFINE FUNCTION fn::greet($name: string) {
  RETURN string::concat('Namaste, ', $name, '!');
};
SELECT fn::greet(name) FROM user;

```

### Transactions

```sql

BEGIN TRANSACTION;
  LET $from = (UPDATE wallet:darsh SET balance -= 100 RETURN AFTER);
  LET $to = (UPDATE wallet:alice SET balance += 100 RETURN AFTER);
  IF $from.balance < 0 {
    CANCEL TRANSACTION;
  };
COMMIT TRANSACTION;
```

```mermaid
graph LR
    subgraph pipeline["DarshQL Query Pipeline"]
        direction LR
        Q["DarshQL\n<i>query string</i>"] --> P["Parser\n<i>lexer + grammar</i>"]
        P --> AST["AST\n<i>typed syntax tree</i>"]
        AST --> OPT["Optimizer\n<i>index selection\njoin reordering</i>"]
        OPT --> EXEC["Executor\n<i>parallel evaluation\npermission injection</i>"]
        EXEC --> STORE["Storage\n<i>Postgres / Memory\n/ RocksDB</i>"]
    end

    style Q fill:#cc9933,color:#000
    style P fill:#1a1a2e,color:#fff
    style AST fill:#1a1a2e,color:#fff
    style OPT fill:#0f3460,color:#fff
    style EXEC fill:#14532d,color:#fff
    style STORE fill:#336791,color:#fff
```

## Multi-Model Storage

DarshJDB is not five databases duct-taped together. It is one engine with five access patterns over the same storage layer.

| Model | How DarshJDB implements it | Example |
|---|---|---|
| Document | Schemaless tables, nested JSON, flexible fields | `CREATE article SET title = 'Hello', tags = ['rust', 'db']` |
| Graph | RELATE statements, `->` / `<-` traversals, typed edges | `RELATE user:darsh -> authored -> article:hello` |
| Relational | SCHEMAFULL tables, indexes, joins, foreign keys | `DEFINE TABLE invoice SCHEMAFULL; DEFINE FIELD customer ON invoice TYPE record<customer>` |
| Key-Value | Direct record access by ID, O(1) lookups | `SELECT * FROM config:smtp` / `UPDATE config:smtp SET host = 'mail.example.com'` |
| Vector | pgvector HNSW indexes, cosine/euclidean/dot product | `SELECT * FROM document WHERE embedding <\|4\|> $query ORDER BY dist ASC` |
```mermaid
graph TB
    subgraph models["Five Models, One Engine"]
        DOC["Document Store\n<i>schemaless JSON\nnested objects, arrays</i>"]
        GRAPH["Graph Database\n<i>RELATE, traverse\nmulti-hop queries</i>"]
        REL["Relational Tables\n<i>SCHEMAFULL, indexes\njoins, constraints</i>"]
        KV["Key-Value Cache\n<i>O(1) record access\nconfig, sessions</i>"]
        VEC["Vector Search\n<i>HNSW embeddings\nsemantic similarity</i>"]
    end

    subgraph storage["Unified Storage Layer"]
        TS["Triple Store\n<i>Entity-Attribute-Value</i>"]
        PG[("PostgreSQL 16+\n<i>pgvector + tsvector</i>")]
        TS --> PG
    end

    DOC --> TS
    GRAPH --> TS
    REL --> TS
    KV --> TS
    VEC --> TS

    style models fill:#1a1a2e,stroke:#F59E0B,color:#fff
    style storage fill:#0f3460,stroke:#F59E0B,color:#fff
    style PG fill:#336791,stroke:#fff,color:#fff
```
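
To make the EAV idea concrete, here is a hypothetical sketch of how a document decomposes into triples. The real column layout and the UUID-to-i64 entity pool are internal details; this only illustrates the shape of the decomposition:

```python
def to_triples(entity_id, record):
    """Flatten a document into (entity, attribute, value) triples.

    Arrays get positional attributes so ordering survives the round-trip.
    """
    triples = []
    for attr, value in record.items():
        if isinstance(value, list):
            for i, item in enumerate(value):
                triples.append((entity_id, f"{attr}[{i}]", item))
        else:
            triples.append((entity_id, attr, value))
    return triples

print(to_triples("article:hello", {"title": "Hello", "tags": ["rust", "db"]}))
# [('article:hello', 'title', 'Hello'),
#  ('article:hello', 'tags[0]', 'rust'),
#  ('article:hello', 'tags[1]', 'db')]
```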

## Real-Time Architecture

Every mutation in DarshJDB flows through a change feed. LIVE SELECT queries register interest in specific data patterns. When a mutation matches, the diff is pushed to all subscribers — filtered through row-level permissions so each client only sees what they are allowed to see.

```mermaid
sequenceDiagram
    participant W as Writer Client
    participant S as DarshJDB Server
    participant CF as Change Feed
    participant LQM as Live Query Manager
    participant P as Permission Engine
    participant R1 as Reader A (WebSocket)
    participant R2 as Reader B (WebSocket)

    W->>S: CREATE user SET name = 'Darsh'
    S->>S: Execute mutation + write to storage
    S->>CF: Emit ChangeEvent{table: user, action: CREATE, data: {...}}
    CF->>LQM: Match against registered LIVE SELECTs
    LQM->>P: Filter: which subscribers can see this record?
    P-->>LQM: Reader A: full access, Reader B: denied
    LQM-->>R1: WebSocket push: {action: CREATE, result: {name: 'Darsh'}}
    Note over R2: Reader B receives nothing — permission denied
```

## How DarshJDB Compares

No names. Just capabilities.

| Capability | DarshJDB | Traditional BaaS | Cloud Databases |
|---|---|---|---|
| Multi-model (doc + graph + relational + KV + vector) | Yes | Partial | Rare |
| Graph relations with traversal | Yes | No | Some |
| LIVE SELECT real-time queries | Yes | Polling or webhooks | Some |
| Single binary deployment | Yes | 3-7 services | Cloud-only |
| Self-hosted, your metal | Yes | Some | Cloud-first |
| Row-level permissions | Yes | Yes | Some |
| Embedded functions (string, math, geo, crypto, http) | Yes | Rare | Rare |
| Vector search (HNSW) | Yes | No | Some |
| Triple store / knowledge graph | Yes | No | No |
| Schema modes (strict, flexible, mixed) | Yes | Pick one | Pick one |
| WebSocket + SSE subscriptions | Yes | WebSocket only | Varies |
| Merkle audit trail | Yes | No | Enterprise only |
| Runs on a $5 VPS | Yes | Depends | No |

## Deployment

### End-to-end example

```bash
# 1. Create a session
curl -X POST http://localhost:7700/api/agent/sessions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"claude-opus","model":"claude-opus-4","system_prompt":"You are a travel agent."}'
# → {"session":{"id":"3f1e...","agent_id":"claude-opus",...}}

# 2. Append a user turn
curl -X POST http://localhost:7700/api/agent/sessions/3f1e.../messages \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"role":"user","content":"Book me a flight to Tokyo next Friday."}'

# 3. Build a prompt for the next LLM call — reverse-chron + semantic recall
curl "http://localhost:7700/api/agent/sessions/3f1e.../context?max_tokens=4096&current_query=tokyo%20flight&recall_top_k=5&include_facts=true" \
  -H "Authorization: Bearer $TOKEN"
# → {"messages":[...], "total_tokens": 842, "budget_remaining": 3254}
import httpx

ddb = httpx.Client(base_url="http://localhost:7700",
                   headers={"Authorization": f"Bearer {token}"})

# Create session
session = ddb.post("/api/agent/sessions",
                   json={"agent_id": "claude-opus",
                         "model": "claude-opus-4"}).json()["session"]
sid = session["id"]

# Log turns
for role, content in turns:
    ddb.post(f"/api/agent/sessions/{sid}/messages",
             json={"role": role, "content": content})

# Build the next prompt — unlimited history compressed into a 4k budget
ctx = ddb.get(f"/api/agent/sessions/{sid}/context",
              params={"max_tokens": 4096,
                      "current_query": user_query,
                      "recall_top_k": 5,
                      "include_facts": True}).json()

messages = ctx["messages"]  # Pass straight to Anthropic / OpenAI / Ollama
```

No external services. No per-token pricing. One Rust binary.


## Redis Drop-in (RESP3, port 7701)

The `ddb-cache-server` binary speaks native RESP3 on port 7701. Any Redis client works. `ddb-cache` writes through an L1 DashMap (lz4-compressed, sub-microsecond) to an L2 Postgres table (zstd-compressed for values >=1KB) so the data survives restarts.

```bash
# Point any redis-cli at DarshJDB
redis-cli -p 7701 PING
# → PONG

redis-cli -p 7701 HELLO 3
redis-cli -p 7701 SET user:42 '{"name":"Alice"}' EX 3600
redis-cli -p 7701 GET user:42
# → "{\"name\":\"Alice\"}"

redis-cli -p 7701 HSET session:abc user_id 42 csrf_token tok_9f
redis-cli -p 7701 HGETALL session:abc

redis-cli -p 7701 ZADD leaderboard 1500 alice 2200 bob
redis-cli -p 7701 ZRANGEBYSCORE leaderboard 0 3000 WITHSCORES

redis-cli -p 7701 XADD events '*' type signup user_id 42
redis-cli -p 7701 XRANGE events - +
```

Supported commands: AUTH, HELLO 3, PING, QUIT, INFO, GET, SET, DEL, EXISTS, EXPIRE, TTL, KEYS, HSET, HGET, HGETALL, HDEL, HLEN, LPUSH, RPUSH, LPOP, RPOP, LRANGE, ZADD, ZRANGE, ZRANGEBYSCORE, ZRANK, ZREM, ZSCORE, SUBSCRIBE, UNSUBSCRIBE, PUBLISH, XADD, XREAD, XRANGE, BFADD, BFEXISTS, PFADD, PFCOUNT.

| Operation | Redis | DarshJDB (ddb-cache) |
|---|---|---|
| GET / SET | Yes | Yes |
| Hashes | Yes | Yes |
| Lists | Yes | Yes |
| Sorted sets | Yes | Yes |
| Streams (XADD) | Yes | Yes |
| Bloom filters | RedisBloom module | Native (BFADD / BFEXISTS) |
| HyperLogLog | Yes | Native (PFADD / PFCOUNT) |
| Pub/Sub | Yes | Yes (RESP3 + WebSocket + SSE) |
| TTL / expiry | Yes | Yes (background reaper) |
| Persistence | RDB + AOF | Postgres write-through |
| Process | Separate daemon | Same binary as the DB |

An HTTP mirror at /api/cache/* exposes the same commands for clients that do not want to open a second socket. One fewer service to run. One fewer service to monitor. One fewer service to secure.
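
Because the wire protocol is genuine RESP3, the stock Python client (redis-py 5+) works against port 7701 unchanged. For example:

```python
import redis

# protocol=3 negotiates RESP3 via HELLO 3, the server's native mode.
r = redis.Redis(host="localhost", port=7701, protocol=3, decode_responses=True)

r.set("user:42", '{"name":"Alice"}', ex=3600)
print(r.get("user:42"))                       # {"name":"Alice"}

r.hset("session:abc", mapping={"user_id": "42", "csrf_token": "tok_9f"})
print(r.hgetall("session:abc"))

r.zadd("leaderboard", {"alice": 1500, "bob": 2200})
print(r.zrangebyscore("leaderboard", 0, 3000, withscores=True))
```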


## Quick Start

Three ways to get a local server. Pick one.

### 1. Zero-dep Rust (recommended for hacking)

Build with the `embedded-db` feature and DarshJDB will download, launch, and shut down a portable Postgres 16 server for you on a free port. No DATABASE_URL, no docker, no `brew install postgresql`.

```bash
git clone https://github.com/darshjme/darshjdb.git
cd darshjdb
cargo run --bin ddb-server --features embedded-db
```

First run downloads ~20 MB of Postgres binaries (one-time); subsequent runs are instant. Data lives in ~/.darshjdb/data/pg and persists across restarts. If DATABASE_URL is set, the embedded server is skipped and the configured URL wins.

### 2. One-line installer (latest release binary)

```bash
curl -fsSL https://raw.githubusercontent.com/darshjme/darshjdb/main/scripts/install.sh | bash
~/.darshjdb/bin/ddb start
```

The installer pulls the correct binary from the GH Actions release matrix (x86_64-unknown-linux-musl, aarch64-apple-darwin, x86_64-pc-windows-msvc) and drops it at `$HOME/.darshjdb/bin/ddb`. Override with `DARSH_INSTALL_DIR=...` if you want it elsewhere.

### 3. Docker compose (BYO Postgres image)

```bash
git clone https://github.com/darshjme/darshjdb.git
cd darshjdb
docker compose up -d
curl http://localhost:7700/health
```

Once the server is up:

```bash
# REST data plane
curl -X POST http://localhost:7700/api/auth/signup \
  -H "Content-Type: application/json" \
  -d '{"email":"dev@example.com","password":"changeme123"}'

curl -X POST http://localhost:7700/api/data/tasks \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"title":"Ship v1","status":"active","priority":1}'

# Redis-compatible cache (RESP3 on 7701)
redis-cli -p 7701 PING                 # → PONG

# Liveness / readiness / metrics
curl http://localhost:7700/health
curl http://localhost:7700/ready
curl http://localhost:7700/metrics     # Prometheus, IP-allowlisted
```

## Architecture

```mermaid
graph TB
    subgraph Clients["Client SDKs"]
        TS["TypeScript"]
        React["React"]
        Next["Next.js"]
        Angular["Angular"]
        Python["Python"]
        PHP["PHP"]
    end

    subgraph Server["DarshJDB Server -- Single Rust Binary"]
        API["REST + WebSocket\nAxum + Tokio"]
        AUTH["Auth Engine\nArgon2id, JWT RS256\nOAuth, MFA, Magic Links"]
        PERM["Permission Engine\nRow-Level Security"]
        QE["Query Engine\nDarshJQL"]
        TS2["Triple Store\nEAV"]
        RT["Real-Time\nLive Queries + Pub/Sub"]
        STORE["Storage\nS3 / R2 / Local"]
    end

    PG[("PostgreSQL 16+\npgvector + tsvector")]

    Clients --> API
    API --> AUTH
    AUTH --> PERM
    PERM --> QE
    QE --> TS2
    TS2 --> PG
    API --> RT
    API --> STORE

    style Server fill:#0d1117,stroke:#F59E0B,color:#c9d1d9
    style PG fill:#336791,stroke:#fff,color:#fff
    style Clients fill:#161b22,stroke:#F59E0B,color:#c9d1d9
```

## Request Lifecycle

```mermaid
sequenceDiagram
    participant C as Client
    participant R as Router (Axum)
    participant A as Auth
    participant P as Permissions
    participant Q as Query Engine
    participant T as Triple Store
    participant PG as PostgreSQL

    C->>R: HTTP / WebSocket
    R->>A: Validate JWT
    A->>P: Load permission rules
    P->>Q: Inject RLS clauses
    Q->>T: DarshJQL to SQL
    T->>PG: Execute
    PG-->>T: Rows
    T-->>Q: Triples to entities
    Q-->>P: Filter restricted fields
    P-->>C: JSON response
```

## Module Map

29 server modules under `packages/server/src/`:

| Module | Purpose |
|---|---|
| `api` | HTTP route handlers, request/response types |
| `api_keys` | API key generation, validation, scoping |
| `activity` | Activity log, user action tracking |
| `aggregation` | COUNT, SUM, AVG, MIN, MAX, GROUP BY |
| `audit` | Merkle tree hash chain, tamper detection |
| `auth` | Signup, signin, OAuth, MFA/TOTP, magic links, token refresh |
| `automations` | Scheduled tasks, cron triggers |
| `cache` | DashMap query cache, ChangeEvent invalidation |
| `collaboration` | Presence, cursor sharing |
| `connectors` | Webhook + log connectors, entity-level sync |
| `embeddings` | pgvector HNSW, OpenAI/Ollama/NIM providers |
| `events` | Change feed, mutation event bus |
| `fields` | Field definitions, type validation |
| `formulas` | Computed fields, formula evaluation |
| `functions` | Server functions: Node.js/Deno subprocess default, or embedded V8 via `--features v8` (VYASA) |
| `graph` | RELATE, traversals, multi-hop queries |
| `history` | Record version history, point-in-time queries |
| `import_export` | JSON/CSV import, full database export |
| `plugins` | Plugin registry, lifecycle hooks |
| `query` | DarshJQL parser, AST, optimizer, executor |
| `relations` | Foreign key relations, record links |
| `rules` | Forward-chaining rules, triggered triples |
| `schema` | SCHEMALESS / SCHEMAFULL / SCHEMAMIXED modes |
| `storage` | File uploads: local FS, S3, R2, MinIO |
| `sync` | Real-time diff engine, WebSocket broadcast |
| `tables` | Table definitions, namespace management |
| `triple_store` | EAV storage layer, entity pool (UUID to i64) |
| `views` | Materialized views, virtual tables |
| `webhooks` | Outbound webhooks on mutation events |

## DarshJQL

DarshJQL is the query language built for DarshJDB. It borrows from SQL, graph query languages, and document query builders.

### Basic Query

```sql
-- Define a table
DEFINE TABLE user SCHEMAFULL;
DEFINE FIELD name ON user TYPE string ASSERT $value != NONE;
DEFINE FIELD email ON user TYPE string ASSERT string::is::email($value);
DEFINE INDEX idx_email ON user FIELDS email UNIQUE;

-- Create records
CREATE user SET name = 'Alice', email = 'alice@example.com';
CREATE user:bob SET name = 'Bob', email = 'bob@example.com';

-- Query with conditions
SELECT * FROM user WHERE email CONTAINS 'example.com' ORDER BY name ASC LIMIT 10;
-- Returns: [{id: "user:...", name: "Alice", email: "alice@example.com"}, ...]
```

### Graph Traversal

```sql
-- Define a relation
DEFINE TABLE follows SCHEMAFULL TYPE RELATION IN user OUT user;

-- Create edges
RELATE user:alice -> follows -> user:bob SET since = time::now();
RELATE user:alice -> follows -> user:carol;

-- Who does Alice follow?
SELECT ->follows->user.name FROM user:alice;
-- Returns: ["Bob", "Carol"]

-- Who follows Bob? (reverse traversal)
SELECT <-follows<-user.name FROM user:bob;
-- Returns: ["Alice"]

-- Friends of friends (multi-hop)
SELECT ->follows->user->follows->user.name FROM user:alice;
```

### Embedded Functions

```sql
-- String and math functions
SELECT string::uppercase(name), math::mean(scores) FROM student;

-- Time-based queries
SELECT * FROM event WHERE created > time::now() - 7d;

-- Vector search (semantic similarity)
SELECT * FROM document WHERE embedding <|4|> $query_vector;

-- Geo queries
SELECT * FROM restaurant WHERE geo::distance(location, $user_location) < 5km;
```

### Real-Time Subscription

```sql
-- Subscribe to all changes on a table
LIVE SELECT * FROM user;

-- Subscribe with filters
LIVE SELECT * FROM user WHERE country = 'IN';

-- Diff mode -- only changed fields
LIVE SELECT DIFF FROM user;
```

When a mutation matches a LIVE SELECT, DarshJDB pushes the change over WebSocket. No polling, no external message broker. DarshanQL also supports a whitelisted SQL passthrough at POST /api/sql (admin-only, audit-logged) for the "just give me SQL" case.
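
On the client side, a subscription is a WebSocket plus a registration message. A hedged Python sketch with the `websockets` library; the endpoint path and both message shapes here are assumptions, not the documented wire format:

```python
import asyncio
import json

import websockets

async def watch_users():
    # Endpoint and payload shapes are illustrative assumptions.
    async with websockets.connect("ws://localhost:7700/ws") as ws:
        await ws.send(json.dumps({
            "type": "subscribe",
            "query": "LIVE SELECT * FROM user WHERE country = 'IN'",
        }))
        async for raw in ws:
            event = json.loads(raw)
            # The server pushes {added, removed, updated} diff buckets.
            for row in event.get("added", []):
                print("new user:", row)

asyncio.run(watch_users())
```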


## Hybrid Search (pgvector + FTS + RRF)

Three search endpoints, backed by pgvector HNSW indexes and Postgres ts_rank FTS:

```bash
# Pure semantic (cosine over HNSW)
curl -X POST http://localhost:7700/api/search/semantic \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "vector": [0.012, -0.084, ...],
    "entity_type": "article",
    "limit": 10
  }'

# Full-text
curl "http://localhost:7700/api/search/text?q=rust+database&entity_type=article&limit=10" \
  -H "Authorization: Bearer $TOKEN"

# Hybrid — Reciprocal Rank Fusion (k=60, Cormack et al.)
curl -X POST http://localhost:7700/api/search/hybrid \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query_text": "how does the triple store work",
    "vector": [0.012, -0.084, ...],
    "entity_type": "doc",
    "attribute": "body",
    "limit": 10,
    "weights": { "semantic": 1.0, "text": 1.0 }
  }'
```

The hybrid handler pulls a 4x candidate window from each side, fuses with score(d) = Σ wᵢ / (k + rankᵢ(d)), then returns the top limit rows. Embedding generation is caller-side on purpose — bring your OpenAI / Ollama / Anthropic / NIM provider of choice, or let the built-in embedding worker handle it.
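
The fusion step itself is small enough to sketch inline. Reciprocal Rank Fusion exactly as stated: two ranked candidate lists, k=60, a weighted sum of reciprocal ranks:

```python
def rrf_fuse(semantic_ids, text_ids, weights=(1.0, 1.0), k=60, limit=10):
    """score(d) = sum_i w_i / (k + rank_i(d)), with 1-based ranks."""
    scores = {}
    for weight, ranking in zip(weights, (semantic_ids, text_ids)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:limit]

# Documents near the top of either list float to the top of the fused list.
print(rrf_fuse(["a", "b", "c"], ["c", "d", "a"]))  # ['a', 'c', 'b', 'd']
```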


## Model Context Protocol

DarshJDB is a native Model Context Protocol server. Point Claude Desktop, Cursor, or any MCP client at POST /api/mcp and it exposes 10 first-class tools:

| Tool | Purpose |
|---|---|
| `ddb_query` | Execute a DarshJQL query |
| `ddb_mutate` | Apply a batch of create/update/delete operations |
| `ddb_semantic_search` | Vector similarity over embeddings |
| `ddb_memory_store` | Persist a chat-style memory turn |
| `ddb_memory_recall` | Retrieve chat memory for a session |
| `ddb_graph_traverse` | BFS/DFS over graph edges |
| `ddb_timeseries` | Time-bucketed aggregation of events |
| `ddb_cache_get` | Read a value from the hot KV cache |
| `ddb_cache_set` | Write a value into the hot KV cache |
| `ddb_kv_list` | List cache keys matching a pattern |

Drop this into ~/Library/Application Support/Claude/claude_desktop_config.json:

```json
{
  "mcpServers": {
    "darshjdb": {
      "command": "ddb-server",
      "args": ["--embedded-db", "--mcp-stdio"],
      "env": {
        "DARSHJDB_BEARER": "<your-token>"
      }
    }
  }
}
```

Or point an HTTP MCP client directly at the running server:

```bash
curl -X POST http://localhost:7700/api/mcp \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'

An SSE streaming endpoint at /api/agent/stream pushes intermediate tokens for clients that want to render model output progressively instead of waiting on a single response.
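
Consuming that stream from Python looks roughly like the sketch below. The request body fields are illustrative assumptions; only the endpoint path comes from the docs above:

```python
import httpx

with httpx.stream(
    "POST",
    "http://localhost:7700/api/agent/stream",
    headers={"Authorization": "Bearer <your-token>",
             "Accept": "text/event-stream"},
    json={"session_id": "3f1e...", "message": "Plan my Tokyo trip."},
    timeout=None,
) as response:
    for line in response.iter_lines():
        if line.startswith("data: "):
            print(line[len("data: "):])  # one chunk of model output per event
```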


## Features

### Data

| Feature | Description |
|---|---|
| Triple store (EAV) | Every record is a set of (entity, attribute, value) triples over PostgreSQL |
| Typed fields | String, number, boolean, datetime, record, array, object, geometry |
| Schema modes | SCHEMALESS (dev), SCHEMAFULL (prod), SCHEMAMIXED (migration), strict mode (`schema_definitions`, 422 on violation) |
| Views | Materialized views, virtual tables |
| Formulas | Computed fields, formula evaluation |
| Relations | Record links, foreign keys, graph edges |
| Aggregation | COUNT, SUM, AVG, MIN, MAX, GROUP BY |
| Full-text search | PostgreSQL tsvector + GIN indexes, ts_rank ordering |
| Vector search | pgvector HNSW + IVFFlat, cosine/euclidean/dot product |
| Hybrid search | Reciprocal Rank Fusion (k=60) over semantic + FTS |
| Entity Pool | UUID-to-i64 dictionary encoding for performance |
| SQL passthrough | Whitelisted DML at `POST /api/sql`, admin-only, audit-logged |

AI & Agents

Feature Description
Agent memory 3-tier hierarchy (working / episodic / semantic) + agent_facts injection
Unlimited context LLM-backed episodic → semantic summariser at 50/100/200 thresholds
Context builder tiktoken budget, reverse-chron window, semantic top-K recall
Embedding worker OpenAI, Ollama, Anthropic, None (pluggable), 5s batch tick
MCP server JSON-RPC 2.0 at /api/mcp, 10 tools, Claude Desktop / Cursor compatible
Streaming agent SSE endpoint at /api/agent/stream
Sessions API POST /api/agent/sessions, /messages, /context, /search, /timeline, /stats, /facts

Caching & Streams (ddb-cache)

Feature Description
RESP3 server Port 7701, drop-in Redis client compatibility
L1 cache In-process DashMap, lz4-compressed, sub-microsecond
L2 cache Postgres-backed, zstd-compressed (values >=1KB), write-through
Data structures Strings, hashes, lists, sorted sets, streams (XADD/XREAD/XRANGE)
Probabilistic Bloom filter (BFADD/BFEXISTS), HyperLogLog (PFADD/PFCOUNT)
Pub/Sub RESP3 SUBSCRIBE/PUBLISH, plus WS + SSE mirrors
HTTP mirror /api/cache/* for clients that don't want a second socket

Time-series & Graph

Feature Description
Hypertables TimescaleDB time_series table at /api/ts/{entity_type}
TS endpoints Insert, range, aggregate (bucket/window), latest
Graph edges edges table with BFS traversal
Graph endpoints /api/graph/relate, /traverse, /neighbors, /outgoing, /incoming

### Auth

| Feature | Description |
|---|---|
| Password auth | Argon2id (64MB memory, 3 iterations, OWASP recommended) |
| OAuth | Google, GitHub, Apple, Discord, and more |
| MFA / TOTP | Time-based one-time passwords |
| Magic links | Email-based passwordless login (SMTP + SendGrid + dev-log backends) |
| JWT RS256 | Asymmetric token signing, 15min access + 7day refresh |
| Token refresh | Automatic rotation with SHA256-hashed refresh tokens |
| Session hardening | 24h absolute timeout, max 5 concurrent sessions per user |
| Login rate limit | Exponential backoff after 5 failures, lock after 10 for 3600s |
| Admin RBAC | Cryptographic JWT validation of admin role — no stubs |

### Real-Time

| Feature | Description |
|---|---|
| WebSocket subscriptions | LIVE SELECT pushes `{added, removed, updated}` diff buckets |
| WS diff engine | Incremental diffs only — full replays are opt-in |
| Presence | Online status, cursor positions |
| Pub/Sub | WebSocket channels + SSE, pattern matching |
| SSE subscriptions | entity_type + where-clause re-evaluation on every mutation |
| Permission-filtered | Each client only receives data they are authorized to see |

### Multimodal Storage

| Feature | Description |
|---|---|
| File storage | S3, Cloudflare R2, MinIO, local filesystem |
| Path traversal safe | Sanitized paths on upload (Phase 0 fix) |
| Chunked uploads | `/api/storage/upload/{init, chunk, status}` for resumable transfers |
| Image transforms | Resize / crop / format / quality pipeline with lz4 byte cache |

### Infrastructure

| Feature | Description |
|---|---|
| Webhooks | Outbound HTTP on mutation events |
| API keys | Scoped keys for service-to-service auth |
| Plugins | Plugin registry with lifecycle hooks |
| Automations | Cron-triggered scheduled tasks |
| Server functions | Node.js / Deno subprocess execution — or embedded V8 isolate via `cargo run --features v8` (VYASA, sub-ms cold start) |
| Connectors | Webhook + log connectors, entity-level sync |
| TTL / Expiry | Per-entity expiry with background reaper |
| Batch API | Multiple operations in a single request |

### DevEx

| Feature | Description |
|---|---|
| Embedded Postgres | `cargo run --features embedded-db` — zero external deps |
| One-line installer | `scripts/install.sh` fetches the right release binary |
| Release matrix | linux-musl, darwin-aarch64, windows-msvc via GH Actions |
| CLI (ddb) | start, sql console, import, export, status |
| Admin dashboard | React + Vite + Tailwind, embedded via `include_dir!` |
| 7 SDKs | TypeScript, React, Next.js, Angular, Python, PHP, cURL |
| Import / Export | JSON and CSV, full database backup and restore |
| Forward-chaining rules | Trigger implied triples on mutation |

### Observability

| Feature | Description |
|---|---|
| Prometheus metrics | `/metrics` (IP-allowlisted) |
| Health probes | `/health`, `/ready`, `/live` — plus legacy rich `/health/full` |
| Structured logging | JSON logs with request_id propagation |

### Integrity

| Feature | Description |
|---|---|
| Merkle audit trail | SHA-512 hash chain, tamper detection, verification endpoints |
| Blockchain anchor | SHA3-Keccak aggregate Merkle roots, optional IPFS / Ethereum submission (`anchor-ipfs`, `anchor-eth` feature flags) |
| Row-level security | WHERE clause injection per user on every query |
| Field-level filtering | Restricted fields stripped from responses |
| Rate limiting | Token bucket per IP and per user |
| TLS | Native rustls via `DDB_TLS_CERT` / `DDB_TLS_KEY` |
| CORS | Environment-aware origin configuration |
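
The audit trail's tamper evidence reduces to a recomputable hash chain. A conceptual Python sketch with illustrative field names (DarshJDB's chain uses SHA-512, per the table above):

```python
import hashlib
import json

def chain_hash(prev_hash: str, entry: dict) -> str:
    payload = prev_hash + json.dumps(entry, sort_keys=True)
    return hashlib.sha512(payload.encode()).hexdigest()

def verify(log):
    """Recompute the chain; editing any entry breaks every later hash."""
    prev = "0" * 128  # genesis
    for entry, stored in log:
        if chain_hash(prev, entry) != stored:
            return False
        prev = stored
    return True

e1 = {"action": "CREATE", "table": "user"}
h1 = chain_hash("0" * 128, e1)
e2 = {"action": "DELETE", "table": "user"}
h2 = chain_hash(h1, e2)
assert verify([(e1, h1), (e2, h2)])
```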

## SDKs

### TypeScript

```typescript
import { DDB } from '@darshjdb/client';

const db = new DDB({ serverUrl: 'http://localhost:7700' });
await db.signin({ email: 'dev@example.com', password: 'changeme123' });

const task = await db.create('task', {
  title: 'Ship v1',
  status: 'active'
});

const tasks = await db.select('task', {
  where: { status: 'active' },
  orderBy: { created: 'desc' },
  limit: 10
});
```

### React

```tsx
import { useQuery, useMutation, useAuth } from '@darshjdb/react';

function TaskList() {
  const { data, loading } = useQuery({
    tasks: { $where: { status: 'active' } }
  });

  const [createTask] = useMutation('task');

  if (loading) return <p>Loading...</p>;
  return data.tasks.map(t => <div key={t.id}>{t.title}</div>);
}
```

### Next.js

```tsx
import { createServerClient } from '@darshjdb/nextjs';

// Server Component
export default async function Page() {
  const db = createServerClient();
  const tasks = await db.select('task', { where: { status: 'active' } });
  return <ul>{tasks.map(t => <li key={t.id}>{t.title}</li>)}</ul>;
}
```

### Angular

```typescript
import { injectDarshan } from '@darshjdb/angular';

@Component({ /* ... */ })
export class TaskComponent {
  private readonly db = injectDarshan();
  tasks = this.db.query('task', { where: { status: 'active' } });
}
```

### Python

```python
from darshjdb import DarshJDB, AsyncDarshJDB

db = DarshJDB("http://localhost:7700")
db.signin(email="dev@example.com", password="changeme123")

task = db.create("task", {"title": "Ship v1", "status": "active"})
tasks = db.select("task", where={"status": "active"}, limit=10)

# Async (FastAPI)
adb = AsyncDarshJDB("http://localhost:7700")

@app.get("/tasks")
async def get_tasks():
    return await adb.select("task", limit=50)
```

### PHP

```php
use Darshjdb\Client;

$db = new Client('http://localhost:7700');
$db->signin(['email' => 'dev@example.com', 'password' => 'changeme123']);

$task = $db->create('task', ['title' => 'Ship v1', 'status' => 'active']);
$tasks = $db->select('task', ['where' => ['status' => 'active'], 'limit' => 10]);
```

## Comparison

An honest comparison. "Yes" means the feature is implemented in-tree; "No" means it is not available.

| | DarshJDB | Firebase | Supabase | Convex | PocketBase |
|---|---|---|---|---|---|
| Self-hosted | Yes | No | Yes | No | Yes |
| Real-time | WS + SSE + RESP3 | WebSocket | WebSocket | WebSocket | SSE |
| Auth (built-in) | Yes | Yes | Yes | Yes | Yes |
| File storage | S3/R2/Local + chunked + transforms | Yes | Yes | Yes | Local |
| Custom query language | DarshJQL | No | SQL | JS/TS | No |
| Graph traversal | Yes | No | No | No | No |
| Vector search | pgvector HNSW + RRF hybrid | No | pgvector | No | No |
| Agent memory (unlimited context) | Yes (3 tiers + summariser) | No | No | No | No |
| Redis-compatible cache | Yes (RESP3 port 7701) | No | No | No | No |
| Time-series hypertables | Yes (Timescale) | No | Yes (Timescale ext) | No | No |
| MCP server | Yes (10 tools) | No | No | No | No |
| Blockchain anchoring | Yes (Merkle + IPFS/ETH) | No | No | No | No |
| Triple store / EAV | Yes | No | No | No | No |
| Schema modes (strict + flexible) | Yes | Flexible only | Strict only | Strict only | Flexible only |
| Single binary | Yes | Cloud | Multi-service | Cloud | Yes |
| Runs on 512MB RAM | Yes | N/A | No | N/A | Yes |
| Production-hardened | No (alpha) | Yes | Yes | Yes | Yes |
| Hosted option | No | Yes | Yes | Yes | No |
| Price | Free (MIT) | Free tier + pay | Free tier + pay | Free tier + pay | Free (MIT) |

## What Works and What Doesn't

Two columns. No hedging.

| Working | Alpha / Incomplete |
|---|---|
| REST API: full CRUD over triple store | npm / crates.io packages not yet published |
| Auth: signup, signin, JWT, refresh, magic links, admin RBAC | No hosted documentation site |
| OAuth: Google, GitHub, Apple, Discord | No performance benchmarks published |
| MFA / TOTP | No horizontal scaling / multi-node |
| Session hardening + exponential login rate limit | Mobile SDKs (Swift, Kotlin) not started |
| Row-level + field-level security | Embedded V8 runtime available via `--features v8` (VYASA), subprocess remains default |
| WebSocket diff engine (`{added, removed, updated}`) | Phone OTP auth not implemented |
| LIVE SELECT auto-registration | No production deployment under real traffic |
| DarshJQL parser, optimizer, executor | Auth test suite still relies on live Postgres (testcontainers migration pending) |
| SQL passthrough (admin-only, audit-logged) | Typed DdbConfig hierarchy (Slice 17) still on the roadmap |
| Graph relations, BFS traversal, `/api/graph/*` | Graph visualisation UI (Slice 22) not started |
| Full-text search (ts_rank) | |
| Vector search (pgvector HNSW + IVFFlat) | |
| Hybrid search (Reciprocal Rank Fusion, k=60) | |
| Agent memory tiers + summariser (50/100/200) | |
| Embedding worker (OpenAI / Ollama / Anthropic / None) | |
| RESP3 cache server on port 7701 | |
| Hashes, lists, sorted sets, streams, bloom, HLL | |
| TimescaleDB hypertable at `/api/ts/*` | |
| MCP JSON-RPC server + 10 tools | |
| SSE streaming agent endpoint | |
| Chunked / resumable uploads | |
| Image transforms pipeline with lz4 cache | |
| Strict schema mode (`schema_definitions`) | |
| Prometheus `/metrics` + `/health`, `/ready`, `/live` | |
| Structured JSON logging with request_id | |
| Embedded Postgres 16 (`--features embedded-db`) | |
| Admin dashboard embedded via `include_dir!` | |
| One-line installer (`scripts/install.sh`) | |
| GH Actions release matrix (linux-musl / darwin-aarch64 / windows-msvc) | |
| File storage (S3, R2, MinIO, local) | |
| Merkle audit trail + optional IPFS / Ethereum anchor | |
| Rate limiting (token bucket) | |
| CLI: start, sql, import, export | |
| 7 SDKs with tests | |

## Roadmap

v0.3.0 Grand Transformation — this release. Everything in the Feature Grid above.

Coming next:

- Slice 17 -- Typed DdbConfig hierarchy. Defaults, config.toml, `DARSH_*` env override, threaded through main.rs.
- Slice 22 -- Graph UI. Visual editor for the edges table in the embedded admin dashboard.
- testcontainers for auth tests. Replace the live Postgres requirement with testcontainers in CI.
- npm + crates.io publish. Package metadata is in place; the actual `cargo publish` / `npm publish` is the next ship task.
- Phase 8 -- production hardening pass. Soak tests, fuzz targets, chaos drills, and the first public performance benchmarks.

Looking further out: mobile SDKs (Swift, Kotlin), V8-in-process server functions as the default runtime, horizontal scaling via logical replication, and a hosted db.darshj.me control plane.


## Contributing

```bash
# Prerequisites: Rust 1.85+, optional Postgres 16+ (or use --features embedded-db), Node.js 20+

# Clone and build
git clone https://github.com/darshjme/darshjdb.git
cd darshjdb
cargo build --workspace

# Option A -- zero-dep: embedded Postgres
cargo run --bin ddb-server --features embedded-db

# Option B -- bring your own Postgres
docker compose up postgres -d
DATABASE_URL=postgres://darshan:darshan@localhost:5432/darshjdb \
  cargo run --bin ddb-server

# Rust tests (workspace: server, cache, cache-server, agent-memory, cli)
cargo test --workspace

# TypeScript SDK tests
cd packages/tests && npm install && npm test

# Python SDK tests
cd sdks/python && pip install -e . && pytest

# PHP SDK tests
cd sdks/php && composer install && composer test
```

Read CONTRIBUTING.md for code style, PR process, and architecture decisions. Read SECURITY.md for reporting vulnerabilities.

The project is alpha. There is real work to do. If you care about self-hosted infrastructure and developer tools, pull requests are welcome.


## Built With AI

This project was scaffolded using AI-assisted development tools (Claude Code) and is being hardened into production-grade software. The architecture is intentional, the code compiles and passes CI, but this is alpha software built in weeks, not years. Contributions that improve production-readiness are especially welcome.


License & Attribution

MIT. See LICENSE and NOTICE.

DarshJDB is the original creation of Darshankumar Joshi. The triple-store architecture over PostgreSQL, the DarshanQL query language, the 3-tier agent memory model, the Merkle audit chain, and all associated SDKs are original works. Workspace authors, homepage (db.darshj.me), repository, and keywords are wired into Cargo.toml at the workspace root.


Built by Darshankumar Joshi in Navsari and Ahmedabad, Gujarat, India.

db.darshj.me | GitHub | darshj.ai