Skip to content

A flexibly programmable research tool for controlled long-form text reduction, high-precision long-form summarization, research report generation, and research pipeline assistance.

License

Notifications You must be signed in to change notification settings

Node0/crystallizer

Repository files navigation

Crystallizer

A Map -> Reduce powerhouse, disguised as an insight summarization tool.

Crystallizer is a programmable, LLM-powered general purpose data traversal and transformation tool.

Its default use-case will be as insight extraction and cohesion across N parts of long documents (think books).

However it can be programmed to do a large number of open-ended tasks, owing to it's templated system and task prompt design.

Crystallizer-Web

Installation

📋 Complete Installation Guide - Choose your preferred tool

Quick Links by Tool:

Quick Start (pip)

Requirements: Python 3.11+

# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Test installation
python crystallizer.py --help

Usage

python crystallizer.py \
  --system-prompt system_prompt.j2 \
  --haystack-path ./chat_logs \
  --connection ollama-local \
  --task-label gluon_design \
  --output-dir ./crystals

Configuration

Configure each LLM connection in config.json:

{
  "inference_service_connections": {
    "ollama-local": {
      "api_type": "ollama",
      "base_url": "http://localhost:11434",
      "default_model": "qwen2.5-coder:32b",
      "default_ctx_len": 18000
    },
    "openai-main": {
      "api_type": "openai",
      "base_url": "https://api.openai.com/v1",
      "api_key": "sk-...",
      "default_model": "gpt-4o-mini",
      "default_ctx_len": 128000
    }
  }
}

Features

  • Token-Aware Windowing: Automatically chunks large documents to fit LLM context limits
  • Multi-Provider Support: Works with Ollama (local) and OpenAI (cloud) backends
  • Template-Driven Prompts: Jinja2 templates for custom system prompts
  • Hierarchical Processing: 3-segment micro-windowing with merge strategies
  • Professional Logging: Semantic progress tracking with contextual semaphores
  • Batch Processing: Handle single files or entire directories

License

GNU AGPLv3

About

A flexibly programmable research tool for controlled long-form text reduction, high-precision long-form summarization, research report generation, and research pipeline assistance.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published