Skip to content

front-depiction/Effect-FuzzyCache

Repository files navigation

FuzzyCache

A high-performance fuzzy cache for Effect that extends Effect's Cache with approximate parameter matching and intelligent request deduplication.

Features

  • Effect Cache Subtyping - Drop-in replacement for Effect's Cache and ConsumerCache interfaces
  • Fuzzy Parameter Matching - Score-based approximate matching with configurable thresholds (Levenshtein, numeric, custom)
  • In-Flight Request Deduplication - Automatically shares lookups across concurrent requests using hash-based comparison
  • Order-Agnostic Hashing - Uses Effect's Hash.structure() for reliable parameter comparison across Effect types
  • Expiration-Based Eviction - Smart capacity management that evicts entries with earliest expiration times first
  • Two-Level Capacity Management - Separate bucket and entry-level limits with independent control
  • TTL-Based Expiration - Automatic time-to-live management with per-exit customization
  • Granular Statistics - Separate exact (bucket-level) and fuzzy (entry-level) stats tracking
  • Property-Based Tested - Comprehensive FastCheck property tests with 1000+ runs per property

Installation

npm install @effect/platform effect

Quick Start

import * as FuzzyCache from "./FuzzyCache"
import * as Matchers from "./Matchers"
import { Effect, Duration } from "effect"

interface SearchParams {
  readonly userId: string
  readonly query: string
  readonly maxResults: number
}

const cache = FuzzyCache.make({
  lookup: (params: SearchParams) =>
    Effect.succeed(`Results for ${params.query}`),
  config: {
    userId: Matchers.Exact(),           // Exact match - defines buckets
    query: Matchers.levenshtein(0.3),   // Fuzzy match - 30% difference allowed
    maxResults: Matchers.numeric(5)     // Numeric match - within 5 units
  },
  capacity: { bucket: 100, list: 10 },  // 100 buckets, 10 entries per bucket
  timeToLive: Duration.minutes(5)
})

const program = Effect.gen(function* () {
  const fuzzyCache = yield* cache

  // First call invokes lookup
  const value1 = yield* fuzzyCache.get({
    userId: "user123",
    query: "hello world",
    maxResults: 10
  })

  // Similar query returns cached result (fuzzy match)
  const value2 = yield* fuzzyCache.get({
    userId: "user123",
    query: "hello earth",  // Similar to "hello world"
    maxResults: 12         // Within numeric tolerance of 10
  })

  // Get all matches above threshold
  const allMatches = yield* fuzzyCache.getAll(
    { userId: "user123", query: "hello", maxResults: 10 },
    0.7  // Minimum score threshold
  )

  allMatches.forEach(match => {
    console.log(`Score: ${match.score}, Value: ${match.value}`)
  })
})

Core Concepts

Bucket-Based Architecture

FuzzyCache organizes entries into buckets based on exact-match parameters:

  1. Exact Parameters: Define bucket boundaries (must match exactly)
  2. Fuzzy Parameters: Enable approximate matching within buckets
  3. Scored Results: Each match receives a similarity score (0.0 to 1.0)
const cache = yield* FuzzyCache.make({
  lookup: (params: { url: string; prompt: string }) =>
    Effect.succeed(`Answer for ${params.prompt}`),
  config: {
    url: Matchers.Exact(),           // Different URLs = different buckets
    prompt: Matchers.levenshtein(0.3) // Similar prompts share results
  },
  capacity: { bucket: 50, list: 5 },
  timeToLive: Duration.minutes(10)
})

// These go to different buckets (different URL)
yield* cache.get({ url: "docs.com", prompt: "what is Effect?" })
yield* cache.get({ url: "blog.com", prompt: "what is Effect?" })

// These share a bucket (same URL, fuzzy prompt matching)
yield* cache.get({ url: "docs.com", prompt: "what is Effect?" })
yield* cache.get({ url: "docs.com", prompt: "what is effect?" }) // May return cached result

Fuzzy Matching

FuzzyCache scores entries based on parameter similarity:

const cache = yield* FuzzyCache.make({
  lookup: (params: { text: string }) =>
    Effect.succeed(`Result for ${params.text}`),
  config: {
    text: Matchers.levenshtein(0.5) // Allow up to 50% difference
  },
  capacity: { bucket: 100, list: 10 },
  timeToLive: Duration.minutes(5),
  minScore: 0.7 // Global minimum score threshold
})

// Cache a value
yield* cache.set({ text: "hello" }, "greeting")

// Get all matches with scores
const results = yield* cache.getAll({ text: "hallo" })
// [{ value: "greeting", score: 0.8, params: { text: "hello" } }]

// Get best match only
const best = yield* cache.get({ text: "hallo" })
// Returns "greeting" if score >= 0.7

In-Flight Request Deduplication

FuzzyCache automatically deduplicates concurrent requests for the same parameters:

const cache = yield* FuzzyCache.make({
  lookup: (params: { key: string }) =>
    Effect.sync(() => {
      console.log("Lookup called!")
      return `value-${params.key}`
    }),
  config: { key: Matchers.Exact() },
  capacity: { bucket: 100, list: 10 },
  timeToLive: Duration.minutes(5)
})

// Launch 5 concurrent requests
const results = yield* Effect.all([
  cache.get({ key: "test" }),
  cache.get({ key: "test" }),
  cache.get({ key: "test" }),
  cache.get({ key: "test" }),
  cache.get({ key: "test" })
], { concurrency: "unbounded" })

// Output: "Lookup called!" (only once)
// All 5 requests share the same lookup

Hash.structure for Parameter Hashing

FuzzyCache uses Effect's Hash.structure() for order-agnostic, deep parameter comparison:

import * as Option from "effect/Option"

const cache = yield* FuzzyCache.make({
  lookup: (params: { userId: string; tags: Option.Option<Array<string>> }) =>
    Effect.succeed(`Result for ${params.userId}`),
  config: {
    userId: Matchers.Exact(),
    tags: Matchers.Exact()
  },
  capacity: { bucket: 100, list: 10 },
  timeToLive: Duration.minutes(5)
})

// These are considered identical due to Hash.structure
yield* cache.get({ userId: "A", tags: Option.some(["x", "y"]) })
yield* cache.get({ userId: "A", tags: Option.some(["x", "y"]) }) // Cache hit

// Hash.structure handles Effect types correctly
yield* cache.get({ userId: "A", tags: Option.none() })
yield* cache.get({ userId: "A", tags: Option.none() }) // Cache hit

API Reference

FuzzyCache.make

Creates a new FuzzyCache with fixed time-to-live.

FuzzyCache.make<Params, Value, Error, R>(options: {
  readonly lookup: (params: Params) => Effect.Effect<Value, Error, R>
  readonly config: FuzzyConfig<Params>
  readonly capacity: { readonly bucket: number; readonly list: number }
  readonly timeToLive: Duration.DurationInput
  readonly minScore?: number
}): Effect.Effect<FuzzyCache<Params, Value, Error>, never, R>

Parameters:

  • lookup - Function to compute values for cache misses
  • config - Matcher configuration for each parameter
  • capacity.bucket - Maximum number of buckets
  • capacity.list - Maximum entries per bucket
  • timeToLive - Time-to-live for cached entries
  • minScore - Global minimum score threshold (default: 0.0)

FuzzyCache.makeWith

Creates a new FuzzyCache with dynamic time-to-live based on lookup result.

FuzzyCache.makeWith<Params, Value, Error, R>(options: {
  readonly lookup: (params: Params) => Effect.Effect<Value, Error, R>
  readonly config: FuzzyConfig<Params>
  readonly capacity: { readonly bucket: number; readonly list: number }
  readonly timeToLive: (exit: Exit.Exit<Value, Error>) => Duration.DurationInput
  readonly minScore?: number
}): Effect.Effect<FuzzyCache<Params, Value, Error>, never, R>

Example:

const cache = FuzzyCache.makeWith({
  lookup: (params: { key: string }) =>
    Effect.succeed(`value-${params.key}`),
  config: { key: Matchers.Exact() },
  capacity: { bucket: 100, list: 10 },
  timeToLive: (exit) =>
    Exit.isSuccess(exit)
      ? Duration.minutes(10)  // Cache successes for 10 minutes
      : Duration.seconds(30)  // Cache failures for 30 seconds
})

Methods

get

Retrieves the best-matching cached value or computes a new one.

get(params: Params): Effect.Effect<Value, Error>

Returns the value with the highest fuzzy match score above minScore.

getEither

Returns Either.Left for cached values, Either.Right for newly computed values.

getEither(params: Params): Effect.Effect<Either.Either<Value, Value>, Error>

Example:

const result = yield* cache.getEither({ key: "test" })

if (Either.isLeft(result)) {
  console.log("From cache:", result.left)
} else {
  console.log("Newly computed:", result.right)
}

getOption

Returns cached value as Option without triggering lookup.

getOption(params: Params): Effect.Effect<Option.Option<Value>, Error>

Waits for in-flight lookups but doesn't start new ones.

getOptionComplete

Returns cached value only if lookup is complete (doesn't wait for in-flight).

getOptionComplete(params: Params): Effect.Effect<Option.Option<Value>>

getAll

Returns all matches above threshold, sorted by score (highest first).

getAll(
  params: Params,
  threshold?: number
): Effect.Effect<Array<ScoredResult<Value>>, Error>

Example:

const matches = yield* cache.getAll({ query: "hello" }, 0.8)

matches.forEach(match => {
  console.log(`Score: ${match.score}`)
  console.log(`Value: ${match.value}`)
  console.log(`Original params:`, match.params)
})

refresh

Forces recomputation without invalidating existing cache.

refresh(params: Params): Effect.Effect<void, Error>

set

Manually adds or updates a cache entry.

set(params: Params, value: Value): Effect.Effect<void>

invalidate

Removes the best-matching entry from cache.

invalidate(params: Params): Effect.Effect<void>

invalidateWhen

Conditionally removes entry based on predicate.

invalidateWhen(
  params: Params,
  predicate: Predicate.Predicate<Value>
): Effect.Effect<void>

invalidateAll

Clears all cache entries.

readonly invalidateAll: Effect.Effect<void>

Statistics

FuzzyCache provides separate statistics for bucket and entry operations:

// Bucket-level stats (when buckets are accessed/created)
readonly exactStats: Effect.Effect<Cache.CacheStats>

// Entry-level stats (when fuzzy matches succeed/fail)
readonly fuzzyStats: Effect.Effect<Cache.CacheStats>

// Backwards compatible (alias for fuzzyStats)
readonly cacheStats: Effect.Effect<Cache.CacheStats>

Example:

const exactStats = yield* cache.exactStats
console.log(`Bucket hits: ${exactStats.hits}`)
console.log(`Bucket misses: ${exactStats.misses}`)

const fuzzyStats = yield* cache.fuzzyStats
console.log(`Entry hits: ${fuzzyStats.hits}`)
console.log(`Entry misses: ${fuzzyStats.misses}`)
console.log(`Total entries: ${fuzzyStats.size}`)

Other Methods

contains(params: Params): Effect.Effect<boolean>
entryStats(params: Params): Effect.Effect<Option.Option<Cache.EntryStats>>
readonly size: Effect.Effect<number>
readonly keys: Effect.Effect<Array<Params>>
readonly values: Effect.Effect<Array<Value>>
readonly entries: Effect.Effect<Array<[Params, Value]>>

Matchers

Exact

Requires exact parameter match (defines bucket boundaries).

Matchers.Exact()

levenshtein

Fuzzy string matching using edit distance.

Matchers.levenshtein(threshold: number): ParamMatcher<string>

Parameters:

  • threshold - Maximum normalized distance (0.0 to 1.0)

Score Calculation:

  • Edit distance / max(length) = normalized distance
  • If normalized > threshold: Returns Option.none() - entry is completely excluded
  • If normalized ≤ threshold: Returns Option.some(1.0 - normalized)

Example:

const matcher = Matchers.levenshtein(0.3)

// "hello" vs "hallo": distance 1, length 5, normalized 0.2 → Option.some(0.8)
// "hello" vs "world": distance 4, length 5, normalized 0.8 → Option.none() (exceeds threshold)
// "hello" vs "xyz": distance 5, length 5, normalized 1.0 → Option.none() (exceeds threshold)

Important: Entries returning Option.none() for any parameter are completely filtered out and won't appear in results at all.

numeric

Fuzzy numeric matching with tolerance.

Matchers.numeric(tolerance: number): ParamMatcher<number>

Parameters:

  • tolerance - Maximum difference for perfect match

Score Calculation:

  • If diff ≤ tolerance: Returns Option.some(1.0) (perfect match)
  • If diff < 2 × tolerance: Returns Option.some(1.0 - (diff - tolerance) / tolerance) (decay)
  • If diff ≥ 2 × tolerance: Returns Option.none() - entry is completely excluded

Example:

const matcher = Matchers.numeric(10)

// Query: 100
// Cached: 105 -> diff 5 → Option.some(1.0) (within tolerance)
// Cached: 115 -> diff 15 → Option.some(0.5) (excess 5, score 1.0 - 5/10)
// Cached: 120 -> diff 20 → Option.none() (at 2x tolerance, filtered out)
// Cached: 125 -> diff 25 → Option.none() (beyond 2x tolerance, filtered out)

Important: Entries at or beyond 2x tolerance are completely excluded using Option.none().

Custom Matchers

Create custom fuzzy matchers with the Fuzzy constructor:

import * as Matchers from "./Matchers"
import * as Option from "effect/Option"

const customMatcher = Matchers.Fuzzy<Date>((cached, query): Option.Option<number> => {
  const diff = Math.abs(cached.getTime() - query.getTime())
  const hourInMs = 3600000

  // Within 1 hour: perfect match
  if (diff <= hourInMs) return Option.some(1.0)

  // Beyond 24 hours: completely exclude entry
  if (diff >= hourInMs * 24) return Option.none()

  // Between 1 and 24 hours: proportional decay
  return Option.some(1.0 - (diff - hourInMs) / (hourInMs * 23))
})

Important: Custom matchers must return Option.Option<number>:

  • Option.some(score) - Entry is eligible for results (score between 0.0 and 1.0)
  • Option.none() - Entry is completely excluded from consideration

If any parameter returns Option.none(), the entire entry is disqualified from the results.

Advanced Usage

Custom Hashable Types

FuzzyCache works with any hashable Effect types:

import * as Option from "effect/Option"
import * as Either from "effect/Either"
import * as Array from "effect/Array"

interface ComplexParams {
  readonly userId: string
  readonly tags: Option.Option<Array<string>>
  readonly result: Either.Either<string, number>
}

const cache = yield* FuzzyCache.make({
  lookup: (params: ComplexParams) =>
    Effect.succeed(`Result for ${params.userId}`),
  config: {
    userId: Matchers.Exact(),
    tags: Matchers.Exact(),
    result: Matchers.Exact()
  },
  capacity: { bucket: 100, list: 10 },
  timeToLive: Duration.minutes(5)
})

// Hash.structure handles Effect types correctly
yield* cache.get({
  userId: "user123",
  tags: Option.some(["a", "b"]),
  result: Either.right(42)
})

Capacity Management

FuzzyCache uses two-level capacity with expiration-based eviction:

const cache = yield* FuzzyCache.make({
  lookup: (params: { userId: string; query: string }) =>
    Effect.succeed(`result-${params.query}`),
  config: {
    userId: Matchers.Exact(),  // Defines buckets
    query: Matchers.Exact()
  },
  capacity: {
    bucket: 100,  // Max 100 buckets
    list: 10      // Max 10 entries per bucket
  },
  timeToLive: Duration.minutes(5)
})

Eviction Strategy:

  1. When a bucket exceeds list capacity, the entry with the earliest expiration time is evicted
  2. When total buckets exceed bucket capacity, Effect's underlying Cache handles bucket eviction
  3. Expired entries are filtered out during getAll operations

Example with Staggered TTLs:

const cache = yield* FuzzyCache.makeWith({
  lookup: (params: { bucket: string; key: string }) =>
    Effect.succeed(`value-${params.key}`),
  config: {
    bucket: Matchers.Exact(),
    key: Matchers.Exact()
  },
  capacity: { bucket: 5, list: 3 },
  timeToLive: (exit) => {
    if (Exit.isSuccess(exit)) {
      const value = exit.value
      // Assign different TTLs based on value
      if (value.includes("important")) return Duration.hours(1)
      if (value.includes("normal")) return Duration.minutes(30)
      return Duration.minutes(5)
    }
    return Duration.seconds(30)
  }
})

// When capacity is exceeded, entries with shortest remaining TTL are evicted first

Effect Cache Compatibility

FuzzyCache is a full subtype of Effect's Cache and ConsumerCache interfaces, meaning it can be used anywhere these types are expected:

import * as Cache from "effect/Cache"

// Generic function accepting any Cache
const getCached = <K, V, E>(
  cache: Cache.Cache<K, V, E>,
  key: K
): Effect.Effect<V, E> => cache.get(key)

// Generic function accepting any ConsumerCache (read-only operations)
const checkCached = <K, V, E>(
  cache: Cache.ConsumerCache<K, V, E>,
  key: K
): Effect.Effect<boolean> => cache.contains(key)

const fuzzyCache = yield* FuzzyCache.make({
  lookup: (params: { key: string }) => Effect.succeed(`value-${params.key}`),
  config: { key: Matchers.Exact() },
  capacity: { bucket: 100, list: 10 },
  timeToLive: Duration.minutes(5)
})

// Works seamlessly with Cache interface
const value = yield* getCached(fuzzyCache, { key: "test" })

// Works seamlessly with ConsumerCache interface
const exists = yield* checkCached(fuzzyCache, { key: "test" })

Implemented Cache Methods:

  • get(params) - Retrieve or compute value
  • getEither(params) - Distinguish cached vs computed
  • getOption(params) - Non-blocking lookup (waits for in-flight)
  • getOptionComplete(params) - Immediate lookup (returns None for in-flight)
  • refresh(params) - Force recomputation
  • set(params, value) - Manual cache insertion
  • invalidate(params) - Remove single entry
  • invalidateWhen(params, predicate) - Conditional invalidation
  • invalidateAll - Clear entire cache
  • contains(params) - Check existence
  • entryStats(params) - Entry-level statistics
  • cacheStats - Overall cache statistics
  • size - Total entry count
  • keys - All parameter sets
  • values - All cached values
  • entries - All key-value pairs

Internal Architecture

FuzzyCache is built with a modular architecture split across several internal modules:

Module Structure

src/
├── FuzzyCache.ts          # Public API and type definitions
├── Matchers.ts            # Matcher constructors (Exact, Fuzzy, levenshtein, numeric)
└── internal/
    ├── fuzzycache.ts      # Core cache implementation logic
    ├── config.ts          # Configuration and parameter partitioning
    ├── bucketKey.ts       # Bucket key generation and hashing
    ├── entry.ts           # Entry value types (Pending, Complete) and expiration
    ├── bucket.ts          # Bucket operations and eviction logic
    ├── scoring.ts         # Fuzzy matching and score calculation
    └── stats.ts           # Statistics tracking (hits, misses, size)

Key Design Decisions

Bucket-Based Organization: The cache uses exact-match parameters to create isolated buckets, then performs fuzzy matching within each bucket. This provides O(1) bucket lookup followed by O(n) fuzzy scoring where n is the entries per bucket (typically small).

Hash.structure() for Deduplication: Uses Effect's Hash.structure() to compute structural hashes of parameter objects. This enables:

  • Reliable equality checks across Effect types (Option, Either, etc.)
  • Order-agnostic comparison (objects with same fields in different order are equal)
  • Proper handling of nested structures and arrays

Expiration-Based Eviction: When bucket capacity is exceeded, the entry with the earliest absolute expiration time is evicted. This differs from LRU (evicting least recently used) and ensures soon-to-expire entries are removed first, maximizing cache utility.

Dual Stats Tracking:

  • exactStats tracks bucket-level operations (when buckets are created/accessed via the underlying Cache)
  • fuzzyStats tracks entry-level operations (when fuzzy matches succeed/fail within buckets)
  • This distinction is critical because multiple entries can exist in the same bucket

In-Flight Request Handling: Pending lookups are stored as special Pending entries with Deferred values. Concurrent requests for the same parameters are identified using Hash.hash() comparison and await the same Deferred, ensuring only one lookup executes. The set() method also uses hash-based deduplication to prevent duplicate entries with identical parameters.

Performance

  • O(1) Bucket Lookup: Uses Effect's Hash.structure() for constant-time bucket access
  • O(n) Fuzzy Scoring: Linear scan within buckets, where n is typically small (10-100 entries)
  • In-Flight Deduplication: Concurrent requests for identical parameters share single lookup
  • Expiration-Based Eviction: Efficient memory management by evicting soon-to-expire entries first
  • Property-Based Tested: All core properties verified with FastCheck (1000+ runs per property)

Examples

API Response Caching with Fuzzy Matching

interface APIParams {
  readonly endpoint: string
  readonly query: string
  readonly limit: number
}

const apiCache = yield* FuzzyCache.make({
  lookup: (params: APIParams) =>
    Effect.tryPromise(() =>
      fetch(`${params.endpoint}?q=${params.query}&limit=${params.limit}`)
        .then(r => r.json())
    ),
  config: {
    endpoint: Matchers.Exact(),          // Different endpoints = different buckets
    query: Matchers.levenshtein(0.2),    // Similar queries share results
    limit: Matchers.numeric(5)           // Nearby limits share results
  },
  capacity: { bucket: 50, list: 10 },
  timeToLive: Duration.minutes(5)
})

// First call hits API
const results1 = yield* apiCache.get({
  endpoint: "/api/search",
  query: "effect typescript",
  limit: 20
})

// Similar query may return cached results
const results2 = yield* apiCache.get({
  endpoint: "/api/search",
  query: "effect typescrpt", // Typo, but similar enough
  limit: 22                   // Close to 20
})

User Session Caching

import * as Option from "effect/Option"

interface SessionParams {
  readonly userId: string
  readonly roles: Array<string>
  readonly permissions: Option.Option<Array<string>>
}

const sessionCache = yield* FuzzyCache.make({
  lookup: (params: SessionParams) =>
    Effect.gen(function* () {
      const user = yield* UserService.loadUser(params.userId)
      return createSession(user, params.roles)
    }),
  config: {
    userId: Matchers.Exact(),
    roles: Matchers.Exact(),
    permissions: Matchers.Exact()
  },
  capacity: { bucket: 1000, list: 5 },
  timeToLive: Duration.hours(1)
})

Rate-Limited API with Smart Caching

const rateLimitedCache = yield* FuzzyCache.makeWith({
  lookup: (params: { endpoint: string; query: string }) =>
    Effect.gen(function* () {
      // Implement rate limiting
      yield* RateLimiter.acquire

      return yield* Effect.tryPromise(() =>
        fetch(`${params.endpoint}?q=${params.query}`).then(r => r.json())
      )
    }),
  config: {
    endpoint: Matchers.Exact(),
    query: Matchers.levenshtein(0.3)
  },
  capacity: { bucket: 20, list: 10 },
  timeToLive: (exit) =>
    Exit.isSuccess(exit)
      ? Duration.minutes(10)  // Cache successes longer
      : Duration.seconds(30)  // Retry failures sooner
})

Testing

FuzzyCache includes comprehensive tests using @effect/vitest and FastCheck for property-based testing:

Test Coverage

  • 60 total tests - All passing

  • Property-Based Tests - 1000+ runs per property verifying core invariants:

    • Exact match always returns score 1.0
    • No duplicate lookups for identical parameters
    • Levenshtein distance symmetry and range properties
    • Numeric matcher tolerance and scoring properties
    • Bucket isolation by exact parameters
    • Score ordering (descending)
    • Size tracking correctness
  • Cache Subtype Verification - Proves FuzzyCache implements Cache/ConsumerCache interfaces correctly

  • TTL Expiration Tests - Using TestClock for deterministic time-based testing

  • Capacity & Eviction Tests - Verifies expiration-based eviction strategy

  • In-Flight Deduplication Tests - Confirms concurrent request handling

  • API Method Tests - Complete coverage of all public methods

Running Tests

npm test                # Run all tests
npm run test:coverage   # Generate coverage report

License

MIT

About

Bucket-based fuzzy cache extending Effect's Cache

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors