19 commits
7e40629
refactor(chunker): qualify methods with enclosing class name
aeneasr Apr 8, 2026
32b5bf7
feat(chunker): add Kotlin language support (.kt, .kts)
aeneasr Apr 8, 2026
33c6bd0
test(bench): add Kotlin SWE-bench task from ktor CORS bug
aeneasr Apr 8, 2026
56bd549
docs: add Kotlin to supported languages lists
aeneasr Apr 8, 2026
388f48a
docs: add Kotlin benchmark result (-7% cost)
aeneasr Apr 8, 2026
9ad43f6
test(bench): replace Kotlin task with harder kotlinx.serialization bug
aeneasr Apr 9, 2026
a5f1c96
docs: update Kotlin benchmark results (-13% cost, -41% time)
aeneasr Apr 9, 2026
7d47075
chore: remove old Kotlin benchmark runs, keep latest results
aeneasr Apr 9, 2026
f9ee367
refactor: remove logging from cmd package
aeneasr Apr 9, 2026
70241fb
chore: update E2E snapshots and add embedder logging
aeneasr Apr 9, 2026
5dfd015
fix(tui): replace pterm progress bar with custom renderer
aeneasr Apr 9, 2026
9312704
fix(cmd): remove undefined logger reference from runIndex
aeneasr Apr 9, 2026
ce1c9fd
revert(chunker): remove ResolveContainment
aeneasr Apr 9, 2026
21db4cd
feat(config): bump IndexVersion to 3
aeneasr Apr 9, 2026
d50b8cf
test: regenerate E2E snapshots for class-qualified symbols
aeneasr Apr 9, 2026
823a086
docs(bench): add Kotlin rerun results (Sonnet, Apr 12) — Good→Perfect
aeneasr Apr 13, 2026
a65d7bf
refactor(chunker): isolate Kotlin changes from existing languages
aeneasr Apr 13, 2026
b3af3ca
fix(preflight): replace text-mention check with tool_use event detection
aeneasr Apr 13, 2026
0a7a909
docs(bench): update Kotlin to haiku run (Apr 13) — Good→Perfect confi…
aeneasr Apr 13, 2026
44 changes: 24 additions & 20 deletions README.md
@@ -225,39 +225,43 @@ Claude on real GitHub bug-fix tasks and measures cost, time, output tokens, and
 patch quality — with and without Lumen. All results are reproducible: raw JSONL
 streams, patch diffs, and judge ratings are committed to this repository.

-**Key results** — 8 runs across 8 languages, hard difficulty, real GitHub
+**Key results** — 9 runs across 9 languages, hard difficulty, real GitHub
 issues (`ordis/jina-embeddings-v2-base-code`, Ollama):

-| Language | Cost Reduction | Time Reduction | Output Token Reduction | Quality |
-| ---------- | -------------- | -------------- | ----------------------- | -------------- |
-| Rust | **-39%** | **-34%** | **-31%** (18K → 12K) | Poor (both) |
-| JavaScript | **-33%** | **-53%** | **-66%** (14K → 5K) | Perfect (both) |
-| TypeScript | **-27%** | **-33%** | **-64%** (5K → 1.8K) | Good (both) |
-| PHP | **-27%** | **-34%** | **-59%** (1.9K → 0.8K) | Good (both) |
-| Ruby | **-24%** | **-11%** | -9% (6.1K → 5.6K) | Good (both) |
-| Python | **-20%** | **-29%** | **-36%** (1.7K → 1.1K) | Perfect (both) |
-| Go | **-12%** | -9% | -10% (11K → 10K) | Good (both) |
-| C++ | **-8%** | -3% | +42% (feature task) | Good (both) |
-
-**Cost was reduced in every language tested. Quality was maintained in every
-task — zero regressions.** JavaScript and TypeScript show the most dramatic
-efficiency gains: same quality fixes in half the time with two-thirds fewer
-tokens. Even on tasks too hard for either approach (Rust), Lumen cuts the cost
-of failure by 39%.
-
-See [docs/BENCHMARKS.md](docs/BENCHMARKS.md) for all 8 per-language deep dives,
+| Language | Cost Reduction | Time Reduction | Output Token Reduction | Quality |
+| ------------- | -------------- | -------------- | ----------------------- | --------------------- |
+| Rust | **-39%** | **-34%** | **-31%** (18K → 12K) | Poor (both) |
+| JavaScript | **-33%** | **-53%** | **-66%** (14K → 5K) | Perfect (both) |
+| Kotlin | **-37%** | **-34%** | -10% (15K → 14K) | Good → **Perfect** ↑ |
+| TypeScript | **-27%** | **-33%** | **-64%** (5K → 1.8K) | Good (both) |
+| PHP | **-27%** | **-34%** | **-59%** (1.9K → 0.8K) | Good (both) |
+| Ruby | **-24%** | **-11%** | -9% (6.1K → 5.6K) | Good (both) |
+| Python | **-20%** | **-29%** | **-36%** (1.7K → 1.1K) | Perfect (both) |
+| Go | **-12%** | -9% | -10% (11K → 10K) | Good (both) |
+| C++ | **-8%** | -3% | +42% (feature task) | Good (both) |
+
+**Cost was reduced in every language tested. Quality was maintained or improved
+in every task — zero regressions.** Kotlin is the first task where Lumen lifted
+quality (Good → Perfect): semantic search surfaced `JsonPath.resize()` in one
+query, giving Claude precise context for a tighter patch. JavaScript and
+TypeScript show the most dramatic efficiency gains: same quality fixes in half
+the time with two-thirds fewer tokens. Even on tasks too hard for either
+approach (Rust), Lumen cuts the cost of failure by 39%.
+
+See [docs/BENCHMARKS.md](docs/BENCHMARKS.md) for all 9 per-language deep dives,
 judge rationales, and reproduce instructions.

 ## Supported languages

-Supports **12 language families** with semantic chunking (9 benchmarked):
+Supports **13 language families** with semantic chunking (9 benchmarked):

 | Language | Parser | Extensions | Benchmark status |
 | ---------------- | ----------- | ----------------------------------------- | --------------------------------------------- |
 | Go | Native AST | `.go` | Benchmarked: -12% cost, Good quality |
 | Python | tree-sitter | `.py` | Benchmarked: Perfect quality, -36% tokens |
 | TypeScript / TSX | tree-sitter | `.ts`, `.tsx` | Benchmarked: -64% tokens, -33% time |
 | JavaScript / JSX | tree-sitter | `.js`, `.jsx`, `.mjs` | Benchmarked: -66% tokens, -53% time |
+| Kotlin | tree-sitter | `.kt`, `.kts` | Benchmarked: -37% cost, -34% time, Good → Perfect quality |
 | Dart | tree-sitter | `.dart` | Benchmarked: -76% cost, -82% tokens, -79% time |
 | Rust | tree-sitter | `.rs` | Benchmarked: -39% cost, -34% time |
 | Ruby | tree-sitter | `.rb` | Benchmarked: -24% cost, -11% time |
@@ -0,0 +1,141 @@
# SWE-Bench Detail Report

Generated: 2026-04-13 10:01 UTC

---

## kotlin-hard [kotlin]

**Issue:** Unexpected token for data object in a deep json structure gives ArrayIndexOutOfBoundsException

> **Describe the bug**
>
> When the JSON parser encounters an unexpected token while parsing a `data object` deep in a JSON structure (roughly 8 or more levels of nesting), it throws `ArrayIndexOutOfBoundsException` instead of the usual `SerializationException` with a helpful message pointing at the offending path.
>
> For shallow structures, the parser produces a clean error such as:
>
> Unexpected JSON token at offset N: Expected start of the object '{', but had 'f' instead at path: $.bar.bar.baz
>
> Once the nesting depth crosses a threshold, the same kind of malformed input instead surfaces as:
>
> java.lang.ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0
>
> with no indication of where in the JSON the problem occurred.
>
> **To Reproduce**
>
> ```kotlin
> package com.foo.bar
>
> import kotlinx.serialization.Serializable
> import kotlinx.serialization.SerializationException
> import kotlinx.serialization.json.Json
> import org.junit.jupiter.api.Assertions.assertEquals
> import org.junit.jupiter.api.Test
> import org.junit.jupiter.api.assertThrows
>
> class TestDeepJsonFailure {
>     @Test
>     fun `test deep failure`() {
>         val exception = assertThrows<SerializationException> {
>             Json.decodeFromString<Foo>("""{"bar": {"bar": {"bar": {"bar": {"bar": {"bar": {"baz": false}}}}}}}""")
>         }
>
>         assertEquals(
>             """
>             Unexpected JSON token at offset 56: Expected start of the object '{', but had 'f' instead at path: ${'$'}.bar.bar.bar.bar.bar.bar.baz
>             JSON input: {"bar": {"bar": {"bar": {"bar": {"bar": {"bar": {"baz": false}}}}}}}
>             """.trimIndent(),
>             exception.message
>         )
>     }
>
>     @Test
>     fun `test deeper failure`() {
>         val exception = assertThrows<ArrayIndexOutOfBoundsException> {
>             Json.decodeFromString<Foo>("""{"bar": {"bar": {"bar": {"bar": {"bar": {"bar": {"bar": {"baz": false}}}}}}}}""")
>         }
>
>         assertEquals(
>             "Index 0 out of bounds for length 0",
>             exception.message
>         )
>     }
> }
>
> @Serializable
> data class Foo(val bar: Foo?, val baz: Baz?)
>
> @Serializable
> data object Baz
> ```
>
> The first test passes (shallow malformed input produces a `SerializationException` with a readable path), while the second test documents the broken behaviour: for a slightly deeper but otherwise equivalent malformed input the decoder throws an `ArrayIndexOutOfBoundsException` with the useless message `Index 0 out of bounds for length 0`.
>
> **Expected behavior**
>
> The parser should fail with the same kind of `SerializationException` carrying a readable error message and path, regardless of how deeply the malformed `data object` is nested. Users should never see an `ArrayIndexOutOfBoundsException` escape from `Json.decodeFromString`.
>
> **Environment**
> - Kotlin version: 2.1.10
> - kotlinx.serialization version: 1.8.0
> - Kotlin platforms: JVM
>
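
To probe where the depth threshold sits, the two inputs in the reproduction can be generated for any nesting level. The following is a minimal sketch: `nestedMalformedJson` is a hypothetical helper name, not part of kotlinx.serialization, and it only builds the input strings. Feeding them to `Json.decodeFromString<Foo>` is left to the reader's test setup.

```kotlin
// Hypothetical helper (not part of kotlinx.serialization): builds
// {"bar": {"bar": ... {"baz": false} ... }} with `depth` levels of "bar".
fun nestedMalformedJson(depth: Int): String {
    require(depth >= 0) { "depth must be non-negative" }
    val open = "{\"bar\": ".repeat(depth)   // one opening object per level
    val close = "}".repeat(depth)           // matching closing braces
    return open + "{\"baz\": false}" + close
}

fun main() {
    // Depth 6 and depth 7 correspond to the two inputs used in the issue's tests.
    for (depth in 5..9) {
        println("$depth: ${nestedMalformedJson(depth)}")
    }
}
```

Decoding each generated string in a loop and recording the exception type should show the depth at which `SerializationException` gives way to `ArrayIndexOutOfBoundsException`.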

### Metrics

| Scenario | Duration | Input Tok | Cache Read | Output Tok | Cost |
|----------|----------|-----------|------------|------------|------|
| **baseline** | 350.1s | 305 | 2276925 | 15511 | $0.5606 |
| **with-lumen** | 231.4s | 271 | 2143284 | 13896 | $0.3516 |
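
The percentage reductions quoted for Kotlin can be recomputed from the two rows above. This is a small sketch of that arithmetic; the function name `percentChange` is chosen here for illustration and does not come from the benchmark tooling.

```kotlin
import kotlin.math.roundToInt

// Signed change from baseline to candidate, rounded to a whole percent.
// Negative values mean the candidate used less than the baseline.
fun percentChange(baseline: Double, candidate: Double): Int =
    ((candidate - baseline) / baseline * 100).roundToInt()

fun main() {
    println(percentChange(0.5606, 0.3516))    // cost: -37
    println(percentChange(350.1, 231.4))      // duration: -34
    println(percentChange(15511.0, 13896.0))  // output tokens: -10
}
```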

### baseline

**Rating: Good**

```diff
diff --git a/formats/json/commonMain/src/kotlinx/serialization/json/internal/JsonPath.kt b/formats/json/commonMain/src/kotlinx/serialization/json/internal/JsonPath.kt
index 14e70a4..9725ad8 100644
--- a/formats/json/commonMain/src/kotlinx/serialization/json/internal/JsonPath.kt
+++ b/formats/json/commonMain/src/kotlinx/serialization/json/internal/JsonPath.kt
@@ -132,9 +132,13 @@ internal class JsonPath {
     private fun prettyString(it: Any?) = (it as? SerialDescriptor)?.serialName ?: it.toString()

     private fun resize() {
+        val oldSize = indicies.size
         val newSize = currentDepth * 2
         currentObjectPath = currentObjectPath.copyOf(newSize)
         indicies = indicies.copyOf(newSize)
+        for (i in oldSize until newSize) {
+            indicies[i] = -1
+        }
     }

     override fun toString(): String = getPath()
```

### with-lumen

**Rating: Perfect**

```diff
diff --git a/formats/json/commonMain/src/kotlinx/serialization/json/internal/JsonPath.kt b/formats/json/commonMain/src/kotlinx/serialization/json/internal/JsonPath.kt
index 14e70a4..708f2ac 100644
--- a/formats/json/commonMain/src/kotlinx/serialization/json/internal/JsonPath.kt
+++ b/formats/json/commonMain/src/kotlinx/serialization/json/internal/JsonPath.kt
@@ -134,7 +134,11 @@ internal class JsonPath {
     private fun resize() {
         val newSize = currentDepth * 2
         currentObjectPath = currentObjectPath.copyOf(newSize)
-        indicies = indicies.copyOf(newSize)
+        val newIndices = indicies.copyOf(newSize)
+        for (i in indicies.size until newSize) {
+            newIndices[i] = -1
+        }
+        indicies = newIndices
     }

     override fun toString(): String = getPath()
```


@@ -0,0 +1 @@
{"rating":"Good","files_correct":false,"logic_equivalent":true}
@@ -0,0 +1,3 @@
## Rating: Good

The candidate patch correctly fixes the root cause by initializing the newly allocated indices to -1 after resizing, which matches the gold patch's intent. The logic is equivalent — both approaches ensure new elements in the expanded `indicies` array are initialized to -1 rather than the default 0. However, the candidate patch is missing the test file changes (`JsonPathTest.kt`) that the gold patch includes, and uses a slightly different implementation style (post-copy loop vs. pre-initialized IntArray with copyInto).
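
The two implementation styles named in the rationale can be compared side by side. The sketch below is an illustration only: the function names are hypothetical, it works on a standalone `IntArray` rather than the real `JsonPath` fields, and it is not the gold patch. Both variants assume `newSize >= old.size`.

```kotlin
// Style A (post-copy loop): copyOf pads the tail with 0, so the new slots
// are overwritten with the -1 sentinel afterwards.
fun growWithLoop(old: IntArray, newSize: Int): IntArray {
    val grown = old.copyOf(newSize)
    for (i in old.size until newSize) {
        grown[i] = -1
    }
    return grown
}

// Style B (pre-initialized array): allocate an array already filled with -1,
// then copy the old values over the front.
fun growPrefilled(old: IntArray, newSize: Int): IntArray {
    val grown = IntArray(newSize) { -1 }
    old.copyInto(grown)
    return grown
}

fun main() {
    val old = intArrayOf(3, 1, -1, 2)
    println(old.copyOf(8).toList())          // plain copyOf: tail is 0, not -1
    println(growWithLoop(old, 8).toList())   // tail is -1
    println(growPrefilled(old, 8).toList())  // tail is -1
}
```

Either style yields the same array contents, which is why the judge treats the candidate as logically equivalent.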
@@ -0,0 +1,8 @@
{
"cost_usd": 0.5605703,
"duration_ms": 350074,
"input_tokens": 305,
"cache_read": 2276925,
"cache_created": 51296,
"output_tokens": 15511
}
@@ -0,0 +1,18 @@
diff --git a/formats/json/commonMain/src/kotlinx/serialization/json/internal/JsonPath.kt b/formats/json/commonMain/src/kotlinx/serialization/json/internal/JsonPath.kt
index 14e70a4..9725ad8 100644
--- a/formats/json/commonMain/src/kotlinx/serialization/json/internal/JsonPath.kt
+++ b/formats/json/commonMain/src/kotlinx/serialization/json/internal/JsonPath.kt
@@ -132,9 +132,13 @@ internal class JsonPath {
     private fun prettyString(it: Any?) = (it as? SerialDescriptor)?.serialName ?: it.toString()

     private fun resize() {
+        val oldSize = indicies.size
         val newSize = currentDepth * 2
         currentObjectPath = currentObjectPath.copyOf(newSize)
         indicies = indicies.copyOf(newSize)
+        for (i in oldSize until newSize) {
+            indicies[i] = -1
+        }
     }

     override fun toString(): String = getPath()