Skip to content

GH-3751: Test input on entry. QueryIterFailed on all setup.#3753

Merged
afs merged 2 commits intoapache:mainfrom
afs:gh3751-tdb-reorder
Feb 15, 2026
Merged

GH-3751: Test input on entry. QueryIterFailed on all setup.#3753
afs merged 2 commits intoapache:mainfrom
afs:gh3751-tdb-reorder

Conversation

@afs
Copy link
Member

@afs afs commented Feb 13, 2026

GitHub issue resolved #3751

Hi @Aklakan - would you mind sanity checking OpExecutorTDB2.optimizeExecuteQuads/optimizeExecuteTriples in this PR

Thanks.


  • Tests are included.
  • Key commit messages start with the issue number (GH-xxxx)

By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.

@afs afs force-pushed the gh3751-tdb-reorder branch from a32ccdf to 7a0771d Compare February 13, 2026 20:15
@Aklakan
Copy link
Contributor

Aklakan commented Feb 14, 2026

Overall it looks good. I occasionally got failing query cancel tests on a 14 core machine - before and with your PR.
The patch adds one more QueryIterFailed/try-catch block and synchronizes a closeIterator method. With those changes I didn't see any further build/test failures.

gh3751-tdb-reorder.patch

to be applied from project root with patch -p0 < gh3751-tdb-reorder.patch

The synchronized method should solve this race-condition between regular close and concurrent cancel.

An alternative might be in QueryIteratorBase to always run closeIterator() inside of synchronized (cancelLock):
requestCancel() is already run with the cancelLock. But in the case of QueryIterPlainWrapper, cancel calls closeIterator() thus competing with non-synchronized regular close.
Then again, cancel() normally shouldn't call close() because that's the privilege of the iterator's owner = the query execution. So QueryIterPlainWrapper is an exception and the synchronized method approach should be fine.

[INFO] Running org.apache.jena.geosparql.geo.topological.CancelQueryTest
java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0
	at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:124)
	at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:193)
	at org.apache.jena.geosparql.geo.topological.CancelQueryTest.runAsyncAbort(CancelQueryTest.java:155)
	at org.apache.jena.geosparql.geo.topological.CancelQueryTest.test_cancel_spatial_property_function1(CancelQueryTest.java:124)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:565)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:328)
	at java.base/java.lang.Thread.run(Thread.java:1474)
Caused by: java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0
	at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:100)
	at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:106)
	at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:302)
	at java.base/java.util.Objects.checkIndex(Objects.java:365)
	at java.base/java.util.ArrayList.get(ArrayList.java:428)
	at org.apache.jena.util.iterator.NiceIterator$1.close(NiceIterator.java:152)
	at org.apache.jena.util.iterator.NiceIterator.close(NiceIterator.java:203)
	at org.apache.jena.util.iterator.Map1Iterator.close(Map1Iterator.java:64)
	at org.apache.jena.util.iterator.NiceIterator.close(NiceIterator.java:203)
	at org.apache.jena.util.iterator.Map1Iterator.close(Map1Iterator.java:64)
	at org.apache.jena.atlas.iterator.Iter.close(Iter.java:719)
	at org.apache.jena.sparql.engine.iterator.QueryIterPlainWrapper.closeIterator(QueryIterPlainWrapper.java:72)
	at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.close(QueryIteratorBase.java:194)
	at org.apache.jena.sparql.engine.iterator.QueryIter.close(QueryIter.java:88)
	at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.performClose(QueryIteratorBase.java:224)
	at org.apache.jena.sparql.engine.iterator.QueryIter1.closeIterator(QueryIter1.java:46)
	at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.close(QueryIteratorBase.java:194)
	at org.apache.jena.sparql.engine.iterator.QueryIter.close(QueryIter.java:88)
	at org.apache.jena.sparql.engine.iterator.QueryIterRepeatApply.closeSubIterator(QueryIterRepeatApply.java:110)
	at org.apache.jena.sparql.engine.iterator.QueryIter1.closeIterator(QueryIter1.java:45)
	at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.close(QueryIteratorBase.java:194)
	at org.apache.jena.sparql.engine.iterator.QueryIter.close(QueryIter.java:88)
	at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.nextBinding(QueryIteratorBase.java:156)
	at org.apache.jena.sparql.engine.iterator.QueryIterProcedure.moveToNextBinding(QueryIterProcedure.java:76)
	at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.nextBinding(QueryIteratorBase.java:166)
	at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.moveToNextBinding(QueryIteratorWrapper.java:45)
	at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.nextBinding(QueryIteratorBase.java:166)
	at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.moveToNextBinding(QueryIteratorWrapper.java:45)
	at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.nextBinding(QueryIteratorBase.java:166)
	at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.next(QueryIteratorBase.java:143)
	at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.next(QueryIteratorBase.java:46)
	at org.apache.jena.sparql.exec.RowSetStream.next(RowSetStream.java:57)
	at org.apache.jena.sparql.exec.RowSetStream.next(RowSetStream.java:33)
	at org.apache.jena.sparql.engine.ResultSetStream.nextBinding(ResultSetStream.java:90)
	at org.apache.jena.sparql.engine.ResultSetStream.nextSolution(ResultSetStream.java:115)
	at org.apache.jena.sparql.engine.ResultSetStream.next(ResultSetStream.java:123)
	at org.apache.jena.geosparql.geo.topological.CancelQueryTest.doCount(CancelQueryTest.java:133)
	at org.apache.jena.geosparql.geo.topological.CancelQueryTest.lambda$runAsyncAbort$0(CancelQueryTest.java:146)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:328)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614)
	... 1 more
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.288 s <<< FAILURE! -- in org.apache.jena.geosparql.geo.topological.CancelQueryTest
[ERROR] org.apache.jena.geosparql.geo.topological.CancelQueryTest.test_cancel_spatial_property_function1[number of geometries: 31,623] -- Time elapsed: 1.286 s <<< FAILURE!
java.lang.AssertionError: expected:<class org.apache.jena.query.QueryCancelledException> but was:<class java.lang.IndexOutOfBoundsException>
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:120)
	at org.junit.Assert.assertEquals(Assert.java:146)
	at org.apache.jena.geosparql.geo.topological.CancelQueryTest.runAsyncAbort(CancelQueryTest.java:161)
	at org.apache.jena.geosparql.geo.topological.CancelQueryTest.test_cancel_spatial_property_function1(CancelQueryTest.java:124)

@Aklakan
Copy link
Contributor

Aklakan commented Feb 14, 2026

Meh, the patch can be further simplified by removing the nested try-catch blocks in StageGeneratorGeneric:

    protected QueryIterator execute(BasicPattern pattern, ReorderTransformation reorder,
                                    QueryIterator input, ExecutionContext execCxt) {
        try {
            Explain.explain(pattern, execCxt.getContext()) ;

             if ( reorder != null && pattern.size() >= 2 ) {
                // If pattern size is 0 or 1, nothing to do.
                BasicPattern bgp2 = pattern ;

                // Try to ground the pattern
                if ( ! input.isJoinIdentity() ) {
                    QueryIterPeek peek = QueryIterPeek.create(input, execCxt) ;
                    // And now use this one
                    input = peek ;
                    Binding b = peek.peek() ;
                    bgp2 = Substitute.substitute(pattern, b) ;
                }
                ReorderProc reorderProc = reorder.reorderIndexes(bgp2) ;
                pattern = reorderProc.reorder(pattern) ;
            } else {
                if ( ! input.hasNext() )
                    return input ;
            }

            Explain.explain("Reorder/generic", pattern, execCxt.getContext()) ;
            return PatternMatchData.execute(execCxt.getActiveGraph(), pattern, input, null, execCxt) ;
        } catch (Exception e) {
            return new QueryIterFailed(input, execCxt, e);
        }
    }

@afs
Copy link
Member Author

afs commented Feb 15, 2026

StageGeneratorGeneric needs your patch - thanks (took v2, and put if ( input.hasNext ) at the start).

PR revised.

I occasionally got failing query cancel tests on a 14 core machine - before and with your PR.

What sort of machine is that?

I have a Dell PC with 12th Gen Intel® Core™ i7-12700 × 20 - 4 P cores, 16 E cores. Each P core has its own L2 cache and groups of 4 E cores have an L2 cache.

I have seen these errors once in several months of a lot of Jena builds while removing the deprecated code and other cleaning up for Jena6. The CI on Jenkins and github has it quite often when the build systems are busy.


Looks like you have found the cause of both the jena-integeration-tests and jena-geosparql test failures!

It's QueryIterAbortable for jena-integeration-tests which inherits from QueryIterPlainWrapper.

However ... 😄 ... this shows another problem.
It's not just missing the iterator and failing to close it. The code can concurrently manipulate the iterator.

requestCancel is on another thread.
The synchronized makes close / requestCancel safe.

hasNextBinding / moveToNextBinding / close can manipulate the iterator safely as they are all on the same thread.
What Rust might call the owning thread.

requestCancel / hasNextBinding / moveToNextBinding is a mixture of threads. While closing due to cancellation, *NextBinding may be active and there is no protection.

In Rust, this code pattern would not even compile!

So I think using synchronized will fix the build crashes but a complete fix needs to only perform the close on the query thread. Adding synchronized to *NextBinding is very heavy.

As using synchroized for close / requestCancel is step forward, let's put it in and see how the CI reacts. In parallel,
I'll raise an issue (PS #3755) to discuss what to do with a permanent fix which I fear may impact other iterators. One solution is a volatile so the cancel signal is put onto the query execution thread.

I'm currently at the stage of wondering why the whole system does not crash more often! This is a common state when working on concurrency bugs.

@afs afs force-pushed the gh3751-tdb-reorder branch from dab66a0 to 45e9f50 Compare February 15, 2026 16:12
@Aklakan
Copy link
Contributor

Aklakan commented Feb 15, 2026

My work notebook has: 12th Gen Intel(R) Core(TM) i9-12900HK (6P and 8E cores)

Hm, indeed a concurrent close during hasNextBinding / moveToNextBinding is not ideal - but any exception thrown there should usually still lead to a clean close in the end - the exception cause might be misleading (e.g. "iterator already closed" instead of "aborted"). (well, its not guaranteed that a close is clean while the resource is in-use concurrently - so yeah, best effort fix only)

So I think using synchronized will fix the build crashes but a complete fix needs to only perform the close on the query thread.

I agree.

@afs
Copy link
Member Author

afs commented Feb 15, 2026

usually

that word is doing a lot of work here!

The exception could be an NPE.

@afs afs merged commit e87cf2d into apache:main Feb 15, 2026
@afs afs deleted the gh3751-tdb-reorder branch February 15, 2026 21:16
@afs
Copy link
Member Author

afs commented Feb 16, 2026

I've done about 40 builds on GH and Jenkins, 15 of them this morning (Monday) which is a typically busy time.

No build failures 🎆 🥳

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Query returns 500 server error

2 participants

Comments