[full-text] Use paimon-full-text in Java and Python#8463

Merged
JingsongLi merged 5 commits into master from codex/paimon-full-text-dependency on Jul 5, 2026

Conversation

@JingsongLi

Contributor

Summary

Switch Paimon's Tantivy full-text global index implementation to the standalone paimon-full-text dependency for Java and PyPaimon. Single-column structured full-text DSL is delegated to the native reader, while Paimon keeps Java/Python orchestration for multi-column and hybrid query composition.

Changes

  • Add the paimon-full-text-index Maven dependency and route Java full-text index reader/writer calls through org.apache.paimon.index.fulltext.
  • Replace PyPaimon's tantivy-py reader/writer path with paimon_ftindex adapters, including native roaring-filter support for include_row_ids.
  • Push whole single-column full-text queries down to the native reader; keep multi-match/cross-column planning in Paimon.
  • Remove index-module archive layout and searcher-pool helpers that were specific to paimon-tantivy-jni.
  • Update Java/Python tests and mixed-language E2E coverage for the new dependency.
  • Update CI workflows to checkout, build, and install apache/paimon-full-text for Java and Python jobs.
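The include_row_ids filter mentioned in the changes above can be pictured with a small sketch. This is a plain-Python model, not the paimon_ftindex API: the function name `top_k_with_filter`, the `(row_id, score)` hit shape, and the use of a Python `set` in place of a roaring bitmap are all assumptions made for illustration. It also assumes that include_row_ids acts as an allow-list applied before the top-k cutoff, which is presumably why a reader-level filter is useful.

```python
from typing import List, Optional, Set, Tuple

Hit = Tuple[int, float]  # hypothetical (row_id, score) pair


def top_k_with_filter(
    hits: List[Hit], include_row_ids: Optional[Set[int]], limit: int
) -> List[Hit]:
    """Keep only allowed row ids, then return the `limit` best-scoring hits.

    A Python set stands in for the roaring bitmap; None means "no filter".
    """
    if include_row_ids is not None:
        hits = [hit for hit in hits if hit[0] in include_row_ids]
    # Sort by descending score; ties are broken by ascending row id.
    ranked = sorted(hits, key=lambda hit: (-hit[1], hit[0]))
    return ranked[:limit]
```

Filtering before the cutoff matters in this model: if the cutoff came first, rows inside the allow-list could be crowded out by higher-scoring rows that are later discarded.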

Testing

  • python -m py_compile paimon-python/pypaimon/globalindex/tantivy/tantivy_full_text_global_index_reader.py paimon-python/pypaimon/globalindex/tantivy/tantivy_full_text_index_writer.py paimon-python/pypaimon/tests/vector_search_filter_test.py paimon-python/pypaimon/tests/global_index_build_test.py paimon-python/pypaimon/tests/e2e/java_py_read_write_test.py
  • python -m pytest paimon-python/pypaimon/tests/vector_search_filter_test.py paimon-python/pypaimon/tests/global_index_build_test.py -q
  • PAIMON_FTINDEX_JNI_LIB_PATH=... mvn -B -ntp -pl paimon-tantivy/paimon-tantivy-index -am -Dtest=TantivyFullTextGlobalIndexTest,JavaPyTantivyE2ETest,TantivyFullTextGlobalIndexerFactoryTest -Drun.e2e.tests=true -DfailIfNoTests=false -Dcheckstyle.skip=true -Dspotless.check.skip=true clean test
  • PAIMON_FTINDEX_LIB_PATH=... PYTHONPATH=... python -m pytest java_py_read_write_test.py::JavaPyReadWriteTest::test_read_tantivy_full_text_index -q
  • git diff --check --cached

Notes

@leaves12138 left a comment

Contributor

Thanks for the cleanup and the migration to paimon-full-text. I found one Java regression around whole-query pushdown for nested multi_match; please fix it before merge. I also ran git diff --check and the focused PyPaimon full-text tests (vector_search_filter_test.py, global_index_build_test.py); all 90 tests passed.

}

private static boolean canPushDownWholeQuery(FullTextQuery query) {
    return !(query instanceof FullTextQuery.MultiMatch) && query.columns().size() == 1;

Contributor

canPushDownWholeQuery only rejects a root MultiMatch. A single-column BooleanQuery or Boost that contains a nested MultiMatch still has columns().size() == 1, so this path pushes the whole query to TantivyFullTextGlobalIndexReader.toNativeQueryJson. The new native converter has no MultiMatch branch, so it throws with the message "Unsupported single-column full-text query". Before this PR, the recursive evaluator handled MultiMatch by expanding it into per-column Match queries. Please either reject nested MultiMatch in the pushdown check or add native conversion coverage, ideally with a regression test.
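The first option in the comment above (rejecting nested MultiMatch in the pushdown check) can be sketched as a recursive guard. This is a minimal Python model of the idea, not the Java code from this PR: the classes `Match`, `MultiMatch`, `Boost`, and `BooleanQuery`, their fields, and the helpers `columns`, `contains_multi_match`, `can_push_down_whole_query`, and `expand_multi_match` are hypothetical stand-ins for the FullTextQuery types.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass
class Match:
    column: str
    text: str


@dataclass
class MultiMatch:
    columns: List[str]
    text: str


@dataclass
class Boost:
    query: "Query"
    factor: float


@dataclass
class BooleanQuery:
    must: List["Query"]


Query = Union[Match, MultiMatch, Boost, BooleanQuery]


def columns(query: Query) -> Set[str]:
    """Collect every column referenced anywhere in the query tree."""
    if isinstance(query, Match):
        return {query.column}
    if isinstance(query, MultiMatch):
        return set(query.columns)
    if isinstance(query, Boost):
        return columns(query.query)
    if isinstance(query, BooleanQuery):
        return set().union(*(columns(child) for child in query.must))
    raise TypeError(f"unknown query type: {type(query).__name__}")


def contains_multi_match(query: Query) -> bool:
    """Return True if a MultiMatch appears at the root or at any depth."""
    if isinstance(query, MultiMatch):
        return True
    if isinstance(query, Boost):
        return contains_multi_match(query.query)
    if isinstance(query, BooleanQuery):
        return any(contains_multi_match(child) for child in query.must)
    return False


def can_push_down_whole_query(query: Query) -> bool:
    """Push down only single-column trees with no MultiMatch at any level."""
    return len(columns(query)) == 1 and not contains_multi_match(query)


def expand_multi_match(query: MultiMatch) -> List[Match]:
    """Fallback path: rewrite a MultiMatch as one Match per column."""
    return [Match(column, query.text) for column in query.columns]
```

In this model, a `BooleanQuery` over the single column `title` that wraps a `Boost` around a `MultiMatch` on `title` still reports one column, yet `can_push_down_whole_query` returns `False`, so the query stays on the recursive path where `expand_multi_match` applies.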

@leaves12138 left a comment

Contributor

Thanks for addressing the nested MultiMatch pushdown regression. The recursive guard plus regression test look good to me.

Validated locally:

  • git diff --check origin/master...HEAD
  • PyPaimon focused full-text tests: vector_search_filter_test.py, global_index_build_test.py (90 passed)
  • mvn -B -ntp -pl paimon-core -am -Pfast-build -DfailIfNoTests=false -Dtest=FullTextSearchBuilderTest#testNestedSingleColumnMultiMatchFallsBackToRecursiveEvaluation test

@JingsongLi merged commit baaaa0e into master on Jul 5, 2026
30 of 31 checks passed

2 participants