[full-text] Use paimon-full-text in Java and Python #8463
leaves12138
left a comment
Thanks for the cleanup and the migration to paimon-full-text. I found one Java regression around whole-query pushdown for nested multi_match; please fix it before merge. I also ran git diff --check and the focused PyPaimon full-text tests (vector_search_filter_test.py, global_index_build_test.py), which passed with 90 tests.
    }

    private static boolean canPushDownWholeQuery(FullTextQuery query) {
        return !(query instanceof FullTextQuery.MultiMatch) && query.columns().size() == 1;
canPushDownWholeQuery only rejects a root MultiMatch. A single-column BooleanQuery or Boost that contains a nested MultiMatch still has columns().size() == 1, so this path pushes the whole query to TantivyFullTextGlobalIndexReader.toNativeQueryJson. The new native converter has no MultiMatch branch and throws Unsupported single-column full-text query. Before this PR, the recursive evaluator handled MultiMatch by expanding it into per-column Match queries. Please either reject nested MultiMatch in the pushdown check or add native conversion coverage, ideally with a regression test.
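A minimal sketch of the first option (rejecting nested MultiMatch in the pushdown check) is shown here. The Query, Match, MultiMatch and Bool types and the children() accessor are hypothetical stand-ins written for this example; the real FullTextQuery API in Paimon may expose sub-queries differently, so treat this as an illustration of the recursive guard rather than the actual patch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for the query types named in the review comment.
final class PushdownGuardSketch {

    interface Query {
        Set<String> columns();   // all columns referenced by this query
        List<Query> children();  // direct sub-queries (assumed accessor)
    }

    static final class Match implements Query {
        final String column;
        Match(String column) { this.column = column; }
        public Set<String> columns() { return Collections.singleton(column); }
        public List<Query> children() { return Collections.emptyList(); }
    }

    static final class MultiMatch implements Query {
        final List<String> cols;
        MultiMatch(String... cols) { this.cols = Arrays.asList(cols); }
        public Set<String> columns() { return new LinkedHashSet<>(cols); }
        public List<Query> children() { return Collections.emptyList(); }
    }

    static final class Bool implements Query {
        final List<Query> clauses;
        Bool(Query... clauses) { this.clauses = Arrays.asList(clauses); }
        public Set<String> columns() {
            Set<String> all = new LinkedHashSet<>();
            for (Query q : clauses) {
                all.addAll(q.columns());
            }
            return all;
        }
        public List<Query> children() { return clauses; }
    }

    // Walks the whole query tree instead of inspecting only the root.
    static boolean containsMultiMatch(Query query) {
        if (query instanceof MultiMatch) {
            return true;
        }
        for (Query child : query.children()) {
            if (containsMultiMatch(child)) {
                return true;
            }
        }
        return false;
    }

    // Push down only single-column queries with no MultiMatch at any depth.
    static boolean canPushDownWholeQuery(Query query) {
        return !containsMultiMatch(query) && query.columns().size() == 1;
    }

    public static void main(String[] args) {
        Query nested = new Bool(new Match("content"), new MultiMatch("content"));
        Query plain = new Bool(new Match("content"));
        System.out.println(canPushDownWholeQuery(nested)); // false
        System.out.println(canPushDownWholeQuery(plain));  // true
    }
}
```

With a guard of this shape, the nested case falls through to whatever non-pushdown path the caller already has, so the native converter never sees a MultiMatch.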
leaves12138
left a comment
Thanks for addressing the nested MultiMatch pushdown regression. The recursive guard plus regression test look good to me.
Validated locally:
- git diff --check origin/master...HEAD
- PyPaimon focused full-text tests: vector_search_filter_test.py, global_index_build_test.py (90 passed)
- mvn -B -ntp -pl paimon-core -am -Pfast-build -DfailIfNoTests=false -Dtest=FullTextSearchBuilderTest#testNestedSingleColumnMultiMatchFallsBackToRecursiveEvaluation test
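The fallback exercised by that regression test relies on the recursive evaluator expanding a MultiMatch into per-column Match queries, as the earlier review comment describes. The expansion step alone might look roughly like the sketch below. The Match and MultiMatch classes and their fields are hypothetical, and how the per-column results are then combined is not covered by this thread, so it is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical types; field names are assumptions made for this example.
final class MultiMatchExpansionSketch {

    static final class Match {
        final String column;
        final String text;
        Match(String column, String text) {
            this.column = column;
            this.text = text;
        }
        @Override
        public String toString() {
            return "Match(" + column + ", " + text + ")";
        }
    }

    static final class MultiMatch {
        final List<String> columns;
        final String text;
        MultiMatch(String text, String... columns) {
            this.text = text;
            this.columns = Arrays.asList(columns);
        }
    }

    // One Match per column, each carrying the same query text.
    static List<Match> expand(MultiMatch multiMatch) {
        List<Match> perColumn = new ArrayList<>();
        for (String column : multiMatch.columns) {
            perColumn.add(new Match(column, multiMatch.text));
        }
        return perColumn;
    }

    public static void main(String[] args) {
        MultiMatch query = new MultiMatch("paimon", "title", "content");
        // Prints: [Match(title, paimon), Match(content, paimon)]
        System.out.println(expand(query));
    }
}
```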
Summary
Switch Paimon's Tantivy full-text global index implementation to the standalone paimon-full-text dependency for Java and PyPaimon. Single-column structured full-text DSL is delegated to the native reader, while Paimon keeps Java/Python orchestration for multi-column and hybrid query composition.
Changes
- paimon-full-text-index Maven dependency and route Java full-text index reader/writer calls through org.apache.paimon.index.fulltext.
- tantivy-py reader/writer path with paimon_ftindex adapters, including native roaring-filter support for include_row_ids.
- paimon-tantivy-jni.
- apache/paimon-full-text for Java and Python jobs.
Testing
- python -m py_compile paimon-python/pypaimon/globalindex/tantivy/tantivy_full_text_global_index_reader.py paimon-python/pypaimon/globalindex/tantivy/tantivy_full_text_index_writer.py paimon-python/pypaimon/tests/vector_search_filter_test.py paimon-python/pypaimon/tests/global_index_build_test.py paimon-python/pypaimon/tests/e2e/java_py_read_write_test.py
- python -m pytest paimon-python/pypaimon/tests/vector_search_filter_test.py paimon-python/pypaimon/tests/global_index_build_test.py -q
- PAIMON_FTINDEX_JNI_LIB_PATH=... mvn -B -ntp -pl paimon-tantivy/paimon-tantivy-index -am -Dtest=TantivyFullTextGlobalIndexTest,JavaPyTantivyE2ETest,TantivyFullTextGlobalIndexerFactoryTest -Drun.e2e.tests=true -DfailIfNoTests=false -Dcheckstyle.skip=true -Dspotless.check.skip=true clean test
- PAIMON_FTINDEX_LIB_PATH=... PYTHONPATH=... python -m pytest java_py_read_write_test.py::JavaPyReadWriteTest::test_read_tantivy_full_text_index -q
- git diff --check --cached
Notes
- paimon_ftindex package installation because the package requires Python 3.8+.