docs: ensemble component predict pattern + regression test (#1136) by immu4989 · Pull Request #1558 · microsoft/FLAML

immu4989 · 2026-06-09T05:22:39Z

Why are these changes needed?

#1136 reported a long-standing footgun: when automl.fit(..., ensemble=True) is used on a DataFrame with categorical features, calling automl.model.estimators_[i].predict(X_raw) on a single ensemble component raises cryptic errors (LightGBM: "train and valid dataset categorical_feature do not match"; XGBoost: "DataFrame.dtypes for data must be int, float, bool or category"; sklearn estimators: "feature_names should match those that were passed during fit"). The reporter found a workaround that reached into private state: automl._state.task.preprocess(X, automl._transformer).

PR #1497 (merged 2026-01-21) already added the public automl.preprocess(X) method that performs exactly this transformation without touching private state. The underlying functional gap is therefore already closed — but the docstring example didn't show the ensemble-component case, and there was no regression test pinning the original failure surface. This PR closes both gaps:

flaml/automl/automl.py — extend the preprocess() docstring example to show the per-component prediction pattern, referencing #1136.
test/automl/test_regression.py — add test_ensemble_component_predict_via_public_preprocess, which builds an ensemble on a DataFrame with categorical features (gender, education — the exact failure surface from #1136), asserts that at least one component fails on raw input, and verifies all components succeed once data is run through automl.preprocess(X).
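The per-component pattern described above can be sketched roughly as follows. This is a minimal illustration, not the docstring text added by this PR: `predict_per_component` is a hypothetical helper name, and the code assumes only the two pieces named in this description, the public `automl.preprocess(X)` method and the fitted components in `automl.model.estimators_`.

```python
def predict_per_component(automl, X_raw):
    """Return one prediction result per fitted ensemble component.

    Hypothetical helper (not part of FLAML). `automl` is assumed to be an
    AutoML instance that was already fit with ensemble=True.
    """
    # Run the raw input through the same preprocessing that was applied
    # before the components were trained, using only the public API.
    X_ready = automl.preprocess(X_raw)

    # Each entry of estimators_ is a fitted base learner of the ensemble;
    # it expects preprocessed input, not the raw DataFrame.
    return [est.predict(X_ready) for est in automl.model.estimators_]
```

After something like `automl.fit(X_train, y_train, task="classification", ensemble=True)`, calling `predict_per_component(automl, X_test)` would return one prediction array per component, whereas passing `X_test` straight to a component is the case that can fail as reported in #1136.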

After this lands, #1136 can be closed.

Verified locally

New test: pytest test/automl/test_regression.py::test_ensemble_component_predict_via_public_preprocess — passes.
Adjacent test_multioutput test continues to pass.
pre-commit run --files flaml/automl/automl.py test/automl/test_regression.py — all hooks pass.

Related issue number

Closes #1136 (functional resolution shipped in #1497; this PR adds the missing example + regression test).

Checks

I've used pre-commit to lint the changes in this PR (note the same is integrated in our CI checks).
I've included any doc changes needed for https://microsoft.github.io/FLAML/. See https://microsoft.github.io/FLAML/docs/Contribute#documentation to build and test documentation locally.
I've added tests (if relevant) corresponding to the changes introduced in this PR.
I've made sure all auto checks have passed.

Copilot

Pull request overview

This PR updates FLAML’s public AutoML.preprocess(X) documentation and adds a regression test to capture the ensemble-component prediction “footgun” from #1136, where individual ensemble components (from automl.model.estimators_) are fit on task-preprocessed data and therefore require callers to preprocess inputs before calling .predict() directly.
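To make the mechanism concrete, here is a toy model in plain Python. It is not FLAML's implementation: `ToyComponent`, `ToyEnsemble`, and the integer encoding are all invented for illustration. It only shows why a component trained on preprocessed data rejects raw input while the top-level predict path works.

```python
class ToyComponent:
    """Stand-in for a base learner that was fit on numeric, encoded rows."""

    def predict(self, rows):
        for row in rows:
            if not all(isinstance(v, (int, float)) for v in row):
                # Mirrors the kind of failure seen when raw data is passed in.
                raise TypeError("component expects numeric features")
        return [sum(row) % 2 for row in rows]


class ToyEnsemble:
    """Stand-in for the fitted ensemble plus its preprocessing step."""

    def __init__(self, categories):
        # Encoding learned at fit time: category string -> integer code.
        self._codes = {c: i for i, c in enumerate(categories)}
        self.estimators_ = [ToyComponent(), ToyComponent()]

    def preprocess(self, rows):
        # Replace category strings with their codes; leave numbers alone.
        return [
            [self._codes[v] if isinstance(v, str) else v for v in row]
            for row in rows
        ]

    def predict(self, rows):
        # The top-level path preprocesses before delegating to a component.
        return self.estimators_[0].predict(self.preprocess(rows))


ensemble = ToyEnsemble(["F", "M"])
raw = [["F", 30], ["M", 41]]

try:
    ensemble.estimators_[0].predict(raw)  # raw strings reach the component
    raw_failed = False
except TypeError:
    raw_failed = True

component_preds = ensemble.estimators_[0].predict(ensemble.preprocess(raw))
```

In this toy, the direct call on `raw` fails, and the same component succeeds once the rows have gone through `preprocess`, which is the shape of the behaviour the regression test pins down.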

Changes:

Extend AutoML.preprocess() docstring example to include the ensemble-component prediction pattern referencing #1136.
Add a regression test that builds an ensemble on a DataFrame with categorical columns and verifies component prediction succeeds after using automl.preprocess(X).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `flaml/automl/automl.py` | Docstring example updated to demonstrate preprocessing before predicting with a single ensemble component. |
| `test/automl/test_regression.py` | Adds a regression test covering the ensemble-component prediction workflow with categorical features using the public `preprocess()` API. |

immu4989 and others added 2 commits June 9, 2026 00:22

docs: ensemble component predict pattern + regression test (microsoft…

e0fa821

…#1136)

Merge branch 'main' into flaml-fix-1136-ensemble-component-preprocess

96093c3

thinkall requested a review from Copilot June 12, 2026 07:13

Copilot started reviewing on behalf of thinkall June 12, 2026 07:13

thinkall approved these changes Jun 12, 2026

Copilot AI reviewed Jun 12, 2026

Comment thread test/automl/test_regression.py

Comment thread test/automl/test_regression.py

Comment thread test/automl/test_regression.py

Comment thread flaml/automl/automl.py

thinkall merged commit 7d25e03 into microsoft:main Jun 12, 2026
13 checks passed
