docs: ensemble component predict pattern + regression test (#1136) #1558
Merged
thinkall merged 2 commits on Jun 12, 2026
thinkall approved these changes on Jun 12, 2026
Pull request overview
This PR updates the documentation of FLAML's public `AutoML.preprocess(X)` method and adds a regression test to capture the ensemble-component prediction "footgun" from #1136: individual ensemble components (from `automl.model.estimators_`) are fit on task-preprocessed data, so callers must preprocess their inputs before calling `.predict()` on a component directly.
Changes:
- Extend the `AutoML.preprocess()` docstring example to include the ensemble-component prediction pattern, referencing #1136.
- Add a regression test that builds an ensemble on a DataFrame with categorical columns and verifies that component prediction succeeds after using `automl.preprocess(X)`.
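The pattern the updated docstring describes can be sketched as a small helper. This is a hedged illustration, not code from the PR: `predict_each_component` is a hypothetical name, and the sketch assumes `automl` is a fitted `flaml.AutoML` trained with `ensemble=True` (so `automl.model.estimators_` holds the fitted components, as described in #1136) and that `automl.preprocess` is the public method added in #1497.

```python
from typing import Any, List


def predict_each_component(automl: Any, X: Any) -> List[Any]:
    """Hypothetical helper: predict with every base estimator of a fitted ensemble.

    Assumes ``automl`` is a fitted ``flaml.AutoML`` trained with ``ensemble=True``
    and exposes the public ``preprocess`` method.
    """
    # Apply the same task-level transformation the components saw during fit.
    # It is computed once and reused for every component.
    X_transformed = automl.preprocess(X)
    # Each component can now be called directly on the transformed data.
    return [component.predict(X_transformed) for component in automl.model.estimators_]
```

In a real run it would be called as `predict_each_component(automl, X_test)` after `automl.fit(X_train, y_train, ensemble=True, ...)`. The design choice is simply to transform once and share the result, instead of calling `preprocess` per component.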
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `flaml/automl/automl.py` | Docstring example updated to demonstrate preprocessing before predicting with a single ensemble component. |
| `test/automl/test_regression.py` | Adds a regression test covering the ensemble-component prediction workflow with categorical features using the public `preprocess()` API. |
Why are these changes needed?
#1136 reported a long-standing footgun: when `automl.fit(..., ensemble=True)` is used on a DataFrame with categorical features, calling `automl.model.estimators_[i].predict(X_raw)` on a single ensemble component throws cryptic errors (LightGBM: `train and valid dataset categorical_feature do not match`; XGBoost: `DataFrame.dtypes for data must be int, float, bool or category`; sklearn estimators: `feature_names should match those that were passed during fit`). The reporter found a workaround that reached into private state: `automl._state.task.preprocess(X, automl._transformer)`.

PR #1497 (merged 2026-01-21) already added the public `automl.preprocess(X)` method, which performs exactly this transformation without touching private state. The underlying functional gap is therefore already closed, but the docstring example didn't show the ensemble-component case, and there was no regression test pinning the original failure surface. This PR closes both gaps:

- `flaml/automl/automl.py`: extend the `preprocess()` docstring example to show the per-component prediction pattern, referencing #1136.
- `test/automl/test_regression.py`: add `test_ensemble_component_predict_via_public_preprocess`, which builds an ensemble on a DataFrame with categorical features (`gender`, `education`, the exact failure surface from #1136), asserts that at least one component fails on raw input, and verifies that all components succeed once the data is run through `automl.preprocess(X)`.

After this lands, #1136 can be closed.
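The two checks the regression test makes can be sketched as a duck-typed helper. This is an illustration under stated assumptions, not the actual body of `test_ensemble_component_predict_via_public_preprocess`: `check_components_need_preprocessing` is a hypothetical name, and it assumes only that `automl` exposes `model.estimators_` and a `preprocess` method, as a fitted FLAML ensemble does per the description above.

```python
from typing import Any


def check_components_need_preprocessing(automl: Any, X_raw: Any) -> int:
    """Hypothetical helper mirroring the two assertions described for the test.

    1. At least one ensemble component fails when given raw input.
    2. Every component succeeds on ``automl.preprocess(X_raw)``.
    Returns the number of components that failed on raw input.
    """
    components = list(automl.model.estimators_)
    assert components, "expected a fitted ensemble with at least one component"

    # Check 1: pin the original failure surface from #1136.
    raw_failures = 0
    for component in components:
        try:
            component.predict(X_raw)
        except Exception:  # catch broadly: the raw-input errors come from different libraries
            raw_failures += 1
    assert raw_failures > 0, "expected at least one component to reject raw input"

    # Check 2: the public preprocess() path must work for every component.
    X_transformed = automl.preprocess(X_raw)
    for component in components:
        component.predict(X_transformed)  # any exception here fails the check
    return raw_failures
```

Asserting that raw input fails, not only that preprocessed input succeeds, keeps the test meaningful: if a future change made components accept raw data, the first assertion would flag that the pinned behavior changed.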
Verified locally
- `pytest test/automl/test_regression.py::test_ensemble_component_predict_via_public_preprocess`: passes.
- The `test_multioutput` test continues to pass.
- `pre-commit run --files flaml/automl/automl.py test/automl/test_regression.py`: all hooks pass.

Related issue number
Closes #1136 (functional resolution shipped in #1497; this PR adds the missing example + regression test).
Checks