Order of nominal feature levels changes the results #629

@aswer-svg

Description

Hi,

I am building an EBM regressor with some nominal and continuous features, using main effects only. To make the visualizations easier to read, I tried ordering the levels of each nominal feature by the mean of the response. For example, if a feature has levels named ["cat", "dog", "fish"], the transformed levels would be ["2-cat", "0-dog", "1-fish"], where the number at the start indicates the level's rank with respect to the mean of the response.
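To make the renaming concrete, here is a minimal sketch in plain Python. The helper name `rank_prefix_levels` and the toy data are made up for illustration and are not taken from my actual pipeline; I am assuming ranks are assigned in ascending order of the mean response, with ties broken by level name.

```python
from collections import defaultdict


def rank_prefix_levels(levels, response):
    """Rename each level to '<rank>-<level>'.

    The rank is the position of the level when all levels are sorted by
    their mean response in ascending order (hypothetical helper).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for level, y in zip(levels, response):
        totals[level] += y
        counts[level] += 1

    # Mean of the response for each level.
    means = {level: totals[level] / counts[level] for level in totals}

    # Sort levels by mean response; ties are broken by the level name.
    ordered = sorted(means, key=lambda level: (means[level], level))
    mapping = {level: f"{rank}-{level}" for rank, level in enumerate(ordered)}
    return [mapping[level] for level in levels], mapping


# Toy data, invented so that dog < fish < cat in mean response.
pets = ["cat", "dog", "fish", "cat", "dog", "fish"]
y = [9.0, 1.0, 4.0, 11.0, 3.0, 6.0]

renamed, mapping = rank_prefix_levels(pets, y)
print(mapping)  # {'dog': '0-dog', 'fish': '1-fish', 'cat': '2-cat'}
```

Only the labels change here; which rows belong to which level stays exactly the same, which is why I expected the fitted model to be unaffected.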

With this transformation, the scores of the transformed features' levels change a lot. One categorical feature had quite low importance before the transformation and became the most important feature after it.

I didn't specify the feature_types parameter, and the categorical features have type "nominal" when I inspect feature_types_in_.

As far as I know, EBM uses the Fisher method for nominal categoricals, so the order of the levels shouldn't matter.

Can you please clarify the underlying cause of this discrepancy in the model results?

Take care,
