This repository has been archived by the owner on Jan 31, 2023. It is now read-only.

Error when predicting with a converted model built with CountVectorizer(binary=True) #19

Open
phongvis opened this issue Feb 8, 2022 · 0 comments

phongvis commented Feb 8, 2022

Describe the bug
Calling predict on a converted sklearn pipeline raises an error when the pipeline was built with CountVectorizer(binary=True). The same pipeline works when binary=False.

To Reproduce

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from pure_sklearn.map import convert_estimator

vectorizer = CountVectorizer(binary=True)
model = LogisticRegression(random_state=0)
pipeline = Pipeline([
    ('vect', vectorizer),
    ('clf', model)
])

X_train = ['one text', 'two text', 'three text']
y_train = ['1', '2', '3']
pipeline.fit(X_train, y_train)
converted = convert_estimator(pipeline)
converted.predict(['four'])

The same script runs without error if the vectorizer is created with binary=False.
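For anyone triaging this, here is a minimal pure-Python sketch of what the binary flag changes in the vectorizer output. It is not pure_sklearn's or scikit-learn's implementation; the helper names (tokenize, build_vocabulary, vectorize) are hypothetical and only mimic CountVectorizer's default lowercasing, token pattern, and sorted vocabulary. It does not identify the cause of the error; it only shows the two outputs that a converted model would need to reproduce, including the all-zero row produced by an unseen word such as 'four'.

```python
import re
from collections import Counter

# CountVectorizer's default token pattern: tokens of two or more word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    """Lowercase and split text the way the default CountVectorizer does."""
    return TOKEN_PATTERN.findall(text.lower())


def build_vocabulary(docs):
    """Map each distinct term to a column index, in sorted term order."""
    terms = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {term: idx for idx, term in enumerate(terms)}


def vectorize(doc, vocabulary, binary=False):
    """Return one dense row: term counts, or 0/1 indicators if binary=True."""
    counts = Counter(tok for tok in tokenize(doc) if tok in vocabulary)
    row = [0] * len(vocabulary)
    for term, n in counts.items():
        # binary=True clips every non-zero count to 1.
        row[vocabulary[term]] = 1 if binary else n
    return row


vocab = build_vocabulary(['one text', 'two text', 'three text'])

counts_row = vectorize('text text one', vocab, binary=False)
binary_row = vectorize('text text one', vocab, binary=True)

# 'four' is not in the training vocabulary, so every column is zero.
unseen_row = vectorize('four', vocab, binary=True)

print(vocab)
print(counts_row, binary_row, unseen_row)
```

The only difference between the two modes is the clipping step, so the binary=True code path in the converted model is the natural place to look first.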

Expected behavior
Prediction with the converted model should succeed without raising an error, as it does when binary=False.


@phongvis phongvis added the bug Something isn't working label Feb 8, 2022