python - How to add features to scikit-learn DictVectorizer? -

I'm training a spam detector using the MultinomialNB model in scikit-learn. I use the DictVectorizer class to transform tokens into word counts (i.e. features). I would like to be able to train the model over time using new data as it arrives (in this case in the form of chat messages incoming to our app server). For this, it looks like the partial_fit function will be useful.

However, I can't seem to figure out how to enlarge the size of the DictVectorizer after it has been "trained". If new features/words arrive that have never been seen, they are simply ignored. I would like to pickle the current version of the model and the DictVectorizer and update them each time we run a new training session. Is this possible?
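The behaviour described in the question can be reproduced with a minimal sketch. The token names and counts here are made up for illustration; the point is only that transform drops any key that was not present when the vectorizer was fitted.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical token counts for two chat messages.
vec = DictVectorizer(sparse=False)
vec.fit([{'free': 2, 'offer': 1}, {'hello': 1, 'meeting': 1}])

# The learned vocabulary is fixed at fit time (sorted by default).
print(vec.feature_names_)

# 'prize' was never seen during fit, so it is silently ignored:
# the output row still has one column per fitted feature.
row = vec.transform([{'free': 1, 'prize': 3}])
print(row)
```

The unseen key raises no error, which is why the missing features are easy to overlook.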

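In the documentation, they use a dictionary for the learning phase of the DictVectorizer. You could add the new feature to the original dictionary data and call fit_transform again. That way you add the value to the DictVectorizer.

Be careful: the partial_fit method is a fairly heavy operation. As the method's documentation says, it carries some processing overhead, so it is better to call it on chunks of data that are as large as possible.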
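For reference, here is a minimal sketch of how partial_fit is typically called on MultinomialNB. It assumes the vocabulary is already fixed before training starts, since every batch must have the same number of columns. The batches and labels (1 = spam, 0 = ham) are invented for illustration.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical batches of token counts with labels (1 = spam, 0 = ham).
batch1 = [{'free': 2, 'offer': 1}, {'hello': 1, 'meeting': 1}]
labels1 = [1, 0]
batch2 = [{'offer': 3}, {'meeting': 2, 'hello': 1}]
labels2 = [1, 0]

# Assumption: the full vocabulary is known up front, so every batch
# is transformed to the same number of columns.
vec = DictVectorizer()
vec.fit(batch1 + batch2)

clf = MultinomialNB()
# The first call must list all classes; later calls may omit them.
clf.partial_fit(vec.transform(batch1), labels1, classes=[0, 1])
clf.partial_fit(vec.transform(batch2), labels2)

pred = clf.predict(vec.transform([{'free': 1, 'offer': 1}]))
print(pred)
```

Feeding a few large batches rather than many tiny ones keeps the per-call overhead small.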

    from sklearn.feature_extraction import DictVectorizer

    v = DictVectorizer(sparse=False)
    d = [{'foo': 1, 'bar': 2}, {'foo': 3, 'baz': 1}]
    x = v.fit_transform(d)  # learn the features and transform

    # When new data comes in (values is a feature dictionary;
    # the one below is only a placeholder example)
    values = {'foo': 2, 'qux': 1}
    d.append(values)
    x = v.fit_transform(d)  # fit again so the new feature gets a column

    # Two choices:
    # - wait for more modifications before learning again, or
    # - learn each time there is a modification (not performant)
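One consequence of refitting is worth spelling out: the feature matrix gains a column, so a classifier trained on the old matrix no longer matches it. The sketch below assumes the simplest remedy, retraining MultinomialNB from scratch on the refitted data. The samples and labels are hypothetical.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training data (1 = spam, 0 = ham).
samples = [{'free': 2, 'offer': 1}, {'hello': 1, 'meeting': 1}]
labels = [1, 0]

vec = DictVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(samples), labels)
old_width = len(vec.feature_names_)

# A new message introduces the previously unseen token 'prize'.
samples.append({'free': 1, 'prize': 2})
labels.append(1)

# Refit the vectorizer on all data; the vocabulary grows by one.
X = vec.fit_transform(samples)
new_width = len(vec.feature_names_)

# The old model expects old_width columns, so train a fresh one.
clf = MultinomialNB().fit(X, labels)
print(old_width, new_width)
```

Both objects can then be pickled together so that the stored vectorizer and model always agree on the number of features.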
