python - How to add features to scikit-learn DictVectorizer?


I'm training a spam detector using the MultinomialNB model in scikit-learn. I use the DictVectorizer class to transform tokens into word counts (i.e. features). I would like to be able to train the model over time as new data arrives (in our case, in the form of chat messages coming in to our app server). For this, the partial_fit function looks useful.
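A minimal sketch of the setup described above, assuming messages have already been tokenised into word-count dictionaries. The messages, the labels (1 = spam, 0 = ham) and the variable names are hypothetical illustration data, not taken from the question.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical tokenised chat messages, already turned into word-count dicts.
messages = [
    {"free": 2, "money": 1},
    {"hello": 1, "friend": 1},
    {"win": 1, "free": 1},
]
labels = [1, 0, 1]  # illustrative labels: 1 = spam, 0 = ham

vectorizer = DictVectorizer()           # sparse output by default
X = vectorizer.fit_transform(messages)  # learns the vocabulary (5 distinct words)

clf = MultinomialNB()
# partial_fit needs the full list of classes on its first call.
clf.partial_fit(X, labels, classes=[0, 1])

# A later batch can update the same model, but only in the same feature space.
new_messages = [{"free": 1, "hello": 1}]
clf.partial_fit(vectorizer.transform(new_messages), [0])
print(X.shape)  # (3, 5)
```

The second partial_fit call works only because `vectorizer.transform` maps the new batch onto the same 5 columns learned during the first fit.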

However, I can't figure out how to enlarge the DictVectorizer after it has been "trained". If new features/words arrive that have never been seen before, they are ignored. What I would like to do is pickle the current version of the model and the DictVectorizer, and update them each time we run a new training session. Is this possible?
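The behaviour described in the question can be reproduced with a small sketch: once fitted, `transform` maps input only onto the learned vocabulary, and keys it has never seen are dropped. The words used here are hypothetical examples; pickling the vectorizer simply preserves that fixed vocabulary.

```python
import pickle

from sklearn.feature_extraction import DictVectorizer

vectorizer = DictVectorizer(sparse=False)
vectorizer.fit([{"free": 1, "money": 1}, {"hello": 1}])

# "crypto" was never seen during fit, so transform() silently drops it.
row = vectorizer.transform([{"free": 1, "crypto": 3}])
print(vectorizer.feature_names_)  # ['free', 'hello', 'money']
print(row)                        # [[1. 0. 0.]]

# A pickled and restored vectorizer keeps the same fixed vocabulary,
# so it still cannot represent "crypto".
restored = pickle.loads(pickle.dumps(vectorizer))
print(restored.feature_names_)    # ['free', 'hello', 'money']
```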

As shown in the documentation, you can rely on the dictionary-learning (fit) phase of DictVectorizer: add the new features to the original list of dictionaries and call fit_transform again. That is how you add values to the DictVectorizer.

Be careful: the partial_fit method is a fairly heavy operation. As its documentation notes, each call has some performance overhead, so it is better to call it on chunks of data that are as large as possible.
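One way to follow that advice is to buffer incoming labelled messages and call partial_fit once per batch rather than once per message. The sketch below assumes the vectorizer is already fitted. `BatchedUpdater` is a hypothetical helper, not part of scikit-learn, and the default `batch_size` of 1000 is an arbitrary placeholder.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB


class BatchedUpdater:
    """Hypothetical helper: buffers labelled messages and calls partial_fit
    once per batch instead of once per message, to amortise the overhead."""

    def __init__(self, vectorizer, clf, classes, batch_size=1000):
        self.vectorizer = vectorizer  # an already-fitted DictVectorizer
        self.clf = clf
        self.classes = classes        # full list of labels, required by partial_fit
        self.batch_size = batch_size
        self.buffer_X = []
        self.buffer_y = []
        self.updates = 0              # number of partial_fit calls made so far

    def add(self, features, label):
        self.buffer_X.append(features)
        self.buffer_y.append(label)
        if len(self.buffer_X) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer_X:
            return
        X = self.vectorizer.transform(self.buffer_X)  # unseen words are still dropped
        self.clf.partial_fit(X, self.buffer_y, classes=self.classes)
        self.updates += 1
        self.buffer_X, self.buffer_y = [], []


vectorizer = DictVectorizer().fit([{"free": 1}, {"hello": 1}])
updater = BatchedUpdater(vectorizer, MultinomialNB(), classes=[0, 1], batch_size=2)
updater.add({"free": 2}, 1)
updater.add({"hello": 1}, 0)  # buffer full: one partial_fit call
updater.add({"free": 1}, 1)   # stays buffered until the next flush
print(updater.updates)        # 1
```

Batching only reduces the per-call overhead; it does not solve the vocabulary problem, since `transform` still ignores words the vectorizer has not seen.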

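```python
from sklearn.feature_extraction import DictVectorizer

v = DictVectorizer(sparse=False)
D = [{'foo': 1, 'bar': 2}, {'foo': 3, 'baz': 1}]
X = v.fit_transform(D)  # learn the vocabulary and transform

# When new data comes in (a dictionary of values), append it and fit again.
values = {'foo': 2, 'qux': 5}  # example new sample; 'qux' is an unseen feature
D.append(values)
X = v.fit_transform(D)  # re-learn the vocabulary, now including 'qux'

# Two choices:
# - wait for more modifications before re-learning, or
# - re-learn each time there is a modification (not performant).
```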
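Refitting the vectorizer changes the number of columns, so a MultinomialNB trained on the old matrix cannot simply be updated with partial_fit afterwards. A minimal sketch, assuming the full history of samples and labels is kept (the labels and the `'qux'` sample are illustrative), is to refit the classifier on the whole rebuilt matrix:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB

D = [{"foo": 1, "bar": 2}, {"foo": 3, "baz": 1}]
y = [0, 1]  # illustrative labels

v = DictVectorizer()
clf = MultinomialNB().fit(v.fit_transform(D), y)
print(clf.feature_count_.shape[1])  # 3 features: bar, baz, foo

# A new sample introduces the unseen feature "qux".
D.append({"foo": 2, "qux": 5})
y.append(1)
X = v.fit_transform(D)  # vocabulary grows to 4 features

# The old model expects 3 columns, so partial_fit on 4 columns is rejected.
rejected = False
try:
    clf.partial_fit(X, y)
except ValueError:
    rejected = True
print(rejected)  # True

# Instead, retrain the classifier from scratch on the whole history.
clf = MultinomialNB().fit(X, y)
print(clf.feature_count_.shape[1])  # 4
```

This is the cost behind the "not performant" choice above: every vocabulary change means re-vectorising and retraining on all stored data.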
