`nltk.tag.pos_tag` uses NLTK's currently recommended part-of-speech tagger to tag a given list of tokens. It returns a list of (token, tag) tuples.
>>> from nltk.tag import pos_tag
>>> from nltk.tokenize import word_tokenize
>>> pos_tag(word_tokenize("John's big idea isn't all that bad."))
[('John', 'NNP'), ("'s", 'POS'), ('big', 'JJ'), ('idea', 'NN'), ('is',
'VBZ'), ("n't", 'RB'), ('all', 'DT'), ('that', 'DT'), ('bad', 'JJ'),
('.', '.')]
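Because the result is a plain list of (token, tag) tuples, it can be processed with ordinary Python. The following is a minimal sketch, not part of NLTK: it reuses the output shown above as literal data, and the helper name `group_by_tag` is hypothetical. The tags your NLTK version assigns may differ from the ones copied here.

```python
from collections import defaultdict

# Tagged output copied from the example above, so no NLTK call is needed here.
tagged = [('John', 'NNP'), ("'s", 'POS'), ('big', 'JJ'), ('idea', 'NN'),
          ('is', 'VBZ'), ("n't", 'RB'), ('all', 'DT'), ('that', 'DT'),
          ('bad', 'JJ'), ('.', '.')]


def group_by_tag(pairs):
    """Map each tag to the list of tokens carrying it, in original order.

    Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for token, tag in pairs:
        groups[tag].append(token)
    return dict(groups)


by_tag = group_by_tag(tagged)
adjectives = by_tag.get('JJ', [])  # tokens tagged as adjectives
print(adjectives)  # ['big', 'bad']
```

Grouping by tag keeps the original token order within each group, which is usually what you want when pulling out, say, all adjectives or all nouns from a sentence.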
The tags come from the Penn Treebank Project tag set. An alphabetical list of these part-of-speech tags is available at the last link under References.
[References]
http://www.nltk.org/api/nltk.tag.html?highlight=pos_tag#nltk.tag.pos_tag
http://www.cis.upenn.edu/~treebank/
http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html