- nltk.chunk.api.ChunkParserI(nltk.parse.api.ParserI)
  - TreeTaggerChunker
- nltk.tag.api.TaggerI(builtins.object)
  - TreeTagger
class TreeTagger(nltk.tag.api.TaggerI)
A class for part-of-speech tagging with TreeTagger. The default encoding
used by TreeTagger is utf-8. The constructor takes:
- the language, which selects a model trained on training data
- (optionally) the path to the TreeTagger binary
This class communicates with the TreeTagger binary via pipes.
Example:

.. doctest::
    :options: +SKIP

    >>> from treetagger import TreeTagger
    >>> tt = TreeTagger(language='english')
    >>> tt.tag('What is the airspeed of an unladen swallow?')
    [['What', 'WP', 'what'],
     ['is', 'VBZ', 'be'],
     ['the', 'DT', 'the'],
     ['airspeed', 'NN', 'airspeed'],
     ['of', 'IN', 'of'],
     ['an', 'DT', 'an'],
     ['unladen', 'JJ', '<unknown>'],
     ['swallow', 'NN', 'swallow'],
     ['?', 'SENT', '?']]
- Method resolution order:
- TreeTagger
- nltk.tag.api.TaggerI
- builtins.object
Methods defined here:
- __init__(self, path_to_treetagger=None, language='english', verbose=False, abbreviation_list=None)
- Initialize the TreeTagger.
:param language: The language of the model to load. Default is english.
The model encoding is utf-8. Unicode tokens passed to the tag() method
are converted to this charset when they are sent to TreeTagger.
Byte-string tokens are sent as-is, so the caller must ensure that
they are already encoded in the right charset.
- get_installed_lang(self)
- Returns a list of the languages installed for TreeTagger.
- get_treetagger_path(self)
- Returns the path to TreeTagger.
- tag(self, sentences)
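- Tags a single sentence, given as a list of words (the class example
passes a plain string instead), and returns a list of [word, tag, lemma]
lists. The tokens should not contain any newline characters.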
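TreeTagger itself reads tokenized input one token per line, which is why tokens must not contain newlines. Assuming the binary is run with lemmatization enabled, it writes one tab-separated `word<TAB>tag<TAB>lemma` line per token. The sketch below illustrates that exchange format with two hypothetical helpers, `to_treetagger_input` and `parse_treetagger_output`, which are not part of this module; the sample output lines are copied from the example above rather than produced by a real TreeTagger run.

```python
# Hypothetical helpers illustrating TreeTagger's line-based exchange format;
# they are NOT part of the treetagger module.


def to_treetagger_input(tokens):
    """Join tokens into one-token-per-line text, rejecting embedded newlines."""
    for tok in tokens:
        # A newline inside a token would be read as a token boundary.
        if "\n" in tok or "\r" in tok:
            raise ValueError(f"token contains a newline: {tok!r}")
    return "".join(tok + "\n" for tok in tokens)


def parse_treetagger_output(text):
    """Split 'word<TAB>tag<TAB>lemma' lines into [word, tag, lemma] lists."""
    return [line.split("\t") for line in text.splitlines() if line.strip()]


tokens = ["What", "is", "the", "airspeed", "?"]
print(to_treetagger_input(tokens), end="")

# Sample output in the assumed tab-separated format (values from the example above).
raw = "What\tWP\twhat\nis\tVBZ\tbe\nthe\tDT\tthe\nairspeed\tNN\tairspeed\n?\tSENT\t?\n"
print(parse_treetagger_output(raw))
```

Rejecting bad tokens before writing to the pipe keeps a single stray newline from silently shifting every later tag onto the wrong word.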
Data and other attributes defined here:
- __abstractmethods__ = frozenset()
Methods inherited from nltk.tag.api.TaggerI:
- evaluate(self, gold)
- Score the accuracy of the tagger against the gold standard.
Strip the tags from the gold standard text, retag it using
the tagger, then compute the accuracy score.
:type gold: list(list(tuple(str, str)))
:param gold: The list of tagged sentences to score the tagger on.
:rtype: float
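The gold standard for `evaluate` uses `(word, tag)` tuples, whereas `tag()` in the class example returns `[word, tag, lemma]` lists. The following is a minimal sketch of the strip, retag and score procedure described above, written against plain data so that it runs without TreeTagger. `fake_tag` is a hypothetical stand-in for `tt.tag`, and `to_pairs` drops the lemma so that the two formats can be compared; this is an illustration, not NLTK's implementation.

```python
# Sketch of the strip / retag / score procedure; not NLTK's implementation.


def to_pairs(tagged):
    """Drop the lemma: [word, tag, lemma] lists -> (word, tag) tuples."""
    return [(item[0], item[1]) for item in tagged]


def fake_tag(words):
    """Hypothetical stand-in for TreeTagger.tag(); returns [word, tag, lemma] lists."""
    lexicon = {"What": "WP", "is": "VBZ", "the": "DT", "airspeed": "NN", "?": "SENT"}
    return [[w, lexicon.get(w, "NN"), w.lower()] for w in words]


def token_accuracy(tag_fn, gold):
    """Strip the gold tags, retag each sentence, and return the fraction of matching tokens."""
    correct = total = 0
    for sent in gold:
        words = [word for word, _ in sent]  # strip the gold tags
        predicted = to_pairs(tag_fn(words))  # retag, then drop lemmas
        correct += sum(p == g for p, g in zip(predicted, sent))
        total += len(sent)
    return correct / total if total else 0.0


gold = [
    [("What", "WP"), ("is", "VBZ"), ("the", "DT"), ("airspeed", "NN"), ("?", "SENT")],
    [("swallow", "VB")],  # fake_tag falls back to NN here, so this token is wrong
]
print(token_accuracy(fake_tag, gold))
```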
- tag_sents(self, sentences)
- Apply ``self.tag()`` to each element of *sentences*. I.e.:
return [self.tag(sent) for sent in sentences]
Data descriptors inherited from nltk.tag.api.TaggerI:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
class TreeTaggerChunker(nltk.chunk.api.ChunkParserI)
A class for chunking with the TreeTagger chunker. The default encoding
used by TreeTagger is utf-8. The constructor takes:
- the language, which selects a model trained on training data
- (optionally) the path to the TreeTagger binary
This class communicates with the TreeTagger chunker binary via pipes.
Example:

.. doctest::
    :options: +SKIP

    >>> from treetagger import TreeTaggerChunker
    >>> tt = TreeTaggerChunker(language='english')
    >>> tt.parse('What is the airspeed of an unladen swallow?')
    [['<NC>'], ['What', 'WP', 'what'], ['</NC>'], ['<VC>'],
     ['is', 'VBZ', 'be'], ['</VC>'], ['<NC>'], ['the', 'DT', 'the'],
     ['airspeed', 'NN', 'airspeed'], ['</NC>'], ['<PC>'],
     ['of', 'IN', 'of'], ['<NC>'], ['an', 'DT', 'an'],
     ['unladen', 'JJ', '<unknown>'], ['swallow', 'NN', 'swallow'],
     ['</NC>'], ['</PC>'], ['?', 'SENT', '?']]

.. doctest::
    :options: +SKIP

    >>> from treetagger import TreeTaggerChunker
    >>> tt = TreeTaggerChunker(language='english')
    >>> tt.parse_to_tree('What is the airspeed of an unladen swallow?')
    Tree('S', [Tree('NC', [Tree('What', ['WP'])]),
               Tree('VC', [Tree('is', ['VBZ'])]),
               Tree('NC', [Tree('the', ['DT']), Tree('airspeed', ['NN'])]),
               Tree('PC', [Tree('of', ['IN']),
                           Tree('NC', [Tree('an', ['DT']),
                                       Tree('unladen', ['JJ']),
                                       Tree('swallow', ['NN'])])]),
               Tree('?', ['SENT'])])

.. doctest::
    :options: +SKIP

    >>> from nltk.tree import Tree
    >>> from treetagger import TreeTaggerChunker
    >>> tt = TreeTaggerChunker(language='english')
    >>> res = tt.parse_to_tree('What is the airspeed of an unladen swallow?')
    >>> print(res)
    (S
      (NC (What WP))
      (VC (is VBZ))
      (NC (the DT) (airspeed NN))
      (PC (of IN) (NC (an DT) (unladen JJ) (swallow NN)))
      (? SENT))
- Method resolution order:
- TreeTaggerChunker
- nltk.chunk.api.ChunkParserI
- nltk.parse.api.ParserI
- builtins.object
Methods defined here:
- __init__(self, path_to_treetagger=None, language='english', verbose=False, abbreviation_list=None)
- Initialize the TreeTaggerChunker.
:param language: The language of the model to load. Default is english.
The model encoding is utf-8. Unicode tokens passed to the parse() and
parse_to_tree() methods are converted to this charset when they are
sent to the TreeTagger chunker.
Byte-string tokens are sent as-is, so the caller must ensure that
they are already encoded in the right charset.
- get_installed_lang(self)
- Returns a list of the languages installed for TreeTagger.
- get_treetagger_path(self)
- Returns the path to TreeTagger.
- parse(self, tokens)
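- Tag and chunk a single sentence, given as a list of words (the class
example passes a plain string instead). The result is a flat list in which
[word, tag, lemma] entries are interleaved with chunk markers such as
['<NC>'] and ['</NC>']. The tokens should not contain any newline characters.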
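Because chunk boundaries appear as one-element marker entries in the flat `parse()` result, chunks can be recovered with a simple stack, including chunks nested inside other chunks (the `NC` inside the `PC` in the class example). The sketch below collects the words of every chunk with a given label; `extract_chunks` is a hypothetical helper, not part of this module, and the input is the `parse()` output copied from the class example.

```python
def extract_chunks(flat, label="NC"):
    """Collect the words of each <label>...</label> span in a flat parse() result.

    Hypothetical helper. A stack of open spans handles nested chunks: every
    word is added to all chunks of this label that are currently open.
    """
    open_tag, close_tag = f"<{label}>", f"</{label}>"
    stack, chunks = [], []
    for item in flat:
        if item == [open_tag]:
            stack.append([])
        elif item == [close_tag]:
            chunks.append(" ".join(stack.pop()))
        elif len(item) == 1:
            continue  # marker for a different chunk type, e.g. ['<PC>']
        else:
            for span in stack:
                span.append(item[0])  # item is [word, tag, lemma]
    return chunks


# parse() output copied from the class example above.
flat = [['<NC>'], ['What', 'WP', 'what'], ['</NC>'], ['<VC>'],
        ['is', 'VBZ', 'be'], ['</VC>'], ['<NC>'], ['the', 'DT', 'the'],
        ['airspeed', 'NN', 'airspeed'], ['</NC>'], ['<PC>'],
        ['of', 'IN', 'of'], ['<NC>'], ['an', 'DT', 'an'],
        ['unladen', 'JJ', '<unknown>'], ['swallow', 'NN', 'swallow'],
        ['</NC>'], ['</PC>'], ['?', 'SENT', '?']]

print(extract_chunks(flat))        # noun chunks
print(extract_chunks(flat, "PC"))  # prepositional chunks
```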
- parse_to_tree(self, tokens)
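- Tag and chunk tokens and return the result as an nltk.tree.Tree rooted
at 'S', with one subtree per chunk and a Tree(word, [tag]) leaf per token
(see the class example above).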
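The tree returned by `parse_to_tree()` has the same shape as the flat `parse()` output once the chunk markers are treated as opening and closing brackets. The sketch below shows that correspondence using plain nested tuples instead of `nltk.tree.Tree`, so it runs with the standard library only. `build_tree` and `to_brackets` are hypothetical helpers; this is not necessarily how `parse_to_tree()` is implemented.

```python
def build_tree(flat):
    """Turn a flat parse() result into nested (label, children) tuples rooted at 'S'.

    Hypothetical helper. Leaves are (word, tag) pairs; the lemma is dropped,
    matching the Tree(word, [tag]) leaves in the example above.
    """
    root = ("S", [])
    stack = [root]
    for item in flat:
        if len(item) == 1 and item[0].startswith("</"):
            stack.pop()  # close the innermost open chunk
        elif len(item) == 1 and item[0].startswith("<"):
            node = (item[0][1:-1], [])  # '<NC>' -> 'NC'
            stack[-1][1].append(node)
            stack.append(node)
        else:
            stack[-1][1].append((item[0], item[1]))  # (word, tag) leaf
    return root


def to_brackets(node):
    """Render a node in the same bracketed style as print(tree), on one line."""
    label, rest = node
    if isinstance(rest, str):  # a (word, tag) leaf
        return f"({label} {rest})"
    return "(" + label + " " + " ".join(to_brackets(child) for child in rest) + ")"


# parse() output copied from the class example above.
flat = [['<NC>'], ['What', 'WP', 'what'], ['</NC>'], ['<VC>'],
        ['is', 'VBZ', 'be'], ['</VC>'], ['<NC>'], ['the', 'DT', 'the'],
        ['airspeed', 'NN', 'airspeed'], ['</NC>'], ['<PC>'],
        ['of', 'IN', 'of'], ['<NC>'], ['an', 'DT', 'an'],
        ['unladen', 'JJ', '<unknown>'], ['swallow', 'NN', 'swallow'],
        ['</NC>'], ['</PC>'], ['?', 'SENT', '?']]

print(to_brackets(build_tree(flat)))
```

The printed line matches the multi-line `print(res)` output from the class example, collapsed onto a single line.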
Methods inherited from nltk.chunk.api.ChunkParserI:
- evaluate(self, gold)
- Score the accuracy of the chunker against the gold standard.
Remove the chunking from the gold standard text, rechunk it using
the chunker, and return a ``ChunkScore`` object
reflecting the performance of this chunk parser.
:type gold: list(Tree)
:param gold: The list of chunked sentences to score the chunker on.
:rtype: ChunkScore
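A ``ChunkScore`` reports, among other things, chunk-level precision, recall and F-measure. The sketch below shows the idea under a simplifying assumption: a guessed chunk counts as correct only if its label and both token boundaries match a gold chunk. NLTK's own ``ChunkScore`` may differ in details. `chunk_spans` and `prf` are hypothetical helpers, the gold spans come from the `parse()` output in the class example, and the "guess" is an invented variant used only for illustration.

```python
def chunk_spans(flat):
    """Return {(label, start, end)} token spans for the chunks in a flat parse() result."""
    spans, stack, i = set(), [], 0
    for item in flat:
        if len(item) == 1 and item[0].startswith("</"):
            label, start = stack.pop()
            spans.add((label, start, i))  # end index is exclusive
        elif len(item) == 1 and item[0].startswith("<"):
            stack.append((item[0][1:-1], i))
        else:
            i += 1  # only real tokens advance the position
    return spans


def prf(gold, guess):
    """Chunk-level precision, recall and F-measure over sets of labelled spans."""
    correct = len(gold & guess)
    precision = correct / len(guess) if guess else 0.0
    recall = correct / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# parse() output copied from the class example above, used as the gold standard.
flat = [['<NC>'], ['What', 'WP', 'what'], ['</NC>'], ['<VC>'],
        ['is', 'VBZ', 'be'], ['</VC>'], ['<NC>'], ['the', 'DT', 'the'],
        ['airspeed', 'NN', 'airspeed'], ['</NC>'], ['<PC>'],
        ['of', 'IN', 'of'], ['<NC>'], ['an', 'DT', 'an'],
        ['unladen', 'JJ', '<unknown>'], ['swallow', 'NN', 'swallow'],
        ['</NC>'], ['</PC>'], ['?', 'SENT', '?']]
gold = chunk_spans(flat)

# Invented guess: misses the PC chunk and adds a wrong NC over 'unladen swallow'.
guess = (gold - {("PC", 4, 8)}) | {("NC", 6, 8)}
print(prf(gold, guess))
```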
Methods inherited from nltk.parse.api.ParserI:
- grammar(self)
- :return: The grammar used by this parser.
- parse_all(self, sent, *args, **kwargs)
- :rtype: list(Tree)
- parse_one(self, sent, *args, **kwargs)
- :rtype: Tree or None
- parse_sents(self, sents, *args, **kwargs)
- Apply ``self.parse()`` to each element of ``sents``.
:rtype: iter(iter(Tree))
Data descriptors inherited from nltk.parse.api.ParserI:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)