Python: module treetagger

treetagger

/home/linkliman/Documents/Python/Python3.6/Projects/TAL-Project/tal/src/treetagger.py

This file defines the TreeTagger and TreeTaggerChunker classes.

Classes



nltk.chunk.api.ChunkParserI(nltk.parse.api.ParserI)

TreeTaggerChunker

nltk.tag.api.TaggerI(builtins.object)

TreeTagger

class TreeTagger(nltk.tag.api.TaggerI)

    A class for POS tagging with TreeTagger. The default encoding used by
    TreeTagger is utf-8. The input is:

    - a language (a model trained on training data)
    - (optionally) the path to the TreeTagger binary

    This class communicates with the TreeTagger binary via pipes.

    Example:

    .. doctest::
        :options: +SKIP

        >>> from treetagger import TreeTagger
        >>> tt = TreeTagger(language='english')
        >>> tt.tag('What is the airspeed of an unladen swallow?')
        [['What', 'WP', 'what'],
         ['is', 'VBZ', 'be'],
         ['the', 'DT', 'the'],
         ['airspeed', 'NN', 'airspeed'],
         ['of', 'IN', 'of'],
         ['an', 'DT', 'an'],
         ['unladen', 'JJ', '<unknown>'],
         ['swallow', 'NN', 'swallow'],
         ['?', 'SENT', '?']]

Method resolution order:

TreeTagger

nltk.tag.api.TaggerI

builtins.object

Methods defined here:

__init__(self, path_to_treetagger=None, language='english', verbose=False, abbreviation_list=None)
Initialize the TreeTagger.

:param language: The language of the model to use. The default is english.

The encoding used by the model is utf-8 by default. Unicode tokens passed to the tag() method are converted to this charset when they are sent to TreeTagger. str tokens are sent as-is, so the caller must ensure that they are encoded in the right charset.

get_installed_lang(self)
Returns a list of the installed languages for the treetagger.

get_treetagger_path(self)
Returns the path of the treetagger.

tag(self, sentences)
Tags a single sentence: a list of words. The tokens should not contain any newline characters.
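
The class example shows tag() returning one [word, tag, lemma] list per token. As a minimal sketch of working with that shape, the snippet below reduces such rows to (word, tag) pairs and extracts lemmas. The helper names `to_word_tag_pairs` and `lemmas` are hypothetical and not part of this module, and the sample rows are copied from the class example rather than produced by running TreeTagger.

```python
# Sample rows copied from the TreeTagger class example; not generated here.
tagged = [
    ['What', 'WP', 'what'],
    ['is', 'VBZ', 'be'],
    ['unladen', 'JJ', '<unknown>'],
    ['?', 'SENT', '?'],
]


def to_word_tag_pairs(rows):
    """Hypothetical helper: keep only (word, tag) from [word, tag, lemma] rows."""
    return [(row[0], row[1]) for row in rows]


def lemmas(rows, unknown='<unknown>'):
    """Hypothetical helper: return the lemma of each row, falling back to the
    surface word when the lemma is reported as '<unknown>'."""
    return [row[0] if row[2] == unknown else row[2] for row in rows]


pairs = to_word_tag_pairs(tagged)
lemma_list = lemmas(tagged)
```

Falling back to the surface word for '<unknown>' lemmas is a design choice of this sketch; other callers may prefer to drop or flag such tokens.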

Data and other attributes defined here:

__abstractmethods__ = frozenset()

Methods inherited from nltk.tag.api.TaggerI:

evaluate(self, gold)
Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score.

:type gold: list(list(tuple(str, str)))
:param gold: The list of tagged sentences to score the tagger on.
:rtype: float
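
The strip, retag, and compare procedure described for evaluate() can be sketched in plain Python. This is an illustration of the idea only, not NLTK's implementation: the `accuracy` function and the `toy_tagger` stand-in are hypothetical, and the tagger is assumed to return (word, tag) pairs.

```python
def accuracy(tag_fn, gold):
    """Token-level accuracy of tag_fn against gold tagged sentences."""
    correct = 0
    total = 0
    for gold_sent in gold:
        words = [word for word, _tag in gold_sent]   # strip the gold tags
        predicted = tag_fn(words)                    # retag the bare words
        for (_, gold_tag), (_, pred_tag) in zip(gold_sent, predicted):
            correct += gold_tag == pred_tag
            total += 1
    return correct / total if total else 0.0


def toy_tagger(words):
    """Hypothetical stand-in tagger: labels every word as a noun."""
    return [(word, 'NN') for word in words]


gold = [
    [('the', 'DT'), ('swallow', 'NN')],
    [('airspeed', 'NN')],
]
score = accuracy(toy_tagger, gold)  # 2 of the 3 tokens match
```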

tag_sents(self, sentences)
Apply ``self.tag()`` to each element of *sentences*. I.e.:

    return [self.tag(sent) for sent in sentences]

Data descriptors inherited from nltk.tag.api.TaggerI:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)

class TreeTaggerChunker(nltk.chunk.api.ChunkParserI)

    A class for chunking with TreeTagger Chunker. The default encoding used by
    TreeTagger is utf-8. The input is:

    - a language (a model trained on training data)
    - (optionally) the path to the TreeTagger binary

    This class communicates with the TreeTagger Chunker binary via pipes.

    Example:

    .. doctest::
        :options: +SKIP

        >>> from treetagger import TreeTaggerChunker
        >>> tt = TreeTaggerChunker(language='english')
        >>> tt.parse('What is the airspeed of an unladen swallow?')
        [['<NC>'], ['What', 'WP', 'what'], ['</NC>'], ['<VC>'],
         ['is', 'VBZ', 'be'], ['</VC>'], ['<NC>'], ['the', 'DT', 'the'],
         ['airspeed', 'NN', 'airspeed'], ['</NC>'], ['<PC>'],
         ['of', 'IN', 'of'], ['<NC>'], ['an', 'DT', 'an'],
         ['unladen', 'JJ', '<unknown>'], ['swallow', 'NN', 'swallow'],
         ['</NC>'], ['</PC>'], ['?', 'SENT', '?']]

    .. doctest::
        :options: +SKIP

        >>> from treetagger import TreeTaggerChunker
        >>> tt = TreeTaggerChunker(language='english')
        >>> tt.parse_to_tree('What is the airspeed of an unladen swallow?')
        Tree('S', [Tree('NC', [Tree('What', ['WP'])]),
        Tree('VC', [Tree('is', ['VBZ'])]),
        Tree('NC', [Tree('the', ['DT']), Tree('airspeed', ['NN'])]),
        Tree('PC', [Tree('of', ['IN']),
                    Tree('NC', [Tree('an', ['DT']),
                                Tree('unladen', ['JJ']),
                                Tree('swallow', ['NN'])])]),
        Tree('?', ['SENT'])])

    .. doctest::
        :options: +SKIP

        >>> from nltk.tree import Tree
        >>> from treetagger import TreeTaggerChunker
        >>> tt = TreeTaggerChunker(language='english')
        >>> res = tt.parse_to_tree('What is the airspeed of an unladen swallow?')
        >>> print(res)
        (S
            (NC (What WP))
            (VC (is VBZ))
            (NC (the DT) (airspeed NN))
            (PC (of IN) (NC (an DT) (unladen JJ) (swallow NN)))
            (? SENT))

Method resolution order:

TreeTaggerChunker

nltk.chunk.api.ChunkParserI

nltk.parse.api.ParserI

builtins.object

Methods defined here:

__init__(self, path_to_treetagger=None, language='english', verbose=False, abbreviation_list=None)
Initialize the TreeTaggerChunker.

:param language: The language of the model to use. The default is english.

The encoding used by the model is utf-8 by default. Unicode tokens passed to the parse() and parse_to_tree() methods are converted to this charset when they are sent to TreeTaggerChunker. str tokens are sent as-is, so the caller must ensure that they are encoded in the right charset.

get_installed_lang(self)
Returns a list of the installed languages for the treetagger.

get_treetagger_path(self)
Returns the path of the treetagger.

parse(self, tokens)
Tag and chunk a single sentence: a list of words. The tokens should not contain any newline characters.

parse_to_tree(self, tokens)
Parse tokens into a tree
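
The class examples show parse() returning a flat list in which one-element rows such as ['<NC>'] and ['</NC>'] open and close chunks, while parse_to_tree() returns the same information as a nested tree. A minimal sketch of that flat-to-nested conversion is shown here using a stack and plain (label, children) tuples instead of nltk.tree.Tree. The helper `chunks_to_nested` is hypothetical, assumes balanced markers, and is not claimed to be how this module builds its trees.

```python
def chunks_to_nested(rows, root='S'):
    """Hypothetical helper: fold <X> ... </X> marker rows into nested
    (label, children) tuples. Token rows become (word, tag) leaves."""
    stack = [(root, [])]
    for row in rows:
        if len(row) == 1 and row[0].startswith('</'):
            # Closing marker: finish the innermost chunk, attach it to its parent.
            label, children = stack.pop()
            stack[-1][1].append((label, children))
        elif len(row) == 1 and row[0].startswith('<'):
            # Opening marker: start a new chunk named after the tag.
            stack.append((row[0][1:-1], []))
        else:
            # Token row [word, tag, lemma]: keep word and tag only.
            stack[-1][1].append((row[0], row[1]))
    return stack[0]


# Rows shortened from the parse() example in the class docstring.
rows = [
    ['<NC>'], ['What', 'WP', 'what'], ['</NC>'],
    ['<PC>'], ['of', 'IN', 'of'],
    ['<NC>'], ['an', 'DT', 'an'], ['swallow', 'NN', 'swallow'], ['</NC>'],
    ['</PC>'],
    ['?', 'SENT', '?'],
]
nested = chunks_to_nested(rows)
```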

Methods inherited from nltk.chunk.api.ChunkParserI:

evaluate(self, gold)
Score the accuracy of the chunker against the gold standard. Remove the chunking from the gold standard text, rechunk it using the chunker, and return a ``ChunkScore`` object reflecting the performance of this chunk parser.

:type gold: list(Tree)
:param gold: The list of chunked sentences to score the chunker on.
:rtype: ChunkScore

Methods inherited from nltk.parse.api.ParserI:

grammar(self)
:return: The grammar used by this parser.

parse_all(self, sent, *args, **kwargs)
:rtype: list(Tree)

parse_one(self, sent, *args, **kwargs)
:rtype: Tree or None

parse_sents(self, sents, *args, **kwargs)
Apply ``self.parse()`` to each element of ``sents``.

:rtype: iter(iter(Tree))

Data descriptors inherited from nltk.parse.api.ParserI:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)

Functions

files(path, pattern)
Find all files matching the given pattern in the given path.
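
Only the signature and one-line description of files() are documented here, so the following is a sketch of what such a helper might look like, not the module's code. It assumes shell-style patterns and a recursive search, neither of which the source confirms. The name `find_files` and the sample file names are hypothetical.

```python
import fnmatch
import os
import tempfile


def find_files(path, pattern):
    """Hypothetical stand-in for files(): yield every file under `path`
    whose name matches the shell-style `pattern`."""
    for root, _dirs, names in os.walk(path):
        for name in fnmatch.filter(names, pattern):
            yield os.path.join(root, name)


# Demonstrate on a throwaway directory with illustrative file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('english.par', 'french.par', 'notes.txt'):
        open(os.path.join(tmp, name), 'w').close()
    found = sorted(os.path.basename(p) for p in find_files(tmp, '*.par'))
```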

Data

PIPE = -1
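
The value -1 matches the standard library constant subprocess.PIPE, which fits the class descriptions stating that communication with the binary happens via pipes. A minimal sketch of that pattern follows. It uses the Python interpreter as a stand-in child process rather than TreeTagger, and sending one token per line is an assumption of this sketch, not something the documentation states.

```python
import subprocess
import sys

# Stand-in child process that upper-cases each input line.
# This is NOT the TreeTagger binary; it only illustrates the pipe round trip.
child_code = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
)

tokens = ['What', 'is', 'the', 'airspeed']

proc = subprocess.Popen(
    [sys.executable, '-c', child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)

# Write the utf-8 encoded tokens to the child's stdin, then read its stdout.
stdout, _ = proc.communicate('\n'.join(tokens).encode('utf-8'))
result = stdout.decode('utf-8').split()
```

Using communicate() rather than writing and reading the pipes by hand avoids deadlocks when the child produces more output than the pipe buffer holds.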
