Python: module tagging

tagging

/home/linkliman/Documents/Python/Python3.6/Projects/TAL-Project/tal/src/tagging.py

This file contains functions to tag a sentence with TreeTagger and to convert a tagged sentence into the format used by our application.

Modules

os
re
sys
src.treetagger

Functions


add_mod_stce(temp_saved_tag, current_group, mod_sentence)
A helper factored out of transform_sentence to keep that function short.

check_entities(sentence, entity)
Retags groups tagged 'S' as 'A' when they do not contain the subject entity.
Arguments:
    sentence {list} -- A sentence of new tagged groups
    entity {string} -- the name of the original Wikipedia page
Returns:
    [list] -- Returns the modified sentence

check_subj(sentence, subject)
Modifies a tagged sentence so that a group not tagged as subject is retagged as subject if it contains the subject.
Arguments:
    sentence {list} -- sentence as a list of tagged groups
    subject {str} -- subject to check

clean_alias(sentence, subject)
Catches sentences in the French Wikipedia alias format and returns a modified sentence.
Arguments:
    sentence {list} -- A sentence to be modified
Returns:
    [list] -- Returns the modified sentence

clean_born(sentence)
Catches sentences in the French Wikipedia "born" format and returns a modified sentence.
Arguments:
    sentence {list} -- A sentence to be modified
Returns:
    [list] -- Returns the modified sentence

clean_isolate_etre_verb(sentence, subject)
Catches isolated verbs and inserts the subject before them.
Arguments:
    sentence {list} -- A sentence to be modified
    subject {string} -- the subject of the original Wikipedia page
Returns:
    [list] -- Returns the modified sentence

clean_or_simply(sentence)
Catches sentences in the French Wikipedia "ou simplement" ("or simply") format and returns a modified sentence.
Arguments:
    sentence {list} -- A sentence to be modified
Returns:
    [list] -- Returns the modified sentence

clean_poss(sentence, subject)
Catches possessive groups.
Arguments:
    sentence {list} -- A sentence to be modified
    subject {string} -- the subject of the original Wikipedia page
Returns:
    [list] -- Returns the modified sentence

clean_say(sentence, subject)
Catches sentences in the French Wikipedia "says" format and returns a modified sentence.
Arguments:
    sentence {list} -- A sentence to be modified
    subject {string} -- the subject of the original Wikipedia page
Returns:
    [list] -- Returns the modified sentence

cleaning_sentence(sentence, subject)
Calls the cleaning functions on a sentence.
Arguments:
    sentence {list} -- A sentence to be modified
    subject {string} -- the subject of the original Wikipedia page
Returns:
    [list] -- Returns the modified sentence

clear_multiple_spaces(sentence)
Collapses multiple spaces into a single space.
Arguments:
    sentence {list} -- A sentence to be cleaned
Returns:
    [list] -- Returns the cleaned sentence
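The space-collapsing step described above can be illustrated with a regular expression. This is only a sketch, not the module's actual code: the name `collapse_spaces` is hypothetical, and the sketch assumes the sentence is available as a plain string, whereas the docstring declares a list.

```python
import re


def collapse_spaces(text):
    """Replace every run of two or more spaces with a single space.

    Hypothetical helper: assumes `text` is a plain string.
    """
    return re.sub(r" {2,}", " ", text)


print(collapse_spaces("Victor  Hugo   est un écrivain"))
```

Only the space character is targeted here; tabs and newlines are left untouched.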

clear_sentence(sentence)
Deletes useless spaces.
Arguments:
    sentence {list} -- A sentence to be cleaned
Returns:
    [list] -- Returns the cleaned sentence

del_parenthesis_content(sentence)
Deletes the content of parentheses.
Arguments:
    sentence {list} -- A sentence to be cleaned
Returns:
    [list] -- Returns the cleaned sentence
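One possible way to remove parenthesized content is sketched below. The name `drop_parentheses` is hypothetical, the sketch assumes a plain string rather than the list named in the docstring, and it may differ from what tagging.py really does (for instance, whether the parentheses themselves are kept).

```python
import re


def drop_parentheses(text):
    """Remove each non-nested "(...)" span and any whitespace just before it.

    Hypothetical helper: a single pass only removes the innermost
    level of nested parentheses.
    """
    return re.sub(r"\s*\([^()]*\)", "", text)


print(drop_parentheses("Victor Hugo (1802-1885) est un écrivain"))
```

Handling nested parentheses would require repeating the substitution until the text stops changing.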

del_phonetic(sentence)
Deletes Wikipedia phonetic information.
Arguments:
    sentence {list} -- A sentence to be cleaned
Returns:
    [list] -- Returns the cleaned sentence

is_subj(part, subject)
Returns whether the part checked contains any part of the subject.
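The check described for is_subj could look roughly like the following. This is an assumption-laden sketch: the name `contains_subject_part` is hypothetical, both arguments are assumed to be plain strings, and "any part of the subject" is interpreted as "any whitespace-separated word of the subject, compared case-insensitively".

```python
def contains_subject_part(part, subject):
    """Return True if any word of `part` is also a word of `subject`.

    Hypothetical helper: the comparison is case-insensitive and
    word-based, which is an assumption about the real behaviour.
    """
    subject_words = {word.lower() for word in subject.split()}
    return any(word.lower() in subject_words for word in part.split())


print(contains_subject_part("Hugo écrit", "Victor Hugo"))
print(contains_subject_part("Il écrit", "Victor Hugo"))
```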

separe_sentences(text, subject, article_file)
Calls tagging_text() and splits the tagged sentences.
Arguments:
    text {string} -- the text we want to tag
    subject {string} -- the title of the original Wikipedia page
    article_file {file} -- the report file
Keyword Arguments:
    path {str} -- the path to the TreeTagger (optional) (default: {"./TreeTagger"})
    lang {str} -- the language of the text (optional) (default: {"french"})
Returns:
    [list] -- Returns the list of sentences

split(string, num)
Splits a string into a list of strings of n characters.
Arguments:
    string {str} -- the text we want to cut
    num {int} -- the length of each output string
Returns:
    [list] -- Returns a list of strings
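Fixed-width chunking of this kind is commonly written with slicing. The sketch below uses the hypothetical name `split_fixed` and assumes that the last chunk may be shorter than `num` when the length is not a multiple of it; the real split() may treat that case differently.

```python
def split_fixed(string, num):
    """Cut `string` into consecutive pieces of at most `num` characters.

    Hypothetical helper: `num` must be a positive integer.
    """
    return [string[i:i + num] for i in range(0, len(string), num)]


print(split_fixed("abcdefg", 3))  # prints ['abc', 'def', 'g']
```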

tagging_text(text, subject, article_file, path='./TreeTagger', lang='french')
Tags a text with TreeTagger.
Arguments:
    text {string} -- the text we want to tag
    subject {string} -- the title of the original Wikipedia page
    article_file {file} -- the report file
Keyword Arguments:
    path {str} -- the path to the TreeTagger (optional) (default: {"./TreeTagger"})
    lang {str} -- the language of the text (optional) (default: {"french"})
Returns:
    [list] -- Returns the tagged text

transform_sentence(sentence, article_file)
Converts TreeTagger-tagged text into a simpler tagging scheme.
Arguments:
    sentence {list} -- the tagged text that we will transform
    article_file {file} -- the report file
Returns:
    [list] -- Returns the new tagged text

Data

DET_POS = ['DET:POS']
END_SENT = ['END']
ENTITY_NAME = ['NAM']
PRONOM_PERS = ['PRO:PER']
PRONOM_POSSESSIF = ['PRO:POS']
THIS_DIR = '/home/linkliman/Documents/Python/Python3.6/Projects/TAL-Project/tal/src'
UNTREATED_TAGS = ['ABR', 'ADJ', 'ADV', 'DET:ART', 'INT', 'KON', 'PUN', 'NOM', 'NUM', 'PRO', 'PRO:DEM', 'PRO:IND', 'PRO:REL', 'PRP', 'PRP:det', 'PUN', 'PUN:cit', 'SENT', 'SYM', 'VER:infi', ...]
VERB = ['VER:cond', 'VER:futu', 'VER:impe', 'VER:impf', 'VER:pres', 'VER:simp', 'VER:subi', 'VER:subp']
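As an illustration of how tag lists like these can be used, the sketch below maps a TreeTagger tag to a coarse family by membership test. The list values are copied from the Data section; the function `tag_family` and the family labels it returns are hypothetical and are not part of the module. UNTREATED_TAGS is left out because its listing is truncated.

```python
# Values copied from the Data section of this page.
ENTITY_NAME = ['NAM']
PRONOM_PERS = ['PRO:PER']
VERB = ['VER:cond', 'VER:futu', 'VER:impe', 'VER:impf',
        'VER:pres', 'VER:simp', 'VER:subi', 'VER:subp']


def tag_family(tag):
    """Return a coarse, hypothetical family label for a TreeTagger tag."""
    if tag in VERB:
        return "verb"
    if tag in ENTITY_NAME:
        return "entity"
    if tag in PRONOM_PERS:
        return "personal pronoun"
    return "other"


print(tag_family("VER:pres"))  # prints verb
print(tag_family("NAM"))       # prints entity
```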
