Package nltk_lite :: Package contrib :: Package toolbox :: Module text :: Class Text

Class Text

                    object --+    
                             |    
corpora.toolbox.StandardFormat --+
                                 |
                                Text

This class defines an interlinearized text, which consists of a collection of Paragraph objects.

Instance Methods
 
__init__(self, file=None, fm_line='ref', fm_paragraph='id', fm_morpheme='m', fm_morpheme_gloss='g', fm_word='w')
Constructor for Text object.
 
get_lines(self)
Obtain a list of line objects (ignoring paragraph structure).
 
get_paragraphs(self)
Obtain a list of paragraph objects.
 
add_paragraph(self, paragraph)
Add paragraph object to list of paragraph objects.
 
getLineFM(self)
Get field marker that identifies a new line.
 
setLineFM(self, lineHeadFieldMarker)
Change default field marker that identifies new line.
 
getParagraphFM(self)
Get field marker that identifies a new paragraph.
 
setParagraphFM(self, paragraphHeadFieldMarker)
Change default field marker that identifies new paragraph.
 
getWordFM(self)
Get field marker that identifies word tier.
 
setWordFM(self, wordFieldMarker)
Change default field marker that identifies word tier.
 
getMorphemeFM(self)
Get field marker that identifies morpheme tier.
 
setMorphemeFM(self, morphemeFieldMarker)
Change default field marker that identifies morpheme tier.
 
getMorphemeGlossFM(self)
Get field marker that identifies morpheme gloss tier.
 
setMorphemeGlossFM(self, morphemeGlossFieldMarker)
Change default field marker that identifies morpheme gloss tier.
 
get_file(self)
Get file path as string.
 
set_file(self, file)
Change file path set upon initialization.
 
parse(self)
Parse specified Shoebox file into Text object.

Inherited from corpora.toolbox.StandardFormat: close, fields, open, open_string, raw_fields

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__
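
The parse method and the field-marker accessors listed above suggest that records in the Shoebox file are grouped by marker: one marker opens a paragraph, another opens a line, and the remaining markers name tiers within a line. The following is a minimal sketch of that grouping under stated assumptions, not the library's implementation. The sample data, the names iter_fields and group_fields, and the dictionary layout are all hypothetical, and the sketch ignores field values that continue over several physical lines.

```python
import re

# Hypothetical sample in Shoebox standard format: one "\marker value" per line.
SAMPLE = r"""\id para1
\ref 001
\w Hola mundo
\m hola mundo
\g hello world
\ref 002
\w Adios
\m adios
\g goodbye
\id para2
\ref 003
\w Gracias
\m gracias
\g thanks
"""

# A backslash, the marker name, then an optional value.
FIELD_RE = re.compile(r"^\\(\S+)(?:\s+(.*))?$")


def iter_fields(text):
    """Yield (marker, value) pairs; lines that do not start with a marker are skipped."""
    for raw in text.splitlines():
        match = FIELD_RE.match(raw.strip())
        if match:
            yield match.group(1), (match.group(2) or "").strip()


def group_fields(text, fm_paragraph="id", fm_line="ref"):
    """Group fields into paragraphs and lines using the two head markers."""
    paragraphs = []
    for marker, value in iter_fields(text):
        if marker == fm_paragraph:
            # A paragraph marker starts a new paragraph.
            paragraphs.append({"id": value, "lines": []})
        elif marker == fm_line:
            # A line marker starts a new line in the current paragraph.
            if not paragraphs:
                paragraphs.append({"id": None, "lines": []})
            paragraphs[-1]["lines"].append({"ref": value, "tiers": {}})
        elif paragraphs and paragraphs[-1]["lines"]:
            # Any other marker is stored as a tier of the current line.
            paragraphs[-1]["lines"][-1]["tiers"][marker] = value
    return paragraphs


paragraphs = group_fields(SAMPLE)
for paragraph in paragraphs:
    print(paragraph["id"], [line["ref"] for line in paragraph["lines"]])
```

Passing different values for fm_paragraph and fm_line to group_fields plays the role that setParagraphFM and setLineFM appear to play for the real class.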

Properties

Inherited from object: __class__

Method Details

__init__(self, file=None, fm_line='ref', fm_paragraph='id', fm_morpheme='m', fm_morpheme_gloss='g', fm_word='w')
(Constructor)

Constructor for Text object. All arguments are optional; each field-marker argument falls back to the default given in the method signature, which determines how the Shoebox file is parsed.

Parameters:
  • file (str) - filepath
  • fm_line (str) - field marker identifying line (default: 'ref')
  • fm_paragraph (str) - field marker identifying paragraph (default: 'id')
  • fm_morpheme (str) - field marker identifying morpheme tier (default: 'm')
  • fm_morpheme_gloss (str) - field marker identifying morpheme gloss tier (default: 'g')
  • fm_word (str) - field marker identifying word tier (default: 'w')
Overrides: object.__init__
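
To illustrate how the defaults and the accessor methods relate, here is a small stand-in class that mirrors the documented signature. It is a sketch only: TextSketch is a hypothetical name, it does not inherit from StandardFormat or read any file, and the file name 'story.txt' and the markers 'tx', 'mb', 'ge' and 'rf' are made-up values chosen for the example.

```python
class TextSketch:
    """Hypothetical stand-in mirroring the documented constructor (not the nltk_lite class)."""

    def __init__(self, file=None, fm_line='ref', fm_paragraph='id',
                 fm_morpheme='m', fm_morpheme_gloss='g', fm_word='w'):
        # Store the file path and the five field markers as given.
        self._file = file
        self._fm_line = fm_line
        self._fm_paragraph = fm_paragraph
        self._fm_morpheme = fm_morpheme
        self._fm_morpheme_gloss = fm_morpheme_gloss
        self._fm_word = fm_word

    def get_file(self):
        return self._file

    def set_file(self, file):
        self._file = file

    def getLineFM(self):
        return self._fm_line

    def setLineFM(self, lineHeadFieldMarker):
        self._fm_line = lineHeadFieldMarker

    def getWordFM(self):
        return self._fm_word

    def getMorphemeFM(self):
        return self._fm_morpheme

    def getMorphemeGlossFM(self):
        return self._fm_morpheme_gloss


# No arguments: every marker takes its documented default.
default_text = TextSketch()

# Override some markers at construction time, then change one afterwards.
custom_text = TextSketch(file='story.txt', fm_word='tx',
                         fm_morpheme='mb', fm_morpheme_gloss='ge')
custom_text.setLineFM('rf')

print(default_text.getLineFM(), default_text.getWordFM())
print(custom_text.getLineFM(), custom_text.getWordFM())
```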

add_paragraph(self, paragraph)

Add paragraph object to list of paragraph objects.

Parameters:
  • paragraph (Paragraph) - paragraph to be added to text
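
The relationship between add_paragraph, get_paragraphs and get_lines can be sketched with two stand-in classes. This assumes that a paragraph exposes its own list of lines through a get_lines method, which the documentation above does not confirm; ParagraphSketch and TextContainerSketch are hypothetical names, and plain strings stand in for line objects.

```python
class ParagraphSketch:
    """Hypothetical paragraph holding an ordered list of line objects."""

    def __init__(self, lines=None):
        self._lines = list(lines or [])

    def get_lines(self):
        return list(self._lines)


class TextContainerSketch:
    """Hypothetical container showing only the paragraph-related methods."""

    def __init__(self):
        self._paragraphs = []

    def add_paragraph(self, paragraph):
        # Append to the internal list, preserving insertion order.
        self._paragraphs.append(paragraph)

    def get_paragraphs(self):
        return list(self._paragraphs)

    def get_lines(self):
        # Flatten all paragraphs into one list, ignoring paragraph structure.
        return [line for paragraph in self._paragraphs
                for line in paragraph.get_lines()]


text = TextContainerSketch()
text.add_paragraph(ParagraphSketch(['001', '002']))
text.add_paragraph(ParagraphSketch(['003']))

print(len(text.get_paragraphs()))
print(text.get_lines())
```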