This module contains classes that allow reading from an index.
Do not instantiate this object directly. Instead use Index.reader().
Returns an iterator of all (undeleted) document IDs in the reader.
Yields the stored fields for all documents (including deleted documents).
Yields (fieldname, text) tuples for every term in the index.
Closes the open files associated with this reader.
Returns the whoosh.codec.base.Codec object used to read this reader’s segment. If this reader is not atomic (reader.is_atomic() == False), returns None.
Returns a whoosh.spelling.Corrector object that suggests corrections based on the terms in the given field.
Returns the total number of UNDELETED documents in this reader.
Returns the total number of documents, DELETED OR UNDELETED, in this reader.
Returns the number of terms in the given field in the given document. This is used by some scoring algorithms.
Returns how many documents the given term appears in.
Yields terms in the given field that start with the given prefix.
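Because the term dictionary is stored in sorted order, prefix expansion can seek to the first matching term and stop at the first non-match. A pure-Python sketch of that scan over a hypothetical in-memory term list (not the Whoosh on-disk format):

```python
import bisect

terms = ["apple", "applet", "apply", "banana", "band"]  # sorted, as on disk
prefix = "app"

# Seek to the first term >= prefix, then stop at the first term that
# no longer starts with the prefix.
start = bisect.bisect_left(terms, prefix)
expanded = []
for t in terms[start:]:
    if not t.startswith(prefix):
        break
    expanded.append(t)
# expanded == ["apple", "applet", "apply"]
```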
Returns the total number of terms in the given field. This is used by some scoring algorithms.
Yields all term values (converted from on-disk bytes) in the given field.
Returns the first ID in the posting list for the given term. This may be optimized in certain backends.
Returns the total number of instances of the given term in the collection.
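The difference between document frequency and total frequency is easy to miss; the toy posting data below (hypothetical, not a Whoosh structure) illustrates it:

```python
# Toy postings for one term: docnum -> number of occurrences in that
# document. (Illustrative data only.)
postings = {0: 3, 4: 1, 7: 2}

# doc_frequency: how many documents the term appears in.
doc_frequency = len(postings)       # 3

# frequency: total occurrences of the term across the collection.
frequency = sum(postings.values())  # 6
```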
Returns the generation of the index being read, or -1 if the backend is not versioned.
Returns True if the underlying index/segment has deleted documents.
Returns True if the given document has a term vector for the given field.
Returns True if the given field has a “word graph” associated with it, allowing suggestions for correcting mis-typed words and fast fuzzy term searching.
Returns an iterable of strings representing the names of the indexed fields. This may include additional names not explicitly listed in the Schema if you use “glob” fields.
Returns True if the given document number is marked deleted.
Yields a series of (docnum, stored_fields_dict) tuples for the undeleted documents in the reader.
Yields (text, terminfo) tuples for all terms in the given field.
Yields ((fieldname, text), terminfo) tuples for all terms in the reader, starting at the given term.
Low-level method, yields all postings in the reader as (fieldname, text, docnum, weight, valuestring) tuples.
Yields (text, terminfo) tuples for all terms in the given field with a certain prefix.
Returns a list of (IndexReader, docbase) pairs for the child readers of this reader if it is a composite reader. If this is not a composite reader, it returns [(self, 0)].
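A sketch of what the docbase offset means, assuming a composite reader concatenates its children’s document number spaces: each child’s local docnum plus its docbase gives the global docnum. The FakeReader class is a stand-in, not a Whoosh type:

```python
class FakeReader:
    """Stand-in for a child reader; only exposes a document count."""
    def __init__(self, doc_count):
        self.doc_count = doc_count

children = [FakeReader(10), FakeReader(25), FakeReader(5)]

# Compute (reader, docbase) pairs the way a composite reader would:
# docbase is a running offset of the document counts so far.
pairs = []
base = 0
for r in children:
    pairs.append((r, base))
    base += r.doc_count

# Global docnum of local document 3 in the second child:
second_reader, second_base = pairs[1]
global_docnum = second_base + 3  # 10 + 3 == 13
```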
Yields all bytestrings in the given field.
Returns the maximum length of the field across all documents. This is used by some scoring algorithms.
Returns the minimum length of the field across all documents. This is used by some scoring algorithms.
Returns the top ‘number’ terms with the highest tf*idf scores as a list of (score, text) tuples.
Returns the top ‘number’ most frequent terms in the given field as a list of (frequency, text) tuples.
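The two rankings above differ only in the score used. A pure-Python sketch over hypothetical per-term statistics; the tf*idf form shown is a common one, and Whoosh’s exact weighting may differ:

```python
import heapq
import math

# Hypothetical per-term statistics for one field:
# term -> (total frequency in the field, number of documents containing it)
stats = {
    "render": (40, 2),
    "the":    (500, 95),
    "shader": (12, 1),
}
doc_count = 100  # total documents in the toy index

# most_frequent_terms: top terms by raw frequency.
most_frequent = heapq.nlargest(
    2, ((freq, term) for term, (freq, _df) in stats.items()))

# most_distinctive_terms: top terms by a tf*idf-style score, which
# down-weights terms that appear in many documents.
def tfidf(freq, df):
    return freq * math.log(doc_count / df)

most_distinctive = heapq.nlargest(
    2, ((tfidf(freq, df), term) for term, (freq, df) in stats.items()))
```

Here “the” wins on raw frequency but loses on distinctiveness, since it occurs in nearly every document.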
Returns a Matcher for the postings of the given term.
>>> pr = reader.postings("content", "render")
>>> pr.skip_to(10)
>>> pr.id
12
Returns the whoosh.index.Segment object used by this reader. If this reader is not atomic (reader.is_atomic() == False), returns None.
Returns the whoosh.filedb.filestore.Storage object used by this reader to read its files. If this reader is not atomic (reader.is_atomic() == False), returns None.
Returns the stored fields for the given document number.
Parameters: numerickeys – use field numbers as the dictionary keys instead of field names.
Returns a TermInfo object allowing access to various statistics about the given term.
Yields (fieldname, text) tuples for every term in the index starting at the given prefix.
Returns a generator of words in the given field within maxdist Damerau-Levenshtein edit distance of the given text.
Important: the terms are returned in no particular order. The only criterion is that they are within maxdist edits of text. You may want to run this method multiple times with increasing maxdist values to ensure you get the closest matches first. You may also have additional information (such as term frequency or an acoustic matching algorithm) you can use to rank terms with the same edit distance.
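A pure-Python sketch of the filtering this method performs (Whoosh itself walks a precomputed word graph rather than scanning every term); the distance function is the restricted Damerau-Levenshtein (optimal string alignment) variant, and the word list is hypothetical:

```python
def dameraulev(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                # adjacent transposition
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + cost)
    return d[len(a)][len(b)]

def terms_within(terms, text, maxdist):
    # Order is arbitrary, matching the documented behaviour above.
    return (t for t in terms if dameraulev(t, text) <= maxdist)

words = ["render", "renders", "rneder", "reindeer", "bender"]
close = sorted(terms_within(words, "render", 1))
# "rneder" is one transposition away; "renders" and "bender" are one
# edit away; "reindeer" needs two insertions and is excluded.
```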
Returns a Matcher object for the given term vector.
>>> docnum = searcher.document_number(path=u'/a/b/c')
>>> v = searcher.vector(docnum, "content")
>>> v.all_as("frequency")
[(u"apple", 3), (u"bear", 2), (u"cab", 2)]
Returns an iterator of (termtext, value) pairs for the terms in the given term vector. This is a convenient shortcut to calling vector() and using the Matcher object when all you want are the terms and/or values.
>>> docnum = searcher.document_number(path=u'/a/b/c')
>>> searcher.vector_as("frequency", docnum, "content")
[(u"apple", 3), (u"bear", 2), (u"cab", 2)]
Returns the root whoosh.fst.Node for the given field, if the field has a stored word graph (otherwise raises an exception). You can check whether a field has a word graph using IndexReader.has_word_graph().
Do not instantiate this object directly. Instead use Index.reader().
Represents a set of statistics about a term. This object is returned by IndexReader.term_info(). These statistics may be useful for optimizations and scoring algorithms.
Returns the number of documents the term appears in.
Returns the highest document ID this term appears in.
Returns the length of the longest field value the term appears in.
Returns the number of times the term appears in the document in which it appears the most.
Returns the lowest document ID this term appears in.
Returns the length of the shortest field value the term appears in.
Returns the total frequency of the term across all documents.
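All of the statistics above can be derived from a term’s posting list. The sketch below computes them from hypothetical postings of (docnum, weight, field_length) tuples; the variable names mirror the TermInfo properties but the data structure is illustrative, not a Whoosh internal:

```python
# Hypothetical postings for one term: (docnum, weight of the term in
# that document, stored length of the field in that document).
postings = [(2, 3.0, 40), (5, 1.0, 12), (9, 2.0, 25)]

doc_frequency = len(postings)                   # documents containing the term
weight = sum(w for _, w, _ in postings)         # total frequency: 6.0
min_id = min(d for d, _, _ in postings)         # lowest docnum: 2
max_id = max(d for d, _, _ in postings)         # highest docnum: 9
max_weight = max(w for _, w, _ in postings)     # most occurrences in one doc
min_length = min(n for _, _, n in postings)     # shortest field value: 12
max_length = max(n for _, _, n in postings)     # longest field value: 40
```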