reflex Namespace Reference

updated Thu Jan 26 2017
 

Classes

class  AbstractLexer
 Abstract lexer class template; the abstract root class of all reflex-generated scanners.
 
class  AbstractMatcher
 Abstract matcher base class template; defines an interface for all pattern matcher engines.
 
class  Bits
 RE/flex Bits class for dynamic bit vectors.
 
class  BoostMatcher
 Boost matcher engine class; implements the reflex::PatternMatcher pattern matching interface with scan, find, and split functors and iterators, using the Boost.Regex library.
 
class  BoostPerlMatcher
 Boost matcher engine class; extends reflex::BoostMatcher for Boost Perl regex matching.
 
class  BoostPosixMatcher
 Boost matcher engine class; extends reflex::BoostMatcher for Boost POSIX regex matching.
 
class  FlexLexer
 Flex-compatible FlexLexer abstract base class template derived from reflex::AbstractMatcher for the reflex-generated yyFlexLexer scanner class.
 
class  Input
 Input character sequence class for unified access to sources of input.
 
struct  lazy_intersection
 Intersection of two ordered sets, with an iterator to get elements lazily.
 
struct  lazy_union
 Union of two ordered sets, with an iterator to get elements lazily.
 
class  Matcher
 RE/flex matcher engine class; implements the reflex::PatternMatcher pattern matching interface with scan, find, and split functors and iterators.
 
class  ORanges
 RE/flex ORanges (open-ended, ordinal value range) template class.
 
class  Pattern
 Pattern class; holds a regex pattern and its compiled FSM opcode table or code for the reflex::Matcher engine.
 
class  PatternMatcher
 Pattern matcher class template; extends the abstract matcher base class.
 
struct  range_compare
 Functor to order ranges in the reflex::Ranges set container.
 
class  Ranges
 RE/flex Ranges template class.
 
class  StdEcmaMatcher
 std matcher engine class; extends reflex::StdMatcher for ECMA std::regex::ECMAScript regex matching.
 
class  StdMatcher
 std matcher engine class; implements the reflex::PatternMatcher pattern matching interface with scan, find, and split functors and iterators, using the C++11 std::regex library.
 
class  StdPosixMatcher
 std matcher engine class; extends reflex::StdMatcher for POSIX ERE std::regex::awk regex matching.
 
struct  TypeOp
 TypeOp<T>::Type = T, TypeOp<T>::ConstType = const T, TypeOp<T>::NonConstType = non-const T.
 
struct  TypeOp< const T >
 Template specialization of reflex::TypeOp.
 

Typedefs

typedef wchar_t unicode_t
 

Functions

std::string regroup (const std::string &regex)
 Regroup: convert a regex string to a regex that captures groups only for the outermost (top-level) alternatives. The top-level alternatives X1, X2, ..., Xn of a regex X1|X2|...|Xn are each wrapped in a capturing group, so that X1|X2|...|Xn => (X1)|(X2)|...|(Xn). Nested capturing groups (Y) are converted to non-capturing groups, (Y) => (?:Y). Existing non-capturing groups are retained. For example, a|b(c)|(?:d) => (a)|(b(?:c))|((?:d)).
 
int isword (int c)
 Check word character.
 
template<typename S1, typename S2>
bool is_disjoint (const S1 &s1, const S2 &s2)
 Check if sets s1 and s2 are disjoint.
 
template<typename T, typename S>
bool is_in_set (const T &x, const S &s)
 Check if value x is in set s.
 
template<typename S1, typename S2>
bool is_subset (const S1 &s1, const S2 &s2)
 Check if set s1 is a subset of set s2.
 
template<typename S1, typename S2>
void set_insert (S1 &s1, const S2 &s2)
 Insert set s2 into set s1.
 
template<typename S1, typename S2>
void set_delete (S1 &s1, const S2 &s2)
 Delete elements of set s2 from set s1.
 
std::string utf8 (unicode_t a, unicode_t b, bool strict=true, const char *esc=NULL)
 Convert a UCS range [a,b] to a UTF-8 regex pattern.
 
size_t utf8 (unicode_t c, char *s)
 Convert UCS to UTF-8.
 
unicode_t utf8 (const char *s)
 Convert UTF-8 to UCS. Returns 0xFFFD for invalid UTF-8, except for MUTF-8 U+0000 and 0xD800-0xDFFF surrogate halves (use WITH_UTF8_UNRESTRICTED to remove this limit and support lossless UTF-8 encoding up to 6 bytes).
 

Typedef Documentation

typedef wchar_t reflex::unicode_t

Function Documentation

template<typename S1, typename S2>
bool reflex::is_disjoint (const S1 &s1, const S2 &s2)

Check if sets s1 and s2 are disjoint.

Returns
true if s1 and s2 have no element in common, false otherwise.

template<typename T, typename S>
inline bool reflex::is_in_set (const T &x, const S &s)

Check if value x is in set s.

Returns
true if x is an element of s, false otherwise.

template<typename S1, typename S2>
bool reflex::is_subset (const S1 &s1, const S2 &s2)

Check if set s1 is a subset of set s2.

Returns
true if every element of s1 is also in s2, false otherwise.
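
The three set predicates documented above can be illustrated with a minimal, self-contained sketch. It is not the RE/flex implementation: the namespace `sketch` and the helper `demo_predicates` are hypothetical names introduced here, and the code assumes ordered containers such as std::set that provide find(), begin(), and end().

```cpp
#include <cassert>
#include <set>

namespace sketch { // hypothetical namespace, not part of reflex

// True if x is an element of s (assumes S provides find() and end()).
template<typename T, typename S>
inline bool is_in_set(const T& x, const S& s)
{
  return s.find(x) != s.end();
}

// True if s1 and s2 share no element; steps through both ordered sets.
template<typename S1, typename S2>
bool is_disjoint(const S1& s1, const S2& s2)
{
  typename S1::const_iterator i1 = s1.begin();
  typename S2::const_iterator i2 = s2.begin();
  while (i1 != s1.end() && i2 != s2.end())
  {
    if (*i1 < *i2)
      ++i1;
    else if (*i2 < *i1)
      ++i2;
    else
      return false; // common element found
  }
  return true;
}

// True if every element of s1 also occurs in s2.
template<typename S1, typename S2>
bool is_subset(const S1& s1, const S2& s2)
{
  for (typename S1::const_iterator i = s1.begin(); i != s1.end(); ++i)
    if (!is_in_set(*i, s2))
      return false;
  return true;
}

// Hypothetical usage helper: exercises the three predicates on small sets.
inline void demo_predicates()
{
  std::set<int> odd, small;
  odd.insert(1); odd.insert(3);
  small.insert(1); small.insert(2); small.insert(3);
  assert(is_in_set(3, odd));
  assert(is_subset(odd, small));
  assert(!is_disjoint(odd, small));
}

} // namespace sketch
```

Walking both sets in step relies on the containers being ordered, which matches the "ordered sets" wording used for lazy_intersection and lazy_union in this namespace.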

inline int reflex::isword (int c)

Check word character.

Returns
nonzero if argument c is in [A-Za-z0-9_], zero otherwise.
Parameters
c  Character to check
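
As an illustration of the documented contract (nonzero for characters in [A-Za-z0-9_], zero otherwise), here is a small stand-alone sketch. The namespace `sketch` and the helper `word_length` are hypothetical and are not part of RE/flex; the actual library function may be implemented differently.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch { // hypothetical namespace, not part of reflex

// Nonzero if c is in [A-Za-z0-9_], zero otherwise.
inline int isword(int c)
{
  return (c >= 'A' && c <= 'Z')
      || (c >= 'a' && c <= 'z')
      || (c >= '0' && c <= '9')
      || c == '_';
}

// Hypothetical helper: number of leading word characters in a C string.
inline std::size_t word_length(const char *s)
{
  std::size_t n = 0;
  while (isword(static_cast<unsigned char>(s[n])))
    ++n;
  return n;
}

// Hypothetical usage helper.
inline void demo_isword()
{
  assert(isword('_'));
  assert(!isword('-'));
  assert(word_length("foo_1 bar") == 5);
}

} // namespace sketch
```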

std::string reflex::regroup (const std::string &regex)

Regroup: convert a regex string to a regex that captures groups only for the outermost (top-level) alternatives. The top-level alternatives X1, X2, ..., Xn of a regex X1|X2|...|Xn are each wrapped in a capturing group, so that X1|X2|...|Xn => (X1)|(X2)|...|(Xn). Nested capturing groups (Y) are converted to non-capturing groups, (Y) => (?:Y). Existing non-capturing groups are retained. For example, a|b(c)|(?:d) => (a)|(b(?:c))|((?:d)).
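
The transformation described above can be sketched in a few lines. This is a simplified illustration, not the RE/flex implementation: the namespace `sketch` is hypothetical, and the code assumes that escapes are a backslash followed by one character, that bracket lists end at the first unescaped `]`, and that any group starting with `(?` should be copied unchanged.

```cpp
#include <cassert>
#include <string>

namespace sketch { // hypothetical namespace, not part of reflex

// Wrap each top-level alternative in a capturing group and turn nested
// capturing groups into non-capturing ones (simplified sketch).
inline std::string regroup(const std::string& regex)
{
  std::string out = "(";
  int depth = 0; // current parenthesis nesting level
  for (std::string::size_type i = 0; i < regex.size(); ++i)
  {
    char c = regex[i];
    if (c == '\\' && i + 1 < regex.size())
    {
      out += c;             // copy the escape and the escaped character
      out += regex[++i];
    }
    else if (c == '[')
    {
      out += c;             // copy a bracket list verbatim
      while (++i < regex.size())
      {
        out += regex[i];
        if (regex[i] == '\\' && i + 1 < regex.size())
          out += regex[++i];
        else if (regex[i] == ']')
          break;
      }
    }
    else if (c == '(')
    {
      ++depth;
      if (i + 1 < regex.size() && regex[i + 1] == '?')
        out += '(';         // (?...) groups are retained as they are
      else
        out += "(?:";       // nested capture becomes non-capturing
    }
    else if (c == ')')
    {
      --depth;
      out += c;
    }
    else if (c == '|' && depth == 0)
    {
      out += ")|(";         // close one top-level group, open the next
    }
    else
    {
      out += c;
    }
  }
  out += ')';
  return out;
}

// Hypothetical usage helper: reproduces the example from the description.
inline void demo_regroup()
{
  assert(regroup("a|b(c)|(?:d)") == "(a)|(b(?:c))|((?:d))");
}

} // namespace sketch
```

With this shape, the index of the capturing group that matched identifies which top-level alternative was chosen.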

template<typename S1, typename S2>
void reflex::set_delete (S1 &s1, const S2 &s2)

Delete elements of set s2 from set s1.

template<typename S1, typename S2>
inline void reflex::set_insert (S1 &s1, const S2 &s2)

Insert set s2 into set s1.
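
A minimal sketch of the two mutating set operations, set_delete and set_insert, is given here for illustration. It assumes std::set-like containers with range insert() and key-based erase(); the namespace `sketch` and the helper `demo_set_ops` are hypothetical and do not reproduce the RE/flex source.

```cpp
#include <cassert>
#include <set>

namespace sketch { // hypothetical namespace, not part of reflex

// Insert all elements of s2 into s1.
template<typename S1, typename S2>
inline void set_insert(S1& s1, const S2& s2)
{
  s1.insert(s2.begin(), s2.end());
}

// Delete from s1 every element that occurs in s2.
template<typename S1, typename S2>
void set_delete(S1& s1, const S2& s2)
{
  for (typename S2::const_iterator i = s2.begin(); i != s2.end(); ++i)
    s1.erase(*i);
}

// Hypothetical usage helper: {1,2} plus {2,3} gives {1,2,3};
// deleting {2,3} afterwards leaves {1}.
inline std::set<int> demo_set_ops()
{
  std::set<int> s1, s2;
  s1.insert(1); s1.insert(2);
  s2.insert(2); s2.insert(3);
  set_insert(s1, s2);
  assert(s1.size() == 3);
  set_delete(s1, s2);
  return s1;
}

} // namespace sketch
```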

std::string reflex::utf8 (unicode_t a, unicode_t b, bool strict=true, const char *esc=NULL)

Convert a UCS range [a,b] to a UTF-8 regex pattern.

Returns
regex string to match the UCS range encoded in UTF-8.
Parameters
a  lower bound of UCS range
b  upper bound of UCS range
strict  returned regex is strict UTF-8 (true) or permissive and lean UTF-8 (false)
esc  escape char(s), limited to 0-3 chars; one backslash "\\" is the default

inline size_t reflex::utf8 (unicode_t c, char *s)

Convert UCS to UTF-8.

Returns
length (in bytes) of UTF-8 character sequence stored in s.
Parameters
c  UCS character
s  points to the buffer to populate with UTF-8 (1 to 6 bytes), not \0-terminated
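
To make the 1 to 6 byte output concrete, the following sketch encodes a UCS value using the original (pre-RFC 3629) UTF-8 scheme. It is an illustration only: the namespace `sketch` and the type `ucs_t` are hypothetical stand-ins, the value is assumed to be at most 0x7FFFFFFF, and the code is not taken from RE/flex.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch { // hypothetical namespace, not part of reflex

typedef unsigned long ucs_t; // hypothetical stand-in for reflex::unicode_t

// Store the UTF-8 encoding of c in s (no \0 terminator) and return its
// length in bytes. Precondition (assumed): c <= 0x7FFFFFFF and s has room
// for 6 bytes.
inline std::size_t utf8(ucs_t c, char *s)
{
  if (c < 0x80)
  {
    s[0] = static_cast<char>(c); // ASCII is encoded as a single byte
    return 1;
  }
  // Sequence length by value range.
  std::size_t n = c < 0x800 ? 2
                : c < 0x10000 ? 3
                : c < 0x200000 ? 4
                : c < 0x4000000 ? 5
                : 6;
  // Continuation bytes 10xxxxxx, filled from the last byte backwards.
  for (std::size_t i = n - 1; i > 0; --i)
  {
    s[i] = static_cast<char>(0x80 | (c & 0x3F));
    c >>= 6;
  }
  // Lead byte: n high bits set, then a zero bit, then the remaining bits.
  s[0] = static_cast<char>(((0xFF00 >> n) & 0xFF) | c);
  return n;
}

// Hypothetical usage helper: U+20AC (euro sign) encodes as E2 82 AC.
inline void demo_encode()
{
  char buf[6];
  std::size_t n = utf8(0x20AC, buf);
  assert(n == 3);
  assert(static_cast<unsigned char>(buf[0]) == 0xE2);
}

} // namespace sketch
```

Because the buffer is not \0-terminated, callers would use the returned length to append the bytes, for example with std::string::append(buf, n).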

inline unicode_t reflex::utf8 (const char *s)

Convert UTF-8 to UCS. Returns 0xFFFD for invalid UTF-8, except for MUTF-8 U+0000 and 0xD800-0xDFFF surrogate halves (use WITH_UTF8_UNRESTRICTED to remove this limit and support lossless UTF-8 encoding up to 6 bytes).

Returns
UCS character.
Parameters
s  points to the buffer with UTF-8 (1 to 6 bytes)
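
The decoding direction can be sketched in the same way. The version below is deliberately stricter and simpler than the documented function: it handles only 1 to 4 byte sequences, rejects overlong forms (including the MUTF-8 encoding of U+0000) and surrogate halves, and has no equivalent of WITH_UTF8_UNRESTRICTED. The namespace `sketch` and the type `ucs_t` are hypothetical.

```cpp
#include <cassert>

namespace sketch { // hypothetical namespace, not part of reflex

typedef unsigned long ucs_t; // hypothetical stand-in for reflex::unicode_t

// Decode the first UTF-8 sequence of a \0-terminated buffer.
// Returns 0xFFFD for malformed input (strict sketch, 1 to 4 bytes only).
inline ucs_t utf8(const char *s)
{
  const ucs_t invalid = 0xFFFD;
  const unsigned char *p = reinterpret_cast<const unsigned char*>(s);
  ucs_t c = p[0];
  if (c < 0x80)
    return c;     // ASCII
  int n;          // number of continuation bytes expected
  ucs_t min;      // smallest value that needs this sequence length
  if ((c & 0xE0) == 0xC0)      { n = 1; min = 0x80;    c &= 0x1F; }
  else if ((c & 0xF0) == 0xE0) { n = 2; min = 0x800;   c &= 0x0F; }
  else if ((c & 0xF8) == 0xF0) { n = 3; min = 0x10000; c &= 0x07; }
  else
    return invalid; // stray continuation byte or unsupported lead byte
  for (int i = 1; i <= n; ++i)
  {
    if ((p[i] & 0xC0) != 0x80)
      return invalid; // missing continuation byte; also stops at \0
    c = (c << 6) | (p[i] & 0x3F);
  }
  if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return invalid; // overlong form, out of range, or surrogate half
  return c;
}

// Hypothetical usage helper: E2 82 AC decodes to U+20AC.
inline void demo_decode()
{
  assert(utf8("\xE2\x82\xAC") == 0x20AC);
  assert(utf8("\xC3") == 0xFFFD); // truncated sequence
}

} // namespace sketch
```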