

Named Captures

Overview

For complicated regular expressions, dealing with numbered captures can be a pain. Counting left parentheses to figure out which capture to reference is no fun. Less fun is the fact that merely editing a regular expression could cause a capture to be assigned a new number, invalidating code that refers back to it by the old number.

Other regular expression engines solve this problem with a feature called named captures. This feature allows you to assign a name to a capture, and to refer back to the capture by name rather than by number. Xpressive also supports named captures, both in dynamic and in static regexes.

Dynamic Named Captures

For dynamic regular expressions, xpressive follows the lead of other popular regex engines with the syntax of named captures. You can create a named capture with "(?P<xxx>...)" and refer back to that capture with "(?P=xxx)". Here, for instance, is a regular expression that creates a named capture and refers back to it:

// Create a named capture called "char" that matches a single
// character and refer back to that capture by name.
sregex rx = sregex::compile("(?P<char>.)(?P=char)");

The effect of the above regular expression is to find the first doubled character.

Once you have executed a match or search operation using a regex with named captures, you can access the named capture through the match_results<> object using the capture's name.

std::string str("tweet");
sregex rx = sregex::compile("(?P<char>.)(?P=char)");
smatch what;
if(regex_search(str, what, rx))
{
    std::cout << "char = " << what["char"] << std::endl;
}

The above code displays:

char = e

You can also refer back to a named capture from within a substitution string. The syntax for that is "\\g<xxx>". Below is some code that demonstrates how to use named captures when doing string substitution.

std::string str("tweet");
sregex rx = sregex::compile("(?P<char>.)(?P=char)");
str = regex_replace(str, rx, "**\\g<char>**", regex_constants::format_perl);
std::cout << str << std::endl;

Notice that you have to specify format_perl when referring to named captures in a substitution string. Only the perl format syntax recognizes "\\g<xxx>". The above code displays:

tw**e**t

Static Named Captures

If you're using static regular expressions, creating and using named captures is even easier. You can use the mark_tag type to create a variable that you can use like s1, s2 and friends, but with a name that is more meaningful. Below is how the above example would look using static regexes:

mark_tag char_(1); // char_ is now a synonym for s1
sregex rx = (char_= _) >> char_;

After a match operation, you can use the mark_tag to index into the match_results<> to access the named capture:

std::string str("tweet");
mark_tag char_(1);
sregex rx = (char_= _) >> char_;
smatch what;
if(regex_search(str, what, rx))
{
    std::cout << "char = " << what[char_] << std::endl;
}

The above code displays:

char = e

When doing string substitutions with regex_replace(), you can use named captures to create format expressions as below:

std::string str("tweet");
mark_tag char_(1);
sregex rx = (char_= _) >> char_;
str = regex_replace(str, rx, "**" + char_ + "**");
std::cout << str << std::endl;

The above code displays:

tw**e**t
Note

You need to include <boost/xpressive/regex_actions.hpp> to use format expressions.

