regex_utils library#
This library contains utilities for handling regex-related tasks.
Regex to Wildcard Translator#
Goal#
Performs a best-effort translation of a regex string into an equivalent wildcard string.
CLP currently recognizes only three metacharacters in the wildcard syntax:
?   Matches any single character
*   Matches zero or more characters
\   Suppresses the special meaning of metacharacters (including itself)
If a regex query can be expressed as a wildcard query using only the three metacharacters above, CLP should use the wildcard version.
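To make the semantics of the three metacharacters concrete, here is a minimal matcher sketch. It is not CLP's matcher: the function name `wildcard_match` is hypothetical, and treating a trailing lone backslash as a literal is an assumption made only for this example.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, not part of regex_utils: returns true if `text` matches
// `pattern` under the three-metacharacter wildcard syntax (`?`, `*`, `\`).
bool wildcard_match(std::string_view pattern, std::string_view text) {
    std::size_t p{0};
    std::size_t t{0};
    // Pattern position right after the most recent `*`, and the text position
    // that `*` currently extends to. Used for backtracking.
    std::size_t star_p{std::string_view::npos};
    std::size_t star_t{0};

    while (t < text.size()) {
        bool matched{false};
        if (p < pattern.size()) {
            if ('*' == pattern[p]) {
                // Record the star and first try to let it match zero characters
                star_p = ++p;
                star_t = t;
                continue;
            }
            if ('?' == pattern[p]) {
                // `?` consumes exactly one character
                ++p;
                ++t;
                matched = true;
            } else if ('\\' == pattern[p] && p + 1 < pattern.size()) {
                // Escaped character: compare the character after the backslash literally
                if (pattern[p + 1] == text[t]) {
                    p += 2;
                    ++t;
                    matched = true;
                }
            } else if (pattern[p] == text[t]) {
                // Ordinary literal (assumption: a trailing lone `\` also lands here)
                ++p;
                ++t;
                matched = true;
            }
        }
        if (matched) {
            continue;
        }
        if (std::string_view::npos == star_p) {
            return false;
        }
        // Backtrack: let the most recent `*` absorb one more character
        p = star_p;
        t = ++star_t;
    }

    // The text is exhausted, so only trailing `*`s may remain in the pattern
    while (p < pattern.size() && '*' == pattern[p]) {
        ++p;
    }
    return p == pattern.size();
}
```

For example, `wildcard_match("abc*def?ghi?*", "abcXXdefYghiZ")` returns true, while `wildcard_match(R"(a\*b)", "aXb")` returns false because the escaped `*` only matches a literal asterisk.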
Includes#
The translator function returns a Result<std::string, std::error_code> type, which can either contain a value or an error code.
To use the translator:
#include <regex_utils/regex_translation_utils.hpp>
using clp::regex_utils::regex_to_wildcard;
// Other code
// regex_str holds the input regex; the result holds the translated wildcard string
auto result{regex_to_wildcard(regex_str)};
if (result.has_error()) {
    auto err_code{result.error()};
    // Handle error
} else {
    auto wildcard_str{result.value()};
    // Do things with the translated string
}
To add custom configuration to the translator:
#include <regex_utils/RegexToWildcardTranslatorConfig.hpp>
RegexToWildcardTranslatorConfig config{true, false, /*...other booleans*/};
auto result{regex_to_wildcard(regex_str, config)};
// Same as above
For a detailed description of the option order and usage, see the Custom configuration section.
Functionalities#
Wildcards
- Turn . into ?
- Turn .* into *
- Turn .+ into ?*
- E.g. abc.*def.ghi.+ will get translated to abc*def?ghi?*
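The three wildcard rules above can be sketched as a single left-to-right pass. This is only an illustration, not the library's implementation: `translate_dot_wildcards` is a hypothetical name, and the sketch copies every other character verbatim without handling escapes, character sets, or other regex metacharacters.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper, not part of regex_utils: applies only the three rules
// `.` -> `?`, `.*` -> `*`, and `.+` -> `?*`. All other characters are copied as-is.
std::string translate_dot_wildcards(std::string_view regex) {
    std::string wildcard;
    for (std::size_t i{0}; i < regex.size(); ++i) {
        if ('.' != regex[i]) {
            wildcard += regex[i];
            continue;
        }
        // Look one character ahead to see whether the dot is quantified
        char const next{i + 1 < regex.size() ? regex[i + 1] : '\0'};
        if ('*' == next) {
            wildcard += '*';  // zero or more of any character
            ++i;
        } else if ('+' == next) {
            wildcard += "?*";  // one character, then zero or more
            ++i;
        } else {
            wildcard += '?';  // exactly one character
        }
    }
    return wildcard;
}
```

Calling `translate_dot_wildcards("abc.*def.ghi.+")` yields `abc*def?ghi?*`, matching the example above.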
Metacharacter escape sequences
- An escaped regex metacharacter is treated as a literal and appended to the wildcard output.
  - The list of characters that require escaping to have their special meanings suppressed is [\/^$.|?*+(){}.
  - Superfluous escape characters are ignored for the following characters: ],<>-_=!.
  - E.g. a\[\+b\-\_c-_d will get translated to a[+b-_c-_d
  - Note: generally, any non-alphanumeric character can be escaped to use it as a literal. The list this utils library supports is non-exhaustive and can be expanded when necessary.
- For metacharacters shared by both syntaxes, keep the escape backslashes.
  - The list of characters that fall into this category is *?\. All wildcard metacharacters are also regex metacharacters.
  - E.g. a\*b\?c\\d will get translated to a\*b\?c\\d (no change)
- Escape sequences with alphanumeric characters are disallowed.
  - E.g. Special utility escape sequences \Q, \E, \A etc. and back references \1, \2 etc. cannot be translated.
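The escape-sequence rules above can be sketched as follows. This is an assumption-laden illustration rather than the library's code: `translate_escapes` is a hypothetical name, it signals failure with `std::optional` instead of the library's `Result` type, and it copies unescaped characters verbatim instead of interpreting them.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper, not part of regex_utils: rewrites only escape sequences.
// Returns std::nullopt for escape sequences that the rules above disallow.
std::optional<std::string> translate_escapes(std::string_view regex) {
    // Metacharacters in both syntaxes: the backslash must be kept
    constexpr std::string_view cSharedMetaChars{"*?\\"};
    // Metacharacters in regex only: the backslash is dropped
    constexpr std::string_view cRegexOnlyMetaChars{"[/^$.|+(){}"};
    // Characters whose escape is superfluous: the backslash is dropped
    constexpr std::string_view cSuperfluousEscapes{"],<>-_=!"};

    std::string wildcard;
    for (std::size_t i{0}; i < regex.size(); ++i) {
        if ('\\' != regex[i]) {
            // Assumption for this sketch: unescaped characters are copied as-is
            wildcard += regex[i];
            continue;
        }
        if (i + 1 == regex.size()) {
            return std::nullopt;  // dangling backslash
        }
        char const escaped{regex[++i]};
        if (std::string_view::npos != cSharedMetaChars.find(escaped)) {
            wildcard += '\\';
            wildcard += escaped;
        } else if (std::string_view::npos != cRegexOnlyMetaChars.find(escaped)
                   || std::string_view::npos != cSuperfluousEscapes.find(escaped))
        {
            wildcard += escaped;
        } else {
            // Alphanumeric escapes such as \Q, \A, or \1, and anything unlisted
            return std::nullopt;
        }
    }
    return wildcard;
}
```

With this sketch, `a\[\+b\-\_c-_d` becomes `a[+b-_c-_d`, `a\*b\?c\\d` is returned unchanged, and `\Q` or `\1` produce an empty optional.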
Character set
- Reduces a character set into a single character if possible.
  - A trivial character set containing a single character or a single escaped metacharacter.
    - E.g. [a] into a, [\^] into ^
  - If the case_insensitive_wildcard config is turned on, the translator can also reduce the case-insensitive style character set patterns into a single lowercase character:
    - E.g. [aA] into a, [Bb] into b, [xX][Yy][zZ] into xyz
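The character set reductions can be sketched with a helper that receives the text between the brackets. The name `reduce_char_set` and its signature are hypothetical, and the sketch returns the bare literal character, leaving any escaping needed in the wildcard output to the caller. Rejecting a lone unescaped `^` is an assumption, since it would be a negation with no members.

```cpp
#include <cctype>
#include <optional>
#include <string_view>

// Hypothetical helper, not part of regex_utils: `body` is the text between `[`
// and `]`. Returns the single literal character the set reduces to, or
// std::nullopt if no reduction described above applies.
std::optional<char> reduce_char_set(std::string_view body, bool case_insensitive_wildcard) {
    // Trivial set with one plain character, e.g. [a]
    if (1 == body.size() && '\\' != body[0] && '^' != body[0]) {
        return body[0];
    }

    // Trivial set with one escaped metacharacter, e.g. [\^]
    if (2 == body.size() && '\\' == body[0]) {
        if (0 != std::isalnum(static_cast<unsigned char>(body[1]))) {
            return std::nullopt;  // alphanumeric escape sequences are disallowed
        }
        return body[1];
    }

    // Case-insensitive pair, e.g. [aA] or [Bb], reduced to the lowercase letter
    if (case_insensitive_wildcard && 2 == body.size()) {
        auto const first{static_cast<unsigned char>(body[0])};
        auto const second{static_cast<unsigned char>(body[1])};
        if (0 != std::isalpha(first) && 0 != std::isalpha(second)
            && std::tolower(first) == std::tolower(second))
        {
            return static_cast<char>(std::tolower(first));
        }
    }

    return std::nullopt;
}
```

Applied to each set in `[xX][Yy][zZ]` with the option enabled, the helper returns `x`, `y`, and `z` in turn. With the option disabled, `[aA]` is not reduced.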
Custom configuration#
The RegexToWildcardTranslatorConfig class objects are currently immutable once instantiated. By
default, all of the options are set to false.
The constructor takes the following option arguments in order:
- case_insensitive_wildcard: see the Character set bullet point in the Functionalities section.
- add_prefix_suffix_wildcards: in the absence of regex anchors, add prefix or suffix wildcards so the query becomes a substring query.
  - E.g. info.*system gets translated into *info*system* which makes the original query a substring query.
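The effect of add_prefix_suffix_wildcards can be sketched as below. This is a guess at the behavior based only on the description above: the function name is hypothetical, the input is assumed to be already translated to wildcard syntax except for a possible leading `^` and trailing `$`, and the sketch does not try to merge a new `*` with one that is already present.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper, not part of regex_utils: strips a leading `^` and an
// unescaped trailing `$`, and adds a `*` on each side that has no anchor.
std::string add_prefix_suffix_wildcards(std::string_view body) {
    bool const has_start_anchor{!body.empty() && '^' == body.front()};
    if (has_start_anchor) {
        body.remove_prefix(1);
    }

    bool has_end_anchor{false};
    if (!body.empty() && '$' == body.back()) {
        // A `$` preceded by an odd number of backslashes is an escaped literal
        std::size_t num_backslashes{0};
        for (std::size_t i{body.size() - 1}; i > 0 && '\\' == body[i - 1]; --i) {
            ++num_backslashes;
        }
        has_end_anchor = (0 == num_backslashes % 2);
    }
    if (has_end_anchor) {
        body.remove_suffix(1);
    }

    std::string wildcard;
    if (!has_start_anchor) {
        wildcard += '*';  // no `^`: the match may start anywhere
    }
    wildcard.append(body);
    if (!has_end_anchor) {
        wildcard += '*';  // no `$`: the match may end anywhere
    }
    return wildcard;
}
```

Under these assumptions, `info*system` becomes `*info*system*`, `^info*system` becomes `info*system*`, and `^info*system$` becomes `info*system`.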