# regex_utils library

This library contains utilities for regex-related tasks.

## Regex to Wildcard Translator

### Goal

The translator performs a best-effort translation of a regex string into an equivalent wildcard string.

CLP currently recognizes only three metacharacters in the wildcard syntax:

- `?` matches any single character.
- `*` matches zero or more characters.
- `\` suppresses the special meaning of a metacharacter (including itself).

If a regex query can be expressed as a wildcard query that uses only the three metacharacters above, CLP should use the wildcard version.
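
To make the semantics of the three metacharacters concrete, here is a minimal sketch of a full-string wildcard matcher. `wildcard_match` is a hypothetical helper written for illustration; it is not part of `regex_utils` and is not CLP's matcher. It uses the common greedy-with-backtracking approach: on a mismatch, the most recent `*` absorbs one more character and matching resumes.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper for illustration; not part of regex_utils.
// Returns true if `pattern` matches the whole of `text`, where `?` matches
// any single character, `*` matches zero or more characters, and `\` makes
// the next pattern character a literal.
bool wildcard_match(std::string_view pattern, std::string_view text) {
    std::size_t p{0};
    std::size_t t{0};
    // Pattern position just after the most recent `*`, and the text position
    // where matching resumes after the characters that `*` has absorbed.
    std::size_t star_p{std::string_view::npos};
    std::size_t star_t{0};

    while (t < text.size()) {
        if (p < pattern.size() && '*' == pattern[p]) {
            star_p = ++p;
            star_t = t;
            continue;
        }
        if (p < pattern.size()) {
            bool const escaped{'\\' == pattern[p] && p + 1 < pattern.size()};
            char const expected{escaped ? pattern[p + 1] : pattern[p]};
            if ((!escaped && '?' == expected) || expected == text[t]) {
                p += escaped ? 2 : 1;
                ++t;
                continue;
            }
        }
        if (std::string_view::npos == star_p) {
            return false;
        }
        // Let the most recent `*` absorb one more character and retry.
        p = star_p;
        t = ++star_t;
    }
    // Any pattern left over must consist only of `*`s.
    while (p < pattern.size() && '*' == pattern[p]) {
        ++p;
    }
    return p == pattern.size();
}
```

For example, `wildcard_match("a\\*b", "a*b")` is true while `wildcard_match("a\\*b", "axb")` is false, because the escaped `*` only matches a literal asterisk.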

### Includes

- The translator function returns a `Result<std::string, std::error_code>` type, which contains either a value or an error code.

To use the translator:

```cpp
#include <regex_utils/regex_translation_utils.hpp>

using clp::regex_utils::regex_to_wildcard;

// Other code

auto result{regex_to_wildcard(regex_str)};
if (result.has_error()) {
    auto err_code{result.error()};
    // Handle error
} else {
    auto wildcard_str{result.value()};
    // Do things with the translated string
}
```

- To add a custom configuration to the translator:

```cpp
#include <regex_utils/RegexToWildcardTranslatorConfig.hpp>

RegexToWildcardTranslatorConfig config{true, false, /*...other booleans*/};
auto result{regex_to_wildcard(regex_str, config)};
// Handle `result` the same way as above
```

For a detailed description of the options' order and usage, see the Custom configuration section.

### Functionalities

- Wildcards
  - Turn `.` into `?`.
  - Turn `.*` into `*`.
  - Turn `.+` into `?*`.
  - E.g. `abc.*def.ghi.+` is translated to `abc*def?ghi?*`.
- Metacharacter escape sequences
  - An escaped regex metacharacter is treated as a literal and appended to the wildcard output.
    - The characters that require escaping to suppress their special meaning are `[\/^$.|?*+(){}`.
    - Superfluous escape characters are ignored for the following characters: `],<>-_=!`.
    - E.g. `a\[\+b\-\_c-_d` is translated to `a[+b-_c-_d`.
    - Note: generally, any non-alphanumeric character can be escaped to use it as a literal. The list supported by this library is non-exhaustive and can be expanded when necessary.
  - For metacharacters shared by both syntaxes, the escape backslashes are kept.
    - The characters in this category are `*?\`. All wildcard metacharacters are also regex metacharacters.
    - E.g. `a\*b\?c\\d` is translated to `a\*b\?c\\d` (no change).
  - Escape sequences with alphanumeric characters are disallowed.
    - E.g. special utility escape sequences such as `\Q`, `\E`, and `\A`, and back references such as `\1` and `\2`, cannot be translated.
- Character set
  - Reduces a character set to a single character if possible.
    - A trivial character set contains a single character or a single escaped metacharacter.
      - E.g. `[a]` into `a`, `[\^]` into `^`.
    - If the `case_insensitive_wildcard` config is turned on, the translator can also reduce case-insensitive-style character sets into a single lowercase character.
      - E.g. `[aA]` into `a`, `[Bb]` into `b`, `[xX][Yy][zZ]` into `xyz`.
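
The rules above can be sketched as a small single-pass translator. This is a toy written for illustration, not the `regex_utils` implementation: `toy_regex_to_wildcard` and its helpers are hypothetical names, it returns `std::nullopt` instead of an error code, and it rejects every construct not listed above (anchors, alternation, groups, quantifiers other than `.*` and `.+`, negated sets, and ranges such as `[a-z]`). The escape lists are copied from the bullets above.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical toy translator for illustration only; NOT the regex_utils
// implementation. Returns std::nullopt for any construct it does not handle.
constexpr std::string_view cRegexMetachars{"[\\/^$.|?*+(){}"};  // escapable
constexpr std::string_view cSuperfluousEscapes{"],<>-_=!"};
constexpr std::string_view cUnsupported{"^$|?*+(){}"};  // unescaped, rejected
constexpr std::string_view cWildcardMetachars{"*?\\"};  // stay escaped

bool contains(std::string_view set, char c) {
    return set.find(c) != std::string_view::npos;
}

// Appends `c` as a literal, re-escaping it if it is a wildcard metacharacter.
void append_literal(std::string& out, char c) {
    if (contains(cWildcardMetachars, c)) {
        out += '\\';
    }
    out += c;
}

// Resolves the character after `\`. Escapes outside the two lists above
// (including alphanumeric ones such as \Q or \1) fail.
std::optional<char> unescape(char c) {
    if (contains(cRegexMetachars, c) || contains(cSuperfluousEscapes, c)) {
        return c;
    }
    return std::nullopt;
}

std::optional<std::string>
toy_regex_to_wildcard(std::string_view regex, bool case_insensitive_wildcard = false) {
    std::string out;
    std::size_t i{0};
    while (i < regex.size()) {
        char const c{regex[i]};
        char const next{i + 1 < regex.size() ? regex[i + 1] : '\0'};
        if ('.' == c) {
            // `.*` -> `*`, `.+` -> `?*`, lone `.` -> `?`
            if ('*' == next || '+' == next) {
                out += ('*' == next) ? "*" : "?*";
                i += 2;
            } else {
                out += '?';
                ++i;
            }
        } else if ('\\' == c) {
            auto const literal{unescape(next)};
            if (!literal) {
                return std::nullopt;
            }
            append_literal(out, *literal);
            i += 2;
        } else if ('[' == c) {
            if ('^' == next) {
                return std::nullopt;  // negated sets are not supported
            }
            // Collect the set's members (resolving escapes) up to the closing `]`.
            std::string members;
            std::size_t j{i + 1};
            while (j < regex.size() && ']' != regex[j]) {
                if ('\\' == regex[j]) {
                    auto const literal{unescape(j + 1 < regex.size() ? regex[j + 1] : '\0')};
                    if (!literal) {
                        return std::nullopt;
                    }
                    members += *literal;
                    j += 2;
                } else {
                    members += regex[j++];
                }
            }
            if (j >= regex.size()) {
                return std::nullopt;  // unterminated set
            }
            auto const lower = [](char ch) {
                return static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
            };
            if (1 == members.size()) {
                append_literal(out, members[0]);  // e.g. [a] -> a, [\^] -> ^
            } else if (case_insensitive_wildcard && 2 == members.size()
                       && 0 != std::isalpha(static_cast<unsigned char>(members[0]))
                       && members[0] != members[1] && lower(members[0]) == lower(members[1]))
            {
                out += lower(members[0]);  // e.g. [aA] -> a
            } else {
                return std::nullopt;  // e.g. [ab], [a-z]
            }
            i = j + 1;
        } else if (contains(cUnsupported, c)) {
            return std::nullopt;
        } else {
            out += c;
            ++i;
        }
    }
    return out;
}
```

Routing every literal through `append_literal` is what keeps `\*`, `\?`, and `\\` escaped in the output while other escaped metacharacters, such as `\[`, become plain characters. The `members[0] != members[1]` check ensures that a set like `[AA]` is not reduced to `a`, since it only matches the uppercase letter.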

### Custom configuration

`RegexToWildcardTranslatorConfig` objects are currently immutable once instantiated. By default, all options are set to `false`.

The constructor takes the following option arguments, in order:

- `case_insensitive_wildcard`: see the Character set bullet in the Functionalities section.
- `add_prefix_suffix_wildcards`: in the absence of regex anchors, add prefix or suffix wildcards so that the query becomes a substring query.
  - E.g. `info.*system` is translated into `*info*system*`, which makes the original query a substring query.
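
A minimal sketch of the `add_prefix_suffix_wildcards` behaviour, applied to an already-translated wildcard string. The function name and the `has_start_anchor` / `has_end_anchor` parameters (standing for whether the regex began with `^` or ended with `$`) are assumptions for illustration; the real translator's handling may differ. The sketch avoids doubling a `*` the pattern already has, while treating a trailing escaped `\*` as a literal that still needs a suffix wildcard.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper for illustration; not the regex_utils implementation.
// Returns true if `wildcard` ends with a `*` that is not escaped, i.e. the
// `*` is preceded by an even number of consecutive backslashes.
bool ends_with_unescaped_star(std::string_view wildcard) {
    if (wildcard.empty() || '*' != wildcard.back()) {
        return false;
    }
    std::size_t backslashes{0};
    for (std::size_t i{wildcard.size() - 1}; i > 0 && '\\' == wildcard[i - 1]; --i) {
        ++backslashes;
    }
    return 0 == backslashes % 2;
}

// Hypothetical: turns a translated wildcard into a substring query unless the
// original regex was anchored at the start (`^`) or end (`$`).
std::string add_prefix_suffix_wildcards(
        std::string_view wildcard,
        bool has_start_anchor,
        bool has_end_anchor
) {
    std::string result;
    if (!has_start_anchor && (wildcard.empty() || '*' != wildcard.front())) {
        result += '*';
    }
    result += wildcard;
    if (!has_end_anchor && !ends_with_unescaped_star(result)) {
        result += '*';
    }
    return result;
}
```

With no anchors, `add_prefix_suffix_wildcards("info*system", false, false)` produces `*info*system*`, matching the example above.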
 
