Tests for CLP's default variable schema: timestamp, int, float, hex, key-value pairs, etc.
Validates token recognition across common variable types using a default schema definition.
void single_line_with_clp_default_vars()
Validates tokenization behavior using the default schema commonly used in CLP.
This tests the BufferParser's ability to correctly tokenize inputs according to a schema defining:
- Timestamps
- Integers and floating-point numbers
- Hex strings (letters a-f and A-F only, no digits)
- Key-value pairs with named capture groups
- Generic patterns containing numbers
It ensures:
- All schema variables are registered and recognized correctly.
- Inputs are matched and classified according to their variable type.
- Capture groups are properly detected and positionally tracked.
This group demonstrates how to define and integrate regex-based schemas, including named capture groups, for structured log tokenization.
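As a rough illustration of what "positionally tracked" can mean for a capture group, the sketch below matches a key-value token and reports where the captured value begins and ends. It uses `std::regex` rather than log-surgeon, so it is not the library's API: `CaptureSpan`, `find_val_capture`, and `check_user_id_example` are hypothetical names, and because `std::regex` has no named groups, the `val` group is written as plain group 1.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper type, not part of log-surgeon: where a capture landed.
struct CaptureSpan {
    std::size_t begin;  // offset of the first captured character
    std::size_t end;    // one past the last captured character
};

// Matches `token` against the key-value pattern and reports the value's span.
// Group 1 stands in for the schema's named group `val` (assumption: std::regex
// with the default ECMAScript grammar, which lacks named groups).
bool find_val_capture(std::string const& token, CaptureSpan& span) {
    static std::regex const key_value(R"([^ \r\n=]+=([^ \r\n]*[A-Za-z0-9][^ \r\n]*))");
    std::smatch match;
    if (!std::regex_match(token, match, key_value)) {
        return false;
    }
    span.begin = static_cast<std::size_t>(match.position(1));
    span.end = span.begin + static_cast<std::size_t>(match.length(1));
    return true;
}

// Usage: in "userID=123" the value "123" occupies offsets [7, 10).
inline void check_user_id_example() {
    CaptureSpan span{};
    bool const matched = find_val_capture("userID=123", span);
    assert(matched && 7 == span.begin && 10 == span.end);
    (void)matched;
}
```

Reporting offsets rather than a copied substring lets a caller substitute the captured value in place while leaving the key untouched.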
Schema Definition
delimiters: \n\r\[:,
firstTimestamp: [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}[,\.][0-9]{0,3}
int: -{0,1}[0-9]+
float: -{0,1}[0-9]+\.[0-9]+
hex: [a-fA-F]+
equals: [^ \r\n=]+=(?<val>[^ \r\n]*[A-Za-z0-9][^ \r\n]*)
hasNumber: ={0,1}[^ \r\n=]*\d[^ \r\n=]*={0,1}
Test Input
"2012-12-12 12:12:12.123 123 123.123 abc userID=123 text user123"
Expected Logtype
" <int> <float> <hex> userID=<val> text <hasNumber>"
Expected Timestamp
"2012-12-12 12:12:12.123"
Expected Tokenization
"2012-12-12 12:12:12.123" -> "firstTimestamp"
" 123" -> "int"
" 123.123" -> "float"
" abc" -> "hex"
" userID=123" -> "equals" with "123" -> "val"
" text" -> uncaught string
" user123" -> "hasNumber"
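The expected results can be reproduced approximately with a small standalone sketch. It is a simplification under stated assumptions, not how BufferParser works internally: it uses `std::regex`, splits only on spaces (ignoring the other delimiters in the schema), tries the rules in schema order and takes the first full match, and reports the key-value rule under its schema name `equals`. `ParsedLine`, `parse_line`, and `check_test_input` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type for this sketch; it is not log-surgeon's API.
struct ParsedLine {
    std::string timestamp;
    std::string logtype;
    std::vector<std::pair<std::string, std::string>> tokens;  // (text, rule name)
};

ParsedLine parse_line(std::string const& line) {
    static std::regex const timestamp_rule(
            R"([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}[,\.][0-9]{0,3})");
    // Rules in schema order; the first full match wins. std::regex has no named
    // groups, so group 2 of `equals` stands in for the schema's `val` group.
    static std::vector<std::pair<std::string, std::regex>> const rules{
            {"int", std::regex(R"(-{0,1}[0-9]+)")},
            {"float", std::regex(R"(-{0,1}[0-9]+\.[0-9]+)")},
            {"hex", std::regex(R"([a-fA-F]+)")},
            {"equals", std::regex(R"(([^ \r\n=]+=)([^ \r\n]*[A-Za-z0-9][^ \r\n]*))")},
            {"hasNumber", std::regex(R"(={0,1}[^ \r\n=]*\d[^ \r\n=]*={0,1})")},
    };

    ParsedLine result;
    std::size_t pos = 0;
    // The timestamp contains a space, so match it as a prefix before splitting.
    std::smatch ts_match;
    if (std::regex_search(line, ts_match, timestamp_rule,
                          std::regex_constants::match_continuous)) {
        result.timestamp = ts_match.str();
        result.tokens.emplace_back(result.timestamp, "firstTimestamp");
        pos = result.timestamp.size();
    }
    while (pos < line.size()) {
        // A token is its leading spaces plus the text up to the next space.
        std::size_t const text_begin = line.find_first_not_of(' ', pos);
        if (std::string::npos == text_begin) {
            result.logtype.append(line, pos, std::string::npos);  // trailing spaces
            break;
        }
        std::size_t text_end = line.find(' ', text_begin);
        if (std::string::npos == text_end) {
            text_end = line.size();
        }
        std::string const delims = line.substr(pos, text_begin - pos);
        std::string const text = line.substr(text_begin, text_end - text_begin);
        pos = text_end;

        std::string rule_name;             // stays empty for uncaught static text
        std::string logtype_piece = text;  // static text is copied verbatim
        for (auto const& rule : rules) {
            std::smatch match;
            if (!std::regex_match(text, match, rule.second)) {
                continue;
            }
            rule_name = rule.first;
            // For `equals`, keep the key and replace only the captured value.
            logtype_piece = ("equals" == rule.first) ? match.str(1) + "<val>"
                                                     : "<" + rule.first + ">";
            break;
        }
        result.tokens.emplace_back(delims + text, rule_name);
        result.logtype += delims + logtype_piece;
    }
    return result;
}

// Usage: the test input yields the expected timestamp and logtype.
inline void check_test_input() {
    ParsedLine const parsed = parse_line(
            "2012-12-12 12:12:12.123 123 123.123 abc userID=123 text user123");
    assert("2012-12-12 12:12:12.123" == parsed.timestamp);
    assert(" <int> <float> <hex> userID=<val> text <hasNumber>" == parsed.logtype);
}
```

Rule order matters in this sketch: "123" also satisfies `hasNumber`, but `int` is listed first and therefore wins, and "text" matches no rule (the letter t is outside a-f), so it stays in the logtype as static text.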