clp_ffi_py.ir package#
Submodules#
- clp_ffi_py.ir.native module
Deserializer
DeserializerBuffer
FourByteDeserializer
FourByteSerializer
KeyValuePairLogEvent
LogEvent
Metadata
Query
Query.__init__()
Query.__new__()
Query.default_search_time_lower_bound()
Query.default_search_time_termination_margin()
Query.default_search_time_upper_bound()
Query.get_search_time_lower_bound()
Query.get_search_time_termination_margin()
Query.get_search_time_upper_bound()
Query.get_wildcard_queries()
Query.match_log_event()
Serializer
- clp_ffi_py.ir.native_deprecated module
- clp_ffi_py.ir.query_builder module
QueryBuilder
QueryBuilder.add_wildcard_query()
QueryBuilder.add_wildcard_queries()
QueryBuilder.build()
QueryBuilder.reset()
QueryBuilder.reset_search_time_lower_bound()
QueryBuilder.reset_search_time_termination_margin()
QueryBuilder.reset_search_time_upper_bound()
QueryBuilder.reset_wildcard_queries()
QueryBuilder.search_time_lower_bound
QueryBuilder.search_time_termination_margin
QueryBuilder.search_time_upper_bound
QueryBuilder.set_search_time_lower_bound()
QueryBuilder.set_search_time_termination_margin()
QueryBuilder.set_search_time_upper_bound()
QueryBuilder.wildcard_queries
QueryBuilderException
- clp_ffi_py.ir.readers module
ClpIrFileReader
ClpIrStreamReader
ClpIrStreamReader.DEFAULT_DECODER_BUFFER_SIZE
ClpIrStreamReader.DEFAULT_DESERIALIZER_BUFFER_SIZE
ClpIrStreamReader.__init__()
ClpIrStreamReader.close()
ClpIrStreamReader.get_metadata()
ClpIrStreamReader.has_metadata()
ClpIrStreamReader.read_next_log_event()
ClpIrStreamReader.read_preamble()
ClpIrStreamReader.search()
Module contents#
- class clp_ffi_py.ir.ClpIrFileReader[source]#
Bases:
ClpIrStreamReader
Wrapper class of ClpIrStreamReader that calls open for convenience.
- class clp_ffi_py.ir.ClpIrStreamReader[source]#
Bases:
Iterator[LogEvent]
This class represents a stream reader used to read/deserialize log events from a CLP IR stream. It also provides method(s) to instantiate a log event generator with a customized search query.
- Parameters:
istream – Input stream that contains CLP IR byte sequence.
deserializer_buffer_size – Initial size of the deserializer buffer.
enable_compression – A flag indicating whether the istream is compressed using zstd.
allow_incomplete_stream – If set to True, an incomplete CLP IR stream is not treated as an error. Instead, encountering such a stream is seen as reaching its end without raising any exceptions.
decoder_buffer_size – Deprecated since 0.0.13. Use deserializer_buffer_size instead. This argument is provided for backward compatibility and, if set, overrides the value of deserializer_buffer_size.
- DEFAULT_DECODER_BUFFER_SIZE: int = 65536#
- DEFAULT_DESERIALIZER_BUFFER_SIZE: int = 65536#
- __init__(istream, deserializer_buffer_size=65536, enable_compression=True, allow_incomplete_stream=False, decoder_buffer_size=None)[source]#
- Parameters:
istream (IO[bytes])
deserializer_buffer_size (int)
enable_compression (bool)
allow_incomplete_stream (bool)
decoder_buffer_size (int | None)
- read_next_log_event()[source]#
Reads and deserializes the next log event from the IR stream.
- Returns:
Next unread log event represented as an instance of LogEvent.
None if the end of IR stream is reached.
- Raises:
Exception – If deserialize_next_log_event() fails.
- Return type:
LogEvent | None
- read_preamble()[source]#
Tries to deserialize the preamble and set the metadata. If the metadata has already been set, this method returns immediately. It is kept separate from __init__ so that the input stream does not need to be readable when the reader is constructed, only once the user starts to iterate over the log events.
- Raises:
Exception – If deserialize_preamble() fails.
- Return type:
None
- class clp_ffi_py.ir.Decoder[source]#
Bases:
object
Deprecated since version 0.0.13: Decoder is deprecated and has been renamed to FourByteDeserializer.
- static decode_next_log_event(decoder_buffer, query=None, allow_incomplete_stream=False)[source]#
This method is deprecated and has been renamed to deserialize_next_log_event().
- Parameters:
decoder_buffer (DecoderBuffer)
query (Query | None)
allow_incomplete_stream (bool)
- Return type:
LogEvent | None
- static decode_preamble(decoder_buffer)[source]#
This method is deprecated and has been renamed to deserialize_preamble().
- Parameters:
decoder_buffer (DecoderBuffer)
- Return type:
Metadata
- class clp_ffi_py.ir.DecoderBuffer[source]#
Bases:
object
Deprecated since version 0.0.13: DecoderBuffer is deprecated and has been renamed to DeserializerBuffer.
- __init__(input_stream, initial_buffer_capacity=4096)[source]#
- Parameters:
input_stream (IO[bytes])
initial_buffer_capacity (int)
- static __new__(cls, *args, **kwargs)#
- get_num_decoded_log_messages()[source]#
This method is deprecated and has been renamed to get_num_deserialized_log_messages().
- Return type:
int
- class clp_ffi_py.ir.Deserializer#
Bases:
object
Deserializer for deserializing CLP key-value pair IR streams. This class deserializes a CLP key-value pair IR stream into log events.
__init__(self, input_stream, buffer_capacity=65536, allow_incomplete_stream=False)
Initializes a Deserializer instance with the given inputs. Note that each object should only be initialized once. Double initialization will result in a memory leak.
- Parameters:
input_stream (IO[bytes]) – Serialized CLP IR stream.
buffer_capacity (int) – The capacity of the underlying read buffer.
allow_incomplete_stream (bool) – If set to True, an incomplete CLP IR stream is not treated as an error.
- __init__(*args, **kwargs)#
- __new__(**kwargs)#
- deserialize_log_event()#
Deserializes the next log event from the IR stream.
- Returns:
The next deserialized log event from the IR stream.
None if there are no more log events in the stream.
- Return type:
KeyValuePairLogEvent | None
- Raises:
Appropriate exceptions with detailed information on any encountered failure.
- class clp_ffi_py.ir.DeserializerBuffer#
Bases:
object
This class represents a CLP IR Deserializer Buffer corresponding to a CLP IR stream. It buffers serialized CLP IR data read from the input stream, which can be consumed by the CLP IR deserialization methods to recover serialized log events. An instance of this class is expected to be passed across different calls of CLP IR deserialization methods when deserializing from the same IR stream.
The signature of the __init__ method is as follows:
__init__(self, input_stream, initial_buffer_capacity=4096)
Initializes a DeserializerBuffer object for the given input IR stream.
- Parameters:
input_stream – Input stream that contains serialized CLP IR. It should be an instance of type IO[bytes] with the method readinto supported.
initial_buffer_capacity – The initial capacity of the underlying byte buffer.
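The readinto requirement can be checked before an object is handed to the buffer. The snippet below is a small sketch using only the standard library; `supports_readinto` is a hypothetical helper, not part of clp_ffi_py.

```python
import io
from typing import Any


def supports_readinto(stream: Any) -> bool:
    """Return True if the object exposes a callable readinto() method."""
    return callable(getattr(stream, "readinto", None))


# io.BytesIO is a binary stream that implements readinto().
stream = io.BytesIO(b"serialized-bytes")

# readinto() fills a caller-provided buffer and returns the byte count.
buffer = bytearray(4096)  # same size as the documented default capacity
num_read = stream.readinto(buffer)

print(supports_readinto(stream), num_read)
```

Text streams such as io.StringIO do not provide readinto, so they fail this check.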
- __init__(*args, **kwargs)#
- __new__(**kwargs)#
- get_num_deserialized_log_messages()#
- Returns:
Total number of messages deserialized so far.
- class clp_ffi_py.ir.FourByteDeserializer#
Bases:
object
Namespace for all CLP four-byte encoded IR deserialization methods.
Methods deserialize log events from serialized CLP IR streams. This class should never be instantiated since it only contains static methods.
- static deserialize_next_log_event(deserializer_buffer, query=None, allow_incomplete_stream=False)#
Deserializes the next serialized log event from the IR stream buffered in the given deserializer buffer. The preamble must already have been deserialized from deserializer_buffer by a successful invocation of deserialize_preamble. If query is provided, only the next log event matching the query will be returned.
- Parameters:
deserializer_buffer – The deserializer buffer of the serialized CLP IR stream.
query – A Query object that filters log events. See Query documents for more details.
allow_incomplete_stream – If set to True, an incomplete CLP IR stream is not treated as an error. Instead, encountering such a stream is seen as reaching its end, and the function will return None without raising any exceptions.
- Raises:
Appropriate exceptions with detailed information on any encountered failure.
- Returns:
A newly created LogEvent instance representing the next deserialized log event from the IR stream (if the query is None).
A newly created LogEvent instance representing the next deserialized log event matched with the given query in the IR stream (if the query is given).
None when the end of IR stream is reached or the query search terminates.
- static deserialize_preamble(deserializer_buffer)#
Deserializes the preamble from the IR stream buffered in the given deserializer buffer.
- Parameters:
deserializer_buffer – The deserializer buffer of the serialized CLP IR stream.
- Raises:
Appropriate exceptions with detailed information on any encountered failure.
- Returns:
The deserialized preamble presented as a new instance of Metadata.
- class clp_ffi_py.ir.FourByteEncoder[source]#
Bases:
object
Deprecated since version 0.0.13: FourByteEncoder is deprecated and has been renamed to FourByteSerializer.
- static encode_end_of_ir()[source]#
This method is deprecated and has been renamed to serialize_end_of_ir().
- Return type:
bytearray
- static encode_message(msg)[source]#
This method is deprecated and has been renamed to serialize_message().
- Parameters:
msg (bytes)
- Return type:
bytearray
- static encode_message_and_timestamp_delta(timestamp_delta, msg)[source]#
This method is deprecated and has been renamed to serialize_message_and_timestamp_delta().
- Parameters:
timestamp_delta (int)
msg (bytes)
- Return type:
bytearray
- static encode_preamble(ref_timestamp, timestamp_format, timezone)[source]#
This method is deprecated and has been renamed to serialize_preamble().
- Parameters:
ref_timestamp (int)
timestamp_format (str)
timezone (str)
- Return type:
bytearray
- static encode_timestamp_delta(timestamp_delta)[source]#
This method is deprecated and has been renamed to serialize_timestamp_delta().
- Parameters:
timestamp_delta (int)
- Return type:
bytearray
- class clp_ffi_py.ir.FourByteSerializer#
Bases:
object
Namespace for all CLP four-byte encoded IR serialization methods.
Methods serialize bytes from the log record to create a CLP log message. This class should never be instantiated since it only contains static methods.
- static serialize_end_of_ir()#
Serializes the byte sequence that indicates the end of a CLP IR stream. A stream that does not contain this sequence is considered an incomplete IR stream.
- static serialize_message(msg)#
Serializes the log msg using the 4-byte encoding.
- Parameters:
msg – Log message to serialize.
- Raises:
NotImplementedError – If the log message failed to serialize.
- Returns:
The serialized message.
- static serialize_message_and_timestamp_delta(timestamp_delta, msg)#
Serializes the log msg along with the timestamp delta using the 4-byte encoding.
- Parameters:
timestamp_delta – Timestamp difference in milliseconds between the current log message and the previous log message.
msg – Log message to serialize.
- Raises:
NotImplementedError – If the log message failed to serialize.
- Returns:
The serialized message and timestamp.
- static serialize_preamble(ref_timestamp, timestamp_format, timezone)#
Serializes the preamble for a 4-byte encoded CLP IR stream.
- Parameters:
ref_timestamp – Reference timestamp used to calculate deltas emitted with each message.
timestamp_format – Timestamp format to be used when generating the logs with a reader.
timezone – Timezone in TZID format to be used when generating the timestamp from Unix epoch time.
- Raises:
NotImplementedError – If the metadata length is too large.
- Returns:
The serialized preamble.
- static serialize_timestamp_delta(timestamp_delta)#
Serializes the timestamp using the 4-byte encoding.
- Parameters:
timestamp_delta – Timestamp difference in milliseconds between the current log message and the previous log message.
- Raises:
NotImplementedError – If the timestamp failed to serialize.
- Returns:
The serialized timestamp.
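The relationship between ref_timestamp and the per-message deltas is plain arithmetic, sketched below. This is an illustration rather than the library's encoding: `to_deltas` and `from_deltas` are hypothetical helpers, and the sketch assumes the first message's delta is taken relative to the reference timestamp.

```python
from typing import List


def to_deltas(ref_timestamp: int, timestamps: List[int]) -> List[int]:
    """Convert absolute millisecond timestamps into successive deltas."""
    deltas = []
    previous = ref_timestamp
    for ts in timestamps:
        deltas.append(ts - previous)
        previous = ts
    return deltas


def from_deltas(ref_timestamp: int, deltas: List[int]) -> List[int]:
    """Rebuild absolute timestamps by accumulating the deltas."""
    timestamps = []
    current = ref_timestamp
    for delta in deltas:
        current += delta
        timestamps.append(current)
    return timestamps


ref = 1_700_000_000_000  # example reference timestamp in milliseconds
absolute = [ref, ref + 15, ref + 15, ref + 1_200]
deltas = to_deltas(ref, absolute)
restored = from_deltas(ref, deltas)
print(deltas, restored == absolute)
```

Each delta would be what a caller passes as timestamp_delta, while ref goes into the preamble once.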
- exception clp_ffi_py.ir.IncompleteStreamError#
Bases:
Exception
This exception will be raised if the deserializer buffer cannot read more data from the input stream while the deserialization method expects more bytes. Typically, this error indicates the input stream has been truncated.
- class clp_ffi_py.ir.KeyValuePairLogEvent#
Bases:
object
This class represents a key-value pair log event and provides methods to access the key-value pairs. This class is designed to be instantiated by the IR deserializer. However, direct instantiation using the __init__ method is also supported for testing purposes, although this may not be as efficient as emission from the IR deserializer.
__init__(self, dictionary)
Initializes a KeyValuePairLogEvent from the given Python dictionary. Note that each object should only be initialized once. Double initialization will result in a memory leak.
- Parameters:
dictionary (dict[str, Any]) – A dictionary representing the key-value pair log event, where all keys must be strings, including keys inside any sub-dictionaries.
- __init__(*args, **kwargs)#
- __new__(**kwargs)#
- to_dict()#
Converts the log event into a Python dictionary.
- Returns:
The log event as a Python dictionary.
- Return type:
dict[str, Any]
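The string-key requirement extends to nested dictionaries, so checking a candidate dictionary before construction can give a clearer error. The helper below is a hypothetical sketch, not part of clp_ffi_py. It only descends into dictionary values, which is an assumption about what "sub-dictionaries" covers.

```python
from typing import Any


def has_only_string_keys(value: Any) -> bool:
    """Return True if every key in a dict, and in any nested dict, is a str.

    Non-dict values are accepted as-is; lists are not descended into.
    """
    if not isinstance(value, dict):
        return True
    return all(
        isinstance(key, str) and has_only_string_keys(item)
        for key, item in value.items()
    )


valid_event = {"level": "INFO", "context": {"thread": 7, "tags": {"env": "prod"}}}
invalid_event = {"level": "INFO", "context": {1: "integer key"}}

print(has_only_string_keys(valid_event), has_only_string_keys(invalid_event))
```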
- class clp_ffi_py.ir.LogEvent#
Bases:
object
This class represents a deserialized log event and provides ways to access the underlying log data, including the log message, the timestamp, and the log event index. Normally, this class will be instantiated by the FFI IR deserialization methods. However, with the __init__ method provided below, direct instantiation is also possible.
The signature of the __init__ method is as follows:
__init__(self, log_message, timestamp, index=0, metadata=None)
Initializes an object that represents a log event. Notice that each object should be strictly initialized only once. Double initialization will result in memory leaks.
- Parameters:
log_message – The message content of the log event.
timestamp – The timestamp of the log event.
index – The message index (relative to the source CLP IR stream) of the log event.
metadata – The PyMetadata instance that represents the source CLP IR stream. It is set to None by default.
- __init__(*args, **kwargs)#
- __new__(**kwargs)#
- get_formatted_message(timezone=None)#
Gets the formatted log message of the log event.
If a specific timezone is provided, it will be used to format the timestamp. Otherwise, the timestamp will be formatted using the timezone from the source CLP IR stream.
- Parameters:
timezone – Python tzinfo object that specifies a timezone.
- Returns:
The formatted message.
- get_index()#
Gets the message index (relative to the source CLP IR stream) of the log event. This index is set to 0 by default.
- Returns:
The log event index.
- get_log_message()#
Gets the log message of the log event.
- Returns:
The log message.
- get_timestamp()#
Gets the Unix epoch timestamp in milliseconds of the log event.
- Returns:
The timestamp in milliseconds.
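A millisecond epoch timestamp such as the one returned by get_timestamp() can be converted to a timezone-aware datetime with the standard library. The sketch below is independent of clp_ffi_py; `ms_to_datetime` is a hypothetical helper, and the fixed-offset timezone only illustrates the kind of tzinfo object that get_formatted_message() accepts.

```python
from datetime import datetime, timedelta, timezone, tzinfo

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_datetime(timestamp_ms: int, tz: tzinfo = timezone.utc) -> datetime:
    """Convert a Unix epoch timestamp in milliseconds to an aware datetime.

    Integer timedelta arithmetic avoids the rounding that can occur when
    dividing by 1000 and passing a float to datetime.fromtimestamp().
    """
    return (_EPOCH + timedelta(milliseconds=timestamp_ms)).astimezone(tz)


utc_plus_nine = timezone(timedelta(hours=9))  # example fixed-offset tzinfo

as_utc = ms_to_datetime(1_700_000_000_123)
as_local = ms_to_datetime(1_700_000_000_123, utc_plus_nine)
print(as_utc.isoformat(), as_local.isoformat())
```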
- match_query(query)#
Matches the underlying log event against the given query. Refer to the documentation of clp_ffi_py.Query for more details.
- Parameters:
query – Input Query object.
- Returns:
True if the log event matches the query, False otherwise.
- class clp_ffi_py.ir.Metadata#
Bases:
object
This class represents the IR stream preamble and provides ways to access the underlying metadata. Normally, this class will be instantiated by the FFI IR deserialization methods. However, with the __init__ method provided below, direct instantiation is also possible.
The signature of the __init__ method is as follows:
__init__(self, ref_timestamp, timestamp_format, timezone_id)
Initializes an object that represents CLP IR metadata. Assumes self is uninitialized and will allocate the underlying memory. If self is already initialized this will result in memory leaks.
- Parameters:
ref_timestamp – the reference Unix epoch timestamp in milliseconds used to calculate the timestamp of the first log message in the IR stream.
timestamp_format – the timestamp format to be used when generating the logs with a reader.
timezone_id – the timezone id to be used when generating the timestamp from Unix epoch time.
- __init__(*args, **kwargs)#
- __new__(**kwargs)#
- get_ref_timestamp()#
Gets the reference Unix epoch timestamp in milliseconds used to calculate the timestamp of the first log message in the IR stream.
- Returns:
The reference timestamp.
- get_timestamp_format()#
Gets the timestamp format to be used when generating the logs with a reader.
- Returns:
The timestamp format.
- get_timezone()#
Gets the timezone represented as tzinfo to be used when generating the timestamp from Unix epoch time.
- Returns:
A new reference to the timezone as tzinfo.
- get_timezone_id()#
Gets the timezone id to be used when generating the timestamp from Unix epoch time.
- Returns:
The timezone ID in TZID format.
- is_using_four_byte_encoding()#
Checks whether the CLP IR is encoded using 4-byte or 8-byte encoding methods.
- Returns:
True for 4-byte encoding, and False for 8-byte encoding.
- class clp_ffi_py.ir.Query#
Bases:
object
This class represents a search query, utilized for filtering log events in a CLP IR stream. The query could include a list of wildcard queries aimed at identifying certain log messages, and a timestamp range with a lower and upper bound. This class provides an interface to set up a search query, as well as methods to validate whether the query can be matched by a log event. Note that an empty wildcard query list will match any log within the range.
By default, the wildcard query list is empty and the timestamp range is set to include all the valid Unix epoch timestamps. To filter certain log messages, use customized wildcard queries to initialize the wildcard query list. For more details, check the documentation of the class WildcardQuery.
NOTE: When searching an IR stream with a query, ideally, the search would terminate once the current log event’s timestamp exceeds the upper bound of the query’s time range. However, the timestamps in the IR stream might not be monotonically increasing; they can be locally disordered due to thread contention. To safely stop searching, the deserializer needs to ensure that the current timestamp in the IR stream exceeds the query’s upper bound timestamp by a reasonable margin. This margin can be specified during initialization and defaults to the value returned by the static method default_search_time_termination_margin(). Users can customize this margin; for example, it can be set to 0 if the CLP IR stream is generated by a single-threaded program.
The signature of the __init__ method is as follows:
__init__(self, search_time_lower_bound=Query.default_search_time_lower_bound(), search_time_upper_bound=Query.default_search_time_upper_bound(), wildcard_queries=None, search_time_termination_margin=Query.default_search_time_termination_margin())
Initializes a Query object using the given inputs.
- Parameters:
search_time_lower_bound – Start of search time range (inclusive).
search_time_upper_bound – End of search time range (inclusive).
wildcard_queries – A list of wildcard queries.
search_time_termination_margin – The margin used to determine the search termination timestamp.
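The effect of the termination margin on out-of-order timestamps can be shown with a small pure-Python model. This is a sketch of the idea, not the deserializer's code: `search_with_margin` is a hypothetical function, and stopping on a strict "greater than upper bound plus margin" comparison is an assumption.

```python
from typing import Iterable, List, Tuple

Event = Tuple[int, str]  # (timestamp in milliseconds, message)


def search_with_margin(
    events: Iterable[Event], lower: int, upper: int, margin: int
) -> List[Event]:
    """Return events within [lower, upper], stopping early when safe.

    The scan stops only once a timestamp exceeds upper + margin, so events
    that arrive slightly out of order are still examined.
    """
    matches = []
    for timestamp, message in events:
        if timestamp > upper + margin:
            break
        if lower <= timestamp <= upper:
            matches.append((timestamp, message))
    return matches


# Timestamp 198 appears after 205: a local disorder in the stream.
stream = [(100, "a"), (205, "b"), (198, "c"), (260, "d"), (199, "e")]

without_margin = search_with_margin(stream, lower=0, upper=200, margin=0)
with_margin = search_with_margin(stream, lower=0, upper=200, margin=10)
print(without_margin, with_margin)
```

With a margin of 0 the scan stops at timestamp 205 and misses the later in-range event at 198; a margin of 10 keeps scanning until 260.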
- __init__(*args, **kwargs)#
- __new__(**kwargs)#
- static default_search_time_lower_bound()#
- Returns:
The minimum valid timestamp from Unix epoch time.
- static default_search_time_termination_margin()#
- Returns:
The default search termination margin as Unix epoch time.
- static default_search_time_upper_bound()#
- Returns:
The maximum valid timestamp from Unix epoch time.
- get_search_time_lower_bound()#
- Returns:
The search time lower bound.
- get_search_time_termination_margin()#
- Returns:
The search time termination margin.
- get_search_time_upper_bound()#
- Returns:
The search time upper bound.
- get_wildcard_queries()#
- Returns:
A new Python list of the stored wildcard queries, presented as WildcardQuery objects.
None if the wildcard queries are empty.
- match_log_event(log_event)#
Validates whether the input log event matches the query.
- Parameters:
log_event – Input log event.
- Returns:
True if the timestamp is in range, and the wildcard query list is empty or has at least one match.
False otherwise.
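The matching rule can be written as a single predicate. The sketch below is a hypothetical model, not the library's implementation: `matches` is an invented name, and the standard library's fnmatchcase stands in for WildcardQuery matching, which it only approximates (it also interprets bracket expressions, always matches the full string, and is always case-sensitive).

```python
from fnmatch import fnmatchcase
from typing import Sequence


def matches(
    timestamp: int,
    message: str,
    lower: int,
    upper: int,
    patterns: Sequence[str] = (),
) -> bool:
    """Model of the documented rule for matching a log event.

    True when the timestamp lies within [lower, upper] (both inclusive) and
    the pattern list is either empty or has at least one matching pattern.
    """
    in_range = lower <= timestamp <= upper
    any_pattern = not patterns or any(fnmatchcase(message, p) for p in patterns)
    return in_range and any_pattern


print(matches(150, "disk full", 100, 200))                # empty pattern list
print(matches(150, "disk full", 100, 200, ["*full*"]))    # one pattern hits
print(matches(150, "disk full", 100, 200, ["*error*"]))   # no pattern hits
print(matches(250, "disk full", 100, 200, ["*full*"]))    # out of range
```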
- class clp_ffi_py.ir.QueryBuilder[source]#
Bases:
object
This class serves as an interface for conveniently constructing Query objects utilized in CLP IR streaming search. It provides methods for configuring and resetting search parameters.
For more details about the search queries that CLP IR streams support, see Query and WildcardQuery.
- add_wildcard_queries(wildcard_queries)[source]#
Adds a list of wildcard queries to the wildcard query list.
- Parameters:
wildcard_queries (List[WildcardQuery]) – The list of wildcard queries to add.
- Returns:
self.
- Return type:
QueryBuilder
- add_wildcard_query(wildcard_query: str, case_sensitive: bool = False) -> QueryBuilder [source]#
- add_wildcard_query(wildcard_query: WildcardQuery) -> QueryBuilder
- build()[source]#
- Raises:
QueryBuilderException – If the search time range lower bound exceeds the search time range upper bound.
- Returns:
A Query object initialized with the parameters set by the builder.
- Return type:
Query
- reset_search_time_lower_bound()[source]#
Resets the search time lower bound to the default value.
- Returns:
self.
- Return type:
QueryBuilder
- reset_search_time_termination_margin()[source]#
Resets the search time termination margin to the default value.
- Returns:
self.
- Return type:
QueryBuilder
- reset_search_time_upper_bound()[source]#
Resets the search time upper bound to the default value.
- Returns:
self.
- Return type:
QueryBuilder
- property search_time_lower_bound: int#
- property search_time_termination_margin: int#
- property search_time_upper_bound: int#
- set_search_time_lower_bound(ts)[source]#
- Parameters:
ts (int) – Start of the search time range (inclusive) as a UNIX epoch timestamp in milliseconds.
- Returns:
self.
- Return type:
QueryBuilder
- set_search_time_termination_margin(ts)[source]#
- Parameters:
ts (int) – The search time termination margin as a UNIX epoch timestamp in milliseconds.
- Returns:
self.
- Return type:
QueryBuilder
- set_search_time_upper_bound(ts)[source]#
- Parameters:
ts (int) – End of the search time range (inclusive) as a UNIX epoch timestamp in milliseconds.
- Returns:
self.
- Return type:
QueryBuilder
- property wildcard_queries: List[WildcardQuery]#
- Returns:
A deep copy of the underlying wildcard query list.
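Because each setter returns self, calls on the builder can be chained, and build() validates the time range at the end. The toy class below illustrates that pattern only; it is a hypothetical miniature, not the library's code. Its default bounds are placeholders, it stores wildcard queries as plain strings, and it raises ValueError where the real builder raises QueryBuilderException.

```python
import copy
from typing import Dict, List


class ToyQueryBuilder:
    """Hypothetical miniature of a chaining builder, for illustration."""

    def __init__(self) -> None:
        # Placeholder defaults chosen for this sketch only.
        self._lower = 0
        self._upper = 2**63 - 1
        self._wildcards: List[str] = []

    def set_search_time_lower_bound(self, ts: int) -> "ToyQueryBuilder":
        self._lower = ts
        return self  # returning self is what enables chaining

    def set_search_time_upper_bound(self, ts: int) -> "ToyQueryBuilder":
        self._upper = ts
        return self

    def add_wildcard_query(self, wildcard_query: str) -> "ToyQueryBuilder":
        self._wildcards.append(wildcard_query)
        return self

    @property
    def wildcard_queries(self) -> List[str]:
        # A deep copy keeps callers from mutating the builder's own list.
        return copy.deepcopy(self._wildcards)

    def build(self) -> Dict[str, object]:
        if self._lower > self._upper:
            raise ValueError("lower bound exceeds upper bound")
        return {
            "lower": self._lower,
            "upper": self._upper,
            "wildcard_queries": self.wildcard_queries,
        }


query = (
    ToyQueryBuilder()
    .set_search_time_lower_bound(1_000)
    .set_search_time_upper_bound(2_000)
    .add_wildcard_query("*error*")
    .build()
)
print(query)
```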
- class clp_ffi_py.ir.Serializer#
Bases:
object
Serializer for serializing CLP key-value pair IR streams. This class serializes log events into the CLP key-value pair IR format and writes the serialized data to a specified byte stream object.
__init__(self, output_stream, buffer_size_limit=65536)
Initializes a Serializer instance with the given output stream. Note that each object should only be initialized once. Double initialization will result in a memory leak.
- Parameters:
output_stream (IO[bytes]) – A writable byte output stream to which the serializer will write the serialized IR byte sequences.
buffer_size_limit (int) – The maximum amount of serialized data to buffer before flushing it to output_stream. Defaults to 64 KiB.
- __init__(*args, **kwargs)#
- __new__(**kwargs)#
- close()#
Closes the serializer, writing any buffered data to the output stream and appending a byte sequence to mark the end of the CLP IR stream. The output stream is then flushed and closed. NOTE: This method must be called to properly terminate an IR stream. If it isn’t called, the stream will be incomplete, and any buffered data may be lost.
- Raises:
IOError – If the serializer has already been closed.
- flush()#
Flushes any buffered data and the output stream.
- Raises:
IOError – If the serializer has already been closed.
- get_num_bytes_serialized()#
- Returns:
The total number of bytes serialized.
- Return type:
int
- Raises:
IOError – If the serializer has already been closed.
- serialize_log_event_from_msgpack_map(msgpack_map)#
Serializes the given log event.
- Parameters:
msgpack_map (bytes) – The log event as a packed msgpack map where all keys are strings.
- Returns:
The number of bytes serialized.
- Return type:
int
- Raises:
IOError – If the serializer has already been closed.
TypeError – If msgpack_map is not a packed msgpack map.
RuntimeError – If msgpack_map couldn’t be unpacked or serialization into the IR stream failed.