clp_ffi_py.ir package#

Submodules#

Module contents#

class clp_ffi_py.ir.ClpIrFileReader[source]#

Bases: ClpIrStreamReader

Wrapper class of ClpIrStreamReader that calls open for convenience.

__init__(fpath, deserializer_buffer_size=65536, enable_compression=True, allow_incomplete_stream=False)[source]#
Parameters:
  • fpath (Path)

  • deserializer_buffer_size (int)

  • enable_compression (bool)

  • allow_incomplete_stream (bool)

dump(ostream=<_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>)[source]#
Parameters:

ostream (IO[str])

Return type:

None
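A minimal sketch of reading a file with this class. It assumes the reader can be iterated directly, since it derives from Iterator[LogEvent] through ClpIrStreamReader. The helper names `read_all_messages` and `read_file` are hypothetical, and the import is deferred so the first helper works with any object that offers the same methods.

```python
from pathlib import Path
from typing import List


def read_all_messages(reader) -> List[str]:
    """Drain a reader, returning each event's raw message; always close it."""
    messages: List[str] = []
    try:
        for log_event in reader:
            messages.append(log_event.get_log_message())
    finally:
        # close() is documented on ClpIrStreamReader and inherited by the file reader.
        reader.close()
    return messages


def read_file(fpath: Path) -> List[str]:
    """Hypothetical convenience wrapper around ClpIrFileReader."""
    # Deferred import: requires the clp_ffi_py package to be installed.
    from clp_ffi_py.ir import ClpIrFileReader

    return read_all_messages(ClpIrFileReader(fpath))
```

A call such as `read_file(Path("example.clp.zst"))` would return the messages of a compressed IR file; the path here is a placeholder.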

class clp_ffi_py.ir.ClpIrStreamReader[source]#

Bases: Iterator[LogEvent]

This class represents a stream reader used to read/deserialize log events from a CLP IR stream. It also provides method(s) to instantiate a log event generator with a customized search query.

Parameters:
  • istream – Input stream that contains CLP IR byte sequence.

  • deserializer_buffer_size – Initial size of the deserializer buffer.

  • enable_compression – A flag indicating whether the istream is compressed using zstd.

  • allow_incomplete_stream – If set to True, an incomplete CLP IR stream is not treated as an error. Instead, encountering such a stream is seen as reaching its end without raising any exceptions.

DEFAULT_DESERIALIZER_BUFFER_SIZE: int = 65536#
__init__(istream, deserializer_buffer_size=65536, enable_compression=True, allow_incomplete_stream=False)[source]#
Parameters:
  • istream (IO[bytes])

  • deserializer_buffer_size (int)

  • enable_compression (bool)

  • allow_incomplete_stream (bool)

close()[source]#
Return type:

None

get_metadata()[source]#
Return type:

Metadata

has_metadata()[source]#
Return type:

bool

read_next_log_event()[source]#

Reads and deserializes the next log event from the IR stream.

Returns:

  • Next unread log event represented as an instance of LogEvent.

  • None if the end of IR stream is reached.

Raises:

Exception – If deserialize_next_log_event() fails.

Return type:

LogEvent | None

read_preamble()[source]#

Tries to deserialize the preamble and set the metadata. If the metadata has already been set, it returns immediately. This step is separate from __init__ so that the input stream does not need to be readable when the reader is constructed, only by the time the user starts iterating over log events.

Raises:

Exception – If deserialize_preamble() fails.

Return type:

None
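As a rough illustration, the preamble can be read explicitly to inspect a stream before iterating it. The helper name `describe_stream` and the dictionary keys are hypothetical; the calls themselves are the reader and Metadata methods documented on this page.

```python
from typing import Any, Dict, Optional


def describe_stream(reader) -> Optional[Dict[str, Any]]:
    """Read the preamble and summarize the stream metadata, if any is set."""
    # Returns immediately if the metadata has already been set.
    reader.read_preamble()
    if not reader.has_metadata():
        return None
    metadata = reader.get_metadata()
    return {
        "ref_timestamp_ms": metadata.get_ref_timestamp(),
        "timestamp_format": metadata.get_timestamp_format(),
        "timezone_id": metadata.get_timezone_id(),
        "four_byte_encoding": metadata.is_using_four_byte_encoding(),
    }
```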

search(query)[source]#

Searches and yields log events that match a specific search query.

Parameters:

query (Query) – The input query object used to match log events. See the documentation of Query for more details.

Yield:

The next unread log event that matches the given search query from the IR stream.

Return type:

Generator[LogEvent, None, None]
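A short sketch of consuming the generator returned by search. The helper name `find_matches` is hypothetical, and `query` is assumed to be a Query object built elsewhere, for example with QueryBuilder.

```python
from typing import List, Tuple


def find_matches(reader, query) -> List[Tuple[int, str]]:
    """Return (timestamp_ms, message) for every event that matches `query`."""
    results: List[Tuple[int, str]] = []
    try:
        # search() yields only the log events that match the query.
        for log_event in reader.search(query):
            results.append((log_event.get_timestamp(), log_event.get_log_message()))
    finally:
        reader.close()
    return results
```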

class clp_ffi_py.ir.Deserializer#

Bases: object

Namespace for all CLP IR deserialization methods.

Methods deserialize log events from serialized CLP IR streams. This class should never be instantiated since it only contains static methods.

static deserialize_next_log_event(deserializer_buffer, query=None, allow_incomplete_stream=False)#

Deserializes the next serialized log event from the IR stream buffered in the given deserializer buffer. deserializer_buffer must have been returned by a successful invocation of deserialize_preamble. If query is provided, only the next log event matching the query will be returned.

Parameters:
  • deserializer_buffer – The deserializer buffer of the serialized CLP IR stream.

  • query – A Query object that filters log events. See the Query documentation for more details.

  • allow_incomplete_stream – If set to True, an incomplete CLP IR stream is not treated as an error. Instead, encountering such a stream is seen as reaching its end, and the function will return None without raising any exceptions.

Raises:

Appropriate exceptions with detailed information on any encountered failure.

Returns:

  • A newly created LogEvent instance representing the next deserialized log event from the IR stream (if the query is None).

  • A newly created LogEvent instance representing the next deserialized log event matched with the given query in the IR stream (if the query is given).

  • None when the end of IR stream is reached or the query search terminates.

static deserialize_preamble(deserializer_buffer)#

Deserializes the preamble from the IR stream buffered in the given deserializer buffer.

Parameters:

deserializer_buffer – The deserializer buffer of the serialized CLP IR stream.

Raises:

Appropriate exceptions with detailed information on any encountered failure.

Returns:

The deserialized preamble presented as a new instance of Metadata.

class clp_ffi_py.ir.DeserializerBuffer#

Bases: object

This class represents a CLP IR Deserializer Buffer corresponding to a CLP IR stream. It buffers serialized CLP IR data read from the input stream, which can be consumed by the CLP IR deserialization methods to recover serialized log events. An instance of this class is expected to be passed across different calls of CLP IR deserialization methods when deserializing from the same IR stream.

The signature of the __init__ method is as follows:

__init__(self, input_stream, initial_buffer_capacity=4096)

Initializes a DeserializerBuffer object for the given input IR stream.

Parameters:
  • input_stream – Input stream that contains serialized CLP IR. It should be an instance of type IO[bytes] with the method readinto supported.

  • initial_buffer_capacity – The initial capacity of the underlying byte buffer.

__init__(*args, **kwargs)#
__new__(**kwargs)#
get_num_deserialized_log_messages()#
Returns:

Total number of messages deserialized so far.
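The low-level flow described above (create one buffer, deserialize the preamble, then request events until None is returned) might look like the sketch below. The function name `iter_log_events` and the `deserializer` and `buffer_factory` parameters are hypothetical; they default to Deserializer and DeserializerBuffer and exist only so the loop can be tried with stand-ins. The input stream is assumed to hold serialized IR bytes that need no further decompression.

```python
from typing import Any, Iterator


def iter_log_events(
    input_stream,
    query=None,
    allow_incomplete_stream: bool = False,
    deserializer=None,
    buffer_factory=None,
) -> Iterator[Any]:
    """Yield log events from a serialized CLP IR stream until None is returned."""
    if deserializer is None or buffer_factory is None:
        # Deferred import: requires the clp_ffi_py package to be installed.
        from clp_ffi_py.ir import Deserializer, DeserializerBuffer

        deserializer = deserializer or Deserializer
        buffer_factory = buffer_factory or DeserializerBuffer

    # The same buffer instance is passed to every call for this stream.
    deserializer_buffer = buffer_factory(input_stream)
    # The preamble must be deserialized before any log event.
    deserializer.deserialize_preamble(deserializer_buffer)
    while True:
        log_event = deserializer.deserialize_next_log_event(
            deserializer_buffer,
            query=query,
            allow_incomplete_stream=allow_incomplete_stream,
        )
        if log_event is None:
            # End of the stream, or the query search terminated.
            return
        yield log_event
```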

class clp_ffi_py.ir.FourByteSerializer#

Bases: object

Namespace for all CLP four byte IR serialization methods.

Methods serialize bytes from the log record to create a CLP log message. This class should never be instantiated since it only contains static methods.

static serialize_end_of_ir()#

Serializes the byte sequence that indicates the end of a CLP IR stream. A stream that does not contain this sequence is considered an incomplete IR stream.

static serialize_message(msg)#

Serializes the log msg using the 4-byte encoding.

Parameters:

msg – Log message to serialize.

Raises:

NotImplementedError – If the log message failed to serialize.

Returns:

The serialized message.

static serialize_message_and_timestamp_delta(timestamp_delta, msg)#

Serializes the log msg along with the timestamp delta using the 4-byte encoding.

Parameters:
  • timestamp_delta – Timestamp difference in milliseconds between the current log message and the previous log message.

  • msg – Log message to serialize.

Raises:

NotImplementedError – If the log message failed to serialize.

Returns:

The serialized message and timestamp.

static serialize_preamble(ref_timestamp, timestamp_format, timezone)#

Serializes the preamble for a 4-byte encoded CLP IR stream.

Parameters:
  • ref_timestamp – Reference timestamp used to calculate deltas emitted with each message.

  • timestamp_format – Timestamp format to be used when generating the logs with a reader.

  • timezone – Timezone in TZID format to be used when generating the timestamp from Unix epoch time.

Raises:

NotImplementedError – If the metadata length is too large.

Returns:

The serialized preamble.

static serialize_timestamp_delta(timestamp_delta)#

Serializes the timestamp using the 4-byte encoding.

Parameters:

timestamp_delta – Timestamp difference in milliseconds between the current log message and the previous log message.

Raises:

NotImplementedError – If the timestamp failed to serialize.

Returns:

The serialized timestamp.
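The static methods above can be combined to produce a complete stream: a preamble, one entry per message carrying its timestamp delta, and the end-of-IR marker. The sketch below assumes that each method returns a bytes-like object and that messages are passed as bytes; neither is stated on this page. The function name `build_ir_stream` and the `serializer` parameter are hypothetical, the latter defaulting to FourByteSerializer.

```python
from typing import Iterable, Tuple


def build_ir_stream(
    ref_timestamp: int,
    timestamp_format: str,
    timezone: str,
    events: Iterable[Tuple[int, bytes]],
    serializer=None,
) -> bytes:
    """Serialize (timestamp_ms, message) pairs into one 4-byte encoded IR stream."""
    if serializer is None:
        # Deferred import: requires the clp_ffi_py package to be installed.
        from clp_ffi_py.ir import FourByteSerializer

        serializer = FourByteSerializer

    stream = bytearray(
        serializer.serialize_preamble(ref_timestamp, timestamp_format, timezone)
    )
    previous_timestamp = ref_timestamp
    for timestamp, message in events:
        # Each entry stores its offset from the previous one, not an absolute time.
        delta = timestamp - previous_timestamp
        stream += serializer.serialize_message_and_timestamp_delta(delta, message)
        previous_timestamp = timestamp
    # Without this marker, the stream is considered incomplete.
    stream += serializer.serialize_end_of_ir()
    return bytes(stream)
```

The first delta is taken relative to the reference timestamp, which matches its description as the value used to calculate the timestamp of the first log message.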

exception clp_ffi_py.ir.IncompleteStreamError#

Bases: Exception

This exception will be raised if the deserializer buffer cannot read more data from the input stream while the deserialization method expects more bytes. Typically, this error indicates the input stream has been truncated.

class clp_ffi_py.ir.LogEvent#

Bases: object

This class represents a deserialized log event and provides ways to access the underlying log data, including the log message, the timestamp, and the log event index. Normally, this class will be instantiated by the FFI IR deserialization methods. However, with the __init__ method provided below, direct instantiation is also possible.

The signature of the __init__ method is as follows:

__init__(self, log_message, timestamp, index=0, metadata=None)

Initializes an object that represents a log event. Notice that each object should be strictly initialized only once. Double initialization will result in memory leaks.

Parameters:
  • log_message – The message content of the log event.

  • timestamp – The timestamp of the log event.

  • index – The message index (relative to the source CLP IR stream) of the log event.

  • metadata – The Metadata instance that represents the source CLP IR stream. It is set to None by default.

__init__(*args, **kwargs)#
__new__(**kwargs)#
get_formatted_message(timezone=None)#

Gets the formatted log message of the log event.

If a specific timezone is provided, it will be used to format the timestamp. Otherwise, the timestamp will be formatted using the timezone from the source CLP IR stream.

Parameters:

timezone – Python tzinfo object that specifies a timezone.

Returns:

The formatted message.

get_index()#

Gets the message index (relative to the source CLP IR stream) of the log event. This index is set to 0 by default.

Returns:

The log event index.

get_log_message()#

Gets the log message of the log event.

Returns:

The log message.

get_timestamp()#

Gets the Unix epoch timestamp in milliseconds of the log event.

Returns:

The timestamp in milliseconds.

match_query(query)#

Matches the underlying log event against the given query. Refer to the documentation of clp_ffi_py.ir.Query for more details.

Parameters:

query – Input Query object.

Returns:

True if the log event matches the query, False otherwise.
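Because get_timestamp returns a Unix epoch timestamp in milliseconds, converting it to a Python datetime takes one division. The sketch below gathers the accessors into a plain dictionary; the helper name `event_to_record` and the dictionary keys are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def event_to_record(log_event) -> Dict[str, Any]:
    """Collect the index, UTC time, and raw message of a log event."""
    timestamp_ms = log_event.get_timestamp()
    return {
        "index": log_event.get_index(),
        # Milliseconds since the Unix epoch, converted to an aware datetime.
        "time_utc": datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc),
        "message": log_event.get_log_message(),
    }
```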

class clp_ffi_py.ir.Metadata#

Bases: object

This class represents the IR stream preamble and provides ways to access the underlying metadata. Normally, this class will be instantiated by the FFI IR deserialization methods. However, with the __init__ method provided below, direct instantiation is also possible.

The signature of the __init__ method is as follows:

__init__(self, ref_timestamp, timestamp_format, timezone_id)

Initializes an object that represents CLP IR metadata. Assumes self is uninitialized and will allocate the underlying memory. If self is already initialized this will result in memory leaks.

Parameters:
  • ref_timestamp – the reference Unix epoch timestamp in milliseconds used to calculate the timestamp of the first log message in the IR stream.

  • timestamp_format – the timestamp format to be used when generating the logs with a reader.

  • timezone_id – the timezone id to be used when generating the timestamp from Unix epoch time.

__init__(*args, **kwargs)#
__new__(**kwargs)#
get_ref_timestamp()#

Gets the reference Unix epoch timestamp in milliseconds used to calculate the timestamp of the first log message in the IR stream.

Returns:

The reference timestamp.

get_timestamp_format()#

Gets the timestamp format to be used when generating the logs with a reader.

Returns:

The timestamp format.

get_timezone()#

Gets the timezone represented as tzinfo to be used when generating the timestamp from Unix epoch time.

Returns:

A new reference to the timezone as tzinfo.

get_timezone_id()#

Gets the timezone id to be used when generating the timestamp from Unix epoch time.

Returns:

The timezone ID in TZID format.

is_using_four_byte_encoding()#

Checks whether the CLP IR is encoded using 4-byte or 8-byte encoding methods.

Returns:

True for 4-byte encoding, and False for 8-byte encoding.

class clp_ffi_py.ir.Query#

Bases: object

This class represents a search query, utilized for filtering log events in a CLP IR stream. The query could include a list of wildcard queries aimed at identifying certain log messages, and a timestamp range with a lower and upper bound. This class provides an interface to set up a search query, as well as methods to validate whether the query can be matched by a log event. Note that an empty wildcard query list will match any log within the range.

By default, the wildcard query list is empty and the timestamp range is set to include all the valid Unix epoch timestamps. To filter certain log messages, use customized wildcard queries to initialize the wildcard query list. For more details, check the documentation of the class WildcardQuery.

NOTE: When searching an IR stream with a query, the search would ideally terminate once the current log event’s timestamp exceeds the upper bound of the query’s time range. However, the timestamps in the IR stream might not be monotonically increasing; they can be locally disordered due to thread contention. To stop searching safely, the deserializer needs to ensure that the current timestamp in the IR stream exceeds the query’s upper bound timestamp by a reasonable margin. This margin can be specified during initialization and defaults to the value returned by the static method default_search_time_termination_margin(). Users can customize this margin as needed; for example, it can be set to 0 if the CLP IR stream is generated from a single-threaded program execution.

The signature of the __init__ method is as follows:

__init__(self, search_time_lower_bound=Query.default_search_time_lower_bound(), search_time_upper_bound=Query.default_search_time_upper_bound(), wildcard_queries=None, search_time_termination_margin=Query.default_search_time_termination_margin())

Initializes a Query object using the given inputs.

Parameters:
  • search_time_lower_bound – Start of search time range (inclusive).

  • search_time_upper_bound – End of search time range (inclusive).

  • wildcard_queries – A list of wildcard queries.

  • search_time_termination_margin – The margin used to determine the search termination timestamp.
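The effect of the termination margin can be modeled in plain Python. The sketch below is an illustration of the behavior described in the note, not the library's implementation: the function name `scan_with_margin` is hypothetical, events are simple (timestamp, message) tuples, and the exact comparison used to stop is an assumption.

```python
from typing import Iterable, Iterator, Tuple


def scan_with_margin(
    events: Iterable[Tuple[int, str]],
    lower_bound: int,
    upper_bound: int,
    termination_margin: int,
) -> Iterator[Tuple[int, str]]:
    """Yield in-range events, stopping once a timestamp passes the margin."""
    for timestamp, message in events:
        if timestamp > upper_bound + termination_margin:
            # Far enough past the range that later events are assumed out of range.
            return
        if lower_bound <= timestamp <= upper_bound:
            yield timestamp, message


# Timestamps are locally disordered: 18 arrives after 25.
events = [(10, "a"), (25, "late"), (18, "b"), (40, "c"), (19, "d")]
with_margin = list(scan_with_margin(events, 0, 20, 10))
without_margin = list(scan_with_margin(events, 0, 20, 0))
```

With a margin of 10 the scan continues past the out-of-order timestamp 25 and still finds the event at 18, whereas a margin of 0 stops at 25 and misses it.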

__init__(*args, **kwargs)#
__new__(**kwargs)#
static default_search_time_lower_bound()#
Returns:

The minimum valid timestamp from Unix epoch time.

static default_search_time_termination_margin()#
Returns:

The default search termination margin as Unix epoch time.

static default_search_time_upper_bound()#
Returns:

The maximum valid timestamp from Unix epoch time.

get_search_time_lower_bound()#
Returns:

The search time lower bound.

get_search_time_termination_margin()#
Returns:

The search time termination margin.

get_search_time_upper_bound()#
Returns:

The search time upper bound.

get_wildcard_queries()#
Returns:

  • A new Python list of stored wildcard queries, presented as WildcardQuery objects.

  • None if the wildcard queries are empty.

match_log_event(log_event)#

Checks whether the input log event matches the query.

Parameters:

log_event – Input log event.

Returns:

  • True if the timestamp is in range, and the wildcard query list is empty or has at least one match.

  • False otherwise.

class clp_ffi_py.ir.QueryBuilder[source]#

Bases: object

This class serves as an interface for conveniently constructing Query objects utilized in CLP IR streaming search. It provides methods for configuring and resetting search parameters.

For more details about the search query CLP IR stream supports, see Query and WildcardQuery.

__init__()[source]#
Return type:

None

add_wildcard_queries(wildcard_queries)[source]#

Adds a list of wildcard queries to the wildcard query list.

Parameters:

wildcard_queries (List[WildcardQuery]) – The list of wildcard queries to add.

Returns:

self.

Return type:

QueryBuilder

add_wildcard_query(wildcard_query: str, case_sensitive: bool = False) QueryBuilder[source]#
add_wildcard_query(wildcard_query: WildcardQuery) QueryBuilder
build()[source]#
Raises:

QueryBuilderException – If the search time range lower bound exceeds the search time range upper bound.

Returns:

A Query object initialized with the parameters set by the builder.

Return type:

Query

reset()[source]#

Resets all settings to their defaults.

Returns:

self.

Return type:

QueryBuilder

reset_search_time_lower_bound()[source]#

Resets the search time lower bound to the default value.

Returns:

self.

Return type:

QueryBuilder

reset_search_time_termination_margin()[source]#

Resets the search time termination margin to the default value.

Returns:

self.

Return type:

QueryBuilder

reset_search_time_upper_bound()[source]#

Resets the search time upper bound to the default value.

Returns:

self.

Return type:

QueryBuilder

reset_wildcard_queries()[source]#

Clears the wildcard query list.

Returns:

self.

Return type:

QueryBuilder

property search_time_lower_bound: int#
property search_time_termination_margin: int#
property search_time_upper_bound: int#
set_search_time_lower_bound(ts)[source]#
Parameters:

ts (int) – Start of the search time range (inclusive) as a UNIX epoch timestamp in milliseconds.

Returns:

self.

Return type:

QueryBuilder

set_search_time_termination_margin(ts)[source]#
Parameters:

ts (int) – The search time termination margin as a UNIX epoch timestamp in milliseconds.

Returns:

self.

Return type:

QueryBuilder

set_search_time_upper_bound(ts)[source]#
Parameters:

ts (int) – End of the search time range (inclusive) as a UNIX epoch timestamp in milliseconds.

Returns:

self.

Return type:

QueryBuilder

property wildcard_queries: List[WildcardQuery]#
Returns:

A deep copy of the underlying wildcard query list.
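Since every setter returns self, a query can be assembled by chaining calls. The sketch below is one possible use under stated assumptions: the function name `build_window_query` and the `builder` parameter are hypothetical (the latter defaults to a new QueryBuilder), and the pattern strings in the usage note are placeholders whose syntax is defined by WildcardQuery.

```python
from typing import Iterable


def build_window_query(start_ms: int, end_ms: int, patterns: Iterable[str], builder=None):
    """Build a Query for a time window plus case-insensitive wildcard patterns."""
    if builder is None:
        # Deferred import: requires the clp_ffi_py package to be installed.
        from clp_ffi_py.ir import QueryBuilder

        builder = QueryBuilder()

    # Start from the defaults, then narrow the time range in one chain.
    builder.reset()
    builder.set_search_time_lower_bound(start_ms).set_search_time_upper_bound(end_ms)
    for pattern in patterns:
        builder.add_wildcard_query(pattern, case_sensitive=False)
    # build() raises QueryBuilderException if the lower bound exceeds the upper bound.
    return builder.build()
```

For example, `build_window_query(0, 1_700_000_000_000, ["*error*"])` would return a Query that can be passed to a reader's search method.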