clp_ffi_py.ir.native module#

Python interface to the CLP IR serialization and deserialization methods.

class clp_ffi_py.ir.native.Deserializer#

Bases: object

Deserializes a CLP key-value pair IR stream into log events.

__init__(self, input_stream, buffer_capacity=65536, allow_incomplete_stream=False)

Initializes a Deserializer instance with the given inputs. Note that each object should only be initialized once. Double initialization will result in a memory leak.

Parameters:
  • input_stream (IO[bytes]) – Serialized CLP IR stream.

  • buffer_capacity (int) – The capacity of the underlying read buffer.

  • allow_incomplete_stream (bool) – If set to True, an incomplete CLP IR stream is not treated as an error.

__init__(*args, **kwargs)#
__new__(**kwargs)#
deserialize_log_event()#

Deserializes the next log event from the IR stream.

Returns:

  • The next deserialized log event from the IR stream.

  • None if there are no more log events in the stream.

Return type:

KeyValuePairLogEvent | None

Raises:

Appropriate exceptions with detailed information on any encountered failure.

class clp_ffi_py.ir.native.DeserializerBuffer#

Bases: object

This class represents a CLP IR Deserializer Buffer corresponding to a CLP IR stream. It buffers serialized CLP IR data read from the input stream, which can be consumed by the CLP IR deserialization methods to recover serialized log events. An instance of this class is expected to be passed across different calls of CLP IR deserialization methods when deserializing from the same IR stream.

The signature of the __init__ method is as follows:

__init__(self, input_stream, initial_buffer_capacity=4096)

Initializes a DeserializerBuffer object for the given input IR stream.

Parameters:
  • input_stream – Input stream that contains serialized CLP IR. It should be an instance of type IO[bytes] with the method readinto supported.

  • initial_buffer_capacity – The initial capacity of the underlying byte buffer.

__init__(*args, **kwargs)#
__new__(**kwargs)#
get_num_deserialized_log_messages()#
Returns:

Total number of messages deserialized so far.

class clp_ffi_py.ir.native.FourByteDeserializer#

Bases: object

Namespace for all CLP four-byte encoded IR deserialization methods.

Methods deserialize log events from serialized CLP IR streams. This class should never be instantiated since it only contains static methods.

static deserialize_next_log_event(deserializer_buffer, query=None, allow_incomplete_stream=False)#

Deserializes the next serialized log event from the IR stream buffered in the given deserializer buffer. The preamble must already have been deserialized from deserializer_buffer by a successful invocation of deserialize_preamble. If query is provided, only the next log event matching the query will be returned.

Parameters:
  • deserializer_buffer – The deserializer buffer of the serialized CLP IR stream.

  • query – A Query object that filters log events. See the Query documentation for more details.

  • allow_incomplete_stream – If set to True, an incomplete CLP IR stream is not treated as an error. Instead, encountering such a stream is seen as reaching its end, and the function will return None without raising any exceptions.

Raises:

Appropriate exceptions with detailed information on any encountered failure.

Returns:

  • A newly created LogEvent instance representing the next deserialized log event from the IR stream (if the query is None).

  • A newly created LogEvent instance representing the next deserialized log event matched with the given query in the IR stream (if the query is given).

  • None when the end of IR stream is reached or the query search terminates.

static deserialize_preamble(deserializer_buffer)#

Deserializes the preamble from the IR stream buffered in the given deserializer buffer.

Parameters:

deserializer_buffer – The deserializer buffer of the serialized CLP IR stream.

Raises:

Appropriate exceptions with detailed information on any encountered failure.

Returns:

The deserialized preamble presented as a new instance of Metadata.

class clp_ffi_py.ir.native.FourByteSerializer#

Bases: object

Namespace for all CLP four-byte encoded IR serialization methods.

Methods serialize bytes from the log record to create a CLP log message. This class should never be instantiated since it only contains static methods.

static serialize_end_of_ir()#

Serializes the byte sequence that indicates the end of a CLP IR stream. A stream that does not contain this will be considered as an incomplete IR stream.

static serialize_message(msg)#

Serializes the log msg using the 4-byte encoding.

Parameters:

msg – Log message to serialize.

Raises:

NotImplementedError – If the log message failed to serialize.

Returns:

The serialized message.

static serialize_message_and_timestamp_delta(timestamp_delta, msg)#

Serializes the log msg along with the timestamp delta using the 4-byte encoding.

Parameters:
  • timestamp_delta – Timestamp difference in milliseconds between the current log message and the previous log message.

  • msg – Log message to serialize.

Raises:

NotImplementedError – If the log message failed to serialize.

Returns:

The serialized message and timestamp.

static serialize_preamble(ref_timestamp, timestamp_format, timezone)#

Serializes the preamble for a 4-byte encoded CLP IR stream.

Parameters:
  • ref_timestamp – Reference timestamp used to calculate deltas emitted with each message.

  • timestamp_format – Timestamp format to be used when generating the logs with a reader.

  • timezone – Timezone in TZID format to be used when generating the timestamp from Unix epoch time.

Raises:

NotImplementedError – If the metadata length is too large.

Returns:

The serialized preamble.

static serialize_timestamp_delta(timestamp_delta)#

Serializes the timestamp delta using the 4-byte encoding.

Parameters:

timestamp_delta – Timestamp difference in milliseconds between the current log message and the previous log message.

Raises:

NotImplementedError – If the timestamp failed to serialize.

Returns:

The serialized timestamp.
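The timestamp_delta arguments above are differences between consecutive log messages, with the first message measured against the ref_timestamp given to serialize_preamble. A minimal sketch of that arithmetic in plain Python follows. The helper names are hypothetical and are not part of clp_ffi_py; the sketch only illustrates how absolute millisecond timestamps relate to the deltas.

```python
from typing import Iterable, List


def to_timestamp_deltas(ref_timestamp: int, timestamps: Iterable[int]) -> List[int]:
    """Convert absolute millisecond timestamps into per-message deltas."""
    deltas = []
    previous = ref_timestamp
    for timestamp in timestamps:
        deltas.append(timestamp - previous)
        previous = timestamp
    return deltas


def from_timestamp_deltas(ref_timestamp: int, deltas: Iterable[int]) -> List[int]:
    """Recover absolute timestamps by accumulating deltas from the reference."""
    timestamps = []
    current = ref_timestamp
    for delta in deltas:
        current += delta
        timestamps.append(current)
    return timestamps


ref = 1_700_000_000_000
stamps = [1_700_000_000_250, 1_700_000_001_000, 1_700_000_000_900]
deltas = to_timestamp_deltas(ref, stamps)  # the last delta is negative
```

A negative delta appears whenever a message carries an earlier timestamp than its predecessor, which is why the sketch does not assume the input is sorted.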

class clp_ffi_py.ir.native.KeyValuePairLogEvent#

Bases: object

This class represents a key-value pair log event and provides methods to access the key-value pairs. This class is designed to be instantiated by the IR deserializer. However, direct instantiation using the __init__ method is also supported for testing purposes, although this may not be as efficient as emission from the IR deserializer.

__init__(self, dictionary)

Initializes a KeyValuePairLogEvent from the given Python dictionary. Note that each object should only be initialized once. Double initialization will result in a memory leak.

Parameters:

dictionary (dict[str, Any]) – A dictionary representing the key-value pair log event, where all keys must be strings, including keys inside any sub-dictionaries.

__init__(*args, **kwargs)#
__new__(**kwargs)#
to_dict()#

Converts the log event into a Python dictionary.

Returns:

The log event as a Python dictionary.

Return type:

dict[str, Any]
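Since __init__ requires every key to be a string, including keys inside sub-dictionaries, it can help to check a dictionary before constructing a log event from it. The following is a minimal sketch of such a check; has_only_str_keys is a hypothetical helper, not part of clp_ffi_py, and it assumes that only nested dict values need to be inspected (it does not look inside lists).

```python
from typing import Any


def has_only_str_keys(value: Any) -> bool:
    """Return True if every key in value and its sub-dictionaries is a str."""
    if not isinstance(value, dict):
        # Non-dictionary values have no keys to validate.
        return True
    return all(
        isinstance(key, str) and has_only_str_keys(sub_value)
        for key, sub_value in value.items()
    )


valid = has_only_str_keys({"level": "INFO", "ctx": {"thread": 7}})
invalid = has_only_str_keys({"level": "INFO", "ctx": {7: "thread"}})
```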

class clp_ffi_py.ir.native.LogEvent#

Bases: object

This class represents a deserialized log event and provides ways to access the underlying log data, including the log message, the timestamp, and the log event index. Normally, this class will be instantiated by the FFI IR deserialization methods. However, with the __init__ method provided below, direct instantiation is also possible.

The signature of the __init__ method is as follows:

__init__(self, log_message, timestamp, index=0, metadata=None)

Initializes an object that represents a log event. Notice that each object should be strictly initialized only once. Double initialization will result in memory leaks.

Parameters:
  • log_message – The message content of the log event.

  • timestamp – The timestamp of the log event.

  • index – The message index (relative to the source CLP IR stream) of the log event.

  • metadata – The Metadata instance that represents the source CLP IR stream. It is set to None by default.

__init__(*args, **kwargs)#
__new__(**kwargs)#
get_formatted_message(timezone=None)#

Gets the formatted log message of the log event.

If a specific timezone is provided, it will be used to format the timestamp. Otherwise, the timestamp will be formatted using the timezone from the source CLP IR stream.

Parameters:

timezone – Python tzinfo object that specifies a timezone.

Returns:

The formatted message.

get_index()#

Gets the message index (relative to the source CLP IR stream) of the log event. This index is set to 0 by default.

Returns:

The log event index.

get_log_message()#

Gets the log message of the log event.

Returns:

The log message.

get_timestamp()#

Gets the Unix epoch timestamp in milliseconds of the log event.

Returns:

The timestamp in milliseconds.
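Because get_timestamp() returns milliseconds since the Unix epoch, converting the value to a Python datetime takes one step. The following is a minimal sketch using only the standard library; epoch_ms_to_datetime is a hypothetical helper, not part of clp_ffi_py, and it uses integer millisecond arithmetic to avoid floating-point rounding.

```python
from datetime import datetime, timedelta, timezone, tzinfo

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_ms_to_datetime(timestamp_ms: int, tz: tzinfo = timezone.utc) -> datetime:
    """Convert a millisecond Unix epoch timestamp to an aware datetime."""
    return (_EPOCH + timedelta(milliseconds=timestamp_ms)).astimezone(tz)


moment = epoch_ms_to_datetime(1_700_000_000_250)
tokyo_like = epoch_ms_to_datetime(0, timezone(timedelta(hours=9)))
```

Any tzinfo object can be passed as tz, which is the same kind of object that get_formatted_message accepts for its timezone parameter.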

match_query(query)#

Matches the underlying log event against the given query. Refer to the documentation of clp_ffi_py.Query for more details.

Parameters:

query – Input Query object.

Returns:

True if the log event matches the query, False otherwise.

class clp_ffi_py.ir.native.Metadata#

Bases: object

This class represents the IR stream preamble and provides ways to access the underlying metadata. Normally, this class will be instantiated by the FFI IR deserialization methods. However, with the __init__ method provided below, direct instantiation is also possible.

The signature of the __init__ method is as follows:

__init__(self, ref_timestamp, timestamp_format, timezone_id)

Initializes an object that represents CLP IR metadata. Assumes self is uninitialized and will allocate the underlying memory. If self is already initialized this will result in memory leaks.

Parameters:
  • ref_timestamp – the reference Unix epoch timestamp in milliseconds used to calculate the timestamp of the first log message in the IR stream.

  • timestamp_format – the timestamp format to be used when generating the logs with a reader.

  • timezone_id – the timezone id to be used when generating the timestamp from Unix epoch time.

__init__(*args, **kwargs)#
__new__(**kwargs)#
get_ref_timestamp()#

Gets the reference Unix epoch timestamp in milliseconds used to calculate the timestamp of the first log message in the IR stream.

Returns:

The reference timestamp.

get_timestamp_format()#

Gets the timestamp format to be used when generating the logs with a reader.

Returns:

The timestamp format.

get_timezone()#

Gets the timezone represented as tzinfo to be used when generating the timestamp from Unix epoch time.

Returns:

A new reference to the timezone as tzinfo.

get_timezone_id()#

Gets the timezone id to be used when generating the timestamp from Unix epoch time.

Returns:

The timezone ID in TZID format.

is_using_four_byte_encoding()#

Checks whether the CLP IR is encoded using 4-byte or 8-byte encoding methods.

Returns:

True for 4-byte encoding, and False for 8-byte encoding.

class clp_ffi_py.ir.native.Query#

Bases: object

This class represents a search query, utilized for filtering log events in a CLP IR stream. The query could include a list of wildcard queries aimed at identifying certain log messages, and a timestamp range with a lower and upper bound. This class provides an interface to set up a search query, as well as methods to validate whether the query can be matched by a log event. Note that an empty wildcard query list will match any log within the range.

By default, the wildcard query list is empty and the timestamp range is set to include all the valid Unix epoch timestamps. To filter certain log messages, use customized wildcard queries to initialize the wildcard query list. For more details, check the documentation of the class WildcardQuery.

NOTE: When searching an IR stream with a query, ideally, the search would terminate once the current log event’s timestamp exceeds the upper bound of the query’s time range. However, the timestamps in the IR stream might not be monotonically increasing; they can be locally disordered due to thread contention. To safely stop searching, the deserializer needs to ensure that the current timestamp in the IR stream exceeds the query’s upper bound timestamp by a reasonable margin. This margin can be specified during initialization and defaults to the value returned by the static method default_search_time_termination_margin(). Users can customize the margin as needed; for example, it can be set to 0 if the CLP IR stream is generated from a single-threaded program execution.
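The effect of the termination margin described in the note above can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the library's implementation: search_with_margin is a hypothetical function, events are modelled as (timestamp, message) tuples, and the stop condition (timestamp greater than upper bound plus margin) is an assumed reading of the note.

```python
from typing import Iterable, Iterator, Tuple

Event = Tuple[int, str]


def search_with_margin(
    events: Iterable[Event],
    lower_bound: int,
    upper_bound: int,
    termination_margin: int,
) -> Iterator[Event]:
    """Yield in-range events, stopping once the margin is exceeded."""
    for timestamp, message in events:
        if timestamp > upper_bound + termination_margin:
            # Assumed stop rule: far enough past the range to end the search.
            return
        if lower_bound <= timestamp <= upper_bound:
            yield timestamp, message


# Timestamps are locally disordered: 205 arrives before 198.
events = [(100, "a"), (205, "b"), (198, "c"), (320, "d"), (150, "e")]
strict = list(search_with_margin(events, 100, 200, 0))
relaxed = list(search_with_margin(events, 100, 200, 10))
```

With a margin of 0 the search stops at timestamp 205 and misses the in-range event at 198; a margin of 10 tolerates that disorder and stops only at 320.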

The signature of the __init__ method is as follows:

__init__(self, search_time_lower_bound=Query.default_search_time_lower_bound(), search_time_upper_bound=Query.default_search_time_upper_bound(), wildcard_queries=None, search_time_termination_margin=Query.default_search_time_termination_margin())

Initializes a Query object using the given inputs.

Parameters:
  • search_time_lower_bound – Start of search time range (inclusive).

  • search_time_upper_bound – End of search time range (inclusive).

  • wildcard_queries – A list of wildcard queries.

  • search_time_termination_margin – The margin used to determine the search termination timestamp.

__init__(*args, **kwargs)#
__new__(**kwargs)#
static default_search_time_lower_bound()#
Returns:

The minimum valid timestamp from Unix epoch time.

static default_search_time_termination_margin()#
Returns:

The default search termination margin as Unix epoch time.

static default_search_time_upper_bound()#
Returns:

The maximum valid timestamp from Unix epoch time.

get_search_time_lower_bound()#
Returns:

The search time lower bound.

get_search_time_termination_margin()#
Returns:

The search time termination margin.

get_search_time_upper_bound()#
Returns:

The search time upper bound.

get_wildcard_queries()#
Returns:

  • A new Python list of stored wildcard queries, presented as WildcardQuery objects.

  • None if the wildcard queries are empty.

match_log_event(log_event)#

Validates whether the input log event matches the query.

Parameters:

log_event – Input log event.

Returns:

  • True if the timestamp is in range, and the wildcard query list is empty or has at least one match.

  • False otherwise.
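The matching rule above (timestamp within the inclusive range, and either no wildcard queries or at least one match) can be sketched in plain Python. This is an illustration only: matches is a hypothetical function, and the standard library's fnmatchcase is used as a stand-in for wildcard matching, so its pattern syntax is an assumption and may differ from what WildcardQuery supports.

```python
from fnmatch import fnmatchcase
from typing import Optional, Sequence


def matches(
    timestamp: int,
    message: str,
    lower_bound: int,
    upper_bound: int,
    wildcard_patterns: Optional[Sequence[str]] = None,
) -> bool:
    """Apply the range check first, then the wildcard check."""
    if not (lower_bound <= timestamp <= upper_bound):
        return False
    if not wildcard_patterns:
        # An empty or missing pattern list matches any message in range.
        return True
    return any(fnmatchcase(message, pattern) for pattern in wildcard_patterns)


in_range_no_patterns = matches(150, "disk full", 100, 200)
one_pattern_hits = matches(150, "disk full", 100, 200, ["*error*", "disk*"])
no_pattern_hits = matches(150, "disk full", 100, 200, ["*error*"])
```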

class clp_ffi_py.ir.native.Serializer#

Bases: object

Serializes log events into the CLP key-value pair IR format and writes the serialized data to a specified byte stream object.

__init__(self, output_stream, buffer_size_limit=65536)

Initializes a Serializer instance with the given output stream. Note that each object should only be initialized once. Double initialization will result in a memory leak.

Parameters:
  • output_stream (IO[bytes]) – A writable byte output stream to which the serializer will write the serialized IR byte sequences.

  • buffer_size_limit (int) – The maximum amount of serialized data to buffer before flushing it to output_stream. Defaults to 64 KiB.

__init__(*args, **kwargs)#
__new__(**kwargs)#
close()#

Closes the serializer, writing any buffered data to the output stream and appending a byte sequence to mark the end of the CLP IR stream. The output stream is then flushed and closed. NOTE: This method must be called to properly terminate an IR stream. If it isn’t called, the stream will be incomplete, and any buffered data may be lost.

Raises:

IOError – If the serializer has already been closed.

flush()#

Flushes any buffered data and the output stream.

Raises:

IOError – If the serializer has already been closed.

get_num_bytes_serialized()#
Returns:

The total number of bytes serialized.

Return type:

int

Raises:

IOError – If the serializer has already been closed.

serialize_log_event_from_msgpack_map(msgpack_map)#

Serializes the given log event.

Parameters:

msgpack_map (bytes) – The log event as a packed msgpack map where all keys are strings.

Returns:

The number of bytes serialized.

Return type:

int

Raises:
  • IOError – If the serializer has already been closed.

  • TypeError – If msgpack_map is not a packed msgpack map.

  • RuntimeError – If msgpack_map couldn’t be unpacked or serialization into the IR stream failed.
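Since close() must be called to terminate the IR stream properly, a small wrapper can guarantee that it runs even when serialization raises. The following is a minimal, duck-typed sketch: serialize_all and _FakeSerializer are hypothetical names, not part of clp_ffi_py, and the fake simply reports the input length as the number of bytes serialized, which a real Serializer is not expected to do.

```python
from typing import Any, Iterable


def serialize_all(serializer: Any, packed_events: Iterable[bytes]) -> int:
    """Serialize every packed event, then close the serializer exactly once."""
    total = 0
    try:
        for packed in packed_events:
            total += serializer.serialize_log_event_from_msgpack_map(packed)
    finally:
        # Always close so buffered data is written and the stream is terminated.
        serializer.close()
    return total


class _FakeSerializer:
    """Hypothetical stand-in; it only counts bytes and tracks closure."""

    def __init__(self) -> None:
        self.closed = False

    def serialize_log_event_from_msgpack_map(self, msgpack_map: bytes) -> int:
        if self.closed:
            raise IOError("serializer already closed")
        return len(msgpack_map)

    def close(self) -> None:
        if self.closed:
            raise IOError("serializer already closed")
        self.closed = True


fake = _FakeSerializer()
# b"\x81\xa1k\xa1v" is the msgpack encoding of {"k": "v"}; b"\x80" is an empty map.
total = serialize_all(fake, [b"\x81\xa1k\xa1v", b"\x80"])
```

Note that, per the description of close() above, the real serializer also closes the output stream, so any bytes needed from an in-memory stream would have to be read before that point.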