feat: Support search logs by timestamp for structured and unstructured logs. #42

Open
wants to merge 26 commits into
base: main
Changes from 24 commits
Commits
ea23947
search timestamp works for unstructured logs
Henry8192 Nov 19, 2024
7edaa48
Merge branch 'y-scope:main' into search-timestamp
Henry8192 Dec 16, 2024
7481573
Merge branch 'main' into search-timestamp
Henry8192 Dec 18, 2024
3ee05a2
fix lint
Henry8192 Dec 19, 2024
f1a71a1
implement structured logs search by timestamp
Henry8192 Dec 22, 2024
5263385
Merge branch 'main' into search-timestamp
Henry8192 Dec 22, 2024
c423ec5
address partial changes from review
Henry8192 Dec 27, 2024
467e998
snapshot: get_timestamp seems to be undefined for std::upper_bound
Henry8192 Dec 30, 2024
c66cb80
fix lint
Henry8192 Dec 30, 2024
89338f7
pass in log_events instead of iterators to generic_get_log_event_inde…
Henry8192 Jan 2, 2025
a99ec2a
switch back to std::upper_bound because std::ranges::upper_bound is n…
Henry8192 Jan 6, 2025
9ec039f
fix lint
Henry8192 Jan 6, 2025
4f125b8
change generic_get_log_event_index_by_timestamp behavior: only return…
Henry8192 Jan 6, 2025
5221588
edit docstring for get_log_event_index_by_timestamp
Henry8192 Jan 6, 2025
abeb4f8
fix lint
Henry8192 Jan 6, 2025
6647fbd
Apply suggestions from code review
Henry8192 Jan 8, 2025
046191e
use concept to shorten function definition; minor change to generic_g…
Henry8192 Jan 8, 2025
4c7386d
Merge branch 'main' into search-timestamp
Henry8192 Jan 9, 2025
c400361
fix lint & syntax
Henry8192 Jan 9, 2025
ca4a616
rename get_log_event_index_by_timestamp to get_log_event_idx_by_times…
Henry8192 Jan 9, 2025
9443ba3
revert comments and plan to fix in the next pr
Henry8192 Jan 10, 2025
db60efd
add back the missing space
Henry8192 Jan 10, 2025
0e3e21b
revert decode_range (without creating concept)
Henry8192 Jan 10, 2025
412b96e
address changes from Marco's review
Henry8192 Jan 13, 2025
54c7df1
remove unnecessary require statement for get_log_event_idx_by_timestamp
Henry8192 Jan 13, 2025
eff1849
address the rest of the comments
Henry8192 Jan 13, 2025
7 changes: 6 additions & 1 deletion src/clp_ffi_js/ir/StreamReader.cpp
@@ -129,6 +129,7 @@ EMSCRIPTEN_BINDINGS(ClpStreamReader) {
"Array<[string, bigint, number, number]>"
);
emscripten::register_type<clp_ffi_js::ir::FilteredLogEventMapTsType>("number[] | null");
emscripten::register_type<clp_ffi_js::ir::LogEventIdxTsType>("number | null");
emscripten::class_<clp_ffi_js::ir::StreamReader>("ClpStreamReader")
.constructor(
&clp_ffi_js::ir::StreamReader::create,
@@ -145,7 +146,11 @@ EMSCRIPTEN_BINDINGS(ClpStreamReader) {
)
.function("filterLogEvents", &clp_ffi_js::ir::StreamReader::filter_log_events)
.function("deserializeStream", &clp_ffi_js::ir::StreamReader::deserialize_stream)
.function("decodeRange", &clp_ffi_js::ir::StreamReader::decode_range);
.function("decodeRange", &clp_ffi_js::ir::StreamReader::decode_range)
.function(
"getLogEventIdxByTimestamp",
&clp_ffi_js::ir::StreamReader::get_log_event_idx_by_timestamp
Contributor

I'd consider renaming `get_log_event_idx_by_timestamp` to `find_log_event_idx_with_nearest_timestamp`.

Collaborator Author

To me their purposes are the same, plus I'm kinda too lazy to change this protocol on the log viewer side XD

Contributor

But you're already going to need to change it, since we renamed `getLogEventIndexByTimestamp` to `getLogEventIdxByTimestamp`. Unless you disagree with the name, but it is close to the log viewer function that does something similar to idx.

);
}
} // namespace

65 changes: 65 additions & 0 deletions src/clp_ffi_js/ir/StreamReader.hpp
@@ -11,6 +11,7 @@
#include <type_traits>
#include <vector>

#include <clp/ir/types.hpp>
#include <clp/streaming_compression/zstd/Decompressor.hpp>
#include <clp/type_utils.hpp>
#include <emscripten/em_asm.h>
@@ -29,6 +30,7 @@ EMSCRIPTEN_DECLARE_VAL_TYPE(ReaderOptions);
// JS types used as outputs
EMSCRIPTEN_DECLARE_VAL_TYPE(DecodedResultsTsType);
EMSCRIPTEN_DECLARE_VAL_TYPE(FilteredLogEventMapTsType);
EMSCRIPTEN_DECLARE_VAL_TYPE(LogEventIdxTsType);

enum class StreamType : uint8_t {
Structured,
@@ -44,6 +46,16 @@ using LogEvents = std::vector<LogEventWithFilterData<LogEvent>>;
*/
using FilteredLogEventsMap = std::optional<std::vector<size_t>>;

template <typename LogEvent>
concept GetLogEventIdxInterface = requires(
LogEventWithFilterData<LogEvent> const& event,
clp::ir::epoch_time_ms_t timestamp
) {
{
event.get_timestamp()
} -> std::convertible_to<clp::ir::epoch_time_ms_t>;
};

/**
* Class to deserialize and decode Zstandard-compressed CLP IR streams as well as format decoded
* log events.
@@ -123,6 +135,18 @@ class StreamReader {
*/
[[nodiscard]] virtual auto decode_range(size_t begin_idx, size_t end_idx, bool use_filter) const
-> DecodedResultsTsType = 0;
/**
* Finds the index of the last log event that matches or next to the given timestamp.
*
* @tparam LogEvent
* @param timestamp The timestamp to search for, in milliseconds since the Unix epoch.
* @return The last index of the log event whose timestamp is smaller than or equal to the
* `timestamp`.
* @return `0` if all log event timestamps are larger than the target.
* @return null if no log event exists in the stream.
*/
[[nodiscard]] virtual auto get_log_event_idx_by_timestamp(clp::ir::epoch_time_ms_t timestamp
) -> LogEventIdxTsType = 0;
Comment on lines +128 to +139
Contributor

/**
 * Finds the log event with the timestamp that's nearest to the `target_ts`.
 * @param target_ts
 * @return The index of the log event with:
 * - the largest timestamp less than or equal to `target_ts`,
 * - or the index `0` if all timestamps are greater than `target_ts`.
 * @return null if no log event exists in the stream.
 */
[[nodiscard]] virtual auto find_log_event_idx_with_nearest_timestamp(clp::ir::epoch_time_ms_t target_ts
) -> LogEventIdxTsType = 0;

Contributor

I still think my comment is an improvement.

Contributor

Also, I think `target_ts` is less confusing, since it distinguishes the timestamp provided from the log event timestamp.
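
The semantics described in the suggested docstring can be sketched outside of Emscripten roughly as follows. This is a minimal, hypothetical sketch, not the PR's implementation: `find_idx_with_nearest_timestamp` and the local `epoch_time_ms_t` alias are stand-ins introduced for illustration, a plain sorted `std::vector` of timestamps replaces the log events, and `std::nullopt` stands in for the JS `null`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <optional>
#include <vector>

// Hypothetical stand-in for clp::ir::epoch_time_ms_t.
using epoch_time_ms_t = std::int64_t;

// Returns:
// - the index of the last timestamp that is <= `target_ts`,
// - 0 if every timestamp is greater than `target_ts`,
// - std::nullopt (standing in for JS null) if there are no timestamps.
// Assumes `timestamps` is sorted in non-decreasing order.
auto find_idx_with_nearest_timestamp(
        std::vector<epoch_time_ms_t> const& timestamps,
        epoch_time_ms_t target_ts
) -> std::optional<std::size_t> {
    if (timestamps.empty()) {
        return std::nullopt;
    }

    // Iterator to the first timestamp strictly greater than `target_ts`.
    auto const upper{std::upper_bound(timestamps.begin(), timestamps.end(), target_ts)};
    if (upper == timestamps.begin()) {
        // All timestamps are greater than `target_ts`.
        return std::size_t{0};
    }

    // The element just before `upper` is the last one that is <= `target_ts`.
    return static_cast<std::size_t>(std::distance(timestamps.begin(), upper)) - 1;
}
```

With timestamps `{10, 20, 20, 30}`, a `target_ts` of `20` resolves to index 2 (the last of the duplicates), while both `5` and `10` resolve to index 0. A caller therefore cannot tell "first event matches" apart from "all events are later" using the index alone.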


protected:
explicit StreamReader() = default;
@@ -172,6 +196,19 @@ class StreamReader {
LogLevelFilterTsType const& log_level_filter,
LogEvents<LogEvent> const& log_events
) -> void;

/**
* Templated implementation of `get_log_event_idx_by_timestamp`.
*
* @tparam LogEvent
* @param timestamp
* @return the best matched log event index.
*/
template <GetLogEventIdxInterface LogEvent>
auto generic_get_log_event_idx_by_timestamp(
LogEvents<LogEvent> const& log_events,
clp::ir::epoch_time_ms_t timestamp
) -> LogEventIdxTsType;
};

template <typename LogEvent, typename ToStringFunc>
@@ -258,6 +295,34 @@ auto StreamReader::generic_filter_log_events(
}
}
}

template <GetLogEventIdxInterface LogEvent>
auto StreamReader::generic_get_log_event_idx_by_timestamp(
LogEvents<LogEvent> const& log_events,
clp::ir::epoch_time_ms_t timestamp
) -> LogEventIdxTsType {
if (log_events.empty()) {
return LogEventIdxTsType{emscripten::val::null()};
}

auto upper{std::upper_bound(
log_events.begin(),
log_events.end(),
timestamp,
[](clp::ir::epoch_time_ms_t ts, LogEventWithFilterData<LogEvent> const& log_event) {
return ts < log_event.get_timestamp();
}
)};

if (upper == log_events.begin()) {
return LogEventIdxTsType{emscripten::val(0)};
}

auto const upper_index{std::distance(log_events.begin(), upper)};
auto const index{upper_index - 1};

return LogEventIdxTsType{emscripten::val(index)};
}
} // namespace clp_ffi_js::ir

#endif // CLP_FFI_JS_IR_STREAMREADER_HPP
10 changes: 10 additions & 0 deletions src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
@@ -11,6 +11,7 @@
#include <clp/Array.hpp>
#include <clp/ErrorCode.hpp>
#include <clp/ffi/ir_stream/Deserializer.hpp>
#include <clp/ir/types.hpp>
#include <clp/TraceableException.hpp>
#include <emscripten/val.h>
#include <spdlog/spdlog.h>
@@ -147,6 +148,15 @@ auto StructuredIrStreamReader::decode_range(size_t begin_idx, size_t end_idx, bo
);
}

auto StructuredIrStreamReader::get_log_event_idx_by_timestamp(
clp::ir::epoch_time_ms_t const timestamp
) -> LogEventIdxTsType {
return generic_get_log_event_idx_by_timestamp<StructuredLogEvent>(
Contributor

You should not need to specify the type.

Collaborator Author

emmm you mean `LogEventIdxTsType`?

Contributor

No, I mean you can remove `<StructuredLogEvent>` and it should still work.

*m_deserialized_log_events,
timestamp
);
}

StructuredIrStreamReader::StructuredIrStreamReader(
StreamReaderDataContext<StructuredIrDeserializer>&& stream_reader_data_context,
std::shared_ptr<StructuredLogEvents> deserialized_log_events
4 changes: 4 additions & 0 deletions src/clp_ffi_js/ir/StructuredIrStreamReader.hpp
@@ -8,6 +8,7 @@
#include <clp/Array.hpp>
#include <clp/ffi/ir_stream/Deserializer.hpp>
#include <clp/ffi/SchemaTree.hpp>
#include <clp/ir/types.hpp>
#include <emscripten/val.h>

#include <clp_ffi_js/ir/LogEventWithFilterData.hpp>
@@ -74,6 +75,9 @@ class StructuredIrStreamReader : public StreamReader {
[[nodiscard]] auto decode_range(size_t begin_idx, size_t end_idx, bool use_filter) const
-> DecodedResultsTsType override;

[[nodiscard]] auto get_log_event_idx_by_timestamp(clp::ir::epoch_time_ms_t timestamp
) -> LogEventIdxTsType override;

private:
// Constructor
explicit StructuredIrStreamReader(
9 changes: 9 additions & 0 deletions src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
@@ -158,6 +158,15 @@ auto UnstructuredIrStreamReader::decode_range(size_t begin_idx, size_t end_idx,
);
}

auto UnstructuredIrStreamReader::get_log_event_idx_by_timestamp(
clp::ir::epoch_time_ms_t const timestamp
) -> LogEventIdxTsType {
return generic_get_log_event_idx_by_timestamp<UnstructuredLogEvent>(
Contributor

You should not need to specify the type.

Contributor

ditto

m_encoded_log_events,
timestamp
);
}

UnstructuredIrStreamReader::UnstructuredIrStreamReader(
StreamReaderDataContext<UnstructuredIrDeserializer>&& stream_reader_data_context
)
3 changes: 3 additions & 0 deletions src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp
@@ -71,6 +71,9 @@ class UnstructuredIrStreamReader : public StreamReader {
[[nodiscard]] auto decode_range(size_t begin_idx, size_t end_idx, bool use_filter) const
-> DecodedResultsTsType override;

[[nodiscard]] auto get_log_event_idx_by_timestamp(clp::ir::epoch_time_ms_t timestamp
) -> LogEventIdxTsType override;

private:
// Constructor
explicit UnstructuredIrStreamReader(