Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logging improvements #1132

Open
wants to merge 5 commits into
base: v1.x.x
Choose a base branch
from
Open

Logging improvements #1132

wants to merge 5 commits into from

Conversation

shsms
Copy link
Contributor

@shsms shsms commented Dec 12, 2024

No description provided.

@shsms shsms requested a review from a team as a code owner December 12, 2024 16:09
@shsms shsms requested review from llucax and removed request for a team December 12, 2024 16:09
@github-actions github-actions bot added part:data-pipeline Affects the data pipeline part:microgrid Affects the interactions with the microgrid labels Dec 12, 2024
@github-actions github-actions bot added the part:docs Affects the documentation label Dec 16, 2024
@shsms shsms enabled auto-merge December 16, 2024 16:47
@shsms shsms marked this pull request as draft December 17, 2024 09:53
auto-merge was automatically disabled December 17, 2024 09:53

Pull request was converted to draft

@shsms
Copy link
Contributor Author

shsms commented Dec 17, 2024

I will add more commits here

shsms added 4 commits January 8, 2025 12:07
It is not providing any useful information or details about its
context, and it is too noisy.

Signed-off-by: Sahas Subramanian <[email protected]>
It is a wrapper around a `Logger` instance and limits logs when
there's an ongoing outage.

Signed-off-by: Sahas Subramanian <[email protected]>
Signed-off-by: Sahas Subramanian <[email protected]>
@github-actions github-actions bot added the part:core Affects the SDK core components (data structures, etc.) label Jan 8, 2025
@shsms shsms marked this pull request as ready for review January 8, 2025 11:14
@shsms
Copy link
Contributor Author

shsms commented Jan 8, 2025

I hope no one thinks this is overkill.

@github-actions github-actions bot added the part:tooling Affects the development tooling (CI, deployment, dependency management, etc.) label Jan 8, 2025
This is because of the latest marshmallow release missing symbols
causing mkdoc builds to fail.  This needs to be reverted once this is
resolved: marshmallow-code/marshmallow#2739.

Signed-off-by: Sahas Subramanian <[email protected]>
Copy link
Contributor

@llucax llucax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is overkill but I'm not convinced about the balance. It seems to me to be too generic for being specific and too specific for being generic.

As a generic solution, I like more the approach from https://github.com/samuller/log-rate-limit, it looks super flexible, allowing even to solve the reset() issue by allowing the next N logs:

# Override the allow_next_n value for a set of logs in the same stream so that this group of logs don't restrict one
# another from occuring consecutively
logger.warning("Test", extra=RateLimit(stream_id="stream2", allow_next_n=2))
logger.info("Extra", extra=RateLimit(stream_id="stream2"))
logger.debug("Info", extra=RateLimit(stream_id="stream2"))

And if we want to go more specific, I think it would just keep track of when logs are emitted in the metric fetcher itself, so we can even print more meaningful info, like including when was the last data received ("No data received for component %d since %s.").

# pylint: disable=arguments-differ


class RateLimitedLogger:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to inherit from Logger so this can be used in places where a plain Logger is expected? If you do so you probably only need to implement log().

And maybe this could be implemented as a Filter instead (here is an example filter to de-duplicate messages), but not sure, because filters are applied at the Handler level, so it would apply to everything that's logged or we need to customize the config in such a way that messages that we want to rate-limit must be handled by a different handler.

This looks quite interesting: https://github.com/samuller/log-rate-limit, it is a filter but allows overriding on every log call too, and allows grouping messages in streams, and individual streams can be rate-limited individually.

| None
)

DEFAULT_RATE_LIMIT = timedelta(minutes=15)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably just put the literal timedelta() in the constructor to avoid the indirection in the docs. If you keep it this way, you should documented via a docstring so it appears in the docs and users can know what's the default.

Comment on lines +55 to +63
self._rate_limit: timedelta = rate_limit

def set_rate_limit(self, rate_limit: timedelta) -> None:
"""Set the rate limit for the logger.

Args:
rate_limit: Time interval between two log messages.
"""
self._rate_limit = rate_limit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just making it public?

Suggested change
self._rate_limit: timedelta = rate_limit
def set_rate_limit(self, rate_limit: timedelta) -> None:
"""Set the rate limit for the logger.
Args:
rate_limit: Time interval between two log messages.
"""
self._rate_limit = rate_limit
self.rate_limit: timedelta = rate_limit
"""The rate limit for the logger."""

"""
self._rate_limit = rate_limit

def is_limiting(self) -> bool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def is_limiting(self) -> bool:
@property
def is_limiting(self) -> bool:

stacklevel: Stack level.
extra: Extra information.
"""
if self._rate_limit is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the check? It can't be None, right?

Comment on lines +130 to +134
_missing_data_logger.reset()
_missing_data_logger.debug(
"Component %d has started sending data.", self._component_id
)
_missing_data_logger.reset()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't you just log using the regular _logger only reset() here?

Suggested change
_missing_data_logger.reset()
_missing_data_logger.debug(
"Component %d has started sending data.", self._component_id
)
_missing_data_logger.reset()
__logger.debug(
"Component %d has started sending data.", self._component_id
)
_missing_data_logger.reset()

@@ -128,7 +140,9 @@ async def fetch_next(self) -> ComponentMetricsData | None:
return None
except asyncio.TimeoutError:
# Next time wait infinitely until we receive any message.
_logger.debug("Component %d stopped sending data.", self._component_id)
_missing_data_logger.debug(
"Component %d stopped sending data.", self._component_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rephrase because when it is repeated, one might think that it just stopped sending data now. Also since we have instances where the problem was in the data pipeline and not the component itself, maybe is more accurate to say we are not receiving data.

Suggested change
"Component %d stopped sending data.", self._component_id
"No data received for component %d.", self._component_id

@@ -9,7 +9,7 @@ copyright: "Copyright © 2022 Frequenz Energy-as-a-Service GmbH"
repo_name: "frequenz-sdk-python"
repo_url: "https://github.com/frequenz-floss/frequenz-sdk-python"
edit_uri: "edit/v1.x.x/docs/"
strict: true # Treat warnings as errors
# strict: true # Treat warnings as errors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed, marshmallow-code/marshmallow#2739 is fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
part:core Affects the SDK core components (data structures, etc.) part:data-pipeline Affects the data pipeline part:docs Affects the documentation part:microgrid Affects the interactions with the microgrid part:tooling Affects the development tooling (CI, deployment, dependency management, etc.)
Projects
Status: To do
Development

Successfully merging this pull request may close these issues.

2 participants