Logging improvements #1132
base: v1.x.x
Conversation
Pull request was converted to draft
I will add more commits here.
It is not providing any useful information or details about its context, and it is too noisy. Signed-off-by: Sahas Subramanian <[email protected]>
It is a wrapper around a `Logger` instance and limits logs when there's an ongoing outage. Signed-off-by: Sahas Subramanian <[email protected]>
Signed-off-by: Sahas Subramanian <[email protected]>
Signed-off-by: Sahas Subramanian <[email protected]>
I hope no one thinks this is overkill.
This is because the latest marshmallow release is missing symbols, causing mkdocs builds to fail. This needs to be reverted once marshmallow-code/marshmallow#2739 is resolved. Signed-off-by: Sahas Subramanian <[email protected]>
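One of the commits above describes the new class as a wrapper around a `Logger` instance that limits logs during an ongoing outage. As a rough illustration only (names and behaviour are assumed here, not taken from the PR's actual code), such a wrapper could look like this:

```python
# Hypothetical sketch of a rate-limiting Logger wrapper; the class added by
# this PR may differ in its exact API and behaviour.
import logging
from datetime import datetime, timedelta, timezone


class RateLimitedLogger:
    """Wrap a `Logger` and drop messages emitted within the rate-limit window."""

    def __init__(
        self, logger: logging.Logger, rate_limit: timedelta = timedelta(minutes=15)
    ) -> None:
        self._logger = logger
        self._rate_limit = rate_limit
        self._last_emit: datetime | None = None

    def debug(self, msg: str, *args: object) -> None:
        """Log `msg` unless another message was emitted too recently."""
        now = datetime.now(timezone.utc)
        if self._last_emit is None or now - self._last_emit >= self._rate_limit:
            self._logger.debug(msg, *args)
            self._last_emit = now

    def reset(self) -> None:
        """Forget the last emission so the next message is logged immediately."""
        self._last_emit = None
```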
I don't think it is overkill, but I'm not convinced about the balance. It seems to me to be too generic for a specific solution and too specific for a generic one.
As a generic solution, I like the approach from https://github.com/samuller/log-rate-limit more: it looks super flexible, even allowing the `reset()` issue to be solved by allowing the next N logs:
# Override the allow_next_n value for a set of logs in the same stream so that this group of logs
# doesn't restrict one another from occurring consecutively.
logger.warning("Test", extra=RateLimit(stream_id="stream2", allow_next_n=2))
logger.info("Extra", extra=RateLimit(stream_id="stream2"))
logger.debug("Info", extra=RateLimit(stream_id="stream2"))
And if we want to go more specific, I think I would just keep track of when logs are emitted in the metric fetcher itself, so we can print even more meaningful info, like including when the last data was received ("No data received for component %d since %s.").
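A rough sketch of that more specific approach (the `_MetricFetcher` class and its attributes below are made up for illustration; they are not the PR's code):

```python
# Illustrative only: the fetcher itself tracks when data was last received and
# when it last warned, so the warning can carry the "since" timestamp.
import logging
from datetime import datetime, timedelta, timezone

_logger = logging.getLogger(__name__)
_WARN_INTERVAL = timedelta(minutes=15)  # assumed interval


class _MetricFetcher:
    def __init__(self, component_id: int) -> None:
        self._component_id = component_id
        self._last_data_time = datetime.now(timezone.utc)
        self._last_warning_time: datetime | None = None

    def _on_data(self) -> None:
        self._last_data_time = datetime.now(timezone.utc)
        self._last_warning_time = None  # warn immediately on the next outage

    def _on_timeout(self) -> None:
        now = datetime.now(timezone.utc)
        if (
            self._last_warning_time is None
            or now - self._last_warning_time >= _WARN_INTERVAL
        ):
            _logger.warning(
                "No data received for component %d since %s.",
                self._component_id,
                self._last_data_time,
            )
            self._last_warning_time = now
```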
# pylint: disable=arguments-differ

class RateLimitedLogger:
Any reason not to inherit from `Logger` so this can be used in places where a plain `Logger` is expected? If you do so you probably only need to implement `log()`.
And maybe this could be implemented as a `Filter` instead (here is an example filter to de-duplicate messages), but I'm not sure, because filters are applied at the `Handler` level, so either it would apply to everything that's logged, or we would need to customize the config so that the messages we want to rate-limit are handled by a different handler.
This looks quite interesting: https://github.com/samuller/log-rate-limit. It is a filter, but it also allows overriding on every log call, supports grouping messages into streams, and individual streams can be rate-limited independently.
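For comparison, here is a minimal sketch of a home-grown `Filter`-based approach (names are illustrative; this is neither log-rate-limit's API nor code from this PR):

```python
import logging
import time


class RateLimitFilter(logging.Filter):
    """Drop records whose message template was already emitted within the interval."""

    def __init__(self, interval_sec: float = 900.0) -> None:
        super().__init__()
        self._interval_sec = interval_sec
        self._last_emit: dict[str, float] = {}

    def filter(self, record: logging.LogRecord) -> bool:
        now = time.monotonic()
        last = self._last_emit.get(record.msg)
        if last is not None and now - last < self._interval_sec:
            return False  # emitted too recently, suppress this record
        self._last_emit[record.msg] = now
        return True


# Attaching the filter to a specific logger (rather than to a handler) limits
# only the records logged directly through that logger:
_logger = logging.getLogger(__name__)
_logger.addFilter(RateLimitFilter(interval_sec=900.0))
```

Note that a filter attached directly to a logger, as above, sidesteps the handler-level concern, although it is only consulted for records created by that logger and not for records propagated from child loggers.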
    | None
)

DEFAULT_RATE_LIMIT = timedelta(minutes=15)
I would probably just put the literal `timedelta()` in the constructor to avoid the indirection in the docs. If you keep it this way, you should document it via a docstring so it appears in the docs and users can know what the default is.
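For illustration, the two options could look roughly like this (assuming the docs are built with mkdocstrings, which picks up attribute docstrings):

```python
import logging
from datetime import timedelta


# Option 1: inline the literal so the default is visible in the signature.
class RateLimitedLogger:
    def __init__(
        self, logger: logging.Logger, rate_limit: timedelta = timedelta(minutes=15)
    ) -> None:
        ...


# Option 2: keep the module-level constant, but give it a docstring so it
# shows up in the generated documentation.
DEFAULT_RATE_LIMIT = timedelta(minutes=15)
"""Default minimum interval between two emitted log messages."""
```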
        self._rate_limit: timedelta = rate_limit

    def set_rate_limit(self, rate_limit: timedelta) -> None:
        """Set the rate limit for the logger.

        Args:
            rate_limit: Time interval between two log messages.
        """
        self._rate_limit = rate_limit
Why not just make it public?
-        self._rate_limit: timedelta = rate_limit
-
-    def set_rate_limit(self, rate_limit: timedelta) -> None:
-        """Set the rate limit for the logger.
-
-        Args:
-            rate_limit: Time interval between two log messages.
-        """
-        self._rate_limit = rate_limit
+        self.rate_limit: timedelta = rate_limit
+        """The rate limit for the logger."""
""" | ||
self._rate_limit = rate_limit | ||
|
||
def is_limiting(self) -> bool: |
-    def is_limiting(self) -> bool:
+    @property
+    def is_limiting(self) -> bool:
            stacklevel: Stack level.
            extra: Extra information.
        """
        if self._rate_limit is None:
Why the check? It can't be `None`, right?
        _missing_data_logger.reset()
        _missing_data_logger.debug(
            "Component %d has started sending data.", self._component_id
        )
        _missing_data_logger.reset()
Couldn't you just log using the regular `_logger` and only `reset()` here?
-        _missing_data_logger.reset()
-        _missing_data_logger.debug(
-            "Component %d has started sending data.", self._component_id
-        )
-        _missing_data_logger.reset()
+        _logger.debug(
+            "Component %d has started sending data.", self._component_id
+        )
+        _missing_data_logger.reset()
@@ -128,7 +140,9 @@ async def fetch_next(self) -> ComponentMetricsData | None:
             return None
         except asyncio.TimeoutError:
             # Next time wait infinitely until we receive any message.
-            _logger.debug("Component %d stopped sending data.", self._component_id)
+            _missing_data_logger.debug(
+                "Component %d stopped sending data.", self._component_id
I would rephrase, because when the message is repeated one might think that the component just stopped sending data now. Also, since we have had instances where the problem was in the data pipeline and not in the component itself, maybe it is more accurate to say we are not receiving data.
"Component %d stopped sending data.", self._component_id | |
"No data received for component %d.", self._component_id |
@@ -9,7 +9,7 @@ copyright: "Copyright © 2022 Frequenz Energy-as-a-Service GmbH"
 repo_name: "frequenz-sdk-python"
 repo_url: "https://github.com/frequenz-floss/frequenz-sdk-python"
 edit_uri: "edit/v1.x.x/docs/"
-strict: true # Treat warnings as errors
+# strict: true # Treat warnings as errors
This can be removed; marshmallow-code/marshmallow#2739 is fixed.