Logging improvements #1132
base: v1.x.x
Conversation
Pull request was converted to draft
I will add more commits here.
It is not providing any useful information or details about its context, and it is too noisy. Signed-off-by: Sahas Subramanian <[email protected]>
It is a wrapper around a `Logger` instance and limits logs when there's an ongoing outage. Signed-off-by: Sahas Subramanian <[email protected]>
Signed-off-by: Sahas Subramanian <[email protected]>
Signed-off-by: Sahas Subramanian <[email protected]>
I hope no one thinks this is overkill.
This is because the latest marshmallow release is missing symbols, causing mkdocs builds to fail. This needs to be reverted once marshmallow-code/marshmallow#2739 is resolved. Signed-off-by: Sahas Subramanian <[email protected]>
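One of the commits above describes the new class as a wrapper around a `Logger` instance that limits logs during an ongoing outage. As a rough illustration only (names and behaviour are assumed here, not taken from the PR's actual code), such a wrapper could look like this:

```python
# Hypothetical sketch of a rate-limiting Logger wrapper; the class added by
# this PR may differ in its exact API and behaviour.
import logging
from datetime import datetime, timedelta, timezone


class RateLimitedLogger:
    """Wrap a `Logger` and drop messages emitted within the rate-limit window."""

    def __init__(
        self, logger: logging.Logger, rate_limit: timedelta = timedelta(minutes=15)
    ) -> None:
        self._logger = logger
        self._rate_limit = rate_limit
        self._last_emit: datetime | None = None

    def debug(self, msg: str, *args: object) -> None:
        """Log `msg` unless another message was emitted too recently."""
        now = datetime.now(timezone.utc)
        if self._last_emit is None or now - self._last_emit >= self._rate_limit:
            self._logger.debug(msg, *args)
            self._last_emit = now

    def reset(self) -> None:
        """Forget the last emission so the next message is logged immediately."""
        self._last_emit = None
```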
I don't think it is overkill, but I'm not convinced about the balance. It seems to me to be too generic for a specific solution and too specific for a generic one.
As a generic solution, I like the approach from https://github.com/samuller/log-rate-limit more: it looks super flexible, even allowing the `reset()` issue to be solved by allowing the next N logs:
# Override the allow_next_n value for a set of logs in the same stream so that this group of logs
# doesn't restrict one another from occurring consecutively.
logger.warning("Test", extra=RateLimit(stream_id="stream2", allow_next_n=2))
logger.info("Extra", extra=RateLimit(stream_id="stream2"))
logger.debug("Info", extra=RateLimit(stream_id="stream2"))
And if we want to go more specific, I think I would just keep track of when logs are emitted in the metric fetcher itself, so we can print even more meaningful info, like including when the last data was received ("No data received for component %d since %s.").
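A rough sketch of that more specific approach (the `_MetricFetcher` class and its attributes below are made up for illustration; they are not the PR's code):

```python
# Illustrative only: the fetcher itself tracks when data was last received and
# when it last warned, so the warning can carry the "since" timestamp.
import logging
from datetime import datetime, timedelta, timezone

_logger = logging.getLogger(__name__)
_WARN_INTERVAL = timedelta(minutes=15)  # assumed interval


class _MetricFetcher:
    def __init__(self, component_id: int) -> None:
        self._component_id = component_id
        self._last_data_time = datetime.now(timezone.utc)
        self._last_warning_time: datetime | None = None

    def _on_data(self) -> None:
        self._last_data_time = datetime.now(timezone.utc)
        self._last_warning_time = None  # warn immediately on the next outage

    def _on_timeout(self) -> None:
        now = datetime.now(timezone.utc)
        if (
            self._last_warning_time is None
            or now - self._last_warning_time >= _WARN_INTERVAL
        ):
            _logger.warning(
                "No data received for component %d since %s.",
                self._component_id,
                self._last_data_time,
            )
            self._last_warning_time = now
```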
# pylint: disable=arguments-differ

class RateLimitedLogger:
Any reason not to inherit from `Logger` so this can be used in places where a plain `Logger` is expected? If you do so you probably only need to implement `log()`.
And maybe this could be implemented as a `Filter` instead (here is an example filter to de-duplicate messages), but I'm not sure, because filters are applied at the `Handler` level, so either it would apply to everything that's logged, or we would need to customize the config so that the messages we want to rate-limit are handled by a different handler.
This looks quite interesting: https://github.com/samuller/log-rate-limit. It is a filter, but it also allows overriding on every log call, supports grouping messages into streams, and individual streams can be rate-limited independently.
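For comparison, here is a minimal sketch of a home-grown `Filter`-based approach (names are illustrative; this is neither log-rate-limit's API nor code from this PR):

```python
import logging
import time


class RateLimitFilter(logging.Filter):
    """Drop records whose message template was already emitted within the interval."""

    def __init__(self, interval_sec: float = 900.0) -> None:
        super().__init__()
        self._interval_sec = interval_sec
        self._last_emit: dict[str, float] = {}

    def filter(self, record: logging.LogRecord) -> bool:
        now = time.monotonic()
        last = self._last_emit.get(record.msg)
        if last is not None and now - last < self._interval_sec:
            return False  # emitted too recently, suppress this record
        self._last_emit[record.msg] = now
        return True


# Attaching the filter to a specific logger (rather than to a handler) limits
# only the records logged directly through that logger:
_logger = logging.getLogger(__name__)
_logger.addFilter(RateLimitFilter(interval_sec=900.0))
```

Note that a filter attached directly to a logger, as above, sidesteps the handler-level concern, although it is only consulted for records created by that logger and not for records propagated from child loggers.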
    | None
)

DEFAULT_RATE_LIMIT = timedelta(minutes=15)
I would probably just put the literal `timedelta()` in the constructor to avoid the indirection in the docs. If you keep it this way, you should document it via a docstring so it appears in the docs and users can know what the default is.
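For illustration, the two options could look roughly like this (assuming the docs are built with mkdocstrings, which picks up attribute docstrings):

```python
import logging
from datetime import timedelta


# Option 1: inline the literal so the default is visible in the signature.
class RateLimitedLogger:
    def __init__(
        self, logger: logging.Logger, rate_limit: timedelta = timedelta(minutes=15)
    ) -> None:
        ...


# Option 2: keep the module-level constant, but give it a docstring so it
# shows up in the generated documentation.
DEFAULT_RATE_LIMIT = timedelta(minutes=15)
"""Default minimum interval between two emitted log messages."""
```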
        self._rate_limit: timedelta = rate_limit

    def set_rate_limit(self, rate_limit: timedelta) -> None:
        """Set the rate limit for the logger.

        Args:
            rate_limit: Time interval between two log messages.
        """
        self._rate_limit = rate_limit
Why not just make it public?
-        self._rate_limit: timedelta = rate_limit
-
-    def set_rate_limit(self, rate_limit: timedelta) -> None:
-        """Set the rate limit for the logger.
-
-        Args:
-            rate_limit: Time interval between two log messages.
-        """
-        self._rate_limit = rate_limit
+        self.rate_limit: timedelta = rate_limit
+        """The rate limit for the logger."""
""" | ||
self._rate_limit = rate_limit | ||
|
||
def is_limiting(self) -> bool: |
-    def is_limiting(self) -> bool:
+    @property
+    def is_limiting(self) -> bool:
            stacklevel: Stack level.
            extra: Extra information.
        """
        if self._rate_limit is None:
Why the check? It can't be `None`, right?
        _missing_data_logger.reset()
        _missing_data_logger.debug(
            "Component %d has started sending data.", self._component_id
        )
        _missing_data_logger.reset()
Couldn't you just log using the regular `_logger` and only `reset()` here?
-        _missing_data_logger.reset()
-        _missing_data_logger.debug(
-            "Component %d has started sending data.", self._component_id
-        )
-        _missing_data_logger.reset()
+        _logger.debug(
+            "Component %d has started sending data.", self._component_id
+        )
+        _missing_data_logger.reset()
@@ -128,7 +140,9 @@ async def fetch_next(self) -> ComponentMetricsData | None:
             return None
         except asyncio.TimeoutError:
             # Next time wait infinitely until we receive any message.
-            _logger.debug("Component %d stopped sending data.", self._component_id)
+            _missing_data_logger.debug(
+                "Component %d stopped sending data.", self._component_id
I would rephrase, because when the message is repeated one might think that the component just stopped sending data now. Also, since we have had instances where the problem was in the data pipeline and not in the component itself, maybe it is more accurate to say we are not receiving data.
"Component %d stopped sending data.", self._component_id | |
"No data received for component %d.", self._component_id |
@@ -9,7 +9,7 @@ copyright: "Copyright © 2022 Frequenz Energy-as-a-Service GmbH"
 repo_name: "frequenz-sdk-python"
 repo_url: "https://github.com/frequenz-floss/frequenz-sdk-python"
 edit_uri: "edit/v1.x.x/docs/"
-strict: true # Treat warnings as errors
+# strict: true # Treat warnings as errors
This can be removed; marshmallow-code/marshmallow#2739 is fixed.