Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Monitoring de l'utilisation du pool Ecto #3465

Merged
merged 5 commits into from
Sep 18, 2023
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions apps/transport/lib/transport/application.ex
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,7 @@ defmodule Transport.Application do
:ok = Transport.Jobs.ObanLogger.setup()

:ok = Transport.Telemetry.setup()
:ok = Transport.EctoTelemetry.setup()

opts = [strategy: :one_for_one, name: Transport.Supervisor]
Supervisor.start_link(children, opts)
Expand Down
78 changes: 78 additions & 0 deletions apps/transport/lib/transport/ecto_telemetry.ex
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
defmodule Transport.EctoTelemetry do
require Logger

@moduledoc """
A module receiving Ecto events and forwarding them (currently) to AppSignal, so that we can
chart the pool occupancy.

References:
* https://hexdocs.pm/ecto/Ecto.Repo.html#module-telemetry-events
* https://github.com/appsignal/appsignal-elixir/issues/318#issuecomment-841156779 (inspiration)
* https://elixirforum.com/t/how-to-deal-with-detached-telemetry-handlers-caused-by-errors/56069/2
* https://github.com/appsignal/appsignal-elixir/issues/887 (which ensures calls are fast & mostly safe)

A few notes extracted from the doc (with extra comments):
* `:idle_time` - the time the connection spent waiting before being checked out for the query.
the higher the better ; if this gets low (close to 0), the pool is over-used (not good).
* `:queue_time` - the time spent waiting to check out a database connection.
the lower the better. if this gets too high, the pool is over-used (not good).
* `:query_time` - the time spent executing the query.

We do not track the following metrics for now, because they are less useful for pool optimization:
* `:decode_time` - the time spent decoding the data received from the database
* `:total_time` - the sum of (queue_time, query_time, and decode_time)️

It must be noted that handlers are running inline, and that they can be "detached" in case of errors.

In the case of this implementation, the risk is just losing metrics, which is acceptable, plus the way
AppSignal works (see above discussions) makes this risk not very likely to occur.
"""

def handle_event([:db, :repo, :query], measurements, _metadata, _config) do
#
# NOTE: at time of writing, at AppSignal, "Custom metrics do not count towards your plan.
# Only requests and logging. Custom metrics are included.".
#
# If we need to reduce the volume of metrics (for billing or ops reasons), it appears that
# Oban emits roughly 5 calls every 500ms in development at least (setting `WORKER=0` removes those events).
# If this is a problem, we can probably filter a part of them via metadata[:options][:oban_conf] which
# appears to be there, but for now I believe this gives useful information too.
#

case measurements do
%{queue_time: queue_time} ->
Appsignal.add_distribution_value("ecto.queue_time", System.convert_time_unit(queue_time, :native, :millisecond))

_ ->
nil
end

case measurements do
%{idle_time: idle_time} ->
Appsignal.add_distribution_value("ecto.idle_time", System.convert_time_unit(idle_time, :native, :millisecond))

_ ->
nil
end

case measurements do
%{query_time: query_time} ->
Appsignal.add_distribution_value("ecto.query_time", System.convert_time_unit(query_time, :native, :millisecond))

_ ->
nil
end
Comment on lines +42 to +64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pourquoi pas un seul case avec plusieurs clauses ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

J'imagine parce que ça stop au premier match, je me demande si on pourrait écrire autrement

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C'est parce qu'effectivement il faut avoir plusieurs match, certains events auront un, deux ou trois sous-parties, qui doivent être tracées quand elles sont présente.

L'implémentation est reprise d'ici appsignal/appsignal-elixir#318 (comment) et semble avoir été testée aussi, et AppSignal va implémenter ça en natif prochainement (appsignal/appsignal-elixir#539 (comment)), donc je n'ai pas cherché à trop modifier.

end

def setup do
Logger.info("Setting up telemetry for AppSignal + Ecto")

:telemetry.attach(
"transport-ecto",
# NOTE: the first two params are I believe mapped to `DB.Repo`
[:db, :repo, :query],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ce serait pas :transport au lieu de :db ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C'est ce que j'avais mis initialement, mais ça ne fonctionnait pas, car le module s'appelle (en tout cas c'est ce que j'ai compris) DB.Repo.

&Transport.EctoTelemetry.handle_event/4,
nil
)
end
end