feat(metrics): optimize metrics #27

Merged
2 changes: 1 addition & 1 deletion .github/workflows/pr-check.yml
@@ -21,7 +21,7 @@ jobs:
       - name: Set up Go
         uses: actions/setup-go@v4
         with:
-          go-version: 1.19
+          go-version: 1.21.3

       - uses: actions/cache@v3
         with:
4 changes: 2 additions & 2 deletions .github/workflows/tests.yml
@@ -26,7 +26,7 @@ jobs:
       - name: Set up Go
         uses: actions/setup-go@v4
         with:
-          go-version: 1.19
+          go-version: 1.21.3
       - name: Lint
         uses: golangci/golangci-lint-action@v3
         with:
@@ -43,7 +43,7 @@ jobs:
       - name: Set up Go
         uses: actions/setup-go@v4
         with:
-          go-version: 1.19
+          go-version: 1.21.3

       - uses: actions/cache@v3
         with:
14 changes: 8 additions & 6 deletions README.md
@@ -133,9 +133,10 @@ h.GET("/ping", func(c context.Context, ctx *app.RequestContext) {

Below is a table of HTTP server metric instruments.

-| Name | Instrument Type | Unit | Unit | Description |
-|-------------------------------|---------------------------------------------------|--------------|-------------------------------------------|------------------------------------------------------------------------------|
-| `http.server.duration` | Histogram | milliseconds | `ms` | measures the duration inbound HTTP requests |
+| Name | Instrument Type | Unit | Unit (UCUM) | Description |
+|-----------------------------|-----------------|--------------|-----------|------------------------------------------------------------------------------|
+| `http.server.duration` | Histogram | milliseconds | `ms`<br/> | measures the duration of inbound HTTP requests |
+| `http.server.request_count` | Counter | count | `count` | measures the total number of incoming requests |


#### Hertz Client
@@ -145,6 +146,7 @@ Below is a table of HTTP client metric instruments.
| Name | Instrument Type ([*](README.md#instrument-types)) | Unit | Unit ([UCUM](README.md#instrument-units)) | Description |
|-----------------------------|---------------------------------------------------|--------------|-------------------------------------------|----------------------------------------------------------|
| `http.client.duration` | Histogram | milliseconds | `ms` | measures the duration of outbound HTTP requests |
+| `http.client.request_count` | Counter | count | `count` | measures the total number of outbound client requests |


### R.E.D
@@ -155,15 +157,15 @@ the number of requests, per second, your services are serving.

eg: QPS
```
-sum(rate(http_server_duration_count{}[5m])) by (service_name, http_method)
+sum(rate(http_server_request_count_total{}[5m])) by (service_name, http_method)
```
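
The instrument is declared as `http.server.request_count`, yet the query above reads `http_server_request_count_total`. The following is a minimal sketch of that naming convention, assuming the usual OpenTelemetry-to-Prometheus mapping in which dots become underscores and monotonic counters gain a `_total` suffix. `promName` is a hypothetical helper written for illustration, not part of any exporter API, and it deliberately ignores unit suffixes and the other sanitization a real exporter may apply.

```go
package main

import (
	"fmt"
	"strings"
)

// promName is a hypothetical, simplified illustration of how an OpenTelemetry
// instrument name typically appears in Prometheus: dots are replaced by
// underscores and monotonic counters receive a "_total" suffix.
// Unit suffixes and character sanitization are intentionally left out.
func promName(otelName string, monotonicCounter bool) string {
	name := strings.ReplaceAll(otelName, ".", "_")
	if monotonicCounter && !strings.HasSuffix(name, "_total") {
		name += "_total"
	}
	return name
}

func main() {
	fmt.Println(promName("http.server.request_count", true)) // http_server_request_count_total
	fmt.Println(promName("http.server.duration", false))     // http_server_duration
}
```

Histogram series such as `http_server_duration_bucket` and `http_server_duration_count` are derived by Prometheus conventions from the base name, which is why the duration queries keep their existing form.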

#### Errors
the number of failed requests per second.

eg: Error ratio
```
-sum(rate(http_server_duration_count{status_code="Error"}[5m])) by (service_name, http_method) / sum(rate(http_server_duration_count{}[5m])) by (service_name, http_method)
+sum(rate(http_server_request_count_total{status_code="Error"}[5m])) by (service_name, http_method) / sum(rate(http_server_request_count_total{}[5m])) by (service_name, http_method)
```

#### Duration
@@ -177,7 +179,7 @@ histogram_quantile(0.99, sum(rate(http_server_duration_bucket{}[5m])) by (le, se
### Service Topology Map
The `http.server.duration` will record the peer service and the current service dimension. Based on this dimension, we can aggregate the service topology map
```
-sum(rate(http_server_duration_count{}[5m])) by (service_name, peer_service)
+sum(rate(http_server_request_count_total{}[5m])) by (service_name, peer_service)
```

### Runtime Metrics
20 changes: 11 additions & 9 deletions README_CN.md
@@ -140,17 +140,19 @@ h.GET("/ping", func(c context.Context, ctx *app.RequestContext) {

The table below lists the HTTP server metrics.

-| Name | Instrument Type | Unit | Unit | Description |
-|-------------------------------|---------------------------------------------------|--------------|-------------------------------------------|------------------------------------------------------------------------------|
-| `http.server.duration` | Histogram | milliseconds | `ms` | measures the duration of inbound HTTP requests |
+| Name | Instrument Type | Unit | Unit | Description |
+|-----------------------------|---------------------------------------------------|--------------|-------------------------------------------|-----------------|
+| `http.server.duration` | Histogram | milliseconds | `ms` | measures the duration of inbound HTTP requests |
+| `http.server.request_count` | Counter | count | `count` | measures the number of inbound HTTP requests |

#### Hertz Client

The table below lists the HTTP client metrics.

-| Name | Instrument Type | Unit | Unit (UCUM) | Description |
-|-----------------------------|---------------------------------------------------|--------------|-------------------------------------------|----------------------------------------------------------|
-| `http.client.duration` | Histogram | milliseconds | `ms` | measures the duration of outbound HTTP requests |
+| Name | Instrument Type | Unit | Unit (UCUM) | Description |
+|-----------------------------|---------------------------------------------------|--------------|-------------------------------------------|-----------------|
+| `http.client.duration` | Histogram | milliseconds | `ms` | measures the duration of outbound HTTP requests |
+| `http.client.request_count` | Counter | count | `count` | measures the number of outbound HTTP requests |


### R.E.D
@@ -163,7 +165,7 @@ R.E.D (Rate, Errors, Duration) defines, for every microservice in the architecture, the three
For example: QPS (Queries Per Second)

```
-sum(rate(http_server_duration_count{}[5m])) by (service_name, http_method)
+sum(rate(http_server_request_count_total{}[5m])) by (service_name, http_method)
```

#### Errors
@@ -173,7 +175,7 @@ sum(rate(http_server_duration_count{}[5m])) by (service_name, http_method)
For example: error ratio

```
-sum(rate(http_server_duration_count{status_code="Error"}[5m])) by (service_name, http_method) / sum(rate(http_server_duration_count{}[5m])) by (service_name, http_method)
+sum(rate(http_server_request_count_total{status_code="Error"}[5m])) by (service_name, http_method) / sum(rate(http_server_request_count_total{}[5m])) by (service_name, http_method)
```

#### Duration
@@ -190,7 +192,7 @@ histogram_quantile(0.99, sum(rate(http_server_duration_bucket{}[5m])) by (le, se

`http.server.duration` records the peer service and current service dimensions. Based on these dimensions, we can aggregate a service topology map.
```
-sum(rate(http_server_duration_count{}[5m])) by (service_name, peer_service)
+sum(rate(http_server_request_count_total{}[5m])) by (service_name, peer_service)
```

### Runtime Metrics
29 changes: 29 additions & 0 deletions testutil/go.mod
@@ -0,0 +1,29 @@
module github.com/hertz-contrib/obs-opentelemetry/testutil

go 1.21

require (
	github.com/prometheus/client_golang v1.17.0
	go.opentelemetry.io/otel v1.20.0
	go.opentelemetry.io/otel/exporters/prometheus v0.43.0
	go.opentelemetry.io/otel/exporters/stdout/stdouttrace v1.20.0
	go.opentelemetry.io/otel/metric v1.20.0
	go.opentelemetry.io/otel/sdk v1.20.0
	go.opentelemetry.io/otel/sdk/metric v1.20.0
)

require (
	github.com/beorn7/perks v1.0.1 // indirect
	github.com/cespare/xxhash/v2 v2.2.0 // indirect
	github.com/davecgh/go-spew v1.1.1 // indirect
	github.com/go-logr/logr v1.3.0 // indirect
	github.com/go-logr/stdr v1.2.2 // indirect
	github.com/golang/protobuf v1.5.3 // indirect
	github.com/matttproud/golang_protobuf_extensions v1.0.4 // indirect
	github.com/prometheus/client_model v0.5.0 // indirect
	github.com/prometheus/common v0.44.0 // indirect
	github.com/prometheus/procfs v0.11.1 // indirect
	go.opentelemetry.io/otel/trace v1.20.0 // indirect
	golang.org/x/sys v0.14.0 // indirect
	google.golang.org/protobuf v1.31.0 // indirect
)
56 changes: 56 additions & 0 deletions testutil/go.sum
@@ -0,0 +1,56 @@
github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM=
github.com/beorn7/perks v1.0.1/go.mod h1:G2ZrVWU2WbWT9wwq4/hrbKbnv/1ERSJQ0ibhJ6rlkpw=
github.com/cespare/xxhash/v2 v2.2.0 h1:DC2CZ1Ep5Y4k3ZQ899DldepgrayRUGE6BBZ/cd9Cj44=
github.com/cespare/xxhash/v2 v2.2.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/go-logr/logr v1.2.2/go.mod h1:jdQByPbusPIv2/zmleS9BjJVeZ6kBagPoEUsqbVz/1A=
github.com/go-logr/logr v1.3.0 h1:2y3SDp0ZXuc6/cjLSZ+Q3ir+QB9T/iG5yYRXqsagWSY=
github.com/go-logr/logr v1.3.0/go.mod h1:9T104GzyrTigFIr8wt5mBrctHMim0Nb2HLGrmQ40KvY=
github.com/go-logr/stdr v1.2.2 h1:hSWxHoqTgW2S2qGc0LTAI563KZ5YKYRhT3MFKZMbjag=
github.com/go-logr/stdr v1.2.2/go.mod h1:mMo/vtBO5dYbehREoey6XUKy/eSumjCCveDpRre4VKE=
github.com/golang/protobuf v1.2.0/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
github.com/golang/protobuf v1.5.0/go.mod h1:FsONVRAS9T7sI+LIUmWTfcYkHO4aIWwzhcaSAoJOfIk=
github.com/golang/protobuf v1.5.3 h1:KhyjKVUg7Usr/dYsdSqoFveMYd5ko72D+zANwlG1mmg=
github.com/golang/protobuf v1.5.3/go.mod h1:XVQd3VNwM+JqD3oG2Ue2ip4fOMUkwXdXDdiuN0vRsmY=
github.com/google/go-cmp v0.5.5/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI=
github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
github.com/matttproud/golang_protobuf_extensions v1.0.4 h1:mmDVorXM7PCGKw94cs5zkfA9PSy5pEvNWRP0ET0TIVo=
github.com/matttproud/golang_protobuf_extensions v1.0.4/go.mod h1:BSXmuO+STAnVfrANrmjBb36TMTDstsz7MSK+HVaYKv4=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/prometheus/client_golang v1.17.0 h1:rl2sfwZMtSthVU752MqfjQozy7blglC+1SOtjMAMh+Q=
github.com/prometheus/client_golang v1.17.0/go.mod h1:VeL+gMmOAxkS2IqfCq0ZmHSL+LjWfWDUmp1mBz9JgUY=
github.com/prometheus/client_model v0.5.0 h1:VQw1hfvPvk3Uv6Qf29VrPF32JB6rtbgI6cYPYQjL0Qw=
github.com/prometheus/client_model v0.5.0/go.mod h1:dTiFglRmd66nLR9Pv9f0mZi7B7fk5Pm3gvsjB5tr+kI=
github.com/prometheus/common v0.44.0 h1:+5BrQJwiBB9xsMygAB3TNvpQKOwlkc25LbISbrdOOfY=
github.com/prometheus/common v0.44.0/go.mod h1:ofAIvZbQ1e/nugmZGz4/qCb9Ap1VoSTIO7x0VV9VvuY=
github.com/prometheus/procfs v0.11.1 h1:xRC8Iq1yyca5ypa9n1EZnWZkt7dwcoRPQwX/5gwaUuI=
github.com/prometheus/procfs v0.11.1/go.mod h1:eesXgaPo1q7lBpVMoMy0ZOFTth9hBn4W/y0/p/ScXhY=
github.com/stretchr/testify v1.8.4 h1:CcVxjf3Q8PM0mHUKJCdn+eZZtm5yQwehR5yeSVQQcUk=
github.com/stretchr/testify v1.8.4/go.mod h1:sz/lmYIOXD/1dqDmKjjqLyZ2RngseejIcXlSw2iwfAo=
go.opentelemetry.io/otel v1.20.0 h1:vsb/ggIY+hUjD/zCAQHpzTmndPqv/ml2ArbsbfBYTAc=
go.opentelemetry.io/otel v1.20.0/go.mod h1:oUIGj3D77RwJdM6PPZImDpSZGDvkD9fhesHny69JFrs=
go.opentelemetry.io/otel/exporters/prometheus v0.43.0 h1:Skkl6akzvdWweXX6LLAY29tyFSO6hWZ26uDbVGTDXe8=
go.opentelemetry.io/otel/exporters/prometheus v0.43.0/go.mod h1:nZStMoc1H/YJpRjSx9IEX4abBMekORTLQcTUT1CgLkg=
go.opentelemetry.io/otel/exporters/stdout/stdouttrace v1.20.0 h1:4s9HxB4azeeQkhY0GE5wZlMj4/pz8tE5gx2OQpGUw58=
go.opentelemetry.io/otel/exporters/stdout/stdouttrace v1.20.0/go.mod h1:djVA3TUJ2fSdMX0JE5XxFBOaZzprElJoP7fD4vnV2SU=
go.opentelemetry.io/otel/metric v1.20.0 h1:ZlrO8Hu9+GAhnepmRGhSU7/VkpjrNowxRN9GyKR4wzA=
go.opentelemetry.io/otel/metric v1.20.0/go.mod h1:90DRw3nfK4D7Sm/75yQ00gTJxtkBxX+wu6YaNymbpVM=
go.opentelemetry.io/otel/sdk v1.20.0 h1:5Jf6imeFZlZtKv9Qbo6qt2ZkmWtdWx/wzcCbNUlAWGM=
go.opentelemetry.io/otel/sdk v1.20.0/go.mod h1:rmkSx1cZCm/tn16iWDn1GQbLtsW/LvsdEEFzCSRM6V0=
go.opentelemetry.io/otel/sdk/metric v1.20.0 h1:5eD40l/H2CqdKmbSV7iht2KMK0faAIL2pVYzJOWobGk=
go.opentelemetry.io/otel/sdk/metric v1.20.0/go.mod h1:AGvpC+YF/jblITiafMTYgvRBUiwi9hZf0EYE2E5XlS8=
go.opentelemetry.io/otel/trace v1.20.0 h1:+yxVAPZPbQhbC3OfAkeIVTky6iTFpcr4SiY9om7mXSQ=
go.opentelemetry.io/otel/trace v1.20.0/go.mod h1:HJSK7F/hA5RlzpZ0zKDCHCDHm556LCDtKaAo6JmBFUU=
golang.org/x/sync v0.0.0-20181221193216-37e7f081c4d4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sys v0.14.0 h1:Vz7Qs629MkJkGyHxUlRHizWJRG2j8fbQKjELVSNhy7Q=
golang.org/x/sys v0.14.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
google.golang.org/protobuf v1.26.0-rc.1/go.mod h1:jlhhOSvTdKEhbULTjvd4ARK9grFBp09yW+WbY/TyQbw=
google.golang.org/protobuf v1.26.0/go.mod h1:9q0QmTI4eRPtz6boOQmLYwt+qCgq0jsYwAQnmE0givc=
google.golang.org/protobuf v1.31.0 h1:g0LDEJHgrBl9N9r17Ru3sqWhkIx2NB67okBHPwC7hs8=
google.golang.org/protobuf v1.31.0/go.mod h1:HV8QOd/L58Z+nl8r43ehVNZIU/HEI6OcFqwMG9pJV4I=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
108 changes: 108 additions & 0 deletions testutil/otel.go
@@ -0,0 +1,108 @@
// Copyright 2022 CloudWeGo Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package testutil

import (
	"os"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/testutil"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/sdk/metric"
	"go.opentelemetry.io/otel/sdk/resource"

	otelprom "go.opentelemetry.io/otel/exporters/prometheus"
	stdout "go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
	otelmetric "go.opentelemetry.io/otel/metric"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
)

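// OtelTestProvider returns a tracer provider, a meter provider, and the
// Prometheus registry that backs the meter provider, for use in tests.
// It panics if either provider cannot be initialized.
func OtelTestProvider() (*sdktrace.TracerProvider, otelmetric.MeterProvider, *prometheus.Registry) {
	// Dedicated registry so tests do not touch the global default registry.
	registry := prometheus.NewRegistry()

	// Tracer provider that exports spans to stdout.
	tracerProvider, err := initTracer()
	if err != nil {
		panic(err)
	}

	// Meter provider that exports metrics into the registry above.
	meterProvider, err := initMeterProvider(registry)
	if err != nil {
		panic(err)
	}

	return tracerProvider, meterProvider, registry
}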

// GatherAndCompare gathers the named metrics from registry and compares them
// with the expected output stored in the file at expectedFilePath.
func GatherAndCompare(registry *prometheus.Registry, expectedFilePath string, metricName ...string) error {
	file, err := os.Open(expectedFilePath)
	if err != nil {
		return err
	}
	defer func() {
		_ = file.Close()
	}()

	return testutil.GatherAndCompare(registry, file, metricName...)
}

// initMeterProvider builds a meter provider whose reader is a Prometheus
// exporter registered with the given registry.
func initMeterProvider(registry *prometheus.Registry) (otelmetric.MeterProvider, error) {
	exporter, err := initMetricExporter(registry)
	if err != nil {
		return nil, err
	}
	provider := metric.NewMeterProvider(metric.WithReader(exporter))
	return provider, nil
}

// initMetricExporter creates a Prometheus exporter bound to registry.
func initMetricExporter(registry *prometheus.Registry) (*otelprom.Exporter, error) {
	return otelprom.New(
		otelprom.WithRegisterer(registry),
	)
}

func initTracer() (*sdktrace.TracerProvider, error) {
	// Create a stdout exporter so the collected spans can be inspected.
	exporter, err := stdout.New(stdout.WithPrettyPrint())
	if err != nil {
		return nil, err
	}

	// For the demonstration, use the sdktrace.AlwaysSample sampler to sample all traces.
	// In a production application, use sdktrace.TraceIDRatioBased with a desired probability.
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithSampler(sdktrace.AlwaysSample()),
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceNameKey.String("test-server"),
			semconv.ServiceNamespaceKey.String("test-ns"),
			semconv.DeploymentEnvironmentKey.String("test-env"),
		)),
	)
	otel.SetTracerProvider(tp)
	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}))
	return tp, nil
}
67 changes: 67 additions & 0 deletions tracing/example_test.go
@@ -0,0 +1,67 @@
// Copyright 2022 CloudWeGo Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package tracing_test

import (
	"context"
	"testing"
	"time"

	"github.com/cloudwego/hertz/pkg/app"
	"github.com/cloudwego/hertz/pkg/app/client"
	"github.com/cloudwego/hertz/pkg/app/server"
	"github.com/cloudwego/hertz/pkg/common/hlog"
	"github.com/cloudwego/hertz/pkg/protocol/consts"
	"github.com/hertz-contrib/obs-opentelemetry/testutil"
	hertztracing "github.com/hertz-contrib/obs-opentelemetry/tracing"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
	"go.opentelemetry.io/otel"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func TestMetricsExample(t *testing.T) {
	// Test providers backed by a dedicated Prometheus registry.
	tracerProvider, meterProvider, registry := testutil.OtelTestProvider()
	defer func(tracerProvider *sdktrace.TracerProvider, ctx context.Context) {
		_ = tracerProvider.Shutdown(ctx)
	}(tracerProvider, context.Background())
	otel.SetMeterProvider(meterProvider)

	// server example
	tracer, cfg := hertztracing.NewServerTracer()
	h := server.Default(tracer, server.WithHostPorts(":39888"))
	h.Use(hertztracing.ServerMiddleware(cfg))
	h.GET("/ping", func(c context.Context, ctx *app.RequestContext) {
		hlog.CtxDebugf(c, "message received successfully")
		ctx.JSON(consts.StatusOK, "pong")
	})
	go h.Spin()

	// Give the server time to start listening before sending a request.
	time.Sleep(500 * time.Millisecond)

	// client example
	c, err := client.NewClient()
	require.NoError(t, err)
	c.Use(hertztracing.ClientMiddleware())
	_, body, err := c.Get(context.Background(), nil, "http://localhost:39888/ping?foo=bar")
	require.NoError(t, err)
	assert.NotNil(t, body)

	// Compare the collected request counters with the expected output.
	assert.NoError(t, testutil.GatherAndCompare(
		registry, "testdata/hertz_request_metrics.txt",
		"http_server_request_count_total", "http_client_request_count_total"),
	)
}