-
Notifications
You must be signed in to change notification settings - Fork 126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Druid CI tests #5184
Druid CI tests #5184
Conversation
// non-retryable HTTP errors | ||
tooManyRedirects = regexp.MustCompile(`stopped after \d+ redirects\z`) | ||
invalidProtocol = regexp.MustCompile(`unsupported protocol scheme`) | ||
TLSCert = regexp.MustCompile(`certificate is not trusted`) | ||
// retryable Druid errors | ||
coordinatorDown = regexp.MustCompile("A leader node could not be found for") | ||
brokerDown = regexp.MustCompile("There are no available brokers") | ||
noObject = regexp.MustCompile("Object '.*' not found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider prefixing with errPattern
or something like that
type coordinatorHTTPCheck struct { | ||
c *sqlConnection | ||
} | ||
|
||
var _ retrier.AdditionalTest = &coordinatorHTTPCheck{} | ||
|
||
// if Coordinator is down for a transient reason it's not a hard failure | ||
func (chc *coordinatorHTTPCheck) IsHardFailure(ctx context.Context) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Move to the bottom of the file (as per the style guide, types should be declared in rough call order)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
coordinatorHTTPCheck
is used right below
req.Header.Add("Content-Type", "application/json") | ||
resp, err := c.client.Do(req) | ||
resp, err := chc.c.client.Do(req) | ||
if err != nil { | ||
return nil, err | ||
return false, err | ||
} | ||
|
||
switch resp.StatusCode { | ||
case http.StatusTooManyRequests: | ||
return false, nil | ||
case http.StatusUnauthorized, http.StatusForbidden: | ||
return true, fmt.Errorf("Unauthorized request") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These cases will return without closing the body
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks
resp.Body.Close() | ||
}) | ||
req, err := http.NewRequestWithContext(ctx, http.MethodPost, c.dsn, bodyReader) | ||
req, err := http.NewRequestWithContext(ctx, http.MethodPost, chc.c.dsn, bodyReader) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to extract the actual request and error handling into a single generic function and use structs for the various request/response types? It might also help ensure consistent error handling everywhere. Like:
req := &queryRequest{...}
resp := &segmentsResult{}
err := chc.c.request(ctx, req, resp)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's do that in a separate PR
switch resp.StatusCode { | ||
case http.StatusTooManyRequests: | ||
return nil, retrier.Retry, fmt.Errorf("Too many requests") | ||
case http.StatusUnauthorized, http.StatusForbidden: | ||
return nil, retrier.Fail, fmt.Errorf("Unauthorized request") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The response body is not closed in these error conditions. Why not use defer
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added defer
in the other request. Here the body should go to a handler that will parse it outside of the function so that we can support drivers.Rows.Next()
semantics.
transformers := make([]func(any) (any, error), len(columns)) | ||
for i, c := range types { | ||
transformers[i] = identityTransformer | ||
switch c { | ||
case "TINYINT": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This switch statement is pretty long/deep. Could it be pulled into a separate function?
runtime/drivers/druid/olap.go
Outdated
func (i informationSchema) Lookup(ctx context.Context, db, schema, name string) (*drivers.Table, error) { | ||
q := ` | ||
// ensure Coordinator is ready | ||
q := "SELECT * FROM sys.segments LIMIT 1" | ||
rows, err := i.c.db.QueryxContext(ctx, q, name) | ||
if err != nil { | ||
return nil, err | ||
} | ||
rows.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should avoid running a separate query to check for readiness on every single query. Also, isn't this already handled in the retrier in the QueryContext
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only on Lookup query and it's negligible compared to the whole amount of other requests.
The issues is that the request
SELECT
T.TABLE_SCHEMA AS SCHEMA,
T.TABLE_NAME AS NAME,
T.TABLE_TYPE AS TABLE_TYPE,
C.COLUMN_NAME AS COLUMN_NAME,
C.DATA_TYPE AS COLUMN_TYPE,
C.IS_NULLABLE = 'YES' AS IS_NULLABLE
FROM INFORMATION_SCHEMA.TABLES T
JOIN INFORMATION_SCHEMA.COLUMNS C ON T.TABLE_SCHEMA = C.TABLE_SCHEMA AND T.TABLE_NAME = C.TABLE_NAME
WHERE T.TABLE_SCHEMA = 'druid' AND T.TABLE_NAME = ?
ORDER BY SCHEMA, NAME, TABLE_TYPE, C.ORDINAL_POSITION
returns false-negative if the Coordinator is being restarted. Retrier is a more abstract class and it doesn't check if SQL tries to retreive dynamic schema and the will be no error from Druid Router (because if dynamic schema is empty - it's considered OK by the cluster).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a description.
if os.Getenv("METRICS_CREDS") == "" { | ||
t.Skip("skipping the test without the test instance") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should avoid hard-coding this env var in many places through the tests. Consider checking it once in e.g. NewInstanceForDruidProject
and calling t.Skip
if it's not set there.
runtime/testruntime/testruntime.go
Outdated
var err error | ||
creds := os.Getenv("METRICS_CREDS") | ||
address := os.Getenv("METRICS_ADDRESS") | ||
data, err := base64.StdEncoding.DecodeString(creds) | ||
if err != nil { | ||
return nil, "", err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we replace this with a single env var RILL_RUNTIME_DRUID_TEST_DSN
? And save it to our shared .env
file so everyone on the team is able to run it (see rill devtool dotenv -h
)
type coordinatorHTTPCheck struct { | ||
c *sqlConnection | ||
} | ||
|
||
var _ retrier.AdditionalTest = &coordinatorHTTPCheck{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move to later in the file – style guide says to sort functions in call order with exported functionality first (i.e. QueryContext
first)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It didn't complain when CI tests were run. Shouldn't we skip it - it's looks good only from one side but it contradicts the rule - place dependent parts closer together to avoid scrolling back and forth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a lint rule, but described in the style guide here: https://github.com/uber-go/guide/blob/master/style.md#function-grouping-and-ordering.
I think it's a good rule to sort declarations by call order – it doesn't feel intuitive to look at a Druid driver and the first 100+ lines of code are internal network retrying logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But otherwise you'll see the struct you don't know and have to scroll to figure out what it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moreover I believe if the auto-formatter cannot render a style rule by itself then it's a maintenance burden - it consumes your focus away from thinking about logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
resp, err := c.client.Do(req) | ||
if err != nil { | ||
// nolint:errorlint // there's no wrapping | ||
if v, ok := err.(*url.Error); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't know how the library will change in the future. Better to use errors.As
which is created for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can I get the message then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't that what errors.As
helps with? Maybe you're confusing errors.Is
and errors.As
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean error.As
returns a bool but then after that I have to get the error message.
I agree that the library implementation can change but it will be incompatible changes for all clients (which is rare and a bad practice by the library). But anyway, if they change they can wrap it double time and this code fails again. So I should Unwrap recursively - which looks weird and clumsy for error handling.
So for example what can happen if the error is wrapped double time:
type Er1 struct {
}
func (*Er1) Error() string {
return "er1"
}
type wrapError struct {
msg string
err error
}
func (w *wrapError) Error() string {
return w.msg
}
func (w *wrapError) Unwrap() error {
return w.err
}
func ret() error {
var e error
e = &Er1{}
e = &wrapError{
msg: "extra info",
err: e,
}
e = &wrapError{
msg: "superextra info",
err: e,
}
return e
}
var _ error = &Er1{}
func TestError(t *testing.T) {
e := ret()
e0 := &Er1{}
require.True(t, errors.As(e, &e0))
require.Equal(t, "er1", errors.Unwrap(e).Error())
}
expected: "er1"
actual : "extra info"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
errors.As
doesn't just return a bool, it also assigns the error – it handles the recursive unwrapping automatically. Check the docs, I'm pretty sure it's all you need here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
// nolint:errorlint // there's no wrapping | ||
if v, ok := err.(*url.Error); ok { | ||
if errTooManyRedirects.MatchString(v.Error()) { | ||
resp.Body.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may panic – accessing the body when the request failed.
It's a problem multiple places, I would serialize consider if this can be refactored to use only defer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. But defer
won't work - see next comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, got it. In that case, it's worth considering if things can be re-organized in such a way that resp.Body.Close
needs to be manually called in as few cases as possible (to reduce the chance of developer errors, as well as the risk of a buffer leak if a panic
occurs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's think about it in another PR.
runtime/testruntime/testruntime.go
Outdated
var err error | ||
b64 := os.Getenv("RILL_RUNTIME_DRUID_TEST_DSN") | ||
dsn, err := base64.StdEncoding.DecodeString(b64) | ||
if err != nil { | ||
return nil, "", err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does it need to be base64 encoded? We have other env variables containing DSNs that are not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It contains password in format https://username:password@hostname...
Bash has special symbols that the password can contain, to reduce the mental burden remembering which ones I just coded it into base64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought bash/env variables can contain any ASCII string? Just need to escape it or use single quotes to enter it.
Given that all other environment variables in the project do not use base64, I think we should avoid it here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some follow-up comments on various previous review comments. Please take a look as some of them require some more changes.
re := retrier.NewRetrier(6, 2*time.Second, &coordinatorHTTPCheck{ | ||
c: c, | ||
}) | ||
return re.RunCtx(ctx, func(ctx2 context.Context) (driver.Rows, retrier.Action, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it on purpose that most calls use ctx
instead of ctx2
? What is the implication of using one vs. the other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
return l, nil | ||
} | ||
|
||
return nil, retrier.Retry, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to retry when we don't even get a response back? For example, might just be the DSN that's broken or a local firewall, it's not great to have 126 seconds of retrying just to discover that.
Should we maybe opt for just retrying on the retryable HTTP status codes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense.
runtime/testruntime/testruntime.go
Outdated
@@ -224,11 +230,20 @@ func NewInstanceForProject(t TestingT, name string) (*runtime.Runtime, string) { | |||
return rt, inst.ID | |||
} | |||
|
|||
func NewInstanceForDruidProject(t TestingT) (*runtime.Runtime, string) { | |||
func NewInstanceForDruidProject(t *testing.T) (*runtime.Runtime, string, error) { | |||
_, err := os.Stat(".env") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe Go tests always evaluate relative to the test file, not the repo root. So you probably need to do something similar to this to get the path of the root .env
file:
rill/runtime/testruntime/testruntime.go
Lines 181 to 182 in d66a28f
_, currentFile, _, _ := goruntime.Caller(0) | |
projectPath := filepath.Join(currentFile, "..", "testdata", name) |
Please test and confirm that the approach works
runtime/testruntime/testruntime.go
Outdated
func NewInstanceForDruidProject(t TestingT) (*runtime.Runtime, string) { | ||
func NewInstanceForDruidProject(t *testing.T) (*runtime.Runtime, string, error) { | ||
_, err := os.Stat(".env") | ||
if !errors.Is(err, fs.ErrNotExist) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There can be other errors than file not existing – it would probably be better to do if err == nil
, which is the only way to know that the file actually exists
runtime/testruntime/testruntime.go
Outdated
func NewInstanceForDruidProject(t *testing.T) (*runtime.Runtime, string, error) { | ||
_, err := os.Stat(".env") | ||
if !errors.Is(err, fs.ErrNotExist) { | ||
require.NoError(t, godotenv.Load()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to the comment above – you may need to pass the repo root path here for this to work
Done. |
runtime/testruntime/testruntime.go
Outdated
func NewInstanceForDruidProject(t TestingT) (*runtime.Runtime, string) { | ||
func NewInstanceForDruidProject(t *testing.T) (*runtime.Runtime, string, error) { | ||
_, currentFile, _, _ := goruntime.Caller(0) | ||
envPath := filepath.Join(currentFile, "..", "..", ".env") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this resolve to <repo>/runtime/.env
, not <repo>/.env
? Can you try any changes to confirm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
No description provided.