Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Druid CI tests #5184

Merged
merged 36 commits into from
Aug 23, 2024
Merged

Druid CI tests #5184

merged 36 commits into from
Aug 23, 2024

Conversation

egor-ryashin
Copy link
Contributor

No description provided.

Comment on lines 24 to 31
// non-retryable HTTP errors
tooManyRedirects = regexp.MustCompile(`stopped after \d+ redirects\z`)
invalidProtocol = regexp.MustCompile(`unsupported protocol scheme`)
TLSCert = regexp.MustCompile(`certificate is not trusted`)
// retryable Druid errors
coordinatorDown = regexp.MustCompile("A leader node could not be found for")
brokerDown = regexp.MustCompile("There are no available brokers")
noObject = regexp.MustCompile("Object '.*' not found")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider prefixing with errPattern or something like that

Comment on lines 58 to 65
type coordinatorHTTPCheck struct {
c *sqlConnection
}

var _ retrier.AdditionalTest = &coordinatorHTTPCheck{}

// if Coordinator is down for a transient reason it's not a hard failure
func (chc *coordinatorHTTPCheck) IsHardFailure(ctx context.Context) (bool, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Move to the bottom of the file (as per the style guide, types should be declared in rough call order)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coordinatorHTTPCheck is used right below

Comment on lines 79 to 89
req.Header.Add("Content-Type", "application/json")
resp, err := c.client.Do(req)
resp, err := chc.c.client.Do(req)
if err != nil {
return nil, err
return false, err
}

switch resp.StatusCode {
case http.StatusTooManyRequests:
return false, nil
case http.StatusUnauthorized, http.StatusForbidden:
return true, fmt.Errorf("Unauthorized request")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These cases will return without closing the body

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

resp.Body.Close()
})
req, err := http.NewRequestWithContext(ctx, http.MethodPost, c.dsn, bodyReader)
req, err := http.NewRequestWithContext(ctx, http.MethodPost, chc.c.dsn, bodyReader)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to extract the actual request and error handling into a single generic function and use structs for the various request/response types? It might also help ensure consistent error handling everywhere. Like:

req := &queryRequest{...}
resp := &segmentsResult{}
err := chc.c.request(ctx, req, resp)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's do that in a separate PR

Comment on lines 173 to 178
switch resp.StatusCode {
case http.StatusTooManyRequests:
return nil, retrier.Retry, fmt.Errorf("Too many requests")
case http.StatusUnauthorized, http.StatusForbidden:
return nil, retrier.Fail, fmt.Errorf("Unauthorized request")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The response body is not closed in these error conditions. Why not use defer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added defer in the other request. Here the body should go to a handler that will parse it outside of the function so that we can support drivers.Rows.Next() semantics.

Comment on lines 212 to 216
transformers := make([]func(any) (any, error), len(columns))
for i, c := range types {
transformers[i] = identityTransformer
switch c {
case "TINYINT":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This switch statement is pretty long/deep. Could it be pulled into a separate function?

Comment on lines 198 to 205
func (i informationSchema) Lookup(ctx context.Context, db, schema, name string) (*drivers.Table, error) {
q := `
// ensure Coordinator is ready
q := "SELECT * FROM sys.segments LIMIT 1"
rows, err := i.c.db.QueryxContext(ctx, q, name)
if err != nil {
return nil, err
}
rows.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should avoid running a separate query to check for readiness on every single query. Also, isn't this already handled in the retrier in the QueryContext?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only on Lookup query and it's negligible compared to the whole amount of other requests.
The issues is that the request

		SELECT
			T.TABLE_SCHEMA AS SCHEMA,
			T.TABLE_NAME AS NAME,
			T.TABLE_TYPE AS TABLE_TYPE, 
			C.COLUMN_NAME AS COLUMN_NAME,
			C.DATA_TYPE AS COLUMN_TYPE,
			C.IS_NULLABLE = 'YES' AS IS_NULLABLE
		FROM INFORMATION_SCHEMA.TABLES T 
		JOIN INFORMATION_SCHEMA.COLUMNS C ON T.TABLE_SCHEMA = C.TABLE_SCHEMA AND T.TABLE_NAME = C.TABLE_NAME
		WHERE T.TABLE_SCHEMA = 'druid' AND T.TABLE_NAME = ?
		ORDER BY SCHEMA, NAME, TABLE_TYPE, C.ORDINAL_POSITION

returns false-negative if the Coordinator is being restarted. Retrier is a more abstract class and it doesn't check if SQL tries to retreive dynamic schema and the will be no error from Druid Router (because if dynamic schema is empty - it's considered OK by the cluster).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a description.

Comment on lines 2265 to 2267
if os.Getenv("METRICS_CREDS") == "" {
t.Skip("skipping the test without the test instance")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should avoid hard-coding this env var in many places through the tests. Consider checking it once in e.g. NewInstanceForDruidProject and calling t.Skip if it's not set there.

Comment on lines 237 to 243
var err error
creds := os.Getenv("METRICS_CREDS")
address := os.Getenv("METRICS_ADDRESS")
data, err := base64.StdEncoding.DecodeString(creds)
if err != nil {
return nil, "", err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we replace this with a single env var RILL_RUNTIME_DRUID_TEST_DSN? And save it to our shared .env file so everyone on the team is able to run it (see rill devtool dotenv -h)

Comment on lines 58 to 62
type coordinatorHTTPCheck struct {
c *sqlConnection
}

var _ retrier.AdditionalTest = &coordinatorHTTPCheck{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to later in the file – style guide says to sort functions in call order with exported functionality first (i.e. QueryContext first)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't complain when CI tests were run. Shouldn't we skip it - it's looks good only from one side but it contradicts the rule - place dependent parts closer together to avoid scrolling back and forth.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a lint rule, but described in the style guide here: https://github.com/uber-go/guide/blob/master/style.md#function-grouping-and-ordering.

I think it's a good rule to sort declarations by call order – it doesn't feel intuitive to look at a Druid driver and the first 100+ lines of code are internal network retrying logic.

Copy link
Contributor Author

@egor-ryashin egor-ryashin Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But otherwise you'll see the struct you don't know and have to scroll to figure out what it is.

Copy link
Contributor Author

@egor-ryashin egor-ryashin Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moreover I believe if the auto-formatter cannot render a style rule by itself then it's a maintenance burden - it consumes your focus away from thinking about logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

Comment on lines 165 to 168
resp, err := c.client.Do(req)
if err != nil {
// nolint:errorlint // there's no wrapping
if v, ok := err.(*url.Error); ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know how the library will change in the future. Better to use errors.As which is created for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can I get the message then?

Copy link
Contributor

@begelundmuller begelundmuller Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that what errors.As helps with? Maybe you're confusing errors.Is and errors.As?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean error.As returns a bool but then after that I have to get the error message.
I agree that the library implementation can change but it will be incompatible changes for all clients (which is rare and a bad practice by the library). But anyway, if they change they can wrap it double time and this code fails again. So I should Unwrap recursively - which looks weird and clumsy for error handling.
So for example what can happen if the error is wrapped double time:

type Er1 struct {
}

func (*Er1) Error() string {
	return "er1"
}

type wrapError struct {
	msg string
	err error
}

func (w *wrapError) Error() string {
	return w.msg
}

func (w *wrapError) Unwrap() error {
	return w.err
}

func ret() error {
	var e error
	e = &Er1{}
	e = &wrapError{
		msg: "extra info",
		err: e,
	}
	e = &wrapError{
		msg: "superextra info",
		err: e,
	}
	return e
}

var _ error = &Er1{}

func TestError(t *testing.T) {
	e := ret()
	e0 := &Er1{}
	require.True(t, errors.As(e, &e0))
	require.Equal(t, "er1", errors.Unwrap(e).Error())
}
        	            	expected: "er1"
        	            	actual  : "extra info"

Copy link
Contributor

@begelundmuller begelundmuller Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errors.As doesn't just return a bool, it also assigns the error – it handles the recursive unwrapping automatically. Check the docs, I'm pretty sure it's all you need here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

// nolint:errorlint // there's no wrapping
if v, ok := err.(*url.Error); ok {
if errTooManyRedirects.MatchString(v.Error()) {
resp.Body.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may panic – accessing the body when the request failed.

It's a problem multiple places, I would serialize consider if this can be refactored to use only defer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. But defer won't work - see next comment.

Copy link
Contributor

@begelundmuller begelundmuller Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, got it. In that case, it's worth considering if things can be re-organized in such a way that resp.Body.Close needs to be manually called in as few cases as possible (to reduce the chance of developer errors, as well as the risk of a buffer leak if a panic occurs)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's think about it in another PR.

Comment on lines 240 to 245
var err error
b64 := os.Getenv("RILL_RUNTIME_DRUID_TEST_DSN")
dsn, err := base64.StdEncoding.DecodeString(b64)
if err != nil {
return nil, "", err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does it need to be base64 encoded? We have other env variables containing DSNs that are not.

Copy link
Contributor Author

@egor-ryashin egor-ryashin Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It contains password in format https://username:password@hostname...
Bash has special symbols that the password can contain, to reduce the mental burden remembering which ones I just coded it into base64.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought bash/env variables can contain any ASCII string? Just need to escape it or use single quotes to enter it.

Given that all other environment variables in the project do not use base64, I think we should avoid it here as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try it.

runtime/testruntime/testruntime.go Show resolved Hide resolved
Copy link
Contributor

@begelundmuller begelundmuller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some follow-up comments on various previous review comments. Please take a look as some of them require some more changes.

re := retrier.NewRetrier(6, 2*time.Second, &coordinatorHTTPCheck{
c: c,
})
return re.RunCtx(ctx, func(ctx2 context.Context) (driver.Rows, retrier.Action, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it on purpose that most calls use ctx instead of ctx2? What is the implication of using one vs. the other?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

return l, nil
}

return nil, retrier.Retry, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to retry when we don't even get a response back? For example, might just be the DSN that's broken or a local firewall, it's not great to have 126 seconds of retrying just to discover that.

Should we maybe opt for just retrying on the retryable HTTP status codes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

runtime/testruntime/testruntime.go Show resolved Hide resolved
@@ -224,11 +230,20 @@ func NewInstanceForProject(t TestingT, name string) (*runtime.Runtime, string) {
return rt, inst.ID
}

func NewInstanceForDruidProject(t TestingT) (*runtime.Runtime, string) {
func NewInstanceForDruidProject(t *testing.T) (*runtime.Runtime, string, error) {
_, err := os.Stat(".env")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe Go tests always evaluate relative to the test file, not the repo root. So you probably need to do something similar to this to get the path of the root .env file:

_, currentFile, _, _ := goruntime.Caller(0)
projectPath := filepath.Join(currentFile, "..", "testdata", name)

Please test and confirm that the approach works

func NewInstanceForDruidProject(t TestingT) (*runtime.Runtime, string) {
func NewInstanceForDruidProject(t *testing.T) (*runtime.Runtime, string, error) {
_, err := os.Stat(".env")
if !errors.Is(err, fs.ErrNotExist) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be other errors than file not existing – it would probably be better to do if err == nil, which is the only way to know that the file actually exists

func NewInstanceForDruidProject(t *testing.T) (*runtime.Runtime, string, error) {
_, err := os.Stat(".env")
if !errors.Is(err, fs.ErrNotExist) {
require.NoError(t, godotenv.Load())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to the comment above – you may need to pass the repo root path here for this to work

@egor-ryashin
Copy link
Contributor Author

Done.

func NewInstanceForDruidProject(t TestingT) (*runtime.Runtime, string) {
func NewInstanceForDruidProject(t *testing.T) (*runtime.Runtime, string, error) {
_, currentFile, _, _ := goruntime.Caller(0)
envPath := filepath.Join(currentFile, "..", "..", ".env")
Copy link
Contributor

@begelundmuller begelundmuller Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this resolve to <repo>/runtime/.env, not <repo>/.env? Can you try any changes to confirm?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@begelundmuller begelundmuller merged commit dc46b12 into main Aug 23, 2024
4 checks passed
@begelundmuller begelundmuller deleted the druid-ci-tests branch August 23, 2024 15:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants