Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce TestContainer interface to streamline test setup of supported databases #2053

Merged
merged 37 commits into from
Dec 24, 2024

Conversation

sanyamsinghal
Copy link
Collaborator

@sanyamsinghal sanyamsinghal commented Dec 10, 2024

Describe the changes in this pull request

  • Any package in voyager code, for eg srcdb, tgtdb, connpool, yugabyted etc.. can spin up the required source db(oracle, pg, mysql) with a specific version to write unit tests.

  • Every call to TestContainer.Start() check if a required database(dbtype + version) is already run then reuse it, otherwise start a new one

  • In every package add a TestMain() function, to setup the environment needed specifically for that test.

  • Added a new Github actions workflow to run the integration tests separately(required some oracle instance client libraries).
    Moving forward it will be good to have separation between go tests as unit or integration.

Challenges/Limitations:

  1. Currently we are not calling TestContainer.TerminateAllContainers(), right now all the containers gets terminated at the end, but we should also call that. Challenge was i couldn't find the right point in code where i can call that.

  2. In Golang, for each packages separate new process is spawned to run its tests, hence the containerRegistry is not shared across all the packages due to separate processes, hence with the current approach it is not possible to reuse the same containers across all packages but across tests in a single package.

  • One solution is maintain the containerRegistry info in some json file(act as a registry) which can be reused by all processes

Describe if there are any user-facing changes

NA

How was this pull request tested?

  • All the existing unit tests are working as expected.

Does your PR have changes that can cause upgrade issues?

Component Breaking changes?
MetaDB No
Name registry json No
Data File Descriptor Json No
Export Snapshot Status Json No
Import Data State No
Export Status Json No
Data .sql files of tables No
Export and import data queue No
Schema Dump No
AssessmentDB No
Sizing DB No
Migration Assessment Report Json No
Callhome Json No
YugabyteD Tables No
TargetDB Metadata Tables No

@sanyamsinghal sanyamsinghal self-assigned this Dec 10, 2024
Copy link
Collaborator

@makalaaneesh makalaaneesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! left a few comments

if err != nil {
utils.ErrExit("Failed to connect to mysql database: %w", err)
}
defer testMySQLSource.DB().Disconnect()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since you're only connecting to test that the connections are working, you don't really need the "defer", you can disconnect at this point itself.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as i understand, the Connect() function also initialises the sql.DB in each case, which later on provides the connection used by functions like GetAllTableNamesRaw. Running Disconnect() would terminate that sql connection pool?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good point. How is it working now then for each test that uses the test containers?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is it working now then for each test that uses the test containers?

didn't get you exactly, i think all the tests(srcdb or tgtdb packages) i have added is following this.
Are there any existing tests which don't follow this but still works?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that TestMain runs before any other test in the package.
And if TestMain has a defer to disconnect the DB, then how is it that in TestPostgresGetAllTableNames, you are able to use that container to execute SQLs? It should have been disconnected by then, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@makalaaneesh in TestMain we have to call m.Run() which executes all the tests of that package.
So here defer will be invoked after all the tests for that package has run.
like you can i added terminateAllContainers after that.`

I will add one comment in the code for this.


// Test GetAllTableNames
actualTables := testMySQLSource.DB().GetAllTableNames()
expectedTables := []*sqlname.SourceName{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this test-case in isolation, it's confusing where these tables come from. I would be in favour of NOT having a default schema that is assumed with testcontainer startup. Don't really see a point of it.
I think it would be better for each testcase to be self-contained. clearly set up it's schema objects at the beginning of the test, run its tests, delete/drop the schema objects.

Another reason to avoid a single common schema is that over time, we will end up writing all the DDLs for our tests into that schema file itself, which beats the point of having self-container "unit" tests.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That init schema file can be used for various purposes like setting up something in source db, maybe creating user with specific privileges or anything. So i think we should have make use of it since it good to have the cluster/db precreated with those.

Regarding the schema/objects to be setup for each test separately.
I was thinking to do that but it was increasing the work significantly.
Maybe we can do that in separate PR or this PR would have to wait...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! ExecuteSqls is nice, and using it with defer for cleanup is very useful

yb-voyager/test/containers/testcontainers.go Show resolved Hide resolved
@CLAassistant
Copy link

CLAassistant commented Dec 16, 2024

CLA assistant check
All committers have signed the CLA.

@sanyamsinghal sanyamsinghal force-pushed the sanyam/integ-tests branch 2 times, most recently from fb5079b to 50ee8b2 Compare December 18, 2024 16:39
@sanyamsinghal sanyamsinghal marked this pull request as ready for review December 18, 2024 16:55
Sanyam Singhal and others added 19 commits December 20, 2024 18:42
…ke sequences and pk constraint objects

- fixed to return only table names
…kage

- testcontainers interface and its implementations follows singleton pattern
- implemented container registry to keep track of that and also added TerminateAllContainers() function using the registry
- Using go:embed for storing the content of init schema files, and using the variable to make Start() load init file irrespective of wher the function is called from, in the project
This reverts commit 605e3a6.
Copy link
Collaborator

@makalaaneesh makalaaneesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

.github/workflows/integration-tests.yml Show resolved Hide resolved
@@ -940,6 +940,7 @@ var PG_QUERY_TO_CHECK_IF_TABLE_HAS_PK = `SELECT nspname AS schema_name, relname
FROM pg_class c
LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_constraint con ON con.conrelid = c.oid AND con.contype = 'p'
WHERE c.relkind = 'r' -- Only consider ordinary tables
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to do this, we should report partitioned tables not having pk as well right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to have p and r both. Previously it was also considering objects like sequences since they also don't have PK on them.

@sanyamsinghal sanyamsinghal merged commit 43aaeeb into main Dec 24, 2024
43 checks passed
@sanyamsinghal sanyamsinghal deleted the sanyam/integ-tests branch December 24, 2024 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants