Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test init_pipeline outside of the test process and drop the (partial) database if the initialisation fails #165

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

muffato
Copy link
Contributor

@muffato muffato commented Jun 19, 2020

Use case

  1. init_pipeline was the last script tested in the test process itself. A long time ago, I thought it would simplify testing, but I gradually realised it's causing more problems because it is not isolated from the test process. All test functions have since been moved to calling the script with system and init_pipeline was the last one to work the old way.
    One problem is that it loads all the Runnables and registers the pipeline in TheApiary. And since the main script was not tested, some tests were explicitly not using the test function in order to test the script.

  2. The other thing I'm addressing here is the ability to expect script failures and to test the error message.

  3. If there is say a syntax error in a Runnable, init_pipeline.pl rightly detects it:

    $ init_pipeline.pl Bio::EnsEMBL::Compara::PipeConfig::DumpMultiAlign_conf $(mysql-ens-compara-prod-5-ensadmin details hive) -compara_db $(mysql-ens-compara-prod-9-ensadmin details url muffato_salmonids_epo_101) -division vertebrates -mlss_id 1894 -split_by_chromosome 0 -rel_suffix b
    > Parsing the command-line options.
    > Running the creation commands.
    > Parsing the PipeConfig file and adding objects (this may take a while).
    The runnable module 'Bio::EnsEMBL::Compara::RunnableDB::DumpMultiAlign::MLSSJobFactory' cannot be loaded or compiled:
    Global symbol "$mlss_id" requires explicit package name (did you forget to declare "my $mlss_id"?) at /homes/muffato/workspace/src/rel101/ensembl-compara/modules/Bio/EnsEMBL/Compara/RunnableDB/DumpMultiAlign/MLSSJobFactory.pm line 98.
    Global symbol "$mlss_id" requires explicit package name (did you forget to declare "my $mlss_id"?) at /homes/muffato/workspace/src/rel101/ensembl-compara/modules/Bio/EnsEMBL/Compara/RunnableDB/DumpMultiAlign/MLSSJobFactory.pm line 104.
    Compilation failed in require at (eval 41) line 2.
    

    However, this leaves the database in a partial state and I need to add -hive_force_init 1 once I have fixed the Runnable.

Description

1/2. I have reimplemented the test init_pipeline function to use _test_ehive_script, which now has an extra parameter named $flags with test options. At this time the only option is expect_failure which is either 1 or the regular expression to match the script's standard error against.
3. I made init_pipeline.pl drop the database if the initialisation is not complete, which makes it quicker for the user.

Possible Drawbacks

Perhaps some users manipulate TheApiary in tests after calling init_pipeline. They would have to add a line of code to load the pipeline from the database.

Testing

Have you added/modified unit tests to test the changes?

Yes

If so, do the tests pass/fail?

Yes

Have you run the entire test suite and no regression was detected?

Yes

@muffato muffato force-pushed the feature/init_pipeline_test branch 2 times, most recently from 298e451 to 6ce82b2 Compare October 7, 2020 01:36
Copy link
Contributor

@ens-bwalts ens-bwalts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has been open so long that peekJob got added after it was submitted. Perhaps update the documentation here to indicate that flags can be passed through to _test_ehive_script as well

@muffato
Copy link
Contributor Author

muffato commented Nov 27, 2020

Done. I have rebased the branch and added the doc for peekJob and tweak_pipeline

init_pipeline was the last script tested this way. That caused some
issues because it is not isolated from the test process, and for
instance it will register the pipeline in TheApiary. Also, the test is
more reliable if the actual script is tested.
@codecov
Copy link

codecov bot commented Nov 27, 2020

Codecov Report

Merging #165 (c1bee2a) into master (d165d11) will decrease coverage by 0.01%.
The diff coverage is 79.24%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #165      +/-   ##
==========================================
- Coverage   82.79%   82.78%   -0.02%     
==========================================
  Files         184      184              
  Lines       10557    10564       +7     
  Branches     1660     1668       +8     
==========================================
+ Hits         8741     8745       +4     
+ Misses       1126     1122       -4     
- Partials      690      697       +7     
Impacted Files Coverage Δ
modules/Bio/EnsEMBL/Hive/Scripts/InitPipeline.pm 65.85% <22.22%> (-12.94%) ⬇️
modules/Bio/EnsEMBL/Hive/Utils/Test.pm 80.71% <86.36%> (-0.35%) ⬇️
t/10.pipeconfig/analysis_heir.t 96.15% <90.90%> (-3.85%) ⬇️
t/02.api/create_drop_database.t 88.00% <100.00%> (ø)
t/02.api/fetch_and_count_by_multiple_columns.t 100.00% <100.00%> (ø)
t/03.scripts/beekeeper_big_red_button.t 100.00% <100.00%> (ø)
t/03.scripts/beekeeper_opts.t 100.00% <100.00%> (ø)
t/03.scripts/generate_graph.t 82.50% <100.00%> (-1.23%) ⬇️
t/10.pipeconfig/gc_dataflow.t 100.00% <100.00%> (ø)
t/10.pipeconfig/pipeline_url.t 100.00% <100.00%> (ø)
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d165d11...c1bee2a. Read the comment docs.

@muffato muffato changed the title Test init_pipeline outside of the test process and added an option to test failures Test init_pipeline outside of the test process and drop the (partial) database if the initialisation fails Nov 29, 2020
@muffato
Copy link
Contributor Author

muffato commented Nov 29, 2020

I have added there a few commits to deal with dropping the database when init_pipeline fails

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants