
Pipeline integration testing #83

Open
wants to merge 30 commits into base: main

Conversation

zlochina
Collaborator

Created simple integration tests based on an example .osm file.

zlochina requested a review from F-I-D-O on September 20, 2024, 10:42
@zlochina
Collaborator Author

@F-I-D-O, I've got an issue with the integration tests which I'm not really sure how to solve. The integration tests work fine if the tables are clear (or at least do not influence the test data), but I'm afraid these tests shouldn't be run if the user has already uploaded some data (the tests could modify the original data).

@F-I-D-O
Member

F-I-D-O commented Sep 20, 2024

> @F-I-D-O, I've got an issue with the integration tests which I'm not really sure how to solve. The integration tests work fine if the tables are clear (or at least do not influence the test data), but I'm afraid these tests shouldn't be run if the user has already uploaded some data (the tests could modify the original data).

The test should be run in a test schema, just like the SQL unit tests. The schema should be initialized empty before any test.
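
A minimal sketch of what such a per-test schema reset could look like as a pytest fixture; the connection parameters, database name, and the test_env schema name are illustrative assumptions, not the project's actual configuration:

```python
# Illustrative sketch only: connection settings and schema name are assumptions.
import psycopg2
import pytest

TEST_SCHEMA = "test_env"

@pytest.fixture
def empty_test_schema():
    """Provide an empty test schema and drop it again after the test."""
    conn = psycopg2.connect(dbname="roadgraphtool_test")
    with conn, conn.cursor() as cur:
        cur.execute(f'DROP SCHEMA IF EXISTS "{TEST_SCHEMA}" CASCADE')
        cur.execute(f'CREATE SCHEMA "{TEST_SCHEMA}"')
    yield TEST_SCHEMA
    with conn, conn.cursor() as cur:
        cur.execute(f'DROP SCHEMA IF EXISTS "{TEST_SCHEMA}" CASCADE')
    conn.close()
```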

@F-I-D-O
Member

F-I-D-O commented Sep 20, 2024

Additionally, the integration tests should contain tests that exercise the whole pipeline at once.

@zlochina
Collaborator Author

> @F-I-D-O, I've got an issue with the integration tests which I'm not really sure how to solve. The integration tests work fine if the tables are clear (or at least do not influence the test data), but I'm afraid these tests shouldn't be run if the user has already uploaded some data (the tests could modify the original data).
>
> The test should be run in a test schema, just like the SQL unit tests. The schema should be initialized empty before any test.

UPD: resolved

@zlochina
Collaborator Author

> Additionally, the integration tests should contain tests that exercise the whole pipeline at once.

They already do: the test covers everything from setting up the tables to contraction and computing strongly connected components.

@zlochina
Collaborator Author

zlochina commented Oct 9, 2024

@F-I-D-O, I would like your review of the export assertion. I've done it in such a way that I computed the resulting files' MD5 hashes, and they are asserted against the newly generated files at each execution of the test.

  • Also, I should explain how the database is initialised. First, the tables are added to the default schema public (or not, depending on whether the table areas is already present); then the Postgres function test_env_constructor is called to copy the tables into the test_env schema. Only after this are the test OSM data uploaded into test_env and the integration testing begins. The reason I'm mentioning this is that I ran into a problem when cleaning up after the test: if the default schema had no tables, we need to delete the tables created during setup, but if the tables were already there, we must not. I think the solution is to initialise the database directly in the test_env schema, but for that we would need to rewrite scripts/install_sql.py so that it can initialise a chosen schema.
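
For reference, the hash-based assertion described above could look roughly like the sketch below; the file names and expected hashes are placeholders, not values from the actual test:

```python
# Illustrative sketch: file names and expected hashes are placeholders.
import hashlib
from pathlib import Path

EXPECTED_MD5 = {
    "nodes.csv": "d41d8cd98f00b204e9800998ecf8427e",
    "edges.csv": "d41d8cd98f00b204e9800998ecf8427e",
}

def assert_exported_files_match(export_dir: Path) -> None:
    """Compare the MD5 hash of each freshly exported file with the stored hash."""
    for name, expected in EXPECTED_MD5.items():
        digest = hashlib.md5((export_dir / name).read_bytes()).hexdigest()
        assert digest == expected, f"{name}: MD5 mismatch"
```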

@zlochina
Collaborator Author

zlochina commented Oct 9, 2024

@F-I-D-O, also, while researching the export flow I ran into something that looks suspicious to me. In order to export the data we need to use some kind of config, something like this:

---
area_dir: ../../../../../
area_id: 1
map:
  SRID: 4326
  SRID_plane: 32618
  path: dasdsada

In the integration test I'm creating a plain dictionary variable for the export, but it does not seem right to me.
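
For context, such an ad-hoc dictionary presumably mirrors the YAML structure above, along these lines (the values are taken from the snippet; the keys expected by the export code may differ):

```python
# Illustrative sketch mirroring the YAML snippet above.
export_config = {
    "area_dir": "../../../../../",
    "area_id": 1,
    "map": {
        "SRID": 4326,
        "SRID_plane": 32618,
        "path": "dasdsada",
    },
}
```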

@F-I-D-O
Member

F-I-D-O commented Oct 11, 2024

> @F-I-D-O, I would like your review of the export assertion. I've done it in such a way that I computed the resulting files' MD5 hashes, and they are asserted against the newly generated files at each execution of the test.
>
> • Also, I should explain how the database is initialised. First, the tables are added to the default schema public (or not, depending on whether the table areas is already present); then the Postgres function test_env_constructor is called to copy the tables into the test_env schema. Only after this are the test OSM data uploaded into test_env and the integration testing begins. The reason I'm mentioning this is that I ran into a problem when cleaning up after the test: if the default schema had no tables, we need to delete the tables created during setup, but if the tables were already there, we must not. I think the solution is to initialise the database directly in the test_env schema, but for that we would need to rewrite scripts/install_sql.py so that it can initialise a chosen schema.

Hashes are not a good solution:

  • for binary files, the format could change in the future, and then the tests would need to be updated;
  • for text files, it is useful to compare the file content and see the diff.

I suggest keeping the hashes for binary files for now and comparing text files as text.
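
A sketch of that split, with text exports compared line by line (so a failure shows a readable diff) and binary exports still checked via MD5; the helper names are made up for illustration:

```python
# Illustrative helpers; not part of the actual test code.
import difflib
import hashlib
from pathlib import Path

def assert_text_file_equal(actual: Path, reference: Path) -> None:
    """Compare text exports as text so a failing test shows the diff."""
    diff = list(difflib.unified_diff(
        reference.read_text().splitlines(keepends=True),
        actual.read_text().splitlines(keepends=True),
        fromfile=str(reference), tofile=str(actual),
    ))
    assert not diff, "".join(diff)

def assert_binary_file_equal(actual: Path, expected_md5: str) -> None:
    """Keep MD5 comparison for binary exports."""
    assert hashlib.md5(actual.read_bytes()).hexdigest() == expected_md5
```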

Concerning the second issue, I think it would be best to modify install_sql.py so that the schema can be configured. There is already an issue for that.
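
One possible shape of that change, sketched purely as an illustration (the real scripts/install_sql.py interface and internals may differ): accept the target schema as an option and set search_path before running the installation SQL.

```python
# Illustrative sketch only; not the actual scripts/install_sql.py.
import argparse
import psycopg2

def install_sql(schema: str = "public") -> None:
    conn = psycopg2.connect(dbname="roadgraphtool")
    with conn, conn.cursor() as cur:
        cur.execute(f'CREATE SCHEMA IF NOT EXISTS "{schema}"')
        cur.execute(f'SET search_path TO "{schema}"')
        # ... execute the SQL installation scripts here ...
    conn.close()

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--schema", default="public",
                        help="target schema for the installed tables")
    install_sql(parser.parse_args().schema)
```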

@F-I-D-O
Member

F-I-D-O commented Oct 11, 2024

> @F-I-D-O, also, while researching the export flow I ran into something that looks suspicious to me. In order to export the data we need to use some kind of config, something like this:
>
> ---
> area_dir: ../../../../../
> area_id: 1
> map:
>   SRID: 4326
>   SRID_plane: 32618
>   path: dasdsada
>
> In the integration test I'm creating a plain dictionary variable for the export, but it does not seem right to me.

I think we should check whether we need all of these config options, and if so, we should use a config file like this (I think we will set up some config file with Marek...).
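
If a shared config file is adopted, the test could load it instead of building the dictionary by hand; a minimal sketch, assuming PyYAML and a hypothetical config path:

```python
# Illustrative sketch: "config/export.yaml" is a hypothetical path.
import yaml

with open("config/export.yaml") as f:
    export_config = yaml.safe_load(f)

# export_config is then a nested dict, e.g. export_config["map"]["SRID"]
```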

@zlochina
Collaborator Author

As per the request from our last meeting, here is a summary of the updates made by this pull request:

  • Integration test. Added the file python/tests/integrations_test.py to contain all integration tests for the project. Currently there is one test, test_integration_base_flow, which tests the following flow:

    1. Setting up.
      1. If there are no tables in the public schema (checked only by the existence of the areas table), the pre_processing function is called, which is the main function imported from the scripts.install_sql module.
      2. Once the previous step has added the needed tables, the Postgres procedure test_env_constructor() is called (defined in the SQL/testing_extension.sql file). This procedure copies every table and sequence from public to test_env.
      3. OSM data is imported into TEST_SCHEMA, which is test_env.
      4. The area corresponding to the OSM file is inserted.
      5. OSM data are imported into a Python dictionary.
    2. Assertions.
      1. Assertions of correct import of OSM data.
        1. Assertion of nodes, comparing local nodes to nodes from TEST_SCHEMA.
        2. Assertion of ways, ...
        3. Assertion of nodes_ways, ...
        4. Assertion of area, ...
      2. Test of contract_graph_in_area
        1. Assertion of contracted nodes against hardcoded expected result
        2. Assertion of edges against hardcoded expected result
      3. Test of compute_strong_components
        1. Assertion of component data against hardcoded expected result
      4. Test of export to files.
        1. All exported files are checked by MD5-hashing each of them and comparing against the expected hashes.
    3. Cleaning up.
      1. The fixture cleanup_for_base_flow() is called when the test function has finished execution. It:
        1. removes all files created by the export procedure,
        2. calls the Postgres procedure test_env_destructor() to remove the test_env schema with all testing data.
    4. Space for improvement:
      • Update the check of table existence: either check for every needed table, or come up with another way of setting up the tables in the test_env schema.
      • Both test_env_constructor and test_env_destructor could accept an optional argument with the name of the testing schema. We could pass the Python module variable TEST_SCHEMA.
      • The current implementation of the exported-file assertions is rather arguable. As discussed in this PR, it would be better to compare the content of non-binary files rather than their hashes.
  • Other files. Besides formatting the Python files, the following non-trivial changes were made:

    1. roadgraphtool/db.py.
      • Function execute_sql_and_fetch_all_rows() now works with a dynamic schema (see the sketch after this list).
    2. Python wrappers of Postgres procedures were moved from scripts/main.py to roadgraphtool/db_operations.py.
    3. roadgraphtool/insert_area.py.
      • Function insert_area() now works with a dynamic schema.
    4. roadgraphtool/map.py.
      • Both functions _get_map_from_db() and get_map() now work with a dynamic schema.
    5. scripts/process_osm.py.
      • Resolved a bug in setup_ssh_tunnel() described in the issue.
    6. SQL/main.sql.
      • Attributes public.nodes.id, public.ways.id, public.ways."from", and public.ways."to" had their data type changed from bigint to integer.
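
On the "dynamic schema" changes above: a common way to parameterise the schema safely in psycopg2 is to compose the identifier with psycopg2.sql instead of plain string formatting. The sketch below is illustrative only; the helper name and calling convention are assumptions, not the actual roadgraphtool/db.py code:

```python
# Illustrative sketch; not the actual roadgraphtool/db.py implementation.
from psycopg2 import sql

def fetch_all_rows_from_schema(conn, query_template: str, schema: str = "public"):
    """Run a query against the given schema and return all rows.

    query_template contains a {schema} placeholder, e.g.
    "SELECT id FROM {schema}.nodes"; the schema name is inserted as a
    properly quoted identifier rather than via plain string formatting.
    """
    query = sql.SQL(query_template).format(schema=sql.Identifier(schema))
    with conn.cursor() as cur:
        cur.execute(query)
        return cur.fetchall()

# Usage (illustrative):
# rows = fetch_all_rows_from_schema(conn, "SELECT id FROM {schema}.nodes", schema="test_env")
```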
