diff --git a/docs/source/dev/contributing.rst b/docs/source/dev/contributing.rst index 8e30f84..cd29310 100644 --- a/docs/source/dev/contributing.rst +++ b/docs/source/dev/contributing.rst @@ -42,7 +42,7 @@ Here are a sequence of steps to help with your code contribution: 3. Install the package by running ``poetry install`` at the command line. 4. Verify that all tests pass on your system by running ``poetry run pytest`` at the command line. In case of failures, conduct a thorough investigation. If you require assistance in diagnosing the issue, follow the guidelines for filing :ref:`bug-reports`. 5. Construct test cases that effectively illustrate the bug or feature. -6. Implement your changes, including any relevant documentation updates following our :ref:`documentation-contributions` fguidelines. +6. Implement your changes, including any relevant documentation updates following our :ref:`documentation-contributions` guidelines. 7. Re-run the complete test suite to ensure the success of all tests. 8. Format and analyze your code according to our :ref:`code-format-and-analysis` guidelines. 9. Ensure the docs build following the :ref:`documentation-contributions` guidelines. @@ -115,6 +115,6 @@ If you are proposing a feature, please use the `Feature request`_ issue template Commit Messages --------------- -Commit messages are incredibly valuable for understanding our project's code. When crafting your commit message, please provide context about the changes being made and the reasons behind the chosen implementation. +Commit messages are incredibly valuable for understanding our project's code. When crafting your commit message, please provide context about the changes being made and the reasons behind them. To ensure readability, we recommend to keep the commit message header under 52 characters and the body within 72 characters. 
\ No newline at end of file diff --git a/docs/source/dev/design.rst b/docs/source/dev/design.rst index c689e9f..1443bb1 100644 --- a/docs/source/dev/design.rst +++ b/docs/source/dev/design.rst @@ -27,9 +27,10 @@ Non-functional design requirements we considered: * The application should be intuitive for novice users. * Meets performance requirements for batch conversion or real-time updates of metadata records in data repositories. +* Test each conversion strategy against a consistent set of pre-defined criteria. -Strategy Pattern ----------------- +Architecture +------------ The system architecture implements the `Strategy Pattern`_, a behavioral design pattern that allows us to define a set of algorithms for converting metadata, encapsulate each one, and make them interchangeable. This pattern enables the client code to choose an algorithm or strategy at runtime without needing to know the details of each algorithm's implementation. This flexibility applies not only to metadata conversion but also to a test interface implementing a consistent set of checks. @@ -48,7 +49,7 @@ Implementation Overview: With this pattern, new support for metadata standards or versions can be easily added as strategy modules without modifying the client code or test suite. -Users typically define workflows that iterate over a series of metadata files. For each file, along with its corresponding strategy and any unmappable properties expressed as `kwargs`, users invoke `main.convert`, which then returns a SOSO record. +Users typically define workflows that iterate over a series of metadata files. For each file, along with its corresponding strategy and any unmappable properties expressed as `kwargs`, users invoke `convert`, which then returns a SOSO record. .. 
image:: sequence_diagram.png :alt: Strategy Pattern @@ -62,15 +63,17 @@ We utilize the `Simple Standard for Sharing Ontological Mappings`_ (SSSOM) for s We apply SSSOM following `SSSOM guidelines`_, with some nuanced additions tailored to our project's needs. One such addition is the inclusion of a `subject_category` column, which aids in grouping and improving the readability of highly nested `subject_id` values. Additionally, we've formatted `subject_id` values using an arbitrary hierarchical path-like expression, enhancing clarity for the reader in understanding which property is being referenced. Note, while this path is human-readable, it is not machine-actionable. -Beyond these general differences, each metadata standard's mapping may have unique nuances that should be considered. These are documented in each metadata standard's SSSOM .yml file, located in the `src/soso/data/` directory. +Beyond these general differences, each metadata standard's mapping may have unique nuances that should be considered. These are documented in each metadata standard's SSSOM.yml file, located in the `src/soso/data/` directory. -Creating or updating a metadata standard's SSSOM files involves subjectively mapping properties. To mitigate subjectivity, we've established a set of mapping guidelines (see below). Additionally, we recommend having a second set of eyes review any mapping work to identify potential biases or misunderstandings. The original mapping creator is listed in the SSSOM and can serve as a helpful reference for clarification. +Creating or updating a metadata standard's SSSOM files involves subjectively mapping properties. To mitigate subjectivity, we've established a set of :ref:`predicate-mapping-guidelines`. Additionally, we recommend having a second set of eyes review any mapping work to identify potential biases or misunderstandings. The original mapping creator is listed in the SSSOM and can serve as a helpful reference for clarification. 
Before committing any changes to SSSOM files, it's a good practice to thoroughly review them to ensure unintended alterations haven't been made to other parts of the SSSOM files. Given the file's extensive information and nuanced formatting, careful attention to detail is important. -.. _Simple Standard for Sharing Ontological Mappings: https://mapping-commons.github.io/sssom/about/ +.. _Simple Standard for Sharing Ontological Mappings: https://mapping-commons.github.io/sssom/ .. _SSSOM guidelines: https://mapping-commons.github.io/sssom/mapping-predicates/ +.. _predicate-mapping-guidelines: + Predicate Mapping Guidelines ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -113,7 +116,7 @@ This section outlines the conditions for implementing a mapping in code. Our goa Testing ------- -The test suite utilizes the strategy design pattern to implement a standardized set of checks that all strategies must undergo (`tests/test_strategies.py`). It verifies that returned property values (resource types and data types) adhere to SOSO conventions. It ensures that null values (e.g., `""` for strings) or containers (e.g., `[]` for lists) are not returned, thereby reducing the accumulation of detritus in the resultant SOSO record. Additionally, verification tests against snapshots of full SOSO records help check the consistency of inputs and outputs produced by the system (`tests/test_main.py`). +The test suite utilizes the strategy design pattern to implement a standardized set of checks that all strategies must undergo (`tests/test_strategies.py`). It verifies that returned property values (resource types and data types) adhere to SOSO conventions. It ensures that null values (e.g., `""`) or containers (e.g., `[]`) are not returned, thereby reducing the accumulation of detritus in the resultant SOSO record. Additionally, verification tests against snapshots of full SOSO records help check the consistency of inputs and outputs produced by the system (`tests/test_main.py`). 
Setting up tests for a new strategy requires only creating a strategy instance, essentially a metadata record read into the strategy module, and running through each method test in the `test_strategies.py` module. To test negative cases, an empty metadata record is used. This helps ensure that strategy methods correctly handle scenarios where the metadata record lacks content. @@ -122,12 +125,12 @@ Strategy-specific utility functions are tested in their own test suite module na Schema Versioning ----------------- -To ensure compatibility with multiple versions of supported metadata standards, this application employs a schema version handling mechanism. During conversion: +To ensure compatibility with multiple versions of supported metadata standards, `soso` employs a schema version handling mechanism. During conversion: -* The application parses the schema version information directly from the metadata record itself. -* This extracted information is then stored as an attribute within the conversion strategy. -* Conversion methods for individual properties can access this schema version attribute allowing the flow control logic within the conversion process to leverage the schema version. -* Based on the identified version, the logic applies specific processing rules for each property. +* The conversion strategy parses the schema version information directly from the metadata record itself. +* This extracted information is then stored as an attribute within the strategy. +* Conversion methods for individual properties can access this attribute, allowing the flow control logic within the conversion process to leverage the schema version. +* Based on the identified version, the logic applies specific processing rules. This approach ensures that even backward-incompatible changes introduced between schema versions are handled gracefully, maintaining overall conversion success.
@@ -139,7 +142,7 @@ The Strategy Pattern employed in our application enables a high degree of user c * Properties that don’t map from a metadata standard but require external data, such as dataset landing page URLs. * Properties requiring custom processing due to community-specific application of metadata standards. -These cases can be addressed by providing information as `kwargs` to the main.convert function, which overrides properties corresponding to `kwargs` key names, or by modifying existing strategy methods through method overrides. For further details, refer to the user :ref:`quickstart`. +These cases can be addressed by providing information as `kwargs` to the convert function, which overrides properties corresponding to `kwargs` key names, or by modifying existing strategy methods through method overrides. For further details, refer to the user :ref:`quickstart`. Setting Up a New Metadata Conversion Strategy --------------------------------------------- @@ -186,7 +189,7 @@ Steps: 8. **Verification Tests:** - * Add a snapshot of the expected SOSO record generated by `main.convert` to `tests/data/` for verification tests. + * Add a snapshot of the expected SOSO record generated by `convert` to `tests/data/` for verification tests. 9. **Testing:** @@ -208,4 +211,4 @@ Before settling on the Strategy Pattern as the design for this project, we consi The benefits of the JSON-LD Framing approach include ease of extension to other metadata standards through the creation of new crosswalks and simplified maintenance, as modifications are primarily made to the crosswalk file. However, this approach has its downsides. Some metadata standards cannot be serialized to JSON-LD, necessitating additional custom code. Additionally, when dealing with metadata standards with nested properties, framing results in information loss, as framing works best for flat sets of properties. 
-Ultimately, we determined that the potential loss of information during conversion outweighed the benefits of simplified maintenance. Furthermore, it was not evident that JSON-LD Framing offered a less complex solution compared to the Strategy Pattern. +Ultimately, we determined that the potential loss of information during conversion outweighed the benefits of simplified maintenance. diff --git a/docs/source/dev/maintaining.rst b/docs/source/dev/maintaining.rst index 1daa636..d1c9987 100644 --- a/docs/source/dev/maintaining.rst +++ b/docs/source/dev/maintaining.rst @@ -17,7 +17,7 @@ If you are unable to respond fully to a pull request or issue in a timely manner Pull Request Review ------------------- -Pull request review facilitates refinement of a contribution before it's incorporated into the project. The convert goals are to ensure the contribution is consistent with the project's design, is well-documented, and is well-tested. We are not looking for perfection, but rather that the contribution does what it is intended to do. +Pull request review facilitates refinement of a contribution before it's incorporated into the project. The goals are to ensure the contribution is consistent with the project's design, is well-documented, and is well-tested. We are not looking for perfection, but rather that the contribution does what it is intended to do. *Though pull request review is required by the project's GitHub branch protection rules, maintainers are allowed to bypass review. Having said this, we generally encourage review in all cases.* @@ -30,7 +30,7 @@ Here are a steps to help with your pull request review: 5. Check for compliance with our :ref:`commit-message` style. 6. Submit the review. -If collaboration on the pull request is needed, convert a `feature branch` in GitHub, change the base branch of the pull request from `development` to the newly created `feature branch`, and merge. This allows the maintainer to lend a helpful hand. 
+If collaboration on the pull request is needed, create a `feature branch` in GitHub, change the base branch of the pull request from `development` to the newly created `feature branch`, and merge. This allows the maintainer to lend a helpful hand. When a pull request passes review, it is ready to be merged into the `development` branch (see :ref:`merging-features-into-development`). @@ -67,7 +67,7 @@ Do your best to keep the commit message header from exceeding 52 characters in l Branch Management ~~~~~~~~~~~~~~~~~ -The `convert` branch always reflects the current stable release, a `development` branch is used for incorporating new features, and `feature` branches implement changes. The `development` branch is always in a releasable state. +The `main` branch always reflects the current stable release, a `development` branch is used for incorporating new features, and `feature` branches implement changes. The `development` branch is always in a releasable state. .. _feature-branches: @@ -102,21 +102,21 @@ If at this point, part of the feature was forgotten, don't restore the `feature Merging Development into Main ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -When it's time to convert a new release, a project maintainer, with repository write access, will merge the `development` branch into `convert` locally, and then push to the remote, which will then kick-start the automated release workflow (see :ref:`cd-workflow`). This approach to merging, is taken in order to preserve a linear commit history and to retain the Angular styled commit messages required by `Python Semantic Release`_. +When it's time to create a new release, a project maintainer, with repository write access, will merge the `development` branch into `main` locally, and then push to the remote, which will then kick-start the automated release workflow (see :ref:`cd-workflow`). 
This approach to merging is taken to preserve a linear commit history and to retain the Angular-styled commit messages required by `Python Semantic Release`_. -Here's a sequence of steps for merging `development` into `convert` and creating a new release: +Here's a sequence of steps for merging `development` into `main` and creating a new release: -1. Open a pull request from the `development` branch to `convert`. +1. Open a pull request from the `development` branch to `main`. 2. Check that the :ref:`ci-workflow` and other requirements pass. 3. Get a pull request review from another maintainer (if possible). 4. *Do not merge in GitHub!* Instead follow these steps: - i. Pull the remote `development` and `convert` branches into your local repository. - ii. Merge the `development` branch into `convert`. - iii. Push your local `convert` branch to the remote. + i. Pull the remote `development` and `main` branches into your local repository. + ii. Merge the `development` branch into `main`. + iii. Push your local `main` branch to the remote. 5. Ensure both the :ref:`ci-workflow` and :ref:`cd-workflow` complete successfully. 6. Ensure the docs build and deploy successfully on `readthedocs.io`_. 7. Check the pull request has been merged and closed out. -8. Pull the remote `convert` and `development` branches back into your local repository. This will keep your local branches in sync with the remote, which the semantic release made modifications to during the release process. +8. Pull the remote `main` and `development` branches back into your local repository. This will keep your local branches in sync with the remote, which the semantic release made modifications to during the release process. ..
_readthedocs.io: https://soso.readthedocs.io/en/latest/ @@ -125,12 +125,12 @@ Here's a sequence of steps for merging `development` into `convert` and creating Hot Fixes ^^^^^^^^^ -Hotfixes should always be implemented in a `feature branch`, which is merged into `development`, and then merged into `convert` using the approaches outlined above. Implementing a hotfix in `convert` and merging into `development` will convert problems in the commit history. +Hotfixes should always be implemented in a `feature branch`, which is merged into `development`, and then merged into `main` using the approaches outlined above. Implementing a hotfix in `main` and merging into `development` will create problems in the commit history. Branch Protection Rules ~~~~~~~~~~~~~~~~~~~~~~~ -GitHub branch protection rules are used to help ensure the integrity of the codebase. The following rules are enforced on the `development` and `convert` branches: +GitHub branch protection rules are used to help ensure the integrity of the codebase. The following rules are enforced on the `development` and `main` branches: * Require a pull request approval before merging * Require status checks to pass before merging @@ -143,7 +143,7 @@ GitHub branch protection rules are used to help ensure the integrity of the code Secrets ~~~~~~~ -A GitHub repository secret, containing the personal access token of one of the maintainers with write access, is required for the :ref:`cd-workflow` to complete. This token should be added to the project's repository secrets with the name ``RELEASE_TOKEN``. This authentication is used by `Python Semantic Release`_ to commit changes created during the release proces to the `convert` branch, which are then merged into the `development` branch. This latter step ensures the two branches remain synchronized. +A GitHub repository secret, containing the personal access token of one of the maintainers with write access, is required for the :ref:`cd-workflow` to complete. 
This token should be added to the project's repository secrets with the name ``RELEASE_TOKEN``. This authentication is used by `Python Semantic Release`_ to commit changes created during the release process to the `main` branch, which are then merged into the `development` branch. This latter step ensures the two branches remain synchronized. Workflows ~~~~~~~~~ @@ -155,7 +155,7 @@ GitHub Actions are used for continuous integration and delivery. CI Workflow ^^^^^^^^^^^ -The CI workflow is run on each pull request and push to the `development` and `convert` branches. It performs the following steps: +The CI workflow is run on each pull request and push to the `development` and `main` branches. It performs the following steps: 1. Formats code in *src/* and *tests/* using `Black`_. This check is strictly enforced and will fail the workflow. 2. Analyzes code in *src/* and *tests/* using our project's `Pylint`_ configuration (see :ref:`code-format-and-analysis`). This check is not strictly enforced and will not fail the workflow. However, generally, Pylint recommendations should be followed. @@ -171,17 +171,17 @@ The CI workflow is run on each pull request and push to the `development` and `c CD Workflow ^^^^^^^^^^^ -The CD workflow is run on push to the `convert` branch for releases. It performs the following steps: +The CD workflow is run on push to the `main` branch for releases. It performs the following steps: 1. Runs `Python Semantic Release`_ to build the changelog, convert the distributions, bump the version number, and tag the release. -2. Merges changes in the `convert` branch back into `development` to keep the branches synchronized. +2. Merges changes in the `main` branch back into `development` to keep the branches synchronized. ..
_developing-features-as-a-maintainer: Developing Features as a Maintainer ----------------------------------- -As a maintainer, when developing a new feature, you don't have to fork the project repository to your personal GitHub, and submit pull requests via that route. Rather, you may convert a `feature branch` in the project's remote repository, and submit a pull request to `development` from there. +As a maintainer, when developing a new feature, you don't have to fork the project repository to your personal GitHub, and submit pull requests via that route. Rather, you may create a `feature branch` in the project's remote repository, and submit a pull request to `development` from there. Dependency and Environment Management ------------------------------------- diff --git a/docs/source/user/quickstart.rst b/docs/source/user/quickstart.rst index 6809575..c84ae3e 100644 --- a/docs/source/user/quickstart.rst +++ b/docs/source/user/quickstart.rst @@ -29,7 +29,7 @@ The primary function is to convert metadata records into SOSO markup. To perform >>> r '{"@context": {"@vocab": "https://schema.org/", "prov": "http://www. ...}' -For a list of available strategies, please refer to the documentation of the `main.convert` function. +For a list of available strategies, please refer to the documentation of the `convert` function. Adding Unmappable Properties @@ -165,14 +165,14 @@ The `soso` package is designed to be both flexible and extensible. By following return r -If you have any questions or need help, please don't hesitate to reach out to us. +If you have any questions or need help, please don't hesitate to reach out. Notes ----- **Adding Vocabularies** -The `main.convert` function only recognizes vocabularies that are specified within its implementation. You can view the source code for more details on these vocabularies. 
If you add additional vocabularies to a SOSO graph using property overwrites and method overrides, these vocabularies will have to be defined within an embedded context. +The `convert` function only recognizes vocabularies that are specified within its implementation. You can view the source code for more details on these vocabularies. If you add additional vocabularies to a SOSO graph using property overwrites and method overrides, these vocabularies will have to be defined within an embedded context. **Leverage Partial Property Method Implementations**