Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPIKE test cspell integration with pre-commit-hooks #42

Closed
wants to merge 4 commits into from

Conversation

turo-mwong
Copy link

@turo-mwong turo-mwong commented Jan 13, 2024

The purpose of this PR is to exemplify the possible solution for adding spell checking to the Turo dev workflow via pre-commit-hooks

The PR is a draft additional to the Automate spell checking in WEB pull requests RFC to get more feedback from the team.

Screenshots:

Below is an example of how the output would look. There is an extra amount of words here, displaying a bunch of false positives such as "Turo". The output should be greatly trimmed upon adding a dictionary.

Screenshot 2024-01-12 at 3 36 33 PM

#!/bin/bash
set -e

git diff --name-only | npx cspell --no-summary --no-progress --no-must-find-files --file-list stdin
Copy link
Author

@turo-mwong turo-mwong Jan 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Think this is a good stopping point for SPIKE. I think there is definitely a way to setup a shared dictionary which we can determine later.

  2. Possible limitation with shared dictionaries: In the current implementation we require a dictionary to be present in the repo with the pre-commit-config.yaml. The reason is because this command is run in target repo work directory (it cannot reference files in @turo/pre-commit-hooks) If we can expose the dictionary some how then it's fine. Otherwise we may want to explore the cspell pre-commit hook.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this makes sense but would it be possible for us to maintain our own cspell config like we do for commitlint for example, and then that repo would hold our dictionary, and we can just extend it in the local cspell.config.yaml file similar to https://github.com/turo/web-front-end-app/blob/main/.commitlintrc.yaml?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OmarJaroudi I don't think that is possible, because the dictionary/configuration needs to be present in a cspell.json file in the project root where the commit is happening, rather than logic embedded in pre-commit-hooks: docs + docs 2.

I think that's why this implementation has everything in the CLI: the only way to "hide away" these details is embedding most of the logic in a script. I think the problem with this approach is that it would require any consumer repos to "extend" this scripting logic with their own dictionaries/configuration, all provided via CLI, and wherever possible I think it's better to have configuration code over in-lined scripting code. Even well-maintained and commented script code is not as intuitive to read as configuration files (.json/.yaml), in my opinion.

I've added an example PR for what maintaining a common dictionary setup would look like in #44, with an example PR in Schumacher here: https://github.com/turo/web-schumacher-app/pull/9234, with the understanding that repos that benefit from spell-checking would need to add a configuration file, cspell.jsonc, to extend from the dictionary committed in pre-commit-hooks.

The reason I think this is an advisable alternative is that it is easily extensible: we can maintain a common dictionary and add repo-specific configuration as needed. These configuration files are easier to read--and track changes--than pure scripting. I think we need to build functionality into the repos because, based on my experience of bringing Schumacher into compliance, there are a lot of false positives (and existing typos 😆), and it's a judgment call as to whether these violations should exist only in the repo, or be moved to a common dictionary.

I am in particular imagining the following scenario: if I'm an engineer with a low opinion about the ROI of this tooling, it's one thing for me to add a new word (or ignore glob) in the repo I'm working in, as an additional part of my commit, when I'm blocked by a violation. It's another if I'm going to have to touch this repo on a per commit basis to resolve violations. I don't think that's sufficiently high DX, and so a non-starter. We want the bar for "the ability to contribute/modify" be pretty low, and I think keeping a plaintext dictionary with example cspell.jsonc suffices in that regard.

cc @turo-mwong

Copy link
Contributor

@OmarJaroudi OmarJaroudi Feb 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at cspell-dicts https://github.com/streetsidesoftware/cspell-dicts/blob/main/dictionaries/ar/README.md, my thought was to try and mimic that - extend/import a shared dictionary that would live in its own repo, and then override this dict as we go in each repo if there are one-off typos that we want to disable for specific repos and not others. If that works I dont think folks would have to touch the shared dictionary frequently (unless they want to introduce a new word that they think shouldnt be a typo across all our codebase). I'll have to look more into this just juggling too many things atm 😅

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the PRs I shared (#44 + https://github.com/turo/web-schumacher-app/pull/9234) approximate what you are describing. We maintain a turo-dictionary in this repo that other repos can extend in their cspell.json project root file. When you have time, a quick sniff-test of the example PRs would help provide some Platform buy-in, at which point we can modify the RFC accordingly and circulate the content in #front-end-quality to get some eyes on it and get the ball rolling.

(I didn't see cspell-dicts though, which is useful: I was only pulling in the dictionaries listed here: https://cspell.org/docs/dictionaries/. Perhaps we can catch more terms with some of the more exhaustive dictionaries. I'll look into it, but the additional benefit at this point is more marginal for each potential dictionary we could add. The integration mostly works.)

Copy link
Contributor

@OmarJaroudi OmarJaroudi Feb 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tinkered around with this some more, how do you feel about https://github.com/turo/web-ui/pull/397. I don't see why we'd still need to go through this repo if we just need to house a shared dict and not an actual custom hook. If that's the case I think a separate repo does make more sense (I expect we'll also probably get this feedback from folks in devops too 😆 ). And I think that we'll probably have an easier time testing in a repo other than schumacher to start 😄 But I think this is generally all very promising!

Copy link

@michaeljaltamirano michaeljaltamirano Feb 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think pointing directly to a .txt file held in a public repo, even with the extra 3-4 lines of setup in cspell.json in usage, is preferable to a dependency listed in package.json. When I see a dependency in package.json I typically expect there to be some source we're executing. In this case it's just sourcing plain text. The only compelling exception to this opinion would be if we didn't want our typo list to be available publicly.

I think the main point of disagreement is around the following (pulled from the PR description in web-ui):

But the point being that the shared dictionary doesn't need to live in our pre-commit-hooks repo (in fact I dont think we'd have to go through that repo at all).

I think holding the dictionary in pre-commit-hooks makes the most sense precisely because it is a dictionary we are pulling into pre-commit tooling. Maintaining a separate repo specifically for a dictionary seems more cumbersome than it needs to be, and an example of repo proliferation in the same way we see Slack channel proliferation. Everyone at Turo--front-end and otherwise--is using (or should be using) pre-commit tooling. The co-location is helpful in understanding its broader appeal in a way that a stand-alone repo is not so obvious, and is also "closer to the source" (the README in the pre-commit repo, in the PR I shared, shows instructions for how to define a cspell.json config file, with the only remaining step being adding the cspell-cli pre-commit hook to the list of hooks the repo is pulling in. That can be added to the README if it would be helpful.)


Ultimately, I asked for feedback and got it 😅. I think a next step would be updating the RFC to reflect these two patterns and seeking feedback from the wider team on their preferences.

EDIT: I left a follow-up comment here (https://github.com/turo/web-schumacher-app/pull/9234#discussion_r1491870922), I'm open to the cspell-dicts setup. I am mostly interested in finding consensus on a solution sooner, much more so than the details of the implementation solution 🙏

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the feedback, just to reiterate my slack DM I'm trying to allocate time to re-immerse myself in this RFC once I have sprint work out of the way 🙏 🙏 Been roughly scanning through these comments though.

Comment on lines +100 to +104
- id: cspell
name: "Check for spelling errors with `cspell-cli`"
description: "Find and report spelling errors in staged files"
language: script
entry: hooks/cspell/cspell.sh
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if there's a way to run the cspell pre-commit hook like how cspell suggests https://github.com/streetsidesoftware/cspell-cli/blob/main/.pre-commit-hooks.yaml (instead of running an executable bash script)

My theory is the way above would allow us to store the dictionary or cspell.json in this repo.

In the current implementation I believe it may be difficult to create a shared dictionary. See concerns / comment #2 in cspell.sh

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are correct: I looked into making a cspell id for repos to reference from, but I think we'd largely have to fork/duplicate the setup in https://github.com/streetsidesoftware/cspell-cli in this repo, which seemed like a poor trade-off for avoiding the following lines of logic in the repos:

repos:
  - repo: https://github.com/streetsidesoftware/cspell-cli
    rev: v8.3.0
    hooks:
      - id: cspell

The only benefit I could see, beyond convenience, is maintaining a single version of cspell behind on our versioned releases. The overhead for that benefit seems dubious. 🤷

@turo-mwong
Copy link
Author

@OmarJaroudi @michaeljaltamirano

Tagged you both in case theres any more significant feedback that may need to be applied before adding it into the RFC.

Copy link

@michaeljaltamirano michaeljaltamirano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@turo-mwong reviewed this yesterday, leaving a comment so my github integration marks this as reviewed, let me know what you think about the other implementation PRs.

@turo-mwong turo-mwong closed this Feb 27, 2024
@michaeljaltamirano michaeljaltamirano deleted the spike/testing-cspell-integration branch April 14, 2024 19:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants