SPIKE test `cspell` integration with pre-commit-hooks #42

turo-mwong · 2024-01-13T00:44:08Z

The purpose of this PR is to exemplify the possible solution for adding spell checking to the Turo dev workflow via pre-commit-hooks

The PR is a draft additional to the Automate spell checking in WEB pull requests RFC to get more feedback from the team.

Screenshots:

Below is an example of how the output would look. There is an extra amount of words here, displaying a bunch of false positives such as "Turo". The output should be greatly trimmed upon adding a dictionary.

turo-mwong · 2024-01-13T00:55:04Z

hooks/cspell/cspell.sh

+#!/bin/bash
+set -e
+
+git diff --name-only | npx cspell --no-summary --no-progress --no-must-find-files --file-list stdin


Think this is a good stopping point for SPIKE. I think there is definitely a way to setup a shared dictionary which we can determine later.

Possible limitation with shared dictionaries: In the current implementation we require a dictionary to be present in the repo with the pre-commit-config.yaml. The reason is because this command is run in target repo work directory (it cannot reference files in @turo/pre-commit-hooks) If we can expose the dictionary some how then it's fine. Otherwise we may want to explore the cspell pre-commit hook.

Not sure if this makes sense but would it be possible for us to maintain our own cspell config like we do for commitlint for example, and then that repo would hold our dictionary, and we can just extend it in the local cspell.config.yaml file similar to https://github.com/turo/web-front-end-app/blob/main/.commitlintrc.yaml?

@OmarJaroudi I don't think that is possible, because the dictionary/configuration needs to be present in a cspell.json file in the project root where the commit is happening, rather than logic embedded in pre-commit-hooks: docs + docs 2.

I think that's why this implementation has everything in the CLI: the only way to "hide away" these details is embedding most of the logic in a script. I think the problem with this approach is that it would require any consumer repos to "extend" this scripting logic with their own dictionaries/configuration, all provided via CLI, and wherever possible I think it's better to have configuration code over in-lined scripting code. Even well-maintained and commented script code is not as intuitive to read as configuration files (.json/.yaml), in my opinion.

I've added an example PR for what maintaining a common dictionary setup would look like in #44, with an example PR in Schumacher here: https://github.com/turo/web-schumacher-app/pull/9234, with the understanding that repos that benefit from spell-checking would need to add a configuration file, cspell.jsonc, to extend from the dictionary committed in pre-commit-hooks.

The reason I think this is an advisable alternative is that it is easily extensible: we can maintain a common dictionary and add repo-specific configuration as needed. These configuration files are easier to read--and track changes--than pure scripting. I think we need to build functionality into the repos because, based on my experience of bringing Schumacher into compliance, there are a lot of false positives (and existing typos 😆), and it's a judgment call as to whether these violations should exist only in the repo, or be moved to a common dictionary.

I am in particular imagining the following scenario: if I'm an engineer with a low opinion about the ROI of this tooling, it's one thing for me to add a new word (or ignore glob) in the repo I'm working in, as an additional part of my commit, when I'm blocked by a violation. It's another if I'm going to have to touch this repo on a per commit basis to resolve violations. I don't think that's sufficiently high DX, and so a non-starter. We want the bar for "the ability to contribute/modify" be pretty low, and I think keeping a plaintext dictionary with example cspell.jsonc suffices in that regard.

cc @turo-mwong

Looking at cspell-dicts https://github.com/streetsidesoftware/cspell-dicts/blob/main/dictionaries/ar/README.md, my thought was to try and mimic that - extend/import a shared dictionary that would live in its own repo, and then override this dict as we go in each repo if there are one-off typos that we want to disable for specific repos and not others. If that works I dont think folks would have to touch the shared dictionary frequently (unless they want to introduce a new word that they think shouldnt be a typo across all our codebase). I'll have to look more into this just juggling too many things atm 😅

I think the PRs I shared (#44 + https://github.com/turo/web-schumacher-app/pull/9234) approximate what you are describing. We maintain a turo-dictionary in this repo that other repos can extend in their cspell.json project root file. When you have time, a quick sniff-test of the example PRs would help provide some Platform buy-in, at which point we can modify the RFC accordingly and circulate the content in #front-end-quality to get some eyes on it and get the ball rolling.

(I didn't see cspell-dicts though, which is useful: I was only pulling in the dictionaries listed here: https://cspell.org/docs/dictionaries/. Perhaps we can catch more terms with some of the more exhaustive dictionaries. I'll look into it, but the additional benefit at this point is more marginal for each potential dictionary we could add. The integration mostly works.)

Just tinkered around with this some more, how do you feel about https://github.com/turo/web-ui/pull/397. I don't see why we'd still need to go through this repo if we just need to house a shared dict and not an actual custom hook. If that's the case I think a separate repo does make more sense (I expect we'll also probably get this feedback from folks in devops too 😆 ). And I think that we'll probably have an easier time testing in a repo other than schumacher to start 😄 But I think this is generally all very promising!

I think pointing directly to a .txt file held in a public repo, even with the extra 3-4 lines of setup in cspell.json in usage, is preferable to a dependency listed in package.json. When I see a dependency in package.json I typically expect there to be some source we're executing. In this case it's just sourcing plain text. The only compelling exception to this opinion would be if we didn't want our typo list to be available publicly.

I think the main point of disagreement is around the following (pulled from the PR description in web-ui):

But the point being that the shared dictionary doesn't need to live in our pre-commit-hooks repo (in fact I dont think we'd have to go through that repo at all).

I think holding the dictionary in pre-commit-hooks makes the most sense precisely because it is a dictionary we are pulling into pre-commit tooling. Maintaining a separate repo specifically for a dictionary seems more cumbersome than it needs to be, and an example of repo proliferation in the same way we see Slack channel proliferation. Everyone at Turo--front-end and otherwise--is using (or should be using) pre-commit tooling. The co-location is helpful in understanding its broader appeal in a way that a stand-alone repo is not so obvious, and is also "closer to the source" (the README in the pre-commit repo, in the PR I shared, shows instructions for how to define a cspell.json config file, with the only remaining step being adding the cspell-cli pre-commit hook to the list of hooks the repo is pulling in. That can be added to the README if it would be helpful.)

Ultimately, I asked for feedback and got it 😅. I think a next step would be updating the RFC to reflect these two patterns and seeking feedback from the wider team on their preferences.

EDIT: I left a follow-up comment here (https://github.com/turo/web-schumacher-app/pull/9234#discussion_r1491870922), I'm open to the cspell-dicts setup. I am mostly interested in finding consensus on a solution sooner, much more so than the details of the implementation solution 🙏

Thanks for all the feedback, just to reiterate my slack DM I'm trying to allocate time to re-immerse myself in this RFC once I have sprint work out of the way 🙏 🙏 Been roughly scanning through these comments though.

turo-mwong · 2024-01-13T00:55:18Z

.pre-commit-hooks.yaml

+- id: cspell
+  name: "Check for spelling errors with `cspell-cli`"
+  description: "Find and report spelling errors in staged files"
+  language: script
+  entry: hooks/cspell/cspell.sh


I'm not sure if there's a way to run the cspell pre-commit hook like how cspell suggests https://github.com/streetsidesoftware/cspell-cli/blob/main/.pre-commit-hooks.yaml (instead of running an executable bash script)

My theory is the way above would allow us to store the dictionary or cspell.json in this repo.

In the current implementation I believe it may be difficult to create a shared dictionary. See concerns / comment #2 in cspell.sh

I think you are correct: I looked into making a cspell id for repos to reference from, but I think we'd largely have to fork/duplicate the setup in https://github.com/streetsidesoftware/cspell-cli in this repo, which seemed like a poor trade-off for avoiding the following lines of logic in the repos:

repos: - repo: https://github.com/streetsidesoftware/cspell-cli rev: v8.3.0 hooks: - id: cspell

The only benefit I could see, beyond convenience, is maintaining a single version of cspell behind on our versioned releases. The overhead for that benefit seems dubious. 🤷

turo-mwong · 2024-01-13T04:05:39Z

@OmarJaroudi @michaeljaltamirano

Tagged you both in case theres any more significant feedback that may need to be applied before adding it into the RFC.

michaeljaltamirano

@turo-mwong reviewed this yesterday, leaving a comment so my github integration marks this as reviewed, let me know what you think about the other implementation PRs.

turo-mwong added 4 commits January 12, 2024 15:15

helloworld

40bf516

helloworld

d299c52

helloworld

11f450f

feat: add spell checking executable

796cc35

turo-mwong commented Jan 13, 2024

View reviewed changes

turo-mwong requested review from michaeljaltamirano and OmarJaroudi January 13, 2024 00:55

michaeljaltamirano reviewed Feb 14, 2024

View reviewed changes

turo-mwong closed this Feb 27, 2024

michaeljaltamirano deleted the spike/testing-cspell-integration branch April 14, 2024 19:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SPIKE test `cspell` integration with pre-commit-hooks #42

SPIKE test `cspell` integration with pre-commit-hooks #42

turo-mwong commented Jan 13, 2024 •

edited

Loading

turo-mwong Jan 13, 2024 •

edited

Loading

OmarJaroudi Jan 17, 2024

michaeljaltamirano Feb 13, 2024

OmarJaroudi Feb 15, 2024 •

edited

Loading

michaeljaltamirano Feb 15, 2024

OmarJaroudi Feb 15, 2024 •

edited

Loading

michaeljaltamirano Feb 15, 2024 •

edited

Loading

turo-mwong Feb 15, 2024

turo-mwong Jan 13, 2024

michaeljaltamirano Feb 13, 2024

turo-mwong commented Jan 13, 2024

michaeljaltamirano left a comment

SPIKE test cspell integration with pre-commit-hooks #42

SPIKE test cspell integration with pre-commit-hooks #42

Conversation

turo-mwong commented Jan 13, 2024 • edited Loading

Screenshots:

turo-mwong Jan 13, 2024 • edited Loading

Choose a reason for hiding this comment

OmarJaroudi Jan 17, 2024

Choose a reason for hiding this comment

michaeljaltamirano Feb 13, 2024

Choose a reason for hiding this comment

OmarJaroudi Feb 15, 2024 • edited Loading

Choose a reason for hiding this comment

michaeljaltamirano Feb 15, 2024

Choose a reason for hiding this comment

OmarJaroudi Feb 15, 2024 • edited Loading

Choose a reason for hiding this comment

michaeljaltamirano Feb 15, 2024 • edited Loading

Choose a reason for hiding this comment

turo-mwong Feb 15, 2024

Choose a reason for hiding this comment

turo-mwong Jan 13, 2024

Choose a reason for hiding this comment

michaeljaltamirano Feb 13, 2024

Choose a reason for hiding this comment

turo-mwong commented Jan 13, 2024

michaeljaltamirano left a comment

Choose a reason for hiding this comment

SPIKE test `cspell` integration with pre-commit-hooks #42

SPIKE test `cspell` integration with pre-commit-hooks #42

turo-mwong commented Jan 13, 2024 •

edited

Loading

turo-mwong Jan 13, 2024 •

edited

Loading

OmarJaroudi Feb 15, 2024 •

edited

Loading

OmarJaroudi Feb 15, 2024 •

edited

Loading

michaeljaltamirano Feb 15, 2024 •

edited

Loading