Regex caret (^) not working properly for scope: raw when there are over 999 characters in a Markdown file. #869

Open
1 task done
michael-nok opened this issue Jul 10, 2024 · 1 comment

michael-nok commented Jul 10, 2024

Check for existing issues

  • Completed

Environment

  • Windows 10
  • Direct download of Windows executable
  • Vale 2.29.1 or later

Describe the bug / provide steps to reproduce it

The change introduced by this commit causes the problem: f769fcd

	// NOTE: If the `ctx` document is large (as could be the case with
	// `scope: raw`) this is *slow*. Thus, the cap at 1k.
	//
	// TODO: Actually fix this.

I have a rule that looks for incorrectly indented content. It uses the following token:

extends: existence
message: 'Content must be indented using 4x spaces each time. "%s"'
level: error
nonword: true
scope: raw
tokens:
  - '^[ ]{1,3}\`'

When the Markdown file contains more than 999 characters (that is, once ctx reaches the 1k cap), the ^ in the token no longer anchors to the real start of a line and instead matches at incorrect starting positions.

Attached are sample files that exactly show the spillover in the logic:
vale.zip

Consequently, text with four spaces before the first ` is flagged as incorrect, and the starting position for the ^ is column 1.
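
For reference, the behaviour the rule is expected to have can be sketched outside Vale. The Python snippet below is an illustration only, not Vale's implementation: it assumes ^ is evaluated in multiline mode against the whole raw document, and find_bad_indents is a hypothetical helper name. It pads a sample document well past 1,000 characters and reports only lines that have one to three spaces before a backtick.

```python
import re

# Same pattern as the rule's token; re.MULTILINE makes ^ match at every line start.
# (Assumption: this mirrors how the token is meant to behave with `scope: raw`.)
TOKEN = re.compile(r"^[ ]{1,3}`", re.MULTILINE)


def find_bad_indents(text):
    """Return 1-based (line, column) pairs for each match of TOKEN."""
    positions = []
    for m in TOKEN.finditer(text):
        line = text.count("\n", 0, m.start()) + 1
        column = m.start() - (text.rfind("\n", 0, m.start()) + 1) + 1
        positions.append((line, column))
    return positions


# Filler pushes the document well beyond the 1k cap mentioned in the commit.
filler = "Some ordinary paragraph text.\n" * 50
doc = filler + "    `four spaces: fine`\n" + "  `two spaces: should be flagged`\n"

print(len(doc) > 1000)
print(find_bad_indents(doc))
```

Under these assumptions only the two-space line (line 52) is reported, and the four-space line is left alone regardless of the document's length.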


The vale.exe from release 2.29.0 does not have this problem.

@twitchard

I ran into this too.

My reproduction

cd "$(mktemp -d)"
cat <<EOF > .vale.ini
StylesPath = ./.vale/styles
MinAlertLevel = suggestion
[*.txt]
BasedOnStyles = MyStyle
EOF
mkdir -p ./.vale/styles/MyStyle
cat <<EOF > .vale/styles/MyStyle/MyRule.yml
extends: script
message: "Error %s"
scope: raw
script: |
  matches := [{begin: 0, end: 1}, {begin: 997, end: 998}]
EOF
(
  echo a
  echo b
  for x in `seq 993`; do
    echo -n "x";
  done
  echo b
) >> test-999-bytes.txt
cp test-999-bytes.txt test-1001-bytes.txt
echo x >> test-1001-bytes.txt
vale test-999-bytes.txt
vale test-1001-bytes.txt

Produces

 test-999-bytes.txt
 1:1    warning  Error a  MyStyle.MyRule
 3:994  warning  Error b  MyStyle.MyRule

✖ 0 errors, 2 warnings and 0 suggestions in 1 file.

 test-1001-bytes.txt
 1:1  warning  Error a  MyStyle.MyRule
 2:1  warning  Error b  MyStyle.MyRule

✖ 0 errors, 2 warnings and 0 suggestions in 1 file.
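
The expected positions can be checked by hand from the byte offsets in the script. The sketch below is plain Python and independent of Vale; offset_to_line_col is a hypothetical helper, and it assumes ASCII input so that bytes and columns coincide. It rebuilds both test files in memory and converts offsets 0 and 997 to 1-based line:column pairs.

```python
def offset_to_line_col(text, offset):
    """Convert a 0-based offset into a 1-based (line, column) pair."""
    line = text.count("\n", 0, offset) + 1
    # rfind returns -1 when there is no earlier newline, giving column offset + 1.
    column = offset - text.rfind("\n", 0, offset)
    return line, column


# Same content as the files built by the shell script above.
small = "a\nb\n" + "x" * 993 + "b\n"   # 999 bytes
large = small + "x\n"                   # 1001 bytes

for name, text in (("test-999-bytes.txt", small), ("test-1001-bytes.txt", large)):
    print(name, len(text), offset_to_line_col(text, 0), offset_to_line_col(text, 997))
```

Both files give 1:1 and 3:994 for the two offsets. The 2:1 reported for the 1001-byte file is the position of the first b in the file (offset 2), not the b at offset 997.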
