Ignore redundant nesting when checking for related siblings #98

Merged 2 commits on Aug 29, 2024

tuzz (Contributor) commented Aug 2, 2024

If the best candidate is in an element all by itself, then we should probably check its nearest ancestor that has siblings when considering whether to append siblings that meet the score threshold.

For example, with the markup below, we would now include the second paragraph, whereas previously we would not.

```html
<div>
  <div>
    <p>This is the best candidate.</p>
  </div>
</div>
<p>This paragraph meets the score threshold.</p>
```

Note that this changes behaviour. We could put this change behind an option if preferred. I think this will improve the extraction for most use cases, though, and none of the existing test cases fail.
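To make the proposed behaviour concrete, here is a minimal sketch of the ancestor walk. It is written in Python with the standard library purely for illustration and is not the project's code. The function name `sibling_anchor`, the `parents` map, and the `ignore_redundant_nesting` flag are all hypothetical names. The sketch also assumes that "all by itself" means "the only element child", ignoring text nodes.

```python
import xml.etree.ElementTree as ET


def sibling_anchor(node, parents, ignore_redundant_nesting=True):
    """Return the element whose siblings should be checked against the threshold.

    With the (hypothetical) flag enabled, climb from `node` while it is the
    only element child of its parent, so wrappers that add nothing but
    nesting are skipped. With the flag disabled, return `node` unchanged,
    which corresponds to the previous behaviour.
    """
    if not ignore_redundant_nesting:
        return node
    current = node
    while True:
        parent = parents.get(current)
        # Stop at the root, or as soon as the parent has other element children.
        if parent is None or len(parent) > 1:
            return current
        current = parent


# The markup from the description, wrapped in a <body> so it parses as one tree.
doc = ET.fromstring("""
<body>
  <div>
    <div>
      <p id="best">This is the best candidate.</p>
    </div>
  </div>
  <p id="other">This paragraph meets the score threshold.</p>
</body>
""")

# ElementTree has no parent pointers, so build a child -> parent lookup.
parents = {child: parent for parent in doc.iter() for child in parent}

best = doc.find(".//p[@id='best']")
anchor = sibling_anchor(best, parents)
siblings = [el for el in parents[anchor] if el is not anchor]
print([el.get("id") for el in siblings])  # ['other']
```

With the flag disabled, the anchor is the `<p>` itself, its parent has no other element children, and the second paragraph is never considered.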

cantino (Owner) commented Aug 3, 2024

Hi @tuzz! Does the Mozilla JS version do this? For consistency we should probably make this an option.

tuzz (Contributor, Author) commented Aug 27, 2024

Hi @cantino, apologies for the slow reply.

No, the Mozilla JS version doesn't have this feature. I've just pushed a commit to hide it behind an option. Hopefully the feature is useful enough to be considered for inclusion. We've found it really helps for some DOM structures.

Thanks

cantino merged commit 5652f91 into cantino:master on Aug 29, 2024
1 check passed
cantino (Owner) commented Aug 29, 2024

Thanks @tuzz!

cantino (Owner) commented Aug 29, 2024

Released in 0.7.2
