This repository has been archived by the owner on Aug 25, 2024. It is now read-only.

feat(webcrawler): improve unchanged pages sourcing #131

Merged (3 commits) on Aug 22, 2024

Conversation

nicoloboschi
Collaborator

New flags:

  • only-main-content: defaults to false. If enabled, it removes script, style (and other) tags from the emitted document. This is particularly helpful for detecting actual semantic changes to a page rather than incidental ones (script versioning, cache busting, etc.).
  • emit-content-diff: list, defaults to all content diff types. Use it to filter which content diffs the source emits, when available. For example, to skip content_unchanged, set emit-content-diff: ['new', 'content_diff'].
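
To illustrate what the two flags are meant to achieve, here is a minimal Python sketch. It is not the code from this PR: `strip_non_content`, `should_emit`, and `STRIPPED_TAGS` are hypothetical names, and the exact set of removed tags is an assumption (the description only names script and style "and others").

```python
from html.parser import HTMLParser

# Assumption: the PR names script and style "(and others)"; only these two are modeled here.
STRIPPED_TAGS = {"script", "style"}


class _MainContentExtractor(HTMLParser):
    """Rebuilds the document while dropping STRIPPED_TAGS and their contents."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a stripped tag
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in STRIPPED_TAGS:
            self._skip_depth += 1
        elif self._skip_depth == 0:
            self._parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are kept verbatim unless stripped.
        if tag not in STRIPPED_TAGS and self._skip_depth == 0:
            self._parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in STRIPPED_TAGS:
            if self._skip_depth > 0:
                self._skip_depth -= 1
        elif self._skip_depth == 0:
            self._parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def result(self):
        return "".join(self._parts)


def strip_non_content(html):
    """Hypothetical stand-in for what only-main-content does to a page."""
    parser = _MainContentExtractor()
    parser.feed(html)
    parser.close()
    return parser.result()


def should_emit(diff_type, emit_content_diff=None):
    """Hypothetical stand-in for the emit-content-diff filter.

    An empty or missing list means every diff type is emitted (the stated default).
    """
    if not emit_content_diff:
        return True
    return diff_type in emit_content_diff


# Two crawls of the same page that differ only in a cache-busting query string.
page_v1 = '<html><head><script src="app.js?v=1"></script></head><body><p>Hello</p></body></html>'
page_v2 = '<html><head><script src="app.js?v=2"></script></head><body><p>Hello</p></body></html>'

unchanged = strip_non_content(page_v1) == strip_non_content(page_v2)
```

With the script tags stripped, the two crawls compare equal, so the page would be classified as unchanged; with `emit-content-diff: ['new', 'content_diff']`, `should_emit("content_unchanged", ["new", "content_diff"])` is false and such a page would not be emitted.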

@nicoloboschi nicoloboschi merged commit a90c902 into vectorize-io:main Aug 22, 2024
11 of 14 checks passed