Keep h1 and other headings
Even though using h1 tags for sections inside an article is semantically
wrong, a lot of websites do it anyway. So the idea here is to stop
stripping headings, including h1, on Readability's side.

Fixes wallabag/wallabag#5805

Signed-off-by: Kevin Decherf <[email protected]>
Kdecherf committed Jun 11, 2022
1 parent c506b7e commit ada8ff0
Showing 1 changed file with 0 additions and 12 deletions.
12 changes: 0 additions & 12 deletions src/Readability.php
@@ -427,18 +427,6 @@ public function prepArticle(\DOMNode $articleContent)
         $this->clean($articleContent, 'object');
         $this->clean($articleContent, 'iframe');
         $this->clean($articleContent, 'canvas');
-        $this->clean($articleContent, 'h1');
-
-        /*
-         * If there is only one h2, they are probably using it as a main header, so remove it since we
-         * already have a header.
-         */
-        $h2s = $articleContent->getElementsByTagName('h2');
-        if (1 === $h2s->length && mb_strlen($this->getInnerText($h2s->item(0), true, true)) < 100) {
-            $this->clean($articleContent, 'h2');
-        }
-
-        $this->cleanHeaders($articleContent);

         // Do these last as the previous stuff may have removed junk that will affect these.
         $this->cleanConditionally($articleContent, 'form');