Skip to content

Commit

Permalink
Implement efficient validation for hash intra-link (#2465)
Browse files Browse the repository at this point in the history
  • Loading branch information
yiwen101 authored Mar 29, 2024
1 parent 9da549c commit 66591c5
Show file tree
Hide file tree
Showing 14 changed files with 160 additions and 33 deletions.
2 changes: 1 addition & 1 deletion docs/devGuide/bootcamp/exploreMarkBind.md
Original file line number Diff line number Diff line change
Expand Up @@ -255,7 +255,7 @@ Now, let's try editing the `index.md` file to add some content of our own!
1. Modify or add some content utilizing one or more of the following components:
- <a tags="environment--combined" href="/userGuide/components/presentation.html#boxes">Box</a><a tags="environment--dg" href="https://markbind.org/userGuide/components/presentation.html#boxes">Box</a>
- <a tags="environment--combined" href="/userGuide/components/presentation.html#panels">Panel</a><a tags="environment--dg" href="https://markbind.org/userGuide/components/presentation.html#panels">Panel</a>
- <a tags="environment--combined" href="/userGuide/components/presentation.html#tooltips">Tooltip</a><a tags="environment--dg" href="https://markbind.org/userGuide/components/presentation.html#tooltips">Tooltip</a>
- <a tags="environment--combined" href="/userGuide/components/popups.html#tooltips">Tooltip</a><a tags="environment--dg" href="https://markbind.org/userGuide/components/popups.html#tooltips">Tooltip</a>
- etc.

<box type="important" light>
Expand Down
2 changes: 1 addition & 1 deletion docs/userGuide/markBindInTheProjectWorkflow.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ A MarkBind conversion involves the following:
- Adding a Home page: If your project already has a `README.md` or `Home.md`, the content will be copied over to `index.md`. Otherwise, a default home page will be added.
- Adding an About Us page: If your project already has `about.md`, this will be used as the About page. Otherwise, a default About page will be added.
- Adding a top navigation bar.
- Adding a site navigation menu: If your project has a valid `_Sidebar.md` file, it will be used as the [site navigation menu]({{baseUrl}}/userGuide/tweakingThePageStructure.html#site-navigation-menus). Otherwise, the menu will be built from your project's directory structure and contain links to all addressable pages.
- Adding a site navigation menu: If your project has a valid `_Sidebar.md` file, it will be used as the [site navigation menu]({{baseUrl}}/userGuide/tweakingThePageStructure.html#constructing-a-page-navigation-menu). Otherwise, the menu will be built from your project's directory structure and contain links to all addressable pages.
- Adding a custom footer: If your project has a valid `_Footer.md` file, it will be used as the website footer. Otherwise, a default footer will be added.

<box type="warning">
Expand Down
6 changes: 3 additions & 3 deletions docs/userGuide/reusingContents.md
Original file line number Diff line number Diff line change
Expand Up @@ -102,7 +102,7 @@ You can use a [_Tabs_ component](components/presentation.html#tabs) to give alte
#### Giving access to additional contents

You can use following components to give readers an option to access additional content at their discretion.
* [Tooltips](components/popups.html#tooltip), [Popovers](components/popups.html#popover), [Modals](components/popups.html#modal)
* [Tooltips](components/popups.html#tooltips), [Popovers](components/popups.html#popovers), [Modals](components/popups.html#modals)
* [Expandable Panels](components/presentation.html#panels)

#### Organizing contents in alternative ways
Expand Down Expand Up @@ -154,13 +154,13 @@ Then, in a page-specific CSS file,

#### Deploying a page multiple times with different titles

By [overriding the `title` declared in the frontmatter of the page using `site.json`](tweakingThePageStructure.html#front-matter), it is possible to allow MarkBind to serve the same page with different titles.
By [overriding the `title` declared in the frontmatter of the page using `site.json`](tweakingThePageStructure.html#frontmatter), it is possible to allow MarkBind to serve the same page with different titles.

This may especially be useful for users who are serving a page from a submodule.

#### Creating slight variations of content

Tags are a good way to create multiple variations of a page within the same source file, such as to filter content for creating multiple different versions of the same page. See [_User Guide: Tweaking the Page Structure → Tags_](tweakingThePageStructure.html#tags) section for more information.
Tags are a good way to create multiple variations of a page within the same source file, such as to filter content for creating multiple different versions of the same page. See [_User Guide: Tweaking the Page Structure → Tags_](tweakingThePageStructure.html#plugin-tags) section for more information.

{% from "njk/common.njk" import previous_next %}
{{ previous_next('tweakingThePageStructure', 'workingWithSites') }}
6 changes: 3 additions & 3 deletions docs/userGuide/siteJsonFile.md
Original file line number Diff line number Diff line change
Expand Up @@ -133,8 +133,8 @@ _(Optional)_ **The styling options to be applied to the site.** This includes:
* `glob` can be used alternatively to define a file pattern in the [_glob syntax_](https://en.wikipedia.org/wiki/Glob_(programming)), or an array of such file patterns.<br>
{{ icon_examples }} `**/*.md` or `[ '**/*.md', '**/index.md' ]` { .my-2 }
* **`globExclude`**: An array of file patterns to be excluded from rendering when using `glob`, also defined in the glob syntax.
* **`title`**: The page `<title>` for the generated web page. Titles specified here take priority over titles specified in the [frontmatter](tweakingThePageStructure.html#front-matter) of individual pages.
* **`layout`**: The [layout](tweakingThePageStructure.html#page-layouts) to be used by the page. Default: `default`.
* **`title`**: The page `<title>` for the generated web page. Titles specified here take priority over titles specified in the [frontmatter](tweakingThePageStructure.html#frontmatter) of individual pages.
* **`layout`**: The [layout](tweakingThePageStructure.html#layouts) to be used by the page. Default: `default`.
* **`searchable`**: Specifies that the page(s) should be excluded from searching. Default: `yes`.
* **`externalScripts`**: An array of external scripts to be referenced on the page. Scripts referenced will be run before the layout script.
* **`frontmatter`**: Specifies properties to add to the frontmatter of a page or glob of pages. Overrides any existing properties if they have the same name, and overrides any frontmatter properties specified in `globalOverride`.
Expand Down Expand Up @@ -254,7 +254,7 @@ The ignore pattern follows the [glob pattern used in .gitignore](https://git-scm

**A list of plugins to load.** Plugins are user-defined extensions that can add custom features to MarkBind. `pluginsContext` contains settings to be applied to the loaded plugins. See [User Guide: Using Plugins](usingPlugins.html) for more details.

The example above uses tags as an example of configuring plugin settings, refer to the [`filterTags` plugin](tweakingThePageStructure.html#filtertags-toggling-alternative-contents-in-a-page) for more details.
The example above uses tags as an example of configuring plugin settings, refer to the [`filterTags` plugin](tweakingThePageStructure.html#toggling-alternative-contents) for more details.

#### **`headingIndexingLevel`**

Expand Down
2 changes: 1 addition & 1 deletion docs/userGuide/syntax/navBars.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@

<box type="warning">
<markdown>
Note: **Navbars** should be placed within a [header file]({{ baseUrl }}/userGuide/tweakingThePageStructure.html#headers) to ensure that they are correctly positioned at the top of the page, above the [site navigation]({{ baseUrl }}/userGuide/tweakingThePageStructure.html#site-navigation-menus) and [page navigation]({{ baseUrl }}/userGuide/tweakingThePageStructure.html#page-navigation-menus) menus.
Note: **Navbars** should be placed within a [header file]({{ baseUrl }}/userGuide/tweakingThePageStructure.html#sticking-the-header-to-the-top) to ensure that they are correctly positioned at the top of the page, above the [site navigation]({{ baseUrl }}/userGuide/tweakingThePageStructure.html#constructing-a-page-navigation-menu) and [page navigation]({{ baseUrl }}/userGuide/tweakingThePageStructure.html#constructing-a-page-navigation-menu) menus.
</markdown>
</box>

Expand Down
6 changes: 3 additions & 3 deletions docs/userGuide/syntax/searchBars.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ Enter a search term (eg. 'search bar') to see the search result dropdown.

Name | Type | Default | Description
---- | ---- | ------- | ------
algolia | `Boolean` | `false` | Whether the searchbar should be connected to [Algolia DocSearch]({{ baseUrl }}/userGuide/usingPlugins.html#algolia-enabling-algolia-docsearch).
algolia | `Boolean` | `false` | Whether the searchbar should be connected to [Algolia DocSearch]({{ baseUrl }}/userGuide/usingPlugins.html#plugin-algolia).
data | `Array` || The local data source for suggestions. Expected to be a primitive array. To use MarkBind's search functionality, set this value to `"searchData"`.
menu-align-right | `Boolean` | `false` | Whether the search bar's dropdown list will be right-aligned.
on-hit | `Function` || A callback function when you click or hit return on an item. To use MarkBind's search functionality, set this value to `"searchCallback"`.
Expand All @@ -45,13 +45,13 @@ placeholder | `String` | `''` | The placeholder text shown when no keywords are

Note: If you are using MarkBind's search functionality, then `enableSearch` **must be set to `true` in `site.json`**.

See: [User Guide: Site Configuration → enableSearch]({{ baseUrl }}/userGuide/siteJsonFile.html#enable-search).
See: [User Guide: Site Configuration → enableSearch]({{ baseUrl }}/userGuide/siteJsonFile.html#enablesearch).

</box>

%%{{ icon_info }} Related topic: [User Guide: Making the Site Searchable]({{ baseUrl }}/userGuide/makingTheSiteSearchable.html).%%

%%{{ icon_info }} Related topic: [User Guide: Using Plugins → Algolia: Enabling Algolia DocSearch]({{ baseUrl }}/userGuide/usingPlugins.html#algolia-enabling-algolia-docsearch).%%
%%{{ icon_info }} Related topic: [User Guide: Using Plugins → Algolia: Enabling Algolia DocSearch]({{ baseUrl }}/userGuide/usingPlugins.html#plugin-algolia).%%

</div> <!-- end of body -->

Expand Down
5 changes: 3 additions & 2 deletions packages/core/src/html/NodeProcessor.ts
Original file line number Diff line number Diff line change
Expand Up @@ -172,7 +172,7 @@ export class NodeProcessor {

transformOldSlotSyntax(node);
shiftSlotNodeDeeper(node);

this.siteLinkManager.maintainFilePathToHashesMap(node, context.cwf);
// log warnings for conflicting attributes
if (_.has(warnConflictingAtributesMap, node.name)) { warnConflictingAtributesMap[node.name](node); }

Expand All @@ -189,7 +189,7 @@ export class NodeProcessor {
return processInclude(node, context, this.pageSources, this.variableProcessor,
(text: string) => this.markdownProcessor.renderMd(text),
(text: string) => this.markdownProcessor.renderMdInline(text),
this.config);
this.config, this.siteLinkManager);
case 'panel':
this.mdAttributeRenderer.processPanelAttributes(node);
return processPanelSrc(node, context, this.pageSources, this.config);
Expand Down Expand Up @@ -383,6 +383,7 @@ export class NodeProcessor {
const isHeadingTag = (/^h[1-6]$/).test(node.name);
if (isHeadingTag && !node.attribs.id) {
setHeadingId(node, this.config);
this.siteLinkManager.maintainFilePathToHashesMap(node, context.cwf);
}

this.pluginManager.postProcessNode(node);
Expand Down
53 changes: 49 additions & 4 deletions packages/core/src/html/SiteLinkManager.ts
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@ import has from 'lodash/has';
import * as linkProcessor from './linkProcessor';
import type { NodeProcessorConfig } from './NodeProcessor';
import { MbNode } from '../utils/node';
import { setHeadingId } from './headerProcessor';

const _ = { has };

Expand All @@ -15,12 +16,14 @@ const tagsToValidate: Set<string> = new Set([
]);

export class SiteLinkManager {
private config: NodeProcessorConfig;
private intralinkCollection: Map<string, Set<string>>;
protected config: NodeProcessorConfig;
protected intralinkCollection: Map<string, Set<string>>;
protected filePathToHashesMap: Map<string, Set<string>>;

constructor(config: NodeProcessorConfig) {
this.config = config;
this.intralinkCollection = new Map();
this.filePathToHashesMap = new Map();
}

/**
Expand All @@ -39,9 +42,11 @@ export class SiteLinkManager {
if (!this.config.intrasiteLinkValidation.enabled) {
return;
}

this.intralinkCollection.forEach((resourcePaths, cwf) => {
resourcePaths.forEach(resourcePath => linkProcessor.validateIntraLink(resourcePath, cwf, this.config));
resourcePaths.forEach(resourcePath => linkProcessor.validateIntraLink(resourcePath,
cwf,
this.config,
this.filePathToHashesMap));
});

this.intralinkCollection = new Map();
Expand Down Expand Up @@ -69,4 +74,44 @@ export class SiteLinkManager {
this._addToCollection(resourcePath, cwf);
return 'Intralink collected to be validated later';
}

/**
* Add sections that could be reached by intra-link with hash to this node to filePathToHashesMap,
* The reachable sections include nodes with ids and headings.
*/
maintainFilePathToHashesMap(node: MbNode, cwf: string) {
if (!this.config.intrasiteLinkValidation.enabled) {
return;
}
const path = cwf.substring(this.config.rootPath.length);
if (!this.filePathToHashesMap.has(path)) {
this.filePathToHashesMap.set(path, new Set());
}
if (node.attribs!.id) {
this.filePathToHashesMap.get(path)!.add(node.attribs!.id);
}
}

/**
* Recursively add reachable sections of the included node to the filePathToHashesMap for validation.
*/
maintainHashesForInclude(node: MbNode, cwf: string) {
if (!this.config.intrasiteLinkValidation.enabled) {
return;
}
const isHeadingTag = (/^h[1-6]$/).test(node.name);
if (isHeadingTag && node.attribs && !node.attribs.id) {
setHeadingId(node, this.config, false);
this.maintainFilePathToHashesMap(node, cwf);
node.attribs.id = undefined;
}
if (node.attribs && node.attribs.id) {
this.maintainFilePathToHashesMap(node, cwf);
}
if (node.children) {
node.children.forEach((child) => {
this.maintainHashesForInclude(child as MbNode, cwf);
});
}
}
}
22 changes: 18 additions & 4 deletions packages/core/src/html/headerProcessor.ts
Original file line number Diff line number Diff line change
Expand Up @@ -9,24 +9,38 @@ const _ = {
has,
};

/*
* h1 - h6
/**
* Increment the counter for same heading text, and set the id of the heading node
*
* If the addHeaderCount is false, the counter for the same heading text will not be incremented.
* The heading id will also not have the count appended to it.
*
* So for, only SiteLinkManager uses this function with addHeaderCount set to false.
* This is for validating intralinks in a side-effect free manner.
*
* Heading text refers to textContent of h1-h6 Nodes.
*/
export function setHeadingId(node: MbNode, config: NodeProcessorConfig) {
export function setHeadingId(node: MbNode,
config: NodeProcessorConfig,
addHeaderCount: boolean = true) {
const textContent = cheerio(node).text();
// remove the '&lt;' and '&gt;' symbols that markdown-it uses to escape '<' and '>'
const cleanedContent = textContent.replace(/&lt;|&gt;/g, '');
const slugifiedHeading = slugify(cleanedContent, { decamelize: false });

let headerId = slugifiedHeading;
if (!addHeaderCount) {
node.attribs.id = headerId;
return;
}
const { headerIdMap } = config;

if (headerIdMap[slugifiedHeading]) {
headerId = `${slugifiedHeading}-${headerIdMap[slugifiedHeading]}`;
headerIdMap[slugifiedHeading] += 1;
} else {
headerIdMap[slugifiedHeading] = 2;
}

node.attribs.id = headerId;
}

Expand Down
7 changes: 3 additions & 4 deletions packages/core/src/html/includePanelProcessor.ts
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ import type { Context } from './Context';
import type { PageSources } from '../Page/PageSources';
import type { VariableProcessor } from '../variables/VariableProcessor';
import { MbNode, NodeOrText } from '../utils/node';
import { SiteLinkManager } from './SiteLinkManager';

require('../patches/htmlparser2');

Expand Down Expand Up @@ -182,7 +183,7 @@ function buildGetNextFootnodeNumber() {
export function processInclude(node: MbNode, context: Context, pageSources: PageSources,
variableProcessor: VariableProcessor, renderMd: (text: string) => string,
renderMdInline: (text: string) => string,
config: Record<string, any>,
config: Record<string, any>, siteLinkManager: SiteLinkManager,
getNextFootnodeNumber: () => number = buildGetNextFootnodeNumber()): Context {
if (_.isEmpty(node.attribs.src)) {
const error = new Error(`Empty src attribute in include in: ${context.cwf}`);
Expand Down Expand Up @@ -274,11 +275,9 @@ export function processInclude(node: MbNode, context: Context, pageSources: Page
if (isTrim) {
actualContent = actualContent.trim();
}

const $includeEl = cheerio(node);
$includeEl.empty();
$includeEl.append(actualContent);

if (node.children && node.children.length > 0) {
childContext.addCwfToCallstack(context.cwf);
childContext.processingOptions.omitFrontmatter = shouldOmitFrontmatter;
Expand All @@ -292,7 +291,7 @@ export function processInclude(node: MbNode, context: Context, pageSources: Page
}

_deleteIncludeAttributes(node);

siteLinkManager.maintainHashesForInclude(node, context.cwf);
return childContext;
}

Expand Down
24 changes: 20 additions & 4 deletions packages/core/src/html/linkProcessor.ts
Original file line number Diff line number Diff line change
Expand Up @@ -165,19 +165,23 @@ function isValidFileAsset(resourcePath: string, config: NodeProcessorConfig) {
* @param config passed for page metadata access
* @returns these string return values are for unit testing purposes only
*/
export function validateIntraLink(resourcePath: string, cwf: string, config: NodeProcessorConfig): string {
export function validateIntraLink(resourcePath: string,
cwf: string,
config: NodeProcessorConfig,
filePathToHashesMap: Map<string, Set<string>> = new Map()): string {
if (!isIntraLink(resourcePath)) {
return 'Not Intralink';
}

const err = `You might have an invalid intra-link! Ignore this warning if it was intended.
'${resourcePath}' found in file '${cwf}'`;

const hashErr = `You might have an invalid hash for intra-link! Ignore this warning if it was intended.'
${resourcePath}' found in file '${cwf}'`;
resourcePath = urlUtil.stripBaseUrl(resourcePath, config.baseUrl); // eslint-disable-line no-param-reassign

const resourcePathUrl = parse(resourcePath);

let hash;
if (resourcePathUrl.hash) {
hash = resourcePathUrl.hash.substring(1);
// remove hash portion (if any) in the resourcePath
resourcePath = resourcePathUrl.pathname; // eslint-disable-line no-param-reassign
}
Expand All @@ -202,6 +206,11 @@ export function validateIntraLink(resourcePath: string, cwf: string, config: Nod
logger.warn(err);
return 'Intralink with no extension is neither a Page Source nor File Asset';
}
if (hash !== undefined
&& (!filePathToHashesMap.get(asFileAsset) || !filePathToHashesMap.get(asFileAsset)!.has(hash))) {
logger.warn(hashErr);
return 'Intralink with no extension is a valid Page Source or File Asset but hash is not found';
}
return 'Intralink with no extension is a valid Page Source or File Asset';
}

Expand All @@ -211,6 +220,13 @@ export function validateIntraLink(resourcePath: string, cwf: string, config: Nod
logger.warn(err);
return 'Intralink with ".html" extension is neither a Page Source nor File Asset';
}
if (hash !== undefined) {
const filePath = `${resourcePath.slice(0, -5)}.md`;
if (!filePathToHashesMap.get(filePath) || !filePathToHashesMap.get(filePath)!.has(hash)) {
logger.warn(hashErr);
return 'Intralink with ".html" extension is a valid Page Source or File Asset but hash is not found';
}
}
return 'Intralink with ".html" extension is a valid Page Source or File Asset';
}

Expand Down
Loading

0 comments on commit 66591c5

Please sign in to comment.