
Escaping of LaTeX delimiters is overly aggressive #1078

Open
dlqqq opened this issue Oct 31, 2024 · 0 comments
Labels
bug Something isn't working


@dlqqq
Member

dlqqq commented Oct 31, 2024

Problem

Because Jupyter AI uses the JupyterLab renderer for Markdown + LaTeX, we need to add an extra backslash to each LaTeX delimiter so that math expressions are still rendered: the initial Markdown render consumes one backslash before the LaTeX typesetter runs.
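To illustrate why one backslash is lost, here is a simplified model of the Markdown step. It is a sketch, not JupyterLab's actual renderer: CommonMark treats a backslash before an ASCII punctuation character as an escape, drops the backslash, and keeps the character literally. The function name applyMarkdownBackslashEscapes is hypothetical, and the model ignores code spans and code blocks, where CommonMark does not apply backslash escapes at all.

```typescript
// Hypothetical, simplified model of CommonMark's backslash-escape rule:
// a backslash before any ASCII punctuation character (0x21-0x2F, 0x3A-0x40,
// 0x5B-0x60, 0x7B-0x7E) is dropped and the character is kept literally.
// Real renderers do NOT apply this inside code spans or code blocks.
function applyMarkdownBackslashEscapes(markdown: string): string {
  return markdown.replace(/\\([\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e])/g, '$1');
}

// A single backslash is consumed, so the TeX delimiters disappear.
console.log(applyMarkdownBackslashEscapes('\\(x^2\\)')); // prints (x^2)

// A doubled backslash survives as one, leaving \( ... \) for the typesetter.
console.log(applyMarkdownBackslashEscapes('\\\\(x^2\\\\)')); // prints \(x^2\)
```

This is the reason for doubling the backslash before rendering: only the doubled form leaves a delimiter in the DOM for the LaTeX typesetter to find.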

Our current logic is very simple: it escapes the raw message body as a string before passing it to the renderer:

/**
 * Escapes backslashes in LaTeX delimiters such that they appear in the DOM
 * after the initial Markdown render. For example, this function takes `\(` and
 * returns `\\(`.
 *
 * Required for proper rendering of Markdown + LaTeX markup in the chat by
 * `ILatexTypesetter`.
 */
function escapeLatexDelimiters(text: string): string {
  return text
    .replace(/\\\(/g, '\\\\(')
    .replace(/\\\)/g, '\\\\)')
    .replace(/\\\[/g, '\\\\[')
    .replace(/\\\]/g, '\\\\]');
}

However, the LLM's response sometimes literally includes \(, \[, \], or \) in places where these sequences do not serve as TeX delimiters and therefore should not be escaped. Currently, an extra backslash is erroneously added in these cases.

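In the example below, the UI wrongly shows three backslashes instead of two in a negative lookbehind. The frontend transforms ?<![$\\]) into ?<![$\\\]), which is invalid: the original \\ (an escaped backslash) is now followed by \], an escaped bracket, so the character class is never closed.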

[Screenshot 2024-10-31 at 1:19:42 PM]
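The failure can be reproduced outside the UI by calling the current function directly. The snippet below copies escapeLatexDelimiters as quoted above; the two input strings are illustrative, with the second one taken from the lookbehind in the screenshot.

```typescript
// Copied from the current implementation quoted in this issue.
function escapeLatexDelimiters(text: string): string {
  return text
    .replace(/\\\(/g, '\\\\(')
    .replace(/\\\)/g, '\\\\)')
    .replace(/\\\[/g, '\\\\[')
    .replace(/\\\]/g, '\\\\]');
}

// Intended case: the text \(x^2\) becomes \\(x^2\\).
console.log(escapeLatexDelimiters('\\(x^2\\)')); // prints \\(x^2\\)

// Bug: the text ?<![$\\]) is not LaTeX, but its trailing \] is still
// escaped, producing ?<![$\\\]) with three backslashes.
console.log(escapeLatexDelimiters('?<![$\\\\])')); // prints ?<![$\\\])
```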

Proposed Solution

Do not escape LaTeX delimiters if:

  1. The delimiter is already escaped (i.e. given the text \\(, escapeLatexDelimiters should return \\( unchanged rather than \\\(), or
  2. The delimiter is within Markdown inline code or a code block.
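The first condition looks tractable with a regex plus a one-character look back. A minimal sketch, assuming "already escaped" means the delimiter's backslash is directly preceded by another backslash; the function name escapeUnescapedLatexDelimiters is hypothetical, and code spans and code blocks are not handled here.

```typescript
/**
 * Hypothetical variant of escapeLatexDelimiters implementing condition 1:
 * a delimiter is escaped only if its backslash is not itself preceded by a
 * backslash. Condition 2 (code spans / code blocks) is NOT handled here.
 */
function escapeUnescapedLatexDelimiters(text: string): string {
  // Match a backslash followed by one of ( ) [ ], then inspect the character
  // just before the match instead of relying on a regex lookbehind.
  return text.replace(/\\([()[\]])/g, (match: string, delim: string, offset: number) =>
    offset > 0 && text[offset - 1] === '\\' ? match : '\\\\' + delim
  );
}

console.log(escapeUnescapedLatexDelimiters('\\(x^2\\)')); // prints \\(x^2\\)
console.log(escapeUnescapedLatexDelimiters('\\\\(')); // prints \\( (unchanged)
console.log(escapeUnescapedLatexDelimiters('?<![$\\\\])')); // prints ?<![$\\]) (unchanged)
```

Condition 1 alone happens to fix the lookbehind from the screenshot, but a lone \( inside code, such as r"\(" in a Python regex, would still be escaped. The sketch also looks back only one character, so it cannot tell \\( from \\\(; counting the whole run of preceding backslashes would be more precise.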

The second condition will be trickier, since I'm not sure whether it can be handled with a regex alone; it may have to be done imperatively. We should also explore whether a third-party package already provides an implementation of this.
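One imperative approach is to scan the message and escape only the text outside code. Skipping code is appropriate because CommonMark does not process backslash escapes inside code spans or code blocks, so there the backslash already survives the Markdown render. The sketch below makes simplifying assumptions: it recognizes fenced code blocks opened by three or more backticks or tildes (with up to three spaces of indentation) and backtick code spans closed by a run of the same length, and it ignores indented code blocks, fences nested in lists or blockquotes, and backslash-escaped backticks. All function names are hypothetical.

```typescript
// Hypothetical helpers; names are illustrative, not part of Jupyter AI.

/** Condition 1: escape \( \) \[ \] unless the backslash is already escaped. */
function escapeDelimiters(prose: string): string {
  return prose.replace(/\\([()[\]])/g, (match: string, delim: string, offset: number) =>
    offset > 0 && prose[offset - 1] === '\\' ? match : '\\\\' + delim
  );
}

/** Returns the start of the next run of exactly `len` backticks, or -1. */
function findClosingRun(s: string, from: number, len: number): number {
  let i = from;
  while (i < s.length) {
    if (s[i] !== '`') {
      i++;
      continue;
    }
    let j = i;
    while (j < s.length && s[j] === '`') j++;
    if (j - i === len) return i;
    i = j; // skip runs of a different length
  }
  return -1;
}

/** Escapes delimiters in text without fenced code, copying code spans verbatim. */
function escapeOutsideCodeSpans(chunk: string): string {
  let out = '';
  let i = 0;
  while (i < chunk.length) {
    const open = chunk.indexOf('`', i);
    if (open === -1) {
      out += escapeDelimiters(chunk.slice(i));
      break;
    }
    out += escapeDelimiters(chunk.slice(i, open));
    let runEnd = open;
    while (runEnd < chunk.length && chunk[runEnd] === '`') runEnd++;
    const close = findClosingRun(chunk, runEnd, runEnd - open);
    // Without a matching closer the backticks are literal text; otherwise
    // copy the whole code span, including both backtick runs, unchanged.
    const end = close === -1 ? runEnd : close + (runEnd - open);
    out += chunk.slice(open, end);
    i = end;
  }
  return out;
}

/** Conditions 1 + 2: skip fenced code blocks and inline code spans. */
function escapeLatexDelimitersOutsideCode(text: string): string {
  const out: string[] = [];
  let prose: string[] = [];
  let fence: string | null = null; // opening fence while inside a fenced block
  const flushProse = (): void => {
    if (prose.length > 0) {
      out.push(escapeOutsideCodeSpans(prose.join('\n')));
      prose = [];
    }
  };
  for (const line of text.split('\n')) {
    const m = /^ {0,3}(`{3,}|~{3,})/.exec(line);
    if (fence === null) {
      if (m) {
        flushProse();
        fence = m[1];
        out.push(line);
      } else {
        prose.push(line);
      }
    } else {
      out.push(line); // inside fenced code: copy verbatim
      // A closing fence uses the same character, is at least as long as the
      // opening fence, and has nothing else on the line.
      if (m && m[1][0] === fence[0] && m[1].length >= fence.length && line.trim() === m[1]) {
        fence = null;
      }
    }
  }
  flushProse();
  return out.join('\n');
}

const tick3 = '`'.repeat(3);
const message = ['Inline \\(x^2\\) and code `r"\\("`:', tick3 + 'python', 're.compile(r"\\[")', tick3].join('\n');
// Only the inline math is escaped; the code span and fenced block are unchanged.
console.log(escapeLatexDelimitersOutsideCode(message));
```

Splitting on lines first keeps fence detection simple, and scanning code spans over the joined prose lets a span continue across a single line break, as CommonMark allows; the sketch does not, however, stop a span at a blank line. An unclosed fence is treated as running to the end of the message.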
