Rework approximate markdown servlet #22

Closed
stoerr opened this issue Oct 10, 2023 · 0 comments · Fixed by #29
Labels
enhancement New feature or request

Comments

@stoerr
Member

stoerr commented Oct 10, 2023

The algorithm that extracts the text from a page or component as input for ChatGPT currently uses a heuristic: it renders common attributes like jcr:title, title, subtitle, linkTitle, jcr:description, text, code, copyright, defaultValue, exampleCode, suffix, exampleResult, footer (see ApproximateMarkdownServiceImpl.java), and adds special cases for special components using ApproximateMarkdownServicePlugin. That seems to work fine for standard components, but outputs little or nothing for custom components that have custom attributes (which are pretty common in AEM / Composum development). So we have to improve that.
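To illustrate the problem, here is a minimal sketch of an allowlist heuristic of this kind. It is not the code from ApproximateMarkdownServiceImpl.java: the class `AllowlistSketch`, the method `render`, and the plain `Map` standing in for a resource's properties are assumptions made for the example. Only the attribute names come from the list above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AllowlistSketch {
    // Attribute names copied from the issue text; the authoritative list is in the service implementation.
    static final List<String> TEXT_ATTRIBUTES = List.of(
            "jcr:title", "title", "subtitle", "linkTitle", "jcr:description", "text", "code",
            "copyright", "defaultValue", "exampleCode", "suffix", "exampleResult", "footer");

    /** Renders only allowlisted, non-blank string properties, in allowlist order. */
    static String render(Map<String, Object> properties) {
        StringBuilder out = new StringBuilder();
        for (String name : TEXT_ATTRIBUTES) {
            Object value = properties.get(name);
            if (value instanceof String && !((String) value).isBlank()) {
                out.append(((String) value).trim()).append("\n\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical custom component: none of its attribute names is on the allowlist.
        Map<String, Object> custom = new LinkedHashMap<>();
        custom.put("teaserHeadline", "Autumn sale");
        custom.put("teaserBody", "Everything must go.");
        System.out.println("[" + render(custom) + "]");
    }
}
```

A component whose text lives in attributes such as the hypothetical `teaserHeadline` produces no output at all, which is the failure mode described above.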

Variants:

  • Turn it around: output common attributes like jcr:title, title, subtitle, linkTitle, text, header, footer as they are, and filter out system attributes by name and value (we probably won't want to output any attribute with a namespace, nor attributes containing numbers, booleans, dates (?), or arrays (?)).
  • Extract the text content from the HTML rendering of a page or component. (Advantage: that contains content included from elsewhere. Disadvantage: it includes irrelevant content like navigation, advertisements, headers, footers etc.)
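The first variant could look roughly like the sketch below. The rules are one possible reading of the bullet, not a decided design: the class `AttributeFilterSketch`, its method names, the `Map` standing in for resource properties, and the decision to also reject strings that merely look like booleans or numbers are all assumptions. Dates and arrays are rejected here only because they are not `String` values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class AttributeFilterSketch {
    // Common attributes from the issue text; these are always output as they are.
    static final Set<String> COMMON = Set.of(
            "jcr:title", "title", "subtitle", "linkTitle", "text", "header", "footer");

    /** Decides by name and value whether a property probably carries human-readable text. */
    static boolean isTextAttribute(String name, Object value) {
        if (!(value instanceof String)) return false;        // numbers, booleans, dates, arrays
        String text = ((String) value).trim();
        if (text.isEmpty()) return false;
        if (COMMON.contains(name)) return true;               // well-known text attributes
        if (name.contains(":")) return false;                 // namespaced, treated as system attribute
        if (text.equalsIgnoreCase("true") || text.equalsIgnoreCase("false")) return false;
        if (text.matches("-?\\d+(\\.\\d+)?")) return false;   // number stored as a string
        return true;
    }

    /** Renders every property that passes the filter, in the map's iteration order. */
    static String render(Map<String, Object> properties) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Object> entry : properties.entrySet()) {
            if (isTextAttribute(entry.getKey(), entry.getValue())) {
                out.append(((String) entry.getValue()).trim()).append("\n\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> custom = new LinkedHashMap<>();
        custom.put("jcr:primaryType", "nt:unstructured");
        custom.put("sling:resourceType", "myapp/components/teaser"); // hypothetical resource type
        custom.put("teaserHeadline", "Autumn sale");
        custom.put("columns", 3);
        custom.put("hideOnMobile", "true");
        System.out.print(render(custom));
    }
}
```

Compared with an allowlist, this keeps custom text attributes such as the hypothetical `teaserHeadline`, at the cost of occasionally letting through technical strings (paths, CSS class names) that the name and value rules do not catch.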

Additionally, see whether we can easily rework it to produce HTML when needed.

@stoerr stoerr added the enhancement New feature or request label Oct 10, 2023
@stoerr stoerr self-assigned this Oct 12, 2023
@stoerr stoerr linked a pull request Oct 17, 2023 that will close this issue