Allow markdown Emph / BulletList characters to be customised (eg. * vs _, and - vs *) #10527

Closed
0xdevalias opened this issue Jan 12, 2025 · 3 comments


0xdevalias commented Jan 12, 2025

Describe your proposed improvement and the problem it solves.

I noticed that the markdown writers (I was using gfm, but it seems to apply to most of them) use * for Emph output, whereas I was hoping to be able to use _. In plain-text output with the gutenberg extension, Emph can be rendered with _, but that doesn't seem to work for markdown:

inlineToMarkdown opts (Emph lst) = do
  variant <- asks envVariant
  contents <- inlineListToMarkdown opts lst
  return $ case variant of
             PlainText
               | isEnabled Ext_gutenberg opts -> "_" <> contents <> "_"
               | otherwise -> contents
             _ -> "*" <> contents <> "*"
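
To make the request concrete, here is a minimal, self-contained Haskell sketch of what a configurable delimiter could look like. It is only a toy model of the excerpt above, not pandoc's code: `Variant`, `Opts`, `optGutenberg`, `optEmphDelim` and `renderEmph` are hypothetical stand-ins, and pandoc has no such option today.

```haskell
module Main where

-- Simplified stand-in for the writer variants mentioned in the excerpt.
data Variant = PlainText | Markdown | Commonmark | Markua
  deriving (Eq, Show)

-- Hypothetical options record; neither field exists in pandoc under this name.
data Opts = Opts
  { optGutenberg :: Bool    -- stands in for the Ext_gutenberg extension
  , optEmphDelim :: String  -- assumed user-chosen delimiter, e.g. "*" or "_"
  }

-- Mirrors the shape of the case expression above, but reads the delimiter
-- from the options instead of hard-coding "*".
renderEmph :: Variant -> Opts -> String -> String
renderEmph PlainText opts contents
  | optGutenberg opts = "_" ++ contents ++ "_"
  | otherwise         = contents
renderEmph _ opts contents = d ++ contents ++ d
  where d = optEmphDelim opts

main :: IO ()
main = do
  let opts = Opts { optGutenberg = False, optEmphDelim = "_" }
  putStrLn (renderEmph Commonmark opts "hello")  -- prints _hello_
  putStrLn (renderEmph PlainText opts "hello")   -- prints hello
```

The plain-text branch is left untouched, so the only behavioural change in this sketch is which delimiter the markdown variants emit.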

While it's less directly needed for my particular use case, I figured it might also be useful to be able to configure which character is used when markdown bullet lists are rendered. I noticed that the writer chooses between * and - depending on the specific flavour of markdown:

-- | Convert bullet list item (list of blocks) to markdown.
bulletListItemToMarkdown :: PandocMonad m => WriterOptions -> [Block] -> MD m (Doc Text)
bulletListItemToMarkdown opts bs = do
  variant <- asks envVariant
  let exts = writerExtensions opts
  contents <- blockListToMarkdown opts $ taskListItemToAscii exts bs
  let start = case variant of
              Markua -> "* "
              Commonmark -> "- "
              _ -> "- " <> T.replicate (writerTabStop opts - 2) " "
  -- remove trailing blank line if item ends with a tight list
  let contents' = if itemEndsWithTightList bs
                     then chomp contents <> cr
                     else contents
  return $ hang (T.length start) (literal start) contents'
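
The marker selection in that function can be modelled on its own. The sketch below is a simplified, self-contained approximation rather than pandoc's implementation: `Variant` and `bulletStart` are hypothetical names, the tab stop is passed as a plain `Int`, and the extra `Char` argument is the assumed configurable bullet character that this issue asks for.

```haskell
module Main where

-- Simplified stand-in for the writer variants used in the excerpt.
data Variant = PlainText | Markdown | Commonmark | Markua
  deriving (Eq, Show)

-- Compute the text that starts a bullet item.
-- The Char argument is a hypothetical user-chosen bullet character.
bulletStart :: Variant -> Int -> Char -> String
bulletStart Markua     _       _ = "* "       -- kept fixed, as in the excerpt
bulletStart Commonmark _       c = [c, ' ']
bulletStart _          tabStop c =
  -- pad the marker out to the tab stop, like T.replicate (tabStop - 2) " "
  [c, ' '] ++ replicate (tabStop - 2) ' '

main :: IO ()
main = do
  print (bulletStart Commonmark 4 '*')  -- "* "
  print (bulletStart Markdown 4 '-')    -- "-   " (dash plus three spaces)
  print (bulletStart Markua 4 '-')      -- "* "
```

With a tab stop of 4, the non-CommonMark branch yields a four-character prefix, which is why the list item contents line up on the tab stop.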

See also:

Describe alternatives you've considered.

I spent the last day or two trying to hack together a --lua-filter to handle these cases and make them configurable via --metadata args. In the end I got a semi-functional PoC script working, but it feels overly complicated and still suffers from a number of bugs and edge cases:

jgm (Owner) commented Jan 12, 2025

The reason * is used is that many markdown variants don't allow word-internal _ to trigger emphasis.
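
To illustrate that constraint, here is a small, self-contained Haskell sketch of a renderer that uses _ only when the emphasis is not word-internal and falls back to * otherwise. It is a toy model, not pandoc's writer: the `Inline` type, `render`, and the helper names are hypothetical, and the "adjacent alphanumeric character" test is a conservative approximation of the flanking rules that markdown variants actually apply.

```haskell
module Main where

import Data.Char (isAlphaNum)

-- Toy inline type; emphasis holds plain text only, to keep the sketch short.
data Inline = Str String | Space | Emph String
  deriving (Eq, Show)

-- Last character of an inline, if it is non-empty text.
lastChar :: Inline -> Maybe Char
lastChar (Str s) | not (null s) = Just (last s)
lastChar _ = Nothing

-- First character of an inline, if it is non-empty text.
firstChar :: Inline -> Maybe Char
firstChar (Str (c:_)) = Just c
firstChar _ = Nothing

isWordChar :: Maybe Char -> Bool
isWordChar = maybe False isAlphaNum

-- Render inlines, picking "_" for emphasis unless a neighbouring
-- character is alphanumeric, in which case "*" is used instead.
render :: [Inline] -> String
render = go Nothing
  where
    go _ [] = ""
    go prev (x:xs) = case x of
      Str s  -> s ++ go (Just x) xs
      Space  -> " " ++ go (Just x) xs
      Emph s ->
        let before = prev >>= lastChar
            after  = case xs of
                       (y:_) -> firstChar y
                       []    -> Nothing
            d | isWordChar before || isWordChar after = "*"
              | otherwise                             = "_"
        in d ++ s ++ d ++ go (Just x) xs

main :: IO ()
main = do
  putStrLn (render [Str "an", Space, Emph "emphasised", Space, Str "word"])
  -- prints: an _emphasised_ word
  putStrLn (render [Str "un", Emph "believ", Str "able"])
  -- prints: un*believ*able
```

A writer that always emits * avoids needing this kind of context check at all, which fits the explanation above.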

Regarding this issue in general: there are lots of things people might want to tweak in markdown output. Indentation of bullets, character used for bullets, whether lazy wrapping is used, which characters to use for bold or emph, what to use for a horizontal rule, whether to align columns in pipe tables, which table formats to prefer, etc. Adding options to control these would add a lot of additional complexity to pandoc and is probably not worth it.

See my comments on the very similar issue #10479.

jgm closed this as not planned Jan 12, 2025
0xdevalias (Author) commented Jan 12, 2025

The reason * is used is that many markdown variants don't allow word-internal _ to trigger emphasis.

Ah true, that makes sense. I've never tried to emphasise in the middle of a word, so it makes sense that I haven't run into that before.


Regarding this issue in general: there are lots of things people might want to tweak in markdown output.

See my comments on the very similar issue #10479.

Oh, I somehow missed that one in my search. Thanks for linking, shall have a read 🖤


For future reference, see also:


A random old comment that triggered a path in my brain I hadn't even really thought of to handle this:

The trouble with writer extensions is that they are just booleans. And if you want to configure a pretty printer (which is what we're getting into the direction of here), you would want to specify many more things, like the order of preference of the various table formats, or also just set something like list-indent=2.

Originally posted by @mb21 in #5584 (comment)

I hadn't considered passing pandoc's generated markdown through a pretty printer to 'normalise' it how I prefer. prettier supports markdown, but doesn't seem to be particularly customisable:

But I could potentially use another tool to do so (though depending on how deep/esoteric we get here, it may just end up being another 'manipulate markdown AST' type situation, in which case I may as well just keep doing it within --lua-filter or similar).


Looking deeper at the markdown variants:

  • https://pandoc.org/MANUAL.html#markdown-variants
    • gfm (GitHub-Flavored Markdown)

      • pandoc --list-extensions=gfm
        -ascii_identifiers
        -attributes
        +autolink_bare_uris
        -bracketed_spans
        -definition_lists
        -east_asian_line_breaks
        +emoji
        -fancy_lists
        -fenced_divs
        +footnotes
        +gfm_auto_identifiers
        -hard_line_breaks
        -implicit_figures
        -implicit_header_references
        +pipe_tables
        -raw_attribute
        +raw_html
        -rebase_relative_paths
        -smart
        -sourcepos
        +strikeout
        -subscript
        -superscript
        +task_lists
        +tex_math_dollars
        +yaml_metadata_block
    • markdown_github (deprecated GitHub-Flavored Markdown)

      • pandoc --list-extensions=markdown_github
        -abbreviations
        +all_symbols_escapable
        -angle_brackets_escapable
        -ascii_identifiers
        +auto_identifiers
        +autolink_bare_uris
        +backtick_code_blocks
        -blank_before_blockquote
        -blank_before_header
        -bracketed_spans
        -citations
        -compact_definition_lists
        -definition_lists
        -east_asian_line_breaks
        +emoji
        -escaped_line_breaks
        -example_lists
        -fancy_lists
        -fenced_code_attributes
        +fenced_code_blocks
        -fenced_divs
        -footnotes
        -four_space_rule
        +gfm_auto_identifiers
        -grid_tables
        -gutenberg
        -hard_line_breaks
        -header_attributes
        -ignore_line_breaks
        -implicit_figures
        -implicit_header_references
        -inline_code_attributes
        -inline_notes
        +intraword_underscores
        -latex_macros
        -line_blocks
        -link_attributes
        +lists_without_preceding_blankline
        -literate_haskell
        -markdown_attribute
        -markdown_in_html_blocks
        -mmd_header_identifiers
        -mmd_link_attributes
        -mmd_title_block
        -multiline_tables
        -native_divs
        -native_spans
        -old_dashes
        -pandoc_title_block
        +pipe_tables
        -raw_attribute
        +raw_html
        -raw_tex
        -rebase_relative_paths
        -short_subsuperscripts
        +shortcut_reference_links
        -simple_tables
        -smart
        +space_in_atx_header
        -spaced_reference_links
        -startnum
        +strikeout
        -subscript
        -superscript
        +task_lists
        -table_captions
        -tex_math_dollars
        -tex_math_double_backslash
        -tex_math_single_backslash
        -yaml_metadata_block

In the older/deprecated markdown_github, there is actually a gutenberg extension, which would allow for the _emphasis_ style I wanted, but unfortunately it also swaps bold to ALLCAPS, so it's not ideal for my needs:

0xdevalias (Author) commented Jan 12, 2025

Reading further, it seems this could be better resolved with a custom writer rather than the --lua-filter I was originally trying to use:

With an example from djot-writer.lua (Ref: 1, 2):

Inlines.Emph = function(el)
  return concat{ "_", inlines(el.content), "_" }
end
Blocks.BulletList = function(el)
  local attr = render_attributes(el, true)
  local result = {attr, cr}
  for i=1,#el.content do
    result[#result + 1] = hang(blocks(el.content[i], blankline), 2, concat{"-",space})
  end
  local sep = blankline
  if is_tight_list(el) then
    sep = cr
  end
  return concat(result, sep)
end

See also, the following issue about improving the docs:
