Improve pandoc.scaffolding section in 'Pandoc Lua Filters' docs (RE: Custom Writers) #10531

Open
0xdevalias opened this issue Jan 12, 2025 · 6 comments


@0xdevalias

0xdevalias commented Jan 12, 2025

In the spirit of:

  • https://pandoc.org/CONTRIBUTING.html
    • Write or improve documentation. If you ran into a problem which took more time to figure out than expected, please consider to save other users from the same experience. People writing the documentation tend to lack an outside view, so please help provide one. Good documentation is both difficult and extremely important.

Describe your proposed improvement and the problem it solves.

In reading through the Lua Filters documentation recently, I found this section on pandoc.scaffolding for Writers:

But the Fields section is empty (I'm unsure whether that is intended; if so, perhaps add some text saying so explicitly to remove ambiguity):

The Writer section is also fairly light on details:

In doing some googling, I also came across these pages:

It would be good to fill out this section of the docs more, even if that is just adding some cross-links to the existing pages on Custom Writers/etc.

Describe alternatives you've considered.

I'm aware of this issue/PR, but the nuance is different:

This docs issue is also unrelated, except for improving the docs UX for new users in general:

@0xdevalias
Author

0xdevalias commented Jan 16, 2025

The following is a bit of a mind dump of my process/thoughts/random ideas on what would make this easier for a new user wanting to make a custom writer:

I have been trying to play with pandoc.scaffolding.Writer a little more this afternoon, and it seems like we basically just need to implement handlers for each of the node types, but we have the flexibility to override functions when needed:

  • https://pandoc.org/custom-writers.html#reducing-boilerplate-with-pandoc.scaffolding.writer
    • The pandoc.scaffolding.Writer structure is a custom writer scaffold that serves to avoid common boilerplate code when defining a custom writer. The object can be used as a function and allows to skip details like metadata and template handling, requiring only the render functions for each AST element type.

      The value of pandoc.scaffolding.Writer is a function that should usually be assigned to the global Writer

    • All predefined functions can be overwritten when needed.

I may have missed it when skimming around, but I have been struggling to figure out what default functions this actually creates:

I think it would be helpful to expand the documentation to:

  • explain more about what functions are generated/handled by this helper method
  • perhaps also to give an example of how to inspect those ourselves (in case the documentation gets out of sync with the implementation at some point)
  • an example of how to override one of the default functions
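As an illustration of the last point, overriding a scaffold function usually means saving the default and wrapping it. The sketch below uses a plain-Lua mock in place of pandoc.scaffolding.Writer (the mock and its string-returning Inlines are assumptions, so it runs outside pandoc); the same save-and-wrap pattern should apply to any scaffold field that is an ordinary function.

```lua
-- Hypothetical stand-in for pandoc.scaffolding.Writer so this runs in plain Lua.
-- The real scaffold functions may return pandoc.layout Docs, not plain strings.
local Writer = {
  Inlines = function(inlines)
    return table.concat(inlines, " ")
  end,
}

-- 1. Keep a reference to the default implementation.
local default_inlines = Writer.Inlines

-- 2. Replace it with a wrapper that reuses the default and adjusts its result.
Writer.Inlines = function(inlines)
  local rendered = default_inlines(inlines)
  return "[" .. rendered .. "]"
end

print(Writer.Inlines({ "hello", "world" }))
```

Keeping the old function in a local means the override can delegate to it rather than reimplementing it.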

The next thing I noticed was that if we just create a writer with Writer = pandoc.scaffolding.Writer and try to run it, we immediately get errors about unimplemented handlers for the various AST nodes. While this makes sense, as it is a 'custom writer' that we haven't customised, it might be nice if there were an option to delegate to a specified writer implementation for the nodes by default, so that we could override only the ones we want to change. For example, I would like a custom writer that is mostly gfm, but with tweaks for Emph / BulletList / etc. (e.g. #10527, #10479).

But since that's not currently a feature (to my knowledge), I next tried to find a list of the AST nodes I needed to implement. From a quick skim of the Lua docs, I couldn't find a neat canonical list (#3262). I'm sure I could iterate through all the headings to build one, but really I wanted to be able to copy/paste a paragraph or so and reformat it into Lua functions. Even better would be a default set of functions I could copy from the Lua filter docs/etc. that covers all of the boilerplate of setting up a 'delegating custom writer'. Or perhaps this could be 'baked in' to pandoc by adding a canonical example 'delegating custom writer' to pandoc's default data files, which we could then access via something like pandoc --print-default-data-file custom/delegating-custom-writer-example.lua; or perhaps some combination of these methods.

Luckily, I remembered that I had been reading through some of the Haskell source for the AST nodes the other day and had copied a list of them into my gist, so I could use that as a starting reference; but I don't remember where I originally found it. #3288 seems to allude to it, and links to this code that seems to define the relevant types, though it's not immediately clear to me whether that's everything I would need to implement (and as a new user, I wouldn't want to have to dig through the source to find these sorts of answers):

https://github.com/jgm/pandoc-types/blob/dc56b9a9678843649a6b1b50d255cc689fba4412/src/Text/Pandoc/Definition.hs#L315-L336

https://github.com/jgm/pandoc-types/blob/dc56b9a9678843649a6b1b50d255cc689fba4412/src%2FText%2FPandoc%2FDefinition.hs#L269-L303
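Until such a list exists in the docs, one option is to keep the constructor names as a Lua checklist. The names below are what pandoc-types 1.23 (pandoc 3.x) is assumed to define, where Figure was added and Null removed; check them against the Definition.hs linked above for the version you run. missing_handlers is a hypothetical helper, and it uses rawget so that a metatable fallback does not mask entries that are really missing.

```lua
-- Constructor names assumed to match pandoc-types 1.23 (pandoc 3.x).
local BLOCK_TYPES = {
  "Plain", "Para", "LineBlock", "CodeBlock", "RawBlock", "BlockQuote",
  "OrderedList", "BulletList", "DefinitionList", "Header",
  "HorizontalRule", "Table", "Figure", "Div",
}

local INLINE_TYPES = {
  "Str", "Emph", "Underline", "Strong", "Strikeout", "Superscript",
  "Subscript", "SmallCaps", "Quoted", "Cite", "Code", "Space",
  "SoftBreak", "LineBreak", "Math", "RawInline", "Link", "Image",
  "Note", "Span",
}

-- Return the names that have no handler stored directly in `handlers`.
-- rawget bypasses __index, so a fallback metamethod cannot hide a gap.
local function missing_handlers(handlers, names)
  local missing = {}
  for _, name in ipairs(names) do
    if rawget(handlers, name) == nil then
      missing[#missing + 1] = name
    end
  end
  return missing
end

-- Example with a hypothetical, partially implemented handler table.
local my_blocks = { Para = function() end, Header = function() end }
print(table.concat(missing_handlers(my_blocks, BLOCK_TYPES), ", "))
```

Inside a writer script, something like missing_handlers(Writer.Block, BLOCK_TYPES) should list what still needs a render function, assuming the scaffold stores handlers as plain fields.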

I was originally looking at the old djot implementation for inspiration, and noticed this pattern that seems to allow running something for unimplemented entries; I wonder if we could do similar here to hack some kind of 'delegating custom writer':

-- Any key missing from Blocks returns a stub that reports the
-- unimplemented element type on stderr.
Blocks = {}
Blocks.mt = {}
Blocks.mt.__index = function(tbl,key)
  return function() io.stderr:write("Unimplemented " .. key .. "\n") end
end
setmetatable(Blocks, Blocks.mt)

-- Same fallback for inline elements.
Inlines = {}
Inlines.mt = {}
Inlines.mt.__index = function(tbl,key)
  return function() io.stderr:write("Unimplemented " .. key .. "\n") end
end
setmetatable(Inlines, Inlines.mt)
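The two near-identical tables in that pattern can be produced by one factory. The plain-Lua sketch below (stub_table and unimplemented are hypothetical names) also records each missing handler once, so that after running a writer over a representative document you get a list of the node types still to implement. Returning an empty string instead of nothing is a design choice of this sketch, so that string concatenation in surrounding writer code does not fail.

```lua
-- Set of "Kind.Name" strings for handlers that were looked up but missing.
local unimplemented = {}

-- Build an empty handler table whose missing entries return a stub.
local function stub_table(kind)
  return setmetatable({}, {
    __index = function(_, key)
      return function()
        local name = kind .. "." .. tostring(key)
        if not unimplemented[name] then
          unimplemented[name] = true  -- report each missing handler once
          io.stderr:write("Unimplemented " .. name .. "\n")
        end
        return ""  -- render missing elements as empty output
      end
    end,
  })
end

Blocks = stub_table("Blocks")
Inlines = stub_table("Inlines")

-- Hypothetical usage: one implemented handler and two missing ones.
Inlines.Str = function(s) return s end
Inlines.Emph("x")
Blocks.Div("y")
Blocks.Div("z")  -- already recorded, so nothing new is written
```

After the writer has run, iterating pairs(unimplemented) gives the node types that were hit but not yet implemented.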
  • https://www.lua.org/pil/13.html
    • 13 – Metatables and Metamethods

    • Metatables allow us to change the behavior of a table.

    • We can use setmetatable to set or change the metatable of any table

    • It seems there is also a getmetatable
  • https://www.lua.org/pil/13.3.html
    • 13.3 – Library-Defined Metamethods

    • It is a common practice for some libraries to define their own fields in metatables.

    • The tostring function provides a typical example. As we saw earlier, tostring represents tables in a rather simple format:

      print({})      --> table: 0x8062ac0
      

      (Note that print always calls tostring to format its output.) However, when formatting an object, tostring first checks whether the object has a metatable with a __tostring field. If this is the case, tostring calls the corresponding value (which must be a function) to do its job, passing the object as an argument. Whatever this metamethod returns is the result of tostring.

    • The setmetatable/getmetatable functions use a metafield also, in this case to protect metatables. Suppose you want to protect your sets, so that users can neither see nor change their metatables. If you set a __metatable field in the metatable, getmetatable will return the value of this field, whereas setmetatable will raise an error

  • https://www.lua.org/pil/13.4.html
    • Table-Access Metamethods

    • But Lua also offers a way to change the behavior of tables for two normal situations, the query and modification of absent fields in a table.

    • https://www.lua.org/pil/13.4.1.html
      • 13.4.1 – The __index Metamethod

      • when we access an absent field in a table, the result is nil. This is true, but it is not the whole truth. Actually, such access triggers the interpreter to look for an __index metamethod: If there is no such method, as usually happens, then the access results in nil; otherwise, the metamethod will provide the result.

    • https://www.lua.org/pil/13.4.2.html
      • 13.4.2 – The __newindex Metamethod

      • The __newindex metamethod does for table updates what __index does for table accesses. When you assign a value to an absent index in a table, the interpreter looks for a __newindex metamethod: If there is one, the interpreter calls it instead of making the assignment.

      • The combined use of __index and __newindex metamethods allows several powerful constructs in Lua, from read-only tables to tables with default values to inheritance for object-oriented programming.

    • https://www.lua.org/pil/13.4.4.html
      • 13.4.4 – Tracking Table Accesses

      • Both __index and __newindex are relevant only when the index does not exist in the table. The only way to catch all accesses to a table is to keep it empty. So, if we want to monitor all accesses to a table, we should create a proxy for the real table. This proxy is an empty table, with proper __index and __newindex metamethods, which track all accesses and redirect them to the original table.
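PiL's tracking-proxy idea from 13.4.4 can be sketched in a few lines of plain Lua. track is a hypothetical helper: the proxy stays empty so that every read and write hits __index / __newindex, which log the access and forward it to the real table. Whether pandoc's scaffolding looks handlers up through a table that could be swapped for such a proxy is an untested assumption.

```lua
-- Wrap `target` in an empty proxy so every read and write is observed.
-- Each access is appended to `log` before being forwarded to `target`.
local function track(target, log)
  return setmetatable({}, {
    __index = function(_, key)
      log[#log + 1] = "get " .. tostring(key)
      return target[key]
    end,
    __newindex = function(_, key, value)
      log[#log + 1] = "set " .. tostring(key)
      target[key] = value
    end,
  })
end

local log = {}
local handlers = track({ Str = function(s) return s end }, log)

handlers.Str("hi")                        -- read of an existing field
handlers.Emph = function(e) return e end  -- write is forwarded to the real table
local _ = handlers.Strong                 -- read of a missing field (nil)

print(table.concat(log, "\n"))
```

Because writes are forwarded, the proxy itself never gains fields, so later accesses keep being logged.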

From reading more deeply about metatables and metamethods like __index, it sounds like they would probably be a good way to implement a 'delegating custom writer', and 'tracking table accesses' might give us a way to see everything being called (perhaps as a way of discovering which functions exist?).
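A minimal sketch of delegation via __index set to a table (PiL's 'inheritance' use), in plain Lua: lookups hit overrides first and fall back to base. Both tables are hypothetical; in a real writer, base would have to be a set of handlers written in Lua, since it is not clear that a built-in writer's handlers are reachable from Lua at all.

```lua
-- Hypothetical default handlers.
local base = {
  Str  = function(s) return s end,
  Emph = function(s) return "*" .. s .. "*" end,
}

-- Only the customised handlers live here; anything else is looked up in base.
local overrides = setmetatable({
  Emph = function(s) return "_" .. s .. "_" end,
}, { __index = base })

print(overrides.Str("plain"))  -- not in overrides, so found in base
print(overrides.Emph("word"))  -- found directly in overrides
```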

I thought I had tried this before, but apparently not, as it seems to work (somewhat):

Writer = pandoc.scaffolding.Writer

print("Writer:")
-- List every field on the scaffold with its key type and value type
for n, elem in pairs(Writer) do
  print(n, type(n), type(elem))

  -- For table-valued fields, also list the fields they contain
  if type(elem) == "table" then
    print("--Foo--")
    for m in pairs(elem) do
      print("  ", m, type(m))
    end
    print("--Bar--")
  end
end

Outputting:

Writer:
Inlines	string	function
Blocks	string	function
Pandoc	string	function
Inline	string	table
--Foo--
--Bar--
Block	string	table
--Foo--
--Bar--
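Because pairs only iterates a table's own fields, the empty Inline / Block tables reveal nothing about their fallback behaviour; that lives in the metatable. The plain-Lua sketch below (hypothetical helper describe_metatable, with a mock standing in for Writer.Block) uses getmetatable to list a table's metamethods.

```lua
-- Report which fields a table's metatable carries.
local function describe_metatable(tbl)
  local mt = getmetatable(tbl)
  if mt == nil then
    return "no metatable"
  elseif type(mt) ~= "table" then
    -- A __metatable field makes getmetatable return that value instead;
    -- this branch only detects the case where that value is not a table.
    return "metatable is protected: " .. tostring(mt)
  end
  local names = {}
  for key, value in pairs(mt) do
    names[#names + 1] = tostring(key) .. " (" .. type(value) .. ")"
  end
  table.sort(names)
  return table.concat(names, ", ")
end

-- Hypothetical mock: an empty handler table with an error-raising fallback.
local Block = setmetatable({}, {
  __index = function(_, key)
    error("No render function for Block value '" .. tostring(key) .. "'")
  end,
})

print(describe_metatable(Block))
```

Inside a pandoc writer script, print(describe_metatable(pandoc.scaffolding.Writer.Block)) may show an __index entry, if (as the error message below suggests, but this is unconfirmed) that is how the scaffold reports missing render functions.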

I can also see that directly calling something like Writer.Inline.Plain() raises a 'no render function' error, which I assume means the default-handling metatable functionality is implemented on Writer.Inline and Writer.Block, and that we could override it:

Writer = pandoc.scaffolding.Writer

Writer.Inline.Plain()

Running this gives:

Error running Lua:
No render function for Block value 'Plain';
define a function `Writer.Block.Plain` that returns a string or Doc.
stack traceback:
	...lias/.local/share/pandoc/custom/gfm_devalias_writer4.lua:27: in main chunk

Next, I would need to figure out whether (and how) I can access the existing writer's handlers from Lua, or, if not, find a way of doing something similar (e.g. calling write within each handler).
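One way to approximate the 'calling write within each handler' idea is a handler table whose missing entries are forwarded to a caller-supplied fallback function. delegating_handlers and stub_fallback are hypothetical; the stub keeps the sketch runnable in plain Lua. In a pandoc writer, the fallback might wrap a block element in a one-block document and call pandoc.write(pandoc.Pandoc({elem}), 'gfm'), but that is an untested assumption, and inline elements would need wrapping in a block first. Everything inside a delegated element would also be rendered by the built-in writer, so custom overrides would not apply there.

```lua
-- Build a handler table whose missing entries call `fallback(elem, tag, ...)`.
local function delegating_handlers(fallback)
  return setmetatable({}, {
    __index = function(_, tag)
      return function(elem, ...)
        return fallback(elem, tag, ...)
      end
    end,
  })
end

-- Stub fallback so the sketch runs in plain Lua; in a pandoc writer this is
-- where a call to pandoc.write might go (assumption, see above).
local function stub_fallback(elem, tag)
  return "<" .. tag .. ":" .. tostring(elem) .. ">"
end

local Block = delegating_handlers(stub_fallback)
Block.Para = function(text) return text .. "\n" end  -- customised handler

print(Block.Para("custom"))    -- uses the override
print(Block.Div("delegated"))  -- falls through to the fallback
```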

I think that's the end of my mind dump thoughts for now.

@jgm
Owner

jgm commented Jan 16, 2025

I couldn't find a neat canonical list

We should have that. We used to have a sample old-style custom writer in the source tree, but it seems to have been deleted in 79d6b45. This contained all of the things you'd need to implement.

The manual now points you to djot-writer.lua as a canonical example.

it might be nice if there was an option to allow it to delegate to a specified writer implementation for the nodes by default, and then we could just override the ones we want to change.

I've talked to @tarleb about this idea. We agree it would be great to have this, but it's not possible without some architectural changes.

@0xdevalias
Author

The manual now points you to djot-writer.lua as a canonical example.

I have been looking at that, and it's been helpful, but a few points:

  • It's no longer part of the repo:
  • The last version there seems to be written in the old style rather than with pandoc.scaffolding.Writer, so while it's a good general reference, it's more verbose than it needs to be with the modern APIs.

We agree it would be great to have this, but it's not possible without some architectural changes.

(nods) Makes sense, and I figured that was likely the case. I just wanted to make sure to raise it all the same.

@0xdevalias
Author

0xdevalias commented Jan 17, 2025

It's no longer part of the repo

Actually, looking at the commit history, it seems there is a jgm/djot.lua repository as well:

Looking at that timeline and versions of the code:

Comparing those 2 versions shows the diff is quite minimal:

--- -- Last version before moved to jgm/djot.lua: http…
+++ -- Version from https://github.com/jgm/djot/pull/2…
@@ -1,5 +1,9 @@
--- Last version before moved to jgm/djot.lua: https://github.com/jgm/djot/blob/239969fd84b6406b1f29e3514186667a5ac6c1cc/djot-writer.lua
+-- Version from https://github.com/jgm/djot/pull/233 merged in https://github.com/jgm/djot/blob/2336a695176d9ef15661faa5a425d9a372187db9/djot-writer.lua
 -- custom writer for pandoc
+-- example of upgrading of old style writer (pandoc <3.0)
+-- into new style (pandoc >= 3.0)
+-- see end of file for modified Writer function incorporating code snippet
+-- @@jarnosz at github 2023-06-01
 
 local unpack = unpack or table.unpack
 local format = string.format
@@ -439,6 +443,13 @@
 end
 
 function Writer (doc, opts)
+-- begin patch
+-- function Writer (doc, opts)
+  PANDOC_DOCUMENT = doc
+  PANDOC_WRITER_OPTIONS = opts
+  loadfile(PANDOC_SCRIPT_FILE)()
+-- return pandoc.write_classic(doc, opts) 
+-- end patch
   local d = blocks(doc.blocks, blankline)
   local notes = {}
   for i=1,#footnotes do

And diffing the last non-reverted version of djot-writer.lua in jgm/djot (before the move) against the latest version in jgm/djot.lua also shows only minimal recent changes:

--- -- Last version before moved to jgm/djot.lua: http…
+++ -- Latest version on jgm/djot.lua (from 2023-11-03…
@@ -1,4 +1,4 @@
--- Last version before moved to jgm/djot.lua: https://github.com/jgm/djot/blob/239969fd84b6406b1f29e3514186667a5ac6c1cc/djot-writer.lua
+-- Latest version on jgm/djot.lua (from 2023-11-03) https://github.com/jgm/djot.lua/blob/7d1fbbb347c0f35b92590ed799501e8ba7115156/djot-writer.lua
 -- custom writer for pandoc
 
 local unpack = unpack or table.unpack
@@ -439,12 +439,18 @@
 end
 
 function Writer (doc, opts)
+  PANDOC_WRITER_OPTIONS = opts
   local d = blocks(doc.blocks, blankline)
   local notes = {}
   for i=1,#footnotes do
     local note = hang(blocks(footnotes[i], blankline), 4, concat{format("[^%d]:",i),space})
     table.insert(notes, note)
   end
-  return layout.render(concat{d, blankline, concat(notes, blankline)}, opts.columns)
+  local formatted = concat{d, blankline, concat(notes, blankline)}
+  if PANDOC_WRITER_OPTIONS.wrap_text == "wrap-none" then
+    return layout.render(formatted)
+  else
+    return layout.render(formatted, opts.columns)
+  end
 end

Since there are minimal differences across all 3 versions, I will just look at (and comment on) the latest version in jgm/djot.lua as an example source for custom writers:

  • It still seems to be largely written as an old-style custom writer, and only half implements the changes for 3.x (namely, it sets PANDOC_WRITER_OPTIONS, but doesn't use pandoc.write_classic/etc.):
  • It's not specifically using pandoc.scaffolding.Writer, but from my understanding of how that works, I think it's implemented in basically the same way, just explicitly within the custom writer's Lua code itself:
    • e.g. This Blocks / Inlines + setmetatable implementation appears to be equivalent to pandoc.scaffolding.Writer's implementation of Writer.Block and Writer.Inline (note the singular names in the newer API)
      • The render functions for Block and Inline values can then be added to Writer.Block and Writer.Inline, respectively. (Ref)

    • This inlines / blocks implementation appears to be equivalent to pandoc.scaffolding.Writer's implementation of Writer.Blocks and Writer.Inlines
      • the functions Writer.Blocks and Writer.Inlines can be used to render lists of elements, and Writer.Pandoc renders the document’s blocks. (Ref)

    • The Writer function is roughly equivalent to pandoc.scaffolding.Writer's main generated Writer function (i.e. the callable you get with Writer = pandoc.scaffolding.Writer)
    • etc

Based on this discovery, a few things that would probably be worth doing:

@0xdevalias
Author

0xdevalias commented Jan 17, 2025

RE: AST Node Reference

I couldn't find a neat canonical list

We should have that.

This issue has a few mentions about it:

@jgm
Owner

jgm commented Jan 17, 2025

I'd like to get @tarleb's feedback on your comment, but to me these all seem like good changes! Thanks for the thorough look-through.
