Cabal exact print #65

jappeace · 2024-07-17T01:10:08Z

Ericson2314 · 2024-07-19T18:30:53Z

Does GenericPackageDescription have enough information? I'm very out of the loop on this, but I feared it might not. (Feel free to just link me something that already exists answering this.)
ExactPrintMeta is a nasty way to encode this information, because it makes it very hard to modify the AST while preserving comments (other edits would invalidate stuff in the map --- unless you have cool ideas I am missing). Maybe it is good intermediate step, but I highly doubt its what we want to end up with. This choice should be justified.

From a quick glance at the mega-issue, the approach of using Distribution.Fields does seem more promising. It would be good to discuss why that approach is not taking if you don't want to do it.

jmct · 2024-07-19T18:31:12Z

@andreabedini, it would be useful to have your thoughts on this as someone that's considered this stuff deeply before.

gbaz · 2024-07-19T19:07:52Z

Having reviewed everything I see there's already a lot of work in this draft PR: https://github.com/haskell/cabal/pull/9436/files

It would be good to clarify if this proposal is to continue to extend that work, or to do something else inspired by it, or something else. It looks like the first to me!

If so, it would be good to clarify A) what remains to be done given that work already existing, and B) if this approach has been tested in terms of what happens if we parse, modify, and then exact-print. Are there examples of this -- and where in that code is the algorithm that sort of "merges" the changes back into the exact-printed structure?

Reviewing that PR I think I recall that the code looked very promising, and answers a lot of questions that the proposal itself isn't in-depth enough to get to.

jappeace · 2024-07-23T00:17:29Z

@gbaz
sorry I forgot to directly link, but that's the PR indeed, I added it to the proposal.

B) if this approach has been tested in terms of what happens if we parse, modify, and then exact-print. Are there examples of this

There are no tests for this right now, we should add tests for this, please list which modifications are important and I'll add an automated test for each of them.

and where in that code is the algorithm that sort of "merges" the changes back into the exact-printed structure?

The current implementation will "improvise" and fall back to the pretty printer.
I think what it should do is keep track of improvisations and relatively shift subsequent exact occurrences.

it does something like this right now but it's kind of adhoc:
https://github.com/haskell/cabal/pull/9436/files#diff-53cbe1fb815e26b11060bd5c78c372013a079192b8acad5c1c6044ecc647dcefR160

and it isn't quite right, because it'll find negative relative rows and do nothing with that information, but it indicates those improvisations have occurred: https://github.com/haskell/cabal/pull/9436/files#diff-53cbe1fb815e26b11060bd5c78c372013a079192b8acad5c1c6044ecc647dcefR137

For comment placement you can do the same relative shifts per row, but for columns you may need to keep right shifting them untill you find place (under modification)

jappeace · 2024-07-23T00:41:11Z

@Ericson2314

Hi eric,

Does GenericPackageDescription have enough information? I'm very out of the loop on this, but I feared it might not. (Feel free to just link me something that already exists answering this.)

as far as I can tell it's the root type used for dealing with .cabal, for example it's used in readSourcePackageCabalFile, and they benchmark for the parseGenericPackage function:
https://github.com/haskell/cabal/blob/705b6ebcaed649ed1a30b138b4348d24722d610a/cabal-benchmarks/bench/CabalBenchmarks.hs#L22
up till now I've added some fields for things that aren't used by cabal (for example, exact positions and comments), but I think the main cabal library uses that type as well to do most things.

ExactPrintMeta is a nasty way to encode this information, because it makes it very hard to modify the AST while preserving comments

I'm open to suggestions! Comments specifically are underdeveloped in its current form because it was the zurich hack prototype which I barely got working 😄

(other edits would invalidate stuff in the map

Invalidation works kinda great for exact positions themselves (eg for fields and sections).

Perhaps using a map wasn't the best idea for comments,
it should be something like Map Ycoordinate (XCoordinate, Text).
y can be relatively calculated like I mentioned above (keeping track of "improvisations"),
and x you try to place on a row, if there is no space you just keep shifting right until you find space.
That way you can modify GenericPackageDescription and preserve the comment.

gbaz · 2024-07-23T01:34:31Z

Thanks for the updates and comments. On the thing you added with remaining work, I think a proposal we accept would need some sense of how we can architect thing to solve those problems, since they're pretty fundamental:

Redo how common stanza's are handled (they're currently "merged" into sections directly, which is unrecoverable).
add support for comma printing,
add support for braces.
add support for conditional branches.

Specifically, I think that adding support for common stanzas and conditionals is going to be both very tricky and very important. Meanwhile, adding support for braces can be out of scope -- i.e. if braces are translated away, I think that's livable -- they're a basically unused feature. And commas should be straightforward.

andreabedini · 2024-07-23T14:42:15Z

It is great to see you pushing forward with this @jappeace. I know you have invested significant time on it already and we should not let that effort go unnoticed. I also liked working with you on this at ZuriHac, if only that weekend had been a few days longer!

With respect to the proposal. Thanks for mentioning my Field-based proposal, but I think you misunderstand it.

A related effort is to build combinators that allow modifyng the Field type directly. This would depracate the GenericPackage structure and make an alternative structure available. [..] I suppose the idea is to completly replace GenericPackageDescription with this Field type.

Replacing GenericPackageDescription (GPD) has never been on the cards. My approach, like yours, was to make the minimal change that is a step forward. Therefore what I have been working on was intentionally ... anticlimactic: forget about the exact-print of GPD and start with the exact-print of Field. The reasoning is that it is Field that actually describes the syntax cabal files (and other files) are written in.

A value of type Field a (where a is a type parameter used to annotate parsed values) does correspond 1-to-1 to what is written in a cabal file, with the non-trivial caveat that comments and the whitespaces have already been stripped by the lexer. GPD, ProjectConfig and InstalledPackageInfo get all "parsed" from this lower-level Field type.

To make a long story short: my plan was to get the lexer to keep whitespace and comments so I could exactprint Field. Changing the lexer was easy but the lexer and the Field parser are interlevead and I never managed to adjust the parser side. I thought at ZuriHac you did succeed in fixing the parser side? You might have already integrated the two approaches in your PR, I will need to check.

In any case, the two approaches are not incompatible or in alternative. I believe you must have gone past the obstacles I have encountered, which now leaves me to decide what to do with my efforts :-)

Unlike the Field effort, this proposal only focusess only on getting exact printing to work with a minimal footprint. We don't want to do any additional refactoring.

Maybe I misunderstand the plan but I am not sure whether GPD can even be mapped to the concrete grammar. @gbaz rightly mentions the common stanzas, which get inlined directly during the Field to GPD parsing. How do we keep the knowledge of common stanzas in GPD without a significant refactoring?¹

There are some constraints I think we should keep in mind.

I believe whatever framework of functionality should be part of Cabal-syntax.
The API should be well integrated with the rest (where it makes sense. Cabal-syntax is a bit of a kitchen sink already).
Ideally the set of parseable cabal files should not change, although pragmatically I would say it's ok to only check those on Hackage.
Operations like cabal update and cabal build --dry-run should not get slower.
The memory demand on workloads like hackage-server should not increase.

All this might well within reach, you just need to prove it :P

I do not mean to give a negative connotation to "significant refactoring", likely it would be beneficial. Nevertheless I think it is unavoidable. ↩

jappeace · 2024-08-07T20:29:24Z

@andreabedini

In any case, the two approaches are not incompatible or in alternative. I believe you must have gone past the obstacles I have encountered, which now leaves me to decide what to do with my efforts :-)

I strongly suspect this approach I'm taking will work, but I don't wish to discourage you.

To make a long story short: my plan was to get the lexer to keep whitespace and comments so I could exactprint Field.

Could you elaborate on how this deals with changes made? For example, if hls wants to add a dependency to a library.

I thought at ZuriHac you did succeed in fixing the parser side? You might have already integrated the two approaches in your PR, I will need to check.

yes it passes the tests. Like mentioned in the proposal, it's not complete however, not all comment positions are captured.
I pushed all changes I made on zurich hack to pr linked in this proposal, for example:

https://github.com/haskell/cabal/pull/9436/files#diff-cd54b0e388dc16fd5e820e41341f642c3a526d143a6768014478e56984a92751L138

I believe whatever framework of functionality should be part of Cabal-syntax.

I'm not familiar with the cabal package structure, why is this desired?
I think the current implementation already abides by this.

The API should be well integrated with the rest (where it makes sense. Cabal-syntax is a bit of a kitchen sink already).

Could you elaborate on this?

Ideally the set of parseable cabal files should not change, although pragmatically I would say it's ok to only check those on Hackage.

I already mentioned this I think.
https://github.com/haskellfoundation/tech-proposals/pull/65/files#diff-e5972723ec21675420683d52a3ddc437d65024bd60428c4a50ff09a0ba53c87aR171

The memory demand on workloads like hackage-server should not increase.

I mentioned in the proposal it should not crash, because no increase at all seems a bit stringent. Please review:
https://github.com/haskellfoundation/tech-proposals/pull/65/files#diff-e5972723ec21675420683d52a3ddc437d65024bd60428c4a50ff09a0ba53c87aR393

Operations like cabal update and cabal build --dry-run should not get slower.

Why do we want this?

jappeace · 2024-08-07T20:30:27Z

@gbaz I added some ideas for both common stanzas and conditionals.

gbaz · 2024-08-08T02:59:09Z

Thanks for the updates. Starting to shape up.

Ericson2314 · 2024-08-15T21:12:10Z

@jappeace Even with relative offsets, the separate map idea sounds very hard to work with to me. I still strongly lean towards the Field approach. I don't yet see any downsides to it, just upsides.

gbaz · 2024-08-16T01:23:02Z

Why do we want this?

We want operations to not get slower because cabal needs to parse the full tarball of cabal files to build its database to solve, etc. If parsing is slow, then it becomes amplified at scale, and makes end-users frustrated when common commands take longer.

gbaz · 2024-08-16T01:28:01Z

in terms of "Refactor GenericPackageDescription so that the parser no longer merges these sections and instead stores them as proper records. Then make the callsites smart enough to deal with common stanzas." I think this will be very painful. There are a lot of callsites and a lot of usages -- especially since so many fields can be within common stanzas. A remembering and "smart back-merge" algo seems much more lightweight.

As far as this approach vs fields, the advantage to this gpd approach that I see is that it would let people modify the gpd, which is a much more "natural" structure to the data, as opposed to modifying fields directly. So its like letting people manipulate an actual datastructure vs the json-encoding of that structure.

In fact, roughly, Fields : GPD :: Json : ADT

Bodigrim · 2024-08-23T20:50:59Z

I greatly appreciate the effort, so please consider my remarks below not as a critique against but as an encouragement to make the proposal stronger.

In a world of infinite resources having an exact printer is better than not having one. But in the real world I'd suggest to adopt a problem-based approach. What is a practical problem which an exact printer can solve, which cannot be reasonably solved by other, already available methods? There is no such business ask as "bidirectional parsing and printing" and I hardly imagine an HF sponsor immediately excited about by such achievement.

The proposal lists automated addition of dependencies, expansion of modules and formatting. But we do have (maybe, imperfect) automated tools for all of this already. Would it be easier to accomplish such tasks with an exact printer? How exactly? To pick a particular instance, what kind of impact would an exact printer have for cabal-add?

As much as I can tell from a cursory glance, GHC exact printer needs lots of maintenance. Pretty much every release of GHC requires non-trivial amount of work in this area. We'd better have a very good justification to invest into one for Cabal.

it's kindoff annoying because the reference implementation already uses this, but somehow people aren't connecting the dots.

Bodigrim · 2024-08-26T15:45:42Z

For example a newly introduced cabal-add, would need to take into account any
syntax change to cabal, forever.

I don't quite understand this, cabal-add does not parse Cabal files on its own.

jappeace · 2024-08-26T16:02:24Z

It looks like parsing to me because it's working directly on Field type which is what the parser in cabal does.
but I wrote that because I saw stuff like detectLeadingComma this:
https://github.com/Bodigrim/cabal-add/blob/master/src/Distribution/Client/Add.hs#L393

Bodigrim · 2024-08-26T16:19:11Z

cabal-add uses detectLeadingComma to detect whether a Cabal file uses either leading or trailing commas for build-depends, so that it can mimic the convention in a new entry. How would exact printing help with this? It seems to me it will only make this property harder to detect.

It would help me to see a specific example how cabal-add would look like if the proposal is implemented. Having [Field Position] is already quite powerful, and we can convert it to [Field ByteString] without much work, so I'm somewhat unsure whether full-fledged exact print annotations bring enough additional value.

jappeace · 2024-08-26T16:59:10Z

the type we wish to expose isn't Field 😅
I added an example to clarify why.

There are many ways of capturing the cruft of commas and spacing, the one I proposed here is just an imperfect but hopefully decent solution,
I think it'll make manipulating cabal files much easier.

jappeace added 6 commits November 15, 2023 23:06

cabal exact pritn

562a746

exact printer

f5ef56c

add more text

bb5087d

clear out draft

2a6f04d

Add intro

9367c7b

Remove riuga

99f519a

jappeace added 2 commits July 22, 2024 19:32

Add direct link to the exact printer pull request

3c138b4

Draft out current work left todo

49a0679

jappeace added 3 commits July 30, 2024 15:48

Draft out common stanzas design

fe8bea5

add note on previous efforts

75c8ab6

Add notes on conditionals

61a5916

jappeace added 2 commits August 7, 2024 18:31

Add more lines

c79a135

Fix grammar

a50d449

jappeace changed the title ~~Jappie cabal exact print~~ Cabal exact print Aug 10, 2024

jappeace added 2 commits August 26, 2024 11:17

Section on why we need exact printing and how users programs benefit

a127fcb

spell errors

163731e

jappeace added 3 commits August 26, 2024 11:19

Add section on field because it comes up a lot

13c280c

it's kindoff annoying because the reference implementation already uses this, but somehow people aren't connecting the dots.

recover heading

b014e2d

Re-read doc, delete irrelevant stuff, fix various grammar issues

38b0e9b

jappeace added 2 commits August 26, 2024 12:56

Add example on why use GenericPackageDescription instead of Field

b5139ef

Add note on lack of syntax mangling

4338943

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cabal exact print #65

Cabal exact print #65

jappeace commented Jul 17, 2024 •

edited by gbaz

Loading

Ericson2314 commented Jul 19, 2024

jmct commented Jul 19, 2024

gbaz commented Jul 19, 2024

jappeace commented Jul 23, 2024

jappeace commented Jul 23, 2024

gbaz commented Jul 23, 2024

andreabedini commented Jul 23, 2024

jappeace commented Aug 7, 2024

jappeace commented Aug 7, 2024

gbaz commented Aug 8, 2024

Ericson2314 commented Aug 15, 2024

gbaz commented Aug 16, 2024

gbaz commented Aug 16, 2024

Bodigrim commented Aug 23, 2024

Bodigrim commented Aug 26, 2024

jappeace commented Aug 26, 2024

Bodigrim commented Aug 26, 2024

jappeace commented Aug 26, 2024

Cabal exact print #65

Are you sure you want to change the base?

Cabal exact print #65

Conversation

jappeace commented Jul 17, 2024 • edited by gbaz Loading

Ericson2314 commented Jul 19, 2024

jmct commented Jul 19, 2024

gbaz commented Jul 19, 2024

jappeace commented Jul 23, 2024

jappeace commented Jul 23, 2024

gbaz commented Jul 23, 2024

andreabedini commented Jul 23, 2024

Footnotes

jappeace commented Aug 7, 2024

jappeace commented Aug 7, 2024

gbaz commented Aug 8, 2024

Ericson2314 commented Aug 15, 2024

gbaz commented Aug 16, 2024

gbaz commented Aug 16, 2024

Bodigrim commented Aug 23, 2024

Bodigrim commented Aug 26, 2024

jappeace commented Aug 26, 2024

Bodigrim commented Aug 26, 2024

jappeace commented Aug 26, 2024

jappeace commented Jul 17, 2024 •

edited by gbaz

Loading