2d print #1155
base: master
* Improving handling for Infix, Prefix and Postfix format
Character-based printing does not feel like a "core" function. Instead, it is purely a formatting function, done after boxing. Can this be reformulated as a formatting process instead?
@rocky, indeed, my idea when I started writing this (circa July 2023) was to put it in a module. But I have not yet been able to redefine the boxing mechanism and the package interface so that it works as an independent package. In any case, most of the code is independent enough to be moved into a package once the other pieces are ready.
It does not necessarily have to be in a Mathics3 module outside of the mathics core repository, but it should be outside of (And yes, if we could hook into SymPy's format routines, that would also be awesome.) I fear that we are making things worse for ourselves in the long run by violating modularity or the separation of phases. This kind of thing has happened in this project in the past, and it has caused a lot of extra work that has taken a long time to address (and some of it still hasn't been fully addressed).
wolframscript has:
Similarly: these appear differently in wolframscript:
Comparing with what SymPy produces:
This is different from wolframscript in that some Unicode symbols seem to be used:
Indeed, what I built here is a kind of experiment. I didn't try too hard to mimic exactly what SymPy or WMA does, apart from building a "2D" text representation, that is what
@@ -233,7 +233,7 @@ def fraction(a: Union[TextBlock, str], b: Union[TextBlock, str]) -> TextBlock:
         a = TextBlock(a)
     if isinstance(b, str):
         b = TextBlock(b)
-    width = max(b.width, a.width) + 2
+    width = max(b.width, a.width)
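To make the effect of this one-line change concrete, here is a minimal, self-contained sketch. `TextBlock` and `fraction` below are hypothetical stand-ins, not the code in `mathics.format.pane_text`, and the sketch assumes (as the diff layout suggests) that the `+ 2` line is the one being removed. The `padding` parameter is an illustration-only knob: `padding=2` reproduces the old bar, `padding=0` the new one.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TextBlock:
    """Hypothetical 2D text block: a list of rows, all padded to the same width."""

    lines: List[str]

    @classmethod
    def from_str(cls, text: str) -> "TextBlock":
        rows = text.split("\n")
        width = max(len(row) for row in rows)
        return cls([row.ljust(width) for row in rows])

    @property
    def width(self) -> int:
        return max((len(row) for row in self.lines), default=0)

    def __str__(self) -> str:
        return "\n".join(self.lines)


def fraction(a: Union[TextBlock, str], b: Union[TextBlock, str], padding: int = 0) -> TextBlock:
    """Stack a over b with a horizontal bar in between.

    padding=2 mimics the removed ``+ 2``; padding=0 mimics the new line.
    """
    if isinstance(a, str):
        a = TextBlock.from_str(a)
    if isinstance(b, str):
        b = TextBlock.from_str(b)
    width = max(b.width, a.width) + padding
    # Center every row of the numerator and denominator over the bar.
    rows = [row.center(width) for row in a.lines]
    rows.append("-" * width)
    rows.extend(row.center(width) for row in b.lines)
    return TextBlock(rows)
```

With `padding=0` the bar is exactly as wide as the wider operand, which, as far as I can tell, is also how SymPy sizes it (for example, `sympy.pprint(x/y)` draws a one-character bar).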
Instead of adding more original code, please look into hooking into SymPy's character-based formatting mechanism.
Otherwise, this may be another case where effort is put into creating something that is later removed, because there is something more complete that we won't have to maintain.
If it turns out that we can't use SymPy's character-based printing, then we can go down this road.
Indeed, the plan is to replace the use of `mathics.format.pane_text` by `sympy.printing.pretty.stringpict`, which is something that I have already started to do.
What is not possible is simply using `sympy.pretty(expression.to_sympy())`, because the translation function, which is useful for evaluation, is not useful for formatting.
For example
>>> import sympy
>>> from mathics.session import MathicsSession
>>> session=MathicsSession()
>>> expr=session.evaluate("Integrate[f[x]/g[x]^2,x]")
>>> print(sympy.pretty(expr.to_sympy()))
produces
⌠
⎮ SympyExpression(_uGlobal`f, _uGlobal`x)[Global`f[Global`x]])
⎮ ───────────────────────────────────────────────────────────── d(_uGlobal`x)
⎮ 2
⎮ SympyExpression(_uGlobal`g, _uGlobal`x)[Global`g[Global`x]])
⌡
while
>>> expr=session.evaluate("Integrate[f[x]/g[x]^2,{x,a,b}]")
>>> print(sympy.pretty(expr.to_sympy()))
is not even able to identify the integrate symbol:
SympyExpression(_uSystem`Integrate, SympyExpression(_uGlobal`f, _uGlobal`x)/SympyExpression(_uGlobal`g, _uGlobal`x)**2, SympyExpression(_uSystem`List, _uGlobal`x, _uGlobal` ↪
↪ a, _uGlobal`b))[System`Integrate[System`Times[Global`f[Global`x], System`Power[Global`g[Global`x], -2]], {Global`x,Global`a,Global`b}]])
In any case, the purpose of this PR is to:
- consider whether this 2D format is something that we would like to use in the REPL;
- bring the formatting routines closer to the ones used in WL;
- explore a possible design pattern (based on what we already have) to connect a formatted Mathics expression to a `prettyForm` object.
> Indeed, the plan is to replace the use of `mathics.format.pane_text` by `sympy.printing.pretty.stringpict`, which is something that I already started to do. What is not possible to do is just using `sympy.pretty(expression.to_sympy())` because the translation function which is useful for evaluation, is not useful for formatting. For example
>
>     >>> import sympy
>     >>> from mathics.session import MathicsSession
>     >>> session=MathicsSession()
>     >>> expr=session.evaluate("Integrate[f[x]/g[x]^2,x]")
>     >>> sympy.pretty(expr.to_sympy())
This is clearly wrong because `expr` needs to be boxed first. SymPy formatting routines are triggered by the formatting process of boxed expressions.
This is setting up a strawman or superficial argument only to be able to shoot it down.
The time spent adjusting the bar in a division is, I think, better spent on getting to the skeleton of a possible solution. The plan that you wanted written down has you working on revising Boxing. When that is in place, we might be in an even better position to work on the boxing-to-formatting step needed for character-based printing.
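The two phases described here (box the expression first, then format the boxes) can be sketched in a few lines. Everything below is hypothetical: expressions are plain `(head, *args)` tuples, the box classes only borrow their names from WL's `RowBox`, `FractionBox`, and `SuperscriptBox`, and none of it is Mathics' actual implementation. The point is that `make_boxes` decides structure without drawing any characters, and only `render` knows about character rows and baselines.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Hypothetical box types; the names are borrowed from WL for readability only.
@dataclass
class RowBox:
    items: List["Box"]


@dataclass
class FractionBox:
    num: "Box"
    den: "Box"


@dataclass
class SuperscriptBox:
    base: "Box"
    sup: "Box"


Box = Union[str, RowBox, FractionBox, SuperscriptBox]


def make_boxes(expr) -> Box:
    """Phase 1 (boxing): choose a layout structure; no characters are drawn yet."""
    if not isinstance(expr, tuple):
        return str(expr)
    head, *args = expr
    if head == "Divide":
        return FractionBox(make_boxes(args[0]), make_boxes(args[1]))
    if head == "Power":
        return SuperscriptBox(make_boxes(args[0]), make_boxes(args[1]))
    # Anything else uses the generic head[arg1, arg2, ...] layout.
    items: List[Box] = [head, "["]
    for i, arg in enumerate(args):
        if i:
            items.append(", ")
        items.append(make_boxes(arg))
    items.append("]")
    return RowBox(items)


def render(box: Box) -> Tuple[List[str], int]:
    """Phase 2 (formatting): box -> (equal-width text rows, baseline row index)."""
    if isinstance(box, str):
        return [box], 0
    if isinstance(box, FractionBox):
        num, _ = render(box.num)
        den, _ = render(box.den)
        width = max(len(num[0]), len(den[0]))
        rows = [r.center(width) for r in num] + ["-" * width] + [r.center(width) for r in den]
        return rows, len(num)  # the fraction bar sits on the baseline
    if isinstance(box, SuperscriptBox):
        base, base_line = render(box.base)
        sup, _ = render(box.sup)
        bw, sw = len(base[0]), len(sup[0])
        rows = [" " * bw + r for r in sup] + [r + " " * sw for r in base]
        return rows, len(sup) + base_line
    # RowBox: pad every item to a common baseline, then join the rows side by side.
    parts = [render(item) for item in box.items]
    above = max(b for _, b in parts)
    below = max(len(rows) - b - 1 for rows, b in parts)
    out = [""] * (above + below + 1)
    for rows, b in parts:
        blank = " " * len(rows[0])
        padded = [blank] * (above - b) + rows + [blank] * (below - (len(rows) - b - 1))
        out = [o + p for o, p in zip(out, padded)]
    return out, above
```

For example, `render(make_boxes(("f", ("Divide", "x", "abc"))))` yields the three rows `"   x  "`, `"f[---]"`, `"  abc "` with the baseline on the middle row. Swapping `render` for another formatter would not require touching `make_boxes`.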
> This is clearly wrong because `expr` needs to be boxed first. Calling SymPy formatting routines are triggered from formatting boxes.
> This is setting up a strawman or superficial argument only to be able to shoot it down.
Please do not take this the wrong way: the example was just a way to show some of the challenges in hooking up `sympy.pretty`: even at the level of symbols, some preprocessing must be done. What is probably easier to hook into is `sympy.printing.pretty.stringpict.prettyForm`, which is quite analogous to what I put in `pane_text`.
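As an illustration of the symbol-level work being discussed, the sketch below turns fully qualified names such as ``Global`x`` (as they appear in the SymPy output above) into the strings a user expects to see. The function names and the hard-coded context tuple are hypothetical; the tuple only loosely imitates how WL hides the contexts that are on `$ContextPath`.

```python
from typing import Iterable, Tuple

# Hypothetical stand-in for the contexts whose prefix is hidden on output.
VISIBLE_CONTEXTS: Tuple[str, ...] = ("System`", "Global`")


def display_name(full_name: str, contexts: Iterable[str] = VISIBLE_CONTEXTS) -> str:
    """Return the string shown to the user for a fully qualified symbol name."""
    for context in contexts:
        if full_name.startswith(context):
            short = full_name[len(context):]
            # Keep names in nested contexts (e.g. "Global`Private`y") fully qualified.
            if "`" not in short:
                return short
    return full_name


def display_tree(expr):
    """Apply display_name to every symbol name in a nested (head, *args) tuple."""
    if isinstance(expr, tuple):
        return tuple(display_tree(e) for e in expr)
    return display_name(expr) if isinstance(expr, str) else expr
```

Whether this step is called "preprocessing" or part of formatting, it operates on display strings and is independent of the evaluation-oriented `to_sympy()` translation.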
`sympy.pretty` works using a `sympy.printing.pretty.PrettyPrinter` object that does something similar to what I did in `mathics.format.prettyprint`.
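For reference, SymPy's printers dispatch on the class of the object being printed: roughly, for each class in the object's MRO they look for a method named `_print_<ClassName>`, and fall back to a default printer otherwise. The sketch below reproduces only that dispatch idea in plain Python; the expression classes and `MiniPrinter` are hypothetical and far simpler than SymPy's `PrettyPrinter`.

```python
from dataclasses import dataclass
from typing import Tuple


class Expr:
    """Hypothetical base class for a toy expression tree."""


@dataclass
class Symbol(Expr):
    name: str


@dataclass
class Integer(Expr):
    value: int


class PositiveInteger(Integer):
    """No printer method of its own: dispatch falls back to _print_Integer."""


@dataclass
class Plus(Expr):
    args: Tuple[Expr, ...]


class MiniPrinter:
    def doprint(self, expr) -> str:
        # Walk the MRO so that subclasses reuse a parent's _print_ method.
        for cls in type(expr).__mro__:
            method = getattr(self, "_print_" + cls.__name__, None)
            if method is not None:
                return method(expr)
        return self.fallback(expr)

    def fallback(self, expr) -> str:
        # Objects with no matching _print_ method are printed with repr().
        return repr(expr)

    def _print_Symbol(self, expr: Symbol) -> str:
        return expr.name

    def _print_Integer(self, expr: Integer) -> str:
        return str(expr.value)

    def _print_Plus(self, expr: Plus) -> str:
        return " + ".join(self.doprint(arg) for arg in expr.args)
```

Supporting a new kind of expression then means adding one `_print_` method rather than extending a central `if`/`elif` chain.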
> The time spent adjusting the bar in a division I think is better spent towards getting to the skeleton of a possible solution. The plan that you wanted written down has you working on revising Boxing. When that is in place, we might be in an even better position to work on the boxing to formatting step needed in character-based printing.
OK, but the skeleton of the solution that I propose is already here: when an expression is wrapped by `OutputForm` (even if `$Use2DOutputForm` is set to `False`), the formatting process follows a sequence closer to the one (I think) WMA follows.
In any case, I wanted to put this here, because I will need it to present my case when I propose the other changes.
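A minimal sketch of the idea that the formatting sequence stays the same whether or not the flag is set, and only the final layout changes. `output_form` and its `use_2d` parameter are hypothetical stand-ins for the `OutputForm` path and `$Use2DOutputForm`; only `Power` and `Divide` with string operands are handled.

```python
from typing import Tuple, Union

# Hypothetical expression shape: a leaf string, or (head, left, right)
# where both operands are leaf strings.
Expr = Union[str, Tuple[str, str, str]]


def _inline(operand: str) -> str:
    # Parenthesize compound operands so the one-line form stays unambiguous.
    return operand if operand.isalnum() else f"({operand})"


def output_form(expr: Expr, use_2d: bool = False) -> str:
    """Toy OutputForm formatter.

    The same dispatch on the head runs either way; only the final layout
    depends on use_2d (a hypothetical stand-in for $Use2DOutputForm).
    """
    if isinstance(expr, str):
        return expr
    head, left, right = expr
    if head == "Power":
        if not use_2d:
            return f"{_inline(left)}^{_inline(right)}"
        # Exponent on the row above, shifted right past the base.
        top = " " * len(left) + right
        bottom = left + " " * len(right)
        return f"{top}\n{bottom}"
    if head == "Divide":
        if not use_2d:
            return f"{_inline(left)}/{_inline(right)}"
        width = max(len(left), len(right))
        return "\n".join([left.center(width), "-" * width, right.center(width)])
    raise ValueError(f"unsupported head: {head}")
```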
> Please, do not take this in a wrong way: the example was just a way to show some of the challenges in hooking up `sympy.pretty`: even at the level of symbols, a preprocessing must be done.
I was aware of the challenges and the process well before you started this PR. "preprocessing" is the wrong word/concept. In the formatting process, objects like SymPy symbols get transformed to strings.
> Probably what is more easy to hook is the `sympy.printing.pretty.stringpict.prettyForm`, which is quite analogous to what I put in `pane_text`.
> `sympy.pretty` works using a `sympy.printing.pretty.PrettyPrinter` object that does something similar to what I did in `mathics.format.prettyprint`.
Except more effort, time, and thought was probably put into the `sympy.printing.pretty.PrettyPrinter` object. It may be that it is more elucidating for you to write some code so you understand the basic concepts rather than look at someone else's code. For me, though, this kind of thing is more of a distraction and possibly a dangerous activity, because I get the feeling that if I don't mention something, this kind of thing will get into the code base and then we'll want to remove it later on. We have seen this kind of thing too often. Furthermore, we haven't dug out of the previous messes fully yet.
> OK, but the skeleton of the solution that I propose is already here: When an expression is wrapped by `OutputForm` (even if `$Use2DOutputForm` is set to `False`) the formatting process follows a sequence closer to the one (I think) WMA follows.
There are probably very many situations that aren't covered and haven't been considered. And in the first few things I tried, I saw differences.
BTW, I don't like the term "2D". This is character-based output. Most output, such as SVG, MathML, and LaTeX, is 2D.
> In any case, I wanted to put this here, because I will need it to present my case when I propose the other changes.
Personally, I would prefer if you discuss what changes you want to propose at a high level first. If I need detailed code to understand, then we can code this out. If you need to write some sample code for yourself, sure, do that. But unless others express interest in seeing this, these branches don't help me in a positive way. Rather, it feels negative, because I see flailing about where it feels to me there shouldn't be flailing like this.
In my view, the developer docs describe in pretty good detail how the system transforms M-expressions to boxed-expressions, and then to formatted output.
OK, but as a name for functions, it would be a little long, wouldn't it? Here I didn't want to call it "OutputForm", because everywhere else "OutputForm" is still a "one-dimensional, using only keyboard characters" form, which most of the time is the same as `InputForm`, but with spaces between infix operators and operands.
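To pin down the distinction described here, the toy formatter below makes the two one-dimensional forms differ only in the spacing around infix operators, as the comment describes the current `OutputForm`. The operator table and precedence numbers are made up for the example (only their relative order matters), and the sketch does not attempt to reproduce the exact rules of Mathics or WMA.

```python
from typing import Dict, Tuple

# Hypothetical operator table: head -> (operator text, precedence).
INFIX: Dict[str, Tuple[str, int]] = {
    "Equal": ("==", 1),
    "Plus": ("+", 2),
    "Times": ("*", 3),
}


def to_string(expr, form: str = "InputForm", parent_prec: int = 0) -> str:
    """One-dimensional formatter; form is "InputForm" or "OutputForm"."""
    if not isinstance(expr, tuple):
        return str(expr)
    head, *args = expr
    if head in INFIX:
        op, prec = INFIX[head]
        # The only difference between the two forms: spaces around the operator.
        sep = op if form == "InputForm" else f" {op} "
        text = sep.join(to_string(arg, form, prec) for arg in args)
        # Parenthesize when a looser operator appears inside a tighter one.
        return f"({text})" if prec < parent_prec else text
    inner = ", ".join(to_string(arg, form) for arg in args)
    return f"{head}[{inner}]"
```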
In any case, the question is: supposing we find and agree on an implementation of this "two-dimensional using only keyboard characters", is it something that we want to have available (at least optionally) in the command-line interface?
I guess what I was trying to say is that if you have to drop something in the description, dropping "character-based" is bad, in the same way that condensing "strawberry" to "straw" rather than "berry" is not helpful.
"Character2D" is not too long. But the word "character" is as important as 2D.
> In any case, the question is: supposing we found and agree on an implementation of this "two-dimensional using only keyboard characters", is it something that we want to have (at least optional) available in the command line interface?
It feels to me that we are thinking about this the wrong way. To me, this is like asking if TeXForm should be in the command-line interface. And whether we should be able to return MathML formatted output in the command-line interface.
To me, the focus of the implementation should be on how things are boxed and formatted in a generic and general way, not on what is appropriate for a particular front end. The current implementation is lacking in the generic and general nature. It was particularly evident when this was in `mathics.core`, which 2D character-based formatting is totally inappropriate for.
Here is another example from our code base. We have these wacky and complicated regular expressions for handling doctests inside of docstrings. I imagine the person who started this may have thought it cool to recreate Sphinx using regular expressions. Those regular expressions use some pretty advanced and little-used tagging mechanisms. If you want to show off how cleverly you can code, great. But as far as handling the underlying problem in a uniform, maintainable, and comprehensible way, this code totally fails.
Let's defer character-based 2D output formatting until after we have Boxing under control. Formatting is intimately tied to Boxing. And if we have a good Boxing mechanism, I think you'll see how easily character-based 2D formatting falls out from that.
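To illustrate the separation argued for here, the sketch below keeps a single, front-end-neutral box tree and plugs in one formatter per output target. The tuple encoding of boxes, the function names, and the `FORMATTERS` registry are hypothetical; real TeX and MathML output would need many more box kinds (and proper TeX escaping).

```python
from html import escape
from typing import Callable, Dict, Tuple, Union

# Hypothetical boxes: a plain string, or (kind, first, second) with
# kind in {"FractionBox", "SuperscriptBox"}.
Box = Union[str, Tuple[str, "Box", "Box"]]


def to_text(box: Box) -> str:
    """Linear output using only keyboard characters."""
    if isinstance(box, str):
        return box
    kind, a, b = box
    op = {"FractionBox": "/", "SuperscriptBox": "^"}[kind]
    return f"{_text_operand(a)}{op}{_text_operand(b)}"


def _text_operand(box: Box) -> str:
    # Non-atomic operands are parenthesized so the linear form stays unambiguous.
    text = to_text(box)
    return text if isinstance(box, str) else f"({text})"


def to_tex(box: Box) -> str:
    if isinstance(box, str):
        return box
    kind, a, b = box
    if kind == "FractionBox":
        return rf"\frac{{{to_tex(a)}}}{{{to_tex(b)}}}"
    if kind == "SuperscriptBox":
        return f"{{{to_tex(a)}}}^{{{to_tex(b)}}}"
    raise ValueError(f"unknown box kind: {kind}")


def to_mathml(box: Box) -> str:
    if isinstance(box, str):
        tag = "mn" if box.isdigit() else "mi"
        return f"<{tag}>{escape(box)}</{tag}>"
    kind, a, b = box
    tag = {"FractionBox": "mfrac", "SuperscriptBox": "msup"}[kind]
    return f"<{tag}>{to_mathml(a)}{to_mathml(b)}</{tag}>"


# The front end picks a formatter; the boxes themselves never change.
FORMATTERS: Dict[str, Callable[[Box], str]] = {
    "text": to_text,
    "tex": to_tex,
    "mathml": to_mathml,
}
```

Under this shape, a character-based 2D formatter would be one more entry in the registry rather than something wired into the core.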
> In any case, the question is: supposing we found and agree on an implementation of this "two-dimensional using only keyboard characters", is it something that we want to have (at least optional) available in the command line interface?
> It feels to me that we are thinking about this the wrong way. To me, this is like asking if TeXForm should be in the command-line interface. And whether we should be able to return MathML formatted output in the command-line interface.
TeXForm is useful and needed, at least to generate the PDF documentation. MathML is used in the Django front end. PrettyPrint is something "pretty", but maybe it is a waste of time to implement and maintain it. For example, we probably do not want to use it to check doctests, because writing the expected results would be awkward.
> To me, the focus of the implementation should be on how things are boxed and formatted in a generic and general way. Not about what is appropriate for a particular front end. The current implementation is lacking in the generic and general nature. It was particularly more evident when this was in `mathics.core` which 2D character-based formatting is totally inappropriate for.
Yes, but at least for me, it helps me to think about why WMA implements boxing as it does. And that is the whole reason I wrote this and put it here.
Now I am putting up another PR, where all the Character2D code is stripped away.
> Here is another example from our code base. We have these wacky and complicated regular expressions for handling doctests inside of docstrings. I imagine the person that started this may have thought it cool to recreate sphinx using regular expressions. Those regular expressions use some pretty advanced and little-used tagging mechanisms. If you want to show off how clever you can code, great. But as far as handling the underlying problem in a uniform, maintainable, and comprehensible way, this code totally fails.
OK
> Let's defer character-based 2D output formatting until after we have Boxing under control. Formatting is intimately tied to Boxing. And if we have a good Boxing mechanism, I think you'll see how easily character-based 2D formatting falls out from that.
Sure. In any case, if you are OK with it, I will leave this here for a while.
This is a kind of experiment. In WMA, in contrast to `InputForm`, which produces an "inline" representation of an expression, `OutputForm` produces a "2D" text-like representation of an expression. There are some examples:
In this way, the output in `OutputForm` is formatted similarly to SymPy's pretty-print format.
This PR provides a way to partially reproduce this behavior when an expression is wrapped by `OutputForm` and the variable `$Use2DOutputForm` is set to `True`.
The support is not complete, and it can probably be improved using the code in `sympy.prettyprint`. Also, it currently does not work in Mathics-Django, because it is not able to print strings with line breaks.
Here are some examples in Mathics, under this branch: