Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

max_width makes short lines longer #6338

Open
kornelski opened this issue Sep 19, 2024 · 9 comments
Open

max_width makes short lines longer #6338

kornelski opened this issue Sep 19, 2024 · 9 comments

Comments

@kornelski
Copy link

kornelski commented Sep 19, 2024

If I have some code that is formatted perfectly correctly for a max_width = 50:

fn example() {
    let x = [1, 2, 3]
        .iter()
        .rev()
        .enumerate()
        .count();
}

then the same code formatted with a longer maximum length (max_width = 100) always gets reformatted to stretch as widely as possible. It gets changed despite already being formatted cleanly, and not exceeding the maximum width:

fn example() {
    let x = [1, 2, 3].iter().rev().enumerate().count();
}

This makes rustfmt's formatting especially detrimental when max_width is set to a high value (like max_width = 150 or max_width = 200), because expressions that had reasonable line breaks previously are forced to become very long lines.

I don't mind having occasional long lines in the code, and I would like to be able to set higher allowable maximum line width, without it also becoming a target width for making all code maximally wide.

Example use-cases:

  • In if conditions, wrapping has a relatively high cost. Indentation of the wrapped conditions is the same as indentation of a usual "then" block, so the reader has to pay close attention to the unusual position of {. In such cases a line that is a bit longer is IMHO a lesser evil. But I don't want rustfmt to undo already wrapped if conditions where I've already decided the line was too long.

  • I have match statements with similar but not identical content (e.g. when mapping types in an enum to invocations of a generic function, the calls may be very similar, but variant names and variable names differ in lengths). It's possible for some, but not all, of the match arms to exceed the usual limit of 80-100 chars. This makes rustfmt format the arms inconsistently in regards to each other: some will have multi-line formatting, some will have single-line formatting. This makes it harder to compare the match arms. A value of max_width that makes one match consistent will break consistency of another that has different range of widths.

  • I have expressions that implement standard-defined mathematical formulas defined in external documents. I want those expressions to have line breaks in the same places as in the source document, for them to be easy to compare to the original for review.

  • I want to avoid trivial code changes from looking more complex in code reviews. Code is more often read more than written, and code that is developed via pull requests is also often read as a diff. For me as a maintainer, readability of diffs of the code is most important. When a PR renames a function or a variable, I want that PR to have minimal single-word diffs. Unfortunately, if the new name has a different length, it often makes lines cross the (un)wrapping threshold, and changes start to look like a complete rewrite of entire blocks/statements. Same thing happens when a line is deleted from a multi-line expression (e.g. a struct field is deleted). Instead of keeping it as an obvious 1-line-delete diff, rustfmt can decide to unwrap the whole multi-line block into one long line, making the diff much bigger, and obscuring what has really changed.

Why not #[rustfmt::skip]?

  1. It's an all-or-nothing mark, rather than control over line length. I still want all the short and long lines to be correctly formatted in all other aspects, like spaces around punctuation. When a long line ends with a block, I still want the indentation of the following code in the block to be formatted well.

  2. #[rustfmt::skip] attribute by itself adds visual clutter. It becomes a point of contention whether an improvement in readability of one line is worth adding an eyesore in another. Usually it isn't, which makes me feel forced to live with an undesirable max_width behavior that always makes some code worse regardless of what value I set it to.

Notably, go fmt does not have this problem. In Golang, lines that exceed the maximum limit are wrapped to become multi-line expressions, but expressions that already fit below the maximum line length are not forced to become one-liner spaghetti. All lines are still formatted correctly according to gofmt's rules, and this behavior is entirely deterministic and idempotent.

@ytmimi
Copy link
Contributor

ytmimi commented Sep 19, 2024

@kornelski I'm not sure if it'll help in all cases, but maybe some of rustfmt's more granular width configurations could help you out here.
I'm linking to use_small_heuristics, which links to the other width options.

@kornelski
Copy link
Author

kornelski commented Sep 19, 2024

No, these don't help at all. I've reviewed and tested these carefully. The problem is not in choosing the right widths for various items, the problem is in the general design principle in rustfmt that always makes it unwrap shorter lines up to the maximum setting. If set short widths, I'll have much more wrapping than I want. If I set long widths, I'll have unwanted spaghettification.

I can only move the thresholds, but I can't disable the domino effect of code crossing a threshold.

To be clear, I'm not asking to change max_width specifically, but some option to change the maximum line length enforcement to act like in go fmt would be great.

@ytmimi
Copy link
Contributor

ytmimi commented Sep 19, 2024

Thanks for clarifying. That might be easier said than done. As you've noted, rustfmt doesn't take the source layout into account when rewriting AST nodes.

@kornelski
Copy link
Author

rustfmt somehow preserves empty lines, even though they're whitespace and not part of a usual AST. Perhaps in a similar way it could keep information about the original line lengths, and format AST nodes for a max_width set to maximum line length of any of the child nodes?

@ytmimi
Copy link
Contributor

ytmimi commented Sep 20, 2024

The way rustfmt preserves newlines between AST nodes is actually fairly simple and doesn't require keeping track of any layout information from the source code. It does however peek back into the source to count newlines and recover comments.

It's maybe possible to provide an option that would infer the layout of nodes in certain contexts based on the source code, for example forcing a vertical layout, but I don't think we'd be able to format code exactly like go fmt would.

@calebcartwright
Copy link
Member

There's a lot of good discussion on this thread but there's a few things I want to add.

I know this is somewhat tangential to the underlying behavior being requested, but I think important to note rustfmt has some config options (including max_width) that by default change the values of other options unless those options are set explicitly by the user. So changing the value of max_width is also changing the value of chain_width which is what triggers the specific formatting change here.

Ultimately, I think the difference between the requested behavior and the current behavior is that the Style Guide currently has very explicit prescriptions based on column widths, and if an element is less than that threshold (e.g. chain_width) then it must be single line, else it must be multiline, regardless of whatever choice the developer made about linebreaks in the input.

As a result, you'll never see something like this with rustfmt

fn example() {
    let x = [1, 2, 3]
        .iter()
        .rev()
        .enumerate()
        .count();

    let x = [1, 2, 3].iter().rev().enumerate().count();
}

two identical constructs will be formatted exactly the same way, every time. that's an intentional decision, and one that comes with tradeoffs that reasonable people may weigh differently.

i'll share that the Style team already has plans to review the default rules for 2027 to try to devise something that produces more consistently "better" results as we believe that column width is simpler to grok mentally and implement in tools but too often produces subpar results.

In the meantime, something the rustfmt team has been actively exploring are config-driven solutions that would add options that support devs/teams that want more manual control over when line breaks are added or removed, and which would enable the the outcome requested here.

Given the current (and unlikely to change) model, I do not envision the default formatting behavior will ever change and be gofmt, but I do completely agree that the whitespace information is available in the input and we can support non-default formatting options that utilize it.

Due to the AST-centric nature of the formatting rules and behavior, it is something we would have to do very incrementally (chains, arrays, etc.), but it is doable. We successfully implemented a POC with this for chains, but had to temporarily put it on the backburner due to the focus on the 2024 edition. It's something we'll pick back up once 2024 ships

@kornelski
Copy link
Author

two identical constructs will be formatted exactly the same way, every time.

This is only true in a narrow case. It's not every time. In practice, the opposite happens, and rustfmt can force identical AST nodes to look completely different:

let x = [1, 2, 3].iter().rev().enumerate().count();

if true {
    let x = [1, 2, 3]
        .iter()
        .rev()
        .enumerate()
        .count();
}

rustfmt formatting identical expressions in different ways is one of the issues I have with it. If the longest line happens to cross a threshold:

match val {
    Enum::One(one) => take(one).and().process().it(),
    Enum::Two(two) => take(two).and().process().it(),
    Enum::Three(three) => take(three).and().process().it(),
    Enum::Four(four) => take(four).and().process().it(),
}

Then rustfmt makes identical match arms formatted differently:

match val {
    Enum::One(one) => take(one).and().process().it(),
    Enum::Two(two) => take(two).and().process().it(),
    Enum::Three(three) => take(three)
        .and()
        .process()
        .it(),
    Enum::Four(four) => take(four).and().process().it(),
}

@ytmimi
Copy link
Contributor

ytmimi commented Sep 25, 2024

This is only true in a narrow case. It's not every time. In practice, the opposite happens, and rustfmt can force identical AST nodes to look completely different:

let x = [1, 2, 3].iter().rev().enumerate().count();

if true {
    let x = [1, 2, 3]
        .iter()
        .rev()
        .enumerate()
        .count();
}

The indentation level of a node is an important factor when determining the layout since that impacts how much room rustfmt has before it needs to break lines. It might be better to say that "two identical nodes will be formatted exactly the same way, every time if they appear in the same context".

rustfmt formatting identical expressions in different ways is one of the issues I have with it. If the longest line happens to cross a threshold:

match val {
    Enum::One(one) => take(one).and().process().it(),
    Enum::Two(two) => take(two).and().process().it(),
    Enum::Three(three) => take(three).and().process().it(),
    Enum::Four(four) => take(four).and().process().it(),
}

Then rustfmt makes identical match arms formatted differently:

match val {
    Enum::One(one) => take(one).and().process().it(),
    Enum::Two(two) => take(two).and().process().it(),
    Enum::Three(three) => take(three)
        .and()
        .process()
        .it(),
    Enum::Four(four) => take(four).and().process().it(),
}

Having more consistent match arm formatting has been brought up before (#3995). Probably best to keep the discussion here focused on how increasing the max_width makes short lines longer.

@kornelski
Copy link
Author

kornelski commented Sep 26, 2024

I brought match up here, because it is related. If max_width didn't force short lines to be long, then I could set max_width=99999, and choose myself whether I want the match arms wrapped or not.

#3995 also brings other more complex causes of apparent inconsistency in match, and whether structurally different AST nodes should be formatted in similar ways may be a more difficult question than whether structurally identical AST nodes should be allowed to look the same.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants