String break detection #39

Closed
jmendes92 opened this issue Nov 20, 2019 · 5 comments

@jmendes92

Hi,

For a project I need to know the width of a string for a given font and font size, so I can split the string into multiple lines, typically when the string is much wider than the page.

I checked the crate's documentation but I haven't found anything about it.

If this feature doesn't exist in this crate or one of its dependencies, is there at least a way to get the width of a single char? If not, what would be the best approach to solve this issue?

Thanks for the help.

@fschutt
Owner

fschutt commented Nov 20, 2019

Yes, I intentionally didn't add this because text layout is not the task of a PDF library. You can maybe look at azul-text-layout (my other project), which has this kind of capability; however, it's pretty alpha right now: https://github.com/maps4print/azul/blob/cc3ad6eecda13ea052679c882a3a7131eb1e3d4e/azul/lib.rs#L358-L388 - add azulc to your crate's dependencies (on the /unvendor_dependencies branch; please clone the repo first), then copy those few lines into your library and you should be good to go.

The azul-text-layout also supports proper glyph composition (for example, merging "a" and "^" into one character), line breaking, letter, word and line spacing, text alignment as well as flowing text around images or holes in the text. It's based on harfbuzz / freetype, but it should do the job.

So basically: You have a text and you don't know whether it fits in the page. You create an SvgTextLayout like the code linked above. In the ResolvedTextLayoutOptions, you pass in a max_width, which is the maximum width of the page. This will then give you the layouted text. Now you can do:

pdf.start_text_block();

// for each word in your text...
for (word_index, word) in svg_text_layout.words.iter().enumerate() {
    // get the position of the word
    let word_position = svg_text_layout.word_positions[word_index];
    // position the cursor in the pdf
    pdf.set_text_cursor(word_position);
    // write the word
    pdf.write_text(word.get_string());
}

pdf.end_text_block();

Alternatively you can write text on a line-by-line basis:

pdf.start_text_block();

for line in svg_text_layout.inline_text_layout.lines.iter() {
    pdf.set_text_cursor(line.get_bounds().top); // set cursor once for the line
    // now write all words for the line
    for word_index in line.word_index_start..line.word_index_end {
        // look up the word itself (not its position) so it can be written out
        let word = &svg_text_layout.words[word_index];
        pdf.write_text(word.get_string());
    }
}

pdf.end_text_block();

This is just pseudo-code, I'm writing it off the top of my head right now to give you a hint where to look. You also might run into issues with coordinate systems. The important part, however, is that azul-text-layout has functions to return whether a text breaks or not, given a text, a font, a font size and a max_width.
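To make the idea of breaking text against a max_width concrete, here is a minimal, library-independent sketch of greedy line breaking. It is not how azul-text-layout is implemented; `wrap_greedy` and the `measure` callback are hypothetical names, and `measure` stands in for whatever returns the rendered width of a string for your font and font size.

```rust
/// Greedy line breaking: keep adding words to the current line until the
/// next word would push the line past `max_width`.
///
/// `measure` is a hypothetical callback that returns the width of a string
/// in the same unit as `max_width` (px, pt, mm, ...).
fn wrap_greedy<F>(text: &str, max_width: f32, measure: F) -> Vec<String>
where
    F: Fn(&str) -> f32,
{
    let space_width = measure(" ");
    let mut lines = Vec::new();
    let mut current = String::new();
    let mut current_width = 0.0_f32;

    for word in text.split_whitespace() {
        let word_width = measure(word);
        if current.is_empty() {
            // The first word on a line is always placed, even if it overflows.
            current.push_str(word);
            current_width = word_width;
        } else if current_width + space_width + word_width <= max_width {
            // The word still fits: append it after a single space.
            current.push(' ');
            current.push_str(word);
            current_width += space_width + word_width;
        } else {
            // The word does not fit: finish this line and start a new one.
            lines.push(std::mem::take(&mut current));
            current.push_str(word);
            current_width = word_width;
        }
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}
```

With a stand-in metric where every character is 1.0 wide, `wrap_greedy("Lorem ipsum dolor sit amet", 11.0, |s| s.chars().count() as f32)` yields the lines "Lorem ipsum", "dolor sit" and "amet". A real font needs per-glyph advances (and ideally shaping), which is exactly the part a text layout library provides.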

Yes, text layout is quite a pain in the ass, I know, but it's a pretty hard task. I haven't really concerned myself with text layout for printing all that much.

@jmendes92
Author

Thanks, I'll look into it.

@jmendes92
Author

So after some tests, here is the beginning of a solution that may help others with the same problem.

In Cargo.toml, add the following lines:

[dependencies]
azul-core = { path = "../../azul/cargo/azul-core" }
azul-css = { path = "../../azul/cargo/azul-css" }
azul-widgets = { path = "../../azul/cargo/azul-widgets", features = ["fonts", "svg", "svg_parsing"] }
azulc = { path = "../../azul/cargo/azulc", features = ["text_layout"]}
azul-text-layout = { path = "../../azul/cargo/azul-text-layout" }

Then, in your Rust file, add:

use printpdf::*;
use std::fs::File;
use std::io::BufWriter;
use azul_text_layout::text_layout::ResolvedTextLayoutOptions;
use azul_css::{StyleTextAlignmentHorz, LayoutPoint};
use azul_widgets::svg::SvgTextLayout;
use azulc::font_loading::font_source_get_bytes;
use azul_core::app_resources::FontSource;

pub fn svg_text_layout_from_str(
    text: &str,
    font_bytes: &[u8],
    font_index: u32,
    mut text_layout_options: ResolvedTextLayoutOptions,
    horizontal_alignment: StyleTextAlignmentHorz,
) -> SvgTextLayout {
    use azulc::layout::text_layout::text_layout;
    use azulc::layout::text_layout::text_shaping::get_font_metrics_freetype;

//    text_layout_options.font_size_px = SVG_FAKE_FONT_SIZE;
    let words = text_layout::split_text_into_words(text);
    let font_metrics = get_font_metrics_freetype(font_bytes, font_index as i32);
    let scaled_words = text_layout::words_to_scaled_words(&words, font_bytes, font_index, font_metrics, text_layout_options.font_size_px);
    let word_positions = text_layout::position_words(&words, &scaled_words, &text_layout_options);

    let mut inline_text_layout = text_layout::word_positions_to_inline_text_layout(&word_positions, &scaled_words);
    inline_text_layout.align_children_horizontal(horizontal_alignment);

    let layouted_glyphs = text_layout::get_layouted_glyphs(&word_positions, &scaled_words, &inline_text_layout, LayoutPoint::zero());

    SvgTextLayout {
        words,
        scaled_words,
        word_positions,
        layouted_glyphs,
        inline_text_layout,
    }
}


fn create_pdf() {
    let (doc, page1, layer1) = PdfDocument::new("printpdf graphics test", Mm(210.0), Mm(297.0), "0");
    let current_layer = doc.get_page(page1).get_layer(layer1);
    let font = doc.add_builtin_font(BuiltinFont::Helvetica).unwrap();

    // Length text : words = 135, chars = 885
    let text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Facilisi nullam vehicula ipsum a arcu cursus vitae congue. Porta nibh venenatis cras sed felis. Eros in cursus turpis massa. Nulla facilisi cras fermentum odio. Posuere ac ut consequat semper. Ut tristique et egestas quis ipsum suspendisse ultrices gravida dictum. Tortor consequat id porta nibh venenatis cras sed felis. Porta nibh venenatis cras sed felis eget velit. Blandit aliquam etiam erat velit. Tristique magna sit amet purus gravida. Et odio pellentesque diam volutpat commodo sed egestas egestas. Faucibus purus in massa tempor nec feugiat nisl. Consectetur adipiscing elit ut aliquam purus. Morbi tincidunt ornare massa eget egestas purus viverra accumsan in. Aliquam sem et tortor consequat id porta. Porttitor eget dolor morbi non arcu risus quis.";
    let font_size_pt = 12;
    let line_height = 14;

    current_layer.set_font(&font, font_size_pt);
    current_layer.set_line_height(line_height);

    let title_color = Color::Rgb(Rgb{
        r: 1.0,
        g: 0.0,
        b: 0.0,
        icc_profile: None
    });

    let text_color = Color::Rgb(Rgb{
        r: 0.0,
        g: 0.0,
        b: 0.0,
        icc_profile: None
    });

    let font_source = font_source_get_bytes(&FontSource::System("Helvetica".to_string())).unwrap();
    let text_layout_options = ResolvedTextLayoutOptions {
        font_size_px: 16.0, // 12pt => (12*96)/72 = 16px
        line_height: Some(font_size_pt as f32 / line_height as f32),
        letter_spacing: None,
        word_spacing: None,
        tab_width: None,
        max_horizontal_width: Some(360.0),
        leading: None,
        holes: vec![],
    };
    let horizontal_alignment = StyleTextAlignmentHorz::Left;

    let svg_text_layout = svg_text_layout_from_str(text.as_ref(), &font_source.font_bytes, font_source.font_index as u32, text_layout_options.clone(), horizontal_alignment);
    let lines = svg_text_layout.inline_text_layout.lines;

    // ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ //
    //                              WORKAROUND 1 - TEXT SECTION
    // ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ //

    current_layer.begin_text_section();
    current_layer.set_text_cursor(Mm(10.0), Mm(290.0));
    current_layer.set_fill_color(title_color.clone());
    current_layer.write_text("WORKAROUND 1", &font);
    current_layer.end_text_section();

    for line in lines.clone() {
        // Convert bounds to mm
        let x_mm = (line.bounds.origin.x / 96.0 * 25.4) as f64;
        let y_mm = (line.bounds.origin.y / 96.0 * 25.4) as f64;

        // Set coordinates properly
        let x = x_mm;
        let y = 297.0 - y_mm - line_height as f64;

        current_layer.begin_text_section();
        current_layer.set_fill_color(text_color.clone());
        current_layer.set_text_cursor(Mm(x), Mm(y)); // set cursor once for the line

        // now write all words for the line
        for word_index in line.word_start..line.word_end {
            let word = svg_text_layout.words.items[word_index];
            let word_str = svg_text_layout.words.get_substr(&word);
            current_layer.write_text(word_str.as_str(), &font);
        }

        current_layer.end_text_section();
    }
    // ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ //

    // ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ //
    //                              WORKAROUND 2 - USE TEXT
    // ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ //

    current_layer.begin_text_section();
    current_layer.set_text_cursor(Mm(110.0), Mm(290.0));
    current_layer.set_fill_color(title_color.clone());
    current_layer.write_text("WORKAROUND 2", &font);
    current_layer.end_text_section();

    for line in lines.clone() {
        // Convert bounds to mm
        let x_mm = (line.bounds.origin.x / 96.0 * 25.4) as f64;
        let y_mm = (line.bounds.origin.y / 96.0 * 25.4) as f64;

        // Set coordinates properly
        let x = x_mm + 100.0; // Add horizontal offset for example visibility
        let y = 297.0 - y_mm - line_height as f64;

        let mut line_str = "".to_string();

        // now write all words for the line
        for word_index in line.word_start..line.word_end {
            let word = svg_text_layout.words.items[word_index];
            let word_str = svg_text_layout.words.get_substr(&word);
            line_str += word_str.as_str();
        }

        current_layer.set_fill_color(text_color.clone());
        current_layer.use_text(line_str, font_size_pt, Mm(x), Mm(y), &font);
    }

    // ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ //
    //                              WORKAROUND 3 - LINE BREAK
    // ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ //

    current_layer.begin_text_section();
    current_layer.set_text_cursor(Mm(10.0), Mm(140.0));
    current_layer.set_fill_color(title_color.clone());
    current_layer.write_text("WORKAROUND 3", &font);
    current_layer.end_text_section();

    let first_line = lines.first().unwrap();

    current_layer.begin_text_section();
    current_layer.set_fill_color(text_color.clone());
    current_layer.set_text_cursor(Mm(first_line.bounds.origin.x as f64), Mm(130.0)); // set cursor once for the line

    for line in lines.clone() {
        // now write all words for the line
        for word_index in line.word_start..line.word_end {
            let word = svg_text_layout.words.items[word_index];
            let word_str = svg_text_layout.words.get_substr(&word);
            current_layer.write_text(word_str.as_str(), &font);
        }

        current_layer.add_line_break();
    }

    current_layer.end_text_section();

    doc.save(&mut BufWriter::new(File::create("test.pdf").unwrap())).unwrap();
}

This file contains three workarounds, whose results can be seen in the following image:

image

Workaround 3 is the one I think looks closest to the expected result. In this workaround, the cursor is set for the first line only, and layer.add_line_break() is then used for each following line. This way the line height looks better.

I would like to point out some issues that I have noticed. Some of them may come from printpdf or from what I have done:
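For reference, the unit conversions used in the workarounds (pt to px at 96 px per inch, px to mm, and flipping the y axis because the layout measures from the top while the PDF page measures from the bottom) can be isolated into small helpers. This is only a sketch with hypothetical function names. Note that the code above subtracts line_height (a value in points) directly from a millimetre value; the helper below converts it to mm first, which is an assumption about what was intended.

```rust
const CSS_PX_PER_INCH: f64 = 96.0;
const PT_PER_INCH: f64 = 72.0;
const MM_PER_INCH: f64 = 25.4;

/// Points to CSS pixels, e.g. 12pt => (12 * 96) / 72 = 16px.
fn pt_to_px(pt: f64) -> f64 {
    pt * CSS_PX_PER_INCH / PT_PER_INCH
}

/// CSS pixels to millimetres.
fn px_to_mm(px: f64) -> f64 {
    px / CSS_PX_PER_INCH * MM_PER_INCH
}

/// Points to millimetres.
fn pt_to_mm(pt: f64) -> f64 {
    pt / PT_PER_INCH * MM_PER_INCH
}

/// Map a y offset measured in px from the top of the layout to a PDF y
/// coordinate in mm measured from the bottom of the page.
/// Assumption: the line height (in pt) is converted to mm before it is
/// subtracted, so the result is the lower edge of the line box.
fn layout_y_to_pdf_y_mm(page_height_mm: f64, y_px: f64, line_height_pt: f64) -> f64 {
    page_height_mm - px_to_mm(y_px) - pt_to_mm(line_height_pt)
}
```

For an A4 page, `layout_y_to_pdf_y_mm(297.0, line.bounds.origin.y as f64, 14.0)` would give the y value to wrap in Mm(...) for each line.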

  • When a single word is wider than max_width, it isn't cut and overflows. A better solution would be to split the word across n lines (with or without hyphens) or simply drop the extra chars.
  • Line construction is a little bit weird. For example, in the resulting image the first two lines start with "Lorem ipsum ..." and " amet, ...": visually, the word "amet," should fit on the first line, and there are plenty of similar examples in the image. However, I have looked at the line dimensions, and all the lines have almost the same width, with only a few pixels of difference.
  • The line_height property inside ResolvedTextLayoutOptions has no effect.
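As an illustration of the first point, a word that is wider than max_width could be split into chunks that each fit. The sketch below is not part of printpdf or azul-text-layout; `split_long_word` and `char_width` are hypothetical names, and summing per-character widths ignores kerning and glyph shaping, so it is only an approximation.

```rust
/// Split `word` into chunks whose summed character widths stay within
/// `max_width`. `char_width` is a hypothetical callback returning the
/// advance width of one character in the same unit as `max_width`.
fn split_long_word<F>(word: &str, max_width: f32, char_width: F) -> Vec<String>
where
    F: Fn(char) -> f32,
{
    let mut chunks = Vec::new();
    let mut current = String::new();
    let mut width = 0.0_f32;

    for c in word.chars() {
        let w = char_width(c);
        // Start a new chunk when this character would overflow, but never
        // emit an empty chunk (a single over-wide character stays alone).
        if !current.is_empty() && width + w > max_width {
            chunks.push(std::mem::take(&mut current));
            width = 0.0;
        }
        current.push(c);
        width += w;
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Each chunk can then be treated as its own word by the line breaker. Adding a hyphen at the end of a chunk would require reserving the hyphen's width in max_width first.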

For now, it does the job, and I am sure it will get better in the future. Also, if I may suggest, the text layout crate should be totally independent of the azul project, so that it can easily be used by both azul and printpdf. Right now I don't have the time to contribute, but maybe in the future.

If you think I did or wrote something wrong, don't hesitate to tell me.

Again thank you for your help.

@fschutt
Owner

fschutt commented Dec 7, 2019

Yes, all of this is in a pretty early stage (both printpdf and azul-text-layout), so it does require workarounds and hacks, sadly.

  1. Please pin the dependency to a commit with rev = "<commit hash>" (on a git dependency), so that your code doesn't break when the git repo is updated.
  2. Theoretically you should only have to add azul-core (the core data types) and azul-text-layout to your dependencies. The azul-text-layout crate doesn't do anything specific to azul, it's just the name. However, splitting the crates "properly" is a difficult task. You can optimize your dependencies further by setting default-features = false and removing the svg_parsing and fonts features from the azul-widgets crate. Those are only necessary if you want to parse SVG or convert fonts into SVG paths.
  3. Strangely the issue with spaces on a new line only appears in the PDF, not when using azul-text-layout for a UI. So it has to be either a conversion mistake or the text layout is slightly broken.
  4. Hyphenation and breaking strings properly is a pretty complex topic - hyphenation especially because you need language dictionaries to do it properly. Currently the text simply overflows because it was the easier decision.
  5. For a workaround, you can collect all the words in a line, then write the entire line as one string:
    for line in lines.clone() {

        // The range needs parentheses, otherwise filter_map binds to line.word_end only
        let entire_line_text = (line.word_start..line.word_end).filter_map(|word_index| {
            let word = svg_text_layout.words.items[word_index];
            if word.is_space() || word.is_return() {
                None // drop whitespace tokens, including the leading space on a line
            } else {
                Some(svg_text_layout.words.get_substr(&word))
            }
        })
        .collect::<Vec<String>>()
        .join(" "); // re-insert exactly one space between words

        current_layer.write_text(entire_line_text.as_str(), &font);
        current_layer.add_line_break();
    }

The text layout engine of your PDF viewer of choice should do the rest.
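The same filter-and-join step can be tried out without any azul types. In the sketch below, `Token` and `line_to_string` are hypothetical stand-ins for the layout's word items, used only to show that dropping the whitespace tokens and joining with single spaces removes the stray leading space on a line.

```rust
/// Hypothetical stand-in for the word items produced by a text layout.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Word(String),
    Space,
    Return,
}

/// Build the text of one line: skip whitespace tokens, then join the
/// remaining words with exactly one space.
fn line_to_string(tokens: &[Token]) -> String {
    tokens
        .iter()
        .filter_map(|token| match token {
            Token::Word(w) => Some(w.as_str()),
            Token::Space | Token::Return => None,
        })
        .collect::<Vec<&str>>()
        .join(" ")
}
```

A line whose tokens are a space, "amet,", a space and "consectetur" comes out as "amet, consectetur", without the leading space.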

@jmendes92
Author

Thanks for the help. As the project is only a POC, it does the job. If the project goes forward, I think I will create a library that better fits the project's needs.
