Idea to encode grammar #1

artelk · 2022-08-27T15:11:27Z

You can remap the grammar symbols making the array of pairs sorted.
Imagine you have the rules:

256->[17][23]
257->[10][256]
258->[15][257]

After remapping the grammar symbol values you will have the table sorted by the ([firstSymbol],[secondSymbol]) pairs:

256->[10][258]
257->[15][256]
258->[17][23]

The pairs can be treated as long numbers and the sorted table could be compressed using Interpolative Coding

nicolaprezza · 2022-08-27T16:06:10Z

Hi, thanks. I already do that, even if I use Elias delta encoding instead of interpolative coding. Some more details are in section 3.3.here https://arxiv.org/pdf/1704.08558.pdf (I don't remember the details, a few years have passed).

artelk · 2022-08-27T18:52:04Z

If I understood that correctly your algorithm selects the best candidate (only) when there are multiple pairs with the same frequency. Will that produce the best ordering for the final encoding of the whole grammar array? (that wasn't clear to me to be honest, I don't have a black belt in computer science 😄 )

And I proposed to make the remapping of the symbol values (in the grammar and in the output text) at the very end of the re-pair. You have complete freedom how to remap/reorder them to improve the compression rate of the grammar rules array. Could it make sense?

nicolaprezza · 2022-08-31T07:53:23Z

Ah, right: we sort only when the frequency is the same. Ok, I got your idea. I think there are some issues, but the underlying optimization problem looks interesting. One issue that I see is that if you re-map symbols so that first symbols are increasing, then also the left-hand sides of the rules get remapped, so you could lose that order. For example:

1 -> 3 4
2 -> 1 3
...

to apply your idea, we need to re-map 3->1 and 1->3 (or any other map such that map(3) < map(1)) but that would give us

3 -> 1 4
2 -> 3 1

so now first symbols are sorted, but left-hand sides are no longer sorted.

if you wish, we can continue this discussion via email (you can find it in my page https://www.unive.it/data/persone/24170321)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Idea to encode grammar #1

Idea to encode grammar #1

artelk commented Aug 27, 2022

nicolaprezza commented Aug 27, 2022

artelk commented Aug 27, 2022

nicolaprezza commented Aug 31, 2022

Idea to encode grammar #1

Idea to encode grammar #1

Comments

artelk commented Aug 27, 2022

nicolaprezza commented Aug 27, 2022

artelk commented Aug 27, 2022

nicolaprezza commented Aug 31, 2022