Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Idea to encode grammar #1

Open
artelk opened this issue Aug 27, 2022 · 3 comments
Open

Idea to encode grammar #1

artelk opened this issue Aug 27, 2022 · 3 comments

Comments

@artelk
Copy link

artelk commented Aug 27, 2022

You can remap the grammar symbols making the array of pairs sorted.
Imagine you have the rules:

256->[17][23]
257->[10][256]
258->[15][257]

After remapping the grammar symbol values you will have the table sorted by the ([firstSymbol],[secondSymbol]) pairs:

256->[10][258]
257->[15][256]
258->[17][23]

The pairs can be treated as long numbers and the sorted table could be compressed using Interpolative Coding

@nicolaprezza
Copy link
Owner

Hi, thanks. I already do that, even if I use Elias delta encoding instead of interpolative coding. Some more details are in section 3.3.here https://arxiv.org/pdf/1704.08558.pdf (I don't remember the details, a few years have passed).

@artelk
Copy link
Author

artelk commented Aug 27, 2022

If I understood that correctly your algorithm selects the best candidate (only) when there are multiple pairs with the same frequency. Will that produce the best ordering for the final encoding of the whole grammar array? (that wasn't clear to me to be honest, I don't have a black belt in computer science 😄 )

And I proposed to make the remapping of the symbol values (in the grammar and in the output text) at the very end of the re-pair. You have complete freedom how to remap/reorder them to improve the compression rate of the grammar rules array. Could it make sense?

@nicolaprezza
Copy link
Owner

Ah, right: we sort only when the frequency is the same. Ok, I got your idea. I think there are some issues, but the underlying optimization problem looks interesting. One issue that I see is that if you re-map symbols so that first symbols are increasing, then also the left-hand sides of the rules get remapped, so you could lose that order. For example:

1 -> 3 4
2 -> 1 3
...

to apply your idea, we need to re-map 3->1 and 1->3 (or any other map such that map(3) < map(1)) but that would give us

3 -> 1 4
2 -> 3 1

so now first symbols are sorted, but left-hand sides are no longer sorted.

if you wish, we can continue this discussion via email (you can find it in my page https://www.unive.it/data/persone/24170321)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants