Data decompresses with C++ but exceptions when decompressed with managed lzma #32
I'll have a look later today and see if I can reproduce this; I don't see any immediate mistake in the calling code you posted. If you want to speed this up, put your file somewhere you can take it back down again, send me an email, or tell me the compression settings and library you used if you compressed the data yourself. Otherwise I'll just try various settings to see if it's a general problem. The SDK version used while porting was 9.22 (included in the repo), so yes, it's much older than 19.00, but I checked a year or two ago and there hadn't been any relevant changes to the code (which makes sense, since it needs to remain compatible). I'll check again, of course, to see if there are any recent bugfixes for edge cases.
Thanks for the quick reply!
Here’s a shareable link to the file on Google Drive - https://drive.google.com/drive/folders/1WlRGjE6C26YdhdNAKiK0bR_u9wcp8YQH?usp=sharing
I used the LZMA1900 SDK to compile a library and compress the block.
Compression settings are:
```cpp
int8 level{ 2 };
int8 lc{ 3 };
int8 lp{ 0 };
int8 pb{ 3 };
int8 algo{ 1 };
int8 btMode{ 1 };
int8 numHashBytes{ 4 };
int8 numThreads{ 1 };
int8 numBlockThreads_Reduced{ 1 };
int8 numBlockThreads_Max{ 1 };
int8 numTotalThreads{ 1 };
bool writeEndMark{ false };
int16 fb{ 32 };
uint32 dictSize{ 0 };
uint32 mc{ 32 };
uint64 reduceSize{ 0x100000 };
uint64 blockSize{ 0 };
```
It seems that an end mark is written even though I don’t request it - that’s a problem for a different day.
Cheers
John
Thanks, you can take down the file if you want. I was able to reproduce the issue; the file also decodes fine with the 9.22 C++ source (i.e. the 9.22 C++ API is able to decode it, so the SDK version is not the problem). The "problem" is that your C++ snippet is using the "One Call Interface" while ManagedLzma is using the incremental API. Replicating what the "one call interface" does makes the file decode in C# too, so it's not a bug in the low-level decoding routines, but rather in how they are called. Either LZMA2 itself has an edge-case bug in the incremental API or I'm using it incorrectly. I'll have to write C++ code to decode your file incrementally and see if that succeeds. Might take a few days to figure everything out, sorry. I'll probably have something ready early next week.
It's been a while, but I think the "end mark" is a feature only used in LZMA, not in LZMA2. The way LZMA2 works is by splitting the source into chunks so they can be compressed in parallel (or stored uncompressed if compression doesn't reduce their size). Either way, it stores the size of each chunk in a control code, so as far as LZMA2 is concerned it always knows where the stream ended and returns the "ended with mark" status. Even if the last chunk was LZMA and ended with the "maybe ended" flag, LZMA2 replaces that status because it has better information.
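To make that concrete, here is a minimal sketch of the LZMA2 container layout (format facts from the public LZMA SDK source, not the managed-lzma API) that walks a raw LZMA2 stream's chunk headers without decoding any payloads. It shows why the stream is self-delimiting: every chunk header carries its sizes, and a zero control byte marks the end.

```csharp
using System;

static class Lzma2ChunkWalker
{
    // Walk raw LZMA2 chunk headers; payloads are skipped, not decoded.
    public static void Walk(byte[] stream)
    {
        int pos = 0;
        for (;;)
        {
            byte control = stream[pos++];
            if (control == 0)
            {
                // This is the terminator that LZMA_FINISH_ANY can leave unconsumed.
                Console.WriteLine($"end of stream at offset 0x{pos - 1:X}");
                return;
            }
            if (control < 0x80)
            {
                // 0x01/0x02: uncompressed chunk (with/without dictionary reset),
                // followed by a 16-bit big-endian (size - 1).
                int size = ((stream[pos] << 8) | stream[pos + 1]) + 1;
                pos += 2 + size;
                Console.WriteLine($"uncompressed chunk: {size} bytes");
            }
            else
            {
                // LZMA chunk: bits 4..0 of the control byte are the top bits of
                // a 21-bit (unpacked size - 1); the next two bytes complete it,
                // then a 16-bit (packed size - 1) follows.
                int unpacked = (((control & 0x1F) << 16) | (stream[pos] << 8) | stream[pos + 1]) + 1;
                int packed = ((stream[pos + 2] << 8) | stream[pos + 3]) + 1;
                pos += 4;
                if (((control >> 5) & 3) >= 2)
                    pos++; // reset modes 2 and 3 carry a new lc/lp/pb properties byte
                pos += packed;
                Console.WriteLine($"LZMA chunk: {packed} -> {unpacked} bytes");
            }
        }
    }
}
```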
Found the problem - I have a bug in translating the original C code; the technical details come down to two subtle points in the translation.
Sorry for not providing a "one call" decoding API, but .NET is not good with large buffers. The decoder needs to allocate internal buffers, and I don't like having an API which can become unstable and "sometimes" throw OOM depending on the memory fragmentation of the rest of the .NET program. Things are designed to keep buffers at reasonable sizes, but that means streaming is mandatory. Here's an example of how the loop can look for your snippet, including a workaround for the issue:

```csharp
int pInput = 0;
int pOutput = 0;

// The 12 mirrors the LZMA2 dictionary-size property byte from your Lzma2Decode call.
using (var decoder = new Decoder(new DecoderSettings(12)))
{
    do
    {
        // Feed more input; the last argument flags when fewer than 16 input bytes remain.
        pInput += decoder.Decode(data.SourceData, data.SourceOffset + pInput,
            data.SourceLength - pInput, data.DestinationLength - pOutput,
            data.SourceLength - pInput < 16);

        // Drain whatever decoded bytes are available into the destination buffer.
        pOutput += decoder.ReadOutputData(data.DestinationData,
            data.DestinationOffset + pOutput, data.DestinationLength - pOutput);
    }
    while (!decoder.IsOutputComplete);
}
```
Seems to work a treat (at least so far) – thanks!
Regarding the end mark - if I decompress that block with the C++ call:
```cpp
SRes result = Lzma2Decode( dest, &dest_length, compressed_memory, &source_length, 12, ELzmaFinishMode::LZMA_FINISH_ANY, &status, &alloc );
```
It returns that it generated the correct amount of decompressed data (0x100000) and processed all but a single byte of the source data (0x100042 of 0x100043). If I change the finish mode to LZMA_FINISH_END, it reports that all source data has been consumed (0x100043 bytes).
Not sure what to make of that, and this is nothing to do with managed-lzma, so just an FYI!
Cheers
John
Note that the returned lengths make sense: an LZMA2 stream is terminated by a zero control byte, so with LZMA_FINISH_ANY the decoder may stop as soon as the requested output is complete, without consuming that final byte, while LZMA_FINISH_END also reads the terminator and verifies that the stream really ended there.
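A quick way to see that from the buffer itself (a hedged illustration; the file path is hypothetical):

```csharp
using System.Diagnostics;
using System.IO;

// The single byte LZMA_FINISH_ANY leaves unconsumed should be the LZMA2
// end-of-stream control byte, which is always zero.
byte[] compressed = File.ReadAllBytes("block.lzma2"); // hypothetical path
Debug.Assert(compressed[compressed.Length - 1] == 0,
    "expected a trailing LZMA2 end-of-stream control byte");
```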
As the subject states, I have a data block 0x100043 bytes in length that decompresses fine to 0x100000 bytes in C++ (code below), but throws a bad-data exception in Lzma2Dec_DecodeToDic() at line 323 when decompressed with managed lzma.
It's possible I'm using the API incorrectly - that would be the first thing to verify. The Lzma2Dec_DecodeToDic functions in managed lzma and C++ LZMA 1900 are just different enough that there's no easy change to make things work; it looks like the C# is a direct port of a different version of the C++.
C++ decode (which works fine):
C# code - which throws an exception (for testing, I put this at the top of sandbox-7z Program.cs):
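(For illustration only - the original snippets aren't reproduced here. Below is a plausible shape of the failing call, borrowing the Decoder API names from the maintainer's example earlier in the thread; the actual code may have differed.)

```csharp
// Hypothetical reconstruction, not the original snippet: a single
// feed/drain pass, without the loop the maintainer later suggests.
using (var decoder = new Decoder(new DecoderSettings(12)))
{
    decoder.Decode(data.SourceData, data.SourceOffset, data.SourceLength,
        data.DestinationLength, true);
    decoder.ReadOutputData(data.DestinationData, data.DestinationOffset,
        data.DestinationLength);
}
```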
I could send you the file, but I don't want to bog down the forum with a meg of incompressible data =)
What are your recommendations?
Cheers
John