Text in FileObject.size_in_bytes Field Causes Error #278

brlogan · 2015-11-11T02:12:31Z

I have come across many STIX documents that include units in the FileObject.size_in_bytes field (despite its name). For example, I have a bunch of STIX documents with size_in_bytes values like "123 bytes". Because python-cybox tries to convert this value to a long, I get an "invalid literal for long() with base 10" error. From a TAXII/STIX perspective, it's annoying to have an entire STIX package fail because of this.

I'm not sure what the "right" answer to this problem is, but even something as simple as stripping out "bytes" or "B" would be helpful. If you wanted to get fancy, you could also do conversions for things like "KB", "MB", etc.

brlogan · 2016-05-20T18:08:45Z

This issue was brought up again earlier this month on the cit-users list. While more granular exception handling would help us not lose the entire package, even just handling "bytes" in the field would be very helpful. Any thoughts on this?

Matthew Hall writes:

...when the code comes across a CybOX FileObj w/ a bogus Size_In_Bytes, the exception disrupts parsing the entire STIX Package not just the corrupted / invalid entity:

<FileObj:Size_In_Bytes condition="Equals">380058 bytes</FileObj:Size_In_Bytes>

ValueError: invalid literal for long() with base 10: '380058 bytes'
File ".../venv/lib/python2.7/site-packages/cybox/common/properties.py", line 514, in _parse_value
return long(value, 0)

How can I perform a best-effort parse with python-stix in order to operate as properly as possible in such situations?

gtback · 2016-05-23T20:16:08Z

We could certainly strip out all non-digit characters. This would break for things like "KB", but content like that wouldn't have worked in the first place. I'm a bit hesitant to do anything more than that. One advantage of the current approach is that it handles hex ("0x") numbers correctly, which the naive stripping approach would break. Side note: I think octal numbers would be OK, since a leading "0" (except for "0x") causes Python to interpret it as octal, even if it's not "0o".
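
To illustrate the trade-off above, here is a minimal sketch in Python 3. The traceback in this thread comes from Python 2.7, where `long(value, 0)` plays the role of `int(value, 0)`. The helper name `strip_non_digits` is hypothetical and not part of python-cybox. Note that Python 3 rejects a bare leading "0" in base 0, so the octal side note applies to Python 2 only.

```python
import re


def strip_non_digits(value):
    """Hypothetical helper: drop every character that is not 0-9."""
    return re.sub(r"[^0-9]", "", value)


# Base 0 lets Python infer the base from the prefix, so hex parses correctly.
assert int("0x1F", 0) == 31

# The same parse rejects a value that carries a unit suffix.
try:
    int("380058 bytes", 0)
except ValueError as exc:
    print("rejected:", exc)

# Stripping non-digits rescues the "bytes" case...
assert int(strip_non_digits("380058 bytes")) == 380058

# ...but silently corrupts hex: "0x1F" becomes "01", which parses as 1.
assert strip_non_digits("0x1F") == "01"
assert int(strip_non_digits("0x1F")) == 1
```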

brlogan · 2016-05-24T03:55:15Z

I have never seen hex or octal values in this field, but "bytes" seems to be pretty common. I don't think just stripping non-digit characters would be a good choice. I'd prefer to handle/convert for a few common cases (bytes, KB, MB, etc.) and continue triggering an exception for ambiguous values.
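
A rough sketch of what "convert a few common cases, raise on anything else" could look like follows. Everything in it is an assumption for illustration: the name `parse_size_in_bytes`, the set of accepted units, and above all the choice of 1 KB = 1024 bytes, since the thread does not say which convention producers use. It also does not attempt hex or octal input.

```python
import re

# Assumption: binary multiples (1 KB = 1024 bytes). A real fix would have
# to decide between 1000 and 1024; the thread does not settle this.
_UNIT_FACTORS = {
    "": 1,
    "b": 1,
    "byte": 1,
    "bytes": 1,
    "kb": 1024,
    "mb": 1024 ** 2,
}

# A decimal number, optional whitespace, then an optional alphabetic unit.
_SIZE_RE = re.compile(r"^\s*([0-9]+)\s*([A-Za-z]*)\s*$")


def parse_size_in_bytes(value):
    """Hypothetical helper: turn '123 bytes' or '2 KB' into an integer."""
    match = _SIZE_RE.match(value)
    if match is None:
        raise ValueError("unrecognized size: %r" % (value,))
    number, unit = match.groups()
    if unit.lower() not in _UNIT_FACTORS:
        # Unknown suffixes keep raising instead of being guessed at.
        raise ValueError("unsupported unit in size: %r" % (value,))
    return int(number) * _UNIT_FACTORS[unit.lower()]
```

With this sketch, "380058 bytes" yields 380058 and "2 KB" yields 2048, while "10 GB" or "0x1F" still raise `ValueError`.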

gtback · 2016-05-25T01:05:13Z

Thanks, @brlogan. One issue is that it is easiest to implement it for all UnsignedLongObjectPropertyType properties at the same time. A lot of the other fields are places where I would legitimately expect a "0x" prefix. I can see three basic solutions:

1. Implement custom logic for the size_in_bytes field.
2. If the value ends in "bytes", remove the last 6 characters.
3. Strip all alphabetic characters (upper and lower) from the beginning and end of the string. Octal and hex values should be unaffected, since they start with 0, but it would handle all kinds of suffixes.

I'm leaning towards the third, but it could be overkill. Thoughts?

brlogan · 2016-06-01T02:55:45Z

I'm not sure that the third option is the better way to go. If we strip something like "MB" or "KB" from the end of the string and just use the numeric value as if it were bytes, then we are working with incorrect data. An error may be better in that case. Further, if you strip letters from the end, you may change the hex value.
If someone is up to it, implementing some custom logic would be really nice, but with the frequency I've come across "bytes" in that field, I'd be happy with just the middle option.
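
The difference between the second and third options can be seen in a small Python 3 sketch. Both function names are hypothetical, and `strip_bytes_suffix` is a variant of the second option that removes the literal suffix rather than a fixed count of characters.

```python
import string


def strip_bytes_suffix(value):
    """Second option (variant): remove only a trailing 'bytes'."""
    stripped = value.strip()
    if stripped.endswith("bytes"):
        stripped = stripped[: -len("bytes")].rstrip()
    return stripped


def strip_letters(value):
    """Third option: strip ASCII letters and whitespace from both ends."""
    return value.strip(string.ascii_letters + string.whitespace)


# Both options handle the common case.
assert int(strip_bytes_suffix("380058 bytes"), 0) == 380058
assert int(strip_letters("380058 bytes"), 0) == 380058

# The second option leaves hex alone; the third eats the trailing "F".
assert int(strip_bytes_suffix("0x1F"), 0) == 31
assert strip_letters("0x1F") == "0x1"

# The third option accepts "10 KB" and reports 10, off by the unit factor.
assert int(strip_letters("10 KB"), 0) == 10

# The second option leaves "10 KB" intact, so parsing still raises.
try:
    int(strip_bytes_suffix("10 KB"), 0)
except ValueError:
    print("10 KB still rejected under the second option")
```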

gtback self-assigned this Jun 8, 2016

gtback removed their assignment Apr 24, 2020
