-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Text in FileObject.size_in_bytes Field Causes Error #278
Comments
This issue was brought up again earlier this month on the cit-users list. While more granular exception handling would help us not lose the entire package, even just handling "bytes" in the field would be very helpful. Any thoughts on this? Matthew Hall writes:
|
We could certainly strip out all non-digit characters. This would break for things like "KB", but content like that wouldn't have worked in the first place. I'm a bit hesitant to do anything more than that. One thing about the current approach is that it handles hex ("0x") numbers correctly, but would break with this naive approach. Side note: I think octal numbers would be OK, since a leading "0" (except for "0x") causes Python to interpret it as octal, even if it's not "0o". |
I have never seen hex or octal values in this field, but "bytes" seems to be pretty common. I don't think just stripping non-digit characters would be a good choice. I'd prefer to handle/convert for a few common cases (bytes, KB, MB, etc.) and continue triggering an exception for ambiguous values. |
Thanks, @brlogan. One issues is that it is easiest to implement it for all UnsignedLongObjectPropertyType properties at the same time. A lot of the other fields are places where I would legitmately expect a "0x" prefix. I can see three basic solutions:
I'm leaning towards the third, but it could be overkill. Thoughts? |
I'm not sure that the third option is the better way to go. If we strip something like "MB" or "KB" from the end of the string and just use the numeric value as if it were bytes, then we are working with incorrect data. An error may be better in that case. Further, if you strip letters from the end, you may change the hex value. |
I have come across many STIX documents that include units in the FileObject.size_in_bytes field (despite its name). For example, I have a bunch of STIX documents with size_in_bytes values like "123 bytes". Because python-cybox tries to convert this value to a long, I get an "invalid literal for long() with base 10" error. From a TAXII/STIX perspective, it's annoying to have an entire STIX package fail because of this.
I'm not sure what the "right" answer to this problem is, but even something as simple as stripping out "bytes" or "B" would be helpful. If you wanted to get fancy, you could also do conversions for things like "KB", "MB", etc.
The text was updated successfully, but these errors were encountered: