Implement bit and byte for information and data #419

Open
wants to merge 2 commits into
base: master
from


BrandonConder
Contributor

Create an Information dimension. Define Byte as the basic unit for data storage. Include data rates. Rename the symbols for barn (b -> bn) and Bel (B -> bel) to accommodate the new definitions.

Closes #173.
Mostly addresses #330.
Does not include, e.g., methods for converting between bits and entropy. This is probably reasonable, since most users will think of bits as data (e.g. storage capacity, transfer rates) rather than information.

Personal preferences:

  • I don't think this belongs in a separate units package because bit and byte are common and conflict with the existing definitions of barn and Bel.
  • Less common units are not included: nibble, 1024 multiples (kibibytes, mebibytes, etc.). A user could easily implement these.
  • Baud not defined because bits per symbol is not standardized.

Quirks:

  • I had to manually implement the SI prefixes for Bytes to avoid deci-bytes (dB), which is a meaningless unit and a conflict with Decibel.
  • ZettaByte and YottaByte were not included because the literal 8_000_000_000_000_000_000_000 undergoes type promotion to Float64 and causes a rounding error.
    • It might be possible to implement this using Int128(8_000_000_000_000_000_000_000)
    • If there were a way to use the Unitful prefix system while excluding deciByte, Unitful would handle these types gracefully (e.g. 1Zm/Em == 1000 returns true)
    • Behavior with the 32-bit Julia binary needs review, to ensure no units promote to Float32
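
The integer types behind these literals can be inspected in plain Julia, without Unitful. The following is a minimal sketch; `fits_int64` is a hypothetical helper, and whether Unitful's `@unit` macro keeps an Int128 factor exact instead of converting it to floating point is not shown here and would still need to be checked.

```julia
# Bit counts for the larger byte prefixes, written as plain integer literals.
exa_bits   = 8_000_000_000_000_000_000          # 8 * 10^18
zetta_bits = 8_000_000_000_000_000_000_000      # 8 * 10^21
yotta_bits = 8_000_000_000_000_000_000_000_000  # 8 * 10^24

# Hypothetical helper: does a value fit in a 64-bit signed integer?
fits_int64(x) = typemin(Int64) <= x <= typemax(Int64)

for (name, v) in (("EB", exa_bits), ("ZB", zetta_bits), ("YB", yotta_bits))
    println(name, ": ", typeof(v), ", fits in Int64: ", fits_int64(v))
end

# The wide literals hold the intended values exactly.
@assert zetta_bits == 8 * Int128(10)^21
@assert yotta_bits == 8 * Int128(10)^24
```

In base Julia the exa-scale literal still fits in Int64, while the zetta- and yotta-scale literals are parsed as Int128, which matches the point at which the manual prefixes stop in this PR.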

I've included tests for the manual SI prefixes.

Create information. Define Byte as basic unit for data storage. Include data rates. Modify barn (b -> bn) and Bel (B -> bel) to accommodate the new definitions.
Test that manually-added prefixes match expected SI prefixes.
@BrandonConder
Contributor Author

BrandonConder commented Jan 31, 2021

It looks like the Float32 problem is realized on x86. How to proceed? I can leave out SI prefixes and let users decide if they want to declare kB, MB, GB, etc.

I don't see any other units doing this, but can I declare e.g.

@unit      TB   "TB"     TeraByte          1_000GB    false

Alternatively, I could try to rewrite the @unit macro so that its tf argument accepts a flag for "prefixes > 1 only", but that seems like a pretty big overhaul for this edge case.
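
One thing that can be checked in plain Julia, independent of Unitful, is which of the manual bit counts fit in the native Int of a 32-bit build, where Int is Int32. Whether this is what actually triggers the Float32 behaviour reported above is an assumption; the thread does not confirm it. `fits` is a hypothetical helper.

```julia
# Hypothetical helper: does the value x fit in integer type T?
fits(x, T) = typemin(T) <= x <= typemax(T)

# Bit counts used for the manually prefixed byte units.
for (name, bits) in (("kB", 8_000), ("MB", 8_000_000), ("GB", 8_000_000_000))
    println(rpad(name, 3), " fits Int32: ", fits(bits, Int32),
            ", fits Int64: ", fits(bits, Int64))
end

# Int is Int64 on 64-bit builds and Int32 on 32-bit builds.
println("native Int on this machine: ", Int)
```

Chaining definitions, as in the TB example above, would keep each literal factor small (1_000), which might sidestep the issue if the literal size is indeed the cause.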

@@ -113,7 +115,7 @@ end
# The hectare is used more frequently than any other power-of-ten of an are.
@unit a "a" Are 100m^2 false
const ha = Unitful.FreeUnits{(Unitful.Unit{:Are, 𝐋^2}(2, 1//1),), 𝐋^2}()
-@unit b "b" Barn 100fm^2 true
+@unit bn "bn" Barn 100fm^2 true
Collaborator

This is breaking

#########
# Logarithmic scales and units

@logscale dB "dB" Decibel 10 10 false
-@logscale B "B" Bel 10 1 false
+@logscale bel "Bel" Bel 10 1 false
Collaborator

And this one as well

@BrandonConder
Contributor Author

Are backward-incompatible changes prohibited? In my field Bel is never used (only decibel), and I don't know how common or significant Barn is, so I assumed bit and Byte were the more common meanings of those abbreviations.

@giordano
Collaborator

giordano commented Feb 1, 2021

We can have a breaking change, but we would need to do a breaking release (i.e., go to v2.0), which would cause some friction in the ecosystem of packages relying on Unitful, for little benefit.

@sostock
Collaborator

sostock commented Feb 12, 2021

I would suggest using Byte and bit instead of B and b; that way it isn't breaking.

@@ -9,6 +9,7 @@
@dimension 𝚯 "𝚯" Temperature # This one is \bfTheta
@dimension 𝐉 "𝐉" Luminosity
@dimension 𝐍 "𝐍" Amount
+@dimension 𝐛 "𝐛" Information
Contributor

@mcabbott mcabbott Apr 13, 2021

Should this be a dimension? I thought that nat is, like rad, a pure number -- it's just some p*log(p) where p is a probability. And bits, like degrees etc, are multiples of that.

Suggested change
-@dimension 𝐛 "𝐛" Information
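
The view of nat and bit as pure numbers can be illustrated in plain Julia, without Unitful. This is a minimal sketch; `entropy_nats` and `nats_to_bits` are hypothetical helper names, not part of Unitful or of this PR.

```julia
# Shannon entropy of a discrete distribution, in nats (natural logarithm).
# Terms with zero probability contribute nothing.
entropy_nats(p) = -sum(x -> x > 0 ? x * log(x) : 0.0, p)

# One bit corresponds to log(2) nats, so dividing by log(2) converts nats to bits.
nats_to_bits(h) = h / log(2)

fair_coin = [0.5, 0.5]
h_nat = entropy_nats(fair_coin)
h_bit = nats_to_bits(h_nat)
println("fair coin: ", h_nat, " nat = ", h_bit, " bit")
```

Both results are ordinary floating-point numbers; the only difference between them is the constant factor log(2), much as degrees and radians differ by pi/180.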

Comment on lines +221 to +222
+# Data
+@unit B "B" Byte 8b false
Contributor

If the definitions were like this, would it still convert MB to GB as integers, without introducing the log(2)?

Suggested change
-# Data
-@unit B "B" Byte 8b false
+# Information
+@unit nat "nat" Nat 1 false
+@unit bit "bit" Bit log(2)*u"nat" false
+@unit B "B" Byte 8bit false
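
The concern behind this question can be sketched in plain Julia. The bookkeeping below is hypothetical and says nothing about how Unitful stores unit factors internally: it only contrasts a conversion that stays in exact rational multiples of a bit with one that multiplies by log(2) and divides it out again in Float64.

```julia
# Irrational factor between bit and nat, only approximated in Float64.
const NAT_PER_BIT = log(2)

# Hypothetical table: each unit as an exact rational number of bits.
bits_per = Dict("B" => 8//1, "MB" => 8_000_000//1, "GB" => 8_000_000_000//1)

# Exact route: converting between bit multiples never touches log(2).
convert_exact(x, from, to) = x * bits_per[from] / bits_per[to]

# Float route: go through nats, so log(2) enters and is divided out again.
convert_via_nat(x, from, to) =
    (x * Float64(bits_per[from]) * NAT_PER_BIT) / (Float64(bits_per[to]) * NAT_PER_BIT)

println(convert_exact(3, "GB", "MB"))    # exact Rational result
println(convert_via_nat(3, "GB", "MB"))  # Float64 result, subject to rounding
```

The exact route returns a Rational equal to 3000, while the float route is only guaranteed to be close to it. Which of the two behaviours the suggested definitions would produce in Unitful is exactly the open question here.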

Comment on lines +229 to +231
+@unit kB "kB" KiloByte 8_000b false
+@unit MB "MB" MegaByte 8_000_000b false
+@unit GB "GB" GigaByte 8_000_000_000b false
Contributor

@mcabbott mcabbott Apr 13, 2021

Should the base 2 ones also be included?

Suggested change
-@unit kB "kB" KiloByte 8_000b false
-@unit MB "MB" MegaByte 8_000_000b false
-@unit GB "GB" GigaByte 8_000_000_000b false
+@unit kB "kB" KiloByte 8_000b false
+@unit kiB "kiB" KibiByte 8_192b false
+@unit MB "MB" MegaByte 8_000_000b false
+@unit MiB "MiB" MebiByte 8_388_608b false
+@unit GB "GiB" GigaByte 8_000_000_000b false
+@unit GiB "GiB" GibiByte 8_589_934_592b false
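
The bit counts in this suggestion can be cross-checked with a few lines of plain Julia. A minimal sketch; `decimal_bits` and `binary_bits` are hypothetical helper names, and the printout uses the IEC spelling KiB for the kibibyte.

```julia
# Bits in one byte-based unit: n = 1 for kilo/kibi, 2 for mega/mebi, 3 for giga/gibi.
decimal_bits(n) = 8 * Int64(10)^(3n)   # SI prefixes: powers of 1000
binary_bits(n)  = 8 * Int64(2)^(10n)   # IEC prefixes: powers of 1024

for (n, dec, bin) in ((1, "kB", "KiB"), (2, "MB", "MiB"), (3, "GB", "GiB"))
    println(rpad(dec, 3), " = ", decimal_bits(n), " bit, ",
            rpad(bin, 3), " = ", binary_bits(n), " bit")
end
```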

Contributor

"GiB" appears twice in the suggested change, looks like a typo.

@sostock sostock mentioned this pull request Oct 6, 2021
@rico-cl

rico-cl commented Nov 21, 2021

Regarding the question of adding another dimension for the desired units, I propose going with dimensionless units, since

  • Unitful seems to try to adhere to the SI, which doesn't provide such a dimension
  • how to consistently handle units that are dimensionless within the SI framework is, as far as I understand, still an unsolved (or at least debated) problem in the respective research communities, so I wouldn't expect Unitful to come up with a solution itself
  • the units in question can be used to measure different things: amount of data, amount of storage, informational content of data (as denoted by Shannon entropy). So even if one wanted to assign a dimension to them, Information might not even be the most meaningful choice

@mcabbott
Contributor

mcabbott commented Apr 1, 2022

I was going to say that these points all apply to angles, where radians are also pure numbers. But it turns out they have in fact been blessed by SI.

Nevertheless, it is not an accident that storage and information are measured in the same units. There are mistakes you can make which won't be caught by checking the units, but there are also such mistakes when measuring both height and distance in metres.

@filchristou

filchristou commented Apr 9, 2022

I need such a unit to handle data rates in networking problems.
I think it's worth having a specific unit for this case because there is some confusion in my field when someone talks about Mbits/sec or MBytes/sec. A package that handles this transparently would be great.
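
The factor of 8 behind that confusion can be made explicit in plain Julia. A minimal sketch with hypothetical helper names; it assumes decimal prefixes (1 MB = 10^6 bytes) and is not tied to Unitful or to the definitions in this PR.

```julia
const BITS_PER_BYTE = 8

# Convert a data rate from Mbit/s to MByte/s (decimal prefixes on both sides).
mbit_per_s_to_mbyte_per_s(rate) = rate / BITS_PER_BYTE

# Time in seconds to move size_mbyte megabytes over a link rated in Mbit/s.
transfer_seconds(size_mbyte, rate_mbit_per_s) =
    size_mbyte * BITS_PER_BYTE / rate_mbit_per_s

println(mbit_per_s_to_mbyte_per_s(100))  # a 100 Mbit/s link in MByte/s
println(transfer_seconds(500, 100))      # seconds to move 500 MB over that link
```

With unit-carrying quantities, this factor would be applied by the conversion itself instead of by hand.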

@sostock sostock added the new units adding new units/dimensions/constants to this package label Dec 6, 2022
@singularitti
Contributor

Hope it's not off-topic: there is a relevant package https://github.com/uriele/UnitfulData.jl

Development

Successfully merging this pull request may close these issues.

Feature Request: Units of information
8 participants