update docs (#61)
bicycle1885 authored Aug 11, 2018
1 parent 324c990 commit b87d348
Showing 8 changed files with 115 additions and 94 deletions.
4 changes: 2 additions & 2 deletions docs/make.jl
@@ -5,12 +5,12 @@ makedocs(
format=:html,
sitename="TranscodingStreams.jl",
modules=[TranscodingStreams],
pages=["index.md", "examples.md", "references.md", "devnotes.md"],
pages=["index.md", "examples.md", "reference.md", "devnotes.md"],
assets=["assets/custom.css"])

deploydocs(
repo="github.com/bicycle1885/TranscodingStreams.jl.git",
julia="0.6",
julia="0.7",
target="build",
deps=nothing,
make=nothing)
13 changes: 1 addition & 12 deletions docs/src/assets/custom.css
@@ -1,14 +1,3 @@
h1 {
font-size: 2.0em;
}

h2 {
font-size: 1.8em;
margin-top: 40px;
border-bottom: 1px solid #eeeeee;
}

table {
width: 125%;
font-size: 13px;
font-size: 0.8em;
}
2 changes: 1 addition & 1 deletion docs/src/devnotes.md
@@ -1,4 +1,4 @@
Developer's Notes
Developer's notes
=================

These notes are not for end users but rather for developers who are interested
76 changes: 44 additions & 32 deletions docs/src/examples.md
@@ -6,7 +6,7 @@ Read lines from a gzip-compressed file

The following snippet is an example of using CodecZlib.jl, which exports
`GzipDecompressorStream{S}` as an alias of
`TranscodingStream{GzipDecompressor,S} where S<:IO`:
`TranscodingStream{GzipDecompressor,S}`, where `S` is a subtype of `IO`:
```julia
using CodecZlib
stream = GzipDecompressorStream(open("data.txt.gz"))
@@ -16,9 +16,9 @@ end
close(stream)
```

Note that the last `close` call will close the file as well. Alternatively,
`open(<stream type>, <filepath>) do ... end` syntax will close the file at the
end:
Note that the last `close` call closes the wrapped file as well.
Alternatively, the `open(<stream type>, <filepath>) do ... end` syntax closes
the file when the block ends:
```julia
using CodecZlib
open(GzipDecompressorStream, "data.txt.gz") do stream
@@ -32,11 +32,11 @@ Read compressed data from a pipe
--------------------------------

The input is not limited to ordinary files. You can read data from a pipe
(actually, any `IO` object that implements basic I/O methods) as follows:
(actually, any `IO` object that implements standard I/O methods) as follows:
```julia
using CodecZlib
pipe, proc = open(`cat some.data.gz`)
stream = GzipDecompressorStream(pipe)
proc = open(`cat some.data.gz`)
stream = GzipDecompressorStream(proc)
for line in eachline(stream)
# do something...
end
@@ -50,15 +50,17 @@ Writing compressed data is easy. One thing you need to keep in mind is to call
`close` after writing data; otherwise, the output file will be incomplete:
```julia
using CodecZstd
using DelimitedFiles
mat = randn(100, 100)
stream = ZstdCompressorStream(open("data.mat.zst", "w"))
writedlm(stream, mat)
close(stream)
```

Of course, `open(<stream type>, ...) do ... end` works well:
Of course, `open(<stream type>, ...) do ... end` just works:
```julia
using CodecZstd
using DelimitedFiles
mat = randn(100, 100)
open(ZstdCompressorStream, "data.mat.zst", "w") do stream
writedlm(stream, mat)
@@ -69,10 +71,11 @@ Explicitly finish transcoding by writing `TOKEN_END`
----------------------------------------------------

When writing data, the end of a data stream is indicated by calling `close`,
which may write an epilogue if necessary and flush all buffered data to the
underlying I/O stream. If you want to explicitly specify the end position of a
stream for some reason, you can write `TranscodingStreams.TOKEN_END` to the
transcoding stream as follows:
which writes an epilogue if necessary and flushes all buffered data to the
underlying I/O stream. If you want to explicitly specify the end of a data
chunk for some reason, you can write `TranscodingStreams.TOKEN_END` to the
transcoding stream, which finishes the current transcoding process without
closing the underlying stream:
```julia
using CodecZstd
using TranscodingStreams
@@ -87,34 +90,35 @@ close(stream)
Use a noop codec
----------------

Sometimes, the `Noop` codec, which does nothing, may be useful. The following
example creates a decompressor stream based on the extension of a filepath:
The `Noop` codec does nothing (i.e., it buffers data without transforming it).
`NoopStream` is an alias of `TranscodingStream{Noop}`. The following example
creates a decompressor stream based on the extension of a filepath:
```julia
using CodecZlib
using CodecBzip2
using CodecXz
using TranscodingStreams

function makestream(filepath)
if endswith(filepath, ".gz")
codec = GzipDecompressor()
elseif endswith(filepath, ".bz2")
codec = Bzip2Decompressor()
elseif endswith(filepath, ".xz")
codec = XzDecompressor()
else
codec = Noop()
end
return TranscodingStream(codec, open(filepath))
end

makestream("data.txt.gz")
makestream("data.txt.bz2")
makestream("data.txt.xz")
makestream("data.txt")
```

Change the codec of a file
--------------------------

`TranscodingStream`s are composable: a stream can be an input/output of another
stream. You can use this to chage the codec of a file by composing different
stream. You can use this to change the format of a file by composing different
codecs as below:
```julia
using CodecZlib
@@ -135,11 +139,13 @@ Effectively, this is equivalent to the following pipeline:
Stop decoding on the end of a block
-----------------------------------

Most codecs support decoding concatenated data blocks. For example, if you
concatenate two gzip files into a file and read it using
`GzipDecompressorStream`, you will see the byte stream of concatenation of two
files. If you need the first part of the file, you can set `stop_on_end` to
`true` to stop transcoding at the end of the first block:
Many codecs support decoding concatenated data blocks (or chunks). For example,
if you concatenate two gzip files into a single file and read it using
`GzipDecompressorStream`, you will see the concatenated byte stream of the
two files. If you need the part corresponding to the first file, you can set
`stop_on_end` to `true` to stop transcoding at the end of the first block.
Note that setting `stop_on_end` to `true` does not close the wrapped stream
because you will often want to reuse it.
```julia
using CodecZlib
# cat foo.txt.gz bar.txt.gz > foobar.txt.gz
@@ -150,8 +156,8 @@ eof(stream) #> true

In the case where you need to reuse the wrapped stream, the code above must be
slightly modified because the transcoding stream may read more bytes than
necessary from the wrapped stream. By wrapping a stream with `NoopStream`, the
problem of overreading is resolved:
necessary from the wrapped stream. Wrapping the stream with `NoopStream` solves
the problem because adjacent transcoding streams share the same buffer.
```julia
using CodecZlib
using TranscodingStreams
@@ -170,9 +176,9 @@ error:
using CodecZlib

function decompress(input, output)
buffer = Vector{UInt8}(16 * 1024)
buffer = Vector{UInt8}(undef, 16 * 1024)
while !eof(input)
n = min(nb_available(input), length(buffer))
n = min(bytesavailable(input), length(buffer))
unsafe_read(input, pointer(buffer), n)
unsafe_write(output, pointer(buffer), n)
stats = TranscodingStreams.stats(input)
@@ -207,11 +213,17 @@ Transcode lots of strings
The `transcode(<codec type>, data)` method is convenient but suboptimal when
transcoding a number of objects. This is because the method allocates a new
codec object for every call. Instead, you can use the `transcode(<codec object>,
data)` method that reuses the allocated object as follows:
data)` method that reuses the allocated object as follows. In this usage, you
need to explicitly allocate and free resources by calling
`TranscodingStreams.initialize` and `TranscodingStreams.finalize`,
respectively.

```julia
using CodecZstd
using TranscodingStreams
strings = ["foo", "bar", "baz"]
codec = ZstdCompressor()
TranscodingStreams.initialize(codec) # allocate resources
try
for s in strings
data = transcode(codec, s)
@@ -220,7 +232,7 @@ try
catch
rethrow()
finally
CodecZstd.TranscodingStreams.finalize(codec)
TranscodingStreams.finalize(codec) # free resources
end
```

@@ -240,9 +252,9 @@ data2 = read(stream, 8)
@assert data1 == data2
```

The unread operaion is different from the write operation in that the unreaded
The unread operation is different from the write operation in that the unread
data are not written to the wrapped stream. The unread data are stored in the
internal buffer of a transcoding stream.

Unfortunately, an *unwrite* operation is not provided because there is no way to
cancel write operations that are already commited to the wrapped stream.
cancel write operations that are already committed to the wrapped stream.
92 changes: 50 additions & 42 deletions docs/src/index.md
@@ -1,30 +1,41 @@
TranscodingStreams.jl
=====================
# Home

Overview
--------
![TranscodingStream](./assets/transcodingstream.png)

TranscodingStreams.jl is a package for transcoding (e.g. compression) data
streams. It exports a type `TranscodingStream`, which is a subtype of `IO` and
supports various I/O operations like other usual I/O streams in the standard
library. Operations are quick, simple, and consistent.
## Overview

In this page, we intorduce the basic concepts of TranscodingStreams.jl and
available packages. The [Examples](@ref) page demonstrates common usage. The
[References](@ref) page offers a comprehensive API document.
TranscodingStreams.jl is a package for transcoding data streams. Transcoding
may be compression, decompression, ASCII encoding, or any other conversion
defined by a codec. The package exports a data type `TranscodingStream`, which
is a subtype of `IO` and wraps another `IO` object to transcode data read from
or written to the wrapped stream.

In this page, we introduce the basic concepts of TranscodingStreams.jl and
currently available packages. The [Examples](@ref) page demonstrates common
usage. The [Reference](@ref) page offers a comprehensive API document.

Introduction
------------
## Introduction

`TranscodingStream` has two type parameters, `C<:Codec` and `S<:IO`, and hence
the actual type should be written as `TranscodingStream{C<:Codec,S<:IO}`. This
type wraps an underlying I/O stream `S` by a codec `C`. The codec defines
transformation (or transcoding) of the stream. For example, when `C` is a
lossless decompressor type and `S` is a file, `TranscodingStream{C,S}` behaves
like a data stream that incrementally decompresses data from the file.
the concrete data type is written as `TranscodingStream{C<:Codec,S<:IO}`. This
type wraps an underlying I/O stream `S` with a transcoding codec `C`. `C` and
`S` are orthogonal, and hence you can use any combination of these two types.
The underlying stream may be any stream that supports I/O operations defined by
the `Base` module. For example, it may be `IOStream`, `TTY`, `IOBuffer`, or
`TranscodingStream`. The codec `C` must implement the transcoding protocol
defined in this package. We already have various codecs in packages listed
below. Of course, you can define your own codec by implementing the transcoding
protocol described in [`TranscodingStreams.Codec`](@ref).

Codecs are defined in other packages listed below:
You can install codec packages using the standard package manager. These codec
packages are independent of each other and can be installed separately. You
won't need to explicitly install the TranscodingStreams.jl package unless you
use its lower-level interfaces. Each codec package defines some codec types,
each of which is a subtype of `TranscodingStreams.Codec`, and their
corresponding transcoding stream aliases. These aliases are partially
instantiated by a codec type; for example, `GzipDecompressorStream{S}` is an
alias of `TranscodingStream{GzipDecompressor,S}`, where `S` is a subtype of
`IO`.
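
The relationship between a stream alias and the generic type can be checked
directly. The following is a minimal sketch, assuming CodecZlib.jl is installed
and that TranscodingStreams.jl is also available in the environment (it is
loaded here only to name `TranscodingStream`); the sample data is made up for
illustration:
```julia
using CodecZlib
using TranscodingStreams

# The alias is the generic type with the codec parameter already filled in.
@assert GzipDecompressorStream{IOBuffer} === TranscodingStream{GzipDecompressor,IOBuffer}

# Compress a small sample in memory, then wrap an IOBuffer to decompress it.
compressed = transcode(GzipCompressor, Vector{UInt8}("hello"))
stream = GzipDecompressorStream(IOBuffer(compressed))

# The wrapper is itself an `IO`, so ordinary read methods apply.
@assert stream isa TranscodingStream
@assert stream isa IO
@assert read(stream, String) == "hello"
close(stream)
```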

```@raw html
<table>
@@ -33,7 +44,7 @@ Codecs are defined in other packages listed below:
<th>Library</th>
<th>Format</th>
<th>Codec</th>
<th>Stream</th>
<th>Stream alias</th>
<th>Description</th>
</tr>
<tr>
@@ -100,7 +111,7 @@ Codecs are defined in other packages listed below:
<tr>
<td rowspan="2"><a href="https://github.com/bicycle1885/CodecZstd.jl">CodecZstd.jl</a></td>
<td rowspan="2"><a href="http://facebook.github.io/zstd/">zstd</a></td>
<td rowspan="2"><a href="https://github.com/facebook/zstd/blob/dev/doc/zstd_compressor_format.md">Zstandard Compressor Format</a></td>
<td rowspan="2"><a href="https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md">Zstandard Compression Format</a></td>
<td><code>ZstdCompressor</code></td>
<td><code>ZstdCompressorStream</code></td>
<td>Compress data in zstd (.zst) format.</td>
@@ -146,27 +157,24 @@ Codecs are defined in other packages listed below:
</table>
```

Install packages you need by calling `Pkg.add(<package name>)` in a Julia
session. For example, if you want to read gzip-compressed files, call
`Pkg.add("CodecZlib")` to use `GzipDecompressor` or `GzipDecompressorStream`.
By convention, codec types have a name that matches `.*(Co|Deco)mpression` and
I/O types have a codec name with `Stream` suffix. All codecs are a subtype
`TranscodingStreams.Codec` and streams are a subtype of `Base.IO`. An important
thing is these packages depend on TranscodingStreams.jl and not *vice versa*.
This means you can install any codec package you need without installing all
codec packages. Also, if you want to define your own codec, it is totally
feasible like these packages. TranscodingStreams.jl requests a codec to
implement some interface functions which will be described later.

## Notes

Error handling
--------------
### Wrapped streams

You may encounter an error while processing data with this package. For example,
your compressed data may be corrupted or truncated and the decompressor codec
cannot handle it properly. In this case, the codec informs the stream of the
error and the stream goes to an unrecoverable mode. In this mode, the only
possible operations are `isopen` and `close`. Other operations, such as `read`
or `write`, will result in an argument error exception. Resources allocated in
the codec will be released by the stream and hence you must not call the
finalizer of a codec that is once passed to a transcoding stream object.
The wrapper stream takes care of the wrapped stream. Reading from or writing to
the wrapped stream directly, bypassing the wrapper, will result in unexpected
behavior. To close the wrapped stream, call the `close` method of the wrapper
stream, which releases allocated resources and closes the wrapped stream.
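
As an illustration of this ownership rule, here is a minimal sketch that
assumes CodecZlib.jl is installed and uses a temporary file created with
`mktemp` (the file and its contents are made up for the example). Closing the
wrapper is the only `close` call needed:
```julia
using CodecZlib

# `mktemp` returns a path and an already opened, writable file handle.
path, io = mktemp()
stream = GzipCompressorStream(io)
write(stream, "hello")

# Closing the wrapper flushes buffered data, frees the codec's resources,
# and closes the wrapped file; `io` must not be closed separately.
close(stream)
@assert !isopen(io)

# Read the file back through a decompressor stream to check the contents.
text = open(GzipDecompressorStream, path) do s
    read(s, String)
end
@assert text == "hello"
rm(path)
```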

### Error handling

You may encounter an error while processing data with this package. For
example, your compressed data may be corrupted or truncated for some reason,
and the decompressor cannot recover the original data. In such a case, the
codec informs the stream of the error, and the stream enters an unrecoverable
mode. In this mode, the only possible operations are `isopen` and `close`.
Other operations, such as `read` or `write`, will result in an argument error
exception. Resources allocated by the codec will be released by the stream, and
hence you must not call the finalizer of the codec.
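
A minimal sketch of handling such a failure, assuming CodecZlib.jl is
installed. The helper name `try_decompress` and the sample inputs are
hypothetical, and the concrete exception type raised by the codec is not
assumed here:
```julia
using CodecZlib

# Hypothetical helper: return the decompressed bytes, or `nothing` when the
# input cannot be decoded.
function try_decompress(data::Vector{UInt8})
    stream = GzipDecompressorStream(IOBuffer(data))
    try
        return read(stream)
    catch err
        # The codec reported an error; the stream is no longer usable for I/O.
        @warn "decompression failed" exception = err
        return nothing
    finally
        # `close` is still permitted and lets the stream release the codec's
        # resources, so the codec is never finalized by hand.
        close(stream)
    end
end

good = transcode(GzipCompressor, Vector{UInt8}("hello"))
bad = Vector{UInt8}("this is not gzip data")

@assert String(try_decompress(good)) == "hello"
@assert try_decompress(bad) === nothing
```

Wrapping the logic in a function keeps the `try`/`finally` block self-contained, so the stream is closed on both the success and the failure path.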
7 changes: 4 additions & 3 deletions docs/src/references.md → docs/src/reference.md
@@ -1,5 +1,5 @@
References
==========
Reference
=========

```@meta
CurrentModule = TranscodingStreams
@@ -10,7 +10,8 @@ TranscodingStream

```@docs
TranscodingStream(codec::Codec, stream::IO)
transcode(codec::Codec, data::Vector{UInt8})
transcode(::Type{<:Codec}, data::ByteData)
transcode(codec::Codec, data::ByteData)
TranscodingStreams.TOKEN_END
TranscodingStreams.unsafe_read
TranscodingStreams.unread
5 changes: 5 additions & 0 deletions src/state.jl
@@ -2,6 +2,11 @@
# =================

# See docs/src/devnotes.md.
"""
A mutable state type of transcoding streams.
See Developer's notes for details.
"""
mutable struct State
# current stream mode
mode::Symbol # {:idle, :read, :write, :stop, :close, :panic}