diff --git a/README.md b/README.md
new file mode 100644
index 0000000..9a4607c
--- /dev/null
+++ b/README.md
@@ -0,0 +1,20 @@
+# Unipept Index
+
+![Codecov](https://img.shields.io/codecov/c/github/unipept/unipept-index?token=IZ75A2FY98&logo=Codecov)
+
+This repository contains the Unipept index, written entirely in `Rust`. It consists of multiple Rust projects that depend on
+each other. More information about each project can be found in its own `README.md` file.
+
+## Installation
+
+Clone this repository with the following command:
+
+```bash
+git clone https://github.com/unipept/unipept-index.git
+```
+
+Then build the projects using:
+
+```bash
+cargo build --release
+```
diff --git a/fa-compression/README.md b/fa-compression/README.md
new file mode 100644
index 0000000..0246520
--- /dev/null
+++ b/fa-compression/README.md
@@ -0,0 +1,33 @@
+# Functional Annotation Compression
+
+![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/unipept/unipept-index/test.yml?logo=github)
+![Codecov](https://img.shields.io/codecov/c/github/unipept/unipept-index?token=IZ75A2FY98&flag=fa-compression&logo=codecov)
+![Static Badge](https://img.shields.io/badge/doc-rustdoc-green)
+
+The `fa-compression` library offers compression for Unipept's functional annotation strings. These strings follow a very specific
+format, which the compression algorithm exploits to guarantee at least **50%** compression for both very large and very
+small input strings. In practice, the compression ratio is typically around **60-70%**.
+
+The compression algorithm never has to allocate extra memory to build an encoding table or a similar structure. Each string is
+encoded separately, which is particularly useful when strings have to be encoded or decoded on their own: there is no need to decode
+an entire database just to fetch a single entry.
+
+## Example
+
+```rust
+use fa_compression;
+
+fn main() {
+    let encoded: Vec<u8> = fa_compression::encode(
+        "IPR:IPR016364;EC:1.1.1.-;IPR:IPR032635;GO:0009279;IPR:IPR008816"
+    );
+
+    // [ 44, 44, 44, 189, 17, 26, 56, 173, 18, 116, 117, 225, 67, 116, 110, 17, 153, 39 ]
+    println!("{:?}", encoded);
+
+    let decoded: String = fa_compression::decode(&encoded);
+
+    // "EC:1.1.1.-;GO:0009279;IPR:IPR016364;IPR:IPR032635;IPR:IPR008816"
+    println!("{:?}", decoded);
+}
+```
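
The `fa-compression` README states a guaranteed compression of about 50% without any per-string encoding table. One way such a bound can arise is by storing two 4-bit symbols per byte, so that `n` symbols occupy `ceil(n / 2)` bytes. The sketch below illustrates that idea only. The `ALPHABET` table and the `pack`/`unpack` helpers are hypothetical names chosen for this illustration; they are not the crate's API, and the crate's real symbol table and prefix handling may differ.

```rust
/// Hypothetical 4-bit alphabet (an assumption, not the crate's actual table).
/// Code 0 is reserved as padding, so at most 15 real symbols fit.
const ALPHABET: &[u8] = b"0123456789-.,;";

/// Map a character to its 1-based 4-bit code; `None` if it is not in the alphabet.
fn code_of(c: u8) -> Option<u8> {
    ALPHABET.iter().position(|&a| a == c).map(|i| i as u8 + 1)
}

/// Pack two 4-bit codes into every output byte.
/// An odd-length input gets a padding code (0) in the last low nibble.
fn pack(input: &str) -> Option<Vec<u8>> {
    let codes: Vec<u8> = input.bytes().map(code_of).collect::<Option<Vec<u8>>>()?;
    Some(
        codes
            .chunks(2)
            .map(|pair| (pair[0] << 4) | pair.get(1).copied().unwrap_or(0))
            .collect(),
    )
}

/// Reverse `pack`: split each byte into two codes and skip padding.
/// Codes outside the table (0 and 15) are silently dropped.
fn unpack(bytes: &[u8]) -> String {
    bytes
        .iter()
        .flat_map(|&b| [b >> 4, b & 0x0F])
        .filter_map(|code| {
            let index = (code as usize).checked_sub(1)?; // code 0 is padding
            ALPHABET.get(index).map(|&c| c as char)
        })
        .collect()
}
```

With this hypothetical table, `pack("1.1.1.-")` turns 7 characters into the 4 bytes `[44, 44, 44, 176]`, and `unpack` restores the original text. Because the table is a fixed constant, every string can be packed and unpacked on its own, which matches the per-entry access described in the README.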