


Welcome to the Corpus Annotation Graph Builder (CAG)

License: MIT

CAG is a Python library that offers an architectural framework for applying the build-and-annotate pattern when building graphs.



Corpus Annotation Graph builder (CAG) is an architectural framework that employs the build-and-annotate pattern for creating a graph. CAG is built on top of ArangoDB and its Python driver (PyArango). The build-and-annotate pattern consists of two phases (see Figure below): (1) Data is collected from different sources (e.g., publication databases, online encyclopedias, news feeds, web portals, electronic libraries, repositories, media platforms) and preprocessed to build the core nodes, which we call Objects of Interest (OOIs). The component responsible for this phase is the Graph-Creator. (2) Annotations are extracted from the OOIs, and the corresponding annotation nodes are created and linked to the core nodes. The component responsible for this phase is the Graph-Annotator.

[Figure: cag]

This framework aims to offer researchers a flexible but unified and reproducible way of organizing and maintaining their interlinked document collections in a Corpus Annotation Graph.
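The two phases of the build-and-annotate pattern can be illustrated with a minimal, self-contained Python sketch. It does not use CAG's or PyArango's API and needs no database: `InMemoryGraph`, `ToyGraphCreator`, `ToyGraphAnnotator`, the `Document`/`Keyword`/`HasKeyword` collection names and the hashtag extractor are all hypothetical stand-ins. The only thing it borrows from ArangoDB is the convention of addressing documents as `collection/key` and storing edges as documents with `_from` and `_to` fields.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryGraph:
    """Hypothetical stand-in for an ArangoDB database (not CAG's API).

    Each collection maps document keys to documents; edge documents follow
    ArangoDB's convention of referencing nodes via "_from" and "_to" ids.
    """
    collections: dict = field(default_factory=dict)

    def upsert(self, collection: str, key: str, doc: dict) -> str:
        # Insert or overwrite a document and return its "collection/key" id.
        coll = self.collections.setdefault(collection, {})
        coll[key] = {"_key": key, **doc}
        return f"{collection}/{key}"

    def link(self, edge_collection: str, from_id: str, to_id: str) -> str:
        # Deterministic edge key, so re-linking the same pair is idempotent.
        key = f"{from_id.replace('/', '_')}-{to_id.replace('/', '_')}"
        return self.upsert(edge_collection, key, {"_from": from_id, "_to": to_id})


class ToyGraphCreator:
    """Phase 1 (hypothetical): preprocess source records into OOI nodes."""

    def __init__(self, graph: InMemoryGraph):
        self.graph = graph

    def build(self, records: list) -> list:
        ooi_ids = []
        for rec in records:
            # Minimal preprocessing: collapse runs of whitespace.
            text = " ".join(rec["text"].split())
            ooi_ids.append(
                self.graph.upsert("Document", rec["id"],
                                  {"text": text, "source": rec["source"]})
            )
        return ooi_ids


class ToyGraphAnnotator:
    """Phase 2 (hypothetical): extract annotations and link them to OOIs."""

    def __init__(self, graph: InMemoryGraph, extract):
        self.graph = graph
        self.extract = extract  # callable: text -> iterable of labels

    def annotate(self, ooi_ids: list) -> None:
        for ooi_id in ooi_ids:
            collection, key = ooi_id.split("/", 1)
            text = self.graph.collections[collection][key]["text"]
            for label in self.extract(text):
                # Shared annotations map to a single node per label.
                ann_id = self.graph.upsert("Keyword", label, {"label": label})
                self.graph.link("HasKeyword", ooi_id, ann_id)


def hashtags(text: str) -> set:
    # Toy extractor: lower-cased hashtags found in the text.
    return {tok[1:].lower() for tok in text.split() if tok.startswith("#") and len(tok) > 1}


graph = InMemoryGraph()
records = [
    {"id": "d1", "source": "news", "text": "Graphs  for #NLP and #corpora"},
    {"id": "d2", "source": "wiki", "text": "More #nlp research"},
]
ooi_ids = ToyGraphCreator(graph).build(records)       # phase 1: build
ToyGraphAnnotator(graph, hashtags).annotate(ooi_ids)  # phase 2: annotate
print(sorted(graph.collections["Keyword"]))  # ['corpora', 'nlp']
print(len(graph.collections["HasKeyword"]))  # 3
```

Keeping the creator and annotator as separate components means the annotation phase can be re-run or extended with new extractors without rebuilding the core nodes, which is the separation the build-and-annotate pattern is meant to provide.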

Installation

Direct install via pip

The package can be installed directly via pip:

pip install cag

This makes the cag module importable from any local Python script. The two main packages are cag.framework and cag.view_wrapper.

Manual cloning

Clone the repository, change to its root folder, and run:

pip install -e .

Citation

If you use CAG, please cite:

@inproceedings{el-baff-etal-2023-corpus,
  title = "Corpus Annotation Graph Builder ({CAG}): An Architectural Framework to Create and Annotate a Multi-source Graph",
  author = "El Baff, Roxanne  and
    Hecking, Tobias  and
    Hamm, Andreas  and
    Korte, Jasper W.  and
    Bartsch, Sabine",
  booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations",
  month = may,
  year = "2023",
  address = "Dubrovnik, Croatia",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.eacl-demo.28",
  pages = "248--255"
}

Usage

  • After the installation, a project scaffold can be created with the command cag start-project
  • Graph Creation [jupyter notebook]
  • Graph Annotation [jupyter notebook]

Zenodo refs

Latest Version

  • v1.6.0 DOI

Previous Version

  • v1.5.17 DOI
  • v1.5.0 DOI
  • v1.4.0 DOI
