Releases: CLARIAH/wp6-missieven
With entity and ent nodes
Now the entity occurrences are represented as ent nodes, and these nodes have the features eid and kind for entity ID and entity kind. There are also entity nodes that collect the entity occurrences with the same eid and kind.
The edge feature eoccs links entity nodes to their occurrences, the ent nodes.
So, a multiword entity occurrence now corresponds to a single ent node, linked to the words the entity occupies.
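The relation between occurrences and entities can be pictured with a small plain-Python model. This is only a sketch under stated assumptions: it does not use the Text-Fabric API, the words, eid values and kind values are invented for illustration, and the helper names group_occurrences and occurrence_text are hypothetical. It shows occurrences that each cover one or more word slots, and a grouping by eid and kind that plays the role of the eoccs edges.

```python
from collections import defaultdict

# Toy stand-in for the corpus: slot numbers mapped to words (invented data).
words = {
    1: "Jan", 2: "Pietersz", 3: "Coen", 4: "schrijft",
    5: "uit", 6: "Batavia", 7: "aan", 8: "Coen",
}

# Each item models one occurrence (an "ent" node): an entity ID, a kind,
# and the word slots it occupies. The eid and kind values are made up.
ents = [
    {"eid": "jan.pietersz.coen", "kind": "PER", "slots": (1, 2, 3)},
    {"eid": "batavia", "kind": "LOC", "slots": (6,)},
    {"eid": "jan.pietersz.coen", "kind": "PER", "slots": (8,)},
]


def group_occurrences(ents):
    """Collect occurrence indices per (eid, kind).

    Each key models one "entity" node; the list of indices models the
    edges that eoccs draws from that entity to its occurrences.
    """
    grouped = defaultdict(list)
    for i, ent in enumerate(ents):
        grouped[(ent["eid"], ent["kind"])].append(i)
    return dict(grouped)


def occurrence_text(ent, words):
    """Join the words in the slots that a single occurrence occupies."""
    return " ".join(words[s] for s in ent["slots"])


eoccs = group_occurrences(ents)
for (eid, kind), occs in eoccs.items():
    texts = [occurrence_text(ents[i], words) for i in occs]
    print(eid, kind, texts)
```

In this model a multiword occurrence such as the first one is a single item that covers three slots, and both occurrences with the same eid and kind end up under one entity.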
The ent and entity nodes are added to the original dataset. The version of the dataset is still 1.0e.
Note that most tutorials work with version 1.0, but not version 1.0e.
If you need to work with earlier versions of the missieven, specify the version in the use command, like so:
A = use("CLARIAH/wp6-missieven", version="1.0")
This works best if you have installed Text-Fabric as
pip install --upgrade 'text-fabric[all]'
because then TF can use the GitHub API to fetch the data.
If you only work with the latest version (1.0e) this is not needed.
With entities as nodes
Now the entities are represented as nodes and these nodes have the features eid and kind for entity ID and entity kind.
So, a multiword entity occurrence now corresponds to a single entity node, linked to the words the entity occupies.
The entity nodes are added to the original dataset. The new version of the dataset is 1.0e.
Note that all tutorials work with version 1.0, but not with version 1.0e.
If you need to work with earlier versions of the missieven, specify the version in the use command, like so:
A = use("CLARIAH/wp6-missieven", version="1.0")
This works best if you have installed Text-Fabric as
pip install --upgrade 'text-fabric[all]'
because then TF can use the GitHub API to fetch the data.
If you only work with the latest version (1.0e) this is not needed.
With a new entities export
Metadata added for harvesting by CLARIAH.
New version of entity annotations by Sophie Arnoult.
Note on the attachments:
No need to download them manually. They will be fetched by Text-Fabric when needed.
tf: the main corpus
voc-missives-export: entity annotations as produced in cltl/voc-missives
voc-missives-migrated: entity annotations as migrated from an earlier version in cltl/voc-missives
exercises-entities: results of toy example of creating entity annotations
exercises-numerics: results of toy example of creating other annotations
Includes volume 14
Volume 14 was not included so far.
It has two bands.
We converted this material from a textual pdf, produced the same kind of xml as for volumes 1-13,
and generated TF from the result.
All letters in a page, space corrections
All letters are now part of a page, including the letters that do not have a <pb> element in their text.
The space corrections by Sophie have been applied.
Spaces
Added spaces to feature punc and friends on the basis of a correction set by Sophie Arnoult.
Fixed words outside lines
When multiple letters occur on a single page,
the words on the first line of each non-first letter on that page were not wrapped in a line node.
This hindered a space optimization in the layered search app.
Corrected.
New data version 0.7
Data version 0.7 has a different treatment of footnotes.
Before, the footnote bodies were mere feature values.
Now they occupy slots and lines themselves.
v0.6
Data version 0.6.
Small fixes in folio references.
There is also a simple data export of all words plus basic information.
You can use this as input for natural language tools and named entity recognition.
The data can be used to run these tools on original words and editorial words separately.
See the export notebook for a detailed description and the way it is generated.
Data version 0.5
Fixed the generation of spurious newlines in footnote bodies