Releases · CLARIAH/wp6-missieven

27 Mar 07:15

c86c5b3

Latest

With entity and ent nodes

Entity occurrences are now represented as ent nodes, which carry the features eid (entity ID) and kind (entity kind). There are also entity nodes, each of which collects the entity occurrences that share the same eid and kind.
The edge feature eoccs links entity nodes to their occurrences, the ent nodes.

So, a multiword entity occurrence now corresponds to a single ent node, linked to the words the entity occupies.
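The relationship between entity nodes, ent nodes, and words can be walked with the regular Text-Fabric API objects F, E, L and T. The following is a minimal sketch, not taken from the repository: the helper name entity_overview and its limit parameter are made up for illustration, and it assumes that the slot type of the corpus is word. It handles eoccs edges both with and without values, since the release note does not say which applies.

```python
from itertools import islice


def entity_overview(api, limit=5):
    """Return (eid, kind, [occurrence texts]) for the first `limit` entity nodes.

    `api` is a Text-Fabric API object (for example `A.api` after `A = use(...)`).
    The helper name and the `limit` parameter are illustrative assumptions.
    """
    F, E, L, T = api.F, api.E, api.L, api.T
    rows = []
    for entity in islice(F.otype.s("entity"), limit):
        texts = []
        for target in E.eoccs.f(entity):
            # Edge targets are plain nodes, or (node, value) pairs
            # when the edge feature carries values.
            ent = target[0] if isinstance(target, tuple) else target
            # Assumption: "word" is the slot type of this corpus.
            words = L.d(ent, otype="word")
            texts.append(T.text(words).strip())
        rows.append((F.eid.v(entity), F.kind.v(entity), texts))
    return rows


# Intended use (needs Text-Fabric and the data, so it is left commented out):
# from tf.app import use
# A = use("CLARIAH/wp6-missieven")
# for row in entity_overview(A.api):
#     print(row)
```

Each row groups the texts of all occurrences under one entity, which mirrors how entity nodes collect ent nodes that have the same eid and kind.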

The ent and entity nodes are added to the original dataset. The version of the dataset is still 1.0e.

Note that most tutorials work with version 1.0, but not version 1.0e.

If you need to work with earlier versions of the missieven, specify the version in the use command, like so:

A = use("CLARIAH/wp6-missieven", version="1.0")

This works best if you have installed Text-Fabric as

pip install --upgrade 'text-fabric[all]'

because then TF can use the GitHub API to fetch the data.

If you only work with the latest version (1.0e) this is not needed.
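To see what a newer data version adds, two versions can be loaded side by side and their node types compared. This is a hedged sketch: the helper names node_types and added_node_types are invented for illustration, and the only Text-Fabric call relied on is F.otype.all, which lists the node types of a loaded dataset.

```python
def node_types(api):
    """Return the set of node types in a loaded Text-Fabric dataset."""
    return set(api.F.otype.all)


def added_node_types(old_api, new_api):
    """Return the node types present in the new dataset but not in the old one."""
    return sorted(node_types(new_api) - node_types(old_api))


# Intended use (needs Text-Fabric and network access, so it is left commented out):
# from tf.app import use
# old = use("CLARIAH/wp6-missieven", version="1.0")
# new = use("CLARIAH/wp6-missieven", version="1.0e")
# print(added_node_types(old.api, new.api))
```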

26 Jan 14:49

dirkroorda

v1.0e

61b0cb1

With entities as nodes

Entities are now represented as nodes, which carry the features eid (entity ID) and kind (entity kind).
So, a multiword entity occurrence now corresponds to a single entity node, linked to the words the entity occupies.

The entity nodes are added to the original dataset. The new version of the dataset is 1.0e.

Note that all tutorials work with version 1.0, but not with version 1.0e.

If you need to work with earlier versions of the missieven, specify the version in the use command, like so:

A = use("CLARIAH/wp6-missieven", version="1.0")

This works best if you have installed Text-Fabric as

pip install --upgrade 'text-fabric[all]'

because then TF can use the GitHub API to fetch the data.

If you only work with the latest version (1.0e) this is not needed.

12 Oct 08:47

dirkroorda

v1.1

e955ebf

With a new entities export

Metadata added for harvesting by CLARIAH.

New version of entity annotations by Sophie Arnoult.

Note on the attachments:

No need to download them manually. They will be fetched by Text-Fabric when needed.

tf: the main corpus
voc-missives-export: entity annotations as produced in cltl/voc-missives
voc-missives-migrated: entity annotations as migrated from an earlier version in cltl/voc-missives
exercises-entities: results of a toy example of creating entity annotations
exercises-numerics: results of a toy example of creating other annotations

04 May 14:30

dirkroorda

1.0

f6e4276

Includes volume 14

Volume 14 was not included so far.
It has two bands.
We converted this material from a textual PDF, produced the same kind of XML as for volumes 1-13,
and generated TF from the result.

22 Jul 15:19

dirkroorda

v0.9.1

fc67e0b

All letters in a page, space corrections

All letters are now part of a page, including the letters that do not have a <pb> element in their text.
The space corrections by Sophie have been applied.

17 Jun 08:17

dirkroorda

v0.8.1

4039e4b

Spaces

Added spaces to feature punc and friends on the basis of a correction set by Sophie Arnoult.

20 May 14:40

dirkroorda

v0.8

126869d

Fixed words outside lines

When multiple letters occurred on a single page,
every letter after the first ended up with the words on its first line not wrapped in a line node.
This hindered a space optimization in the layered search app.
Corrected.

30 Jan 14:31

dirkroorda

v0.7

4553cbb

New data version 0.7

Data version 0.7 has a different treatment of footnotes.
Before, the footnote bodies were mere feature values.
Now they occupy slots and lines themselves.

07 Dec 11:03

dirkroorda

v0.6

9910511

v0.6

Data version 0.6.

Small fixes in folio references.

There is also a simple data export of all words plus basic information.
You can use this as input for natural language tools and named entity recognition.
The data can be used to run these tools on original words and editorial words separately.

See the export notebook for a detailed description and the way it is generated.

17 Nov 08:39

dirkroorda

v0.5

d229830

Data version 0.5

Fixed the generation of spurious newlines in footnote bodies.
