repo to automatically merge Persons from Fackel with PMB
Once upon a time (summer 2022) csae8092 trained a csv-dedupe model to match person entities from the "Fackel" with PMB-Persons.
In order to make the process of merging those two datasets as reproducible as possible, the generated training data (./dedupe_files/training.json) as well as the output of the dedupe workflow (output_link.csv) are checked into this repo.
fetch_data.sh is a simple shell script that fetches the latest version of the Fackel-Personen data from a non-public GitLab repo and converts it into XML/TEI.
add_idno.py writes the PMB and Fackel person URIs into the Fackel listperson.xml.
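To illustrate the idea, here is a minimal sketch of how URIs could be written into a TEI listPerson as idno elements. It is not the actual add_idno.py: the function name add_idnos, the idno type values "pmb" and "fackel", the example.org URIs and the sample XML are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def add_idnos(root, uris_by_id):
    """Append <idno> children to each <person> whose xml:id is in uris_by_id.

    uris_by_id (hypothetical structure) maps an xml:id to {idno type: URI}.
    Returns the number of <idno> elements that were added.
    """
    added = 0
    for person in root.iter(f"{{{TEI_NS}}}person"):
        person_id = person.get(f"{{{XML_NS}}}id")
        for idno_type, uri in uris_by_id.get(person_id, {}).items():
            # Skip URIs that are already present so a rerun adds nothing.
            existing = {
                el.text
                for el in person.findall(f"{{{TEI_NS}}}idno")
                if el.get("type") == idno_type
            }
            if uri in existing:
                continue
            idno = ET.SubElement(person, f"{{{TEI_NS}}}idno", {"type": idno_type})
            idno.text = uri
            added += 1
    return added


# Made-up sample data, only for demonstration.
SAMPLE = """<listPerson xmlns="http://www.tei-c.org/ns/1.0">
  <person xml:id="p1"><persName>Example Person</persName></person>
  <person xml:id="p2"><persName>Unmatched Person</persName></person>
</listPerson>"""

root = ET.fromstring(SAMPLE)
mapping = {
    "p1": {
        "pmb": "https://example.org/pmb/1",
        "fackel": "https://example.org/fackel/p1",
    }
}
count = add_idnos(root, mapping)
print(ET.tostring(root, encoding="unicode"))
```

Persons without an entry in the mapping are left untouched, and calling the function a second time with the same mapping adds nothing, which is convenient when the step runs repeatedly in an automated workflow.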
Everything above is wrapped into a GitHub Action that pushes the resulting file to https://github.com/karl-kraus/fackel-texte