From 2e9ba1ec16c550762690f0bbf5783a5315da911b Mon Sep 17 00:00:00 2001
From: Walter Obweger
Date: Fri, 28 Aug 2020 05:15:25 +0200
Subject: [PATCH] presentation

---
 itpACDH_20200827_notes.md | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)
 create mode 100644 itpACDH_20200827_notes.md

diff --git a/itpACDH_20200827_notes.md b/itpACDH_20200827_notes.md
new file mode 100644
index 0000000..b9a8de3
--- /dev/null
+++ b/itpACDH_20200827_notes.md
@@ -0,0 +1,35 @@
+hello, I'm Walter.
+I was lucky enough to be assigned to the project Arthur Schnitzler Briefe, run by Martin.
+
+use the hashtag to explore more on twitter.
+
+correspSearch dot net is a platform hosted by the
+Berlin-Brandenburg Academy of Sciences and Humanities. it is intended to facilitate correspondence search: who wrote a letter to whom, when and where.
+to be included there, a TEI variant called CMIF has to be prepared:
+the Correspondence Metadata Interchange Format.
+
+there is an online editor available, as you see here.
+GND (Gemeinsame Normdatei) identifiers and geo locations are supported and resolvable.
+
+the input data were letters in print, already scanned.
+the larger volumes contained an index in three columns, starting with recipient and dates.
+
+there are multiple approaches to the task of converting the index into a CMIF file.
+
+Martin inspired me to use the following workflow.
+
+first we used Transkribus to perform text recognition.
+it is important to mention that text area recognition needed a little help; manual selection was unavoidable.
+
+now all lines of the index were in a text file, line by line.
+
+with OpenRefine this messy data was shaped into a table.
+during this process, problems in the text recognition surfaced.
+some dates looked correct at first glance but couldn't be, because they weren't in chronological order.
+
+to detect such problems, a little Python script in a Jupyter notebook was created.
+
+as a final step, the suspect dates were edited manually in OpenRefine
+and the table was shaped into a CMIF json file, which CMIF Creator can load.
+GND has been adjusted manually by Martin.
+
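
The notes mention a small Python script that detects dates which cannot be right because they break the chronological order of the index. The notebook itself is not part of this patch, so the following is only a minimal sketch of how such a check might look. The function names, the date format `%d.%m.%Y` and the sample rows are assumptions for illustration, not taken from the project.

```python
from datetime import datetime


def parse_date(text, fmt="%d.%m.%Y"):
    """Return a date, or None when the recognised text does not parse.

    The format string is an assumption; adjust it to the index at hand.
    """
    try:
        return datetime.strptime(text.strip(), fmt).date()
    except ValueError:
        return None


def find_suspect_dates(rows, fmt="%d.%m.%Y"):
    """Return (row_index, raw_text, reason) for every suspect date.

    A date is suspect when it cannot be parsed, or when it is earlier
    than the most recent date that could be parsed.
    """
    suspects = []
    previous = None
    for index, raw in enumerate(rows):
        parsed = parse_date(raw, fmt)
        if parsed is None:
            suspects.append((index, raw, "unparseable"))
            continue
        if previous is not None and parsed < previous:
            suspects.append((index, raw, "earlier than previous date"))
        # Always move on to the latest parsed date so that one bad
        # value does not cause every following row to be flagged.
        previous = parsed
    return suspects


# Hypothetical index column with two recognition errors:
# a wrong century and a letter "O" read in place of a zero.
sample = ["3.1.1902", "17.2.1902", "9.3.1802", "1O.4.1902", "2.5.1902"]

for index, raw, reason in find_suspect_dates(sample):
    print(f"row {index}: {raw!r} ({reason})")
```

A check like this only narrows down where to look. A date misread as too late is not flagged itself; the row after it is. So each flagged row and its neighbours still need a manual look, which matches the manual editing step described in the notes.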