This project provides Python scripts to interact with the Transkribus REST API.
- Batch Upload: Upload multiple documents and their PageXMLs to a specified Transkribus collection.
- Batch Update: Update PageXMLs/transcriptions of documents already in a Transkribus collection.
- Python 3.7+
- An active Transkribus account
- Clone the repository:

```
git clone https://github.com/cconzen/TranskribusBatchUtils.git
cd TranskribusBatchUtils
```
- Install the required dependencies:

```
pip install -r requirements.txt
```
- Set up your Transkribus login credentials in a `.env` file:

```
TRANSKRIBUS_USER=<email>
TRANSKRIBUS_PASSWORD=<password>
```
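The scripts need both of these variables. Below is a minimal sketch for checking that a `.env` file defines them, using only the standard library. The helper names are hypothetical, and the parser is a simplified assumption (plain `KEY=value` lines, `#` comments, optional quotes). It is not necessarily how the project itself loads the file.

```python
from pathlib import Path
from typing import Dict, List

# Variable names taken from the .env example above.
REQUIRED_KEYS = ("TRANSKRIBUS_USER", "TRANSKRIBUS_PASSWORD")


def read_env_file(path: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments (hypothetical helper)."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and one pair of matching quotes around the value.
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_credentials(path: str = ".env") -> List[str]:
    """Return the required keys that are absent or empty in the given .env file."""
    values = read_env_file(path)
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Calling `missing_credentials()` from the directory containing your `.env` returns an empty list when both keys are set.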
To upload a directory of documents to a Transkribus collection, run:

```
python main.py upload <base_directory> <collection_id>
```

- `<base_directory>`: the base directory that contains the documents.
- `<collection_id>`: the ID of the Transkribus collection to upload the documents to.
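To preview which files an upload would pick up, here is a hypothetical dry-run helper. It assumes the directory layout described in this README: one subfolder per document, with the PageXML files in that folder's `page/` subfolder. It only inspects local files and does not call the Transkribus API.

```python
from pathlib import Path
from typing import Dict, List


def collect_documents(base_dir: str) -> Dict[str, List[Path]]:
    """Map each document folder name to its PageXML files (hypothetical helper)."""
    documents = {}
    for doc_dir in sorted(Path(base_dir).iterdir()):
        page_dir = doc_dir / "page"
        if not page_dir.is_dir():
            continue  # skip stray files and folders without a page/ subfolder
        # Plain lexicographic order, so Page_10.xml sorts before Page_2.xml.
        documents[doc_dir.name] = sorted(page_dir.glob("*.xml"))
    return documents
```

For example, iterating over `collect_documents("documents").items()` and printing each name with `len(pages)` gives a quick per-document page count before uploading.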
To update the PageXMLs of all documents in a Transkribus collection, run:

```
python main.py update <base_directory> <collection_id>
```

- `<base_directory>`: the base directory that contains the updated documents.
- `<collection_id>`: the ID of the Transkribus collection that contains the documents to update.
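Updating relies on each document folder containing the `metadata.xml` from a Transkribus export, so a quick pre-flight check can list the folders that lack it. This is a hypothetical helper for illustration, not part of the scripts.

```python
from pathlib import Path
from typing import List


def folders_missing_metadata(base_dir: str) -> List[str]:
    """Return names of document folders without a metadata.xml file (hypothetical helper)."""
    return sorted(
        entry.name
        for entry in Path(base_dir).iterdir()
        # Only subdirectories count as document folders; loose files are ignored.
        if entry.is_dir() and not (entry / "metadata.xml").is_file()
    )
```

An empty result means every document folder has a `metadata.xml`. Otherwise, re-export the listed documents from Transkribus before running `update`.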
For example:

```
python main.py upload documents 12345
```

This uploads the contents of the `documents` directory to the collection with ID 12345.
Make sure your base directory follows this structure for the scripts to work properly. The `metadata.xml` file is only required for updating existing Transkribus documents; it should be created automatically when you export a document from Transkribus.
```
/BASE_DIR/
    /Document_1/
        metadata.xml
        /page/
            Page_1.xml
            Page_2.xml
    /Document_2/
        metadata.xml
        /page/
            Page_1.xml
            Page_2.xml
    ...
```
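To show what `metadata.xml` is typically used for, the sketch below reads a document ID from it. It assumes the file contains a `<docId>` element, as Transkribus exports usually do; the element name is an assumption, so check it against your own exported files. The function is hypothetical and is not taken from the project's code.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def read_doc_id(metadata_path: str) -> Optional[int]:
    """Return the integer in the first <docId> element, or None if absent (assumed schema)."""
    root = ET.parse(metadata_path).getroot()
    # Walk the whole tree (root included), matching on the local tag name so that
    # a namespaced tag such as "{ns}docId" is also recognized.
    for element in root.iter():
        if element.tag.rsplit("}", 1)[-1] == "docId" and element.text:
            return int(element.text.strip())
    return None
```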
This project is licensed under the MIT License.