Exports EAD, PDF and METS from updated resource records and digital objects in ArchivesSpace

RockefellerArchiveCenter/as_export

as_export

A script to automate and version the export of updated data from ArchivesSpace.

Dependencies

Getting Started

  1. Install dependencies

  2. Get a copy of the repo

    git clone git@github.com:RockefellerArchiveCenter/as_export.git
    

    or just download the zip file of this repo

  3. Create a local configuration file named local_settings.cfg in the same directory as the script and add variables. A sample file looks like this:

    [ARCHIVESSPACE]
    baseurl:http://localhost:8089
    repository:2
    user:admin
    password:admin
    
    [EAD]
    unpublished:false
    daos:true
    numbered:false
    
    [LAST_EXPORT]
    filepath:last_export.txt
    
    [DESTINATIONS]
    data = data
    ead = ead
    mets = mets
    
  4. Set up repositories

  5. Set a cron job to run as_export.py at an interval of your choice. This should be done in the crontab of the user whose SSH key has been added to GitHub.
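
As a minimal sketch of how the sample configuration from step 3 could be read, Python's standard configparser module accepts both the `key:value` and `key = value` forms used there. The `load_settings` helper and the inline `SAMPLE` string are illustrative assumptions, not code taken from as_export.py; a real run would read local_settings.cfg from disk instead.

```python
import configparser

# The sample configuration from step 3, inlined so the sketch is self-contained.
SAMPLE = """\
[ARCHIVESSPACE]
baseurl:http://localhost:8089
repository:2
user:admin
password:admin

[EAD]
unpublished:false
daos:true
numbered:false

[LAST_EXPORT]
filepath:last_export.txt

[DESTINATIONS]
data = data
ead = ead
mets = mets
"""


def load_settings(text):
    """Parse configuration text and return a ConfigParser (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


settings = load_settings(SAMPLE)
base_url = settings.get("ARCHIVESSPACE", "baseurl")
include_daos = settings.getboolean("EAD", "daos")
ead_dir = settings.get("DESTINATIONS", "ead")
```

Using `getboolean` for the [EAD] options means values such as `true` and `false` are converted to Python booleans rather than compared as strings.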

The first time you run this, the script may take some time to execute, since it will attempt to export all published resource records in your ArchivesSpace repository. If you ever want to do a complete export, simply delete last_export.txt and the last_export variable will be set to zero (i.e. the epoch, which was long before ArchivesSpace or any of the resources in it existed).
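
The fallback to zero described above can be sketched as follows. This assumes, purely for illustration, that last_export.txt holds an integer Unix timestamp as plain text; the actual on-disk format used by as_export.py may differ, and `read_last_export` and `write_last_export` are hypothetical names.

```python
import os
import tempfile


def read_last_export(filepath):
    """Return the stored timestamp, or 0 (the epoch) if the file is missing or unreadable."""
    try:
        with open(filepath) as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return 0


def write_last_export(filepath, timestamp):
    """Store the timestamp as plain text (assumed format)."""
    with open(filepath, "w") as f:
        f.write(str(int(timestamp)))


# Demonstration in a temporary directory so no real file is touched.
demo_path = os.path.join(tempfile.mkdtemp(), "last_export.txt")
first_run = read_last_export(demo_path)    # file does not exist yet
write_last_export(demo_path, 1700000000)
later_run = read_last_export(demo_path)
os.remove(demo_path)                       # deleting the file forces a complete export
after_delete = read_last_export(demo_path)
```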

Optional arguments

The script supports a few arguments, which will include or exclude specific functions. These arguments are also available via the command line by typing as_export -h.

--update_time updates the last export time stored in the external file to the current time. This is useful when you want to avoid exporting everything after reindexing, for example when migrating to a new version.

--digital exports METS for all digital object records, regardless of when those records were last updated. When this argument is used, the script does not update the last run time.

--resource %identifier% exports EAD for a specific resource record matching the ArchivesSpace %identifier%, regardless of when that resource was last updated. When this argument is used, the script does not update the last run time.

--resource_digital %identifier% exports METS for the digital object records associated with the resource record matching the ArchivesSpace %identifier%, regardless of when those records were last updated. When this argument is used, the script does not update the last run time.
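
One way these four arguments could be declared is with Python's standard argparse module. This is a sketch under stated assumptions, not the script's actual parser: the flag names come from the descriptions above, while `build_parser`, `updates_last_run_time`, the help strings and the identifier `FA001` are hypothetical.

```python
import argparse


def build_parser():
    """Declare the optional arguments described above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="as_export")
    parser.add_argument("--update_time", action="store_true",
                        help="set the last export time to the current time")
    parser.add_argument("--digital", action="store_true",
                        help="export METS for all digital object records")
    parser.add_argument("--resource", metavar="IDENTIFIER",
                        help="export EAD for one resource record")
    parser.add_argument("--resource_digital", metavar="IDENTIFIER",
                        help="export METS for one resource's digital objects")
    return parser


def updates_last_run_time(args):
    """Targeted exports leave the last run time unchanged, per the descriptions above."""
    return not (args.digital or args.resource or args.resource_digital)


targeted = build_parser().parse_args(["--resource", "FA001"])  # hypothetical identifier
default_run = build_parser().parse_args([])
```

Keeping the "does this run update the last run time?" decision in one small function makes the rule shared by the three targeted options easy to check.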

What's here

as_export.py

Exports EAD files from published resource records updated since last export (including updates to any child components or associated agents and subjects), as well as METS records for digital object records associated with those resource records. If a resource record is unpublished, this script will remove the EAD, PDF and any associated METS records. Exported or deleted files are logged to a text file log.txt. (Python)
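
The export-or-remove decision described above can be sketched as a small pure function. It assumes each updated resource is available as a dictionary with a boolean `publish` field and an `id_0` identifier field, as in ArchivesSpace resource JSON; `plan_actions` and the sample records are hypothetical and do not reflect how as_export.py is actually structured.

```python
def plan_actions(resources):
    """Split updated resources into identifiers to export and identifiers to remove."""
    to_export, to_remove = [], []
    for record in resources:
        if record.get("publish"):
            # Published: export EAD, PDF and associated METS.
            to_export.append(record["id_0"])
        else:
            # Unpublished: remove any previously exported files.
            to_remove.append(record["id_0"])
    return to_export, to_remove


# Hypothetical records standing in for resources updated since the last export.
updated = [
    {"id_0": "FA001", "publish": True},
    {"id_0": "FA002", "publish": False},
]
export_ids, remove_ids = plan_actions(updated)
```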

Development

This repository contains a configuration file for git pre-commit hooks which help ensure that code is linted before it is checked into version control. It is strongly recommended that you install these hooks locally by installing pre-commit and running pre-commit install.

License

This code is released under the MIT License. See LICENSE.md for more information.
