Skip to content

The SIP (Submission Information Package) Validator is a Python script designed to validate the structure and content of a SIP directory. The script checks for the presence of expected folders and files, validates CSV files' structure, and ensures data integrity

Notifications You must be signed in to change notification settings

vt-digital-libraries-platform/sip-validator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 

Repository files navigation

sip-validator

The SIP (Submission Information Package) Validator is a Python script designed to validate the structure and content of a SIP directory. A SIP typically consists of digital objects, metadata files, supporting information, a manifest, and a Readme file. The script checks for the presence of expected folders and files, validates CSV files' structure, and ensures data integrity.

Prerequisites

  • Python 3.x
  • Standard Python libraries: os, csv, re, datetime

Usage

  1. Save the script as sip-validator.py.
  2. Run the script in a terminal or command prompt: python3 sip-validator.py
  3. Enter the path to the SIP root folder when prompted.

Functionality

  1. Folder and File Existence Check The script verifies the existence of the following folders and files in the SIP:
  • Data: Files
  • Manifest: checksumsha1
  • Metadata: collection_metadata.csv, item_metadata.csv.
  • README: (read me of what is in the folder).
  1. CSV File Validation The script checks the structure and content of collection_metadata.csv and item_metadata.csv: collection_metadata.csv
  • Required Fields: identifier, title, description, visibility, rightsholder, rights.
  • Unique Identifier Check: Ensures each identifier is unique.
  • Special Character Check: Looks for special characters in each field. item_metadata.csv
  • Required Fields: identifier, title, description, creator, display_date, start_date, end_date, rights_holder, rights, tags, coverage, type, language, visibility, location.
  • Unique Identifier Check: Ensures each identifier is unique.
  • Special Character Check: Looks for special characters in each field.
  1. Validation Results Validation results are saved in a receipt file (validation_receipt.txt) within the SIP root folder:
  • Validation Errors: Details errors encountered during validation.
  • Non-Unique Identifiers: Lists rows with non-unique identifiers (if applicable).
  • Validation Success: Indicates successful validation if no errors are found.
  • Example - python3 sip-validator.py
  • Enter the path to your SIP root folder: /path/to/sip_root
  • Validation completed. Receipt saved to: /path/to/sip_root/validation_receipt.txt

About

The SIP (Submission Information Package) Validator is a Python script designed to validate the structure and content of a SIP directory. The script checks for the presence of expected folders and files, validates CSV files' structure, and ensures data integrity

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages