Document version history
Document version | Date | Authors |
0.1 | 01-02-2017 | Robert Gillesse |
0.2 | 24-03-2017 | Robert Gillesse |
0.3 | 07-08-2017 | Robert Gillesse, input Afelonne Doek |
0.4 | January 2018 | Robert Gillesse, input Eric de Ruijter, Afelonne Doek |
0.5 | April 2018 | Robert Gillesse |
0.6 | May 2018 | Input Eric de Ruijter |
0.7 | Jan 2019 | Input and comments Afelonne |
0.8 | Feb 2019 | Hannah Mackay, proofreading |
0.9 | March 2019 | Robert Gillesse, check of links and addition of documentation |
1.0 | July 2019 | Robert Gillesse |
1.1 | 5-7-2019 | Submitted the text in the CTS Application Management Tool. As this tool accepts only plain text, small changes had to be made to preserve the logic of the text. In the version below all markup is still visible. |
1.2 | 19-12-2019 | Processed all comments of the CTS reviewers (arrived 14-11-2019), with input from Afelonne, Eric and Mario. All changes marked by text comments. |
1.3 | 29-4-2020 | Processed all remarks of the CTS reviewer (mail 26-3-2020). Removed revision markers from version 1.2. All new changes marked by text comments. |
1.4 | 26-6-2020 | Last changes before CTS publication (mail CTS 17-6-2020). Proofreading by Zefi Kavvadia. Removed revision markers from version 1.3. |
1.5 | 13-7-2020 | The IISH has obtained the CTS. The text of the application has been published online: https://www.coretrustseal.org/wp-content/uploads/2020/07/International-Institute-of-Social-History-IISH.pdf. The text below is identical to the CTS publication except for some minor layout differences (the version below should be slightly easier to read). |
Requirements source
Data Seal of Approval Guidelines version 2017-2019 November 10, 2016
Possible compliance levels for each of the Requirements
0 – Not applicable
1 – The repository has not considered this yet
2 – The repository has a theoretical concept
3 – The repository is in the implementation phase
4 – The guideline has been fully implemented in the repository
R0 Background information
- Repository type:
- Chosen categories: Domain or subject-based repository; Library/Museum/Archives; Other (please describe below).
- Brief description of repository: The International Institute of Social History (IISH) is a research and cultural heritage institute in the field of the global history of labour and labour relations (see also R1). The institute acquires, manages and preserves archives, library material and audiovisual material in this field, as well as research data. The IISH is a 'private archive' and not a state archive: it has no legal deposit function, and materials are collected outside the public sphere of government. Archives can be acquired from organisations (for instance trade unions or activist groups) or from private persons. Likewise, archives can be about organisations or about individuals (for instance Karl Marx, Mikhail Bakunin or Rosa Luxemburg).
- Designated communities:
- Researchers (mostly students and scientists from the humanities and social sciences, and research journalists)
- General public with (sometimes professional) interest in the IISH collections
- Archival donors
- Levels of curation: collection and research data: Enhanced curation – e.g., conversion to new formats, enhancement of documentation
- Outsource partners: For primary storage of the digital collections, the IISH uses the digital storage services of KNAW ICT services (KNAW: Royal Netherlands Academy of Arts and Sciences, see 6b directly below). This is formalised in an SLA between KNAW ICT services and the IISH. The KNAW outsources the storage to the datacenter ‘Vancis’ (https://vancis.nl/). For secondary storage, the IISH uses the ‘SURFsara’ data storage services (https://www.surf.nl/en/services-and-products/data-archive/index.html). Both parties are closely linked to the Dutch university sector, in which the security, trustworthiness and authenticity of data are paramount and strict information security requirements apply. Both parties comply with the relevant standards of the ISO 27001 family (https://en.wikipedia.org/wiki/ISO/IEC_27001). ‘Vancis’: https://vancis.nl/over-vancis/certificeringen/. ‘Surf’: https://www.surf.nl/en/services-and-products/data-archive/data-security-and-privacy/index.html.
- Other relevant information:
- A small selection of CTS criteria distinguishes between IISH collection data and IISH research data. Collection data are all data concerning the archival, library and audiovisual collections (either digitised or born-digital materials). Research data are the data resulting from research initiated by the IISH.
- It is important to mention that the IISH is a Royal Netherlands Academy of Arts and Sciences (KNAW) institute. This means that part of the IISH services and policies depend on broader KNAW policies. Responsibility for collection management, however, lies with an independent body: the IISH Foundation. Ownership of the research data created by IISH researchers lies with the KNAW institute IISH. Since 2017 the IISH has also been part of the Humanities Cluster (HUC), an alliance between the KNAW institutes IISH, the Meertens Institute (https://www.meertens.knaw.nl/cms/en/) and the Huygens-ING institute (https://www.huygens.knaw.nl/?lang=en). This alliance aims to stimulate cooperation between these institutes and to promote innovation in the areas of technical infrastructure and digital humanities research. This wider KNAW and HUC context gives the IISH more support and room to manoeuvre on issues like office IT, information security, digital humanities tools, and research and knowledge management. Where relevant, this wider context is mentioned in the criteria below.
- The IISH uses Archivematica, an open-source digital archiving workflow software which follows OAIS principles (https://www.archivematica.org/en/), to manage the transfer and ingest of digitised and born-digital content and to support the preservation watch function of the digital repository. In 2020 Archivematica will also be implemented to archive the IISH research data. Archivematica is therefore a core application in the IISH digital archiving process. Since 2019 the IISH has also been part of the Archivematica Support Group, which supports and monitors the development of new features and contributes to the Archivematica development roadmap.
- Important information for the reviewers:
- Almost all documentation mentioned in the requirements can be found on the public part of the IISH Confluence pages: https://confluence.socialhistoryservices.org/display/CTS/Certification+of+IISH+Digital+Repository+Home. Where this is not the case, this is mentioned explicitly.
- All documentation mentioned in the requirements can also be found in the List of documentation mentioned in CTS form: https://confluence.socialhistoryservices.org/display/CTS/List+of+documentation+mentioned+in+CTS+form.
- The DOI mentioned in the organizational profile only points to the research data archive of the IISH. See also point 6a.
- The plain text of this application form can also be read in a formatted and slightly easier to read version on the IISH Confluence: https://confluence.socialhistoryservices.org/display/CTS/Core+Trust+Seal+%28CTS%29+certification. All changes since the last version are also made visible there (as comments).
R1 The repository has an explicit mission to provide access to and preserve data in its domain
Guidance: Repositories take responsibility for stewardship of digital objects, and to ensure that materials are held in the appropriate environment for appropriate periods of time. Depositors and users must be clear that preservation of, and continued access to, the data is an explicit role of the repository. For this Requirement, please describe:
Extended guidance 2017: If data management is not referred to in the mission statement, then, as a rule, this Requirement cannot have a compliance level of 3 or higher.
Compliance level: 4 – The guideline has been fully implemented in the repository
Response: The IISH mission statement reads as follows: "The IISH is a unique institute, serving science and society on a global scale. At an international level, we generate and offer reliable information and insights about the (long-term) origins, effects and consequences of social inequality. To promote this, we form an international hub for social historians worldwide. We offer and produce historical sources and data, facilitate social-history research and collaborate internationally in ground breaking research projects. Moreover, by preserving the heritage of often oppressed social movements, the Institute serves the quality of the world's memory. With our work we hope to contribute to a vibrant civil society." Source: https://iisg.amsterdam/en/about/mission. In this statement, especially the phrases "We offer and produce historical sources and data" and "Moreover, by preserving the heritage of often oppressed social movements, the Institute serves the quality of the world's memory" refer to the institute's role in long-term stewardship. The IISH Strategic Plan 2019 - 2023 (https://confluence.socialhistoryservices.org/download/attachments/32703335/strategic_plan_2018-2023_iisg_engels.pdf) addresses the ambition for long-term stewardship and a trustworthy digital repository in paragraphs 3.1 and 3.2, and especially in measure 5.2 (page 26): "We develop an internationally accredited repository for the sustainable storage of digital collections". The Collection Policy 2015-2020 (https://confluence.socialhistoryservices.org/download/attachments/42568284/20180110_collectieplan_rev_2018.pdf?api=v2) also clearly points towards the ambition of long-term digital stewardship, relating it to the overall IISH mission. The plan states (page 5): "in 2020 all digital objects will be stored in a Trusted Digital Repository". Paragraph 3.8.3 (page 23) contains the most relevant information about long-term access to the digital collections of the IISH.
It is also important to mention that between 2017 and 2025 the IISH will receive ample funds from the KNAW to make the organisational and infrastructural transition from paper to long-term digital archiving. This is a clear mandate from the umbrella organisation of the Institute to invest in the people, knowledge, software and hardware needed for this transition. The ideas behind this can be found in the (Dutch) application document Van archief/bibliotheek naar humanities research infrastructure, Aanvraag vernieuwingsinvestering KNAW (From archive/library to a humanities research infrastructure, application for a KNAW renewal investment): https://confluence.socialhistoryservices.org/download/attachments/42568284/20161203_investeren_in_collecties_intranetversie_0.pdf
R2 Licenses
The repository maintains all applicable licenses covering data access and use and monitors compliance.
Guidance: Repositories must maintain all applicable licenses covering data access and use, communicate about them with users, and monitor compliance. This Requirement relates to the access regulations and applicable licenses set by the data repository itself, as well as any codes of conduct that are generally accepted in the relevant sector for the exchange and proper use of knowledge and information. Reviewers will be seeking evidence that the repository has sufficient controls in place according to the access criteria of their data holdings, as well as evidence that any relevant licenses or processes are well managed. For this Requirement, please describe:
Note that if all data holdings are completely public and without conditions imposed on users, such as attribution requirements or agreement to make secondary analysis openly available, then it can simply be stated. This Requirement must be read in conjunction with R4 (Confidentiality/Ethics) to the extent that ethical and privacy provisions impact on the licenses. Assurance that deposit licences provide sufficient rights for the repository to maintain, preserve, and offer access to data is covered under R10 (Preservation Plan). Extended guidance 2017: Access and use conditions could be set differently: either as standard terms and conditions, or as differentiated for particular depositors or datasets. These could cover the level of curation, what is the liability level, the level of responsibility taken for the data, limitations on use, limits on usage environment (safe room, secure remote access), and limits on types of users (approved researcher, has received training, etc.). The Creative Commons licences (https://creativecommons.org/), including CC 0 Waiver and public domain data, could be used as a reference here, but other alternatives are also possible. While it may be challenging to identify instances of noncompliance, some consideration should be given to the consequences if noncompliance is detected (e.g., sanctions on current or future access/use of data). In the case of sensitive personal data disclosure, there may be severe legal penalties that impact both the user and repository. Ideally, repositories should have a public policy in place for noncompliance. The minimum compliance level should be 4, if the applicant is currently providing access to data.
Compliance level: 4 - The guideline has been fully implemented in the repository
Response: Metadata records about collections and collection items contain clear information regarding access, restrictions and permissions, copyright holders and, when relevant, other licenses such as Creative Commons (CC). Internal procedures and automated responses are based on this information. The metadata themselves are published under a CC0 license. Research data are always published with a CC license. Some examples:
All descriptive metadata are available under a CC0 license: https://iisg.amsterdam/en/collections/use/api-linked-data
In the case of a restricted collection, non-compliance is not an issue, as these collections are simply not available to users. In the unlikely event that our systems or processes fail, we will act promptly to repair any breaches. In the case of copyright violation, we remind users of their legal obligations. See our 'Policy in case of noncompliance of licenses and copyright violations': https://confluence.socialhistoryservices.org/display/CTS/Policy+in+case+of+noncompliance+of+licences+and+copyright+violations.
R3 Continuity of Access
The repository has a continuity plan to ensure ongoing access to and preservation of its holdings.
Guidance: This Requirement covers the measures in place to ensure access to, and availability of, data holdings, both currently and in the future. Reviewers are seeking evidence that preparations are in place to address the risks inherent in changing circumstances. For this Requirement, please describe:
Evidence for this Requirement should relate more to governance than to the technical information that is needed in R10 (Preservation plan) and R14 (Data reuse), and should cover the situation in which R1 (Mission/Scope) changes. This Requirement contrasts with R15 (Technical infrastructure) and R16 (Security) in that it covers full business continuity of the preservation and access functions. Extended guidance 2017: The reviewer is looking for information to understand the level of responsibility for data. For example, are you the primary or only custodian? Is the depositor responsible as well? Does the repository promise to provide access, preservation, and/or data storage to some minimum quality level for some minimum time period? This information helps the reviewer to judge if the repository is sustainable in terms of its finances and processes; in particular, the continuity of its collections and responsibilities in the case of a major business continuity failure. The responsibility for sustainability may not lie in the hands of the repository itself, but a higher, overarching (or umbrella) organization. If so, this should be clearly indicated. Moreover, if the repository is part of such a larger (umbrella) organization, has this or any other organization (e.g., National Archives) guaranteed that it will take over the responsibility in the case of major business continuity failure?
Compliance level: 3 – The repository is in the implementation phase
Response: The IISH has a long-standing tradition (since 1935) of preserving and ensuring access to its (world-famous) collections. This is at the core of the institute's activities and a guiding principle for all policy and strategy documents. The plans for ongoing access to collections and research data can be found in multiple places, but are most clearly stated in the Collection Policy Plan 2015 - 2020 (https://confluence.socialhistoryservices.org/download/attachments/42568284/20180110_collectieplan_rev_2018.pdf?api=v2). A citation from the IISH collection plan: "1. The IISH aspires to be (in addition to being a first-class research institute) a world-leading centre for preserving and making available sources in the field of socio-economical history, in particular the world history of labour and industrial relations. 2. The collection is primarily formed, made accessible and made available for the purpose of scientific use, but it simultaneously forms a socially valuable heritage collection relevant to a broader public. 3. Highly decisive for realising the aforementioned ambitions are: a. the volume and quality of the collection; b. the dependability of safe storage, taking into account security, privacy and the trust of the archival creators. c. the level of the archive’s discoverability and findability; d. the quality and openness of the services used in making the archive available (online); e. the measures we take to maintain the collection; f. the extent to which the IISH leads nationally and internationally in using cutting-edge methods." IISH collections are given on loan or as a deposit to a separate and independent foundation, the IISH Foundation. The IISH Foundation is responsible for collection management; the KNAW institute IISH is responsible for carrying out this management. For research data, the KNAW institute IISH is both the owner and responsible for their management.
Being a KNAW institute and part of the HUC (see R0) guarantees the continuity of the IISH and its collections. The sharing of services (for instance IT) and knowledge (for instance on information security) between the KNAW and HUC institutes also enables the IISH to perform at a higher level than it could on its own. Since the foundation of the IISH, more than 100 million euro has been spent on the collections and on the infrastructural facilities to acquire, manage and preserve them. Each year, around 60 percent of the budget for IT investments has been spent on the sustainability and development of the collection and data infrastructure that supports the work processes, including digital preservation. It is unlikely that these funds will be redirected to something completely different, or that this funding will drop so significantly that it becomes a serious risk. The formal contract between the IISH and the KNAW recognises the importance of the collections and the need for their availability to researchers, acknowledging the continuing collection, processing and provision of the collections as one of the primary tasks of the IISH. The contract also states that the IISH receives financial means from the KNAW (which itself receives its funding from the Dutch state) to enable the institute to carry out its tasks. In the unlikely event that funding should run out, the IISH will turn to the KNAW to provide the means to continue taking care of both collections and research data. From the perspective of risk management, a yearly updated risk assessment document ('Risicomanagementplan 2020 Humanities Cluster', meaning 'Risk management plan 2020 Humanities Cluster', latest version September 2019, available on request) offers important evidence regarding the long-term conservation of the collection (see also R16). This document identifies the risks of data loss and cyber-attacks and explains how these risks are controlled.
R4 Confidentiality/Ethics
The repository ensures, to the extent possible, that data are created, curated, accessed, and used in compliance with disciplinary and ethical norms.
Guidance: Adherence to ethical norms is critical to responsible science. Disclosure risk—for example, the risk that an individual who participated in a survey can be identified or that the precise location of an endangered species can be pinpointed—is a concern that many repositories must address. Evidence sought is concerned with not only having good practices for data with disclosure risks, but also the necessity to maintain the trust of those agreeing to have personal/sensitive data stored in the repository. For this Requirement, responses should include evidence related to the following questions:
Evidence for this Requirement should be in alignment with provisions for the procedures stated in R12 (Workflows) and for any licenses in R2 (Licences). Extended guidance 2017: All organizations responsible for data have an ethical duty to manage them to the level expected by the scientific practice of its Designated Community. For repositories holding data about individuals, businesses, or other organizations, there are in addition expectations that the rights of the data subjects will be protected. These will be both of a legal and ethical nature. Disclosure of these data could also present a risk of personal harm, a breach of commercial confidentiality, or the release of critical information (e.g., the location of protected species or an archaeological site). Minimum compliance level should be a 4 if the repository is currently providing access to personal data. Reviewers expect to see evidence that the applicant understands their legal environment and the relevant ethical practices, and that they have documented procedures.
Compliance level: 4 - The guideline has been fully implemented in the repository
Response: Looking at the ethics concerning the collection in general, two things are important to mention:
Looking at the different phases of the archival process, the following criteria are important when talking about collection ethics:
Creation of archives:
Curation/collection of archives:
Pre-ingest (quick scan, appraisal and selection): During appraisal and selection of the archives to be ingested, sensitive parts may be identified and, in consultation with the archival donor, be de-selected or closed for a certain period of time (see below under 'Access'). The first step in this process is the so-called 'quick scan' of the new archive by the Collection Development employee who acquired the archive. This procedure is described in the document First inspection/quick scan of a new archive procedure: https://confluence.socialhistoryservices.org/pages/viewpage.action?pageId=60653936. Based on this quick scan, it can be decided to hold a more elaborate appraisal and selection session. During the quick scan the archive is also checked for possible privacy or other disclosure issues. This is done at a high, general level: in practice, not every individual file is checked; instead, a generated list of files and directories is scanned to try to identify any such issues.
Ingest:
Access:
The IISH risk management plan (see R3) gives ample attention to the risks of disclosure and to the measures taken to mitigate them. In addition, the legal department of the KNAW has checked all contracts and procedures. Most important to mention here is that for born-digital content the standard procedure is to create only an AIP (AIP storage is not connected to the outside world); a DIP (which can be made available in the reading room or, in rare cases, online) is not created automatically. A DIP is made 'manually' only after a deliberate decision by the collection acquisition and public services staff. With the European General Data Protection Regulation (GDPR) taking effect in May 2018, the KNAW has taken the lead in making sure that all KNAW institutes comply with the GDPR. Some of the necessary provisions (the KNAW privacy statement at https://www.knaw.nl/en/about-us/academy-privacy-statement, organisational procedures, a data protection officer etc.) have been arranged at KNAW level and apply to all the institutes. As for the collections, the Department of Collections at the IISH has made an inventory of workflows in which personal data are involved and has taken the necessary measures to ensure compliance. Regarding personal data present in the collections of the institute, a working group within the KNAW has been established to determine the necessary measures and monitor developments. Institution-specific documentation on the GDPR, such as the exceptions for collecting materials with personal data that are valid for scientific and historical research, and how these apply to the IISH, is available on Confluence.
Specifically for research data, the following is relevant: disclosure issues between depositor and IISH are laid down in the Provisions Data Deposit Agreement document (https://confluence.socialhistoryservices.org/pages/viewpageattachments.action?pageId=32703335&preview=/32703335/48988571/20160900_provisions_data_deposit_agreement.docx). The Agreement states: "The Depositor declares that the dataset contains no data or other elements that are, either in themselves or in the event of their publication, contrary to Dutch law." In other words, the same checks for disclosure issues apply to research data as to collection data.
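The generated file and directory list used during the quick scan (see Pre-ingest above) can be illustrated with a minimal Python sketch. This is an assumption for illustration only: the helper names `list_files` and `extension_summary` are hypothetical, and the actual procedure is the one in the linked Confluence document. The listing gives relative paths and sizes for every file, and a count per extension gives a first impression of the formats in a new archive.

```python
# Hypothetical quick-scan helpers; not the IISH's actual tooling.
import os
from collections import Counter
from pathlib import Path


def list_files(root):
    """Return (relative path, size in bytes) for every file under root, sorted by path."""
    root = Path(root)
    entries = []
    # os.walk visits every directory below root; only files are recorded.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            entries.append((full.relative_to(root).as_posix(), full.stat().st_size))
    return sorted(entries)


def extension_summary(entries):
    """Count files per lower-cased extension ('' for files without an extension)."""
    return Counter(Path(path).suffix.lower() for path, _size in entries)
```

For example, `list_files("/path/to/new/archive")` (a placeholder path) returns the entries, which can then be printed or exported as a list for the person doing the quick scan.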
R5 Organizational infrastructure
The repository has adequate funding and sufficient numbers of qualified staff managed through a clear system of governance to effectively carry out the mission.
Guidance: Repositories need funding to carry out their responsibilities, along with a competent staff who have expertise in data archiving. However, it is also understood that continuity of funding is seldom guaranteed, and this must be balanced with the need for stability. For this Requirement, responses should include evidence related to the following:
Full descriptions of the tasks performed by the repository (and the skills necessary to perform them) may be provided, if available. Such descriptions are not mandatory, however, as this level of detail is beyond the scope of core certification. Extended guidance 2017: The description of this Requirement should contain evidence describing the organization’s governance/management decision-making processes and the entities involved. Staff should have appropriate training in data management to ensure consistent quality standards. It is also important to know what proportion of staff is employed on a permanent or temporary basis and how this might affect the professional quality of the repository, particularly for long-term preservation. To what degree is funding structural or project-based? Can this be expressed in FTE numbers? How often does periodic renewal occur?
Compliance level: 4 - The guideline has been fully implemented in the repository
Response:
R6 Expert Guidance
The repository adopts mechanism(s) to secure ongoing expert guidance and feedback (either in-house, or external, including scientific guidance, if relevant).
Guidance: An effective repository strives to accommodate evolutions in data types, data volumes, and data rates, as well as to adopt the most effective new technologies in order to remain valuable to its Designated Community. Given the rapid pace of change in the research data environment, it is therefore advisable for a repository to secure the advice and feedback of expert users on a regular basis to ensure its continued relevance and improvement. For this Requirement, responses should include evidence related to the following questions:
This Requirement seeks to confirm that the repository has access to objective expert advice beyond that provided by skilled staff mentioned in R5 (Organizational infrastructure). Extended guidance 2017: The reviewer is looking for evidence that the repository is linked to a wider network of expertise in order to demonstrate access to advice and guidance for both its day-to-day activities and the monitoring of potential new challenges on the horizon (science and technology watch). Part of this information may already have been given under ‘R0. Brief Description of the Repository’s Designated Community’ and ‘Other relevant information’. If so, then please refer to it.
Compliance level: 4 - The guideline has been fully implemented in the repository
Response:
R7 Data integrity and authenticity
The repository guarantees the integrity and authenticity of the data.
Guidance: The repository should provide evidence to show that it operates a data and metadata management system suitable for ensuring integrity and authenticity during the processes of ingest, archival storage, and data access. Integrity ensures that changes to data and metadata are documented and can be traced to the rationale and originator of the change. Authenticity covers the degree of reliability of the original deposited data and its provenance, including the relationship between the original data and that disseminated, and whether or not existing relationships between datasets and/or metadata are maintained. For this Requirement, responses on data integrity should include evidence related to the following:
Evidence of authenticity management should relate to the following questions:
This Requirement covers the entire data lifecycle within the repository, and thus has relationships with workflow steps included in other requirements, for example R8 (Appraisal) for ingest, R9 (Documented storage procedures) and R10 (Preservation plan) for archival storage, and R12–R14 (Workflows, Data discovery and identification, and Data reuse) for dissemination. However, maintaining data integrity and authenticity can also be considered a mindset, and the responsibility of everyone within the repository. Extended guidance 2017: A clear and complete context section is important for all Requirements but this is especially the case for R7. The organization of the curation and the types of data will help guide the reviewer expectation. The reviewer will benefit from a clear overview of the processes and tools used to curate the data, including the level of manual and automated practice, and how the processes, tools, and practices are documented. The applicant may find it useful for this particular Requirement to respond to each bullet point separately, and to address integrity and authenticity independently, as defined in the Guidance of the Requirement. Audit trails, which are written records of the actions performed on the data, should be described in the evidence provided.
Compliance level: 4 – The guideline has been fully implemented in the repository
Response:
Data integrity checks by the IISH:
Fixity:
The integrity of the bitstreams is guaranteed by the institute's data provider. An independent comparison by the system between metadata and hashed content is being considered.
Version control:
Standards/conventions:
Data authenticity measures by the IISH:
Specifically for research data, the following is relevant: at the time of writing, research data are not yet archived using the Archivematica workflow. This is planned for the beginning of 2020 in a separate Dataverse-Archivematica integration project. When this work is finished, all of the above will apply to research data as well. For now, research data are published online via the Dataverse platform and the preservation level is that of bit preservation. This means that research data are stored on a replicated storage system and the fixity of the files is regularly checked. See also the schematic view of the four main digital archival workflows: https://confluence.socialhistoryservices.org/display/CTS/Scheme+of+four+main+IISH+digital+archival+workflows.
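The regular fixity check mentioned above, and the comparison between metadata and hashed content under Fixity, can be illustrated with a minimal sketch. It assumes that the recorded checksums are available as a mapping from relative file path to SHA-256 hex digest (for instance exported from package metadata). The function names `file_digest` and `check_fixity` are hypothetical; this is not the procedure used by the IISH, its storage providers or Archivematica.

```python
# Hypothetical fixity check; assumes recorded checksums are SHA-256 hex digests.
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so that large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_fixity(root, recorded, algorithm="sha256"):
    """Compare recorded checksums (relative path -> hex digest) with the files under root.

    Returns a dict with sorted lists of 'ok', 'mismatch' and 'missing' paths.
    """
    root = Path(root)
    report = {"ok": [], "mismatch": [], "missing": []}
    for rel_path, expected in sorted(recorded.items()):
        target = root / rel_path
        if not target.is_file():
            report["missing"].append(rel_path)
        elif file_digest(target, algorithm) == expected.lower():
            report["ok"].append(rel_path)
        else:
            report["mismatch"].append(rel_path)
    return report
```

A report with empty 'mismatch' and 'missing' lists means every recorded file was found unchanged; any other result could prompt restoring the affected file from another copy on the replicated storage.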
R8 Appraisal
The repository accepts data and metadata based on defined criteria to ensure relevance and understandability for data users.
Guidance: The appraisal function is critical in determining whether data meet all criteria for inclusion in the collection and in establishing appropriate management for their preservation. Care must be taken to ensure that the data are relevant and understandable to the Designated Community served by the repository. For this Requirement, responses should include evidence related to the following questions:
This Requirement addresses quality assurance from the viewpoint of the interaction between the depositor of the data and metadata and the repository. It contrasts with R11 (Data quality), which addresses metadata and data quality from the viewpoint of the Designated Community. Extended guidance 2017: The applicant should demonstrate that procedures are in place to ensure only data appropriate to the collection policy are accepted. Repository staff should have all the necessary information, procedures, and skills to ensure long-term preservation and use as applicable for the Designated Community. |
Compliance level: 4 – The guideline has been fully implemented in the repository |
Response: Collection data:
Research data: The appraisal of research data follows a slightly different route, as all research data are produced by IISH researchers themselves. These data therefore automatically fall within the collection profile (all IISH research follows the IISH research programme: https://iisg.amsterdam/en/research/programme). Checks for completeness and understandability (such as the addition of descriptive metadata) are done before the data are published on the IISH Dataverse platform. IISH research data are published using open formats. As explained in R7, from 2020 research data will also be ingested via a full-preservation workflow based on Archivematica. |
R9 Documented storage procedures
The repository applies documented processes and procedures in managing archival storage of the data.
Guidance: Repositories need to store data and metadata from the point of deposit, through the ingest process, to the point of access. Repositories with a preservation remit must also offer ‘archival storage’ in OAIS terms. For this Requirement, responses should include evidence related to the following questions:
Extended guidance 2017: The reviewer will be looking to understand each of the storage locations that support curation processes, how data are appropriately managed in each environment, and that processes are in place to monitor and manage change to storage documentation. Can the repository recover from short-term disasters? Are procedures documented and standardized in such a way that different data managers, while performing the same tasks separately, will arrive at substantially the same outcome? |
Compliance level: 4 – The guideline has been fully implemented in the repository |
Response: The primary archival storage of the IISH is provided by KNAW ICT Services. The AIPs are stored in two different data centers in the Netherlands, both provided by Vancis. Because of the nature of the materials and the agreements with archival donors, storage under Dutch law is a strict requirement of the IISH when selecting archival storage. For additional safety, Vancis also provides a 30-day backup of the data. As an extra security measure, the AIPs are also stored on secondary storage provided by SURFsara, including a backup. The secondary storage system is accessible only as WORM (write once, read many) storage, which permits neither deleting nor overwriting AIPs. AIPs are therefore always stored in versions: a new version of an AIP does not overwrite the old one but is written alongside it. See document: Multiple copies and backup https://confluence.socialhistoryservices.org/display/CTS/Multiple+copies+and+backup. Because the data are replicated across two different storage providers and mirrored across several data centers, they have a high level of protection against data loss. See document Security against data loss: https://confluence.socialhistoryservices.org/display/CTS/Security+against+data+loss. |
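The write-once behaviour of the secondary storage can be summarized as: writes append a new version, reads can address any version, and deletes are refused. The toy in-memory model below only illustrates those semantics. `WormStore`, `WriteOnceError` and the AIP identifiers are hypothetical names; the sketch does not describe how SURFsara implements WORM storage.

```python
from typing import Dict, List, Optional


class WriteOnceError(Exception):
    """Raised when an operation would delete or overwrite stored data."""


class WormStore:
    """Toy in-memory model of write-once, read-many (WORM) storage.

    Writing an AIP identifier that already exists appends a new version;
    earlier versions stay readable and nothing can be deleted.
    """

    def __init__(self) -> None:
        self._versions: Dict[str, List[bytes]] = {}

    def write(self, aip_id: str, payload: bytes) -> int:
        """Append payload as a new version and return its 1-based version number."""
        versions = self._versions.setdefault(aip_id, [])
        versions.append(bytes(payload))  # copy so later changes to a bytearray cannot alter it
        return len(versions)

    def read(self, aip_id: str, version: Optional[int] = None) -> bytes:
        """Return a specific version, or the latest one if no version is given."""
        versions = self._versions[aip_id]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise IndexError(f"{aip_id!r} has no version {version}")
        return versions[version - 1]

    def delete(self, aip_id: str) -> None:
        """Deletion is never allowed on WORM storage."""
        raise WriteOnceError(f"cannot delete {aip_id!r}: storage is write-once")
```

Re-ingesting an AIP with `write` therefore yields version 2 while version 1 remains retrievable.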
R10 Preservation plan
The repository assumes responsibility for long-term preservation and manages this function in a planned and documented way.
Guidance: The repository, data depositors, and Designated Community need to understand the level of responsibility undertaken for each deposited item in the repository. The repository must have the legal rights to undertake these responsibilities. Procedures must be documented and their completion assured. For this Requirement, responses should include evidence related to the following questions:
Extended guidance 2017: The reviewer will be looking for clear, managed documentation to ensure: (1) an organized approach to long-term preservation, (2) continued access for data types despite format changes, and (3) there is sufficient documentation to support usability by the Designated Community. The response should address whether the repository has defined preservation levels and, if so, how these are applied. The preservation plan should be managed to ensure that changes to data technology and user requirements are handled in a stable and timely manner. |
Compliance level: 4 – The guideline has been fully implemented in the repository |
Response: The Digital Preservation Policy 2019-2022 is published on the IISH Confluence wiki (https://confluence.socialhistoryservices.org/display/CTS/Digital+Preservation+Policy+2019-2022). There are two preservation levels which will be used at the IISH:
Collection data are preserved at 'full preservation level'; research data, at the time of writing, are preserved at 'bit preservation level'. In 2020, when research data will also be processed with the 'Archivematica workflow', the full preservation level will also apply to research data (see also R12). For the contract between the archive and the archival donor/depositor, a distinction has to be made between collection and research data:
Collection data:
Research data: When the institute acquires research datasets from outside the institute, a data deposit agreement is drawn up between the IISH and the researcher/depositor: https://confluence.socialhistoryservices.org/download/attachments/42568284/20160900_-_data_deposit_agreement.docx. This standard contract lays down all provisions regarding the deposit of the dataset. The provisions of the data deposit agreement (https://confluence.socialhistoryservices.org/download/attachments/42568284/20160900_provisions_data_deposit_agreement%20%281%29.docx) contain the following relevant passage:
|
R11 Data Quality
The repository has appropriate expertise to address technical data and metadata quality and ensures that sufficient information is available for end users to make quality-related evaluations.
Guidance: Repositories must work in concert with depositors to ensure that there is enough available information about the data such that the Designated Community can assess the substantive quality of the data. Such quality assessment becomes increasingly relevant when the Designated Community is multidisciplinary, where researchers may not have the personal experience to make an evaluation of quality from the data alone. Repositories must also be able to evaluate the technical quality of data deposits in terms of the completeness and quality of the materials provided, and the quality of the metadata. Data, or associated metadata, may have quality issues relevant to their research value, but this does not preclude their use in science if a user can make a well-informed decision on their suitability through provided documentation. For this Requirement, please describe:
Provisions for data quality are also ensured by other Requirements. Specifically, please refer to R8 (Appraisal), R12 (Workflows), and R7 (Data integrity and authenticity). Extended Guidance: |
Compliance level: 4 - The guideline has been fully implemented in the repository |
Response: Collection data
Research data
|
R12 Workflows
Archiving takes place according to defined workflows from ingest to dissemination.
Guidance: To ensure the consistency of practices across datasets and services and to avoid ad hoc and reactive activities, archival workflows should be documented, and provisions for managed change should be in place. The procedure should be adapted to the repository mission and activities, and procedural documentation for archiving data should be clear. For this Requirement, responses should include evidence related to the following:
This Requirement confirms that all workflows are documented. Evidence of such workflows may have been provided as part of other task-specific Requirements, such as for ingest in R8 (Appraisal), storage procedures in R9 (Documented storage procedures), security arrangements in R16 (Security), and confidentiality in R4 (Confidentiality/Ethics). Extended Guidance 2017: The reviewer is looking for evidence that the applicant takes a consistent, rigorous, documented approach to managing its activities throughout their processes and that changes to those processes are appropriately implemented, evaluated, recorded, and administered. |
Compliance level: 3 – The repository is in the implementation phase |
Response: Collection data The IISH has two workflows for the ingest of collection data in the repository:
Security and impact on workflows
Approaches towards data outside the mission/collection profile
The method and depth of appraisal and selection of born-digital collections are the subject of ongoing discussion and an adaptive process, because of the changing nature of the materials and of the organisations and persons that provide them. That said, the following materials will be deselected as a rule:
In the near future, additional functionality will also be developed within Archivematica for identifying and removing duplicate files.
Change management of the workflows
Research data
The archival workflow for research data is largely determined by the Dataverse platform (https://datasets.socialhistory.org/) that the IISH uses for dissemination of the datasets: http://dataverse-guides.readthedocs.io/en/latest/user/dataset-management.html. Files are stored on a file server (which is backed up). At the time of writing, this means that for the IISH datasets only the integrity of the files can be guaranteed (bit preservation level, see R7). In 2020 the datasets will be ingested through an Archivematica workflow, which means that they will be archived at full preservation level (see R7). Research datasets uploaded to Dataverse will be transferred to the Archivematica workflow and follow the standard ingest. The data will be published if access is open. See also the scheme of the four main IISH digital archival workflows: https://confluence.socialhistoryservices.org/display/CTS/Scheme+of+four+main+IISH+digital+archival+workflows |
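Because datasets are disseminated through Dataverse, their metadata and file lists can also be retrieved programmatically via the Dataverse native API, which looks up a dataset by persistent identifier at `/api/datasets/:persistentId/`. The sketch below only builds the request URL and parses a response; the identifier `hdl:10622/EXAMPLE` is hypothetical, and the assumed response shape (`status`, `data`, `latestVersion`, `files`, `label`) should be checked against the Dataverse version running at datasets.socialhistory.org.

```python
import json
from typing import List
from urllib.parse import urlencode

DATAVERSE_BASE = "https://datasets.socialhistory.org"  # IISH Dataverse, as named in the text


def dataset_url(persistent_id: str, base: str = DATAVERSE_BASE) -> str:
    """Build a native-API URL that fetches a dataset by its persistent identifier."""
    query = urlencode({"persistentId": persistent_id})  # escapes ':' and '/'
    return f"{base}/api/datasets/:persistentId/?{query}"


def latest_file_labels(response_body: str) -> List[str]:
    """List the file labels in the latest version of a dataset JSON response.

    Assumes the envelope {"status": ..., "data": {"latestVersion": {"files": [...]}}}.
    """
    payload = json.loads(response_body)
    if payload.get("status") != "OK":
        raise ValueError(f"Dataverse returned status {payload.get('status')!r}")
    files = payload["data"]["latestVersion"].get("files", [])
    return [entry["label"] for entry in files]
```

The HTTP request itself (for example with `urllib.request.urlopen`) is left out so the parsing logic can be tested offline.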
R13 Data Discovery and Identification
The repository enables users to discover the data and refer to them in a persistent way through proper citation.
Guidance: Effective data discovery is key to data sharing, and most repositories provide searchable catalogues describing their holdings such that potential users can evaluate data to see if they meet their needs. Once discovered, datasets should be referenceable through full citations to the data, including persistent identifiers to ensure that data can be accessed into the future. Citations also provide credit and attribution to individuals who contributed to the creation of the dataset. For this Requirement, responses should include evidence related to the following questions:
Extended Guidance R13 |
Compliance level: 4 - The guideline has been fully implemented in the repository |
Response: Collection data
Research data
|
R14 Data Reuse
The repository enables reuse of the data over time, ensuring that appropriate metadata are available to support the understanding and use of the data.
Guidance: Repositories must ensure that data can be understood and used effectively into the future despite changes in technology. This Requirement evaluates the measures taken to ensure that data are reusable. For this Requirement, responses should include evidence related to the following questions:
The concept of ‘reuse’ is critical in environments in which secondary analysis outputs are redeposited into a repository alongside primary data, since the provenance chain and associated rights issues may then become increasingly complicated. Reuse is dependent on the applicable licenses covered in R2 (Licenses). Extended Guidance (2017): The applicant should understand the needs of the Designated Community in terms of their research practices, technical environment, and applicable standards. Changes in technology are important, but appropriate high-quality metadata should also play an essential role and should be referred to in the evidence provided. The latter information is critical to design curation processes that result in digital objects meeting the needs of the end user, as well as generic or disciplinary standards. |
Compliance level: 4 - The guideline has been fully implemented in the repository |
Response: Metadata of archival donors
See answer to R11: "The IISH aims to give proper access to all collections by adding sufficient descriptive metadata... created by the institute itself."
Data provided for the Designated Community
Measures taken into account for the possible evolution of formats
Understandability of the data
Understandability of the metadata
All collection data are described following the international standards MARC21 and EAD (see R13). For MARC21 descriptions the IISH follows the RDA (Resource Description and Access) description guidelines which, among other things, guarantee the authenticity and interoperability of the metadata. EAD guarantees the standardized description of archival collections. Each archival description comes with elaborate context information (for instance http://hdl.handle.net/10622/ARCH02639 under the header 'content and context'). Additional IISH description guidelines further standardize the use of the metadata within the institute. Research data are described using the DDI standard via the IISH Dataverse platform (https://datasets.socialhistory.org/). This ensures that the data are described in a standardized and interoperable fashion. |
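Links such as http://hdl.handle.net/10622/ARCH02639 are Handle persistent identifiers. Besides redirecting a browser, the hdl.handle.net proxy offers a JSON REST API at `/api/handles/<handle>` that returns the handle's typed values, where a `responseCode` of 1 signals success. The sketch below builds that URL and extracts the URL-type values. The function names are hypothetical, and the sample response used in the test (with an example.org target) is illustrative, not the real record for ARCH02639.

```python
import json
from typing import List
from urllib.parse import quote

HANDLE_PROXY = "https://hdl.handle.net"


def handle_api_url(handle: str, proxy: str = HANDLE_PROXY) -> str:
    """URL of the proxy's JSON REST API record for a handle such as '10622/ARCH02639'."""
    return f"{proxy}/api/handles/{quote(handle, safe='/')}"


def url_targets(response_body: str) -> List[str]:
    """Return the values of all URL-type entries in a handle record.

    responseCode 1 means success; any other code is treated as a failed lookup.
    """
    payload = json.loads(response_body)
    if payload.get("responseCode") != 1:
        raise LookupError(f"handle lookup failed with responseCode {payload.get('responseCode')}")
    return [v["data"]["value"] for v in payload.get("values", []) if v.get("type") == "URL"]
```

Keeping the network call separate means a citation checker could verify that each cited handle still resolves to a URL, without hard-coding any target location.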
R15 Technical infrastructure
The repository functions on well-supported operating systems and other core infrastructural software and is using hardware and software technologies appropriate to the services it provides to its Designated Community.
Guidance: Repositories need to operate on reliable and stable core infrastructures that maximize service availability. Furthermore, hardware and software used must be relevant and appropriate to the Designated Community and to the functions that a repository fulfils. Standards such as the OAIS reference model specify the functions of a repository in meeting user needs. For this Requirement, responses should include evidence related to the following questions:
Extended guidance 2017: The workflows and human actors providing repository services must be supported by a technological infrastructure. The reviewer is looking for evidence that the applicant understands the wider ecosystem of standards, tools, and technologies available for (research) data management and curation, and has selected options that align with local requirements. If possible, this should be demonstrated by using a reference model. |
Compliance level: 4 - The guideline has been fully implemented in the repository |
Response: The IISH uses the OAIS functional model (https://en.wikipedia.org/wiki/Open_Archival_Information_System#The_functional_model) as the guiding principle for the elements that have to be covered by the digital repository. For the recording and management of new acquisitions, the IISH Acquisition database (GitHub: https://github.com/IISH/acquisition_database) is used. Digital packages are produced in the BagIt format using the tool Exactly (GitHub: https://github.com/IISH/uk-exactly). The archival bags are ingested into Archivematica (https://www.archivematica.org/en/) using the accession number provided by the Acquisition database. Archivematica serves as the OAIS ingest workflow and preservation planning software tool. The ingest procedure in Archivematica is provided as a workflow of microservices, each providing the tools needed for the required checks, information extraction, transformation and normalization steps during ingest. At the end of the ingest procedure, Archivematica produces an AIP, which also conforms to the BagIt format. This AIP includes a METS document as a wrapper for all structural and technical metadata. All preservation metadata are included in the METS using the PREMIS data model. All customizations of the rules and tools applied to the various file formats during ingest are configured using the preservation planning tools provided by Archivematica. During the ingest procedure, persistent identifiers (see R13) are assigned using the PID handle service (GitHub: https://github.com/IISH/PID-webservice). For persistent identifiers and their resolution the IISH uses the Handle System, and all digital objects and descriptive metadata are made accessible through persistent identifiers. All descriptive metadata are described using either MARC or EAD. MARC is used for describing the library holdings and for the descriptions of individual objects.
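Both the packages produced by Exactly and the AIPs produced by Archivematica are BagIt bags, so their integrity can be checked against the bag's payload manifest. Following the BagIt specification (RFC 8493), a bag has a `bagit.txt` declaration, a `data/` payload directory and a `manifest-<algorithm>.txt` file with one "checksum path" line per payload file. The minimal validator below is a sketch under those assumptions; it is not the validation code used by Exactly or Archivematica, and it skips tag manifests and the percent-encoding of unusual file names.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def read_manifest(bag: Path, algorithm: str = "sha256") -> Dict[str, str]:
    """Parse manifest-<algorithm>.txt into {relative path: checksum}.

    Each line holds a checksum, whitespace, then a path relative to the bag root.
    """
    entries: Dict[str, str] = {}
    text = (bag / f"manifest-{algorithm}.txt").read_text(encoding="utf-8")
    for line in text.splitlines():
        if line.strip():
            checksum, rel_path = line.split(maxsplit=1)
            entries[rel_path] = checksum.lower()
    return entries


def validate_bag(bag: Path, algorithm: str = "sha256") -> List[str]:
    """Return a list of problems; an empty list means the bag is complete and valid."""
    if not (bag / "bagit.txt").is_file():
        return ["missing bagit.txt"]
    if not (bag / f"manifest-{algorithm}.txt").is_file():
        return [f"missing manifest-{algorithm}.txt"]
    problems: List[str] = []
    manifest = read_manifest(bag, algorithm)
    # Validity: every listed file exists and matches its checksum.
    for rel_path, expected in sorted(manifest.items()):
        target = bag / rel_path
        if not target.is_file():
            problems.append(f"missing payload file: {rel_path}")
        elif hashlib.new(algorithm, target.read_bytes()).hexdigest() != expected:
            problems.append(f"checksum mismatch: {rel_path}")
    # Completeness: every file under data/ must be listed in the manifest.
    data_dir = bag / "data"
    on_disk = set()
    if data_dir.is_dir():
        on_disk = {p.relative_to(bag).as_posix() for p in data_dir.rglob("*") if p.is_file()}
    for extra in sorted(on_disk - set(manifest)):
        problems.append(f"file not in manifest: {extra}")
    return problems
```

Reading whole files with `read_bytes` keeps the sketch short; for large AIPs a chunked digest would be preferable.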
Evergreen (http://evergreen-ils.org/) is used for the management of the MARC inventory. EAD is used for the archival descriptions, which are managed with the XML editor XMetal (https://xmetal.com/). All descriptive metadata can be accessed through OAI-PMH and SRU/SRW: https://iisg.amsterdam/en/collections/using/machine-access. The IISH uses open-source (community-supported) software wherever possible; software for collection management and dissemination in particular is almost entirely open source. All software, documentation and test material produced by the IISH is open source and available through our GitHub repository (https://github.com/iish). A list of the software used and the relevant technical documentation is managed and monitored by the Digital Infrastructure Department of the KNAW Humanities Cluster (HUC). For real-time and near-real-time data streams, round-the-clock connectivity to public and private networks is provided at a bandwidth sufficient to meet the global and/or regional responsibilities of the repository. The IISH is connected to the 100 Gb/s port of the Amsterdam Internet Exchange, which is considered one of the largest and fastest data hubs in the world. Agreements between the KNAW and SURFsara secure this high-quality, high-bandwidth connection. Future developments include (among others):
|
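Machine access to the descriptive metadata via OAI-PMH follows the OAI-PMH 2.0 protocol: a harvester issues `verb=ListRecords` requests and follows `resumptionToken` values until an empty token marks the last page. The sketch below shows one request/parse step of that loop. The endpoint `https://example.org/oai` is a placeholder (the actual IISH endpoints are listed on the machine-access page), and `oai_dc` is assumed as the metadata format; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request; a resumptionToken is an exclusive argument."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return the record identifiers on one page and the token for the next page (None if last)."""
    root = ET.fromstring(xml_text)
    error = root.find("oai:error", OAI_NS)
    if error is not None:
        raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
    identifiers = [el.text for el in root.findall(
        "oai:ListRecords/oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token
```

A full harvester would repeat `list_records_url` with each returned token until `parse_page` returns `None` as the token.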
R16 Security
The technical infrastructure of the repository provides for protection of the facility and its data, products, services, and users.
Guidance: The repository should analyze potential threats, assess risks, and create a consistent security system. It should describe damage scenarios based on malicious actions, human error, or technical failure that pose a threat to the repository and its data, products, services, and users. It should measure the likelihood and impact of such scenarios, decide which risk levels are acceptable, and determine which measures should be taken to counter the threats to the repository and its Designated Community. This should be an ongoing process. For this Requirement, please describe:
This Requirement describes some of the aspects generally covered by other Requirements, for example R12 (Workflows), and is supplementary to R9 (Documented storage procedures). Extended Guidance R16: The reviewer is looking for evidence that the applicant understands the technical risks applicable to the service for the data users and the physical environment, and that it has mechanisms in place to respond to security incidents. Evidence must focus on technical infrastructure rather than on managerial and procedural aspects of business continuity. In what way is the technical infrastructure controlled by the repository or by their host/outsource institution? Who is in charge? Can the repository in any way determine the technical infrastructure if that is outsourced? Are the arrangements sufficient to guarantee the long-term preservation of and/or access to the data holdings? |
Compliance level: 3 – The repository is in the implementation phase |
Response: The repository system is actively monitored and subject to the standard security policies and procedures maintained at the Royal Netherlands Academy of Arts and Sciences (KNAW). Organizational systems are regularly subjected to vulnerability scans and to a yearly audit process (SURFaudit: https://www.surf.nl/en/surfaudit). In terms of SURFaudit recommendations, all KNAW organizations are expected to operate at level 3 (‘embedded in the organization’). Progress towards, or deviations from, this expected level are monitored yearly, and the results and points for improvement are communicated in an annual Security Workplan. All security incidents are registered, coordinated and handled by the Computer Security Incident Response Team (CSIRT) of the KNAW in accordance with the process for handling information security incidents (‘Proces voor afhandeling informatiebeveiligingsincident CSIRT (nov 2015)’). Technical administrators resolve incidents in collaboration with functional administrators. Each institute within the KNAW has an Information Security Officer who acts as an intermediary between the central CSIRT group and the institute. If an incident is reported, the following standard procedures are followed:
1. Identification
Each step is further subdivided into a specific set of actions related to the incident level. As of January 1st 2016, each organization is obliged to report serious data leaks to the Autoriteit Persoonsgegevens (Dutch Data Protection Authority, https://autoriteitpersoonsgegevens.nl/en). See also the implementation of the GDPR and the measures taken by the KNAW and the Institute (as described in R4). All of our servers are provided by the ICT Services of the KNAW and covered by Service Level Agreements. These also cover recovery procedures in case of an organization-wide outage. Risk management procedures are described at the organizational level.
To safeguard data deposited into the repository system, multiple copies are maintained at off-site locations. All data are furthermore replicated (see R9) in AIP packages that include metadata, data and authorization information. In case of a system outage, all data can thus be retrieved from several locations. From the perspective of risk management, a yearly updated risk assessment document ('Risicomanagementplan 2020 Humanities Cluster', i.e. 'Risk management plan 2020 Humanities Cluster', latest version September 2019) offers important evidence regarding the long-term conservation of the collections. This document identifies the risks of data loss and cyber attacks and explains how these risks are controlled. Although the R16 Security Level can be considered to be compliance level 4, the IISH would like to see this confirmed by the SURFaudit in 2020 (which, at the time of writing, has yet to take place), which will take into account the infrastructural setup and the measures taken over the last year when implementing Archivematica and new storage facilities. A renewed SURFaudit confirmation would thus also confirm compliance level 4. |
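Keeping copies at several locations only helps if a damaged copy can be told apart from a good one. One common approach is to compare the checksums of all copies and trust the value held by a strict majority. The sketch below illustrates that idea; it is a generic technique rather than a description of the KNAW or SURFsara procedures, and the location names in the test are hypothetical. In practice the stored checksums would be compared rather than re-reading full AIPs into memory.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def majority_digest(copies: Dict[str, bytes]) -> Tuple[str, List[str]]:
    """Find the SHA-256 digest shared by a strict majority of copies.

    `copies` maps a storage location name to the bytes read from that copy.
    Returns the majority digest and the sorted locations whose copy differs.
    Raises ValueError when there is no strict majority to trust.
    """
    if not copies:
        raise ValueError("no copies given")
    digests = {loc: hashlib.sha256(data).hexdigest() for loc, data in copies.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):
        raise ValueError("no strict majority; manual investigation required")
    outliers = sorted(loc for loc, digest in digests.items() if digest != winner)
    return winner, outliers
```

The locations returned as outliers are the ones to repair from a copy that matches the majority digest; with only two disagreeing copies, the function refuses to guess.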