

Document version history


Version | Date | Authors / changes
0.1 | 01-02-2017 | Robert Gillesse
0.2 | 24-03-2017 | Robert Gillesse
0.3 | 07-08-2017 | Robert Gillesse, input Afelonne Doek
0.4 | January 2018 | Robert Gillesse, input Eric de Ruijter, Afelonne Doek
0.5 | April 2018 | Robert Gillesse
0.6 | May 2018 | Input Eric de Ruijter
0.7 | Jan 2019 | Input and comments Afelonne
0.8 | Feb 2019 | Hannah Mackay proofreading
0.9 | March 2019 | Robert Gillesse: check on links and adding documentation
1.0 | July 2019 | Robert Gillesse: made one page for all documentation; small textual changes
1.1 | 5-7-2019 | Submitted the text in the CTS Application Management Tool. As the tool accepts only plain text, small changes had to be made to preserve the logic of the text. In the version below all markup is still visible.
1.2 | 19-12-2019 | Processed all comments of the CTS reviewers (received 14-11-2019), with input from Afelonne, Eric and Mario. All changes marked by text comments.
1.3 | 29-4-2020 | Processed all remarks of the CTS reviewer (mail 26-3-2020). Removed revision markers from version 1.2. All new changes marked by text comments.
1.4 | 26-6-2020 | Last changes before CTS publication (mail CTS 17-6-2020). Zefi Kavvadia proofreading. Removed revision markers from version 1.3.
1.5 | 13-7-2020 | The IISH has obtained the CTS. The text of the application has been published online: https://www.coretrustseal.org/wp-content/uploads/2020/07/International-Institute-of-Social-History-IISH.pdf. The text below is identical to the CTS publication apart from some minor layout differences (the version below should be slightly easier to read).


Requirements source

Data Seal of Approval Guidelines version 2017-2019 November 10, 2016


Possible compliance levels for each of the Requirements

0 – Not applicable
1 – The repository has not considered this yet
2 – The repository has a theoretical concept
3 – The repository is in the implementation phase
4 – The guideline has been fully implemented in the repository


R0 Background information


  1. Repository type:
    1. Chosen categories: Domain or subject-based repository; Library/Museum/Archives; Other (please describe below).
  2. Brief description of repository: The International Institute of Social History (IISH) is a research and cultural heritage institute in the field of the global history of labour and labour relations (see also R1). The institute acquires, manages and preserves archives, library material and audiovisual material in this field, as well as research data. The IISH is a 'private archive', not a state archive: it has no legal deposit function, and its materials are collected outside the public sphere of government. Archives can be acquired from organisations (for instance trade unions or activist groups) or from private persons. Likewise, the subject matter of archives can concern organisations or individuals (for instance Karl Marx, Mikhail Bakunin or Rosa Luxemburg).
  3. Designated communities:
    1. Researchers (mostly students and scientists from the humanities and social sciences and research journalists)

    2. General public with (sometimes professional) interest in the IISH collections
    3. Archival donors
  4. Levels of curation: collection and research data: Enhanced curation – e.g., conversion to new formats, enhancement of documentation
  5. Outsource partners: For primary storage of the digital collections, the IISH uses the digital storage services of the KNAW (Royal Netherlands Academy of Arts and Sciences, see 6b below) ICT department. This is formalised in an SLA between KNAW ICT services and the IISH. The KNAW outsources the storage to the data centre ‘Vancis’ (https://vancis.nl/). For secondary storage the IISH uses the ‘SURFsara’ data storage services (https://www.surf.nl/en/services-and-products/data-archive/index.html). Both parties are closely linked to the Dutch university sphere, in which the security, trustworthiness and authenticity of data are of paramount importance and must meet strict information security requirements. Both parties comply with the ISO/IEC 27001 family of standards (https://en.wikipedia.org/wiki/ISO/IEC_27001). ‘Vancis’: https://vancis.nl/over-vancis/certificeringen/. ‘SURF’: https://www.surf.nl/en/services-and-products/data-archive/data-security-and-privacy/index.html
  6. Other relevant information: 
    a. In a small selection of CTS criteria a distinction is made between IISH collection data and IISH research data. The former comprises all data concerning the archival, library and audiovisual collections (which can be either digitised or born-digital materials); the latter comprises the data resulting from research initiated by the IISH. 
    b. It is important to mention that the IISH is an institute of the Royal Netherlands Academy of Arts and Sciences (KNAW). This means that part of the IISH services and policies depend on broader KNAW policies. Responsibility for collection management, however, lies with an independent body: the IISH Foundation. Ownership of the research data created by IISH researchers lies with the KNAW institute IISH. Since 2017 the IISH has also been part of the Humanities Cluster (HUC), an alliance between the KNAW institutes IISH, the Meertens Institute (https://www.meertens.knaw.nl/cms/en/) and the Huygens-ING institute (https://www.huygens.knaw.nl/?lang=en). This alliance aims to stimulate cooperation between these institutes and to promote innovation in technical infrastructure and digital humanities research. This wider KNAW and HUC context gives the IISH more support and room for manoeuvre on issues such as office IT, information security, digital humanities tools, and research and knowledge management. Where relevant, this wider context is mentioned in the criteria below. 
    c. The IISH uses Archivematica, an open-source digital archiving workflow application that follows OAIS principles (https://www.archivematica.org/en/), to manage the transfer and ingest of digitised and born-digital content and to support the preservation watch function of the digital repository. In 2020 Archivematica will also be implemented to archive the IISH research data. Archivematica is therefore a core application in the IISH digital archiving process. Since 2019 the IISH has also been a member of the Archivematica Support Group, which supports and monitors the development of new features and contributes to the Archivematica development roadmap. 
    d. Important information for the reviewers:
      1. Almost all documentation mentioned in the requirements can be found on the public part of the IISH Confluence pages: https://confluence.socialhistoryservices.org/display/CTS/Certification+of+IISH+Digital+Repository+Home. Where this is not the case, this is stated explicitly. 
      2. All documentation mentioned in the requirements can also be found on the page 'List of documentation mentioned in CTS form': https://confluence.socialhistoryservices.org/display/CTS/List+of+documentation+mentioned+in+CTS+form
      3. The DOI mentioned in the organizational profile points only to the research data archive of the IISH. See also point 6a. 
      4. The plain text of this application form can also be read in a formatted, slightly easier to read version on the IISH Confluence: https://confluence.socialhistoryservices.org/display/CTS/Core+Trust+Seal+%28CTS%29+certification. All changes from the previous version are also made visible there (as comments). 


R1 The repository has an explicit mission to provide access to and preserve data in its domain


Guidance: Repositories take responsibility for stewardship of digital objects, and to ensure that materials are held in the appropriate environment for appropriate periods of time. Depositors and users must be clear that preservation of, and continued access to, the data is an explicit role of the repository.

For this Requirement, please describe:

  • Your organization’s mission in preserving and providing access to data, and include links to explicit statements of this mission.
  • The level of approval within the organization that such a mission statement has received (e.g., approved public statement, roles mandated by funders, policy statement signed off by governing board).

Extended guidance 2017:

If data management is not referred to in the mission statement, then, as a rule, this Requirement cannot have a compliance level of 3 or higher.

Compliance level: 4 – The guideline has been fully implemented in the repository

Response: The IISH mission statement reads as follows: "The IISH is a unique institute, serving science and society on a global scale. At an international level, we generate and offer reliable information and insights about the (long-term) origins, effects and consequences of social inequality.

To promote this, we form an international hub for social historians worldwide. We offer and produce historical sources and data, facilitate social-history research and collaborate internationally in ground breaking research projects.

Moreover, by preserving the heritage of often oppressed social movements, the Institute serves the quality of the world's memory. With our work we hope to contribute to a vibrant civil society." Source: https://iisg.amsterdam/en/about/mission

In this statement, the phrases "We offer and produce historical sources and data" and "Moreover, by preserving the heritage of often oppressed social movements, the Institute serves the quality of the world's memory" in particular refer to the institute's role in long-term stewardship.  

The ambition for long-term stewardship and a trustworthy digital repository is addressed in the IISH Strategic Plan 2019 - 2023 (https://confluence.socialhistoryservices.org/download/attachments/32703335/strategic_plan_2018-2023_iisg_engels.pdf), in paragraphs 3.1 and 3.2 and especially in measure 5.2 (page 26): "We develop an internationally accredited repository for the sustainable storage of digital collections". The Collection Policy 2015-2020 (https://confluence.socialhistoryservices.org/download/attachments/42568284/20180110_collectieplan_rev_2018.pdf?api=v2) also clearly points towards the ambition of long-term digital stewardship, in relation to the overall IISH mission. The plan states (page 5): "in 2020 all digital objects will be stored in a Trusted Digital Repository". Paragraph 3.8.3 (page 23) contains the most relevant information about long-term access to the digital collections of the IISH.

It is also important to mention that between 2017 and 2025 the IISH will receive substantial funds from the KNAW to make the organisational and infrastructural transition from paper to long-term digital archiving. This is a clear mandate from the institute's umbrella organisation to invest in the people, knowledge, software and hardware needed for this transition. The ideas behind this can be found in the (Dutch) application document Van archief/bibliotheek naar humanities research infrastructure, Aanvraag vernieuwingsinvestering KNAW (From archive/library to a humanities research infrastructure, KNAW renewal investment application): https://confluence.socialhistoryservices.org/download/attachments/42568284/20161203_investeren_in_collecties_intranetversie_0.pdf



R2 Licenses


The repository maintains all applicable licenses covering data access and use and monitors compliance.


Guidance: Repositories must maintain all applicable licenses covering data access and use, communicate about them with users, and monitor compliance. This Requirement relates to the access regulations and applicable licenses set by the data repository itself, as well as any codes of conduct that are generally accepted in the relevant sector for the exchange and proper use of knowledge and information. Reviewers will be seeking evidence that the repository has sufficient controls in place according to the access criteria of their data holdings, as well as evidence that any relevant licenses or processes are well managed.

For this Requirement, please describe:

  • License agreements in use.
  • Conditions of use (distribution, intended use, protection of sensitive data, etc.).
  • Documentation on measures in the case of noncompliance with conditions of access and use.

Note that if all data holdings are completely public and without conditions imposed on users—such as attribution requirements or agreement to make secondary analysis openly available—then it can simply be stated.

This Requirement must be read in conjunction with R4 (Confidentiality/Ethics) to the extent that ethical and privacy provisions impact on the licenses. Assurance that deposit licences provide sufficient rights for the repository to maintain, preserve, and offer access to data is covered under R10 (Preservation Plan).

Extended guidance 2017:

Access and use conditions could be set differently: either as standard terms and conditions, or as differentiated for particular depositors or datasets. These could cover the level of curation, what is the liability level, the level of responsibility taken for the data, limitations on use, limits on usage environment (safe room, secure remote access), and limits on types of users (approved researcher, has received training, etc.). The Creative Commons licences (https://creativecommons.org/), including CC 0 Waiver and public domain data, could be used as a reference here, but other alternatives are also possible. While it may be challenging to identify instances of noncompliance, some consideration should be given to the consequences if noncompliance is detected (e.g., sanctions on current or future access/use of data). In the case of sensitive personal data disclosure, there may be severe legal penalties that impact both the user and repository. Ideally, repositories should have a public policy in place for noncompliance. The minimum compliance level should be 4, if the applicant is currently providing access to data.

Compliance level: 4 - The guideline has been fully implemented in the repository

Response:

Metadata records about collections and collection items contain clear information on access, restrictions and permissions, copyright holders and, where relevant, other licenses such as Creative Commons (CC). Internal procedures and automated responses are based on this information. The metadata themselves are published under a CC0 license. Research data are always published with a CC license.

Some examples:

All descriptive metadata are available under a CC0 license: https://iisg.amsterdam/en/collections/use/api-linked-data

For restricted collections, noncompliance is not an issue, as these collections are simply not available to users. In the unlikely case that our systems or processes fail, we will act as promptly as possible to repair breaches. In the case of copyright violation, we remind users of their legal obligations. See our 'Policy in case of noncompliance of licenses and copyright violations': https://confluence.socialhistoryservices.org/display/CTS/Policy+in+case+of+noncompliance+of+licences+and+copyright+violations.



 

R3 Continuity of Access

The repository has a continuity plan to ensure ongoing access to and preservation of its holdings.


Guidance: This Requirement covers the measures in place to ensure access to, and availability of, data holdings, both currently and in the future. Reviewers are seeking evidence that preparations are in place to address the risks inherent in changing circumstances.

For this Requirement, please describe:

  • The level of responsibility undertaken for data holdings, including any guaranteed preservation periods.
  • The medium-term (three- to five-year) and long-term (> five years) plans in place to ensure the continued availability and accessibility of the data. In particular, both the response to rapid changes of circumstance and long-term planning should be described, indicating options for relocation or transition of the activity to another body or return of the data holdings to their owners (i.e., data producers). For example, what will happen in the case of cessation of funding, which could be through an unexpected withdrawal of funding, a planned ending of funding for a time- limited project repository, or a shift of host institution interests?

Evidence for this Requirement should relate more to governance than to the technical information that is needed in R10 (Preservation plan) and R14 (Data reuse), and should cover the situation in which R1 (Mission/Scope) changes. This Requirement contrasts with R15 (Technical infrastructure) and R16 (Security) in that it covers full business continuity of the preservation and access functions.

Extended guidance 2017:

The reviewer is looking for information to understand the level of responsibility for data. For example, are you the primary or only custodian? Is the depositor responsible as well? Does the repository promise to provide access, preservation, and/or data storage to some minimum quality level for some minimum time period? This information helps the reviewer to judge if the repository is sustainable in terms of its finances and processes; in particular, the continuity of its collections and responsibilities in the case of a major business continuity failure. The responsibility for sustainability may not lie in the hands of the repository itself, but a higher, overarching (or umbrella) organization. If so, this should be clearly indicated. Moreover, if the repository is part of such a larger (umbrella) organization, has this or any other organization (e.g., National Archives) guaranteed that it will take over the responsibility in the case of major business continuity failure?

Compliance level: 3 – The repository is in the implementation phase

Response:

The IISH has a long-standing tradition (since 1935) of preserving and ensuring access to its world-famous collections. This is at the core of the institute's activities and a guiding principle for all policy and strategy documents. The plans for ongoing access to collections and research data can be found in several places but are most clearly stated in the Collection Policy Plan 2015 - 2020 (https://confluence.socialhistoryservices.org/download/attachments/42568284/20161203_investeren_in_collecties_intranetversie_0.pdf). A quotation from the IISH collection plan:

" 1. The IISH aspires to be (in addition to being a first-class research institute) a world-leading centre for preserving and making available sources in the field of socio-economical history, in particular the world history of labour and industrial relations.

2. The collection is primarily formed, made accessible and made available for the purpose of scientific use, but it simultaneously forms a socially valuable heritage collection relevant to a broader public.

3. Highly decisive for realising the aforementioned ambitions are:

a. the volume and quality of the collection;

b. the dependability of safe storage, taking into account security, privacy and the trust of the archival creators.

c. the level of the archive’s discoverability and findability;

d. the quality and openness of the services used in making the archive available (online);

e. the measures we take to maintain the collection;

f. the extent to which the IISH leads nationally and internationally in using cutting-edge methods."

IISH collections are given on loan or as a deposit to a separate and independent foundation, the IISH Foundation. The IISH Foundation is responsible for collection management, while the KNAW institute IISH is responsible for carrying out this management. For research data, the KNAW institute IISH is both the owner and responsible for carrying out their management. Being a KNAW institute and part of the HUC (see R0) guarantees the continuity of the IISH and its collections. The sharing of services (for instance IT) and knowledge (e.g. information security) between the KNAW and HUC institutes also enables the IISH to perform at a higher level than it could on its own.

Since the foundation of the IISH, more than 100 million euros have been spent on the collections and on the infrastructural facilities to acquire, manage and preserve them. Each year, around 60 percent of the IT investment budget has been spent on the sustainability and development of the collection and data infrastructure that supports the work processes, including digital preservation. It is therefore unlikely that these funds will be spent on something completely different, or that funding will drop so significantly that this becomes a serious risk. 

The formal contract between the IISH and the KNAW recognises the importance of the collections and the need for their availability to researchers, acknowledging the continuing collection, processing and provision of the collections as one of the primary tasks of the IISH. It also states that the IISH receives financial means from the KNAW, which is itself funded by the Dutch state, to enable the institute to carry out its tasks.

In the unlikely event that funding should run out, the IISH will turn to the KNAW for the means to continue taking care of both collections and research data.

From a risk management perspective, a yearly updated risk assessment document ('Risicomanagementplan 2020 Humanities Cluster', i.e. 'Risk management plan 2020 Humanities Cluster'; latest version September 2019, available on request) offers important evidence regarding the long-term conservation of the collection (see also R16). This document identifies the risks of data loss and cyber-attacks and describes how these risks are controlled.




 

R4 Confidentiality/Ethics 

The repository ensures, to the extent possible, that data are created, curated, accessed, and used in compliance with disciplinary and ethical norms.


Guidance: Adherence to ethical norms is critical to responsible science. Disclosure risk—for example, the risk that an individual who participated in a survey can be identified or that the precise location of an endangered species can be pinpointed—is a concern that many repositories must address. Evidence sought is concerned with not only having good practices for data with disclosure risks, but also the necessity to maintain the trust of those agreeing to have personal/sensitive data stored in the repository.

For this Requirement, responses should include evidence related to the following questions:

  • How does the repository comply with applicable disciplinary norms?
  • Does the repository request confirmation that data collection or creation was carried out in accordance with legal and ethical criteria prevailing in the data producer's geographical location or discipline (e.g., Ethical Review Committee/Institutional Review Board or Data Protection legislation)?
  • Are special procedures applied to manage data with disclosure risk?
  • Are data with disclosure risk stored appropriately to limit access?
  • Are data with disclosure risk distributed under appropriate conditions?
  • Are procedures in place to review disclosure risk in data, and to take the necessary steps to either anonymize files or to provide access in a secure way?
  • Are staff trained in the management of data with disclosure risk?
  • Are there measures in place if conditions are not complied with?
  • Does the repository provide guidance in the responsible use of disclosive, or potentially disclosive data?

Evidence for this Requirement should be in alignment with provisions for the procedures stated in R12 (Workflows) and for any licenses in R2 (Licences)

Extended guidance 2017:

All organizations responsible for data have an ethical duty to manage them to the level expected by the scientific practice of its Designated Community. For repositories holding data about individuals, businesses, or other organizations, there are in addition expectations that the rights of the data subjects will be protected. These will be both of a legal and ethical nature. Disclosure of these data could also present a risk of personal harm, a breach of commercial confidentiality, or the release of critical information (e.g., the location of protected species or an archaeological site). Minimum compliance level should be a 4 if the repository is currently providing access to personal data. Reviewers expect to see evidence that the applicant understands their legal environment and the relevant ethical practices, and that they have documented procedures.

Compliance level: 4 - The guideline has been fully implemented in the repository

Response:

Regarding the ethics of the collection in general, two points are important:

  • The IISH adheres to the ICA Code of Ethics (http://www.ica.org/en/ica-code-ethics), of which article 7 ("Archivists should respect both access and privacy, and act within the boundaries of relevant legislation") is especially relevant to this requirement.
  • Every IISH employee signs a confidentiality agreement in which the ethical conduct as required by the IISH is fully described.

Across the different phases of the archival process, the following criteria are relevant to collection ethics:

Creation of archives: 

  • The IISH collects private archives and therefore has no influence on the creation of archival records.
  • The institute does not request confirmation from the archival donor that the data were collected or created in accordance with legal and ethical criteria. The IISH collects materials on labour and labour relations, and in some countries of origin such material could be considered illegal or unethical (e.g. reports on working conditions in factories, on child labour, etc.).
  • Another way of creating digital collections is by digitising existing IISH collections. Possible disclosure or copyright issues of the collections to be digitised are taken into account before the digitisation process starts.

Curation/collection of archives:

  • New acquisitions of archives and research datasets are accepted on the basis of individual agreements with the donors. Access regulations are an explicit part of the agreement and are agreed upon by both the donor and IISH staff. A standard contract is available, but contracts are tailor-made for every archival donor.
  • The IISH ensures a clear and secure procedure for transfer of collections from the donor to the institute.
  • During talks with the archival donor, the most sensitive parts of the archive are identified. This may mean that these parts are not transferred to the IISH or that they are closed for a certain period of time (see below). This has been common practice since the institute started archiving collections in the 1930s and has worked well for both paper and digital collections. 

Pre-ingest (quick scan and appraisal and selection):

During appraisal and selection of the archives to be ingested, sensitive parts may be identified and, in consultation with the archival donor, be deselected or closed for a certain period of time (see below under 'Access'). The first step in this process is the so-called 'quick scan' of the new archive by the Collection Development employee who acquired the archive. This procedure is described in the document First inspection/quick scan of a new archive procedure (https://confluence.socialhistoryservices.org/pages/viewpage.action?pageId=60653936). Based on this quick scan, it can be decided to hold a more elaborate appraisal and selection session. During the quick scan the archive is also checked for possible privacy or other disclosure issues. This is done at a high, general level: in practice, not every individual file is checked; instead, a generated file and directory list is scanned to try to identify any such issues. 
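The quick scan works on a generated file and directory list rather than on file contents. The Python sketch below illustrates that idea under stated assumptions: the keyword list, the function names and the matching rule are hypothetical and are not taken from the IISH quick scan procedure, which is carried out by staff. It only narrows down which paths a person might look at first.

```python
from pathlib import Path

# Hypothetical hint words: illustrative only, not part of the IISH procedure.
SENSITIVE_HINTS = ("medical", "salary", "passport", "personnel", "confidential")


def file_listing(root):
    """Return the sorted relative paths of all files below root.

    This stands in for the 'generated file and directory list' of the quick scan.
    """
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


def flag_paths(paths, hints=SENSITIVE_HINTS):
    """Return the paths whose name contains one of the hint words (case-insensitive).

    Files are never opened: only names are inspected, matching the high-level scan.
    """
    return [p for p in paths if any(h in p.lower() for h in hints)]
```

For example, `flag_paths(file_listing("/path/to/transfer"))` (a placeholder path) returns the paths worth a closer look. A match says nothing about the actual content of a file, and the absence of a match does not mean the archive is free of personal data, which is why such a list can only support, not replace, the human judgement described above.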

Ingest:

  • During ingest, born-digital archives are not scanned any further for privacy or other disclosure issues. This is a conscious choice, for the following reasons:
    • The sensitive parts of the archive are detected, at a high or medium level, during talks with the archival donor (see directly above). 
    • Privacy or disclosure issues can also be identified during appraisal and selection. 
    • Access to born-digital archives is (highly) restricted (see below under 'Access').
    • The nature of born-digital archives: discovering all privacy-related information within unordered born-digital content is highly problematic. It would require applying named entity recognition at document level, which would be hard to implement for many reasons (see directly below) and would certainly not be perfect (it would not find all such information). 
    • Technical reasons: full-text indexing of the born-digital content (which would be needed to check for privacy or other disclosure issues at document level) is not yet part of the ingest procedure (or of any later part of the workflow). 
  • During ingest checks for viruses and malware are routinely applied (see also R12).

Access:

The IISH 'risk management plan' (see R3) gives ample attention to the risks of disclosure and the measures taken to mitigate them. In addition, the legal department of the KNAW has checked all contracts and procedures. Most important here is that, for born-digital content, the standard procedure is to create only an AIP (kept in AIP storage that is not connected to the outside world); a DIP (which can be made available in the reading room or, in rare cases, online) is not created automatically. A DIP is made 'manually' only after a conscious decision by the collection acquisition and public services staff. 
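The default described above (an AIP only, with a DIP created only after an explicit staff decision) can be expressed as a small rule. The Python sketch below is a hypothetical illustration: the class, its fields and the method name are assumptions made for this example and do not describe IISH or Archivematica software. It covers only born-digital content.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BornDigitalTransfer:
    """Hypothetical record of one born-digital transfer (illustrative names only)."""

    transfer_id: str
    # By default only an Archival Information Package (AIP) is created.
    packages: List[str] = field(default_factory=lambda: ["AIP"])
    dip_decided_by: str = ""

    def approve_dip(self, decided_by: str) -> None:
        """Record the explicit staff decision and add a Dissemination Information Package (DIP)."""
        if not decided_by.strip():
            # Without a named decision the default (AIP only) stays in force.
            raise ValueError("a DIP requires an explicit staff decision")
        self.dip_decided_by = decided_by
        if "DIP" not in self.packages:
            self.packages.append("DIP")
```

Keeping the decision-maker on the record makes the manual step traceable. The sketch deliberately leaves out where a DIP is made available (reading room or online), since that choice is not specified here.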

With the introduction of the European General Data Protection Regulation (GDPR) in May 2018, the KNAW has taken the lead in ensuring that all KNAW institutes comply with the GDPR. Some of the necessary provisions (the KNAW privacy statement at https://www.knaw.nl/en/about-us/academy-privacy-statement, organisational procedures, a data protection officer, etc.) have been put in place at KNAW level and apply to all institutes. As for the collections, the Department of Collections at the IISH has made an inventory of workflows involving personal data and has taken the necessary measures to ensure compliance. Regarding personal data present in the collections of the institute, a working group within the KNAW has been established to determine the necessary measures and monitor developments. Institution-specific documentation on the GDPR, such as the exceptions for collecting materials with personal data that apply to scientific and historical research, and how these apply to the IISH, is available on Confluence.

Specifically for research data the following is relevant:

For research data, disclosure issues between depositor and IISH are laid down in the Provisions Data Deposit Agreement (https://confluence.socialhistoryservices.org/pages/viewpageattachments.action?pageId=32703335&preview=/32703335/48988571/20160900_provisions_data_deposit_agreement.docx). The Agreement states: "The Depositor declares that the dataset contains no data or other elements that are, either in themselves or in the event of their publication, contrary to Dutch law." In other words, the same checks for disclosure issues apply to research data as to collection data. 



 

R5 Organizational infrastructure

The repository has adequate funding and sufficient numbers of qualified staff managed through a clear system of governance to effectively carry out the mission.


Guidance: Repositories need funding to carry out their responsibilities, along with a competent staff who have expertise in data archiving. However, it is also understood that continuity of funding is seldom guaranteed, and this must be balanced with the need for stability.

For this Requirement, responses should include evidence related to the following:

  • The repository is hosted by a recognized institution (ensuring long-term stability and sustainability) appropriate to its Designated Community.
  • The repository has sufficient funding, including staff resources, IT resources, and a budget for attending meetings when necessary. Ideally this should be for a three- to five-year period.
  • The repository ensures that its staff have access to ongoing training and professional development.
  • The range and depth of expertise of both the organization and its staff, including any relevant affiliations (e.g., national or international bodies), is appropriate to the mission.

Full descriptions of the tasks performed by the repository—and the skills necessary to perform them—may be provided, if available. Such descriptions are not mandatory, however, as this level of detail is beyond the scope of core certification.

Extended guidance 2017:

The description of this Requirement should contain evidence describing the organization’s governance/management decision-making processes and the entities involved. Staff should have appropriate training in data management to ensure consistent quality standards. It is also important to know what proportion of staff is employed on a permanent or temporary basis and how this might affect the professional quality of the repository, particularly for long-term preservation. To what degree is funding structural or project-based? Can this be expressed in FTE numbers? How often does periodic renewal occur?

Compliance level: 4 - The guideline has been fully implemented in the repository

Response:

  • The repository is hosted by the IISH. The IISH is an organisation with a clear profile (see mission statement mentioned in R1) and is part of the KNAW and the KNAW Humanities Cluster (https://huc.knaw.nl/). The collections are owned by, or on loan to, the IISH foundation, except for research data, which is owned by the KNAW institute IISH. See also R0 and the description on the IISH website about the organisation (with more information about the KNAW, scientific board, the KNAW Humanities Cluster, the IISH foundation and the board of directors) and staff - respectively: https://iisg.amsterdam/en/about/organization and https://iisg.amsterdam/en/about/staff. 
  • As the IISH is part of the KNAW and its collection is renowned, there is no foreseeable end to its funding. The IISH receives annual funding from the government through the KNAW. Funding does not depend on a temporary budget and is guaranteed for as long as the IISH and its collections exist. Since 1935 approximately 100 million euros have been spent on the collections and their infrastructure. Digital preservation is a crucial element in keeping the collections available for research. This is reflected in the mission of the IISH as well as in its strategic and policy plans. See for instance the Strategic Plan 2018-2023 (goals 3.1, 3.2 and 5.5.2): https://confluence.socialhistoryservices.org/download/attachments/32703335/strategic_plan_2018-2023_iisg_engels.pdf.
  • The description of tasks and expertise needed in connection with the repository is part of the description of the workflow processes (R12 Workflows) and the functional descriptions of jobs. Roles and responsibilities are also described in the Digital Preservation Policy, chapter 13: https://confluence.socialhistoryservices.org/display/CTS/Digital+Preservation+Policy+2019-2022#DigitalPreservationPolicy2019-2022-13.Organisation
  • As a medium-sized institute, the IISH limits its "range and depth of expertise" to the needs and mission of the organisation. This means that there is in-depth knowledge of the systems, software (like Archivematica) and workflows connected to the digital repository, as all of these are hosted by the institute itself. There is also in-house knowledge of the OAIS model and of specific preservation issues (e.g. file formats) related to the institute's digital collections. The ambition and expertise of the IISH do not extend to in-depth research on preservation issues or to ambitious software development of, for instance, preservation tools. This is clearly seen as the task of bigger institutes, of collaborative efforts in national or international projects and of the open source communities.
  • At the national level the IISH takes an active role in the Dutch Digital Heritage Network (NDE http://www.netwerkdigitaalerfgoed.nl/). The NDE "is a partnership in the Netherlands that focuses on developing a system of national facilities and services for improving the visibility, usability, and sustainability of digital heritage". Together with the KNAW-HUC partners the IISH is a so-called ‘hub’ in the NDE network and in doing so shares and gains knowledge.



 

R6 Expert Guidance

The repository adopts mechanism(s) to secure ongoing expert guidance and feedback (either in- house, or external, including scientific guidance, if relevant).

 

Guidance: An effective repository strives to accommodate evolutions in data types, data volumes, and data rates, as well as to adopt the most effective new technologies in order to remain valuable to its Designated Community. Given the rapid pace of change in the research data environment, it is therefore advisable for a repository to secure the advice and feedback of expert users on a regular basis to ensure its continued relevance and improvement.

For this Requirement, responses should include evidence related to the following questions:

  • Does the repository have in-house advisers, or an external advisory committee that might be populated with technical members, data science experts, and disciplinary experts?
  • How does the repository communicate with the experts for advice?
  • How does the repository communicate with its Designated Community for feedback?

This Requirement seeks to confirm that the repository has access to objective expert advice beyond that provided by skilled staff mentioned in R5 (Organizational infrastructure).

Extended guidance 2017:

The reviewer is looking for evidence that the repository is linked to a wider network of expertise in order to demonstrate access to advice and guidance for both its day-to-day activities and the monitoring of potential new challenges on the horizon (science and technology watch). Part of this information may already have been given under ‘R0. Brief Description of the Repository’s Designated Community’ and ‘Other relevant information’. If so, then please refer to it.

Compliance level: 4 - The guideline has been fully implemented in the repository

Response:

  • Aware of the challenges that come with digital preservation, the IISH appointed a digital archivist in 2016. The digital archivist (https://iisg.amsterdam/en/about/staff/robert-gillesse) is dedicated to all aspects of long-term preservation within the institute. One of the tasks of the digital archivist of the IISH is to monitor technical developments concerning file formats, (pre-)ingest, preservation and access. He has a network of other national (see below) and international experts whom he can consult. Other IISH staff monitor changes in organisational digital strategies and practices (of our archival donors), changes in metadata standards, storage and access systems, security and workflow management.  
  • The data officer (https://iisg.amsterdam/en/about/staff/richard-zijdeman) of the IISH monitors changes in research data and is deeply involved in Dutch and European projects concerning data research infrastructure (CLARIAH https://www.clariah.nl/over/wie-is-wie/wp4/richard-zijdeman and DARIAH).
  • The IISH has an active role in the Dutch Netwerk Digitaal Erfgoed (NDE, Digital Heritage Network). This network contains a nationwide pool of expertise on digital preservation (see also R5). The IISH digital archivist represents the KNAW-HUC in the domain group ('board') on digital preservation.
  • The IISH also strengthens its expertise by being an active member or partner of the following organisations: ICA (International Council on Archives http://www.ica.org/), BRAIN/KVAN (branch organisation and association of Dutch archives: https://www.kvanbrain.nl/), IALHI (The International Association of Labour History Institutions http://www.ialhi.org/) and OCLC research library partnership (http://www.oclc.org/research/partnership.html). 
  • As part of national and international research infrastructures the IISH is deeply embedded in the research community. Communication with the designated community is therefore an everyday occurrence. The IISH-HUC is also closely involved in many digital humanities initiatives and research infrastructures like CLARIAH (i.a. as part of the board https://www.clariah.nl/en/about/organisation/board) and DARIAH (as national coordinator https://www.dariah.eu/network/partners-countries/netherlands/).

 

 

R7 Data integrity and authenticity 

The repository guarantees the integrity and authenticity of the data.

 

Guidance: The repository should provide evidence to show that it operates a data and metadata management system suitable for ensuring integrity and authenticity during the processes of ingest, archival storage, and data access.

Integrity ensures that changes to data and metadata are documented and can be traced to the rationale and originator of the change.

Authenticity covers the degree of reliability of the original deposited data and its provenance, including the relationship between the original data and that disseminated, and whether or not existing relationships between datasets and/or metadata are maintained.

For this Requirement, responses on data integrity should include evidence related to the following:

  • Description of checks to verify that a digital object has not been altered or corrupted (i.e., fixity checks).
  • Documentation of the completeness of the data and metadata.
  • Details of how all changes to the data and metadata are logged.
  • Description of version control strategy.
  • Usage of appropriate international standards and conventions (which should be specified).

Evidence of authenticity management should relate to the following questions:

  • Does the repository have a strategy for data changes? Are data producers made aware of this strategy?
  • Does the repository maintain provenance data and related audit trails?
  • Does the repository maintain links to metadata and to other datasets? If so, how?
  • Does the repository compare the essential properties of different versions of the same file? How?
  • Does the repository check the identities of depositors?

This Requirement covers the entire data lifecycle within the repository, and thus has relationships with workflow steps included in other requirements—for example, R8 (Appraisal) for ingest, R9 (Documented storage procedures) and R10 (Preservation plan) for archival storage, and R12–R14 (Workflows, Data discovery and identification, and Data reuse) for dissemination. However, maintaining data integrity and authenticity can also be considered a mindset, and the responsibility of everyone within the repository.

Extended guidance 2017:

A clear and complete context section is important for all Requirements but this is especially the case for R7. The organization of the curation and the types of data will help guide the reviewer expectation. The reviewer will benefit from a clear overview of the processes and tools used to curate the data, including the level of manual and automated practice, and how the processes, tools, and practices are documented. The applicant may find it useful for this particular Requirement to respond to each bullet point separately, and to address integrity and authenticity independently, as defined in the Guidance of the Requirement. Audit trails which are written records of the actions performed on the data, should be described in the evidence provided.

Compliance level: 4 – The guideline has been fully implemented in the repository

Response:

Data integrity checks by the IISH:

Fixity:

  • When the archival donor delivers the digital collection in an archival bag (which includes checksums), fixity is checked as soon as the bag reaches the institute. If not, the institute creates an archival bag with MD5 checksums on arrival, and this bag is then ingested. In that case any data corruption that took place before the bag reached the institute falls outside the responsibility of the institute.
  • During the pre-ingest (resulting in a SIP) and ingest (resulting in an AIP and DIP) the checksums are validated by Archivematica.
  • During the pre-ingest, Archivematica produces SHA-256 checksums (https://www.archivematica.org/en/docs/archivematica-1.7/user-manual/transfer/transfer/#process-transfer) for each of the files.
  • After the creation of the AIP, Archivematica performs a bag check which includes a final fixity check.
  • After the storage of the AIP the fixity is regularly checked by the IISH (https://confluence.socialhistoryservices.org/display/CTS/Fixity+checking).
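Each of the fixity steps above comes down to comparing a stored checksum with a freshly computed one. The following is a minimal sketch using only Python's standard library, assuming a simplified BagIt-like manifest with one `<checksum>  <relative path>` line per payload file. It is not Archivematica's or the IISH's actual code, and the function names are hypothetical. MD5 is the default here, as for the bags the IISH creates; passing `algorithm="sha256"` matches the SHA-256 checksums Archivematica produces.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, read in chunks so large files fit in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(payload_dir: Path, manifest: Path, algorithm: str = "md5") -> None:
    """Write one '<digest>  <relative path>' line per payload file (BagIt-like layout)."""
    lines = [
        f"{file_digest(p, algorithm)}  {p.relative_to(payload_dir).as_posix()}"
        for p in sorted(payload_dir.rglob("*"))
        if p.is_file()
    ]
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")


def verify_manifest(payload_dir: Path, manifest: Path, algorithm: str = "md5") -> list[str]:
    """Return the relative paths listed in the manifest that are missing or altered."""
    failures = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel = line.split(maxsplit=1)
        target = payload_dir / rel
        if not target.is_file() or file_digest(target, algorithm) != expected:
            failures.append(rel)
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        payload = Path(tmp) / "data"
        payload.mkdir()
        (payload / "letter.txt").write_text("hello", encoding="utf-8")
        manifest = Path(tmp) / "manifest-md5.txt"  # kept outside the payload directory
        write_manifest(payload, manifest)
        print(verify_manifest(payload, manifest))  # empty list while nothing has changed
```

The manifest is written outside the payload directory so that it is not itself hashed. A periodic fixity check then amounts to re-running `verify_manifest` and investigating any path it returns.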

The integrity of the bitstreams is guaranteed by the institute's data provider. An independent comparison by the system between the metadata and the hashed content is under consideration.

Version control:

  • Original files and metadata persist in the AIP.
  • Changes over time in file formats are recorded and stored in the AIP METS file using PREMIS events.
  • Other than file format changes, AIPs are immutable; changes result in a new AIP and the relation is documented in the technical and structural metadata (METS).

Standards/conventions:

  • The IISH uses international standards for preservation of digital objects (OAIS, PREMIS, METS). 

Data authenticity measures by the IISH:

  • A strategy for the planned change of digital objects (through careful and logged migration steps, the original will always be kept) can be found in the IISH preservation policy 2019-2022 (https://confluence.socialhistoryservices.org/display/CTS/Digital+Preservation+Policy+2019-2022). The Preservation Policy states the following about the authenticity of the digital object: "the informational value within the digital object is first and foremost guaranteed by always keeping the original digital object. If this object threatens to go obsolete, the object is migrated to a preservation copy that - as much as possible - preserves the informational value of the original object. Also an important part of guaranteeing the authenticity of the object is the creation of technical metadata. This metadata gives in-depth technical information about the object and is the basis for possible future preservation actions. Technical metadata is acquired and stored (as PREMIS metadata in a METS wrapper) during the preservation workflow described below. The authenticity of a digital object can only be fully guaranteed if the file format can be identified." 
  • Provenance metadata are recorded in two senses:
    • The origin of the digital object: the context of the object is described in the archival description which is published on the IISH website.
    • Every step in the archiving workflow (audit trail) is logged by Archivematica in PREMIS metadata. Any migration to a new format, whether during the archiving process or at a later date, will be logged in the PREMIS metadata.
  • The link between digital objects and metadata is maintained/guaranteed by the use of a PID (Persistent Identifier) through the Handle service (https://www.handle.net). See the IISH PID declaration & binding workflow for digitized and born digital collections (https://confluence.socialhistoryservices.org/pages/viewpage.action?pageId=48989155). See also CTS R13 for an in-depth description of the Handle service. 
  • Essential properties of categories of digital objects (for instance word processing files) are seen by the IISH as a somewhat problematic concept, as these are very much dependent on context. The institute instead talks about preservation intent, which is described as follows in the Digital Preservation Policy (https://confluence.socialhistoryservices.org/display/CTS/Digital+Preservation+Policy+2019-2022#DigitalPreservationPolicy2019-2022-5.3Preservationintent):"
    • All digital objects will be kept and (as much as possible) preserved in their original form.
    • But the IISH will, in the end, give priority to the informational value within these objects. This means that objects may be migrated to other file formats - as the original format is obscure and/or obsolete - as long as it can be guaranteed that the information within these files is still authentic.
    • These migrated files are called preservation copies.
    • As the proof of authenticity will, in some cases, be a challenge the original object will always be available as a fall back file.
    • The digital object is always shown within the right context and is findable through the correct contextual information."
  • The identities of the IISH archival donors are known to the institute and agreements are formalised in a contract between donor and institute.
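To illustrate what a logged workflow step or migration looks like as provenance metadata, the following is a minimal sketch that builds a single PREMIS 3 `<event>` with Python's standard `xml.etree.ElementTree`. It assumes the PREMIS 3 namespace and core event elements only. Archivematica's actual METS/PREMIS output is considerably richer (agents, tool details, more identifiers), and the function name and example values are hypothetical.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def q(tag: str) -> str:
    """Qualify a tag with the PREMIS namespace in ElementTree's {uri}tag notation."""
    return f"{{{PREMIS_NS}}}{tag}"


def premis_event(event_type: str, detail: str, outcome: str, object_uuid: str) -> ET.Element:
    """Build a minimal PREMIS 3 <event> recording one action performed on one object."""
    event = ET.Element(q("event"))
    ident = ET.SubElement(event, q("eventIdentifier"))
    ET.SubElement(ident, q("eventIdentifierType")).text = "UUID"
    ET.SubElement(ident, q("eventIdentifierValue")).text = str(uuid.uuid4())
    ET.SubElement(event, q("eventType")).text = event_type
    ET.SubElement(event, q("eventDateTime")).text = datetime.now(timezone.utc).isoformat()
    info = ET.SubElement(event, q("eventDetailInformation"))
    ET.SubElement(info, q("eventDetail")).text = detail
    outcome_info = ET.SubElement(event, q("eventOutcomeInformation"))
    ET.SubElement(outcome_info, q("eventOutcome")).text = outcome
    # Link the event to the object it acted on, so the audit trail is traceable.
    link = ET.SubElement(event, q("linkingObjectIdentifier"))
    ET.SubElement(link, q("linkingObjectIdentifierType")).text = "UUID"
    ET.SubElement(link, q("linkingObjectIdentifierValue")).text = object_uuid
    return event


if __name__ == "__main__":
    ev = premis_event(
        "migration",
        "hypothetical example: word processing file migrated to a preservation copy",
        "success",
        str(uuid.uuid4()),
    )
    print(ET.tostring(ev, encoding="unicode"))
```

Because the original object is always kept, a migration adds an event like this one to the record. It does not replace the earlier events.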

Specifically, for research data the following is relevant:

At the time of writing, research data are not yet archived using the Archivematica workflow. This is planned for the beginning of 2020 in a separate Dataverse Archivematica integration project. When this work is finished, all of the above will apply to research data as well. For now, research data are published online via the Dataverse platform and the preservation level is that of bit preservation. This means that research data are stored on a replicated storage system and the fixity of the files is regularly checked. See also the schematic view of the four main digital archival workflows: https://confluence.socialhistoryservices.org/display/CTS/Scheme+of+four+main+IISH+digital+archival+workflows. 




 

R8 Appraisal 

The repository accepts data and metadata based on defined criteria to ensure relevance and understandability for data users.

 

Guidance: The appraisal function is critical in determining whether data meet all criteria for inclusion in the collection and in establishing appropriate management for their preservation. Care must be taken to ensure that the data are relevant and understandable to the Designated Community served by the repository.

For this Requirement, responses should include evidence related to the following questions:

  • Does the repository use a collection development policy to guide the selection of data for archiving?
  • Does the repository have quality control checks to ensure the completeness and understandability of data deposited? If so, please provide references to quality control standards and reporting mechanisms accepted by the relevant community of practice, and include details of how any issues are resolved (e.g., are the data returned to the data provider for rectification, fixed by the repository, noted by quality flags in the data file, and/or included in the accompanying metadata?)
  • Does the repository have procedures in place to determine that the metadata required to interpret and use the data are provided?
  • What is the repository’s approach if the metadata provided are insufficient for long-term preservation?
  • Does the repository publish a list of preferred formats?
  • Are quality control checks in place to ensure that data producers adhere to the preferred formats?
  • What is the approach towards data that are deposited in non-preferred formats?

This Requirement addresses quality assurance from the viewpoint of the interaction between the depositor of the data and metadata and the repository. It contrasts with R11 (Data quality), which addresses metadata and data quality from the viewpoint of the Designated Community.

Extended guidance 2017:

The applicant should demonstrate that procedures are in place to ensure only data appropriate to the collection policy are accepted. Repository staff should have all the necessary information, procedures, and skills to ensure long-term preservation and use as applicable for the Designated Community.

Compliance level: 4 – The guideline has been fully implemented in the repository

Response:

Collection data:

  • The IISH has a collection policy document, Collection policy 2015-2020 (https://confluence.socialhistoryservices.org/download/attachments/42568284/20180110_collectieplan_rev_2018.pdf), which guides the Institute in the acquisition of new archives. A well-defined collection profile ("labour related network organisations and persons involved in these organisations") ensures that the Institute can make sharp choices when acquiring collections.
  • As the IISH is a private archive, the Institute is not in a position to enforce any criteria on the transfer of (digital) archives. 
  • The IISH strives to carry out the appraisal and selection in close dialogue with the archival donor. If, due to circumstances, this is not possible, the IISH will accept the collection as it is and will run a quick scan to see if and what appraisal and selection steps are necessary.
  • During the reception and pre-ingest phase objects will be checked - to the degree where the IISH can check these things - for completeness and understandability.
  • For the reasons mentioned above, the IISH cannot enforce the use of preferred file formats by the archival donor. All files will be ingested as they are, and for some file formats extra preservation and access copies will be made during the ingest process.
  • To make sure our designated communities can use and understand the digital objects, the IISH uses the most common file formats for the dissemination of these objects, while offering metadata on how and why these formats were chosen. When permitted and technically feasible the user can also have access to the originally ingested file. See the IISH File format policy for born digital collections (https://confluence.socialhistoryservices.org/display/CTS/File+format+policy+for+born+digital+collections).
  • When objects are delivered in an obscure file format which cannot be recognized by the (current) characterization tools (the IISH uses the PRONOM file format registry http://www.nationalarchives.gov.uk/PRONOM/Default.aspx within Archivematica) the files will be stored as such. For these files the authenticity cannot be guaranteed and only bit preservation is possible. As these files are marked as unrecognized, they might be re-ingested at a later date when characterization tools are updated. 

Research data:

The appraisal of research data follows a slightly different route, as these data are all produced by IISH researchers themselves. This means that the data automatically fall within the collection profile (as all IISH research follows the IISH research programme: https://iisg.amsterdam/en/research/programme). Checks for completeness and understandability (such as the addition of descriptive metadata) are done before the data are published on the IISH Dataverse platform. IISH research data are published using open formats. As explained in R7, in 2020 research data will also be ingested via a full preservation Archivematica 'fuelled' workflow. 

 


R9 Documented storage procedures 

The repository applies documented processes and procedures in managing archival storage of the data.


Guidance:

Repositories need to store data and metadata from the point of deposit, through the ingest process, to the point of access. Repositories with a preservation remit must also offer ‘archival storage’ in OAIS terms.

For this Requirement, responses should include evidence related to the following questions:

  • How are relevant processes and procedures documented and managed?
  • What levels of security are required, and how are these supported?
  • How is data storage addressed by the preservation policy?
  • Does the repository have a strategy for backup/multiple copies? If so, what is it?
  • Are data recovery provisions in place? What are they?
  • Are risk management techniques used to inform the strategy?
  • What checks are in place to ensure consistency across archival copies?
  • How is deterioration of storage media handled and monitored?
  • This Requirement deals with high-level arrangements in respect of continuity. Please refer also to R15 (Technical infrastructure) and R16 (Security) for details on specific arrangements for backup, physical and logical security, failover, and business continuity.

Extended guidance 2017:

The reviewer will be looking to understand each of the storage locations that support curation processes, how data are appropriately managed in each environment, and that processes are in place to monitor and manage change to storage documentation. Can the repository recover from short-term disasters? Are procedures documented and standardized in such a way that different data managers, while performing the same tasks separately, will arrive at substantially the same outcome?

Compliance level: 4 – The guideline has been fully implemented in the repository

Response:

The primary archival storage of the IISH is provided by KNAW ICT Services. The AIPs are stored in two different data centers in the Netherlands, provided by Vancis. Because of the nature of the materials and the agreements with archival donors, storage under Dutch law is a strict requirement of the IISH when selecting archival storage. For additional safety, Vancis also provides a 30-day backup of the data. As an extra security measure, the AIPs are also stored on secondary storage provided by SURFsara, including a backup. The secondary storage system is accessible only as WORM (write once, read many) storage, which allows neither deletion nor overwriting of AIPs. In this sense, the AIPs are always stored in versions: a new version of an AIP does not overwrite the old AIP but is written alongside it as a new version. See document: Multiple copies and backup https://confluence.socialhistoryservices.org/display/CTS/Multiple+copies+and+backup.
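The write-once behaviour of the secondary storage can be modelled in a few lines. The following is a minimal in-memory sketch, assuming only what is described here: writes add a version, and deletes are refused. It is not the SURFsara storage interface, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class WriteOnceError(Exception):
    """Raised when a caller tries to delete or overwrite stored content."""


@dataclass
class WormStore:
    """In-memory model of write-once storage: a write never replaces, it adds a version."""

    _versions: Dict[str, List[bytes]] = field(default_factory=dict)

    def put(self, aip_id: str, data: bytes) -> int:
        """Append data as the next version of aip_id and return its 1-based version number."""
        versions = self._versions.setdefault(aip_id, [])
        versions.append(bytes(data))
        return len(versions)

    def get(self, aip_id: str, version: Optional[int] = None) -> bytes:
        """Return the requested version of aip_id, or the latest when version is None."""
        versions = self._versions[aip_id]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{aip_id!r} has no version {version}")
        return versions[version - 1]

    def version_count(self, aip_id: str) -> int:
        """Number of versions stored for aip_id (0 if it was never written)."""
        return len(self._versions.get(aip_id, []))

    def delete(self, aip_id: str) -> None:
        """Deletion is refused outright, as on WORM media."""
        raise WriteOnceError(f"cannot delete {aip_id!r}: storage is write-once")
```

For example, storing a revised AIP under the same identifier yields version 2, while `get(aip_id, 1)` still returns the original bytes.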

Because the data are replicated across two different storage providers and mirrored across several data centers, they are well protected against data loss. See document Security against data loss: https://confluence.socialhistoryservices.org/display/CTS/Security+against+data+loss.
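Keeping copies with two providers protects against loss only if a damaged or missing copy is detected and a good copy is identified to restore from. The following is a minimal sketch, assuming that the checksum recorded at ingest serves as the reference and that each storage location can report a checksum per AIP. With only two copies, a majority vote cannot decide which copy is wrong, so the sketch compares every copy against the reference instead. The location labels and function names are hypothetical. This is not the IISH's or the providers' tooling.

```python
from typing import Dict, List, Tuple

Digests = Dict[str, str]  # AIP identifier -> checksum


def audit_replicas(reference: Digests, replicas: Dict[str, Digests]) -> List[Tuple[str, str, str]]:
    """Compare every storage location against the checksums recorded at ingest.

    Returns (location, aip_id, problem) tuples, where problem is "missing" or "mismatch".
    """
    findings = []
    for location, digests in sorted(replicas.items()):
        for aip_id, expected in sorted(reference.items()):
            actual = digests.get(aip_id)
            if actual is None:
                findings.append((location, aip_id, "missing"))
            elif actual != expected:
                findings.append((location, aip_id, "mismatch"))
    return findings


def healthy_locations(aip_id: str, reference: Digests, replicas: Dict[str, Digests]) -> List[str]:
    """List the locations whose copy of aip_id still matches the reference checksum."""
    return sorted(loc for loc, d in replicas.items() if d.get(aip_id) == reference[aip_id])
```

Any finding from `audit_replicas` can then be repaired from one of the locations returned by `healthy_locations`.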



 

R10 Preservation plan

The repository assumes responsibility for long-term preservation and manages this function in a planned and documented way.



Guidance: The repository, data depositors, and Designated Community need to understand the level of responsibility undertaken for each deposited item in the repository. The repository must have the legal rights to undertake these responsibilities. Procedures must be documented and their completion assured.

For this Requirement, responses should include evidence related to the following questions:

  • Is a preservation plan in place?
  • Is the ‘preservation level’ for each item understood? How is this defined?
  • Does the contract between depositor and repository provide for all actions necessary to meet the responsibilities?
  • Is the transfer of custody and responsibility handover clear to the depositor and repository?
  • Does the repository have the rights to copy, transform, and store the items, as well as provide access to them?
  • Are actions relevant to preservation specified in documentation, including custody transfer, submission information standards, and archival information standards?
  • Are there measures to ensure these actions are taken?

Extended guidance 2017:

The reviewer will be looking for clear, managed documentation to ensure: (1) an organized approach to long-term preservation, (2) continued access for data types despite format changes, and (3) there is sufficient documentation to support usability by the Designated Community. The response should address whether the repository has defined preservation levels and, if so, how these are applied. The preservation plan should be managed to ensure that changes to data technology and user requirements are handled in a stable and timely manner.

Compliance level: 4 – The guideline has been fully implemented in the repository

Response:

The Digital Preservation Policy 2019-2022 is published on the IISH Confluence wiki (https://confluence.socialhistoryservices.org/display/CTS/Digital+Preservation+Policy+2019-2022).

The IISH uses two preservation levels:

  1. Full preservation level: in addition to fixity checks and secure storage in multiple copies, objects that can be recognized by file format characterisation tools during ingest are preserved at the highest level. A preservation policy exists for each category of files (text, spreadsheets, databases, email etc). See the File Format Policy: https://confluence.socialhistoryservices.org/display/CTS/File+format+policy+for+born+digital+collections
  2. Bit preservation level: for file formats that cannot be recognized during ingest, only fixity checking and secure storage in multiple copies are possible.
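The rule that separates the two levels can be stated as a small decision function. The following is a minimal sketch, assuming that a PRONOM-based identification step returns a format identifier for recognized files and nothing for unrecognized ones. Following the Appraisal response, it also collects the unrecognized files as candidates for re-ingest once characterisation tools are updated. The names are hypothetical. This is not the IISH's Archivematica configuration.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class PreservationLevel(Enum):
    FULL = "full"  # identified format: fixity, multiple copies and format-specific policy
    BIT = "bit"    # unidentified format: fixity and multiple copies only


def assign_level(format_id: Optional[str]) -> PreservationLevel:
    """Full preservation when identification returned a format ID, bit preservation otherwise."""
    return PreservationLevel.FULL if format_id else PreservationLevel.BIT


def triage(files: Dict[str, Optional[str]]) -> Tuple[Dict[str, PreservationLevel], List[str]]:
    """Assign a level to each file and list the unidentified ones as candidates for re-ingest."""
    levels = {name: assign_level(fid) for name, fid in files.items()}
    reingest_candidates = sorted(n for n, lvl in levels.items() if lvl is PreservationLevel.BIT)
    return levels, reingest_candidates
```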

Collection data are preserved at the 'full preservation level' and research data, at the time of writing, at the 'bit preservation level'. In 2020, when research data will also be processed with the 'Archivematica workflow', the full preservation level will apply to research data as well (see also R12). 

For the contract between the archive and the archival donor/depositor a distinction has to be made between collection and research data:

Collection data:

  • The standard and generic "act of deposit" between depositor and repository that is used by the IISH states that the Institute takes up the obligation to keep the collection in good condition ("The Stichting IISG undertakes to keep the archives in a good condition"). The contract offers the possibility to expand to a specific agreement between the repository and the archival donor.
  • As to rights to transform the archive: In the standard contract one of the conditions states that the "appraisal of the archives can take place without the special permission of the depositor".
  • As to the right to give access and copy the archive the contract states the following: "The Stichting IISG will be permitted to make the Archives available to third parties through the channels commonly, including, but not limited to, reading rooms and website. The Foundation IISG will also be permitted to make copies of the Archives for conservation purposes and availability under the same conditions that apply for the original Archives".

Research data:

When the institute acquires research datasets from outside the institute, a data deposit agreement is drawn up between the IISH and the researcher/depositor: https://confluence.socialhistoryservices.org/download/attachments/42568284/20160900_-_data_deposit_agreement.docx. This standard contract lays down all provisions regarding the deposit of the dataset. The provisions data deposit agreement (https://confluence.socialhistoryservices.org/download/attachments/42568284/20160900_provisions_data_deposit_agreement%20%281%29.docx) contains the following relevant provisions:"

    1. The Repository shall ensure, to the best of its ability and resources, that the deposited dataset is archived in a sustainable manner and remains legible and accessible.
    2. The Repository shall, as far as possible, preserve the dataset unchanged in its original software format, taking account of current technology and the costs of implementation. The Repository has the right to modify the format and/or functionality of the dataset if this is necessary in order to facilitate the digital sustainability, distribution or re-use of the dataset.
    3. If the access category is Open Access for registered users “under any of its three conditions, as specified in the Data Deposit Agreement at the end of these Provisions, are selected, the Repository shall, to the best of its ability and resources, ensure that effective technical and other measures are in place to prevent unauthorised third parties from gaining access to and/or consulting the dataset or substantial parts thereof."




R11 Data Quality

 

The repository has appropriate expertise to address technical data and metadata quality and ensures that sufficient information is available for end users to make quality-related evaluations.


Guidance: Repositories must work in concert with depositors to ensure that there is enough available information about the data such that the Designated Community can assess the substantive quality of the data. Such quality assessment becomes increasingly relevant when the Designated Community is multidisciplinary, where researchers may not have the personal experience to make an evaluation of quality from the data alone. Repositories must also be able to evaluate the technical quality of data deposits in terms of the completeness and quality of the materials provided, and the quality of the metadata.

Data, or associated metadata, may have quality issues relevant to their research value, but this does not preclude their use in science if a user can make a well-informed decision on their suitability through provided documentation.

For this Requirement, please describe:

  • The approach to data and metadata quality taken by the repository.
  • Any automated assessment of metadata adherence to relevant schema.
  • The ability of the Designated Community to comment on, and/or rate data and metadata.
  • Whether citations to related works or links to citation indices are provided.

Provisions for data quality are also ensured by other Requirements. Specifically, please refer to R8 (Appraisal), R12 (Workflows), and R7 (Data integrity and authenticity).

Extended Guidance:
The applicants should make clear in the response that they understand the quality levels that can be reasonably expected from depositors. They should describe how quality will be assured during curation and the quality expectations of users, which may involve documentation of areas where quality thresholds have not been reached.

Compliance level: 4 – The guideline has been fully implemented in the repository

Response:

Collection data

  • The IISH aims to give proper access to all collections by adding sufficient descriptive metadata so they can be easily found in the IISH internal and external search systems and engines. If the depositor has delivered metadata, we use that as a starting point for creating descriptive metadata. However, as noted earlier, because these are private archives, the Institute has no real influence on the use, form and amount of metadata the depositor delivers with the archive. In many cases the descriptive metadata will be created by the institute itself.
  • The approach to the quality of the metadata describing the collections is pragmatic, in the sense that the IISH follows standards that are widely used within the cultural heritage community. Findability and interoperability of the collections are the key criteria for the metadata assigned. The obvious standards used here are the archival standards ISAD(G) and EAD and the library standard MARC 21. The institute also makes extensive use of well-established vocabularies such as VIAF (http://viaf.org/), AAT (http://www.getty.edu/research/tools/vocabularies/aat/), the Library of Congress Subject Headings and GeoNames. These vocabularies underpin the search functionality of the linked-data-powered IISH website.
  • For descriptive metadata the institute uses standardized open source applications such as Evergreen and ArchivesSpace, which are built on internationally accepted metadata standards such as MARC 21 and EAD.
  • For technical and preservation metadata the PREMIS standard is used. This metadata is generated by Archivematica during the different steps in the archiving workflow. Archivematica embeds the PREMIS metadata in a METS wrapper.  
  • Search results and records are always presented in context and with suggestions of related works. Example: https://hdl.handle.net/10622/3C26EA12-6914-4C3D-B3E6-D22EFF25B6E3
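To illustrate what the PREMIS preservation metadata mentioned above records, the sketch below builds a single PREMIS 3 event for a checksum calculation using only Python's standard library. It is a minimal sketch, not Archivematica's code: the helper names `q` and `make_premis_event`, the event detail text and the object identifier are hypothetical, and in the repository itself Archivematica generates these events and wraps them in METS.

```python
import hashlib
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the PREMIS 3 namespace (Clark notation)."""
    return f"{{{PREMIS_NS}}}{tag}"


def make_premis_event(data: bytes, object_id: str) -> ET.Element:
    """Build a PREMIS 3 'message digest calculation' event for some file content.

    Hypothetical helper for illustration; Archivematica creates its own events.
    """
    event = ET.Element(q("event"))

    ident = ET.SubElement(event, q("eventIdentifier"))
    ET.SubElement(ident, q("eventIdentifierType")).text = "UUID"
    ET.SubElement(ident, q("eventIdentifierValue")).text = str(uuid.uuid4())

    ET.SubElement(event, q("eventType")).text = "message digest calculation"
    ET.SubElement(event, q("eventDateTime")).text = datetime.now(timezone.utc).isoformat()

    detail = ET.SubElement(event, q("eventDetailInformation"))
    ET.SubElement(detail, q("eventDetail")).text = 'program="python"; module="hashlib.sha256"'

    # The outcome detail note carries the computed SHA-256 digest.
    outcome = ET.SubElement(event, q("eventOutcomeInformation"))
    outcome_detail = ET.SubElement(outcome, q("eventOutcomeDetail"))
    ET.SubElement(outcome_detail, q("eventOutcomeDetailNote")).text = hashlib.sha256(data).hexdigest()

    # Link the event to the object it describes (identifier type assumed to be UUID).
    link = ET.SubElement(event, q("linkingObjectIdentifier"))
    ET.SubElement(link, q("linkingObjectIdentifierType")).text = "UUID"
    ET.SubElement(link, q("linkingObjectIdentifierValue")).text = object_id
    return event


if __name__ == "__main__":
    event = make_premis_event(b"example file content", "hypothetical-object-uuid")
    print(ET.tostring(event, encoding="unicode"))
```

In METS, such events would typically sit in a digiprovMD section linked to the file they describe.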

Research data

  • The IISH uses the Dataverse platform to give access to the research data: https://datasets.iisg.amsterdam/. Metadata and files are part of the standard workflows; dedicated staff members ensure their quality.
  • Metadata and file formats conform to international standards. The DDI (Data Documentation Initiative) standard is used for describing the datasets.

 



R12 Workflows 


Archiving takes place according to defined workflows from ingest to dissemination.

 

Guidance: To ensure the consistency of practices across datasets and services and to avoid ad hoc and reactive activities, archival workflows should be documented, and provisions for managed change should be in place. The procedure should be adapted to the repository mission and activities, and procedural documentation for archiving data should be clear.

For this Requirement, responses should include evidence related to the following:

  • Workflows/business process descriptions.
  • Clear communication to depositors and users about handling of data.
  • Levels of security and impact on workflows (guarding privacy of subjects, etc.).
  • Qualitative and quantitative checking of outputs.
  • Appraisal and selection of data.
  • Approaches towards data that do not fall within the mission/collection profile.
  • The types of data managed and any impact on workflow.
  • Decision handling within the workflows (e.g., archival data transformation).
  • Change management of workflows.

This Requirement confirms that all workflows are documented. Evidence of such workflows may have been provided as part of other task-specific Requirements, such as for ingest in R8 (Appraisal), storage procedures in R9 (Documented storage procedures), security arrangements in R16 (Security), and confidentiality in R4 (Confidentiality/Ethics).

Extended Guidance 2017

The reviewer is looking for evidence that the applicant takes a consistent, rigorous, documented approach to managing its activities throughout their processes and that changes to those processes are appropriately implemented, evaluated, recorded, and administered.

Compliance level: 3 – The repository is in the implementation phase

Response:

Collection data

The IISH has two workflows for the ingest of collection data in the repository:

  1. A workflow for born digital materials: this Archivematica-supported workflow describes the ingest of born digital materials into the digital repository. Among other things, it involves pre-ingest steps such as the assignment of checksums, virus checks, and file format characterisation and validation. If necessary, appraisal and selection are (as far as possible) carried out between the pre-ingest (Transfer) and the SIP creation steps. The most important steps during the ingest phase are the normalisation of files for access and preservation, the allocation of Handle PIDs to files and folders, the creation of the AIP and (when allowed, see R4) the DIP, and the storage of these packages. All steps of the workflow are logged and documented in PREMIS metadata wrapped in METS. Documentation: Functional description of the workflow for pre-ingest and ingest of Born Digital Collections https://confluence.socialhistoryservices.org/display/CTS/Functional+description+of+the+workflow+for+pre-ingest+and+ingest+of+Born+Digital+Collections. Schematic representation of the born digital workflow: https://confluence.socialhistoryservices.org/pages/viewpage.action?pageId=48989289.
  2. A workflow for digitized materials: this workflow describes the ingest into the archive and the dissemination of digitized collections. Among other things, it involves checking the structural metadata, creating METS, creating derivatives, allocating Handle PIDs and setting the allowed access status (the resolution at which the Institute allows the image to be viewed). Documentation: https://confluence.socialhistoryservices.org/display/CTS/Flow+14. This workflow is scheduled to be migrated to an Archivematica-driven workflow in the second half of 2019, so that the individual steps in the workflow can be better monitored, managed and changed.
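The checksum step in the born digital workflow can be illustrated with a small sketch: checksums are computed for every file at transfer and recomputed later to detect missing or altered files. This assumes SHA-256 and uses the hypothetical helpers `sha256_of`, `make_manifest` and `verify_manifest`; Archivematica performs these steps with its own microservices, so this is not the IISH implementation.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA-256 digest, reading it in chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def make_manifest(root: Path) -> Dict[str, str]:
    """Map every file under root (relative POSIX path) to its checksum at transfer."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return the paths that are missing or whose checksum no longer matches."""
    return [
        rel
        for rel, expected in manifest.items()
        if not (root / rel).is_file() or sha256_of(root / rel) != expected
    ]


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    (root / "letter.txt").write_text("original")
    manifest = make_manifest(root)
    (root / "letter.txt").write_text("altered")
    print(verify_manifest(root, manifest))  # ['letter.txt']
```

Reading in fixed-size chunks keeps memory use constant, which matters for large born digital files such as video.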

Security and impact on workflows

  • In the dissemination part of the workflow for digital archival materials, privacy and copyright status determine whether the archive is open and, if it is, under which access regime the images (or other content) may be presented. At the highest level of the archival description, under "access", the user receives a statement on the openness of the archive (restricted / not restricted). The exact status can be found under the Access and use tab.
  • Access regimes can be based on copyright and privacy laws and/or on the individual agreements with the archival donor.
  • A general IISH copyright statement can be found on the website: https://iisg.amsterdam/en/collections/using/reproductions.

Approaches towards data outside the mission/collection profile

The method and depth of appraisal and selection of born digital collections are the subject of ongoing discussion and adaptation, because the nature of the materials, and of the organisations and persons that provide them, keeps changing. That said, the following materials will be deselected as a rule:

  • Computer programs
  • ISO images of hard disks or otherwise
  • Virus infected data
  • Password protected files

In the near future, additional functionality will be developed within Archivematica for identifying and removing duplicate files.
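Since this functionality is still to be developed, the sketch below only illustrates one common approach to finding duplicates: grouping files by their checksum. It assumes that only byte-identical files count as duplicates (the same text saved in two formats would not be found) and uses the hypothetical helper `find_duplicates`; it says nothing about how the functionality will eventually be built in Archivematica.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import DefaultDict, List


def find_duplicates(root: Path) -> List[List[str]]:
    """Group files under root that have byte-identical content (same SHA-256)."""
    by_digest: DefaultDict[str, List[str]] = defaultdict(list)
    for p in sorted(root.rglob("*")):
        if p.is_file():
            # read_bytes() keeps the sketch short; large files should be hashed in chunks.
            digest = hashlib.sha256(p.read_bytes()).hexdigest()
            by_digest[digest].append(p.relative_to(root).as_posix())
    # Only digests shared by more than one file indicate duplicates.
    return [paths for paths in by_digest.values() if len(paths) > 1]


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    (root / "report.doc").write_bytes(b"same content")
    (root / "report (copy).doc").write_bytes(b"same content")
    (root / "notes.txt").write_bytes(b"other content")
    print(find_duplicates(root))
```

Which copy of a duplicate group to keep remains an appraisal decision; the sketch only reports the groups.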

Change management of the workflows

  • Important changes to the two workflows will be logged and documented on the IISH Confluence pages that describe these workflows (see above).


Research data

The archival workflow for research data is largely determined by the Dataverse platform (https://datasets.socialhistory.org/) that the IISH uses for the dissemination of datasets: http://dataverse-guides.readthedocs.io/en/latest/user/dataset-management.html. Files are stored on a file server, of which back-ups are made. At the time of writing this means that, for the IISH datasets, only the integrity of the files can be guaranteed (bit preservation level, see R7). In 2020 the datasets will be ingested through an Archivematica workflow, which means that they will be archived at full preservation level (see R7). Research datasets uploaded to Dataverse will be transferred to the Archivematica workflow and follow the standard ingest. The data will be published if access is open. See also the scheme of the four main IISH digital archival workflows: https://confluence.socialhistoryservices.org/display/CTS/Scheme+of+four+main+IISH+digital+archival+workflows


 


R13 Data Discovery and Identification  


The repository enables users to discover the data and refer to them in a persistent way through proper citation.


Guidance: Effective data discovery is key to data sharing, and most repositories provide searchable catalogues describing their holdings such that potential users can evaluate data to see if they meet their needs. Once discovered, datasets should be referenceable through full citations to the data, including persistent identifiers to ensure that data can be accessed into the future. Citations also provide credit and attribution to individuals who contributed to the creation of the dataset.

For this Requirement, responses should include evidence related to the following questions:

  • Does the repository offer search facilities?
  • Does the repository maintain a searchable metadata catalogue to appropriate (internationally agreed) standards?
  • Does the repository facilitate machine harvesting of the metadata?
  • Is the repository included in one or more disciplinary or generic registries of resources?
  • Does the repository offer recommended data citations?
  • Does the repository offer persistent identifiers?

Extended Guidance R13
The response should contain evidence that the curation of data and metadata is designed to support resource discovery of clearly defined and identified digital objects. It should be clear to users of the data how it must be cited to provide appropriate academic credit and linkages among related research.

Compliance level: 4 – The guideline has been fully implemented in the repository

Response:

Collection data

Research data



R14 Data Reuse 

The repository enables reuse of the data over time, ensuring that appropriate metadata are available to support the understanding and use of the data.


Guidance: Repositories must ensure that data can be understood and used effectively into the future despite changes in technology. This Requirement evaluates the measures taken to ensure that data are reusable.

For this Requirement, responses should include evidence related to the following questions:

  • Which metadata are required by the repository when the data are provided (e.g., Dublin Core or content-oriented metadata)?
  • Are data provided in formats used by the Designated Community? Which formats?
  • Are measures taken to account for the possible evolution of formats?
  • Are plans related to future migrations in place?
  • How does the repository ensure understandability of the data?

The concept of ‘reuse’ is critical in environments in which secondary analysis outputs are redeposited into a repository alongside primary data, since the provenance chain and associated rights issues may then become increasingly complicated.

Reuse is dependent on the applicable licenses covered in R2 (Licenses).

Extended Guidance (2017)

The applicant should understand the needs of the Designated Community in terms of their research practises, technical environment, and applicable standards. Changes in technology are important, but appropriate high-quality metadata should also play an essential role and should be referred to in the evidence provided. The latter information is critical to design curation processes that result in digital objects meeting the needs of the end user, as well as generic or disciplinary standards.

Compliance level: 4 – The guideline has been fully implemented in the repository

Response:


Metadata of archival donors

See answer to R11: "The IISH aims to give proper access to all collections by adding sufficient descriptive metadata... created by the institute itself."

Data provided for the Designated Community

  • Digitised still images are presented in the common JPEG format, and objects that consist of more than one image (for instance books) in PDF format. Video is presented as streaming MP4 and audio as MP3 files. Users who need high-resolution and/or uncompressed files can order them on the website.
  • The born digital collections will be made publicly available, but access will require authentication and will probably be limited to on-site use. Where permitted and technically feasible, the Institute strives to give access to all objects in their original format (for instance an MS Word .doc file) and, when necessary, in at least one normalized access copy (for instance PDF).
  • Both digitized and born digital collections will be made available through a IIIF (https://iiif.io/) compliant viewer, which will considerably improve the interoperability of the collections.
  • The research data are published on the IISH Dataverse platform in formats commonly used in social history research. Metadata about the datasets can be exported as DC, DDI, DataCite, JSON, OAI_ORE, OpenAIRE and JSON-LD (see for instance https://hdl.handle.net/10622/FHJJYK under the button 'export metadata'). The metadata are exported as linked data and will, in 2020, be available through the integrated data search on the IISH website.
  • The IISH API can deliver descriptive metadata for collection data in EAD, DC and MARC (see for example: https://search.socialhistory.org/Record/COLL00293/Export).
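The Dataverse metadata export mentioned above can also be requested programmatically: the Dataverse native API has an export endpoint, `/api/datasets/export`, that takes an `exporter` name and the dataset's `persistentId`. The sketch below only builds such a request URL for the example dataset. The base URL is taken from this text (which refers to the IISH Dataverse under both datasets.socialhistory.org and datasets.iisg.amsterdam), the helper name `export_url` is hypothetical, and the set of exporter names is an assumption that depends on the Dataverse version installed.

```python
from urllib.parse import urlencode

# Assumption: the IISH Dataverse is reachable at this base URL (see text).
DATAVERSE_BASE = "https://datasets.socialhistory.org"

# Assumption: exporter names as listed in the Dataverse API guide; the set
# actually available depends on the installed Dataverse version.
EXPORTERS = {"ddi", "oai_ddi", "dcterms", "oai_dc", "Datacite", "oai_datacite",
             "OAI_ORE", "schema.org", "dataverse_json"}


def export_url(handle: str, exporter: str = "ddi", base: str = DATAVERSE_BASE) -> str:
    """Build a metadata export URL for a dataset identified by a Handle, e.g. '10622/FHJJYK'."""
    if exporter not in EXPORTERS:
        raise ValueError(f"unsupported exporter: {exporter!r}")
    # Dataverse writes Handle-based persistent identifiers with an 'hdl:' prefix.
    query = urlencode({"exporter": exporter, "persistentId": f"hdl:{handle}"})
    return f"{base}/api/datasets/export?{query}"


if __name__ == "__main__":
    # Fetching this URL (e.g. with urllib.request.urlopen) is expected to return
    # the DDI metadata, provided the endpoint is enabled on the server.
    print(export_url("10622/FHJJYK"))
```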

Measures taken to account for the possible evolution of formats

  • With the help of Archivematica, the archive is regularly monitored for possible obsolescence of the preservation formats. When a format threatens to become obsolete, a migration workflow is started to create new preservation copies of the originals, and a new AIP is created. For reasons of authenticity and provenance, the original file will always be preserved (as will the earlier preservation formats). The new preservation format will be selected on the basis of the regularly updated IISH File format policy for born digital collections (https://confluence.socialhistoryservices.org/display/CTS/File+format+policy+for+born+digital+collections). The migration process (preservation action) will be documented in technical metadata (PREMIS).
  • Once research data are ingested via Archivematica in 2020 (see R12), these measures will also apply to them.

Understandability of the data

  • By updating or migrating file formats that threaten to become obsolete (see above), the IISH ensures that files can be read by current software. In addition, understandability is guaranteed by offering comprehensive context in the form of descriptive metadata and metadata that prove the authenticity of the digital object. Preservation planning and action are described in the preservation policy.
  • By using open standards that are common in social history research, the IISH guarantees the understandability of research data in the short and medium term. In the long term, research data can follow the procedure described directly above, whereby file formats are migrated to the formats common at that time.

Understandability of the metadata

All collection data are described following international standards MARC21 and EAD (see R13). For MARC21 descriptions the IISH follows RDA (Resource Description and Access) description guidelines which, among other things, guarantee the authenticity and interoperability of the metadata. EAD guarantees the standardized description of archival collections. Each archival description comes with elaborate context information (for instance http://hdl.handle.net/10622/ARCH02639 under header 'content and context'). Additional IISH description guidelines further standardize the use of the metadata within the institute. Research data are described using the DDI standard via the IISH Dataverse platform (https://datasets.socialhistory.org/). This ensures that the data are described in a standardized and interoperable fashion.  


R15 Technical infrastructure 

The repository functions on well-supported operating systems and other core infrastructural software and is using hardware and software technologies appropriate to the services it provides to its Designated Community.


Guidance: Repositories need to operate on reliable and stable core infrastructures that maximizes service availability. Furthermore, hardware and software used must be relevant and appropriate to the Designated Community and to the functions that a repository fulfils. Standards such as the OAIS reference model specify the functions of a repository in meeting user needs.

For this Requirement, responses should include evidence related to the following questions:

  • What standards does the repository use for reference? Are these international and/or community standards (e.g., Spatial Data Infrastructure (SDI) standards, OGC, W3C, or ISO 19115)? How often are these reviewed?
  • How are the standards implemented? Are there any significant deviations from the standard? If so, please explain.
  • Does the repository have a plan for infrastructure development? If so, what is it?
  • Is a software inventory maintained and is system documentation available?
  • Is community-supported software in use? Please describe.
  • For real-time to near real-time data streams, is the provision of around-the-clock connectivity to public and private networks at a bandwidth that is sufficient to meet the global and/or regional responsibilities of the repository?

Extended guidance 2017:

The workflows and human actors providing repository services must be supported by a technological infrastructure. The reviewer is looking for evidence that the applicant understands the wider ecosystem of standards, tools, and technologies available for (research) data management and curation, and has selected options that align with local requirements. If possible, this should be demonstrated by using a reference model.

Compliance level: 4 – The guideline has been fully implemented in the repository

Response:

The IISH uses the OAIS functional model (https://en.wikipedia.org/wiki/Open_Archival_Information_System#The_functional_model) as the guiding principle for the elements that have to be covered by the digital repository.

For the recording and management of new acquisitions, the IISH Acquisition database (GitHub: https://github.com/IISH/acquisition_database) is used. Digital packages are produced in the BagIt format with the tool Exactly (GitHub: https://github.com/IISH/uk-exactly). The archival bags are ingested into Archivematica (https://www.archivematica.org/en/) using the accession number provided by the Acquisition database.
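For readers unfamiliar with BagIt (RFC 8493), the sketch below shows the minimal layout of a bag: a `bagit.txt` declaration, the payload under `data/`, a payload manifest with one checksum per file, and optional `bag-info.txt` metadata. It is a hypothetical helper (`make_bag`) written with the Python standard library, not the way Exactly builds its packages. It assumes SHA-256 and uses the reserved `External-Identifier` field to carry an accession number, which is an illustrative choice. Production bags are better made with an established tool such as Exactly or the Library of Congress `bagit` Python library.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Dict, Optional


def make_bag(source: Path, bag_dir: Path, info: Optional[Dict[str, str]] = None) -> Path:
    """Copy `source` into a minimal BagIt 1.0 bag at `bag_dir` (must not exist yet)."""
    payload = bag_dir / "data"
    shutil.copytree(source, payload)  # creates bag_dir and copies the payload into data/

    # Bag declaration: the required tag file.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )

    # Payload manifest: "<checksum>  <path relative to the bag root>" per file.
    # Paths containing CR, LF or % would need percent-encoding; omitted here.
    lines = [
        f"{hashlib.sha256(p.read_bytes()).hexdigest()}  {p.relative_to(bag_dir).as_posix()}"
        for p in sorted(payload.rglob("*"))
        if p.is_file()
    ]
    (bag_dir / "manifest-sha256.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")

    # Optional metadata, e.g. an accession number (field choice is illustrative).
    if info:
        (bag_dir / "bag-info.txt").write_text(
            "".join(f"{key}: {value}\n" for key, value in info.items()), encoding="utf-8"
        )
    return bag_dir


if __name__ == "__main__":
    src = Path(tempfile.mkdtemp()) / "transfer"
    src.mkdir()
    (src / "letter.txt").write_text("hello", encoding="utf-8")
    bag = make_bag(src, Path(tempfile.mkdtemp()) / "bag",
                   {"External-Identifier": "hypothetical-accession-number"})
    print((bag / "manifest-sha256.txt").read_text(encoding="utf-8"))
```

Because the manifest travels with the payload, the receiving system can verify the transfer before ingest.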

Archivematica serves as the OAIS ingest workflow and preservation planning software tool. The ingest procedure in Archivematica is provided as a workflow through a number of microservices, each providing the necessary tools for the required checks, information extractions, transformations and normalization steps during the ingest procedure. At the end of the ingest procedure, Archivematica produces an AIP, which also conforms to the BagIt format. This AIP includes a METS document as a wrapper for all structural and technical metadata. All preservation metadata is included in the METS using the PREMIS data model. All customizations of the rules and tools to be used for various file formats during the ingest procedure are configured using the preservation planning tools provided by Archivematica.

During the ingest procedure, persistent identifiers are assigned using the PID handle service (GitHub: https://github.com/IISH/PID-webservice). For persistent identifiers and their resolution the IISH uses the Handle System. All digital objects and descriptive metadata are made accessible through persistent identifiers. See also R13.
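Handle PIDs can be resolved by anyone through the public Handle.Net proxy, which besides redirecting browsers also offers a JSON REST interface under `/api/handles/`. The sketch below builds such a request and extracts the URL values from a response. It is a hypothetical illustration of resolution only: minting handles at the IISH is done by the PID webservice mentioned above, whose interface is not shown here, and the sample response and target URL in the usage lines are made up.

```python
import json
from typing import List

HANDLE_PROXY = "https://hdl.handle.net"


def handle_api_url(handle: str) -> str:
    """REST URL on the Handle.Net proxy for a handle such as '10622/ARCH02639'."""
    prefix, sep, suffix = handle.partition("/")
    if not (prefix and sep and suffix):
        raise ValueError(f"not a prefix/suffix handle: {handle!r}")
    return f"{HANDLE_PROXY}/api/handles/{prefix}/{suffix}"


def target_urls(response_body: str) -> List[str]:
    """Return the URL-type values from a Handle proxy JSON response."""
    record = json.loads(response_body)
    if record.get("responseCode") != 1:  # 1 = success; e.g. 100 = handle not found
        return []
    return [v["data"]["value"] for v in record.get("values", []) if v.get("type") == "URL"]


if __name__ == "__main__":
    print(handle_api_url("10622/ARCH02639"))
    # Made-up response in the shape returned by the proxy's REST API.
    sample = json.dumps({
        "responseCode": 1,
        "handle": "10622/ARCH02639",
        "values": [{"index": 1, "type": "URL",
                    "data": {"format": "string", "value": "https://example.org/ARCH02639"}}],
    })
    print(target_urls(sample))  # ['https://example.org/ARCH02639']
```

Because citations point at the handle rather than at the target URL, the target can change (for instance after a website migration) without breaking existing references.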

All descriptive metadata are recorded in either MARC or EAD. MARC is used for describing the library holdings and individual objects. Evergreen (http://evergreen-ils.org/) is used for the management of the MARC inventory. EAD is used for the archival descriptions, which are managed with the XML editor XMetal (https://xmetal.com/). All descriptive metadata can be accessed through OAI-PMH and SRU/SRW: https://iisg.amsterdam/en/collections/using/machine-access.
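As an example of the machine access mentioned above, the sketch below builds OAI-PMH ListRecords requests and reads the record identifiers and the resumptionToken from a response, which is how a harvester pages through a large result set. The base URL and sample response are hypothetical (the actual endpoints are documented on the machine-access page). The sketch uses `oai_dc` because every OAI-PMH repository must support it; metadata prefixes for MARC or EAD are assumptions to check against the endpoint's ListMetadataFormats response.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base: str, metadata_prefix: str = "oai_dc",
                     token: Optional[str] = None) -> str:
    """Build a ListRecords request; a resumptionToken is an exclusive argument."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return the record identifiers and the resumptionToken (None when the list is complete)."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Hypothetical response used for illustration.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    base = "https://example.org/oai"  # hypothetical endpoint
    print(list_records_url(base))
    ids, token = parse_list_records(SAMPLE)
    print(ids, token)  # ['oai:example:1', 'oai:example:2'] page-2
    print(list_records_url(base, token=token))
```

An empty resumptionToken element signals the last page, which the parser reports as None.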

The IISH uses open source (community-supported) software wherever possible. In particular, the software for collection management and dissemination is almost entirely open source. All software, documentation and test material produced by the IISH is open source and available through our GitHub repository (https://github.com/iish). A list of the software used and the relevant technical documentation are managed and monitored by the Digital Infrastructure Department of the KNAW Humanities Cluster (HUC).

For real-time to near real-time data streams, around-the-clock connectivity to public and private networks is provided at a bandwidth that is sufficient to meet the global and/or regional responsibilities of the repository. The IISH is connected to the 100 Gb/s port of the Amsterdam Internet Exchange, which is considered to be one of the largest and fastest data transport hubs in the world. Agreements between the KNAW and SURFsara ensure this high-quality, high-bandwidth connection.

Future developments are (among others):

  • A further refinement of the pre-ingest procedure for digitized and born digital materials, which is due to be finished at the beginning of 2020.
  • In 2020 research data will be ingested via Archivematica, so that they can be preserved in a way that guarantees their authenticity as far as possible (see also R7).
  • Also in 2020 the Acquisitions database and XMetal (both mentioned above) will be replaced by ArchivesSpace - an open source archives information management application. As ArchivesSpace can be integrated with Archivematica the workflows will be more efficient, especially with respect to born digital collections.


R16 Security 


The technical infrastructure of the repository provides for protection of the facility and its data, products, services, and users.

 

Guidance: The repository should analyze potential threats, assess risks, and create a consistent security system. It should describe damage scenarios based on malicious actions, human error, or technical failure that pose a threat to the repository and its data, products, services, and users. It should measure the likelihood and impact of such scenarios, decide which risk levels are acceptable, and determine which measures should be taken to counter the threats to the repository and its Designated Community. This should be an ongoing process.

For this Requirement, please describe:

  • Procedures and arrangements in place to provide swift recovery or backup of essential services in the event of an outage.
  • Your IT security system, disaster plan, and business continuity plan; employees with roles related to security (e.g., security officers); and any risk analysis tools (e.g., DRAMBORA) you use.

This Requirement describes some of the aspects generally covered by others—for example, R12 (Workflows)—and is supplementary to R9 (Documented storage procedures).

Extended Guidance R16

The reviewer is looking for evidence that the applicant understands the technical risks applicable to the service for the data users and the physical environment, and that it has mechanisms in place to respond to security incidents. Evidence must focus on technical infrastructure rather than on managerial and procedural aspects of business continuity. In what way is the technical infrastructure controlled by the repository or by their host/outsource institution? Who is in charge? Can the repository in any way determine the technical infrastructure if that is outsourced? Are the arrangements sufficient to guarantee the long-term preservation of and/or access to the data holdings?

Compliance level: 3 – The repository is in the implementation phase

Response:

The repository system is actively monitored and subject to standard security policies and procedures maintained at the Royal Netherlands Academy of Arts and Sciences (KNAW).

Organizational systems are regularly subjected to vulnerability scans and to a yearly audit process (SURFaudit: https://www.surf.nl/en/surfaudit). In terms of SURFaudit recommendations all KNAW organizations are expected to operate at level 3 (‘embedded in the organization’). Progress or deviations from this expected level of outcome are monitored on a yearly basis and results and improvement points are communicated in a yearly Security Workplan.

All security incidents are registered, coordinated and handled by the Computer Security Incident Response Team (CSIRT) of the KNAW in accordance with the process for handling information security incidents (‘Proces voor afhandeling informatiebeveiligingsincident CSIRT (nov 2015)’). Technical administrators resolve incidents in collaboration with functional administrators. Each institute within the KNAW has an Information Security Officer who acts as an intermediary between the central CSIRT group and the institute. If an incident is reported, the following standard procedure is followed:

1. Identification
2. Damage assessment and control
3. Repair activities
4. Evaluation and reporting

Each step is further subdivided into a specific set of actions related to the incident level.

Since January 1st 2016, each organization has been obliged to report serious data leaks to the Autoriteit Persoonsgegevens (Dutch Data Protection Authority, https://autoriteitpersoonsgegevens.nl/en). See also the implementation of the GDPR and the measures taken by the KNAW and the Institute (as described in R4).

All of our servers are provided by the ICT Services of the KNAW and covered by Service Level Agreements. These also cover recovery procedures in case of an organization-wide outage. Risk management procedures are described at the organizational level.

To safeguard data deposited into the repository system, multiple copies are maintained at off-site locations. All data are furthermore replicated (see R9) in AIP packages that include metadata, data and authorization information. In case of a system outage, all data can thus be retrieved from several locations.
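The value of keeping several copies can be illustrated with a simple fixity audit: the copies of one file at different locations are hashed and compared, and any copy that disagrees with the majority is flagged for repair from a good copy. This is a minimal sketch assuming at least three copies and SHA-256; the helper name `audit_replicas` is hypothetical, and the sketch does not describe the KNAW's actual replication or monitoring set-up (see R9).

```python
import hashlib
import tempfile
from collections import Counter
from pathlib import Path
from typing import Dict, List, Optional


def _digest(path: Path) -> Optional[str]:
    """SHA-256 of a replica, or None if the replica is missing."""
    try:
        return hashlib.sha256(path.read_bytes()).hexdigest()
    except FileNotFoundError:
        return None


def audit_replicas(copies: List[Path]) -> Dict[str, List[Path]]:
    """Split replicas of one file into 'good' (majority digest) and 'suspect' copies."""
    digests = {p: _digest(p) for p in copies}
    counts = Counter(d for d in digests.values() if d is not None)
    if not counts:
        return {"good": [], "suspect": list(copies)}
    majority, n = counts.most_common(1)[0]
    if 2 * n <= len(copies):  # no strict majority: leave the decision to a human
        return {"good": [], "suspect": list(copies)}
    return {
        "good": [p for p in copies if digests[p] == majority],
        "suspect": [p for p in copies if digests[p] != majority],
    }


if __name__ == "__main__":
    site = Path(tempfile.mkdtemp())
    a, b, c = site / "copy-a.zip", site / "copy-b.zip", site / "copy-c.zip"
    a.write_bytes(b"AIP content")
    b.write_bytes(b"AIP content")
    c.write_bytes(b"AIP c0ntent")  # simulated bit rot in the third copy
    print(audit_replicas([a, b, c]))
```

With only two copies a disagreement cannot be resolved by majority, which is one reason to keep at least three.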

From the perspective of risk management, a yearly updated risk assessment document ('Risicomanagementplan 2020 Humanities Cluster', i.e. 'Risk Management Plan 2020 Humanities Cluster'; latest version September 2019) offers important evidence regarding the long-term conservation of the collections. This document identifies the risks of data loss and cyber attacks and explains how these risks are controlled.

Although R16 (Security) could be considered to be at compliance level 4, the IISH would like to see this confirmed by the 2020 SURFaudit (which, at the time of writing, has yet to take place). That audit will take into account the infrastructural setup and the measures taken over the last year when implementing Archivematica and new storage facilities. A renewed confirmation by SURFaudit will also confirm level 4.



