...

Each digital object category lists the properties to be retained in the preservation format. For each original format in that category, the policy then records:

  • whether the original format is itself the preservation format
  • the preservation copy
  • the intermediate access copy
  • whether Archivematica normalises the file for preservation and for access

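Raster images: RGB (24/48 bits), grayscale (8/16 bits) and bitonal (1 bit)

Properties to be retained: gray or colour values, bit depth, resolution, colour space and, if used, the ICC profile.

| Original format | Original format is preservation format? | Preservation copy | Intermediate access copy | Archivematica normalisation for preservation | Archivematica normalisation for access |
|---|---|---|---|---|---|
| TIFF Baseline | Yes | N/A | JPEG | No | Yes |
| TIFF/EP | Yes | N/A | JPEG | No | Yes |
| TIFF/IT | Yes | N/A | JPEG | No | Yes |
| TIFF/FX | Yes | N/A | JPEG | No | Yes |
| JPEG | Yes | N/A | JPEG | No | No |
| PNG | Yes | N/A | PNG | No | No |
| GIF | Yes | N/A | GIF | No | No |
| JP2 (JPEG 2000 part 1) | No | TIFF | JPEG | Yes/No | Yes |
| Other | N/A | Baseline TIFF | JPEG | Yes | Yes |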
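The raster image table can be encoded as a simple lookup, with the "Other" row serving as the fallback. The sketch below is illustrative only. The format names used as keys are hypothetical labels; a workflow built on Archivematica would more likely key on format identification results. The two Archivematica normalisation columns are left out.

```python
from typing import Dict, Optional, Tuple

# Hypothetical encoding of the raster image table:
# original format -> (preservation copy to create, intermediate access copy).
# None means the original file itself is kept as the preservation copy.
RASTER_POLICY: Dict[str, Tuple[Optional[str], str]] = {
    "TIFF Baseline": (None, "JPEG"),
    "TIFF/EP": (None, "JPEG"),
    "TIFF/IT": (None, "JPEG"),
    "TIFF/FX": (None, "JPEG"),
    "JPEG": (None, "JPEG"),
    "PNG": (None, "PNG"),
    "GIF": (None, "GIF"),
    "JP2": ("TIFF", "JPEG"),  # JPEG 2000 part 1 is not kept as preservation format
}

# The "Other" row: any raster format not listed above.
OTHER_RASTER: Tuple[Optional[str], str] = ("Baseline TIFF", "JPEG")


def plan_raster_copies(original_format: str) -> Tuple[Optional[str], str]:
    """Return (preservation copy, access copy) for a raster image format."""
    return RASTER_POLICY.get(original_format, OTHER_RASTER)
```

For example, `plan_raster_copies("PNG")` returns `(None, "PNG")`. The original PNG is kept as the preservation copy, and PNG is also used for access.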

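Raw camera files

Properties to be retained: gray or colour values, bit depth, resolution, colour space and, if used, the ICC profile.

| Original format | Original format is preservation format? | Preservation copy | Intermediate access copy | Archivematica normalisation for preservation | Archivematica normalisation for access |
|---|---|---|---|---|---|
| 3FR, ARW, CR2, CRW, DCR, ERF, KDC, MRW, NEF, ORF, PEF, RAF, RAW, X3F, etc. | No | Baseline TIFF 6.0, uncompressed | TIFF | Yes | Yes |
| DNG | No | Baseline TIFF 6.0, uncompressed | TIFF | Yes | Yes |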

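2D vector images

Properties to be retained: difficult to specify, as vector files can have many origins [2]. Properties include points, lines and areas.

| Original format | Original format is preservation format? | Preservation copy | Intermediate access copy | Archivematica normalisation for preservation | Archivematica normalisation for access |
|---|---|---|---|---|---|
| SVG | Yes | SVG | N/A | Yes | No |
| Other (AI, EPS, etc.) | N/A | SVG | SVG | Yes | Yes |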

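Word processing files

Properties to be retained:

  • Content: text, images, tables, notes, comments, etc.
  • Layout: fonts, styles, colours, etc.
  • Structure: pages, headers, paragraphs, etc.
  • Behaviour: interactive elements such as video (this cannot be guaranteed, as the behaviour depends on external sources).

| Original format | Original format is preservation format? | Preservation copy | Intermediate access copy | Archivematica normalisation for preservation | Archivematica normalisation for access |
|---|---|---|---|---|---|
| DOC | No | N/A | N/A (no normalisation tool available) | No | N/A |
| DOCX | Yes | N/A | N/A | No | No |
| ODT | Yes | N/A | N/A | No | No |
| RTF | Yes | N/A | N/A | No | No |
| WPD | No | N/A (no normalisation tool available) | N/A (no normalisation tool available) | N/A | N/A |
| PDF | Yes | PDF (as is) | PDF (as is) | No | No |
| Other | N/A | N/A (no normalisation tool available) | N/A (no normalisation tool available) | N/A | N/A |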

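PDF files

| Original format | Original format is preservation format? | Preservation copy | Intermediate access copy | Archivematica normalisation for preservation | Archivematica normalisation for access |
|---|---|---|---|---|---|
| PDF | Yes | N/A | N/A | No | No |
| PDF/A | Yes | N/A | N/A | No | No |
| PDF/X | Yes | N/A | N/A | No | No |
| PDF/E | Yes | N/A | N/A | No | No |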

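Text mark-up files

Properties to be retained: tags, text.

| Original format | Original format is preservation format? | Preservation copy | Intermediate access copy | Archivematica normalisation for preservation | Archivematica normalisation for access |
|---|---|---|---|---|---|
| HTML | Yes | HTML | HTML | No | No |
| XML | Yes | XML | XML | No | No |
| Other | N/A | Original format | Original format | No | No |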

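Plain text

Properties to be retained: content (text).

| Original format | Original format is preservation format? | Preservation copy | Intermediate access copy | Archivematica normalisation for preservation | Archivematica normalisation for access |
|---|---|---|---|---|---|
| TXT | Yes | TXT | TXT | No | No |
| Other | N/A | Original format | Original format | No | No |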

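E-books

Properties to be retained:

  • Content: text, images, etc.
  • Layout: fonts, styles, colours, etc.
  • Structure: headers, paragraphs, etc.

| Original format | Original format is preservation format? | Preservation copy | Intermediate access copy | Archivematica normalisation for preservation | Archivematica normalisation for access |
|---|---|---|---|---|---|
| EPUB | Yes | EPUB | EPUB | No | No |
| MOBI | No? | N/A | N/A | No | No |
| Other | N/A | N/A (no normalisation tool available) | N/A (no normalisation tool available) | N/A | N/A |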

Email

Workflow chosen in November 2017:

  • Mailboxes: PST files are converted to MBOX before ingest into Archivematica.
  • Individual emails: .msg files are stored as such; other formats are preferably converted to EML.

Properties to be retained:

  • Content: text, images, tables, etc.
  • Structure: headers, paragraphs, etc.
  • Layout: fonts, styles, colours, HTML layout, etc.
  • Metadata: SMTP and IMF tags
  • Attachments
  • Thread of conversation

| Original format | Original format is preservation format? | Preservation copy | Intermediate access copy | Archivematica normalisation for preservation | Archivematica normalisation for access |
|---|---|---|---|---|---|
| PST (open question: should PST be ingested as the original format?) | Yes? | MBOX, to be made before Archivematica; attachments extracted from the MBOX file (not yet realised) | MBOX; attachments extracted from the MBOX file | N/A (no tool included) | N/A (no tool included) |
| MBOX | Yes | MBOX; attachments extracted from the original mail | MBOX | N/A (no tool included) | N/A (no tool included) |
| MSG | Yes | MSG; attachments extracted from the original mail | MSG; attached files are delivered as separate files | N/A | N/A |
| EML | Yes | EML; attachments extracted from the original mail | EML; attached files are delivered as separate files | No | No |
| Other (mailboxes are converted before Archivematica) | N/A | Mailbox: MBOX; individual mail: EML; attachments extracted from the original mail | MBOX, EML; attached files are delivered as separate files | N/A (no tool included) | N/A (no tool included) |

In every case, attached files are treated according to this file format policy. This means the attachments have to be extracted from the mailbox file or from the original mail.
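The policy requires attachments to be extracted from MBOX files and original mails, which is not yet realised. Below is a minimal sketch of such an extraction using only Python's standard `mailbox` and `email` modules. It assumes that every MIME part carrying a filename is an attachment. The output naming scheme (`msg<key>_<part index>_<filename>`) and both function names are hypothetical. PST files would first have to be converted to MBOX, as in the workflow described above.

```python
import mailbox
from email.message import Message
from pathlib import Path
from typing import List


def save_attachments(msg: Message, out_dir: Path, prefix: str) -> List[Path]:
    """Write every non-multipart MIME part that carries a filename to out_dir."""
    saved: List[Path] = []
    for index, part in enumerate(msg.walk()):
        filename = part.get_filename()
        if not filename or part.is_multipart():
            continue
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if payload is None:
            continue
        # Keep only the last path component so names like "../x" stay in out_dir;
        # the part index keeps two attachments with the same name apart.
        target = out_dir / f"{prefix}_{index}_{Path(filename).name}"
        target.write_bytes(payload)
        saved.append(target)
    return saved


def extract_mbox_attachments(mbox_path: str, out_dir: str) -> List[Path]:
    """Extract the attachments of all messages in an MBOX file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    box = mailbox.mbox(mbox_path, create=False)  # raises if the file is missing
    saved: List[Path] = []
    try:
        for key, msg in box.iteritems():
            saved.extend(save_attachments(msg, out, prefix=f"msg{key}"))
    finally:
        box.close()
    return saved
```

For an individual EML file, the same `save_attachments` helper can be applied to the result of `email.message_from_binary_file()` on the opened file.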

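Spreadsheets

Properties to be retained:

  • Content: text, numbers
  • Layout: fonts, colours, etc.
  • Structure: structural information such as cell locations (row, column) and nested worksheets is preserved.
  • Behaviour: formulas, macros. Links to external sources cannot be guaranteed.

| Original format | Original format is preservation format? | Preservation copy | Intermediate access copy | Archivematica normalisation for preservation | Archivematica normalisation for access |
|---|---|---|---|---|---|
| XLS | No | N/A | N/A | No | No |
| XLSX | No | N/A | N/A | No | No |
| ODS | Yes | N/A | N/A | No | No |
| Other | N/A | N/A (no normalisation tool available) | N/A (no normalisation tool available) | N/A | N/A |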

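Presentation files

Properties to be retained:

  • Content: text, images, video (only when part of the file)
  • Layout: fonts, colours, etc.
  • Structure: slide order
  • Behaviour:

| Original format | Original format is preservation format? | Preservation copy | Intermediate access copy | Archivematica normalisation for preservation | Archivematica normalisation for access |
|---|---|---|---|---|---|
| PPT | No | N/A | N/A | No | No |
| PPTX | Yes | N/A | N/A | No | No |
| ODP | Yes | N/A | N/A | No | No |
| Other | N/A | N/A | N/A | N/A | N/A |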

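Audio files

Properties to be retained:

  • Audio channels (mono/stereo)
  • Bit depth
  • Sample rate

| Original format | Original format is preservation format? | Preservation copy | Intermediate access copy | Archivematica normalisation for preservation | Archivematica normalisation for access |
|---|---|---|---|---|---|
| WAV | Yes | N/A | MP3 | No | Yes |
| AIFF | Yes | N/A | MP3 | No | Yes |
| MP3 | Yes | N/A | MP3 | No | No |
| FLAC | Yes | N/A | MP3 | No | Yes |
| M4A, AAC | Yes | N/A | MP3 | No | Yes |
| Other | N/A | N/A | MP3 | Yes | Yes |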
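For uncompressed PCM WAV files, Python's standard `wave` module can read the three properties listed above. This could serve to record them at ingest or to check that a copy still matches its original. The sketch assumes PCM WAV input; `wave` does not read AIFF, FLAC, MP3 or AAC, which would need a tool such as MediaInfo or ffprobe. The function name is hypothetical.

```python
import wave
from typing import Dict


def wav_properties(path: str) -> Dict[str, int]:
    """Read channels, bit depth and sample rate from a PCM WAV file."""
    with wave.open(path, "rb") as w:
        return {
            "channels": w.getnchannels(),       # 1 = mono, 2 = stereo
            "bit_depth": w.getsampwidth() * 8,  # getsampwidth() returns bytes per sample
            "sample_rate": w.getframerate(),    # frames per second, in Hz
        }
```

Comparing `wav_properties(original) == wav_properties(copy)` checks that a copy kept all three properties.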

Video files

Properties to be retained:

Audio:

  • Audio channels (mono/stereo)
  • Bit depth
  • Sample rate

Video:

  • Gray or colour values
  • Sample rate
  • Frame rate
  • Frame size
  • Frame type
  • Aspect ratio
  • Bit depth

| Original format | Original format is preservation format? | Preservation copy | Intermediate access copy | Archivematica normalisation for preservation | Archivematica normalisation for access |
|---|---|---|---|---|---|
| MKV container file | Yes | MKV container with lossless FFV1 video and LPCM audio | MP4 container with an H.264 video stream and an AAC audio stream | N/A | Yes |
| Generic MXF container file | No | MKV container with lossless FFV1 video and LPCM audio (previously: MXF container with lossless JPEG 2000 video and LPCM audio) | MP4 container with an H.264 video stream and an AAC audio stream | N/A | Yes |
| AVI | No | MKV container with lossless FFV1 video and LPCM audio | MP4 container with an H.264 video stream and an AAC audio stream | Yes | Yes |
| MOV | No | MKV container with lossless FFV1 video and LPCM audio | MP4 container with an H.264 video stream and an AAC audio stream | Yes | Yes |
| MPEG-2 | No | MKV container with lossless FFV1 video and LPCM audio | MP4 container with an H.264 video stream and an AAC audio stream | No | Yes |
| MPEG-4 | No | MKV container with lossless FFV1 video and LPCM audio | MP4 container with an H.264 video stream and an AAC audio stream | No | Yes |
| Other | N/A | MKV container with lossless FFV1 video and LPCM audio | MP4 container with an H.264 video stream and an AAC audio stream | Yes | Yes |
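The preservation and access targets in the video table can be expressed as ffmpeg invocations. The sketch below only builds the argument lists and makes several assumptions that are not the settings of Archivematica's own normalisation rules:

  • FFV1 version 3 with one keyframe per frame and per-slice CRCs
  • the LPCM sample formats offered
  • `yuv420p` and `+faststart` for the access copy

The LPCM sample format is a parameter because audio bit depth is one of the properties to be retained, so it should match the source. The access command also assumes an ffmpeg build that includes libx264.

```python
from typing import List

# Signed little-endian LPCM codecs in ffmpeg, keyed by bit depth.
PCM_CODECS = {16: "pcm_s16le", 24: "pcm_s24le", 32: "pcm_s32le"}


def preservation_command(src: str, dst_mkv: str, audio_bit_depth: int = 16) -> List[str]:
    """ffmpeg arguments for an MKV preservation copy: FFV1 video, LPCM audio."""
    return [
        "ffmpeg", "-i", src,
        "-map", "0:v", "-map", "0:a?",         # all video streams, audio streams if any
        "-c:v", "ffv1", "-level", "3",         # lossless FFV1, version 3
        "-g", "1",                             # every frame is a keyframe
        "-slicecrc", "1",                      # CRC per slice, for fixity checking
        "-c:a", PCM_CODECS[audio_bit_depth],   # keep the source audio bit depth
        dst_mkv,
    ]


def access_command(src: str, dst_mp4: str) -> List[str]:
    """ffmpeg arguments for an MP4 access copy: H.264 video, AAC audio."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264", "-pix_fmt", "yuv420p",  # widely playable H.264
        "-c:a", "aac",
        "-movflags", "+faststart",                 # index at file start, for streaming
        dst_mp4,
    ]
```

A command can then be run with `subprocess.run(preservation_command("in.avi", "out.mkv"), check=True)`.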

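Web archives

| Original format | Original format is preservation format? | Preservation copy | Intermediate access copy | Archivematica normalisation for preservation | Archivematica normalisation for access |
|---|---|---|---|---|---|
| WARC | Yes | N/A | N/A | N/A | N/A |

The WARC file is the end product of a website harvesting process, in most cases produced with the Heritrix crawler.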

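Packed files (ZIP, RAR)

| Original format | Original format is preservation format? | Preservation copy | Intermediate access copy | Archivematica normalisation for preservation | Archivematica normalisation for access |
|---|---|---|---|---|---|
| ZIP | No | The ZIP file is unpacked, the unpacked files are (pre-)ingested and the original ZIP file is deleted. | N/A | N/A | N/A |
| RAR | No | The RAR file is unpacked, the unpacked files are (pre-)ingested and the original RAR file is deleted. | N/A | N/A | N/A |
| Other | No | The original package file is unpacked (if possible), the unpacked files are (pre-)ingested and the original package file is deleted. | N/A | N/A | N/A |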
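The unpack-and-delete step for ZIP files could look like the sketch below, which uses Python's standard `zipfile` module. It checks member CRCs before extracting and deletes the original only after every member has been extracted. The function name and return value are illustrative. RAR files are not supported by the Python standard library and would need an external tool.

```python
import os
import zipfile
from pathlib import Path
from typing import List


def unpack_and_delete(zip_path: str, dest_dir: str) -> List[Path]:
    """Unpack a ZIP file for (pre-)ingest, then delete the original ZIP file."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted: List[Path] = []
    with zipfile.ZipFile(zip_path) as zf:
        corrupt = zf.testzip()  # name of the first member with a bad CRC, or None
        if corrupt is not None:
            raise zipfile.BadZipFile(f"CRC error in member {corrupt!r}; ZIP kept")
        for info in zf.infolist():
            # extract() strips absolute paths and ".." components from member names
            path = Path(zf.extract(info, dest))
            if not info.is_dir():
                extracted.append(path)  # report files only; empty dirs are still created
    os.remove(zip_path)  # only reached when every member was extracted
    return extracted
```

The returned list of files could be written to the ingest log before the files are passed on for (pre-)ingest.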

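Databases (more research needed)

| Original format | Original format is preservation format? | Preservation copy | Intermediate access copy | Archivematica normalisation for preservation | Archivematica normalisation for access |
|---|---|---|---|---|---|
| SIARD | Yes | N/A | N/A | No | No |
| CSV | Yes | N/A | N/A | No | No |
| Microsoft Access database MDB (several versions; versions before 2000 possibly problematic?) | No | N/A | N/A | No | No |
| Microsoft Access database ACCDB | No | N/A | N/A | No | No |
| Other | N/A | No normalisation | No normalisation | N/A | N/A |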
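Geographical information (GIS) (more research needed)

| Original format | Original format is preservation format? | Preservation copy | Intermediate access copy | Archivematica normalisation for preservation | Archivematica normalisation for access |
|---|---|---|---|---|---|
| GeoTIFF | Yes | N/A | N/A | No | No |
| ESRI Shapefiles (.shp and associated files), GML? | | | | | |
| GeoJSON, TopoJSON? | | | | | |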

Unknown file formats

Unknown file formats are stored as such. Because the format is unknown, no access format can be made. Archivematica normalisation for preservation and for access: N/A.

...