Preferred data formats

The choice of data format matters because it determines whether the data will remain readable in the future. Some formats significantly improve the long-term usability of data compared to others.

Properties of preferred data formats:

  • Non-commercial: freely available and usable without buying specific software or licences. This supports wider access and long-term preservation of data, regardless of changes in the activities of commercial enterprises.
  • Open, with documented international standards: based on publicly available, standardised technical specifications that allow different systems and tools to process the data without restriction.
  • Standard character encoding: ensures correct display of text across languages and platforms and avoids encoding incompatibilities. Unicode UTF-8 is a widely used character encoding that allows uniform representation and exchange of text across languages.
  • Uncompressed: avoids possible data corruption and dependency on specific compression methods. Uncompressed data is also easier to process and preserve in the long term, as no additional software is needed to open or restore it.
These features help ensure that data is easily accessible, securely stored and widely usable in the future.
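
As an illustration of the character encoding property, the following minimal Python sketch checks whether a file's bytes are valid UTF-8 and re-encodes a file from a known legacy encoding. The function names `is_utf8` and `convert_to_utf8` are hypothetical, and the sketch assumes the source encoding is already known; it does not attempt to detect it.

```python
from pathlib import Path


def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_to_utf8(src: Path, dst: Path, source_encoding: str) -> None:
    """Re-encode a text file from a known legacy encoding to UTF-8.

    Assumption: the caller knows the source encoding (e.g. "latin-1").
    A wrong guess can silently produce incorrect characters.
    """
    raw = src.read_bytes()
    text = raw.decode(source_encoding)   # interpret the legacy bytes
    dst.write_bytes(text.encode("utf-8"))  # write them back as UTF-8
```

A call such as `convert_to_utf8(Path("notes_legacy.txt"), Path("notes.txt"), "latin-1")` (file names chosen for illustration) writes a UTF-8 copy and leaves the original untouched, so the result can be checked before the legacy file is discarded.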
File type | Preferred formats | Acceptable formats | Non-preferred formats
Text documents
  • PDF/A (.pdf)
  • ODT (.odt)
  • Markdown (.md)
  • XML (.xml)
  • Unicode Text (.txt)
  • RTF (.rtf)
  • PDF (.pdf)
  • Office Open XML (.docx)
  • Microsoft Word (.doc)
  • Google Docs (.gdoc)
Plain text
  • Unicode text (.txt)
  • Non-Unicode text (.txt)
  • HTML (.html)
Presentations
  • PDF/A (.pdf)
  • Office Open XML Presentation (.pptx)
  • OpenDocument Presentation (.odp/.sxi)
  • Microsoft PowerPoint (.ppt)
  • Google Slides
Data tables
  • CSV (.csv)
  • TSV (.tsv)
  • Delimited text (.txt)
  • TAB (.tab)
  • ODS (.ods)
  • Office Open XML (.docx)
  • XML Workbook (.xlsx)
  • SPSS (.sav)
  • R (.RData, .rds)
  • Microsoft Excel (.xls)
  • PDF (.pdf/.pdfa)
  • SPSS Portable (.por)
  • SAS (.sas7bcat)
  • Stata (.dta)
  • Matlab (.mat)
Databases
  • SQL (.sql)
  • SIARD (.siard)
  • SQLite (.sqlite)
  • dBase (.dbf)
  • CSV (.csv)
  • Microsoft Access (.mdb, .accdb)
  • HDF5 (.hdf5, .h5)
Statistical analysis data
  • Delimited Text (.txt)
  • CSV (.csv)
  • R (.R, .RData, .rds)
  • JSON (.json)
  • SPSS (.sav, .sps)
  • Stata (.dta)
  • SAS (.sd2, .sas7bdat, .tpt)
  • JASP (.jasp)
Audio
  • FLAC (.flac)
  • BWF (.bwf)
  • MXF (.mxf)
  • WAVE (.wav)
  • Matroska Audio (.mka)
  • AIFF (.aif)
  • MPEG-4 Audio (.mp4, .m4a)
  • Ogg Vorbis (.ogg)
  • MP3 (.mp3)
  • AAC (.aac, .m4a)
  • Monkey’s Audio (.ape)
  • WMA (.wma)
Video
  • Matroska Video (.mkv)
  • MXF (.mxf)
  • AVI (.avi)
  • MPEG-4 (.mp4, .m4v)
  • QuickTime (.mov, .qt)
  • MPEG-2 (.mpeg, .mpg)
  • WebM (.webm)
  • WMV (.wmv)
Images
  • TIFF (.tif, .tiff)
  • PNG (.png)
  • SVG (.svg)
  • DICOM (.dcm)
  • JPEG (.jpg, .jpeg)
  • GIF (.gif)
  • BMP (.bmp)
  • JPEG 2000 (.jp2)
  • Adobe Photoshop (.psd)
  • Apple Picture (.pct)
  • RAW formats (e.g. .cr2, .nef, .arw, .raw)
Vector files
  • SVG (.svg)
  • PDF/A (.pdf)
  • EPS (.eps)
  • Adobe Illustrator (.ai)
  • WMF/EMF (.wmf, .emf)
  • CorelDRAW (.cdr)
Geographical information systems (GIS)
  • GeoPackage (.gpkg)
  • GML (.gml)
  • GeoTIFF (.tif, .tiff)
  • Esri Shapefile (.shp + files)
  • GeoJSON (.json)
  • MIF/MID (.mif/.mid)
  • WKT (.wkt)
  • MapInfo (.tab)
  • KML (.kml, .kmz)
  • Esri Geodatabase (.gdb)
  • MXD/WOR/QGS (.mxd, .wor, .qgs)
  • Esri Interchange Format (.e00)
  • CAD (.dwg)
Archives
  • ZIP (.zip)
  • TAR (.tar)
  • gzip (.gz)
  • 7z (.7z)
  • CPIO (.cpio)
  • BZIP2 (.bz2)
  • RAR (.rar)
Qualitative data analysis
  • XML (.xml)
  • Unicode text (.txt)
  • CSV (.csv)
  • RTF (.rtf)
  • JSON (.json)
  • Atlas.ti (.atlproj)
  • NVivo (.nvp, .nvpx)
