Humboldt-Universität zu Berlin - Research data management

Documentation and metadata

Here you can find information about data documentation as well as suitable metadata schemata to facilitate the findability of your research data.

To facilitate re-use, and thus citation, of your research data, the data should be well described and documented. Standardized metadata schemas can be used for this purpose. Additional documentation in a separate file can also be useful. Documentation goes beyond the description provided by metadata and is much more detailed (e.g. description of the project, variables, measuring instruments). Metadata, in contrast, are a specific subset of the documentation and primarily serve the findability of data (e.g. creator, time period, geographic location). Authority files and controlled vocabularies should also be used for data description.

Documentation can be created in the form of a readme file (see template and example). Additional documentation files, such as a codebook or an electronic lab notebook (ELN), can be named and described in the readme.

An overview of discipline-specific and interdisciplinary metadata standards can be found on the websites of the Digital Curation Centre and the Research Data Alliance.
Guidance: subject librarians of the University Library


Examples of discipline-specific metadata standards:

Biology and Biomedicine: Minimum Information for Biological and Biomedical Investigations (MIBBI)

Computer science: CodeMeta

Humanities: Text Encoding Initiative (TEI)

Geosciences: ISO 19115, Darwin Core

Musicology: Music Encoding Initiative (MEI)

Sciences: ICAT Schema, Crystallographic Information Framework

Social Sciences and Economics: Data Documentation Initiative (DDI)


Interdisciplinary metadata standards: DataCite, Dublin Core, MARC21


The following information can be important for describing your data in both metadata schemas and documentation:


Title: Name of the dataset or research project that produced it

Creator/Primary researcher: Names and addresses of the organization and/or people who created the data (see also Authority files)

Contributor: People who also contributed to the data creation (e.g. data curator, funder; see also Authority files)

Identifier: Number used to identify the data, even if it is just an internal project reference number

Dates: Important dates or periods of time that are associated with the data (e.g. project start and end, observation period, publication date)

Subject: Keywords or phrases describing the subject or content of the data (see also Controlled vocabularies)

Location: If the data relates to a physical location, record information about its spatial coverage (e.g. geographical coordinates)

Rights: Any known intellectual property rights held for the data (see also Select licence)

File names: List of all digital files (with names and file extensions; see also Structure files)

Formats: Format of the files, e.g. CSV, HTML, JPEG (see also Choose file format)

Methodology: Description of the method of data collection and processing (methodology, experimental protocol, equipment, software, lab notebook)

Language: Language(s) of the intellectual content of the resource, if applicable

Sources: Citations of source material, if data from another source have been used (see also Persistent identification)

Relations: References to other resources (data, literature) that are associated with the data (see also Persistent identification)
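
Several of the fields listed above correspond closely to elements of the Dublin Core element set. The following is a minimal sketch in Python, not an official mapping: the field keys, the choice of "coverage" for Location and "description" for Methodology, the wrapper element name "metadata", and all sample values are assumptions made for illustration. File names have no simple Dublin Core counterpart and are skipped here.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Assumed mapping from the fields listed above to Dublin Core element names.
FIELD_TO_DC = {
    "title": "title",
    "creator": "creator",
    "contributor": "contributor",
    "identifier": "identifier",
    "dates": "date",
    "subject": "subject",
    "location": "coverage",        # assumption: spatial coverage
    "rights": "rights",
    "formats": "format",
    "methodology": "description",  # assumption: no dedicated element
    "language": "language",
    "sources": "source",
    "relations": "relation",
}


def to_dublin_core(record):
    """Build an XML tree with one Dublin Core element per field value."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for field, value in record.items():
        element_name = FIELD_TO_DC.get(field)
        if element_name is None:
            continue  # fields without a mapping (e.g. file names) are skipped
        values = value if isinstance(value, list) else [value]
        for item in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element_name}")
            child.text = str(item)
    return root


# Hypothetical sample record; none of these values come from a real dataset.
example_record = {
    "title": "Example soil moisture survey",
    "creator": "Jane Doe, Example Institute",
    "identifier": "project-internal-0001",
    "dates": "2020-01-01/2020-12-31",
    "subject": ["soil moisture", "field measurement"],
    "location": "52.52 N, 13.40 E",
    "formats": "CSV",
    "language": "en",
    "file_names": ["plots.csv", "sensors.csv"],
}

xml_text = ET.tostring(to_dublin_core(example_record), encoding="unicode")
print(xml_text)
```

Repeatable fields such as Subject are passed as lists, so each keyword becomes its own element. A repository or discipline-specific schema may require different or additional elements.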


If many entries are identical across several datasets, these datasets can be described together in a single documentation file (e.g. a study or experiment with several files).
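
One way to combine several files in a single documentation file is to state the shared entries once and then list each file with a short description. The following is a minimal sketch in Python, assuming a plain-text readme layout; the section headings, field values, and file names are hypothetical and are not taken from the template mentioned above.

```python
def build_readme(shared_fields, file_descriptions):
    """Return readme text: shared fields once, then one line per file."""
    lines = ["GENERAL INFORMATION"]
    for name, value in shared_fields.items():
        lines.append(f"{name}: {value}")
    lines.append("")
    lines.append("FILES")
    for file_name, description in file_descriptions.items():
        lines.append(f"{file_name}: {description}")
    return "\n".join(lines) + "\n"


# Hypothetical entries shared by every file of one experiment.
shared_fields = {
    "Title": "Example growth experiment",
    "Creator": "Jane Doe, Example Institute",
    "Rights": "CC BY 4.0",
    "Formats": "CSV",
}

# Hypothetical per-file entries that differ between files.
file_descriptions = {
    "run_01.csv": "Measurements of the first run",
    "run_02.csv": "Measurements of the second run",
}

readme_text = build_readme(shared_fields, file_descriptions)
print(readme_text)
```

The resulting text could be saved as a readme file next to the data files, so that the shared description is maintained in one place only.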