Introduction to Cultural Heritage Data

Vicky Dritsou

Introduction

This course is designed to equip participants with the basic knowledge and essential skills needed to navigate the fascinating world of Cultural Heritage data. Guided by the expertise and academic journey of Prof. Lorena, a persona created specifically for this course, it explores the importance of Cultural Heritage data and presents its diverse types and formats. Prof. Lorena’s experiences provide practical insights into the processes of acquiring, organising, and enhancing the quality of Cultural Heritage data.

Throughout the course, participants gain a comprehensive understanding of how to identify and access various sources of Cultural Heritage data, and learn methods and techniques that can be applied to improve its quality. They are also introduced to the fundamental strategies for organising Cultural Heritage data, with an emphasis on cataloguing and the key metadata standards prevalent in the field. Finally, the course discusses the latest trends and technologies shaping the domain of Cultural Heritage data today, giving participants an overview of the current landscape.

Learning outcomes

Upon completion of this course, participants will:

understand the significance of Cultural Heritage data and the challenges of its access and preservation;
be familiar with the different types of Cultural Heritage data, their common formats, and the basic structures in which they can be organised;
be able to identify key sources from which Cultural Heritage data can be acquired, and the digital methods and techniques that enhance its quality;
be familiar with the current trends and technologies in the Cultural Heritage domain.

Introducing Prof. Lorena and her student Costis

Prof. Lorena is Professor of Social History in the Department of History at a university. Her expertise lies in the analysis of migration patterns, and her research focuses in particular on the treatment of refugees in Europe. For the past seven years, she has been teaching courses such as “Migration and Diaspora in History”. She is currently investigating the impact of historical conflicts on migration, including both the identification of migration patterns and the reception of refugees in European countries.

In her research, Prof. Lorena collects diverse types of data, ranging from documents and photographs to newspaper articles and other historical materials. Motivated by the ongoing refugee crisis in southern Europe, she is currently exploring conflicts such as the Hungarian Revolution and the Greek/Turkish conflict. She is also supervising a thesis titled “Influence of the Hungarian Revolution on the Migration Pattern”, recently assigned to Costis, a graduate student in History. Costis is new to digital tools and methods. Prof. Lorena, for her part, has gained experience with digital methods through her extensive research in the domain of cultural heritage, but she does not consider herself an expert user.

Cultural Heritage Data: Overview and Categories

Understanding history, culture and traditions entails the analysis, management and interpretation of Cultural Heritage (CH) data. In her courses, Prof. Lorena stresses to her students that one of the most important characteristics of Cultural Heritage data is their diversity: they often come from heterogeneous sources, have different characteristics, and follow different formats and data structures. Whatever their type or format, she wants her students to be able to analyse them comprehensively. This skill is essential for gaining a deep understanding of the data and exploring them further in their studies, with a specific focus on the social evolution of migration patterns over time.

Cultural Heritage data types

Cultural Heritage data include both tangible (such as monuments, pottery, paintings, books, archives) and intangible (such as customs, traditions, folklore) aspects of cultural heritage. These data can exist in analogue form, be digitised, or be born digital. The basic Cultural Heritage data types, distinguished by format, are listed below, and Table 1 gives widely used file formats for each type. In line with the FAIR data principles, which favour formats that maximise reusability, open formats are highlighted in bold in the table.

Textual data: includes text contained in documents, books, manuscripts, inscriptions, etc.
Visual data: comprises photographs, paintings, illustrations, drawings.
Audio data: includes music, recordings, oral history, narratives, sound archives, etc.
Audiovisual data: includes films, documentaries, etc.
3D data: consists of 3D models used to produce visual representations of artefacts, monuments, buildings, architectural objects.
Geospatial data: involves maps and coordinate data that provide spatial information regarding places or events of interest.

Data Type	Common File Formats
Text	TXT, CSV, DOC, DOCX, PDF
Image	JPEG, PNG, TIFF, PDF, GIF, BMP
Audio	MP3, WAV, AAC
Video	MP4, AVI, MOV, FLV, WMV
3D	STL, OBJ, FBX, glTF
Geospatial	GML, KML, GeoJSON, Shapefile

Table 1: Common file formats of data types (open file formats are highlighted in bold)
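A collection assembled from many sources usually mixes these types, so a first organising step is often to sort files by data type. The following is a minimal sketch in Python that turns Table 1 into a lookup. The extensions `jpg`, `tif`, `glb` and `shp` are assumptions (common file extensions for formats the table names, not entries in the table), and the function name is hypothetical. Because PDF appears under both Text and Image, the lookup returns every candidate type rather than guessing.

```python
from collections import defaultdict
from pathlib import Path

# File extensions for the formats in Table 1. "jpg", "tif", "glb" and "shp"
# are assumed extra extensions for formats the table names.
FORMATS_BY_TYPE = {
    "Text": ["txt", "csv", "doc", "docx", "pdf"],
    "Image": ["jpeg", "jpg", "png", "tiff", "tif", "pdf", "gif", "bmp"],
    "Audio": ["mp3", "wav", "aac"],
    "Video": ["mp4", "avi", "mov", "flv", "wmv"],
    "3D": ["stl", "obj", "fbx", "gltf", "glb"],
    "Geospatial": ["gml", "kml", "geojson", "shp"],
}

# Invert the table: extension -> every data type that uses it.
EXTENSION_INDEX = defaultdict(list)
for data_type, extensions in FORMATS_BY_TYPE.items():
    for ext in extensions:
        EXTENSION_INDEX[ext].append(data_type)


def candidate_types(filename: str) -> list[str]:
    """Return the possible data types of a file, judged by its extension alone."""
    ext = Path(filename).suffix.lower().lstrip(".")
    return list(EXTENSION_INDEX.get(ext, ["Unknown"]))


for name in ["interview_1957.WAV", "border_crossing.tiff", "report.pdf", "notes.xyz"]:
    print(name, candidate_types(name))
# interview_1957.WAV ['Audio']
# border_crossing.tiff ['Image']
# report.pdf ['Text', 'Image']
# notes.xyz ['Unknown']
```

An extension is only a hint: a scanned book saved as PDF is really image data, which is why ambiguous files are best checked by hand.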

Cultural Heritage Data: Acquisition and Collection

To retrieve a wide variety of data on the topic he is working on, Costis seeks Prof. Lorena’s advice on locating reliable and trusted sources for data retrieval and acquisition. She introduces him to the world of cultural heritage institutions, such as libraries, museums, and archives. Especially in the post-COVID era, such institutions offer a wealth of collections, many of which are published online, making them accessible for acquiring and collecting information.

Costis begins by navigating through several online portals of cultural heritage institutions in search of data relating the Hungarian Revolution to migration. He first visits the website of the National Széchényi Library, Hungary’s national library, expecting to retrieve numerous resources. However, after querying the online catalogue, he discovers that many items are available only on-site and not online. Since he cannot plan a visit to this library for the time being, he turns to his university’s library, where he has access to its digitised and born-digital material. He performs desk research, exploring the catalogue, locating books and research papers related to the Hungarian Revolution and migration patterns, and compiling a list of relevant material while saving copies to his local drive. Through the university’s academic network, he also accesses scientific databases and electronic journals, retrieving additional relevant resources.

Continuing his data acquisition task, Costis turns to popular online databases that are searchable and provide access to Cultural Heritage data. He queries the Europeana portal, the Digital Public Library of America (DPLA), and the British Museum Collection Online, retrieving interesting results. While browsing the Europeana website, he discovers the Migration Theme, which presents data about thousands of migrants, both famous and unknown, partly in the form of ‘stories’. He explores the data linked to the period of the Hungarian Revolution, seeking information about refugee movements within Europe and the treatment of refugees, and decides to keep track of the relevant stories.

Realising that he has assembled a long list of items to study further, Costis faces the challenge of keeping track of and organising this material efficiently. How can this be achieved? Costis recalls Prof. Lorena’s lecture on digital methods and techniques for organising Cultural Heritage data.

Cultural Heritage Data: Organisation

Cultural Heritage data organisation aims to store data items related to cultural assets and practices systematically, and to access and retrieve them efficiently. Several digital methods and techniques have been developed to achieve these goals. The main approaches are outlined below.

Cataloguing

Cultural Heritage data cataloguing is crucial for systematically documenting the items in a cultural collection and making their subsequent identification possible. This applies not only to institutions managing large collections but also to individuals managing and curating their own datasets for research purposes. In the digital world, cataloguing involves creating a detailed digital record for each item in the collection. Each record consists of a consistent set of fields that describe the item efficiently. These fields are metadata, that is, data about the data. Common metadata elements include the title, the creator, important dates in the item’s lifecycle, and its location (either physical or in digital storage).

Organisation

Developing catalogues with the necessary metadata is essential for systematically documenting Cultural Heritage data. This process is best implemented by adhering to metadata standards, which ensure consistency and enable interoperability across different systems, platforms and organisations. A common metadata schema allows information about resources to be exchanged unambiguously, because it relies on the metadata definitions that the standard provides. These standards are widely used in research settings, so information organised according to them is easier for humans to understand and process, and can also be interpreted by machines. Becoming familiar with the standard used for a catalogue enhances discoverability and allows more informed use of the data. In her lectures, Prof. Lorena introduces five key metadata standards used by cultural heritage institutions.

Dublin Core:

A simple yet versatile standard that provides a set of fifteen basic elements for describing a wide range of resources. The set includes the elements: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, and Rights. Dublin Core is widely used due to its simplicity and flexibility, making it suitable for a variety of digital and physical objects across different types of collections.
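To make the element set concrete, the following minimal sketch in Python builds one Dublin Core record as XML with the standard library's `xml.etree.ElementTree`. The namespace `http://purl.org/dc/elements/1.1/` is the one defined for the Dublin Core elements; the `record` wrapper element, the function name and the catalogued item are hypothetical. `StillImage` is a value from the DCMI Type Vocabulary. The sketch enforces the standard's fifteen element names but treats every element as optional and repeatable, as Dublin Core does.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen Dublin Core elements, in the order listed in the text.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dublin_core_record(fields: dict[str, list[str]]) -> ET.Element:
    """Build one catalogue record; every element is optional and repeatable."""
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    record = ET.Element("record")  # hypothetical wrapper, not part of Dublin Core
    for element in DC_ELEMENTS:
        for value in fields.get(element, []):
            ET.SubElement(record, f"{{{DC_NS}}}{element}").text = value
    return record


# A hypothetical item from a migration-history collection.
record = dublin_core_record({
    "title": ["Refugees crossing the Austro-Hungarian border"],
    "date": ["1956"],
    "type": ["StillImage"],
    "format": ["image/tiff"],
    "coverage": ["Hungary", "Austria"],
    "identifier": ["local-0001"],
})
print(ET.tostring(record, encoding="unicode"))
```

Rejecting unknown field names (such as `author` instead of `creator`) is a small design choice that keeps a personal catalogue aligned with the standard from the start.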
Europeana Data Model (EDM):

The Europeana Data Model is a data model developed specifically for aggregating and integrating Cultural Heritage metadata from diverse European cultural institutions. EDM provides a rich, flexible framework that supports the integration of different metadata elements into one consistent model. It enables the description of complex digital and physical resources and their relationships, together with contextual information. By using EDM, Europeana ensures that the data aggregated from various cultural heritage institutions are interoperable, enhancing discoverability and usability across the Europeana platform. (More details on EDM are provided in the course “Introduction to Cultural Heritage Data Modelling — with a focus on EDM”.)
CIDOC Conceptual Reference Model (CIDOC-CRM):

CIDOC-CRM is an ontology, and an ISO standard, for cultural heritage information, whose purpose is to facilitate the integration, mediation, and interchange of heterogeneous cultural heritage information. It provides well-defined semantics and a formal structure for describing the implicit and explicit concepts and relationships used in the domain. Because it follows an event-centric approach, CIDOC-CRM is particularly valuable for representing complex relationships between entities such as people, events, places, and objects. For Prof. Lorena, CIDOC-CRM can serve as a metadata standard that helps create a comprehensive and semantically rich description of her data and enables more complex queries.
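What "event-centric" means can be shown with a minimal sketch in plain Python rather than a real RDF store. The photograph is not linked directly to its photographer, place and date; all three hang off an `E12_Production` event, which is what makes queries such as "where and when were the objects made by this person produced?" possible. Class and property names follow CIDOC-CRM's naming, while `type` and `label` stand in for `rdf:type` and `rdfs:label`; the node identifiers, labels and function names are hypothetical. A real project would store such statements as RDF and query them with SPARQL.

```python
# Each statement is a (subject, property, object) triple. Class and property
# names follow CIDOC-CRM's naming; the nodes and labels are hypothetical.
TRIPLES = [
    ("photo1", "type", "E22_Human-Made_Object"),
    ("photo1", "label", "Refugees crossing the border"),
    ("prod1", "type", "E12_Production"),
    ("prod1", "P108_has_produced", "photo1"),
    ("prod1", "P14_carried_out_by", "person1"),
    ("prod1", "P7_took_place_at", "place1"),
    ("prod1", "P4_has_time-span", "span1"),
    ("person1", "type", "E21_Person"),
    ("person1", "label", "Unidentified press photographer"),
    ("place1", "type", "E53_Place"),
    ("place1", "label", "Austro-Hungarian border"),
    ("span1", "type", "E52_Time-Span"),
    ("span1", "label", "November 1956"),
]


def objects(subject, prop):
    """All objects linked to `subject` through `prop`."""
    return [o for s, p, o in TRIPLES if s == subject and p == prop]


def label(node):
    """First label of a node, or the node identifier if it has none."""
    return next(iter(objects(node, "label")), node)


def productions_by(actor):
    """Follow actor -> production event -> produced object, place and time-span."""
    results = []
    for event, prop, target in TRIPLES:
        if prop == "P14_carried_out_by" and target == actor:
            for obj in objects(event, "P108_has_produced"):
                places = [label(p) for p in objects(event, "P7_took_place_at")]
                spans = [label(t) for t in objects(event, "P4_has_time-span")]
                results.append((label(obj), places, spans))
    return results


print(productions_by("person1"))
# [('Refugees crossing the border', ['Austro-Hungarian border'], ['November 1956'])]
```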
EAD (Encoded Archival Description):

EAD is a standard for encoding archival finding aids using XML. It is used by archives to describe the content and structure of archival collections, facilitating discovery and access. EAD finding aids provide detailed information about the hierarchy and organisation of archival materials, helping researchers understand the context and provenance of the collections. It supports the creation of detailed, searchable, and interoperable descriptions of archival resources.
MARC (Machine-Readable Cataloging):

A standard for representing and exchanging information related to bibliographic resources. MARC is extensively used in libraries for cataloguing books, journals, and other materials. It provides detailed bibliographic information, facilitating the exchange of bibliographic data between libraries and other institutions. MARC records include fields for authors, titles, subjects, publication information, and more, enabling precise and thorough cataloguing.
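Because libraries describe books in MARC while many research tools expect Dublin Core, records are often converted with a crosswalk, a table mapping fields of one standard onto elements of another. The sketch below assumes a simplified subset of mappings loosely following the Library of Congress MARC to Dublin Core crosswalk (field 100 to Creator, 245 to Title, 260 subfields b and c to Publisher and Date, 650 to Subject). The record is a hypothetical book written as plain Python tuples rather than real MARC data; an actual conversion would read MARC files with a dedicated library such as pymarc and apply the full crosswalk.

```python
# (MARC tag, subfield code) -> Dublin Core element. Simplified subset only.
CROSSWALK = {
    ("100", "a"): "creator",    # main entry, personal name
    ("245", "a"): "title",      # title statement
    ("260", "b"): "publisher",  # publication: name of publisher
    ("260", "c"): "date",       # publication: date
    ("650", "a"): "subject",    # topical subject heading
}

# Punctuation conventionally placed at subfield boundaries in MARC records.
ISBD_PUNCTUATION = " /:;,."


def marc_to_dc(fields):
    """Map (tag, [(code, value), ...]) fields to a Dublin Core dict of lists."""
    dc = {}
    for tag, subfields in fields:
        for code, value in subfields:
            element = CROSSWALK.get((tag, code))
            if element:
                dc.setdefault(element, []).append(value.strip(ISBD_PUNCTUATION))
    return dc


# A hypothetical book record.
book = [
    ("100", [("a", "Doe, Jane,")]),
    ("245", [("a", "Refugees of 1956 :"), ("b", "a social history /")]),
    ("260", [("a", "Budapest :"), ("b", "Example Press,"), ("c", "1990.")]),
    ("650", [("a", "Refugees"), ("z", "Hungary.")]),
]
print(marc_to_dc(book))
# {'creator': ['Doe, Jane'], 'title': ['Refugees of 1956'], 'publisher': ['Example Press'], 'date': ['1990'], 'subject': ['Refugees']}
```

Information is lost in such a conversion (here the subtitle and the geographic subdivision), which illustrates why the richer standard is usually kept as the master record.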

After revisiting these metadata standards from a practical perspective, Costis recalls from Prof. Lorena’s lectures that efficient catalogues based on metadata standards can be deployed using various systems. Cultural Heritage metadata, with or without the accompanying data, can document collections in spreadsheets, databases (relational or not), Content Management Systems (CMS) or platforms designed for the systematic documentation of cultural heritage data. For example, the open-source platform Omeka, widely used by museums and libraries, supports Dublin Core and CIDOC-CRM and can be extended with plugins to support other metadata standards. Other systems include ArchivesSpace, an open-source system for archives that adheres to EAD; Blacklight, a discovery interface that works with MARC records; and ResearchSpace, a free, open-source semantic web platform developed by the British Museum that adheres to CIDOC-CRM.
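As an illustration of the relational database option, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names are hypothetical choices that reuse a few Dublin Core elements, and the catalogued items are invented. Repeatable elements such as Subject get their own table, one row per item and subject, which is what a relational design offers over cramming several values into one spreadsheet cell.

```python
import sqlite3


def create_catalogue(path: str = ":memory:") -> sqlite3.Connection:
    """Create a small catalogue; pass a file path to keep it on disk."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS item (
            id      INTEGER PRIMARY KEY,
            title   TEXT NOT NULL,
            creator TEXT,
            date    TEXT,  -- ISO 8601 (YYYY-MM-DD) so text order matches date order
            type    TEXT,
            source  TEXT
        );
        -- Repeatable element: one row per (item, subject) pair.
        CREATE TABLE IF NOT EXISTS item_subject (
            item_id INTEGER NOT NULL REFERENCES item(id),
            subject TEXT NOT NULL
        );
    """)
    return conn


def add_item(conn, title, creator, date, type_, source, subjects):
    """Insert one item and its subjects."""
    cur = conn.execute(
        "INSERT INTO item (title, creator, date, type, source) VALUES (?, ?, ?, ?, ?)",
        (title, creator, date, type_, source),
    )
    conn.executemany(
        "INSERT INTO item_subject (item_id, subject) VALUES (?, ?)",
        [(cur.lastrowid, s) for s in subjects],
    )
    conn.commit()


def find_by_subject(conn, subject, start, end):
    """Titles of items with `subject` dated between `start` and `end` (inclusive)."""
    rows = conn.execute(
        """SELECT i.title FROM item AS i
           JOIN item_subject AS s ON s.item_id = i.id
           WHERE s.subject = ? AND i.date BETWEEN ? AND ?
           ORDER BY i.date""",
        (subject, start, end),
    )
    return [title for (title,) in rows]


conn = create_catalogue()
add_item(conn, "Hypothetical photograph A", None, "1956-11-20", "StillImage", "Europeana", ["Refugees", "Hungary"])
add_item(conn, "Hypothetical pamphlet B", "Unknown", "1957-03-01", "Text", "University library", ["Refugees"])
add_item(conn, "Hypothetical article C", "Unknown", "1968-08-21", "Text", "DPLA", ["Migration"])
print(find_by_subject(conn, "Refugees", "1956-10-23", "1957-12-31"))
# ['Hypothetical photograph A', 'Hypothetical pamphlet B']
```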

Preservation

Digital preservation ensures continued, uninterrupted access to Cultural Heritage data, safeguarding them against technological obsolescence, physical degradation and other potential threats. Besides the specific preservation strategies proposed in the literature, systematic frameworks for implementing digital preservation also exist, such as the Open Archival Information System (OAIS) Reference Model, an ISO standard that provides guidance. Additional methods include regular backups, cloud storage solutions, the use of digital repositories, and adherence to standards that ensure data longevity and effective preservation. In academia, data stewards are typically tasked with helping researchers and students preserve their data.
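Backups only help if damaged or silently altered copies can be detected. One widely used practice, not described in this course itself, is fixity checking: record a checksum for every file and recompute it later. The sketch below assumes Python's standard `hashlib` and `pathlib`; the function names are hypothetical, and the manifest is similar in spirit to the checksum manifests used by the BagIt packaging format.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in chunks so large audio or 3D files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under root against a previously saved manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(manifest.keys() - current.keys()),
        "changed": sorted(k for k in manifest.keys() & current.keys()
                          if manifest[k] != current[k]),
        "added": sorted(current.keys() - manifest.keys()),
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "interview.wav").write_bytes(b"original audio")
    manifest = build_manifest(root)                   # record fixity at ingest
    (root / "interview.wav").write_bytes(b"bit rot")  # simulate silent corruption
    print(verify(root, manifest))
    # {'missing': [], 'changed': ['interview.wav'], 'added': []}
```

In practice the manifest would be saved alongside each backup copy and checked on a schedule, so that a corrupted copy can be replaced from an intact one.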

Digital methods for professionals and more experienced users

Recent digital methods include 3D scanning and 3D representations of monuments and artefacts, generating their digital twins. These methods play a crucial role in preserving and providing access to Cultural Heritage data. Virtual Reality (VR) experiences, whether fully immersive or not, enable virtual tours and exhibitions of ancient sites and reconstructions of ruins. Similarly, Extended Reality (XR) methods are used to build augmented reality applications with Cultural Heritage data, overlaying historical information on real-world locations, for example by bringing a landmark’s story to life during a city tour. It is important to note that these advanced methods and technologies require specialised professionals to develop them, and can only be used with their assistance.

Cultural Heritage Data: Understanding Data Quality

After building his first metadata collection from the sources mentioned earlier, Costis begins to examine the metadata values he has assembled more closely. He has chosen the Dublin Core metadata standard and manages his metadata in a simple spreadsheet. However, he notices that some entries lack substantial information in several fields. He shares his concerns with Prof. Lorena.

In her lectures, Prof. Lorena makes a point of addressing Data Quality, highlighting the key criteria for ensuring high-quality Cultural Heritage data. The main reason is that, in research settings, Data Quality determines the potential for exploiting and reusing Cultural Heritage data, and its principles also apply to research data. One of its most important factors is completeness: high-quality data should have all required metadata fields filled in. The quality of the collection improves further if the required metadata fields are based on widely used standards such as Dublin Core or EDM.
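Completeness is easy to measure once the spreadsheet is exported as CSV. The following minimal sketch in Python reports, for each required field, the share of records that fill it, and lists the records with gaps. Which fields count as required is a project decision, so the `REQUIRED` tuple, the function names and the sample rows are hypothetical. Note that a placeholder such as "Unknown" counts as filled here, so such values still deserve a manual look.

```python
import csv
import io

# Which fields are required is a project decision; these are hypothetical.
REQUIRED = ("title", "creator", "date", "type", "identifier")


def is_filled(value) -> bool:
    """True for a non-empty, non-whitespace cell."""
    return bool(value and value.strip())


def completeness(csv_text: str, required=REQUIRED):
    """Return (share of records filling each field, list of incomplete records)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}, []
    ratios = {f: sum(is_filled(r.get(f)) for r in rows) / len(rows) for f in required}
    incomplete = []
    for n, row in enumerate(rows, start=1):
        gaps = [f for f in required if not is_filled(row.get(f))]
        if gaps:
            incomplete.append((row.get("identifier") or f"record {n}", gaps))
    return ratios, incomplete


# A hypothetical export of a small Dublin Core spreadsheet.
SHEET = """title,creator,date,type,identifier
Refugees at the border,,1956,StillImage,item-001
Newspaper report on refugee camps,Unknown,,Text,item-002
Interview with a former refugee,Interviewer A,1957,Sound,item-003
"""
ratios, incomplete = completeness(SHEET)
print(ratios)
print(incomplete)  # [('item-001', ['creator']), ('item-002', ['date'])]
```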

Next, she discusses data consistency, which can be achieved through uniform terminology. Uniform terms eliminate ambiguity and make data easier to interpret. To this end, popular controlled vocabularies and ontologies can be employed to standardise terms and ensure a common understanding and interpretation. Prof. Lorena introduces the students to OpenRefine, an open-source tool that helps improve data quality. Using specific examples, she demonstrates how the tool can handle messy data: cleaning it by identifying and correcting typos and errors, standardising and transforming it, and linking it to external sources.
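OpenRefine's clustering feature groups spelling variants of the same term so that they can be standardised. The following is a minimal sketch of one of its methods, key collision with a "fingerprint" key, written from the method's published description (trim, lowercase, strip accents and punctuation, then sort the unique words). It is not OpenRefine's own code, and the function names and sample values are hypothetical. Each value is replaced by the most frequent variant in its group; as in OpenRefine, groups should be reviewed by a person before merging, because a fingerprint ignores word order and can join values that only look alike.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value: str) -> str:
    """Key that ignores case, accents, punctuation, spacing and word order."""
    value = unicodedata.normalize("NFKD", value.strip().lower())
    value = "".join(ch for ch in value if not unicodedata.combining(ch))  # é -> e
    value = re.sub(r"[^\w\s]", "", value)  # drop punctuation
    return " ".join(sorted(set(value.split())))


def standardise(values: list[str]) -> list[str]:
    """Replace each value with the most frequent variant sharing its fingerprint."""
    variants = defaultdict(Counter)
    for v in values:
        variants[fingerprint(v)][v] += 1
    # max() keeps the first-seen variant when counts tie.
    canonical = {key: max(c, key=c.get) for key, c in variants.items()}
    return [canonical[fingerprint(v)] for v in values]


subjects = [
    "Hungarian Revolution, 1956",
    "1956 Hungarian revolution",
    "Hungarian Revolution, 1956",
    "Hungarian  Revolution (1956)",
    "Greek-Turkish conflict",
]
print(standardise(subjects))
# ['Hungarian Revolution, 1956', 'Hungarian Revolution, 1956',
#  'Hungarian Revolution, 1956', 'Hungarian Revolution, 1956', 'Greek-Turkish conflict']
```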

Prof. Lorena also emphasises that high-quality data should be accurate. How can accuracy be achieved? By ensuring that the data reflect real, existing information, which can be cross-referenced with popular databases and trusted repositories or verified by domain experts. She highlights the possibilities opened up by Linked Open Data (LOD) for linking and cross-referencing data with external resources, which not only supports accuracy but also enriches the data items.
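Linking a value to an external identifier is what makes cross-referencing possible: once "Magyarország" and "Hungary" both point to the same Wikidata item, they are recognised as the same place, and the item's data can enrich the record. The following is a minimal offline sketch in Python. Q28 (Hungary) and Q40 (Austria) are real Wikidata items, but the small lookup table and the function names are hypothetical stand-ins for a real reconciliation service (OpenRefine, for instance, can reconcile a column against Wikidata). Values that match nothing are returned for checking by a domain expert rather than guessed.

```python
import unicodedata

WIKIDATA = "http://www.wikidata.org/entity/"

# A tiny stand-in for an authority file: normalised label -> Wikidata URI.
# Q28 (Hungary) and Q40 (Austria) are real items; the table is illustrative.
AUTHORITY = {
    "hungary": WIKIDATA + "Q28",
    "magyarorszag": WIKIDATA + "Q28",
    "austria": WIKIDATA + "Q40",
    "osterreich": WIKIDATA + "Q40",
}


def normalise(label: str) -> str:
    """Lowercase and strip accents so 'Österreich' matches 'osterreich'."""
    decomposed = unicodedata.normalize("NFKD", label.strip().lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def reconcile(values: list[str]) -> tuple[dict[str, str], list[str]]:
    """Link values to authority URIs; return unmatched values for expert review."""
    linked, unmatched = {}, []
    for value in values:
        uri = AUTHORITY.get(normalise(value))
        if uri:
            linked[value] = uri
        else:
            unmatched.append(value)
    return linked, unmatched


linked, unmatched = reconcile(["Hungary", "Magyarország", "Österreich", "Yugoslavia"])
print(linked)
print(unmatched)  # ['Yugoslavia']
```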

In the era of open data and open science, Prof. Lorena stresses to her students that adherence to the FAIR principles is essential for increasing data quality. Findability, Accessibility, Interoperability and Reusability are key criteria that together form a robust framework for enhancing the utility and quality of data. The FAIR principles are increasingly used in conjunction with the CARE principles, which address the ethical aspects of working with cultural heritage data. By making data easier to locate, providing standardised access methods, ensuring interoperability across different systems, and supporting long-term usability and reusability, the FAIR principles significantly improve data quality. Costis will gain insight into this as he continues his research.

Conclusion

Following in the footsteps of Prof. Lorena and Costis, this course has introduced the basic characteristics of Cultural Heritage data and the methods needed to analyse them efficiently within the research process. It has provided a comprehensive overview of the categories and types of Cultural Heritage data, emphasising their diversity and the importance of systematic documentation through adherence to standards. It has also explored digital methods and techniques and introduced widely used open-source software tools for these purposes. Finally, the quality of Cultural Heritage data emerges as particularly relevant for ensuring the preservation and sustainability of cultural heritage and the reusability of data.
