11 Feb 2016

Empirical Humanities Metadata Working Group Case Statement

NOTE - This charter statement has been replaced by the attached V2 text.  See the attached PDF for the updated version.  15 May 2017


1. WG Charter

The empirical humanities include history, folklore, cultural anthropology and other fields in which researchers collect primary data of different types that can be used for cultural analysis. Today, these researchers often need to collaborate to understand phenomena that operate across geographic regions, scales and communities of people. But established research practices and infrastructures in the empirical humanities do not support such collaboration. The Empirical Humanities Metadata WG (EHM) will conduct research, develop a statement of best practices and release an adoptable product centered on what needs to be in place (standards, protocols, policies, cultural expectations) to make ethnographic and historical data archivable, discoverable and shareable.


In a preliminary phase of this work, we identified and categorized a broad range of metadata standards and use cases within history and ethnography, and surveyed the relevant literature. We also hosted a number of conversations about the many kinds of data (such as recorded interviews, field notes and photographs) that require metadata in order to be archived and shared across different user scenarios (born-digital interviews produced by early career researchers, for example, or newly digitized data from more established researchers), and ensured complementarity between the goals of the EHM and existing RDA metadata groups. Building on this exploratory research into standards and use cases, the first phase of this WG will facilitate the implementation of new metadata fields within the Platform for Experimental and Collaborative Ethnography (PECE), resulting in a prototype of the suggested metadata fields for about a dozen artifact types common in the empirical humanities. A second phase will document and analyze some of the diverse metadata practices within the empirical humanities through a literature review, an environmental scan of projects and interviews with project leads. This phase will also include working with metadata and provenance experts within and beyond the RDA to better understand existing best practices and recommendations. In the third and final phase, we will propose best metadata practices for a variety of use cases and scenarios and facilitate the uptake of these deliverables within a number of projects (see Adoption Plan below), updating our initial metadata fields from phase one of the project. Each phase is expected to take six months.


Confirmed early adopters of this WG’s deliverables include research groups working with PECE, such as The Asthma Files and The Disaster-STS Research Network. We are also in contact with a number of other RDA members who have agreed to consider adopting our outputs. Beyond the 18-month timeframe of this WG, lessons learned from these early adopters will feed into the RDA Digital Practices in History and Ethnography (DPHE) Interest Group’s continued dissemination of these best practices. Our deliverables will make digital artifacts in ethnographic and historical research much easier to share, find, use and cite effectively, contributing to the development of a credit/reward structure that would not only reduce barriers to sharing data in the digital humanities, but also further incentivize it.


1a. Preliminary research (completed) (M0)

In preparation for this Case Statement submission, we reviewed a wide range of use cases, identifying scenarios that historians and ethnographers (within and beyond RDA) encounter when working with metadata for shared artifacts. This phase benefitted from multiple existing RDA initiatives, including the Metadata IG, the Data Fabric IG and the Repository Platforms IG. We hosted an “issues share” call-in on metadata with representatives from the Metadata IG and other RDA groups, and have received positive feedback from leadership within metadata-related RDA groups on this case statement and on the complementarity of the EHM and these existing initiatives. Preliminary work conducted within the DPHE suggests that researchers often struggle to develop appropriate metadata practices when digitizing and sharing the following data types, which are especially important to history and ethnography: field notes, interviews (audio and video), grey matter, images, analytic structure, structured annotations, surveys, maps, quantitative data, bibliographies, translations and workflows.


1b. Short-term goals (M6)

In the first phase of this WG we will expand the metadata fields within PECE for the following data types: projects, groups, fieldsites, design logics, substantive logics, fieldnotes, texts, PDF documents, images, audio, video, websites, licenses, annotations, structured annotations, events, memos and bibliographies. As we continue our research, and use the platform, throughout the remaining phases of this WG, we will pay attention to (and perhaps go beyond) the limits of our initial choices about which metadata fields to include for these data types.


1c. Mid-term goals (M12)

Our second phase will continue to document and analyze the diverse existing metadata practices of researchers in the empirical humanities. In order to scope this project appropriately for our timeline, we will focus primarily on the data types we have identified as priorities in PECE. In our analysis, we will identify best practices that could be codified and distributed.


1d. Long-term goals (M18)

In our final phase of work we will facilitate the uptake of our deliverables from the second phase (best practices) in at least two projects using the PECE platform (The Asthma Files and The Disaster-STS Research Network) and then work with other adopters. Users of PECE will be able to use a simple form-based interface to input the relevant metadata for artifacts such as images, documents, audio, video and a variety of other file types. In order to capture metadata for various analytic structures, PECE developers will establish micro-attribution vocabularies that capture the complex provenance of a particular analytic. We aim to learn lessons in this implementation phase that will help facilitate uptake in further platforms and projects beyond the 18-month timeframe, with sustainable support provided by the DPHE-IG. We will conduct extensive outreach, through RDA and professional societies, to identify, and work with, adopters of our outputs.


1e. Timeframe

Phase one, updating PECE metadata fields, will run from February 2016 through July 2016. From August 2016 through January 2017 we will collect data on the many ways researchers in history and ethnography address metadata issues. Finally, we will codify best practices and work with researchers using the PECE platform, and other research communities, to adopt these deliverables from February through June 2017.


2. Value Proposition

Research case: Given the cultural and social complexity (as well as technical, ecological and economic complexity) of many global problems today, collaborative empirical humanities research has renewed urgency. For decades, research in these fields has been an almost entirely individual-centric enterprise. Field notes, found documents, found or researcher-created photographs or recordings and other data used in cultural analysis are very rarely shared, except when reduced or rendered into some form of publication or museum display.


One of the primary barriers to sharing data within the empirical humanities is a lack of agreed-upon protocols for metadata standards for user-created primary research data. While there has been a great deal of work in the cultural heritage arena, especially within museums and libraries, and the dilemmas of qualitative data re-use are well documented (see Holstein and Gubrium 2004), the issues associated with preparing data for later use by third parties have yet to be thoroughly conceptualized. In the cultural heritage sector, for example, Jenn Riley has identified 105 metadata standards and notes that “the sheer number of metadata standards in the cultural heritage sector is overwhelming, and their inter-relationships further complicate the situation” (Riley 2009). In contrast, the RDA Metadata Standards Directory WG lists only one standard for heritage studies and one for anthropology.[1] Many researchers find themselves caught in the confusing space between a dizzying proliferation of standards and a one-size-fits-all approach that can miss the diversity of data practices within disciplines. Working closely with existing metadata-focused RDA groups (providing feedback on the list of elements in the package presented recently at the WG Chairs Meeting in Gaithersburg, for example), we will produce a simple list of recommended metadata fields for a delimited set of artifact types, analytics and use cases. Once endorsed by the RDA and taken up by early adopters, these best practices will be a go-to resource for researchers, who may then choose to modify (add or subtract) the fields we suggest for their own purposes. Development and uptake of shared metadata practices and tools will make user-created research data more findable and usable within these research traditions. The work of this WG could also contribute to the development of mechanisms providing greater credit and incentives for sharing data.
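
As a rough illustration of how a recommended field list might serve as a starting point that individual projects extend or trim, consider the minimal Python sketch below. The artifact types, field names and the `customize_fields` helper are all hypothetical placeholders chosen for illustration; they are not the fields this WG will recommend.

```python
# Hypothetical starting point: these field names are placeholders,
# not the fields the WG will recommend.
RECOMMENDED_FIELDS = {
    "interview": ["title", "creator", "date", "language", "consent_terms"],
    "image": ["title", "creator", "date", "location"],
}


def customize_fields(artifact_type, add=(), remove=()):
    """Return a project-specific field list for one artifact type.

    Starts from the recommended list, drops any fields named in `remove`,
    then appends fields from `add` that are not already present.
    The shared RECOMMENDED_FIELDS mapping is left unchanged.
    """
    fields = [f for f in RECOMMENDED_FIELDS[artifact_type] if f not in remove]
    for field in add:
        if field not in fields:
            fields.append(field)
    return fields


# A project that drops one suggested field and adds one of its own.
project_fields = customize_fields(
    "interview", add=["elicitation_technique"], remove=["language"]
)
```

Keeping the shared list separate from each project's customized copy is one way a common baseline could coexist with discipline-specific variation.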


Business Case and Adoption Plan: Building digital infrastructure to support more data sharing and collaboration in the empirical humanities is far from straightforward. Analytic techniques in the empirical humanities differ from those in social science fields that may collect similar data, and are more akin to those used in literary and philosophical research, relying primarily on hermeneutics (interpretation for explanation and evocation rather than representative or statistical sampling for identification and validation). The goal is not to develop a concise and consistent view of an object, but to produce and explore multiple views of an object, leveraging “epistemological pluralism” (Keller 2002; Turkle and Papert 1990). Indeed, providing multiple, different interpretations and ways of representing particular phenomena (the sociocultural causes and impacts of the Fukushima nuclear disaster, for example, or the impact of genetics research on understandings of environmental health) is the key task for humanities researchers. Computational advances are thus required to support open-ended, underdetermined engagement with digital content, of a kind that enables (even encourages) drift and transmutation in the way content is identified and taken up in analysis. The standards we develop are likely to be taken up by individual empirical humanities researchers, people working on collaborative projects and institutions.

Metadata is particularly complex and dynamic in the empirical humanities, even more so when research is collaborative. Empirical material often has limited or contested provenance information; the “empirical” itself can shift, in relevance or prevalence, as analytic structures evolve and multiply. Qualitative interviews, for example, are not just collected; they are produced, through questions and other elicitation techniques developed by the interviewer (often drawing on complex traditions of thought about language, culture and society). Interviews are then analyzed, again using analytic structures developed within complex traditions of thought. If interviews are analyzed collaboratively, different analytic structures may be used by different researchers, or different researchers may deploy “the same” analytic structure in different ways, and come to different interpretations of what an interview, image or document “says.” It is thus critical to recognize – and make accessible and discoverable (if researchers deem this appropriate) – the analytic structures through which data in the empirical humanities is both produced and interpreted. Metadata functionality therefore needs to be in place at many stages in the ethnographic research process, addressing diverse types of “data,” including the analytic structures used to produce and interpret empirical data.

Individuals, communities and initiatives that will benefit from the proposed WG:

●      Researchers: by reducing psychological, institutional, political, cultural and technological barriers to digitizing and sharing data, making shared data easier to find and cite and improving mechanisms for credit.

●      Collaborative research platform developers: by providing a guide to various metadata options and recommendations on the important fields to include in form-based metadata entry systems.

●      Interlocutors: by providing informed consent forms with a wide range of options for sharing and dissemination of interviews.

●      Collections: by improving accessibility through better metadata, raising demand for archived material and helping collections better meet their mission.

3. Engagement with Existing Work in the Area

Our preliminary research suggests that researchers in history and ethnography can quickly become overwhelmed by the multitude of diverse and somewhat scattered metadata standards, and that many researchers have their own ideas about the limits of existing metadata practices and standards. Historian of cartography Pat Seed, for example, is involved in efforts to define best metadata practices for maps and has noted that Dublin Core is far from sufficient for her purposes. Another common standard, the Open Archives Initiative, informs the metadata fields used by many advanced digital projects supporting historical and ethnographic research, but one researcher we interviewed suggested that the OAI standards are “out-of-date” and that “the standard uses older web technology and has not been updated or changed in quite some time… if I were to do it today, first we would want a separation of the data model and the data encoding. For example, many APIs allow you to get results back in JSON, XML, RSS, etc.” This WG will examine the value (and possible limits) of encouraging community-wide compliance with Dublin Core, OAI and many other standards, with a focus on building a list of suggested metadata fields based on an in-depth analysis of diverse empirical humanities research practices.
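
The interviewee's point about separating the data model from the data encoding can be illustrated with a minimal Python sketch: one record definition, two interchangeable serializations. The `ArtifactRecord` class, its fields and the sample values are hypothetical and chosen only for illustration; they do not reflect Dublin Core, OAI or any field list proposed by this WG.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict


@dataclass
class ArtifactRecord:
    """Hypothetical data model; the fields are illustrative only."""
    title: str
    creator: str
    artifact_type: str
    date_created: str


def to_json(record: ArtifactRecord) -> str:
    """Encode the record as a JSON object."""
    return json.dumps(asdict(record), sort_keys=True)


def to_xml(record: ArtifactRecord) -> str:
    """Encode the same record as XML, one child element per field."""
    root = ET.Element("artifact")
    for name, value in asdict(record).items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


# Made-up sample values; the model is defined once and encoded twice.
record = ArtifactRecord(
    title="Interview 12",
    creator="Example Researcher",
    artifact_type="audio",
    date_created="2016-02-11",
)
json_text = to_json(record)
xml_text = to_xml(record)
```

Because the encoders depend only on the record's fields, a further output format could be added without touching the model itself.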

We plan to partner with existing RDA Groups, such as the Metadata IG, the Metadata Standards Directory WG, the Research Data Provenance IG and the Engagement IG. Individual researchers and groups within the RDA working on linked data, preservation, persistent identifiers, dynamic data citation and the long tail of research data will also be key partners. These connections will be strengthened at and beyond the RDA Seventh Plenary.

Beyond the RDA, we will engage with institutions with widely respected standards (e.g. the Smithsonian), initiatives (e.g. Open Folklore) and publishing bodies with a digital presence (e.g. the Journal of Cultural Anthropology).

4. Work Plan

Work Plan Components

1.     Survey of relevant literature and projects in order to develop a list of interviewees and build initial use-case scenarios.

2.     Ethnographic interviews with researchers in history and ethnography on the types of data for which they need metadata practices, the scenarios in which they encounter metadata decisions and (with a focus on interviews and field notes) their practices.

3.     Drafting deliverables in order to codify metadata practices deemed “best” in the context of different scenarios.

4.     Facilitating uptake of deliverables, initially with researchers using PECE.

5.     Reporting on lessons learned in initial uptake, and working with the DPHE-IG to ensure sustainability and evolution of the deliverables and their uptake.

6.     Promoting the deliverables from this WG within the RDA and beyond.


WG Operation

The initial core members of this WG will meet regularly to ensure continual progress towards the proposed deliverables. Many of the initial WG members have a well-established working relationship, with a record of collaborative peer-reviewed publications and presentations (at the American Anthropological Association, the Society for the Social Studies of Science, the Society for Cultural Anthropology and other conferences) as well as public dissemination of the results of their work (blog posts, university press releases, etc.). Differences of opinion and experience will be viewed as an asset within this WG, and will be resolved through communication and collaborative practice.


In the spirit of “user-centered design,” this WG will partner with developers of the PECE platform from the early stages to increase the likelihood that the deliverables meet user needs. An ongoing series of “project shares” and “issues shares” hosted by the DPHE-IG will also provide frequent opportunities for members of this WG to envision how the deliverables could feed into a wide variety of digital humanities projects. This WG will also help meet the broadly recognized need for the RDA to continue developing its engagement with social science and humanities research communities.


We will provide updates to (and gather input from) the broader RDA community at plenaries every six months, in the form of poster sessions, breakout groups and Birds of a Feather sessions.


Adoption Plan

Specific groups committed to taking up the deliverables of this WG include collaborative research projects on the PECE platform. Two instances of PECE – The Asthma Files (TAF-PECE) and The Disaster-STS Research Network (DSTS-PECE) – will provide initial venues for the implementation of the deliverables proposed here. Both have small but active, cooperative and growing user communities. TAF-PECE is a collaborative research project that currently has approximately ten consistent users in geographically distributed locations, along with many more student-researchers, all likely to be working on the platform on a daily basis. DSTS-PECE is an international research network that will be actively enrolling new members over the next twelve months in groups of five to ten researchers; this incremental enrollment will provide excellent opportunities to test and improve new, embedded metadata management policies. Technical implementations of new metadata policies in PECE will first run on a PECE test site, then be moved to the TAF and DSTS PECE instances. Embedded metadata policies will be part of the PECE GitHub release. We are aware that the RDA does not promote or endorse specific products and technologies; we aim to use PECE as an initial testing ground and then facilitate adoption (implementation of recommended metadata fields) within a number of additional projects.

One probable adopter of this WG’s outputs is the Northwest Knowledge Network (NKN), which began as a cyberinfrastructure provider for geospatial natural resource data but is rapidly expanding to host and serve data to the public from a wide range of disciplines, including interviews with Native American communities, farmers and other natural resource stakeholders. The NKN also partners with archaeologists who manage and share their data using the NKN data portal. Using our recommendations as a guide to the many metadata standards used in the empirical humanities, the NKN will be better able to serve its many partners and their diverse needs. We are in close communication with other researchers and project leads who have expressed great interest in integrating our deliverables into their projects, including Julia Collins (working with projects including the National Snow and Ice Data Center, which hosts projects such as Exchange for Local Observations and Knowledge of the Arctic), Jason Baird Jackson (co-chair of the DPHE-IG and Director of the Mathers Museum of World Cultures at Indiana University), Jarita Holbrook (a co-chair of this proposed EHM WG) and other researchers working within AstroAnthro.net (an umbrella project focused on studying astrophysicists, their culture, diversity and their engagement with big data and big collaborations), and representatives from the Digital Research Infrastructure for the Arts and Humanities (DARIAH).

We are in communication with early-career scholars who are interested in how smart metadata practices might affect their collection of born-digital data, for example by developing informed consent forms that allow their interlocutors to make a variety of choices about how interviews will be shared. We have also been in close communication with researchers, such as Sharon Traweek and Michael M.J. Fischer, who hold considerable repositories of research material and are awaiting our WG deliverables in order to digitize and share their research collections.

We will continue to develop relationships with people and projects within and beyond the RDA and aim for a minimum of ten adopters. By connecting to researchers outside of the RDA, this WG will have the additional benefit of broadening RDA engagement, especially within the digital humanities. Our leadership currently includes representatives from three continents. We are interested in continuing to broaden the geographic (and other) diversity of our membership, and will leverage our connections with the DPHE-IG to enroll Asian partners at the upcoming Plenary 7 meeting in Tokyo.

5. Initial Membership

Leadership (biographic notes in Appendix A)

●      Co-chair: Brandon Costelloe-Kuehn, Rensselaer Polytechnic Institute, Troy, NY, USA

●      Co-chair: Dominic DiFranzo, University of Southampton, Southampton, UK

●      Co-chair: Jarita C. Holbrook, University of the Western Cape, Cape Town, South Africa

●      Co-chair: Lindsay Poirier, Rensselaer Polytechnic Institute, Troy, NY, USA

●      Co-chair: Mike Fortun, Rensselaer Polytechnic Institute, Troy, NY, USA

Initial Members/Interested (based on prior discussions and involvement with the DPHE-IG)

●      Alison Kenner, Drexel University, Philadelphia, PA, USA

●      Brian Callahan, Rensselaer Polytechnic Institute, Troy, NY, USA

●      Bridget Almas, Tufts University, Medford, MA, USA

●      Dan Price, University of Houston, Houston, TX, USA

●      Danah Tonne, Karlsruhe Institute of Technology, DARIAH, Germany

●      Ellen Foster, Rensselaer Polytechnic Institute, Troy, NY, USA

●      Jason Baird Jackson, Indiana University, Bloomington, IN, USA

●      Kim Fortun, Rensselaer Polytechnic Institute, Troy, NY, USA

●      Lindsay Poirier, Rensselaer Polytechnic Institute, Troy, NY, USA

●      Matthew Turner, Northwest Knowledge Network, University of Idaho, ID, USA

●      Rainer Stotzka, Karlsruhe Institute of Technology, DARIAH, Germany

●      Robert R. Downs, Columbia University, New York, NY, USA

●      Sharon Traweek, University of California, Los Angeles, CA, USA

●      Luis Felipe Rosado Murillo, Berkman Center for Internet and Society, Harvard University

6. References

Holstein, J.A. and Gubrium, J.F. 2004. “Context: Working it Up, Down and Across,” in C. Seale, G. Gobo, J.F. Gubrium and D. Silverman (eds) Qualitative Research Practice, London: Sage.

Keller, Evelyn Fox. 2002. Making Sense of Life: Explaining Biological Development with Models, Metaphors, and Machines. Cambridge, Massachusetts: Harvard University Press.

Turkle, Sherry, and Seymour Papert. 1990. “Epistemological Pluralism: Styles and Voices within the Computer Culture.” Signs 16 (1): 128–57.

Riley, Jenn. 2009. “Seeing Standards: A Visualization of the Metadata Universe,” available at http://www.dlib.indiana.edu/~jenlrile/metadatamap/ (Accessed 12/9/2015)

7. Appendix A: Leadership Biographical Notes

Brandon Costelloe-Kuehn is an anthropologist and Lecturer in Science & Technology Studies at Rensselaer Polytechnic Institute. Using multi-sited ethnographic methods, his research examines, and participates in, the design of innovative media systems to address the communication and collaboration challenges of politically and scientifically complex environmental issues. He works within a number of collaborative endeavors, including The Asthma Files, PECE and the Multispecies Salon. Brandon was awarded a Summer Internship and then an RDA/US Fellowship to develop the Metadata for Empirical Humanities WG proposal and contribute to a number of ongoing initiatives within the Digital Practices in History and Ethnography Interest Group.

Dominic DiFranzo is a Research Fellow with the Web and Internet Science Group at the University of Southampton in the UK. He currently works on SOCIAM, a project funded by the Engineering and Physical Sciences Research Council (EPSRC) that researches the nature and development of social machines. His research involves collaborating with colleagues across the social sciences and humanities to translate the tools and methods of data science, e-science and informatics to address their research needs and purposes. This includes working with a wide array of research groups and projects spanning large-scale social network analysis, experimental ethnography, open government data and web observatories. He holds a PhD in Computer Science from Rensselaer Polytechnic Institute and was a member of the Tetherless World Constellation.

Jarita C. Holbrook is an Associate Professor of Physics at the University of the Western Cape, South Africa. She holds a doctorate in Astronomy & Astrophysics from the University of California, Santa Cruz. She was a postdoctoral fellow at the Center for the Cultural Studies of Science, Technology, and Medicine at UCLA, and the Max Planck Institute for the History of Science in Berlin, Germany. She is a cultural astronomer focusing on African indigenous astronomy, the culture of astrophysicists and practices of inclusion and exclusion. RDA DPHE-IG members Jarita Holbrook, Sharon Traweek, Luis Felipe Rosado Murillo, and Reynal Guillen are part of AstroAnthro.net, an umbrella project focused on studying astrophysicists, their culture, diversity and their engagement with big data and big collaborations. Of importance to the group is automating tools for data visualizations characterizing the content of interviews and publicly available data on individual astrophysicists.


Lindsay Poirier is a PhD student in the Science and Technology Studies department at Rensselaer Polytechnic Institute. For the past year, she has served as the lead platform architect for PECE, a role that involves translating the theoretical commitments of the empirical humanities into digital infrastructure. Her dissertation work draws on the history of artificial intelligence and leverages ethnographic methods to analyze the design and politics of the World Wide Web. She has contributed to a number of initiatives in the DPHE-IG.


Mike Fortun is a historian and anthropologist of the life sciences whose research has focused on the contemporary science, culture and political economy of genomics. His work has covered the policy, scientific and social history of the Human Genome Project in the U.S.; the growth of commercial genomics and bioinformatics in the speculative economies of the 1990s; and the emergence of transdisciplinary research programs in toxicogenomics, addiction and environmental health. Mike Fortun is a co-chair of the DPHE-IG and is a lead developer of PECE.



[1] http://rd-alliance.github.io/metadata-directory/subjects/


