Sharing and Archiving Qualitative Data

Considerations on What to Share

Remember when we discussed the importance of outlining data-sharing plans in Data Management Plans (DMPs)? At this stage, Sarah could greatly benefit from having a clear strategy for archiving and storing her data. As we discussed earlier, understanding the available options and having at least a rough plan for what will be shared, along with strategies to facilitate the process, is very important. We provided Sarah with recommendations on what to document, and we hope this guidance will empower her to share her research deliverables confidently while adhering to key principles of open practices.

It is also worth recapping the need to balance the value of open sharing against the risk of harm from identifying participants, communities, and research sites. The good news is that there are options in between fully closed and fully open data!

Depending on your project needs and what was agreed in the informed consent, we recommend evaluating access control options. They will help you determine which data repository is most suitable for storing and preserving your project data.

Access Control Questions

Access controls fall into three main categories:

Who can access your data? Access may be limited to qualified researchers, often requiring proof of interest through a research proposal, or it may require pre-approval from an Institutional Review Board (IRB) for general requests.
How can others access your data? Secure internet connections, along with agreements regarding data storage and destruction, might be required for downloading data. Researchers may sometimes need to access data in person on a secure, offline computer. Hybrid solutions, like ICPSR’s “virtual enclave,” allow remote viewing without data leaving the server.
When can others access your data? Embargoes can temporarily restrict access to protect human participants, often allowing researchers to publish findings before broader access. These embargoes can also facilitate long-term data availability, with set dates for lifting restrictions, as seen in historical archives.
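
One way to keep these three decisions visible to your team and to repository staff is to write them down as a small structured record. The sketch below is illustrative only: the class name, field names, and example values are hypothetical and do not correspond to any repository's schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AccessPlan:
    """Hypothetical record of the who / how / when decisions for one dataset."""
    who: str                               # who may request access
    how: str                               # how approved users reach the data
    embargo_until: Optional[date] = None   # None means no embargo

    def is_available(self, today: date) -> bool:
        """Return True once any embargo has ended (the 'when' question)."""
        return self.embargo_until is None or today >= self.embargo_until


plan = AccessPlan(
    who="qualified researchers with an approved research proposal",
    how="remote viewing in a virtual enclave",
    embargo_until=date(2027, 1, 1),
)
print(plan.is_available(date(2026, 6, 1)))   # False: still under embargo
print(plan.is_available(date(2027, 1, 1)))   # True: embargo lifts on this date
```

A record like this can be pasted into a README or a DMP so that the agreed conditions travel with the data.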

Sharing Levels

Openly available: data (typically de-identified) shared with no restrictions.

Example: Cunningham, Una; De Brún, Aoife; Mayumi, Willgerodt et al. (2021). Appendices interview formats [Dataset]. Dryad. https://doi.org/10.5061/dryad.q83bk3jg8

Subject to Embargo: a temporary restriction on sharing or publishing data. The data can’t be made public for a set period, usually to protect sensitive information, allow for further analysis, or wait for a specific event, such as a formal publication.

Example: Ibitoye, Mobolaji; OlaOlorun, Funmilola; Casterline, John B. 2025. “Demand for Modern Contraception in Sub-Saharan Africa: New Methods, New Evidence”. Qualitative Data Repository. https://doi.org/10.5064/F600CMLO. QDR Main Collection. V1

Closed Access/Metadata Record Only (sensitive data/no consent): a summary and description of a dataset that provides essential information about its provenance, structure, and context without including the actual data.

Depending on the research case, access can be provided through a Data Use Agreement (DUA) and involve a data enclave for safe access. These requirements will also depend on IRB and consent form agreements.

Data Use Agreement (DUA) required: a contract that outlines the terms and conditions for a recipient to use data from a data owner. It’s specific to a project or study and can include limitations on use, data safeguarding obligations, and privacy rights. Some supplementary files (e.g., codebooks, data collection instruments, selected processed data to reproduce specific figures or support some findings) may still be shared openly.

Example: Steeves, Vicky; Peltzman, Shira; Kim, Julia; Griesinger, Peggy; Blumenthal, Karl-Rainer. 2020. “Data for: What’s Wrong with Digital Stewardship: Evaluating the Organization of Digital Preservation Programs from Practitioners’ Perspectives”. Qualitative Data Repository. https://doi.org/10.5064/F6DJRPLK.

💭 Discussion: What is the value of sharing a metadata record only?

A metadata-only record for research data that isn’t openly available lets readers quickly evaluate whether they want to request access. While a well-crafted Data Availability Statement in a journal paper serves a similar purpose, a metadata-only record in a suitable repository is also discoverable through data-focused searches and can carry more detailed descriptions through rich, linked, and interoperable metadata.

A Note About DAS

Data Availability Statements (DAS) are crucial for the credibility of manuscripts and other published research. They point interested readers, and sometimes automated algorithms, to the underlying data supporting your claims, allowing them to verify those assertions or use the data for further research. We suggest following some best practices for crafting statements that are both effective and clear while also complying with funder and journal policy requirements.

Source: UCSB Library Data Literacy Series (perma.cc/3ZHR-6JAG)

Applying Access Controls

Implementing access controls involves a trade-off: while stricter controls reduce misuse risk, they can hinder beneficial access. Though powerful, they should not unnecessarily complicate access to low-risk data. As the principal steward of your data, you ultimately decide on access controls. However, it’s advisable to involve repository staff in this process, as they can highlight potential challenges, ensuring that your data remains accessible and ethically shared in the long run.

Consider sharing de-identified transcripts openly while placing recordings under more stringent access controls.
Do keep a list of de-identification rules for yourself and your team should you collaborate. This list serves as necessary documentation when you share your data. See, for example, the protocol Thad Dunning and Edward Camp used to de-identify data deposited with the Qualitative Data Repository. This document is separate from the key that links de-identified entries to the individuals or entities interviewed, which should not be included when sharing your data.
Do check the document properties of files, which may contain identifiers such as original file names identifying interview respondents.
Finally, do try to strike a balance between keeping your participants’ information confidential and unnecessarily reducing the analytic value of the data by removing too much information. If you are having difficulties striking that balance, you could ask another subject-matter expert for assistance; some repository personnel or data librarians can also provide abstract rules that you can follow.
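
Checking document properties can be partly automated. The sketch below assumes your transcripts are Word .docx files, which are ZIP packages that store properties such as title, creator, and last editor in a part named docProps/core.xml. The function name and the choice of fields are ours, and other file types (PDF, audio, images) need different tools.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core-properties part of a .docx package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}
# Properties that often carry names; this selection is an assumption.
FIELDS = ["dc:title", "dc:creator", "cp:lastModifiedBy"]


def read_docx_properties(path):
    """Return the non-empty core properties found in a .docx file."""
    with zipfile.ZipFile(path) as package:
        if "docProps/core.xml" not in package.namelist():
            return {}
        root = ET.fromstring(package.read("docProps/core.xml"))
    found = {}
    for field in FIELDS:
        element = root.find(field, NS)
        if element is not None and element.text:
            found[field] = element.text
    return found
```

Calling read_docx_properties on each transcript (for example, a hypothetical interview_03_transcript.docx) and reviewing the returned values by eye is a reasonable first pass; it does not replace opening the file's properties dialog or reading the text itself for identifiers.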

What data?

ICPSR’s Guide for Sharing Qualitative Data outlines examples of qualitative data sources that may be archived for secondary analysis:

• Interview methods, including those captured through notes, audio, and video

In-depth and/or unstructured interviews
Semi-structured interviews
Focus group interviews

• Diary studies that are unstructured or use semi-structured writing prompts

• Observational studies that generate field notes and other text and information

Naturalistic observation of real-world environments (e.g., classrooms, workplaces, healthcare facilities, courtrooms, public spaces)
Participant observation, where the researcher becomes an active part of the setting to collect information (e.g., online gaming, community policing, nightclub culture)
Structured observation, where the researcher has predefined objectives and a systematic approach to collecting information. This would include case studies.

• Text from available sources

Meeting minutes
Official records
Medical records
News sources and social media
Excerpts of copyrighted materials (e.g., literature, film, music)

• Survey methods or questionnaires with substantial open-ended comments

Open formats

Why should we prioritize open file formats in our research? Imagine sharing your groundbreaking findings and ensuring that anyone, anywhere, can access and build upon your work without running into compatibility issues. Open formats offer exactly that: freedom from proprietary software constraints. By choosing open formats, you enhance collaboration and transparency and make your research more sustainable for others and your future self.

There is a diversity of open formats available across different types of media that can be of great use to qualitative data researchers, including audio, video, image, and text. Refer to the handout below for some examples:

Source: UCSB Library Data Literacy Series (perma.cc/W4FL-JDFT)
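
Before depositing, it can help to scan your project files for formats that may need converting. The sketch below is a minimal example: the mapping is our own illustrative assumption, not the contents of the handout, and you should check your chosen repository's list of preferred formats.

```python
from pathlib import Path

# Illustrative mapping only (an assumption, not taken from the handout):
# proprietary or software-specific extensions and commonly suggested alternatives.
OPEN_ALTERNATIVES = {
    ".doc": ".txt, .odt, or PDF/A",
    ".xls": ".csv",
    ".wma": ".flac or .wav",
    ".psd": ".tiff or .png",
}


def suggest_open_formats(filenames):
    """Map each file with a listed extension to a suggested open alternative."""
    suggestions = {}
    for name in filenames:
        extension = Path(name).suffix.lower()
        if extension in OPEN_ALTERNATIVES:
            suggestions[name] = OPEN_ALTERNATIVES[extension]
    return suggestions


# Hypothetical file names for demonstration.
files = ["transcript_01.doc", "codes.xls", "fieldnotes.txt", "interview_01.wma"]
for name, alternative in suggest_open_formats(files).items():
    print(f"{name}: consider converting to {alternative}")
```

Files with extensions not in the mapping, such as fieldnotes.txt here, are simply left out of the report.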

Preparing Your Data for Submission

Your project package submission should include a few required files, plus some recommended ones.

Required:

Processed de-identified data (e.g., transcripts);
Coded Data (supporting excerpts);
README File: an overview of your project, including data sources, their relationships and a brief description of the methods. Here is a customizable README template;
Data Collection Instruments: A sample of instruments used for data collection, such as surveys or interview guides;
Codebook: the coding framework used, including definitions of codes and categories.

Recommended:

Informed consent statement(s), if applicable;
IRB protocol, if applicable;
Study protocol or procedures manual, if applicable.

Source: UCSB Library Data Literacy Series (perma.cc/E7BA-BBYE).
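
The required items can be turned into a quick pre-submission check. The sketch below assumes a single project folder; every file and folder name in it is hypothetical and should be adapted to your own layout.

```python
from pathlib import Path

# Hypothetical names for each required item; adapt them to your project.
REQUIRED = {
    "README file": "README.txt",
    "Codebook": "codebook.csv",
    "Data collection instruments": "instruments",
    "Processed de-identified data": "transcripts",
    "Coded data": "coded_excerpts",
}


def missing_items(package_dir):
    """Return the labels of required items not found in the package folder."""
    root = Path(package_dir)
    return [label for label, name in REQUIRED.items() if not (root / name).exists()]
```

Running missing_items on your package folder returns an empty list when every required item is present. It only checks that the files or folders exist, not that their contents are complete or de-identified.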

Licensing Your Data

Research data itself is generally not copyrightable because it consists of facts, figures, and raw information that cannot be considered original creative expression. Copyright protects the unique expression of ideas, such as written texts, artwork, and music, rather than the underlying data or factual content.

Most data repositories adhere to open licenses such as CC0 (Creative Commons Zero) or CC BY (Creative Commons Attribution) to encourage broad accessibility and reuse of data. These licenses promote the free sharing of knowledge, allowing researchers and practitioners to utilize, modify, and redistribute data without significant restrictions, ultimately fostering collaboration and innovation within the scientific community.

However, researchers may choose to assign different licenses to other creative deliverables and supplementary materials associated with their projects, such as reports, presentations, or multimedia content. For example, Sarah might opt for a CC BY-NC (Attribution-NonCommercial) license for an infographic she created to represent ethical approaches in the social media influencer market, to restrict its use for commercial purposes. This flexibility allows Sarah and the research community at large to balance openness with the need to protect specific aspects of their intellectual property while still contributing to the collective body of knowledge.

The handout below provides more insights about licenses, including the Creative Commons family:

Source: UCSB Library Data Literacy Series (perma.cc/ET6F-N84X).

Recommended/Cited Sources:

Campbell R, Javorka M, Engleton J, Fishwick K, Gregory K, Goodman-Williams R. Open-Science Guidance for Qualitative Research: An Empirically Validated Approach for De-Identifying Sensitive Narrative Data. Advances in Methods and Practices in Psychological Science. 2023;6(4). doi:10.1177/25152459231205832

Myers CA, Long SE, Polasek FO. Protecting participant privacy while maintaining content and context: Challenges in qualitative data de-identification and sharing. Proc Assoc Inf Sci Technol. 2020;57:e415. https://doi.org/10.1002/pra2.415

DuBois, J. M., Strait, M., & Walsh, H. (2018). Is it time to share qualitative research data? Qualitative Psychology, 5(3), 380–393. https://doi.org/10.1037/qup0000076
