Sharing and Archiving Qualitative Data
Why Share Qualitative Data?
Sharing qualitative data benefits both the scholarly community and researchers in several ways:
Fostering Public Trust: Transparency enhances public confidence in research outcomes, vital for securing funding and support for future projects. It allows for verification of claims, reinforcing trust in the research.
Dynamic Research Environment: While qualitative research invites diverse interpretations, sharing data fosters improved research quality through collaborative critique and examination.
Enabling New Research: Access to shared data inspires innovative analyses, maximizing the scientific value of existing studies.
More Effective Use of Resources: Data sharing reduces the cost of new data collection, promotes efficient use of resources, and minimizes the burden on frequently targeted communities.
Skill Development for Trainees: It offers students valuable opportunities to learn coding and analysis techniques, enhancing their educational experience.
Receiving Credit: Sharing data ensures proper attribution, allowing researchers to gain recognition for their work.
Opportunities for Collaboration: Open data fosters partnerships among researchers, leading to new insights and advancements.
Sharing with Caring
When sharing data, researchers should make their best effort to provide complete, good-quality documentation to support reuse.
Before we dive into what researchers should share and where, let’s explore something together.
Please open the links to the two data deposits below:
Taherzadeh, O., 2016, “Interview Transcripts”, Interview Transcripts, https://doi.org/10.7910/DVN/4C9KFK/XRREIY, Harvard Dataverse, V1
Klein, M., 2022. Interview transcripts of addiction therapists and recovering drug service users. Bath: University of Bath Research Data Archive. Available from: https://doi.org/10.15125/BATH-01096.
Can you spot any differences? Suppose both datasets covered topics related to your research: how likely would you be to reuse one rather than the other? Why?
Context and Documentation
Taherzadeh (2016): This deposit lacks detailed contextual information about the study, such as the sample, interview questions, study goals, or informed consent details. It is a standalone collection of transcripts.
Klein (2022): This deposit provides clearer context, including the research objectives and the questions asked, and links to the associated dissertation.
Reuse Value
Taherzadeh (2016): Low
- The absence of context and supporting documentation makes it challenging to assess the dataset’s validity, reliability, and relevance to other research. Without knowing the background or how the data was collected, it’s difficult to justify its use in further studies.
Klein (2022): Higher
- The dataset seems to come with comprehensive documentation, including context about the participants and the study goals. This information facilitates a better understanding of how to apply the data effectively in new research, making it much more reusable.
Considerations on What to Share
Remember when we discussed the importance of outlining data-sharing plans in Data Management Plans (DMPs)? At this stage, Sarah could greatly benefit from having a clear strategy for archiving and storing her data. As we discussed earlier, understanding the available options and having at least a rough plan for what will be shared, along with strategies to facilitate the process, is very important. We provided Sarah with recommendations on what to document, and we hope this guidance will empower her to share her research deliverables confidently while adhering to key principles of open practices.
It is also worth recalling the importance of balancing the value of open sharing against the risk of harm from identifying participants, communities, and research sites. The good news is that there are options in between fully closed and fully open data!
Depending on your project needs and what was agreed in the informed consent, we recommend evaluating access control options; they will help you determine which data repository is most suitable for storing and preserving your project data.
Access Control Questions
Access controls fall into three main categories:
Who can access your data? Access may be limited to qualified researchers, often requiring proof of interest through a research proposal, or it may require pre-approval from an Institutional Review Board (IRB) for general requests.
How can others access your data? Secure internet connections, along with agreements regarding data storage and destruction, might be required for downloading data. Researchers may sometimes need to access data in person on a secure, offline computer. Hybrid solutions, like ICPSR’s “virtual enclave,” allow remote viewing without data leaving the server.
When can others access your data? Embargoes can temporarily restrict access to protect human participants, often allowing researchers to publish findings before broader access. These embargoes can also facilitate long-term data availability, with set dates for lifting restrictions, as seen in historical archives.
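The three questions above can be recorded in a small structure so that the answers travel with the project documentation. The following is only a minimal sketch: the `AccessPolicy` class, its field names, and the example values are hypothetical and do not correspond to any repository's actual deposit form.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical record of the who / how / when access questions."""
    who: str                       # e.g. "anyone" or "qualified researchers"
    how: str                       # e.g. "download", "virtual enclave", "on-site"
    embargo_until: Optional[date]  # None means there is no embargo


def embargo_lifted(policy: AccessPolicy, today: date) -> bool:
    """Return True if there is no embargo or its end date has been reached."""
    return policy.embargo_until is None or today >= policy.embargo_until


def describe(policy: AccessPolicy, today: date) -> str:
    """Summarize the policy in one line, e.g. for a README."""
    if embargo_lifted(policy, today):
        when = "available now"
    else:
        when = f"embargoed until {policy.embargo_until.isoformat()}"
    return f"Who: {policy.who}; How: {policy.how}; When: {when}"
```

For instance, `describe(AccessPolicy("qualified researchers", "virtual enclave", date(2030, 1, 1)), date.today())` yields a one-line summary that could be pasted into project documentation. The actual terms are always set by the consent form, the IRB, and the repository.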
Sharing Levels
- Openly available: data (typically de-identified) shared with no restrictions.
Example: Cunningham, Una; De Brún, Aoife; Mayumi, Willgerodt et al. (2021). Appendices interview formats [Dataset]. Dryad. https://doi.org/10.5061/dryad.q83bk3jg8
- Subject to Embargo: a temporary restriction on sharing or publishing data. It means that the data can’t be made public for a set period, usually to protect sensitive information, allow for further analysis, or wait for a specific event, such as a formal publication, before releasing it.
Example: Ibitoye, Mobolaji; OlaOlorun, Funmilola; Casterline, John B. 2025. “Demand for Modern Contraception in Sub-Saharan Africa: New Methods, New Evidence”. Qualitative Data Repository. https://doi.org/10.5064/F600CMLO. QDR Main Collection. V1
- Closed Access/Metadata Record Only (sensitive data/no consent): a summary and description of a dataset that provides essential information about its provenance, structure, and context without containing the actual data.
Depending on the research case, access can be provided through a Data Use Agreement (DUA) and involve a data enclave for safe access. These requirements will also depend on IRB and consent form agreements.
- Data Use Agreement (DUA) required: a contract that outlines the terms and conditions for a recipient to use data from a data owner. It’s specific to a project or study and can include limitations on use, data safeguarding obligations, and privacy rights. Some supplementary files (e.g., codebooks, data collection instruments, or selected processed data that reproduce specific figures or support some findings) may still be shared openly.
Example: Steeves, Vicky; Peltzman, Shira; Kim, Julia; Griesinger, Peggy; Blumenthal, Karl-Rainer. 2020. “Data for: ‘What’s Wrong with Digital Stewardship: Evaluating the Organization of Digital Preservation Programs from Practitioners’ Perspectives’”. Qualitative Data Repository. https://doi.org/10.5064/F6DJRPLK.
A metadata-only record for research data that isn’t openly available enables readers to quickly evaluate whether they want to request access. While a well-crafted Data Availability Statement in a journal paper serves a similar purpose, a metadata-only record in a suitable repository offers the benefit of being discoverable through data-focused searches, along with the ability to provide more detailed descriptions through rich, linked, and interoperable metadata.
A Note About DAS
Data Availability Statements (DAS) are crucial for the credibility of manuscripts and other published research. They point interested readers, and sometimes automated algorithms, to the underlying data supporting your claims, allowing them to verify those assertions or use the data for further research. We suggest following some best practices for crafting statements that are both effective and clear while also complying with funder and journal policy requirements.
Source: UCSB Library Data Literacy Series (perma.cc/3ZHR-6JAG)
Applying Access Controls
Implementing access controls involves a trade-off: while stricter controls reduce misuse risk, they can hinder beneficial access. Though powerful, they should not unnecessarily complicate access to low-risk data. As the principal steward of your data, you ultimately decide on access controls. However, it’s advisable to involve repository staff in this process, as they can highlight potential challenges, ensuring that your data remains accessible and ethically shared in the long run.
For example, you might share de-identified transcripts openly while placing recordings under more stringent access controls.
Do keep a list of de-identification rules for yourself and your team should you collaborate. This list serves as necessary documentation when you share your data. See, for example, the protocol Thad Dunning and Edward Camp used to de-identify data deposited with the Qualitative Data Repository. This document is separate from the key that links de-identified entries to the individuals or entities interviewed, which should not be included when sharing your data.
Do check the document properties of files, which may contain identifiers such as original file names identifying interview respondents.
Finally, do try to strike a balance between keeping your participants’ information confidential and unnecessarily reducing the analytic value of the data by removing too much information. If you are having difficulties striking that balance, you could ask another subject-matter expert for assistance; some repository personnel or data librarians can also provide abstract rules that you can follow.
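One way to keep de-identification rules documented is to write them down as data and apply them from that single list. The sketch below assumes a made-up participant name and illustrative patterns; the rule descriptions, placeholders, and regular expressions are hypothetical, not a protocol used by any repository. Pattern-based replacement only catches direct identifiers that follow a predictable form, so manual review of each transcript is still needed.

```python
import re
from typing import List, Tuple

# Hypothetical rule list: (description, regex pattern, replacement).
# The descriptions double as the documentation you share with the data.
# The key linking placeholders to real people is kept separately and never shared.
RULES: List[Tuple[str, str, str]] = [
    ("Participant name replaced with code", r"\bJane Roe\b", "[PARTICIPANT_01]"),
    ("Email addresses masked", r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", "[EMAIL]"),
    ("Phone numbers masked", r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b", "[PHONE]"),
]


def deidentify(text: str, rules: List[Tuple[str, str, str]] = RULES) -> Tuple[str, List[str]]:
    """Apply each rule in order and return the new text plus a log of what changed."""
    log = []
    for description, pattern, replacement in rules:
        text, count = re.subn(pattern, replacement, text)
        if count:
            log.append(f"{description}: {count} replacement(s)")
    return text, log
```

The returned log records which rules fired on a given transcript, which can help when deciding whether too much or too little information was removed.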
What data?
ICPSR’s Guide for Sharing Qualitative Data outlines examples of qualitative data sources that may be archived for secondary analysis:
• Interview methods, including those captured through notes, audio, and video
  - In-depth and/or unstructured interviews
  - Semi-structured interviews
  - Focus group interviews
• Diary studies that are unstructured or use semi-structured writing prompts
• Observational studies that generate field notes and other text and information
  - Naturalistic observation of real-world environments (e.g., classrooms, workplaces, healthcare facilities, courtrooms, public spaces)
  - Participant observation, where the researcher becomes an active part of the setting to collect information (e.g., online gaming, community policing, nightclub culture)
  - Structured observation, where the researcher has predefined objectives and a systematic approach to collecting information. This would include case studies.
• Text from available sources
  - Meeting minutes
  - Official records
  - Medical records
  - News sources and social media
  - Excerpts of copyrighted materials (e.g., literature, film, music)
• Survey methods or questionnaires with substantial open-ended comments
Open formats
Why should we prioritize open file formats in our research? Imagine sharing your groundbreaking findings and ensuring that anyone, anywhere, can access and build upon your work without running into compatibility issues. Open formats offer exactly that: freedom from proprietary software constraints. By choosing open formats, you enhance collaboration and transparency and make your research more sustainable for others and your future self.
There is a diversity of open formats available across different types of media that can be of great use to qualitative data researchers, including audio, video, image, and text. Refer to the handout below for some examples:
Source: UCSB Library Data Literacy Series (perma.cc/W4FL-JDFT)
Where Should You Share Your Project Data?
The decision of where to archive data is crucial for ensuring its accessibility, integrity, and long-term preservation. Selecting a stable, certified repository not only safeguards the data against loss or corruption but also enhances its credibility and usability within the research community. Unlike sharing via email, personal communication, or unsecured websites—methods that can lead to data loss, miscommunication, and lack of traceability—certified repositories provide a structured and secure environment for data management.
Such repositories adhere to rigorous standards for data storage and access, ensuring that shared data remains discoverable, citable, and protected over time. By thoughtfully choosing the right repository, researchers can maximize the impact of their work, facilitate reproducibility, and contribute to the advancement of knowledge across various fields.
Beyond support for access controls when required, choosing a repository to archive QHS data should take into account several factors laid out in the handout below:
Source: UCSB Library Data Literacy Series (perma.cc/WLF7-WTUC).
Preparing Your Data for Submission
Your project package submission should include a few required and recommended files.
Required:
Processed de-identified data (e.g., transcripts);
Coded data (supporting excerpts);
README file: an overview of your project, including data sources, their relationships, and a brief description of the methods. Here is a customizable README template;
Data collection instruments: a sample of the instruments used for data collection, such as surveys or interview guides;
Codebook: the coding framework used, including definitions of codes and categories.
Recommended:
Informed consent statement(s), if applicable;
IRB protocol, if applicable;
Study protocol or procedures manual, if applicable.
Source: UCSB Library Data Literacy Series (perma.cc/E7BA-BBYE).
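Before uploading, the required items listed above can be checked against the contents of the package folder. This is a minimal sketch that assumes a folder layout and file-naming scheme of our own invention (the `REQUIRED` patterns are hypothetical); adjust them to match how your project is actually organized.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical glob patterns for each required item, relative to the package folder.
REQUIRED: Dict[str, str] = {
    "README file": "README*",
    "De-identified data": "data/*",
    "Coded data": "coded/*",
    "Data collection instruments": "instruments/*",
    "Codebook": "codebook*",
}


def missing_items(package_dir: Path, required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the labels of required items with no matching file in the package."""
    return [
        label
        for label, pattern in required.items()
        if not any(package_dir.glob(pattern))
    ]
```

Calling `missing_items(Path("my_deposit"))` returns an empty list when every required item has at least one matching file. The check only confirms that files exist, not that their contents are complete or properly de-identified.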
Licensing Your Data
Research data itself is generally not copyrightable because it consists of facts, figures, and raw information that cannot be considered original creative expression. Copyright protects the unique expression of ideas, such as written texts, artwork, and music, rather than the underlying data or factual content.
Most data repositories use open licenses such as CC0 (Creative Commons Zero) or CC BY (Creative Commons Attribution) to encourage broad accessibility and reuse of data. These licenses promote the free sharing of knowledge, allowing researchers and practitioners to use, modify, and redistribute data without significant restrictions, ultimately fostering collaboration and innovation within the scientific community.
However, researchers may choose to assign different licenses to other creative deliverables and supplementary materials associated with their projects, such as reports, presentations, or multimedia content. For example, Sarah might opt for a CC BY-NC (Attribution-NonCommercial) license for an infographic she created to represent the ethical approaches in the social media influencing market, to restrict its use for commercial purposes. This flexibility allows Sarah and the research community at large to balance openness with the need to protect specific aspects of their intellectual property while still contributing to the collective body of knowledge.
The handout below provides more insights about licenses, including the Creative Commons family:
Source: UCSB Library Data Literacy Series (perma.cc/ET6F-N84X).
Recommended/Cited Sources:
Campbell R, Javorka M, Engleton J, Fishwick K, Gregory K, Goodman-Williams R. Open-Science Guidance for Qualitative Research: An Empirically Validated Approach for De-Identifying Sensitive Narrative Data. Advances in Methods and Practices in Psychological Science. 2023;6(4). doi:10.1177/25152459231205832
Myers CA, Long SE, Polasek FO. Protecting participant privacy while maintaining content and context: Challenges in qualitative data de-identification and sharing. Proc Assoc Inf Sci Technol. 2020;57:e415. https://doi.org/10.1002/pra2.415
DuBois, J. M., Strait, M., & Walsh, H. (2018). Is it time to share qualitative research data? Qualitative Psychology, 5(3), 380–393. https://doi.org/10.1037/qup0000076