Background

“Openness” in scientific research refers to the sharing, in a usable way, of scholarly publications and data resulting from scholarly research (including metadata and the methodology, as well as code or algorithms used to generate the shared research data). This paper examines some of the ethical considerations that arise with the sharing of data through online data repositories in health and biomedical research. Data repositories represent only one mode of data sharing; other modes include posting data on institutional or researchers’ websites, providing data to requestors personally, and making data accessible through publications (e.g. through supplementary files).
Sharing data through well-curated online data repositories presents opportunities as well as challenges. For example, a distinct advantage is that online data repositories create a central “pool” of data and make the data easily discoverable for bona fide researchers worldwide to access and re-use. Ideally, the storage of data in research data repositories also ensures the long-term availability of data beyond the end of a particular research project. A corresponding challenge concerns the appropriate governance mechanisms for data repositories, including questions about who will be able to access the data and what (if any) levels of restriction should be applied. Another practical yet pervasive challenge concerns researchers’ ability to make use of data in data repositories, which may be limited by the quality of the data, its formatting, or the absence of metadata. The FAIR Data Principles reflect the features that must characterise data and other research products so that humans and, importantly, machines can fully understand and use them (Wilkinson et al. 2016). To be of value, according to the FAIR principles, data must be Findable, Accessible, Interoperable, and Re-usable (Wilkinson et al. 2016).

Data sharing policies, practices, and mandates have proliferated over a number of years. Despite this, there is evidence to suggest that researchers have not kept up with these developments; researchers continue to display limited understanding of data sharing environments, including knowledge of repositories and issues such as copyright and licensing (Stuart et al. 2018).

Types of Data Repositories

Data repositories are not uniform. They differ in terms of who holds the data as well as the nature of the data held, and a given repository can often be considered to belong to more than one of the categories below.

Institutional Data Repositories: These repositories are often university-based. They manage and disseminate the research output (primary facts and statistics but also source code and developed software tools) generated by members of an institution’s own research community. A good example is the University of Bristol’s Research Data Service, which has developed a central repository with accompanying governance, technical, and workflow structures that enhance the responsible sharing of data (Merrett et al. 2018). The management and dissemination of institutional data is also supported by web-based repositories such as Figshare (https://figshare.com).

Government Data Repositories: Governments hold vast amounts of data routinely collected for administrative purposes, health surveillance, and the delivery and management of healthcare. The value of access to such data for health and biomedical research is increasingly being recognised by governments and the research community. The tension between sharing such data and concerns about privacy protections remains a central issue, but increasingly there are governance solutions to facilitate the re-use of valuable government data sets (Ubaldi 2013).

Discipline-Specific Data Repositories: Discipline-specific data repositories contain data and metadata pertaining to specific subject areas, such as health sciences or earth and environmental sciences. Such repositories are valuable because they provide a single point for discipline-specific data discovery and retrieval. They are also necessary because domain-specific software is often required to convert the file formats of data in various disciplines.

Generalist Data Repositories: Generalist data repositories are suitable for the deposition of data where no discipline-specific repository exists. Scientific Data advises that such repositories are also suitable “for archiving associated analyses, or experimental-control data, supplementing the primary data in a data-type specific repository” (Scientific Data n.d.).
Project/Program-specific Repositories: Project/program-specific repositories comprise collections of data collected as part of a specific body of research. An example of such a repository is the repository for the Growing Up Today Study, whose aim is to collect data from thousands of participants to investigate factors that affect health throughout life (https://www.re3data.org/repository/r3d100011832).

Support for Data Sharing

There is general support for data sharing from numerous stakeholders. This includes the scientific community, through international bodies, such as the International Council for Science (International Council for Science (ICSU) 2015), and funding bodies, such as the National Institutes of Health (U.S. Dept of Health and Human Services 2018), the European Commission (European Commission 2012), and the Australian Research Council (Australian Research Council 2018). Funding bodies either mandate or encourage grantees to submit a data management plan detailing how research outputs will be shared. Perhaps most importantly, many academic journals increasingly require researchers to make underlying data available upon scholarly publication (Taichman et al. 2016; Federer et al. 2018). The Transparency and Openness Promotion (TOP) Guidelines developed by the Center for Open Science articulate three levels of transparency, each requiring greater commitment to open sharing, and these have been adopted by journals in most fields and increasingly by funding bodies around the world (Nosek et al. 2015). These and other stakeholders articulate the benefits in support of open data sharing shown in Table 1.

Table 1 Benefits of data sharing

Data Sharing Attitudes on the Ground

Researchers’ attitudes toward data sharing appear to be influenced by discipline and discipline-specific normative pressures (i.e. established norms within their disciplines) but not by funding agency mandates, perhaps because of the lack of checks and penalties (Tenopir et al. 2015). Conversely, pressure to conform to open data practices and incentivisation from scientific journals appear to increase researchers’ data sharing practices. Researchers engaging with human subjects, such as those in health and medicine, are less likely to engage in data sharing, as many believe they do not have the right to share the data or are unsure about copyright and licensing (Tenopir et al. 2015). Furthermore, making data accessible and developing the required metadata is time-consuming, and the perceived effort involved also acts as a deterrent to data sharing (Stuart et al. 2018). There also appear to be age-related differences between perceptions of the value of data sharing and actual data sharing practices: older researchers (50+) claim to share significantly more data than younger researchers, but younger researchers indicate a more positive outlook on data sharing (Tenopir et al. 2015).

Degrees of Openness

Although there is a general ambition in the scientific community to strive for a model of Open Data sharing, ethical considerations sometimes call for access restrictions where human subject data is concerned, especially in the health and biomedical sciences (Merrett et al. 2018; Boulton et al. 2012). A key consideration here is whether the data to be shared consists of aggregate research data or of individual participant data (IPD). The sharing of IPD, even if de-identified, may give rise to re-identification concerns in the context of big data. In contrast, the sharing of aggregate data would generally not disclose information about individuals and, hence, would be safer to share openly. However, aggregate research data does not always allow for full reproducibility of results and is less beneficial for future research use (see for example Huang et al. 2016).
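The contrast between aggregate data and IPD drawn above can be illustrated with a minimal Python sketch. The records, field names, and choice of quasi-identifiers are invented for illustration and are not drawn from any study cited here; counting unique combinations of attributes is only one simple, assumed proxy for re-identification concern, not a method proposed in this paper.

```python
from collections import Counter

# Hypothetical de-identified individual participant data (IPD).
# All field names and values are invented for illustration only.
ipd = [
    {"age_band": "30-39", "area": "A", "occupation": "nurse", "colonised": True},
    {"age_band": "30-39", "area": "A", "occupation": "nurse", "colonised": False},
    {"age_band": "40-49", "area": "B", "occupation": "farmer", "colonised": True},
    {"age_band": "60-69", "area": "C", "occupation": "teacher", "colonised": False},
]

# Attributes that are not direct identifiers but may single a person out
# when combined (an assumption made for this sketch).
QUASI_IDENTIFIERS = ("age_band", "area", "occupation")


def aggregate_prevalence(records):
    """Return summary figures only: participant count and proportion colonised."""
    total = len(records)
    positive = sum(1 for r in records if r["colonised"])
    return {"n": total, "prevalence": positive / total}


def unique_combinations(records, keys=QUASI_IDENTIFIERS):
    """Count records whose combination of quasi-identifiers occurs exactly once."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return sum(1 for c in counts.values() if c == 1)


summary = aggregate_prevalence(ipd)   # what an aggregate release would contain
singletons = unique_combinations(ipd)  # records that stand out within the IPD
print(summary)
print(singletons)
```

In this toy data set the aggregate release contains no row-level information at all, whereas two of the four IPD rows carry a combination of attributes shared by no other row. It is rows of that kind that could, in principle, be matched against other data sources.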
The different models of access restriction vary significantly (Lowrance 2012) as do specific definitions but, generally, data access levels fall somewhere into the broad spectrum of open, restricted, and controlled. These access levels have been developed with two disparate mechanisms in mind: (1) security mechanisms to ensure that only bona fide researchers bound by professional obligations and specific agreements have access to the data under certain data security conditions; and (2) participant consent. Consent does not provide protections against potential re-identification but does enable the research participant to assume or decline to assume potential risks associated with access to their de-identified data.
There is a wide variety of data security mechanisms deployed by repositories, often linked to the sensitivity of the data. Examples of security mechanisms include, but are not limited to, the following: various levels of control may be imposed by the repository developer and custodian, often through formal data sharing agreements that articulate explicit researcher and institutional obligations, including a mandate not to attempt to re-identify participant data; data may be shared over secure platforms and may not be downloadable; members of the data repository are sometimes required to collaborate on projects; and, for some data, audit trails provide greater accountability and protections.

Key Issues

Funding bodies, publishers, and governments alike are strong supporters of open data sharing (and consequently the use of trusted repositories), but there are several issues requiring consideration. The following is not an exhaustive list:
The project on Policy RECommendations for Open Access to Research Data in Europe aims to assist with identifying uptake issues and resolving them through policy recommendations on open access to research data (Tsoukala et al. 2016). It provides valuable guidance for the different stakeholders (e.g. funders, publishers, data managers, research institutions) in recognition of the different roles they play in the open access ecosystem (Tsoukala et al. 2015). Additional issues, such as access by commercial actors to publicly funded research data, are also addressed (Finn et al. 2014).

Conflicts in Guidance and Policies

Data sharing is currently at various levels of implementation across the research spectrum worldwide. Hence, it may be mandated, encouraged, or not yet considered systematically in any phase of the research cycle, including in the development of research proposals, where such issues should be considered. As a result, the data sharing requirements of various entities may clash. For example, a scientific journal may mandate deposition and sharing of all research materials and products but a university may not yet formally consider data sharing as standard practice. On the other hand, Institutional Review Boards (IRBs) and other areas of the university are likely to have restrictive policies in relation to the disclosure and sharing of participant-level data. Differences in risks associated with data sharing also arise from the different kinds of data shared, as the following example illustrates.

Example: Genomic data sharing

Key Values

Many of the substantive and procedural values in this Framework (Xafis et al. 2019) bear on the practice of data sharing via repositories. In the section below we take up one of the steps in the decision-making process but discuss the values broadly so as to provide the context within which we are considering them.
When we come to the decision-making step-by-step process, we will again discuss these values specifically as they relate to the case study. With respect to the issues discussed in this Domain, relevant substantive values include the following:
Key procedural values include transparency, accountability, and trustworthiness. These values relate both to processes adopted throughout data sharing but also to decisions regarding the development of repositories. An example of a data repository which has clearly articulated governance policies is Brain-CODE (www.braincode.ca). The clarity of these documents is important, as they increase transparency, which may also impact on accountability and trustworthiness.
Case Study: Sharing Individual-Patient Level Data in Data Repositories

A clinician-researcher, Dr A, has completed a 2-year long city/state-wide, prospective observational study on the prevalence and risk factors for colonisation by antimicrobial drug-resistant bacteria in adult hospital inpatients. The study involved the collection of anterior nares (nose), groin, and rectal swabs and information on participants’ history of healthcare contact, recent antibiotic use, travel, as well as information on housing and occupation. Informed consent was obtained from all 2000 participants with the consent form stating that participants’ de-identified research data may be “shared for research and teaching purposes”. The approving IRB understood this to mean conferences, journal papers, workshops, and teaching activities, as deposition of data in a repository is not yet a required research practice at Dr A’s university. The university’s standard template for a data management plan, which Dr A had submitted, does not address the deposition of data into repositories. Dr A intends to deposit the research data in an online discipline-specific data repository. Making the data accessible for future research has been strongly encouraged by the funder of the study and is mandated by the journal in which Dr A intends to publish his findings.

Broad Considerations

This case exemplifies the difficulties that arise where the data sharing requirements of various entities clash, e.g. the journal mandates sharing of all research materials and products but the university has not yet formally considered data sharing as part of standard practice. It also highlights the fact that some stakeholders may be justifying their data sharing policies by appealing to certain values but inadvertently not attending to other important and relevant values.
Thus, an IRB that prioritises harm minimisation of research participants over other values might have restrictive policies in relation to the disclosure of participant data, even if such data is de-identified. Such reluctance to embrace the sharing of de-identified data sets may result from concerns about appropriately adhering to privacy legislation. Conversely, scientific journals might be primarily concerned with the value of accountability, which would prompt them to support data sharing to allow for reproducibility.

Another broad consideration the case highlights is the importance of specifying what research data would have to be shared: would aggregate data suffice or is the sharing of IPD required? The sharing of aggregate data would avoid disclosing any information about individuals and, hence, would be less problematic. However, aggregate research data often does not allow for full reproducibility of results and is also less beneficial for future research use.

Researchers should anticipate, ideally at the stage of planning the research, that they will be required to share or deposit some or all research data upon publication or completion of a research project. Thus, they should consider incorporating requests for funding to support the potential additional costs involved in the deposition of research data into repositories (preparing the data for re-use can be time-consuming and expensive depending on the data). In addition, they should develop appropriate designs for the level of data sharing depending on the sensitivity of the data.

Application of the Deliberative Balancing Approach

In this section, we apply the deliberative balancing approach that is introduced in Xafis et al. (2019) to the case study. The central question that we wish to consider is whether it would be appropriate for Dr A to upload the research data to an online research data repository.
There are four issues to consider:

1. Dr A wants to publish the findings in a reputable journal but the journal requires him to make all underlying data (including de-identified IPD) available in a data repository. The study funders strongly encourage such practices. This puts him in an ethically challenging position.
2. Participants have consented to their data being used in anonymised form for future research; yet, in the era of big data, it is not clear whether such anonymity can be guaranteed.
3. It is unclear whether the statement in the consent documents that data may be “shared for research and teaching purposes” adequately conveyed to the research participants that the data would be (potentially widely) shared through a data repository.
4. Technical issues to consider include the fact that securing anonymity may not be possible, especially when fine-grained individual participant data is involved and perhaps even more so when biological samples have been collected. Another technical issue is that it may be “impracticable” to re-contact the research participants to obtain their consent for the sharing of their data because of the number of participants involved (n = 2000) and because the study commenced 2 years ago.

Ethical issues include the following:
The following are substantive and procedural values from the list of 16 Key Values that are listed in Xafis et al. (2019). Other values (from the list of 16) may be relevant as well, but those listed below are the ones that we deem to be most pertinent. Deciding which values must be considered can be challenging at first. To identify the values, we need to focus on the problem at hand, the ethical issues the case raises and also the obligations that arise as a result of our relationships with others. One of the central issues here is the assurances given and commitments made to the research participants as well as their expectations which flow on from these. The researcher is in a relationship of trust and owes respect to his/her research participants. Any deviation from what research participants expect as part of their involvement in the research process could undermine their trust in the researcher and the research community more broadly. On the other hand, making research data available has the potential to yield considerable benefits in relation to promoting research integrity and public benefit. Taking into account the issues listed in the first step of the application of the deliberative balancing approach as well as the need to respect persons and meet their expectations, we decided that the following values are most relevant:
Autonomy/Liberty: Participants’ autonomy is respected if conditions are created for self-determination with respect to medical data that is about themselves. Respect for autonomy is a key reason why the researcher obtains consent from participants, as it allows participants to make decisions about what they wish to be involved in. Such decisions should be free from external pressures if a research participant’s freedom to choose is to be supported.

Privacy: Even though the research data is said to be de-identified, in the age of big data we must acknowledge the potential for re-identification. Such re-identification would be a violation of privacy expectations in the sense that it violates participants’ freedom from unauthorised data activities involving information about themselves. If the data did not identify individuals, participants would likely be supportive of the inclusion of their data but, once they are re-identified, their privacy has been compromised.

Public Benefit: When considering public benefits, we need to bear in mind that these benefits are not identified as such by all.
Justice:
Trustworthiness:
Transparency:
Accountability:
Several courses of action may be pursued. The most salient ones are:
Open sharing of IPD is considered valuable for the generation and testing of new hypotheses and the conduct of meta-analyses. Open sharing would strongly prioritise public benefits and is foundational to promoting accountability in the research enterprise.
The sharing of data can be done on a case-by-case basis which would involve a researcher identifying the research Dr A has done and contacting him for the underlying data. This form of data sharing relies on other researchers being familiar with the research someone has conducted, as the data is not publicly listed anywhere. Therefore, the underlying data cannot be discovered by researchers accessing repositories to identify suitable data for further analysis. Features unique to this kind of data sharing include the following:
Depositing the research data (including de-identified IPD) in a well-governed repository with restricted access is another option. This would reduce the risks to research participants for several reasons:
In combination, these features may suffice to provide adequate and reasonable data protections. Such requirements point to the weight given to ensuring that the privacy and confidentiality of individual participants are protected and that participants are not inadvertently harmed in the process of researchers sharing data for broader public benefits. Public benefit relating to the re-use of data would be supported by this course of action and aspects of the values of transparency and accountability would also be promoted.
Dr A could agree to provide aggregated data only, explaining to the journal that explicit consent for the deposition of IPD had not been requested at the start of the project. Dr A would be acting in accordance with his research participants’ expectations for respect, trustworthiness, and accountability but would potentially be viewed as not being transparent in his research practices, as the IPD would not be available for scrutiny by others.
Dr A could make efforts to re-contact research participants to explain the nature of the issue which would include clarifications of the following: new requirements to deposit all data in a repository in a de-identified form; efforts to ensure data cannot be re-identified but that this could not be guaranteed; re-use of data by bona fide researchers only. The provision of such information would demonstrate Dr A’s respect for participants and would show that the researcher is transparent about his intention to make available the data for future research.
It is usually impossible to satisfy all values that relate to a particular ethical concern but carefully considering the specific circumstances helps in weighing them against each other and identifying the option that can satisfy the most central values to the greatest degree. Here, it seems impracticable, and perhaps problematic, to re-contact the participants but we must ensure that their data and privacy are well protected. Such choices would promote the trust between the research community and publics and would provide evidence of the researcher’s respect for research participants, as their welfare is a central consideration. However, we must also bear in mind important considerations beyond the research participants themselves. A preferable option is to deposit the research data in a repository with restricted access. Dr A should identify a data repository with robust governance structures, which allows researchers to set the access level and which conducts appropriate screening of data requestors. Such screening may involve verifying affiliations, qualifications, and requiring a commitment they will not share the data with others or attempt to re-identify individuals. As previously noted, the specific features of the data, such as the level of sensitivity and the extent to which it can be meaningfully de-identified, will vary and will determine the level of access others could or should have to the data deposited in a data repository. Levels of access are often determined by each repository but researchers can also have input into this depending on the repository. Each repository makes governance documents available to researchers and other users and the level of detail in these documents reveals, to a large extent, the weight the repository places on many of the values discussed in this section. This option attempts to strike a balance between numerous values identified as underlying research and the increasing requirement to share data. 
On the one hand, it shows consideration for participants’ privacy and confidentiality by seeking to increase the technical and governance protections which all aim to reduce the potential for harm that might otherwise arise. On the other hand, restricted access to the IPD would enable other researchers to gain greater value from the data and to develop research projects that could further explore the area in question without engaging new research participants. Such research would require fewer funds, which, cumulatively, is of great benefit to the general public.
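The screening of data requestors described above (verifying affiliations and qualifications, and requiring commitments not to share the data onward or attempt re-identification) can be sketched in Python as a simple checklist. This is a hypothetical illustration only: the class, field names, and messages are assumptions made for this sketch and do not describe the procedure of any actual repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AccessRequest:
    """Hypothetical request record; every field is an assumption for illustration."""
    affiliation_verified: bool
    qualifications_verified: bool
    agrees_no_onward_sharing: bool
    agrees_no_reidentification: bool


def screen_request(request: AccessRequest) -> List[str]:
    """Return the unmet screening conditions; an empty list means all are met."""
    unmet = []
    if not request.affiliation_verified:
        unmet.append("affiliation not verified")
    if not request.qualifications_verified:
        unmet.append("qualifications not verified")
    if not request.agrees_no_onward_sharing:
        unmet.append("no commitment against onward sharing")
    if not request.agrees_no_reidentification:
        unmet.append("no commitment against re-identification")
    return unmet


# A requestor who meets every condition passes screening.
complete = screen_request(AccessRequest(True, True, True, True))
# A requestor who declines the re-identification commitment does not.
incomplete = screen_request(AccessRequest(True, True, True, False))
print(complete)
print(incomplete)
```

Returning the list of unmet conditions, rather than a bare yes or no, is a design choice made here so that a decision can be explained to the requestor, in keeping with the procedural values of transparency and accountability discussed earlier.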
Dr A must now contact the IRB and the funding body to advise them of the data repository he has selected. If his research is referenced on his university website, he could indicate there which restricted access data repository he has deposited the data in. This would increase discoverability.

Conclusion

This paper discussed the Domain of Openness in Big Data and Data Repositories. It presented issues that arise in open data sharing in the context of big data and provided insight into the nature of data repositories. The paper provided a case study which allowed us to firstly consider in a broader sense some values identified as being relevant. We then used the Framework by first identifying the values related to the case in question and then by applying the step-by-step decision-making process previously described (Xafis et al. 2019). Where necessary, explanations were given to elucidate further how selections and prioritisations were made. The recommended option was justified but further justification could be given by referring to the reasons why the other options were discounted.