Research data comprises all the information created, collected, recorded, and processed during research to produce and verify its results. Over the course of a research project, research data pass through different phases - the so-called data life cycle. Research Data Management (RDM) is key to getting the processes right at every stage of this cycle. The handling of research data is an increasingly prominent concern for funders: a data management plan is often required when a project application is submitted, and emphasis is also placed on opening and sharing data. Adherence to the FAIR principles contributes significantly to maximising the reuse of research data. At Masaryk University, the FAIR principles are implemented by the Open Science Team at the Institute of Computer Science within the framework of the MU Open Science strategy.
The Data Life Cycle describes all phases of working with scientific data from the beginning to the end of a project. ELIXIR RDMkit offers a clear view of these life cycle phases, including a detailed description of each phase and the questions we should ask at each stage of the life cycle.
FAIR Data Principles are a set of recommendations that, when applied to research data, increase its potential for reuse. Adherence to these principles is a prerequisite for sharing and eventually opening data according to the principle "as open as possible, as closed as necessary".
The main FAIR principles are:
- Findability. For data to be usable, it must first be findable. Sufficient metadata with persistent identifiers, indexed in searchable systems, is key to fulfilling this principle.
- Accessibility. This principle ensures that data and metadata can be accessed - either openly or through authentication and authorisation - using a standard communication protocol.
- Interoperability. Ensures integration with other data using standard data formats, controlled vocabularies, and linkage to other (meta)data through formalized references.
- Reusability. The ultimate aim of the FAIR principles: reuse is enabled by rich metadata, publication under a clearly defined licence, documented provenance linking data to its origin, and adherence to the standards of the scientific field.
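The principles above can be made concrete with a minimal, hypothetical metadata record. The field names below are illustrative (loosely modelled on DataCite-style metadata) rather than a mandated schema, and the DOI is invented:

```python
# Sketch of a FAIR-oriented dataset metadata record.
# Field names are illustrative (loosely DataCite-like), not a fixed schema.
import json

record = {
    # Findability: a persistent identifier and rich, indexable descriptors
    "identifier": {"type": "DOI", "value": "10.1234/example-dataset"},  # hypothetical DOI
    "title": "Example measurement dataset",
    "keywords": ["example", "measurement", "open-data"],
    # Accessibility: how the (meta)data can be retrieved
    "access": {"protocol": "https", "rights": "open"},
    # Interoperability: standard formats and controlled vocabularies
    "format": "text/csv",
    "vocabulary": "schema.org/Dataset",
    # Reusability: licence and provenance
    "license": "CC-BY-4.0",
    "provenance": {"createdBy": "Example Lab", "derivedFrom": None},
}

print(json.dumps(record, indent=2))
```

Note how each group of fields maps onto one of the four principles; a repository's own metadata form typically captures the same information.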
Open data are scientific data that meet the FAIR principles listed above and are available in digital form under an open licence (e.g. Creative Commons). The main arguments for open research data are its reusability, the possibility to validate scientific claims, the reduction of unnecessary duplication of research, the possibility of integration with other datasets, the acceleration of the scientific process, and increased scientific collaboration. Opening up data where possible is therefore highly desirable. This requirement also appears increasingly in funders' project schemes and in the policies of peer-reviewed journal publishers.
TIP: RDMkit – a guide to good data management practice in the life sciences.
Data management planning is the process of planning how scientific data will be handled throughout the life cycle of a project.
A data management plan (DMP) is a document that defines these activities and processes. It is a living document that should be continuously updated throughout the research project.
In order to work with the DMP as efficiently as possible, it is desirable to use an appropriate tool to create and update it. A DMP can be created as a shared document, for example, but there are a number of applications available specifically for creating DMPs (e.g. DMPonline). Masaryk University has a university instance of Data Stewardship Wizard (DSW), which is now also the recommended tool for MU staff.
TIP: DMP examples
Masaryk University data storage - MU has several types of data storage available for different types of data. The Storage Usage Recommendations are used to decide which type of storage to use for specific types of data.
SensitiveCloud - infrastructure for working with sensitive data.
CESNET Storage Department - for backup, archiving, data sharing.
EOSC - The European Open Science Cloud is a European Commission initiative to develop a common research infrastructure with freely available services for storing, processing, sharing, analysing, and reusing scientific data. EOSC is currently being implemented in the Czech Republic by creating a national node for the initiative and promoting good practice in research data management across scientific communities. The main objective is to create a so-called National Data Infrastructure for sharing, managing, and accessing data and computing resources for research purposes.
Scientific data can be stored and shared in data repositories, which can be divided into institutional, national, disciplinary, and general repositories (e.g. Zenodo). A repository is selected mainly according to disciplinary specifics, the services and tools offered (persistent identifiers, open access, dataset licensing, versioning, etc.), trustworthiness (certified repositories), and the requirements of the publisher or funding provider.
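As a sketch of what depositing in a general repository involves in practice, Zenodo exposes a REST deposit API. The snippet below only builds the metadata payload and shows the authenticated call as a comment, since a real deposit needs a personal access token; the title, creator, and licence values are hypothetical placeholders:

```python
# Sketch: preparing a dataset deposit for the Zenodo REST API.
# The endpoint and metadata keys follow Zenodo's deposit API;
# the title, creator, and licence values are hypothetical.
import json

ZENODO_API = "https://zenodo.org/api/deposit/depositions"

deposit = {
    "metadata": {
        "title": "Example measurement dataset",  # hypothetical
        "upload_type": "dataset",
        "description": "Raw measurements with processing scripts.",
        "creators": [{"name": "Doe, Jane", "affiliation": "Masaryk University"}],
        "license": "cc-by-4.0",
    }
}

# The actual deposit would be an authenticated HTTP POST, e.g. with `requests`:
#   requests.post(ZENODO_API, params={"access_token": TOKEN}, json=deposit)
print(json.dumps(deposit["metadata"], indent=2))
```

After the deposit is created and files are uploaded, publishing it mints a DOI - the persistent identifier that the Findability principle calls for.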
The National Repository operated by CESNET is currently in pilot operation.
Scientific integrity is generally understood as adherence to the values, principles, practices, and ideals of science and research. The moral integrity and ethics of a scientist have implications not only for science as such but also for the great social responsibility of the profession. The integrity of research data is fundamental to the quality of research and plays a key role in its reproducibility and reuse. Many errors can occur in the handling of research data that are inconsistent with scientific integrity. It is therefore crucial to set clear and transparent rules for data handling that minimise the possibility of intentional and unintentional misconduct, which can lead to reputational damage for both the researcher and the institution. It is also essential to cultivate the scientific environment itself, whose systemic incentives can encourage unethical behaviour.
So how can we guarantee the integrity of research data?
Data management planning and the creation of a DMP can help us think thoroughly and comprehensively about all phases of the data life cycle and thus significantly improve the handling of data in the different areas and phases of research. The key is to set clear rules for data organisation and archiving (How and where will I keep the data, and for how long?), train research staff, keep accurate documentation, ensure sufficient metadata, and share and store raw data for possible future use.
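One concrete technique for detecting unintended changes to archived data is recording checksums alongside the files. The sketch below (file and directory names are hypothetical) computes SHA-256 digests with Python's standard library and writes them to a manifest that can be re-verified later:

```python
# Sketch: fixing the state of archived data files with SHA-256 checksums.
# Re-computing and comparing digests later reveals alteration or corruption.
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(data_dir: Path, manifest: Path) -> None:
    """Record '<digest>  <name>' for every file in data_dir."""
    lines = [f"{sha256_of(p)}  {p.name}"
             for p in sorted(data_dir.iterdir()) if p.is_file()]
    manifest.write_text("\n".join(lines) + "\n")

# Example usage with a temporary directory standing in for a dataset:
with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    (d / "raw.csv").write_text("sample,value\nA,1\n")
    write_manifest(d, d / "CHECKSUMS.txt")
    print((d / "CHECKSUMS.txt").read_text())
```

Storing such a manifest next to the archived data, and verifying it on each access, gives a transparent and auditable record that the data has not changed since deposit.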
If an error in data handling is discovered only after the research results have been published, the appropriate response is to retract the publication.
Laboratory notebooks and the detailed records they contain are the optimal way to archive all the actions, ideas, and discoveries made during research. They are also a good tool for working with data more efficiently. A laboratory notebook can take the form of handwritten notes or an electronic laboratory notebook (ELN). It serves not only as documentation of your work (among other things, as evidence in any dispute over who originated the data, i.e. who holds the intellectual property) but also as a tool for collaboration and information transfer within the lab or research team.
For example, Jupyter Notebooks offer an interactive environment for documented data processing, including the sharing of notebooks, the data used, and DOIs.