Data Curators Enhance Skills in Database and Omics Data Analysis

From March 16 to 18, the Latvian Institute of Organic Synthesis hosted the Training in Databases and Omics Data Analysis, a workshop focused on working with various biomedical and research databases, as well as on effective data retrieval and analysis. Mikus Melderis, a representative of the Latvian Data Curators Network and a data steward at the Higher Education and Science IT Shared Services Center (VPC), also participated in the training.

The first day of the programme was dedicated to clinical and regulatory databases, covering how to search for information on drugs, their approval status, and clinical trials using resources such as the EMA (European Medicines Agency) and FDA (U.S. Food and Drug Administration) databases. While these databases are reliable, navigating them can be challenging, and the data are not always complete or easily accessible.

On the second day, participants explored a broader range of research databases, including UniProt (Universal Protein Resource), NCBI (National Center for Biotechnology Information), EMBL-EBI (European Molecular Biology Laboratory – European Bioinformatics Institute), and various omics data repositories. These databases often function as “databases of databases”, providing access to very large volumes of data, but they also require a critical approach to assess data quality. Some resources were explored hands-on, such as GEO (Gene Expression Omnibus), which offers standardised and well-documented omics datasets.

The training also emphasised that some biomedical databases may contain incomplete or outdated information, for example in drug target annotations or clinical data, so it is essential to verify information against primary sources and publications. Additionally, some platforms provide only limited access to data or impose specific conditions of use.

The final day focused on using Application Programming Interfaces (APIs) to access databases, enabling automated retrieval of large datasets and the creation of reproducible analysis workflows. Manual data retrieval was compared with API access, highlighting that APIs return more structured, more easily processed data but require technical skills to use.

Overall, the training provided a comprehensive overview of modern approaches to biological data analysis, along with practical knowledge of how to access and work with large-scale biological datasets efficiently. It strengthened data stewards’ ability to navigate diverse research data sources, critically assess data quality, and select the most appropriate tools for specific research questions. The skills gained will allow stewards to support researchers more effectively in locating, retrieving, and reproducibly analysing data, particularly when working with large-scale biomedical datasets.