How to use Python and R with RDF Data

How to use Python and R with RDF Data is a training developed in the context of the Swiss Personalized Health Network (SPHN) initiative. It is part of a series of trainings centred around the SPHN Interoperability Framework, which was developed by the SPHN Data Coordination Center (DCC). The framework aims to facilitate collaborative research by providing a decentralized infrastructure for exchanging and storing data, built on a strong semantic layer (the SPHN Dataset) and RDF-based graph technology.

Storing health-related data in compliance with the SPHN RDF schema makes it possible to query that data with SPARQL, which provides a solid foundation for answering specific research questions. Building on this foundation, general-purpose languages such as Python and R enable data scientists to apply further data science methods to the retrieved data. This training provides a short introduction to using Python and R to:

  • Set up a connection to a SPARQL endpoint
  • Run a SPARQL query and retrieve results
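
As a rough illustration of these two steps, the Python sketch below talks to a SPARQL endpoint directly over HTTP with only the standard library, following the SPARQL 1.1 Protocol, and flattens the JSON results into one dictionary per row. The endpoint URL and the repository ID "sphn" are assumptions (GraphDB serves repositories under /repositories/<id>, on port 7200 by default), and the function names are hypothetical; the training's own materials may use a dedicated client library instead.

```python
import json
import urllib.parse
import urllib.request

# Assumed GraphDB endpoint: GraphDB exposes each repository at
# http://<host>:7200/repositories/<repository-id>; "sphn" is a hypothetical ID.
ENDPOINT = "http://localhost:7200/repositories/sphn"

# A generic query; replace it with a query against your own data.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def run_select(endpoint: str, query: str, timeout: float = 30.0) -> dict:
    """Send a SELECT query (HTTP POST, form-encoded, as in the SPARQL 1.1
    Protocol) and return the decoded SPARQL JSON results document."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=data,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def to_rows(results: dict) -> list:
    """Flatten SPARQL JSON results into one dict per solution.
    Unbound variables are absent from a binding, so they map to None."""
    variables = results["head"]["vars"]
    return [
        {var: binding[var]["value"] if var in binding else None for var in variables}
        for binding in results["results"]["bindings"]
    ]
```

With a running endpoint, `to_rows(run_select(ENDPOINT, QUERY))` returns a list of dictionaries keyed by the query's variable names. Posting the query form-encoded, rather than in the URL, avoids URL length limits for long queries.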

Building on top of these basics, we look at how to combine results from different queries, as well as how to deal with various datatypes.
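
Along the same lines, the following sketch illustrates both follow-up topics under stated assumptions: it converts literals in SPARQL JSON results to Python types based on their datatype IRI (only a handful of XSD datatypes are covered; everything else is kept as a string) and combines the rows of two queries with an inner join on a shared variable. The function names are hypothetical, and the join is written by hand with the standard library; a data frame library could be used instead for larger results.

```python
from datetime import date, datetime
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Converters for a few common XSD datatypes; any other datatype stays a string.
CONVERTERS = {
    XSD + "integer": int,
    XSD + "int": int,
    XSD + "long": int,
    XSD + "decimal": Decimal,
    XSD + "double": float,
    XSD + "float": float,
    XSD + "boolean": lambda v: v in ("true", "1"),
    # Assumes plain YYYY-MM-DD values without a timezone suffix.
    XSD + "date": date.fromisoformat,
    # "Z" is rewritten first because datetime.fromisoformat rejects it
    # before Python 3.11; other ISO 8601 forms must be ones it accepts.
    XSD + "dateTime": lambda v: datetime.fromisoformat(v.replace("Z", "+00:00")),
}


def convert_term(term: dict):
    """Turn one SPARQL JSON RDF term into a Python value based on its datatype."""
    converter = CONVERTERS.get(term.get("datatype"))
    return converter(term["value"]) if converter else term["value"]


def to_typed_rows(results: dict) -> list:
    """One dict per solution, with literals converted; unbound variables -> None."""
    variables = results["head"]["vars"]
    return [
        {var: convert_term(b[var]) if var in b else None for var in variables}
        for b in results["results"]["bindings"]
    ]


def inner_join(left: list, right: list, key: str) -> list:
    """Combine rows from two queries that share the variable `key` (inner join).
    Rows where `key` is unbound (None) never match."""
    index = {}
    for row in right:
        if row[key] is not None:
            index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined
```

For example, joining the results of one query returning patients with their birth dates and another returning patients with a measured value keeps only the patients present in both, with each value already converted to `date`, `Decimal`, and so on.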

Prerequisites:

  • Basic knowledge of R and Python
  • Basic knowledge of RDF and SPARQL

This video assumes that your data is already loaded into your triplestore (GraphDB in our example) and that you are familiar with SPARQL. For instructions on loading data into your triplestore, please watch our training "RDF Schema and Data Visualization" or read our user guide. For a refresher on SPARQL, please watch our training "Querying Data with SPARQL".

After the training you will be able to:

  • Set up a connection to a SPARQL endpoint through R and Python
  • Run a SPARQL query through R and Python to extract specific data

Resources:

All resources are available on the training's GitLab space.

Licence: Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

Keywords: Clinical data, SPARQL, Query data, RDF, Knowledge graph, Python, R, GraphDB


Additional information

Target audience: Research Scientists, Data Managers, Biomedical Researchers, Bioinformaticians, Data Scientists

Resource type: Video, Training materials, Mock data, E-learning

Status: Archived

Authors: Personalized Health Informatics Group, Petar Horki

Contributors: Sabine Österle, Vasundra Touré

Scientific topics: Medical informatics, FAIR data, Data management, Computer science

Operations: Data retrieval, Data handling, Query and retrieval