PhD project title

Iterative Approximate Analysis of Graph-Structured Data For Precision Medicine

Outline description, including interdisciplinary and international dimensions (300 words max)

Graph-structured data are crucial to numerous disciplines, such as bioinformatics, artificial intelligence, and cybersecurity. Analysing graph-structured data is, however, time-consuming and hard to scale up to large parallel computers. The performance of graph analysis is often constrained by computational complexity (many graph problems are NP-complete) and by inefficient use of the memory system. Moreover, parallel algorithms often give rise to huge intermediate data sets, which exacerbates the memory bottleneck. This PhD project aims to address these systems issues by leveraging a key characteristic of data analysis: models (represented here as graphs) only approximate the real world, and the data on which these models are built are incomplete and subject to noise. As such, this project aims to design algorithms and data structures that iteratively approximate the graph analysis. Such an approach is expected to reduce computation and memory consumption, leading to faster analysis with quantifiable error bounds. The principles of the approach will be encoded in a graph processing framework to demonstrate wide applicability.

The project will work closely with two driving application domains: bioinformatics and cognitive systems. Bioinformatics builds on the analysis of rich data sets to gain insight into fundamental biomedical processes. Network approaches enable understanding of the mechanisms underpinning normal biology and disease. This project applies network analysis to identify factors that control clinical outcomes for patients with cancer and COVID-19.

Knowledge extraction is a crucial step in data-driven cognitive systems, which aim to extract and analyse large amounts of data and to structure them for efficient querying. Unstructured data can be represented faithfully using knowledge graphs that describe the relations between entities. A variety of graph processing algorithms are applied to knowledge graphs, in particular to identify the core knowledge in the graph and to leverage the graph data for predictive purposes.

 

Key words/descriptors

Knowledge extraction, scalable algorithms, graph processing, biomedical informatics, cancer, coronavirus, COVID-19, SARS-CoV-2

Fit to CITI-GENS theme(s)

Life Sciences, IT

Name and discipline of secondary supervisor (from a complementary discipline)

First Supervisor: Dr Hans Vandierendonck   School: EEECS

Second Supervisor: Dr Ian Overton   School: MDBM

Third Supervisor (if relevant): Dr Leonidas Georgopoulos   School: IBM Research-Zurich

 

Name of non-HEI partner(s)

IBM Research-Zurich

Contribution of non-HEI partner(s) to the project:

IBM will provide industrially relevant context on knowledge extraction from graph-structured data. They have extensive experience in this area, having built scalable software systems for the analysis of massive-scale graph data. They will also provide access to relevant data sets. A 3-6 month internship at their premises will solidify the collaboration and provide an excellent training opportunity for the student, including exposure to engineering, collaboration and research in a multinational industrial research organisation.