High performance software for analysis of mass spectrometry data applied to challenges in Global Food Security

School of Electronics, Electrical Engineering and Computer Science 
& ECIT Global Research Institute

Proposed Project Title:
High performance software for analysis of mass spectrometry data applied to challenges in Global Food Security 

Principal Supervisor: Dr Charles Gillan
Second Supervisor: Dr Olivier Chevallier (IGFS)

Project Description:

This project involves working with, and further developing, very large and complex pieces of software to assist research in global food security. The core computer science research challenge is to enable parallel computation for the data processing and mathematical kernels needed to extract information from mass spectrometry data. This is one instance of the problem domain known as Big Data, in which vast quantities of data must be processed in realistic time intervals in order to extract valuable information.

One example of the software that could be used to launch the project is the ProteoWizard Library and Tools [1,2,3], a set of modular and extensible open-source, cross-platform tools and software libraries that facilitate proteomics data analysis.  The libraries enable rapid tool creation by providing a robust, pluggable development framework that simplifies and unifies data file access, and performs standard chemistry and LCMS dataset computations.

The Institute for Global Food Security (IGFS) at QUB operates several mass spectrometers, each of which is capable of producing several terabytes of digital data per week. Computational analysis of this data reveals the details of the compounds that were present in the original sample. This information is vital in determining whether or not food fraud is taking place.

According to one report [4], food fraud costs UK food and drink manufacturers £11.2bn a year, equivalent to 85% of their total profits. The outcomes of this project could help improve the detection of food fraud attempts.

QUB has invested in computing infrastructure to support the analysis of this mass spectrometry data. This is a collaboration between the ECIT Institute and IGFS. The student will join this collaboration and develop a strategy to implement parallel workflows within the processing software pipeline. The project will be based in the ECIT Institute and is therefore primarily a computer science project, developing research skills in concurrent software programming and high performance computing for data analytics. The student will develop extensive skills in areas including C++, C, OpenCL, operating systems, parallel computing, distributed computing, code optimisation, numerical analysis, computational mathematics, databases and computational chemistry. The general application is data analytics, a field that draws on many mathematical methods from statistics. An interest in chemical analysis provides an alternative motivation that may appeal to some students. Knowledge of biology and/or chemistry, or at least a strong interest in learning them, will be essential for success in this project.

References

[1]  A cross-platform toolkit for mass spectrometry and proteomics. Chambers, M.C., MacLean, B., Burke, R., Amode, D., Ruderman, D.L., Neumann, S., Gatto, L., Fischer, B., Pratt, B., Egertson, J., Hoff, K., Kessner, D., Tasman, N., Shulman, N., Frewen, B., Baker, T.A., Brusniak, M.-Y., Paulse, C., Creasy, D., Flashner, L., Kani, K., Moulding, C., Seymour, S.L., Nuwaysir, L.M., Lefebvre, B., Kuhlmann, F., Roark, J., Rainer, P., Detlev, S., Hemenway, T., Huhmer, A., Langridge, J., Connolly, B., Chadick, T., Holly, K., Eckels, J., Deutsch, E.W., Moritz, R.L., Katz, J.E., Agus, D.B., MacCoss, M., Tabb, D.L. & Mallick, P. Nature Biotechnology 30, 918-920 (2012).
[2] ProteoWizard: Open Source Software for Rapid Proteomics Tools Development. Kessner, D., Chambers, M., Burke, R., Agus, D. & Mallick, P. Bioinformatics (2008). doi: 10.1093/bioinformatics/btn323.
[3] http://proteowizard.sourceforge.net/project.shtml
[4] http://www.foodmanufacture.co.uk/Manufacturing/Food-fraud-the-true-cost
Tags: C++, C, bit manipulations, parallel computing, optimisation, workflows, chemical analysis


 

Contact details

Supervisor Name: Dr Charles J Gillan
Tel: +44 (0)28 90971847
Email: c.gillan@qub.ac.uk
QUB Address: The ECIT Institute, Queen’s Road, Queen’s Island, BT3 9DT