Short description
(English)
|
The goal of this research project is to investigate new data mining and machine learning techniques for extracting useful biological knowledge from the fusion of various heterogeneous data sources. The need for new analytical tools is motivated mainly by the difficulty of merging these data sources, which have intrinsically high dimensionality, often coupled with a small number of observations and high levels of feature redundancy. These properties can lead to unstable models and thus reduce experts' confidence in the analysis results.
|
Partners and International Organizations
(English)
|
AT, BE, CH, CY, CZ, DE, DK, ES, FI, FR, GR, IE, IT, MK, NL, NO, PL, SE, UK
|
Abstract
(English)
|
The development of high-throughput technologies, such as microarrays and mass spectrometry, has resulted in a wealth of experimental data from biological laboratories that has to be analyzed. The goal of this research project is to investigate new data mining and machine learning techniques for extracting useful biological knowledge from the fusion of various heterogeneous data sources. The need for new analytical tools is motivated mainly by the difficulty of merging these experimental data, which have intrinsically high dimensionality, often coupled with a small number of observations and high levels of feature redundancy. These properties can lead to unstable models and thus reduce experts' confidence in the analysis results. Based on these results, biologists plan targeted experiments that require considerable effort and investment, so they have to be sure that the results of the analysis are as robust as possible. This project focuses on two main research directions. First, we explore novel data mining approaches adapted to the idiosyncrasies of proteomics (and other -omics) data, addressing in particular the High Dimensionality - Small Sample size (HDSS) problem. Within this task we develop learning methods that address HDSS with a view to increasing the stability of the learned models and managing feature redundancy. Second, we develop new data mining tools that learn from multiple heterogeneous sources corresponding to diverse experimental data, such as different measurement technologies (e.g. different forms of mass spectrometry for proteomics) and/or measurements of different -omics.
|