Mining High Dimensional and Heterogenous Data in EuroKUP - Texts

Research unit

COST

Project number

C09.0164

Project title

Mining High Dimensional and Heterogenous Data in EuroKUP

Texts for this project

Category	German	French	Italian	English
Key words	-	-	-	yes
Research programs	-	-	-	yes
Short description	-	-	-	yes
Partners and International Organizations	-	-	-	yes
Abstract	-	-	-	yes
References in databases	-	-	-	yes

Inserted texts

Category	Text
Key words (English)	data mining; machine learning; intelligent data analysis; genomics; proteomics
Research programs (English)	COST-Action BM0702 - Urine and Kidney Proteomics
Short description (English)	The goal of this research project is to investigate new data mining / machine learning techniques for extracting useful biological knowledge from the fusion of various heterogeneous data sources. The need for new analytical tools is mainly motivated by the difficulty of merging these data sources, which have intrinsically high dimensionality, often coupled with a small number of observations and high levels of feature redundancy; this can lead to the construction of unstable models, reducing experts' confidence in the analysis results.
Partners and International Organizations (English)	AT, BE, CH, CY, CZ, DE, DK, ES, FI, FR, GR, IE, IT, MK, NL, NO, PL, SE, UK
Abstract (English)	The development of high-throughput technologies, such as microarrays and mass spectrometry, has produced a wealth of experimental data in biological laboratories that must be analyzed. The goal of this research project is to investigate new data mining / machine learning techniques for extracting useful biological knowledge from the fusion of various heterogeneous data sources. The need for new analytical tools is mainly motivated by the difficulty of merging these experimental data, which have intrinsically high dimensionality, often coupled with a small number of observations and high levels of feature redundancy; this can lead to the construction of unstable models, reducing experts' confidence in the analysis results. Based on the experimental results, biologists plan targeted experiments that require considerable effort and investment, so they must be sure that the results of the analysis are as robust as possible. This project focuses on two main research directions. First, we explore novel data mining approaches adapted to the idiosyncrasies of proteomics (and other -omics) data, addressing in particular the High Dimensionality, Small Sample size (HDSS) problem. Within this task we develop learning methods for HDSS data with a view to increasing the stability of the learned models and managing feature redundancy. The second research direction focuses on developing new data mining tools that learn from multiple heterogeneous sources corresponding to diverse experimental data, such as different measurement technologies (e.g. different forms of mass spectrometry for proteomics) and/or measurements of different -omics.
References in databases (English)	Swiss Database: COST-DB of the State Secretariat for Education and Research, Hallwylstrasse 4, CH-3003 Berne, Switzerland. Tel. +41 31 322 74 82. Swiss Project-Number: C09.0164