2nd INCF Congress of Neuroinformatics: September 6-8, 2009
- Kenji Doya
- Alon Halevy
- Astrid Prinz
- Andrew Schwartz
- Shankar Subramaniam
- Arthur Toga
- Bart ter Haar Romeny
- Uri Eden
- Klaus Linkenkaer-Hansen
- Tim Clark
- Alan Ruttenberg
- Jeffrey Grethe
- Arnd Roth
- Wulfram Gerstner
- Peter Hunter
- Markus Diesmann
- Andrey Semin
- Pietro Liò
- Albert Cardona
- Giorgio Ascoli
- Rolf Kötter
Title: Data Integration from Genome to Phenotypes
University of California at San Diego, San Diego, USA
Abstract: We are witnessing the emergence of the "data rich" era in biology. The myriad data in biology, ranging from sequence strings to complex phenotypic and disease-relevant data, pose a huge challenge to modern biology. The standard paradigm in biology that deals with "hypothesis to experimentation (low throughput data) to models" is being gradually replaced by "data to hypothesis to models and experimentation to more data and models". Unlike data in the physical sciences, data in the biological sciences are almost guaranteed to be highly heterogeneous and incomplete. In order to make significant advances in this data rich era, three things are essential: robust data repositories that allow interoperable navigation, query and analysis across diverse data; a plug-and-play tools environment that will facilitate seamless interplay of tools and data; and versatile user interfaces that will allow biologists to visualize and present the results of analysis in the most intuitive and user-friendly manner. This talk will address several of the challenges posed by the enormous need for scientific data integration in biology, with specific exemplars and possible strategies. The issues addressed will include:
- Architecture of Data and Knowledge Repositories
- Databases - Flat, Relational and Object-Oriented; what is most appropriate?
- The imminent need for Ontologies in biology
- The Middle Layer: How to design it?
- Applications and integration of applications into the middle layer
- Reduction and Analysis of Data - the largest challenge!
- How to integrate legacy knowledge with data?
- User Interfaces: web browser and beyond
The complex and diverse nature of biology mandates that there is no "one solution fits all" model for the above issues. While there is a need to have similar solutions across multiple disciplines within biology, the need to also deal with context, which is everything in some cases, poses severe design challenges. For example, can a system that describes cellular signaling also describe developmental genetics? Can the ontologies that span different areas (e.g. anatomy, gene and cellular biology, functional imaging) be compatible and connective? Can the detailed biological knowledge accrued painstakingly over decades be easily integrated with high throughput data? These are only a few of the questions that arise in designing and building modern data and knowledge systems.
Bio sketch: Shankar Subramaniam is a Professor of Bioengineering, Chemistry and Biochemistry, Cellular and Molecular Medicine and Nano Engineering. He is currently the Chair of the Bioengineering Department at the University of California at San Diego. He holds the inaugural Joan and Irwin Jacobs Endowed Chair in Bioengineering and Systems Biology. He was the Founding Director of the Bioinformatics Graduate Program at the University of California at San Diego. He also has adjunct Professorships at the Salk Institute for Biological Studies and the San Diego Supercomputer Center. Prior to moving to UC San Diego, Dr. Subramaniam was a Professor of Biophysics, Biochemistry, Molecular and Integrative Physiology, Chemical Engineering and Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign (UIUC). In 2002 he received the Genome Technology All Star Award. In 2008 he was awarded the Faculty Excellence in Research Award at the University of California at San Diego.
Subramaniam has played a key role in raising national awareness of training and research in bioinformatics. He served as a member of the National Institutes of Health (NIH) Director's Advisory Committee on Bioinformatics, which resulted in the Biomedical Information Science and Technology Initiative (BISTI) report. He is currently an overseas advisor for the Department of Biotechnology of the Government of India, and a member of a European Science Foundation Panel.
Research in the Subramaniam laboratory spans several areas of bioinformatics and systems biology. In bioinformatics he is involved in developing novel strategies for identifying protein interaction networks, intracellular localization of proteins and identification of functional networks in cells. In systems biology he is involved in deciphering mammalian cellular networks from high throughput and phenotypic data and in developing strategies for modeling cellular signaling networks.
He continues to be engaged in developing state-of-the-art infrastructure for bioinformatics. The Molecule Pages Database has been recognized as the most innovative informatics resource for signaling proteins and received the ALSIP award. It integrates highly innovative and complex computer science strategies with expert-driven curation to provide comprehensive information on all known functional states of signaling molecules. The LipidMaps database serves as the first and only integrated resource for mammalian lipids along with their complementary gene and protein data. The microarray server, widely used by the research community, combines sophisticated statistical analysis methods developed in the Subramaniam laboratory with biochemical annotations and pathways to provide biological insights into the consequences of transcriptional changes in mammalian cells.