E-CELL: software environment for whole-cell simulation

Bioinformatics

Pages 72-84

Masaru Tomita¹, Kenta Hashimoto¹, Kouichi Takahashi¹, Thomas Simon Shimizu¹,³, Yuri Matsuzaki¹, Fumihiko Miyoshi¹, Kanako Saito¹, Sakura Tanida¹, Katsuyuki Yugi¹, J. Craig Venter² and Clyde A. Hutchison III²

¹Laboratory for Bioinformatics, Keio University, 5322 Endo, Fujisawa, 252, Japan and ²The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA

Received on May 1, 1998; revised on October 16, 1998; accepted on November 2, 1998

Abstract

Motivation: Genome sequencing projects and further systematic functional analyses of complete gene sets are producing an unprecedented mass of molecular information for a wide range of model organisms. This provides us with a detailed account of the cell with which we may begin to build models for simulating intracellular molecular processes to predict the dynamic behavior of living cells. Previous work in biochemical and genetic simulation has isolated well-characterized pathways for detailed analysis, but methods for building integrative models of the cell that incorporate gene regulation, metabolism and signaling have not been established. We, therefore, were motivated to develop a software environment for building such integrative models based on gene sets, and running simulations to conduct experiments in silico.
Results: E-CELL, a modeling and simulation environment for biochemical and genetic processes, has been developed. The E-CELL system allows a user to define functions of proteins, protein-protein interactions, protein-DNA interactions, regulation of gene expression and other features of cellular metabolism, as a set of reaction rules. E-CELL simulates cell behavior by numerically integrating the differential equations described implicitly in these reaction rules. The user can observe, through a computer display, dynamic changes in concentrations of proteins, protein complexes and other chemical compounds in the cell. Using this software, we constructed a model of a hypothetical cell with only 127 genes sufficient for transcription, translation, energy production and phospholipid synthesis. Most of the genes are taken from Mycoplasma genitalium, the organism having the smallest known chromosome, whose complete 580 kb genome sequence was determined at TIGR in 1995. We discuss future applications of the E-CELL system with special respect to genome engineering.
Availability: The E-CELL software is available upon request.
Supplementary information: The complete list of rules of the developed cell model with kinetic parameters can be obtained via our web site at: http://e-cell.org/.
Contact: mt@sfc.keio.ac.jp

Introduction

The complete genomes of more than 18 microorganisms have been sequenced. The availability of this new information on the gene content of organisms has led to the emergence of a number of heretofore unavailable approaches to biology. Systematic analyses of genes/proteins are now under way in numerous centers around the world, and comprehensive catalogues of protein function are being constructed.

The challenge created by genomics is to understand how all the cellular proteins work collectively as a living system. By attempting to understand the dynamics in living cells, we should be able to predict consequences of changes introduced into the cell and/or its environment, e.g. knocking out a gene or altering available metabolites. Possible consequences of such intervention include cell death, changes in growth rate, and an increase or decrease in the expression of specific genes. The development of sufficiently refined cell models which allow predictions of such behavior would complement the experimental efforts now being made systematically to modify and engineer entire genomes.

In this paper, we present E-CELL, a computer software environment for modeling and simulation of the cell. The E-CELL system is a generic object-oriented environment for simulating molecular processes in user-definable models, equipped with graphical interfaces that allow observation and interaction. E-CELL provides a unified, object-oriented framework for modeling and simulation of the complex interactions among the gene products of completed genomes. Our modeling approach described in this paper attempts to link diverse cellular processes such as gene expression, signaling and metabolism, to construct a cell model for conducting experiments in silico.

Previous work in simulations of cellular processes

Many attempts have been made to simulate molecular processes in both cellular and viral systems. Perhaps the most active area of cellular simulation is the kinetics of biochemical metabolic pathways. Several software packages for quantitative simulation of biochemical metabolic pathways, based on numerical integration of rate equations, have been developed, including GEPASI (Mendes, 1993, 1997), KINSIM (Barshop et al., 1983; Dang and Frieden, 1997), MIST (Ehlde and Zacchi, 1995), METAMODEL (Cornish-Bowden and Hofmeyr, 1991) and SCAMP (Sauro, 1993).

In predicting cell behavior, the simulation of a single or a few interconnected pathways can be useful when the pathway(s) being studied is relatively isolated from other biochemical processes. However, in reality, even the simplest and most well-studied pathways, such as glycolysis, can exhibit complex behavior due to connectivity. Moreover, simulations of metabolic pathways alone cannot account for the longer time-scale effects of processes such as gene regulation, cell division cycle and signal transduction.

Several groups have proposed and analyzed gene regulation and expression models by simulation (Meyers and Friedland, 1984; Koile and Overton, 1989; Karp, 1993; Arita et al., 1994; McAdams and Shapiro, 1995). The cell division cycle (Tyson, 1991; Novak and Tyson, 1995) and signal transduction mechanisms (Bray et al., 1993) have also been active areas of research for biological modeling and simulation. Most of them have utilized qualitative models to deal with the general lack of quantitative data in molecular biology. However, while qualitative models are generally useful when information is incomplete (Kuipers, 1986), they often generate ambiguous results (Kuipers, 1985), the behaviors of which are difficult to predict due to combinatorial explosion (for a review on computer simulations in biology, see Galper et al., 1993).

Previous studies in biochemical and genetic simulations have usually limited their models to focus on only one of the several levels of the time-scale hierarchy in cellular processes. Linking the gaps between the various levels of this hierarchy is an extremely challenging problem that has yet to be adequately addressed. This paper presents a step towards integrative simulation of several levels of cellular processes.

Implementation of the E-CELL system

The E-CELL system is, in essence, a rule-based simulation system and is written in C++, an object-oriented programming language. The model consists of three lists, and is loaded at runtime. The substance list defines all objects which make up the cell and the culture medium. The rule list defines all of the reactions which can take place within the cell, and the system list defines spatial and/or functional structure of the cell and its environment. The state of the cell at each time frame is expressed as a list of concentration values of all substances within the cell, along with global values for cell volume, pH and temperature. The simulator engine generates the next state in time by computing all of the functions defined in the reaction rule list. In addition to using the sample models provided with the system, the user can create user-defined models by writing original substance and rule lists. Graphical interfaces are provided to allow observation and interaction throughout the simulation process.
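The three-list organization described above can be pictured with a small sketch. It is not E-CELL's file format or its C++ class design, neither of which is given here. The names `Substance`, `Rule`, `System`, `CellState` and `mass_action`, the single S-to-P reaction, and all numerical values are hypothetical placeholders chosen only to illustrate how a substance list, a rule list and a system list might combine into a state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Substance:
    """One entry of the (hypothetical) substance list."""
    name: str
    concentration: float  # arbitrary units


@dataclass
class Rule:
    """One entry of the rule list: maps concentrations to rates of change."""
    name: str
    velocity: Callable[[Dict[str, float]], Dict[str, float]]


@dataclass
class System:
    """One entry of the system list: a compartment and what it contains."""
    name: str
    substances: List[str]
    rules: List[str]


@dataclass
class CellState:
    """Concentrations of all substances plus the global values."""
    concentrations: Dict[str, float]
    volume: float
    pH: float
    temperature: float


def mass_action(substrate: str, product: str, k: float):
    """Build a first-order rate function for substrate -> product."""
    def velocity(conc: Dict[str, float]) -> Dict[str, float]:
        flux = k * conc[substrate]
        return {substrate: -flux, product: flux}
    return velocity


# Illustrative model: one compartment, two substances, one reaction.
substance_list = [Substance("S", 1.0), Substance("P", 0.0)]
rule_list = [Rule("S_to_P", mass_action("S", "P", k=0.5))]
system_list = [System("cytoplasm", ["S", "P"], ["S_to_P"])]

# The state is the list of concentrations plus the global values
# (the numbers below are placeholders, not parameters from the paper).
state = CellState(
    concentrations={s.name: s.concentration for s in substance_list},
    volume=1.0e-15,
    pH=7.0,
    temperature=310.0,
)
```

Keeping the rules as plain functions of the concentration table means a simulator loop only has to call every rule and collect the returned rates; it needs no knowledge of the chemistry behind each rule.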

A substance can be a substrate, product or catalyst of a reaction. Typical substances include proteins, protein complexes, DNA (genes), RNA and small molecules. The list of substance concentrations is updated with the new values computed by the simulator engine after each time interval.

In a single time interval, each rule in the rule list is called upon by the simulator engine to compute the change in concentration of each substance. The net change in concentration for each substance is added to the present concentration at the end of each time interval to update the set of state variables, i.e. to generate the next state of the cell. By encapsulating numerical integration methods into object classes, virtually any integration algorithm can be used for simulation of an E-CELL model. Furthermore, E-CELL allows a different numerical integration algorithm to be assigned to each compartment of the cell model, so that the simulation can be tuned to the user's purpose (e.g. simulation accuracy or speed). Different time intervals (Δt) can also be defined for each spatial or functional compartment, and the user can redefine them through the control panel at runtime. In the present version, the system defaults to 1 ms for Δt and the user can select between the first-order Euler [per-step error is O(Δt²)] or fourth-order Runge-Kutta [O(Δt⁵)] methods for the numerical integration in each compartment. The Euler method is used in compartments with discrete, stochastic reactions such as DNA-protein binding, and the Runge-Kutta method is used for compartments with deterministic reactions defined by continuous rate functions.
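The update cycle and the two integration methods named above can be illustrated with a minimal sketch. It assumes a single first-order reaction S → P with rate constant 1 s⁻¹ and time measured in seconds. The class names `Euler` and `RungeKutta4`, the helpers `net_rates` and `simulate`, and the compartment names in `integrators` are hypothetical and do not reproduce E-CELL's C++ classes. The code uses the textbook forms of both methods.

```python
import math
from typing import Callable, Dict, List

State = Dict[str, float]
RuleFn = Callable[[State], State]  # concentrations -> rates of change


def net_rates(rules: List[RuleFn], conc: State) -> State:
    """Sum every rule's contribution to each substance's rate of change."""
    total = {name: 0.0 for name in conc}
    for rule in rules:
        for name, rate in rule(conc).items():
            total[name] += rate
    return total


def _shift(conc: State, rates: State, h: float) -> State:
    """Return conc + h * rates, substance by substance."""
    return {n: conc[n] + h * rates[n] for n in conc}


class Euler:
    """First-order method: one rate evaluation per step."""
    def step(self, rules: List[RuleFn], conc: State, dt: float) -> State:
        return _shift(conc, net_rates(rules, conc), dt)


class RungeKutta4:
    """Classical fourth-order method: four rate evaluations per step."""
    def step(self, rules: List[RuleFn], conc: State, dt: float) -> State:
        k1 = net_rates(rules, conc)
        k2 = net_rates(rules, _shift(conc, k1, dt / 2))
        k3 = net_rates(rules, _shift(conc, k2, dt / 2))
        k4 = net_rates(rules, _shift(conc, k3, dt))
        return {n: conc[n] + dt / 6 * (k1[n] + 2 * k2[n] + 2 * k3[n] + k4[n])
                for n in conc}


def decay(conc: State) -> State:
    """Illustrative rule: S -> P with rate constant 1 per second."""
    flux = 1.0 * conc["S"]
    return {"S": -flux, "P": flux}


def simulate(integrator, dt: float, t_end: float = 1.0) -> State:
    """Advance the one-reaction model from t = 0 to t_end in steps of dt."""
    conc = {"S": 1.0, "P": 0.0}
    for _ in range(round(t_end / dt)):
        conc = integrator.step([decay], conc, dt)
    return conc


# Hypothetical per-compartment assignment: (integrator, step size in seconds).
integrators = {
    "dna_binding": (Euler(), 0.001),
    "cytoplasm": (RungeKutta4(), 0.001),
}

exact = math.exp(-1.0)  # analytical value of S at t = 1 s
euler_error = abs(simulate(Euler(), 0.001)["S"] - exact)
rk4_error = abs(simulate(RungeKutta4(), 0.001)["S"] - exact)
```

With the same 1 ms step, `rk4_error` comes out many orders of magnitude smaller than `euler_error`, at the cost of four rate evaluations per step instead of one. This is the accuracy-versus-speed trade-off that motivates choosing an integrator for each compartment.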