########################################################
Structure and Parameter Optimization with *poptimizer*
########################################################

===================================
*poptimizer* Documentation
===================================

Introduction
---------------------------------

*poptimizer* is an application for optimizing the structure and parameters of stochastic P system models using evolutionary algorithms. It takes a library of modules that represent basic biological processes of interest and combines them in many different ways to discover an assembly that mimics the behavior of the target data. During the search, each candidate model is evaluated by simulating its behavior with *mcss*. *poptimizer* and *mcss* are being used to develop Systems and Synthetic Biology computational models of bacterial colonies and plant systems.

Installation
---------------------------------

For instructions on how to compile and install *poptimizer*, see the README file included with the *poptimizer* distribution.

Running *poptimizer*
---------------------------------------

Once installed, *poptimizer* is run by typing the following command::

    $ poptimizer PARAMETER_FILE

where *PARAMETER_FILE* is a file containing the input parameters required by *poptimizer*. For example, to run the promoter model optimization provided in the directory *examples/* of the *poptimizer* distribution, change to the corresponding directory and type::

    $ poptimizer all_para_promoter_inputpara.xml

The output of the optimization procedure is written to several files. The log, which records the generation number, the number of function evaluations, and the fitness of the best solution, is saved to *evolveprocess_Run0.txt*. The best P system obtained at the end of the optimization is saved to *bestPsystem_Run0.txt*, and the corresponding time series to *bestsimulation_Run0_initfile0.txt*.
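
A run can also be scripted. The following is a minimal Python sketch and is not part of the *poptimizer* distribution: the helper names ``missing_outputs`` and ``run_poptimizer`` are hypothetical, and the sketch assumes that ``poptimizer`` is on the ``PATH`` and that the run-0 output files named above are written to the working directory.

.. code-block:: python

    import subprocess
    from pathlib import Path

    # Output files documented above for run 0 (names taken from this section).
    RUN0_OUTPUTS = (
        "evolveprocess_Run0.txt",
        "bestPsystem_Run0.txt",
        "bestsimulation_Run0_initfile0.txt",
    )


    def missing_outputs(workdir):
        """Return the documented run-0 output files not present in workdir."""
        workdir = Path(workdir)
        return [name for name in RUN0_OUTPUTS if not (workdir / name).is_file()]


    def run_poptimizer(parameter_file, workdir="."):
        """Run ``poptimizer PARAMETER_FILE`` in workdir (assumed to be on PATH)."""
        # check=True raises CalledProcessError if poptimizer exits with an error.
        subprocess.run(["poptimizer", str(parameter_file)], cwd=workdir, check=True)
        # Assumption: outputs land in the working directory.
        return missing_outputs(workdir)

Calling ``run_poptimizer("all_para_promoter_inputpara.xml")`` from the example directory would then return an empty list if all three files were produced.
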

Currently, *poptimizer* can process two types of biological time series input data:

1) time series data of multiple target objects under one initial state;
2) time series data of multiple target objects under different initial states.

*poptimizer* Parameter File
---------------------------------------------

The structure of the parameter file required to run *poptimizer* is described in the file *poptimizer-parameters-template.xml* under the directory *src/poptimizer/*. Example parameter files can be found under the directory *examples/*.

Models
-----------------------------------------

The models built by *poptimizer* have flexible structure and parameters. A model is composed of a set of elementary modules, previously specified in a library, that act as its building blocks. Users can define their own module library, based on specific domain knowledge or simply on elementary biological motifs described in the Systems Biology literature. While certain modules have fixed rules and kinetic constants (fixed module library), others can be instantiated with different objects (proteins, genes, etc.) and parameter values (non-fixed library). Many kinetic constants for well-known reactions can be taken from the literature and entered in the library, whereas others need to be evolved by the parameter optimization methods available in *poptimizer*.

Model Structure Optimization
------------------------------------------------------

Model structure optimization concerns the choice of which modules should compose the model. The number of modules and their instantiation (that is, the choice of objects for each module) are also explored in order to minimize the error between the output data generated by the model and the target data. A genetic algorithm that selects, recombines, and mutates different sets of modules is used to optimize the model structure.
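
To make the idea concrete, the following toy sketch shows a genetic algorithm that selects, recombines, and mutates sets of modules. It is an illustration under stated assumptions, not *poptimizer*'s implementation: the module names, the set-based encoding, the operators (tournament selection, set recombination, toggle mutation, elitism), and the placeholder error function are all hypothetical. In *poptimizer* the error is obtained by simulating the candidate model with *mcss* and comparing the result with the target data.

.. code-block:: python

    import random

    # Hypothetical module library; real libraries are defined in XML files.
    LIBRARY = ["UnReg", "PosReg", "NegReg", "Deg", "Cmplx"]
    # Hypothetical "true" structure, used only by the placeholder error below.
    TARGET = frozenset({"UnReg", "NegReg", "Deg"})


    def error(model):
        """Placeholder error: number of modules that differ from TARGET."""
        return len(model ^ TARGET)


    def random_model(rng):
        """Draw a random non-empty subset of the library."""
        size = rng.randint(1, len(LIBRARY))
        return frozenset(rng.sample(LIBRARY, size))


    def tournament(population, rng, k=2):
        """Select the lowest-error model among k random candidates."""
        return min(rng.sample(population, k), key=error)


    def recombine(a, b, rng):
        """Keep shared modules; inherit each unshared one with probability 0.5."""
        child = set(a & b)
        child.update(m for m in sorted(a ^ b) if rng.random() < 0.5)
        return frozenset(child) or a  # never return an empty model


    def mutate(model, rng, rate=0.2):
        """Toggle (add or remove) each library module with probability rate."""
        flipped = {m for m in LIBRARY if rng.random() < rate}
        return frozenset(model ^ flipped) or model


    def evolve(generations=30, pop_size=20, seed=0):
        """Evolve module sets and return the best one found."""
        rng = random.Random(seed)
        population = [random_model(rng) for _ in range(pop_size)]
        best = min(population, key=error)
        for _ in range(generations):
            population = [
                mutate(recombine(tournament(population, rng),
                                 tournament(population, rng), rng), rng)
                for _ in range(pop_size)
            ]
            population[0] = best  # elitism: keep the best model found so far
            best = min(population, key=error)
        return best


    if __name__ == "__main__":
        best = evolve()
        print(sorted(best), error(best))

Because of the elitism step, the error of the best model never increases from one generation to the next. A realistic encoding would also have to represent repeated modules and their instantiation with specific objects, which this sketch omits.
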

Model Parameters Optimization
------------------------------------------------------

Model parameter optimization concerns learning appropriate kinetic constants for each of the rules specified in the modules. When the kinetic constants are not known from the literature, the module library specifies a parameter range (and a choice of linear or logarithmic scale) for each kinetic constant. The parameter optimization methods currently available are genetic algorithms (GA), differential evolution (DE), opposition-based differential evolution (ODE), and the covariance matrix adaptation evolution strategy (CMA-ES).

Fitness Function
-------------------------------------------------

*poptimizer* can use two different fitness functions to quantify the quality of candidate models:

1) **Equal Weighted Sum**: The fitness is the sum of the root-mean-square errors (RMSE) between target and model data, taken over all time series. This is a common method for calculating the total error of several time series of similar magnitude.

2) **Random Weighted Sum**: The fitness is a weighted sum of the errors, where the weights come from a randomly generated, normalized weight vector. A large number of different weight vectors is generated and used in the fitness calculation to average out the randomness of the weights. This method allows a wider exploration when fitting a model to time series of different orders of magnitude.

Additional Information
--------------------------------------------

More detailed information about the methodology can be found in the paper entitled *Evolving Cell Models for Systems and Synthetic Biology*, to appear in the Systems and Synthetic Biology journal.

=================
Examples
=================

This section briefly describes three different running examples for *poptimizer*.
The first two examples are taken from the reference paper cited above, and the third is a pulse generator with different initial conditions.

threegene
-------------------------

This case study investigates regulatory networks consisting of three genes that are able to produce a pulse in the expression of a specific gene. The corresponding files can be found in *examples/threegene/*. To run this example, change to the corresponding directory and type::

    $ poptimizer threegene_inputpara.xml

The non-fixed module library used is specified in the file *threegene_module_library.xml*, the target data in *target_data_threegene.txt*, and the initial values for each of the genes in *initial_values_threegene.txt*.

promoter
------------------------------------

This case study investigates a gene regulatory network consisting of five genes that is able to behave as a bandwidth detector. The corresponding files can be found in *examples/promoter/*. To run this example, change to the corresponding directory and type::

    $ poptimizer all_para_promoter_inputpara.xml

The non-fixed module library used is specified in the file *all_para_module_library_promoter.xml*, the target data in *target_data_promoter.txt*, and the initial values for each of the genes in *initial_values_promoter.txt*.

fourinitial
---------------------------------------

The last example uses a network of at most five genes to simulate a pulse generator for one of the genes under different initial conditions. The corresponding files can be found in *examples/fourinitial/*. To run this example, change to the corresponding directory and type::

    $ poptimizer four_initial_inputpara.xml

A fixed module library, specified in the file *library2.xml*, is now used together with the non-fixed library *library1-lin.xml*.
The target data is now specified in four different files (*target1.txt*, *target2.txt*, *target3.txt*, *target4.txt*), as are the initial values (*initials1.txt*, *initials2.txt*, *initials3.txt*, *initials4.txt*).

=============================
*poptimizer* Software
=============================

License
------------------------------------------

The *poptimizer* distribution, including all source code, model examples, and documentation, is the copyright of the Infobiotics Team (Hongqing Cao, Claudio Lima, Natalio Krasnogor, Francisco Romero-Campero, Jamie Twycross, and Jonathan Blakes) and is released under the GNU GPL version 3 license.

Credits
-------------------------------------------

*poptimizer* was written by Hongqing Cao, with contributions from Claudio Lima, Natalio Krasnogor, Jamie Twycross, Francisco Romero-Campero, and Jonathan Blakes. It is being used on Systems Biology research projects in the Centre for Plant Integrative Biology and the School of Computer Science, University of Nottingham, U.K. This work is funded by BBSRC grant BB/D0196131.

For further information or any questions, please contact cvf AT cs.nott.ac.uk.

*copyright 2009 Infobiotics Team, released under GNU GPL version 3.*