Introduction

Unlike eukaryotes, many bacteria have circular genomes, and these genomes maintain various symmetries. The symmetries include a bias in gene distribution between the leading and lagging strands, the orientation of oligomer sequences, and a bias in base composition, and they are found in most bacterial genomes. One of the most representative examples is the symmetry of the replication origin and terminus. Bacteria with circular genomes have just one pair of replication origin and terminus, and in almost every organism the two are positioned symmetrically. At present, in silico prediction is the de facto standard for locating the replication origin and terminus, and few have been determined experimentally. This prediction relies on the guanine and cytosine composition bias that arises from the difference in replication mechanism between the leading and lagging strands. However, why a base composition bias produced by replication should maintain this symmetry is less well understood. Furthermore, the bias is not conserved in every organism: several organisms have been observed to have largely lost the symmetry, and the factor responsible remains unresolved.

Meanwhile, the dif sequence has recently attracted attention as a new marker for determining the replication terminus. The dif sequence is a 28 bp site targeted by a tyrosine recombinase, which resolves the chromosome dimer into two daughter DNAs at replication termination. On the basis of biological verification, the dif sequence has therefore been considered to mark the exact replication terminus. However, the dif sequence has so far been identified in only a few organisms.
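As an illustration of the composition-based prediction mentioned above, the following is a minimal sketch of a cumulative GC skew calculation. It is not the procedure used in this work. It assumes the common convention that the leading strand is enriched in guanine over cytosine, so that the minimum of the cumulative skew curve approximates the origin and the maximum approximates the terminus. The function names and the window size are hypothetical, and the input is assumed to be a non-empty sequence read linearly from the first base of the published strand.

```python
def cumulative_gc_skew(seq, window=1000):
    """Return the running sum of (G - C) / (G + C) over consecutive windows."""
    seq = seq.upper()
    cumulative = []
    total = 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # Windows without any G or C contribute no skew.
        total += (g - c) / (g + c) if g + c else 0.0
        cumulative.append(total)
    return cumulative


def predict_origin_terminus(seq, window=1000):
    """Return approximate (origin, terminus) coordinates from the skew curve.

    Assumption: the curve minimum marks the origin and the maximum marks
    the terminus. Coordinates are the end positions of the extreme windows.
    """
    curve = cumulative_gc_skew(seq, window)
    ori = min(range(len(curve)), key=curve.__getitem__)
    ter = max(range(len(curve)), key=curve.__getitem__)

    def to_coord(index):
        return min((index + 1) * window, len(seq))

    return to_coord(ori), to_coord(ter)
```

On a genome that has lost the compositional symmetry, the extremes of this curve become unreliable, which is the situation that motivates an independent marker such as dif.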

Recursive Hidden Markov Modeling (RHMM)

We applied a Hidden Markov Model (HMM), as implemented in HMMER2, recursively to predict dif sequences. First, to build a profile HMM, we predicted the dif sequences of 28 organisms of the genus Escherichia by fuzzy matching (Perl module String::Approx), using the dif of E. coli K12 as the query and allowing up to 8 bp of substitutions but no insertions or deletions. Second, we calculated the similarity between the XerCD amino acid sequences of Escherichia and those of each target organism to obtain a clue to the order of prediction. Finally, following that XerCD similarity, we predicted the dif sequences recursively, including in organisms of other phyla. We call this prediction method "RHMM".
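The first step, fuzzy matching with substitutions only, can be sketched as a Hamming-distance scan. This is an illustrative Python equivalent, not the String::Approx call used in this work. The function names are hypothetical, both strands are searched as an assumption, and windows that wrap around the end of a circular genome are ignored for brevity.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_substitution_matches(genome, query, max_mismatches=8):
    """Find windows within max_mismatches substitutions of the query.

    No insertions or deletions are allowed, so every candidate has the
    same length as the query. Returns (mismatches, start, strand, site)
    tuples sorted with the closest match first.
    """
    genome, query = genome.upper(), query.upper()
    k = len(query)
    hits = []
    for strand, target in (("+", query), ("-", reverse_complement(query))):
        for i in range(len(genome) - k + 1):
            window = genome[i:i + k]
            d = hamming(window, target)
            if d <= max_mismatches:
                hits.append((d, i, strand, window))
    return sorted(hits)
```

In practice the query would be the 28 bp dif of E. coli K12 and `max_mismatches` would be 8. The best-scoring hit in each genome would then be a candidate for inclusion in the profile HMM.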