Polyketides and non-ribosomal peptides represent a large class of structurally diverse natural products. They have been much studied in recent years because the enzymes that synthesise them, the modular polyketide synthases (PKSs) and the non-ribosomal peptide synthetases (NRPSs), share striking architectural similarities that can be exploited to generate "un-natural" natural products. PKS and NRPS proteins are multifunctional, composed of a co-linear arrangement of discrete protein domains, each representing one enzymic activity needed for chain elongation using either carboxylic acid or amino acid building blocks. The domains are grouped into larger modules, which together form the complex. Polyketide and peptide antibiotics, antifungals, antivirals, cytostatics, immunosuppressants, antihypertensives, antidiabetics, antimalarials and anticholesterolemics are in clinical use. Also of commercial importance are polyketide and peptide antiparasitics, coccidiostatics, animal growth promoters and natural insecticides.

Polyketides are assembled through serial condensations of activated coenzyme A thioester monomers derived from simple organic acids such as acetate, propionate and butyrate. The choice of organic acid allows the introduction of different chiral centres into the polyketide backbone. The active sites required for condensation are provided by an acyltransferase (AT), an acyl carrier protein (ACP) and a β-ketoacylsynthase (KS). Each condensation results in a β-keto group that undergoes all, some or none of a series of processing steps. The active sites that perform these reactions are contained within the following domains: ketoreductase (KR), dehydratase (DH) and enoylreductase (ER). The absence of any β-keto processing results in the incorporation of a ketone group into the growing polyketide chain, a KR alone gives rise to a hydroxyl moiety, a KR and DH produce an alkene, while the combination of KR, DH and ER domains leads to complete reduction to an alkane.
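
The relationship between the reductive domains of a module and the resulting β-carbon functionality can be written as a small lookup. The following Python sketch is illustrative only: the function name `beta_carbon_state` and the list-of-strings representation of a module are assumptions made for this example and are not taken from any existing program.

```python
# Illustrative sketch: map the reductive domains present in a PKS module
# to the functional group left at the beta-carbon after processing.
# The function name and data layout are hypothetical.

REDUCTIVE = {"KR", "DH", "ER"}

def beta_carbon_state(domains):
    """Return the beta-carbon outcome for the domains of one module."""
    present = set(domains) & REDUCTIVE
    if not present:
        return "ketone"      # no beta-keto processing
    if present == {"KR"}:
        return "hydroxyl"    # ketoreduction only
    if present == {"KR", "DH"}:
        return "alkene"      # reduction followed by dehydration
    if present == {"KR", "DH", "ER"}:
        return "alkane"      # complete reduction
    # DH or ER without the preceding step is not covered by the rules
    # above, so this sketch reports it rather than guessing.
    return "undefined"

modules = [
    ["KS", "AT", "ACP"],
    ["KS", "AT", "KR", "ACP"],
    ["KS", "AT", "DH", "KR", "ACP"],
    ["KS", "AT", "DH", "ER", "KR", "ACP"],
]
for module in modules:
    print("-".join(module), "->", beta_carbon_state(module))
```

Combinations such as a DH without a KR are returned as "undefined" because the text does not assign them an outcome.
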
Most often, the last module contains the thioesterase (TE) domain responsible for release of the linear polyketide chain from the enzyme and for final cyclisation. After assembly, the polyketide backbone typically undergoes post-PKS modifications such as hydroxylation(s), methylation(s) and glycosylation(s) to give the final active compound.

Non-ribosomal peptides are assembled by the so-called "multiple carrier thio-template mechanism". Three domains are necessary for an elongation module: an adenylation (A) domain that selects the substrate amino acid, analogous to a polyketide AT domain, and activates it as an aminoacyl adenylate; a peptidyl carrier protein (PCP) that binds the co-factor 4'-phosphopantetheine to which the activated amino acid is covalently attached, analogous to the ACP of a PKS; and a condensation (C) domain that catalyses peptide bond formation, analogous to the KS in modular PKSs. NRPSs also contain a thioesterase (TE) domain located at the C-terminus of the protein, which is essential for release of linear, cyclic or branched cyclic peptides. Auxiliary activities can further enlarge the structural diversity of the peptide; especially common are epimerisation (Epim) domains that convert the thioester-bound amino acid from the L- to the D-configuration.

In the last few years there has been considerable interest in generating new compounds for the production of novel drugs by manipulating the programming of such clusters in vitro (e.g. by combinatorial biosynthesis). However, an important barrier to progress is that most changes made by in vitro methods result in very low yields or no detectable product.
A possible solution to the yield problem would be the generation of novel clusters by homologous recombination in vivo, because this would favour more closely related sequences and should reduce problems caused by non-functional, incompatible junctions.

The Unified Modeling Language (UML) was used to define two platform-independent, integral generic program packages, CompGen and ClustScan, which are under development to model these processes in silico. The heart of CompGen is a specially structured database, based on BioSQL v1.29, which connects the biosynthetic order of synthase/synthetase enzymes to the sequences of the component polypeptides. The additional linkage to the gene sequences allows the integration of DNA sequence with product structure. The database contains sequences of well-characterised PKS/NRPS clusters, as well as non-annotated sequenced clusters whose structure and function are as yet unknown, to act as building blocks for the production of novel products. It is easy to add custom sequences to the database and to annotate them using proprietary protein profiles designed using the Pfam database and HMMER. One function of the program is the generation of virtual recombinants between clusters. This can be done using a recombination model (with optional parameters) to predict sites for homologous recombination, or using user-defined recombination sites (e.g. to model in vitro genetic manipulation such as module replacement). The program predicts the linear polyketide structure of the resulting "un-natural" natural products, with a chemical description given as isomeric SMILES. Molecular modelling of the subsequent spontaneous cyclisation process produces structures for a virtual compound database for further molecular modelling studies using PASS and CDD technology.
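
The virtual recombination and linear structure prediction steps can be illustrated with a short Python sketch. It is not CompGen's implementation, and it makes several simplifying assumptions: crossovers fall only at user-defined module boundaries, only malonyl and methylmalonyl extender units are considered, the chain is released by hydrolysis as a free acid, and stereochemistry is ignored, so the output is plain rather than isomeric SMILES. The names `Module`, `recombine` and `linear_smiles` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical, simplified module record: one extender unit plus the
# reductive domains (KR, DH, ER) present in the module.
@dataclass(frozen=True)
class Module:
    extender: str                       # "malonyl" or "methylmalonyl"
    reduction: frozenset = frozenset()  # subset of {"KR", "DH", "ER"}

# SMILES fragment for the alpha-carbon contributed by each extender unit.
ALPHA = {"malonyl": "C", "methylmalonyl": "C(C)"}

# SMILES fragment for the beta-carbon after processing by the module.
BETA = {
    frozenset(): "C(=O)",                # ketone
    frozenset({"KR"}): "C(O)",           # hydroxyl
    frozenset({"KR", "DH"}): "C=",       # alkene (double bond to alpha-carbon)
    frozenset({"KR", "DH", "ER"}): "C",  # alkane
}

def recombine(parent_a, parent_b, site_a, site_b):
    """Join the modules of parent_a before site_a to those of parent_b from site_b."""
    return list(parent_a[:site_a]) + list(parent_b[site_b:])

def linear_smiles(starter_alkyl, modules):
    """Build a non-stereo SMILES string for the linear chain as a free acid.

    starter_alkyl is the starter unit without its carbonyl carbon,
    e.g. "C" for acetate or "CC" for propionate.
    """
    smiles = starter_alkyl
    for module in modules:
        # The previous carbonyl becomes the processed beta-carbon;
        # the extender unit then supplies the alpha-carbon.
        smiles += BETA[module.reduction] + ALPHA[module.extender]
    return smiles + "C(=O)O"             # terminal carboxylic acid

cluster_a = [
    Module("methylmalonyl", frozenset({"KR"})),
    Module("methylmalonyl", frozenset({"KR"})),
    Module("methylmalonyl"),
]
cluster_b = [
    Module("malonyl", frozenset({"KR", "DH"})),
    Module("malonyl", frozenset({"KR", "DH", "ER"})),
]

# Module replacement: first two modules of A followed by the last module of B.
hybrid = recombine(cluster_a, cluster_b, 2, 1)
print(linear_smiles("CC", hybrid))
```

A recombination model that predicts crossover sites from sequence similarity would replace the fixed `site_a` and `site_b` arguments; that step is deliberately left out of this sketch.
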
An optional "reverse genetics" module analyses a given chemical structure to see whether it could be produced by a novel PKS/NRPS synthesis cluster and suggests the DNA sequence of a suitable cluster based on building blocks derived from clusters contained in the database.

Overall, CompGen allows the in silico generation of a database of novel "un-natural" natural chemical compounds that can be used for in silico screening using PASS or CDD technology. The other integral generic program package, ClustScan, will recognise and annotate new gene clusters from microbial genome sequencing projects or in metagenomes of soil and/or marine microorganisms.
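
One step in annotating a newly sequenced cluster is grouping the detected domains into modules. The Python sketch below shows one plausible way to do this and should not be read as ClustScan's method: it assumes the domain hits are already ordered along the polypeptide, and it assumes that a new module begins at each KS domain (PKS) or C domain (NRPS), with any leading domains forming a loading module. The function name `split_into_modules` is hypothetical.

```python
# Illustrative sketch: group an ordered list of domain hits into modules.
# Assumption: each KS (PKS) or C (NRPS) domain opens a new module, and
# domains before the first KS/C form a loading module.

MODULE_START = {"KS", "C"}

def split_into_modules(domain_hits):
    """Return a list of modules, each a list of domain names in order."""
    modules = []
    for name in domain_hits:
        if name in MODULE_START or not modules:
            modules.append([])           # open a new module
        modules[-1].append(name)
    return modules

hits = ["AT", "ACP", "KS", "AT", "KR", "ACP", "KS", "AT", "ACP", "TE"]
for index, module in enumerate(split_into_modules(hits)):
    print(index, "-".join(module))
```
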