Computational aspects of the gene regulatory
network project. The description and simulation
of gene regulatory networks can only be accomplished with computational
tools specific to the task. We have locally developed several
software tools that are in constant use by our laboratory investigating
sea urchin development, as well as by more than 110 users working in
a variety of other systems (see below). These tools were specifically
designed to aid the experimentalist working at the bench and
using iterative cycles of experimentation and computation. The
software tools include: BioArray, a program that uses macroarray
spot data from phosphoimagers to manage intensity and position
information; SUGAR, a system to perform, display and correlate
large-BAC sequence analyses to aid the experimentalist with
the functional analysis of cis-regulatory elements in
genomic DNA; SeqComp and FamilyRelations, programs for comparative
sequence analysis; and NetBuilder, an environment for creating
and analyzing models of gene networks.
To make the sequence analysis programs convenient to use, we
have installed a web-based facility, the Cartwheel Project,
that gives the user complete control over the process.
Within the circumscribed domain of genomic sequence-based information,
Cartwheel provides facilities to organize, analyze, and curate
information on the level of individual labs. The analyses are
then viewable by programs such as FamilyRelations. The Cartwheel
Project is the umbrella term for a bioinformatics infrastructure
first developed by C. Titus Brown and now maintained and extended
by the computational staff of the Center. As the genome data
has expanded, we have found it necessary to install another
queuing package, Parasol, to facilitate bulk jobs not covered
by Cartwheel.
The equipment that supports the computational efforts of the
Center includes two 18-unit Beowulf clusters, a web server for
the Sea Urchin Genome Project (SUGP) and several dual-processor
development machines used by the staff for software construction,
testing and maintenance. In the past year we have decommissioned
one obsolete Beowulf cluster and installed a new one.
The utility of these comparative sequence analysis facilities
is reflected in the user population. At present, the Caltech
Cartwheel server, Woodward, has 249 total registered users in
40 lab groups. A total of 22,181 jobs ran in the last year,
consuming 123 CPU days. The majority of these are SeqComp
and BLAST analyses.
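To put these usage figures in perspective, the short Python sketch below derives a few averages from the numbers quoted above. It is illustrative arithmetic only: it assumes the load is spread evenly over all jobs and all registered users, which is unlikely to hold in practice, and the variable names are ours.

```python
# Usage figures quoted in the text for the last year.
JOBS = 22181
CPU_DAYS = 123
REGISTERED_USERS = 249
LAB_GROUPS = 40

SECONDS_PER_DAY = 24 * 60 * 60

# Simple means; these say nothing about how skewed the real distribution is.
cpu_seconds_per_job = CPU_DAYS * SECONDS_PER_DAY / JOBS
jobs_per_user = JOBS / REGISTERED_USERS
users_per_lab = REGISTERED_USERS / LAB_GROUPS

print(f"Mean CPU time per job: {cpu_seconds_per_job / 60:.1f} minutes")
print(f"Mean jobs per registered user: {jobs_per_user:.1f}")
print(f"Mean registered users per lab group: {users_per_lab:.1f}")
```

On these assumptions, a typical job uses about 8 CPU minutes, and each registered user submits roughly 89 jobs per year.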
The goal of the Center for Computational Regulatory Biology is to develop, refine and test computational approaches in genomics broadly and in cis-regulatory analysis specifically. The primary focus of the latter is the elucidation of gene regulatory networks in development. The Center interacts with the wider research community in several ways: it provides open source software for use by academic research groups; it provides web-based servers for genomic analysis using locally developed software; and it maintains databases fundamental to the Sea Urchin Genome Project, an initiative that began in the Davidson laboratory and at the Genomics Technology Facility. Upon request, the Facility provides the Caltech and external scientific communities with services and materials stemming from the macroarray libraries and arraying equipment that we maintain.
One aspect of the Center is the Sea Urchin Genome Resource, which maintains information resources that are widely used in the sea urchin research community. We provide sequence information through the Sea Urchin Genome Project web site (http://spbase.org). With the advent of the web resources for annotation established at the Human Genome Sequencing Center, Baylor College of Medicine, and the Sea Urchin Genome Resources at NCBI, we have not seen the need to expand our local databases. However, we have refined the cross-index between our library clones and sequences stored in public databases at NCBI. Because so many of our libraries were used for the sequencing project, and the library location of each clone was preserved in the sequence information, we can provide a searchable sequence database from which the user can obtain clone information and order the clone. This "clone by computer" method makes our arrayed libraries extremely useful and readily accessible to the working molecular biologist.
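The idea behind "clone by computer" can be illustrated with a minimal Python sketch. It is not the Center's actual database or record format: the header layout, the field names (library, plate, well), the sequence identifiers and the library names are all hypothetical, chosen only to show how a clone location carried in a sequence record can be turned into a lookup from sequence to orderable clone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloneLocation:
    """Physical position of a clone in an arrayed library (hypothetical fields)."""
    library: str
    plate: int
    well: str


def parse_header(header):
    """Split a hypothetical header such as
    '>seq001 library=LIB_A plate=12 well=A07' into (sequence id, location)."""
    fields = header.lstrip(">").split()
    seq_id = fields[0]
    attrs = dict(field.split("=", 1) for field in fields[1:])
    location = CloneLocation(attrs["library"], int(attrs["plate"]), attrs["well"])
    return seq_id, location


def build_index(headers):
    """Map each sequence id to the library location of its clone."""
    return dict(parse_header(header) for header in headers)


# Made-up example records, for illustration only.
HEADERS = [
    ">seq001 library=LIB_A plate=12 well=A07",
    ">seq002 library=LIB_B plate=3 well=H11",
]

index = build_index(HEADERS)
print(index["seq001"])
```

In such a scheme, a user who finds a sequence of interest by searching would look up its identifier in the index and quote the resulting library, plate and well when ordering the clone.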