Home > CGCV

CGCV

CGCV is a project mainly written in PERL and JAVASCRIPT, based on the Apache-2.0 license.

Are you curious how a gene cluster is conserved across multiple genomes? We have developed a web-based Comparative Gene Cluster Viewer that allows you to upload the sequences of your favourite gene cluster (e.g. an operon in Bacteria), perform dynami

Publication


  • Revanna, K.V., Krishnakumar, V. & Dong, Q. (2009) A web-based software system for dynamic gene cluster comparison across multiple genomes. Bioinformatics, 25(7):956-7
  • PUBMED

Setup


Supported Operating Systems

  • Linux, Solaris (untested on other Unix flavors)

Supported Browsers

  • Mozilla Firefox 2 or higher
  • Internet Explorer 6 or higher
  • Safari
  • Opera

Prerequisites

  • Perl 5.8.0 or higher
  • Apache 1.3 or 2.0
  • MySQL Database 5.0
  • BioPerl 1.5.2 or higher

CPAN Perl Modules (can be downloaded from CPAN archive at http://www.cpan.org)

  • DBI
  • DBD::mysql
  • HTML::Template
  • GD
  • Math::Round
  • Proc::Simple
  • Apache::Session::File

Core Perl Modules

  • CGI
  • CGI::Carp
  • Data::Dumper
  • Digest::MD5
  • Math::Round
  • File::stat
  • POSIX

Installation of Perl Modules

  • Perl modules can be installed automatically using the cpan utility that is installed as part of Perl. Instructions for using cpan are available at http://perldoc.perl.org/CPAN.html
  • If you require more control over the installation, or have trouble with the cpan utility, instructions for installing modules by hand is available at http://perldoc.perl.org/perlmodinstall.html

Database Setup

CGCV requires a MySQL database to store data and generate results. By default, CGCV will connect to MySQL installed on the local machine, using the username cgcv and the database name cgcv_database. To setup CGCV with these default settings, issue the following commands from the unix command prompt.

To create the username bov, without password

mysql -u root -p password -e 'create user cgcv@localhost'

With password

mysql -u root -p password -e 'create user cgcv@localhost identified by password'

To create the database and give permissions to your user

mysql -u root -p password -e 'create database cgcv_database'

mysql -u root -p password -e 'grant select,insert,create,drop on cgcv_database.* to cgcv@localhost'

If you want to use different database settings, reference the MySQL documentation for instructions on how to configure the database.

Upload Directory

The Upload direcory is the folder on your filesystem into which the files uploaded to the CGCV program are saved and processed. In the setup.sh file, you are required to provide the path to the folder where you would like the data to be uploaded to (by changing the Upload_dir variable). Since Apache requires to access this data, you are required to change the permissions (chmod) of the folder to 700 and change the ownership (chown) of the directory to the user and group that runs Apache.

Installation


After download, uncompress CGCV using the command

tar -zxvf CGCV.tar.gz

Edit the setup.sh to customize CGCV for your location. You must set the parameters under "REQUIRED PARAMETERS". If you did not use the default database settings, you must also define the connection information under "OPTIONAL PARAMETERS".

Edit the upload_directory variable to store all the files generated by this program. You can change ownership of the directory to the account and group that Apache runs as and set the permissions to 700.

Execute the setup command

sh setup.sh

to configure CGCV, this creates three folders in the same directory called 'cgi-bin', 'htdocs' and 'bin'.

If you have set the 'lifetime' value in setup.sh for automatic expiration of results, a perl script will be created in

bin/cleanDir.pl

To have this command executed automatically, you need to add the following entry to your crontab.

* 0 * perl /PATH/TO/cleanDir.pl

Rename the 'htdocs' directory to 'CGCV' and copy it to the directory that your Apache installation uses for HTML files. This directory is defined in your Apache configuration file (/path/to/apache/conf/httpd.conf) as 'DocumentRoot'. So if your Apache configuration shows the following DocumentRoot:

DocumentRoot "/var/www/htdocs"

you would execute

cp -rp htdocs /var/www/htdocs/CGCV

Rename the cgi-bin directory to CGCV and copy it to the directory your apache installation uses for cgi-bin executables. This may be defined as 'ScriptAlias' in your apache configuration.

ScriptAlias /cgi-bin/ "/var/www/cgi-bin"

in which case you would execute

cp -rp cgi-bin /var/www/cgi-bin/CGCV

If your apache configuration does not include ScriptAlias, then append cgi-bin to the DocumentRoot, e.g.

cp -rp cgi-bin /var/www/htdocs/cgi-bin/CGCV

Change to the bin directory and execute the following commands to download the files and set up the MySQL database with the necessary tables for the viewer. For Bacterial genomes, run:

sh createDB.sh -o Prokaryotes -m "full-install"

For Eukaryotic genomes,

sh createDB.sh -o Eukaryotes -m "full-install"

NOTE: Both these scripts take a considerable amount of time to download and set up the service on your local system. To test our service, you may choose to run the following script:

sh sampler.sh

This will download a sample set of sequences corresponding to Bacteria (5 organisms) and Eukaryotes (2 organisms) and sets up the necessary MySQL tables.

Data Directory


The Data directory is the folder on your filesystem into which the files from the FTP servers will be downloaded and kept upto date. In the setup.sh file, you are required to provide the path to the folder where you would like the data to be downloaded and stored (by changing the Genomes_path variable).

As an example, if you choose to place your data in the following folder (either for Prokaryotes or Eukaryotes):

/home/username/CGCV/Data/[Bacteria | Eukaryotes]/

the installation script will create the following folder and the script that downloads the data will create the following data structure inside that folder:

/home/username/CGCV/Data/[Bacteria | Eukaryotes]/subfolder

List of subfolders: .. genomes .. aaseqs .. geneseqs .. gtf_files .. listings .. tmpListings .. tables

Free space required (minimum): Prokaryotes (~ 8-9 GB) Eukaryotes (~ 6-7 GB)

Please make sure that you run the createDB.sh script only once per organism type (otherwise, the downloaded data will get corrupted)

Authors


  • Kashi Vishwanath Revanna
  • Vivek Krishnakumar
  • Qunfeng Dong