CPDL

Conserved Property Difference Locator

CPDL is designed to flag positions in a multiple sequence alignment where two homologous groups of proteins differ either in an amino acid residue or in a property such as charge or hydrophobicity. The idea is that such positions are likely to be biologically important for defining the specific functions of the two groups.

The program is described in detail in this open access publication:
BMC Bioinformatics 2005, 6:284 (30 Nov 2005)

Run CPDL

Here are some examples you might find helpful:
Alignment file    Top group    Output
MurE/D            1 to 25      PDF
ATP cyclases      1 to 20      PDF
FAD OH            1 to 2       PDF
Open the alignment file, save it to your machine as a text file, and use it as input here.
(Note: property track displays were set to "if in main track" for these examples)

CPDL is now OPEN SOURCE!
CPDL4 is a Perl rewrite of the original CPDL. The source code is now available under an open source license, and a tarball can be found here and also via FTP at ftp://ftp.genome.bnl.gov/CPDL/. More information is in the README file.








Explanations

Alignment file
CPDL assumes that the input alignment contains sequences from two groups of homologous proteins. The input alignment must be organized such that the sequences of the first group occupy rows 1 through N and all members of the second group fall below row N.

At present, CPDL recognizes only alignment files saved in .aln or .msf format, such as those generated by ClustalW (http://www.ebi.ac.uk/clustalw/). VectorNTI alignment files can be exported in .msf format and then used in CPDL. Please email the Webmaster to inquire about additional file types.

Files can be converted to the proper format using the ReadSeq tool (http://thr.cit.nih.gov/molbio/readseq/).

Top group
The input here should be the number N that defines the last sequence in the first group of your alignment file.
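
As a rough illustration of how the Top group value divides an alignment, here is a minimal Python sketch. It is not CPDL's own code: the function name split_groups and the toy sequences are hypothetical, and the alignment is assumed to be already read into an ordered list of rows.

```python
def split_groups(sequences, n_top):
    """Split an ordered list of aligned sequences into two groups.

    sequences: alignment rows in file order (row 1 first).
    n_top:     the number N of the last sequence in the first group.
    """
    # Assumption of this sketch: each group needs at least one sequence.
    if not 1 <= n_top < len(sequences):
        raise ValueError("N must leave at least one sequence in each group")
    # Rows 1..N form the first group; everything below row N is the second.
    return sequences[:n_top], sequences[n_top:]


# Toy alignment (made up for illustration): first group is rows 1 to 3.
rows = ["MKV-A", "MKVLA", "MRV-A", "TRVLG", "TRVLG"]
top, bottom = split_groups(rows, 3)
```

With N = 3, rows 1 through 3 land in the first group and rows 4 and 5 in the second.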

Output file name
Enter a name for your output file. The program will automatically add the .ps or .pdf extension, depending on which output type you choose to generate.

Track
The tracks are (1) the main sequence track, which is always turned on, followed in descending order down the page by (2) size, (3) hydrophobicity, (4) charge, (5) polarity, and (6) aromaticity.




Size
Defined as tiny (G, S, A, CH), small (Cs-s, N, D, T, V) or large (I, L, M, F, Y, W, H, K, R, E, Q).

Hydrophobicity
Defined as hydrophobic (F, Y, W, H, K, T, C, G, A, V, I, L, M) or hydrophilic (P, S, N, D, E, Q, R).

Charge
Charged residues are defined as (R, K, H, D, E).

Polarity
Polar residues are defined as (Y, W, H, K, R, D, E, Q, N, T, S, Cs-s).

Aromaticity
Aromatic residues are defined as (F, W, Y, H).
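
The five property definitions above can be written down as simple lookup sets. The Python sketch below is only an illustration, not CPDL's source: the names SIZE, HYDROPHOBICITY, CHARGED, POLAR, AROMATIC, classify and describe are hypothetical. The residue lists are copied as given, including the CH and Cs-s cysteine tokens; how a plain C in a sequence maps onto those two tokens is not stated here, so the sketch does not decide it.

```python
# Residue sets transcribed from the definitions above.
SIZE = {
    "tiny": {"G", "S", "A", "CH"},
    "small": {"Cs-s", "N", "D", "T", "V"},
    "large": {"I", "L", "M", "F", "Y", "W", "H", "K", "R", "E", "Q"},
}
HYDROPHOBICITY = {
    "hydrophobic": {"F", "Y", "W", "H", "K", "T", "C", "G", "A", "V", "I", "L", "M"},
    "hydrophilic": {"P", "S", "N", "D", "E", "Q", "R"},
}
CHARGED = {"R", "K", "H", "D", "E"}
POLAR = {"Y", "W", "H", "K", "R", "D", "E", "Q", "N", "T", "S", "Cs-s"}
AROMATIC = {"F", "W", "Y", "H"}


def classify(token, classes):
    """Return the label whose set contains token, or None if no set does."""
    for label, members in classes.items():
        if token in members:
            return label
    return None


def describe(token):
    """Look up all five properties for one residue token."""
    return {
        "size": classify(token, SIZE),
        "hydrophobicity": classify(token, HYDROPHOBICITY),
        "charged": token in CHARGED,
        "polar": token in POLAR,
        "aromatic": token in AROMATIC,
    }
```

A token that appears in none of the sets for a property, such as P for size, comes back as None in this sketch.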

Display Status
Defines which property tracks will be included in the output. The default setting is off for all tracks. Each track can be turned completely on, in which case the properties of every residue in the protein sequence are displayed. Alternatively, properties can be displayed only at positions where there is (1) a difference in the specific property track, (2) a difference in the main sequence track, or (3) a red hourglass in the main sequence track.

Conservation level
When set to all, every member of a group must contain the same residue at a given position.
When set to all-but-one, one member of a group may have an alternative residue.
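
The two conservation levels can be sketched in Python as follows. This is an illustration under stated assumptions, not the CPDL implementation: the function name conserved_residue is hypothetical, a column is passed as a string with one character per sequence in the group, and gap characters are treated like any other residue.

```python
from collections import Counter


def conserved_residue(column, level="all"):
    """Return the conserved residue of one group's column, or None.

    column: residues found at one alignment position within one group.
    level:  "all" or "all-but-one".
    """
    # Number of sequences allowed to deviate from the majority residue.
    allowed = {"all": 0, "all-but-one": 1}[level]
    residue, count = Counter(column).most_common(1)[0]
    mismatches = len(column) - count
    return residue if mismatches <= allowed else None


strict = conserved_residue("KKKR", "all")
relaxed = conserved_residue("KKKR", "all-but-one")
```

Here the column KKKR is not conserved at the all level but counts as conserved K at all-but-one. In a group of only two sequences that differ, the sketch simply returns the first residue; how CPDL treats that case is not described here.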

Grayscale Function
Defines the level of shading of the residues/properties in the tracks. The default setting is flat, which makes all conserved residues black and all non-conserved residues medium gray. The linear setting shades the non-conserved residues along a linear gradient relative to their frequency in the alignment, while the exponential setting uses an exponential gradient.

Darkness adjustment
A number between 0.0 and 1.0, defaulting to 0.25.
If some features are too faint on your printer output, try increasing this number.

Output Files
CPDL generates either PostScript or PDF files. You can download a program for reading .ps files here (http://www.cs.wisc.edu/~ghost/) and one for .pdf files here (http://www.adobe.com/products/acrobat/readstep2.html).

Flags
Closed triangles mark positions at which one group has a conserved residue that is different from all the residues found at the same position in the other group. The triangle will be open if any residue at the same position in the other group matches the conserved residue in the first group.

Orange circles mark positions at which one group has at least one conserved property that is different from the properties found at the same position in the other group.
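
The triangle rule described above can be sketched in Python as follows. This is a hedged illustration, not CPDL's code: the function name triangle_flag is hypothetical, conservation is taken at the all level only, and each column is passed as a string with one character per sequence.

```python
def triangle_flag(group, other):
    """Return "closed", "open", or None for one alignment position.

    group: residues at this position in the group being tested.
    other: residues at the same position in the other group.
    """
    # No triangle unless the tested group is fully conserved here.
    if len(set(group)) != 1:
        return None
    residue = group[0]
    # Open if any sequence in the other group shares the conserved residue,
    # closed if the other group never contains it.
    return "open" if residue in other else "closed"


closed_case = triangle_flag("KKK", "EED")
open_case = triangle_flag("KKK", "EKD")
```

The sketch follows the wording literally, so a position that is identical in both groups would also come out as open; whether CPDL suppresses that case is not stated here.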