Borrelia burgdorferi, linear chromosome

The nucleotide sequence of 909,265 bp of the large linear chromosome of Borrelia burgdorferi strain B31 has been determined at Brookhaven National Laboratory (BNL) by random first-end and directed second-end sequencing of plasmid libraries of random chromosomal fragments, followed by primer walking using 12-mer primers generated by ligation of two hexamers on hexamer templates. The sequence assembly was confirmed and contigs were aligned by end sequencing a framework of ~35-kbp fosmid clones. The few remaining gaps were filled by PCR amplification from fosmid clones or genomic DNA.

The same sequence has also been determined by The Institute for Genomic Research (TIGR) (Fraser et al., Nature, 390: 580-586 (1997)), and the current version of the TIGR sequence is available at The two sequences align with each other and with the physical map of Casjens and Huang (Molecular Microbiology 8: 967-980 (1993)). The TIGR sequence goes all the way to the covalently closed ends of the linear chromosome. The BNL sequence covers only the extent covered by BNL clones, and lacks 404 bp at the left end and 249 bp at the right end.

BNL Sequence via http or ftp
Length:    909,265 bp of BNL sequence
           909,918 bp total, including the first 404 bp and last 249 bp of the TIGR sequence
Coverage:  The entire BNL sequence has been determined at least once on each strand
Notation:  The BNL sequence is upper case, except ambiguities
           The 404 bp of TIGR sequence at the beginning and 249 bp at the end are in lower case
           Ambiguities indicate where BNL sequences determined on different clones did not agree
           Mismatch ambiguities are indicated by ambiguity codes (lower case)
           Indel ambiguities are indicated by a lower case a, c, g, or t
Alignment between BNL and TIGR sequences:
           Available via anonymous ftp from
           Numbering is approximately the BNL number, diverging to accommodate small indels
           Flags in the alignment:
               *  mismatch between the BNL and TIGR sequences
               -  indel between the BNL and TIGR sequences
               #  BNL mismatch ambiguities
               %  BNL indel ambiguities
               +  TIGR ambiguities
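
The notation rules above can be checked mechanically. The following is a minimal sketch, not part of the BNL distribution: it assumes the combined 909,918-character sequence has already been read into a Python string with any header line and whitespace removed, and the function name `summarize_notation` is hypothetical. It treats the first 404 and last 249 characters as TIGR sequence, and within the BNL region counts lower-case a, c, g, or t as indel ambiguities and other lower-case IUPAC codes as mismatch ambiguities.

```python
from collections import Counter

LEFT_TIGR = 404    # lower-case TIGR bases at the left end
RIGHT_TIGR = 249   # lower-case TIGR bases at the right end
IUPAC_AMBIGUITY = set("rywsmkbdhvn")  # IUPAC codes other than a, c, g, t


def summarize_notation(seq: str) -> Counter:
    """Classify the characters of the combined sequence by the notation rules.

    Assumption: seq is the full sequence as one string, TIGR flanks included.
    """
    bnl = seq[LEFT_TIGR:len(seq) - RIGHT_TIGR]
    counts = Counter()
    counts["tigr_flank"] = len(seq) - len(bnl)
    for ch in bnl:
        if ch.isupper():
            counts["bnl_unambiguous"] += 1
        elif ch in "acgt":
            counts["bnl_indel_ambiguity"] += 1
        elif ch in IUPAC_AMBIGUITY:
            counts["bnl_mismatch_ambiguity"] += 1
        else:
            counts["unexpected"] += 1
    return counts


# Toy input: 404 flank bases, a 9-character "BNL" region, 249 flank bases.
toy = "a" * LEFT_TIGR + "ACGTrACgT" + "c" * RIGHT_TIGR
print(summarize_notation(toy))
```

If these assumptions about the file hold, running the function on the real sequence should give ambiguity counts comparable to the 57 mismatch and 8 indel ambiguities reported for the BNL sequence.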

Comparison of BNL and TIGR sequences:

     Alignment is between the TIGR sequence of 7/12/99 and the BNL sequence of 04/16/98.

     Indels: The TIGR sequence contains seven tandem copies of a 162-bp imperfect tandem repeat that occurs only twice in the BNL sequence, at 213,195 and 213,357. The alignment shown leaves out the five additional repeats in the TIGR sequence (810 bp). The TIGR sequence has a duplication of the AGA at BNL position 38,087 and a deletion of one copy of a 9-base tandem repeat at BNL positions 812,841 and 812,850. In addition there are 48 single-base indels between the two sequences and 8 BNL indel ambiguities.

     Mismatches: There are 36 single-base mismatches between the two sequences.

     Ambiguities: In the BNL sequence, ambiguities represent sequence differences among different clones in the plasmid libraries used for sequencing. The BNL sequence contains 57 mismatch ambiguities and 8 indel ambiguities. The TIGR sequence contains 43 ambiguity codes. Each ambiguity in both sequences corresponds to an unambiguous base in the other sequence, and, in every case, one of the ambiguous bases matches the unambiguous base in the other sequence.

Single-base differences between the BNL and TIGR sequences, as a function of BNL coverage

                   BNL-TIGR discrepancies           Ambiguities
                                                    BNL          TIGR
coverage    pairs  mis  indels  total  frequency    mis  indels   mis  total  frequency
MM        369,937    6      14     20  0.5 x 10^-4   31       3    12     46  1.2 x 10^-4
MS        407,486   19      20     39  1.0 x 10^-4   20       4    25     49  1.2 x 10^-4
SS        131,852   11      14     25  1.9 x 10^-4    6       1     6     13  1.0 x 10^-4
Total     909,265   36      48     84  0.9 x 10^-4   57       8    43    108  1.2 x 10^-4

BNL coverage: MM indicates more than one sequence read on both strands
              MS indicates more than one read on one strand and a single read on the other
              SS indicates a single read on both strands
mis:          numbers of single-base mismatches
indels:       numbers of single-base indels
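
The frequency columns in the table above appear to be the total count divided by the number of base pairs in each coverage class. A short sketch that recomputes them from the tabulated counts follows; the helper name `per_10k` is hypothetical, and rounding to one decimal place is an assumption chosen to match the table.

```python
# Counts copied from the table: (base pairs, total discrepancies, total ambiguities)
ROWS = {
    "MM": (369_937, 20, 46),
    "MS": (407_486, 39, 49),
    "SS": (131_852, 25, 13),
    "Total": (909_265, 84, 108),
}


def per_10k(count: int, positions: int) -> float:
    """Frequency in units of 10^-4 per base pair, rounded to one decimal."""
    return round(count / positions * 1e4, 1)


for label, (pairs, discrepancies, ambiguities) in ROWS.items():
    print(f"{label:<6} discrepancies {per_10k(discrepancies, pairs)} x 10^-4, "
          f"ambiguities {per_10k(ambiguities, pairs)} x 10^-4")
```

Under this reading, the discrepancy frequency rises from MM to SS coverage while the ambiguity frequency stays near 1.2 x 10^-4.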

     Conclusions: The increasing frequency of discrepancies at lower BNL coverage suggests that a few errors remain in the BNL sequence. The discrepancies and ambiguities observed could be accounted for if each DNA preparation used for sequencing had polymorphisms at the ~0.01% level, with a similar level of polymorphism between the two DNA preparations.
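
As a rough illustration of the scale implied by this conclusion, the sketch below converts a ~0.01% polymorphism level into an expected number of variable sites over the BNL sequence. It is a back-of-the-envelope calculation only: the function name `expected_variant_sites` is hypothetical, and treating the polymorphism level as a uniform per-base rate is a simplifying assumption.

```python
GENOME_LENGTH = 909_265     # bp of BNL sequence
POLYMORPHISM_RATE = 1e-4    # ~0.01% expressed as a fraction (assumed uniform)


def expected_variant_sites(length: int, rate: float) -> float:
    """Expected number of polymorphic positions for a uniform per-base rate."""
    return length * rate


expected = expected_variant_sites(GENOME_LENGTH, POLYMORPHISM_RATE)
print(f"expected variable sites: about {expected:.0f}")
```

The result is of the same order as the 84 single-base discrepancies and 108 ambiguities tabulated above.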