Advice on how to recognize putative polymorphisms
Provided that it is large enough, any dataset of rearranged immunoglobulin genes should include rearrangements involving every IGHV, IGHD and IGHJ gene that is capable of rearrangement. If the dataset of VDJ genes is aligned against a complete set of known IGHV, IGHD and IGHJ genes, many sequences will be found to include some mismatches with respect to the known genes. Analysis of the frequency distribution of the number of mismatches can then reveal putative polymorphisms. It is particularly easy to identify polymorphisms when the sequences have come from a single individual, and the methods that can be used are most easily understood by considering two examples.
1. In an individual who is homozygous at a particular locus and who carries an unreported polymorphism at that locus, there will be few if any perfect matches to a previously reported allele. The frequency distribution of mismatches to that allele can be compared to the overall frequency distribution of mismatches in the complete dataset. This is most easily seen in a dataset of naive, IgM-associated VDJ genes, in which somatic point mutations are rare. If there are four nucleotide differences between the newly identified allele and the most similar previously reported allele, large numbers of sequences can be expected to carry exactly four mismatches in the original alignment.
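The comparison described above can be sketched in a few lines of Python. This is only an illustration, not the method of any particular tool: the function name dominant_nonzero_peak, the min_fraction threshold of 0.5 and the example counts are all hypothetical, and the input is assumed to be a list holding the number of IGHV mismatches for each sequence that aligned best to one reported allele.

```python
from collections import Counter


def dominant_nonzero_peak(mismatch_counts, min_fraction=0.5):
    """Return the non-zero mismatch count that dominates the distribution.

    Returns None when perfect matches are at least as common as the largest
    non-zero bin, or when that bin holds less than `min_fraction` of the
    sequences. The threshold is an arbitrary illustrative choice.
    """
    hist = Counter(mismatch_counts)
    total = sum(hist.values())
    nonzero = {n: c for n, c in hist.items() if n > 0}
    if total == 0 or not nonzero:
        return None
    peak, peak_count = max(nonzero.items(), key=lambda item: item[1])
    if peak_count > hist.get(0, 0) and peak_count / total >= min_fraction:
        return peak
    return None


# Hypothetical counts for sequences assigned to one reported allele in a
# homozygous individual carrying an unreported allele four nucleotides away.
suspect_allele = [4] * 40 + [5] * 6 + [3] * 2 + [0] * 1 + [6] * 1

# Hypothetical counts for an allele that the individual genuinely carries.
known_allele = [0] * 45 + [1] * 4 + [2] * 1

suspect_peak = dominant_nonzero_peak(suspect_allele)
known_peak = dominant_nonzero_peak(known_allele)
```

With these invented counts, the first allele shows a peak at four mismatches and almost no perfect matches, while the second shows no dominant non-zero peak. Real data will be noisier, so the threshold would need to be chosen with the sequencing error rate and the dataset-wide mismatch distribution in mind.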
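To confirm an IGHV polymorphism, it is important to check that many different IGHD and IGHJ genes are associated with the putative polymorphism. A wide range of partner genes shows that the variant sequence is present in many independent rearrangements, rather than in a single clonally expanded and mutated sequence.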
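A minimal sketch of this check is given below. It assumes each rearrangement is a dictionary with v_call, d_call and j_call keys (the field names are an assumption about how the alignment output is stored), and the label given to the putative allele is hypothetical. How many distinct partner genes count as "many" is left to the reader.

```python
def partner_diversity(rearrangements, v_call):
    """Count the distinct IGHD and IGHJ genes paired with one IGHV call.

    Records with an empty or missing D or J call are ignored for that gene.
    Returns a tuple (number_of_distinct_D_genes, number_of_distinct_J_genes).
    """
    d_genes = set()
    j_genes = set()
    for record in rearrangements:
        if record.get("v_call") != v_call:
            continue
        if record.get("d_call"):
            d_genes.add(record["d_call"])
        if record.get("j_call"):
            j_genes.add(record["j_call"])
    return len(d_genes), len(j_genes)


# Hypothetical records; "putative_IGHV_allele" is an invented label.
records = [
    {"v_call": "putative_IGHV_allele", "d_call": "IGHD3-10", "j_call": "IGHJ4"},
    {"v_call": "putative_IGHV_allele", "d_call": "IGHD2-2", "j_call": "IGHJ6"},
    {"v_call": "putative_IGHV_allele", "d_call": "IGHD6-19", "j_call": "IGHJ3"},
    {"v_call": "putative_IGHV_allele", "d_call": "IGHD3-10", "j_call": "IGHJ4"},
    {"v_call": "putative_IGHV_allele", "d_call": "", "j_call": "IGHJ5"},
    {"v_call": "other_IGHV_allele", "d_call": "IGHD1-1", "j_call": "IGHJ1"},
]

n_d, n_j = partner_diversity(records, "putative_IGHV_allele")
```

A putative allele seen with only one IGHD gene and one IGHJ gene would deserve suspicion, since all of its sequences could then derive from a single clone.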
A utility that automates the identification of polymorphisms has been developed by Steve Kleinstein's group at Yale University. The utility is available at http://clip.med.yale.edu/tigger/ and is described in a PNAS paper.