Python_MadMapper_V112_RECBIT.py Scoring System

by Alexander Kozik, UC Davis, R.Michelmore group

The BIT scoring system of the Python_MadMapper_V112_RECBIT.py script for RILs (recombinant inbred lines) is similar to the scoring systems used for alignments of protein or DNA sequences. The genotype data for a particular marker across a set of RILs can be treated as a sequence (string), so comparing two or more markers amounts to comparing two or more strings. As in protein/DNA sequence alignment, the comparison relies on a scoring matrix. For RILs, the scoring matrix may look like this (in each cell, the upper value is the BIT score and the lower value is the REC score):


           #################################################################
           #                                   +-------+                   #
           #                                   |  BIT  |                   #
           #                  SCORING SYSTEM:  |       |                   #
           #                                   |  REC  |                   #
           #                                   +-------+                   #
           #                                                               #
           #    .      +-------+-------+-------+-------+-------+-------+   #
           #      .    |       |       |       |       |       |       |   #
           #        .  |   A   |   B   |   C   |   D   |   H   |   -   |   #
           #          .|       |       |       |       |       |       |   #
           #   +-------*-------+-------+-------+-------+-------+-------+   #
           #   |       | . 6   |  -6   |  -4   |   4   |  -2   |   0   |   #
           #   |   A   |       |       |       |       |       |       |   #
           #   |       |   0  .|   1   |   1   |   0   |  0.5  |   0   |   #
           #   +-------+-------*-------+-------+-------+-------+-------+   #
           #   |       |  -6   | . 6   |   4   |  -4   |  -2   |   0   |   #
           #   |   B   |       |       |       |       |       |       |   #
           #   |       |   1   |   0  .|   0   |   1   |  0.5  |   0   |   #
           #   +-------+-------+-------*-------+-------+-------+-------+   #
           #   |       |  -4   |   4   | . 4   |  -4   |   0   |   0   |   #
           #   |   C   |       |       |       |       |       |       |   #
           #   |       |   1   |   0   |   0  .|   1   |   0   |   0   |   #
           #   +-------+-------+-------+-------*-------+-------+-------+   #
           #   |       |   4   |  -4   |  -4   | . 4   |   0   |   0   |   #
           #   |   D   |       |       |       |       |       |       |   #
           #   |       |   0   |   1   |   1   |   0  .|   0   |   0   |   #
           #   +-------+-------+-------+-------+-------*-------+-------+   #
           #   |       |  -2   |  -2   |   0   |   0   | . 2   |   0   |   #
           #   |   H   |       |       |       |       |       |       |   #
           #   |       |  0.5  |  0.5  |   0   |   0   |   0  .|   0   |   #
           #   +-------+-------+-------+-------+-------+-------*-------+   #
           #   |       |   0   |   0   |   0   |   0   |   0   | . 0   |   #
           #   |   -   |       |       |       |       |       |       |   #
           #   |       |   0   |   0   |   0   |   0   |   0   |   0  .|   #
           #   +-------+-------+-------+-------+-------+-------+-------*.  #
           #                                                               #
           #                                                               #
           #   NOTES:                                                      #
           #      C - NOT A ( H or B )                                     #
           #      D - NOT B ( H or A )                                     #
           #      H - A and B                                              #
           #                                                               #
           #################################################################
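
One way to hold the matrix above in code is a lookup table keyed by pairs of genotype codes. The sketch below is only an illustration: the names CODES, UPPER and cell are hypothetical and are not taken from Python_MadMapper_V112_RECBIT.py, and storing only the upper triangle is an assumption that relies on the matrix being symmetric, as it is in the table.

```python
# Hypothetical encoding of the scoring matrix; names are illustrative only
# and are not taken from Python_MadMapper_V112_RECBIT.py.
CODES = "ABCDH-"

# Upper triangle (diagonal included) as (BIT, REC) pairs, read row by row.
UPPER = {
    ("A", "A"): (6, 0.0), ("A", "B"): (-6, 1.0), ("A", "C"): (-4, 1.0),
    ("A", "D"): (4, 0.0), ("A", "H"): (-2, 0.5),
    ("B", "B"): (6, 0.0), ("B", "C"): (4, 0.0), ("B", "D"): (-4, 1.0),
    ("B", "H"): (-2, 0.5),
    ("C", "C"): (4, 0.0), ("C", "D"): (-4, 1.0), ("C", "H"): (0, 0.0),
    ("D", "D"): (4, 0.0), ("D", "H"): (0, 0.0),
    ("H", "H"): (2, 0.0),
}


def cell(x, y):
    """Return the (BIT, REC) pair for one pair of genotype codes."""
    if x not in CODES or y not in CODES or len(x) != 1 or len(y) != 1:
        raise ValueError("unknown genotype code: %r, %r" % (x, y))
    if x == "-" or y == "-":
        # Missing data contributes nothing to either score.
        return (0, 0.0)
    if (x, y) in UPPER:
        return UPPER[(x, y)]
    # The matrix is symmetric, so fall back to the mirrored entry.
    return UPPER[(y, x)]


if __name__ == "__main__":
    for pair in [("A", "B"), ("H", "A"), ("D", "C"), ("-", "C")]:
        print(pair, cell(*pair))
```

Keeping BIT and REC together in one tuple mirrors the two-value cells of the table, so a single lookup yields both scores for a position.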

The values in this particular scoring matrix were determined empirically through trial and error. They were adjusted to give good, reproducible results for sets of 100 - 200 RILs of generation 8 or higher. However, they will most likely also work with smaller or larger datasets. The examples below show how the scoring works:


           #################################################################
           #                                                               #
           #                        EXAMPLES OF SCORING:                   #
           #                                                               #
           #                                                               #
           #  POSITIVE LINKAGE:                                            #
           #                                                               #
           #  AAAAAAAAAAAAAAAAAAAA     BIT SCORE = 6*20 = 120              #
           #  AAAAAAAAAAAAAAAAAAAA     REC SCORE = 0 (0.0)                 #
           #                    ..                                         #
           #  AAAAAAAAAAAAAAAAAAAA     BIT SCORE = 6*18 - 6*2 = 96         #
           #  AAAAAAAAAAAAAAAAAABB     REC SCORE = 2 (2/20 = 0.1)          #
           #                                                               #
           #  AAAAAAAAAABBBBBBBBBB     BIT SCORE = 6*10 + 6*10 = 120       #
           #  AAAAAAAAAABBBBBBBBBB     REC SCORE = 0 (0.0)                 #
           #           ..                                                  #
           #  AAAAAAAAABABBBBBBBBB     BIT SCORE = 6*18 - 6*2 = 96         #
           #  AAAAAAAAAABBBBBBBBBB     REC SCORE = 2 (2/20 = 0.1)          #
           #                                                               #
           #                                                               #
           #  NO LINKAGE:                                                  #
           #            ..........                                         #
           #  AAAAAAAAAAAAAAAAAAAA     BIT SCORE = 6*10 - 6*10 = 0         #
           #  AAAAAAAAAABBBBBBBBBB     REC SCORE = 10 (10/20 = 0.5)        #
           #   . . . . . . . . . .                                         #
           #  BBBAABBAAAAAAABAABBB     BIT SCORE = 6*10 - 6*10 = 0         #
           #  BABBAABBABABABBBAABA     REC SCORE = 10 (10/20 = 0.5)        #
           #                                                               #
           #                                                               #
           #  NEGATIVE LINKAGE:                                            #
           #    ..................                                         #
           #  AAAAAAAAAAAAAAAAAAAA     BIT SCORE = 6*2 - 6*18 = -96        #
           #  AABBBBBBBBBBBBBBBBBB     REC SCORE = 18 (18/20 = 0.9)        #
           #    ..................                                         #
           #  ABABABABABABABABABAB     BIT SCORE = 6*2 - 6*18 = -96        #
           #  ABBABABABABABABABABA     REC SCORE = 18 (18/20 = 0.9)        #
           #                                                               #
           #                                                               #
           #################################################################
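
The arithmetic in the examples above can be checked with a few lines of Python. This is a minimal sketch, not the script's own code: the function name score_ab is hypothetical, and it covers only the A and B codes that appear in the examples (BIT +6 for a match, -6 for a mismatch; REC 0 for a match, 1 for a mismatch).

```python
def score_ab(first, second):
    """Return (BIT, REC, REC fraction) for two equal-length A/B strings.

    Hypothetical helper covering only the A/B entries of the matrix.
    """
    if len(first) != len(second) or not first:
        raise ValueError("strings must be non-empty and of equal length")
    bit = 0
    rec = 0
    for x, y in zip(first, second):
        if x not in "AB" or y not in "AB":
            raise ValueError("only A and B are handled in this sketch")
        if x == y:
            bit += 6      # A/A or B/B
        else:
            bit -= 6      # A/B or B/A
            rec += 1
    return bit, rec, rec / len(first)


# The six pairs from the examples above, in the same order.
EXAMPLES = [
    ("AAAAAAAAAAAAAAAAAAAA", "AAAAAAAAAAAAAAAAAAAA"),
    ("AAAAAAAAAAAAAAAAAAAA", "AAAAAAAAAAAAAAAAAABB"),
    ("AAAAAAAAABABBBBBBBBB", "AAAAAAAAAABBBBBBBBBB"),
    ("BBBAABBAAAAAAABAABBB", "BABBAABBABABABBBAABA"),
    ("AAAAAAAAAAAAAAAAAAAA", "AABBBBBBBBBBBBBBBBBB"),
    ("ABABABABABABABABABAB", "ABBABABABABABABABABA"),
]

if __name__ == "__main__":
    for first, second in EXAMPLES:
        bit, rec, fraction = score_ab(first, second)
        print(first, second, "BIT =", bit, "REC =", rec, "(%.1f)" % fraction)
```

A REC fraction near 0 with a high positive BIT score corresponds to positive linkage, a fraction near 0.5 with a BIT score near 0 to no linkage, and a fraction near 1 with a strongly negative BIT score to negative linkage.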

We have applied this approach to a real dataset of Arabidopsis RILs. The results of scoring and clustering can be found here.


email to: akozik@atgc.org Alexander Kozik

last modified July 27 2004