LOT

LOT is a software program that performs linkage analysis of ordinal traits for pedigree data. It implements a latent-variable proportional-odds logistic model that relates inheritance patterns to the distribution of the ordinal trait.

Citation

  • Zhang, M., Feng, R., Chen, X., Hu, B., and Zhang, H. (2008) LOT: a Tool for Linkage Analysis of Ordinal Traits for Pedigree Data. Bioinformatics 24;1737-9.
  • Feng, R., Leckman, J.F., and Zhang, H. (2004) Linkage Analysis of Ordinal Traits for Pedigree Data. Proc Natl Acad Sci. 101;16739-16744.

Condition of Use

  • LOT can be used and distributed free of charge under the following terms and conditions. 
  • Any research using this program or the methods or ideas behind it should acknowledge the use of LOT, and cite the references in the Citation section.

THIS SOFTWARE IS PROVIDED BY THE COLLABORATIVE CENTER FOR STATISTICS IN SCIENCE AT YALE UNIVERSITY AS IS AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COLLABORATIVE CENTER FOR STATISTICS IN SCIENCE AT YALE UNIVERSITY BE LIABLE  FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Versions

  • LOT 1.2: April 24th, 2008
  • LOT 1.1: February 18th, 2008


Methodology

Four Main Steps of LOT 

I. Data Input

This part is modified from the GENEHUNTER program to accommodate the ordinal trait and to add the graphical user interface (GUI). See Input File Formats for the details.


II. Inference of Inheritance Vectors 

This part is taken from the GENEHUNTER program. The method is described in (Kruglyak et al., 1996). LOT infers the inheritance pattern of a pedigree by means of inheritance vectors, v, which do not depend on the type (continuous or categorical) of the trait. The inheritance pattern at a marker location is completely described by an inheritance vector $v = (v_1, v_2, \ldots, v_{2n-1}, v_{2n})$ whose elements describe the outcomes of the paternal and maternal meioses transmitted to the n offspring in a pedigree. Specifically, $v_{2j-1} = 1$ or 2 according to whether the grandpaternal or grandmaternal allele is transmitted in the paternal meiosis to the j-th offspring. $v_{2j}$ carries the similar information for the corresponding maternal meiosis, namely, $v_{2j} = 3$ or 4 according to whether the grandpaternal or grandmaternal allele was transmitted in the maternal meiosis to the j-th offspring.
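As a small illustration of the coding described above, the following Python sketch enumerates every inheritance vector for a pedigree with n offspring, using 1/2 for the paternal meiosis and 3/4 for the maternal meiosis. The function name and the flat tuple layout (paternal element followed by maternal element for each offspring) are assumptions made for illustration; they are not LOT's or GENEHUNTER's internal representation.

```python
from itertools import product


def enumerate_inheritance_vectors(n_offspring):
    """Yield every inheritance vector for n_offspring offspring.

    Each offspring contributes two elements: the paternal-meiosis outcome
    (1 = grandpaternal allele, 2 = grandmaternal allele) followed by the
    maternal-meiosis outcome (3 = grandpaternal, 4 = grandmaternal).
    """
    per_offspring = [(p, m) for p in (1, 2) for m in (3, 4)]
    for combo in product(per_offspring, repeat=n_offspring):
        # Flatten ((p1, m1), (p2, m2), ...) into (p1, m1, p2, m2, ...)
        yield tuple(x for pair in combo for x in pair)


vectors = list(enumerate_inheritance_vectors(2))
print(len(vectors))  # 4 ** 2 = 16 vectors for two offspring
print(vectors[0])    # (1, 3, 1, 3)
```

The number of vectors grows as 4 to the power n, which is why exhaustive enumeration is only practical for pedigrees of moderate size.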



III. Latent Variable Proportional-odds Logistic Model 

This step assesses a potential link between a marker and the trait locus through the inheritance pattern at a locus. We use a proportional-odds logistic model that includes two types of latent random variables to detect association between a marker and a disease locus. The two types of latent variables, $U_1$ and $U_2$, represent: (1) the common genetic or environmental factors in a family that are not observed through the covariates and (2) the genetic susceptibility introduced by the family founders and transmitted to their offspring. Conditional on all of the latent variables and inheritance vectors, within the i-th family, the traits of all nonfounders are independent. Let superscript i denote the i-th family and subscript j denote the j-th nonfounder in the family. Given a trait $Y_j^i$ taking an ordinal value from $0, 1, \ldots, K$, the trait of the j-th nonfounder in the family follows the distribution:

$$\operatorname{logit} P\left(Y_j^i \le k \mid U_1^i, U_{2,2j-1}^i, U_{2,2j}^i\right) = x_j^i \beta + \alpha_k + \gamma_1 U_1^i + \gamma_2 \left(U_{2,2j-1}^i + U_{2,2j}^i\right), \quad k = 0, \ldots, K-1,$$

where x is the vector of covariates that is available for each study subject, $\beta$ is the vector of parameters reflecting the covariate effects on the trait, $\alpha_k$ is the trait-level-dependent intercept and $\gamma = (\gamma_1, \gamma_2)$ indicates the familial and genetic contributions to the trait. The EM algorithm (Dempster et al., 1977) is used to find the maximum-likelihood estimates (MLEs) of the parameters. After obtaining the MLEs of the parameters, the log-likelihood considering $U_1$ only ($\gamma_2 = 0$) and the log-likelihood considering both $U_1$ and $U_2$ are computed. The difference between the log-likelihoods is used for determining the significance level of linkage. Under certain regularity conditions, twice the difference follows a mixture of chi-square distributions under the null hypothesis ($\gamma_2 = 0$). We have also conducted extensive simulation experiments to derive the distribution of the log-likelihood ratio statistic under the null hypothesis for microsatellite markers, and we use the simulation result to set the levels of suggestive and significant linkage signals.
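To make the proportional-odds structure concrete, here is a minimal Python sketch that turns cumulative logits into per-level probabilities and forms the log-likelihood ratio statistic described above. It assumes the cumulative-logit convention logit P(Y <= k) = alpha_k + eta, where eta collects the covariate and latent-variable terms. The function names and the numeric inputs are hypothetical, and the sketch does not reproduce LOT's EM estimation.

```python
import math


def expit(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(intercepts, eta):
    """Return [P(Y = 0), ..., P(Y = K)] under a proportional-odds model.

    intercepts: increasing level-dependent intercepts alpha_0 < ... < alpha_{K-1}
    eta: the level-independent part of the linear predictor
         (assumed convention: logit P(Y <= k) = alpha_k + eta)
    """
    cumulative = [expit(a + eta) for a in intercepts] + [1.0]
    probs = [cumulative[0]]
    probs += [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]
    return probs


def likelihood_ratio_statistic(loglik_u1_only, loglik_u1_u2):
    """Twice the difference between the two maximized log-likelihoods."""
    return 2.0 * (loglik_u1_u2 - loglik_u1_only)


probs = category_probabilities([-1.0, 0.5, 2.0], eta=0.3)
print(probs, sum(probs))
print(likelihood_ratio_statistic(-103.5, -100.0))
```

Because every level shares the same eta, a change in the covariates or latent variables shifts all cumulative logits by the same amount, which is the proportional-odds assumption.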



IV. Output 

This part uses the JFreeChart library for the GUI.

LOT and GENEHUNTER 


LOT and GENEHUNTER (parametric analysis) have equivalent parametrizations when the trait is binary. For clarity, let us assume no residual familial and genetic effects and no covariates (i.e., no $U_1$ and x). For the parametric analysis in GENEHUNTER, the likelihood at a location t can be written as

img 047

where img 049 is the set of all possible inheritance vectors for the i-th family, f = (f0, f1, f2) denotes the fixed penetrance parameters that must be specified beforehand, img 051 and img 053 is the number of disease alleles for the j-th individual in the i-th family. img 055 corresponds to the disease allele frequency. For any given img 055, img 059 and img 061 control the penetrance of the binary trait in our model as follows

img 063, and thus,

img 065 img 067 and img 069 represent the equivalent parametrization of the penetrance in our model to that in GENEHUNTER.

Ascertainment

LOT itself does not consider ascertainment. Families may not be collected at random; often, only families with at least one member with particular trait values are included in the study. Non-random ascertainment may result in over-sampling of affected subjects relative to the original population. Parameter estimates may then be biased, and proper adjustment for ascertainment should be considered. Please refer to (Wang and Zhang, 2007) for more discussion.

Sensitivity of Parameter Inputs

It is important to assess how sensitive the results of a LOT scan are to the specified allele frequencies at the trait locus. This can be done by considering several choices (e.g., 0.05, 0.1, 0.2, 0.3, 0.4, and 0.5) of the allele frequency and/or by estimating them as part of the maximization of the penalized log-likelihood.
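One way to set up such a sensitivity scan is to generate one locus file per candidate allele frequency, changing only the row that holds the trait-locus allele frequencies (the fifth row in the format described under Input File Formats). The Python sketch below does this on a list of rows. It assumes that blank lines have been removed and that the second number on that row is the frequency being varied; both are assumptions for illustration, and the function name is hypothetical.

```python
def with_trait_allele_frequency(loc_rows, freq):
    """Return a copy of the locus-file rows with row 5 set to (1 - freq, freq)."""
    if not 0.0 < freq < 1.0:
        raise ValueError("allele frequency must be strictly between 0 and 1")
    rows = list(loc_rows)
    # Assumption: row 5 (index 4) holds the two trait-locus allele frequencies
    rows[4] = f"{1.0 - freq:.6f} {freq:.6f}"
    return rows


# First seven rows in the layout of sample.loc (blank lines removed)
sample_rows = [
    "31 0 0 5 0",
    "0 0.0 0.0 0",
    " ".join(str(i) for i in range(1, 32)),
    "1 2",
    "0.910000 0.090000",
    "1",
    "0.165000 0.575000 0.75000",
]

variants = {
    q: with_trait_allele_frequency(sample_rows, q)
    for q in (0.05, 0.1, 0.2, 0.3, 0.4, 0.5)
}
print(variants[0.05][4])
```

Each variant can then be written to its own file and run through LOT, and the resulting linkage curves compared across frequencies.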


Input File Formats

LOT uses input files in a slightly modified version of the standard GENEHUNTER (or LINKAGE) format. Two input files are required: a locus data file and a pedigree file.

(I) Locus file. This file contains information on the genetic distances between markers, the number of alleles at each locus and their frequencies. We explain this file using sample.loc as an example.

The first row of five numbers in sample.loc is:

31 0 0 5 0
  • 31 is the number of loci; also see LINKAGE manual.       
  • 0 refers to risk locus; also see LINKAGE manual.       
  • 0 means not sex linked; also see LINKAGE manual.      
  • 5 is the designated program code used by LINKAGE, referring to MLINK.      
  • 0 is the number of covariates. This is an added feature in LOT. If you have two covariates such as sex and age to adjust for, change 0 to 2.
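The five fields of the first row can be read into named values as in the following Python sketch. The class and field names are hypothetical labels chosen to mirror the descriptions above; they are not identifiers used by LOT or LINKAGE.

```python
from typing import NamedTuple


class LocusHeader(NamedTuple):
    n_loci: int        # number of loci
    risk_locus: int    # risk locus field
    sex_linked: int    # 0 means not sex linked
    program_code: int  # LINKAGE program code (5 refers to MLINK)
    n_covariates: int  # LOT-specific: number of covariates


def parse_first_row(line):
    """Parse the first row of a locus file into a LocusHeader."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 numbers, got {len(fields)}")
    return LocusHeader(*(int(f) for f in fields))


header = parse_first_row("31 0 0 5 0")
print(header.n_loci, header.n_covariates)  # 31 0
```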
The second row of four numbers in sample.loc is:
0 0.0 0.0 0


The four numbers represent:
  • Mutation locus:
    = 0, if mutation rates are zero,
    = the mutation locus number (input order) for non-zero mutation rates.
  • Male mutation rate.
  • Female mutation rate
  • Linkage disequilibrium
    = 0, if loci are assumed to be in linkage equilibrium.
    = 1, if loci are in linkage disequilibrium.
    When loci are in linkage equilibrium, allele frequencies must be given under each locus description; otherwise, haplotype frequencies are provided.
The third row in sample.loc is:
1 2 3 4 5 ... 31


This gives the order of all marker loci, from D1S468 to D1S1609.
The fourth row in sample.loc is:
1 2

  • 1 refers to the nature of the trait locus; always use 1 for LOT.
  • 2 means that the trait locus is di-allelic.
The fifth row in sample.loc gives the assumed allele frequencies of the trait locus.
0.910000  0.090000





The sixth row in sample.loc is a single number, the number of liability classes. Set it to 1 for LOT.
1





The seventh row in sample.loc specifies the penetrances of the genotypes at the diallelic trait locus.
0.165000 0.575000 0.75000



The following rows specify the allele frequencies of each marker. For example, marker D1S468 is classified as type "3" according to the LINKAGE program terminology and has 9 alleles. Hence, we enter the following information.
3 9 # D1S468
0.076492 0.014699 0.008799  0.261774  0.055894  0.026497  0.417558  0.132387
 0.005899




The last marker is D1S1609.
3 12 # D1S1609
0.005701  0.011301  0.073407 0.240124 0.330533  0.124312 0.096010  0.053705
0.050805 0.008501 0.002800  0.002800


    



After specifying all marker information, the following row is included in the locus file to conform to the format of the LINKAGE program, but it is not used by LOT.
0 0


  • The first 0 means no sex difference
  • The second 0 means no interference
The next row in sample.loc specifies the recombination fractions or map distances between adjacent loci.
0.000000 12.000000 13.710000 7.120000 8.280000 11.410000 15.850000
3.070000 13.830000 12.530000 7.020000 4.650000 11.820000 11.370000
3.510000 11.490000 12.210000 6.750000 4.780000 16.430000 10.140000
10.250000 6.020000 7.700000 7.220000 6.280000 7.570000 7.410000
12.870000 7.020000


Distances may be specified as either recombination fractions or centiMorgans. If every distance is less than 0.5, all distances are treated as recombination fractions; if any distance is greater than 0.5, all distances are interpreted as centiMorgans. Map distances should be given in Kosambi cM.
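The rule above can be sketched in Python as follows. The text does not say how a value of exactly 0.5 is handled, so the sketch treats it like the recombination-fraction case; that choice is an assumption. The second function is the standard Kosambi map function, included only to show how a Kosambi map distance relates to a recombination fraction; whether LOT performs this conversion internally is not stated here.

```python
import math


def distance_unit(distances):
    """Classify a row of inter-locus distances according to the 0.5 rule."""
    if any(d > 0.5 for d in distances):
        return "centiMorgan"
    # Assumption: a value of exactly 0.5 is treated as a recombination fraction
    return "recombination fraction"


def kosambi_to_recombination_fraction(centimorgans):
    """Kosambi map function: theta = 0.5 * tanh(2 * d), with d in Morgans."""
    return 0.5 * math.tanh(2.0 * centimorgans / 100.0)


print(distance_unit([0.0, 12.0, 13.71, 7.12]))  # centiMorgan
print(distance_unit([0.0, 0.10, 0.25]))         # recombination fraction
print(kosambi_to_recombination_fraction(12.0))
```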


The next row in sample.loc specifies in which order the phenotype is coded.
0

This row can either have a zero or a one. A zero here indicates that lower numbers signify higher severity of the disease trait and higher numbers signify lower severity. A one indicates that lower numbers signify lower severity of the disease trait and higher numbers signify higher severity.



The last two rows in sample.loc specify the number of thresholds and the threshold values used to divide the phenotype into levels.
3
1 2 3

The "3" on the first line in the example above indicates that the phenotypes are divided into 3 + 1 = 4 levels, namely 0, 1, 2 and 3. The next line provides the thresholds. A phenotype is assigned level 0 if it is strictly smaller than the first threshold, which is "1" in the example. A phenotype is assigned level 1 if it is greater than or equal to 1 and strictly smaller than 2. A phenotype is assigned level 2 if it is greater than or equal to 2 and strictly smaller than 3. If a phenotype is greater than or equal to 3, it is assigned level 3. In this sense, the upper threshold for level 3 (the highest level) is positive infinity and hence omitted. Thus, only 3 thresholds are provided for 4 levels. The assignment of phenotype Y in the above example is summarized in the following table.

Phenotype Y      Level
Y < 1            0
1 <= Y < 2       1
2 <= Y < 3       2
Y >= 3           3
(missing)        999
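The level assignment in the table amounts to counting how many thresholds are less than or equal to the phenotype. A minimal Python sketch, assuming the missing code 999 is checked before thresholding (the function and constant names are hypothetical):

```python
from bisect import bisect_right

MISSING_CODE = 999


def assign_level(phenotype, thresholds):
    """Map a phenotype to its ordinal level given sorted thresholds."""
    if phenotype == MISSING_CODE:
        return MISSING_CODE
    # Number of thresholds t with t <= phenotype
    return bisect_right(thresholds, phenotype)


thresholds = [1, 2, 3]
for y in (0.5, 1, 1.9, 2, 3, 10, 999):
    print(y, assign_level(y, thresholds))
```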

(II) Pedigree file. This file must consist of columns with the following information in the correct order (e.g., sample.ped):

Pedigree_ID Person_ID Father_ID Mother_ID Gender Phenotype Marker_genotypes Covariates.

The columns should be separated by spaces or tabs (any number of these is allowed).

1)   Pedigree ID: pedigree identifier

2)   Person ID: individual identifier

3)   Father ID and Mother ID: the parents of founders are coded as 0. Note: everyone must have either two parents or no parents in the data set. Enter 0 if one or both parents are not available.

4)   Gender: 1 = male, 2 = female. Note that gender can be entered again as a duplicate column later to serve as a covariate.

5)   Phenotype(Y): Missing phenotypes can be coded as 999.

6)   Marker genotype code: to code a codominant marker locus phenotype, simply list the two numbered alleles with at least one space or tab between the alleles. The unknown genotype is coded as 0 0.

7)   The covariates such as gender and age.

 

The pedigree/person IDs are treated as character strings. They do not have to be integers or numbered sequentially. The phenotype and covariates can be integers or reals.
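The column layout above can be read with a short Python sketch like the one below. The number of markers and covariates must be supplied by the caller, since the line itself does not carry them. The function name, the dictionary keys and the example line are hypothetical and serve only as an illustration of the layout.

```python
def parse_pedigree_line(line, n_markers, n_covariates):
    """Split one pedigree-file line into its named parts."""
    fields = line.split()  # any run of spaces or tabs separates columns
    expected = 6 + 2 * n_markers + n_covariates
    if len(fields) != expected:
        raise ValueError(f"expected {expected} columns, got {len(fields)}")
    alleles = [int(a) for a in fields[6:6 + 2 * n_markers]]
    phenotype = float(fields[5])
    return {
        "pedigree_id": fields[0],  # IDs are kept as character strings
        "person_id": fields[1],
        "father_id": fields[2],
        "mother_id": fields[3],
        "is_founder": fields[2] == "0" and fields[3] == "0",
        "gender": int(fields[4]),  # 1 = male, 2 = female
        "phenotype": None if phenotype == 999 else phenotype,
        # (0, 0) marks an unknown genotype
        "genotypes": [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)],
        "covariates": [float(c) for c in fields[6 + 2 * n_markers:]],
    }


record = parse_pedigree_line("fam1 3 1 2 2 1 4 7 0 0 2 35.5", n_markers=2, n_covariates=2)
print(record)
```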


Downloads

Executables

  •  LOT_Windows.exe
  •  LOT_Windows_command_line.exe
  • LOT_Linux.tar.gz (The javax.swing package must be installed on your system before running LOT.)
  •  LOT_Linux_command_line
  •  LOT_Mac_command_line

Sample Files


Running LOT

a. Running LOT with the graphical user interface on Windows or Linux.

1. After downloading the executables for your operating system (Windows or Linux) and extracting all files from the archive, double click on LOT.jar to start the program. To start a new calculation, click on "New Project" in the File menu.

2. Select the input files and click on Add to add them to the project. You can add more than one set of input files to a project at the same time to have them run sequentially.

Please note that if you are running LOT with the GUI in Linux, there must be no white space in the file names or paths. White space in file names or paths causes a "Wrong number of parameters" error.

3. Click on Clear if the selected input files need to be discarded, and repeat step 2 to select the desired input files. Otherwise, click on Run to perform calculations on the input files displayed in the File Selected window. Intermediate output produced by the program, which indicates the progress of computation, is displayed in the Intermediate Output window.

4. After computation is done, LOT displays "LOT finished" and the location of the result file in the Intermediate Output window.

5. You can click on View Result Files to open the dialog that contains the options for displaying the result files. In the dropdown list, select the output file you would like to see.

6. Click on View Text and/or View Image to display the tabulated text output and the diagram. The tabulated text output lists the name of each marker, the position of each marker and inter-marker location, the log-likelihood computed without considering any latent variables, the log-likelihood computed with U1, and the log-likelihood computed considering both U1 and U2. The graphical output plots the difference between the log-likelihood considering both U1 and U2 and the log-likelihood considering only U1 against the positions of the markers and inter-marker locations (the green curve). The blue line and red line are thresholds for suggestive linkage and significant linkage obtained from simulation studies with 400 microsatellite markers. If the difference in log-likelihood exceeds these thresholds, the names of the markers at that particular location are displayed on the curve.

7. While the tabulated text output is automatically saved into a tab-delimited plain text file, the user has the option to save the diagram in PNG format by selecting Save As in the File menu in the diagram's dialog.

The user also has the option to suppress the marker names displayed on the curve by clicking on the Hide Significant Markers button on the lower left corner of the window.

8. You can save the current project by selecting Save from the File menu of the main program window. By doing so, the next time you open the LOT program you can view the results from this project without repeating the calculation.

9. To open a saved project, click on Open Project in the File menu. Repeat steps 5-7 to view the results saved for the project.
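The quantity plotted in step 6 can also be computed directly from the tabulated values. The Python sketch below takes rows that have already been read from the text output and labels each location by comparing the log-likelihood difference with two thresholds. The row layout, the marker labels, the log-likelihood values and the threshold values are all made up for illustration; the thresholds LOT actually uses come from its simulation studies and are not given here.

```python
def classify_locations(rows, suggestive, significant):
    """Label each location by its log-likelihood difference.

    rows: iterable of (label, position, loglik_u1, loglik_u1_u2)
    suggestive, significant: user-supplied thresholds on the difference
    """
    results = []
    for label, position, loglik_u1, loglik_u1_u2 in rows:
        difference = loglik_u1_u2 - loglik_u1
        if difference > significant:
            status = "significant"
        elif difference > suggestive:
            status = "suggestive"
        else:
            status = "none"
        results.append((label, position, difference, status))
    return results


# Hypothetical values, not output from a real LOT run
rows = [
    ("M1", 0.0, -250.0, -249.5),
    ("M2", 12.0, -250.0, -247.0),
    ("M3", 25.7, -250.0, -245.0),
]
results = classify_locations(rows, suggestive=2.0, significant=4.0)
for r in results:
    print(r)
```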

b. Running LOT from the command line in Windows.

After downloading LOT_Windows_command_line.exe, there are two ways to invoke it.

  • Double click on LOT_Windows_command_line.exe and a DOS window will pop up. The user will be prompted to enter the .ped, .loc and output file names. After the file names are provided, the LOT program will start executing.
  • In the folder where LOT_Windows_command_line.exe is saved, it can be invoked by typing LOT_Windows_command_line.exe sample.ped sample.loc sample_output.txt in a DOS window. Replace sample.ped, sample.loc and sample_output.txt with the names of the pedigree file, locus file and output file of your choice. If more or fewer than three file names are provided, the program prompts the user to enter the file names again.

Please note that if the files are not located in the same folder as the executable, use the full path instead of just the file names. The tab-delimited text output is the only output file provided under this option.


c. Running LOT from the command line in Linux.

After downloading LOT_Linux_command_line, in the folder where LOT_Linux_command_line is saved, it can be invoked by typing "./LOT_Linux_command_line sample.ped sample.loc sample_output.txt" in a terminal window. Replace "sample.ped", "sample.loc" and "sample_output.txt" with the names of the pedigree file, locus file and output file of your choice. If more or fewer than three file names are provided, the program prompts the user to enter the file names again.

Please note that if the files are not located in the same folder as the executable, use the full path instead of just the file names. If white spaces are part of a file name (or path), enclose the file name (or the path) in double quotation marks. The tab-delimited text output is the only output file provided under this option.

d. Running LOT from the command line in Mac OS X.

After downloading LOT_Mac_command_line, in the folder where LOT_Mac_command_line is saved, it can be invoked by typing "./LOT_Mac_command_line sample.ped sample.loc sample_output.txt" in a terminal window. Replace "sample.ped", "sample.loc" and "sample_output.txt" with the names of the pedigree file, locus file and output file of your choice. If more or fewer than three file names are provided, the program prompts the user to enter the file names again.

Please note that if the files are not located in the same folder as the executable, use the full path instead of just the file names. If white spaces are part of a file name (or path), enclose the file name (or the path) in double quotation marks. The tab-delimited text output is the only output file provided under this option.
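When several data sets need to be run, the command-line executables can be driven from a script. The Python sketch below builds the three-argument command described above and passes it to the operating system as a list, so that file names containing white space need no manual quoting. The helper names are hypothetical, and the sketch assumes the executable takes exactly the pedigree file, locus file and output file in that order, as stated above.

```python
import shlex
import subprocess


def build_command(executable, ped_file, loc_file, output_file):
    """Return the argument list: executable, pedigree, locus, output."""
    return [executable, ped_file, loc_file, output_file]


def run_lot(executable, ped_file, loc_file, output_file):
    """Run the command-line executable and raise if it exits with an error."""
    cmd = build_command(executable, ped_file, loc_file, output_file)
    return subprocess.run(cmd, check=True)


cmd = build_command(
    "./LOT_Linux_command_line", "my data/sample.ped", "sample.loc", "sample_output.txt"
)
# Show the equivalent shell command with quoting applied where needed
print(shlex.join(cmd))
```

Calling run_lot with the same arguments would start the analysis; it is not called here so that the sketch can be read without the executable present.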

GENEHUNTER License Agreement

Below is the GENEHUNTER license agreement, listed as required by GENEHUNTER.

* License Agreement
Copyright (c) 1995
Whitehead Institute for Biomedical Research. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1.      Redistributions must reproduce the above copyright notice, this list of conditions and the following disclaimer in the  documentation and/or other materials provided with the distribution. Redistributions of source code must also reproduce this information in the source code itself.

2.      If the program is modified, redistributions must include a notice (in the same places as above) indicating that the redistributed program is not identical to the version distributed by Whitehead Institute.

3.      All advertising materials mentioning features or use of this software must display the following acknowledgement:

        This product includes software developed by the Whitehead Institute for Biomedical Research.

4.      The name of the Whitehead Institute may not be used to endorse or promote products derived from this software  without specific prior written permission.

We request that users of this software inform us by sending email to software_registration@genome.wi.mit.edu.

We also request that use of this software be cited in publications as:

L. Kruglyak, M.J. Daly, M.P. Reeve-Daly, and E.S. Lander. "Parametric and Nonparametric Linkage Analysis: A Unified Multipoint Approach". American Journal of Human Genetics 58:1347-1363 (June 1996).  For versions 1.2 and above, please also cite: L. Kruglyak and E.S. Lander. "Faster Multipoint Linkage Analysis Using Fourier Transforms".  Journal of Computational Biology 5:1-7 (1998).

THIS SOFTWARE IS PROVIDED BY THE WHITEHEAD INSTITUTE ``AS IS'' AND  ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE  ARE DISCLAIMED. IN NO EVENT SHALL THE WHITEHEAD INSTITUTE BE LIABLE  FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS  OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)  HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.