LOT

 

LOT is a software program that performs linkage analysis of ordinal traits for pedigree data. It implements a latent-variable proportional-odds logistic model that relates inheritance patterns to the distribution of the ordinal trait.

 

Contents

  1. Citation
  2. Condition of use
  3. Versions
  4. Methodology
  5. Input file formats
    1. .loc file
    2. .ped file
  6. Downloads
  7. Running LOT
    1. Running LOT with GUI on Windows and Linux
    2. Running LOT from command line in Windows
    3. Running LOT from command line in Linux
    4. Running LOT from command line in Mac OS X
  8. Genehunter License Agreement

 

 

 

Citation

 

*      Zhang, M., Feng, R., Chen, X., Hu, B., and Zhang, H. LOT: a Tool for Linkage Analysis of Ordinal Traits for Pedigree Data. [Under review at Bioinformatics]

*      Feng, R., Leckman, J.F., and Zhang, H. (2004) Linkage Analysis of Ordinal Traits for Pedigree Data. Proc Natl Acad Sci. 101:16739-16744.

 


 

 

Condition of use

 

*      LOT can be used and distributed free of charge under the following terms and conditions.

*      Any research using this program or the methods or ideas behind it should acknowledge the use of LOT, and cite the references in the Citation section.

THIS SOFTWARE IS PROVIDED BY THE COLLABORATIVE CENTER FOR STATISTICS IN SCIENCE AT YALE UNIVERSITY "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COLLABORATIVE CENTER FOR STATISTICS IN SCIENCE AT YALE UNIVERSITY BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

 


 

 

Versions

 

*      LOT 1.2                 April 24th, 2008

*      LOT 1.1                 February 18th, 2008

 


 

 

Methodology

 

*      Four Main Steps of LOT

              I.            Data Input

This part is modified from the GENEHUNTER program to accommodate the ordinal trait and to add the graphical user interface (GUI). See Input file formats for details.

           II.            Inference of Inheritance Vectors

This part is taken from the GENEHUNTER program. The method is described in (Kruglyak et al., 1996). LOT infers the inheritance pattern of a pedigree by means of inheritance vectors, $v$, and this inference does not depend on the type (continuous or categorical) of the trait. The inheritance pattern at a marker location is completely described by an inheritance vector $v = (v_1, v_2, \ldots, v_{2n-1}, v_{2n})$ whose elements describe the outcomes of the paternal and maternal meioses transmitted to the $n$ offspring in a pedigree. Specifically, $v_{2j-1} = 1$ or 2 according to whether the grandpaternal or grandmaternal allele is transmitted in the paternal meiosis to the $j$th offspring. $v_{2j}$ carries the corresponding information for the maternal meiosis, namely, $v_{2j} = 3$ or 4 according to whether the grandpaternal or grandmaternal allele is transmitted in the maternal meiosis to the $j$th offspring.
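The coding of an inheritance vector can be illustrated with a short sketch. It assumes the vector is stored as a flat list with two entries per offspring (paternal meiosis first, then maternal meiosis); the function name and this storage layout are hypothetical and are not taken from the LOT or GENEHUNTER source code.

```python
def decode_inheritance_vector(v):
    """Translate a flat inheritance vector into per-offspring meiosis outcomes.

    Assumed layout (illustration only): entries come in pairs,
    (paternal meiosis, maternal meiosis), one pair per offspring.
    Paternal entries are coded 1 or 2, maternal entries 3 or 4.
    """
    if len(v) % 2 != 0:
        raise ValueError("expected two entries (paternal, maternal) per offspring")
    paternal_labels = {1: "grandpaternal", 2: "grandmaternal"}
    maternal_labels = {3: "grandpaternal", 4: "grandmaternal"}
    decoded = []
    for j in range(len(v) // 2):
        paternal, maternal = v[2 * j], v[2 * j + 1]
        if paternal not in paternal_labels or maternal not in maternal_labels:
            raise ValueError(f"invalid coding for offspring {j + 1}")
        decoded.append((paternal_labels[paternal], maternal_labels[maternal]))
    return decoded


# Two offspring: the first receives grandpaternal alleles in both meioses,
# the second receives the grandmaternal allele in the paternal meiosis.
example = decode_inheritance_vector([1, 3, 2, 3])
```

Each returned pair names the grandparental origin of the allele passed on by the father and by the mother, respectively.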

         III.            Linkage Analysis

This step assesses a potential link between a marker and the trait locus through the inheritance pattern at a locus. We use a proportional-odds logistic model that includes two types of latent random variables to detect association between a marker and a disease locus. The two types of latent variables, $U_1$ and $U_2$, represent (1) the common genetic or environmental factors in a family that are not observed through the covariates and (2) the genetic susceptibility introduced by the family founders and transmitted to their offspring. Conditional on all of the latent variables and inheritance vectors, the traits of all nonfounders within a family are independent. Let superscript $i$ denote the family and subscript $j$ denote the $j$th nonfounder in the family. Given a trait taking an ordinal value from $1, \ldots, K$, the trait of the $j$th nonfounder in the $i$th family follows the distribution:

$$\operatorname{logit} P(Y_j^i \le k \mid U_1^i, U_2^i, v^i) = x_j^{i\top} \beta + \alpha_k + \gamma_1 U_1^i + \gamma_2 U_{2,j}^i, \quad k = 1, \ldots, K-1,$$

where $x$ is the vector of covariates that is available for each study subject, $\beta$ is the vector of parameters reflecting the covariate effects on the trait, $\alpha_k$ is the trait-level-dependent intercept, $U_{2,j}^i$ is the genetic susceptibility transmitted to the $j$th nonfounder, and $\gamma = (\gamma_1, \gamma_2)$ indicates the familial and genetic contributions to the trait. The EM algorithm (Dempster et al., 1977) is used to find the maximum-likelihood estimates (MLEs) of the parameters. After the MLEs are obtained, two log-likelihoods are computed: one considering $U_1$ only ($\gamma_2 = 0$) and one considering both $U_1$ and $U_2$. The difference between the log-likelihoods is used to determine the significance level of linkage. Under certain regularity conditions, twice the difference follows a mixture of chi-square distributions under the null hypothesis $\gamma_2 = 0$. We have also conducted extensive simulation experiments to derive the distribution of the log-likelihood ratio statistic under the null hypothesis for microsatellite markers, and we use the simulation results to set the thresholds for suggestive and significant linkage signals.
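To make the two ingredients of this step concrete, the sketch below computes ordinal category probabilities from cumulative logits and converts a pair of log-likelihoods into a p-value. It is an illustration under stated assumptions, not LOT's implementation: the function names are hypothetical, the linear predictor `eta` stands for everything added to the intercept (covariate and latent-variable terms), and the 50:50 mixture of a point mass at zero and a chi-square with one degree of freedom is one common choice of mixture that is assumed here. The text above states only that a mixture applies and that LOT's thresholds come from simulation.

```python
import math


def expit(z):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(alphas, eta):
    """Return P(Y = k) for k = 1..K under a proportional-odds model.

    alphas: increasing intercepts alpha_1 < ... < alpha_{K-1}
    eta:    the remaining linear predictor (covariate and latent terms),
            so that logit P(Y <= k) = alpha_k + eta.
    """
    cumulative = [expit(a + eta) for a in alphas] + [1.0]
    probabilities = []
    previous = 0.0
    for c in cumulative:
        probabilities.append(c - previous)
        previous = c
    return probabilities


def mixture_chi2_pvalue(loglik_null, loglik_alt):
    """P-value of the likelihood-ratio statistic 2 * (loglik_alt - loglik_null).

    Assumes a 50:50 mixture of a point mass at 0 and a chi-square
    distribution with 1 degree of freedom (an assumption of this sketch).
    """
    statistic = max(0.0, 2.0 * (loglik_alt - loglik_null))
    if statistic == 0.0:
        return 1.0
    # Survival function of chi-square(1) is erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(statistic / 2.0))


probs = category_probabilities([-1.0, 1.0], eta=0.0)  # three ordinal levels
p_value = mixture_chi2_pvalue(loglik_null=-100.0, loglik_alt=-98.0)
```

The category probabilities always sum to one because the last cumulative probability is fixed at 1; the intercepts must be increasing for every category probability to be positive.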

               IV.      Output

This part uses the JFreeChart library for the GUI.

*        LOT and GENEHUNTER

LOT and Genehunter (parametric analysis) have equivalent parametrizations when the trait is binary. For clarity, let us assume no residual familial and genetic effects and no covariates (i.e., no  and ). For the parametric analysis in GENEHUNTER, the likelihood at a location  can be written as

,

where  is the set of all possible inheritance vectors for the ith family, f=(f0, f1, f2) denotes the fixed penetrance parameters that must be specified beforehand,  and  is the number of disease allele for the jth individual in the ith family.  corresponds to the disease allele frequency. For any given ,  and  that control the penetrance of the binary trait in our model  as follows , and thus ,  and  represent the equivalent parametrization of the penetrance in our model to that in GENEHUNTER.

*      Ascertainment

LOT itself does not consider ascertainment. Families may not be collected at random; often, only families with at least one member with particular trait values are included in the study. Such non-random ascertainment may over-sample affected subjects relative to the original population. Parameter estimates may then be biased, and a proper adjustment for ascertainment should be considered. Please refer to (Wang and Zhang, 2007) for more discussion.

*      Sensitivity of Parameter Inputs

It is important to assess how sensitive the results of a LOT scan are to the specified allele frequencies at the trait locus. This can be done by considering several choices (e.g., 0.05, 0.1, 0.2, 0.3, 0.4, and 0.5) of the allele frequency and/or by estimating the allele frequencies as part of the maximization of the penalized log-likelihood.
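One way to prepare such a sensitivity scan is to generate one locus file per candidate allele frequency. The sketch below assumes the row layout of sample.loc described under Input file formats, with the trait-locus allele frequencies on the fifth row and no blank or comment lines before it. The function name, the ordering of the two frequencies as (1 - q, q), and the toy file content are assumptions made for illustration; running LOT on each variant is not shown.

```python
def with_trait_allele_frequency(loc_text, q):
    """Return loc_text with its fifth row replaced by the frequencies (1 - q, q).

    Assumes (illustration only) that the fifth row holds the two
    trait-locus allele frequencies and that q is the second of them.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    lines = loc_text.splitlines()
    if len(lines) < 5:
        raise ValueError("expected at least five rows in the locus file")
    lines[4] = f"{1.0 - q:.6f} {q:.6f}"
    return "\n".join(lines) + "\n"


# Toy stand-in for the first seven rows of a locus file.
marker_order = " ".join(str(i) for i in range(1, 32))
sample_rows = "\n".join([
    "31 0 0 5 0",
    "0 0.0 0.0 0",
    marker_order,
    "1 2",
    "0.910000 0.090000",
    "1",
    "0.165000 0.575000 0.75000",
]) + "\n"

candidate_frequencies = (0.05, 0.1, 0.2, 0.3, 0.4, 0.5)
variants = {q: with_trait_allele_frequency(sample_rows, q)
            for q in candidate_frequencies}
```

Each entry of `variants` can be written to its own file and analyzed separately, so that the linkage results can be compared across the candidate frequencies.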

 


 

 

Input file formats

 

LOT uses input files in a slightly modified version of the standard GENEHUNTER (or LINKAGE) format. Two input files are required: a locus data file and a pedigree file.

 

(I) Locus file. This file contains information on the genetic distances between markers, the number of alleles at each locus, and their frequencies. We explain this file using sample.loc.

 

The first row of five numbers in sample.loc is

31 0 0 5 0

*      31 is the number of loci; also see LINKAGE manual.

*      0 refers to risk locus; also see LINKAGE manual.

*      0 means not sex linked; also see LINKAGE manual.

*      5 is the designated program code used by LINKAGE, referring to MLINK.

*      0 is the number of covariates. This is an added feature in LOT. If you have two covariates such as sex and age to adjust for, change 0 to 2.
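As an illustration of how the first row could be read programmatically, the following sketch parses the five numbers into named fields. The class and function names are hypothetical, and the interpretation of each field simply follows the list above.

```python
from typing import NamedTuple


class LocHeader(NamedTuple):
    """First row of a LOT locus file (field names chosen for illustration)."""
    n_loci: int
    risk_locus: int
    sex_linked: bool
    program_code: int
    n_covariates: int


def parse_loc_header(line):
    """Parse the first row of a .loc file, e.g. '31 0 0 5 0'."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 numbers on the first row, got {len(fields)}")
    n_loci, risk_locus, sex_linked, program_code, n_covariates = (
        int(field) for field in fields
    )
    if n_covariates < 0:
        raise ValueError("the number of covariates cannot be negative")
    return LocHeader(n_loci, risk_locus, bool(sex_linked),
                     program_code, n_covariates)


header = parse_loc_header("31 0 0 5 0")
```

For the sample row, `header.n_covariates` is 0; a file that adjusts for sex and age would carry 2 in the last position instead.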

 

The second row of four numbers in sample.loc is

0 0.0 0.0 0

The four numbers represent:

*      Mutation locus:

= 0, if mutation rates are zero,

= the mutation locus number (input order) for non-zero mutation rates.

*      Male mutation rate.

*      Female mutation rate.

*      Linkage disequilibrium

= 0, if loci are assumed to be in linkage equilibrium.

= 1, if loci are in linkage disequilibrium. When loci are in linkage equilibrium, allele frequencies must be given under each locus description; otherwise, haplotype frequencies are provided.

 

The third row in sample.loc is

1 2 3 4 5 ... 31

This gives the order of all marker loci, from D1S468 to D1S1609.

 

The fourth row in sample.loc is

1 2

*      1 refers to the nature of the trait locus; always use 1 for LOT.

*      2 means that the trait locus is diallelic.

 

The fifth row in sample.loc gives the assumed allele frequencies of the trait locus.

0.910000 0.090000

The sixth row in sample.loc is a single number, referring to the number of liability classes. Set it to 1 for LOT.

1

The seventh row in sample.loc specifies the penetrances of the genotypes at the diallelic trait locus.

0.165000 0.575000 0.75000
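The fourth to seventh rows together describe the trait locus, and a few consistency checks can catch common editing mistakes. The sketch below is a hypothetical helper, not part of LOT: it assumes a diallelic trait locus with three genotype penetrances, as in sample.loc, and the function and key names are invented for illustration.

```python
import math


def parse_trait_locus(rows):
    """Parse the fourth to seventh rows of a .loc file (given as 4 strings).

    Checks applied (assumptions of this sketch): allele frequencies sum
    to 1, a diallelic locus has three penetrances, and every penetrance
    lies in [0, 1].
    """
    if len(rows) != 4:
        raise ValueError("expected exactly four rows for the trait locus")
    locus_type, n_alleles = (int(field) for field in rows[0].split())
    allele_freqs = [float(field) for field in rows[1].split()]
    n_liability_classes = int(rows[2].split()[0])
    penetrances = [float(field) for field in rows[3].split()]

    if len(allele_freqs) != n_alleles:
        raise ValueError("number of allele frequencies does not match the allele count")
    if not math.isclose(sum(allele_freqs), 1.0, abs_tol=1e-6):
        raise ValueError("allele frequencies should sum to 1")
    if n_alleles == 2 and len(penetrances) != 3:
        raise ValueError("a diallelic locus needs three genotype penetrances")
    if any(not 0.0 <= p <= 1.0 for p in penetrances):
        raise ValueError("penetrances must lie between 0 and 1")

    return {
        "locus_type": locus_type,
        "n_alleles": n_alleles,
        "allele_freqs": allele_freqs,
        "n_liability_classes": n_liability_classes,
        "penetrances": penetrances,
    }


trait = parse_trait_locus([
    "1 2",
    "0.910000 0.090000",
    "1",
    "0.165000 0.575000 0.75000",
])
```

With the values from sample.loc, all checks pass and `trait["penetrances"]` holds the three genotype penetrances in file order.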

 

The following rows specify the allele frequencies of each marker. For example, marker D1S468 is classified as type "3" in LINKAGE program terminology and has 9 alleles. Hence, we enter the following information.

3 9 # D1S468