|
Description: HapForest is a forest-based approach to detecting haplotypes, and interactions among them, that are associated with a disease.
Citation: X. Chen, C.-T. Liu, M. Zhang and H.P. Zhang. A forest-based approach to identifying gene and gene-gene interactions, PNAS, 104: 19199–19203, 2007.
To perform an analysis:
To invoke HapForest from the command line, enter “java -jar toRun.jar response_file hap_file1 hap_file2 ...” in the installation folder. The response_file specifies the response (disease status) of each subject, where 1 stands for affected and 0 for unaffected. A sample response_file can be found here. Each of hap_file1, hap_file2, and so on corresponds to the haplotype configuration of one region, as output by SNPHAP. The subjects in these files should appear in the same order as in the response_file. A sample hap_file is provided here. The number of hap_files depends on the number of haplotype blocks identified in the previous steps. Options:
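The invocation described above can be scripted. The following is a minimal Python sketch, not part of HapForest itself. It assumes the response_file holds one 0/1 value per line (the page does not state the exact layout, so check the sample file first), and the helper names read_response and build_command are hypothetical.

```python
from pathlib import Path


def read_response(path):
    """Read disease status values (1 = affected, 0 = unaffected).

    Assumption: one value per line; blank lines are ignored.
    """
    values = []
    lines = Path(path).read_text().splitlines()
    for line_no, line in enumerate(lines, start=1):
        token = line.strip()
        if not token:
            continue  # skip blank lines
        if token not in ("0", "1"):
            raise ValueError(f"line {line_no}: expected 0 or 1, got {token!r}")
        values.append(int(token))
    return values


def build_command(response_file, hap_files, jar="toRun.jar"):
    """Build the argument list: java -jar toRun.jar response_file hap_file1 ..."""
    if not hap_files:
        # Assumption: the tool needs at least one haplotype block file.
        raise ValueError("at least one hap_file is required")
    return ["java", "-jar", jar, str(response_file), *map(str, hap_files)]
```

The resulting list can be passed to `subprocess.run(cmd, cwd=install_dir, check=True)`, where `install_dir` is the installation folder, so that toRun.jar is found as the page describes.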
Two types of output files are generated by the program. Real_out.txt contains the haplotypes identified by HapForest. The first n rows of the file are devoted to the haplotypes identified from the real data, where n is the number of haplotypes. For each haplotype, the file lists the haplotype block it comes from, its haplotype value, its importance value, and its p-value. The remainder of the file lists the maximum importance value of the haplotypes from each permuted case. A sample Real_out.txt file with 5000 permuted cases can be found here. The configuration files, named hapfile 1_config.txt, hapfile 2_config.txt, and so on, contain the relative order of the haplotypes as they appear in Real_out.txt. For each case, real or permuted, the importance values of the haplotypes listed in hapfile 2_config.txt are appended after those of the haplotypes in hapfile 1_config.txt, and so on. A sample configuration file is provided here. Again, the number of configuration files depends on the number of haplotype blocks. |
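The ordering rule for importance values (block 1 haplotypes first, then block 2, and so on) can be illustrated with a short Python sketch. It works on in-memory lists only, because the exact layout of the configuration files and of Real_out.txt is not described here. The function names, block labels, and haplotype strings below are hypothetical and chosen for illustration.

```python
def flatten_columns(block_orders):
    """Return (block, haplotype) labels in the concatenated order.

    block_orders: list of (block_name, [haplotype, ...]) pairs, given in
    the same order as the hap_files on the command line.
    """
    columns = []
    for block_name, haplotypes in block_orders:
        for hap in haplotypes:
            columns.append((block_name, hap))
    return columns


def label_importances(block_orders, importances):
    """Attach one case's importance values to their (block, haplotype) labels."""
    columns = flatten_columns(block_orders)
    if len(columns) != len(importances):
        raise ValueError(
            f"expected {len(columns)} importance values, got {len(importances)}"
        )
    return dict(zip(columns, importances))
```

For example, with `[("block1", ["ACG", "ATG"]), ("block2", ["TT"])]` and the values `[0.5, 0.1, 0.3]`, the third value is assigned to haplotype `TT` of `block2`.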