forked from DReichLab/EIG
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME
170 lines (136 loc) · 7.88 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
## -- Forked Branch Modifications --
Modification Version: M1.0.0
This is a fork of the original EIGENSOFT package to implement the modifications listed below. The motivation for these modifications is to produce text-based output that can be easily parsed to allow ease-of-access to the data for custom programs.
PROGRAM: convertf
This program converts between the following formats: ANCESTRYMAP, PACKEDANCESTRYMAP, EIGENSTRAT, PED, PACKEDPED. The modifications introduced only affect the ANCESTRYMAP output format.
* Added build flag to prevent ANCESTRYMAP output from automatically switching to PACKEDANCESTRYMAP for large files.
- This will force the output to be the text ANCESTRYMAP format, but may result in HUGE file sizes.
* Added build flag to enable tab-separated ANCESTRYMAP geno file output instead of whitespace-separated data.
- This is to reduce ANCESTRYMAP output file size and to simplify parsing by custom programs.
* Added build flag to enable tab-separated ANCESTRYMAP snp file output instead of whitespace-separated data.
- This is to reduce output file size and to simplify parsing by custom programs.
* Added build flag to enable tab-separated ANCESTRYMAP ind file output instead of whitespace-separated data.
- This is to reduce output file size and to simplify parsing by custom programs.
Build Notes:
* See the build information in the pre-fork project's comments below for specifics.
* Summary:
- Install gcc compiler
- Install the following development librarys:
- build-essential
- libopenblas-dev
- liblapacke-dev
- libgsl-dev
- To compile, switch to the 'src' directory and enter 'make'.
- If there are errors mentioning missing lapacke symbols (see details below), use this command instead: make LDLIBS="-llapacke"
- After a successful make, type 'make install'. The compiled programs will be placed in the '../bin' directory.
## -- ORIGINAL (PRE-FORK) REPOSITORY/CREATOR COMMENTS ARE BELOW THIS LINE -- ##
## shrinkmode added. 3/15
EIGENSOFT version 8.0.0, 03/30/21 (for Linux only)
The EIGENSOFT package implements methods from the following 2 papers:
Patterson et al. 2006 PLoS Genet 2:e190 (population structure)
Price et al. 2006 Nat Genet 38:904-9 (EIGENSTRAT stratification correction)
NEW features of EIGENSOFT version 7.2.0
-- shrinkmode
NEW features of EIGENSOFT version 6.1.4 include:
-- pcaselection was omitted from 6.1.3 by accident
-- Statically linked GSL/openblas
-- Fixed memory allocation bug in pcaselection
-- Some routines moved into nicklib
-- Error message on allocate failure now prints length as "%ld"
supporting long values.
NEW features of EIGENSOFT version 6.1.3 include:
-- Restored script file extensions to .perl instead of .pl
-- Added updated ploteig script that disappeared from the repository
NEW features of EIGENSOFT version 6.1.2 include:
-- Updated license info to be GPL compliant required by linking the GSL
NEW features of EIGENSOFT version 6.1.1 include:
-- Minor bug fix to correctly merge version 6.0.2 and version 6.1 changes.
-- pcaselection operates on evec files. Added examples.
-- Backported twtable.c/h from EIGENSOFT 7alpha
NEW features of EIGENSOFT version 6.1 include:
-- The range finding step of PCA fastmode only scales the multiplied matrix,
as orthogonalization is unnecessary. This appears to improve accuracy.
NEW features of EIGENSOFT version 6.0.2 include:
-- Fixed Makefile and documentation to build eigenstrat properly
-- Moved Tracy-Widom table into a header file for easier building
NEW features of EIGENSOFT version 6.0.1 include:
-- Minor bug fix which prevents smartpca from trying to print out eigenvalues
if fastmode is set.
NEW features of EIGENSOFT version 6.0.0beta included:
-- New option fastmode which implements a very fast pca approximation.
See POPGEN/README and Galinsky 2014 ASHG talk.
-- Changes to external packages required. EIGENSOFT version 5.0.2 required
lapack + blas. On the other hand, EIGENSOFT version 6.0beta requires
GSL + lapack + OpenBLAS (but does not require the native version of blas).
The Makefile has been changed accordingly.
-- EIGENSOFT version 6.0beta supports multi-threading. See POPGEN/README.
-- Bug fix for ldregress option.
See CONVERTF/README for documentation of programs for converting file formats.
See POPGEN/README for documentation of population structure programs.
See EIGENSTRAT/README for documentation of EIGENSTRAT programs.
Questions?
See https://www.hsph.harvard.edu/alkes-price/eigensoft-frequently-asked-questions/
https://github.com/DReichLab/EIG
For questions about building this software:
Matthew Mah <matthew_mah@hms.harvard.edu>
For questions about smartpca:
Nick Patterson <nickp@broadinstitute.org>
For questions about eigenstrat:
Alkes Price <aprice@hsph.harvard.edu>
Executables and source code:
----------------------------
All C executables are in the bin/ directory.
We have placed source code for all C executables in the src/ directory,
for users who wish to modify and recompile our programs.
"cd src"
"make"
"make install"
Note that some of our software will only compile if your system has the
GSL + lapack + OpenBLAS packages installed.
On Mac OSX, you can install gsl and OpenBLAS with lapack using homebrew:
"brew install gsl"
"brew install openblas"
If these packages are not in standard directories, you can specify the
appropriate include and library directories with the CFLAGS and LDFLAGS
make variables.
For example, on the Harvard Medical School O2 cluster, the command is:
'make CFLAGS="-I/n/app/openblas/0.2.19/include -I/n/app/gsl/2.3/include" LDFLAGS="-L/n/app/openblas/0.2.19/lib -L/n/app/gsl/2.3/lib/"'
On Mac OSX:
'make CFLAGS="-I/usr/local/opt/openblas/include -I/usr/local/opt/gsl/include" LDFLAGS="-L/usr/local/opt/openblas/lib -L/usr/local/opt/gsl/lib"'
If you have issues with missing lapacke symbols, for example "undefined reference to `LAPACKE_dlange'", run make with the corresponding additional libraries linked:
'make LDLIBS="-llapacke"'
This has been encountered on Linux Mint 18.
If you have trouble compiling and running our code, try compiling and
running the pcatoy program in the src directory:
"cd src"
"make pcatoy"
"./pcatoy"
If you are unable to run the pcatoy program successfully, please contact
your system administrator for help, as this is a systems issue which is
beyond our scope. Your system administrator will be able to troubleshoot
your systems issue using this trivial program. [You can also try running
the pcatoy program in the bin directory, which we have already compiled.]
To remake the entire package:
"cd src"
"make clobber"
"make install"
To remake EIG7.2 it is necessary to link to the OpenBLAS library. On orchestra,
the path is /opt/openblas and should work automatically. On Broad institute machines,
the user should execute "use .openblas-0.2.8" and "use GCC-4.9" at the command
prompt before attempting to remake. All other users should install OpenBLAS and
set the variable OPENBLAS to the path at the make command line,
e.g. "make install OPENBLAS=/usr/local/openblas"
----------------------------
Acknowledgements:
EIGENSOFT was written by Nick Patterson, Alkes Price, Samuela Pollack,
Kevin Galinsky, Chris Chang, and Sasha Gusev.
We thank John Novembre and Mike Boursnell for code improvements, Matt Hanna
for the first implementation of multi-threading, and Angela Yu for a bugfix.
----------------------------
SOFTWARE COPYRIGHT NOTICE AGREEMENT
This software and its documentation are copyright (2010) by Harvard University
and The Broad Institute. All rights are reserved. This software is supplied
without any warranty or guaranteed support whatsoever. Neither Harvard
University nor The Broad Institute can be responsible for its use, misuse, or
functionality. The software may be freely copied for non-commercial purposes,
provided this copyright notice is retained.