- Download the most recent "specieswiki-latest-pages-articles.xml" from
http://dumps.wikimedia.org/specieswiki/latest/
- Download the HTML for image-containing pages into HTML/
(Existing files are not overwritten, so if an HTML file is bad, delete it before rerunning)
./getImageHTML.pl < specieswiki-latest-pages-articles.xml
- Download the images on those pages into Images/
./getImages.pl
rm -r HTML # 1G of .html no longer needed
- Convert PNG, GIF, and JPEG files to .jpg with the convert[...].pl scripts
cd Images.NEW
mkdir PNG; mv *.png PNG; mkdir GIF; mv *.gif GIF; mkdir JPEG; mv *.jpeg JPEG
cp ../convert*.pl .
mv convertPNG.pl PNG; mv convertGIF.pl GIF; mv convertJPEG.pl JPEG
cd PNG; ./convertPNG.pl; mv -i *.jpg ..; cd .. # do the mv by hand; too many files for one mv
cd GIF; ./convertGIF.pl; mv -i *.jpg ..; cd ..
cd JPEG; ./convertJPEG.pl; mv -i *.jpg ..; cd ..
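If "too much for mv" means the *.jpg glob exceeds the shell's argument limit, find can move the files one at a time instead. A minimal sketch, assuming GNU or BSD find (for -maxdepth); move_jpgs is a hypothetical helper, not one of this repo's scripts:

```shell
# Hypothetical helper: move every .jpg in directory $1 into directory $2.
# find runs one mv per file, so the shell never has to expand a huge glob.
move_jpgs() {
    find "$1" -maxdepth 1 -type f -name '*.jpg' -exec mv -i {} "$2" \;
}

# Example, run from inside Images.NEW after convertPNG.pl has finished:
#   move_jpgs PNG .
```

mv -i is kept so that a name clash still prompts, as in the commands above.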
- Erase broken images (check those under 100 bytes)
(Currently none appear to be broken...)
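One way to list the candidates for that check is find's -size test. A minimal sketch, assuming GNU or BSD find; list_tiny_images is a hypothetical helper, and the directory name in the example is only a guess at where the images sit at this step:

```shell
# Hypothetical helper: print the .jpg files directly inside directory $1
# that are smaller than 100 bytes ("-size -100c" = fewer than 100 bytes).
list_tiny_images() {
    find "$1" -maxdepth 1 -type f -name '*.jpg' -size -100c
}

# Example:
#   list_tiny_images Images.NEW
```

This only lists files; inspect them before deleting anything.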
- Put images somewhere special for safekeeping.
mv Images.NEW Images
- Resize images into new directory.
Run from the base directory. The script looks for Images/*.jpg and creates
a new directory, Images.SHRUNK.
Side benefits: (a) seriously corrupt images are not copied, and
(b) partially downloaded images trigger a warning.
At the moment, shrinkImages.pl also GROWS small images.
./shrinkImages.pl
mv Images Images.ORIG; mv Images.SHRUNK Images
- Fix regex-breaking titles in specieswiki-latest-pages-articles.xml:
<title>* Mystacidium curvatum</title>
(Change * to x)
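The same change can be made with sed instead of by hand. A minimal sketch, assuming the offending titles all start with "* " right after the <title> tag; fix_star_titles and the output filename in the example are hypothetical:

```shell
# Hypothetical helper: rewrite a leading "* " inside <title> to "x ".
# Reads the XML dump on stdin and writes the patched copy to stdout.
fix_star_titles() {
    sed 's|<title>\* |<title>x |'
}

# Example:
#   fix_star_titles < specieswiki-latest-pages-articles.xml > patched.xml
```

Titles without the leading "* " pass through unchanged.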
- Convert wiki into links and common names
./buildTree.pl < specieswiki-latest-pages-articles.xml 2> buildTree.LOG > TREE.txt
- Fix weird things (maybe not technically errors):
Remove Dictyozoa -> Bikonta
Revive †Synapsida
Theria -> Placentalia (same as Eutheria)
- Add top of tree
Up -> {Biota}
Biota -> {Acytota} {Cytota}
Cytota -> {Bacteria} {Neomura}
Remove: Superregio -> {Regio} {Eukaryota} {Archaea} {Bacteria}
Remove PAGENAME, BASEPAGENAME, etc. from the resulting TREE.txt
(or better yet, debug buildTree.pl). In vi:
:1,$s/ {*PAGENAME}*//g
:1,$s/ {*BASEPAGENAME}*//g
:1,$s/ {TOC[^}]*}//g
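The three substitutions above can also be run non-interactively. A minimal sketch using sed with the same patterns (in basic regular expressions "{" and "}" are literal, as they are in vi); clean_tree and the output filename in the example are hypothetical:

```shell
# Hypothetical helper: strip leftover PAGENAME, BASEPAGENAME and TOC entries.
# Reads TREE.txt on stdin and writes the cleaned copy to stdout.
clean_tree() {
    sed -e 's/ {*PAGENAME}*//g' \
        -e 's/ {*BASEPAGENAME}*//g' \
        -e 's/ {TOC[^}]*}//g'
}

# Example:
#   clean_tree < TREE.txt > TREE.clean.txt
```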
- Deal with any other problems (see Errors.txt)
- Filter out the results
grep "{" TREE.txt > LINKS.txt
grep "=" TREE.txt > NAMES.txt
- Cut off long links and write out DEAD.txt for extinct (†) taxa
(Note: this gives a bunch of worrisome warnings, but appears to get
the job done...)
./pruneLinks.pl < LINKS.txt 2> pruneLinks.LOG > PRUNED.txt
- Double-check Eukaryota -> Bikonta
------------------------------------------------------------------------
- Create raw graph files: (~1 hr)
# THIS CAN SEGFAULT! WHY?
./calcBests.Batch.pl Biota 2> calcBests.Batch.LOG < PRUNED.txt;
mkdir SVG
- Create a single SVG, for testing:
./run.pl Species Parent
- Create SVGs: (~2 hrs)
./run.batch.pl Biota Up < PRUNED.txt
- Update directory links
./webifyAll.sh
- Enable zooming and panning (best in Chrome and Safari, okay in Firefox, probably not IE)
./spiffSVGAll.sh
- Edit Biota.svg to point up to Wikispecies instead of Up.svg
(http://species.wikimedia.org/wiki/Main_Page)
- Create common name index
./makeNameList.pl < NAMES.txt > NameList.html
------------------------------------------------------------------------
- Rerun individual graphs (includes the webify and spiffSVG steps)
./rerun Species
- Rerun graphs that break Graphviz layout by tweaking these numbers in
makeGraph.pl:
$sep = max(2,int(15-200/$length)); # try adding 1 (or 2 if necessary)
$esep = max(3.9,int(10-100/$length)); # try subtracting 1
DON'T forget to reset these parameters before the next normal run
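To see what the two default expressions evaluate to before tweaking them, they can be reproduced outside Perl. A minimal sketch using awk, whose int() truncates toward zero as Perl's does; graph_seps is a hypothetical helper, and the lengths in the example are made-up inputs, since how makeGraph.pl computes $length is not shown here:

```shell
# Hypothetical helper: print "sep esep" for a given length, mirroring
#   $sep  = max(2,   int(15 - 200/$length));
#   $esep = max(3.9, int(10 - 100/$length));
graph_seps() {
    awk -v len="$1" 'BEGIN {
        sep  = int(15 - 200 / len); if (sep  < 2)   sep  = 2
        esep = int(10 - 100 / len); if (esep < 3.9) esep = 3.9
        print sep, esep
    }'
}

# Example:
#   graph_seps 10    # prints "2 3.9"
#   graph_seps 40    # prints "10 7"
```

Small lengths hit the floors (2 and 3.9), so adding or subtracting 1 has the largest relative effect there.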
- Rerun graphs with bad color