GlusterFS-Hadoop Packaging and Deployment

== Overview ==
This top-level directory contains files and scripts common to both Red Hat
Storage (RHS) and non-RHS targets for preparing gluster volumes for Hadoop
workloads. If the target is Fedora or another non-RHS platform, the files
under the "glusterfs/" directory are used. If the target platform is RHS then
there is more work to do, and the first step is to clone the
rhs-hadoop-install repo and read its README files.
The installation script is simple to execute:
- ./install.sh --help  # to learn about the various options,
- ./install.sh --mgmt-node <node> --yarn-master <node> [brick-device], eg:
    ./install.sh --mgmt-node mgmt --yarn-master yarn /dev/TestVG/LV101
- ./install.sh --mgmt-node=<node> --yarn-master=<node> --lvm \
    --vgname <vg> --lvname <lv> <raw-block-dev>, eg:
    ./install.sh --mgmt-node=foo --yarn-master=bar --lvm \
      --vgname MyVG --lvname MyLV /dev/sdb
  (a note on identifying the raw block device follows this list)
- examine the log file in /var/log/glusterfs-hadoop-install.log, or in
  /var/log/rhs-hadoop-install.log for RHS.
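Identifying a suitable raw block device for the --lvm form is outside the
scope of install.sh; as an illustration using standard Linux tools (/dev/sdb
is just an example device name):

  lsblk               # list block devices and any existing partitions/LVs
  blkid /dev/sdb      # prints nothing if the device carries no filesystem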
Each sub-directory may contain a script named "pre_install.sh" and/or a script
named "post_install.sh". These are the only scripts within a sub-directory
that are automatically executed by the install.sh script. As expected,
"pre_install.sh" is invoked as the first step of the prep_node.sh script, and
"post_install.sh" is invoked as the last step of prep_node.sh. Note: the
prep_node.sh script is automatically invoked by install.sh, once per node.
Sub-directory *_install.sh scripts may execute additional programs and/or
scripts, but install.sh only executes one "pre_install" and one
"post_install" script per sub-directory. Note: sub-directory *_install.sh
scripts are optional; if none are present, no sub-directory scripts are
executed, even if other executable scripts exist in the sub-directory. If
there are multiple sub-directories in the package, each with
pre_/post_install.sh scripts, the execution order is determined by the
alphabetic order of the sub-directory names. A hypothetical layout and
skeleton hook scripts are sketched below.
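For illustration only, the sketch below shows a hypothetical sub-directory
named "extras/" with minimal hook scripts. The directory name, the package
installed, and the log path are assumptions; only the pre_install.sh /
post_install.sh file names and the once-per-node invocation come from the
behavior described above.

  extras/
    pre_install.sh       # invoked first by prep_node.sh, once per node
    post_install.sh      # invoked last by prep_node.sh, once per node

  # extras/pre_install.sh -- minimal sketch of a pre-install hook
  #!/bin/bash
  # Example only: install a package assumed to be needed on every node.
  yum install -y ntp || exit 1

  # extras/post_install.sh -- minimal sketch of a post-install hook
  #!/bin/bash
  # Example only: record that this node finished its per-node preparation.
  echo "extras post_install done on $(hostname)" >> /tmp/extras_install.log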
== Installation ==
The tarball is downloaded to one of the cluster nodes or to the user's
localhost. The download directory is arbitrary. The common install.sh requires
password-less ssh from the node hosting the install tarball (the
"install-from" node) to all nodes in the cluster, which can be set up as
shown below.
The tarball should contain the following (the contents can be inspected as
shown after this list):
- functions: functions common to multiple scripts.
- either glusterfs/: directory for Fedora-specific files/scripts, --or--
  rhs/ (plus, optionally, other RHS-specific sub-dirs).
- hosts.example: sample "hosts" config file.
- install.sh: the common install script, executed by the root user.
- prep_node.sh: companion script to install.sh, executed once per node.
- README.txt: this file.
- setup_container_executor.sh: script to configure a hadoop linux container.
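Before extracting, the tarball contents can be listed and compared against
the list above (the file name is a placeholder):

  $ tar tzf <tarballName-version.tar.gz>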
== Before you begin ==
The "hosts" file must be created by the root user doing the install. It is not
part of the tarball, but an example hosts file is provided. The "hosts" file
is expected to be created in the same directory where the tarball has been
downloaded. If a different location is required the "--hosts" option can be
used to specify the "hosts" file path. The "hosts" file is defined in the
included "hosts.example" file, which should be read carefully.
IMPORTANT: note the following about the hosts file:
1) The node order is critical. Assuming the storage volume is created with
   replica 2, each pair of lines in hosts represents a replica pair. For
   example, the first 2 lines in hosts are a replica pair, as are the next
   two lines, etc.
2) Hostnames are expected to be lower-case.
Note:
- password-less SSH is required between the installation node and each storage
  node.
- the order of the nodes in the "hosts" file is the replica order (see the
  sketch below).
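Purely as an illustration of replica ordering, a 4-node hosts file might look
like the sketch below; the authoritative line format is defined in
hosts.example, and the addresses and hostnames here are placeholders:

  # hypothetical hosts file: 4 storage nodes, replica-2 volume
  # (lines 1-2 form replica pair 1, lines 3-4 form replica pair 2)
  192.168.1.101 node1
  192.168.1.102 node2
  192.168.1.103 node3
  192.168.1.104 node4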
== Installation instructions ==
0) upload the tarball to the deployment directory on the "install-from" node.
1) extract tarball to the local directory:
$ tar xvzf <tarballName-version.tar.gz>
2) cd to the extracted install directory:
$ cd <tarballName-version>
3) execute the common "install.sh" from the install directory:
$ ./install.sh [options (see --help)] <brick-dev> (see hosts.example for
more on brick-dev)
For example: ./install.sh --mgmt-node=one --yarn-master=two /dev/sdb
Output is displayed on STDOUT and is also written to a logfile. The default
logfile is /var/log/<glusterfs|rhs>-hadoop-install.log. The --logfile
option allows a different logfile to be specified. Even when a less verbose
setting is used, the logfile contains all messages.
Note: each storage node also has a logfile named
/tmp/<glusterfs|rhs>-hadoop-install/prep_node.log. This logfile is appended
to the main logfile but may be useful if a node crashes or the script
hangs. An example of checking both logs follows.
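As an illustration (assuming the non-RHS "glusterfs" naming from above; node1
is a placeholder hostname), the logs can be checked from the install-from
node like this:

  $ grep -i error /var/log/glusterfs-hadoop-install.log
  $ ssh root@node1 cat /tmp/glusterfs-hadoop-install/prep_node.log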
4) When the script completes, the remaining Hadoop distro and management steps
   need to be followed. After the Hadoop distro installation completes, run
   the provided setup_container_executor.sh script to configure hadoop linux
   containers:
$ ./setup_container_executor.sh # no arguments
5) Validate the installation:
   Open a terminal and navigate to the Hadoop directory:
     $ cd /usr/lib/hadoop
   Change to the mapred user:
     $ su mapred
   Submit a TeraGen Hadoop test job:
     $ bin/hadoop jar hadoop-examples-1.2.0.1.3.2.0-112.jar teragen 1000 in-dir
   Submit a TeraSort Hadoop test job:
     $ bin/hadoop jar hadoop-examples-1.2.0.1.3.2.0-112.jar terasort in-dir out-dir
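If both jobs complete, the sorted output can be spot-checked; this is a
suggested extra step, not part of the original procedure (out-dir is the
directory written above, report-dir is created by the check):
     $ bin/hadoop fs -ls out-dir     # lists the TeraSort output files
     $ bin/hadoop jar hadoop-examples-1.2.0.1.3.2.0-112.jar teravalidate in-dir out-dir report-dir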