GitHub - datma-health/vagrant: Spark-Hadoop VM cluster

Vagrant files for a Spark-Hadoop VM Cluster to test software from Omics Data Automation, Inc.

Not suited for production environments.

License

See the LICENSE file for license rights and limitations (MIT).

Prerequisites

Vagrant and VirtualBox installations for your host.

Before using a Vagrant machine with the shared folder functionality, you should install the vagrant-vbguest plugin:

#!bash
vagrant plugin install vagrant-vbguest
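
If the setup is scripted, the plugin install can be made idempotent. The following is a minimal sketch, not part of the repository: the helper name ensure_vagrant_plugin is hypothetical, and it assumes that `vagrant plugin list` prints one installed plugin per line with the plugin name first.

```shell
#!/bin/sh
# Sketch: install a Vagrant plugin only if it is not already listed.
# ensure_vagrant_plugin is a hypothetical helper, not shipped with this repo.
ensure_vagrant_plugin() {
    plugin=$1
    # Assumption: each installed plugin appears as "<name> (<version>, ...)".
    if vagrant plugin list | grep -q "^$plugin "; then
        echo "$plugin already installed"
    else
        vagrant plugin install "$plugin"
    fi
}

# Run only on hosts where Vagrant is actually available.
if command -v vagrant >/dev/null 2>&1; then
    ensure_vagrant_plugin vagrant-vbguest
fi
```

Running the script a second time should only report that the plugin is already installed.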

Supported Platforms

CentOS 7 is the only supported target OS.

Open Vagrantfile and adjust the variables ip, memory, cpus, num_slaves, slave_memory and slave_cpus as needed. With the defaults, a master-only Spark-Hadoop cluster is started. Also take note of the default forwarded port numbers, especially if other servers or VMs on the host already use those ports.

# The master node will get the following ip as its address.
# The slave instances will have ip+i as their ip addresses.
$ip = "192.168.33.10"
$memory = 4096
$cpus = 2

# The number of slave instances has to be less than 10.
$num_slaves = 0
$slave_memory = 2048
$slave_cpus = 2
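
The ip+i rule in the comments above can be made concrete with a small shell sketch. It assumes that i counts from 1 to num_slaves and is added to the last octet of the master address; that reading follows the comment but is not confirmed by the Vagrantfile excerpt. The helper name slave_ips is hypothetical.

```shell
#!/bin/sh
# Sketch: list the slave addresses implied by the "ip+i" rule.
# Assumption: i runs from 1 to num_slaves and is added to the last octet.
# slave_ips is a hypothetical helper, not part of the repository.
slave_ips() {
    base_ip=$1
    num_slaves=$2
    prefix=${base_ip%.*}       # everything before the last dot
    last_octet=${base_ip##*.}  # everything after the last dot
    i=1
    while [ "$i" -le "$num_slaves" ]; do
        echo "$prefix.$(( $last_octet + $i ))"
        i=$(( $i + 1 ))
    done
}

# Master at 192.168.33.10 with three slaves.
slave_ips 192.168.33.10 3
```

Under these assumptions the example prints 192.168.33.11, 192.168.33.12 and 192.168.33.13, one per line. With the default num_slaves of 0 it prints nothing.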

Invoke vagrant_VM_configure.sh to install the Spark-Hadoop cluster.

#!bash
./vagrant_VM_configure.sh

To test HDFS and Spark, bring up a Vagrant shell (run vagrant ssh from the folder containing Vagrantfile on the host machine):

~/vagrant: vagrant ssh
[vagrant@oda-master ~]$ which start-dfs.sh
/usr/local/hadoop/sbin/start-dfs.sh
[vagrant@oda-master ~]$ start-dfs.sh
Starting namenodes on [oda-master]
The authenticity of host 'oda-master (127.0.0.1)' can't be established.
...
[vagrant@oda-master ~]$ hdfs dfs -mkdir /tmp
18/06/12 14:48:48 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.8.1-hadoop2
[vagrant@oda-master ~]$ hdfs dfs -ls /
18/06/12 14:48:54 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.8.1-hadoop2
Found 1 items
drwxr-xr-x   - vagrant supergroup          0 2018-06-12 14:48 /tmp
[vagrant@oda-master ~]$
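
The interactive check above can also be run as a script. This is a minimal sketch, assuming HDFS has already been started with start-dfs.sh and that hdfs is on the PATH inside the VM; the helper name hdfs_smoke_test is hypothetical.

```shell
#!/bin/sh
# Sketch: non-interactive version of the HDFS check shown above.
# Assumes HDFS is already running and `hdfs` is on PATH.
# hdfs_smoke_test is a hypothetical helper, not part of the repository.
hdfs_smoke_test() {
    dir=${1:-/tmp}
    # -p lets mkdir succeed even if the directory already exists.
    hdfs dfs -mkdir -p "$dir" || return 1
    # -test -d exits 0 only if the path exists and is a directory.
    if hdfs dfs -test -d "$dir"; then
        echo "HDFS OK: $dir exists"
    else
        echo "HDFS check failed: $dir missing" >&2
        return 1
    fi
}

# Run only where the hdfs command is available (inside the VM).
if command -v hdfs >/dev/null 2>&1; then
    hdfs_smoke_test /tmp
fi
```

A non-zero exit status from hdfs_smoke_test signals that the directory could not be created or found.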

The Vagrant target VM and the host machine share two folders:

* The host folder containing Vagrantfile is mapped to /vagrant in the target VM.
* The parent of that folder is mapped to /source in the target VM.

~/vagrant: vagrant ssh
[vagrant@oda-master ~]$ ls /vagrant
build_gatk4.sh             disable_selinux.sh             install_opencv_prereqs.sh  provision.sh                  README.md       Vagrantfile~
build_genomicsdb_distr.sh  hadoop-config                  LICENSE.md                 provision.sh~                 reset_eth1.sh   vagrant_VM_configure.sh
build_genomicsdb.sh        install_gatk4_prereqs.sh       local                      provision_spark_hadoop.sh     spark_setup.sh  vagrant_VM_configure.sh~
build_opencv.sh            install_genomicsdb_prereqs.sh  master_id_rsa.pub          pseudo-cluster-hadoop-config  Vagrantfile
[vagrant@oda-master ~]$ ls /source/vagrant
build_gatk4.sh             disable_selinux.sh             install_opencv_prereqs.sh  provision.sh                  README.md       Vagrantfile~
build_genomicsdb_distr.sh  hadoop-config                  LICENSE.md                 provision.sh~                 reset_eth1.sh   vagrant_VM_configure.sh
build_genomicsdb.sh        install_gatk4_prereqs.sh       local                      provision_spark_hadoop.sh     spark_setup.sh  vagrant_VM_configure.sh~
build_opencv.sh            install_genomicsdb_prereqs.sh  master_id_rsa.pub          pseudo-cluster-hadoop-config  Vagrantfile
[vagrant@oda-master ~]$

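Install and build scripts are currently available for GATK4, GenomicsDB and OpenCV (build_gatk4.sh, build_genomicsdb.sh and build_opencv.sh, each with a matching install_*_prereqs.sh script).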

Review the settings at the top of each script and adjust versions, build and install parameters before invoking it. All scripts can be invoked from a Vagrant shell.

For example, invoke build_opencv.sh to install the prerequisites, then build and install OpenCV into your Vagrant VM instance. Afterwards, open a new Vagrant shell so that your applications can locate the OpenCV libraries and binaries.

~/vagrant: vagrant ssh
[vagrant@oda-master ~]$ cd /vagrant
[vagrant@oda-master vagrant]$ ./build_opencv.sh
...
- Installing: /usr/local/bin/opencv_version
-- Set runtime path of "/usr/local/bin/opencv_version" to "/usr/local/lib64"
Setting up OpenCV environment ...
ENV_FILE=/etc/profile.d/opencv.sh
Installing OpenCV DONE
[vagrant@oda-master vagrant]$
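
A new shell is needed because the build writes an environment file (ENV_FILE=/etc/profile.d/opencv.sh in the log above), and login shells conventionally read the files in /etc/profile.d at startup. As an alternative, the file can be sourced into the current shell. This is a minimal sketch, assuming the file is plain sh; its contents are not shown here, and the helper name load_opencv_env is hypothetical.

```shell
#!/bin/sh
# Sketch: load the OpenCV environment into the current shell.
# Assumption: the env file is plain sh; its contents are not shown in this README.
# load_opencv_env is a hypothetical helper, not part of the repository.
load_opencv_env() {
    env_file=${1:-/etc/profile.d/opencv.sh}
    if [ -r "$env_file" ]; then
        . "$env_file"
        echo "loaded $env_file"
    else
        echo "missing $env_file; run build_opencv.sh first" >&2
        return 1
    fi
}

# Inside the VM: load the environment, then check that the binary resolves.
if load_opencv_env 2>/dev/null; then
    command -v opencv_version || echo "opencv_version not on PATH"
fi
```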
