Author: | Marco Massenzio (marco@alertavert.com) |
---|---|
Revision: | 0.2 |
Created: | 2015-12-15 |
Updated: | 2015-12-20 |
We wish to enable operators to remotely execute arbitrary commands on Apache Mesos Agent/Master nodes, out-of-band from the normal task execution framework.
For more information, please consult the references listed in Prerequisites.
This is a Mesos anonymous module which can be loaded using the enclosed modules.json descriptor; this module will add Endpoints such that:
- we can monitor the status (active/inactive) of this module;
- we can remotely execute arbitrary (shell) commands, with optional arguments;
- we can retrieve the outcome of the command;
- we can terminate previously launched processes.
This is not meant to offer full remote shell functionality, however.
The code in this repository is not released under an Open Source license.
This code is (c) 2015 AlertAvert.com. All rights reserved.
This may change in due course, but currently the only allowed use is for training and learning purposes: the code is meant to be used by developers of Mesos Modules to learn how to create their own module.
We explicitly disallow usage of this code, or any derivation thereof, in any commercial software deployed in Production for use by external users (regardless of the intended use).
If you wish to use this code in Production and/or modify it, please contact the author directly at the following address:
marco (at) alertavert (dot) com
You can retrieve the status of this module:
GET /remote/status
Returns a 200 OK response if this module is active:
200 OK { "release": "0.2.0-d94e907", "result": "ok", "sandbox_dir": "/mnt/mesos/sandbox", "status": "active", "work_dir": "/tmp/agent" }
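As a rough sketch of how a client might consume this response, the Python helper below reports whether the module is active. The name is_module_active is hypothetical and not part of the module; the only assumption is that the body carries the "status" field shown in the sample above.

```python
import json


def is_module_active(body: str) -> bool:
    """Return True if a /remote/status response body reports an active module."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Anything that is not JSON cannot be a valid status response.
        return False
    if not isinstance(payload, dict):
        return False
    return payload.get("status") == "active"


# Body copied from the sample response above.
sample = (
    '{"release": "0.2.0-d94e907", "result": "ok", '
    '"sandbox_dir": "/mnt/mesos/sandbox", "status": "active", '
    '"work_dir": "/tmp/agent"}'
)
print(is_module_active(sample))
```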
If it's active, you can execute commands remotely on the Agent:
POST /remote/execute
Request format: mesos::CommandInfo
{ "command": "ls", "shell": false, "arguments": ["-la", "/tmp"] }
- command: The binary command to execute; it must be in the Agent's execution $PATH and the user running the Agent must have the required permissions to execute it.
- shell: If true, the command will be executed inside a shell process (in other words, we will execute something similar to sh -c command). NOTE: By default, shell in CommandInfo is set to true; this has implications for this module, as the arguments are ignored if shell is not specified (or is set to true) and only the value of the command is passed in. Always make sure to specify "shell": false if you want the arguments to be passed to the command.
- arguments: An array of strings that will be passed verbatim (i.e., without any escaping or variable substitution) to command. [1]
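The difference between the two modes can be reproduced locally with Python's subprocess module. This is only an analogy for the behaviour described above, not the module's own code, and it assumes a POSIX system with an echo binary on the PATH.

```python
import subprocess

# Argument vector, no shell: "$PATH" reaches echo as a literal string,
# which mirrors "shell": false combined with an "arguments" array.
verbatim = subprocess.run(
    ["echo", "$PATH"], capture_output=True, text=True, check=True
)

# Single command line handed to a shell (similar to sh -c): the shell
# expands $PATH before echo ever sees it.
expanded = subprocess.run(
    "echo $PATH", shell=True, capture_output=True, text=True, check=True
)

print(repr(verbatim.stdout))   # the literal string, newline included
print(repr(expanded.stdout))   # whatever PATH is set to on this machine
```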
There are several other fields in the CommandInfo protobuf (see the mesos.proto source) but not all of them are actually used in this module: we currently ignore the environment (but see Timeout below), user and uris fields.
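Putting the fields together, a request body could be assembled as in the sketch below. build_execute_request is a hypothetical helper, not part of the module; it uses the field names from the request format shown earlier and always sets "shell" explicitly, so that the arguments are not silently dropped.

```python
import json
from typing import Optional, Sequence


def build_execute_request(command: str,
                          arguments: Optional[Sequence[str]] = None) -> str:
    """Serialize a request body for POST /remote/execute (hypothetical helper)."""
    body = {
        "command": command,
        # CommandInfo defaults "shell" to true, which makes the module ignore
        # "arguments", so the flag is always stated explicitly here.
        "shell": False,
    }
    if arguments:
        body["arguments"] = [str(arg) for arg in arguments]
    return json.dumps(body)


print(build_execute_request("ls", ["-la", "/tmp"]))
```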
The request executes a command on the Agent asynchronously; the response will contain the process's PID, which can be used afterwards to retrieve the outcome of the command (if any):
200 OK { "result": "OK", "pid": 6880 }
In order to specify a timeout in seconds for the command to execute, we need to use one of the environment variables passed in via the environment field in CommandInfo:
{ "value": "sleep 5", "environment": { "variables": [ { "name": "EXECUTE_TIMEOUT_SEC", "value": "3" } ] } }
The EXECUTE_TIMEOUT_SEC variable expresses the timeout, in seconds, to wait for the command to complete: if this value is exceeded, the implementation will try to kill the process (sending a SIGTERM signal) and the Future will be completed.
Note that the response for both requests (see below for how to get the outcome of the command) will be a 200 OK, but the exitCode will be 9 (SIGKILL) and the signaled field will be set to true:
200 OK { "exitCode": 9, "signaled": true, "stderr": "", "stdout": "" }
The Agent logs also confirm that the command timed out:
I0102 01:24:10.856061 11020 execute_module.cpp:142] Running 'sleep' with args [ 5 ]; as PID [11030]
E0102 01:24:13.858758 11026 execute_module.cpp:174] Command sleep timed out after 3 seconds. Aborting process 11030
I0102 01:24:13.905586 11019 execute_module.cpp:168] Result of 'sleep' was an error
I0102 01:24:22.391561 11017 execute_module.cpp:236] Retrieving outcome for PID '11030'
Note: The value for the timeout is of string type, but it must contain a valid integer.
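A minimal sketch of attaching the timeout to a request is shown below. with_timeout is a hypothetical helper, not part of the module; it validates the integer on the client side and serializes it as a string, as required above. Rejecting zero or negative values is an assumption of this sketch, not a documented rule of the module.

```python
import copy


def with_timeout(request: dict, timeout_sec: int) -> dict:
    """Return a copy of `request` carrying EXECUTE_TIMEOUT_SEC (hypothetical helper)."""
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(timeout_sec, bool) or not isinstance(timeout_sec, int):
        raise TypeError("timeout_sec must be an integer")
    if timeout_sec <= 0:
        # Assumption of this sketch: a zero or negative timeout is a mistake.
        raise ValueError("timeout_sec must be positive")

    updated = copy.deepcopy(request)
    variables = updated.setdefault("environment", {}).setdefault("variables", [])
    # Drop any earlier timeout entry so the variable appears only once.
    variables[:] = [v for v in variables if v.get("name") != "EXECUTE_TIMEOUT_SEC"]
    # The value travels as a string, even though it encodes an integer.
    variables.append({"name": "EXECUTE_TIMEOUT_SEC", "value": str(timeout_sec)})
    return updated


original = {"value": "sleep 5"}
request = with_timeout(original, 3)
```

The resulting dict has the same shape as the example request above, and the input dict is left untouched.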
To retrieve the outcome of the command [2]:
POST /remote/task { "pid": 6880 }
will return a RemoteCommandResult response encoded in JSON:
200 OK { "exitCode": 0, "signaled": false, "stderr": "", "stdout": "total 1972\ndrwxr-xr-x 4 marco marco 4096 Dec 20 14:28 agent ...\ndrwxrwxrwt 2 root root 4096 Dec 17 16:06 .X11-unix\n" }
If the command errors out, it will result in an exitCode different from EXIT_SUCCESS (0); if it times out, it will be in the signaled state, with the exitCode set to the value of the signal (most likely SIGKILL, or 9, as it was killed by the cleanup() method) [3]:
POST /remote/task { "pid": 1373 }
may return:
200 OK { "exitCode": 2, "signaled": false, "stderr": "ls: cannot access /foo/bar: No such file or directory\n", "stdout": "" }
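Since the HTTP status is 200 OK in every case, a client has to inspect the body to tell the outcomes apart. The sketch below shows one way to do so; describe_outcome is a hypothetical helper that relies only on the exitCode and signaled fields shown in the sample responses.

```python
import signal


def describe_outcome(result: dict) -> str:
    """Summarize a RemoteCommandResult-style dict (hypothetical helper)."""
    exit_code = result.get("exitCode")
    if result.get("signaled"):
        # When signaled, exitCode carries the signal number, not an exit status.
        try:
            name = signal.Signals(exit_code).name
        except (ValueError, TypeError):
            name = "unknown signal"
        return f"terminated by signal {exit_code} ({name})"
    if exit_code == 0:
        return "succeeded"
    return f"failed with exit code {exit_code}"


timed_out = {"exitCode": 9, "signaled": True, "stderr": "", "stdout": ""}
print(describe_outcome(timed_out))
```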
Finally, to get the list of currently running and executed processes:
GET /remote/task
will return a list of valid pids to query for:
200 OK { "pids": [12141, 12454, ... 12144] }
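The two /remote/task calls can be combined to collect the outcome of every known process. The following is a sketch only, using Python's standard urllib: the helper names are hypothetical, the base URL is supplied by the caller, and sending a Content-Type: application/json header is an assumption rather than a documented requirement of the module.

```python
import json
import urllib.request
from typing import Callable, Optional


def build_request(base_url: str, path: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """GET when there is no payload, POST with a JSON body otherwise."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    # Assumption: the endpoint accepts a JSON content type on POST.
    headers = {} if data is None else {"Content-Type": "application/json"}
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        headers=headers,
        method="GET" if data is None else "POST",
    )


def call_endpoint(base_url: str, path: str,
                  payload: Optional[dict] = None) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_request(base_url, path, payload)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_all_outcomes(base_url: str,
                       call: Callable[..., dict] = call_endpoint) -> dict:
    """Map every PID listed by GET /remote/task to its outcome."""
    pids = call(base_url, "/remote/task")["pids"]
    return {pid: call(base_url, "/remote/task", {"pid": pid}) for pid in pids}
```

A call such as fetch_all_outcomes("http://agent.example.com:5051") would then return a dict keyed by PID; the host and port here are placeholders. The call parameter exists so that the transport can be replaced, for example in tests.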
You obviously need Apache Mesos to build this project: in particular, you will need both the includes (mesos, stout and libprocess) and the shared libmesos.so library.
In addition, Mesos needs access to picojson.h and a subset of the boost header files: see the 3rdparty folder in the mirrored github repository for Mesos, and in particular the boost-1.53.0.tar.gz archive.
The "easiest" way to obtain all the prerequisites would probably be to clone the Mesos repository, build Mesos and then install it in a local folder that you will then need to configure using the LOCAL_INSTALL_DIR property (see CMake below).
Finally, you need the libsvn library (this is required by Mesos): on OSX this can be obtained using brew:
brew install svn
Apache Mesos makes extensive use of Protocol Buffers and this project uses them too (see the proto/execute.proto file).
In order to build this module, you will need to download, build and install Google's protobuf version 2.5.0 (this is the most recent version used by Mesos - using a more recent one will cause compile and runtime errors) - see the link above for more details.
We assume that the protoc binary will be installed in the same LOCAL_INSTALL_DIR location; assuming that this is set to be the $LOCAL_INSTALL env variable:
cd protobuf-2.5.0/
./configure --prefix $LOCAL_INSTALL
make -j 4 && make install
See the protobuf documentation for more info.
This project uses cmake to build the module and the tests; there are currently two targets, execmod and execmod_test: the library and the tests, respectively.
It also needs a number of libraries and header files (see Prerequisites) that we assume to be in the include and lib subdirectories of a directory located at ${LOCAL_INSTALL_DIR}; this can be set either using an environment variable ($LOCAL_INSTALL) or a cmake property (-DLOCAL_INSTALL_DIR):
mkdir build && cd build
cmake -DLOCAL_INSTALL_DIR=/path/to/usr/local ..
make
# If you want to run the tests in the execmod_test target:
ctest
See the Mesos anonymous module documentation for more details; however, running a Mesos Agent with this module loaded is a simple matter of adding the --modules flag, pointing it to the generated modules.json file (the CMake step will generate it in the gen/ folder) [4]:
$ ${MESOS_ROOT}/build/bin/mesos-slave.sh --work_dir=/tmp/agent \
    --modules=/path/to/execute-module/gen/modules.json \
    --master=zk://zk1.cluster.prod.com:2181
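Before starting the Agent, it can help to confirm that the descriptor points at a library that actually exists on that node. The sketch below assumes that the generated modules.json follows the usual Mesos modules layout (a top-level "libraries" list whose entries carry a "file" path); missing_module_libraries is a hypothetical helper, not something shipped with this project.

```python
import json
import os


def missing_module_libraries(modules_json_path: str) -> list:
    """Return the "file" entries of a modules descriptor that do not exist on disk."""
    with open(modules_json_path, encoding="utf-8") as handle:
        descriptor = json.load(handle)
    missing = []
    for library in descriptor.get("libraries", []):
        # A library entry may be given by "name" instead of "file";
        # only explicit paths can be checked here.
        path = library.get("file")
        if path is not None and not os.path.isfile(path):
            missing.append(path)
    return missing
```

Run on the Agent node against gen/modules.json, an empty list means every listed library file was found.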
See Configuration in the Apache Mesos documentation pages for more details on the various flags.
Also, my zk_mesos github project provides an example Vagrant configuration showing how to deploy and run Mesos from the Mesosphere binary distributions.
If MESOS-4253 is accepted and the code committed, the module will also gain access to the Agent's flags, and in particular to --work_dir and --sandbox_dir, which could be further used when executing commands to store logs, etc.
See the init() method in the RemoteExecutionAnonymous class.
Run ctest from the build directory, or launch the execmod_test binary:
cd build && ./execmod_test
Notes
[1] In other words, using {"command": "echo", "arguments": ["$PATH"]} will result in {"exitCode": 0, "stdout": "$PATH\n"}.
[2] It is currently not possible to create a RESTful API using the libprocess Process::route() method, as it's not possible to create routes with wildcard URLs (such as /remote/task/.*) as in other HTTP frameworks (see process.cpp for more details, and in particular the handlers struct).
[3] Note that even if the command itself failed, the response code is still a 200 OK.
[4] Make sure that the "file" field in the JSON points to the correct location (on the Agent node) where the libexecmod.so file is located; watch out for errors in the Agent's log.