Advanced Configurations
The advanced configuration of an MMT engine can be set manually through the XML file engine.xconf, located at <your_mmt_home>/engines/<your_engine_name>/engine.xconf.
The sections below explain how to configure an engine through the engine.xconf file; alternatively, you can skip to the configuration examples at the end of this page.
The engine.xconf file is automatically generated during engine creation and training (launched with the ./mmt create command). By default it looks like this, and it already provides a valid configuration.
<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
xmlns="http://www.modernmt.eu/schema/config"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<engine source-language="en" target-language="it" />
</node>
It is possible to configure properties by adding XML elements and attributes under the element “node”. Here is an example of a configured file:
<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
xmlns="http://www.modernmt.eu/schema/config"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<engine source-language="en" target-language="it" name="my_engine">
<decoder threads="5" />
</engine>
<network>
<api port="8000"/>
<join>
<member host="16.51.99.2" port="5015" />
</join>
</network>
<datastream embedded="false" host="16.51.99.2"/>
<db embedded="false" host="16.51.99.2" />
</node>
Customizable properties include the engine, network, datastream and db settings, each described below.
Some of these settings can also be passed as command line arguments; in case of conflict, the command line arguments take priority. If a property is defined neither in the configuration file nor on the command line, its default value is used.
Please note that if you run the ./mmt create command again for an existing engine, the configuration of that engine will be overwritten with the basic default one.
The general features of the engine are described in the auto-generated "engine" XML element, child to the "node" XML element.
Here is an example of a fully configured "engine" XML element:
<node>
...
<engine source-language="en" target-language="it">
<decoder enabled="true" gpus="1,3,5" />
<aligner enabled="false" />
</engine>
...
</node>
Attribute name | Description |
---|---|
source-language | The original language to translate from. NOTE: this attribute must not be used if the <languages> child is present |
target-language | The language to translate to |
ModernMT supports multilingual engines, meaning that a single engine can handle multiple (unidirectional) language pairs.
If you want to enable multiple language pairs, you need to:
- erase the default source-language and target-language attributes in your <engine> node;
- add a new <languages> child to your <engine> node;
- add to <languages> as many <pair> nodes as the language pairs you want to enable, and set their source and target attributes to the corresponding language tags.
Each <pair> child under the <languages> node of <engine> represents a language pair that must be enabled for this engine. Valid pair attributes include:
Attribute name | Description |
---|---|
source | The language tag of the source language of this pair |
target | The language tag of the target language of this pair |
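Putting the steps above together, a multilingual engine.xconf might look like the following sketch. The two language pairs (en to it, en to fr) are arbitrary examples chosen for illustration, not values taken from a real engine.

```xml
<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
      xmlns="http://www.modernmt.eu/schema/config"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- No source-language/target-language attributes: <languages> is used instead -->
  <engine>
    <languages>
      <!-- Example pairs only (assumption); each pair is unidirectional -->
      <pair source="en" target="it" />
      <pair source="en" target="fr" />
    </languages>
  </engine>
</node>
```

Since pairs are unidirectional, translating in the opposite direction as well would require an additional <pair> with source and target swapped.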
Adding a "decoder" element lets you set the features of the translation decoder to use. Valid decoder attributes include:
Attribute name | Description | Valid Values | Default value |
---|---|---|---|
enabled | It defines whether the engine should use a decoder or not | true or false | true |
threads | The decoder will run on CPU using this number of threads. NOTE: to make the decoder run on CPUs, it is also mandatory to set the gpus attribute to "none" | -- | (run on GPUs) |
gpus | Comma-separated list of the ids of the GPUs that the neural decoder will use. Example: 1,3,5 | A single GPU id or a comma-separated list of GPU ids; 'none' if no GPUs must be used | All the available GPUs |
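As a minimal sketch of the note on the threads attribute, a CPU-only decoder could be configured as follows. The language codes and the thread count of 4 are placeholders picked for illustration.

```xml
<!-- Child of <node>; "en"/"it" are placeholder language codes -->
<engine source-language="en" target-language="it">
  <!-- gpus="none" forces CPU execution, which the threads attribute requires;
       4 is an arbitrary thread count (assumption) -->
  <decoder threads="4" gpus="none" />
</engine>
```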
Adding an "aligner" element lets you set the features of the aligner component to use. Valid aligner attributes include:
Attribute name | Description | Valid Values | Default value |
---|---|---|---|
enabled | It defines whether the engine should use an aligner for the Tag Projection API | true or false | false |
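Because the aligner is disabled by default, an engine that needs the Tag Projection API has to enable it explicitly. A minimal sketch, with placeholder language codes:

```xml
<!-- Child of <node>; "en"/"it" are placeholder language codes -->
<engine source-language="en" target-language="it">
  <!-- The aligner is off by default; turn it on for the Tag Projection API -->
  <aligner enabled="true" />
</engine>
```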
To define the network behaviour of the engine, add a "network" XML element under "node".
Here is an example of a fully configured "network" element.
<node>
...
<network host="10.5.10.237" port="5000" interface="eth0">
<api port="8888" root="test" />
<join>
<member host="31.41.59.1" port="5015"/>
<member host="31.41.59.2" port="5016"/>
<member host="31.41.59.3" port="5017"/>
</join>
</network>
...
</node>
The attributes in the "network" node can be used to specify the general network settings:
Attribute name | Description | Valid Values | Default value |
---|---|---|---|
host | The IP address at which the other cluster nodes can reach this machine | -- | The IPv4 address of this machine |
port | The cluster communications logic port | -- | 5016 |
interface | The network interface where this machine will listen to cluster communication messages | -- | null |
More specific network settings, such as the REST APIs and the cluster joining configurations, require the definition of specific XML elements under "network":
The configuration of the REST server used to expose APIs can be set in a new "api" XML element under "network". Valid attributes for element "api" include:
Attribute name | Description | Valid Values | Default value |
---|---|---|---|
enabled | It defines whether the engine should expose REST APIs or not | true: launch the REST server and expose APIs; false: do not expose any REST APIs | true |
port | The REST APIs port | -- | 8045 |
root | The path in the host where REST APIs must be exposed | -- | None |
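For a node that should not expose any REST APIs, the "api" element can be used to switch the REST server off. The fragment below is a sketch of that case only; whether such a node is useful in a given deployment is an assumption left to the reader.

```xml
<!-- Child of <node> -->
<network>
  <!-- enabled="false": no REST server is launched on this node -->
  <api enabled="false" />
</network>
```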
Adding a "join" XML element under "network" allows the configuration of an MMT cluster. In "join" it is possible to specify a series of "member" child elements. Each member is a potential entry point to the cluster: this engine will contact them in order until one of them answers. Each "member" element requires two attributes:
Attribute name | Description | Valid Values | Default Value |
---|---|---|---|
host | the current member IP address or hostname | -- | -- |
port | the cluster communication port | -- | -- |
To define the way the engine should connect to a data stream, add a "datastream" XML element, child to the "node" XML element.
Here is an example of a fully configured "datastream" element:
<node>
...
<datastream enabled="true" embedded="false" host="31.41.59.1" port="9999"/>
...
</node>
Valid "datastream" attributes are:
Attribute name | Description | Valid Values | Default value |
---|---|---|---|
enabled | It defines whether this engine should use a data stream | true or false | true |
embedded | It defines whether the data stream belongs to an MMT engine or is a separate process | true or false | true |
host | The data stream host IP address or hostname | -- | localhost |
port | The data stream port | -- | 9092 |
name | The name of the data stream this engine should interact with | -- | |
To define the way the engine should connect to the Database, add a "db" XML element, child to the "node" XML element.
Here is an example of a fully configured "db" element:
<node>
...
<db enabled="true" embedded="false" host="31.41.59.1" port="9444" type="mysql" name="mmtDB"/>
...
</node>
Valid "db" attributes are:
Attribute name | Description | Valid Values | Default Value |
---|---|---|---|
enabled | It defines whether the engine should connect with a DB or not | true or false | true |
host | The database host IP address or hostname | -- | localhost |
port | The database port | -- | 9042 |
name | The name of the database this engine should interact with | -- | |
embedded | It defines whether the database belongs to an MMT engine or is a separate process | true or false | true |
type | The type of the DB to interact with | e.g. cassandra, mysql | cassandra |
Here are some examples of how engine.xconf files can be used to configure nodes for various scenarios.
This is a sample configuration for an MMT engine named 'default' working alone and exposing its REST APIs on port 8045. During the execution of ./mmt start, the engine itself launches the database process on port 9042 and the data stream process on port 9092 (the default ports listed above); during the execution of ./mmt stop, these processes are stopped as well.
<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
xmlns="http://www.modernmt.eu/schema/config"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<engine source-language="<your_src_lang>" target-language="<your_trg_lang>" />
</node>

As an alternative, you may want your MMT engine to use database and data stream instances that are already running on your machine.
You can set the database and datastream as not embedded, and you can also specify their ports, which may differ from the default ones:
<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
xmlns="http://www.modernmt.eu/schema/config"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<engine source-language="<your_src_lang>" target-language="<your_trg_lang>" />
<datastream embedded="false" port="<your_data_stream_port>" />
<db embedded="false" port="<your_db_port>" />
</node>
That's it!
Of course, if a service is running as not embedded, it will not be stopped by the ./mmt stop command.
In an MMT cluster with a Leader-Followers style:
- the Leader is a node that hosts, in addition to an engine, both the database process and the data stream process.
- the Followers join the cluster using any of its members as an entry point, and connect directly to the Leader's database and data stream.
Using an MMT cluster lets nodes propagate translation knowledge and jobs, leading to better scalability and fault-tolerance. Separate engine instances should run on different machines.
The Leader may be configured as in the single-node example above (the default configuration), and it should be started first, so that the database and data stream processes are already running when the Followers try to connect.
The Followers, on the contrary, require a slightly different configuration. For the sake of simplicity, let's say that all nodes use default ports and names, and that the Followers use the Leader as an entry point (this is not mandatory: they may use any node that is already a cluster member):
<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
xmlns="http://www.modernmt.eu/schema/config"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<engine source-language="<your_src_lang>" target-language="<your_trg_lang>" />
<network>
<join>
<member host="31.41.59.3" port="5016" />
</join>
</network>
<datastream host="31.41.59.3" />
<db host="31.41.59.3" />
</node>

Note that the database and datastream, which are embedded in the Leader node, are considered embedded by the Followers too (and since the default "embedded" value is "true", it is not necessary to add that attribute).
As an alternative to the previous configuration, Followers can keep the default engine.xconf configuration and be started with the -join--leader option set to the Leader host:
./mmt start -join--leader 31.41.59.3
In an MMT cluster with a Peer-to-Peer style, the database and the data stream processes do not run on any cluster member, but on separate machines. Therefore all nodes have the same role, and the single point of failure represented by the Leader is avoided. Moreover, the specified database and data stream hosts may hide replication and load balancing techniques, ensuring fault-tolerance of the system.
As before, for the sake of simplicity let's use default ports and names; moreover, let's consider node 31.41.59.1 as the first node to start and the one that every other node tries to join.
Note that when the first node is started there are no cluster members to join. As a consequence, the first node does not need any <member> nodes, and all the other nodes may use the first one as an entry point to the cluster (again, this is not mandatory: they may use any node that is already a cluster member).
Here is the configuration of any node except the first:
<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
xmlns="http://www.modernmt.eu/schema/config"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<engine source-language="<your_src_lang>" target-language="<your_trg_lang>" />
<network>
<join>
<member host="31.41.59.1" port="5016" />
</join>
</network>
<datastream embedded="false" host="27.18.28.2" name="<your_data_stream_name>"/>
<db embedded="false" host="27.18.28.1" name="<your_database_name>"/>
</node>

Note that, in contrast to the Leader-Followers example, the datastream and database are now specified as not embedded. It is thus necessary to set their names too.
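For completeness, here is a sketch of what the first node's configuration could look like in the Peer-to-Peer case, assuming it differs from the other nodes only in having no cluster member to join. The language codes and the data stream and database names are placeholders; the hosts are the ones used in the example above.

```xml
<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
      xmlns="http://www.modernmt.eu/schema/config"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- "en"/"it" are placeholder language codes -->
  <engine source-language="en" target-language="it" />
  <!-- No <network><join> section: there is no cluster member to join yet -->
  <!-- Same external services as the other nodes; names are placeholders -->
  <datastream embedded="false" host="27.18.28.2" name="your_data_stream_name" />
  <db embedded="false" host="27.18.28.1" name="your_database_name" />
</node>
```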