KeyValuePairArguments

Cassandra Targett edited this page Dec 5, 2016 · 1 revision

Key-Value Pair Arguments

Key-value pair arguments apply to the ingest job as a whole. They are expressed as -argument value, and they are the last arguments supplied, after the jar name is defined.

There are several possible arguments:

-cls

Required.

The ingest mapper class. This class must correspond to the content being indexed to ensure proper parsing of documents. See the section [Ingest Mappers] for a detailed explanation of each available ingest mapper.

  • com.lucidworks.hadoop.ingest.GrokIngestMapper

  • com.lucidworks.hadoop.ingest.CSVIngestMapper

  • com.lucidworks.hadoop.ingest.DirectoryIngestMapper

  • com.lucidworks.hadoop.ingest.RegexIngestMapper

  • com.lucidworks.hadoop.ingest.SequenceFileIngestMapper

  • com.lucidworks.hadoop.ingest.SolrXMLIngestMapper

  • com.lucidworks.hadoop.ingest.WarcIngestMapper

  • com.lucidworks.hadoop.ingest.ZipIngestMapper

-c

Required.

The collection name where documents should be indexed. This collection must exist prior to running the Hadoop job jar.

-of

Required.

The output format. In all cases, you can use the default, com.lucidworks.hadoop.io.LWMapRedOutputFormat.

-i

Required.

The path to the Hadoop input data. This path should point to a directory in HDFS. If the location is not a specific filename, the path must include a wildcard expression to find documents, such as /data/*.
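The wildcard is meant for Hadoop, not for the local shell, so it helps to quote the path on the command line. The following is a minimal sketch, assuming a POSIX shell; the variable names are illustrative and /data/* is the example path from above.

```shell
# Single quotes stop the local shell from expanding the wildcard
# against the local filesystem before the job ever sees it.
input_path='/data/*'

# Quoting "$input_path" keeps the value as one unexpanded word.
set -- -i "$input_path"
input_arg_count=$#

# Print each argument on its own line, as the job would receive it.
printf '%s\n' "$@"
```

If the path were left unquoted and a local /data directory existed, the shell would substitute local filenames and the job would receive the wrong input paths.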

-s

The Solr URL when running in standalone mode. In a default installation, this would be http://localhost:8983/solr. Use this parameter when you are not running in SolrCloud mode. If you are running Solr in SolrCloud mode, you should use -zk instead.

-zk

A list of ZooKeeper hosts, followed by the ZooKeeper root directory. For example, 10.0.1.1:2181,10.0.1.2:2181,10.0.1.3:2181/solr would be a valid value.

This parameter is used when running in SolrCloud mode, and allows the output of the ingest process to be routed via ZooKeeper to any available node. If you are not running in SolrCloud mode, use the -s argument instead.
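The choice between -s and -zk can be sketched as a small helper that emits one or the other. This is only an illustration, assuming a POSIX shell: the function name connection_args, the jar path, the collection name my_collection, and the main class com.lucidworks.hadoop.ingest.IngestJob are assumptions that do not appear on this page. The command is printed rather than executed.

```shell
# Print -zk with the ZooKeeper connect string when one is given
# (SolrCloud mode); otherwise print -s with the standalone Solr URL.
# $1 = ZooKeeper hosts plus root (may be empty), $2 = Solr URL.
connection_args() {
  if [ -n "$1" ]; then
    printf '%s %s\n' -zk "$1"
  else
    printf '%s %s\n' -s "$2"
  fi
}

# Hypothetical values for illustration only.
job_jar='/path/to/solr-hadoop-job.jar'
main_class='com.lucidworks.hadoop.ingest.IngestJob'

# Dry run: echo the command instead of running it. The command
# substitution is left unquoted on purpose so it splits into two words.
echo hadoop jar "$job_jar" "$main_class" \
  -cls com.lucidworks.hadoop.ingest.CSVIngestMapper \
  -c my_collection \
  -i '/data/*' \
  -of com.lucidworks.hadoop.io.LWMapRedOutputFormat \
  $(connection_args '' 'http://localhost:8983/solr')
```

Passing a non-empty first argument, such as 10.0.1.1:2181,10.0.1.2:2181,10.0.1.3:2181/solr, switches the output to -zk.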

-redcls

The class name of a custom IngestReducer, if any. For this reducer to be invoked, you must also set -ur to a value higher than 0. If no class is specified, the default reducer, com.lucidworks.hadoop.ingest.IngestReducer, is used.

-ur

The number of reducers to use when outputting to the OutputFormat. Depending on the output format and your system resources, you may wish to have Hadoop do a reduce step so the output resource is not overwhelmed. The default is 0, which means no reducers are used.
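Because -redcls only takes effect when -ur is higher than 0, a wrapper script might check the pair before launching the job. The following is a rough sketch, assuming a POSIX shell; the function name reducer_args and the class com.example.MyReducer are hypothetical.

```shell
# Print the reducer arguments for the job, or fail when a custom
# reducer class is named but the reducer count is 0.
# $1 = custom reducer class (may be empty), $2 = reducer count.
reducer_args() {
  if [ -n "$1" ] && [ "${2:-0}" -le 0 ]; then
    echo 'error: -redcls needs -ur set higher than 0' >&2
    return 1
  fi
  # With a count of 0 nothing is printed and the job defaults apply.
  if [ "${2:-0}" -gt 0 ]; then
    printf '%s %s' -ur "$2"
    if [ -n "$1" ]; then printf ' %s %s' -redcls "$1"; fi
    printf '\n'
  fi
}

# Example: two reducers with a hypothetical custom reducer class.
reducer_args com.example.MyReducer 2
# prints: -ur 2 -redcls com.example.MyReducer
```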
