Kafka Server Setup
Follow the steps below on every node where Kafka is to be installed and configured (single-node or multi-node deployment).
The preferred way is to download the Kafka RPM from the location below and install it. This installs the Kafka binaries and creates the required 'kafka' user and 'kafka' group. Note that this user has no home directory (a 'nohome' user).
wget https://github.com/Seagate/cortx/releases/download/third-party-deps-1.0.0-0/third-party-centos-7.8.2003-1.0.0-0.tar.gz
tar -xvf third-party-centos-7.8.2003-1.0.0-0.tar.gz
cd centos-7.8.2003-*/commons/kafka
yum install kafka-2.13_2.7.0-el7.x86_64.rpm
Validate that the 'kafka' user and group have been created. If they are not found, follow the steps below to create them.
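A quick way to check, using standard system tools:
id kafka              # prints the user's uid/gid if it exists
getent group kafka    # prints the group entry if it exists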
We will use the Kafka package downloaded from Seagate's repository; by default, it is configured to run as the kafka user in the kafka group, so we have to create that user and group if they do not already exist.
sudo su
adduser kafka
usermod -aG wheel kafka
echo "kafka ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers.d/90-cloud-init-users
groupadd --force kafka
usermod --append --groups kafka kafka
exit
The following has to be configured in /opt/kafka/config/server.properties
Configure the following if a hostname or FQDN (fully qualified domain name) is used in zookeeper.connect:
# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://{hostname/FQDN}:9092
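One way to fill this in on each node is to substitute the machine's own FQDN, assuming hostname -f returns it (a convenience sketch, not part of the stock setup):
sed -i "s|^#\{0,1\}listeners=.*|listeners=PLAINTEXT://$(hostname -f):9092|" /opt/kafka/config/server.properties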
Configure the following to make the delete (purge) interface work:
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=1
log.delete.delay.ms=1
log.flush.offset.checkpoint.interval.ms=1
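Once the broker is up (see the service steps below), the delete path can be exercised with the topic scripts shipped with Kafka (paths assumed from this install; the broker address must match the configured listener):
BROKER="$(hostname -f):9092"
/opt/kafka/bin/kafka-topics.sh --bootstrap-server "$BROKER" --create --topic purge-test --partitions 1 --replication-factor 1
/opt/kafka/bin/kafka-topics.sh --bootstrap-server "$BROKER" --delete --topic purge-test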
Configure the following to set the log directory for the Kafka broker:
log.dirs=/var/local/data/kafka
Make the following changes in /opt/kafka/config/zookeeper.properties to set the data and log directories for ZooKeeper. Since kafka is a user without a home directory, use the directories below.
dataLogDir=/var/log/zookeeper
dataDir=/var/lib/zookeeper
If any directory such as dataDir or dataLogDir is already present, clean its contents before starting ZooKeeper and the Kafka broker (see the cleanup example after the commands below). Ensure that these directories are owned by kafka:kafka; use the commands below to create them (if missing) and change the ownership:
mkdir -p /var/log/zookeeper
mkdir -p /var/lib/zookeeper
mkdir -p /var/local/data/kafka
# Make sure that kafka:kafka has access to the dataDir and logDir, including the parent directories
chown -R kafka:kafka /var/lib/zookeeper
chown -R kafka:kafka /var/log/zookeeper
chown -R kafka:kafka /var/local/data/kafka
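If the directories already existed, their contents can be cleared like this (destructive; double-check the paths first):
rm -rf /var/lib/zookeeper/* /var/log/zookeeper/* /var/local/data/kafka/*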
Enable and start the services, then verify their status:
systemctl enable kafka-zookeeper
systemctl enable kafka
systemctl start kafka-zookeeper
sleep 5 # (kafka service needs zookeeper service to be up and running.)
systemctl status kafka-zookeeper
systemctl start kafka
systemctl status kafka
# Make sure that you see `Active: active (running)` when checking the status of both services.
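As an optional smoke test once both services are active (assumes the console scripts under /opt/kafka/bin and a listener matching this node's FQDN):
BROKER="$(hostname -f):9092"
echo "hello" | /opt/kafka/bin/kafka-console-producer.sh --bootstrap-server "$BROKER" --topic smoke-test
/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server "$BROKER" --topic smoke-test --from-beginning --max-messages 1 --timeout-ms 10000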
To stop the services:
systemctl stop kafka
systemctl stop kafka-zookeeper
Kafka Setup for a Multi-Node Cluster
Download the Kafka RPM using the command below and install it on all the nodes:
curl "http://cortx-storage.colo.seagate.com/releases/cortx/third-party-deps/centos/centos-7.8.2003-2.0.0-latest/commons/kafka/kafka-2.13_2.7.0-el7.x86_64.rpm" -o kafka.rpm
yum install kafka.rpm
If the above location is not reachable, the Kafka RPM can be found in this tarball: https://github.com/Seagate/cortx/releases/download/third-party-deps-1.0.0-0/third-party-centos-7.8.2003-1.0.0-0.tar.gz
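To push the install to every node from one machine, a loop like the following can be used (hypothetical hostnames node1..node3; assumes passwordless ssh and sudo):
for node in node1 node2 node3; do
  scp kafka.rpm "$node":/tmp/
  ssh "$node" "sudo yum install -y /tmp/kafka.rpm"
done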
Kafka configuration involves setting up server.properties and zookeeper.properties, creating the myid file, and setting the correct ownership on the data directories.
The following has to be configured in /opt/kafka/config/server.properties on every node.
Define a unique broker id for each Kafka server.
broker.id=0
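For example, across a three-node cluster each server.properties carries a distinct id:
# node 1: broker.id=0
# node 2: broker.id=1
# node 3: broker.id=2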
Define a directory for storing log files:
log.dirs=/var/local/data/kafka
To form a cluster of 3 nodes, set zookeeper.connect to a comma-separated list of node addresses and ports, so that if one ZooKeeper instance fails, the broker automatically tries the next available address:
zookeeper.connect=<node 1 address>:2181,<node 2 address>:2181,<node 3 address>:2181
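For example, with hypothetical hostnames:
zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181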
Configure the following if a hostname or FQDN is used in zookeeper.connect:
# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://{hostname/FQDN}:9092
Configure the following to make the delete (purge) interface work:
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=1
log.delete.delay.ms=1
log.flush.offset.checkpoint.interval.ms=1
Note: It is possible to run multiple Kafka server instances on a single node. In that case, define a separate server.properties file for each instance.
Set a proper replication factor for the offsets and transaction state topics, along with the default for new topics. This is required in a multi-node setup:
default.replication.factor=3
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
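Once the cluster is running, the effective replication can be verified by creating and describing a test topic (stock scripts assumed; expect ReplicationFactor: 3 and three brokers in the Isr list):
/opt/kafka/bin/kafka-topics.sh --bootstrap-server <node 1 address>:9092 --create --topic repl-test --partitions 1 --replication-factor 3
/opt/kafka/bin/kafka-topics.sh --bootstrap-server <node 1 address>:9092 --describe --topic repl-test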
Define the ZooKeeper configuration in /opt/kafka/config/zookeeper.properties with the following parameters:
tickTime=2000
initLimit=10
syncLimit=5
# the myid file will be created inside dataDir
dataDir=/var/lib/zookeeper
# if dataLogDir is not defined, dataDir is used
dataLogDir=/var/log/zookeeper
clientPort=2181
server.1=<node 1 address>:2888:3888
server.2=<node 2 address>:2888:3888
server.3=<node 3 address>:2888:3888
autopurge.snapRetainCount=3
autopurge.purgeInterval=24
The details for the configuration parameters can be found at https://zookeeper.apache.org/doc/current/zookeeperStarted.html.
Repeat the above steps for each node in the cluster.
On the first node, create a file named myid inside the dataDir folder containing the node id 1 (this must be a single integer). Similarly, on nodes 2 and 3, write their respective ids to dataDir/myid on those nodes.
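For example:
echo 1 > /var/lib/zookeeper/myid   # on node 1
echo 2 > /var/lib/zookeeper/myid   # on node 2
echo 3 > /var/lib/zookeeper/myid   # on node 3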
If any directory such as dataDir or dataLogDir is already present, clean its contents before starting ZooKeeper and the Kafka broker. Ensure that these directories have the proper ownership (kafka:kafka):
chown -R kafka:kafka <path/to/datadir>
Enable the services on each node.
systemctl enable kafka-zookeeper
systemctl enable kafka
Start the services on each node.
systemctl start kafka-zookeeper
sleep 5 # (kafka service needs zookeeper service to be up and running.)
systemctl start kafka
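To confirm that all three brokers registered with ZooKeeper, the zookeeper-shell.sh script shipped with Kafka can be used (expect output like [0, 1, 2]):
/opt/kafka/bin/zookeeper-shell.sh <node 1 address>:2181 ls /brokers/ids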
To stop the services on each node:
systemctl stop kafka
systemctl stop kafka-zookeeper