CVRG OpenTSDB Cluster Setup




Software Versions

  • Apache Zookeeper 3.4.8
  • Apache Hadoop 2.7.2
  • HBase 1.2.2
  • OpenTSDB 2.2.1

Requirements

  • Java 1.7+ installed (Hadoop 2.7 and HBase 1.2 require at least Java 7)
  • CentOS 7+ OS

CVRG OpenTSDB Setup Procedure

Note: Throughout this document, filenames, folder names and commands entered into the terminal are all in bold.

While this document is intended to detail the procedure for installing OpenTSDB, it does have some prerequisites that must be installed and configured first. These prerequisites are included below.

Generally, the installation procedure matches the one provided by the original developers, with specific configuration options and settings unique to the needs of the CVRG.

Please note that this document provides the steps for several applications, all of which are necessary to run OpenTSDB. Resource directories are configured to be outside the actual application directory for each item because this will simplify upgrades to the individual components.

The assumption is that each server is running a CentOS 7 instance.

It is further assumed that the user who will be following these steps is comfortable in the Linux environment and knows when/if to use sudo and how to manage file permissions. It is also assumed that the user can perform simple tasks like creating directories, changing file permissions/ownership, extracting tarballs, opening ports, etc. The user will need root access to complete this procedure.

In the example below, the cluster will consist of 3 nodes. When creating a larger cluster, it is not necessary to expand Zookeeper to be on all of the nodes. As long as it is installed on at least 3 nodes it will serve its purpose. If more nodes need Zookeeper to keep up with demand, install Zookeeper on an odd number of nodes only.

Install Zookeeper

These instructions are for setting up Zookeeper on multiple servers. Zookeeper is a suite of utilities and services that the other applications rely upon. A cluster of servers in Zookeeper is referred to as an ensemble. Ideally, an ensemble is composed of an odd number of servers.

Create Hadoop User and Group

A dedicated user will be created to own and run all of the applications being installed. Perform the following steps on each machine in the cluster:

  1. Create new user group hadoopgroup (groupadd hadoopgroup)
  2. Create new user hadoopuser (adduser hadoopuser)
  3. Add hadoopuser to hadoopgroup (usermod -aG hadoopgroup hadoopuser)
  4. Set the password for hadoopuser to <unspecified value> (passwd hadoopuser)

Set up the Servers

  1. Verify the Java installation. Java 1.7+ is required by Hadoop 2.7 and HBase 1.2.
  2. Set the Java heap size to avoid swapping, which would degrade performance. Currently the VMs in use by our team have 2GB RAM set by default. The Java heap is recommended to be about 1GB less than the total RAM. This can be done by adding the setting JAVA_OPTS='-Xmx1G' to the file /etc/profile.d/java.sh (see the sketch after this list).
  3. Download the Zookeeper tarball (wget http://www.apache.org/dist/zookeeper/zookeeper-3.4.8/zookeeper-3.4.8.tar.gz)
  4. Extract the tarball into /opt, which will create the folder /opt/zookeeper-3.4.8 (tar -xvf zookeeper-3.4.8.tar.gz -C /opt)
  5. Make a zoo-datadirectory in /opt (mkdir /opt/zoo-datadirectory)
  6. Create a symbolic link to zookeeper (ln -s /opt/zookeeper-3.4.8 zookeeper)
  7. Change the ownership (recursively) of zoo-datadirectory to hadoopuser (chown -R hadoopuser zoo-datadirectory)
  8. Change the ownership (recursively) of zookeeper-3.4.8 to hadoopuser (chown -R hadoopuser zookeeper-3.4.8)
  9. Change the group ownership (recursively) of zoo-datadirectory to hadoopgroup (chgrp -R hadoopgroup zoo-datadirectory)
  10. Change the group ownership (recursively) of zookeeper-3.4.8 to hadoopgroup (chgrp -R hadoopgroup zookeeper-3.4.8)
  11. Open ports 2888 and 3888 for TCP. On CentOS 7 the lokkit tool is no longer available, so use firewalld instead:
firewall-cmd --permanent --add-port=2888/tcp
firewall-cmd --permanent --add-port=3888/tcp
firewall-cmd --reload
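
The heap setting from step 2 above can be placed in /etc/profile.d/java.sh; a minimal sketch (the 1G value assumes the 2GB VMs described above, and the JAVA_OPTS name follows the original instructions — note that Zookeeper itself reads JVMFLAGS, for example from conf/java.env, so you may want to set that as well):

# /etc/profile.d/java.sh -- cap the JVM heap at 1G (about 1GB less than the 2GB of RAM on our VMs)
export JAVA_OPTS='-Xmx1G'

# /opt/zookeeper/conf/java.env (sourced by zkEnv.sh) -- ensures Zookeeper itself picks up the limit
export JVMFLAGS='-Xmx1G'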

Configure Zookeeper

  • There will be a file in /opt/zookeeper/conf named zoo_sample.cfg. Rename it to zoo.cfg and change the settings to match the configuration below:
   	# The number of milliseconds of each tick
   	tickTime=2000
   	# The number of ticks that the initial 
   	# synchronization phase can take
   	initLimit=10
   	# The number of ticks that can pass between 
   	# sending a request and getting an acknowledgement
   	syncLimit=5
   	# the directory where the snapshot is stored.
   	# do not use /tmp for storage, /tmp here is just 
   	# example sakes.
   	dataDir=/opt/zoo-datadirectory
   	# the port at which the clients will connect
   	clientPort=2181
   	# the maximum number of client connections.
   	# increase this if you need to handle more clients
   	maxClientCnxns=10
  • Since we’re also installing Hadoop in a later section, we should configure the hostnames now with the Hadoop cluster in mind. Each server in the cluster serves a specific role. It is necessary to choose one server to be the MasterNode, and all the other servers will be SlaveNodes.

In a CVRG Hadoop cluster, the Masternode is named “*masternode” and the slave nodes are named “*slavenode#” where the ‘#’ is a number and ‘*’ is something to distinguish the purpose (e.g., dev, test, prod, etc.).

  • Edit the /etc/hosts file to add the names and IP addresses of the servers in the cluster. List each node under its own IP address with both its cluster name and its DNS hostname, since Hadoop uses the hostname to identify itself and the other nodes in the cluster. Make sure that the 127.0.0.1 line does not contain any of these hostnames, as HBase is picky about loopback entries.

Example:

   	10.0.0.1 *masternode firstserver
   	10.0.0.2 *slavenode1 secondserver
   	10.0.0.3 *slavenode2 thirdserver
  • Each server in the ensemble must also be configured for Zookeeper. In zoo.cfg, add a line to the file for each server in the ensemble. Best practice calls for avoiding the use of IP addresses and using the hostnames we've configured instead.
   	server.1=*masternode:2888:3888
  • The next server would be server.2 and so on. Use the header #CVRG * Server Ensemble for this section where ‘*’ is something to distinguish the purpose (e.g., dev, test, prod, etc.).
  • Each server in the ensemble needs to be identified using its id number. For example, the server set as server.1 above would be identified by the number '1' and so on. To set this, create a file in the /opt/zoo-datadirectory folder called myid. The contents of this file should be just the id number of that server, nothing more. server.1 would thus contain only 1 in its myid file (see the sketch after this list).
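
For example, for the three-node cluster above, the ensemble section of zoo.cfg and the per-server myid files might look like this (a sketch; the hostnames assume the naming convention described earlier, with '*' replaced by your environment prefix):

   	#CVRG * Server Ensemble
   	server.1=*masternode:2888:3888
   	server.2=*slavenode1:2888:3888
   	server.3=*slavenode2:2888:3888

echo 1 > /opt/zoo-datadirectory/myid    # on *masternode (server.1)
echo 2 > /opt/zoo-datadirectory/myid    # on *slavenode1 (server.2)
echo 3 > /opt/zoo-datadirectory/myid    # on *slavenode2 (server.3)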

Configure Logging

Zookeeper uses Log4j as its logging mechanism. By default, it’s set to send log output to the console only. It would be more useful to output to a logfile.

  1. Create the folder /opt/zookeeper/logs (mkdir /opt/zookeeper/logs)
  2. Open the file /opt/zookeeper/conf/log4j.properties for editing.
  3. About 10 lines below is a comment saying “DEFAULT: console appender only.” Comment out the entry below it.
  4. Add the following lines below the entry just commented out:
   	#CVRG Logfile
   	log4j.rootLogger=WARN, CONSOLE, ROLLINGFILE


  • Locate the line that says log4j.appender.ROLLINGFILE.File=${zookeeper.log.dir}/${zookeeper.log.file} and change it to:
   	log4j.appender.ROLLINGFILE.File=/opt/zookeeper/logs/zookeeper.log
  • Save the file.

Start Zookeeper

Start Zookeeper by running the following command as the hadoopuser within the /opt/zookeeper/bin folder. Do not run the command from any other current directory.

./zkServer.sh start

Perform the above steps for each server in the ensemble, changing the id number in the myid file as appropriate.
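
Once Zookeeper is running on all of the servers, the state of the ensemble can be checked with the bundled script; a quick sketch:

cd /opt/zookeeper/bin
./zkServer.sh status

One node should report "Mode: leader" and the others "Mode: follower".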

Monitoring Zookeeper

The logging for Zookeeper is sent to /opt/zookeeper/logs/zookeeper.log. This is the same on all machines in the cluster.

Install Hadoop

Note: this procedure uses Hadoop 2.7.2, as listed under Software Versions above. Use a Hadoop release that is supported by the HBase version being installed (HBase 1.2.x supports Hadoop 2.7.x). In your cluster, designate one machine to be the Hadoop ResourceManager. In this installation, we are using the masternode for this purpose. You will also need to designate one machine as the NameNode. In our example, that will be slavenode1.

Except where noted, the following steps are to be performed on each server in the cluster.

Set up RSA Key for hadoopuser

  1. Switch to hadoopuser (su - hadoopuser)
  2. Create a new directory .ssh in hadoopuser's home folder.
  3. Set permissions on the .ssh folder to 700
  4. Run the command ssh-keygen -t rsa
  5. When prompted, hit enter to set a blank passphrase
  6. Create a new file in .ssh called authorized_keys and set its permissions to 600
  7. Copy the contents of id_rsa.pub to authorized_keys using the command cat id_rsa.pub >> authorized_keys
  8. Copy the RSA key to each server using the command ssh-copy-id -i /home/hadoopuser/.ssh/id_rsa.pub hadoopuser@(target server name)
  9. Then, ssh into each target server to verify that the copy succeeded and to add that host to the list of known hosts.

Be sure to do this for each server in the cluster!

Install Hadoop Files

Perform the following steps on each machine in the cluster:

  1. Obtain the binary tarball for Hadoop (wget http://www.apache.org/dist/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz)
  2. Extract the tarball into /opt (tar -xvf hadoop-2.7.2.tar.gz -C /opt)
  3. Create a symbolic link called hadoop pointing to the extracted hadoop folder (ln -s hadoop-2.7.2 hadoop)
  4. Create the folder /opt/hadoop2-resources/tmp (mkdir -p /opt/hadoop2-resources/tmp)
  5. Change the ownership (recursively) of hadoop2-resources to hadoopuser
  6. Change the ownership (recursively) of hadoop-2.7.2 to hadoopuser
  7. Change the group ownership (recursively) of hadoop2-resources to hadoopgroup
  8. Change the group ownership (recursively) of hadoop-2.7.2 to hadoopgroup (commands for steps 5-8 are sketched after this list)
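
A minimal sketch of the ownership changes in steps 5-8, run from /opt on each machine:

cd /opt
chown -R hadoopuser hadoop2-resources hadoop-2.7.2
chgrp -R hadoopgroup hadoop2-resources hadoop-2.7.2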

Configure Hadoop

Edit the following files in /opt/hadoop/etc/hadoop.

core-site.xml

In the <configuration> tag, add the following:

   	<property>
           	<name>fs.defaultFS</name>
           	<value>hdfs://*slavenode1/</value>
           	<description>NameNode URI</description>
   	</property>

This designates *slavenode1 as the NameNode for the cluster where ‘*’ is something to distinguish the purpose (e.g., dev, test, prod, etc.).

mapred-site.xml

In the <configuration> tag, add the following:

       <property>
               <name>yarn.app.mapreduce.am.resource.mb</name>
               <value>1024</value>
       </property>
       <property>
               <name>yarn.app.mapreduce.am.command-opts</name>
               <value>-Xmx768m</value>
       </property>
       <property>
               <name>mapreduce.framework.name</name>
               <value>yarn</value>
               <description>Execution framework.</description>
       </property>
       <property>
               <name>mapreduce.map.cpu.vcores</name>
               <value>1</value>
               <description>The number of virtual cores required for each map task.</description>
       </property>
       <property>
               <name>mapreduce.reduce.cpu.vcores</name>
               <value>1</value>
                <description>The number of virtual cores required for each reduce task.</description>
       </property>
       <property>
               <name>mapreduce.map.memory.mb</name>
               <value>1024</value>
               <description>Larger resource limit for maps.</description>
       </property>
       <property>
               <name>mapreduce.map.java.opts</name>
               <value>-Xmx768m</value>
               <description>Heap-size for child jvms of maps.</description>
       </property>
       <property>
               <name>mapreduce.reduce.memory.mb</name>
               <value>1024</value>
               <description>Larger resource limit for reduces.</description>
       </property>
       <property>
               <name>mapreduce.reduce.java.opts</name>
               <value>-Xmx768m</value>
               <description>Heap-size for child jvms of reduces.</description>
       </property>
       <property>
               <name>mapreduce.jobtracker.address</name>
               <value>*slavenode2:8021</value>
       </property>

hdfs-site.xml

In the <configuration> tag, add the following:

  <property>
           <name>dfs.datanode.data.dir</name>
           <value>file:///opt/hadoop2-resources/datanode</value>
           <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
   </property>
   <property>
           <name>dfs.namenode.name.dir</name>
           <value>file:///opt/hadoop2-resources/namenode</value>
           <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
   </property>

yarn-site.xml

In the <configuration> tag, add the following:

   <property>
           <name>yarn.scheduler.minimum-allocation-mb</name>
           <value>128</value>
           <description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
   </property>
   <property>
           <name>yarn.scheduler.maximum-allocation-mb</name>
           <value>2048</value>
           <description>Maximum limit of memory to allocate to each container request at the Resource Manager.</description>
   </property>
   <property>
           <name>yarn.scheduler.minimum-allocation-vcores</name>
           <value>1</value>
           <description>
               The minimum allocation for every container request at the RM, in terms of virtual CPU cores. 
               Requests lower than this won't take effect, and the specified value will get allocated the minimum.
           </description>
   </property>
   <property>
           <name>yarn.scheduler.maximum-allocation-vcores</name>
           <value>2</value>
           <description>
               The maximum allocation for every container request at the RM, in terms of virtual CPU cores. 
               Requests higher than this won't take effect, and will get capped to this value.
           </description>
   </property>
   <property>
           <name>yarn.nodemanager.resource.memory-mb</name>
           <value>4096</value>
           <description>Physical memory, in MB, to be made available to running containers</description>
   </property>
   <property>
           <name>yarn.nodemanager.resource.cpu-vcores</name>
           <value>4</value>
           <description>Number of CPU cores that can be allocated for containers.</description>
   </property>
   <property>
           <name>yarn.resourcemanager.hostname</name>
           <value>*masternode</value>
           <description>The hostname of the RM.</description>
   </property>
   <property>
           <name>yarn.nodemanager.aux-services</name>
           <value>mapreduce_shuffle</value>
           <description>shuffle service that needs to be set for Map Reduce to run </description>
   </property>

Format the Hadoop File System

  • Before formatting, set JAVA_HOME to /etc/alternatives/jre in /opt/hadoop/etc/hadoop/hadoop-env.sh (see the sketch below).
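
A sketch of the relevant line in /opt/hadoop/etc/hadoop/hadoop-env.sh (the same JRE path is used for HBase later in this document):

export JAVA_HOME=/etc/alternatives/jre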

Perform the following step on the NameNode machine only as the hadoopuser: In the /opt/hadoop/bin folder, run the command:

./hadoop namenode -format

Open the Appropriate Ports

On each node in the cluster, make sure that ports 8020 (NameNode RPC) and 8031 (ResourceManager resource tracker) are open and that the firewall on each machine permits connections.
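
As with the Zookeeper ports, this can be done with firewalld on CentOS 7; a minimal sketch:

firewall-cmd --permanent --add-port=8020/tcp   # NameNode RPC
firewall-cmd --permanent --add-port=8031/tcp   # ResourceManager resource tracker
firewall-cmd --reload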

Start the cluster

Since there are individual nodes with separate roles, those items need to be started up individually. While Hadoop does provide a script for starting up the entire cluster, it’s better for troubleshooting to perform these steps manually on the appropriate machines.

On the NameNode machine only

Start the NameNode with the command

/opt/hadoop/sbin/hadoop-daemon.sh start namenode

On all the machines in the cluster

Start the DataNodes with the command

/opt/hadoop/sbin/hadoop-daemon.sh start datanode

Check the logs and verify these items have started without errors. If all is well, you now have a running HDFS cluster. Details on how to monitor these logs are provided below.
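
A quick way to confirm that the HDFS daemons are up (a sketch, run as hadoopuser):

jps                                     # JDK tool; should list NameNode or DataNode on each machine
/opt/hadoop/bin/hdfs dfsadmin -report   # run on the NameNode; lists the live DataNodes

Once HDFS looks healthy, start Yarn.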

On the ResourceManager machine only

Start the ResourceManager with the command

/opt/hadoop/sbin/yarn-daemon.sh start resourcemanager

On all the machines in the cluster

Start the NodeManagers on each machine with the command

/opt/hadoop/sbin/yarn-daemon.sh start nodemanager

Check the logs and verify these items have started without errors. If all is well, you now have a running Yarn cluster.
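
Similarly, a quick sketch to confirm the Yarn daemons:

jps                                # should list ResourceManager or NodeManager
/opt/hadoop/bin/yarn node -list    # lists the NodeManagers registered with the ResourceManager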

Monitoring the Hadoop Cluster

Note: The filename is different for each machine. The “h03” in the examples below is the machine name.

The following logfiles can be viewed on each machine in the cluster:

/opt/hadoop/logs/yarn-hadoopuser-nodemanager-h03.log
/opt/hadoop/logs/hadoop-hadoopuser-datanode-h03.log

To view the log for the ResourceManager:

/opt/hadoop/logs/yarn-hadoopuser-resourcemanager-h01.log

To view the log for the NameNode:

/opt/hadoop/logs/hadoop-hadoopuser-namenode-h02.log


Install HBase

It's time to install HBase itself. It must also be installed on each server in the cluster.

Install HBase

One machine in the cluster must be designated as the Master. For this installation, that will be the same machine that was defined as the masternode in Hadoop.

  1. Download the HBase tarball (wget http://apache.org/dist/hbase/stable/hbase-1.2.2-bin.tar.gz)
  2. Extract the tarball into /opt, which will create the folder /opt/hbase-1.2.2 (tar -xvf hbase-1.2.2-bin.tar.gz -C /opt)
  3. Create a symbolic link to HBase (ln -s /opt/hbase-1.2.2 hbase)
  4. Set the ownership (recursively) of the hbase-1.2.2 folder to hadoopuser and hadoopgroup (chown -R hadoopuser:hadoopgroup /opt/hbase-1.2.2)

Configure HBase

Configure resource folders

Edit the following files, found in /opt/hbase/conf.

hbase-site.xml

Add the following lines inside the <configuration> tag.

   <property>
           <name>hbase.rootdir</name>
           <value>hdfs://*slavenode1/hbase</value>
   </property>
   <property>
           <name>hbase.zookeeper.property.dataDir</name>
           <value>/opt/zoo-datadirectory</value>
   </property>
   <property>
           <name>hbase.cluster.distributed</name>
           <value>true</value>
   </property>
   <property>
           <name>hbase.zookeeper.quorum</name>
           <value>*masternode,*slavenode1,*slavenode2</value>
   </property>
   <property>
           <name>hbase.zookeeper.property.clientPort</name>
           <value>2181</value>
   </property>

hbase-env.sh

For each server in the cluster, configure HBase to use the local Java JDK.

Uncomment the JAVA_HOME setting line and be sure it is set:

export JAVA_HOME=/etc/alternatives/jre

In the same file, near the end, is the setting HBASE_MANAGES_ZK. Set this to false so that Hbase will use the existing Zookeeper installation.
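
After these edits, the relevant lines in hbase-env.sh should read:

export JAVA_HOME=/etc/alternatives/jre
export HBASE_MANAGES_ZK=false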

regionservers

The file should only contain 2 lines:


slavenode1

slavenode2

backup-masters

This file should contain only one line:

slavenode1

zoo.cfg

Copy the file /opt/zookeeper/conf/zoo.cfg into /opt/hbase/conf.

Open Required Ports

On CentOS 7 the lokkit tool is not available, so open each port with firewalld instead:

sudo firewall-cmd --permanent --add-port=(port number)/tcp
sudo firewall-cmd --reload

Do this for ports 60000 through 60030. (Note: HBase 1.0 and later moved the default ports to 16000/16010 for the master and 16020/16030 for the region servers, so verify which ports your installation actually uses.)
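
Rather than issuing one command per port, firewalld can open the whole range in a single call; a minimal sketch:

sudo firewall-cmd --permanent --add-port=60000-60030/tcp
sudo firewall-cmd --reload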

Start HBase

Since HBase is relying on a separate Zookeeper install, we don’t want to use the start-all script in HBase. Doing so will force start its internal Zookeeper and cause conflicts. Instead, it is necessary to start its other daemons separately.

Run the following commands on the masternode machine:

  • Note the singular 'daemon' for the master and the plural 'daemons' for the regionservers.
/opt/hbase/bin/hbase-daemon.sh start master
/opt/hbase/bin/hbase-daemons.sh start regionserver
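
To confirm that the master and region servers came up, the HBase shell's status command can be used; a quick sketch:

echo "status" | /opt/hbase/bin/hbase shell

This should report the active master and the live region servers for the layout described above.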

Installing OpenTSDB

Download and setup the folders

  1. Change to the /opt folder (cd /opt)
  2. Clone the OpenTSDB source from GitHub, which will create the folder /opt/opentsdb (git clone git://github.com/OpenTSDB/opentsdb.git)
  3. Change the ownership (recursively) of opentsdb to hadoopuser
  4. Change the group ownership (recursively) of opentsdb to hadoopgroup
  5. OpenTSDB works off a configuration file that is shared between the daemon and the command line tools. Copy the ./src/opentsdb.conf file to a proper directory as documented in Configuration (see the sketch after this list) and edit the following required settings:
tsd.core.auto_create_metrics = true
tsd.storage.enable_compaction = false
tsd.storage.hbase.zk_quorum = masternode,slavenode1,slavenode2
tsd.http.request.enable_chunked = true
tsd.http.request.max_chunk = 33554432
tsd.storage.fix_duplicates = true
  6. Compile and install the source
  7. Run the commands:
cd /opt/opentsdb
./build.sh
cd build
make install
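
A sketch of step 5, assuming /etc/opentsdb is used as the configuration directory (believed to be one of the locations OpenTSDB searches by default; adjust if your deployment uses a different path):

mkdir -p /etc/opentsdb
cp /opt/opentsdb/src/opentsdb.conf /etc/opentsdb/opentsdb.conf
# edit /etc/opentsdb/opentsdb.conf and add the settings listed in step 5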

Create HBase tables

Run the command:

env COMPRESSION=NONE HBASE_HOME=/opt/hbase /opt/opentsdb/src/create_table.sh

Start OpenTSDB

On the masternode server, run the following command:

/opt/opentsdb/build/tsdb tsd &
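
If the configuration file is not in one of OpenTSDB's default search locations, pass it to the daemon explicitly; a sketch assuming the hypothetical /etc/opentsdb path from the previous section:

/opt/opentsdb/build/tsdb tsd --config=/etc/opentsdb/opentsdb.conf &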

Stopping & Starting the Cluster

It may be necessary to occasionally restart the entire cluster. Follow these steps as hadoopuser:

Stopping the Cluster

  • Stop the OpenTSDB application using the command: ps -ef | grep opentsdb then kill -9 <pid of opentsdb>
  • Stop the hbase regionserver on the Master server with the command /opt/hbase/bin/hbase-daemons.sh stop regionserver
  • Stop the Hbase master on the Master server with the command /opt/hbase/bin/hbase-daemon.sh stop master
  • Stop the Hadoop NodeManagers on each machine with the command /opt/hadoop/sbin/yarn-daemon.sh stop nodemanager
  • Stop the Hadoop ResourceManager on the Master server with the command /opt/hadoop/sbin/yarn-daemon.sh stop resourcemanager
  • Stop Hadoop DataNodes on each machine with the command /opt/hadoop/sbin/hadoop-daemon.sh stop datanode
  • Stop the Hadoop NameNode on Slave 1 with the command /opt/hadoop/sbin/hadoop-daemon.sh stop namenode
  • Stop Zookeeper on each machine with the command /opt/zookeeper/bin/zkServer.sh stop

Starting the Cluster

  • Start Zookeeper on each machine with the command /opt/zookeeper/bin/zkServer.sh start
  • Start the Hadoop NameNode on Slave 1 with the command /opt/hadoop/sbin/hadoop-daemon.sh start namenode
  • Start Hadoop DataNodes on each machine with the command /opt/hadoop/sbin/hadoop-daemon.sh start datanode
  • Start the Hadoop ResourceManager on the Master server with the command /opt/hadoop/sbin/yarn-daemon.sh start resourcemanager
  • Start the Hadoop NodeManagers on each machine with the command /opt/hadoop/sbin/yarn-daemon.sh start nodemanager
  • Start the Hbase master on the Master server with the command /opt/hbase/bin/hbase-daemon.sh start master
  • Start the hbase regionserver on the Master server with the command /opt/hbase/bin/hbase-daemons.sh start regionserver
  • Start the OpenTSDB application using the command: /opt/opentsdb/build/tsdb tsd &