Cloudera Hadoop Cluster Installation
Prerequisites:
- Ubuntu 14.04 is installed on all nodes in the cluster.
- All nodes should be connected to the internet.
For installing Ubuntu Linux, the following tutorial will be useful:
Setting up and monitoring a Hadoop cluster turns out to be cumbersome for new users. Cloudera makes this simpler: Cloudera Manager simplifies the installation and configuration of a Hadoop cluster.
Cloudera describes Cloudera Manager as the industry's first and most sophisticated management application for Apache Hadoop and the enterprise data hub. Cloudera Manager automates the installation of the Oracle JDK, the Cloudera Manager Server, CDH, an embedded PostgreSQL database, and the Cloudera Manager Agent with managed service software on cluster hosts. It also configures databases for the Cloudera Manager Server and the Hive Metastore and, optionally, Cloudera Management Service roles.
There are basically two methods of installing a Cloudera Hadoop cluster:
- Automated Method
- Manual Method
As I am a fan of automating things, I will proceed with the automated method. Although it is called automated, most of the configuration settings still have to be done manually. To use this method, the server and cluster hosts must satisfy the following system requirements:
- Provide the ability to log in to the Cloudera Manager Server host using a root account or an account that has password-less sudo permission.
- Allow the Cloudera Manager Server host to have uniform SSH access on the same port to all hosts.
- Have static IP addresses and hostnames assigned.
I will explain these points in the steps below.
Steps for Installing Cloudera Manager:
- Make a User Account for the Hadoop User, i.e. cloudera
- Make sudo Access Password-less
- Assign Static IP Addresses and Hostnames
- Install and Configure SSH
- Open the Relevant Ports
- Remove Previous Versions if Partially Installed
- Disable the Firewall
- Disable Swapping
- Enable Backward Compatibility in Ubuntu 14.04
- Download the Cloudera Manager Setup
- Run the Cloudera Manager Installer
Now we will look at each step in detail:
Step 1: Make a User Account for the Hadoop User
Make a new user account for the Hadoop user. We will create a new user account named ‘cloudera’ and set its password. Apply the same procedure on all nodes, using the following commands:
sudo useradd -m -g users -s /bin/bash cloudera;
sudo passwd cloudera;
After that, log in with the ‘cloudera’ user account and proceed to the next step.
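Because Step 1 has to be repeated on every node, it helps if the commands are safe to run more than once. The following is a minimal sketch, assuming bash; the helper names user_exists and ensure_hadoop_user are hypothetical and not part of any Cloudera tooling. It uses the same useradd options as above but skips accounts that already exist.

```shell
#!/usr/bin/env bash
# Sketch: create the Hadoop user only if it does not already exist,
# so the same snippet can be re-run safely on every node.
# Helper names are hypothetical, chosen for this example.

# Returns 0 if the given account exists on this host.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Creates the account with the options used above (home directory,
# primary group "users", bash login shell), then sets its password.
ensure_hadoop_user() {
    local name="${1:-cloudera}"
    if user_exists "$name"; then
        echo "user '$name' already exists, skipping"
    else
        sudo useradd -m -g users -s /bin/bash "$name"
        sudo passwd "$name"
    fi
}
```

On each node, calling ensure_hadoop_user cloudera either creates the account (prompting for the password) or reports that it is already there.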
-------------------------------------------------------------------
Step 2: Make sudo Access Password-less
This step is about password-less sudo access. Cloudera Manager installs the necessary software components and their services on the nodes of the Hadoop cluster. In Linux, installing any component requires root permission. There are two options for this: either we execute the commands from the root account, or we prefix them with the sudo keyword.
A command prefixed with sudo still asks for a password, so we should make sudo password-less so that it no longer prompts for one.
Doing the installation from the root account can be tedious, and some systems don't even allow logging in as the root user. So it is better to create a special user account for the Hadoop user and make its sudo access password-less. We have already created the user account ‘cloudera’ on every host in the cluster.
Add the current user to the sudo group and make its sudo access password-less. Apply the same procedure on all nodes, using the following commands:
sudo adduser `whoami` sudo;
sudo sed -i '/^%sudo/c\%sudo ALL=(ALL:ALL) NOPASSWD:ALL' /etc/sudoers;
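Editing /etc/sudoers in place with sed works, but a typo in that file can break sudo entirely. As an alternative (an assumption on my part, not the method used above), the same rule can go into a drop-in file under /etc/sudoers.d, which Ubuntu's default /etc/sudoers includes, after checking its syntax with visudo -c -f. The function names below are hypothetical.

```shell
#!/usr/bin/env bash
# Alternative sketch: grant password-less sudo through a drop-in file
# instead of editing /etc/sudoers with sed. Function names are hypothetical.

# Writes a sudoers rule for USER into FILE with the 0440 mode sudo expects.
write_sudoers_dropin() {
    local user="$1" file="$2"
    printf '%s ALL=(ALL:ALL) NOPASSWD:ALL\n' "$user" > "$file"
    chmod 0440 "$file"
}

# Installs the rule only if visudo accepts its syntax, so a mistake
# cannot lock you out of sudo.
install_sudoers_dropin() {
    local user="${1:-$(whoami)}"
    local tmp
    tmp="$(mktemp)"
    write_sudoers_dropin "$user" "$tmp"
    if sudo visudo -c -f "$tmp" >/dev/null; then
        sudo install -m 0440 -o root -g root "$tmp" "/etc/sudoers.d/$user"
    else
        echo "sudoers syntax check failed; nothing installed" >&2
    fi
    rm -f "$tmp"
}
```

Running install_sudoers_dropin as the cloudera user creates /etc/sudoers.d/cloudera; afterwards, sudo -n true should succeed without prompting.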
-------------------------------------------------------------------
Step 3: Assignment of Static IP Address and Hostname
This step is about assigning static IP addresses and hostnames. All the nodes in the cluster need a static IP address in order to communicate with each other reliably.
Hostnames can be set up when installing the operating system, or we can set the hostname manually with the following command:
sudo hostname server1;
Also set the hostname in the file /etc/hostname, which stores the hostname permanently. Open this file with the following command:
sudo gedit /etc/hostname;
Now enter the new hostname [server1] and save the file. This changes the hostname permanently. Apply this procedure on all nodes whose hostnames need to change.
Note:
Make sure the hostname is in lowercase.
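The hostname can also be set without opening an editor, and converting it to lowercase at the same time guards against the mistake the note warns about. A minimal sketch, assuming bash; to_lowercase_hostname and apply_hostname are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: set the hostname without an editor and force it to lower case,
# as the note above requires. Function names are hypothetical.

# Prints the given name converted to lower case.
to_lowercase_hostname() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Sets the running hostname and writes it to /etc/hostname.
apply_hostname() {
    local name
    name="$(to_lowercase_hostname "$1")"
    sudo hostname "$name"
    printf '%s\n' "$name" | sudo tee /etc/hostname >/dev/null
}
```

For example, apply_hostname Server1 sets the hostname to server1 both for the running system and in /etc/hostname.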
After setting up the hostname, we will assign a static IP address. The link below will be helpful for assigning a static IP address:
Set Static IP Address in Ubuntu 14.04
Now add the IP addresses and hostnames to the file /etc/hosts. This file maintains the mapping between IP addresses and hostnames in the network: each line contains an IP address followed by the corresponding hostname. Open this file with the following command:
sudo gedit /etc/hosts;
Now update this file and save it.
The hosts file should look like this, assuming two nodes in the network:
127.0.0.1 localhost
192.168.10.11 server1
192.168.10.12 server2
Repeat this procedure on each node. After performing these steps, reboot all the hosts.
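Since the same entries have to be added on every node, a small helper that appends a mapping only when the hostname is not already listed avoids duplicate lines if the step is repeated. This is a sketch, assuming bash and awk; add_host_entry is a hypothetical name, and the file argument exists only so the function can be tried on a scratch copy.

```shell
#!/usr/bin/env bash
# Sketch: add an "IP hostname" line to a hosts file only when the hostname
# is not already mapped. add_host_entry is a hypothetical helper.

add_host_entry() {
    local ip="$1" name="$2" file="${3:-/etc/hosts}"
    # Exit status 0 from awk means some non-comment line already lists the name.
    if awk -v n="$name" '!/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$file"; then
        echo "$name already present in $file"
    else
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}
```

Appending to /etc/hosts itself needs root, so run it from a root shell, e.g. add_host_entry 192.168.10.12 server2. Afterwards, getent hosts server2 should print the mapped address.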
-------------------------------------------------------------------
Step 4: Installation and Configuration of SSH
This step provides uniform SSH access to all hosts from the host on which we are installing Cloudera Manager. SSH (Secure Shell) is a cryptographic network protocol for starting shell sessions on remote machines securely. It was designed as a replacement for Telnet and other insecure remote shell protocols, and all SSH communication is protected by encryption. For smooth and secure communication between the nodes in the cluster, we must install SSH and make it password-less. The installation and configuration procedure is as follows.
Install the OpenSSH server on all hosts with the following command:
sudo apt-get -y install openssh-server;
Now configure SSH to allow password-less access. Enter the following commands on the host on which we are going to install Cloudera Manager:
ssh-keygen;
ssh-copy-id server1;
ssh-copy-id server2;
Note: The node on which Cloudera Manager is installed will automatically become the NameNode, i.e. the master node.
Check the SSH connection with the following command:
ssh server1;
The login should succeed without a password. Now we are ready to proceed to the next step.
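With more than a couple of nodes, copying the key and testing the login host by host gets repetitive. The sketch below, assuming bash and the /etc/hosts layout from Step 3, reads the cluster hostnames from the hosts file, runs ssh-copy-id for each, and then checks the login with ssh -o BatchMode=yes, which fails instead of prompting if key authentication does not work. The names cluster_hosts and setup_passwordless_ssh are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: push the SSH key to every cluster host listed in a hosts file
# and confirm password-less login. Function names are hypothetical.

# Prints hostnames from lines whose address is IPv4 and not loopback (127.x),
# which skips comments, localhost and the IPv6 entries Ubuntu adds by default.
cluster_hosts() {
    awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && $1 !~ /^127\./ && NF >= 2 { print $2 }' "${1:-/etc/hosts}"
}

setup_passwordless_ssh() {
    local hosts_file="${1:-/etc/hosts}" h
    mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
    # Generate an RSA key pair with an empty passphrase if none exists yet.
    [ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
    for h in $(cluster_hosts "$hosts_file"); do
        ssh-copy-id "$h"
        # BatchMode makes ssh fail instead of asking for a password.
        if ssh -o BatchMode=yes "$h" true; then
            echo "ok: $h"
        else
            echo "FAILED: $h" >&2
        fi
    done
}
```

Run setup_passwordless_ssh as the cloudera user on the Cloudera Manager host; ssh-copy-id asks for each host's password once, and every host should then report ok.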
-------------------------------------------------------------------
Step 5: Open the Relevant Ports:
In some setups, incoming connections on the ports required by Cloudera Manager are blocked by default. The following command accepts incoming SSH connections (TCP port 22), which Cloudera Manager uses to reach the cluster hosts:
sudo iptables -A INPUT -p tcp --dport ssh -j ACCEPT;
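The command above only opens SSH. If other ports also need to be opened, the rules can be generated from a list and reviewed before applying them. This sketch assumes bash; the port list is an assumption (22 for SSH, 7180 for the Cloudera Manager web UI and 7182 for agent heartbeats to the server, which are Cloudera Manager's defaults), so check Cloudera's port documentation for the services you actually deploy. accept_rules and CM_PORTS are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: print iptables ACCEPT rules for a list of TCP ports so they can be
# reviewed first (a dry run) and then piped to a root shell.
# The port list below is an assumption; adjust it to your deployment.

accept_rules() {
    local port
    for port in "$@"; do
        printf 'iptables -A INPUT -p tcp --dport %s -j ACCEPT\n' "$port"
    done
}

# 22: SSH, 7180: Cloudera Manager web UI, 7182: agent heartbeats to the server.
CM_PORTS=(22 7180 7182)
```

Review the output of accept_rules "${CM_PORTS[@]}", then apply it with accept_rules "${CM_PORTS[@]}" | sudo sh. Rules added this way with plain iptables do not survive a reboot.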
-------------------------------------------------------------------
Step 6: Remove Previous Versions if Partially Installed:
If a previous installation failed for some reason, clean up the partially installed files with the following command:
sudo rm -rf /usr/share/cmf /var/lib/cloudera* /var/cache/yum/cloudera*;
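Because rm -rf with wildcards deletes without asking, it can be worth listing what actually exists first. A minimal sketch, assuming bash; leftover_paths is a hypothetical name, and its optional prefix argument exists only so the function can be tried against a scratch directory (leave it empty for the real system).

```shell
#!/usr/bin/env bash
# Sketch: dry run for Step 6. Prints which of the leftover paths exist,
# without deleting anything. The optional prefix is a hypothetical test aid.

leftover_paths() {
    local root="${1:-}" p
    # Unmatched globs stay literal and are filtered out by the -e test.
    for p in "$root"/usr/share/cmf "$root"/var/lib/cloudera* "$root"/var/cache/yum/cloudera*; do
        if [ -e "$p" ]; then
            echo "$p"
        fi
    done
}
```

Run leftover_paths with no arguments, check the list, and only then run the rm -rf command above.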
-------------------------------------------------------------------
Step 7: Disable Firewall:
Disable the firewall to prevent it from interfering with the installation:
sudo ufw disable;
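To confirm the firewall is really off, sudo ufw status prints "Status: inactive" on its first line when ufw is disabled. The sketch below, assuming bash, checks that text; firewall_inactive is a hypothetical name, and it reads from standard input so the check itself does not need root.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the piped-in `ufw status` output reports inactive.
# firewall_inactive is a hypothetical helper name.

firewall_inactive() {
    grep -q '^Status: inactive'
}
```

Usage: sudo ufw status | firewall_inactive && echo "firewall is off".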
-------------------------------------------------------------------
Step 8: Disable Swapping:
Instead of disabling swapping outright, we can prevent it as much as possible by setting the kernel parameter vm.swappiness to 0, both for the running system (sysctl) and permanently (/etc/sysctl.conf), with the following command:
sudo sysctl vm.swappiness=0 && echo 'vm.swappiness = 0' | sudo tee -a /etc/sysctl.conf;
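The command above appends a new line every time it is run. The sketch below, assuming bash and GNU sed, rewrites an existing vm.swappiness line in place or appends one if none exists, so repeating the step leaves a single setting. set_swappiness_conf is a hypothetical name; the file argument defaults to /etc/sysctl.conf.

```shell
#!/usr/bin/env bash
# Sketch: persist vm.swappiness without duplicate lines when the step is
# repeated. set_swappiness_conf is a hypothetical helper.

set_swappiness_conf() {
    local value="$1" conf="${2:-/etc/sysctl.conf}"
    if grep -q '^[[:space:]]*vm\.swappiness' "$conf"; then
        # Replace the existing (uncommented) setting in place.
        sed -i "s/^[[:space:]]*vm\.swappiness.*/vm.swappiness = $value/" "$conf"
    else
        printf 'vm.swappiness = %s\n' "$value" >> "$conf"
    fi
}
```

Run it from a root shell as set_swappiness_conf 0, then sysctl -p to load the file; cat /proc/sys/vm/swappiness shows the value in effect.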
-------------------------------------------------------------------
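Step 9: Enable Backward Compatibility in Ubuntu 14.04:
On Ubuntu 14.04, create an apt preference that pins packages from the Cloudera repository at priority 501 (above apt's default of 500), so they take precedence over Ubuntu's own packages of the same name. Use the following command:
printf "Package: *\nPin: release o=Cloudera, l=Cloudera\nPin-Priority: 501\n" | sudo tee -a /etc/apt/preferences.d/cloudera.pref;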
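Because tee -a appends, running the command twice duplicates the stanza. A small variant, assuming bash, prints the same three lines from a function so they can be written with tee (without -a), overwriting the file instead. cloudera_pin is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: print the apt pin stanza used in Step 9 so it can be written
# with plain `tee` (overwrite) rather than `tee -a` (append).
# cloudera_pin is a hypothetical helper name.

cloudera_pin() {
    printf 'Package: *\nPin: release o=Cloudera, l=Cloudera\nPin-Priority: 501\n'
}
```

Usage: cloudera_pin | sudo tee /etc/apt/preferences.d/cloudera.pref >/dev/null. Once the Cloudera repository has been added, apt-cache policy shows the priorities apt is using.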
-------------------------------------------------------------------
Step 10: Download Cloudera Manager Setup:
Download the latest version of the Cloudera Manager installer with the following command:
wget http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin;
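If the download fails, wget can still leave an empty file behind, which then fails confusingly in the next step. The sketch below, assuming bash, downloads the same URL, stops if wget fails or the file is empty, and marks it executable. fetch_installer and INSTALLER_URL are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: download the installer and stop if the file is missing or empty.
# fetch_installer and INSTALLER_URL are hypothetical names.

INSTALLER_URL="http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin"

fetch_installer() {
    local dest="${1:-cloudera-manager-installer.bin}"
    if ! wget -q -O "$dest" "$INSTALLER_URL"; then
        echo "download failed: $INSTALLER_URL" >&2
        return 1
    fi
    # -s: the file exists and is not empty.
    if [ ! -s "$dest" ]; then
        echo "downloaded file is empty: $dest" >&2
        return 1
    fi
    chmod +x "$dest"
}
```

After fetch_installer succeeds, the installer can be run as in Step 11.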
-------------------------------------------------------------------
Step 11: Run Cloudera Manager Installer:
Now run the Cloudera Manager installer with the following command to begin the installation:
sudo chmod +x cloudera-manager-installer.bin && sudo ./cloudera-manager-installer.bin;
The installation will then start. Follow the instructions and proceed with the default settings.
After the installation completes, open the following URL in a browser to start Cloudera Manager.
A login prompt will then appear; log in with the following credentials:
Username: admin
Password: admin
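The Cloudera Manager Server can take a while to start after the installer finishes, so the login page may not load right away. This sketch, assuming bash (it relies on bash's /dev/tcp redirection), polls a TCP port until it accepts connections or a timeout expires. It assumes Cloudera Manager's default web UI port 7180; wait_for_port is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: wait until HOST:PORT accepts TCP connections, checking every
# 5 seconds up to TIMEOUT seconds. Uses bash's /dev/tcp, so it needs bash.
# wait_for_port is a hypothetical helper name.

wait_for_port() {
    local host="$1" port="$2" timeout="${3:-300}" waited=0
    # The subshell opens fd 3 to the port and closes it again on exit.
    until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "gave up after ${timeout}s waiting for $host:$port" >&2
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
    echo "$host:$port is accepting connections"
}
```

For example, wait_for_port server1 7180 returns as soon as the web UI port is open, or gives up after five minutes.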
Congratulations!
The Cloudera Hadoop cluster is now ready.