Sunday, May 24, 2015

Cloudera Hadoop Cluster Installation on Ubuntu 14.04




Cloudera Hadoop Cluster Installation

Prerequisites: 

  • Ubuntu 14.04 is installed on all nodes in the cluster.
  • The system should be connected to the internet.


    For the installation of Ubuntu Linux, the following tutorial will be useful:
    Setting up a Hadoop cluster and monitoring it can be cumbersome for new users. Cloudera makes this simpler: Cloudera Manager simplifies the installation and configuration of a Hadoop cluster.
    Cloudera describes Cloudera Manager as the industry's first and most sophisticated management application for Apache Hadoop and the enterprise data hub. It automates the installation of the Oracle JDK, Cloudera Manager Server, CDH, the embedded PostgreSQL database, and the Cloudera Manager Agent with managed service software on cluster hosts. It also configures databases for the Cloudera Manager Server and Hive Metastore and, optionally, for Cloudera Management Service roles.
There are two methods of installing a Cloudera Hadoop cluster:

  1.  Automated Method
  2.  Manual Method

As I am a fan of automation, I will proceed with the automated method. Although it is called automated, most of the configuration settings still have to be made manually. To use this method, the server and cluster hosts must satisfy the following requirements:
  • You can log in to the Cloudera Manager Server host using the root account or an account that has password-less sudo permission.
  • The Cloudera Manager Server host has uniform SSH access, on the same port, to all hosts.
  • All hosts have static IP addresses and hostnames assigned.
I will explain these points in the steps below.


 Steps for Installing Cloudera Manager:

  1. Make User Account for Hadoop User, i.e. cloudera
  2. Make password less sudo access
  3. Assignment of static IP address and hostname
  4. Installation and Configuration of SSH
  5. Open the Relevant Ports
  6. Remove Previous Versions if partially installed
  7. Disable Firewall
  8. Disable Swapping
  9. Enable backward compatibility in Ubuntu 14.04
  10. Download Cloudera Manager Setup
  11. Run Cloudera Manager Installer


Now we will see each step in detail:

Step1: Make User Account for Hadoop User

Create a new user account for the Hadoop user. We will name the account ‘cloudera’ and set its password. Apply the same procedure on all nodes. Use the following commands:

sudo useradd -m -g users -s /bin/bash cloudera;
sudo passwd cloudera;

After that, log in with the ‘cloudera’ user account and proceed to the next step.
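
Because the same commands are repeated on every node, it can help to make them safe to re-run. The following is a minimal sketch and not part of the original procedure: `user_exists` and `ensure_user` are hypothetical helper names, and the `useradd` options are the ones used above.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original post): create the Hadoop user
# only when it does not exist yet, so the step can be re-run on any node.

# Succeeds when an account with the given name exists.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Creates the account with the same options as above, unless it already exists.
ensure_user() {
    if user_exists "$1"; then
        echo "user $1 already exists"
    else
        sudo useradd -m -g users -s /bin/bash "$1"
    fi
}

# Usage (run on each node), then set the password interactively:
#   ensure_user cloudera
#   sudo passwd cloudera
```

The existence check means a second run on a node reports the account instead of failing in `useradd`.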

 ------------------------------------------------------------------- 

Step2: Make password less sudo access

    Cloudera Manager installs the necessary software components and their services on the nodes in the Hadoop cluster. In Linux, installing a component requires root permission. There are two ways to get it: run the command from the root account, or prefix the command with the sudo keyword.
    A command prefixed with sudo still asks for a password, so we make sudo password-less so that it no longer prompts.
    Installing from the root account can be tedious, and some systems do not allow logging in as root at all. So it is better to create a dedicated account for the Hadoop user and make its sudo access password-less. We have already created such an account, named ‘cloudera’, on every host in the cluster.
Add the current user to the sudo group and make sudo access password-less. Note that the second command changes the rule for the whole sudo group, not only for the current user. Apply the same procedure on all nodes. Use the following commands:

sudo adduser `whoami` sudo;
sudo sed -i '/%sudo.*/c\%sudo   ALL=(ALL:ALL) NOPASSWD:ALL' /etc/sudoers;
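
A mistake in /etc/sudoers can lock you out of sudo, so it may be worth looking at the result before editing the file in place. The sketch below reuses the sed expression from the command above but prints the result instead of writing it; `preview_nopasswd` is a hypothetical helper name, and the one-line `c\text` form assumes GNU sed, as shipped with Ubuntu.

```shell
#!/bin/sh
# Hypothetical helper: print what a sudoers file would look like after the
# edit, without modifying it. Same sed expression as the command above
# (the one-line "c\text" form is a GNU sed feature).
preview_nopasswd() {
    sed '/%sudo.*/c\%sudo   ALL=(ALL:ALL) NOPASSWD:ALL' "$1"
}

# Usage: compare a copy of the current file with the proposed result.
#   sudo cat /etc/sudoers > /tmp/sudoers.current
#   preview_nopasswd /tmp/sudoers.current | diff /tmp/sudoers.current -
```

After the real edit, `sudo visudo -c` checks the sudoers syntax.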

 -------------------------------------------------------------------
Step3: Assignment of static IP address and hostname

    All the nodes in the cluster need static IP addresses in order to communicate with each other reliably.
    Hostnames can be set when installing the operating system, or manually with the following command:

sudo hostname server1;

Also set the hostname in the file /etc/hostname, which stores the hostname across reboots. Open this file with the following command:

sudo gedit /etc/hostname;

Now enter the new hostname [server1] and save the file. This changes the hostname permanently. Apply this procedure on every node whose hostname needs to change.

Note: Make sure the hostname is in lowercase.
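
The lowercase requirement is easy to check from the shell. A small sketch, with `is_lowercase` as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given name is non-empty and
# contains no uppercase letters.
is_lowercase() {
    [ -n "$1" ] && [ "$1" = "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" ]
}

# Usage: warn when the current hostname contains uppercase letters.
#   is_lowercase "$(hostname)" || echo "hostname contains uppercase letters"
```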

    After the hostname setup, we assign a static IP address. The link below will be helpful for assigning a static IP address.
    Set Static IP address in Ubuntu 14.04
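
For orientation, Ubuntu 14.04 reads static addresses from /etc/network/interfaces. The sketch below only prints a stanza for review. The interface name `eth0`, the netmask, and the gateway address are assumptions for illustration and do not come from this post, and `render_static_iface` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print an /etc/network/interfaces stanza for a static
# address. Arguments: interface, address, netmask, gateway.
render_static_iface() {
    printf 'auto %s\niface %s inet static\n    address %s\n    netmask %s\n    gateway %s\n' \
        "$1" "$1" "$2" "$3" "$4"
}

# Example with assumed values (adjust to your network):
#   render_static_iface eth0 192.168.10.11 255.255.255.0 192.168.10.1
```

The printed stanza would replace the existing entry for that interface in /etc/network/interfaces. DNS settings are left out of this sketch.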

    Now add the IP addresses and hostnames to the file /etc/hosts. This file lists each IP address followed by its hostname, and so maintains the IP address to hostname mapping for the network. Open this file with the following command:

sudo gedit /etc/hosts;

Now update this file and save it.

The hosts file should look like this, assuming two nodes in the network:
127.0.0.1          localhost
192.168.10.11      server1
192.168.10.12      server2

Repeat this procedure on each node. After performing these steps, reboot all the hosts.
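
Editing /etc/hosts by hand on every node invites typos and duplicate lines. As a sketch, the entries can be added to a copy of the file by a small function that skips names already present; `add_host_entry` is a hypothetical helper name, and the addresses are the example values from above.

```shell
#!/bin/sh
# Hypothetical helper: append "IP<TAB>NAME" to a hosts-style file unless a
# line for NAME already exists. Runs in a subshell so its variables stay local.
# NAME is used inside a regular expression, so a dot matches any character.
add_host_entry() (
    file=$1 ip=$2 name=$3
    if grep -qE "[[:space:]]$name([[:space:]]|\$)" "$file"; then
        echo "$name already present in $file"
    else
        printf '%s\t%s\n' "$ip" "$name" >> "$file"
    fi
)

# Usage: work on a copy, review it, then install it.
#   cp /etc/hosts /tmp/hosts.new
#   add_host_entry /tmp/hosts.new 192.168.10.11 server1
#   add_host_entry /tmp/hosts.new 192.168.10.12 server2
#   sudo cp /tmp/hosts.new /etc/hosts
```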
 
 -------------------------------------------------------------------

Step4: Installation and Configuration of SSH

    The host on which we install Cloudera Manager needs uniform SSH access to all hosts. SSH stands for Secure Shell. It is an encrypted network protocol for initiating shell sessions on remote machines in a secure way, and it was designed as a replacement for Telnet and other insecure remote shell protocols. For smooth and secure communication between the nodes in the cluster, we must install SSH and make it password-less. The procedure for installing and configuring SSH follows.
Install the OpenSSH server on all hosts with the following command:

sudo apt-get -y install openssh-server;

 Now configure SSH to allow password-less access. Enter the following commands on the host on which we are going to install Cloudera Manager. When ssh-keygen asks for a passphrase, leave it empty so that later logins do not prompt for one.

ssh-keygen;
ssh-copy-id server1;
ssh-copy-id server2;

Note: In this setup, the node on which Cloudera Manager is installed also acts as the master node (NameNode).

Check the SSH connections with the following command:

ssh server1;

The login should succeed without asking for a password. Now we are ready to proceed to the next step.
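
With more than a couple of nodes, checking each login by hand gets slow. The sketch below tries every host non-interactively: the ssh option `BatchMode=yes` makes ssh fail instead of asking for a password. The function names and the `SSH_BIN` override (useful for dry runs) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: verify password-less SSH to a list of hosts.
# SSH_BIN can be overridden for a dry run; it defaults to ssh.

# Runs "true" on the remote host. BatchMode=yes disables password prompts,
# so the call fails quickly when key-based login is not set up.
check_passwordless_ssh() {
    "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Prints one result line per host and fails if any host failed.
check_all_hosts() (
    failed=0
    for host in "$@"; do
        if check_passwordless_ssh "$host"; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    exit "$failed"
)

# Usage:
#   check_all_hosts server1 server2
```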

 -------------------------------------------------------------------

Step5: Open the Relevant Ports:
 
The incoming connection ports required for Cloudera Manager are often blocked by default. The following command allows incoming connections on the SSH port:

sudo iptables -A INPUT -p tcp --dport ssh -j ACCEPT;
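
The command above covers only the SSH port. If the firewall is kept enabled, other ports need the same kind of rule. The sketch below prints the rules for review instead of applying them. The port numbers in the usage line are an assumption based on Cloudera Manager's defaults (7180 for the web console, 7182 for agent-to-server traffic) and are not listed in this post; `print_open_port_rules` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print one iptables rule per TCP port, in the same
# form as the command above, so the rules can be reviewed before running.
print_open_port_rules() {
    for port in "$@"; do
        echo "sudo iptables -A INPUT -p tcp --dport $port -j ACCEPT"
    done
}

# Usage (assumed default ports; check the Cloudera documentation for your
# version before relying on them):
#   print_open_port_rules 22 7180 7182
```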
 
 -------------------------------------------------------------------

Step6: Remove Previous Versions if partially installed:

If a previous installation failed for some reason, clean up the files it left behind with the following command:

sudo rm -rf /usr/share/cmf /var/lib/cloudera* /var/cache/yum/cloudera*;
  
 -------------------------------------------------------------------

Step7: Disable Firewall:

Disable the firewall to prevent it from interfering with the installation.

sudo ufw disable;
  
 -------------------------------------------------------------------

Step8: Disable Swapping:

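In practice swapping cannot be disabled completely, but we can minimize it with the following command. The first part applies the setting immediately, and the second part appends it to /etc/sysctl.conf so that it survives a reboot.

sudo sysctl vm.swappiness=0 && echo 'vm.swappiness = 0' | sudo tee -a /etc/sysctl.conf;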
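
Because tee -a appends, running the command above twice leaves duplicate lines in /etc/sysctl.conf. A sketch of a re-runnable alternative follows; `set_conf_line` is a hypothetical helper that replaces an existing `key = value` line or appends a new one.

```shell
#!/bin/sh
# Hypothetical helper: make sure FILE contains exactly one "KEY = VALUE" line.
# Runs in a subshell so its variables stay local. KEY is used inside a
# regular expression, so a dot in it matches any character.
set_conf_line() (
    file=$1 key=$2 value=$3
    tmp=$(mktemp)
    # Drop any existing assignment of the key, then append the new one.
    grep -v "^[[:space:]]*$key[[:space:]]*=" "$file" > "$tmp" || true
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
)

# Usage: edit a copy, install it, then reload the settings.
#   cp /etc/sysctl.conf /tmp/sysctl.conf.new
#   set_conf_line /tmp/sysctl.conf.new vm.swappiness 0
#   sudo cp /tmp/sysctl.conf.new /etc/sysctl.conf && sudo sysctl -p
```

The active value can be read back with `cat /proc/sys/vm/swappiness`.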

 -------------------------------------------------------------------

Step9: Enable backward compatibility in Ubuntu 14.04:

Enable backward compatibility in Ubuntu 14.04 with the following command. It creates an APT preferences file that gives packages from the Cloudera repository a pin priority of 501.

printf "Package: *\nPin: release o=Cloudera, l=Cloudera\nPin-Priority: 501\n" | sudo tee -a /etc/apt/preferences.d/cloudera.pref;
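
As with the previous step, tee -a appends, so re-running the command duplicates the entry in cloudera.pref. A sketch that keeps the pin text in one place and overwrites the file instead; `render_cloudera_pin` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the APT pin used above, one field per line.
render_cloudera_pin() {
    printf 'Package: *\nPin: release o=Cloudera, l=Cloudera\nPin-Priority: 501\n'
}

# Usage: tee without -a overwrites the file, so the step can be repeated.
#   render_cloudera_pin | sudo tee /etc/apt/preferences.d/cloudera.pref
```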

 -------------------------------------------------------------------

Step10: Download Cloudera Manager Setup:

Download the latest version of the Cloudera Manager installer with the following command:

wget http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin;
  
 -------------------------------------------------------------------

Step11: Run Cloudera Manager Installer:

Now make the installer executable and run it with the following command to begin the installation.

sudo chmod +x cloudera-manager-installer.bin && sudo ./cloudera-manager-installer.bin;

The installation will then start. Follow the instructions and proceed with the default settings.
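
The Cloudera Manager server can take a while to come up after the installer finishes. A generic retry loop is one way to wait for it. This is a sketch: `retry` is a hypothetical helper, and the URL in the usage line assumes Cloudera Manager's default web port 7180 on the local host and an installed curl, neither of which is stated in this post.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping DELAY
# seconds between tries. Succeeds as soon as the command succeeds.
retry() (
    attempts=$1 delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            exit 0
        fi
        n=$((n + 1))
        # Sleep only when another attempt will follow.
        [ "$n" -le "$attempts" ] && sleep "$delay"
    done
    exit 1
)

# Usage (assumed URL and port; adjust to your server host):
#   retry 30 10 curl -fsS -o /dev/null http://localhost:7180/
```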

After the installation completes, open the following URL in a browser to start Cloudera Manager.


A login prompt will appear. Log in with the following credentials:

UserName: admin           
Password: admin


Congratulations! The Cloudera Hadoop cluster is now ready.


