1 - Virtual Machine Installation & Configuration Guide

We will install and configure virtual machines using VirtualBox. This tutorial assumes that:

  • You have already installed Ubuntu 18.04.4 LTS on your local computer.
  • You have updated and upgraded all packages.
  • You have a valid internet connection.

1.1. Download Image File and VirtualBox

  • Install VirtualBox from the command line:
sudo apt install virtualbox

  • Download the Ubuntu Server image (ISO) file from the command line:

cd ~/Downloads
wget http://releases.ubuntu.com/bionic/ubuntu-18.04.4-live-server-amd64.iso
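A large ISO download can occasionally arrive corrupted, so it may be worth comparing its SHA-256 digest with the published value before booting from it. The sketch below is one way to do that. `verify_sha256` is a hypothetical helper name, and it assumes you copy the expected digest by hand from the SHA256SUMS file that Ubuntu release directories normally publish next to the ISO (the tutorial itself does not mention this file).

```shell
# Compare a file's SHA-256 digest with an expected value.
# Usage: verify_sha256 <file> <expected-hex-digest>
verify_sha256() {
    file=$1
    expected=$2
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha256sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}
```

A call would look like `verify_sha256 ~/Downloads/ubuntu-18.04.4-live-server-amd64.iso <digest>`, where `<digest>` is the value listed for that file name.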

1.2. Create First Slave Machine

  • First, start VirtualBox with the command below:
virtualbox &
  • Click "New" button.
  • Enter Name as "slave-1", Type as "Linux", Version as "Ubuntu (64-bit)".

SS-2-1

  • Select the amount of memory (RAM). We chose 2048 MB (2 GB) arbitrarily; adjust it to your needs.

SS-2-2

  • Choose to create a new virtual hard disk.

SS-2-3

  • Choose VDI (VirtualBox Disk Image).

SS-2-4

  • A dynamically allocated disk only takes up host space as it fills, so we will choose it. A fixed-size disk would also be a reasonable choice if we were sure that the disk size will not need to change in the future.

SS-2-5

  • We will enter the name of the new virtual hard disk file as "slave-1". This name is arbitrary, so you can change it. The size of the virtual hard disk is important: set it to at least 20 GB. We will set it to 50 GB to be safe.

SS-2-6

  • Choose the Ubuntu ISO you downloaded earlier.

SS-2-7

SS-2-8

SS-2-9

  • Go through the initial screens and select the options you need. In the Profile setup step, we will set "Your Name" to spark-user, "Your server's name" to slave-1, and "Pick a username" to spark-user. The names themselves can differ, but the username should be the same on all machines to keep later SSH connections simple.

SS-2-10

Note: You can choose a different username. It is entirely up to you.

  • Select "Install OpenSSH Server".

SS-2-11

  • No need to install any packages listed here. We can install them after the installation if needed.

SS-2-12

  • Reboot after installation and you are done!
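The GUI steps above for creating the machine can also be scripted with VirtualBox's `VBoxManage` command-line tool. The following is a minimal sketch under several assumptions: `run` and `create_vm` are hypothetical helper names, the disk path assumes the default "VirtualBox VMs" folder, the controller name "SATA" is arbitrary, and option names may vary between VirtualBox versions. By default it only prints the commands (dry run); it does not replace the interactive Ubuntu installer steps.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create a VM roughly matching the GUI choices above.
# Usage: create_vm <name> <iso-path>
create_vm() {
    name=$1
    iso=$2
    disk="$HOME/VirtualBox VMs/$name/$name.vdi"   # assumed default VM folder

    # Register a 64-bit Ubuntu VM with 2048 MB of RAM.
    run VBoxManage createvm --name "$name" --ostype Ubuntu_64 --register
    run VBoxManage modifyvm "$name" --memory 2048
    # 50 GB dynamically allocated VDI (the size is given in MB).
    run VBoxManage createmedium disk --filename "$disk" --size 51200 --format VDI --variant Standard
    # Attach the disk and the installer ISO to a SATA controller.
    run VBoxManage storagectl "$name" --name SATA --add sata
    run VBoxManage storageattach "$name" --storagectl SATA --port 0 --device 0 --type hdd --medium "$disk"
    run VBoxManage storageattach "$name" --storagectl SATA --port 1 --device 0 --type dvddrive --medium "$iso"
}
```

Calling `create_vm slave-1 ~/Downloads/ubuntu-18.04.4-live-server-amd64.iso` prints the commands for review; setting `DRY_RUN=0` first would execute them.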

1.3. Create Second Slave Machine

You don't have to repeat all of these steps. Simply clone the first virtual machine.

  • Shut machine "slave-1" down.
  • Right-click on "slave-1" and select the "Clone" option. Rename the machine to "slave-2".

SS-3-1

  • Choose the "Full clone" option and click Clone.

SS-3-2

  • Our second virtual machine is ready, but its hostname is still "slave-1". To fix this, start slave-2 and log in with the credentials you defined during the installation of slave-1.

SS-3-3

  • After login, run the command below:
sudo hostnamectl set-hostname slave-2 # Change hostname to slave-2
exit # Logout to check the new host-name
  • If you see a screen like this, then everything is fine:

SS-3-4

We now have two virtual Ubuntu 18 machines. Note that you could have cloned the machine later instead. If you do, remember to change every machine-specific setting on the clone, such as the hostname and IP address.

1.4. Network & SSH Configuration of Both Slave Machines

We can configure IP addresses and SSH to make the system maintainable and easy to use.

  • First, update the systems. Log into both machines and run the command:
sudo apt update && sudo apt upgrade
  • Log into one machine (say, "slave-1") and run the command ifconfig. You will see an IP address like 10.0.X.Y, as shown below:

SS-4-1

This is the default IP address of a virtual machine. Every machine can reach the internet with this address, but the machines cannot connect to each other directly. So we need a private network for the 3 machines, and in this setup their IP addresses must follow the pattern "192.168.X.Y".

  • If your local machine is connected to the internet (otherwise you could not have updated the virtual machines), it should already have a private network IP address that starts with "192.168". We need to determine which network interface on the local machine belongs to this network. So, run the command ifconfig on your local machine. The output will look like this:

SS-4-2

The network interface we should remember is enp3s0 in this case.
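If `ifconfig` is not available, or you would rather not read the interface name off the output by eye, the same information can be filtered from the `ip` tool that ships with Ubuntu. This is a small sketch; `private_ifaces` is a hypothetical helper name, and it assumes the one-line-per-address format printed by `ip -o -4 addr show`.

```shell
# Read `ip -o -4 addr show` output on stdin and print "<interface> <address>"
# for every IPv4 address that starts with 192.168.
private_ifaces() {
    awk '$3 == "inet" && $4 ~ /^192\.168\./ { sub(/\/.*/, "", $4); print $2, $4 }'
}
```

Run it as `ip -o -4 addr show | private_ifaces`. On the local machine used in this tutorial, the interface name printed would be enp3s0.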

  • In VirtualBox, right-click each virtual machine, open Settings and go to the Network tab. You should see a "NAT" adapter attached. Change it to "Bridged Adapter" and set the name to "enp3s0" (i.e. the network interface we found). You don't need any other adapter. Do this for both machines, as shown below:

SS-4-3 SS-4-4

Note: In a company or other restricted network, new devices on the private network may be blocked and unable to reach the internet. In this case, contact the system admin and ask them to allow the virtual machines' MAC addresses. Alternatively, the system admin can allow only the IP addresses, but then the same work has to be repeated whenever the IP addresses of the virtual machines change. Also, make sure the IP addresses of your virtual machines do not collide with those of any other machine on the network.
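The adapter change described above can also be applied from the command line with `VBoxManage modifyvm`. The sketch below assumes both virtual machines are powered off and that adapter 1 is the one to change; `bridge_vms` and the `VBOXMANAGE` override variable are hypothetical names introduced for illustration.

```shell
# Command used to talk to VirtualBox; override it (e.g. VBOXMANAGE=echo)
# to preview the calls without changing anything.
VBOXMANAGE=${VBOXMANAGE:-VBoxManage}

# Switch network adapter 1 of each named VM to bridged mode.
# Usage: bridge_vms <host-interface> <vm-name>...
bridge_vms() {
    iface=$1
    shift
    for vm in "$@"; do
        # modifyvm only works while the VM is powered off.
        $VBOXMANAGE modifyvm "$vm" --nic1 bridged --bridgeadapter1 "$iface"
    done
}
```

For this tutorial's setup the call would be `bridge_vms enp3s0 slave-1 slave-2`.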

  • Log in to both machines and run the command ifconfig. You will see that each machine now has its own IP address starting with "192.168", as shown below:

SS-4-5 SS-4-6

We already know that our local machine's IP address is 192.168.10.107, so we can make a table like this:

| Host-Name | IP Address     | Info                      |
|-----------|----------------|---------------------------|
| master    | 192.168.10.107 | Local Ubuntu 18 Machine   |
| slave-1   | 192.168.10.140 | Virtual Ubuntu 18 Machine |
| slave-2   | 192.168.10.141 | Virtual Ubuntu 18 Machine |

  • On all machines, we will add these hostnames so that the machines can refer to each other by name. Open the /etc/hosts file with your favorite text editor (gedit, GNU Emacs, Nano, vim, etc.) and add the following lines to the top of the file:
192.168.10.107	master
192.168.10.140	slave-1	
192.168.10.141	slave-2

The /etc/hosts file on each machine will then look like this:

SS-4-7 SS-4-8 SS-4-9
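Editing /etc/hosts by hand on three machines is easy to get slightly wrong, so the same change can be sketched as a small function that adds an entry at the top of the file only when that exact IP and name pair is missing. `add_host_entry` is a hypothetical helper name. The function takes the file path as an argument so it can be tried on a copy first; writing to the real /etc/hosts requires root privileges.

```shell
# Insert "<ip><TAB><name>" at the top of a hosts file unless that exact
# mapping is already present.
# Usage: add_host_entry <hosts-file> <ip> <name>
add_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    # Look for a line whose first field is the IP and that lists the name.
    if awk -v ip="$ip" -v n="$name" \
        '$1 == ip { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' \
        "$hosts_file"; then
        return 0
    fi
    tmp=$(mktemp)
    { printf '%s\t%s\n' "$ip" "$name"; cat "$hosts_file"; } > "$tmp"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$hosts_file"
    rm -f "$tmp"
}
```

For example, `add_host_entry ./hosts.copy 192.168.10.140 slave-1` can be run twice in a row and will still leave a single slave-1 line in the file.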

  • To make sure that everything is fine so far, ping all machines from the local machine (press Ctrl+C to stop each ping):
ping master
ping slave-1	
ping slave-2
  • You should see that the ICMP packets reach the master and slave machines:

SS-4-10
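Since a plain `ping` keeps running until it is interrupted, the check above can also be written as a loop that sends a fixed number of packets to each host and reports the result. This is a sketch that assumes the Linux (iputils) `ping` options `-c` and `-W`; `check_hosts` and the `PING_CMD` override variable are hypothetical names.

```shell
# Probe command: two echo requests with a 2-second reply timeout.
PING_CMD=${PING_CMD:-ping -c 2 -W 2}

# Print one status line per host; return non-zero if any host is unreachable.
# Usage: check_hosts <host>...
check_hosts() {
    failed=0
    for host in "$@"; do
        if $PING_CMD "$host" > /dev/null 2>&1; then
            echo "$host: reachable"
        else
            echo "$host: UNREACHABLE"
            failed=1
        fi
    done
    return $failed
}
```

With the hostnames from the table, the call would be `check_hosts master slave-1 slave-2`.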

1.5. SSH Configuration

  • First, let's create a Linux user dedicated to Spark and HDFS operations. We already created a "spark-user" user on the slave machines, so we add a user with the same name on the master machine. On the master machine, run the following command:
sudo adduser spark-user # Add a user with name "spark-user".

Enter and retype a password, then press Enter to accept the defaults for the remaining questions.

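  • adduser normally creates a group with the same name as the user. Run the following command to make sure "spark-user" belongs to that group:
sudo usermod -aG spark-user spark-user # Add user "spark-user" to the existing group "spark-user".
  • Add "spark-user" to the "sudo" group so that it can run commands as root: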
sudo adduser spark-user sudo
  • Log in as "spark-user":
su spark-user
  • Make sure that openssh-server is installed correctly. If you are not sure, reinstall it with the following commands to be safe:
sudo apt-get purge openssh-server # Removes openssh-server completely
sudo apt-get install openssh-server # Installs openssh-server
ssh localhost

You should successfully connect to localhost via ssh.

  • As "spark-user", create an SSH key pair with the following commands:
su spark-user # Skip this if you are already logged in as "spark-user"
ssh-keygen

You can press Enter to accept the default for every question asked during key generation.

  • Your public key should now be in /home/spark-user/.ssh/id_rsa.pub. To check this, run the following command:
cat /home/spark-user/.ssh/id_rsa.pub

If you see a long string that starts with "ssh-rsa ", then everything is fine.
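The same check can be done without reading the long key string by printing only the first field of the public key file, which holds the key type. This is a tiny sketch; `pubkey_type` is a hypothetical helper name.

```shell
# Print the key type (the first field) of an OpenSSH public key file.
# Usage: pubkey_type <public-key-file>
pubkey_type() {
    awk 'NR == 1 { print $1 }' "$1"
}
```

For an RSA key, `pubkey_type /home/spark-user/.ssh/id_rsa.pub` prints `ssh-rsa`.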

  • Now we will copy this key to all machines for passwordless login. Run the following commands:
ssh-copy-id spark-user@slave-1
ssh-copy-id spark-user@slave-2
ssh-copy-id spark-user@master
  • Now, ssh to each machine separately to check that the SSH configuration is set up correctly:
ssh slave-1
exit # Exit from machine slave-1
ssh slave-2
exit # Exit from machine slave-2
ssh master
exit # Exit from master

You should not be asked for a password; each connection should open directly.
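The manual logins above can also be checked non-interactively. With the OpenSSH client option `BatchMode=yes`, ssh fails instead of prompting for a password, so a successful run of a trivial remote command indicates that key-based login works. The sketch below assumes the host keys are already known (they are once ssh-copy-id has been run); `check_passwordless` and the `SSH_CMD` override variable are hypothetical names.

```shell
# ssh invocation that never prompts and gives up after 5 seconds.
SSH_CMD=${SSH_CMD:-ssh -o BatchMode=yes -o ConnectTimeout=5}

# Run `true` on each host; report whether key-based login succeeded.
# Usage: check_passwordless <host>...
check_passwordless() {
    status=0
    for host in "$@"; do
        if $SSH_CMD "$host" true > /dev/null 2>&1; then
            echo "$host: key login works"
        else
            echo "$host: key login FAILED"
            status=1
        fi
    done
    return $status
}
```

Run as "spark-user" on the master machine, the call would be `check_passwordless slave-1 slave-2 master`.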
