We will install and configure virtual machines using VirtualBox. This tutorial assumes that:
- You have already installed Ubuntu 18.04.4 LTS on your local computer.
- You have updated and upgraded all packages.
- You have a working internet connection.
- Install VirtualBox from the command line:
sudo apt install virtualbox
- Get the Ubuntu 18.04.4 LTS Server install image. You can download the ISO file from the link: Official Ubuntu 18.04.4 LTS Live Server ISO
OR
You can download the image (ISO) file from the command line:
cd ~/Downloads
wget http://releases.ubuntu.com/bionic/ubuntu-18.04.4-live-server-amd64.iso
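Before using the image, it may be worth checking that the download is intact. The sketch below assumes that the SHA256SUMS file published alongside the ISO has been saved into the same directory as the ISO. verify_iso is a hypothetical helper name, not part of any tool.

```shell
# Sketch: verify a downloaded ISO against a SHA256SUMS file.
# Assumption: SHA256SUMS (the checksum list published next to the ISO)
# has been saved in the same directory as the ISO.
# verify_iso is a hypothetical helper name.
verify_iso() {
    dir="$1"   # directory that holds both files, e.g. ~/Downloads
    iso="$2"   # ISO file name, e.g. ubuntu-18.04.4-live-server-amd64.iso
    # Keep only the checksum line for this ISO and let sha256sum check it.
    ( cd "$dir" && grep -F "$iso" SHA256SUMS | sha256sum -c - )
}

# Usage:
#   verify_iso ~/Downloads ubuntu-18.04.4-live-server-amd64.iso
```

The function returns a non-zero status if the checksum does not match or if the ISO is not listed in SHA256SUMS.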
- First, start VirtualBox using the command below:
virtualbox &
- Click the "New" button.
- Enter Name as "slave-1", Type as "Linux", Version as "Ubuntu (64-bit)".
- Select the amount of memory (RAM). We select 2048 MB (2 GB), arbitrarily. You can change it according to your needs.
- Choose to create a virtual hard disk now.
- Choose VDI (VirtualBox Disk Image).
- "Dynamically allocated" only uses space on the host as the virtual disk fills up, so we will choose it. The "Fixed size" option could also be selected if we were sure that the disk size won't need to change in the future.
- We will name the new virtual hard disk file "slave-1". This name is arbitrary, so you can change it. The size of the virtual hard disk is very important: set it to at least 20 GB. We will set it to 50 GB to be safe.
- Start the virtual machine and choose the Ubuntu ISO you downloaded before as the start-up disk.
- Go through the initial screens of the installer and select the options you prefer. In the Profile setup step, we will set "Your name" as spark-user, "Your server's name" as slave-1, and "Pick a username" as spark-user. Server names can differ, but user names should be the same on all machines to keep future ssh connections simple.
Note: You can choose a different user name. It is totally up to you, as long as it is the same on every machine.
- Select "Install OpenSSH Server".
- No need to install any packages listed here. We can install them after the installation if needed.
- Reboot after installation and you are done!
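The GUI steps above for creating the machine can also be scripted with VBoxManage, the command-line tool that ships with VirtualBox. This is only a sketch under assumptions: create_vm is a hypothetical helper, the "SATA" and "IDE" controller names and the file paths are placeholders, and option names may differ between VirtualBox versions (check VBoxManage --help). The Ubuntu installer itself still has to be completed interactively.

```shell
# Sketch: create the VM from the command line instead of the GUI.
# create_vm is a hypothetical helper; controller names and paths are placeholders.
create_vm() {
    name="$1"   # VM name, e.g. slave-1
    iso="$2"    # path to the downloaded Ubuntu ISO
    vdi="$3"    # path of the virtual disk to create

    # Register a 64-bit Ubuntu VM with 2048 MB of RAM.
    VBoxManage createvm --name "$name" --ostype Ubuntu_64 --register || return 1
    VBoxManage modifyvm "$name" --memory 2048 || return 1

    # 50 GB (51200 MB) dynamically allocated VDI disk.
    VBoxManage createmedium disk --filename "$vdi" --size 51200 \
        --format VDI --variant Standard || return 1

    # Attach the disk to a SATA controller and the ISO to an IDE controller.
    VBoxManage storagectl "$name" --name "SATA" --add sata || return 1
    VBoxManage storageattach "$name" --storagectl "SATA" --port 0 --device 0 \
        --type hdd --medium "$vdi" || return 1
    VBoxManage storagectl "$name" --name "IDE" --add ide || return 1
    VBoxManage storageattach "$name" --storagectl "IDE" --port 0 --device 0 \
        --type dvddrive --medium "$iso" || return 1
}

# Usage:
#   create_vm slave-1 ~/Downloads/ubuntu-18.04.4-live-server-amd64.iso ~/slave-1.vdi
#   VBoxManage startvm slave-1
```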
You don't have to repeat all of these steps for the second machine. Simply clone the first virtual machine.
- Shut machine "slave-1" down.
- Right-click "slave-1" and select the "Clone" option. Rename the machine to "slave-2".
- Choose full clone option and hit Clone.
- Our second virtual machine is ready, but its hostname is still "slave-1". To fix this, start slave-2 and log in with the credentials you defined during the installation of slave-1.
- After logging in, run the commands below:
sudo hostnamectl set-hostname slave-2 # Change hostname to slave-2
exit # Logout to check the new host-name
- If the login prompt now shows "slave-2" as the hostname, then everything is fine.
We now have 2 virtual Ubuntu 18 machines. Note that you could have done the cloning later. But if you clone later, remember to change all settings that are specific to the machine, such as its IP address.
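The cloning step can likewise be sketched with VBoxManage. clone_vm is a hypothetical helper, and it assumes the source machine is powered off. The hostname still has to be changed inside the clone with hostnamectl as described above.

```shell
# Sketch: clone a powered-off VM from the command line.
# clone_vm is a hypothetical helper name.
clone_vm() {
    src="$1"   # existing VM, e.g. slave-1
    dst="$2"   # name of the new VM, e.g. slave-2
    # Without the "link" option, clonevm produces a full, independent copy.
    VBoxManage clonevm "$src" --name "$dst" --register
}

# Usage:
#   clone_vm slave-1 slave-2
```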
Next, we configure IP addresses and SSH to make the system maintainable and easy to use.
- First of all, it is useful to update the systems. Log into both machines and run the command:
sudo apt update && sudo apt upgrade
- Log into one machine (let's say "slave-1") and run the command below. You will see an IP address like 10.0.X.Y:
ifconfig
This is the default NAT address that VirtualBox assigns to a virtual machine. All machines can connect to the internet with this IP address, but they cannot connect to each other directly. So we need a private network for the 3 machines (the master and the two slaves), and their IP addresses must follow the pattern "192.168.X.Y".
- If your local machine is connected to the internet (otherwise you couldn't have updated the virtual machines), it should already have a private network IP address that starts with "192.168". We need to determine which network interface on the local machine is responsible for this network. So, run the command below on your local machine and look for the interface whose address starts with "192.168":
ifconfig
The network interface we should remember is enp3s0 in this case.
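If ifconfig is not installed (it belongs to the net-tools package), the same information is available from the ip command. The sketch below filters its one-line-per-address output for 192.168 addresses. private_ifaces is a hypothetical helper that reads that output on standard input.

```shell
# Sketch: list interfaces that carry a 192.168.x.y address.
# private_ifaces is a hypothetical helper; it expects the output of
# "ip -o -4 addr show" on standard input.
private_ifaces() {
    # In that output, field 2 is the interface name, field 3 is "inet",
    # and field 4 is the address with its prefix length.
    awk '$3 == "inet" && $4 ~ /^192\.168\./ { print $2, $4 }'
}

# Usage:
#   ip -o -4 addr show | private_ifaces
```

Each output line holds an interface name followed by its address, so the name to remember is the first word.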
- Go to VirtualBox, right-click each virtual machine, go to Settings and open the Network tab. You should see the adapter attached to "NAT". Change this to "Bridged Adapter" and set Name to "enp3s0" (i.e. the network interface we found). You don't need any other adapter. Apply this to both machines.
Note: In a company or otherwise restricted network, new IP addresses on the private network could be blocked and unable to connect to the internet. In this case, contact the system admin and ask her/him to allow the virtual machines' MAC addresses. Alternatively, the system admin can allow the IP addresses only, but then the same work has to be repeated whenever the IP addresses of the virtual machines change. Also, make sure the IP addresses of your virtual machines don't overlap with any other machine in the network.
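The adapter change can also be applied from the host's command line. A sketch, assuming the virtual machines are powered off and that enp3s0 is the host interface found earlier. bridge_vm is a hypothetical helper name.

```shell
# Sketch: switch a VM's first network adapter from NAT to bridged mode.
# bridge_vm is a hypothetical helper; the VM must be powered off.
bridge_vm() {
    vm="$1"      # VM name, e.g. slave-1
    iface="$2"   # host interface to bridge to, e.g. enp3s0
    VBoxManage modifyvm "$vm" --nic1 bridged --bridgeadapter1 "$iface"
}

# Usage:
#   for vm in slave-1 slave-2; do bridge_vm "$vm" enp3s0; done
```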
- Log in to both machines and run the command below. You will see that each machine now has its own IP address that starts with "192.168":
ifconfig
We already know that our local machine's IP is 192.168.10.107, so we can make a table like this:
Host-Name | IP Address | Info |
---|---|---|
master | 192.168.10.107 | Local Ubuntu 18 Machine |
slave-1 | 192.168.10.140 | Virtual Ubuntu 18 Machine |
slave-2 | 192.168.10.141 | Virtual Ubuntu 18 Machine |
- On all machines, we will add these hostnames so that the machines can reach each other by name. Open the /etc/hosts file as root (for example with sudo) in your favorite text editor (gedit, GNU Emacs, Nano, vim, etc.) and add the following lines to the top of the file:
192.168.10.107 master
192.168.10.140 slave-1
192.168.10.141 slave-2
After this, the /etc/hosts file on every machine should start with these three lines.
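Editing the file by hand on three machines is easy to get wrong, so the entries can also be added with a small script. A sketch: add_host_entry is a hypothetical helper that appends a line only if the hostname is not mapped yet (appending instead of inserting at the top is a simplification). It has to run with root privileges to modify /etc/hosts, so it is best saved in a script file and run with sudo.

```shell
# Sketch: add a hosts entry only if the hostname is not mapped yet.
# add_host_entry is a hypothetical helper; the optional third argument
# selects the file to edit (default: /etc/hosts).
add_host_entry() {
    ip="$1"
    name="$2"
    hosts_file="${3:-/etc/hosts}"
    # Skip if an uncommented line already contains this exact hostname.
    if grep -Eq "^[^#]*[[:space:]]$name([[:space:]]|\$)" "$hosts_file"; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}

# Usage (inside a script run with sudo):
#   add_host_entry 192.168.10.107 master
#   add_host_entry 192.168.10.140 slave-1
#   add_host_entry 192.168.10.141 slave-2
```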
- To make sure that everything is fine so far, ping all machines from the local machine (press Ctrl+C to stop each ping):
ping master
ping slave-1
ping slave-2
- You should see ICMP replies from the master and from both slave machines.
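The connectivity check can also be written as a loop that stops by itself. A sketch, assuming the Linux ping from iputils. check_hosts is a hypothetical helper name.

```shell
# Sketch: ping each host once and report whether it answered.
# check_hosts is a hypothetical helper name.
check_hosts() {
    failed=0
    for host in "$@"; do
        # -c 1: send a single packet; -W 2: wait at most 2 seconds for a reply.
        if ping -c 1 -W 2 "$host" > /dev/null 2>&1; then
            echo "$host reachable"
        else
            echo "$host unreachable"
            failed=1
        fi
    done
    return "$failed"
}

# Usage:
#   check_hosts master slave-1 slave-2
```

The function returns a non-zero status if at least one host did not answer, which makes it usable in scripts.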
- First, let's create a Linux user dedicated to Spark and HDFS operations. We created "spark-user" users on the slave machines before, so we add a user with the same name on the master machine. On the master machine, run the following commands:
sudo adduser spark-user # Add a user with name "spark-user".
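Enter and retype a password, then press Enter to accept the defaults for the following questions.
- On Ubuntu, adduser also creates a group with the same name and makes it the user's primary group. The following command only makes sure that "spark-user" is a member of that group:
sudo usermod -aG spark-user spark-user # Add user "spark-user" to the existing group "spark-user".
- Add "spark-user" to the "sudo" group so that it can run commands as root: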
sudo adduser spark-user sudo
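To confirm the group memberships set up so far, the groups of a user can be listed with id. A sketch: in_group is a hypothetical helper name.

```shell
# Sketch: check whether a user belongs to a group.
# in_group is a hypothetical helper name.
in_group() {
    user="$1"
    group="$2"
    # id -nG prints the user's group names separated by spaces.
    id -nG "$user" | tr ' ' '\n' | grep -qxF "$group"
}

# Usage:
#   in_group spark-user sudo && echo "spark-user can use sudo"
```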
- Log in as "spark-user":
su spark-user
- Make sure that openssh-server is installed correctly. If you are not sure, you can reinstall it with the following commands to be safe:
sudo apt-get purge openssh-server # Removes openssh-server completely
sudo apt-get install openssh-server # Installs openssh-server
ssh localhost
You should successfully connect to localhost via ssh.
- As "spark-user", create an SSH key with the following commands:
su spark-user # Only needed if you are not already logged in as spark-user
ssh-keygen
You can press Enter at all of the prompts shown during key generation.
- Your public key should be in /home/spark-user/.ssh/id_rsa.pub. To check this, run the following command:
cat /home/spark-user/.ssh/id_rsa.pub
If you see a long string that starts with "ssh-rsa ", then everything is OK.
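For scripted setups, the key can also be generated without prompts. A sketch, assuming an RSA key with an empty passphrase is acceptable (the same result as answering every prompt with Enter). ensure_key is a hypothetical helper that leaves an existing key untouched.

```shell
# Sketch: generate an RSA key pair non-interactively if none exists yet.
# ensure_key is a hypothetical helper; the default path matches ssh-keygen's.
ensure_key() {
    key="${1:-$HOME/.ssh/id_rsa}"
    if [ -f "$key" ]; then
        echo "key already exists: $key"
        return 0
    fi
    # Create the directory with owner-only permissions if it is missing.
    mkdir -p -m 700 "$(dirname "$key")" || return 1
    # -N "" sets an empty passphrase, -f sets the output file, -q silences output.
    ssh-keygen -q -t rsa -N "" -f "$key"
}

# Usage (as spark-user):
#   ensure_key
```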
- Now we will copy this key to all machines for passwordless login. Run the following commands:
ssh-copy-id spark-user@slave-1
ssh-copy-id spark-user@slave-2
ssh-copy-id spark-user@master
- Now, ssh to the machines separately to check that the ssh configuration is set correctly:
ssh slave-1
exit # Exit from machine slave-1
ssh slave-2
exit # Exit from machine slave-2
ssh master
exit # Exit from master
You should not be asked for a password; each connection should open directly.
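The same check can be automated. A sketch: BatchMode=yes tells ssh never to ask for a password, so a host where key-based login is not working fails instead of prompting. check_ssh is a hypothetical helper, and the host names are the ones added to /etc/hosts earlier.

```shell
# Sketch: verify passwordless ssh login to each host.
# check_ssh is a hypothetical helper name.
check_ssh() {
    failed=0
    for host in "$@"; do
        # Run the no-op command "true" remotely; fail fast instead of prompting.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "$host: passwordless login OK"
        else
            echo "$host: passwordless login FAILED"
            failed=1
        fi
    done
    return "$failed"
}

# Usage (as spark-user on the master machine):
#   check_ssh master slave-1 slave-2
```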