---
title: Manage Hadoop clusters in HDInsight using Azure portal | Microsoft Docs
description: Learn how to create and manage HDInsight clusters using the Azure portal.
services: hdinsight
documentationcenter:
author: mumian
manager: jhubbard
editor: cgronlun
tags: azure-portal
ms.assetid: 5a76f897-02e8-4437-8f2b-4fb12225854a
ms.service: hdinsight
ms.custom: hdinsightactive
ms.workload: big-data
ms.tgt_pltfrm: na
ms.devlang: na
ms.topic: article
ms.date: 04/27/2017
ms.author: jgao
---
[!INCLUDE selector]
Using the Azure portal, you can manage Hadoop clusters in Azure HDInsight. Use the tab selector for information on managing Hadoop clusters in HDInsight using other tools.
Now let's review your options for managing and administering your HDInsight clusters in the Azure portal.
Prerequisites
Before you begin this article, you must have the following:
- An Azure subscription. See Get Azure free trial.
[!INCLUDE delete-cluster-warning]
HDInsight works with a wide range of Hadoop components. For the list of the components that have been verified and supported, see What version of Hadoop is in Azure HDInsight. For the detailed steps to create a cluster using the Azure portal, see Create Hadoop clusters in HDInsight.
You must specify an Azure subscription when you create an HDInsight cluster. The cluster can be created in either a new Azure resource group or an existing one. You can use the following steps to verify your permissions for creating HDInsight clusters:
-
To use an existing resource group:
- Sign in to the Azure portal.
- Select Resource groups from the left menu to list the resource groups.
- Select the resource group you want to use for creating your HDInsight cluster.
- Select Access control (IAM), and verify that you (or a group that you belong to) have at least the Contributor access to the resource group.
-
To create a new resource group:
- Sign in to the Azure portal.
- Select Subscriptions from the left menu. It has a yellow key icon. You see a list of subscriptions.
- Select the subscription that you use to create clusters.
- Select My permissions. It shows your role on the subscription. You need at least Contributor access to create an HDInsight cluster.
If you receive the NoRegisteredProviderFound error or the MissingSubscriptionRegistration error, see Troubleshoot common Azure deployment errors with Azure Resource Manager.
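The permission check described above can also be scripted. The following is a minimal sketch, not part of the original article: it assumes the Azure CLI is installed and that you have signed in with `az login`. `has_contributor_access` is a hypothetical helper name, and the assignee and resource group values in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads role names from stdin (one per line) and
# succeeds if one of them is exactly Contributor or Owner.
has_contributor_access() {
  grep -qxE 'Contributor|Owner'
}

# Example usage (placeholders: user@example.com, myResourceGroup):
#   az role assignment list --assignee "user@example.com" \
#       --resource-group "myResourceGroup" \
#       --include-inherited --include-groups \
#       --query "[].roleDefinitionName" --output tsv \
#     | has_contributor_access && echo "You can create clusters here."
```

The helper only inspects role names, so custom roles that grant equivalent permissions are not detected by this sketch.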
-
Sign in to https://portal.azure.com.
-
Select More Services from the left menu.
-
Search for HDInsight and select the entry that appears.
-
In the listing, find the cluster by name. If the cluster list is long, you can use the filter at the top of the page.
-
Select a cluster from the list to see the overview page:
Overview menu:
- Dashboard: Opens the cluster dashboard, which is Ambari Web for Linux-based clusters.
- Secure Shell: Shows the instructions to connect to the cluster using Secure Shell (SSH) connection.
- Scale Cluster: Allows you to change the number of worker nodes for this cluster.
- Delete: Deletes the cluster.
Left menu:
- Activity logs: Show and query logs of Azure Resource Manager activity.
- Access control (IAM): Use role assignments. See Use role assignments to manage access to your Azure subscription resources.
- Tags: Allows you to set key/value pairs to define a custom taxonomy of your cloud services. For example, you may create a key named project, and then use a common value for all services associated with a specific project.
- Diagnose and solve problems: Display troubleshooting information.
- Locks: Add a lock to prevent the cluster from being modified or deleted.
- Automation script: Display and export the Azure Resource Manager template for the cluster. Currently, you can only export the dependent Azure storage account. See Create Linux-based Hadoop clusters in HDInsight using Azure Resource Manager templates.
- Quick Start: Displays information that will help you get started using HDInsight.
- Tools for HDInsight: Help information for HDInsight-related tools.
- Cluster Login: Display the cluster login information.
- Subscription Core Usage: Display the used and available cores for your subscription.
- Scale Cluster: Increase and decrease the number of cluster worker nodes. See Scale clusters.
- Secure Shell: Shows the instructions to connect to the cluster using Secure Shell (SSH) connection. For more information, see Use SSH with HDInsight.
- HDInsight Partner: Add/remove the current HDInsight Partner.
- External Metastores: View the Hive and Oozie metastores. The metastores can only be configured during the cluster creation process. See Use Hive/Oozie metastore.
- Script Actions: Run Bash scripts on the cluster. See Customize Linux-based HDInsight clusters using Script Action.
- Applications: Add/remove HDInsight applications. See Install custom HDInsight applications.
- Properties: View the cluster properties.
- Storage accounts: View the Azure Storage account or Azure Data Lake Store accounts configured during the cluster creation process.
- Data lake store access: View the configured Azure Active Directory principal used to represent the cluster when accessing Azure Data Lake Store.
- New support request: Allows you to create a support ticket with Microsoft support.
-
Click Properties:
The properties are:
- Hostname: Cluster name.
- Cluster URL: The URL for the Ambari web interface.
- Secure shell (SSH): The username and host name to use in accessing the cluster via SSH.
- Status: One of Aborted, Accepted, ClusterStorageProvisioned, AzureVMConfiguration, HDInsightConfiguration, Operational, Running, Error, Deleting, Deleted, Timedout, DeleteQueued, DeleteTimedout, DeleteError, PatchQueued, CertRolloverQueued, ResizeQueued, ClusterCustomization.
- Region: Azure location. For a list of supported Azure locations, see the Region dropdown list box on HDInsight pricing.
- Date created: The date the cluster was deployed.
- Operating system: Either Windows or Linux.
- Type: Hadoop, HBase, Storm, Spark.
- Version: The HDInsight version. See HDInsight versions.
- Subscription: Subscription name.
- Default data source: The default cluster file system.
- Worker nodes size: The selected VM size of the worker nodes.
- Head node size: The selected VM size of the head nodes.
- Virtual network: The name of the Virtual Network and subnet to which the cluster is deployed, if one was selected at deployment time.
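The same cluster listing and properties can be retrieved from the command line. This is a hedged sketch that assumes a current Azure CLI with the `hdinsight` command group and an active `az login` session; the function names `list_clusters` and `show_cluster` are hypothetical wrappers, not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch only: assumes a current Azure CLI that includes `az hdinsight`.

# Print the names of the HDInsight clusters in a resource group.
list_clusters() {
  local resource_group="$1"
  az hdinsight list --resource-group "$resource_group" \
    --query "[].name" --output tsv
}

# Print all properties of one cluster as JSON.
show_cluster() {
  local cluster_name="$1" resource_group="$2"
  az hdinsight show --name "$cluster_name" \
    --resource-group "$resource_group" --output json
}

# Example usage (placeholder names):
#   list_clusters myResourceGroup
#   show_cluster mycluster myResourceGroup
```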
Deleting a cluster does not delete the default storage account or any linked storage accounts. You can re-create the cluster by using the same storage accounts and the same metastores. We recommend using a new default Blob container when you re-create the cluster.
- Sign in to the Portal.
- Click HDInsight Clusters from the left menu. If you don't see HDInsight Clusters, click More services first.
- Click the cluster that you want to delete.
- Click Delete from the top menu, and then follow the instructions.
See also Pause/shut down clusters.
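For unattended deletion, the portal steps above have a command-line counterpart. The sketch below assumes a current Azure CLI with the `hdinsight` command group; `delete_cluster` is a hypothetical wrapper name. The `--yes` flag skips the confirmation prompt, so double-check the cluster name before running it.

```shell
#!/usr/bin/env bash
# Sketch only: deletes a cluster without prompting for confirmation.
# The storage accounts used by the cluster are not deleted.
delete_cluster() {
  local cluster_name="$1" resource_group="$2"
  if [ -z "$cluster_name" ] || [ -z "$resource_group" ]; then
    echo "usage: delete_cluster <cluster-name> <resource-group>" >&2
    return 1
  fi
  az hdinsight delete --name "$cluster_name" \
    --resource-group "$resource_group" --yes
}

# Example usage (placeholder names):
#   delete_cluster mycluster myResourceGroup
```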
The cluster scaling feature allows you to change the number of worker nodes used by a cluster that is running in Azure HDInsight without having to re-create the cluster.
Note
Only clusters with HDInsight version 3.1.3 or higher are supported. If you are unsure of the version of your cluster, you can check the Properties page. See List and show clusters.
The impact of changing the number of data nodes for each type of cluster supported by HDInsight:
-
Hadoop
You can seamlessly increase the number of worker nodes in a Hadoop cluster that is running without impacting any pending or running jobs. New jobs can also be submitted while the operation is in progress. Failures in a scaling operation are gracefully handled so that the cluster is always left in a functional state.
When a Hadoop cluster is scaled down by reducing the number of data nodes, some of the services in the cluster are restarted. This causes all running and pending jobs to fail at the completion of the scaling operation. You can, however, resubmit the jobs once the operation is complete.
-
HBase
You can seamlessly add nodes to or remove nodes from your HBase cluster while it is running. Region servers are automatically balanced within a few minutes of completing the scaling operation. However, you can also manually balance the region servers by logging in to the head node of the cluster and running the following commands from a command prompt window:
    >pushd %HBASE_HOME%\bin
    >hbase shell
    >balancer
For more information on using the HBase shell, see []
-
Storm
You can seamlessly add or remove data nodes to your Storm cluster while it is running. But after a successful completion of the scaling operation, you will need to rebalance the topology.
Rebalancing can be accomplished in two ways:
-
Storm web UI
-
Command-line interface (CLI) tool
Please refer to the Apache Storm documentation for more details.
The Storm web UI is available on the HDInsight cluster:
Here is an example of how to use the CLI command to rebalance the Storm topology:
    ## Reconfigure the topology "mytopology" to use 5 worker processes,
    ## the spout "blue-spout" to use 3 executors, and
    ## the bolt "yellow-bolt" to use 10 executors
    $ storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
-
To scale clusters
-
Sign in to the Portal.
-
Click HDInsight Clusters from the left menu.
-
Click the cluster you want to scale.
-
Click Scale Cluster.
-
Enter Number of Worker nodes. The limit on the number of cluster nodes varies among Azure subscriptions. You can contact billing support to increase the limit. The cost information reflects the changes you have made to the number of nodes.
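The scaling steps above can be sketched from the command line as well. This example assumes a current Azure CLI in which `az hdinsight resize` accepts `--workernode-count` (older CLI releases may use a different flag name); `scale_cluster` is a hypothetical wrapper that rejects a non-numeric worker count before calling the CLI.

```shell
#!/usr/bin/env bash
# Sketch only: change the number of worker nodes of a running cluster.
scale_cluster() {
  local cluster_name="$1" resource_group="$2" worker_count="$3"
  # Reject empty or non-numeric counts before calling the CLI.
  case "$worker_count" in
    ''|*[!0-9]*)
      echo "worker count must be a whole number" >&2
      return 1
      ;;
  esac
  az hdinsight resize --name "$cluster_name" \
    --resource-group "$resource_group" \
    --workernode-count "$worker_count"
}

# Example usage (placeholder names):
#   scale_cluster mycluster myResourceGroup 5
```

Remember that scaling down a Hadoop cluster restarts some services, so running and pending jobs may need to be resubmitted afterward.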
Most Hadoop jobs are batch jobs that run only occasionally. For most Hadoop clusters, there are long periods when the cluster is not being used for processing. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it is not in use. You are charged for an HDInsight cluster even when it is not in use. Because the charges for the cluster are many times higher than the charges for storage, it makes economic sense to delete clusters when they are not in use.
There are many ways you can program the process:
- Use Azure Data Factory. See Create on-demand Linux-based Hadoop clusters in HDInsight using Azure Data Factory for creating on-demand HDInsight linked services.
- Use Azure PowerShell. See Analyze flight delay data.
- Use Azure CLI. See Manage HDInsight clusters using Azure CLI.
- Use HDInsight .NET SDK. See Submit Hadoop jobs.
For the pricing information, see HDInsight pricing. To delete a cluster from the Portal, see Delete clusters.
An HDInsight cluster can have two user accounts. The HDInsight cluster user account (also known as the HTTP user account) and the SSH user account are created during the creation process. You can use the Ambari web UI to change the cluster user account username and password, and script actions to change the SSH user account.
You can use the Ambari Web UI to change the Cluster user password. To log into Ambari, you must use the existing cluster username and password.
Note
If you change the cluster user (admin) password, this may cause script actions run against this cluster to fail. If you have any persisted script actions that target worker nodes, these may fail when you add nodes to the cluster through resize operations. For more information on script actions, see Customize HDInsight clusters using script actions.
- Sign in to the Ambari Web UI using the HDInsight cluster user credentials. The default username is admin. The URL is https://<HDInsight Cluster Name>.azurehdinsight.net.
- Click Admin from the top menu, and then click "Manage Ambari".
- From the left menu, click Users.
- Click Admin.
- Click Change Password.
Ambari then changes the password on all nodes in the cluster.
-
Using a text editor, save the following text as a file named changepassword.sh.
[!IMPORTANT] You must use an editor that uses LF as the line ending. If the editor uses CRLF, then the script will not work.
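    #! /bin/bash
    # Usage: changepassword.sh <ssh-user-name> <new-password>
    USER=$1
    PASS=$2
    # Hash the new password and set it for the given user.
    usermod --password "$(echo "$PASS" | openssl passwd -1 -stdin)" "$USER"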
-
Upload the file to a storage location that can be accessed from HDInsight using an HTTP or HTTPS address, for example a public file store such as OneDrive or Azure Blob storage. Save the URI (HTTP or HTTPS address) of the file, as it is needed in a later step.
-
From the Azure portal, click HDInsight Clusters.
-
Click your HDInsight cluster.
-
Click Script Actions.
-
From the Script Actions blade, select Submit New. When the Submit script action blade appears, enter the following information.
Field | Value
---|---
Name | Change ssh password
Bash script URI | The URI to the changepassword.sh file
Nodes (Head, Worker, Nimbus, Supervisor, Zookeeper, etc.) | ✓ for all node types listed
Parameters | Enter the SSH user name and then the new password. There should be one space between the user name and the password.
Persist this script action ... | Leave this field unchecked.
-
Select Create to apply the script. Once the script finishes, you will be able to connect to the cluster using SSH with the new password.
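Because the script fails if it is saved with CRLF line endings, it can help to check changepassword.sh before uploading it. The following is a small sketch using only standard POSIX tools; the function names `has_crlf` and `to_lf` are hypothetical.

```shell
#!/usr/bin/env bash
# Succeeds if the given file contains at least one carriage return,
# which indicates CRLF (Windows-style) line endings.
has_crlf() {
  local cr
  cr="$(printf '\r')"
  grep -q "$cr" "$1"
}

# Writes a copy of the file to stdout with all carriage returns removed.
to_lf() {
  tr -d '\r' < "$1"
}

# Example usage:
#   if has_crlf changepassword.sh; then
#     to_lf changepassword.sh > changepassword.lf.sh
#   fi
```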
HDInsight clusters have the following HTTP web services (all of these services have RESTful endpoints):
- ODBC
- JDBC
- Ambari
- Oozie
- Templeton
By default, access to these services is granted. You can revoke or grant access by using Azure CLI and Azure PowerShell.
To find your Azure subscription IDs
- Sign in to the Portal.
- Click Subscriptions. Each subscription has a name and an ID.
Each cluster is tied to an Azure subscription. The subscription ID is shown on the cluster Essential tile. See List and show clusters.
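Subscription names and IDs can also be listed from the command line. This sketch assumes the Azure CLI is installed and that you have signed in with `az login`; `list_subscriptions` is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash
# Print the name and ID of every subscription available to the
# signed-in account, formatted as a table.
list_subscriptions() {
  az account list --query "[].{name:name, id:id}" --output table
}

# Example usage:
#   list_subscriptions
```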
In the Azure Resource Manager mode, each HDInsight cluster is created with an Azure Resource Manager group. The Resource Manager group that a cluster belongs to appears in:
- The cluster list has a Resource Group column.
- Cluster Essential tile.
Each HDInsight cluster has a default storage account. The default storage account and its keys for a cluster appear under Storage Accounts. See List and show clusters.
You cannot run Hive jobs directly from the Azure portal, but you can use the Hive View in the Ambari Web UI.
To run Hive queries using Ambari Hive View
-
Sign in to the Ambari Web UI using the HDInsight cluster user credentials. The default username is admin. The URL is https://<HDInsight Cluster Name>.azurehdinsight.net.
-
Open Hive View as shown in the following screenshot:
-
Click Query from the top menu.
-
Enter a Hive query in Query Editor, and then click Execute.
See Manage HDInsight clusters by using the Ambari Web UI.
Using the Azure portal, you can browse the content of the default container.
- Sign in to https://portal.azure.com.
- Click HDInsight Clusters from the left menu to list the existing clusters.
- Click the cluster name. If the cluster list is long, you can use the filter at the top of the page.
- Click Storage Accounts from the cluster left menu.
- Click a storage account.
- Click the Blobs tile.
- Click the default container name.
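The contents of the default container can be listed from the command line too. This is a hedged sketch: it assumes the Azure CLI is installed and that storage credentials are already available to it (for example through `az login` or the `AZURE_STORAGE_CONNECTION_STRING` environment variable). `list_container_blobs` is a hypothetical wrapper, and the account and container names in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: print the names of the blobs in one container.
list_container_blobs() {
  local account_name="$1" container_name="$2"
  az storage blob list --account-name "$account_name" \
    --container-name "$container_name" \
    --query "[].name" --output tsv
}

# Example usage (placeholder names):
#   list_container_blobs mystorageaccount mycontainer
```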
The Usage section of the HDInsight cluster blade displays information about the number of cores available to your subscription for use with HDInsight, as well as the number of cores allocated to this cluster and how they are allocated for the nodes within this cluster. See List and show clusters.
Important
To monitor the services provided by the HDInsight cluster, you must use Ambari Web or the Ambari REST API. For more information on using Ambari, see Manage HDInsight clusters using Ambari.
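As a rough illustration of the Ambari REST API option, the sketch below builds the cluster endpoint URL from the cluster name. It assumes the standard `https://<cluster name>.azurehdinsight.net` address and the Ambari `/api/v1/clusters/<cluster name>` path; the function names are hypothetical, and `mycluster` is a placeholder.

```shell
#!/usr/bin/env bash
# Base URL of the Ambari web UI for a cluster.
ambari_base_url() {
  printf 'https://%s.azurehdinsight.net' "$1"
}

# URL of the cluster resource in the Ambari REST API.
ambari_cluster_api_url() {
  printf '%s/api/v1/clusters/%s' "$(ambari_base_url "$1")" "$1"
}

# Example usage (curl prompts for the cluster user password):
#   curl -sS -u admin "$(ambari_cluster_api_url mycluster)"
#   curl -sS -u admin "$(ambari_cluster_api_url mycluster)/services"
```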
In this article, you have learned how to create an HDInsight cluster by using the Portal, and how to open the Hadoop command-line tool. To learn more, see the following articles:
- Administer HDInsight Using Azure PowerShell
- Administer HDInsight Using Azure CLI
- Create HDInsight clusters
- Use Hive in HDInsight
- Use Pig in HDInsight
- Use Sqoop in HDInsight
- Get Started with Azure HDInsight
- What version of Hadoop is in Azure HDInsight?
- Read more about using the Ambari Web UI
- Details on using the Ambari REST API