diff --git a/docs/EN_US/HPCCSystemAdmin/HPCCSystemAdministratorsGuide.xml b/docs/EN_US/HPCCSystemAdmin/HPCCSystemAdministratorsGuide.xml index ac212dfdeac..b5e8466344b 100644 --- a/docs/EN_US/HPCCSystemAdmin/HPCCSystemAdministratorsGuide.xml +++ b/docs/EN_US/HPCCSystemAdmin/HPCCSystemAdministratorsGuide.xml @@ -11,7 +11,7 @@ - + @@ -41,26 +41,26 @@ similarity to actual persons, living or dead, is purely coincidental. - + + xmlns:xi="http://www.w3.org/2001/XInclude"/> + xmlns:xi="http://www.w3.org/2001/XInclude"/> HPCC Systems® + xmlns:xi="http://www.w3.org/2001/XInclude"/> - + @@ -72,16 +72,16 @@ Introduction - The HPCC (High Performance Computing Cluster) Systems platform is a massive - parallel-processing computing platform that solves Big Data + The HPCC (High Performance Computing Cluster) Systems platform is + a massive parallel-processing computing platform that solves Big Data problems. The HPCC Systems platform stores and processes large quantities of data, processing billions of records per second using massive parallel processing technology. Large amounts of data across disparate data sources can be accessed, analyzed, and manipulated in fractions of - seconds. The HPCC Systems platform functions as both a processing and a distributed - data storage environment, capable of analyzing terabytes of + seconds. The HPCC Systems platform functions as both a processing and a + distributed data storage environment, capable of analyzing terabytes of information. @@ -97,7 +97,7 @@ - + @@ -122,7 +122,7 @@ - + @@ -133,8 +133,8 @@ Clusters - An HPCC Systems environment contains clusters which you define and - use according to your needs. The types of clusters used by HPCC + An HPCC Systems environment contains clusters which you define + and use according to your needs. The types of clusters used by HPCC Systems: @@ -171,7 +171,7 @@ - + @@ -338,6 +338,14 @@ Those credentials are then used to authenticate any requests from those tools. 
+ + + Topology Server + + The topology server is an internal component used by ROXIE to + keep track of the health of the different ROXIE processes in a + cluster. + @@ -422,7 +430,7 @@ + xmlns:xi="http://www.w3.org/2001/XInclude"/> @@ -431,28 +439,36 @@ Hardware and Software Requirements - This chapter provides an overview of the hardware and software requirements for running the HPCC Systems platform optimally. While these requirements were significant when the HPCC Systems platform was first deployed many years ago, there have been substantial improvements in hardware since then. The platform now supports virtual containers and cloud deployments, making the requirements less significant even for large-scale (petabytes) bare-metal deployments. In fact, the HPCC Systems platform should perform satisfactorily on most modern hardware configurations. - + This chapter provides an overview of the hardware and software + requirements for running the HPCC Systems platform optimally. While these + requirements were significant when the HPCC Systems platform was first + deployed many years ago, there have been substantial improvements in + hardware since then. The platform now supports virtual containers and + cloud deployments, making the requirements less significant even for + large-scale (petabytes) bare-metal deployments. In fact, the HPCC Systems + platform should perform satisfactorily on most modern hardware + configurations. + Hardware and Components This section provides some insight as to what sort of hardware and - infrastructure optimally the HPCC Systems platform works well on. This is not an - exclusive comprehensive set of instructions, nor a mandate on what + infrastructure the HPCC Systems platform works well on. This is + not an exhaustive set of instructions, nor a mandate on what hardware you must have. Consider this as a guide to use when looking to implement or scale your HPCC Systems platform.
These suggestions should be taken into consideration for your specific enterprise needs. - The HPCC Systems platform is designed to run on commodity hardware, which makes - building and maintaining large scale (petabytes) clusters economically - feasible. When planning your cluster hardware, you will need to balance a - number of considerations, including fail-over domains and potential - performance issues. Hardware planning should include distributing HPCC - Systems across multiple physical hosts, such as a cluster. Generally, one - type of best practice is to run the HPCC Systems platform processes of a particular - type, for example Thor, Roxie, or Dali, on a host configured specifically - for that type of process. + The HPCC Systems platform is designed to run on commodity hardware, + which makes building and maintaining large-scale (petabytes) clusters + economically feasible. When planning your cluster hardware, you will need + to balance a number of considerations, including fail-over domains and + potential performance issues. Hardware planning should include + distributing HPCC Systems across multiple physical hosts, such as a + cluster. Generally, a best practice is to run the HPCC Systems + platform processes of a particular type, for example Thor, Roxie, or Dali, + on a host configured specifically for that type of process. Thor Hardware @@ -467,11 +483,11 @@ larger physical servers to run multiple Thor slave nodes per physical server. - It is important to note that the HPCC Systems platform by nature is a parallel - processing system and all Thor slave nodes will be exercising at - precisely the same time. So when allocating more than one HPCC Systems - Thor slave per physical machine assure that each slave meets the - recommended requirements. + It is important to note that the HPCC Systems platform by nature + is a parallel processing system and all Thor slave nodes will be + exercising at precisely the same time.
So when allocating more than one + HPCC Systems Thor slave per physical machine, ensure that each slave + meets the recommended requirements. For instance, 1 physical server with 48 cores, 96GB RAM, 10Gb/sec network and 2GB/sec sequential I/O would be capable of running ten (10) @@ -518,14 +534,14 @@ Dali and Sasha Hardware Configurations - The HPCC Systems platform Dali processes store cluster metadata in RAM. For - optimal efficiency, provide at least 48GB of RAM, 6 or more CPU cores, - 1Gb/sec network interface and a high availability disk for a single HPCC - Systems Dali. The HPCC Systems platform Dali processes are one of the few native - active/passive components. Using standard "swinging disk" clustering is - recommended for a high availability setup. For a single HPCC Systems platform - Dali process, any suitable High Availability (HA) RAID level is - fine. + The HPCC Systems platform Dali processes store cluster metadata in + RAM. For optimal efficiency, provide at least 48GB of RAM, 6 or more CPU + cores, 1Gb/sec network interface and a high availability disk for a + single HPCC Systems Dali. The HPCC Systems platform Dali process is + one of the few native active/passive components. Using standard + "swinging disk" clustering is recommended for a high availability setup. + For a single HPCC Systems platform Dali process, any suitable High + Availability (HA) RAID level is fine. Sasha only stores data to locally available disks, reading data from Dali then processing it by archiving workunits (WUs) to disk. It is @@ -584,7 +600,7 @@ transaction log at restart in order to replay the last save and changes to return to the last operational state. - + Understanding the log files, and what is normally reported in - the log files, helps in troubleshooting the HPCC Systems platform clusters. + the log files, helps in troubleshooting the HPCC Systems platform + clusters.
As part of routine maintenance you may want to backup, archive, and remove the older log files. Some log files can grow quite large @@ -1029,7 +1047,7 @@ - + @@ -1080,7 +1098,7 @@ - + @@ -1091,15 +1109,15 @@ + xmlns:xi="http://www.w3.org/2001/XInclude"/> System Configuration and Management - The HPCC Systems platform require configuration. The Configuration Manager tool - (configmgr) included with the system software is a valuable piece of - setting up your HPCC Systems platform. The Configuration Manager is a - graphical tool provided that can be used to configure your system. + The HPCC Systems platform requires configuration. The Configuration + Manager tool (configmgr) included with the system software is a valuable + part of setting up your HPCC Systems platform. The Configuration Manager + is a graphical tool that can be used to configure your system. Configuration Manager has a wizard that you can run which will easily generate an environment file to get you configured, up and running quickly. There is an advanced option available through Configuration @@ -1113,7 +1131,7 @@ - + @@ -1122,37 +1140,38 @@ + xmlns:xi="http://www.w3.org/2001/XInclude"/> + xmlns:xi="http://www.w3.org/2001/XInclude"/> + xmlns:xi="http://www.w3.org/2001/XInclude"/> + xmlns:xi="http://www.w3.org/2001/XInclude"/> + xmlns:xi="http://www.w3.org/2001/XInclude"/> + xmlns:xi="http://www.w3.org/2001/XInclude"/> Environment.conf - A component of the HPCC Systems platform on bare-metal configuration is the - environment.conf file. Environment.conf contains some global definitions - that the configuration manager uses to configure the HPCC Systems platform. In - most cases, the defaults are sufficient. + A component of a bare-metal HPCC Systems platform + configuration is the environment.conf file. Environment.conf contains + some global definitions that the configuration manager uses to configure + the HPCC Systems platform. In most cases, the defaults are + sufficient.
The environment.conf file only works for bare-metal deployments. For container or cloud deployments the environment.conf is not valid, @@ -1162,13 +1181,13 @@ - + - + - + WARNING: These settings are essential to proper system operation. Only expert @@ -1473,9 +1492,10 @@ lock=/var/lock/HPCCSystems highest priority, and a value of 19 is the lowest. The default environment.conf file is delivered with the nice - value disabled. If you wish to use nice to prioritize HPCC Systems platform - processes, you need to modify the environment.conf file to enable - nice. You can also adjust the nice value in environment.conf. + value disabled. If you wish to use nice to prioritize HPCC Systems + platform processes, you need to modify the environment.conf file to + enable nice. You can also adjust the nice value in + environment.conf. @@ -1546,13 +1566,13 @@ lock=/var/lock/HPCCSystems - + - + - + When targeting custom classes, all dependencies must be accessible by declaring their location on the @@ -1634,21 +1654,21 @@ HPCCPrivateKeyFile=/keyfilepath/keyfile + xmlns:xi="http://www.w3.org/2001/XInclude"/> + xmlns:xi="http://www.w3.org/2001/XInclude"/> + xmlns:xi="http://www.w3.org/2001/XInclude"/> + xmlns:xi="http://www.w3.org/2001/XInclude"/> Initialization under Systemd @@ -1734,9 +1754,9 @@ HPCCPrivateKeyFile=/keyfilepath/keyfile The performance of your system can vary depending on how some components interact. One area which could impact performance is the relationship with users, groups, and Active Directory. If possible, - having a separate Active Directory specific to the HPCC Systems platform could be a - good policy. There have been a few instances where just one Active - Directory servicing many, diverse applications has been less than + having a separate Active Directory specific to the HPCC Systems platform + could be a good policy. There have been a few instances where just one + Active Directory servicing many diverse applications has been less than optimal.
HPCC Systems makes setting up your Active Directory OU's @@ -1867,7 +1887,7 @@ HPCCPrivateKeyFile=/keyfilepath/keyfile + xmlns:xi="http://www.w3.org/2001/XInclude"/> @@ -1925,12 +1945,12 @@ HPCCPrivateKeyFile=/keyfilepath/keyfile Best Practices This chapter outlines various forms of best practices established by - long time HPCC Systems users and administrators running the HPCC Systems platform in a - high availability, demanding production environment. While it is not - required that you run your environment in this manner, as your specific - requirements may vary. This section provides some best practice - recommendations established after several years of running the HPCC Systems platform in - a demanding, intense, production environment. + long-time HPCC Systems users and administrators running the HPCC Systems + platform in a high availability, demanding production environment. You + are not required to run your environment in this manner, as your + specific requirements may vary. This section provides some best practice + recommendations established after several years of running the HPCC + Systems platform in a demanding, intense production environment. Cluster Redundancy @@ -1942,13 +1962,13 @@ HPCCPrivateKeyFile=/keyfilepath/keyfile - + - + - + Make sure you allocate ample resources to your key components. Dali is RAM intensive. ECL Agent and ECL @@ -2220,7 +2240,7 @@ HPCCPrivateKeyFile=/keyfilepath/keyfile - + @@ -2359,7 +2379,7 @@ HPCCPrivateKeyFile=/keyfilepath/keyfile described. If that option is enabled, then you must unset the option, or add the hpcc user to the AllowUsers list. - + @@ -2374,7 +2394,7 @@ HPCCPrivateKeyFile=/keyfilepath/keyfile - + @@ -2421,15 +2441,15 @@ HPCCPrivateKeyFile=/keyfilepath/keyfile configurations were typically set to N number of slavesPerNode slavesPerNode - , where N equalled or approached - the number of cores per machine. + , where N equalled or approached the number + of cores per machine.
This resulted in N independent slave processes per node, as seen below: - + - + This had several significant disadvantages: @@ -2452,7 +2472,7 @@ HPCCPrivateKeyFile=/keyfilepath/keyfile Now a new approach is used, allowing virtual slaves to be created with a single slave process, as depicted below:. - + @@ -2465,10 +2485,9 @@ HPCCPrivateKeyFile=/keyfilepath/keyfile Thor configuration option called channelsPerSlave channelsPerSlave - . Under this architecture, - slaves within the same process can communicate directly with one - another and share resources. + . + Under this architecture, slaves within the same process can + communicate directly with one another and share resources. @@ -2624,17 +2643,18 @@ heapUseTransparentHugePages + xmlns:xi="http://www.w3.org/2001/XInclude"/> + xmlns:xi="http://www.w3.org/2001/XInclude"/> System Resources - There are additional resources available for the HPCC Systems platform. + There are additional resources available for the HPCC Systems + platform. HPCC Systems Resources @@ -2658,8 +2678,8 @@ heapUseTransparentHugePages Additional Resources - Additional help with the HPCC Systems platform and Learning ECL is also - available. There are online courses available. Go to : + Additional help with the HPCC Systems platform and Learning ECL is + also available. There are online courses available. Go to: https://learn.lexisnexis.com/hpcc