Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-33122 Clean up the mechanism for modifying the component config #19362

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
6 changes: 4 additions & 2 deletions dali/base/daclient.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -105,7 +105,7 @@ void installEnvConfigMonitor()
// ISDSSubscription impl.
virtual void notify(SubscriptionId id, const char *xpath, SDSNotifyFlags flags, unsigned valueLen=0, const void *valueData=nullptr) override
{
executeConfigUpdaterCallbacks();
refreshConfiguration();
}
};

Expand Down Expand Up @@ -147,7 +147,6 @@ bool initClientProcess(IGroup *servergrp, DaliClientRole role, unsigned mpport,
// causes any config update hooks (installed by installConfigUpdateHook() to trigger on an env. change)
switch (role)
{
case DCR_ThorMaster:
case DCR_EclServer:
case DCR_EclAgent:
case DCR_SashaServer:
Expand All @@ -158,6 +157,9 @@ bool initClientProcess(IGroup *servergrp, DaliClientRole role, unsigned mpport,
case DCR_EclScheduler:
case DCR_EclCCServer:
installEnvConfigMonitor();
break;
// Thor does not monitor because a fixed configuration is serialized to the slaves
case DCR_ThorMaster:
default:
break;
}
Expand Down
5 changes: 3 additions & 2 deletions dali/base/dadfs.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -10756,10 +10756,11 @@ class CInitGroups

void constructStorageGroups(bool force, StringBuffer &messages)
{
Owned<IPropertyTree> storage = getGlobalConfigSP()->getPropTree("storage");
Owned<IPropertyTree> globalConfig = getGlobalConfig();
Owned<IPropertyTree> storage = globalConfig->getPropTree("storage");
if (storage)
{
normalizeHostGroups();
normalizeHostGroups(globalConfig);

Owned<IPropertyTreeIterator> planes = storage->getElements("planes");
ForEach(*planes)
Expand Down
64 changes: 50 additions & 14 deletions dali/base/dafdesc.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3722,16 +3722,14 @@ static void generateHosts(IPropertyTree * storage, GroupInfoArray & groups)
static CConfigUpdateHook configUpdateHook;
static std::atomic<unsigned> normalizeHostGroupUpdateCBId{(unsigned)-1};
static CriticalSection storageCS;
static void doInitializeStorageGroups(bool createPlanesFromGroups)
static void doInitializeStorageGroups(bool createPlanesFromGroups, IPropertyTree * newGlobalConfiguration)
{
CriticalBlock block(storageCS);
Owned<IPropertyTree> globalConfig = getGlobalConfig();
Owned<IPropertyTree> storage = globalConfig->getPropTree("storage");
Owned<IPropertyTree> storage = newGlobalConfiguration->getPropTree("storage");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial: could be an unlinked item, i.e. could be IPropertyTree *storage = newGlobalConfiguration->queryPropTree("storage");

only mentioning, because I looked at the code and the fact it was linked made me wonder if it was necessary for some reason.

if (!storage)
storage.set(globalConfig->addPropTree("storage"));
storage.set(newGlobalConfiguration->addPropTree("storage"));

#ifndef _CONTAINERIZED
if (createPlanesFromGroups)
if (!isContainerized() && createPlanesFromGroups)
{
// Remove old planes created from groups
while (storage->removeProp("planes[@fromGroup='1']"));
Expand Down Expand Up @@ -3786,25 +3784,63 @@ static void doInitializeStorageGroups(bool createPlanesFromGroups)
//Uncomment the following to trace the values that been generated
//printYAML(storage);
}
#endif

//Ensure that host groups that are defined in terms of other host groups are expanded out so they have an explicit list of hosts
normalizeHostGroups();
normalizeHostGroups(newGlobalConfiguration);

//The following can be removed once the storage planes have better integration
setupContainerizedStorageLocations();
}

void initializeStorageGroups(bool createPlanesFromGroups)
/*
* This function is used to:
*
* (a) create storage planes from bare-metal groups
* (b) to expand out host groups that are defined in terms of other host groups
* (c) to setup the base directory for the storage planes. (GH->JCS is this threadsafe?)
*
* For most components this init function is called only once - the exception is roxie (called when it connects and disconnects from dali).
* In thor it is important that the update of the global config does not clone the property trees
* In containerized mode, the update function can be called whenever the config file changes (but it will not be creating planes from groups)
* In bare-metal mode the update function may be called whenever the environment changes in dali. (see initClientProcess)
*
* Because of the requirement that thor does not clone the property trees, thor cannot support any background update - via the environment in
* bare-metal, or config files in containerized. If it was to support it the code in the master would need to change, and updates would
* need to be synchronized to the workers. To support this requiremnt a threadSafe parameter is provided to avoid the normal clone.
*
* For roxie, which dynamically connects and disconnects from dali, there are different problems. The code needs to retain whether or not roxie
* is connected to dali - and only create planes from groups if it is connected. There is another potential problem, which I don't think will
* actually be hit:
* Say roxie connects, updates planes from groups and disconnects. If the update functions were called again those storage planes would be lost,
* but in bare-metal only a change to the environment will trigger an update, and that update will not happen if roxie is not connected to dali.
*/

static bool savedConnectedToDali = false; // Store in a global so it can be used by the callback without having to reregister
static bool lastUpdateConnectedToDali = false;
void initializeStoragePlanes(bool connectedToDali, bool threadSafe)
{
auto updateFunc = [createPlanesFromGroups](const IPropertyTree *oldComponentConfiguration, const IPropertyTree *oldGlobalConfiguration)
{
PROGLOG("initializeStorageGroups update");
doInitializeStorageGroups(createPlanesFromGroups);
//If the createPlanesFromGroups parameter is now true, and was previously false, force an update
CriticalBlock block(storageCS);
if (!lastUpdateConnectedToDali && connectedToDali)
configUpdateHook.clear();
savedConnectedToDali = connectedToDali;
}

auto updateFunc = [](IPropertyTree * newComponentConfiguration, IPropertyTree * newGlobalConfiguration)
{
bool connectedToDali = savedConnectedToDali;
lastUpdateConnectedToDali = connectedToDali;
PROGLOG("initializeStoragePlanes update");
doInitializeStorageGroups(connectedToDali, newGlobalConfiguration);
};

doInitializeStorageGroups(createPlanesFromGroups);
configUpdateHook.installOnce(updateFunc, false);
configUpdateHook.installModifierOnce(updateFunc, threadSafe);
}

void disableStoragePlanesDaliUpdates()
{
savedConnectedToDali = false;
}

bool getDefaultStoragePlane(StringBuffer &ret)
Expand Down
4 changes: 3 additions & 1 deletion dali/base/dafdesc.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -406,7 +406,9 @@ extern da_decl StringBuffer &getPartMask(StringBuffer &ret,const char *lname=NUL
extern da_decl void setPartMask(const char * mask);
extern da_decl bool setReplicateDir(const char *name,StringBuffer &out, bool isrep=true,const char *baseDir=NULL,const char *repDir=NULL); // changes directory of name passed to backup directory

extern da_decl void initializeStorageGroups(bool createPlanesFromGroups);
extern da_decl void initializeStoragePlanes(bool createPlanesFromGroups, bool threadSafe); // threadSafe should be true if no other threads will be accessing the global config
extern da_decl void disableStoragePlanesDaliUpdates();

extern da_decl bool getDefaultStoragePlane(StringBuffer &ret);
extern da_decl bool getDefaultSpillPlane(StringBuffer &ret);
extern da_decl bool getDefaultIndexBuildStoragePlane(StringBuffer &ret);
Expand Down
18 changes: 14 additions & 4 deletions dali/base/dameta.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -21,10 +21,12 @@
#include "dautils.hpp"


// Expand indirect hostGroups so each hostGroups has an expanded list of host names
void normalizeHostGroups()
//Expand indirect hostGroups so each hostGroups has an expanded list of host names
//This function cannot use (directly or indirectly) getGlobalConfig(), because it may be called when creating
//a new config.
void normalizeHostGroups(IPropertyTree * globalConfig)
{
Owned<IPropertyTreeIterator> hostGroupIter = getGlobalConfigSP()->getElements("storage/hostGroups");
Owned<IPropertyTreeIterator> hostGroupIter = globalConfig->getElements("storage/hostGroups");
//Process the groups in order - so that multiple levels of indirection are supported
ForEach (*hostGroupIter)
{
Expand All @@ -33,7 +35,15 @@ void normalizeHostGroups()
{
const char * name = cur.queryProp("@name");
const char * baseGroup = cur.queryProp("@hostGroup");
Owned<IPropertyTree> match = getHostGroup(baseGroup, true);
if (!baseGroup)
throw makeStringExceptionV(-1, "HostGroup %s with no hosts does not have a base hostgroup", name ? name : "<null>");

//Cannot call getHostGroup() because that uses getGlobalConfig()
VStringBuffer xpath("storage/hostGroups[@name='%s']", baseGroup);
IPropertyTree * match = globalConfig->queryPropTree(xpath);
if (!match)
throw makeStringExceptionV(-1, "No entry found for hostGroup: '%s'", baseGroup);

StringArray hosts;
Owned<IPropertyTreeIterator> hostIter = match->getElements("hosts");
ForEach (*hostIter)
Expand Down
2 changes: 1 addition & 1 deletion dali/base/dameta.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,6 @@ constexpr ResolveOptions operator &(ResolveOptions l, ResolveOptions r) { return
*/

extern da_decl IPropertyTree * resolveLogicalFilenameFromDali(const char * filename, IUserDescriptor * user, ResolveOptions options);
extern da_decl void normalizeHostGroups(); // Expand indirect hostGroups so each hostGroups has an expanded list of host names
extern da_decl void normalizeHostGroups(IPropertyTree * globalConfig); // Expand indirect hostGroups so each hostGroups has an expanded list of host names

#endif
5 changes: 3 additions & 2 deletions dali/base/dautils.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -88,8 +88,9 @@ bool getPlaneHost(StringBuffer &host, IPropertyTree *plane, unsigned which)
if (!hostGroup)
return false;

if (which >= hostGroup->getCount("hosts"))
throw makeStringException(0, "getPlaneHost: index out of range");
unsigned maxHosts = hostGroup->getCount("hosts");
if (which >= maxHosts)
throw makeStringExceptionV(0, "getPlaneHost: index %u out of range 1..%u", which, maxHosts);
VStringBuffer xpath("hosts[%u]", which+1); // which is 0 based
host.append(hostGroup->queryProp(xpath));
return true;
Expand Down
2 changes: 1 addition & 1 deletion dali/daliadmin/daadmin.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -507,7 +507,7 @@ bool dfspart(const char *lname, IUserDescriptor *userDesc, unsigned partnum, Str
void dfsmeta(const char *filename,IUserDescriptor *userDesc, bool includeStorage)
{
//This function isn't going to work on a container system because it won't have access to the storage planes
initializeStorageGroups(true);
initializeStoragePlanes(true, true);
ResolveOptions options = ROpartinfo|ROdiskinfo|ROsizes;
if (includeStorage)
options = options | ROincludeLocation;
Expand Down
2 changes: 1 addition & 1 deletion dali/dfu/dfuserver.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -198,7 +198,7 @@ int main(int argc, const char *argv[])
IPropertyTree * config = nullptr;
installDefaultFileHooks(config);

initializeStorageGroups(true);
initializeStoragePlanes(true, true);
}
StringBuffer queue, monitorQueue;
#ifndef _CONTAINERIZED
Expand Down
2 changes: 1 addition & 1 deletion dali/dfu/dfuutil.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -513,7 +513,7 @@ class CFileCloner
dstfdesc->setDefaultDir(dstdir.str());

Owned<IStoragePlane> plane = getDataStoragePlane(cluster1, false);
if (plane) // I think it should always exist, even in bare-metal.., but guard against it not for now (assumes initializeStorageGroups has been called)
if (plane) // I think it should always exist, even in bare-metal.., but guard against it not for now (assumes initializeStoragePlanes has been called)
{
DBGLOG("cloneSubFile: destfilename='%s', plane='%s', dirPerPart=%s", destfilename, cluster1.get(), boolToStr(plane->queryDirPerPart()));

Expand Down
2 changes: 1 addition & 1 deletion dali/dfuxref/dfuxrefmain.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -76,7 +76,7 @@ int main(int argc, char* argv[])
try
{
initClientProcess(group, DCR_XRef);
initializeStorageGroups(true);
initializeStoragePlanes(true, true);
StringArray args, clusters;
bool backupcheck = false;
unsigned mode = PMtextoutput|PMcsvoutput|PMtreeoutput;
Expand Down
2 changes: 1 addition & 1 deletion dali/sasha/saserver.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -409,7 +409,7 @@ int main(int argc, const char* argv[])
else
{
addAbortHandler(actionOnAbort);
initializeStorageGroups(true);
initializeStoragePlanes(true, true);
#ifdef _CONTAINERIZED
service = serverConfig->queryProp("@service");
if (isEmptyString(service))
Expand Down
2 changes: 1 addition & 1 deletion ecl/eclagent/eclagent.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3785,7 +3785,7 @@ extern int HTHOR_API eclagent_main(int argc, const char *argv[], Owned<ILocalWor
}
}

initializeStorageGroups(daliClientActive());
initializeStoragePlanes(daliClientActive(), true);

if (!standAloneWorkUnit)
{
Expand Down
2 changes: 1 addition & 1 deletion esp/platform/espcfg.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -400,7 +400,7 @@ CEspConfig::CEspConfig(IProperties* inputs, IPropertyTree* envpt, IPropertyTree*
if (sdsSessionNeeded && !daliservers.isEmpty())
initSDSSessionCleaner(isDetachedFromDali());

initializeStorageGroups(daliClientActive());
initializeStoragePlanes(daliClientActive(), true);

const unsigned dafilesrvConnectTimeout = m_cfg->getPropInt("@dafilesrvConnectTimeout", 10)*1000;
const unsigned dafilesrvReadTimeout = m_cfg->getPropInt("@dafilesrvReadTimeout", 10)*1000;
Expand Down
2 changes: 1 addition & 1 deletion esp/platform/espp.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -578,7 +578,7 @@ int init_main(int argc, const char* argv[])
config->checkESPCache(*server.get());

initializeMetrics(config);
initializeStorageGroups(daliClientActive());
initializeStoragePlanes(daliClientActive(), true);
}
catch(IException* e)
{
Expand Down
4 changes: 3 additions & 1 deletion roxie/ccd/ccddali.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -246,6 +246,7 @@ class CRoxieDaliHelper : implements IRoxieDaliHelper, public CInterface
virtual void beforeDispose()
{
CriticalBlock b(daliHelperCrit);

disconnectSem.interrupt();
connectWatcher.stop();
if (daliHelper==this) // there is a tiny window where new dalihelper created immediately after final release
Expand Down Expand Up @@ -805,7 +806,7 @@ class CRoxieDaliHelper : implements IRoxieDaliHelper, public CInterface
waitToConnect -= delay;
}
}
initializeStorageGroups(daliHelper->connected());
initializeStoragePlanes(daliHelper->connected(), false); // This can be called while queries are running - so is not thread safe
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not new, but it looks like connectToDali is called every time a query dll is loaded, unconditionally,
meaning there is excessive calling of loading/refreshing the configuration (may be slowing the init time down as well).
Should this only be called by when it 1st time and/or when it actually reconnects to Dali ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created HPCC-33198 to look at that issue.

return daliHelper;
}

Expand Down Expand Up @@ -928,6 +929,7 @@ class CRoxieDaliHelper : implements IRoxieDaliHelper, public CInterface
CriticalBlock b(daliConnectionCrit);
if (isConnected)
{
disableStoragePlanesDaliUpdates();
isConnected = false;
delete serverStatus;
serverStatus = NULL;
Expand Down
2 changes: 1 addition & 1 deletion roxie/ccd/ccdmain.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1463,7 +1463,7 @@ int CCD_API roxie_main(int argc, const char *argv[], const char * defaultYaml)
#endif

//MORE: I'm not sure where this should go, or how it fits in. Possibly the function needs to be split in two.
initializeStorageGroups(false);
initializeStoragePlanes(false, true);

EnableSEHtoExceptionMapping();
setSEHtoExceptionHandler(&abortHandler);
Expand Down
Loading
Loading