-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes for staking code and thread shutdown #505
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good. Comments below are just minor things and simply opinions.
src/komodo_bitcoind.cpp
Outdated
segid32 = komodo_stakehash(&hash,address,hashbuf,txid,vout); | ||
if ( *numkp >= *maxkp ) | ||
if ( array.size() >= *maxkp ) | ||
{ | ||
*maxkp += 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see the reason for maxkp. I'm thinking it is to control allocations. If the desired functionality is to grow in chunks of 1000 elements, why not use array.size()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay thank you, maxkp can be eliminated and array.size() and capacity() be used
src/komodo_bitcoind.cpp
Outdated
kp->nValue = nValue; | ||
kp->scriptPubKey = pk; | ||
return(array); | ||
//array = (struct komodo_staking *)realloc(array,sizeof(*array) * (*maxkp)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments like this make the code difficult to follow. If you remove these lines before committing you lose nothing, as features within git make historical research easy.
src/komodo_bitcoind.cpp
Outdated
@@ -2564,23 +2567,23 @@ int32_t komodo_staked(CMutableTransaction &txNew,uint32_t nBits,uint32_t *blockt | |||
fprintf(stderr,"[%s:%d] chain tip changed during staking loop t.%u counter.%d\n",ASSETCHAINS_SYMBOL,nHeight,(uint32_t)time(NULL),i); | |||
return(0); | |||
} | |||
kp = &array[i]; | |||
eligible = komodo_stake(0,bnTarget,nHeight,kp->txid,kp->vout,0,(uint32_t)tipindex->nTime+ASSETCHAINS_STAKED_BLOCK_FUTURE_HALF,kp->address,PoSperc); | |||
struct komodo_staking &kp = array[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
struct
s have been upgraded to first-class citizens now. No need to use that keyword here (unless you're compiling for C).
btw this is generally not good to memset C++ objects (which have their own constructors) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Co-authored-by: DeckerSU <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thanks @dimxy
@@ -2188,6 +2188,8 @@ void static BitcoinMiner() | |||
if (minerThreads != NULL) | |||
{ | |||
minerThreads->interrupt_all(); | |||
// std::cout << "Waiting for mining threads to stop..." << std::endl; | |||
minerThreads->join_all(); // prevent thread overlapping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the lack of interruption points, using both interrupt_all() and join_all() can (and "randomly" will) bring everything to a grinding halt when between interruptibles and unjoinable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point about lack of interruption points.
Currently we do not have join what leads to inaccurate thread shutdown.
And adding join_all works well in marmara and fixed a crash when a user called 'setgenerate false' and 'setgenerate true' quickly in a sequence.
But I missed an extra interruption_point (from the marmara code) in komodo_waituntilelegible which does a long loop which would make thread shutdowns faster, adding it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and fixed a crash when a user called 'setgenerate false' and 'setgenerate true' quickly in a sequence.
This is actually where adding the join_all() creates the issue. Most specifically, it creates an issue with NN mining and CreateNewBlock (which MCL doesn't use).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain what issue you mean? I believe joins may create an issue when a thread can never be joined and hangs because it may have a long loop without interruption points or sleep or wait calls. If we have such loops we need to fix this to provide thread graceful shutdown to work.
(I am going to run my dev NN node on this branch for testing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BitcoinMiner
only contains a single interruption_point()
That point comes after the while loop at:
Line 1969 in dcbf657
while ( GetTime() < B.nTime-2 ) |
which can hold up to 17.5 minutes, as set by:
Line 860 in dcbf657
pblock->nTime += (r % (33 - gpucount)*(33 - gpucount)); |
Ironically, the likelihood of that longest pause is increased by the new "stall reduction" code increasing the possibility for
gpucount
to reach 0 (assuming the rand hits 1056, which is possible).
Being async, the thought would be that this part would stick on its own and there would be no care; however, in reality, there are about a dozen circumstances where it locks everything (one of those circumstances being that threads are ignorant of each other and on advanced hardware come back with multiple solves when many miner threads are used).
When you look at what "should be" vs "what is", it shouldn't be a problem because NNs "should be" only running one thread; however, this is becoming decreasingly true with more and more NNs seeking to hit smaller and more predictable gaps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good catch, this loop.
However I believe when a user calls setgenerate false and then setgenerate true without join there will be two running threads for some time and this is basically not good at all (in a staking chain this could even create a crash as there is a static komodo_staking *array var that could be corrupted in this case). So I think threads should be stopped gracefully by joining.
This loop you mentioned has sleep() inside and we can replace it on boost::this_thread::sleep_for function (which allows to interrupt the thread) and we should check other remaining loops in miner.cpp to add interruption points in a similar way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and we can replace it on boost::this_thread::sleep_for function (which allows to interrupt the thread) and we should check other remaining loops in miner.cpp to add interruption points in a similar way
However you want to do it, just want to make sure that you/everyone is aware that doing it as-is will lock up NNs; so, whatever way it's done to protect stakers that could crash needs to be done in such a way as to protect both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about your doubts though.
Deleting a thread object without join is an obvious bug IMO and should be fixed.
Fixing it may add some delay on daemon stopping or setgenerate false but this is not a lock-up if we do this properly.
Checking all loops in miner.cpp...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw this code does not work at all:
Line 251 in 91ea37b
boost::this_thread::disable_interruption(); |
Maybe we should fix this too as it was intended
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't fully tested it yet, but (along with the others)
Line 1989 in 12261b9
boost::this_thread::sleep_for(boost::chrono::seconds(1)); // allow to interrupt |
does look like it'll solve the issue of my concern. ty
Actually
So, i guess no need to change |
now sleep() replaced on sleep_for() |
these changes implemented in combined PR: #559 |
fixes for this issue #478