Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes for staking code and thread shutdown #505

Closed
wants to merge 10 commits into from

Conversation

dimxy
Copy link
Collaborator

@dimxy dimxy commented Sep 23, 2021

fixes for this issue #478

jmjatlanta
jmjatlanta previously approved these changes Sep 24, 2021
Copy link

@jmjatlanta jmjatlanta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good. Comments below are just minor things and simply opinions.

segid32 = komodo_stakehash(&hash,address,hashbuf,txid,vout);
if ( *numkp >= *maxkp )
if ( array.size() >= *maxkp )
{
*maxkp += 1000;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see the reason for maxkp. I'm thinking it is to control allocations. If the desired functionality is to grow in chunks of 1000 elements, why not use array.size()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay thank you, maxkp can be eliminated and array.size() and capacity() be used

kp->nValue = nValue;
kp->scriptPubKey = pk;
return(array);
//array = (struct komodo_staking *)realloc(array,sizeof(*array) * (*maxkp));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments like this make the code difficult to follow. If you remove these lines before committing you lose nothing, as features within git make historical research easy.

@@ -2564,23 +2567,23 @@ int32_t komodo_staked(CMutableTransaction &txNew,uint32_t nBits,uint32_t *blockt
fprintf(stderr,"[%s:%d] chain tip changed during staking loop t.%u counter.%d\n",ASSETCHAINS_SYMBOL,nHeight,(uint32_t)time(NULL),i);
return(0);
}
kp = &array[i];
eligible = komodo_stake(0,bnTarget,nHeight,kp->txid,kp->vout,0,(uint32_t)tipindex->nTime+ASSETCHAINS_STAKED_BLOCK_FUTURE_HALF,kp->address,PoSperc);
struct komodo_staking &kp = array[i];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

structs have been upgraded to first-class citizens now. No need to use that keyword here (unless you're compiling for C).

@dimxy
Copy link
Collaborator Author

dimxy commented Sep 25, 2021

btw this is generally not good to memset C++ objects (which have their own constructors)
https://github.com/dimxy/komodo/blob/ba1b22a300144c97285906f2abc1486cc17530d9/src/komodo_bitcoind.cpp#L2462

jmjatlanta
jmjatlanta previously approved these changes Feb 22, 2022
Copy link

@jmjatlanta jmjatlanta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

src/miner.cpp Outdated Show resolved Hide resolved
Co-authored-by: DeckerSU <[email protected]>
jmjatlanta
jmjatlanta previously approved these changes Jul 12, 2022
Copy link

@jmjatlanta jmjatlanta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks @dimxy

@@ -2188,6 +2188,8 @@ void static BitcoinMiner()
if (minerThreads != NULL)
{
minerThreads->interrupt_all();
// std::cout << "Waiting for mining threads to stop..." << std::endl;
minerThreads->join_all(); // prevent thread overlapping

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the lack of interruption points, using both interrupt_all() and join_all() can (and "randomly" will) bring everything to a grinding halt when between interruptibles and unjoinable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point about lack of interruption points.
Currently we do not have join what leads to inaccurate thread shutdown.
And adding join_all works well in marmara and fixed a crash when a user called 'setgenerate false' and 'setgenerate true' quickly in a sequence.
But I missed an extra interruption_point (from the marmara code) in komodo_waituntilelegible which does a long loop which would make thread shutdowns faster, adding it

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and fixed a crash when a user called 'setgenerate false' and 'setgenerate true' quickly in a sequence.

This is actually where adding the join_all() creates the issue. Most specifically, it creates an issue with NN mining and CreateNewBlock (which MCL doesn't use).

Copy link
Collaborator Author

@dimxy dimxy Jul 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain what issue you mean? I believe joins may create an issue when a thread can never be joined and hangs because it may have a long loop without interruption points or sleep or wait calls. If we have such loops we need to fix this to provide thread graceful shutdown to work.
(I am going to run my dev NN node on this branch for testing)

Copy link

@TheComputerGenie TheComputerGenie Jul 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BitcoinMiner only contains a single interruption_point()
That point comes after the while loop at:

while ( GetTime() < B.nTime-2 )

which can hold up to 17.5 minutes, as set by:
pblock->nTime += (r % (33 - gpucount)*(33 - gpucount));

Ironically, the likelihood of that longest pause is increased by the new "stall reduction" code increasing the possibility for gpucount to reach 0 (assuming the rand hits 1056, which is possible).

Being async, the thought would be that this part would stick on its own and there would be no care; however, in reality, there are about a dozen circumstances where it locks everything (one of those circumstances being that threads are ignorant of each other and on advanced hardware come back with multiple solves when many miner threads are used).

When you look at what "should be" vs "what is", it shouldn't be a problem because NNs "should be" only running one thread; however, this is becoming decreasingly true with more and more NNs seeking to hit smaller and more predictable gaps.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good catch, this loop.
However I believe when a user calls setgenerate false and then setgenerate true without join there will be two running threads for some time and this is basically not good at all (in a staking chain this could even create a crash as there is a static komodo_staking *array var that could be corrupted in this case). So I think threads should be stopped gracefully by joining.
This loop you mentioned has sleep() inside and we can replace it on boost::this_thread::sleep_for function (which allows to interrupt the thread) and we should check other remaining loops in miner.cpp to add interruption points in a similar way

Copy link

@TheComputerGenie TheComputerGenie Jul 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and we can replace it on boost::this_thread::sleep_for function (which allows to interrupt the thread) and we should check other remaining loops in miner.cpp to add interruption points in a similar way

However you want to do it, just want to make sure that you/everyone is aware that doing it as-is will lock up NNs; so, whatever way it's done to protect stakers that could crash needs to be done in such a way as to protect both.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about your doubts though.
Deleting a thread object without join is an obvious bug IMO and should be fixed.
Fixing it may add some delay on daemon stopping or setgenerate false but this is not a lock-up if we do this properly.
Checking all loops in miner.cpp...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw this code does not work at all:

boost::this_thread::disable_interruption();
as 'disable_interruption' is a type and to activate it we need to create a local var.
Maybe we should fix this too as it was intended

Copy link

@TheComputerGenie TheComputerGenie Jul 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't fully tested it yet, but (along with the others)

boost::this_thread::sleep_for(boost::chrono::seconds(1)); // allow to interrupt

does look like it'll solve the issue of my concern. ty

@DeckerSU
Copy link

Actually MilliSleep is already defined as:

void MilliSleep(int64_t n)
{
    boost::this_thread::sleep_for(boost::chrono::milliseconds(n));
}

So, i guess no need to change MilliSleep calls on boost::this_thread::sleep_for ... just need to change all sleep(...) in the code (may be not only in the miner) on MilliSleep.

@dimxy
Copy link
Collaborator Author

dimxy commented Jul 14, 2022

now sleep() replaced on sleep_for()

@tonymorony
Copy link

these changes implemented in combined PR: #559

@tonymorony tonymorony closed this Sep 5, 2022
Alrighttt pushed a commit to Alrighttt/komodo that referenced this pull request May 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants