Fix handling of sidechains and gaps #32

Open
michaelsproul opened this issue Jan 30, 2024 · 2 comments

@michaelsproul

At the moment blockprint's DB is not handling sidechains very elegantly.

The database is designed so that a block can be inserted without knowing its parent. This is nice because it provides the ability to stay online processing new blocks, even if some old blocks have been missed due to temporary downtime. There's a background process in background_tasks.py which is meant to query the API for gaps to fill in, and patch them up.

The problem is that the logic for determining gaps returns some gaps that are impossible for the background task to heal. A gap is currently defined as a slot interval between a block with a parent missing from the DB (end_slot) and the last known block prior to that missing parent (start_slot):

blockprint/build_db.py

Lines 156 to 163 in c7f570d

def get_missing_parent_blocks(block_db):
    res = block_db.execute(
        """SELECT slot, parent_slot FROM blocks b1
           WHERE
             (SELECT slot FROM blocks WHERE slot = b1.parent_slot) IS NULL
             AND slot <> 1"""
    )
    return [(int(x[0]), int(x[1])) for x in res]

blockprint/build_db.py

Lines 181 to 191 in c7f570d

for block_slot, parent_slot in missing_parent_slots:
    prior_slot = get_greatest_prior_block_slot(block_db, parent_slot)
    if prior_slot is None:
        start_slot = 0
    else:
        start_slot = prior_slot + 1
    end_slot = block_slot - 1
    assert end_slot >= start_slot
    gaps.append({"start": start_slot, "end": end_slot})

E.g. the current output from https://api.blockprint.sigp.io/sync/gaps is:

[
  {
    "start": 8253079,
    "end": 8253086
  },
  {
    "start": 8253031,
    "end": 8253045
  },
  {
    "start": 8253127,
    "end": 8253151
  },
  {
    "start": 8254502,
    "end": 8254511
  },
  {
    "start": 8277096,
    "end": 8277097
  },
  {
    "start": 8299646,
    "end": 8299647
  },
  {
    "start": 8299650,
    "end": 8299651
  }
]

Looking at the first gap, we see that the block with a missing parent that triggered it must be the one at slot 8253087, which was reorged out (beaconcha.in doesn't even know about it): https://beaconcha.in/slot/8253087. Our Lighthouse nodes saw it though:

Jan 21 18:17:49.412 DEBG Cloned snapshot for late block/skipped slot, block_delay: Some(2.040814714s), parent_root: 0xd429dc371766b1d71fdad731879aafe7c2df990b402fbc5704b29144009cce8f, parent_slot: 8253079, slot: 8253087, service: beacon

Now the interesting thing here is the parent slot, https://beaconcha.in/slot/8253079. It's also empty! In order to heal the gap, we would need to load this parent block at 8253079, which we can't do because it has also been pruned.
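To make the failure mode concrete, here is a minimal sketch that reproduces it on a toy database. It assumes a SQLite table with only `slot` and `parent_slot` columns (the real schema has more), small made-up slot numbers, and a hypothetical stand-in for `get_greatest_prior_block_slot`, whose body is not shown above. The "healing" step is also a simplification: it just re-inserts whatever canonical blocks exist in the gap range.

```python
import sqlite3


def get_missing_parent_blocks(block_db):
    # Same query as in build_db.py: blocks whose parent is not in the DB.
    res = block_db.execute(
        """SELECT slot, parent_slot FROM blocks b1
           WHERE (SELECT slot FROM blocks WHERE slot = b1.parent_slot) IS NULL
             AND slot <> 1"""
    )
    return [(int(x[0]), int(x[1])) for x in res]


def get_greatest_prior_block_slot(block_db, slot):
    # Hypothetical stand-in: greatest stored block slot strictly below `slot`.
    row = block_db.execute(
        "SELECT MAX(slot) FROM blocks WHERE slot < ?", (slot,)
    ).fetchone()
    return None if row[0] is None else int(row[0])


def compute_gaps(block_db):
    gaps = []
    for block_slot, parent_slot in get_missing_parent_blocks(block_db):
        prior_slot = get_greatest_prior_block_slot(block_db, parent_slot)
        start_slot = 0 if prior_slot is None else prior_slot + 1
        end_slot = block_slot - 1
        assert end_slot >= start_slot
        gaps.append({"start": start_slot, "end": end_slot})
    return gaps


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE blocks (slot INTEGER PRIMARY KEY, parent_slot INTEGER)")

# Canonical chain: slot 4 and slot 6 are skipped slots.
canonical = [(1, 0), (2, 1), (3, 2), (5, 3), (7, 5)]
# Sidechain of length 2: a block at slot 4 (never stored, later pruned)
# and its child at slot 6, which we did store.
sidechain_tip = [(6, 4)]
db.executemany("INSERT INTO blocks VALUES (?, ?)", canonical + sidechain_tip)

gaps_before = compute_gaps(db)

# Simplified healing: re-fetch every canonical block inside each gap.
for gap in gaps_before:
    rows = [(s, p) for s, p in canonical if gap["start"] <= s <= gap["end"]]
    db.executemany("INSERT OR IGNORE INTO blocks VALUES (?, ?)", rows)

gaps_after = compute_gaps(db)
print(gaps_before, gaps_after)
```

Because the canonical chain has no block at slot 4, no amount of re-fetching fills in the parent of the block at slot 6, so the same gap is reported before and after healing.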

In summary, blockprint's gap healing is broken for sidechains of length > 1. I can think of two ways to fix it:

Give the background task the ability to either delete orphaned blocks or mark them as orphaned in the database when the slot of the missing parent has been finalized as a skipped slot. If we just mark them as orphaned, we get to keep them in the DB (moar data) without blocking the gap healing process on them. On the other hand, marking them as orphaned would require a new database column and a small DB migration (not too bad, given the small number of live blockprint instances).
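A rough sketch of the marking variant, under several assumptions: the column name `orphaned`, the function `mark_orphans`, and the `finalized_skipped_slots` set are all hypothetical, and how the background task would learn which slots are finalized as skipped (presumably from a beacon node) is left out entirely. The table is again a toy with only `slot` and `parent_slot`.

```python
import sqlite3


def migrate_add_orphaned_column(block_db):
    # Hypothetical migration: every existing block starts as not orphaned.
    block_db.execute(
        "ALTER TABLE blocks ADD COLUMN orphaned INTEGER NOT NULL DEFAULT 0"
    )


def get_missing_parent_blocks(block_db):
    # Same query as before, except blocks already marked orphaned are ignored.
    res = block_db.execute(
        """SELECT slot, parent_slot FROM blocks b1
           WHERE (SELECT slot FROM blocks WHERE slot = b1.parent_slot) IS NULL
             AND slot <> 1
             AND orphaned = 0"""
    )
    return [(int(x[0]), int(x[1])) for x in res]


def mark_orphans(block_db, finalized_skipped_slots):
    # Mark a parentless block as orphaned only when its parent's slot is
    # known to be finalized as a skipped slot (supplied by the caller).
    marked = [
        slot
        for slot, parent_slot in get_missing_parent_blocks(block_db)
        if parent_slot in finalized_skipped_slots
    ]
    block_db.executemany(
        "UPDATE blocks SET orphaned = 1 WHERE slot = ?", [(s,) for s in marked]
    )
    return marked


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE blocks (slot INTEGER PRIMARY KEY, parent_slot INTEGER)")
# Canonical blocks plus one stored sidechain block at slot 6 (parent slot 4).
db.executemany(
    "INSERT INTO blocks VALUES (?, ?)",
    [(1, 0), (2, 1), (3, 2), (5, 3), (7, 5), (6, 4)],
)

migrate_add_orphaned_column(db)
marked = mark_orphans(db, finalized_skipped_slots={4})
remaining = get_missing_parent_blocks(db)
print(marked, remaining)
```

The orphaned block stays in the table, but it no longer shows up as a block with a missing parent, so it would not produce a gap.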

@michaelsproul

Just tried this query as a hack to see if I could get the DB to heal:

WITH parentless AS (
  SELECT slot, proposer_index FROM blocks b1
  WHERE
    (SELECT slot FROM blocks WHERE slot = b1.parent_slot) IS NULL
    AND slot <> 1
)
DELETE FROM blocks
WHERE EXISTS (SELECT 1 FROM parentless WHERE parentless.slot = blocks.slot);

It deletes all blocks that lack parents, which could fix the DB if there are just a few length-2 sidechains. We'll see.
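For reference, the effect of a single pass of that query can be checked on a toy SQLite database. This is only a sketch: the three-column table, the made-up slot numbers, and the proposer indices are assumptions for illustration, not the real blockprint schema or data.

```python
import sqlite3

DELETE_PARENTLESS = """
WITH parentless AS (
  SELECT slot, proposer_index FROM blocks b1
  WHERE
    (SELECT slot FROM blocks WHERE slot = b1.parent_slot) IS NULL
    AND slot <> 1
)
DELETE FROM blocks
WHERE EXISTS (SELECT 1 FROM parentless WHERE parentless.slot = blocks.slot);
"""

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE blocks ("
    "slot INTEGER PRIMARY KEY, parent_slot INTEGER, proposer_index INTEGER)"
)
# Canonical blocks at slots 1, 2, 3, 5, 7 plus the stored tip of a
# length-2 sidechain at slot 6, whose parent at slot 4 was never stored.
db.executemany(
    "INSERT INTO blocks VALUES (?, ?, ?)",
    [(1, 0, 10), (2, 1, 11), (3, 2, 12), (5, 3, 13), (7, 5, 14), (6, 4, 15)],
)

db.execute(DELETE_PARENTLESS)

remaining = [row[0] for row in db.execute("SELECT slot FROM blocks ORDER BY slot")]
parentless_after = db.execute(
    """SELECT slot FROM blocks b1
       WHERE (SELECT slot FROM blocks WHERE slot = b1.parent_slot) IS NULL
         AND slot <> 1"""
).fetchall()
print(remaining, parentless_after)
```

Note that the query cannot tell an orphaned sidechain block from a canonical block whose parent simply has not been fetched yet, so it would delete both.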

@michaelsproul

Seems to have worked.
