Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/Era1 import/export #6547

Open
wants to merge 243 commits into
base: master
Choose a base branch
from
Open

Feature/Era1 import/export #6547

wants to merge 243 commits into from

Conversation

ak88
Copy link
Contributor

@ak88 ak88 commented Jan 16, 2024

Includes #7665 #7680

Introduces a new archive format called Era1, which is based on the Era spec made by nimbus, and this proposal from arnetheduck.

The purpose of Era1 is to provide the same functionality for EL, that Era provides for the CL. An easily verifiable and shareable format for pre-merge historic EL data, which is considered a prerequisite for eip-4444 . Era1 only concerns pre-merge data, since post-merge EL payload is contained in the beacon chain.

In the long term the idea is to distribute historic data through the portal network.

Underneath Era1 is based on e2store which is a simple type-length-value scheme.

Specification

An Era1 archive can be expressed like so

   era1 := Version | block-tuple* | other-entries* | Accumulator | BlockIndex
   block-tuple :=  CompressedHeader | CompressedBody | CompressedReceipts | TotalDifficulty
   block-index := starting-number | index | index | index ... | count

Headers, bodies and receipts are compressed using the snappy framing format.
Additionally each file contains a block index for fast lookup, and an Epoch accumulator for verification.

The epoch accumulator can be used to verify the entire archive with accumulators from a trusted source. Additionally it allows a node to download a header with a merkle-proof, that proves it belongs to a certain epoch.

The Epoch accumulator is defined in the portal network spec here.

Behaviour

  • Two operation is added, import and export can be invoked via rpc or cli.

    • --Era.ImportDirectory "something" --Era.From 0 --Era.To 0 --Era.TrustedAccumulatorFile "somefile"
      • From, To and TrustedAccumulatorFile is optional. Will import everything when range is set to 0. Will trust the era directory if TrustedAccumulatorFile is not specified.
      • RPC is admin_importHistory.
    • --Era.ExportDirectory "something" --Era.From 0 --Era.To 0
      • From, To is optional. Will export everything when set to 0.
      • accumulators.txt and checksums.txt is always created.
      • RPC is admin_exportHistory.
  • For simpliciy, CLI will block the application from starting until it is completed. RPC does not block but only one import/export can run at the same time.

  • During import, block range before head will be inserted in parallel like old bodies, and after head will be "suggested" like forward sync. So it will process new imported block.

    • It will not complete until the blocks are processed, and will pause adding if the processing queue reaches 2000 block.
    • An exception to this is if fastsync is enabled and the head is 0, in which case, it will only insert blocks like old bodies.
    • Accumulator and content is always verified/checksumed, along with IBlockValidator.
  • During export, it export the range in parallel, each era file having its own task.

Types of changes

What types of changes does your code introduce?

  • New feature (a non-breaking change that adds functionality)

Testing

Requires testing

  • Yes

If yes, did you write tests?

  • Yes

Documentation

Requires documentation update

  • Yes

Requires explanation in Release Notes

  • Yes

Copy link
Member

@LukaszRozmej LukaszRozmej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall magnificent work! Just pointing out there is potential for a lot more optimizations, some micro, some maybe more substantial. I don't see anything particularly wrong, not approving yet only because I want to have those discussions and potential improvements here and not loose the conversations.

Missing if (_logger.IsXXX) in many places.
Use is null/is not null.

/// <param name="startIndex">Optional start index of the underlying buffer.</param>
/// <returns></returns>
/// <exception cref="InvalidOperationException"></exception>
public static Memory<byte> AsMemory(this IByteBuffer buffer, int? startIndex = null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe have optional length for completeness?

Comment on lines +69 to +79
[JsonRpcMethod(Description = "Exports a range of historic block in era1 format.",
EdgeCaseHint = "",
ExampleResponse = "\"Export task started.\"",
IsImplemented = true)]
Task<ResultWrapper<string>> admin_exportHistory(string destination, int start, int end);

[JsonRpcMethod(Description = "Import a range of historic block from era1 directory.",
EdgeCaseHint = "",
ExampleResponse = "\"Export task started.\"",
IsImplemented = true)]
Task<ResultWrapper<string>> admin_importHistory(string source, int start, int end, string? accumulatorFile);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add JsonRpcParameter for parameter docs.

Comment on lines +135 to +138
IContainer eraContainer = _api.ConfigureContainerBuilderFromApiWithBlockchain(new ContainerBuilder())
.AddModule(new EraModule())
.Build();
_api.DisposeStack.Push((IAsyncDisposable)eraContainer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why separate container?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't wanna touch NethermindApi.

Comment on lines +25 to +28
await using IContainer container = _api.ConfigureContainerBuilderFromApiWithBlockchain(new ContainerBuilder())
.AddModule(new EraModule())
.Build();
await container.Resolve<EraCliRunner>().Run(cancellationToken);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why separate container?

Comment on lines +28 to +29
[ConfigItem(Description = "Network name used for era directory naming. When null, it will imply from network.", DefaultValue = "null", HiddenFromDocs = true)]
string? NetworkName { get; set; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When null, it will imply from network is not understandable what you mean.

if (accumulatorFile != null)
{
string[] lines = _fileSystem.File.ReadAllLines(accumulatorFile);
trustedAccumulators = lines.Select(s => new ValueHash256(s)).ToHashSet();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the accumulators be in the same order as the data?
If yes then:

  1. Shouldn't be a Set but more of a list/enumerable
  2. Don't need to allocate a collection, but could be lazy loaded from the file when needed as IEnumerable<> or better IAsyncEnumerable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. Imagine you have an accumulator file from EF website for example, and another era archive for only a portion of the history not necessarily starting from genesis.

Comment on lines +100 to +104
_receiptsDb.Tune(ITunableDb.TuneType.Default);
_blocksDb.Tune(ITunableDb.TuneType.Default);

if (t.IsFaulted)
throw t.Exception?.InnerException!;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do await and put Tune(ITunableDb.TuneType.Default) in finally

Comment on lines +142 to +144
// I wish I could say that EraStore can be run used in parallel in any way you like but I could not make it so.
// This make the `blockNumber` aligned to era file boundary so that when running parallel, each thread does not
// work on the same era file as other thread.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the problem with parallelization?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EraReader sharing in EraStore is limited. If a thread try to access an era reader that is currently being used, it will create a new EraReader. Need a more sophisticated reference counting mechanism to make work.

await SuggestAndProcessBlock(b);
}
else
InsertBlockAndReceipts(b, r);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to decode blocks and receipts from RLP just to insert it here again and code it again (maybe we do)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

if (!(_verifiedEpochs.TryGetValue(epoch, out bool verified) && verified))
{
ValueHash256 checksum = reader.CalculateChecksum();
ValueHash256 expectedChecksum = _checksums[epoch - FirstEpoch];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checksums could be loaded lazily?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants