Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Summary docs #10435

Merged
merged 17 commits into from
Jun 1, 2022
Merged

Summary docs #10435

merged 17 commits into from
Jun 1, 2022

Conversation

NicholasCouri
Copy link
Contributor

@NicholasCouri NicholasCouri commented May 26, 2022

Update Summary documentation to better describe parameters and returned types.

Description

This is by no means a final version and the idea is for this to be an ongoing effort and that we can gradually add/provide more information around the Summary APIs. I will rely on the code review to add details I have missed or misunderstood from the code and after talking to Wes, Navin and Tyler.

PR Checklist

Use the check-list below to ensure your branch is ready for PR. If the item is not applicable, leave it blank.

  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • My code follows the code style of this project.
  • I ran the lint checks which produced no new errors nor warnings for my changes.
  • I have checked to ensure there aren't other open Pull Requests for the same update/change.

Does this introduce a breaking change?

  • Yes
  • No

If this introduces a breaking change, please describe the impact and migration path for existing applications below.

Testing

N/A

@NicholasCouri NicholasCouri requested review from a team as code owners May 26, 2022 01:34
@github-actions github-actions bot added area: odsp-driver area: runtime Runtime related issues base: main PRs targeted against main branch labels May 26, 2022
@github-actions github-actions bot added the public api change Changes to a public API label May 26, 2022
common/lib/protocol-definitions/src/summary.ts Outdated Show resolved Hide resolved
common/lib/protocol-definitions/src/summary.ts Outdated Show resolved Hide resolved
packages/runtime/runtime-definitions/src/summary.ts Outdated Show resolved Hide resolved
export const Handle: Handle = 3 as const;
export const Attachment: Attachment = 4 as const;
/**
* Another recursive data structure.
Copy link
Contributor

@agarwal-navin agarwal-navin May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably say that this represents a sub-tree in the summary. #Resolved

export const Tree: Tree = 1 as const;

/**
* Binary data to be uploaded to the server. The data is sent to the drivers
Copy link
Contributor

@agarwal-navin agarwal-navin May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should mention driver or server here. Runtime doesn't really care about any of that.

This should probably just say that this represents a blob of data that is added to the summary. Such as the user data that is added to the DDS or metadata added by runtime such as data store / channel attributes. #Resolved

}
export type SummaryType = SummaryType.Attachment | SummaryType.Blob | SummaryType.Handle | SummaryType.Tree;

export type SummaryTypeNoHandle = SummaryType.Tree | SummaryType.Blob | SummaryType.Attachment;

/**
* Path to a summary tree object from the last uploaded summary indicating the summary object hasn't
Copy link
Contributor

@agarwal-navin agarwal-navin May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Path to a summary tree object from the last uploaded summary indicating the summary object hasn't
* Path to a summary tree object from the last successful summary indicating the summary object hasn't
``` #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't very important. I am just trying to differentiate the case where the last uploaded summary was not successfully ack'd by the server.

/**
* Path to a summary tree object from the last uploaded summary indicating the summary object hasn't
* changed since it was uploaded.
* To illustrate, if a DataStore did not get any ops since last summary, the framework runtime will use a handle for the
Copy link
Contributor

@agarwal-navin agarwal-navin May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should say if a Datastore did not change since last summary. The fact that we use ops is an implementation detail. There could be other factors leading to a data store being summarized - such as its GC state changing from last summary.
Similar comment for DDS. #Resolved

* changed since it was uploaded.
* To illustrate, if a DataStore did not get any ops since last summary, the framework runtime will use a handle for the
* entire DataStore instead of re-sending the entire subtree. Same concept will be applied for a DDS.
* If a DDS did not receive any ops since the last summary, the ISummary tree for that DDS will not have changed and
Copy link
Contributor

@agarwal-navin agarwal-navin May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a readability point of view, I don't think this line about DDS is needed. Mentioning that the behavior is same as data store should be sufficient. #Resolved

* the fluid framework will send that DDS' handle so the server can use that previous handle instead of sending the
* entire DDS. An example of handle would be: '/<DataStoreId>/<DDSId>'.
* Notice that handles are optimizations from the Fluid Framework Runtime and the DDS is not aware of the handles.
* Also, the use of Summary Handles is currently restricted to DataStores and DDS.
Copy link
Contributor

@agarwal-navin agarwal-navin May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should omit this last line. We are working on adding handles for GC data and we will soon add it in other places too. #Resolved

* @returns A tree representing the summary of the channel.
*/
getAttachSummary(fullTree?: boolean, trackState?: boolean): ISummaryTreeWithStats;

/**
* Generates summary of the channel asynchronously.
* This should not be called where the channel can be modified while summarization is in progress.
* @param fullTree - flag indicating whether the attempt should generate a full
Copy link
Contributor

@agarwal-navin agarwal-navin May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth adding that this is called when generating a summary of the entire container. #Resolved

* compose the Summary Tree. At the same time, each component that is taking part of the summarization
* will populate the tree information aggregation for its subtree through the ISummaryStats interface.
* Any component that implements IChannelContext, IFluidDataStoreChannel or extends SharedObject
* will be taking part of the summarization process.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, this description may not be very relevant here. This is essentially describing the summarization process. It should rather say what this interface means. Basically, that this represents the summary tree for a node along with the statistics for that tree. As an example it could mention that for a data store this has the data for the data store along with a subtree for each of its DDS.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another important piece of information here may be its distinction from ISummarizeResult below.
Take a look at this issue which may help with some of the comments here - #4434

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its also worth adding a comment for ISummarizeInternalResult as its not clear what it is and how it differs from ISummarizeResult

@@ -21,41 +21,89 @@ export interface ISummaryCommitter {
date: string;
}

/**
Copy link
Contributor

@agarwal-navin agarwal-navin May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Orthogonal to this PR - Interesting to note that ISummaryAuthor and ISummaryCommiter above are never used. We may want to look into removing them. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracking by #10456

@agarwal-navin
Copy link
Contributor

Another issue that talks about these concepts that is relevant here - #4683

@NicholasCouri
Copy link
Contributor Author

NicholasCouri commented May 26, 2022

Should we add a link to this issue in the documentation ?


In reply to: 1138842742

@@ -21,41 +21,88 @@ export interface ISummaryCommitter {
date: string;
}

/**
* Represents a node from the Summary Tree.
Copy link
Contributor

@CraigMacomber CraigMacomber May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this comment is accurate. I think something like this would be better:

Suggested change
* Represents a node from the Summary Tree.
* Type tag used to distinguish different types of {@link SummaryObject}s.
``` #Resolved

Comment on lines 41 to 42
* Summary defines a recursive data structure that will be converted to a snapshot tree and uploaded
* to the backend.
Copy link
Contributor

@CraigMacomber CraigMacomber May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can be more direct with the language, and thus less verbose.
Phrasing using "defines" and "represents" are common examples of this, and should be avoided unless we explicitly need to distinguish between some real phusical thing the code is describing and the model the types in the code create. Here this field IS the summary tree. There is no reason to put "Represents" on every type and "defines" on every field.
For example here:

Suggested change
* Summary defines a recursive data structure that will be converted to a snapshot tree and uploaded
* to the backend.
* A recursive data structure that will be converted to a snapshot tree and uploaded
* to the backend.

Also in this case we need to consider what documenation belongs on the type, and what belongs on the field. We don't want to be redundent and include docs that belong on the type everywhere its used.

Whats relevent here is that this is the tree that is described by the stats.

One other think to look out for: documentation should be clear about what a type can be used for. Is ISummaryTreeWithStats only use before a summary is uploaded (and not held onto after, or to represent downloaded summaries)? Is so, the documentation on this type chould be clear about that. If not, the docs on this field are too specific.

As far as I can tell, the purpose of this type (ISummaryTreeWithStats) is to pair a summary with some stats about it, so I think just documenting this field as "the summary decribed by stats" would be fine: leave detaisl about how ISummaryTree is used for documenation on that type, or the thing that uses it. #Resolved

@@ -21,13 +21,19 @@ export interface IChannel extends IFluidLoadable {

/**
* Generates summary of the channel synchronously.
* @param fullTree - flag indicating whether the attempt should generate a full
* summary tree without any handles for unchanged subtrees.
* @param trackState - This tells whether we should track state from this summary.
Copy link
Contributor

@CraigMacomber CraigMacomber May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm implementing this interface, how do I "trackState"? What does that mean? How can I tell if code does it correctly or not? Thats what this comment needs to answer.

Also from a verbosity standpoint, "This tells whether we should" does not really provide any information, and "we" isn't clear (this is a contract between caller and calee: which is "we"?). I'd prefer something like:

"If true, the implementation must ?????" #Resolved

Copy link
Contributor Author

@NicholasCouri NicholasCouri May 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that for Summaries ( the same applies for the GC summarization?) , at least when we do the summary of the runtime at a specific sequence number, the default behavior is to always track the state of every object.

@agarwal-navin to make sure it is correct. Can you confirm which scenarios we do not track the state and why would we not track it ?

I know if it is false we bypass the Summarizer node call and go straight to summarizeInternal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

State tracking is basically an optimization that we added to track state of objects across summaries. Basically, if the state of an object did not change since last successful summary, we can send an ISummaryHandle for this object instead of re-summarizing it.
If this is false, the expectation is that you should never send an ISummaryHandle since you are not expected to track state.

This was mostly internal and the state tracking happens in summarizer node for data store context and channel context. If a DDS wants to track its internal summary state, it is free to do so (but no DDS does this).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are plans to do this at DDS level - https://dev.azure.com/fluidframework/internal/_workitems/edit/423. When this happens, this option should either be removed or it will make more sense because we would likely expose the previous successful summary. And then we say something like, if trackState is true, feel free to use this previous summary to send summary handles.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the requirement here is (and thus a good doc comment would be) - if false, the implementation must not return a tree containing any ISummaryHandles?.

Might be better to rename this "allowHandles"?

It would be nice to deduplicate this documentation (as well as fullTree) across the various places they occur. This could be done as a non-breaking change by adding a type alias (ex: export type TrackState = boolean;) and documenting that, or as a breaking change by switching to an enum, or a summary options object with trackState and fullTree fields document on it.

Copy link
Contributor Author

@NicholasCouri NicholasCouri May 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will leave this change to a different PR.

* @returns A tree representing the summary of the channel.
*/
getAttachSummary(fullTree?: boolean, trackState?: boolean): ISummaryTreeWithStats;

/**
* Generates summary of the channel asynchronously.
* This should not be called where the channel can be modified while summarization is in progress.
* @param fullTree - flag indicating whether the attempt should generate a full
* summary tree without any handles for unchanged subtrees.
* @param trackState - This tells whether we should track state from this summary.
* @returns A tree representing the summary of the channel.
Copy link
Contributor

@CraigMacomber CraigMacomber May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you didn't change this, but I think it should be improved.

We can remove some verbosity, and be more specific (ideally something more specific that what I said would be good, but I don't know better). From context we already know this returns the summary of the channel (its channel.summarize afterall), so details beyond that are useful.

Suggested change
* @returns A tree representing the summary of the channel.
* @returns A summary capturing the current state of the channel.
``` #Resolved

* To illustrate, if a DataStore did not change since last summary, the framework runtime will use a handle for the
* entire DataStore instead of re-sending the entire subtree. The same concept applies for a DDS.
* An example of handle would be: '/<DataStoreId>/<DDSId>'.
* Notice that handles are optimizations from the Fluid Framework Runtime and the DDS is not aware of the handles.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see two issues with this:
Shouldn't that be "SharedObjects" and not DDS.

Also SharedObject authors have to implement a method that returns a datastructure that can use this type.
Is this comment intendting to say that a SharedObject subclass should never use ISummaryHandle? Is so, we should adjust the types so they can't return them as part of the summary tries they produce.

// Stored handle reference
/**
* Unique path that identifies the stored handle reference.
*/
handle: string;
Copy link
Contributor

@CraigMacomber CraigMacomber May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do I get one of these?

Copy link
Contributor Author

@NicholasCouri NicholasCouri May 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a developer, I don't think you need to get it - if I got it right, it is automatically set (id) when we create the fluid file from a summary: Ex. createNewFluidFileFromSummary.

@agarwal-navin to confirm.

Copy link
Contributor

@agarwal-navin agarwal-navin May 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. Currently, every object that wants to use this to send a path to previous summary instead of full summary has to know what the path is. For example, the summarizer node for data store context and channel contexts tracks this path and adds it.
This path is basically its path in the container's object tree. For instance, for DDS it is /<dataStoreId>/<ddsId>.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the way it should it be is similar to how summary tree is built. Each node just adds its summary tree when asked for it. It's the responsibility of the caller to add this tree against the correct path. This way by the time the summary tree reaches the root, all paths are fully built w.r.t. root.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's sync up ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a reference- the only place I found that adds the handle to the tree is when we call encodeSummary (builder.addHandle).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main usage of summary handle in our code -

.

The encodeSummary bit is part of this feature called differential summary which we don't use anymore.

Copy link
Contributor Author

@NicholasCouri NicholasCouri May 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Navin I had missed it - at the SummarizerNode level, if we are tracking it, it is stored under _latestSummary.fullPath.path

export interface ISummaryHandle {
type: SummaryType.Handle;

// No handles, all other SummaryType are Ok
/**
* Type of Summary Handle (SummaryType.Handle is not supported).
Copy link
Contributor

@CraigMacomber CraigMacomber May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really the type of the handle, or is it the type of the data to which the handle points? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it is a type for which the handle references to.

handle: string;
}

/**
Copy link
Contributor

@CraigMacomber CraigMacomber May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to mention here that to reference an already uploaded blob use ISummaryAttachment. Someone looking for how to do that is likley to find themselves here, and not realize they need to look into ISummaryAttachments. #Resolved

/**
* Handle to blobs uploaded outside of the summary. Attachment Blobs are uploaded and downloaded separately via
* http requests and are not included on the snapshot payload. The ISummaryAttachment are handles to these blobs.
* Additional information can be found here: https://github.com/microsoft/FluidFramework/issues/6374
Copy link
Contributor

@CraigMacomber CraigMacomber May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This documentation on ISummaryAttachment is super useful to me even in its current state! #Resolved

Comment on lines 99 to 100
id: string;
}
Copy link
Contributor

@CraigMacomber CraigMacomber May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What format is this in OR how do I get one from a IFluidHandle? Is it the absolute path? A json seralized handle? Something else? #Resolved

Copy link
Contributor Author

@NicholasCouri NicholasCouri May 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an example, it is an id returned from the backend.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not used so I am not really sure. I believe the idea is that id here should be the id that storage gives for this attachment. @vladsud do you know the usage pattern for this?

Copy link
Contributor Author

@NicholasCouri NicholasCouri May 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an example that makes it easier to visualize. @navin, are you sure this is not used ? After all, we do write it :)

Ex. ".blobs": {
"type": 1,
"tree": {
"0": {
"id": "bQAQKARDdMdTgqICmBa_ZB86YXwGP",
"type": 4
}
}
},

* Ex. DDS has large images or video that will be uploaded by the BlobManager and
* receive an Id that can be used in the summary.
*/
export const Attachment: Attachment = 4 as const;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you happen to know the answer, it would be good to document if there was an explicit choice not to declare SummaryType as an enum?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anthony-murphy Do you know why ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe because of the reason mentioned in this comment here - #9548 (comment)

@microsoft microsoft deleted a comment from ffgit2ab May 31, 2022
@NicholasCouri NicholasCouri requested a review from a team as a code owner June 1, 2022 18:31
@NicholasCouri NicholasCouri merged commit 234f450 into microsoft:main Jun 1, 2022
@tylerbutler tylerbutler mentioned this pull request Jun 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: runtime Runtime related issues base: main PRs targeted against main branch public api change Changes to a public API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants