Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PROD-972 ID-1324 Deduplicate group synchronization calls to google. #1472

Merged
merged 7 commits into from
Jul 2, 2024

Conversation

Ghost-in-a-Jar
Copy link
Contributor

Ticket: https://broadworkbench.atlassian.net/browse/ID-1324

What:

We make a LOT of calls to the google groups api, a lot of which are duplicated and unnecessary. This PR aims to reduce the number of calls to google group apis during the group syncing

Why:

We are hitting the quota for listing group memberships every day and ended up with a prod incident over the weekend.

How:

The plan is to record the last synchronized version and current version of group, only synchronize group if group version is greater than last synchronized version.

Pseudocode:

Pull message from pubsub, check current version of group against last synchronized version
if equal:
	do nothing
if the group version is greater than last synchronized version
	sychronize the group

	update last sycnronized version to the version of the group when the synchronization started

In DB migration set initial values of current group version to 1 and last synchronized version to 0

Testing/monitoring notes:

  • can test manually by pushing our own messages into the queue
  • add unit tests in GoogleExtensionSpec
  • add some telemetry around no-ops and list group membership

PR checklist

  • I've followed the instructions if I've made any changes to the API, especially if they're breaking changes
  • I've filled out the Security Risk Assessment (requires Broad Internal network access) and attached the result to the JIRA ticket

@dvoet
Copy link
Collaborator

dvoet commented Jul 1, 2024

I think you also need to update the group version somewhere in or before GoogleExtension.onGroupUpdate. This is the logic that figures out all the policies related to an auth domain that must be synced. The related policies need to have their versions bumped so that the GoogleGroupSynchronizer doesn't ignore them.

@dvoet
Copy link
Collaborator

dvoet commented Jul 1, 2024

also need to update the group version in overwritePolicyMembersInternal

@Ghost-in-a-Jar Ghost-in-a-Jar marked this pull request as ready for review July 2, 2024 15:53
case _: BasicWorkbenchGroup => verify(mockDirectoryDAO).updateSynchronizedDateAndVersion(target, samRequestContext)
case p: AccessPolicy =>
verify(mockDirectoryDAO).updateSynchronizedDateAndVersion(p.copy(members = p.members + CloudExtensions.allUsersGroupName), samRequestContext)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This started failing because updateSynchronizedDateAndVersion was being called for the policy with the addition of the all users group in the policy. Still trying to get my head around exactly why. A lot is mocked out here and this test is really only focused around group emails so I think this change is fine.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably because the policy is set to public

@@ -298,6 +287,7 @@ class ResourceService(
} else IO.unit
policies <- listResourcePolicies(resource, samRequestContext)
_ <- accessPolicyDAO.addResourceAuthDomain(resource, authDomains, samRequestContext)
_ <- policies.traverse(p => directoryDAO.updateGroupUpdatedDateAndVersionWithSession(FullyQualifiedPolicyId(resource, p.policyName), samRequestContext))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This updates all of the policies when an auth domain is added to a resource

@@ -45,7 +45,9 @@ trait DirectoryDAO {

def isGroupMember(groupId: WorkbenchGroupIdentity, member: WorkbenchSubject, samRequestContext: SamRequestContext): IO[Boolean]

def updateSynchronizedDate(groupId: WorkbenchGroupIdentity, samRequestContext: SamRequestContext): IO[Unit]
def updateSynchronizedDateAndVersion(group: WorkbenchGroup, samRequestContext: SamRequestContext): IO[Unit]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to pass in the group here so that we can use the version of the group that is held in memory when the sync started.

def updateSynchronizedDate(groupId: WorkbenchGroupIdentity, samRequestContext: SamRequestContext): IO[Unit]
def updateSynchronizedDateAndVersion(group: WorkbenchGroup, samRequestContext: SamRequestContext): IO[Unit]

def updateGroupUpdatedDateAndVersionWithSession(groupId: WorkbenchGroupIdentity, samRequestContext: SamRequestContext): IO[Unit]
Copy link
Contributor Author

@Ghost-in-a-Jar Ghost-in-a-Jar Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed to update the policies in resource service and it just creates a write db session and then calls updateGroupUpdatedDateAndVersion

Copy link
Contributor Author

@Ghost-in-a-Jar Ghost-in-a-Jar Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updateGroupUpdatedDateAndVersion is also only on PostgresGroupDAO which ResourceService doesnt have access to

members: Set[WorkbenchSubject],
email: WorkbenchEmail,
version: Int = 1,
lastSynchronizedVersion: Option[Int] = None
Copy link
Contributor Author

@Ghost-in-a-Jar Ghost-in-a-Jar Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defaulting these values (including in the other models) was to avoid having to change BasicWorkbenchGroup in a ton of places. With the proper tests in place I think this is ok.

Copy link
Collaborator

@dvoet dvoet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 comment but don't hold this up based on it

if (group.version > group.lastSynchronizedVersion.getOrElse(0)) {
IO.unit
} else {
IO.raiseError(new GroupAlreadySynchronized)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like handling this with an exception since this is not really an exceptional situation, we just want an early escape.

case _: BasicWorkbenchGroup => verify(mockDirectoryDAO).updateSynchronizedDateAndVersion(target, samRequestContext)
case p: AccessPolicy =>
verify(mockDirectoryDAO).updateSynchronizedDateAndVersion(p.copy(members = p.members + CloudExtensions.allUsersGroupName), samRequestContext)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably because the policy is set to public

Copy link
Contributor

@samanehsan samanehsan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Ghost-in-a-Jar! 🎉

Copy link

sonarqubecloud bot commented Jul 2, 2024

@Ghost-in-a-Jar Ghost-in-a-Jar merged commit 6e240d2 into develop Jul 2, 2024
24 of 28 checks passed
@Ghost-in-a-Jar Ghost-in-a-Jar deleted the ID-1324-dedupe-group-sync branch July 2, 2024 19:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants