Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CDAP-20993] Namespace and Repository Source control metadata refresh #15641

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

adrikagupta
Copy link
Contributor

@adrikagupta adrikagupta commented May 7, 2024

This PR adds the functionality of refreshing namespace and repository source control metadata in one transaction in /repository/apps call before fetching the apps for response. It does the refresh by getting the list of apps from remote repo, cleaning up the repository source control metadata table and then updating the isSynced (sync status) and last modified values accordingly. It also contains fixes to some bugs from the API impl PR in default store.


private final List<SourceControlMetadataRecord> apps;
private final String nextPageToken;
private final Long lastRefreshTime;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use boxed type? Is it nullable? It's better not to use box type if possible. Is there a meaning default value for it that you can use?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its null if the refresh has not been run yet. I think it would be more appropiate to have default value as null in the API response than a actual long value

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it can ever be null. The value comes from SourceControlMetadataRefresher.getLastRefreshTime, which returns primitive long, and the implementation is defaulting it to 0L

@@ -2482,6 +2482,10 @@ public static final class SourceControlManagement {
public static final String REPOSITORY_TTL_SECONDS = "source.control.repository.ttl.seconds";
public static final String METADATA_MIGRATION_INTERVAL_SECONDS =
"source.control.metadata.migration.interval.seconds";
public static final String METADATA_REFRESH_INTERVAL_SECONDS =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dont think we need this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed


private final List<SourceControlMetadataRecord> apps;
private final String nextPageToken;
private final Long lastRefreshTime;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its null if the refresh has not been run yet. I think it would be more appropiate to have default value as null in the API response than a actual long value

@@ -415,7 +427,9 @@ public ApplicationDetail getLatestAppDetail(ApplicationReference appRef)
* reference is not found.
*/
public SourceControlMetadataRecord getSourceControlMetadataRecord(ApplicationReference appRef)
throws SourceControlMetadataNotFoundException {
throws SourceControlMetadataNotFoundException, RepositoryNotFoundException {
// Checking if repository config exists for the namespace
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC RepositoryNotFoundException would be interpreted as 404 right

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, right


/**
* Service for refreshing namespace and repository source control metadata of a namespace at
* scheduled intervals and manually.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return this.lastRefreshTime.get();
}

private void updateSourceControlMetaWithNoTransaction(ApplicationReference appRef,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name it updateSourceControlMeta

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

SourceControlMetadataRecord record = lastRecord.get();
return !pageLimitReached || record == null ? null : record.getName();
});
List<SourceControlMetadataRecord> apps = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should check if repoconfig is present here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@adrikagupta adrikagupta changed the title [CDAP-20993] [CDAP-20993] Namespace and Repository Source control metadata refresh May 7, 2024
return;
}

if (namespaceSourceControlMeta.getLastSyncedAt() != null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add few comments explaining each section

List<SourceControlMetadataRecord> records = new ArrayList<>();
String type = EntityType.APPLICATION.toString();

store.scan(request, type, records::add);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can do

if (!repoFiles.contains(record.getName())) {
        store.delete(new ApplicationReference(record.getNamespace(), record.getName()));
      }

inside scan

Copy link
Contributor

@samdgupi samdgupi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the

@samdgupi samdgupi added the build Triggers github actions build label May 7, 2024
return nextPageToken;
}

public Long getLastRefreshTime() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Annotate with @Nullable

* @param lastRefreshTime The timestamp of the last refresh operation.
*/
public ListSourceControlMetadataResponse(List<SourceControlMetadataRecord> apps,
String nextPageToken, Long lastRefreshTime) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Annotate parameters that can be null with @Nullable.

public class SingleSourceControlMetadataResponse {

private final SourceControlMetadataRecord app;
private final Long lastRefreshTime;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this nullable? If so, please annotate accordingly.

return apps;
}

public String getNextPageToken() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these getter methods used? If not, no need to define them.

boolean pageLimitReached;
pageLimitReached = applicationLifecycleService.scanSourceControlMetadata(
scanRequest, batchSize,
apps::add);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the old code we don't buffer the app, but in the new code we do. I remember do removed the buffered to avoid high memory consumption, but now we are bringing the same problem back. Is buffering needed? Seems like we can refactor the JsonPaginatedListResponder so that it can emit extra fields (lastRefreshTime in this case) before closing the json object.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can extend JsonPaginatedListResponder to also add additional field.
@chtyim Is it ok to take this up as a fast followup for the change ?

SourceControlMetadataRecord record = apps.isEmpty() ? null :apps.get(apps.size() - 1);
String nextPageToken = !pageLimitReached || record == null ? null :
record.getName();
long lastRefreshTime = applicationLifecycleService.getLastRefreshTime(namespaceId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like unwise to sync all apps, shouldn't we just sync the apps fetched in this page?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we are cloning the full repository which would be time taking it might be wasteful to just update few pipelines.

* NamespaceSourceControlMetadataRefreshService scheduler instances.
*/
public class SourceControlMetadataRefresher {
private ConcurrentMap<NamespaceId, Long>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final

lastRefreshTimes = new ConcurrentHashMap<>();
private final TransactionRunner transactionRunner;
private final SourceControlOperationRunner sourceControlOperationRunner;
private final SecureStore secureStore;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused field

private final TransactionRunner transactionRunner;
private final SourceControlOperationRunner sourceControlOperationRunner;
private final SecureStore secureStore;
private static final Logger LOG = LoggerFactory.getLogger(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsued. Also, static field should comes before non static fields

* Constructs a SourceControlMetadataRefreshService instance.
*/
@Inject
public SourceControlMetadataRefresher(TransactionRunner transactionRunner,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

injected constructor shouldn't be public. Use package private (i.e. no access modifier).

* @return The timestamp of the last refresh
*/
public long getLastRefreshTime(NamespaceId namespaceId) {
return lastRefreshTimes.getOrDefault(namespaceId, 0L);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the refresh time is only kept in memory? Meaning the information will be gone if the pod restarted?

Copy link
Contributor

@samdgupi samdgupi May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we do not have a good place to store this information.

throws IOException, RepositoryNotFoundException {
NamespaceId namespaceId = new NamespaceId(request.getNamespace());
// Checking if repository config is present in the namespace
sourceControlMetadataRefresher.getRepositoryMeta(namespaceId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of calling get without using the result?

Copy link
Contributor

@samdgupi samdgupi May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to check if RepositoryConfig exist or throw execption otherwise. @adrikagupta Please correct me if I am wrong but you chose this function to reduce duplicated code to fetch namespace repository config

lastRefreshTimes.putIfAbsent(namespaceId, createRefreshService(namespaceId).refreshMetadata());
}

public RepositoryMeta getRepositoryMeta(NamespaceId namespace)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method seems duplicating the one in SourceControlManagementService. Why need to duplicate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is actually a member of SourceControlManagementService. We should move this to a common place. @adrikagupta I think you added a TODO for the same ?

Long lastSynced) throws IOException {
StructuredTable repoTable = getRepositorySourceControlMetadataTable();
repoTable.upsert(getAllFields(appRef, isSynced, lastSynced));
}

/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't comment on line 120. But there is 21 lines of duplicated code detected by IntelliJ with the NamespaceSourceControlMetadataStore class. Why are we duplicating code?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This two classes are handilng crud functions for two different (but similar) tables. Hence the duplication.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Triggers github actions build
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants