-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: run unauthenticated fetch before authenticated #127
Conversation
@NSeydoux can you review this quickly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is actually the fix we really want.
The problem with this solution is that we can increase the number of HTTP requests per query two-fold. Especially in cases where we're doing link traversal, where the number of requests can go in the order of thousands, this will become a significant bottleneck, as query execution will basically take twice as long in some cases.
Alternatively, we could have a (non-default) option in the web client that enables this fallback-based fetch implementation. This would ensure that the general case still performs as fast as possible, and only those queries that require this specific workaround can make use of this mode.
In any case, I think this issue is broader than just the web client here, and should eventually be tackled on a higher level.
@NSeydoux - correct me if I'm wrong; but won't this become the default logic of the Inrupt libraries in a few months anyway?
I see your point here; does it make sense to then use a caching mechanism like; and if so should we implement something similar when we add this kind of logic to solid-client-authn if (config.context.workerSolidAuth) {
const authenticatedFetch = workerToWindowHandler.buildAuthenticatedFetch();
const authUrls: Record<string, boolean> = {}
async function comunicaFetch(...args) {
if (authUrls[getNamesapce(args[0])]) {
// In future we should check if this is a 403? (double check code) response
// and if so try the *unathenticated* fetch. This handles cases like a qpf endpoint
// changing an endpoint from being auth-required to public halfway through a session
return await authenticatedFetch(...args);
}
const response = await global.fetch(...args);
if (response.status === 401) {
authUrls[getNamespace(args[0])] = true;
return await authenticatedFetch(...args);
}
return response;
}
config.context.fetch = comunicaFetch;
} ^^this solves performance when there is large number of requests to a QPF endpoint.
So if we are doing lots of traversal outside then this isn't an issue a Pod this isn't an issue, as all the documents we get should be able to retrieved with the first call; but if we are traversing documents within a Pod I do see your point. I'm not sure if it makes sense to cache origins rather than namespaces to make particular authenticated calls on (though that seems fraught with potential problems).
I agree with the idea of an option; but strongly disagree with the default order. I think this fallback-based implementation should be the default as we should favour correctness over performance (which I know is bold of me to say given how many performance rabbitholes I've gone down in the last few months). |
Good idea, I think this namespace/origin-based caching should solve all major performance issues that could arise from this mechanism. And this could then become the default indeed. I'm not sure yet though what you have in mind exactly for this |
The next version of However, you are absolutely right, it would not be acceptable to issue twice as much requests, and optimizations such as the caching @jeswr proposed will be required. |
My stance is that we should be caching by namespace rather than origin, i.e. This could be implemented as function getNamespace(input) {
const url = new URL(new Request(input).url);
return url.origin + url.pathname;
} |
@rubensworks I've added the caching optimisations now. Still won't be as optimal as you want for link traversal to different paths within the same Pod. However, the auth requirements for each of those files does need to be checked independentally as they will have different permissions. In the future I guess we can use Pod metadata to optimise this. |
But in that case we still suffer from the performance problems, right? For example, when traversing over a pod with 100 Turtle files in So perhaps origin-based caching may be better after all?
If I'm correct, using authenticated fetch is never a problem on pods, right? I've successfully done authenticated queries over pods with public resources in the past. In any case, a way to disable this behaviour seems important.
Definitely, but we just have to be sure that we don't break existing solutions in order to fix other problems :-) |
I have a hunch that this is implementation dependent - and it just so happens that our current implementations are ok with it. But that is more of a questions for someone like @NSeydoux .
Agreed - I think we were just in disagreance on which should be the default. I can add in that option here & to the website once we make a decision on whether this is the correct caching strategy for us right now. |
Any updates on this? |
closes #126