Karton reanalysis API is slow #650
After some investigation, the bottleneck seems to be here:

```python
def reanalyze(
    self, arguments: Optional[Dict[str, Any]] = None
) -> "MWDBKartonAnalysis":
    """
    Submits new Karton analysis for given object.

    Requires MWDB Core >= 2.3.0.

    :param arguments: |
        Optional, additional arguments for analysis.
        Reserved for future functionality.

    .. versionadded:: 4.0.0
    """
    from .karton import MWDBKartonAnalysis

    arguments = {"arguments": arguments or {}}
    analysis = self.api.post(
        "object/{id}/karton".format(**self.data), json=arguments
    )
    self._expire("analyses")
    return MWDBKartonAnalysis(self.api, analysis)
```

That is, the POST request is blocking.
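A quick way to check whether the time is spent server-side is to time the same endpoint directly, bypassing mwdblib. A minimal sketch, assuming a deployment reachable at `http://localhost/api/` and a valid API token (both placeholders):

```python
import time

import requests

API_URL = "http://localhost/api/"   # placeholder deployment URL
TOKEN = "<api-token>"               # placeholder API key
SHA256 = "<object-sha256>"          # placeholder object identifier

start = time.perf_counter()
# Same request that reanalyze() issues: POST object/<id>/karton
response = requests.post(
    f"{API_URL}object/{SHA256}/karton",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"arguments": {}},
)
response.raise_for_status()
print(f"POST object/<id>/karton took {time.perf_counter() - start:.2f}s")
```

If this raw request also takes several seconds, the client library is not the problem and the cost is in the API handler itself.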
I think the bottleneck is in the API: gathering metadata about the created analysis (https://github.com/CERT-Polska/mwdb-core/blob/master/mwdb/resources/karton.py#L130), including status, last_update and processing_in (https://github.com/CERT-Polska/mwdb-core/blob/master/mwdb/model/karton.py#L61). And here comes the huge weakness of the current model: we need to iterate over all tasks currently processing in Karton. That problem is already referenced in another issue in Karton itself: CERT-Polska/karton#178

So there are two solutions for that:
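A simplified illustration of the pattern described above (not the actual mwdb-core code; the task fields are hypothetical): computing something like processing_in means walking every task currently tracked by Karton and checking whether it belongs to the analysis in question, so the cost grows with the total number of in-flight tasks rather than with the size of one analysis.

```python
from typing import Dict, List, Set


def processing_in(analysis_id: str, all_tasks: List[dict]) -> Dict[str, Set[str]]:
    """Hypothetical sketch: which queues still hold tasks of this analysis.

    Mirrors the weakness discussed above: every status query scans *all*
    tasks known to Karton, even though most belong to other analyses.
    """
    queues: Dict[str, Set[str]] = {}
    for task in all_tasks:                    # O(total tasks in Karton)
        if task["root_uid"] != analysis_id:   # most tasks are skipped
            continue
        queues.setdefault(task["receiver"], set()).add(task["uid"])
    return queues
```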
We're actually going to speed up analysis status inspection soon: CERT-Polska/karton#207
I'm not sure whether this is an issue with the API server or the MWDB client. I'm using the following code to re-analyze all samples matching a query. I see it takes 5-10 seconds to do one iteration, which is a lot. The MWDB API is deployed with default options, using the recommended Docker Compose file, so it's one Nginx frontend and 4 uWSGI backends. The machine is doing nothing and is not experiencing load.
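A minimal sketch of such a loop, assuming mwdblib's `MWDB` client, a placeholder URL, API key and query, and the `reanalyze()` method quoted above:

```python
import time

from mwdblib import MWDB

# Placeholder URL, API key and query; adjust for the actual deployment.
mwdb = MWDB(api_url="http://localhost/api/", api_key="<api-token>")

for sample in mwdb.search_files('tag:"emotet"'):
    start = time.perf_counter()
    # Each iteration blocks on the POST object/<id>/karton request.
    sample.reanalyze()
    print(f"{sample.sha256}: {time.perf_counter() - start:.2f}s")
```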
I'm trying to understand whether the bottleneck is the iteration over the files iterator, or the way I submit files. I see the iteration is technically doing a `self.api.get(object_type.URL_TYPE, params=params)` in the end, so that may be the bottleneck. But why is it so slow? I guess there are no bulk methods in the API, right?
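In the absence of a bulk re-analysis endpoint, one client-side workaround is to overlap the blocking POST requests with a small thread pool. A sketch, assuming independent `reanalyze()` calls can safely run in parallel threads (an assumption, not something the library documents):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

from mwdblib import MWDB

# Placeholder URL, API key and query; adjust for the actual deployment.
mwdb = MWDB(api_url="http://localhost/api/", api_key="<api-token>")

# Collect the matching files first, then overlap the blocking POSTs.
samples = list(mwdb.search_files('tag:"emotet"'))

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(sample.reanalyze): sample for sample in samples}
    for future in as_completed(futures):
        sample = futures[future]
        future.result()  # re-raises any request error
        print(f"submitted reanalysis for {sample.sha256}")
```

This only hides the per-request latency on the client side; it does not reduce the work the API does for each POST.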