fix: Investigate and Resolve Slow Fetching from Backend API #64

Open
JustinhSE opened this issue Nov 13, 2024 · 3 comments
Labels
Backend 🛠️ bug Something isn't working

Comments

@JustinhSE
Collaborator

JustinhSE commented Nov 13, 2024

Overview

The versify app experiences significant delays, and sometimes outright failures, when fetching k-means cluster results from our Python backend API. This degrades the user experience and needs urgent attention. We need to trace the entire pipeline, from the frontend request through backend processing to the response, to identify and resolve the bottlenecks.
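One way to find the slow step is to time each stage of a request separately. The sketch below is only an illustration: the stage names (`load`, `kmeans`, `serialize`) and the `handle_request` helper are assumptions, not names from the versify codebase.

```python
import time
from contextlib import contextmanager

# Accumulated timings: stage name -> elapsed seconds.
# (Hypothetical helper, not taken from the versify backend.)
timings = {}

@contextmanager
def timed(stage):
    """Add the wall-clock time of the wrapped block to timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + (time.perf_counter() - start)

def handle_request(load_clusters, run_kmeans, serialize):
    """Run three assumed pipeline stages, timing each one separately."""
    with timed("load"):
        data = load_clusters()
    with timed("kmeans"):
        result = run_kmeans(data)
    with timed("serialize"):
        body = serialize(result)
    return body
```

Logging `timings` after a few slow requests should show whether loading, clustering, or serialization dominates.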

Tasks

  1. Analyze Frontend API Call Implementation

    • Review the existing code for API calls to the Python endpoint
    • Check for proper error handling and timeout settings
    • Verify if requests are being made efficiently (e.g., not over-fetching)
  2. Backend API Performance Analysis

    • Investigate the Python backend to identify slow operations
    • Check if the k-means algorithm implementation is optimized
  3. Caching Strategy (up for discussion)

    • Implement caching for frequently requested k-means results
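For the caching discussion, here is a minimal sketch of an in-memory cache with expiry, assuming results can be keyed by a normalized query string. `TTLCache`, `get_clusters`, and the 300-second lifetime are illustrative choices, not existing code or agreed values.

```python
import time

class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expiry_time, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip the expensive computation
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value

# Assumed lifetime of five minutes; tune to how often the clusters change.
cache = TTLCache(ttl_seconds=300)

def get_clusters(query, compute_clusters):
    """Return cached clusters for a query, computing them only on a miss."""
    key = query.strip().lower()  # normalize so "Love " and "love" share an entry
    return cache.get_or_compute(key, lambda: compute_clusters(query))
```

A per-process cache like this is lost on restart and is not shared between workers, so a shared store may be a better fit if the backend runs several instances.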
@JustinhSE JustinhSE added bug Something isn't working Backend 🛠️ labels Nov 13, 2024
@JustinhSE
Collaborator Author

JustinhSE commented Nov 14, 2024

Since we will be moving to a larger dataset, pickle files won’t be good enough for storing the clusters: unpickling loads the entire file into memory even when only part of the data is needed, which hurts efficiency. We might want to consider vector databases.
Use Case:
• Ideal for applications that require similarity searches, such as those involving natural language processing (NLP), recommendation systems, or any task where semantic similarity is important.
Advantages:
• Efficient retrieval of high-dimensional data.
• Optimized for handling large volumes of vector data with fast query performance.
• Scales horizontally by adding more servers to a cluster, which is beneficial for large datasets.
Examples: Pinecone, Milvus

Open to ideas, but wanted to drop this here.
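To illustrate the partial-loading point only: the sketch below uses SQLite from the Python standard library to read a single cluster without deserializing the rest. It is not a vector database and does no similarity search; the table layout and the `build_store` / `load_cluster` names are assumptions made for the example.

```python
import json
import sqlite3

def build_store(path, clusters):
    """Write {cluster_id: [item, ...]} into a SQLite table indexed by cluster id."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS items (cluster_id INTEGER, payload TEXT)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_cluster ON items (cluster_id)")
    rows = [(cid, json.dumps(item)) for cid, items in clusters.items() for item in items]
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
    conn.commit()
    return conn

def load_cluster(conn, cluster_id):
    """Read only the rows for one cluster instead of loading everything."""
    cur = conn.execute(
        "SELECT payload FROM items WHERE cluster_id = ? ORDER BY rowid",
        (cluster_id,),
    )
    return [json.loads(row[0]) for row in cur]
```

With a pickle file the equivalent lookup would deserialize every cluster first; here the index lets the query touch only the requested rows.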

@JustinhSE
Collaborator Author

Hold on this issue. We just switched from TF-IDF to sentence transformers, and it is unclear whether fetching still takes as long. Deployment coming soon.

@JustinhSE JustinhSE added the Hold For the issues that should not be worked on right now label Dec 6, 2024
@JustinhSE JustinhSE removed the Hold For the issues that should not be worked on right now label Dec 14, 2024
@JustinhSE
Collaborator Author

JustinhSE commented Dec 14, 2024

Status: @Namit2111 is attempting to fix the deployed backend. We are up to date on all of the PRs. However, when versematch is used, the fetch to our backend API returns a 504 (Gateway Timeout) error.
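While the root cause is being investigated, a small Python probe with an explicit timeout and retry on 504 can help reproduce the failure outside the app. This is a sketch under assumptions: the URL passed in is whatever endpoint is being tested, and the attempt count, timeout, and backoff values are placeholders. Retrying does not fix a slow backend; it only helps tell a transient timeout from a consistent one.

```python
import time
import urllib.error
import urllib.request

def fetch_with_retry(url, attempts=3, timeout=10.0, backoff=1.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """GET url, retrying on HTTP 504 with exponential backoff between attempts."""
    for attempt in range(attempts):
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Retry only on 504 Gateway Timeout; re-raise other errors
            # and the final failed attempt.
            if err.code != 504 or attempt == attempts - 1:
                raise
            sleep(backoff * (2 ** attempt))
```

The `opener` and `sleep` parameters exist only so the function can be exercised without a network; in normal use they keep their defaults.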
