Performance Refactoring #128
Merged
… reduce redundancy.
- Reduce I/O Operations: Instead of opening each image inside the inner loop, we can open all images at once and process them.
- Optimize Looping: Use list comprehensions where possible to make the code more efficient and Pythonic.
- Error Handling: Move error handling outside the inner loop to avoid repeated checks.
- Used the apply method to directly create DuplicateIssue objects for each matching row, eliminating the need for an explicit loop.
- Used list comprehension to collect the results, which is more efficient in terms of performance.
- Used List Comprehension for Filtering: Instead of a for-loop, use a list comprehension to keep only the items whose pages were successfully removed.
- Batch Processing: Process the removal and feedback in a single loop to reduce the overhead of multiple function calls.
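The filtering step described above could look roughly like the following sketch. The `remove_page` callback is hypothetical and only stands in for whatever removal routine the project actually uses; it is assumed to return `True` on success.

```python
from typing import Callable, Iterable, List


def remove_pages(items: Iterable[str], remove_page: Callable[[str], bool]) -> List[str]:
    """Return the items for which the removal callback reported success."""
    # remove_page is a hypothetical stand-in for the project's removal routine.
    # It is called exactly once per item, inside the comprehension.
    return [item for item in items if remove_page(item)]


# Stand-in removal function that only "succeeds" for .cbz files.
removed = remove_pages(["a.cbz", "b.txt", "c.cbz"], lambda name: name.endswith(".cbz"))
```

Because the callback runs inside the comprehension, removal and result collection happen in one pass over the input.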
- Used query to filter the DataFrame and retrieve the row directly, reducing the number of operations on the DataFrame.
- Extract the row directly from the result of the query to avoid an additional indexing step.
- Optimize Duplicate Detection: Use pd.Series.duplicated directly on the DataFrame to avoid creating intermediate Series.
- Avoid Redundant DataFrame Creation: Directly filter the DataFrame without creating an intermediate variable for hashes.
- Use set for Uniqueness: Replace groupby with a set to directly obtain unique hash values, which is more efficient and does not require sorting.
- Convert to List: Convert the set back to a list to match the expected return type.
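As a minimal sketch of the set-based approach, assuming the hashes are plain hashable values such as strings (the function name `unique_hashes` is illustrative, not taken from the project):

```python
from typing import Hashable, Iterable, List


def unique_hashes(hashes: Iterable[Hashable]) -> List[Hashable]:
    """Collapse repeated hash values and return them as a list."""
    # A set drops duplicates without sorting; the resulting order is arbitrary.
    return list(set(hashes))


result = unique_hashes(["abc", "def", "abc", "123", "def"])
```

One trade-off to keep in mind: unlike a sorted groupby, a set does not give a stable order, so callers that depend on ordering would need to sort the result themselves.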
- Use with statement for file operations: This ensures that resources are properly managed and can slightly improve performance by reducing the overhead of manual resource management.
- Optimize exception handling: Move the try-except block to cover only the Image.open call to minimize the scope of exception handling.
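The pattern of guarding only the open call and then handing the resource to a `with` block can be sketched as follows. To stay dependency-free, this example uses the built-in `open` as a stand-in for `Image.open`; the function name `read_header` and the `size` parameter are assumptions made for illustration.

```python
from typing import Optional


def read_header(path: str, size: int = 8) -> Optional[bytes]:
    """Return the first bytes of a file, or None if it cannot be opened."""
    try:
        # Only the open call is guarded, so errors raised later are not swallowed.
        handle = open(path, "rb")
    except OSError:
        return None
    # The with statement closes the handle even if read() raises.
    with handle:
        return handle.read(size)
```

Keeping the `try` block this narrow means a failure during processing surfaces as its own error instead of being misreported as an unreadable file.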
- Combine the regular expressions into a single pattern to reduce the number of re.sub calls.
- Use a single re.sub call with a combined pattern to perform the replacements in one pass.
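One common way to fold several substitutions into a single `re.sub` call is an alternation pattern with a replacement function. The sketch below assumes literal tokens and an invented replacement table; the actual patterns used in the project are not shown in this pull request.

```python
import re

# Hypothetical replacement table, for illustration only.
REPLACEMENTS = {"&": "and", "+": "plus", "@": "at"}

# One alternation pattern matching any of the literal keys.
_COMBINED = re.compile("|".join(re.escape(key) for key in REPLACEMENTS))


def replace_tokens(text: str) -> str:
    """Apply every replacement in a single pass over the text."""
    return _COMBINED.sub(lambda match: REPLACEMENTS[match.group(0)], text)


cleaned = replace_tokens("Tom & Jerry + friends")
```

Compiling the pattern once at module level also avoids rebuilding it on every call.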
- Combine the regular expressions into a single substitution to reduce the number of passes over the input string.
- Use a more efficient pattern that captures both hyphens and underscores in one go.
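A character class is enough to match hyphens and underscores together. In the sketch below, replacing each run with a single space is an assumption; the pull request does not state what the separators are replaced with.

```python
import re

# Matches one or more consecutive hyphens or underscores.
_SEPARATORS = re.compile(r"[-_]+")


def normalize_separators(text: str) -> str:
    """Replace each run of hyphens/underscores with one space (assumed behavior)."""
    return _SEPARATORS.sub(" ", text)


normalized = normalize_separators("Amazing_Spider-Man__2020")
```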
- Combine duplicate space removal steps: Remove the redundant second call to remove duplicate spaces by combining it with the first call.
- Use regex for multiple replacements: Use a single regex substitution to handle multiple cleanup tasks in one pass.
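Collapsing duplicate whitespace usually needs only one substitution. This is a minimal sketch with an illustrative function name, assuming that tabs and newlines should be treated like spaces:

```python
import re

_WHITESPACE_RUN = re.compile(r"\s+")


def collapse_spaces(text: str) -> str:
    """Collapse any run of whitespace to one space and trim both ends."""
    return _WHITESPACE_RUN.sub(" ", text).strip()


tidy = collapse_spaces("  Batman   Year \t One  ")
```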
- Combine Redundant Checks: Combine checks for md.cover_date and md.series to avoid redundant evaluations.
- Optimize replace_token Calls: Group replace_token calls to minimize the number of times the method is invoked.
- Simplify Month Name Calculation: Simplify the logic for calculating the month name.
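For the month name step, the standard library's `calendar` module already provides a lookup table. The sketch below assumes the month is available as an integer from 1 to 12; how it is derived from `md.cover_date` is not shown in this pull request, and the function name is illustrative.

```python
import calendar
from typing import Optional


def month_name(month: Optional[int]) -> str:
    """Return the month's name, or an empty string for a missing or invalid month."""
    if month is None or not 1 <= month <= 12:
        return ""
    # calendar.month_name is indexed 1..12; index 0 is an empty string.
    return calendar.month_name[month]
```

Note that `calendar.month_name` follows the current locale, so the names are English only under the default locale.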
- Should give better information to the user on slow IO operations.
Some refactoring to improve performance.