Fixes: #32 ; Feature: Text Summarization Tool Using NLP ; Fixed #69
Pull Request Title:
Implement TextRank-based Text Summarization with GloVe Embeddings
Description:
This PR introduces a Python script that implements the TextRank algorithm for text summarization. The key highlights include:
Key Features:
Changes:
preprocess_sentences() to clean and tokenize sentences.
generate_sentence_vectors() to convert sentences into vector representations.
build_similarity_matrix() to calculate cosine similarity between sentence vectors.
rank_sentences() to apply TextRank and extract the most important sentences.
download_glove() to handle the automatic download of GloVe embeddings.
summarize_text() function to tie the entire summarization process together.
Documentation:
README.md explaining each function, its purpose, and usage.
Testing:
Issue Reference:
Fixes: #32 (New feature implementation)
Checklist:
README.md.
Notes for Reviewers:
How to test:
Place a CSV file named sample.csv containing an "article_text" column in the Text Summarizer/ folder and run the script.
Adjust the GloVe URL and file paths in the script if necessary.
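For reviewers who do not have a suitable input file at hand, the following is a minimal sketch of one way to create it. The helper name write_sample_csv and the example articles are hypothetical and not part of this PR; only the file name sample.csv, the "article_text" column, and the Text Summarizer/ folder come from the description above.

```python
import csv
from pathlib import Path


def write_sample_csv(folder, articles):
    """Write one article per row to <folder>/sample.csv.

    Hypothetical helper for testing; the PR's script is assumed to read
    a file with a single "article_text" column, as described above.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / "sample.csv"
    # newline="" lets the csv module control line endings itself.
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["article_text"])
        writer.writeheader()
        for text in articles:
            writer.writerow({"article_text": text})
    return path


if __name__ == "__main__":
    # Made-up example content; replace with real articles.
    write_sample_csv(
        "Text Summarizer",
        ["First article. It has two sentences.", "Second article."],
    )
```

Running this once from the repository root should leave a small sample.csv in place for the script to pick up.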
Performance considerations:
The script is optimized for smaller datasets; performance may degrade with large texts. Further improvements could include parallelizing the sentence similarity calculation.
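To make the performance note concrete, here is a rough, self-contained sketch of the pipeline described under Changes. It reuses the function names from this PR, but the bodies are illustrative assumptions, not the PR's actual code: word-count vectors stand in for GloVe embeddings (so download_glove() is omitted), sentence splitting is a crude regex, and ranking is a plain power iteration. The nested loop in build_similarity_matrix() is quadratic in the number of sentences, which is the step the note suggests parallelizing.

```python
import math
import re
from collections import Counter


def preprocess_sentences(text):
    """Split text into sentences and lowercase word tokens (crude regex split)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokens = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    return sentences, tokens


def generate_sentence_vectors(tokens):
    """Assumption: sparse word-count vectors replace averaged GloVe vectors."""
    return [Counter(t) for t in tokens]


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_similarity_matrix(vectors):
    """Pairwise cosine similarity; O(n^2) pairs, the likely bottleneck."""
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = cosine(vectors[i], vectors[j])
    return sim


def rank_sentences(sim, damping=0.85, iterations=50):
    """PageRank-style power iteration over the similarity graph.

    Simplification: sentences with no similar neighbours pass on no score.
    """
    n = len(sim)
    if n == 0:
        return []
    scores = [1.0 / n] * n
    row_sums = [sum(row) for row in sim]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                sim[j][i] / row_sums[j] * scores[j]
                for j in range(n)
                if row_sums[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize_text(text, top_n=2):
    """Return the top_n highest-scoring sentences in their original order."""
    sentences, tokens = preprocess_sentences(text)
    sim = build_similarity_matrix(generate_sentence_vectors(tokens))
    scores = rank_sentences(sim)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(top)]
```

Because each pair (i, j) is computed independently, the inner loop could in principle be split across workers or replaced by a single matrix product over normalized dense vectors; whether that pays off depends on the input size.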
Thank you for reviewing this contribution! I look forward to any feedback or suggestions you may have.