Fixes: #32 ; Feature: Text Summarization Tool Using NLP ; Fixed #69

himanshumahajan138 · 2024-10-19T10:37:32Z

Pull Request Title:

Implement TextRank-based Text Summarization with GloVe Embeddings
Fixes: #32 ; Feature: Text Summarization Tool Using NLP ; Fixed

Description:

This PR introduces a Python script that implements the TextRank algorithm for text summarization. The key highlights include:

Text Preprocessing: Removes punctuation, special characters, numbers, and stopwords.
Sentence Vectorization: Utilizes GloVe word embeddings to convert sentences into vectors.
Similarity Matrix: Constructs a similarity matrix using cosine similarity between sentence vectors.
TextRank Algorithm: Applies the PageRank algorithm on the similarity matrix to rank sentences.
Summary Output: Returns the top N most important sentences as the summary.
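The similarity-matrix and ranking steps listed above can be sketched roughly as follows. This is a minimal standard-library illustration, not the code in this PR: the function names (`cosine`, `similarity_matrix`, `pagerank`, `top_sentences`), the damping factor of 0.85, and the hand-written power iteration are all assumptions made for the example, and the actual script may rely on a graph library instead.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Symmetric matrix of pairwise similarities with a zero diagonal."""
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Clamp negatives so the edge weights stay non-negative (assumption).
            s = max(cosine(vectors[i], vectors[j]), 0.0)
            sim[i][j] = sim[j][i] = s
    return sim


def pagerank(sim, damping=0.85, iterations=100, tol=1e-8):
    """Weighted PageRank by power iteration; returns scores that sum to 1."""
    n = len(sim)
    if n == 0:
        return []
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Sentences with no similar neighbours spread their score uniformly.
        dangling = sum(scores[i] for i in range(n) if row_sums[i] == 0.0)
        new_scores = []
        for j in range(n):
            incoming = sum(
                scores[i] * sim[i][j] / row_sums[i]
                for i in range(n)
                if row_sums[i] > 0.0
            )
            new_scores.append((1.0 - damping) / n + damping * (incoming + dangling / n))
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


def top_sentences(sentences, vectors, top_n=2):
    """Return the top_n sentences ordered by descending PageRank score."""
    scores = pagerank(similarity_matrix(vectors))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:top_n]]
```

A sentence that is similar to many others collects score from all of them, which is why it tends to surface in the summary.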

Key Features:

GloVe Embeddings Download: Automatically downloads GloVe embeddings if not available locally.
Preprocessing Module: Cleans and prepares sentences for summarization.
Vectorization Module: Maps sentences to vector representations using GloVe.
Ranking Module: Ranks sentences based on cosine similarity and PageRank.
Logging: Provides detailed logs to track each step.
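The "download only if not available locally" behaviour, together with the logging, might look something like the sketch below. The function name `ensure_embeddings`, the logger name, and the injectable `fetch` parameter are hypothetical and chosen only for illustration; the real `download_glove()` in this PR may differ, and the archive extraction step is omitted here.

```python
import logging
import os
import urllib.request

logger = logging.getLogger("text_summarizer")  # hypothetical logger name


def ensure_embeddings(path, url, fetch=urllib.request.urlretrieve):
    """Return path, downloading from url first only if the file is missing.

    fetch is injectable so the download can be replaced in tests (assumption).
    """
    if os.path.exists(path):
        logger.info("Embeddings already present at %s; skipping download.", path)
        return path
    logger.info("Embeddings not found at %s; downloading from %s.", path, url)
    # Create the parent folder if needed before writing the file.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    fetch(url, path)
    return path
```

Passing the URL and path as arguments keeps them easy to adjust without editing the function body.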

Changes:

Added preprocess_sentences() to clean and tokenize sentences.
Added generate_sentence_vectors() to convert sentences into vector representations.
Implemented build_similarity_matrix() to calculate cosine similarity between sentence vectors.
Implemented rank_sentences() to apply TextRank and extract the most important sentences.
Created download_glove() to handle the automatic download of GloVe embeddings.
Added a summarize_text() function to tie the entire summarization process together.
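For reviewers unfamiliar with the approach, the preprocessing and vectorization steps can be sketched as below. This is an illustration under stated assumptions rather than the PR's code: the helper names (`preprocess_sentence`, `parse_glove_lines`, `sentence_vector`), the tiny stopword set, and the choice to average word vectors are all hypothetical. The only format detail relied on is that GloVe text files store one word per line followed by its space-separated vector components.

```python
import re

# Tiny illustrative stopword set; a real run would use a fuller list (assumption).
STOPWORDS = frozenset({"the", "a", "an", "is", "are", "of", "and", "to", "in"})


def preprocess_sentence(sentence, stopwords=STOPWORDS):
    """Drop punctuation, digits and special characters, lowercase, remove stopwords."""
    letters_only = re.sub(r"[^a-zA-Z\s]", " ", sentence)
    return [word for word in letters_only.lower().split() if word not in stopwords]


def parse_glove_lines(lines):
    """Build a word -> vector dict from lines shaped like 'word 0.1 0.2 ...'."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings


def sentence_vector(tokens, embeddings, dim):
    """Average the vectors of known tokens; zero vector if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]
```

Words missing from the embeddings are simply skipped, so a sentence made up entirely of unknown words maps to the zero vector.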

Documentation:

Documented the workflow in README.md, explaining each function, its purpose, and usage.
Provided instructions on downloading and utilizing GloVe embeddings in the summarization process.
Added inline comments and logging for better traceability.

Testing:

Tested on sample data stored in a CSV file.
Verified GloVe embeddings are downloaded and extracted correctly.
Confirmed that the output summary meets expectations.

Issue Reference:

Fixes: #32 (New feature implementation)

Checklist:

Code follows the existing coding style.
Documentation has been added/updated in README.md.
All tests pass.
The PR includes only relevant code changes (no unrelated modifications).

Notes for Reviewers:

How to test:
Place a CSV file named sample.csv containing an "article_text" column in the Text Summarizer/ folder and run the script.
Adjust the GloVe URL and file paths in the script if necessary.
Performance considerations:
The script is suited to smaller datasets. Because the similarity matrix compares every pair of sentences, performance may degrade with large texts. Further improvements could include parallelizing the sentence similarity calculation.
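The test setup described above (a sample.csv with an "article_text" column) could be read along these lines. This is a standard-library sketch only: `read_articles` and `split_sentences` are hypothetical names, the regex-based sentence split is a simplification, and the script in this PR may use different libraries for both steps.

```python
import csv
import io
import re


def read_articles(csv_file, column="article_text"):
    """Return the non-empty values of `column` from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"CSV must contain a {column!r} column")
    return [row[column] for row in reader if row[column]]


def split_sentences(text):
    """Naive split on whitespace that follows '.', '!' or '?' (assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


# Usage, with the path taken from the notes above:
# with open("Text Summarizer/sample.csv", newline="", encoding="utf-8") as f:
#     for article in read_articles(f):
#         sentences = split_sentences(article)
```

Raising an error when the column is missing makes a misnamed CSV header fail early instead of producing an empty summary.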

Thank you for reviewing this contribution! I look forward to any feedback or suggestions you may have.

himanshumahajan138 · 2024-10-21T09:21:57Z

@king04aman Sir, I request that you please accept my PR. It is for Hacktoberfest, so I need it merged before the deadline.

Hope you understand.

king04aman

Just change the first line of the README file (recheck that file).

himanshumahajan138 · 2024-10-21T17:57:01Z

😂😂😂 Silly ChatGPT!

@king04aman
Fixed it. Please merge.

Mill GithHub Actions and others added 2 commits October 19, 2024 16:03

Fixes: king04aman#32 ; Feature: Text Summarization Tool Using NLP ; Fixed

f0bcfd9

Merge branch 'king04aman:main' into feature/add-text-summarizer

38831bb

king04aman self-requested a review October 21, 2024 17:37

king04aman requested changes Oct 21, 2024

himanshumahajan138 added 2 commits October 21, 2024 23:27

Update README.md

4e52a82

Update README.md

b4ecb7c

king04aman added hacktoberfest hacktoberfest-accepted labels Oct 21, 2024

king04aman approved these changes Oct 21, 2024

king04aman merged commit 79d8745 into king04aman:main Oct 21, 2024
