Fixes: #32 ; Feature: Text Summarization Tool Using NLP ; Fixed #69

Merged

himanshumahajan138
Contributor

@himanshumahajan138 commented Oct 19, 2024

Pull Request Title:

Implement TextRank-based Text Summarization with GloVe Embeddings
Fixes: #32 ; Feature: Text Summarization Tool Using NLP ; Fixed

Description:

This PR introduces a Python script that implements the TextRank algorithm for text summarization. The key highlights include:

  • Text Preprocessing: Removes punctuation, special characters, numbers, and stopwords.
  • Sentence Vectorization: Utilizes GloVe word embeddings to convert sentences into vectors.
  • Similarity Matrix: Constructs a similarity matrix using cosine similarity between sentence vectors.
  • TextRank Algorithm: Applies the PageRank algorithm to the similarity matrix to rank sentences.
  • Summary Output: Returns the top N most important sentences as the summary.
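The ranking steps listed above can be sketched in plain Python. This is a minimal illustration and not the PR's code: the function bodies, the damping factor of 0.85, the clamping of negative similarities to zero, and the helper name top_sentences are all assumptions made for the example. It takes sentence vectors as given and covers only the similarity matrix, the PageRank iteration, and the top-N selection.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_similarity_matrix(vectors):
    """Pairwise cosine similarity, with the diagonal left at 0 (no self-links)."""
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Assumption: negative similarities are clamped to 0 so that
            # every entry can be read as a non-negative edge weight.
            s = max(0.0, cosine_similarity(vectors[i], vectors[j]))
            sim[i][j] = sim[j][i] = s
    return sim


def pagerank(sim, damping=0.85, iterations=100, tol=1e-8):
    """Weighted PageRank by power iteration over a similarity matrix."""
    n = len(sim)
    if n == 0:
        return []
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Sentences with no links spread their score uniformly.
        dangling = sum(scores[j] for j in range(n) if out_weight[j] == 0.0)
        new_scores = []
        for i in range(n):
            incoming = sum(
                scores[j] * sim[j][i] / out_weight[j]
                for j in range(n)
                if out_weight[j] > 0.0
            )
            new_scores.append((1.0 - damping) / n + damping * (incoming + dangling / n))
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


def top_sentences(sentences, scores, n=2):
    """Return the n highest-scoring sentences, restored to document order."""
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(ranked)]
```

Returning the selected sentences in document order is a choice made in this sketch; the PR description only states that the top N sentences are returned.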

Key Features:

  1. GloVe Embeddings Download: Automatically downloads GloVe embeddings if not available locally.
  2. Preprocessing Module: Cleans and prepares sentences for summarization.
  3. Vectorization Module: Maps sentences to vector representations using GloVe.
  4. Ranking Module: Ranks sentences based on cosine similarity and PageRank.
  5. Logging: Provides detailed logs to track each step.
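As a rough illustration of features 1, 3, and 5, the sketch below downloads a file only when it is missing, parses GloVe's plain-text format (one word per line followed by its float components), and logs each step. The names ensure_downloaded, load_glove, sentence_vector, and the logger name are hypothetical, and averaging word vectors is an assumption: the PR description does not say how word vectors are combined into a sentence vector. The real GloVe distribution is a zip archive, so the actual script would also need an extraction step that is omitted here.

```python
import logging
import os
import urllib.request

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("summarizer_sketch")  # hypothetical logger name


def ensure_downloaded(path, url):
    """Download url to path only when path does not exist yet; return path."""
    if os.path.exists(path):
        logger.info("Found %s, skipping download", path)
        return path
    logger.info("Downloading %s to %s", url, path)
    urllib.request.urlretrieve(url, path)
    return path


def load_glove(path):
    """Parse GloVe's text format: a word, then its space-separated floats."""
    embeddings = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip().split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            embeddings[parts[0]] = [float(x) for x in parts[1:]]
    logger.info("Loaded %d word vectors", len(embeddings))
    return embeddings


def sentence_vector(sentence, embeddings, dim):
    """Average the vectors of known words (an assumption); zeros if none are known."""
    known = [embeddings[w] for w in sentence.split() if w in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]
```

The zero-vector fallback keeps sentences made up entirely of unknown words from breaking the later similarity step, at the cost of giving them no links in the graph.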

Changes:

  • Added preprocess_sentences() to clean and tokenize sentences.
  • Added generate_sentence_vectors() to convert sentences into vector representations.
  • Implemented build_similarity_matrix() to calculate cosine similarity between sentence vectors.
  • Implemented rank_sentences() to apply TextRank and extract the most important sentences.
  • Created download_glove() to handle the automatic download of GloVe embeddings.
  • Added a summarize_text() function to tie the entire summarization process together.
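For readers who want a feel for the cleaning step, here is a minimal sketch of what a function like preprocess_sentences() might do. It is not the code added in this PR: the regex-based sentence splitter, the tiny STOPWORDS set, and the function bodies are assumptions for illustration, and the actual script may rely on an NLP library's tokenizer and stopword list instead.

```python
import re

# A tiny illustrative stopword set (assumption); a real list would be much longer.
STOPWORDS = frozenset(
    {"a", "an", "the", "is", "are", "was", "of", "in", "on", "and", "to", "it"}
)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def preprocess_sentences(sentences, stopwords=STOPWORDS):
    """Lowercase, keep letters only, and drop stopwords from each sentence."""
    cleaned = []
    for sentence in sentences:
        # Replace punctuation, digits, and special characters with spaces.
        letters_only = re.sub(r"[^a-zA-Z]", " ", sentence).lower()
        words = [w for w in letters_only.split() if w not in stopwords]
        cleaned.append(" ".join(words))
    return cleaned
```

The cleaned sentences are only used for vectorization and ranking; the summary itself would be assembled from the original, unmodified sentences.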

Documentation:

  • Detailed the workflow in the README.md explaining each function, its purpose, and usage.
  • Provided instructions on downloading and utilizing GloVe embeddings in the summarization process.
  • Added inline comments and logging for better traceability.

Testing:

  • Tested on sample data stored in a CSV file.
  • Verified GloVe embeddings are downloaded and extracted correctly.
  • Confirmed output summarization meets expectations.

Issue Reference:

Fixes: #32 (New feature implementation)


Checklist:

  • Code follows the existing coding style.
  • Documentation has been added/updated in README.md.
  • All tests have passed successfully.
  • The PR includes only relevant code changes (no unrelated modifications).

Notes for Reviewers:

  • How to test:
    Place a CSV file named sample.csv containing an "article_text" column in the Text Summarizer/ folder and run the script.
    Adjust the GloVe URL and file paths in the script if necessary.
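Before running the script, it can help to confirm that the input file has the expected shape. The helper below is a hypothetical sketch using only the standard library; read_articles is not part of the PR, and the script itself may load the file differently (for example with pandas).

```python
import csv


def read_articles(csv_path, column="article_text"):
    """Return the non-empty values of `column` from a CSV file with a header row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise ValueError(f"{csv_path} has no '{column}' column")
        # Skip rows where the article text is missing or only whitespace.
        return [row[column] for row in reader if row[column] and row[column].strip()]
```

Calling read_articles("Text Summarizer/sample.csv") should return one string per article, and it raises ValueError early if the "article_text" column is absent.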

  • Performance considerations:
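    The script is suited to smaller datasets. Because the similarity matrix compares every pair of sentences, run time grows quadratically with the number of sentences, so performance may degrade on large texts. A further improvement could be parallelizing the sentence similarity calculation.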

Thank you for reviewing this contribution! I look forward to any feedback or suggestions you may have.

@himanshumahajan138
Contributor Author

@king04aman Sir, I request you to please accept my PR, as this is for Hacktoberfest and I need it done before the deadline.

Hope You Understand...

@king04aman self-requested a review October 21, 2024 17:37
Owner

@king04aman left a comment

  • Just change the first line of the README file (recheck that file).

@himanshumahajan138
Contributor Author

😂😂😂 Silly CHATGPT!

@king04aman
Fixed it. Please merge...

@king04aman merged commit 79d8745 into king04aman:main Oct 21, 2024
Development

Successfully merging this pull request may close these issues.

Develop Text Summarization Tool using NLP Libraries
2 participants