Support for GitHub issue/prs to markdown #5

Draft
wants to merge 9 commits into
base: main


gagb
Contributor

@gagb gagb commented Dec 12, 2024

Enables the following

markitdown https://github.com/andrewyng/aisuite/issues/61

The output would be:

# Request for Python Asyncio Support

I would like to request support for Python’s asyncio in this library. This feature would be particularly beneficial for Python services, which often rely on asynchronous programming for efficient and scalable operations.

Some providers, such as OpenAI, already offer native async support (e.g., `from openai import AsyncOpenAI`), making it straightforward to wrap these APIs. Others, like AWS, have community-supported async wrappers, such as `aioboto3`. For providers without async support, an interim solution using a synchronous wrapper could be implemented while awaiting a proper asyncio implementation.

Asyncio support would greatly enhance the usability of this library. Thank you for considering this enhancement.

**State:** open
**Created at:** 2024-11-26 02:16:05+00:00
**Updated at:** 2024-11-30 01:28:19+00:00
**Comments:**
- sarthakforwet (2024-11-26 03:47:34+00:00): Can you please assign this issue to me?
- soulcarus (2024-11-26 03:59:20+00:00): I refactored the code to use a thread pool instead of asyncio.

Initially, I attempted an asyncio-based solution. However, implementing a feature that solely uses asyncio would have required modifying several lines of code, which would have been time-consuming and inefficient for this specific task.

With just over 30 additional lines of code, I implemented a method that handles the heavy lifting by assigning each model inference to a separate thread. This change results in a performance improvement, reducing execution time by 

Add support for converting GitHub issues to markdown.

* Add `convert_github_issue` method in `src/markitdown/_markitdown.py` to handle GitHub issue conversion.
* Use `PyGithub` to fetch issue details using the provided token.
* Convert the issue details to markdown format and return as `DocumentConverterResult`.
* Add optional GitHub issue support with `IS_GITHUB_ISSUE_CAPABLE` flag.
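The rendering step described above can be illustrated with a minimal sketch. It is not the PR's actual `convert_github_issue` code: the `Issue` and `Comment` dataclasses and the `issue_to_markdown` helper are hypothetical stand-ins for the objects an API client would return, and the layout simply mirrors the sample output shown earlier in this description.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Comment:
    # Hypothetical stand-in for a comment object returned by an API client.
    author: str
    created_at: datetime
    body: str


@dataclass
class Issue:
    # Hypothetical stand-in for an issue object returned by an API client.
    title: str
    body: str
    state: str
    created_at: datetime
    updated_at: datetime
    comments: List[Comment] = field(default_factory=list)


def issue_to_markdown(issue: Issue) -> str:
    """Render an issue in the layout shown in the sample output."""
    lines = [
        f"# {issue.title}",
        "",
        issue.body or "",  # an issue may have an empty body
        "",
        f"**State:** {issue.state}",
        f"**Created at:** {issue.created_at}",
        f"**Updated at:** {issue.updated_at}",
        "**Comments:**",
    ]
    # One bullet per comment: author, timestamp, then the comment text.
    for comment in issue.comments:
        lines.append(f"- {comment.author} ({comment.created_at}): {comment.body}")
    return "\n".join(lines)
```

Keeping the rendering separate from the fetching would make the formatter testable without a token or network access; whether the PR structures it this way is not stated here.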
@gagb gagb marked this pull request as draft December 12, 2024 21:41
@gagb gagb requested a review from Copilot December 12, 2024 21:47

Copilot reviewed 1 out of 1 changed files in this pull request and generated no suggestions.

Comments skipped due to low confidence (1)

src/markitdown/_markitdown.py:1115

  • The error message should specify the package name to install. Suggest changing to: 'PyGithub is not installed. Please install the package using pip install PyGithub to use this feature.'
raise ImportError("PyGithub is not installed. Please install it to use this feature.")
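A minimal sketch of how an optional-dependency guard with the suggested message might look. The `IS_GITHUB_ISSUE_CAPABLE` flag name comes from the PR description; the `ensure_github_support` helper and its `capable` parameter are hypothetical and exist only to make the behavior easy to exercise.

```python
try:
    # The `github` module is provided by the PyGithub package.
    from github import Github  # noqa: F401

    IS_GITHUB_ISSUE_CAPABLE = True
except ImportError:
    IS_GITHUB_ISSUE_CAPABLE = False


def ensure_github_support(capable: bool = IS_GITHUB_ISSUE_CAPABLE) -> None:
    """Raise a helpful ImportError when the optional dependency is missing.

    `capable` is a hypothetical parameter; it defaults to the module-level flag.
    """
    if not capable:
        raise ImportError(
            "PyGithub is not installed. Please install the package using "
            "pip install PyGithub to use this feature."
        )
```

Naming the exact `pip install` target in the message saves the user from guessing the package name, which differs from the import name (`github`).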
@gagb gagb marked this pull request as ready for review December 12, 2024 23:10
@gagb gagb requested a review from afourney December 12, 2024 23:10
@afourney
Member

Looks good, but I'll test it when I get back into town today.

@gagb
Contributor Author

gagb commented Dec 13, 2024

I think my approach won't work with the CLI, though. I need to fix that.

@gagb gagb marked this pull request as draft December 13, 2024 19:45
gagb and others added 2 commits December 13, 2024 13:57
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@gagb gagb marked this pull request as ready for review December 13, 2024 21:58
@gagb gagb changed the title from "Add method to convert GitHub issue to markdown" to "Support for GitHub issue to markdown" Dec 13, 2024
@gagb
Contributor Author

gagb commented Dec 13, 2024

I fixed the CLI issue as well. I'm still not sure whether `convert_stream` would work.

@gagb gagb changed the title from "Support for GitHub issue to markdown" to "Support for GitHub issue/prs to markdown" Dec 13, 2024
Comment on lines +1040 to +1051
parsed_url = urlparse(url)
if parsed_url.hostname == "github.com" and any(
    x in parsed_url.path for x in ["/issues/", "/pull/"]
):
    github_token = kwargs.get("github_token", os.getenv("GITHUB_TOKEN"))
    if not github_token:
        raise ValueError(
            "GitHub token is required for GitHub issue or pull request conversion."
        )
    return GitHubIssueConverter().convert(
        github_url=url, github_token=github_token
    )
Contributor Author

@afourney this is the conditional I added, since accessing GitHub requires an API call.
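For illustration, the URL check and token lookup in the conditional above could be sketched as two small standalone helpers. This is a variation, not the PR's code: `parse_github_issue_url` and `resolve_github_token` are hypothetical names, and the stricter path pattern (owner, repo, `issues` or `pull`, numeric id) is an assumption about which URLs should qualify, whereas the diff uses a substring test on the path.

```python
import os
import re
from typing import Optional, Tuple
from urllib.parse import urlparse

# Assumed URL shape: /<owner>/<repo>/(issues|pull)/<number>[/...]
_ISSUE_PATH = re.compile(r"^/([^/]+)/([^/]+)/(issues|pull)/([0-9]+)(?:/.*)?$")


def parse_github_issue_url(url: str) -> Optional[Tuple[str, str, str, int]]:
    """Return (owner, repo, kind, number) for an issue or PR URL, else None."""
    parsed = urlparse(url)
    if parsed.hostname != "github.com":
        return None
    match = _ISSUE_PATH.match(parsed.path)
    if match is None:
        return None
    owner, repo, kind, number = match.groups()
    return owner, repo, kind, int(number)


def resolve_github_token(**kwargs) -> str:
    """Prefer an explicit github_token keyword, then the GITHUB_TOKEN variable."""
    token = kwargs.get("github_token") or os.getenv("GITHUB_TOKEN")
    if not token:
        raise ValueError(
            "GitHub token is required for GitHub issue or pull request conversion."
        )
    return token
```

A usage example with the URL from the PR description:

```python
parse_github_issue_url("https://github.com/andrewyng/aisuite/issues/61")
# returns ("andrewyng", "aisuite", "issues", 61)
```

Matching the full path shape avoids treating a repository that merely has `/issues/` somewhere in its path as an issue URL; whether that strictness is wanted here is a design choice for the PR author.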

@gagb gagb marked this pull request as draft December 15, 2024 03:29
@Russusheikh
Copy link

Nice

3 participants