refactor: split _markitdown.py into modular components #253

Open: t3tra-dev wants to merge 1 commit into main

Description

This PR addresses the growing complexity of _markitdown.py by splitting it into smaller, more focused modules. The changes improve code organization and maintainability while preserving all existing functionality.

Changes

  • Created a new converters/ package to house different converter implementations
  • Split converters into logical groups (document, web, media, text, archive)
  • Moved core MarkItDown class functionality to core.py
  • Separated exception classes into exceptions.py
  • Updated imports and tests to reflect new structure

Testing

  • All existing tests pass without modification
  • Verified that the refactor introduces no functional changes

Implementation Details

The refactoring follows these principles:

  1. Single Responsibility: Each module handles a specific type of conversion
  2. Open/Closed: New converters can be added without modifying existing code
  3. Interface Segregation: Clear base class and consistent converter interface
  4. Dependency Inversion: Core MarkItDown class depends on abstractions
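
For reviewers, the principles above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `BaseConverter`, `ConversionResult`, `ConverterRegistry`, and both example converters are hypothetical names, and converters receive in-memory text plus a file extension so the sketch runs without files on disk.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ConversionResult:
    """Hypothetical result type: Markdown text plus an optional title."""
    text_content: str
    title: Optional[str] = None


class BaseConverter(ABC):
    """Hypothetical abstraction that the core class depends on."""

    @abstractmethod
    def convert(self, content: str, extension: str) -> Optional[ConversionResult]:
        """Return a result, or None if this converter does not handle the input."""


class PlainTextConverter(BaseConverter):
    """Single responsibility: only handles .txt input."""

    def convert(self, content, extension):
        if extension != ".txt":
            return None
        return ConversionResult(text_content=content)


class CsvConverter(BaseConverter):
    """Renders comma-separated rows as a Markdown table (naive split, no quoting)."""

    def convert(self, content, extension):
        if extension != ".csv":
            return None
        rows = [line.split(",") for line in content.splitlines() if line.strip()]
        if not rows:
            return ConversionResult(text_content="")
        header, body = rows[0], rows[1:]
        lines = [
            "| " + " | ".join(header) + " |",
            "| " + " | ".join("---" for _ in header) + " |",
        ]
        lines += ["| " + " | ".join(row) + " |" for row in body]
        return ConversionResult(text_content="\n".join(lines))


class ConverterRegistry:
    """Stand-in for the core class: it knows only the BaseConverter interface."""

    def __init__(self):
        self._converters: List[BaseConverter] = []

    def register(self, converter: BaseConverter) -> None:
        # Most recently registered converters are tried first.
        self._converters.insert(0, converter)

    def convert(self, content: str, extension: str) -> ConversionResult:
        for converter in self._converters:
            result = converter.convert(content, extension.lower())
            if result is not None:
                return result
        raise ValueError(f"No converter registered for {extension!r}")
```

In this sketch, supporting a new format means writing one more `BaseConverter` subclass and registering it; `ConverterRegistry` and the existing converters stay untouched.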

Migration Notes

This is a non-breaking change: all public APIs remain unchanged. Only internal imports have been updated to reflect the new structure.

@t3tra-dev (Author)

@microsoft-github-policy-service agree
