Simplify README #108

Merged
merged 3 commits into from
Dec 18, 2024
104 changes: 39 additions & 65 deletions README.md
@@ -2,65 +2,47 @@

 [![PyPI](https://img.shields.io/pypi/v/markitdown.svg)](https://pypi.org/project/markitdown/)

-The MarkItDown library is a utility tool for converting various files to Markdown (e.g., for indexing, text analysis, etc.)
+MarkItDown is a utility for converting various files to Markdown (e.g., for indexing, text analysis, etc).
+It supports:
+- PDF
+- PowerPoint
+- Word
+- Excel
+- Images (EXIF metadata and OCR)
+- Audio (EXIF metadata and speech transcription)
+- HTML
+- Text-based formats (CSV, JSON, XML)
+- ZIP files (iterates over contents)

-It presently supports:
+To install MarkItDown, use pip: `pip install markitdown`. Alternatively, you can install it from the source: `pip install -e .`

-- PDF (.pdf)
-- PowerPoint (.pptx)
-- Word (.docx)
-- Excel (.xlsx)
-- Images (EXIF metadata, and OCR)
-- Audio (EXIF metadata, and speech transcription)
-- HTML (special handling of Wikipedia, etc.)
-- Various other text-based formats (csv, json, xml, etc.)
-- ZIP (Iterates over contents and converts each file)
+## Usage

-# Installation
+### Command-Line

-You can install `markitdown` using pip:
-
-```python
-pip install markitdown
+```bash
+markitdown path-to-file.pdf > document.md
 ```

-or from the source
+You can also pipe content:

-```sh
-pip install -e .
+```bash
+cat path-to-file.pdf | markitdown
 ```

-# Usage
-The API is simple:
+### Python API

+Basic usage in Python:
+
 ```python
 from markitdown import MarkItDown

-markitdown = MarkItDown()
-result = markitdown.convert("test.xlsx")
+md = MarkItDown()
+result = md.convert("test.xlsx")
 print(result.text_content)
 ```
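
The Python API shown above prints the converted text. As a minimal sketch of mirroring the command-line `> document.md` redirect from Python, the helper below writes `result.text_content` to a file. The names `convert_file`, `destination`, and the optional `converter` argument are illustrative assumptions, not part of MarkItDown; only `MarkItDown()`, `convert()`, and `text_content` come from the README.

```python
from pathlib import Path
from typing import Any, Optional


def convert_file(source: str, destination: str, converter: Optional[Any] = None) -> Path:
    """Convert `source` to Markdown and write the text to `destination`."""
    if converter is None:
        # Imported lazily so the helper can be exercised with a stand-in
        # converter; the real one requires `pip install markitdown`.
        from markitdown import MarkItDown

        converter = MarkItDown()
    # `convert()` returns an object whose `text_content` holds the Markdown.
    result = converter.convert(source)
    out_path = Path(destination)
    out_path.write_text(result.text_content, encoding="utf-8")
    return out_path


# Example call (assumes markitdown is installed and test.xlsx exists):
#   convert_file("test.xlsx", "document.md")
```

Passing a stand-in `converter` makes the helper easy to try without installing the package; omit it to use the real `MarkItDown` converter.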

-To use this as a command-line utility, install it and then run it like this:
-
-```bash
-markitdown path-to-file.pdf
-```
-
-This will output Markdown to standard output. You can save it like this:
-
-```bash
-markitdown path-to-file.pdf > document.md
-```
-
-You can pipe content to standard input by omitting the argument:
-
-```bash
-cat path-to-file.pdf | markitdown
-```
-
-You can also configure markitdown to use Large Language Models to describe images. To do so you must provide `llm_client` and `llm_model` parameters to MarkItDown object, according to your specific client.
-
+To use Large Language Models for image descriptions, provide `llm_client` and `llm_model`:

 ```python
 from markitdown import MarkItDown
@@ -72,7 +54,7 @@ result = md.convert("example.jpg")
 print(result.text_content)
 ```
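
The README names `llm_client` and `llm_model` as the parameters that enable image descriptions, but this hunk elides the lines that construct the client. A hedged sketch of wiring them up follows; the `describe_image` helper and its `converter_cls` hook are hypothetical, and the commented usage assumes an OpenAI-style client and a model name chosen only for illustration.

```python
from typing import Any, Optional


def describe_image(
    path: str,
    llm_client: Any,
    llm_model: str,
    converter_cls: Optional[type] = None,
) -> str:
    """Convert an image to Markdown, passing the LLM settings to the converter."""
    if converter_cls is None:
        # Imported lazily; requires `pip install markitdown`.
        from markitdown import MarkItDown

        converter_cls = MarkItDown
    # `llm_client` and `llm_model` are the keyword parameters named in the README.
    converter = converter_cls(llm_client=llm_client, llm_model=llm_model)
    return converter.convert(path).text_content


# Example call (assumption: an OpenAI-style client object is available):
#   from openai import OpenAI
#   print(describe_image("example.jpg", OpenAI(), "gpt-4o"))
```

The `converter_cls` hook exists only so the sketch can be exercised with a stand-in class and no network access; in normal use it can be left at its default.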

-You can also use the project as Docker Image:
+### Docker

 ```sh
 docker build -t markitdown:latest .
@@ -93,30 +75,22 @@ This project has adopted the [Microsoft Open Source Code of Conduct](https://ope
 For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
 contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

-### Running Tests
-
-To run tests, install `hatch` using `pip` or other methods as described [here](https://hatch.pypa.io/dev/install).
+### Running Tests and Checks

-```sh
-pip install hatch
-hatch shell
-hatch test
-```
-
-Alternative method: using Devcontainer
-- Reopen project in the Devcontainer (via the Command Palette: `Reopen in Container`)
-- Once inside the container, run:
-```sh
-hatch test
-```
-
-### Running Pre-commit Checks
+- Install `hatch` in your environment and run tests:
+```sh
+pip install hatch # Other ways of installing hatch: https://hatch.pypa.io/dev/install/
+hatch shell
+hatch test
+```

-Please run the pre-commit checks before submitting a PR.
+(Alternative) Use the Devcontainer which has all the dependencies installed:
+```sh
+# Reopen the project in Devcontainer and run:
+hatch test
+```

-```sh
-pre-commit run --all-files
-```
+- Run pre-commit checks before submitting a PR: `pre-commit run --all-files`

 ## Trademarks
