Merge branch 'main' into main

gagb
2024-12-16 16:35:50 -08:00
committed by GitHub
9 changed files with 251 additions and 9 deletions


@@ -1,5 +1,7 @@
# MarkItDown
[![PyPI](https://img.shields.io/pypi/v/markitdown.svg)](https://pypi.org/project/markitdown/)
The MarkItDown library is a utility tool for converting various files to Markdown (e.g., for indexing or text analysis).
It presently supports:
@@ -12,6 +14,7 @@ It presently supports:
- Audio (EXIF metadata, and speech transcription)
- HTML (special handling of Wikipedia, etc.)
- Various other text-based formats (csv, json, xml, etc.)
- ZIP (Iterates over contents and converts each file)
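The ZIP behaviour listed above can be pictured with a short sketch. This is not MarkItDown's actual implementation: it relies only on the standard-library `zipfile` module, and both `convert_zip` and the `convert_bytes` callback are hypothetical names standing in for whatever per-file converter is applied.

```python
import io
import zipfile
from typing import Callable, Dict


def convert_zip(data: bytes, convert_bytes: Callable[[str, bytes], str]) -> Dict[str, str]:
    """Convert every file in a ZIP archive; returns {member name: converted text}.

    `convert_bytes` is a hypothetical callback (name, raw bytes) -> Markdown text.
    """
    results: Dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries have no content to convert
            results[info.filename] = convert_bytes(info.filename, archive.read(info))
    return results
```

Passing the converter in as a callback keeps the archive walk separate from any format-specific logic.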
# Installation
@@ -27,7 +30,6 @@ or from the source
pip install -e .
```
# Usage
The API is simple:
@@ -39,7 +41,26 @@ result = markitdown.convert("test.xlsx")
print(result.text_content)
```
You can also configure markitdown to use Large Language Models to describe images. To do so you must provide mlm_client and mlm_model parameters to MarkItDown object, according to your specific client.
To use this as a command-line utility, install it and then run it like this:
```bash
markitdown path-to-file.pdf
```
This will output Markdown to standard output. You can save it like this:
```bash
markitdown path-to-file.pdf > document.md
```
You can pipe content to standard input by omitting the argument:
```bash
cat path-to-file.pdf | markitdown
```
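The fall-back to standard input when the argument is omitted follows a common command-line pattern, sketched below. This illustrates the general technique and is not the `markitdown` command's actual source; `read_input` is a hypothetical helper name.

```python
import argparse
import sys
from typing import BinaryIO, List, Optional


def read_input(argv: List[str], stdin: Optional[BinaryIO] = None) -> bytes:
    """Return the bytes of the file named in argv, or of stdin if no file is given."""
    parser = argparse.ArgumentParser(description="Read a file, or stdin when no path is given.")
    # nargs="?" makes the positional argument optional (None when omitted).
    parser.add_argument("filename", nargs="?", help="path to the input file (optional)")
    args = parser.parse_args(argv)

    if args.filename is None:
        # No positional argument: consume piped input as raw bytes.
        stream = stdin if stdin is not None else sys.stdin.buffer
        return stream.read()
    with open(args.filename, "rb") as handle:
        return handle.read()
```

Reading from `sys.stdin.buffer` rather than `sys.stdin` matters for binary formats such as PDF, since the text-mode stream would try to decode the bytes.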
You can also configure markitdown to use Large Language Models to describe images. To do so, you must provide the `mlm_client` and `mlm_model` parameters to the `MarkItDown` object, according to your specific client.
```python
from markitdown import MarkItDown