Simplify
92
README.md
@@ -2,65 +2,47 @@

[](https://pypi.org/project/markitdown/)

The MarkItDown library is a utility tool for converting various files to Markdown (e.g., for indexing, text analysis, etc.)
MarkItDown is a utility for converting various files to Markdown (e.g., for indexing, text analysis, etc).
It supports:
- PDF
- PowerPoint
- Word
- Excel
- Images (EXIF metadata and OCR)
- Audio (EXIF metadata and speech transcription)
- HTML
- Text-based formats (CSV, JSON, XML)
- ZIP files (iterates over contents)

It presently supports:
To install MarkItDown, use pip: `pip install markitdown`. Alternatively, you can install it from the source: `pip install -e .`.

- PDF (.pdf)
- PowerPoint (.pptx)
- Word (.docx)
- Excel (.xlsx)
- Images (EXIF metadata, and OCR)
- Audio (EXIF metadata, and speech transcription)
- HTML (special handling of Wikipedia, etc.)
- Various other text-based formats (csv, json, xml, etc.)
- ZIP (Iterates over contents and converts each file)
## Usage

# Installation

You can install `markitdown` using pip:

```python
pip install markitdown
```

or from the source

```sh
pip install -e .
```

# Usage
The API is simple:

```python
from markitdown import MarkItDown

markitdown = MarkItDown()
result = markitdown.convert("test.xlsx")
print(result.text_content)
```

To use this as a command-line utility, install it and then run it like this:

```bash
markitdown path-to-file.pdf
```

This will output Markdown to standard output. You can save it like this:
### Command-Line

```bash
markitdown path-to-file.pdf > document.md
```

You can pipe content to standard input by omitting the argument:
You can also pipe content:

```bash
cat path-to-file.pdf | markitdown
```

You can also configure markitdown to use Large Language Models to describe images. To do so you must provide `llm_client` and `llm_model` parameters to MarkItDown object, according to your specific client.
### Python API

Basic usage in Python:

```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("test.xlsx")
print(result.text_content)
```
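
As a hedged sketch building on the basic usage, the command-line redirect (`markitdown path-to-file.pdf > document.md`) can be mirrored from Python by writing `result.text_content` to a file. The helper name `save_markdown` and its parameters are hypothetical and not part of MarkItDown; the sketch only assumes the `convert(...)` call and the `text_content` attribute that appear in this README.

```python
from pathlib import Path


def save_markdown(source, dest, converter):
    """Convert `source` with `converter` and write the Markdown to `dest`.

    `converter` is assumed to be any object exposing `convert(path)` whose
    result has a `text_content` attribute, as `MarkItDown()` does in the
    README example. This helper is illustrative, not a library function.
    """
    result = converter.convert(str(source))
    dest = Path(dest)
    # Write the converted Markdown as UTF-8 text and return the output path.
    dest.write_text(result.text_content, encoding="utf-8")
    return dest


# Usage (requires `pip install markitdown`):
#   from markitdown import MarkItDown
#   save_markdown("test.xlsx", "test.md", MarkItDown())
```

Passing the converter in as an argument keeps the helper independent of how the `MarkItDown` object is configured.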

To use Large Language Models for image descriptions, provide `llm_client` and `llm_model`:

```python
from markitdown import MarkItDown
@@ -72,7 +54,7 @@ result = md.convert("example.jpg")
print(result.text_content)
```

You can also use the project as Docker Image:
### Docker

```sh
docker build -t markitdown:latest .
@@ -93,28 +75,24 @@ This project has adopted the [Microsoft Open Source Code of Conduct](https://ope
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

### Running Tests

To run tests, install `hatch` using `pip` or other methods as described [here](https://hatch.pypa.io/dev/install).
### Running Tests and Checks

- Install `hatch` in your environment and run tests:
```sh
pip install hatch
pip install hatch # Other ways of installing hatch: https://hatch.pypa.io/dev/install/
hatch shell
hatch test
```

Alternative method: using Devcontainer
- Reopen project in the Devcontainer (via the Command Palette: `Reopen in Container`)
- Once inside the container, run:
(Alternative) Use the Devcontainer which has all the dependencies installed:
```sh
# Reopen the project in Devcontainer and run:
hatch test
```

### Running Pre-commit Checks

Please run the pre-commit checks before submitting a PR.

- Run pre-commit checks before submitting a PR:
```sh
# pip install pre-commit
pre-commit run --all-files
```