From 1e7806a7ac191a9d95be3209addb4b187b0ca6a4 Mon Sep 17 00:00:00 2001
From: gagb
Date: Tue, 17 Dec 2024 17:21:39 -0800
Subject: [PATCH] Simplify

---
 README.md | 112 ++++++++++++++++++++++--------------------------------
 1 file changed, 45 insertions(+), 67 deletions(-)

diff --git a/README.md b/README.md
index 1de6cdc..ae5aef2 100644
--- a/README.md
+++ b/README.md
@@ -2,65 +2,47 @@
 
 [![PyPI](https://img.shields.io/pypi/v/markitdown.svg)](https://pypi.org/project/markitdown/)
 
-The MarkItDown library is a utility tool for converting various files to Markdown (e.g., for indexing, text analysis, etc.)
+MarkItDown is a utility for converting various files to Markdown (e.g., for indexing, text analysis, etc).
+It supports:
+- PDF
+- PowerPoint
+- Word
+- Excel
+- Images (EXIF metadata and OCR)
+- Audio (EXIF metadata and speech transcription)
+- HTML
+- Text-based formats (CSV, JSON, XML)
+- ZIP files (iterates over contents)
 
-It presently supports:
+To install MarkItDown, use pip: `pip install markitdown`. Alternatively, you can install it from the source: `pip install -e .`.
 
-- PDF (.pdf)
-- PowerPoint (.pptx)
-- Word (.docx)
-- Excel (.xlsx)
-- Images (EXIF metadata, and OCR)
-- Audio (EXIF metadata, and speech transcription)
-- HTML (special handling of Wikipedia, etc.)
-- Various other text-based formats (csv, json, xml, etc.)
-- ZIP (Iterates over contents and converts each file)
+## Usage
 
-# Installation
-
-You can install `markitdown` using pip:
-
-```python
-pip install markitdown
-```
-
-or from the source
-
-```sh
-pip install -e .
-```
-
-# Usage
-The API is simple:
-
-```python
-from markitdown import MarkItDown
-
-markitdown = MarkItDown()
-result = markitdown.convert("test.xlsx")
-print(result.text_content)
-```
-
-To use this as a command-line utility, install it and then run it like this:
-
-```bash
-markitdown path-to-file.pdf
-```
-
-This will output Markdown to standard output. You can save it like this:
+### Command-Line
 
 ```bash
 markitdown path-to-file.pdf > document.md
 ```
 
-You can pipe content to standard input by omitting the argument:
+You can also pipe content:
 
 ```bash
 cat path-to-file.pdf | markitdown
 ```
 
-You can also configure markitdown to use Large Language Models to describe images. To do so you must provide `llm_client` and `llm_model` parameters to MarkItDown object, according to your specific client.
+### Python API
+Basic usage in Python:
+
+```python
+from markitdown import MarkItDown
+
+md = MarkItDown()
+result = md.convert("test.xlsx")
+print(result.text_content)
+```
+
+To use Large Language Models for image descriptions, provide `llm_client` and `llm_model`:
 
 ```python
 from markitdown import MarkItDown
@@ -72,7 +54,7 @@ result = md.convert("example.jpg")
 print(result.text_content)
 ```
 
-You can also use the project as Docker Image:
+### Docker
 
 ```sh
 docker build -t markitdown:latest .
@@ -93,30 +75,26 @@ This project has adopted the [Microsoft Open Source Code of Conduct](https://ope
 For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
 contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
 
-### Running Tests
+### Running Tests and Checks
 
-To run tests, install `hatch` using `pip` or other methods as described [here](https://hatch.pypa.io/dev/install).
+- Install `hatch` in your environment and run tests:
+  ```sh
+  pip install hatch # Other ways of installing hatch: https://hatch.pypa.io/dev/install/
+  hatch shell
+  hatch test
+  ```
 
-```sh
-pip install hatch
-hatch shell
-hatch test
-```
+  (Alternative) Use the Devcontainer which has all the dependencies installed:
+  ```sh
+  # Reopen the project in Devcontainer and run:
+  hatch test
+  ```
 
-Alternative method: using Devcontainer
-- Reopen project in the Devcontainer (via the Command Palette: `Reopen in Container`)
-- Once inside the container, run:
-```sh
-hatch test
-```
-
-### Running Pre-commit Checks
-
-Please run the pre-commit checks before submitting a PR.
-
-```sh
-pre-commit run --all-files
-```
+- Run pre-commit checks before submitting a PR:
+  ```sh
+  # pip install pre-commit
+  pre-commit run --all-files
+  ```
 
 ## Trademarks