Simplify

2024-12-17 17:21:39 -08:00
parent 1163aa2b4e
commit 1e7806a7ac
1 changed file with 45 additions and 67 deletions
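
For reference while reviewing: the Python API section touched by this change converts a file and prints `result.text_content`. A minimal sketch of wrapping that call so the Markdown is written to a file, similar in spirit to the CLI's `markitdown path-to-file.pdf > document.md`, might look like the following. The helper name `convert_to_markdown_file` and its `converter` parameter are hypothetical and not part of MarkItDown; only `MarkItDown()`, `.convert(...)`, and `.text_content` are taken from the README.

```python
from pathlib import Path


def convert_to_markdown_file(source, destination, converter=None):
    """Convert `source` and write the resulting Markdown to `destination`.

    `converter` is any object with a `convert(path)` method whose result
    exposes a `text_content` attribute, as in the README's Python example.
    If omitted, a `MarkItDown` instance is created, which assumes the
    package is installed (`pip install markitdown`).
    """
    if converter is None:
        # Imported lazily so the helper can also be used with a stand-in
        # converter when markitdown is not installed.
        from markitdown import MarkItDown

        converter = MarkItDown()

    result = converter.convert(str(source))
    destination = Path(destination)
    destination.write_text(result.text_content, encoding="utf-8")
    return destination
```

Calling `convert_to_markdown_file("test.xlsx", "document.md")` would then be expected to leave the converted Markdown in `document.md`, assuming `test.xlsx` exists and is a supported format.
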
--- a/README.md
+++ b/README.md
@@ -2,65 +2,47 @@
 [![PyPI](https://img.shields.io/pypi/v/markitdown.svg)](https://pypi.org/project/markitdown/)
-The MarkItDown library is a utility tool for converting various files to Markdown (e.g., for indexing, text analysis, etc.)
+MarkItDown is a utility for converting various files to Markdown (e.g., for indexing, text analysis, etc.).
 It supports:
 - PDF
 - PowerPoint
 - Word
 - Excel
 - Images (EXIF metadata and OCR)
 - Audio (EXIF metadata and speech transcription)
 - HTML
 - Text-based formats (CSV, JSON, XML)
 - ZIP files (iterates over contents)
-It presently supports:
+To install MarkItDown, use pip: `pip install markitdown`. Alternatively, you can install it from the source: `pip install -e .`.
- PDF (.pdf)
+## Usage
 - PowerPoint (.pptx)
 - Word (.docx)
 - Excel (.xlsx)
 - Images (EXIF metadata, and OCR)
 - Audio (EXIF metadata, and speech transcription)
 - HTML (special handling of Wikipedia, etc.)
 - Various other text-based formats (csv, json, xml, etc.)
 - ZIP (Iterates over contents and converts each file)
-# Installation
+### Command-Line
 You can install `markitdown` using pip:
 ```sh
 pip install markitdown
 ```
 or from the source
 ```sh
 pip install -e .
 ```
 # Usage
 The API is simple:
 ```python
 from markitdown import MarkItDown
 markitdown = MarkItDown()
 result = markitdown.convert("test.xlsx")
 print(result.text_content)
 ```
 To use this as a command-line utility, install it and then run it like this:
 ```bash
 markitdown path-to-file.pdf
 ```
 This will output Markdown to standard output. You can save it like this:
 ```bash
 markitdown path-to-file.pdf > document.md
 ```
-You can pipe content to standard input by omitting the argument:
+You can also pipe content:
 ```bash
 cat path-to-file.pdf | markitdown
 ```
-You can also configure markitdown to use Large Language Models to describe images. To do so you must provide `llm_client` and `llm_model` parameters to MarkItDown object, according to your specific client.
+### Python API
 Basic usage in Python:
 ```python
 from markitdown import MarkItDown
 md = MarkItDown()
 result = md.convert("test.xlsx")
 print(result.text_content)
 ```
 To use Large Language Models for image descriptions, provide `llm_client` and `llm_model`:
 ```python
 from markitdown import MarkItDown
@@ -72,7 +54,7 @@ result = md.convert("example.jpg")
 print(result.text_content)
 ```
-You can also use the project as Docker Image:
+### Docker
 ```sh
 docker build -t markitdown:latest .
@@ -93,30 +75,26 @@ This project has adopted the [Microsoft Open Source Code of Conduct](https://ope
 For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
 contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
-### Running Tests
+### Running Tests and Checks
-To run tests, install `hatch` using `pip` or other methods as described [here](https://hatch.pypa.io/dev/install).
+- Install `hatch` in your environment and run tests:
    ```sh
    pip install hatch  # Other ways of installing hatch: https://hatch.pypa.io/dev/install/
    hatch shell
    hatch test
    ```
-```sh
+  (Alternative) Use the Devcontainer which has all the dependencies installed:
-pip install hatch
+    ```sh
-hatch shell
+    # Reopen the project in Devcontainer and run:
-hatch test
+    hatch test
-```
+    ```
-Alternative method: using Devcontainer
+- Run pre-commit checks before submitting a PR:
- Reopen project in the Devcontainer (via the Command Palette: `Reopen in Container`)
+        ```sh
- Once inside the container, run:
+        # pip install pre-commit
-```sh
+        pre-commit run --all-files
-hatch test
+        ```
 ```
 ### Running Pre-commit Checks
 Please run the pre-commit checks before submitting a PR.
 ```sh
 pre-commit run --all-files
 ```
 ## Trademarks