Cleanup and refactor, in preparation for plugin support. (#318)
* Work started moving converters to individual files.
* Significant cleanup and refactor.
* Moved everything to a packages subfolder.
* Added sample plugin.
* Added instructions to the README.md
* Bumped version, and added a note about compatibility.
96
packages/markitdown-sample-plugin/README.md
Normal file
@@ -0,0 +1,96 @@
# MarkItDown Sample Plugin

[PyPI](https://pypi.org/project/markitdown/)
[AutoGen](https://github.com/microsoft/autogen)

This project shows how to create a sample plugin for MarkItDown. The most important parts are as follows:

First, implement your custom DocumentConverter:

```python
from typing import Union
from markitdown import DocumentConverter, DocumentConverterResult

class RtfConverter(DocumentConverter):
    def convert(self, local_path, **kwargs) -> Union[None, DocumentConverterResult]:
        # Bail if not an RTF file
        extension = kwargs.get("file_extension", "")
        if extension.lower() != ".rtf":
            return None

        # Implement the conversion logic here ...

        # Return the result
        return DocumentConverterResult(
            title=title,
            text_content=text_content,
        )
```

Next, make sure your package implements and exports the following:

```python
from markitdown import MarkItDown

# The version of the plugin interface that this plugin uses.
# The only supported version is 1 for now.
__plugin_interface_version__ = 1

# The main entry point for the plugin. This is called each time MarkItDown instances are created.
def register_converters(markitdown: MarkItDown, **kwargs):
    """
    Called during construction of MarkItDown instances to register converters provided by plugins.
    """

    # Simply create and attach an RtfConverter instance
    markitdown.register_converter(RtfConverter())
```
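
To make the interplay between `register_converters` and a converter that returns `None` more concrete, here is a minimal, self-contained sketch. It does not import MarkItDown: `SketchHost`, `SketchRtfConverter`, and `SketchResult` are hypothetical stand-ins, and the lookup order (registration order, first non-`None` result wins) is an assumption made for illustration, not a description of MarkItDown's internals.

```python
from typing import List, Optional


class SketchResult:
    """Hypothetical stand-in for DocumentConverterResult."""

    def __init__(self, title: Optional[str], text_content: str) -> None:
        self.title = title
        self.text_content = text_content


class SketchRtfConverter:
    """Hypothetical stand-in converter that only accepts .rtf files."""

    def convert(self, local_path: str, **kwargs) -> Optional[SketchResult]:
        extension = kwargs.get("file_extension", "")
        if extension.lower() != ".rtf":
            return None  # Not an RTF file; let another converter try
        # Placeholder "conversion": a real plugin would parse the file here.
        return SketchResult(title=None, text_content=f"converted {local_path}")


class SketchHost:
    """Hypothetical stand-in for the host; NOT MarkItDown's implementation."""

    def __init__(self) -> None:
        self._converters: List[SketchRtfConverter] = []

    def register_converter(self, converter: SketchRtfConverter) -> None:
        self._converters.append(converter)

    def convert(self, local_path: str, file_extension: str) -> Optional[SketchResult]:
        # Assumption: try converters in registration order and
        # return the first result that is not None.
        for converter in self._converters:
            result = converter.convert(local_path, file_extension=file_extension)
            if result is not None:
                return result
        return None


def register_converters(host: SketchHost, **kwargs) -> None:
    """Mirrors the shape of the plugin entry point shown above."""
    host.register_converter(SketchRtfConverter())


host = SketchHost()
register_converters(host)
rtf_result = host.convert("notes.rtf", file_extension=".rtf")
pdf_result = host.convert("paper.pdf", file_extension=".pdf")
print(rtf_result.text_content)  # converted notes.rtf
print(pdf_result)               # None
```

The point of the sketch is the contract: a converter declines a file by returning `None`, which lets the host fall through to other converters.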

Finally, create an entry point in the `pyproject.toml` file:

```toml
[project.entry-points."markitdown.plugin"]
sample_plugin = "markitdown_sample_plugin"
```

Here, `sample_plugin` is the key. It can be any name, but should ideally be the name of the plugin. The value is the fully qualified name of the package implementing the plugin.

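
As a rough illustration of what this entry point makes possible, the sketch below uses the standard library's `importlib.metadata` to list whatever is registered under the `markitdown.plugin` group in the current environment. The helper name `list_plugin_entry_points` is hypothetical, and this shows only the generic entry-point mechanism; it is not necessarily how MarkItDown itself discovers plugins.

```python
from importlib.metadata import entry_points


def list_plugin_entry_points(group: str = "markitdown.plugin"):
    """Return sorted (name, value) pairs for installed entry points in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are filtered with select()
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group name
        selected = eps.get(group, [])
    return sorted((ep.name, ep.value) for ep in selected)


# Prints nothing if no plugin packages are installed.
for name, value in list_plugin_entry_points():
    print(f"{name} -> {value}")
```

With the sample plugin installed, the listing should include the key and value from the `pyproject.toml` fragment above, that is, `sample_plugin` paired with `markitdown_sample_plugin`.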
## Installation

To use the plugin with MarkItDown, it must be installed. To install the plugin from the current directory, use:

```bash
pip install -e .
```

Once the plugin package is installed, verify that it is available to MarkItDown by running:

```bash
markitdown --list-plugins
```

To use the plugin for a conversion, use the `--use-plugins` flag. For example, to convert a PDF:

```bash
markitdown --use-plugins path-to-file.pdf
```

In Python, plugins can be enabled as follows:

```python
from markitdown import MarkItDown

md = MarkItDown(enable_plugins=True)
result = md.convert("path-to-file.pdf")
print(result.text_content)
```

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos is subject to those third parties' policies.