* feat: Add CSV to Markdown table converter
- Add new CsvConverter class to convert CSV files to Markdown tables
- Support text/csv and application/csv MIME types
- Preserve table structure with headers and data rows
- Handle edge cases like empty cells and mismatched columns
- Fix Azure Document Intelligence dependency handling
- Register CsvConverter in MarkItDown class
----
Thanks also to @benny123tw who submitted a very similar PR in #1171
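
A minimal sketch of the CSV-to-Markdown conversion described above, assuming the first row is the header and that mismatched rows are padded or truncated to the header width. The function name csv_to_markdown is hypothetical and is not the actual CsvConverter implementation.

```python
import csv
import io


def csv_to_markdown(csv_text: str) -> str:
    """Illustrative helper (hypothetical name): render CSV text as a Markdown table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""

    header = rows[0]
    width = len(header)

    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join(["---"] * width) + " |",
    ]
    for row in rows[1:]:
        # Assumption: pad short rows with empty cells, truncate long rows.
        row = (row + [""] * width)[:width]
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(csv_to_markdown("name,city\nAda,\nGrace,Arlington,extra\n"))
```

Padding and truncating to the header width is one possible reading of "mismatched columns"; the real converter may handle the case differently.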
* feat: math equation rendering in .docx files
* fix: import fix on .docx preprocessing
* test: add test cases for docx equation rendering
* docs: add ThirdPartyNotices.md
* refactor: reformatted with black
* Make it easier to use AzureKeyCredential with Azure Document Intelligence
* Fixed mypy type error.
* Added more fine-grained options over types.
* Pass doc intel options further up the stack.
* Added an initial minimal MCP server for MarkItDown
* Added STDIO default option.
* Added a Dockerfile and updated the README accordingly. Also added instructions for Claude Desktop.
* Pin mcp version.
* optionally preserve base64 strings in markdown _CustomMarkdownify and pptx
* add parameter support to other converters
* fix linter
* Use **kwargs to pass the keep_data_uri parameter.
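
A rough sketch of how an option such as keep_data_uri might be threaded through keyword arguments, assuming the default is to drop the base64 payload and keep only the data URI prefix. The function render_image is hypothetical and does not reflect the actual _CustomMarkdownify code.

```python
def render_image(alt: str, src: str, **kwargs) -> str:
    """Illustrative helper (hypothetical name): emit a Markdown image link."""
    # Assumption: the option arrives as a keyword argument and defaults to False.
    keep_data_uri = kwargs.get("keep_data_uri", False)

    if src.startswith("data:") and not keep_data_uri:
        # Keep only the part before the comma (MIME type and encoding),
        # replacing the base64 payload with an ellipsis.
        src = src.split(",")[0] + "..."
    return f"![{alt}]({src})"


if __name__ == "__main__":
    uri = "data:image/png;base64,AAAA"
    print(render_image("logo", uri))
    print(render_image("logo", uri, keep_data_uri=True))
```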
* Add module cli vector tests
* Fixed formatting, and adjusted tests.
Adjusts warning filters to be more contextual
Updates dependencies for magika and youtube-transcript-api
Updates the version to 0.1.0a5 in __about__.py
* Refactored tests.
* Fixed CI errors, and included misc tests.
* Omit mskanji from streaminfo test.
* Omit mskanji from no hints test.
* Log results of debugging in comments (linked to Magika issue)
* Added docs as to when to use misc tests.
* refactor(docker): remove unnecessary root user
The USER root directive isn't needed directly after FROM
Signed-off-by: Sebastian Yaghoubi <sebastianyaghoubi@gmail.com>
* fix(docker): use generic nobody nogroup default instead of uid gid
Signed-off-by: Sebastian Yaghoubi <sebastianyaghoubi@gmail.com>
* fix(docker): build app from source locally instead of installing package
Signed-off-by: Sebastian Yaghoubi <sebastianyaghoubi@gmail.com>
* fix(docker): use correct files in dockerignore
Signed-off-by: Sebastian Yaghoubi <sebastianyaghoubi@gmail.com>
* chore(docker): don't install recommended packages with git
Signed-off-by: Sebastian Yaghoubi <sebastianyaghoubi@gmail.com>
* fix(docker): run apt as non-interactive
Signed-off-by: Sebastian Yaghoubi <sebastianyaghoubi@gmail.com>
* Update Dockerfile to new package structure, and fix streaming bugs.
---------
Signed-off-by: Sebastian Yaghoubi <sebastianyaghoubi@gmail.com>
Co-authored-by: afourney <adamfo@microsoft.com>
* Sort PPTX shapes to be read in top-to-bottom, left-to-right order
Referenced from 39bef65b31/pptx2md/parser.py (L249)
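
A minimal sketch of the top-to-bottom, left-to-right ordering, using a hypothetical Shape stand-in rather than real python-pptx objects. Treating a missing position as sorting first is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Shape:
    """Hypothetical stand-in for a slide shape with an optional position."""
    name: str
    top: Optional[int]
    left: Optional[int]


def sort_shapes(shapes: List[Shape]) -> List[Shape]:
    """Order shapes by vertical position first, then horizontal position."""
    def key(shape: Shape):
        # Assumption: shapes without a position sort before positioned ones.
        top = float("-inf") if shape.top is None else shape.top
        left = float("-inf") if shape.left is None else shape.left
        return (top, left)

    return sorted(shapes, key=key)


if __name__ == "__main__":
    shapes = [Shape("footer", 900, 10), Shape("right", 100, 500), Shape("left", 100, 20)]
    print([s.name for s in sort_shapes(shapes)])
```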
* Update README.md
* Fixed formatting.
* Added missing import
* Updated DocumentConverter interface
* Updated all DocumentConverter classes
* Added support for various new audio files.
* Updated sample plugin to new DocumentConverter interface.
* Updated project README with notes about changes, and use-cases.
* Updated DocumentConverter documentation.
* Move priority outside of DocumentConverter, allowing converters to be reprioritized and keeping the DocumentConverter interface simple.
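
One way to picture priority living outside the converter is a registry that stores a (converter, priority) pair at registration time. All class names below are hypothetical, and the convention that lower values are tried first is an assumption, not a statement about the actual MarkItDown API.

```python
from dataclasses import dataclass
from typing import List, Optional


class DocumentConverter:
    """Hypothetical minimal interface: it carries no priority of its own."""

    def accepts(self, filename: str) -> bool:
        raise NotImplementedError

    def convert(self, text: str) -> str:
        raise NotImplementedError


class PlainTextConverter(DocumentConverter):
    def accepts(self, filename: str) -> bool:
        return True  # generic fallback

    def convert(self, text: str) -> str:
        return text


class ShoutConverter(DocumentConverter):
    def accepts(self, filename: str) -> bool:
        return filename.endswith(".shout")

    def convert(self, text: str) -> str:
        return text.upper()


@dataclass
class Registration:
    converter: DocumentConverter
    priority: float  # owned by the registry, not by the converter


class Registry:
    def __init__(self) -> None:
        self._registrations: List[Registration] = []

    def register(self, converter: DocumentConverter, priority: float = 0.0) -> None:
        self._registrations.append(Registration(converter, priority))

    def convert(self, filename: str, text: str) -> Optional[str]:
        # Assumption: lower priority values are tried first.
        for reg in sorted(self._registrations, key=lambda r: r.priority):
            if reg.converter.accepts(filename):
                return reg.converter.convert(text)
        return None


if __name__ == "__main__":
    registry = Registry()
    registry.register(PlainTextConverter(), priority=10.0)
    registry.register(ShoutConverter(), priority=0.0)
    print(registry.convert("note.shout", "hello"))
```

Because the priority is supplied at registration, the same converter class can be reprioritized by registering it again with a different value, without touching the converter itself.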
---------
Co-authored-by: Kenny Zhang <kzhang678@gmail.com>