* refactor: remove unused imports
* fix: replace NotImplemented with NotImplementedError
* refactor: resolve E722 (do not use bare 'except')
* refactor: remove unused variable
* refactor: remove unused imports
* refactor: ignore unused imports that will be used in the future
* refactor: resolve W293 (blank line contains whitespace)
* refactor: resolve F541 (f-string is missing placeholders)
---------
Co-authored-by: afourney <adamfo@microsoft.com>
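For context on two of the lint fixes above, here is a minimal sketch of why `NotImplementedError` and a named exception class are preferred. `BaseConverter` and `safe_convert` are hypothetical names used only for illustration; they are not taken from the project.

```python
from typing import Optional


class BaseConverter:
    """Hypothetical base class, used only to illustrate the fix."""

    def convert(self, path: str) -> str:
        # NotImplemented is a sentinel value meant for binary operator methods,
        # not an exception. `raise NotImplemented` fails with a TypeError,
        # so the exception class NotImplementedError is raised instead.
        raise NotImplementedError("Subclasses must implement convert()")


def safe_convert(converter: BaseConverter, path: str) -> Optional[str]:
    try:
        return converter.convert(path)
    except Exception:
        # E722: a bare `except:` would also swallow KeyboardInterrupt and
        # SystemExit, so a specific exception class is named here.
        return None
```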
* feat: Add CSV to Markdown table converter
- Add new CsvConverter class to convert CSV files to Markdown tables
- Support text/csv and application/csv MIME types
- Preserve table structure with headers and data rows
- Handle edge cases like empty cells and mismatched columns
- Fix Azure Document Intelligence dependency handling
- Register CsvConverter in MarkItDown class
----
Thanks also to @benny123tw who submitted a very similar PR in #1171
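The CSV conversion described in this commit can be sketched roughly as follows. This is not the actual `CsvConverter` code. `csv_to_markdown` is a hypothetical standalone function, and padding short rows and truncating long ones to the header width is one assumed way to handle mismatched columns.

```python
import csv
import io


def csv_to_markdown(text: str) -> str:
    """Convert CSV text to a Markdown table (illustrative sketch only)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ""

    header = rows[0]
    width = len(header)
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join(["---"] * width) + " |",
    ]
    for row in rows[1:]:
        # Assumption: pad short rows with empty cells and drop extra cells
        # so that every data row matches the header width.
        cells = (row + [""] * width)[:width]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

The sketch does not escape `|` characters inside cells; a fuller version would need to do so to keep the table well-formed.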
* Make it easier to use AzureKeyCredentials with Azure Doc Intelligence
* Fixed mypy type error.
* Added more fine-grained options over types.
* Pass doc intel options further up the stack.
* Refactored tests.
* Fixed CI errors, and included misc tests.
* Omit mskanji from streaminfo test.
* Omit mskanji from no hints test.
* Log results of debugging in comments (linked to Magika issue)
* Added docs as to when to use misc tests.
* refactor(docker): remove unnecessary root user
The USER root directive isn't needed directly after FROM
Signed-off-by: Sebastian Yaghoubi <sebastianyaghoubi@gmail.com>
* fix(docker): use generic nobody nogroup default instead of uid gid
Signed-off-by: Sebastian Yaghoubi <sebastianyaghoubi@gmail.com>
* fix(docker): build app from source locally instead of installing package
Signed-off-by: Sebastian Yaghoubi <sebastianyaghoubi@gmail.com>
* fix(docker): use correct files in dockerignore
Signed-off-by: Sebastian Yaghoubi <sebastianyaghoubi@gmail.com>
* chore(docker): don't install recommended packages with git
Signed-off-by: Sebastian Yaghoubi <sebastianyaghoubi@gmail.com>
* fix(docker): run apt as non-interactive
Signed-off-by: Sebastian Yaghoubi <sebastianyaghoubi@gmail.com>
* Update Dockerfile to new package structure, and fix streaming bugs.
---------
Signed-off-by: Sebastian Yaghoubi <sebastianyaghoubi@gmail.com>
Co-authored-by: afourney <adamfo@microsoft.com>
* Updated DocumentConverter interface
* Updated all DocumentConverter classes
* Added support for various new audio files.
* Updated sample plugin to new DocumentConverter interface.
* Updated project README with notes about changes, and use-cases.
* Updated DocumentConverter documentation.
* Move priority outside of DocumentConverter, allowing converters to be reprioritized while keeping the DocumentConverter interface simple.
---------
Co-authored-by: Kenny Zhang <kzhang678@gmail.com>
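To illustrate keeping priority outside the converter, here is a minimal sketch in which the registry, not the converter, owns the priority value. `Registration`, `Registry`, `register`, and `ordered` are hypothetical names, and the convention that lower values are tried first is an assumption of this sketch rather than a statement about the project's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Converter = Callable[[str], Optional[str]]


@dataclass
class Registration:
    converter: Converter
    priority: float  # stored next to the converter, not inside it


class Registry:
    def __init__(self) -> None:
        self._registrations: List[Registration] = []

    def register(self, converter: Converter, priority: float = 0.0) -> None:
        # Insert at the front so that, among equal priorities, the most
        # recently registered converter is tried first.
        self._registrations.insert(0, Registration(converter, priority))

    def ordered(self) -> List[Converter]:
        # sorted() is stable, so insertion order breaks ties.
        # Assumption: a lower priority value means "try earlier".
        ranked = sorted(self._registrations, key=lambda r: r.priority)
        return [r.converter for r in ranked]
```

Because the converter carries no priority of its own, the same converter can be registered again with a different value without changing its code.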
Initialize `res` at the beginning of `_convert`. Previously, if the first converter raised an exception, `res` was never assigned, and the subsequent `if res is not None` check failed with an error.
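The failure mode and the fix can be sketched as follows. `convert_with_fallback` is a hypothetical stand-in for `_convert`, and the loop structure is assumed from the description above rather than copied from the project.

```python
from typing import Callable, Iterable, List, Optional


def convert_with_fallback(
    converters: Iterable[Callable[[str], Optional[str]]], source: str
) -> str:
    res: Optional[str] = None  # initialized up front so the check below is always safe
    errors: List[Exception] = []

    for converter in converters:
        try:
            res = converter(source)
        except Exception as exc:
            errors.append(exc)

        # Without the initialization above, this line raises UnboundLocalError
        # when the very first converter fails before `res` is ever assigned.
        if res is not None:
            return res

    raise RuntimeError(f"No converter succeeded ({len(errors)} raised an exception)")
```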
* Work started moving converters to individual files.
* Significant cleanup and refactor.
* Moved everything to a packages subfolder.
* Added sample plugin.
* Added instructions to the README.md
* Bumped version, and added a note about compatibility.