Sets up Dependabot to automatically check for updates to
GitHub Actions on a weekly basis, ensuring that the project
remains up to date with the latest dependencies and security
fixes.
Co-authored-by: gagb <gagb@users.noreply.github.com>
* Add support for Path objects in MarkItDown conversion methods
* Remove unnecessary blank line in test_markitdown_exiftool function
* Remove pathlib path from test file
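A minimal sketch of how a conversion entry point can accept `Path` objects, assuming the argument is normalized to a string before the existing string-based logic runs. The helper name `normalize_source` is hypothetical and is not taken from MarkItDown.

```python
import os
from pathlib import Path
from typing import Union


def normalize_source(source: Union[str, "os.PathLike[str]"]) -> str:
    """Return `source` as a plain string (hypothetical helper)."""
    # os.fsdecode accepts str, bytes, and os.PathLike objects such as
    # pathlib.Path, always returns str, and raises TypeError otherwise.
    return os.fsdecode(source)


# Both call styles produce the same string for downstream code.
print(normalize_source("docs/report.pdf"))
print(normalize_source(Path("docs") / "report.pdf"))
```

Normalizing once at the boundary keeps the rest of the code unaware of which type the caller passed.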
---------
Co-authored-by: afourney <adamfo@microsoft.com>
Co-authored-by: gagb <gagb@users.noreply.github.com>
* refactor(tests): simplify string validation in tests
Introduce a helper function `validate_strings` to streamline the
validation of expected and excluded strings in test cases. Replace
repetitive string assertions in the `test_markitdown_local` function
with calls to this new helper, improving code readability and
maintainability.
* run pre-commit
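A minimal sketch of what a `validate_strings` helper could look like. The signature and parameter names are assumptions made for illustration; the commit message only states that the helper checks expected and excluded strings.

```python
from typing import Iterable, Optional


def validate_strings(
    text: str,
    expected: Iterable[str],
    excluded: Optional[Iterable[str]] = None,
) -> None:
    """Assert every expected string is in `text` and no excluded one is."""
    for s in expected:
        assert s in text, f"missing expected string: {s!r}"
    for s in excluded or ():
        assert s not in text, f"found excluded string: {s!r}"


# Usage in a test: one call replaces a run of repeated assertions.
validate_strings("# Title\nBody text", expected=["Title", "Body"], excluded=["<html>"])
```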
---------
Co-authored-by: lumin <71011125+l-melon@users.noreply.github.com>
Co-authored-by: gagb <gagb@users.noreply.github.com>
Use `textwrap.dedent()` so the CLI help text in `__main__.py` can be indented along with the surrounding code. The indentation improves readability, and `textwrap.dedent()` strips the common leading whitespace so the displayed help output is unchanged.
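A minimal sketch of the pattern. The function name `usage_text` and the help text itself are placeholders, not the contents of `__main__.py`.

```python
import textwrap


def usage_text() -> str:
    """Return help text written at the code's indentation level."""
    # The backslash after the opening quotes suppresses the leading
    # newline; dedent() then removes the whitespace common to all lines.
    return textwrap.dedent(
        """\
        SYNTAX:
            tool <filename>

        EXAMPLE:
            tool example.pdf
        """
    )


print(usage_text())
```

Relative indentation inside the string (the four spaces before `tool`) is preserved, because only the whitespace shared by every line is removed.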
fix: prevent path traversal vulnerabilities in ZipConverter
Added a check for path traversal vulnerabilities in the ZipConverter class.
Extracted file paths are now validated with `os.path.commonprefix` to ensure
all files remain within the intended extraction directory, and a `ValueError`
is raised if a path traversal attempt is detected.
- Normalized file paths using `os.path.normpath`.
- Added specific exception handling for `zipfile.BadZipFile` and traversal errors.
- Ensured cleanup of extracted files after processing when `cleanup_extracted` is enabled.
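A minimal sketch of a containment check of this kind. The function name `safe_extract_path` is hypothetical, and the sketch deliberately swaps in `os.path.commonpath` for `os.path.commonprefix`: `commonprefix` compares strings character by character, so it would treat a sibling directory such as `out_evil` as sharing the prefix `out`.

```python
import os


def safe_extract_path(extract_dir: str, member_name: str) -> str:
    """Return the normalized target path, or raise ValueError if it
    would fall outside `extract_dir` (hypothetical helper)."""
    base = os.path.abspath(extract_dir)
    # normpath collapses ".." segments before the containment check.
    target = os.path.normpath(os.path.join(base, member_name))
    # commonpath compares whole path components, not characters.
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"Path traversal detected: {member_name!r}")
    return target


print(safe_extract_path("out", "docs/readme.txt"))
try:
    safe_extract_path("out", "../secret.txt")
except ValueError as exc:
    print(exc)
```

The check runs on the archive member name before anything is written, so a rejected entry never touches the filesystem.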