Commit Graph

188 Commits

Author SHA1 Message Date
AbSadiki
4678c8a2a4 fix(transcription): IS_AUDIO_TRANSCRIPTION_CAPABLE should be initialized (#194) 2025-01-03 13:29:26 -08:00
Ikko Eltociear Ashimine
125e206047 docs: update README.md (#182)
faciliate -> facilitate
2024-12-21 01:51:30 -08:00
numekudi
f94d09990e feat: enable Git support in devcontainer (#136)
Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-20 18:09:17 -08:00
lumin
cfd2319c14 feat: add version option to markitdown CLI (#172)
Add a `--version` option to the markitdown command-line interface 
that displays the current version number.
2024-12-20 16:24:45 -08:00
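A minimal sketch of how a `--version` flag like the one described in cfd2319c14 can be wired up with `argparse`. The program name `example-tool`, the `build_parser` helper, and the `__version__` value are placeholders for illustration, not the project's actual code.

```python
import argparse

# Placeholder version string (assumption); a real package would expose its own.
__version__ = "0.0.0"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose --version flag prints the version and exits."""
    parser = argparse.ArgumentParser(prog="example-tool")
    # argparse's built-in "version" action prints the string and exits with status 0.
    parser.add_argument(
        "--version",
        action="version",
        version=f"%(prog)s {__version__}",
    )
    return parser
```

Calling `build_parser().parse_args(["--version"])` prints the program name and version, then raises `SystemExit`.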
dependabot[bot]
73161982ff Bump actions/setup-python from 2 to 5 (#179)
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 2 to 5.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](https://github.com/actions/setup-python/compare/v2...v5)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: afourney <adamfo@microsoft.com>
2024-12-20 16:20:22 -08:00
dependabot[bot]
9b69467772 Bump actions/cache from 3 to 4 (#178)
Bumps [actions/cache](https://github.com/actions/cache) from 3 to 4.
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](https://github.com/actions/cache/compare/v3...v4)

---
updated-dependencies:
- dependency-name: actions/cache
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: gagb <gagb@users.noreply.github.com>
Co-authored-by: afourney <adamfo@microsoft.com>
2024-12-20 16:17:43 -08:00
gagb
857a2d160d Update README.md (#180) 2024-12-20 14:49:20 -08:00
Soulter
1123392306 fix: support -o param to avoid encoding issues (#116)
* perf: cli supports -o param

* doc: update README

---------

Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-20 14:43:00 -08:00
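One plausible reading of 1123392306: when output is redirected with `>`, the encoding is decided by the console or locale settings, while writing the file from Python lets the tool pick the encoding itself. The sketch below assumes UTF-8 and uses hypothetical names (`build_parser`, `write_output`, `example-tool`); it is not the project's implementation.

```python
import argparse
import sys
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="example-tool")
    parser.add_argument("filename")
    # Hypothetical help text; only the -o flag itself comes from the commit message.
    parser.add_argument("-o", "--output", help="write the result to this file")
    return parser


def write_output(text: str, output_path: Optional[str] = None) -> None:
    """Write text to output_path as UTF-8, or to stdout when no path is given."""
    if output_path:
        # An explicit encoding avoids depending on the platform default.
        with open(output_path, "w", encoding="utf-8") as handle:
            handle.write(text)
    else:
        sys.stdout.write(text)
```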
dependabot[bot]
377a7eaa7d Bump actions/checkout from 2 to 4 (#177)
Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 4.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v2...v4)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-20 14:36:48 -08:00
lumin
c1a0d3deaf chore: configure Dependabot for GitHub Actions updates (#112)
Sets up Dependabot to automatically check for updates to 
GitHub Actions on a weekly basis, ensuring that the project 
remains up-to-date with the latest dependencies and security 
fixes.

Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-20 14:28:55 -08:00
SigireddyBalasai
5276616ba1 Added support to use Pathlib (#93)
* Add support for Path objects in MarkItDown conversion methods

* Remove unnecessary blank line in test_markitdown_exiftool function

* Remove unnecessary blank line in test_markitdown_exiftool function

* remove pathlib path in test file

---------

Co-authored-by: afourney <adamfo@microsoft.com>
Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-20 14:12:48 -08:00
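A small sketch of what accepting `Path` objects can look like: normalize the argument with `os.fspath` before handing it to string-based code. The function name `to_local_path` is hypothetical, and the actual change in 5276616ba1 may be structured differently.

```python
import os
from pathlib import Path
from typing import Union


def to_local_path(path: Union[str, Path]) -> str:
    """Return a plain string path for either a str or a pathlib.Path."""
    # os.fspath returns str unchanged and calls __fspath__ on Path objects.
    return os.fspath(path)
```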
gagb
7e6c36c5d4 docs: add contribution guidelines to README (#176) 2024-12-20 14:08:58 -08:00
lumin
52d73080c7 refactor(tests): add helper function for tests (#87)
* refactor(tests): simplify string validation in tests

Introduce a helper function `validate_strings` to streamline the 
validation of expected and excluded strings in test cases. Replace 
repetitive string assertions in the `test_markitdown_local` function 
with calls to this new helper, improving code readability and 
maintainability.

* run pre-commit

---------

Co-authored-by: lumin <71011125+l-melon@users.noreply.github.com>
Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-20 11:42:32 -08:00
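A sketch of a helper along the lines described in 52d73080c7. The name `validate_strings` comes from the commit message; the signature and parameter names are assumptions.

```python
from typing import Iterable, Optional


def validate_strings(
    text: str,
    expected: Iterable[str],
    excluded: Optional[Iterable[str]] = None,
) -> None:
    """Assert that every expected string occurs in text and no excluded one does."""
    for item in expected:
        assert item in text, f"expected string not found: {item!r}"
    for item in excluded or ():
        assert item not in text, f"excluded string found: {item!r}"
```

A test can then replace a run of individual assertions with one call, for example `validate_strings(result_text, ["Title"], ["secret"])`, where `result_text` is whatever text the test produced.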
afourney
e6421000e3 Merge pull request #160 from sugatoray/support_type_hinting
Add support for type-hinting (PEP-561)
2024-12-20 10:54:43 -08:00
Sugato Ray
08a25345e3 [feat]: add support for type-hinting for PEP-561 2024-12-20 02:37:10 +00:00
Sugato Ray
8921fe7304 ignore .vscode folder
- avoid local developer vscode editor settings
2024-12-20 02:18:14 +00:00
Sugato Ray
613825d5b3 [feat]: add support for type-hinting for PEP-561 2024-12-20 02:12:24 +00:00
gagb
18e3f1d428 Merge pull request #91 from PetrAPConsulting/patch-1
Update README.md
2024-12-19 14:02:47 -08:00
gagb
c295dee5e4 Merge branch 'main' into patch-1 2024-12-19 13:22:51 -08:00
gagb
dd87dd5e36 Merge pull request #156 from microsoft/afourney-patch-1
Added holiday notice.
2024-12-19 11:18:24 -08:00
afourney
535147b2e8 Added holiday notice.
Added holiday notice.
2024-12-19 11:11:54 -08:00
gagb
5c776bda70 Update README.md 2024-12-19 10:30:53 -08:00
gagb
423a01844a Merge branch 'main' into patch-1 2024-12-19 10:30:10 -08:00
gagb
7147bef7d5 Merge pull request #130 from sugatoray/update_commandline_help
Update CLI helpdoc formatting to allow indentation in code
2024-12-19 10:20:23 -08:00
Sugato Ray
a5f39d6922 Merge branch 'main' into update_commandline_help 2024-12-19 07:58:48 -05:00
gagb
925c4499f7 Merge pull request #121 from l-lumin/add-project-description 2024-12-19 00:53:54 -08:00
Petr@AP Consulting
b28f380a47 Update README.md
Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-19 09:23:15 +01:00
lumin
c86287b7e3 feat: add project description in pyproject.toml 2024-12-19 13:02:47 +09:00
Sugato Ray
6f3c762526 Merge branch 'main' into update_commandline_help 2024-12-18 17:50:07 -05:00
gagb
cb66b35f11 Merge pull request #132 from microsoft/gagb-patch-1
Add downloads badge
2024-12-18 14:30:09 -08:00
gagb
a2743a5314 Add downloads badge 2024-12-18 14:26:36 -08:00
Sugato Ray
277480066a Merge branch 'update_commandline_help' of https://github.com/sugatoray/markitdown into update_commandline_help 2024-12-18 21:53:54 +00:00
gagb
6e1b9a7402 Run precommit 2024-12-18 13:46:10 -08:00
Sugato Ray
1384e80725 update .gitignore to exclude .vscode folder 2024-12-18 21:46:06 +00:00
Sugato Ray
356e895306 update formatting with pre-commit 2024-12-18 21:45:23 +00:00
Petr@AP Consulting
f6e75c46d4 Update README.md
I changed the command for running the script from the Mac version (python3) to the Windows version (python)
2024-12-18 21:17:47 +01:00
afourney
8bc1bee18b Merge pull request #129 from finchy/main
Safeguard against path traversal for ZipConverter
2024-12-18 12:11:00 -08:00
Petr@AP Consulting
f4471d96e2 Update README.md
Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-18 21:08:10 +01:00
Petr@AP Consulting
088007338d Update README.md
Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-18 21:07:55 +01:00
Petr@AP Consulting
bb929629f3 Update README.md
Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-18 21:05:36 +01:00
Petr@AP Consulting
233ba679b8 Update README.md
Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-18 21:05:04 +01:00
gagb
46b7f043d3 Merge branch 'main' into patch-1 2024-12-18 11:57:57 -08:00
gagb
5fc70864f2 Run pre-commit 2024-12-18 11:46:39 -08:00
Sugato Ray
39410d01df Update CLI helpdoc formatting to allow indentation in code
Use `textwrap.dedent()` to allow indented cli-helpdoc in `__main__.py` file. The indentation increases readability, while `textwrap.dedent` helps maintain the same functionality without breaking code.
2024-12-18 14:22:58 -05:00
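The idea in 39410d01df can be sketched as follows: the help text is indented to match the surrounding code, and `textwrap.dedent()` strips the common leading whitespace before `argparse` sees it. The program name, the help text, and the `build_parser` helper are illustrative assumptions, not the contents of `__main__.py`.

```python
import argparse
import textwrap


def build_parser() -> argparse.ArgumentParser:
    return argparse.ArgumentParser(
        prog="example-tool",
        # Keep line breaks in the description instead of re-wrapping it.
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description=textwrap.dedent(
            """\
            Convert a file to Markdown.

            examples:
              example-tool input.pdf
              example-tool input.pdf > output.md
            """
        ),
    )
```

Only the indentation shared by all lines is removed, so the two example lines stay indented relative to the `examples:` heading.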
Joel Esler
6e4caac70d Safeguard against path traversal for ZipConverter
fix: prevent path traversal vulnerabilities in ZipConverter

Added a secure check for path traversal vulnerabilities in the ZipConverter class.
Now validates extracted file paths using `os.path.commonprefix` to ensure all files
remain within the intended extraction directory. Raises a `ValueError` if a
path traversal attempt is detected.

- Normalized file paths using `os.path.normpath`.
- Added specific exception handling for `zipfile.BadZipFile` and traversal errors.
- Ensured cleanup of extracted files after processing when `cleanup_extracted` is enabled.
2024-12-18 13:12:55 -05:00
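A minimal sketch of the kind of check 6e4caac70d describes, using `os.path.normpath` and `os.path.commonprefix` and raising `ValueError` on a traversal attempt. The function names and the example directory are hypothetical and this is not the ZipConverter code itself. Because `commonprefix` compares strings character by character, the sketch appends a path separator to both sides so that a sibling directory such as `extract_evil` is not mistaken for `extract`.

```python
import os
from typing import Iterable


def is_within_directory(base_dir: str, member_name: str) -> bool:
    """Return True if member_name resolves to a location inside base_dir."""
    base = os.path.normpath(os.path.abspath(base_dir))
    target = os.path.normpath(os.path.join(base, member_name))
    # A trailing separator turns the character-wise prefix test into a
    # directory-wise one.
    base_with_sep = base + os.sep
    return os.path.commonprefix([base_with_sep, target + os.sep]) == base_with_sep


def check_members(base_dir: str, member_names: Iterable[str]) -> None:
    """Raise ValueError if any archive member would escape base_dir."""
    for name in member_names:
        if not is_within_directory(base_dir, name):
            raise ValueError(f"Path traversal detected: {name}")
```

In practice the member names would come from something like `zipfile.ZipFile.namelist()`, checked before extraction. `os.path.commonpath` is an alternative that compares whole path components.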
Petr@AP Consulting
224f1df0fc Update README.md
I collapsed the section about batch processing, as suggested
2024-12-18 09:28:18 +01:00
gagb
1deaba1c6c Merge pull request #98 from waterimp/feature/fix-code-comments
fix incorrect comments for "bail if not ..." for WAV and image cases.
2024-12-17 17:57:25 -08:00
gagb
09cb048cbe Merge branch 'main' into feature/fix-code-comments 2024-12-17 17:34:53 -08:00
gagb
b029ae1cd4 Merge pull request #108 from microsoft/gagb-readme
Simplify README
2024-12-17 17:30:49 -08:00
gagb
524aa0da75 Update README.md 2024-12-17 17:25:40 -08:00