Commit Graph

66 Commits

Author SHA1 Message Date
一I
38261fd31c Update Python version requirement and add .cursorrules to .gitignore (#1249)
* update markdown
* Update and install Python version suggestions
* Update README with prerequisites.
---------

Co-authored-by: Lucas Liu <lucas@LucasdeMacBook-Pro.local>
Co-authored-by: afourney <adamfo@microsoft.com>
2025-05-21 10:47:29 -07:00
lentil32
ebe2684b3d chore: fix typo in README.md (#1175)
* chore: fix typo in README.md
2025-04-13 09:29:16 -07:00
afourney
9a951055f0 Update readme to point to the mcp package. (#1158)
* Updated readme with link to the MCP package.
2025-03-25 15:00:04 -07:00
afourney
2ffe6ea591 Bump version. (#1150) 2025-03-22 11:21:32 -07:00
afourney
a93e0567e6 EPub Support. Adapted #123 to not use epublib. (#1131)
* Adapted #123 to not use epublib.
* Updated README.md
2025-03-17 07:48:15 -07:00
Richard Ye
0229ff6cb7 feat: sort pptx shapes to be parsed in top-to-bottom, left-to-right order (#1104)
* Sort PPTX shapes to be read in top-to-bottom, left-to-right order

Referenced from 39bef65b31/pptx2md/parser.py (L249)

* Update README.md
* Fixed formatting.
* Added missing import
2025-03-07 15:45:14 -08:00
Andrea Pietrobon
80baa5db18 fix(README): correct pip install command formatting (#1090)
Added missing quotes around `markitdown[all]` in the installation command  
to ensure proper package resolution by pip.
2025-03-05 23:21:10 -08:00
Adam Fourney
00a65e8f8b Fixed version in README. 2025-03-05 23:10:21 -08:00
afourney
e921497f79 Update converter API, use streams rather than file paths (#1088)
* Updated DocumentConverter interface
* Updated all DocumentConverter classes
* Added support for various new audio files.
* Updated sample plugin to new DocumentConverter interface.
* Updated project README with notes about changes, and use-cases.
* Updated DocumentConverter documentation.
* Move priority to outside DocumentConverter, allowing them to be reprioritized, and keeping the DocumentConverter interface simple.

---------

Co-authored-by: Kenny Zhang <kzhang678@gmail.com>
2025-03-05 21:16:55 -08:00
afourney
c5cd659f63 Exploring ways to allow Optional dependencies (#1079)
* Enable optional dependencies. Starting with pptx.
* Fix CLI tests... have them install [all]
* Added .docx to optional dependencies
* Reuse error messages for missing dependencies.
* Added xlsx and xls
* Added pdfs
* Added Ole files.
* Updated READMEs, and finished remaining feature-categories.
* Move OpenAI to hatch-test environment.
2025-03-03 09:06:19 -08:00
Nima Akbarzadeh
a394cc7c27 fix: Implement retry logic for YouTube transcript fetching and fix URL decoding issue (#1035)
* fix: add error handling, refactor _findKey to use json.items()

* fix: improve metadata and description extraction logic

* fix: improve YouTube transcript extraction reliability

* fix: implement retry logic for YouTube transcript fetching and fix URL decoding issue

* fix(readme): add YouTube URLs to the list of inputs markitdown supports
2025-02-27 23:17:54 -08:00
afourney
c73afcffea Cleanup and refactor, in preparation for plugin support. (#318)
* Work started moving converters to individual files.
* Significant cleanup and refactor.
* Moved everything to a packages subfolder.
* Added sample plugin.
* Added instructions to the README.md
* Bumped version, and added a note about compatibility.
2025-02-10 15:21:44 -08:00
KennyZhang1
bf6a15e9b5 Kennyzhang/docintel docs (#312)
* updated docs to include doc intelligence

* include reference to doc intel setup docs
2025-01-31 22:23:26 -08:00
afourney
265aea2edf Removed the holiday away message from README.md (#266) 2025-01-06 09:06:21 -08:00
Ikko Eltociear Ashimine
125e206047 docs: update README.md (#182)
faciliate -> facilitate
2024-12-21 01:51:30 -08:00
gagb
857a2d160d Update README.md (#180) 2024-12-20 14:49:20 -08:00
Soulter
1123392306 fix: support -o param to avoid encoding issues (#116)
* perf: cli supports -o param

* doc: update README

---------

Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-20 14:43:00 -08:00
gagb
7e6c36c5d4 docs: add contribution guidelines to README (#176) 2024-12-20 14:08:58 -08:00
gagb
c295dee5e4 Merge branch 'main' into patch-1 2024-12-19 13:22:51 -08:00
afourney
535147b2e8 Added holiday notice.
2024-12-19 11:11:54 -08:00
gagb
5c776bda70 Update README.md 2024-12-19 10:30:53 -08:00
gagb
423a01844a Merge branch 'main' into patch-1 2024-12-19 10:30:10 -08:00
Petr@AP Consulting
b28f380a47 Update README.md
Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-19 09:23:15 +01:00
gagb
a2743a5314 Add downloads badge 2024-12-18 14:26:36 -08:00
Petr@AP Consulting
f6e75c46d4 Update README.md
I changed the command for running the script from the Mac version (python3) to the Windows version (python).
2024-12-18 21:17:47 +01:00
Petr@AP Consulting
f4471d96e2 Update README.md
Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-18 21:08:10 +01:00
Petr@AP Consulting
088007338d Update README.md
Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-18 21:07:55 +01:00
Petr@AP Consulting
bb929629f3 Update README.md
Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-18 21:05:36 +01:00
Petr@AP Consulting
233ba679b8 Update README.md
Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-18 21:05:04 +01:00
gagb
46b7f043d3 Merge branch 'main' into patch-1 2024-12-18 11:57:57 -08:00
Petr@AP Consulting
224f1df0fc Update README.md
I collapsed the section about batch processing, as suggested.
2024-12-18 09:28:18 +01:00
gagb
524aa0da75 Update README.md 2024-12-17 17:25:40 -08:00
gagb
de1b54d79f Update README.md 2024-12-17 17:25:13 -08:00
gagb
1e7806a7ac Simplify 2024-12-17 17:21:39 -08:00
gagb
3bcf2bdae7 Update README.md 2024-12-17 16:54:17 -08:00
gagb
f1e399eee4 Merge branch 'main' into add-devcontainer-config 2024-12-17 16:50:32 -08:00
Petr@AP Consulting
f398f3d443 Update README.md
I added a description and a script for batch processing of files.
2024-12-17 10:26:09 +01:00
lumin
e0a30295ff docs: update README with Devcontainer instructions
Add instructions for using Dev to run tests. Remove the install script, as it is no longer needed.
Update the trademark section for clarity.
2024-12-17 17:04:31 +09:00
diya155
14bd8d319a Update README.md 2024-12-17 09:16:40 +05:30
gagb
24b52b2b8f Improve readme 2024-12-16 17:35:47 -08:00
gagb
09159aa04e Merge branch 'main' into main 2024-12-16 17:24:47 -08:00
gagb
736e7d9a7e Merge branch 'main' into patch-1 2024-12-16 16:53:58 -08:00
gagb
360c2dd95f Merge branch 'main' into main 2024-12-16 16:35:50 -08:00
gagb
ae4669107c Merge branch 'main' into main 2024-12-16 16:01:59 -08:00
afourney
fa1f496d51 Merge branch 'main' into patch-1 2024-12-16 14:18:20 -08:00
gagb
9e6a19987b Merge branch 'main' into main 2024-12-16 13:51:39 -08:00
gagb
e7d9b5546a Merge branch 'main' into patch-2 2024-12-16 13:42:28 -08:00
CharlesCNorton
3d9f3f3e5b Fix LLM terms
Updated all instances of mlm_client and mlm_model to llm_client and llm_model in the readme. The previous terms (mlm_client and mlm_model) are incorrect in the context of configuring Large Language Models (LLMs), as "MLM" typically refers to Masked Language Models, which is unrelated to the intended functionality. This change aligns the documentation with standard naming conventions for LLM configuration parameters and improves clarity for users integrating with LLMs like OpenAI's GPT models.
2024-12-16 16:23:03 -05:00
CyberNobie
010f841008 Ensure hatch is installed before running tests 2024-12-16 18:47:24 +05:30
Michele Adduci
013b022427 Added Docker Image for using markitdown in a sandboxed environment 2024-12-16 13:08:15 +01:00