Commit Graph

266 Commits

Author SHA1 Message Date
gagb
ed91e8b534 Merge pull request #19 from brc-dd/fix/18
Fix character decoding issues with text-like files
2024-12-16 13:49:48 -08:00
gagb
aeff2cb5ae Merge branch 'main' into fix/18 2024-12-16 13:46:17 -08:00
gagb
c9c7d98d30 Merge pull request #11 from simonw/patch-2
CLI usage instructions
2024-12-16 13:45:05 -08:00
gagb
e7d9b5546a Merge branch 'main' into patch-2 2024-12-16 13:42:28 -08:00
CharlesCNorton
ed651aeb16 Fix LLM terminology in code
Replaced all occurrences of mlm_client and mlm_model with llm_client and llm_model for consistent terminology when referencing Large Language Models (LLMs).
2024-12-16 16:23:52 -05:00
CharlesCNorton
3d9f3f3e5b Fix LLM terms
Updated all instances of mlm_client and mlm_model to llm_client and llm_model in the readme. The previous terms (mlm_client and mlm_model) are incorrect in the context of configuring Large Language Models (LLMs), as "MLM" typically refers to Masked Language Models, which are unrelated to the intended functionality. This change aligns the documentation with standard naming conventions for LLM configuration parameters and improves clarity for users integrating with LLMs like OpenAI's GPT models.
2024-12-16 16:23:03 -05:00
Om Gupta
a3208f2bd0 feat: Add IpynbConverter
- Implemented IpynbConverter class for converting Jupyter Notebook (.ipynb) files into Markdown format.
- Supports markdown cells, code cells and raw cells.
- First markdown heading is used as the title if no title is found in notebook metadata.
- Created a test notebook (`test_notebook.ipynb`) to verify the functionality of the converter.
2024-12-17 01:00:41 +05:30
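Note on a3208f2bd0: the behaviour described in the message above (markdown, code, and raw cells; title taken from metadata or else from the first markdown heading) can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from the commit. The function name `ipynb_to_markdown`, the `metadata.title` lookup, and the choice to treat only a level-1 `# ` heading as the fallback title are all assumptions made for the example.

```python
import json
from typing import Optional, Tuple

FENCE = "`" * 3  # built programmatically so this example nests cleanly in Markdown


def ipynb_to_markdown(notebook_json: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper: return (title, markdown) for a notebook JSON string."""
    nb = json.loads(notebook_json)
    # Assumption: a title, if present, lives under metadata["title"].
    title = nb.get("metadata", {}).get("title")
    parts = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # The notebook format allows "source" to be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        kind = cell.get("cell_type")
        if kind == "markdown":
            parts.append(source)
            if title is None:
                # Fall back to the first level-1 heading found in a markdown cell.
                for line in source.splitlines():
                    if line.startswith("# "):
                        title = line[2:].strip()
                        break
        elif kind == "code":
            parts.append(f"{FENCE}python\n{source}\n{FENCE}")
        elif kind == "raw":
            parts.append(f"{FENCE}\n{source}\n{FENCE}")
    return title, "\n\n".join(parts)
```

Calling `ipynb_to_markdown(open("test_notebook.ipynb").read())` would then give a title and a Markdown string; cell outputs are deliberately ignored in this sketch.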
Divit
ad01da308d fix issue #65 2024-12-16 21:48:33 +05:30
CyberNobie
010f841008 Ensure hatch is installed before running tests 2024-12-16 18:47:24 +05:30
Michele Adduci
5fc03b6415 Added UID as argument 2024-12-16 13:11:13 +01:00
Michele Adduci
013b022427 Added Docker Image for using markitdown in a sandboxed environment 2024-12-16 13:08:15 +01:00
narumi
695100d5d8 Support specifying YouTube transcript language 2024-12-16 13:16:00 +08:00
Soulter
d66ef5fcca Update README to introduce the customized mlm_prompt 2024-12-16 12:08:51 +08:00
Soulter
c168703d5e Pass the kwargs to _convert method when converting an url file 2024-12-16 11:41:39 +08:00
Yeonjun
3548c96dd3 Create .gitattributes
Mark test files as linguist-vendored
2024-12-16 09:21:07 +09:00
SH4DOW4RE
1559d9d163 pre-commit ran 2024-12-15 22:15:20 +01:00
SH4DOW4RE
b7f5662ffd PR: Catching pydub's warning of ffmpeg or avconv missing 2024-12-15 17:29:14 +01:00
Ville Puuska
0a7203b876 add style_map prop to MarkItDown class 2024-12-15 17:23:57 +02:00
Ville Puuska
0704b0b6ff pass 'style_map' kwarg to mammoth when converting docx 2024-12-15 16:59:21 +02:00
sakasegawa
0dd4e95584 Remove _is_chart 2024-12-15 21:14:58 +09:00
sakasegawa
93130b5ba5 Add PPTX chart support 2024-12-15 20:42:55 +09:00
Divyansh Singh
52b723724c Fix character decoding issues with text-like files 2024-12-15 10:37:59 +05:30
Josh XT
a55c3d525c Merge branch 'main' into main 2024-12-14 23:09:30 -05:00
gagb
81e3f24acd Merge pull request #29 from microsoft/gagb-patch-1
Update README.md
2024-12-14 19:17:54 -08:00
gagb
b84294620a Update README.md 2024-12-14 19:05:51 -08:00
gagb
60c495d609 Merge branch 'main' into patch-2 2024-12-14 18:57:11 -08:00
gagb
71123a4df3 Merge pull request #7 from microsoft/gagb/improve-readme
Improve the readme with contributing guidelines
2024-12-14 18:54:28 -08:00
gagb
5753e553fe Fix conflicts 2024-12-14 18:47:34 -08:00
gagb
752dd897b9 Merge pull request #28 from pawarbi/main
Update README.md
2024-12-14 18:44:52 -08:00
gagb
1aa4abe90f Merge branch 'gagb/improve-readme' into main 2024-12-14 18:44:33 -08:00
gagb
ea7c6dcc40 Merge pull request #27 from haesleinhuepf/patch-1
Add installation instructions from haesleinhuepf:patch-1
2024-12-14 18:39:51 -08:00
gagb
a31c0a13e7 Merge branch 'main' into gagb/improve-readme 2024-12-14 18:34:27 -08:00
Sandeep Pawar
30ab78fe9e Update README.md
I have updated the readme with three changes:
- Created sections for Installation and Usage to help users
- Added installation instructions
- Added an additional example of using an LLM. This will be the primary use case and will help users.
2024-12-14 19:15:10 -06:00
gagb
559b1fc62a Merge branch 'main' into patch-2 2024-12-14 15:02:42 -08:00
Josh XT
df03382218 Improve docstring 2024-12-14 17:55:22 -05:00
Robert Haase
18301edcd0 Add installation instructions 2024-12-14 23:22:54 +01:00
Josh XT
4987201ef6 test 2024-12-14 08:49:03 -05:00
Josh XT
571c5bbc0e add test 2024-12-14 08:45:51 -05:00
Josh XT
e8ea8b6f3d Update readme 2024-12-14 08:41:07 -05:00
Josh XT
7e634acf5f import zipfile 2024-12-14 08:24:44 -05:00
Josh XT
862c39029e add zip handling 2024-12-14 06:34:47 -05:00
afourney
70ab149ff1 Merge pull request #10 from simonw/patch-1
Remove invalid classifiers
2024-12-13 21:10:53 -08:00
Simon Willison
33ce17954d Note about piping 2024-12-13 11:09:03 -08:00
Simon Willison
6ebef5af0c CLI usage instructions
Plus added a PyPI badge
2024-12-13 11:06:11 -08:00
Simon Willison
3b88696777 Remove invalid classifiers
requires-python says 3.10 and higher only
2024-12-13 10:53:35 -08:00
gagb
3f9ba06418 Improve the readme with contributing guidelines
Addresses issue https://github.com/microsoft/markitdown/issues/6

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/microsoft/markitdown?shareId=XXXX-XXXX-XXXX-XXXX).
2024-12-12 15:17:18 -08:00
afourney
b40139652b Merge pull request #4 from microsoft/fixes_for_filesurfer
Small fixes for the file surfer.
v0.0.1a2
2024-11-25 15:08:44 -08:00
Adam Fourney
cc0a039bb0 Small fixes for the filesurfer. 2024-11-25 13:42:05 -08:00
afourney
851c7cff96 Merge pull request #3 from microsoft/add_cli
Added a simple CLI.
2024-11-14 10:27:55 -08:00
Adam Fourney
2ad821ae8f Merge branch 'add_cli' of github.com:microsoft/markitdown into add_cli 2024-11-14 10:24:52 -08:00