v1.0.1 (Latest)

Released 05 Dec 07:56 by Moskize91, commit 1bf3c6a

What's New in v1.0.1

Enhanced Error Handling: Added structured error types (FitzError, OCRError, InterruptedError) with detailed page and step information for better debugging
Improved Stability: Fixed crashes caused by a PyMuPDF error on a single page; page-level failures are now handled gracefully
Online Demo: Try PDF Craft directly in your browser at pdf.oomol.com without any installation
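The page-level failure handling described above can be pictured with a generic pattern: each page is processed in its own try block, and a failure is recorded with its page index and pipeline step instead of aborting the whole document. This is a hypothetical sketch, not pdf-craft's implementation. `PageFailure`, `convert_pages`, and the `render`/`ocr` callables are illustrative names, and the real `FitzError` and `OCRError` types may carry different fields.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class PageFailure:
    """Hypothetical record of a failed page (not a pdf-craft type)."""
    page: int      # zero-based page index
    step: str      # pipeline step that failed, e.g. "render" or "ocr"
    message: str   # text of the original exception


def convert_pages(
    page_count: int,
    render: Callable[[int], bytes],
    ocr: Callable[[bytes], str],
) -> Tuple[List[Optional[str]], List[PageFailure]]:
    """Process every page, isolating failures so one bad page cannot abort the run."""
    results: List[Optional[str]] = []
    failures: List[PageFailure] = []
    for page in range(page_count):
        step = "render"
        try:
            image = render(page)
            step = "ocr"
            results.append(ocr(image))
        except Exception as exc:  # record the failure for this page only
            failures.append(PageFailure(page, step, str(exc)))
            results.append(None)  # keep result indices aligned with pages
    return results, failures
```

Because the failing step and page are recorded rather than raised, the caller can decide afterwards whether a partial result is acceptable.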

What's Changed

docs(project): add online demo links by @Moskize91 in #260
feat: add new errors by @Moskize91 in #262
feat: don't crash when find just a page of fitz error by @Moskize91 in #263
doc(project): sync README.md by @Moskize91 in #264

Full Changelog: v1.0.0...v1.0.1

Contributors

Moskize91


v1.0.0

Released 02 Dec 06:44 by Moskize91, commit bb733c3

🎉 PDF Craft v1.0.0 Official Release

PDF Craft v1.0.0 is now officially released. This version includes major architectural changes and brings significant performance improvements.

🚀 Core Changes: Fully Embracing DeepSeek OCR

The biggest change in v1.0.0 is a complete rewrite built on DeepSeek OCR, which eliminates the dependency on an LLM for text correction.

DeepSeek OCR is a powerful open-source OCR engine that recognizes complex content (tables, formulas, images, footnotes, and more) and has a strong understanding of document structure. Thanks to DeepSeek OCR, pdf-craft now offers:

Fully Local Processing: The entire conversion runs locally without any network requests. There is no LLM API to configure and no risk of the conversion failing because of network issues or API outages; in the old version, a single failed LLM request would halt the entire conversion.
Faster Speed: v0.2.8 required multiple LLM calls for text correction, whereas the new version recognizes text directly with OCR and is significantly faster.
Higher Accuracy: DeepSeek OCR excels at document structure analysis, table recognition, and formula extraction, delivering high-quality results without secondary correction.
Simpler API: Removed complex LLM configuration and multi-step processing workflows. Now conversion can be completed with a single function call.

Additionally, v1.0.0 has fully migrated to DeepSeek OCR (MIT License), removing the previous AGPL-3.0 dependency. The entire project now uses the more permissive MIT License, which makes commercial use and integration easier!

⚠️ Important Change: CUDA Environment Required

The new version requires a CUDA environment because DeepSeek OCR relies on CUDA acceleration for efficient document recognition. The old version (v0.2.8) could run in CPU-only environments by relying on an LLM, but the new version cannot run without a GPU.

If your environment doesn't support CUDA, do not upgrade to v1.0.0. Continue using v0.2.8:

pip install pdf-craft==0.2.8

For specific CUDA environment installation instructions, please refer to the Installation Guide.

🚫 When NOT to Upgrade

Continue using v0.2.8 in the following situations:

No GPU or CUDA Environment: The new version requires CUDA and cannot run without a GPU.
Need LLM Text Correction: The new version has removed LLM correction. If your use case requires secondary correction of OCR results, continue using the old version, or use pdf-craft together with epub-translator.
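The upgrade decision above can be expressed as a small helper. This is a minimal sketch, not part of pdf-craft: `cuda_available` and `recommended_requirement` are hypothetical names, and using PyTorch's `torch.cuda.is_available()` as the CUDA probe is an assumption (the helper reports no CUDA when PyTorch is not installed).

```python
import importlib.util


def cuda_available() -> bool:
    """Best-effort CUDA probe (hypothetical helper, not a pdf-craft API).

    Assumes PyTorch is used as the probe; returns False when PyTorch is missing.
    """
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the helper still works without PyTorch

    return bool(torch.cuda.is_available())


def recommended_requirement(has_cuda: bool, needs_llm_correction: bool) -> str:
    """Pick a pip requirement following the v1.0.0 upgrade guidance."""
    # Stay on v0.2.8 without CUDA, or when LLM text correction is required.
    if not has_cuda or needs_llm_correction:
        return "pdf-craft==0.2.8"
    # Otherwise the DeepSeek OCR-based 1.x line is an option.
    return "pdf-craft>=1.0.0"


if __name__ == "__main__":
    print(recommended_requirement(cuda_available(), needs_llm_correction=False))
```

The printed string can be passed directly to `pip install`.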

🙏 Acknowledgments

Thanks to DeepSeek OCR for being open source, and to all community members who have contributed code and feedback to pdf-craft!

If you have a CUDA environment, upgrade to v1.0.0 now and experience faster, more stable, and simpler PDF conversion! 🚀


v0.2.8

Released 26 Sep 05:40 by Moskize91, commit 862487b

What's Changed

fix(project): upgrade dependencies to fix bug by @Moskize91 in #248

Full Changelog: v0.2.7...v0.2.8

Contributors

Moskize91


v0.2.7

Released 23 Jul 03:14 by Moskize91, commit cd03f4b

What's Changed

chore(project): update doc-page-extractor to fix bug by @Moskize91 in #233
fix(project): will clear HTML of table by @Moskize91 in #234
fix(analysers): will generate empty hash in chapters by @Moskize91 in #239
chore(project): upgrade epub-generator to fix bug by @Moskize91 in #240

Full Changelog: v0.2.5...v0.2.7

Contributors

Moskize91


v0.2.5

Released 12 Jul 04:22 by Moskize91, commit 6a5aa1a

What's Changed

fix(analysers): some codes are out of the lock domain by @Moskize91 in #224
fix(analysers): generate a huge paragraph and it will make request oversize by @Moskize91 in #228
feat(project): support new dependency API & update it to fix bugs by @Moskize91 in #229
fix(analysers): cannot report with max_count by @Moskize91 in #230
chore(project): upgrade to 0.2.5 by @Moskize91 in #231

Full Changelog: v0.2.4...v0.2.5

Contributors

Moskize91


v0.2.4

Released 11 Jul 01:30 by Moskize91, commit 1521bbf

What's Changed

fix: #209
fix: #216
fix: some chapters could not be generated in the EPUB file

Full Changelog: v0.2.3...v0.2.4


Releases: oomol-lab/pdf-craft
