DocumentCloud's document viewer integration for Plone.
Introduction
This package integrates DocumentCloud's viewer and PDF processing into Plone.
Example viewer: https://www.documentcloud.org/documents/19864-goldman-sachs-internal-emails
Features
very nice document viewer
OCR
Searchable on OCR text
works with many different document types
plone.app.async integration with task monitor
lots of configuration options
PDF album view for displaying groups of PDFs
Works with
Besides displaying PDFs, it will also display:
Word
Excel
Powerpoint
HTML
RTF
Install requirements
graphicsmagick
ghostscript
poppler
tesseract
pdftk
openoffice (for doc, excel, ppt, and similar types)
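Before installing, it can help to confirm that these tools are available on the server. The following is a minimal sketch, not part of this package: the executable names (`gm`, `gs`, `pdftotext`, `soffice`) are assumptions about how each requirement is typically exposed on the PATH, and `find_missing_tools` is a hypothetical helper. Adjust the names for your platform.

```python
import shutil

# Assumed executable names for each requirement; these are not defined by
# this package and may differ between operating systems and distributions.
REQUIRED_TOOLS = {
    "graphicsmagick": "gm",
    "ghostscript": "gs",
    "poppler": "pdftotext",
    "tesseract": "tesseract",
    "pdftk": "pdftk",
    "openoffice": "soffice",
}


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the sorted requirement names whose executable is not on PATH."""
    return sorted(name for name, exe in tools.items() if which(exe) is None)


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing requirements: " + ", ".join(missing))
    else:
        print("All requirements found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.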
Async Integration
It is highly recommended to install and configure plone.app.async in combination with this package. Doing so runs all PDF conversion processes asynchronously, so the user is not kept waiting when saving files.
Settings
The product can be configured via a control panel item Document Viewer Settings.
Some interesting configuration options:
- Storage Type
If you want to be able to serve your files via the Amazon cloud, this will allow you to store the data in flat files that can be synced to another server.
- Storage Location
Where on the server to store the files.
- OCR
Use tesseract to scan the document for text. This process can be slow, so if your PDFs do not need to be OCR'd, you may disable it.
- Auto Select Layout
For pdf files added to the site, automatically select the document viewer display.
- Auto Convert
When PDF files are added or modified, automatically convert them.
File storage integration
If you choose to use basic file storage instead of ZODB blob storage, there are a few things you'll want to keep in mind.
Use nginx to serve the files from the file system. This might require you to install a local nginx on the Plone server just for serving file storage. You can get creative with how your file storage is used, though.
Because a Plone operation can be interrupted, and deleting a file on the operating system cannot be done within a transaction, no files are ever deleted automatically. However, there is an action you can put in a cron task to clean up your file storage directory. Just call the URL http://zeoinstace/plone/@@dvcleanup-filestorage.
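A cron task could call the cleanup view with a small script such as the one below. This is a sketch under stated assumptions rather than something shipped with the package: `build_cleanup_request` and `run_cleanup` are hypothetical helpers, and the use of HTTP basic authentication is an assumption about how your site is configured. Only the view name `@@dvcleanup-filestorage` comes from this document.

```python
import base64
import urllib.request


def build_cleanup_request(site_url, username=None, password=None):
    """Build a request for the @@dvcleanup-filestorage view of a Plone site."""
    url = site_url.rstrip("/") + "/@@dvcleanup-filestorage"
    request = urllib.request.Request(url)
    if username is not None and password is not None:
        # Assumption: the site accepts HTTP basic authentication.
        token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
        request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


def run_cleanup(site_url, username=None, password=None, timeout=300):
    """Call the cleanup view and return the HTTP status code."""
    request = build_cleanup_request(site_url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, a script run nightly from cron might call `run_cleanup` with the base URL of the Plone site on the ZEO client and the credentials of a user allowed to run the cleanup, then log the returned status code.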
TODO
- reject converting pdf if too large and no async support provided?
or let people hang themselves?
Changelog
1.2a1 (unreleased)
fix full screen page bug [vangheem]
better async integration with quota setting [vangheem]
View async queue for conversions [vangheem]
index ocr data in portal catalog [vangheem]
better pdf group view with search [vangheem]
handle large files better [vangheem]
check if file has already been converted by storing hash of the file to check against. [vangheem]
be able to remove document viewer conversion tasks [vangheem]
add ability to cleanup file storage files for deleted plone File objects. [vangheem]
1.1a1 (2012-04-18)
add pdf folder album view [vangheem]
fix async integration [vangheem]
1.0a2 (2012-04-17)
add control panel icon [vangheem]
fix uninstall procedure [vangheem]
changing image type does not cause existing ones to fail. [vangheem]
1.0a1 (2012-04-17)
Initial release
Source Distribution
Hashes for collective.documentviewer-1.2a1.zip
Algorithm | Hash digest
---|---
SHA256 | 5044e519f8a0cbbf8b48f461f330f1eb278b7c0dfeed2d0a0e2c20dea6fff93d
MD5 | 1f94bbbb97a98a6e68dc713d3c52efba
BLAKE2b-256 | 8f4730c4f9df15c588dbff66f2cc45732903420782d453c28f71c56de260fef5