Qt Development Frameworks Documentation: Qt Quarterly
Poppler: Displaying PDF Files with Qt
As we saw earlier in this issue, Qt can be used to generate documents in an ever-expanding range of formats that can be viewed and edited with external applications. Qt also comes with facilities to display HTML 'out of the box', and can generate its own print previews, but what about files that originate outside Qt applications?
Fortunately, there are third-party libraries available for some of the things that Qt doesn't provide. One of these is Poppler, a Portable Document Format (PDF) rendering library that forms the basis of a number of widely used PDF viewing applications. Poppler is a fork of the Xpdf PDF viewer that is licensed under the GNU General Public License. Xpdf can also be obtained under other licensing terms.
Poppler is designed in a way that allows it to be used with any toolkit or framework, as long as a suitable rendering backend is available. Qt application developers are fortunate in that there is also a Qt frontend available: a set of Qt-style classes that use Qt classes to describe parts of PDF documents.
In this article, we'll take a brief look at some of the features provided by Poppler in the context of creating a simple PDF viewing application.
Setting Things Up
Developers using Linux should find that Poppler and the Qt 4 frontend are available as packages for most recent distributions. Developers on Windows, Mac OS X, and other Unix platforms can download the source code from the poppler.freedesktop.org Web site.
By default, Poppler is built with all kinds of frontends and backends. If you compile Poppler from source, you can exclude some of these to save compile time. When configuring the build, it may be easier to set the installation prefix to the one used for the Qt installation. This prefix is the directory under which the subdirectories containing executables, libraries and data files are stored.
It is important to know where the Poppler library and header files will be installed because our example will need them.
Rendering Documents
In our example, we provide a simple user interface to display PDF files. It displays a single page at a time and provides controls to let the user move between pages. Each page is displayed in a custom widget, DocumentWidget, held in the main window's central widget, a scroll area.
The user opens a new file via a file dialog, which we open in response to an action being triggered. The path to the file is passed to the DocumentWidget so that the document it contains can be fed to the Poppler library.
Unlike with many Qt classes, we do not construct a document directly. Instead, we load it using a static function, Poppler::Document::load(), passing it the path to the file.
If the document returned is not null, we have a document that we can explore. Note that our example takes ownership of the document, so we must remember to dispose of it when we have finished with it.
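One way to make the ownership rule hard to forget is to hand the returned pointer to a smart pointer immediately. The sketch below is self-contained and does not use Poppler. StubDocument and loadStub() are hypothetical stand-ins that mimic Poppler::Document::load() in the Qt 4 frontend: each returns either a raw pointer owned by the caller or null. With the real library, one would write std::unique_ptr<Poppler::Document> doc(Poppler::Document::load(path)) in the same way.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for Poppler::Document, so that this sketch compiles
// without Poppler installed. It counts live instances to show nothing leaks.
struct StubDocument {
    static int liveCount;
    int pages;
    explicit StubDocument(int n) : pages(n) { ++liveCount; }
    ~StubDocument() { --liveCount; }
    int numPages() const { return pages; }
};
int StubDocument::liveCount = 0;

// Hypothetical loader: like a static load function, it hands back a raw
// pointer owned by the caller, or nullptr if the file cannot be opened.
StubDocument *loadStub(const std::string &path) {
    if (path.size() < 4 || path.compare(path.size() - 4, 4, ".pdf") != 0)
        return nullptr;
    return new StubDocument(3);
}

// Wrapping the pointer in std::unique_ptr right away means the document is
// deleted on every return path, including early error returns.
int pageCountOf(const std::string &path) {
    std::unique_ptr<StubDocument> doc(loadStub(path));
    if (!doc)
        return -1;              // null result: nothing to explore or free
    return doc->numPages();     // doc is deleted when it goes out of scope
}
```

Calling pageCountOf("report.pdf") returns 3 and pageCountOf("notes.txt") returns -1. In both cases StubDocument::liveCount is back to zero afterwards, because the unique_ptr frees whatever was loaded.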
Each document contains a series of pages that can be obtained one by one using the Document::page() function. Although the Document class has a collection of functions to control the appearance of the document, actual rendering is performed by each Page object. In our example, we render pages into QImage objects that we display using the DocumentWidget, itself just a simple QLabel subclass.
The key part of our DocumentWidget::showPage() function asks the current Page object to render itself into a QImage using Page::renderToImage(). We pass it the horizontal and vertical resolutions of the image to be created, each multiplied by a scale factor that the user controls via the example's user interface. We have to be careful with the range of scale factors available because it is easy to request extremely large images. In practice, we restrict the user's choice to a set of predefined scale factors.
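The arithmetic behind the resolution and the restricted set of scale factors can be sketched in plain C++. In the Qt 4 frontend, the resolution computed here would be passed as the xres and yres arguments of Page::renderToImage(). The zoom levels in kScaleFactors are a hypothetical choice, since the article does not say which factors the example offers.

```cpp
#include <array>
#include <cmath>

// Hypothetical set of predefined zoom levels.
const std::array<double, 7> kScaleFactors = {0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 4.0};

// Snap an arbitrary requested zoom to the closest predefined factor, so the
// user can never ask for an arbitrarily large image.
double snapScale(double requested) {
    double best = kScaleFactors[0];
    for (double s : kScaleFactors)
        if (std::fabs(s - requested) < std::fabs(best - requested))
            best = s;
    return best;
}

// PDF page sizes are given in points (72 per inch). Rendering at
// dpi * scale dots per inch gives this many pixels along one dimension.
long pixelsFor(double points, double dpi, double scale) {
    const double resolution = dpi * scale;   // value passed as xres or yres
    return std::lround(points / 72.0 * resolution);
}
```

For example, at 96 dpi and a factor of 4.0, pixelsFor() turns a US Letter page (612 by 792 points) into a 3264 by 4224 pixel image. That is roughly 13.8 million pixels, or about 55 MB at 32 bits per pixel, which is why it is worth limiting the available factors.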
Searching for Text
One of the many useful features that Poppler provides is the ability to locate specific text strings in PDF documents. Since PDF is designed to store printable rather than editable documents, it is not always easy to access and reconstruct the author's original text. However, Poppler does a good job of locating text in many documents, and we can expose this feature in our example.
The API for locating text provides conventional features such as case-insensitive and directional searching, but it also returns information about the position of any located text on the page. Since PDF is a display format, this is really the only useful information about the text that we can obtain. This information can be used to indicate where any subsequent searches should begin.
A forward search within a given page is performed using the Page::search() function. It is passed the text to find and a QRectF object, searchLocation in our example, that indicates where the search should start from on the given page. Initially, when we perform a search, we just pass a default-constructed QRectF object to start from the page origin.
The rectangle we obtain from the Page::search() function can be used when we render the page, to highlight the located text and to scroll the view so that it is visible. However, the position and dimensions of the rectangle are given in points (1 inch = 72 points), so we need to transform the rectangle to cover the correct area on-screen.
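Since there are 72 points to an inch, the transformation is a scale by dpi * scale / 72 along each axis. The sketch below assumes that the page and the rendered image share a top-left origin and that the page is not rotated, which is the situation a simple scaling matrix handles. RectF is a hypothetical stand-in for QRectF.

```cpp
// Minimal rectangle type standing in for QRectF in this sketch.
struct RectF {
    double x, y, width, height;
};

// Pixels per point for a given screen resolution and zoom level.
// At 72 dpi and a scale of 1.0, one point maps to exactly one pixel.
double pixelsPerPoint(double dpi, double scale) {
    return dpi * scale / 72.0;
}

// Map a rectangle from page space (points) to image space (pixels).
// Assumption: both spaces have their origin at the top-left corner and the
// page is not rotated, so a pure scale is enough.
RectF pointsToPixels(const RectF &r, double dpiX, double dpiY, double scale) {
    const double sx = pixelsPerPoint(dpiX, scale);
    const double sy = pixelsPerPoint(dpiY, scale);
    return {r.x * sx, r.y * sy, r.width * sx, r.height * sy};
}
```

In the Qt version, the same two factors could be given to QTransform::scale() and the search rectangle mapped with QTransform::mapRect().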
Searching through a document for a piece of text is slightly more involved than just a single function call. We'll look at this in more detail later.
Extracting Text
Since the mapping between the author's original text and its location on-screen may be purely visual, it is difficult to automate the extraction of text from PDF files, though there are tools that try very hard to achieve this.
Many document viewers let the user export text by selecting a region on-screen, which gives the application something to work with. Poppler supports this approach with the Page::text() function, which returns a string containing the text found within a given rectangle.
The method we use is somewhat different from this. We'll cover it in more detail later.
The Example in More Detail
Having covered the basics of displaying pages, searching, and extracting text from documents, let's take a closer look at how our example uses these features.
We provide two functions to search for text strings supplied by the user via the user interface. For forward searching, we start by looking for strings on the current page, beginning at the current search location, then try each following page until the end of the document.
If we reach the end of the document without finding anything, we search from the beginning until we reach the current page.
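The wrap-around order can be separated from the searching itself. In this sketch, matches is a hypothetical callback standing in for a Page::search() call on one page. The real example also starts the current page from the current search location, which such a callback would have to take care of.

```cpp
#include <functional>
#include <vector>

// Order in which a forward search that starts on `current` visits pages:
// current, current + 1, ..., last, then 0, ..., current - 1.
std::vector<int> forwardSearchOrder(int numPages, int current) {
    std::vector<int> order;
    for (int i = 0; i < numPages; ++i)
        order.push_back((current + i) % numPages);
    return order;
}

// Return the first page, in wrap-around order, for which `matches` reports
// a hit, or -1 if the text occurs nowhere in the document.
int findForward(int numPages, int current,
                const std::function<bool(int page)> &matches) {
    for (int page : forwardSearchOrder(numPages, current))
        if (matches(page))
            return page;
    return -1;
}
```

With five pages and the user on page 3 (counting from zero), pages are tried in the order 3, 4, 0, 1, 2.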
As well as rendering pages at different scales, as described earlier, we would like to highlight the results of searches. To do this, we insert some code to paint on the image obtained from the current page, using a matrix to map the rectangle onto the image.
The result of this additional effort is shown in the following image: the located text is displayed normally while the rest of the page is slightly darker.
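The darkening effect itself is independent of Poppler. In Qt it would naturally be done with a QPainter on the rendered QImage. The sketch below reproduces the effect on a plain 8-bit grayscale buffer, assuming the highlight rectangle has already been mapped to pixels. GrayImage, PixelRect and the 0.75 darkening factor are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// A tiny 8-bit grayscale image, standing in for the rendered QImage.
struct GrayImage {
    int width, height;
    std::vector<std::uint8_t> pixels;   // row-major, width * height entries
    std::uint8_t &at(int x, int y) { return pixels[y * width + x]; }
};

// Integer pixel rectangle covering [left, left + width) x [top, top + height).
struct PixelRect {
    int left, top, width, height;
    bool contains(int x, int y) const {
        return x >= left && x < left + width && y >= top && y < top + height;
    }
};

// Darken every pixel outside `highlight`, leaving the located text untouched.
void dimOutside(GrayImage &img, const PixelRect &highlight, double factor = 0.75) {
    for (int y = 0; y < img.height; ++y)
        for (int x = 0; x < img.width; ++x)
            if (!highlight.contains(x, y))
                img.at(x, y) = static_cast<std::uint8_t>(img.at(x, y) * factor);
}
```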
In our example, we allow the user to draw a selection onto the page by reimplementing three of the mouse event handler functions in our DocumentWidget. In these, we maintain a QRubberBand object to keep track of the area selected, following the pattern shown in the QRubberBand documentation.
The mouse release event handler is where we start the process of selecting text.
When the user releases the mouse button, we create a rectangle with coordinates relative to the top-left corner of the image within the label. We then pass this to the selectedText() function, which is responsible for informing the rest of the application about any text it finds.
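Converting from label coordinates to image coordinates amounts to normalizing the dragged rectangle and subtracting the image's offset within the label. This sketch assumes, hypothetically, that the label centres its image. Point, Rect and the helper functions are illustrative names, not part of the example or of Qt. Clipping to the image bounds means later mapping never sees points outside the page.

```cpp
#include <algorithm>
#include <cstdlib>

// Integer point and rectangle types, standing in for QPoint and QRect.
struct Point { int x, y; };
struct Rect { int x, y, width, height; };

// Rectangle spanned by the press and release points, whichever direction
// the user dragged in.
Rect spanned(Point a, Point b) {
    return {std::min(a.x, b.x), std::min(a.y, b.y),
            std::abs(a.x - b.x), std::abs(a.y - b.y)};
}

// Offset of the image's top-left corner inside the label, assuming the
// label centres an image that may be smaller than the label itself.
Point imageOffset(int labelW, int labelH, int imageW, int imageH) {
    return {std::max(0, (labelW - imageW) / 2), std::max(0, (labelH - imageH) / 2)};
}

// Convert a selection drawn in label coordinates to image coordinates,
// clipped to the image so that only points on the page remain.
Rect selectionInImage(Point press, Point release,
                      int labelW, int labelH, int imageW, int imageH) {
    const Rect r = spanned(press, release);
    const Point off = imageOffset(labelW, labelH, imageW, imageH);
    const int left = std::max(0, r.x - off.x);
    const int top = std::max(0, r.y - off.y);
    const int right = std::min(imageW, r.x + r.width - off.x);
    const int bottom = std::min(imageH, r.y + r.height - off.y);
    return {left, top, std::max(0, right - left), std::max(0, bottom - top)};
}
```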
As noted earlier, the Poppler Page class provides a function to return text within a rectangle in a document. However, in selectedText(), we use a more convoluted method to show how much information we can obtain about a document.
We begin by mapping the selection rectangle onto the page, using the inverse of the matrix we used to highlight search results. We then obtain a list of TextBox objects, each of which describes a piece of text on the page.
We test whether each piece of text lies within the selection and, if it does, append it to a QString. We also perform some elementary checks to see if we can cleverly insert newline characters in appropriate places.
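The two steps can be sketched together without Poppler. WordBox is a hypothetical stand-in for Poppler::TextBox. In the Qt 4 frontend, a list of TextBox objects comes from Page::textList(), and each provides text() and boundingBox(). The newline rule used here is our own simple heuristic and not necessarily the one the example uses: a word starts a new line when its top edge lies below the previous word's bottom edge.

```cpp
#include <string>
#include <vector>

// Minimal rectangle type standing in for QRectF.
struct RectF {
    double x, y, width, height;
    bool contains(const RectF &o) const {
        return o.x >= x && o.y >= y &&
               o.x + o.width <= x + width && o.y + o.height <= y + height;
    }
};

// Hypothetical stand-in for Poppler::TextBox: a word and its box in points.
struct WordBox {
    std::string text;
    RectF box;
};

// Inverse of the points-to-pixels scale: map a pixel rectangle back to points.
RectF pixelsToPoints(const RectF &r, double dpiX, double dpiY, double scale) {
    const double sx = dpiX * scale / 72.0;
    const double sy = dpiY * scale / 72.0;
    return {r.x / sx, r.y / sy, r.width / sx, r.height / sy};
}

// Collect the words lying entirely inside the selection, joining words on
// the same line with a space and starting a new line with '\n'.
std::string selectedText(const std::vector<WordBox> &words, const RectF &selectionPx,
                         double dpiX, double dpiY, double scale) {
    const RectF sel = pixelsToPoints(selectionPx, dpiX, dpiY, scale);
    std::string out;
    const WordBox *prev = nullptr;
    for (const WordBox &w : words) {
        if (!sel.contains(w.box))
            continue;
        if (prev)
            out += (w.box.y >= prev->box.y + prev->box.height) ? '\n' : ' ';
        out += w.text;
        prev = &w;
    }
    return out;
}
```

Because the selection is converted back to points first, the containment test compares rectangles in the same units as the text boxes, whatever zoom level the user has chosen.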
Note that, while we're satisfied with obtaining whole pieces of text (typically words in a sentence), recent versions of Poppler allow the individual characters in TextBox objects to be located.
In the user interface, when the user selects some text, we display it in a text browser so that it can be copied and pasted elsewhere.
Building the Example
The example is provided as a standard Qt project with a simple pdfviewer.pro file. Because you have some freedom in where you install the Poppler library and header files on your system, you will need to modify this file to use the correct paths.
On Ubuntu 8.04 with the libpoppler-qt4-dev package installed, the appropriate paths are as follows:
Other Linux distributions may install these files in different locations, and developers on other platforms may find it easier to build the library alongside the example instead of installing it.
Other Features and Possible Improvements
Our PDF viewer example only uses the most basic features of the Poppler library. Since many documents use features like encryption, slideshow transitions, tables of contents and annotations, the viewer applications that use Poppler to render documents rely on the library's support for these features.
Poppler includes a number of low-level features that are useful for analysing PDF files. Access to the list of fonts used in a document, and to the font data itself, can be useful when preparing documents for publication.
Access to the body of text in a document is useful to developers looking to index documents for text mining and subsequent analysis. However, as noted earlier, this might be of limited use for some documents. A good summary of the issues surrounding text extraction can be found on the following page:
Information that is not part of the visible document is also available via the Poppler API. Annotations, scripts (typically written in JavaScript) and the URLs for hyperlinks can all be obtained, though it is up to the application developer to present this information in a meaningful way.
Like Qt's QPrinter class, Poppler is also able to write PostScript files, so we could easily add support for file export and conversion. Recent versions also support PDF output, which opens the door to using the library for PDF manipulation. In fact, since the library allows us to examine documents without having to display pages, it is possible to write command line tools to handle documents, and a number of these are supplied with Poppler.
Finding Out More
Poppler is hosted on freedesktop.org, a site dedicated to Free and Open Source desktop projects:
Poppler's Qt 4 frontend has its own documentation, which can be obtained via the project's Wiki:
Popular PDF viewers that use Poppler include Okular and Evince for the KDE and GNOME desktop environments:
The Xpdf application, from which Poppler is derived, can be obtained from the following Web site:
The source code for the example described in this article can be obtained from the Qt Quarterly Web site.