Many academics now read journal articles on screen, as PDF files. Holdouts—and they are everywhere—print out forest-sized stacks of paper that teeter on crowded desks.
Freedom from clutter is just one advantage of digital reading. Another is searchability: a large database of articles is a crutch for our fallible memories. Thanks to Spotlight, a full-text keyword query can substitute for a laborious hunt through the literature. A few carefully phrased searches are all it takes for instant recall.
Spotlight has a maddening flaw, however. Standard PDF annotations—both “sticky notes” and text “typed” on a page—are not indexed. As far as Spotlight is concerned, they do not exist.
For me this is no trivial problem. More often than not, these annotations contain the very text I’m searching for.
Papers, my reference manager of choice, has a partial solution. The software has its own built-in annotation feature. Though sticky notes created in Papers don't get indexed by Spotlight, they are searchable with Papers' own search function. Still, I often want to search notes in PDFs that weren't annotated in Papers. I'm also wary of relying too heavily on Papers' annotations, since the feature is still a little buggy.
Skim, by contrast, is Spotlight-friendly. The app stores its annotations in a separate file (with a .skim extension) that Spotlight happily indexes. And it's easy to convert the Adobe-style annotations used by Preview, PDFpen and most other PDF software into Skim notes: select “Convert Notes…” under the File menu, or assign that command a keyboard shortcut.
I now designate Skim as my default PDF editor in Papers. (In the “Paper” menu, scroll down to the “Open PDF with…” option. If you want Papers to default to Skim, rather than open a new tab in Papers, go to Preferences, select the “Papers” pane, and select Skim in the “Open PDF Files” drop-down.)
My problem was mostly solved, but what about all those PDFs already annotated using the standard tools? And then there were the PDFs annotated on the iPad. All of the best PDF apps on iOS, including iAnnotate, GoodReader, PDFpen for iPad, and (my favorite) PDF Expert, use standard, Adobe-style annotation tools. Converting each and every old or iPad-annotated PDF to Skim notes by hand takes far too much time.
There’s an obvious need here for automation, but unfortunately no simple solution. The problem is that there’s no easy way to automatically detect that a PDF contains annotations (and therefore needs converting to Skim). None of the PDF metadata registers that a PDF has been annotated, so there’s no way to trigger an automated conversion.
Be warned that the solution I hit upon is fiddly. It works, but the setup is only worth it if you have a large batch of PDFs to convert or frequently take notes on your iPad. The solution involves Hazel, the indispensable Mac automation tool.
A post on Hazel’s forums (by AppleSuperlatives) suggested using grep, the Unix plain-text search utility, for a related issue (detecting whether a PDF had been OCRed). The basic insight was that PDFs are, at core, binary files, which grep can search.
The next step was to isolate a snippet of text that gets added to a PDF's binary when annotations exist. To do this, I opened two PDFs, identical except that one had annotations, in a text editor and saved them as text files. Next, I used the superb file-comparison tool Kaleidoscope to call out the differences between the two files. The annotations, in this case, had been made in Preview, and I found and copied some text (“Annot /T”) that gets generated in every annotation. This would be my trigger.
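The same comparison can be roughed out in a script. The Python sketch below is a hypothetical helper, not the method used here (which relied on a text editor and Kaleidoscope): it reads two PDFs as raw bytes, pulls out PDF name tokens such as /Annot or /Contents with a regular expression, and lists the ones that occur only in the annotated copy. The function names are made up for illustration. PDFs that pack objects into compressed streams can hide annotation dictionaries from any plain byte search, this one included.

```python
import re

# Matches PDF name objects such as /Annot, /Subtype or /Contents.
NAME_TOKEN = re.compile(rb"/[A-Za-z][A-Za-z0-9]*")


def names_only_in(annotated: bytes, clean: bytes) -> list[str]:
    """Return sorted PDF name tokens found in `annotated` but not in `clean`."""
    extra = set(NAME_TOKEN.findall(annotated)) - set(NAME_TOKEN.findall(clean))
    return sorted(token.decode("ascii") for token in extra)


def diff_pdf_files(annotated_path: str, clean_path: str) -> list[str]:
    """Read two PDFs (hypothetical paths) as raw bytes and compare their name tokens."""
    with open(annotated_path, "rb") as f:
        annotated = f.read()
    with open(clean_path, "rb") as f:
        clean = f.read()
    return names_only_in(annotated, clean)


if __name__ == "__main__":
    # Tiny synthetic fragments standing in for real PDF bytes.
    clean = b"<< /Type /Page /MediaBox [0 0 612 792] >>"
    annotated = clean + b" << /Type /Annot /Subtype /Text /T (me) /Contents (note) >>"
    print(names_only_in(annotated, clean))
    # prints ['/Annot', '/Contents', '/Subtype', '/T', '/Text']
```

The output is only a list of candidates. A trigger like “Annot /T” spans two tokens, and compressed data in real files adds noise, so the list still has to be inspected by hand.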
I set up a new Hazel rule to monitor my main Papers folder. (For basics on Hazel, see the tutorials and tips on the developer’s site.) The updated version of Hazel has a few new “if” conditions, including “Passes shell script.”
I embedded a short script to detect if the “Annot /T” phrase appeared; if so, the Hazel action would trigger.
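A condition along these lines could be a one-line shell command such as grep -q "Annot /T" "$1". The Python sketch below spells out the same logic. It assumes that Hazel passes the matched file's path to the script as its first argument and treats an exit status of 0 as a match. Both are assumptions to check against Hazel's own documentation, and the script name in the usage message is hypothetical.

```python
#!/usr/bin/env python3
"""Sketch of a Hazel "Passes shell script" condition for Preview annotations."""
import sys

# Phrase found in Preview-generated annotations, per the article.
MARKER = b"Annot /T"


def contains_marker(path: str, marker: bytes = MARKER) -> bool:
    """Return True if the raw bytes of the file at `path` contain `marker`."""
    with open(path, "rb") as f:
        return marker in f.read()


def condition_status(path: str) -> int:
    """Exit status for the condition: 0 means 'matched', 1 means 'no match'."""
    return 0 if contains_marker(path) else 1


if __name__ == "__main__":
    # Assumption: Hazel supplies the file path as the first argument.
    if len(sys.argv) > 1:
        sys.exit(condition_status(sys.argv[1]))
    # Run by hand without a path: there is nothing to check, so show usage.
    print("usage: has_annotations.py FILE", file=sys.stderr)
```

Reading the whole file into memory keeps the sketch simple. That is fine for typical journal PDFs, but it would be wasteful for very large files.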
Just in case, I added a second condition to ensure that the matched file is a PDF.
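Hazel's built-in conditions on a file's kind or extension cover this check without any scripting. For anyone scripting it themselves, one common approach is to look for the %PDF- signature that PDF files normally begin with, rather than trusting the extension. The Python sketch below shows that approach. It is an illustration, not how Hazel performs its own check.

```python
from pathlib import Path

# PDF files normally start with a header such as "%PDF-1.7".
PDF_SIGNATURE = b"%PDF-"


def looks_like_pdf(path: str) -> bool:
    """Return True if `path` is a regular file starting with the %PDF- signature.

    Some PDF readers tolerate junk before the header; this sketch
    deliberately requires the signature at byte 0.
    """
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(len(PDF_SIGNATURE)) == PDF_SIGNATURE
```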
The next step was to ask Hazel to convert the standard-annotation PDF to Skim. First I prompted Hazel to open the file in Skim, and then added a short, embedded AppleScript.
I also added a separate Hazel rule that runs the conversion rule on all the subfolders in my Papers directory.
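The subfolder rule makes Hazel reach every PDF in the Papers library, however deeply it is nested. To preview which files such a setup would catch before wiring it up, a script can walk the tree in the same way. The Python sketch below is hypothetical: the root folder is whatever path your Papers library uses, and the sketch checks only for the Preview phrase “Annot /T”.

```python
import sys
from pathlib import Path

# Trigger phrase from Preview annotations; extend the tuple for other apps.
MARKERS = (b"Annot /T",)
PDF_SIGNATURE = b"%PDF-"


def find_annotated_pdfs(root: str, markers: tuple[bytes, ...] = MARKERS) -> list[Path]:
    """Walk `root` and all of its subfolders, returning PDFs that contain a marker."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Skip folders and anything without a .pdf extension (in any case).
        if not path.is_file() or path.suffix.lower() != ".pdf":
            continue
        data = path.read_bytes()
        # Require a real PDF header, then any one of the trigger phrases.
        if data.startswith(PDF_SIGNATURE) and any(m in data for m in markers):
            hits.append(path)
    return hits


if __name__ == "__main__":
    # Pass the library folder as the first argument (path is up to you).
    if len(sys.argv) > 1:
        for pdf in find_annotated_pdfs(sys.argv[1]):
            print(pdf)
```

This only lists candidates. The conversion itself still happens in Skim.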
Soon all of my old, Preview-annotated PDFs had been converted to Skim. I ran into a problem, though, with the PDFs I annotated in PDF Expert for iPad: the “Annot /T” phrase did not appear in them. So I used Kaleidoscope to find another phrase that appeared in PDF Expert-generated PDFs: “/Name /Comment”. I then modified the Hazel rule with a nested condition, so that either “Annot /T” or “/Name /Comment” would trigger Hazel.
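In script form, the nested “either/or” condition means testing each phrase in turn and passing if any one matches. A grep equivalent would be grep -q -e "Annot /T" -e "/Name /Comment" "$1". The Python sketch below also reports which phrase matched, which helps when working out which app produced a file. The two phrases come from the article. The app labels simply record where each phrase was observed. A phrase for another app, such as iAnnotate, would have to be found the same way and added to the table.

```python
from typing import Optional

# Trigger phrases and the app each was observed in, per the article.
MARKERS = {
    b"Annot /T": "Preview",
    b"/Name /Comment": "PDF Expert",
}


def matching_marker(data: bytes, markers: dict[bytes, str] = MARKERS) -> Optional[str]:
    """Return the app label of the first marker found in `data`, or None."""
    for phrase, app in markers.items():
        if phrase in data:
            return app
    return None


def needs_conversion(path: str) -> bool:
    """True if the file at `path` contains any of the known trigger phrases."""
    with open(path, "rb") as f:
        return matching_marker(f.read()) is not None
```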
Now the articles I annotate on my iPad get converted too, once they're back on my Mac. Of course, if you use another iOS app, such as iAnnotate, you'll need to locate its phrase and modify your Hazel rule accordingly.
It’s a hassle, for sure, but worth it. Now all my notes exist for Spotlight again. Memory restored.