Version your Apple Pages and Microsoft Word docs
Introduction
Versioning is a helpful way of distinguishing the state of a resource throughout its lifetime. It is used in source code control, and it can also be applied to documents from products like Apple Pages and Microsoft Word. I had two requirements:
- I want to include a version string that associates the document with a commit
- I want to be able to tell what changes happened between commits
I solved this problem by writing a script to generate a PDF file and an XHTML file. The XHTML file is used to identify the differences between commits. The PDF file is the file that I distribute; it is not included in the repo. I used this process with Apple Pages but it should work with other editors like Microsoft Office.
Setup
- Install LibreOffice. You can download the software from its website.
- Install Git. If you’re on Mac, simply run the git command from the command line and it will walk you through the install process.
- Install SourceTree. I use SourceTree to help me visualize the changes I’ve made.
- Create the repository that will store your files and apply a tag. The tag is required; otherwise we will run into the "No version information" error in our script.
git init
echo "*.pdf" >>.gitignore
git add .
git commit -m 'Initial.'
git tag -a v1.0.0 -m v1.0.0
git describe --dirty
This is the view from SourceTree once you’ve completed the above steps.
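The last setup command, `git describe --dirty`, prints the version string that later ends up in the document. Right after tagging it is just the tag name; once more commits exist it takes the form `<tag>-<commits since tag>-g<abbreviated hash>`, with `-dirty` appended when the working tree has uncommitted changes. The following is a minimal sketch of how such a string could be taken apart. The function name `parse_describe` is hypothetical and not part of the script in this article, and the sketch assumes the tag name itself does not end in a `-<something>-g<something>` pattern.

```shell
# Hypothetical helper: split a `git describe --dirty` string into its parts.
parse_describe() {
  desc="$1"
  dirty=no
  # Strip a trailing -dirty marker, if present.
  case "$desc" in
    *-dirty) dirty=yes ; desc="${desc%-dirty}" ;;
  esac
  case "$desc" in
    *-*-g*)
      hash="${desc##*-g}"      # text after the last "-g"
      rest="${desc%-g*}"       # everything before the last "-g"
      commits="${rest##*-}"    # number of commits since the tag
      tag="${rest%-*}"         # the tag name itself
      ;;
    *)
      # Exactly on a tag: no commit count or hash is appended.
      tag="$desc" ; commits=0 ; hash=""
      ;;
  esac
  echo "tag=$tag commits=$commits hash=$hash dirty=$dirty"
}

parse_describe v1.0.0-18-g6d8992d
# prints: tag=v1.0.0 commits=18 hash=6d8992d dirty=no
```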
Workflow
- Create a file in Pages (or modify an existing one).
- From the repository, run the script `p pages`.
- Commit the file pairs (Pages and XHTML).
This is what the source document looks like:
This is what the PDF looks like:
This is what the diff looks like when I make a small change to the document:
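As a rough, text-only illustration of the same idea, the sketch below compares two small XHTML fragments with the standard `diff` tool. The file names and contents are made up for the example; they stand in for the tidied XHTML files from two commits, where each element sits on its own line so that a small edit shows up as a small diff.

```shell
# Hypothetical before/after XHTML fragments, laid out one item per line.
workdir=$(mktemp -d)
printf '%s\n' '<p>' 'Version v1.0.0' '</p>' '<p>' 'Hello world' '</p>' > "$workdir/old.xhtml"
printf '%s\n' '<p>' 'Version v1.0.0' '</p>' '<p>' 'Hello there' '</p>' > "$workdir/new.xhtml"

# diff exits with status 1 when the files differ, so guard it with || true.
changes=$(diff "$workdir/old.xhtml" "$workdir/new.xhtml" || true)
printf '%s\n' "$changes"

rm -r "$workdir"
```

Only the changed line is reported, which is the property that makes the XHTML file useful for reviewing changes between commits.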
Script
error() { RED='\033[0;31m' ; NC='\033[0m' ; printf "${RED}$@${NC}\n" ; exit 1 ; }
cmd_convert() { /Applications/LibreOffice.app/Contents/MacOS/soffice --convert-to "$@" >/dev/null ; }
cmd_describe() { git describe --dirty ; }
cmd_pages() {
  files=("$@")
  [ ${#files[@]} -eq 0 ] && files=(*.pages)
  version=$(cmd_describe)
  [ -z "$version" ] && error "No version information"
  for src in "${files[@]}" ; do
    base="${src%.*}"
    echo "• Process $src"
    echo "=> Generate opendoc"
    cmd_convert odt "$src"
    echo "=> Apply substitutions"
    unzip "$base.odt" content.xml >/dev/null
    LC_ALL=C sed -i '' 's/{{version}}/'"$version"'/g' content.xml
    zip -m "$base.odt" content.xml >/dev/null
    echo "=> Generate xhtml"
    cmd_convert xhtml "$base.odt"
    tidy --indent yes --indent-attributes yes --wrap 0 -o "$base.xhtml" "$base.xhtml" >/dev/null 2>&1
    echo "=> Generate pdf"
    cmd_convert pdf "$base.odt"
    echo "=> Remove opendoc"
    rm "$base.odt"
  done
}
I wrap my code snippets in functions so that I can drop them into a larger script. You can learn more about this in Create a self-contained Bash script.
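The larger script itself is not shown in this article, so the following is only a sketch of how a wrapper such as `p` might route its first argument to a `cmd_*` function. The names `dispatch`, `cmd_hello` and `cmd_version` are hypothetical and exist only for this example.

```shell
# Hypothetical sub-commands, each wrapped in a cmd_* function.
cmd_hello() { echo "hello $*" ; }
cmd_version() { echo "v0.0.0-example" ; }

# Route "<name> arguments..." to the function cmd_<name>.
dispatch() {
  if [ $# -eq 0 ] ; then
    echo "usage: p <command> [arguments]" >&2
    return 2
  fi
  name="$1"
  shift
  # command -v succeeds when a function with that name is defined.
  if command -v "cmd_$name" >/dev/null 2>&1 ; then
    "cmd_$name" "$@"
  else
    echo "unknown command: $name" >&2
    return 1
  fi
}

dispatch hello world
```

In a real wrapper the last line would more likely be `dispatch "$@"`, so that `p pages foo.pages` ends up calling `cmd_pages foo.pages`.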
This script is invoked as `p pages`. I can call it with parameters, as in `p pages foo.pages bar.pages`, or without any parameters. Please note that if you use parameters and the file names contain spaces, then you will need to either enclose the file name in quotes (`p pages "foo bar.pages"`) or escape the spaces with a backslash (`p pages foo\ bar.pages`).
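To see why the quoting matters, here is a small sketch that prints each argument a function receives. `show_args` is a hypothetical stand-in, not part of the script; it assumes the receiving function iterates over `"$@"`, which is what keeps a name containing spaces in one piece.

```shell
# Hypothetical stand-in for a sub-command: print every argument on its own line.
show_args() {
  for arg in "$@" ; do
    printf '[%s]\n' "$arg"
  done
}

show_args foo bar.pages      # unquoted: the shell passes two arguments
show_args "foo bar.pages"    # quoted: one argument containing a space
show_args foo\ bar.pages     # escaped space: also one argument
```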
The script uses helper functions described in the next section.
- `error`: This function allows us to print a message in red using non-printing escape sequences and to exit the program with a non-zero exit code.
- `cmd_convert`: We call the `soffice` program to do the conversions. Our script allows us to call this command independently, so we could, for example, call `p convert odt Untitled.pages`.
- `cmd_describe`: This function gives us the version string that we substitute in the documents.
- We open the `cmd_pages` function.
- We allow the user to specify the file(s) to process.
- If the user doesn’t specify a file, then we list all the `pages` files in the current directory.
- We get version information from the git repo. This will result in a string like `v1.0.0-18-g6d8992d`.
- If the version isn’t set properly, we throw an error.
- We open the `for` loop for each item in `files`. The `src` variable will be set to the file we are currently processing.
- We set a variable called `base` to the name of the file without the extension. If `src` is `foo.pages`, then `base` is set to `foo`.
- We convert the Pages file to OpenDoc. This allows me to make changes to the file. I found it much more difficult to modify the PDF file and I didn’t want to alter the Pages file.
- The OpenDoc file is actually a ZIP archive of a collection of files. We pull out the `content.xml` file, which contains the string we want to change.
- We use `sed` to replace the string `{{version}}` with the version string obtained from `git describe`. In previous attempts at this script, I ran into the message `sed: RE error: illegal byte sequence` when I ran the `sed` command. Adding `LC_ALL=C` solved this problem. I didn’t experience the error when I ran it on `content.xml`, but I left it in just in case.
- We move the modified `content.xml` back into the ODT file.
- We convert the OpenDoc file to XHTML.
- We use the `tidy` command to beautify the XHTML. This makes the `diff` easier to read.
- We convert the OpenDoc file to PDF. This ensures that the file I distribute includes the version string.
- We remove the OpenDoc file.
- We close the `for` loop.
- We close the `cmd_pages` function.
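The substitution step in the walkthrough above can be tried in isolation. The sketch below is an assumption-laden variant rather than the script's own code: `substitute_version` is a hypothetical name, it filters standard input instead of editing `content.xml` in place, and it additionally escapes `\`, `/` and `&` in the version string so that those characters cannot break the `sed` expression.

```shell
# Hypothetical filter: replace every {{version}} placeholder on stdin.
substitute_version() {
  # Escape \, / and & so they are taken literally in the sed replacement.
  escaped=$(printf '%s' "$1" | sed 's/[\/&]/\\&/g')
  # LC_ALL=C makes sed treat the input as plain bytes, as in the script.
  LC_ALL=C sed "s/{{version}}/$escaped/g"
}

printf '<p>Version {{version}}</p>\n' | substitute_version 'v1.0.0-18-g6d8992d'
# prints: <p>Version v1.0.0-18-g6d8992d</p>
```

Working on a stream avoids the `-i ''` form of in-place editing, which is specific to the BSD `sed` shipped with macOS.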