Introduction

Versioning is a helpful way of distinguishing the state of a resource throughout its lifetime. It is standard in source code control, and it can also be applied to documents created in products like Apple Pages and Microsoft Word. I had two goals:

  1. I want to include a version string that associates the document with a commit
  2. I want to be able to tell what changes happened between commits

I solved this problem by writing a script to generate a PDF file and an XHTML file. The XHTML file is used to identify the differences between commits. The PDF file is the file that I distribute; it is not included in the repo. I used this process with Apple Pages but it should work with other editors like Microsoft Office.

Setup

  1. Install LibreOffice. You can download the software from the LibreOffice website.
  2. Install Git. If you’re on Mac, simply run the git command from the command line and it will walk you through the install process.
  3. Install SourceTree. I use SourceTree to help me visualize the changes I’ve made.
  4. Create the repository that will store your files and tag the first commit. The tag is required: without one, git describe has nothing to report and the script stops with its "No version information" error.

     git init
     echo "*.pdf" >>.gitignore
     git add .
     git commit -m 'Initial.'
     git tag -a v1.0.0 -m v1.0.0
     git describe --dirty
    

This is the view from SourceTree once you’ve completed the above steps: (image: SourceTree init)
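To see what git describe --dirty reports at each stage, here is a minimal sketch that repeats the setup in a throwaway directory. The scratch directory, the notes.txt file, and the "Example" identity passed with -c are placeholders chosen for illustration; they are not part of the original setup.

```shell
# Sketch: repeat the setup in a scratch repo and watch `git describe --dirty` change.
# The temp directory, notes.txt, and the "Example" identity are illustrative only.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
echo "*.pdf" >>.gitignore
echo "draft" >notes.txt
git add .
git -c user.name="Example" -c user.email="example@example.com" commit -q -m 'Initial.'
git -c user.name="Example" -c user.email="example@example.com" tag -a v1.0.0 -m v1.0.0

clean=$(git describe --dirty)   # tag only: the tree matches the tagged commit
echo "more" >>notes.txt         # modify a tracked file without committing
dirty=$(git describe --dirty)   # same tag, now with a -dirty suffix
echo "$clean"
echo "$dirty"
```

The first value is the bare tag and the second carries the -dirty suffix, so a PDF generated from uncommitted changes can be recognized by its version string.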

Workflow

  1. Create a file in Pages (or modify an existing one).
  2. From the repository, run the script with p pages.
  3. Commit the file pairs (Pages and XHTML).

This is what the source document looks like: (image: Pages)

This is what the PDF looks like: (image: PDF)

This is what the diff looks like when I make a small change to the document: (image: SourceTree diff)

Script

error() { RED='\033[0;31m' ; NC='\033[0m' ; printf "${RED}%s${NC}\n" "$*" ; exit 1 ; }

cmd_convert() { /Applications/LibreOffice.app/Contents/MacOS/soffice --convert-to "$@" >/dev/null ; }

cmd_describe() { git describe --dirty ; }

cmd_pages() {
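    # Default to every Pages file in the current directory when no files are given.
    [ $# -eq 0 ] && set -- *.pages

    version=$(cmd_describe)
    [ -z "$version" ] && error "No version information"

    # Quoting "$@" keeps file names that contain spaces intact.
    for src in "$@" ; do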
        base="${src%.*}"
        echo "• Process $src"

        echo "=> Generate opendoc"
        cmd_convert odt "$src"

        echo "=> Apply substitutions"
        unzip "$base.odt" content.xml >/dev/null
        LC_ALL=C sed -i '' 's/{{version}}/'"$version"'/g' content.xml
        zip -m "$base.odt" content.xml >/dev/null

        echo "=> Generate xhtml"
        cmd_convert xhtml "$base.odt"
        tidy --indent yes --indent-attributes yes --wrap 0 -o "$base.xhtml" "$base.xhtml" >/dev/null 2>&1

        echo "=> Generate pdf"
        cmd_convert pdf "$base.odt"

        echo "=> Remove opendoc"
        rm "$base.odt"
    done
}

I wrap my code snippets in functions so that I can drop them into a larger script. You can learn more about this in Create a self-contained Bash script.
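As a rough sketch of how a command such as p pages might reach cmd_pages, assume a small dispatcher at the bottom of the larger script. The main function and the cmd_hello stub are hypothetical names used only for this illustration; the original script may be wired differently.

```shell
# Hypothetical stub command, standing in for cmd_pages or cmd_convert.
cmd_hello() { echo "hello $*" ; }

# Hypothetical dispatcher: the first argument selects a cmd_* function,
# and the remaining arguments are passed through unchanged.
main() {
    name="$1"
    [ -z "$name" ] && { echo "usage: p <command> [args]" >&2 ; return 1 ; }
    shift
    if command -v "cmd_$name" >/dev/null 2>&1 ; then
        "cmd_$name" "$@"
    else
        echo "unknown command: $name" >&2
        return 1
    fi
}

main hello world
```

In a real script the last line would be main "$@", so that p pages foo.pages ends up calling cmd_pages foo.pages.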

This script is invoked as p pages. I can call it with parameters, as in p pages foo.pages bar.pages, or without any parameters. Please note that if you use parameters and the file names have spaces, then you will need to either wrap the file name in quotes (p pages "foo bar.pages") or escape the spaces with a backslash (p pages foo\ bar.pages).
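The reason is that the shell splits unquoted arguments at spaces before the script ever sees them. A minimal sketch, using a hypothetical count_args helper that only reports how many arguments it received:

```shell
# Hypothetical helper: print how many arguments were received.
count_args() { echo "$#" ; }

count_args foo bar.pages      # two arguments: "foo" and "bar.pages"
count_args "foo bar.pages"    # one argument, thanks to the quotes
count_args foo\ bar.pages     # one argument, thanks to the backslash
```

Without the quotes or the backslash, the script would be asked to process two files that do not exist.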

The script uses helper functions described in the next section.

  1. This function allows us to print a message in red using non-printing escape sequences and to exit the program with a non-zero exit code.
  2. We call the soffice program to do the conversions. Our script allows us to call this command independently. So we could, for example, call p convert odt Untitled.pages.
  3. This function gives us the version string that we substitute in the documents.
  4. We open the cmd_pages function.
  5. We allow the user to specify the file(s) to process.
  6. If the user doesn’t specify a file, then we process all the Pages files in the current directory.
  7. We get version information from the git repo. This will result in a string like v1.0.0-18-g6d8992d.
  8. If the version isn’t set properly, we throw an error.
  9. We open the for loop for each item in files. The src variable will be set to the file we are currently processing.
  10. We set a variable called base to the name of the file without the extension. If the src is foo.pages, then base is set to foo.
  11. We convert the Pages file to OpenDoc. This allows me to make changes to the file. I found it much more difficult to modify the PDF file and I didn’t want to alter the Pages file.
  12. The OpenDoc file is actually a ZIP of a collection of files. We pull out the content.xml file which contains the string we want to change.
  13. We use sed to replace the string {{version}} with the version string we obtained from git describe. In previous attempts at this script, I ran into the message sed: RE error: illegal byte sequence when I ran the sed command. Adding LC_ALL=C solved this problem. I didn’t experience the error when I ran it on content.xml but I left it in just in case.
  14. We move the modified content.xml back into the ODT file.
  15. We convert the OpenDoc file to XHTML.
  16. We use the tidy command to beautify the XHTML. This makes the diff easier to read.
  17. We convert the OpenDoc file to PDF. This ensures that the file I distribute includes the version string.
  18. We remove the OpenDoc file.
  19. We close the for loop.
  20. We close the cmd_pages function.
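Two of the steps above, stripping the extension and substituting the version, can be tried in isolation. This is a minimal sketch: the file name and the XML snippet are made up for illustration, and the text is piped through sed instead of being edited in place, so it does not depend on the macOS-style sed -i '' form.

```shell
# Strip the extension the same way the script does.
src="foo bar.pages"
base="${src%.*}"
echo "$base"

# Replace the {{version}} placeholder in a sample line of text.
# The XML snippet is invented for this example; the version string is the
# sample value mentioned in the walkthrough.
version="v1.0.0-18-g6d8992d"
stamped=$(printf '%s\n' '<text:p>Version {{version}}</text:p>' |
    LC_ALL=C sed 's/{{version}}/'"$version"'/g')
echo "$stamped"
```

The pattern ${src%.*} removes the shortest suffix that starts with a dot, so only the final extension is dropped and spaces in the name are left alone.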