Alfresco Google Docs integration – not very usable for advanced formatting

Feb 17, 2013

I have done some tests using the new Google Docs (Google Drive) integration in Alfresco 4.2. You can see how it all works in this video created by Jeff Potts of Alfresco.
The Alfresco engineers have all done a great job here, and I would love to use this as my primary way of editing office documents, if it were not for what the output of documents with advanced formatting actually looks like.

The editing process goes like this: you create your file in Alfresco, where it is stored by default as an Office Open XML (OOXML) file, the format introduced with Microsoft Office 2007. You then check the file out, which in this case means transferring it to Google Docs. This is where the first file format conversion takes place, from OOXML to the proprietary Google Docs format, and with it the first formatting loss. You make your edits and save back to Alfresco. That triggers a second conversion, from the Google Docs format back to OOXML, and with it some more formatting loss.
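To make the two lossy steps concrete, here is a toy model in Python. It is only a sketch of the idea, not how Alfresco or Google Docs actually convert files: a document is reduced to the set of formatting features it uses, and the feature names and both "supported" sets are invented for illustration.

```python
# Toy model: a document is just the set of formatting features it uses.
# All feature names and both "supported" sets are invented for illustration;
# they do not describe what the real converters handle.

IMPORT_SUPPORTED = frozenset({"bold", "headings", "tables", "footnotes"})
EXPORT_SUPPORTED = frozenset({"bold", "headings", "tables"})


def convert(features, supported):
    """Keep only the features the target converter understands."""
    return set(features) & supported


def edit_round_trip(features):
    """One edit: OOXML -> editor format on check-out, editor format -> OOXML on save."""
    in_editor = convert(features, IMPORT_SUPPORTED)  # first conversion
    saved = convert(in_editor, EXPORT_SUPPORTED)     # second conversion
    return in_editor, saved


original = {"bold", "headings", "tables", "footnotes", "text_boxes", "custom_styles"}
in_editor, saved = edit_round_trip(original)

print("lost on check-out:", sorted(original - in_editor))
print("lost on save:     ", sorted(in_editor - saved))
```

In this model anything the import step does not understand disappears before you even start editing, and anything the export step cannot write back disappears on save, even if you never touched it.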

You can clearly see that some formatting gets lost on each edit, and you cannot be sure that the end result is what you expected. I also tested with OpenDocument format files; the formatting that gets lost is different, but the basic problem is the same.
As test documents I used OOXML files from ooninja.

I actually think the Google Docs editor is OK for most situations, and if you print the document or export it to PDF while still in Google Docs, the look and formatting match what you see on screen.

So what can be done about this situation?

Not much, until Google makes its proprietary document format available. Even if the format conversion filters improve a lot, I think there will be formatting losses as long as you have to convert back and forth on each edit. But if you could actually retrieve the file in Google Docs format, Alfresco could store that file in the repository. The next time you need to edit, it would just be uploaded to Google Docs again; it’s their own format, so there should be no formatting loss. The preview could be generated at check-in by exporting a PDF from Google Docs. We shouldn’t need to involve other office formats at all.
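The workflow I have in mind could be sketched roughly like this in Python. Everything here is hypothetical: the native Google Docs payload cannot be retrieved today, and StoredDocument, check_out, check_in and fake_export_pdf are names made up for the sketch, not Alfresco or Google APIs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class StoredDocument:
    native: bytes       # editor-native payload, kept opaque by the repository (hypothetical)
    preview_pdf: bytes  # rendering produced by the editor at check-in


def check_out(doc: StoredDocument) -> bytes:
    """Hand the native payload to the editor unchanged: no conversion step."""
    return doc.native


def check_in(native: bytes, export_pdf: Callable[[bytes], bytes]) -> StoredDocument:
    """Store whatever the editor returns, plus a preview exported by the editor."""
    return StoredDocument(native=native, preview_pdf=export_pdf(native))


def fake_export_pdf(native: bytes) -> bytes:
    # Stand-in for the editor's PDF export; a real one would render the document.
    return b"%PDF-preview-of:" + native


doc = check_in(b"native-v1", fake_export_pdf)
for _ in range(3):
    payload = check_out(doc)                  # edit session starts
    doc = check_in(payload, fake_export_pdf)  # saved again without edits

print(doc.native == b"native-v1")
```

Because the repository never interprets the payload, opening and saving the document any number of times leaves it byte-for-byte identical; only the preview is derived from it.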

As stated initially, I think the Alfresco engineers have done as good a job as possible. The problem here is file format lock-in, and only Google can change that. They will probably argue that there is no lock-in, since you can export content in many different formats. And it may be that it is technically very difficult: I’m still thinking ‘file’ when I think about a document, but a document stored in Google Docs format may very well be many fragments split up in a database.