Closing Up the Open Format Debate
This past week, Adobe and Microsoft both upped the ante in a file format war with stakes that are huge for both companies and their business customers. Here, we make some sense out of the rhetoric and poker-face bluffs.
At the core is the definition of "open" and "open standard," labels that matter to proponents of OASIS ODF (OpenDocument Format), Microsoft's OOXML (Office Open Extensible Markup Language) formats and Adobe's PDF (Portable Document Format). Adobe is the only one of the three camps not claiming "open" as part of its format's name, even though the company has the most experience of the three with file format standardization.
In the week's news, open-source vendors working with Microsoft released an ODF-OOXML translator. Additionally, Adobe submitted PDF 1.7 to AIIM (Association for Information and Image Management), which will work on submission to ISO (International Organization for Standardization) for adoption as an accepted standard. Tomorrow, Microsoft's OOXML ISO submission reaches its first important milestone--the close of the contradiction period, which determines whether the submission contradicts, or would invalidate, another adopted standard.
Rightly Defining Open
"Open" with respect to file formats has become a hot topic among some governmental agencies, because of mandates that citizens have open and free access to information. The definitions of "open" and "standard" are cloaked in rhetoric and doublespeak, particularly among ODF supporters and OOXML advocates, as opposing camps seek a stronger position for their formats.
In a speech two years ago, Eric Kriss, former Massachusetts Secretary of Administration & Finance, offered a succinct definition: "Open Formats are specifications for data file formats based on an underlying open standard, developed by an open community, and affirmed by a standards body; or de facto format standards controlled by other entities that are fully documented and available for public use under perpetual, royalty-free, and nondiscriminatory terms. An example is TXT text and PDF document files."
Kriss offered this "open" definition before Microsoft received Ecma standards approval for OOXML. Under his definition, OOXML arguably qualifies as an open standard because of Ecma control and Microsoft's perpetual and royalty-free licensing terms.
However, OOXML fails to meet a second definition, which I affirm: When presented with two so-called open standards of like purpose, only the more truly accessible of the two can be called open. By that definition, OOXML is open in name only. XML, upon which OOXML is based, is more open. Additionally, Microsoft's format takes highly open XML and closes some of its openness. The same criticism applies to ODF.
In the video "Web 2.0 ... The Machine is Us/ing Us," Michael Wesch, assistant professor of Cultural Anthropology at Kansas State University, aptly explains what XML is supposed to do. His definition contradicts Microsoft's position on OOXML and the need for ODF translators. Wesch presented the video as part of a program studying digital ethnography.
Wesch's video explains that HTML stylistic elements "define how content would be formatted. In other words, form and content become inseparable in HTML." By contrast, XML "does not define the form. It defines the content...So the data can be exported free of formatting constraints."
The KSU assistant professor defines XML's purpose as I long have understood it: free data exchange beyond formats and content consumable pretty much anywhere. XML is truly open, because as he explains, "XML facilitates automated data exchange," and it does so across application, database and Website boundaries.
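To make Wesch's distinction concrete, here is a minimal sketch in Python. It assumes a made-up XML vocabulary (the `articles`, `article`, `title` and `topic` tags are hypothetical, as are the function names) and shows the same content being consumed by two different programs, each applying its own form.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML vocabulary: the tags say what the content is, not how it looks.
SAMPLE = """<articles>
  <article>
    <title>Closing Up the Open Format Debate</title>
    <topic>file formats</topic>
  </article>
  <article>
    <title>Another Post</title>
    <topic>standards</topic>
  </article>
</articles>"""


def extract_records(xml_text):
    """Return one dict per <article>, keyed by child tag name."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: child.text for child in article}
        for article in root.findall("article")
    ]


def as_plain_text(records):
    """One consumer: render the records as plain text lines."""
    return "\n".join(f"{r['title']} ({r['topic']})" for r in records)


def as_html(records):
    """Another consumer: apply its own formatting to the same data."""
    items = "".join(f"<li><b>{r['title']}</b></li>" for r in records)
    return f"<ul>{items}</ul>"


records = extract_records(SAMPLE)
print(as_plain_text(records))
print(as_html(records))
```

Neither consumer needs to know which authoring tool produced the data. Each reads the content and decides on presentation for itself.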
Taking the Os from XML
By Wesch's definition, XML is more open than OOXML. Two Os do not an open XML make. Microsoft's own definition says it all. On Friday, I spoke to Microsoft XML architect Jean Paoli for more than an hour about file formats. I asked why ODF translators are even necessary. He stressed the importance of maintaining formatting, which I repeatedly argued is an artificial constraint. Microsoft's definition ties an XML-based format to the authoring tool that defines form and applies formatting constraints.
Adopted photo formats gif, jpg and png are examples of how open or standard formats should work. Each one is an accepted standard easily and widely supported by software applications or Web browsers. Pictures taken or maintained in a proprietary RAW file can be edited in the camera manufacturer's software or third-party application, like Adobe Photoshop, Apple Aperture, Corel PaintShop Pro or Microsoft Picture It. While some editing capabilities are lost when files are saved as gif, jpg or png, end users can be confident that the images will look the same regardless of the supporting program used to open the file.
XML is even more open and less constrained than the photo formats. But XML's openness doesn't necessarily extend to a format. Raw XML and an XML format are not the same. Microsoft's rhetoric touts OOXML openness. Instead, formatting constraints make content less open. In part, Paoli described the formatting approach as necessary to meet Microsoft's objective of offering backward compatibility with older Office file formats.
While XML is not a format, it is designed to fill the role of a "universal file format," said Roger Kay, founder of EndPoint Technologies. The ODF-OOXML "translators are a workaround," he asserted.
To emphasize the point, Kay joked, "The world's going to end and there'll be no more text readers! I've got text readers hidden in a cave in Eastern Oregon." In his view, any productivity suite should be able to open and read a single, universally adopted format.
Paoli takes the position that "Open XML is going to be widely adopted. It is not dependent on any Microsoft technology or platform." He may be right, but the fact remains that OOXML cannot be exported free of formatting constraints, as XML could be. XML can be removed from Microsoft's format and freely exported, but easy accessibility of the data remains constrained by the applied formatting.
If OOXML were truly open, the data could be extracted and consumed in a meaningful way across applications.
What About ODF and PDF?
While Microsoft and OASIS diverge on their approach to XML-based formats, they're pretty much the same with respect to openness--or lack thereof. ODF also meets Kriss' open definition, and in some ways more so than OOXML. ISO certified ODF as a standard last year.
Paoli described ODF as an immature file format compared to OOXML, particularly in spreadsheet functions. "We really [don't] have a lot of demand from our customers, and ODF is not really that stable," he said.
Both formats are technically 1.0, so from that perspective they're equally mature. Paoli really means feature sophistication with respect to Office. But such an implication blurs the line between authoring tool and file format. Microsoft treats them as inseparable, whether the product is its own or OpenOffice. If Microsoft truly embraced the vision of what XML could do, data and its meaningful consumption wouldn't be dependent on formatting constraints applied by an authoring tool. The same criticism could be applied to ODF supporters.
Both formats are XML-based, although their approaches to tagging and representing data differ. But neither truly embraces the real format constraint-freeing potential of pure XML.
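As a rough illustration of how the tagging differs, the sketch below reads the same sentence from a simplified ODF paragraph and a simplified OOXML (WordprocessingML) paragraph. The fragments are hand-written and stripped down for illustration (real files wrap them in far more markup), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by each format's text markup.
ODF_TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OOXML_W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Simplified, hand-written fragments carrying the same sentence.
ODF_FRAGMENT = f'<text:p xmlns:text="{ODF_TEXT_NS}">Hello, world</text:p>'
OOXML_FRAGMENT = (
    f'<w:p xmlns:w="{OOXML_W_NS}"><w:r><w:t>Hello, world</w:t></w:r></w:p>'
)


def odf_paragraph_text(fragment):
    """ODF keeps the text directly inside the paragraph element."""
    root = ET.fromstring(fragment)
    return "".join(root.itertext())


def ooxml_paragraph_text(fragment):
    """OOXML nests the text in run (w:r) and text (w:t) elements."""
    root = ET.fromstring(fragment)
    return "".join(t.text or "" for t in root.iter(f"{{{OOXML_W_NS}}}t"))


print(odf_paragraph_text(ODF_FRAGMENT))
print(ooxml_paragraph_text(OOXML_FRAGMENT))
```

The content is identical, but a consumer has to know each format's element names and nesting before it can read either one.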
Adobe's AIIM submission puts PDF under a similar definition of open as the other two formats.
"The term open standard has different definitions depending on who you're talking to," said Sarah Rosenbaum, Adobe's director of product management. PDF "will be a standard like the way ODF is."
The standards process is important for Adobe, particularly in the government markets, where PDF has huge traction. Adobe already has a couple of PDF variants--one for advertising, another for archival documents--approved by ISO as standards.
I'll reaffirm the position taken when working as an analyst: PDF poses the greatest competitive threat to Microsoft formats. Period. Competitively, Microsoft's quest for OOXML recognition as being open or standards-based is as much about PDF as ODF. For ODF, Microsoft pushes back against a potential upstart. PDF is a long-established competitor continuing to win customers.
Over the coming year, particularly as OOXML and PDF move through standards bodies, the rhetoric over the terms "open" and "standard" will likely get murkier. Some ODF advocates are already pushing hard against OOXML. As long as there is a need for translators or no universal file format--a role XML could play--each side can talk the talk but not walk the walk. By my definition, none of the formats is truly open or offers a truly meaningful way for enterprises to maximize the value of their data.


Comments (10)
Sorry but a lot of what you say does not make any sense at all.
A standard is something which is defined.
An open standard is one which people can freely use in their own products.
ODF and OOXML are both open standards.
ODF can be said to be immature because it fails to define some important things such as what functions are available in spreadsheets. This leaves it up to the various implementers to decide which functions they will support and how they calculate the results. Thus an ODF workbook may not calculate the same or even at all when opened with different implementations of the "standard".
XML can be used to represent just about anything but to be useful you first have to define a schema which says how you will use XML to represent your things. ODF and OOXML are effectively different schemas for representing similar things (office documents). The schemas are slightly different hence the need for translators to convert files between the two schemas.
You say that formatting should be irrelevant to file formats but that seems to imply that you think all office documents should be written in plain text with no headings, no bold or italic, no bullet points, page headers, etc. Such "formatting" makes documents easier to read, more attractive, more searchable. Are you sure you want to remove all formatting from all documents?
Both ODF and OOXML make it easy to extract the plain text of a document from the file, because all text is identified with the appropriate schema defined tags. OOXML goes further by allowing the files to also include custom defined XML schema tags to mark pieces of text as having special meaning such as "CustomerName" or "InvoiceValue". Such custom XML markup makes it possible to automatically create OOXML documents containing business data or to consolidate business data found in these documents into centralised IT systems.
Simon Jones
Contributing Editor
PC Pro Magazine
Posted by Simon Jones | February 5, 2007 6:24 AM
Open XML (OOXML) is a total joke. It's what's referred to as a "monopoly enabler". It has proprietary 'extensions', its specs are incomplete, bugs are part of the 'standard', and the list of issues goes on and on. It's a case of creating a 'standard' to suit an application's development history, rather than defining a standard and then implementing it.
Anyone who defends OOXML is selling something (or is being paid/bribed). Be very suspicious.
Posted by Roy Schestowitz | February 5, 2007 7:39 AM
The only thing which might be called "proprietary extensions" are the ability to embed WMF and EMF files in OOXML or possibly the compatibility settings which say things like "autospaceLikeWord95" without fully defining what that means. The former I think is perfectly justified. The latter could be more explicit. Perhaps it will be fixed as OOXML moves towards becoming an ISO standard.
"Bugs are part of the standard" must refer to the fact that all Excel versions think that the year 1900 was a leap year because that was what was originally implemented in Lotus 1-2-3. Yes, it would have been good to eliminate this bug from the standard. However, the problem only affects date calculations for Jan & Feb 1900 so isn't a huge problem.
The full list of objections to OOXML becoming an ISO standard is at http://www.grokdoc.net/index.php/EOOXML_objections
I do not think that ODF is any different in the way it was developed. ODF file formats were defined by Sun/OpenOffice and then submitted to OASIS as a proposed standard. OASIS didn't define a standard for documents and then get software manufacturers to write suites around them.
As for the implied accusation that I am paid or bribed by someone - I make a living from advising people on office applications and writing about that work. I've investigated both ODF and OOXML and come to my own conclusion that, while both are capable, and acknowledging the limitations and failings of both, OOXML has distinct advantages over ODF.
Simon Jones
Contributing Editor
PC Pro Magazine
Posted by Simon Jones | February 5, 2007 8:48 AM
Under his definition, OOXML arguably qualifies as an open standard
Only if you ignore "developed by an open community." OOXML was not developed by an open community; it was developed by Microsoft.
Posted by Swashbuckler | February 5, 2007 10:12 AM
Joe-
You & Swash, here, are firing on all cylinders.
MOOXML is a crock. Simon has his head so far up his proverbial he needs a glass stomach. (He's desperately protecting the ad revenue stream from Microsoft. This is transparent.)
So, the spreadsheet issue for ODF is not one to hang your hat on since the work is under weigh and you'll have nothing to object to this spring when the OASIS Formula sub-committee signs off its work.
As for immature, that's plainly an immature thing to say. Talk about immature...the 6,000-page MOOXML specification document is that long because you, Jean, dumped all you had documented regarding your file formats (much made up off the tops of Brian Jones' head at the last minute because it wasn't documented) into a duffel bag when the (Massachusetts) house caught fire, as if they were the family jewels.
You guys are hilarious!
Posted by Sam Hiser | February 5, 2007 11:23 AM
Saying "it is a crock" and making disparaging personal remarks does not shed much light on the subject.
I don't have to protect any ad revenues. I can write whatever I like and I frequently criticize Microsoft.
I'll be very glad when ODF 1.2, including the specifications of spreadsheet functions is published and even more glad when the vendors update, or certify, their products to conform.
I will really celebrate, however, when the ODF specification includes the ability to include custom XML schemas so that business data can be automatically included in and parsed from the files. This ability, built in to OOXML, offers significant business advantages over ODF as it currently stands or is planned.
Simon Jones
Contributing Editor
PC Pro Magazine
Posted by Simon Jones | February 5, 2007 2:04 PM
"Saying "it is a crock" and making disparaging personal remarks does not shed much light on the subject."
Sadly, that's the nature of the dialogue whenever the standard ABM or OSS crowd appears. Next they'll be referring to it as M$. In any event, appreciated your input - nice job.
Posted by bob | February 5, 2007 9:46 PM
Roy, what proprietary extensions are you talking about? Why don't you produce evidence to back up your claims?
Sam, sadly, the spreadsheet issue is not the only one present in ODF. There's many many more missing definitions. Could you tell me please, where in the ODF spec I will find a description of this formatting element:
false
You won't be able to, because it's not there. Needless to say, that's not the only one you will find in ODF "conforming" documents. One might be tempted to call them Open Office proprietary extensions. ;)
Please respond only on technical grounds, ad hominem attacks and straw men will only further discredit your statements. Please do note that I am not a paid shill, am not employed by Microsoft, and don't stand to gain anything monetarily via this argument. I also have been a happy FreeBSD and Linux user for about 7 years and generally support the libre software community.
I will not support, carte blanche, an Open Source position funded by large companies such as IBM and Sun (both demonstrably as evil as MS) and neither will I support large angry and vocal hypocritical masses resorting to the very same topics they bash their "opponent" with
Posted by Jason Gurtz | February 8, 2007 8:45 AM
blog stripped element, missing from ODF spec, angle brackets replaced by curly braces:
{config:config-item config:name="DoNotJustifyLinesWithManualBreak" config:type="boolean"}false{/config:config-item}
Posted by Jason Gurtz | February 8, 2007 8:54 AM
@Swashbuckler: So, Open means for you "created by an Open Community"? well, ODF does not enter in that kind, remember, it was developed by Sun (I guess it is not an organization or something :P)
Posted by cprieto | August 29, 2007 9:17 AM