A Deep Look at JSON vs. XML, Part 1: The History of Each Standard

Seva is a veteran of both enterprise and startups with 20 years of industry experience, and a UC Berkeley graduate in EECS and MSE.

From desktop to web and mobile, nearly all computer applications that we use today rely on one of two principal message standards: JSON and XML. Today, JSON is the most widely used format, but it only overtook XML within the last five years. A quick online search for “JSON vs. XML” will bring up countless articles and blog posts comparing the two standards, and together they amount to a growing bias that praises JSON’s simplicity and criticizes XML’s verbosity. Numerous articles insist that JSON is superior to XML due to its terse semantics and dismiss XML as an inefficient and confusing standard of the past. At first blush, JSON seems to be the more popular of the two, so is JSON simply better than XML? The battle of “JSON vs. XML” might go to JSON on the surface, but at depth, there is more than meets the eye.

In Part 1 of this article, we will:

  1. Take a closer look at the history of the web to uncover the original purpose of XML and JSON.
  2. Consider the evolution of software trends in recent years to ascertain why JSON became more popular than XML.

The History of JSON and XML

To uncover the reason for JSON’s popularity over XML, let’s explore the history of the web and how its evolution from Web 1.0 to Web 2.0 influenced trends in development.

The Web 1.0: HTML

The early 1990s was the dawn of Web 1.0. HTML was introduced in 1991 and was widely adopted by universities, businesses, and government organizations as the language of the web. HTML came from SGML, or “Standard Generalized Markup Language,” invented in the 1970s by IBM. In addition to mass adoption, HTML saw mass adaptation—extensions were embedded to support multimedia, animation, online applications, eCommerce, and more. As a derivative of SGML, HTML lacked a strict specification to restrain companies from freely expanding it to fulfill requirements that were beyond the original concept. The contest for the most popular web browser between Netscape and Microsoft yielded rapid progress, but it also led to relentless fragmentation of the standard. The fierce competition resulted in a “divergence catastrophe” as the augmentation of HTML by the two companies caused the browsers to support their own unique versions of HTML. This divergence catastrophe became a huge problem for web applications as developers struggled to write interoperable code for the browsers.

The Web 1.1: XML + HTML = XHTML

In the late 1990s, a group of people, including Jon Bosak, Tim Bray, James Clark, and others, came up with XML: the “eXtensible Markup Language.” Like SGML, XML is not itself a markup language, but a specification for the definition of markup languages. XML was an evolution of SGML, designed to provide a means to define, and to enforce, structured content. Considered “the holy grail of computing,”1 the XML language endeavored “to solve the problem of universal data interchange between dissimilar systems” (Dr. Charles Goldfarb)2. In light of the ongoing fragmentation of HTML, the World Wide Web Consortium (W3C) was formed to foster compatibility and agreement among the industry in the adoption of new standards for the World Wide Web.3 The W3C set about reshaping HTML as an XML application, with the result being XHTML.

XHTML was a big initiative that brought attention to XML, but it is only one small part of what XML is all about.

XML provided a way for the industry to specify, with strict semantics, custom markup languages for any application. With the keyword being “strict semantics,” XML defined a standard that could assert the integrity of data in any XML document, of any XML sub-language. For software companies developing distributed enterprise applications that interface with disparate systems, a markup language that could assert the integrity of its data was significant. By defining structured content with XML, companies leveraged the features of this technology to interoperate with any platform, enforce data integrity of every data interchange, and to systematically reduce the software risk of their systems. For the industry, XML provided a technology to store, communicate, and validate any kind of data, in a form that applications on any platform can easily read and process. For HTML, XML promised to solve the “divergence catastrophe.”

Java and .NET

In the early 2000s, the web was governed by two companies: Sun and Microsoft. At the time, the landscape of programming languages was heavily slanted to the server side. The common architecture for web applications relied on servers rendering HTML pages on the back end to be delivered to the browser. This approach highlighted back-end technologies, which in turn popularized the leading back-end platforms: Java and C#.NET. Developed by Sun Microsystems, Java led the new generation of object-oriented programming languages that solved the cross-architecture problem with its novel “write once run anywhere”4 approach. Microsoft followed with .NET, C#, and the Common Language Runtime (CLR) and set its eyes on XML as the choice approach to solve the data interoperability puzzle. Microsoft became XML’s greatest advocate, with the company choosing XML as an integral part of its prominent .NET initiative. Advertised as “a platform for XML web services,”5 .NET applications were architected to use XML for communication with other platforms. Selected as Microsoft’s data interchange standard, XML was integrated into its flagship server products, such as SQL Server and Exchange.

The Web 1.2: AJAX

The delivery of pre-rendered HTML pages to the browser was not scalable, and the web needed an alternative. With each user action requiring a fresh page to be loaded from the server, process load and bandwidth consumption became a concern as more people swelled the web.

Netscape and Microsoft endeavored to tackle this problem with asynchronous content delivery, building on ActiveX and JavaScript. In 1998, the Microsoft Outlook Web Access team developed the concept behind XMLHTTP,6 which first shipped in Internet Explorer as an ActiveX control and was later implemented by Mozilla, Safari, Opera, and other browsers in JavaScript as the XMLHttpRequest object.

AJAX was born from Microsoft's ActiveX and Netscape's JavaScript.

The term AJAX—short for “Asynchronous JavaScript and XML”—has come to represent a broad range of web technologies that can be used to implement web applications that communicate with servers in the background without requiring a page to be reloaded. In the article that coined the term AJAX,7 8 Jesse James Garrett outlined the main concepts:

  1. HTML (or XHTML) and CSS for presentation.
  2. The Document Object Model (DOM) for dynamic display of, and interaction with, data.
  3. XML for the interchange of data, and XSLT for its manipulation.
  4. The XMLHttpRequest object for asynchronous communication.
  5. JavaScript to bring these technologies together.
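As a rough sketch of how items 4 and 5 fit together, the function below requests a URL in the background and hands the response text to a callback. The name loadInBackground and its third parameter are hypothetical, introduced only for illustration; the injectable constructor exists so the sketch can also be exercised outside a browser, where the XMLHttpRequest global is not defined.

```javascript
// Hypothetical helper (not from the original article): fetch `url`
// in the background and pass the response text to `onDone`.
// `XhrCtor` is an assumption of this sketch; in a browser it can be
// omitted and the built-in XMLHttpRequest is used.
function loadInBackground(url, onDone, XhrCtor) {
  const Ctor = XhrCtor || XMLHttpRequest;
  const xhr = new Ctor();

  // The third argument `true` makes the request asynchronous,
  // so the page stays responsive while the server works.
  xhr.open("GET", url, true);

  xhr.onreadystatechange = function () {
    // readyState 4 means the whole response has arrived.
    if (xhr.readyState === 4 && xhr.status === 200) {
      onDone(xhr.responseText);
    }
  };

  xhr.send();
}
```

In a browser, a call such as loadInBackground("/inbox.xml", showInbox), where both the URL and the showInbox callback are placeholders, would update part of the page without a reload.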

With asynchronous content delivery proving to reduce server load and save considerable bandwidth, AJAX was a game changer. The introduction of XMLHttpRequest to the browsers allowed developers to implement more complex logic in the front end. Google made a wide deployment of standards-compliant, cross-browser AJAX with Gmail in 2004 and Google Maps in 2005.9 And, in October 2004, Kayak.com’s public beta release was among the first large-scale eCommerce uses of AJAX.10

The Web 2.0: Single-page Applications

The adoption of AJAX as the scalable architecture for web applications led to the dawn of Web 2.0: the Single-page Application (SPA).11 Instead of reloading the entire page for each user interaction, SPAs dynamically rewrite the current page within the browser. In addition to a considerable reduction in server load and bandwidth consumption, the SPA approach allowed web applications to resemble desktop applications due to the seamless and uninterrupted experience during user interaction.

In April 2002, Stuart Morris wrote one of the first SPAs at slashdotslash.com12, and later the same year, Lucas Birdeau, Kevin Hakman, Michael Peachey, and Evan Yeh described a single-page application implementation in US patent 8,136,109.13 The patent described web browsers using JavaScript to display the user interface, run application logic, and communicate with a web server.

Google’s Gmail, Google Maps, and Kayak’s public beta sparked a new era of web application development. Browsers enabled with AJAX empowered developers to write rich applications for the web. The easy semantics of JavaScript made application development possible for programmers of any caliber. The barrier to entry into software development was greatly reduced, and individual developers around the world began contributing with libraries and frameworks of their own. Popular libraries like jQuery, which normalize AJAX behavior across browsers from different manufacturers, further progressed the AJAX revolution.

The Rise of JSON

In April 2001, Douglas Crockford and Chip Morningstar sent the first JSON message from a computer at Morningstar’s Bay Area garage. Crockford and Morningstar were trying to build AJAX applications well before the term “AJAX” had been coined, but browser support for what they were attempting to achieve was not good. They desired to pass data to their application after the page had loaded, but had not found a way to allow this to work across all browsers.

In 2001, the development of AJAX was just starting, and there was not yet an interoperable form of the XMLHttpRequest object in Internet Explorer 5 and Netscape 4. So Crockford and Morningstar used a different approach that worked in both browsers.

The first JSON message looked like this:

<html><head><script>
    document.domain = 'fudco';
    parent.session.receive(
        { to: "session", do: "test",
          text: "Hello world" }
    )
</script></head></html>

This message is actually an HTML document containing some JavaScript, and only a small part of the message resembles JSON as we know it today. Crockford and Morningstar were able to load data asynchronously by pointing an <iframe> to a URL that would return an HTML document like the one above. When the response was received, the JavaScript in the HTML would be run, and by sidestepping browser protections preventing sub-windows from accessing the parent, the object literal was passed back to the main frame of the application. This frame-based technique, sometimes called the “hidden frame technique,” was commonly used in the late 90s before the widespread implementation of XMLHttpRequest.14

This approach appealed to developers because it offered interoperability across all browsers. Since the message is just JavaScript, it doesn’t require any kind of special parsing. The idea of using JavaScript this way was so straightforward that Crockford himself said that he wasn’t the first person to do it—he claims that somebody at Netscape was using JavaScript to communicate information as early as 1996.15

Crockford and Morningstar realized they had something that could be used in all sorts of applications. They named their format JSON, which is short for “JavaScript Object Notation.” They began pitching it to clients but soon found that many were unwilling to take a chance on a novel technology that lacked an official specification. So Crockford decided he would write one. In 2002, Crockford bought the domain JSON.org and put up the JSON grammar and a reference implementation of a parser. The website is still up, though it now includes a prominent link to the JSON ECMA standard ratified in 2013.16 After putting up the website, Crockford did little more to promote JSON but soon found people submitting JSON parser implementations for different programming languages. JSON’s origin is clearly tied to JavaScript, but it became apparent that JSON was well-suited to interchange data between arbitrary languages.

JSON in AJAX

In 2005, Jesse James Garrett coined the term “AJAX” in a blog post, where he stressed that AJAX was not any one new technology but rather “several technologies, each flourishing in its own right, coming together in powerful new ways.”17 His blog post went on to describe how developers could leverage JavaScript and XMLHttpRequest to build new kinds of applications that were more responsive and stateful than the typical web page. He pointed to Gmail, Google Maps, and Flickr as examples of websites using AJAX techniques. Though the “X” in “AJAX” stood for XML, Garrett pointed to JSON as an entirely acceptable alternative. He wrote that “XML is the most fully developed means of getting data in and out of an AJAX client, but there’s no reason you couldn’t accomplish the same effects using a technology like JavaScript Object Notation or any similar means of structuring data.”17

JavaScript and JSON were unequivocally meant to be together. JSON’s semantics map directly to JavaScript, making it the native data interchange format for the language. Developers quickly found that JSON was easier to work with in JavaScript, and many came to prefer it to XML.
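A minimal sketch of that direct mapping, using the built-in JSON.parse and JSON.stringify functions found in modern JavaScript engines. The message contents are borrowed from the first JSON message shown earlier and are otherwise arbitrary.

```javascript
// A JSON text as it might arrive from a server (contents are illustrative).
const text = '{"to": "session", "do": "test", "text": "Hello world"}';

// JSON.parse turns the text into an ordinary JavaScript object,
// with no mapping layer or schema in between.
const message = JSON.parse(text);
console.log(message.text); // Hello world

// JSON.stringify goes the other way, from object back to JSON text.
const roundTrip = JSON.stringify(message);
console.log(roundTrip); // {"to":"session","do":"test","text":"Hello world"}
```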

As JSON drew the attention of the blogosphere, the proliferation of JSON had begun.

JSON is the native format for data in JavaScript applications. With the popularization of JavaScript in the last decade, more JSON messages have been created than any other data format. Writing applications in JavaScript almost mandates the use of JSON for data interchange. Other formats are possible, but they require more effort than with JSON. With JavaScript gaining popularity for application development, JSON followed closely in its wake as the easy-to-use and natively integrated data interchange format.

As far as the blogosphere is concerned, more articles, samples, and tutorials have been written about JavaScript (and hence JSON) than any other programming platform.

The history and evolutionary path of the web has played a significant role in the popularization of JSON. According to Stack Overflow, more questions are now asked about JSON than about other data interchange formats.18

According to Google Trends, a similar profile is seen comparing search interest for JSON and XML.19

Does the proliferation of JavaScript mean that JSON is better than XML?

Developer communities insist that JSON became more popular than XML because of its concise declarative scope and simple semantics. Douglas Crockford himself summarizes some of JSON’s advantages on JSON.org: “JSON is easier for both humans and machines to understand, since its syntax is minimal and its structure is predictable.”20 Other critics of XML have focused on XML’s verbosity, calling it “the angle bracket tax.”21 The XML format requires each opening tag to be matched with a closing tag, resulting in redundant information that can make an XML document significantly larger than an equivalent JSON document when uncompressed. And, perhaps more importantly, developers say: “it also makes an XML document harder to read.”22
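A rough illustration of the size argument, reusing the three fields from the first JSON message. The XML element names here are an assumed encoding chosen for this sketch; other XML designs (attributes instead of child elements, for instance) would give different numbers.

```javascript
// The same three fields, encoded two ways. The XML layout is one
// possible encoding picked for illustration, not a standard mapping.
const asJson = '{"to":"session","do":"test","text":"Hello world"}';
const asXml =
  "<message><to>session</to><do>test</do><text>Hello world</text></message>";

// Every XML element name appears twice (opening and closing tag),
// so the uncompressed XML text is longer for the same data.
console.log("JSON characters:", asJson.length);
console.log("XML characters:", asXml.length);
console.log("XML is longer:", asXml.length > asJson.length);
```

The comparison only measures uncompressed length; it says nothing about validation, namespaces, or the other features that the XML form can carry.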

JSON has been readily praised due to its simplicity and terse semantics, and XML labeled as an antiquated standard of the past due to its verbosity and seemingly excessive complexity. Many articles and blog posts offer a limited perspective when comparing JSON to XML, leading readers to believe that JSON is a replacement for XML. This is not the case!

The limited perspective offered by articles and blog posts has led readers to discount XML as obsolete, leaving many unaware of powerful features that may help them improve their software's architecture and resilience to change as well as systematically reduce software risk.

JSON is more popular than XML because of JavaScript’s dominance as the most widely used language of today. With JavaScript’s influence on software trends in the last decade, JSON continues to receive more attention than any other data interchange format. The blogosphere is quick to echo that “JSON is better than XML,” but more often than not, these statements are left unjustified, or are backed with simplistic comparisons regarding semantics and verbosity. So is either standard better than the other? The answer to this question can only come from a deeper evaluation of each standard.

Conclusion

From 1990 to today, the web has come a long way. The browser wars between Netscape and Microsoft led to a divergence catastrophe of HTML, and a solution was needed to save the web. XML was invented to formalize XHTML, and provided a holy grail solution for computing as a whole. From rendering of full HTML pages by back-end servers to AJAX and SPAs, trends in web architecture and browser development have brought focus onto JavaScript, steering developers worldwide toward JSON.

JSON’s popularity is correlated with that of JavaScript. With its ease of use and short learning curve, JavaScript has brought more new developers to write software than any other language. With JSON’s native integration with the most popular development platform, it is not surprising that more questions are asked on Stack Overflow about JSON than about any other data interchange format.

With software trends in recent years bringing more JavaScript developers to the industry, JSON has gained the title of "most popular data interchange format."

On the surface, the battle of “JSON vs. XML” goes to JSON, but at depth, there is more than meets the eye.

In Part 2 of this article, we will look closer at the technical strengths and weaknesses of JSON and XML, and evaluate the suitability of each standard for common applications and the enterprise. A closer look at “data interchange” will reveal the breadth of its influence on the software risk of the application as a whole. A deeper understanding of the fundamental differences between JSON and XML will allow developers to better evaluate the software risk of each standard in relation to the requirements of their project, empowering them to build software that is more stable and more resistant to bugs and future unknowns.

By the way, an interesting quirk of JSON is that you cannot convert JavaScript objects with bidirectional relationships, where child properties refer back to parent properties, into JSON. Attempting to do so with JSON.stringify throws “Uncaught TypeError: Converting circular structure to JSON.” For a hack around this limitation, see Bidirectional Relationship Support in JSON.
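A short sketch that reproduces the error. The object shapes and property names are made up for illustration. The replacer shown at the end is one simple way to avoid the error by dropping the back-reference; it is not necessarily the approach taken in the article linked above.

```javascript
// Two objects that point at each other: a bidirectional relationship.
const parent = { name: "parent", children: [] };
const child = { name: "child", parent: parent };
parent.children.push(child);

// Serializing the cycle fails with a TypeError.
let failure = null;
try {
  JSON.stringify(parent);
} catch (err) {
  failure = err;
}
console.log(failure instanceof TypeError); // true

// One workaround (an assumption of this sketch): a replacer function
// that omits the "parent" back-reference, leaving a plain tree.
const tree = JSON.stringify(parent, function (key, value) {
  return key === "parent" ? undefined : value;
});
console.log(tree); // {"name":"parent","children":[{"name":"child"}]}
```

Dropping the back-reference loses information, so code that reads the JSON would need to restore the parent links itself.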

References

1. The Internet: A Historical Encyclopedia. Chronology, Volume 3, p. 130 (ABC-CLIO, 2005)

2. Handbook Of Metadata, Semantics And Ontologies, p. 109 (World Scientific, December 2013)

3. WebDiy.org - World Wide Web Consortium (W3C) - History

4. "JavaSoft ships Java 1.0" (Sun Microsystems, January 1996)

5. Spatially Enabling the Next Generation Internet (David Engen, January 2002)

6. The story of XMLHTTP (AlexHopmann.com, January 2007)

7. Beginning Ajax - Page 2 (Wiley Publishing, March 2007)

8. Ajax: A New Approach to Web Applications (Jesse James Garrett, February 2005)

9. A Brief History of Ajax (Aaron Swartz, December 2005)

10. "What is Kayak.com?" (Corporate Backgrounder, October 2008)

11. Inner-Browsing: Extending the Browsing Navigation Paradigm (Netscape, May 2003)

12. "A self contained website using DHTML" (SlashDotSlash.com, July 2012)

13. Delivery of data and formatting information to allow client-side manipulation (US Patent Bureau, April 2002)

14. "What Is Ajax?" Professional Ajax, 2nd ed. (Wiley, March 2007)

15. Douglas Crockford: The JSON Saga (Yahoo!, July 2009)

16. ECMA Standard 404 (ECMA International, December 2017)

17. Ajax: A New Approach to Web Applications (Jesse James Garrett, February 2005)

18. Stack Overflow Trends (Stack Overflow, 2009-2019)

19. Google Trends (Google, 2004-2019)

20. JSON: The Fat-Free Alternative to XML (Crockford, 2006)

21. XML: The Angle Bracket Tax (Coding Horror, May 2008)

22. Xml Sucks (WikiWikiWeb, 2016)

Understanding the basics

Is JSON better than XML?

JSON is simpler than XML, but XML is more powerful. For common applications, JSON's terse semantics result in code that is easier to follow. For applications with complex requirements surrounding data interchange, such as in enterprise, the powerful features of XML can significantly reduce software risk.

How is XML different from JSON?

JSON is a data interchange format and only provides a data encoding specification. XML is a language to specify custom markup languages, and provides a lot more than data interchange. With its strict semantics, XML defined a standard to assert data integrity of XML documents, of any XML sub-language.

When should I use JSON and XML?

JSON is best for simple applications, developed to satisfy simple requirements surrounding data interchange. XML is best for applications with complex requirements surrounding data interchange, such as in enterprise.

Is JSON more secure than XML?

Neither JSON nor XML is more secure than the other. For both standards, security is implemented in a logical layer above the message layer.

Is JSON faster than XML?

JSON is faster because it is designed specifically for data interchange. JSON encoding is terse, which requires fewer bytes for transit. JSON parsers are less complex, which requires less processing time and memory overhead. XML is slower, because it is designed for a lot more than just data interchange.