Edi Diwan: Data interchange

Data exchange is the process of taking data structured under a source schema and transforming it into data structured under a target schema, so that the target data is an accurate representation of the source data.
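As a minimal sketch of this definition, the Python snippet below maps one record from a source schema to a target schema. The field names (first_name, last_name, birth_date, full_name, birth_year) and the function to_target are hypothetical and chosen only for illustration; a real exchange would follow whatever schemas the two parties have agreed on.

```python
def to_target(source_record: dict) -> dict:
    """Map a record in the (hypothetical) source schema to the target schema."""
    return {
        # The target schema stores one combined name field.
        "full_name": f"{source_record['first_name']} {source_record['last_name']}",
        # The target schema keeps only the year of an ISO 8601 date string.
        "birth_year": int(source_record["birth_date"][:4]),
    }


source = {"first_name": "Ada", "last_name": "Lovelace", "birth_date": "1815-12-10"}
target = to_target(source)
print(target)
```

Note that this particular mapping loses information (the month and day), so whether the target counts as an accurate representation depends on what the target schema is meant to capture.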

The following is an incomplete list of popular generic languages used for data exchange in multiple domains.

	Schemas	Flexible	Semantic verification	Dictionary-Taxonomy	Synonyms and homonyms	Dialecting	Web or ISO standard	Transformations	Lightweight	Human readable	Compatibility
XML	Yes^[1]	Yes	No	No	No	Yes	Yes	Yes	No	No	subset of SGML, HTML
Atom	Yes	Unknown	Unknown	Unknown	Unknown	Yes	Yes	Yes	No	No	XML dialect
JSON	No	Unknown	Unknown	Unknown	Unknown	No	Yes	No	Yes	No	subset of JavaScript
YAML	No^[2]	Unknown	Unknown	Unknown	Unknown	No	No	No^[2]	Yes	Yes^[3]	superset of JSON
REBOL	Yes^[6]	Yes	No	Yes	Yes	Yes	No	Yes^[6]	Yes	Yes^[4]
Gellish	Yes	Yes	Yes	Yes^[7]	Yes	Yes	ISO	No	Yes	Partial^[5]	SQL, RDF/XML, OWL

Nomenclature:

Schemas - Whether the language definition is available in a computer interpretable form.
Flexible - Whether the language enables extension of the semantic expression capabilities without modifying the schema.
Semantic verification - Whether the language definition enables semantic verification of the correctness of expressions in the language.
Dictionary-Taxonomy - Whether the language includes a dictionary and a taxonomy (subtype-supertype hierarchy) of concepts with inheritance.
Synonyms and homonyms - Whether the language includes and supports the use of synonyms and homonyms in the expressions.
Dialecting - Whether the language definition is available in multiple natural languages or dialects.
Web or ISO standard - Organization that endorsed the language as a standard.
Transformations - Whether the language includes a translation to other standards.
Lightweight - Whether a lightweight version is available, in addition to a full version.
Human readable - Whether expressions in the language are readable by humans without training.
Compatibility - Which other tools are possible or required when using the language.

XML for data exchange. XML is popular for data exchange on the World Wide Web for several reasons. First of all, it is closely related to the preexisting standards Standard Generalized Markup Language (SGML) and Hypertext Markup Language (HTML), so a parser written to support these two languages can be easily extended to support XML as well. For example, XHTML has been defined as a format that is well-formed XML but is still understood correctly by most (if not all) HTML parsers. This led to quick adoption of XML support in web browsers and in the toolchains used for generating web pages.

JSON for data exchange. JSON (JavaScript Object Notation) began as part of the JavaScript programming language and was split out into a low-level format for structured data exchange. Although it was not originally designed for data exchange, it proved useful for that purpose. In contrast to XML above, there exists no schema definition and no support for dialecting. The key benefits of JSON are its low overhead (the amount of data needed for structuring) compared to XML and its similarly wide support: every web browser that supports JavaScript can also process JSON.

YAML for data exchange. YAML is a language that was designed to be human-readable (and as such to be easy to edit with any standard text editor). Its notation is often similar to reStructuredText or a wiki syntax, which also try to be readable by both humans and computers. YAML 1.2 also includes a shorthand notation that is compatible with JSON, so any JSON document is also valid YAML; the reverse, however, does not hold.

REBOL for data exchange. REBOL is a language that was designed to be human-readable and easy to edit using any standard text editor. To achieve that it uses a simple free-form syntax with minimal punctuation, and a rich set of datatypes. REBOL datatypes like URLs, e-mails, date and time values, tuples, strings, tags, etc. respect the common standards. REBOL is designed to not need any additional meta-language, being designed in a metacircular fashion. The metacircularity of the language is the reason why e.g. the Parse dialect used (not exclusively) for definitions and transformations of REBOL dialects is also itself a dialect of REBOL. REBOL was used as a source of inspiration by the designer of JSON.

Gellish for data exchange. Gellish English is a formalized subset of natural English. It includes a simple grammar and a large, extensible English Dictionary-Taxonomy that defines general and domain-specific terminology (terms for concepts). The concepts are arranged in a subtype-supertype hierarchy (a taxonomy), which supports inheritance of knowledge and requirements. The Dictionary-Taxonomy also includes standardized fact types (also called relation types). The terms and relation types together can be used to create and interpret expressions of facts, knowledge, requirements and other information. Gellish can be used in combination with SQL, RDF/XML, OWL and various other meta-languages. The Gellish standard is being adopted as ISO 15926-11.

JSON, or JavaScript Object Notation, is a lightweight text-based open standard designed for human-readable data interchange. It is derived from the JavaScript scripting language for representing simple data structures and associative arrays, called objects. The official Internet media type for JSON is application/json, and the JSON filename extension is .json. The JSON format is often used for serializing and transmitting structured data over a network connection. It is used primarily to transmit data between a server and a web application, serving as an alternative to XML.
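The serialize-and-transmit cycle described above can be sketched with Python's standard json module. The record and its field names are hypothetical, and no actual network connection is made; the bytes in payload simply stand in for what might travel in a message body labeled application/json.

```python
import json

# Hypothetical record, used only for illustration.
record = {
    "name": "Ada",
    "languages": ["English", "French"],
    "active": True,
    "manager": None,
}

# Sender: serialize the structure to JSON text, then to UTF-8 bytes.
payload = json.dumps(record).encode("utf-8")

# Receiver: decode the bytes and parse them back into native data structures.
received = json.loads(payload.decode("utf-8"))

print(payload)
print(received == record)
```

Objects, arrays, strings, booleans and null survive the round trip unchanged here because each has a direct counterpart in Python (dict, list, str, bool, None).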

XML. XML has been used to describe structured data and to serialize objects. Various XML-based protocols exist to represent the same kind of data structures as JSON for the same kind of data interchange purposes. When data is encoded in XML, the result is typically larger than an equivalent encoding in JSON, mainly because of XML's closing tags. Yet, if the data is compressed using an algorithm like gzip there is little difference because compression is good at saving space when a pattern is repeated.
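One rough way to check this size claim is to encode the same records both ways and compare byte counts before and after gzip. The sketch below uses only the Python standard library; the data set, the element names (people, person) and the choice to encode every field as a child element are assumptions made for illustration, and the exact numbers will vary with the data.

```python
import gzip
import json
import xml.etree.ElementTree as ET

# Hypothetical data set: 200 small records with made-up fields.
records = [
    {"name": f"user{i}", "age": 20 + i % 50, "city": f"city{i % 10}"}
    for i in range(200)
]


def encode_json(rows):
    """Encode the rows as a JSON array of objects (UTF-8 bytes)."""
    return json.dumps(rows).encode("utf-8")


def encode_xml(rows):
    """Encode the rows as XML, one child element per field (UTF-8 bytes)."""
    root = ET.Element("people")
    for row in rows:
        person = ET.SubElement(root, "person")
        for key, value in row.items():
            ET.SubElement(person, key).text = str(value)
    return ET.tostring(root, encoding="utf-8")


json_raw, xml_raw = encode_json(records), encode_xml(records)
json_gz, xml_gz = gzip.compress(json_raw), gzip.compress(xml_raw)

print("raw bytes:    ", len(json_raw), "JSON vs", len(xml_raw), "XML")
print("gzipped bytes:", len(json_gz), "JSON vs", len(xml_gz), "XML")
```

Because the opening and closing tags repeat for every record, a compressor can replace them with short back-references, which is why the compressed sizes are expected to be much closer together than the raw sizes.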

In XML there are alternative ways to encode the same information because some values can be represented both as child nodes and as attributes. This can make automated data exchange complicated unless the XML format in use is strictly specified, because programs otherwise need to deal with many different variations of the data structure. For example, the fields of a single record can be written either as attributes of one element or as child elements of that element, and both documents carry the same information as the equivalent JSON object.
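A minimal sketch of this ambiguity, using Python's standard json and xml.etree.ElementTree modules: the same hypothetical record is written once as JSON, once as XML attributes and once as XML child elements. The element and field names are invented for illustration, and a consumer needs a separate reader for each XML layout.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record; all values are strings so the three encodings compare equal.
json_text = '{"name": "Ada", "city": "London"}'
xml_attributes = '<person name="Ada" city="London"/>'
xml_children = "<person><name>Ada</name><city>London</city></person>"


def from_attributes(xml_text):
    """Read the fields from the attributes of the root element."""
    return dict(ET.fromstring(xml_text).attrib)


def from_children(xml_text):
    """Read the fields from the child elements of the root element."""
    return {child.tag: child.text for child in ET.fromstring(xml_text)}


expected = json.loads(json_text)
print(from_attributes(xml_attributes) == expected)
print(from_children(xml_children) == expected)
```

Applying the wrong reader to a document does not raise an error in this sketch; it simply yields an empty or incomplete dictionary, which is one reason a strictly specified format matters.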

YAML is a human-readable data serialization format that takes concepts from programming languages such as C, Perl, and Python, and ideas from XML and the data format of electronic mail (RFC 2822). YAML is a recursive acronym for "YAML Ain't Markup Language". Early in its development, YAML was said to mean "Yet Another Markup Language", but was retronymed to distinguish its purpose as data-oriented, rather than document markup. YAML syntax was designed to be easily mapped to data types common to most high-level languages: list, associative array, and scalar.

source: wikipedia
