ZapFlash

Semantic Interoperability: The Pot of Gold Under the Rainbow

Our ZapThink 2020 poster lays out our complex web of predictions for enterprise IT in the year 2020. You might think that semantic operability is an important part of this story; after all, several groups have been heads down working on the problem of how to teach computers to agree on the meaning of the information they exchange for years now. But look again: we relegate semantics to the lower right corner, where we point out that we don’t believe there will be much progress in this area by 2020. Eventually, maybe, but even though semantic interoperability appears to be within our grasp, it behaves more like the pot of gold under the rainbow. The closer you get to the rainbow, the farther away it appears.

What gives? ZapThink usually takes an optimistic perspective about the future of technology, but we’re decidedly pessimistic about the prospects for semantic interoperability. The problem as we see it comes down to the human understanding of language. All efforts to standardize meanings in order to facilitate semantic interoperability strip out vagueness and ambiguity from data, and presume a single, universal underlying grammar. After all, isn’t the goal to foster precise, unambiguous, and consistent communication between systems? The problem is, human communication is inherently vague, ambiguous, and relative. The way humans understand the world, the way we think, and the way we put our thoughts into language require both vagueness and ambiguity. Without them, we lose important aspects of meaning. Furthermore, how we structure our language is culturally and linguistically relative. As a result, current semantic interoperability efforts will be able to address a certain class of problems, but in the grand scheme of things, that class of problems is a relatively small subset of the types of communication we would prefer to automate between systems.

The Importance of Vagueness

Ironically, to discuss semantics we must first define our terms. A term is vague when it’s impossible to say whether the term applies in certain circumstances, for example, “my face is red.” Just how red does it have to be before we’re sure it’s red? In contrast, a term is ambiguous when it’s possible to interpret it in more than one way. For example, “I’m going to a bank” might mean that I’m going to a financial institution or to the side of a river.

Vagueness leads to knotty problems in philosophy that impact our ability to provide semantic interoperability. So, let’s go back to philosophy class, and study the sorites paradox. If you have a heap of sand and you take away a single grain, do you still have a heap of sand? Certainly. OK, repeat the process. Clearly, when you get down to a single grain of sand remaining, you no longer have a heap. So, when did the heap cease to be a heap?

Philosophers and linguists have been arguing over how to solve the sorites paradox for over a century now (yes, I know, they should find something more useful to do with their time). One answer: put your foot down and establish a precise boundary. 1,000 or more grains of sand are a heap, but 999 or less are not. Our computers will have no problem with such a resolution to the paradox, but it doesn’t accurately represent what we really mean by a heap. After all, if 1,000 grains constitutes a heap, wouldn’t 999? Central to the meaning of the term “heap” is its inherent vagueness.

Another solution: yes, there is some number of grains of sand where a heap ceases to be a heap, but we can’t know what it is. This resolution might satisfy some philosophers, but it doesn’t help our computers make sense out of our language. A third approach: instead of considering “is a heap” and “isn’t a heap” as the only two possible values, define a spectrum of intermediary values, or perhaps a continuum of values. The computer scientists are likely to be happy with this answer, as it lends itself to fuzzy logic: the statement “this pile of sand is a heap” might be, say, 40% true. Yes, we can do our fuzzy logic math now, but we’ve still lost some fundamental elements of meaning.

To bring back our natural language-based understanding of the sorites paradox, let’s step away from an overly analytical approach to the problem and try to look at the paradox from a human perspective. How, for example, would a seven-year-old describe the heap of sand as you take away a grain of sand at a time? They might answer, “well, it’s a smaller heap” or “it’s kinda a heap” or “it’s a little heap” or “it’s not really a heap,” etc. Such expressions are clearly not precise. Our computers wouldn’t be able to make much sense out of them. But these simple, even childish expressions are how people really speak and how people truly understand vagueness.

The important takeaway here is that vagueness isn’t a property relegated to heaps and blushing faces. It’s a ubiquitous property of virtually all human communication, even within the business context. Take for example an insurance policy. Insurance policies have a number of properties (policy holder, underwriter, insured property, deductible, etc.) and relationships to other business entities (policy application, underwriting documentation, claims forms, etc.) Now let’s add or take away individual properties and relationships from our canonical understanding of an insurance policy one at a time. Is it still a policy? Clearly, if we take away everything that makes a policy a policy then it’s no longer a policy. But if we take away a single property, we’re likely to say it’s still a policy. So where do we draw the line? If philosophers and linguists haven’t solved this problem in over a century, don’t expect your semantic interoperability tool to make much headway either.

The Problem of Linguistic Relativity

Another century-long battle in the world of linguistics is the fray over linguistic relativity vs. linguistic universality. Linguistic relativity is the position that language affects how speakers see their world, and by extension, how they think. In the other corner is Noam Chomsky’s universal grammar, the linguistic theory that grammar is hardwired into the brain, and hence universal across all peoples regardless of their language or their culture. Theoretical work on a universal grammar has led to dramatic advances in natural language translation, and we all get to use and appreciate Google Translate and its brethren as a result. But while Google Translate is a miraculous tool indeed (especially for us Star Trek fans who marveled at the Universal Translator), it doesn’t take a polyglot to realize that the state of the art for such technology still leaves much to be desired.

Linguistic relativity, however, goes at the heart of the semantic interoperability challenge. Take for example, one of today’s most useful semantic standards: the Resource Description Framework (RDF). RDF is a metadata data model intended for making statements about resources (in particular, Web-based resources) in the form of subject-predicate-object expressions. For example, you might be able to express the statement “ZapThink wrote this ZapFlash” in the triplet consisting of “ZapThink” (the subject); “wrote” (the predicate); and “this ZapFlash” (the object). Take this basic triplet building block and you can build semantic webs of arbitrary complexity, with the eventual goal of describing the relationships among all business entities within a particular business context.

The problem with the approach RDF takes, however, is that the subject-predicate-object structure is Eurocentric. Non-European languages (and hence, non-European speakers) don’t necessarily think in sentences that follow this structure. And furthermore, this problem isn’t new. In fact, the research into this phenomenon dates back to the 1940s, with the work of linguist Benjamin Lee Whorf. Whorf conducted linguistic research among the Hopi and other Native American peoples, and thus established an empirical basis for linguistic relativity. The illustration below comes from one of his seminal papers on the subject:




In the graphic above, Whorf compares a simple sentence, “I clean it with a ramrod,” where “it” refers to a gun, in English and Shawnee. The English sentence predictably follows the subject-predicate-object format that RDF leverages. The Shawnee translation, however, translates literally to “dry space/interior of hole/by motion of tool or instrument.” Not only is there no one-to-one correspondence between parts of speech across the two sentences, but the entire context of the expression is different. If you were in the unenviable position of establishing RDF-based semantic interoperability between, say, a British business and a Shawnee business, you’d find RDF far too culturally specific to rise to the challenge.

The ZapThink Take

We have tools for semantic interoperability today, of course – but all such tools require the human step of configuring or training the tool to understand the properties and relationships among entities. Once you’ve trained the tool, it’s possible to automate many semantic interactions. But to get this process started, we must get together in a room with the people we want to communicate with and hammer out the meanings of the terms we’d like to use.

This human component to semantic interoperability actually dates to the Stone Age. How did we do business in the Stone Age? Say your tribe was on the coast, so you had fish. You were getting tired of fish, so you and your tribemates decided to pack up some fish and bring the bundle to the next village where they had fruit. You showed up at the village market, only you had no common language. So what did you do? You held up some fish, pointed to some fruit, grunted, and waved your hands. If you established a basis of communication, you conducted business, and went home with some fruit. If not, then you went home empty handed (or you pulled out your clubs and attacked, but that’s another story). Cut to the 21st century, and little has changed. People still have to get together and establish a basis of communication as human beings in order to facilitate semantic interoperability. But fully automating such interoperability is as close as the next rainbow.

Discussion

5 comments for “Semantic Interoperability: The Pot of Gold Under the Rainbow”

  1. Jason:

    This is a very interesting article and is noteworthy!
    This is the first article that I have read and which addresses “interoperability” issue from the human side.

    Until this article, most techies –except for the DoD’s research scientists doing work on cognitive and social domains of Command and Control (C2) — did not realize that computers don’t have cognitive capabilities that can reason to understand the difference between “I am going to the bank” and “I am going to the river bank”.

    The individuals of a social-cultural system interpret meanings of sentences differently from other social-cultural systems. Thus, attempting to use computers to understand the meanings of language differences among different social-cultural systems is meaningless — impossible.

    “Rashomon effect” is another term used by social scientists to address this issue. An interesting article –The Rashomon Effect Combining Positivist and Interpretivist Approaches in the Analysis of Contested Events WENDY D. ROTH and JAL D. MEHTA, Harvard University — exists in the literature about “Rashomon effect.”

    Recently, I have come to realize that many of the techies may need to become social scientists, to address many of scientific issues on Cyber Security, interoperability, etc.

    Great job!

    Kofi

    Posted by Dr. Kofi Nyamekye | February 15, 2013, 12:32 pm
  2. In my opinion you exaggerate the problem. Most of your examples can be solved. For example:
    - homonyms such as ‘bank’ can be solved via context
    - ‘heap’ clearly specifies what it is (and what it is not). If you buy a heap of sand everybody knows that the quantity is still unspecified. Thus being a heap does by definition not specify a quantity.
    - Similarly for ‘red’.

    The remaining problems should be solved by formalization of the language. Gellish Formal English is an example of such a formal language.

    For non-European languages Gellish allows to make expressions with different sequences of terms or characters and different phrases to express the same meaning.

    Posted by Andries van Renssen | February 18, 2013, 7:25 am
    • The question of context is an important one. There’s more to say about that, so stay tuned for a future ZapFlash.

      But you’re missing the point about the heap. When we formalize language, we can say things like “‘heap’ doesn’t specify a quantity,” but that formalization strips the word of some of its meaning.

      In other words, formalization of language will solve a certain class of problems, but it will never resolve the broader issue of semantic interoperability in general.

      Posted by Jason Bloomberg | February 18, 2013, 8:52 am
  3. A very good article. But I fear some of your readers will take away an overly pessimistic message, which you probably don’t intend. Although the subject-predicate-object form may be not serve all languages well, that doesn’t mean there aren’t a very large class of problems to which it is suited. I’ll be happy if a few of them are adequately addressed by 2020.

    Posted by Steve Wartik | March 5, 2013, 4:23 pm
  4. I am reminded of a great book I read when it was written in the 1980s when I was working in AI. Terry Winograd and Fernando Flores’ book, Understanding Computers and Cognition was isntrumental in turning me away from the dead end of trying to automatically understand natural language. I am pleased to discover, from this review, http://www.amazon.com/Understanding-Computers-Cognition-Foundation-Design/product-reviews/0201112973 that it is rated among the top 10 books of all time on IT by Byte magazine.

    I still use their lovely example of “Time flies like and arrow” and “Fruit flies like a banana” to illustrate the problem.

    Posted by John Moody | October 23, 2013, 9:10 am

Post a comment

FREE POSTERS

NEW VERSION! ZapThink's Vision for Enterprise IT in 2020
With all new content including Dev/Ops, Hypermedia-Oriented Architecture, Big Data Visualization, and more!
Click here to download for FREE
10-pack of prints for only the cost of shipping!

SOA Implementation Roadmap
Over 100,000 downloaded!
Click here to download for FREE
10-pack of prints for only the cost of shipping!