The Technicalities of Ambiguity Detection

By Fountech Labs

November 15, 2023

Human interaction is a complex process that is founded on communication, with natural language being the most effective means of exchanging information. In fact, it is through language that we are able to interpret and comprehend the social world around us.

The process of human interaction involves several steps. It starts with the exchange of messages through language, both verbal and nonverbal. Once someone receives the message, they then subconsciously or consciously process the data for understanding and comprehension, after which they may respond in various ways, such as asking for clarification, expressing agreement or disagreement, or taking action based on the information they received.

This process of communication and response is what sets humans apart from other animals and demonstrates our intelligence. In fact, it is the very thing that we hope machines will eventually be able to do with us, as we strive to create more intelligent and responsive technology.

Natural language is considered the most effective means of communication between humans, and it can also facilitate communication between humans and machines, or even between different computers. When machines can process natural language, they can better interpret and respond to human queries and commands, which lets people interact with technology more intuitively. Programmers have used this capability to develop intelligent virtual assistants and chatbots that mimic human conversation and perform tasks on behalf of the user. However, human conversation is inherently open to more than one interpretation, for instance through pronouns or through meaning inferred from presumptions, and this ambiguity is particularly challenging for Natural Language Understanding (NLU).

Ambiguity in any language varies greatly depending on the speaker and the audience. A language with a large enough lexicon can have alternative interpretations for almost any sentence. To learn a new language, non-native speakers must train themselves to recognize these alternative interpretations rather than rely on their prior understanding.

NLU systems struggle to interpret ambiguity in questions or queries. They can resolve an unambiguous query reasonably well, but ambiguous input still poses many challenges.

Understanding Ambiguity

Ambiguity refers to a statement or situation that has multiple meanings or interpretations, making it difficult to determine the intended meaning without additional context or clarification. For example, the statement "I saw her duck" could have multiple meanings, depending on the context and the emphasis placed on certain words. It could mean:

"I saw her (meaning the person) duck (meaning quickly lower her head or body)."

"I saw her (meaning the person) duck (meaning the animal)."

Without additional context or clarification, it is unclear which of these interpretations is the intended meaning of the statement.

Ambiguous sentences are difficult to understand because they have multiple meanings. A reader will struggle to understand a sentence without knowing which sense of a word is intended.

Ambiguity Detection

Ambiguous sentences can be confusing because readers may assume that there is only one correct interpretation, when in fact there can be several possible meanings. This can lead to miscommunication and errors, particularly in fields such as finance or medicine where precise language is critical.

Ambiguity Detection is a useful tool that can identify and flag ambiguous statements and sentences within a body of text. It considers the surrounding context and can detect issues with word tense, pronoun usage, and other potential sources of ambiguity. This can be particularly helpful for writers who want to evaluate the linguistic coherence of their content and make it more understandable.

In natural language texts, ambiguity is a common problem that can lead to confusion and misinterpretation. It's important to identify and clarify any ambiguities as early as possible, especially in requirements documents, where ambiguous phrases can cause serious problems. Detecting ambiguities in requirements documents also makes it possible to catalog the phrases that are typically ambiguous. In addition to flagging ambiguous sentences, ambiguity detection tools should explain what is potentially ambiguous in every detected sentence.

Types of Ambiguity

A fun fact is that defining ambiguity is also ambiguous. There are many types of ambiguity in natural language and, subsequently, in artificial intelligence (AI). In this article, we will discuss four of the most common types of ambiguity:

Lexical Ambiguity

Lexical ambiguity occurs when a word has more than one meaning. For example, "take" can mean "to get into one's hands or one's possession, power, or control," but it can also mean "to consider or view in a particular relation." In some cases, two words of different origins come to share the same spelling and pronunciation, like "bank," which can mean a river bank or a financial institution.
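As a rough illustration, lexical ambiguity can be flagged by looking each word up in a sense inventory and reporting the words that have more than one listed sense. The sketch below is a minimal example under stated assumptions: the `SENSES` dictionary and the function name are hypothetical stand-ins for a real lexical resource, not part of any particular tool.

```python
# Toy sense inventory: a hypothetical stand-in for a real lexical resource.
SENSES = {
    "bank": ["edge of a river", "financial institution"],
    "duck": ["water bird", "to lower the head or body quickly"],
    "take": ["to get into one's possession", "to consider in a particular way"],
    "river": ["large natural stream of water"],
}


def lexically_ambiguous_words(sentence):
    """Return {word: senses} for every word with more than one listed sense."""
    words = [w.strip('.,!?"').lower() for w in sentence.split()]
    return {w: SENSES[w] for w in words if len(SENSES.get(w, [])) > 1}


flagged = lexically_ambiguous_words("She sat by the bank of the river.")
print(flagged)  # only "bank" is flagged; "river" has a single sense
```

A lookup like this only says that a word could be ambiguous. Deciding which sense is intended still requires the surrounding context.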

Syntactic Ambiguity

A sentence can have multiple grammatical structures, each with a different meaning. This is known as syntactic ambiguity or structural ambiguity. Consider the sentence: "Visiting relatives can be a nuisance." This sentence is syntactically ambiguous because it can be interpreted in two ways, depending on the grammatical role of the word "visiting."

Interpretation 1: "Going to visit relatives can be a nuisance." In this case, "visiting" is a gerund and "relatives" is its object.

Interpretation 2: "Relatives who are visiting can be a nuisance." In this case, "visiting" is an adjective that modifies "relatives."

The meaning of the sentence changes depending on how "visiting" is parsed. This is an example of syntactic ambiguity.

Syntactic ambiguity occurs when the structure of a sentence can be analyzed in more than one way, i.e., the components of the sentence can be combined differently. Compared to other types of ambiguity, syntactic ambiguity appears more often and is more complicated.
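One way to make structural ambiguity concrete is to count how many parse trees a grammar assigns to a sentence: more than one tree means the sentence is syntactically ambiguous under that grammar. The sketch below uses a small CKY-style chart parser. The grammar and lexicon are hypothetical and written only for "Visiting relatives can be a nuisance," where "visiting" can be read either as an adjective (relatives who visit) or as a gerund (the act of visiting relatives).

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (left, right) -> parents.
BINARY_RULES = {
    ("NP", "VP"): ["S"],
    ("ADJ", "N"): ["NP"],      # "visiting relatives" = relatives who visit
    ("GER", "N"): ["NP"],      # "visiting relatives" = the act of visiting them
    ("DET", "N"): ["NP"],
    ("MODAL", "VP1"): ["VP"],
    ("COP", "NP"): ["VP1"],
}
LEXICON = {
    "visiting": ["ADJ", "GER"],
    "relatives": ["N"],
    "can": ["MODAL"],
    "be": ["COP"],
    "a": ["DET"],
    "nuisance": ["N"],
}


def count_parses(words, start="S"):
    """Count the distinct parse trees for `words` rooted in `start`."""
    n = len(words)
    # chart[i][j] maps a category to the number of ways it spans words[i:j].
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, []):
            chart[i][i + 1][tag] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # every split point of the span
                for left, left_count in chart[i][k].items():
                    for right, right_count in chart[k][j].items():
                        for parent in BINARY_RULES.get((left, right), []):
                            chart[i][j][parent] += left_count * right_count
    return chart[0][n].get(start, 0)


sentence = "visiting relatives can be a nuisance".split()
print(count_parses(sentence))  # 2
```

The two parses correspond to the two roles of "visiting." A realistic grammar would be far larger, and the number of parses for an ordinary sentence can grow quickly.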

Semantic Ambiguity

Semantic ambiguity occurs when a sentence can be interpreted in several ways within its context even though it contains no lexical or structural ambiguity. This can happen, for example, when several quantifiers occur in the same sentence, as in: "All cats have a chip number." This is ambiguous in two ways:

Every cat has a unique chip number.

All cats have the same chip number.
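The difference between the two readings can be checked against a small example world. In the sketch below, which is only an illustration, the first reading is modeled loosely as "every cat has some chip number, and the numbers may differ" and the second as "one chip number is shared by every cat." The function names and the cat data are hypothetical.

```python
def each_cat_has_some_chip(cats, chip_of):
    """Reading 1: for every cat there is a chip number (numbers may differ)."""
    return all(chip_of.get(cat) is not None for cat in cats)


def one_chip_shared_by_all(cats, chip_of):
    """Reading 2: there is a single chip number that every cat has."""
    numbers = {chip_of.get(cat) for cat in cats}
    return len(numbers) == 1 and None not in numbers


cats = ["Tom", "Felix"]
chip_of = {"Tom": "A-001", "Felix": "B-002"}
print(each_cat_has_some_chip(cats, chip_of))  # True
print(one_chip_shared_by_all(cats, chip_of))  # False
```

The same sentence is true of this world under one reading and false under the other, which is why the reading has to be pinned down before the sentence can be used as a requirement.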

Pragmatic Ambiguity

A sentence can have several meanings depending on the context in which it is used. This often happens when references in a body of text can be resolved in several ways. For example, "they" in "The doctors shall treat the patients before they leave" can refer to either the doctors or the patients.
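A very simple check for this kind of referential ambiguity is to collect, for each plural pronoun, the plural nouns that come before it and to flag the pronoun when there is more than one candidate. The sketch below assumes the plural nouns are already known (here they are passed in by hand). A real system would need a tagger or parser for that step, and the names used are hypothetical.

```python
PLURAL_PRONOUNS = {"they", "them", "their"}


def candidate_antecedents(tokens, plural_nouns):
    """Map each plural pronoun position to the plural nouns that precede it."""
    seen = []
    candidates = {}
    for position, token in enumerate(tokens):
        word = token.lower()
        if word in plural_nouns:
            seen.append(word)
        elif word in PLURAL_PRONOUNS:
            candidates[position] = list(seen)
    return candidates


tokens = "The doctors shall treat the patients before they leave".split()
found = candidate_antecedents(tokens, plural_nouns={"doctors", "patients"})
ambiguous = {pos: nouns for pos, nouns in found.items() if len(nouns) > 1}
print(ambiguous)  # {7: ['doctors', 'patients']}
```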

Ambiguity Detection Approaches

The types of ambiguity discussed above are traditionally used in natural language processing. In addition, programmers use various techniques to detect ambiguity in Software Requirements Specification (SRS) documents or natural language requirements. In general, there are three types of ambiguity detection approaches:

Manual approach

A semi-automatic approach using natural language processing

A semi-automatic approach using machine learning

Manual Approaches in Detecting Ambiguity

The manual method of detecting ambiguity relies solely on requirements engineering expertise and uses no automatic techniques. Manual approaches include inspection and review.

In inspection, all stakeholders are asked to provide their interpretation of the requirements, and the interpretations of the different stakeholders are then compared. If the interpretations differ, the requirements are ambiguous.

Reviewing involves three steps. First, the reviewers manually look for ambiguities in the documents. Next, each ambiguity is rated according to its severity. Finally, the collected ambiguities and the review of the natural language document are forwarded to the author for correction.
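Inspection is a manual activity, but the comparison step can be sketched in a few lines to show the idea. The example below groups stakeholders by the interpretation they gave and reports the requirement as ambiguous when more than one group exists. Matching interpretations by normalized text is a simplifying assumption, since in practice a person decides whether two interpretations mean the same thing, and all names are hypothetical.

```python
def normalize(text):
    """Lowercase the text and collapse whitespace before comparing."""
    return " ".join(text.lower().split())


def inspect_requirement(interpretations):
    """Group stakeholders by interpretation; more than one group = ambiguous."""
    groups = {}
    for stakeholder, reading in interpretations.items():
        groups.setdefault(normalize(reading), []).append(stakeholder)
    return len(groups) > 1, groups


interpretations = {
    "developer": "Logs are deleted after 30 days",
    "tester": "logs are deleted  after 30 days",
    "client": "Logs are archived after 30 days",
}
is_ambiguous, groups = inspect_requirement(interpretations)
print(is_ambiguous)  # True
```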
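The output of a review can be pictured as a list of findings ordered by severity before it goes back to the author. The sketch below is only an illustration: the `Finding` record and the 1 to 3 severity scale are assumptions, not a standard format.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    sentence: str
    note: str
    severity: int  # hypothetical scale: 1 = minor, 3 = critical


def build_review_report(findings):
    """Order findings so the most severe ambiguities reach the author first."""
    return sorted(findings, key=lambda f: f.severity, reverse=True)


findings = [
    Finding("The system shall respond fast.", "'fast' is not measurable", 2),
    Finding("Doctors treat patients before they leave.", "unclear 'they'", 3),
    Finding("Reports are sent weekly, etc.", "open-ended 'etc.'", 1),
]
for finding in build_review_report(findings):
    print(finding.severity, finding.note)
```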

Semi-Automatic Approach Using Natural Language Processing Techniques

The semi-automatic approach uses natural language processing techniques with the help of human experts to detect ambiguities in SRS. Ontology and natural language patterns are two types of natural language processing techniques.

During the ontology text classification process, less informative words are removed from the documents. Ontology text classification helps to clarify the possible meanings of ambiguous terms by classifying the relationships between concepts and their explanations.

To detect ambiguity using natural language patterns, requirements engineers first manually look for ambiguity. The requirements are then analyzed by matching them against natural language patterns that typically signal ambiguity. Lastly, requirements engineers rewrite the requirements.
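The pattern-matching step can be sketched with regular expressions. The pattern list below is a hypothetical example rather than an established catalog. A real project would curate its own patterns from the ambiguities its engineers find by hand.

```python
import re

# Hypothetical pattern list; a real project would curate its own.
AMBIGUITY_PATTERNS = {
    "vague quantifier": r"\b(some|several|many|few)\b",
    "open-ended list": r"\betc\b\.?|\band so on\b",
    "unclear coordination": r"\band/or\b",
    "vague adjective": r"\b(fast|user-friendly|appropriate|adequate)\b",
}


def match_patterns(requirement):
    """Return (label, matched text) pairs for every pattern hit."""
    hits = []
    for label, pattern in AMBIGUITY_PATTERNS.items():
        for match in re.finditer(pattern, requirement, flags=re.IGNORECASE):
            hits.append((label, match.group(0)))
    return hits


requirement = "The system shall respond fast and support several formats, etc."
for label, text in match_patterns(requirement):
    print(f"{label}: {text}")
```

Each hit reports both the matched phrase and the reason it was flagged, so the engineer who rewrites the requirement can see what is potentially ambiguous.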

Semi-Automatic Approach Using Machine Learning Techniques

Machine learning techniques used in semi-automatic approaches to ambiguity detection include decision trees, support vector machines (SVMs), Naive Bayes (NB), and n-grams.

Decision-Tree Text Classification Technique

A decision tree narrows a search down to a specific set of attributes by adding only the most valuable features to the tree structure. At each node, a feature is tested against its possible values, and the outcome determines which branch to follow. The tree is built by considering all training examples.
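A common way to decide which feature is "most valuable" is information gain, which is the criterion used by ID3-style decision trees. The article does not say which criterion is meant, so the sketch below is an assumption, and the features and labels are made-up training examples.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def information_gain(rows, labels, feature):
    """Reduction in entropy obtained by splitting the examples on `feature`."""
    total = len(labels)
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def best_feature(rows, labels):
    """Pick the feature with the highest information gain."""
    return max(rows[0].keys(), key=lambda f: information_gain(rows, labels, f))


# Hypothetical training examples: one dict of features per sentence.
rows = [
    {"has_pronoun": True, "has_vague_word": False},
    {"has_pronoun": True, "has_vague_word": True},
    {"has_pronoun": False, "has_vague_word": True},
    {"has_pronoun": False, "has_vague_word": False},
]
labels = ["ambiguous", "ambiguous", "clear", "clear"]
print(best_feature(rows, labels))  # has_pronoun
```

In this toy data the pronoun feature separates the two classes perfectly, so it would become the root of the tree, while the other feature adds nothing.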

Support Vector Machine (SVM) Text Classification

In the SVM approach, every word in the text is first tagged with its Part of Speech (POS). The SVM then assigns a weight to the text, and this weight is compared with a threshold value. The SVM reports a problem if the weight is large or if the text is difficult to classify.

Naïve Bayes (NB) Text Classification

The NB text classifier is trained on text using word counts and word probabilities. To classify a new text, it combines the probabilities of its words as learned from the training data.
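The idea can be sketched as a small multinomial Naive Bayes classifier with add-one (Laplace) smoothing. The training sentences and labels below are hypothetical, and a real classifier would need far more data and better tokenization.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """examples: list of (text, label). Returns the counts needed to classify."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for `text`."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, doc_count in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(doc_count / total_docs)  # class prior
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical labeled requirements.
examples = [
    ("the system shall respond quickly", "ambiguous"),
    ("users may upload some files", "ambiguous"),
    ("the system shall respond within two seconds", "clear"),
    ("users may upload ten files", "clear"),
]
model = train_naive_bayes(examples)
print(classify("upload some files quickly", model))   # ambiguous
print(classify("respond within two seconds", model))  # clear
```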

Statistical Machine Translation Using the N-gram Model

The n-gram model is a probabilistic language model that can predict ambiguity using only words. It counts how often words and word sequences occur in a corpus in order to estimate probability distributions over words. For word prediction, it takes a series of consecutive terms and, sometimes in combination with other machine learning techniques, derives the probability of the next word.
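The counting behind an n-gram model can be shown with bigrams, the case n = 2. The sketch below estimates the probability of the next word from a tiny, made-up corpus. The corpus and function names are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for every word, which words follow it and how often."""
    follow = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()  # <s> marks sentence start
        for prev, nxt in zip(words, words[1:]):
            follow[prev][nxt] += 1
    return follow


def next_word_probabilities(follow, prev):
    """Estimate P(next word | prev) from the bigram counts."""
    counts = follow.get(prev, Counter())
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()} if total else {}


corpus = ["I saw her duck", "I saw her face", "I saw a duck"]
follow = train_bigrams(corpus)
print(next_word_probabilities(follow, "her"))  # {'duck': 0.5, 'face': 0.5}
```

When the probability mass after a word is spread across several continuations, as it is after "her" here, the model has no strong preference, which is one rough signal that more context is needed.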

Final Words

Ambiguity can be a major problem with documentation of any kind. The application of ambiguity detection can considerably help mitigate this issue. In spite of the myriad of techniques available, modern AI still struggles to cope with ambiguity.

