Download Report

Computer Standards & Interfaces 35 (2013) 470–481
Contents lists available at SciVerse ScienceDirect
Computer Standards & Interfaces
journal homepage: www.elsevier.com/locate/csi
How to make a natural language interface to query databases accessible to everyone:
An example
Miguel Llopis ⁎, Antonio Ferrández
Dept. Languages and Information Systems, University of Alicante, Spain
a r t i c l e
i n f o
Available online 12 October 2012
Keywords:
Natural language interface
Relational database
Ontology extraction
Concept hierarchy
Query-authoring services
a b s t r a c t
Natural Language Interfaces to Query Databases (NLIDBs) have been an active research ﬁeld since the 1960s.
However, they have not been widely adopted. This article explores some of the biggest challenges and approaches for building NLIDBs and proposes techniques to reduce implementation and adoption costs. The article
describes {AskMe*}, a new system that leverages some of these approaches and adds an innovative feature:
query-authoring services, which lower the entry barrier for end users. Advantages of these approaches are
proven with experimentation. Results conﬁrm that, even when {AskMe*} is automatically reconﬁgurable against
multiple domains, its accuracy is comparable to domain-speciﬁc NLIDBs.
© 2012 Elsevier B.V. All rights reserved.
1. Introduction
2. Classiﬁcation of existing NLIDBs
A natural language interface to query databases (NLIDB) is a system
that allows users to access information stored in a database by means of
typing requests expressed in some natural language [1,2,22], such as
English, Spanish, etc.
NLIDBs have been a ﬁeld of investigation since 1960s [2]. There have
been many interesting theories and approaches about how an NLIDB
could be built, in order to improve their accuracy [3], how to make
them more open in terms of the natural language expressions that
they accept [4,16], or even more, how to make them guess the real intend of the user who is trying to construct a query where some pieces
are missing [21], etc. We will analyze these approaches in this article.
While the research work on NLIDBs has led to many different
systems being implemented in academic and research environments
(e.g. [2–5,8,9,17–19]), it is difﬁcult to ﬁnd many of these systems
being used in business environments or being commercialized in companies expanding across various market segments or domain niches
[22].
In this article, we will explore previous NLIDB systems and classify
them based on the different approaches that they implement. At the
same time, we will explain which of these approaches lead to reduced
costs at different stages of the NLIDB lifecycle. Finally, we will look at
how we have implemented our proposals to minimize implementation,
conﬁguration, portability and learning costs, by analyzing the implementation of {AskMe*}, an ongoing NLIDB research work.
As we outlined in the previous section, there have been many different approaches to the construction of NLIDBs. There are various
ways for classifying them. In this article, we will explore two of the
most common taxonomies for classiﬁcation of NLIDBs which appear
across various overview articles about the NLIDB ﬁeld (e.g. [2,22])
and is also complemented by our own research observations:
- Based on user interface: textual NLIDBs vs. graphical NLIDBs.
- Based on domain-dependency: domain-dependent vs. domainindependent NLIDBs.
ο As part of the previous classiﬁcation, we will divide these
NLIDBs in subcategories, based on their degree of portability
and reconﬁguration capabilities. This particular classiﬁcation is
not something that we have found on previous work per se,
but rather a pattern that we have extracted based on the characteristics of systems that we have analyzed and previous research papers on the ﬁeld of NLIDBs that we have taken into
account for our work.
In the next sections, we will explore the idiosyncrasies of each of
these approaches. It is important to emphasize that we do not claim
one of these approaches to be better than others, as each of the approaches has its advantages and disadvantages [2,22]. However, we
will evaluate the convenience of each of these approaches in regards
to the main goal of our research work: optimize costs of NLIDBs.
2.1. NLIDBs by their user interface: textual interfaces vs. graphical interfaces
⁎ Corresponding author.
E-mail addresses: mll9@alu.ua.es (M. Llopis), antonio@dlsi.ua.es (A. Ferrández).
0920-5489/$ – see front matter © 2012 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.csi.2012.09.005
One of the biggest questions in the space of NLIDBs through decades
has been the disjunctive of choosing a textual or a graphical user
M. Llopis, A. Ferrández / Computer Standards & Interfaces 35 (2013) 470–481
interface to build the system. Each of these two alternatives has its own
advantages and disadvantages that are worth considering, as described
in [2,22]:
- Textual NLIDBs: examples of this type of NLIDB are HEY [18], AT&T
[19], LUNAR [2,24] or PRECISE [3].
ο Advantage:
■ User is not required to learn any additional language.
ο Disadvantages:
■ Linguistic coverage of the system is not obvious.
■ Overlap of linguistic and conceptual failures.
- Graphical NLIDBs: an example of this type of NLIDBs is NL-Menu
[30].
ο Advantage:
■ Easy to dynamically constrain query formulation based on
user selections, in order to only build valid queries.
ο Disadvantages:
■ Lack of ﬂexibility in query formulation.
■ Expressivity power reduced to the user interface design (less
expressivity power than a textual natural language).
While most of the NLIDBs built in the past can be classiﬁed in one of
these two categories, there is an intermediate option between both
which consists on combining the expressivity power of a textual
NLIDB with the visual feedback to the user provided by a graphical
NLIDB as we presented in our previous work [23]. This can be achieved
by including query authoring services such as syntax coloring, text completions or keyword highlighting as part of the system design; our proposal is the ﬁrst NLIDB that incorporates these features to the design of
the system, to our best knowledge. Moreover, {AskMe*} helps the user
to make valid queries by automatically distinguishing between linguistic and conceptual failures.
2.2. NLIDBs by their degree of portability and re-conﬁgurability:
domain-dependent vs. domain-independent NLIDBs
A second taxonomy in NLIDB classiﬁcation can be made by considering the different approaches for how an NLIDB relates to the knowledge
domain of the database that is being queried.
- Domain-dependent NLIDBs: These NLIDBs need to “know” particularities about the underlying domain entities and restrictions in
order to work.
ο Non-Reconﬁgurable: Many of the NLIDBs in this group are
designed ad-hoc for a particular problem domain (database).
An example on this category is LUNAR [2,24].
ο Reconﬁgurable: Another group of NLIDBs are domain-dependent
but can be reconﬁgured towards being used to query a database that belongs to a different domain. In most cases, this
reconﬁguration consists on remapping domain entities and terms
from the DB in the query DSL.1 This, very often, requires the intervention of a technical user in order to perform these adjustments.
Examples in this category include AT&T [19] and ASK [2,22].
ο Auto-reconﬁgurable: This bucket is the most interesting from a
cost-saving perspective [23,30], as it will allow NLIDBs that are
knowledgeable about the underlying domain data (and therefore,
they can provide more accurate information, error messages, etc.)
and at the same time it enables non-technical users to connect to
multiple databases without the need for manual reconﬁguration.
The system knows who (the database connection string) and
what (entities, properties, data types, etc. generally captured in
an underlying source of knowledge, such as ontologies) to ask in
order to learn how to deal with the user queries. Examples of this
category include HEY [18], GINLIDB [29], FREyA [28] and an
NLIDB for CINDI virtual library [4].
1
Domain Speciﬁc Language
471
- Domain-independent NLIDBs: There are many other NLIDBs that allow
the user to write queries in a natural language and that do not store
any knowledge about the underlying domain; they simply translate
NL queries into SQL queries and execute them against the underlying
database [22]. Since the system does not know anything about the domain, it is not able to warn the user about conceptual errors in the
query (entity–property mismatch, data type mismatch, etc.) and
therefore the error-catching will happen in the database, thus, making
the system slower and less user-friendly when the query is ill-formed.
An example of NLIDB system in this category is PRECISE [3].
The problem of portability of NLIDBs is, from our perspective, one of
the most critical ones to be solved. By itself, the cost of developing an
NLIDB can be very high, and in most of the approaches taken for creating
NLIDBs, the resulting systems are tightly coupled to the underlying databases [22].
In the last few years, there have been interesting approaches to the
design of NLIDBs that are database-independent (e.g. [3,4]), in the
sense that they can cope effectively with queries targeting different domains without requiring substantial reconﬁguration efforts. One of the
best examples of this approach is PRECISE [3]. This system combines
the latest advances in statistical parsers with a new concept of semantic
tractability. This approach allows PRECISE to easily become highly
reconﬁgurable. In addition, this was one of the ﬁrst NLIDB systems
that used the parser as a plug-in, so it could be changed with relative
ease in order to leverage newest advantages in the parsers' space.
An interesting advantage of adapting the parsing process to each of
the knowledge domains that the system connects to is that analyzing an
input question in NLIDB systems is often based on a part-of-speech
(POS) tagging, followed by a syntactic analysis (partial or full) and ﬁnally, a more or less precise semantic interpretation. Although there are
broadly accepted techniques for POS tagging (e.g. [5–7]) and syntactic
analysis (e.g. [6]), techniques for semantic parsing are still very diverse
and ad hoc. In an open-domain situation, where the user can ask questions on any topic, this task is often very difﬁcult and relies mainly on
lexical semantics only. However, when the domain is limited (as is the
case of an NLIDB), the interpretation of a question becomes easier as
the space of possible meanings is smaller, and speciﬁc templates can
be used [8]. It has been demonstrated [9] that meta-knowledge of the
database, namely the schema of the database, can be used as an additional resource to better interpret the question in a limited domain.
Another interesting existing solution, based on the creation of a
new NLIDB every time that the system is connected to a new database, is the system developed for the CINDI virtual library [4], which
is based in the use of semantic templates. The input sentences are
syntactically parsed using the Link Grammar Parser [10], and semantically parsed through the use of domain-speciﬁc templates. The system is composed of a pre-processor and a run-time module. The
pre-processor builds a conceptual knowledge base from the database
schema using WordNet [13]. This knowledge base is then used at
run-time to semantically parse the input and create the corresponding SQL query. The system is meant to be domain independent and
has been tested with the CINDI database that contains information
on a virtual library.
The improvements that our research work in {AskMe*} provides in
regards to the portability problem space are described in the next
sections.
3. Most signiﬁcant costs in NLIDBs
Building an NLIDB system and bringing it into production has a signiﬁcant cost [2,22,27,28,30]. This cost can be analyzed and divided
across the different stages of the NLIDB lifecycle: system implementation, deployment and conﬁguration, and ﬁnally system users' adoption.
- System implementation: Creating an NLIDB is not a trivial task; it represents an engineering effort that must be taken into account when
472
M. Llopis, A. Ferrández / Computer Standards & Interfaces 35 (2013) 470–481
considering the creation of an NLIDB [2]. During the implementation
phase, and independent from the planning methodology being used,
the costs can be mostly divided in the following three categories:
ο Design: The design of the system has an expensive cost; it is in this
phase when various decisions must be taken: whether the system
is designed to be domain-dependent or independent, what the
different modules should look like—lexer, syntactic and semantic
parsers, translation to SQL, etc. This design phase might require
weeks, even months, of engineering and architectural work [22].
ο Development: Even when there are tools and frameworks to assist
in the process, creating a natural language interface is a laborious
task. Being able to provide high expressivity power while also
processing queries efﬁciently is hard [2,28].
ο Testing: In order to create a system than can be reliable, efﬁcient
and error-free, it is important to signiﬁcantly invest in testing it:
unit testing independent modules of the system, verifying the robustness and validity of the system when integrating various
pieces or validating the usability or the system in end-to-end scenarios or queries are just some of the testing activities that must
be done in this phase [2].
- Deployment and conﬁguration: This phase comprises the different activities required to deploy and adapt the system, once it has been fully
implemented and tested, for the real use in a concrete enterprise. It
includes, among others, the following tasks: deploying system components, conﬁguring connections between components, connecting
the system to the domain database, mapping database entities to system keywords, training the system to understand users' expressions,
and ensuring robustness and high-availability of the deployed system
[22].
- Users' learning process: Last but certainly not the least, once the system has been deployed to an enterprise environment, it has to be accepted and understood by end-users. This is not a trivial process, in
fact, making a system easy to understand, learn and use for the target user must be considered as the most important principle from
the design stage and across all the other phases: the most complete
and sophisticated NLIDB is worthless if users are not happy and satisﬁed while using and interacting with it, or even more, if they reject
using it because they do not like it. Thus, the learning process must
be made smooth and compelling for the users, and this implies
that a few different factors must be taken into account: users'
learning curve for NLI constructions, database entities and relationships, etc. may be slow without the help from the system; users'
learning curve for system graphical user interface may require additional learning effort, users need to be trained in order to be able to
troubleshoot the most frequent system errors by themselves (connectivity issues, users access, etc.) [27].
It is due to all the costs enumerated previously that we believe that
an NLIDB, in order to be successfully and widely adopted in the
real-world enterprise has to be designed once, being portable and able
to target different databases and knowledge domains, can be
reconﬁgured easily in order to connect to a different database without
the need of specialized deployment or reconﬁguration steps that
end-users cannot understand and, ﬁnally, must allow users to be productive using the system since the ﬁrst day of use, while implementing
a mechanism for letting users learn more advanced concepts of the system as they use it.
4. Contributions of our approach compared to previous
related work
The main improvements of our proposal compared to other existing
systems are the signiﬁcant reduction of costs: implementation and
reconﬁguration costs are optimized due to the dynamic nature of the
system; and learning costs for end users are greatly reduced as well
thanks to the use of query-authoring services.
Some other NLIDB systems developed in the past few years include
GINLIDB [29], WASP [22,25] and NALIX [22,26]. GINLIDB represents an
interesting attempt at creating a fully auto-reconﬁgurable or “generic
interactive” (as the “G” and “I” letters in the acronym stand for) approach to the creation of NLIDBs. This system has been an inspiration
for the work developed in {AskMe*}, however our system tries to go a
step beyond what GINLIDB accomplished in auto-reconﬁguration of
the NLIDB; while GINLIDB lets the user deﬁne custom mappings between words in their input queries and actual database entities by
means of graphical menus that are displayed after a query with errors
or ambiguities has been introduced by the user, {AskMe*} attempts to
provide richer query-authoring services, which are aimed at helping
users to easily learn how to ask questions in a new domain, by providing
query suggestions, error highlighting and domain-speciﬁc error descriptions, as we will describe later.
WASP (Word Alignment-based Semantic Parsing) is a system developed at the University of Texas by Yuk Wah Wong [25]. While the system
is designed to address the broader goal of constructing “a complete, formal, symbolic, meaningful representation of a natural language sentence”,
it can also be applied to the NLIDB domain. A predicate logic (Prolog) was
used as the formal query language. WASP learns to build a semantic parser given a corpus a set of natural language sentences annotated with their
correct formal query languages. The strength of WASP comes from the
ability to build a semantic parser from annotated corpora. This approach
is beneﬁcial because it uses statistical machine translation with minimal
supervision. Therefore, the system does not have to manually develop a
grammar in different domains. In spite of the strength, WASP also has
two weaknesses. The ﬁrst is: the system is based solely on the analysis
of a sentence and its possible query translation, and the database part is
therefore left untouched. There is a lot of information that can be
extracted from a database, such as the lexical notation, the structure,
and the relations within. Not using this knowledge prevents WASP to
achieve better performances, and this is an approach that {AskMe*} tries
to improve as we will see later. The second problem is that the system requires a large amount of annotated corpora before it can be used, and
building such corpora requires a large amount of work [22].
NALIX is a “Natural Language Interface for an XML Database” [26].
The database used for this system is extensible markup language
(XML) database with Schema-Free XQuery as the database query language. Schema-Free XQuery is a query language designed mainly for retrieving information in XML. The idea is to use keyword search for
databases. However, pure keyword search certainly cannot be applied.
Therefore, some richer query mechanisms are added [26]. Given a collection of keywords, each keyword has several candidate XML elements
to relate. All of these candidates are added to MQF (Meaningful Query
Focus), which will automatically ﬁnd all the relations between these elements. The main advantage of Schema-Free XQuery is that it is not
necessary to map a query into the exact database schema, since it will
automatically ﬁnd all the relations given certain keywords. In NALIX
the transformation processes are done in three steps: generating a
parse tree, validating the parse tree, and translating the parse tree to
an XQuery expression. This approach is being leveraged by our system
as well, in the sense that user queries are validated by {AskMe*} before
being executed against the database, thanks to the information available from the database schema, but with {AskMe*} we try to go beyond
this in order to provide richer query-authoring services to the user in
order to make the “writing a query” step more interactive and educational for the user, as we will describe later.
One of the ﬁrst natural language interfaces that provide a notion of
suggestions to the user in order to author a query is OWLPath [27].
This system suggests to the user how to complete a query by combining
the knowledge of two ontologies, namely, the question and the domain
ontologies. The question ontology plays the role of a grammar, providing
the basic syntactic structure for building sentences. The domain ontology
characterizes the structure of the application-domain knowledge in
terms of concepts and relationships. The system makes then suggestions
M. Llopis, A. Ferrández / Computer Standards & Interfaces 35 (2013) 470–481
based on the content of the question ontology and its relationships with
the domain ontology. Once the user has ﬁnished formulating the natural
language query, OWLPath transforms it into a SPARQL query and issues it
to the ontology repository. In the end, the results of the query are shown
back to the user. This is an interesting approach in natural language interfaces to query ontologies that were published just a few months before the ﬁrst publication about {AskMe*} [23]. While both systems
leverage ontologies in order to provide the user suggestions in how to
complete their queries, the systems are considerably different in a few
aspects: OWLPath is a natural language interface to query ontologies,
while {AskMe*} is a natural language interface to query databases that leverages ontology generation as a technique to capture the characteristics
and semantics of the underlying database schema. In addition, while
OWLPath provides query suggestions or auto-completions for terms
that exist in the underlying ontology, it does not provide error information that is speciﬁc to the domain, in case that a query contains errors,
which is something that {AskMe*} tries to emphasize in order to educate
users and help them learn how to use the system and understand the
logical model of the underlying domain.
Another interesting and recent approach that inspires our work is
FREyA [28], which combines syntactic parsing with the knowledge
encoded in ontologies in order to reduce the customization effort. If
the system fails to automatically derive an answer, it will generate clariﬁcation dialogs for the user. The user's selections are saved and used for
training the system in order to improve its performance over time.
While this is an interesting approach and inspires {AskMe*} in its principles, it differs from our research work in the sense that {AskMe*} focuses on helping users create valid queries from the beginning, as
opposed to FREyA's approach of letting them introduce wrong queries
and help the system correct them by means of clariﬁcation dialogs
which are used for auto-correcting errors in the future.
{AskMe*} is the ﬁrst NLIDB system that proposes the combination of
textual NLIDBs with rich query-authoring services (syntax coloring,
error squiggles, tooltips, etc.). This provides a substantial improvement
in the user experience when writing queries, especially in regards to
query accuracy in order to solve both linguistic failures and conceptual
failures, which could not be fully solved by the use of menu-based user
interfaces either. The use of query-authoring services helps to reinforce
the conceptual center of the dialog between the user and the NLIDB
around the domain entities in focus.
473
In order to achieve this, we combine domain-speciﬁc information,
captured in concept-hierarchy ontologies any time the system is
connected to a new database. The system automatically generates the
syntactic and semantic parsing templates and the rest of components
needed in order to provide query-authoring services. In addition, the system is fully auto-reconﬁgurable without the need of any specialized
knowledge. This is a signiﬁcant improvement compared to the existing
portable solutions mentioned before, because it makes the entire
reconﬁguration process fully transparent to the end users as opposed
to having to perform some reconﬁguration steps for entity mapping, disambiguation, etc. This is a substantial improvement not only by the
amount of extra work that is saved in the reconﬁguration steps, but
also because it enables the system to be automatically managed without
user intervention. This represents a step towards the democratization of
NLIDBs, as users ﬁtting a non-technical proﬁle will be able to use the system on their own, throughout the entire system lifecycle, from the very
early steps of adoption and deployment of the system towards a
real-world production environment to the management, reconﬁguration
and diagnosis steps across multiple domains, for which the system is able
to adapt itself automatically. In this sense, also, the role of queryauthoring services is fundamental because they enable to perform the
very little manual reconﬁgurations needed, if any, driven by intuitive
real-time hints in the query authoring process.
5. {AskMe*}: an NLIDB that reduces adoption, portability and
users' learning costs
{AskMe*} is a database-independent NLIDB and uses a templatebased approach for the dynamic generation of the lexer, syntactic and semantic parsers. Fig. 1 shows the different modules of the system.
An exhaustive description of every component of the system is out
of the scope for this paper, instead we will focus on describing the
most relevant techniques that enable the proposed improvements
of the system compared to other state-of-the-art systems: dynamic
generation of the system and query-authoring services.
In order to make this analysis easier to follow, we will use a case of
study and complement the description of each of these components
with the application on a given domain. We will use a sub-set of
Northwind [14] (see Fig. 2), a canonical example of a relational database
Fig. 1. {AskMe*}'s high level architecture.
474
M. Llopis, A. Ferrández / Computer Standards & Interfaces 35 (2013) 470–481
Fig. 2. Sub-set of Northwind database schema.
that captures the domain of a ﬁctitious trading company, containing information about products, orders, suppliers, employees, etc.
5.1. Ontology builder
The ﬁrst operation performed once {AskMe*} connected to a database is to search for the ontology representing that domain in the ontology repository. This repository consists on a dictionary that stores
ontology references for any given tuples bServer, Database> that the
system has been connected to.
If the ontology for that particular domain does not exist, the ontology
generation process is triggered. This process [10] analyzes the database
catalog and schema in order to build the ontology that captures the domain entities, properties, relationships and constraints. {AskMe*} is
using OWL for representing ontologies. In order to build the ontology
for each database, and keep the system within a manageable range of
data volume, only the minimal information needed from the database
is stored into the ontology. Concretely, entity names, properties and
value types are mapped from the database into the ontology, while the
actual data is not. The reason that motivates this decision is that we are
using OWL as a way to represent domain characteristics (entity names,
entity properties, relationships, etc.) in the underlying database, however, the actual data is much bigger in size than the schema and also
changes more often than the schema. Therefore, we decided to perform
an analysis of the database schema that allows us to capture the nature of
the domain, while the actual data retrieval process part of each user
query execution is performed directly against the database, after validating that all domain restrictions are being satisﬁed by the user query, for
which we leverage the domain representation captured in the OWL ontology. As a matter of fact, if a user query that complies to all domain restrictions stored in OWL is then executed against the database and the
result indicates that there has been a change in the underlying domain
that makes the OWL be out of date, a new XML–OWL generation process
is triggered automatically, in order to refresh the domain ontology and
keep it accurate at any time with the underlying database.
In order to build the ontology capturing the mapping described
above, {AskMe*} leverages OWLminer's approach [7] which consists
on implementing the algorithm known as Feature and Relation Selection (FARS) [11]. FARS is multi relation feature selection that uses target
tables and attributes in order to create join chains with other tables
using foreign keys as links. The algorithm also uses Levenshtein Distance [12] as a metric for determining whether features are related or
not. This metric is based on closeness between text and feature's value
of the dataset. During this approximate search, every set of input texts
from the set of relations and tables in the given database is analyzed.
The result of this analysis is a set of attributes that meet the constraint:
all members must be columns (properties) within the current database
table (entity) as described in Table 1.
After this ﬁrst-level search has been performed for a given table, the
next steps consist on ﬁnding the cross-table relationships, taxonomic
and non-taxonomic relations and dependencies, in order to make the
ontology grow in this dimension too. The attributes identiﬁed in the
previous step are now used to analyze and discover the set of corresponding tables. As part of this process, also the primary and foreign
keys are identiﬁed.
The output of the feature and relation selection algorithm described
in Fig. 3 is represented as an XML document where the ﬁrst-level nodes
in the tree represent tables. An example of the output of this algorithm
can be found on Fig. 4, based on the Northwind schema described
previously.
Table 1
Database–ontology mapping.
Database component
OWL component
Table/entity
Column
Column metadata:
- Data type.
- Mandatory/non-nullable.
- Nullable.
Class
Functional property
OWL property restriction:
- All values from restriction.
- Cardinality() restriction.
- MaxCardinality() restriction.
M. Llopis, A. Ferrández / Computer Standards & Interfaces 35 (2013) 470–481
475
Fig. 3. Feature and relation selection algorithm.
As can be seen in Fig. 4, all entities in the previously described sample based on Northwind are being captured in a custom XML tree structure. This XML tree contains not only the entity (table) names but also
columns and column types for each table, as well as primary key and
foreign key information.
The next step in the extraction process consists on converting this
XML tree into an ontology that can be used to generate all queryauthoring services information needed. In order to achieve this, the
previous XML tree needs to be converted into OWL and for that the
tree is processed and each table node is converted into an OWL class in
the resulting document. For this sample we will only focus on the foreign
key relations for Category, Product, Supplier, Order and OrderDetail, but
all other object properties would be represented in this OWL as well. In a
similar way, each foreign key is expressed as an OWL object property in
which two primary classes are related using domain and range attributes
(see Fig. 5).
By using this approach, the building process of the OWL ontology is
accelerated and also the use of background knowledge helps to extract
the required knowledge from the database. This approach is considerably
better in cost (time and space) than simply mirroring the database schema to the ontology, based on multiple experiments as described in [11].
5.2. Dynamic parser generation
After building the ontology that captures the overall characteristics
of the database domain, the next step consists on automatically building
the parsers that will help understand users' queries and translate them
into SQL queries to be executed against the database.
As described previously, {AskMe*} is fully auto-reconﬁgurable and it
can be pointed against multiple domains, while at the same time it is
able to offer domain-speciﬁc features such as lexical, semantic and conceptual error detection. The key for these capabilities resides in the ability to perform this dynamic parser generation at all three levels: lexical,
syntactic and semantic.
5.2.1. Lexicon
A lexicon is formed by the set of terms that can be understood by
the system; that is, the set of terms that have a special meaning in a
given NLIDB. This particularly means the set of entities and properties
that have been identiﬁed in the database schema. In the case of
{AskMe*}, these terms are also captured in the domain ontology.
In order to build the lexicon, we combine the set of nouns derived
from the domain knowledge contained in the database, namely the
entity and property names, with a general-knowledge vocabulary
terms, mostly verbs, adjectives and adverbs. We are retrieving these
general-knowledge vocabulary terms from WordNet [13], a large lexical
database of English. This database classiﬁes nouns, verbs, adjectives and
adverbs into sets of cognitive synonyms. Cognitive synonyms (also
named in WordNet as “synsets” [13]) are terms which belong to a different syntactic category (i.e. nouns, verbs, etc.) but represent related
concepts; an example of a set of cognitive synonyms could be “approximation” (noun), “approximated” (adjective) and “approximate”
(verb). Thanks to these cognitive synonyms sets, we are also able to
complement the existing set of domain-speciﬁc nouns (entities and
properties from the domain ontology) with an important amount of
synonyms, into the system lexicon. This aspect is very important, as it
will allow the lexer to automatically accept terms that, even when
they are not the exact noun used in the underlying database schema,
represent the same concept for the user. For example, the database
may contain a property called “Telephone” for the entity “Customer”;
while the user probably refers to it simply as “Phone”. The lexer is
able to recognize “phone” as a valid term as well. In case the term is
not in WordNet, such as “ProductID”, several heuristics are applied
(e.g. by splitting a term into several terms when there is an uppercase
letter in the middle of a term in lower case letters: “Product+ ID”). Finally, the user can review this lexicon in order to add or suppress synonyms (e.g. the term “Emp” is not in WordNet so the user could add
synonyms such as “Employee”).
Following the example of Northwind described previously (see
Fig. 2), the domain-speciﬁc lexicon of nouns built from the ontology
(WordNet synonyms in parentheses) is presented in Table 2. Note
that it contains Entities and Properties as specialized terms, this classiﬁcation is not relevant to the lexicon itself, but will be used lately for semantic analysis as we will describe.
By having the dynamic lexicon generation process, {AskMe*} can
implement an interesting feature such as the lexical error detection
476
M. Llopis, A. Ferrández / Computer Standards & Interfaces 35 (2013) 470–481
Fig. 4. Generated custom XML tree containing all Northwind entities in the sample.
capability. Once the system has been conﬁgured, a user can start typing
in a query and it will be processed by the lexer at ﬁrst. Every time that a
white space has been added to the buffer, the lexer analyzes the term
Fig. 5. OWL capturing classes and relations from Northwind.
that goes immediately before this white space and decides whether it
is valid from the lexical perspective or not. If the term does not appear
in the lexicon, the lexer will tag it as an invalid lexical item. This tag information is automatically retrieved by the query-authoring services
component, that will underline the invalid term with red squiggles in
the query bar, making it evident to the user that the underlined part is
wrong in his query, even before he ﬁnishes writing it and offering
tooltip information about the invalid term (Fig. 6).
In the example shown in Fig. 6, a query about “projects” is being
provided by the user. The system analyzes this query in real time
and determines that “projects” is the entity that needs to be found
in the underlying domain. In order to do this, {AskMe*} looks for
this entity on the lexicon (from Table 2) and determines that it does
not exist. As a result of this, the system notiﬁes the user about the
error in the query by adding red squiggles as an underline to the
term that has not been found in the lexicon. When the user places
the mouse on this term, a tooltip containing additional information
about the error is displayed to the user.
The other query-authoring service offered by {AskMe*} at lexer
level is the completion suggestions mechanism, which offers in a
dropdown pop-up menu, which appears below the word that the
user is currently typing, the set of suggested words that contain the
portion typed by the user as a fragment. This helps the user to remember the exact word that he is trying to write, and also to
autocomplete it, making him write queries faster (Fig. 7).
M. Llopis, A. Ferrández / Computer Standards & Interfaces 35 (2013) 470–481
477
Table 2
Northwind's lexicon with WordNet synonyms.
Entities (and synonyms)
Properties
Product (merchandise, ware)
Product ID (product identiﬁer), product name (product denomination), supplier ID (provider identiﬁer), category ID (type
identiﬁer, class identiﬁer), quantity per unit, unit price (unit cost), units in stock, units in order, reorder level, discontinued.
Order ID (order identiﬁer), employee ID (worker identiﬁer), order date (command date), required date (due date), shipped
date, ship via, freight (cargo), ship name, ship address, ship city (town, municipality), ship region, ship postal code (ship zip
code), ship country.
Order ID (order identiﬁer), product ID (product identiﬁer, ware identiﬁer, merchandise identiﬁer), unit price (unit cost),
quantity (amount), discount (deduction, reduction, allowance).
Category ID (type identiﬁer, class identiﬁer), category name (category denomination, class name, class denomination, type
denomination, type name), description (representation, information), picture (photo, photograph, image).
Supplier ID (dealer identiﬁer, provider identiﬁer, vendor identiﬁer), company name (business name, enterprise name),
contact name (correspondent name), contact title (correspondent appellation), address (direction, domicile), city (town,
municipality), region (territory, district), postal code (zip code), country (nation, state), phone (telephone), fax (facsimile),
home page.
Order (command)
Order details
Categories (types, classes)
Suppliers (dealers, providers, vendors)
5.2.2. Syntactic parser
{AskMe*} leverages the Link Grammar Parser [10] for the core syntactic parsing operations. The Link Grammar Parser is a syntactic parser
of English, based on link grammar, an original theory of English syntax.
Given a sentence, the system assigns to it a syntactic structure, which
consists of a set of labeled links connecting pairs of words. The parser
also produces a “constituent” representation of a sentence (showing
noun phrases, verb phrases, etc.), like the one shown in Fig. 8.
The parser has a dictionary of about 60,000 word forms. It has coverage of a wide variety of syntactic constructions, including many rare and
idiomatic ones. The parser is robust; it is able to skip over portions of the
sentence that it cannot understand, and assign some structure to the
rest of the sentence. It is able to handle unknown vocabulary, and
make intelligent guesses from context and spelling about the syntactic
categories of unknown words. It has knowledge of capitalization, numerical expressions, and a variety of punctuation symbols.
A full description of the Link Grammar is out of scope for this article,
however it is noteworthy that, by using the Link Grammar API, the totality of this parser's capabilities can be leveraged in {AskMe*}, thus
enabling our efforts to focus on other innovative areas such as the combination of query-authoring services within the proposed NLIDB, as
well as the portability of the system.
The concurrency mechanisms implemented on top of the Link
Grammar Parser API are based on event notiﬁcations for all the syntactic
parser events: every time the parser processes and tags a fragment
of the input query an event is generated, containing information
about the syntactic classiﬁcation for each token. This is a key component for driving the syntactic query-authoring service that {AskMe*}
implements: syntactic error squiggles (green). These squiggles warn
the user about syntactic errors in a query, even before the query
authoring has been fully completed (Fig. 9).
5.2.3. Semantic parser
The third parsing step performed to an input query is the semantic
parsing. In {AskMe*}, given its dynamic domain-speciﬁc knowledge
acquisition nature, it may be feasible to ﬁnd that a certain query is
valid according to the lexical and syntactic analysis, but does not represent a concept that ﬁts into the current domain. For example, the
query “Name and date of the customers from the country where
most orders were made in 2010” could be lexically and syntactically
valid, all the terms in the sentence may be present in the dynamic lexicon, and the syntactic construction and order of words match one of
the valid categories of phrases in the Link Grammar Parser. However,
as you will notice, the concept of Date may not exist for the entity
Customer. This is deﬁnitely an error in the input query, a semantic
error.
In order to detect this kind of errors, the semantic parsing step is
applied to the input query. The semantic parser is guided by the use
of semantic templates which are ﬁlled with the concepts captured
in the domain ontology. The set of rules that are modeled by these
dynamically-generated semantic templates are:
- Entity–Property correspondence: This rule enforces that all the requested properties for an entity in a query are indeed part of the
current domain schema.
- Cross-entities relationships: This rule is applied to queries that contain multiple sub-phrases, and its purpose is to enforce that there
exists a foreign-key relationship in the database schema between
the entities in the query.
Fig. 6. Lexical error squiggles and tooltip error information.
Fig. 8. Constituent tree for the query “Suppliers that are not in United States”.
Fig. 7. Completions for supplier properties starting with “Co”.
Fig. 9. Syntactic error squiggles and tooltip error information.
478
M. Llopis, A. Ferrández / Computer Standards & Interfaces 35 (2013) 470–481
Table 3
Examples of semantic rules behavior.
Schema relationships
Input query
Result
Reason
Customers–orders and orders–products
Products from customers whose last name is Llopis.
Fail
Customers–orders and orders–products
Products that were ordered by more than 100 customers
in 2010.
Success
There is not an existing relationship between products and
customers in this domain.
Products are related to orders, and every order references a
customer.
6. Evaluation
Table 4
Examples of template-based semantic error messages.
Inconsistency type
Error description message
Entity–property
mismatch
Missing relationship
“Entity A” does not contain a property called “Property A”
(where Entity A and Property A are the values in a query).
“Entity A” and “Entity B” are not related to each other.
- Entities' default attributes: There are cases in which the query is
valid from a lexical, syntactic and semantic analysis, but it does
not specify which attributes must be present in the result. For instance, the query in Table 3, “Products that were ordered by more
than 100 customers in 2010”, does not specify which product
properties we are interested in. This semantic rule does not invalidate a given input query, but rather imposes that the resulting
SQL query must return all the product attributes that are
not-null in the database schema, such as the Product ID, Product
Name, Price, etc. This information, as we explained previously,
was captured in the domain ontology as an OWL cardinality metadata attribute.
Some examples of these rules are presented and analyzed in Table 3.
In the case that one or more of these semantic requirements are not met
by the input query, the semantic analysis would report errors. These errors are notiﬁed to the system in the form of events. The query-authoring
services component is subscribed to these semantic events, in the same
way as it is to the lexical and syntactic ones, and would therefore notify
the user in a visual way about the issue, by highlighting the portions of
the input query that cause the inconsistency. When the user hovers
with the mouse over these highlighted regions, a tooltip containing a
description of the inconsistency comes up. This description is also
template-based, see Table 4.
In order to evaluate the effectiveness of our approach, we are applying three different experiments:
- Accuracy in query interpretation for a concrete domain.
- Effectiveness of query authoring services for a concrete domain.
- Portability of the system across domains.
6.1. Accuracy in query interpretation for a concrete domain
The ﬁrst experiment consists on evaluating the accuracy of our
query interpretation process in a concrete domain. For that purpose,
we evaluated our system using data from the Air Travel Information
(ATIS) domain [15]. The ATIS database is based on air travel data
obtained from tile Ofﬁcial Airline Guide (OAG) in June 1992 and current
at that time. The database includes information for 46 cities and 52 airports in the US and Canada. The largest table in the expanded database,
the ﬂight table, includes information on 23,457 ﬂights. A complete reference about the ATIS domain can be found at [15]. The selection of
ATIS was motivated by three concerns. First, a large corpus of ATIS
sentences already exists and is readily available. Second, ATIS provides
an existing evaluation methodology, complete with independent training and test corpora, and scoring programs. Finally, evaluation on a
common corpus makes it easy to compare the performance of the system with those based on different approaches. Our experiments utilized
the 448 context independent questions in the ATIS “Scoring Set A”,
which is one of the sets of questions of the ATIS benchmark, generally
the most commonly used for the evaluation of other systems, and the
one that lets us compare with most of them.
{AskMe*} produced an accuracy rate of 94.8%. System accuracy
rate is calculated based on the equation in Fig. 11.
Fig. 10. Examples of queries from ATIS and results obtained with {AskMe*}.
Fig. 11. System accuracy equation.
M. Llopis, A. Ferrández / Computer Standards & Interfaces 35 (2013) 470–481
Table 5
Accuracy comparison using ATIS between various NLIDB systems.
HEY [16]
SRI [17]
PRECISE [3]
{AskMe*}
MIT [18 ]
AT&T [19]
92.5
93
94
94.8
95.5
96.2
Table 5 contains a comparison of the results obtained by {AskMe*} in
the ATIS benchmark, to other state-of-the art systems. In some cases, as
displayed in Fig. 10, some of the failures are due to domain-speciﬁc information or query shortcuts (such as “tomorrow” →Date–Time, etc.)
which {AskMe*} does not support yet because other functional work
was prioritized higher, such as domain-portability or query-authoring
services.
These results conﬁrm that, even when {AskMe*} is a fully
reconﬁgurable system that can be targeted to multiple knowledge domains, its accuracy results against a particular domain are very similar
to the results for other state-of-the-art systems which are tailored to
the underlying domain.
6.2. Effectiveness of query authoring services for a concrete domain
The second experiment that we are using to evaluate our system is
measuring how using query-authoring services improves the overall
usability of the system, by enabling early detection of query errors. In
order to do that, we asked a set of ten users to write ﬁfty queries per
user in a given domain. These users were completely new to the system
and they did not have any previous knowledge about the underlying
domain. We gave them an initial description of the Northwind database,
without schema representation or concrete entity/property names, and
let them query the system in an exploratory way. This description was
as simple as explaining them that the database contained information
about products, product categories, orders, order details and suppliers.
During this process, users are very likely to introduce mistakes in
most of the queries they come up with for the ﬁrst time. We captured
traces for all of these queries and recorded in which stage of the parsing
process they were raised.
Our results indicate that, from the set of ﬁfty input queries per user,
almost 90% of them contained errors, from which roughly the 80% of
these wrong queries could be detected before they were translated
into SQL and, therefore, before they were being executed against the database. This fact results in signiﬁcant improvements in terms of latency
time for wrong queries, since thanks to the query-authoring services
that {AskMe*} implements, they are locally detected by the system instead of being translated into SQL and executed against the database.
The results of this experiment show that while an important amount
of errors (23%) are due to lexical errors (usually things like typos), and
26% of them correspond to syntactic errors (mostly ill-formed
sentences in the English language), most of the errors are due to semantic errors (51%). In order to help minimizing the probability of having
lexical errors in a query, the system provides auto-completion for
479
entities and properties, and also auto-correction of typos based on
distance-editing algorithms. Table 6 shows some of the most interesting
queries written by users and how {AskMe*} guided them towards the
right query.
In terms of semantic error distribution classiﬁed by the main semantic rules that {AskMe*} implements, this evaluation determines that 51%
of them fall in the rule of entity–property mismatch, thus being the
most common semantic error, 41% of errors correspond to queries trying to refer to a missing relationship that does not exist in the domain
and the remaining 8% represents semantic errors due to the query specifying invalid values in property conditions.
6.3. Portability of the system across multiple domains
Our third experiment focuses on evaluating the portability of the
system. For this purpose, we have created a script that simulates the
user actions through the visual interface. In this test, the system will
be connected to three different databases that we have previously
conﬁgured: ATIS, AdventureWorksDB [20] and Northwind [14]. For
each of these database connections, a custom benchmark made up of
ﬁfty different queries that are relevant to the correspondent domains
(ATIS as described in the ﬁrst experiment, Northwind as described
through different sections of this paper and Adventure Works shown
in Fig. 12) is executed against the system, asserting that the queryauthoring services work as expected and that the resulting SQL query
is generated as expected as well.
Finally, the test also evaluates the behavior when the system is
connected to a database that had been already connected before,
checking that the ontology generation process is not kicked-off again,
but rather the existing ontology for that source is pulled back from
the store and brought into the current connection context. The results
of this experiment indicate that there is not any lose in accuracy after
a reconnection to a different database, and the results are the same as
if the system was only connected to a single database for its lifetime.
This means that the same results observed in the ﬁrst and second experiments apply to the scenario of multiple database reconnections without degrading the overall accuracy of the system after connecting to
multiple domains.
7. Conclusions and future work
{AskMe*} is an adaptive natural language interface and environment
system to query arbitrary databases. Internally, the system leverages an
ontology-based approach in which a new ontology is auto-generated
every time the system is connected to a different database. Once this ontology has been generated, the rest of the system – domain-speciﬁc
grammar, query-authoring services, etc. – reconﬁgures itself based on
the set of language terms and relationships contained in the ontology.
This automatic reconﬁguration enables an effective lexical, syntactic
and semantic validation of an input query, which will result in a higher
Table 6
Sample queries ﬁxed by user interaction with query services.
User query
Corrected query
List all categories of products
List all categories of products
How was it ﬁxed?
Auto-correction of typos and lexical level (distance-editing
algorithm comparing to “known” valid tokens)
Products from customers whose last name is Llopis Products ordered by customers whose last name is Llopis A semantic error tooltip is displayed in user query; they learn
that the relation products–customers is transitive, via order
details and orders (which contain customer ID).
The user follows guidance in order to end up with a valid
query.
Products from whose last name is Llopis
Products from customers whose last name is Llopis
The syntactic parser detects a syntactic error when the user
types “whose” as the entity is missing. By providing an error
squiggle and tooltip, the user is able to identify the missing
piece on the query and correct it in order to ﬁx and complete
the rest of the query.
480
M. Llopis, A. Ferrández / Computer Standards & Interfaces 35 (2013) 470–481
Fig. 12. Adventure Works database simpliﬁed schema used in the second experiment.
accuracy of the system. The evaluation process showed how, despite the
system is not speciﬁc to any concrete domain, the result of 94.8% of
accuracy against the ATIS benchmark is relatively good compared to
other existing state-of-the-art systems, both domain-dependent and
independent.
Furthermore, this approach enables full portability of the system
without any reconﬁguration steps needed for the system to successfully
execute queries against any new database. Extra mapping reconﬁgurations, user preferred ways to refer to elements of the domain-model,
can be done through easy user interface gestures such as right-clicking
elements (i.e. words) of a given query. We believe that the simpliﬁcation
of the reconﬁguration process when connecting to new database
schemas is a very important step towards the democratization of NLIDBs
in real world setup, as it enables non-technical users to be able to fully
control the system through its entire lifecycle.
In addition, it enables the construction of a customized textual query
environment in which a set of query-authoring services can be provided
to the user, to help authoring and disambiguating queries. These
query-authoring services play a fundamental role in systems' usability,
making it possible to early detect query errors, as demonstrated in the
evaluation section, where we observed that around the 80% of the
queries that contained errors could be detected before they were actually translated into SQL, resulting in a more efﬁcient, lower-latency,
user-interactive system. The classiﬁcation of these errors based on the
parsing stage in which they are detected, as shown in the evaluation,
gives us the possibility to selectively focus on improving the quality
and functionality of query-authoring services at each stage of the parsing process, in order to maximize the investment in relation to the gain
of the overall user experience. Finally, just remark that {AskMe*} helps
the user to make valid queries as well by automatically distinguishing
between linguistic and conceptual failures.
Based on our very positive evaluation results for early error detection, thanks to the use of query-authoring services, as future work, we
are trying to maximize this beneﬁt by experimenting with new
query-authoring services and improving the existing ones. Moreover,
we will add anaphora and ellipsis resolution capabilities in {AskMe*}.
Anaphora and ellipsis resolution are an active research ﬁeld in the
space of NLIDBs; this capability enables users to have the possibility
to dramatically abbreviate the number of words to be written when
asking different questions about different aspects of the same entity,
which will result, again, in another important usability shift for
{AskMe*} [21]. The main drawback of using this kind of resolution is
its low precision. However, we plan to overcome the low precision
of anaphora and ellipsis resolution by means of beneﬁting of query
authoring services.
Acknowledgments
This research has been partially funded by the Valencia Government under Project PROMETEO/2009/119, and by the Spanish Government under Project Textmess 2.0 (TIN2009-13391-C04-01) and
TIN2012-31224.
References
[1] S. Abiteboul, V. Hull, R. Viannu, Foundations of Database Systems, Addison Wesley,
1995.
[2] L. Androutsopoulos, Natural language interfaces to databases—an introduction,
Journal of Natural Language Engineering 1 (1995) 29–81.
[3] A. Popescu, A. Armanasu, O. Etzioni, D. Ko, A. Yates, PRECISE on ATIS: Semantic
Tractability and experimental results, in: Proceedings of the National Conference
on Artiﬁcial Intelligence – AAAI, 2004, pp. 1026–1027.
[4] N. Stratica, L. Kosseim, B.C. Desai, Using Semantic Templates for a natural language interface to the CINDI virtual library, Data & Knowledge Engineering
Journal 55 (1) (2004) 4–19.
[5] D. Jurafsky, J. Martin, Speech and Language Processing: An Introduction to Natural
Language Processing, Speech Recognition and Computational Linguistics, Prentice
Hall, 2000.
[6] C. Manning, H. Schutze, Foundations of Statistical Natural Language Processing,
MIT Press, 1999.
[7] H. Santoso, S. Haw, Z.T. Abdul-Mehdi, Ontology extraction from relational database:
concept hierarchy as background knowledge, Knowledge-Based Systems 24 (3)
(2011) 457–464.
[8] M. Watson, NLBean(tm) version 4: a natural language interface to databases,
www.markwatson.com.
[9] R. Bartolini, C. Caracciolo, E. Giovanetti, A. Lenci, S. Marchi, V. Pirrelli, C. Renso, L.
Spinsanti, Creation and use of lexicons and ontologies for NL interfaces to databases, in: Proceedings of the International Conference on Language Resources
and Evaluation, vol. 1, 2006, pp. 219–224.
[10] D. Sleator, D. Temperley, Parsing English with a link grammar, in: Proceedings of
the Third International Workshop on Parsing Technologies, 1991.
[11] B. Hu, H. Liu, J. He, X. Du, FARS: multi-relational feature and relation selection approach for efﬁcient classiﬁcation, in: Proceedings of the Advance Data Mining and
Application Conference, vol. 1, 2008, pp. 73–86.
M. Llopis, A. Ferrández / Computer Standards & Interfaces 35 (2013) 470–481
[12] V.I. Levenhstein, Binary Codes capable of correcting deletions, insertions, and reversals, Soviet Physics – Doklady 10 (8) (1966) 707–710.
[13] G.A. Miller, WordNet: a lexical database for English, Communications of the ACM
Journal - CACM 38 (11) (1995) 39–41.
[14] Northwind, http://msdn.microsoft.com/en-us/library/aa276825(SQL.80).aspx.
[15] M. Bates, S. Boisen, J. Makhoul, Developing an evaluation methodology for spoken
language systems, in: Proceedings of the Speech and Natural Language Workshop, vol. 1, 1990, pp. 102–108.
[16] H. Young, S. Young, A data-driven spoken language understanding system, in:
Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding, vol. 1, 2003, pp. 583–588.
[17] R.C. Moore, D.E. Appelt, SRI's experience with the ATIS evaluation, in: Proceedings
of the Workshop on Speech and Natural Language, 1990, pp. 147–148.
[18] V. Zue, J. Glass, D. Goddeau, D. Goodine, L. Hirschman, M. Phillips, J. Polifroni, S.
Seneff, The MIT ATIS system: February 1992 Progress Report, in: Proceedings of
the Workshop on Speech and Natural Language, 1992, pp. 84–88.
[19] D. Hindle, An analogical parser for restricted domains, in: Proceedings of the
Workshop on Speech and Natural Language, 1992, pp. 150–154.
[20] Adventure Works, http://msdn.microsoft.com/en-us/library/ms124659.aspx.
[21] J.L. Vicedo, A. Ferrandez, Importance of pronominal anaphora resolution in question answering systems, in: Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, 2000, pp. 555–562.
[22] N. Nihalani, S. Silakari, M. Motwani, Natural language interface for database: a
brief review, International Journal of Computer Science Issues 8 (2) (2011)
600–608.
[23] M. Llopis, A. Ferrandez, {AskMe*}: Reducing the costs of adoption, portability and
learning process in a natural language interface to query databases, in: Proceedings
of the 8th International Workshop on Natural Language Processing and Cognitive
Science, vol. 1, 2011, pp. 75–89.
[24] W.A. Woods, R.M. Kaplan, B.N. Webber, The Lunar Sciences Natural Language Information System: Final Report, in: BBN Report, 2378, 1972.
[25] Y.W. Wong, Learning for semantic parsing using statistical machine translation
techniques, in: Technical Report UT-AI-05-323, University of Texas, Austin, 2005.
[26] Y. Li, H. Yang, H.V. Jagadish, NALIX: an interactive natural language interface for
querying XML, in: Proceedings of the International Conference on Management
of Data, 2005, pp. 900–902.
[27] R. Valencia-Garcia, F. Garcia-Sanchez, D. Castellanos-Nieves, J.T. Fernandez-Breis,
OWLPath: an OWL ontology-guided query editor, IEEE Transactions on Systems,
Man, and Cybernetics—Part A: Systems and Humans 41 (1) (2011) 121–136.
[28] D. Damljanovic, M. Agatonovic, H. Cunningham, Natural language interfaces to
ontologies: combining syntactic analysis and ontology-based lookup through the
481
user interaction, in: Proceedings of the 7th Extended Semantic Web Conference,
2010, pp. 106–120.
[29] P.R. Devale, A. Deshpande, Probabilistic context free grammar: an approach to generic interactive natural language interfaces to databases, Journal of Information,
Knowledge and Research in Computer Engineering 1 (2) (2010) 52–58.
[30] H.R. Tennant, K.M. Ross, M. Saenz, C.W. Thompson, J.R. Miller, Menu-based natural language understanding, in: Proceedings of the 21st Annual Meeting of ACL,
1983, pp. 151–158.
Miguel Llopis is a Ph.D. Student at the Department of Software and Computing Systems in the University of Alicante
(Spain). His research interests include: Natural Language
Processing, Question Answering and Domain-Speciﬁc
Languages. He has written various papers in journals and
participated in international conferences related to his
research topics. Besides his Ph.D. studies and research
activity, Miguel works as a Program Manager in the
SQL Server Team at Microsoft Corporation (Redmond,
Washington). Contact him at mll9@alu.ua.es.
Antonio Ferrández is a Full-time Lecturer at the Department of Software and Computing Systems in the University of Alicante (Spain). He obtained his Ph.D. in Computer
Science from the University of Alicante (Spain). His research
interests are: Natural Language Processing, Anaphora Resolution, Information Extraction, Information Retrieval and
Question Answering. He has participated in numerous
projects, agreements with private companies and public organizations related to his research topics. Finally, he has supervised Ph.D. Thesis and participated in many papers in
Journals and Conferences related to their research interests.
Contact him at antonio@dlsi.ua.es.