SPEX: XPath Evaluation against XML Streams

SPEX stands for Streamed and Progressive Evaluation of XPath queries. Streamed means the XML input is processed via a SAX-based interface and with a low memory footprint. Progressive means the query answer is delivered as soon as possible.
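To make the two terms concrete, here is a minimal sketch in Python, assuming the standard `xml.sax` module as the SAX interface. It is not SPEX's evaluator: the class `ChildPathMatcher`, the helper `run_demo`, and the sample document are hypothetical names introduced only for illustration, and the query is restricted to a fixed path of child steps. The handler keeps only the names of the currently open elements, so its memory use grows with the document depth rather than the document size (streamed), and it reports each match from within the start-tag callback instead of waiting for the end of the input (progressive).

```python
import xml.sax


class ChildPathMatcher(xml.sax.ContentHandler):
    """Illustrative only: matches a fixed path of child steps, e.g. /feed/item."""

    def __init__(self, path, report):
        super().__init__()
        self.path = list(path)      # element names from the root downwards
        self.report = report        # callback invoked for every event of interest
        self.open_elements = []     # names of the currently open elements

    def startElement(self, name, attrs):
        self.open_elements.append(name)
        if self.open_elements == self.path:
            # Progressive: the match is reported when its start tag arrives.
            self.report(("match", attrs.get("id")))

    def endElement(self, name):
        self.open_elements.pop()

    def endDocument(self):
        self.report(("end-of-stream", None))


def run_demo():
    events = []
    doc = b'<feed><item id="1"><title/></item><note/><item id="2"/></feed>'
    xml.sax.parseString(doc, ChildPathMatcher(["feed", "item"], events.append))
    return events


if __name__ == "__main__":
    for event in run_demo():
        print(event)
```

In the recorded event list, both matches appear before the end-of-stream marker, which is the behaviour the term progressive refers to.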

Motivation

As XML becomes a de facto standard for data on the Web, querying XML data is becoming an issue of great importance and a real research challenge. Several solutions to this challenge have been proposed in recent years, described in W3C working drafts and research papers. At the core of this common effort is a navigational approach to locating information in XML data, embodied in a powerful language called XPath [1].
Initially, XML data was treated much like other, non-XML Web documents, for which size does not play an important role. It was considered straightforward to load all the XML data into memory and then query it. Approaches like DOM [2] exemplify this. Nowadays, huge XML repositories are available, and size matters. Moreover, unbounded XML streams are gaining importance due to their practical applications, from publish-subscribe systems to data integration over the Web. As the XML community quickly realised, random access to XML data precludes efficient query processing while offering no significant gain, and sometimes makes query processing impossible altogether.
Stream-based processing is an attempt to overcome the pitfalls of traditional DOM-based processing.

Progressive processing, i.e. stream-based processing that generates partial results as soon as they are available, even before the input has been completely read, allows a more efficient evaluation in certain contexts.

XPath Rewriter

XPath provides constructs for both forward and backward navigation in the XML document tree. Backward navigation precludes efficient evaluation of XPath expressions on XML data streams, because evaluating it requires buffering stream fragments that have already been encountered. Therefore, we rewrite XPath expressions involving backward navigation into equivalent forward-only expressions.
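As an illustration of what such a rewriting achieves, consider one pair of equivalent queries: `/descendant::a/parent::b`, which uses the backward axis `parent`, and `/descendant::b[child::a]`, which uses forward axes only. The pair is chosen here as an example; the rewriter's actual rules are not listed on this page. The Python sketch below evaluates the backward form on a fully loaded tree and the forward form in a single SAX pass, so the two answers can be compared. All names (`DOC`, `backward_in_memory`, `ForwardMatcher`, `forward_streaming`) are hypothetical, and the single-pass handler is a hand-written stand-in, not the SPEX evaluator.

```python
import xml.sax
import xml.etree.ElementTree as ET

DOC = (b'<r><b id="1"><a/></b><b id="2"><c><a/></c></b>'
       b'<c id="3"><a/></c><b id="4"><c/><a/><a/></b></r>')


def backward_in_memory(doc):
    """/descendant::a/parent::b, evaluated on a tree held entirely in memory."""
    root = ET.fromstring(doc)
    # ElementTree has no parent pointers, so build a child-to-parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    hits = []
    for node in root.iter("a"):
        parent = parent_of.get(node)
        if parent is not None and parent.tag == "b" and parent not in hits:
            hits.append(parent)
    return [p.get("id") for p in hits]


class ForwardMatcher(xml.sax.ContentHandler):
    """/descendant::b[child::a], evaluated in one pass over the stream."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one [name, id, reported] frame per open element
        self.hits = []

    def startElement(self, name, attrs):
        if name == "a" and self.stack:
            parent = self.stack[-1]
            if parent[0] == "b" and not parent[2]:
                parent[2] = True              # report each b at most once
                self.hits.append(parent[1])
        self.stack.append([name, attrs.get("id"), False])

    def endElement(self, name):
        self.stack.pop()


def forward_streaming(doc):
    handler = ForwardMatcher()
    xml.sax.parseString(doc, handler)
    return handler.hits


if __name__ == "__main__":
    print(backward_in_memory(DOC))
    print(forward_streaming(DOC))
```

The forward form only ever needs the frames of the currently open elements: each `b` is a result candidate from its start tag until either an `a` child arrives or the `b` closes. The backward form, read literally, asks for the parent of an `a` after the parent's start tag has already gone by, which is why the in-memory version has to build a parent map over the whole document.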

Streamed XPath Evaluator

SPEX Viewer: A Graphical User Interface for SPEX

SPEX has a graphical user interface, the SPEX Viewer. For every incoming message from the stream, it shows the user the processing status in terms of current states, stack configurations, and result candidates. Find a tutorial here.

References


Last modified: Tue Jan 3 15:42:01 CET 2006