EPFL IC Systems Seminar

S3K - Seeking Statement-Supporting top-K Witnesses


Traditional information retrieval techniques based on keyword search help to identify a ranked set of documents relevant to a user’s interest. When searching for evidence of general statements, e.g. to verify their correctness, the top ranks returned by these approaches often contain many documents that do not meet the user’s intention, because the connection between individual keywords is often lost and alternative expressions are considered only on a per-keyword basis.

In this talk, I will discuss our document retrieval approach for ‘statement search’ based on extraction systems.

One natural application is retrieving documents that support a piece of extracted information, in order to verify its correctness and thus to evaluate the extraction method. A more general application, in combination with an extraction system, is fact-based indexing of a document corpus, which allows general users to perform statement-based document retrieval on that corpus.

The ranking model proposed in this approach is based on statistical language models and considers aspects such as the authority of a document and the confidence in the textual pattern representing the queried information. By using an extraction system as the indexing mechanism, our approach can recognize a variety of different ways to express a given semantic statement.


Steffen Metzger obtained his bachelor’s degree in computer science from Saarland University, Saarbrücken, Germany. After a two-semester visit at the University of Edinburgh, Edinburgh, Scotland, he returned to Saarbrücken to complete his master’s degree in computer science. He is interested in a broad range of topics: he wrote his bachelor’s thesis in the area of automatic theorem proving under the supervision of Jörg Siekmann, while his master’s thesis, supervised by Philipp Slusallek, was concerned with crowd movement simulation in computer games.

Since January 2010 he has been a researcher at the Max Planck Institute for Informatics, where his research is mainly concerned with user-centric information extraction, ontology maintenance and ontology-based applications. His work at the Databases and Information Systems Group, headed by Gerhard Weikum, led to his recently completed PhD thesis, which was supervised by Ralf Schenkel.