Evaluation

To obtain a test set of web pages that is not biased by our own choices, and to learn what other people assume actually constitutes a web table, we asked students from Robert Baumgartner's class on Web data extraction and integration to provide us with a selection of what they consider to be web tables of interest (Here is the announcement). In total, 63 students submitted 347 web pages, 269 of which contained web tables, 493 tables in all. Whether something is a web table was decided by the member of our team who checked the respective web page against the definition given in the WWW paper. This set of web pages was made permanent ("dumped") using our Firefox extension WebPagedump and annotated using our ground truthing method, which is outlined in the WWW paper and will soon be described in more detail. The ground truth set is around 200 MB in size and is available to any interested researcher (it will be put online here as soon as the ground truthing method itself is published).