- Towards Domain-Independent Information Extraction from Web Tables,
Wolfgang Gatterbauer, Paul Bohunsky, Marcus Herzog, Bernhard Krüpl, Bernhard Pollak.
In Proceedings of the 16th International World Wide Web Conference (WWW 2007), pp. 71-80, Banff, Canada, May 2007. ACM Press.
[Paper, bib]
- Creating Permanent Test Collections of Web Pages for Information Extraction Research,
Bernhard Pollak, Wolfgang Gatterbauer.
In Proceedings of the 33rd International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM 2007), Volume II, pp. 103-115, Prague, Czech Republic, January 2007. Institute of Computer Science, Academy of Sciences of the Czech Republic.
[Paper, Presentation, Poster, bib]
WebPageDump: an essential part of the ground truthing system. Details on the method of ground truthing arbitrary web pages soon to follow.
- Table Extraction Using Spatial Reasoning on the CSS2 Visual Box Model,
Wolfgang Gatterbauer, Paul Bohunsky.
In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI 2006), pp. 1313-1318, Boston, Massachusetts, July 2006. AAAI, MIT Press.
[Paper, Presentation, bib]
VENTrec: the precursor of VENTex.