Current projects

Approximate lifted inference: a PTIME solution for probabilistic query evaluation
Evaluating queries over probabilistic databases needs to calculate the "reliability" of a query and is #P-hard in general. We propose a new efficient approach for approximate inference which (i) is always in PTIME for every query and data instance (and even expressible in relational algebra), (ii) always results in a unique and well-defined score in [0,1], (iii) is an upper bound to query reliability and both are identical for safe queries, and (iv) is inspired by existing widely deployed propagation approaches for ranking in networks.

Semi-supervised Learning with Heterophily (SSLH)
SSLH is a family of linear inference algorithms that generalize existing graph-based label propagation algorithms by allowing them to propagate generalized assumptions about ``attraction'' or ``compatibility'' between classes of neighboring nodes (in particular those that involve heterophily between nodes where ``opposites attract''). Importantly, this framework allows us to reduce the problem of estimating the relative compatibility between nodes from partially labeled graph to a simple optimization problem. The result is a very fast algorithm that -- despite its simplicity -- surprisingly effective algorithm.

Bootstrapping Virtuous Active Learning cYcles (BOVALY)
We are developing a novel Quiz Item Management System (QIMS) that engages students throughout the semester in a sequence of learning activities. Each of these learning activities fosters critical thinking, and together they lead to knowledge artifacts in the form of quiz items. These artifacts go through cycles of creation, improvements, selections and deduplications and result in re-usable learning artifacts with known provenance and item response characteristics. Combined, the approach thus addresses 5 important challenges for technology-enhanced learning (e.g., MOOCs): 1. Continuous practice and assessment 2. Practice of "generative" skills, 3. "Auto-calibrated" peer evaluation, 4. Re-usable learning artifacts, and 5. Optimal use of instructors' time.

Causality Causality in databases: helping users understand unexpected query results
When queries return unexpected results, users are interested in explanations for what they see. Recently, the database community has proposed various notions of lineage of query results, such as why or where provenance, and very recently, explanations for non-answers. We propose to unify and extend existing approaches for provenance and missing query result explanations ("positive and negative provenance") into one single framework, namely that of understanding causal relationships in databases.

QueryViz QueryViz: helping users understand and re-use existing SQL queries
Re-use of existing SQL queries is difficult since queries are complex structures and not easily understood. QueryViz is a method for translating SQL into a visual formalism that helps users intuitively understand their meaning. It is inspired by the First-Order logic representation of a query and combines succinctness features of both tuple and domain relational calculus, thereby providing a minimal yet expressive visual vocabulary. The project page has a link to the yet incomplete online demo.

Past projects

BeliefDB BeliefDB: managing conflicting beliefs inside a standard DBMS
Data management is becoming increasingly social. Questions arise about how to best model inconsistent and changing opinions in a community of users inside a DBMS. First, we propose an annotation semantics based on modal logic, which allows users to engage in a structured discussion. Second, we propose a principled solution to the automated conflict resolution problem. While based on the certain tuples of all stable models of a logic program, our algorithm is still in PTIME.

Visualized Element Nodes Table Extraction VENTex (Visualized Element Nodes Table Extraction)
Web tables contain a vast amount of semantically explicit information, which makes them a worthwhile target for automatic information extraction and knowledge acquisition from the Web. However, extracting and interpreting web tables is difficult, because of the variety in which tables can be encoded in HTML and style sheets. Our approach VENTex can extract arbitrary web tables by focusing on table representations in the "Visual Web" instead of the "Syntactic Web" as used by previous approaches.

Unique Recall Unique Recall: modeling the limits of knowledge acquisition from redundant data
Often in life, 20% of effort can achieve roughly 80% of the desired effects. Interestingly, this does not hold in the context of web information extraction. We develop an analytic model for information acquisition from redundant data (in the limit of infinitely large data), and use it to derive a new 40-20 rule (crawling 20% of the Web will help us learn less then 40% of its content). We further describe a new family of power law functions that remains invariant under sampling, and use its properties to give a second rule of thumb.

Collaboratative Query Management CQMS (Collaborative Query Management System)
While modern database management systems (DBMSs) provide sophisticated tools for managing data, tools for managing queries over these data have remained relatively primitive. As analysts today share and explore increasingly large volumes of data, they need assistance for repeatedly issuing their queries. This project develops the essential techniques for a Collaborative Query Management System (CQMS) which provides new capabilities ranging from query browsing to automatic query recommendations.

Deregulation of Electricity Markets Economics of deregulated electricity markets
We revisit commonly accepted assumptions about the economics of deregulated electricity markets. First, we disprove that, in theory and under the condition of perfect information, decentralized and centralized unit commitment would lead to the same power quantities trade and the same optimal social welfare. Second, we show that a generator owner's optimum bid sequence for an auction market is generally above marginal cost, even where absolutely no abuse of market power is involved.

Graz Cycle Graz Cycle: a zero emission power cycle of highest efficiency
The Graz Cycle is a thermodynamical combustion cycle that allows to retain and capture CO2 emissions stemming from combustion processes. It burns fossil fuels with pure oxygen which enables the cost-effective separation of the combustion CO2 by condensation. The efforts for the oxygen supply in an air separation plant are partly compensated by cycle efficiencies far higher than 65%. The combined efficiency is equal in thermodynamic performance to any other proposal in the field of Carbon Capture and Storage (CCS).