Towards a Dichotomy for Minimally Factorizing the Provenance of Self-Join Free Conjunctive Queries

Proposes the problem of finding the minimal-size factorization of the provenance of queries. Shows it to be a natural generalization and to recover all known PTIME cases of the problem of exact probabilistic inference.
Makes first steps towards a dichotomy of the data complexity of the problem.

We consider the problem of finding the minimal-size factorization of the provenance of self-join-free conjunctive queries, i.e., we want to find an equivalent propositional formula that minimizes the number of variable occurrences. Our work is partly motivated from probabilistic inference where read-once formulas are known to allow exact PTIME solutions and non-read-once formulas allow approximate solutions with an error that depends on the number of repetitions of variables. We embark on the challenge of characterizing the data complexity of this problem and show its connection to the query resilience problem. While the problem is NP-complete in general, we develop an encoding as max-flow problem that is guaranteed to give the exact solution for several queries (and otherwise approximate minimizations). We show that our encoding is guaranteed to return a read-once factorization if it exists. Our problem and approach is a complete solution that naturally recovers exact solutions for all known PTIME cases, as well as identifying additional queries for which the problem can be solved in PTIME.