Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries

TODS (to appear)

Project page: Any-k

Builds upon our PODS'21 work on the complexity of direct access to a ranked list of answers to a database query. Establishes a complete dichotomy of the selection problem with lexicographic orders.

We study the question of when we can answer a Conjunctive Query (CQ) with an ordering over the answers by constructing a structure for direct (random) access to the sorted list of answers, without actually materializing this list, so that the construction time is linear (or quasilinear) in the size of the database. In the absence of answer ordering, such a construction has been devised for the task of enumerating query answers of free-connex acyclic CQs, so that the access time is logarithmic. Moreover, it follows from past results that within the class of CQs without self-joins, being free-connex acyclic is necessary for the existence of such a construction (under conventional assumptions in fine-grained complexity).
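To make the notion of direct access concrete, here is a minimal Python sketch for one easy case. It is an illustration under stated assumptions, not the construction from the paper: the query Q(x, y, z) :- R(x, y), S(y, z), the lexicographic order (x, y, z), set semantics, and the class name `LexDirectAccess` are all chosen here for the example. Preprocessing sorts the relations and stores cumulative answer counts; each access is then a binary search over those counts, so the sorted answer list is never materialized.

```python
from bisect import bisect_right
from collections import defaultdict
from itertools import accumulate


class LexDirectAccess:
    """Direct access to the answers of Q(x, y, z) :- R(x, y), S(y, z),
    ordered lexicographically by (x, y, z). Hypothetical example class."""

    def __init__(self, R, S):
        # Group the z-values of S by the join key y and sort each group,
        # which fixes the innermost part of the order.
        zs_by_y = defaultdict(list)
        for y, z in set(S):
            zs_by_y[y].append(z)
        for zs in zs_by_y.values():
            zs.sort()
        self.zs_by_y = zs_by_y
        # Keep only the R-tuples that have a join partner, in (x, y) order.
        self.rows = [(x, y) for x, y in sorted(set(R)) if y in zs_by_y]
        # ends[i] = number of answers contributed by rows[0..i].
        self.ends = list(accumulate(len(zs_by_y[y]) for _, y in self.rows))

    def __len__(self):
        return self.ends[-1] if self.ends else 0

    def __getitem__(self, k):
        if not 0 <= k < len(self):
            raise IndexError(k)
        # First row whose cumulative count exceeds k.
        i = bisect_right(self.ends, k)
        start = self.ends[i - 1] if i > 0 else 0
        x, y = self.rows[i]
        return (x, y, self.zs_by_y[y][k - start])


R = [(1, "a"), (1, "b"), (2, "a"), (3, "c")]
S = [("a", 10), ("a", 20), ("b", 5)]
answers = LexDirectAccess(R, S)
third = answers[2]  # answer at position 2 (0-based) in the sorted list
```

The sketch relies on the chosen order following the join structure: x and y come from the same atom and z depends only on y. Other orders over the same query, such as (x, z, y), do not decompose this way.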

In this work, we embark on the challenge of identifying the answer orderings that allow for ranked direct access with the above complexity guarantees. We begin with the class of lexicographic orderings and give a decidable characterization of the class of feasible such orderings for every CQ without self-joins. We then continue to the more general case of orderings by the sum of attribute scores. As it turns out, in this case ranked direct access is feasible only in trivial cases. Hence, to better understand the computational challenge at hand, we consider the more modest task of providing access to a single answer (i.e., finding the answer at a given position). We indeed achieve a quasilinear-time algorithm for a subset of the class of full CQs without self-joins, by adopting a solution of Frederickson and Johnson to the classic problem of selection over sorted matrices. We further prove that none of the other queries in this class admit such an algorithm.
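The selection task for sum orderings can be illustrated on the simplest full CQ, Q(x, y) :- R(x), S(y), where each answer is ranked by the sum of its two attribute scores. The Python sketch below is not the Frederickson and Johnson algorithm. It swaps in a simpler technique: bisecting on the sum value, with a two-pointer count at each step. It assumes integer scores, and the function names `count_at_most` and `select_sum` are hypothetical. After sorting, it takes time proportional to n times the logarithm of the score range, and it never builds the quadratic-size list of sums.

```python
def count_at_most(xs, ys, t):
    """Count pairs (x, y) with x + y <= t; xs and ys must be sorted ascending."""
    count, j = 0, len(ys)
    for x in xs:
        # As x grows, the number of admissible y-values can only shrink.
        while j > 0 and x + ys[j - 1] > t:
            j -= 1
        count += j
    return count


def select_sum(xs, ys, k):
    """Return the k-th smallest (0-based) value of x + y for integer scores."""
    xs, ys = sorted(xs), sorted(ys)
    if not 0 <= k < len(xs) * len(ys):
        raise IndexError(k)
    lo, hi = xs[0] + ys[0], xs[-1] + ys[-1]
    # Find the smallest t such that at least k + 1 pairs have sum <= t.
    while lo < hi:
        mid = (lo + hi) // 2
        if count_at_most(xs, ys, mid) > k:
            hi = mid
        else:
            lo = mid + 1
    return lo


median_sum = select_sum([1, 4, 7], [2, 3, 10], 4)
```

The function returns only the score of the answer at position k. Recovering an answer tuple with that score would take one more linear pass over the sorted inputs.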
