I work on the theory of scalable data management. One of my goals is to extend modern data management systems in generic ways that allow them to support novel functionalities which seem hard at first. Examples of such functionalities are managing provenance, trust, explanations, and uncertain or inconsistent data. To support these functionalities, I am interested in understanding the fundamental algebraic properties that allow algorithms to scale with the size of the data by leveraging its structure: Given a large data or knowledge base, what types of questions can be answered efficiently? And what can we do about those that cannot?
For the hard questions, this line of work tries to find ways to change the objective so that the original motivation is qualitatively preserved, yet the desirable algebraic properties are restored (something we call "algebraic cheating"). Our work has shown that approaches that leverage these properties and optimize for the overall end-to-end goal can work with less training data and achieve remarkable speed-ups.
Thanks to the NSF for supporting us under NSF CAREER Award IIS-1762268 and NSF Award IIS-1956096. Many thanks also to the anonymous reviewers and respective committees for recognizing one of our SIGMOD 2024 papers with an honorable mention, our EDBT 2021 paper with the best paper award, our PODS 2021, SIGMOD 2017, VLDB 2015, and WALCOM 2017 papers as "best of conference", and two of our SIGMOD 2020 papers with reproducibility awards.
ORCID, Google Scholar, DBLP, arXiv, ACM profile.
Before academia, I worked for McKinsey & Co. My first university degree is a Dipl.-Ing. (basically a combined BSc and MSc degree) in Mechanical Engineering. I also won a bronze medal at the International Physics Olympiad (IPhO).
Please read here.