I am working on the theory of scalable data management. One of my goals is to extend the capabilities of modern data management systems in generic ways to allow them to support novel functionalities that seem hard at first. Examples of such functionalities are managing provenance, trust, explanations, and uncertain or inconsistent data. To support these functionalities, I am interested in understanding the fundamental algebraic properties that allow algorithms to scale to large amounts of data: Given a large data or knowledge base, what types of questions can be answered efficiently? And what do we do about those that cannot?
For the hard questions, our work tries to change the objective so that it qualitatively preserves the original motivation, yet installs those nice algebraic properties (something we call "algebraic cheating"). Our work has shown that approaches that leverage those properties and optimize for the overall end-to-end goal can work with smaller training data and achieve remarkable speed-ups.
Thanks to NSF for supporting this work under NSF Career Award IIS-1762268 (formerly IIS-1553547) and NSF Award IIS-1956096. Also thanks to the respective committees for selecting our EDBT 2021 paper for the best paper award, our PODS 2021, SIGMOD 2017, VLDB 2015, and WALCOM 2017 papers among "best of conference", and two of our SIGMOD 2020 papers for reproducibility awards.
Our DATA lab is growing and we are actively looking for students with strong foundations in algorithms, theory, discrete math, data management, and machine learning. Please visit our research opportunities. Note that I am a big fan of Ray Dalio's principles applied to research.
I have been co-advising a number of students over the years. My current directly advised PhD students are: Nikos Tziavelis, Neha Makhija, Jiahui Zhang, and Zixuan Chen.