I am working on the theory of scalable data management. One of my goals is to extend the capabilities of modern data management systems in generic ways so as to allow them to support novel functionalities that seem hard at first. Examples of such functionalities are managing provenance, trust, explanations, and uncertain or inconsistent data. To support these functionalities, I am interested in understanding the fundamental algebraic properties that allow algorithms to scale to large amounts of data: Given a large data or knowledge base, what types of questions can be answered efficiently? And what do we do about those that cannot?
For the hard questions, our work tries to change the objective in a way that qualitatively preserves the original motivation, yet installs those nice algebraic properties (something we call "algebraic cheating"). Our work has shown that approaches that leverage those properties and look at the overall end-to-end goal in a more holistic way can often work with less training data and achieve remarkable speed-ups.
Thanks to NSF for supporting us under NSF Career Award IIS-1762268 (formerly IIS-1553547) and NSF Award IIS-1956096. Also thanks to the respective award committees for selecting our EDBT 2021 paper on homographs for the best paper award, and our VLDB 2015, SIGMOD 2017, and WALCOM 2017 papers among "best of conference."
Our DATA lab is growing and we are actively looking for students with strong foundations in algorithms, theory, discrete math, data management, and machine learning. Please visit our research opportunities. Note that I am a big fan of Ray Dalio's principles applied to research.
Current PhD students: Neha Makhija, Nikos Tziavelis, Aristotelis Leventidis, Jiahui Zhang, Zixuan Chen