I am working on the theory of scalable data management. One of my goals is to extend the capabilities of modern data management systems in generic ways to allow them to support novel functionalities that seem hard at first. Examples of such functionalities are managing provenance, trust, explanations, and uncertain or inconsistent data. To support these functionalities, I am interested in understanding the fundamental algebraic properties that allow algorithms to scale with the size of data by leveraging structure in data: Given a large data or knowledge base, what types of questions can be answered efficiently? And what do we do about those that cannot?
For the hard questions, our work tries to find ways to change the objective so that it qualitatively preserves the original motivation, yet installs those desirable algebraic properties (something we call "algebraic cheating"). Our work has shown that approaches that leverage those properties and optimize for the overall end-to-end goal can work with less training data and achieve remarkable speed-ups.
Thanks to NSF for supporting this work under NSF Career Award IIS-1762268 and NSF Award IIS-1956096. Also thanks to our colleagues for selecting our EDBT 2021 paper for the best paper award, our PODS 2021, SIGMOD 2017, VLDB 2015, and WALCOM 2017 papers among "best of conference", and two of our SIGMOD 2020 papers for reproducibility awards.
Our DATA lab is growing, and we are actively looking for students with strong foundations in algorithms, theory, discrete math, data management, and machine learning. Please visit our research opportunities and the topics page of my class "Principles of Scalable Data Management." Note that I am a big fan of Ray Dalio's principles applied to research. I have been working with or co-advising a number of students over the years, not always as their direct advisor. My current directly advised PhD students are Nikos Tziavelis, Neha Makhija, and Agapi Rissaki.