Descripción del proyecto
Formal language theory is indisputably one of the most successful theories in theoretical computer science with many applications in such fields as formal verification, programming languages, and databases, to name a few. Despite this, the standard restriction to finite alphabets has been a limiting factor in the applicability of the theory in various important application domains, wherein sequences with data (i.e. ranging over an infinite domain like the set of integers and the set of strings) naturally arise. This has motivated an extensive study of automata over infinite alphabets in the last three decades, which aims to develop algorithms for analyzing data languages. It was, however, quickly realized that permitting anything but the simplest data reasoning results in undecidability for the most fundamental models in the theory. Given the growing practical needs for tools for reasoning about sequences with data, it is crucial that we break through this undecidability barrier.
The aim of the project is to overcome the current undecidability barrier in extending automata theory over infinite alphabets with complex data reasoning. The first concrete objective is, thus, to identify new decidable logic/automata models for data languages that permit complex data reasoning over various data domains (e.g. integer linear arithmetic, real-closed fields, string constraints, etc.), and to develop efficient algorithms for analyzing them. In addition to making fundamental theoretical contributions in automata theory
over infinite alphabets, I also aim to address the current practical needs for tools for reasoning about sequences.
Thus, the second objective of the project is to transfer new algorithmic insights to the following important application
domains:
(1) querying graph databases, (2) verification of list-manipulating programs, and (3) interpretable machine learning for sequences.