The objective of this project is to investigate scalability questions arising with a new wave of smart relational data management systems that integrate analytics and query processing. These questions will be addressed by a fundamental shift from centralized processing on tabular data representation, as supported by traditional systems and analytics software packages, to distributed and approximate processing on factorized data representation.
Factorized representations exploit algebraic properties of relational algebra and the structure of queries and analytics to achieve radically better data compression than generic compression schemes, while at the same time allowing processing in the compressed domain. They can effectively boost the performance of relational processing by avoiding redundant computation in the one-server setting, yet they can also be naturally exploited for approximate and distributed processing. Large relations can be approximated by their subsets and supersets, i.e., lower and upper bounds, that factorize much better than the relations themselves. Factorizing relations, which represent intermediate results shuffled between servers in distributed processing, can effectively reduce the communication cost and improve the latency of the system.
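As a rough illustration of the idea above, the following minimal sketch factorizes the join of two small relations R(A, B) and S(B, C) by grouping on the shared attribute B, then evaluates COUNT and SUM(A*C) directly on the factorized form without enumerating the flat result. The relation names, the toy data, and every function name here are hypothetical and chosen only for illustration; this is not the project's system or algorithm, just one simple instance of processing in the compressed domain.

```python
from collections import defaultdict
from itertools import product


def factorize_join(r, s):
    """Factorize R(A,B) JOIN S(B,C) as a union over b of {b} x A-values x C-values."""
    a_by_b = defaultdict(list)
    c_by_b = defaultdict(list)
    for a, b in r:
        a_by_b[b].append(a)
    for b, c in s:
        c_by_b[b].append(c)
    # Keep only the B-values that occur in both relations.
    return {b: (a_by_b[b], c_by_b[b]) for b in a_by_b if b in c_by_b}


def flat_join(fact):
    """Enumerate the flat (tabular) join result from the factorized form."""
    return [(a, b, c) for b, (avals, cvals) in fact.items()
            for a, c in product(avals, cvals)]


def count_tuples(fact):
    """COUNT(*) computed on the factorized form: one product of sizes per b."""
    return sum(len(avals) * len(cvals) for avals, cvals in fact.values())


def sum_a_times_c(fact):
    """SUM(A*C) computed on the factorized form via distributivity."""
    return sum(sum(avals) * sum(cvals) for avals, cvals in fact.values())


def factorized_size(fact):
    """Number of stored values: one b plus its A-values and C-values per group."""
    return sum(1 + len(avals) + len(cvals) for avals, cvals in fact.values())


# Toy data (hypothetical).
R = [(1, "x"), (2, "x"), (3, "x"), (4, "y")]
S = [("x", 10), ("x", 20), ("y", 30)]

fact = factorize_join(R, S)
flat = flat_join(fact)

print("flat values stored:      ", 3 * len(flat))
print("factorized values stored:", factorized_size(fact))
print("COUNT(*):", count_tuples(fact))
print("SUM(A*C):", sum_a_times_c(fact))
```

The aggregates agree with those computed over the flat result because multiplication distributes over the union of products. The saving in stored values grows with the number of A-values and C-values that share a B-value.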
The key deliverables will be novel algorithms that combine distribution, approximation, and factorization for computing mixed loads of queries and predictive and descriptive analytics on large-scale data. This research will result in fundamental theoretical contributions, such as complexity results for large-scale processing and tractable algorithms, and also in a scalable factorized data management system that will exploit these theoretical insights. We will collaborate with industrial partners, who are committed to assisting with datasets and realistic workloads, infrastructure for large-scale distributed systems, and support for transferring the products of the research to industrial users.