The objective of this project is to investigate scalability questions arising with a new wave of smart relational data management systems that integrate analytics and query processing. These questions will be addressed by a fundam...
ver más
¿Tienes un proyecto y buscas un partner? Gracias a nuestro motor inteligente podemos recomendarte los mejores socios y ponerte en contacto con ellos. Te lo explicamos en este video
Información proyecto FADAMS
Duración del proyecto: 75 meses
Fecha Inicio: 2016-02-02
Fecha Fin: 2022-05-31
Líder del proyecto
UNIVERSITAT ZURICH
No se ha especificado una descripción o un objeto social para esta compañía.
TRL
4-5
Presupuesto del proyecto
2M€
Fecha límite de participación
Sin fecha límite de participación.
Descripción del proyecto
The objective of this project is to investigate scalability questions arising with a new wave of smart relational data management systems that integrate analytics and query processing. These questions will be addressed by a fundamental shift from centralized processing on tabular data representation, as supported by traditional systems and analytics software packages, to distributed and approximate processing on factorized data representation.
Factorized representations exploit algebraic properties of relational algebra and the structure of queries and analytics to achieve radically better data compression than generic compression schemes, while at the same time allowing processing in the compressed domain. They can effectively boost the performance of relational processing by avoiding redundant computation in the one-server setting, yet they can also be naturally exploited for approximate and distributed processing. Large relations can be approximated by their subsets and supersets, i.e., lower and upper bounds, that factorize much better than the relations themselves. Factorizing relations, which represent intermediate results shuffled between servers in distributed processing, can effectively reduce the communication cost and improve the latency of the system.
The key deliverables will be novel algorithms that combine distribution, approximation, and factorization for computing mixed loads of queries and predictive and descriptive analytics on large-scale data. This research will result in fundamental theoretical contributions, such as complexity results for large-scale processing and tractable algorithms, and also in a scalable factorized data management system that will exploit these theoretical insights. We will collaborate with industrial partners, who are committed to assist in providing datasets and realistic workloads, infrastructure for large-scale distributed systems, and support for transferring the products of the research to industrial users.