Expected Impact:
The outcome should contribute to:
- Standardisation of testing for dialogue systems.
- Enhanced clarity on the performance of dialogue systems for all stakeholders, including system developers, funders, and users.
- Community building at the European defence level.
- Trustworthy dialogue systems that enhance operational decision-making.
- Availability of databases to further develop dialogue systems.

Objective:
Human-AI dialogue systems offer impressive results but are still prone to errors of various types. Moreover, there is no established metric to measure system performance. To ensure trustworthiness and steer progress, these systems should be submitted to common tests using shared data and clear metrics and protocols.
The goal of this call topic is thus to set up a testing environment and organise a technological challenge to evaluate the performance of such systems for defence use cases, including their ability to manage classified information and to justify their answers. The challenge should be open to research teams supported through another call topic (EDF-2025-LS-RA-CHALLENGE-DIGIT-HAIDP) and possibly through other sources of funding. Representative defence users should be involved to contribute to the definition of the use cases and associated data, to test the demonstrators produced by the participating teams, and to provide feedback.
Scope:
The proposals should address the organisation of a technological challenge on human-AI dialogue based on the preliminary evaluation plan provided as part of the call document (cf. Annex 4). This includes the collection, annotation and distribution of data, the elaboration of evaluation plans and metrics, the measurement of system performance, and the organisation of debriefing workshops.
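As a purely illustrative sketch of what "shared data and clear metrics" could look like in practice, the snippet below scores a hypothetical submission against shared reference answers with a simple exact-match metric. The file names, data format and the metric itself are assumptions for illustration; the actual metrics and protocols are those of the detailed evaluation plans derived from Annex 4.

```python
# Illustrative sketch only: the metric (exact match), data format and file names
# below are hypothetical placeholders; the actual metrics and protocols are those
# of the detailed evaluation plans derived from Annex 4.
import json

def score_submission(references_path: str, submission_path: str) -> float:
    """Score one submitted system against shared reference answers."""
    with open(references_path, encoding="utf-8") as f:
        references = {item["id"]: item["answer"] for item in json.load(f)}
    with open(submission_path, encoding="utf-8") as f:
        outputs = {item["id"]: item["answer"] for item in json.load(f)}

    # Every reference item must be answered; missing answers count as errors.
    correct = sum(
        1 for item_id, ref in references.items()
        if outputs.get(item_id, "").strip().lower() == ref.strip().lower()
    )
    return correct / len(references)

if __name__ == "__main__":
    print(f"Exact-match accuracy: {score_submission('references.json', 'team_a.json'):.3f}")
```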
Types of activities
The following types of activities are eligible for this topic:
Types of activities (art 10(3) EDF Regulation) | Eligible?
(a) Activities that aim to create, underpin and improve knowledge, products and technologies, including disruptive technologies, which can achieve significant effects in the area of defence (generating knowledge) | Yes (optional)
(b) Activities that aim to increase interoperability and resilience, including secured production and exchange of data, to master critical defence technologies, to strengthen the security of supply or to enable the effective exploitation of results for defence products and technologies (integrating knowledge) | Yes (mandatory)
(c) Studies, such as feasibility studies to explore the feasibility of new or upgraded products, technologies, processes, services and solutions | Yes (optional)
(d) Design of a defence product, tangible or intangible component or technology as well as the definition of the technical specifications on which such design has been developed, including partial tests for risk reduction in an industrial or representative environment | Yes (optional)
(e) System prototyping of a defence product, tangible or intangible component or technology | No
(f) Testing of a defence product, tangible or intangible component or technology | No
(g) Qualification of a defence product, tangible or intangible component or technology | No
(h) Certification of a defence product, tangible or intangible component or technology | No
(i) Development of technologies or assets increasing efficiency across the life cycle of defence products and technologies | No
The proposals must cover at least the following tasks as part of mandatory activities:
Integrating knowledge:
- Setting up of the infrastructure for testing human-AI dialogue systems in the framework of the technological challenge.
- Elaboration of data annotation guidelines, collection and annotation of data, quality assessment, distribution and curation of databases (an illustrative annotation record is sketched below).
- Organisation of the evaluation campaigns, and in particular:
  - coordination of the exchanges with the participating teams and any other relevant stakeholders on the evaluation plans, and elaboration of these plans;
  - management of the experimental test campaigns and of the objective measurements of the performance of the systems submitted to the tests by the participating teams, according to the protocols and metrics described in the evaluation plans;
  - organisation of the debriefing workshops.

The proposals should include descriptions of work packages, tasks and deliverables that enable a clear assessment of work package completion. These should include the production of detailed evaluation plans agreed upon by all stakeholders, the production of the annotated databases needed for the evaluations, the production of measurements for all systems submitted to the tests by the participating teams following these plans, and the organisation of the needed events.
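The actual annotation schema is to be defined by the annotation guidelines produced under these mandatory activities. Purely as an illustration, and assuming hypothetical field names, an annotated dialogue record might look like the following sketch.

```python
# Hypothetical annotation record, for illustration only: the real field names,
# error typology and handling of classification markings are to be defined in
# the data annotation guidelines produced under the mandatory activities.
from dataclasses import dataclass, field

@dataclass
class DialogueTurn:
    speaker: str                 # "user" or "system"
    utterance: str               # text of the turn
    classification: str          # marking of the information used in the turn
    error_labels: list[str] = field(default_factory=list)  # annotated error types

@dataclass
class AnnotatedDialogue:
    dialogue_id: str
    scenario: str                # defence use case the dialogue belongs to
    language: str
    turns: list[DialogueTurn] = field(default_factory=list)
    reference_answer: str = ""   # expected answer used for scoring
    justification_required: bool = True  # whether the system must justify its answer
```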
Functional requirements
The proposed solutions should enable the measurement of the performance of dialogue systems according to detailed evaluation plans based on the preliminary evaluation plan provided as part of the call document (cf. Annex 4). Key aspects of the foreseen detailed evaluation plans and associated data management should be described in the proposals. The proposals should in particular describe:
- the scenarios considered;
- the languages that can be covered for each evaluation campaign;
- the nature and volume of data annotation to be produced, and in particular how the data is representative of defence use cases;
- a detailed plan of the test campaigns and an overall timeline/Gantt chart of the technological challenge;
- the evaluation procedures (rules and tools to implement the metrics) and the significance tests to be performed on the measurements (a minimal example is sketched below).

A user board consisting of representative defence users should be set up and involved in live tests. The proposals should describe the foreseen efforts from users to test demonstrators and provide feedback.
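As an example of the kind of significance testing that could accompany the measurements, the following sketch implements a paired bootstrap comparison between two submitted systems over per-item scores. The choice of test, number of resamples and score granularity are assumptions to be settled in the detailed evaluation plans.

```python
# A minimal sketch of a paired bootstrap significance test between two submitted
# systems, assuming per-item scores are already available; the choice of test,
# number of resamples and score granularity would be fixed in the evaluation plans.
import random

def paired_bootstrap(scores_a: list[float], scores_b: list[float],
                     n_resamples: int = 10_000, seed: int = 0) -> float:
    """Return the fraction of resamples in which the observed ranking reverses."""
    assert len(scores_a) == len(scores_b), "scores must be paired per test item"
    rng = random.Random(seed)
    n = len(scores_a)
    observed_delta = sum(scores_a) - sum(scores_b)
    reversals = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        delta = sum(scores_a[i] - scores_b[i] for i in idx)
        # Count resamples where the observed advantage disappears or reverses.
        if delta * observed_delta <= 0:
            reversals += 1
    return reversals / n_resamples
```

A small fraction (for instance below 0.05) would suggest that the observed ranking between the two systems is unlikely to be an artefact of the particular test items sampled.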
During the challenge, detailed evaluation plans should be prepared for each evaluation campaign. Drafts of these detailed evaluation plans should be submitted for discussion to the participating teams early enough for their feedback to be taken into account in the actual evaluation campaigns. Any evolution of the evaluation plans should take into account several factors: technical possibilities and cost, scientific relevance of the measurements, and representativeness of the metrics and protocols with respect to military needs.
More generally, the user board and the participating teams should be involved in the steering of the technological challenge. The proposals should include a clear description of the foreseen governance and decision-making processes at the technological challenge level.