Providing intelligent search capabilities as part of the Agora platform is crucial for fostering collaboration between Higher Education Institutions (HEIs) in the European Union. Since Agora is an extensive platform that provides access to heterogeneous acceleration services differing in the kind, scope, and volume of collected data, the search service is a key component of the system.
AI-Assisted Search in a Complex, Multi-Source Data Environment
As the collected data and provided services vary in complexity, origin, access rules, intended users, and level of integration with the platform, the search capabilities have to go beyond the conventional techniques used in information retrieval systems. Thus, to enhance the discovery, exploration, and analysis of the data gathered by the acceleration services, a dedicated AI assistant is being developed as an integral part of the platform. The conversational interface offered by the AI assistant ensures uniform access to all the search capabilities that stem from the proposed scenarios, without the need to develop elaborate training procedures for HEI staff.
Regulatory Compliance and Trustworthy AI Principles
The fast-paced progress in the field of AI means that the requirements for intelligent search cannot be reduced to a set of recommendations that prescribe particular types of AI models, categories of system architectures, or specific service providers. Instead, to make Agora a sustainable platform, there is a need for open, transparent, and reproducible guidelines against which the capabilities of the AI assistant can be tested. Like any AI system put into service in the European Union, the AI assistant has to conform to the legal requirements of the European Parliament and of the Council. Recital 27 of Regulation (EU) 2024/1689 of the European Parliament and of the Council (the AI Act) lists seven principles for trustworthy and ethically sound AI developed by the High-Level Expert Group on Artificial Intelligence (AI HLEG) of the European Commission:
- Human agency and oversight
- Technical robustness and safety
- Privacy and data governance
- Transparency
- Diversity, non-discrimination and fairness
- Societal and environmental well-being
- Accountability
The AI assistant of Agora, being an AI system deployed in the European Union, should follow the aforementioned principles. In the case of intelligent search, human agency and oversight follow from the problem setting itself: decisions made on the basis of the search results remain the sole responsibility of the Agora platform users. Preserving the other principles, however, requires considerable technical and organizational effort.
To ensure privacy and data governance, the standard practices imposed by the General Data Protection Regulation (GDPR, Regulation (EU) 2016/679) have to be followed. For safety, transparency, and accountability, extensive usage logs should be collected that encompass search queries, returned results, timestamps, fingerprints of all AI models utilized while executing the user query, and GDPR-compliant user data. This is an obligatory practice for high-risk AI systems (Article 26, paragraph 6 of the AI Act). However, given the effectiveness of log analysis for quality control, detection of security incidents, and determining the responsibility of natural and artificial agents in the search process, it is beneficial to collect usage logs even for AI systems that present only moderate risk.
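As an illustration, such a usage-log record could be sketched as below. The schema, field names, and pseudonymization scheme are assumptions for illustration, not a prescribed format for Agora; the key point is that every record ties the query and results to model fingerprints and a GDPR-compliant pseudonym rather than a raw user identifier.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

def pseudonymize(user_id: str, salt: str) -> str:
    """Salted hash so the log never stores the raw user identifier."""
    return hashlib.sha256((salt + user_id).encode()).hexdigest()

@dataclass
class SearchLogEntry:
    """One usage-log record for the AI-assisted search (hypothetical schema)."""
    query: str
    results: list             # identifiers of the returned items
    model_fingerprints: dict  # e.g. {"retriever": "sha256:...", "llm": "sha256:..."}
    user_pseudonym: str       # GDPR-compliant pseudonymized user reference
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

entry = SearchLogEntry(
    query="partners with HPC expertise",
    results=["svc-12", "svc-47"],
    model_fingerprints={"retriever": "sha256:abc123", "llm": "sha256:def456"},
    user_pseudonym=pseudonymize("alice@example.eu", salt="per-deployment-secret"),
)
print(json.dumps(asdict(entry), indent=2))  # serialized record, ready for log storage
```

Storing a salted hash instead of the identifier keeps the log linkable for accountability purposes while remaining pseudonymized in the GDPR sense, provided the salt is managed as a secret.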
Risk Management
If a general-purpose AI model with systemic risk is employed to implement the AI assistant, then extensive red-teaming with domain-specific data should be performed to reduce safety risks. The goal of this red-teaming should be the detection of hallucinations, misinformation, and harmful content. Furthermore, the specific requirements placed by the stakeholders and owners of the acceleration services with regard to copyright and access rights, as well as the organization of access rights in the Agora platform itself, have to be taken into consideration when penetration tests are executed to minimize the risk of revealing unauthorized data.
For the purpose of evaluation, a domain-specific benchmark that implements common usage scenarios and performs a multi-criteria assessment of the search component should be developed. To guarantee the technical robustness of the implemented search solution, standard performance measures adopted in industry and research practice, such as Precision@K, Recall@K, mean average precision, and discounted cumulative gain, should be tracked across revisions. Diversity, non-discrimination, and fairness are key concerns for an intelligent search engine that aims to streamline collaboration among scientists and institutions on the basis of qualifications and competences. Thus, to minimize the impact of non-merit criteria on the search results, the measurement of fairness has to be an integral part of the proposed benchmark.
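The listed ranking measures can be computed directly from the benchmark's relevance judgements. A minimal sketch follows; the document identifiers, relevance set, and gain values are illustrative, not Agora data.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items found in the top-k results."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)

def average_precision(ranked, relevant):
    """Mean of precision values at the ranks where relevant items occur."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

def dcg_at_k(gains, k):
    """Discounted cumulative gain over graded relevance scores."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))

ranked = ["d1", "d2", "d3", "d4"]   # results in ranked order (illustrative)
relevant = {"d1", "d3"}             # binary relevance judgements
print(precision_at_k(ranked, relevant, 2))            # 0.5
print(round(average_precision(ranked, relevant), 3))  # 0.833
```

Tracking these values per scenario and per revision makes regressions in the search component visible before deployment; mean average precision is simply `average_precision` averaged over all benchmark queries.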
Environmental Impact
While societal impact is a concern that should be analyzed at the system-wide level, with intelligent search being considered one of many building blocks of the Agora platform, the environmental impact, and in particular the carbon footprint, of the implemented search solution should be assessed.
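A first-order carbon-footprint estimate can be derived from measured power draw, per-query latency, query volume, and the local grid's emission intensity. The sketch below uses illustrative assumptions (a 300 W accelerator, 2 s per query, one million queries, 0.25 kg CO2e/kWh), not measured values for the Agora deployment.

```python
def inference_carbon_kg(avg_power_w: float, seconds_per_query: float,
                        n_queries: int, grid_kg_co2_per_kwh: float) -> float:
    """Energy-based estimate: watts x seconds -> kWh, scaled by grid carbon intensity."""
    energy_kwh = avg_power_w * seconds_per_query * n_queries / 3.6e6  # 1 kWh = 3.6e6 J
    return energy_kwh * grid_kg_co2_per_kwh

# Illustrative assumptions, not Agora measurements.
print(round(inference_carbon_kg(300, 2.0, 1_000_000, 0.25), 1))  # ~41.7 kg CO2e
```

Such an estimate ignores embodied emissions of the hardware and datacenter overhead (PUE), so it is a lower bound; it is nevertheless sufficient for tracking how model or architecture changes affect the footprint across revisions.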