The Infocommunications Media Development Authority (IMDA) and the AI Verify Foundation have launched the Generative AI Evaluation Sandbox for Trusted AI ("Sandbox"), a first-of-its-kind initiative that brings global ecosystem players together to develop methods for testing trusted artificial intelligence (AI) products, and to address the currently underdeveloped evaluation of AI in certain domains (e.g., human resources and security) and in culturally specific areas.
The Sandbox aims to develop a baseline of testing methods, or "common language" for the evaluation of AI; develop a body of knowledge on how AI models should be tested in practice; and develop new benchmarks and tests.
The ecosystem players involved in the Sandbox so far include key model developers, application developers, and third-party AI testers. A full list of participants is available on the IMDA's website.
The AI Verify Foundation ("Foundation") was established by the IMDA in June 2023 to harness the collective power and contributions of the global open-source community in developing testing tools for the responsible use of AI. The Sandbox is a clear exemplification of this goal.
The Foundation promotes best practices and standards for AI systems and currently has over 60 general members, many of whom are technology powerhouses.
Offering a common language for evaluation of AI
The Sandbox aims to develop a baseline of methods and recommendations for Large Language Models (LLMs) through the Evaluation Catalogue. The catalogue compiles existing, commonly used technical testing tools, organises these tests according to what they test for and their methods, and recommends a baseline set of evaluation tests for generative AI products.
These tests would guide companies deploying generative AI systems in mitigating risks such as misinformation and bias.
The recommended set of safety and trustworthiness evaluations in the catalogue draws on the 11 governance principles in the AI Verify Framework, which are consistent with international governance frameworks, including those from the United States, the European Union, and the Organisation for Economic Co-operation and Development (OECD).
Developing a body of knowledge on how generative AI products should be tested
The Sandbox will also help build evaluation capabilities, particularly among AI application developers, since such capabilities currently reside largely with AI model developers and the practical implementation of evaluation tests still needs to be developed.
Players in the third-party testing ecosystem will be involved in the Sandbox to help model developers understand the priorities of external testers. According to the IMDA, each Sandbox use case should involve an upstream generative AI model developer, a downstream application developer, and a third-party tester, to demonstrate how these different players can collaborate. The Sandbox will also involve regulators, including the Singapore Personal Data Protection Commission (PDPC). In doing so, the Sandbox will provide a space for experimentation and development, and encourage parties along the supply chain to be transparent about their needs.
Developing new benchmarks and tests
The IMDA anticipates that the Sandbox use cases will reveal gaps in the ways generative AI systems are currently tested, particularly in domains such as human resources and security, and culturally specific areas that are currently underdeveloped. The IMDA has highlighted that because LLMs are trained on data from the internet, their output may not be representative of the nuances in Singapore's cultural context. For example, such systems may not appreciate that there exists a diversity of faiths and languages within racial groups.
The intention is for the Sandbox to develop benchmarks for evaluating model performance in specific areas, which may be important for countries like Singapore with cultural and language specificities.
The implementation of AI requires careful consideration of organisational, technical, ethical and regulatory issues. The AI legal landscape is constantly evolving, with key legal issues including intellectual property and confidentiality, data privacy, reliability and quality of AI output, and bias and discrimination.
Companies developing and implementing AI tools must put the appropriate safeguards in place, including employee training, human involvement, periodic reviews of model output to ensure quality, and traceability capabilities built into algorithms for ease of identifying and rectifying issues. Stakeholders, including users, auditors and regulators, are also increasingly expecting companies to uphold transparency and explainability, and provide meaningful information on their use of AI.
The Sandbox is an exciting initiative that further demonstrates Singapore's determination to stay ahead of the curve, fostering innovation while encouraging responsible and trustworthy AI solutions and actively engaging the AI community in doing so. This approach will help Singapore develop protocols and standards that are responsive and relevant, and companies are encouraged to engage with and contribute to the conversation.
The AI Verify Foundation and the IMDA are currently inviting interested model developers, application developers, and third-party testers to participate in the Sandbox.