https://openreview.net/forum?id=6MQ4tTH8km&referrer=%5Bthe%20profile%20of%20Keiran%20Paster%5D(%2Fprofile%3Fid%3D~Keiran_Paster1)
The generality and dynamic nature of large language models (LLMs) make it difficult for conventional quantitative benchmarks to accurately assess their...
report cards, qualitative evaluation, natural language, llms, using
https://www.bhpa-incidents.org/
BHPA Incident Report Summaries - a searchable summary of incident reports received by the British Hang Gliding and Paragliding Association.
incident report, summaries, informal