https://www.aisi.gov.uk/blog/announcing-our-san-francisco-office
Announcing our San Francisco office | AISI Work
We are opening an office in San Francisco! This will enable us to hire more top talent, collaborate closely with the US AI Safety Institute and engage even...
san francisco office, aisi work, announcing
https://www.aisi.gov.uk/blog/early-insights-from-developing-question-answer-evaluations-for-frontier-ai
Early Insights from Developing Question-Answer Evaluations for Frontier AI | AISI Work
A common technique for quickly assessing AI capabilities is prompting models to answer hundreds of questions, then automatically scoring the answers. We share...
early insights, question answer, frontier ai, aisi work, developing
https://www.aisi.gov.uk/blog/how-to-evaluate-control-measures-for-ai-agents
How to evaluate control measures for AI agents? | AISI Work
Our new paper outlines how AI control methods can mitigate misalignment risks as capabilities of AI systems increase
control measures, ai agents, aisi work, evaluate
https://www.aisi.gov.uk/blog/a-pipeline-for-transcript-analysis-using-inspect-scout
A pipeline for transcript analysis using Inspect Scout | AISI Work
We outline a step-by-step pipeline for using our open-source transcript analysis tool, Inspect Scout.
analysis using, aisi work, pipeline, transcript, inspect
https://www.aisi.gov.uk/blog/deepening-our-partnership-with-google-deepmind
Deepening our partnership with Google DeepMind | AISI Work
Expanding our collaboration with a new research MOU
google deepmind, aisi work, deepening, partnership
https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities
Our evaluation of Claude Mythos Preview’s cyber capabilities | AISI Work
We conducted cyber evaluations of Anthropic’s Claude Mythos Preview and found continued improvement in capture-the-flag (CTF) challenges and significant...
claude mythos, cyber capabilities, aisi work, evaluation
https://www.aisi.gov.uk/blog/from-bugs-to-bypasses-adapting-vulnerability-disclosure-for-ai-safeguards
From bugs to bypasses: adapting vulnerability disclosure for AI safeguards | AISI Work
Exploring how far cyber security approaches can help mitigate risks in generative AI systems, in collaboration with the National Cyber Security Centre (NCSC).
vulnerability disclosure, ai safeguards, aisi work, bugs, bypasses
https://www.aisi.gov.uk/blog/transcript-analysis-for-ai-agent-evaluations
Transcript analysis for AI agent evaluations | AISI Work
Why we use transcript analysis for our agent evaluations, and results from an early case study.
ai agent, aisi work, transcript, analysis, evaluations
https://www.aisi.gov.uk/blog/evals-bounty
Bounty programme for novel evaluations and agent scaffolding | AISI Work
We are launching a bounty for novel evaluations and agent scaffolds to help assess dangerous capabilities in frontier AI systems.
aisi work, bounty, programme, novel, evaluations