https://openreview.net/forum?id=6Mxhg9PtDE
Safety Alignment Should be Made More Than Just a Few Tokens Deep | OpenReview
The safety alignment of current Large Language Models (LLMs) is vulnerable. Simple attacks, or even benign fine-tuning, can jailbreak aligned models. We note...
https://arxiv.org/abs/2406.05946
[2406.05946] Safety Alignment Should Be Made More Than Just a Few Tokens Deep
Abstract page for arXiv paper 2406.05946: Safety Alignment Should Be Made More Than Just a Few Tokens Deep
https://framia.pro/page/en-US/news/deepseek-v4-safety-alignment
DeepSeek V4 Safety & Alignment: What Organizations Need to Know
Apr 29, 2026 - DeepSeek V4 safety overview: post-training alignment, open-weight risks, deployment safeguards, and regulatory considerations for enterprise use in 2026.
https://registry.opendata.aws/gretel-synthetic-safety-alignment-en-v1/
Gretel Synthetic Safety Alignment Dataset - Registry of Open Data on AWS
https://arxiv.org/abs/2605.04992
[2605.04992] You Snooze, You Lose: Automatic Safety Alignment Restoration through Neural Weight...
Abstract page for arXiv paper 2605.04992: You Snooze, You Lose: Automatic Safety Alignment Restoration through Neural Weight Translation
https://huggingface.co/datasets/gretelai/gretel-safety-alignment-en-v1
gretelai/gretel-safety-alignment-en-v1 · Datasets at Hugging Face
https://llm-safety-challenges.github.io/
Foundational Challenges in Assuring Alignment and Safety of Large Language Models
https://thechels.uk/this-is-fine
This is Fine - AI Alignment and Safety Research
This is Fine - AI Alignment and Safety Research | Weak Notes
https://jobs.ashbyhq.com/cbai/005ca9cd-8e31-42c9-9778-c4286f7d50df
Research Program Associate, AI Safety @ Cambridge Boston Alignment Initiative
About Cambridge Boston Alignment Initiative: The Cambridge Boston Alignment Initiative (CBAI, http://cbai.ai) is a nonprofit research organization working to...
https://jobs.ashbyhq.com/cbai/c99e7019-5dda-4739-8943-a19f47570689
Research Manager, AI Safety @ Cambridge Boston Alignment Initiative
https://80000hours.org/articles/11-essential-resources-ai-risk/
AI Safety Reading List 2025 (11 AI Risk & Alignment Resources) | 80,000 Hours
Apr 29, 2026 - 80,000 Hours’ top 11 resources for an AI safety crash course in 2025 — essential research on AGI risk and alignment.