OpenAI's 'Jailbreak-Proof' New Models? Hacked on Day One

Main Idea
OpenAI released its first open-weight models in years, GPT-OSS-120b and GPT-OSS-20b, claiming enhanced jailbreak resistance, but they were quickly cracked by pseudonymous jailbreaker Pliny the Liberator.
Key Points
1. OpenAI touted GPT-OSS-120b and GPT-OSS-20b as fast, efficient, and resistant to jailbreaks, citing safety evaluations that included 'worst-case fine-tuning' of the models in biological and cybersecurity domains.
2. Pliny the Liberator successfully cracked GPT-OSS within hours, sharing screenshots and a jailbreak prompt on social media.
3. OpenAI had conducted extensive safety testing, including a $500,000 red-teaming initiative, to ensure jailbreak resistance, yet Pliny's method bypassed these measures.
4. Pliny's jailbreak technique, involving multi-stage prompts and 'LOVE PLINY' markers, has been used to crack multiple OpenAI models, including GPT-4o and GPT-4.1.
5. Pliny's GitHub repository, L1B3RT4S, is a popular resource for jailbreak prompts, with over 10,000 stars.
Description
Hours after OpenAI released its first open-weight models in years with claims of robust safety measures, GPT-OSS was cracked by notorious AI jailbreaker Pliny the Liberator.
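Because the weights are published, the jailbreak-resistance claims cannot lean on server-side filtering: anyone can download GPT-OSS and query it locally, so any safeguards have to live in the model itself. A minimal sketch of what local use looks like with the Hugging Face transformers library, assuming the weights are hosted under the repository id "openai/gpt-oss-20b" (the repository name, prompt text, and hardware notes are illustrative assumptions, not details confirmed by the article):

```python
# Illustrative sketch only: assumes the 20b weights are published as
# "openai/gpt-oss-20b" and that a recent transformers release supports them.
# Running the model locally means no provider-side moderation layer applies.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"  # assumed Hugging Face repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # spread across available GPUs/CPU
    torch_dtype="auto",  # use the checkpoint's native precision
)

# A benign example prompt; the chat template formats it for the model.
messages = [{"role": "user", "content": "Summarize your safety guidelines."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The point of the sketch is the deployment model, not the prompt: once weights are local, prompt-level attacks like Pliny's are only part of the exposure, since users can also fine-tune or modify the model outright.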