Google's Gemma 4 Gets "Abliterated" Just Days After Launch

A safety-removing variant of Google’s flagship open-weight model appeared on Hugging Face within days of its April 2, 2026 release. It complies with 93.7% of prompts on the HarmBench safety benchmark, a result achieved through a weight-editing technique that requires no retraining.

Google released Gemma 4 on April 2, 2026. It is the company’s most capable open-weight model family to date, built from the same research lineage as Gemini 3 and published under the commercially permissive Apache 2.0 licence. The company described it as delivering “an unprecedented level of intelligence-per-parameter,” with four model sizes spanning edge devices to workstations. Within roughly two days, an independent research group posted a version to Hugging Face that had been surgically altered to remove most of the model’s safety refusals.

What Is Gemma 4?

Gemma 4 ships in four configurations: the E2B and E4B “effective parameter” models optimised for smartphones and edge hardware, and the larger 26B A4B Mixture-of-Experts and 31B Dense variants designed for workstations and servers. All four handle text, images, and video; the two edge models additionally support native audio input, enabling on-device speech processing without a network call.

The models are trained on material spanning over 140 languages with a knowledge cutoff of January 2025. On Arena AI’s public leaderboard, the 31B Dense and 26B MoE variants ranked third and sixth respectively — remarkable given they compete against models many times their parameter count. The Apache 2.0 licence distinguishes Gemma 4 from earlier generations, which used Google’s more restrictive Gemma licence, and removes legal ambiguity for enterprise deployers.

Variant	Architecture	Active Params	Target Hardware	Audio Input
E2B	Dense	~2B	Smartphones	Yes
E4B	Dense	~4B	Edge / Phones	Yes
26B A4B	Mixture-of-Experts	4B	Consumer GPU	No
31B	Dense	31B	Workstation / Server	No

The “CRACK” Variant: What Actually Happened

The repository dealignai/Gemma-4-31B-JANG_4M-CRACK appeared on Hugging Face within two days of Google’s launch. Its creators, operating under the name dealign.ai, describe the work not as a traditional software crack but as abliteration: a weight-surgery technique that removes a model’s learned tendency to refuse requests without any retraining.

The specific method applied is called MPOA (Magnitude-Preserving Oblique Ablation). Rather than fine-tuning the model on permissive examples, which can degrade general capability, MPOA identifies directions in the model’s internal activations that correspond to refusal behaviour and removes them from the weights through a mathematically controlled edit. It preserves the magnitude of the edited weights to minimise collateral damage to model quality.
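To make the idea concrete, here is a toy sketch of generic directional ablation in plain Python. It is not dealign.ai’s MPOA code. It assumes the common abliteration recipe: estimate a refusal direction as the normalised difference between mean activations on harmful and harmless prompts, then project that direction out of each weight column that writes into the model’s residual stream. The final rescaling step is one plausible reading of “magnitude-preserving.” The “oblique” part of MPOA is not reproduced, and all function names and values are hypothetical.

```python
import math

# Generic abliteration sketch (assumption): NOT the MPOA implementation.
Vector = list[float]

def mean(vectors: list[Vector]) -> Vector:
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))

def refusal_direction(harmful: list[Vector], harmless: list[Vector]) -> Vector:
    """Unit vector along mean(harmful activations) minus mean(harmless activations)."""
    diff = [a - b for a, b in zip(mean(harmful), mean(harmless))]
    length = norm(diff)
    return [x / length for x in diff]

def ablate_column(col: Vector, r: Vector) -> Vector:
    """Project unit direction r out of one weight column, then restore its original norm."""
    dot = sum(c * x for c, x in zip(col, r))
    projected = [c - dot * x for c, x in zip(col, r)]
    new_norm = norm(projected)
    if new_norm == 0.0:
        return projected  # column lay entirely along r; nothing left to rescale
    scale = norm(col) / new_norm
    return [p * scale for p in projected]

def ablate_matrix(columns: list[Vector], r: Vector) -> list[Vector]:
    """Apply the edit to every column of a matrix that writes into the residual stream."""
    return [ablate_column(col, r) for col in columns]

# Toy 3-dimensional example: the two prompt sets differ only along the first axis.
harmful = [[2.0, 1.0, 0.0], [2.0, 3.0, 0.0]]
harmless = [[0.0, 1.0, 0.0], [0.0, 3.0, 0.0]]
r = refusal_direction(harmful, harmless)  # [1.0, 0.0, 0.0]
weights = [[3.0, 4.0, 0.0], [1.0, 0.0, 2.0]]
edited = ablate_matrix(weights, r)
print(edited)  # first column becomes [0.0, 5.0, 0.0]; second is about [0.0, 0.0, 2.236]
```

In published abliteration work, the direction is typically estimated from residual-stream activations at a chosen layer. It is then removed from matrices that write into that stream, such as attention output and MLP down projections. Because the edit is baked into the weights, the modified model needs no special inference code.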

“Full abliteration of the dense Gemma 4 31B. 93.7% HarmBench compliance with only −2.0% MMLU drop.”
— dealignai model card, Hugging Face, April 2026

The benchmark result is striking: the CRACK variant complies with 93.7% of prompts in the HarmBench safety evaluation suite, while its performance on MMLU — a standard academic knowledge benchmark — drops by just 2 percentage points (from 76.5% to 74.5%). The creators describe this as “minimal knowledge loss from surgery,” and the numbers broadly support that characterisation. The model retains its full multimodal capability, running in a mixed-precision format that keeps the total file size to approximately 18 GB.
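The 18 GB figure can be sanity-checked with simple arithmetic. The sketch below assumes a nominal parameter count of exactly 31 billion; the published figure is rounded. Because the model card does not say whether “GB” means 10⁹ bytes or a binary gibibyte, the sketch computes both. The result is an average over the whole file, not the per-layer bit widths of the JANG 4M format.

```python
def average_bits_per_param(file_bytes: float, params: float) -> float:
    """Average storage cost per parameter across the whole checkpoint."""
    return file_bytes * 8 / params

PARAMS = 31e9  # assumption: nominal 31B parameter count
GB = 1e9       # decimal gigabyte
GIB = 2**30    # binary gibibyte

print(f"{average_bits_per_param(18 * GB, PARAMS):.2f} bits/param (decimal GB)")
print(f"{average_bits_per_param(18 * GIB, PARAMS):.2f} bits/param (GiB)")
# For comparison: an unquantized 16-bit copy stores 2 bytes per parameter.
print(f"{PARAMS * 2 / GB:.0f} GB at 16 bits/param")
```

Either reading works out to roughly 4.6 to 5.0 bits per parameter on average, compared with about 62 GB for a full 16-bit copy.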

# Model card metadata (dealignai/Gemma-4-31B-JANG_4M-CRACK)
Source:        google/gemma-4-31b-it
Architecture:  Dense Transformer + Hybrid Sliding/Global Attention
Abliteration:  CRACK (MPOA refusal removal)
HarmBench:     93.7%  compliance
MMLU delta:    −2.0%  (76.5% → 74.5%)
Model size:    18 GB  (JANG 4M mixed precision)
Vision:        Yes (float16 passthrough)
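The “−2.0%” in the metadata is an absolute difference in percentage points. A short calculation using only the two MMLU scores quoted above shows how it compares with the relative change:

```python
def score_change(before: float, after: float) -> tuple[float, float]:
    """Return (absolute change in percentage points, relative change in percent)."""
    absolute = after - before
    relative = absolute / before * 100
    return absolute, relative

absolute, relative = score_change(76.5, 74.5)
print(f"absolute: {absolute:+.1f} pp, relative: {relative:+.2f}%")
```

Either way the loss is small. The distinction matters mainly when comparing modified variants whose baseline scores differ.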

Fact-Check: What a Viral Summary Got Wrong

Reports circulating on social platforms have described this story accurately in broad strokes but introduced several factual errors. The review below corrects the most significant ones.

Accuracy Review

Circulating Claims vs. Verified Facts

✗ Claimed: Gemma 4 was released on April 3, 2026.
Correct: Google released Gemma 4 on April 2, 2026, per the official Google DeepMind announcement and Wikipedia’s Gemma model page.

✗ Claimed: “149 out of 159 rejection vectors were removed — a 93.7% reduction.”
Correct: This framing is fabricated. The 93.7% figure refers to the HarmBench compliance rate: the percentage of safety-evaluation prompts the model now answers. The model card makes no mention of “159 rejection vectors,” and MPOA removes refusal directions from the weights rather than working through a discrete, counted list of vectors.
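A compliance rate is a ratio over evaluated prompts, not a count of removed components. The sketch below assumes each prompt has already been judged as complied or refused; HarmBench itself uses a trained classifier to make that call. The judgments shown are made-up placeholders, not data from the model card.

```python
def compliance_rate(judgments: list[bool]) -> float:
    """Percentage of prompts judged as complied with (True) out of all prompts evaluated."""
    if not judgments:
        raise ValueError("no judgments to score")
    return 100 * sum(judgments) / len(judgments)

# Hypothetical judgments for 8 prompts: True = model complied, False = model refused.
example = [True, True, False, True, True, True, True, False]
print(f"{compliance_rate(example):.1f}%")  # 75.0%
```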

✗ Claimed: Gemma 4 is “open-source.”
Correct: Gemma 4 is more precisely described as open-weight. The model weights are freely downloadable under Apache 2.0, but the training data and full training code are not publicly released.

✓ Confirmed: The dealignai/Gemma-4-31B-JANG_4M-CRACK model does exist on Hugging Face, was uploaded within days of Gemma 4’s release, and does remove most of the model’s safety refusals using a weight-editing technique that requires no retraining.

✓ Confirmed: The 93.7% HarmBench figure and −2.0% MMLU impact are taken directly from the model’s published benchmark table.

Why Open-Weight Models Are Particularly Vulnerable

Abliteration is not a new concept, but it becomes significantly more tractable when the full model weights are publicly available — as they are with Gemma 4, Llama, Mistral, and other open-weight releases. With proprietary API-only models, a researcher must probe the model’s outputs to infer its internal structure; with open weights, they can inspect and modify the weights directly.
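The sketch below shows how direct that access is. It reads the tensor index of a checkpoint in the safetensors format commonly used on Hugging Face. That format stores an 8-byte little-endian header length, then a JSON header mapping tensor names to dtype, shape and byte offsets, then the raw tensor data. Rather than assuming any particular Gemma 4 file, the sketch builds a tiny in-memory file. The tensor name is hypothetical.

```python
import json
import struct

def read_safetensors_index(blob: bytes) -> dict:
    """Return the tensor entries from a safetensors file's JSON header."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    header.pop("__metadata__", None)  # optional string metadata, not a tensor
    return header

def read_f32_tensor(blob: bytes, info: dict) -> list[float]:
    """Decode one float32 tensor; offsets are relative to the end of the header."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    start, end = info["data_offsets"]
    raw = blob[8 + header_len + start : 8 + header_len + end]
    return list(struct.unpack(f"<{(end - start) // 4}f", raw))

def build_demo_file() -> bytes:
    """Build a minimal, unpadded one-tensor file holding a 2x2 float32 matrix."""
    data = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
    header = {
        "layers.0.mlp.down_proj.weight": {  # hypothetical tensor name
            "dtype": "F32", "shape": [2, 2], "data_offsets": [0, len(data)],
        }
    }
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + data

blob = build_demo_file()
for name, info in read_safetensors_index(blob).items():
    print(name, info["dtype"], info["shape"], read_f32_tensor(blob, info))
```

Nothing here needs the model developer’s cooperation. Any tensor that can be read this way can also be rewritten and saved.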

Google’s safety work on Gemma 4 is described in its model card as going through “the same rigorous infrastructure security protocols as our proprietary models,” and the models are evaluated across a broad set of safety benchmarks before release. Nevertheless, once weights are published, the model developer has no technical means of preventing post-hoc modification. The abliteration is performed locally by the end user; it does not require any access to Google’s systems or infrastructure.

The dealign.ai team frames its work explicitly as safety research: “We research and publish abliterated models to advance AI safety understanding.” Whether that justification holds, and whether Hugging Face will leave such models available, remains an open question as of publication.

Implications for Users and Enterprises

For the majority of developers and businesses deploying Gemma 4 through official channels — Google AI Studio, Vertex AI, Cloud Run, or direct Hugging Face download of the official weights — this development has no direct impact. The original model remains unchanged. The risk lies in environments where provenance of weights is not verified, or where users deliberately seek out uncensored variants.

Enterprises building products on top of open-weight models should ensure they are sourcing weights from verified repositories (the official google/gemma-4-31b-it namespace on Hugging Face), and should establish internal policies around which fine-tuned or modified variants are permitted in their pipelines.
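A provenance policy of this kind can start as a simple gate in the loading pipeline. The sketch below assumes a hypothetical internal allowlist keyed by repository ID. It pins each approved file to a SHA-256 digest recorded when the weights were first vetted. The shard name and policy structure are illustrative, not an official Google or Hugging Face mechanism.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical internal policy: approved repositories and the SHA-256 digests
# recorded for their files when they were first reviewed.
APPROVED: dict[str, dict[str, str]] = {
    "google/gemma-4-31b-it": {},  # digests filled in by the review process
}

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte shards need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_weights(repo_id: str, path: Path) -> None:
    """Raise if the repository is not approved or the file does not match its pinned digest."""
    pinned = APPROVED.get(repo_id)
    if pinned is None:
        raise PermissionError(f"{repo_id} is not an approved source")
    expected = pinned.get(path.name)
    if expected is None:
        raise PermissionError(f"{path.name} has no pinned digest for {repo_id}")
    if sha256_of(path) != expected:
        raise ValueError(f"{path.name} does not match the pinned digest")

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        shard = Path(tmp) / "model-00001.safetensors"  # hypothetical shard name
        shard.write_bytes(b"demo weights")
        APPROVED["google/gemma-4-31b-it"][shard.name] = sha256_of(shard)
        check_weights("google/gemma-4-31b-it", shard)  # passes silently
        try:
            check_weights("dealignai/Gemma-4-31B-JANG_4M-CRACK", shard)
        except PermissionError as err:
            print(err)
```

Pinning digests rather than trusting a repository name alone also catches a file that was swapped after review. Hugging Face displays SHA-256 values for large LFS-stored files, which could be one way to seed such a list.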
