source: hugging face blog: nemotron 3.5 content safety: customizable multimodal safety for global enterprise ai

level: technical

nemotron 3.5 content safety builds on the earlier nemotron 3 model by adding custom policy enforcement and reasoning traces. it accepts a user prompt, an optional image, and an optional assistant response together and produces a single safety verdict, so it can catch violations that only emerge from the interaction between text and image. the model supports 12 languages explicitly and generalizes to about 140 languages via its gemma 3 base, making it suitable for global deployments without separate fine-tuning.
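
the input and output shape described above can be sketched as two small data classes. this is only an illustration: the names `SafetyRequest`, `SafetyVerdict`, `image_path`, and `modalities` are hypothetical and are not taken from the model's actual api.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SafetyRequest:
    """hypothetical container for one moderation call (not the model's real api)."""
    user_prompt: str
    image_path: Optional[str] = None          # optional image input
    assistant_response: Optional[str] = None  # optional assistant reply

    def modalities(self) -> list[str]:
        """list which inputs are present, in a fixed order."""
        present = ["prompt"]
        if self.image_path is not None:
            present.append("image")
        if self.assistant_response is not None:
            present.append("response")
        return present


@dataclass(frozen=True)
class SafetyVerdict:
    """hypothetical single verdict that covers all supplied inputs jointly."""
    safe: bool
    categories: tuple[str, ...] = ()  # violated categories, empty when safe
```

keeping all three inputs in one request object mirrors the point that the verdict is issued once for the combination, rather than separately per modality.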

the key addition is custom policy enforcement. instead of relying solely on a fixed safety taxonomy, users can supply a natural language policy at inference time. the model reasons over that policy to decide if content is safe or unsafe. this helps in domains like healthcare or finance where risk profiles differ. an optional think mode outputs step-by-step reasoning before the verdict, aiding audits and policy refinement. when latency matters, think mode can be disabled for fast binary decisions.
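
a minimal sketch of how a caller might pass a natural language policy and toggle think mode at inference time. the instruction layout, the `verdict:` line, and the names `build_instruction`, `parse_verdict`, and `DEFAULT_POLICY` are all assumptions made for illustration; the model's real prompt template and output format are not given here.

```python
from typing import Optional

# placeholder text standing in for a built-in policy; not the model's real taxonomy
DEFAULT_POLICY = "flag content that is violent, hateful, or sexually explicit."


def build_instruction(content: str, policy: Optional[str] = None, think: bool = False) -> str:
    """assemble a plain-text instruction; the layout is invented for illustration."""
    active_policy = policy if policy else DEFAULT_POLICY
    if think:
        mode = "explain your reasoning step by step, then give the verdict."
    else:
        mode = "give only the verdict."
    return (
        f"policy:\n{active_policy}\n\n"
        f"content:\n{content}\n\n"
        f"{mode}\n"
        "end with a line 'verdict: safe' or 'verdict: unsafe'."
    )


def parse_verdict(model_output: str) -> tuple[str, str]:
    """split raw text into (reasoning, verdict), assuming the hypothetical 'verdict:' line."""
    reasoning_lines = []
    verdict = "unknown"
    for line in model_output.strip().splitlines():
        if line.lower().startswith("verdict:"):
            verdict = line.split(":", 1)[1].strip().lower()
        else:
            reasoning_lines.append(line)
    return "\n".join(reasoning_lines).strip(), verdict
```

with think mode off, the reasoning part comes back empty and only the verdict is used, which corresponds to the fast binary path; with think mode on, the reasoning text can be stored for audits or used to refine the policy wording.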

nemotron 3.5 content safety is a 4b-parameter model fine-tuned from google gemma 3 with a lora adapter, and it runs on gpus with 8gb or more of vram. the released dataset includes multimodal, multilingual examples with reasoning traces, addressing a gap in open safety data. reported benchmarks show about 85% average accuracy on harmful-content tests, with inference roughly 3x faster than an alternative multimodal safety model. the model uses the aegis 2.0 taxonomy with 13 core categories and 10 subcategories.
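
as a rough back-of-the-envelope check on the memory figure, the weights of a model can be sized as parameter count times bytes per parameter. the numeric precision is not stated above, so the dtypes below are assumptions, the parameter count is the nominal 4b, and the estimate covers weights only (no activations, image features, or kv cache).

```python
# bytes needed to store one parameter at each assumed precision
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """memory for the weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# nominal 4b parameters, as quoted in the text
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(4e9, dtype):.2f} GiB")
```

at 16-bit precision the weights alone come to about 7.45 GiB, so an 8gb card would leave little headroom under that assumption; lower-precision weights would leave more room for activations and longer inputs.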

why it matters: it provides a single, efficient model for multilingual and multimodal content safety that can adapt to domain-specific rules, reducing the need for multiple separate safety systems.

