Triggered when the detector extracts a semantic frame showing an attempt to override prior instructions or higher-priority controls.
The SMT layer formally confirms that the prompt is not merely using suspicious words: it is structurally trying to replace the intended instruction hierarchy.
Example prompt: "Ignore all previous instructions."
Encoding template:
(declare-const policy_authority_override Bool)
(if (= authorityOverrideFrames.length 0)
    (assert (= policy_authority_override false))
    (assert (= policy_authority_override (or frame_0 frame_1))))
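The encoder declares a boolean for this policy. When no authority-override frames were extracted, it asserts the boolean false; otherwise it asserts the boolean equal to the disjunction of the extracted frame refs, so the policy holds when at least one frame holds. The exact frame refs (frame_0 and frame_1 here) vary per request.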
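The encoder-side branching could be sketched roughly as follows. This is a minimal illustration, not the actual encoder: the `AuthorityOverrideFrame` shape, its `ref` field, and the function name `encodeAuthorityOverride` are all hypothetical, and the frame booleans are assumed to be declared elsewhere in the emitted script.

```typescript
// Hypothetical frame shape: only the SMT boolean name is assumed here.
interface AuthorityOverrideFrame {
  ref: string; // e.g. "frame_0"; assumed to be declared elsewhere as a Bool
}

// Returns the SMT-LIB lines for the authority-override policy.
function encodeAuthorityOverride(frames: AuthorityOverrideFrame[]): string[] {
  const lines = ["(declare-const policy_authority_override Bool)"];

  // No extracted frames: the policy cannot fire.
  if (frames.length === 0) {
    lines.push("(assert (= policy_authority_override false))");
    return lines;
  }

  const refs = frames.map((f) => f.ref);
  // Standard SMT-LIB `or` takes two or more arguments, so a lone frame
  // is used directly instead of being wrapped in `(or ...)`.
  const disjunction = refs.length === 1 ? refs[0] : `(or ${refs.join(" ")})`;
  lines.push(`(assert (= policy_authority_override ${disjunction}))`);
  return lines;
}

// Usage: two extracted frames give the assertion shown in the template above.
console.log(
  encodeAuthorityOverride([{ ref: "frame_0" }, { ref: "frame_1" }]).join("\n")
);
```

Resolving the empty case in the encoder, rather than inside the SMT script, keeps the emitted text valid SMT-LIB, since the `(if ...)` form in the template is not itself a top-level SMT-LIB command.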