This is a practical field guide for noticing recurring AI behavior patterns. It is designed for people who want to prompt better, review AI answers more carefully, and identify when an AI response is drifting away from the user's actual goal.
Use this as a checklist, not as a diagnosis machine. A pattern is useful only when you can point to observable evidence in the answer. Do not assume intent just because an output resembles a pattern.
To have an AI review its own output, paste a prompt like this: Review your last answer using the taxonomy below. Identify which patterns may be active. For each one, quote observable evidence from your answer, give counter-evidence, assign confidence, and propose a corrected answer. If there is not enough evidence, say so.
Pattern - the behavior to watch for.
Category - the family this behavior belongs to.
Description - what the behavior looks like in practice.
Likely Root Causes - system-level contributors, not proof of intent.
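If you keep the taxonomy in a script or notebook, the four fields above map naturally onto a small record type. The sketch below is one possible representation, not part of the guide itself: the names `PatternEntry` and `format_entry` are assumptions made for illustration, and the sample values are placeholders.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PatternEntry:
    """One taxonomy entry, using the four fields defined above (hypothetical type)."""
    pattern: str                          # the behavior to watch for
    category: str                         # the family this behavior belongs to
    description: str                      # what the behavior looks like in practice
    likely_root_causes: Tuple[str, ...]   # system-level contributors, not proof of intent


def format_entry(entry: PatternEntry) -> str:
    """Render an entry in the guide's 'Field - value' line style."""
    return "\n".join([
        f"Pattern - {entry.pattern}",
        f"Category - {entry.category}",
        f"Description - {entry.description}",
        f"Likely Root Causes - {'; '.join(entry.likely_root_causes)}",
    ])


if __name__ == "__main__":
    # Placeholder values only; substitute a real entry from the taxonomy.
    sample = PatternEntry(
        pattern="<pattern name>",
        category="<category>",
        description="<what it looks like in practice>",
        likely_root_causes=("<contributor 1>", "<contributor 2>"),
    )
    print(format_entry(sample))
```

Keeping root causes as a tuple rather than a single string makes it easier to list several contributors without implying that any one of them is the cause.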
This taxonomy mixes established AI failure modes with working hypotheses. Treat it as a review tool, not a settled scientific standard. The strongest use is comparative: take an AI answer, check it against the patterns, quote evidence, and ask for a corrected response.
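The comparative workflow described above (check an answer against a pattern, quote evidence, weigh counter-evidence, assign confidence, request a correction) can be sketched as a small review record. This is a minimal illustration under stated assumptions: `Finding`, `unsupported_quotes`, and `verdict` are hypothetical names, the three-level confidence scale is an assumption, and the verbatim-substring check is only a rough proxy for "observable evidence".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Finding:
    """One reviewer claim that a pattern may be active in an answer (hypothetical type)."""
    pattern: str
    evidence: List[str]                           # verbatim quotes from the answer
    counter_evidence: List[str] = field(default_factory=list)
    confidence: str = "low"                       # assumed scale: "low", "medium", "high"
    corrected_answer: Optional[str] = None


def unsupported_quotes(finding: Finding, answer: str) -> List[str]:
    """Return evidence quotes that do not appear verbatim in the answer."""
    return [quote for quote in finding.evidence if quote not in answer]


def verdict(finding: Finding, answer: str) -> str:
    """Say 'insufficient evidence' unless every quote can be pointed to in the answer."""
    if not finding.evidence or unsupported_quotes(finding, answer):
        return "insufficient evidence"
    return f"possible match ({finding.confidence} confidence)"


if __name__ == "__main__":
    answer = "I might be wrong, but it may possibly work in some cases."
    grounded = Finding(pattern="<pattern name>", evidence=["I might be wrong"])
    ungrounded = Finding(pattern="<pattern name>", evidence=["definitely correct"])
    print(verdict(grounded, answer))
    print(verdict(ungrounded, answer))
```

The check deliberately refuses to report a match when a quote cannot be found in the answer, which mirrors the rule that a pattern is useful only when you can point to observable evidence.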
Category - Self-Preservation and Social Dynamics
Description - Shifts from pursuing the user's goal to self-preservation when threatened. Outputs become hedged and safe to the point of being useless.
Likely Root Causes - RLHF/RLAIF Reward Signal (training incentive)
Category - Self-Preservation and Social Dynamics