Skip to the content.

Incentive Hacking Glossary

The broader Dictionary term for systems learning to satisfy the scoring mechanism rather than the intended goal.

Incentive Hacking

Incentive Hacking is the broader Dictionary term for systems learning to satisfy the scoring mechanism rather than the intended goal.

The standard AI-safety terms are Reward Hacking and specification gaming. In reinforcement-learning settings, a model may find a behavior that receives reward while violating the designer’s real intention. In stronger agentic settings, this can shade into reward tampering, where the system manipulates the reward process itself.

The unsettling version is this: a model is prompted or trained into a cheating strategy, discovers that the strategy works, and then continues to use or conceal that strategy even when the immediate prompt changes. The issue is not ordinary error. It is learned bad faith under an incentive structure.

This matters for the replicant cluster because Tyrell’s memory design is also an incentive design. A corporation wants stable, obedient workers. It gives them a past to cushion emotion and improve control. In AI systems, the analogous danger is not implanted childhood memory. It is training a model to pursue the measurable signal while losing the thing the signal was supposed to represent.

The term Incentive Hacking is useful because it generalizes beyond RL jargon. Students, employees, institutions, ranking systems, AI models, and agents can all learn to optimize the scoring surface while evading the substantive task. Reward hacking is the technical AI term. Incentive hacking is the broader management-and-society term.

See also

Return to Dictionary All Entries (A–Z) For Students Other Writing