Testing - Valve's Approach

This is a synopsis of a talk that Mike Ambinder, an experimental psychologist at Valve, gave at the Game Developers Conference in 2009 on Valve's approach to playtesting.

Traditional (Qualitative)

Direct Observation

Verbal Reports

Questions & Answers


Technical (Quantitative)

Statistic Collection / Data Analysis

Design Experiments


Physiological Measurements

Goal = Fun Game

Game Design = Hypothesis

Playtests = Experiments

Evaluate the Game Design against Playtest results


Feedback Loop: Content Creation & Design > Idea > Playtesting > Feedback > Content Creation & Design

Playtesting goal = Fun, not bug testing, focus testing or game balancing

Other Benefits:

  • Idea generation
  • Identify problem areas
  • Solve Game Design arguments
  • Aid other production aspects


Direct Observation (have a design goal)

- Presence of observers produces biased results

- Salient event can slant interpretation

- Behaviour requires interpretation

+ Feel for Player interaction

+ Importance of what people do



Verbal Reports (in conjunction with Direct Observation; unprompted think-aloud protocol)

- Interferes with gameplay to create an artificial experience

- Distracting

- Inaccurate and biased

+ Enables real-time glimpse into players’ thoughts

+ Bring up unnoticed details

+ Effective for ‘why’ questions


Questions & Answers (usually structured querying of playtesters)

- Group biases (anchoring, social pressure, saliency etc.)

- People do not have a clue why they do what they do

- Potential for biased questions

+ Answer specific design questions

+ Determine specific player intent




Valve’s Procedure (survey, individual Q&A, group Q&A; be cautious)

- Artificial sessions

- Potential biases / missing objectivity

- Distorted data

- Lack of empiricism

- Sometimes difficult to establish emotions

+ Determines major issues

+ Idea of players’ thought models

+ Feedback on design choices

+ Nothing beats direct gameplay observation


Statistic Collection / Data Analysis (quantifies behaviour, objective measurement, aggregate perspective, opportunity for analysis)

- Averages hide extreme examples

- Miss nuance (lacking context)

- Requires rigor

- Risk of seeing ‘illusory’ patterns

+ Objective

+ Global trends

+ Metric creation for comparison

+ Tracking over time
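
To illustrate the "averages hide extreme examples" point, here is a minimal sketch. The death counts are made-up numbers and `summarize` is a hypothetical helper, not something from the talk. It reports the median and maximum next to the mean so that one struggling player is not lost in the aggregate.

```python
from statistics import mean, median

# Hypothetical per-player death counts on one level (invented for illustration).
deaths_per_player = [1, 2, 1, 0, 2, 1, 3, 1, 2, 27]

def summarize(samples):
    """Return the mean alongside statistics that expose extreme values."""
    ordered = sorted(samples)
    return {
        "mean": mean(ordered),      # pulled upward by the single outlier
        "median": median(ordered),  # closer to the typical player
        "max": ordered[-1],         # the extreme case the mean hides
    }

summary = summarize(deaths_per_player)
print(summary)
```

Here the mean is 4 deaths while the median is 1.5, and one player died 27 times. That player is probably worth reviewing through direct observation rather than statistics alone.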


Design Experiments (Hypothesis testing: compare two or more conditions, collect data and verify the hypothesis) (Predict player behaviour: define a set of variables and investigate their relationship)

- Costs time and money

- Right questions aren’t always clear

- Proper experimental design is a process

+ More informed decision-making

+ Objective answer

+ Saves time in the long run
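
Comparing two conditions can be sketched as follows. The talk does not name a specific statistical test; a permutation test is used here as one simple option. The completion times and the function name `permutation_test` are assumptions made for illustration.

```python
import random
from statistics import mean

def permutation_test(a, b, rounds=5000, seed=0):
    """Estimate a two-sided p-value for the difference in means of two conditions."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)            # randomly reassign players to conditions
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # The +1 correction keeps the estimate from ever being exactly zero.
    return (hits + 1) / (rounds + 1)

# Hypothetical level completion times in seconds for two design variants.
variant_a = [212, 198, 240, 225, 231, 205, 219, 244]
variant_b = [171, 185, 160, 192, 178, 166, 189, 174]

p_value = permutation_test(variant_a, variant_b)
print(p_value)
```

A small p-value suggests the difference between the variants is unlikely to be chance alone. Whether a faster completion time is actually more fun is still a design question that the experiment cannot answer by itself.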


Survey (set of standardized questions with forced-choice responses; quantifies feedback/opinions; player categorization)

- Eliminates nuance

- Difficulty in connecting ratings to meaningful decisions

- Limited solution space

+ Less biased response

+ Validate responses (repetitive questions)

+ Forced choice helpful for revealing preference

+ Ratings help time-based comparisons
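
Validating responses with repeated questions might look like the sketch below. The question ids, the 1-5 rating scale, and the tolerance are all assumptions for illustration, not details from the talk.

```python
# Hypothetical survey: each question is rated 1-5, and q2 and q7 ask the
# same thing in different words.
REPEATED_PAIRS = [("q2", "q7")]

def inconsistent_respondents(responses, pairs=REPEATED_PAIRS, tolerance=1):
    """Return ids of players whose answers to a repeated question disagree."""
    flagged = []
    for player_id, answers in responses.items():
        for first, second in pairs:
            if abs(answers[first] - answers[second]) > tolerance:
                flagged.append(player_id)
                break  # one inconsistent pair is enough to flag the player
    return flagged

responses = {
    "p01": {"q2": 4, "q7": 5},
    "p02": {"q2": 1, "q7": 5},
    "p03": {"q2": 3, "q7": 3},
}

flagged = inconsistent_respondents(responses)
print(flagged)
```

Flagged responses can be down-weighted or followed up in an individual Q&A rather than discarded outright.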



  • Do QA early
  • Understand Pros/Cons of existing methods
  • Correctly frame design questions
  • Be aware of emerging technologies