We study the ability of traditional diagnostic tests, and of LM and CUSUM structural break tests, to detect several types of breaks in GARCH models. We find that Wooldridge's (1990) robust LM tests for autocorrelation and ARCH have no power to detect structural breaks in GARCH models. By contrast, CUSUM- and LM-based structural break tests both have excellent size when the data are Gaussian. When returns have fat tails, however, the CUSUM tests tend to overreject even in quite large samples, whereas the LM-based tests retain approximately the correct size and exhibit impressive power to detect a range of breaks in the dynamics of conditional volatility. We apply these tests to a range of financial time series, using return samples that begin only in 1990, and find that many GARCH models that pass standard specification tests fail the structural break tests.