“Regional redundancy works when failure is physical or infrastructural. It doesn’t work when failure is logical and shared,” Gogia said. “When metadata contracts change in a backwards-incompatible way, every region that depends on that shared contract becomes vulnerable, regardless of where the data physically resides.”
The outage exposed a misalignment between how platforms test and how production actually behaves, Gogia said. Production involves drifting client versions, cached execution plans, and long-running jobs that cross release boundaries. “Backwards compatibility failures typically surface only when these realities intersect, which is difficult to simulate exhaustively before release,” he said.
The issue raises questions about Snowflake’s staged deployment process. Staged rollouts are widely misunderstood as containment guarantees when they’re actually probabilistic risk reduction mechanisms, Gogia said. Backwards-incompatible schema changes often degrade functionality gradually as mismatched components interact, allowing the change to propagate across regions before detection thresholds are crossed, he said.
