Tuesday, September 13, 2011

Bad For Babies and Bad For SQL Server

The life of a superhero isn't all fame and glory, most days the only excitement that we see is rescuing a cat from a tree, or in my case, saving a bit of T-SQL code from a terrible fate, as in this next story.

I encountered a real villain, a nasty sort of fellow who liked to steal candy from babies.  The nogoodnik was using a SQL Server database to record his evil deeds.  When I tracked him down, he was attempting to build some reports that would allow him to summarize his nefarious activities.

What I saw made me sick, and it's going to be very difficult to describe for you, but I'll do my best.  So that you may truly appreciate what I was up against, I've included some sample code below.  I apologize for what you're about to see, women and small children should leave the room now.

First, let's create some sample data to represent the villain's database.  Note that the real data was many times larger than this, the meanie had really been busy.

CREATE TABLE Baby
   (
       BabyID          INT             NOT NULL PRIMARY KEY CLUSTERED,
       BabyName        VARCHAR(255)    NOT NULL
   );

CREATE TABLE Candy
   (
       CandyID         INT             NOT NULL PRIMARY KEY CLUSTERED,
       CandyForm       CHAR(4)         NOT NULL,
       CandyFlavor     VARCHAR(25)     NOT NULL
   );

CREATE TABLE CandySwipeEvent
   (
       EventID         INT             NOT NULL IDENTITY(1,1)   PRIMARY KEY CLUSTERED,
       BabyID          INT             NOT NULL FOREIGN KEY REFERENCES Baby(BabyID),
       CandyID         INT             NOT NULL FOREIGN KEY REFERENCES Candy(CandyID),
       EventDate       DATETIME        NOT NULL,
       Weather         VARCHAR(25)     NOT NULL,
       BabyReaction    VARCHAR(25)     NOT NULL
   );

INSERT INTO Baby (BabyID, BabyName) VALUES (1, 'Billy');
INSERT INTO Baby (BabyID, BabyName) VALUES (2, 'Timmy');
INSERT INTO Baby (BabyID, BabyName) VALUES (3, 'Susie');
INSERT INTO Baby (BabyID, BabyName) VALUES (4, 'Butch');
INSERT INTO Baby (BabyID, BabyName) VALUES (5, 'Josephina');
INSERT INTO Baby (BabyID, BabyName) VALUES (6, 'Stevie');

INSERT INTO Candy (CandyID, CandyForm, CandyFlavor) VALUES (1, 'Bar', 'Chocolate');
INSERT INTO Candy (CandyID, CandyForm, CandyFlavor) VALUES (2, 'Bar', 'Peanut Butter');
INSERT INTO Candy (CandyID, CandyForm, CandyFlavor) VALUES (3, 'Bag', 'Chocolate');
INSERT INTO Candy (CandyID, CandyForm, CandyFlavor) VALUES (4, 'Pop', 'Fruit');
INSERT INTO Candy (CandyID, CandyForm, CandyFlavor) VALUES (5, 'Gel', 'Rainbow Worm')

INSERT INTO CandySwipeEvent (BabyID, CandyID, EventDate, Weather, BabyReaction) VALUES (1, 1, '12/25/1973 06:45am', 'Snowing', 'Cried');
INSERT INTO CandySwipeEvent (BabyID, CandyID, EventDate, Weather, BabyReaction) VALUES (1, 2, '12/25/1974 06:07am', 'Cloudy', 'Cried');
INSERT INTO CandySwipeEvent (BabyID, CandyID, EventDate, Weather, BabyReaction) VALUES (1, 1, '12/25/1975 04:31am', 'Snowing', 'Cried');
INSERT INTO CandySwipeEvent (BabyID, CandyID, EventDate, Weather, BabyReaction) VALUES (1, 5, '12/25/1988 10:29am', 'Snowing', 'Punched me');
INSERT INTO CandySwipeEvent (BabyID, CandyID, EventDate, Weather, BabyReaction) VALUES (1, 4, '12/25/1992 1:47pm', 'Sunny', 'Called police');
INSERT INTO CandySwipeEvent (BabyID, CandyID, EventDate, Weather, BabyReaction) VALUES (2, 3, '4/1/1978 10:13am', 'Sunny', 'Cried');
INSERT INTO CandySwipeEvent (BabyID, CandyID, EventDate, Weather, BabyReaction) VALUES (2, 3, '4/2/1978 10:17am', 'Sunny', 'Cried');
INSERT INTO CandySwipeEvent (BabyID, CandyID, EventDate, Weather, BabyReaction) VALUES (2, 3, '4/3/1978 10:31am', 'Sunny', 'Cried');
INSERT INTO CandySwipeEvent (BabyID, CandyID, EventDate, Weather, BabyReaction) VALUES (3, 4, '7/4/1983 2:49pm', 'Raining', 'Dog bit me');
INSERT INTO CandySwipeEvent (BabyID, CandyID, EventDate, Weather, BabyReaction) VALUES (4, 4, '7/4/1983 3:03pm', 'Raining', 'Pulled knife');
INSERT INTO CandySwipeEvent (BabyID, CandyID, EventDate, Weather, BabyReaction) VALUES (5, 5, '9/1/1986 4:51pm', 'Sunny', 'Cried');
INSERT INTO CandySwipeEvent (BabyID, CandyID, EventDate, Weather, BabyReaction) VALUES (5, 5, '10/31/1986 8:19pm', 'Cloudy', 'Cried');
INSERT INTO CandySwipeEvent (BabyID, CandyID, EventDate, Weather, BabyReaction) VALUES (6, 1, '10/31/1986 8:26pm', 'Cloudy', 'Cried');
INSERT INTO CandySwipeEvent (BabyID, CandyID, EventDate, Weather, BabyReaction) VALUES (6, 1, '10/31/1986 8:27pm', 'Cloudy', 'Cried');

Our evil genius was trying to write a query that would show him the last time that he had taken chocolate candy on a cloudy day, from any baby in his long history.  Somehow, he had mangled together this bit of code - makes me cold all over just to look at it:

SELECT
   BabiesToProcess.BabyID,
   (
       SELECT TOP 1 CandySwipeEvent.EventID
       FROM CandySwipeEvent
       INNER JOIN Candy
           ON CandySwipeEvent.CandyID = Candy.CandyID
       INNER JOIN Baby
           ON CandySwipeEvent.BabyID = Baby.BabyID
       WHERE CandySwipeEvent.Weather = 'Cloudy'
           AND Candy.CandyFlavor = 'Chocolate'
           AND CandySwipeEvent.BabyID = BabiesToProcess.BabyID
       ORDER BY CandySwipeEvent.EventDate DESC
   ) AS CandySwipeEventID
FROM
   (
       SELECT TOP (100) PERCENT BabyID
       FROM Baby
       ORDER BY BabyID
   ) AS BabiesToProcess
WHERE EXISTS
   (
       SELECT TOP 1 CandySwipeEvent.EventID
       FROM CandySwipeEvent
       INNER JOIN Candy
           ON CandySwipeEvent.CandyID = Candy.CandyID
       INNER JOIN Baby
           ON CandySwipeEvent.BabyID = Baby.BabyID
       WHERE CandySwipeEvent.Weather = 'Cloudy'
           AND Candy.CandyFlavor = 'Chocolate'
           AND CandySwipeEvent.BabyID = BabiesToProcess.BabyID
       ORDER BY CandySwipeEvent.EventDate DESC
   );

A closer look at the query plan for his monstrous creation shows that each table in the query is traversed twice, sometimes as a full scan.  I mentioned earlier that the real database was quite large - the real CandySwipeEvent table contains 59 million rows, the Candy table 30 million, and the Baby table 19 million.


After studying this for a moment, and realizing what his ultimate evil goal really was, I summoned all of my limited powers to produce this:

SELECT
   BabyID,
   CandySwipeEventID
FROM
   (
       SELECT
           BabyID              = Baby.BabyID,
           CandySwipeEventID   = CandySwipeEvent.EventID,
           SwipeDateSequence   = ROW_NUMBER() OVER(PARTITION BY Baby.BabyID ORDER BY CandySwipeEvent.EventDate DESC)
       FROM CandySwipeEvent
       INNER JOIN Baby
           ON CandySwipeEvent.BabyID = Baby.BabyID
       INNER JOIN Candy
           ON CandySwipeEvent.CandyID = Candy.CandyID
       WHERE CandySwipeEvent.Weather = 'Cloudy'
           AND Candy.CandyFlavor = 'Chocolate'
   ) AS SequencedSwipeEvents
WHERE SwipeDateSequence = 1;

The result is a much cleaner, more efficient query plan.  I could have improved it further by doing some index analysis on the CandySwipeEvent table to eliminate that scan, but my powers were exhausted by this point.


If I'd had my way, this evil-doer would have been permanently dispatched that day, but alas, the SuperHero Rulebook prevents such things, and he's still out there, somewhere, torturing T-SQL and stealing candy from babies.

0 comments:

Post a Comment