Thursday, February 2, 2012

20 Abuses: SELECT * - Doom With A View

Views are great.  We all love views, right?  Don't like the looks of that query with 28 tables all joined together?  Hide the joins in a view, then your query becomes much simpler.  Have fun figuring out that query plan a year from now, after you've forgotten what you've done.  Views are excellent tools for creating complex performance issues.

Views can also be downright dangerous if used improperly.  Want to see how?

Let's create a table:

CREATE TABLE Employees
 (
  EmployeeID   INT    NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
  EmployeeName  VARCHAR(255) NOT NULL,
  Address1   VARCHAR(255) NOT NULL,
  Address2   VARCHAR(255) NULL,
  City    VARCHAR(25)  NOT NULL,
  State    CHAR(2)   NOT NULL,
  ZipCode    CHAR(5)   NOT NULL
 );

Feed it some data:

INSERT INTO Employees (EmployeeName, Address1, Address2, City, State, ZipCode)
 VALUES ('Charlie Brown', '123 Rice Street', NULL, 'St. Paul', 'MN', '55122');
INSERT INTO Employees (EmployeeName, Address1, Address2, City, State, ZipCode)
 VALUES ('Sheldon Cooper', '2455 Orange Court Drive', NULL, 'Pasadena', 'CA', '91051');
INSERT INTO Employees (EmployeeName, Address1, Address2, City, State, ZipCode)
 VALUES ('Awesome Bill', '14 Swamp Road', NULL, 'Dawsonville', 'GA', '30534');
INSERT INTO Employees (EmployeeName, Address1, Address2, City, State, ZipCode)
 VALUES ('Stephen King', '19 Turtleback Lane', NULL, 'Lovell', 'ME', '04051');
INSERT INTO Employees (EmployeeName, Address1, Address2, City, State, ZipCode)
 VALUES ('Al Bundy', '9764 Jeopardy Lane', NULL, 'Chicago', 'IL', '60623');











Then create a view on the table:

CREATE VIEW EmployeesView
AS
SELECT *
FROM Employees;

Simple enough.  Why would you do this in real life?  I don't know, but it happens.  Let's also overlook the poor design of the table, this is solely for purposes of demonstration.  You wouldn't design a name/address table quite like this in real life.

So, we have our view now, and somebody, maybe you, creates a report or maybe an Intranet page, meant to serve as an employee directory.  This report queries the view, pulling out the employee name, using a query like this one:

SELECT EmployeeName FROM EmployeesView;















Easy enough, this isn't rocket science.  This report is made available for public consumption, and soon becomes a vital tool to the company.  Congratulations, you've made a contribution.  Life is good.

Until....

The decision is made to change the design of the Employees table.  Perhaps you've hired a new data architect or DBA who has pointed out some flaws.  Maybe you've just come to realize the error of your ways, and are trying to fix things now.  The point is, the table structure is changing.  The current "EmployeeName" column is going to be split into separate columns for FirstName and LastName, as it should have been in the first place.  A script is prepared that will make the necessary DDL and DML changes.

ALTER TABLE Employees
 ADD FirstName VARCHAR(255), LastName VARCHAR(255);

UPDATE Employees
SET
 FirstName = SUBSTRING(EmployeeName, 1, CHARINDEX(' ', EmployeeName) - 1),
 LastName = SUBSTRING(EmployeeName, CHARINDEX(' ', EmployeeName) + 1, 255);

ALTER TABLE Employees
 DROP COLUMN EmployeeName;









This goes to production on Saturday night as part of your monthly deployment outage.  You haven't a care in the world.  Your automated tests didn't report any problems, so everything must be OK.  You've forgotten all about that "vital" employee directory that made you famous.  That changes around 6:00am Sunday morning, when you are given a very unexpected reminder, in the form of a phone call.  On the other end of that call is your on-call tech, who has been fielding angry phone calls all night long.  The employee directory is not working correctly, and people are unhappy.  Confused, you take a look at the directory yourself, and sure enough, it's broken.  Instead of a list of employee names, you're seeing a list of street addresses.  What happened?

You dig in, find the query that the report is using, which hasn't changed::

SELECT EmployeeName FROM EmployeesView;

You run that query in Management Studio, and adding to your confusion, you see this:












You've just been bitten by "SELECT *".  When you originally defined the view, you didn't want to type out the column list, so you took the easy way out and used "*" as the column list.  I won't call you lazy, but...

The problem with this is that view, when declared in this way, doesn't care about the column names in the underlying table.  It knows that there were 7 columns in the original table, so it "reserves" a spot for 7 columns within it's output.  It remembers the column names as they existed at creation, and their ordinal position, and that's all it cares about.  When you select from the view, it returns "table column 1" as "output column 1", with the name that it remembers from its original creation.  In the case of this example, "EmployeeName" was the second ordinal column in the table.  Until this view is regenerated, it will always return the second ordinal column of the table as "EmployeeName".

How do we fix this and prevent it from happening again?

We drop the view and recreate it, specifying exactly the columns that we want to be included in the view.  If you want every column returned, list every column in the column list.  This will not prevent the underlying table from changing, but it WILL cause an error to be generated when you next query the view.  This would have made you aware of the error before this code made it to production.

Want proof?  Let's start over.

Drop the view and the table:

DROP VIEW EmployeesView;
DROP TABLE Employees;

Create the table:

CREATE TABLE Employees
 (
  EmployeeID   INT    NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
  EmployeeName  VARCHAR(255) NOT NULL,
  Address1   VARCHAR(255) NOT NULL,
  Address2   VARCHAR(255) NULL,
  City    VARCHAR(25)  NOT NULL,
  State    CHAR(2)   NOT NULL,
  ZipCode    CHAR(5)   NOT NULL
 );

Fill it with data:

INSERT INTO Employees (EmployeeName, Address1, Address2, City, State, ZipCode)
 VALUES ('Charlie Brown', '123 Rice Street', NULL, 'St. Paul', 'MN', '55122');
INSERT INTO Employees (EmployeeName, Address1, Address2, City, State, ZipCode)
 VALUES ('Sheldon Cooper', '2455 Orange Court Drive', NULL, 'Pasadena', 'CA', '91051');
INSERT INTO Employees (EmployeeName, Address1, Address2, City, State, ZipCode)
 VALUES ('Awesome Bill', '14 Swamp Road', NULL, 'Dawsonville', 'GA', '30534');
INSERT INTO Employees (EmployeeName, Address1, Address2, City, State, ZipCode)
 VALUES ('Stephen King', '19 Turtleback Lane', NULL, 'Lovell', 'ME', '04051');
INSERT INTO Employees (EmployeeName, Address1, Address2, City, State, ZipCode)
 VALUES ('Al Bundy', '9764 Jeopardy Lane', NULL, 'Chicago', 'IL', '60623');









Create the view, PROPERLY this time:

CREATE VIEW EmployeesView
AS
SELECT EmployeeName
FROM Employees;

Quick test, just to make sure:

SELECT EmployeeName FROM EmployeesView;















Repeat the DDL/DML changes:

ALTER TABLE Employees
 ADD FirstName VARCHAR(255), LastName VARCHAR(255);
GO

UPDATE Employees
SET
 FirstName = SUBSTRING(EmployeeName, 1, CHARINDEX(' ', EmployeeName) - 1),
 LastName = SUBSTRING(EmployeeName, CHARINDEX(' ', EmployeeName) + 1, 255);
GO

ALTER TABLE Employees
 DROP COLUMN EmployeeName;
GO









How's the view look now?

SELECT EmployeeName FROM EmployeesView;






Oh, look, now we're getting an error.  Not good, but much better than "phantom" data.  This will show up in testing, hopefully, steering you away from your impending doom.  For now...

4 comments:

  1. A relted bad practice that is similar to "select *" is:

    ORDER BY 1, 4, 7

    Yes , it worked when first written, but there is not an all powerful universal force insuring that columns will forever stay in the same order! They can  be re-ordered anytime the DBA thinks they need to be!

    ReplyDelete
  2. Excellent point there, old man...

    ReplyDelete
  3. This is a great example of a trap for young players (and sometimes old!). Practical and concise.

    As an alternate suggestion, rather than recreate the view each time, you could refresh the view to revalidate it after a table change, e.g:

    EXEC sp_refreshview 'EmployeesView'Or perhaps consider schema binding if this option is available to you.

    ReplyDelete
  4. Thanks Daniel, and good info to add to the post.  I was so focused on the "don't use SELECT *" message that I didn't think to include other options for fixing the view.  I sense a revision forthcoming.

    ReplyDelete