Friday, December 21, 2012

Mayans, Shmayans


The day is nearly over, and it appears that we're going to survive the Mayan apocalypse, so I'm going to proceed with my end-of-year review.  

Last year, I laid out some specific things that I wanted to focus on during this year.  I assure you that when I made this list, all of these seemed attainable.  I'm a little ashamed to see just how far off track I got:
  • continuing my involvement in local PASSMN activities - I have a mixed opinion on whether I met this goal or not.  I count this as a failure because I'm not attending meetings regularly.  On the other hand, I count this as a success because I did something completely unexpected this year.  I presented at SQL Saturday #118 in Madison!  I also made a minuscule contribution to SQL Saturday #149, helping out in limited capacity due to a scheduling conflict.  Add in the trip to Colorado Springs for SQL Saturday #104, and I'm comfortable saying that I've made up for not being a regular meeting attendee.
  • attend the PASS Summit - this one's a big FAIL.  I could not get my employer to sign off on sending me to Summit.  With one of my team out having a baby, it wouldn't have worked out anyway.  Maybe next year!
  • continue improving this blog and my writing skills - this didn't go as well as planned, but I'm not going to count it as a failure.  Writing takes time, preparing meaningful examples and demos takes time, and like so many others out there, I've got a lot on my plate, and little free time for writing.  I'm still trying to figure out how to make this fit into my routine.
  • establish an ongoing developer training and mentoring program at work - this worked well, for a while.  I did several in-house trainings, sort of "mini SQL Saturday" sessions.  However, just like the writing, it takes time to prepare quality training presentations, and I simply don't have it right now.
  • push to eliminate all SQL 2000 and SQL 2005 systems from our environment - failed, due to the inability to get hardware to build the replacement servers for these environments.  On the agenda for 2013.
  • read and review 4 non-fiction books - one.  One book.  I read several books this year, but with one exception, they were all fiction of some sort.  The only non-fiction book that I finished in 2012 was Poke The Box.
  • obtain the SQL 2008 developer and admin IT Pro certifications - I decided not to pursue this, mostly because the SQL 2012 beta exams became available in April of 2012.  I took all five of the MCSA and MCSE:Data Platform exams, 71-461, 71-462, 71-463, 71-464, and 71-465.  I passed all of them except 71-463 (now 70-463).  This one I'm actually going to have to study for.  The study guide is sitting on my desk, right here in front of me.

In terms of goals, 2012 looks like a loss.  I was incredibly busy, got a promotion, worked my butt off, but accomplished very little of what I personally wanted to accomplish.  Maybe my expectations were too high, maybe I really was just too busy, or maybe I'm too quick to look for excuses, I'm not sure.  I'm going to try again, with a similar list of goals, but I'm going to narrow the focus a bit, and try to make the goals more specific in 2013.
  • attend 6 local PASS meetings
  • help out with the local SQL Saturday event
  • present at the local SQL Saturday event
  • write a minimum of one new blog post per month
  • conduct 6 in-house trainings, perhaps dual-purposing blog post or SQL Saturday content, to minimize prep time needed
  • read and review 4 non-fiction books
  • pass exam 70-463 (obtaining both MCSA and MCSE: Data Platform for SQL 2012)
This feels like a solid list of goals, without feeling like I'm making them too easy versus last year's.  They're also specific, easy to determine if I've obtained them or not, and should be fairly easy to track progress throughout the year.  I'd welcome any advice as to how to keep on track and reach these!

Monday, December 17, 2012

To Test, Or Not To Test?

That is the question.

You've been called in to help another division within your company.  They're experiencing severe performance problems (the server is basically on fire) with their production SQL Server instance, and without a true DBA on staff, they're looking for help to resolve the problems.

You gain access to their SQL instance, check the basic stuff - CPU is not pegged, disk I/O seems high, lots of CXPACKET and PAGELATCH waits.  Using sp_who2, you quickly identify a query that seems to be running several times concurrently, and takes a long time to run each time.

SELECT RowID, CharValue, DateValue
FROM TestOrNot
WHERE ExternalID = 12

The table TestOrNot has the following structure, and contains 1 million rows of data, with roughly 10,000 rows for each ExternalID value.

CREATE TABLE TestOrNot
(RowID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
,CharValue CHAR(20)
,ExternalID INT
,DateValue DATETIME
);

The table has one (and only one) non-clustered index:

CREATE INDEX IX_TestOrNot_ExternalID ON TestOrNot (ExternalID);

The query that you've identified is using the following query plan:


It is clear from this plan that the query is going parallel (the likely source of the CXPACKET waits), and it's doing an expensive key lookup (likely the source of the PAGELATCH waits).  This seems like a no-brainer - widen the existing non-clustered index to include the additional columns being requested by the query.  This will eliminate the key lookup, the most expensive step in the query plan.

To prove our theory, lets create a second non-clustered index that includes the extra columns:

CREATE INDEX IX_TestOrNot_ExternalID2 ON TestOrNot (ExternalID) INCLUDE (CharValue, DateValue);

The plan for the problem query changes - it now looks like this:


We've not only eliminated the key lookup, but we've also stopped the query from going parallel.  True, we could have done this in a number of ways, but in this case, we've given SQL Server what it needed to make that decision itself.

Right away, the effects of the index are apparent.  System performance improves immediately, response times are improved, this change appears to have solved the problem.  You proudly announce to the world "Problem solved!", only to hear "Not so fast, we need to test this.  Please remove that index from production and we'll start testing it in development."

Ummm, but, it's a simple non-clustered index change.  It's not a unique index, it's not a clustered index, it clearly solved the problem - your server is no longer on fire!  I make minor index changes like this ALL THE TIME to address performance problems in my systems.  "Well, that's not how we do things in this division, this needs to go through a development and test cycle.  Should be ready sometime next week."

Alrighty then.  Your system, your customers, your suffering.  You asked me for help, I found and fixed the problem, you're choosing to suffer needlessly for a few more days.  My work here is done.

How do you handle seemingly trivial changes like this in your environment?  I'm not talking about CLUSTERED index changes, or UNIQUE index changes, I'm talking only about simple non-clustered index additions or modifications.  Do you test, or do you just jump in with both feet?

Leave a comment with your thoughts...

Tuesday, August 14, 2012

T-SQL Tuesday: Are We There Yet?



Wow, my second blog post in a week!  I continue to struggle to find time to write content to post here.  Not for lack of ideas; I have more ideas than time.  Since today is T-SQL Tuesday, it seemed like a good time to try to sneak in a blog post before lunch.

This week is being hosted by Mike Fal, and the topic is "Trick Shots".  The assignment is to share some nifty SQL Server trick that we've learned along the way, something simple but useful, something that maybe some people haven't seen before.

There are so many tricks out there, many of which have been written about ad nauseum, I wasn't sure what to write about.  After digging through my collection of utility scripts, I settled on one that I use at least once a week.

I discovered this "trick" a couple of years ago, shortly after upgrading to SQL Server 2008 R2.  I spent several weeks getting to know compression, aggressively compressing some of our largest and busiest indexes and tables.  This translated to lots of time sitting there staring at Management Studio, wondering how long that 40M row index was going to take to compress.  I was thrilled to discover that there's an easy way to answer that question, and it comes in the form of a simple SELECT statement.

Open up your AdventureWorks database, and run this statement:


SELECT *
FROM sys.indexes
WHERE object_id = OBJECT_ID('Sales.SalesOrderDetail');


We can see from these results that the table Sales.SalesOrderDetail has three indexes.  Now go run this statement:

SELECT *
FROM sys.partitions
WHERE object_id = OBJECT_ID('Sales.SalesOrderDetail');


These results shows us more information about those three indexes, stuff about the partitioning, compression status, etc.  The bit of info that we're interested in for this exercise is the rows column, which tells us, obviously, how many rows are in a particular index.

Now let's assume that we want to rebuild one of those indexes.  We can't interfere with users who are in the system, nor can we afford to cause any downtime, so we're going to do this as an ONLINE operation:

ALTER INDEX PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID 
ON Sales.SalesOrderDetail 
REBUILD WITH (ONLINE=ON);

While this is running, quickly open another tab and run this statement:

SELECT *
FROM sys.partitions
WHERE object_id = OBJECT_ID('Sales.SalesOrderDetail');


Notice that there are now TWO rows returned with an index_id of 1.  What we're seeing here is a copy of the clustered index (the one we specified in our ALTER statement) being rebuilt alongside the existing clustered index.  When it's done, that copy will replace the real index.  That's not the point of this exercise though.  The point is to answer the question - how is the rebuild progressing?  For that answer, we're going to look at the rows column.

You can see that the two rows with an index_id of 1 have a different number of rows indicated.  The larger number is the number of rows contained in the existing index.  The smaller number is the number of rows contained in the copy that is being built.  By repeatedly running this SELECT statement and observing those two rows values, we can monitor the progress of the rebuild.  The closer the smaller number gets to the larger number, the closer the rebuild is to being finished.

I know, this is a pretty underwhelming "trick", but it's turned out to be an extremely useful bit of knowledge for me.  Maybe somebody else will agree.

Thursday, August 9, 2012

Cheating DBCC To Fix Index Corruption

You're still here? Why? There hasn't been anything new to read here for weeks! Sorry, I've been busy, sitting in endless meetings, learning how to play "manager". It hasn't been pretty.

I'm back, with a new tale of good versus evil. For dramatic effect, I suggest that you read this one aloud, in Christian Bale's "Batman" voice. Go ahead, try it!

This tale starts two days ago, with one of my Oracle DBA's reporting a problem. "We're missing records from a table, looks like replication from SQL Server is broken." Yeah, sure, you Oracle guys are always blaming SQL Server when something goes wrong. Based on the evidence so far, he was, however, seemingly correct. Grabbing my cane, I struggled out of my easy chair and hobbled over to the computer console.

The table in question contains 96 million rows, and is replicated to both an Oracle data warehouse and a SQL Server warehouse. There is also a companion audit table, populated via a trigger, capturing all update and delete activity that occurs within the source table. It would be easy to track down the culprit responsible for these missing rows. Personally, I was starting to suspect the maid.

A quick comparison of row counts revealed 1,295 missing rows. It also revealed a bigger mystery - the rows were missing from the SOURCE table, not the subscriber table in the Oracle warehouse. Checking the subscriber table in the SQL warehouse revealed the same thing - 1,295 rows that existed in the subscriber table but not in the source. I quickly formed an assumption - somebody had deleted these rows from the source table, but replication had for some reason failed to push that/those transactions to the subscribers. The audit table should make it easy to find out who, I just needed to know which rows were missing. The primary key on this table is "RowID", so a simple "NOT IN" query gave me the list of RowID values that were missing.
SELECT RowID 
FROM WAREHOUSE.StagingDB.dbo.BigCriticalTable 
WHERE RowID NOT IN 
   (
      SELECT RowID 
      FROM PublisherDB.dbo.BigCriticalTable
   )



Grabbing the first RowID from that list, I queried the audit table, ready to reveal the identity of the person responsible for deleting this missing row.

SELECT * 
FROM zBigCriticalTable_Audit 
WHERE RowID = 12345 
ORDER BY LogDate

And the villain is none other than... Hmmm, odd, I can see that the record was modified a few times, but according to the audit table, it has NOT been deleted. This villain is good, very good, he/she was able to bypass our auditing process. Cinching up my tights, I reached for the next tool in my utility belt - the trace logs that we capture 24 hours a day, 7 days a week.

I knew the specific day that these rows disappeared, so mining the trace logs was simply a matter of loading up all of the logs for that day into a table, then running some queries to find the DELETE statement that removed RowID 12345. I started that load process, and nine hours later, I had my answer. Once again, the villain is none other than... Ummm, hmmm, there doesn't seem to be a DELETE statement anywhere in the trace logs. I suddenly broke out into a cold sweat. As the numbness began to fade from my brain (manager, remember?), I realized who my opponent was.



As DBAs, we're responsible for keeping our data safe and secure. We're the silent guardians of the information age. The bane of our existence, always lurking, always threatening, is data corruption. Rarely seen these days, but when it is, there's always a mess to be cleaned up. Corruption is a deep, dark pit, and I just landed at the bottom of it.

Going back to my source table, I took a look at the table structure:

The table has a couple of indexes:


These are both non-filtered indexes, and should therefore both be able to locate a record by its RowID value, using an index scan if necessary. Using index hints, I used each index to check for the existence of my test record, RowID = 12345:

SELECT * FROM BigCriticalTable WITH (INDEX(PK_BigCriticalTable)) WHERE RowID = 12345

SELECT * FROM BigCriticalTable WITH (INDEX(CIX_BigCriticalTable_SecretCode)) WHERE RowID = 12345

Aha! When the clustered index is used, I'm able to see my test record. The non-clustered index, in spite of being keyed on RowID, returns nothing. My test row is missing from the non-clustered index. Easy fix, right? Just rebuild the non-clustered index!

ALTER INDEX PK_BigCriticalTable ON BigCriticalTable REBUILD WITH (ONLINE=ON)

Is that the way to climb out of this pit? Let's have a look:

SELECT * FROM BigCriticalTable WITH (INDEX(PK_BigCriticalTable)) WHERE RowID = 12345

SELECT * FROM BigCriticalTable WITH (INDEX(CIX_BigCriticalTable_SecretCode)) WHERE RowID = 12345

No luck - the row is still missing from the non-clustered index. Back to the bottom of the pit. Time to give DBCC a try:

DBCC CHECKTABLE (BigCriticalTable)

This is the actual DBCC CHECKTABLE output from my real table:

DBCC results for 'HostedAuthorization'.
Msg 8951, Level 16, State 1, Line 1
Table error: table 'HostedAuthorization' (ID 293576084). Data row does not have a matching index row in the index 'IDX_HOSAUT_ClientAuthIDClient' (ID 3). Possible missing or invalid keys for the index row matching:
Msg 8955, Level 16, State 1, Line 1
Data row (1:36536:157) identified by (ClientCode = 'XXXX' and ClientStudentID = '32168796' and UNIQUIFIER = 48) with index values 'ClientAuthorizationID = NULL and ClientCode = 'XXXX' and ClientStudentID = '32168796' and UNIQUIFIER = 48'.

(same errors repeat hundreds of times)

There are 94437500 rows in 510195 pages for object "HostedAuthorization".
CHECKTABLE found 0 allocation errors and 2508 consistency errors in table 'HostedAuthorization' (object ID 293576084).
repair_rebuild is the minimum repair level for the errors found by DBCC CHECKTABLE (HCDS.dbo.HostedAuthorization).
DBCC execution completed. If DBCC printed error messages, contact your system administrator.


Looks pretty scary, right? Imagine running into that in a dark data center!

So, according to that output, it looks like I can fix this with DBCC using the REPAIR_REBUILD option. I just have to put the database into single-user mode first. Which means shutting down an extremely busy system used by people around the world, 24 hours a day, or waiting four more days until the scheduled weekend maintenance outage. Neither is an attractive option. So, what to do?

Let's take a close look at just what the REPAIR_REBUILD option does. This is direct from Books Online for SQL Server 2008 R2:


Some specific parts of that stand out, in particular the parts about "no possibility of data loss" and lots of references to rebuilding indexes. Sounds an awfully lot like what happens when you run this:

ALTER INDEX ALL ON BigCriticalTable REBUILD WITH (ONLINE=ON)

Is this the secret to climbing out of this pit? After running that ALTER INDEX statement, I checked my query results again:

SELECT * FROM BigCriticalTable WITH (INDEX(PK_BigCriticalTable)) WHERE RowID = 12345

SELECT * FROM BigCriticalTable WITH (INDEX(CIX_BigCriticalTable_SecretCode)) WHERE RowID = 12345

Success! The "missing" record is now visible in both indexes. The city, err, data is saved!

Am I suggesting this as a replacement for DBCC CHECKTABLE? No, not at all. I'm simply offering it as an alternative to a costly system outage, or as an alternative to waiting and possibly allowing the problem to get worse. It's a low-risk option worth trying (unlike heli-skiing).  Again, I'm NOT suggesting this to be a replacement for, or your first choice over, the DBCC functions.  If you're facing a situation where you need to try something NOW, and DBCC isn't an option, this might be a solution for you.  In my case, the missing data was causing problems NOW, but I could not put the database into single-user mode to run DBCC without bringing down the production system.

You can stop with the Christian Bale voice now...

Monday, April 23, 2012

Why Did The Minnesotan Cross The River?

To get to the other side, duh...  The river in question is the St. Croix River, protecting Minnesota from Wisconsin.  Typically, Minnesotans only cross that river for one of three reasons - beer, cheese, or to get to Chicago.  This past weekend there was another reason - SQL Saturday in Madison, and it was AWESOME!  For a first-time event, it couldn't have been any better executed.  Great food all weekend, a great venue, great speakers, lots of fun and interesting people, just a fun weekend all around.

Did I mention great speakers?  There were so many sessions that I wanted to catch, but there just wasn't enough time or enough of me to go around.  I ended up bouncing from session to session, leaving some of them early to catch the ending of another.  I hope nobody thought I was being rude, I was simply trying to soak it all in.  Lots of great speakers...

And then there was me.  Yes, I spoke at this event as well, my first public presentation.  Ever.  Three weeks of preparation.  Good demos, fun slides, and surprisingly few jitters.  At least until about an hour before showtime.  I sat down in the speakers' lounge to do a quick run-through of the presentation, and COULD NOT CONNECT TO MY VM!!!  Instant panic attack.  Brain froze, and for a moment I had no idea what to do.  SQL?  What's SQL?  Eventually, I settled down, and realized that the problem was that the LAN adapter in my VM was setup as a bridged adapter instead of a host adapter.  A quick config change and I was up and running.  WHEW!

4:15pm - time for my big debut.  There I stood, in front of 30 total strangers, not terribly nervous, but having what I would almost describe as an "out of body" experience.  I could hear myself talking, but my brain didn't seem to be in control.  "What was I trying to show with this demo?  C'mon, think!".  I need better comments in my demo scripts.  After an awkward few minutes, I began to find my rhythm and the flow improved.  Audience participation was great, there was a lot of productive back-and-forth (we all learned a few things), and I was actually able to answer most of the questions that were asked.  My only regret is that I needed about 10 more minutes to full finish the presentation - we had to rush through the last few slides, skipping a couple of demos.

In spite of the bumpy start, the audience apparently liked what they saw.  From the 29 eval forms that were turned in, my average score was 4.1 (out of 5).  Not too shabby for a first-timer!  Lots of constructive comments as well, and then there was "Title of talk should be more specific".  This one has me stumped.  The session title was 10 Ways To Abuse T-SQL, and I showed 10 ways that new developers commonly abuse the language.  I'm not entirely sure what was unspecific, but I'm definitely curious how to fix this one.

Enough of my rambling.  A huge pat-on-the-back goes out to Jes Borland (blog | @grrl_geek), Ted Kreuger (blog | @onpnt), and everybody else who helped organize this event.  Very, very well done.  I also want to extend a big thanks to Eric Selje (blog | @EricSelje) for his encouragement before my session, and his reassurance afterwards.  Oh, and to my coworkers, who have had to endure endless "what do you think of this" questions over the past few weeks as this presentation began to take shape.  Gold stars for everybody!



Alright, where's the next event?

Friday, April 13, 2012

#SQLSat118

What's that weird title all about?  That's the Twitter hashtag for SQL Saturday #118, being held next Saturday, April 21st, in Madison, Wisconsin.  What's so special about SQL Saturday #118?  Well, besides being the first one held in Wisconsin, it's also the first time that I'll be presenting publicly.  I'll be discussing 10 ways to abuse T-SQL, 10 of the more common coding mistakes that new T-SQL developers make.

In addition to the presentation, I'll also be hosting a discussion table during the lunch break, something they're calling "Birds of a Feather".  There will be multiple tables to choose from, covering a variety of topics, each hosted by a different SQL Server expert.  Apparently there were no experts available to talk about Query Tuning, so they chose me.

I'm very much looking forward to this, and to many more such events in the future.  If you're attending the Madison event, look me up.  Stop by my table at lunch, or if you need a nap, sit through my presentation at 4:15pm.

Saturday, March 17, 2012

Ready Or Not, Here I Come!

A couple of months back, I conducted my first "training", kicking off a series of presentations to our internal development and QA teams.  I noted then how much fun I had, not only preparing the presentations, but giving them as well.  Since then, I've done two more sessions for the internal audience, the response to which has been very positive.

Time to step things up a bit...

The next few weeks will be full of firsts for me, as I step, no, let's say LEAP, well outside of my comfort zone.  Or what I thought was my comfort zone.  I say "thought" because I'm finding that I really enjoy doing these presentations.

The first first will come on March 21st at 1:00pm CST, as I participate in a discussion panel with Karen Lopez, Grant Fritchey, Tim Ford, Jorge Segarra, Stacia Misner, and Mike Walsh as part of the upcoming "24 Hours Of PASS" event.  This will be my first ever public "speaking" event, and I'm very much looking forward to it.

The second first occurs on April 21st in Madison, Wisconsin.  No webcam for protection this time - I'll be stepping out in front of a live audience as I take part in SQL Saturday #118.  I'll be presenting a condensed version of the presentation that I've done for my developer/QA audience, "10 Ways To Abuse T-SQL".  Small potatoes for the people that do this all the time, but a huge deal for a first-timer.  I can't wait!