My Generic Event Notifications for SQL Servers

While reading a blog post about using server DDL triggers to capture a “CREATE DATABASE” and fire an email to the DBA, I thought that coupling a trigger to sp_send_dbmail and an external executable wasn’t a great idea. I decided that there must be a better way to capture that important event. For me, this would really only occur on a development server, where the developers often have enhanced rights. Production is pretty tight. However, my heart is always broken when a developer runs up and asks me to restore an accidentally dropped database on the Dev server, only for me to find out that the same developer created that database without telling me. My backup packages don’t dynamically grab all the online databases when they run, so if someone adds a database and doesn’t tell me, it doesn’t get backed up.

My first thought was Extended Events. I don’t know much about them; I’ve heard the abstracts and read the rumors, but I haven’t had call to work with them. If I remember correctly, there seemed to be a way to capture the CREATE DATABASE event, but then I was stuck with a data record of the event and no handy way to email it to me. Searching for “Extended Events Send Email” took me quickly to Jonathan Kehayias’s awesome answer on the MSDN forums explaining that, no, there’s no real plumbing between Extended Events and the Service Broker. His second answer in the same discussion linked to his blog, which is a fountain of knowledge, and to an article explaining Event Notifications vs Extended Events. From there, I saw a link to another article on his blog. It sounded pretty darn close to what I wanted.

That led to my current version of a generic Event Notification system, which I am currently testing on a number of servers. It’s actually a combination of procedures from Jonathan’s articles and Sergey Maskalik’s article. Sergey’s error handling and timeout on the WAITFOR, along with cleanup of the conversation handles, coupled with Jonathan’s shredding of the XML message body seems to be a work of art to me. I added in some of my own magic: ensuring XACT_ABORT is on, adding some COALESCEs so a null value isn’t concatenated over valid values, and setting this up in a “utility” database with “TRUSTWORTHY ON” to allow the execution of sp_send_dbmail in MSDB.

We’ll start with the guts needed to set up and wire up the Service Broker queue, service, and route. It’s pretty much boilerplate, with the added commands to turn on the Service Broker and set TRUSTWORTHY.

NOTE: Jonathan has visited and brought up a very valid and real security risk. In my approach, I take a utility database and set TRUSTWORTHY=ON to it. Please visit and read Raul Garcia’s article on the risks of the TRUSTWORTHY bit. At the time of my writing this, the database that I deploy this solution to is already restricted to the DBA team and anyone with Sysadmin privileges. In this case, all those principals already have enough access to do what this setting allows without any other work, so I feel the risk in my environment is low. For a more secure solution, I strongly recommend a careful review of your situation, and indeed using certificates to sign the procedures to allow cross-database execution.
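
For anyone who wants to go the certificate route instead of TRUSTWORTHY, here is a rough sketch of the usual module-signing pattern, to be run once the ProcessEvents procedure shown later in this post exists. The certificate name, the msdb user name, the password and the file path are all placeholders I made up for illustration, and I have not run this against the exact setup in this post, so treat it as a starting point and test it on a dev box first.

```sql
-- Sketch only: DBAEventsMailCert, DBAEventsMailUser, the password and the
-- file path are hypothetical placeholders, not part of the original setup.
USE DatabaseBackup;
GO
CREATE CERTIFICATE [DBAEventsMailCert]
    ENCRYPTION BY PASSWORD = N'<strong password here>'
    WITH SUBJECT = N'Signs ProcessEvents for sp_send_dbmail';
GO

-- Sign the procedure. The signature is lost whenever the procedure is
-- altered or re-created, so this step has to be repeated after each change.
ADD SIGNATURE TO [dbo].[ProcessEvents]
    BY CERTIFICATE [DBAEventsMailCert]
    WITH PASSWORD = N'<strong password here>';
GO

-- Export only the public key so the certificate can be re-created in msdb.
BACKUP CERTIFICATE [DBAEventsMailCert]
    TO FILE = N'C:\Temp\DBAEventsMailCert.cer';
GO

USE msdb;
GO
CREATE CERTIFICATE [DBAEventsMailCert]
    FROM FILE = N'C:\Temp\DBAEventsMailCert.cer';
GO

-- A certificate-mapped user carries the permissions for the signed module.
CREATE USER [DBAEventsMailUser] FROM CERTIFICATE [DBAEventsMailCert];
GO
GRANT AUTHENTICATE TO [DBAEventsMailUser];
GRANT EXECUTE ON [dbo].[sp_send_dbmail] TO [DBAEventsMailUser];
GO

-- Only once you have confirmed that mail still flows:
-- ALTER DATABASE DatabaseBackup SET TRUSTWORTHY OFF;
```

The point of the design is that the permission to call sp_send_dbmail travels with the signed procedure only, rather than with everything owned by the database owner.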

USE [master];
go

-- SET EVERYTHING UP WITH THE SERVICE BROKER
--  We could also do this when creating the DatabaseBackup database
--  as part of the initial package run, or update.
--  Trustworthy allows a stored proc in the current database
--   to execute SP_SEND_DBMAIL in msdb

ALTER DATABASE DatabaseBackup SET ENABLE_BROKER;
go
ALTER DATABASE DatabaseBackup SET TRUSTWORTHY ON;
go


USE DatabaseBackup
GO

-- Drop the notification if it exists
IF EXISTS ( SELECT  *
            FROM    sys.server_event_notifications
            WHERE   name = N'CaptureDBAEvents' ) 
    BEGIN
        DROP EVENT NOTIFICATION [CaptureDBAEvents] ON SERVER;
    END

-- Drop the route if it exists
IF EXISTS ( SELECT  *
            FROM    sys.routes
            WHERE   name = N'DBAEventRoute' ) 
    BEGIN
        DROP ROUTE [DBAEventRoute];
    END

-- Drop the service if it exists
IF EXISTS ( SELECT  *
            FROM    sys.services
            WHERE   name = N'DBAEventService' ) 
    BEGIN
        DROP SERVICE [DBAEventService];
    END

-- Drop the queue if it exists
IF EXISTS ( SELECT  *
            FROM    sys.service_queues
            WHERE   name = N'DBAEventQueue' ) 
    BEGIN
        DROP QUEUE [DBAEventQueue];
    END

IF EXISTS ( SELECT * 
			FROM MASTER.sys.event_notifications
			WHERE name = N'CaptureDBAEvents' )
	BEGIN
		DROP EVENT NOTIFICATION [CaptureDBAEvents] ON SERVER
	END

--  Create a service broker queue to hold the events
CREATE QUEUE [DBAEventQueue]
WITH STATUS=ON;
GO

--  Create a service broker service to receive the events
CREATE SERVICE [DBAEventService]
ON QUEUE [DBAEventQueue] ([http://schemas.microsoft.com/SQL/Notifications/PostEventNotification]);
GO

-- Create a service broker route to the service
CREATE ROUTE [DBAEventRoute]
WITH SERVICE_NAME = 'DBAEventService',
ADDRESS = 'LOCAL';
GO

-- Create the event notification to capture the events
CREATE EVENT NOTIFICATION [CaptureDBAEvents]
ON SERVER
WITH FAN_IN
FOR CREATE_DATABASE, DROP_DATABASE, CREATE_LOGIN, DROP_LOGIN, CREATE_USER, DROP_USER, BLOCKED_PROCESS_REPORT, DEADLOCK_GRAPH, ADD_ROLE_MEMBER, ADD_SERVER_ROLE_MEMBER
TO SERVICE 'DBAEventService', 'current database';
GO

Right above, in the CREATE EVENT NOTIFICATION statement, you can see the event types I have. I decided that while it’s great to have AutoGrowth events sent, in our current environment they might be more noise than value, so we have left them out for now. Sure, there are a lot more audit events that I could hit up too, but I felt that the role membership, user, and login work and the database create and drop were a great start. The DEADLOCK_GRAPH was just a nice freebie.
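
If you want to see which other event types you could add to the FOR list, or double check what the notification ended up subscribed to, the catalog views can tell you. This is just a sketch; the LIKE filters are examples I picked, so adjust them to taste.

```sql
-- Event types whose names look relevant to file growth or role membership
SELECT type, type_name, parent_type
FROM sys.event_notification_event_types
WHERE type_name LIKE N'%AUTO_GROW%'
   OR type_name LIKE N'%ROLE_MEMBER%'
ORDER BY type_name;

-- The events each server-level notification is currently subscribed to
SELECT n.name, n.service_name, e.type_desc
FROM sys.server_event_notifications AS n
JOIN sys.server_events AS e
    ON e.object_id = n.object_id
ORDER BY n.name, e.type_desc;
```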

Next, the guts of this: a stored procedure that is generic enough to handle different event types and shred as much of the XML as possible into a friendly email message. Right, who doesn’t like a mailbox full of raw XML in the morning? 🙂 Note the final ELSE in the shredding and email body building: if we decide to add an event type, the procedure will simply email off the raw XML until a dedicated branch is written for it.

USE DatabaseBackup
GO

-- Drop the procedure if it exists
IF EXISTS ( SELECT * 
			FROM sys.procedures
            WHERE   name = N'ProcessEvents' ) 
    BEGIN
        DROP PROCEDURE [ProcessEvents];
    END
GO

CREATE PROCEDURE [dbo].[ProcessEvents]
WITH EXECUTE AS OWNER
AS    
	SET XACT_ABORT ON;
    DECLARE @eventType VARCHAR(128);
	DECLARE @messagetypename NVARCHAR(256);
	DECLARE @ch UNIQUEIDENTIFIER;

    DECLARE @serverName VARCHAR(128);
    DECLARE @postTime VARCHAR(128);
    DECLARE @databaseName VARCHAR(128);
    DECLARE @duration VARCHAR(128);
    DECLARE @growthPages INT;   
	DECLARE @userName VARCHAR(128);
	DECLARE @loginInfo VARCHAR(256);
	DECLARE @SID VARCHAR(128);

    DECLARE @messageBody XML;
	DECLARE @emailTo VARCHAR(50);
	DECLARE @emailBody VARCHAR(MAX);
	DECLARE @subject varchar(150);

	SET @emailTo = '<DBA TEAM EMAIL HERE>@gmail.com';

	WHILE (1=1) 
	BEGIN         
		BEGIN TRY                
			BEGIN TRANSACTION               
				WAITFOR (                        
					RECEIVE TOP(1)    
					@ch = conversation_handle,                                                            
					@messagetypename = message_type_name,                                
					@messagebody = CAST(message_body AS XML)                        
					FROM DBAEventQueue              
				), TIMEOUT 60000;             
				IF (@@ROWCOUNT = 0)              
				BEGIN                     
					ROLLBACK TRANSACTION;                       
					BREAK;                
				END                
				IF (@messagetypename = 'http://schemas.microsoft.com/SQL/Notifications/EventNotification')                
				BEGIN  
					--  Get the common information 
					SELECT @eventType = COALESCE(@messagebody.value('(/EVENT_INSTANCE/EventType)[1]','varchar(128)'),'UNKNOWN'),
						@serverName = COALESCE(@messagebody.value('(/EVENT_INSTANCE/ServerName)[1]','varchar(128)'),'UNKNOWN'),
						@postTime = COALESCE(CAST(@messagebody.value('(/EVENT_INSTANCE/PostTime)[1]','datetime') AS VARCHAR),'UNKNOWN');
					 
					SELECT  @emailBody = 'The following event occurred:' + CHAR(10) 
						+ CAST('Event Type: ' AS CHAR(25)) + @EventType + CHAR(10)
						+ CAST('ServerName: ' AS CHAR(25)) + @ServerName + CHAR(10) 
						+ CAST('PostTime: ' AS CHAR(25)) + @PostTime + CHAR(10);
                    
					-- Now the custom XML fields depending on the Event Type
					IF (@EventType like '%_FILE_AUTO_GROW')
					BEGIN
						SELECT @duration = COALESCE(@messagebody.value('(/EVENT_INSTANCE/Duration)[1]','varchar(128)'),'UNKNOWN'),
							@growthPages = COALESCE(@messagebody.value('(/EVENT_INSTANCE/IntegerData)[1]', 'int'), 0), -- default to 0: 'UNKNOWN' cannot be converted to INT
							@databaseName = COALESCE(@messagebody.value('(/EVENT_INSTANCE/DatabaseName)[1]','varchar(128)'),'UNKNOWN');
                    
						SELECT @emailBody = @emailBody
							+ CAST('Duration: ' AS CHAR(25)) + @Duration + CHAR(10) 
							+ CAST('GrowthSize_KB: ' AS CHAR(25)) + CAST(( @GrowthPages * 8 ) AS VARCHAR(20)) + CHAR(10)
							+ CAST('DatabaseName: ' AS CHAR(25)) + @DatabaseName + CHAR(10);
					END
					ELSE IF (@EventType like '%_DATABASE')
					BEGIN
						SELECT @userName = COALESCE(@messageBody.value('/EVENT_INSTANCE[1]/LoginName[1]', 'varchar(128)'),'UNKNOWN'),
							@DatabaseName = COALESCE(@messagebody.value('(/EVENT_INSTANCE/DatabaseName)[1]','varchar(128)'),'UNKNOWN');
					
						SELECT @emailBody = @emailBody 
							+ CAST('User: ' AS CHAR(25)) + @userName + CHAR(10)
							+ CAST('DatabaseName: ' AS CHAR(25)) + @DatabaseName + CHAR(10);
					END
					ELSE IF (@EventType like '%_LOGIN')
					BEGIN
						SELECT @userName = COALESCE(@messageBody.value('/EVENT_INSTANCE[1]/LoginName[1]', 'varchar(128)'),'UNKNOWN'),
							@loginInfo = COALESCE(@messageBody.value('/EVENT_INSTANCE[1]/ObjectName[1]', 'varchar(256)'),'UNKNOWN'),
							@SID = COALESCE(@messageBody.value('/EVENT_INSTANCE[1]/SID[1]', 'varchar(128)'),'UNKNOWN');
					
						SELECT @emailBody = @emailBody
							+ CAST('User: ' AS CHAR(25)) + @userName + CHAR(10)
							+ CAST('New User: ' AS CHAR(25)) + @loginInfo + CHAR(10)
							+ CAST('New SID: ' AS CHAR(25)) + @SID + CHAR(10);
					END
					ELSE IF (@EventType like '%_ROLE_MEMBER')
					BEGIN
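						DECLARE @roleName VARCHAR(256);  -- sized to match the varchar(256) used in .value() below
						DECLARE @command VARCHAR(256);   -- sized to match the varchar(256) used in .value() below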
						SELECT @userName = COALESCE(@messageBody.value('/EVENT_INSTANCE[1]/LoginName[1]', 'varchar(128)'),'UNKNOWN'),
							@loginInfo = COALESCE(@messageBody.value('/EVENT_INSTANCE[1]/ObjectName[1]', 'varchar(256)'),'UNKNOWN'),
							@roleName = COALESCE(@messageBody.value('/EVENT_INSTANCE[1]/RoleName[1]', 'varchar(256)'),'UNKNOWN'),
							@command = COALESCE(@messageBody.value('/EVENT_INSTANCE[1]/TSQLCommand[1]/CommandText[1]', 'varchar(256)'),'UNKNOWN');
						SELECT @emailBody = @emailBody
							+ CAST('User: ' AS CHAR(25)) + @userName + CHAR(10)
							+ CAST('Affected User: ' AS CHAR(25)) + @loginInfo + CHAR(10)
							+ CAST('New Role: ' AS CHAR(25)) + @roleName + CHAR(10)
							+ CAST('Command issued: ' AS CHAR(25)) + @command + CHAR(10);
					END
					ELSE  -- TRAP ALL OTHER EVENTS AND SPIT OUT JUST THE XML - We can pretty it up later 🙂
					BEGIN
						SELECT @emailBody = CAST(@messagebody AS VARCHAR(max));
					END

					-- Send email using Database Mail
					SELECT @subject = @eventType + ' on ' + @serverName;
					EXEC msdb.dbo.sp_send_dbmail                
						@profile_name = 'DBA Email', -- your defined email profile 
						@recipients = @emailTo, -- your email
						@subject = @subject,
						@body = @emailBody;               
				END              
				IF (@messagetypename = 'http://schemas.microsoft.com/SQL/ServiceBroker/Error')            
				BEGIN                        
					DECLARE @errorcode INT;                          
					DECLARE @errormessage NVARCHAR(3000) ;                 
					-- Extract the error information from the sent message                  
					SET @errorcode = (SELECT @messagebody.value(                        
						N'declare namespace brokerns="http://schemas.microsoft.com/SQL/ServiceBroker/Error";                         
						(/brokerns:Error/brokerns:Code)[1]', 'int'));                  
					SET @errormessage = (SELECT @messagebody.value(                        
						N'declare namespace brokerns="http://schemas.microsoft.com/SQL/ServiceBroker/Error";                        
						(/brokerns:Error/brokerns:Description)[1]', 'nvarchar(3000)'));                  
					-- Log the error 
					END CONVERSATION @ch WITH CLEANUP;                             
				END
				IF (@messagetypename = 'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog')                
				BEGIN                       
					-- End the conversation                        
					END CONVERSATION @ch WITH CLEANUP;                
				END                                 
			COMMIT TRANSACTION;   
		END TRY        
		BEGIN CATCH             
			IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION;  -- only roll back if a transaction is still open
			DECLARE @ErrorNum INT;                
			DECLARE @ErrorMsg NVARCHAR(3000);                
			SELECT @ErrorNum = ERROR_NUMBER(), @ErrorMsg = ERROR_MESSAGE();                
			-- log the error                
			BREAK;        
		END CATCH   
	END
GO

Finally, let’s activate the new stored procedure by altering the queue. Again, this is pretty much boilerplate.

-- Activate the procedure with the Queue
ALTER QUEUE [DBAEventQueue]
   WITH STATUS=ON, 
      ACTIVATION 
         (STATUS=ON,
          PROCEDURE_NAME = [ProcessEvents],
          MAX_QUEUE_READERS = 1,
          EXECUTE AS OWNER);
GO
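
To check that everything is wired up, a quick smoke test along these lines should do. It is only a sketch: ZZ_NotificationTest is a throwaway database name I made up, and it assumes Database Mail is already configured with the profile used in the procedure.

```sql
-- Sketch only: ZZ_NotificationTest is a hypothetical throwaway name.
CREATE DATABASE [ZZ_NotificationTest];
GO
DROP DATABASE [ZZ_NotificationTest];
GO

USE DatabaseBackup;
GO

-- Is the queue enabled, and is activation switched on?
SELECT name, is_receive_enabled, is_enqueue_enabled,
       is_activation_enabled, activation_procedure
FROM sys.service_queues
WHERE name = N'DBAEventQueue';

-- Messages still waiting in the queue (normally empty once ProcessEvents has run)
SELECT message_type_name, CAST(message_body AS XML) AS message_body
FROM [DBAEventQueue] WITH (NOLOCK);

-- Did Database Mail accept the messages?
SELECT TOP (10) mailitem_id, subject, sent_status, send_request_date
FROM msdb.dbo.sysmail_allitems
ORDER BY send_request_date DESC;
```

If messages pile up in the queue and no mail shows up, the SQL Server error log is the first place to look, since errors raised by an activated procedure are written there.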

Thanks, I hope that helps anyone interested in Event Notifications.

Merge replication crash dump

Ran into an interesting issue with Merge replication that had been set up by a vendor. This has been up and running in my environment with a central publisher that is not accessed by any client systems, and three subscribers which are placed regionally and used by client systems exclusively. The publisher simply acts as the publisher and synchronizes changes between the subscribers. The subscribers are pull subscriptions and everything is SQL Server 2008 SP2 CU6.

We added a new subscriber and left it unused by client systems for a few weeks. Things were fine; it was syncing all changes occurring at the other subscribers without issue. Suddenly, after some maintenance work, the merge process started crashing on this subscriber. In C:\Program Files\Microsoft SQL Server\100\Share\ErrorDumps\ we were getting minidump files every time we restarted the merge agent. I did some analysis of the dump files, and from the public symbols could see that it was an access violation coming from ReplRec.dll.

FAULTING_IP:
replrec!CReplRowChange::GetSourceRowData+19
00000000`70e8e469 48833a00 cmp qword ptr [rdx],0

EXCEPTION_RECORD: ffffffffffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 0000000070e8e469 (replrec!CReplRowChange::GetSourceRowData+0x0000000000000019)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 0000000000000000
Parameter[1]: 0000000000000002
Attempt to read from address 0000000000000002

DEFAULT_BUCKET_ID: NULL_CLASS_PTR_READ

PROCESS_NAME: replmerg.exe

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

EXCEPTION_PARAMETER1: 0000000000000000

EXCEPTION_PARAMETER2: 0000000000000002

READ_ADDRESS: 0000000000000002

What I didn’t realize at the time was that it was actually ssrmin.dll, a custom SQL Replication resolver that says if there is a conflict between two values, the lowest value wins. Looking back at the minidump, and the stack trace, I can see it now…

STACK_TEXT:
00000000`0a1bee60 00000000`70d13482 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`71f0796c : replrec!CReplRowChange::GetSourceRowData+0x19
00000000`0a1beea0 00000000`70e8deac : 00000000`04192ee0 00000000`00000000 00000000`041adf40 00000000`00000000 : ssrmin!MinResolver::Reconcile+0x1b2
00000000`0a1bfad0 00000000`70e3f807 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : replrec!CReplRowChange::Reconcile+0x123c
00000000`0a1bfc40 00000000`70e66592 : 00000000`04212a08 00000000`00000001 00000000`0b87e4d0 00000000`084e2610 : replrec!CDatabaseReconciler::DoArticleLoopDest+0x167
00000000`0a1bfcc0 00000000`70e7432f : 00000000`00000000 00000000`00000000 00000000`00000001 00000000`0000005e : replrec!CDatabaseReconciler::DestThreadProcessQueue+0x9d2
00000000`0a1bfe80 00000000`738d37d7 : 00000000`04390e00 00000000`00000000 00000000`00000000 00000000`00000000 : replrec!DestThreadProc+0x1af
00000000`0a1bff00 00000000`04390e00 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : msvcr80!endthreadex+0x47
00000000`0a1bff08 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`738d3894 : 0x4390e00

Since I couldn’t dig any further into the DLLs or the debug output, I opened a PSS case. In the meantime, I also started some Profiler traces on both the publisher and the subscriber. I caught what I thought were the last TSQL statements running before the crash, and in hindsight they also pointed at ssrmin.dll, since the article that was being compared was using that custom Minimum resolver.
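
In hindsight, a quick look at which articles use a custom resolver would have pointed at ssrmin.dll sooner. Something like the following sketch should show it; the first query is meant to be run in the publication database, and I am going from the documented system table and procedure rather than from the scripts used on this case.

```sql
-- Run in the publication database: articles that have a custom resolver configured
SELECT name AS article_name, article_resolver, resolver_clsid
FROM dbo.sysmergearticles
WHERE article_resolver IS NOT NULL
ORDER BY name;

-- Lists the business logic handlers and custom resolvers registered at the Distributor
EXEC sys.sp_enumcustomresolvers;
```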

I have to say, my experience with PSS (MSDN support contract) prior to this has not been pleasant. Long cycle times and delays after sending massive amounts of data to PSS were normal. This time, that was not the case. I opened the incident with as much detail as I could give, including some of the minidump files and my analysis similar to above. Within a few hours I had an email that the case was assigned and that I should expect a call soon. An hour later I had a voicemail on my work phone from Akbar at PSS. He was reviewing the crash dump files and other details without asking me to re-upload the data!

After a few days of a little back and forth, gathering details of the replication topology and version numbers of key files on all the systems, Akbar came back with his analysis of the crash dump files using the private debugging symbols that are available to PSS. He was able to trace through and see that where he expected a function call to jump into SSRMIN.DLL, it was not occurring as expected. He had me check the version of SSRMIN.DLL, and it did not match REPLREC.DLL. SSRMIN.DLL was at 10.0.4321.0 (SQL 2008 SP2 CU6) and REPLREC.DLL was at 10.50.1600.1.

SSRMIN.DLL file properties – version showing 10.0.4321.0

REPLREC.DLL file properties – version showing 10.50.1600.1

This subscriber also has a side-by-side install of SQL Server 2008 R2 which is why some versions were at 10.50.2500.0. What is odd is that two other subscribers were set up the same and also had side-by-side installs of 2008 and 2008 R2, and their versions of the custom resolvers were all at 10.50.1600.1

As a quick test, I copied SSRMIN.DLL from another subscriber and replaced the 10.0.4321.0 version on the bad subscriber. Merge replication was off and running again without crashing.

So we had our problem; we still needed a root cause and a real fix. What had caused this state, where only part of the Replication bits were updated when installing SQL Server 2008 R2 to a named instance, and how were we going to properly ensure that all the bits got updated? Akbar recommended running SP1 for SQL Server 2008 R2, which should update all the bits to 10.50.2500.0. After running SP1, I checked the file versions and SSRMIN.DLL (and all the other SSR*.DLL files) were still at 10.0.4321.0.

After reviewing all the setup logs for the SQL 2008 R2 install in C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\, Akbar noticed that the SQL Server 2008 R2 install had only included the Engine, and not Replication. That’s why SP1 did not touch any of the Replication bits. I ran the SQL Server 2008 R2 install again, and this time selected Replication. After it completed, I checked the file versions, and all the DLLs in C:\Program Files\Microsoft SQL Server\100\COM\ were updated to 10.50.2500.0… Yeah, to the SP1 version! So we had our fix. We also had the root cause.

Installed bits for SQL Server 2008 and 2008 R2

Showing the bits that are installed on both the SQL 2008 instance and the SQL 2008 R2 instance.

Since then, I have been able to reproduce this state on a lab machine. I installed SQL Server 2008 with the Engine and Replication selected, then I installed SP2, then I installed CU6. Almost all the files in the COM directory were at 10.0.4321.0 (some were at 10.0.1600.0). Then I installed SQL Server 2008 R2 to a named instance and selected only Engine. The results were that most everything was at 10.50.1600.1, but a number of DLL’s were still at 10.0.4321.0. Here’s the list of what was mismatched.
Ssrup.dll 10.0.4321.0
SSRPUB.dll 10.0.4321.0
SSRMIN.dll 10.0.4321.0
SSRMAX.dll 10.0.4321.0
SSRDOWN.dll 10.0.4321.0
SSRAVG.dll 10.0.4321.0
SSRADD.dll 10.0.4321.0
SPRESOLV.DLL 10.0.4321.0
MERGETXT.DLL 10.0.4321.0
Sqlfthndlr.dll 10.0.4321.0

Personally, I think this is caused by having both releases of SQL Server 2008 and SQL Server 2008 R2 share the same C:\Program Files\Microsoft SQL Server\100\ folder. SQL Server 2005 used the \90\ folder and SQL Server 2000 used the \80\ folder. Akbar is still testing things out in his lab to get me a final answer to my hypothesis. Until then, just something to keep in mind when running SQL Server 2008 and 2008 R2 side-by-side on the same server.

How I upgraded from Windows 8 Release Preview to RTM

I have been dabbling and playing in Windows 8 since the first Release Candidate was out. I like it. Of course, on my laptop I have installed Classic Shell http://classicshell.sourceforge.net/ which brings back my familiar Start Menu and Desktop. Metro seems ok for tablets, but for a Keyboard and Mouse setup, it drove me nuts.

Anyway, to the meat of the post here… When the RTM was released to MSDN yesterday I downloaded the ISO. I then mounted the ISO natively in my Windows 8 Release Preview install and ran Setup. I tried the upgrade path, the selection was available. After a couple of steps I was presented with the “This is not an available upgrade path, please visit http://noupgradepathforyou for more information”.

I decided to try the Windows 7 workaround for this. This means getting a writable copy of the ISO contents (I just copied the files from the already mounted ISO to a new folder) and then modifying /sources/cversion.ini. I changed the values of MinClient and MinServer to 7100. I then ran Setup from that folder, and my upgrade was successful.

Now, keep in mind, that this is not supported, and could end up with a catastrophic failure if some bits are not properly upgraded. But 12 hours in, and things seem stable.

Reporting Services 2008 R2 subscription error

So today we’re setting up new SQL Server 2008 R2 servers from an existing SQL Server 2005 server. One of the parts is Reporting Services, with reports running using Data Driven Subscriptions. I inherited the design of this system, where I feel Reporting Services has been turned sideways to use the Enterprise Edition feature of Data Driven Subscriptions simply to let users schedule reports to be emailed to end users.

In the existing SQL Server 2005 system, the service account being used to run Reporting Services is a Domain Administrator account (Yeah, I know!), and the “administrator user” who set up the schedules, reports, and subscriptions is in the Local Administrators group in the OS and in the SysAdmin role in SQL Server. Again, this was inherited.

So, when I set this up and installed 2008 R2 and SSRS on the new server, there was absolutely no way I was going to set the service to run under a domain admin, and we’re also enforcing no administrator accounts for developers on the production instances. Code deployments are going through TFS, and any DDL changes that they don’t script out in source control will go through the DBA team. I set up SSRS to run with a dedicated AD service account with minimal rights on the OS and in SQL. Everything worked via the Web UI. However, no emails were sent for the scheduled reports.

The reports are supposed to email the requesting user with the PDF embedded. The error in the ReportServer database table was “rsConfigError” and the error in the trace file was a generic “Configuration Error”. Checking permissions on the data sources for the reports to make sure the configured user had access, and trying to set up an execution account, yielded no improvement, so I switched the service to run under the Local System account. That yielded no better results; actually, there were some errors for AuthzInitializeContextFromSid and Access denied trying to look up the AD account. So then I decided to throw a curve ball and set the service account to a domain admin account. I know! I did it simply for testing. After this, the trace log showed new information… about not being able to authenticate the “administrative” user’s account… WTF, where was that coming from!!?? I then added that account to the Administrators group in Windows, and BLAM! Reports were emailed. W.T.F.?!!!?

I switched the service account back to my domain service account (no way I’m running this as a domain admin)… and back to the generic error. So then I started digging into the ReportServer database, and found the “Subscriptions” table with an OwnerID column holding a GUID. Cross-referencing with the Users table, and blam, there’s the “administrative” user’s account as the owner. Damn you MS! The user that configures the subscriptions needs elevated permissions to send emails with attachments. And… there’s no way to change the owner of the subscription via the GUI. So I updated the OwnerID column to NT Authority\System and removed the “administrative” user’s account from the Administrators group in Windows. Everything works as expected. Unfortunately, I have no idea what might not work properly going forward with this manual change. Also, I’ve got to do this workaround again if the “administrative” user ever creates new reports and schedules and subscribes them for end users.
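
For reference, the cross-referencing and the owner change can be sketched roughly like this. It assumes the default ReportServer database name, and the old owner’s name is a placeholder you would fill in yourself. Editing the ReportServer tables directly is not supported by Microsoft, so back up the database first.

```sql
USE ReportServer;
GO

-- Who owns which subscription?
SELECT s.SubscriptionID, c.Path AS report_path, u.UserName AS owner_name, s.LastStatus
FROM dbo.Subscriptions AS s
JOIN dbo.Users AS u ON u.UserID = s.OwnerID
JOIN dbo.[Catalog] AS c ON c.ItemID = s.Report_OID
ORDER BY u.UserName, c.Path;

-- Reassign the subscriptions of one owner to another (unsupported, use with care).
-- '<DOMAIN>\<administrative user>' is a placeholder for the real account name.
DECLARE @oldOwner UNIQUEIDENTIFIER;
DECLARE @newOwner UNIQUEIDENTIFIER;

SELECT @oldOwner = UserID FROM dbo.Users WHERE UserName = N'<DOMAIN>\<administrative user>';
SELECT @newOwner = UserID FROM dbo.Users WHERE UserName = N'NT AUTHORITY\SYSTEM';

-- Do nothing unless both accounts were found in the Users table
IF (@oldOwner IS NOT NULL AND @newOwner IS NOT NULL)
BEGIN
    UPDATE dbo.Subscriptions
    SET OwnerID = @newOwner
    WHERE OwnerID = @oldOwner;
END
```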

SQL Server backups to Alternate Data Streams or Colons

So in the midst of a very busy day I performed a manual backup of a database for a developer so he could make some major changes, test, and rollback if needed. I entered the backup file name with a colon on the timestamp…DUH! Of course, this worked, as NTFS supports the use of Colons in the filename. Went back later to restore the backup for the developer and the filename was truncated at the colon and was 0 bytes in size. WTF, the backup worked, there was no warning or error from SQL Server? Then I remembered my old NTFS “friend” – Alternate Data Streams. Basically ADS is a way to put data into different streams of the file. If you’ve ever wondered how Windows knows to warn you when you run an executable downloaded from the internet via IE, this is how. IE places a “zone.identifier” in the ADS to let Windows know this file might not be safe.

There are a couple of ways to get around this and recover the backup regardless of the truncated filename and the size of 0. The quickest and easiest way is to just restore the database or log from TSQL. So if you back up a database with

BACKUP DATABASE test TO DISK='test_11:30.bak'

That will work fine. In your default backup directory, you’ll see a file “test_11” and it will be 0 bytes in size. If you then try to use the SSMS GUI to restore this, it will fail.

If you instead use TSQL…

RESTORE DATABASE test FROM DISK='test_11:30.bak'

it will work.

The colon tells the OS to create a file named with everything before the colon, and to put all the data into an ADS with an identifier of everything after the colon… so in our example, test_11 has an ADS in it with an identifier of :30.bak. The backup data is all there in that stream.

So now you’re saying “Well, what if I don’t know the stream identifier?” If that’s the case, there are a number of tools that can tell you all the ADS identifiers in a file… I use STREAMS from that Sysinternals genius, Mark Russinovich. It will spit out the ADS in the file you give it. I’ve also used Notepad and a Windows port of the *nix “cat” command to pull the data out of that ADS and into a new file. That new file can then be restored from with the SSMS GUI. With Notepad, just open a command prompt and type NOTEPAD test_11:30.bak and give it some time, and it will have all that data in Notepad. Save that as test.bak and you can restore any way you want.
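
To avoid the problem in the first place, build the timestamp without colons. A minimal sketch, reusing the test database from the example above (the file name pattern is just my own choice):

```sql
-- Style 120 gives 'yyyy-mm-dd hh:mi:ss'; strip the dashes and colons
-- and turn the space into an underscore so the name is safe on NTFS.
DECLARE @stamp NVARCHAR(19) = CONVERT(NVARCHAR(19), GETDATE(), 120);
SET @stamp = REPLACE(REPLACE(REPLACE(@stamp, N'-', N''), N':', N''), N' ', N'_');

DECLARE @file NVARCHAR(260) = N'test_' + @stamp + N'.bak';

BACKUP DATABASE test TO DISK = @file;

-- Cheap sanity check that the backup file is readable
RESTORE VERIFYONLY FROM DISK = @file;
```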

Denali Always On adventures

I’ve built a new AD controller VM and two SQL server VM’s with Windows Server 2008 R2 EE. Joined both SQL servers to the domain. Installed the Failover Cluster feature on each. Installed Denali RC0. Enabled AlwaysOn High Availability in the SQL Server Configuration Manager.

Then I created a FayWorks database, and a new Availability Group. I set up the primary and then a replica / read only / preferred backup. Then I set up an Availability Group listener. I connected with SSMS to the AG via the listener and started a script that inserted 5000 rows into a temp table in RBAR fashion. Initiated a failover. The insert failed at some point, but I was able to restart the insert without reconnecting or changing anything after the failover completed. Slick as a pan covered in bacon grease.
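
To watch the roles flip during a failover, a query along these lines helps. This is a sketch using the catalog view and DMV names as documented for SQL Server 2012, which may differ slightly in RC0, and the AG name in the commented-out failover command is a made-up placeholder.

```sql
-- Which replica is primary right now, and is everything connected and healthy?
SELECT ag.name AS ag_name,
       ar.replica_server_name,
       ars.role_desc,
       ars.connected_state_desc,
       ars.synchronization_health_desc
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
    ON ar.group_id = ag.group_id
JOIN sys.dm_hadr_availability_replica_states AS ars
    ON ars.replica_id = ar.replica_id
ORDER BY ag.name, ar.replica_server_name;

-- Manual failover, run on the secondary that should become the new primary.
-- [FayWorksAG] is a hypothetical name; use your own AG name.
-- ALTER AVAILABILITY GROUP [FayWorksAG] FAILOVER;
```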

This combines the best of HA Failover Clustering with the best of Mirroring / Log Shipping / Etc. Being able to geographically set up an Availability Group, having the mirror be targeted for backups, reporting, etc, having up to 4 replicas, completing a failover of just an AG, creating a virtual instance name / IP. Oh yeah, Denali is a game changer.

Some links I have used

http://msdn.microsoft.com/en-us/library/hh213080(v=sql.110).aspx
http://msdn.microsoft.com/en-us/library/hh213417(v=sql.110).aspx
http://www.brentozar.com/archive/2011/07/how-set-up-sql-server-denali-availability-groups/

There seem to be a lot of questions and even some misinformation popping up with regards to SQL Server 2012 licensing. Microsoft is moving away from per/cpu licensing, which is based on the number of processors in a server. They are now licensing hardware on a per/core basis.

Right or wrong, agree or disagree, here are the details as I know them, based on a number of sources, including a meeting with my employer’s VAR and an internal MS licensing expert.

Per core licensing is based on “Core-Packs”. Each core-pack covers two cores, and there is a minimum purchase of two core-packs. This will be an expensive premium if one plans on building a single processor dual core machine, since you’re paying to license a minimum of four cores.

I’ve been told that the core-packs cost 50% less than current per/cpu licenses. That makes the magic number 4 cores: licensing 4 cores under 2012 costs the same as a single processor license does today.
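
To make the arithmetic concrete, here is a little sketch that works out core-packs and relative cost for a few core counts. It only encodes the numbers as I was told them (two cores per pack, a minimum of two packs, each pack at half the price of one of today’s processor licenses), so it is no better than those assumptions.

```sql
-- Cost is expressed in units of one current per-processor license.
SELECT c.physical_cores,
       p.core_packs,
       p.core_packs * 0.5 AS cost_in_processor_licenses
FROM (VALUES (2), (4), (8), (16)) AS c (physical_cores)
CROSS APPLY (
    -- Round up to whole packs of two cores, then apply the two-pack minimum
    SELECT CASE WHEN (c.physical_cores + 1) / 2 < 2 THEN 2
                ELSE (c.physical_cores + 1) / 2
           END AS core_packs
) AS p
ORDER BY c.physical_cores;
```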

I’ve read that with 2012 licensing, in the case of virtualization, if you license all the physical cores you have unlimited rights to virtual OS’s (vOS). Current day licensing with Enterprise Edition would only allow a total of 4 vOS’s per license.

Also, current customers with an EA will retain their current purchasing plan until the expiration of the EA, regardless of when that is. If it’s Jan 1 of 2013, then for all of 2012 you continue to purchase the licenses as you have. Once the EA is up, you will have to submit the number of cores in your currently licensed environment and MS will “trade” those for the equivalent number of Core-Packs. I’ve heard conflicting reports of a hard limit of 20 cores or 10 Core-Packs per server, and other reports that do not mention that limit.

SQL Server, Kerberos, SPN

I’ve been having a lot of fun recently here at work. First, some background information.

We have two domains, an internal domain and an external domain. We’ll call them JASONINC.com and JASONEXT.com. We have a one-way trust between them that says JASONEXT.com trusts anything from JASONINC.com. I have SQL Servers running in both domains, and Kerberos has worked flawlessly to allow JASONINC.com users to connect to the JASONEXT.com SQL Servers regardless of the Service Account used to run SQL Server. In JASONEXT.com I have some SQL Servers running under Local Service, some running under a JASONEXT service account, and some even running under a JASONINC service account.

Suddenly on Monday morning, for any SQL Server in JASONEXT, I was receiving a “Cannot generate SSPI context” error from Management Studio. It’s very transient. At my office location, I could not connect to any JASONEXT box via the short name or the FQDN using Kerberos. If I specified the IP address in the connection, it worked, because it would fall back to NTLM authentication.

Looking at the ERRORLOG on these SQL Servers, the ones not running under the JASONEXT service credentials registered their SPNs without issue. The ones running under Local Service or JASONEXT accounts failed to register their SPNs, with error 0xd, state 13.

What seems to be working is manually setting the SPNs via SETSPN -A mssqlsvc/hostname:port ServiceAccount on the JASONINC.com domain.
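For anyone who wants to script that, here is a small sketch that just builds the SETSPN command lines for both the short name and the FQDN. It only prints them, it doesn’t run anything. The host name, port, and account below are made-up placeholders, and the helper name is mine.

```python
def spn_commands(host, fqdn, port, account):
    """Build SETSPN -A command lines for a SQL Server listening on a TCP port.

    Returns one command for the short host name and one for the FQDN.
    """
    names = [host, fqdn]
    return [f"setspn -A MSSQLSvc/{name}:{port} {account}" for name in names]


# Hypothetical values for illustration only.
for cmd in spn_commands("sqlbox01", "sqlbox01.JASONEXT.com", 1433, r"DOMAIN\svc_sql"):
    print(cmd)
```

Newer versions of SETSPN also have a -S switch that checks for a duplicate SPN before adding it, which might be the safer choice when registering these by hand.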

However, there’s still something funny happening. I have one SQL Server running under Network Service that was restarted 2 months ago. The log indicates that the SPNs were registered properly, but I still get “Cannot generate SSPI context”.

Netmon traces are weird… I see the Kerberos call from my client to the primary DC in the location where the SQL Server is located, with the SPN in the request. I see a response from the DC saying contact krbtgt/root DC. That’s the end of the Kerberos traffic… I never see the client then call the root DC asking for the SQL Server SPN ticket.

On the ones where it is working now via a manual SETSPN, I see much more. I see the call to the primary DC where the SQL Server is located, and I see the response with krbtgt/root DC. I see the client call to the root DC, and the response from the root with krbtgt/root JASONEXT DC. I see the client call out to the root JASONEXT DC, and I see a response from that, but the response is 0x1f KRB_AP_ERR_BAD_INTEGRITY, and yet it connects. I did not check to see if it fell back to NTLM or if it was connected via Kerberos.

This morning, from one client, I am getting 0x7 KDC_ERR_S_PRINCIPAL_UNKNOWN, but connecting, and I verified it fell back to NTLM to connect. I’m lost with all the different things happening at different times from different hosts and clients. Thankfully we don’t use domain service accounts for our applications other than SharePoint, and thankfully for SharePoint, we don’t try to cross the domains. The problem comes when developers are trying to connect into the JASONEXT.com domain. The workaround is to have them connect via a SQL Login.

Anyone have any ideas or other troubleshooting tips? I’m going to sit in our AD admin’s cube this morning and have him prove out that our trust is correct and working between the domains. I want to get MS involved with the odd error I see in the SQL Server ERRORLOG for failing to register the SPN, but most of those errors are from months ago, and up until Monday everything was working.

FUN SQL Server Publication error

Transactional Replication from a single publisher to three remote subscribers. Right-click on the Publication at the Publisher and click Properties. Trying to select any page other than the General page, which shows by default, pops up an error: “The value must be greater than or equal to -1 and less than or equal to -1”. I think that means it HAS to be -1… the default value is 0.

LOVE IT