SQL Agent – CMDExec – ODBC – Oh My!

So, a vendor app sets up some SQL Agent jobs that call out to the OS to run an executable. This executable then connects back to SQL Server and to some other systems to compute some values and store them. I didn’t write this app, and I didn’t design the system. If I had, I wouldn’t be letting them use SQL Server Agent as a Task Manager.

The current system has SQL Server Agent running under a domain-level service account – let’s say CORP\SQLAcct. The CmdExec calls go out as the user running SQL Agent to fire an executable. This executable uses a System DSN to connect back into SQL Server. The DSN is configured to verify the authenticity of the login ID with Windows NT authentication using the network login ID.

Cue the investigation… The SQL Server log files are logging
2011-04-19 16:15:03.78 Logon Error: 18456, Severity: 14, State: 5.
2011-04-19 16:15:03.78 Logon Login failed for user 'SQLAcct'. Reason: Could not find a login matching the name provided. [CLIENT: local_ip]

over and over. Initially that doesn’t tell me anything, other than that the only executable running under the SQLAcct user is SQL Agent. So, I stop the Agent. Blam, the errors disappear. Start the Agent, and they start again. Now I involve the vendor, and I ask them to check their configurations and DSN settings. They report back that everything seems fine. So I decide to change the account that the Agent is running under. I change it to CORP\ServAcct and the errors follow, now logged under ‘ServAcct’. Interestingly, both domain accounts have access to the SQL instance.
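To watch the failures come and go while stopping and starting the Agent, the error log can be filtered from a query window instead of scrolling through the file. This is only a sketch: xp_readerrorlog is an undocumented extended procedure, so the parameter meanings in the comments are assumptions based on common usage, and the search strings are taken from the log lines above.

```sql
-- Read the current SQL Server error log (0 = current log, 1 = SQL Server log)
-- and keep only entries containing both search strings.
-- xp_readerrorlog is undocumented; treat the parameter order as an assumption.
EXEC sys.xp_readerrorlog 0, 1, N'Login failed', N'SQLAcct';
```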

So my next step is to create a SQL login named SQLAcct with the domain service account’s password. I get a new error…
2011-04-19 16:16:52.20 Logon Error: 18456, Severity: 14, State: 8.
2011-04-19 16:16:52.20 Logon Login failed for user 'SQLAcct'. Reason: Password did not match that for the login provided. [CLIENT: local_ip]

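Closer though!!! Somehow the SQL Agent CmdExec call is not passing out the domain info, or the DSN call is not passing the domain info back to SQL Server as the executable runs. So I change the SQLAcct SQL login to a blank password, and BLAM!!! No more connection errors. That suggests the connection is arriving as a SQL Server authentication attempt with the bare user name and an empty password, rather than as a trusted Windows connection.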

Process Explorer shows me that when the executable is running, it’s running as the full domain\user, so I think the identity is being passed out correctly. My guess is that the DSN configuration, or the way the executable uses it at runtime, is not passing the identity back to SQL Server. I’ll have to get back with the vendor tomorrow and have them change their DSNs, like I asked 2 days ago!
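A quick way to check what SQL Server itself sees is to look at the server principals and at how live sessions authenticated. This is a rough sketch, not part of the original troubleshooting; the LIKE patterns simply reuse the SQLAcct name from above.

```sql
-- Which principals exist for this account name?
-- A Windows login shows up as CORP\SQLAcct with type_desc WINDOWS_LOGIN;
-- a SQL login shows up as plain SQLAcct with type_desc SQL_LOGIN.
SELECT name, type_desc, is_disabled
FROM sys.server_principals
WHERE name LIKE N'%SQLAcct';

-- How did the current sessions authenticate?
-- auth_scheme is NTLM or KERBEROS for Windows authentication, SQL for SQL logins.
SELECT s.session_id, s.login_name, s.program_name, c.auth_scheme
FROM sys.dm_exec_sessions AS s
JOIN sys.dm_exec_connections AS c
    ON c.session_id = s.session_id
WHERE s.login_name LIKE N'%SQLAcct';
```

If the vendor executable’s sessions show an auth_scheme of SQL, that would be consistent with the DSN (or the connection string the executable builds from it) not requesting a trusted connection.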

Recover a database from an older full backup and a current .LDF file

Pinal Dave at SQLAuthority.com posted an interesting question that tested my knowledge. I was sure there was a way to answer his question, but I haven’t had much experience with databases in the FULL recovery model in the past.

His question

Let us assume that you have corrupted (beyond repairable) or missing MDF. Along with that you have full backup of the database at TimeA. Additionally there has been no backup since TimeA. You have now recovered log file at TimeB. How to get the Database back online with the same state as TimeB?

My response is as follows.

On a test system, I created a database FullBackupTest with Full Recovery Model.
I created a table TableA.
I inserted a single record into TableA.
I took a full backup of FullBackupTest.
I stopped SQL Server Service and copied both the .mdf and .ldf files to a different folder.
I started SQL Server Service.
I inserted two more records into TableA.
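For anyone who wants to repeat the test, the setup steps above might look roughly like the script below. The column definitions and row values are made up for illustration (the original table layout isn’t given), and the backup file name is the one used in the restore later in this post.

```sql
-- Create the test database in the FULL recovery model.
CREATE DATABASE FullBackupTest;
GO
ALTER DATABASE FullBackupTest SET RECOVERY FULL;
GO
USE FullBackupTest;
GO
-- Hypothetical table layout; any table will do for this test.
CREATE TABLE dbo.TableA
(
    Id   int IDENTITY(1,1) PRIMARY KEY,
    Note varchar(50) NOT NULL
);
INSERT INTO dbo.TableA (Note) VALUES ('row 1, before the full backup');
GO
-- Full backup at TimeA.
BACKUP DATABASE FullBackupTest TO DISK = 'FullBackupTestTimeA.bak';
GO
-- Stopping the service, copying the files, and restarting happen here,
-- outside of T-SQL. Then, in a new session:
USE FullBackupTest;
GO
INSERT INTO dbo.TableA (Note) VALUES ('row 2, after the full backup');
INSERT INTO dbo.TableA (Note) VALUES ('row 3, after the full backup');
GO
```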

I then stopped the SQL Server Service.
I then deleted the .mdf file in the Data directory.
I started SQL Server.
Database FullBackupTest failed to come online.
Just for tests, I stopped SQL Server Service again and copied the .mdf file back from the copy I made manually (I now have the .mdf file from an earlier copy out of sync with the .ldf file).
I started SQL Server and the database failed to come online. The error in the log file was

2011-04-14 21:13:15.37 spid16s Error: 5173, Severity: 16, State: 1.
2011-04-14 21:13:15.37 spid16s One or more files do not match the primary file of the database. If you are attempting to attach a database, retry the operation with the correct files. If this is an existing database, the file may be corrupted and should be restored from a backup.

Looks good. As I expected, you can’t just do that. For further testing, I did the following:
I switched FullBackupTest to Emergency Mode
I switched FullBackupTest to Single User Mode
I ran dbcc checkdb (FullBackupTest).
No errors were reported.
The database would not switch from Emergency to Online mode, failing with the same error as above.
I then ran dbcc checkdb (FullBackupTest, repair_allow_data_loss)
Database would then switch to Online mode, however we were back to a single record in TableA, so it was just the same as a restore from the full backup at TimeA.
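The emergency-mode experiment above corresponds roughly to the statements below. This is a sketch of the sequence rather than the exact commands I ran, and it is shown only to illustrate the dead end: the repair option throws away whatever it cannot reconcile.

```sql
-- Put the damaged database into emergency mode and restrict access.
ALTER DATABASE FullBackupTest SET EMERGENCY;
ALTER DATABASE FullBackupTest SET SINGLE_USER;

-- Consistency check only; in my test this reported no errors.
DBCC CHECKDB (FullBackupTest);

-- Attempt to bring it online; in my test this failed with error 5173.
ALTER DATABASE FullBackupTest SET ONLINE;

-- Last-resort repair: this may discard data (here, the two newer rows).
DBCC CHECKDB (FullBackupTest, REPAIR_ALLOW_DATA_LOSS);

-- Return the database to normal multi-user access.
ALTER DATABASE FullBackupTest SET MULTI_USER;
```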

Then I stopped SQL Server Service one more time, and restored both the mdf and ldf from the manual copy.
Started SQL Server, verified FullBackupTest came online, and there were three rows in TableA.
Stopped SQL Server Service
Deleted the .mdf file
Started SQL Server Service.
FullBackupTest failed to come online again.
In Management Studio I executed
BACKUP LOG FullBackupTest TO DISK='FullBackupTest.trn' WITH NO_TRUNCATE
Then
RESTORE DATABASE FullBackupTest FROM DISK='FullBackupTestTimeA.bak' WITH NORECOVERY

This put the database in the RESTORING state and allowed me to restore log file backups.
I then executed
RESTORE LOG FullBackupTest FROM DISK='FullBackupTest.trn' WITH RECOVERY

Once complete, I find three rows in TableA as I expected to.
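A couple of simple checks can confirm where the database is at each stage. The sketch below assumes TableA lives in the dbo schema; the state values mentioned in the comments are what I would typically expect, not output captured from this test.

```sql
-- Database state: typically RECOVERY_PENDING while the .mdf is missing,
-- RESTORING after RESTORE ... WITH NORECOVERY, and ONLINE after the
-- final RESTORE LOG ... WITH RECOVERY.
SELECT name, state_desc, recovery_model_desc
FROM sys.databases
WHERE name = N'FullBackupTest';

-- Once the database is online, count the rows that survived.
SELECT COUNT(*) AS RowsInTableA
FROM FullBackupTest.dbo.TableA;
```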

The keys are
1) Back up the tail of the log on the lost database (BACKUP LOG … WITH NO_TRUNCATE) before doing anything else.
2) Restore the full backup using WITH NORECOVERY.
3) Restore the log backup from Step 1 using WITH RECOVERY to bring the database back online.

Great question Pinal!