SQL Server, Kerberos, SPN

So I’ve been having a lot of fun recently here at work. First, some background information.

We have two domains, an internal domain and an external domain. We’ll call them JASONINC.com and JASONEXT.com. We have a one-way trust between them that says JASONEXT.com trusts anything from JASONINC.com. I have SQL Servers running in both domains, and Kerberos has worked flawlessly to allow JASONINC.com users to connect to the JASONEXT.com SQL Servers regardless of the service account used to run SQL Server. In JASONEXT.com I have some SQL Servers running under Local Service, some running under a JASONEXT service account, and some even running under a JASONINC service account.

Suddenly on Monday morning, for any SQL Server in JASONEXT, I started receiving a “Cannot generate SSPI context” error from Management Studio. It’s very transient. At my office location, I could not connect to any JASONEXT box over Kerberos via either the short name or the FQDN. If I specified the IP address in the connection, it worked, because it would fall back to NTLM authentication.

Looking at the ERRORLOG on these SQL Servers, the ones not running under the JASONEXT service credentials registered their SPNs without issue. The ones running under Local Service or JASONEXT accounts would not register their SPNs, failing with error 0xd, state 13.

What seems to be working is manually setting the SPNs via SETSPN -A MSSQLSvc/hostname:port ServiceAccount on the JASONCOMINC domain.
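
For reference, here is a minimal Python sketch that builds SETSPN command lines of the shape described above. The host name, port, and account in the example call are made up, and registering both the short name and the FQDN is an assumption (a common habit, not something confirmed for this environment).

```python
def build_setspn_commands(host, domain, port, account):
    """Return setspn command lines for one SQL Server TCP endpoint.

    Builds MSSQLSvc/<host>:<port> for both the short name and the FQDN.
    All argument values are supplied by the caller; nothing is looked up.
    """
    fqdn = f"{host}.{domain}"
    spns = [f"MSSQLSvc/{host}:{port}", f"MSSQLSvc/{fqdn}:{port}"]
    return [f"setspn -A {spn} {account}" for spn in spns]


# Hypothetical server and service account, for illustration only.
for line in build_setspn_commands("sqlext01", "JASONEXT.com", 1433, "JASONEXT\\svc_sql"):
    print(line)
```

The function only formats strings; the commands still have to be run by someone with rights to write the servicePrincipalName attribute on that account.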

However, there’s still something funny happening. I have one SQL Server running under Network Service. It was restarted 2 months ago, and the log indicates that the SPNs were registered properly, but I still get “Cannot generate SSPI context”.

Netmon traces are weird. I see the Kerberos call from my client to the primary DC in the location where the SQL Server is located, with the SPN in the request. I see a response from the DC saying contact krbtgt/root DC. That’s the end of the Kerberos traffic. I never see the client then call the root DC asking for the SQL Server SPN ticket.

On the ones where it is working now via a manual SETSPN, I see much more. I see the call to the primary DC where the SQL Server is located, and I see the response with krbtgt/root DC. I see the client call to the root DC, and I see the response from the root with krbtgt/root JASONEXT DC. I see the client call out to the root JASONEXT DC, and I see a response from that, but the response is 0x1f KRB_AP_ERR_BAD_INTEGRITY, and yet it connects. I did not check to see if it fell back to NTLM or if it was connected via Kerberos.
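
One way to check which protocol a connection ended up on is the auth_scheme column of sys.dm_exec_connections. A minimal Python sketch, assuming a DB-API style cursor (for example one from pyodbc) on an already open connection and a login allowed to read that view; the function names are made up for illustration.

```python
# Query the auth scheme for the current session only.
AUTH_SCHEME_QUERY = (
    "SELECT auth_scheme FROM sys.dm_exec_connections "
    "WHERE session_id = @@SPID"
)


def current_auth_scheme(cursor):
    """Return the auth scheme for this session, e.g. 'KERBEROS', 'NTLM' or 'SQL'.

    `cursor` is any DB-API cursor on an open SQL Server connection.
    Returns None if the query yields no row.
    """
    cursor.execute(AUTH_SCHEME_QUERY)
    row = cursor.fetchone()
    return row[0].strip().upper() if row else None


def used_kerberos(cursor):
    """True only when the session authenticated with Kerberos."""
    return current_auth_scheme(cursor) == "KERBEROS"
```

Running the same query in a Management Studio window gives the answer for that window’s own connection.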

This morning, from one client, I am getting 0x7 KDC_ERR_S_PRINCIPAL_UNKNOWN but still connecting, and I verified it fell back to NTLM to connect. I’m lost with all the different things happening at different times from different hosts and clients. Thankfully we don’t use domain service accounts for our applications other than SharePoint, and thankfully for SharePoint, we don’t try to cross the domains. The problem comes when developers are trying to connect into the JASONEXT.com domain. The workaround is to have them connect via a SQL Login.
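
To keep the codes straight while reading traces, here is a small Python lookup for the values that have come up so far, with names and descriptions taken from RFC 4120. Treating the 0xd from the ERRORLOG as a Kerberos error code is an assumption; the log line itself does not say so.

```python
# Kerberos error codes seen in this investigation (names per RFC 4120).
KRB_ERRORS = {
    0x07: ("KDC_ERR_S_PRINCIPAL_UNKNOWN", "Server not found in Kerberos database"),
    0x0D: ("KDC_ERR_BADOPTION", "KDC cannot accommodate requested option"),
    0x1F: ("KRB_AP_ERR_BAD_INTEGRITY", "Integrity check on decrypted field failed"),
}


def describe_krb_error(code):
    """Return 'NAME: description' for a known code, or a marker for unknown ones."""
    if code in KRB_ERRORS:
        name, text = KRB_ERRORS[code]
        return f"{name}: {text}"
    return f"unknown Kerberos error 0x{code:x}"


for code in (0x07, 0x0D, 0x1F):
    print(f"0x{code:x} {describe_krb_error(code)}")
```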

Anyone have any ideas or other troubleshooting tips? I’m going to sit in our AD admin’s cube this morning and have him prove out that our trust is correct and working between the domains. I want to get MS involved with the odd error I see in the SQL Server ERRORLOG for failing to register the SPN, but most of those errors are from months ago, and up until Monday everything was working.

4 thoughts on “SQL Server, Kerberos, SPN”

  1. If you are getting “Cannot Generate SSPI Context” from Management Studio, that means it found an SPN and chose to go down the Kerb path, but encountered a failure after that point. Once we choose to go Kerb, a fallback to NTLM is not possible. That choice happens during the Negotiate phase, which pretty much just checks to see if an SPN is available. That doesn’t mean it will succeed, though. I’ve seen cases where the SPN was misplaced and led to this.

    If you receive a KDC_ERR_S_PRINCIPAL_UNKNOWN, that means it couldn’t find an SPN – or there was a duplicate. This would cause it to go NTLM.

    I wouldn’t have thought that this config would work, but it may, depending on which end of the trust you are invoking. I’d have to test it more though. I’ve always been told that a two-way transitive trust is required for Kerb to work.

    Have you actually seen a connection that shows Kerb before?

    The 0xd error is KDC_ERR_BADOPTION – KDC cannot accommodate requested option. This can happen for a lot of different reasons.

    0x1f KRB_AP_ERR_BAD_INTEGRITY – Integrity check on decrypted field failed

    Saw some references to password issues with that error.

    It may be worth calling into Support to see what the Directory Services Team can do for you to try and explain the behavior. Again though, I’d have to test this more to see if this should even work from a Kerb perspective.

    • Some interesting updates. When I stepped through the Netmon traces, I found the Kerberos request chain flowing as expected up to a certain point and then returning the 0x1f KRB_AP_ERR_BAD_INTEGRITY error.

      So, to follow the chain from client.jason.JASONINC.com: the first request went to the local DC at jason.JASONINC.com, and the response was SNAME JASONINC.com (the root domain). The next call from the client went to the JASONINC.com DC, and the response from that was SNAME JASONEXT.com. Then the client called JASONEXT.com, and that reply was the KRB_ERROR – KRB_AP_ERR_BAD_INTEGRITY. However, the SNAME in that packet had the correct host name, and even the correct port that the named instance was running on (the dynamic port).

      So I wanted to install Netmon on the DC that was part of JASONEXT.com, the root of the other domain. When I went to RDP to it, I received “There are currently no logon servers to process the request” or something similar. I forced a reboot, Windows updates were applied, and after a second reboot it was back up and running as expected.

      The SSPI errors from the client’s SSMS disappeared. I am now receiving “KDC_ERR_S_PRINCIPAL_UNKNOWN” for servers in the JASONEXT domain running with a jason.JASONINC.com domain account. That should be expected, since there shouldn’t be any SPNs, and it does fall back to an NTLM connection.

      On servers running with a jasonext.JASONEXT.com domain account, I connect via Kerberos with my jason.JASONINC.com Windows account. So the JASONEXT.com service accounts are able to create an SPN, and the delegation is working properly.
