Ok, this is an interesting issue…
We have, to quickly summarize, an architecture that was developed before I joined… the team moved from a single SQL Server database with 12 DB’s on it, to two DB Servers, and separated the db’s loosely into data processing and data presentation. Unfortunately there is one process that crossed those boundaries and has to do a join query to a database on one server and insert the results after the business logic is processed onto the other server. Both servers are SQL Server 2005 SP3.
Right now, we’re working from the presentation server, where the data is inserted, and have a link to the processing server where a large table is joined to a small table, and also joined to a couple of tables on the presentation server.
We have a SP that holds the SELECT and join. Often times this runs fine, and returns data in an acceptable time considering the large table contains 80 million rows. However, after a time, it seems that the performance drops massively, and even times out the default settings for the linked server connection (10 minutes). When that happens, we actually have to drop and recreate the stored procedure. Yes, setting WITH RECOMPILE on a call to the SP does not help. Setting WITH RECOMPILE inside the SP does not help. The odd thing is that when this happens and the SP query return nosedives, I can pull the query out of the SP, run it with the same parameters, and BAM! 7 seconds later I get the results as expected. We’ve even tried replacing the parameters with local parameters, worried about parameter sniffing – which shouldn’t really matter if we force a RECOMPILE.
So, we’re working out some other options, such as switching the flow, and the linked server direction, since the tables on the Presentation DB Server are very small. The insert it larger, but the code can just connect directly to the server once the in memory processing is done.
I know there’s also issues if you use a login for the linked server that does not have permissions to get the statistics for the tables being queried, but two things tell me that’s not the issue… 1) I’ve since changed the login to be in the SysAdmin role, and 2) When I run just the query instead of executing the SP, the results are lightning fast… so I know the statistics are being used.
WOW, has anyone seen this before?