Sql Virtual Machine Race Condition
I’ve been having problems deploying a Sql Virtual Machine in Azure lately and decided to perform some tests to get to the bottom of the issue. I’d like to share some strange behaviour.
If I deploy a new SQL VM using Bicep, it deploys fine. If, however, I redeploy the same VM, the deployment fails with this error:
- code: Ext_AutomatedBackupError
- Error: Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.;System.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out.
In the SQL Server ERRORLOG on the VM:
Delving into the previous ERRORLOG.1, I can see this:
This suggests that SQL Server is unable to create tempdb.
- Location: UK South
- VM Size: Standard_D2ds_v4
- SqlImageType: sql2019-ws2022
- SqlImageSku: sqldev-gen2
If you want to reproduce the issues I am having, I have created a repo here: https://github.com/markallisongit/sqliaas-demo. Clone the repo, change into the repo directory in PowerShell 7, connect to the Azure subscription you want to deploy to, and then run the deployment.
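The exact deploy command lives in the repo, but assuming a standard Az/Bicep deployment, the steps look roughly like this (the subscription name, resource group name, and `main.bicep` template file are my placeholders, not necessarily the repo's actual values):

```powershell
# Clone the repo and move into it (PowerShell 7)
git clone https://github.com/markallisongit/sqliaas-demo.git
Set-Location .\sqliaas-demo

# Sign in and select the target subscription
Connect-AzAccount
Set-AzContext -Subscription 'my-subscription-name'   # placeholder

# Deploy the Bicep template; the resource group and template file name are assumptions
New-AzResourceGroup -Name 'rg-sqliaas-demo' -Location 'uksouth' -Force
New-AzResourceGroupDeployment -ResourceGroupName 'rg-sqliaas-demo' -TemplateFile .\main.bicep
```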
Initial deploy works fine
Takes 8 minutes to run.
Redeploy without rebooting
Takes 2 minutes to run.
Redeploy after VM deallocated
Strange workaround 1
Restart the SQL Server service and redeploy
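The restart itself is a one-liner from an elevated session on the VM (assuming a default instance, whose service name is MSSQLSERVER):

```powershell
# Restart the default SQL Server instance; -Force also restarts
# dependent services such as SQL Server Agent
Restart-Service -Name 'MSSQLSERVER' -Force
```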
Strange workaround 2
Change the VM size to a different SKU and redeploy (a resize forces a restart, so is this the same as rebooting?)
Suggestion 1: Delayed Start
I raised a Support Request with Microsoft and the suggestion that came back was to set the SQL Server service to Delayed Start. This is because there is a program scheduled using the task scheduler to create the
D:\SQLTemp\Log directories on server boot. Looking in Task Scheduler, there is indeed a task called SqlStartUp which creates these tempdb directories.
This is a race condition, where if the SQL Server service wins the race with the SqlStartUp scheduled task, SQL Server will fail to start, because the tempdb directories do not exist.
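You can inspect both halves of the race from an elevated PowerShell session on the VM (the task and service names are as observed above; I'm assuming the task sits at the root task path):

```powershell
# The scheduled task that creates the tempdb directories at boot
Get-ScheduledTask -TaskName 'SqlStartUp' | Format-List TaskName, TaskPath, State

# The SQL Server service that races against it
Get-Service -Name 'MSSQLSERVER' | Format-List Name, Status, StartType
```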
I tried the Delayed Start suggestion, but when I run the deployment I get the same error. I suspect it worked for Microsoft because the race condition just happened to fall on the correct side.
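For reference, delayed start can be set from PowerShell 7 (Set-Service gained the AutomaticDelayedStart startup type in PowerShell 6), though in my case it did not help:

```powershell
# Set the default SQL Server instance to delayed automatic start
Set-Service -Name 'MSSQLSERVER' -StartupType AutomaticDelayedStart
```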
Suggestion 2: Manual Start
The next suggestion from Microsoft, which seemed to work for them, was to use Manual start for the SQL Server and SQL Server Agent services. However, when I tried it, I got the same SqlTimeOut error. Again, a classic race condition.
Suggestion 3: Place TempDb on Premium attached storage
Adding another premium disk to the machine and configuring the tempdb files to reside on it “solves” the problem. However, according to Microsoft’s best practices guide for SQL Server on Azure VMs, tempdb should be placed on the ephemeral local SSD drive (D:\ by default).
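Moving tempdb onto the new disk means repointing its files at the premium volume and restarting SQL Server. A sketch, assuming the disk is mounted as F: and the instance uses the default logical file names (tempdev/templog):

```powershell
# Repoint the tempdb files at the premium disk (takes effect after a restart).
# The logical names 'tempdev'/'templog' and the F: mount point are assumptions.
Invoke-Sqlcmd -ServerInstance 'localhost' -Query @"
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'F:\TempDb\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'F:\TempDb\templog.ldf');
"@

# Make sure the target directory exists, then restart to pick up the new paths
New-Item -ItemType Directory -Path 'F:\TempDb' -Force | Out-Null
Restart-Service -Name 'MSSQLSERVER' -Force
```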
Alternative home-grown solution
One solution I am considering, though I would rather not, because I would be coding around Microsoft, is to set the SQL Server services to Manual start and run a PowerShell script at startup which creates the tempdb directories and then starts the SQL Server services. This should eliminate the race condition.
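A minimal sketch of such a startup script (the D:\SQLTemp\Data directory is my assumption — the text above only names D:\SQLTemp\Log — and the service names assume a default instance):

```powershell
# Create the tempdb directories the SqlStartUp scheduled task would normally create
foreach ($dir in 'D:\SQLTemp\Data', 'D:\SQLTemp\Log') {
    New-Item -ItemType Directory -Path $dir -Force | Out-Null
}

# Only then start SQL Server and the Agent, so the race condition cannot occur
Start-Service -Name 'MSSQLSERVER'
Start-Service -Name 'SQLSERVERAGENT'
```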
For now, I will configure the Sql Virtual Machines to use premium attached storage. In my environment it is not too important to have ultra-fast tempdb performance, but I know for some people that is a requirement. I have created a branch called workaround in the repo to show you how I’ve done that. See https://github.com/markallisongit/sqliaas-demo/tree/workaround
Microsoft have escalated the issue to the SQL Product Team; I’m still waiting to hear back and will post an update when I do.