Monday, May 30, 2016
0 comments

SQL Server Hardware Configuration Best Practices

2:53 PM
Problem
You have been asked to deploy a brand new SQL Server instance.  Your management asks you to come up with the best balance of availability, performance and cost for SQL Server.  What do you recommend?
Solution
I'm going to try to describe my recommendations for hardware and server configuration best practices.  However, let me just say that best practices are somewhat of a sticky issue.  Almost everybody has an opinion just like almost everybody has a nose.  I am hoping that this article will generate a good deal of helpful discussion.  I really hope community members keep it to a helpful discussion however.

High Availability

The whole point of High Availability (HA) is to have your service available to respond to requests, to have your database available and not have consistency issues.  There are four main ways to provide HA; power, redundant hardware, clustering and multiple copies of your database.

Power

Your server at a minimum should have dual power supplies connected to a UPS.  For extra protection, it is good to have separate power feeds as well. 

Redundant Hardware

As listed above, you should have dual power supplies, and for disks, some form of RAID that provides redundancy.  Jeremy Kadlec describes the different types of RAID in his article Hardware 101 for SQL Server DBAs.  In general, RAID 1, 5, 6, 10 or 0+1 will provide redundancy.  You should always configure hot spares when using RAID with redundancy.  That way, you have to have at least two failures before you actually go down.
It's not exactly redundant hardware, but backups should always be on another device, like a file server.  That way if your SQL Server completely melts into the ground, your backups are safe elsewhere.

Clustering

Clustering typically refers to combining two or more servers together to allow them to either share workload, or takeover workload quickly in the event of a failure.  When it comes to SQL Server, there are two types of clustering we are interested in.  Windows Server Cluster is the combination of two or more Windows Servers into a cluster.  SQL Failover Cluster Instance (FCI) is an instance of SQL Server that can run on the nodes of a Windows Server Cluster.
So I should point out here that it is possible to create a Windows Server Cluster that only has one node.  Why you might ask would you want to do that?  There are a couple of reasons:
  • If you start out with a one node Windows Server cluster and install an FCI, you can add more nodes later and then install a SQL FCI on them as well.  If you start out with a stand-alone Windows Server and install a stand-alone SQL Instance you cannot convert them into a cluster later.
  • A SQL FCI has a virtual computer object associated with it, which means if you use a non-default instance name, you can use the virtual computer object name to refer to the instance instead of the format: \[InstanceName].
The way you configure the quorum for your cluster is critical.  There are way too many factors to consider for me to explain all of them here.  The way you configure your quorum determines how many layers of failure are necessary before the cluster fails.  Please see this article for more information: Understanding Quorum Configurations in a Failover Cluster.

Multiple Copies of Database

Another way to provide HA is to have multiple copies of your database available that can be switched to if needed.  Copies of databases are usually described as some sort of standby (hot, warm, cold).  Each has it's features as described below:
  • Hot Standby - Hot standby means a copy of the database is available immediately with no intervention from a DBA.  Hot standby usually uses something like Peer to Peer replication to keep more than one copy of the database available for immediate use.  Hot standby will require the client to be smart enough to reconnect to the other database if the original fails.  Theoretically only current transactions would be lost if you have to failover to a hot standby database.
  • Warm Standby - Warm standby means a copy of the database is available quickly with little intervention from a DBA.  Mirroring is a good example of a warm standby database.  It is easy to switch from the primary to the mirror, and can even be accomplished automatically if a witness server is used.  Client connections are often smart enough to reconnect if the primary fails over to the mirror.  With mirroring there are two modes of operation, synchronous, and asynchronous.  In synchronous mode, only current transactions would be lost.  In asynchronous mode, only transactions since the last update was applied would be lost.  Typically this would be only a few seconds at the most.
  • Cold Standby - Cold standby means there is a copy of the database that can be made available, but it may take some work, and it is not likely to be easy to switch the roles of the databases.  Log shipping is an example of making a cold standby copy of a database.  In cold standby mode with log shipping, several minutes of transactions might be lost in a failover.
Microsoft has changed mirroring starting with SQL Server 2012.  They are deprecating mirroring and they are replacing it with AlwaysOn Availability Groups.  There are several differences, but a couple really stand out for this discussion.  First, you are now able to have a secondary database that is read-only.  Before the mirror was always offline in a restoring mode, so now the secondary (mirror) is able to be used for reporting or other read-only uses.  Another interesting feature is that as the name implies databases can be grouped together into Availability Groups.  Now you can fail over a group of databases instead of only one at a time. 

Performance

When it comes to performance, the main factors include: hardware bottlenecks, database design and application design (server processing vs. client processing and locking).  In this tip, I am only going to speak to hardware bottlenecks.  Might look into database design and application design later.
The primary types of bottlenecks are disk IO, memory, CPU and network IO.

Disk IO

Disk IO speed is critical to performance.  There are three places that disk IO can impact performance; Data Files, Log Files and TempDB files.  To a lesser degree, the Windows swap file can sometimes impact performance.  For the data files, expect random IO, for log files, it will pretty much always be sequential IO, and TempDB is the same for its data and log files.  Different types of RAID work better with sequential IO than with random IO.  If I can afford it, I try to use at least four disks in a RAID 10 configuration for each data volume, and two disks in a RAID 1 configuration for a log volume.  I sometimes use RAID 5 when performance isn't as much an issue, because you get more capacity and still have one layer of redundancy.  See Hard Drive Configurations for SQL Server for more information on choosing a RAID type.
It is best to separate the following loads into different volumes: backups, logs, data and TempDB.  While I am on the subject DO NOT partition a disk or RAID volume into multiple drives.  This will force disk head thrashing which is BAD.  Also pay attention to the allocation unit size when formatting the volumes.  In general I have found that 64k provides the best throughput.  Each configuration can vary, so it is a good idea to test your IO performance with a tool like SQLIO before you put it into production to make sure you are getting good performance.  See the tip Benchmarking SQL Server IO with SQLIO for more information on using SQLIO.

Data File IO

SQL Server is really good at multi-threading IO if there is more than one file available, so for data files where performance is important, it is often a good idea to have more than one.  There are guidelines all over the Internet regarding the optimum number of data files compared to the number of CPU cores or the number of sockets, but I don't really have a good recommendation.  Keeping the IO balanced is important, so if you add a file to an existing database, make sure you spread the load out over the files.  As a general rule of thumb, I go with four data files for databases where I think performance will be an issue.

Memory

SQL Server likes to cache data pages in RAM, so the more RAM you can throw at a SQL Server instance, usually the better.  This means that 64 bit is a must.  It is also important to note that SQL Server uses memory besides what you assign as the min and max memory values.  For one thing, it uses memory for each connection to the instance.  The OS also needs memory, so for my servers, I usually set aside 8GB for OS and SQL overhead.  I used to see the recommendation to set aside 4GB for OS and SQL overhead, but lately I have run into problems with that amount.
It might be good to talk about the min and max settings.  The min setting is what SQL Server uses to decide if it has enough memory to startup.  If there isn't at least enough free memory to match the min memory setting when SQL tries to startup it will abort and say there isn't enough memory.  The max memory setting is the maximum amount of memory that SQL Server will use if it is available.

CPU

We have already established that we need both a 64 bit processor and OS to support as much memory as possible.  I have rarely had a problem with CPU being a bottleneck with my SQL Servers.
Things to watch out for that might have a CPU bottleneck are things that can utilize parallelism.  Index rebuilds, seeks, and joins are some definite culprits.

Network IO

In general, there are at least three types of traffic at your SQL Server; data coming in, data going out and management traffic.  If you follow the advice from above where your backups are on another device like a file server, you will also have backup traffic going out.  You also might have block storage traffic if you are using ISCSI.
It is a good idea to segment your network traffic by types if possible, so it does not interfere with other types.  For example, if you can setup a dedicated network adapter for your backups, and one for your ISCSI, they will not interfere with your data traffic.
It is also a good idea to encourage your application team to limit as much as possible the data they pass between the SQL Server and the Application server.  Using WHERE clauses, not using SELECT * and so forth will limit the amount of data returned with a query.  Using stored procedures is also a good way to keep the data on the SQL Server.
Next Steps
  • I recommend spending time with your favorite search engine researching the following:
    • RAID Performance
    • ISCSI
    • SQLIO
    • Allocation Unit Size and performance with SQL data pages
    • Eliminating performance bottlenecks with SQL server
  • I am recommending using a minimum of four volumes for a SQL instance.  In addition, you will have your OS volume.  If running multiple instances, it might be a good idea to look into using mount points to limit the number of drives you are looking at.

0 comments:

 
Toggle Footer