by Scott Ellis on October 16, 2013
Brent Ozar is one of approximately 100 Microsoft Certified Masters of SQL Server, and we’ve been fortunate enough to have him helping out our users as a go-to SQL expert since 2011. Last week marked his second time attending Relativity Fest, where he presented two sessions on performance tuning for Relativity’s SQL Servers. Today, he’s sharing some of that insight as our first guest blogger.
Every month, Relativity sends me to a different client to help system administrators with everything from making their Relativity SQL Servers faster and more reliable to designing the right indexes for their document table. I, in turn, take what I hear from customers and suggest how that feedback can be folded back into the product. As you might guess, this has given me a pretty good idea of how to tune the performance of your Relativity SQL Server. Let’s dive in.
How Relativity Uses SQL Server
SQL Server is the backbone of Relativity. As documents come in from the field, the document metadata—source, file name, document type, etc.—is loaded into SQL Server. As the processing software figures out the file’s contents, the extracted text is also added.
Databases grow large quickly even though the documents themselves aren’t loaded into the database. Each workspace is its own SQL Server database, and these databases can easily reach into the terabytes. Database management is made more difficult because, when a case is first created, we have no idea how big it’ll become—some cases stay at a few gigs, while others balloon into the terabytes with no warning.
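Because case growth is unpredictable, it pays to keep an eye on workspace database sizes. Here’s a minimal sketch of how you might do that, assuming the common convention that Relativity workspace databases are named with an EDDS prefix (adjust the filter for your environment):

```sql
-- List workspace databases by size so you can spot cases that are
-- growing faster than expected. Sizes come from sys.master_files,
-- which reports file size in 8 KB pages.
SELECT d.name AS database_name,
       SUM(mf.size) * 8 / 1024 AS size_mb
FROM sys.master_files AS mf
JOIN sys.databases AS d
  ON d.database_id = mf.database_id
WHERE d.name LIKE 'EDDS%'   -- assumption: workspace databases use the EDDS prefix
GROUP BY d.name
ORDER BY size_mb DESC;
```

Run it on a schedule and chart the results, and the cases that are about to become a problem will stand out before they get there.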
How Relativity Challenges SQL Server
Lawyers want near-zero downtime. Any given SQL Server can be hosting dozens—or hundreds—of live cases, all of which have different law firms involved, often with global staff analyzing documents around the clock. Typically, this means multi-node failover clusters with complex disaster recovery strategies.
Users can add databases on any server at any time. New cases are added without warning. Your disaster recovery mechanism had better be able to keep up.
Users can add their own columns at any time. When you add columns in SQL Server, you need a table lock, and this can create all kinds of blocking havoc.
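To make that blocking concrete, here’s a sketch of what happens under the covers. The table and column names are hypothetical; the point is the schema modification lock:

```sql
-- Adding a column requires a schema modification (Sch-M) lock on the
-- table. While it's held, even simple SELECTs against the table wait.
ALTER TABLE dbo.Document
    ADD ReviewerNotes NVARCHAR(MAX) NULL;
-- A nullable column with no default is usually a fast, metadata-only
-- change, so the lock is brief. A NOT NULL column with a default can
-- rewrite every row, holding the lock for the duration on a big table.
```

On a multi-terabyte document table with reviewers querying around the clock, the difference between those two forms is the difference between a blip and a help desk fire.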
Anybody can search for anything, anytime. It’s extremely difficult to design an indexing strategy when every incoming query is different. This element of Relativity is more like a data warehouse, where end users are constantly asking new questions of the data and trying to spot trends.
We’re talking about lawyers building SQL. One awesome thing about Relativity is that it empowers anybody to build really complex searches without the help of a developer or a DBA. The drawback is that the queries aren’t always optimized. But, I will say this: the queries written by lawyers look a lot better than a lawsuit written by DBAs.
And all the while, people are still loading documents. New batches of data come in from the field at unpredictable times. It’s like having a data warehouse that loads data in the middle of the day.
Security and auditing are vital. We have to make sure no one sees a document they’re not allowed to see, and that every access to privileged information is tracked.
Every case is different. Even though the basic application code is the same, the added fields and the search patterns vary wildly. You can’t just make one set of indexes and call it a day. These things add up to make performance tuning your SQL Server significantly more complex, but still totally doable.
The Easy, Expensive Way to Tune Relativity’s SQL Server
Follow my instructions from my TechEd 2012 session, "Building the Fastest SQL Servers":
1. Buy as much memory as your server can hold.
2. Build your storage infrastructure to deliver SQL Server’s maximum consumption rate (MCR) for your number of CPU cores.
3. Before going live, load test to make sure your SQL Server actually delivers that throughput.
4. After going live, monitor your storage throughput and latency, looking for surprise problems.
To learn more about how this works, dig through Microsoft’s Fast Track data warehouse reference architecture. Those documents teach you how to calculate SQL Server’s MCR and how to design storage networks to deliver data that quickly, while also showing you how to configure SQL Server correctly for massive throughput.
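Step four—monitoring throughput and latency after go-live—doesn’t require a separate tool to get started. SQL Server tracks cumulative I/O stalls per database file, and a quick DMV query surfaces them:

```sql
-- Per-file I/O latency from SQL Server's own counters (cumulative
-- since the last restart). Rising average read stalls on data files
-- are an early warning that storage can't keep up.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       CASE WHEN vfs.num_of_reads = 0 THEN 0
            ELSE vfs.io_stall_read_ms / vfs.num_of_reads END AS avg_read_ms,
       CASE WHEN vfs.num_of_writes = 0 THEN 0
            ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id
ORDER BY avg_read_ms DESC;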
Done right, you’re done here. This gives you an infrastructure that can tolerate any search queries at any time, indexed or not. Well, not exactly—you still have to follow basic good design processes. You can’t lather on dozens of indexes, thereby slowing down inserts/updates during document loads. You also have to stay on top of your case loads because they’re only going to grow over time, and you can still outgrow your hardware’s speed capacity.
The Hard, but Cheaper, Way to Performance Tune
Manage with automation in mind. Each time you do a performance tuning or availability task, think about how you’re going to accomplish it automatically going forward. This is important as you think about the rest of these tasks.
Proactively create, modify, and disable indexes. Because anybody can query for anything at any time, Relativity’s developers can’t possibly come up with an index strategy that will work for your users. They build in a basic set that should be good enough for most use cases, but if you want killer performance with minimal hardware spends, you’re going to have to roll up your sleeves. Start with our How to Think Like the Engine training, and then move on to our How to Tune Indexes and Speed Up SQL Server videos.
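When you do find a search pattern worth indexing, the fix is usually a narrow nonclustered index. The field names below are hypothetical—substitute whatever custom fields your reviewers actually filter on:

```sql
-- A search that constantly filters on a custom date field might
-- benefit from a narrow nonclustered index like this one.
CREATE NONCLUSTERED INDEX IX_Tuning_Document_DateSent
    ON dbo.Document (DateSent)
    INCLUDE (DocumentType)   -- covering column for a common SELECT list
    WITH (ONLINE = ON);      -- Enterprise Edition: avoid blocking reviewers
```

Keep each index as narrow as the query demands; every extra index is a tax on the document loads happening in the middle of the day.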
Use an index naming convention. You’ll need to be able to instantly identify which indexes were created by your shop, and which are natively shipped with Relativity. Don’t go dropping Relativity’s built-in indexes—even though some may be duplicates or not used—in your environment. Consider disabling them instead, and notify Relativity’s support team first.
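As a sketch of both practices, a distinctive prefix (here, the hypothetical `IX_Tuning_`) marks your shop’s indexes, and disabling rather than dropping keeps Relativity’s definition around:

```sql
-- Take a built-in index out of play without losing its definition.
-- (Index name is hypothetical; check with Relativity support first.)
ALTER INDEX IX_SomeRelativityIndex ON dbo.Document DISABLE;

-- A disabled index must be rebuilt to come back into service:
ALTER INDEX IX_SomeRelativityIndex ON dbo.Document REBUILD;
```

Disabling preserves the index metadata, so an upgrade or a support engineer can restore it with a single rebuild instead of reverse-engineering what you dropped.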
Monitor the most resource-intensive queries. Use a SQL Server performance monitoring tool or analyze the plan cache to find queries doing the most logical reads. Because every database will have its own execution plans, you may also need to zoom out and look at aggregated execution plan stats.
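If you don’t have a monitoring tool, the plan cache itself will tell you where the reads are going. A sketch of that query:

```sql
-- Top cached statements by cumulative logical reads. The statement
-- text points you at what to tune first.
SELECT TOP (20)
       qs.total_logical_reads,
       qs.execution_count,
       qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
       SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                 ((CASE qs.statement_end_offset
                     WHEN -1 THEN DATALENGTH(st.text)
                     ELSE qs.statement_end_offset
                   END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_logical_reads DESC;
```

Remember these numbers reset when the plan leaves the cache or the server restarts, so capture them regularly if you want trends rather than snapshots.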
When something new pops up, attack it. Your options are:
- If it’s coming from outside of Relativity—like a monitoring app or a loading tool—try tuning the query first. If you can’t eliminate common anti-patterns like implicit conversions and non-sargable WHERE clauses, you may need to use indexes.
- If it’s a Relativity saved search, work with the client support reps to find the most efficient way to get the data the client needs. Sometimes just a few tweaks to the search can make an unbelievable difference.
- If it’s part of Relativity’s own code, tune it with indexing. This is where tuning a third-party application gets challenging, because your work never really ends. Every new version of Relativity brings changed queries, and you have to make sure you don’t leave indexes lying around that aren’t helping anymore.
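To illustrate the sargability anti-pattern from the first option above, here’s a hypothetical before-and-after (column names are made up):

```sql
-- Non-sargable: the function wrapped around the column defeats any
-- index on SentDate, forcing a scan.
SELECT DocID
FROM dbo.Document
WHERE YEAR(SentDate) = 2013;

-- Sargable rewrite: a range predicate the optimizer can seek on.
SELECT DocID
FROM dbo.Document
WHERE SentDate >= '20130101'
  AND SentDate <  '20140101';
```

Same results, but the second form lets an index on `SentDate` do its job instead of evaluating a function against every row.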
If you can’t tune it, start a case with Relativity support. When you reach out, include your Relativity version, plus specific metrics showing how much load the query is causing on your server and where the query is coming from. Relativity’s developers love making the Relativity experience better for everyone involved, and they need your real-world feedback on what’s causing you problems.
The Bottom Line on Scaling Relativity
It’s really just like performance tuning any other database: when you blow past a terabyte of data, and you’re doing a combination of OLTP and reporting-style access, you’re going to have to roll up your sleeves.
It doesn’t matter whether the app is homegrown or from a third-party developer—the DBA needs to:
- Know the database schema well
- Know the queries well
- Know how hardware can offset query and schema challenges
- Know when to turn SQL Server’s knobs—and when not to
Guest Blog Post by Brent Ozar