AZR432 – SQL Azure Federations Deep Dive

Most developers are familiar with the concept of scaling out their application tier; with SQL Azure Federations it is now possible to scale out the data tier as well. In this session we will deep dive on building large scale solutions on SQL Azure.

In this session we will cover patterns and techniques for building scalability into your relational databases. SQL Azure Federations allow databases to be spread over 100s of nodes in the Azure datacentre with databases paid for by the day. This presents a unique avenue for dealing with particularly massive volumes of data, of user load, or both.

This session will discuss how to design a schema for federation scale-out while still maintaining the value afforded by a true relational (SQL) database. We’ll look at approaches for minimizing cross federation queries and as well as approaches to fan-out queries when necessary. We will examine approaches for dealing with elastically scaling applications and other high load scenarios.

Introduction

Kiwi Chris Auld is back to go into more detail on SQL Federations.

Disclaimer: These are conference session notes I compiled during various sessions at Microsoft Tech Ed 2012, September 11-14, 2012. The majority of the content comprises notes taken from the presentation slides accompanied, occasionally, by my own narration. Some of the content may be free hand style. Enjoy… Rob

Agenda

Overview
Tips and Tricks
- Design and Development
  - Picking a Federation Model
  - Picking Reference Tables
  - Generating a key without bottlenecks
  - Coding fan-out queries
- Administration
  - Configuring layout
  - Where and when to split
vNext
- Improvements

SQL Azure has been renamed.. Keeping it “simple”: SQL Database = SQL Azure

Scalability model for the Cloud

Cloud Apps allow massive scale
- Orders of magnitude more than burst
Cloud Apps demand the best economics
- Best Price/Performance
- Elasticity + Pay-as-you-go

Data Scale Challenges

For small scenarios scale up is cheaper & easier
For larger scenarios scale out is the only solution
- Massive diseconomies of scale
  - 1×64 way server >>> $$$ 64×1 way servers
- Reach a limit (can’t get a big enough box)
- Shared resource contention
Scale out differs

Federations in SQL Database

Canonical 3 tier app scales by
- Adding and removing nodes
- Buying a huge DB server
Federations extend the model to the DB tier
- Add and remove Azure nodes with federations
- Scale on demand to your traffic without downtime

Sort of like a load balancer for databases. Database horizontal partitioning under transactional load.
Federation is essentially a database sharding strategy.

Why use Federations?

Scale beyond single DB to almost unlimited scale
- Many nodes
Best economics
- elastic tier that can repartition
- Always on
Simplified multi-tenancy
Simplified development and admin
- Platform supports sharding
- Proper RDBMS

Concepts

Sharding/federation: A named dimension over which data is sliced.
’CREATE FEDERATION fed_name(key_label, distribution_type)

Atomic unit: What is the smallest atomic unit (record) to shard from? e.g. a Customer record (which can’t be shared across two shards).

Repartitioning without downtime:

SPLIT members into workloads over nodes e.g. ALTER FEDERATION <name> SPLIT AT(key=value)
DROP members to shrink back to fewer nodes
MERGE not yet supported

Built-in Data Dependent Routing

DDR ensure apps can discover where data is just-in-time (JIT)
- Apps don’t need to cache a ‘shard map’
- No cache coherency issues even with partitioning
Prevents connection pool fragmentation issues

USE FEDERATION <name>(<key>=value);

[Demo]

Working with Federations

Create federation (shard)
Create federation member
insert data (turn filtering off for multiple records)
Supports standard T-SQL
using FEDERATED ON to assign column to federation scope
Needs to be applied consistently when sharding a table (e.g. on the atomic unit – e.g. customerId)
Can only do one split at a time
Can only split on discrete values (int, varbinary, uniqueidentifier)
Can use a dynamic view to track progress

Concepts – 3 kinds of tables

Federated Tables
- Contain data that is distributed by the federation
- Contains a slice of data in each database
- Optimized for read/write at scale
- Bounded by atomic unit/filtering
Reference Tables
- Duplicate copy of data in each federation member
- Must manually modify data in each federation member, thus eventually consistent
- optimised for read
- Used for referential integrity
- No cross database transactions in SQL Azure
Central table
- Refer to tables that are created in the federation root (database) for low traffic objects such as metadata
- Single point of load (use sparingly) or through a cache

Picking Federations

Normalize data model to 3NF (or more) – sort of (keep in mind the persistence of the federated column needs to be used across the related data entities) – sort of like adding a “federation key” across the data model
Apply scale-first DB design principals
Pick Federations, i.e. “Table Groups” that need to scale out

Picking Reference Tables

Look up table
Joined in queries

Generating Unique Keys

Identity not available (because it’s shared)
Identity Generation can be expensive for large apps
- provides linearly increasing values
- provides no gaps in generation
- only generated at the data tier
- creates bottleneck, must have some sort of shared counter
Benefits of GUID
- no centralized id generation
- generated anywhere in the tiers
- random distribution over an enormous address space
Can also use varbinary

Data Dependent Routing

Based on ‘atomic units’. DDR connects to the correct database federation member.

FILTERING=ON – connects to a single atomic unit (based on the federation key), good for:

Management tasks
CRUD

FILTERING=OFF – for fan-out queries (data across multiple federation members)

Used for:

Reporting queries
- e.g. Union or aggregate queries
Unaligned queries
fan-out queries (union all, additive/non-additive aggregates)

Member query: part sent to each member
Summary query: post processing query for member query results

[Demo] A number of different T-SQL queries which demonstrate querying across multiple and single federations.

Well, there’s obviously a lot to cover off here. I’ll be coming back to this topic myself to run a few sample proof of concepts, stay tuned here at Sanders Technology for more on SQL Federation.

One thought on “AZR432 – SQL Azure Federations Deep Dive”

1 Comment

William Denniss
September 14, 2012, 10:34 AM

Wow, this is very very interesting. That does solve some major issues larger apps will run into (like you blogged, “can’t get a big enough box”). It’s nice to see solutions for scaling relational databases other than “don’t use relational databases, use NoSQL”!