# Database design tricks using Mathematics

10. March 2011 21:54

Increase performance by 30-40% and reduce the table size by 90% by considering mathematics while database design

Writer: Amol Rajamane

Applies to: SQL Server 2000 / 2005 / 2008 / 2011

Summary:

In many area Mathematics makes your life easier and faster. This paper contains the SQL Server database design to get benefits of Bitwise operation in terms of size and performance.

Introduction:

When database table grows in size then its bit difficult to manage insert update deletes and Index maintenance. It’s very difficult to modify the existing schema as well. For example if you want change one datatype  say, INT to BIGINT, the operation would take its own sweet time to complete which is directly proportional to the number of records in the table.
More number of records resulting in slow query performance and large amount of space.  Which could affect the ETL process.
To overcome these two problems, size and performance, database design can  play a very important role. This paper tells you some tricky design to overcome these two problems.

Overview:

You might be aware of the bitwise operation in SQL Server.  Let’s step back and see how it works. Lets consider the simple example of Student and their SkillSet. We need to populate SkillSet table as shown in below.

 SkillSetId Skill 1 DB2 2 FoxPro 4 Informix 8 Interbase 16 MS Access 32 MS SQL Server 64 MySQL

SkillSetId whould be the sequence of POWER of 2, like 1, 2, 4, 8 and so on …
If you see carefully, sum of bit values will never exceeds next bit value. For example
1 + 2 = 3
1 + 2 + 4 = 7
1 + 2 + 4 + 8 = 15
. . .

Consider Student table as shown in below. (Students may be in millions)

 StudentId Name 1 Student 1 2 Student 2 3 Student 3 4 Student 4

Now consider Student 1 has skills sets MS Access, MS SQL Server, MySQL. We need to store Student and their skill sets. Consider the table StudentSkillSet like below. This is the normal way of storing the relational data.

 StudentId SkillSetId 1 16 1 32 1 64

In another approach, Instead of storing SkillSetIds of student, we will store the SUM of SkillSetIds.  Here 16 + 32 + 64 = 112

 StudentId SkillSetId 1 112

This approach reduces the number of records. As you see earlier you have 3 records for StudentId 1.
Now how to relate this 112 number with IDs associated with MSAccess(16), MS SQL Server(32) and MySQL(64).  Here we will use bitwise AND to get the SkillSetIds associated to number 112. Just run the following statement in query analyzer.

```SELECT  112 & 1 AS 'DB2',
112 & 2 AS 'FoxPro',
112 & 4 AS 'Informix',
112 & 8 AS 'Interbase',
112 & 16 AS 'MS Access',
112 & 32 AS 'MS SQL Server',
112 & 64 AS 'MySQL'
GO```

This will return the output like

 DB2 FoxPro Informix Interbase MS Access MS SQL Server MySQL 0 0 0 0 16 32 64

Only skill sets which are non zero are associated to 112. Isn’t it simple?

How it reduces the number of rows and size of table. For that consider there are 1 million students and each student has 10 skillsets. Then there will be 10 millions of rows in StudentSkillSet table. If we store the SUM of skillSetIds there will be the same number of records as in Student table. That means 1 millions.  To get the space used I created two tables StudentSkillSet and StudentSkillSetTemp. StudentSkillSet table contains StudentId and sum of SkillSetIds. And StudentSkillSetTemp contains StudentId and SkillSetId. Here is size difference between both approaches.

Execute below query to get the space used by StudentSkillSet

```EXEC sp_spaceused StudentSkillSet
GO```

Out is as below which indicate total space used = 35.976 MB  (Data + Index_size)

 Name rows Reserved data index_size unused StudentSkillSet 1000000 37216 KB 20816 KB 15160 KB 1240 KB

Same way get the space used by StudentSkillSetTemp

```EXEC sp_spaceused StudentSkillSetTemp
GO```

Out is as below which indicate total space used = 357.544 MB (Data + Index_size)

 name rows reserved Data index_size unused StudentSkillSetTemp 10000000 358224 KB 207824 KB 149720 KB 680 KB

You can see the difference between these two tables in size. There is size saving almost 90%

This is all about the size. Now step back and see if there is any performance impact of this technic.
Let’s create the following tables

1. SkillSet   –- To store all the skillsets
2. Student   –- To store all the students
3. StudentSkillSet  -- To store sum of skillsets associated to the student
4. StudentSkillSetTemp -- To store skillsets associated to the student

Run the below script in your test database.  Please be patient for 4-5 minutes to execute below script.

```-- First drop the dependent objects
IF OBJECT_ID('[dbo].[vwStudentSkillSet]') IS NOT NULL
DROP VIEW [dbo].[vwStudentSkillSet]
GO
IF OBJECT_ID('[dbo].[vwStudentSkillSetTemp]') IS NOT NULL
DROP VIEW [dbo].[vwStudentSkillSetTemp]
GO

-- Drop if exists
IF OBJECT_ID('[dbo].[SkillSet]') IS NOT NULL
DROP TABLE [dbo].[SkillSet]
GO

-- Create SkillSet table
CREATE TABLE [dbo].[SkillSet] (SkillSetID BIGINT NOT NULL, Skill VARCHAR(256) NOT NULL)

GO

-- Insert skillsets, SkillSetIDs must be in POWER of 2
INSERT INTO [dbo].[SkillSet] (SkillSetID, Skill)
SELECT 1 AS SkillSetID, 'DB2' AS Skill UNION ALL
SELECT 2 AS SkillSetID, 'FoxPro' AS Skill UNION ALL
SELECT 4 AS SkillSetID, 'Informix' AS Skill UNION ALL
SELECT 8 AS SkillSetID, 'Interbase' AS Skill UNION ALL
SELECT 16 AS SkillSetID, 'MS Access' AS Skill UNION ALL
SELECT 32 AS SkillSetID, 'MS SQL Server' AS Skill UNION ALL
SELECT 64 AS SkillSetID, 'MySQL' AS Skill UNION ALL
SELECT 128 AS SkillSetID, 'Oracle' AS Skill UNION ALL
SELECT 256 AS SkillSetID, 'Paradox' AS Skill UNION ALL
SELECT 512 AS SkillSetID, 'PostgreSQL' AS Skill UNION ALL
SELECT 1024 AS SkillSetID, 'Borland C++ Builder' AS Skill UNION ALL
SELECT 2048 AS SkillSetID, 'Borland Delphi' AS Skill UNION ALL
SELECT 4096 AS SkillSetID, 'Borland JBuilder' AS Skill UNION ALL
SELECT 8192 AS SkillSetID, 'ErWIN' AS Skill UNION ALL
SELECT 16384 AS SkillSetID, 'GNU C/C++' AS Skill UNION ALL
SELECT 32768 AS SkillSetID, 'MS Visual C++' AS Skill UNION ALL
SELECT 65536 AS SkillSetID, 'Sun ONE Studio' AS Skill UNION ALL
SELECT 131072 AS SkillSetID, 'HP ANSI C++' AS Skill UNION ALL
SELECT 262144 AS SkillSetID, 'JDBC' AS Skill UNION ALL
SELECT 524288 AS SkillSetID, 'JDK' AS Skill UNION ALL
SELECT 1048576 AS SkillSetID, 'Macromedia Flash' AS Skill UNION ALL
SELECT 2097152 AS SkillSetID, 'Macromedia Flash Media Server' AS Skill UNION ALL
SELECT 4194304 AS SkillSetID, 'MS Visual Basic' AS Skill UNION ALL
SELECT 8388608 AS SkillSetID, 'MS Visual J++' AS Skill UNION ALL
SELECT 16777216 AS SkillSetID, 'MS Visual Source Safe' AS Skill UNION ALL
SELECT 33554432 AS SkillSetID, 'Powerbuilder Foundation Classes' AS Skill UNION ALL
SELECT 67108864 AS SkillSetID, 'Rational Rose' AS Skill UNION ALL
SELECT 134217728 AS SkillSetID, 'UML' AS Skill UNION ALL
SELECT 268435456 AS SkillSetID, 'JavaScript' AS Skill UNION ALL
SELECT 536870912 AS SkillSetID, 'Lua' AS Skill UNION ALL
SELECT 1073741824 AS SkillSetID, 'Modula-2' AS Skill UNION ALL
SELECT 2147483648 AS SkillSetID, 'Pascal' AS Skill UNION ALL
SELECT 4294967296 AS SkillSetID, 'Perl' AS Skill UNION ALL
SELECT 8589934592 AS SkillSetID, 'PL/I' AS Skill UNION ALL
SELECT 17179869184 AS SkillSetID, 'PHP' AS Skill UNION ALL
SELECT 34359738368 AS SkillSetID, 'Python' AS Skill UNION ALL
SELECT 68719476736 AS SkillSetID, 'REALBasic' AS Skill UNION ALL
SELECT 137438953472 AS SkillSetID, 'Visual Basic' AS Skill UNION ALL
SELECT 274877906944 AS SkillSetID, 'Active X' AS Skill UNION ALL
SELECT 549755813888 AS SkillSetID, 'ADO' AS Skill UNION ALL
SELECT 1099511627776 AS SkillSetID, 'ASP' AS Skill UNION ALL
SELECT 2199023255552 AS SkillSetID, 'ATL' AS Skill UNION ALL
SELECT 4398046511104 AS SkillSetID, 'BDE' AS Skill UNION ALL
SELECT 8796093022208 AS SkillSetID, 'CGI' AS Skill UNION ALL
SELECT 17592186044416 AS SkillSetID, 'Client-Server Architecture' AS Skill UNION ALL
SELECT 35184372088832 AS SkillSetID, 'COM/DCOM' AS Skill UNION ALL
SELECT 70368744177664 AS SkillSetID, 'CORBA' AS Skill UNION ALL
SELECT 140737488355328 AS SkillSetID, 'DAO/RDO' AS Skill UNION ALL
SELECT 281474976710656 AS SkillSetID, 'DICOM' AS Skill UNION ALL
SELECT 562949953421312 AS SkillSetID, 'HTML' AS Skill UNION ALL
SELECT 1125899906842624 AS SkillSetID, 'HL7' AS Skill UNION ALL
SELECT 2251799813685248 AS SkillSetID, 'IBM WebSphere' AS Skill UNION ALL
SELECT 4503599627370496 AS SkillSetID, 'Image processing' AS Skill UNION ALL
SELECT 9007199254740992 AS SkillSetID, 'JDBC' AS Skill UNION ALL
SELECT 18014398509481984 AS SkillSetID, 'MFC' AS Skill UNION ALL
SELECT 36028797018963968 AS SkillSetID, 'Microsoft .NET' AS Skill UNION ALL
SELECT 72057594037927936 AS SkillSetID, 'MIDAS' AS Skill UNION ALL
SELECT 144115188075855872 AS SkillSetID, 'ODBC' AS Skill UNION ALL
SELECT 288230376151711744 AS SkillSetID, 'OLE' AS Skill UNION ALL
SELECT 576460752303423488 AS SkillSetID, 'Oracle Developer/Programmer' AS Skill UNION ALL
SELECT 1152921504606846976 AS SkillSetID, 'Sound processing' AS Skill UNION ALL
SELECT 2305843009213693952 AS SkillSetID, 'VBA' AS Skill

GO
-- Create PK
ALTER TABLE [dbo].[SkillSet] ADD CONSTRAINT pk_SkillSet PRIMARY KEY (SkillSetID)

GO

-- Drop if exists
IF OBJECT_ID('[dbo].[Student]') IS NOT NULL
DROP TABLE [dbo].[Student]
GO

-- Create student table
CREATE TABLE [dbo].[Student] (StudentID INT IDENTITY(1,1), StudentName VARCHAR(100) NOT NULL)
GO

-- Add 1 million records in Student table
SET NOCOUNT ON
DECLARE @LoopCounter INT
SET @LoopCounter = 1

BEGIN TRAN
WHILE @LoopCounter <= 100000
BEGIN
INSERT [dbo].[Student] VALUES('Student ' + CAST(@LoopCounter AS VARCHAR))
SET @LoopCounter = @LoopCounter + 1
END
COMMIT
GO

-- Create PK
ALTER TABLE [dbo].[Student] ADD CONSTRAINT pk_Student PRIMARY KEY (StudentID)

GO

-- Drop if exists
IF OBJECT_ID('[dbo].[vwStudentSkillSet]') IS NOT NULL
DROP VIEW [dbo].[vwStudentSkillSet]

GO
IF OBJECT_ID('[dbo].[vwStudentSkillSetTemp]') IS NOT NULL
DROP VIEW [dbo].[vwStudentSkillSetTemp]

GO

-- Drop if exists
IF OBJECT_ID('[dbo].[StudentSkillSet]') IS NOT NULL
DROP TABLE [dbo].[StudentSkillSet]
GO

-- Create StudentSkillSet
CREATE TABLE [dbo].[StudentSkillSet] (StudentID INT NOT NULL, SkillSetID BIGINT NOT NULL)
GO

-- Drop if exists
IF OBJECT_ID('[dbo].[StudentSkillSetTemp]') IS NOT NULL
DROP TABLE [dbo].[StudentSkillSetTemp]
GO

-- Create StudentSkillSetTemp
CREATE TABLE [dbo].[StudentSkillSetTemp] (StudentID INT NOT NULL, SkillSetID BIGINT NOT NULL)
GO

-- Insert random 10 skills for each student into StudentSkillSetTemp
SET NOCOUNT ON
DECLARE @LoopCounter INT, @StudentID INT, @strLoopCounter VARCHAR(100)
SET @LoopCounter = 1

BEGIN TRAN
WHILE @LoopCounter <= 100000
BEGIN

SELECT @StudentID = StudentID
FROM Student
WHERE StudentID = @LoopCounter

INSERT [dbo].[StudentSkillSetTemp] (StudentID, SkillSetID)
SELECT TOP 10 @StudentID, SkillSetID
FROM SkillSet
ORDER BY NEWID()

SET @LoopCounter = @LoopCounter + 1

SET @strLoopCounter = CAST(@LoopCounter AS VARCHAR)
RAISERROR (@strLoopCounter, 0, 1) WITH NOWAIT

END
COMMIT

GO

-- Create PK
ALTER TABLE [dbo].[StudentSkillSetTemp] ADD CONSTRAINT pk_StudentSkillSetTemp PRIMARY KEY (SkillSetID, StudentID)
GO

-- Create NC on StudentID
CREATE NONCLUSTERED INDEX [ix_StudentSkillSetTemp_StudentID] ON [dbo].[StudentSkillSetTemp] ([StudentID])
GO

-- Insert SUM of Skillsets for students into StudentSkillSet
INSERT [dbo].[StudentSkillSet] (StudentID, SkillSetID)
SELECT StudentID, SUM(SkillSetID) AS SkillSetID
FROM [dbo].[StudentSkillSetTemp]
GROUP BY StudentID
GO

-- Create PK, Here don't need to create composit key as PK as we have 1-1 mapping
ALTER TABLE [dbo].[StudentSkillSet] ADD CONSTRAINT pk_StudentSkillSet PRIMARY KEY (SkillSetID)
GO

-- Create NC on SkillSetID
CREATE NONCLUSTERED INDEX [ix_StudentSkillSet_StudentID] ON [dbo].[StudentSkillSet] (StudentID)
GO

-- Create view to get the actual skillset ids associated to students
CREATE VIEW [dbo].[vwStudentSkillSet]
WITH SCHEMABINDING
AS
SELECT
sss.StudentID,
ss.SkillSetID,
ss.Skill
FROM [dbo].[StudentSkillSet] sss
CROSS JOIN [dbo].[SkillSet] ss
WHERE ss.SkillSetID > 0
AND ss.SkillSetID | sss.SkillSetID = sss.SkillSetID

GO

-- Here we will create composite PK
CREATE UNIQUE CLUSTERED INDEX ucix_vwStudentSkillSet_StudentID
ON [dbo].[vwStudentSkillSet] (SkillSetID, StudentID)
GO

-- Create the same view for our temp table as well
CREATE VIEW [dbo].[vwStudentSkillSetTemp]
WITH SCHEMABINDING
AS
SELECT  sss.StudentID, ss.SkillSetID, ss.Skill
FROM    [dbo].[StudentSkillSetTemp] sss
INNER JOIN [dbo].[SkillSet] ss ON (sss.SkillSetID = ss.SkillSetID)
WHERE   ss.SkillSetID > 0

GO

-- Create composite PK
CREATE UNIQUE CLUSTERED INDEX ucix_vwStudentSkillSetTemp_StudentID
ON [dbo].[vwStudentSkillSetTemp] (SkillSetID, StudentID)
GO```

Now Lets look at the performance different between these two approaches.
First we will run some queries against the tables directly instead of indexed view which we created above.
Query Number  1 :  -- Get all the Students having Skill Set  1, 2,  and 3

```--With new design
SELECT  sss.StudentID,
ss.SkillSetID,
ss.Skill
FROM    [dbo].[StudentSkillSet] sss
CROSS JOIN [dbo].[SkillSet] ss
WHERE   ss.SkillSetID IN (1,2,4) AND
ss.SkillSetID | sss.SkillSetID = sss.SkillSetID
GO
--With normal design
SELECT  sss.StudentID,
ss.SkillSetID,
ss.Skill
FROM    [dbo].[StudentSkillSetTemp] sss
INNER JOIN [dbo].[SkillSet] ss ON (sss.SkillSetID = ss.SkillSetID)
WHERE   ss.SkillSetID IN (1,2,4)
GO```

Run these queries and see the execution plan.
Figure 1

See the execution plan in Figure 1, It’s 22% faster.

Query Number  2 :  -- Get all the Skill Sets of Student  1

```-- With new design
SELECT  sss.StudentID,
ss.SkillSetID,
ss.Skill
FROM    [dbo].[StudentSkillSet] sss
CROSS JOIN [dbo].[SkillSet] ss
WHERE   ss.SkillSetID > 0 AND
ss.SkillSetID | sss.SkillSetID = sss.SkillSetID
AND sss.StudentID = 1
GO

-- With normal design
SELECT  sss.StudentID,
ss.SkillSetID,
ss.Skill
FROM    [dbo].[StudentSkillSetTemp] sss
INNER JOIN [dbo].[SkillSet] ss ON (sss.SkillSetID = ss.SkillSetID)
WHERE   ss.SkillSetID > 0 AND
sss.StudentID = 1
GO```

Execute above queries and see the execution plan.

Figure 2

See the execution plan in Figure 2, It’s 8% faster.

Now lets run some queries against Indexed views, [dbo].[vwStudentSkillSet] and [dbo].[vwStudentSkillSetTemp]

Query Number  3 :  -- Get all the Students having Skill Sets  1, 2,  and 3 using Indexed views

```--With new design
SELECT  StudentID,
SkillSetID,
Skill
FROM    [dbo].[vwStudentSkillSet]
WHERE   SkillSetID IN (1,2,4)

GO
--With normal design
SELECT  StudentID,
SkillSetID,
Skill
FROM    [dbo].[vwStudentSkillSetTemp]
WHERE   SkillSetID IN (1,2,4)
GO```

Figure 3

See the execution plan in Figure 3, It’s 22% faster.

Query Number  4 :  -- Get all the Skill Sets of Student  1 using Indexed views

```--With new design
SELECT  StudentID,
SkillSetID,
Skill
FROM    [dbo].[vwStudentSkillSet]
WHERE   StudentID = 1

GO
--With new design
SELECT  StudentID,
SkillSetID,
Skill
FROM    [dbo].[vwStudentSkillSetTemp]
WHERE   StudentID  = 1
GO```

Figure 4

Query Number  5 :  -- Get the number of skillsets for Student  1 using Indexed views

```SELECT  COUNT(*)
FROM    [dbo].[vwStudentSkillSet]
WHERE   SkillSetID = 1

GO

SELECT  COUNT(*)
FROM    [dbo].[vwStudentSkillSetTemp]
WHERE   SkillSetID  = 1
GO```

Figure 5

If you observe, all the queries are running at least 22% (this number may vary depending on hardware) faster than old design.

Limitations:

The biggest disadvantage of this design is, you cannot use it if you have more than 62 dimensions keys. In this case I used 62 Skill Sets. This is because the POWER(2,63) crosses the max limit of big integer supported by SQL Server. And binary operations cannot be performed against Float or Decimal values.

Inferences:

This approach is very useful if you have dimensions having keys less than 63 and you have huge facts which holds these dimension keys.
Consider the case where you have 100 millions fact keys and each key hold minimum 10 dimension keys. So there will be 100 * 10 = 1000 Millions of rows. But if you use this design there will be 100 millions of rows. So at the max you could save the space 62 time, it’s huge one …

SQL Queries:

SQL Queries.sql (10.31 kb)

Whitepaper

# SQL Server Tools code-named “Juneau”

7. March 2011 11:30

Good news for SQL Server Developers, SQL Server Developer tool named Juneau is coming out.

Juneau will be used as SQL Server Explorer to drill down into a database and get the same preview that SSMS provides. It will be used as text editior to execute SQL statements, debuging step over step.

Plenty of other cool things with Juneau check it out Here

SQL Server Denali

# Using Contained Database

5. March 2011 19:41

Why do we need Contained Database? Lets take an example, we develop application and database on our development machine. Now its time to deploy it on production. There are couple of ways to do it. One, generate database object script and create the database. Second, backup the database and restore it on production. These steps will create database on production but it wont create any dependant server level objects for example logins. With help of database database backup user will user will get created on production but it will be an orphand user. You have to map a login for the user.

To avoid such ambiguities, Contained Database comes in picture. While creating database SQL Server Denali will gives you option of setting its containment type, which creates Contained objects defined at server level within database itself. Currently only containment option "Partial" is available in SQL Server Denali CTP 1. "Full" containment is not yet enabled in CTP 1.

Let's see how to create Contained Database. First you have enable contained databases server level option. To Enable contained databases open Server

Properties as shown in figure. By default Contained Database property is False. To work on it make it True.

Or secondly you can enable this property by using T-SQL as well.

sp_configure 'show advanced options', 1
GO
RECONFIGURE
GO
sp_configure 'contained database authentication', 1
GO
RECONFIGURE
GO

Once you enabled contained databases, create contained database using create database statement shown in below.

```CREATE DATABASE [TestContainedDB]
CONTAINMENT = PARTIAL
GO```

Lets create user in TestContainedDB.

```USE [TestContainedDB]
GO
CREATE USER [TestUser] WITH PASSWORD = ‘password123’
GO```

If you notice, earlier we required valid server login to create database user. But here we do not require to create SQL Server login for the database user. So dependency between database user and server login has been removed here. Means it all required objects containance in Contained Database.

This is useful when you need to move database from one SQL Server instance to another. You do not have to map the database users to login.

SQL Server Denali