SQL Antipatterns – Store Multivalue Attributes

Today, I want to share with something wonderful about books SQL Antipatterns: Avoiding the Pitfalls of Database Programming (Pragmatic Programmers). This is begin for a tour around this book. Chapter I will begin now.

When design database for a system, sometimes we create a table with multivalue attributes. They have a many columns, with many value in one table.

There is nothing to say if they have a problem. The table create, and it should use for the system, and if it makes a developer difficult to DML (Data Manipulation Languages)?.

That’s time to apply SQL Antipatterns, we need to change the table structure, to make it clean. Easy to maintenance and understand.

1. When you create a table for bug note

Let me show you an example below:
I have many bugs, one bug has a description, many tags. If you don’t know a SQL Multivalue Pattern design. You can design table bugs like this

// Table bugs - one bug_id with many tags.
  bug_id         SERIAL    PRIMARY KEY,
  description    VARCHAR(1000),
  tag1           VARCHAR(20),
  tag2           VARCHAR(20),
  tag3           VARCHAR(20),

This is good for look. Right?. OK, I agree with you, when one bug create, you can see the data in a table like this.

// When create a new bug, data in table like that.
bug_id        description              tag1        tag2        tag3
1234          Crashes while saving     crash       NULL        NULL
3456          Increase performance     printing    performance NULL
5678          Support XML              NULL        NULL        NULL

1.1 Searching for Values

Let begin an example when we want to DML any information in this table. First, when you want to search one row, the condition always includes three columns.

// When we want to get a one bug, we need to search with three condition
WHERE tag1 = 'performance'
      OR tag2 = 'performance'
      OR tag3 = 'performance';

You might need to search for bugs that reference both tags, that’s easy to see that performance in this case is not good. To improve performance, we can use WHERE condition, group all OR condition to AND. Because OR has lower precedence than AND.

// Use WHERE, we can improve performance (but just a little bit)
 (tag1 = 'performance' OR tag2 = 'performance' OR tag3 = 'performance')
 AND (tag1 = 'printing' OR tag2 = 'printing' OR tag3 = 'printing')

Another way, we can use IN condition.

// Use IN condition, we can write like this
WHERE 'performance' IN (tag1, tag2, tag3)
  AND 'printing' IN (tag1, tag2, tag3);

1.2 Adding and removing value

Only one problem comes when we update table with a structure like this. If one column is null, we can write it with NULLIF.

// Update a tag columns if it is not NULL
SET tag1 = NULLIF(tag1, 'performance'),
    tag2 = NULLIF(tag2, 'performance'),
    tag3 = NULLIF(tag3, 'performance')
WHERE bug_id - 3456;

This statement will adds the new tag performance to the first column if that is currently null. However, if none of the three columns is null, the statement make no change to the row, and the new tag value is not recorded at all.

1.3 When we want to add new column

Without SQL Antipatterns, another weakness of this design is that three columns might not be enough. In the future, the customer wants to add new tag to bug (by add new column).

These is only one way to do that

// The customer want to add new tag column

However, this change is costly in three ways

  • Restructuring a database table that already contains data maybe locking table if they have a NOT NULL required.
  • Some databases implement need to change source code to match with new table structure. If the tag4 column is required NOT NULL, the insert statement should update.
  • The SQL statement must change follow the column add new.

2. How we solve it with SQL Antipatterns

Summary, this problem only by a relationship ONE TO MANY. Before show the solutions, we should ask yourself two questions:

  • How many is the greatest number of tags we need to support?
  • How can I search multiple columns at the same time in SQL?

After answer the two above questions, I will show you solutions for this problem. We can CREATE A DEPENDENT TABLE. That’s one of many solutions in the books SQL Antipatterns.

That’s the best solution when use AntiPattern. The main content of this problem is “STORE THE MULTIPLE VALUES IN MULTIPLE ROWS, INSTEAD for MULTIPLE COLUMNS”.

Also, define a foreign key in the dependent table associate the values to its parent row in the Bugs table.

// The solutions is  CREATE DEPENDENT TABLE
  tag        VARCHAR(20),
  PRIMARY KEY (bug_id, tag),
  FOREIGN KEY (bug_id) REFERENCES Bugs(bug_id)

// The insert statement
INSERT INTO Tags (bug_id, tag)
  VALUES (1234, 'crash') (3456, 'printing'), (3456, 'performance');

3. It’s easy for anything

When all the tags associated with a bug are in single column, searching for bugs with a given tag is more straightforward.

// Select all information about bug is easy like that
   JOIN Tags USING (bug_id)
WHERE tag = 'performance';

Furthermore, if you want more complex searches, such a bug that related to two specific tags, is easy to read.

// Select a bug related
   JOIN Tags AS t1 USING (bug_id)
   JOIN Tags AS t2 USING (bug_id)
WHERE t1.tag = 'printing' AND t2.tag = 'performance';

In case we want to add or remove an association much more easily than with the Multicolumn in SQL Antipatterns.

// Insert and delete will be easy
INSERT INTO Tags (bug_id, tag) VALUES (1234, 'save');
DELETE FROM Tags WHERE bug_id = 1234 AND tag = 'crash';

4. Reference

You can read one more post about Database indexes : advantages and disadvantages.

Có gì thắc mắc cứ comment đây nha! - Please feel free to comment here!