Today, I want to share with something wonderful about books SQL Antipatterns: Avoiding the Pitfalls of Database Programming (Pragmatic Programmers). This is begin for a tour around this book. Chapter I will begin now.
When design database for a system, sometimes we create a table with multivalue attributes. They have a many columns, with many value in one table.
There is nothing to say if they have a problem. The table create, and it should use for the system, and if it makes a developer difficult to DML (Data Manipulation Languages)?.
That’s time to apply SQL Antipatterns, we need to change the table structure, to make it clean. Easy to maintenance and understand.
1. When you create a table for bug note
Let me show you an example below:
I have many bugs, one bug has a description, many tags. If you don’t know a SQL Multivalue Pattern design. You can design table bugs like this
// Table bugs - one bug_id with many tags. CREATE TABLE Bugs ( bug_id SERIAL PRIMARY KEY, description VARCHAR(1000), tag1 VARCHAR(20), tag2 VARCHAR(20), tag3 VARCHAR(20), )
This is good for look. Right?. OK, I agree with you, when one bug create, you can see the data in a table like this.
// When create a new bug, data in table like that. bug_id description tag1 tag2 tag3 1234 Crashes while saving crash NULL NULL 3456 Increase performance printing performance NULL 5678 Support XML NULL NULL NULL
1.1 Searching for Values
Let begin an example when we want to DML any information in this table. First, when you want to search one row, the condition always includes three columns.
// When we want to get a one bug, we need to search with three condition SELECT * FROM Bugs WHERE tag1 = 'performance' OR tag2 = 'performance' OR tag3 = 'performance';
You might need to search for bugs that reference both tags, that’s easy to see that performance in this case is not good. To improve performance, we can use WHERE condition, group all OR condition to AND. Because OR has lower precedence than AND.
// Use WHERE, we can improve performance (but just a little bit) SELECT * FROM Bugs WHERE (tag1 = 'performance' OR tag2 = 'performance' OR tag3 = 'performance') AND (tag1 = 'printing' OR tag2 = 'printing' OR tag3 = 'printing')
Another way, we can use IN condition.
// Use IN condition, we can write like this SELECT * FROM Bugs WHERE 'performance' IN (tag1, tag2, tag3) AND 'printing' IN (tag1, tag2, tag3);
1.2 Adding and removing value
Only one problem comes when we update table with a structure like this. If one column is null, we can write it with NULLIF.
// Update a tag columns if it is not NULL UPDATE Bugs SET tag1 = NULLIF(tag1, 'performance'), tag2 = NULLIF(tag2, 'performance'), tag3 = NULLIF(tag3, 'performance') WHERE bug_id - 3456;
This statement will adds the new tag performance to the first column if that is currently null. However, if none of the three columns is null, the statement make no change to the row, and the new tag value is not recorded at all.
1.3 When we want to add new column
Without SQL Antipatterns, another weakness of this design is that three columns might not be enough. In the future, the customer wants to add new tag to bug (by add new column).
These is only one way to do that
// The customer want to add new tag column ALTER TABLE Bugs ADD COLUMN tag4 VARCHAR(20);
However, this change is costly in three ways
- Restructuring a database table that already contains data maybe locking table if they have a NOT NULL required.
- Some databases implement need to change source code to match with new table structure. If the tag4 column is required NOT NULL, the insert statement should update.
- The SQL statement must change follow the column add new.
2. How we solve it with SQL Antipatterns
Summary, this problem only by a relationship ONE TO MANY. Before show the solutions, we should ask yourself two questions:
- How many is the greatest number of tags we need to support?
- How can I search multiple columns at the same time in SQL?
After answer the two above questions, I will show you solutions for this problem. We can CREATE A DEPENDENT TABLE. That’s one of many solutions in the books SQL Antipatterns.
That’s the best solution when use AntiPattern. The main content of this problem is “STORE THE MULTIPLE VALUES IN MULTIPLE ROWS, INSTEAD for MULTIPLE COLUMNS”.
Also, define a foreign key in the dependent table associate the values to its parent row in the Bugs table.
// The solutions is CREATE DEPENDENT TABLE CREATE TABLE Tags ( bug_id BIGINT UNSIGNED NOT NULL tag VARCHAR(20), PRIMARY KEY (bug_id, tag), FOREIGN KEY (bug_id) REFERENCES Bugs(bug_id) ) // The insert statement INSERT INTO Tags (bug_id, tag) VALUES (1234, 'crash') (3456, 'printing'), (3456, 'performance');
3. It’s easy for anything
When all the tags associated with a bug are in single column, searching for bugs with a given tag is more straightforward.
// Select all information about bug is easy like that SELECT * FROM Bugs JOIN Tags USING (bug_id) WHERE tag = 'performance';
Furthermore, if you want more complex searches, such a bug that related to two specific tags, is easy to read.
// Select a bug related SELECT * FROM Bugs JOIN Tags AS t1 USING (bug_id) JOIN Tags AS t2 USING (bug_id) WHERE t1.tag = 'printing' AND t2.tag = 'performance';
In case we want to add or remove an association much more easily than with the Multicolumn in SQL Antipatterns.
// Insert and delete will be easy INSERT INTO Tags (bug_id, tag) VALUES (1234, 'save'); DELETE FROM Tags WHERE bug_id = 1234 AND tag = 'crash';
4. Reference
You can read one more post about Database indexes : advantages and disadvantages.