Whenever you, as a developer, are given a task based on existing code, you have to face many challenges. One such challenge—more often than not the most demanding one—involves understanding the data model of an application.

Normally you are faced with confusing tables, views, columns, values, stored procedures, functions, constraints, and triggers that take a long time to make sense of. And, once they do, you start noticing many ways to improve and take advantage of the stored information.

If you are an experienced developer, chances are you will also notice things that could have been done better in the beginning, i.e., design flaws.

Common database design bad practices

In this article, you will learn about some of the common database design bad practices, why they are bad, and how you can avoid them.

Bad Practice No. 1: Ignoring the Purpose of the Data

Data is stored to be consumed later, and the goal is always to store it and retrieve it in the most efficient manner. To achieve this, the database designer must know in advance what the data is going to represent, how it is going to be acquired and at what rate, what its operational volume will be (i.e., how much data is expected), and, finally, how it is going to be used.

For example, an industrial information system where data is gathered manually every day will not have the same data model as an industrial system where information is generated in real time. Why? Because handling a few hundred or thousand records per month is very different from managing millions of them in the same period. If data volumes are going to be large, designers must make special provisions to keep the database efficient and usable.

But, of course, data volume is not the only aspect to consider, since the purpose of the data also affects the level of normalization, the data structure, the record size, and the general implementation of the whole system.

Therefore, thoroughly knowing the purpose of the data system you will create leads to considerations in the choice of the database engine, the entities to design, the record size and format, and the database engine management policies.

Ignoring these goals will lead to designs that are flawed at their foundation, even if they are structurally and mathematically correct.

Bad Practice No. 2: Poor Normalization

Designing a database is not a deterministic task; two database designers may follow all the rules and normalization principles for a given problem, and in most cases they will generate different data layouts. This is inherent to the creative nature of software engineering. However, there are some analysis techniques that make sense in every instance, and to follow them is the best way to get to a database that performs at its best.

Image of one set of data leading to two different layouts

In spite of this, we are often faced with databases that were designed on the fly without following the most basic rules of normalization. We have to be clear on this: Every database should, at least, be normalized to third normal form, since that is the layout that will best represent your entities and whose performance will be best balanced between querying and inserting, updating, and deleting records.

If you stumble upon tables that do not comply with 3NF, 2NF, or even 1NF, consider redesigning them. The effort you invest in doing so will pay off in the very short term.
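To make this concrete, here is a minimal sketch using Python's built-in sqlite3 module and a hypothetical Customer schema. It refactors a table that violates 1NF (several phone numbers packed into one column) into a normalized layout where each value is atomic and individually queryable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Violates 1NF: multiple phone numbers crammed into a single column.
cur.execute("CREATE TABLE CustomerFlat (Id INTEGER PRIMARY KEY, Name TEXT, Phones TEXT)")
cur.execute("INSERT INTO CustomerFlat VALUES (1, 'Alice', '555-0101,555-0102')")

# Normalized layout: one atomic value per column, phones in a child table.
cur.execute("CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT)")
cur.execute("""CREATE TABLE CustomerPhone (
    Id INTEGER PRIMARY KEY,
    IdCustomer INTEGER REFERENCES Customer (Id),
    Phone TEXT)""")

# Migrate the flat data by splitting the packed column into rows.
for cid, name, phones in cur.execute("SELECT Id, Name, Phones FROM CustomerFlat").fetchall():
    cur.execute("INSERT INTO Customer VALUES (?, ?)", (cid, name))
    for phone in phones.split(","):
        cur.execute("INSERT INTO CustomerPhone (IdCustomer, Phone) VALUES (?, ?)", (cid, phone))

# Each phone number is now a row of its own, so it can be queried directly.
rows = cur.execute(
    "SELECT Phone FROM CustomerPhone WHERE IdCustomer = 1 ORDER BY Phone").fetchall()
print(rows)  # [('555-0101',), ('555-0102',)]
```

The same refactoring applies in any RDBMS; sqlite3 is used here only because it ships with Python and makes the example self-contained.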

Bad Practice No. 3: Redundancy

Redundancy is another bad practice that appears quite often, and it is closely related to the previous point, since one of the goals of normalization is precisely to reduce it.

Redundant fields and tables are a nightmare for developers, since they require business logic to keep many versions of the same information up to date. This is overhead that can be avoided if normalization rules are followed thoroughly. Although redundancy may sometimes seem necessary, it must be used only in very specific cases and be clearly documented so it is taken into consideration in future developments.

Typical bad effects of redundancy are an unnecessary increase in database size, data that is prone to inconsistency, and decreased database efficiency; more importantly, it may lead to data corruption.
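The inconsistency risk is easy to demonstrate. In this sketch (hypothetical OrderHeader/OrderLine tables, again using Python's built-in sqlite3), a redundant Total column duplicates information already held by the order lines; as soon as a line changes without the business logic updating the copy, the two silently disagree:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE OrderLine (Id INTEGER PRIMARY KEY, IdOrder INTEGER, Price REAL)")
# Redundant: Total duplicates what the order lines already say.
cur.execute("CREATE TABLE OrderHeader (Id INTEGER PRIMARY KEY, Total REAL)")
cur.execute("INSERT INTO OrderHeader VALUES (1, 30.0)")
cur.executemany("INSERT INTO OrderLine (IdOrder, Price) VALUES (?, ?)",
                [(1, 10.0), (1, 20.0)])

# A line is updated, but nobody remembers to refresh the redundant Total.
cur.execute("UPDATE OrderLine SET Price = 15.0 WHERE Id = 2")

stored = cur.execute("SELECT Total FROM OrderHeader WHERE Id = 1").fetchone()[0]
computed = cur.execute("SELECT SUM(Price) FROM OrderLine WHERE IdOrder = 1").fetchone()[0]
print(stored, computed)  # 30.0 25.0 -- the redundant copy has drifted
```

Computing the total on demand (or through a view) removes the drift entirely; a stored total is justified only when measured query cost demands it, and then it must be documented.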

Bad Practice No. 4: Bad Referential Integrity (Constraints)

Referential integrity is one of the most valuable tools that database engines provide to keep data quality at its best. If no constraints, or very few, are implemented from the design stage, data integrity will have to rely entirely on the business logic, making it susceptible to human error.
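As a small sketch of the engine doing this work (hypothetical Item/OrderLine tables via Python's built-in sqlite3; note that SQLite requires foreign-key enforcement to be switched on explicitly), the constraint rejects an orphan row before any application code can make the mistake:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE Item (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
conn.execute("""CREATE TABLE OrderLine (
    Id INTEGER PRIMARY KEY,
    IdItem INTEGER NOT NULL REFERENCES Item (Id),
    Quantity INTEGER NOT NULL CHECK (Quantity > 0))""")

conn.execute("INSERT INTO Item VALUES (1, 'Widget')")
conn.execute("INSERT INTO OrderLine (IdItem, Quantity) VALUES (1, 5)")  # valid row

# The engine, not the business logic, rejects a reference to a missing item.
rejected = False
try:
    conn.execute("INSERT INTO OrderLine (IdItem, Quantity) VALUES (99, 1)")
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)

count = conn.execute("SELECT COUNT(*) FROM OrderLine").fetchone()[0]
print(count)  # only the valid row survived
```

The CHECK constraint on Quantity shows the same principle for value-level rules: invalid data never enters the table, regardless of which application wrote it.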

Bad Practice No. 5: Not Taking Advantage of DB Engine Features

When you are using a database engine (DBE), you have a powerful piece of software for your data handling tasks that will simplify software development and guarantee that information is always correct, safe, and usable. A DBE provides services like:

  • Views that provide a quick and efficient way to look at your data, typically denormalizing it for query purposes without losing data correctness.
  • Indexes that help speed up queries on tables.
  • Aggregate functions that help analyze information without programming.
  • Transactions, or blocks of data-altering statements that are either all executed and committed together or all cancelled (rolled back) if something unexpected occurs, thus keeping information in a consistent state.
  • Locks that keep data safe and correct while transactions are being executed.
  • Stored procedures that provide programming features to allow complex data management tasks.
  • Functions that allow sophisticated calculations and data transformations.
  • Constraints that help guarantee data correctness and avoid errors.
  • Triggers that help automate actions when events occur on the data.
  • A command optimizer (execution planner) that runs under the hood, ensuring that every statement is executed as efficiently as possible and keeping the execution plans for future occasions. This is one of the best reasons to use views, stored procedures, and functions, since their execution plans are kept permanently in the DBE.

Not knowing or ignoring these capabilities will take development down an extremely uncertain path and, surely, toward bugs and future problems.
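As one illustration of these features, the following sketch (hypothetical Account table, Python's built-in sqlite3) shows a transaction rolled back as a whole when one of its statements violates a constraint: the credit that had already been applied is undone along with the failed debit, leaving the data consistent with no application-level cleanup code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Account (
    Id INTEGER PRIMARY KEY,
    Balance REAL NOT NULL CHECK (Balance >= 0))""")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()

# Attempt to transfer more than account 1 holds. The CHECK constraint fires
# on the debit, and the context manager rolls back the entire transaction,
# including the credit that had already been applied to account 2.
try:
    with conn:  # sqlite3 connections commit on success, roll back on exception
        conn.execute("UPDATE Account SET Balance = Balance + 200 WHERE Id = 2")
        conn.execute("UPDATE Account SET Balance = Balance - 200 WHERE Id = 1")
except sqlite3.IntegrityError:
    pass  # transfer refused; nothing to repair

balances = [r[0] for r in conn.execute("SELECT Balance FROM Account ORDER BY Id")]
print(balances)  # [100.0, 50.0] -- both accounts untouched
```

Reimplementing this guarantee in application code (as the locking anecdote in the comments below illustrates) is almost always a losing proposition.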

Bad Practice No. 6: Composite Primary Keys

This is somewhat of a controversial point, since many database designers nowadays advocate using an auto-generated integer ID field as the primary key instead of a composite one defined by the combination of two or more fields. This is currently considered a "best practice" and, personally, I tend to agree with it.

Image of a composite primary key

However, this is just a convention and, of course, DBEs allow the definition of composite primary keys, which many designers think are unavoidable. Therefore, as with redundancy, composite primary keys are a design decision.

Beware, though: if a table with a composite primary key is expected to have millions of rows, the index controlling the composite key can grow to a point where CRUD operation performance degrades severely. In that case, it is much better to use a simple integer ID primary key, whose index will be compact enough, and establish the necessary DBE constraints to maintain uniqueness.
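A sketch of that recommendation (hypothetical Enrollment table, Python's built-in sqlite3): the surrogate integer Id is the primary key, while a UNIQUE constraint on the natural composite key still makes the engine reject duplicate pairs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simple surrogate key; the natural (IdStudent, IdCourse) pair is kept
# unique by a constraint rather than by being the primary key itself.
conn.execute("""CREATE TABLE Enrollment (
    Id INTEGER PRIMARY KEY,
    IdStudent INTEGER NOT NULL,
    IdCourse INTEGER NOT NULL,
    UNIQUE (IdStudent, IdCourse))""")

conn.execute("INSERT INTO Enrollment (IdStudent, IdCourse) VALUES (7, 42)")

rejected = False
try:
    # Same student, same course: the engine refuses the duplicate pair.
    conn.execute("INSERT INTO Enrollment (IdStudent, IdCourse) VALUES (7, 42)")
except sqlite3.IntegrityError:
    rejected = True

count = conn.execute("SELECT COUNT(*) FROM Enrollment").fetchone()[0]
print(rejected, count)
```

Other tables can now reference Enrollment through the single compact Id column instead of carrying both halves of the composite key around.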

Bad Practice No. 7: Poor Indexing

Sometimes, you will have a table that you need to query by many columns. As the table grows, you will notice that SELECTs on these columns slow down. If the table is big enough, you will think, logically, of creating an index on each column you use to access it, only to find almost immediately that SELECT performance improves while INSERT, UPDATE, and DELETE performance drops. This, of course, is because the indexes have to be kept synchronized with the table, which means massive overhead for the DBE.

This is a typical case of over-indexing that you can solve in many ways. For instance, a single index covering all the non-primary-key columns you use to query the table, ordered from the most used to the least, may offer better performance across all CRUD operations than one index per column.

On the other hand, having a table with no index on columns that are used to query it will, as we all know, lead to poor performance on SELECTs.

Also, index efficiency sometimes depends on the column type; indexes on INT columns show the best possible performance, but indexes on VARCHAR, DATE, or DECIMAL columns (if that ever makes sense) are not as efficient. This consideration can even lead to redesigning tables that need to be accessed with the best possible efficiency.

Therefore, indexing is always a delicate decision: too much indexing can be as bad as too little, and the data types of the columns to index have a big influence on the final outcome.
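The single-composite-index idea can be sketched as follows (hypothetical Sale table, Python's built-in sqlite3). One index covers both query columns, most-used first, and EXPLAIN QUERY PLAN confirms the optimizer actually picks it up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sale (Id INTEGER PRIMARY KEY, Region TEXT, Year INTEGER, Amount REAL)")
# One composite index, most-queried column first, instead of one index per column.
conn.execute("CREATE INDEX IDX_Sale_Region_Year ON Sale (Region, Year)")
conn.executemany("INSERT INTO Sale (Region, Year, Amount) VALUES (?, ?, ?)",
                 [("north", 2017, 10.0), ("south", 2017, 20.0), ("north", 2016, 5.0)])

# Ask the optimizer how it would execute a query on both indexed columns.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT Amount FROM Sale WHERE Region = 'north' AND Year = 2017").fetchall()
print(plan)

# The plan's detail text names the index when a search (not a full scan) is used.
uses_index = any("IDX_Sale_Region_Year" in row[-1] for row in plan)
```

Checking the plan like this, rather than guessing, is the habit worth building: the same query against a table with no usable index would show a full scan instead.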

Bad Practice No. 8: Poor Naming Conventions

This is something programmers always struggle with when facing an existing database: understanding what information it stores from the names of its tables and columns because, often, there is no other way.

The table name must describe the entity it holds, and each column name must describe the piece of information it represents. This is easy, but it starts to get complicated when tables relate to each other. Names start to become messy, and the problem is made worse by confusing naming conventions with illogical rules (for instance, "column names must be 8 characters or less"). The final consequence is that the database becomes unreadable.

Therefore, a naming convention is always necessary if the database is expected to last and evolve with the application it supports, and here are some guidelines to establish a succinct, simple, and readable one:

  • No limitations on table or column name size. It is better to have a descriptive name than an acronym that no one remembers or understands.
  • Names that are equal have the same meaning. Avoid having fields that have the same name but with different types or meanings; this will be confusing sooner or later.
  • Unless necessary, don’t be redundant. For example, in the table “Item,” there is no need to have columns like “ItemName,” “PriceOfItem,” or similar names; “Name” and “Price” are enough.
  • Beware of DBE reserved words. If a column is to be called “Index,” which is a SQL reserved word, try to use a different one like “IndexNumber.”
  • If sticking to the simple primary key rule (a single auto-generated integer), name it "Id" in every table.
  • If joining to another table, define the necessary foreign key as an integer, named “Id” followed by the name of the joined table (e.g., IdItem).
  • If naming constraints, use a prefix describing the constraint (e.g., “PK” or “FK”), followed by the name of the table or tables involved. Of course, using underscores (“_”) sparingly helps make things more readable.
  • To name indexes, use the prefix “IDX” followed by the table name and the column or columns of the index. Also, use “UNIQUE” as a prefix or suffix if the index is unique, and underscores where necessary.

There are many database naming guidelines on the internet that will shine more light on this very important aspect of database design, but with these basic ones, you can at least get to a readable database. What is important here is not the size or the complexity of your naming guidelines but your consistency in following them!

Some Final Remarks

Database design is a combination of knowledge and experience; the software industry has evolved a lot since its early days. Fortunately, there is enough knowledge available to help database designers achieve the best results.

There are good database design guidelines all over the internet, as well as lists of bad practices and things to avoid. Just take your pick and stick to it.

And, don’t forget, it is only through experimentation, mistakes, and successes that you learn, so go ahead and start now.

About the author

Fernando Martinez, Colombia
member since April 24, 2017
Fernando is a systems and computing engineer who graduated from the University of Los Andes in 1987 and has worked in software development ever since. He started developing in C on the UNIX operating system with the ORACLE SQL database. Fernando has kept his skills up to date and has developed in Java, C#, SQL Server, and more.

Comments

Usama Ahmed
Awesome post.
Fernando Martinez Leyva
Thank you, Usama. I appreciate it.
Vadim
Great article. Thanks. The article is about relational DBs (RDBMS) and not DBs generally. I mostly agree with all the items and support the same philosophy in RDBMS design. But theory is sometimes quite different from practice. Here are a few notes from my experience:
1. Ignoring the Purpose of the Data - Yes and no. The main purpose of the RDBMS was a new layer of abstraction. That means the DB should not be aware of how it will be used, and as a result the same DB can be used in different applications. Nowadays, the DB is tightly connected to the app, so there is no reason to spend time designing a very abstract DB layer.
2. Normalization - As I belong to the old school, I totally agree with it. But nowadays we need performance, and normalization contradicts it. The solution is always a compromise between the level of normalization and performance.
3. Redundancy - It is bad, but again: performance :) Same as item 2.
4. Bad Referential Integrity - The RDBMS was created to support integrity, but sometimes constraints are so complicated that we are forced to implement them at the app level. And sometimes it is better to keep all integrity constraints in one layer (easier to debug later).
5. Not Taking Advantage of DB Engine Features - YES!!!!
6. Composite Primary Keys - Relational DB theory recognizes only composite primary keys (there was a big debate about it 15-20 years ago), but as you said, performance becomes a real problem in this case.
7. Poor Indexing - Indexes are created for specific queries; specific queries = a specific application. The same problem as in item 1.
8. Poor Naming Conventions - Totally agree. Even in our everyday language, different names for the same objects cause a lot of misunderstandings (sometimes wars :( )
Tabraiz Dada
Nice article Fernando - I strongly agree with point #3. Redundant records/columns are a mess, and too often we see people storing values that can be computed on the fly as fields in the database. With all the computing power on hand in today's day and age, we should avoid storing values in the DB that can be computed from existing data. Thanks
Ahtsham Raziq
Is this a resume or an article?
Fernando Martinez Leyva
Thank you for your comments, Vadim. I agree with you on almost everything. Regarding point #1, I have seen databases which are almost perfect (mathematically) but that fail to fulfill their purpose, mainly because the designer created an "abstract" correct design that did not work 100% when queries and transactions were applied to it (i.e., the "purpose" of the data). Regarding point #5, I even worked with a developer who did not trust the DBE locking mechanism and tried to implement one himself; of course, everything was a mess almost from the start, and I had to force him to go back to DBE locking after a great loss of time and work. It would have been funny if it hadn't affected our project the way it did! Anyway, thanks again for your comments.
Fernando Martinez Leyva
I don't understand your question. The article summarizes experiences I have had and, of course, it is not a resumé.
Fernando Martinez Leyva
Thank you Tabraiz. I agree; redundancy is a terrible thing in databases. I have seen cases where it is necessary, mainly for performance reasons, but it has to be treated with extreme care.
Roger Wolf
Articles such as this are definitely needed, not just written but also read. However, since declarative programming in general (and database design and development in particular) is all about signing a deal with the Devil, very careful wording is a must when one writes about such matters. In this case, some aspects are expressed in a bit too authoritative a manner, a sort of "thou shalt not" kind of thing (which practically never occurs in the RDBMS world, where everything is a trade-off). Others are just plain wrong.
"Redundancy" - except when introduced on purpose, it always goes hand in hand with insufficient normalisation. Was it worth a separate paragraph? Personally, I would list triggers instead. After having seen a database with all the logic implemented in triggers, it's very hard to believe they can do any good.
"Composite primary keys" - I take it you are confusing keys being primary or alternate with indices being clustered or nonclustered. If you are somehow forced to use an RDBMS in which the primary key always has to be clustered - well, you have my condolences. Thankfully, it's not this way everywhere.
"Poor indexing" - The phrase "each column that you use to access this table" is so vague that it's almost devoid of meaning. It's impossible to be understood correctly unless the reader is already an expert in the subject, which defeats the purpose of it being used here. On the other hand, there are many books entirely dedicated to indexing alone, so cramming all this knowledge into a couple of sentences without losing a crucial point or two is surely impossible.
"Poor naming conventions" - although I completely agree with you here, the topic is so controversial that mentioning any concrete recommendations is bound to make it sound subjective. In my experience, (almost) any naming convention is good as long as everyone on the team follows it. The uniform applicability of it is a point which is often overlooked.
Javier Pedroza
Great, this is a excellent post about database design, definitely all these steps are necessary to keep in mind, thank you very much fernando.
Fernando Martinez Leyva
Great comments, Roger! Thank you very much. I agree; I sound a bit authoritative sometimes but, believe me, I have suffered with databases showing the things I talk about, so, yes, I am not totally impartial about them. Best regards.
Fernando Martinez Leyva
Thank you, Javier. I hope it is useful to designers...
Umesh
Great Article :) Thank you Fernando :) It's very useful :)
Harry Bajwa
nice!! http://www.q-vantage.com
Fernando Martinez Leyva
Thank you, Harry. I hope you find it useful.
Fernando Martinez Leyva
Thank you, Umesh.
Avag Sargsyan
Great article!
Juan Francisco Giordana
The article lists best practices, but applications often demand that rules be bent.
Juan Francisco Giordana
Thanks for the article. It's been a while I didn't read one of these and it's always good to review from time to time. My 2 cents, it's often good to declare integer column values as UNSIGNED if negative values are not going to be used in the application, not only to take advantage of a higher range values, but also for indexes performance as well.
Алексей traster100
These are the usual tips for MySQL InnoDB
Fernando Martinez Leyva
Great suggestion Juan Francisco! Thank you.
Fernando Martinez Leyva
Not exactly, but very similar. Common sense is only one, right?
Fernando Martinez Leyva
I agree; especially with redundancy, denormalization, indexing, and others, it is sometimes necessary to bend the rules. What I have seen is that query performance is normally the main reason for that. Lastly, I have used views to denormalize (which also implies redundancy without compromising the database design), but of course "strange" indexing sometimes cannot be avoided. Regarding naming conventions, I try to be 100% strict because this is what makes a database readable and understandable. Thanks a lot for your comments.
Алексей traster100
yes, of course