VIVEK SHAH

Introduction to Database Indexes

Posted by vivek shah

Tuesday, March 15, 2011

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of slower writes and increased storage space. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.

Put simply, database indexes help speed up retrieval of data. The other great benefit of indexes is that your server doesn’t have to work as hard to get the data. They are much the same as book indexes, providing the database with quick jump points on where to find the full reference (or to find the database row).

There are both advantages and disadvantages to using indexes, however.

One disadvantage is they can take up quite a bit of space – check a textbook or reference guide and you’ll see it takes quite a few pages to include those page references.

Another disadvantage is that using too many indexes can actually slow your database down. Thinking of a book again, imagine if every “the”, “and” or “at” was included in the index. That would stop the index from being useful: the index would become as big as the text! On top of that, each time a page or database row is added, updated or removed, the reference or index also has to be updated.

So indexes speed up finding data, but slow down inserting, updating or deleting data.

Some fields are indexed automatically. A primary key, or a field marked as ‘unique’ (for example an email address, a userid or a social security number), is automatically indexed so the database can quickly check that you’re not about to introduce bad data, such as a duplicate value.
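SQLite makes these automatic indexes visible, so the claim is easy to check. This minimal sketch uses SQLite through Python's built-in sqlite3 module as a stand-in for MySQL or PostgreSQL (an assumption; each database reports its indexes differently), with a hypothetical `members` table. `PRAGMA index_list` shows the indexes SQLite built for the primary key and the unique column without any CREATE INDEX statement, and the unique index is what rejects a second copy of the same email address.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with one primary key column and one UNIQUE column.
# INT (rather than INTEGER) PRIMARY KEY makes SQLite build a separate
# index for the key instead of reusing its internal rowid.
conn.execute("""
    CREATE TABLE members (
        userid INT PRIMARY KEY,
        emailaddress VARCHAR(255) UNIQUE
    )
""")

# No CREATE INDEX was issued, yet SQLite reports two indexes.
# Each row is (seq, name, unique, origin, partial); origin is
# 'pk' for a primary key and 'u' for a UNIQUE constraint.
auto_indexes = conn.execute("PRAGMA index_list('members')").fetchall()
origins = sorted(row[3] for row in auto_indexes)
print(origins)

# The unique index is what lets the database reject bad data quickly.
conn.execute("INSERT INTO members VALUES (1, 'email@domain.com')")
try:
    conn.execute("INSERT INTO members VALUES (2, 'email@domain.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```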

So when should a database field be indexed?

The general rule: index any field that is used to limit the number of results you’re trying to find.

It’s hard to generalise so we’ll look at some specific but common examples.

Note – the database tables shown below are used as an example only and will not necessarily be the best setup for your particular needs.

Note: The SQL code shown below works with both MySQL and PostgreSQL databases.

In a database table that looks like this:

CREATE TABLE subscribers (
subscriberid INT PRIMARY KEY,
emailaddress VARCHAR(255),
firstname VARCHAR(255),
lastname VARCHAR(255)
);

if we want to quickly find an email address, we create an index on the emailaddress field:

CREATE INDEX subscriber_email ON subscribers(emailaddress);

… and any time we want to find an email address:

SELECT firstname, lastname FROM subscribers WHERE emailaddress='email@domain.com';

… it will be quite quick to find!
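One way to confirm the index is actually being used is to ask the database for its query plan. The sketch below uses SQLite through Python's built-in sqlite3 module as a stand-in (an assumption; MySQL and PostgreSQL have their own EXPLAIN commands with different output) and compares the plan for the email lookup before and after creating `subscriber_email`. A SCAN reads every row; a SEARCH uses an index to jump to the matching ones.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE subscribers (
        subscriberid INT PRIMARY KEY,
        emailaddress VARCHAR(255),
        firstname VARCHAR(255),
        lastname VARCHAR(255)
    )
""")

lookup = "SELECT firstname, lastname FROM subscribers WHERE emailaddress = ?"
email = ("email@domain.com",)

# No index on emailaddress yet: SQLite has to SCAN every row.
plan_before = query_plan(conn, lookup, email)

conn.execute("CREATE INDEX subscriber_email ON subscribers(emailaddress)")

# Now SQLite can SEARCH the index and jump straight to the matching rows.
plan_after = query_plan(conn, lookup, email)

print(plan_before)
print(plan_after)
```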

Another reason for creating indexes is for tables that reference other tables. For example, in a CMS you might have a news table that looks something like this:

CREATE TABLE newsitem (
newsid INT PRIMARY KEY,
newstitle VARCHAR(255),
newscontent TEXT,
authorid INT,
newsdate TIMESTAMP
);

and another table for authors:

CREATE TABLE authors (
authorid INT PRIMARY KEY,
username VARCHAR(255),
firstname VARCHAR(255),
lastname VARCHAR(255)
);

A query like this:

SELECT newstitle, firstname, lastname FROM newsitem n, authors a WHERE n.authorid=a.authorid;

… will take advantage of an index on the newsitem authorid:

CREATE INDEX newsitem_authorid ON newsitem(authorid);

This allows the database to very quickly match the records from the ‘newsitem’ table to the ‘authors’ table. In database terminology this is called a table join – you should index any fields involved in a table join like this.

Since the ‘authorid’ in the authors table is a primary key, it is already indexed. The same goes for the ‘newsid’ in the newsitem table, so we don’t need to look at those cases.
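The sketch below (SQLite via Python's sqlite3 module as a stand-in, an assumption) builds the two tables and asks for the plan of the join above, narrowed with an extra `a.authorid = ?` filter for a hypothetical author so the plan stays short. With `newsitem_authorid` in place, SQLite can reach the matching newsitem rows through that index instead of reading the whole newsitem table; the authors side uses the index SQLite builds automatically for its primary key.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE newsitem (
        newsid INT PRIMARY KEY,
        newstitle VARCHAR(255),
        newscontent TEXT,
        authorid INT,
        newsdate TIMESTAMP
    )
""")
conn.execute("""
    CREATE TABLE authors (
        authorid INT PRIMARY KEY,
        username VARCHAR(255),
        firstname VARCHAR(255),
        lastname VARCHAR(255)
    )
""")
conn.execute("CREATE INDEX newsitem_authorid ON newsitem(authorid)")

# The join from the article, narrowed to one (hypothetical) author.
join = """
    SELECT newstitle, firstname, lastname
    FROM newsitem n, authors a
    WHERE n.authorid = a.authorid AND a.authorid = ?
"""
join_plan = query_plan(conn, join, (1,))
for detail in join_plan:
    print(detail)
```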

On a side note, table aliases make it a lot easier to see what’s happening. Using ‘newsitem n’ and ‘authors a’ means we don’t have to write:

SELECT newstitle, firstname, lastname FROM newsitem, authors WHERE newsitem.authorid=authors.authorid;

For more complicated queries, where more tables are referenced, this can be extremely helpful and makes things much easier to follow.

In a more complicated example, a news item could exist in multiple categories, so in a design like this:

CREATE TABLE newsitem (
newsid INT PRIMARY KEY,
newstitle VARCHAR(255),
newscontent TEXT,
authorid INT,
newsdate TIMESTAMP
);

CREATE TABLE newsitem_categories (
newsid INT,
categoryid INT
);

CREATE TABLE categories (
categoryid INT PRIMARY KEY,
categoryname VARCHAR(255)
);

This query:

SELECT n.newstitle, c.categoryname FROM categories c, newsitem_categories nc, newsitem n WHERE c.categoryid=nc.categoryid AND nc.newsid=n.newsid;

… will list each news title alongside the name of every category it belongs to.

To make this particular query fast we need to check we have an index on:

newsitem newsid
newsitem_categories newsid
newsitem_categories categoryid
categories categoryid

Note: Because the newsitem newsid and the categories categoryid fields are primary keys, they already have indexes.

That leaves the “join” table, newsitem_categories, which is where we need to check for indexes.

This will do it:

CREATE INDEX newscat_news ON newsitem_categories(newsid);
CREATE INDEX newscat_cats ON newsitem_categories(categoryid);

Alternatively, we could create a single index covering both columns:

CREATE INDEX news_cats ON newsitem_categories(newsid, categoryid);

However, a multi-column index like this can only be used in certain ways. A query against the table that filters on both ‘newsid’ and ‘categoryid’ will be able to use this index, and so will a query that filters only on ‘newsid’.

A query against that table that filters only on ‘categoryid’ will not be able to use the index.

For a table like this:

CREATE TABLE example (
a int,
b int,
c int
);

With this index:

CREATE INDEX example_index ON example(a,b,c);

It will be used when you check against ‘a’.
It will be used when you check against ‘a’ and ‘b’.
It will be used when you check against ‘a’, ‘b’ and ‘c’.
It will not be used if you check against ‘b’ and ‘c’, or if you only check ‘b’ or you only check ‘c’.
It will be used when you check against ‘a’ and ‘c’ but only for the ‘a’ column – it won’t be used to check the ‘c’ column as well.
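SQLite's EXPLAIN QUERY PLAN makes the list above easy to check. This sketch (SQLite via Python's sqlite3 module, standing in for MySQL or PostgreSQL, whose EXPLAIN output looks different) prints one plan per case. SEARCH means the index was used to seek to matching rows, and the part in brackets shows which columns it sought on; SCAN means every entry was read. In the cases that can't seek, SQLite may still read through the whole of `example_index` as a smaller copy of the table ("SCAN ... USING COVERING INDEX"), which is not the same as using it to find rows. The sketch assumes no ANALYZE statistics have been gathered, since with them SQLite can sometimes apply a "skip-scan" to the leading column.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (a INT, b INT, c INT)")
conn.execute("CREATE INDEX example_index ON example(a, b, c)")

# Which columns each query checks, and the WHERE clause that checks them.
cases = {
    "a": "a = 1",
    "a, b": "a = 1 AND b = 2",
    "a, b, c": "a = 1 AND b = 2 AND c = 3",
    "b, c": "b = 2 AND c = 3",
    "b": "b = 2",
    "c": "c = 3",
    "a, c": "a = 1 AND c = 3",
}

# One-table queries produce a single plan row each.
plans = {
    name: query_plan(conn, f"SELECT a, b, c FROM example WHERE {where}")[0]
    for name, where in cases.items()
}
for name, detail in plans.items():
    print(f"{name:8} -> {detail}")
```

Note the "a, c" case: the plan seeks on `(a=?)` only, matching the last item in the list above.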

A query against ‘a’ OR ‘b’ like this:

SELECT a,b,c FROM example WHERE a=1 OR b=2;

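… will at best be able to use the index for the ‘a’ column; it won’t be able to use it for the ‘b’ column. Since the rows where b=2 can’t be found through the index, the database will often end up scanning the whole table anyway.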
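SQLite (used here through Python's sqlite3 module as a stand-in, an assumption) shows the effect directly: because the `b = 2` branch can't seek into `example_index`, the planner falls back to a full SCAN for the whole OR query rather than a SEARCH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (a INT, b INT, c INT)")
conn.execute("CREATE INDEX example_index ON example(a, b, c)")

# The fourth column of each EXPLAIN QUERY PLAN row is the readable detail.
or_plan = [
    row[3]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT a, b, c FROM example WHERE a = 1 OR b = 2"
    )
]
print(or_plan)
```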

Multi-column indexes have quite specific uses, so check their use carefully.

Now that we’ve seen when we should use indexes, let’s look at when we shouldn’t. Indexes can actually slow your database down (and some databases will simply ignore an index if there’s no benefit in using it).

A table like this:

CREATE TABLE news (
newsid INT PRIMARY KEY,
newstitle VARCHAR(255),
newscontent TEXT,
active CHAR(1),
featured CHAR(1),
newsdate TIMESTAMP
);

… looks pretty standard. The ‘active’ field tells us whether the news item is active and ready to be viewed on the site.

So… should we create an index on this field for a query like this?

SELECT newsid, newstitle FROM news WHERE active='1';

No, we shouldn’t.

If most of your content is live, this index will take up extra space and can slow the query down, because almost all of the rows match this criterion. Imagine 500 news items in the database with 495 of them active. It’s quicker to read the whole table and eliminate the 5 that aren’t active than it is to follow the index to each of the 495 active ones in turn (and if you do have an index on the ‘active’ field, some databases will ignore it anyway because using it would slow the query down).

The ‘featured’ field tells us whether the news item should feature on the front page. Should we index this field? Yes. Most of our content is not featured, so an index on the ‘featured’ column will be quite useful.
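To put numbers on the ‘active’ versus ‘featured’ comparison, this sketch measures what fraction of rows each condition matches (its selectivity). It uses SQLite through Python's sqlite3 module, trims the news table down to the two flag columns, and reuses the 500 items / 495 active figures from above; the choice of 5 featured items is a hypothetical assumption. A condition matching 99% of rows gives an index almost nothing to skip, while one matching 1% lets it skip nearly everything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down news table: only the columns needed for this comparison.
conn.execute("""
    CREATE TABLE news (
        newsid INT PRIMARY KEY,
        active CHAR(1),
        featured CHAR(1)
    )
""")

# Hypothetical data: 500 items, 495 active (the figures used above) and,
# as an assumption, only the first 5 items featured.
rows = [
    (i, "1" if i <= 495 else "0", "1" if i <= 5 else "0")
    for i in range(1, 501)
]
conn.executemany("INSERT INTO news VALUES (?, ?, ?)", rows)


def selectivity(conn, column):
    """Fraction of rows in news where the given flag column equals '1'."""
    total = conn.execute("SELECT COUNT(*) FROM news").fetchone()[0]
    # column only ever comes from the fixed names below, never user input.
    matching = conn.execute(
        f"SELECT COUNT(*) FROM news WHERE {column} = '1'"
    ).fetchone()[0]
    return matching / total


active_fraction = selectivity(conn, "active")
featured_fraction = selectivity(conn, "featured")
print(f"active:   {active_fraction:.0%} of rows match")
print(f"featured: {featured_fraction:.0%} of rows match")
```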

Another reason to index a field is if you’re going to order by it in a query. To get the most recent news items, we do a query like this:

SELECT newstitle, newscontent FROM news ORDER BY newsdate DESC;

Creating an index on ‘newsdate’ will allow the database to read the rows in date order straight from the index instead of sorting the results every time. Indexing can be a bit tricky to get right; however, each database has tools to help you work out whether an index is working as it should (for example, the EXPLAIN command in both MySQL and PostgreSQL).
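In SQLite (used here via Python's sqlite3 module as a stand-in, an assumption), the effect shows up in EXPLAIN QUERY PLAN: without an index the plan includes a separate "USE TEMP B-TREE FOR ORDER BY" sorting step, and once an index on newsdate exists the rows can be read in order from it instead. The index name `news_newsdate` is hypothetical, and the `LIMIT 10` ("the ten most recent items") is an added assumption that makes the planner's choice clear-cut.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE news (
        newsid INT PRIMARY KEY,
        newstitle VARCHAR(255),
        newscontent TEXT,
        active CHAR(1),
        featured CHAR(1),
        newsdate TIMESTAMP
    )
""")

latest = (
    "SELECT newstitle, newscontent FROM news "
    "ORDER BY newsdate DESC LIMIT 10"
)

# Without an index, SQLite scans the table and sorts in a temporary B-tree.
plan_before = query_plan(conn, latest)

conn.execute("CREATE INDEX news_newsdate ON news(newsdate)")

# With the index, rows come out already ordered (read backwards for DESC).
plan_after = query_plan(conn, latest)

print(plan_before)
print(plan_after)
```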

Well, there you have it: my introduction to database indexes. Hopefully you’ve learned something from this article and can apply what you’ve learned to your own databases.

Labels: Database Concepts
