The Great Debate: Indexing vs. Relational Databases for Email Archiving
Archive vendors have long debated the merits of using indexing technology vs. relational databases to archive ESI. As a result, I’m often asked what kind of database M+Archive uses. In fact, many clients even make a specific database a mandatory requirement in their RFPs.
The confusion arises mainly because other archive vendors rely on an enterprise database such as Oracle, MySQL (which happens to be owned by Oracle), MS SQL Server, etc. Clients need to know in advance which database a vendor uses so they can anticipate the additional costs: licensing the database, hiring expert DBAs to manage a large enterprise database, handling support issues, etc.
Why relational databases and email archiving are a mismatch
Relational databases are great for storing structured data, like the expense line items in a budget spreadsheet. The problem is that most ESI, such as email, is unstructured data. It just doesn't fit properly in a database. Add to that the fact that email archives can run into hundreds of millions of documents. Imagine the headache of managing a database of that size.
The good news is M+Archive doesn’t need a database. Messaging Architects' use of indexing for M+Archive is unique: as far as I know, no other vendor relies on indexing technology alone.
So how is search performed? That’s where indexing comes in. All archived files are indexed by the M+Archive Indexing Server. Every search runs against the index, which is how results come back in less than a second. For a client, this means you don’t need to hire a fancy DBA, and there's no Oracle license to buy or renew. Eliminating the database dependency simplifies things greatly, and the performance is scorching fast. Indexing technology was built to handle billions of documents with unstructured content.
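For readers who want to see the idea behind searching an index rather than a database, here is a minimal sketch of an inverted index in Python. It is not M+Archive's implementation, which isn't described in this post. The function names and the sample emails are made up for illustration, and a production indexer would add much more, such as stemming, ranking, and on-disk storage.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Look up each term's document set and intersect them, smallest first.
    postings = sorted((index.get(term, set()) for term in terms), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result


# Hypothetical sample messages, keyed by a made-up message ID.
emails = {
    1: "Q3 budget review attached, please confirm the merger timeline",
    2: "Lunch on Friday?",
    3: "Merger documents are confidential until the budget is approved",
}

index = build_index(emails)
print(search(index, "merger budget"))
```

The point of the design is that a query only touches the entries for the terms being searched, not every message in the archive, so lookup cost depends on how many messages match rather than on the total size of the archive.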
The best way to understand indexing is to think Google. There is no way Google would exist if it had to rely on traditional relational databases. In fact, its search engine relies on a proprietary distributed storage system called Bigtable. The indexing technology M+Archive uses also powers another public search engine that has indexed more than 8 billion Web pages.
Indexing just makes sense, especially in the context of archiving hundreds of millions of records where a few terms hidden in one of these records can make or break a litigation case.
– Ranjit Sarai