Thoughts on Data Abstraction

Something that I've been working on a lot with both my work and personal web programming is data abstraction. The opportunity to work with several different handmade php frameworks has given me different ideas on what an effective data abstraction system should have. Without formal education, though, this post will be only based on my hands-on experience with the functionality I find myself using without true design patterns.

Data abstraction is a fancy word for treating the data as a separate entity from your web application logic. Web sites usually store their data in databases, using customized queries to pull information for their code to output and manipulate information. It doesn't matter what type of database or language you're working with; data abstraction can be employed whenever there is interaction with data (this post will be approaching this from a rough PHP - MySQL angle). Complex sites often have numerous queries scattered throughout the web site, and one needs to have intimate knowledge of the database structure during the development process in order to update and maintain the site. Also, any changes to the database requires plenty of rewrite and debugging to keep the code functional.

The easiest way to start with simplifying the data level of a web site is to create a centralized function or class to route all of the queries through. This first step will allow you to centralize the data connection logic and possibly add tracking to queries as they are passed in. While this step in abstraction will save you some coding (and in the case of PHP programming, possible repeated mysql_connect statements), you still have queries randomly tossed in with your code.

The main goal of data abstraction is to remove data storage dependency from the application logic of your code. If you wanted to change the structure or rename a column in a table, you would have to hunt down all of the queries that use the table and rewrite them. To make things worse, if you wanted to change the structure of your database or switch from MySQL to another database engine (like Oracle or MSSQL), there's a good chance that you'd have to rewrite your entire web application. By advanced use of data abstraction, you will minimize the need for rewrites during these changes and simplify your data structure.

After the first step of a centralized connection class there are a few possible routes. A great abstraction technique I've worked with involved defining a single database table with each model, using php __get() and __set() to select/manipulate the data. Each model class had it's own validation and error generation logic, which could be simplified through a single abstract class extension. This technique does not follow DRY principles, but is an effective and easy to replicate as the table and site grew in size.

While thinking of the models in data abstraction as single tables, and each instance a row, you lose out on a really powerful tool of SQL - the join. You could do a separate query on certain gets to instantiate new objects... For example, I would create a new BlogCategory object when I call $blog->category on a post (I save my categories in a separate table then my blog posts). This still results in two separate queries when it could be a single join on the category_id. Instead, I think a much smarter way to do abstraction is to break out of thinking of a model as a single table and thinking of them as pieces of data. A blog post has a category and several tags, so when I instantiate the model, it should join across the tables and have all of this data without repeat queries.

A more advanced solution would involve instantiating a complex object based off of several tables instead of a single model. You could always use lazy methods to delay extra queries to joined tables, but this step would allow you to have more complete abstraction of the data. You don't care what keys join tables together, but you know that a specific model has some properties/methods that are attached to other models.

No matter how one implements data abstraction, it's important to weigh the pros and cons. SQL queries can be incredibly powerful with joins, functions, and subqueries. While the benefits of abstraction can lead to faster application development and easier debugging, there's a point where you need to decide if it's worth adding complex logic to the data layer or rethink your application's needs.