If I write all my SQL statements in PHP code is this ideal for performance?
My first question in response to this is what is considered “ideal” for performance? Having worked as a technical performance specialist for a variety of companies, we need to look more closely at what is considered adequate performance. The true measure of performance is the time it takes to generate a report, respond to a query and so on (response times). Each one of these have different expectations on response times. For example, having a report processing millions of records in a database to summarize is expected to take several seconds/minutes and in some cases hours to complete. However, an online transaction processing (OLTP) solution cannot have the end user waiting minutes or hours to complete the purchase of a product.
Expectations should be clearly drawn up before working on performance related issues.
Before working on performance related issues you need to understand the architecture that you are dealing with and understanding how your application interacts with the hosts and the server processes.
Recently I have been developing applications with databases (MySQL) and PHP. Your server architecture will play a huge role in how your application should be developed. For example, some hosting companies provide both the web server and the database server on the same machine. Access to the database is made via internal memory structures which are quick. However, some hosting companies separate the web servers from the databases and as a result applications need to establish a connection to the database and then send the database SQL statements across the network. This can significantly slow down the application.
An application connected to a database has two very important measurements:
- Response time
The response time is the time lag between an application sending a request and receiving a response. For example, picture a water pipe. You put water in at one end and you time the water arriving at the other end. This is your half your response time. It does not matter how much water you put in it will still arrive at the same time. The response time is directly proportional to the length of the pipe i.e. the distance it needs to cover.
Throughput on the other hand is the size of the pipe. Imagine a straw sized pipe. The amount of water reaching the other end is limited by the volume of water contained by the pipe. The larger the pipe the more water will pour through. This is your throughput. The time it takes to pour the water through on both pipes is still the same.
Localhost is a synonym for a local based connection to the server process. This is very quick as access is typically via memory. In cases where the database server is on the same machine as the application, the pipe analogy above would give us a very large pipe with both ends close together.
In cases where the database server exists on a separate host than the application then there are several things you need to consider:
- The response time between sending a request and receiving the response
- The throughput of data available from the database server
Imagine that a response time of 100ms between sending a message across the network and received by the database. The database processes the information and sends its response back to the calling application. This may seem like a small amount of time, but what happens, as many applications do, it sends another SQL statement based on the response from the first query. Several 100ms roundtrips add up very quickly.
Likewise, imagine a lot of data is either sent to the database server or received back from the database server. Imagine a 3Mbps (mega bits per second) connection (this translates to approximately 300 kilobytes per second). To transfer 1MB of data will take approximately 3 seconds. If the database server exists in the same data center as the application host it typically has a 1Gbps connection (or approximately 100Mbytes per second) which is often advertised. This can be quite misleading because all traffic inside the data center will use this 1Gbps connection and your application will only have a share of this.
If database server and application exist on different machines is there anything that can be done to improve performance?
Indeed there is. Database servers have been designed to allow stored procedures to be created and executed. If you shift the work from the application into the database then you can reduce substantially the response times and the data transfer between the application and database.
Consider the following scenario. You are building logic into the application whereby you are attempting to add a new contact into your database. To avoid duplicate entries you want to first verify that the contact name does not already exist in the database. If it does you then update the remainder of the information i.e. perform and update. If there is not existing entry then you insert a new record into the database. Without using stored procedures you would have the application send the SQL statement (does contact record already exist) across the network to the database. The database then responds back to the application that it either exists or it does not. It then makes a decision based on the response from the database and either generates an INSERT SQL statement or an UPDATE SQL statement. In this case two statements are transferred over the network, parsed by the database server and then executed. The results are then sent back and more SQL statements are executed. In the case of a stored procedure a call is made to the stored procedure which has already been parsed and simply executes. Upon completion of execution it then returns a response to the calling application. This can significantly improve performance, especially if the response times are large and the volume of data exchanged is large which is used to make some form of decision.
Another advantage of using stored procedures for any UPDATE, DELETE or INSERT statements is that all the applications business logic is maintained within the database. The business logic is the logic that is followed when data is being added into a database.
On the flip side, maintaining code at two locations for an application can become complex, the application code and the database stored procedures. Their interactions need to be fully understood as a change in one location can have substantial implications in the others.
Look out for further whitepapers on developing database stored procedures and other performance enhancement tips and tricks.
Ronan Cashell is a Senior Database Administrator/Technical Project manager with over 20 years of experience working in IT both as a database administrator and a UNIX/Linux administrator. He is a certified Oracle DBA and certified Linux administrator.