This is an improvement / follow up to PHP: Hackers Paradise. You might like to read it first, or refer to it at various times since I choose to repeat myself as little as possible.
PHP (http://www.php.net) is a powerful server side web scripting solution. It has quickly grown in popularity and according to the February 2001 usage stats PHP is installed on 19.8% of all web sites (up 7% from when I gave a similar talk last year). Much of its syntax is borrowed from C, Java and Perl with some unique PHP-specific features thrown in. The goal of the language is to allow web developers to write dynamically generated pages quickly.
Being a good PHP hacker isn't just about writing single line solutions to complex problems. For example, web gurus know that speed of coding is much more important than speed of code. In this article we'll look at techniques that can help you become a better PHP hacker. We'll assume that you have a basic knowledge of PHP and databases.
The main topics that I want to cover today are:
Some of these were covered in more detail in PHP: Hackers Paradise. This revisit has refined a number of ideas, made the transition to PHP4 and focuses a lot more on first principles and good code structure for web applications.
There are two key ways to be lazy. First, always use existing code when it is available; just integrate it into your standards and project. Second, develop a library of helpful functions that let you be lazy in the future.
PHP Code Exchange - http://px.sklar.com
PHP Classes Repository - http://phpclasses.upperdesign.com
PHP Knowledge Base - http://php.faqts.com
PHP Mailing List Archive - http://www.progressive-comp.com/Lists/?l=php3-general&r=1&w=2
There are a number of useful code modules you can use that I used as part of the PHP: Hackers Paradise talk. They include things like:
Once you start doing serious PHP programming though it is very important to revisit the PHP language from a concept of first principles. By understanding the basics under the hood of the language you will glean a greater insight into its power and avoid many of the pitfalls that can reveal themselves over time.
Typically as you discover and research some of the basic PHP concepts you will look at your old code (which mostly worked) and discover things that could have been done in much safer and more reliable ways.
The main thing to remember is that just because you can use any type of programming paradigm, that doesn't mean that you should. Consistency of style and programming practice is essential if you want to build sites that are maintainable. Besides, you don't want to spend the rest of your life trying to work out what the hell you were thinking.
Web programming requires everything right through from structured backend code to hackable frontend scripts. Typically I like to use classes and objects at the backend to build a high-level functionality that I can quickly utilize from frontend scripts. I use a fuzzy middle layer of functions to fill the gaps that inevitably appear between these layers.
Function and class names are case insensitive. Even all of the built in PHP functions are case insensitive.
function foo () { return 'foo'; }
function FOO () { return 'FOO'; } // error: function foo() already declared
Variables, on the other hand, are case sensitive.
$foo = 'foo';
$FOO = 'FOO';
echo "$foo:$FOO"; // prints foo:FOO
PHP parses one file at a time. Thus, you can use any function or class that has already been declared or is declared as part of the current file. So, this is valid:
foo();
function foo () { echo 'foo'; }
Note that if we move the foo() function into an include file the above ordering will no longer work regardless of whether we use include() or require() since PHP will attempt to run that step after the call to the function foo(). So, this will not work:
foo();
include('./foo.inc');
It is important to note that the above does not hold true for classes, even when declared in the same file. So, this will produce an error:
$b = new b;
echo $b->foo;
class b extends a { var $foo = 'b'; }
class a { var $foo = 'a'; }
But, we can extend a class that hasn't yet been defined. So this is valid:
class b extends a { var $foo = 'b'; }
class a { var $foo = 'a'; }
$b = new b;
echo $b->foo;
When a file is include()ed, the code it contains inherits the variable scope of the line on which the include() occurs. Any variables available at that line in the calling file will be available within the called file. If the include() occurs inside a function within the calling file, then all of the code contained in the called file will behave as though it had been defined inside that function. The same is true for require().
Code declared within a function works in an interesting way. Basically, PHP lets you declare constructs such as functions and classes within a function but then makes them available in the global space only after the function has been executed. So, this is valid:
function foo () {
    function bar () { return 'bar'; }
    return 'foo';
}
// if I call bar() before calling foo() PHP would throw an error
echo foo();
// bar() is available in the global space now that foo() has been called
echo bar();
We also need to be careful about how we use variables in function definitions. For example, you write a function to do all of the includes for your site:
function my_include ($name) { require("/my/include/path/$name.inc"); }
Now, any variables that you declare seemingly in the global space in these require()d files will actually only exist in the scope of the my_include function. To make them global, they must be explicitly declared as such. For example, imagine foo.inc:
// using foo in what would appear to be global space (but actually depends on how we are included)
echo "$foo\n";
and its calling file main.phtml:
$foo = 'global foo';
function test () {
    $foo = 'test function foo';
    include('./foo.inc');
}
test(); // prints "test function foo";
include('./foo.inc'); // prints "global foo";
Now if we change foo.inc to explicitly declare the variable as global (since that was the intention):
global $foo;
echo "$foo\n";
we would get the following output from main.phtml:
global foo
global foo
The scope of variables in PHP is basically identical to that in other languages. The only difference is that global variables must be explicitly declared inside a function before they are used there. This is because they are implemented as references which I will discuss in more detail later.
The last interesting effect on scope is eval() statements. eval() is basically the same as pretending that you have a file for the string and then include()ing the file. It is executed in exactly the same way. That means: it is run in the scope of where it is called, any definitions in the script are available to the eval, any definitions made in the eval are then available in the script. So, be careful of variable scope and definition conflicts.
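A minimal sketch of those eval() scoping rules. The names used here (run_snippet, $local, $seen, defined_in_eval) are made up for illustration and are not part of any library.

```php
<?php
// Hypothetical names throughout; this only illustrates eval() scoping.
function run_snippet()
{
    $local = 'set inside run_snippet';

    // The eval'd code runs in this function's scope: it can read $local,
    // and the variable it assigns ($seen) becomes local to run_snippet().
    eval('$seen = $local; function defined_in_eval() { return "declared by eval"; }');

    return $seen;
}

$result = run_snippet();
echo $result, "\n";

// The function declared inside the eval'd string is now available to the script.
echo defined_in_eval(), "\n";
```

Calling run_snippet() a second time would try to declare defined_in_eval() again and fail, which is exactly the kind of definition conflict to watch for.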
Variables that have not been set are defined as NULL. If you use a variable that has not been set then PHP will treat the NULL like it is an empty string (unless you are doing tests on its type). But, don't depend on your variables being empty unless you have explicitly set them to be so. Users can pass arbitrary variable values into your script using the GET and POST HTTP methods.
The big gotcha to watch out for with types is mismatches. Equality (==) of two variables is not the same as identity (===). There is more about this in the know your data section below.
You can explicitly cast and convert between types in PHP when necessary. The main time you need to think about doing this is when you are trying to do comparisons between variables of unknown or different types.
A good example of where types can get tricky is with database operations. Any return value from a database will always be represented in PHP as a string. The type in the database is irrelevant, immediately upon its return to PHP the value will be a string. Implicit conversion will treat this string as an integer or however you might expect, but under the hood it is still a string. There is one notable exception however. If you get a NULL return value from the database then PHP will create a variable of type NULL (ie: effectively the variable is not set).
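As a rough illustration, the array below stands in for a row returned from a database in the way just described; it is simulated data, not a real database call, and the column names are assumptions.

```php
<?php
// Simulated result row: every column arrives as a string, except SQL NULL.
// (Hypothetical data; no real database call is made here.)
$row = array('id' => '42', 'visits' => '0', 'url' => null);

// Equality converts types first, identity does not.
var_dump($row['id'] == 42);     // bool(true)
var_dump($row['id'] === 42);    // bool(false): string versus integer

// Cast explicitly before an identity comparison.
$id = (int) $row['id'];
var_dump($id === 42);           // bool(true)

// The NULL column is the exception: it is not a string at all.
var_dump(is_null($row['url'])); // bool(true)
var_dump(isset($row['url']));   // bool(false)
```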
The classic debugging technique for PHP (and many other languages) is through echo statements. Just dump out information or the contents of your variables at various points in the script. There are some tricks that you can use to make debugging more effective:
There are now some PHP debuggers coming onto the market that allow you to step through PHP code line by line. Since PHP always runs on the server side these generally require some sort of debugging server to be installed.
To get around this problem PHP lets you create references to data. For example, you can make $b reference the same data as $a with this command:
$b =& $a;
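A short sketch of what that assignment means in practice, compared with a plain copy. The variable values are arbitrary.

```php
<?php
$a = 'original';
$b =& $a;          // $b and $a now name the same data

$b = 'changed';
echo $a, "\n";     // prints changed

$c = $a;           // a plain assignment copies the value instead
$c = 'copy only';
echo $a, "\n";     // still prints changed
```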
An example of a reference that you've all been using for years is this:
$foo = "foo\n";
function unset_foo () {
    global $foo;
    unset($foo);
}
unset_foo();
echo $foo;
Unexpectedly this will print the following:
foo
This is because global $foo just creates a reference to the global variable $foo. Unsetting the reference does nothing to unset the other reference that is in the global space.
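If the intention really is to remove the global variable from inside a function, one option is to go through the $GLOBALS array instead of a global declaration. The sketch below assumes it runs at the top level of a script, and the function names are hypothetical.

```php
<?php
$foo = "foo\n";
$bar = "bar\n";

function unset_local_reference()
{
    global $foo;
    unset($foo);              // removes only this function's reference
}

function unset_global_variable()
{
    unset($GLOBALS['bar']);   // removes the entry in the global symbol table
}

unset_local_reference();
unset_global_variable();

var_dump(isset($foo));   // bool(true)
var_dump(isset($bar));   // bool(false)
```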
Another interesting example is in the creation of a new object. This familiar code also does a copy, not a reference assignment:
$b = new b();
So, the constructor for the class b will have created an object that is then immediately copied into the variable $b and the first object is wasted since it no longer has any references. You can avoid this copy by instead assigning the value of $b by reference:
$b =& new b();
I've just scratched the surface of references here. They are a minefield with lots of surprises. I know there are big gains to be made through careful use of references. I did some quick timing tests and found a simple function pass and return by reference example that was about 200x faster than passing and returning by copy. Read through the PHP manual on references:
then re-read it and try some examples and re-read it again.
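As a first example to try, here is a small sketch of passing an argument by copy versus by reference. The helper names are hypothetical, and the sketch makes no claim about speed; it only shows the difference in behaviour.

```php
<?php
// Hypothetical helper names; the point is the & in the parameter list.
function add_item_by_copy(array $list, $item)
{
    $list[] = $item;     // modifies the function's own copy
    return $list;        // the caller must use the return value
}

function add_item_by_reference(array &$list, $item)
{
    $list[] = $item;     // modifies the caller's array directly
}

$items = array('a');
add_item_by_copy($items, 'b');        // return value ignored, $items unchanged
add_item_by_reference($items, 'c');

echo implode(',', $items), "\n";      // prints a,c
```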
Most of the time data manipulation in PHP is so easy that you don't even need to think about it. Most of the time. The rest of the time it can cause bugs and problems that you may never have anticipated and will have lots of trouble tracking down. As PHP hackers we need to build up an awareness of the data which just ticks away in the back of our brains so we can avoid these problems before they arise.
PHP4 has introduced a concept of identity to the language. Two values are identical if they have the same content and the same type. Equality is based on two values having the same content after they have been converted to the same type. Here are some examples:
echo "<p>undefined variable is".($a == '' ? '' : ' not')." equal to empty string"; // equal
echo "<p>false is".(false == '' ? '' : ' not')." equal to empty string"; // equal
echo "<p>number zero is".(0 == '' ? '' : ' not')." equal to empty string"; // equal
echo "<p>string zero is".('0' == 0 ? '' : ' not')." equal to number zero"; // equal
echo "<p>string foo is".('foo' == 0 ? '' : ' not')." equal to number zero"; // equal
echo "<p>string foo123 is".('foo123' == 0 ? '' : ' not')." equal to number zero"; // equal
echo "<p>string 123 is".('123' == 0 ? '' : ' not')." equal to number zero"; // not equal (123 != 0)
echo "<p>string 123foo is".('123foo' == 0 ? '' : ' not')." equal to number zero"; // not equal (123 != 0)
echo "<p><br>";
echo "<p>undefined variable is".($a === '' ? '' : ' not')." identical to empty string"; // not identical
echo "<p>false is".(false === '' ? '' : ' not')." identical to empty string"; // not identical
echo "<p>number zero is".(0 === '' ? '' : ' not')." identical to empty string"; // not identical
echo "<p>string zero is".('0' === 0 ? '' : ' not')." identical to number zero"; // not identical
echo "<p>string foo is".('foo' === 0 ? '' : ' not')." identical to number zero"; // not identical
echo "<p>string foo123 is".('foo123' === 0 ? '' : ' not')." identical to number zero"; // not identical
echo "<p>string 123 is".('123' === 0 ? '' : ' not')." identical to number zero"; // not identical
echo "<p>string 123foo is".('123foo' === 0 ? '' : ' not')." identical to number zero"; // not identical
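A common place where the difference matters is a function that can return either a position or false. The sketch below uses strpos(), which returns 0 for a match at the start of the string; the sample text is arbitrary.

```php
<?php
$haystack = 'php hacker';
$position = strpos($haystack, 'php');   // 0: found at the very start

// Equality treats the integer 0 like false, so this test gives the wrong answer.
if ($position == false) {
    echo "== says: not found\n";
}

// Identity only matches a genuine boolean false.
if ($position === false) {
    echo "=== says: not found\n";
} else {
    echo "=== says: found at position $position\n";
}
```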
You need to think carefully about your data cases, particularly where information is optional. For example, imagine you are creating a database query form where the user can enter an empty string, the word NULL or no data. We need some way to catch and handle all three of these conditions. It's the no data case that is tricky since we can't use the empty string as that will conflict with the case where they have entered data that just happens to be an empty string. Using the new NULL keyword seems like a good choice but that might conflict with the word NULL entered by a user. The solution is to introduce a new unique string that we consistently use in these cases. In all Synop code we use ss_unknown to describe values that are not specified. This is a unique string that is very unlikely to be matched or entered by users.
Having an ss_unknown case is particularly useful given the differences in SQL syntax that are required when handling NULL values in database queries. For example, in MySQL these queries can return different results:
select * from member where url is NULL;   -- matches when url is NULL
select * from member where url = '';      -- matches when url is the empty string
select * from member where url = 'NULL';  -- matches when url is the string 'NULL'
This means that we need to form different database queries based on the type of data that we are looking for. By having an explicit case for NULL (no value) we can build the appropriate queries easily.
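One way this could look in code is sketched below. It assumes the sentinel is the literal string 'ss_unknown' and that a PHP null means "match NULL in the database"; the url_condition() helper is hypothetical and not taken from the Synop code.

```php
<?php
// Assumption: the sentinel is the literal string 'ss_unknown'.
define('SS_UNKNOWN', 'ss_unknown');

// Hypothetical helper: build the WHERE clause for the url column.
//   SS_UNKNOWN -> no restriction at all (the user gave no data)
//   null       -> match rows where url is NULL
//   any string -> match that exact string, including '' and 'NULL'
function url_condition($url)
{
    if ($url === SS_UNKNOWN) {
        return '';
    }
    if ($url === null) {
        return ' where url is NULL';
    }
    // addslashes() is only a stand-in; use your database library's own escaping.
    return " where url = '" . addslashes($url) . "'";
}

echo 'select * from member' . url_condition(SS_UNKNOWN), "\n";
echo 'select * from member' . url_condition(null), "\n";
echo 'select * from member' . url_condition(''), "\n";
echo 'select * from member' . url_condition('NULL'), "\n";
```

The identity operator matters here: with == the empty string and null could not be told apart.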
Passing data from page to page through the HTTP mechanisms is quite an art form the minute there might be anything slightly complex to be sent. For example, sending a double quote as part of a hidden POST field or sending ampersands in GET data. The key is to use urlencoding to protect your data from corruption. But, the cases for using urlencode() and urldecode() need to be carefully chosen and examined.
Hidden POST variables need to be urlencoded and urldecoded
POSTed forms don't need to be urlencoded or urldecoded
GET variables need to be urlencoded but not urldecoded
Cookies don't need to be urlencoded or urldecoded
Data in textareas may need to have htmlspecialchars() applied. For example, if you want to have the string "</textarea>" in your textarea then it must be protected so that it doesn't prematurely end the box. Using htmlspecialchars() causes the text to be displayed properly in the textarea and the effect is undone when the code is submitted for processing. I'm a bit baffled as to how this works, but all my testing seems to indicate that it solves the problem.
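The sketch below shows both kinds of protection on sample data. The reason the textarea case round-trips is that the browser turns the entities back into plain characters when it parses the page, so the form submits the original text. The field name and the search.phtml target are assumptions made for the example.

```php
<?php
$text = 'Close the box with </textarea> & carry on';

// Protect the text so the browser does not treat it as markup.
$safe = htmlspecialchars($text);
echo '<textarea name="body">' . $safe . '</textarea>', "\n";

// GET data: encode each value when building the link.
// (search.phtml is a hypothetical page name.)
$query = 'tom & jerry';
echo '<a href="search.phtml?q=' . urlencode($query) . '">search</a>', "\n";
```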
PHP is the perfect language for chameleon coding as it supports both structured classes and simple web scripting. In this section we will look at some coding and page structures you can use to help build applications that are robust, yet easy to change and simple to maintain.
Mixing programming code in with HTML is messy. We can talk about ways to format the code or structure your pages, but the end result will still be quite complicated. We need to move as much of the code away from the HTML as possible. But, we need to do this so that we don't get lost in the interaction between our application and the user interface. A web site is a dynamic target. It is continually evolving, improving and changing. We need to keep our HTML pages simple so that these changes can be made quickly and easily. The best way to do that is by making all calls to PHP code simple and their results obvious. We shouldn't worry too much about the structure of the PHP code contained in the front end, it will change soon anyway. That means that we need to remove all structured code from the actual pages into the supporting include files. All common operations should be encapsulated into functions contained in the backend.
In complete contrast to the web pages your backend code should be well designed, documented and structured. All the time you invest here is well spent; next time you need a page quickly hacked together all the hard parts will be already done waiting for you in backend functions. Your backend code should be arranged into a set of include files. These should be either included dynamically when required, or automatically included in all pages through the use of the auto_prepend_file directive. If you need to include HTML in your backend code it should be as generic as possible. All presentation and layout should really be contained in the front end code. Exceptions to this rule are obvious when they arise, for example, the creation of select boxes for a date selection form. PHP is flexible enough to let you design your code using classes and/or functions. My object oriented background means that I like to create a class to represent each facet of the application. All database queries are encapsulated in these classes, hidden from the front end pages completely. This helps by keeping all database code in a single location and simplifying the PHP code contained in pages.
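A rough sketch of that layering, under some loud assumptions: the MemberStore class and member_list_html() function are invented for this example, and an in-memory array stands in for the database queries a real backend class would hide.

```php
<?php
// Backend: hypothetical class for one facet of the application.
// A real version would run database queries here; an array stands in for the table.
class MemberStore
{
    var $rows = array(
        array('name' => 'alice', 'url' => 'http://example.com/'),
        array('name' => 'bob',   'url' => ''),
    );

    function names()
    {
        $names = array();
        foreach ($this->rows as $row) {
            $names[] = $row['name'];
        }
        return $names;
    }
}

// Middle layer: a small function the pages can call.
function member_list_html($store)
{
    return '<ul><li>' . implode('</li><li>', $store->names()) . '</li></ul>';
}

// Front end: the page itself stays trivial.
echo member_list_html(new MemberStore()), "\n";
```

In a real site the class and the function would live in include files, leaving only the last line in the page.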
A good example use of include files is to separate out sections of content into a form that makes them easier to maintain and reuse. For example, many home pages on the web are basically broken into a number of content boxes. Yahoo is basically boxes of links, auctions, news, shopping, events and self-promotion. Using include files in PHP we can break this page into the following structure:
index.phtml
    -> links.inc
    -> auctions.inc
    -> news.inc
    -> shopping.inc
    -> events.inc
Now each content box is on its own and can be maintained independently. This structure is so simple that you can use it to build completely dynamic sites for people who know nothing about PHP and refuse to use any HTML editor other than Frontpage. Just break their content areas out into a number of small files and let them go nuts. Your PHP code is safely locked in files called something like index-dont-touch.php that they can ignore.
Best of all, those content boxes can be reused anywhere on the site and only need to be edited and updated in a single location.
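A page built this way might look something like the sketch below. It assumes one .inc file per content box stored next to the page; the box_file() helper and the "missing box" comment are illustrative additions.

```php
<?php
// Hypothetical layout: one .inc file per content box, stored next to this page.
$boxes = array('links', 'auctions', 'news', 'shopping', 'events');

function box_file($name)
{
    return dirname(__FILE__) . '/' . $name . '.inc';
}

foreach ($boxes as $box) {
    $file = box_file($box);
    if (is_readable($file)) {
        include($file);          // each box is plain HTML its owner can edit
    } else {
        echo "<!-- missing box: $box -->\n";
    }
}
```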
The easiest way to do web forms is through a multiple page interaction with the user. The simple case is just to have a page for prompting and a page for processing the results. Slightly more complicated (mostly due to the urlencoding problems discussed above) is to add a confirm step to the sequence. Here is a file structure that we found to be fairly flexible across many applications:
A more flexible and robust scheme that we've been working on lately uses a more complex include file structure but manages to break up all the form processing steps into simple stages. This makes it simple to write forms and the end result is easier to use. All of the prompting, confirming and saving phases are done on the same page. This way we can display errors along with the data to be edited, can make the confirm step optional for the user, and can redirect from the save step to another location.
Here is the structure for the index.phtml file calling to the other scripts:
include('./prepare.inc');
if (isset($cancel)) {
    include('./cancel.inc');
} else {
    if (!isset($confirm) && !isset($save)) {
        include('./init.inc');
        include('./index.inc');
    } else {
        include('./check.inc');
        if (!strempty($error_message)) {
            include('./index.inc');
        } else {
            if (isset($confirm)) {
                include('./confirm.inc');
            } elseif (isset($save)) {
                include('./process.inc');
                include('./save.inc');
                include('./cancel.inc');
            }
        }
    }
}
The web is different. Here it is more important to finish a project as soon as possible than it is to get it perfect first time. Web sites are evolutionary, there is no freeze date after which it is difficult to make changes.
I like to think of my web sites as prototypes. Every day they get a little closer to being finished. I can throw together 3 pages in the time it would take to do one perfectly. It's usually better on the web to release all three and then decide where your priorities lie. Speed is all important.
So, everything you do as a programmer should be focused on the speed at which you are producing code (pages).
They are useful to know both so you can feel you are optimizing your code and to aid your understanding of certain PHP concepts.
Here is a quick set of test data to compare the performance of str_replace with some regular expressions when making changes to a simple string. Note that although the difference is significant (20x) the overall saving from a single usage would only be 0.000095 secs.
$string = 'Testing with <em>emphasis</em> on a long string so we can see how the <em>different</em> replace functions perform.';
// timer names are quoted strings throughout
ss_timing_start('str_replace');
for ($i=0; $i<10000; $i++) {
    str_replace('em>', 'strong>', $string).'<br>';
}
ss_timing_stop('str_replace');
ss_timing_start('ereg');
for ($i=0; $i<10000; $i++) {
    ereg_replace('em>', 'strong>', $string).'<br>';
}
ss_timing_stop('ereg');
ss_timing_start('eregi');
for ($i=0; $i<10000; $i++) {
    eregi_replace('em>', 'strong>', $string).'<br>';
}
ss_timing_stop('eregi');
ss_timing_start('ereg_pattern');
for ($i=0; $i<10000; $i++) {
    ereg_replace('<([/]*)em>', '<\1strong>', $string).'<br>';
}
ss_timing_stop('ereg_pattern');
ss_timing_start('eregi_pattern');
for ($i=0; $i<10000; $i++) {
    eregi_replace('<([/]*)em>', '<\1strong>', $string).'<br>';
}
ss_timing_stop('eregi_pattern');
echo "10,000 iterations gave:";
echo "<p>str_replace - ".ss_timing_current('str_replace');
echo "<p>ereg - ".ss_timing_current('ereg');
echo "<p>ereg_pattern - ".ss_timing_current('ereg_pattern');
echo "<p>eregi - ".ss_timing_current('eregi');
echo "<p>eregi_pattern - ".ss_timing_current('eregi_pattern');
Here are the results:
// TODO
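For readers who want to repeat the comparison themselves, here is a self-contained sketch. It is not the original test: the ereg functions were removed in PHP 7, so it swaps in preg_replace(), and the time_calls() helper is a hypothetical stand-in for the ss_timing functions. No results are quoted because they depend entirely on the machine and PHP version.

```php
<?php
// Hypothetical stand-in for the ss_timing_* helpers: time $iterations calls.
function time_calls($callback, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $callback();
    }
    return microtime(true) - $start;
}

$string = 'Testing with <em>emphasis</em> on a long string so we can see '
        . 'how the <em>different</em> replace functions perform.';

// Plain string replacement.
$plain = function () use ($string) {
    return str_replace('em>', 'strong>', $string);
};

// Regular expression replacement (preg_replace used in place of ereg_replace).
$regex = function () use ($string) {
    return preg_replace('#<(/?)em>#', '<$1strong>', $string);
};

printf("str_replace:  %.6f secs\n", time_calls($plain, 10000));
printf("preg_replace: %.6f secs\n", time_calls($regex, 10000));
```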
There are a number of caching systems that are starting to come out in the open source space and through commercial vendors. These basically keep a compiled form of PHP in memory saving the need for PHP to parse all of your scripts for every request to the web server. The time and memory savings through these offerings is significant, we've seen up to a 400% increase in performance.
In fact, I was pretty close to the solution as I ran around blowing my stack. It turns out that PHP doesn't like infinite recursion. It can take quite a while to track down that your no data error is due to runaway recursion, particularly if you are used to being burned by the old PHP parser which would occasionally get in a funny state and start returning those kinds of problems for parse errors.
PHP4 has introduced the capability for functions to handle variable length argument lists. This is fantastic for situations where you want to create data objects for example. Imagine:
$s = new Set($a, $b, $c, $d, $e, $f, $g, $h);
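A minimal sketch of how such a constructor could collect its arguments with func_get_args(). The Set class here is hypothetical, and it uses the __construct() spelling from later PHP versions rather than a PHP4 constructor named after the class.

```php
<?php
// Hypothetical Set class; only the constructor's argument handling matters here.
class Set
{
    var $items = array();

    function __construct()
    {
        // func_get_args() returns every argument passed, however many there are.
        foreach (func_get_args() as $item) {
            if (!in_array($item, $this->items, true)) {
                $this->items[] = $item;   // keep each distinct value once
            }
        }
    }

    function size()
    {
        return count($this->items);
    }
}

$s = new Set('a', 'b', 'c', 'a');
echo $s->size(), "\n";   // prints 3
```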
Be careful when forming regular expressions to keep in mind all the different times when slashes, periods, dollar signs and quotes need to be escaped. For example, to match a dollar amount you would need a regular expression that looks like this:
\$[0-9]{1,}\.[0-9]{2,2}
As a PHP string the slashes and the dollar sign need to be escaped, so this needs to be done as:
"\\\$[0-9]{1,}\\.[0-9]{2,2}"
Installing PHP for scripting on unix is easy. Just remove the --with-apache directive from your configure options. This will create the PHP binary that can be used to run scripts directly from the command line.
You can then write your script like any other shell script. Here is an example:
#!/usr/local/bin/php -q
<?php
// your php code here
?>
Once you start scripting with PHP the possibilities are endless. It's a fully featured language, you can do anything you would normally do in a shell script.
It turns out that the reality of protecting your code and scripts is actually much harder than it may initially seem. Protecting a single library file is easy, just encode it with some hard coded time bomb checks or similar and away you go. Protecting thousands of PHP scripts and libraries is a completely different ball game.
Encoded source does nothing to protect you at run time. All of your global variables and function names are there for everyone to see. All they have to do is remove your libraries one at a time and they will get PHP errors that dutifully report every missing function or class.
A problem we faced recently was working out how to protect all of our scripts through a license package that could modularise all of the license checks and problems. It seems simple, just encode the license package and make a call to something like ss_license_valid(package_name). Unfortunately it's not that easy since a user can just remove the license package and replace it with their own little license validation function that always returns true.
In fact, you quickly realise that we have the classic authentication problem with eavesdropping. Luckily we have some safety in that the encoder can be used to encrypt secrets for communication between the packages, allowing us to authenticate. The ideal solution would be through some sort of public key that can be published by each package, but PHP doesn't have support for public key encryption at this time.
So, to protect a library of encoded scripts you must first validate the license package from package foo, then validate the license for foo using the license package.
The PHP Knowledge Base is a growing collection of PHP related information. It captures the knowledge from the mailing list into a complete collection of searchable, correct answers. Of course, I may be a little biased:
The PHP manual is a great reference point for information on functions or language constructs.
If you can't find the relevant information in the PHP Knowledge Base your next stop should be the mailing list archives. There are thousands of questions on the mailing list every month so you can be almost certain your question has been asked before. Prepare to do some wading.
If all that searching fails to help, try asking on the mailing list. A lot of PHP gurus reside there.
If all these on-line resources aren't enough or you hate reading from a computer screen, you might be interested in one of the many PHP books that are now available.