Reformat PHP code with RegEx in Eclipse, Dreamweaver etc
Eclipse and Dreamweaver both support regular expressions in search and replace. This is an immensely powerful and useful feature that makes otherwise tedious replacement tasks a breeze. In this article, a few expressions for some useful search / replace tasks are illustrated. Dreamweaver's regular expression search and replace is particularly handy for developers because you can do multi-line search and replace, and you can search and replace over an entire site or selected files in the site at once.
compton, 9 November 06
Updated 9 April 13
To use regular expressions in Eclipse or Dreamweaver searches, simply tick the Use regular expressions box on the search dialog. Useful RegEx features for Dreamweaver and Eclipse include the use of brackets to create a 'group'. The text which matches the RegEx within a pair of brackets can then be used in the replace string using $1 for the first bracketed group, $2 for the second bracketed group, and so on.
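As a minimal sketch of how bracketed groups feed the replace string, the snippet below runs one search and replace through Python's re module, which is handy for trying a pattern outside the editor. One difference to keep in mind: Python writes group references as \1 and \2 where Eclipse and Dreamweaver use $1 and $2. The sample text and function name are made up for illustration.

```python
import re

def swap_name_order(text: str) -> str:
    """Turn 'Surname, Forename' into 'Forename Surname' using two groups."""
    # The first (\w+) is group 1 (surname), the second is group 2 (forename).
    # The editor equivalent of the replacement string would be: $2 $1
    return re.sub(r"(\w+), (\w+)", r"\2 \1", text)

print(swap_name_order("Lerdorf, Rasmus"))
```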
Replacing Concatenated SQL with Multi-line Strings
When you have a multi-line SQL query, you can either build it up one part at a time with string concatenation, or write it as a single string containing lots of whitespace. There are arguments for and against both ways, and in the end it's down to personal preference. If you do want to replace concatenated strings with a single multi-line string, search for:
";\n(\s*)\$sql .= "
Remove Extraneous Whitespace Before PHP Tags in HTML Files
Some editors indent PHP tags within an HTML file to match the indentation of the HTML elements. Sometimes you'll also find inexperienced devs trying to match the PHP indentation with the HTML indentation, and then I'm afraid it's table-flip time. PHP code has its own indentation, entirely distinct from that of the output it produces, and HTML tags should be viewed as the PHP output: they are not part of the server-side script at all. Graah!
Anyway, ranting aside, removing the whitespace before the PHP opening and closing tags is pretty straightforward.
Search for the following:
And replace it with:
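The search and replace strings themselves are not reproduced here. As a rough sketch of the idea only, the Python snippet below assumes a pattern that matches spaces or tabs at the start of a line when they are followed by a PHP opening or closing tag, and replaces the match with the tag alone. The pattern, function name and sample markup are all assumptions, not the original expressions.

```python
import re

# Assumed pattern: leading spaces/tabs followed by <?php, <?= or ?>.
LEADING_WS_BEFORE_TAG = re.compile(r"^[ \t]+(<\?php|<\?=|\?>)", re.MULTILINE)

def unindent_php_tags(html: str) -> str:
    """Strip indentation from lines that begin with a PHP tag."""
    # \1 puts the tag back without the whitespace (the editor form would be $1).
    return LEADING_WS_BEFORE_TAG.sub(r"\1", html)

page = (
    "<ul>\n"
    "    <?php foreach ($items as $item): ?>\n"
    "    <li><?= $item ?></li>\n"
    "    <?php endforeach; ?>\n"
    "</ul>\n"
)
print(unindent_php_tags(page))
```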
Escape HTML Form Values with Entities
When you output PHP variables inside HTML elements, such as attribute values in an HTML form, some characters could royally fuck with your site, principally the quote character that delimits the attribute, or a less-than sign if you are so old-school (or just plain old) that you don't use any quote character around HTML attributes. Escaping these characters by passing the PHP variable through htmlspecialchars() is the way to go, but it's easily forgotten, and a pain to do afterwards. Thanks to the following RegEx, you can add this function later with no sweat. Search for the first line below and replace it with the second:
"<\?= (\$.*?);? \?>"
"<?= htmlspecialchars($1); ?>"
If for some reason you are averse to PHP short tags, use the following.
"<\?php echo (\$.*?);? \?>"
"<?php echo htmlspecialchars($1); ?>"
Replace Deprecated ereg() Calls with preg_match() Equivalent
The POSIX-compliant ereg() function was deprecated in PHP 5.3 and removed in PHP 7. It's also not as efficient or as powerful as its Perl-compatible brother, preg_match(). The following search/replace will also replace the case-insensitive eregi() with preg_match() and a RegEx case-insensitive modifier. Note that by design, the search pattern will not match any ereg() calls that contain a forward slash in the pattern, as they would not be correctly replaced. To fix any such instances, you can repeat the search using a different pattern delimiter, such as the hash character.
The search text will also not match if the RegEx contains a double or single quote. If you have many of those that you wish to replace, and you know all your strings use single quotes, you can easily rewrite the above:
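The original patterns are not shown here, so what follows is only a sketch of the kind of search and replace described, written for Python's re module. The patterns are assumptions: they match a double-quoted ereg() or eregi() pattern containing no quotes and no forward slash, and wrap it in / delimiters, adding the i modifier for eregi().

```python
import re

def ereg_to_preg_match(php: str) -> str:
    """Rewrite simple ereg()/eregi() calls as preg_match() calls."""
    # Assumed search: ereg( "pattern"  with no ", ' or / inside the pattern.
    php = re.sub(r'\bereg\(\s*"([^"\'/]*)"', r'preg_match("/\1/"', php)
    # Same again for eregi(), adding the case-insensitive modifier.
    php = re.sub(r'\beregi\(\s*"([^"\'/]*)"', r'preg_match("/\1/i"', php)
    return php

print(ereg_to_preg_match('if (eregi("^[a-z]+$", $name)) {'))
```

Calls whose pattern contains a forward slash are skipped on purpose, matching the behaviour described above.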
Replace ereg with Simple Logic Expressions
RegEx addicts have been known to use regular expressions to perform simple equality matches, for instance:
ereg( "^mars|jupiter$", $my_favourite_planet )
This can't easily be replaced with the equivalent logical expressions in a single step. The following will make it quicker though, by replacing those sorts of ereg calls with a single logical equality comparison that can then be changed as required.
$2 == '$1'
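Only the replacement string is given above, so the search pattern in this Python sketch is an assumption: it captures the text between ^ and $ as group 1 and the variable as group 2, which is the order the replacement expects. The result is a starting point for hand editing, not a finished condition.

```python
import re

# Assumed search pattern: ereg( "^...$", $variable )
EREG_EQUALITY = r'ereg\(\s*"\^([^"]*)\$"\s*,\s*(\$\w+)\s*\)'

def ereg_to_equality(php: str) -> str:
    """Replace an anchored ereg() call with a single == comparison."""
    # Editor replacement from the article: $2 == '$1'  (Python uses \2 and \1).
    return re.sub(EREG_EQUALITY, r"\2 == '\1'", php)

print(ereg_to_equality('ereg( "^mars|jupiter$", $my_favourite_planet )'))
```

For the planet example this leaves a single comparison against 'mars|jupiter', which then still has to be split by hand into one comparison per planet.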
Add semi-colon to last declaration of a CSS rule
Whilst implementing a PrestaShop template, I came across several CSS files where the semi-colon after the final declaration is omitted in every rule. It's a bit tiresome to have to keep adding it whenever a new declaration is added; this will fix those.
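The pattern itself is not reproduced above. One possible version, offered purely as an assumption, looks for a closing brace whose preceding non-whitespace character is not already a semi-colon or a brace, and inserts the semi-colon there. It is sketched below with Python's re module; the function name and sample CSS are illustrative.

```python
import re

# Assumed pattern: a character that is not ; { } or whitespace,
# followed by optional whitespace and a closing brace.
MISSING_SEMICOLON = r"([^;{}\s])(\s*\})"

def add_final_semicolons(css: str) -> str:
    """Append the omitted semi-colon before each closing brace."""
    # Editor form of the replacement would be: $1;$2
    return re.sub(MISSING_SEMICOLON, r"\1;\2", css)

print(add_final_semicolons("a {\n  color: red;\n  margin: 0\n}\n"))
```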
Change from opening braces at end of line to hanging braces
To enter a return character in the search/replace boxes, use Ctrl-Enter (Shift-Enter also works). Similarly, to fix any else statements, use the following after using the above:
Both these assume that you use spaces to indent code. If tabs are used, then use (\t*) in place of ( *) in the search strings above.
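The search strings referred to above are not reproduced, so this Python sketch assumes one for the opening-brace case: capture the indentation with ( *), capture the rest of the statement, and put the brace on its own line at the same indentation. Treat the pattern and names as assumptions; else statements would still need their own pass, as noted above.

```python
import re

# Assumed pattern: indentation, statement text, then a trailing "{".
BRACE_AT_END = re.compile(r"^( *)(\S.*?) *\{$", re.MULTILINE)

def hang_braces(php: str) -> str:
    """Move an end-of-line opening brace onto its own line."""
    # Editor form: $1$2, a return character, then $1{
    return BRACE_AT_END.sub(r"\1\2\n\1{", php)

code = "    if ($a) {\n        foo();\n    }\n"
print(hang_braces(code))
```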
Change spaces to tabs
If your code is indented with spaces and you wish to use tabs, it's easy enough to change those with search and replace. Regular expressions aren't needed. If your code uses three spaces for each level of indentation, just use 3 spaces as the search string, and a single tab character for the replace text. To enter a tab character in Dreamweaver's search or replace fields, use Ctrl-Tab.
Change from double-quotes to single-quotes
Double quotes and single quotes are subtly different in PHP. A string enclosed in double quotes is scanned for tokens such as variables and escape sequences, which are replaced with their values, while no such extra processing is performed on strings enclosed in single quotes (apart from \\ and \'). So if you have a string which does not contain any tokens (eg PHP variables), it's preferable to use single quotes.
In my job, I recently had to work with some very poorly written code, the least of its problems being the use of double quotes when single ones would be (slightly) better. Only in the most extreme cases would this make any significant difference to a website's performance, but converting some of these is a good example of how RegEx can be used in Dreamweaver, or even Notepad++ should you prefer that, as I do. The following RegEx would replace array key strings (eg $myarray["key"] - these indexes almost never contain tokens):
Instead of using the character class, [^"]*, to match the content of the string, you can make use of the \w RegEx escape sequence. It matches a 'word' character, ie a letter, a digit or an underscore: \["(\w*)"\]
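Here is a short sketch of the \w version run through Python's re module. The search pattern is the one given above; the replacement ['\1'] (['$1'] in the editor) is an assumption about the obvious counterpart, and the function name and sample line are made up.

```python
import re

def single_quote_array_keys(php: str) -> str:
    """Change $array["key"] to $array['key'] for simple word-like keys."""
    # \w* only matches letters, digits and underscores, so keys containing
    # spaces, variables or other punctuation are left untouched.
    return re.sub(r'\["(\w*)"\]', r"['\1']", php)

print(single_quote_array_keys('echo $row["name"] . $row["id"];'))
```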
The RegEx can be quite easily expanded to also match strings inside normal brackets, for instance in the line include("myfile.php");. To match either kind of bracket you can use the pipe "|" operator, which matches either the expression to its left or the one to its right, or a character class in square brackets that lists both bracket characters (inside a character class the pipe is just a literal character, so it isn't needed there). As we'll need to know whether a square or standard bracket was matched for the replace, the bracket-matching patterns will each have to be placed within their own set of normal brackets, making for a virtually unreadable RegEx:
To do the same in Notepad++ requires slightly different notation. Notepad++ RegEx doesn't support the \w escape character, so you need to specify a character class that does the same thing. Secondly, Notepad++ uses \1 etc instead of $1 in the replace string to indicate a group:
Fix unclosed HTML tags for XHTML Compliance
Older HTML standards allowed single tags such as img and input to be left unclosed; to be XML compliant for XHTML, they need to be closed. The following RegEx can be used to search and replace these.
<$1 $2$3 />
Replace Strings set to empty string just before being assigned a value
Search for:
\$sql = "";\s*\$sql \.= "
And replace with:
\$sql = "
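As a quick check of what this pair does, the sketch below applies it with Python's re module. The search pattern is the one above; in Python the dollar sign in the replacement needs no escaping. The function name and sample code are illustrative.

```python
import re

# Search pattern from the article: an empty assignment followed by ".=".
EMPTY_THEN_APPEND = r'\$sql = "";\s*\$sql \.= "'

def collapse_empty_init(php: str) -> str:
    """Fold '$sql = ""; $sql .= "..."' into a single assignment."""
    return re.sub(EMPTY_THEN_APPEND, '$sql = "', php)

print(collapse_empty_init('$sql = "";\n$sql .= "SELECT * FROM users";'))
```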
Add whitespace around logical equality operators
Of all the keys on the keyboard to avoid pressing, the spacebar is the least worthy. After all, it has nothing printed on it so you need not worry about wearing away the letters.
To add spaces before and after operators in if statements, this will help:
([^ !])([=<>!]==?)([^ !])
$1 $2 $3
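A small sketch of this search and replace in Python's re module, using the pattern above with \1 \2 \3 standing in for $1 $2 $3. The function name and sample conditions are made up. Note that the pattern only fires when both sides of the operator lack a space.

```python
import re

# Pattern from the article: non-space, a comparison operator, non-space.
TIGHT_OPERATOR = r"([^ !])([=<>!]==?)([^ !])"

def space_out_comparisons(php: str) -> str:
    """Insert a space on each side of ==, ===, !=, !==, <= and >=."""
    return re.sub(TIGHT_OPERATOR, r"\1 \2 \3", php)

print(space_out_comparisons("if($a==$b)"))
print(space_out_comparisons("if($a!==$b)"))
```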
Restore Missing Semicolons
Here's another trivial and inconsequential annoyance: missing semicolons/spaces, in particular at the end of the last PHP statement of a scriptlet. The semicolon can acceptably be dropped if you use the shorthand <?= $var ?>, but if you prefer to use the equivalent <?php echo $var; ?>, a semicolon should strictly be present. Spaces always help readability and there's never an excuse for stingy spacebar usage, other than being a noob.
There's no good reason not to use the shorthand syntax - under the hood they are identical - but consistency is important. Similarly, to replace the shorthand form with longhand, search for this:
<\?= ?(.*?);? ?\?>
And replace with:
<?php echo $1; ?>
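To try this pair outside the editor, here is a minimal sketch with Python's re module. It uses the search pattern above and the same replacement with \1 in place of $1; the function name and sample markup are illustrative.

```python
import re

# Search pattern from the article: optional spaces and semicolon are absorbed.
SHORT_ECHO = r"<\?= ?(.*?);? ?\?>"

def short_to_long_echo(html: str) -> str:
    """Rewrite <?= ... ?> as <?php echo ...; ?> with consistent spacing."""
    return re.sub(SHORT_ECHO, r"<?php echo \1; ?>", html)

print(short_to_long_echo("<p><?=$title;?> by <?= $author ?></p>"))
```

Since . does not match a line break by default, shorthand tags that span several lines are left as they are.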