Now we turn to a part of regular expressions that cannot be left out: characters, how to use them, and how to escape them when needed. Regular expressions are documented in the PHP manual; this article focuses on the Perl-compatible kind used by functions such as preg_match and preg_replace. The pattern is the regular expression used in each example. Because a pattern is wrapped in a slash at the beginning and end, here is the first thing to understand about characters in patterns: each character has a meaning or function, and escaping the character gives it a different meaning.

preg_replace("/catalog/toys/", "catalog/fun", "http://shop.com/catalog/toys");
//this cannot work because the 2nd slash is read as the end of the pattern

preg_replace("/catalog\/toys/", "catalog/fun", "http://shop.com/catalog/toys");
//this is correct, the inner slash is escaped

Escaping a character, as shown, usually means putting a backslash before it. It is also normal to put a backslash in front of a double quote or another backslash in a PHP string anyway, but in patterns we use it for various other things.
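As a minimal sketch of the escaping described above, here are three ways to get the inner slash past the delimiter. The variable names are only illustrative; the alternate delimiter and preg_quote() are not used elsewhere in this article, but both are standard PHP.

```php
<?php
// Example URL reused from the snippets above.
$url = "http://shop.com/catalog/toys";

// 1. Escape the inner slash with a backslash.
$a = preg_replace("/catalog\/toys/", "catalog/fun", $url);

// 2. Use a different delimiter (here #) so the slash needs no escaping.
$b = preg_replace("#catalog/toys#", "catalog/fun", $url);

// 3. Let preg_quote() escape the delimiter and any special characters.
$pattern = "/" . preg_quote("catalog/toys", "/") . "/";
$c = preg_replace($pattern, "catalog/fun", $url);

// All three calls produce the same string.
echo $a, "\n", $b, "\n", $c, "\n";
```

Which one to pick is mostly a matter of readability; a different delimiter tends to be easier on the eyes when the pattern is full of slashes.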

preg_replace("/[\r\n]/", "", "http://shop.com/catalog/toys\n"); //removes the newline

\r and \n are also standard PHP string escapes representing a carriage return and a newline. (Different operating systems and programs use different newline conventions: some use \n only, some use \r\n.) The [] brackets in the pattern mean you want to match any one character inside the brackets. The call basically means replace \r or \n with "". If you wanted to match actual brackets instead, you can escape them too: \[\]
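A short sketch of both uses of brackets, assuming made-up input strings: first as a character class, then escaped so they match literal bracket characters.

```php
<?php
// Character class: remove every \r or \n, whichever newline style is used.
$clean = preg_replace("/[\r\n]/", "", "line one\r\nline two\n");

// Escaped brackets: match the literal [ and ] characters instead.
$noBrackets = preg_replace("/\[|\]/", "", "toys[0]");

echo $clean, "\n";      // line oneline two
echo $noBrackets, "\n"; // toys0
```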

preg_replace("/\s/", "", " My Dog Has Fleas");
//removes the spaces (ie. MyDogHasFleas)

preg_replace("/^\s/", "", " My Dog Has Fleas");
//removes only a single space and only if it is at the beginning

This is turning into a crash course; two things are shown here. \s means any whitespace character: \n, tabs, and in this case spaces. Because it is the only thing in the pattern, it matches each space and removes them all. The next one doesn't: it finds a space at the beginning, and since nothing else can match at the start, it stops. The ^ means match only at the start, and for that meaning you put the ^ at the beginning of the pattern. There is one for the end as well: $. Using \$ or \^ gives you the regular old character, like the other examples. Using a ^ inside a bracket set, which is called a character class, has a different meaning.
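The anchors and their escaped forms can be sketched as follows. The input strings are invented for illustration. The patterns are written in single quotes because PHP itself treats \$ as an escape inside double quotes, which would strip the backslash before the pattern is compiled.

```php
<?php
$text = " My Dog Has Fleas ";

// ^ anchors to the start: only the leading space is removed.
$start = preg_replace('/^\s/', '', $text);

// $ anchors to the end: only the trailing space is removed.
$end = preg_replace('/\s$/', '', $text);

// Escaped, \$ and \^ are just the literal characters.
$noDollar = preg_replace('/\$/', '', 'cost: 5$ or 6$');
$noCaret  = preg_replace('/\^/', '', '2^8');

echo "[$start]\n[$end]\n$noDollar\n$noCaret\n";
```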

preg_replace("/[^\s]*$/", "", " My Dog Has Fleas");
//removes the end till it hits a space (ie. My Dog Has )

preg_replace("/\S*$/", "", " My Dog Has Fleas");
//same as the last one

Inside a character class, a ^ at the start reverses the meaning. \s means any whitespace, but [^\s] means anything but whitespace. The * makes it match any number in a row. The $ at the end matches at the ending only. The second example uses a capital S, \S, and it means the same thing: anything but whitespace. \d means any digit, \D means anything but digits, \b is a word boundary, \B is not a word boundary. Not all escaped characters have a counterpart like this, though.
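A small sketch of the digit and word-boundary escapes, using made-up sample strings:

```php
<?php
$s = "Order 66 shipped in 2 boxes";

// \D matches anything that is not a digit, so only the digits survive.
$digitsOnly = preg_replace('/\D/', '', $s);

// \d matches digits, so this strips them and leaves everything else.
$noDigits = preg_replace('/\d/', '', $s);

// \b is a word boundary: "in" inside "bin" does not count as a word.
$count = preg_match_all('/\bin\b/', "in a bin, in time");

echo $digitsOnly, "\n"; // 662
echo $noDigits, "\n";
echo $count, "\n";      // 2
```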

Some of the characters are tricky to use because of their changing meaning. For example, the manual describes:

\040 is another way of writing a space

\40 is the same, provided there are fewer than 40 previous capturing subpatterns
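To illustrate the octal form from the manual, this sketch compares \040 with a plain space. The sample string and variable names are assumptions. Only the three-digit form is used here, since a leading zero is always read as an octal code and never as a back reference.

```php
<?php
$text = "My Dog Has Fleas";

// \040 is octal for character code 32, the space character.
$viaOctal = preg_replace('/\040/', '', $text);

// The same replacement written with a literal space.
$viaSpace = preg_replace('/ /', '', $text);

echo $viaOctal, "\n"; // MyDogHasFleas
```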

preg_replace("/\w+.com/", "", "http://shop.com/catalog/capcom\n");

That example looks like an attempt to match and replace any domain that ends in .com. \w is any letter, digit, or underscore, and the match must end with .com. The only problem is that there is no backslash before the dot. An unescaped dot matches any character, so the pattern matches capcom as well as shop.com. That was just an example of where mistakes can be made. This concludes the article on regular expression characters; I hope it has been informative.
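For reference, a brief sketch of the dot pitfall, comparing the unescaped dot with an escaped one. The second pattern is an assumption about what was intended.

```php
<?php
$url = "http://shop.com/catalog/capcom\n";

// Unescaped dot: matches any character, so "capcom" is removed too.
$loose = preg_replace('/\w+.com/', '', $url);

// Escaped dot: only a literal ".com" qualifies, so "capcom" is kept.
$strict = preg_replace('/\w+\.com/', '', $url);

echo $loose;  // http:///catalog/
echo $strict; // http:///catalog/capcom
```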