Monday, May 2, 2011

What is the difference between Explode and Split and Regular Expressions in PHP

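explode vs. split
split() handles regular expressions; explode() does not.
split() – Splits a string into an array by a regular expression.
explode() – Splits a string into an array by a literal string delimiter.
Note that split() was deprecated in PHP 5.3 and removed in PHP 7; preg_split() is its Perl-compatible replacement.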
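The difference is easiest to see side by side. The following is a minimal sketch, not taken from the original post: the sample strings are invented, and preg_split() stands in for split() so that the code also runs on PHP 7 and later.

```php
<?php
// Sample data invented for illustration.
$csv = "apple,banana,cherry";

// explode() cuts on one fixed, literal delimiter.
$parts = explode(",", $csv);

// A regular expression can cut on any mix of delimiters in one pass:
// here, one or more whitespace characters, commas, or semicolons.
$messy  = "apple, banana;cherry  date";
$tokens = preg_split('/[\s,;]+/', $messy);

print_r($parts);   // apple, banana, cherry
print_r($tokens);  // apple, banana, cherry, date
```

When the delimiter is a single fixed string, explode() is the simpler choice, since it involves no pattern at all.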

What are Regular Expressions in PHP

Regular expressions, commonly known as RegEx, are often considered one of the most complex concepts in programming. That reputation is not really deserved. If you have never worked with regular expressions, a pattern made of special characters such as /, $, ^, \, ?, and * mixed with alphanumeric characters can look like a mess. But RegEx is a kind of language: once you have learned its symbols and what they mean, you will find it one of the most useful tools for solving complex text-search problems.
Consider how you search for files on your computer. You most likely use the ? and * wildcard characters to find the files you are looking for. The ? character matches a single character in a file name, while the * character matches zero or more characters. A pattern such as 'file?.txt' would find the following files:
file1.txt
filer.txt
files.txt

Using the * character instead of the ? character expands the number of files found. 'file*.txt' matches all of the following:
file1.txt
file2.txt
file12.txt
filer.txt
filedce.txt
While this method of searching for files can certainly be useful, it is also very limited. The ? and * wildcard characters give you an idea of what regular expressions can do, but regular expressions are much more powerful and flexible.
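The wildcard behavior described above can be tried directly in PHP. This is a small sketch that assumes the built-in fnmatch() function for shell-style wildcard matching; the file list simply combines the names used in the examples above.

```php
<?php
// File names taken from the wildcard examples in the text.
$files = ['file1.txt', 'file2.txt', 'file12.txt', 'filer.txt', 'files.txt', 'filedce.txt'];

// ? stands for exactly one character.
$single = array_values(array_filter($files, function ($name) {
    return fnmatch('file?.txt', $name);
}));

// * stands for zero or more characters.
$any = array_values(array_filter($files, function ($name) {
    return fnmatch('file*.txt', $name);
}));

print_r($single);  // file1.txt, file2.txt, filer.txt, files.txt
print_r($any);     // all six names
```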
Let Us Start on RegEx
A regular expression is a pattern of text that consists of ordinary characters (for example, letters a through z) and special characters, known as metacharacters. The pattern describes one or more strings to match when searching a body of text. The regular expression serves as a template for matching a character pattern to the string being searched.
The following table lists some metacharacters and their behavior in regular expressions. The backslash shorthands (\d, \w, \b, and so on) and the non-greedy quantifier are Perl-compatible syntax, as used by the preg_ functions:
Character   Description
\           Marks the next character as either a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character "n". '\n' matches a newline character. The sequence '\\' matches "\" and "\(" matches "(".
^           Matches the position at the beginning of the input string.
$           Matches the position at the end of the input string.
*           Matches the preceding subexpression zero or more times.
+           Matches the preceding subexpression one or more times.
?           Matches the preceding subexpression zero or one time.
{n}         Matches exactly n times, where n is a nonnegative integer.
{n,}        Matches at least n times, where n is a nonnegative integer.
{n,m}       Matches at least n and at most m times, where m and n are nonnegative integers and n <= m.
?           When this character immediately follows any of the other quantifiers (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much of the searched string as possible.
.           Matches any single character except "\n".
x|y         Matches either x or y.
[xyz]       A character set. Matches any one of the enclosed characters.
[^xyz]      A negative character set. Matches any character not enclosed.
[a-z]       A range of characters. Matches any character in the specified range.
[^a-z]      A negative range of characters. Matches any character not in the specified range.
\b          Matches a word boundary, that is, the position between a word character and a non-word character (or the start or end of the string).
\B          Matches a nonword boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".
\d          Matches a digit character.
\D          Matches a nondigit character.
\f          Matches a form-feed character.
\n          Matches a newline character.
\r          Matches a carriage return character.
\s          Matches any whitespace character including space, tab, form-feed, etc.
\S          Matches any non-whitespace character.
\t          Matches a tab character.
\v          Matches a vertical tab character.
\w          Matches any word character (letters, digits, and underscore).
\W          Matches any nonword character.
\x{hhhh}    Matches the Unicode character with hexadecimal code point hhhh when the pattern uses the u modifier. For example, '/\x{00A9}/u' matches the copyright symbol (©). The \uhhhh form found in some other regex flavors is not supported by PCRE.
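A few of these metacharacters are easier to grasp in action. The sketch below uses the Perl-compatible preg_match() and preg_match_all() functions; the sample strings and variable names are invented for illustration.

```php
<?php
// Greedy vs. non-greedy: .+ grabs as much as it can, .+? as little as it can.
$html = "<b>bold</b> and <i>italic</i>";
preg_match('/<.+>/',  $html, $greedy);  // matches the whole string
preg_match('/<.+?>/', $html, $lazy);    // matches only "<b>"

// Anchors: ^ and $ force the pattern to cover the entire string.
$exact   = preg_match('/^\d{5}$/', '92692');     // 1 (match)
$inexact = preg_match('/^\d{5}$/', 'CA 92692');  // 0 (no match)

// Word boundaries: \bcat\b matches "cat" as a word, not inside "concat".
$count = preg_match_all('/\bcat\b/', 'cat concat cat.', $m);

echo $greedy[0], "\n", $lazy[0], "\n", $exact, $inexact, $count, "\n";
```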

RegEx functions in PHP
PHP has functions for complex string manipulation using RegEx. The following are the POSIX RegEx functions provided in PHP. They were deprecated in PHP 5.3 and removed in PHP 7.

Function        Description
ereg            Matches the text pattern in a string using a RegEx pattern.
eregi           Similar to ereg(), but ignores case.
ereg_replace    Matches the text pattern in a string using a RegEx pattern and replaces it with the given text.
eregi_replace   Similar to ereg_replace(), but ignores case.
split           Splits a string into an array using a RegEx.
spliti          Similar to split(), but ignores case.
sql_regcase     Creates a RegEx from the given string to make a case-insensitive match.
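Each function in the table has a Perl-compatible counterpart in the preg_ family, which is what current PHP versions provide. The mapping below is a rough sketch with an invented sample string; case-insensitive variants become the i modifier on the pattern rather than a separate function.

```php
<?php
// Sample text invented for illustration.
$text = "Hello World";

// ereg() / eregi()  ->  preg_match(); the i modifier ignores case.
$found = preg_match('/world/i', $text);       // 1 on a match, 0 otherwise

// ereg_replace() / eregi_replace()  ->  preg_replace()
$replaced = preg_replace('/o/', '0', $text);  // "Hell0 W0rld"

// split() / spliti()  ->  preg_split()
$words = preg_split('/\s+/', $text);          // "Hello", "World"

echo $found, "\n", $replaced, "\n", implode("|", $words), "\n";
```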

Finding US Zip Code
Now let us look at a simple example that matches a five-digit US zip code in a string:
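<?php
// ereg() uses POSIX syntax; it is deprecated since PHP 5.3 and removed in PHP 7.
$zip_pattern = "[0-9]{5}";
$str = "Mission Viejo, CA 92692";
// ereg() returns false when nothing matches, so check before reading $regs.
if (ereg($zip_pattern, $str, $regs)) {
    echo $regs[0];
}
?>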
This script outputs the following:
92692
The above example can also be rewritten using Perl-compatible regular expression syntax with the preg_match() function:
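<?php
// \d is the Perl-compatible shorthand for [0-9]; the slashes delimit the pattern.
$zip_pattern = '/\d{5}/';
$str = "Mission Viejo, CA 92692";
// preg_match() returns 1 on a match and fills $regs with the matched text.
if (preg_match($zip_pattern, $str, $regs)) {
    echo $regs[0];
}
?>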
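The pattern above finds the first run of five digits anywhere in the string, including the first five digits of a longer number. The following sketch shows one way to tighten it, assuming word boundaries (\b) are acceptable delimiters and using preg_match_all() to collect every match. The sample text is invented for illustration.

```php
<?php
// Invented sample: two zip codes and one ten-digit phone number.
$str = "Mission Viejo, CA 92692 and Beverly Hills, CA 90210; phone 9495551234";

// Without boundaries, five-digit pieces of the phone number also match.
$loose_count = preg_match_all('/\d{5}/', $str, $loose);

// \b on both sides requires the five digits to stand alone.
$strict_count = preg_match_all('/\b\d{5}\b/', $str, $strict);

echo $loose_count, " loose matches, ", $strict_count, " strict matches\n";
print_r($strict[0]);  // 92692, 90210
```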

Wrap Up
I hope this introduction has given you some understanding of how to tackle text-pattern problems with RegEx. To become proficient, you need to practice continuously: look for complex problems and try to solve them. Happy practicing with RegEx.
