At work I ran into an interesting conundrum while trying to validate and clean user names for the application I work on. Here is the restriction we put on our user names:
“ID may consist of A-Z, a-z, 0-9, underscores, and a single dot (.).”
While the project development staff is extremely talented, we didn’t have anyone who was particularly amazing with regular expressions. I decided to go ahead and take the plunge and start to do some research into the subject. I took a look around the internet and within the PHP documentation and couldn’t find an example of exactly what I was looking for. In fact, while searching for “php remove repeating character in string,” the first article listed in Google included a note at the bottom stating:
“I tried to have the same result using regular expressions but no success. If anyone resolved this using RegExp please share :).”
It was evident I was going to have to dive into the wonderful world of regex on my own. Our other filtering functions were easy enough: we have ranges of acceptable characters and some RFC-style regex checks on many of the inputs our users submit. In most cases, cleaning an input means running it through something like this:
cFilter::clean_variable($variable);
This runs a preg_replace with the appropriate regular expression and removes the bad stuff.
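For illustration, a filter of this kind might look like the sketch below. The function name clean_user_id_chars is hypothetical (the internals of cFilter::clean_variable are not shown in this post); it simply uses preg_replace to drop any character outside the allowed set.

```php
<?php
// Hypothetical helper, not the actual cFilter implementation.
// Removes every character that is not A-Z, a-z, 0-9, underscore, or dot.
function clean_user_id_chars(string $input): string
{
    // [^...] is a negated character class; a dot inside brackets is literal.
    return preg_replace('/[^A-Za-z0-9_.]/', '', $input);
}

echo clean_user_id_chars('cool guy!_42.'), "\n"; // prints "coolguy_42."
```

Note that a character whitelist like this does nothing about repeated dots, which is the problem the rest of this post deals with.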
The challenging part was enforcing the single dot. Here’s why. Let’s assume that someone wants to enter the user name ‘cool...guy’. First note: I am escaping the ‘.’ character with a backslash (‘\’), because an unescaped dot in a regular expression matches any character.
$nickname = 'cool...guy';
if (ereg("\.{2,}", $nickname, $reg)) {
    var_dump($reg);
}
This code produces the following output:
array(1) { [0]=> string(3) "..." }
The match is the entire run of dots. So, if we were to run a replace using the simple “\.{2,}” with an empty replacement, we would end up blowing away all the dots. But that’s not what we’re looking for, because the user is allowed to have one dot in their name. So, simply replacing the offending run with blank characters was not going to work.
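One caveat for anyone trying this today: the ereg family was deprecated in PHP 5.3 and removed in PHP 7.0, so on a current PHP the same check needs the PCRE functions. Here is a minimal sketch of the equivalent test with preg_match, which also shows why replacing the match with an empty string loses every dot:

```php
<?php
$nickname = 'cool...guy';

// preg_match fills $matches[0] with the first run of two or more dots.
if (preg_match('/\.{2,}/', $nickname, $matches)) {
    var_dump($matches[0]); // string(3) "..."
}

// Replacing the whole run with nothing removes every dot,
// including the one the user is allowed to keep.
$stripped = preg_replace('/\.{2,}/', '', $nickname);
echo $stripped, "\n"; // prints "coolguy"
```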
The second iteration I came up with was the following:
while (ereg("\.{2,}", $string, $reg)) {
$string = str_replace($reg[0], ".", $string);
}
If you are familiar with PHP, you can probably see that I am using ereg with str_replace inside a loop. So, why not just use ereg_replace, which matches and replaces in a single call? Well, that’d be a great plan. Here’s what I finally came up with:
$new_string = ereg_replace("\.{2,}", ".", $string);
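On PHP 7 and later, where ereg_replace no longer exists, the same one-liner can be written with preg_replace. The sketch below uses two hypothetical function names of my own choosing. The second function rests on an assumption: that “a single dot” means no more than one dot anywhere in the ID. Collapsing runs alone would still let something like ‘c.o.o.l’ through.

```php
<?php
// Collapse any run of two or more dots into a single dot.
function collapse_dots(string $string): string
{
    return preg_replace('/\.{2,}/', '.', $string);
}

// Assumption: "a single dot" means at most one dot in the whole ID.
function has_at_most_one_dot(string $string): bool
{
    return substr_count($string, '.') <= 1;
}

echo collapse_dots('cool...guy'), "\n";    // prints "cool.guy"
var_dump(has_at_most_one_dot('cool.guy')); // bool(true)
var_dump(has_at_most_one_dot('c.o.o.l'));  // bool(false)
```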
So, if you wanted to use a different character, here’s an example of what you can do with this line of thinking. In this case, we want them to be able to have an ‘n’ in their name, but only one:
$string = 'coolnness';
$new_string = eregi_replace("n{2,}", "n", $string);
echo $new_string;
If you wanted to remove the offending characters altogether, pass an empty string as the second parameter, like so (note that this removes the whole run, so ‘coolnness’ becomes ‘cooless’):
$new_string = eregi_replace("n{2,}", "", $string);
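The same idea generalizes to any single character. The sketch below uses a hypothetical helper, squeeze_char, built on preg_replace. It calls preg_quote so that characters with a special meaning in a pattern (such as the dot) are escaped automatically, and the i modifier stands in for the case-insensitive matching that eregi_replace provided.

```php
<?php
// Hypothetical helper: replace each run of two or more $char with $replacement.
// Assumes $char is a single character and $replacement contains no '$' or '\'.
function squeeze_char(string $string, string $char, string $replacement): string
{
    // preg_quote escapes regex metacharacters and the '/' delimiter.
    $pattern = '/' . preg_quote($char, '/') . '{2,}/i';
    return preg_replace($pattern, $replacement, $string);
}

echo squeeze_char('coolnness', 'n', 'n'), "\n";  // prints "coolness"
echo squeeze_char('coolnness', 'n', ''), "\n";   // prints "cooless"
echo squeeze_char('cool...guy', '.', '.'), "\n"; // prints "cool.guy"
```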
Hope this helps someone else, ’cause it sure took me a while to figure out.