Misc

Just when you thought the mainstream media couldn’t get more convoluted and controlled, on April 14th, 2008, the Associated Press announced the new members of its board of directors.

The Associated Press is the last remaining nationally oriented news service in the United States. It can reproduce stories published by the American news organizations that belong to it. Companies can then pay the Associated Press a fee to reproduce a story in other countries and languages.

However, the latest appointees to the board of directors definitely point to a bid for control and power.

Here are the 4 new members:

Just look at the massive amount of media these companies control.

In a world where the major news media outlets are falling to the online availability and dispersal of information that comes with the internet, it is not surprising that the heads of these news outlets are coming together to decide what and how news is delivered to the rest of the world. However, I don’t think I’m the only one who thinks that none of these appointees are there for our interests in any way, shape, or form.

For more information and the original story, check out this link.

If you liked this post, then please be sure to subscribe to my feed.

Geekery

[Update] – a comment left by TLP gives a much better solution to the problem, and it seems to perform better in benchmarks as well. I’m changing the post to reflect the best and fastest method for the situation described.

In a previous post, I described a situation where we needed to remove a repeating dot in a user name. In that article, I mentioned the first site that came up when searching Google for a solution.

I thought I had come across a solution to what the author of that site was looking for. However, they clarified what they needed the function to do on their website and left me a comment with some more info:

“Thanks for citing us but the article was about making a unique chars string. For example: “aabbccaaaaaddee” will become “abcade“. That is what I accomplished in the article. I know very well regular expression and there are some way of accomplishing that but not in one step. Every char of the string must be separated by a separator (ex: comma, pipe, etc) and then apply the lookback regular expression.

The beauty of using regular expressions is this: a lot of steps that are needed to accomplish some string formating/parsing can be done in one step using RegExp. But if there are the same amount of steps it will be faster not to use RegExp.”

I love programming and I definitely love a challenge and so, webdev.andrei: I accept your challenge.

$str = 'aabbccaaaaaddee';
echo preg_replace('{(.)\1+}', '$1', $str);
// abcade

This simple expression is thanks to a commenter named TLP who dropped by. Let’s try to break it down to see how it works. The curly braces around {(.)\1+} are not a repeat count here: PHP’s preg functions accept bracket pairs as pattern delimiters, so they play the same role as the slashes in /(.)\1+/. The round brackets (.) form a capturing group that matches any single character (other than a newline, by default) and remembers it. The backslash-1 is a backreference, which lets the expression reuse part of its own match: it matches the same text the first group just captured, and the + after it means “one or more times.” So the pattern matches a character followed by one or more copies of itself, and preg_replace swaps that whole run for a single copy of the captured character (the $1).
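To make the breakdown a bit more concrete, here is a small sketch that wraps the one-liner in a function and tries it on a couple of inputs. The name collapse_repeats is made up for this example and is not part of TLP’s comment. The last line shows the same pattern written with the more common slash delimiters.

```php
<?php
// Hypothetical helper for illustration: collapse every run of a repeated
// character down to a single copy of that character.
function collapse_repeats(string $str): string
{
    // (.) captures one character, \1+ matches one or more repeats of it,
    // and $1 puts a single copy back.
    return preg_replace('{(.)\1+}', '$1', $str);
}

echo collapse_repeats('aabbccaaaaaddee'), "\n"; // abcade
echo collapse_repeats('cool...guy'), "\n";      // col.guy

// Same pattern with slash delimiters instead of curly braces.
echo preg_replace('/(.)\1+/', '$1', 'aabbccaaaaaddee'), "\n"; // abcade
```

Note that the second call also collapses the double “o”, since the pattern applies to every character and not just the dot.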

Before TLP’s comment came in, I had spent the night until the wee hours of the morning furthering my ‘regex-fu’. I ultimately came up with a simple loop that also seems to solve the problem.

$string = 'aabbccaaaaaddee';
$new_string = '';
$starting_char = 0;
while ($starting_char < strlen($string)) {
    // Match the run of letters. [A-Za-z] is used rather than [A-z],
    // which would also match the characters [ \ ] ^ _ and `.
    preg_match('/[A-Za-z]{2,}/', $string, $matches);
    // Take the character at the current position and keep it.
    $letter = $matches[0][$starting_char];
    $new_string .= $letter;
    // Collapse every run of that character down to a single copy.
    $regex = '/' . preg_quote($letter, '/') . '{2,}/';
    $string = preg_replace($regex, $letter, $string);
    $starting_char++;
}
echo $new_string;
// abcade

In short: it walks through the string one position at a time. At each position it takes the character found there, adds it to the result, and replaces every run of two or more of that character with a single version of itself. Now that it knows the run at the current position is a single character, it can move on to the next position and try that one out. It does this until it reaches the end of the string, when every run has been reduced to a single character.

And there you have it. This could be done with a different set of characters by altering the preg_match line and inputting a different range of characters instead of [A-Za-z]. Note that the loop assumes the whole string is made up of at least two characters from that range.
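Given the point in the quoted comment that plain string code can be the better choice when it takes the same number of steps, here is a rough sketch of the same idea with no regular expression at all. The function name collapse_repeats_loop is invented for this example, and the sketch assumes single-byte characters, since it compares the string one byte at a time.

```php
<?php
// Hypothetical regex-free version: keep a character only when it differs
// from the character immediately before it.
function collapse_repeats_loop(string $str): string
{
    $result = '';
    $length = strlen($str);
    for ($i = 0; $i < $length; $i++) {
        // Skip this byte if it repeats the previous one.
        if ($i > 0 && $str[$i] === $str[$i - 1]) {
            continue;
        }
        $result .= $str[$i];
    }
    return $result;
}

echo collapse_repeats_loop('aabbccaaaaaddee'), "\n"; // abcade
```

Unlike the loop above, this version does not care which characters the string contains, and it also handles empty and one-character strings.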

I hope this helps others and especially hope that webdev.andrei can get some use out of it.


Geekery

While at work I ran into an interesting conundrum when trying to validate and clean user names for the application I work on. Here is a description of the restrictions we put on our user names:

“ID may consists of A-Z, a-z, 0-9, underscores, and a single dot (.).”

While the project development staff is extremely talented, we didn’t have anyone who was particularly amazing with regular expressions. I decided to go ahead and take the plunge and start to do some research into the subject. I took a look around the internet and within the PHP documentation and couldn’t find an example of exactly what I was looking for. In fact, while searching for “php remove repeating character in string,” the first article listed in Google included a note at the bottom stating:

“I tried to have the same result using regular expressions but no success. If anyone resolved this using RegExp please share :).”

It was evident I was going to have to dive into the wonderful world of regex on my own. Our other filtering functions were easy enough. We have ranges of acceptable characters and some RFC-type regex checks on many of the inputs that we let our users submit. In most cases, we run the input through something like this:

cFilter::clean_variable($variable);

This would run a preg_replace with the necessary regular expression and remove the bad stuff.
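The internals of cFilter::clean_variable aren’t shown in this post, but a filter of that kind might look roughly like the sketch below. The function name strip_disallowed_chars and the exact character class are assumptions made for this example, based on the user name rule quoted earlier.

```php
<?php
// Hypothetical stand-in for a clean_variable-style filter: strip every
// character that is not a letter, digit, underscore, or dot.
function strip_disallowed_chars(string $value): string
{
    return preg_replace('/[^A-Za-z0-9_.]/', '', $value);
}

echo strip_disallowed_chars('cool guy!_99'), "\n"; // coolguy_99
echo strip_disallowed_chars('cool...guy'), "\n";   // cool...guy
```

A filter like this leaves runs of dots untouched, since each dot on its own is an allowed character.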

The challenging part was enforcing the single dot. Here’s why. Let’s assume that someone wants to enter the user name ‘cool...guy’. First note: I am escaping the ‘.’ character with a backslash (‘\’), because an unescaped dot in a regular expression matches any character.

$nickname = 'cool...guy';
// {2,} means "two or more" of the preceding character.
if (ereg("\.{2,}", $nickname, $reg)) {
    var_dump($reg);
}

This code produces the following output:

array(1) { [0]=> string(3) "..." }

So, if we were to run a replace using the simple “\.{2,}” with an empty replacement, we would end up blowing away all the dots. But that’s not what we’re looking for, because the user is allowed to have one dot in their name. So, simply replacing the offending area with blank characters was not going to work.

The second iteration I came up with was the following:

while (ereg("\.{2,}", $string, $reg)) {
    $string = str_replace($reg[0], ".", $string);
}

If you are familiar with PHP, you can probably see that I am using ereg with str_replace. So, why not just use ereg_replace? Well, that’d be a great plan. Here’s what I finally came up with:

$new_string = ereg_replace("\.{2,}", ".", $string);

So, if you wanted to use a different character, here’s an example of what you can do with this line of thinking. In this case, we want them to be able to have an ‘n’ in their name, but only one:

$string = 'coolnness';
$new_string = eregi_replace("n{2,}", "n", $string);
echo $new_string;

If you wanted to remove the offending characters altogether, then use an empty string as the second parameter, like so:

$new_string = eregi_replace("n{2,}", "", $string);
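One caveat for anyone reading this on a newer PHP: the ereg family (ereg, ereg_replace, eregi_replace) was deprecated in PHP 5.3 and removed in PHP 7.0. A rough sketch of the same three replacements using the preg functions might look like this; the sample strings and variable names are made up for illustration.

```php
<?php
// Collapse a run of two or more dots into one dot.
$dots = preg_replace('/\.{2,}/', '.', 'cool...guy');
echo $dots, "\n"; // cool.guy

// The i modifier makes the match case-insensitive, like eregi_replace.
$single_n = preg_replace('/n{2,}/i', 'n', 'coolNNess');
echo $single_n, "\n"; // coolness

// An empty replacement removes the repeated run entirely.
$removed = preg_replace('/n{2,}/i', '', 'coolnness');
echo $removed, "\n"; // cooless
```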

Hope this helps someone else ’cause it sure took me a while to figure out.