Getting the URLs, and only the URLs, with ereg

Trying to match every weird and wonderful URL or hyperlink in a piece of text is fun. I found “help” all over the place, but none of it quite got everything. This became clear when I used the eccentric holidays calendar as a test case for my iCal upcoming events list.

Eventually, using “The Regex Coach” for testing (thank you very much), we (I’m a Gemini) settled on the following code. It finds all of the links listed below, while excluding any text that is already hyperlinked.

Dasher, Dancer, Prancer, Vixen, Comet, Cupid, Donner, Blixen….

The code handles URLs with dashes, underscores, trailing slashes, file suffixes, query strings (?, & and =) and fragments (#). Matching stops at any other unusual character.

Each regular expression below goes in place of EXPR in this call:

$text = ereg_replace( EXPR , "<a href=\"\\0\">\\0</a>", $text);

For http, https, ftp and other scheme:// URLs:

"[a-zA-Z]+://([.]?[a-zA-Z0-9-])*([/]?[a-zA-Z0-9_-])*([/a-zA-Z0-9?&#\._=-]*)"

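ereg_replace() was deprecated in PHP 5.3 and removed in PHP 7.0, so on a current PHP the pattern above has to go through preg_replace() instead. A minimal sketch, assuming PHP 7 or later: the POSIX pattern is wrapped in ~ delimiters (so the slashes need no escaping) and the \\0 back-reference becomes $0. The function name link_scheme_urls is hypothetical.

```php
<?php
// Sketch: the scheme-URL pattern, ported from POSIX ereg to PCRE.
// '~' is the delimiter, so '/' inside the pattern needs no escaping.
function link_scheme_urls(string $text): string
{
    $pattern = '~[a-zA-Z]+://([.]?[a-zA-Z0-9-])*([/]?[a-zA-Z0-9_-])*([/a-zA-Z0-9?&#._=-]*)~';
    // $0 is the whole match, the PCRE equivalent of ereg's \\0.
    return preg_replace($pattern, '<a href="$0">$0</a>', $text);
}

echo link_scheme_urls('See http://smarmycarny.com/get-over-it-day for details.'), "\n";
```

The trailing full stop after "details" is left outside the link, because the match ends at the space after the URL.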
And another for www URLs (at the start of the text or a line, or after a space):

"(^| |\n)(www([.]?[a-zA-Z0-9-])*)([/]?[a-zA-Z0-9_-])*([/a-zA-Z0-9?&#\._=-]*)"
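One catch with the www pattern: because \\0 is the whole match, the leading space or newline ends up inside the link, and the href has no scheme, so browsers resolve it relative to the current page. A minimal sketch of one way round that, assuming PHP 7 or later (ereg_replace() was removed in 7.0, so this uses preg_replace()) and assuming http:// is an acceptable default for bare www links. The function name link_www_urls is hypothetical, and the regrouping into $1 and $2 is my adjustment, not the original code.

```php
<?php
// Sketch: the www pattern ported to PCRE, with the leading space/newline
// captured separately as $1 and the URL itself captured as $2.
// Assumption: bare www links should point at http://.
function link_www_urls(string $text): string
{
    $pattern = '~(^|[ \n])(www(?:[.]?[a-zA-Z0-9-])*(?:[/]?[a-zA-Z0-9_-])*[/a-zA-Z0-9?&#._=-]*)~';
    // Put the prefix back outside the link and add a scheme to the href.
    return preg_replace($pattern, '$1<a href="http://$2">$2</a>', $text);
}

echo link_www_urls('Try www.smarmycarny.com/get-over-it-day today'), "\n";
```

A www address inside an existing href="..." is preceded by a quote rather than a space or line start, so this pattern leaves it alone.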

These two expressions handle the following examples:

  • http://smarmycarny.com/get-over-it-day
  • http://www.smarmycarny.com/get-over-it-day
  • www.smarmycarny.com/get-over-it-day
  • www.smarmycarny.com/get_over_it-day/
  • www.smarmycarny.com/get-over-it-day/test.htm
  • www.smarmycarny.com/get-over-it-day/test.php?somequery&yes=true
  • www.smarmycarny.com/get-over-it-day/test.html#bot-tom
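To check the list above, here is a small sketch that tests each example against the two patterns, ported to PCRE for PHP 7 or later (ereg was removed in PHP 7.0). The www pattern's leading (^| |\n) is left off because each URL is tested on its own, and both patterns are anchored with \A and \z so only a full match counts. matches_whole() is a hypothetical helper.

```php
<?php
// Check that each example URL is matched in full by one of the two patterns,
// ported to PCRE. The www pattern's (^| |\n) prefix is left off because
// each URL is tested on its own.
$scheme = '[a-zA-Z]+://([.]?[a-zA-Z0-9-])*([/]?[a-zA-Z0-9_-])*([/a-zA-Z0-9?&#._=-]*)';
$www    = 'www([.]?[a-zA-Z0-9-])*([/]?[a-zA-Z0-9_-])*([/a-zA-Z0-9?&#._=-]*)';

// Hypothetical helper: true only if $body matches the whole of $url.
function matches_whole(string $body, string $url): bool
{
    return preg_match('~\A(?:' . $body . ')\z~', $url) === 1;
}

$examples = [
    'http://smarmycarny.com/get-over-it-day',
    'http://www.smarmycarny.com/get-over-it-day',
    'www.smarmycarny.com/get-over-it-day',
    'www.smarmycarny.com/get_over_it-day/',
    'www.smarmycarny.com/get-over-it-day/test.htm',
    'www.smarmycarny.com/get-over-it-day/test.php?somequery&yes=true',
    'www.smarmycarny.com/get-over-it-day/test.html#bot-tom',
];

// Collect any example that neither pattern matches completely.
$unmatched = array_filter(
    $examples,
    fn(string $url): bool => !(matches_whole($scheme, $url) || matches_whole($www, $url))
);

foreach ($examples as $url) {
    printf("%-66s %s\n", $url, in_array($url, $unmatched, true) ? 'NOT matched' : 'matched');
}
```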

Strange URLs are either ignored or matched only up to the first character the patterns do not allow. See the example:

Demo of ignored and selected text
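A rough text-only version of that demo, assuming PHP 7 or later and using preg_match() in place of ereg. The two sample sentences are made up for illustration; they are not taken from the original demo.

```php
<?php
// Show what the two patterns (ported to PCRE) do with "strange" input.
$scheme = '~[a-zA-Z]+://([.]?[a-zA-Z0-9-])*([/]?[a-zA-Z0-9_-])*([/a-zA-Z0-9?&#._=-]*)~';
$www    = '~(^|[ \n])www([.]?[a-zA-Z0-9-])*([/]?[a-zA-Z0-9_-])*([/a-zA-Z0-9?&#._=-]*)~';

// '%' is in none of the character classes, so the match stops just before it.
preg_match($scheme, 'Visit http://smarmycarny.com/get%20over%20it today', $m);
echo $m[0], "\n";

// A www URL in brackets has no space or line start before it, so it is ignored.
$found = preg_match($www, 'Visit (www.smarmycarny.com) today');
echo $found, "\n";
```

The first call prints http://smarmycarny.com/get and the second prints 0, meaning no match.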