Parsing HTML with preg_match
April 8th, 2010
preg_match_all('#<b>(.+?)</b>#i', $html, $matches);
// ? means non-greedy and is absolutely critical
$matches[0] contains the full matches including the tags, and $matches[1] contains the text captured between them.
Non-greedy matching stops the match from gobbling everything between the first bold tag and the very last. I’ve never quite understood why greedy is the default behavior of RegExp (Regular Expressions), but I guess there’s a good reason for it.
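To make the difference concrete, here is a small sketch that runs the same pattern with and without the ?. The sample string and the variable names $greedy and $lazy are made up for illustration, and the pattern uses a capture group so the text between the tags lands in index 1.

```php
<?php
// Hypothetical sample input, not taken from a real page.
$html = 'A <b>first</b> and a <b>second</b> bold phrase.';

// Greedy: .+ runs on to the LAST </b>, so the whole span is one match.
preg_match_all('#<b>(.+)</b>#i', $html, $greedy);

// Non-greedy: .+? stops at the NEAREST </b>, so each tag pair matches separately.
preg_match_all('#<b>(.+?)</b>#i', $html, $lazy);

print_r($greedy[1]); // one element: "first</b> and a <b>second"
print_r($lazy[1]);   // two elements: "first" and "second"
```

Note that . does not match newlines by default, so a bold phrase that spans several lines would probably need the s modifier as well (#is).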
PHP also has the very useful strip_tags function, which removes HTML and PHP tags from a string.
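If all you want is the text without the markup, a quick sketch with strip_tags might look like this. The input strings are invented for the example; the optional second argument lists tags to keep.

```php
<?php
// Hypothetical sample input.
$html = 'A <b>first</b> and a <b>second</b> bold phrase.';

// Remove every tag, leaving only the text.
$text = strip_tags($html);

// Remove everything except <b> tags (second argument = allowed tags).
$keepBold = strip_tags('<p>Some <b>bold</b> text</p>', '<b>');

echo $text, "\n";     // A first and a second bold phrase.
echo $keepBold, "\n"; // Some <b>bold</b> text
```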