Thursday, June 21, 2018

php - Multiple matches within a regex group?



I need to match all 'tags' (e.g. %thisIsATag%) that occur within XML attributes. (Note: I'm guaranteed to receive valid XML, so there is no need to use full DOM traversal.) My regex works, except that when a single attribute contains two tags, only the last one is returned.



In other words, this regex should find tag1, tag2, ..., tag6. However, it omits tag2 and tag5.




Here's a fun little test harness for you (PHP):




<?php
$xml = <<<XML































XML;

$matches = null;
preg_match_all('#<[^>]+("([^%>"]*%([^%>"]+)%[^%>"]*)+"|\'([^%>\']*%([^%>\']+)%[^%>\']*)+\')[^>]*>#i', $xml, $matches);

print_r($matches);

?>


Thanks! :)


Answer



What you're trying to do is recover intermediate captures from groups that match more than once per regex match. As far as I know, only .NET and Perl 6 provide that capability. You'll have to do the job in two stages: match an attribute value with one or more %tag% sequences in it, then break out the individual sequences.
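To illustrate the limitation, here is a minimal sketch on a made-up input string (not the XML from the question). The capturing group sits inside a repeated group, so it matches twice, but only the capture from the final iteration is kept:

```php
<?php
// Hypothetical input: two %tag% sequences separated by a space.
$subject = '%tag1% %tag2%';

// The inner group (\w+) runs once per iteration of the outer (?:...)+ group.
preg_match('/^(?:%(\w+)%\s*)+$/', $subject, $m);

// PCRE overwrites the capture on every iteration, so only the last one remains.
echo $m[1], "\n"; // prints "tag2"; "tag1" is not recoverable from $m
```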



You don't seem to care which XML tag or attribute the values are associated with, so you could use this somewhat simpler regex to find the values with %tag% sequences in them:



'#"([^"%<>]*+%[^%"]++%[^"]*+)"|\'([^\'%<>]*+%[^%\']++%[^\']*+)\'#'



EDIT: That regex captures the attribute value in group 1 or group 2, depending on which quotes were used. Here's another version that merges the alternatives so it can always save the value in group 2:



'#(["\'])((?:(?![%<>]|\1).)*+%(?:(?!%|\1).)++%(?:(?!\1).)*+)\1#'
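The two stages could be wired together roughly as follows. This is only a sketch: the function name extractTags and the sample input are made up for illustration, and the second-stage pattern assumes that percent signs inside an attribute value appear only as tag delimiters.

```php
<?php
// Hypothetical helper: returns every %tag% name found inside quoted values.
function extractTags(string $xml): array
{
    // Stage 1: the merged regex from above; group 2 holds the attribute value.
    $valuePattern = '#(["\'])((?:(?![%<>]|\1).)*+%(?:(?!%|\1).)++%(?:(?!\1).)*+)\1#';
    preg_match_all($valuePattern, $xml, $values);

    $tags = [];
    foreach ($values[2] as $value) {
        // Stage 2: break the value into its individual %tag% sequences.
        // Assumption: '%' never occurs in a value except as a tag delimiter.
        preg_match_all('/%([^%]+)%/', $value, $found);
        foreach ($found[1] as $tag) {
            $tags[] = $tag;
        }
    }
    return $tags;
}

// Made-up sample input with two tags in one attribute and one in another.
$sample = '<node attr1="%tag1% and %tag2%" attr2=\'%tag3%\'>text</node>';
print_r(extractTags($sample)); // tag1, tag2 and tag3, in document order
```

Because the second pass runs on each captured value separately, the number of tags per attribute no longer matters.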
