I am going through learncodethehardway's regular expression courses.
One of the patterns is [A-Za-z]+?
The author explains that adding the ? makes this pattern "non-greedy".
However, I think the following two are equivalent. Shouldn't they always produce the same output when I use them to parse strings?
[A-Za-z]+?
[A-Za-z]+
Am I correct?
Answer
Actually, no, they are not the same. If you include the "?", the quantifier will match as FEW characters as possible. Here's an example:
var string = 'abcdefg';
alert(string.match(/[A-Za-z]+/));
var string = 'abcdefg';
alert(string.match(/[A-Za-z]+?/));
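For reference, here is a minimal sketch of the same comparison that prints the two results side by side. It uses console.log instead of alert so it also runs outside a browser, and the variable names are only illustrative.

```javascript
// Same input, two quantifiers: greedy "+" versus lazy "+?".
const text = 'abcdefg';

// Greedy: "+" grabs as many letters as it can.
const greedy = text.match(/[A-Za-z]+/)[0];

// Lazy: "+?" stops as soon as the minimum of one letter is matched.
const lazy = text.match(/[A-Za-z]+?/)[0];

console.log(greedy); // abcdefg
console.log(lazy);   // a
```

So the two patterns are not interchangeable: with nothing after the quantifier, the lazy version stops at a single character.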
Now, the "?" can still match multiple characters, but only if it has to, like this:
var string = 'abcdefg>';
alert(string.match(/[A-Za-z]+?>/));
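A small sketch of why the lazy quantifier expands here: the ">" after it must match for the overall pattern to succeed, so the engine keeps adding one letter at a time until it gets there. The variable names are made up for illustration.

```javascript
// When something must follow the quantifier, lazy and greedy can agree.
const withBracket = 'abcdefg>';

// Lazy: tries "a", then "ab", ... and only succeeds once ">" can follow.
const lazyForced = withBracket.match(/[A-Za-z]+?>/)[0];

// Greedy: grabs all the letters first, then matches ">".
const greedyForced = withBracket.match(/[A-Za-z]+>/)[0];

console.log(lazyForced);   // abcdefg>
console.log(greedyForced); // abcdefg>
```

Both patterns return the same text for this input; the difference only shows up when there is more than one place the rest of the pattern could match.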
Now it gets a little confusing. Check out this example that does NOT include the "?" (the dot character matches any character except a line terminator such as a newline):
var string = '<b>sldkfjsldkj>';
alert(string.match(/<.+>/));
You can see that it matches everything. However, with the "?", it will match only up to the first ">".
var string = '<b>sldkfjsldkj>';
alert(string.match(/<.+?>/));
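To make the difference easier to see, here is a sketch that collects every match with the g flag. The input string is a hypothetical one chosen for illustration, not taken from the examples above.

```javascript
// Hypothetical input with two angle-bracket tags.
const tagged = '<b>bold</b>';

// Greedy: ".+" runs to the LAST ">" it can reach, so there is one big match.
const greedyTags = tagged.match(/<.+>/g);

// Lazy: ".+?" stops at the FIRST ">" each time, so each tag is matched separately.
const lazyTags = tagged.match(/<.+?>/g);

console.log(greedyTags); // [ '<b>bold</b>' ]
console.log(lazyTags);   // [ '<b>', '</b>' ]
```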
Now it's time for a practical example of why we would need the "?" symbol. Suppose I want to match everything between HTML tags:
var string = '<strong>I\'m strong<\/strong> I\'m not strong <strong>I am again.<\/strong>';
alert(string.match(/<strong>[\s\S]+<\/strong>/));
As you can see, that matched EVERYTHING between the first and last tags. To match ONLY one, use the life-saving "?" symbol again:
var string = '<strong>I\'m strong<\/strong> I\'m not strong <strong>I am again.<\/strong>';
alert(string.match(/<strong>[\s\S]+?<\/strong>/));
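If you want the text inside every pair of tags rather than just the first, one possible approach is a capture group combined with matchAll. This is only a sketch: it assumes an environment that supports String.prototype.matchAll (ES2020), and the variable names are illustrative.

```javascript
// Input with two <strong> sections and plain text between them.
const html = "<strong>I'm strong</strong> I'm not strong <strong>I am again.</strong>";

// Lazy: the capture group stops at the nearest closing tag, once per section.
const lazyContents = [...html.matchAll(/<strong>([\s\S]+?)<\/strong>/g)]
  .map(function (m) { return m[1]; });

// Greedy: the capture group runs on to the LAST closing tag.
const greedyContent = html.match(/<strong>([\s\S]+)<\/strong>/)[1];

console.log(lazyContents);  // [ "I'm strong", 'I am again.' ]
console.log(greedyContent); // I'm strong</strong> I'm not strong <strong>I am again.
```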