I'm looking for a .NET regular expression to extract all the URLs from a webpage, but I haven't found one comprehensive enough to cover all the different ways you can specify a link.
And a side question:
Is there one regex to rule them all? Or am I better off using a series of less complicated regular expressions and making multiple passes against the raw HTML? (Speed vs. Maintainability)
Answer
((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)
I took this from regexlib.com
[editor's note: the {1} has no real function in this regex; see this post]
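To see what this pattern actually captures, here is a minimal sketch. It is written in Python rather than .NET, on the assumption that this particular pattern behaves the same way in both engines; the function name `extract_urls` and the sample strings are made up for illustration.

```python
import re

# The pattern from the answer, copied verbatim (including the redundant {1}).
URL_PATTERN = re.compile(r"((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)")


def extract_urls(text):
    """Return every full match of URL_PATTERN in `text`, in order of appearance."""
    # group(0) is the whole match; the inner groups only capture the scheme parts.
    return [m.group(0) for m in URL_PATTERN.finditer(text)]


if __name__ == "__main__":
    plain = ("See http://example.com/a and ftp://files.example.org/x.zip "
             "or mailto:someone@example.com today")
    for url in extract_urls(plain):
        print(url)

    # In raw HTML the match runs on to the next whitespace character,
    # so the closing quote and any following markup are swept in too.
    html = '<a href="http://example.com/page">link</a>'
    print(extract_urls(html))
```

Because the pattern ends in `\S+`, each match extends to the next whitespace character. That works for plain text, but against raw HTML the result includes the closing quote and whatever markup follows it, so the matches would likely need trimming or a stricter character class.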