Friday, May 31, 2019

.net - Regular expression for parsing links from a webpage?



I'm looking for a .NET regular expression to extract all the URLs from a webpage, but I haven't found one comprehensive enough to cover all the different ways you can specify a link.



And a side question:



Is there one regex to rule them all? Or am I better off using a series of less complicated regular expressions and making multiple passes against the raw HTML? (Speed vs. Maintainability)



Answer



((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)


I took this from regexlib.com



[editor's note: the {1} has no real function in this regex; see this post]
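To see what the pattern actually captures, here is a minimal sketch. It is written in Python rather than .NET purely for brevity, on the assumption that the pattern behaves the same in both engines, since it uses only basic constructs (groups, alternation, `?`, `{1}`, `\S+`). The sample HTML and the helper name `extract` are made up for this illustration.

```python
import re

# The pattern exactly as posted in the answer.
ORIGINAL = re.compile(r"((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)")

# The same pattern without the redundant {1} and the unneeded escapes.
SIMPLIFIED = re.compile(r"((mailto:|(news|(ht|f)tp(s?))://)\S+)")

# Hypothetical sample input, not taken from the original post.
html = (
    '<a href="http://example.com/page">Home</a> '
    '<a href="mailto:someone@example.com">Mail</a> '
    '<a href="/relative/path">Relative</a>'
)


def extract(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in pattern.finditer(text)]


original_hits = extract(ORIGINAL, html)
simplified_hits = extract(SIMPLIFIED, html)

for hit in original_hits:
    print(hit)
```

On this sample both patterns return the same two matches, which is consistent with the editor's note about `{1}`. The run also shows two limits of the pattern on raw HTML: `\S+` keeps consuming until the next whitespace, so each match drags along the closing quote and the markup after it (`http://example.com/page">Home</a>`), and the relative link is not matched at all because it has no scheme prefix.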

