I'm stuck on a regex. I'm trying to match words in any language to the right of a colon without matching the colon itself.
The basic rule:
For a line to be valid, it must not begin with or contain any characters outside of [a-z0-9_] until after the colon (:). Any characters to the right of the : should match as long as the line begins with the set of characters defined above.
For instance, given strings such as these:
this string should not match
bob_1:Hi. I'm Bob. I speak русский and this string should match
alice:Hi Bob. I speak 한국어 and this string should also match
http://example.com - would prefer to not match URLs
This string:should not match because no spaces or capital letters are allowed left of the colon
Only 2 of the 5 strings above need to match. And only to the right of the colon.
Hi. I'm Bob. I speak русский and this string should match
Hi Bob. I speak 한국어 and this string should also match
I'm currently using (^[a-z0-9_]+(?=:)) to match characters to the left of the colon. I just can't seem to reverse the logic.
The closest I have at the moment is (?!(?!:)).+ which seems to match everything to the right of the colon as well as the colon itself. I just can't figure out how to not include the : in the match.
Can one of you regex wizards help me out? If anything is unclear please let me know.
Answer
Short regex pattern (case insensitive):
^\w+:(\w.*)
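\w - matches any word character (equal to [a-zA-Z0-9_] in PCRE's default, non-Unicode mode). The text to the right of the colon is captured in group 1. Note that \w also accepts uppercase letters to the left of the colon; use [a-z0-9_]+ there if the rule from the question must be enforced strictly.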
https://regex101.com/r/MZhqSL/6
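As a rough check of the capture-group approach, here is a minimal Python sketch run against the five sample lines from the question. It is an illustration under assumptions, not the answerer's code: the helper name right_of_colon is hypothetical, and the left-hand side uses [a-z0-9_] instead of \w because Python 3's \w on str patterns also matches uppercase and non-ASCII letters.

```python
import re

# Explicit class on the left of the colon (assumption: the question's rule is
# enforced strictly). The text right of the colon lands in capture group 1.
PATTERN = re.compile(r"^[a-z0-9_]+:(\w.*)")

def right_of_colon(line):
    """Return the text after the colon, or None if the line is not valid."""
    m = PATTERN.match(line)
    return m.group(1) if m else None

samples = [
    "this string should not match",
    "bob_1:Hi. I'm Bob. I speak русский and this string should match",
    "alice:Hi Bob. I speak 한국어 and this string should also match",
    "http://example.com - would prefer to not match URLs",
    "This string:should not match because no spaces or capital letters are allowed left of the colon",
]

# One entry per sample line: the captured text, or None for rejected lines.
matches = [right_of_colon(s) for s in samples]
```

The URL is rejected because the character right after http: is a slash, which \w does not match.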
Since you tagged the question pcre, here's the pattern you need (it matches only the text to the right of the colon):
^\w+:\K\w.*
\K - resets the starting point of the reported match. Any previously consumed characters are no longer included in the final match.
https://regex101.com/r/E1yHVY/1
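Not every engine supports \K. Python's standard re module, for example, does not, so one way to get a similar effect there is to read the span of a capture group rather than the span of the whole match. The sketch below assumes that substitution; the helper name reported_span and the sample line are hypothetical.

```python
import re

# No \K here: group 1 marks where the reported match should begin instead.
PATTERN = re.compile(r"^[a-z0-9_]+:(\w.*)")

def reported_span(line):
    """Return (start, end) of the text right of the colon, or None."""
    m = PATTERN.match(line)
    return m.span(1) if m else None

line = "bob_1:Hi. I'm Bob."
span = reported_span(line)

# Slicing with the span yields only the part after the colon.
right_side = line[span[0]:span[1]]
```

The start offset of the span sits just past the colon, which is the position \K would reset the match to in PCRE.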