Thursday, June 20, 2019

python - Regex include line breaks





I have the following xml file




A





B
C




D



Picture number 3?




and I just want to get the text between

and
.
So I've tried this code :



import os, re

html = open("2.xml", "r")
text = html.read()

lon = re.compile(r'
\n(.+)\n
', re.MULTILINE)
lon = lon.search(text).group(1)
print lon


but It doesn't seem to work.


Answer



1) Don't parse XML with regex. It just doesn't work. Use an XML parser.



2) If you do use regex for this, you don't want re.MULTILINE, which controls how ^ and $ work in a multiple-line string. You want re.DOTALL, which controls whether . matches \n or not.




3) You probably also want your pattern to return the shortest possible match, using the non-greedy +? operator.



lon = re.compile(r'
\n(.+?)\n
', re.DOTALL)

No comments:

Post a Comment

plot explanation - Why did Peaches' mom hang on the tree? - Movies & TV

In the middle of the movie Ice Age: Continental Drift Peaches' mom asked Peaches to go to sleep. Then, she hung on the tree. This parti...