Thursday, 29 August 2013

Looking for difference between re.match(pattern, ...) and re.search(r'\A' + pattern, ...)

(All the code below assumes a context where import re has already been
evaluated.)
The documentation on the differences between re.match and re.search
specifically compares running re.match(pattern, ...) with running
re.search('^' + pattern, ...). This seems to me a bit of a strawman,
because the real test would be to compare re.match(pattern, ...) with
re.search(r'\A' + pattern, ...)¹.
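For reference, here is the documentation's own illustration as a small Python 3 sketch. The string 'A\nB\nX' is the one the docs use. Under re.MULTILINE, '^' also matches right after each newline, so search('^' + pattern) can find a match that match(pattern) cannot. \A is not affected by MULTILINE, which is why it makes for the fairer comparison.

```python
import re

# Under MULTILINE, '^' also matches just after every newline, so
# search('^' + pattern) finds a match that match(pattern) does not.
string = 'A\nB\nX'
m_match = re.match('X', string, re.MULTILINE)
m_caret = re.search('^X', string, re.MULTILINE)
m_anchor = re.search(r'\AX', string, re.MULTILINE)

print(m_match)          # None
print(m_caret.span())   # (4, 5)
print(m_anchor)         # None: \A ignores MULTILINE
```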
To be more specific, I for one can't readily come up with a combination of
pattern and string for which the outcome of
m = re.match(pattern, string)
will differ from the outcome of
m = re.search(r'\A' + pattern, string)
(Note that if pattern happens to be of type unicode, then so is
r'\A' + pattern, conveniently enough.)
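The type promotion in that note is Python 2 behaviour, where str + unicode yields unicode. In Python 3, str and bytes do not mix, and r'\A' + b'abc' raises TypeError. A pattern-building helper therefore has to pick a prefix of the same type as the pattern. The following is a minimal sketch, and the name anchor is hypothetical:

```python
import re

def anchor(pattern):
    """Hypothetical helper: prepend \\A, keeping pattern's str/bytes type."""
    if isinstance(pattern, bytes):
        return br'\A' + pattern   # bytes prefix for bytes patterns
    return r'\A' + pattern        # str prefix for str patterns

try:
    r'\A' + b'abc'                # Python 3 refuses str + bytes
except TypeError as exc:
    print('mixing str and bytes fails:', exc)

print(anchor('abc'))                               # \Aabc
print(re.search(anchor(b'ab'), b'abc').group())    # b'ab'
```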
Let me emphasize that here I'm not interested in possible differences in
performance. At the moment I'm interested only in differences in the final
outcomes (i.e. differences in the final values of m).
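Match objects have no value-based equality: they compare by identity, so two separate calls never return == match objects. Comparing "the final values of m" therefore means comparing what they report. The sketch below assumes that span() and groups() are a sufficient summary, and the helper names are hypothetical. It also shows one thing such a check turns up. Plain concatenation does not group the pattern, so r'\A' + 'a|b' is r'\Aa|b', which parses as (\Aa)|b and leaves the second alternative unanchored.

```python
import re

def outcome(m):
    """Reduce a match result to comparable data: None, or (span, groups)."""
    return None if m is None else (m.span(), m.groups())

def match_vs_anchored_search(pattern, string):
    """Hypothetical helper: outcomes of match vs. search with r'\\A' prefix."""
    return (outcome(re.match(pattern, string)),
            outcome(re.search(r'\A' + pattern, string)))

# Ordinary patterns agree:
print(match_vs_anchored_search('a(b)', 'abc'))  # (((0, 2), ('b',)), ((0, 2), ('b',)))
print(match_vs_anchored_search('b', 'abc'))     # (None, None)

# A top-level '|' escapes the anchor: r'\Aa|b' means (\Aa)|b.
print(match_vs_anchored_search('a|b', 'xb'))    # (None, ((1, 2), ()))

# A non-capturing group keeps every alternative anchored.
print(re.search(r'\A(?:' + 'a|b' + ')', 'xb'))  # None
```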
To phrase the question somewhat more generally, I'm looking for a
combination of pattern, flags, string, and kwargs such that the final
value of m in
r0 = re.compile(pattern, flags=flags)
m = r0.match(string, **kwargs)
differs from the final value of m in
r1 = re.compile(r'\A' + pattern, flags=flags)
m = r1.search(string, **kwargs)
Alternatively, someone with sufficiently in-depth knowledge of the
implementation of Python's regular expressions may answer this question in
the negative (i.e. no combination of pattern, flags, string, and kwargs
will produce different outcomes above).
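The generalized form of the question can be probed with a small harness. In this sketch, compiled_outcomes is a hypothetical name, and reducing each result to its span is an assumption about what counts as the "outcome". One combination worth trying is the pos keyword. Pattern.match(string, pos) anchors the match at pos, but \A only matches at the real beginning of the string (the documentation for Pattern.search makes the same point about '^'). As a result, the anchored search cannot match once pos > 0.

```python
import re

def compiled_outcomes(pattern, string, flags=0, **kwargs):
    """Hypothetical helper: spans from r0.match and r1.search (None if no match)."""
    r0 = re.compile(pattern, flags)
    r1 = re.compile(r'\A' + pattern, flags)
    m0 = r0.match(string, **kwargs)
    m1 = r1.search(string, **kwargs)
    # Reduce each match object to its span so the two results can be compared.
    return (None if m0 is None else m0.span(),
            None if m1 is None else m1.span())

print(compiled_outcomes('a', 'ab'))                  # ((0, 1), (0, 1))
print(compiled_outcomes('A', 'ab', re.IGNORECASE))   # ((0, 1), (0, 1))
print(compiled_outcomes('a', 'ab', endpos=1))        # ((0, 1), (0, 1))
# With pos, match() anchors at pos, but \A still means "start of string".
print(compiled_outcomes('b', 'ab', pos=1))           # ((1, 2), None)
```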



¹ \A anchors the matching to the beginning of the string, whether or not
the MULTILINE flag is set. BTW, in Python the counterpart of \A for
end-of-string matching is \Z; Perl and PCRE spell it \z (their \Z also
matches just before a trailing newline). I remember \A and \z as "the very
beginning" and "the very end" of the string, and my mnemonic for them (if
one may call it that) is that, in the standard sort order, A < Z < a < z.
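In Python's re module the absolute end-of-string anchor is \Z, and it differs from $ much as \A differs from ^. The $ anchor also matches just before a trailing newline, and under MULTILINE before every newline, while \Z matches only at the very end. A quick check:

```python
import re

# '$' may match just before a final newline; \Z only at the very end.
dollar = re.search(r'a$', 'a\n')
big_z = re.search(r'a\Z', 'a\n')
print(dollar.span())     # (0, 1)
print(big_z)             # None

# MULTILINE lets '$' match before any newline; \Z is unaffected.
dollar_ml = re.search(r'a$', 'a\nb', re.MULTILINE)
big_z_ml = re.search(r'a\Z', 'a\nb', re.MULTILINE)
print(dollar_ml.span())  # (0, 1)
print(big_z_ml)          # None
```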
