
Regular Expressions - Using Special Sequences

Hello, I understand how this code works:
>>>
import re

pattern = r"(.+) \1"

match = re.match(pattern, "word word")
if match:
    print("Match 1")

match = re.match(pattern, "?! ?!")
if match:
    print("Match 2")

match = re.match(pattern, "abc cde")
if match:
    print("Match 3")
<<<
It's the first example for the lesson "special sequences" in the module "regular expressions" (course: Python Core), and it outputs Match 1 and Match 2. What I don't understand is why changing the pattern to r"(.)+ \1" outputs only Match 3, while the pattern r"(.) \1" doesn't output anything at all. I would naively have expected neither change to affect the output. But it obviously does; in fact, the output is almost the opposite. Could anyone explain why Python behaves like this? Thanks in advance, and cheers to all the coders out there who help!

13th Mar 2021, 3:04 PM
Fabian Barthold
3 Answers
The original r'(.+) \1' searches for one or more characters followed by a space, then the same character sequence again (capture group 1, which is the part inside the parentheses).

r'(.)+ \1' searches for one or more characters, with only the last one captured, followed by a space and then that captured character.

r'(.) \1' searches for a single character followed by a space and then the same character.

All regex patterns can give different results when used with match (which only matches at the start of the string, as if the pattern began with the ^ anchor) or with search (which finds a match anywhere inside the string):

import re

pattern = "world"
text = "Hello world!"
re.match(pattern, text)   # None
re.search(pattern, text)  # Match object
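To see the three patterns side by side, here is a minimal sketch that runs each of them against the three test strings from the question. The helper name `run` is hypothetical, chosen only for this illustration: it returns the whole match and group 1, or None when re.match finds nothing at the start of the string. The last line switches to re.search, which shows that r'(.) \1' does find something inside "abc cde" once the start-of-string restriction is dropped.

```python
import re

def run(pattern, text):
    """Hypothetical helper: (whole match, group 1) if pattern matches at the start of text, else None."""
    m = re.match(pattern, text)  # re.match only tries position 0 of the string
    return (m.group(0), m.group(1)) if m else None

patterns = [r"(.+) \1", r"(.)+ \1", r"(.) \1"]
texts = ["word word", "?! ?!", "abc cde"]  # the strings from the lesson example

for pattern in patterns:
    print("pattern", pattern)
    for text in texts:
        print("   ", repr(text), "->", run(pattern, text))

# re.search scans the whole string, so the "c c" inside "abc cde" is found
print(re.search(r"(.) \1", "abc cde"))
```

With re.match, only r'(.+) \1' matches "word word" and "?! ?!", only r'(.)+ \1' matches "abc cde" (the whole match being 'abc c'), and r'(.) \1' matches none of the three strings, which is exactly the behaviour described in the question.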
13th Mar 2021, 4:28 PM
visph
The dot . means one occurrence of any character except a newline, and the + sign means one or more occurrences of the preceding element. Hence (.)+ means one or more occurrences of any character. https://www.w3schools.com/python/python_regex.asp
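Where the + sits relative to the parentheses decides what the group remembers, even when the overall match is the same. A minimal sketch on the illustrative string "abc" (the variable names are only for this example); wrapping the repetition in an outer group is one common way to keep the whole run as well as the last character:

```python
import re

text = "abc"

# (.+): a single group that matches one or more characters, so group 1 holds the whole run
whole = re.match(r"(.+)", text)
print(whole.group(1))  # abc

# (.)+: a one-character group that is repeated; each repetition overwrites group 1,
# so only the last character is kept, although the overall match is still "abc"
last = re.match(r"(.)+", text)
print(last.group(0), last.group(1))  # abc c

# ((.)+): an outer group around the repetition keeps the whole run in group 1,
# while the inner group 2 still holds only the last character
both = re.match(r"((.)+)", text)
print(both.group(1), both.group(2))  # abc c
```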
13th Mar 2021, 9:40 PM
iTech
Hey guys, sorry, I've been busy for the last few weeks and didn't have much time for SoloLearn. Nonetheless, a heartfelt thank you to you both for answering my question! :)

@visph: Interesting! I didn't know, or simply forgot, that re.match only matches at the start of the string. I never really thought about the fact that the space is also interpreted as part of the regular expression; I simply thought it was a necessary "command" before the group recall (special sequence: \1). Also interesting that r'(.)+ \1' captures only the last character. That explains a lot, but it still seems a confusing default to me. Do you know why Python does it like this? I tested it with this code:
>>>
import re

pattern = r"(.)+ \1"

match = re.match(pattern, "abc cde")
print(match)
if match:
    print("Match 3")
<<<
Output:
<re.Match object; span=(0, 5), match='abc c'>
Match 3

Quite intriguing. So re.match first matches "abc" with (.)+, but the group only captures the character "c", which then becomes the text that the group recall \1 has to match. It finds that "c" again at the start of "cde", so only the "c" of the second word ends up in the match. Fascinating!
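As far as I know, this is documented behaviour rather than a quirk: the Python re documentation for Match.group states that if a group is contained in a part of the pattern that matched multiple times, the last match is returned. A minimal sketch confirming the observation above with Match.span, which reports where the whole match and each group matched; the second string "abc xyz" is an illustrative input, not from the lesson:

```python
import re

m = re.match(r"(.)+ \1", "abc cde")

# Whole match: "abc", the space, and the single "c" that \1 requires
print(m.group(0), m.span(0))  # abc c (0, 5)

# Group 1 was overwritten on every repetition of (.)+, so it keeps only
# the last character it matched: the "c" at index 2 of "abc"
print(m.group(1), m.span(1))  # c (2, 3)

# If the second word does not start with that last character, there is no match
print(re.match(r"(.)+ \1", "abc xyz"))  # None
```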
5th Apr 2021, 9:06 AM
Fabian Barthold