Groups

Apr 20 2017 - Posted by projecth

So far, our Match objects have contained only a single group.
It’s often handy to define several groups inside a pattern, so that you can later access just the parts of the text that you care about.

([abc]) – Group containing a set for ‘a’, ‘b’, and ‘c’. Can be accessed from the Match object as .group(1).
(?P<name>[abc]) – Names a group. It can later be accessed from the Match object as .group('name').
.groups() – Returns a tuple of all of the groups on a Match object.
^ – Beginning of the string (or of each line when re.MULTILINE is set).
$ – End of the string (or of each line when re.MULTILINE is set).
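
As a rough illustration of the syntax above, the following sketch mixes a numbered group and a named group in one anchored pattern. The sample text "cab" and the group name second are made up for this example.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "cab"

# First character in a numbered group, second in a named group,
# third left ungrouped. ^ and $ anchor the pattern to the whole string.
match = re.search(r"^([abc])(?P<second>[abc])[abc]$", text)

first = match.group(1)            # access by number
second = match.group("second")    # access by name
all_groups = match.groups()       # tuple of every group, in order

print(first, second, all_groups)  # c a ('c', 'a')
```

A named group still gets a number, so match.group(2) returns the same text as match.group("second").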

In our case, defining one group each for the name, the email address, and the phone number makes it much simpler to pull those values out and use them later.

import re

# The with block closes the file automatically, even if reading fails.
with open("names.txt", encoding="utf-8") as names_file:
    data = names_file.read()

line = re.search(r'''
    ^(?P<name>[-\w ]*,\s[-\w ]+)\t             # Last and first names
    (?P<email>[-\w\d.+]+@[-\w\d.]+)\t          # Email
    (?P<phone>\(?\d{3}\)?-?\s?\d{3}-\d{4})?\t  # Phone
    (?P<job>[\w\s]+,\s[\w\s.]+)\t?             # Job and company
    (?P<twitter>@[\w\d]+)?$                    # Twitter
    ''', data, re.X|re.M)

print(line)
# OUTPUT is a Match object:
# <_sre.SRE_Match object; span=(0, 86), match='Love, Kenneth\tkenneth@teamtreehouse.com\t(555) 5>

print(line.groupdict())
# OUTPUT is a Dictionary object.

.group('email')

Gets the contents of the group named email from the Match object.

.groupdict()

Returns a dictionary of every named group in the match: the keys are the group names and the values are the text each group matched. A named group that did not take part in the match maps to None.
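
To make .group('email') and .groupdict() concrete, here is a minimal sketch that does not need names.txt. The record below and its optional twitter group are assumptions made for this example, loosely modelled on the pattern above.

```python
import re

# Hypothetical tab-separated record, not taken from names.txt.
record = "Doe, Jane\tjane@example.com"

match = re.search(r'''
    ^(?P<name>[-\w ]+,\s[-\w ]+)\t   # Last and first names
    (?P<email>[-\w.+]+@[-\w.]+)      # Email
    \t?(?P<twitter>@\w+)?$           # Optional Twitter handle
    ''', record, re.X)

email = match.group("email")   # text of one named group
fields = match.groupdict()     # every named group, keyed by name

print(email)                   # jane@example.com
print(fields["name"])          # Doe, Jane
print(fields["twitter"])       # None, the group matched nothing
```

Because the record has no Twitter handle, the optional group is still listed in the dictionary, with None as its value.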

re.MULTILINE or re.M

A flag that makes ^ and $ match at the start and end of every line in the searched text, not only at the start and end of the whole string.
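
The effect of the flag is easiest to see by running the same search with and without it. This is a small sketch on a made-up two-line string, not on the course data.

```python
import re

# Hypothetical two-line sample.
text = "first @one\nsecond @two"

# Without re.M, ^ matches only at the very start of the string,
# so a pattern anchored to the second line finds nothing.
start_plain = re.search(r"^second", text)
# With re.M, ^ also matches right after each newline.
start_multi = re.search(r"^second", text, re.M)

# Without re.M, $ matches only at the end of the string.
end_plain = re.search(r"@\w+$", text)
# With re.M, $ also matches just before each newline.
end_multi = re.search(r"@\w+$", text, re.M)

print(start_plain)          # None
print(start_multi.group())  # second
print(end_plain.group())    # @two
print(end_multi.group())    # @one
```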

Exercise

Create a new variable named contacts that holds the result of an re.search() whose pattern captures the email address and phone number from string. Name the email group email and the phone number group phone. The commas and spaces should not be part of the groups.

Then make a new variable, twitters, that holds the result of an re.search() whose pattern captures a person’s Twitter handle. Remember to anchor it to the end of the line with $. You’ll also want to use the re.MULTILINE flag.

import re

string = '''Love, Kenneth, kenneth+challenge@teamtreehouse.com, 555-555-5555, @kennethlove
Chalkley, Andrew, andrew@teamtreehouse.co.uk, 555-555-5556, @chalkers
McFarland, Dave, dave.mcfarland@teamtreehouse.com, 555-555-5557, @davemcfarland
Kesten, Joy, joy@teamtreehouse.com, 555-555-5558, @joykesten'''

contacts = re.search(r'''
        (?P<email>[-\w+.]+@[-\w.]+),\s
        (?P<phone>[-\(\d\)\s]+)
        ''', string, re.X)

twitters = re.search(r'''
        (?P<twitter>@[\w\d]+)$
        ''', string, re.X|re.M)
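
One way to check the solution is to look at what each search captured. The sketch below repeats the two patterns on a shortened, two-line copy of the exercise string (the variable name sample is an assumption for this example) so that it runs on its own.

```python
import re

# First two lines of the exercise string, kept short for the check.
sample = '''Love, Kenneth, kenneth+challenge@teamtreehouse.com, 555-555-5555, @kennethlove
Chalkley, Andrew, andrew@teamtreehouse.co.uk, 555-555-5556, @chalkers'''

contacts = re.search(r'''
        (?P<email>[-\w+.]+@[-\w.]+),\s
        (?P<phone>[-\(\d\)\s]+)
        ''', sample, re.X)

twitters = re.search(r'''
        (?P<twitter>@[\w\d]+)$
        ''', sample, re.X|re.M)

print(contacts.group("email"))    # kenneth+challenge@teamtreehouse.com
print(contacts.group("phone"))    # 555-555-5555
print(twitters.group("twitter"))  # @kennethlove
```

re.search() stops at the first match, so both results come from the first line. The comma and space between the email and the phone number sit outside the groups, which is why they do not appear in either value.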
