- `[abc]` – a set of the characters 'a', 'b', and 'c'. It'll match any of those characters, in any order, but only once each.
- `[a-z]`, `[A-Z]`, or `[a-zA-Z]` – ranges that'll match all English letters.
- `[0-9]` – a range matching the numbers 0 to 9.
- `[^2]` – exclude 2.
- `[^abc]` – exclude the letters 'a', 'b', and 'c'.
- `re.IGNORECASE` or `re.I` – case-insensitivity flag. `re.match('A', 'apple', re.I)` would find the 'a' in 'apple'.
- `re.VERBOSE` or `re.X` – multi-line regular expression flag, which ignores whitespace and comments inside the pattern.
- `re.MULTILINE` or `re.M` – flag that makes `^` and `$` match at the beginning and end of each line in your text, not just of the whole string.
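A minimal sketch of how the case-insensitivity and multiline flags behave (the sample text here is made up for illustration):

```python
import re

text = "Apple pie\nbanana bread"

# re.I: case-insensitive, so the pattern 'a' matches the 'A' in 'Apple'
print(re.match('a', text, re.I))        # <re.Match ...; match='A'>

# Without re.M, ^ only anchors at the start of the whole string
print(re.findall(r'^\w+', text))        # ['Apple']

# With re.M, ^ anchors at the start of every line
print(re.findall(r'^\w+', text, re.M))  # ['Apple', 'banana']
```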
```python
import re

names_file = open("names.txt", encoding="utf-8")
data = names_file.read()
names_file.close()

# Search for email addresses with typical email formatting
print(re.findall(r'[-\w\d+.]+@[-\w\d.]+', data))

# Search for 'Treehouse' with IGNORECASE
print(re.findall(r'\b[trehous]{9}\b', data, re.IGNORECASE))
print(re.findall(r'\b[trehous]{9}\b', data, re.I))
```
Multi-line Regex
```python
import re

names_file = open("names.txt", encoding="utf-8")
data = names_file.read()
names_file.close()

print(re.findall(r'''
    \b@[-\w\d.]*  # First a word boundary, an @, and then any number of characters
    [^gov\t]+     # Ignore 1+ instances of the letters 'g', 'o', or 'v' and a tab.
    \b            # Match another word boundary
''', data, re.VERBOSE | re.I))

print(re.findall(r"""
    \b[-\w]*,     # Find a word boundary, 1+ hyphens or characters, and a comma
    \s            # Find 1 whitespace
    [-\w ]+       # 1+ hyphens and characters and explicit spaces
    [^\t\n]       # Ignore tabs and newlines
""", data, re.X))
```
Exercise
Create a function named `find_words` that takes a count and a string. Return a list of all the words in the string that are `count` word characters long or longer.
```python
import re

# EXAMPLE:
# >>> find_words(4, "dog, cat, baby, balloon, me")
# ['baby', 'balloon']

def find_words(count, string):
    return re.findall(r'\w{{{},}}'.format(count), string)
    # OR, equivalently, with %-formatting:
    # return re.findall(r'\w{%d,}' % count, string)
```
When building the pattern with `str.format`, the literal braces must be doubled (`{{ }}`) inside the raw string so they aren't confused with the regex repetition braces.
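For instance, both formatting styles produce the same pattern (a quick sketch, using the count of 4 from the example above):

```python
# With str.format, doubled braces {{ }} become literal braces and {} is the placeholder
print(r'\w{{{},}}'.format(4))  # \w{4,}

# With %-formatting, the regex braces need no escaping
print(r'\w{%d,}' % 4)          # \w{4,}
```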