[abc] – a set containing the characters ‘a’, ‘b’, and ‘c’. It matches exactly one character that is any of those; add a quantifier, as in [abc]+, to match several of them in any order.
[a-z], [A-Z], or [a-zA-Z] – ranges that match any lowercase, any uppercase, or any English letter, respectively.
[0-9] – a range that matches any digit from 0 to 9.
[^2] – matches any character except 2.
[^abc] – matches any character except the letters ‘a’, ‘b’, and ‘c’.
re.IGNORECASE or re.I – case-insensitivity flag. re.match('A', 'apple', re.I) would find the ‘a’ in ‘apple’.
re.VERBOSE or re.X – flag that lets a regular expression span multiple lines by ignoring whitespace and comments inside the pattern.
re.MULTILINE or re.M – flag that makes ^ and $ match at the beginning and end of every line in your text, not only at the beginning and end of the whole string.
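The sets and flags above can be tried out on a small string. The following is a minimal sketch; the sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample text, three lines long
sample = "Apple pie\nbanana split\n42 cherries"

# [abc] matches ONE character at a time, so findall returns each hit separately
in_set = re.findall(r'[abc]', 'cabbage')

# [^abc] matches one character that is NOT 'a', 'b', or 'c'
not_in_set = re.findall(r'[^abc]', 'cabbage')

# [0-9]+ matches a run of one or more digits
digits = re.findall(r'[0-9]+', sample)

# re.I lets the pattern 'A' match a lowercase 'a'
ignore_case = re.match('A', 'apple', re.I).group()

# Without re.M, ^ only matches at the very start of the string
whole_string = re.findall(r'^[a-z]+', sample, re.I)

# With re.M, ^ also matches at the start of every line
each_line = re.findall(r'^[a-z]+', sample, re.I | re.M)

print(in_set)        # ['c', 'a', 'b', 'b', 'a']
print(not_in_set)    # ['g', 'e']
print(digits)        # ['42']
print(ignore_case)   # a
print(whole_string)  # ['Apple']
print(each_line)     # ['Apple', 'banana']
```

Flags are combined with the | operator, as in re.I | re.M.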
import re
with open("names.txt", encoding="utf-8") as names_file:
    data = names_file.read()
# Search for email addresses with typical email formatting
print(re.findall(r'[-\w\d+.]+@[-\w\d.]+', data))
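# Search for 'Treehouse', ignoring case. Note that [trehous]{9} matches any
# 9-letter word built only from those letters, not just 'Treehouse'.
print(re.findall(r'\b[trehous]{9}\b', data, re.IGNORECASE))
# re.I is shorthand for re.IGNORECASE, so this prints the same result
print(re.findall(r'\b[trehous]{9}\b', data, re.I))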
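Since names.txt is not shown here, the same patterns can be checked against an inline string. This is only a sketch: the sample data below is invented, and the literal pattern \btreehouse\b is added for comparison and is not part of the original listing.

```python
import re

# Invented sample data standing in for the contents of names.txt
data = "kenneth@teamtreehouse.com, dave+spam@example.co.uk, Treehouse, housetree"

# Same email pattern as above: 1+ allowed characters, an @, then 1+ domain characters
emails = re.findall(r'[-\w\d+.]+@[-\w\d.]+', data)

# The set-based pattern accepts any 9-letter word made of t, r, e, h, o, u, s
loose = re.findall(r'\b[trehous]{9}\b', data, re.I)

# A literal pattern (an alternative, not from the original) only accepts the word itself
strict = re.findall(r'\btreehouse\b', data, re.I)

print(emails)  # ['kenneth@teamtreehouse.com', 'dave+spam@example.co.uk']
print(loose)   # ['Treehouse', 'housetree']
print(strict)  # ['Treehouse']
```

The set version is handy for practising sets and counts, while the literal version is the safer choice when only one specific word should match.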
Multi-line Regex
import re
with open("names.txt", encoding="utf-8") as names_file:
    data = names_file.read()
print(re.findall(r'''
\b@[-\w\d.]* # First a word boundary, an @, and then any number of word characters, hyphens, or periods
[^gov\t]+ # Then 1+ characters that are not 'g', 'o', 'v', or a tab
\b # Match another word boundary
''', data, re.VERBOSE|re.I))
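To see what re.VERBOSE changes, the verbose pattern can be compared with the same pattern written on one line. The sketch below uses an invented one-line sample instead of names.txt, and also shows how a literal space has to be written once the flag is on.

```python
import re

# Invented sample: an email address, a tab, then a name
data = "kenneth@teamtreehouse.com\tKenneth"

# The pattern written across several lines with comments
verbose = re.findall(r'''
    \b@[-\w\d.]*   # word boundary, an @, then word characters, hyphens, or periods
    [^gov\t]+      # 1+ characters that are not g, o, v, or a tab
    \b             # another word boundary
''', data, re.VERBOSE | re.I)

# The same pattern on a single line, without re.VERBOSE
compact = re.findall(r'\b@[-\w\d.]*[^gov\t]+\b', data, re.I)

# Under re.VERBOSE a bare space in the pattern is dropped, so 'a b' behaves like 'ab'
space_ignored = re.findall(r'a b', 'a b ab', re.X)
# Escaping the space, or putting it in a set, keeps it significant
space_escaped = re.findall(r'a\ b', 'a b ab', re.X)
space_in_set = re.findall(r'a[ ]b', 'a b ab', re.X)

print(verbose)        # ['@teamtreehouse.com']
print(compact)        # ['@teamtreehouse.com']
print(space_ignored)  # ['ab']
print(space_escaped)  # ['a b']
print(space_in_set)   # ['a b']
```

This is why sets such as [-\w ] still match explicit spaces in a verbose pattern.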
print(re.findall(r"""
\b[-\w]*, # Find a word boundary, 0+ hyphens or word characters, and a comma
\s # Find 1 whitespace character
[-\w ]+ # 1+ hyphens, word characters, and explicit spaces
[^\t\n] # Then one character that is not a tab or a newline
""", data, re.X))
Exercise
Create a function named find_words that takes a count and a string. Return a list of all of the words in the string that are at least count word characters long.
import re
# EXAMPLE:
# >>> find_words(4, "dog, cat, baby, balloon, me")
# ['baby', 'balloon']
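def find_words(count, string):
    # In str.format(), {{ and }} produce literal braces and the inner {} is
    # replaced by count, so count=4 builds the pattern \w{4,}
    return re.findall(r'\w{{{},}}'.format(count), string)
# OR, as the function body, using %-formatting (no brace doubling needed):
#     return re.findall(r'\w{%d,}' % count, string)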
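When the pattern is built with str.format(), the braces of the regex count must be doubled ({{ and }}) so that format() treats them as literal braces and not as placeholders. This comes from format(), not from the raw string: with %-formatting the braces are written once.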
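The brace handling can be checked by printing the pattern each approach builds. This is a minimal sketch; the f-string variant and the name find_words_fstring are additions for illustration and assume Python 3.6 or later.

```python
import re

count = 4

# str.format(): doubled braces become literal braces, {} becomes count
with_format = r'\w{{{},}}'.format(count)
# %-formatting: braces are ordinary characters, %d becomes count
with_percent = r'\w{%d,}' % count
# f-string (an alternative, not from the original): same doubling rule as format()
with_fstring = rf'\w{{{count},}}'

print(with_format)   # \w{4,}
print(with_percent)  # \w{4,}
print(with_fstring)  # \w{4,}


def find_words_fstring(count, string):
    """Hypothetical variant of find_words that builds the pattern with an f-string."""
    return re.findall(rf'\w{{{count},}}', string)


words = find_words_fstring(4, "dog, cat, baby, balloon, me")
print(words)  # ['baby', 'balloon']
```

All three approaches produce the same pattern, so the choice is mostly a matter of readability.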