- `[abc]` – a set of the characters 'a', 'b', and 'c'. It'll match any of those characters, in any order, but only once each.
- `[a-z]`, `[A-Z]`, or `[a-zA-Z]` – ranges that'll match all English letters.
- `[0-9]` – a range matching the numbers 0 to 9.
- `[^2]` – exclude 2.
- `[^abc]` – exclude the letters 'a', 'b', and 'c'.
- `re.IGNORECASE` or `re.I` – case-insensitivity flag. `re.match('A', 'apple', re.I)` would find the 'a' in 'apple'.
- `re.VERBOSE` or `re.X` – multi-line regular expression flag, which ignores whitespace and comments inside the pattern.
- `re.MULTILINE` or `re.M` – flag that makes `^` and `$` match at the beginning and end of each line in your text, not just of the whole string.
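A minimal sketch of how the case-insensitivity and multiline flags behave (the sample text here is made up for illustration):

```python
import re

text = "Apple pie\nbanana bread"

# re.I: case-insensitive, so the pattern 'a' matches the 'A' in 'Apple'
print(re.match('a', text, re.I))        # <re.Match ...; match='A'>

# Without re.M, ^ only anchors at the start of the whole string
print(re.findall(r'^\w+', text))        # ['Apple']

# With re.M, ^ anchors at the start of every line
print(re.findall(r'^\w+', text, re.M))  # ['Apple', 'banana']
```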
```python
import re

names_file = open("names.txt", encoding="utf-8")
data = names_file.read()
names_file.close()

# Search for email addresses with typical email formatting
print(re.findall(r'[-\w\d+.]+@[-\w\d.]+', data))

# Search for 'Treehouse' with IGNORECASE
print(re.findall(r'\b[trehous]{9}\b', data, re.IGNORECASE))
print(re.findall(r'\b[trehous]{9}\b', data, re.I))
```
Multi-line Regex
```python
import re

names_file = open("names.txt", encoding="utf-8")
data = names_file.read()
names_file.close()

print(re.findall(r'''
    \b@[-\w\d.]*  # First a word boundary, an @, and then any number of characters
    [^gov\t]+     # Ignore 1+ instances of the letters 'g', 'o', or 'v' and a tab.
    \b            # Match another word boundary
''', data, re.VERBOSE | re.I))

print(re.findall(r"""
    \b[-\w]*,     # Find a word boundary, 1+ hyphens or characters, and a comma
    \s            # Find 1 whitespace
    [-\w ]+       # 1+ hyphens and characters and explicit spaces
    [^\t\n]       # Ignore tabs and newlines
""", data, re.X))
```
Exercise
Create a function named `find_words` that takes a count and a string. Return a list of all the words in the string that are `count` word characters long or longer.
```python
import re

# EXAMPLE:
# >>> find_words(4, "dog, cat, baby, balloon, me")
# ['baby', 'balloon']

def find_words(count, string):
    return re.findall(r'\w{{{},}}'.format(count), string)
    # OR, equivalently, with %-formatting:
    # return re.findall(r'\w{%d,}' % count, string)
```
When building the pattern with `str.format`, the literal braces must be doubled (`{{ }}`) inside the raw string so they aren't confused with the regex repetition braces.
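For instance, both formatting styles produce the same pattern (a quick sketch, using the count of 4 from the example above):

```python
# With str.format, doubled braces {{ }} become literal braces and {} is the placeholder
print(r'\w{{{},}}'.format(4))  # \w{4,}

# With %-formatting, the regex braces need no escaping
print(r'\w{%d,}' % 4)          # \w{4,}
```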