1. [] A set of characters
In regular expressions, `[]` denotes a character class, also known as a character set. It allows you to specify a set of characters from which you want to match a single character.
Here’s how `[]` works:
– `[ ]`: Matches any single character within the brackets.
For example:
– `[abc]`: Matches either ‘a’, ‘b’, or ‘c’.
– `[a-z]`: Matches any lowercase letter from ‘a’ to ‘z’.
– `[0-9]`: Matches any digit from ‘0’ to ‘9’.
– `[aeiou]`: Matches any vowel.
– `[^abc]`: Matches any character except ‘a’, ‘b’, or ‘c’ (the `^` at the beginning negates the character class).
Example:
```python
import re
pattern = r'[aeiou]'  # Matches any vowel
test_string = "Hello World!"
matches = re.findall(pattern, test_string)
print(matches)  # Output: ['e', 'o', 'o']
```
In this example, the pattern `[aeiou]` matches each lowercase vowel (‘a’, ‘e’, ‘i’, ‘o’, ‘u’) in the test string “Hello World!”. The `findall()` function returns a list containing all matches found.
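A slightly larger sketch shows ranges and negation working together. The sample string is invented for illustration; note that several ranges can share one pair of brackets, as in `[a-zA-Z0-9 ]`, and that `^` negates the class only when it comes first.

```python
import re

text = "Order #42: 3 items, code B-7x"  # made-up sample text

digits = re.findall(r'[0-9]', text)           # any single digit
capitals = re.findall(r'[A-Z]', text)         # any uppercase ASCII letter
symbols = re.findall(r'[^a-zA-Z0-9 ]', text)  # anything NOT a letter, digit, or space

print(digits)    # ['4', '2', '3', '7']
print(capitals)  # ['O', 'B']
print(symbols)   # ['#', ':', ',', '-']
```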
2. \ Signals a special sequence (can also be used to escape special characters)
In regular expressions, the backslash `\` serves multiple purposes, one of which is to signal a special sequence or escape special characters. Here’s how it works:
1. Signaling a Special Sequence:
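– `\d`: Matches any digit (equivalent to `[0-9]` for ASCII text; with Python 3 `str` patterns it also matches other Unicode decimal digits unless the `re.ASCII` flag is set).
– `\w`: Matches any word character (letters, digits, and underscore).
– `\s`: Matches any whitespace character (space, tab, newline, and so on).
– `\b`: Matches a word boundary.
– `\A`: Matches only at the start of the string.
– `\Z`: Matches only at the very end of the string (unlike `$`, it does not match before a trailing newline).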
2. Escaping Special Characters:
– `\.`: Matches a literal period (dot).
– `\[`: Matches a literal opening square bracket.
– `\]`: Matches a literal closing square bracket.
– `\\`: Matches a literal backslash.
Example:
```python
import re
pattern = r'\d+'  # Matches one or more digits
test_string = "I have 5 apples and 3 bananas."
matches = re.findall(pattern, test_string)
print(matches)  # Output: ['5', '3']
```
In this example, the pattern `\d+` matches one or more digits in the test string “I have 5 apples and 3 bananas.” The backslash `\` signals the special sequence `\d`, which matches any digit character. The `+` quantifier matches one or more occurrences of the preceding character or sequence, in this case, digits. The `findall()` function returns a list containing all matches found.
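The escaping half of the backslash’s job can be sketched the same way. The strings below are invented for illustration. Patterns are written as raw strings (`r'...'`) so that Python passes the backslashes through to `re` unchanged, and `re.escape()` is the standard-library helper that inserts the escapes for you.

```python
import re

prices = "Pi is 3.14, e is 2.72"
# \. is a literal dot; an unescaped . would match any character
decimals = re.findall(r'\d+\.\d+', prices)
print(decimals)  # ['3.14', '2.72']

# \b makes 'cat' a whole word, so the 'cat' inside 'concatenate' is skipped
words = re.findall(r'\bcat\b', "cat concatenate cat.")
print(words)  # ['cat', 'cat']

# re.escape() backslash-escapes characters that are special in a pattern
escaped = re.escape("3.14")
print(escaped)  # 3\.14
```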
3. . Any character (except newline character)
In regular expressions, the dot `.` (period) represents a wildcard that matches any single character except for the newline character (`\n`). It serves as a placeholder for any character in the string.
Here’s how it works:
– `.`: Matches any single character (except newline `\n`).
Example:
```python
import re
pattern = r'c.t'  # Matches 'cat', 'cbt', 'cct', etc.
test_string = "The cat sat on the mat."
matches = re.findall(pattern, test_string)
print(matches)  # Output: ['cat']
```
In this example, the pattern `c.t` matches any three-character sequence where the first and third characters are ‘c’ and ‘t’ respectively, and the second character can be any single character. The `findall()` function returns a list containing all matches found in the test string “The cat sat on the mat.”
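The newline exception is easy to see with a string that contains one. This sketch uses an invented string and the standard `re.DOTALL` flag, which makes `.` match `\n` as well.

```python
import re

text = "cat\ncut c\nt"  # made-up sample with embedded newlines

default = re.findall(r'c.t', text)            # '.' will not cross the newline
dotall = re.findall(r'c.t', text, re.DOTALL)  # re.DOTALL lets '.' match '\n' too

print(default)  # ['cat', 'cut']
print(dotall)   # ['cat', 'cut', 'c\nt']
```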
4. ^ Starts with
In regular expressions, the caret `^` is a metacharacter that indicates the start of a string or the start of a line, depending on the context in which it is used.
Here’s how it works:
– `^`: Matches at the start of the string; with `re.MULTILINE`, it also matches at the start of each line.
Example:
```python
import re
pattern = r'^hello'  # Matches 'hello' only at the start of the string (or of a line, with re.MULTILINE)
test_string = "hello world!"
match = re.search(pattern, test_string)
if match:
    print("Match found:", match.group())  # Output: Match found: hello
else:
    print("No match found.")
```
In this example, the pattern `^hello` matches “hello” only if it occurs at the start of the string. Since “hello” is indeed at the beginning of the string “hello world!”, a match is found, and the matched substring “hello” is printed.
If you were to use the same pattern with the `re.MULTILINE` flag, it would match “hello” at the start of any line in a multiline string, as opposed to just the start of the string itself.
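A short sketch of that difference, using an invented three-line string:

```python
import re

text = "hello world\nhello again\nsay hello"  # made-up multiline sample

start_of_string = re.findall(r'^hello', text)               # only the very beginning
start_of_lines = re.findall(r'^hello', text, re.MULTILINE)  # beginning of every line

print(start_of_string)  # ['hello']
print(start_of_lines)   # ['hello', 'hello']
```

The “hello” in “say hello” is never matched, because it is not at the start of its line.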
5. $ Ends with
In regular expressions, the dollar sign `$` is a metacharacter that indicates the end of a string or the end of a line, depending on the context in which it is used.
Here’s how it works:
– `$`: Matches at the end of the string (or just before a newline that ends the string); with `re.MULTILINE`, it also matches at the end of each line.
Example:
```python
import re
pattern = r'world$'  # Matches 'world' only at the end of the string (or of a line, with re.MULTILINE)
test_string = "hello world"
match = re.search(pattern, test_string)
if match:
    print("Match found:", match.group())  # Output: Match found: world
else:
    print("No match found.")
```
In this example, the pattern `world$` matches “world” only if it occurs at the end of the string. Since “world” is indeed at the end of the string “hello world”, a match is found, and the matched substring “world” is printed.
If you were to use the same pattern with the `re.MULTILINE` flag, it would match “world” at the end of any line in a multiline string, as opposed to just the end of the string itself.
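The sketch below, using invented strings, contrasts `$` with and without `re.MULTILINE`, and also shows the trailing-newline behavior that separates `$` from `\Z`.

```python
import re

text = "hello world\nworld peace\nsmall world"  # made-up multiline sample

end_of_string = re.findall(r'world$', text)               # only the final line counts
end_of_lines = re.findall(r'world$', text, re.MULTILINE)  # end of every line
print(end_of_string)  # ['world']
print(end_of_lines)   # ['world', 'world']

# Without MULTILINE, $ still matches just before one trailing newline; \Z does not
dollar_trailing = bool(re.search(r'world$', "hello world\n"))
z_trailing = bool(re.search(r'world\Z', "hello world\n"))
print(dollar_trailing)  # True
print(z_trailing)       # False
```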
6. * Zero or more occurrences
The asterisk `*` in regular expressions signifies zero or more occurrences of the preceding character or group. It’s a quantifier that indicates that the preceding element can occur zero or more times.
Here’s how it works:
– `*`: Matches zero or more occurrences of the preceding character or group.
Example:
```python
import re
pattern = r'go*gle'  # Matches 'ggle', 'gogle', 'google', 'gooogle', etc.
test_string = "google gooogle ggle"
matches = re.findall(pattern, test_string)
print(matches)  # Output: ['google', 'gooogle', 'ggle']
```
In this example, the pattern `go*gle` matches “ggle”, “gogle”, “google”, “gooogle”, etc., where the character ‘o’ can occur zero or more times. The `findall()` function returns a list containing all matches found in the test string “google gooogle ggle”.
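To check whole words rather than search inside a sentence, `re.fullmatch()` is handy: it succeeds only if the entire string fits the pattern. The word list here is made up for illustration.

```python
import re

words = ["ggle", "gogle", "google", "gaggle"]
# fullmatch() requires the whole string to fit 'g', any number of 'o's, then 'gle'
results = {w: re.fullmatch(r'go*gle', w) is not None for w in words}
print(results)  # {'ggle': True, 'gogle': True, 'google': True, 'gaggle': False}
```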
7. + One or more occurrences
The plus sign `+` in regular expressions signifies one or more occurrences of the preceding character or group. It’s a quantifier that indicates that the preceding element must occur at least once, but can occur multiple times.
Here’s how it works:
– `+`: Matches one or more occurrences of the preceding character or group.
Example:
```python
import re
pattern = r'go+gle'  # Matches 'gogle', 'google', 'gooogle', etc.
test_string = "google gooogle ggle"
matches = re.findall(pattern, test_string)
print(matches)  # Output: ['google', 'gooogle']
```
In this example, the pattern `go+gle` matches “gogle”, “google”, “gooogle”, etc., where the character ‘o’ must occur at least once. The `findall()` function returns a list containing all matches found in the test string “google gooogle ggle”.
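Putting `*` and `+` side by side on the same (invented) inputs makes the “at least once” requirement concrete:

```python
import re

words = ["ggle", "gogle", "google"]
star = [bool(re.fullmatch(r'go*gle', w)) for w in words]  # zero or more 'o'
plus = [bool(re.fullmatch(r'go+gle', w)) for w in words]  # one or more 'o'
print(star)  # [True, True, True]
print(plus)  # [False, True, True]
```

Only “ggle” differs: it has zero ‘o’ characters, which `*` accepts and `+` rejects.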
8. ? Zero or one occurrence
The question mark `?` in regular expressions signifies zero or one occurrence of the preceding character or group. It’s a quantifier that indicates that the preceding element is optional and can occur either zero times or once.
Here’s how it works:
– `?`: Matches zero or one occurrence of the preceding character or group.
Example:
```python
import re
pattern = r'colou?r'  # Matches both 'color' and 'colour'
test_string = "The color of the sky is blue. The colour of the sea is blue as well."
matches = re.findall(pattern, test_string)
print(matches)  # Output: ['color', 'colour']
```
In this example, the pattern `colou?r` matches both “color” and “colour”, where the letter ‘u’ is optional. The `findall()` function returns a list containing all matches found in the test string “The color of the sky is blue. The colour of the sea is blue as well.”.
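Another common use is an optional letter in a prefix. In this sketch the URLs are placeholders, and `re.match()` anchors the check at the start of each string:

```python
import re

urls = ["http://example.com", "https://example.com", "httpss://example.com"]
# 's?' makes the 's' optional, but allows at most one of it
prefix_ok = [bool(re.match(r'https?://', u)) for u in urls]
print(prefix_ok)  # [True, True, False]
```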
9. {} Exactly the specified number of occurrences
The curly braces `{}` in regular expressions specify how many times the preceding character or group must occur, either as an exact count or as a range. They let you define precise repetition constraints on the preceding element.
Here’s how it works:
– `{m}`: Matches exactly m occurrences of the preceding character or group.
– `{m,n}`: Matches at least m and at most n occurrences of the preceding character or group.
– `{m,}`: Matches at least m occurrences of the preceding character or group.
Example:
```python
import re
pattern = r'\d{3}'  # Matches three consecutive digits
test_string = "The number is 123456789."
matches = re.findall(pattern, test_string)
print(matches)  # Output: ['123', '456', '789']
```
In this example, the pattern `\d{3}` matches exactly three consecutive digits in the test string “The number is 123456789.” The `findall()` function returns a list containing all matches found, which are “123”, “456”, and “789”.
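The range forms `{m,n}` and `{m,}` can be sketched on an invented string of numbers of different lengths:

```python
import re

text = "1 12 123 1234"
two_or_three = re.findall(r'\d{2,3}', text)  # 2 to 3 digits, taking as many as possible
three_plus = re.findall(r'\d{3,}', text)     # 3 or more digits
print(two_or_three)  # ['12', '123', '123']
print(three_plus)    # ['123', '1234']
```

“1234” contributes only “123” to the first list: the quantifier is greedy and takes three digits, and the leftover “4” is too short to match again.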
10. | Either or
The pipe symbol `|` in regular expressions signifies an alternation, allowing you to specify alternatives within a pattern. It’s used to match either one expression or another.
Here’s how it works:
– `|`: Matches either the expression before or after the alternation operator.
Example:
```python
import re
pattern = r'cat|dog'  # Matches either 'cat' or 'dog'
test_string = "I have a cat and a dog as pets."
matches = re.findall(pattern, test_string)
print(matches)  # Output: ['cat', 'dog']
```
In this example, the pattern `cat|dog` matches either “cat” or “dog” in the test string “I have a cat and a dog as pets.”. The `findall()` function returns a list containing all matches found, which are “cat” and “dog”.
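One detail worth knowing: Python’s `re` tries the alternatives from left to right and stops at the first one that matches, so their order can change the result. A minimal sketch:

```python
import re

text = "category"
short_first = re.findall(r'cat|category', text)  # 'cat' succeeds first
long_first = re.findall(r'category|cat', text)   # 'category' is tried first
print(short_first)  # ['cat']
print(long_first)   # ['category']
```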
11. () Capture and group
In regular expressions, parentheses `()` are used for capturing and grouping parts of a pattern. They serve two main purposes:
1. Capturing: Parentheses are used to capture the matched substring enclosed within them. This captured substring can then be referenced or extracted later.
2. Grouping: Parentheses are used to group parts of a pattern together, allowing you to apply quantifiers or other operators to the entire group.
Here’s how it works:
– `()`: Captures and groups the enclosed part of the pattern.
Example (Capturing):
```python
import re
pattern = r'(\d{3})-(\d{3})-(\d{4})'  # Matches a phone number pattern: ###-###-####
test_string = "My phone number is 123-456-7890."
match = re.search(pattern, test_string)
if match:
    print("Full match:", match.group(0))   # Output: Full match: 123-456-7890
    print("Area code:", match.group(1))    # Output: Area code: 123
    print("Prefix:", match.group(2))       # Output: Prefix: 456
    print("Line number:", match.group(3))  # Output: Line number: 7890
```
In this example, the pattern `(\d{3})-(\d{3})-(\d{4})` captures and groups three parts of a phone number separated by hyphens. The `search()` function finds the first match in the test string “My phone number is 123-456-7890.”, and the `group()` method is used to access each captured group separately.
Grouping allows you to apply quantifiers or other operators to the entire group. For example, `(ab)+` matches “ab”, “abab”, “ababab”, and so on.
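The sketch below, on invented strings, shows grouping with a quantifier and two behaviors that often surprise people: a repeated group keeps only its last repetition, and `findall()` returns the groups (as tuples) rather than the full match when the pattern contains parentheses.

```python
import re

# A repeated group: the whole match is 'ababab', group(1) holds the last repetition
m = re.fullmatch(r'(ab)+', "ababab")
print(m.group(0), m.group(1))  # ababab ab

# Without parentheses, + applies only to 'b', so 'ababab' does not fit
no_group = re.fullmatch(r'ab+', "ababab")
print(no_group)  # None

# With groups in the pattern, findall() returns one tuple of groups per match
pairs = re.findall(r'(\d{3})-(\d{4})', "555-1234 and 555-9876")
print(pairs)  # [('555', '1234'), ('555', '9876')]
```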