Regular expressions are used for pattern matching in strings. They provide a general way to describe patterns, for example to find valid phone numbers or email addresses that adhere to a certain format. Doing this with ordinary substring search is tedious. Classically, regular expressions are implemented using finite automata.
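As a rough illustration of the phone number case mentioned above, here is a minimal sketch. The NNN-NNN-NNNN format, the name PHONE_PATTERN, and the helper is_valid_phone are assumptions made for this example; real phone formats vary widely.

```python
import re

# Assumed format (hypothetical): three digits, hyphen, three digits, hyphen, four digits.
PHONE_PATTERN = re.compile(r'\d{3}-\d{3}-\d{4}')

def is_valid_phone(text):
    """Return True only if the whole string fits the assumed format."""
    return PHONE_PATTERN.fullmatch(text) is not None

print(is_valid_phone('555-123-4567'))  # True
print(is_valid_phone('5551234567'))    # False
```

fullmatch() is used here so that extra characters before or after the number cause the check to fail.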
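re.search() finds the first occurrence of the pattern in the string and returns a match object containing the index of the match and the matched string. If nothing matches, it returns None.
import re  # import the regex module
# Prefixing a string literal with r makes it a raw string, so \n below is a backslash followed by the letter n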
print(re.search('n', r'ba\n'))
Running the above program produces the following output (Python 3.7 and later print re.Match in place of _sre.SRE_Match):
<_sre.SRE_Match object; span=(3, 4), match='n'>
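Because re.search() returns None when there is no match, calling a method on the result directly can raise AttributeError. The sketch below shows one way to guard against that; the helper name find_first is made up for this example.

```python
import re

def find_first(pattern, text):
    """Return (start_index, matched_text) for the first match, or None if there is none."""
    m = re.search(pattern, text)
    if m is None:
        return None
    return m.start(), m.group()

print(find_first('n', r'ba\n'))  # (3, 'n')
print(find_first('n', 'ba\n'))   # None: here \n is a newline, not the letter n
```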
Method | Functionality |
---|---|
start() | returns the starting index of the matched string |
end() | returns the ending index of the matched string (one past the last matched character) |
span() | returns the starting and ending indices as a tuple |
group() | returns the matched string |
import re
print(re.search('ab', '12abcd').start())
print(re.search('ab', '12abcd').end())
print(re.search('ab', '12abcd').span())
print(re.search('ab', '12abcd').group())
Running the above program produces:
2
4
(2, 4)
ab
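Since the ending index is exclusive, the span can be used directly for slicing. A small sketch using the same pattern and string as above (the variable names are chosen for this example):

```python
import re

text = '12abcd'
m = re.search('ab', text)
start, end = m.span()

# Slicing with the span reproduces what group() returns.
matched = text[start:end]
before, after = text[:start], text[end:]

print(matched)  # ab
print(before)   # 12
print(after)    # cd
```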
re.match() matches the pattern only at the beginning of the string (prefix matching) and returns a match object, or None if the string does not start with the pattern.
import re
print(re.match('abc', 'abcdef'))
Running the above program produces:
<_sre.SRE_Match object; span=(0, 3), match='abc'>
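To make the difference between prefix matching and searching concrete, the sketch below runs the same pattern through re.match(), re.search(), and re.fullmatch(). The input strings are chosen for illustration only.

```python
import re

text = '12abc'

at_start = re.match('abc', text)       # None: text does not begin with 'abc'
anywhere = re.search('abc', text)      # matches at index 2
whole = re.fullmatch('abc', 'abcdef')  # None: the pattern must cover the entire string

print(at_start)         # None
print(anywhere.span())  # (2, 5)
print(whole)            # None
```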
re.findall() finds all non-overlapping occurrences of the pattern in the string and returns a list of the matched strings.
import re
print(re.findall('[0-9]+', '12ad123cd'))
['12', '123']
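Two related behaviours are worth a quick sketch: when the pattern contains several capturing groups, findall() returns a list of tuples, and re.finditer() yields match objects when positions are needed as well. The grouped pattern below is an assumption made for this example.

```python
import re

text = '12ad123cd'

# With two capturing groups, each result is a (digits, letters) tuple.
pairs = re.findall(r'([0-9]+)([a-z]+)', text)

# finditer() yields match objects, so span() is available for each match.
positions = [(m.group(), m.span()) for m in re.finditer('[0-9]+', text)]

print(pairs)      # [('12', 'ad'), ('123', 'cd')]
print(positions)  # [('12', (0, 2)), ('123', (4, 7))]
```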
re.sub() replaces all matches of the pattern in the string with the replacement string (repl) and returns the modified string.
import re
print(re.sub('ab', 'cd', 'habcabc'))
hcdccdc
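re.sub() also accepts a count argument, backreferences in the replacement, and a function as the replacement. A short sketch on the same input string (the variable names are chosen for this example):

```python
import re

text = 'habcabc'

# Replace only the first match.
first_only = re.sub('ab', 'cd', text, count=1)

# \1 and \2 refer to the captured groups, so this swaps 'a' and 'b'.
swapped = re.sub(r'(a)(b)', r'\2\1', text)

# A function receives each match object and returns the replacement text.
upper = re.sub('ab', lambda m: m.group().upper(), text)

print(first_only)  # hcdcabc
print(swapped)     # hbacbac
print(upper)       # hABcABc
```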
re.compile() compiles a pattern and returns a regular expression object. This object can be used for matching with match(), search(), and other methods.
import re
p = re.compile('ab') #p can be reused
print(p.search('cab'))
Running the above program produces:
<_sre.SRE_Match object; span=(1, 3), match='ab'>
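A compiled pattern is most useful when the same expression is applied to many strings. The following sketch reuses one compiled object across a small, made-up list of words:

```python
import re

p = re.compile('ab')  # compiled once, reused below
words = ['cab', 'xyz', 'abab']

# Keep the words that contain the pattern anywhere.
hits = [w for w in words if p.search(w)]

# Count the non-overlapping matches in each word.
counts = {w: len(p.findall(w)) for w in words}

# The compiled object exposes sub() as well.
replaced = p.sub('-', 'abab')

print(hits)      # ['cab', 'abab']
print(counts)    # {'cab': 1, 'xyz': 0, 'abab': 2}
print(replaced)  # --
```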
import re
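print(re.search('ab*c', 'abbbbbc'))   # *     : zero or more 'b'
print(re.search('ab?c', 'ac'))        # ?     : zero or one 'b'
print(re.search('ab+c', 'abbc'))      # +     : one or more 'b'
print(re.search('a{1,4}', 'aaabcd'))  # {1,4} : between one and four 'a', as many as possible
Running the above program produces: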
<_sre.SRE_Match object; span=(0, 7), match='abbbbbc'>
<_sre.SRE_Match object; span=(0, 2), match='ac'>
<_sre.SRE_Match object; span=(0, 4), match='abbc'>
<_sre.SRE_Match object; span=(0, 3), match='aaa'>
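The quantifiers above are greedy: they match as many repetitions as possible. Appending ? to a quantifier makes it lazy, so it matches as few as possible. A brief sketch of the difference (the second input string is made up for this example):

```python
import re

# Greedy {1,4} takes all three 'a' characters; lazy {1,4}? stops after one.
greedy_a = re.search('a{1,4}', 'aaabcd').group()
lazy_a = re.search('a{1,4}?', 'aaabcd').group()

# Greedy .* runs to the last 'c'; lazy .*? stops at the first 'c'.
greedy_span = re.search('a.*c', 'abcabc').group()
lazy_span = re.search('a.*?c', 'abcabc').group()

print(greedy_a, lazy_a)        # aaa a
print(greedy_span, lazy_span)  # abcabc abc
```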
import re
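print(re.search('[a-c]{2}', 'dmacf'))  # [a-c]  : any character from 'a' to 'c', here two in a row
print(re.search('[0-9]{3}', 'aa123'))  # [0-9]  : any digit, here three in a row
print(re.search('[^a-c]', 'ca1d'))     # [^a-c] : any character that is not 'a', 'b' or 'c'
print(re.search('.+', 'aaabcd\n'))     # .      : any character except a newline
Running the above program produces: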
<_sre.SRE_Match object; span=(2, 4), match='ac'>
<_sre.SRE_Match object; span=(2, 5), match='123'>
<_sre.SRE_Match object; span=(2, 3), match='1'>
<_sre.SRE_Match object; span=(0, 6), match='aaabcd'>
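The last result stops before the newline because . does not match a newline by default. Passing the re.DOTALL flag changes that, as this sketch on the same input shows:

```python
import re

text = 'aaabcd\n'

default = re.search('.+', text)                   # stops before the newline
dotall = re.search('.+', text, flags=re.DOTALL)   # . now matches the newline too

print(default.span())  # (0, 6)
print(dotall.span())   # (0, 7)

# A negated class combined with + collects a run of excluded characters.
not_abc = re.search('[^a-c]+', 'ca1d').group()
print(not_abc)  # 1d
```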
import re
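# Patterns are written as raw strings so that the backslash reaches the regex engine unchanged
print(re.search(r'\w\w', 'c123'))       # \w : word character (letter, digit or underscore)
print(re.search(r'\d\d\d', 'aa123bc'))  # \d : digit
print(re.search(r'\W', 'ca\n'))         # \W : any character that is not a word character
print(re.search(r'\D', '123c23'))       # \D : any character that is not a digit
print(re.search(r'\s', 'how are'))      # \s : whitespace character
Running the above program produces: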
<_sre.SRE_Match object; span=(0, 2), match='c1'>
<_sre.SRE_Match object; span=(2, 5), match='123'>
<_sre.SRE_Match object; span=(2, 3), match='\n'>
<_sre.SRE_Match object; span=(3, 4), match='c'>
<_sre.SRE_Match object; span=(3, 4), match=' '>
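These shorthand classes combine naturally with quantifiers and groups. The sketch below pulls a word and a number out of a made-up input line; the pattern and variable names are assumptions for this example.

```python
import re

# One or more word characters, some whitespace, then one or more digits.
pattern = re.compile(r'(\w+)\s+(\d+)')

m = pattern.search('item 42 shipped')
name, number = m.group(1), m.group(2)  # group(n) returns the n-th captured group

# \d+ with findall() collects every run of digits.
numbers = re.findall(r'\d+', 'aa123bc45')

# \s+ with split() breaks a string on any run of whitespace.
words = re.split(r'\s+', 'how  are you')

print(name, number)  # item 42
print(numbers)       # ['123', '45']
print(words)         # ['how', 'are', 'you']
```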