What is a Regular Expression?
Regular Expression
A regular expression is a sequence of characters that defines a search pattern, which lets you find a string or a set of strings.
Regular expressions (also called REs, regexes, or regex patterns) are made available in Python through the re module.
Now let's try to understand it.
Note: In Python we often want to find out whether a particular character (or set of characters) is in a string and, if it is, where it is (its index).
For this we can write the simple code given below:
# A simple example in Python
Str = 'abcdefg12345'
print('23' in Str)       # is the substring present?
print(Str.index('23'))   # index where the substring starts
Output:
True
8
Note: In the above example Python simply scans the string character by character to match the exact characters we asked for. That works for simple problems, but what if we are asked to find three consecutive decimal digits in a given string?
In such cases we cannot use the simple method (as we used in the above example).
Here comes the concept of the regular expression, or regex.
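As a preview of where this is heading, the three-digit problem can be sketched with the re module that the rest of this tutorial covers. The variable name text and the pattern \d{3} are illustrative choices, not part of the original example.

```python
import re

text = 'abcdefg12345'  # same sample string as in the example above

# \d matches one decimal digit; {3} requires exactly three in a row
match = re.search(r'\d{3}', text)

if match:
    print(match.group())  # the three digits that matched: 123
    print(match.start())  # index where they start: 7
```

The pattern describes the shape of what we want (any three digits) rather than one fixed substring, which is what the in operator and index() cannot express.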
Regular expressions are supported by Python's standard library, so there is nothing to install with pip.
Installation:
Python ships with a built-in package called re for working with regex (regular expressions); importing it is all that is needed.
Now let's see some examples to understand it.
# A simple example
import re
Str = 'abcdefg12345'
print(re.search("123", Str))   # search for the pattern "123" in Str
Output:
<re.Match object; span=(7, 10), match='123'>
Note: The output of re.search() packs useful information into a single line: it tells us whether the searched pattern is present in the string and at which indexes the match sits.
The re.search() function returns a match object if the searched pattern matches somewhere in the given string.
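The information carried by a match object can also be read piece by piece. The following is a minimal sketch using the standard match object methods group(), span(), start(), and end(); the variable names are illustrative.

```python
import re

text = 'abcdefg12345'
match = re.search('123', text)

print(match.group())  # the matched text: 123
print(match.span())   # (start, end) indexes: (7, 10)
print(match.start())  # 7
print(match.end())    # 10 (one past the last matched character)
```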
The re module offers several functions, including:
- search()
- findall()
- split()
- sub()
search(): It returns a match object if the searched pattern matches somewhere in the given string. Only the first match is reported.
*** Note that when there is no match, search() returns None. No error is raised. ***
# search function explanation
import re
Str = 'Python is a useful language Python'
match = re.search('Python', Str)
# a match object is truthy; None (no match) is falsy
if match:
    print("match found")
else:
    print("None")
Output:
match found
Note: Here we simply searched for a pattern (Python) to find out whether it is in the string or not.
findall(): It returns all the non-overlapping matches in the string as a list.
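The no-match case can be sketched in the same way. The search term 'Java' is a made-up example chosen because it does not occur in the string; the try/except is only there to show what happens if the None result is used without a check.

```python
import re

text = 'Python is a useful language Python'

# 'Java' does not occur in text, so search() returns None
result = re.search('Java', text)
print(result)  # None

# Calling a match method on None fails, which is why the
# "if match:" guard in the example above matters
try:
    result.group()
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__

print(error_name)  # AttributeError
```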
# findall function explanation
import re
Str = 'Python is a useful language Python'
print(re.findall('Python', Str))   # list of every match
Output:
['Python', 'Python']
Note: As "Python" occurs two times in our string, the findall() function returns both occurrences.
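Two related behaviours are worth a quick sketch: what findall() gives back when nothing matches, and how to get positions as well as text. The second part uses re.finditer(), a function from the same module that this tutorial does not otherwise cover; the search term 'Java' is a made-up example.

```python
import re

text = 'Python is a useful language Python'

# findall() returns an empty list when nothing matches
no_hits = re.findall('Java', text)
print(no_hits)  # []

# finditer() yields match objects, so positions are available too
spans = [m.span() for m in re.finditer('Python', text)]
print(spans)  # [(0, 6), (28, 34)]
```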
split(): It splits the string at every place the pattern matches and returns the pieces as a list.
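# split function explanation
import re
Str = 'Python is a useful language Python'
match = re.split('use', Str)   # split around the text 'use'
print(match)
match = re.split(' ', Str)     # split at every space
print(match)
Output:
['Python is a ', 'ful language Python']
['Python', 'is', 'a', 'useful', 'language', 'Python']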
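Because the separator is a pattern and not just a fixed string, split() can handle cases such as runs of several spaces. A small sketch, assuming an illustrative input string messy and using the standard maxsplit argument to limit the number of splits:

```python
import re

# \s+ matches one or more whitespace characters in a row
messy = 'Python   is  a useful   language'
words = re.split(r'\s+', messy)
print(words)  # ['Python', 'is', 'a', 'useful', 'language']

# maxsplit=1 stops after the first split
text = 'Python is a useful language Python'
head_tail = re.split(' ', text, maxsplit=1)
print(head_tail)  # ['Python', 'is a useful language Python']
```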
sub(): It replaces every match with a character (or string) of your choice and returns the new string.
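# sub function explanation
import re
Str = 'Python is a useful language Python'
match = re.sub('use', " 123", Str)   # replace 'use' with ' 123'
print(match)
match = re.sub(' ', "5", Str)        # replace every space with '5'
print(match)
Output:
Python is a  123ful language Python
Python5is5a5useful5language5Python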
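Two variations on sub() may help, shown here as a sketch: limiting the number of replacements with the standard count argument, and replacing by pattern rather than by fixed text. The ages string is a made-up sample.

```python
import re

text = 'Python is a useful language Python'

# count=2 replaces only the first two matches
first_two = re.sub(' ', '5', text, count=2)
print(first_two)  # Python5is5a useful language Python

# \d+ matches each run of digits, whatever the digits are
ages = 'Rama is 22 and Sita is 23'
masked = re.sub(r'\d+', '#', ages)
print(masked)  # Rama is # and Sita is #
```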
What are metacharacters in Python?
Metacharacters
What are metacharacters in regular expressions (re)? To understand them, let's first see an example.
# Example of metacharacters
import re
Str = 'Rama is 22 and Sita is 23 and Ravan is 44 '
match = re.findall(r'[A-Z][a-z]*', Str)   # capitalized words
print(match)
match = re.findall(r'\d{1,3}', Str)       # runs of 1 to 3 digits
print(match)
Output:
['Rama', 'Sita', 'Ravan']
['22', '23', '44']
Note: On seeing '[A-Z][a-z]*' and '\d{1,3}' you might be wondering what they mean. These patterns are built from metacharacters, which are characters that carry a special meaning in regular expressions (re).
Here, [A-Z][a-z]* means: we are searching for substrings that start with one capital letter (any of A to Z) followed by zero or more lowercase letters (any of a to z). In the above example all names start with a capital letter followed by lowercase letters, and because we used the findall() function, we got all the names present in the string.
What if we used search() instead of findall()? You can see it by replacing findall() in the above example with search().
With search() we get only the first match (i.e. Rama).
'\d{1,3}': \d matches a single digit character, and {1,3} means between one and three repetitions, so the pattern matches runs of one to three digits (here the two-digit numbers 22, 23, and 44).
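The difference between search() and findall(), and the effect of the {1,3} quantifier, can be checked with a short sketch. The digits string is a made-up sample chosen to contain digit runs of several lengths.

```python
import re

text = 'Rama is 22 and Sita is 23 and Ravan is 44 '

# search() stops at the first match
first = re.search(r'[A-Z][a-z]*', text)
print(first.group())  # Rama

# {1,3} takes as many digits as it can, up to three at a time,
# so the five-digit run 12345 is split into 123 and 45
digits = 'a 7 bb 42 ccc 123 dddd 12345'
runs = re.findall(r'\d{1,3}', digits)
print(runs)  # ['7', '42', '123', '123', '45']

# {3} accepts only runs of exactly three digits
triples = re.findall(r'\d{3}', digits)
print(triples)  # ['123', '123']
```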
Here are some important metacharacters:
Characters (Example) | Description
[] ([a-z]) | A set of characters
\ (\d) | Signals a special sequence
. (he....o) | Any character (except the newline character)
^ (^hello) | Starts with
$ (End$) | Ends with
* (an*) | Zero or more occurrences
+ (any+) | One or more occurrences
{} (al{2}) | Exactly the specified number of occurrences
| (python|C) | Either or
() | Capture a group
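A few of these metacharacters can be tried out on one short string. This is a minimal sketch; the sample string 'hello world' and the patterns are illustrative and differ from the examples in the table.

```python
import re

text = 'hello world'

in_set = re.findall(r'[a-e]', text)          # characters in the set a..e
any_two = re.search(r'he..o', text).group()  # . stands for any character
starts = bool(re.search(r'^hello', text))    # ^ anchors at the start
ends = bool(re.search(r'world$', text))      # $ anchors at the end
star = re.findall(r'lo*', text)              # l, then zero or more o
plus = re.findall(r'l+', text)               # one or more l
exact = re.findall(r'l{2}', text)            # exactly two l
either = re.findall(r'hello|world', text)    # either alternative
groups = re.search(r'(hello) (world)', text).groups()  # captured groups

print(in_set)   # ['e', 'd']
print(any_two)  # hello
print(starts, ends)  # True True
print(star)     # ['l', 'lo', 'l']
print(plus)     # ['ll', 'l']
print(exact)    # ['ll']
print(either)   # ['hello', 'world']
print(groups)   # ('hello', 'world')
```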
Special Sequences
Characters | Description
\A | Returns a match if the specified characters are at the beginning of the string
\b | Returns a match where the specified characters are at the beginning or at the end of a word
\B | Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word
\d | Returns a match where the string contains digits (numbers from 0-9)
\D | Returns a match where the string DOES NOT contain digits
\s | Returns a match where the string contains a white space character
\S | Returns a match where the string DOES NOT contain a white space character
\w | Returns a match where the string contains any word characters (characters from a to Z, digits from 0-9, and the underscore _ character)
\W | Returns a match where the string DOES NOT contain any word characters
\Z | Returns a match if the specified characters are at the end of the string
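Several of the special sequences can be compared side by side in a short sketch. The sample string is made up for illustration; raw strings (r'...') are used so that Python does not treat the backslashes as its own escape sequences.

```python
import re

text = 'The rain in Spain 2024'

at_start = bool(re.search(r'\AThe', text))  # "The" at the very beginning
at_end = bool(re.search(r'2024\Z', text))   # "2024" at the very end
word_start = re.findall(r'\bain', text)     # "ain" at the start of a word
word_end = re.findall(r'ain\b', text)       # "ain" at the end of a word
inside = re.findall(r'\Bain', text)         # "ain" NOT at the start of a word
digits = re.findall(r'\d', text)            # every single digit
non_digits = re.findall(r'\D+', text)       # runs of non-digit characters
spaces = re.findall(r'\s', text)            # every whitespace character
words = re.findall(r'\w+', text)            # runs of word characters

print(at_start, at_end)  # True True
print(word_start)        # []
print(word_end)          # ['ain', 'ain']
print(inside)            # ['ain', 'ain']
print(digits)            # ['2', '0', '2', '4']
print(non_digits)        # ['The rain in Spain ']
print(len(spaces))       # 4
print(words)             # ['The', 'rain', 'in', 'Spain', '2024']
```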