More about Regular Expressions

These lecture notes are based on Prof. Doreen De Leon's lectures.

Reading

Learning Perl, Chapter 8

Character Classes


File

XaX
XabcX
XdXaX

Program

while(<>) {
    # matches XaX, XbX, XcX
    if(/X[abc]X/) {
        print $_;
    }
}

Output

XaX
XdXaX

File

XaX
XadX
aXa1Xb
aXbX3Xc
XaXe9X

Program

while(<>) {
    # matches Xa1X, Xb3X, Xc0X etc
    if(/X[a-e][0-9]X/) {
        print $_;
    }
}

Output

aXa1Xb
XaXe9X

Program

while(<>) {
    # matches XaX, XbX, XcX, XdX, XxX, XyX, XzX
    if(/X[a-dx-z]X/) {
        print $_;
    }
}
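The lecture gives no file or output for this program, so here is a runnable variant. It uses a hypothetical list of strings in place of the input file. It is only a sketch showing how the two ranges in [a-dx-z] combine. Note that the class is case-sensitive, so an uppercase A does not match.

```perl
use strict;
use warnings;

# Hypothetical test strings (not from the lecture's input files)
my @lines = ('XaX', 'XdX', 'XeX', 'XwX', 'XxX', 'XzX', 'XAX');
my @matched;

for my $line (@lines) {
    # [a-dx-z] is the union of the ranges a-d and x-z; uppercase A is not included
    push @matched, $line if $line =~ /X[a-dx-z]X/;
}

print "$_\n" for @matched;    # prints XaX, XdX, XxX, XzX, one per line
```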

Negated Character Class


File

XaX
XbX
1XcX2

Program

while(<>) {
    # exactly one character, other than 'a', between two X's
    if(/X[^a]X/) {
        print $_;
    }
}

Output

XbX
1XcX2
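The example above does not show one point: a negated class still has to match exactly one character. [^a] means "one character that is not a", not "anything or nothing except a". The sketch below uses made-up strings to illustrate this. XX fails because there is no character between the two X's, while a space counts as "not a".

```perl
use strict;
use warnings;

# Hypothetical test strings
my @lines = ('XX', 'XaX', 'XbX', 'X X', 'XabX');
my @matched;

for my $line (@lines) {
    # [^a] consumes exactly one character, which must not be 'a'
    push @matched, $line if $line =~ /X[^a]X/;
}

print "[$_]\n" for @matched;    # prints [XbX] and [X X]
```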

Predefined Character Classes


File

12
1a1a1
abcd
fg456

Program

while(<>) {
    # matches two consecutive digits
    if(/\d\d/) {
        print $_;
    }
}

Output

12
fg456
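\d is one of several predefined classes. The minimal sketch below also uses \w and its negation \W. \w is a word character: a letter, digit, or underscore. The sample string is invented for illustration. In list context, a match with the /g modifier returns every substring it finds.

```perl
use strict;
use warnings;

my $text = 'fg456 ab_c!';            # hypothetical sample string

my @digits = $text =~ /\d/g;         # every single digit
my @words  = $text =~ /\w+/g;        # runs of word characters ([A-Za-z0-9_] for ASCII text)
my @others = $text =~ /\W/g;         # characters that are NOT word characters

print "digits: @digits\n";                        # digits: 4 5 6
print "words: @words\n";                          # words: fg456 ab_c
print "non-word chars: ", scalar(@others), "\n";  # non-word chars: 2 (the space and '!')
```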

General Quantifiers


File

123456899
X222X
abcedeft
e111222333f
ID123111222
1
12
123
12345
123456
1234567
12345678

Program

while(<>) {
    # matches 9 consecutive digits anywhere in the line
    if(/\d{9}/) {
        print $_;
    }
}

Output

123456899
e111222333f
ID123111222
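The pattern is not anchored, so \d{9} only requires nine digits in a row somewhere in the line. That is why e111222333f and ID123111222 match even though they contain letters. To require a line of exactly nine digits, combine the quantifier with the ^ and $ anchors described under Anchoring Patterns. The sketch below adds a hypothetical 10-digit line to the data:

```perl
use strict;
use warnings;

# Lines from the lecture's file plus a hypothetical 10-digit line
my @lines = ('123456899', 'e111222333f', 'ID123111222', '1234567890', '12345678');
my (@unanchored, @exact);

for my $line (@lines) {
    push @unanchored, $line if $line =~ /\d{9}/;     # 9 digits in a row, anywhere
    push @exact,      $line if $line =~ /^\d{9}$/;   # the whole line is 9 digits
}

print "unanchored: @unanchored\n";   # every line except 12345678
print "exact: @exact\n";             # exact: 123456899
```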

Program

while(<>) {
    # matches 6 or more consecutive digits
    if(/\d{6,}/) {
        print $_;
    }
}

Output

123456899
e111222333f
ID123111222
123456
1234567
12345678

Program

while(<>) {
    # matches 8 or 9 consecutive digits (a longer run also contains such a run)
    if(/\d{8,9}/) {
        print $_;
    }
}

Output

123456899
e111222333f
ID123111222
12345678
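The familiar quantifiers are special cases of the general {min,max} form: * is {0,}, + is {1,}, and ? is {0,1}. The short check below runs over a few invented strings. It compares each shorthand with its general form and counts any disagreements. It is a sketch for illustration, not an exhaustive proof.

```perl
use strict;
use warnings;

my @samples = ('', '1', '12', '123');    # hypothetical strings of 0 to 3 digits

# Each shorthand quantifier paired with its general {min,max} form
my @pairs = (
    [ qr/^\d*$/, qr/^\d{0,}$/  ],
    [ qr/^\d+$/, qr/^\d{1,}$/  ],
    [ qr/^\d?$/, qr/^\d{0,1}$/ ],
);

my $mismatches = 0;
for my $s (@samples) {
    for my $pair (@pairs) {
        my ($short, $general) = @$pair;
        # Count every sample on which the two forms disagree
        $mismatches++ if !!($s =~ $short) ne !!($s =~ $general);
    }
}

print "mismatches: $mismatches\n";    # mismatches: 0
```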

Anchoring Patterns


File

abc
abcd
abcdef
Xabc
XabcY

Program

while(<>) {
    # the line is exactly abc: ^ anchors the start, $ the end
    if(/^abc$/) {
        print $_;
    }
}

Output

abc

Program

while(<>) {
    # begins with abc
    if(/^abc/) {
        print $_;
    }
}

Output

abc
abcd
abcdef

Program

while(<>) {
    # ends with abc
    if(/abc$/) {
        print $_;
    }
}

Output

abc
Xabc
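Lines read with <> keep their trailing newline, yet /^abc$/ still matches "abc\n". This works because $ can match just before a newline at the end of the string. The sketch below uses a hard-coded string rather than an input file. It contrasts $ with \z, which matches only at the very end, and shows chomp removing the newline.

```perl
use strict;
use warnings;

# A line as <> would return it, newline included (hypothetical input)
my $line = "abc\n";

my $dollar = ($line =~ /^abc$/)  ? 'match' : 'no match';   # $ matches before the final "\n"
my $z      = ($line =~ /^abc\z/) ? 'match' : 'no match';   # \z is the absolute end of the string

chomp(my $clean = $line);                                   # copy the line and strip the newline
my $after_chomp = ($clean =~ /^abc\z/) ? 'match' : 'no match';

print "$dollar / $z / $after_chomp\n";    # match / no match / match
```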

File

fred is a nice guy.
Alfred is a nice guy.
I saw fredrick yesterday.

Program

while(<>) {
    # fred as a whole word: \b matches at a word boundary
    if(/\bfred\b/) {
        print $_;
    }
}

Output

fred is a nice guy.
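For comparison, the sketch below runs the same three lines through three patterns. They are an unanchored /fred/, the whole-word /\bfred\b/, and /\bfred/, which puts a boundary only at the start. This shows why Alfred and fredrick are rejected.

```perl
use strict;
use warnings;

my @lines = (
    'fred is a nice guy.',
    'Alfred is a nice guy.',
    'I saw fredrick yesterday.',
);

my @plain  = grep { /fred/ } @lines;        # fred anywhere, even inside a longer word
my @whole  = grep { /\bfred\b/ } @lines;    # fred as a whole word only
my @starts = grep { /\bfred/ } @lines;      # a word that begins with fred

print scalar(@plain), " ", scalar(@whole), " ", scalar(@starts), "\n";   # 3 1 2
```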

Parentheses

Backreferences


File

XabcX
BabcC
yabcy
(abc)
ZabcY
ZabcZ

Program

while(<>) {
    # same character before and after abc: XabcX, YabcY, 2abc2, etc.
    if(/^(.)abc(\1)$/) {
        print $_;
    }
}

Output

XabcX
yabcy
ZabcZ
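Parentheses also save what they matched in the variables $1, $2, and so on. These remain available after a successful match. The minimal sketch below runs over the lines of the file above and collects the character that surrounds abc. The extra parentheses around \1 in the lecture's pattern are not needed for the backreference, so this version writes plain \1.

```perl
use strict;
use warnings;

# The lines from the file above
my @lines = ('XabcX', 'BabcC', 'yabcy', '(abc)', 'ZabcY', 'ZabcZ');
my @wrappers;

for my $line (@lines) {
    if ($line =~ /^(.)abc\1$/) {
        push @wrappers, $1;    # $1 holds the character captured by (.)
    }
}

print "@wrappers\n";    # X y Z
```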

File

abzy
aabb
aaxx
bbzz
bbzy
ccab

Program

while(<>) {
    # a doubled a, b, or c, then a doubled x, y, or z: aaxx, aayy, etc.
    if(/^([abc])\1([xyz])\2$/) {
        print $_;
    }
}

Output

aaxx
bbzz
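A common practical use of backreferences is spotting an accidentally doubled word. The pattern below is an illustrative sketch, not from the lecture. (\w+) captures a word, \s+ matches the whitespace after it, and \1 requires the same word again. The \b anchors keep it from matching parts of longer words. Using /g in a while loop finds each occurrence in turn.

```perl
use strict;
use warnings;

my $text = 'this is is a test of the the pattern';    # hypothetical sample text
my @doubled;

# Each successful match leaves the repeated word in $1
while ($text =~ /\b(\w+)\s+\1\b/g) {
    push @doubled, $1;
}

print "@doubled\n";    # is the
```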

Precedence in Regular Expressions

From highest to lowest:

  1. Parentheses (grouping or capturing)
  2. Quantifiers (*, +, ?, {n,m})
  3. Anchors and sequencing (^, $, \b, abc)
  4. Alternation (|)
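The sketch below illustrates two consequences of this ordering, using invented sample data. First, quantifiers bind tighter than sequencing, so the {2} in ab{2} applies only to b, and parentheses are needed to repeat ab. Second, anchors bind tighter than alternation, so ^fred|barney$ means "starts with fred, or ends with barney". It does not mean "is exactly fred or barney".

```perl
use strict;
use warnings;

# Quantifiers vs. sequencing
my $one   = 'abb'  =~ /^ab{2}$/   ? 1 : 0;   # 1: {2} applies to b only
my $two   = 'abab' =~ /^(ab){2}$/ ? 1 : 0;   # 1: parentheses make {2} apply to ab
my $three = 'abab' =~ /^ab{2}$/   ? 1 : 0;   # 0: ab{2} cannot match abab

# Anchors vs. alternation (hypothetical lines)
my @lines  = ('fred', 'barney', 'fred flintstone', 'rubble barney', 'wilma');
my @loose  = grep { /^fred|barney$/ } @lines;      # (^fred) or (barney$)
my @strict = grep { /^(fred|barney)$/ } @lines;    # exactly fred or exactly barney

print "$one $two $three\n";           # 1 1 0
print scalar(@loose), " @strict\n";   # 4 fred barney
```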