InterScan Messaging Security Appliance (IMSA) 7.0 treats all keyword expressions as regular expressions. eManager 6.0 in IMSA 7.0 supports the following regular expressions.
Regular Expression | Description
. (dot) | Any character (byte) except newline
x | The character 'x'
\\ | The character '\'
\a | The alert (bell) character (ASCII 0x07)
\b | The meaning depends on where the meta-symbol appears:
1. Within square brackets [] or double quotes "", it is the backspace character (ASCII 0x08). For example, [\b] or "\b".
2. At the beginning (or end) of a regular expression, it requires the left (or right) side of every matched string to be a boundary. For example:
a) \bluck: the left side must be a boundary.
b) luck\b: the right side must be a boundary.
c) \bluck\b: both sides must be boundaries.
3. In the middle of a regular expression, it causes a syntax error.
\f | The form-feed character (ASCII 0x0C)
\n | The newline (line feed) character (ASCII 0x0A)
\r | The carriage-return character (ASCII 0x0D)
\t | The horizontal tab character (ASCII 0x09)
\v | The vertical tab character (ASCII 0x0B)
\n (n is an octal digit) | The character with octal value 0n (0 <= n <= 7)
\nn | The character with octal value 0nn (0 <= n <= 7)
\mnn | The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh | The character with hexadecimal value 0xhh; for example, \x20 is the space character
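A quick way to sanity-check the escapes above is to try them in another engine. The sketch below uses Python's `re` module as a stand-in (an assumption: it is not the eManager engine, and Python's \b is a word boundary that may also appear in the middle of a pattern). It confirms that \x20 and the three-digit octal escape \040 both name the space character, that [\b] is the backspace character, and how \bluck, luck\b, and \bluck\b differ. `match_starts` is a hypothetical helper.

```python
import re

def match_starts(pattern: str, text: str) -> list[int]:
    """Return the start offset of every non-overlapping match."""
    return [m.start() for m in re.finditer(pattern, text)]

# Hex and three-digit octal escapes both name the space character (0x20 == 0o40).
SPACE_HEX = re.compile(r"\x20")
SPACE_OCT = re.compile(r"\040")

# Inside a bracket expression, \b is the backspace character (ASCII 0x08).
BACKSPACE = re.compile(r"[\b]")

TEXT = "good luck, lucky charm, unluck"

if __name__ == "__main__":
    print(bool(SPACE_HEX.fullmatch(" ")), bool(SPACE_OCT.fullmatch(" ")))  # True True
    print(bool(BACKSPACE.fullmatch("\x08")))                                 # True
    print(match_starts(r"\bluck\b", TEXT))  # [5]: only the standalone word
    print(match_starts(r"\bluck", TEXT))    # [5, 11]: "luck" and "lucky"
    print(match_starts(r"luck\b", TEXT))    # [5, 26]: "luck" and "unluck"
```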
Bracket expressions and character classes
A bracket expression is a list of characters and/or character classes enclosed in square brackets ‘[]’. It matches a single character from the list, or from a range of characters in the list. If the first character of the list is the caret ‘^’, the expression matches any character that is not in the list.
For example:
Expression | Matches
[abc] | a, b, or c
[a-z] | a through z
[^abc] | Any character except a, b, or c
[[:alpha:]] | Any alphabetic character (see below)
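Python's `re` module uses the same syntax for simple lists, ranges, and negated lists, so the first three rows can be tried directly. This is a minimal sketch assuming Python `re` as a stand-in for eManager; note that Python does not support the [[:alpha:]] form. `chars_matching` is a hypothetical helper.

```python
import re

SAMPLE = "cab, zoo"

def chars_matching(bracket_expr: str, text: str) -> list[str]:
    """List every character of text matched by a single bracket expression."""
    return re.findall(bracket_expr, text)

if __name__ == "__main__":
    print(chars_matching(r"[abc]", SAMPLE))   # ['c', 'a', 'b']
    print(chars_matching(r"[a-z]", SAMPLE))   # ['c', 'a', 'b', 'z', 'o', 'o']
    print(chars_matching(r"[^abc]", SAMPLE))  # [',', ' ', 'z', 'o', 'o']
```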
Each character class designates the set of characters for which the corresponding standard C isXXX() function returns true. For example, [:alpha:] designates the characters for which isalpha() returns true, that is, any alphabetic character. Character classes must appear inside a bracket expression, as in [[:alpha:]].
Character class | Description
[:alpha:] | Alphabetic characters
[:digit:] | Digits
[:alnum:] | Alphanumeric characters (letters and digits)
[:cntrl:] | Control characters
[:blank:] | Space and tab
[:space:] | All white-space characters
[:graph:] | Visible characters (not spaces, control characters, or the like)
[:print:] | Like [:graph:], but also includes the space character
[:punct:] | Punctuation characters
[:lower:] | Lowercase alphabetic characters
[:upper:] | Uppercase alphabetic characters
[:xdigit:] | Hexadecimal digits (0-9, a-f, A-F)
For a case-insensitive expression, [:lower:] and [:upper:] are equivalent to [:alpha:].
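Python's `re` module does not understand [:name:] classes. The sketch below is a hypothetical pre-processing step (not anything eManager exposes) that rewrites each class inside a bracket expression as an explicit list of hex-escaped characters. It assumes the ASCII "C" locale meaning of the isXXX() functions; other locales or non-ASCII bytes may behave differently in eManager. The last example shows the case-insensitive note above: with re.IGNORECASE, [:lower:] also matches uppercase letters.

```python
import re
import string

# ASCII ("C" locale) members of each class, mirroring the C isXXX() functions.
# Assumption: non-ASCII bytes and other locales are ignored in this sketch.
POSIX_CLASSES = {
    "alpha": string.ascii_letters,
    "digit": string.digits,
    "alnum": string.ascii_letters + string.digits,
    "cntrl": "".join(map(chr, range(0x20))) + "\x7f",
    "blank": " \t",
    "space": " \t\n\r\f\v",
    "graph": "".join(map(chr, range(0x21, 0x7F))),
    "print": "".join(map(chr, range(0x20, 0x7F))),
    "punct": string.punctuation,
    "lower": string.ascii_lowercase,
    "upper": string.ascii_uppercase,
    "xdigit": string.hexdigits,
}

def expand_posix_classes(expr: str) -> str:
    """Rewrite each [:name:] as hex-escaped characters (\\xhh) so that Python's re,
    which lacks POSIX classes, can compile the surrounding bracket expression."""
    def to_hex(m: re.Match) -> str:
        return "".join(f"\\x{ord(c):02x}" for c in POSIX_CLASSES[m.group(1)])
    return re.sub(r"\[:([a-z]+):\]", to_hex, expr)

if __name__ == "__main__":
    print(bool(re.fullmatch(expand_posix_classes("[[:alpha:]]+"), "Hello")))  # True
    print(bool(re.fullmatch(expand_posix_classes("[^[:digit:]]"), "5")))      # False
    lower = expand_posix_classes("[[:lower:]]+")
    print(bool(re.fullmatch(lower, "ABC")),
          bool(re.fullmatch(lower, "ABC", re.IGNORECASE)))                   # False True
```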
Anchors
Expression | Description
^ | Beginning of line
$ | End of line
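In Python's `re` module, ^ and $ anchor at line boundaries only when re.MULTILINE is set; without it they anchor at the start and end of the whole text. The sketch below assumes that MULTILINE mode is the closest analogue of the "beginning/end of line" anchors above. `matching_lines` is a hypothetical helper.

```python
import re

MESSAGE = "Subject: invoice\nBody text mentions invoice\ninvoice"

def matching_lines(pattern: str, text: str) -> list[int]:
    """Return the 0-based line number of each match; re.MULTILINE makes ^ and $
    anchor at every line instead of only at the start and end of the text."""
    return [text.count("\n", 0, m.start())
            for m in re.finditer(pattern, text, re.MULTILINE)]

if __name__ == "__main__":
    print(matching_lines(r"^invoice", MESSAGE))    # [2]
    print(matching_lines(r"invoice$", MESSAGE))    # [0, 1, 2]
    print(matching_lines(r"^invoice$", MESSAGE))   # [2]
```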
Repetition
Expression | Description
R? | Matches R once or not at all
R* | Matches R zero or more times
R+ | Matches R one or more times
R{n} | Matches R exactly n times
R{n,} | Matches R at least n times
R{n,m} | Matches R at least n but no more than m times
Here, R is any regular expression.
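Python's `re` module writes these six quantifiers the same way, so the table can be checked with whole-string matches. This is a sketch assuming Python `re` as a stand-in; each case lists strings the pattern should accept and strings it should reject. `check` is a hypothetical helper.

```python
import re

# (pattern, strings that must fully match, strings that must not)
CASES = [
    (r"colou?r", ["color", "colour"], ["colouur"]),     # R?
    (r"ab*c", ["ac", "abc", "abbbc"], ["abxc"]),        # R*
    (r"ab+c", ["abc", "abbbc"], ["ac"]),                # R+
    (r"\d{3}", ["123"], ["12", "1234"]),                # R{n}
    (r"\d{3,}", ["123", "123456"], ["12"]),             # R{n,}
    (r"\d{2,4}", ["12", "1234"], ["1", "12345"]),       # R{n,m}
]

def check(cases) -> bool:
    """True if every pattern fully matches all its 'yes' strings and none of its 'no' strings."""
    for pattern, yes, no in cases:
        if not all(re.fullmatch(pattern, s) for s in yes):
            return False
        if any(re.fullmatch(pattern, s) for s in no):
            return False
    return True

if __name__ == "__main__":
    print(check(CASES))  # True
```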
Trend Micro does not recommend using ".*" in a regular expression. Because ".*" matches a sequence of characters of any length, it can produce a large number of matches, which increases memory usage and degrades performance.
For example, if the content is 123456abc, the regular expression ".*abc" produces these matches:
123456abc
23456abc
3456abc
456abc
56abc
6abc
abc
In this example, replace ".*abc" with "abc" to prevent excessive use of resources.
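The match list above can be reproduced by trying ".*abc" at every starting offset. This is a minimal sketch in Python's `re`, assuming a scanner that reports a candidate match at each offset; it illustrates how the work grows with ".*", not how eManager is implemented internally. `matches_from_every_offset` is a hypothetical helper.

```python
import re

CONTENT = "123456abc"

def matches_from_every_offset(pattern: str, text: str) -> list[str]:
    """Try the pattern at each starting offset and collect every match found."""
    compiled = re.compile(pattern)
    results = []
    for start in range(len(text)):
        m = compiled.match(text, start)
        if m:
            results.append(m.group())
    return results

if __name__ == "__main__":
    wide = matches_from_every_offset(r".*abc", CONTENT)
    narrow = matches_from_every_offset(r"abc", CONTENT)
    print(len(wide), wide)      # 7 matches, from '123456abc' down to 'abc'
    print(len(narrow), narrow)  # 1 ['abc']
```

Both patterns still find the keyword "abc", but the ".*" version does seven times as much matching work on this short input, and the gap widens as the content grows.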
Concatenation, alternation, and grouping
Expression | Description
RS | R followed by S (concatenation)
R|S | Either R or S
R/S | R, but only if it is followed by S
(R) | Grouping R
Here, R and S are any regular expressions.
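Python's `re` module supports concatenation, alternation, and grouping with the same syntax, but it has no R/S operator. The sketch below approximates R/S with the lookahead R(?=S); this is an assumption, and in Python the S part is checked but not included in the matched text.

```python
import re

TEXT = "free offer, free trial, freedom"

# RS: concatenation; "free" followed by " offer".
concat = re.findall(r"free offer", TEXT)                 # ['free offer']
# R|S: either alternative.
either = re.findall(r"offer|trial", TEXT)                # ['offer', 'trial']
# (R): grouping lets a quantifier apply to the whole group.
grouped = re.fullmatch(r"(ab)+", "ababab") is not None   # True
# R/S approximated as R(?=S): "free" only where " trial" follows.
# The lookahead text is not consumed, so only "free" is returned.
trailing = [(m.start(), m.group())
            for m in re.finditer(r"free(?= trial)", TEXT)]  # [(12, 'free')]

if __name__ == "__main__":
    print(concat, either, grouped, trailing)
```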
eManager provides the following shorthand for writing complicated regular expressions. eManager pre-processes each expression and translates the shorthand into standard regular expression syntax. For example, {D}+ is translated to [0-9]+. If a shorthand is enclosed in a bracket expression (i.e., []) or in double quotes, eManager does not translate it.
Shorthand | Description
{D} | [0-9]
{L} | [A-Za-z]
{SP} | [(),;\.\\<>@\[\]:]
{NUMBER} | [0-9]+
{WORD} | [A-Za-z]+
{CR} | \r
{LF} | \n
{LWSP} | [ \t]
{CRLF} | (\r\n)
{WSP} | [ \t\f]+
{ALLC} | .
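The pre-processing step described above can be sketched as a small scanner that expands shorthand only outside bracket expressions and double-quoted strings. This is a hypothetical illustration in Python, not eManager's implementation: `expand_shorthand` is an invented name, escapes are copied unchanged, and repetition counts such as {3} are left alone because they are not in the table.

```python
import re

# Shorthand table from above, as raw strings.
SHORTHAND = {
    "{D}": r"[0-9]", "{L}": r"[A-Za-z]", "{SP}": r"[(),;\.\\<>@\[\]:]",
    "{NUMBER}": r"[0-9]+", "{WORD}": r"[A-Za-z]+", "{CR}": r"\r", "{LF}": r"\n",
    "{LWSP}": r"[ \t]", "{CRLF}": r"(\r\n)", "{WSP}": r"[ \t\f]+", "{ALLC}": r".",
}

def expand_shorthand(expr: str) -> str:
    """Hypothetical pre-processor: expand shorthand outside [] and "" only."""
    out, i = [], 0
    in_bracket = in_quote = False
    while i < len(expr):
        ch = expr[i]
        if ch == "\\" and i + 1 < len(expr):           # escape: copy both characters
            out.append(expr[i:i + 2])
            i += 2
            continue
        if in_bracket and expr.startswith("[:", i):    # POSIX class such as [:digit:]
            end = expr.find(":]", i)
            if end != -1:
                out.append(expr[i:end + 2])
                i = end + 2
                continue
        if ch == "{" and not (in_bracket or in_quote):
            end = expr.find("}", i)
            name = expr[i:end + 1] if end != -1 else ""
            if name in SHORTHAND:                      # {n,m} repetition is not a shorthand
                out.append(SHORTHAND[name])
                i = end + 1
                continue
        if ch == '"' and not in_bracket:
            in_quote = not in_quote
        elif ch == "[" and not (in_quote or in_bracket):
            in_bracket = True
        elif ch == "]" and in_bracket:
            in_bracket = False
        out.append(ch)
        i += 1
    return "".join(out)

if __name__ == "__main__":
    print(expand_shorthand("{D}+"))                 # [0-9]+
    print(expand_shorthand("[{D}]"))                # [{D}] (inside brackets: unchanged)
    pattern = expand_shorthand("{WORD}{WSP}{NUMBER}")
    print(bool(re.fullmatch(pattern, "Order 42")))  # True
```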
eManager also provides the following meta-symbols. Unlike shorthand, meta-symbols can also be used inside a bracket expression.
Meta-symbol | Description
\s | [[:space:]]
\S | [^[:space:]]
\d | [[:digit:]]
\D | [^[:digit:]]
\w | [_[:alnum:]]
\W | [^_[:alnum:]]
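Python's `re` module has the same six meta-symbols; with the re.ASCII flag they are limited to ASCII, which makes them comparable to the bracket forms above. The sketch below, assuming Python `re` as a stand-in and the ASCII "C" locale for the POSIX classes, checks each pair over all 128 ASCII characters and shows a meta-symbol used inside a bracket expression. `same_on_ascii` and `PHONE` are hypothetical names.

```python
import re

# ASCII spellings of the bracket forms: [:space:] -> [ \t\n\r\f\v], [:digit:] -> [0-9],
# [:alnum:] -> [A-Za-z0-9].
PAIRS = [
    (r"\s", r"[ \t\n\r\f\v]"),
    (r"\S", r"[^ \t\n\r\f\v]"),
    (r"\d", r"[0-9]"),
    (r"\D", r"[^0-9]"),
    (r"\w", r"[_A-Za-z0-9]"),
    (r"\W", r"[^_A-Za-z0-9]"),
]

def same_on_ascii(meta: str, bracket: str) -> bool:
    """True if both patterns accept exactly the same single ASCII characters."""
    return all(
        bool(re.fullmatch(meta, chr(c), re.ASCII)) == bool(re.fullmatch(bracket, chr(c)))
        for c in range(128)
    )

# Meta-symbols also work inside a bracket expression: digits, white space, or '-'.
PHONE = re.compile(r"[\d\s-]+", re.ASCII)

if __name__ == "__main__":
    print(all(same_on_ascii(m, b) for m, b in PAIRS))  # True
    print(bool(PHONE.fullmatch("555-0100")))           # True
```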
Literal strings and the escape character in regular expressions
To match a character that has a special meaning in regular expressions (e.g. ‘+’), precede it with the backslash ‘\’ escape character. For example, to match the string “C/C++”, use the expression C\/C\+\+. Here ‘/’ is escaped as well, because R/S is the trailing-context operator.
When an expression needs many escape characters (e.g. C\/C\+\+), you can instead enclose the string in double quotes (e.g. .REG “C/C++”); the new expression is equivalent to the old one. Within double quotes, every character is literal except the backslash ‘\’, which remains an escape character. For example:
Expression | Description
"C/C++" | Matches the string “C/C++” (without the double quotes)
"Regular\x20Expression" | Matches the string “Regular Expression” (without the double quotes); \x20 is the space character
"[xyz]\"foo" | Matches the literal string [xyz]"foo
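The double-quote rule can be mirrored in Python by decoding the escapes inside the quotes and then escaping every resulting character with `re.escape`. This is a sketch under stated assumptions: `quoted_to_regex` is a hypothetical helper, it handles \xhh and the single-letter escapes from the first table, treats any other escaped character as itself, and does not handle octal escapes.

```python
import re

# Single-letter escapes from the first table; any other "\c" is taken as literal c.
SIMPLE_ESCAPES = {"a": "\a", "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v"}

def quoted_to_regex(body: str) -> str:
    """Hypothetical helper: turn the text between double quotes into an
    equivalent escaped regular expression (octal escapes not handled)."""
    out, i = [], 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt == "x" and re.fullmatch(r"[0-9A-Fa-f]{2}", body[i + 2:i + 4]):
                literal, i = chr(int(body[i + 2:i + 4], 16)), i + 4
            else:
                literal, i = SIMPLE_ESCAPES.get(nxt, nxt), i + 2
        else:
            literal, i = ch, i + 1
        out.append(re.escape(literal))  # every decoded character is matched literally
    return "".join(out)

if __name__ == "__main__":
    examples = [("C/C++", "C/C++"),
                (r"Regular\x20Expression", "Regular Expression"),
                (r'[xyz]\"foo', '[xyz]"foo')]
    for body, text in examples:
        regex = quoted_to_regex(body)
        print(regex, bool(re.fullmatch(regex, text)))
```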
Change the adjacent <space> to "\x20" for the following in a regular expression: