match-regexp

Function

Package: excl

Arguments: string-or-regexp string-to-match &key newlines-special case-fold return start end shortest

The string-or-regexp argument is either a regular expression object (the result of compile-regexp) or a string, in which case it is compiled into a regular expression object. The string-to-match argument is the string to match against the regular expression. This function first attempts to match the regular expression starting at the first character of string-to-match; if that fails, it searches further into string-to-match for a match (unless the regular expression begins with a caret, which anchors the match to the start).
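
As a rough illustration of the search behavior described above (a sketch, not output captured from a live session; the :return :index argument is described under the keyword arguments, and multiple-value-list is used only to gather the returned values into one list):

    ;; No match at position 0, so the matcher searches inside the string.
    (multiple-value-list (match-regexp "bar" "foobar" :return :index))
    ;; expected, per the description here: (t (3 . 6))

    ;; A leading caret anchors the pattern, so no search is made.
    (match-regexp "^bar" "foobar" :return :index)
    ;; expected: nil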

The keyword arguments are:

newlines-special: If true (the default), a newline will not match the . (i.e. a period) regular expression. This is useful to prevent multiline matches.
case-fold: If true then the string-to-match is effectively mapped to lower case before doing the match. Thus lower case characters in the regular expression match either case and upper case characters match only upper case characters.
return: The return value from a failed match is nil. If the value of return is :string, a successful match returns multiple values. The first value is t. The second value is the substring of the string-to-match that matched the regular expression. The third value (if any) is the substring that matched group 1, the fourth value is the substring that matched group 2, and so on. If you use the \| form, some groups may have no associated match, in which case nil is returned as that value. In highly nested \| forms, a group may return a match string even though that group had no match in the final match.

If the value of return is :index, the values are as for :string except that each substring is replaced by a cons giving the start and end indices of that match in the original string-to-match. The end index is one greater than the index of the last character in the substring.

If the value of return is nil then the one value t is returned when the match succeeds.

start: The first character in the string-to-match to match against.
end: One past the last character in the string-to-match to match against.
shortest: This makes match-regexp return the shortest rather than the longest match. One motivation for this is parsing HTML. Suppose you want to search for the next item in italics, which in HTML looks like <i>foo</i>. Suppose the variable string holds "<i>foo</i> and <i>bar</i>". The following example shows the difference:
    user(10): (match-regexp "<i>.*</i>" string)
    "<i>foo</i> and <i>bar</i>"
    user(11): (match-regexp "<i>.*</i>" string
                        :shortest t)
    "<i>foo</i>"
     
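
The return and case-fold arguments above can be sketched as follows. This is an illustration under stated assumptions rather than a transcript: it assumes groups are written \( ... \) (see regexp.htm), with each backslash doubled inside a Lisp string, and the input "width=80" is made up for the example.

    ;; :string returns t, the whole match, then one value per group.
    (multiple-value-list
      (match-regexp "\\([a-z]*\\)=\\([0-9]*\\)" "width=80" :return :string))
    ;; expected, per the text above: (t "width=80" "width" "80")

    ;; :index returns (start . end) conses in place of the substrings.
    (multiple-value-list
      (match-regexp "\\([a-z]*\\)=\\([0-9]*\\)" "width=80" :return :index))
    ;; expected, per the text above: (t (0 . 8) (0 . 5) (6 . 8))

    ;; With :return nil only success or failure is reported.
    (match-regexp "width" "WIDTH=80" :case-fold t :return nil)
    ;; expected: t, since lower case pattern characters match either case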

Compilation note: a compiler macro is defined for match-regexp that treats specially any call whose first argument is a constant string. For example, the form (match-regexp "foo" x) compiles to code that arranges to call compile-regexp on the string once, when the compiled (fasl) file is loaded, rather than each time the form is evaluated. Since the cost of compile-regexp is high, this saves a lot of time.
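
When the pattern is not a constant string, the compiler macro presumably cannot help, and a similar saving can be had by calling compile-regexp once and reusing the result. A minimal sketch; count-matching-lines is a hypothetical helper written for this illustration, not part of excl:

    (defun count-matching-lines (pattern lines)
      ;; Compile the pattern once, outside the loop, then reuse the
      ;; regular expression object for every line.
      (let ((re (compile-regexp pattern)))
        (count-if #'(lambda (line) (match-regexp re line :return nil))
                  lines)))

    (count-matching-lines "^#" '("# one" "two" "# three"))
    ;; expected: 2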

See regexp.htm for more information.

The documentation is described in introduction.htm and the index is in index.htm.

Copyright (c) 1998-2000, Franz Inc. Berkeley, CA., USA. All rights reserved.

Created 2000.10.5.