Help with regular expressions (preg_match)

Live forum: http://forum.freeipodguide.com/viewtopic.php?t=78303

JennyWren

27-01-2009 02:33:05

Help me, regex gurus. I'm trying to get information from this page

http://www.students.ubc.ca/calendar/courses.cfm

I do $text = file_get_contents ( 'http://www.students.ubc.ca/calendar/courses.cfm' );

Then when I try to do a preg_match it usually returns no matches. For instance, if I do preg_match ( '|ADHE|', $text ) it returns 0. Why?

dmorris68

27-01-2009 19:00:26

Well, for starters, your sample expression doesn't make sense to me, so I'm not surprised it doesn't find anything. What are you trying to find with the '|ADHE|' pattern? Vertical bars are boolean OR operators, so you can search for 'that|this' to find either word. It looks to me like the way you have it formatted would not be syntactically correct.

If you're going to work with regex much, particularly if you're new to complicated expressions, consider checking out some of the visual regex tools like RegexBuddy or Expresso. The former is probably the best known and most powerful but is shareware; Expresso and a few others are free. There are also some web sites I've run across over the years with interactive expression builders.

EDIT: Okay, after clicking through one of the links on that page I found the course description page, and 'ADHE' is the first category code in the list. If that's all you're looking for, drop the vertical bars and search for 'ADHE'. If you're worried about matching common character sequences that might occur inside other words, you can make it a whole-word search (the text must sit between word boundaries) by using '\bADHE\b'. Or, since you're scraping an HTML dump, you could search for the surrounding tag brackets, like '>ADHE<'.
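To make the difference between those three searches concrete, here is a rough sketch. The $html string is a made-up fragment, not the real page, and the patterns are wrapped in / delimiters because preg_match_all() expects them. It also assumes PHP 5.4 or later, where the matches argument is optional.

```php
<?php
// Hypothetical fragment standing in for the scraped HTML; not the real page.
$html = '<td>ADHE</td><td>Adult and Higher Education</td><td>ROADHEAD</td>';

// Plain substring search: also counts the ADHE buried inside ROADHEAD.
$plain = preg_match_all('/ADHE/', $html);

// Word-boundary search: only counts ADHE standing on its own.
$word = preg_match_all('/\bADHE\b/', $html);

// Tag-bracket search: only counts ADHE as the whole content of an element.
$tagged = preg_match_all('/>ADHE</', $html);

var_dump($plain, $word, $tagged);
```

On this sample the plain search should report more hits than the other two, which is the over-matching the word-boundary and bracket forms are meant to avoid.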

JennyWren

27-01-2009 20:34:40

I thought I needed delimiters? If I just use 'ADHE' I get this:

Warning: preg_match() [function.preg-match]: Delimiter must not be alphanumeric or backslash

That's just an example, I'm writing a scraper which grabs the department name and code for each department. I managed to do it a different way in the end, but I still don't know why this didn't work.

dmorris68

28-01-2009 15:42:07

Ah, yeah, with PCRE expressions you're right. Sorry, I still default to thinking in POSIX since I rarely use Perl.

Yes, you need delimiters for Perl expressions, but they can be any characters that don't exist in the search pattern. It might not matter, but since the vertical bar has special meaning inside a pattern, try using another pair of delimiters, e.g. /ADHE/ or {ADHE} or %ADHE% etc.
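A quick way to compare delimiter choices is to run the same literal pattern under each of them against a known string. This is only a sketch: $text is a made-up sample, not the page content, and the last entry adds the case-insensitive 'i' modifier.

```php
<?php
// Hypothetical sample text; the real page content is not reproduced here.
$text = 'ADHE Adult and Higher Education';

// The same literal pattern under several delimiter choices.
$patterns = array('|ADHE|', '/ADHE/', '{ADHE}', '%ADHE%', '/adhe/i');

$results = array();
foreach ($patterns as $pattern) {
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    $results[$pattern] = preg_match($pattern, $text);
}

var_dump($results);
```

If every entry comes back as 1, including '|ADHE|', then the delimiter is not the reason for the zero and the fetched text itself is the next thing to look at.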

If that doesn't work, I'd have to poke around. It looks to me like it should. I don't recall a modifier on the end being required, but just for grins throw an 'i' on the end to indicate case-insensitivity and see if it works.
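For the poking around, one thing worth ruling out is that the fetch itself failed: file_get_contents() returns false on failure, and a false or empty subject would make preg_match() report no match. The helper below is a hypothetical sketch (describe_search is not from this thread, and the labels it returns are invented) that separates those cases from a genuine miss.

```php
<?php
// Hypothetical helper, not from the thread: says why a search came up empty.
function describe_search($text, $needle)
{
    if ($text === false) {
        return 'fetch failed';      // file_get_contents() returns false on failure
    }
    if ($text === '') {
        return 'empty response';
    }
    // preg_quote() escapes regex metacharacters and the chosen delimiter.
    $pattern = '/' . preg_quote($needle, '/') . '/i';
    $result = preg_match($pattern, $text);
    if ($result === false) {
        return 'regex error';
    }
    return $result === 1 ? 'found' : 'not found';
}

echo describe_search(false, 'ADHE'), "\n";
echo describe_search('<td>adhe</td>', 'ADHE'), "\n";
echo describe_search('<td>MATH</td>', 'ADHE'), "\n";
```

In the real script you would pass in the value returned by file_get_contents() rather than the literal samples used here.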