Regex to get value within tag

I have a sample set of XML returned:

<rsp stat="ok">
  <site>
    <id>1234</id>
    <name>testAddress</name>
    <hostname>anotherName</hostname>
    ...

  </site>
  <site>
    <id>56789</id>
    <name>ba</name>
    <hostname>alphatest</hostname>
    ...
  </site>
</rsp>

I want to extract everything within <name></name> but not the tags themselves, and only for the first instance (or for an item selected by some other test).

Is this possible with regex?

Best answer

The best tool for this kind of task is XPath.

NSURL *rspURL = [NSURL fileURLWithPath:[@"~/rsp.xml" stringByExpandingTildeInPath]];
NSXMLDocument *document = [[[NSXMLDocument alloc] initWithContentsOfURL:rspURL options:NSXMLNodeOptionsNone error:NULL] autorelease];

NSArray *nodes = [document nodesForXPath:@"/rsp/site[1]/name" error:NULL];
NSString *name = [nodes count] > 0 ? [[nodes objectAtIndex:0] stringValue] : nil;

If you want the name of the site which has id 56789, use this XPath instead: /rsp/site[id='56789']/name. I suggest you read the W3Schools XPath tutorial for a quick overview of the XPath syntax.
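For readers who are not on Cocoa, the same two lookups can be sketched in Python with the standard library's xml.etree.ElementTree, which supports a limited subset of XPath. This is only an illustration: the names RSP_XML, first_name and name_by_id are made up here, and the paths are relative to the root <rsp> element, so the leading /rsp is dropped.

```python
import xml.etree.ElementTree as ET

# Sample response modelled on the XML in the question.
RSP_XML = """<rsp stat="ok">
  <site>
    <id>1234</id>
    <name>testAddress</name>
    <hostname>anotherName</hostname>
  </site>
  <site>
    <id>56789</id>
    <name>ba</name>
    <hostname>alphatest</hostname>
  </site>
</rsp>"""

root = ET.fromstring(RSP_XML)  # root is the <rsp> element

# First <site>, then its <name> child (positions are 1-based, as in XPath).
first_name = root.findtext("site[1]/name")

# The <site> whose <id> child has the text 56789, then its <name> child.
name_by_id = root.findtext("site[id='56789']/name")

print(first_name)
print(name_by_id)
```

findtext returns None when nothing matches, which plays the same role as the nil fallback in the Objective-C snippet.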

Answers

<disclaimer>I don't use Objective-C</disclaimer>

You should be using an XML parser, not regexes. XML is not a regular language, hence not easily parseable by a regular expression. Don't do it.

Never use regular expressions or basic string parsing to process XML. Every language in common usage right now has perfectly good XML support. XML is a deceptively complex standard, and it's unlikely your code will be correct in the sense that it will properly parse all well-formed XML input. Even if it does, you're wasting your time because (as just mentioned) every language in common usage has XML support. It is unprofessional to use regular expressions to parse XML.
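To make the "deceptively complex" point concrete, here is a small Python sketch comparing a naive regex with a real parser on well-formed input. The input TRICKY_XML is a hypothetical example invented for this illustration (it is not from the question); it contains a commented-out element and an entity reference.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input: well-formed XML with a commented-out element and an entity.
TRICKY_XML = """<rsp stat="ok">
  <site>
    <!-- <name>oldName</name> -->
    <name>Tom &amp; Jerry</name>
  </site>
</rsp>"""

# Naive regex: also matches inside the comment and leaves the entity undecoded.
regex_names = re.findall(r"<name>(.*?)</name>", TRICKY_XML)

# XML parser: the default parser drops the comment and decodes &amp; to &.
parsed_names = [el.text for el in ET.fromstring(TRICKY_XML).iter("name")]

print(regex_names)   # ['oldName', 'Tom &amp; Jerry']
print(parsed_names)  # ['Tom & Jerry']
```

CDATA sections, attributes on the tag and namespace prefixes trip up the regex in similar ways.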

You could use Expat, which has Objective-C bindings.

Apple's options are:

  1. The CF XML parser
  2. The tree-based Cocoa parser (10.4 only)

Without knowing your language or environment, here are some Perl expressions. Hopefully they will give you the right idea for your application.

Your regular expression to capture the text content of a tag would look something like this:

m/>([^<]*)</

This will capture the content in each tag. You will have to loop on the match to extract all content. Note that this does not account for self-terminated tags. You would need a regex engine with negative lookbehinds to accomplish that. Without knowing your environment, it's hard to say if it would be supported.
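As a rough illustration of looping on that match, here is the same pattern in Python. The SAMPLE string is a trimmed copy of the XML from the question, and the variable names are made up for this sketch. Note that the pattern also captures the whitespace between adjacent tags, so those matches are filtered out.

```python
import re

# Trimmed copy of the XML from the question.
SAMPLE = """<rsp stat="ok">
  <site>
    <id>1234</id>
    <name>testAddress</name>
  </site>
</rsp>"""

# Same pattern as the Perl m/>([^<]*)</ : text between a '>' and the next '<'.
raw_matches = re.findall(r">([^<]*)<", SAMPLE)

# Whitespace between tags is captured too, so keep only non-blank matches.
contents = [m.strip() for m in raw_matches if m.strip()]

print(contents)  # ['1234', 'testAddress']
```

The result is a flat list: nothing says which value came from <id> and which from <name>.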

You could also just strip all tags from your source using something like:

s/<[^>]*>//g

Also depending on your environment, if you can use an XML-parsing library, it will make your life much easier. After all, by taking the regex approach, you lose everything that XML really offers you (structured data, context awareness, etc).
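A quick Python sketch of the tag-stripping substitution shows what "losing context" means in practice. SAMPLE is a one-line fragment based on the question's XML, used here only for illustration.

```python
import re

# One-line fragment based on the XML in the question.
SAMPLE = "<site><id>1234</id><name>testAddress</name></site>"

# Same substitution as s/<[^>]*>//g : delete everything that looks like a tag.
stripped = re.sub(r"<[^>]*>", "", SAMPLE)

print(stripped)  # 1234testAddress
```

The id and the name run together, and there is no longer any way to tell where one field ends and the next begins.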

As others say, you should really be using NSXMLParser for this sort of thing.

HOWEVER, if you only need to extract the stuff in the name tags, then RegexKitLite can do it quite easily:

NSString * xmlString = ...;
NSArray * captures = [xmlString arrayOfCaptureComponentsMatchedByRegex:@"<name>(.*?)</name>"];
for (NSArray * captureGroup in captures) {
  NSLog(@"Name: %@", [captureGroup objectAtIndex:1]);
}
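If RegexKitLite is not available, the same non-greedy pattern can be tried in Python's re module. This sketch assumes the simple, namespace-free XML from the question; XML_STRING, NAME_RE and the result names are made up for the example. It also shows how to take only the first match, which is what the question asked for.

```python
import re

# XML from the question, without the elided parts.
XML_STRING = """<rsp stat="ok">
  <site>
    <id>1234</id>
    <name>testAddress</name>
    <hostname>anotherName</hostname>
  </site>
  <site>
    <id>56789</id>
    <name>ba</name>
    <hostname>alphatest</hostname>
  </site>
</rsp>"""

# Non-greedy capture of whatever sits between <name> and </name>.
NAME_RE = re.compile(r"<name>(.*?)</name>", re.DOTALL)

# Every match, in document order.
all_names = NAME_RE.findall(XML_STRING)

# Only the first match, or None if there is no <name> element at all.
first = NAME_RE.search(XML_STRING)
first_name = first.group(1) if first else None

print(all_names)   # ['testAddress', 'ba']
print(first_name)  # testAddress
```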

Careful about namespaces:

<prefix:name xmlns:prefix="">testAddress</prefix:name>

is equivalent XML that will break regex-based code. For XML, use an XML parser. XPath is your friend for things like this. The XPath code below will return a sequence of strings with the info you want:

./rsp/site/name/text()

Cocoa has NSXML support for XPath.
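The namespace problem can be sketched in Python as well. Everything specific here is an assumption made for illustration: the namespace URI http://example.com/sites is hypothetical, as are the variable names. ElementTree has no text() step, so the .text attribute is read instead.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used only for this illustration.
NS_URI = "http://example.com/sites"

PREFIXED_XML = f"""<rsp stat="ok" xmlns:p="{NS_URI}">
  <site>
    <p:name>testAddress</p:name>
  </site>
</rsp>"""

# The regex looks for the literal text "<name>" and finds nothing.
regex_names = re.findall(r"<name>(.*?)</name>", PREFIXED_XML)

# The parser matches on the namespace URI; the prefix used in the query ("s")
# does not have to be the one used in the document ("p").
root = ET.fromstring(PREFIXED_XML)
parsed_names = [el.text for el in root.findall("site/s:name", {"s": NS_URI})]

print(regex_names)   # []
print(parsed_names)  # ['testAddress']
```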




