I need a Regular Expression To Extract Images And HTML Documents
  • Time: 2012-04-13 20:02:47
  • Tags:
  • c#
  • wpf
  • regex

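I have various HTML documents that link to: (1) other HTML files, and (2) image files such as jpg, png, and bmp. I need a regular expression that extracts these links, and I can't seem to work it out.

The markup on each page looks similar to the following: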


<IMG style="MARGIN-BOTTOM: 20px; MARGIN-LEFT: 20px" align=right src="…">

<IMG style="MARGIN-BOTTOM: 25px; MARGIN-LEFT: 25px" align=right src="…">

<IMG style="MARGIN-BOTTOM: 20px; MARGIN-LEFT: 20px" align=right src="…">

<a href="javascript:will.POPUP({url: testDoc001.htm, type:shared,width:600,height:645})">

<a href="javascript:will.POPUP({url:testDoc002.html, type:shared,width:700,height:712})">


For example, the regular expression would take the HTML above and produce the following array:

…

…

…

testDoc001.htm

testDoc002.html

Can anyone help me? Thanks very much.

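Best answer

Keep in mind that you are trying to parse HTML with regular expressions. Consider using an HTML parser instead, such as the HTML Agility Pack: http://htmlagilitypack.codeplex.com/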
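
As a rough illustration of the parser-first approach suggested above, the sketch below uses Python's standard-library html.parser as a stand-in for the HTML Agility Pack (which is a .NET library); it is not the Agility Pack API. The names LinkCollector, extract_links, and URL_IN_HREF, the sample markup, and the image paths in it are all assumptions made up for this example. The parser handles the tags and attributes, and a small regular expression is applied only to the href value, where the document name is buried inside a javascript: call.

```python
import re
from html.parser import HTMLParser

# Matches the document name after "url:" inside a javascript: href.
# The href format is an assumption modelled on the question's markup.
URL_IN_HREF = re.compile(r"url:\s*['\"]?([^'\",\s]+\.html?)\b", re.IGNORECASE)
IMAGE_EXTENSIONS = (".jpg", ".png", ".bmp")


class LinkCollector(HTMLParser):
    """Collects image sources and linked HTML documents in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <IMG> arrives as "img".
        attributes = dict(attrs)
        if tag == "img":
            src = attributes.get("src") or ""
            if src.lower().endswith(IMAGE_EXTENSIONS):
                self.links.append(src)
        elif tag == "a":
            href = attributes.get("href") or ""
            match = URL_IN_HREF.search(href)
            if match:
                self.links.append(match.group(1))


def extract_links(html):
    """Returns the image paths and HTML document names found in the markup."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Hypothetical sample markup; the image paths are invented for illustration.
SAMPLE = """
<IMG style="MARGIN-BOTTOM: 20px; MARGIN-LEFT: 20px" align=right src="images/a.jpg">
<IMG style="MARGIN-BOTTOM: 25px; MARGIN-LEFT: 25px" align=right src="images/b.bmp">
<a href="javascript:will.POPUP({url:'testDoc001.htm', type:'shared',width:600,height:645})">
"""

if __name__ == "__main__":
    print(extract_links(SAMPLE))
```

Because the parser normalizes tag case and attribute quoting, the only pattern left to maintain is the one for the url: value, which is easier to adjust if the popup call changes shape.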

Other answers


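use strict;
use warnings;

my $x = "your html";

# $1 is the first capture group: the image path ending in .jpg or .png
while ($x =~ /<img\b[^>]*?\bsrc="([^"]+\.(?:jpg|png))"/ig) {
    print "$1\n";
}

# $1 is the .htm/.html document name that follows "url:" inside the href
while ($x =~ /<a\b[^>]*?\bhref="[^"]*?url:[ '"]?([^ '",]+\.html?)\b/ig) {
    print "$1\n";
}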

Output:

images/sample001.jpg
images/sample002.png
testDoc001.htm
testDoc002.html

Both regexps are used with the ig modifiers: i makes the match case-insensitive, and g finds every match in the string rather than only the first.
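
For readers working outside Perl, the effect of the two modifiers can be sketched in Python: re.IGNORECASE plays the role of i, and re.findall returns every match, much as looping with g does. The pattern and the sample string below are illustrative assumptions, not the answer's exact regexps.

```python
import re

# Hypothetical sample: one upper-case tag and extension, one lower-case.
html = '<IMG SRC="images/one.JPG"> <img src="images/two.png">'

# Image path ending in .jpg or .png inside an <img> tag's src attribute.
pattern = r'<img\b[^>]*?\bsrc="([^"]+\.(?:jpg|png))"'

# Without the flag, only the lower-case tag matches.
case_sensitive = re.findall(pattern, html)

# re.IGNORECASE mirrors Perl's /i; re.findall collects all matches, like looping with /g.
all_matches = re.findall(pattern, html, flags=re.IGNORECASE)

print(case_sensitive)
print(all_matches)
```

The captured text keeps its original case, so any later comparison on the extension should lower-case it first.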




