Regex combining look-ahead to include a character as a match but only when not followed by a specific other character
  • Date: 2023-12-22 01:07:05
  • Tags:
  • regex
  • pcre

In short: why doesn't this work (I have tried various escaping variants, as different applications require):

<ref name="*([^">/(?=>)]*)"* *(/*)>

for handling cases like:

<ref name="Te/st12" />

Details:

I have sample data like this:

<ref name=Test1/>
<ref name=Test2 />
<ref name="Test3"/>
<ref name="Test4" />
<ref name=Test5>Foo</ref>
<ref name="Test6">Foo</ref>
<ref name=Te/st7/>
<ref name=Te/st8 />
<ref name="Te/st9"/>
<ref name="Te/st10" />
<ref name=Te/st11>Foo</ref>
<ref name="Te/st12">Foo</ref>

This regex seems like it should work (and I've tried several variants on the theme):

<ref name="*([^">/(?=>)]*)"* *(/*)>

Or the same pattern with more or less escaping, if a particular application requires it.

But it fails to match the cases in the sample data whose names contain / (the Te/st7 through Te/st12 lines).

To examine the problem more simply:

<ref name="*([^">/(?=>)]*)"*

Again, with more or less escaping if a particular application requires it.

It always chokes on the / in the name data of the latter examples, despite /(?=>) (which, depending on the application, might be escaped as \/(?=>) or other variations thereof), meaning "the slash character when followed by the greater-than character". My suspicion is that (?=...) doesn't work properly inside [^...], but I've tried various | (or) constructions, e.g. using a separate [?!>], and those didn't work either.

The goal is to normalize all such cases (whether the name data is quoted or unquoted) into one consistent format, by replacing with:

<ref name="$1"$2>

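PS: I tagged this "pcre". The application in question is the search/replace feature of Wikipedia's built-in editing tools (edit a page, click "Advanced" in the toolbar above the edit box, and look at the far right for a magnifying-glass icon). It offers a regex search/replace feature, but does not say exactly which flavor of regex it uses. I'm not wrapping the whole expression in the typical /.../ delimiters with flags, because the application does not use them.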

Answers

You can't put a lookahead inside a character class [...]. Try splitting it into two separate cases with an OR condition instead:

<ref name="?((?:[^">/]|/(?!>))*?)"? */?>
<ref\sname="?    # start of the `ref` tag, up to the `name` attribute

(                # group 1, what you want to capture
  (?:            # non-capturing group, for grouping the following OR condition
      [^">/]     # any character except `"`, `>` and `/`
      |          # OR
      /(?!>)     # if it IS a `/`, then it must NOT be followed by a `>`
  )*?            # the non-capturing group repeats zero or more times, as few as possible (to avoid capturing trailing whitespace)
)                # group 1 ends

"?\s*/?>         # everything after the `name` attribute

Check the test cases here.
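To see why the character class in the question fails and how the alternation behaves, here is a minimal sketch using Python's re module as a stand-in engine (an assumption: the Wikipedia editor's regex flavor may differ). The names QUESTION_PATTERN, ANSWER_PATTERN and normalize are illustrative. The capture group around the trailing slash is my addition, not part of the answer's pattern, so that the slash can be reused as the second group in the replacement. Python writes backreferences as \1 rather than $1.

```python
import re

# Sample lines taken from the question.
SAMPLES = [
    '<ref name=Test1/>',
    '<ref name=Test2 />',
    '<ref name="Test3"/>',
    '<ref name="Test4" />',
    '<ref name=Test5>Foo</ref>',
    '<ref name="Test6">Foo</ref>',
    '<ref name=Te/st7/>',
    '<ref name=Te/st8 />',
    '<ref name="Te/st9"/>',
    '<ref name="Te/st10" />',
    '<ref name=Te/st11>Foo</ref>',
    '<ref name="Te/st12">Foo</ref>',
]

# The question's pattern. Inside [...], the characters ( ? = ) are plain
# literals, so this class just excludes ", >, /, (, ?, = and ).
QUESTION_PATTERN = re.compile(r'<ref name="*([^">/(?=>)]*)"* *(/*)>')

# The alternation from the answer. Assumption: the trailing slash is wrapped
# in a second capture group so it can be reused in the replacement.
ANSWER_PATTERN = re.compile(r'<ref name="?((?:[^">/]|/(?!>))*?)"? *(/?)>')


def normalize(line: str) -> str:
    """Rewrite the opening ref tag as <ref name="..."> or <ref name="..."/>."""
    return ANSWER_PATTERN.sub(r'<ref name="\1"\2>', line)


for sample in SAMPLES:
    matched = QUESTION_PATTERN.search(sample) is not None
    print(f'{sample!r}: question pattern matched={matched}, '
          f'normalized={normalize(sample)!r}')
```

The question's pattern finds nothing in the Te/st lines because its class stops at the first /, while the alternation accepts any / that is not immediately followed by >.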

But be aware that matching HTML/XML tags with regex is generally considered a bad idea. It only works for basic validation (this pattern doesn't work for inputs like name="a>b", name='ab', etc.). For anything more complex, use a markup parsing library instead.

Given that you have no choice but to use regex, you could use this one:

<ref name=(?:"((?:[^"\\]|\\.)*)"|((?:(?!\s*/?>).)*))\s*(/?)>

This matches:

  • <ref name= : literally <ref name=
  • either:
    1. "((?:[^"\\]|\\.)*)" : a quoted string (captured in group 1) that may include escaped characters, including other quotes (from this answer); or
    2. ((?:(?!\s*/?>).)*) : an unquoted string (captured in group 2), which ends when the text at the current position matches \s*/?>. This uses a tempered greedy token so that it does not match any further than necessary.
  • \s*(/?)> : optional whitespace, an optional / (captured in group 3), and a >

You would replace that with the following:

<ref name="$1$2"$3>
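As a rough check of this second pattern, the sketch below runs it with Python's re module (a stand-in engine, which is an assumption: the Wikipedia editor may use a different flavor). The names REF_PATTERN and normalize_ref are illustrative. Python spells the replacement with \1\2 and \3 instead of $1$2 and $3, and since Python 3.5 a group that did not take part in the match is substituted as an empty string.

```python
import re

# Quoted name (group 1) or unquoted name (group 2), then optional
# whitespace and an optional slash (group 3) before the closing >.
REF_PATTERN = re.compile(
    r'<ref name=(?:"((?:[^"\\]|\\.)*)"|((?:(?!\s*/?>).)*))\s*(/?)>'
)


def normalize_ref(line: str) -> str:
    """Rewrite the opening ref tag with a quoted name and no stray spaces."""
    # Only one of groups 1 and 2 participates in a match; the other one
    # is substituted as an empty string.
    return REF_PATTERN.sub(r'<ref name="\1\2"\3>', line)


for sample in [
    '<ref name=Test2 />',
    '<ref name="Test4" />',
    '<ref name=Te/st7/>',
    '<ref name=Te/st11>Foo</ref>',
    '<ref name="Te/st12">Foo</ref>',
    '<ref name="a>b"/>',
]:
    print(f'{sample!r} -> {normalize_ref(sample)!r}')
```

Because the quoted branch only excludes " and backslash, a quoted name such as "a>b" is kept whole here, which the simpler alternation-based pattern cannot do.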

Demo on regex101




