Question

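I am new to the XQuery world, and I can't work out where to start writing the following logic in MarkLogic XQuery. I would appreciate any ideas or suggestions that help me achieve the following: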

I want to query A.XML based on a word lookup in B.XML. The query should produce C.XML. The logic should be as follows:

A.XML

<root>
<content> The state passed its first ban on using a handheld cellphone while driving in 2004 Nokia Vodafone Nokia Growth Recession Creicket HBO</content>
</root>

B.XML

<WordLookUp>
<companies>
    <company name="Vodafone">Vodafone</company>
    <company name="Nokia">Nokia</company>
</companies>
<topics>
    <topic group="Sports">Cricket</topic>
    <topic group="Entertainment">HBO</topic>
    <topic group="Finance">GDP</topic>
</topics>
<moods>
    <mood number="4">Growth</mood>
    <mood number="-5">Depression</mood>
    <mood number="-3">Recession</mood>
</moods>
</WordLookUp>

C.XML (Result XML)

<root>
    <content> The state passed its first ban on using a handheld cellphone while driving in 2004 Nokia Vodafone Nokia Growth Recession Creicket HBO</content>
    <updatedElement>
        <companies>
            <company count="1">Vodafone</company>
            <company count="2">Nokia</company>
        </companies>
        <mood>1</mood>
        <topics>
             <topic count="1">Sports</topic>
             <topic count="1">Entertainment</topic>
        </topics>
        <word-count>22</word-count>
    </updatedElement>
</root>

For each company/text() in B.xml, search A.xml; if a match is found, create the tag <company count="number of occurrences of that word">company/@name</company>.
For each topic/text() in B.xml, search A.xml; if a match is found, create the tag <topic count="number of occurrences of that word">topic/@group</topic>.
For each mood/text() in B.xml, search A.xml; the mood value is [occurrences of the first word * /mood[first word]/@number] + [occurrences of the second word * /mood[second word]/@number] + ...
Finally, output the word count of the content.
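Plugging the sample data into the mood rule (only Growth and Recession occur in A.xml, once each; Depression does not occur) gives the expected <mood>1</mood> in C.XML:

```latex
\text{mood} = \sum_{m \in \text{moods}} \mathrm{occ}(m)\cdot \mathrm{number}(m)
            = 1\cdot 4 + 0\cdot(-5) + 1\cdot(-3) = 1
```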

Answer 1

Here is a simpler, tighter and fully standard solution. It doesn't use any implementation-specific extension functions, so it works with any compliant XQuery 1.0 processor:

let $content := doc("file:///c:/temp/delete/A.xml")/*/*,
    $lookup  := doc("file:///c:/temp/delete/B.xml")/*,
    (: split on runs of non-word characters; [.] drops the empty token
       produced by the leading space :)
    $words   := tokenize($content, '\W+')[.]
return
  <root>
    {$content}
    <updatedElement>
      <companies>
        {for $c in $lookup/companies/*,
             $occurs in count(index-of($words, $c))
         return
           if ($occurs)
           then <company count="{$occurs}">{$c/text()}</company>
           else ()
        }
      </companies>
      <mood>
        {
         (: each mood word's @number, weighted by how often that word occurs :)
         sum(for $m in $lookup/moods/*
             return count(index-of($words, string($m))) * xs:integer($m/@number))
        }
      </mood>
      <topics>
        {for $t in $lookup/topics/*,
             $occurs in count(index-of($words, $t))
         return
           if ($occurs)
           then <topic count="{$occurs}">{data($t/@group)}</topic>
           else ()
        }
      </topics>
      <word-count>{count($words)}</word-count>
    </updatedElement>
  </root>

When this is applied to the provided files A.xml and B.xml (located in the local directory c:/temp/delete), the wanted, correct result is produced:

<root>
   <content> The state passed its first ban on using a handheld cellphone while driving in 2004 Nokia Vodafone Nokia Growth Recession Cricket HBO</content>
   <updatedElement>
      <companies>
         <company count="1">Vodafone</company>
         <company count="2">Nokia</company>
      </companies>
      <mood>1</mood>
      <topics>
         <topic count="1">Sports</topic>
         <topic count="1">Entertainment</topic>
      </topics>
      <word-count>22</word-count>
   </updatedElement>
</root>
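The portable approach rests on three standard XQuery pieces: tokenize() with '\W+', the [.] filter, and count(index-of(...)). Below is a minimal standalone sketch of just those pieces. It assumes the sample sentence is inlined as a string instead of being read from A.xml, and the $text, $nokia and $gdp names are illustrative only:

```xquery
xquery version "1.0";

(: Standalone sketch: the sample sentence from A.xml is inlined as a string,
   so no files are needed. :)
let $text := " The state passed its first ban on using a handheld cellphone while driving in 2004 Nokia Vodafone Nokia Growth Recession Creicket HBO"

(: tokenize() on '\W+' yields a leading "" because the text starts with a
   space; the [.] predicate drops it, since the effective boolean value of ""
   is false. Digits count as word characters, so "2004" is kept. :)
let $words := tokenize($text, '\W+')[.]

(: index-of() returns every position at which a word occurs, so count()
   over it gives the number of occurrences (0 when there is no match). :)
let $nokia := count(index-of($words, 'Nokia'))
let $gdp   := count(index-of($words, 'GDP'))

return
  <counts words="{count($words)}" nokia="{$nokia}" gdp="{$gdp}"/>
```

On the sample sentence this reports 22 words, two Nokia hits and no GDP hits, which is why GDP produces no <topic> element in the result above.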

Answer 2

This was a fun one, and I learned a few things along the way. Thanks!

Note: to get the results you wanted, I fixed a typo in A.xml ("Creicket" -> "Cricket").

The solution below uses two MarkLogic-specific functions:

cts:highlight (for replacing matching text with nodes which you can then count)
cts:tokenize (for breaking up a given string into word, space, and punctuation parts)

It also involves some powerful magic specific to these two functions:

the dynamic binding of the special variable $cts:text (which isn't really necessary for this particular use case, but I digress), and
the data model extension which adds these subtypes of xs:string:
- cts:word,
- cts:space, and
- cts:punctuation.
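To see the two functions in isolation, here is a small MarkLogic-only sketch. The sample element is made up for illustration, and the query is passed as a plain string, which MarkLogic treats as a word query, just as local:find-matches does below:

```xquery
xquery version "1.0-ml";

(: Made-up sample element, for illustration only. :)
let $node := <content>Nokia, Vodafone Nokia.</content>

(: cts:highlight returns a copy of $node in which each match of the query is
   replaced by the third argument; $cts:text is bound to the matched text. :)
let $highlighted := cts:highlight($node, "Nokia", <MATCH>{$cts:text}</MATCH>)

(: cts:tokenize yields cts:word, cts:space and cts:punctuation items, all
   subtypes of xs:string; keep just the words. :)
let $words := cts:tokenize(string($node))[. instance of cts:word]

return (
  $highlighted,                 (: the copy with <MATCH> wrappers :)
  count($highlighted//MATCH),   (: number of Nokia matches :)
  count($words)                 (: number of word tokens :)
)
```

For this sample the two counts should come out as 2 and 3, since the comma, the spaces and the full stop are not cts:word tokens.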

Enjoy!

xquery version "1.0-ml";

(: Generic function using MarkLogic's ability to find query matches within a single node :)
declare function local:find-matches($content, $search-text) {
  cts:highlight($content, $search-text, <MATCH>{$cts:text}</MATCH>)
  //MATCH
};

(: Generic function using MarkLogic's ability to tokenize text into words, punctuation, and spaces :)
declare function local:get-words($text) {
  cts:tokenize($text)[. instance of cts:word]
};

(: The rest of this is pure XQuery :)
let $content := doc("A.xml")/root/content,
    $lookup  := doc("B.xml")/WordLookUp
return
  <root>
    {$content}
    <updatedElement>

      <companies>{
        for $company in $lookup/companies/company
        let $results := local:find-matches($content, string($company))
        where exists($results)
        return
          <company count="{count($results)}">{string($company/@name)}</company>
      }</companies>

      <mood>{
        sum(
          for $mood in $lookup/moods/mood
          let $results := local:find-matches($content, string($mood))
          return count($results) * $mood/@number
        )
      }</mood>

      <topics>{
        for $topic in $lookup/topics/topic
        let $results := local:find-matches($content, string($topic))
        where exists($results)
        return
          <topic count="{count($results)}">{string($topic/@group)}</topic>
      }</topics>

      <word-count>{
        count(local:get-words($content))
      }</word-count>

    </updatedElement>
  </root>

Let me know if you have any follow-up questions about how all of the above works. At first I was inclined to use cts:search or cts:contains, which are the bread and butter of search in MarkLogic. But I realized that this example wasn't so much about search (finding documents) as about looking up matching text within an already-given document. If you needed to extend this to aggregate across a large number of documents, you'd want to look into adding cts:search or cts:contains.
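A hedged sketch of that extension follows. It assumes the documents to aggregate over are all in the database and that B.xml is stored at the URI "B.xml" (both assumptions). cts:search with cts:word-query finds the matching documents, and xdmp:estimate counts them from the indexes:

```xquery
xquery version "1.0-ml";

(: Hypothetical extension: for each company in B.xml, count how many
   documents in the database mention it. Assumes B.xml is at "B.xml". :)
let $lookup := doc("B.xml")/WordLookUp
for $company in $lookup/companies/company
let $query := cts:word-query(string($company))
return
  (: xdmp:estimate resolves the count from the indexes without filtering,
     so it can differ from count(cts:search(...)) for some queries :)
  <company name="{$company/@name}"
           documents="{xdmp:estimate(cts:search(fn:doc(), $query))}"/>
```

Note that B.xml itself mentions every company, so it is included in these counts unless you restrict the searchable expression, for example to a collection that holds only the content documents.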

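One last caveat: if you think your content might already contain <MATCH> elements, you'll want to use a different element name when calling cts:highlight() (one that you can guarantee won't conflict with your content). Otherwise you could get wrong results (counts higher than the accurate number).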
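One way to get such a guaranteed-unique name is to put the marker element in a namespace of your own. In the sketch below, the hl prefix and the urn:example:highlight-marker URI are made-up placeholders:

```xquery
xquery version "1.0-ml";

(: Hypothetical namespace for the highlight marker; any URI you control works. :)
declare namespace hl = "urn:example:highlight-marker";

(: Same approach as local:find-matches above, but the marker element lives in
   its own namespace, so pre-existing <MATCH> elements are not counted. :)
declare function local:find-matches($content, $search-text) {
  cts:highlight($content, $search-text, <hl:MATCH>{$cts:text}</hl:MATCH>)
  //hl:MATCH
};

(: Made-up sample content that already contains a <MATCH> element. :)
let $content := <content>Nokia <MATCH>Nokia</MATCH> Vodafone</content>
return count(local:find-matches($content, "Nokia"))
```

Only the hl:MATCH wrappers added by cts:highlight are selected, so the pre-existing <MATCH> element does not inflate the count the way it would with the plain MATCH name.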

ADDENDUM:

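Given cts:tokenize, I was curious whether this could be done without cts:highlight at all, since the text is already broken up into all the words you're talking about. This alternative implementation of local:find-matches yields the same results (if you switch the order of the function declarations, since one depends on the other):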

(: Find word matches by comparing them one-by-one :)
declare function local:find-matches($content, $search-text) {
  local:get-words($content)[cts:stem(.) = cts:stem($search-text)]
};

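It uses cts:stem to normalize each word to its stem, so that, for example, searching for "pass" will match "passed". However, this still won't work for multi-word (phrase) searches. So, to be safe, I'd stick with cts:highlight, which, like cts:search and cts:contains, can handle any query you give it.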
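A quick MarkLogic sketch of the stem comparison on its own, assuming English ("en") stemming; the sample words are taken from A.xml:

```xquery
xquery version "1.0-ml";

(: Keep only the cts:word tokens, then compare stems rather than surface
   forms, so "passed" in the text is matched by a search for "pass".
   Each token is compared on its own, which is why a multi-word phrase
   cannot match this way. :)
let $words := cts:tokenize("The state passed its first ban")[. instance of cts:word]
return $words[cts:stem(., "en") = cts:stem("pass", "en")]
```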

Answer 3

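How about stepping back and asking whether your data or documents could be better marked up for use in a document-oriented database, rather than stored as a flat string of text?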
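To illustrate the idea, here is a hypothetical marked-up version of A.xml's company mentions. The <company> tags inside the content are an assumption about how the data could be modelled, not something from the question. Once the entities are tagged, the company counts need only standard path expressions:

```xquery
xquery version "1.0";

(: Hypothetical markup: company names are tagged when the document is created. :)
let $content :=
  <content>The state passed its first ban on using a handheld cellphone while driving in 2004
    <company>Nokia</company> <company>Vodafone</company> <company>Nokia</company></content>
return
  <companies>
    {(: one <company> per distinct name, counted by simple path navigation :)
     for $name in distinct-values($content/company)
     return <company count="{count($content/company[. = $name])}">{$name}</company>}
  </companies>
```

With this modelling there is no tokenizing or text matching at query time, and in MarkLogic such elements can also be indexed and searched directly.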
