Replace only within an XML tag; exporting from Referencer .reflib to BibTeX format with filenames intact and URL-encoding removed, with a bash command

I have many references in Referencer. I'm trying to include filenames in my BibTeX file when exporting from Referencer. Since the software doesn't do this by default, I'm trying to use a sed command to add the filename as a BibTeX field in the XML file before I export, so that the filename ends up in the export.

Input

  <doc>
<filename>file:///home/dwickrama/Desktop/stevenJonesLab/papers/Transcription%20Factor%20Binding/A%20Common%20Nuclear%20Signal%20Transduction%20Pathway%20Activated%20by%20Growth%20Factor%20and%20Cytokine.pdf</filename>
<relative_filename>A%20Common%20Nuclear%20Signal%20Transduction%20Pathway%20Activated%20by%20Growth%20Factor%20and%20Cytokine.pdf</relative_filename>
<key>Sadowski93</key>
<notes></notes>
<bib_type>article</bib_type>
<bib_doi></bib_doi>
<bib_title>A common nuclear signal transduction pathway activated by growth factor and cytokine receptors.</bib_title>
<bib_authors>Sadowski, H B and Shuai, K and Darnell, J E and Gilman, M Z</bib_authors>
<bib_journal>Science</bib_journal>
<bib_volume>261</bib_volume>
<bib_number>5129</bib_number>
<bib_pages>1739-44</bib_pages>
<bib_year>1993</bib_year>
<bib_extra key="pmid">8397445</bib_extra>

Output

  <doc>
<filename>file:///home/dwickrama/Desktop/stevenJonesLab/papers/Transcription%20Factor%20Binding/A%20Common%20Nuclear%20Signal%20Transduction%20Pathway%20Activated%20by%20Growth%20Factor%20and%20Cytokine.pdf</filename>
<bib_extra key="File">article:../Transcription\ Factor\ Binding/A\ Common\ Nuclear\ Signal\ Transduction\ Pathway\ Activated\ by\ Growth\ Factor\ and\ Cytokine.pdf:pdf</bib_extra>
<relative_filename>A%20Common%20Nuclear%20Signal%20Transduction%20Pathway%20Activated%20by%20Growth%20Factor%20and%20Cytokine.pdf</relative_filename>
<key>Sadowski93</key>
<notes></notes>
<bib_type>article</bib_type>
<bib_doi></bib_doi>
<bib_title>A common nuclear signal transduction pathway activated by growth factor and cytokine receptors.</bib_title>
<bib_authors>Sadowski, H B and Shuai, K and Darnell, J E and Gilman, M Z</bib_authors>
<bib_journal>Science</bib_journal>
<bib_volume>261</bib_volume>
<bib_number>5129</bib_number>
<bib_pages>1739-44</bib_pages>
<bib_year>1993</bib_year>
<bib_extra key="pmid">8397445</bib_extra>

I can use the following sed command to partially do what I want, but the URL encoding "%20" remains. How do I get rid of it only in the new bib_extra tag?

sed -e 's/\(    <filename>file:\/\/\/home\/dwickrama\/Desktop\/stevenJonesLab\/papers\)\([^.]*\)\(.\?\)\(.*\)\(<\/filename>\)/\1\2\3\4\5\
    <bib_extra key="File">article:..\2\3\4:\4<\/bib_extra>/g' NewPapers.reflib > NewPapers.new.reflib
Best answer

Regex and sed are not very good tools for processing XML or for URL-decoding.

A quick script in a more complete scripting language can do it more clearly and reliably. For example, in Python:

# Python 3
from urllib.parse import unquote, urlparse
from xml.dom import minidom

doc = minidom.parse('NewPapers.reflib')
el = doc.getElementsByTagName('filename')[0]
# Path component of the file:// URL, still percent-encoded
path = urlparse(el.firstChild.data).path
# URL-decode the last two components: containing folder and file name
foldername, filename = map(unquote, path.split('/')[-2:])

extra = doc.createElement('bib_extra')
extra.setAttribute('key', 'File')
extra.appendChild(doc.createTextNode('article:../%s/%s:pdf' % (foldername, filename)))
# Insert the new element right after <filename>
el.parentNode.insertBefore(extra, el.nextSibling)
with open('NewPapers.new.reflib', 'w', encoding='utf-8') as out:
    doc.writexml(out)

(I haven't included a function to reproduce the backslash-escaping in the given example output, as it's not clear exactly what format that is. The simplest approach would be filename = filename.replace(' ', '\\ '), but I'm not sure that would be correct.)
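
Since the question mentions many references while the snippet above only touches the first <filename> element, here is a rough sketch of how the same idea might be applied to every entry. The helper names file_field and add_file_fields, the escape_spaces switch, and the sample XML (including its <library> root) are made up for illustration; whether spaces should really become "\ " is still an assumption, as noted above.

```python
from urllib.parse import unquote, urlparse
from xml.dom import minidom


def file_field(url, escape_spaces=False):
    """Build the File field text from a file:// URL (hypothetical helper)."""
    path = urlparse(url).path
    # URL-decode the last two path components: folder and file name.
    folder, name = (unquote(part) for part in path.split('/')[-2:])
    if escape_spaces:
        # Assumption: the target format wants a backslash before each space.
        folder = folder.replace(' ', '\\ ')
        name = name.replace(' ', '\\ ')
    return 'article:../%s/%s:pdf' % (folder, name)


def add_file_fields(doc, escape_spaces=False):
    """Insert <bib_extra key="File"> after every non-empty <filename>."""
    count = 0
    for el in doc.getElementsByTagName('filename'):
        if el.firstChild is None:
            # <filename></filename>: no file attached, nothing to add.
            continue
        extra = doc.createElement('bib_extra')
        extra.setAttribute('key', 'File')
        text = file_field(el.firstChild.data, escape_spaces)
        extra.appendChild(doc.createTextNode(text))
        el.parentNode.insertBefore(extra, el.nextSibling)
        count += 1
    return count


# Made-up sample standing in for a parsed .reflib file.
sample = minidom.parseString(
    '<library><doc>'
    '<filename>file:///home/user/papers/Some%20Folder/A%20Paper.pdf</filename>'
    '<key>Example01</key>'
    '</doc></library>'
)
add_file_fields(sample, escape_spaces=True)
print(sample.documentElement.toxml())
```

For a real file you would presumably call minidom.parse('NewPapers.reflib') instead of parseString and write the result back out with writexml, as in the snippet above.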

Other answers

All you need is to add a line after <filename>, right? So just print it out after <filename> is matched.

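#!/bin/bash

# Line to insert after each <filename> line (hard-coded for this one entry).
s='<bib_extra key="File">article:../Transcription\ Factor\ Binding/A\ Common\ Nuclear\ Signal\ Transduction\ Pathway\ Activated\ by\ Growth\ Factor\ and\ Cytokine.pdf:pdf</bib_extra>'

# Pass the string through the environment: awk -v would interpret the
# backslashes in the value as escape sequences.
str="$s" awk '
/<filename>/ {
    print
    print ENVIRON["str"]
    next
}
{ print }' file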
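
The awk script prints one fixed string, so it only fits a single entry. As a rough sketch of the same "print an extra line after <filename>" idea with the value derived from each line, something like the following Python filter might work. It assumes every <filename> element sits on a line of its own, as in the sample input; the name add_file_lines and the sample lines are made up for illustration, and no backslash-escaping of spaces is attempted.

```python
import re
from urllib.parse import unquote
from xml.sax.saxutils import escape

# One whole <filename>file://...</filename> line; group 1 keeps the indentation.
FILENAME_LINE = re.compile(r'^(\s*)<filename>file://(.*)</filename>\s*$')


def add_file_lines(lines):
    """Yield every input line, plus a File line after each <filename> line."""
    for line in lines:
        yield line
        match = FILENAME_LINE.match(line)
        if match:
            indent, path = match.groups()
            # URL-decode folder and file name, then re-escape &, < and >
            # so the new line is still valid XML text.
            folder, name = (escape(unquote(p)) for p in path.split('/')[-2:])
            yield '%s<bib_extra key="File">article:../%s/%s:pdf</bib_extra>\n' % (
                indent, folder, name)


# Made-up sample lines standing in for the .reflib file.
sample = [
    '<doc>\n',
    '<filename>file:///home/user/papers/Some%20Folder/A%20Paper.pdf</filename>\n',
    '<key>Example01</key>\n',
]
print(''.join(add_file_lines(sample)), end='')
```

Being line-based, this shares the fragility of the sed and awk approaches: it stops working if the file's layout changes, which a DOM parser would not care about.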



