Using awk printf to urldecode text

I'm using awk to urldecode some text.

If I code the string into the printf statement, like printf "%s", "\x3D", it correctly outputs =. The same happens if I have the whole escaped string in a variable.

However, if I only have the 3D, how can I prepend the \x so that printf prints = and not \x3D?

I'm using busybox awk 1.4.2 and the ash shell.

Best answer

Since you're using ash and Perl isn't available, I'm assuming that you may not have gawk.

For me, using gawk or busybox awk, your second example works the same as the first (I get "=" from both) unless I use the --posix option (in which case I get "x3D" for both).

If I use --non-decimal-data or --traditional with gawk I get "=".

What version of AWK are you using (awk, nawk, gawk, busybox - and version number)?

Edit:

You can coerce the variable's string value into a numeric one by adding zero:

~/busybox/awk 'BEGIN { string="3D"; pre="0x"; hex=pre string; printf "%c", hex+0}'
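
Where an awk does not read the string "0x3D" as hexadecimal when zero is added, the value can be computed digit by digit instead. The following is a minimal sketch that assumes only POSIX awk functions; hex_to_char and hex2dec are hypothetical names chosen for illustration:

```shell
# hex_to_char and hex2dec are illustrative names, not part of awk or busybox.
hex_to_char() {
    awk -v h="$1" '
    # Convert a hex string to a number without relying on "0x" coercion.
    function hex2dec(s,    i, d, n) {
        s = tolower(s)
        n = 0
        for (i = 1; i <= length(s); i++) {
            d = index("0123456789abcdef", substr(s, i, 1))
            if (d == 0) return -1      # not a hex digit
            n = n * 16 + (d - 1)
        }
        return n
    }
    BEGIN { printf "%c\n", hex2dec(h) }'
}

hex_to_char 3D
```

Called with 3D this prints =, because the helper returns the number 61 and printf "%c" with a numeric argument prints the character with that code.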

Other answers

I don't know how you do this in awk, but it's trivial in perl:

echo "http://example.com/?q=foo%3Dbar" |
    perl -pe 's/\+/ /g; s/%([0-9a-f]{2})/chr(hex($1))/eig'

GNU awk

#!/usr/bin/awk -fn
@include "ord"
BEGIN {
  RS = "%.."
}
{
  printf RT ? $0 chr("0x" substr(RT, 2)) : $0
}

Or

#!/bin/sh
awk -niord '{printf RT?$0chr("0x"substr(RT,2)):$0}' RS=%..

Decoding URL encoding (percent encoding)

This relies on GNU awk's extension of the split function, but it works:

gawk '{ numElems = split($0, arr, /%../, seps);
        outStr = ""
        for (i = 1; i <= numElems - 1; i++) {
            outStr = outStr arr[i]
            outStr = outStr sprintf("%c", strtonum("0x" substr(seps[i],2)))
        }
        outStr = outStr arr[i]
        print outStr
      }'

To start with, I'm aware this is an old question, but none of the answers worked for me (I'm restricted to busybox awk).

Two options. To parse stdin:

awk '{for (y=0;y<127;y++) if (y!=37) gsub(sprintf("%%%02x|%%%02X",y,y), y==38 ? "\\&" : sprintf("%c", y));gsub(/%25/, "%");print}'

To take a command line parameter:

awk 'BEGIN {for (y=0;y<127;y++) if (y!=37) gsub(sprintf("%%%02x|%%%02X",y,y), y==38 ? "\\&" : sprintf("%c", y), ARGV[1]);gsub(/%25/, "%", ARGV[1]);print ARGV[1]}' parameter

%25 has to be done last, because otherwise strings like %253D get double-parsed, which shouldn't happen.

The inline check for y==38 is there because gsub treats & in the replacement string as a special character (it stands for the matched text) unless you backslash it.
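
The effect of & in a gsub replacement can be seen in isolation. This is a small sketch for illustration only; amp_plain and amp_escaped are hypothetical function names:

```shell
# Both function names are illustrative only.
amp_plain() {    # "&" in the replacement stands for the matched text
    printf '%s\n' "$1" | awk '{ gsub(/%26/, "&"); print }'
}
amp_escaped() {  # "\\&" in awk source yields a literal ampersand
    printf '%s\n' "$1" | awk '{ gsub(/%26/, "\\&"); print }'
}

amp_plain 'a%26b'
amp_escaped 'a%26b'
```

The first call leaves the input unchanged as a%26b, since each match is replaced by itself. The second prints a&b.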

This one is the fastest of them all by a large margin and it doesn't need gawk:

#!/usr/bin/mawk -f

function decode_url(url,            dec, tmp, pre, mid, rep) {
    tmp = url
    while (match(tmp, /%[0-9a-fA-F][0-9a-fA-F]/)) {
        pre = substr(tmp, 1, RSTART - 1)
        mid = substr(tmp, RSTART + 1, RLENGTH - 1)
        rep = sprintf("%c", ("0x" mid) + 0)
        dec = dec pre rep
        tmp = substr(tmp, RSTART + RLENGTH)
    }
    return dec tmp
}

{
    print decode_url($0)
}

Save it as decode_url.awk and use it like you normally would. E.g:

$ ./decode_url.awk <<< 'Hello%2C%20world%20%21'
Hello, world !

But if you want an even faster version:

#!/usr/bin/mawk -f

function gen_url_decode_array(      i, n, c) {
    delete decodeArray
    for (i = 32; i < 64; ++i) {
        c = sprintf("%c", i)
        n = sprintf("%%%02X", i)
        decodeArray[n] = c
        decodeArray[tolower(n)] = c
    }
}

function decode_url(url,            dec, tmp, pre, mid, rep) {
    tmp = url
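    while (match(tmp, /%[0-9a-fA-F][0-9a-fA-F]/)) {
        pre = substr(tmp, 1, RSTART - 1)
        mid = substr(tmp, RSTART, RLENGTH)
        # Codes outside the table (only 0x20..0x3F are generated) are kept as-is
        rep = (mid in decodeArray) ? decodeArray[mid] : mid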
        dec = dec pre rep
        tmp = substr(tmp, RSTART + RLENGTH)
    }
    return dec tmp
}

BEGIN {
    gen_url_decode_array()
}

{
    print decode_url($0)
}

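Interpreters other than mawk should be able to run them, though the first version depends on the string "0x.." being converted to a number as hexadecimal, which gawk only does with --non-decimal-data.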
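
For interpreters that do not convert "0x.." strings, the same loop can look the hex digits up by position instead. This is a sketch under that assumption rather than a benchmarked replacement; urldecode and hexval are hypothetical names:

```shell
# urldecode is a hypothetical wrapper name used only for this sketch.
urldecode() {
    printf '%s\n' "$1" | awk '
    # Value of one hex digit (0..15), looked up by position.
    function hexval(c) { return index("0123456789abcdef", tolower(c)) - 1 }
    {
        rest = $0
        out = ""
        while (match(rest, /%[0-9a-fA-F][0-9a-fA-F]/)) {
            code = hexval(substr(rest, RSTART + 1, 1)) * 16 + hexval(substr(rest, RSTART + 2, 1))
            out = out substr(rest, 1, RSTART - 1) sprintf("%c", code)
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }'
}

urldecode 'Hello%2C%20world%20%21'
```

Each sequence is decoded once in a single left-to-right pass, so %253D comes out as %3D rather than =. How "%c" renders codes above 0x7F may depend on the awk implementation and the locale.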




