Join multiple files

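I am using the standard join command to join two sorted files on column 1. The command is simply join file1 file2 > output_file.

But how do I join three or more files using the same technique? join file1 file2 file3 > output_file gives me an empty file. I think sed could help me, but I am not sure how.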

Best answer

man join:

NAME
       join - join lines of two files on a common field

SYNOPSIS
       join [OPTION]... FILE1 FILE2

It only works with two files.

If you need to join three, you can join the first two and then join the result with the third.

Try:

join file1 file2 | join - file3 > output

This joins three files without creating an intermediate temporary file. The - tells the second join command to read its first input from stdin.
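As a quick illustration of the pipeline above, here is a minimal sketch run on made-up sample data. The file contents and the temporary directory are assumptions for the demo, not part of the question. Because join is an inner join by default, a key that is missing from any one file drops out of the final result.

```shell
# Hypothetical sample data in a scratch directory (contents are made up)
dir=$(mktemp -d)
printf 'a 1\nb 2\nc 3\n' > "$dir/file1"
printf 'a x\nb y\nc z\n' > "$dir/file2"
printf 'a P\nc R\n'      > "$dir/file3"   # note: no line for key "b"

# Join the first two files, then join that result (read from stdin via "-")
# with the third file
result=$(join "$dir/file1" "$dir/file2" | join - "$dir/file3")
printf '%s\n' "$result"
# prints:
# a 1 x P
# c 3 z R

rm -rf "$dir"
```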

Other answers

You can join multiple files (N>=2) by recursively building a pipeline of join commands:

#!/bin/sh

# multijoin - join multiple files

join_rec() {
    if [ $# -eq 1 ]; then
        join - "$1"
    else
        f=$1; shift
        join - "$f" | join_rec "$@"
    fi
}

if [ $# -le 2 ]; then
    join "$@"
else
    f1=$1; f2=$2; shift 2
    join "$f1" "$f2" | join_rec "$@"
fi

I know this is an old question, but for future reference: if the files you want to combine follow a naming pattern like the one in the question (file1 file2 file3 ... fileN), you can simply concatenate them with this command:

cat file* > output

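The output is the contents of the files concatenated in alphabetical order of their names. Note that cat only appends the files one after another; it does not match lines on a common field the way join does.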

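I created a function for this. The first argument is the output file; the remaining arguments are the files to join.

function multijoin() {
    out=$1
    shift 1
    # Seed the output with the keys (column 1) of the first input file
    awk '{print $1}' "$1" > "$out"
    # Join each input file onto the accumulated result
    for f in "$@"; do join "$out" "$f" > tmp; mv tmp "$out"; done
}

Usage: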

multijoin output_file file*
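The function above writes its intermediate result to a fixed file named tmp in the current directory, which would overwrite an existing file of that name. A possible variant, sketched here under the assumption that mktemp is available, uses a private temporary file instead. The name multijoin_safe and the sample files are hypothetical.

```shell
# multijoin_safe is a hypothetical name. Usage: multijoin_safe OUTPUT FILE...
multijoin_safe() {
    out=$1
    shift
    tmp=$(mktemp) || return 1
    # Seed the output with the keys (column 1) of the first input file
    awk '{print $1}' "$1" > "$out"
    for f in "$@"; do
        # Join the accumulated result with the next file; stop on failure
        join "$out" "$f" > "$tmp" || { rm -f "$tmp"; return 1; }
        cat "$tmp" > "$out"
    done
    rm -f "$tmp"
}

# Demo on made-up data
d=$(mktemp -d)
printf 'a 1\nb 2\n' > "$d/f1"
printf 'a 3\nb 4\n' > "$d/f2"
printf 'a 5\nb 6\n' > "$d/f3"
multijoin_safe "$d/out" "$d/f1" "$d/f2" "$d/f3"
joined=$(cat "$d/out")
printf '%s\n' "$joined"
# prints:
# a 1 3 5
# b 2 4 6
rm -rf "$d"
```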

Although this is an old question, this is how you can do it with a single awk:

awk -v j=<field_number> '{key=$j; $j=""}  # get key and delete field j
                         (NR==FNR){order[FNR]=key;} # store the key-order
                         {entry[key]=entry[key] OFS $0 } # update key-entry
                         END { for(i=1;i<=FNR;++i) {
                                  key=order[i]; print key entry[key] # print
                               }
                         }' file1 ... filen

This script assumes:

  • all files have the same number of lines
  • the output follows the line order of the first file
  • the files do not need to be sorted on field <field_number>
  • <field_number> is a valid integer
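To make those assumptions concrete, here is a small sketch that runs the same awk program with j=1 on two made-up files whose lines are in different orders. The file names and contents are hypothetical. Since emptying the key field leaves its separator behind, the raw output contains doubled spaces; the demo squeezes them with tr -s for display.

```shell
d=$(mktemp -d)
printf 'k1 a\nk2 b\n' > "$d/file1"
printf 'k2 B\nk1 A\n' > "$d/file2"   # deliberately not in the same order

merged=$(awk -v j=1 '
    { key = $j; $j = "" }                  # get key and delete field j
    NR == FNR { order[FNR] = key }         # remember key order of the first file
    { entry[key] = entry[key] OFS $0 }     # append the remaining fields to the key
    END { for (i = 1; i <= FNR; ++i) { key = order[i]; print key entry[key] } }
' "$d/file1" "$d/file2")

# Squeeze the doubled separators for readability
printf '%s\n' "$merged" | tr -s ' '
# prints:
# k1 a A
# k2 b B

rm -rf "$d"
```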

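The man page of join says it only works on two files, so you need to create an intermediate file and delete it afterwards, i.e.:

join file1 file2 > temp
join temp file3 > output
rm temp

join joins the lines of two files on a common field. If you want to join more, do it pairwise: first join two files, then join the result with the third file, and so on.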

Suppose you have four files A.txt, B.txt, C.txt and D.txt, with the following contents:

~$ cat A.txt
x1 2
x2 3
x4 5
x5 8

~$ cat B.txt
x1 5
x2 7
x3 4
x4 6

~$ cat C.txt
x2 1
x3 1
x4 1
x5 1

~$ cat D.txt
x1 1

Join the files with:

firstOutput='0,1.2'; secondOutput='2.2'
myoutput="$firstOutput,$secondOutput"; outputCount=3
# Full outer join of the first two files, filling missing fields with 0
join -a 1 -a 2 -e 0 -o "$myoutput" A.txt B.txt > tmp.tmp
for f in C.txt D.txt; do
    # Keep every column accumulated so far, then append column 2 of the next file
    firstOutput="$firstOutput,1.$outputCount"
    myoutput="$firstOutput,$secondOutput"
    join -a 1 -a 2 -e 0 -o "$myoutput" tmp.tmp "$f" > tempf
    mv tempf tmp.tmp
    outputCount=$((outputCount + 1))
done
mv tmp.tmp files_join.txt

Result:

~$ cat files_join.txt 
x1 2 5 0 1
x2 3 7 1 0
x3 0 4 1 0
x4 5 6 1 0
x5 8 0 1 0
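To see what the join options used above contribute, the following sketch compares a plain join with the outer-join form on just A.txt and B.txt. The data is copied from the listing above; the temporary directory and variable names are assumptions for the demo.

```shell
d=$(mktemp -d)
printf 'x1 2\nx2 3\nx4 5\nx5 8\n' > "$d/A.txt"
printf 'x1 5\nx2 7\nx3 4\nx4 6\n' > "$d/B.txt"

# Plain join: only keys present in both files survive
inner=$(join "$d/A.txt" "$d/B.txt")

# -a 1 -a 2  also print unpairable lines from file 1 and from file 2
# -e 0       substitute 0 for fields that are missing
# -o LIST    output format: 0 is the join field, 1.2 is field 2 of file 1,
#            2.2 is field 2 of file 2
outer=$(join -a 1 -a 2 -e 0 -o '0,1.2,2.2' "$d/A.txt" "$d/B.txt")

printf 'inner:\n%s\nouter:\n%s\n' "$inner" "$outer"
# prints:
# inner:
# x1 2 5
# x2 3 7
# x4 5 6
# outer:
# x1 2 5
# x2 3 7
# x3 0 4
# x4 5 6
# x5 8 0

rm -rf "$d"
```

The -o list is what keeps the columns aligned: without it, unpaired lines would simply be printed as they are, so a missing value would shift the later columns left.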



