Randomize txt file in Linux but guarantee no repetition of lines

I have a file called test.txt containing:

Line 1
Line 2
Line 3
Line 3
Line 3
Line 4
Line 8

I need some code which will randomize these lines BUT GUARANTEE that the same text cannot appear on consecutive lines, i.e. "Line 3" must be split up and not appear twice or even three times in a row.

I have seen many variations of this question on here, but so far none of them guarantee no repetition.

So far I have tried the following:

shuf test.txt

awk 'BEGIN{srand()}{print rand(), $0}' test.txt | sort -n -k 1 | awk 'sub(/\S+ /,"")'

awk 'BEGIN {srand()} {print rand(), $0}' test.txt | sort -n | cut -d ' ' -f2-

cat test.txt | while IFS= read -r f; do printf "%05d %s\n" "$RANDOM" "$f"; done | sort -n | cut -c7-

perl -e 'print rand()," $_" for <>;' test.txt | sort -n | cut -d ' ' -f2-

perl -MList::Util -e 'print List::Util::shuffle <>' test.txt

All of these randomize the lines in the file, but they often end up with the same line appearing consecutively.
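As a quick sanity check (a hypothetical helper, not part of the question), the following confirms whether a given shuffle produced consecutive duplicates:

```python
def has_consecutive_dupes(lines):
    """Return True when any line equals the line immediately before it."""
    return any(a == b for a, b in zip(lines, lines[1:]))

# The question's sample, shuffled badly vs. acceptably:
print(has_consecutive_dupes(["Line 1", "Line 3", "Line 3"]))  # True
print(has_consecutive_dupes(["Line 3", "Line 1", "Line 3"]))  # False
```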

Is there any way I can do this?

Here is the data beforehand. Note account number 82576483 on consecutive lines:

REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>83476098</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>21.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>**82576483**</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441754</ORD-AUTH-C><ORD-AUTH-V>94.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5759148</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>**82576483**</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>21.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>**82576483**</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>21.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>**82576483**</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>21.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>82576786</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>24.79</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>82576324</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441754</ORD-AUTH-C><ORD-AUTH-V>98.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5759148</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>82576113</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>28.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>82590483</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>25.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>82576883</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>17.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>83476483</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>21.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>

NOTE: asterisks added to highlight the lines of interest; the asterisks do not exist in the data file.

This is what I need to happen, where number 82576483 is spread out across the file rather than sitting on consecutive lines:

REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>83476098</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>21.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>**82576483**</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441754</ORD-AUTH-C><ORD-AUTH-V>94.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5759148</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>82576786</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>24.79</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>**82576483**</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>21.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>82576324</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441754</ORD-AUTH-C><ORD-AUTH-V>98.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5759148</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>**82576483**</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>21.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>82576113</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>28.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>82590483</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>25.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>82576883</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>17.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>83476483</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>21.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>**82576483**</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>21.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
Answers

An efficient approach, at least compared with repeatedly trying random shuffles:

  1. Shuffle all the unique strings.
  2. For each duplicate:
    1. Identify the places in which it could be placed.
    2. Pick one at random.
    3. Insert the duplicate there.
use strict;
use warnings;

use List::Util qw( shuffle );

my %counts; ++$counts{ $_ } while <>;

my @strings = shuffle keys %counts;

for my $string ( keys( %counts ) ) {
   my $count = $counts{ $string };
   for ( 2 .. $count ) {
      my @safe =
         grep { $_ == 0        || $strings[ $_ - 1 ] ne $string }
         grep { $_ == @strings || $strings[ $_ - 0 ] ne $string }
         0 .. @strings;

      my $pick = @safe ? $safe[ rand( @safe ) ] : rand( @strings+1 );

      splice( @strings, $pick, 0, $string );
   }
}

print( @strings );

(Just wrap with perl -e '...' to run from the shell.)

Tested. A better approach might be possible.

Ruby, with some golfing for brevity.

Adapted from https://stackoverflow.com/a/65843200 to fit your data:

ruby -e '
regex = /<CUST-ACNT-N>\d+<\/CUST-ACNT-N>/

arr = readlines.map {|line| {:k => line[regex], :v => line}}
arr = arr.sort_by {|kv| kv[:k]}
mid = arr.size.succ / 2
arr = arr[0..mid-1].zip(arr[mid..-1]).flatten.compact.map {|kv| kv[:v]}
idx = (1..arr.size-1).find { |i| arr[i] == arr[i-1] }

puts idx ? arr.rotate(idx) : arr
' file.txt

The general approach:

  • use an associative array (linecnt[]) to keep count of the number of times a line is seen
  • break linecnt[] into two separate normal arrays: single[1]=<lineX>; single[2]=<lineY> and multi[1]=<lineA_copy1>; multi[2]=<lineA_copy2>; multi[3]=<lineB_copy1>
  • while we have at least one entry in both arrays (single[] / multi[]) intersperse our printing (ie, print random(single[]), print random(multi[]), print random(single[]), print random(multi[])); NOTE: obviously not truly random but this allows us to maximize the chances of separating dupes while limiting cpu overhead (ie, no need to repetitively shuffle hoping for a random ordering that splits dupes)
  • if we have any single[] entries left then print random(single[])
  • if we have any multi[] entries left then print random(multi[]); NOTE: assumes the OP's comment re: "tough!!" means dupes can be printed consecutively if this is all that's left

One awk idea:

$ cat dupes.awk

function print_random(a, acnt,     ndx) {
    ndx=int(1 + rand() * acnt)
    print a[ndx]
    if (acnt>1) { a[ndx]=a[acnt]; delete a[acnt] }
    return --acnt
}

BEGIN { srand() }

      { linecnt[$0]++ }

END   { for (line in linecnt) {
            if (linecnt[line] == 1)
               single[++scnt]=line
            else
               for (i=1; i<=linecnt[line]; i++)
                   multi[++mcnt]=line
            delete linecnt[line]
        }

        while (scnt>0 && mcnt>0) {
              scnt=print_random(single,scnt)
              mcnt=print_random(multi,mcnt)
        }

        while (scnt>0)
              scnt=print_random(single,scnt)

        while (mcnt>0)
              mcnt=print_random(multi,mcnt)
      }

NOTES:

  • srand() isn't truly random (eg, two quick, successive runs can generate the exact same output)
  • additional steps could be added to ensure quick, successive runs don't generate the exact same output (eg, providing an OS-level seed for use in srand())

Running this against the OP's sample set of data:

$ awk -f dupes.awk test.txt
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>83476098</CUST-ACNT-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>**82576483**</CUST-ACNT-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>83476483</CUST-ACNT-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>**82576483**</CUST-ACNT-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>82576883</CUST-ACNT-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>**82576483**</CUST-ACNT-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>82590483</CUST-ACNT-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>**82576483**</CUST-ACNT-N>

REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>82576113</CUST-ACNT-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>82576786</CUST-ACNT-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>82576324</CUST-ACNT-N>

NOTES:

  • data lines cut for brevity
  • blank line added to highlight a) 1st block of interleaved single[] / multi[] entries and b) 2nd block finishing off the rest of the single[] entries
  • repeated runs will generate different results

An example of handling leftover dupes:

$ cat test.txt
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>83476098</CUST-ACNT-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>**82576483**</CUST-ACNT-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>**82576483**</CUST-ACNT-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>**99999999**</CUST-ACNT-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>**99999999**</CUST-ACNT-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>82576786</CUST-ACNT-N>

The result of running the awk script:

$ awk -f dupes.awk test.txt
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>82576786</CUST-ACNT-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>**99999999**</CUST-ACNT-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>83476098</CUST-ACNT-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>**82576483**</CUST-ACNT-N>

REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>**99999999**</CUST-ACNT-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>**82576483**</CUST-ACNT-N>

NOTES:

  • blank line added to highlight a) 1st block of interleaved single[] / multi[] entries and b) 2nd block finishing off the rest of the multi[] entries
  • repeated runs will generate different results

Using any awk:

$ cat tst.awk
match($0,/<CUST-ACNT-N>[^<]+<\/CUST-ACNT-N>/) {
    key = substr($0,RSTART,RLENGTH)
    gsub(/^<CUST-ACNT-N>|<\/CUST-ACNT-N>$/,"",key)
    keys[NR] = key
    lines[NR] = $0
}
END {
    srand()
    maxAttempts = 1000
    while ( (output == "") && (++attempts <= maxAttempts) ) {
        output = distribute()
    }
    printf "%s", output
    if ( output == "" ) {
        print "Error: Failed to distribute the input." | "cat>&2"
        exit 1
    }
}

function distribute(    iters,numLines,maxIters,tmpLines,tmpKeys,idx,i,ret) {
    for ( idx in keys ) {
        tmpKeys[idx] = keys[idx]
        tmpLines[idx] = lines[idx]
        numLines++
    }

    maxIters = 1000
    while ( (numLines > 0) && (++iters <= maxIters) ) {
        idx = int(1+rand()*numLines)

        if ( tmpKeys[idx] != prev ) {
            ret = ret tmpLines[idx] ORS
            prev = tmpKeys[idx]
            for ( i=idx; i<numLines; i++ ) {
                tmpKeys[i] = tmpKeys[i+1]
                tmpLines[i] = tmpLines[i+1]
            }
            numLines--
        }
    }

    if ( numLines ) {
        ret = ""
    }
    return ret
}

$ awk -f tst.awk file
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>83476098</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>21.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>83476483</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>21.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>82576324</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441754</ORD-AUTH-C><ORD-AUTH-V>98.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5759148</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>82590483</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>25.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>**82576483**</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>21.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>82576113</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>28.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>**82576483**</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441754</ORD-AUTH-C><ORD-AUTH-V>94.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5759148</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>82576883</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>17.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>**82576483**</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>21.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>82576786</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>24.79</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>
REC-TYPE-C>CHARGE INVOICE</REC-TYPE-C><CUST-ACNT-N>**82576483**</CUST-ACNT-N><CUST-NAME-T>TEST TEN</CUST-NAME-T><ORD-AUTH-C>0044441552</ORD-AUTH-C><ORD-AUTH-V>21.99</ORD-AUTH-V><OUT-DOCM-D>01/09/2023</OUT-DOCM-D><ORD-N>5758655</ORD-N>

So, within one attempt at producing the output, it makes up to 1,000 tries (maxIters) at randomly finding a next line, from the set of unprocessed lines, whose key differs from the line just added to the output; that can still fail, and so it makes up to 1,000 attempts (maxAttempts) at producing the output. That could still fail; increase those values if you like, but you could still end up with no output at all if the input simply cannot be organized as required (e.g. an input consisting of just two identical lines).

You could make it more efficient and increase its chances of success by changing this code:

        ret = ret tmpLines[idx] ORS
        prev = tmpKeys[idx]
        for ( i=idx; i<numLines; i++ ) {
            tmpKeys[i] = tmpKeys[i+1]
            tmpLines[i] = tmpLines[i+1]
        }
        numLines--

to create/use a second array containing only the keys+lines that are not the same as the key just processed. Then we wouldn't need the (tmpKeys[idx] != prev) test, and we wouldn't run the risk of idx = int(1+rand()*numLines) finding the same key 1,000 times when other candidates are available. That improvement is left as an exercise :-).
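That improvement could be sketched in Python roughly as follows (illustrative only; the distribute function name and the (key, line) pairs are assumptions, not the awk answer's code). At each step the pick is made only among remaining lines whose key differs from the one just emitted, so no rejection loop is needed:

```python
import random

def distribute(pairs):
    """pairs: list of (key, line) tuples. Returns the reordered lines,
    or None when only lines sharing the previous key remain (stuck)."""
    remaining = list(pairs)
    out, prev = [], None
    while remaining:
        # consider only candidates whose key differs from the last key used
        candidates = [i for i, (k, _) in enumerate(remaining) if k != prev]
        if not candidates:
            return None
        i = random.choice(candidates)
        prev, line = remaining.pop(i)
        out.append(line)
    return out
```

This is still greedy, so it can paint itself into a corner and return None; retrying several times, as the awk script does with maxAttempts, still applies.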

Another approach: go through the shuffled lines one at a time, collecting dupes along the way, and at each line check the collected dupes to see whether one can be slipped in. Once the list has been processed, try from the beginning to place the remaining dupes (if any).

use warnings;
use strict;
use feature 'say';

use List::Util qw(shuffle);
use List::MoreUtils qw(firstidx any);

my @lines = <>; 
chomp @lines;

my @res = shift @lines;

my (@dupes, @mask_dupes);
my @shf = shuffle @lines;
say "Shuffled lines (", scalar(@res) + scalar(@shf), "):";
say for @res, @shf; say '-' x 40;

LINE:
for my $line (@shf) {    
    # Redistribute dupes found so far if possible
    DUPE:
    for my $idx (0..$#dupes) { 
        next DUPE if $dupes[$idx] eq $line;
        next DUPE if any { $idx == $_ } @mask_dupes;

        push @res, $line, $dupes[$idx];
        push @mask_dupes, $idx;
        next LINE;
    }   

    if ($line eq $res[-1]) { push @dupes, $line }
    else                   { push @res, $line }
}

# Redistribute remaining (unused) dupes   
my @final;
if (@dupes) {
    while ( my $line = shift @res ) {
        DUPE:
        for my $idx (0..$#dupes) {
            next DUPE if $dupes[$idx] eq $line;
            next DUPE if any { $idx == $_ } @mask_dupes;

            push @final, $line, $dupes[$idx];
            push @mask_dupes, $idx;
            last DUPE;
        }

        # Done redistributing dupes, copy the rest
        push @final, $line
            if not @dupes or $#dupes == $mask_dupes[-1];
    }
}

say "\nFinal (", scalar @final, " items):";
say for @final;

This shuffles the lines, then checks each line against the array of dupes found so far to see whether one of them can be slipped in. It uses an auxiliary mask array to mark the dupes that have already been used.

The downside is that the dupes array has to be searched for every line, so in principle the worst case is O(N^2). However, the dupes list is expected to stay fairly short, and rarely is the whole array searched; so unless the input is enormous I'd expect this to perform quite acceptably.

Tested with a variety of inputs containing many repeated lines, but it needs more testing. (At the least, add basic diagnostic prints, run it repeatedly (each run reshuffles, which helps), and check the output.)
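The stash-and-redistribute idea could be condensed into Python roughly like this (a sketch under the same assumptions, not the Perl answer itself):

```python
import random

def stash_shuffle(lines):
    """Shuffle lines, stashing any line that would repeat the previous
    output line, and slipping stashed duplicates back in where they fit."""
    shuffled = list(lines)
    random.shuffle(shuffled)
    out, stash = [], []
    for line in shuffled:
        if out and line == out[-1]:
            stash.append(line)            # would repeat: set it aside
            continue
        out.append(line)
        for i, d in enumerate(stash):     # a stashed dupe may follow now
            if d != line:
                out.append(stash.pop(i))
                break
    # second pass: slot remaining stashed lines where both neighbours differ
    for d in list(stash):
        for i in range(len(out) + 1):
            if (i == 0 or out[i-1] != d) and (i == len(out) or out[i] != d):
                out.insert(i, d)
                stash.remove(d)
                break
    return out + stash   # anything left over has to sit next to its twin
```

As with the Perl version, a pathological input (more copies of one line than can be separated) ends with the leftovers printed consecutively at the end.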

Using TXR Lisp:

$ txr spread-sort.tl < data
Line 2
Line 4
Line 3
Line 1
Line 3
Line 8
Line 3
$ txr spread-sort.tl < data
Line 4
Line 3
Line 1
Line 3
Line 8
Line 3
Line 2
$ txr spread-sort.tl < data
Line 4
Line 3
Line 8
Line 3
Line 1
Line 3
Line 2

The code:

(set *random-state* (make-random-state))

(let ((dupstack (vec)))
  (labels ((distrib (single)
             (build
               (pend single)
               (each ((i 0..(len dupstack)))
                 (iflet ((item (pop [dupstack i])))
                   (add item)))
               (upd dupstack (remq nil))))
           (distrib-push (dupes)
             (prog1
               (distrib nil)
               (vec-push dupstack dupes))))
    (flow (get-lines)
      sort-group
      shuffle
      (mapcar [iff cdr distrib-push distrib])
      (mapcar distrib)
      tprint)))

This is not a correct algorithm, in the sense that if the input has a high ratio of duplicates for which a correct order nevertheless exists, e.g.:

1
2
2

it will not consistently print the one order, 2 1 2, that separates the two copies.

The main flow of the algorithm is in the flow form. The lines are taken from standard input and passed through sort-group, which sorts and groups duplicates, producing a list of lists. Lines that are not duplicated sit in lists of length 1. We shuffle this list of lists, which means that the duplicates stay together.

We then distribute the duplicates in two passes, which revolve around a dupstack.

In the first pass, we map the singleton lists through distrib, and the duplicate lists through distrib-push, which spreads the duplicates around in the manner described below. After this pass, some items may still be held in the dupstack, so the list does not yet contain all the items. We make another pass, which simply runs every list through distrib, distributing those items from the dupstack.

The dupstack is a vector of lists, where each list holds copies of a duplicated line. For instance, [dupstack 0] might contain ("Line 3" "Line 3").

The way distrib works is this: it walks the dupstack, popping one element from the front of each entry and appending it to the input list, then returns that input list. If we map the lists through this operation, it means that at every list we visit, we add one item from each set of duplicates. After each pass over the stack, we use (upd dupstack (remq nil)) to squash out the lists that have become empty.

distrib-push is used when processing a list with more than one item (duplicates). distrib-push calls distrib with an empty list, which merely collects one of each pending duplicate, if any. Those items are removed from the dupstack and then replaced: the incoming identical items are pushed onto the dupe stack as a new entry.
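For readers unfamiliar with TXR, the dupstack mechanism described above can be approximated in Python (a loose analogue for illustration; the TXR program above is the authoritative version):

```python
import random
from itertools import groupby

def spread(lines):
    # sort-group equivalent: group identical lines into lists
    groups = [list(g) for _, g in groupby(sorted(lines))]
    random.shuffle(groups)
    dupstack = []                      # list of lists of pending duplicates

    def distrib(single):
        out = list(single)
        for entry in dupstack:
            out.append(entry.pop(0))   # take one item from each dupe set
        dupstack[:] = [e for e in dupstack if e]   # squash emptied lists
        return out

    def distrib_push(dupes):
        out = distrib([])              # collect one of each pending dupe...
        dupstack.append(dupes)         # ...then park this group's copies
        return out

    first = [distrib_push(g) if len(g) > 1 else distrib(g) for g in groups]
    second = [distrib(chunk) for chunk in first]   # flush the leftovers
    return [line for chunk in second for line in chunk]
```

Like the TXR original, this is two fixed passes rather than a guaranteed-correct algorithm, so highly duplicate-heavy inputs can still produce adjacent copies.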




