Question

How do I remove duplicate characters and keep only one occurrence of each? For example, my input is:

EFUAHUU
UUUEUUUUH
UJUJHHACDEFUCU

The expected output is:

EFUAH
UEH
UJHACDEF

I tried perl -pe 's/$1//g while /(.).*\1/', which is far from ideal, and it even removes the single occurrence of the character from the output.

Answer 1

You can use positive lookahead:

perl -pe 's/(.)(?=.*?\1)//g' FILE_NAME

The regex used is: (.)(?=.*?\1)

. : to match any char.
first () : remember the matched single char.
(?=...) : positive lookahead
.*? : to match anything in between
\1 : the remembered match.
(.)(?=.*?\1) : match and remember any char only if it appears again later in the string.
s/// : Perl way of doing the substitution.
g : to do the substitution globally... that is, don't stop after the first substitution.
s/(.)(?=.*?\1)//g : this will delete a char from the input string only if that char appears again later in the string.

This will not preserve the order of the chars in the input, because for every unique char in the input string we retain its last occurrence and not the first.
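To see the ordering problem concretely, here is a rough Python equivalent of the same substitution. It is a sketch using Python's re module, which supports the same (.)(?=.*?\1) lookahead with a backreference; the function name is just illustrative and not part of the original answer.

```python
import re

# Same pattern as the Perl one-liner: a character that appears again later
DUP_LATER = re.compile(r"(.)(?=.*?\1)")

def keep_last_occurrences(line: str) -> str:
    # Delete every character that has a later duplicate,
    # so only the *last* copy of each character survives
    return DUP_LATER.sub("", line)

print(keep_last_occurrences("EFUAHUU"))  # EFAHU, not the expected EFUAH
```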

To keep the relative order intact, we can do what KennyTM says in one of the comments:

reverse the input line
do the substitution as before
reverse the result before printing

The one-liner for it:

perl -ne '$_=reverse;s/(.)(?=.*?\1)//g;print scalar reverse;' FILE_NAME

Since we do the print ourselves after the final reversal, we don't use the -p flag but the -n flag.
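The reverse / substitute / reverse-back idea can be sketched in Python like this (Python 3 assumed; the helper name is hypothetical). Reversing first turns each first occurrence into a last occurrence, which is exactly what the lookahead substitution keeps:

```python
import re

# Matches a character that appears again later in the string
DUP_LATER = re.compile(r"(.)(?=.*?\1)")

def keep_first_occurrences(line: str) -> str:
    # Reverse, so the first occurrence becomes the last one...
    reversed_line = line[::-1]
    # ...delete every character that appears again later...
    reversed_line = DUP_LATER.sub("", reversed_line)
    # ...and reverse back to restore the original relative order
    return reversed_line[::-1]

for s in ["EFUAHUU", "UUUEUUUUH", "UJUJHHACDEFUCU"]:
    print(keep_first_occurrences(s))
```

This prints EFUAH, UEH and UJHACDEF, matching the expected output in the question.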

I'm not sure this is the best one-liner for the job. I welcome others to edit this answer if they have a better alternative.

Answer 2

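If Perl is not a must, you can also use awk. Here's a fun benchmark of the Perl one-liners posted here against awk: awk is 10+ seconds faster on a file with 3 million+ lines.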

$ wc -l <file2
3210220

$ time awk 'BEGIN{FS=""}{delete _;for(i=1;i<=NF;i++){if(!_[$i]++) printf $i};print""}' file2 >/dev/null

real    1m1.761s
user    0m58.565s
sys     0m1.568s

$ time perl -n -e '%seen=();' -e 'for (split //) {print unless $seen{$_}++;}' file2 > /dev/null

real    1m32.123s
user    1m23.623s
sys     0m3.450s

$ time perl -ne '$_=reverse;s/(.)(?=.*?\1)//g;print scalar reverse;' file2 >/dev/null

real    1m17.818s
user    1m10.611s
sys     0m2.557s

$ time perl -ne'my%s;print grep!$s{$_}++,split//' file2 >/dev/null

real    1m20.347s
user    1m13.069s
sys     0m2.896s

Answer 3

perl -ne'my%s;print grep!$s{$_}++,split//'

Answer 4

Here's a solution that I think should work faster than the lookahead one; it is not regexp-based and uses a hashtable instead.

perl -n -e '%seen=();' -e 'for (split //) {print unless $seen{$_}++;}'

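It splits every line into characters and prints only the first appearance of each one, by counting appearances inside the %seen hashtable.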
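The same "seen" idea, sketched in Python 3 for comparison (function name hypothetical): a set plays the role of the %seen hash, and a character is emitted only the first time it is encountered.

```python
def first_occurrences(line: str) -> str:
    seen = set()            # plays the role of the %seen hash
    out = []
    for ch in line:         # like split // in Perl
        if ch not in seen:  # like: print unless $seen{$_}++
            seen.add(ch)
            out.append(ch)
    return "".join(out)

print(first_occurrences("UJUJHHACDEFUCU"))
```

Each character is checked once against the set, so the work per line is linear in its length, which is why this approach avoids the rescanning a lookahead regex does.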

Answer 5

Tie::IxHash is a good module for storing hash order (but it may be slow, so you will need to benchmark if speed is important). Example with tests:

use strict;
use warnings;
use Test::More 0.88;

use Tie::IxHash;

# Tie::IxHash keeps its keys in insertion order, so the characters
# come back out in the order they were first seen
sub dedupe {
  my $str = shift;
  my $hash = Tie::IxHash->new(map { $_ => 1 } split //, $str);
  return join('', $hash->Keys);
}

{
my $str = 'EFUAHUU';
is(dedupe($str), 'EFUAH');
}

{
my $str = 'EFUAHHUU';
is(dedupe($str), 'EFUAH');
}

{
my $str = 'UJUJHHACDEFUCU';
is(dedupe($str), 'UJHACDEF');
}

done_testing();
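Python has no Tie::IxHash, but since Python 3.7 the built-in dict preserves insertion order, so it can serve as the ordered hash here. A minimal sketch assuming Python 3.7+, mirroring the tests above (the function name is simply reused for illustration):

```python
def dedupe(s: str) -> str:
    # dict keys are unique and (since Python 3.7) kept in insertion order;
    # adding a key that already exists does not move it, so the first
    # occurrence of each character fixes its position
    return "".join(dict.fromkeys(s))

# Same cases as the Test::More example
assert dedupe("EFUAHUU") == "EFUAH"
assert dedupe("EFUAHHUU") == "EFUAH"
assert dedupe("UJUJHHACDEFUCU") == "UJHACDEF"
```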

Answer 6

Use uniq from List::MoreUtils:

perl -MList::MoreUtils=uniq -ne 'print uniq split ""'

Answer 7

If the set of characters that can be encountered is restricted, e.g. only letters, then the easiest solution will be with tr:
perl -p -e 'tr/a-zA-Z/a-zA-Z/s'
It will replace all the letters by themselves, leaving other characters unaffected, and the /s modifier will squeeze repeated occurrences of the same character (after replacement), thus removing duplicates.

EDIT: Oops, it only removes consecutive appearances. Disregard.

Answer 8

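This looks like a classic application of positive lookbehind, but unfortunately Perl doesn't support it. In fact, doing this (matching the text preceding a character against a full regex of indeterminate length) can, I think, only be done with .NET's regex classes.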

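However, positive lookahead supports full regexes, so all you need to do is reverse the string, apply the positive lookahead (as unicornaddict said):

perl -pe 's/(.)(?=.*?\1)//g'

and reverse it back, because without the reversal it only keeps each duplicate at its last position in the line.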

MASSIVE EDIT

I spent the last half hour on this, and this looks like it works, without the reversing.

perl -pe 's/\G$1//g while (/(.).*(?=\1)/g)' FILE_NAME

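I don't know whether to be proud or horrified. What I basically do is the positive lookahead, then substitute at the position specified by \G, which makes the regex engine start matching from the last matched position (internally represented by the pos() variable).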

The test input:

aabbbcbbccbab

EFAUUUUH

ABCBBBBD

DEEEFEGGH

缩略语

The output:

abc

EFAUH

ABCD

DEFGH

ABC

So, it works...

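Explanation - Okay, in case my last explanation wasn't clear enough: the lookahead will go and stop at the last duplicate of the captured character [in the code you can do a print pos; inside the loop to check], and the s/\G$1//g will remove it [you don't really need the /g]. So within the loop, the substitution will keep removing until all such duplicates are zapped. Of course, this might be a little too processor-intensive for your tastes... but so are most of the regex-based solutions you'll see. The reverse/unreverse method might be more efficient than this one, though.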
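Python's re module has no direct equivalent of this pos()/\G interplay, so the following is only an approximation of the loop, written as a Python 3 sketch with a hypothetical function name. It finds the same (.).*(?=\1) match, takes the match's end position (where pos() would point in Perl, i.e. the last copy of the repeated character) and deletes that single character, repeating until no character has a later duplicate. Because only last copies are ever deleted, the first occurrence of each character survives in place.

```python
import re

# A character, then greedily everything up to just before its LAST repeat
LAST_REPEAT = re.compile(r"(.).*(?=\1)")

def remove_repeats_in_place(line: str) -> str:
    while True:
        m = LAST_REPEAT.search(line)
        if m is None:           # no character has a later duplicate
            return line
        pos = m.end()           # index of the last copy of m.group(1)
        # like s/\G$1// at that position: delete just that one character
        line = line[:pos] + line[pos + 1:]

for s in ["EFAUUUUH", "ABCBBBBD", "DEEEFEGGH"]:
    print(remove_repeats_in_place(s))
```

Each pass rescans from the start of the string, so this is quadratic in the worst case, which matches the author's caveat about the approach being processor-intensive.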

Answer 9

From the shell, this works:

sed -e 's/$/<EOL>/ ; s/./&\n/g' test.txt | uniq | sed -e :a -e '$!N; s/\n//; ta ; s/<EOL>/\n/g'

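In other words: mark the end of every line with an <EOL> string, then put every character on a line of its own, then use uniq to remove duplicate lines, then strip out all the line breaks, then put line breaks back in place of the <EOL> markers.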
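The pipeline can be mimicked step by step in Python 3 to see what it does (the function name is hypothetical, and itertools.groupby stands in for uniq, since both collapse only adjacent equal items):

```python
from itertools import groupby

EOL = "<EOL>"  # same marker string as the sed script

def sed_uniq_pipeline(lines):
    # Step 1: mark the end of every line with the <EOL> marker
    marked = "".join(line + EOL for line in lines)
    # Step 2: one character per "line" (here: a list of characters)
    chars = list(marked)
    # Step 3: like uniq, drop only *adjacent* repeats
    squeezed = [key for key, _group in groupby(chars)]
    # Step 4: join everything back, then turn the markers into line breaks
    return "".join(squeezed).replace(EOL, "\n")

print(sed_uniq_pipeline(["EFUAHUU", "UUUEUUUUH", "UJUJHHACDEFUCU"]), end="")
```

Running it gives EFUAHU, UEUH and UJUJHACDEFUCU, which shows the limitation noted below: only consecutive duplicates are removed.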

I found the -e :a -e '$!N; s/\n//; ta' part in a forum post and I don't understand the separate -e :a part, or the $!N part, so if anyone can explain those, I'd be grateful.

Hmm, that one removes only consecutive duplicates; to remove all duplicates, you can do this:

Answer 10

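use strict;
use warnings;

my @result;

# Keep only the first occurrence of each character in $seq
sub uniq {
    my $seq  = shift;
    my $uniq = '';
    for (split //, $seq) {
        # \Q...\E makes characters such as "." or "*" match literally
        $uniq .= $_ unless $uniq =~ /\Q$_\E/;
    }
    push @result, $uniq;
}

while (<DATA>) {
    uniq($_);
}
print @result;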

__DATA__
EFUAHUU
UUUEUUUUH
UJUJHHACDEFUCU

Output:

EFUAH
UEH
UJHACDEF

Answer 11


python -c "print(set(open('foo.txt').read()))"
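The one-liner above prints a single set of all the characters in the whole file, and a set has no defined order, so it does not give the per-line, first-occurrence output from the question. A sketch in Python 3 (helper name hypothetical) that still starts from set() but restores input order by sorting on each character's first index via str.index:

```python
def unique_in_order(line: str) -> str:
    # set() drops duplicates but forgets order; sorting by the index of the
    # first occurrence (str.index) puts the characters back in input order
    return "".join(sorted(set(line), key=line.index))

for s in ["EFUAHUU", "UUUEUUUUH", "UJUJHHACDEFUCU"]:
    print(unique_in_order(s))
```

To process a file line by line, strip the newline first, e.g. for line in open('foo.txt'): print(unique_in_order(line.rstrip('\n'))). Calling str.index for each distinct character rescans the line, which is fine for short lines.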
