How do I remove duplicate characters and keep the unique one only in Perl?

How do I remove duplicate characters and keep only one copy of each? For example, my input is:

EFUAHUU
UUUEUUUUH
UJUJHHACDEFUCU

Expected output is:

EFUAH
UEH
UJHACDEF

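I came across perl -pe 's/$1//g while /(.).*\1/', but it is not ideal: it removes even the single remaining occurrence of each repeated character from the output.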

Answers

This can be done using positive lookahead:

perl -pe 's/(.)(?=.*?\1)//g' FILE_NAME

The regex used is: (.)(?=.*?\1)

  • . : matches any char.
  • first () : remembers the matched single char.
  • (?=...) : positive lookahead.
  • .*? : matches anything in between.
  • \1 : the remembered match.
  • (.)(?=.*?\1) : match and remember any char only if it appears again later in the string.
  • s/// : the Perl way of doing a substitution.
  • g : do the substitution globally, that is, don't stop after the first substitution.
  • s/(.)(?=.*?\1)//g : this deletes a char from the input string only if that char appears again later in the string.

This will not maintain the order of the chars in the input, because for every unique char in the input string we retain its last occurrence instead of the first.
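
To make the ordering effect concrete, here is a minimal sketch that applies the same substitution to the sample input from the question. The helper name keep_last is hypothetical and only used for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the lookahead substitution to one string.
sub keep_last {
    my ($s) = @_;
    # Delete a char whenever the same char occurs again later in the string,
    # so only the LAST occurrence of each char survives.
    $s =~ s/(.)(?=.*?\1)//g;
    return $s;
}

print keep_last($_), "\n" for qw(EFUAHUU UUUEUUUUH UJUJHHACDEFUCU);
# prints:
# EFAHU
# EUH
# JHADEFCU
```

Compare these lines with the expected EFUAH, UEH and UJHACDEF: the set of characters is right, but they appear in order of last occurrence.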

To keep the relative order intact, we can do what KennyTM describes in one of the comments:

  • reverse the input line
  • do the substitution as before
  • reverse the result before printing

The one-liner for this is:

perl -ne '$_=reverse;s/(.)(?=.*?\1)//g;print scalar reverse;' FILE_NAME

Since we print manually after the reversal, we don't use the -p flag but the -n flag.
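
The reverse, substitute, reverse steps can also be wrapped in a function for use outside a one-liner. This is a minimal sketch; the name dedupe_keep_first is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the FIRST occurrence of each char.
sub dedupe_keep_first {
    my ($line) = @_;
    my $rev = reverse $line;          # scalar context: reverses the string
    $rev =~ s/(.)(?=.*?\1)//g;        # keeps the last occurrence in the reversed string
    return scalar reverse $rev;       # reverse back, restoring the original order
}

print dedupe_keep_first($_), "\n" for qw(EFUAHUU UUUEUUUUH UJUJHHACDEFUCU);
# prints:
# EFUAH
# UEH
# UJHACDEF
```

Keeping the last occurrence in the reversed string is the same as keeping the first occurrence in the original, which is why the order comes out right.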

I'm not sure this is the best one-liner for the job. If anyone has a better alternative, they are welcome to edit this answer.

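If Perl is not a must, you can also use awk. Here is a fun benchmark of the Perl one-liners posted here against awk. awk is 10+ seconds faster on a file with more than 3 million lines.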

$ wc -l <file2
3210220

$ time awk 'BEGIN{FS=""}{delete _;for(i=1;i<=NF;i++){if(!_[$i]++) printf $i};print""}' file2 >/dev/null

real    1m1.761s
user    0m58.565s
sys     0m1.568s

$ time perl -n -e '%seen=();' -e 'for (split //) {print unless $seen{$_}++;}' file2 > /dev/null

real    1m32.123s
user    1m23.623s
sys     0m3.450s

$ time perl -ne '$_=reverse;s/(.)(?=.*?\1)//g;print scalar reverse;' file2 >/dev/null

real    1m17.818s
user    1m10.611s
sys     0m2.557s

$ time perl -ne'my%s;print grep!$s{$_}++,split//' file2 >/dev/null

real    1m20.347s
user    1m13.069s
sys     0m2.896s

perl -ne'my%s;print grep!$s{$_}++,split//'

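Here is a solution that I think should work faster than the lookahead one, but it is not regex-based and uses a hash table.

perl -n -e '%seen=();' -e 'for (split //) {print unless $seen{$_}++;}'

It splits every line into characters and prints each character only on its first appearance, counting appearances in the %seen hash.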
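
The same idea can be sketched as a function that returns the result instead of printing it. The name dedupe_seen is hypothetical; the hash is declared inside the function so that it starts empty for every line.

```perl
use strict;
use warnings;

# Hypothetical helper: hash-based de-duplication that keeps first occurrences.
sub dedupe_seen {
    my ($line) = @_;
    my %seen;    # fresh per call, like resetting %seen for each input line
    # $seen{$_}++ is false (0) the first time a char is met, true afterwards.
    return join '', grep { !$seen{$_}++ } split //, $line;
}

print dedupe_seen($_), "\n" for qw(EFUAHUU UUUEUUUUH UJUJHHACDEFUCU);
# prints:
# EFUAH
# UEH
# UJHACDEF
```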

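Tie::IxHash is a good module for storing hash keys in insertion order (but it may be slow; you will need to benchmark if speed is important). Example with tests: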

use Test::More 0.88;

use Tie::IxHash;
sub dedupe {
  my $str=shift;
  # one key per character; Keys returns them in insertion order
  my $hash=Tie::IxHash->new(map { $_ => 1} split //,$str);
  return join('',$hash->Keys);
}

{
my $str='EFUAHUU';
is(dedupe($str),'EFUAH');
}

{
my $str='EFUAHHUU';
is(dedupe($str),'EFUAH');
}

{
my $str='UJUJHHACDEFUCU';
is(dedupe($str),'UJHACDEF');
}

done_testing();

If the set of characters that can be encountered is restricted, e.g. only letters, then the easiest solution is tr:
perl -p -e 'tr/a-zA-Z/a-zA-Z/s'
It replaces every letter by itself, leaving other characters unaffected, and the /s modifier squeezes repeated occurrences of the same character (after replacement), thus removing duplicates.

Correction: this only removes adjacent occurrences. Disregard.
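
A quick check on the sample input shows why the tr approach falls short. This is a minimal sketch; the helper name squeeze_letters is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: squeeze runs of the same letter with tr///s.
sub squeeze_letters {
    my ($s) = @_;
    $s =~ tr/a-zA-Z/a-zA-Z/s;   # /s collapses only ADJACENT repeats
    return $s;
}

print squeeze_letters($_), "\n" for qw(EFUAHUU UUUEUUUUH UJUJHHACDEFUCU);
# prints:
# EFUAHU
# UEUH
# UJUJHACDEFUCU
```

Non-adjacent repeats such as the second U in EFUAHUU survive, so the output differs from the expected EFUAH.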

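This looks like a classic application of positive lookbehind, but unfortunately Perl does not support that. In fact, doing this (matching the text preceding a character against a full regex whose length cannot be determined) can only be done with the .NET regex classes, I think.

However, positive lookahead supports full regexes, so all you need to do is reverse the string and apply positive lookahead (as unicornaddict said):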

perl -pe 's/(.)(?=.*?\1)//g'

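Then reverse it back, because without the reversal this keeps only the last occurrence of each duplicated character in a line.

MASSIVE EDIT

I spent about half an hour on this, and the following looks like it works, without the reversal: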

perl -pe 's/\G$1//g while (/(.).*(?=\1)/g)' FILE_NAME

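I don't know whether to be proud or horrified. What I'm basically doing is the positive lookahead, then substituting on the string with \G specified, which makes the regex engine start matching from the place where the last match ended (internally represented by the pos() variable).

With test input like this: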

aabbbcbbccbab

EFAUUUUH

ABCBBBBD

DEEEFEGGH

缩略语

the output is:

ab

EFAUH

ABCD

DEFGH

ABC

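It works...

Explanation - Okay, in case my last explanation wasn't clear enough: the lookahead goes ahead and stops at the last match of a duplicated character [you can put a print pos(); inside the loop to check], and the s/\G$1//g removes it [the /g is not really needed]. So within the loop, the substitution keeps removing until all such duplicates are zapped. Of course, this may be a little too processor-intensive for your taste... but so are most of the regex-based solutions you will see. The reverse/lookahead method will probably be more efficient than this one.

From the shell, this works: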

sed -e 's/$/<EOL>/ ; s/./&\n/g' test.txt | uniq | sed -e :a -e '$!N; s/\n//; ta ; s/<EOL>/\n/g'

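In words: mark every line break with an <EOL> string, then put every character on a line of its own, then use uniq to remove duplicate lines, then strip out all the line breaks, then put line breaks back in place of the <EOL> markers.

I found the -e :a -e '$!N; s/\n//; ta' part in a forum post and I don't understand the separate -e :a part, or the $!N part, so if anyone can explain those, I'd be grateful.

Hmm, that one only removes consecutive duplicates; to remove all duplicates you could do this: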

cat test.txt | while read line ; do echo $line | sed -e 's/./&\n/g' | sort | uniq | sed -e :a -e '$!N; s/\n//; ta' ; done

That puts the characters of each line in alphabetical order, though.
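
For comparison, the effect of the sort | uniq pipeline on a single line can be sketched in Perl. The name dedupe_sorted is hypothetical, and Perl's default sort compares by character code, which may differ from the shell's locale-aware sort on non-ASCII input.

```perl
use strict;
use warnings;

# Hypothetical helper: unique chars of one line, sorted like sort | uniq.
sub dedupe_sorted {
    my ($line) = @_;
    my %seen;
    my @unique = grep { !$seen{$_}++ } split //, $line;   # drop repeats
    return join '', sort @unique;                         # original order is lost
}

print dedupe_sorted($_), "\n" for qw(EFUAHUU UUUEUUUUH UJUJHHACDEFUCU);
# prints:
# AEFHU
# EHU
# ACDEFHJU
```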

use strict;
use warnings;

my ($uniq, $seq, @result);
$uniq = '';
sub uniq {
    $seq = shift;
    for (split '', $seq) {
        # \Q...\E stops regex metacharacters in the input from being interpreted
        $uniq .= $_ unless $uniq =~ /\Q$_\E/;
    }
    push @result, $uniq;
    $uniq = '';
}

while(<DATA>){
   uniq($_);
}
print @result;

__DATA__
EFUAHUU
UUUEUUUUH
UJUJHHACDEFUCU

Output:

EFUAH
UEH
UJHACDEF

In Python:

python -c "print set(open('foo.txt').read())"



