How do I remove duplicate characters and keep the unique one only. For example, my input is:
EFUAHUU
UUUEUUUUH
UJUJHHACDEFUCU
预期产出是:
EFUAH
UEH
UJHACDEF
我来过<代码>perl -pe s/$1//g while/(.)*/,虽然这很不理想,但甚至消除了产出特性的单一发生。
How do I remove duplicate characters and keep the unique one only. For example, my input is:
EFUAHUU
UUUEUUUUH
UJUJHHACDEFUCU
预期产出是:
EFUAH
UEH
UJHACDEF
我来过<代码>perl -pe s/$1//g while/(.)*/,虽然这很不理想,但甚至消除了产出特性的单一发生。
可使用呈阳性光线:
perl -pe s/(.)(?=.*?1)//g FILE_NAME
The regex used is:(......)(?=?1)
.
: to match any char.()
: remember the matched
single char.(?=...)
: +ve lookahead.*?
: to match anything in between1
: the remembered match.(.)(?=.*?1)
: match and remember
any char only if it appears again
later in the string.s///
: Perl way of doing the
substitution.g
: to do the substitution
globally...that is don t stop after
first substitution.s/(.)(?=.*?1)//g
: this will
delete a char from the input string
only if that char appears again later
in the string.这将<>not > 保持投入中的焦炭顺序,因为对于投入体中的每一个独一无二的焦炭而言,我们保留了<><>
为了保持相对秩序不变,我们可以做什么? KennyTM 在其中一项评论中说明:
每一行:
perl -ne $_=reverse;s/(.)(?=.*?1)//g;print scalar reverse; FILE_NAME
自我们着手进行<条码> 倒置后,我们没有使用<代码>-p的旗帜,而是使用<代码>-n的旗帜。
我不敢肯定,这是这样做的最佳一线。 我欢迎其他人,如果他们有更好的选择的话,就把这一回答ed。
如果有 勿庸置疑,你也可以使用 a。 此处为单行车贴上w子的单行人基准。 awk为10+秒,备案量为300万++ 线路
$ wc -l <file2
3210220
$ time awk BEGIN{FS=""}{delete _;for(i=1;i<=NF;i++){if(!_[$i]++) printf $i};print""} file2 >/dev/null
real 1m1.761s
user 0m58.565s
sys 0m1.568s
$ time perl -n -e %seen=(); -e for (split //) {print unless $seen{$_}++;} file2 > /dev/null
real 1m32.123s
user 1m23.623s
sys 0m3.450s
$ time perl -ne $_=reverse;s/(.)(?=.*?1)//g;print scalar reverse; file2 >/dev/null
real 1m17.818s
user 1m10.611s
sys 0m2.557s
$ time perl -ne my%s;print grep!$s{$_}++,split// file2 >/dev/null
real 1m20.347s
user 1m13.069s
sys 0m2.896s
perl -ne my%s;print grep!$s{$_}++,split//
这里的解决办法是,我认为应该比头头更快地工作,但不能靠reg和用 has。
perl -n -e %seen=(); -e for (split //) {print unless $seen{$_}++;}
它把每一条线分为特性和印本,只计算在百年内出现的数字。
Tie:IxHash是储存散草单的良好模块(但可能很缓慢,如果速度很重要,你将需要基准)。 试样:
use Test::More 0.88;
use Tie::IxHash;
sub dedupe {
my $str=shift;
my $hash=Tie::IxHash->new(map { $_ => 1} split //,$str);
return join( ,$hash->Keys);
}
{
my $str= EFUAHUU ;
is(dedupe($str), EFUAH );
}
{
my $str= EFUAHHUU ;
is(dedupe($str), EFUAH );
}
{
my $str= UJUJHHACDEFUCU ;
is(dedupe($str), UJHACDEF );
}
done_testing();
Use uniq from
perl -MList::MoreUtils=uniq -ne print uniq split ""
If the set of characters that can be encountered is restricted, e.g. only letters, then the easiest solution will be with tr
perl -p -e tr/a-zA-Z/a-zA-Z/s
It will replace all the letters by themselves, leaving other characters unaffected and /s modifier will squeeze repeated occurrences of the same character (after replacement), thus removing duplicates
www.un.org/Depts/DGACM/index_spanish.htm 反之,它只消除了令人厌恶的外表。 Disregard
这看起来像正面的典型应用,但不幸的是,这种支持没有说服力。 事实上,只有这样做(把上述性质的案文与不能确定期限的完全限定语相匹配)。 我认为,该网络的校外班。
然而,积极的 look头支持完全的reg,因此,你们都需要做的是扭转str,采用积极的 look头(如单角字塔说):
perl -pe s/(.)(?=.*?1)//g
反之,因为如果不扭转这种情况,就只能把重复性放在最后一行。
www.un.org/Depts/DGACM/index_spanish.htm MASSIVE EDIT
我过去曾为此花费了半小时,这同这项工作一样,是,没有逆转。
perl -pe s/G$1//g while (/(.).*(?=1)/g) FILE_NAME
我不知道是骄傲还是可怕的。 我基本上做的是正面的 lo头,然后与G具体指明的阵列相去,这使得reg发动机从最后的配对点开始配对(通常由()变数代表)。
试验投入如下:
aabbbcbbccbab
EFAUUUUH
ABCBBBBD
DEEEFEGGH
缩略语
产出如下:
ab
EFAUH
ABCD
DEFGH
ABC
页: 1 它的工作......
从车上看,这项工作:
sed -e s/$/<EOL>/ ; s/./&
/g test.txt | uniq | sed -e :a -e $!N; s/
//; ta ; s/<EOL>/
/g
换言之:标明每一条线,编号为<EOL>
string, 然后将每一特性放在自己的一条线上,然后使用uniq
去除重复线,然后冲破所有线,然后将背线而不是<EOL>
标记。
I found the -e :a -e $!N; s/
//; ta
part in a forum post and I don t understand the seperate -e :a
part, or the $!N
part, so if anyone can explain those, I d be grateful.
Hmm, that one does only consecutiveplicas; tomovall<>em>repreleases can do this:
cat test.txt | while read line ; do echo $line | sed -e s/./&
/g | sort | uniq | sed -e :a -e $!N; s/
//; ta ; done
因此,按字母顺序排列每一行的特性。
use strict;
use warnings;
my ($uniq, $seq, @result);
$uniq = ;
sub uniq {
$seq = shift;
for (split ,$seq) {
$uniq .=$_ unless $uniq =~ /$_/;
}
push @result,$uniq;
$uniq= ;
}
while(<DATA>){
uniq($_);
}
print @result;
__DATA__
EFUAHUU
UUUEUUUUH
UJUJHHACDEFUCU
产出:
EFUAH
UEH
UJHACDEF
页: 1
python -c "print set(open( foo.txt ).read())"
I have a simple problem that says: A password for xyz corporation is supposed to be 6 characters long and made up of a combination of letters and digits. Write a program fragment to read in a string ...
The == operator is used to compare two strings in shell script. However, I want to compare two strings ignoring case, how can it be done? Is there any standard command for this?
I wrote below code to readin line by line from stdin ex. city=Boston;city=New York;city=Chicago and then split each line by ; delimiter and print each record. Then in yet another loop I try to ...
I tried to print all the possible combination of members of several vectors. Why the function below doesn t return the string as I expected? #include <iostream> #include <vector> #...
I m trying to initialize string with iterators and something like this works: ifstream fin("tmp.txt"); istream_iterator<char> in_i(fin), eos; //here eos is 1 over the end string s(in_i, ...
I have a string "pc1|pc2|pc3|" I want to get each word on different line like: pc1 pc2 pc3 I need to do this in C#... any suggestions??
Is there a PHP string function that transforms a multi-line string into a single-line string? I m getting some data back from an API that contains multiple lines. For example: <p>Some Data</...
I was trying to speed up a certain routine in an application, and my profiler, AQTime, identified one method in particular as a bottleneck. The method has been with us for years, and is part of a "...