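Note that I use `\p{Alpha}` because that is what technically defines a letter. You can tweak the regex to add digits, make sure an alpha character comes first, or whatever else you might need.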
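To see why `\p{Alpha}` is preferable to an ASCII class such as `[A-Za-z]`, here is a small sketch comparing the two on text with accented letters, plus one way of adding digits to the class. The sample strings and variable names are made up for illustration.

```perl
use strict;
use warnings;
use utf8;                               # the string literals below contain non-ASCII letters
binmode STDOUT, ':encoding(UTF-8)';

my $text = 'café naïve résumé';

# ASCII-only class: words are split at every accented letter
my @ascii = $text =~ m/([A-Za-z]+)/g;

# Unicode property: accented letters count as letters
my @alpha = $text =~ m/(\p{Alpha}+)/g;

# Extending the class with digits keeps alphanumeric tokens whole
my @alnum = 'R2D2 and C3PO' =~ m/([\p{Alpha}\d]+)/g;

print join( '|', @ascii ), "\n";    # caf|na|ve|r|sum
print join( '|', @alpha ), "\n";    # café|naïve|résumé
print join( '|', @alnum ), "\n";    # R2D2|and|C3PO
```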
Also note that if your files hold one word per line, the regex is overkill; you should just `chomp` each line instead.
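A minimal sketch of the `chomp` approach for a one-word-per-line list, assuming each line holds exactly one word with no surrounding whitespace. It uses the same `%hash` layout as the script (word => count, starting at 0). The word list is a hypothetical in-memory string so the example runs on its own; with a real file you would open it by name instead.

```perl
use strict;
use warnings;

my %hash;

# Hypothetical word list, one word per line, read through an in-memory filehandle
my $word_list = "apple\nbanana\ncherry\n";
open my $fh, '<', \$word_list or die "Cannot open word list: $!";

while ( my $word = <$fh> ) {
    chomp $word;                 # drop the trailing newline; no regex needed
    next unless length $word;    # skip blank lines
    $hash{$word} = 0;            # seed the count, as load_words does
}
close $fh;

print join( ',', sort keys %hash ), "\n";    # apple,banana,cherry
```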
use 5.010; # for say
use strict;
use warnings;
use Data::Dumper;    # needed for the Data::Dumper->Dump call at the end
my ( %hash );
sub load_words {
    @hash{ @_ } = ( 0 ) x @_;    # seed every word with a count of 0
    return;
}
sub count_words {
    # count only words that were loaded beforehand
    $hash{$_}++ foreach grep { exists $hash{$_} } @_;
}
my $word_regex
    = qr{ (              # start a capture
        \p{Alpha}+       # any sequence of one or more alpha characters
        (?:              # begin grouping of
            ['-]         # an apostrophe or hyphen, for contractions and hyphenated words
            \p{Alpha}+   # which must be followed by an alpha
        )*               # any number of times
        (?: (?<=s) ' )?  # case for plural possessives (ht: tchrist)
    )                    # end capture
    }x;
# load @ARGV to do <> processing
@ARGV = qw( list of files I take words from );
while ( <> ) {
    load_words( m/$word_regex/g );
}
@ARGV = qw( list of files where I count words );
while ( <> ) {
    count_words( m/$word_regex/g );
}
# take a look at the hash
say Data::Dumper->Dump( [ \%hash ], [ '*hash' ] );
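To check what `$word_regex` extracts, here is a self-contained sketch that applies a copy of it to a made-up sentence. The copy assumes the regex accepts apostrophes (`'`) for contractions and plural possessives, as its comments describe.

```perl
use strict;
use warnings;

# A copy of $word_regex, with the apostrophes its comments call for
my $word_regex = qr{
    (                       # start a capture
        \p{Alpha}+          # one or more alpha characters
        (?:                 # begin grouping of
            ['-]            # an apostrophe or a hyphen
            \p{Alpha}+      # followed by more alpha characters
        )*                  # any number of times
        (?: (?<=s) ' )?     # trailing apostrophe after an s (plural possessive)
    )                       # end capture
}x;

# Made-up sample sentence
my $text  = q{The cats' well-known owner doesn't like dogs.};
my @words = $text =~ m/$word_regex/g;

print join( '|', @words ), "\n";    # The|cats'|well-known|owner|doesn't|like|dogs
```

Because the hyphen and apostrophe must be followed by another letter, trailing punctuation such as the final `.` is never swallowed, while the lookbehind lets a lone `'` attach only after an `s`.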