Ruby 1.9: how do I get a byte-index-based slice of a String?

I'm working with UTF-8 strings. I need to get a slice using byte-based indexes, not char-based.

I found references on the web to String#subseq, which is supposed to be like String#[], but for bytes. Alas, it seems not to have made it to 1.9.1.

Now, why would I want to do that? There's a chance I'll end up with an invalid string should I slice in the middle of a multi-byte char. This sounds like a terrible idea.
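To make that risk concrete, here is a small sketch using an assumed sample string, "héllo". In UTF-8 "é" takes two bytes, so a byte range that stops between them produces a string whose valid_encoding? is false, while a range covering both bytes is fine.

```ruby
# encoding: utf-8
s = "héllo"  # sample text: "h" is 1 byte, "é" is 2 bytes (0xC3 0xA9)

raw = s.dup.force_encoding("ASCII-8BIT")

# Bytes 0...3 cover "h" and both bytes of "é": valid UTF-8.
whole = raw[0...3].force_encoding("UTF-8")

# Bytes 0...2 stop after the first byte of "é": invalid UTF-8.
broken = raw[0...2].force_encoding("UTF-8")

puts whole.valid_encoding?   # true
puts broken.valid_encoding?  # false
```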

Well, I'm working with StringScanner, and it turns out its internal pointers are byte-based. I'm open to other options here.

Here's what I'm working with right now, but it's rather verbose:

s.dup.force_encoding("ASCII-8BIT")[ix...pos].force_encoding("UTF-8")

Both ix and pos come from StringScanner, so are byte-based.
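A sketch of the mismatch, assuming a sample string "héllo wörld": after StringScanner consumes "héllo", its pos is a byte offset (6), even though the word is only 5 characters long. Feeding those offsets to the character-based String#[] overshoots by one character, while the force_encoding round trip from the question returns exactly the scanned word.

```ruby
# encoding: utf-8
require 'strscan'

s  = "héllo wörld"          # sample text; "é" and "ö" are 2 bytes each
ss = StringScanner.new(s)

ix = ss.pos                 # byte offset before the scan (0)
ss.scan(/\S+/)              # consumes "héllo"
pos = ss.pos                # byte offset after the scan

# Byte-based slice: reinterpret as raw bytes, slice, re-tag as UTF-8.
word = s.dup.force_encoding("ASCII-8BIT")[ix...pos].force_encoding("UTF-8")

# Character-based slice with the same numbers also picks up the space.
wrong = s[ix...pos]

puts pos    # 6 (5 characters, 6 bytes)
puts word   # héllo
```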

Answers

You can do this too: s.bytes.to_a[ix...pos].pack("C*").force_encoding("UTF-8"), but that looks even more esoteric to me.
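The values from bytes are Integers, so they have to be turned back into a String with Array#pack("C*"); Array#join would instead concatenate their decimal digits. pack produces an ASCII-8BIT string, which is why force_encoding is still needed. A minimal sketch with an assumed sample string and byte range:

```ruby
# encoding: utf-8
s = "héllo"
ix, pos = 1, 3                         # sample byte range covering "é" (0xC3 0xA9)

byte_values = s.bytes.to_a[ix...pos]   # [195, 169]

joined = byte_values.join("")          # decimal text of each Integer, not bytes
packed = byte_values.pack("C*").force_encoding("UTF-8")

puts joined   # 195169
puts packed   # é
```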

If you're calling this in several places, a nicer way to do it could be this:

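class String
  # Like String#slice, but indexes count bytes rather than characters.
  # The result is re-tagged with the receiver's original encoding.
  def byteslice(*args)
    result = dup.force_encoding("ASCII-8BIT").slice(*args)
    result && result.force_encoding(encoding)
  end
end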

s.byteslice(ix...pos)

Doesn't String#bytes do what you want? It returns an enumerator over the bytes in a string (as numbers, since they might not form valid characters, as you pointed out):

str.bytes.to_a.slice(...)
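One thing to keep in mind with this route: the slice is an Array of Integers, not a String, so it must still be converted (for example with Array#pack("C*")) before it can be used as text. A sketch with an assumed sample string:

```ruby
# encoding: utf-8
str = "héllo"                 # sample text; "é" is 2 bytes in UTF-8

all_bytes = str.bytes.to_a    # one Integer per byte
piece     = all_bytes.slice(1, 2)

puts all_bytes.length         # 6 (5 characters, 6 bytes)
p piece                       # [195, 169], the two UTF-8 bytes of "é"
```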

Use this monkeypatch on Rubies that lack String#byteslice (it was added in Ruby 1.9.3); the method_defined? guard leaves a built-in version untouched.

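class String
  unless method_defined? :byteslice
    ##
    # Does the same thing as String#slice but
    # operates on bytes instead of characters.
    #
    def byteslice(*args)
      # unpack('C*') gives the byte values (0..255) as an Array.
      picked = unpack('C*').slice(*args)
      return nil if picked.nil?
      # A single Integer index yields one Integer, not an Array.
      Array(picked).pack('C*').force_encoding(encoding)
    end
  end
end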
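On Ruby 1.9.3 and later String#byteslice is built in, so the method_defined? guard skips the patch there. A sketch of the built-in used with StringScanner offsets (the sample string is an assumption): it accepts the same argument forms as slice, keeps the receiver's encoding, and, like the approaches above, can still return an invalid string if a range splits a character.

```ruby
# encoding: utf-8
require 'strscan'

s  = "naïve café"          # sample text; "ï" and "é" are 2 bytes each
ss = StringScanner.new(s)

ss.scan(/\S+ /)            # skip "naïve " (6 characters, 7 bytes)
ix = ss.pos                # byte offset where "café" starts
ss.scan(/\S+/)             # consume "café" (4 characters, 5 bytes)
pos = ss.pos               # byte offset just past "café"

word  = s.byteslice(ix...pos)  # Range form
first = s.byteslice(ix, 1)     # start/length form
stray = s.byteslice(3, 1)      # second byte of "ï" on its own

puts word                  # café
puts word.encoding         # UTF-8
puts stray.valid_encoding? # false
```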