I'm working with UTF-8 strings. I need to get a slice using byte-based indexes, not character-based ones.
I found references on the web to String#subseq, which is supposed to be like String#[] but for bytes. Alas, it seems not to have made it into Ruby 1.9.1.
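For context, here is a minimal sketch of the character/byte mismatch on a UTF-8 string. String#[] counts characters. String#byteslice, which takes byte offsets, is not available in 1.9.1 (it was added in Ruby 1.9.3), so this sketch assumes a newer interpreter:

```ruby
# encoding: UTF-8
s = "héllo"             # "é" takes two bytes in UTF-8

puts s.length           # 5 characters
puts s.bytesize         # 6 bytes
puts s[0, 2]            # "hé": offset and length counted in characters
puts s.byteslice(0, 3)  # "hé": offset and length counted in bytes (Ruby 1.9.3+)
```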
Now, why would I want to do that? There's a chance I'll end up with an invalid string if I slice in the middle of a multi-byte character. That sounds like a terrible idea.
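To make the risk concrete, this sketch cuts "héllo" one byte into the "é" using a force_encoding round trip. The result has the UTF-8 encoding label but is not valid UTF-8, which valid_encoding? reports:

```ruby
# encoding: UTF-8
s = "héllo"

# Bytes 0...2 are "h" plus only the first byte of "é".
piece = s.dup.force_encoding("ASCII-8BIT")[0...2].force_encoding("UTF-8")

puts piece.bytesize         # 2
puts piece.valid_encoding?  # false: "\xC3" on its own is not a complete character
```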
Well, I'm working with StringScanner, and it turns out its internal pointers are byte-based. I'm open to other options here.
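A quick sketch of what "byte-based" means here, using the standard strscan library: after matching a word that contains one two-byte character, pos is one greater than the number of characters matched.

```ruby
# encoding: UTF-8
require "strscan"

scanner = StringScanner.new("héllo wörld")
scanner.scan(/\S+/)            # matches "héllo"

puts scanner.matched           # "héllo"
puts scanner.matched.length    # 5 characters
puts scanner.pos               # 6: a byte offset, not a character offset
```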
Here's what I'm using right now, but it's rather verbose:
s.dup.force_encoding("ASCII-8BIT")[ix...pos].force_encoding("UTF-8")
Both ix and pos come from StringScanner, so they are byte-based.
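One way to hide the verbosity is a small helper. This is a sketch, not an established API: the name byte_slice is hypothetical, it prefers String#byteslice when the interpreter has it (Ruby 1.9.3+) and otherwise falls back to the force_encoding round trip above, and it raises instead of returning a string that splits a character. The usage part records the scanner's byte offset before and after each token; as long as the scanner only advances via regex matches, both offsets should sit on character boundaries.

```ruby
# encoding: UTF-8
require "strscan"

# Hypothetical helper: return bytes from...to of +str+ as a string in
# str's own encoding. Uses String#byteslice when available (Ruby 1.9.3+),
# otherwise the force_encoding round trip that also works on 1.9.1.
def byte_slice(str, from, to)
  piece =
    if str.respond_to?(:byteslice)
      str.byteslice(from...to)
    else
      raw = str.dup.force_encoding("ASCII-8BIT")[from...to]
      raw && raw.force_encoding(str.encoding)
    end
  raise IndexError, "byte range #{from}...#{to} is out of bounds" if piece.nil?
  unless piece.valid_encoding?
    raise ArgumentError, "byte range #{from}...#{to} splits a character"
  end
  piece
end

# Usage with StringScanner: take ix before a token and pos after it,
# then recover the token text from those byte offsets.
source  = "naïve café, résumé"
scanner = StringScanner.new(source)
words   = []

until scanner.eos?
  scanner.skip(/[\s,]+/)               # skip separators, if any
  ix = scanner.pos                     # byte offset before the token
  break unless scanner.scan(/[^\s,]+/)
  pos = scanner.pos                    # byte offset after the token
  words << byte_slice(source, ix, pos)
end

puts words  # prints naïve, café and résumé, one per line
```

Raising on an invalid slice is a design choice: it surfaces a bad offset immediately instead of letting a malformed string travel further through the program.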