Question

I've written a Ruby script that reads a file (File.read()) containing Unicode characters, and it works fine from the command line.

However, when I try to put it into an Automator workflow (Mac OS X), I get this error:

2009-12-23 17:55:15 -0500: /Users/jeffreyaylesworth/bin/symbols:19:in `split': invalid byte sequence in US-ASCII (ArgumentError)
(traceback)

So when running from Automator, split suddenly doesn't like non-ASCII characters. As far as I can tell, both are running the same version of Ruby (the version number is the same).

I'm not too concerned about why they behave differently (though if someone knows, that's great), but I would like a solution that makes split accept non-ASCII characters.

If it helps, I need to split text at a single character into two pieces, so if something similar to C's tokenizer would work, I can use that.

Answer 1

You don't specify the encoding of the file. Since it is impossible to reliably determine a file's encoding automatically, the encoding must be specified explicitly. If it isn't, Ruby falls back to the default external encoding; if that isn't set, it uses the encoding specified in the environment; and if the environment doesn't specify an encoding, the file is assumed to be 7-bit US-ASCII.

In your case, it seems that there is either a difference in the two environments (automated scripts are often run in a very restrictive environment without locale settings) or in the way the interpreter gets invoked.

So, you'd need to do something like

File.read('/path/to/file', encoding: 'UTF-8')
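
To tie this back to the question's goal of splitting text at a single character into two pieces, here is a minimal sketch. It assumes the file really is UTF-8; the helper name split_at and the sample text are made up for illustration, and String#partition is swapped in for split because it cuts only at the first occurrence of the separator.

```ruby
# encoding: utf-8
require 'tempfile'

# Hypothetical helper (not from the original script): read a file as UTF-8
# and cut it into two pieces at the first occurrence of `separator`.
def split_at(path, separator)
  # Passing the encoding explicitly avoids the US-ASCII fallback.
  text = File.read(path, encoding: 'UTF-8')
  # String#partition returns [before, separator, after].
  head, _sep, tail = text.partition(separator)
  [head, tail]
end

# Usage with a throwaway file holding sample (made-up) non-ASCII text.
Tempfile.create('symbols') do |f|
  File.write(f.path, "α→β", encoding: 'UTF-8')
  p split_at(f.path, "→")
end
```

Because the encoding is passed to File.read directly, this should behave the same whether it is started from a terminal or from Automator.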

Answer 2

Sounds like the two are being run from different environments, with different locale settings (for example LANG or LC_ALL).
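
One way to check this is to print what each environment actually provides. This is only a diagnostic sketch: the helper name encoding_report is made up, and the three variables listed are the usual POSIX locale variables, not something confirmed for Automator specifically.

```ruby
# Hypothetical diagnostic helper: collect the locale-related environment
# variables and the default external encoding Ruby is using.
def encoding_report
  report = {}
  %w[LANG LC_ALL LC_CTYPE].each { |name| report[name] = ENV[name] }
  report['default_external'] = Encoding.default_external.name
  report
end

# Run this once from the terminal and once from the workflow, then compare.
encoding_report.each { |key, value| puts "#{key}=#{value.inspect}" }
```

If the workflow reports US-ASCII where the terminal reports UTF-8, setting Encoding.default_external = Encoding::UTF_8 near the top of the script is another possible fix besides passing the encoding to File.read.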
