How can I check the file encoding in a shell script? I need to know if a file is encoded in utf-8 or iso-8859-1.
Thanks
I'd just use
file -bi myfile.txt
to determine the character encoding of a particular file.
This solution relies on an external dependency, but I suspect file is available on all semi-modern distros nowadays.
EDIT:
In response to Laurence Gonsalves's comment: -b is the option to be brief (omit the filename) and -i is the shorthand equivalent of --mime, so the most portable way (including Mac OS X) is probably:
file --mime myfile.txt
There's no way to be 100% certain (unless you're dealing with a file format that internally states its encoding).
Most tools that attempt to make this distinction will try to decode the file as UTF-8 (as that's the stricter encoding) and, if that fails, fall back to ISO-8859-1. You can do this with iconv "by hand", or you can use file:
$ file utf8.txt
utf8.txt: UTF-8 Unicode text
$ file latin1.txt
latin1.txt: ISO-8859 text
Note that ASCII files are both UTF-8 and ISO-8859-1 compatible.
$ file ascii.txt
ascii.txt: ASCII text
Finally: there's no real way to distinguish between ISO-8859-1 and ISO-8859-2, for example, unless you're going to assume the content is natural language and use statistical methods. This is probably why file says "ISO-8859".
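The decode-then-fall-back approach described above can be sketched in a few lines of Python, which is easy to call from a shell script. This is only a minimal sketch: guess_encoding is a hypothetical helper name, and the "iso-8859-1" result is a fallback rather than a positive identification, because every byte sequence decodes as ISO-8859-1.

```python
def guess_encoding(data):
    """Classify raw bytes as 'us-ascii', 'utf-8' or 'iso-8859-1' (fallback).

    Hypothetical helper for illustration; pass it the file content read
    in binary mode, e.g. open(path, "rb").read().
    """
    # Pure ASCII is valid in both UTF-8 and ISO-8859-1, so report it separately.
    try:
        data.decode("ascii")
        return "us-ascii"
    except UnicodeDecodeError:
        pass
    # UTF-8 is the stricter encoding, so try it first.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: assume ISO-8859-1, which accepts any byte value.
        return "iso-8859-1"
```

For example, guess_encoding(open("myfile.txt", "rb").read()) would classify myfile.txt; the result could be printed and captured by the calling shell script.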
You can use the file command:
file --mime myfile.text
The file command is not 100% reliable. A simple test:
#!/bin/bash
echo "a" > /tmp/foo
for i in {1..1000000}
do
echo "asdas" >> /tmp/foo
done
echo "üöäÄÜÖß " >> /tmp/foo
file -b --mime-encoding /tmp/foo
This outputs:
us-ascii
ASCII has no German umlauts, so this result is wrong.
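The test above suggests that file bases its answer on only part of the file (that is an inference from the output, not something stated in this thread). A check that reads every byte avoids the problem. The following is a minimal Python sketch under that assumption; scan_whole_file and its return labels are hypothetical names chosen for illustration. It uses an incremental decoder so that large files can be processed in chunks without splitting multi-byte sequences incorrectly.

```python
import codecs


def scan_whole_file(path, chunk_size=65536):
    """Return 'us-ascii', 'utf-8' or 'not-utf-8' after reading the whole file."""
    # The incremental decoder buffers a multi-byte sequence that is split
    # across two chunks instead of reporting it as an error.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    saw_non_ascii = False
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            # Any byte above 0x7F means the file is not pure ASCII.
            if max(chunk) > 0x7F:
                saw_non_ascii = True
            try:
                decoder.decode(chunk)
            except UnicodeDecodeError:
                return "not-utf-8"
    try:
        # final=True raises if the file ends in the middle of a sequence.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return "not-utf-8"
    return "utf-8" if saw_non_ascii else "us-ascii"
```

Called on a UTF-8 file like the /tmp/foo created by the script above, this should report utf-8 rather than us-ascii, because the umlauts on the last line are actually read.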
A file is just a sequence of bytes. Without trusting metadata (a BOM, which is only recommended for UTF-16 and UTF-32, a MIME type, or a header in the data) you can't really detect the encoding. A sequence of bytes can be interpreted as UTF-8, ISO-8859-1/2 or anything you want; whether that works depends on whether the byte sequence has a mapping in ISO-8859-1, UTF-8 or whichever encoding you pick. What you want is to try to decode the whole file content with the desired character encoding. If that fails, the desired encoding has no mapping for this sequence of bytes.
From a shell script you could call Python, Perl or, as Laurence Gonsalves says, iconv. For text files I use this in Python:
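import codecs

# path is the file to check; with errors='strict', reading raises
# UnicodeDecodeError as soon as invalid UTF-8 is encountered.
f = codecs.open(path, encoding='utf-8', errors='strict')

def valid_string(data):
    # data is a byte string; return True if it decodes cleanly as UTF-8.
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False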
How do you know that a file is a text file? You don't. You decode it line by line with the desired character encoding. OK, you can add a little trust and check whether a BOM exists (in which case the file is UTF-encoded).
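The BOM check mentioned above can be sketched as follows. This is a minimal illustration, not a complete detector: detect_bom and the returned labels are hypothetical names, while the BOM constants come from Python's standard codecs module. The UTF-32 marks are tested before the UTF-16 ones because the UTF-32 little-endian BOM begins with the same two bytes as the UTF-16 little-endian BOM.

```python
import codecs

# Order matters: BOM_UTF32_LE starts with the bytes of BOM_UTF16_LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(head):
    """Return an encoding label if the bytes start with a known BOM, else None."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None
```

Reading the first four bytes of a file in binary mode is enough input for this function. A result of None only means that no BOM is present; it says nothing about the actual encoding.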