GSM-7 conversion- and septet-encoding library in Ruby?

I am looking for a pure Ruby solution to convert UTF-8 to GSM-7 and back, and do septet encoding/decoding along the way.

Background: I am sending and receiving SMS via a gateway and via REST requests.

I found a solution with libiconv (http://mobiletidings.com/2009/07/06/gsm-7-encoding-gnu-libiconv/) (which works more or less, but is not accepted into libiconv itself due to certain deficiencies).

I would prefer a pure Ruby solution, most probably with a lookup-table for conversion and a 7-8 bit encoder to handle the resulting septets.
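The 7-to-8-bit step mentioned above can be sketched roughly as follows. This is only a minimal illustration, assuming the usual GSM 03.38 packing order (least significant bits first, each septet appended directly after the previous one); `pack_septets` and `unpack_septets` are hypothetical helper names, not part of any library.

```ruby
# Pack an array of septets (integers 0..127) into a binary string of octets.
# Septets are laid end to end, least significant bit first.
def pack_septets(septets)
  bits = 0    # bit accumulator
  nbits = 0   # number of valid bits currently held in the accumulator
  octets = []
  septets.each do |s|
    raise ArgumentError, "not a septet: #{s.inspect}" unless (0..127).cover?(s)
    bits |= s << nbits
    nbits += 7
    while nbits >= 8
      octets << (bits & 0xFF)
      bits >>= 8
      nbits -= 8
    end
  end
  octets << bits if nbits > 0   # final partial octet, zero-padded
  octets.pack('C*')
end

# Reverse of pack_septets. The septet count must be supplied by the caller,
# because trailing padding bits cannot be told apart from a real septet.
def unpack_septets(octets, count)
  bits = 0
  nbits = 0
  septets = []
  octets.each_byte do |o|
    bits |= o << nbits
    nbits += 8
    while nbits >= 7 && septets.length < count
      septets << (bits & 0x7F)
      bits >>= 7
      nbits -= 7
    end
  end
  septets
end

# "hello" uses the same codes in ASCII and in GSM 03.38.
packed = pack_septets('hello'.bytes)
puts packed.unpack('H*').first   # prints e8329bfd06
```

Whether the packed form is needed at all depends on the gateway; some accept unpacked GSM codes or plain text and do the packing themselves.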

Has anyone out there already done this? Any pointers?

Thanks, Tom!

Answers

I was looking for exactly the same (converting UTF-8 to GSM 03.38), but found nothing. So I wrote a little converter:

require 'iconv'

class String

  GSM0338_MAP = [
    0x00, 0x0040, # COMMERCIAL AT
    0x01, 0x00A3, # POUND SIGN
    0x02, 0x0024, # DOLLAR SIGN
    0x03, 0x00A5, # YEN SIGN
    0x04, 0x00E8, # LATIN SMALL LETTER E WITH GRAVE
    0x05, 0x00E9, # LATIN SMALL LETTER E WITH ACUTE
    0x06, 0x00F9, # LATIN SMALL LETTER U WITH GRAVE
    0x07, 0x00EC, # LATIN SMALL LETTER I WITH GRAVE
    0x08, 0x00F2, # LATIN SMALL LETTER O WITH GRAVE
    0x09, 0x00E7, # LATIN SMALL LETTER C WITH CEDILLA
    0x0B, 0x00D8, # LATIN CAPITAL LETTER O WITH STROKE
    0x0C, 0x00F8, # LATIN SMALL LETTER O WITH STROKE
    0x0E, 0x00C5, # LATIN CAPITAL LETTER A WITH RING ABOVE
    0x0F, 0x00E5, # LATIN SMALL LETTER A WITH RING ABOVE
    0x11, 0x005F, # LOW LINE
    [0x1B, 0x14], 0x005E, # CIRCUMFLEX ACCENT
    [0x1B, 0x28], 0x007B, # LEFT CURLY BRACKET
    [0x1B, 0x29], 0x007D, # RIGHT CURLY BRACKET
    [0x1B, 0x2F], 0x005C, # REVERSE SOLIDUS
    [0x1B, 0x3C], 0x005B, # LEFT SQUARE BRACKET
    [0x1B, 0x3D], 0x007E, # TILDE
    [0x1B, 0x3E], 0x005D, # RIGHT SQUARE BRACKET
    [0x1B, 0x40], 0x007C, # VERTICAL LINE
    0x1C, 0x00C6, # LATIN CAPITAL LETTER AE
    0x1D, 0x00E6, # LATIN SMALL LETTER AE
    0x1E, 0x00DF, # LATIN SMALL LETTER SHARP S (German)
    0x1F, 0x00C9, # LATIN CAPITAL LETTER E WITH ACUTE
    0x20, 0x0020, # SPACE
    0x21, 0x0021, # EXCLAMATION MARK
    0x22, 0x0022, # QUOTATION MARK
    0x23, 0x0023, # NUMBER SIGN
    0x24, 0x00A4, # CURRENCY SIGN
    0x25, 0x0025, # PERCENT SIGN
    0x26, 0x0026, # AMPERSAND
    0x27, 0x0027, # APOSTROPHE
    0x28, 0x0028, # LEFT PARENTHESIS
    0x29, 0x0029, # RIGHT PARENTHESIS
    0x2A, 0x002A, # ASTERISK
    0x2B, 0x002B, # PLUS SIGN
    0x2C, 0x002C, # COMMA
    0x2D, 0x002D, # HYPHEN-MINUS
    0x2E, 0x002E, # FULL STOP
    0x2F, 0x002F, # SOLIDUS
    0x30, 0x0030, # DIGIT ZERO
    0x31, 0x0031, # DIGIT ONE
    0x32, 0x0032, # DIGIT TWO
    0x33, 0x0033, # DIGIT THREE
    0x34, 0x0034, # DIGIT FOUR
    0x35, 0x0035, # DIGIT FIVE
    0x36, 0x0036, # DIGIT SIX
    0x37, 0x0037, # DIGIT SEVEN
    0x38, 0x0038, # DIGIT EIGHT
    0x39, 0x0039, # DIGIT NINE
    0x3A, 0x003A, # COLON
    0x3B, 0x003B, # SEMICOLON
    0x3C, 0x003C, # LESS-THAN SIGN
    0x3D, 0x003D, # EQUALS SIGN
    0x3E, 0x003E, # GREATER-THAN SIGN
    0x3F, 0x003F, # QUESTION MARK
    0x40, 0x00A1, # INVERTED EXCLAMATION MARK
    0x41, 0x0041, # LATIN CAPITAL LETTER A
    0x42, 0x0042, # LATIN CAPITAL LETTER B
    0x42, 0x0392, # GREEK CAPITAL LETTER BETA
    0x43, 0x0043, # LATIN CAPITAL LETTER C
    0x44, 0x0044, # LATIN CAPITAL LETTER D
    0x45, 0x0045, # LATIN CAPITAL LETTER E
    0x46, 0x0046, # LATIN CAPITAL LETTER F
    0x47, 0x0047, # LATIN CAPITAL LETTER G
    0x48, 0x0048, # LATIN CAPITAL LETTER H
    0x49, 0x0049, # LATIN CAPITAL LETTER I
    0x4A, 0x004A, # LATIN CAPITAL LETTER J
    0x4B, 0x004B, # LATIN CAPITAL LETTER K
    0x4C, 0x004C, # LATIN CAPITAL LETTER L
    0x4D, 0x004D, # LATIN CAPITAL LETTER M
    0x4E, 0x004E, # LATIN CAPITAL LETTER N
    0x4F, 0x004F, # LATIN CAPITAL LETTER O
    0x50, 0x0050, # LATIN CAPITAL LETTER P
    0x51, 0x0051, # LATIN CAPITAL LETTER Q
    0x52, 0x0052, # LATIN CAPITAL LETTER R
    0x53, 0x0053, # LATIN CAPITAL LETTER S
    0x54, 0x0054, # LATIN CAPITAL LETTER T
    0x55, 0x0055, # LATIN CAPITAL LETTER U
    0x56, 0x0056, # LATIN CAPITAL LETTER V
    0x57, 0x0057, # LATIN CAPITAL LETTER W
    0x58, 0x0058, # LATIN CAPITAL LETTER X
    0x59, 0x0059, # LATIN CAPITAL LETTER Y
    0x5A, 0x005A, # LATIN CAPITAL LETTER Z
    0x5B, 0x00C4, # LATIN CAPITAL LETTER A WITH DIAERESIS
    0x5C, 0x00D6, # LATIN CAPITAL LETTER O WITH DIAERESIS
    0x5D, 0x00D1, # LATIN CAPITAL LETTER N WITH TILDE
    0x5E, 0x00DC, # LATIN CAPITAL LETTER U WITH DIAERESIS
    0x5F, 0x00A7, # SECTION SIGN
    0x60, 0x00BF, # INVERTED QUESTION MARK
    0x61, 0x0061, # LATIN SMALL LETTER A
    0x62, 0x0062, # LATIN SMALL LETTER B
    0x63, 0x0063, # LATIN SMALL LETTER C
    0x64, 0x0064, # LATIN SMALL LETTER D
    0x65, 0x0065, # LATIN SMALL LETTER E
    0x66, 0x0066, # LATIN SMALL LETTER F
    0x67, 0x0067, # LATIN SMALL LETTER G
    0x68, 0x0068, # LATIN SMALL LETTER H
    0x69, 0x0069, # LATIN SMALL LETTER I
    0x6A, 0x006A, # LATIN SMALL LETTER J
    0x6B, 0x006B, # LATIN SMALL LETTER K
    0x6C, 0x006C, # LATIN SMALL LETTER L
    0x6D, 0x006D, # LATIN SMALL LETTER M
    0x6E, 0x006E, # LATIN SMALL LETTER N
    0x6F, 0x006F, # LATIN SMALL LETTER O
    0x70, 0x0070, # LATIN SMALL LETTER P
    0x71, 0x0071, # LATIN SMALL LETTER Q
    0x72, 0x0072, # LATIN SMALL LETTER R
    0x73, 0x0073, # LATIN SMALL LETTER S
    0x74, 0x0074, # LATIN SMALL LETTER T
    0x75, 0x0075, # LATIN SMALL LETTER U
    0x76, 0x0076, # LATIN SMALL LETTER V
    0x77, 0x0077, # LATIN SMALL LETTER W
    0x78, 0x0078, # LATIN SMALL LETTER X
    0x79, 0x0079, # LATIN SMALL LETTER Y
    0x7A, 0x007A, # LATIN SMALL LETTER Z
    0x7B, 0x00E4, # LATIN SMALL LETTER A WITH DIAERESIS
    0x7C, 0x00F6, # LATIN SMALL LETTER O WITH DIAERESIS
    0x7D, 0x00F1, # LATIN SMALL LETTER N WITH TILDE
    0x7E, 0x00FC, # LATIN SMALL LETTER U WITH DIAERESIS
    0x7F, 0x00E0  # LATIN SMALL LETTER A WITH GRAVE
  ]

  # Returns a new string in GSM 03.38 encoding
  def to_gsm0338
    latin1_to_gsm_map = Hash.new
    GSM0338_MAP.each_slice(2) { |gsm_symbol, latin1_symbol| latin1_to_gsm_map[latin1_symbol] = gsm_symbol }
    Iconv.iconv('ISO-8859-1', 'UTF-8', self).first.unpack('C*').collect { |utf8_symbol| latin1_to_gsm_map[utf8_symbol] }.flatten.pack('c*')
  end
end



# Test
if $0 == __FILE__
  puts "[ÖÄÜöäü]~|-_".to_gsm0338.inspect
end
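Iconv was removed from Ruby's standard library in 2.0 (it lives on as the iconv gem), so on a current Ruby the same lookup-table idea can be done without it, and in both directions. The following is only a sketch under some assumptions: `MAP_EXCERPT` is a shortened stand-in for the full `GSM0338_MAP` above (same flat layout), and `UNICODE_TO_GSM`, `GSM_TO_UNICODE`, `utf8_to_gsm` and `gsm_to_utf8` are hypothetical names chosen for illustration.

```ruby
# Shortened stand-in for the full GSM0338_MAP: a GSM code (or an
# [escape, code] pair) followed by the Unicode code point it represents.
MAP_EXCERPT = [
  0x00, 0x0040,          # COMMERCIAL AT
  0x11, 0x005F,          # LOW LINE
  [0x1B, 0x3C], 0x005B,  # LEFT SQUARE BRACKET
  [0x1B, 0x3E], 0x005D,  # RIGHT SQUARE BRACKET
  0x48, 0x0048,          # LATIN CAPITAL LETTER H
  0x5C, 0x00D6,          # LATIN CAPITAL LETTER O WITH DIAERESIS
  0x69, 0x0069,          # LATIN SMALL LETTER I
  0x7E, 0x00FC           # LATIN SMALL LETTER U WITH DIAERESIS
].freeze

# Code point => array of GSM codes (one element, or two for escape sequences).
UNICODE_TO_GSM = MAP_EXCERPT.each_slice(2).each_with_object({}) do |(gsm, uni), map|
  map[uni] = Array(gsm)
end.freeze

# Array of GSM codes => code point.
GSM_TO_UNICODE = UNICODE_TO_GSM.invert.freeze

# UTF-8 text to unpacked GSM 03.38 codes (one byte per septet).
# Raises instead of silently dropping characters that have no mapping.
def utf8_to_gsm(text)
  codes = text.encode('UTF-8').each_char.flat_map do |ch|
    UNICODE_TO_GSM.fetch(ch.ord) do
      raise ArgumentError, "no GSM 03.38 mapping for #{ch.inspect}"
    end
  end
  codes.pack('C*')
end

# Unpacked GSM 03.38 codes back to a UTF-8 string.
def gsm_to_utf8(bytes)
  codes = bytes.unpack('C*')
  points = []
  until codes.empty?
    key = [codes.shift]
    key << codes.shift if key == [0x1B]   # escape: the next code belongs to it
    points << GSM_TO_UNICODE.fetch(key) do
      raise ArgumentError, "unknown GSM 03.38 sequence #{key.inspect}"
    end
  end
  points.pack('U*')
end

gsm = utf8_to_gsm("Hi[\u00D6\u00FC]_@")
puts gsm.unpack('H*').first   # prints 48691b3c5c7e1b3e1100
puts gsm_to_utf8(gsm) == "Hi[\u00D6\u00FC]_@"   # prints true
```

Raising on unmapped characters is a design choice made here; replacing them with a fallback such as `?` would be an equally reasonable policy, depending on what the gateway expects.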

I can now use this for sending SMS via sms_client (smsclient.org) in Ruby:

system "sms_client 0123456789 '#{msg.to_gsm0338}'"

I know it's an old question, but I was looking for a solution to this as well. I found a gem that seems to work well: https://github.com/livebg/smstools

I'm using it in both Rails 4 and 5.




