Better way to convert structured text to business entities
  • Date: 2011-11-22 02:06:03
  • Tags:
  • c#
  • .net

I am trying to find a better way to convert plain text (with a predefined length for each field) into a business entity. For example, the input text can be "Testuser new york 10018": the first 11 characters indicate the user name, the next 12 characters indicate the city, and the next 5 characters indicate the zip code. The input can be long, around 1000 characters, representing multiple properties of an entity.

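Any help is appreciated. Edit:

I tried the following approach.

  1. Defined an XML structure that can be deserialized into the business entity.

  2. Used XSLT to navigate to each node and populate the XML element values by applying the substring function to the input text.

  3. Once the XML is populated, deserialized the XML into the entity.

But I think this approach may not scale, since it has to load multiple XSLT stylesheets to convert different inputs into their corresponding XML documents.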

Answers

Simple, regular text like this can be parsed with regular expressions from the System.Text.RegularExpressions namespace, so something such as this:

// Requires: using System.Text.RegularExpressions;
// Three capture groups, one per fixed-width field: 11, 12 and 5 characters.
static Regex inputParser = new Regex("(.{11})(.{12})(.{5})", RegexOptions.Compiled);

foreach (Match m in inputParser.Matches(yourInput)) {
    BusinessEntity e = new BusinessEntity();
    e.Username = m.Groups[1].Value.TrimEnd(); // Remove spaces from the end; I take it that's what they'll be padded with
    e.City = m.Groups[2].Value.TrimEnd();
    e.ZipCode = m.Groups[3].Value;
    myListOfBusinessEntities.Add(e);
}

If you are dealing with a single format, you can simply write a small class with a method that takes a line of text and returns a new entity.
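A minimal sketch of such a class, assuming the 11/12/5 layout from the question. The names `BusinessEntity`, `BusinessEntityParser` and `Parse` are illustrative, not taken from any existing code:

```csharp
using System;

// Hypothetical entity; property names follow the regex example, the type itself is assumed.
public sealed class BusinessEntity
{
    public string Username { get; set; }
    public string City { get; set; }
    public string ZipCode { get; set; }
}

public static class BusinessEntityParser
{
    // Field widths from the question: 11 + 12 + 5 characters.
    private const int UsernameLength = 11;
    private const int CityLength = 12;
    private const int ZipCodeLength = 5;
    private const int LineLength = UsernameLength + CityLength + ZipCodeLength;

    // Takes one fixed-width line and returns a new entity.
    public static BusinessEntity Parse(string line)
    {
        if (line == null) throw new ArgumentNullException(nameof(line));
        if (line.Length < LineLength)
            throw new FormatException(
                "Expected at least " + LineLength + " characters but got " + line.Length + ".");

        return new BusinessEntity
        {
            // Text fields are assumed to be right-padded with spaces.
            Username = line.Substring(0, UsernameLength).TrimEnd(),
            City = line.Substring(UsernameLength, CityLength).TrimEnd(),
            ZipCode = line.Substring(UsernameLength + CityLength, ZipCodeLength)
        };
    }
}
```

Calling `BusinessEntityParser.Parse("Testuser".PadRight(11) + "new york".PadRight(12) + "10018")` would then give an entity whose `Username` is "Testuser", `City` is "new york" and `ZipCode` is "10018". Keeping the widths as named constants makes the layout easy to check against the format specification.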

If your lines are padded with blanks so that every line has the same fixed length, a BinaryReader combined with the System.Text.Encoding class and its GetString method can give a faster solution.
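A rough sketch of that idea follows. It assumes a single-byte encoding (ASCII here), so that one character is one byte, and records stored back to back with no line terminators. If the file has line breaks, the record length would need to include them. `FixedRecordReader` and `ReadRecords` are made-up names, and no speed gain has been measured here:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class FixedRecordReader
{
    // Layout from the question: 11 + 12 + 5 bytes per record (assumes one byte per character).
    private const int RecordLength = 28;

    // Yields { username, city, zip } for each complete record in the stream.
    // Note: disposing the BinaryReader also closes the underlying stream.
    public static IEnumerable<string[]> ReadRecords(Stream input)
    {
        Encoding encoding = Encoding.ASCII;
        using (var reader = new BinaryReader(input, encoding))
        {
            while (true)
            {
                // ReadBytes returns fewer bytes than requested at the end of the stream.
                byte[] record = reader.ReadBytes(RecordLength);
                if (record.Length < RecordLength)
                    yield break; // A trailing partial record is silently dropped.

                yield return new[]
                {
                    encoding.GetString(record, 0, 11).TrimEnd(),
                    encoding.GetString(record, 11, 12).TrimEnd(),
                    encoding.GetString(record, 23, 5)
                };
            }
        }
    }
}
```

Each record is read with a single `ReadBytes` call and sliced by byte offset, so no line scanning or regex matching is involved. With a multi-byte encoding such as UTF-8 the byte offsets would no longer line up with character positions, and this approach would not apply as written.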

Based on the refinement of the question, I am inferring that you have multiple different formats for different inputs. Here is an implementation of IFormatter that should get you most of the way there. Note that this is broken in several different ways, hacky, and comes with no sort of guarantee:

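using System;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Runtime.Serialization;
using System.Text;

// Round-trips an object through the fixed-width serializer defined below.
void Test()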
{
    var serializer = new FixedWidthSerializer<MyClass>();
    var ms = new MemoryStream();
    serializer.Serialize(ms, new MyClass { Age = 30, FirstName = "John", LastName = "Doe"});
    ms.Position = 0;
    var newMyClass = (MyClass)serializer.Deserialize(ms);
}

[Serializable]
private class MyClass
{
    public String FirstName { get; set; }
    public String LastName;
    public Int32 Age { get; set; }
}

public class FixedWidthSerializer<T> : IFormatter
{
    private readonly FixedWidthFieldDefinition[] _fieldDefinition;

    // Default layout: every serializable member of T gets a 100-character field.
    public FixedWidthSerializer()
        : this(FormatterServices.GetSerializableMembers(typeof(T))
            .Select(sm => new FixedWidthFieldDefinition(sm.Name, 100))
            .ToArray())
    { }

    public FixedWidthSerializer(FixedWidthFieldDefinition[] fieldDefinition)
    {
        if (fieldDefinition == null) throw new ArgumentNullException("fieldDefinition");
        _fieldDefinition = fieldDefinition;
        Context = new StreamingContext(StreamingContextStates.All);            
    }

    public class FixedWidthFieldDefinition
    {
        public String FieldName { get; protected set; }
        public Int32 CharLength { get; protected set; }

        public FixedWidthFieldDefinition(String fieldName, Int32 charLength)
        {
            FieldName = fieldName;
            CharLength = charLength;
        }
    }

    public object Deserialize(Stream serializationStream)
    {
        var streamReader = new StreamReader(serializationStream);
        var textLine = streamReader.ReadLine();

        if (textLine == null)
            throw new SerializationException("Ran out of text!");

        var obj = FormatterServices.GetUninitializedObject(typeof (T));
        var memberDictionary = FormatterServices.GetSerializableMembers(obj.GetType(), Context).ToDictionary(mi => mi.Name);

        var offset = 0;
        foreach (var fieldDef in _fieldDefinition)
        {
            if (offset + fieldDef.CharLength > textLine.Length)
                throw new SerializationException("Line was too short!");

            // Read the current field and increase the offset
            var fieldStringValue = textLine.Substring(offset, fieldDef.CharLength);
            offset += fieldDef.CharLength;

            MemberInfo memberInfo;

            if (!memberDictionary.TryGetValue(fieldDef.FieldName, out memberInfo))
                throw new SerializationException("You asked for the member '" + fieldDef.FieldName + "', but it doesn't exist on type '" + typeof (T) + "'");

            var memberAsField = memberInfo as FieldInfo;

            if (memberAsField != null)
                memberAsField.SetValue(obj, Convert.ChangeType(fieldStringValue.TrimEnd(), memberAsField.FieldType));
            else
                throw new SerializationException("I don't know what to make of the property '" + fieldDef.FieldName + "'");
        }
        return obj;
    }

    public void Serialize(Stream serializationStream, object graph)
    {
        var serializableMembers = FormatterServices.GetSerializableMembers(graph.GetType());
        var membersToSerialize = _fieldDefinition.Select(fd => serializableMembers.First(sm => sm.Name == fd.FieldName)).ToArray();
        var objectData = FormatterServices.GetObjectData(graph, membersToSerialize);
        var sb = new StringBuilder(_fieldDefinition.Sum(fd => fd.CharLength));
        for (var i = 0; i < _fieldDefinition.Length; i++)
            sb.Append(((String) Convert.ChangeType(objectData[i], typeof (String))).PadRight(_fieldDefinition[i].CharLength), 0, _fieldDefinition[i].CharLength);
        var sw = new StreamWriter(serializationStream);
        sw.WriteLine(sb.ToString());
        sw.Flush();
    }

    public ISurrogateSelector SurrogateSelector { get; set; }

    public SerializationBinder Binder { get; set; }

    public StreamingContext Context { get; set; }
}



