What is the best file parsing solution for converting files?

I am looking for the best solution for custom file parsing for our enterprise import routines. Basically, I want to convert each incoming file format into one standard file format and have a single routine that imports that data into the database. I need to be able to create custom scripts for each client, since it's difficult to get customers to comply with a standard or template format. I have looked at PowerShell and IronPython so far, but I am not sure this is the route I want to go. I have also looked at tools such as Talend, a drag-and-drop style tool that may or may not give me the flexibility I want. We are a .NET shop and have written custom code for this in the past, but I need something that is quicker to create than coding custom parsing functions each time we get a new file format.

Best answer

Python is wonderful for this kind of thing. That's what we use. Each new customer transfer is a new adventure, and Python gives us the flexibility to respond quickly.


Edit: All Python scripts that read files are "custom file parsers". Without an actual example, it's not sensible to provide a detailed example.

with open( "some file", "r" ) as source:
    for line in source:
        process( line )

That's about all there is to a "custom file parser". If you're parsing .csv or .xml files, then Python has modules for that. If you're parsing fixed-format files, you'd use string slicing operations. If you're parsing other files (X12? JSON? YAML?) you'll need appropriate parsers.
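
As a rough illustration of the "Python has modules for that" point, here is a minimal sketch using the standard library's csv module. The sample data, the column names, and the helper name read_csv_records are made up for the example; a real client file would be opened with open() instead of io.StringIO.

```python
import csv
import io

# Hypothetical sample input standing in for a client's .csv file.
SAMPLE = "id,name,amount\n1,Widget,9.50\n2,Gadget,12.00\n"

def read_csv_records(source):
    """Yield one dict per data row, keyed by the names in the header row."""
    for row in csv.DictReader(source):
        yield row

# io.StringIO lets the sketch run without a file on disk.
records = list(read_csv_records(io.StringIO(SAMPLE)))
print(records[0]["name"])
```

Every value comes back as a string, so any numeric or date conversion still has to happen in the per-client script.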

Tab-delimited.

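from collections import namedtuple
# The trailing ... stands for the remaining field names, elided here.
RecordLayout = namedtuple('RecordLayout', ['field1', 'field2', 'field3', ...])
def process(aLine):
    # Drop the newline, split on tabs, and unpack one value per field.
    record = RecordLayout(*aLine.rstrip('\n').split('\t'))
    ...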
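
A complete, runnable version of the tab-delimited idea might look like the sketch below. The three field names and the function name parse_tab_line are hypothetical; the real ones would come from the client's file specification.

```python
from collections import namedtuple

# Hypothetical three-column layout, chosen only for illustration.
Record = namedtuple("Record", ["customer_id", "name", "amount"])

def parse_tab_line(line):
    """Split one tab-delimited line into a Record of strings."""
    values = line.rstrip("\r\n").split("\t")
    # Unpacking raises TypeError if the line has the wrong number of columns.
    return Record(*values)

record = parse_tab_line("42\tAcme Corp\t19.99\n")
print(record.name)
```

Letting the unpacking fail on a wrong column count is a deliberate choice in this sketch: a malformed line surfaces immediately instead of producing a silently shifted record.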

Fixed layout.

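from collections import namedtuple
# The trailing ... stands for the remaining field names, elided here.
RecordLayout = namedtuple('RecordLayout', ['field1', 'field2', 'field3', ...])
def process(aLine):
    # Slice each field out by column position; ... stands for further slices.
    fields = (aLine[:10], aLine[10:20], aLine[20:30], ...)
    # Unpack the tuple so each slice fills its own field.
    record = RecordLayout(*fields)
    ...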
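
The fixed-layout approach can also be driven by a small table of column positions, so that a new client format means editing data rather than code. This is only a sketch: the layout, the field names, and parse_fixed_line are assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical layout: (field name, start column, end column), end exclusive.
LAYOUT = [("customer_id", 0, 10), ("name", 10, 20), ("amount", 20, 30)]
Record = namedtuple("Record", [name for name, _, _ in LAYOUT])

def parse_fixed_line(line):
    """Slice each field out of a fixed-width line and strip the padding."""
    values = [line[start:end].strip() for _, start, end in LAYOUT]
    return Record(*values)

# Build a 30-character sample line: three 10-character columns.
line = "0000000042" + "Acme Corp".ljust(10) + "19.99".rjust(10)
record = parse_fixed_line(line)
print(record)
```
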

Other answers

Depending on the complexity and variability of your work, you should consider an ETL tool like SSIS (SQL Server Integration Services).




