English 中文(简体)
Best practices when migrating data from one database scheme to another?
原标题:
  • 时间:2009-11-19 12:59:44
  •  标签:
  • etl

Often times when I am working on a project, I find my self looking at the database scheme and having to export the data to work with the new scheme.

Lots of times there has been a database where the data stored was fairly crude. What I mean by that is that its stored with lots of unfiltered characters. I find my self writing custom php scripts to filter through this information and create a nice clean UTF-8 CSV file that I then reimport into my new database.

I d like to know if there are better ways to handle this?

最佳回答

You can consider Logstash.

logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (like, for searching)

Logstash operate every single event/log like pipe: input | filter | output. Logstash have many input plugins to accept different sources/formats, and you can use filter to parse your source data then output to multiple outputs/formats which you need.

问题回答

I would suggest using an ETL tool, or at least following ETL practices when moving data. Considering that you are already cleaning, you may follow the whole ECCD path -- extract, clean, conform, deliver. If you do your own cleaning, consider saving intermediate csv files for debug and audit purpose.

1. Extract (as is, junk included) to file_1
2. Clean file_1 --> file_2
3. Conform file_2 --> file_3
4. Deliver file_3 --> DB tables

If you archive files 1-3 and document versions of your scripts, you will be able to backtrack in case of a bug.

ETL tools -- like Microsoft SSIS, Oracle Data Integrator, Pentaho Data Integrator -- connect to various data sources and offer plenty of transformation and profiling tasks.

No one answer to this one, but i once needed to quickly migrate a database and ended up using sqlautocode, which is a tool to autogenerate a (python orm) model from an existing database - the model uses the great sqlalchemy orm library. It even generates some sample code, to get started ... (see below)

Amazingly, it worked out of the box. You do not have a full migration, but an easy way to programmatically access all your tables (in python).

I didn t do it at that project, but you could of course autogenerate your orm layer for the target DB as well, then write a script, which transfers the right rows over into the desired structure.

Once you get your DB content into python, you will be able to deal with u unicode , even if it will take some attepts, dependent on the actual crudeness ...

Example code:

# some example usage
if __name__ ==  __main__ :
db = create_engine(u mysql://username:password@localhost/dbname )
metadata.bind = db

# fetch first 10 items from address_book
s = customers.select().limit(10)
rs = s.execute()
for row in rs:
    print row




相关问题
Is Web Service suitable for ETL purpose?

My company is considering using web service as mean of ETL process. However I don t think web service fit into this purpose, for several reasons: 1. web service could possibly consume a lot of memory ...

What is the best file parsing solution for converting files?

I am looking for the best solution for custom file parsing for our enterprise import routines. I want to basically change one file format into a standard file format and have one routine that imports ...

SSIS Package failing with New structure of Flatfile

SSIS package is just importing from txt file to sql database. when we made the package were using old file and its executing fine.the old source file got (10 columns) the new source file got 15 ...

Is Pentaho ETL and Data Analyzer good choice?

I was looking for ETL tool and on google found lot about Pentaho Kettle. I also need a Data Analyzer to run on Star Schema so that business user can play around and generate any kind of report or ...

ETL Tools and Build Tools

I have familiarities with software automated build tools ( such as Automated Build Studio). Now I am looking at ETL tools. The one thing crosses my mind is that, I can do anything I can do in ETL ...

热门标签