Best practices for migrating database content from a very poor structure to a very logical one?

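TL;DR: What is the best way to migrate data from an extremely poorly structured database (several duplicated columns, no relationships, and duplicated data) into another, highly organized and relational one? Long read ahead!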

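I've recently taken on a very complex job: rewriting a company-wide online IT platform. I'm wary of giving too much detail in case the old developer gets wind of it. He is holding the head of the company to ransom, because he is the only one who knows how to do important things like producing invoices, and he keeps demanding more and more money.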

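The main problem is that the entire web platform (used by all staff and all clients) was coded by a guy whose skills are no better than amateur. It consists of about 300 individual code files. There is no template library; everything is hard-coded into every file. There is no logical database structure; it was effectively made up as he went along. There is no security at all; it's shocking. Anyway, we are going to rewrite the entire platform in about 3 months.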

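However, the boss says that when it goes live, no customer data can be lost anywhere. The entire database contents must be carried across directly. The current database structure is so poor that it is almost impossible to work with, but this week we are going to (try to!) write some scripts to move the data onto our new, highly relational structure, which is much more logical. The question is: how best to do this?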

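One example is addresses. In the old database, about 12 tables (out of 44) use addresses. In our new structure we have an address table, which other tables reference (e.g. through an address_id column) to keep things clean. The main problem is that in about half of his tables, the addresses are stored inline as a run of separate columns (address line 1, address line 2, and so on) inside the table itself. Addresses all over the place!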
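As a rough illustration of that address clean-up, here is a minimal Python sketch. It pulls the inline address columns of one legacy table out into a separate address table and links the two by address_id. The table and column names (customers, addr1, addr2, town, postcode, new_customer) are hypothetical. An in-memory SQLite database stands in for MySQL so the example runs on its own.

```python
import sqlite3


def migrate_inline_addresses(conn: sqlite3.Connection) -> None:
    """Split inline address columns out of the (hypothetical) legacy customers table.

    Each legacy row gets its own row in the new address table, and the new
    customer row points at it through address_id.
    """
    rows = conn.execute(
        "SELECT id, name, addr1, addr2, town, postcode FROM customers"
    ).fetchall()
    for cust_id, name, addr1, addr2, town, postcode in rows:
        cur = conn.execute(
            "INSERT INTO address (line1, line2, town, postcode) VALUES (?, ?, ?, ?)",
            (addr1, addr2, town, postcode),
        )
        # lastrowid is the primary key SQLite assigned to the new address row
        conn.execute(
            "INSERT INTO new_customer (id, name, address_id) VALUES (?, ?, ?)",
            (cust_id, name, cur.lastrowid),
        )
    conn.commit()


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT,
                                addr1 TEXT, addr2 TEXT, town TEXT, postcode TEXT);
        CREATE TABLE address (id INTEGER PRIMARY KEY, line1 TEXT, line2 TEXT,
                              town TEXT, postcode TEXT);
        CREATE TABLE new_customer (id INTEGER PRIMARY KEY, name TEXT,
                                   address_id INTEGER REFERENCES address(id));
        INSERT INTO customers VALUES (1, 'Acme', '1 High St', NULL, 'Leeds', 'LS1 1AA');
    """)
    migrate_inline_addresses(conn)
    return conn.execute(
        "SELECT c.name, a.line1, a.town FROM new_customer c "
        "JOIN address a ON a.id = c.address_id"
    ).fetchall()


if __name__ == "__main__":
    print(demo())  # [('Acme', '1 High St', 'Leeds')]
```

Each affected legacy table would need its own variant of this function, using that table's column names.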

A second example is dates - in some tables he has seconds-since-Epoch dates, in others MySQL NOW() dates, and in others he literally stores it in 6 columns per row - year, month, day, hour, minute, second - ouch...
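One way to tame the three date encodings is a single normalizing function that every migration script calls. This sketch makes three assumptions:

- epoch values arrive as ints;
- NOW()-style values arrive as 'YYYY-MM-DD HH:MM:SS' strings;
- the six split columns arrive as a tuple.

Note that MySQL's NOW() uses the session time zone, while epoch seconds are UTC. Check which zone the old server used before treating the two as equivalent.

```python
from datetime import datetime, timezone


def normalize_date(value) -> datetime:
    """Turn any of the three legacy date encodings into one naive UTC datetime.

    Assumptions to verify against the real data:
    - int: seconds since the Unix epoch (UTC);
    - str: a MySQL DATETIME string as written by NOW(), e.g. '2011-03-04 09:15:00',
      assumed here to already be UTC (NOW() really uses the session time zone);
    - 6-tuple: (year, month, day, hour, minute, second) from the split columns.
    """
    if isinstance(value, datetime):  # some MySQL drivers already return datetimes
        return value
    if isinstance(value, int):
        return datetime.fromtimestamp(value, tz=timezone.utc).replace(tzinfo=None)
    if isinstance(value, str):
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    if isinstance(value, tuple) and len(value) == 6:
        # int() also accepts the parts if they were stored as strings
        return datetime(*(int(part) for part in value))
    raise ValueError(f"unrecognised legacy date: {value!r}")
```

A value that fits none of these shapes raises ValueError. So does a MySQL zero date such as '0000-00-00 00:00:00'. Either way, bad dates surface during testing instead of being converted silently.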

  • How do we approach this? Should we look at our tables and work out which data we need from his, or should we turn it around, look at his tables, and work out where their data needs to go in ours?

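  • How should we handle this from a programming perspective? A lot of the data needs dynamic formatting (e.g. dates), so we were thinking of pulling the data out one row at a time, formatting it correctly, and then inserting it into the right place in our structure via a script.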

  • Speed and efficiency of queries is not an issue for us, as we will only need to run this once (after testing), on our local machines. His database is currently ~800MB when SQL dumped, but again a lot of this is his useless test data, or just totally unnecessary.
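The row-at-a-time plan in the second point could look roughly like this in Python. The sketch streams rows from the legacy connection, passes each one through a per-table formatting function, and inserts the result into the new database. The table names and the order_row formatter are hypothetical. Two in-memory SQLite databases stand in for the old and new MySQL databases.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable


def copy_rows(
    src: sqlite3.Connection,
    dst: sqlite3.Connection,
    select_sql: str,
    insert_sql: str,
    transform: Callable[[tuple], tuple],
) -> int:
    """Read legacy rows one at a time, reformat each, insert into the new DB.

    Returns the number of rows copied.
    """
    count = 0
    for row in src.execute(select_sql):  # iterate the cursor, no full fetch
        dst.execute(insert_sql, transform(row))
        count += 1
    dst.commit()
    return count


def order_row(row: tuple) -> tuple:
    """Hypothetical formatter for one table: epoch seconds -> DATETIME text."""
    order_id, created_epoch = row
    created = datetime.fromtimestamp(created_epoch, tz=timezone.utc)
    return order_id, created.strftime("%Y-%m-%d %H:%M:%S")


def demo() -> tuple:
    old = sqlite3.connect(":memory:")
    new = sqlite3.connect(":memory:")
    old.executescript(
        "CREATE TABLE orders (id INTEGER, created INTEGER);"
        "INSERT INTO orders VALUES (1, 0), (2, 86400);"
    )
    new.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
    n = copy_rows(
        old, new,
        "SELECT id, created FROM orders ORDER BY id",
        "INSERT INTO orders (id, created_at) VALUES (?, ?)",
        order_row,
    )
    return n, new.execute("SELECT id, created_at FROM orders ORDER BY id").fetchall()


if __name__ == "__main__":
    print(demo())
```

Speed does not matter for a one-off run, so the simple per-row loop is fine. It also makes it obvious which row a failure came from.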

Any ideas on the best way to tackle this? For reference our system will be re-written in PHP so any PHP-based recommendations would be nice. The database is currently (and still will be) in MySQL.

Answers

There's no shortcut here. No magic. Just plain hard work.

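You have your new schema. The only way to get there is to work out, on paper or on a whiteboard, how each table maps logically onto the new schema, one table at a time.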
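One way to keep the whiteboard mapping honest is to write it down as data. You can then generate the SELECT statements from it and check which legacy columns nothing maps to. This is a hypothetical sketch: the table and column names (tblCust, CustID, inv, and so on) are invented for illustration.

```python
from __future__ import annotations

# Hypothetical mapping copied from the whiteboard:
# new table -> (legacy table, {new column: legacy column})
MAPPING = {
    "customer": ("tblCust", {"id": "CustID", "name": "CustName", "email": "EMail"}),
    "invoice": ("inv", {"id": "inv_no", "customer_id": "cust", "total": "amt"}),
}


def select_for(new_table: str) -> str:
    """Build the SELECT that reads a legacy table using the new column names."""
    legacy_table, columns = MAPPING[new_table]
    cols = ", ".join(f"{old} AS {new}" for new, old in columns.items())
    return f"SELECT {cols} FROM {legacy_table}"


def unmapped_columns(legacy_table: str, legacy_columns: list[str]) -> list[str]:
    """Legacy columns that no mapping entry reads, i.e. data that would be dropped."""
    used = {
        old
        for table, columns in MAPPING.values()
        if table == legacy_table
        for old in columns.values()
    }
    return [col for col in legacy_columns if col not in used]
```

Feed unmapped_columns the real column list (in MySQL, from information_schema.COLUMNS). The result is a checklist of data that would otherwise be dropped without anyone deciding to drop it.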

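You'll be dealing with more than simple formatting issues. You'll also be dealing with duplicated data. If you have 12 tables with addresses but only one user, which address wins?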

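That decision alone can simplify a lot of the processing (if, for example, you can ignore every address other than the one linked to the master customer record).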
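Once the stakeholders have decided which address wins, the rule can be written down as an explicit priority list. The sketch below assumes a hypothetical SOURCE_PRIORITY order and represents addresses as dicts. It also reports the sources that disagree with the winner, so those differences can be logged rather than discarded silently. Sources missing from SOURCE_PRIORITY are never chosen.

```python
from __future__ import annotations

# Hypothetical precedence agreed with the stakeholders: earlier sources win.
SOURCE_PRIORITY = ["customer_master", "invoices", "orders", "support_tickets"]


def has_content(addr: dict | None) -> bool:
    """True if the address exists and has at least one non-blank field."""
    return bool(addr) and any(str(v or "").strip() for v in addr.values())


def pick_address(candidates: dict[str, dict | None]) -> tuple[str, dict] | None:
    """Return (source, address) for the highest-priority non-blank candidate."""
    for source in SOURCE_PRIORITY:
        addr = candidates.get(source)
        if has_content(addr):
            return source, addr
    return None


def conflicting_sources(candidates: dict[str, dict | None], winner: dict) -> list[str]:
    """Sources whose non-blank address differs from the winner (worth logging)."""
    return [s for s, a in candidates.items() if has_content(a) and a != winner]
```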

That brings you to the last issue: "not losing any data" during the conversion.

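From day one, this is likely a non-starter, depending on what "not losing any data" means. For example, if you are discarding addresses, there is data loss. Every entity still "has an address", but not necessarily the one it had before. They may all have been the same before, or they may not. It can get very confusing.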

Once your mapping and other processes are worked out, coding them is straightforward in almost any language. Scripting languages work well for this. You could bulk load each of the tables "as is" into a new DB and write stored procedures to do the conversion; use whatever you're conversant with. Your conversion will likely take several steps, and most of this code will be "one off", existing solely to facilitate the conversion.
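Here is a minimal sketch of the "bulk load as is, then convert in the database" approach. The legacy rows sit untouched in a staging table, and a single INSERT ... SELECT reshapes them. SQLite stands in for MySQL here and has no stored procedures, so the statement is run from Python. In MySQL, the same kind of statement could live in a stored procedure, using MySQL's own string and date functions instead of SQLite's printf. The table and column names are hypothetical.

```python
import sqlite3

# Hypothetical conversion step. The legacy rows were bulk-loaded unchanged
# into legacy_events; one INSERT ... SELECT reshapes them into the new table.
CONVERSION_SQL = """
INSERT INTO event (id, title, happened_at)
SELECT id,
       TRIM(title),
       -- rebuild one timestamp from the six split date columns
       printf('%04d-%02d-%02d %02d:%02d:%02d', yr, mon, dy, hr, mi, se)
FROM legacy_events
WHERE title IS NOT NULL
"""


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE legacy_events (id INTEGER, title TEXT, yr INTEGER, mon INTEGER,
                                    dy INTEGER, hr INTEGER, mi INTEGER, se INTEGER);
        INSERT INTO legacy_events VALUES (1, '  Launch ', 2009, 7, 1, 14, 5, 9),
                                         (2, NULL, 2009, 7, 2, 0, 0, 0);
        CREATE TABLE event (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                            happened_at TEXT NOT NULL);
    """)
    with conn:  # commit the conversion if it succeeds, roll back if it fails
        conn.execute(CONVERSION_SQL)
    return conn.execute("SELECT id, title, happened_at FROM event").fetchall()


if __name__ == "__main__":
    print(demo())  # [(1, 'Launch', '2009-07-01 14:05:09')]
```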

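This will be painful. These things always are. It is very detail-oriented work. It's a horrible system, and all the reasons it is horrible are the reasons this conversion will be horrible. I wouldn't be surprised if you haven't budgeted enough time for it.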

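Finally, depending on how much data you have, you may face time constraints if you can't do the cutover outside working hours (over a weekend, overnight, whenever). If you have to redo the work against the latest live data, that's a whole other kettle of fish. I strongly recommend avoiding that if at all possible.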

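I've done a couple of larger migrations recently, and over time I've developed a few practical best practices. Nothing groundbreaking, but you may find some of them helpful: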

Preparation

  • Before you start, make sure you understand the existing data model and the requirements for the new version of the system.
  • Design the new database schema as well as you can, and try not to let the fact that you'll need to migrate the old content constrain you.
  • Use a framework with a solid ORM. Not only will it be easier to develop the new version but also the migration will be much easier.

Migration scripts

The code dealing with the data migration will be part of your project for some time, so it's a good idea to give it a dedicated package/folder (e.g. legacy). Keep your conversion scripts and other files related to the legacy system in this package. Later on, you'll be able to get rid of it with a simple rm -rf legacy.

The scripts should do the conversion in small steps. It's better to loop over a table several times and keep each step small, simple, and debuggable than to have one big script that does everything, even if that would be faster.

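It's also a good idea to run each step in its own transaction and commit only once the step has completed successfully. That way, if a step fails, you don't have to rerun the entire migration.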
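A per-step transaction runner might look like the following Python sketch, with SQLite standing in for MySQL. Each step commits only if it finishes. A failing step is rolled back, and its name is not added to done, so a rerun skips the steps that already succeeded. Keeping done in memory is a simplification; a real run would probably persist it, for example in a small bookkeeping table. Note that in MySQL, DDL statements such as CREATE TABLE cause an implicit commit, so steps that change the schema cannot be rolled back this way.

```python
from __future__ import annotations

import sqlite3
from typing import Callable

Step = Callable[[sqlite3.Connection], None]


def run_steps(conn: sqlite3.Connection, steps: list[tuple[str, Step]], done: set[str]) -> None:
    """Run each migration step in its own transaction.

    A step's changes are committed only if it returns normally. On error they
    are rolled back and the exception propagates, so the step can be fixed
    and the migration rerun; steps already listed in `done` are skipped.
    """
    for name, step in steps:
        if name in done:
            continue
        with conn:  # commits on success, rolls back on exception
            step(conn)
        done.add(name)
```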

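The whole migration, as well as individual steps or groups of steps, should be runnable from the command line. You will run them many times before you reach the final version, so the more automated the process is, the better.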

The main script (e.g. legacy/bin/full-migration) should perform the whole process: fetch a fresh copy of the legacy production DB, (re-)create the new DB and its tables, and run the whole migration. It should be exactly the same process you'll eventually run after deploying the new version to the production server(s), only with a different configuration. This lets you test it thoroughly in your development environment.
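Here is a hypothetical command-line entry point in the spirit of legacy/bin/full-migration. With no arguments it runs every step in the canonical order; named steps can also be run on their own. The step functions only print here. The --config option is an assumption, standing in for the dev versus production settings.

```python
from __future__ import annotations

import argparse


# Hypothetical steps; each would do one small, rerunnable piece of the work.
def fetch_legacy_copy(config: str) -> None:
    print(f"[{config}] fetching a fresh copy of the legacy production DB")


def create_new_schema(config: str) -> None:
    print(f"[{config}] (re-)creating the new DB and its tables")


def migrate_customers(config: str) -> None:
    print(f"[{config}] migrating customers")


def migrate_invoices(config: str) -> None:
    print(f"[{config}] migrating invoices")


# Canonical order: a full migration runs every step in this order.
STEPS = {
    "fetch": fetch_legacy_copy,
    "schema": create_new_schema,
    "customers": migrate_customers,
    "invoices": migrate_invoices,
}


def main(argv: list[str] | None = None) -> list[str]:
    parser = argparse.ArgumentParser(description="Legacy data migration")
    parser.add_argument("steps", nargs="*", help="steps to run (default: all)")
    parser.add_argument("--config", default="dev", help="environment settings to use")
    args = parser.parse_args(argv)
    requested = args.steps or []
    unknown = [s for s in requested if s not in STEPS]
    if unknown:
        parser.error(f"unknown step(s): {', '.join(unknown)}")
    # Keep the canonical order even if steps were given in another order.
    selected = [s for s in STEPS if s in requested] if requested else list(STEPS)
    for name in selected:
        STEPS[name](args.config)
    return selected


if __name__ == "__main__":
    main()
```

Invoked with --config prod and no step names, the script runs everything against the production settings. During development, passing just customers reruns that single step.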

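Since the conversion takes a long time, log every action (e.g. the action plus the object_id it applies to). There are often a couple of rows with some unexpected quirk that crashes your script or causes a referential integrity error. If the log clearly shows which object it was, you can go straight to that row, inspect the data, update the script accordingly, and rerun the failed step.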
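Here is a sketch of that kind of logging using Python's standard logging module. Every action is logged with the step name and the object's id. On failure, the offending id is logged with a traceback before the exception is re-raised. The step name, the row shape (dicts with an "id" key), and the convert callback are assumptions for illustration.

```python
import logging

log = logging.getLogger("legacy.migration")


def migrate_each(rows, convert, step_name: str) -> None:
    """Convert rows one by one, logging the step and object id for every action.

    On failure the offending id is logged (with traceback) before re-raising,
    so you can go straight to that row in the legacy DB.
    """
    for row in rows:
        obj_id = row["id"]
        log.info("%s id=%s", step_name, obj_id)
        try:
            convert(row)
        except Exception:
            log.exception("%s FAILED id=%s", step_name, obj_id)
            raise


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
    migrate_each([{"id": 1}, {"id": 2}], lambda row: None, "invoices")
```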

One thing that has proven very useful to me is defining model classes for the legacy database tables too, using the ORM. I've done this a couple of times in Django, which supports multiple database connections and per-model routing, so I was able to write scripts that looked roughly like this (Python):

```python
from legacy import models as old
from catalog import models as new

# Loop through all products from the legacy DB
for old_product in old.Product.objects.all():
    # Create an instance of the new product model class
    new_product = new.Product()
    # Copy and modify attributes as needed
    new_product.name = old_product.product_name.strip()
    # ...
    # Save it to the new database
    new_product.save()
```

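Also, the stricter the new schema, the better (foreign key constraints wherever possible, and so on). A strict schema helps you discover where your assumptions about the old schema were wrong, and it keeps bad data out of your new system. Using InnoDB as the MySQL storage engine is a good idea.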
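To see a strict schema catching bad assumptions, here is a small sketch using SQLite, where foreign keys must be switched on per connection. The tables are hypothetical. Each of these is rejected with an IntegrityError instead of slipping through:

- an invoice for a customer that doesn't exist;
- a negative total;
- a missing email.

In MySQL, InnoDB enforces foreign keys (MyISAM does not), and CHECK constraints are only enforced from MySQL 8.0.16 onward.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE customer (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE invoice (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    total_pence INTEGER NOT NULL CHECK (total_pence >= 0)
);
"""


def new_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> Optional[str]:
    """Return None if the row was accepted, else the constraint error message."""
    try:
        conn.execute(sql, params)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)
```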

Another good practice is to preserve the old primary keys in the new database where possible. If you see something strange in the new data after the migration, you can go back and look up the item by its ID in the legacy system.
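Preserving the old keys can be as simple as inserting the legacy primary key explicitly instead of letting the database assign one. In this SQLite sketch (hypothetical tblCust and customer tables), rows created later still receive fresh ids above the largest migrated one. MySQL's AUTO_INCREMENT behaves similarly after explicit values are inserted.

```python
import sqlite3


def copy_customers_keeping_ids(old: sqlite3.Connection, new: sqlite3.Connection) -> None:
    """Copy legacy customers, reusing each legacy primary key as the new id."""
    rows = old.execute("SELECT CustID, CustName FROM tblCust").fetchall()
    # id is supplied explicitly instead of being assigned by the database
    new.executemany("INSERT INTO customer (id, name) VALUES (?, ?)", rows)
    new.commit()


def demo() -> tuple:
    old = sqlite3.connect(":memory:")
    new = sqlite3.connect(":memory:")
    old.executescript(
        "CREATE TABLE tblCust (CustID INTEGER, CustName TEXT);"
        "INSERT INTO tblCust VALUES (7, 'Acme'), (42, 'Globex');"
    )
    new.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    copy_customers_keeping_ids(old, new)
    # A row created later by the new application still gets a fresh id
    cur = new.execute("INSERT INTO customer (name) VALUES ('Initech')")
    ids = [r[0] for r in new.execute("SELECT id FROM customer ORDER BY id")]
    return ids, cur.lastrowid


if __name__ == "__main__":
    print(demo())  # ([7, 42, 43], 43)
```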

The first step of doing a rewrite is fully understanding the current data structure and the code that runs over it. There may be some data which appears redundant but the code requires it to be so for some odd reason. Is it poor design? Probably - but make sure you completely understand each bit of code that writes or accesses data, so you can determine what can be dropped, what must be refactored, and what must be left as is.

Tools can help automate the process - but without a deep grasp of the current system, they can automate you into a corner.

I would design the new data structure, write scripts to transfer the old structure to the new one, and then test the functionality. If there are problems, alter the new structure and/or the import scripts, run the data transfer routine again, and repeat the whole process until you are sure that no data or functionality is being lost. At that point, arrange a date to shut down the old system, do the data migration, and then bring up the new system.
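Part of "repeat until sure nothing is being lost" can be automated with reconciliation queries that compare the old and new databases after every run. This sketch uses hypothetical table names and compares a row count and a money total. Any test data deliberately left behind would have to be filtered out of the old-side query, or the numbers will never match.

```python
import sqlite3

# Hypothetical checks: (label, query on the old DB, query on the new DB).
# Each query returns a single value; the two values should match.
CHECKS = [
    ("customer count", "SELECT COUNT(*) FROM tblCust", "SELECT COUNT(*) FROM customer"),
    ("invoice total", "SELECT SUM(amt) FROM inv", "SELECT SUM(total) FROM invoice"),
]


def verify(old: sqlite3.Connection, new: sqlite3.Connection, checks=CHECKS) -> list:
    """Return (label, old_value, new_value) for every check that disagrees."""
    mismatches = []
    for label, old_sql, new_sql in checks:
        before = old.execute(old_sql).fetchone()[0]
        after = new.execute(new_sql).fetchone()[0]
        if before != after:
            mismatches.append((label, before, after))
    return mismatches


def demo() -> list:
    old = sqlite3.connect(":memory:")
    new = sqlite3.connect(":memory:")
    old.executescript("""
        CREATE TABLE tblCust (CustID INTEGER);
        CREATE TABLE inv (inv_no INTEGER, amt INTEGER);
        INSERT INTO tblCust VALUES (1), (2);
        INSERT INTO inv VALUES (1, 100), (2, 250);
    """)
    new.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY);
        CREATE TABLE invoice (id INTEGER PRIMARY KEY, total INTEGER);
        INSERT INTO customer VALUES (1), (2);
        INSERT INTO invoice VALUES (1, 100);  -- invoice 2 went missing
    """)
    return verify(old, new)


if __name__ == "__main__":
    print(demo())  # [('invoice total', 350, 100)]
```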

Missing from all this, of course, is training the users on the new, improved system. This is vital! Don't leave it out of your plan, or even the best shiny new system will sink because of user unhappiness.

A thought...

Why not hide your new, fixed, shiny schema behind views that make it look like the old one?

This means you have two client code bases on the same data, though each has its own "API" in the database.

This also means that the old system is never actually switched off on "go live".
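Here is a minimal sketch of that idea. A view named like an old table, built on top of the new normalized tables, lets an old read query run unchanged. The names are hypothetical, and SQLite stands in for MySQL. Writes are the harder part. In MySQL, a view over a join is only partly updatable, and triggers cannot be attached to views, so the old code's INSERTs and UPDATEs may still need changing.

```python
import sqlite3

# Hypothetical view that re-creates the old flat customer table on top of
# the new customer + address tables, so old read queries keep working.
COMPAT_VIEW = """
CREATE VIEW tblCust AS
SELECT c.id       AS CustID,
       c.name     AS CustName,
       a.line1    AS addr1,
       a.town     AS town,
       a.postcode AS postcode
FROM customer AS c
LEFT JOIN address AS a ON a.id = c.address_id
"""


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE address (id INTEGER PRIMARY KEY, line1 TEXT, town TEXT, postcode TEXT);
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT,
                               address_id INTEGER REFERENCES address(id));
        INSERT INTO address VALUES (10, '1 High St', 'Leeds', 'LS1 1AA');
        INSERT INTO customer VALUES (1, 'Acme', 10), (2, 'Globex', NULL);
    """)
    conn.execute(COMPAT_VIEW)
    # A query written against the old schema, unchanged
    return conn.execute(
        "SELECT CustID, CustName, town FROM tblCust ORDER BY CustID"
    ).fetchall()


if __name__ == "__main__":
    print(demo())  # [(1, 'Acme', 'Leeds'), (2, 'Globex', None)]
```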

First, when designing your new structure, include columns to hold the record identifiers from the old system and the table each record came from. You can drop these once the move has been proven successful. Until then, they help tremendously in migrating the data, in testing that it is correct after migration, and in answering questions about where data came from when users are surprised by what they see. If the old data doesn't have PKs, create them with some kind of autonumber field.
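Here is a sketch of those provenance columns (legacy_table and legacy_id are hypothetical names). A unique constraint on the pair means rerunning an import cannot create duplicates, and a small lookup answers "where did this row come from?". The sketch uses SQLite's INSERT OR IGNORE. MySQL's closest equivalent is INSERT IGNORE, which also downgrades some other errors to warnings, so use it with care.

```python
import sqlite3

SCHEMA = """
CREATE TABLE address (
    id           INTEGER PRIMARY KEY,
    line1        TEXT,
    town         TEXT,
    legacy_table TEXT    NOT NULL,   -- which old table the row came from
    legacy_id    INTEGER NOT NULL,   -- that row's key in the old table
    UNIQUE (legacy_table, legacy_id) -- a rerun cannot import the same row twice
);
"""


def import_address(conn: sqlite3.Connection, legacy_table: str, legacy_id: int,
                   line1: str, town: str) -> None:
    # INSERT OR IGNORE is SQLite syntax; MySQL's closest equivalent is INSERT IGNORE.
    conn.execute(
        "INSERT OR IGNORE INTO address (line1, town, legacy_table, legacy_id) "
        "VALUES (?, ?, ?, ?)",
        (line1, town, legacy_table, legacy_id),
    )


def source_of(conn: sqlite3.Connection, address_id: int):
    """Return (legacy_table, legacy_id) for a migrated address, or None."""
    return conn.execute(
        "SELECT legacy_table, legacy_id FROM address WHERE id = ?", (address_id,)
    ).fetchone()
```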

Work from the parent tables down. If addresses are stored in more than one place, decide in which order you want to take them, and which source takes precedence when there are multiple records that differ. You may want to keep the different addresses as well (the address table is one-to-many with the person table, right?), but then you may need additional address types.
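Working "from the parent tables down" amounts to a topological sort over the foreign-key dependencies. This sketch uses graphlib from the Python standard library (3.9+) with a hypothetical dependency map. The exact order among independent tables may vary, but every parent always comes before its children, and a circular dependency raises graphlib.CycleError.

```python
from graphlib import TopologicalSorter

# Hypothetical FK dependencies in the new schema: table -> tables it references.
DEPENDS_ON = {
    "address": set(),
    "customer": {"address"},
    "product": set(),
    "invoice": {"customer"},
    "invoice_line": {"invoice", "product"},
}


def migration_order(deps: dict) -> list:
    """Order tables so that every parent is migrated before its children."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(migration_order(DEPENDS_ON))
```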

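You'll need to deal with old data that doesn't fit the new data types, sizes, or constraints (e.g. a field you want to make required has no value in the old data). Decide how you want to handle these cases before you start, and get sign-off from the stakeholders. For example, if street1 is required and you only have city and state, you might use the word "Unknown".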

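Move any data that can't be converted to meet the new standards, or where you can't tell how it should be changed, into an exceptions table. The stakeholders or users may need to work through those rows to supply the newly required data or tell you what to change.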
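The sketch below combines the two ideas above. Rows that an agreed rule can fix are fixed; here a hypothetical rule turns a missing street1 into "Unknown". Rows that can't be fixed go to an exceptions table with a reason and the raw legacy row as JSON, for the stakeholders to resolve. The table names and the rule are assumptions, and SQLite stands in for MySQL.

```python
import json
import sqlite3


def create_tables(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE address (legacy_id INTEGER PRIMARY KEY, street1 TEXT NOT NULL,
                              city TEXT NOT NULL, state TEXT NOT NULL);
        CREATE TABLE migration_exception (legacy_id INTEGER, reason TEXT, raw TEXT);
    """)


def migrate_addresses(conn: sqlite3.Connection, rows: list) -> None:
    """Insert fixable rows into address; park the rest in migration_exception."""
    for row in rows:
        street1 = (row.get("street1") or "").strip()
        city = (row.get("city") or "").strip()
        state = (row.get("state") or "").strip()
        if not city or not state:
            # No agreed rule covers this: keep the raw row for a human to resolve.
            conn.execute(
                "INSERT INTO migration_exception (legacy_id, reason, raw) VALUES (?, ?, ?)",
                (row["id"], "missing city/state", json.dumps(row)),
            )
            continue
        # Agreed rule (hypothetical): street1 is required, so use 'Unknown'
        # when only city and state are known.
        conn.execute(
            "INSERT INTO address (legacy_id, street1, city, state) VALUES (?, ?, ?, ?)",
            (row["id"], street1 or "Unknown", city, state),
        )
    conn.commit()
```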

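You'll need to run this several times: first on the dev box, then on the QA box. When moving to production, the run may take longer than the downtime you can afford. In that case, you may need to migrate most of the data before launch and then bring over only the new or changed data at launch.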
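One way to split the work into "most of the data before launch, the rest at launch" is a high-water mark. You remember the largest legacy id already copied and, at cutover, copy only the rows above it. This is a hypothetical sketch (SQLite, invented table names). It only catches new rows. Edited or deleted legacy rows would need a reliable modification timestamp or a full comparison, which a schema like this one may not have.

```python
import sqlite3


def migrate_since(old: sqlite3.Connection, new: sqlite3.Connection, last_id: int) -> int:
    """Copy legacy customers with id > last_id and return the new high-water mark."""
    rows = old.execute(
        "SELECT CustID, CustName FROM tblCust WHERE CustID > ? ORDER BY CustID",
        (last_id,),
    ).fetchall()
    new.executemany("INSERT INTO customer (id, name) VALUES (?, ?)", rows)
    new.commit()
    # Rows are ordered by id, so the last one holds the largest id copied
    return rows[-1][0] if rows else last_id
```

The first call (with last_id=0) would run days before launch. The final call, made during the cutover window with the stored mark, copies only what arrived since.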

There is a lot of work to do, and 3 months is extremely tight for a migration like this. Good luck.
