Postgres dump of only parts of tables for a dev snapshot
  • Asked: 2009-11-16 21:54:33
  • Tags: postgresql

On production our database is a few hundred gigabytes in size. For development and testing, we need to create snapshots of this database that are functionally equivalent, but which are only 10 or 20 gigs in size.

The challenge is that the data for our business entities are scattered across many tables. We want to create some sort of filtered snapshot so that only some of the entities are included in the dump. That way we can get fresh snapshots every month or so for dev and testing.

For example, let's say we have entities with these one-to-many relationships:

  • Company has N Divisions
  • Division has N Employees
  • Employee has N Attendance Records

There are maybe 1000 companies, 2500 divisions, 175000 employees, and tens of millions of attendance records. We want a replicable way to pull, say, the first 100 companies and all of their constituent divisions, employees, and attendance records.

We currently use pg_dump for the schema, and then run pg_dump with --disable-triggers and --data-only to get all the data out of the smaller tables. We don't want to have to write custom scripts to pull out part of the data, because we have a fast development cycle and are concerned the custom scripts would be fragile and likely to fall out of date.
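Concretely, our current snapshot process looks roughly like this (the database and table names here are placeholders, not our real ones):

    # Schema only, no data.
    pg_dump --schema-only proddb > schema.sql

    # Data only, from the smaller tables; --disable-triggers emits
    # statements that disable triggers during the reload, so foreign-key
    # checks don't fire mid-load.
    pg_dump --data-only --disable-triggers \
        --table=companies --table=divisions \
        proddb > small_tables.sql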

How can we do this? Are there third-party tools that can help pull out logical partitions from the database? What are these tools called?

Any general advice also appreciated!

Best Answer

On your larger tables you can use the COPY command to pull out subsets...

COPY (SELECT * FROM mytable WHERE ...) TO '/tmp/myfile.tsv';

COPY mytable FROM 'myfile.tsv';

https://www.postgresql.org/docs/current/static/sql-copy.html
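Applied to the example in the question, the subset cascades down the hierarchy. This is only a sketch; table and column names such as companies.id and divisions.company_id are assumptions, not from the original post:

    -- First 100 companies.
    COPY (SELECT * FROM companies ORDER BY id LIMIT 100)
        TO '/tmp/companies.tsv';

    -- Their divisions.
    COPY (SELECT * FROM divisions
          WHERE company_id IN (SELECT id FROM companies ORDER BY id LIMIT 100))
        TO '/tmp/divisions.tsv';

    -- Employees and attendance records follow the same pattern, each
    -- level filtering on the IDs selected one level up.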

You should consider maintaining a set of development data rather than just pulling a subset of your production data. If you're writing unit tests, you could use the same data that is required for the tests, trying to hit all of the possible use cases.

Other Answers

I don't know of any software that already does this, but I can think of 3 alternative solutions. Unfortunately, they all require some custom coding.

  1. Re-create all the tables in a separate schema, then copy into those tables only the subset of data you would like to dump, using INSERT INTO copy.tablename SELECT * FROM tablename WHERE ..., and dump that (see the sketch after this list).

  2. Write your own script for dumping data as SQL statements. I have used this approach in the past and it only took something like 20-30 lines of PHP.

  3. Modify pg_dump so it accepts a condition along with the -t switch when dumping a single table.
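A minimal sketch of option 1, assuming a scratch schema named copy and the example tables from the question:

    -- Mirror the table structure into a scratch schema.
    CREATE SCHEMA copy;
    CREATE TABLE copy.companies (LIKE companies INCLUDING ALL);
    CREATE TABLE copy.divisions (LIKE divisions INCLUDING ALL);

    -- Copy in only the rows you want in the snapshot.
    INSERT INTO copy.companies SELECT * FROM companies ORDER BY id LIMIT 100;
    INSERT INTO copy.divisions
        SELECT * FROM divisions
        WHERE company_id IN (SELECT id FROM copy.companies);

    -- Then dump just that schema:
    --   pg_dump --schema=copy proddb > subset.sql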




