Database patterns for partitioning large hierarchical datasets

Are there any best practices, patterns, or general advice for partitioning large amounts of hierarchical data?

Think of, say, a database of all the people in a given country that tracks who has worked with whom. Considering the "person" entities in isolation, if a lot of data is kept about each person, a natural approach seems to be to divide the population across multiple horizontal partitions. However, the relations (who worked with whom) could (and will) cross partitions. Clustering on these relations (i.e. using the employer, for example, as the partition key in order to minimize cross-partition references) won't be viable over time as the data becomes more and more cross-linked. Such clustering would also result in unbalanced partitions, which would hamper scalability.

I'm rather stuck right now, so I would be very grateful for any help offered.

Thanks.

Answers

It seems you have three problems:

  1. Storing data about an employee (excluding relationships/hierarchy)
  2. Employer to Employee hierarchy (which can change over time)
  3. Employee to Employee work history (again, changing over time)

To tackle each in turn:

  1. Employee data: This can be partitioned, using a unique id as the primary key and an alternate key on surname + given names + date of birth. Partition either by spreading rows evenly by id, or by other information such as area/region (though that will make some partitions hotter than others).

  2. Employer/employee hierarchy: This needs a secondary table that allows changes over time, e.g. employee id, employer id, start date, end date, keyed by employee id + employer id and, for lookups in the other direction, by employer id + employee id. I recommend reading the following, which might have ideas that work well for the size of your data: http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back

  3. Employee/employee work history: This needs another secondary table, very similar to #2, cross-referencing employees and the time they've worked together, e.g. employee1 id, employee2 id, start date, end date, indexed by each of the ids at a minimum.
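
For point 1, "spreading evenly by id" could be sketched roughly as below. This is only an illustration: the partition count of 8, the function name `partition_for`, and the choice of SHA-256 as the hash are assumptions, not something the answer prescribes.

```python
import hashlib

NUM_PARTITIONS = 8  # hypothetical partition count, chosen for illustration


def partition_for(employee_id: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map an employee id to a partition number by hashing the id.

    Hashing (rather than using id ranges) keeps the spread even
    regardless of how the ids were allocated.
    """
    digest = hashlib.sha256(str(employee_id).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_partitions


# Count how many of 100,000 sequential ids land in each partition.
counts = [0] * NUM_PARTITIONS
for employee_id in range(100_000):
    counts[partition_for(employee_id)] += 1

print(counts)
```

Because the partition depends only on the id, balance does not degrade as employees change employers, which is the problem with using the employer as the partition key. A plain modulo like this does force most rows to move if the partition count changes, so a real system may want a different mapping scheme.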

The key here is: don't attempt to place the relationships/hierarchy within the employee data table. It will be slow and will limit the linking you need (especially as links change over time).
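
A minimal sketch of that separation, using Python's built-in `sqlite3` module purely for illustration. The table and column names are hypothetical. Two details are assumptions beyond the answer: `start_date` is included in the keys so the same pair can appear more than once over time, and the `CHECK` constraint stores each employee pair only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- 1. Plain employee data, no relationship columns.
CREATE TABLE employee (
    id            INTEGER PRIMARY KEY,
    surname       TEXT NOT NULL,
    given_names   TEXT NOT NULL,
    date_of_birth TEXT NOT NULL,
    UNIQUE (surname, given_names, date_of_birth)  -- alternate key
);

-- 2. Employer/employee link, with validity dates.
CREATE TABLE employment (
    employee_id INTEGER NOT NULL,
    employer_id INTEGER NOT NULL,
    start_date  TEXT NOT NULL,
    end_date    TEXT,  -- NULL while the employment is ongoing
    PRIMARY KEY (employee_id, employer_id, start_date)  -- assumption: allows re-hiring
);
-- Lookup "back the other way", from employer to employees.
CREATE INDEX employment_by_employer ON employment (employer_id, employee_id);

-- 3. Employee/employee work history.
CREATE TABLE worked_with (
    employee1_id INTEGER NOT NULL,
    employee2_id INTEGER NOT NULL,
    start_date   TEXT NOT NULL,
    end_date     TEXT,
    PRIMARY KEY (employee1_id, employee2_id, start_date),
    CHECK (employee1_id < employee2_id)  -- assumption: store each pair once
);
CREATE INDEX worked_with_by_employee2 ON worked_with (employee2_id);
""")

# Made-up sample rows: 1 worked with 2, and 2 worked with 3.
conn.executemany(
    "INSERT INTO worked_with VALUES (?, ?, ?, ?)",
    [(1, 2, "2001-01-01", "2003-06-30"), (2, 3, "2004-01-01", None)],
)


def colleagues_of(conn: sqlite3.Connection, employee_id: int) -> list[int]:
    """Return the ids of everyone who has worked with employee_id."""
    rows = conn.execute(
        """
        SELECT employee2_id FROM worked_with WHERE employee1_id = ?
        UNION
        SELECT employee1_id FROM worked_with WHERE employee2_id = ?
        ORDER BY 1
        """,
        (employee_id, employee_id),
    ).fetchall()
    return [row[0] for row in rows]


print(colleagues_of(conn, 2))  # [1, 3]
```

Each half of the `UNION` is served by one of the two indexes, which is why the answer asks for the link table to be indexed by each of the ids.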




