Are there any best practices, patterns, or general advice for partitioning large amounts of hierarchical data?
Think of, say, a database of all the people in a given country that tracks who has worked with whom. Considering the "person" entities in isolation, if a lot of data is kept about each person, a natural approach seems to be to divide the population across multiple horizontal partitions. However, the relations (who worked with whom) could (and will) cross partitions. Clustering on these relations (i.e. using the employer, for example, as the partition key in order to minimize cross-partition references) won't be viable over time as the data becomes more and more cross-linked. Such clustering would also result in unbalanced partitions, which would hamper scalability.
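To make the trade-off concrete, here is a minimal Python sketch of the two options I am weighing. It is only an illustration, not a real implementation: the function names (`partition_for`, `cross_partition_fraction`, `partition_sizes`), the partition count, and the sample people and employers are all made up. It measures, for a given partitioning scheme, what fraction of "worked with" relations cross partitions and how evenly people are spread.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # assumed value, for illustration only


def partition_for(person_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a person ID to a partition number using a stable hash."""
    digest = hashlib.sha256(person_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def cross_partition_fraction(edges, assign) -> float:
    """Fraction of relations whose two endpoints land in different partitions."""
    if not edges:
        return 0.0
    crossing = sum(1 for a, b in edges if assign(a) != assign(b))
    return crossing / len(edges)


def partition_sizes(people, assign) -> Counter:
    """Number of people stored in each partition."""
    return Counter(assign(p) for p in people)


# Hypothetical sample data: person -> current employer, plus "worked with" pairs.
employer_of = {"alice": "acme", "bob": "acme", "carol": "acme", "dave": "globex"}
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]

# Option 1: hash on person ID (balanced in expectation, relations cross freely).
by_hash = cross_partition_fraction(edges, partition_for)

# Option 2: employer as the partition key (fewer crossings, but skewed sizes).
by_employer = cross_partition_fraction(edges, employer_of.get)
employer_sizes = partition_sizes(employer_of, employer_of.get)

print(by_hash, by_employer, dict(employer_sizes))
```

Even in this toy data the employer-keyed scheme puts three of four people in one partition, and the relation between carol and dave still crosses partitions, which is exactly the problem I expect to get worse as the data grows.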
I'm rather stuck right now, so I would be very grateful for any help offered.
Thanks.