Wednesday, May 2, 2018

sql - What are the options for storing hierarchical data in a relational database?

Good Overviews



Generally speaking, you're making a decision between fast read times (for example, nested set) or fast write times (adjacency list). Usually, you end up with a combination of the options below that best fit your needs. The following provides some in-depth reading:





Options




Ones I am aware of and general features:




  1. Adjacency List:


    • Columns: ID, ParentID

    • Easy to implement.

    • Cheap node moves, inserts, and deletes.


    • Expensive to find the level, ancestry & descendants, path

    • Avoid N+1 via Common Table Expressions in databases that support them


  2. Nested Set (a.k.a Modified Preorder Tree Traversal)


    • Columns: Left, Right

    • Cheap ancestry, descendants

    • Very expensive O(n/2) moves, inserts, deletes due to volatile encoding



  3. Bridge Table (a.k.a. Closure Table /w triggers)


    • Uses separate join table with: ancestor, descendant, depth (optional)

    • Cheap ancestry and descendants

    • Writes costs O(log n) (size of subtree) for insert, updates, deletes

    • Normalized encoding: good for RDBMS statistics & query planner in joins

    • Requires multiple rows per node


  4. Lineage Column (a.k.a. Materialized Path, Path Enumeration)



    • Column: lineage (e.g. /parent/child/grandchild/etc...)

    • Cheap descendants via prefix query (e.g. LEFT(lineage, #) = '/enumerated/path')

    • Writes costs O(log n) (size of subtree) for insert, updates, deletes

    • Non-relational: relies on Array datatype or serialized string format


  5. Nested Intervals



    • Like nested set, but with real/float/decimal so that the encoding isn't volatile (inexpensive move/insert/delete)

    • Has real/float/decimal representation/precision issues

    • Matrix encoding variant adds ancestor encoding (materialized path) for "free", but with added trickiness of linear algebra.


  6. Flat Table


    • A modified Adjacency List that adds a Level and Rank (e.g. ordering) column to each record.

    • Cheap to iterate/paginate over

    • Expensive move and delete


    • Good Use: threaded discussion - forums / blog comments


  7. Multiple lineage columns


    • Columns: one for each lineage level, refers to all the parents up to the root, levels down from the item's level are set to NULL

    • Cheap ancestors, descendants, level

    • Cheap insert, delete, move of the leaves

    • Expensive insert, delete, move of the internal nodes

    • Hard limit to how deep the hierarchy can be





Database Specific Notes



MySQL





Oracle






PostgreSQL





SQL Server





  • General summary

  • 2008 offers HierarchyId data type appears to help with Lineage Column approach and expand the depth that can be represented.

No comments:

Post a Comment

plot explanation - Why did Peaches' mom hang on the tree? - Movies & TV

In the middle of the movie Ice Age: Continental Drift Peaches' mom asked Peaches to go to sleep. Then, she hung on the tree. This parti...