10-Join Algorithm

发表于2024-10-31|更新于2024-10-31|DB

|阅读量:

10-Join Algorithm

10.1 join algorithms

尽可能选择较小的表作为外围表

10.2 Join opetators

Output

data

Cost Analysis Criteria

Nested Loop Join

Nested Loop Join

已知表非常小的情况下，可以使用 nested loop join，可以适配 L3 缓存

Index Nested Loop Join

Block Nested loop join

Nested Loop Join Summary

Key Takeaways

选择更小的表作为 Outer table
尽可能在 bufferpool 中缓存 outer table
使用索引快速访问 inner table

Algorithms

Naive
Block
Index

Sort-Merge Join

Sort

Merge

此时 R 中 id 为 200；S 的 id 比 200 大，需要回溯，但是由于表是有序的，不必像 nested loop join 回溯到开头只需要回溯到上一个不大于 R.id 的值就可以

when is sort-merge join useful

其中一个或多个表在 Join key 上已经排序
输出必须在 join key 上排序

Hash Join

simple hash join algorithm

可以在 probe 阶段使用 bloom filter 进行优化；首先概率性地查找是否存在这个 Key

Partitioned hash join

partiion phase
probe phase

如果其中的桶仍然溢出，使用第二个哈希函数进行 rehash

Hybird hash join

仅仅在数据分布及其倾斜的时候使用；将大量使用的 hash 桶保存在内存

Join algorithms summary

hashing is almost always better than sorting for operator executino
sorting is better on non-uniform data
sorting si better when result needs to be sorted

文章作者: LZY

文章链接: http://tom-jerr.github.io/2024/10/31/CMU15445/10-Join%20Algorithm/

版权声明: 本博客所有文章除特别声明外，均采用 CC BY-NC-SA 4.0 许可协议。转载请注明来自 LZY的Code生活！

相关推荐

11 Query Execution

13 Concurrency Control Theory

14 Two Phase Lock

12 Query Planning and Optimization

15 Timestamp Ordering Concurrency Control

16 Multi-Version Concurrency