https://github.com/Thomas-George-T/Movies-Analytics-in-Spark-and-Scala
Execution engine - switch to Tez or Spark (set hive.execution.engine=tez or hive.execution.engine=spark) and add the Tez/Spark client jars to HADOOP_CLASSPATH.
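Partitioning - the PARTITIONED BY clause divides a table into partitions, one subdirectory per partition-column value, so queries that filter on that column read only the matching partitions.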
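A minimal sketch of partition pruning, assuming a hypothetical sales table partitioned by country: rows are grouped under directory-style keys such as country=US, and a query filtering on the partition column reads only the matching group. The function names and rows are illustrative; in real Hive the partition value lives in the directory path, not in the data files.

```python
from collections import defaultdict

# Hypothetical rows of a sales table partitioned by "country".
rows = [
    {"id": 1, "country": "US", "amount": 10},
    {"id": 2, "country": "IN", "amount": 20},
    {"id": 3, "country": "US", "amount": 30},
    {"id": 4, "country": "UK", "amount": 40},
]

def partition(rows, key):
    """Group rows under directory-like partition names, e.g. 'country=US'."""
    parts = defaultdict(list)
    for row in rows:
        parts[f"{key}={row[key]}"].append(row)
    return dict(parts)

def scan_with_pruning(parts, key, value):
    """Read only the partition matching WHERE key = value; all others are skipped."""
    return parts.get(f"{key}={value}", [])

parts = partition(rows, "country")
us_rows = scan_with_pruning(parts, "country", "US")
```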
Bucketing - the CLUSTERED BY ... INTO n BUCKETS clause divides a table (or each partition) into a fixed number of buckets, i.e. files, by hashing the bucketing column.
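A simplified model of how rows land in buckets, assuming bucket = hash(key) mod number of buckets, with an integer key hashed to itself. NUM_BUCKETS and the keys are hypothetical, and Hive's real hash function depends on the column type and bucketing version. The property that matters is that equal keys always go to the same bucket number.

```python
NUM_BUCKETS = 4  # hypothetical: CLUSTERED BY (user_id) INTO 4 BUCKETS

def bucket_for(key: int, num_buckets: int = NUM_BUCKETS) -> int:
    # Simplified hash: mask the integer key to non-negative, then take the modulo.
    return (key & 0x7FFFFFFF) % num_buckets

def bucketize(keys, num_buckets: int = NUM_BUCKETS):
    """Return one list per bucket file, holding the keys assigned to it."""
    buckets = [[] for _ in range(num_buckets)]
    for k in keys:
        buckets[bucket_for(k, num_buckets)].append(k)
    return buckets

buckets = bucketize([1, 2, 5, 8, 13, 21])
```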
Joins - map-side join, bucket map join, sort-merge bucket (SMB) map join.
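A single-process conceptual sketch of the two strategies behind these joins. A map-side join builds an in-memory hash table from the small table and streams the large table past it. An SMB join merges two inputs already sorted on the join key, which is what happens bucket by bucket when both tables are bucketed and sorted on that key. Function names and sample rows are hypothetical; this is not Hive's implementation.

```python
def map_side_join(big_rows, small_rows, key):
    """Broadcast hash join: hash the small table, then probe it once per big-table row."""
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)
    out = []
    for row in big_rows:
        for match in lookup.get(row[key], []):
            out.append({**match, **row})
    return out

def sort_merge_join(left, right, key):
    """Merge join of two inputs already sorted on key (as within one SMB bucket)."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, then pair it with each equal left row.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append({**r, **left[i]})
                i += 1
            j = j_end
    return out

# Hypothetical data: a large "orders" table and a small "customers" table.
orders = [{"cust": 1, "amt": 5}, {"cust": 2, "amt": 7}, {"cust": 1, "amt": 9}, {"cust": 3, "amt": 1}]
customers = [{"cust": 1, "name": "a"}, {"cust": 2, "name": "b"}]

hash_joined = map_side_join(orders, customers, "cust")
merge_joined = sort_merge_join(sorted(orders, key=lambda r: r["cust"]), customers, "cust")
```

Both produce the same inner-join rows. The merge join needs no hash table, which is why SMB joins scale to two large tables as long as both are bucketed and sorted the same way.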
File format - use a suitable file format, e.g. ORC (Optimized Row Columnar).
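A toy model of two ideas ORC relies on: values are stored column by column within stripes, and per-stripe min/max statistics let a reader skip stripes that a predicate cannot match. The stripe layout, function names and data are hypothetical; real ORC also keeps row-group indexes and applies encoding and compression.

```python
def to_columnar_stripes(rows, columns, stripe_size):
    """Split rows into stripes, store each stripe column-wise, and record min/max per column."""
    stripes = []
    for start in range(0, len(rows), stripe_size):
        chunk = rows[start:start + stripe_size]
        cols = {c: [r[c] for r in chunk] for c in columns}
        stats = {c: (min(v), max(v)) for c, v in cols.items()}
        stripes.append({"columns": cols, "stats": stats})
    return stripes

def sum_where_greater(stripes, filter_col, threshold, value_col):
    """SUM(value_col) WHERE filter_col > threshold, skipping stripes via their max statistic."""
    total, stripes_read = 0, 0
    for stripe in stripes:
        _, col_max = stripe["stats"][filter_col]
        if col_max <= threshold:
            continue  # no row in this stripe can match, so it is never read
        stripes_read += 1
        keys = stripe["columns"][filter_col]
        values = stripe["columns"][value_col]
        total += sum(v for k, v in zip(keys, values) if k > threshold)
    return total, stripes_read

rows = [{"id": i, "amount": i * 10} for i in range(1, 9)]
stripes = to_columnar_stripes(rows, ["id", "amount"], stripe_size=4)
total, stripes_read = sum_where_greater(stripes, "id", 5, "amount")
```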
Indexing (removed in Hive 3.0)
Vectorization - vectorized query execution (hive.vectorized.execution.enabled=true), used together with ORC.
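A conceptual sketch of vectorized execution: instead of handling one row at a time, operators process a batch of column values (1024 here, assumed to match Hive's default batch size) and the filter passes a selection vector of matching positions to the aggregation. Plain Python lists stand in for Hive's column vectors and the function names are hypothetical, so this shows the data flow, not the speedup.

```python
BATCH_SIZE = 1024  # assumed default batch size

def batches(column, batch_size=BATCH_SIZE):
    """Yield consecutive slices of a column, one batch at a time."""
    for start in range(0, len(column), batch_size):
        yield column[start:start + batch_size]

def vectorized_sum_greater(column, threshold, batch_size=BATCH_SIZE):
    total = 0
    for batch in batches(column, batch_size):
        # Filter operator: one pass over the batch builds a selection vector of matching positions.
        selected = [i for i, v in enumerate(batch) if v > threshold]
        # Aggregation operator: one tight loop over the selected positions only.
        total += sum(batch[i] for i in selected)
    return total

values = list(range(3000))
result = vectorized_sum_greater(values, 2500)
```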
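CBO - cost-based optimization (hive.cbo.enable=true), built on Apache Calcite and driven by table/column statistics.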
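A toy illustration of what a cost-based optimizer does with statistics: enumerate left-deep join orders and keep the one with the smallest estimated intermediate result sizes. The row counts and selectivities are hypothetical stand-ins for the statistics ANALYZE TABLE collects, and Hive's Calcite-based CBO uses a richer cost model and search strategy.

```python
from itertools import permutations

# Hypothetical table statistics (row counts).
row_counts = {"sales": 1_000_000, "customers": 10_000, "regions": 10}

# Hypothetical join selectivities: fraction of the cross product each join predicate keeps.
selectivity = {
    frozenset({"sales", "customers"}): 1 / 10_000,
    frozenset({"customers", "regions"}): 1 / 10,
}

def join_selectivity(joined, table):
    """Combined selectivity of predicates linking table to already-joined tables (1.0 = cross product)."""
    s = 1.0
    for t in joined:
        s *= selectivity.get(frozenset({t, table}), 1.0)
    return s

def plan_cost(order):
    """Cost = sum of estimated intermediate row counts for a left-deep join order."""
    joined = {order[0]}
    rows = row_counts[order[0]]
    cost = 0.0
    for table in order[1:]:
        rows = rows * row_counts[table] * join_selectivity(joined, table)
        cost += rows
        joined.add(table)
    return cost

def best_join_order(tables):
    return min(permutations(tables), key=plan_cost)

best = best_join_order(list(row_counts))
```

With these numbers the cheapest plan joins the two small tables first and avoids the sales × regions cross product.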