phaneesh / hiveQueryOptimizationTechniques.txt

Created October 28, 2023 11:52 — forked from thanoojgithub/hiveQueryOptimizationTechniques.txt

Hive query optimization techniques

	https://github.com/Thomas-George-T/Movies-Analytics-in-Spark-and-Scala

	Change the execution engine to Tez or Spark: set hive.execution.engine=tez (or spark), with the Tez/Spark client jars added to HADOOP_CLASSPATH.
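	Partitioning - the PARTITIONED BY clause divides the table into partitions (one directory per partition value), so queries that filter on the partition column read only the matching partitions.
	Bucketing - the CLUSTERED BY clause divides the table (or each partition) into a fixed number of buckets by hashing the bucketing column.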
	Map-Side join, Bucket-Map-Side join, Sorted Bucket-Map-Side join
	Use a suitable file format, e.g. ORC (Optimized Row Columnar)
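	Indexing (CREATE INDEX was removed in Hive 3.0; the built-in indexes of columnar formats such as ORC, or materialized views, are the usual alternatives there)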
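	Vectorization (hive.vectorized.execution.enabled=true), used together with ORC
	CBO (cost-based optimizer, hive.cbo.enable=true); it depends on table and column statistics being collected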
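
The techniques above could be combined roughly as in the HiveQL sketch below. It is an illustration, not a tuned configuration: the ratings and movies tables, their columns, the bucket count of 16 and the date literal are all hypothetical, and the bucket and sort-merge join settings only take effect when both tables are bucketed (and sorted) on the join key with compatible bucket counts. Property defaults and availability vary between Hive versions, so check them against the version in use.

```sql
-- Session settings (property names as documented for Hive; defaults differ by version)
SET hive.execution.engine=tez;                        -- or spark
SET hive.vectorized.execution.enabled=true;           -- process rows in batches
SET hive.vectorized.execution.reduce.enabled=true;
SET hive.cbo.enable=true;                             -- cost-based optimizer
SET hive.stats.fetch.column.stats=true;               -- let the planner read column stats
SET hive.auto.convert.join=true;                      -- map-side join when one side is small
SET hive.optimize.bucketmapjoin=true;                 -- bucket map join
SET hive.optimize.bucketmapjoin.sortedmerge=true;     -- sorted bucket map join
SET hive.auto.convert.sortmerge.join=true;

-- Hypothetical fact table: partitioned by date, bucketed and sorted on the join key, stored as ORC
CREATE TABLE IF NOT EXISTS ratings (
  user_id  BIGINT,
  movie_id BIGINT,
  rating   DOUBLE
)
PARTITIONED BY (rating_date STRING)
CLUSTERED BY (movie_id) SORTED BY (movie_id) INTO 16 BUCKETS
STORED AS ORC;

-- Hypothetical dimension table, bucketed and sorted the same way so the bucket joins can apply
CREATE TABLE IF NOT EXISTS movies (
  movie_id BIGINT,
  title    STRING
)
CLUSTERED BY (movie_id) SORTED BY (movie_id) INTO 16 BUCKETS
STORED AS ORC;

-- Collect the table and column statistics that the CBO relies on
ANALYZE TABLE ratings PARTITION (rating_date) COMPUTE STATISTICS;
ANALYZE TABLE ratings PARTITION (rating_date) COMPUTE STATISTICS FOR COLUMNS;
ANALYZE TABLE movies COMPUTE STATISTICS FOR COLUMNS;

-- The filter on the partition column lets Hive prune to a single partition;
-- EXPLAIN shows which join strategy the planner actually chose
EXPLAIN
SELECT m.title, AVG(r.rating) AS avg_rating
FROM ratings r
JOIN movies m ON r.movie_id = m.movie_id
WHERE r.rating_date = '2023-10-28'
GROUP BY m.title;
```

Reading the EXPLAIN output before and after changing a setting is a simple way to confirm whether a given optimization was picked up for a particular query.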