Your Hive Query Runs Slow? This Cheat Makes It Faster!

Cheat Code
ANALYZE TABLE mydb.mySuperPartitionedTable PARTITION (filedate) COMPUTE STATISTICS;

ANALYZE TABLE mydb.mySuperBigFlatTable COMPUTE STATISTICS;
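
A possible next step, not part of the fix described in this post: the two statements above gather basic table-level stats (row counts, sizes). Hive can also collect column-level stats, which the CBO can use when estimating joins and filters. A minimal sketch on the same two tables, assuming your Hive version supports `FOR COLUMNS` with a dynamic partition spec:

```sql
-- Column-level statistics for the flat table.
ANALYZE TABLE mydb.mySuperBigFlatTable COMPUTE STATISTICS FOR COLUMNS;

-- Same idea for all partitions of the partitioned table.
ANALYZE TABLE mydb.mySuperPartitionedTable PARTITION (filedate) COMPUTE STATISTICS FOR COLUMNS;
```

On big tables this launches a real job and can take a while, so it probably fits better at the end of a data load than in front of every query.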

I had a couple of queries running bloody slow, I mean REALLY darn slow!
I ran them over ODBC, JDBC, Beeline… you name it.. slow!
Then I had a look at the execution (oh, it's LLAP + Tez, should be freaking fast, right!?).
I found something strange: most queries required 1099 reducer tasks… That's One Thousand and Ninety-Nine!
Something was completely wrong here.

Then back to basics, buddy!
Do we have CBO enabled? Checked!
Do we use ORC files? YES!
Do we partition the big fat fact table? YEP!
Did we remove unnecessary joins? You BET!
Do we keep the table statistics up to date? HEY! I'm asking!!! Helloooo….. No, I forgot about it. *FLIP TABLE*
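
Most of the checklist above can be checked from Beeline itself. This is only a rough sketch: the property names are standard Hive settings, but which ones matter depends on your Hive version and cluster setup, so treat it as a starting point.

```sql
-- Print the current value of a setting (SET with no value just shows it).
SET hive.cbo.enable;                  -- is the cost-based optimizer on?
SET hive.stats.autogather;            -- are basic stats gathered automatically on insert?
SET hive.stats.fetch.column.stats;    -- does the planner fetch column stats?

-- Check the storage format and partition columns of the fact table.
SHOW CREATE TABLE mydb.mySuperPartitionedTable;

-- Check whether stats exist: look for numRows, rawDataSize and
-- COLUMN_STATS_ACCURATE in the table parameters of the output.
DESCRIBE FORMATTED mydb.mySuperBigFlatTable;
```

If `numRows` is missing or looks wildly off compared to the real table, that is a hint the stats were never computed or have gone stale.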

So we fired ANALYZE TABLE on those tables and, guess what, the queries went from 20 mins down to 1.5 mins. I know… What the heck!? Reducer tasks went from 1099 down to 60. NIZE!
All happy now!
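
To see the effect on the plan rather than just the wall clock, one option is to run EXPLAIN on the same query before and after ANALYZE TABLE and compare. The query below is made up for illustration: the filter value is hypothetical, and it assumes `filedate` compares as a date-like string.

```sql
-- Hypothetical query against the partitioned table from the cheat code.
EXPLAIN
SELECT filedate, COUNT(*) AS row_count
FROM mydb.mySuperPartitionedTable
WHERE filedate >= '2019-01-01'
GROUP BY filedate;
```

In the output, the `Statistics:` lines on each operator show the estimated row count and data size, along with whether basic and column stats are complete. Estimates that change a lot after ANALYZE TABLE would be consistent with the drop in reducer tasks described above.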


Published by Feivel
