Open/Explore/Export ORC file

Cheat Code: bigdata-file-viewer

Today I needed to explore some ORC files that someone had loaded into the data lake. I can do my usual stunt of creating an external table on Hive and running my checks there. So I googled around and came across this tool. And I love it! Downloaded the latest released…

Continue reading “Open/Explore/Export ORC file”

DBeaver – Use Space instead of TAB

Cheat Code: Window -> Preferences -> General -> Editors -> Text Editor -> Insert spaces for tabs

You're welcome! I've been pulling my hair out lately because when I try to get the view definitions of our Hive views, the definitions come back broken. Parts of the view definition are truncated. Apparently if you have more than 1 level of indent…

Continue reading “DBeaver – Use Space instead of TAB”

DBeaver – Hive connection keep disconnecting

Cheat Code: Set the Keep-Alive setting to 60 seconds

I love DBeaver! It's a GUI tool for running queries and connecting to various databases. One of the reasons I really like this tool is that it can auto-discover the driver required to connect to our databases (Hive, SQL, Drill, yada yada.. you name it!) and download it…

Continue reading “DBeaver – Hive connection keep disconnecting”

Hive Rename Table

Cheat Code: ALTER TABLE mydatabase.fact_transaxion RENAME TO mydatabase.fact_transaction

One thing I really hate when dealing with various projects is TYPOS. How the hell do developers type and deploy a table to the database without double-checking the name? ARGH! Luckily it's pretty straightforward to rename a table in Hive. Use the cheat code above to rename it. Simple…

Continue reading “Hive Rename Table”

Hive Drop Partition

Cheat Code: ALTER TABLE mydatabase.mysupertable_agg DROP IF EXISTS PARTITION(month_partitionkey = '__HIVE_DEFAULT_PARTITION__');

Another amazing day. We have been using Hive as our main big data engine for almost 1 year, and everything was smooth as a baby's bum. This morning, out of the blue (why blue? I wonder)… my dashboard looked damn weird. I have strange month start appearing on the chart,…

Continue reading “Hive Drop Partition”
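If the same stray partition shows up in more than one table, the drop-partition cheat code can be generated instead of typed by hand. This is only a minimal sketch in Python: the helper name `drop_partition_sql` is made up for illustration, and it only builds the statement text, which you would still run through your usual Hive client.

```python
def drop_partition_sql(table: str, key: str, value: str) -> str:
    """Build a Hive statement that drops one partition if it exists.

    Hypothetical helper: it only formats the text of the statement and
    does not talk to Hive.
    """
    if "'" in value:
        # Keep the sketch simple: refuse values that would need escaping.
        raise ValueError("partition value must not contain a single quote")
    return f"ALTER TABLE {table} DROP IF EXISTS PARTITION({key} = '{value}');"


if __name__ == "__main__":
    # Reproduces the cheat code for the default (null) partition.
    print(drop_partition_sql(
        "mydatabase.mysupertable_agg",
        "month_partitionkey",
        "__HIVE_DEFAULT_PARTITION__",
    ))
```

Looping over a list of table names with this helper gives one statement per table to paste into Beeline or DBeaver.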

pyodbc – Connecting to HDInsight

So…. recently I've been exploring Python because it's COOL! What other reason do you need to learn new things than that it's COOL? Anyhow, I want to try comparing data between my HDInsight Hive external table and a CSV using Python. Because I can, so why not 🙂 The first challenge in this journey is how the heck can…

Continue reading “pyodbc – Connecting to HDInsight”
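For a rough idea of what a pyodbc connection to Hive can look like, here is a minimal sketch. It assumes an ODBC data source for the cluster has already been configured on the machine; the DSN name `HDInsightHive`, the credentials, and both helper names are placeholders made up for illustration, not values from the full post.

```python
def build_connection_string(dsn: str, user: str, password: str) -> str:
    """Join the standard ODBC keywords DSN, UID and PWD into one string.

    Hypothetical helper; values containing ';' would need extra quoting,
    which this sketch does not handle.
    """
    return f"DSN={dsn};UID={user};PWD={password}"


def connect(conn_str: str):
    """Open a connection through pyodbc.

    pyodbc is imported here so the string helper works without it.
    """
    import pyodbc

    # Hive does not use transactions the way pyodbc expects by default,
    # so autocommit is switched on when connecting.
    return pyodbc.connect(conn_str, autocommit=True)


if __name__ == "__main__":
    # Placeholder DSN and credentials; replace them with your own.
    conn_str = build_connection_string("HDInsightHive", "admin", "secret")
    print(conn_str)
    # connection = connect(conn_str)  # needs pyodbc and a configured DSN
```

Keeping the driver details in the DSN means the Python side only needs the data source name and the login.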

User can’t connect to hive with Kerberos because of http header size too big

Cheat Code: Ambari -> Hive -> Custom hiveserver2-site
hive.server2.thrift.http.request.header.size=65536
hive.server2.thrift.http.response.header.size=65536

Our Azure HDInsight is secured with the Enterprise Security Package/ESP (a fancy name saying the cluster is domain-joined to AD for authentication). Some of the users said they were having difficulty logging on from their beloved, most powerful, most flexible BI tool – Excel. After troubleshooting…

Continue reading “User can’t connect to hive with Kerberos because of http header size too big”

Your Hive Query Run Slow? This cheat make it faster!

Cheat Code:
ANALYZE TABLE mydb.mySuperPartitionedTable PARTITION (filedate) COMPUTE STATISTICS;
ANALYZE TABLE mydb.mySuperBigFlatTable COMPUTE STATISTICS;

I have a couple of queries running bloody slow, I mean REALLY darn slow! I ran them over ODBC, JDBC, Beeline,… whatever you name it.. slow! Then I had a look at the execution (oh, it's LLAP + Tez, it should be freaking fast, right!?). I found strange…

Continue reading “Your Hive Query Run Slow? This cheat make it faster!”
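When there are many tables to refresh, the two ANALYZE statements from the cheat code can be generated from a list. The sketch below is in Python, the helper name `analyze_sql` is hypothetical, and it only builds the statement text to run in whatever Hive client you prefer.

```python
def analyze_sql(table: str, partition_cols: tuple = ()) -> str:
    """Build an ANALYZE TABLE ... COMPUTE STATISTICS statement for Hive.

    Hypothetical helper: partition columns, when given, go into the
    PARTITION (...) clause without values, as in the cheat code.
    """
    if partition_cols:
        spec = ", ".join(partition_cols)
        return f"ANALYZE TABLE {table} PARTITION ({spec}) COMPUTE STATISTICS;"
    return f"ANALYZE TABLE {table} COMPUTE STATISTICS;"


if __name__ == "__main__":
    # Table names are the ones used in the cheat code.
    tables = [
        ("mydb.mySuperPartitionedTable", ("filedate",)),
        ("mydb.mySuperBigFlatTable", ()),
    ]
    for name, cols in tables:
        print(analyze_sql(name, cols))
```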
