Your Hive Query Run Slow? This cheat make it faster!

Cheat Code
ANALYZE TABLE mydb.mySuperPartitionedTable PARTITION (filedate) COMPUTE STATISTICS;

ANALYZE TABLE mydb.mySuperBigFlatTable COMPUTE STATISTICS;

I have couple queries running bloody slow, I mean REALLY darn slow!
I ran over ODBC, JDBC, Beeline,… what ever you name it.. slow!
then I have a look at the execution (oh it’s LLAP + TEZ should be freaking fast, right!?).
I found strange things that most query requires 1099 task of reducer… That’s One Thousand and Ninety Nine!
Something completely wrong here.

Then back to basic buddy!
Do we have CBO enabled? Checked!
Do we have ORC file? YES!
Do we partition the big fat fact table? YEP!
Do we remove unnecessary joins? You BET!
Do we forget to update statistics of the table? HEY! I’m asking!!! Helloooo….. No, I forget about it. *FLIP TABLE*

So, we fire the analyze table on those tables, guest what from 20 mins down to 1.5 mins. I know… What the heck!? 1099 down to 60 NIZE!
all happy now!

Photo by Juhasz Imre from Pexels

Azure Data Factory working with “UCS-2 LE BOM” a.k.a “UTF-16LE with Signature” a.k.a “UTF-16” a.k.a “Windows Unicode”

Cheat Code
Set Encoding to “UTF-16”

So we received this file on our blob and we need to convert it into ORC file for our hive to eat it fast n furious. Using the battle proven Azure Data Factory (ADF) we spin quickly a pipeline to read the blob data which is zipped and have tab delimited and use the copy activity to convert it to ORC file on ADLv2.

After we run the job, we create external table on hive, and we fell off the chair. There are so many extra new line shown on the result when we select * of the external table.

We spin Notepad++, EmEditor to check the encoding and we see “UCS-2 LE BOM” on notepad++ and “UTF-16LE with Signature” on EmEditor… Bloody heck!

UTF-16LE with Signature – DAMN Scary mate!
UCS-2 LE BOM – What a BOMB!

Encoding -> UTF-16

So using the above cheat we can load the CSV to ADLv2 as ORC and now everything live happily ever after.

This is my personal view… I kind of glad that Microsoft provide solution to issue that they invent themselves 😉 whereas UTF-8 just provide whole world a better place to live

Photo by Brett Sayles from Pexels

SQL Server – Rename column via T-SQL

Cheat Code
EXEC sp_rename ‘[dbo].[TABLE_NAME].[OLD_COLUMN_NAME]’,’NEW_COLUMN_NAME’, ‘COLUMN’;

I just received a set of tables in SQL Server which was loaded from CSV file without column header on 1st row. They provide separate EXCEL file which define the column name.
You can imagine that all fields are like “[Column01]”, “[Column02]”, etc.

Good luck to our Data Analyst writing
“SELECT [Column01], sum([Column10]) from [dbo].[csv01] group by [Column01]” 😈 😈 😈

Well… I’m a good guy and love to make people live easy, so I use the above cheat code by generating the script over EXCEL file, you now “Substitute” function, and once I have the scripts, copy-paste-F5.
Your’re welcome users!

Photo by Andri from Pexels

Run HIVE HQL script on Beeline with Parameter

Cheat Code
beeline -u ‘jdbc:hive2://zk0-mycluster.mydomainads.onmicrosoft.com:2181,zk2-mycluster.mydomainads.onmicrosoft.com:2181,zk3-mycluster.mydomainads.onmicrosoft.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2’ –hivevar filedate=’2019-04-10′ –hivevar dwhdatabase=’dw_reporting_global’ –hivevar stagingdatabase=’dw_reproting_local’

Reuse, reuse, reuse!
I think this resonate very well with where we are today. Developer love to develop, but if you keep redeveloping e.g. rewriting same script again and again just for the sake of “clocking # of lines” then it’s a foolish!
In programming world we know what called parameter, it’s a magical thing that allow us to use same piece of code for different situation.
In our case we want to run Hive SQL script which load our Hive DW table but we don’t want to keep generating same script just to change some parameter like date, customer, etc. So we can use the “–hivevar” parameter magic for this.
Use the cheat code above!

If you wonder, our Hive cluster is actually hosted as Microsoft HDInsight service.

Photo by Markus Spiske temporausch.com from Pexels

List all partitions from Hive Table

Cheat Code
> show partitions database_name.table_name

I was asked to list all partitions for the transaction hive table because somehow the number are not complete, so I ran the above cheat code to check all date partitions on my transaction tables.
Simple right!?

Once we found the missing partitions, we can just simple reload it from our raw dataset in data lake

Photo by Engin Akyurt from Pexels

Finding SQL Server Object Dependency

Cheat Code
SELECT * FROM sys.sql_expression_dependencies

This is continuation from previous blog post about my “special project” that requires us to find there the heck some object are depends on and who use them.
Using the cheat code above is a fantastic breeze to find how things are wired.
Give it a kick and you will find a score quickly!

Photo by Pixabay from Pexels

Insert Overwrite Hive Partitioned Table

Cheat Code
INSERT OVERWRITE TABLE apac.amazing_report partition (yyyymm=’2019-12-01′)
SELECT year, month, partnerid, cost, revenue
FROM apac.amazing_csv
WHERE yyyymm=’2019-12-01′

I have hive partitioned table and I need to load/reload this partition from raw source which is CSV in my case.

Just run the cheat code above (don’t forget to adjust the table, partition, and source to your environment.
ONE MORE THING! remember to exclude the partition key column from your select statement 😀

Image by Gerd Altmann from Pixabay

How to check Network MTU

Cheat Code
ping http://www.google.com -f -l 1472

So we have issue with network connectivity that intermittently experiencing slowness (God, no one love slow network!). After checking around we found that likely it’s due to MTU settings not optimized.

First is we need to check what’s the MTU set by our router (usually it’s 1500 btw). So use the cheat code above!
Oh don’t forget to minus a magic number 28 (28 bytes is IP and ICMP header size).
You don’t want to get message “Packet needs to be fragmented but DF set.” which means your MTU is too big!
What you want to find is largest number (+28 don’t forget) that the server is responding.

Ping – Testing MTU size – NOT GOOD!
Woman Doing Goggles Hands Gesture by gratisography

When Browsing is rewarding and safe!

Cheat Code
Brave Browser

Long time ago (like 15 years ago) I was obsessed and thinking how can we earn money while browsing internet. It was days where people are running multiple pop-up and installing various application which promise users that they will get money by browsing internet. In reality, that never happened! Keep browsing and leaving internet on (dial-up days baby!) and we get USD 0.2 after so long. Now I realized that I was a fool, the cost of electricity and dial-up was like mad for USD 0.2.

With the invention of blockchain technology, now rewarding user for browsing internet is possible. Created by no other than co-founder of Mozilla (here for more), based on Chromium, tweaked for security and privacy in mind, Brave Browser is fast and great browser. With the bonus of earning some pocket money just by using it for normal day-to-day activity. Why not?!

Oh, they have it on Mobile as well… Sweet!

Brave Browser – Showing earned rewards 🙂
Design a site like this with WordPress.com
Get started