Ambari -> YARN -> Advanced -> yarn.admin.acl
Add the user
PowerShell – Open a Large CSV and Skip a Few Lines
Cheat Code
Get-Content .\MyDamnJumboFile-20200906-172617.CSV | Select-Object -skip 852115 -first 10 | Out-File Output.txt
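If PowerShell isn't handy, the same skip-then-take idea can be sketched in Python. This is only a rough equivalent under my own assumptions: the helper name `slice_lines` is made up for illustration, and it keeps the line endings, which `Get-Content` strips. It streams the file, so the jumbo CSV never has to fit in memory.

```python
from itertools import islice


def slice_lines(path, skip, first, encoding="utf-8"):
    """Return up to `first` lines of the file at `path` after skipping `skip` lines.

    Hypothetical helper that mirrors `Select-Object -Skip ... -First ...`.
    """
    with open(path, "r", encoding=encoding, newline="") as handle:
        # islice reads lazily, so only the requested window is kept in memory.
        return list(islice(handle, skip, skip + first))
```

For example, `slice_lines("MyDamnJumboFile-20200906-172617.CSV", 852115, 10)` should return the same ten lines that the one-liner writes to Output.txt, assuming the file is UTF-8 encoded.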
Open/Explore/Export ORC file
Cheat Code
bigdata-file-viewer
Today I needed to explore some ORC files that someone had loaded into the data lake. I could do my usual stunt of creating an external table on Hive and running my checks there.
So I googled around and stumbled upon this tool instead. And I love it!
I downloaded the latest release, and within seconds I was able to open the ORC file and inspect its content and relevant metadata. Love it!
Thanks, Eugene-Mark, for creating such a wonderful tool!
Feature List
- Open and view Parquet, ORC and AVRO files from a local directory, HDFS, AWS S3, etc.
- Convert binary format data to text format data like CSV
- Support complex data types like array, map, struct, etc.
- Support multiple platforms like Windows, Mac and Linux
- Code is extensible to support other data formats

DBeaver – Use Spaces Instead of Tabs
Cheat Code
Window -> Preferences -> General -> Editors -> Text Editor -> Insert spaces for tabs
You’re welcome!
I’ve been pulling my hair out lately because when I try to get the definitions of our Hive views, the definition comes back screwed up. Some parts of the view definition are truncated.
Apparently, if your view definition has more than one level of indentation, the engine can’t pull the definition properly when you try to reverse-engineer it via “describe extended mydb.my_table”.
I tried many tools (DBeaver, Beeline, Ambari Hive View) and all of them failed to fetch the full definition.
In my case I had to query the Hive metadata on Azure SQL to get the full text.
Lesson learned: change your tabs into spaces, please….
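To save the next person some hair, here is a rough Python sketch for cleaning up a view script before deploying it. It is my own illustration, not something DBeaver or Hive ships: the helper name `tabs_to_spaces`, the view name `mydb.my_view` and the 4-space width are all assumptions, so pick whatever your team uses.

```python
def tabs_to_spaces(sql_text, width=4):
    """Replace every tab in `sql_text` with `width` spaces.

    Hypothetical helper; a fixed-width replacement keeps the indentation
    depth predictable (unlike column-based tab stops).
    """
    return sql_text.replace("\t", " " * width)


# Made-up view script with one and two levels of tab indentation.
view_sql = "CREATE VIEW mydb.my_view AS\nSELECT\n\tcol_a,\n\t\tcol_b\nFROM mydb.my_table"
cleaned = tabs_to_spaces(view_sql)
print(cleaned)
```

Note that this blunt replacement also touches tabs inside string literals, so eyeball the result if your view compares against tab characters.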

DBeaver – Hive Connection Keeps Disconnecting
Cheat Code
Set the Keep-Alive setting to 60 seconds
I love DBeaver! It’s a GUI tool to run queries and connect to various databases.
One of the reasons I really like this tool is that it can auto-discover the drivers required to connect to our databases (Hive, SQL, Drill, yada yada.. you name it!) and download them automagically.
It also has a nice UI (subjective) and just makes browsing Hive databases easier.
However, when I set up a connection to HDInsight, whenever I wanted to run a query after idling for a while (dreaming and wondering why I exist….) it would try to reconnect… re-establish the connection… re-….re-…. damn slow (like 30 secs?).
The cheat code allows DBeaver to remain connected by periodically pinging Hive to make sure the connection stays alive.
Right click on the connection -> Edit Connection -> Connection Settings -> Initialization
Set Keep-alive to 60 seconds.

Now I can daydream longer….. and still be happy.
Hive Rename Table
Cheat Code
ALTER TABLE mydatabase.fact_transaxion
RENAME TO mydatabase.fact_transaction;
One thing I really hate when dealing with various projects is TYPOS.
How the hell do developers just type anyhow and deploy a table to the database without double-checking the name? ARGH!
Luckily, it’s pretty straightforward to rename a table in Hive.
Use the cheat code above to rename it.
Simple enough! I’m happy now.

Hive Drop Partition
Cheat Code
ALTER TABLE mydatabase.mysupertable_agg
DROP IF EXISTS PARTITION(month_partitionkey = '__HIVE_DEFAULT_PARTITION__');
Another amazing day.
We have been running Hive as our main big data engine for almost a year, and everything was smooth as a baby’s bum.
This morning, out of the blue (why blue? I wonder)… my dashboard looked damn weird. A strange month was appearing on the chart, and it said __HIVE_DEFAULT_PARTITION__. WHAT ON EARTH IS THAT!?
So…. I checked the query that populates this super table and saw nothing new or strange or alien lurking around. Finally I found it was due to some crazy behavior when we set “hive.vectorized.execution.enabled=true;” CRAP!
Anyway, I adjusted the code and reloaded. But the crazy partition was still listed on the table, so to remove it we needed to drop the partition manually. Just fire the cheat code, remove the folder, and all good.
BTW, __HIVE_DEFAULT_PARTITION__ appears when the value of your partition column is NULL (you’re welcome!).

PowerBI – Switch Measures Dynamically
Cheat Code
Selected = IF(
    HASONEVALUE('Currency Switcher'[Currency]),
    SWITCH(
        VALUES('Currency Switcher'[Currency]),
        "Local", SUM(fact_order[CostLocalCurrency]),
        "USD", SUM(fact_order[CostUSD])
    )
)
Simple!
Enuf said…
Every day I’m shufflin’ with PowerBI
always new things to learn…
Too many copies of a dashboard/report just to switch a couple of measures. I’m lazy and I don’t like repetitive tasks. KISS Yo! Keep It Simple, Stupid.
Have Fun!

PowerBI – Count Dimension by Other Dimension Linked by Fact
Cheat Code
CountTargetRows = CALCULATE(
    DISTINCTCOUNT(dim_TargetDimension[TheColumnName]),
    FILTER(dim_middle_man, CALCULATE(SUM(FT_Transaction[Cost]) > 0))
)
Love-Hate relationship with PowerBI!
I love the product as it’s a super powerful BI tool, not to mention CHEAP (PowerBI Desktop is free, as a matter of fact). It can do magic and wonders.
Today I was asked how to create a calculation that counts the rows of one dimension by another dimension when the two are only related via the fact table, you know… the usual star schema.
Anyway, after firing off a couple of failed formulas, I arrived at wonderland.
Everyone is happy now.

Want to take screenshot and auto save it to disk so you don’t miss it?
Cheat Code
Greenshot!
I need to take screenshit… I mean screenshots, and a lot of them, for whatever reason I can think of.
I’ve been using the Windows built-in Snipping Tool, and it’s super convenient. My hands’ muscle memory knows it best (Windows Key + “SN”), then KACHA! Screenshot taken.
The problem is, I need to paste it somewhere or save it before I can take another screenshot. I’d been thinking about this issue for some time, until today!
Browsing REDDIT (during work hours) makes you smart… well.. don’t do it during work hours (if possible – but heck care!) 😀 don’t blame me.
I came across a post about Greenshot and decided to try it.
I immediately fell in love with it…
You can set up the COMBO CHEAT OUTPUT, that’s real cheating…
DARN, why am I only finding out about this now?
Have Fun, and join the fun with me!
