Differences between HiveQL and SQL.

  1. No windowing functions, e.g. SUM(sales) OVER (PARTITION BY date).  It's difficult to do a lot of things common to warehousing, like a running sum, without writing custom mappers/reducers or a UDF.
  2. No regular UNION, INTERSECT, or MINUS operators.
  3. Null values are treated differently than empty strings, and are exported differently.  For example, empty strings are exported as ‘n’ and nulls are exported as nulls.  I know this isn’t unique to Hive, but it is still annoying when exporting data from Hive into another system.
  4. No hierarchical/self-referencing queries.  I know most distributed computing solutions can’t do this, but it can be very handy.
  5. No UPDATE or DELETE statements.
  6. I haven’t been able to find any kind of cost-based explain plan.  Running EXPLAIN generally just shows the path used to access the data.  That is useful to some degree, but it would be great if it were more advanced and could help the user understand which steps cause the biggest slowdowns.
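
To illustrate item 1, here is a minimal sketch of the kind of custom reducer script a running sum would need, assuming it is invoked through Hive's TRANSFORM ... USING streaming, which hands rows to the script as tab-separated lines on stdin. The function name running_sum, the two-column layout (partition key, then amount), and the assumption that rows arrive already grouped and sorted by key are all hypothetical choices for the example, not something Hive prescribes.

```python
import sys


def running_sum(lines):
    """Yield 'key<TAB>amount<TAB>running_total' for each 'key<TAB>amount' line.

    Assumption: rows for the same key arrive together and in the desired
    order (for example because the query used DISTRIBUTE BY / SORT BY).
    """
    current_key = None
    total = 0.0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, amount = line.split("\t")
        if key != current_key:
            # A new partition starts: reset the accumulator.
            current_key = key
            total = 0.0
        total += float(amount)
        yield f"{key}\t{amount}\t{total}"


def main():
    # Entry point when the file is used as a streaming script:
    # read rows from stdin, write rows with the running total to stdout.
    for row in running_sum(sys.stdin):
        print(row)
```

Calling main() at the bottom of the file would turn this into a standalone streaming script. The total restarts whenever the key changes, which mirrors what PARTITION BY would do in a windowed SUM.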

