
Hive sync error with <class org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl not com.uber.hoodie.org.apache.hadoop_hive.metastore.MetaStoreFilterHook> #533

Closed
louisliu318 opened this issue Dec 14, 2018 · 10 comments

@louisliu318

Environment:
spark-2.3.2
hadoop-2.7.3
hive-1.2.1

Error:
I am using the Spark datasource API to insert data into a Hoodie table and sync it to Hive.
    Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: class org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl not com.uber.hoodie.org.apache.hadoop_hive.metastore.MetaStoreFilterHook
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2227)
        at com.uber.hoodie.org.apache.hadoop_hive.metastore.HiveMetaStoreClient.loadFilterHooks(HiveMetaStoreClient.java:240)
        at com.uber.hoodie.org.apache.hadoop_hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:192)
        at com.uber.hoodie.org.apache.hadoop_hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:181)
        at com.uber.hoodie.hive.HoodieHiveClient.<init>(HoodieHiveClient.java:102)
        at com.uber.hoodie.hive.HiveSyncTool.<init>(HiveSyncTool.java:61)
        at com.uber.hoodie.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:246)
        at com.uber.hoodie.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:179)
        at com.uber.hoodie.DefaultSource.createRelation(DefaultSource.scala:106)
        at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:656)
        at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:225)
        at com.lianjia.dtarch.databus.hudi.HudiBatchSync.execute(HudiBatchSync.java:85)
        at com.lianjia.dtarch.databus.hudi.HudiBatchSync.main(HudiBatchSync.java:63)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.RuntimeException: class org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl not com.uber.hoodie.org.apache.hadoop_hive.metastore.MetaStoreFilterHook
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2221)
        ... 39 more
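
The failure comes from Hadoop's Configuration.getClass, which rejects a configured class that does not implement the expected interface. The bundle relocates the MetaStoreFilterHook interface to com.uber.hoodie.org.apache.hadoop_hive.metastore.MetaStoreFilterHook, but the resolved hook class is still the unshaded org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl, so the assignability check fails. A minimal sketch of that check, paraphrased from Hadoop's Configuration rather than quoted from it:

    // Sketch of the check inside org.apache.hadoop.conf.Configuration.getClass.
    // After shading, xface is the relocated MetaStoreFilterHook, while the
    // configured class is the unshaded DefaultMetaStoreFilterHookImpl, which
    // does not implement the relocated interface, so the check throws.
    public final class FilterHookCheckSketch {
        static <U> Class<? extends U> checkedSubclass(Class<?> theClass, Class<U> xface) {
            if (theClass != null && !xface.isAssignableFrom(theClass)) {
                // Class.toString() prints "class <name>", which yields exactly the
                // "class ... not ..." wording seen in the trace above.
                throw new RuntimeException(theClass + " not " + xface.getName());
            }
            return theClass == null ? null : theClass.asSubclass(xface);
        }
    }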

@louisliu318 (Author)

After removing the <relocation> entries in maven-shade-plugin, the error is gone. Maybe we need some workaround in HoodieSparkSqlWriter.syncHive() where we could set the relevant Hive configurations under the com.uber.hoodie prefix, or set the configurations in hive-site.xml.
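
One such workaround, as an untested sketch (it assumes the shaded bundle actually contains the relocated DefaultMetaStoreFilterHookImpl), would be to point the metastore filter-hook setting at the shaded class name, either in hive-site.xml or on the HiveConf handed to the sync tool:

    <!-- Hypothetical workaround (untested): name the relocated hook class so that
         Configuration.getClass resolves an implementation of the shaded
         MetaStoreFilterHook interface. -->
    <property>
      <name>hive.metastore.filter.hook</name>
      <value>com.uber.hoodie.org.apache.hadoop_hive.metastore.DefaultMetaStoreFilterHookImpl</value>
    </property>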

@vinothchandar (Member)

Hive 1.x... Are you using the correct Spark bundle?

From the quickstart:

To work with an older version of Hive (pre Hive-1.2.1), use

    $ mvn clean install -DskipTests -DskipITs -Dhive11
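
As a quick sanity check (standard Maven, not something from the quickstart), you can list which Hive artifacts the active profile actually resolves for the bundle:

    # Run from packaging/hoodie-spark-bundle; filters the dependency tree to Hive artifacts.
    $ mvn dependency:tree -Dincludes=org.apache.hive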

@bvaradar for context

@louisliu318 (Author)

@vinothchandar I'm using Hive-1.2.1, not Hive-1.1.1. In packaging/hoodie-spark-bundle/pom.xml, the hive12 profile (-Dhive12) points to Hive version 1.2.1.
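
For reference, an explicit hive12 build mirrors the quickstart command quoted above (a sketch; since hive12 is the default profile, this should be equivalent to building without the flag):

    $ mvn clean install -DskipTests -DskipITs -Dhive12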

@bvaradar (Contributor)

@louisliu318: Thanks for filing this ticket. Yes, with Hive-1.2.1 the Maven profile is hive12 (the default). When I tested a similar setup, I did not encounter this issue. This is caused by shading some of the Hive jars and including them in the bundle (a bit of trial-and-error magic).

It is not clear from your comment whether you solved this by disabling shading.

Can you also try shading the hive-metastore jar? Add this relocation in the shading section of hoodie-spark's pom.xml:

    <relocation>
      <pattern>org.apache.hadoop.hive.metastore.</pattern>
      <shadedPattern>com.uber.hoodie.org.apache.hadoop_hive.metastore.</shadedPattern>
    </relocation>

Let me know if this solves the problem.

@louisliu318 (Author)

I solved the problem by commenting out the relocations in the shading section of hoodie-spark's pom.xml.

@vinothchandar (Member)

Can you put your changes into a PR? Balaji and I discussed this further. He seems to have tested on Hive 1.2 as part of the dockerized setup, and things worked for him. Ideally we need to test this across all Hive versions before making a call; the Hive jar versioning is very sensitive, and changes made for one version often end up causing side effects for others.

@vinothchandar (Member)

@louisliu318 ping again, to see if you can share the changes with us.

bvaradar self-assigned this Jan 17, 2019
@louisliu318 (Author)

louisliu318 commented Feb 13, 2019

@vinothchandar In my environment, I commented out the following Hive-related relocations in packaging\hoodie-spark-bundle\hoodie-spark-bundle.iml:

            <relocation>
              <pattern>org.apache.hive.jdbc.</pattern>
              <shadedPattern>com.uber.hoodie.org.apache.hive.jdbc.</shadedPattern>
            </relocation>
            <relocation>
              <pattern>org.apache.hadoop.hive.metastore.</pattern>
              <shadedPattern>com.uber.hoodie.org.apache.hadoop_hive.metastore.</shadedPattern>
            </relocation>
            <relocation>
              <pattern>org.apache.hive.common.</pattern>
              <shadedPattern>com.uber.hoodie.org.apache.hive.common.</shadedPattern>
            </relocation>
            <relocation>
              <pattern>org.apache.hadoop.hive.common.</pattern>
              <shadedPattern>com.uber.hoodie.org.apache.hadoop_hive.common.</shadedPattern>
            </relocation>
            <relocation>
              <pattern>org.apache.hadoop.hive.conf.</pattern>
              <shadedPattern>com.uber.hoodie.org.apache.hadoop_hive.conf.</shadedPattern>
            </relocation>
            <relocation>
              <pattern>org.apache.hive.service.</pattern>
              <shadedPattern>com.uber.hoodie.org.apache.hive.service.</shadedPattern>
            </relocation>
            <relocation>
              <pattern>org.apache.hadoop.hive.service.</pattern>
              <shadedPattern>com.uber.hoodie.org.apache.hadoop_hive.service.</shadedPattern>
            </relocation>
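
For anyone comparing the two builds, a quick way to check whether a class ended up relocated in the resulting bundle is to list the jar contents (the jar name below is illustrative):

    # Lists both the original and any relocated copies of the filter hook classes.
    $ jar tf hoodie-spark-bundle-*.jar | grep -i MetaStoreFilterHook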

bvaradar self-assigned this Apr 9, 2019
@bvaradar (Contributor)

bvaradar commented Apr 9, 2019

This should be fixed by #633

@n3nash (Contributor)

n3nash commented Apr 10, 2019

Closing this ticket in favor of #633, which fixes the underlying issue.
