Error when using setPredefinedTypes: "java.lang.String cannot be cast to scala.Enumeration$Value" #49

Closed
laurinhocode opened this issue Jun 21, 2021 · 1 comment · Fixed by #56

Comments

laurinhocode commented Jun 21, 2021

When I pass a dict of columns to setPredefinedTypes(column_dict), PyDeequ fails with "An error occurred while calling o97.run. java.lang.String cannot be cast to scala.Enumeration$Value".

Example dict: {'code': 'string', 'filler': 'int'}

Deequ version: 1.0.3
Pydeequ: 0.1.7

.setKLLParameters(KLLParameters(spark, 2, 0.64, 4))
File "/tmp/pydeequ.zip/pydeequ/profiles.py", line 103, in run
run = self._ColumnProfilerRunBuilder.run()
File "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o97.run.
: java.lang.ClassCastException: java.lang.String cannot be cast to scala.Enumeration$Value
at com.amazon.deequ.profiles.GenericColumnStatistics.typeOf(ColumnProfiler.scala:41)
at com.amazon.deequ.profiles.ColumnProfiler$$anonfun$castNumericStringColumns$1.apply(ColumnProfiler.scala:454)
at com.amazon.deequ.profiles.ColumnProfiler$$anonfun$castNumericStringColumns$1.apply(ColumnProfiler.scala:452)
at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
at com.amazon.deequ.profiles.ColumnProfiler$.castNumericStringColumns(ColumnProfiler.scala:452)
at com.amazon.deequ.profiles.ColumnProfiler$.profile(ColumnProfiler.scala:155)
at com.amazon.deequ.profiles.ColumnProfilerRunner.run(ColumnProfilerRunner.scala:61)
at com.amazon.deequ.profiles.ColumnProfilerRunBuilder.run(ColumnProfilerRunBuilder.scala:168)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)

gucciwang pushed a commit that referenced this issue Jul 19, 2021
…eck signature (#56)

* #49-setPredefinedTypes();#34-_AnalyzerObject

Co-authored-by: Calvin Wang <calviwan@amazon.com>
Co-authored-by: Joan Aoanan <jaoanan@amazon.com>
Contributor

jaoanan1126 commented Jul 19, 2021

Hi @costalauro! The setPredefinedTypes method is now implemented and is used to baseline the data type of each column.

An example of the dictionary would be: {'columnName': DataTypeInstances.String}, where the key is the column name and the value is a member of the DataTypeInstances class. Try it out and let me know if you have any questions!
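
For anyone who already has a string-valued dict like the one in the original report, here is a minimal sketch of converting it to the enum-valued form described above. To keep it runnable without Spark, it defines a stand-in DataTypeInstances enum whose member names mirror Deequ's (Unknown, Fractional, Integral, Boolean, String); in real code you would presumably import the class from pydeequ instead. The helper name to_predefined_types and the alias table (for example 'int' mapping to Integral) are assumptions made for illustration and are not part of PyDeequ.

```python
from enum import Enum


class DataTypeInstances(Enum):
    """Stand-in for PyDeequ's enum so this sketch runs without Spark."""
    Unknown = "Unknown"
    Fractional = "Fractional"
    Integral = "Integral"
    Boolean = "Boolean"
    String = "String"


# Hypothetical alias table: plain-string type names -> enum members.
_ALIASES = {
    "string": DataTypeInstances.String,
    "str": DataTypeInstances.String,
    "int": DataTypeInstances.Integral,
    "integer": DataTypeInstances.Integral,
    "long": DataTypeInstances.Integral,
    "float": DataTypeInstances.Fractional,
    "double": DataTypeInstances.Fractional,
    "bool": DataTypeInstances.Boolean,
    "boolean": DataTypeInstances.Boolean,
}


def to_predefined_types(column_types):
    """Convert {'column': 'type name'} into {'column': DataTypeInstances member}."""
    converted = {}
    for column, type_name in column_types.items():
        # Values that are already enum members pass through unchanged.
        if isinstance(type_name, DataTypeInstances):
            converted[column] = type_name
            continue
        key = str(type_name).strip().lower()
        if key not in _ALIASES:
            raise ValueError(
                f"Unsupported type {type_name!r} for column {column!r}"
            )
        converted[column] = _ALIASES[key]
    return converted


# The dict from the original report, converted to enum values.
predefined = to_predefined_types({"code": "string", "filler": "int"})
```

The resulting dict could then be passed to setPredefinedTypes(predefined), assuming a PyDeequ version that includes the fix from #56. Failing early on an unknown type name in Python is a deliberate choice here: it avoids surfacing the problem later as a ClassCastException on the JVM side.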
