Lida Summarizer, data type conversion error #117

Open
Dejian0328 opened this issue May 7, 2024 · 2 comments

Comments

@Dejian0328

Is anyone else facing this issue?
I am trying to summarize a DataFrame and end up with a data type error.
Could you please advise on this?

df = pd.DataFrame.from_records(data, columns=columns)
data_summary = lida.summarize(df, summary_method="llm", textgen_config=textgen_config)

df:
   ContributionID  MemberID  EmployerID  ContributionMonth  EmployeeShare
0               1        27          15                May         883.43
1               2        44           2           December         626.79
2               3         1          17            January         732.94
3               4        28          15          September         149.57
4               5        49          15          September         616.06
5               6        45           8           February         154.46
6               7        41          16             August         941.70
7               8         2           3               July         707.85
8               9         2           8                May         186.81
9              10        22           7               June         558.11

   EmployerShare  TotalContribution  ContributionDate
0         536.68            1420.11        2021-05-13
1         368.82             995.61        2024-12-23
2         716.15            1449.09        2021-01-03
3         258.10             407.67        2022-09-27
4         519.45            1135.51        2022-09-09
5         840.50             994.96        2022-02-25
6         990.86            1932.56        2020-08-17
7         960.77            1668.62        2021-07-08
8         349.01             535.82        2021-05-16
9         585.05            1143.16        2022-06-30

error log:

\lida\components\manager.py:131, in Manager.summarize(self, data, file_name, n_samples, summary_method, textgen_config)
[128] data = read_dataframe(data)
[130] self.data = data
--> [131] return self.summarizer.summarize(
[132] data=self.data, text_gen=self.text_gen, file_name=file_name, n_samples=n_samples,
[133] summary_method=summary_method, textgen_config=textgen_config)

\lida\components\summarizer.py:130, in Summarizer.summarize(self, data, text_gen, file_name, n_samples, textgen_config, summary_method, encoding)
[128] # modified to include encoding
[129] data = read_dataframe(data, encoding=encoding)
--> [130] data_properties = self.get_column_properties(data, n_samples)
[132] # default single stage summary construction
...
File tslib.pyx:596, in pandas._libs.tslib.array_to_datetime()

File tslib.pyx:588, in pandas._libs.tslib.array_to_datetime()

TypeError: <class 'decimal.Decimal'> is not convertible to datetime, at position 0

@skyprince999

skyprince999 commented May 9, 2024

Can you share a copy of the data? Is it a TSV file?

Typically, while summarizing, the function uses pandas.to_datetime to try converting a column to a datetime object. If the values are not in a format it can convert, it raises an error.
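
The behavior described above can be reproduced without LIDA or a database. The following is a minimal sketch, assuming pandas is installed; `conversion_failure` is a hypothetical helper name used only for illustration, and the sample values are taken from the EmployeeShare and ContributionDate columns in the original post. It shows that `pandas.to_datetime` accepts ISO date strings but rejects `decimal.Decimal` objects, which is consistent with the traceback in this issue.

```python
from decimal import Decimal

import pandas as pd


def conversion_failure(values):
    """Return the exception class name raised by pd.to_datetime, or None on success."""
    try:
        pd.to_datetime(pd.Series(values), errors="raise")
    except (TypeError, ValueError) as exc:
        # Decimal objects are neither strings nor native numbers,
        # so pandas cannot interpret them as timestamps.
        return type(exc).__name__
    return None


# Decimal values, as a database driver may return for DECIMAL/NUMERIC columns.
decimal_result = conversion_failure([Decimal("883.43"), Decimal("626.79")])

# ISO-formatted date strings convert without error.
date_result = conversion_failure(["2021-05-13", "2024-12-23"])

print("Decimal column:", decimal_result)
print("Date column:", date_result)
```

If the summarizer only anticipates one kind of failure from this conversion attempt (an assumption, not confirmed in this thread), an unexpected exception type from a Decimal column would propagate to the caller as seen above.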

@Dejian0328
Author

I extract the data from an Azure SQL database using a pyodbc cursor.
The conversion raises an error when the data is of decimal type. Once I manually convert those columns to float in the Azure DB, the summarize function works fine.

The error is raised whenever the EmployeeShare, EmployerShare and TotalContribution columns are included.
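
As an alternative to changing the column types in the database, the same conversion can be done on the pandas side before summarizing. This is a minimal sketch under stated assumptions, not a fix provided by LIDA: `decimals_to_float` is a hypothetical helper, and it assumes that losing Decimal precision by casting to float is acceptable for summarization.

```python
from decimal import Decimal

import pandas as pd


def decimals_to_float(df):
    """Return a copy of df with all-Decimal object columns cast to float64."""
    out = df.copy()
    for col in out.columns:
        if out[col].dtype != object:
            continue  # already a native dtype, nothing to do
        non_null = out[col].dropna()
        # Only touch columns whose non-null values are all Decimal objects.
        if len(non_null) > 0 and all(isinstance(v, Decimal) for v in non_null):
            out[col] = out[col].astype(float)
    return out


# Small sample shaped like the data in this issue (values from the original post).
df = pd.DataFrame(
    {
        "ContributionMonth": ["May", "December"],
        "EmployeeShare": [Decimal("883.43"), Decimal("626.79")],
        "EmployerShare": [Decimal("536.68"), Decimal("368.82")],
    }
)

fixed = decimals_to_float(df)
print(fixed.dtypes)
```

The converted frame would then be passed in place of the original, for example `lida.summarize(decimals_to_float(df), summary_method="llm", textgen_config=textgen_config)`.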
