Racing condition with parquet writer #1792

Open
richiesgr opened this issue Jan 3, 2021 · 3 comments

@richiesgr

Hi
After debugging Secor at length while writing Parquet from Avro messages, I have come to the conclusion that a race condition may occur.

  • I have a topic with 300 partitions
  • 10 pods, 7 cores per pod
  • 7 threads per pod

Result: there are not enough Parquet writers to handle every partition, and an IndexOutOfBoundsException is thrown.
The exception occurs inside the Parquet writer, not in any Secor code. As you know, the Parquet writer is not thread-safe. Is it possible that one Parquet writer is used by more than one thread?

My investigation suggests that this assumption is correct, because the only workaround I have found so far is to set secor.consumer.threads to a much higher number, to make sure a Parquet writer is not reused by mistake.

Can you confirm?
Thanks

@HenryCaiHaiying
Contributor

HenryCaiHaiying commented Jan 4, 2021 via email

@richiesgr
Author

richiesgr commented Jan 4, 2021

Hi
My question was whether my assumption is valid.
Yes, we can debug with many tools, but from reading the code I see there is one Parquet writer per file, because the writers are stored in a hash map (path → writer). Is there by any chance a race condition in which two threads try to use the same Parquet writer?
If so, it is more a design problem than a code bug.
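
One way to test the assumption above, rather than only reason about it, is to wrap each writer in a small guard that fails fast when a second thread enters while another is still writing. The following is a minimal sketch, not Secor or Parquet code: `GuardedWriter` is a hypothetical class, and the `Consumer<T>` delegate merely stands in for whatever non-thread-safe writer is stored per path.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/**
 * Hypothetical diagnostic wrapper (not a Secor or Parquet class).
 * It throws when a second thread calls write() while another thread
 * is still inside it, instead of letting the writer corrupt its state.
 */
final class GuardedWriter<T> {
    private final String path;           // used only in the error message
    private final Consumer<T> delegate;  // stand-in for the real, non-thread-safe writer
    private final AtomicReference<Thread> active = new AtomicReference<>();

    GuardedWriter(String path, Consumer<T> delegate) {
        this.path = path;
        this.delegate = delegate;
    }

    void write(T record) {
        Thread me = Thread.currentThread();
        // Claim the writer; the CAS fails if any thread is already inside write().
        if (!active.compareAndSet(null, me)) {
            throw new IllegalStateException(
                "writer for " + path + " is in use by " + active.get()
                    + " while called from " + me);
        }
        try {
            delegate.accept(record);
        } finally {
            active.set(null); // release the claim even if the write throws
        }
    }
}
```

If the guard never fires under the 300-partition load, overlapping access to one writer is probably not the cause. Note that it only detects calls that overlap in time; two threads using the same writer strictly one after the other would not be flagged.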

@HenryCaiHaiying
Contributor

HenryCaiHaiying commented Jan 4, 2021 via email
