Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

LookupClient.QueryAsync is not concurrency-safe #209

Closed
sengelha opened this issue Feb 7, 2024 · 2 comments · Fixed by #218
Closed

LookupClient.QueryAsync is not concurrency-safe #209

sengelha opened this issue Feb 7, 2024 · 2 comments · Fixed by #218
Milestone

Comments

@sengelha
Copy link

sengelha commented Feb 7, 2024

Summary
In DnsClient 1.7.0, LookupClient.QueryAsync is not concurrency-safe. If an application issues two concurrent requests to lookup DNS names A and B, the request to lookup A may return the result from the lookup of B because they can share the same DNS request unique id.

How to Reproduce
At work we have a number of DNS TXT records which we use for service discovery purposes. We determined, by connecting to the DnsClient internal logger, that two DnsRequestHeaders were generated with identical query ids, and the response for request B was delivered to a request A. The fact that the DNS TXT record may be a few hundred bytes and the response may take more than a trivial amount of time to deliver likely makes this bug easier to observe.

I tried to reproduce the bug by writing a program doing CNAME lookups of popular public domain names but I was unable to do so.

Here are some sample log records from the repro, with some fields replaced with placeholders:

2/7/2024 12:30:46 PM [DBG] Begin query [15842 - Qs: 1 Recursion: True OpCode: Query] via UDP => adls.11111111-1111-1111-1111-111111111111.filepaas.r1.kcura.com IN TXT on [[2600:1700:1cc2:9800::1]:53, 192.168.1.254:53].
2/7/2024 12:30:46 PM [DBG] Begin query [15842 - Qs: 1 Recursion: True OpCode: Query] via UDP => adls.22222222-2222-2222-2222-222222222222.filepaas.relativity.one IN TXT on [[2600:1700:1cc2:9800::1]:53, 192.168.1.254:53].
2/7/2024 12:30:46 PM [DBG] TryResolve 15842 via UDP => adls.11111111-1111-1111-1111-111111111111.filepaas.r1.kcura.com IN TXT on [2600:1700:1cc2:9800::1]:53, try 1/3.
2/7/2024 12:30:46 PM [DBG] TryResolve 15842 via UDP => adls.22222222-2222-2222-2222-222222222222.filepaas.relativity.one IN TXT on [2600:1700:1cc2:9800::1]:53, try 1/3.
2/7/2024 12:30:46 PM [DBG] Got 1 answers for query 15842 via UDP => adls.11111111-1111-1111-1111-111111111111.filepaas.r1.kcura.com IN TXT from [2600:1700:1cc2:9800::1]:53.
2/7/2024 12:30:46 PM [DBG] Response 15842 => adls.22222222-2222-2222-2222-222222222222.filepaas.relativity.one. IN TXT opt record sets buffer of [2600:1700:1cc2:9800::1]:53 to 4096.
2/7/2024 12:30:46 PM [VRB] tid=66 indx=33: dnsEntry=adls.11111111-1111-1111-1111-111111111111.filepaas.r1.kcura.com request audit trail: ; (2 server found)
;; Got answer:
;; ->>HEADER<<- opcode: Query, status: No Error, id: 15842
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; UDP: 4096; code: NoError
;; QUESTION SECTION:
adls.22222222-2222-2222-2222-222222222222.filepaas.relativity.one. 	IN 	TXT

;; ANSWER SECTION:
adls.22222222-2222-2222-2222-222222222222.filepaas.relativity.one. 	3600 	IN 	TXT 	"...ELIDED..."

;; Query time: 5 msec
;; SERVER: 2600:1700:1cc2:9800::1#53
;; WHEN: Wed Feb 07 18:30:46 Z 2024
;; MSG SIZE  rcvd: 353

Root Cause
The class DnsRequestHeader uses a static Random instance to generate query ids. This Random instance is not concurrency-safe, which means that two threads which simultaneously construct DnsRequestHeader can result generate the same unique 16-bit request identifier in the method DnsRequestHeader.GetNextUniqueId(). If DNS requests A and B happen to generate the same unique request identifier, the response for DNS request B may be delivered to DNS request A.

Suggested Fix

  1. On .NET 6+, use Random.Shared instead of new Random(), as Random.Shared is a thread-safe Random instance.
  2. Otherwise, lock or otherwise guard the concurrency around the Random instance.
@MichaCo MichaCo added this to the 1.8.0 milestone Feb 8, 2024
@MichaCo
Copy link
Owner

MichaCo commented Feb 8, 2024

Hi @sengelha,
First of all, I'll change the implementation for creating the query Id to be thread safe, although I wasn't able to reproduce any problems.

That being said, because of the limitation for min/max value being between 0 and 65535, collisions could still occur in theory.

After reading a bit how the request identifier is supposed to be used I found that the Id is not supposed to be used alone to route a response. The server is supposed to use the source port plus the id.
Source port is also randomized by the UDP client and cannot be used twice at the same time, which, I think, should make it nearly impossible to get the wrong response even if the identifier doesn't change.

I tested that locally with using bind or some public DNS servers, sending 10 queries in parallel in a loop, all queries with the same hardcoded ID (not random); and I did not see any issues at all.

Could you tell me what DNS server you are using or if that is a custom implementation which might not respect the source port properly? Maybe that explains why you see that behavior.

@sengelha
Copy link
Author

sengelha commented Feb 8, 2024

Thanks @MichaCo (btw library is great other than this obscure bug, thank you)

Something I didn't mention above. In order to reproduce, I had to disable DnsClient lookup caching (LookupClientOptions.UseCache = false) (an unusual configuration, and not something we use in production of course).

I am running my tests from a Windows 10 Version 22H2 (OS Build 19045.3930) machine. The DNS server is my corporate intranet DNS server, which I would guess is our Active Directory domain server, but I don't know for sure.

FWIW, not using request id alone and using source port + id instead seems even more sensible to me.

MichaCo added a commit that referenced this issue Mar 10, 2024
* better impl for TCP connection timeouts
# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants
@sengelha @MichaCo and others