Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

There is a http request splitting vulnerability #1798

Closed
wubonetcn opened this issue Sep 24, 2021 · 29 comments
Closed

There is a http request splitting vulnerability #1798

wubonetcn opened this issue Sep 24, 2021 · 29 comments

Comments

@wubonetcn
Copy link

wubonetcn commented Sep 24, 2021

By sending a specific HTTP POST request, ZEEK will split a request into multiple and split the wrong fields. This will invalidate any ZEEK HTTP based security analysis(Including ZEEK's internal security plug-ins).

POC

curl -A '' -H "Content-Length: 1" -X POST -d 'ePOST /a.html HTTP/1.0
Content-Length: 1

abc abc 
abc' http://192.168.220.132/a.html

Detailed information

ZEEK version

[root@localhost etc]# zeek -v
zeek version 4.1.0

Start ZEEK

[root@localhost etc]# zeekctl deploy
checking configurations ...
installing ...
removing old policies in /opt/zeek/spool/installed-scripts-do-not-touch/site ...
removing old policies in /opt/zeek/spool/installed-scripts-do-not-touch/auto ...
creating policy directories ...
installing site policies ...
generating standalone-layout.zeek ...
generating local-networks.zeek ...
generating zeekctl-config.zeek ...
generating zeekctl-config.sh ...
stopping ...
stopping zeek ...
starting ...
starting zeek ...

Send normal request

$ curl -A '' -H "Content-Length: 1" -X POST -d 'abcdefg' "http://192.168.220.132/b.html"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   153  100   146  100     7  93890   4501 --:--:-- --:--:-- --:--:--  149k<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

ZEEK generates 1 log
The request method is POST, the host is 192.168.220.132, and the uri is / b.html

#separator \x09
#set_separator	,
#empty_field	(empty)
#unset_field	-
#path	http
#open	2021-09-24-05-00-24
#fields	ts	uid	id.orig_h	id.orig_p	id.resp_h	id.resp_p	trans_depth	method	host	uri	referrer	version	user_agent	origin	request_body_len	response_body_len	status_code	status_msg	info_code	info_msg	tags	username	password	proxied	orig_fuids	orig_filenames	orig_mime_types	resp_fuids	resp_filenames	resp_mime_types
#types	time	string	addr	port	addr	port	count	string	string	string	string	string	string	string	count	count	count	string	count	string	set[enum]	string	string	set[string]	vector[string]	vector[string]	vector[string]	vector[string]	vector[string]	vector[string]
1632484819.459273	C24wiZ28yo32mwBMu3	192.168.220.1	64691	192.168.220.132	80	1	POST	192.168.220.132	/b.html	-	-	-	-	1	0	-	-	-	-	(empty)	-	-	-	Fbvx9w4gFY2M8p5l5b	-	-	-	-	-

Send poc

$ curl -A '' -H "Content-Length: 1" -X POST -d 'ePOST /a.html HTTP/1.0
> Content-Length: 1
> abc' "http://192.168.220.132/a.html"
> abc abc
> abc' "http://192.168.220.132/a.html"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   200  100   146  100    54  67655  25023 --:--:-- --:--:-- --:--:--  195k<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

ZEEK generates HTTP logs with 3 errors
The first display request method is POST, host is - uri is / a.html
The second display request method is bc, host is - uri is abc
The third display request method is POST, the host is 192.168.220.132, and the uri is / a.html
Obviously, ZEEK divides an HTTP request into three, and the request method, request host and request URI are misplaced. This will invalidate any ZEEK HTTP based analysis. HTTP request splitting vulnerability exists.

#separator \x09
#set_separator	,
#empty_field	(empty)
#unset_field	-
#path	http
#open	2021-09-24-05-00-24
#fields	ts	uid	id.orig_h	id.orig_p	id.resp_h	id.resp_p	trans_depth	method	host	uri	referrer	version	user_agent	origin	request_body_len	response_body_len	status_code	status_msg	info_code	info_msg	tags	username	password	proxied	orig_fuids	orig_filenames	orig_mime_types	resp_fuids	resp_filenames	resp_mime_types
#types	time	string	addr	port	addr	port	count	string	string	string	string	string	string	string	count	count	count	string	count	string	set[enum]	string	string	set[string]	vector[string]	vector[string]	vector[string]	vector[string]	vector[string]	vector[string]
1632484819.459273	C24wiZ28yo32mwBMu3	192.168.220.1	64691	192.168.220.132	80	1	POST	192.168.220.132	/b.html	-	-	-	-	1	0	-	-	-	-	(empty)	-	-	-	Fbvx9w4gFY2M8p5l5b	-	-	-	-	-
1632484946.877654	CmKAkh1zenUA59sVsj	192.168.220.1	64698	192.168.220.132	80	2	POST	-	/a.html	-	-	-	-	1	0	-	-	-	-	(empty)	-	-	-	FUmgdL1Zmy0pyu5aM6	-	-	-	-	-
1632484946.877654	CmKAkh1zenUA59sVsj	192.168.220.1	64698	192.168.220.132	80	3	bc	-	abc	-	-	-	-	0	0	-	-	-	-	(empty)	-	-	-	-	-	-	-	-	-
1632484946.877654	CmKAkh1zenUA59sVsj	192.168.220.1	64698	192.168.220.132	80	1	POST	192.168.220.132	/a.html	-	-	-	-	1	0	-	-	-	-	(empty)	-	-	-	F0j8nJ1AX0MLkUx1Y7	-	-	-	-	-
@rsmmr
Copy link
Member

rsmmr commented Sep 28, 2021

Can you elaborate what you think Zeek should be doing in this case instead?

What Zeek sees here from the client is this:

POST /a.html HTTP/1.1
Host: 127.0.0.1:8080
Accept: */*
Content-Length: 1
Content-Type: application/x-www-form-urlencoded

ePOST /a.html HTTP/1.0
Content-Length: 1

abc abc
abc

Afaict, Zeek is actually doing as good as it can here: the Content-Length headers terminate the HTTP messages, so that HTTP parser will then expect another HTTP request to follow in both cases, and try to parse it accordingly.

@wubonetcn
Copy link
Author

wubonetcn commented Sep 28, 2021

Can you elaborate what you think Zeek should be doing in this case instead?

What Zeek sees here from the client is this:

POST /a.html HTTP/1.1
Host: 127.0.0.1:8080
Accept: */*
Content-Length: 1
Content-Type: application/x-www-form-urlencoded

ePOST /a.html HTTP/1.0
Content-Length: 1

abc abc
abc

Afaict, Zeek is actually doing as good as it can here: the Content-Length headers terminate the HTTP messages, so that HTTP parser will then expect another HTTP request to follow in both cases, and try to parse it accordingly.

After you send this request, check /opt/zeek/logs/current/http.log to find the logs of these errors
In this case, ZEEK should produce only one correct record

1632484946.877654	CmKAkh1zenUA59sVsj	192.168.220.1	64698	192.168.220.132	80	1	POST	192.168.220.132	/a.html	-	-	-	-	1	0	-	-	-	-	(empty)	-	-	-	F0j8nJ1AX0MLkUx1Y7	-	-	-	-	-

However, ZEEK produced three lines of logs. And the second line marks HTTP method as bc.
Obviously, there is no bc such request method in http.

1632484946.877654	CmKAkh1zenUA59sVsj	192.168.220.1	64698	192.168.220.132	80	3	bc	-	abc	-	-	-	-	0	0	-	-	-	-	(empty)	-	-	-	-	-	-	-	-	-

Using Content-Length to segment HTTP requests is dangerous. In this POC, you send an HTTP request, but there are actually two Content-Lengthfields, so this HTTP request will be treated as two.

Here is a similar example, you can have a look.
https://www.blackhat.com/html/webcast/10292020-http-request-smuggling-in-2020.html

@rsmmr
Copy link
Member

rsmmr commented Sep 29, 2021

Yeah, I understand what's happening and see the multiple entries in http.log. What I'm wondering is how do you think Zeek should be parsing this instead. The following is really the main challenge:

Using Content-Length to segment HTTP requests is dangerous

The thing is: That's what Content-Length is for. The receiving server will likewise use it to segment the client's data. I don't see how else Zeek (or the server) could parse this otherwise. What would you suggest?

@sethhall
Copy link
Member

I would argue on the side of what Robin's saying here. The intent of the Zeek logs is frequently to take the incoming data and, as accurately as possible, give a representation in logs. I believe that in this case Zeek did exactly that and gave an accurate representation of what was on the wire. If Zeek only printed the single log line as you suggested, it would be confusing because there are legitimately more requests there and they would have been hidden.

@wubonetcn
Copy link
Author

But more importantly, ZEEK records a request method as bc, which is a fatal error. The real HTTP request method sent by hacker is POST, and there is only one method.
Even if it is recorded truthfully, it should not record bc such an HTTP request method, because there is no such request method in http.

@sethhall
Copy link
Member

Because Zeek never actively participates in transactions, but only sits to the side and observes them we have to acknowledge reality and the reality is that not all of the HTTP verbs that people use are defined in any RFC. If you want to do detection activity, you can create the list of HTTP verbs that you consider to be valid and watch for anything that diverges. The argument that you are making would put us in the position where we wouldn't even be able to do detection of someone attempting a request splitting vulnerability.

Actually, now that I think about it, we already generate "Weirds" whenever a request is seen that isn't in the list defined here:

option http_methods: set[string] = {

@0xxon
Copy link
Member

0xxon commented Sep 30, 2021

Hi @wubonetcn

I just took a look at the talk that you referenced. And I think that what you show here is actually not buggy behavior.

From my understanding, HTTP request smuggling works by you having several different content-length headers in a HTTP request. So something like...

POST / HTTP/1.1
Content-Length: 0
Content-Length: 10
...

where different software interprets the content-length differently. In the example that you give, there is only one content-length - so there only is one viable interpretation, which Zeek chooses.

I just did a quick test what happens when several different Content-length headers are present, using...

curl -A '' -H "Content-Length: 1" -H "Content-Length: 10" -X POST -d 'POST /a.html HTTP/1.0' http://google.com

The resulting log here is not optimal, we definitely get confused:

1633003668.962276	CN0f2c1bzcS0IQu6x6	192.168.0.105	51430	142.250.187.206	80	1	POST	google.com	/	-	1.0	-	-	10	54	400	Bad Request	-	-(empty)	-	-	-	FRkdK44SX5siqyb0Hi	-	text/plain	FphsVe3pQDVQSOSrq9	-	text/html
1633003668.984800	CN0f2c1bzcS0IQu6x6	192.168.0.105	51430	142.250.187.206	80	2	ml	-	(empty)	-	-	-	-	0	0	-	-	-	-	(empty)	--	-	-	-	-	-	-	-

and weird.log:

1633003668.984800	CN0f2c1bzcS0IQu6x6	192.168.0.105	51430	142.250.187.206	80	missing_HTTP_uri	-	F	zeek	HTTP
1633003668.984800	CN0f2c1bzcS0IQu6x6	192.168.0.105	51430	142.250.187.206	80	HTTP_version_mismatch	-	F	zeek	HTTP
1633003668.984800	CN0f2c1bzcS0IQu6x6	192.168.0.105	51430	142.250.187.206	80	unknown_HTTP_method	ml	F	zeek	-

My read of RFC7230 is that several content-length headers with different values is just completely illegal, and that, in that case, a server (or client) should stop processing the data. Since we are a man-in-the-middle-observer we don't really have the option of stopping a connection that does that. My feeling is that we should report a weird if we see several different content-length headers. I am not sure what we should do from a parsing point of view in these cases - do you have any opinion of that?

In any case - I don't think that this, as originally reported, is an issue. Please let me knof if I am mistaken about this.

@wubonetcn
Copy link
Author

In fact, you can easily solve this problem.
I see a uid field in ZEEK's http.log log, which remains unchanged in the log parsing of the three errors. I recommend that zeek use the uid field to incorporate the parsing of these errors.
One uid corresponds to one HTTP log output.
All requests with the same uid field should no longer be split based on Content-Length.
I think ZEEK can solve the problem of HTTP request smuggling by using uid field to merge output in parsing.

@0xxon
Copy link
Member

0xxon commented Sep 30, 2021

we have to use the content-length to parse requests. As, by the way, the talk you referenced makes clear. Without the content-length you cannot decide when a request ends - and there can (and will) be several HTTP requests in one connection.

@0xxon
Copy link
Member

0xxon commented Sep 30, 2021

For example, how should Zeek parse the result of the following curl command without using Content-length:

curl -X POST -d 'POST /a.html HTTP/1.0' http://example.com/contact.php -X POST -d 'POST /a.html HTTP/1.0' http://example.com/contact.php

This results in 4 POST lines in 2 requests - that are correctly parsed. There also is no ambiguity what should happen here - just as there is none in your example. The only problem that I can see is having two content-lengths in one http request - which is an undecidable problem.

@sethhall
Copy link
Member

Oh! The request splitting is due to multiple repeated headers. The original example didn't do that. I definitely agree that it would make sense to do a weird if there are multiple content-length headers. That's definitely just another one of those undecidable situations that we inevitably run into.

@wubonetcn
Copy link
Author

Yes, it's not only your software that has this problem, but also many software.
I see that F5 seems to solve the same problem. You can have a look.
https://support.f5.com/csp/article/K50375550
Their solution is to strictly follow the HTTP RFCs (for example, RFC7230, RFC2616).

@0xxon
Copy link
Member

0xxon commented Sep 30, 2021

@wubonetcn - the difference is that they are an active participant. We are a passive participant. We cannot terminate a connection, like a proxy can.

So - we have to choose to parse one variant if we are presented with multiple content-length headers. The only alternative is to stop processing the HTTP connection completely - which also does not seem like the optimal way to do it, since that means that no more information about the connection will be logged.

@wubonetcn
Copy link
Author

I did an experiment before. Using Wireshark's HTTP tracking function, there will be no multiple parsing results. They can handle this problem well.

@0xxon
Copy link
Member

0xxon commented Sep 30, 2021

Wireshark parses your example POC just like Zeek does, as 3 requests...

Screenshot 2021-09-30 at 13 56 37

Follow tcp-stream output:

Screenshot 2021-09-30 at 13 57 38

@0xxon
Copy link
Member

0xxon commented Sep 30, 2021

If you think that wireshark does this differently to Zeek, please provide a pcap where this is the case.

@wubonetcn
Copy link
Author

The example you provided can be used. Red is the request and blue is the response. The red request is complete and not split.

@0xxon
Copy link
Member

0xxon commented Sep 30, 2021

This is not true - you misunderstand the output of Wireshark.

Red/blue is just the direction of traffic in this case (follow TCP stream). If you look at the output above, wireshark does indeed split this into three different requests - just as Zeek does.

@wubonetcn
Copy link
Author

Oh, it seems that I read Wireshark wrong.It seems that we need to find other solutions.

@0xxon
Copy link
Member

0xxon commented Sep 30, 2021

As mentioned by Robin in the second comment - for your report there actually is no problem. Wireshark, Zeek, and proxies will all interpret this as 3 different requests - in accordance with the standards.

The problem that we do have is that we cannot decide what to do if we get a request of the form:

POST /a.html HTTP/1.1
Host: 127.0.0.1:8080
Accept: */*
Content-Length: 1
Content-Length: 10
Content-Type: application/x-www-form-urlencoded

In this case, we have no way of knowing if the software at the other end of the connection will interpret this with content-length 1, or content-length 10 - but we do have to decide for one of the choices. The RFC just tells you that this is invalid - which is true, but does not directly help us.

We can raise a weird - but we still do have to decide what to do parsing-wise. And this is just undecidable - there is no correct answer.

@wubonetcn
Copy link
Author

OK,this is really a difficult problem to solve. I'll contact you if I can think of a solution.

@stevesmoot
Copy link
Contributor

stevesmoot commented Sep 30, 2021 via email

@wubonetcn
Copy link
Author

You may also be interested in the package https://github.com/precurse/zeek-httpattacks -s

On Thu, Sep 30, 2021 at 7:06 AM wubonetcn @.***> wrote: OK,this is really a difficult problem to solve. I'll contact you if I can think of a solution. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#1798 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADRHAWCRDOKXMJO4P5KYWBLUERVFBANCNFSM5EV47F3A . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

Thank you. The script you provide can detect such events and generate a new event record. However, we use ZEEK as an HTTP traffic recorder. This security event detection is done by our other programs. ZEEK provides the original HTTP input. The problem now is that ZEEK will generate three records from an HTTP connection. In this way, we will also get 3 wrong inputs, which will lead to problems in our test results. Of course, other scripts based on HTTP statistics will also make errors, such as the HTTP flood detection script provided by ZEEK. Such scripts based on request quantity statistics will make errors because more records are generated.

@wubonetcn
Copy link
Author

My suggestion is that an HTTP connection only generates one record, even if there are multiple transmissions in the HTTP connection. ZEEK, as a faithful recorder, should be regarded as the same connection as long as the HTTP connection is not disconnected.

@0xxon
Copy link
Member

0xxon commented Sep 30, 2021

My suggestion is that an HTTP connection only generates one record, even if there are multiple transmissions in the HTTP connection. ZEEK, as a faithful recorder, should be regarded as the same connection as long as the HTTP connection is not disconnected.

If you mean this as "you should only record one http log line for each http TCP connection" - I do not see how this is viable/desirable/reasonable. One HTTP connection can contain thousands of requests - people use Zeek because they want this logged.

@0xxon
Copy link
Member

0xxon commented Sep 30, 2021

Thank you. The script you provide can detect such events and generate a new event record. However, we use ZEEK as an HTTP traffic recorder. This security event detection is done by our other programs. ZEEK provides the original HTTP input. The problem now is that ZEEK will generate three records from an HTTP connection. In this way, we will also get 3 wrong inputs, which will lead to problems in our test results.

As mentioned above - Zeek produces the output that is required by the HTTP RFCs. For your example there is no ambiguity around this - it is 3 requests, not one. So you do get the correct output - Zeek records the traffic as it happened (thus, as 3 requests, just as Wireshark, e.g., parses it)

Why do you want this logged as one request? Why do you think it should be logged as such?

@wubonetcn
Copy link
Author

Thank you. The script you provide can detect such events and generate a new event record. However, we use ZEEK as an HTTP traffic recorder. This security event detection is done by our other programs. ZEEK provides the original HTTP input. The problem now is that ZEEK will generate three records from an HTTP connection. In this way, we will also get 3 wrong inputs, which will lead to problems in our test results.

As mentioned above - Zeek produces the output that is required by the HTTP RFCs. For your example there is no ambiguity around this - it is 3 requests, not one. So you do get the correct output - Zeek records the traffic as it happened (thus, as 3 requests, just as Wireshark, e.g., parses it)

Why do you want this logged as one request? Why do you think it should be logged as such?

Didn't you see the previous discussion?

@wubonetcn
Copy link
Author

Thank you. The script you provide can detect such events and generate a new event record. However, we use ZEEK as an HTTP traffic recorder. This security event detection is done by our other programs. ZEEK provides the original HTTP input. The problem now is that ZEEK will generate three records from an HTTP connection. In this way, we will also get 3 wrong inputs, which will lead to problems in our test results.

As mentioned above - Zeek produces the output that is required by the HTTP RFCs. For your example there is no ambiguity around this - it is 3 requests, not one. So you do get the correct output - Zeek records the traffic as it happened (thus, as 3 requests, just as Wireshark, e.g., parses it)

Why do you want this logged as one request? Why do you think it should be logged as such?

rsmmr commented yesterday •
Yeah, I understand what's happening and see the multiple entries in http.log. What I'm wondering is how do you think Zeek should be parsing this instead. The following is really the main challenge:

Using Content-Length to segment HTTP requests is dangerous

The thing is: That's what Content-Length is for. The receiving server will likewise use it to segment the client's data. I don't see how else Zeek (or the server) could parse this otherwise. What would you suggest?

Rsmmr has confirmed. I don't want to answer this question again. The solution I proposed is just a solution idea, which does not mean that you will eventually do so.

@rsmmr
Copy link
Member

rsmmr commented Oct 1, 2021

Sorry, I don't follow what you're saying. I'll go ahead and close the ticket as everything's working as expected.

@rsmmr rsmmr closed this as completed Oct 1, 2021
# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

No branches or pull requests

5 participants