AttributeError: 'NoneType' object has no attribute 'proto' #617

Open
jonaslejon opened this issue Nov 29, 2022 · 6 comments

Describe the bug
Running pyshark on a specific pcap file makes it crash with the exception:
AttributeError: 'NoneType' object has no attribute 'proto'

Full backtrace:

Traceback (most recent call last):
  File "/Users/jonasl/pcap2redis/pyshark2redis.py", line 27, in <module>
    for pkt in cap:
  File "/Users/jonasl/Library/Python/3.10/lib/python/site-packages/pyshark/capture/capture.py", line 221, in _packets_from_tshark_sync
    packet, data = self.eventloop.run_until_complete(
  File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/Users/jonasl/Library/Python/3.10/lib/python/site-packages/pyshark/tshark/output_parser/tshark_xml.py", line 27, in get_packets_from_stream
    return await super().get_packets_from_stream(stream, existing_data, got_first_packet=got_first_packet)
  File "/Users/jonasl/Library/Python/3.10/lib/python/site-packages/pyshark/tshark/output_parser/base_parser.py", line 15, in get_packets_from_stream
    packet = self._parse_single_packet(packet)
  File "/Users/jonasl/Library/Python/3.10/lib/python/site-packages/pyshark/tshark/output_parser/tshark_xml.py", line 30, in _parse_single_packet
    return packet_from_xml_packet(packet, psml_structure=self._psml_structure)
  File "/Users/jonasl/Library/Python/3.10/lib/python/site-packages/pyshark/tshark/output_parser/tshark_xml.py", line 85, in packet_from_xml_packet
    return _packet_from_pdml_packet(xml_pkt)
  File "/Users/jonasl/Library/Python/3.10/lib/python/site-packages/pyshark/tshark/output_parser/tshark_xml.py", line 93, in _packet_from_pdml_packet
    layers = [XmlLayer(proto) for proto in pdml_packet.proto]
AttributeError: 'NoneType' object has no attribute 'proto'

To Reproduce
Run the following code on the PCAP-file attached:

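import sys

import pyshark

# Read PCAP file
cap = pyshark.FileCapture(sys.argv[1])

print("# Starting to read PCAP file: " + sys.argv[1])

for pkt in cap:
    pass  # loop body omitted in the report; the crash occurs during iteration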

Expected behavior
The library should not crash parsing tshark output.

Versions (please complete the following information):

  • OS: macOS
  • pyshark version: 0.5.3
  • tshark version: 4.0.1

Example pcap / packet
The following PCAP-file can be used for testing:
crash.pcap.gz

@jonaslejon jonaslejon added the bug label Nov 29, 2022
@thebigdalt

thebigdalt commented Dec 5, 2022

To add another data point, I've encountered the exact same bug on an M1 Mac with the same pyshark and tshark versions with various PCAP files. However, I have not encountered this issue on an x64 machine running Ubuntu 22.04 with the same versions on the same capture files. My guess is it's specific to macOS and/or Apple Silicon hardware.

@mahtin
Contributor

mahtin commented Dec 14, 2022

I came to this issue because I have code that's crashing with the same error while listening on en0 on a MacBook Air.

I tried your pcap file with tshark version: TShark (Wireshark) 4.0.2 (v4.0.2-0-g415456d13370) on an M1 MacBook Air with Big Sur 1.17.1 and sadly could not get a crash. It worked cleanly. I'm also running pyshark version: 0.5.3

(a few moments later)[1]

I went back a rev and downloaded 4.0.1 with tshark version: TShark (Wireshark) 4.0.1 (v4.0.1-0-ge9f3970b1527) and sadly I still could not get your code to crash. :( I did try it quite a few times.

Oh. Python version 3.10.8

I will try to add some specific crash info for my situation. At least I can replicate that part.

Sorry I can't help much more (for now).

[1] Spongebob

@mahtin
Contributor

mahtin commented Dec 14, 2022

(a few hours later)

I found the cause; but not the solution (yet). In pyshark/tshark/output_parser/tshark_xml.py these lines:

def packet_from_xml_packet(xml_pkt, psml_structure=None):
...
        xml_pkt = lxml.objectify.fromstring(xml_pkt, parser)
...

It's the call to lxml.objectify.fromstring() that returns None and then causes pure hell further down the code. There's no test for None.
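As a minimal sketch of the missing check, a guard like the one below would turn the confusing AttributeError into an error that names the real problem. `require_parsed` is a hypothetical helper invented for illustration; it is not part of pyshark or lxml, and it does not fix the underlying parse failure.

```python
def require_parsed(xml_pkt, raw=None):
    """Return xml_pkt unchanged, or raise if the parser yielded None.

    Hypothetical helper for illustration only; not part of pyshark.
    """
    if xml_pkt is None:
        # Include a short preview of the input to make debugging easier.
        preview = "" if raw is None else repr(raw[:80])
        raise ValueError("XML parser returned None for packet data " + preview)
    return xml_pkt


# Usage: wrap the result of the parse call before touching its attributes.
try:
    require_parsed(None, b"<packet>...")
except ValueError as exc:
    message = str(exc)
    print(message)
```

The caller could then catch the ValueError and skip the packet rather than crash the whole capture loop.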

I believe (in my case) that because the input xml_pkt variable contains some Unicode characters, it fails. I have a copy of the value of xml_pkt that causes this. It's attached.

The following code shows fromstring() producing None from the attached file.

import lxml.objectify

# Text mode without an explicit encoding, as in the failing case.
with open('xml-pkt.txt', 'r') as fd:
    xml_pkt = fd.read()
parser = lxml.objectify.makeparser(huge_tree=True, recover=True)
xml_pkt = lxml.objectify.fromstring(xml_pkt, parser)
# Here fromstring() hands back None instead of raising.
if xml_pkt is None:
    print('xml_pkt None')
else:
    print('xml_pkt len() = %d' % len(xml_pkt))

I didn't debug any further.

xml_pkt.txt

BTW: The following parts of the xml file could well be what's triggering the error (if it really is a Unicode issue):

      <field name="" show="_rdlink._tcp.local: type PTR, class IN, Li Vol4ek 🐺iPhone ._rdlink._tcp.local" size="36" pos="443" value="c00c000c00010000114f0018154c6920566f6c34656b20f09f90ba6950686f6e6520c00c">
        <field name="dns.resp.name" showname="Name: _rdlink._tcp.local" size="2" pos="443" show="_rdlink._tcp.local" value="c00c"/>
        <field name="dns.resp.type" showname="Type: PTR (domain name PoinTeR) (12)" size="2" pos="445" show="12" value="000c"/>
        <field name="dns.resp.class" showname=".000 0000 0000 0001 = Class: IN (0x0001)" size="2" pos="447" show="0x0001" value="1" unmaskedvalue="0001"/>
        <field name="dns.resp.cache_flush" showname="0... .... .... .... = Cache flush: False" size="2" pos="447" show="0" value="0" unmaskedvalue="0001"/>
        <field name="dns.resp.ttl" showname="Time to live: 4431 (1 hour, 13 minutes, 51 seconds)" size="4" pos="449" show="4431" value="0000114f"/>
        <field name="dns.resp.len" showname="Data length: 24" size="2" pos="453" show="24" value="0018"/>
        <field name="dns.ptr.domain_name" showname="Domain Name: Li Vol4ek 🐺iPhone ._rdlink._tcp.local" size="24" pos="455" show="Li Vol4ek 🐺iPhone ._rdlink._tcp.local" value="154c6920566f6c34656b20f09f90ba6950686f6e6520c00c"/>
      </field>

or:

      <field name="" show="_rdlink._tcp.local: type PTR, class IN, DN💋._rdlink._tcp.local" size="21" pos="572" value="c00c000c000100001155000906444ef09f928bc00c">
        <field name="dns.resp.name" showname="Name: _rdlink._tcp.local" size="2" pos="572" show="_rdlink._tcp.local" value="c00c"/>
        <field name="dns.resp.type" showname="Type: PTR (domain name PoinTeR) (12)" size="2" pos="574" show="12" value="000c"/>
        <field name="dns.resp.class" showname=".000 0000 0000 0001 = Class: IN (0x0001)" size="2" pos="576" show="0x0001" value="1" unmaskedvalue="0001"/>
        <field name="dns.resp.cache_flush" showname="0... .... .... .... = Cache flush: False" size="2" pos="576" show="0" value="0" unmaskedvalue="0001"/>
        <field name="dns.resp.ttl" showname="Time to live: 4437 (1 hour, 13 minutes, 57 seconds)" size="4" pos="578" show="4437" value="00001155"/>
        <field name="dns.resp.len" showname="Data length: 9" size="2" pos="582" show="9" value="0009"/>
        <field name="dns.ptr.domain_name" showname="Domain Name: DN💋._rdlink._tcp.local" size="9" pos="584" show="DN💋._rdlink._tcp.local" value="06444ef09f928bc00c"/>
      </field>
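To see why these records are not plain ASCII, the hex in the `value` attribute of the first `dns.ptr.domain_name` field above can be decoded by hand. This is a rough sketch using only the standard library; it assumes the usual DNS wire layout of one length byte, the label bytes, then a two-byte compression pointer.

```python
# Hex copied from the value attribute of the first dns.ptr.domain_name field.
raw = bytes.fromhex("154c6920566f6c34656b20f09f90ba6950686f6e6520c00c")

label_len = raw[0]                 # first byte: length of the label
label = raw[1:1 + label_len]       # the label itself, UTF-8 encoded
pointer = raw[1 + label_len:]      # trailing DNS compression pointer

text = label.decode("utf-8")

# Decoding the same bytes as ASCII fails because of the emoji bytes f0 9f 90 ba.
try:
    label.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(label_len, repr(text), pointer.hex(), ascii_ok)
```

The label only decodes cleanly as UTF-8, which fits the suspicion that the encoding is involved.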

@mahtin
Contributor

mahtin commented Dec 15, 2022

Further testing (on a large public network, in this case an airport wifi) shows that the above crash can be triggered by mDNS packets with Unicode (rather than ASCII) names. Setting the packet filter to "not udp port 5353" allows the code to run continuously without hitting this error. Once you allow mDNS packets, the code crashes within a few seconds.

Secondly ... In a modification to my previous code, I could stop fromstring() failing by either reading in my xml-pkt file using open(..., 'rb') or open(..., 'r', encoding='utf-8'). Maybe inside pyshark/tshark/tshark.py the subprocess() should process the output as unicode? I have not experimented with this yet.
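The difference between the read modes can be sketched with the standard library alone. The example below writes one attribute from the snippet above to a temporary file as UTF-8 (an assumption about how the attached file is stored) and reads it back three ways. `latin-1` is only a stand-in for "some codec other than UTF-8"; the thread does not say which default encoding plain open(..., 'r') picked on the affected machine.

```python
import os
import tempfile

# One field from the thread, with the emoji written as an escape.
xml_text = '<field name="dns.ptr.domain_name" show="DN\U0001F48B._rdlink._tcp.local"/>'

# Store the XML as UTF-8 bytes in a temporary file.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as fd:
    fd.write(xml_text.encode("utf-8"))
    path = fd.name

try:
    with open(path, "rb") as fd:                      # raw bytes, no decoding
        as_bytes = fd.read()
    with open(path, "r", encoding="utf-8") as fd:     # explicit, correct codec
        as_utf8 = fd.read()
    with open(path, "r", encoding="latin-1") as fd:   # wrong codec (stand-in)
        as_latin1 = fd.read()
finally:
    os.remove(path)

print(as_bytes.decode("utf-8") == xml_text)  # bytes round-trip intact
print(as_utf8 == xml_text)                   # explicit UTF-8 round-trips intact
print(as_latin1 == xml_text)                 # wrong codec mangles the emoji
```

Reading as bytes or with an explicit encoding removes the dependence on the platform default, which is why both variants are worth trying.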

BTW: I'm not actually snooping on a public network; that was just a way to get a lot of random inbound packets. The packets received from pyshark were sent to /dev/null, and I was just waiting for the error to occur. Lucky for me, there are iPhones around here with emoji names. :)

@mahtin
Contributor

mahtin commented Dec 16, 2022

(sorry - I've been traveling and hence unable to dedicate some focus time to this). However ...

Here's the fix (which must be tested in more cases than just this one):

$ git diff src/pyshark/tshark/output_parser/tshark_xml.py
diff --git a/src/pyshark/tshark/output_parser/tshark_xml.py b/src/pyshark/tshark/output_parser/tshark_xml.py
index e6f4379..b03391d 100644
--- a/src/pyshark/tshark/output_parser/tshark_xml.py
+++ b/src/pyshark/tshark/output_parser/tshark_xml.py
@@ -77,9 +77,9 @@ def packet_from_xml_packet(xml_pkt, psml_structure=None):
     :return: Packet object.
     """
     if not isinstance(xml_pkt, lxml.objectify.ObjectifiedElement):
-        parser = lxml.objectify.makeparser(huge_tree=True, recover=True)
+        parser = lxml.objectify.makeparser(huge_tree=True, recover=True, encoding='utf-8')
         xml_pkt = xml_pkt.decode(errors='ignore').translate(DEL_BAD_XML_CHARS)
-        xml_pkt = lxml.objectify.fromstring(xml_pkt, parser)
+        xml_pkt = lxml.objectify.fromstring(xml_pkt.encode('utf-8'), parser)
     if psml_structure:
         return _packet_from_psml_packet(xml_pkt, psml_structure)
     return _packet_from_pdml_packet(xml_pkt)
$

The real fix is passing xml_pkt.encode('utf-8') to fromstring(); the encoding='utf-8' argument to makeparser() is just to be pedantic. Together, these two changes seem to let mDNS packets flow through the code without crashing.

Once again ... I'm testing on an M1 MacBook Air.
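The decode, clean, re-encode pattern from the diff can be tried without pyshark or lxml. The sketch below makes two loud assumptions: `BAD_XML_CHARS` is a hypothetical stand-in for pyshark's `DEL_BAD_XML_CHARS` (whose exact contents are not shown in this thread), and the standard library's `xml.etree.ElementTree` is swapped in for `lxml.objectify`, so it illustrates the byte handling only, not lxml's behaviour.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for pyshark's DEL_BAD_XML_CHARS: drop C0 control
# characters that XML 1.0 forbids (tab, LF and CR are allowed).
BAD_XML_CHARS = {c: None for c in range(0x20) if c not in (0x09, 0x0A, 0x0D)}


def sanitize_packet_xml(raw: bytes) -> bytes:
    """Decode leniently, strip forbidden characters, re-encode as UTF-8."""
    text = raw.decode("utf-8", errors="ignore").translate(BAD_XML_CHARS)
    return text.encode("utf-8")


# A field from the thread, with a NUL and an invalid byte injected.
raw_pkt = '<field name="dns.ptr.domain_name" show="DN\U0001F48B._rdlink._tcp.local"/>'.encode("utf-8")
raw_pkt = raw_pkt.replace(b"DN", b"D\x00\xffN")

clean = sanitize_packet_xml(raw_pkt)

# Hand the parser bytes, not str, so the encoding is unambiguous.
field = ET.fromstring(clean)
print(field.get("show"))
```

Passing UTF-8 bytes to the parser mirrors the xml_pkt.encode('utf-8') change in the diff: the emoji survives while the injected junk bytes are dropped.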

@mahtin
Contributor

mahtin commented Dec 18, 2022

I've added PR #624.
