Description
Severity
Major
Versions
master,20,19,18,16
Components/Modules
chan_spy, channels
Operating Environment
Right now, testing with branch 18, but this should be on all versions.
Frequency of Occurrence
Frequent
Issue Description
The ChanSpy()
channel iterator holds references to channels for the duration of the listening/spying. These referenced channels will get destroyed first when the channel iteration resumes (when done listening to that one channel). This is a problem for the CEL HANGUP event, because all channels still in the iterator will not get any HANGUP reporting until this resumption: anyone relying on CEL HANGUP time for call duration statistics will get wrong values.
A.k.a. CEL HANGUP times can be wrong (late) and appear "batched".
Components
-
ChanSpy -- iterates over all channels using (e.g. ast_channel_iterator_by_name_new) -- keeps the iterator until the spying is done:
-
CEL hangup events -- are generated because of a publish by the ast_channel destructor:
Lines 2206 to 2213 in 1a7866b
asterisk/main/stasis_channels.c
Lines 948 to 950 in 1a7866b
Lines 910 to 921 in 1a7866b
To get accurate HANGUP timings from CEL, channel destruction should never be delayed for more than a short while.
But because some components (ChanSpy) can hold references to channels that are hung up, destruction can be delayed for an indetermined amount of time.
This results in inaccurate CEL HANGUP timestamps: those who rely on CEL HANGUP time for call duration statistics will get wrong values
Scenario
At t=0, we might have three channels:
- channelA, ref (1)
- channelB, ref (1)
- channelC, ref (1)
At t=1 we start spying on channels using ChanSpy. The channels (hash) container is filtered and turned into a (list) container (multi_container
()) holding a snapshot of then active channels that match the filter: the multi_iterator
(). This list/iterator is returned to the application (chanspy):
- channelA, +1 ref (=2) (
__ao2_link()
->rb_ao2_new_node()
->"Container node creation"
) - channelB, +1 ref (=2)
- channelC, +1 ref (=2)
At t=1, we traverse the iterator. We might decide to skip channelA and start spying on channelB:
- channelA, +1 ref
/* Bump ref of returned object */ ("Next iterator object.")
, replace last_node (NULL), last_node = channelA, return channelA - ignore channelA: -1 ref, get next
- channelB, +1 ref
/* Bump ref of returned object */ ("Next iterator object.")
, replace last_node (channelA, -1 ref), last_node = channelB - start listening on channelB (
channel_spy
)
At this point, the references look like this:
- channelA, +1-1-1 ref (=1)
- channelB, +1 ref (=3)
- channelC, +0 ref (=2)
Now, at t=2, channelA and channelC hang up:
- channelA, -1 ref (=0) -> destroying
- channelB, +0 ref (=3)
- channelC, -1 ref (=1)
First at t=3, when either the spying channel hangs up, or the spyee hangs up, do we exit channel_spy
and do we resume looking at the multi_iterator:
- channelC, -1 ref (=0) -> destroying
This causes the channelC destroying event to fire at t=3 instead of t=2. And that means that the reported CEL time for the hangup event is wrong.
Solutions
- Either the CEL HANGUP event should not be fired by the channel destructor,
- or no component should be allowed to hold references to channels that are hung up (read: channel iterators should be short-lived).
Someone should make a design decision on this. I would say the least impactful is to make the chanspy iterator short-lived, by copying the IDs of the channels and iterating over the channel IDs instead.
But if we can make the HANGUP event fire sooner by moving the AST_FLAG_DEAD+publish to something earlier, that works for me too.
Cheers,
Walter Doekes
OSSO B.V.
(*) The channel iteration:
ast_channel_iterator_by_name_new()
-> ast_channel_callback()
-> ao2_callback_data()
-> __ao2_callback_data()
-> internal_ao2_traverse()
asterisk/main/astobj2_container.c
Lines 260 to 273 in 1a7866b
asterisk/main/astobj2_container.c
Line 356 in 1a7866b
(an ao2_container list is created to hold the filtered channels)
next_channel()
-> ast_channel_iterator_next()
-> ao2_iterator_next()
-> __ao2_iterator_next()
:
asterisk/main/astobj2_container.c
Lines 603 to 604 in 1a7866b
Lines 994 to 999 in 1a7866b
(channels are popped from the ao2_container list, but not as immediate as we would like)
Relevant log output
Many charlies, some of whom are too late:
[May 8 11:50:15] DEBUG[149117][C-00000031] channel.c: Channel 0x7fc9b8021a58 'SIP/charlie-0000004c' destroying
[May 8 11:50:15] DEBUG[149117][C-00000031] channel.c: Channel 0x7fcaa01eb458 'SIP/charlie-0000004b' destroying
[May 8 11:50:15] DEBUG[149117][C-00000031] channel.c: Channel 0x7fcaa027a608 'SIP/charlie-0000004e' destroying
[May 8 11:50:15] DEBUG[149117][C-00000031] channel.c: Channel 0x7fcaa0263a48 'SIP/charlie-0000004d' destroying
[May 8 11:50:15] DEBUG[149117][C-00000031] channel.c: Channel 0x7fc9c001f878 'SIP/charlie-0000004f' destroying
[May 8 11:50:15] DEBUG[149117][C-00000031] channel.c: Channel 0x7fcaa022a6c8 'SIP/charlie-0000005b' destroying
[May 8 11:50:15] DEBUG[149117][C-00000031] channel.c: Channel 0x7fcaa0222048 'SIP/charlie-0000005d' destroying
[May 8 11:50:15] DEBUG[149117][C-00000031] channel.c: Channel 0x7fc9dc01e7c8 'SIP/charlie-0000005e' destroying
[May 8 11:50:15] DEBUG[149117][C-00000031] channel.c: Channel 0x7fc9e8020238 'SIP/charlie-0000005f' destroying
[May 8 11:50:15] DEBUG[149117][C-00000031] channel.c: Channel 0x7fcaa01a34c8 'SIP/charlie-00000041' destroying
[May 8 11:50:15] DEBUG[149117][C-00000031] channel.c: Channel 0x7fc98000cc68 'SIP/charlie-00000042' destroying
[May 8 11:50:15] DEBUG[149117][C-00000031] channel.c: Channel 0x7fc97c00c418 'SIP/charlie-00000044' destroying
[May 8 11:50:15] DEBUG[149117][C-00000031] channel.c: Channel 0x7fc99c01f868 'SIP/charlie-00000047' destroying
[May 8 11:50:15] DEBUG[149117][C-00000031] channel.c: Channel 0x7fcaa01cb7a8 'SIP/charlie-00000046' destroying
[May 8 11:50:15] DEBUG[149117][C-00000031] channel.c: Channel 0x7fcaa01e3208 'SIP/charlie-00000049' destroying
[May 8 11:50:15] DEBUG[149117][C-00000031] channel.c: Channel 0x7fc9bc021b18 'SIP/charlie-00000050' destroying
[May 8 11:50:15] DEBUG[149117][C-00000031] channel.c: Channel 0x7fc9c8021078 'SIP/charlie-00000052' destroying
[May 8 11:50:15] DEBUG[149117][C-00000031] channel.c: Channel 0x7fcaa02a91b8 'SIP/charlie-00000053' destroying
[May 8 11:50:15] DEBUG[149117][C-00000031] channel.c: Channel 0x7fc9d00211c8 'SIP/charlie-00000056' destroying
[May 8 11:50:15] DEBUG[149117][C-00000031] channel.c: Channel 0x7fc9e800a908 'SIP/charlie-0000000e' destroying
[May 8 11:50:15] DEBUG[149117][C-00000031] channel.c: Channel 0x7fcaa009c888 'SIP/charlie-0000000c' destroying
[May 8 11:50:15] DEBUG[149117][C-00000031] channel.c: Channel 0x7fcaa00ed368 'SIP/charlie-0000001f' destroying
[May 8 11:50:15] DEBUG[149117][C-00000031] channel.c: Channel 0x7fcaa028ec78 'SIP/alice-00000060' destroying
[May 8 11:50:15] DEBUG[149070][C-0000001a] channel.c: Channel 0x7fcaa014ee18 'SIP/alice-00000031' destroying
[May 8 11:50:15] DEBUG[149081][C-0000001a] channel.c: Channel 0x7fc98800bad8 'SIP/alice-00000034' destroying
[May 8 11:50:15] DEBUG[149080][C-00000023] channel.c: Channel 0x7fcaa01e0d98 'SIP/bob-00000043' destroying
[May 8 11:50:15] DEBUG[149100][C-00000023] channel.c: Channel 0x7fc98400cb18 'SIP/bob-00000045' destroying
[May 8 11:50:15] DEBUG[149096][C-0000002a] channel.c: Channel 0x7fcaa0294298 'SIP/charlie-00000051' destroying
[May 8 11:50:15] DEBUG[149110][C-0000002a] channel.c: Channel 0x7fc9c4020f38 'SIP/charlie-00000054' destroying
Singling out a single Charlie:
[May 8 11:49:49] VERBOSE[149117][C-00000031] app_chanspy.c: Spying on channel SIP/alice-00000034
[May 8 11:50:07] DEBUG[149106][C-00000026] channel.c: Channel 0x7fc9b8021a58 'SIP/charlie-0000004c' hanging up. Refs: 3
[May 8 11:50:15] VERBOSE[149117][C-00000031] app_chanspy.c: Done Spying on channel SIP/alice-00000034
[May 8 11:50:15] DEBUG[149117][C-00000031] channel.c: Channel 0x7fc9b8021a58 'SIP/charlie-0000004c' destroying
As seen, the destroying is late.
Asterisk Issue Guidelines
- Yes, I have read the Asterisk Issue Guidelines