Asterisk 13.5+ extra-escapes all channel variables, including $RECOG_RESULT #646

sfgeorge · 2019-08-21T23:54:55Z

On Asterisk 13.5+ combined with LumenVox ASR, we're noticing that UniMRCP-based speech recognition fails with the following error: `ERROR Adhearsion::Translator::Asterisk: <Nokogiri::XML::SyntaxError> The value following "version" in the XML declaration must be a quoted string.`

The reason is that Asterisk 13.5+ now escapes several characters with a backslash (`\`) in all VarSet (channel variable set) events. So ALL channel variables, including `$RECOG_RESULT`, which conveys the NLSML result of speech recognition, are now encoded differently than before.

In addition, although Adhearsion enables the UniMRCP `uer` option (URI-encoded results), the single quote `'` is one of the characters that is not typically URI-encoded. The single quotes in a LumenVox response therefore arrive un-encoded, which lets Asterisk 13.5+'s new escaping step in and replace each `'` with `\'`:

```
...
Variable: RECOG_RESULT
Value: %3C%3Fxml%20version%3D\'1.0\'%20encoding%3D\'ISO-8859-1\'%20%3F%3E%3Cresult%3E%3Cinterpretation%20grammar%3D%22builtin%3Agrammar%2Fnumber%22%20confidence%3D%220.96%22%3E%3Cinput%20mode%3D%22speech%22%3Eseven%3C%2Finput%3E%3Cinstance%3E7%3C%2Finstance%3E%3C%2Finterpretation%3E%3C%2Fresult%3E

Decoded:
<?xml version=\'1.0\' encoding=\'ISO-8859-1\' ?><result><interpretation grammar="builtin:grammar/number" confidence="0.96"><input mode="speech">seven</input><instance>7</instance></interpretation></result>
... ❌ malformed with \'
```

In contrast, here's how that variable would be received prior to Asterisk 13.5:

```
...
Variable: RECOG_RESULT
Value: %3C%3Fxml%20version%3D'1.0'%20encoding%3D'ISO-8859-1'%20%3F%3E%3Cresult%3E%3Cinterpretation%20grammar%3D%22builtin%3Agrammar%2Fnumber%22%20confidence%3D%220.92%22%3E%3Cinput%20mode%3D%22speech%22%3Eseven%3C%2Finput%3E%3Cinstance%3E7%3C%2Finstance%3E%3C%2Finterpretation%3E%3C%2Fresult%3E

Decoded:
<?xml version='1.0' encoding='ISO-8859-1' ?><result><interpretation grammar="builtin:grammar/number" confidence="0.92"><input mode="speech">seven</input><instance>7</instance></interpretation></result>
... ✅ valid NLSML
```
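The order of operations matters when working around this. Below is a minimal Ruby sketch, not Adhearsion's actual fix, that decodes a trimmed copy of the Asterisk 13.5+ value above (only the XML declaration and the `<instance>` element are kept). The helper name `decode_recog_result` is hypothetical. It assumes that, with the `uer` option, backslashes and control characters in the payload are percent-encoded, so any raw backslash in the AMI value was added by Asterisk. It therefore strips those backslashes *before* percent-decoding; doing it afterwards could mangle a backslash the recognizer really sent as `%5C`.

```ruby
require 'uri'

# Hypothetical helper (not Adhearsion's API): undo Asterisk 13.5+'s AMI
# backslash escaping, then percent-decode the `uer`-encoded NLSML.
def decode_recog_result(ami_value)
  # Step 1: drop the backslash Asterisk put in front of ' " ? or \.
  # A backslash sent by the recognizer itself would arrive as %5C,
  # so it is not affected by this step.
  unescaped = ami_value.gsub(/\\(['"?\\])/, '\1')
  # Step 2: percent-decode (%3C -> <, %20 -> space, and so on).
  URI.decode_www_form_component(unescaped)
end

# Trimmed copy of the Asterisk 13.5+ RECOG_RESULT value shown above.
raw = "%3C%3Fxml%20version%3D\\'1.0\\'%20encoding%3D\\'ISO-8859-1\\'%20%3F%3E" \
      "%3Cresult%3E%3Cinstance%3E7%3C%2Finstance%3E%3C%2Fresult%3E"

puts URI.decode_www_form_component(raw)
# <?xml version=\'1.0\' encoding=\'ISO-8859-1\' ?><result><instance>7</instance></result>
puts decode_recog_result(raw)
# <?xml version='1.0' encoding='ISO-8859-1' ?><result><instance>7</instance></result>
```

`URI.decode_www_form_component` also turns `+` into a space. The sample uses `%20`, but if a payload could contain a literal `+`, `URI.decode_uri_component` (Ruby 3.2+) is the stricter choice.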

Backslash-escaping of the following characters was introduced by ASTERISK-24934, "[patch] Asterisk manager output does not escape control characters":

| ASCII character in C | New 2-character AMI representation in Asterisk >= 13.5 |
|---|---|
| `\a` (0x07) Alert (beep, bell) | `\` `a` (0x5C 0x61) |
| `\b` (0x08) Backspace | `\` `b` (0x5C 0x62) |
| `\f` (0x0C) Form feed (page break) | `\` `f` (0x5C 0x66) |
| `\n` (0x0A) Newline (line feed) | `\` `n` (0x5C 0x6E) |
| `\r` (0x0D) Carriage return | `\` `r` (0x5C 0x72) |
| `\t` (0x09) Horizontal tab | `\` `t` (0x5C 0x74) |
| `\v` (0x0B) Vertical tab | `\` `v` (0x5C 0x76) |
| `\` (0x5C) Backslash | `\` `\` (0x5C 0x5C) |
| `'` (0x27) Apostrophe or single quotation mark | `\` `'` (0x5C 0x27) |
| `"` (0x22) Double quotation mark | `\` `"` (0x5C 0x22) |
| `?` (0x3F) Question mark | `\` `?` (0x5C 0x3F) |
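For the general case, the table above can be reversed in a single left-to-right pass. This Ruby sketch is illustrative only: `AMI_UNESCAPES` and `ami_unescape` are hypothetical names, not part of ruby_ami. Backslash sequences that are not in the table are left untouched.

```ruby
# Reverse of the ASTERISK-24934 escaping table: the character after the
# backslash maps back to the original byte. Hypothetical, not ruby_ami code.
AMI_UNESCAPES = {
  'a' => "\a", 'b' => "\b", 'f' => "\f", 'n' => "\n",
  'r' => "\r", 't' => "\t", 'v' => "\v",
  '\\' => '\\', "'" => "'", '"' => '"', '?' => '?'
}.freeze

# One gsub pass scans left to right, so an escaped backslash is consumed
# before the character that follows it can be read as an escape.
def ami_unescape(value)
  value.gsub(/\\([abfnrtv\\'"?])/) { AMI_UNESCAPES.fetch(Regexp.last_match(1)) }
end
```

Scanning once matters: an escaped backslash followed by `n` (`\\n` on the wire) must decode to a backslash and an `n`, not to a newline, which chained `gsub` calls (first `\\`, then `\n`) would get wrong.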

Some Strategies for Resolution

1. We could always attempt to unescape backslash sequences, in all versions of Asterisk.
   - Cons: This would be a change in behavior, and could corrupt data on Asterisk < 13.5.
2. We could interrogate Asterisk to determine whether these characters are backslash-escaped, and activate auto-unescaping only if they are.
   - Pro: A zero-configuration, "it just works" solution.
   - Cons:
     - A complex, stateful solution.
     - Introduces the concept of separate modes of Asterisk compatibility.
3. Similar to 2, but slightly less complicated: we could decide whether to unescape based on a new `config.core.asterisk.unescape_vars` option.
   - Pro:
     - Straightforward to implement and test.
     - We can choose whether to default the option to ON or OFF.
   - Cons:
     - Introduces the concept of separate modes of Asterisk compatibility.
     - NOT zero-configuration. If you hit this error, you may have to search the web for it and learn that you need to flip this option ON to resolve it.

My leaning is towards option 3, defaulted ON. But I'm very interested in other points of view on the matter. 👀

Cc: @gfaza @lpradovera @bklang


sfgeorge · 2019-08-23T18:33:51Z

An argument could also be made that this Asterisk bug should be ~~fixed~~ given a workaround in ruby_ami instead of adhearsion. 🤔

Right now, ruby_ami doesn't know/care what specifically a "VarSet" event is, so it would have to get its hands a little dirty in order to decode VarSets and only VarSets.
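Scoping the workaround to VarSet events could look roughly like the following Ruby sketch. `AmiEvent` is a hypothetical stand-in built from a `Struct`; ruby_ami's real event class and header access may differ. Only the `Value` header of `VarSet` events is unescaped, and every other event is returned unchanged.

```ruby
# Hypothetical stand-in for a parsed AMI event (ruby_ami's classes may differ).
AmiEvent = Struct.new(:name, :headers)

# Reverse mapping of the Asterisk >= 13.5 escapes listed in the table above.
VARSET_UNESCAPES = {
  'a' => "\a", 'b' => "\b", 'f' => "\f", 'n' => "\n", 'r' => "\r",
  't' => "\t", 'v' => "\v", '\\' => '\\', "'" => "'", '"' => '"', '?' => '?'
}.freeze

# Decode VarSets and only VarSets; all other events pass through as-is.
def decode_event(event)
  return event unless event.name == 'VarSet'

  value = event.headers['Value']
  return event if value.nil?

  decoded = value.gsub(/\\([abfnrtv\\'"?])/) { VARSET_UNESCAPES.fetch(Regexp.last_match(1)) }
  AmiEvent.new(event.name, event.headers.merge('Value' => decoded))
end
```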

But a benefit would be that all potential consumers of ruby_ami would avoid the Asterisk bug, not just Adhearsion.

sfgeorge · 2019-08-23T21:10:43Z

Some good/handy news:

As of ruby_ami 2.4.0 (ruby_ami#34), it is easy to detect which AMI_VERSION the current AMI stream is connected to.
When Asterisk advanced from 13.4 to 13.5 and introduced this issue, AMI_VERSION was bumped as well, from 2.7.0 to 2.8.0.
Therefore, it should be possible to activate decoding of backslashed characters only when Stream#version is 2.8.0 or later.
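A minimal Ruby sketch of that version gate, assuming `Stream#version` yields something whose `to_s` is a dotted version string (an assumption; the actual return type in ruby_ami 2.4.0 may differ). `escapes_var_set?` is a hypothetical name. `Gem::Version` is used instead of a plain string comparison because, compared as strings, `"2.10.0"` would sort before `"2.8.0"`.

```ruby
require 'rubygems' # Gem::Version; normally already loaded by Ruby

# First AMI version with the new escaping (Asterisk 13.5, per this thread).
FIRST_ESCAPING_AMI_VERSION = Gem::Version.new('2.8.0')

# Hypothetical helper: true when VarSet values should be backslash-unescaped.
# `ami_version` is assumed to be what ruby_ami's Stream#version reports.
def escapes_var_set?(ami_version)
  Gem::Version.new(ami_version.to_s) >= FIRST_ESCAPING_AMI_VERSION
end

escapes_var_set?('2.7.0') # => false (Asterisk 13.4)
escapes_var_set?('2.8.0') # => true  (Asterisk 13.5)
```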

sfgeorge · 2019-08-26T14:32:30Z

Moved to adhearsion/ruby_ami#47

sfgeorge added a commit to sfgeorge/adhearsion that referenced this issue Aug 23, 2019

Parse escape changes to Asterisk AMI VarSet events.

6eb88e7

Characters like ' " ? now have backslashes in front of them on VarSet events. Closes adhearsion#646

sfgeorge added a commit to sfgeorge/adhearsion that referenced this issue Aug 23, 2019

Parse escape changes to Asterisk AMI VarSet events.

4b9ee42

Characters like ' " ? now have backslashes in front of them on VarSet events. Closes adhearsion#646

sfgeorge closed this as completed Aug 26, 2019
