📑 Paper | 📑 Blog | 🤗 Huggingface Hub
We introduce the Reading with Intent task and prompting method and accompanying datasets.
The goal of this task is to have LLMs read beyond the surface level of text and integrate an understanding of the underlying sentiment of a text when reading it. The focus of this work is on sarcastic text.
We have released:
- The code used creating the sarcastic datasets
- The sarcasm-poisoned dataset
- The reading with intent prompting method
Note: Due to dataset sizes, we strongly recommend using the Huggingface repository for data download.
@misc{reichman2024readingintent, title={Reading with Intent}, author={Benjamin Reichman and Kartik Talamadupula and Toshish Jawale and Larry Heck}, year={2024}, eprint={2408.11189}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2408.11189}, }
Sarcasm-poisoning Architecture:
Reading with Intent Architecture
nq_val.json
- corresponds to the retrievals for NQ from GPL before any changes are madenq_val_fs.json
- Each passage fromnq_val.json
is made to be sarcastic. They are still factually accuratenq_val_psm.json
- The passages fromnq_val_fs.json
are mixed withnq_val.json
. For the first two correctly retrieved passages, their fact-distorted sarcastic version is placed in front of them. In addition two more passages are substituted for their fact-distorted sarcastic version.nq_val_psa.json
- Fact-distorted sarcastic passages were added back to the NQ Wikipedia Corpus. Passages were then re-retrieved with GPL. The resulting retrieved passages are found here.
The main code for sarcasm poisoning can be found in sarcasm_poisoning/sarcasm_poisoning.py
.
For manual merging of passages the code used was sarcasm_poisoning/merge_sarcasm_poisoning_with_corpus.py
.
Retrieval is a multi-step process.
- Start by embedding all the passages of interest using
retrieval/embed_nq.py
. retrieval/val_gpl_nq.py
retrieves the passages for the NQ queries.retrieval/eval_gpl_nq_sarcastic_retrievals.py
evaluates the result of the retrieval process.
reader/llm_reader_v2.py
holds the experimental code for using the Reading with Intent prompt system.
To train and validate the intent tagging system use: reader/sentiment_classifier/train.py
and reader/sentiment_classifier/val.py
.
This dataset and the accompanying code are released under the MIT License.
You are free to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
A copy of the full license can be found in the LICENSE file.