Abstract: Automatically generating data visualizations in response to human utterances on datasets necessitates a deep semantic understanding of the utterance, including implicit and explicit references to data attributes, visualization tasks, and necessary data preparation steps. Natural Language Interfaces (NLIs) for data visualization have explored ways to infer such information, yet challenges persist due to inherent uncertainty in human speech. Recent advances in Large Language Models (LLMs) provide an avenue to address these challenges, but their ability to extract the relevant semantic information remains unexplored. In this study, we evaluate four publicly available LLMs (GPT-4, Gemini-Pro, Llama3, and Mixtral), investigating their ability to comprehend utterances even in the presence of uncertainty and to identify the relevant data context and visual tasks. Our findings reveal that LLMs are sensitive to uncertainties in utterances. Despite this sensitivity, they are able to extract the relevant data context. However, LLMs struggle with inferring visualization tasks. Based on these results, we highlight future research directions on using LLMs for visualization generation. Our supplementary materials have been shared in this repository.
This repository contains the codebase used for prompting the LLMs and performing comparisons with human annotations. We evaluated two proprietary and two open-source LLMs.
Proprietary LLMs. We evaluated OpenAI's GPT4-Turbo and Google's Gemini-Pro. GPT4-Turbo has a training data cutoff of December 2023, and Gemini-Pro's training data cutoff is described as "early 2023" according to Google AI documentation. We used the Application Programming Interfaces (APIs) of both models to generate responses for the 500 utterances in our corpus.
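For reference, here is a minimal sketch of how such API calls can be made. It assumes the `openai` and `google-generativeai` Python packages with API keys stored in environment variables; the actual prompt templates and response parsing live in the GPT_Gemini_Prompting_Scripts folder.

```python
# Sketch of prompting both proprietary models via their Python SDKs.
# Assumes OPENAI_API_KEY and GOOGLE_API_KEY are set in the environment;
# the utterance below is an illustrative placeholder.
import os

from openai import OpenAI
import google.generativeai as genai

utterance = "Show the average price by neighborhood"  # example utterance

# GPT4-Turbo via the OpenAI chat completions API
openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
gpt_response = openai_client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": utterance}],
)
print(gpt_response.choices[0].message.content)

# Gemini-Pro via the Google Generative AI SDK
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
gemini_model = genai.GenerativeModel("gemini-pro")
gemini_response = gemini_model.generate_content(utterance)
print(gemini_response.text)
```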
Open Source LLMs. We evaluated two open-source LLMs, Llama3 and Mixtral, using the LLaMA-Factory codebase. Llama3 has 70 billion parameters and a context length of 8,000 tokens, with a knowledge cutoff of December 2023. Mixtral-8x7B-Instruct has 46.7 billion parameters and similarly has a knowledge cutoff of December 2023.
Our experimental setup for the open-source models used an NVIDIA H100 GPU paired with a 48-core Intel Sapphire Rapids CPU and 100GB of system memory. Both models were run in 4-bit quantization with flash attention enabled to speed up inference. Inference took approximately 2 hours for Llama3 and about 3 hours for Mixtral.
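As an illustration of this setup, the sketch below loads a model in 4-bit quantization with flash attention via Hugging Face transformers. This is an assumption-laden example rather than the exact LLaMA-Factory configuration we used; the model ID and generation settings are illustrative.

```python
# Illustrative sketch: load an open-source model in 4-bit quantization with
# flash attention using Hugging Face transformers. Requires the
# `bitsandbytes` and `flash-attn` packages on a supported GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"  # hypothetical model choice

# 4-bit quantization to fit the model on a single GPU
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    attn_implementation="flash_attention_2",  # flash attention to expedite inference
    device_map="auto",
)

inputs = tokenizer("Show the average price by neighborhood", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```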
To ensure you have all the required packages for this project, please run

```
pip install -r ./requirements.txt
```

in the terminal. You should then be able to run the necessary scripts in this repo.
This project contains the following folders:
Datasets: Contains all 37 datasets referenced by the 500 utterances used in this study. All files are in .csv format.
GPT_Gemini_Prompting_Scripts: Contains the scripts used to prompt the proprietary LLMs (GPT4-Turbo and Gemini-Pro) evaluated in this study.
Llama_Mixtral_Prompting_Scripts: Contains the scripts used to prompt the open-source LLMs evaluated in this study. Also contains JSON results from the Llama3 and Mixtral runs.
Output Analysis: Contains the scripts used to evaluate the responses generated by all LLMs evaluated in this study.