All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Increased compatibility with `PromptingTools` up to 0.71.
- Increased compatibility with `PromptingTools` up to 0.64.
- Added a new template `TopicLabelerQuestionsWithContext` to generate topic labels for questions in a given context (e.g., chatbot inputs).
- Added a new function `topic_tree` to generate a topic tree for a given topic level (display with `print_tree`).
- Updated to use `PromptingTools` 0.44.0.
  - That implicitly changes the default chat model to `gpt-4o-mini`.
- Minor updates to labeling templates to ensure that the labels are plain text, no markdown or code.
- Re-formatted code to SciML style guide.
- Upstreamed function `wrap_string` from `LLMTextAnalysis` to `PromptingTools`.
- Fixed a bug where `wrap_string` would not properly handle large words with a lot of Unicode characters.
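
To illustrate the kind of problem this entry refers to, here is a minimal sketch of Unicode-safe wrapping. The function `wrap_text_sketch` and its `width` argument are hypothetical names invented for this example. It is not the `PromptingTools` implementation of `wrap_string`, only an illustration of splitting over-long words by character rather than by byte.

```julia
# Hypothetical helper for illustration; NOT the PromptingTools `wrap_string`.
# Wraps `text` so that no line exceeds `width` characters. Words longer than
# `width` are split by character (not by byte), which keeps multi-byte
# Unicode characters intact.
function wrap_text_sketch(text::AbstractString, width::Integer = 10)
    width > 0 || throw(ArgumentError("width must be positive"))
    lines = String[]
    current = ""
    for word in split(text)
        chars = collect(word)  # Vector{Char}, safe to index by position
        # Break an over-long word into chunks of at most `width` characters.
        pieces = [String(chars[i:min(i + width - 1, end)])
                  for i in 1:width:length(chars)]
        for piece in pieces
            if isempty(current)
                current = piece
            elseif length(current) + 1 + length(piece) <= width
                current = current * " " * piece
            else
                push!(lines, current)
                current = piece
            end
        end
    end
    isempty(current) || push!(lines, current)
    return join(lines, "\n")
end

wrapped = wrap_text_sketch("naïve ünïcödé-wörd-thät-ïs-löng ok", 8)
println(wrapped)
```

Indexing a collected `Vector{Char}` avoids the invalid-index errors that byte-based slicing of a `String` can raise when a cut lands inside a multi-byte character.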
- Added a classification function `train_classifier` to train a model to classify documents into a set of predefined labels (as opposed to the more open-ended topic modeling in `build_clusters!`). You can either provide a small set of labeled documents to train the model (that are in the `index`), or just specify `num_samples` and the LLM will generate its own training data based on the `labels` and `labels_description` provided.
- Added a new template `TextWriterFromLabel` to generate synthetic documents for any given label (=topic).
- Added methods for `build_clusters!` to add custom topic levels, e.g., from a `TrainedClassifier` (`build_clusters!(index, cls; topic_level="MyTopics")`) or directly by providing a vector of document `assignments` (`build_clusters!(index, assignments; topic_level="MyTopics")`). The convention is to use `topic_level::Integer` for auto-generated topics and `topic_level::String` for custom topics.
- Updated to use `PromptingTools` 0.15.
- Fixed a bug where `keywords` were not properly filtered before being provided to the auto-labeling function.
- Added a new example on topic label customization (`examples/3_customize_topic_labels.jl`) and the corresponding sections in the FAQ.
- Added a few string cleanup tricks in the `build_topic` function to strip unnecessary repetition of the prompt template in the generated labels.
- Added new templates `TopicLabelerWithInstructions` and `TopicSummarizerWithInstructions` that include the placeholder `instructions` to allow users to easily customize the labels and summaries, respectively.
- Fixed small typos in templates `TopicLabelerBasic` and `TopicSummarizerBasic`.
- Updated logic in `plot` to ensure topic labels are generated only when necessary. Use `build_clusters!` to force the generation of topic labels, or `plot` to generate them only if necessary.
- Increased compatibility for PromptingTools to 0.12.
- Fixed a bug where `plot()` would error with `UndefVarError(scores1)`.
- `wrap_string` utility would error with `SubString` chunks. Now it works with any `AbstractString` type.
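
One common way this class of error arises in Julia is a method signature restricted to `String`, while `split` returns `SubString{String}` elements. The sketch below illustrates that mechanism with two hypothetical functions. It is an assumption that the original `wrap_string` error had this cause, as the entry above only states that `SubString` chunks failed.

```julia
# Hypothetical functions for illustration only; not part of any library.
count_chars_strict(s::String) = length(s)           # accepts only `String`
count_chars_generic(s::AbstractString) = length(s)  # accepts any string type

# `split` returns SubString{String} elements, not String.
chunks = split("alpha beta")

# The strict method has no match for a SubString, so Julia throws a MethodError.
strict_failed = try
    count_chars_strict(chunks[1])
    false
catch err
    err isa MethodError
end

# The generic method accepts the SubString directly.
generic_result = count_chars_generic(chunks[1])

println("strict method failed: ", strict_failed)
println("generic method result: ", generic_result)
```

Typing arguments as `AbstractString` lets a function accept `String`, `SubString`, and other string types without conversion.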
- Changed compat for PromptingTools to 0.10.0 (with new default models, i.e., default embeddings will not match the previous version).
- Updated documentation to show Example 2 for concept/spectrum training.
- Added `train_concept`. Introduces the ability to train a model focusing on a single, specific concept within a collection of documents. This function helps in identifying and scoring the presence or intensity of the selected concept across the document set. Ideal for thematic studies, sentiment analysis, or tracking specific ideas in the text.
- Added `train_spectrum`. Adds functionality to analyze documents across a spectrum defined by two contrasting concepts. This feature allows for a comparative analysis, providing insights into how documents align or contrast with two polar themes or sentiments.
- Spectrum and concept can be plotted using the `plot` function.
- Improved plotting support:
  - Added package extension for `PlotlyJS` for the `plot` function.
  - Enabled the `plot` function to accept an arbitrary `hoverdata` table with information to be added to the tooltip for each document (expects Tables.jl-compatible data).
- Initial release with support for Plots.jl and building and labeling topics