syllable is a small collection of tools for counting syllables and polysyllables. The tools rely primarily on data.table hash table lookups, resulting in fast syllable counting.
- Main Functions
- Installation
- Contact
- Examples
- Count Syllables In a String
- Count Syllables In a Vector of Strings
- Sum the Syllables In a Vector of Strings by Grouping Variable(s)
- Tally the Short/Poly-Syllabic Words by Group(s)
- Readability Word Statistics by Grouping Variable(s)
- Visualize Poly Syllable Distributions
- Visualize Poly Syllable Distributions by Group
The main functions follow the format of action_object.
The following table outlines the actions. The Example Output column corresponds to this string: "I like chicken sandwiches."
Action | Description | Returns | Example Output |
---|---|---|---|
count | One integer per word | A vector per string | 1, 1, 2, 3 |
sum | Sum of syllable counts | An integer per string | 7 |
tally * | Sum of syllable attributes | An integer per string | polysyllable tallies = 1 |
* The addition of _mono, _di, _poly, _short (monosyllabic + disyllabic), or _both (short & polysyllabic) to tally allows the user to specify which syllable attribute is being tallied.
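The three actions can be illustrated by hand in base R. The sketch below is not the package's implementation; it starts from the per-word counts given in the table for "I like chicken sandwiches." and assumes that "polysyllabic" means three or more syllables (the README only states that short = monosyllabic + disyllabic).

```r
# Per-word syllable counts for "I like chicken sandwiches." (from the table above)
counts <- c(1L, 1L, 2L, 3L)

total <- sum(counts)         # the "sum" action: one integer per string
mono  <- sum(counts == 1L)   # words with exactly one syllable
di    <- sum(counts == 2L)   # words with exactly two syllables
poly  <- sum(counts >= 3L)   # assumption: polysyllabic = 3 or more syllables
short <- mono + di           # short = monosyllabic + disyllabic

c(sum = total, mono = mono, di = di, short = short, poly = poly)
```

The sum and poly values agree with the Example Output column: a sum of 7 and a polysyllable tally of 1.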
The following table outlines the objects acted upon:
Object | Description | Example |
---|---|---|
string | A character string | "I like chicken sandwiches." |
vector * | A vector of character strings | c("I like it.", "Look out!") |
* The addition of _by to vector allows the user to aggregate by one or more vectors of grouping variables.
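The action_object naming scheme can be sketched by pasting the action and object parts together. This is only an illustration in base R: it assumes every action/object combination exists as a function, and the variable names (actions, objects, fun_names) are hypothetical.

```r
# Action prefixes: count, sum, and the five tally variants
actions <- c("count", "sum",
             paste("tally", c("mono", "di", "poly", "short", "both"), sep = "_"))

# Object suffixes: a single string, a vector, or a vector with grouping variable(s)
objects <- c("string", "vector", "vector_by")

# Every action_object combination, flattened to a character vector
fun_names <- as.vector(outer(actions, objects, paste, sep = "_"))

length(fun_names)
head(fun_names)
```

Seven actions crossed with three objects gives 21 candidate names, such as count_string and tally_poly_vector_by.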
The function count_vector provides a vector of integer counts, one count per word, for each string it is given. Because each string yields its own vector, count_vector returns a list of integer vectors.
count_vector(c("I like it.", "Look out!"))
## $`1`
## [1] 1 1 1
##
## $`2`
## [1] 1 1
Each of the main functions is optimized to do its task efficiently. While one could sum the per-word counts from count_vector(x), for example with sapply(count_vector(x), sum), and achieve the same results as sum_vector(x), it would be less efficient.
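The equivalence described above can be checked with a short sketch. It uses only count_vector and sum_vector from this README, and it assumes (per the actions table) that sum_vector returns one number per string. The requireNamespace guard is there so the snippet does nothing when syllable is not installed.

```r
x <- c("I like it.", "Look out!")

if (requireNamespace("syllable", quietly = TRUE)) {
    # Indirect route: per-word counts first, then one sum per string
    per_word   <- syllable::count_vector(x)
    via_counts <- vapply(per_word, sum, numeric(1))

    # Direct route: let the package sum the syllables itself
    direct <- syllable::sum_vector(x)

    print(unname(via_counts))
    print(direct)
}
```

Given the count_vector output shown earlier for these two strings, the per-string sums should be 3 and 2.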
The available syllable functions that follow the format of action_object are:
count_string | tally_both_string | tally_mono_string | tally_short_string |
count_vector | tally_both_vector | tally_mono_vector | tally_short_vector |
count_vector_by | tally_both_vector_by | tally_mono_vector_by | tally_short_vector_by |
sum_string | tally_di_string | tally_poly_string | |
sum_vector | tally_di_vector | tally_poly_vector | |
sum_vector_by | tally_di_vector_by | tally_poly_vector_by | |
Available Variable Functions
To download the development version of syllable, download the zip ball or tar ball, decompress it, and run R CMD INSTALL on it, or use the pacman package to install the development version:
if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh(
'trinker/lexicon',
'trinker/textclean',
'trinker/textshape',
'trinker/syllable'
)
You are welcome to:
- submit suggestions and bug-reports at: https://github.com/trinker/syllable/issues
- send a pull request on: https://github.com/trinker/syllable/
- compose a friendly e-mail to: tyler.rinker@gmail.com
The following examples demonstrate the functionality of a select sample of syllable functions.
Counts the number of syllables for each word in a string.
count_string("I like chicken and eggs for breakfast")
## [1] 1 1 2 1 1 1 2
sents <- c("I like chicken.", "I want eggs benedict for breakfast.")
count_vector(sents)
## $`1`
## [1] 1 1 2
##
## $`2`
## [1] 1 1 1 3 1 2
# Label each count with its (lower-cased, punctuation-stripped) word
Map(function(x, y) setNames(x, y),
    count_vector(sents),
    strsplit(gsub("[^a-z ]", "", tolower(sents)), "\\s+")
)
## $`1`
## i like chicken
## 1 1 2
##
## $`2`
## i want eggs benedict for breakfast
## 1 1 1 3 1 2
dat <- data.frame(
    text = c("I like chicken.", "I want eggs benedict for breakfast.", "Really?"),
    group = c("A", "B", "A")
)
sum_vector_by(dat$text, dat$group)
## group n.words count
## 1: A 4 7
## 2: B 6 9
dat <- data.frame(
    text = c("I like excellent chicken.", "I want eggs benedict now.", "Really?"),
    group = c("A", "B", "A")
)
tally_both_vector_by(dat$text, dat$group)
## group n.words short poly
## 1: A 5 3 2
## 2: B 5 4 1
with(presidential_debates_2012, tally_both_vector_by(dialogue, person))
## person n.words short poly
## 1: OBAMA 18319 16286 2033
## 2: ROMNEY 19924 17858 2066
## 3: CROWLEY 1672 1525 147
## 4: LEHRER 765 674 91
## 5: QUESTION 583 486 97
## 6: SCHIEFFER 1445 1289 156
with(presidential_debates_2012, readability_word_stats_by(dialogue, list(person, time)))
## person time n.sents n.words n.chars n.sylls n.shorts n.polys
## 1: OBAMA time 1 179 3599 16002 5221 3221 378
## 2: OBAMA time 2 494 7477 32459 10654 6696 781
## 3: OBAMA time 3 405 7243 32288 10675 6369 874
## 4: ROMNEY time 1 279 4085 17984 5875 3646 439
## 5: ROMNEY time 2 560 7536 32504 10720 6788 748
## 6: ROMNEY time 3 569 8303 35824 11883 7424 879
## 7: CROWLEY time 2 165 1672 6904 2308 1525 147
## 8: LEHRER time 1 87 765 3256 1087 674 91
## 9: QUESTION time 2 40 583 2765 930 486 97
## 10: SCHIEFFER time 3 133 1445 6234 2058 1289 156
## n.complexes
## 1: 378
## 2: 781
## 3: 873
## 4: 439
## 5: 746
## 6: 878
## 7: 147
## 8: 91
## 9: 97
## 10: 156
if (!require("pacman")) install.packages("pacman")
pacman::p_load(dplyr, ggplot2, scales)
tally_both_vector(presidential_debates_2012$dialogue) %>%
    mutate(Duration = seq_along(poly)) %>%        # sentence index
    rowwise() %>%
    filter((short + poly) > 4) %>%                # keep sentences longer than 4 words
    mutate(
        short = short/(short + poly),             # proportion of short words
        poly = 1 - short,                         # proportion of polysyllabic words
        size = poly > .3                          # flag heavily polysyllabic sentences
    ) %>%
    ggplot(aes(Duration, poly)) +
        geom_text(aes(label = Duration, size = size, color = size)) +
        coord_flip() +
        scale_size_manual(values = c(1.5, 2.5), guide = "none") +
        scale_color_manual(values = c("grey75", "black"), guide = "none") +
        scale_x_reverse() +
        scale_y_continuous(labels = scales::percent) +
        ylab("Poly-syllabic") +
        xlab("Duration (sentences)") +
        theme_bw()
if (!require("pacman")) install.packages("pacman")
pacman::p_load(dplyr, ggplot2, tidyr, scales)
with(presidential_debates_2012, tally_both_vector_by(dialogue, list(person, time))) %>%
    mutate(
        person_time = paste(person, time, sep = "-"),
        short = short/(short + poly),             # proportion of short words
        poly = 1 - short                          # proportion of polysyllabic words
    ) %>%
    arrange(poly) %>%
    # fix the bar order to the sorted poly proportion
    mutate(person_time = factor(person_time, levels = person_time)) %>%
    gather(type, prop, c(short, poly)) %>%
    ggplot(aes(person_time, weight = prop, fill = type)) +
        geom_bar() +
        coord_flip() +
        scale_y_continuous(labels = scales::percent) +
        scale_fill_discrete(name = "Syllable\nType") +
        xlab("Person & Time") +
        ylab("Usage") +
        theme_bw()