- Changed the Reddit HTTPS collection method because of platform issues with the `httr` package on Windows.
- Removed failing S3 dispatch methods from the `Graph` function.
- Replaced `httr` request methods with `httr2` versions.
- Removed twitter functions from the package.
- Added `writeToFile` to all methods.
- Added `Merge` support for Mastodon.
- Changed the `voson.msg` option to `voson.cat` for `cat` message output.
- Changed the `verbose` message technique; `verbose = TRUE` is now the default for most functions.
- Disabled the metadata logging that occurs when the `writeToFile` parameter of `Collect` is used. This is due to a new package issue with R version 4.4.
- Fixed a Reddit data collection issue for threads specified using shorter URLs without the title part that contain `continue thread` links. These links were resolving to the main thread, resulting in duplicated comments and thread structures.
- Added a parameter named `subtype` to the Mastodon network `Create()` function for creating variations of the `activity` and `actor` networks. For the `activity` network, `subtype = "tag"` can be used to create a `tag` network of post tags that are co-located. For the `actor` network, `subtype = "server"` can be used to create a `server` network, which is an `actor` network reduced to server associations.
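  A minimal sketch of the new variations, assuming `mastodon_data` is an existing Mastodon `Collect` object and that the `subtype` values are passed as strings:

  ```r
  # tag network: nodes are post tags, edges connect tags co-located in the same post
  tag_net <- mastodon_data |> Create("activity", subtype = "tag") |> Graph()

  # server network: the actor network reduced to associations between servers
  server_net <- mastodon_data |> Create("actor", subtype = "server") |> Graph()
  ```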
- Added Mastodon authentication, collection and network creation. There are two options for Mastodon collection: a hashtag search of global or local server timeline posts that is optionally authenticated, `Collect.search.mastodon()`, and a public thread collection function using input URLs, similar to Reddit thread collection, that requires no authentication, `Collect.thread.mastodon()`. To access these methods via `Collect`, an `endpoint = "search"` or `endpoint = "thread"` parameter should be passed to the function.
- Mastodon authentication and collection use the `rtoot` package, and a function called `ImportRtoot` has been created for importing `rtoot` data into `vosonSML`. Imported data can be passed as input to the `Create` network functions.
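  An illustrative sketch of the two collection endpoints and the `rtoot` import path. Apart from `endpoint`, `ImportRtoot` and `Create`, the parameter names shown (`hashtag`, `threadUrls`), the unauthenticated `Authenticate("mastodon")` call and the example URL are assumptions, and `rtoot_data` stands for data collected directly with the `rtoot` package:

  ```r
  # hashtag search of server timeline posts (optionally authenticated)
  search_data <- Authenticate("mastodon") |>
    Collect(endpoint = "search", hashtag = "rstats")

  # public thread collection from input URLs, no authentication required
  thread_data <- Authenticate("mastodon") |>
    Collect(endpoint = "thread",
            threadUrls = c("https://mastodon.social/@user/1234567890"))

  # import data collected with rtoot and pass it to the Create network functions
  mastodon_net <- ImportRtoot(rtoot_data) |> Create("actor") |> Graph()
  ```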
- Changed the default Reddit request wait time range from 3-5 seconds to 6-8 seconds to avoid a proposed platform rate limit of 10 requests per minute. This value can still be set manually using the `waitTime = c(min, max)` wait time range parameter.
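  For example, to set the range manually (a sketch; `thread_urls` is a character vector of thread URLs and the `threadUrls` parameter name is assumed):

  ```r
  reddit_data <- Authenticate("reddit") |>
    Collect(threadUrls = thread_urls, waitTime = c(6, 8))  # wait 6-8 seconds between requests
  ```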
- Fixed a bug in the regex for Reddit URL parsing in which thread IDs were limited to 6 characters.
- Fixed verbose output for `2mode` networks to use the option-specified method.
- Fixed an issue with adding text to Twitter networks caused by missing columns in the data.
- Added twitter tokenization functions that were recently removed from the `tidytext` and `tokenizers` packages due to a change in the ICU library unicode standard and the `stringi` package (tokenizers issue #82). This affects only the generation of `semantic` and `2mode` twitter networks, and the fix maintains their functionality until an alternative tweet tokenization method is implemented. Unfortunately these two twitter network types are not supported on systems using ICU library versions >= 72.0 at this time.
- Fixed an intermittent column mismatch error in Twitter caused by an unexpected type when merging data.
- Fixed the "number of tweet observations does not match number of users" error reported with `rtweet` v1.1.
- Fixed the number of tweets requested count in the verbose message for Twitter timeline collection.
- Fixed a bug in Reddit thread collection where URLs missing trailing slashes would trigger loop protection errors.
- Changed the default `sort` parameter value for Reddit thread collection to `NA`. The default sort order on Reddit is not a fixed value.
- Added a `sort` parameter to Reddit collection. As this collection method is limited, it may be useful to request comments in sort order using the Reddit sort options `top`, `new`, `controversial`, `old`, `qa` and `best`.
- Added a `Collect.listing` function for subreddits on Reddit. This is not a search, however it allows complete metadata for a specified number of subreddit threads to be collected in sorted order. The sort options are `hot`, `top`, `new` and `rising`. There is a further time parameter `period` that can be set to `hour`, `day`, `week`, `month`, `year` or `all` if `sort = "top"`, meaning, for example, results sorted by top threads over the last week.
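  A sketch of both additions; the `threadUrls` and `subreddits` parameter names, and access to the listing collection via `endpoint = "listing"`, are assumptions for illustration:

  ```r
  # thread collection with comments requested in a chosen sort order
  reddit_threads <- Authenticate("reddit") |>
    Collect(threadUrls = thread_urls, sort = "best")

  # listing collection: metadata for subreddit threads sorted by top over the last week
  reddit_listing <- Authenticate("reddit") |>
    Collect(endpoint = "listing", subreddits = "rstats", sort = "top", period = "week")
  ```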
- Added simple log file output for the `Collect` and `Merge` functions when `writeToFile = TRUE`. The log file is written to the same location as the data file, with the `.txt` extension appended.
- Changed the data output path option `options(voson.data = "my-data")` to now attempt to create the directory if it does not exist.
- Fixed two issues that arose from the introduction of tibbles and verbose messaging in `Collect.reddit()`.
- Fixed an error caused by unescaped regex parameters in hyperlinks processed by `Collect.web()` (#49).
- Re-wrote and modified the `vosonSML` Twitter functions to support major changes made in `rtweet` release version 1.0.2.
- Added an `endpoint` parameter to the Twitter `Collect` function. It is set to `search` by default, which is the usual collect behaviour, but can now also be set to `timeline` to collect user timelines instead. See `Collect.timeline.twitter()` for parameters.
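  A sketch of the two endpoint modes; the `bearerToken`, `searchTerm` and `numTweets` parameter names are assumptions for illustration:

  ```r
  twitter_auth <- Authenticate("twitter", bearerToken = "xxxxxxxx")

  # default: tweet search collection
  search_tweets <- twitter_auth |> Collect(searchTerm = "#rstats", numTweets = 100)

  # new: user timeline collection (see Collect.timeline.twitter() for parameters)
  timeline_tweets <- twitter_auth |> Collect(endpoint = "timeline", users = c("vosonlab"))
  ```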
- Changed the output message system. `vosonSML` functions are now silent by default. Using the `verbose` parameter will again print function output.
- Changed output messages to use the `message()` function instead of the `cat()` function by default. Setting the global option `options(voson.msg = FALSE)` will again redirect output to `cat()`. The option can be removed by assigning a value of `NULL`.
- Added the `voson.data` option allowing a directory path to be set for `writeToFile` output files. Files are output to the current working directory by default, however a new directory can now be set with `options(voson.data = "my-data")`, for example. The directory path can be relative or a full path, but must be created beforehand or already exist. If the path is invalid or does not exist, collection will continue with the default behaviour. This option can be removed by assigning a value of `NULL`. This will not affect other file write operations performed by the user.
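  For example, using base R `options()` with the option names described above:

  ```r
  # redirect vosonSML output to cat() instead of message()
  options(voson.msg = FALSE)

  # write writeToFile output to a pre-existing "my-data" directory
  options(voson.data = "my-data")

  # remove both options again
  options(voson.msg = NULL, voson.data = NULL)
  ```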
- The Twitter `AddText()` and `AddUserData()` functions now work with most Twitter network types. `AddText()` now adds columns for embedded tweet text and has a `hashtags` parameter to add a list of tweet hashtags as a network attribute. `AddUserData()` now adds an additional dataframe for `missing_users`, listing the ids and screen names of users that did not have metadata embedded in the collected data. Using the `lookupUsers` parameter will retrieve the metadata using the twitter API. Additionally, passing the `refresh = TRUE` parameter will now retrieve and update the metadata for all users in the network.
- Twitter data collection now returns a named list of two dataframes containing `tweets` and `users`.
- Removed the `ImportData` function and replaced it with `ImportRtweet()` for `rtweet` version 1.0 format data.
- Added `Merge()` and `MergeFiles()` functions to support the merging of collected data from separate operations. These functions support input of multiple `Collect` objects or `.RDS` files, automatically detect the datasource type, and support the `writeToFile` parameter for file output of merged data.
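  A sketch of the merge and import workflow, assuming `collect_jan` and `collect_feb` are `Collect` objects from the same datasource and `rtweet_data` was collected directly with `rtweet` v1.x; the `MergeFiles` arguments shown are assumptions:

  ```r
  # merge two Collect objects and write the merged data to file
  merged_data <- Merge(collect_jan, collect_feb, writeToFile = TRUE)

  # merge previously saved collection files from a data directory
  merged_data <- MergeFiles("./vsml-data", pattern = "TwitterData.rds")

  # import rtweet format data and pass it to the Create network functions
  twitter_net <- ImportRtweet(rtweet_data) |> Create("actor") |> Graph()
  ```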
- Re-wrote the `YouTube` id extraction from URL function to be more robust and added support for `YouTube` shorts URLs.
- Removed the stand-alone `GetYoutubeVideoIDs` function. The `YouTube` collect function parameter `videoIDs` will now accept video ids or video URLs.
- Added wrappers and aliases for some functions. Twitter auth objects can now be created with the simplified `auth_twitter_app()`, `auth_twitter_dev()` and `auth_twitter_user()` functions for each token type. The `collect_reddit_threads()` and `collect_web_hyperlinks()` functions skip the unnecessary `Authenticate` step for Reddit and web data collection.
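  A sketch of the new wrappers; the parameter names shown are assumptions for illustration:

  ```r
  # simplified twitter auth object creation, one function per token type
  twitter_auth <- auth_twitter_app(appName = "An App", bearerToken = "xxxxxxxx")

  # collect Reddit threads or web hyperlinks without an explicit Authenticate step
  reddit_data <- collect_reddit_threads(threadUrls = thread_urls)
  web_data <- collect_web_hyperlinks(pages = seed_pages)
  ```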
- Tweets were being incorrectly ordered by `status ID` to summarise the collected tweet range. The `Min ID` and `Max ID` are not necessarily the earliest and latest tweets collected, and are therefore not ideal for delimiting subsequent collections. Instead, the two `Earliest Obs` and two `Latest Obs` tweets as returned by the `Twitter API` are now reported.
- Added an `endpoint` parameter to `Collect`, allowing `search` or `timeline` to be specified for a twitter data collection. If it is not specified, the default is a twitter `search`.
- The `timeline` collection accepts a `users` vector of user names or IDs, or a mixture of both, and will return up to 3,200 of each user's most recent tweets.
- The minimum required version of R has changed from 3.6 to 4.1.
- Updated standard package documentation, added citation, code of conduct and README.Rmd.
- Replaced magrittr pipes with native pipe operators.
- Updated standard package documentation, added citation and README.Rmd.
- Re-implemented `Create.actor.twitter` and `Create.activity.twitter` to use `dplyr` and `data.table` techniques consistent with other package network creation functions. Both functions are significantly faster for large collection dataframes. `Create.actor.twitter` includes two new parameters for `mentions`: `inclMentions`, which will process and include `mentions` edges in the network, and `inclRtMentions`, which will process and include mentions found in retweets. The `inclMentions` parameter is set to `TRUE` by default and `inclRtMentions` to `FALSE`. Retweet mentions are a subset of mentions, therefore for `inclRtMentions` to be set to `TRUE`, `inclMentions` must also be `TRUE`.
- Re-implemented and simplified the `Create.activity.twitter` network creation. Added `author_id` and `author_screen_name` to nodes to assist with labels or re-creating tweet URLs from data.
- Added an `rmEdgeTypes` parameter to `Create.activity.twitter` and `Create.actor.twitter`. It accepts a list of edge types that will be filtered out of the network during network creation.
- Removed label attributes from igraph graphs generated by the `Graph` function.
- Tidied up and renamed many of the utils functions. Removed unused functions.
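  A sketch of the new actor network parameters, assuming `twitter_data` is a twitter `Collect` object; the `"retweet"` edge type name passed to `rmEdgeTypes` is illustrative:

  ```r
  # actor network with mention edges, excluding retweet mentions and retweet edges
  actor_net <- Create(twitter_data, "actor",
                      inclMentions = TRUE,       # default
                      inclRtMentions = FALSE,    # default
                      rmEdgeTypes = c("retweet"))
  actor_graph <- Graph(actor_net)
  ```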
- Added the last observation tweet to the minimum and maximum status ID values reported for twitter collections. Usually the last observation and `Min ID` will be the same, but sometimes the `Min ID` is outside of the expected collection range. The last observation is a more reliable tweet to use as the starting point for subsequent search collections.
- Cleaned up package imports and suggests, and added some interactive package checks to reduce the number of required imports.
- Added a web crawler `Collect` method with hyperlink network creation. The `Create` function with the `activity` type parameter creates a network where nodes are `web pages` and edges are the `hyperlinks` linking them (extracted from `a href` HTML tags). The `actor` network has page or `site domains` as the nodes, again with the `hyperlinks` from linking pages between domains as edges.
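  A sketch of the hyperlink collection workflow; the structure of the seed `pages` input (column names shown) is an assumption for illustration:

  ```r
  seed_pages <- data.frame(page = "http://vosonlab.net", type = "ext", max_depth = 1)

  web_auth <- Authenticate("web")
  web_data <- Collect(web_auth, pages = seed_pages)

  page_net   <- Graph(Create(web_data, "activity"))  # nodes are web pages
  domain_net <- Graph(Create(web_data, "actor"))     # nodes are page or site domains
  ```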
- Prepending instead of appending S3 class names to `Collect` dataframes to avoid `dplyr` issues.
- Removed the setting of `retryOnRateLimit` to `FALSE` when the rate limit cannot be determined.
- `ImportData` will now accept a file path or a dataframe.
- S3 class names were being added to `Collect` dataframes after `writeToFile`. It should no longer be required to manually add class names or use `ImportData` to load RDS files in order to use previously saved data with `Create` functions.
- Minor documentation updates to `Create.semantic.twitter`, `Create.twomode.twitter` and the `Intro-to-vosonSML` vignette:
  - Specified the `tidyr`, `tidytext` and `stopwords` package requirements in descriptions and examples.
  - Updated references to `twomode` networks as `2-mode` where possible.
- Fixed an issue with custom classes assigned to dataframes causing a `vctrs` error when using `dplyr` functions. The classes are no longer needed post-method routing, so they are simply removed.
- Replaced an instance of the deprecated `dplyr::funs` function that was generating a warning.
- Minor documentation updates.
- Fixed a reddit collect `bind_rows` error when joining dataframes with different types for the structure column. The column type was being set to integer instead of character in cases where every thread comment had no replies or depth (except the OP).
- Reimplemented the `Create.semantic.twitter` and `Create.twomode.twitter` functions using the `tidytext` package. They now better support tokenization of tweet text and allow a range of stopword lists and sources from the `stopwords` package to be used. The semantic network function requires the `tidytext` and `tidyr` packages to be installed before use.
- New parameters have been added to `Create.semantic.twitter` (a combined usage sketch follows this list):
  - Numbers and urls can be removed from or included in the term list using `removeNumbers` and `removeUrls`; the default value is `TRUE`.
  - The `assoc` parameter has been added to choose which node associations or ties to include in the network. The default value is `"limited"` and includes only ties between the most frequently occurring hashtags and terms in tweets. A value of `"full"` will also include ties between the most frequently occurring hashtags and hashtags, and terms and terms, creating a more densely connected network.
  - Parameters to specify the `stopwords` language, e.g. `stopwordsLang = "en"`, and source, e.g. `stopwordsSrc = "smart"`, have been added. These correspond to the `language` and `source` parameters of the `tidytext::get_stopwords` function. The `stopwords` default value is `TRUE`.
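  A combined usage sketch under the parameter defaults and values described above, assuming `twitter_data` is a twitter `Collect` object:

  ```r
  semantic_net <- Create(twitter_data, "semantic",
                         removeNumbers = TRUE,     # drop numbers from the term list
                         removeUrls = TRUE,        # drop urls from the term list
                         assoc = "full",           # include hashtag-hashtag and term-term ties
                         stopwords = TRUE,
                         stopwordsLang = "en",     # tidytext::get_stopwords language
                         stopwordsSrc = "smart")   # tidytext::get_stopwords source
  semantic_graph <- Graph(semantic_net)
  ```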
- The network produced by the `Create.twomode.twitter` function is weighted by default, but this can be disabled by setting the new `weighted` parameter to `FALSE`.
- Renamed the `replies_from_text` parameter to `repliesFromText` and `at_replies_only` to `atRepliesOnly` in the `AddText.actor.youtube` function for consistency.
- Improved the usage examples in the README file.
- Removed the `tm` package dependency.
- Updated the `Introduction to vosonSML` vignette `Merging Collected Data` examples.
- Added a new hex sticker to the package documentation.
- Fixed a logic problem in `Collect.youtube` that was causing no video comments to be collected if there were no reply comments within any of a video's first `maxComments` number of top-level comments. For example, if `maxComments` was set to 100 and the first 100 comments made to a video had no replies, then no results would be returned.
- A recent intermittent problem with the Twitter API caused an issue with the `rtweet::rate_limit` function that resulted in an error when using the rtweet `retryonratelimit` search parameter. The `rate_limit` function was being called by `vosonSML` to check the twitter rate limit regardless of whether the search parameter was set or not, and so was failing `Collect` with an error. A fix was made so that `vosonSML` checks if `rtweet::rate_limit` succeeds, and if not, automatically sets `retryonratelimit` to `FALSE` so that a twitter `Collect` can still be performed without error should this problem occur again.
- Added some links to the `pkgdown` site navbar.
- Added some guidance for merging collected data to the `Introduction to vosonSML` vignette.
- Added the `Introduction to vosonSML` vignette to the package.
- Minor changes and input checks added to `ImportData`.
- Added some unit testing for `Authenticate` and `ImportData`.
- Reddit JSON is now retrieved using `jsonlite::fromJSON`.
- Reddit 'Continue' threads are now followed with additional thread requests. Many more comments are now collected for threads with large diameters or breadth. Continue threads also have a Reddit limit of 500 comments per thread request.
- Reddit comment IDs and timestamps are now extracted.
- Moved the `tictoc` package from dependency imports to suggested packages.
- Added some checks for whether the `rtweet` package is installed.
- Removed the `RedditExtractoR` package from imports.
- HTML-decoded tweet text during network creation to replace the HTML character codes for '&', '<' and '>'.
- Added a node type attribute to `twomode` networks.
- Renamed `bimodal` networks to `twomode`.
- Added output messages from supplemental functions such as `AddText()` and `Graph()`. Also improved the consistency of output messages from `Collect` and `Create` functions.
- Added a fix for the reddit `gsub` locale error (#21).
- Changed `bimodal` network hashtags to lowercase, as filter terms are converted to lowercase when entered.
- Fixed errors thrown when removing terms from `bimodal` and `semantic` networks.
- Removed a duplicate `GetVideoData()` function call in `AddVideoData`.
- Fixed data type errors in `AddText` functions related to strict typing by the `dplyr::if_else` function.
- A feature was added to the youtube actor `AddText` function to redirect edges towards actors based on the presence of a `screen name` or `@screen name` that may be found at the beginning of a reply comment. Typically reply comments are directed towards a top-level comment; this instead captures when reply comments are directed to other commenters in the thread.
- Changed youtube `actor` network identifiers to be their unique `Channel ID` instead of their `screen names`.
- Created the `AddVideoData` function to add collected video data to the youtube `actor` network. The main purpose of this function is to replace video identifiers with the `Channel ID` of the video publisher (actor). To get the `Channel ID` of video publishers, an additional API lookup for the videos in the network is required. Additional columns such as the video `Title`, `Description` and `Published` time are also added to the network `$edges` dataframe, as well as being returned in their own dataframe called `$videos`.
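  A sketch of the youtube actor workflow with video data added; the `youtube_auth` object and the exact `AddVideoData()` arguments are assumptions:

  ```r
  actor_net <- Create(youtube_data, "actor")
  actor_net <- AddVideoData(actor_net, youtube_auth)  # replaces video ids with publisher Channel IDs

  actor_net$videos  # video Title, Description and Published time dataframe
  ```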
- Created the `AddText` function to add collected text data to networks. This feature applies to `activity` and `actor` networks and will typically add a node attribute to activity networks and an edge attribute to actor networks. For example, this function will add the column `vosonTxt_tweets` containing tweet text to `$nodes` if passed an activity network, and to `$edges` if passed an actor network.
- Generation of `igraph` graph objects and subsequent writing to file has been removed from the `Create` function and placed in a new `Graph` function. This change abstracts the graph creation and makes it optional, but also allows supplemental network steps such as `AddText` to be performed prior to creating the final igraph object (see the sketch below).
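  A sketch of the revised workflow, with graph creation now an explicit final step; `youtube_data` stands in for any supported `Collect` object:

  ```r
  net <- Create(youtube_data, "activity")   # create the network from collected data
  net <- AddText(net, youtube_data)         # supplemental step: add collected text data
  g <- Graph(net, writeToFile = TRUE)       # build the igraph object and write it to file
  ```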
- Removed the `writeToFile` parameter from `Create` functions and added it to `Graph`.
- Removed the `weightEdges`, `textData` and `cleanText` parameters from `Create.actor.reddit`. `cleanText` is now a parameter of `AddText.activity.reddit` and `AddText.actor.reddit`.
- Replaced `AddTwitterUserData` with the `AddUserData` function, which works similarly to `AddText`. This function currently only applies to twitter actor networks and will add, or download and add if missing, user profile information to actors as node attributes.
- Added an `activity` network type for reddit. In the reddit activity network, nodes are the thread posts and comments, and edges represent where comments are directed in the threads.
- Added a github dev version badge to the README.
- Added a new `activity` network type for the twitter and youtube `Create` function. In this network, nodes are the items collected, such as tweets returned from a twitter search and comments posted to youtube videos. Edges represent the platform relationship between the tweets or comments.
- Added a new twitter actor network edge type, `self-loop`. This aims to facilitate the later addition of tweet text to the network graph for user tweets that have no ties to other users.
- Added twitter interactive web authorization of an app as provided by `rtweet::create_token`. This method is used when only the twitter app name and consumer keys are passed to `Authenticate.twitter` as parameters, e.g. `Authenticate("twitter", appName = "An App", apiKey = "xxxxxxxxxxxx", apiSecret = "xxxxxxxxxxxx")`. A browser tab will open asking the user to authorize the app to their twitter account to complete authentication. This uses Twitter's `Application-user authentication: OAuth 1a (access token for user context)` method.
- It is suspected that Reddit is rate-limiting some generic R UA strings, so a User-Agent string is now set for underlying R collect functions (e.g. `file`) via the `HTTPUserAgent` option. It is temporarily set to the package name and current version number for `Collect`, e.g. `vosonSML v.0.27.2 (R Package)`.
- Removed the hex sticker (and favicons for the pkgdown site).
- Fixed a bug in `Create.semantic.twitter` in which a sum operation calculating edge weights would set `NA` values for all edges due to `NA` values present in the hashtag fields. This occurs when there are tweets with no hashtags in the twitter collection and is now checked.
- Some UTF encoding issues in `Create.semantic.twitter` were also fixed.
- Added '#' to hashtags and '@' to mentions in the twitter semantic network to differentiate between hashtags, mentions and common terms.
- Fixed a bug in `Collect.twitter` in which any additional twitter API parameters, e.g. `lang` or `until`, were not being passed properly to `rtweet::search_tweets`. This resulted in the additional parameters being ignored.
- Removed the `SaveCredential` and `LoadCredential` functions, as well as the `useCachedToken` parameter for `Authenticate.twitter`. These were simply calling the `saveRDS` and `readRDS` functions and not performing any additional processing. Using `saveRDS` and `readRDS` directly to save and load an `Authenticate` credential object to file is simpler.
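  For example, with base R:

  ```r
  # save an Authenticate credential object for re-use
  saveRDS(twitter_auth, "twitter_auth.rds")

  # load it again in a later session
  twitter_auth <- readRDS("twitter_auth.rds")
  ```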
- Changed the way that the `cleanText` parameter works in `Create.actor.reddit` so that it is more permissive. This addresses encoding issues with apostrophes and pound symbols, and removes unicode characters not permitted by the XML 1.0 standard as used in `graphml` files. This is best effort and does not resolve all reddit text encoding issues.
- Added `Collect.twitter` summary information that includes the earliest (min) and latest (max) tweet `status_id` collected, with timestamps. The `status_id` values can be used to frame subsequent collections as `since_id` or `max_id` parameter values. If the `until` date parameter was used, the timestamp can also be used as a quick confirmation.
- Added elapsed time output to the `Collect` method.
- Fixed bugs in `Create.actor.reddit` that were incorrectly creating edges between top-level commenters and thread authors from different threads. These bugs were only observable when collecting multiple reddit threads.
- Improved output for reddit collection. Removed the progress bar and added a table of results summarising the number of comments collected for each thread.
- Added the user's twitter API reset time to the twitter collection output.
- Fixed a bug in `Create.actor.twitter` and `Create.bimodal.twitter` in which the vertices dataframe provided to the `graph_from_data_frame` function contained duplicate names, raising an error.
- Revised and updated `roxygen` documentation and examples for all package functions.
- Updated all `Authenticate`, `Collect` and `Create` S3 methods to implement function routing based on object class names.
- Created a `pkgdown` web site for github-hosted package documentation.
- Created a new hex sticker logo.
- Replaced the `twitteR` twitter collection implementation with the `rtweet` package.
- A user's twitter authentication token can now be cached in the `.twitter_oauth_token` file and used for subsequent twitter API requests without re-authentication. A new authentication token can be cached by deleting this file and re-using the `useCachedToken = TRUE` parameter.