Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Add function to read and manipulate the population lookups #2

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
5 changes: 4 additions & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -20,9 +20,12 @@ Imports:
fs,
glue,
magrittr,
janitor,
phsmethods,
readr,
rlang,
tibble
tibble,
tidyr
Suggests:
testthat (>= 3.0.0)
Config/testthat/edition: 3
Expand Down
1 change: 1 addition & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

export(find_latest_file)
export(get_hscp_locality)
export(get_pop_est)
export(get_simd_datazone)
export(get_simd_postcode)
export(get_spd)
Expand Down
9 changes: 6 additions & 3 deletions R/find_latest_file.R
Original file line number Diff line number Diff line change
Expand Up @@ -12,8 +12,9 @@
#' passed to [fs::dir_info()] to search for the file.
#' @param selection_method Valid arguments are "modification_date"
#' (the default) or "file_name".
#' @param ... Additional arguments passed to [fs::dir_info()].
#'
#' @return the [fs::path()] to the file
#' @return the [fs::path()] to the file.
#' @export
#'
#' @examples
Expand All @@ -25,13 +26,15 @@
#' }
find_latest_file <- function(directory,
regexp,
selection_method = "modification_date") {
selection_method = "modification_date",
...) {
if (selection_method == "modification_date") {
latest_file <- fs::dir_info(
path = directory,
type = "file",
regexp = regexp,
recurse = TRUE
recurse = TRUE,
...
) |>
dplyr::arrange(
dplyr::desc(.data$birth_time),
Expand Down
177 changes: 177 additions & 0 deletions R/get_pop_est.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,177 @@
#' Get population estimates
#'
#' This function retrieves population estimates based on various parameters.
#' It reads population data from a specified file and filters it based on the
#' input parameters. The function also allows for grouping by age and pivoting
#' the data for wider format.
#' @param level The geographic level for which to retrieve population estimates.
#' One of "datazone", "intzone", "hscp", "ca", or "hb".
#' @param version The version of the population estimates to use (default: "latest").

Check warning on line 9 in R/get_pop_est.R

View workflow job for this annotation

GitHub Actions / lint

file=R/get_pop_est.R,line=9,col=81,[line_length_linter] Lines should not be more than 80 characters. This line is 85 characters.
#' @param min_year,max_year (optional) The minimum and maximum years to include in the results.

Check warning on line 10 in R/get_pop_est.R

View workflow job for this annotation

GitHub Actions / lint

file=R/get_pop_est.R,line=10,col=81,[line_length_linter] Lines should not be more than 80 characters. This line is 95 characters.
#' @param age_groups Logical, indicating whether to aggregate population estimates by age groups.

Check warning on line 11 in R/get_pop_est.R

View workflow job for this annotation

GitHub Actions / lint

file=R/get_pop_est.R,line=11,col=81,[line_length_linter] Lines should not be more than 80 characters. This line is 97 characters.
#' If `TRUE`, the `phsmethods::create_age_groups` function is used.
#' @param pivot_wider Optionally reshape the data into a wider format, summarising population counts by the specified columns.

Check warning on line 13 in R/get_pop_est.R

View workflow job for this annotation

GitHub Actions / lint

file=R/get_pop_est.R,line=13,col=81,[line_length_linter] Lines should not be more than 80 characters. This line is 126 characters.
#' Allowed values:
#' * `FALSE` (default): Do not pivot.
#' * `TRUE` or `"all"`: Pivot by both sex and age/age group.
#' * `"age"`: Pivot by age/age group only.
#' * `"age-only"`: Pivot by age/age group and aggregate to remove sex.
#' * `"sex"`: Pivot by sex only.
#' * `"sex-only"`: Pivot by sex group and aggregate to remove age/age group
#' @param ... Additional arguments passed to [phsmethods::create_age_groups()].
#'
#' @return A tibble containing the filtered and possibly transformed population data.

Check warning on line 23 in R/get_pop_est.R

View workflow job for this annotation

GitHub Actions / lint

file=R/get_pop_est.R,line=23,col=81,[line_length_linter] Lines should not be more than 80 characters. This line is 85 characters.
#'
#' @note
#' Depending on the values for `age_groups` and `pivot_wider`, the resulting
#' columns in the returned tibble will vary. Refer to the examples below for
#' illustration.
#'
#' @export
#'
#' @examples
#' # Basic Usage: Datazone Population Estimates (no filtering)
#' get_pop_est("datazone")
#'
#' # Filter by Year:
#' get_pop_est("ca", min_year = 1995, max_year = 2020)
#'
#' # Age Groups: Health Board (HB) Population Estimates by Age Group
#' get_pop_est("hb", age_groups = TRUE)
#'
#' # Age Groups with Custom Settings:
#' # Aggregate into 5-year age groups, with an open-ended final group "85+"
#' get_pop_est("hb", age_groups = TRUE, by = 5, to = 85)
#'
#' # Pivot Wider (All): CA Population Estimates, Reshaped by Sex and Age Group
#' # The result will have columns for each combination of sex and age group,
#' # e.g., "pop_f_0_4", "pop_m_5_9", etc.
#' get_pop_est("ca", age_groups = TRUE, pivot_wider = "all")
#'
#' # Pivot Wider (Age Only): CA Population Estimates, Reshaped by Age Group Only
#' # This is useful if you only need the total population for each age group, regardless of sex.

Check warning on line 52 in R/get_pop_est.R

View workflow job for this annotation

GitHub Actions / lint

file=R/get_pop_est.R,line=52,col=81,[line_length_linter] Lines should not be more than 80 characters. This line is 96 characters.
#' get_pop_est("ca", age_groups = TRUE, pivot_wider = "age-only")
#'
#' # Combined Filtering, Age Groups, and Pivoting:
#' # CA population from 2015-2020, aggregated by 10-year age groups, and pivoted by sex

Check warning on line 56 in R/get_pop_est.R

View workflow job for this annotation

GitHub Actions / lint

file=R/get_pop_est.R,line=56,col=81,[line_length_linter] Lines should not be more than 80 characters. This line is 87 characters.
#' # The result will have columns for each sex ("pop_f", "pop_m") and a row per age group.

Check warning on line 57 in R/get_pop_est.R

View workflow job for this annotation

GitHub Actions / lint

file=R/get_pop_est.R,line=57,col=81,[line_length_linter] Lines should not be more than 80 characters. This line is 90 characters.
#' get_pop_est("ca", min_year = 2015, max_year = 2020, age_groups = TRUE, by = 10, pivot_wider = "sex")
get_pop_est <- function(
level = c("datazone", "intzone", "hscp", "ca", "hb"),
version = "latest",
min_year = NULL,
max_year = NULL,
age_groups = FALSE,
pivot_wider = FALSE,
...) {
level <- tolower(level)
level <- rlang::arg_match(level)
if (!inherits(pivot_wider, "logical")) {
pivot_wider <- rlang::arg_match(
pivot_wider,
values = c("all", "age", "age-only", "sex", "sex-only")
)
}

ext <- "rds"
pop_dir <- fs::path(get_lookups_dir(), "Populations", "Estimates")

file_name_re <- paste0(level, "[0-9]{0,4}_pop_est_[0-9]{4}_[0-9]{4}\\.", ext)

pop_path <- find_latest_file(
directory = pop_dir,
regexp = file_name_re,
ignore.case = TRUE
)

pop_est <- read_file(pop_path)

# Validate year range and filter
if (!is.null(min_year) && !is.null(max_year) && min_year > max_year) {
cli::cli_abort(
"Invalid years: {.arg min_year} must not be greater than {.arg max_year}"
)
}

if (!is.null(min_year)) {
min_year_available <- min(pop_est$year)
if (min_year < min_year_available) {
cli::cli_abort(
"{.arg min_year} must be at least {min_year_available} when using the
{.file {fs::path_file(pop_path)}} file."
)
}
pop_est <- dplyr::filter(pop_est, .data$year >= min_year)
}

if (!is.null(max_year)) {
max_year_available <- max(pop_est$year)
if (max_year > max_year_available) {
cli::cli_abort(
"{.arg max_year} must be at most {max_year_available} when using the
{.file {fs::path_file(pop_path)}} file."
)
}
pop_est <- dplyr::filter(pop_est, .data$year <= max_year)
}

# Create age groups
if (age_groups) {
pop_est <- pop_est |>
dplyr::mutate(
age_group = phsmethods::create_age_groups(x = .data$age, ...),
.keep = "unused"
) |>
dplyr::group_by(dplyr::across(!.data$pop)) |>
dplyr::summarise(pop = sum(.data$pop), .groups = "drop")
}

# Pivot data
if (pivot_wider %in% list(TRUE, "all")) {
pop_est <- pop_est |>
pivot_data(
id_cols = -"sex",
names_from = c("sex_name", dplyr::if_else(age_groups, "age_group", "age"))
)
} else if (pivot_wider == "sex") {
pop_est <- pop_est |>
pivot_data(
id_cols = c(-"sex", dplyr::if_else(age_groups, "age_group", "age")),
names_from = "sex_name"
)
} else if (pivot_wider == "sex-only") {
pop_est <- pop_est |>
pivot_data(
id_cols = c(-"sex", -dplyr::if_else(age_groups, "age_group", "age")),
names_from = "sex_name"
)
} else if (pivot_wider == "age") {
pop_est <- pop_est |>
pivot_data(
id_cols = c(-"sex", "sex_name"),
names_from = dplyr::if_else(age_groups, "age_group", "age")
)
} else if (pivot_wider == "age-only") {
pop_est <- pop_est |>
pivot_data(
id_cols = c(-"sex", -"sex_name"),
names_from = dplyr::if_else(age_groups, "age_group", "age")
)
}

return(pop_est)
}


# Helper function to pivot data
pivot_data <- function(data, id_cols, names_from) {
tidyr::pivot_wider(
data,
id_cols = {{ id_cols }},
names_from = !!names_from,
values_from = "pop",
values_fn = sum,
names_prefix = "pop_",
names_repair = janitor::make_clean_names
)
}
11 changes: 9 additions & 2 deletions man/find_latest_file.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

82 changes: 82 additions & 0 deletions man/get_pop_est.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion phslookups.Rproj
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,6 @@ PackageUseDevtools: Yes
PackageInstallArgs: --no-multiarch --with-keep.source
PackageRoxygenize: rd,collate,namespace

UseNativePipeOperator: No
UseNativePipeOperator: Yes

SpellingDictionary: en_GB
Loading