Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

error in fetchNASIS with diagHzBoolean #158

Closed
smroecker opened this issue Jan 20, 2021 · 6 comments
Closed

error in fetchNASIS with diagHzBoolean #158

smroecker opened this issue Jan 20, 2021 · 6 comments
Labels
NASIS-local This tag is used for pull requests, issues, discussions etc. for soilDB local NASIS functions

Comments

@smroecker
Copy link
Member

smroecker commented Jan 20, 2021

Their is an error in fetchNASIS. I noticed while fetching data for pedons queried using the user pedon ids 2015MT663%. The total number of ochric.epipedons should equal 61, but its returning 8.

I think the issue is coming from line 161. e.g. site(h) <- extended_data$diagHzBoolean

I think the issue is due to a difference in the data type being joined on via site(). In the site slot peiid is a character, while in diagHzBoolean the peiid is a integer. If this is indeed the issue, it begs the question, how many other joins are breaking because of a different in data type???

On a related .diagHzLongtoWide() looks a bit verbose. I think something like the following would be simplier.

.diagHzLongtoWide <- function(df) as.data.frame.matrix(table(df[["peiid"]], df[["featkind"]]) > 0)

@brownag brownag added the NASIS-local This tag is used for pull requests, issues, discussions etc. for soilDB local NASIS functions label Jan 20, 2021
@brownag
Copy link
Member

brownag commented Jan 20, 2021

See below


Can you install aqp off CRAN and see if you still have the issue?

This sounds data.table related -- cant say for sure without you posting the stack trace / error message -- and we just moved data.table into imports in aqp on the master branch off github (ncss-tech/aqp#157). I was worried about the downstream effect of merge.data.table getting triggered if we aren't explicit and I literally was about to go in and do some testing related to the upcoming 2.5.9 soilDB release #155 and my last remaining TODO.

There is a simple fix on aqp side in site<- if my suspicions are correct

@brownag brownag added the data.table Related to package-wide safety of data.table objects and methods label Jan 20, 2021
@brownag
Copy link
Member

brownag commented Jan 21, 2021

False alarm. Though i am gonna test the cr*p out of the data.table stuff now and have a couple more safety ideas.

Using latest GitHub:

# remotes::install_github(c("ncss-tech/aqp","ncss-tech/soilDB"), dep=F)

library(aqp)
#> This is aqp 1.27
#> 
#> Attaching package: 'aqp'
#> The following object is masked from 'package:stats':
#> 
#>     filter
library(soilDB)

packageVersion("aqp")
#> [1] '1.27'

packageVersion("soilDB")
#> [1] '2.5.9'

# populate selected set with 2015MT663%
f <- fetchNASIS()
#> mixing dry colors ... [1 of 98 horizons]
#> mixing moist colors ... [1 of 454 horizons]
#> Warning: some records are missing rock fragment volume, these have been removed
#> -> QC: some fragsize_h values == 76mm, may be mis-classified as cobbles [93 / 907 records]
#> Warning: some records are missing artifact volume, these have been removed
#> Warning: all records are missing artifact volume (NULL). buffering result with
#> NA. will be converted to zero if nullFragsAreZero = TRUE.
#> -> QC: horizon errors detected, use `get('bad.pedon.ids', envir=soilDB.env)` for related userpedonid values or `get('bad.horizons', envir=soilDB.env)` for related horizon designations


site(f)$ochric.epipedon
#>  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [13] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE
#> [25]  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE
#> [37] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE
sum(site(f)$ochric.epipedon)
#> [1] 8

length(f)
#> [1] 46

# get extended data independently
ex <- get_extended_data_from_NASIS_db()
#> Warning: some records are missing rock fragment volume, these have been removed
#> -> QC: some fragsize_h values == 76mm, may be mis-classified as cobbles [93 / 907 records]
#> Warning: some records are missing artifact volume, these have been removed

#> Warning: all records are missing artifact volume (NULL). buffering result with
#> NA. will be converted to zero if nullFragsAreZero = TRUE.

# number of unique peiid in the diagnostics table
length(unique(ex$diagHzBoolean$peiid))
#> [1] 115

# 61
sum(ex$diagHzBoolean$ochric.epipedon)
#> [1] 61

# which diagnostics correspond to profiles in the SPC?
dp <- subset(ex$diagHzBoolean, (peiid %in% profile_id(f)) & ochric.epipedon)
length(unique(dp$peiid))
#> [1] 8

# as expected
length(unique(dp$peiid)) == sum(site(f)$ochric.epipedon)
#> [1] TRUE

@brownag brownag removed the data.table Related to package-wide safety of data.table objects and methods label Jan 21, 2021
@brownag
Copy link
Member

brownag commented Jan 21, 2021

On a related .diagHzLongtoWide() looks a bit verbose. I think something like the following would be simplier.

.diagHzLongtoWide <- function(df) as.data.frame.matrix(table(df[["peiid"]], df[["featkind"]]) > 0)

This sounds plausible, feel free to submit a PR I would gladly review it

@brownag
Copy link
Member

brownag commented Jan 21, 2021

I was out for a walk with the dog, and realized that rmHzErrors=FALSE would probably give you your expected value of 61. Can confirm that is the case for that 2015MT663 dataset -- which is one I am interested in making into a test case after the SQLite stuff goes live. This is a good test for fetchNASIS rmHzErrors and general "fixing" of database errors.

@dylanbeaudette
Copy link
Member

dylanbeaudette commented Jan 21, 2021

.diagHzLongtoWide <- function(df) as.data.frame.matrix(table(df[["peiid"]], df[["featkind"]]) > 0)

Agreed, this function can probably be replaced with dcast(...). Making note of it in #161

@smroecker
Copy link
Member Author

Amazing what dog walks can do when it comes to trouble shooting. rmHzErrors = FALSE appears to have done the trick. Thanks!

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
NASIS-local This tag is used for pull requests, issues, discussions etc. for soilDB local NASIS functions
Projects
None yet
Development

No branches or pull requests

3 participants