Do the different representations need custom classes? #1

jgabry · 2019-10-02T03:01:19Z

@paul-buerkner and I were discussing this today. it should be possible to do the conversions between representations without using methods by just checking the input objects, but we could use classes and methods if we want. Both have their appeal. We also have to consider the diagnostics and summaries. Without clases and methods we’d end up doing lots of checking of inputs in many different parts of the package, whereas the methods know the types of their inputs already. Anyway, just something to consider and we can proceed with implementing the internals of the conversion, summary, and diagnostic functions without having deciding this.

paul-buerkner · 2019-10-02T06:49:14Z

If we decide for custom classes, the most important part is not letting the user know they are dealing with them. We could achieve this in two ways. First just add a new class for a validated object. For instance, if we know that we are using a matrix, we can write

class(object) <- c("posterior_matrix", "matrix")

without actually adding any meta-data or anything else. Adding meta-data via attributes is super instable as it is removed via standard subsetting.

The other, potentially more complex (but perhaps more robust) implementation, is that we have one class with the different slots, most notably the draws slot, which contains the actual representation, and then some more meta-slots.

object <- list(draws = draws, ...)
class(object) <- c("posterior_matrix", "posterior_draws")

where posterior_draws is the parent class common to all representations. The difficult thing here will be to make sure that users can work with those objects as if they were working with the draws slot itself, so they don't know (or need to know) the additional meta-layer. As I write that, I find the second option actually quite appealing, as we can then use the same base-type for different representations, but I am not sure if it is worth the effort in the end.

paul-buerkner · 2019-10-02T13:47:46Z

I added a prototype for posterior_matrix and posterior_array classes based on rray objects. Here is some example code of how it could look like (this code actually works already):

x <- round(rnorm(50), 2)

A <- matrix(x, nrow = 10, ncol = 5)
dimnames(A) <- list(NULL, paste0("theta", 1:5))
(B <- as_posterior(A))
as_posterior_array(B)

C <- array(x, dim = c(5, 2, 5))
(D <- as_posterior(C))
as_posterior_matrix(D)

The current (ad-hoc) naming convention for matrix dimension are draw and par and for arrays iter, chain, and par. I remember that @mjskay had thought about it extensively. What kind of names did you choose in tidybayes?

jgabry · 2019-10-02T16:11:31Z

Cool thanks for starting to play around with this. I like this general direction. Another possibility is to use something like reference or R6 classes in combination with your approach. I'm not saying we should definitely do that, but something like this is kind of nice:

posterior <- R6::R6Class(
  "posterior",
  public = list(
    initialize = function(x) {
      private$draws <- x
    },
    # use paul's functions to do conversions
    to_matrix = function() as_posterior(private$draws, class = "matrix"),
    to_array = function() as_posterior(private$draws, class = "array")
  ),
  private = list(
    draws = NULL
  )
)

# Example
x <- round(rnorm(50), 2)
x <- matrix(x, nrow = 10, ncol = 5)
dimnames(x) <- list(NULL, paste0("theta", 1:5))

post <- posterior$new(x)
mat <- post$to_matrix()
arr <- post$to_array()

What do you think?

mjskay · 2019-10-02T16:16:38Z

The naming scheme discussion was here and the table with some options / opinions of a few folks here, though you probably don't want to wade through those :).

The high-level results were: chain, iteration, draw, and variable. variable was chosen over parameter because sometimes you have things that are not parameters, such as pushforward transformations of parameters. In tidybayes data frames these names all have . prepended to them to ensure they don't clobber any variable names from the model itself, but this is probably not necessary in array or matrix formats.

I have some trepidation about reinventing the wheel here if the plan is to create a new format. I think something like what the rv package does would be just about perfect if it could also include chain information.

jgabry · 2019-10-02T16:24:59Z

I have some trepidation about reinventing the wheel here if the plan is to create a new format. I think something like what the rv package does would be just about perfect if it could also include chain information.

I'm going to look more into the rv package today so I can refresh my memory. One question is why aren't many people using rv (looks like it has about 400 downloads a month) if it's so great? Is it just the lack of chain info? or just lack of awareness about its existence, or some other issue?

mjskay · 2019-10-02T16:31:04Z

Good question. Perhaps lack of awareness? Perhaps lack of maintenance? I think it was dropped from CRAN at some point and then got a new maintainer sometime in the past year or so. Possibly it is also just a bit of a pain to get things into that format, which we could help with here and in tidybayes if we adopted it in both.

paul-buerkner · 2019-10-02T17:14:43Z

I just pushed some work on arrays and matrices and added a bunch of convergence diagnostics from rstan. Building classes directly on top of rray is a pain as they are dropped by mathemtical, logical, and subsetting operations. Similar things might happen for other base R-ish classes. So I think if we want to have custom classes (and I argue it is probably good to have some) we should go the route of having draws as a slot in a bigger object.

I think an R6 class would actually be nice, but don't we want to keep the resulting object the same R6 class just with a different representation of the private draws slot? Right now, the above R6 class returns draws itself instead of the whole object with an amended information about how draws is stored. Is something like this possible with R6? (I have never used it before)

I will also take a look at the rv package, but honestly, I don't want to use a barely maintained pacakage at our core of a lot of packages. This looks unwise to me in the long run.

paul-buerkner · 2019-10-02T17:16:16Z

I like the naming convention of tidybayes and I would prefer using it in posterior. To make sure I understand correctly, how is draw and iteration differnt? The former increases continuously over chains and the latter is chain specific?

mjskay · 2019-10-02T17:28:31Z

re: draw: exactly. The value of having draw in the tabular format is that it simplifies doing operations draw-wise and simplifies some visualization tasks (like constructing spaghetti plots or draw-wise animations). Older versions of tidybayes did not have that column, but I found myself constructing it on the fly from chain and iteration so often that I just added it by default.

Re: rv: would one option be to reach out to the current maintainer and see if they would be willing to have us work on it?

Either that or I would advocate for creating a very similar interface (from the user perspective) to what rv has, possibly from scratch or possibly by using some of that code. The fact that most math operations "just work" and there are E() and Pr() operations with sensible semantics is very appealing. You can also store rv objects in tibbles pretty easily (I just checked) and operations work as expected (though we would want to define some printing functions a bit differently to get readable tabular output).

paul-buerkner · 2019-10-02T17:34:43Z

I agree that rv has a sensible user interface. I think copying some of the syntax would be a save option. Not sure how much we gain from taking over their project (if they let us) as compared to redoing some things. Both is generally fine for me. @jgabry what do you think?

In the meantime, I will play around with and R6 class and can present you some interface later on today or tomorrow.

mjskay · 2019-10-02T17:46:04Z

Yeah, I am fine with either. Pro of taking over is a bunch of code there already works (saves time). Con is some possible baggage: if we want to create a consistent interface with consistent terminology we probably would have to rename some things and/or change some of the existing output it gives.

E.g. currently it does this:

library(rv)
x = c(rvnorm(mean = 0, sd = 1), rvnorm(mean = 1, sd = 1))
x

     mean   sd   1% 2.5%   25%    50% 75% 97.5% 99% sims
[1] 0.021 1.00 -2.3 -1.9 -0.66 0.0074 0.7     2 2.4 4000
[2] 0.994 0.99 -1.2 -0.9  0.32 0.9856 1.7     3 3.4 4000

and I assume we would want to rename the quantiles columns and probably also sims; if we go with the draw terminology that might become draws or ndraw or something.

Similarly it would be great to overload the necessary functions for getting pretty printing in tibbles. E.g. this output:

df = tibble(x = c(rvnorm(mean = 0, sd = 1), rvnorm(mean = 1, sd = 1)))
df$y = x + 1
df

# A tibble: 2 x 2
  x                                                 y                                                
  <rv>                                              <rv>                                             
1 0.690907690, 1.409027699, 2.409375239, 2.3315525~ 1.8675633448, 0.3802025667, 2.0189151390, 1.7843~
2 0.9486634620, 0.5860807670, -0.1492639660, 1.283~ 0.2183170642, 2.6795406142, 0.8604936890, 0.2022~

Would be really nice if it instead looked something like this:

# A tibble: 2 x 2
  x       y      
  <rv>   <rv>  
1 0.02 ±1 1.02 ±1
2 0.99 ±1 1.99 ±1

Using mean or median and sd or mad_sd or something

mjskay · 2019-10-02T17:52:02Z

Re: R6: what is the motivation for using R6 here? Performance? Reference semantics are unfamiliar to a lot of R users so I'm not sure I see the motivation unless there is a performance-related one. Even then I would maybe advocate for hiding any R6 stuff behind a wrapper interface so that users don't need to know it exists.

paul-buerkner · 2019-10-02T18:10:17Z

It may yield a nice way to get around some complications we might be facing when working with different representations within the same object class. For me it is just something to try out right now and I am not married with the idea of actually using it eventually.

paul-buerkner · 2019-10-02T19:13:19Z

I have just pushed a prototype of an R6 class to github. With it, we can do stuff like this:

# Example
x <- round(rnorm(50), 2)
x <- matrix(x, nrow = 10, ncol = 5)
dimnames(x) <- list(NULL, paste0("theta", 1:5))

post <- posterior$new(x)
post$to_matrix()$draws()
post$to_array()
post$subset_variables("theta1")

Not saying we definitely need to use R6. We should be able use the same internal structure with a standard S3 class instead of an R6 class, but I think it is worth considering both options.

jgabry · 2019-10-02T20:15:35Z

A bunch of comments to some of the previous posts:

I think an R6 class would actually be nice, but don't we want to keep the resulting object the same R6 class just with a different representation of the private draws slot?

Yeah if we were to do R6 then I agree we'd want it to return an R6 object like your example.

Even then I would maybe advocate for hiding any R6 stuff behind a wrapper interface so that users don't need to know it exists.

Yeah maybe that's a possibility too.

I have mixed feelings about the reference semantics: on the one hand @mjskay is right that most R users are not very familiar with it. On the other hand, I think we'll be seeing more examples of reference semantics in R going forward (RStudio seems to like them and they also seem to be pushing for more integration of R and python, and reference classes (RC) or R6 are the most similar to what python does). Here are a few other things I particularly like about RC/R6 objects:

They can have private immutable slots in the object. One of the pitfalls of S3 lists is that they can be modified super easily and then the associated methods are not guaranteed to work.
I also really like that the RC/R6 objects have methods accessible via the object as opposed to standalone functions. Being able to typex$ and getting a list of the methods is super useful, especially with tab complete.

I'm not saying those are enough of a reason to do it, just thinking out loud.

I agree that rv has a sensible user interface. I think copying some of the syntax would be a save option. Not sure how much we gain from taking over their project (if they let us) as compared to redoing some things. Both is generally fine for me. @jgabry what do you think?

It seems like more work to do our own implementations when rv almost has what we want, but it actually might be less work in the long run to do it ourselves and have full control over the implementation. I'm really not sure on this one. I think we could also copy some of their code and use it / modify it as needed as long as we give proper credit in the DESCRIPTION file. It looks like rv is GPL-2, so that would affect our choice of license (I don't remember the difference between GPL 2 and 3 off the top of my head).

jgabry · 2019-10-02T20:19:30Z

Following up on my comments about RC/R6, it's super nice that with what @paul-buerkner did I can do

post <- posterior$new(x)

and then type post$ [tab] and I get a list of all the methods like to_matrix, to_array, etc. There's some other stuff that shows up too but I think we can get rid of that.

jgabry · 2019-10-02T21:32:40Z

Forgot to say that I like the tidybayes names too. In particular variable instead of parameter seems like a smart choice!

mjskay · 2019-10-03T03:36:50Z

Thanks!

Re: R6: yeah, good point about autocomplete. I do worry about building a class that other packages are meant to use / build on that doesn't use function-style OOP like S3/S4 (at least for the user-facing portion). It seems like R6 makes it harder for other packages to provide generic functions that work with the class and integrate well into code that uses multiple packages. It's one of those problems that basically every traditional OOP language has that R's style of OOP neatly bypasses, and is (I think) a strength of the language. I haven't seen good solutions to it for R6, have you?

mjskay · 2019-10-03T03:46:25Z

On further investigation, a possibly bigger issue with R6 if we go with an rv-style interface (which includes arrays of random variables), from an R6 issue on adding indexing to a class:

I would strongly argue against providing a [ method (unless that error just throws an error message). The [ method needs to be compatible with length() and names(), and also [[ and $. Defining a [ method that is not compatible is going to cause many weird problems.

Another way of saying this is that R6 provides scalar objects. If you want to provide vector objects, you need to use a different OO system.

from here

paul-buerkner · 2019-10-03T09:24:01Z

After giving it more thought and reading your comments, I agree with @mjskay that R6 may not be a good choice and even if it is just because everything else around it uses S3 (and perhaps S4 although I am not a fan of the latter). If I translate our R6 prototype to S3, it would look something like this.

as_posterior <- function(x) {
  out <- list(
    draws = as_posterior_draws(x),
    format = detect_draws_format(x)
  )
  class(out) <- "posterior"
}

and then for instance

as_posterior_array.posterior <- function(x, ...) {
   x$draws <- .as_posterior_array(x$draws, ...)
   x$format <- "array"
   x
}

As in the R6 class, the key thing is that we need to store both the draws object and its format in a combined object so that we do not have to (necessarily) assign custom classes to the draws object itself that indicates its format. In other words, if the format is "matrix" or "array" we can just work with rray objects. If the format is "tidy_data_frame" (or whatever) than we can just use the tibbles that tidybayes uses etc.

What do you think?

paul-buerkner · 2019-10-03T15:10:05Z

An additional point of discussion: Shall we name the object posterior, or posterior_draws or something else? If we decide to go with posterior we need to check with Mans as he also uses the
object name posterior in his upcoming posterior data base package for a different purpose.

paul-buerkner · 2019-10-03T15:25:25Z

I have now also added support for S3 classes so that we can compare the two possible class systems to facilitate comparison:

# Example
x <- round(rnorm(50), 2)
x <- matrix(x, nrow = 10, ncol = 5)
dimnames(x) <- list(NULL, paste0("theta", 1:5))

# with S3
(post <- as_posterior_draws(x))
(post <- to_posterior_array(post))
extract_one_variable_matrix(post, "theta2")

# with R6
post2 <- posterior$new(x)
post2$to_array()
post2$extract_one_variable_matrix("theta2")

mjskay · 2019-10-03T15:55:27Z

Re: names, I wonder if there's a more generic name for this (e.g. what if someone is looking at samples from the prior?). What is the generic thing here? A sample from a joint distribution? What about joint_draws?

paul-buerkner · 2019-10-03T16:18:15Z

Re: names, this is a good point. joint_draws sounds good to me. @jgabry any thoughts?

jgabry · 2019-10-03T16:23:24Z

Good points of the potential limitations of R6. I’m in a rush right now but will try out the S3 stuff you implemented later today.

Also, good point that the draws could be from the prior, so I agree maybe we need a different name for the draws. The problem with joint_draws is that joint usually refers to p(y, theta) so could be confusing if it just refers to draws from p(theta). Of course if theta has more than one element then it’s also a joint distribution, but I think people will think of joint as including y. What about just calling the draws slot sample?

paul-buerkner · 2019-10-03T16:29:38Z

I think we have two naming issues here. (1) The name of the object slot in which draws are stored. (2) the name of the object itself. I think Matthew and I were talking about (2), but (1) needs some discussion as well.

For (1) I would be happy with draws as sample might be seen as two data related.
For (2) we can of course also just go with draws if joint is too specific.

mjskay · 2019-10-03T17:43:19Z

For (1) I would be happy with draws as sample might be seen as two data related. For (2) we can of course also just go with draws if joint is too specific.

I agree with both points separately (draws is a good name for either) but using the same name for both might create ambiguities. It could be fine if (1) is never exposed to users (so at least for users there is no ambiguity).

jgabry · 2019-10-03T17:48:43Z

I’m fine with draws as the user facing name.

paul-buerkner · 2019-10-03T17:52:25Z

The outside draws would be an object class but the inside draws would be a slot in the object with various possible classes. If we settle for S3 objects (and I think this is currently likely), the slot name is more or less exposed to the users although we would never force them to access it directly, as we provide methods for this purpose. The question is about the name of those extractors. Suppose both would be named draws, then we had

get_draws.draws <- function(x, ...) {
  x$draws
}

so there is some user-facing ambiguity if we use the same name twice. One way to avoid that, and I believe we plan this anyway, is to make the outer and inner object indistinquishable from one another from the user perspective, by providing [, [[, and $ methods which all target the inner object. The only drawback is then that we need to internally use special extractor functions as we cannot use those methods for the outer object anymore in a standard fashion.

jgabry · 2019-10-03T17:54:56Z

In terms of the S3 vs R6 example in @paul-buerkner’s recent comment, I find the R6 code more aesthetically pleasing (personal preference probably), but I’m totally fine going the S3 route if the limitations of R6 that @mjskay pointed out are insurmountable. or if either or both of you really think users won’t like the R6 approach then that’s a good reason not to do it too.

paul-buerkner · 2019-10-03T17:55:21Z

Side note: The outer object could contain more slots than draws and format. It may also contain information about iterations, chains, and variables. This is partially redundant to the information stored in the draws slot, but would allow us to bring an array to a matrix and back to an array without loosing information about the chains (which are otherwise lost through conversion to a matrix).

jgabry · 2019-10-03T17:57:00Z

@paul-buerkner I hadn’t seen your most recent comment before I posted mine. I’ll think about this.

jgabry · 2019-10-03T17:58:04Z

Regarding your side note, that’s true. And storing that extra info wouldn’t take up much memory at all.

mjskay · 2019-10-03T18:03:49Z

Question about format: If we are creating our own internal format for meta-data anyway, would it make more sense to just decide on a consistent internal format for the draws slot that any input data structure is converted to rather than trying to support multiple internal formats? It seems like supporting many internal formats is likely to add complexity to the code; is that complexity worth it?

paul-buerkner · 2019-10-03T18:08:46Z

Two brief points:

Re format: I think it is one of the primary ideas of the package to support multiple formats, as different formats might be more efficient or preferred for other reasons in different contexts. Supporting only one format may be a reason why other packages failed with this task or are not as widely applied as hoped.

Re inner/outer object: Ideally, I would like to make the inner and outer object indistinquishable from the user perspective, but this may imply definining new methods for all sorts of things such as +, - etc. all up to log, sin etc. Do you see an efficient way of storing meta-data without causing this whole tail of problems? Attributes are an obivious solution, but they will be removed faster than we can add them back...

mjskay · 2019-10-03T18:37:02Z

Re format: I think it is one of the primary ideas of the package to support multiple formats, as different formats might be more efficient or preferred for other reasons in different contexts. Supporting only one format may be a reason why other packages failed with this task or are not as widely applied as hoped.

Ah. Maybe I am misunderstanding the purpose of the package or what it means to support multiple formats. I guess I had imagined coming up with a single primary format that supports most use cases, and then providing functions for converting between formats. But I didn't expect all the functions in the package to directly support all formats being passed into them; that instead they would be written to use the base format. It was from this mental model that I had originally suggested not having a bunch of different classes.

However, if I understand correctly, you are proposing that converting to different "formats" would still produce objects that are supported by every other function in the package. I could definitely see value in that (e.g. a version of ess or whatever that works on various different formats, returning something sensible and consistent for each one). But then I think having separate classes starts to make more sense, as then you can define generic functions representing common operations more easily. So creating a posterior_matrix (or draws_matrix or whatever) would be very efficient (essentially just tag it as such, adding a class on top). This would also make the appropriate generic functions work without requiring us to implement any kind of mirroring of functions already supported by that class, like + and - and whatnot. E.g. classes like c("draws", "matrix") or c("draws_matrix", "draws", "matrix") or something.

paul-buerkner · 2019-10-03T19:56:55Z

That is not exactly what I ment. Rather I think that we will work with a specific format most efficient for a given task. For instance when it comes to convergence diagnostics we (likely) best work with an array format to then be subsetted for each variable (which is then a iteration times chain matrix). Or a list of these matrices. For other purposes, it might be more efficient to work with other format such as tidy data frames. So I think the package should do is converting formats into each other but always return a fixed output format specific to each processing function such as Rhat, ESS etc. Does that make more sense? And is it closer to what you had in mind? Matthew Kay <notifications@github.com> schrieb am Do., 3. Okt. 2019, 20:37:

…

Re format: I think it is one of the primary ideas of the package to support multiple formats, as different formats might be more efficient or preferred for other reasons in different contexts. Supporting only one format may be a reason why other packages failed with this task or are not as widely applied as hoped. Ah. Maybe I am misunderstanding the purpose of the package or what it means to support multiple formats. I guess I had imagined coming up with a single primary format that supports most use cases, and then providing functions for converting between formats. But I didn't expect all the functions in the package to directly support all formats being passed into them; that instead they would be written to use the base format. It was from this mental model that I had originally suggested not having a bunch of different classes. However, if I understand correctly, you are proposing that converting to different "formats" would still produce objects that are supported by every other function in the package. I could definitely see value in that (e.g. a version of ess or whatever that works on various different formats, returning something sensible and consistent for each one). But then I think having separate classes starts to make more sense, as then you can define generic functions representing common operations more easily. So creating a posterior_matrix (or draws_matrix or whatever) would be very efficient (essentially just tag it as such, adding a class on top). This would also make the appropriate generic functions work without requiring us to implement any kind of mirroring of functions already supported by that class, like + and - and whatnot. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#1>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ADCW2AE6AN5AWBEI6SBYSJDQMY3U5ANCNFSM4I4RENMQ> .

mjskay · 2019-10-03T20:33:00Z

Ah, I think that makes sense. So the idea is a standard internal format that can be converted easily, right? Then is the idea to define functions like this:

ess = function(draws, ...) {
  UseMethod("ess", draws)
}
ess.draws = function(draws, ...) {
  ess(as_draws_array(draws), ...)
}
ess.draws_array = function(draws, ...) {
  # insert actual implementation of ess here
}

Then XXX.draws is always defined for a given function, but other implementations may not be.
Then as_draws_array.draws() (or maybe as_array.draws()) returns an object with class c("draws_array", "array") (or maybe just "array" --- but the advantage of having "draws_array" in there is a guarantee that it's in the right format, so ess can rely upon the array being in the correct format). Then, if someone is doing a bunch of operations that are most efficient with the array representation, they could do that conversion up front and call those functions on the converted object. And if they don't do the conversion up front, it all still "just works" from the user's perspective.

paul-buerkner · 2019-10-03T20:59:17Z

Right. If we go ahead and define custom classes for each format, we may not even need the "high-level" object class. It may store some meta-information but at the expense of not allowing mathematical operations and whatnot directly on the posterior draws (unless we invest a huge amount of effort).

Even without the high-level class, we could simply have

ess.draws_matrix <- function(draws, ...) {
  ess(as_draws_array(draws), ...)
}

or even simpler

ess.default <- function(draws, ...) {
  ess(as_draws_array(draws), ...)
}

For other methods, another format may be more canonical, so we would transform all other formats into that specific format beforehand. If this turns out to be a speed issue in some cases, we can still implement a specific methods for different formats (with the same output structure of course). For instance

ess.draws_list = function(draws, ...) {
  # some ess implementation without converting to draws_array first
}

Base R object seem class stable under vectorized operations, which is good for our purposes

A <- array(1:27, dim = c(3, 3, 3))
class(A) <- c("draws_array", class(A))
class(A)
> [1] "draws_array" "array" 
class(A+A)
> [1] "draws_array" "array"

For the matrix and array format, I may actually prefer using the rray package as it seems more consistent to me (dropping of dimension wise for instance). However, it currently drops custom classes:

B <- rray::as_rray(array(1:27, dim = c(3, 3, 3)))
class(B) <- c("draws_array", class(B))
class(B)
> [1] "draws_array"    "vctrs_rray_int" "vctrs_rray"     "vctrs_vctr"
class(B+B)
> [1] "vctrs_rray_int" "vctrs_rray"     "vctrs_vctr"

which is annoying and prevents its usage for our purposes. I have already opened an issue for it (r-lib/rray#247) so lets see if their is a solution. In the worst case, we can stick to base R matrices and arrays but redefine [ for our custom class so that drop is always FALSE.

paul-buerkner · 2019-10-04T15:42:00Z

I interprete your "thumbs up" as if you agree with me on going that route. I am starting to change our prototype in that direction.

paul-buerkner · 2019-10-04T17:01:55Z

Done. Switching to the new set of custom classes based on base R classes allows for a surprisingly clean code structure. The user interface also looks pretty clean and R-like:

# Example
x <- round(rnorm(50), 2)
x <- matrix(x, nrow = 10, ncol = 5)
dimnames(x) <- list(NULL, paste0("theta", 1:5))

post <- as_draws_matrix(x)
post <- as_draws_array(post)
subset_variables(post, "theta2")
# expected format for most convergence diagnostics
extract_one_variable_matrix(post, "theta2")

Do I have your ok for continuing with this format, or would you prefer another structure?

mjskay · 2019-10-04T17:53:59Z

This looks great to me!

paul-buerkner · 2019-10-04T21:06:53Z

@jgabry what are your thoughts? I am asking because if we agree on that, I can continue working in this direction in the upcoming days if I have time.

jgabry · 2019-10-04T21:13:04Z

Thanks! And yeah if you have time go for it!

jgabry · 2019-10-04T21:16:27Z

I’m not sure about the names for all the functions (I don’t dislike them, just haven’t thought about it yet) but that can be decided later. But I do like the approach so if you’re on a roll and want to continue with the implementation that’d be great!

paul-buerkner · 2019-10-04T21:30:57Z

thanks! Yeah, the names can be changed at any time, some of them are pretty ad-hoc. If I find the time during the weekend or on monday, I am going to implement flattend lists. All formats which take the dimension of the variables into account are a little bit more involved. @mjskay you have some experience with those in tidybayes, don't you?

jgabry · 2019-10-04T21:48:20Z

Also, I was thinking about custom indexing methods and, in addition to making drop=FALSE, it's easy to have the indexing method preserve the custom class (and dimnames or whatever attributes we want). For example:

`[.foo` <- function(x, i, j, ...) {
  out <- NextMethod("[", drop = FALSE)
  class(out) <- class(x)
  dimnames(out) <- dimnames(x) # can do other attributes too if we want
  out
}

x <- array(rnorm(1000), dim = c(10, 10, 10))
class(x[1:5, 1:2, 1]) # "matrix"
dim(x[1:5, 1:2, 1])   # c(5, 2)

class(x) <- "foo"
class(x[1:5, 1:2, 1]) # "foo", but without the custom method would be "matrix" or "array" (if drop=FALSE)
dim(x[1:5, 1:2, 1])   # c(5, 2, 1)

paul-buerkner · 2019-10-04T22:30:16Z

I did this already :-D Jonah Gabry <notifications@github.com> schrieb am Fr., 4. Okt. 2019, 23:48:

…

Also, I was thinking about custom indexing methods and, in addition to making drop=FALSE, it's easy to have the indexing method preserve the custom class (and dimnames or whatever attributes we want). For example: `[.foo` <- function(x, i, j, ...) { out <- NextMethod("[", drop = FALSE) class(out) <- class(x) dimnames(out) <- dimnames(x) # can do other attributes too if we want out } x <- array(rnorm(1000), dim = c(10, 10, 10)) class(x[1:5, 1:2, 1]) # "matrix" dim(x[1:5, 1:2, 1]) # c(5, 2) class(x) <- "foo" class(x[1:5, 1:2, 1]) # "foo" dim(x[1:5, 1:2, 1]) # c(5, 2, 1) — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#1>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ADCW2AGMQI73Q5PCX5MISFTQM622JANCNFSM4I4RENMQ> .

paul-buerkner · 2019-10-04T22:31:01Z

But not as elegantly I guess. May need to update this. Paul Buerkner <paul.buerkner@gmail.com> schrieb am Sa., 5. Okt. 2019, 00:30:

…

I did this already :-D Jonah Gabry ***@***.***> schrieb am Fr., 4. Okt. 2019, 23:48: > Also, I was thinking about custom indexing methods and, in addition to > making drop=FALSE, it's easy to have the indexing method preserve the > custom class (and dimnames or whatever attributes we want). For example: > > `[.foo` <- function(x, i, j, ...) { > out <- NextMethod("[", drop = FALSE) > class(out) <- class(x) > dimnames(out) <- dimnames(x) # can do other attributes too if we want > out > } > x <- array(rnorm(1000), dim = c(10, 10, 10)) > class(x[1:5, 1:2, 1]) # "matrix" > dim(x[1:5, 1:2, 1]) # c(5, 2) > > class(x) <- "foo" > class(x[1:5, 1:2, 1]) # "foo" > dim(x[1:5, 1:2, 1]) # c(5, 2, 1) > > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub > <#1>, > or mute the thread > <https://github.com/notifications/unsubscribe-auth/ADCW2AGMQI73Q5PCX5MISFTQM622JANCNFSM4I4RENMQ> > . >

mjskay · 2019-10-05T04:50:26Z

All formats which take the dimension of the variables into account are a little bit more involved. @mjskay you have some experience with those in tidybayes, don't you?

Yeah. Some thoughts on that off the top of my head:

I'm not yet sure we need to support generating the longer-form data frame format that tidybayes generates from spread_draws() and gather_draws() in this package. That is the format where dimensions of variables are themselves pulled out into columns. That format can be very useful, but also is not a format you can automatically transform the entire posterior into: it typically only makes sense to do it with subsets of the posterior for variables that share some dimensions, and spread_draws() / gather_draws() requires the user to specify such a subset in order to generate the data frame. Thus it might make sense to leave that code in tidybayes but make sure that the general format for data frames of draws is the same in both packages (e.g. column names and the class name we decorate it with).
To that end, I think we should support the basic data-frame-of-draws that tidybayes uses here (i.e. a .chain, .iteration, and .draw column plus every other column is a variable/parameter). In fact if this package handled all the conversion code I could simplify tidybayes::tidy_draws() to use this package's conversion functions instead of implementing them in tidybayes.
For other formats that handle dimensions, I think it would be great if we can make named dimensions as painless as possible (e.g. if you have variables like "x[foo]", "x[bar]", ... these should retain the "foo" and "bar" dimension names if extracted as an array). One thing tidybayes currently does to help with this is to provide a mechanism for automatically mapping indices back onto factor levels if a variable index was created from a factor. So if you have "x[1]", "x[2]", ... you can provide a factor using recover_types() to have the dimension names mapped back onto character strings whenever you use spread_draws() / gather_draws(). We might want a similar mechanism for formats like arrays.

paul-buerkner · 2019-10-05T06:33:07Z

Thanks @mjskay! I realized we are now already in the topic of another issue with this and so I have copied over your response to #4. Since we decided to go for custom classes, I think we can close this issue here.

mjskay mentioned this issue Oct 4, 2019

Glossary of nouns and verbs for the package #5

Closed

paul-buerkner mentioned this issue Oct 5, 2019

Which representations of posterior draws are included? #4

Closed

paul-buerkner added this to the beta release milestone Oct 5, 2019

paul-buerkner closed this as completed Oct 5, 2019

Do the different representations need custom classes? #1

Do the different representations need custom classes? #1

Comments

jgabry commented Oct 2, 2019

paul-buerkner commented Oct 2, 2019

paul-buerkner commented Oct 2, 2019

jgabry commented Oct 2, 2019

mjskay commented Oct 2, 2019

jgabry commented Oct 2, 2019 • edited Loading

mjskay commented Oct 2, 2019

paul-buerkner commented Oct 2, 2019

paul-buerkner commented Oct 2, 2019

mjskay commented Oct 2, 2019 • edited Loading

paul-buerkner commented Oct 2, 2019

mjskay commented Oct 2, 2019

mjskay commented Oct 2, 2019 • edited Loading

paul-buerkner commented Oct 2, 2019

paul-buerkner commented Oct 2, 2019

jgabry commented Oct 2, 2019 • edited Loading

jgabry commented Oct 2, 2019

jgabry commented Oct 2, 2019

mjskay commented Oct 3, 2019

mjskay commented Oct 3, 2019

paul-buerkner commented Oct 3, 2019 • edited Loading

paul-buerkner commented Oct 3, 2019

paul-buerkner commented Oct 3, 2019

mjskay commented Oct 3, 2019

paul-buerkner commented Oct 3, 2019

jgabry commented Oct 3, 2019

paul-buerkner commented Oct 3, 2019

mjskay commented Oct 3, 2019

jgabry commented Oct 3, 2019

paul-buerkner commented Oct 3, 2019 • edited Loading

jgabry commented Oct 3, 2019

paul-buerkner commented Oct 3, 2019

jgabry commented Oct 3, 2019

jgabry commented Oct 3, 2019 • edited Loading

mjskay commented Oct 3, 2019

paul-buerkner commented Oct 3, 2019

mjskay commented Oct 3, 2019 • edited Loading

paul-buerkner commented Oct 3, 2019 via email

mjskay commented Oct 3, 2019

paul-buerkner commented Oct 3, 2019

paul-buerkner commented Oct 4, 2019

paul-buerkner commented Oct 4, 2019

mjskay commented Oct 4, 2019

paul-buerkner commented Oct 4, 2019

jgabry commented Oct 4, 2019

jgabry commented Oct 4, 2019

paul-buerkner commented Oct 4, 2019

jgabry commented Oct 4, 2019 • edited Loading

paul-buerkner commented Oct 4, 2019 via email

paul-buerkner commented Oct 4, 2019 via email

mjskay commented Oct 5, 2019

paul-buerkner commented Oct 5, 2019

jgabry commented Oct 2, 2019 •

edited

Loading

mjskay commented Oct 2, 2019 •

edited

Loading

mjskay commented Oct 2, 2019 •

edited

Loading

jgabry commented Oct 2, 2019 •

edited

Loading

paul-buerkner commented Oct 3, 2019 •

edited

Loading

paul-buerkner commented Oct 3, 2019 •

edited

Loading

jgabry commented Oct 3, 2019 •

edited

Loading

mjskay commented Oct 3, 2019 •

edited

Loading

jgabry commented Oct 4, 2019 •

edited

Loading