From 58d4e12dac123b3123318d6b0e2ea475c465f375 Mon Sep 17 00:00:00 2001 From: TolisChal Date: Thu, 1 Jul 2021 21:29:57 +0300 Subject: [PATCH 1/3] update comparisons --- _posts/2021-06-30-volesti-comparisons.md | 33 +++++++++--------------- 1 file changed, 12 insertions(+), 21 deletions(-) diff --git a/_posts/2021-06-30-volesti-comparisons.md b/_posts/2021-06-30-volesti-comparisons.md index 85c97a3..f4c1e5a 100644 --- a/_posts/2021-06-30-volesti-comparisons.md +++ b/_posts/2021-06-30-volesti-comparisons.md @@ -51,7 +51,7 @@ cat(min(coda::effectiveSize(t(samples1))) / time1[3], And the output of this script is, -``` +```R 0.02818878 74.18608 ``` @@ -64,22 +64,11 @@ This means that the performance of *volesti* is ~2500 times faster than the perf In many Bayesian models the posterior distribution is a multivariate Gaussian distribution restricted to a specific domain. We illustrate the efficiency of *volesti* for the case of the truncation being the canonical simplex,
-$$\Delta^n =\{ x\in\R^n\ |\ x_i\geq 0,\ \sum_ix_i=1 \}$$ -
+$$\Delta^n =\{ x\in\mathbb{R}^n\ |\ x_i\geq 0,\ \sum_ix_i=1 \}.$$ +
This case is of special interest. This situation typically occurs whenever the unknown parameters can be interpreted as fractions or probabilities. Thus, it appears in many important applications; for more details you can read this [paper](https://ieeexplore.ieee.org/document/6884588). -The probability density function we are going to sample from in the following example is, - -
-$$f(x|\mu,\Sigma) \propto \left\{ -\begin{array}{ll} - exp[-\frac{1}{2}(x-\mu)^T\Sigma(x-\mu)], & \mbox{ if } x\in\Delta^n ,\\ - 0, & \mbox{otherwise.}\\ -\end{array} -\right. $$ -
-</p>
-In the following script we generate a random 100-dimensional covariance, we apply the necessary
-linear transformations and use *volesti*, *tmg* and *restrictedMVN* for sampling.
+The following script generates a random 100-dimensional covariance matrix, applies the necessary linear transformations, and then uses *volesti*, *tmg* and *restrictedMVN* to sample from the previous probability density function.

@@ -117,7 +106,7 @@ cat(min(coda::effectiveSize(t(samples1))) / time1[3],

And the output of this script is,

-```
+```R
30.3589 3.936369
```

@@ -152,7 +141,7 @@ time1 = system.time({geom_values = geometry::convhulln(P@V, options = 'FA')})

and then we get the following run-time error,

-```
+```R
QH6082 qhull error (qh_memalloc): insufficient memory to allocate 1329542232 bytes
```

@@ -167,7 +156,7 @@ cat(time2[3], vol2)

The output is,

-```
+```R
23.888 2.673747398e-07
```

@@ -181,13 +170,15 @@ On the other hand *volesti* can be used to approximate the value of such an inte

<p style="text-align: center;">
$$I = \int_P f(x)dx .$$ -
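The estimator below rests on a standard identity, not stated explicitly in the post: if a point x is uniformly distributed in P, the expectation of f(x) is the average value of f over P,

<p style="text-align: center;">
$$\mathbb{E}[f(x)] = \frac{1}{\mathrm{vol}(P)}\int_P f(x)dx = \frac{I}{\mathrm{vol}(P)},$$
</p>

so a sample average of f, rescaled by vol(P), estimates I.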
+
+ Then sample N uniformly distributed points from P and,
-$$I \approx \mathrm{vol}(P)\frac{1}{N}\sum_{i=1}^{N} f(x_i).$$
-</p>
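In code, the estimator is only a few lines. The following is a minimal sketch, not the script from the post: the cube and the integrand `f` are placeholder assumptions, and it uses the same `volesti` routines (`sample_points`, `volume`) that appear elsewhere in these posts.

```R
library(volesti)

# placeholder integrand; any function of a point (a numeric vector) works
f <- function(x) { exp(-sum(x^2)) }

P <- gen_cube(5, 'H')   # placeholder convex body in H-representation

N <- 5000
pts <- sample_points(P, n = N)   # d x N matrix, one uniform point per column

# Monte Carlo estimate: vol(P) times the sample average of f
I_hat <- volume(P) * mean(apply(pts, 2, f))
```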
+$$I \approx vol(P) (1/N) \sum_{i} f(x_i).$$ +
+

The following script generates a polytope for dimension d = 5, 10, 15, 20 and defines a function f. Then it computes the exact value of the integral of f over each polytope. Finally, it approximates that integral with *volesti*.

@@ -220,7 +211,7 @@ for (d in seq(from = 5, to = 20, by = 5)) {

The output is,

-```
+```R
5 0.02738404 0.02446581 0.1065667 0.023 3.983
10 3.224286e-06 3.204522e-06 0.00612976 3.562 11.95
15 4.504834e-11 4.867341e-11 0.08047068 471.479 33.256

From 5a948482240c3ef0b7110612f70f8a116a71 Mon Sep 17 00:00:00 2001
From: TolisChal
Date: Thu, 8 Jul 2021 21:43:23 +0300
Subject: [PATCH 2/3] update the latest post

---
 ...-06-30-Sampling-integration-volume-in-R.md | 57 ++++++++++++++++---
 1 file changed, 50 insertions(+), 7 deletions(-)

diff --git a/_posts/2021-06-30-Sampling-integration-volume-in-R.md b/_posts/2021-06-30-Sampling-integration-volume-in-R.md
index 8eba136..45f9898 100644
--- a/_posts/2021-06-30-Sampling-integration-volume-in-R.md
+++ b/_posts/2021-06-30-Sampling-integration-volume-in-R.md
@@ -80,12 +80,45 @@ The output of the above script,

shows that the performance of `volesti` is ~2500 times faster than `hitandrun`.

The following R script generates a random polytope for each dimension and samples 10000 points (walk length = 5) with both packages.

```{r}
for (d in seq(from = 10, to = 100, by = 10)) {

  P = gen_cube(d, 'H')
  # time volesti's random directions hit-and-run (RDHR) sampler
  time_volesti = system.time({ sample_points(P, n = 10000, random_walk = list(
    "walk" = "RDHR", "walk_length" = 5)) })

  # time the hitandrun package on the same H-polytope
  constr <- list(constr = P$A, dir = rep('<=', 2*d), rhs = P$b)
  time_hitandrun = system.time({ hitandrun::hitandrun(constr, n.samples = 10000, thin = 5) })

  print(c(d, time_volesti[3], time_hitandrun[3]))  # elapsed times, reported below
}
```

The following table reports the run-times in seconds.

| dimension | volesti time (sec) | hitandrun time (sec) |
| --------- | ------------------ | -------------------- |
| 10 | 0.017 | 0.102 |
| 20 | 0.024 | 0.157 |
| 30 | 0.031 | 0.593 |
| 40 | 0.043 | 1.471 |
| 50 | 0.055 | 3.311 |
| 60 | 0.069 | 6.460 |
| 70 | 0.089 | 11.481 |
| 80 | 0.108 | 19.056 |
| 90 | 0.132 | 33.651 |
| 100 | 0.156 | 50.482 |

The following figure illustrates how the run-time gap between `volesti` and `hitandrun` grows with the dimension.

![ratio_times](https://user-images.githubusercontent.com/29889291/77155777-b9a5fd00-6aa6-11ea-90d9-8e4602ffe5c8.png)

## Comparison with restrictedMVN and tmg

In many Bayesian models the posterior distribution is a multivariate Gaussian
-distribution restricted to a specific domain.
+distribution restricted to a specific domain. For more details on Bayesian inference you can read the [Wikipedia page](https://en.wikipedia.org/wiki/Bayesian_statistics).
+
We illustrate the efficiency of `volesti` for the case of the truncation being the canonical simplex,

<p style="text-align: center;">
@@ -97,9 +130,13 @@ the unknown parameters can be interpreted as fractions or probabilities. Thus, it appears naturally in many important [applications](https://ieeexplore.ieee.org/document/6884588). -In our example we first generate a random 100-dimensional covariance matrix and -apply all the necessary linear transformations to construct a full dimensional -polytope. +In our example we first generate a random 100-dimensional positive definite matrix. +Then, we sample from the multivariate Gaussian distribution with that covariance +matrix and mode on the center of the canonical simplex. To achieve this goal we first +apply all the necessary linear transformations to both the probability density function and the +canonical simplex. Last, we sample from the standard Gaussian distribution restricted to +the computed full dimensional simplex and we apply the inverse transformations to obtain +a sample in the initial space. ```R d = 100 @@ -118,8 +155,7 @@ P = Hpolytope(A=A, b=as.numeric(b)) #new truncation ``` -Then we use `volesti`, `tmg` and `restrictedMVN` to sample from the polytope -following a target Gaussian distribution. +Now we use `volesti`, `tmg` and `restrictedMVN` to sample, ```R time1 = system.time({samples1 = sample_points(P, n = 100000, random_walk = @@ -145,7 +181,14 @@ the output of the script, 30.3589 3.936369 ``` -allow us to conclude that `volesti` is one order of magnitude faster than +Now we can apply the inverse transformation, + +```R +samples_initial_space = N %*% samples1 + + kronecker(matrix(1, 1, 100000), matrix(shift, ncol = 1)) +``` + +Allow us to conclude that `volesti` is one order of magnitude faster than `restrictedMVN` for computing a sample of similar quality. Unfortunately, `tmg` failed to terminate in our setup. From cc11694a46a2686312d4c51581680fa18f32e3bc Mon Sep 17 00:00:00 2001 From: TolisChal Date: Thu, 8 Jul 2021 21:45:01 +0300 Subject: [PATCH 3/3] minor update --- _posts/2021-06-30-Sampling-integration-volume-in-R.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2021-06-30-Sampling-integration-volume-in-R.md b/_posts/2021-06-30-Sampling-integration-volume-in-R.md index 45f9898..e64de81 100644 --- a/_posts/2021-06-30-Sampling-integration-volume-in-R.md +++ b/_posts/2021-06-30-Sampling-integration-volume-in-R.md @@ -82,7 +82,7 @@ shows that the performance of `volesti` is ~2500 times faster than `hitandrun`. The following R script generates a random polytope for each dimension and samples 10000 points (walk length = 5) with both packages. -```{r} +```R for (d in seq(from = 10, to = 100, by = 10)) { P = gen_cube(d, 'H')