diff mbox

libstdc++: add uniform on sphere distribution

Message ID 87d2catvi3.fsf@x240.local.i-did-not-set--mail-host-address--so-tickle-me
State New
Headers show

Commit Message

Ulrich Drepper Aug. 8, 2014, 11:24 p.m. UTC
Jonathan Wakely <jwakely@redhat.com> writes:

> On 23/07/14 11:58 +0200, Marc Glisse wrote:
> As an aside, we already have divide-by-zero bugs in <ext/random>, it
> would be nice if someone could look at that.
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60037

Sorry, it took a while to get back to tihs and now the referenced bug is
already fixed.  Good.  Now also for a fix of the sphere distribution.
Unless someone objects I'll check in the patch below.


2014-08-08  Ulrich Drepper  <drepper@gmail.com>

	* include/ext/random.tcc
	(uniform_on_sphere_distribution::__generate_impl): Reject
	vectors with norm zero.

Comments

Marc Glisse Aug. 9, 2014, 7:15 a.m. UTC | #1
On Fri, 8 Aug 2014, Ulrich Drepper wrote:

> Now also for a fix of the sphere distribution. Unless someone objects 
> I'll check in the patch below.
>
>
> 2014-08-08  Ulrich Drepper  <drepper@gmail.com>
>
> 	* include/ext/random.tcc
> 	(uniform_on_sphere_distribution::__generate_impl): Reject
> 	vectors with norm zero.

While there, do we want to also reject infinite norms?
I would have done: while (__sum < small || __sum > large)
but testing exactly for 0 and infinity seems good enough.
Ulrich Drepper Aug. 9, 2014, 11:54 a.m. UTC | #2
On Sat, Aug 9, 2014 at 3:15 AM, Marc Glisse <marc.glisse@inria.fr> wrote:
> While there, do we want to also reject infinite norms?
> I would have done: while (__sum < small || __sum > large)
> but testing exactly for 0 and infinity seems good enough.

I guess the squaring can theoretically overflow and produce infinity.
It will never happen with the way we generate normally distributed
numbers, though.  These values are always so unlikely that it is OK
that the algorithms cannot return them.  If you insist I'll add a test
for infinity.

The other change (which would eliminate the necessity for this test in
a special case) is to use hypot for _Dimen==2.  This might be a case
common enough to warrant that little bit of extra text.  I'll prepare
a patch.
Marc Glisse Aug. 9, 2014, 12:34 p.m. UTC | #3
On Sat, 9 Aug 2014, Ulrich Drepper wrote:

> On Sat, Aug 9, 2014 at 3:15 AM, Marc Glisse <marc.glisse@inria.fr> wrote:
>> While there, do we want to also reject infinite norms?
>> I would have done: while (__sum < small || __sum > large)
>> but testing exactly for 0 and infinity seems good enough.
>
> I guess the squaring can theoretically overflow and produce infinity.
> It will never happen with the way we generate normally distributed
> numbers, though.  These values are always so unlikely that it is OK
> that the algorithms cannot return them.  If you insist I'll add a test
> for infinity.

Oh, a comment saying exactly what you just said would be fine with me (or 
even nothing).

> The other change (which would eliminate the necessity for this test in
> a special case) is to use hypot for _Dimen==2.  This might be a case
> common enough to warrant that little bit of extra text.  I'll prepare
> a patch.

If you are going to specialize for dim 2, I imagine you won't be computing 
normal distributions, you will only generate a point uniformy in a square 
and reject it if it is not in the ball? (interestingly enough this is used 
as a subroutine by the implementation of normal_distribution)
Ulrich Drepper Aug. 9, 2014, 3:24 p.m. UTC | #4
On Sat, Aug 9, 2014 at 8:34 AM, Marc Glisse <marc.glisse@inria.fr> wrote:
> Oh, a comment saying exactly what you just said would be fine with me (or
> even nothing).

We might at some point use a different method than Box-Muller sampling
so I'm OK with the test.


> If you are going to specialize for dim 2, I imagine you won't be computing
> normal distributions, you will only generate a point uniformy in a square
> and reject it if it is not in the ball? (interestingly enough this is used
> as a subroutine by the implementation of normal_distribution)

We need to be *on* the circle, not inside.  We'll still have to follow
the algorithm unless I miss something.  With reasonable probability we
cannot generate those numbers directly from a uniform source. What is
optimized is just the norm computation.
Marc Glisse Aug. 9, 2014, 3:33 p.m. UTC | #5
On Sat, 9 Aug 2014, Ulrich Drepper wrote:

>> If you are going to specialize for dim 2, I imagine you won't be computing
>> normal distributions, you will only generate a point uniformy in a square
>> and reject it if it is not in the ball? (interestingly enough this is used
>> as a subroutine by the implementation of normal_distribution)
>
> We need to be *on* the circle, not inside.

Yes, you still need the normalization step (divide by the norm). It works 
whether you start from a uniform distribution in the disk or from a 
Gaussian in the plane, but the first one is easier to generate (generate 
points in a square until you get one in the disk). When the dimension 
becomes higher, the probability that a point in the cube actually belongs 
to the ball decreases, and it becomes cheaper to use a Gaussian.
Ed Smith-Rowland Aug. 9, 2014, 4:55 p.m. UTC | #6
On 08/09/2014 11:33 AM, Marc Glisse wrote:
> On Sat, 9 Aug 2014, Ulrich Drepper wrote:
>
>>> If you are going to specialize for dim 2, I imagine you won't be 
>>> computing
>>> normal distributions, you will only generate a point uniformy in a 
>>> square
>>> and reject it if it is not in the ball? (interestingly enough this 
>>> is used
>>> as a subroutine by the implementation of normal_distribution)
>>
>> We need to be *on* the circle, not inside.
>
> Yes, you still need the normalization step (divide by the norm). It 
> works whether you start from a uniform distribution in the disk or 
> from a Gaussian in the plane, but the first one is easier to generate 
> (generate points in a square until you get one in the disk). When the 
> dimension becomes higher, the probability that a point in the cube 
> actually belongs to the ball decreases, and it becomes cheaper to use 
> a Gaussian.
>
If we pull in the implementation of normal you will just be able to use 
the two values that normal computes on the first, (and third, ...) calls 
without doing two calls.  That and hypot would be a real win.
diff mbox

Patch

diff --git a/libstdc++-v3/include/ext/random.tcc b/libstdc++-v3/include/ext/random.tcc
index 05361d8..d1f0b9c 100644
--- a/libstdc++-v3/include/ext/random.tcc
+++ b/libstdc++-v3/include/ext/random.tcc
@@ -1548,13 +1548,21 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		 const param_type& __p)
       {
 	result_type __ret;
-	_RealType __sum = _RealType(0);
+	_RealType __norm;
+
+	do
+	  {
+	    _RealType __sum = _RealType(0);
+
+	    std::generate(__ret.begin(), __ret.end(),
+		          [&__urng, &__sum, this](){
+			    _RealType __t = _M_nd(__urng);
+			    __sum += __t * __t;
+			    return __t; });
+	    __norm = std::sqrt(__sum);
+	  }
+	while (__norm == _RealType(0));
 
-	std::generate(__ret.begin(), __ret.end(),
-		      [&__urng, &__sum, this](){ _RealType __t = _M_nd(__urng);
-						 __sum += __t * __t;
-						 return __t; });
-	auto __norm = std::sqrt(__sum);
 	std::transform(__ret.begin(), __ret.end(), __ret.begin(),
 		       [__norm](_RealType __val){ return __val / __norm; });