
Crypto oops in async_chainiv_do_postponed

Message ID 19101.16628.347039.619378@waldo.imnotcreative.homeip.net
State RFC, archived
Delegated to: David Miller

Commit Message

Brad Bosch Sept. 1, 2009, 3:42 p.m. UTC
Herbert Xu writes:
 > On Mon, Aug 31, 2009 at 11:11:42AM -0500, Brad Bosch wrote:
 > >
 > > OK.  I was looking for something subtle because the crash takes a long
 > > time to happen.  But do you agree that the race I described above is
 > > also a real bug?
 > 
 > No I don't think it is.  CHAINV_STATE_INUSE guarantees that only
 > one entity can use ctx->err at any time.

I don't see how you are protecting ctx->err with the INUSE flag.  For
example:

If two threads enter async_chainiv_givencrypt at the same time, one
thread will call async_chainiv_postpone_request (INUSE will be clear
until set by async_chainiv_postpone_request) and the other thread will
call async_chainiv_givencrypt_tail (INUSE may or may not be set yet).

Now, ctx->err may be used by both async_chainiv_postpone_request to
store the return value from skcipher_enqueue_givcrypt and by
async_chainiv_givencrypt_tail to store the return value from
crypto_ablkcipher_encrypt at the same time.  This can cause the
calling function to think async_chainiv_givencrypt has completed its
work, when in fact, the work was deferred.

The patch I proposed earlier (included again below) avoids this and
also makes the error handling simpler and more direct without
requiring ctx->err at all.  I still don't understand why ctx->err was
required in the first place.
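Brad's concern is the classic hazard of parking a return code in a field shared by every caller. The sketch below is a hypothetical user-space model, not chainiv.c itself: `result_via_ctx()` mimics the pre-patch shape (store in ctx->err, read it back later) and `result_via_local()` mimics the patched shape (keep the result in a local). The `interleave` callback is an assumption standing in for another CPU running between the store and the load; whether chainiv.c can actually reach that interleaving is exactly what the rest of the thread disputes.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model, not kernel code: 'err' is one slot shared by every
 * caller, like ctx->err in the pre-patch chainiv.c. */
struct model_ctx {
	int err;
};

/* Stands in for another CPU running between the store and the load;
 * NULL models the uncontended case. */
typedef void (*interleave_fn)(struct model_ctx *ctx);

/* Pre-patch shape: park the result in the shared context, read it back. */
static int result_via_ctx(struct model_ctx *ctx, int ret,
			  interleave_fn interleave)
{
	ctx->err = ret;
	if (interleave)
		interleave(ctx);
	return ctx->err;	/* may now hold another caller's value */
}

/* Patched shape: the result lives in a local, so no other caller can
 * overwrite it however the calls interleave. */
static int result_via_local(struct model_ctx *ctx, int ret,
			    interleave_fn interleave)
{
	int err = ret;

	if (interleave)
		interleave(ctx);
	return err;
}

/* A concurrent tail path whose encrypt completed synchronously (0). */
static void sync_encrypt_writer(struct model_ctx *ctx)
{
	ctx->err = 0;
}
```

With the shared slot, a postponed caller that stored -EINPROGRESS can read back 0 and conclude the request already completed, which is the symptom described above; the local variable removes any dependence on mutual exclusion for the return value.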

Did I miss something with regard to the use of ctx->err?

Now, as to the other bug...

 > 
 > Where we subtract the offset the pointer can never be NULL.  Please
 > try my patch.

OK.  I see now that your offset patch should indeed solve that
problem.  But why did you choose to fix it in a complex way?  My
suggestion just adds a single test while yours adds new parameters, a
new function and an extra function call.

Thanks for your help.

--Brad

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Herbert Xu Sept. 1, 2009, 10:17 p.m. UTC | #1
On Tue, Sep 01, 2009 at 10:42:44AM -0500, Brad Bosch wrote:
> 
> Now, ctx->err may be used by both async_chainiv_postpone_request to
> store the return value from skcipher_enqueue_givcrypt and by
> async_chainiv_givencrypt_tail to store the return value from
> crypto_ablkcipher_encrypt at the same time.  This can cause the
> calling function to think async_chainiv_givencrypt has completed its
> work, when in fact, the work was deferred.

async_chainiv_postpone_request never touches ctx->err unless
it can obtain the INUSE bit lock.  On the other hand, the normal
path async_chainiv_givencrypt_tail never relinquishes the INUSE
bit until it is finished with ctx->err.
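Herbert's argument rests on CHAINIV_STATE_INUSE acting as a bit lock: whoever wins test_and_set_bit() owns ctx->err until it clears the bit again. A minimal user-space sketch of that discipline, assuming C11 atomics as stand-ins for the kernel's test_and_set_bit() and clear_bit(); the `model_*` names are hypothetical and are not kernel API.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical C11 stand-ins for the kernel's test_and_set_bit() and
 * clear_bit(); these are not the kernel API. */
#define MODEL_STATE_INUSE 0UL

static bool model_test_and_set_bit(unsigned long nr, atomic_ulong *addr)
{
	unsigned long mask;

	assert(nr < CHAR_BIT * sizeof(unsigned long));
	mask = 1UL << nr;
	/* Atomically set the bit and report whether it was already set. */
	return (atomic_fetch_or(addr, mask) & mask) != 0;
}

static void model_clear_bit(unsigned long nr, atomic_ulong *addr)
{
	/* Release ordering publishes writes made while holding the bit,
	 * roughly the job of smp_mb__before_clear_bit() in chainiv.c. */
	atomic_fetch_and_explicit(addr, ~(1UL << nr), memory_order_release);
}

struct model_ctx {
	atomic_ulong state;
	int err;	/* shared scratch: only the INUSE owner may write it */
};

/* Writes ctx->err only if the caller wins INUSE; a loser leaves it alone,
 * which is the property Herbert relies on. */
static bool model_try_store_err(struct model_ctx *ctx, int err)
{
	if (model_test_and_set_bit(MODEL_STATE_INUSE, &ctx->state))
		return false;
	ctx->err = err;
	return true;
}
```

The caller that wins the bit is the only one allowed to write `err`, and it keeps that right until it calls `model_clear_bit()`. The open question in the thread is whether every clear_bit() in chainiv.c is made by the current owner.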
 
> OK.  I see now that your offset patch should indeed solve that
> problem.  But why did you choose to fix it in a complex way?  My
> suggestion just adds a single test while yours adds new parameters, a
> new function and an extra function call.

Because that introduces two NULL checks where the second one is
useless.  Not a big deal but then again, my patch wasn't that
complicated either :)
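Neither NULL-check patch is quoted in this thread. As a generic illustration, assuming the bug is a container_of()-style subtraction of a member offset from a dequeue result that may be NULL, the sketch below folds the NULL test into an offset-aware dequeue helper so it happens once, before the subtraction. All types and function names here are hypothetical.

```c
#include <stddef.h>

/* Hypothetical types: a request that embeds a generic list node at a
 * non-zero offset, the case where adjusting a NULL pointer goes wrong. */
struct node {
	struct node *next;
};

struct outer_req {
	int payload;
	struct node link;	/* not the first member: offset != 0 */
};

struct queue {
	struct node *head;
};

/* Generic dequeue: returns the embedded node, or NULL when empty. */
static struct node *queue_pop(struct queue *q)
{
	struct node *n = q->head;

	if (n)
		q->head = n->next;
	return n;
}

/* Offset-aware dequeue: test for NULL *before* subtracting the member
 * offset. Subtracting first would turn NULL into a bogus non-NULL
 * pointer, so a caller's later NULL test could never fire. */
static void *queue_pop_offset(struct queue *q, size_t offset)
{
	struct node *n = queue_pop(q);

	if (!n)
		return NULL;
	return (char *)n - offset;
}

static struct outer_req *outer_dequeue(struct queue *q)
{
	return queue_pop_offset(q, offsetof(struct outer_req, link));
}
```

Putting the check inside the helper means callers such as `outer_dequeue()` need only one NULL test on the adjusted pointer, rather than one before and one after the adjustment.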

Please let me know whether it actually fixes your problem though
so I can get this upstream.

Thanks,
Brad Bosch Sept. 2, 2009, 2:23 p.m. UTC | #2
(resent due to bounce notification for vger)
Herbert Xu writes:
 > On Tue, Sep 01, 2009 at 10:42:44AM -0500, Brad Bosch wrote:
 > > 
 > > Now, ctx->err may be used by both async_chainiv_postpone_request to
 > > store the return value from skcipher_enqueue_givcrypt and by
 > > async_chainiv_givencrypt_tail to store the return value from
 > > crypto_ablkcipher_encrypt at the same time.  This can cause the
 > > calling function to think async_chainiv_givencrypt has completed its
 > > work, when in fact, the work was deferred.
 > 
 > async_chainiv_postpone_request never touches ctx->err unless
 > it can obtain the INUSE bit lock.  On the other hand, the normal
 > path async_chainiv_givencrypt_tail never relinquishes the INUSE
 > bit until it is finished with ctx->err.

But the above statements are not adequate to demonstrate that your use
of the INUSE flag always prevents a condition where both
async_chainiv_postpone_request and async_chainiv_givencrypt_tail
operate on the same ctx at the same time.  The flaw in your logic may
be that async_chainiv_schedule_work does not have solid assurance that
its thread is the one that holds the INUSE bit when it calls
clear_bit.

I seem to have trouble getting the details right in describing a path
that causes both uses of ctx->err to happen at the same time.  Let me
try again.

Assume the worker thread is executing between the dequeue in
async_chainiv_do_postponed and the clear_bit call in
async_chainiv_schedule_work.  Further assume that we are processing
the last item on the queue, so during this time ctx->queue.qlen =
0.

Meanwhile, three threads enter async_chainiv_givencrypt for the same
ctx at about the same time.

Thread one calls test_and_set_bit which returns 1 and calls
async_chainiv_postpone_request but suppose it has not yet enqueued.
Now INUSE is set and qlen=0.

Next, the worker thread calls clear_bit in async_chainiv_schedule_work
but it is interrupted before it can call test_and_set_bit.  Now INUSE
is clear and qlen=0.

The test_and_set_bit in thread two is called at this moment and
returns 0 and then calls async_chainiv_givencrypt_tail.  Now INUSE is
set and qlen=0.

Thread one now locks the ctx and calls skcipher_enqueue_givcrypt and
unlocks.  Now INUSE is set and qlen=1.

Thread three calls test_and_set_bit, which returns 1, and then it clears
INUSE since qlen=1, and it calls postpone with INUSE clear and qlen=1.

Now thread three will use ctx->err to hold the return value of
skcipher_enqueue_givcrypt at the same time as thread two uses ctx->err
to hold the return value of crypto_ablkcipher_encrypt!
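The first four steps of this interleaving can be checked mechanically. The sketch below is a hypothetical single-threaded bookkeeping model that tracks only the INUSE bit and qlen and applies each step in the order given; it does not model real concurrency, memory ordering, or what the threads do with ctx->err.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping model: 'inuse' is CHAINIV_STATE_INUSE and
 * 'qlen' is ctx->queue.qlen. Steps run in the order the scenario gives. */
struct model_state {
	bool inuse;
	unsigned int qlen;
};

/* Same contract as test_and_set_bit(): set the bit, return its old value. */
static bool tas(struct model_state *s)
{
	bool old = s->inuse;

	s->inuse = true;
	return old;
}

struct replay_result {
	bool t1_tas;			/* thread one's test_and_set_bit() */
	bool t2_tas;			/* thread two's test_and_set_bit() */
	struct model_state before_t3;	/* state when thread three arrives */
};

static struct replay_result replay_first_steps(void)
{
	/* Worker is between dequeue and clear_bit on the last item. */
	struct model_state s = { .inuse = true, .qlen = 0 };
	struct replay_result r;

	r.t1_tas = tas(&s);	/* thread one: bit already set -> postpone */
	s.inuse = false;	/* worker: clear_bit() in schedule_work */
	r.t2_tas = tas(&s);	/* thread two: wins the bit, qlen == 0 -> tail */
	s.qlen++;		/* thread one: enqueues under ctx->lock */
	r.before_t3 = s;
	return r;
}
```

Up to this point the bookkeeping matches the description: thread two owns INUSE and one request is queued when thread three arrives.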

Did I make a mistake above?  I suspect more bad things can happen as
well in this scenario, but I'm just focusing on the use of ctx->err here.

 > 
 > Please let me know whether it actually fixes your problem though
 > so I can get this upstream.

Unfortunately, the offset problem is not easily reproduced with our
application, so testing long enough to be sure the problem is fixed
(assuming that it was indeed the cause of the oops) may not be
practical.  All I can say at the moment is that I have not seen the
crash since I introduced the two patches I sent you.

Thanks for taking the time to discuss this!

--Brad
Herbert Xu Sept. 2, 2009, 9:57 p.m. UTC | #3
On Wed, Sep 02, 2009 at 09:08:38AM -0500, Brad Bosch wrote:
> 
> Assume the worker thread is executing between the dequeue in
> async_chainiv_do_postponed and the clear_bit call in
> async_chainiv_schedule_work.  Further assume that we are processing

It cannot.  The worker thread can only execute when it owns
the INUSE bit.  In that case do_postponed will never call the
schedule_work function.

Perhaps you were misled by the clear_bit call in schedule_work.
That is only used if we end up not scheduling the work.
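Herbert's point is that the clear_bit() in schedule_work is a release-then-recheck hand-off performed by the current owner. A hypothetical C11 model of that hand-off follows, assuming sequentially consistent atomics in place of clear_bit(), smp_mb__before_clear_bit() and test_and_set_bit(). The assert encodes the precondition that only the INUSE owner calls it, and the `scheduled` counter is an assumed stand-in for schedule_work().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the hand-off in async_chainiv_schedule_work();
 * the names are illustrative, not the kernel's. */
#define MODEL_INUSE 1UL

struct model_ctx {
	atomic_ulong state;	/* bit 0 plays CHAINIV_STATE_INUSE */
	atomic_uint qlen;	/* stands in for ctx->queue.qlen */
	unsigned int scheduled;	/* counts simulated schedule_work() calls */
};

/* Must be called by the current INUSE owner. Returns true if ownership
 * passes to the (simulated) worker, false if the bit was given up. */
static bool model_schedule_work(struct model_ctx *ctx)
{
	assert(atomic_load(&ctx->state) & MODEL_INUSE);

	if (atomic_load(&ctx->qlen) == 0) {
		/* Queue looks empty: release the bit. Default (seq_cst)
		 * atomics order this clear before the re-check below. */
		atomic_fetch_and(&ctx->state, ~MODEL_INUSE);

		/* A postponer may have enqueued after our first look while
		 * the bit was still held, so look again. If the queue is
		 * still empty, or another caller has taken the bit, stop:
		 * this caller no longer owns any shared state. */
		if (atomic_load(&ctx->qlen) == 0 ||
		    (atomic_fetch_or(&ctx->state, MODEL_INUSE) & MODEL_INUSE))
			return false;
	}

	/* Still (or once more) the owner: hand the bit to the worker. */
	ctx->scheduled++;
	return true;
}
```

The clear is only reached by a caller that holds the bit, and after it the caller either re-takes the bit itself or stops touching shared state, so ownership is never held by two parties at once in this model.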
 
> Unfortunately, the offset problem is not easily reproduced with our
> application, so testing long enough to be sure the problem is fixed
> (assuming that it was indeed the cause of the oops) may not be
> practical.  All I can say at the moment is that I have not seen the
> crash since I introduced the two patches I sent you.

OK I'll forward this upstream then.

Thanks,
Brad Bosch Sept. 2, 2009, 11:47 p.m. UTC | #4
Herbert Xu writes:
 > On Wed, Sep 02, 2009 at 09:08:38AM -0500, Brad Bosch wrote:
 > > 
 > > Assume the worker thread is executing between the dequeue in
 > > async_chainiv_do_postponed and the clear_bit call in
 > > async_chainiv_schedule_work.  Further assume that we are processing
 > 
 > It cannot.  The worker thread can only execute when it owns
 > the INUSE bit.  In that case do_postponed will never call the
 > schedule_work function.

In the example I cited (one entry in the queue when the worker
function starts), async_chainiv_schedule_work is indeed executed
(indirectly) by async_chainiv_givencrypt_tail from the worker thread.
I'm sorry I didn't make it more clear that it is that code path I was
talking about.

 > 
 > Perhaps you were misled by the clear_bit call in schedule_work.
 > That is only used if we end up not scheduling the work.

No, I was not misled.  But apparently, I was not clear.  I do
understand how you use the INUSE bit.  I did not say above that
INUSE is not set when the worker thread is running (at least not for
the first part of my example).  If you had read further, you might
have noticed that the following paragraphs showed that indeed I do
understand that INUSE is set in the worker thread as evidenced by
"thread one calls test_and_set_bit which returns 1".  I have added one
sentence (marked by **) to my event description below to make my
understanding more clear.  Please read on.

Assume the worker thread is executing between the dequeue in
async_chainiv_do_postponed and the clear_bit call in
async_chainiv_schedule_work.  Further assume that we are processing
the last item on the queue, so during this time ctx->queue.qlen =
0.  **INUSE is still set at this point.

Meanwhile, three threads enter async_chainiv_givencrypt for the same
ctx at about the same time.

Thread one calls test_and_set_bit which returns 1 and calls
async_chainiv_postpone_request but suppose it has not yet enqueued.
Now INUSE is set and qlen=0.

Next, the worker thread calls clear_bit in async_chainiv_schedule_work
but it is interrupted before it can call test_and_set_bit.  Now INUSE
is clear and qlen=0.

The test_and_set_bit in thread two is called at this moment and
returns 0 and then calls async_chainiv_givencrypt_tail.  Now INUSE is
set and qlen=0.

Thread one now locks the ctx and calls skcipher_enqueue_givcrypt and
unlocks.  Now INUSE is set and qlen=1.

Thread three calls test_and_set_bit, which returns 1, and then it clears
INUSE since qlen=1, and it calls postpone with INUSE clear and qlen=1.

Now thread three will use ctx->err to hold the return value of
skcipher_enqueue_givcrypt at the same time as thread two uses ctx->err
to hold the return value of crypto_ablkcipher_encrypt!

Did I make a mistake above?  I suspect more bad things can happen as
well in this scenario, but I'm just focusing on the use of ctx->err here.

Thanks

--Brad
Herbert Xu Sept. 3, 2009, 1:53 a.m. UTC | #5
On Wed, Sep 02, 2009 at 06:47:49PM -0500, Brad Bosch wrote:
>
> Assume the worker thread is executing between the dequeue in
> async_chainiv_do_postponed and the clear_bit call in
> async_chainiv_schedule_work.  Further assume that we are processing
> the last item on the queue, so during this time ctx->queue.qlen =
> 0.  **INUSE is still set at this point.
> 
> Meanwhile, three threads enter async_chainiv_givencrypt for the same
> ctx at about the same time.
> 
> Thread one calls test_and_set_bit which returns 1 and calls
> async_chainiv_postpone_request but suppose it has not yet enqueued.
> Now INUSE is set and qlen=0.
> 
> Next, the worker thread calls clear_bit in async_chainiv_schedule_work
> but it is interrupted before it can call test_and_set_bit.  Now INUSE
> is clear and qlen=0.
> 
> The test_and_set_bit in thread two is called at this moment and
> returns 0 and then calls async_chainiv_givencrypt_tail.  Now INUSE is
> set and qlen=0.
> 
> Thread one now locks the ctx and calls skcipher_enqueue_givcrypt and
> unlocks.  Now INUSE is set and qlen=1.
> 
> Thread three calls test_and_set_bit, which returns 1, and then it clears
> INUSE since qlen=1, and it calls postpone with INUSE clear and qlen=1.

How can thread three clear INUSE if test_and_set_bit returned 1?
If thread three sees it set then it will postpone.  It can only
clear it if it was not set originally.
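Herbert's rule can be encoded directly. The sketch below is a hypothetical single-threaded model of the pre-patch entry logic of async_chainiv_givencrypt() and async_chainiv_postpone_request(), reconstructed from a reading of 2.6.27 chainiv.c rather than quoted from it. The test replays thread three from the state reached in the scenario (INUSE set, qlen=1).

```c
#include <stdbool.h>

/* Hypothetical model of the givencrypt/postpone entry logic. */
struct model_state {
	bool inuse;		/* CHAINIV_STATE_INUSE */
	unsigned int qlen;	/* ctx->queue.qlen */
};

enum path { PATH_TAIL, PATH_POSTPONE };

/* Same contract as test_and_set_bit(): set the bit, return its old value. */
static bool tas(struct model_state *s)
{
	bool old = s->inuse;

	s->inuse = true;
	return old;
}

/* givencrypt: only a caller that just won the bit may clear it again. */
static enum path model_givencrypt_entry(struct model_state *s)
{
	if (tas(s))
		return PATH_POSTPONE;	/* bit was held: leave it alone */
	if (s->qlen) {
		s->inuse = false;	/* we set it, so we may drop it */
		return PATH_POSTPONE;
	}
	return PATH_TAIL;
}

/* postpone: enqueue, then write ctx->err only if INUSE can be taken. */
static bool model_postpone_touches_err(struct model_state *s)
{
	s->qlen++;
	return !tas(s);
}
```

Because its test_and_set_bit() returns 1, thread three goes straight to postpone with INUSE untouched, and the second test_and_set_bit() inside postpone fails too, so it returns its enqueue result without writing ctx->err. The only branch that clears INUSE belongs to a caller that set the bit itself a moment earlier.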

Cheers,

Patch

Index: chainiv.c
===================================================================
RCS file: /share/cvs/sdg/kernels/kernel.wms/kernel_2_6_27/src/crypto/chainiv.c,v
retrieving revision 1.1.1.1.4.2
diff -u -r1.1.1.1.4.2 chainiv.c
--- chainiv.c	10 Mar 2009 05:16:24 -0000	1.1.1.1.4.2
+++ chainiv.c	27 Aug 2009 19:40:27 -0000
@@ -36,7 +36,6 @@ 
 	unsigned long state;
 
 	spinlock_t lock;
-	int err;
 
 	struct crypto_queue queue;
 	struct work_struct postponed;
@@ -114,10 +113,9 @@ 
 	return chainiv_init_common(tfm);
 }
 
-static int async_chainiv_schedule_work(struct async_chainiv_ctx *ctx)
+static void async_chainiv_schedule_work(struct async_chainiv_ctx *ctx)
 {
 	int queued;
-	int err = ctx->err;
 
 	if (!ctx->queue.qlen) {
 		smp_mb__before_clear_bit();
@@ -125,14 +123,11 @@ 
 
 		if (!ctx->queue.qlen ||
 		    test_and_set_bit(CHAINIV_STATE_INUSE, &ctx->state))
-			goto out;
+			return;
 	}
 
 	queued = schedule_work(&ctx->postponed);
 	BUG_ON(!queued);
-
-out:
-	return err;
 }
 
 static int async_chainiv_postpone_request(struct skcipher_givcrypt_request *req)
@@ -148,8 +143,8 @@ 
 	if (test_and_set_bit(CHAINIV_STATE_INUSE, &ctx->state))
 		return err;
 
-	ctx->err = err;
-	return async_chainiv_schedule_work(ctx);
+	async_chainiv_schedule_work(ctx);
+	return err;
 }
 
 static int async_chainiv_givencrypt_tail(struct skcipher_givcrypt_request *req)
@@ -158,18 +153,20 @@ 
 	struct async_chainiv_ctx *ctx = crypto_ablkcipher_ctx(geniv);
 	struct ablkcipher_request *subreq = skcipher_givcrypt_reqctx(req);
 	unsigned int ivsize = crypto_ablkcipher_ivsize(geniv);
+	int err;
 
 	memcpy(req->giv, ctx->iv, ivsize);
 	memcpy(subreq->info, ctx->iv, ivsize);
 
-	ctx->err = crypto_ablkcipher_encrypt(subreq);
-	if (ctx->err)
+	err = crypto_ablkcipher_encrypt(subreq);
+	if (err)
 		goto out;
 
 	memcpy(ctx->iv, subreq->info, ivsize);
 
 out:
-	return async_chainiv_schedule_work(ctx);
+	async_chainiv_schedule_work(ctx);
+	return err;
 }
 
 static int async_chainiv_givencrypt(struct skcipher_givcrypt_request *req)
@@ -236,7 +233,7 @@ 
 	spin_unlock_bh(&ctx->lock);
 
 	if (!req) {
-		async_chainiv_schedule_work(ctx);
+	    async_chainiv_schedule_work(ctx);
 		return;
 	}