Message ID | 19101.16628.347039.619378@waldo.imnotcreative.homeip.net |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
On Tue, Sep 01, 2009 at 10:42:44AM -0500, Brad Bosch wrote:
>
> Now, ctx->err may be used by both async_chainiv_postpone_request to
> store the return value from skcipher_enqueue_givcrypt and by
> async_chainiv_givencrypt_tail to store the return value from
> crypto_ablkcipher_encrypt at the same time. This can cause the
> calling function to think async_chainiv_givencrypt has completed its
> work, when in fact, the work was deferred.

async_chainiv_postpone_request never touches ctx->err unless
it can obtain the INUSE bit lock. On the other hand, the normal
path async_chainiv_givencrypt_tail never relinquishes the INUSE
bit until it is finished with ctx->err.

> OK. I see now that your offset patch should indeed solve that
> problem. But why did you choose to fix it in a complex way? My
> suggestion just adds a single test while yours adds new parameters, a
> new function and an extra function call.

Because that introduces two NULL checks where the second one
is useless. Not a big deal but then again, my patch wasn't that
complicated either :)

Please let me know whether it actually fixes your problem though
so I can get this upstream.

Thanks,
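
For readers following the locking argument, here is a minimal user-space sketch of the discipline described above: a single state bit acts as a try-lock, and the shared err field is written only by whoever wins that bit. It uses C11 atomics rather than the kernel's bitops, treats the bit as a mask instead of a bit number, and every demo_* name is hypothetical. It illustrates the rule, not the chainiv code itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption: a bit mask standing in for the kernel's CHAINIV_STATE_INUSE
 * bit number; all demo_* names are hypothetical. */
#define DEMO_STATE_INUSE 0x1UL

struct demo_ctx {
	atomic_ulong state;
	int err;		/* only touched while DEMO_STATE_INUSE is held */
};

/* Atomically set the bit and report whether it was already set,
 * mirroring the return convention of test_and_set_bit(). */
static bool demo_test_and_set_inuse(struct demo_ctx *ctx)
{
	return (atomic_fetch_or(&ctx->state, DEMO_STATE_INUSE) &
		DEMO_STATE_INUSE) != 0;
}

/* Release the bit; only the current owner may call this. */
static void demo_clear_inuse(struct demo_ctx *ctx)
{
	assert(atomic_load(&ctx->state) & DEMO_STATE_INUSE);
	atomic_fetch_and(&ctx->state, ~DEMO_STATE_INUSE);
}

/* Store err in the shared field only if this caller won the bit.
 * Returns true when the store happened (the caller now owns the bit). */
static bool demo_store_err_if_owner(struct demo_ctx *ctx, int err)
{
	if (demo_test_and_set_inuse(ctx))
		return false;	/* someone else owns ctx->err: leave it alone */
	ctx->err = err;
	return true;
}
```

A second caller that finds the bit already set gets false back and leaves ctx->err untouched until the owner calls demo_clear_inuse().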
(resent due to bounce notification for vger)

Herbert Xu writes:
> On Tue, Sep 01, 2009 at 10:42:44AM -0500, Brad Bosch wrote:
> >
> > Now, ctx->err may be used by both async_chainiv_postpone_request to
> > store the return value from skcipher_enqueue_givcrypt and by
> > async_chainiv_givencrypt_tail to store the return value from
> > crypto_ablkcipher_encrypt at the same time. This can cause the
> > calling function to think async_chainiv_givencrypt has completed its
> > work, when in fact, the work was deferred.
>
> async_chainiv_postpone_request never touches ctx->err unless
> it can obtain the INUSE bit lock. On the other hand, the normal
> path async_chainiv_givencrypt_tail never relinquishes the INUSE
> bit until it is finished with ctx->err.

But the above statements are not adequate to demonstrate that your use
of the INUSE flag always prevents a condition where both
async_chainiv_postpone_request and async_chainiv_givencrypt_tail
operate on the same ctx at the same time. The flaw in your logic may
be that async_chainiv_schedule_work does not have solid assurance that
its thread is the one that holds the INUSE bit when it calls clear_bit.

I seem to have trouble getting the details right in describing a path
that causes both uses of ctx->err to happen at the same time. Let me
try again.

Assume the worker thread is executing between the dequeue in
async_chainiv_do_postponed and the clear_bit call in
async_chainiv_schedule_work. Further assume that we are processing
the last item on the queue so during this time, ctx->queue.qlen = 0.

Meanwhile, three threads enter async_chainiv_givencrypt for the same
ctx at about the same time.

Thread one calls test_and_set_bit which returns 1 and calls
async_chainiv_postpone_request but suppose it has not yet enqueued.
Now INUSE is set and qlen=0.

Next, the worker thread calls clear_bit in async_chainiv_schedule_work
but it is interrupted before it can call test_and_set_bit. Now INUSE
is clear and qlen=0.

The test_and_set_bit in thread two is called at this moment and
returns 0 and then calls async_chainiv_givencrypt_tail. Now INUSE is
set and qlen=0.

Thread one now locks the ctx and calls skcipher_enqueue_givcrypt and
unlocks. Now INUSE is set and qlen=1.

Thread three calls test_and_set_bit which returns 1 and then it clears
INUSE since qlen=1 and it calls postpone with INUSE clear and qlen=1.

Now thread three will use ctx->err to hold the return value of
skcipher_enqueue_givcrypt at the same time as thread two uses ctx->err
to hold the return value of crypto_ablkcipher_encrypt!

Did I make a mistake above? I suspect more bad things can happen as
well in this scenario, but I'm just focusing on the use of ctx->err
here.

>
> Please let me know whether it actually fixes your problem though
> so I can get this upstream.

Unfortunately, the offset problem is not easily reproduced with our
application, so testing long enough to be sure the problem is fixed
(assuming that it was indeed the cause of the oops) may not be
practical. All I can say at the moment is that I have not seen the
crash since I introduced the two patches I sent you.

Thanks for taking the time to discuss this!

--Brad
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 02, 2009 at 09:08:38AM -0500, Brad Bosch wrote:
>
> Assume the worker thread is executing between the dequeue in
> async_chainiv_do_postponed and the clear_bit call in
> async_chainiv_schedule_work. Further assume that we are processing

It cannot. The worker thread can only execute when it owns
the INUSE bit. In that case do_postponed will never call the
schedule_work function.

Perhaps you were misled by the clear_bit call in schedule_work.
That is only used if we end up not scheduling the work.

> Unfortunately, the offset problem is not easily reproduced with our
> application, so testing long enough to be sure the problem is fixed
> (assuming that it was indeed the cause of the oops) may not be
> practical. All I can say at the moment is that I have not seen the
> crash since I introduced the two patches I sent you.

OK I'll forward this upstream then.

Thanks,
Herbert Xu writes:
> On Wed, Sep 02, 2009 at 09:08:38AM -0500, Brad Bosch wrote:
> >
> > Assume the worker thread is executing between the dequeue in
> > async_chainiv_do_postponed and the clear_bit call in
> > async_chainiv_schedule_work. Further assume that we are processing
>
> It cannot. The worker thread can only execute when it owns
> the INUSE bit. In that case do_postponed will never call the
> schedule_work function.

In the example I cited (one entry in the queue when the worker
function starts), async_chainiv_schedule_work is indeed executed
(indirectly) by async_chainiv_givencrypt_tail from the worker thread.
I'm sorry I didn't make it more clear that it is that code path I was
talking about.

>
> Perhaps you were misled by the clear_bit call in schedule_work.
> That is only used if we end up not scheduling the work.

No, I was not misled. But apparently, I was not clear. I do
understand how you use the INUSE bit. I did not say above that INUSE
is not set when the worker thread is running (at least not for the
first part of my example). If you had read further, you might have
noticed that the following paragraphs showed that indeed I do
understand that INUSE is set in the worker thread as evidenced by
"thread one calls test_and_set_bit which returns 1". I have added one
sentence (marked by **) to my event description below to make my
understanding more clear. Please read on.

Assume the worker thread is executing between the dequeue in
async_chainiv_do_postponed and the clear_bit call in
async_chainiv_schedule_work. Further assume that we are processing
the last item on the queue so during this time, ctx->queue.qlen =
0. **INUSE is still set at this point.

Meanwhile, three threads enter async_chainiv_givencrypt for the same
ctx at about the same time.

Thread one calls test_and_set_bit which returns 1 and calls
async_chainiv_postpone_request but suppose it has not yet enqueued.
Now INUSE is set and qlen=0.

Next, the worker thread calls clear_bit in async_chainiv_schedule_work
but it is interrupted before it can call test_and_set_bit. Now INUSE
is clear and qlen=0.

The test_and_set_bit in thread two is called at this moment and
returns 0 and then calls async_chainiv_givencrypt_tail. Now INUSE is
set and qlen=0.

Thread one now locks the ctx and calls skcipher_enqueue_givcrypt and
unlocks. Now INUSE is set and qlen=1.

Thread three calls test_and_set_bit which returns 1 and then it clears
INUSE since qlen=1 and it calls postpone with INUSE clear and qlen=1.

Now thread three will use ctx->err to hold the return value of
skcipher_enqueue_givcrypt at the same time as thread two uses ctx->err
to hold the return value of crypto_ablkcipher_encrypt!

Did I make a mistake above? I suspect more bad things can happen as
well in this scenario, but I'm just focusing on the use of ctx->err
here.

Thanks

--Brad
On Wed, Sep 02, 2009 at 06:47:49PM -0500, Brad Bosch wrote:
>
> Assume the worker thread is executing between the dequeue in
> async_chainiv_do_postponed and the clear_bit call in
> async_chainiv_schedule_work. Further assume that we are processing
> the last item on the queue so during this time, ctx->queue.qlen =
> 0. **INUSE is still set at this point.
>
> Meanwhile, three threads enter async_chainiv_givencrypt for the same
> ctx at about the same time.
>
> Thread one calls test_and_set_bit which returns 1 and calls
> async_chainiv_postpone_request but suppose it has not yet enqueued.
> Now INUSE is set and qlen=0.
>
> Next, the worker thread calls clear_bit in async_chainiv_schedule_work
> but it is interrupted before it can call test_and_set_bit. Now INUSE
> is clear and qlen=0.
>
> The test_and_set_bit in thread two is called at this moment and
> returns 0 and then calls async_chainiv_givencrypt_tail. Now INUSE is
> set and qlen=0.
>
> Thread one now locks the ctx and calls skcipher_enqueue_givcrypt and
> unlocks. Now INUSE is set and qlen=1.
>
> Thread three calls test_and_set_bit which returns 1 and then it clears
> INUSE since qlen=1 and it calls postpone with INUSE clear and qlen=1.

How can thread three clear INUSE if test_and_set_bit returned 1?
If thread three sees it set then it will postpone. It can only
clear it if it was not set originally.

Cheers,
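
The disputed step can be replayed with a small single-threaded model. This is only a sketch under stated assumptions: the model_* names are hypothetical, plain fields stand in for the atomic INUSE bit and the locked queue, and the "clear INUSE when qlen is non-zero" branch in model_givencrypt is inferred from the statements in this thread rather than from the diff. A sequential replay cannot prove the absence of races; it only shows what happens to a caller that arrives with INUSE already set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of the state discussed in the thread. */
struct model_ctx {
	bool inuse;	/* stands in for the CHAINIV_STATE_INUSE bit */
	int qlen;	/* stands in for ctx->queue.qlen */
	int err_writes;	/* number of stores to the shared ctx->err field */
};

enum model_path { MODEL_TAIL, MODEL_POSTPONED };

/* Set the bit and return its previous value, like test_and_set_bit(). */
static bool model_test_and_set(struct model_ctx *ctx)
{
	bool old = ctx->inuse;

	ctx->inuse = true;
	return old;
}

/* Pre-patch schedule_work shape: the caller must own INUSE. */
static void model_schedule_work(struct model_ctx *ctx)
{
	assert(ctx->inuse);
	if (!ctx->qlen) {
		ctx->inuse = false;		/* clear_bit */
		if (!ctx->qlen || model_test_and_set(ctx))
			return;			/* work is not scheduled */
	}
	/* schedule_work() would run here; the worker inherits INUSE. */
}

/* Enqueue, then write ctx->err only if the bit can be taken. */
static void model_postpone(struct model_ctx *ctx)
{
	ctx->qlen++;			/* enqueue under ctx->lock */
	if (model_test_and_set(ctx))
		return;			/* bit held elsewhere: err untouched */
	ctx->err_writes++;		/* ctx->err = err */
	model_schedule_work(ctx);
}

static enum model_path model_givencrypt(struct model_ctx *ctx)
{
	if (model_test_and_set(ctx)) {
		model_postpone(ctx);	/* saw the bit set: never clears it */
		return MODEL_POSTPONED;
	}
	if (ctx->qlen) {
		ctx->inuse = false;	/* we took the bit, so we may drop it */
		model_postpone(ctx);
		return MODEL_POSTPONED;
	}
	return MODEL_TAIL;		/* caller owns INUSE and ctx->err */
}
```

Starting from inuse = true and qlen = 1 (the state when thread three arrives), model_givencrypt() takes the postpone path, leaves inuse set, and performs no store to the shared err field.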
Index: chainiv.c
===================================================================
RCS file: /share/cvs/sdg/kernels/kernel.wms/kernel_2_6_27/src/crypto/chainiv.c,v
retrieving revision 1.1.1.1.4.2
diff -u -r1.1.1.1.4.2 chainiv.c
--- chainiv.c	10 Mar 2009 05:16:24 -0000	1.1.1.1.4.2
+++ chainiv.c	27 Aug 2009 19:40:27 -0000
@@ -36,7 +36,6 @@
 	unsigned long state;
 
 	spinlock_t lock;
-	int err;
 
 	struct crypto_queue queue;
 	struct work_struct postponed;
@@ -114,10 +113,9 @@
 	return chainiv_init_common(tfm);
 }
 
-static int async_chainiv_schedule_work(struct async_chainiv_ctx *ctx)
+static void async_chainiv_schedule_work(struct async_chainiv_ctx *ctx)
 {
 	int queued;
-	int err = ctx->err;
 
 	if (!ctx->queue.qlen) {
 		smp_mb__before_clear_bit();
@@ -125,14 +123,11 @@
 
 		if (!ctx->queue.qlen ||
 		    test_and_set_bit(CHAINIV_STATE_INUSE, &ctx->state))
-			goto out;
+			return;
 	}
 
 	queued = schedule_work(&ctx->postponed);
 	BUG_ON(!queued);
-
-out:
-	return err;
 }
 
 static int async_chainiv_postpone_request(struct skcipher_givcrypt_request *req)
@@ -148,8 +143,8 @@
 	if (test_and_set_bit(CHAINIV_STATE_INUSE, &ctx->state))
 		return err;
 
-	ctx->err = err;
-	return async_chainiv_schedule_work(ctx);
+	async_chainiv_schedule_work(ctx);
+	return err;
 }
 
 static int async_chainiv_givencrypt_tail(struct skcipher_givcrypt_request *req)
@@ -158,18 +153,20 @@
 	struct async_chainiv_ctx *ctx = crypto_ablkcipher_ctx(geniv);
 	struct ablkcipher_request *subreq = skcipher_givcrypt_reqctx(req);
 	unsigned int ivsize = crypto_ablkcipher_ivsize(geniv);
+	int err;
 
 	memcpy(req->giv, ctx->iv, ivsize);
 	memcpy(subreq->info, ctx->iv, ivsize);
 
-	ctx->err = crypto_ablkcipher_encrypt(subreq);
-	if (ctx->err)
+	err = crypto_ablkcipher_encrypt(subreq);
+	if (err)
 		goto out;
 
 	memcpy(ctx->iv, subreq->info, ivsize);
 
 out:
-	return async_chainiv_schedule_work(ctx);
+	async_chainiv_schedule_work(ctx);
+	return err;
 }
 
 static int async_chainiv_givencrypt(struct skcipher_givcrypt_request *req)
@@ -236,7 +233,7 @@
 	spin_unlock_bh(&ctx->lock);
 
 	if (!req) {
-		async_chainiv_schedule_work(ctx);
+		async_chainiv_schedule_work(ctx);
 		return;
 	}
 
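
The idea behind the patch above can be shown in isolation: a result parked in a field shared through the ctx can be overwritten before it is read back, whereas a result kept in a local variable cannot. The sketch below is a hedged user-space illustration only. The demo_* names are hypothetical, -EINPROGRESS is an assumed example of a "deferred" return value, and whether such an interleaving can really occur in chainiv is exactly what the thread disputes.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a context shared by several callers. */
struct demo_err_ctx {
	int err;
};

/* Pre-patch shape, step 1: park the result in the shared field. */
static void demo_store_shared(struct demo_err_ctx *ctx, int result)
{
	ctx->err = result;
}

/* Pre-patch shape, step 2: read the result back later. Any store made
 * by another caller between step 1 and step 2 is what gets returned. */
static int demo_load_shared(const struct demo_err_ctx *ctx)
{
	return ctx->err;
}

/* Post-patch shape: the result stays on the caller's own stack, so no
 * other caller can change it before it is returned. */
static int demo_return_local(int result)
{
	int err = result;

	/* A void schedule_work-style call would run here. */
	return err;
}
```

If one caller stores 0 and a second caller stores -EINPROGRESS before the first reads back, demo_load_shared() hands the first caller the second caller's value; demo_return_local() has no such window.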