diff mbox series

[16/18] crypto: talitos - do hw_context DMA mapping outside the requests

Message ID e47bf94ae44f23230b1647e31c9623f1075d3b79.1507284818.git.christophe.leroy@c-s.fr (mailing list archive)
State Not Applicable
Series crypto: talitos - fixes and performance improvement

Commit Message

Christophe Leroy Oct. 6, 2017, 1:05 p.m. UTC
At every request, we map and unmap the same hash hw_context.

This patch moves the DMA mapping/unmapping into the functions ahash_init()
and ahash_import().

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 drivers/crypto/talitos.c | 80 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 57 insertions(+), 23 deletions(-)

Comments

Horia Geantă Feb. 7, 2018, 2:39 p.m. UTC | #1
On 10/6/2017 4:06 PM, Christophe Leroy wrote:
> At every request, we map and unmap the same hash hw_context.
> 
> This patch moves the dma mapping/unmapping in functions ahash_init()
> and ahash_import().
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
>  drivers/crypto/talitos.c | 80 ++++++++++++++++++++++++++++++++++--------------
>  1 file changed, 57 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
> index ebfd6d982ed6..d495649d5267 100644
> --- a/drivers/crypto/talitos.c
> +++ b/drivers/crypto/talitos.c
> @@ -819,6 +819,7 @@ struct talitos_ctx {
>  	unsigned int keylen;
>  	unsigned int enckeylen;
>  	unsigned int authkeylen;
> +	dma_addr_t dma_hw_context;
This doesn't look correct.

talitos_ctx structure is the tfm context.
dma_hw_context is the IOVA of hw_context, located in talitos_ahash_req_ctx
structure (request context).

If there are multiple requests in flight for the same tfm, dma_hw_context will
be overwritten.

dma_hw_context needs to be moved into the request context (talitos_ahash_req_ctx struct).
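
A minimal userspace sketch of this suggestion, not driver code. It assumes simplified stand-in types: fix_req_ctx, fix_map_single() and the pointer-valued handle are hypothetical and only model talitos_ahash_req_ctx, dma_map_single() and dma_addr_t. It shows that when the handle lives in the request context, two requests on the same tfm keep separate handles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for talitos_ahash_req_ctx; not the kernel type. */
struct fix_req_ctx {
	unsigned char hw_context[64];		/* per-request hash state */
	const unsigned char *dma_hw_context;	/* models a dma_addr_t handle */
};

/* Models dma_map_single(): here the "bus address" is just the pointer. */
static const unsigned char *fix_map_single(const unsigned char *buf)
{
	return buf;
}

/* Models an init step that stores the handle in the request, not the tfm. */
static void fix_req_init(struct fix_req_ctx *req)
{
	assert(req != NULL);
	req->dma_hw_context = fix_map_single(req->hw_context);
}

/* Returns 1 when two requests keep distinct, correct handles. */
static int fix_requests_are_independent(void)
{
	struct fix_req_ctx req1, req2;

	fix_req_init(&req1);
	fix_req_init(&req2);	/* does not disturb req1's handle */

	return req1.dma_hw_context == req1.hw_context &&
	       req2.dma_hw_context == req2.hw_context;
}
```

The sketch deliberately leaves out where the unmap would happen, which is the open question in the rest of this thread.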

Thanks,
Horia
Christophe Leroy Feb. 17, 2018, 4:32 p.m. UTC | #2
On 07/02/2018 at 15:39, Horia Geantă wrote:
> On 10/6/2017 4:06 PM, Christophe Leroy wrote:
>> At every request, we map and unmap the same hash hw_context.
>>
>> This patch moves the dma mapping/unmapping in functions ahash_init()
>> and ahash_import().
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>> ---
>>   drivers/crypto/talitos.c | 80 ++++++++++++++++++++++++++++++++++--------------
>>   1 file changed, 57 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
>> index ebfd6d982ed6..d495649d5267 100644
>> --- a/drivers/crypto/talitos.c
>> +++ b/drivers/crypto/talitos.c
>> @@ -819,6 +819,7 @@ struct talitos_ctx {
>>   	unsigned int keylen;
>>   	unsigned int enckeylen;
>>   	unsigned int authkeylen;
>> +	dma_addr_t dma_hw_context;
> This doesn't look correct.
> 
> talitos_ctx structure is the tfm context.
> dma_hw_context is the IOVA of hw_context, located in talitos_ahash_req_ctx
> structure (request context).

Yes, but I have not found how I can know that the request context is
being released in order to unmap() the DMA at that time.
It is tricky to use the tfm context, I agree, but at least I know when
the tfm context gets destroyed, i.e. in talitos_cra_exit_ahash().
The request context is created by ahash_request_alloc() and released by
ahash_request_free(). I have not found a way to call dma_unmap()
before ahash_request_free() gets called.

> 
> If there are multiple requests in flight for the same tfm, dma_hw_context will
> be overwritten.

Before overwriting dma_hw_context, it is always released, see
talitos_cra_exit_ahash(), ahash_init(), ahash_import()

> 
> dma_hw_context needs to be moved in request context (talitos_ahash_req_ctx struct).

Any suggestion then on how to handle the issue explained above ?

Thanks
Christophe

> 
> Thanks,
> Horia
>
Horia Geantă Feb. 18, 2018, 5:14 p.m. UTC | #3
On 2/17/2018 6:32 PM, Christophe LEROY wrote:
> 
> 
> On 07/02/2018 at 15:39, Horia Geantă wrote:
>> On 10/6/2017 4:06 PM, Christophe Leroy wrote:
>>> At every request, we map and unmap the same hash hw_context.
>>>
>>> This patch moves the dma mapping/unmapping in functions ahash_init()
>>> and ahash_import().
>>>
>>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>>> ---
>>>   drivers/crypto/talitos.c | 80 ++++++++++++++++++++++++++++++++++--------------
>>>   1 file changed, 57 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
>>> index ebfd6d982ed6..d495649d5267 100644
>>> --- a/drivers/crypto/talitos.c
>>> +++ b/drivers/crypto/talitos.c
>>> @@ -819,6 +819,7 @@ struct talitos_ctx {
>>>   	unsigned int keylen;
>>>   	unsigned int enckeylen;
>>>   	unsigned int authkeylen;
>>> +	dma_addr_t dma_hw_context;
>> This doesn't look correct.
>>
>> talitos_ctx structure is the tfm context.
>> dma_hw_context is the IOVA of hw_context, located in talitos_ahash_req_ctx
>> structure (request context).
> 
> Yes, but I have not found how I can know that the request context is
> being released in order to unmap() the DMA at that time.
> It is tricky to use the tfm context, I agree, but at least I know when
> the tfm context gets destroyed, i.e. in talitos_cra_exit_ahash().
> The request context is created by ahash_request_alloc() and released by
> ahash_request_free(). I have not found a way to call dma_unmap()
> before ahash_request_free() gets called.
> 
>>
>> If there are multiple requests in flight for the same tfm, dma_hw_context will
>> be overwritten.
> 
> Before overwriting dma_hw_context, it is always released, see
> talitos_cra_exit_ahash(), ahash_init(), ahash_import()
> 
The problem is not the unmapping.
If there are two requests for the same tfm, then given the following sequence
1. tfm->ahash_init(req1)
	tfm_ctx->dma_hw_context points to req1_ctx->hw_context
2. tfm->ahash_init(req2)
	tfm_ctx->dma_hw_context [unmapped, then] points to req2_ctx->hw_context
i.e. req1 will use the hw_context of req2.
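
The sequence above can be modelled in a few lines of userspace C. This is only an illustrative sketch: shared_tfm_ctx, shared_req_ctx and shared_ahash_init() are hypothetical stand-ins, and a plain pointer models the dma_addr_t handle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the real kernel types. */
struct shared_req_ctx {
	unsigned char hw_context[64];		/* per-request hash state */
};

struct shared_tfm_ctx {
	const unsigned char *dma_hw_context;	/* one slot shared by all requests */
};

/* Models what ahash_init() does in the patch: remap into the tfm slot. */
static void shared_ahash_init(struct shared_tfm_ctx *tfm,
			      struct shared_req_ctx *req)
{
	assert(tfm != NULL && req != NULL);
	tfm->dma_hw_context = req->hw_context;	/* previous handle is replaced */
}

/* Returns 1 if, after both inits, the tfm slot points at req2's buffer. */
static int shared_two_requests_collide(void)
{
	struct shared_tfm_ctx tfm = { NULL };
	struct shared_req_ctx req1, req2;

	shared_ahash_init(&tfm, &req1);		/* step 1 */
	shared_ahash_init(&tfm, &req2);		/* step 2 */

	/* A later operation on req1 would resolve to req2's hw_context. */
	return tfm.dma_hw_context == req2.hw_context &&
	       tfm.dma_hw_context != req1.hw_context;
}
```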

>>
>> dma_hw_context needs to be moved in request context (talitos_ahash_req_ctx struct).
> 
> Any suggestion then on how to handle the issue explained above ?
> 
There is no ahash_exit() callback mirroring ahash_init().

The clean-up of request ctx should be done in the last states of the hash flows
described here:
https://www.kernel.org/doc/html/latest/crypto/devel-algos.html#cipher-definition-with-struct-shash-alg-and-ahash-alg
for e.g. in the final() callback.

Hope this helps,
Horia
Christophe Leroy Feb. 19, 2018, 7:58 a.m. UTC | #4
On 18/02/2018 at 18:14, Horia Geantă wrote:
> On 2/17/2018 6:32 PM, Christophe LEROY wrote:
>>
>>
>> On 07/02/2018 at 15:39, Horia Geantă wrote:
>>> On 10/6/2017 4:06 PM, Christophe Leroy wrote:
>>>> At every request, we map and unmap the same hash hw_context.
>>>>
>>>> This patch moves the dma mapping/unmapping in functions ahash_init()
>>>> and ahash_import().
>>>>
>>>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>>>> ---
>>>>    drivers/crypto/talitos.c | 80 ++++++++++++++++++++++++++++++++++--------------
>>>>    1 file changed, 57 insertions(+), 23 deletions(-)
>>>>
>>>> diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
>>>> index ebfd6d982ed6..d495649d5267 100644
>>>> --- a/drivers/crypto/talitos.c
>>>> +++ b/drivers/crypto/talitos.c
>>>> @@ -819,6 +819,7 @@ struct talitos_ctx {
>>>>    	unsigned int keylen;
>>>>    	unsigned int enckeylen;
>>>>    	unsigned int authkeylen;
>>>> +	dma_addr_t dma_hw_context;
>>> This doesn't look correct.
>>>
>>> talitos_ctx structure is the tfm context.
>>> dma_hw_context is the IOVA of hw_context, located in talitos_ahash_req_ctx
>>> structure (request context).
>>
>> Yes, but I have not found how I can know that the request context is
>> being released in order to unmap() the DMA at that time.
>> It is tricky to use the tfm context, I agree, but at least I know when
>> the tfm context gets destroyed, i.e. in talitos_cra_exit_ahash().
>> The request context is created by ahash_request_alloc() and released by
>> ahash_request_free(). I have not found a way to call dma_unmap()
>> before ahash_request_free() gets called.
>>
>>>
>>> If there are multiple requests in flight for the same tfm, dma_hw_context will
>>> be overwritten.
>>
>> Before overwriting dma_hw_context, it is always released, see
>> talitos_cra_exit_ahash(), ahash_init(), ahash_import()
>>
> The problem is not the unmapping.
> If there are two requests for the same tfm, then given the following sequence
> 1. tfm->ahash_init(req1)
> 	tfm_ctx->dma_hw_context points to req1_ctx->hw_context
> 2. tfm->ahash_init(req2)
> 	tfm_ctx->dma_hw_context [unmapped, then] points to req2_ctx->hw_context
> i.e. req1 will use the hw_context of req2.
> 
>>>
>>> dma_hw_context needs to be moved in request context (talitos_ahash_req_ctx struct).
>>
>> Any suggestion then on how to handle the issue explained above ?
>>
> There is no ahash_exit() callback mirroring ahash_init().
> 
> The clean-up of request ctx should be done in the last states of the hash flows
> described here:
> https://www.kernel.org/doc/html/latest/crypto/devel-algos.html#cipher-definition-with-struct-shash-alg-and-ahash-alg
> for e.g. in the final() callback.

Unfortunately it seems that we can't rely on those finalising functions
being called all the time.
If you look into test_ahash_jiffies() for instance, in case of error the
call to crypto_ahash_final() is skipped.
So for the time being, I can't see any place to put the unmapping and be
100% sure it will be done before the call to ahash_request_free().

Christophe

> 
> Hope this helps,
> Horia
>
Horia Geantă Feb. 19, 2018, 8:30 a.m. UTC | #5
On 2/19/2018 9:58 AM, Christophe LEROY wrote:
> On 18/02/2018 at 18:14, Horia Geantă wrote:
>> There is no ahash_exit() callback mirroring ahash_init().
>>
>> The clean-up of request ctx should be done in the last states of the hash flows
>> described here:
>> https://www.kernel.org/doc/html/latest/crypto/devel-algos.html#cipher-definition-with-struct-shash-alg-and-ahash-alg
>> for e.g. in the final() callback.
> 
> Unfortunately it seems that we can't rely on those finalising functions
> being called all the time.
> If you look into test_ahash_jiffies() for instance, in case of error the
> call to crypto_ahash_final() is skipped.

If test_ahash_jiffies() errors before calling crypto_ahash_final(req), this
means a previous callback failed.
Accordingly, DMA unmapping should also be performed on the corresponding error
paths in the driver.
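
As a rough illustration of unmapping on error paths, here is a userspace sketch. All names (err_map(), err_unmap(), err_update(), err_hash_flow()) are hypothetical, and a counter stands in for real DMA mappings; it is not how the driver is structured.

```c
#include <assert.h>

static int err_live_mappings;	/* counts outstanding fake mappings */

static void err_map(void)
{
	err_live_mappings++;
}

static void err_unmap(void)
{
	assert(err_live_mappings > 0);
	err_live_mappings--;
}

/* Hypothetical update step; fail_submit simulates a failed submission. */
static int err_update(int fail_submit)
{
	if (fail_submit) {
		err_unmap();	/* error path releases the mapping made in init */
		return -1;
	}
	return 0;
}

/* init -> update -> final, where the caller skips final after an error. */
static int err_hash_flow(int fail_submit)
{
	int err;

	err_map();			/* init: map the context once */
	err = err_update(fail_submit);
	if (err)
		return err;		/* final is never reached */
	err_unmap();			/* final: normal release */
	return 0;
}
```

Both the failing and the successful flow leave the counter at zero. The sketch only covers a callback that fails; it does not cover a caller that simply stops calling into the driver.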

Horia
Christophe Leroy Feb. 19, 2018, 9:14 a.m. UTC | #6
On 19/02/2018 at 09:30, Horia Geantă wrote:
> On 2/19/2018 9:58 AM, Christophe LEROY wrote:
>> On 18/02/2018 at 18:14, Horia Geantă wrote:
>>> There is no ahash_exit() callback mirroring ahash_init().
>>>
>>> The clean-up of request ctx should be done in the last states of the hash flows
>>> described here:
>>> https://www.kernel.org/doc/html/latest/crypto/devel-algos.html#cipher-definition-with-struct-shash-alg-and-ahash-alg
>>> for e.g. in the final() callback.
>>
>> Unfortunately it seems that we can't rely on those finalising functions
>> being called all the time.
>> If you look into test_ahash_jiffies() for instance, in case of error the
>> call to crypto_ahash_final() is skipped.
> 
> If test_ahash_jiffies() errors before calling crypto_ahash_final(req), this
> means a previous callback failed.
> Accordingly, DMA unmapping should also be performed on the corresponding error
> paths in the driver.
> 

And what about the ALGIF path from user space?
What if the user never calls the last sendmsg(), which will call
hash_finup()?

Christophe
Horia Geantă Feb. 19, 2018, 1:16 p.m. UTC | #7
On 2/19/2018 11:14 AM, Christophe LEROY wrote:
> On 19/02/2018 at 09:30, Horia Geantă wrote:
>> On 2/19/2018 9:58 AM, Christophe LEROY wrote:
>>> On 18/02/2018 at 18:14, Horia Geantă wrote:
>>>> There is no ahash_exit() callback mirroring ahash_init().
>>>>
>>>> The clean-up of request ctx should be done in the last states of the hash flows
>>>> described here:
>>>> https://www.kernel.org/doc/html/latest/crypto/devel-algos.html#cipher-definition-with-struct-shash-alg-and-ahash-alg
>>>> for e.g. in the final() callback.
>>>
>>> Unfortunately it seems that we can't rely on those finalising functions
>>> being called all the time.
>>> If you look into test_ahash_jiffies() for instance, in case of error the
>>> call to crypto_ahash_final() is skipped.
>>
>> If test_ahash_jiffies() errors before calling crypto_ahash_final(req), this
>> means a previous callback failed.
>> Accordingly, DMA unmapping should also be performed on the corresponding error
>> paths in the driver.
>>
> 
> And what about ALGIF path from user space ?
> What if the user never calls the last sendmsg() which will call 
> hash_finup() ?
> 
User is expected to follow the rules of the crypto API.
Of course, kernel won't (or at least shouldn't) crash in case of misuse.
However, in these cases some resources might not be freed - it's unavoidable.

Horia
Herbert Xu Feb. 20, 2018, 10:34 a.m. UTC | #8
On Mon, Feb 19, 2018 at 01:16:30PM +0000, Horia Geantă wrote:
>
> > And what about ALGIF path from user space ?
> > What if the user never calls the last sendmsg() which will call 
> > hash_finup() ?
> > 
> User is expected to follow the rules of the crypto API.
> Of course, kernel won't (or at least shouldn't) crash in case of misuse.
> However, in these cases some resources might not be freed - it's unavoidable.

The crypto API does not require the presence of a finalisation.
It is entirely optional.  So leaving resources pinned down until
final/finup occurs is unacceptable, both from user-space and the
kernel.

Cheers,
Horia Geantă Feb. 20, 2018, 11:32 a.m. UTC | #9
On 2/20/2018 12:34 PM, Herbert Xu wrote:
> On Mon, Feb 19, 2018 at 01:16:30PM +0000, Horia Geantă wrote:
>>
>>> And what about ALGIF path from user space ?
>>> What if the user never calls the last sendmsg() which will call 
>>> hash_finup() ?
>>>
>> User is expected to follow the rules of the crypto API.
>> Of course, kernel won't (or at least shouldn't) crash in case of misuse.
>> However, in these cases some resources might not be freed - it's unavoidable.
> 
> The crypto API does not require the presence of a finalisation.
> It is entirely optional.  So leaving resources pinned down until
> final/finup occurs is unacceptable, both from user-space and the
> kernel.
> 
If final/finup is optional, how is the final hash supposed to be retrieved?

According to documentation, these are the accepted flows (with the option to
export/import a partial hash b/w update and final/finup):

.init() -> .update() -> .final()
            ^    |         |
            '----'         '---> HASH

.init() -> .update() -> .finup()
            ^    |         |
            '----'         '---> HASH

           .digest()
               |
               '---------------> HASH

Note that digest() is not an issue in the case we are discussing, since resource
allocation happens only in init().
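
The documented flows above can be written down as a tiny state machine. This is a userspace sketch only: the single-letter encoding ('i' init, 'u' update, 'f' final, 'p' finup, 'd' digest) and the flow_* names are made up for illustration. Accepting final/finup directly after init, with zero updates, is an assumption of the sketch; the diagram shows at least one update.

```c
#include <assert.h>

enum flow_state { FLOW_NONE, FLOW_INIT, FLOW_DONE, FLOW_BAD };

/* Advance the model by one callback. */
static enum flow_state flow_step(enum flow_state s, char op)
{
	switch (s) {
	case FLOW_NONE:
		if (op == 'i')
			return FLOW_INIT;
		if (op == 'd')
			return FLOW_DONE;	/* digest() is one-shot */
		return FLOW_BAD;
	case FLOW_INIT:
		if (op == 'u')
			return FLOW_INIT;	/* update() may repeat */
		if (op == 'f' || op == 'p')
			return FLOW_DONE;	/* final()/finup() yield HASH */
		return FLOW_BAD;
	default:
		return FLOW_BAD;		/* nothing follows HASH here */
	}
}

/* Returns 1 if the sequence ends in a state where a HASH was produced. */
static int flow_yields_hash(const char *ops)
{
	enum flow_state s = FLOW_NONE;

	assert(ops != 0);
	for (; *ops; ops++)
		s = flow_step(s, *ops);
	return s == FLOW_DONE;
}
```

A sequence such as "iu" ends without a HASH in this model. Whether a caller may legitimately stop there is a separate question from what the diagram shows.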

Thanks,
Horia
Herbert Xu Feb. 22, 2018, 11:47 a.m. UTC | #10
On Tue, Feb 20, 2018 at 11:32:25AM +0000, Horia Geantă wrote:
>
> If final/finup is optional, how is the final hash supposed to be retrieved?

Sometimes the computation ends with a partial hash, that's what
export is for.  Also it is completely legal to abandon the hash
state entirely.

> According to documentation, these are the accepted flows (with the option to
> export/import a partial hash b/w update and final/finup):
> 
> .init() -> .update() -> .final()
>             ^    |         |
>             '----'         '---> HASH
> 
> .init() -> .update() -> .finup()
>             ^    |         |
>             '----'         '---> HASH
> 
>            .digest()
>                |
>                '---------------> HASH

The documentation is simply incomplete in this regard.

Cheers,
Horia Geantă Feb. 22, 2018, 12:29 p.m. UTC | #11
On 2/22/2018 1:47 PM, Herbert Xu wrote:
> On Tue, Feb 20, 2018 at 11:32:25AM +0000, Horia Geantă wrote:
>>
>> If final/finup is optional, how is the final hash supposed to be retrieved?
> 
> Sometimes the computation ends with a partial hash, that's what
> export is for.  Also it is completely legal to abandon the hash
> state entirely.
> 
Thanks for the explanation.
It's unintuitive to call .init() -> .update() and then not to call any of
.final(), .finup(), .export().

Christophe,

IIUC this means that there is no room for improvement.
This patch needs to be reverted, to restore previous behaviour when the
hw_context was mapped / unmapped for every request.
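
To make the trade-off concrete, here is a userspace sketch comparing the two strategies when a caller abandons the hash after update(). Everything in it is hypothetical: a counter models DMA mappings, and "map in init, unmap in final" stands for the variant discussed in this thread rather than for the exact code of the patch.

```c
#include <assert.h>

static int cmp_live_mappings;	/* outstanding fake DMA mappings */

static void cmp_map(void)
{
	cmp_live_mappings++;
}

static void cmp_unmap(void)
{
	assert(cmp_live_mappings > 0);
	cmp_live_mappings--;
}

/*
 * Runs init -> update -> (optional) final and returns the number of
 * mappings still held afterwards.
 *   per_request != 0: map and unmap inside every operation.
 *   per_request == 0: map in init, unmap only in final.
 *   finalize == 0:    the caller abandons the hash after update.
 */
static int cmp_run(int per_request, int finalize)
{
	cmp_live_mappings = 0;

	if (!per_request)
		cmp_map();		/* init holds the mapping */

	if (per_request) {		/* update */
		cmp_map();
		cmp_unmap();
	}

	if (finalize) {			/* final */
		if (per_request)
			cmp_map();
		cmp_unmap();
	}

	return cmp_live_mappings;
}
```

In this model, only the per-request strategy leaves nothing mapped when the caller stops after update().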

Thanks,
Horia
Herbert Xu Feb. 22, 2018, 1:46 p.m. UTC | #12
On Thu, Feb 22, 2018 at 12:29:28PM +0000, Horia Geantă wrote:
>
> IIUC this means that there is no room for improvement.
> This patch needs to be reverted, to restore previous behaviour when the
> hw_context was mapped / unmapped for every request.

In general we should avoid trying to do batching in drivers.  Such
optimisations should instead be done at a higher level.  For example,
for disk encryption we want to do the aggregation at the block layer
rather than the crypto API because that has innate knowledge of the
data layout which we can only guess.

Cheers,

Patch

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index ebfd6d982ed6..d495649d5267 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -819,6 +819,7 @@  struct talitos_ctx {
 	unsigned int keylen;
 	unsigned int enckeylen;
 	unsigned int authkeylen;
+	dma_addr_t dma_hw_context;
 };
 
 #define HASH_MAX_BLOCK_SIZE		SHA512_BLOCK_SIZE
@@ -1663,18 +1664,9 @@  static void common_nonsnoop_hash_unmap(struct device *dev,
 				       struct ahash_request *areq)
 {
 	struct talitos_ahash_req_ctx *req_ctx = ahash_request_ctx(areq);
-	struct talitos_private *priv = dev_get_drvdata(dev);
-	bool is_sec1 = has_ftr_sec1(priv);
-
-	unmap_single_talitos_ptr(dev, &edesc->desc.ptr[5], DMA_FROM_DEVICE);
 
 	talitos_sg_unmap(dev, edesc, req_ctx->psrc, NULL, 0, 0);
 
-	/* When using hashctx-in, must unmap it. */
-	if (from_talitos_ptr_len(&edesc->desc.ptr[1], is_sec1))
-		unmap_single_talitos_ptr(dev, &edesc->desc.ptr[1],
-					 DMA_TO_DEVICE);
-
 	if (edesc->dma_len)
 		dma_unmap_single(dev, edesc->dma_link_tbl, edesc->dma_len,
 				 DMA_BIDIRECTIONAL);
@@ -1744,10 +1736,8 @@  static int common_nonsnoop_hash(struct talitos_edesc *edesc,
 
 	/* hash context in */
 	if (!req_ctx->first || req_ctx->swinit) {
-		map_single_talitos_ptr(dev, &desc->ptr[1],
-				       req_ctx->hw_context_size,
-				       (char *)req_ctx->hw_context,
-				       DMA_TO_DEVICE);
+		to_talitos_ptr(&desc->ptr[1], ctx->dma_hw_context,
+			       req_ctx->hw_context_size, is_sec1);
 		req_ctx->swinit = 0;
 	}
 	/* Indicate next op is not the first. */
@@ -1780,9 +1770,8 @@  static int common_nonsnoop_hash(struct talitos_edesc *edesc,
 				       crypto_ahash_digestsize(tfm),
 				       areq->result, DMA_FROM_DEVICE);
 	else
-		map_single_talitos_ptr(dev, &desc->ptr[5],
-				       req_ctx->hw_context_size,
-				       req_ctx->hw_context, DMA_FROM_DEVICE);
+		to_talitos_ptr(&desc->ptr[5], ctx->dma_hw_context,
+			       req_ctx->hw_context_size, is_sec1);
 
 	/* last DWORD empty */
 
@@ -1815,17 +1804,25 @@  static struct talitos_edesc *ahash_edesc_alloc(struct ahash_request *areq,
 static int ahash_init(struct ahash_request *areq)
 {
 	struct crypto_ahash *tfm = crypto_ahash_reqtfm(areq);
+	struct talitos_ctx *ctx = crypto_ahash_ctx(tfm);
+	struct device *dev = ctx->dev;
 	struct talitos_ahash_req_ctx *req_ctx = ahash_request_ctx(areq);
+	unsigned int size;
 
 	/* Initialize the context */
 	req_ctx->nbuf = 0;
 	req_ctx->first = 1; /* first indicates h/w must init its context */
 	req_ctx->swinit = 0; /* assume h/w init of context */
-	req_ctx->hw_context_size =
-		(crypto_ahash_digestsize(tfm) <= SHA256_DIGEST_SIZE)
+	size =	(crypto_ahash_digestsize(tfm) <= SHA256_DIGEST_SIZE)
 			? TALITOS_MDEU_CONTEXT_SIZE_MD5_SHA1_SHA256
 			: TALITOS_MDEU_CONTEXT_SIZE_SHA384_SHA512;
+	req_ctx->hw_context_size = size;
 
+	if (ctx->dma_hw_context)
+		dma_unmap_single(dev, ctx->dma_hw_context, size,
+				 DMA_BIDIRECTIONAL);
+	ctx->dma_hw_context = dma_map_single(dev, req_ctx->hw_context, size,
+					     DMA_BIDIRECTIONAL);
 	return 0;
 }
 
@@ -1836,6 +1833,9 @@  static int ahash_init(struct ahash_request *areq)
 static int ahash_init_sha224_swinit(struct ahash_request *areq)
 {
 	struct talitos_ahash_req_ctx *req_ctx = ahash_request_ctx(areq);
+	struct crypto_ahash *tfm = crypto_ahash_reqtfm(areq);
+	struct talitos_ctx *ctx = crypto_ahash_ctx(tfm);
+	struct device *dev = ctx->dev;
 
 	ahash_init(areq);
 	req_ctx->swinit = 1;/* prevent h/w initting context with sha256 values*/
@@ -1853,6 +1853,9 @@  static int ahash_init_sha224_swinit(struct ahash_request *areq)
 	req_ctx->hw_context[8] = 0;
 	req_ctx->hw_context[9] = 0;
 
+	dma_sync_single_for_device(dev, ctx->dma_hw_context,
+				   req_ctx->hw_context_size, DMA_TO_DEVICE);
+
 	return 0;
 }
 
@@ -1990,7 +1993,12 @@  static int ahash_export(struct ahash_request *areq, void *out)
 {
 	struct talitos_ahash_req_ctx *req_ctx = ahash_request_ctx(areq);
 	struct talitos_export_state *export = out;
+	struct crypto_ahash *ahash = crypto_ahash_reqtfm(areq);
+	struct talitos_ctx *ctx = crypto_ahash_ctx(ahash);
+	struct device *dev = ctx->dev;
 
+	dma_sync_single_for_cpu(dev, ctx->dma_hw_context,
+				req_ctx->hw_context_size, DMA_FROM_DEVICE);
 	memcpy(export->hw_context, req_ctx->hw_context,
 	       req_ctx->hw_context_size);
 	memcpy(export->buf, req_ctx->buf, req_ctx->nbuf);
@@ -2008,14 +2016,22 @@  static int ahash_import(struct ahash_request *areq, const void *in)
 	struct talitos_ahash_req_ctx *req_ctx = ahash_request_ctx(areq);
 	struct crypto_ahash *tfm = crypto_ahash_reqtfm(areq);
 	const struct talitos_export_state *export = in;
+	unsigned int size;
+	struct talitos_ctx *ctx = crypto_ahash_ctx(tfm);
+	struct device *dev = ctx->dev;
 
 	memset(req_ctx, 0, sizeof(*req_ctx));
-	req_ctx->hw_context_size =
-		(crypto_ahash_digestsize(tfm) <= SHA256_DIGEST_SIZE)
+	size = (crypto_ahash_digestsize(tfm) <= SHA256_DIGEST_SIZE)
 			? TALITOS_MDEU_CONTEXT_SIZE_MD5_SHA1_SHA256
 			: TALITOS_MDEU_CONTEXT_SIZE_SHA384_SHA512;
-	memcpy(req_ctx->hw_context, export->hw_context,
-	       req_ctx->hw_context_size);
+	req_ctx->hw_context_size = size;
+	if (ctx->dma_hw_context)
+		dma_unmap_single(dev, ctx->dma_hw_context, size,
+				 DMA_BIDIRECTIONAL);
+
+	memcpy(req_ctx->hw_context, export->hw_context, size);
+	ctx->dma_hw_context = dma_map_single(dev, req_ctx->hw_context, size,
+					     DMA_BIDIRECTIONAL);
 	memcpy(req_ctx->buf, export->buf, export->nbuf);
 	req_ctx->swinit = export->swinit;
 	req_ctx->first = export->first;
@@ -2954,6 +2970,24 @@  static void talitos_cra_exit(struct crypto_tfm *tfm)
 		dma_unmap_single(dev, ctx->dma_key, ctx->keylen, DMA_TO_DEVICE);
 }
 
+static void talitos_cra_exit_ahash(struct crypto_tfm *tfm)
+{
+	struct talitos_ctx *ctx = crypto_tfm_ctx(tfm);
+	struct device *dev = ctx->dev;
+	unsigned int size;
+
+	talitos_cra_exit(tfm);
+
+	size = (crypto_ahash_digestsize(__crypto_ahash_cast(tfm)) <=
+		SHA256_DIGEST_SIZE)
+	       ? TALITOS_MDEU_CONTEXT_SIZE_MD5_SHA1_SHA256
+	       : TALITOS_MDEU_CONTEXT_SIZE_SHA384_SHA512;
+
+	if (ctx->dma_hw_context)
+		dma_unmap_single(dev, ctx->dma_hw_context, size,
+				 DMA_BIDIRECTIONAL);
+}
+
 /*
  * given the alg's descriptor header template, determine whether descriptor
  * type and primary/secondary execution units required match the hw
@@ -3052,7 +3086,7 @@  static struct talitos_crypto_alg *talitos_alg_alloc(struct device *dev,
 	case CRYPTO_ALG_TYPE_AHASH:
 		alg = &t_alg->algt.alg.hash.halg.base;
 		alg->cra_init = talitos_cra_init_ahash;
-		alg->cra_exit = talitos_cra_exit;
+		alg->cra_exit = talitos_cra_exit_ahash;
 		alg->cra_type = &crypto_ahash_type;
 		t_alg->algt.alg.hash.init = ahash_init;
 		t_alg->algt.alg.hash.update = ahash_update;