diff mbox

[v8,1/2] virtio-crypto: Add virtio crypto device specification

Message ID 1472559136-89096-2-git-send-email-arei.gonglei@huawei.com
State New
Headers show

Commit Message

Gonglei (Arei) Aug. 30, 2016, 12:12 p.m. UTC
The virtio crypto device is a virtual crypto device (ie. hardware
crypto accelerator card). The virtio crypto device can provide
five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.

In this patch, CIPHER, MAC, HASH, AEAD services are introduced.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Cornelia Huck <cornelia.huck@de.ibm.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Lingli Deng <denglingli@chinamobile.com>
CC: Jani Kokkonen <Jani.Kokkonen@huawei.com>
CC: Ola Liljedahl <Ola.Liljedahl@arm.com>
CC: Varun Sethi <Varun.Sethi@freescale.com>
CC: Zeng Xin <xin.zeng@intel.com>
CC: Keating Brian <brian.a.keating@intel.com>
CC: Ma Liang J <liang.j.ma@intel.com>
CC: Griffin John <john.griffin@intel.com>
CC: Hanweidong <hanweidong@huawei.com>
CC: Mihai Claudiu Caraman <mike.caraman@nxp.com>
---
 content.tex       |   2 +
 virtio-crypto.tex | 835 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 837 insertions(+)
 create mode 100644 virtio-crypto.tex

Comments

Alexander Graf Sept. 1, 2016, 1:37 p.m. UTC | #1
On 08/30/2016 02:12 PM, Gonglei wrote:
> The virtio crypto device is a virtual crypto device (ie. hardware
> crypto accelerator card). The virtio crypto device can provide
> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.
>
> In this patch, CIPHER, MAC, HASH, AEAD services are introduced.

I have mostly a few high level comments.

For starters, a lot of the structs rely on the compiler to pad them to 
natural alignment. That may get us into trouble when trying to emulate a 
virtio device on a host with different guest architecture (like arm on 
x86). It'd be a good idea to manually pad everything to be 64bit aligned 
- then all fields are always at the same spot.

I also have a hard time getting my head around the final flow of 
everything. Do I always have to create a session to be able to emit a 
command? In that case, doesn't that slow down everything, since a 
request would then need to wait for the host reply to receive its 
session id? There should be some way to fire off a simple non-iv 
operation without any session set up imho.

Also, I don't fully understand the split between control and data 
queues. As far as I read things, the control queue is used to create 
session ids and the data queues can then be used to push data. Is there 
any particular reason for the split? One thing that seems natural to me 
would be to have sessions be per-queue, so you would create a session on 
a particular queue and only have it ever be available there. That way 
you get rid of any locking for sessions.


Alex
Gonglei (Arei) Sept. 2, 2016, 3:08 a.m. UTC | #2
Hi Alex,


> -----Original Message-----
> From: Alexander Graf [mailto:agraf@suse.de]
> Sent: Thursday, September 01, 2016 9:37 PM
> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
> 
> On 08/30/2016 02:12 PM, Gonglei wrote:
> > The virtio crypto device is a virtual crypto device (ie. hardware
> > crypto accelerator card). The virtio crypto device can provide
> > five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.
> >
> > In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
> 
> I have mostly a few high level comments.
> 
> For starters, a lot of the structs rely on the compiler to pad them to
> natural alignment. That may get us into trouble when trying to emulate a
> virtio device on a host with different guest architecture (like arm on
> x86). It'd be a good idea to manually pad everything to be 64bit aligned
> - then all fields are always at the same spot.
> 
Good point! I'll do this in the next version. Thanks!

> I also have a hard time getting my head around the final flow of
> everything. Do I always have to create a session to be able to emit a
> command? In that case, doesn't that slow down everything, since a
> request would then need to wait for the host reply to receive its
> session id? There should be some way to fire off a simple non-iv
> operation without any session set up imho.
> 
For symmetric algorithms, we'd better create a session before executing encryption
Or decryption, because the session usually be kept for a specific
algorithm with specific key in the production environment. And if we only change the iv, 
we don't need to re-create the session. 

For the asymmetric algorithms, we don't need create a session IIRC.

So, I don't think this is a performance degradation, but a performance enhancement.

> Also, I don't fully understand the split between control and data
> queues. As far as I read things, the control queue is used to create
> session ids and the data queues can then be used to push data. Is there
> any particular reason for the split? One thing that seems natural to me
> would be to have sessions be per-queue, so you would create a session on
> a particular queue and only have it ever be available there. That way
> you get rid of any locking for sessions.
> 
We want to keep a unify request type (structure) for data queue, so we can
keep the session operation in the control queue. Of course the control queue
only used to create sessions currently, but we can extend its functions if needed
in the future.

> 
> Alex

Regards,
-Gonglei
Alexander Graf Sept. 2, 2016, 8:06 a.m. UTC | #3
On 02.09.16 05:08, Gonglei (Arei) wrote:
> Hi Alex,
> 
> 
>> -----Original Message-----
>> From: Alexander Graf [mailto:agraf@suse.de]
>> Sent: Thursday, September 01, 2016 9:37 PM
>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
>>
>> On 08/30/2016 02:12 PM, Gonglei wrote:
>>> The virtio crypto device is a virtual crypto device (ie. hardware
>>> crypto accelerator card). The virtio crypto device can provide
>>> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.
>>>
>>> In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
>>
>> I have mostly a few high level comments.
>>
>> For starters, a lot of the structs rely on the compiler to pad them to
>> natural alignment. That may get us into trouble when trying to emulate a
>> virtio device on a host with different guest architecture (like arm on
>> x86). It'd be a good idea to manually pad everything to be 64bit aligned
>> - then all fields are always at the same spot.
>>
> Good point! I'll do this in the next version. Thanks!
> 
>> I also have a hard time getting my head around the final flow of
>> everything. Do I always have to create a session to be able to emit a
>> command? In that case, doesn't that slow down everything, since a
>> request would then need to wait for the host reply to receive its
>> session id? There should be some way to fire off a simple non-iv
>> operation without any session set up imho.
>>
> For symmetric algorithms, we'd better create a session before executing encryption
> Or decryption, because the session usually be kept for a specific
> algorithm with specific key in the production environment. And if we only change the iv, 
> we don't need to re-create the session. 

I think we have a slight misunderstanding here :)

The current workflow is

  -> create session
  <- session key
  -> data in
  <- data out
  ...
  <- close session
  -> ack

That means that at least for the first packet you have at least one full
round-trip cost from guest to host to guest to be able to send any data.

That sounds pretty expensive to me on the latency side. There are
multiple ways to mitigate that. One idea was to have a separate path in
parallel to the create session + data + close session dance that would
combine them all into a single command. You would still have the session
based version, but accelerate the one-blob case.

Another idea would be to make the guest be the session id janitor. Then
you could just do

  -> create session with key X
  -> data in
  <- data out
  ...

so you save the round trip, if you combine command and data queues, as
then the create and data bits are serialized by their position in the queue.

> 
> For the asymmetric algorithms, we don't need create a session IIRC.
> 
> So, I don't think this is a performance degradation, but a performance enhancement.
> 
>> Also, I don't fully understand the split between control and data
>> queues. As far as I read things, the control queue is used to create
>> session ids and the data queues can then be used to push data. Is there
>> any particular reason for the split? One thing that seems natural to me
>> would be to have sessions be per-queue, so you would create a session on
>> a particular queue and only have it ever be available there. That way
>> you get rid of any locking for sessions.
>>
> We want to keep a unify request type (structure) for data queue, so we can
> keep the session operation in the control queue. Of course the control queue
> only used to create sessions currently, but we can extend its functions if needed
> in the future.

I still don't understand. With separate control+data queues you just get
yourself into synchronization troubles. Both struct
virtio_crypto_ctrl_header and struct virtio_crypto_op_header already
have an opcode as first le32 field. You can easily use that to determine
both length of the payload as well as command (control vs data).

You could then also completely get rid of the "queue_id" fields, as any
operation would only ever operate on the queue it's running on.


Alex
Gonglei (Arei) Sept. 2, 2016, 10:26 a.m. UTC | #4
> -----Original Message-----
> From: Alexander Graf [mailto:agraf@suse.de]
> Sent: Friday, September 02, 2016 4:07 PM
> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
> 
> 
> 
> On 02.09.16 05:08, Gonglei (Arei) wrote:
> > Hi Alex,
> >
> >
> >> -----Original Message-----
> >> From: Alexander Graf [mailto:agraf@suse.de]
> >> Sent: Thursday, September 01, 2016 9:37 PM
> >> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
> specification
> >>
> >> On 08/30/2016 02:12 PM, Gonglei wrote:
> >>> The virtio crypto device is a virtual crypto device (ie. hardware
> >>> crypto accelerator card). The virtio crypto device can provide
> >>> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.
> >>>
> >>> In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
> >>
> >> I have mostly a few high level comments.
> >>
> >> For starters, a lot of the structs rely on the compiler to pad them to
> >> natural alignment. That may get us into trouble when trying to emulate a
> >> virtio device on a host with different guest architecture (like arm on
> >> x86). It'd be a good idea to manually pad everything to be 64bit aligned
> >> - then all fields are always at the same spot.
> >>
> > Good point! I'll do this in the next version. Thanks!
> >
> >> I also have a hard time getting my head around the final flow of
> >> everything. Do I always have to create a session to be able to emit a
> >> command? In that case, doesn't that slow down everything, since a
> >> request would then need to wait for the host reply to receive its
> >> session id? There should be some way to fire off a simple non-iv
> >> operation without any session set up imho.
> >>
> > For symmetric algorithms, we'd better create a session before executing
> encryption
> > Or decryption, because the session usually be kept for a specific
> > algorithm with specific key in the production environment. And if we only
> change the iv,
> > we don't need to re-create the session.
> 
> I think we have a slight misunderstanding here :)
> 
> The current workflow is
> 
>   -> create session
>   <- session key
>   -> data in
>   <- data out
>   ...
>   <- close session
>   -> ack
> 
> That means that at least for the first packet you have at least one full
> round-trip cost from guest to host to guest to be able to send any data.
> 
Yes, that's true... 

> That sounds pretty expensive to me on the latency side. There are
> multiple ways to mitigate that. One idea was to have a separate path in
> parallel to the create session + data + close session dance that would
> combine them all into a single command. You would still have the session
> based version, but accelerate the one-blob case.
> 
> Another idea would be to make the guest be the session id janitor. Then
> you could just do
> 
>   -> create session with key X
>   -> data in
>   <- data out
>   ...
> 
> so you save the round trip, if you combine command and data queues, as
> then the create and data bits are serialized by their position in the queue.
> 
... But for lots of crypto requests, the exhaust is low:

	-> create session with key X
	<- session id
    //do something in the guest if you like
 	-> data in with session_id
	-> data in with session_id
	-> data in with session_id
    ... ...    (fill out data virtqueue)
	<-data out
	<-data out
    <-data out

And this is what the production environment do currently.

> >
> > For the asymmetric algorithms, we don't need create a session IIRC.
> >
> > So, I don't think this is a performance degradation, but a performance
> enhancement.
> >
> >> Also, I don't fully understand the split between control and data
> >> queues. As far as I read things, the control queue is used to create
> >> session ids and the data queues can then be used to push data. Is there
> >> any particular reason for the split? One thing that seems natural to me
> >> would be to have sessions be per-queue, so you would create a session on
> >> a particular queue and only have it ever be available there. That way
> >> you get rid of any locking for sessions.
> >>
> > We want to keep a unify request type (structure) for data queue, so we can
> > keep the session operation in the control queue. Of course the control queue
> > only used to create sessions currently, but we can extend its functions if
> needed
> > in the future.
> 
> I still don't understand. With separate control+data queues you just get
> yourself into synchronization troubles. Both struct
> virtio_crypto_ctrl_header and struct virtio_crypto_op_header already
> have an opcode as first le32 field. You can easily use that to determine
> both length of the payload as well as command (control vs data).
> 

There is a big problem that the control handle logic is synchronization, but the data queue
handling logic is asynchronization. We can't combine them into one queue.
It will decrease the performance because you need indentify each packet
if we do this forcedly.


Regards,
-Gonglei

> You could then also completely get rid of the "queue_id" fields, as any
> operation would only ever operate on the queue it's running on.
> 
> 
> Alex
Stefan Hajnoczi Sept. 2, 2016, 12:06 p.m. UTC | #5
On Tue, Aug 30, 2016 at 08:12:15PM +0800, Gonglei wrote:

Hi,
I have read through part of the spec and suggest mostly grammar fixes
below.

> The virtio crypto device is a virtual crypto device (ie. hardware
> crypto accelerator card). The virtio crypto device can provide
> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.
> 
> In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
> 
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> CC: Michael S. Tsirkin <mst@redhat.com>
> CC: Cornelia Huck <cornelia.huck@de.ibm.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Lingli Deng <denglingli@chinamobile.com>
> CC: Jani Kokkonen <Jani.Kokkonen@huawei.com>
> CC: Ola Liljedahl <Ola.Liljedahl@arm.com>
> CC: Varun Sethi <Varun.Sethi@freescale.com>
> CC: Zeng Xin <xin.zeng@intel.com>
> CC: Keating Brian <brian.a.keating@intel.com>
> CC: Ma Liang J <liang.j.ma@intel.com>
> CC: Griffin John <john.griffin@intel.com>
> CC: Hanweidong <hanweidong@huawei.com>
> CC: Mihai Claudiu Caraman <mike.caraman@nxp.com>
> ---
>  content.tex       |   2 +
>  virtio-crypto.tex | 835 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 837 insertions(+)
>  create mode 100644 virtio-crypto.tex
> 
> diff --git a/content.tex b/content.tex
> index 4b45678..ab75f78 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -5750,6 +5750,8 @@ descriptor for the \field{sense_len}, \field{residual},
>  \field{status_qualifier}, \field{status}, \field{response} and
>  \field{sense} fields.
>  
> +\input{virtio-crypto.tex}
> +
>  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  
>  Currently there are three device-independent feature bits defined:
> diff --git a/virtio-crypto.tex b/virtio-crypto.tex
> new file mode 100644
> index 0000000..491fc25
> --- /dev/null
> +++ b/virtio-crypto.tex
> @@ -0,0 +1,835 @@
> +\section{Crypto Device}\label{sec:Device Types / Crypto Device}
> +
> +The virtio crypto device is a virtual crypto device, and is a kind of
> +virtual hardware accelerators for virtual machine.  The encryption and

s/accelerators/accelerator/
s/virtual machine/virtual machines/

> +decryption requests of are placed in the data queue, and handled by the

s/of//

> +real crypto accelerators finally. The second queue is the control queue,
> +which is used to create or destroy session for symmetric algorithms, and

s/session/sessions/

> +control some advanced features in the future. The virtio crypto
> +device can provide seven crypto services: CIPHER, MAC, HASH, AEAD,
> +KDF, ASYM, PRIMITIVE.
> +
> +\subsection{Device ID}\label{sec:Device Types / Crypto Device / Device ID}
> +
> +20
> +
> +\subsection{Virtqueues}\label{sec:Device Types / Crypto Device / Virtqueues}
> +
> +\begin{description}
> +\item[0] dataq1
> +\item[\ldots]
> +\item[N-1] dataqN
> +\item[N] controlq
> +\end{description}
> +
> +N is set by \field{max_dataqueues} (\field{max_dataqueues} >= 1).

I suggest dropping (\field{max_dataqueues} >= 1) since this constraint
already is expressed in the device normative section below.  Things
can go out of sync if they are duplicated.

> +
> +\subsection{Feature bits}\label{sec:Device Types / Crypto Device / Feature bits}
> +  None currently defined
> +
> +\subsection{Device configuration layout}\label{sec:Device Types / Crypto Device / Device configuration layout}
> +
> +Thirteen driver-read-only configuration fields are currently defined.

I count only 12 fields.  I suggest saying "The following
driver-read-only configuration fields are currently defined:" instead.

> +\begin{lstlisting}
> +struct virtio_crypto_config {
> +    le16  status;
> +    le16  max_dataqueues;
> +    le32  crypto_services;
> +    /* detailed algorithms mask */
> +    le32 cipher_algo_l;
> +    le32 cipher_algo_h;
> +    le32 hash_algo;
> +    le32 mac_algo_l;
> +    le32 mac_algo_h;
> +    le32 asym_algo;
> +    le32 kdf_algo;
> +    le32 aead_algo;
> +    le32 primitive_algo;
> +};
> +\end{lstlisting}
> +
> +The first field, \field{status} is currently defined: VIRTIO_CRYPTO_S_HW_READY.
> +
> +\begin{lstlisting}
> +#define VIRTIO_CRYPTO_S_HW_READY  (1 << 0)
> +\end{lstlisting}
> +
> +The following driver-read-only field, \field{max_dataqueuess} specifies the
> +maximum number of data virtqueues (dataq1\ldots dataqN). The crypto_services
> +shows the crypto services the virtio crypto supports. The service currently

s/service/services/

> +defined are:
> +
> +\begin{lstlisting}
> +#define VIRTIO_CRYPTO_NO_SERVICE (0) /* cipher services */
> +#define VIRTIO_CRYPTO_SERVICE_CIPHER (1) /* cipher services */

How are these constants used: 1 << VIRTIO_CRYPTO_NO_SERVICE?

I suggest eliminating the 0 bit constants since they can be deduced from
the fact that all other bits are zeroed.  There is no need for a
dedicated "NO_SERVICE" constant.

> +#define VIRTIO_CRYPTO_SERVICE_HASH  (2) /* hash service */
> +#define VIRTIO_CRYPTO_SERVICE_MAC   (3) /* MAC (Message Authentication Codes) service */
> +#define VIRTIO_CRYPTO_SERVICE_AEAD  (4) /* AEAD(Authenticated Encryption with Associated Data) service */
> +\end{lstlisting}
> +
> +The last driver-read-only fields specify detailed algorithms mask which

s/mask/masks/

> +the device offered for corresponding service. The below CIPHER algorithms

s/the device offered for corresponding service/the device offers for
corresponding services/

> +are defined currently:
> +
> +\begin{lstlisting}
> +#define VIRTIO_CRYPTO_NO_CIPHER                 0
> +#define VIRTIO_CRYPTO_CIPHER_ARC4               1
> +#define VIRTIO_CRYPTO_CIPHER_AES_ECB            2
> +#define VIRTIO_CRYPTO_CIPHER_AES_CBC            3
> +#define VIRTIO_CRYPTO_CIPHER_AES_CTR            4
> +#define VIRTIO_CRYPTO_CIPHER_DES_ECB            5
> +#define VIRTIO_CRYPTO_CIPHER_DES_CBC            6
> +#define VIRTIO_CRYPTO_CIPHER_3DES_ECB           7
> +#define VIRTIO_CRYPTO_CIPHER_3DES_CBC           8
> +#define VIRTIO_CRYPTO_CIPHER_3DES_CTR           9
> +#define VIRTIO_CRYPTO_CIPHER_KASUMI_F8          10
> +#define VIRTIO_CRYPTO_CIPHER_SNOW3G_UEA2        11
> +#define VIRTIO_CRYPTO_CIPHER_AES_F8             12
> +#define VIRTIO_CRYPTO_CIPHER_AES_XTS            13
> +#define VIRTIO_CRYPTO_CIPHER_ZUC_EEA3           14
> +\end{lstlisting}
> +
> +The below HASH algorithms are defined currently:
> +
> +\begin{lstlisting}
> +#define VIRTIO_CRYPTO_NO_HASH            0
> +#define VIRTIO_CRYPTO_HASH_MD5           1
> +#define VIRTIO_CRYPTO_HASH_SHA1          2
> +#define VIRTIO_CRYPTO_HASH_SHA_224       3
> +#define VIRTIO_CRYPTO_HASH_SHA_256       4
> +#define VIRTIO_CRYPTO_HASH_SHA_384       5
> +#define VIRTIO_CRYPTO_HASH_SHA_512       6
> +#define VIRTIO_CRYPTO_HASH_SHA3_224      7
> +#define VIRTIO_CRYPTO_HASH_SHA3_256      8
> +#define VIRTIO_CRYPTO_HASH_SHA3_384      9
> +#define VIRTIO_CRYPTO_HASH_SHA3_512      10
> +#define VIRTIO_CRYPTO_HASH_SHA3_SHAKE128      11
> +#define VIRTIO_CRYPTO_HASH_SHA3_SHAKE256      12
> +\end{lstlisting}

Perhaps all masks should be 64-bit because the 32-bit hash_algo field
already has 13 bits defined so room for future expansion is limited.

> +
> +The below MAC algorithms are defined currently:
> +
> +\begin{lstlisting}
> +#define VIRTIO_CRYPTO_NO_MAC                       0
> +#define VIRTIO_CRYPTO_MAC_HMAC_MD5                 1
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA1                2
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_224             3
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_256             4
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_384             5
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_512             6
> +#define VIRTIO_CRYPTO_MAC_CMAC_3DES                25
> +#define VIRTIO_CRYPTO_MAC_CMAC_AES                 26
> +#define VIRTIO_CRYPTO_MAC_CMAC_KASUMI_F9           27
> +#define VIRTIO_CRYPTO_MAC_CMAC_SNOW3G_UIA2         28
> +#define VIRTIO_CRYPTO_MAC_GMAC_AES                 41
> +#define VIRTIO_CRYPTO_MAC_GMAC_TWOFISH             42
> +#define VIRTIO_CRYPTO_MAC_CBCMAC_AES               49
> +#define VIRTIO_CRYPTO_MAC_CBCMAC_KASUMI_F9         50
> +#define VIRTIO_CRYPTO_MAC_XCBC_AES                 53
> +#define VIRTIO_CRYPTO_MAC_ZUC_EIA3                 54
> +\end{lstlisting}
> +
> +The below AEAD algorithms are defined currently:
> +
> +\begin{lstlisting}
> +#define VIRTIO_CRYPTO_NO_AEAD     0
> +#define VIRTIO_CRYPTO_AEAD_GCM    1
> +#define VIRTIO_CRYPTO_AEAD_CCM    2
> +#define VIRTIO_CRYPTO_AEAD_CHACHA20_POLY1305  3
> +\end{lstlisting}
> +
> +\subsubsection{Device Requirements: Device configuration layout}\label{sec:Device Types / Crypto Device / Device configuration layout / Device Requirements: Device configuration layout}
> +
> +\begin{itemize*}

This section should use \devicenormative{\subsection}{...}.  I'm not
sure why you used a regular \subsubsection{} followed by
\begin{itemize*}.

> +\item The device MUST set \field{max_dataqueues} to between 1 and 65535 inclusive.
> +\item The device SHOULD set \field{status} according to the status of the hardware-backed implementation.
> +\item The device MUST set \field{crypto_services} according to the crypto services which the device offered.
> +\item The device MUST set detailed algorithms mask according to \field{crypto_services} field.
> +\end{itemize*}
> +
> +\subsubsection{Driver Requirements: Device configuration layout}\label{sec:Device Types / Crypto Device / Device configuration layout / Driver Requirements: Device configuration layout}
> +
> +\begin{itemize*}

\drivernormative{\subsection}{...}

> +\item The driver MUST read the ready \field{status} from the bottom bit of status to
> +      check whether the hardware-backed implementation is ready or not.
> +\item The driver MAY read \field{max_dataqueues} field to discover how many data queues the device supports.
> +\item The driver MUST read \field{crypto_services} field to discover which services the device is able to offer.
> +\item The driver MUST read detail \field{algorithms} field according to \field{crypto_services} field.

s/detail/the detailed/

> +\end{itemize*}
> +
> +\subsection{Device Initialization}\label{sec:Device Types / Crypto Device / Device Initialization}
> +
> +\subsubsection{Driver Requirements: Device Initialization}\label{sec:Device Types / Crypto Device / Device Initialization / Driver Requirements: Device Initialization}
> +
> +\begin{itemize*}
> +\item The driver MUST identify and initialize data virtqueue, up to \field{max_dataqueues}.

"The driver MUST identify and initialize up to \field{max_dataqueues}
data virtqueues."

Besides the grammar fixes I think this version makes it clearer that the
driver does not have to initialize all data virtqueues if it wants to
use fewer.

> +\item The driver MUST identify the control virtqueue.
> +\item The driver MUST identify the ready status of hardware-backend from \field{status} field.
> +\item The driver MUST read the supported crypto services from bits of \field{crypto_servies}. 
> +\item The driver MUST read the supported algorithms according to \field{crypto_services} field.
> +\end{itemize*}
> +
> +\subsubsection{Device Requirements: Device Initialization}\label{sec:Device Types / Crypto Device / Device Initialization / Device Requirements: Device Initialization}
> +
> +\begin{itemize*}
> +\item The device MUST be configured at least one real accelerator as the backend accelerator which executes real crypto operations.

The spec does not have to require a "real" accelerator.  A pure software
implementation should be able to conform to the spec.  I would drop this
line.

> +\item The device MUST write the \field{crypto_services} field according to the capacities of the backend accelerator.
> +\item The device MUST write the corresponding algorithms field according to the \field{crypto_services} field.

"Device Types / Crypto Device / Device configuration layout / Device Requirements: Device configuration layout" already expresses the same thing.  This text seems unnecessary.

> +\end{itemize*}
> +
> +\subsection{Device Operation}\label{sec:Device Types / Crypto Device / Device Operation}
> +
> +Packets can be transmitted by placing them in both the controlq and dataq.
> +The packet are consisted of general header and services specific request,

Grammar: "Packets consist of a generic header and a service-specific request."

> +Where 'general header' is for all crypto request, 'service specific request'

s/request/requests/

> +is composed of operation parameter + output data + input data in generally.

s/is/are/
s/in generally/in general/

> +Operation parameters are algorithm-specific parameters, output data is the
> +data should be operated, input data is the "operation result + result buffer".
> +The general header of controlq:
> +
> +\begin{lstlisting}
> +#define VIRTIO_CRYPTO_OPCODE(service, op)   ((service << 8) | (op))
> +
> +struct virtio_crypto_ctrl_header{
> +#define VIRTIO_CRYPTO_CIPHER_CREATE_SESSION   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x02)
> +#define VIRTIO_CRYPTO_CIPHER_DESTROY_SESSION  VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x03)
> +#define VIRTIO_CRYPTO_HASH_CREATE_SESSION     VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_HASH, 0x02)
> +#define VIRTIO_CRYPTO_HASH_DESTROY_SESSION    VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_HASH, 0x03)
> +#define VIRTIO_CRYPTO_MAC_CREATE_SESSION      VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_MAC, 0x02)
> +#define VIRTIO_CRYPTO_MAC_DESTROY_SESSION     VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_MAC, 0x03)
> +#define VIRTIO_CRYPTO_AEAD_CREATE_SESSION     VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x02)
> +#define VIRTIO_CRYPTO_AEAD_DESTROY_SESSION    VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x03)
> +    le32 opcode;
> +    /* algo should be service-specific algorithms */
> +    le32 algo;
> +    /* control flag to control the request */
> +    le16 flag;
> +    /* data virtqeueu id */

s/virtqeueu/virtqueue/

> +    le16 queue_id;
> +};
> +\end{lstlisting}
> +
> +The general header of dataq:
> +
> +\begin{lstlisting}
> +struct virtio_crypto_op_header {
> +#define VIRTIO_CRYPTO_CIPHER_ENCRYPT    VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x00)
> +#define VIRTIO_CRYPTO_CIPHER_DECRYPT    VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x01)
> +#define VIRTIO_CRYPTO_HASH              VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_HASH, 0x00)
> +#define VIRTIO_CRYPTO_MAC               VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_MAC, 0x00)
> +#define VIRTIO_CRYPTO_AEAD_ENCRYPT      VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x00)
> +#define VIRTIO_CRYPTO_AEAD_DECRYPT      VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x01)
> +    le32 opcode;
> +    /* algo should be service-specific algorithms */
> +    le32 algo;
> +    /* control flag to control the request */
> +    le16 flag;
> +    /* data virtqeueu id */

s/virtqeueu/virtqueue/

> +    le16 queue_id;
> +    /* session_id should be service-specific algorithms */
> +    le64 session_id;

This field is not 64-bit aligned.  The compiler will insert 32-bit
padding.  Please move the field to a 64-bit aligned boundary so that the
binary layout does not change according to the compiler/architecture.

> +};
> +\end{lstlisting}
> +
> +\subsubsection{Control virtqueue}\label{sec:Device Types / Crypto Device / Device Operation / Control virtqueue}
> +
> +The driver uses the control virtqueue to send control commands to the
> +device which finish the non-data plane operations, such as session

s/finish/handles/

> +operations (see \ref{sec:Device Types / Crypto Device / Device Operation / Session operation}).
> +The packet of controlq:
> +
> +\begin{lstlisting}
> +struct virtio_crypto_op_ctrl_req {
> +    struct virtio_crypto_ctrl_header header;
> +
> +    union {
> +        struct virtio_crypto_sym_create_session_req   sym_create_session;
> +        struct virtio_crypto_hash_create_session_req  hash_create_session;
> +        struct virtio_crypto_mac_create_session_req   mac_create_session;
> +        struct virtio_crypto_aead_create_session_req  aead_create_session;
> +        struct virtio_crypto_destroy_session_req      sym_destroy_session;
> +    } u;
> +};
> +\end{lstlisting}
> +
> +The header is the general header, the union is algorithm-specific type,

/is algorithm-specific type/is an algorithm-specific type/

> +which is set by the driver. All the properties in the union will be shown as follow.

s/follow/follows/

> +
> +\subsubsection{Session operation}\label{sec:Device Types / Crypto Device / Device Operation / Session operation}
> +
> +The symmetric algorithms have the concept of sessions. A session is a
> +handle which describes the cryptographic parameters to be applied to
> +a number of buffers. The data within a session handle includes the following:
> +
> +\begin{enumerate}
> +\item The operation (cipher, hash/mac or both, and if both, the order in
> +      which the algorithms should be applied).
> +\item The cipher setup data, including the cipher algorithm and mode,
> +      the key and its length, and the direction (encrypt or decrypt).
> +\item The hash/mac setup data, including the hash algorithm or mac algorithm,
> +      and digest result length (to allow for truncation).
> +\begin{itemize*}
> +\item Authenticated mode can refer to MAC, which requires that the key and
> +      its length are also specified.
> +\item For nested mode, the inner and outer prefix data and length are specified,
> +      as well as the outer hash algorithm.
> +\end{itemize*}
> +\end{enumerate}
> +
> +\paragraph{Session operation: HASH session}\label{sec:Device Types / Crypto Device / Device Operation / Session operation / Session operation: HASH session}
> +
> +The packet of HASH session is:
> +
> +\begin{lstlisting}
> +struct virtio_crypto_hash_session_para {
> +    /* See VIRTIO_CRYPTO_HASH_* above */
> +    le32 algo;
> +    /* hash result length */
> +    le32 hash_result_len;
> +};
> +struct virtio_crypto_hash_create_session_req {
> +    // Device-readable part
> +    struct virtio_crypto_hash_session_para parameter;
> +    // Device-writable part
> +    le64    session_id;
> +    u8     status;
> +};
> +\end{lstlisting}
> +
> +\paragraph{Session operation: MAC session}\label{sec:Device Types / Crypto Device / Device
> +Operation / Session operation / Session operation: MAC session}
> +
> +The packet of MAC session is:
> +
> +\begin{lstlisting}
> +struct virtio_crypto_mac_session_para {
> +    /* See VIRTIO_CRYPTO_MAC_* above */
> +    le32 algo;
> +    /* hash result length */
> +    le32 hash_result_len;
> +    /* length of authenticated key */
> +    le32 auth_key_len;
> +};
> +struct virtio_crypto_mac_create_session_req {
> +    // Device-readable part
> +    struct virtio_crypto_mac_session_para  parameter;
> +    // Device-writable part
> +    le64    session_id;

This field is not 64-bit aligned.  I suggest manually adding a le32
padding field before it.

Please check all other structs defined in this spec.  You can also use
the pahole(1) utility to inspect struct layout chosen by your compiler.
There must be no compiler-generated padding.

> +    u8     status;
> +};
> +\end{lstlisting}
> +
> +\paragraph{Session operation: Symmetric algorithms session}\label{sec:Device Types / Crypto Device / Device
> +Operation / Session operation / Session operation: Symmetric algorithms session}
> +
> +The request of symmetric session includes two parts, CIPHER algorithms and chain
> +algorithms (chaining cipher and hash/mac). The packet of symmetric session is:
> +
> +\begin{lstlisting}
> +struct virtio_crypto_cipher_session_para {
> +    /* See VIRTIO_CRYPTO_CIPHER* above */
> +    le32 algo;
> +    /* length of key */

In bytes or in bits?
Ola Liljedahl Sept. 2, 2016, 12:16 p.m. UTC | #6
On 02/09/2016, 12:26, "Gonglei (Arei)" <arei.gonglei@huawei.com> wrote:

>
>> -----Original Message-----
>> From: Alexander Graf [mailto:agraf@suse.de]
>> Sent: Friday, September 02, 2016 4:07 PM
>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
>>specification
>>
>>
>>
>> On 02.09.16 05:08, Gonglei (Arei) wrote:
>> > Hi Alex,
>> >
>> >
>> >> -----Original Message-----
>> >> From: Alexander Graf [mailto:agraf@suse.de]
>> >> Sent: Thursday, September 01, 2016 9:37 PM
>> >> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
>> specification
>> >>
>> >> On 08/30/2016 02:12 PM, Gonglei wrote:
>> >>> The virtio crypto device is a virtual crypto device (ie. hardware
>> >>> crypto accelerator card). The virtio crypto device can provide
>> >>> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.
>> >>>
>> >>> In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
>> >>
>> >> I have mostly a few high level comments.
>> >>
>> >> For starters, a lot of the structs rely on the compiler to pad them
>>to
>> >> natural alignment. That may get us into trouble when trying to
>>emulate a
>> >> virtio device on a host with different guest architecture (like arm
>>on
>> >> x86). It'd be a good idea to manually pad everything to be 64bit
>>aligned
>> >> - then all fields are always at the same spot.
>> >>
>> > Good point! I'll do this in the next version. Thanks!
>> >
>> >> I also have a hard time getting my head around the final flow of
>> >> everything. Do I always have to create a session to be able to emit a
>> >> command? In that case, doesn't that slow down everything, since a
>> >> request would then need to wait for the host reply to receive its
>> >> session id? There should be some way to fire off a simple non-iv
>> >> operation without any session set up imho.
>> >>
>> > For symmetric algorithms, we'd better create a session before
>>executing
>> encryption
>> > Or decryption, because the session usually be kept for a specific
>> > algorithm with specific key in the production environment. And if we
>>only
>> change the iv,
>> > we don't need to re-create the session.
>>
>> I think we have a slight misunderstanding here :)
>>
>> The current workflow is
>>
>>   -> create session
>>   <- session key
>>   -> data in
>>   <- data out
>>   ...
>>   <- close session
>>   -> ack
>>
>> That means that at least for the first packet you have at least one full
>> round-trip cost from guest to host to guest to be able to send any data.
>>
>Yes, that's true...
>
>> That sounds pretty expensive to me on the latency side. There are
>> multiple ways to mitigate that. One idea was to have a separate path in
>> parallel to the create session + data + close session dance that would
>> combine them all into a single command. You would still have the session
>> based version, but accelerate the one-blob case.
>>
>> Another idea would be to make the guest be the session id janitor. Then
>> you could just do
>>
>>   -> create session with key X
>>   -> data in
>>   <- data out
>>   ...
>>
>> so you save the round trip, if you combine command and data queues, as
>> then the create and data bits are serialized by their position in the
>>queue.
>>
>... But for lots of crypto requests, the exhaust is low:
>
>-> create session with key X
><- session id
>    //do something in the guest if you like
> -> data in with session_id
>-> data in with session_id
>-> data in with session_id
>    ... ...    (fill out data virtqueue)
><-data out
><-data out
>    <-data out
>
>And this is what the production environment do currently.
>
>> >
>> > For the asymmetric algorithms, we don't need create a session IIRC.
>> >
>> > So, I don't think this is a performance degradation, but a performance
>> enhancement.
>> >
>> >> Also, I don't fully understand the split between control and data
>> >> queues. As far as I read things, the control queue is used to create
>> >> session ids and the data queues can then be used to push data. Is
>>there
>> >> any particular reason for the split? One thing that seems natural to
>>me
>> >> would be to have sessions be per-queue, so you would create a
>>session on
>> >> a particular queue and only have it ever be available there. That way
>> >> you get rid of any locking for sessions.
>> >>
>> > We want to keep a unify request type (structure) for data queue, so
>>we can
>> > keep the session operation in the control queue. Of course the
>>control queue
>> > only used to create sessions currently, but we can extend its
>>functions if
>> needed
>> > in the future.
>>
>> I still don't understand. With separate control+data queues you just get
>> yourself into synchronization troubles. Both struct
>> virtio_crypto_ctrl_header and struct virtio_crypto_op_header already
>> have an opcode as first le32 field. You can easily use that to determine
>> both length of the payload as well as command (control vs data).
>>
>
>There is a big problem that the control handle logic is synchronization,
>but the data queue
>handling logic is asynchronization. We can't combine them into one queue.
>It will decrease the performance because you need indentify each packet
>if we do this forcedly.
Are you saying that control and data operations are handled by separate
"blocks²?
If you combined control and data queues, there would have to be a (SW)
demultiplexer
that would add overhead (and potentially decrease throughout) especially
for the data
operations?

And in order to truly benefit from (HW) crypto acceleration (especially
for smaller
data sizes), you want to minimise the amount of SW processing in the data
path.

‹ Ola

>
>
>Regards,
>-Gonglei
>
>> You could then also completely get rid of the "queue_id" fields, as any
>> operation would only ever operate on the queue it's running on.
>>
>>
>> Alex
>

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Alexander Graf Sept. 2, 2016, 1:52 p.m. UTC | #7
On 02.09.16 12:26, Gonglei (Arei) wrote:
> 
>> -----Original Message-----
>> From: Alexander Graf [mailto:agraf@suse.de]
>> Sent: Friday, September 02, 2016 4:07 PM
>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
>>
>>
>>
>> On 02.09.16 05:08, Gonglei (Arei) wrote:
>>> Hi Alex,
>>>
>>>
>>>> -----Original Message-----
>>>> From: Alexander Graf [mailto:agraf@suse.de]
>>>> Sent: Thursday, September 01, 2016 9:37 PM
>>>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
>> specification
>>>>
>>>> On 08/30/2016 02:12 PM, Gonglei wrote:
>>>>> The virtio crypto device is a virtual crypto device (ie. hardware
>>>>> crypto accelerator card). The virtio crypto device can provide
>>>>> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.
>>>>>
>>>>> In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
>>>>
>>>> I have mostly a few high level comments.
>>>>
>>>> For starters, a lot of the structs rely on the compiler to pad them to
>>>> natural alignment. That may get us into trouble when trying to emulate a
>>>> virtio device on a host with different guest architecture (like arm on
>>>> x86). It'd be a good idea to manually pad everything to be 64bit aligned
>>>> - then all fields are always at the same spot.
>>>>
>>> Good point! I'll do this in the next version. Thanks!
>>>
>>>> I also have a hard time getting my head around the final flow of
>>>> everything. Do I always have to create a session to be able to emit a
>>>> command? In that case, doesn't that slow down everything, since a
>>>> request would then need to wait for the host reply to receive its
>>>> session id? There should be some way to fire off a simple non-iv
>>>> operation without any session set up imho.
>>>>
>>> For symmetric algorithms, we'd better create a session before executing
>> encryption
>>> Or decryption, because the session usually be kept for a specific
>>> algorithm with specific key in the production environment. And if we only
>> change the iv,
>>> we don't need to re-create the session.
>>
>> I think we have a slight misunderstanding here :)
>>
>> The current workflow is
>>
>>   -> create session
>>   <- session key
>>   -> data in
>>   <- data out
>>   ...
>>   <- close session
>>   -> ack
>>
>> That means that at least for the first packet you have at least one full
>> round-trip cost from guest to host to guest to be able to send any data.
>>
> Yes, that's true... 
> 
>> That sounds pretty expensive to me on the latency side. There are
>> multiple ways to mitigate that. One idea was to have a separate path in
>> parallel to the create session + data + close session dance that would
>> combine them all into a single command. You would still have the session
>> based version, but accelerate the one-blob case.
>>
>> Another idea would be to make the guest be the session id janitor. Then
>> you could just do
>>
>>   -> create session with key X
>>   -> data in
>>   <- data out
>>   ...
>>
>> so you save the round trip, if you combine command and data queues, as
>> then the create and data bits are serialized by their position in the queue.
>>
> ... But for lots of crypto requests, the exhaust is low:
> 
> 	-> create session with key X
> 	<- session id
>     //do something in the guest if you like
>  	-> data in with session_id
> 	-> data in with session_id
> 	-> data in with session_id
>     ... ...    (fill out data virtqueue)
> 	<-data out
> 	<-data out
>     <-data out
> 
> And this is what the production environment do currently.

Does this also apply to a web server handling lots of small requests?
Does crypto acceleration even make sense in such an environment or would
it be cheaper to simply to the crypto on the CPU for such small payloads?

I'm seriously asking :).

> 
>>>
>>> For the asymmetric algorithms, we don't need create a session IIRC.
>>>
>>> So, I don't think this is a performance degradation, but a performance
>> enhancement.
>>>
>>>> Also, I don't fully understand the split between control and data
>>>> queues. As far as I read things, the control queue is used to create
>>>> session ids and the data queues can then be used to push data. Is there
>>>> any particular reason for the split? One thing that seems natural to me
>>>> would be to have sessions be per-queue, so you would create a session on
>>>> a particular queue and only have it ever be available there. That way
>>>> you get rid of any locking for sessions.
>>>>
>>> We want to keep a unify request type (structure) for data queue, so we can
>>> keep the session operation in the control queue. Of course the control queue
>>> only used to create sessions currently, but we can extend its functions if
>> needed
>>> in the future.
>>
>> I still don't understand. With separate control+data queues you just get
>> yourself into synchronization troubles. Both struct
>> virtio_crypto_ctrl_header and struct virtio_crypto_op_header already
>> have an opcode as first le32 field. You can easily use that to determine
>> both length of the payload as well as command (control vs data).
>>
> 
> There is a big problem that the control handle logic is synchronization, but the data queue
> handling logic is asynchronization. We can't combine them into one queue.
> It will decrease the performance because you need indentify each packet
> if we do this forcedly.

I might again just not grasp the full picture here. Is the session key
something global you get from hardware?

In that case, it might be a good idea to take a look at all existing
crypto accelerator implementations we know of today and see which model
works best across the board.

I also think it's probably a good idea to not pass hardware session ids
into the guest, so you'd want to create a separate lookup table that
converts from virtio session to hardware session. That way hardware that
does not have global session tokens can actually leverage multiple CPUs
hammering it.


Alex
Alexander Graf Sept. 2, 2016, 2:05 p.m. UTC | #8
On 02.09.16 14:16, Ola Liljedahl wrote:
> 
> 
> On 02/09/2016, 12:26, "Gonglei (Arei)" <arei.gonglei@huawei.com> wrote:
> 
>>
>>> -----Original Message-----
>>> From: Alexander Graf [mailto:agraf@suse.de]
>>> Sent: Friday, September 02, 2016 4:07 PM
>>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
>>> specification
>>>
>>>
>>>
>>> On 02.09.16 05:08, Gonglei (Arei) wrote:
>>>> Hi Alex,
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Alexander Graf [mailto:agraf@suse.de]
>>>>> Sent: Thursday, September 01, 2016 9:37 PM
>>>>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
>>> specification
>>>>>
>>>>> On 08/30/2016 02:12 PM, Gonglei wrote:
>>>>>> The virtio crypto device is a virtual crypto device (ie. hardware
>>>>>> crypto accelerator card). The virtio crypto device can provide
>>>>>> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.
>>>>>>
>>>>>> In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
>>>>>
>>>>> I have mostly a few high level comments.
>>>>>
>>>>> For starters, a lot of the structs rely on the compiler to pad them
>>> to
>>>>> natural alignment. That may get us into trouble when trying to
>>> emulate a
>>>>> virtio device on a host with different guest architecture (like arm
>>> on
>>>>> x86). It'd be a good idea to manually pad everything to be 64bit
>>> aligned
>>>>> - then all fields are always at the same spot.
>>>>>
>>>> Good point! I'll do this in the next version. Thanks!
>>>>
>>>>> I also have a hard time getting my head around the final flow of
>>>>> everything. Do I always have to create a session to be able to emit a
>>>>> command? In that case, doesn't that slow down everything, since a
>>>>> request would then need to wait for the host reply to receive its
>>>>> session id? There should be some way to fire off a simple non-iv
>>>>> operation without any session set up imho.
>>>>>
>>>> For symmetric algorithms, we'd better create a session before
>>> executing
>>> encryption
>>>> Or decryption, because the session usually be kept for a specific
>>>> algorithm with specific key in the production environment. And if we
>>> only
>>> change the iv,
>>>> we don't need to re-create the session.
>>>
>>> I think we have a slight misunderstanding here :)
>>>
>>> The current workflow is
>>>
>>>   -> create session
>>>   <- session key
>>>   -> data in
>>>   <- data out
>>>   ...
>>>   <- close session
>>>   -> ack
>>>
>>> That means that at least for the first packet you have at least one full
>>> round-trip cost from guest to host to guest to be able to send any data.
>>>
>> Yes, that's true...
>>
>>> That sounds pretty expensive to me on the latency side. There are
>>> multiple ways to mitigate that. One idea was to have a separate path in
>>> parallel to the create session + data + close session dance that would
>>> combine them all into a single command. You would still have the session
>>> based version, but accelerate the one-blob case.
>>>
>>> Another idea would be to make the guest be the session id janitor. Then
>>> you could just do
>>>
>>>   -> create session with key X
>>>   -> data in
>>>   <- data out
>>>   ...
>>>
>>> so you save the round trip, if you combine command and data queues, as
>>> then the create and data bits are serialized by their position in the
>>> queue.
>>>
>> ... But for lots of crypto requests, the exhaust is low:
>>
>> -> create session with key X
>> <- session id
>>    //do something in the guest if you like
>> -> data in with session_id
>> -> data in with session_id
>> -> data in with session_id
>>    ... ...    (fill out data virtqueue)
>> <-data out
>> <-data out
>>    <-data out
>>
>> And this is what the production environment do currently.
>>
>>>>
>>>> For the asymmetric algorithms, we don't need create a session IIRC.
>>>>
>>>> So, I don't think this is a performance degradation, but a performance
>>> enhancement.
>>>>
>>>>> Also, I don't fully understand the split between control and data
>>>>> queues. As far as I read things, the control queue is used to create
>>>>> session ids and the data queues can then be used to push data. Is
>>> there
>>>>> any particular reason for the split? One thing that seems natural to
>>> me
>>>>> would be to have sessions be per-queue, so you would create a
>>> session on
>>>>> a particular queue and only have it ever be available there. That way
>>>>> you get rid of any locking for sessions.
>>>>>
>>>> We want to keep a unify request type (structure) for data queue, so
>>> we can
>>>> keep the session operation in the control queue. Of course the
>>> control queue
>>>> only used to create sessions currently, but we can extend its
>>> functions if
>>> needed
>>>> in the future.
>>>
>>> I still don't understand. With separate control+data queues you just get
>>> yourself into synchronization troubles. Both struct
>>> virtio_crypto_ctrl_header and struct virtio_crypto_op_header already
>>> have an opcode as first le32 field. You can easily use that to determine
>>> both length of the payload as well as command (control vs data).
>>>
>>
>> There is a big problem that the control handle logic is synchronization,
>> but the data queue
>> handling logic is asynchronization. We can't combine them into one queue.
>> It will decrease the performance because you need indentify each packet
>> if we do this forcedly.
> Are you saying that control and data operations are handled by separate
> "blocks²?
> If you combined control and data queues, there would have to be a (SW)
> demultiplexer
> that would add overhead (and potentially decrease throughout) especially
> for the data
> operations?

Uh, the multiplexer is as simple as a switch() statement on the opcode,
no? It might stall that one particular queue, but that sounds like
something you can improve by increasing the number of queues (and invent
something smart to ease polling of activity on more queues).

I think the main point Gonglei is bringing up here is that session
creation needs to finish before we can start firing off packets to that
session. That's another point that I dislike about the approach. I can't
think of any reason why session creation should be synchronous if you
allow the guest to handle host<->guest session tokens.

Data ordering needs to be consistent either way if you handle IVs in
hardware, so you naturally order session creation and data transmission
for a particular session.

But again, I really might just be missing something basic.

> And in order to truly benefit from (HW) crypto acceleration (especially
> for smaller
> data sizes), you want to minimise the amount of SW processing in the data
> path.

I think we're on full agreement on that one :). That's the whole point.
If you have 256 cores, you don't want all of your crypto sessions go
over a single bottleneck control queue. Creating sessions is a semi-hot
path.


Alex
Gonglei (Arei) Sept. 3, 2016, 2:51 a.m. UTC | #9
Hi Alex,

> -----Original Message-----
> From: Alexander Graf [mailto:agraf@suse.de]
> Sent: Friday, September 02, 2016 9:53 PM
> Euler)
> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
> 
> 
> 
> On 02.09.16 12:26, Gonglei (Arei) wrote:
> >
> >> -----Original Message-----
> >> From: Alexander Graf [mailto:agraf@suse.de]
> >> Sent: Friday, September 02, 2016 4:07 PM
> >> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
> specification
> >>
> >>
> >>
> >> On 02.09.16 05:08, Gonglei (Arei) wrote:
> >>> Hi Alex,
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Alexander Graf [mailto:agraf@suse.de]
> >>>> Sent: Thursday, September 01, 2016 9:37 PM
> >>>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
> >> specification
> >>>>
> >>>> On 08/30/2016 02:12 PM, Gonglei wrote:
> >>>>> The virtio crypto device is a virtual crypto device (ie. hardware
> >>>>> crypto accelerator card). The virtio crypto device can provide
> >>>>> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.
> >>>>>
> >>>>> In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
> >>>>
> >>>> I have mostly a few high level comments.
> >>>>
> >>>> For starters, a lot of the structs rely on the compiler to pad them to
> >>>> natural alignment. That may get us into trouble when trying to emulate a
> >>>> virtio device on a host with different guest architecture (like arm on
> >>>> x86). It'd be a good idea to manually pad everything to be 64bit aligned
> >>>> - then all fields are always at the same spot.
> >>>>
> >>> Good point! I'll do this in the next version. Thanks!
> >>>
> >>>> I also have a hard time getting my head around the final flow of
> >>>> everything. Do I always have to create a session to be able to emit a
> >>>> command? In that case, doesn't that slow down everything, since a
> >>>> request would then need to wait for the host reply to receive its
> >>>> session id? There should be some way to fire off a simple non-iv
> >>>> operation without any session set up imho.
> >>>>
> >>> For symmetric algorithms, we'd better create a session before executing
> >> encryption
> >>> Or decryption, because the session usually be kept for a specific
> >>> algorithm with specific key in the production environment. And if we only
> >> change the iv,
> >>> we don't need to re-create the session.
> >>
> >> I think we have a slight misunderstanding here :)
> >>
> >> The current workflow is
> >>
> >>   -> create session
> >>   <- session key
> >>   -> data in
> >>   <- data out
> >>   ...
> >>   <- close session
> >>   -> ack
> >>
> >> That means that at least for the first packet you have at least one full
> >> round-trip cost from guest to host to guest to be able to send any data.
> >>
> > Yes, that's true...
> >
> >> That sounds pretty expensive to me on the latency side. There are
> >> multiple ways to mitigate that. One idea was to have a separate path in
> >> parallel to the create session + data + close session dance that would
> >> combine them all into a single command. You would still have the session
> >> based version, but accelerate the one-blob case.
> >>
> >> Another idea would be to make the guest be the session id janitor. Then
> >> you could just do
> >>
> >>   -> create session with key X
> >>   -> data in
> >>   <- data out
> >>   ...
> >>
> >> so you save the round trip, if you combine command and data queues, as
> >> then the create and data bits are serialized by their position in the queue.
> >>
> > ... But for lots of crypto requests, the exhaust is low:
> >
> > 	-> create session with key X
> > 	<- session id
> >     //do something in the guest if you like
> >  	-> data in with session_id
> > 	-> data in with session_id
> > 	-> data in with session_id
> >     ... ...    (fill out data virtqueue)
> > 	<-data out
> > 	<-data out
> >     <-data out
> >
> > And this is what the production environment do currently.
> 
> Does this also apply to a web server handling lots of small requests?
> Does crypto acceleration even make sense in such an environment or would
> it be cheaper to simply to the crypto on the CPU for such small payloads?
> 
> I'm seriously asking :).
> 
I think I get your points now. You mean for one-blob request (different packets have different algorithms or key), 
then the exhaust is too higher under current approach? That's true, because for each packet
we do have the below workflow:

 Guest -> Host

 -> create session
 <- session id
 -> data in
 <- data out
 <- close session

This is a big problem indeed.

> >
> >>>
> >>> For the asymmetric algorithms, we don't need create a session IIRC.
> >>>
> >>> So, I don't think this is a performance degradation, but a performance
> >> enhancement.
> >>>
> >>>> Also, I don't fully understand the split between control and data
> >>>> queues. As far as I read things, the control queue is used to create
> >>>> session ids and the data queues can then be used to push data. Is there
> >>>> any particular reason for the split? One thing that seems natural to me
> >>>> would be to have sessions be per-queue, so you would create a session on
> >>>> a particular queue and only have it ever be available there. That way
> >>>> you get rid of any locking for sessions.
> >>>>
> >>> We want to keep a unify request type (structure) for data queue, so we can
> >>> keep the session operation in the control queue. Of course the control
> queue
> >>> only used to create sessions currently, but we can extend its functions if
> >> needed
> >>> in the future.
> >>
> >> I still don't understand. With separate control+data queues you just get
> >> yourself into synchronization troubles. Both struct
> >> virtio_crypto_ctrl_header and struct virtio_crypto_op_header already
> >> have an opcode as first le32 field. You can easily use that to determine
> >> both length of the payload as well as command (control vs data).
> >>
> >
> > There is a big problem that the control handle logic is synchronization, but the
> data queue
> > handling logic is asynchronization. We can't combine them into one queue.
> > It will decrease the performance because you need indentify each packet
> > if we do this forcedly.
> 
> I might again just not grasp the full picture here. Is the session key
> something global you get from hardware?
> 
The specific algorithm and key, key length etc. can exclusively determine a session,
which is a software concept IMO, in order to improve the performance for lots of
packets with the same algo and key while executing crypto operation. A session is a
handle which describes the cryptographic parameters to be applied to a number of
buffers.

> In that case, it might be a good idea to take a look at all existing
> crypto accelerator implementations we know of today and see which model
> works best across the board.
> 
Mihai, what about your crypto accelerators for SOC?

> I also think it's probably a good idea to not pass hardware session ids
> into the guest, so you'd want to create a separate lookup table that
> converts from virtio session to hardware session. That way hardware that
> does not have global session tokens can actually leverage multiple CPUs
> hammering it.
> 
The retuned session_id is an index based on software realization, not hardware session ids.
We truly need to create a separate lookup table.

> 
> Alex

Regards,
-Gonglei
Gonglei (Arei) Sept. 3, 2016, 3:18 a.m. UTC | #10
Hi,

> -----Original Message-----
> From: Alexander Graf [mailto:agraf@suse.de]
> Sent: Friday, September 02, 2016 10:05 PM
> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
> 
> 
> 
> On 02.09.16 14:16, Ola Liljedahl wrote:
> >
> >
> > On 02/09/2016, 12:26, "Gonglei (Arei)" <arei.gonglei@huawei.com> wrote:
> >
> >>
> >>> -----Original Message-----
> >>> From: Alexander Graf [mailto:agraf@suse.de]
> >>> Sent: Friday, September 02, 2016 4:07 PM
> >>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
> >>> specification
> >>>
> >>>
> >>>
> >>> On 02.09.16 05:08, Gonglei (Arei) wrote:
> >>>> Hi Alex,
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Alexander Graf [mailto:agraf@suse.de]
> >>>>> Sent: Thursday, September 01, 2016 9:37 PM
> >>>>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
> >>> specification
> >>>>>
> >>>>> On 08/30/2016 02:12 PM, Gonglei wrote:
> >>>>>> The virtio crypto device is a virtual crypto device (ie. hardware
> >>>>>> crypto accelerator card). The virtio crypto device can provide
> >>>>>> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM,
> PRIMITIVE.
> >>>>>>
> >>>>>> In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
> >>>>>
> >>>>> I have mostly a few high level comments.
> >>>>>
> >>>>> For starters, a lot of the structs rely on the compiler to pad them
> >>> to
> >>>>> natural alignment. That may get us into trouble when trying to
> >>> emulate a
> >>>>> virtio device on a host with different guest architecture (like arm
> >>> on
> >>>>> x86). It'd be a good idea to manually pad everything to be 64bit
> >>> aligned
> >>>>> - then all fields are always at the same spot.
> >>>>>
> >>>> Good point! I'll do this in the next version. Thanks!
> >>>>
> >>>>> I also have a hard time getting my head around the final flow of
> >>>>> everything. Do I always have to create a session to be able to emit a
> >>>>> command? In that case, doesn't that slow down everything, since a
> >>>>> request would then need to wait for the host reply to receive its
> >>>>> session id? There should be some way to fire off a simple non-iv
> >>>>> operation without any session set up imho.
> >>>>>
> >>>> For symmetric algorithms, we'd better create a session before
> >>> executing
> >>> encryption
> >>>> Or decryption, because the session usually be kept for a specific
> >>>> algorithm with specific key in the production environment. And if we
> >>> only
> >>> change the iv,
> >>>> we don't need to re-create the session.
> >>>
> >>> I think we have a slight misunderstanding here :)
> >>>
> >>> The current workflow is
> >>>
> >>>   -> create session
> >>>   <- session key
> >>>   -> data in
> >>>   <- data out
> >>>   ...
> >>>   <- close session
> >>>   -> ack
> >>>
> >>> That means that at least for the first packet you have at least one full
> >>> round-trip cost from guest to host to guest to be able to send any data.
> >>>
> >> Yes, that's true...
> >>
> >>> That sounds pretty expensive to me on the latency side. There are
> >>> multiple ways to mitigate that. One idea was to have a separate path in
> >>> parallel to the create session + data + close session dance that would
> >>> combine them all into a single command. You would still have the session
> >>> based version, but accelerate the one-blob case.
> >>>
> >>> Another idea would be to make the guest be the session id janitor. Then
> >>> you could just do
> >>>
> >>>   -> create session with key X
> >>>   -> data in
> >>>   <- data out
> >>>   ...
> >>>
> >>> so you save the round trip, if you combine command and data queues, as
> >>> then the create and data bits are serialized by their position in the
> >>> queue.
> >>>
> >> ... But for lots of crypto requests, the exhaust is low:
> >>
> >> -> create session with key X
> >> <- session id
> >>    //do something in the guest if you like
> >> -> data in with session_id
> >> -> data in with session_id
> >> -> data in with session_id
> >>    ... ...    (fill out data virtqueue)
> >> <-data out
> >> <-data out
> >>    <-data out
> >>
> >> And this is what the production environment do currently.
> >>
> >>>>
> >>>> For the asymmetric algorithms, we don't need create a session IIRC.
> >>>>
> >>>> So, I don't think this is a performance degradation, but a performance
> >>> enhancement.
> >>>>
> >>>>> Also, I don't fully understand the split between control and data
> >>>>> queues. As far as I read things, the control queue is used to create
> >>>>> session ids and the data queues can then be used to push data. Is
> >>> there
> >>>>> any particular reason for the split? One thing that seems natural to
> >>> me
> >>>>> would be to have sessions be per-queue, so you would create a
> >>> session on
> >>>>> a particular queue and only have it ever be available there. That way
> >>>>> you get rid of any locking for sessions.
> >>>>>
> >>>> We want to keep a unify request type (structure) for data queue, so
> >>> we can
> >>>> keep the session operation in the control queue. Of course the
> >>> control queue
> >>>> only used to create sessions currently, but we can extend its
> >>> functions if
> >>> needed
> >>>> in the future.
> >>>
> >>> I still don't understand. With separate control+data queues you just get
> >>> yourself into synchronization troubles. Both struct
> >>> virtio_crypto_ctrl_header and struct virtio_crypto_op_header already
> >>> have an opcode as first le32 field. You can easily use that to determine
> >>> both length of the payload as well as command (control vs data).
> >>>
> >>
> >> There is a big problem that the control handle logic is synchronization,
> >> but the data queue
> >> handling logic is asynchronization. We can't combine them into one queue.
> >> It will decrease the performance because you need indentify each packet
> >> if we do this forcedly.
> > Are you saying that control and data operations are handled by separate
> > "blocks²?
> > If you combined control and data queues, there would have to be a (SW)
> > demultiplexer
> > that would add overhead (and potentially decrease throughout) especially
> > for the data
> > operations?
> 
> Uh, the multiplexer is as simple as a switch() statement on the opcode,
> no? It might stall that one particular queue, but that sounds like
> something you can improve by increasing the number of queues (and invent
> something smart to ease polling of activity on more queues).
> 
Actually, the virtio multiple queue's capacity is based on the backend crypto device,
like multiple queue tap device work with virtio-net.

> I think the main point Gonglei is bringing up here is that session
> creation needs to finish before we can start firing off packets to that
> session. That's another point that I dislike about the approach. I can't
> think of any reason why session creation should be synchronous if you
> allow the guest to handle host<->guest session tokens.
> 
> Data ordering needs to be consistent either way if you handle IVs in
> hardware, so you naturally order session creation and data transmission
> for a particular session.
> 
> But again, I really might just be missing something basic.
> 
Assuming combine control and data queue into one queue, let's imagine the below scenarios:

The synchronization scenario is ok:
 
1. Guest send packet A with session and data information to Host, opcode=create session
2. Host creates session then executes crypto operation with session created
3. Host return the out data to Guest (with session id for next crypto operation?)
4. Guest send packet B with session id and data information to Host, opcode= crypto operation
5. Host searches the global transfer table with session id to get hardware session.
6. Host executes crypto operation
7. Host return the out data to Guest.
... ...

How about the asynchronization scenario? (assuming that the below packets have same algo and key etc.)

1. Guest send packet A with session and data information to Host, opcode=create session
2. Guest send packet B with session and data information to Host, opcode=create session or crypto operation?
3. Guest send packet C with session and data information to Host, opcode=create session or crypto operation?
... ...

Each packet need to create a session (though they are the same), we can't get the best performance.

> > And in order to truly benefit from (HW) crypto acceleration (especially
> > for smaller
> > data sizes), you want to minimise the amount of SW processing in the data
> > path.
> 
> I think we're on full agreement on that one :). That's the whole point.
> If you have 256 cores, you don't want all of your crypto sessions go
> over a single bottleneck control queue. Creating sessions is a semi-hot
> path.
> 
Please refer to my another reply, it's indeed a problem.

> 
> Alex

Regards,
-Gonglei
Gonglei (Arei) Sept. 3, 2016, 3:44 a.m. UTC | #11
Hi Stefan,


> -----Original Message-----
> From: Stefan Hajnoczi [mailto:stefanha@redhat.com]
> Sent: Friday, September 02, 2016 8:07 PM
> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
> 
> On Tue, Aug 30, 2016 at 08:12:15PM +0800, Gonglei wrote:
> 
> Hi,
> I have read through part of the spec and suggest mostly grammar fixes
> below.
> 
Thank you very much! Forgive me for my poor English please :)
I'll fix them in the following version.

> > The virtio crypto device is a virtual crypto device (ie. hardware
> > crypto accelerator card). The virtio crypto device can provide
> > five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.
> >
> > In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
> >
> > Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> > CC: Michael S. Tsirkin <mst@redhat.com>
> > CC: Cornelia Huck <cornelia.huck@de.ibm.com>
> > CC: Stefan Hajnoczi <stefanha@redhat.com>
> > CC: Lingli Deng <denglingli@chinamobile.com>
> > CC: Jani Kokkonen <Jani.Kokkonen@huawei.com>
> > CC: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > CC: Varun Sethi <Varun.Sethi@freescale.com>
> > CC: Zeng Xin <xin.zeng@intel.com>
> > CC: Keating Brian <brian.a.keating@intel.com>
> > CC: Ma Liang J <liang.j.ma@intel.com>
> > CC: Griffin John <john.griffin@intel.com>
> > CC: Hanweidong <hanweidong@huawei.com>
> > CC: Mihai Claudiu Caraman <mike.caraman@nxp.com>
> > ---
> >  content.tex       |   2 +
> >  virtio-crypto.tex | 835
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 837 insertions(+)
> >  create mode 100644 virtio-crypto.tex
> >
> > diff --git a/content.tex b/content.tex
> > index 4b45678..ab75f78 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -5750,6 +5750,8 @@ descriptor for the \field{sense_len},
> \field{residual},
> >  \field{status_qualifier}, \field{status}, \field{response} and
> >  \field{sense} fields.
> >
> > +\input{virtio-crypto.tex}
> > +
> >  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> >
> >  Currently there are three device-independent feature bits defined:
> > diff --git a/virtio-crypto.tex b/virtio-crypto.tex
> > new file mode 100644
> > index 0000000..491fc25
> > --- /dev/null
> > +++ b/virtio-crypto.tex
> > @@ -0,0 +1,835 @@
> > +\section{Crypto Device}\label{sec:Device Types / Crypto Device}
> > +
> > +The virtio crypto device is a virtual crypto device, and is a kind of
> > +virtual hardware accelerators for virtual machine.  The encryption and
> 
> s/accelerators/accelerator/
> s/virtual machine/virtual machines/
> 
> > +decryption requests of are placed in the data queue, and handled by the
> 
> s/of//
> 
> > +real crypto accelerators finally. The second queue is the control queue,
> > +which is used to create or destroy session for symmetric algorithms, and
> 
> s/session/sessions/
> 
> > +control some advanced features in the future. The virtio crypto
> > +device can provide seven crypto services: CIPHER, MAC, HASH, AEAD,
> > +KDF, ASYM, PRIMITIVE.
> > +
> > +\subsection{Device ID}\label{sec:Device Types / Crypto Device / Device ID}
> > +
> > +20
> > +
> > +\subsection{Virtqueues}\label{sec:Device Types / Crypto Device /
> Virtqueues}
> > +
> > +\begin{description}
> > +\item[0] dataq1
> > +\item[\ldots]
> > +\item[N-1] dataqN
> > +\item[N] controlq
> > +\end{description}
> > +
> > +N is set by \field{max_dataqueues} (\field{max_dataqueues} >= 1).
> 
> I suggest dropping (\field{max_dataqueues} >= 1) since this constraint
> already is expressed in the device normative section below.  Things
> can go out of sync if they are duplicated.
> 
> > +
> > +\subsection{Feature bits}\label{sec:Device Types / Crypto Device / Feature
> bits}
> > +  None currently defined
> > +
> > +\subsection{Device configuration layout}\label{sec:Device Types / Crypto
> Device / Device configuration layout}
> > +
> > +Thirteen driver-read-only configuration fields are currently defined.
> 
> I count only 12 fields.  I suggest saying "The following
> driver-read-only configuration fields are currently defined:" instead.
> 
> > +\begin{lstlisting}
> > +struct virtio_crypto_config {
> > +    le16  status;
> > +    le16  max_dataqueues;
> > +    le32  crypto_services;
> > +    /* detailed algorithms mask */
> > +    le32 cipher_algo_l;
> > +    le32 cipher_algo_h;
> > +    le32 hash_algo;
> > +    le32 mac_algo_l;
> > +    le32 mac_algo_h;
> > +    le32 asym_algo;
> > +    le32 kdf_algo;
> > +    le32 aead_algo;
> > +    le32 primitive_algo;
> > +};
> > +\end{lstlisting}
> > +
> > +The first field, \field{status} is currently defined:
> VIRTIO_CRYPTO_S_HW_READY.
> > +
> > +\begin{lstlisting}
> > +#define VIRTIO_CRYPTO_S_HW_READY  (1 << 0)
> > +\end{lstlisting}
> > +
> > +The following driver-read-only field, \field{max_dataqueuess} specifies the
> > +maximum number of data virtqueues (dataq1\ldots dataqN). The
> crypto_services
> > +shows the crypto services the virtio crypto supports. The service currently
> 
> s/service/services/
> 
> > +defined are:
> > +
> > +\begin{lstlisting}
> > +#define VIRTIO_CRYPTO_NO_SERVICE (0) /* cipher services */
> > +#define VIRTIO_CRYPTO_SERVICE_CIPHER (1) /* cipher services */
> 
> How are these constants used: 1 << VIRTIO_CRYPTO_NO_SERVICE?
> 
> I suggest eliminating the 0 bit constants since they can be deduced from
> the fact that all other bits are zeroed.  There is no need for a
> dedicated "NO_SERVICE" constant.
> 
Good points.

> > +#define VIRTIO_CRYPTO_SERVICE_HASH  (2) /* hash service */
> > +#define VIRTIO_CRYPTO_SERVICE_MAC   (3) /* MAC (Message
> Authentication Codes) service */
> > +#define VIRTIO_CRYPTO_SERVICE_AEAD  (4) /* AEAD(Authenticated
> Encryption with Associated Data) service */
> > +\end{lstlisting}
> > +
> > +The last driver-read-only fields specify detailed algorithms mask which
> 
> s/mask/masks/
> 
> > +the device offered for corresponding service. The below CIPHER algorithms
> 
> s/the device offered for corresponding service/the device offers for
> corresponding services/
> 
> > +are defined currently:
> > +
> > +\begin{lstlisting}
> > +#define VIRTIO_CRYPTO_NO_CIPHER                 0
> > +#define VIRTIO_CRYPTO_CIPHER_ARC4               1
> > +#define VIRTIO_CRYPTO_CIPHER_AES_ECB            2
> > +#define VIRTIO_CRYPTO_CIPHER_AES_CBC            3
> > +#define VIRTIO_CRYPTO_CIPHER_AES_CTR            4
> > +#define VIRTIO_CRYPTO_CIPHER_DES_ECB            5
> > +#define VIRTIO_CRYPTO_CIPHER_DES_CBC            6
> > +#define VIRTIO_CRYPTO_CIPHER_3DES_ECB           7
> > +#define VIRTIO_CRYPTO_CIPHER_3DES_CBC           8
> > +#define VIRTIO_CRYPTO_CIPHER_3DES_CTR           9
> > +#define VIRTIO_CRYPTO_CIPHER_KASUMI_F8          10
> > +#define VIRTIO_CRYPTO_CIPHER_SNOW3G_UEA2        11
> > +#define VIRTIO_CRYPTO_CIPHER_AES_F8             12
> > +#define VIRTIO_CRYPTO_CIPHER_AES_XTS            13
> > +#define VIRTIO_CRYPTO_CIPHER_ZUC_EEA3           14
> > +\end{lstlisting}
> > +
> > +The below HASH algorithms are defined currently:
> > +
> > +\begin{lstlisting}
> > +#define VIRTIO_CRYPTO_NO_HASH            0
> > +#define VIRTIO_CRYPTO_HASH_MD5           1
> > +#define VIRTIO_CRYPTO_HASH_SHA1          2
> > +#define VIRTIO_CRYPTO_HASH_SHA_224       3
> > +#define VIRTIO_CRYPTO_HASH_SHA_256       4
> > +#define VIRTIO_CRYPTO_HASH_SHA_384       5
> > +#define VIRTIO_CRYPTO_HASH_SHA_512       6
> > +#define VIRTIO_CRYPTO_HASH_SHA3_224      7
> > +#define VIRTIO_CRYPTO_HASH_SHA3_256      8
> > +#define VIRTIO_CRYPTO_HASH_SHA3_384      9
> > +#define VIRTIO_CRYPTO_HASH_SHA3_512      10
> > +#define VIRTIO_CRYPTO_HASH_SHA3_SHAKE128      11
> > +#define VIRTIO_CRYPTO_HASH_SHA3_SHAKE256      12
> > +\end{lstlisting}
> 
> Perhaps all masks should be 64-bit because the 32-bit hash_algo field
> already has 13 bits defined so room for future expansion is limited.
> 
Actually the cipher's mask and mac's mask had used 64-bit. 
Xin, what's your opinion?

> > +
> > +The below MAC algorithms are defined currently:
> > +
> > +\begin{lstlisting}
> > +#define VIRTIO_CRYPTO_NO_MAC                       0
> > +#define VIRTIO_CRYPTO_MAC_HMAC_MD5                 1
> > +#define VIRTIO_CRYPTO_MAC_HMAC_SHA1                2
> > +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_224             3
> > +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_256             4
> > +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_384             5
> > +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_512             6
> > +#define VIRTIO_CRYPTO_MAC_CMAC_3DES                25
> > +#define VIRTIO_CRYPTO_MAC_CMAC_AES                 26
> > +#define VIRTIO_CRYPTO_MAC_CMAC_KASUMI_F9           27
> > +#define VIRTIO_CRYPTO_MAC_CMAC_SNOW3G_UIA2         28
> > +#define VIRTIO_CRYPTO_MAC_GMAC_AES                 41
> > +#define VIRTIO_CRYPTO_MAC_GMAC_TWOFISH             42
> > +#define VIRTIO_CRYPTO_MAC_CBCMAC_AES               49
> > +#define VIRTIO_CRYPTO_MAC_CBCMAC_KASUMI_F9         50
> > +#define VIRTIO_CRYPTO_MAC_XCBC_AES                 53
> > +#define VIRTIO_CRYPTO_MAC_ZUC_EIA3                 54
> > +\end{lstlisting}
> > +
> > +The below AEAD algorithms are defined currently:
> > +
> > +\begin{lstlisting}
> > +#define VIRTIO_CRYPTO_NO_AEAD     0
> > +#define VIRTIO_CRYPTO_AEAD_GCM    1
> > +#define VIRTIO_CRYPTO_AEAD_CCM    2
> > +#define VIRTIO_CRYPTO_AEAD_CHACHA20_POLY1305  3
> > +\end{lstlisting}
> > +
> > +\subsubsection{Device Requirements: Device configuration
> layout}\label{sec:Device Types / Crypto Device / Device configuration layout /
> Device Requirements: Device configuration layout}
> > +
> > +\begin{itemize*}
> 
> This section should use \devicenormative{\subsection}{...}.  I'm not
> sure why you used a regular \subsubsection{} followed by
> \begin{itemize*}.
> 
Ok.
> > +\item The device MUST set \field{max_dataqueues} to between 1 and 65535
> inclusive.
> > +\item The device SHOULD set \field{status} according to the status of the
> hardware-backed implementation.
> > +\item The device MUST set \field{crypto_services} according to the crypto
> services which the device offered.
> > +\item The device MUST set detailed algorithms mask according to
> \field{crypto_services} field.
> > +\end{itemize*}
> > +
> > +\subsubsection{Driver Requirements: Device configuration
> layout}\label{sec:Device Types / Crypto Device / Device configuration layout /
> Driver Requirements: Device configuration layout}
> > +
> > +\begin{itemize*}
> 
> \drivernormative{\subsection}{...}
> 
> > +\item The driver MUST read the ready \field{status} from the bottom bit of
> status to
> > +      check whether the hardware-backed implementation is ready or not.
> > +\item The driver MAY read \field{max_dataqueues} field to discover how
> many data queues the device supports.
> > +\item The driver MUST read \field{crypto_services} field to discover which
> services the device is able to offer.
> > +\item The driver MUST read detail \field{algorithms} field according to
> \field{crypto_services} field.
> 
> s/detail/the detailed/
> 
> > +\end{itemize*}
> > +
> > +\subsection{Device Initialization}\label{sec:Device Types / Crypto Device /
> Device Initialization}
> > +
> > +\subsubsection{Driver Requirements: Device Initialization}\label{sec:Device
> Types / Crypto Device / Device Initialization / Driver Requirements: Device
> Initialization}
> > +
> > +\begin{itemize*}
> > +\item The driver MUST identify and initialize data virtqueue, up to
> \field{max_dataqueues}.
> 
> "The driver MUST identify and initialize up to \field{max_dataqueues}
> data virtqueues."
> 
> Besides the grammar fixes I think this version makes it clearer that the
> driver does not have to initialize all data virtqueues if it wants to
> use fewer.
> 
> > +\item The driver MUST identify the control virtqueue.
> > +\item The driver MUST identify the ready status of hardware-backend from
> \field{status} field.
> > +\item The driver MUST read the supported crypto services from bits of
> \field{crypto_servies}.
> > +\item The driver MUST read the supported algorithms according to
> \field{crypto_services} field.
> > +\end{itemize*}
> > +
> > +\subsubsection{Device Requirements: Device Initialization}\label{sec:Device
> Types / Crypto Device / Device Initialization / Device Requirements: Device
> Initialization}
> > +
> > +\begin{itemize*}
> > +\item The device MUST be configured at least one real accelerator as the
> backend accelerator which executes real crypto operations.
> 
> The spec does not have to require a "real" accelerator.  A pure software
> implementation should be able to conform to the spec.  I would drop this
> line.
> 
Maybe we can change this line with 
" The device MUST be configured at least one accelerator which executes crypto operations "
I think this is better than drop it.

> > +\item The device MUST write the \field{crypto_services} field according to
> the capacities of the backend accelerator.
> > +\item The device MUST write the corresponding algorithms field according
> to the \field{crypto_services} field.
> 
> "Device Types / Crypto Device / Device configuration layout / Device
> Requirements: Device configuration layout" already expresses the same thing.
> This text seems unnecessary.
> 
> > +\end{itemize*}
> > +
> > +\subsection{Device Operation}\label{sec:Device Types / Crypto Device /
> Device Operation}
> > +
> > +Packets can be transmitted by placing them in both the controlq and dataq.
> > +The packet are consisted of general header and services specific request,
> 
> Grammar: "Packets consist of a generic header and a service-specific request."
> 
> > +Where 'general header' is for all crypto request, 'service specific request'
> 
> s/request/requests/
> 
> > +is composed of operation parameter + output data + input data in generally.
> 
> s/is/are/
> s/in generally/in general/
> 
> > +Operation parameters are algorithm-specific parameters, output data is the
> > +data should be operated, input data is the "operation result + result buffer".
> > +The general header of controlq:
> > +
> > +\begin{lstlisting}
> > +#define VIRTIO_CRYPTO_OPCODE(service, op)   ((service << 8) | (op))
> > +
> > +struct virtio_crypto_ctrl_header{
> > +#define VIRTIO_CRYPTO_CIPHER_CREATE_SESSION
> VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x02)
> > +#define VIRTIO_CRYPTO_CIPHER_DESTROY_SESSION
> VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x03)
> > +#define VIRTIO_CRYPTO_HASH_CREATE_SESSION
> VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_HASH, 0x02)
> > +#define VIRTIO_CRYPTO_HASH_DESTROY_SESSION
> VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_HASH, 0x03)
> > +#define VIRTIO_CRYPTO_MAC_CREATE_SESSION
> VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_MAC, 0x02)
> > +#define VIRTIO_CRYPTO_MAC_DESTROY_SESSION
> VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_MAC, 0x03)
> > +#define VIRTIO_CRYPTO_AEAD_CREATE_SESSION
> VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x02)
> > +#define VIRTIO_CRYPTO_AEAD_DESTROY_SESSION
> VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x03)
> > +    le32 opcode;
> > +    /* algo should be service-specific algorithms */
> > +    le32 algo;
> > +    /* control flag to control the request */
> > +    le16 flag;
> > +    /* data virtqeueu id */
> 
> s/virtqeueu/virtqueue/
> 
> > +    le16 queue_id;
> > +};
> > +\end{lstlisting}
> > +
> > +The general header of dataq:
> > +
> > +\begin{lstlisting}
> > +struct virtio_crypto_op_header {
> > +#define VIRTIO_CRYPTO_CIPHER_ENCRYPT
> VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x00)
> > +#define VIRTIO_CRYPTO_CIPHER_DECRYPT
> VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x01)
> > +#define VIRTIO_CRYPTO_HASH
> VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_HASH, 0x00)
> > +#define VIRTIO_CRYPTO_MAC
> VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_MAC, 0x00)
> > +#define VIRTIO_CRYPTO_AEAD_ENCRYPT
> VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x00)
> > +#define VIRTIO_CRYPTO_AEAD_DECRYPT
> VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x01)
> > +    le32 opcode;
> > +    /* algo should be service-specific algorithms */
> > +    le32 algo;
> > +    /* control flag to control the request */
> > +    le16 flag;
> > +    /* data virtqeueu id */
> 
> s/virtqeueu/virtqueue/
> 
> > +    le16 queue_id;
> > +    /* session_id should be service-specific algorithms */
> > +    le64 session_id;
> 
> This field is not 64-bit aligned.  The compiler will insert 32-bit
> padding.  Please move the field to a 64-bit aligned boundary so that the
> binary layout does not change according to the compiler/architecture.
> 
Yes, as Alex's comments, I'll redefine all structures to keep 64bit aligned.

> > +};
> > +\end{lstlisting}
> > +
> > +\subsubsection{Control virtqueue}\label{sec:Device Types / Crypto Device /
> Device Operation / Control virtqueue}
> > +
> > +The driver uses the control virtqueue to send control commands to the
> > +device which finish the non-data plane operations, such as session
> 
> s/finish/handles/
> 
> > +operations (see \ref{sec:Device Types / Crypto Device / Device Operation /
> Session operation}).
> > +The packet of controlq:
> > +
> > +\begin{lstlisting}
> > +struct virtio_crypto_op_ctrl_req {
> > +    struct virtio_crypto_ctrl_header header;
> > +
> > +    union {
> > +        struct virtio_crypto_sym_create_session_req
> sym_create_session;
> > +        struct virtio_crypto_hash_create_session_req
> hash_create_session;
> > +        struct virtio_crypto_mac_create_session_req
> mac_create_session;
> > +        struct virtio_crypto_aead_create_session_req
> aead_create_session;
> > +        struct virtio_crypto_destroy_session_req
> sym_destroy_session;
> > +    } u;
> > +};
> > +\end{lstlisting}
> > +
> > +The header is the general header, the union is algorithm-specific type,
> 
> /is algorithm-specific type/is an algorithm-specific type/
> 
> > +which is set by the driver. All the properties in the union will be shown as
> follow.
> 
> s/follow/follows/
> 
> > +
> > +\subsubsection{Session operation}\label{sec:Device Types / Crypto Device /
> Device Operation / Session operation}
> > +
> > +The symmetric algorithms have the concept of sessions. A session is a
> > +handle which describes the cryptographic parameters to be applied to
> > +a number of buffers. The data within a session handle includes the following:
> > +
> > +\begin{enumerate}
> > +\item The operation (cipher, hash/mac or both, and if both, the order in
> > +      which the algorithms should be applied).
> > +\item The cipher setup data, including the cipher algorithm and mode,
> > +      the key and its length, and the direction (encrypt or decrypt).
> > +\item The hash/mac setup data, including the hash algorithm or mac
> algorithm,
> > +      and digest result length (to allow for truncation).
> > +\begin{itemize*}
> > +\item Authenticated mode can refer to MAC, which requires that the key
> and
> > +      its length are also specified.
> > +\item For nested mode, the inner and outer prefix data and length are
> specified,
> > +      as well as the outer hash algorithm.
> > +\end{itemize*}
> > +\end{enumerate}
> > +
> > +\paragraph{Session operation: HASH session}\label{sec:Device Types /
> Crypto Device / Device Operation / Session operation / Session operation: HASH
> session}
> > +
> > +The packet of HASH session is:
> > +
> > +\begin{lstlisting}
> > +struct virtio_crypto_hash_session_para {
> > +    /* See VIRTIO_CRYPTO_HASH_* above */
> > +    le32 algo;
> > +    /* hash result length */
> > +    le32 hash_result_len;
> > +};
> > +struct virtio_crypto_hash_create_session_req {
> > +    // Device-readable part
> > +    struct virtio_crypto_hash_session_para parameter;
> > +    // Device-writable part
> > +    le64    session_id;
> > +    u8     status;
> > +};
> > +\end{lstlisting}
> > +
> > +\paragraph{Session operation: MAC session}\label{sec:Device Types /
> Crypto Device / Device
> > +Operation / Session operation / Session operation: MAC session}
> > +
> > +The packet of MAC session is:
> > +
> > +\begin{lstlisting}
> > +struct virtio_crypto_mac_session_para {
> > +    /* See VIRTIO_CRYPTO_MAC_* above */
> > +    le32 algo;
> > +    /* hash result length */
> > +    le32 hash_result_len;
> > +    /* length of authenticated key */
> > +    le32 auth_key_len;
> > +};
> > +struct virtio_crypto_mac_create_session_req {
> > +    // Device-readable part
> > +    struct virtio_crypto_mac_session_para  parameter;
> > +    // Device-writable part
> > +    le64    session_id;
> 
> This field is not 64-bit aligned.  I suggest manually adding a le32
> padding field before it.
> 
> Please check all other structs defined in this spec.  You can also use
> the pahole(1) utility to inspect struct layout chosen by your compiler.
> There must be no compiler-generated padding.
> 
Ok, thanks!

> > +    u8     status;
> > +};
> > +\end{lstlisting}
> > +
> > +\paragraph{Session operation: Symmetric algorithms
> session}\label{sec:Device Types / Crypto Device / Device
> > +Operation / Session operation / Session operation: Symmetric algorithms
> session}
> > +
> > +The request of symmetric session includes two parts, CIPHER algorithms
> and chain
> > +algorithms (chaining cipher and hash/mac). The packet of symmetric session
> is:
> > +
> > +\begin{lstlisting}
> > +struct virtio_crypto_cipher_session_para {
> > +    /* See VIRTIO_CRYPTO_CIPHER* above */
> > +    le32 algo;
> > +    /* length of key */
> 
> In bytes or in bits?

Bytes. All lengths are bytes in the spec.

Regards,
-Gonglei
Ola Liljedahl Sept. 4, 2016, 3:47 p.m. UTC | #12
On 02/09/2016, 16:05, "Alexander Graf" <agraf@suse.de> wrote:

>>>There is a big problem that the control handle logic is synchronization,

>>>but the data queue

>>>handling logic is asynchronization. We can't combine them into one

>>>queue.

>>>It will decrease the performance because you need indentify each packet

>>>if we do this forcedly.

>>Are you saying that control and data operations are handled by separate

>>"blocks²?

>>If you combined control and data queues, there would have to be a (SW)

>>demultiplexer

>>that would add overhead (and potentially decrease throughout) especially

>>for the data

>>operations?

>

>Uh, the multiplexer is as simple as a switch() statement on the opcode,

>no?

You are assuming the backend will (always) be implemented in software.

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Alexander Graf Sept. 5, 2016, 7:37 a.m. UTC | #13
On 09/03/2016 05:18 AM, Gonglei (Arei) wrote:
> Hi,
>
>> -----Original Message-----
>> From: Alexander Graf [mailto:agraf@suse.de]
>> Sent: Friday, September 02, 2016 10:05 PM
>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
>>
>>
>>
>> On 02.09.16 14:16, Ola Liljedahl wrote:
>>>
>>> On 02/09/2016, 12:26, "Gonglei (Arei)" <arei.gonglei@huawei.com> wrote:
>>>
>>>>> -----Original Message-----
>>>>> From: Alexander Graf [mailto:agraf@suse.de]
>>>>> Sent: Friday, September 02, 2016 4:07 PM
>>>>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
>>>>> specification
>>>>>
>>>>>
>>>>>
>>>>> On 02.09.16 05:08, Gonglei (Arei) wrote:
>>>>>> Hi Alex,
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Alexander Graf [mailto:agraf@suse.de]
>>>>>>> Sent: Thursday, September 01, 2016 9:37 PM
>>>>>>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
>>>>> specification
>>>>>>> On 08/30/2016 02:12 PM, Gonglei wrote:
>>>>>>>> The virtio crypto device is a virtual crypto device (ie. hardware
>>>>>>>> crypto accelerator card). The virtio crypto device can provide
>>>>>>>> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM,
>> PRIMITIVE.
>>>>>>>> In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
>>>>>>> I have mostly a few high level comments.
>>>>>>>
>>>>>>> For starters, a lot of the structs rely on the compiler to pad them
>>>>> to
>>>>>>> natural alignment. That may get us into trouble when trying to
>>>>> emulate a
>>>>>>> virtio device on a host with different guest architecture (like arm
>>>>> on
>>>>>>> x86). It'd be a good idea to manually pad everything to be 64bit
>>>>> aligned
>>>>>>> - then all fields are always at the same spot.
>>>>>>>
>>>>>> Good point! I'll do this in the next version. Thanks!
>>>>>>
>>>>>>> I also have a hard time getting my head around the final flow of
>>>>>>> everything. Do I always have to create a session to be able to emit a
>>>>>>> command? In that case, doesn't that slow down everything, since a
>>>>>>> request would then need to wait for the host reply to receive its
>>>>>>> session id? There should be some way to fire off a simple non-iv
>>>>>>> operation without any session set up imho.
>>>>>>>
>>>>>> For symmetric algorithms, we'd better create a session before
>>>>> executing
>>>>> encryption
>>>>>> Or decryption, because the session usually be kept for a specific
>>>>>> algorithm with specific key in the production environment. And if we
>>>>> only
>>>>> change the iv,
>>>>>> we don't need to re-create the session.
>>>>> I think we have a slight misunderstanding here :)
>>>>>
>>>>> The current workflow is
>>>>>
>>>>>    -> create session
>>>>>    <- session key
>>>>>    -> data in
>>>>>    <- data out
>>>>>    ...
>>>>>    <- close session
>>>>>    -> ack
>>>>>
>>>>> That means that at least for the first packet you have at least one full
>>>>> round-trip cost from guest to host to guest to be able to send any data.
>>>>>
>>>> Yes, that's true...
>>>>
>>>>> That sounds pretty expensive to me on the latency side. There are
>>>>> multiple ways to mitigate that. One idea was to have a separate path in
>>>>> parallel to the create session + data + close session dance that would
>>>>> combine them all into a single command. You would still have the session
>>>>> based version, but accelerate the one-blob case.
>>>>>
>>>>> Another idea would be to make the guest be the session id janitor. Then
>>>>> you could just do
>>>>>
>>>>>    -> create session with key X
>>>>>    -> data in
>>>>>    <- data out
>>>>>    ...
>>>>>
>>>>> so you save the round trip, if you combine command and data queues, as
>>>>> then the create and data bits are serialized by their position in the
>>>>> queue.
>>>>>
>>>> ... But for lots of crypto requests, the exhaust is low:
>>>>
>>>> -> create session with key X
>>>> <- session id
>>>>     //do something in the guest if you like
>>>> -> data in with session_id
>>>> -> data in with session_id
>>>> -> data in with session_id
>>>>     ... ...    (fill out data virtqueue)
>>>> <-data out
>>>> <-data out
>>>>     <-data out
>>>>
>>>> And this is what the production environment do currently.
>>>>
>>>>>> For the asymmetric algorithms, we don't need create a session IIRC.
>>>>>>
>>>>>> So, I don't think this is a performance degradation, but a performance
>>>>> enhancement.
>>>>>>> Also, I don't fully understand the split between control and data
>>>>>>> queues. As far as I read things, the control queue is used to create
>>>>>>> session ids and the data queues can then be used to push data. Is
>>>>> there
>>>>>>> any particular reason for the split? One thing that seems natural to
>>>>> me
>>>>>>> would be to have sessions be per-queue, so you would create a
>>>>> session on
>>>>>>> a particular queue and only have it ever be available there. That way
>>>>>>> you get rid of any locking for sessions.
>>>>>>>
>>>>>> We want to keep a unify request type (structure) for data queue, so
>>>>> we can
>>>>>> keep the session operation in the control queue. Of course the
>>>>> control queue
>>>>>> only used to create sessions currently, but we can extend its
>>>>> functions if
>>>>> needed
>>>>>> in the future.
>>>>> I still don't understand. With separate control+data queues you just get
>>>>> yourself into synchronization troubles. Both struct
>>>>> virtio_crypto_ctrl_header and struct virtio_crypto_op_header already
>>>>> have an opcode as first le32 field. You can easily use that to determine
>>>>> both length of the payload as well as command (control vs data).
>>>>>
>>>> There is a big problem that the control handle logic is synchronization,
>>>> but the data queue
>>>> handling logic is asynchronization. We can't combine them into one queue.
>>>> It will decrease the performance because you need indentify each packet
>>>> if we do this forcedly.
>>> Are you saying that control and data operations are handled by separate
>>> "blocks²?
>>> If you combined control and data queues, there would have to be a (SW)
>>> demultiplexer
>>> that would add overhead (and potentially decrease throughout) especially
>>> for the data
>>> operations?
>> Uh, the multiplexer is as simple as a switch() statement on the opcode,
>> no? It might stall that one particular queue, but that sounds like
>> something you can improve by increasing the number of queues (and invent
>> something smart to ease polling of activity on more queues).
>>
> Actually, the virtio multiple queue's capacity is based on the backend crypto device,
> like multiple queue tap device work with virtio-net.

So what happens when you have 5 crypto accelerator cards in your system? :)


Alex
Alexander Graf Sept. 5, 2016, 7:40 a.m. UTC | #14
On 09/04/2016 05:47 PM, Ola Liljedahl wrote:
>
> On 02/09/2016, 16:05, "Alexander Graf" <agraf@suse.de> wrote:
>
>>>> There is a big problem that the control handle logic is synchronization,
>>>> but the data queue
>>>> handling logic is asynchronization. We can't combine them into one
>>>> queue.
>>>> It will decrease the performance because you need indentify each packet
>>>> if we do this forcedly.
>>> Are you saying that control and data operations are handled by separate
>>> "blocks²?
>>> If you combined control and data queues, there would have to be a (SW)
>>> demultiplexer
>>> that would add overhead (and potentially decrease throughout) especially
>>> for the data
>>> operations?
>> Uh, the multiplexer is as simple as a switch() statement on the opcode,
>> no?
> You are assuming the backend will (always) be implemented in software.

If you implement it in something that is not software, multiplexing
suddenly becomes a lot harder. What if you want to run 20 VMs on a
single host? Would you spawn SR-IOV devices with separate control queues
each? Or would you trap the control queue into the host and let the
guest freely access data queues which then means one guest could
interfere with another guest's data?

If you manage to give each queue its own stream ID, you could just pass
as many real hardware queues as you like into guests, no?


Alex
Gonglei (Arei) Sept. 5, 2016, 8:14 a.m. UTC | #15
>
> 
> On 09/03/2016 05:18 AM, Gonglei (Arei) wrote:
> > Hi,
> >
> >> -----Original Message-----
> >> From: Alexander Graf [mailto:agraf@suse.de]
> >> Sent: Friday, September 02, 2016 10:05 PM
> >> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
> specification
> >>
> >>
> >>
> >> On 02.09.16 14:16, Ola Liljedahl wrote:
> >>>
> >>> On 02/09/2016, 12:26, "Gonglei (Arei)" <arei.gonglei@huawei.com>
> wrote:
> >>>
> >>>>> -----Original Message-----
> >>>>> From: Alexander Graf [mailto:agraf@suse.de]
> >>>>> Sent: Friday, September 02, 2016 4:07 PM
> >>>>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
> >>>>> specification
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 02.09.16 05:08, Gonglei (Arei) wrote:
> >>>>>> Hi Alex,
> >>>>>>
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Alexander Graf [mailto:agraf@suse.de]
> >>>>>>> Sent: Thursday, September 01, 2016 9:37 PM
> >>>>>>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
> >>>>> specification
> >>>>>>> On 08/30/2016 02:12 PM, Gonglei wrote:
> >>>>>>>> The virtio crypto device is a virtual crypto device (ie. hardware
> >>>>>>>> crypto accelerator card). The virtio crypto device can provide
> >>>>>>>> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM,
> >> PRIMITIVE.
> >>>>>>>> In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
> >>>>>>> I have mostly a few high level comments.
> >>>>>>>
> >>>>>>> For starters, a lot of the structs rely on the compiler to pad them
> >>>>> to
> >>>>>>> natural alignment. That may get us into trouble when trying to
> >>>>> emulate a
> >>>>>>> virtio device on a host with different guest architecture (like arm
> >>>>> on
> >>>>>>> x86). It'd be a good idea to manually pad everything to be 64bit
> >>>>> aligned
> >>>>>>> - then all fields are always at the same spot.
> >>>>>>>
> >>>>>> Good point! I'll do this in the next version. Thanks!
> >>>>>>
> >>>>>>> I also have a hard time getting my head around the final flow of
> >>>>>>> everything. Do I always have to create a session to be able to emit a
> >>>>>>> command? In that case, doesn't that slow down everything, since a
> >>>>>>> request would then need to wait for the host reply to receive its
> >>>>>>> session id? There should be some way to fire off a simple non-iv
> >>>>>>> operation without any session set up imho.
> >>>>>>>
> >>>>>> For symmetric algorithms, we'd better create a session before
> >>>>> executing
> >>>>> encryption
> >>>>>> Or decryption, because the session usually be kept for a specific
> >>>>>> algorithm with specific key in the production environment. And if we
> >>>>> only
> >>>>> change the iv,
> >>>>>> we don't need to re-create the session.
> >>>>> I think we have a slight misunderstanding here :)
> >>>>>
> >>>>> The current workflow is
> >>>>>
> >>>>>    -> create session
> >>>>>    <- session key
> >>>>>    -> data in
> >>>>>    <- data out
> >>>>>    ...
> >>>>>    <- close session
> >>>>>    -> ack
> >>>>>
> >>>>> That means that at least for the first packet you have at least one full
> >>>>> round-trip cost from guest to host to guest to be able to send any data.
> >>>>>
> >>>> Yes, that's true...
> >>>>
> >>>>> That sounds pretty expensive to me on the latency side. There are
> >>>>> multiple ways to mitigate that. One idea was to have a separate path in
> >>>>> parallel to the create session + data + close session dance that would
> >>>>> combine them all into a single command. You would still have the session
> >>>>> based version, but accelerate the one-blob case.
> >>>>>
> >>>>> Another idea would be to make the guest be the session id janitor. Then
> >>>>> you could just do
> >>>>>
> >>>>>    -> create session with key X
> >>>>>    -> data in
> >>>>>    <- data out
> >>>>>    ...
> >>>>>
> >>>>> so you save the round trip, if you combine command and data queues, as
> >>>>> then the create and data bits are serialized by their position in the
> >>>>> queue.
> >>>>>
> >>>> ... But for lots of crypto requests, the exhaust is low:
> >>>>
> >>>> -> create session with key X
> >>>> <- session id
> >>>>     //do something in the guest if you like
> >>>> -> data in with session_id
> >>>> -> data in with session_id
> >>>> -> data in with session_id
> >>>>     ... ...    (fill out data virtqueue)
> >>>> <-data out
> >>>> <-data out
> >>>>     <-data out
> >>>>
> >>>> And this is what the production environment do currently.
> >>>>
> >>>>>> For the asymmetric algorithms, we don't need create a session IIRC.
> >>>>>>
> >>>>>> So, I don't think this is a performance degradation, but a performance
> >>>>> enhancement.
> >>>>>>> Also, I don't fully understand the split between control and data
> >>>>>>> queues. As far as I read things, the control queue is used to create
> >>>>>>> session ids and the data queues can then be used to push data. Is
> >>>>> there
> >>>>>>> any particular reason for the split? One thing that seems natural to
> >>>>> me
> >>>>>>> would be to have sessions be per-queue, so you would create a
> >>>>> session on
> >>>>>>> a particular queue and only have it ever be available there. That way
> >>>>>>> you get rid of any locking for sessions.
> >>>>>>>
> >>>>>> We want to keep a unify request type (structure) for data queue, so
> >>>>> we can
> >>>>>> keep the session operation in the control queue. Of course the
> >>>>> control queue
> >>>>>> only used to create sessions currently, but we can extend its
> >>>>> functions if
> >>>>> needed
> >>>>>> in the future.
> >>>>> I still don't understand. With separate control+data queues you just get
> >>>>> yourself into synchronization troubles. Both struct
> >>>>> virtio_crypto_ctrl_header and struct virtio_crypto_op_header already
> >>>>> have an opcode as first le32 field. You can easily use that to determine
> >>>>> both length of the payload as well as command (control vs data).
> >>>>>
> >>>> There is a big problem that the control handle logic is synchronization,
> >>>> but the data queue
> >>>> handling logic is asynchronization. We can't combine them into one
> queue.
> >>>> It will decrease the performance because you need indentify each packet
> >>>> if we do this forcedly.
> >>> Are you saying that control and data operations are handled by separate
> >>> "blocks²?
> >>> If you combined control and data queues, there would have to be a (SW)
> >>> demultiplexer
> >>> that would add overhead (and potentially decrease throughout) especially
> >>> for the data
> >>> operations?
> >> Uh, the multiplexer is as simple as a switch() statement on the opcode,
> >> no? It might stall that one particular queue, but that sounds like
> >> something you can improve by increasing the number of queues (and invent
> >> something smart to ease polling of activity on more queues).
> >>
> > Actually, the virtio multiple queue's capacity is based on the backend crypto
> device,
> > like multiple queue tap device work with virtio-net.
> 
> So what happens when you have 5 crypto accelerator cards in your system? :)
> 
That means you can support much more virito-crypto devices,
not one virtio-crypto device' queues.

Regards,
-Gonglei
Ola Liljedahl Sept. 5, 2016, 8:53 a.m. UTC | #16
On 05/09/2016, 09:40, "Alexander Graf" <agraf@suse.de> wrote:

>On 09/04/2016 05:47 PM, Ola Liljedahl wrote:

>>

>> On 02/09/2016, 16:05, "Alexander Graf" <agraf@suse.de> wrote:

>>

>>>>> There is a big problem that the control handle logic is

>>>>>synchronization,

>>>>> but the data queue

>>>>> handling logic is asynchronization. We can't combine them into one

>>>>> queue.

>>>>> It will decrease the performance because you need indentify each

>>>>>packet

>>>>> if we do this forcedly.

>>>> Are you saying that control and data operations are handled by

>>>>separate

>>>> "blocks²?

>>>> If you combined control and data queues, there would have to be a (SW)

>>>> demultiplexer

>>>> that would add overhead (and potentially decrease throughout)

>>>>especially

>>>> for the data

>>>> operations?

>>> Uh, the multiplexer is as simple as a switch() statement on the opcode,

>>> no?

>> You are assuming the backend will (always) be implemented in software.

>

>If you implement it in something that is not software, multiplexing

>suddenly becomes a lot harder. What if you want to run 20 VMs on a

>single host? Would you spawn SR-IOV devices with separate control queues

>each? Or would you trap the control queue into the host and let the

>guest freely access data queues which then means one guest could

>interfere with another guest's data?

For a backend implementation in hardware, it would of course also have to
support separation and protection between clients.

I haven’t tried to understand how virtio could be made to support hardware
implementation of some interesting backends. I just want us to avoid making
interface definitions and specification that make alternative backend
implementations difficult or less efficient.

In OPNFV DPACC project, there was some prototyping of virtio-crypto with HW
offload and one conclusion was that the SW overhead was so high that you
had
to pass packets of size >1000 bytes for the HW acceleration to be worth it.
(I think the comparison was with AES).

>If you manage to give each queue its own stream ID, you could just pass

>as many real hardware queues as you like into guests, no?

>

>

>Alex

>


IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
diff mbox

Patch

diff --git a/content.tex b/content.tex
index 4b45678..ab75f78 100644
--- a/content.tex
+++ b/content.tex
@@ -5750,6 +5750,8 @@  descriptor for the \field{sense_len}, \field{residual},
 \field{status_qualifier}, \field{status}, \field{response} and
 \field{sense} fields.
 
+\input{virtio-crypto.tex}
+
 \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 
 Currently there are three device-independent feature bits defined:
diff --git a/virtio-crypto.tex b/virtio-crypto.tex
new file mode 100644
index 0000000..491fc25
--- /dev/null
+++ b/virtio-crypto.tex
@@ -0,0 +1,835 @@ 
+\section{Crypto Device}\label{sec:Device Types / Crypto Device}
+
+The virtio crypto device is a virtual crypto device, and is a kind of
+virtual hardware accelerators for virtual machine.  The encryption and
+decryption requests of are placed in the data queue, and handled by the
+real crypto accelerators finally. The second queue is the control queue,
+which is used to create or destroy session for symmetric algorithms, and
+control some advanced features in the future. The virtio crypto
+device can provide seven crypto services: CIPHER, MAC, HASH, AEAD,
+KDF, ASYM, PRIMITIVE.
+
+\subsection{Device ID}\label{sec:Device Types / Crypto Device / Device ID}
+
+20
+
+\subsection{Virtqueues}\label{sec:Device Types / Crypto Device / Virtqueues}
+
+\begin{description}
+\item[0] dataq1
+\item[\ldots]
+\item[N-1] dataqN
+\item[N] controlq
+\end{description}
+
+N is set by \field{max_dataqueues} (\field{max_dataqueues} >= 1).
+
+\subsection{Feature bits}\label{sec:Device Types / Crypto Device / Feature bits}
+  None currently defined
+
+\subsection{Device configuration layout}\label{sec:Device Types / Crypto Device / Device configuration layout}
+
+Thirteen driver-read-only configuration fields are currently defined.
+
+\begin{lstlisting}
+struct virtio_crypto_config {
+    le16  status;
+    le16  max_dataqueues;
+    le32  crypto_services;
+    /* detailed algorithms mask */
+    le32 cipher_algo_l;
+    le32 cipher_algo_h;
+    le32 hash_algo;
+    le32 mac_algo_l;
+    le32 mac_algo_h;
+    le32 asym_algo;
+    le32 kdf_algo;
+    le32 aead_algo;
+    le32 primitive_algo;
+};
+\end{lstlisting}
+
+The first field, \field{status} is currently defined: VIRTIO_CRYPTO_S_HW_READY.
+
+\begin{lstlisting}
+#define VIRTIO_CRYPTO_S_HW_READY  (1 << 0)
+\end{lstlisting}
+
+The following driver-read-only field, \field{max_dataqueuess} specifies the
+maximum number of data virtqueues (dataq1\ldots dataqN). The crypto_services
+shows the crypto services the virtio crypto supports. The service currently
+defined are:
+
+\begin{lstlisting}
+#define VIRTIO_CRYPTO_NO_SERVICE (0) /* cipher services */
+#define VIRTIO_CRYPTO_SERVICE_CIPHER (1) /* cipher services */
+#define VIRTIO_CRYPTO_SERVICE_HASH  (2) /* hash service */
+#define VIRTIO_CRYPTO_SERVICE_MAC   (3) /* MAC (Message Authentication Codes) service */
+#define VIRTIO_CRYPTO_SERVICE_AEAD  (4) /* AEAD(Authenticated Encryption with Associated Data) service */
+\end{lstlisting}
+
+The last driver-read-only fields specify detailed algorithms mask which
+the device offered for corresponding service. The below CIPHER algorithms
+are defined currently:
+
+\begin{lstlisting}
+#define VIRTIO_CRYPTO_NO_CIPHER                 0
+#define VIRTIO_CRYPTO_CIPHER_ARC4               1
+#define VIRTIO_CRYPTO_CIPHER_AES_ECB            2
+#define VIRTIO_CRYPTO_CIPHER_AES_CBC            3
+#define VIRTIO_CRYPTO_CIPHER_AES_CTR            4
+#define VIRTIO_CRYPTO_CIPHER_DES_ECB            5
+#define VIRTIO_CRYPTO_CIPHER_DES_CBC            6
+#define VIRTIO_CRYPTO_CIPHER_3DES_ECB           7
+#define VIRTIO_CRYPTO_CIPHER_3DES_CBC           8
+#define VIRTIO_CRYPTO_CIPHER_3DES_CTR           9
+#define VIRTIO_CRYPTO_CIPHER_KASUMI_F8          10
+#define VIRTIO_CRYPTO_CIPHER_SNOW3G_UEA2        11
+#define VIRTIO_CRYPTO_CIPHER_AES_F8             12
+#define VIRTIO_CRYPTO_CIPHER_AES_XTS            13
+#define VIRTIO_CRYPTO_CIPHER_ZUC_EEA3           14
+\end{lstlisting}
+
+The below HASH algorithms are defined currently:
+
+\begin{lstlisting}
+#define VIRTIO_CRYPTO_NO_HASH            0
+#define VIRTIO_CRYPTO_HASH_MD5           1
+#define VIRTIO_CRYPTO_HASH_SHA1          2
+#define VIRTIO_CRYPTO_HASH_SHA_224       3
+#define VIRTIO_CRYPTO_HASH_SHA_256       4
+#define VIRTIO_CRYPTO_HASH_SHA_384       5
+#define VIRTIO_CRYPTO_HASH_SHA_512       6
+#define VIRTIO_CRYPTO_HASH_SHA3_224      7
+#define VIRTIO_CRYPTO_HASH_SHA3_256      8
+#define VIRTIO_CRYPTO_HASH_SHA3_384      9
+#define VIRTIO_CRYPTO_HASH_SHA3_512      10
+#define VIRTIO_CRYPTO_HASH_SHA3_SHAKE128      11
+#define VIRTIO_CRYPTO_HASH_SHA3_SHAKE256      12
+\end{lstlisting}
+
+The below MAC algorithms are defined currently:
+
+\begin{lstlisting}
+#define VIRTIO_CRYPTO_NO_MAC                       0
+#define VIRTIO_CRYPTO_MAC_HMAC_MD5                 1
+#define VIRTIO_CRYPTO_MAC_HMAC_SHA1                2
+#define VIRTIO_CRYPTO_MAC_HMAC_SHA_224             3
+#define VIRTIO_CRYPTO_MAC_HMAC_SHA_256             4
+#define VIRTIO_CRYPTO_MAC_HMAC_SHA_384             5
+#define VIRTIO_CRYPTO_MAC_HMAC_SHA_512             6
+#define VIRTIO_CRYPTO_MAC_CMAC_3DES                25
+#define VIRTIO_CRYPTO_MAC_CMAC_AES                 26
+#define VIRTIO_CRYPTO_MAC_CMAC_KASUMI_F9           27
+#define VIRTIO_CRYPTO_MAC_CMAC_SNOW3G_UIA2         28
+#define VIRTIO_CRYPTO_MAC_GMAC_AES                 41
+#define VIRTIO_CRYPTO_MAC_GMAC_TWOFISH             42
+#define VIRTIO_CRYPTO_MAC_CBCMAC_AES               49
+#define VIRTIO_CRYPTO_MAC_CBCMAC_KASUMI_F9         50
+#define VIRTIO_CRYPTO_MAC_XCBC_AES                 53
+#define VIRTIO_CRYPTO_MAC_ZUC_EIA3                 54
+\end{lstlisting}
+
+The below AEAD algorithms are defined currently:
+
+\begin{lstlisting}
+#define VIRTIO_CRYPTO_NO_AEAD     0
+#define VIRTIO_CRYPTO_AEAD_GCM    1
+#define VIRTIO_CRYPTO_AEAD_CCM    2
+#define VIRTIO_CRYPTO_AEAD_CHACHA20_POLY1305  3
+\end{lstlisting}
+
+\subsubsection{Device Requirements: Device configuration layout}\label{sec:Device Types / Crypto Device / Device configuration layout / Device Requirements: Device configuration layout}
+
+\begin{itemize*}
+\item The device MUST set \field{max_dataqueues} to between 1 and 65535 inclusive.
+\item The device SHOULD set \field{status} according to the status of the hardware-backed implementation.
+\item The device MUST set \field{crypto_services} according to the crypto services which the device offered.
+\item The device MUST set detailed algorithms mask according to \field{crypto_services} field.
+\end{itemize*}
+
+\subsubsection{Driver Requirements: Device configuration layout}\label{sec:Device Types / Crypto Device / Device configuration layout / Driver Requirements: Device configuration layout}
+
+\begin{itemize*}
+\item The driver MUST read the ready \field{status} from the bottom bit of status to
+      check whether the hardware-backed implementation is ready or not.
+\item The driver MAY read \field{max_dataqueues} field to discover how many data queues the device supports.
+\item The driver MUST read \field{crypto_services} field to discover which services the device is able to offer.
+\item The driver MUST read detail \field{algorithms} field according to \field{crypto_services} field.
+\end{itemize*}
+
+\subsection{Device Initialization}\label{sec:Device Types / Crypto Device / Device Initialization}
+
+\subsubsection{Driver Requirements: Device Initialization}\label{sec:Device Types / Crypto Device / Device Initialization / Driver Requirements: Device Initialization}
+
+\begin{itemize*}
+\item The driver MUST identify and initialize data virtqueue, up to \field{max_dataqueues}.
+\item The driver MUST identify the control virtqueue.
+\item The driver MUST identify the ready status of hardware-backend from \field{status} field.
+\item The driver MUST read the supported crypto services from bits of \field{crypto_servies}. 
+\item The driver MUST read the supported algorithms according to \field{crypto_services} field.
+\end{itemize*}
+
+\subsubsection{Device Requirements: Device Initialization}\label{sec:Device Types / Crypto Device / Device Initialization / Device Requirements: Device Initialization}
+
+\begin{itemize*}
+\item The device MUST be configured at least one real accelerator as the backend accelerator which executes real crypto operations.
+\item The device MUST write the \field{crypto_services} field according to the capacities of the backend accelerator.
+\item The device MUST write the corresponding algorithms field according to the \field{crypto_services} field.
+\end{itemize*}
+
+\subsection{Device Operation}\label{sec:Device Types / Crypto Device / Device Operation}
+
+Packets can be transmitted by placing them in both the controlq and dataq.
+The packet are consisted of general header and services specific request,
+Where 'general header' is for all crypto request, 'service specific request'
+is composed of operation parameter + output data + input data in generally.
+Operation parameters are algorithm-specific parameters, output data is the
+data should be operated, input data is the "operation result + result buffer".
+The general header of controlq:
+
+\begin{lstlisting}
+#define VIRTIO_CRYPTO_OPCODE(service, op)   ((service << 8) | (op))
+
+struct virtio_crypto_ctrl_header{
+#define VIRTIO_CRYPTO_CIPHER_CREATE_SESSION   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x02)
+#define VIRTIO_CRYPTO_CIPHER_DESTROY_SESSION  VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x03)
+#define VIRTIO_CRYPTO_HASH_CREATE_SESSION     VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_HASH, 0x02)
+#define VIRTIO_CRYPTO_HASH_DESTROY_SESSION    VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_HASH, 0x03)
+#define VIRTIO_CRYPTO_MAC_CREATE_SESSION      VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_MAC, 0x02)
+#define VIRTIO_CRYPTO_MAC_DESTROY_SESSION     VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_MAC, 0x03)
+#define VIRTIO_CRYPTO_AEAD_CREATE_SESSION     VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x02)
+#define VIRTIO_CRYPTO_AEAD_DESTROY_SESSION    VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x03)
+    le32 opcode;
+    /* algo should be service-specific algorithms */
+    le32 algo;
+    /* control flag to control the request */
+    le16 flag;
+    /* data virtqeueu id */
+    le16 queue_id;
+};
+\end{lstlisting}
+
+The general header of dataq:
+
+\begin{lstlisting}
+struct virtio_crypto_op_header {
+#define VIRTIO_CRYPTO_CIPHER_ENCRYPT    VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x00)
+#define VIRTIO_CRYPTO_CIPHER_DECRYPT    VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x01)
+#define VIRTIO_CRYPTO_HASH              VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_HASH, 0x00)
+#define VIRTIO_CRYPTO_MAC               VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_MAC, 0x00)
+#define VIRTIO_CRYPTO_AEAD_ENCRYPT      VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x00)
+#define VIRTIO_CRYPTO_AEAD_DECRYPT      VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x01)
+    le32 opcode;
+    /* algo should be service-specific algorithms */
+    le32 algo;
+    /* control flag to control the request */
+    le16 flag;
+    /* data virtqeueu id */
+    le16 queue_id;
+    /* session_id should be service-specific algorithms */
+    le64 session_id;
+};
+\end{lstlisting}
+
+\subsubsection{Control virtqueue}\label{sec:Device Types / Crypto Device / Device Operation / Control virtqueue}
+
+The driver uses the control virtqueue to send control commands to the
+device which finish the non-data plane operations, such as session
+operations (see \ref{sec:Device Types / Crypto Device / Device Operation / Session operation}).
+The packet of controlq:
+
+\begin{lstlisting}
+struct virtio_crypto_op_ctrl_req {
+    struct virtio_crypto_ctrl_header header;
+
+    union {
+        struct virtio_crypto_sym_create_session_req   sym_create_session;
+        struct virtio_crypto_hash_create_session_req  hash_create_session;
+        struct virtio_crypto_mac_create_session_req   mac_create_session;
+        struct virtio_crypto_aead_create_session_req  aead_create_session;
+        struct virtio_crypto_destroy_session_req      sym_destroy_session;
+    } u;
+};
+\end{lstlisting}
+
+The header is the general header, the union is algorithm-specific type,
+which is set by the driver. All the properties in the union will be shown as follow.
+
+\subsubsection{Session operation}\label{sec:Device Types / Crypto Device / Device Operation / Session operation}
+
+The symmetric algorithms have the concept of sessions. A session is a
+handle which describes the cryptographic parameters to be applied to
+a number of buffers. The data within a session handle includes the following:
+
+\begin{enumerate}
+\item The operation (cipher, hash/mac or both, and if both, the order in
+      which the algorithms should be applied).
+\item The cipher setup data, including the cipher algorithm and mode,
+      the key and its length, and the direction (encrypt or decrypt).
+\item The hash/mac setup data, including the hash algorithm or mac algorithm,
+      and digest result length (to allow for truncation).
+\begin{itemize*}
+\item Authenticated mode can refer to MAC, which requires that the key and
+      its length are also specified.
+\item For nested mode, the inner and outer prefix data and length are specified,
+      as well as the outer hash algorithm.
+\end{itemize*}
+\end{enumerate}
+
+\paragraph{Session operation: HASH session}\label{sec:Device Types / Crypto Device / Device Operation / Session operation / Session operation: HASH session}
+
+The packet of HASH session is:
+
+\begin{lstlisting}
+struct virtio_crypto_hash_session_para {
+    /* See VIRTIO_CRYPTO_HASH_* above */
+    le32 algo;
+    /* hash result length */
+    le32 hash_result_len;
+};
+struct virtio_crypto_hash_create_session_req {
+    // Device-readable part
+    struct virtio_crypto_hash_session_para parameter;
+    // Device-writable part
+    le64    session_id;
+    u8     status;
+};
+\end{lstlisting}
+
+\paragraph{Session operation: MAC session}\label{sec:Device Types / Crypto Device / Device
+Operation / Session operation / Session operation: MAC session}
+
+The packet of MAC session is:
+
+\begin{lstlisting}
+struct virtio_crypto_mac_session_para {
+    /* See VIRTIO_CRYPTO_MAC_* above */
+    le32 algo;
+    /* hash result length */
+    le32 hash_result_len;
+    /* length of authenticated key */
+    le32 auth_key_len;
+};
+struct virtio_crypto_mac_create_session_req {
+    // Device-readable part
+    struct virtio_crypto_mac_session_para  parameter;
+    // Device-writable part
+    le64    session_id;
+    u8     status;
+};
+\end{lstlisting}
+
+\paragraph{Session operation: Symmetric algorithms session}\label{sec:Device Types / Crypto Device / Device
+Operation / Session operation / Session operation: Symmetric algorithms session}
+
+The request of symmetric session includes two parts, CIPHER algorithms and chain
+algorithms (chaining cipher and hash/mac). The packet of symmetric session is:
+
+\begin{lstlisting}
+struct virtio_crypto_cipher_session_para {
+    /* See VIRTIO_CRYPTO_CIPHER* above */
+    le32 algo;
+    /* length of key */
+    le32 keylen;
+    /* encrypt or decrypt */
+    u8 op;
+};
+struct virtio_crypto_sym_create_session_req {
+    // Device-readable part
+/* No operation */
+#define VIRTIO_CRYPTO_SYM_OP_NONE  0
+/* Cipher only operation on the data */
+#define VIRTIO_CRYPTO_SYM_OP_CIPHER  1
+/* Chain any cipher with any hash or mac operation. The order
+   depends on the value of alg_chain_order param */
+#define VIRTIO_CRYPTO_SYM_OP_ALGORITHM_CHAINING  2
+    u8 op_type;
+
+    union {
+        struct virtio_crypto_cipher_session_para cipher_param;
+        struct virtio_crypto_alg_chain_param {
+#define VIRTIO_CRYPTO_SYM_ALG_CHAIN_ORDER_HASH_THEN_CIPHER  1
+#define VIRTIO_CRYPTO_SYM_ALG_CHAIN_ORDER_CIPHER_THEN_HASH  2
+            u8 alg_chain_order;
+/* Plain hash */
+#define VIRTIO_CRYPTO_SYM_HASH_MODE_PLAIN    1
+/* Authenticated hash (mac) */
+#define VIRTIO_CRYPTO_SYM_HASH_MODE_AUTH     2
+/* Nested hash */
+#define VIRTIO_CRYPTO_SYM_HASH_MODE_NESTED   3
+            u8 hash_mode;
+            struct virtio_crypto_cipher_session_para cipher_param;
+            union {
+                struct virtio_crypto_hash_session_para hash_param;
+                struct virtio_crypto_mac_session_para mac_param;
+            } u;
+        } chain;
+    } sym;
+
+    // Device-writable part
+    le64    session_id;
+    u8     status;
+};
+\end{lstlisting}
+
+\paragraph{Session operation: AEAD session}\label{sec:Device Types / Crypto Device / Device
+Operation / Session operation / Session operation: AEAD session}
+
+The packet of AEAD session is:
+
+\begin{lstlisting}
+struct virtio_crypto_aead_session_para {
+    /* See VIRTIO_CRYPTO_AEAD_* above*/
+    u8 algo;
+    /* length of key */
+    le32 key_len;
+    /* digest result length */
+    le32 digest_result_len;
+    /* The length of the additional authenticated data (AAD) in bytes */
+    le32 aad_len;
+    /* encrypt or decrypt */
+    u8 op;
+};
+struct virtio_crypto_aead_create_session_req {
+    // Device-readable part
+    struct virtio_crypto_aead_session_para parameter;
+    // Device-writable part
+    le64    session_id;
+    u8     status;
+};
+\end{lstlisting}
+
+\paragraph{Session operation: create session}\label{sec:Device Types / Crypto Device / Device
+Operation / Session operation / Session operation: create session}
+
+A request of creating a session including the following information:
+
+\begin{lstlisting}
+struct virtio_crypto_op_ctrl_req {
+    struct virtio_crypto_ctrl_header header;
+
+    union {
+        struct virtio_crypto_sym_create_session_req   sym_create_session;
+        struct virtio_crypto_hash_create_session_req  hash_create_session;
+        struct virtio_crypto_mac_create_session_req   mac_create_session;
+        struct virtio_crypto_aead_create_session_req  aead_create_session;
+        struct virtio_crypto_destroy_session_req      sym_destroy_session;
+    } u;
+};
+\end{lstlisting}
+
+\subparagraph{Driver Requirements: Session operation: create session}\label{sec:Device Types / Crypto Device / Device
+Operation / Session operation / Session operation: create session / Driver Requirements: Session operation: create session}
+
+The driver MUST set the control general header and corresponding property
+of union in structure virtio_crypto_op_ctrl_req. See \ref{sec:Device Types / Crypto Device / Device Operation}.
+Take the example of MAC service, the driver MUST set VIRTIO_CRYPTO_MAC_CREATE_SESSION
+for \field{opcode} field in struct virtio_crypto_op_ctrl_req, and set the \field{queue_id}
+to show dataq used.
+
+\subparagraph{Device Requirements: Session operation: create session}\label{sec:Device Types / Crypto Device / Device
+Operation / Session operation / Session operation: create session / Device Requirements: Session operation: create session}
+
+The device MUST return a session identifier to the driver when the device
+finishes the processing of session creation. The session creation request
+MUST end by a \field{status} field and a \field{session_id} field:
+
+Both \field{status} and \field{session_id} are written by the device: either VIRTIO_CRYPTO_OP_OK for success,
+VIRTIO_CRYPTO_OP_ERR for creation failed or device error, VIRTIO_CRYPTO_OP_NOTSUPP for not support,
+VIRTIO_CRYPTO_OP_INVSESS for invalid session id when executing crypto operations.
+
+\begin{lstlisting}
+#define VIRTIO_CRYPTO_OP_OK        0
+#define VIRTIO_CRYPTO_OP_ERR       1
+#define VIRTIO_CRYPTO_OP_BADMSG    2
+#define VIRTIO_CRYPTO_OP_NOTSUPP   3
+#define VIRTIO_CRYPTO_OP_INVSESS   4
+\end{lstlisting}
+
+\paragraph{Session operation: destroy session}\label{sec:Device Types / Crypto Device / Device
+Operation / Session operation / Session operation: destroy session}
+
+A request which destroy a session including the following information:
+
+\begin{lstlisting}
+struct virtio_crypto_destroy_session_req {
+    // Device-readable part
+    le64    session_id;
+    // Device-writable part
+    u8     status;
+};
+struct virtio_crypto_op_ctrl_req {
+    struct virtio_crypto_ctrl_header header;
+
+    union {
+        struct virtio_crypto_sym_create_session_req   sym_create_session;
+        struct virtio_crypto_hash_create_session_req  hash_create_session;
+        struct virtio_crypto_mac_create_session_req   mac_create_session;
+        struct virtio_crypto_aead_create_session_req  aead_create_session;
+        struct virtio_crypto_destroy_session_req      sym_destroy_session;
+    } u;
+};
+\end{lstlisting}
+
+\subparagraph{Driver Requirements: Session operation: destroy session}\label{sec:Device Types / Crypto Device / Device
+Operation / Session operation / Session operation: destroy session / Driver Requirements: Session operation: destroy session}
+
+The driver MUST set the control general header and corresponding property
+of union in struct virtio_crypto_op_ctrl_req. See \ref{sec:Device Types / Crypto Device / Device Operation}.
+Take the example of MAC service, the driver MUST set VIRTIO_CRYPTO_MAC_DESTROY_SESSION
+for \field{opcode} field in struct virtio_crypto_op_ctrl_req, and set the \field{queue_id} shows dataq used.
+The driver MUST set the \field{session_id} which MUST be a valid value which assigned by the
+device when a session was created.
+
+\subparagraph{Device Requirements: Session operation: destroy session}\label{sec:Device Types / Crypto Device / Device
+Operation / Session operation / Session operation: destroy session / Device Requirements: Session operation: destroy session}
+
+Status is written by the device: either VIRTIO_CRYPTO_OP_OK for success, VIRTIO_CRYPTO_OP_ERR for failure or device error.
+
+\subsubsection{Crypto operation}\label{sec:Device Types / Crypto Device / Device Operation / Crypto operation}
+
+The driver uses the dataq to finish the data plane operations (such as crypto operation).
+The packet of dataq:
+
+\begin{lstlisting}
+struct virtio_crypto_op_data_req {
+    struct virtio_crypto_op_header header;
+
+    union {
+        struct virtio_crypto_sym_data_req   sym_req;
+        struct virtio_crypto_hash_data_req  hash_req;
+        struct virtio_crypto_mac_data_req   mac_req;
+        struct virtio_crypto_aead_data_req  aead_req;
+    } u;
+};
+\end{lstlisting}
+
+The header is the general header, the union is algorithm-specific type,
+which are set by the driver. All properties in the union will be shown as follow.
+
+\paragraph{Crypto operation: HASH operation}\label{sec:Device Types / Crypto Device / Device
+Operation / Crypto operation / Crypto operation: HASH operation}
+
+\begin{lstlisting}
+struct virtio_crypto_hash_input {
+    le64 digest_result_addr; /* digest result guest physical address */
+    u8    status;
+};
+
+struct virtio_crypto_hash_output {
+    /* length of source data */
+    le32 src_data_len;
+    /* source data guest physical address */
+    le64 src_data_addr;
+};
+
+struct virtio_crypto_hash_data_req {
+    // Device-readable part
+    struct virtio_crypto_hash_output odata;
+    // Device-writable part
+    struct virtio_crypto_hash_input idata;
+};
+\end{lstlisting}
+
+Each data request uses virtio_crypto_hash_data_req structure to store informations,
+which are used to execute one HASH operation. The request only occupies one entry
+of the Vring descriptor table in virtio crypto device's dataq, which improves
+the throughput of data transferring for HASH service, so that the virtio crypto
+device CAN get the better result of acceleration.
+
+The informations include the source data guest physical address stored by \field{src_data_addr},
+length of source data stored by \field{src_data_len}, digest result guest physical address
+stored by \field{digest_result_addr} which is used to save the result of HASH operation.
+The address and length CAN determine the specific content in the guest memory.
+
+Note: The guest memory MUST be guaranteed to be allocated and physically-contiguous
+pointed by \field{digest_result_addr} in struct virtio_crypto_hash_input and
+\field{src_data_addr} in struct virtio_crypto_hash_output.
+
+\subparagraph{Driver Requirements: Crypto operation: HASH operation}\label{sec:Device Types / Crypto Device / Device
+Operation / Crypto operation / Crypto operation: HASH operation / Driver Requirements: Crypto operation: HASH operation}
+
+The driver MUST set the \field{opcode}, \field{session_id} in struct virtio_crypto_op_header.
+The \field{opcode} is set to VIRTIO_CRYPTO_HASH, \field{session_id} MUST be a valid value
+which assigned by the device when a session was created.
+The driver SHOULD set the \field{queue_id} field to show dataq used in struct virtio_crypto_op_header.
+The driver MUST fill out all fields in struct virtio_crypto_hash_data_req, including \field{parameter},
+\field{odata} and \field{idata} sub structures.
+
+\subparagraph{Device Requirements: Crypto operation: HASH operation}\label{sec:Device Types / Crypto Device / Device
+Operation / Crypto operation / Crypto operation: HASH operation / Device Requirements: Crypto operation: HASH operation}
+
+The device MUST copy the result of hash operation recorded by \field{digest_result_addr}
+filed in struct virtio_crypto_hash_input.
+The device MUST set the \field{status} in strut virtio_crypto_hash_input: either VIRTIO_CRYPTO_OP_OK for success,
+VIRTIO_CRYPTO_OP_ERR for creation failed or device error, VIRTIO_CRYPTO_OP_NOTSUPP for not support.
+
+\paragraph{Crypto operation: MAC operation}\label{sec:Device Types / Crypto Device / Device
+Operation / Crypto operation / Crypto operation: MAC operation}
+
+\begin{lstlisting}
+struct virtio_crypto_mac_input {
+    struct virtio_crypto_hash_input hash_input;
+};
+
+struct virtio_crypto_mac_output {
+    struct virtio_crypto_hash_output hash_output;
+};
+
+struct virtio_crypto_mac_data_req {
+    // Device-readable part
+    struct virtio_crypto_mac_output odata;
+    // Device-writable part
+    struct virtio_crypto_mac_input idata;
+};
+\end{lstlisting}
+
+Each data request uses virtio_crypto_mac_data_req structure to store informations,
+which are used to execute one MAC operation. The request only occupies one entry
+of the Vring descriptor table in virtio crypto device's dataq, which improves
+the throughput of data transferring for MAC service, so that the virtio crypto
+device CAN get the better result of acceleration.
+
+The informations include the source data guest physical address stored by \field{src_data_addr},
+length of source data stored by \field{src_data_len}, digest result guest physical address
+stored by \field{digest_result_addr} which is used to save the result of MAC operation.
+The address and length CAN determine the specific content in the guest memory.
+
+Note: The guest memory MUST be guaranteed to be allocated and physically-contiguous
+pointed by \field{digest_result_addr} in struct virtio_crypto_hash_input and
+\field{src_data_addr} in struct virtio_crypto_hash_output.
+
+\subparagraph{Driver Requirements: Crypto operation: MAC operation}\label{sec:Device Types / Crypto Device / Device
+Operation / Crypto operation / Crypto operation: MAC operation / Driver Requirements: Crypto operation: MAC operation}
+
+Refer to the hash operation.
+The driver MUST set the \field{opcode} field to VIRTIO_CRYPTO_MAC.
+
+\subparagraph{Device Requirements: Crypto operation: MAC operation}\label{sec:Device Types / Crypto Device / Device
+Operation / Crypto operation / Crypto operation: MAC operation / Device Requirements: Crypto operation: MAC operation}
+
+Refer to the hash operation.
+
+\paragraph{Crypto operation: Symmetric algorithms operation}\label{sec:Device Types / Crypto Device / Device
+Operation / Crypto operation / Crypto operation: Symmetric algorithms operation}
+
+\begin{lstlisting}
+struct virtio_crypto_cipher_para {
+    le32 iv_len;
+    /* length of source data */
+    le32 src_data_len;
+    /* length of dst data */
+    le32 dst_data_len;
+};
+struct virtio_crypto_cipher_input {
+    le64 dst_data_addr; /* destination data guest physical address */
+    u8    status;
+};
+
+struct virtio_crypto_cipher_output {
+    le64 iv_addr; /* iv guest physical address */
+    le64 src_data_addr; /* source data guest physical address */
+};
+struct virtio_crypto_cipher_data_req {
+    // Device-readable part
+    struct virtio_crypto_cipher_para parameter;
+    struct virtio_crypto_cipher_output odata;
+    // Device-writable part
+    struct virtio_crypto_cipher_input idata;
+};
+struct virtio_crypto_sym_data_req {
+    struct virtio_crypto_cipher_data_req cipher_req;
+    union {
+        struct virtio_crypto_hash_data_req hash_req;
+        struct virtio_crypto_mac_data_req mac_req;
+    } u;
+};
+\end{lstlisting}
+
+Each data request uses virtio_crypto_cipher_data_req structure to store informations,
+which are used to execute one CIPHER operation. The request only occupies one entry
+of the Vring descriptor table in virtio crypto device's dataq, which improves
+the throughput of data transferring for CIPHER service, so that the virtio crypto
+device CAN get the better result of acceleration.
+
+In the first virtio_crypto_cipher_para structure, \field{iv_len} specifics the length of initialization vector,
+\field{src_data_len} specifics the length of source data, \field{dst_data_len} specifics the
+length of destination data.
+
+In the following virtio_crypto_cipher_input structure, \field{dst_data_addr} specifics destination
+data guest physical address which is used to store the result of CIPHER operation. \field{status} special
+the status of CIPHER operation, See \ref{sec:Device Types / Crypto Device / Device Operation / Session operation /
+Session operation: create session / Device Requirements: Session operation: create session}. 
+
+In the virtio_crypto_cipher_output structure, \field{iv_addr} specifics guest physical address of initialization vector,
+\field{src_data_addr} specifics source data guest physical address.
+
+The addresses and length CAN determine the specific content in the guest memory.
+
+Note: The guest memory MUST be guaranteed to be allocated and physically-contiguous
+pointed by \field{dst_data_addr} in struct virtio_crypto_cipher_input, 
+\field{iv_addr} and \field{src_data_addr} in struct virtio_crypto_cipher_output.
+
+\subparagraph{Driver Requirements: Crypto operation: Symmetric algorithms encryption}\label{sec:Device Types / Crypto Device / Device
+Operation / Crypto operation / Crypto operation: Symmetric algorithms operation / Driver Requirements: Crypto operation: Symmetric algorithms encryption}
+
+Refer to the hash operation.
+
+The driver MUST set the \field{opcode} field to VIRTIO_CRYPTO_CIPHER_ENCRYPT.
+The driver MUST fill out the fields in struct virtio_crypto_sym_data_req according to
+the operation type of the session. 
+
+The driver SHOULD fill out fields of struct virtio_crypto_cipher_data_req in struct virtio_crypto_sym_data_req
+if the created session is based on VIRTIO_CRYPTO_SYM_OP_CIPHER
+
+The driver SHOULD fill out fields of both struct virtio_crypto_cipher_data_req and struct
+virtio_crypto_mac_data_req mac_req in struct virtio_crypto_sym_data_req if the created session
+is VIRTIO_CRYPTO_SYM_OP_ALGORITHM_CHAINING type and VIRTIO_CRYPTO_SYM_HASH_MODE_AUTH mode.
+
+\subparagraph{Device Requirements: Crypto operation: Symmetric algorithms encryption}\label{sec:Device Types / Crypto Device / Device
+Operation / Crypto operation / Crypto operation: Symmetric algorithms operation / Device Requirements: Crypto operation: Symmetric algorithms encryption}
+
+The device MUST parse the virtio_crypto_sym_data_req according to the session_id in general header.
+
+The device SHOULD only parsed fields of struct virtio_crypto_cipher_data_req in
+struct virtio_crypto_sym_data_req if the created session is VIRTIO_CRYPTO_SYM_OP_CIPHER type.
+
+The driver SHOULD parse fields of both struct virtio_crypto_cipher_data_req and struct
+virtio_crypto_mac_data_req mac_req in struct virtio_crypto_sym_data_req if the created
+session is VIRTIO_CRYPTO_SYM_OP_ALGORITHM_CHAINING operation type and VIRTIO_CRYPTO_SYM_HASH_MODE_AUTH mode.
+
+The device MUST copy the result of encryption operation recorded by \filed{dst_data_addr} filed in struct virtio_crypto_cipher_input in plain cipher mode.
+The device MUST copy the result of hash operation recorded by \filed{digest_result_addr} filed in struct virtio_crypto_hash_input in chain hash/mac mode.
+The device MUST set the \filed{status} field in strut virtio_crypto_cipher_input: either VIRTIO_CRYPTO_OP_OK for success, VIRTIO_CRYPTO_OP_ERR for creation
+failed or device error, VIRTIO_CRYPTO_OP_NOTSUPP for not support.
+
+\subparagraph{Crypto operation: Steps of encryption}\label{sec:Device Types / Crypto Device / Device
+Operation / Crypto operation / Crypto operation: Symmetric algorithms operation / Crypto operation: Steps of encryption}
+
+Step1: Creating a session:
+\begin{enumerate}
+\item The driver fills out informations in struct virtio_crypto_op_ctrl_req, including algorithm name, key, keylen etc;
+\item The driver puts the request of session creation into the controlq's Vring descriptor table;
+\item The driver kicks the device;
+\item The device gets the request from controlq;
+\item The device parses informations of the request, determines the informations of the backend crypto accelerator;
+\item The device packages informations according to the API of the backend crypto accelerator;
+\item The device invokes the session creation API of the backend crypto accelerator to create a session;
+\item The device returns the session id to the driver.
+\end{enumerate}
+
+Step2: Executing the detail encryption operation:
+\begin{enumerate}
+\item The driver fills out informations in struct virtio_crypto_op_data_req, including struct virtio_crypto_op_header and struct virtio_crypto_sym_data_req, See \ref{sec:Device Types / Crypto Device / Device
+      Operation / Crypto operation / Crypto operation: Symmetric algorithms operation / Driver Requirements: Crypto operation: Symmetric algorithms encryption};
+\item The driver puts the request of encryption into the dataq's Vring descriptor table;
+\item The driver kicks the device (Or the device polls the dataq's Vring descriptor table actively);
+\item The device gets the request from dataq;
+\item The device parses informations of the request, determines the identifying informations of the backend crypto accelerator.
+      For example, changing guest physical addresses to host physical addresses;
+\item The device packages identifying informations according to the API of the backend crypto accelerator;
+\item The device invokes the encryption API of the backend crypto accelerator;
+\item The backend crypto accelerator executes the encryption operation implicitly;
+\item The device gets the encryption result from the backend crypto accelerator (synchronous or asynchronous);
+\item The device set the \field{status} in struct virtio_crypto_cipher_input;
+\tiem The device updates and flushes the Vring used table to return the encryption result to the driver;
+\item The device notifies the driver (Or the driver polls the dataq's Vring used table actively);
+\item The driver handles the encryption result.
+\end{enumerate}
+
+Note: 
+\begin{itemize*}
+\item The driver MAY support both synchronous and asynchronous encryption. Then the performance
+      is poor in synchronous operation because frequent context switching and virtualization overhead.
+      The driver SHOULD by preference use asynchronous encryption.
+\item For better performance, the device SHOULD by preference use vhost scheme (user space or kernel space)
+      as the backend crypto accelerator in real production environment.
+\end{itemize*}
+
+\paragraph{Crypto operation: AEAD operation}\label{sec:Device Types / Crypto Device / Device
+Operation / Crypto operation / Crypto operation: AEAD operation}
+
+\begin{lstlisting}
+struct virtio_crypto_aead_para {
+    le32 iv_len;
+    /* length of additional auth data */
+    le32 aad_len;
+    /* length of source data */
+    le32 src_data_len;
+    /* length of dst data */
+    le32 dst_data_len;
+};
+
+struct virtio_crypto_aead_input {
+    le64 dst_data_addr; /* destination data guest physical address */
+    le64 digest_result_addr; /* digest result guest physical address */
+    u8    status;
+};
+
+struct virtio_crypto_aead_output {
+    le64 iv_addr; /* iv guest physical address */
+    le64 src_data_addr; /* source data guest physical address */
+    le64 add_data_addr; /* additional auth data guest physical address */
+};
+struct virtio_crypto_aead_data_req {
+    // Device-readable part
+    struct virtio_crypto_aead_para parameter;
+    struct virtio_crypto_aead_output odata;
+    // Device-writable part
+    struct virtio_crypto_aead_input idata;
+};
+\end{lstlisting}
+
+Each data request uses virtio_crypto_aead_data_req structure to store informations,
+which are used to execute one CIPHER operation. The request only occupies one entry
+of the Vring descriptor table in virtio crypto device's dataq, which improves
+the throughput of data transferring for AEAD service, so that the virtio crypto
+device CAN get the better result of acceleration.
+
+In the first virtio_crypto_aead_para structure, \field{iv_len} specifics the length of initialization vector,
+\field{aad_len} specifics the length of additional authentication data, \field{src_data_len} specifics the
+length of source data, \field{dst_data_len} specifics the length of destination data.
+
+In the following virtio_crypto_aead_input structure, \field{dst_data_addr} specifics destination
+data guest physical address which is used to store the result of CIPHER operation. \field{digest_result_addr} specifics
+digest result guest physical address which is used to store the result of HASH/MAC operation. \field{status} special
+the status of AEAD operation, See \ref{sec:Device Types / Crypto Device / Device Operation / Session operation /
+Session operation: create session / Device Requirements: Session operation: create session}. 
+
+In the virtio_crypto_aead_output structure, \field{iv_addr} specifics guest physical address of initialization vector,
+\field{src_data_addr} specifics source data guest physical address, \field{add_data_addr} specifics the guest physical address
+of additional authentication data.
+
+The addresses and length CAN determine the specific content in the guest memory.
+
+Note: The guest memory MUST be gauranted to be allocated and physically-contiguous
+pointed by \field{dst_data_addr} and \field{digest_result_addr} in struct virtio_crypto_aead_input, 
+\field{iv_addr}, \field{add_data_addr} and \field{src_data_addr} in struct virtio_crypto_aead_output.
+
+\subparagraph{Driver Requirements: Crypto operation: AEAD encryption}\label{sec:Device Types / Crypto Device / Device
+Operation / Crypto operation / Crypto operation: AEAD operation / Driver Requirements: Crypto operation: AEAD encryption}
+
+Refer to the symmetric algorithms encryption operation (See \ref{sec:Device Types / Crypto Device / Device Operation / Crypto
+                        operation / Crypto operation: Symmetric algorithms operation / Crypto operation: Steps of encryption}).
+The driver MUST set the \field{opcode} field to VIRTIO_CRYPTO_AEAD_ENCRYPT.
+
+\subparagraph{Device Requirements: Crypto operation: AEAD encryption}\label{sec:Device Types / Crypto Device / Device
+Operation / Crypto operation / Crypto operation: AEAD operation / Device Requirements: Crypto operation: AEAD encryption}
+
+Refer to the symmetric algorithms encryption operation (See \ref{sec:Device Types / Crypto Device / Device Operation / Crypto
+                        operation / Crypto operation: Symmetric algorithms operation / Crypto operation: Steps of encryption}).
+
+\subparagraph{Driver Requirements: Crypto operation: AEAD decryption}\label{sec:Device Types / Crypto Device / Device
+Operation / Crypto operation / Crypto operation: AEAD operation / Driver Requirements: Crypto operation: AEAD decryption}
+
+Refer to the symmetric algorithms encryption operation (See \ref{sec:Device Types / Crypto Device / Device Operation / Crypto
+                        operation / Crypto operation: Symmetric algorithms operation / Crypto operation: Steps of encryption}).
+The driver MUST set the \field{opcode} field to VIRTIO_CRYPTO_AEAD_DECRYPT.
+
+\subparagraph{Device Requirements: Crypto operation: AEAD decryption}\label{sec:Device Types / Crypto Device / Device
+Operation / Crypto operation / Crypto operation: AEAD operation / Device Requirements: Crypto operation: AEAD decryption}
+
+The device MUST verify and return the verify result to the driver.
+If the verify result is not correct, VIRTIO_CRYPTO_OP_BADMSG (bad message)
+MUST be returned the driver.