diff mbox series

[v1,1/2] staging: Introduce NVIDIA Tegra20 video decoder driver

Message ID 5c8b83775b982e6ee851c127444a8e839f422ad0.1506377430.git.digetx@gmail.com
State Deferred
Headers show
Series NVIDIA Tegra20 video decoder driver | expand

Commit Message

Dmitry Osipenko Sept. 25, 2017, 10:15 p.m. UTC
Video decoder, found on NVIDIA Tegra20 SoC, supports a standard set of
video formats like H.264 / MPEG-4 / WMV / VC1. Currently driver supports
decoding of CAVLC H.264 only.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 .../bindings/arm/tegra/nvidia,tegra20-vde.txt      |  38 +
 drivers/staging/Kconfig                            |   2 +
 drivers/staging/Makefile                           |   1 +
 drivers/staging/tegra-vde/Kconfig                  |   6 +
 drivers/staging/tegra-vde/Makefile                 |   1 +
 drivers/staging/tegra-vde/TODO                     |   8 +
 drivers/staging/tegra-vde/uapi.h                   |  99 +++
 drivers/staging/tegra-vde/vde.c                    | 971 +++++++++++++++++++++
 8 files changed, 1126 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt
 create mode 100644 drivers/staging/tegra-vde/Kconfig
 create mode 100644 drivers/staging/tegra-vde/Makefile
 create mode 100644 drivers/staging/tegra-vde/TODO
 create mode 100644 drivers/staging/tegra-vde/uapi.h
 create mode 100644 drivers/staging/tegra-vde/vde.c

Comments

Stephen Warren Sept. 25, 2017, 11:01 p.m. UTC | #1
On 09/25/2017 04:15 PM, Dmitry Osipenko wrote:
> Video decoder, found on NVIDIA Tegra20 SoC, supports a standard set of
> video formats like H.264 / MPEG-4 / WMV / VC1. Currently driver supports
> decoding of CAVLC H.264 only.

Note: I don't know anything much about video decoding on Tegra (just NV 
desktop GPUs, and that was a while ago), but I had a couple small 
comments on the DT binding:

> diff --git a/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt

> +NVIDIA Tegra Video Decoder Engine
> +
> +Required properties:
> +- compatible : "nvidia,tegra20-vde"
> +- reg : Must contain 2 register ranges: registers and IRAM area.
> +- reg-names : Must include the following entries:
> +  - regs
> +  - iram

I think the IRAM region needs more explanation: What is the region used 
for and by what? Can it be moved, and if so does the move need to be 
co-ordinated with any other piece of SW?

> +- clocks : Must contain one entry, for the module clock.
> +  See ../clocks/clock-bindings.txt for details.
> +- resets : Must contain an entry for each entry in reset-names.
> +  See ../reset/reset.txt for details.
> +- reset-names : Must include the following entries:
> +  - vde

Let's require a clock-names property too.
--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dmitry Osipenko Sept. 25, 2017, 11:45 p.m. UTC | #2
On 26.09.2017 02:01, Stephen Warren wrote:
> On 09/25/2017 04:15 PM, Dmitry Osipenko wrote:
>> Video decoder, found on NVIDIA Tegra20 SoC, supports a standard set of
>> video formats like H.264 / MPEG-4 / WMV / VC1. Currently driver supports
>> decoding of CAVLC H.264 only.
> 
> Note: I don't know anything much about video decoding on Tegra (just NV desktop
> GPUs, and that was a while ago), but I had a couple small comments on the DT
> binding:
> 
>> diff --git
>> a/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt
>> b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt
> 
>> +NVIDIA Tegra Video Decoder Engine
>> +
>> +Required properties:
>> +- compatible : "nvidia,tegra20-vde"
>> +- reg : Must contain 2 register ranges: registers and IRAM area.
>> +- reg-names : Must include the following entries:
>> +  - regs
>> +  - iram
> 
> I think the IRAM region needs more explanation: What is the region used for and
> by what? Can it be moved, and if so does the move need to be co-ordinated with
> any other piece of SW?
> 

IRAM region is used by Video Decoder HW for internal use and some of decoding
parameters are supplied via IRAM, like frames order list. AFAIK IRAM addresses
are hardwired in HW and aren't movable, it is not 100% but I'm pretty sure.
Should it be explained in the binding?

>> +- clocks : Must contain one entry, for the module clock.
>> +  See ../clocks/clock-bindings.txt for details.
>> +- resets : Must contain an entry for each entry in reset-names.
>> +  See ../reset/reset.txt for details.
>> +- reset-names : Must include the following entries:
>> +  - vde
> 
> Let's require a clock-names property too.

Okay, I'll add this property to the binding.
Stephen Warren Sept. 26, 2017, 5:11 a.m. UTC | #3
On 09/25/2017 05:45 PM, Dmitry Osipenko wrote:
> On 26.09.2017 02:01, Stephen Warren wrote:
>> On 09/25/2017 04:15 PM, Dmitry Osipenko wrote:
>>> Video decoder, found on NVIDIA Tegra20 SoC, supports a standard set of
>>> video formats like H.264 / MPEG-4 / WMV / VC1. Currently driver supports
>>> decoding of CAVLC H.264 only.
>>
>> Note: I don't know anything much about video decoding on Tegra (just NV desktop
>> GPUs, and that was a while ago), but I had a couple small comments on the DT
>> binding:
>>
>>> diff --git
>>> a/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt
>>> b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt
>>
>>> +NVIDIA Tegra Video Decoder Engine
>>> +
>>> +Required properties:
>>> +- compatible : "nvidia,tegra20-vde"
>>> +- reg : Must contain 2 register ranges: registers and IRAM area.
>>> +- reg-names : Must include the following entries:
>>> +  - regs
>>> +  - iram
>>
>> I think the IRAM region needs more explanation: What is the region used for and
>> by what? Can it be moved, and if so does the move need to be co-ordinated with
>> any other piece of SW?
> 
> IRAM region is used by Video Decoder HW for internal use and some of decoding
> parameters are supplied via IRAM, like frames order list. AFAIK IRAM addresses
> are hardwired in HW and aren't movable, it is not 100% but I'm pretty sure.
> Should it be explained in the binding?

I think this should be briefly mentioned, yes. Otherwise at least people
who don't know the VDE HW well (like me) will wonder why on earth VDE
interacts with IRAM at all. I would have assumed all parameters were
supplied via registers or via descriptors in DRAM.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dmitry Osipenko Sept. 26, 2017, 12:02 p.m. UTC | #4
On 26.09.2017 08:11, Stephen Warren wrote:
> On 09/25/2017 05:45 PM, Dmitry Osipenko wrote:
>> On 26.09.2017 02:01, Stephen Warren wrote:
>>> On 09/25/2017 04:15 PM, Dmitry Osipenko wrote:
>>>> Video decoder, found on NVIDIA Tegra20 SoC, supports a standard set of
>>>> video formats like H.264 / MPEG-4 / WMV / VC1. Currently driver supports
>>>> decoding of CAVLC H.264 only.
>>>
>>> Note: I don't know anything much about video decoding on Tegra (just NV desktop
>>> GPUs, and that was a while ago), but I had a couple small comments on the DT
>>> binding:
>>>
>>>> diff --git
>>>> a/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt
>>>> b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt
>>>
>>>> +NVIDIA Tegra Video Decoder Engine
>>>> +
>>>> +Required properties:
>>>> +- compatible : "nvidia,tegra20-vde"
>>>> +- reg : Must contain 2 register ranges: registers and IRAM area.
>>>> +- reg-names : Must include the following entries:
>>>> +  - regs
>>>> +  - iram
>>>
>>> I think the IRAM region needs more explanation: What is the region used for and
>>> by what? Can it be moved, and if so does the move need to be co-ordinated with
>>> any other piece of SW?
>>
>> IRAM region is used by Video Decoder HW for internal use and some of decoding
>> parameters are supplied via IRAM, like frames order list. AFAIK IRAM addresses
>> are hardwired in HW and aren't movable, it is not 100% but I'm pretty sure.
>> Should it be explained in the binding?
> 
> I think this should be briefly mentioned, yes. Otherwise at least people
> who don't know the VDE HW well (like me) will wonder why on earth VDE
> interacts with IRAM at all. I would have assumed all parameters were
> supplied via registers or via descriptors in DRAM.
> 
> Thanks.
> 

I also forgot to mention that VDE scrubs that IRAM region on HW reset. So yeah,
it's definitely a part of HW definition. I'll add a brief explanation to the
binding.
Dan Carpenter Sept. 27, 2017, 9:45 a.m. UTC | #5
I feel like this code is good enough to go into the regular kernel
instead of staging, but I don't really know what "- properly handle
decoding faults" means in this context.  Does the driver Oops all the
time or what?

Anyway, minor comments inline.

On Tue, Sep 26, 2017 at 01:15:42AM +0300, Dmitry Osipenko wrote:
> diff --git a/drivers/staging/tegra-vde/Kconfig b/drivers/staging/tegra-vde/Kconfig
> new file mode 100644
> index 000000000000..b947c012a373
> --- /dev/null
> +++ b/drivers/staging/tegra-vde/Kconfig
> @@ -0,0 +1,6 @@
> +config TEGRA_VDE
> +	tristate "NVIDIA Tegra20 video decoder driver"
> +	depends on ARCH_TEGRA_2x_SOC

Could we get a || COMPILE_TEST here as well?

> +	help
> +	    Say Y here to enable support for a NVIDIA Tegra20 video decoder
> +	    driver.
> diff --git a/drivers/staging/tegra-vde/Makefile b/drivers/staging/tegra-vde/Makefile
> new file mode 100644
> index 000000000000..e7c0df1174bf
> --- /dev/null
> +++ b/drivers/staging/tegra-vde/Makefile
> @@ -0,0 +1 @@
> +obj-$(CONFIG_TEGRA_VDE)	+= vde.o
> diff --git a/drivers/staging/tegra-vde/TODO b/drivers/staging/tegra-vde/TODO
> new file mode 100644
> index 000000000000..533ddfc5190e
> --- /dev/null
> +++ b/drivers/staging/tegra-vde/TODO
> @@ -0,0 +1,8 @@
> +All TODO's require reverse-engineering to be done first, it is very
> +unlikely that NVIDIA would ever release HW specs to public.
> +
> +TODO:
> +	- properly handle decoding faults
> +	- support more formats

Adding more formats is not a reason to put this into staging instead of
the normal drivers/ dir.

> +static int tegra_vde_setup_context(struct tegra_vde *vde,
> +				   struct tegra_vde_h264_decoder_ctx *ctx,
> +				   struct video_frame *dpb_frames,
> +				   phys_addr_t bitstream_data_paddr,
> +				   int bitstream_data_size,
> +				   int macroblocks_nb)
> +{
> +	struct device *dev = vde->miscdev.parent;
> +	u32 value;
> +
> +	tegra_vde_set_bits(vde->regs,    0xA, SXE(0xF0));
> +	tegra_vde_set_bits(vde->regs,    0xB, BSEV(CMDQUE_CONTROL));
> +	tegra_vde_set_bits(vde->regs, 0x8002, MBE(0x50));
> +	tegra_vde_set_bits(vde->regs,    0xA, MBE(0xA0));
> +	tegra_vde_set_bits(vde->regs,    0xA, PPE(0x14));
> +	tegra_vde_set_bits(vde->regs,    0xA, PPE(0x28));
> +	tegra_vde_set_bits(vde->regs,  0xA00, MCE(0x08));
> +	tegra_vde_set_bits(vde->regs,    0xA, TFE(0x00));
> +	tegra_vde_set_bits(vde->regs,    0x5, VDMA(0x04));
> +
> +	VDE_WR(0x00000000, vde->regs + VDMA(0x1C));
> +	VDE_WR(0x00000000, vde->regs + VDMA(0x00));
> +	VDE_WR(0x00000007, vde->regs + VDMA(0x04));
> +	VDE_WR(0x00000007, vde->regs + FRAMEID(0x200));
> +	VDE_WR(0x00000005, vde->regs + TFE(0x04));
> +	VDE_WR(0x00000000, vde->regs + MBE(0x84));
> +	VDE_WR(0x00000010, vde->regs + SXE(0x08));
> +	VDE_WR(0x00000150, vde->regs + SXE(0x54));
> +	VDE_WR(0x0000054C, vde->regs + SXE(0x58));
> +	VDE_WR(0x00000E34, vde->regs + SXE(0x5C));
> +	VDE_WR(0x063C063C, vde->regs + MCE(0x10));
> +	VDE_WR(0x0003FC00, vde->regs + BSEV(INTR_STATUS));
> +	VDE_WR(0x0000150D, vde->regs + BSEV(BSE_CONFIG));
> +	VDE_WR(0x00000100, vde->regs + BSEV(BSE_INT_ENB));
> +	VDE_WR(0x00000000, vde->regs + BSEV(0x98));
> +	VDE_WR(0x00000060, vde->regs + BSEV(0x9C));
> +
> +	memset_io(vde->iram + 512, 0, macroblocks_nb / 2);
> +
> +	tegra_setup_frameidx(vde->regs, dpb_frames, ctx->dpb_frames_nb,
> +			     ctx->pic_width_in_mbs, ctx->pic_height_in_mbs);
> +
> +	tegra_vde_setup_iram_tables(vde->iram, dpb_frames,
> +				    ctx->dpb_frames_nb - 1,
> +				    ctx->dpb_ref_frames_with_earlier_poc_nb);
> +
> +	VDE_WR(0x00000000, vde->regs + BSEV(0x8C));
> +	VDE_WR(bitstream_data_paddr + bitstream_data_size,
> +	       vde->regs + BSEV(0x54));
> +
> +	value = ctx->pic_width_in_mbs << 11 | ctx->pic_height_in_mbs << 3;
> +
> +	VDE_WR(value, vde->regs + BSEV(0x88));
> +
> +	if (tegra_vde_wait_bsev(vde, false))
> +		return -EIO;
> +
> +	if (tegra_vde_push_bsev_icmdqueue(vde, 0x800003FC, false))

Preserve the error code from tegra_vde_push_bsev_icmdqueue().  It's
still -EIO so this doesn't affect runtime.

> +		return -EIO;
> +
> +	value = 0x01500000;
> +	value |= ((vde->iram_lists_paddr + 512) >> 2) & 0xFFFF;
> +
> +	if (tegra_vde_push_bsev_icmdqueue(vde, value, true))

Same for all.

> +		return -EIO;
> +
> +	if (tegra_vde_push_bsev_icmdqueue(vde, 0x840F054C, false))
> +		return -EIO;
> +
> +	if (tegra_vde_push_bsev_icmdqueue(vde, 0x80000080, false))
> +		return -EIO;
> +
> +	value = 0x0E340000 | ((vde->iram_lists_paddr >> 2) & 0xFFFF);
> +
> +	if (tegra_vde_push_bsev_icmdqueue(vde, value, true))
> +		return -EIO;
> +
> +	value = (1 << 23) | 5;

I don't like these magic numbers.

> +	value |= ctx->pic_width_in_mbs << 11;
> +	value |= ctx->pic_height_in_mbs << 3;
> +
> +	VDE_WR(value, vde->regs + SXE(0x10));
> +
> +	value = !ctx->is_baseline_profile << 17;
> +	value |= ctx->level_idc << 13;
> +	value |= ctx->log2_max_pic_order_cnt_lsb << 7;
> +	value |= ctx->pic_order_cnt_type << 5;
> +	value |= ctx->log2_max_frame_num;
> +
> +	VDE_WR(value, vde->regs + SXE(0x40));
> +
> +	value = ctx->pic_init_qp << 25;
> +	value |= !!(ctx->deblocking_filter_control_present_flag) << 2;
> +	value |= !!ctx->pic_order_present_flag;
> +
> +	VDE_WR(value, vde->regs + SXE(0x44));
> +
> +	value = ctx->chroma_qp_index_offset;
> +	value |= ctx->num_ref_idx_l0_active_minus1 << 5;
> +	value |= ctx->num_ref_idx_l1_active_minus1 << 10;
> +	value |= !!ctx->constrained_intra_pred_flag << 15;
> +
> +	VDE_WR(value, vde->regs + SXE(0x48));
> +
> +	value = 0x0C000000;
> +	value |= !!(dpb_frames[0].flags & FLAG_IS_B_FRAME) << 24;
> +
> +	VDE_WR(value, vde->regs + SXE(0x4C));
> +
> +	value = 0x03800000;
> +	value |= min(bitstream_data_size, SZ_1M);
> +
> +	VDE_WR(value, vde->regs + SXE(0x68));
> +
> +	VDE_WR(bitstream_data_paddr, vde->regs + SXE(0x6C));
> +
> +	value = (1 << 28) | 5;
> +	value |= ctx->pic_width_in_mbs << 11;
> +	value |= ctx->pic_height_in_mbs << 3;
> +
> +	VDE_WR(value, vde->regs + MBE(0x80));
> +
> +	value = 0x26800000;
> +	value |= ctx->level_idc << 4;
> +	value |= !ctx->is_baseline_profile << 1;
> +	value |= !!ctx->direct_8x8_inference_flag;
> +
> +	VDE_WR(value, vde->regs + MBE(0x80));
> +
> +	VDE_WR(0xF4000001, vde->regs + MBE(0x80));
> +	VDE_WR(0x20000000, vde->regs + MBE(0x80));
> +	VDE_WR(0xF4000101, vde->regs + MBE(0x80));
> +
> +	value = 0x20000000;
> +	value |= ctx->chroma_qp_index_offset << 8;
> +
> +	VDE_WR(value, vde->regs + MBE(0x80));
> +
> +	if (tegra_vde_setup_mbe_frame_idx(vde->regs,
> +					  ctx->pic_order_cnt_type == 0,
> +					  ctx->dpb_frames_nb - 1)) {


Preserve the error code.

> +		dev_err(dev, "MBE frames setup failed\n");
> +		return -EIO;
> +	}
> +
> +	tegra_vde_mbe_set_0xa_reg(vde->regs, 0, 0x000009FC);
> +	tegra_vde_mbe_set_0xa_reg(vde->regs, 2, 0xF1DEAD00);
> +	tegra_vde_mbe_set_0xa_reg(vde->regs, 4, 0xF2DEAD00);
> +	tegra_vde_mbe_set_0xa_reg(vde->regs, 6, 0xF3DEAD00);
> +	tegra_vde_mbe_set_0xa_reg(vde->regs, 8, dpb_frames[0].aux_paddr);
> +
> +	value = 0xFC000000;
> +	value |= !!(dpb_frames[0].flags & FLAG_IS_B_FRAME) << 2;
> +
> +	if (!ctx->is_baseline_profile)
> +		value |= !!(dpb_frames[0].flags & FLAG_IS_REFERENCE) << 1;
> +
> +	VDE_WR(value, vde->regs + MBE(0x80));
> +
> +	if (tegra_vde_wait_mbe(vde->regs)) {
> +		dev_err(dev, "MBE programming failed\n");
> +		return -EIO;
> +	}
> +
> +	return 0;
> +}
> +
> +static void tegra_vde_decode_frame(struct tegra_vde *vde, int macroblocks_nb)
> +{
> +	VDE_WR(0x00000001, vde->regs + BSEV(0x8C));
> +	VDE_WR(0x20000000 | (macroblocks_nb - 1), vde->regs + SXE(0x00));
> +}
> +
> +static void tegra_vde_detach_and_put_dmabuf(struct dma_buf_attachment *a)
> +{
> +	struct dma_buf *dmabuf = a->dmabuf;
> +
> +	if (IS_ERR_OR_NULL(a))

You just dereferenced this on the line before so it's too late.

Obviously we don't want to dereference either an error pointer or a NULL
but I'm wondering the significance of having it be both.  Normally that
would mean that NULL is a special case of success.  In other words,
error pointer means the hardware is broken but NULL means it's missing
and not required.  I guess I'm suggesting to just delete the check and
make sure we never set this to either NULL or ERR_PTR.

> +		return;
> +
> +	dma_buf_detach(dmabuf, a);
> +	dma_buf_put(dmabuf);
> +}
> +
> +static int tegra_vde_attach_dmabuf(struct device *dev, int fd,
> +				   unsigned long offset, int min_size,
> +				   struct dma_buf_attachment **a,
> +				   phys_addr_t *paddr, u32 *size,
> +				   enum dma_data_direction dma_dir)
> +{
> +	struct dma_buf_attachment *attachment;
> +	struct dma_buf *dmabuf;
> +	struct sg_table *sgt;
> +
> +	*a = NULL;
> +	*paddr = 0xFBDEAD00;
> +
> +	dmabuf = dma_buf_get(fd);
> +	if (IS_ERR(dmabuf)) {
> +		dev_err(dev, "Invalid dmabuf FD\n");
> +		return PTR_ERR(dmabuf);
> +	}
> +
> +	if ((u64)offset + min_size > dmabuf->size) {
> +		dev_err(dev, "Too small dmabuf size %d @0x%lX, "
> +			     "should be at least %d\n",
> +			dmabuf->size, offset, min_size);
> +		return -EINVAL;
> +	}
> +
> +	attachment = dma_buf_attach(dmabuf, dev);
> +	if (IS_ERR(attachment)) {
> +		dev_err(dev, "Failed to attach dmabuf\n");
> +		dma_buf_put(dmabuf);
> +		return PTR_ERR(attachment);
> +	}
> +
> +	sgt = dma_buf_map_attachment(attachment, dma_dir);
> +	if (IS_ERR(sgt)) {
> +		dev_err(dev, "Failed to get dmabuf sg_table\n");
> +		dma_buf_detach(dmabuf, attachment);
> +		dma_buf_put(dmabuf);
> +		return PTR_ERR(sgt);
> +	}
> +
> +	if (sgt->nents != 1) {
> +		dev_err(dev, "Sparsed DMA area is unsupported\n");

s/Sparsed/Sparse/

> +		dma_buf_unmap_attachment(attachment, sgt, dma_dir);
> +		dma_buf_detach(dmabuf, attachment);
> +		dma_buf_put(dmabuf);
> +		return -EINVAL;

This function would be cleaner using gotos to unwind.  Pick the goto
name to reflect what the goto does.  For example, here it would be:

	if (sgt->nents != 1) {
		dev_err(dev, "Sparse DMA area is unsupported\n");
		ret = -EINVAL;
		goto err_umap;
	}



> +	}
> +
> +	*paddr = sg_dma_address(sgt->sgl) + offset;
> +
> +	dma_buf_unmap_attachment(attachment, sgt, dma_dir);
> +
> +	*a = attachment;
> +
> +	if (size)
> +		*size = dmabuf->size - offset;
> +
> +	return 0;

	return 0;

err_unmap:
	dma_buf_unmap_attachment(attachment, sgt, dma_dir);
err_detach:
	dma_buf_detach(dmabuf, attachment);
err_put:
	dma_buf_put(dmabuf);
	return ret;

> +}
> +
> +static int tegra_vde_attach_frame_dmabufs(struct device *dev,
> +					  struct video_frame *frame,
> +					  struct tegra_vde_h264_frame *source,
> +					  enum dma_data_direction dma_dir,
> +					  int is_baseline_profile, int csize)
> +{
> +	int ret;
> +
> +	ret = tegra_vde_attach_dmabuf(dev, source->y_fd,
> +				      source->y_offset, csize * 4,
> +				      &frame->y_dmabuf_attachment,
> +				      &frame->y_paddr, NULL, dma_dir);
> +	if (ret)
> +		return ret;
> +
> +	ret = tegra_vde_attach_dmabuf(dev, source->cb_fd,
> +				      source->cb_offset, csize,
> +				      &frame->cb_dmabuf_attachment,
> +				      &frame->cb_paddr, NULL, dma_dir);
> +	if (ret)
> +		return ret;
> +
> +	ret = tegra_vde_attach_dmabuf(dev, source->cr_fd,
> +				      source->cr_offset, csize,
> +				      &frame->cr_dmabuf_attachment,
> +				      &frame->cr_paddr, NULL, dma_dir);
> +	if (ret)
> +		return ret;
> +
> +	if (is_baseline_profile)
> +		frame->aux_paddr = 0xF4DEAD00;

The handling of is_baseline_profile is strange to me.  It feels like we
should always check it before we use ->aux_paddr but we don't ever do
that.

> +	else
> +		ret = tegra_vde_attach_dmabuf(dev, source->aux_fd,
> +					      source->aux_offset, csize,
> +					      &frame->aux_dmabuf_attachment,
> +					      &frame->aux_paddr, NULL, dma_dir);


This function should have some error handling to undo the earlier
attach calls if the latter ones fail.  See below:


> +
> +	return ret;

	return 0;

err_detach_cr:
	tegra_vde_detach_and_put_dmabuf(frame->cr_dmabuf_attachment);
err_detach_cb:
	tegra_vde_detach_and_put_dmabuf(frame->cb_dmabuf_attachment);
err_detach_y:
	tegra_vde_detach_and_put_dmabuf(frame->y_dmabuf_attachment);

	return ret;


> +}
> +
> +static void tegra_vde_deattach_frame_dmabufs(struct video_frame *frame)
> +{
> +	tegra_vde_detach_and_put_dmabuf(frame->y_dmabuf_attachment);
> +	tegra_vde_detach_and_put_dmabuf(frame->cb_dmabuf_attachment);
> +	tegra_vde_detach_and_put_dmabuf(frame->cr_dmabuf_attachment);
> +	tegra_vde_detach_and_put_dmabuf(frame->aux_dmabuf_attachment);


It would be happier to write this in the reverse order so it matches
the error handling that I wrote for you.


> +}
> +
> +static int tegra_vde_copy_and_validate_frame(struct device *dev,
> +					     struct tegra_vde_h264_frame *frame,
> +					     unsigned long vaddr)
> +{
> +	if (copy_from_user(frame, (void __user *)vaddr, sizeof(*frame))) {
> +		dev_err(dev, "Copying of H.264 frame from user failed\n");
> +		return -EFAULT;

Error message are a funny thing and different people feel different
ways.  These can be triggered by the user so they let you spam dmesg
but I can see how many of them would be useful.  These ones for
copy_from_user() are not useful since we assume the programmer should
know that stuff.  The error code seems enough.

> +	}
> +
> +	if (frame->frame_num > 0x7FFFFF) {
> +		dev_err(dev, "Bad frame_num %u\n", frame->frame_num);
> +		return -EINVAL;
> +	}
> +
> +	if (frame->y_offset & 0xFF) {
> +		dev_err(dev, "Bad y_offset 0x%X\n", frame->y_offset);
> +		return -EINVAL;
> +	}
> +
> +	if (frame->cb_offset & 0xFF) {
> +		dev_err(dev, "Bad cb_offset 0x%X\n", frame->cb_offset);
> +		return -EINVAL;
> +	}
> +
> +	if (frame->cr_offset & 0xFF) {
> +		dev_err(dev, "Bad cr_offset 0x%X\n", frame->cr_offset);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static int tegra_vde_validate_h264_ctx(struct device *dev,
> +				       struct tegra_vde_h264_decoder_ctx *ctx)
> +{
> +	if (ctx->dpb_frames_nb == 0 || ctx->dpb_frames_nb > 17) {
> +		dev_err(dev, "Bad DPB size %u\n", ctx->dpb_frames_nb);
> +		return -EINVAL;
> +	}
> +
> +	if (ctx->level_idc > 15) {
> +		dev_err(dev, "Bad level value %u\n", ctx->level_idc);
> +		return -EINVAL;
> +	}
> +
> +	if (ctx->pic_init_qp > 52) {
> +		dev_err(dev, "Bad pic_init_qp value %u\n", ctx->pic_init_qp);
> +		return -EINVAL;
> +	}
> +
> +	if (ctx->log2_max_pic_order_cnt_lsb > 16) {
> +		dev_err(dev, "Bad log2_max_pic_order_cnt_lsb value %u\n",
> +			ctx->log2_max_pic_order_cnt_lsb);
> +		return -EINVAL;
> +	}
> +
> +	if (ctx->log2_max_frame_num > 16) {
> +		dev_err(dev, "Bad log2_max_frame_num value %u\n",
> +			ctx->log2_max_frame_num);
> +		return -EINVAL;
> +	}
> +
> +	if (ctx->chroma_qp_index_offset > 31) {
> +		dev_err(dev, "Bad chroma_qp_index_offset value %u\n",
> +			ctx->chroma_qp_index_offset);
> +		return -EINVAL;
> +	}
> +
> +	if (ctx->pic_order_cnt_type > 2) {
> +		dev_err(dev, "Bad pic_order_cnt_type value %u\n",
> +			ctx->pic_order_cnt_type);
> +		return -EINVAL;
> +	}
> +
> +	if (ctx->num_ref_idx_l0_active_minus1 > 15) {
> +		dev_err(dev, "Bad num_ref_idx_l0_active_minus1 value %u\n",
> +			ctx->num_ref_idx_l0_active_minus1);
> +		return -EINVAL;
> +	}
> +
> +	if (ctx->num_ref_idx_l1_active_minus1 > 15) {
> +		dev_err(dev, "Bad num_ref_idx_l1_active_minus1 value %u\n",
> +			ctx->num_ref_idx_l1_active_minus1);
> +		return -EINVAL;
> +	}
> +
> +	if (!ctx->pic_width_in_mbs || ctx->pic_width_in_mbs > 127) {
> +		dev_err(dev, "Bad pic_width_in_mbs value %u, min 1 max 127\n",
> +			ctx->pic_width_in_mbs);
> +		return -EINVAL;
> +	}
> +
> +	if (!ctx->pic_height_in_mbs || ctx->pic_height_in_mbs > 127) {
> +		dev_err(dev, "Bad pic_height_in_mbs value %u, min 1 max 127\n",
> +			ctx->pic_height_in_mbs);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static int tegra_vde_ioctl_decode_h264(struct tegra_vde *vde,
> +				       unsigned long vaddr)
> +{
> +	struct tegra_vde_h264_decoder_ctx ctx;
> +	struct tegra_vde_h264_frame frame;
> +	struct device *dev = vde->miscdev.parent;
> +	struct video_frame *dpb_frames = NULL;
> +	struct dma_buf_attachment *bitstream_data_dmabuf_attachment = NULL;
> +	enum dma_data_direction dma_dir;
> +	phys_addr_t bitstream_data_paddr;
> +	phys_addr_t bsev_paddr;
> +	u32 bitstream_data_size;
> +	u32 macroblocks_nb;
> +	bool timeout = false;
> +	int i, ret;
> +
> +	if (copy_from_user(&ctx, (void __user *)vaddr, sizeof(ctx))) {
> +		dev_err(dev, "Copying of H.264 CTX from user failed\n");
> +		return -EFAULT;
> +	}
> +
> +	ret = tegra_vde_validate_h264_ctx(dev, &ctx);
> +	if (ret)
> +		return -EINVAL;
> +
> +	ret = tegra_vde_attach_dmabuf(dev, ctx.bitstream_data_fd,
> +				      ctx.bitstream_data_offset, 0,
> +				      &bitstream_data_dmabuf_attachment,
> +				      &bitstream_data_paddr,
> +				      &bitstream_data_size,
> +				      DMA_TO_DEVICE);
> +	if (ret)
> +		goto cleanup;


I hate this label name.  What are we cleaning up???  Now I have to set a
bookmark so I can remember where I left and then scroll down to the
bottom of the function and take a look.

[pause]

OK.  I'm back.  I call this "one err" style error handling.  And it's
very bug prone.  Please read my essay on the topic:

https://plus.google.com/106378716002406849458/posts/dnanfhQ4mHQ

The bug is that we call tegra_vde_detach_and_put_dmabuf() with a NULL
pointer and there was that dereference before check that I mentioned
earlier.

> +
> +	dpb_frames = kcalloc(ctx.dpb_frames_nb, sizeof(*dpb_frames),
> +			     GFP_KERNEL);
> +	if (!dpb_frames) {
> +		ret = -ENOMEM;
> +		goto cleanup;
> +	}
> +
> +	macroblocks_nb = ctx.pic_width_in_mbs * ctx.pic_height_in_mbs;
> +	vaddr = ctx.dpb_frames_ptr;
> +
> +	for (i = 0; i < ctx.dpb_frames_nb; i++) {
> +		ret = tegra_vde_copy_and_validate_frame(dev, &frame, vaddr);
> +		if (ret)
> +			goto cleanup;
> +
> +		dpb_frames[i].flags = frame.flags;
> +		dpb_frames[i].frame_num = frame.frame_num;
> +
> +		dma_dir = (i == 0) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
> +
> +		ret = tegra_vde_attach_frame_dmabufs(dev,
> +						     &dpb_frames[i], &frame,
> +						     dma_dir,
> +						     ctx.is_baseline_profile,
> +						     macroblocks_nb * 64);
> +		if (ret) {
> +			tegra_vde_deattach_frame_dmabufs(&dpb_frames[i]);
> +			goto cleanup;
> +		}
> +
> +		vaddr += sizeof(frame);
> +	}
> +
> +	ret = clk_prepare_enable(vde->clk);
> +	if (ret) {
> +		dev_err(dev, "Failed to enable clk: %d\n", ret);
> +		goto cleanup;
> +	}
> +
> +	ret = mutex_lock_interruptible(&vde->lock);
> +	if (ret)
> +		goto clkgate;
> +
> +	ret = reset_control_deassert(vde->rst);
> +	if (ret) {
> +		dev_err(dev, "Failed to deassert reset: %d\n", ret);
> +		clk_disable_unprepare(vde->clk);

We do a second clk_disable_unprepare(vde->clk); after the unlock.
Delete this?

> +		goto unlock;
> +	}
> +
> +	ret = tegra_vde_setup_context(vde, &ctx, dpb_frames,
> +				      bitstream_data_paddr,
> +				      bitstream_data_size,
> +				      macroblocks_nb);
> +	if (ret)
> +		goto reset;
> +
> +	tegra_vde_decode_frame(vde, macroblocks_nb);
> +
> +	timeout = !wait_for_completion_io_timeout(&vde->decode_completion,
> +						  TEGRA_VDE_TIMEOUT);
> +	if (timeout) {
> +		bsev_paddr = readl(vde->regs + BSEV(0x10));
> +		macroblocks_nb = readl(vde->regs + SXE(0xC8)) & 0x1FFF;
> +
> +		dev_err(dev, "Decoding failed, "
> +				"read 0x%X bytes : %u macroblocks parsed\n",
> +			bsev_paddr ? bsev_paddr - bitstream_data_paddr : 0,
> +			macroblocks_nb);
> +	}
> +
> +reset:
> +	/*
> +	 * We rely on the VDE registers reset value, otherwise VDE would
> +	 * cause bus lockup.
> +	 */
> +	ret = reset_control_assert(vde->rst);
> +	if (ret)
> +		dev_err(dev, "Failed to assert reset: %d\n", ret);

We're overwriting "ret" here when we probably want to preserve the error
code from reset_control_deassert().

> +
> +	if (timeout)
> +		ret = -EIO;
> +
> +unlock:
> +	mutex_unlock(&vde->lock);
> +
> +clkgate:
> +	clk_disable_unprepare(vde->clk);
> +
> +cleanup:
> +	if (dpb_frames)
> +		while (i--)
> +			tegra_vde_deattach_frame_dmabufs(&dpb_frames[i]);
> +
> +	kfree(dpb_frames);
> +
> +	tegra_vde_detach_and_put_dmabuf(bitstream_data_dmabuf_attachment);
> +
> +	return ret;
> +}
> +
> +static long tegra_vde_unlocked_ioctl(struct file *filp,
> +				     unsigned int cmd, unsigned long arg)
> +{
> +	struct miscdevice *miscdev = filp->private_data;
> +	struct tegra_vde *vde = container_of(miscdev, struct tegra_vde,
> +					     miscdev);
> +
> +	switch (cmd) {
> +	case TEGRA_VDE_IOCTL_DECODE_H264:
> +		return tegra_vde_ioctl_decode_h264(vde, arg);
> +	}
> +
> +	dev_err(miscdev->parent, "Invalid IOCTL command %u\n", cmd);
> +
> +	return -ENOTTY;
> +}
> +
> +static const struct file_operations tegra_vde_fops = {
> +	.owner		= THIS_MODULE,
> +	.unlocked_ioctl	= tegra_vde_unlocked_ioctl,
> +};
> +
> +static irqreturn_t tegra_vde_isr(int irq, void *data)
> +{
> +	struct tegra_vde *vde = data;
> +
> +	tegra_vde_set_bits(vde->regs, 0, FRAMEID(0x208));
> +	complete(&vde->decode_completion);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static int tegra_vde_probe(struct platform_device *pdev)
> +{
> +	struct resource *res_regs, *res_iram;
> +	struct device *dev = &pdev->dev;
> +	struct tegra_vde *vde;
> +	int ret;
> +
> +	vde = devm_kzalloc(&pdev->dev, sizeof(*vde), GFP_KERNEL);
> +	if (!vde)
> +		return -ENOMEM;
> +
> +	platform_set_drvdata(pdev, vde);
> +
> +	res_regs = platform_get_resource_byname(pdev, IORESOURCE_MEM, "regs");
> +	if (!res_regs)
> +		return -ENODEV;
> +
> +	res_iram = platform_get_resource_byname(pdev, IORESOURCE_MEM, "iram");
> +	if (!res_iram)
> +		return -ENODEV;
> +
> +	vde->irq = platform_get_irq_byname(pdev, "sync-token");
> +	if (vde->irq < 0)
> +		return vde->irq;
> +
> +	vde->regs = devm_ioremap_resource(dev, res_regs);
> +	if (IS_ERR(vde->regs))
> +		return PTR_ERR(vde->regs);
> +
> +	vde->iram = devm_ioremap_resource(dev, res_iram);
> +	if (IS_ERR(vde->iram))
> +		return PTR_ERR(vde->iram);
> +
> +	vde->clk = devm_clk_get(dev, NULL);
> +	if (IS_ERR(vde->clk)) {
> +		dev_err(dev, "Could not get VDE clk\n");
> +		return PTR_ERR(vde->clk);
> +	}
> +
> +	vde->rst = devm_reset_control_get(dev, "vde");
> +	if (IS_ERR(vde->rst)) {
> +		dev_err(dev, "Could not get VDE reset\n");
> +		return PTR_ERR(vde->rst);
> +	}
> +
> +	ret = devm_request_irq(dev, vde->irq, tegra_vde_isr, IRQF_SHARED,
> +			       dev_name(dev), vde);
> +	if (ret)
> +		return -ENODEV;

Presever the error code.

> +
> +	ret = tegra_powergate_sequence_power_up(TEGRA_POWERGATE_VDEC,
> +						vde->clk, vde->rst);
> +	if (ret) {
> +		dev_err(&pdev->dev, "Failed to power up VDE unit\n");
> +		return ret;
> +	}
> +
> +	ret = reset_control_assert(vde->rst);
> +	if (ret) {
> +		dev_err(dev, "Failed to assert reset: %d\n", ret);
> +		return ret;
> +	}
> +
> +	clk_disable_unprepare(vde->clk);
> +
> +	mutex_init(&vde->lock);
> +	init_completion(&vde->decode_completion);
> +
> +	vde->iram_lists_paddr = res_iram->start;
> +	vde->miscdev.minor = MISC_DYNAMIC_MINOR;
> +	vde->miscdev.name = "tegra_vde";
> +	vde->miscdev.fops = &tegra_vde_fops;
> +	vde->miscdev.parent = dev;
> +
> +	ret = misc_register(&vde->miscdev);
> +	if (ret)
> +		return -ENODEV;

Preserve the error code.

> +
> +	return 0;
> +}
> +

regards,
dan carpenter


--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dmitry Osipenko Sept. 27, 2017, 11:28 p.m. UTC | #6
On 27.09.2017 12:45, Dan Carpenter wrote:
> I feel like this code is good enough to go into the regular kernel
> instead of staging, but I don't really know what "- properly handle
> decoding faults" means in this context.

As Greg pointed out, this patch should be cc'ed to media maintainers for a
review. I'll cc them on V2, will see what they would suggest. Yeah, probably the
decoding faults isn't a very candidate for a TODO for staging.

  Does the driver Oops all the
> time or what?
> 

Driver works fine.

> Anyway, minor comments inline.
> 

Thank you very much for the awesome review. I agree with the most of the comments.

> On Tue, Sep 26, 2017 at 01:15:42AM +0300, Dmitry Osipenko wrote:
>> diff --git a/drivers/staging/tegra-vde/Kconfig b/drivers/staging/tegra-vde/Kconfig
>> new file mode 100644
>> index 000000000000..b947c012a373
>> --- /dev/null
>> +++ b/drivers/staging/tegra-vde/Kconfig
>> @@ -0,0 +1,6 @@
>> +config TEGRA_VDE
>> +	tristate "NVIDIA Tegra20 video decoder driver"
>> +	depends on ARCH_TEGRA_2x_SOC
> 
> Could we get a || COMPILE_TEST here as well?
> 
>> +	help
>> +	    Say Y here to enable support for a NVIDIA Tegra20 video decoder
>> +	    driver.
>> diff --git a/drivers/staging/tegra-vde/Makefile b/drivers/staging/tegra-vde/Makefile
>> new file mode 100644
>> index 000000000000..e7c0df1174bf
>> --- /dev/null
>> +++ b/drivers/staging/tegra-vde/Makefile
>> @@ -0,0 +1 @@
>> +obj-$(CONFIG_TEGRA_VDE)	+= vde.o
>> diff --git a/drivers/staging/tegra-vde/TODO b/drivers/staging/tegra-vde/TODO
>> new file mode 100644
>> index 000000000000..533ddfc5190e
>> --- /dev/null
>> +++ b/drivers/staging/tegra-vde/TODO
>> @@ -0,0 +1,8 @@
>> +All TODO's require reverse-engineering to be done first, it is very
>> +unlikely that NVIDIA would ever release HW specs to public.
>> +
>> +TODO:
>> +	- properly handle decoding faults
>> +	- support more formats
> 
> Adding more formats is not a reason to put this into staging instead of
> the normal drivers/ dir.
> 

Well, it feels that the driver isn't very acceptable for the core drivers/. But
maybe it's a wrong feeling. Again, will see what media/ maintainers would suggest.

>> +static int tegra_vde_setup_context(struct tegra_vde *vde,
>> +				   struct tegra_vde_h264_decoder_ctx *ctx,
>> +				   struct video_frame *dpb_frames,
>> +				   phys_addr_t bitstream_data_paddr,
>> +				   int bitstream_data_size,
>> +				   int macroblocks_nb)
>> +{
>> +	struct device *dev = vde->miscdev.parent;
>> +	u32 value;
>> +
>> +	tegra_vde_set_bits(vde->regs,    0xA, SXE(0xF0));
>> +	tegra_vde_set_bits(vde->regs,    0xB, BSEV(CMDQUE_CONTROL));
>> +	tegra_vde_set_bits(vde->regs, 0x8002, MBE(0x50));
>> +	tegra_vde_set_bits(vde->regs,    0xA, MBE(0xA0));
>> +	tegra_vde_set_bits(vde->regs,    0xA, PPE(0x14));
>> +	tegra_vde_set_bits(vde->regs,    0xA, PPE(0x28));
>> +	tegra_vde_set_bits(vde->regs,  0xA00, MCE(0x08));
>> +	tegra_vde_set_bits(vde->regs,    0xA, TFE(0x00));
>> +	tegra_vde_set_bits(vde->regs,    0x5, VDMA(0x04));
>> +
>> +	VDE_WR(0x00000000, vde->regs + VDMA(0x1C));
>> +	VDE_WR(0x00000000, vde->regs + VDMA(0x00));
>> +	VDE_WR(0x00000007, vde->regs + VDMA(0x04));
>> +	VDE_WR(0x00000007, vde->regs + FRAMEID(0x200));
>> +	VDE_WR(0x00000005, vde->regs + TFE(0x04));
>> +	VDE_WR(0x00000000, vde->regs + MBE(0x84));
>> +	VDE_WR(0x00000010, vde->regs + SXE(0x08));
>> +	VDE_WR(0x00000150, vde->regs + SXE(0x54));
>> +	VDE_WR(0x0000054C, vde->regs + SXE(0x58));
>> +	VDE_WR(0x00000E34, vde->regs + SXE(0x5C));
>> +	VDE_WR(0x063C063C, vde->regs + MCE(0x10));
>> +	VDE_WR(0x0003FC00, vde->regs + BSEV(INTR_STATUS));
>> +	VDE_WR(0x0000150D, vde->regs + BSEV(BSE_CONFIG));
>> +	VDE_WR(0x00000100, vde->regs + BSEV(BSE_INT_ENB));
>> +	VDE_WR(0x00000000, vde->regs + BSEV(0x98));
>> +	VDE_WR(0x00000060, vde->regs + BSEV(0x9C));
>> +
>> +	memset_io(vde->iram + 512, 0, macroblocks_nb / 2);
>> +
>> +	tegra_setup_frameidx(vde->regs, dpb_frames, ctx->dpb_frames_nb,
>> +			     ctx->pic_width_in_mbs, ctx->pic_height_in_mbs);
>> +
>> +	tegra_vde_setup_iram_tables(vde->iram, dpb_frames,
>> +				    ctx->dpb_frames_nb - 1,
>> +				    ctx->dpb_ref_frames_with_earlier_poc_nb);
>> +
>> +	VDE_WR(0x00000000, vde->regs + BSEV(0x8C));
>> +	VDE_WR(bitstream_data_paddr + bitstream_data_size,
>> +	       vde->regs + BSEV(0x54));
>> +
>> +	value = ctx->pic_width_in_mbs << 11 | ctx->pic_height_in_mbs << 3;
>> +
>> +	VDE_WR(value, vde->regs + BSEV(0x88));
>> +
>> +	if (tegra_vde_wait_bsev(vde, false))
>> +		return -EIO;
>> +
>> +	if (tegra_vde_push_bsev_icmdqueue(vde, 0x800003FC, false))
> 
> Preserve the error code from tegra_vde_push_bsev_icmdqueue().  It's
> still -EIO so this doesn't affect runtime.
> 

Okay, I'll propagate error code in all other places as well.

>> +		return -EIO;
>> +
>> +	value = 0x01500000;
>> +	value |= ((vde->iram_lists_paddr + 512) >> 2) & 0xFFFF;
>> +
>> +	if (tegra_vde_push_bsev_icmdqueue(vde, value, true))
> 
> Same for all.
> 
>> +		return -EIO;
>> +
>> +	if (tegra_vde_push_bsev_icmdqueue(vde, 0x840F054C, false))
>> +		return -EIO;
>> +
>> +	if (tegra_vde_push_bsev_icmdqueue(vde, 0x80000080, false))
>> +		return -EIO;
>> +
>> +	value = 0x0E340000 | ((vde->iram_lists_paddr >> 2) & 0xFFFF);
>> +
>> +	if (tegra_vde_push_bsev_icmdqueue(vde, value, true))
>> +		return -EIO;
>> +
>> +	value = (1 << 23) | 5;
> 
> I don't like these magic numbers.
> 

I don't know what these bits do, at best I can replace them with a hex for
consistency.

>> +	value |= ctx->pic_width_in_mbs << 11;
>> +	value |= ctx->pic_height_in_mbs << 3;
>> +
>> +	VDE_WR(value, vde->regs + SXE(0x10));
>> +
>> +	value = !ctx->is_baseline_profile << 17;
>> +	value |= ctx->level_idc << 13;
>> +	value |= ctx->log2_max_pic_order_cnt_lsb << 7;
>> +	value |= ctx->pic_order_cnt_type << 5;
>> +	value |= ctx->log2_max_frame_num;
>> +
>> +	VDE_WR(value, vde->regs + SXE(0x40));
>> +
>> +	value = ctx->pic_init_qp << 25;
>> +	value |= !!(ctx->deblocking_filter_control_present_flag) << 2;
>> +	value |= !!ctx->pic_order_present_flag;
>> +
>> +	VDE_WR(value, vde->regs + SXE(0x44));
>> +
>> +	value = ctx->chroma_qp_index_offset;
>> +	value |= ctx->num_ref_idx_l0_active_minus1 << 5;
>> +	value |= ctx->num_ref_idx_l1_active_minus1 << 10;
>> +	value |= !!ctx->constrained_intra_pred_flag << 15;
>> +
>> +	VDE_WR(value, vde->regs + SXE(0x48));
>> +
>> +	value = 0x0C000000;
>> +	value |= !!(dpb_frames[0].flags & FLAG_IS_B_FRAME) << 24;
>> +
>> +	VDE_WR(value, vde->regs + SXE(0x4C));
>> +
>> +	value = 0x03800000;
>> +	value |= min(bitstream_data_size, SZ_1M);
>> +
>> +	VDE_WR(value, vde->regs + SXE(0x68));
>> +
>> +	VDE_WR(bitstream_data_paddr, vde->regs + SXE(0x6C));
>> +
>> +	value = (1 << 28) | 5;
>> +	value |= ctx->pic_width_in_mbs << 11;
>> +	value |= ctx->pic_height_in_mbs << 3;
>> +
>> +	VDE_WR(value, vde->regs + MBE(0x80));
>> +
>> +	value = 0x26800000;
>> +	value |= ctx->level_idc << 4;
>> +	value |= !ctx->is_baseline_profile << 1;
>> +	value |= !!ctx->direct_8x8_inference_flag;
>> +
>> +	VDE_WR(value, vde->regs + MBE(0x80));
>> +
>> +	VDE_WR(0xF4000001, vde->regs + MBE(0x80));
>> +	VDE_WR(0x20000000, vde->regs + MBE(0x80));
>> +	VDE_WR(0xF4000101, vde->regs + MBE(0x80));
>> +
>> +	value = 0x20000000;
>> +	value |= ctx->chroma_qp_index_offset << 8;
>> +
>> +	VDE_WR(value, vde->regs + MBE(0x80));
>> +
>> +	if (tegra_vde_setup_mbe_frame_idx(vde->regs,
>> +					  ctx->pic_order_cnt_type == 0,
>> +					  ctx->dpb_frames_nb - 1)) {
> 
> 
> Preserve the error code.
> 
>> +		dev_err(dev, "MBE frames setup failed\n");
>> +		return -EIO;
>> +	}
>> +
>> +	tegra_vde_mbe_set_0xa_reg(vde->regs, 0, 0x000009FC);
>> +	tegra_vde_mbe_set_0xa_reg(vde->regs, 2, 0xF1DEAD00);
>> +	tegra_vde_mbe_set_0xa_reg(vde->regs, 4, 0xF2DEAD00);
>> +	tegra_vde_mbe_set_0xa_reg(vde->regs, 6, 0xF3DEAD00);
>> +	tegra_vde_mbe_set_0xa_reg(vde->regs, 8, dpb_frames[0].aux_paddr);
>> +
>> +	value = 0xFC000000;
>> +	value |= !!(dpb_frames[0].flags & FLAG_IS_B_FRAME) << 2;
>> +
>> +	if (!ctx->is_baseline_profile)
>> +		value |= !!(dpb_frames[0].flags & FLAG_IS_REFERENCE) << 1;
>> +
>> +	VDE_WR(value, vde->regs + MBE(0x80));
>> +
>> +	if (tegra_vde_wait_mbe(vde->regs)) {
>> +		dev_err(dev, "MBE programming failed\n");
>> +		return -EIO;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static void tegra_vde_decode_frame(struct tegra_vde *vde, int macroblocks_nb)
>> +{
>> +	VDE_WR(0x00000001, vde->regs + BSEV(0x8C));
>> +	VDE_WR(0x20000000 | (macroblocks_nb - 1), vde->regs + SXE(0x00));
>> +}
>> +
>> +static void tegra_vde_detach_and_put_dmabuf(struct dma_buf_attachment *a)
>> +{
>> +	struct dma_buf *dmabuf = a->dmabuf;
>> +
>> +	if (IS_ERR_OR_NULL(a))
> 
> You just dereferenced this on the line before so it's too late.
> 

Indeed, good catch!

> Obviously we don't want to dereference either an error pointer or a NULL
> but I'm wondering the significance of having it be both.  Normally that
> would mean that NULL is a special case of success.  In other words,
> error pointer means the hardware is broken but NULL means it's missing
> and not required.  I guess I'm suggesting to just delete the check and
> make sure we never set this to either NULL or ERR_PTR.
> 

NULL here exactly means that attachment is missing and not required. But anyway
I'll get rid of IS_ERR_OR_NULL, thanks for the suggestion.

>> +		return;
>> +
>> +	dma_buf_detach(dmabuf, a);
>> +	dma_buf_put(dmabuf);
>> +}
>> +
>> +static int tegra_vde_attach_dmabuf(struct device *dev, int fd,
>> +				   unsigned long offset, int min_size,
>> +				   struct dma_buf_attachment **a,
>> +				   phys_addr_t *paddr, u32 *size,
>> +				   enum dma_data_direction dma_dir)
>> +{
>> +	struct dma_buf_attachment *attachment;
>> +	struct dma_buf *dmabuf;
>> +	struct sg_table *sgt;
>> +
>> +	*a = NULL;
>> +	*paddr = 0xFBDEAD00;
>> +
>> +	dmabuf = dma_buf_get(fd);
>> +	if (IS_ERR(dmabuf)) {
>> +		dev_err(dev, "Invalid dmabuf FD\n");
>> +		return PTR_ERR(dmabuf);
>> +	}
>> +
>> +	if ((u64)offset + min_size > dmabuf->size) {
>> +		dev_err(dev, "Too small dmabuf size %d @0x%lX, "
>> +			     "should be at least %d\n",
>> +			dmabuf->size, offset, min_size);
>> +		return -EINVAL;
>> +	}
>> +
>> +	attachment = dma_buf_attach(dmabuf, dev);
>> +	if (IS_ERR(attachment)) {
>> +		dev_err(dev, "Failed to attach dmabuf\n");
>> +		dma_buf_put(dmabuf);
>> +		return PTR_ERR(attachment);
>> +	}
>> +
>> +	sgt = dma_buf_map_attachment(attachment, dma_dir);
>> +	if (IS_ERR(sgt)) {
>> +		dev_err(dev, "Failed to get dmabuf sg_table\n");
>> +		dma_buf_detach(dmabuf, attachment);
>> +		dma_buf_put(dmabuf);
>> +		return PTR_ERR(sgt);
>> +	}
>> +
>> +	if (sgt->nents != 1) {
>> +		dev_err(dev, "Sparsed DMA area is unsupported\n");
> 
> s/Sparsed/Sparse/
> 

ACK.

>> +		dma_buf_unmap_attachment(attachment, sgt, dma_dir);
>> +		dma_buf_detach(dmabuf, attachment);
>> +		dma_buf_put(dmabuf);
>> +		return -EINVAL;
> 
> This function would be cleaner using gotos to unwind.  Pick the goto
> name to reflect what the goto does.  For example, here it would be:
> 

Okay.

> 	if (sgt->nents != 1) {
> 		dev_err(dev, "Sparse DMA area is unsupported\n");
> 		ret = -EINVAL;
> 		goto err_umap;
> 	}
> 
> 
> 
>> +	}
>> +
>> +	*paddr = sg_dma_address(sgt->sgl) + offset;
>> +
>> +	dma_buf_unmap_attachment(attachment, sgt, dma_dir);
>> +
>> +	*a = attachment;
>> +
>> +	if (size)
>> +		*size = dmabuf->size - offset;
>> +
>> +	return 0;
> 
> 	return 0;
> 
> err_unmap:
> 	dma_buf_unmap_attachment(attachment, sgt, dma_dir);
> err_detach:
> 	dma_buf_detach(dmabuf, attachment);
> err_put:
> 	dma_buf_put(dmabuf);
> 	return ret;
> 
>> +}
>> +
>> +static int tegra_vde_attach_frame_dmabufs(struct device *dev,
>> +					  struct video_frame *frame,
>> +					  struct tegra_vde_h264_frame *source,
>> +					  enum dma_data_direction dma_dir,
>> +					  int is_baseline_profile, int csize)
>> +{
>> +	int ret;
>> +
>> +	ret = tegra_vde_attach_dmabuf(dev, source->y_fd,
>> +				      source->y_offset, csize * 4,
>> +				      &frame->y_dmabuf_attachment,
>> +				      &frame->y_paddr, NULL, dma_dir);
>> +	if (ret)
>> +		return ret;
>> +
>> +	ret = tegra_vde_attach_dmabuf(dev, source->cb_fd,
>> +				      source->cb_offset, csize,
>> +				      &frame->cb_dmabuf_attachment,
>> +				      &frame->cb_paddr, NULL, dma_dir);
>> +	if (ret)
>> +		return ret;
>> +
>> +	ret = tegra_vde_attach_dmabuf(dev, source->cr_fd,
>> +				      source->cr_offset, csize,
>> +				      &frame->cr_dmabuf_attachment,
>> +				      &frame->cr_paddr, NULL, dma_dir);
>> +	if (ret)
>> +		return ret;
>> +
>> +	if (is_baseline_profile)
>> +		frame->aux_paddr = 0xF4DEAD00;
> 
> The handling of is_baseline_profile is strange to me.  It feels like we
> should always check it before we use ->aux_paddr but we don't ever do
> that.
> 

In a case of baseline profile, aux buffer isn't needed, HW should't use it. Aux
phys address is set to a predefined and invalid address, so that in a case of
VDE trying to use it, its invalid memory accesses would be reported in KMSG by
memory controller driver and the reported invalid addresses would be known to be
associated with the aux buffer. I'm not sure what you are meaning.

>> +	else
>> +		ret = tegra_vde_attach_dmabuf(dev, source->aux_fd,
>> +					      source->aux_offset, csize,
>> +					      &frame->aux_dmabuf_attachment,
>> +					      &frame->aux_paddr, NULL, dma_dir);
> 
> 
> This function should have some error handling to undo the earlier
> attach calls if the latter ones fail.  See below:
> 
> 

Okay.

>> +
>> +	return ret;
> 
> 	return 0;
> 
> err_detach_cr:
> 	tegra_vde_detach_and_put_dmabuf(frame->cr_dmabuf_attachment);
> err_detach_cb:
> 	tegra_vde_detach_and_put_dmabuf(frame->cb_dmabuf_attachment);
> err_detach_y:
> 	tegra_vde_detach_and_put_dmabuf(frame->y_dmabuf_attachment);
> 
> 	return ret;
> 
> 
>> +}
>> +
>> +static void tegra_vde_deattach_frame_dmabufs(struct video_frame *frame)
>> +{
>> +	tegra_vde_detach_and_put_dmabuf(frame->y_dmabuf_attachment);
>> +	tegra_vde_detach_and_put_dmabuf(frame->cb_dmabuf_attachment);
>> +	tegra_vde_detach_and_put_dmabuf(frame->cr_dmabuf_attachment);
>> +	tegra_vde_detach_and_put_dmabuf(frame->aux_dmabuf_attachment);
> 
> 
> It would be happier to write this in the reverse order so it matches
> the error handling that I wrote for you.
> 
> 

Okay.

>> +}
>> +
>> +static int tegra_vde_copy_and_validate_frame(struct device *dev,
>> +					     struct tegra_vde_h264_frame *frame,
>> +					     unsigned long vaddr)
>> +{
>> +	if (copy_from_user(frame, (void __user *)vaddr, sizeof(*frame))) {
>> +		dev_err(dev, "Copying of H.264 frame from user failed\n");
>> +		return -EFAULT;
> 
> Error message are a funny thing and different people feel different
> ways.  These can be triggered by the user so they let you spam dmesg
> but I can see how many of them would be useful.  These ones for
> copy_from_user() are not useful since we assume the programmer should
> know that stuff.  The error code seems enough.
> 

Agree.

>> +	}
>> +
>> +	if (frame->frame_num > 0x7FFFFF) {
>> +		dev_err(dev, "Bad frame_num %u\n", frame->frame_num);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (frame->y_offset & 0xFF) {
>> +		dev_err(dev, "Bad y_offset 0x%X\n", frame->y_offset);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (frame->cb_offset & 0xFF) {
>> +		dev_err(dev, "Bad cb_offset 0x%X\n", frame->cb_offset);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (frame->cr_offset & 0xFF) {
>> +		dev_err(dev, "Bad cr_offset 0x%X\n", frame->cr_offset);
>> +		return -EINVAL;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int tegra_vde_validate_h264_ctx(struct device *dev,
>> +				       struct tegra_vde_h264_decoder_ctx *ctx)
>> +{
>> +	if (ctx->dpb_frames_nb == 0 || ctx->dpb_frames_nb > 17) {
>> +		dev_err(dev, "Bad DPB size %u\n", ctx->dpb_frames_nb);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (ctx->level_idc > 15) {
>> +		dev_err(dev, "Bad level value %u\n", ctx->level_idc);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (ctx->pic_init_qp > 52) {
>> +		dev_err(dev, "Bad pic_init_qp value %u\n", ctx->pic_init_qp);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (ctx->log2_max_pic_order_cnt_lsb > 16) {
>> +		dev_err(dev, "Bad log2_max_pic_order_cnt_lsb value %u\n",
>> +			ctx->log2_max_pic_order_cnt_lsb);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (ctx->log2_max_frame_num > 16) {
>> +		dev_err(dev, "Bad log2_max_frame_num value %u\n",
>> +			ctx->log2_max_frame_num);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (ctx->chroma_qp_index_offset > 31) {
>> +		dev_err(dev, "Bad chroma_qp_index_offset value %u\n",
>> +			ctx->chroma_qp_index_offset);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (ctx->pic_order_cnt_type > 2) {
>> +		dev_err(dev, "Bad pic_order_cnt_type value %u\n",
>> +			ctx->pic_order_cnt_type);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (ctx->num_ref_idx_l0_active_minus1 > 15) {
>> +		dev_err(dev, "Bad num_ref_idx_l0_active_minus1 value %u\n",
>> +			ctx->num_ref_idx_l0_active_minus1);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (ctx->num_ref_idx_l1_active_minus1 > 15) {
>> +		dev_err(dev, "Bad num_ref_idx_l1_active_minus1 value %u\n",
>> +			ctx->num_ref_idx_l1_active_minus1);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (!ctx->pic_width_in_mbs || ctx->pic_width_in_mbs > 127) {
>> +		dev_err(dev, "Bad pic_width_in_mbs value %u, min 1 max 127\n",
>> +			ctx->pic_width_in_mbs);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (!ctx->pic_height_in_mbs || ctx->pic_height_in_mbs > 127) {
>> +		dev_err(dev, "Bad pic_height_in_mbs value %u, min 1 max 127\n",
>> +			ctx->pic_height_in_mbs);
>> +		return -EINVAL;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int tegra_vde_ioctl_decode_h264(struct tegra_vde *vde,
>> +				       unsigned long vaddr)
>> +{
>> +	struct tegra_vde_h264_decoder_ctx ctx;
>> +	struct tegra_vde_h264_frame frame;
>> +	struct device *dev = vde->miscdev.parent;
>> +	struct video_frame *dpb_frames = NULL;
>> +	struct dma_buf_attachment *bitstream_data_dmabuf_attachment = NULL;
>> +	enum dma_data_direction dma_dir;
>> +	phys_addr_t bitstream_data_paddr;
>> +	phys_addr_t bsev_paddr;
>> +	u32 bitstream_data_size;
>> +	u32 macroblocks_nb;
>> +	bool timeout = false;
>> +	int i, ret;
>> +
>> +	if (copy_from_user(&ctx, (void __user *)vaddr, sizeof(ctx))) {
>> +		dev_err(dev, "Copying of H.264 CTX from user failed\n");
>> +		return -EFAULT;
>> +	}
>> +
>> +	ret = tegra_vde_validate_h264_ctx(dev, &ctx);
>> +	if (ret)
>> +		return -EINVAL;
>> +
>> +	ret = tegra_vde_attach_dmabuf(dev, ctx.bitstream_data_fd,
>> +				      ctx.bitstream_data_offset, 0,
>> +				      &bitstream_data_dmabuf_attachment,
>> +				      &bitstream_data_paddr,
>> +				      &bitstream_data_size,
>> +				      DMA_TO_DEVICE);
>> +	if (ret)
>> +		goto cleanup;
> 
> 
> I hate this label name.  What are we cleaning up???  Now I have to set a
> bookmark so I can remember where I left and then scroll down to the
> bottom of the function and take a look.
> 
> [pause]
> 
> OK.  I'm back.  I call this "one err" style error handling.  And it's
> very bug prone.  Please read my essay on the topic:
> 
> https://plus.google.com/106378716002406849458/posts/dnanfhQ4mHQ
> 
> The bug is that we call tegra_vde_detach_and_put_dmabuf() with a NULL
> pointer and there was that dereference before check that I mentioned
> earlier.
> 

Again, this NULL pointer was intentional. But yes, I messed up the
IS_ERR_OR_NULL/derefernce order and will get rid of it as you suggested. No
objections to what you wrote in the essay ;)

>> +
>> +	dpb_frames = kcalloc(ctx.dpb_frames_nb, sizeof(*dpb_frames),
>> +			     GFP_KERNEL);
>> +	if (!dpb_frames) {
>> +		ret = -ENOMEM;
>> +		goto cleanup;
>> +	}
>> +
>> +	macroblocks_nb = ctx.pic_width_in_mbs * ctx.pic_height_in_mbs;
>> +	vaddr = ctx.dpb_frames_ptr;
>> +
>> +	for (i = 0; i < ctx.dpb_frames_nb; i++) {
>> +		ret = tegra_vde_copy_and_validate_frame(dev, &frame, vaddr);
>> +		if (ret)
>> +			goto cleanup;
>> +
>> +		dpb_frames[i].flags = frame.flags;
>> +		dpb_frames[i].frame_num = frame.frame_num;
>> +
>> +		dma_dir = (i == 0) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
>> +
>> +		ret = tegra_vde_attach_frame_dmabufs(dev,
>> +						     &dpb_frames[i], &frame,
>> +						     dma_dir,
>> +						     ctx.is_baseline_profile,
>> +						     macroblocks_nb * 64);
>> +		if (ret) {
>> +			tegra_vde_deattach_frame_dmabufs(&dpb_frames[i]);
>> +			goto cleanup;
>> +		}
>> +
>> +		vaddr += sizeof(frame);
>> +	}
>> +
>> +	ret = clk_prepare_enable(vde->clk);
>> +	if (ret) {
>> +		dev_err(dev, "Failed to enable clk: %d\n", ret);
>> +		goto cleanup;
>> +	}
>> +
>> +	ret = mutex_lock_interruptible(&vde->lock);
>> +	if (ret)
>> +		goto clkgate;
>> +
>> +	ret = reset_control_deassert(vde->rst);
>> +	if (ret) {
>> +		dev_err(dev, "Failed to deassert reset: %d\n", ret);
>> +		clk_disable_unprepare(vde->clk);
> 
> We do a second clk_disable_unprepare(vde->clk); after the unlock.
> Delete this?
> 

Good catch! I've reshuffled clk managment several times before..

>> +		goto unlock;
>> +	}
>> +
>> +	ret = tegra_vde_setup_context(vde, &ctx, dpb_frames,
>> +				      bitstream_data_paddr,
>> +				      bitstream_data_size,
>> +				      macroblocks_nb);
>> +	if (ret)
>> +		goto reset;
>> +
>> +	tegra_vde_decode_frame(vde, macroblocks_nb);
>> +
>> +	timeout = !wait_for_completion_io_timeout(&vde->decode_completion,
>> +						  TEGRA_VDE_TIMEOUT);
>> +	if (timeout) {
>> +		bsev_paddr = readl(vde->regs + BSEV(0x10));
>> +		macroblocks_nb = readl(vde->regs + SXE(0xC8)) & 0x1FFF;
>> +
>> +		dev_err(dev, "Decoding failed, "
>> +				"read 0x%X bytes : %u macroblocks parsed\n",
>> +			bsev_paddr ? bsev_paddr - bitstream_data_paddr : 0,
>> +			macroblocks_nb);
>> +	}
>> +
>> +reset:
>> +	/*
>> +	 * We rely on the VDE registers reset value, otherwise VDE would
>> +	 * cause bus lockup.
>> +	 */
>> +	ret = reset_control_assert(vde->rst);
>> +	if (ret)
>> +		dev_err(dev, "Failed to assert reset: %d\n", ret);
> 
> We're overwriting "ret" here when we probably want to preserve the error
> code from reset_control_deassert().
> 

I think it doesn't really matter. It's very unlikely that the reset assert could
ever fail, it might result in a further system hang on the next decode invocation.

>> +
>> +	if (timeout)
>> +		ret = -EIO;
>> +
>> +unlock:
>> +	mutex_unlock(&vde->lock);
>> +
>> +clkgate:
>> +	clk_disable_unprepare(vde->clk);
>> +
>> +cleanup:
>> +	if (dpb_frames)
>> +		while (i--)
>> +			tegra_vde_deattach_frame_dmabufs(&dpb_frames[i]);
>> +
>> +	kfree(dpb_frames);
>> +
>> +	tegra_vde_detach_and_put_dmabuf(bitstream_data_dmabuf_attachment);
>> +
>> +	return ret;
>> +}
>> +
>> +static long tegra_vde_unlocked_ioctl(struct file *filp,
>> +				     unsigned int cmd, unsigned long arg)
>> +{
>> +	struct miscdevice *miscdev = filp->private_data;
>> +	struct tegra_vde *vde = container_of(miscdev, struct tegra_vde,
>> +					     miscdev);
>> +
>> +	switch (cmd) {
>> +	case TEGRA_VDE_IOCTL_DECODE_H264:
>> +		return tegra_vde_ioctl_decode_h264(vde, arg);
>> +	}
>> +
>> +	dev_err(miscdev->parent, "Invalid IOCTL command %u\n", cmd);
>> +
>> +	return -ENOTTY;
>> +}
>> +
>> +static const struct file_operations tegra_vde_fops = {
>> +	.owner		= THIS_MODULE,
>> +	.unlocked_ioctl	= tegra_vde_unlocked_ioctl,
>> +};
>> +
>> +static irqreturn_t tegra_vde_isr(int irq, void *data)
>> +{
>> +	struct tegra_vde *vde = data;
>> +
>> +	tegra_vde_set_bits(vde->regs, 0, FRAMEID(0x208));
>> +	complete(&vde->decode_completion);
>> +
>> +	return IRQ_HANDLED;
>> +}
>> +
>> +static int tegra_vde_probe(struct platform_device *pdev)
>> +{
>> +	struct resource *res_regs, *res_iram;
>> +	struct device *dev = &pdev->dev;
>> +	struct tegra_vde *vde;
>> +	int ret;
>> +
>> +	vde = devm_kzalloc(&pdev->dev, sizeof(*vde), GFP_KERNEL);
>> +	if (!vde)
>> +		return -ENOMEM;
>> +
>> +	platform_set_drvdata(pdev, vde);
>> +
>> +	res_regs = platform_get_resource_byname(pdev, IORESOURCE_MEM, "regs");
>> +	if (!res_regs)
>> +		return -ENODEV;
>> +
>> +	res_iram = platform_get_resource_byname(pdev, IORESOURCE_MEM, "iram");
>> +	if (!res_iram)
>> +		return -ENODEV;
>> +
>> +	vde->irq = platform_get_irq_byname(pdev, "sync-token");
>> +	if (vde->irq < 0)
>> +		return vde->irq;
>> +
>> +	vde->regs = devm_ioremap_resource(dev, res_regs);
>> +	if (IS_ERR(vde->regs))
>> +		return PTR_ERR(vde->regs);
>> +
>> +	vde->iram = devm_ioremap_resource(dev, res_iram);
>> +	if (IS_ERR(vde->iram))
>> +		return PTR_ERR(vde->iram);
>> +
>> +	vde->clk = devm_clk_get(dev, NULL);
>> +	if (IS_ERR(vde->clk)) {
>> +		dev_err(dev, "Could not get VDE clk\n");
>> +		return PTR_ERR(vde->clk);
>> +	}
>> +
>> +	vde->rst = devm_reset_control_get(dev, "vde");
>> +	if (IS_ERR(vde->rst)) {
>> +		dev_err(dev, "Could not get VDE reset\n");
>> +		return PTR_ERR(vde->rst);
>> +	}
>> +
>> +	ret = devm_request_irq(dev, vde->irq, tegra_vde_isr, IRQF_SHARED,
>> +			       dev_name(dev), vde);
>> +	if (ret)
>> +		return -ENODEV;
> 
> Presever the error code.
> 
>> +
>> +	ret = tegra_powergate_sequence_power_up(TEGRA_POWERGATE_VDEC,
>> +						vde->clk, vde->rst);
>> +	if (ret) {
>> +		dev_err(&pdev->dev, "Failed to power up VDE unit\n");
>> +		return ret;
>> +	}
>> +
>> +	ret = reset_control_assert(vde->rst);
>> +	if (ret) {
>> +		dev_err(dev, "Failed to assert reset: %d\n", ret);
>> +		return ret;
>> +	}
>> +
>> +	clk_disable_unprepare(vde->clk);
>> +
>> +	mutex_init(&vde->lock);
>> +	init_completion(&vde->decode_completion);
>> +
>> +	vde->iram_lists_paddr = res_iram->start;
>> +	vde->miscdev.minor = MISC_DYNAMIC_MINOR;
>> +	vde->miscdev.name = "tegra_vde";
>> +	vde->miscdev.fops = &tegra_vde_fops;
>> +	vde->miscdev.parent = dev;
>> +
>> +	ret = misc_register(&vde->miscdev);
>> +	if (ret)
>> +		return -ENODEV;
> 
> Preserve the error code.
> 
>> +
>> +	return 0;
>> +}
>> +
>
Dmitry Osipenko Sept. 27, 2017, 11:31 p.m. UTC | #7
On 27.09.2017 12:45, Dan Carpenter wrote:
>> --- /dev/null
>> +++ b/drivers/staging/tegra-vde/Kconfig
>> @@ -0,0 +1,6 @@
>> +config TEGRA_VDE
>> +	tristate "NVIDIA Tegra20 video decoder driver"
>> +	depends on ARCH_TEGRA_2x_SOC
> 
> Could we get a || COMPILE_TEST here as well?
> 

Good point, I'll address it in V2.
Dan Carpenter Sept. 28, 2017, 7:23 a.m. UTC | #8
On Thu, Sep 28, 2017 at 02:28:04AM +0300, Dmitry Osipenko wrote:
> >> +	if (is_baseline_profile)
> >> +		frame->aux_paddr = 0xF4DEAD00;
> > 
> > The handling of is_baseline_profile is strange to me.  It feels like we
> > should always check it before we use ->aux_paddr but we don't ever do
> > that.
> > 
> 
> In a case of baseline profile, aux buffer isn't needed, HW should't use it. Aux
> phys address is set to a predefined and invalid address, so that in a case of
> VDE trying to use it, its invalid memory accesses would be reported in KMSG by
> memory controller driver and the reported invalid addresses would be known to be
> associated with the aux buffer. I'm not sure what you are meaning.

It's not used perhaps, but we do write it to the hardware, right?

	tegra_vde_write_iram_entry(iram_tables, 0, i, value, aux_paddr);

It's just strange.

regards,
dan carpenter
--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dmitry Osipenko Sept. 28, 2017, 11:37 a.m. UTC | #9
On 28.09.2017 10:23, Dan Carpenter wrote:
> On Thu, Sep 28, 2017 at 02:28:04AM +0300, Dmitry Osipenko wrote:
>>>> +	if (is_baseline_profile)
>>>> +		frame->aux_paddr = 0xF4DEAD00;
>>>
>>> The handling of is_baseline_profile is strange to me.  It feels like we
>>> should always check it before we use ->aux_paddr but we don't ever do
>>> that.
>>>
>>
>> In a case of baseline profile, aux buffer isn't needed, HW should't use it. Aux
>> phys address is set to a predefined and invalid address, so that in a case of
>> VDE trying to use it, its invalid memory accesses would be reported in KMSG by
>> memory controller driver and the reported invalid addresses would be known to be
>> associated with the aux buffer. I'm not sure what you are meaning.
> 
> It's not used perhaps, but we do write it to the hardware, right?
> 

That's right. I'm pretty sure HW won't use the aux in a case of baseline
profile, haven't seen an evidence of it. But in order to be on a safe side, the
addresses are initialized to an invalid value, so HW won't have a chance to
silently fetch/trash 'arbitrary' main memory and we will know if it tries to do it.

	if (baseline_profile)
		frame->aux_paddr = 0xF4DEAD00;
	else {

So here ^ all *used* frame entries are being initialized.

> 	tegra_vde_write_iram_entry(iram_tables, 0, i, value, aux_paddr);
> 
> It's just strange.
> 

There up to 16 reference video frames (already decoded) that could be used for
decoding of a video frame. Addresses of reference frames that shouldn't be used
for decoding of the current frame are set to an invalid address. Userspace my
supply wrong frames list or frames list setup code may contain an obscure bug
and we will know about it.

	} else {
		aux_paddr = 0xFADEAD00;
		value = 0;
	}

Here ^ all *unused* frame entries are being initialized.

	tegra_vde_write_iram_entry(iram_tables, 0, i, value, aux_paddr);

And here ^ these entries are written to the tables that are read by HW.
diff mbox series

Patch

diff --git a/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt
new file mode 100644
index 000000000000..e5ca6f5e96c0
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt
@@ -0,0 +1,38 @@ 
+NVIDIA Tegra Video Decoder Engine
+
+Required properties:
+- compatible : "nvidia,tegra20-vde"
+- reg : Must contain 2 register ranges: registers and IRAM area.
+- reg-names : Must include the following entries:
+  - regs
+  - iram
+- interrupts : Must contain an entry for each entry in interrupt-names.
+- interrupt-names : Must include the following entries:
+  - ucq-error
+  - sync-token
+  - bsev
+  - bsea
+  - sxe
+- clocks : Must contain one entry, for the module clock.
+  See ../clocks/clock-bindings.txt for details.
+- resets : Must contain an entry for each entry in reset-names.
+  See ../reset/reset.txt for details.
+- reset-names : Must include the following entries:
+  - vde
+
+Example:
+	vde@6001a000 {
+		compatible = "nvidia,tegra20-vde";
+		reg = <0x6001a000 0x3D00    /* VDE registers */
+		       0x40000400 0x3FC00>; /* IRAM area */
+		reg-names = "regs", "iram";
+		interrupts = <GIC_SPI  8 IRQ_TYPE_LEVEL_HIGH>, /* UCQ error interrupt */
+			     <GIC_SPI  9 IRQ_TYPE_LEVEL_HIGH>, /* Sync token interrupt */
+			     <GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>, /* BSE-V interrupt */
+			     <GIC_SPI 11 IRQ_TYPE_LEVEL_HIGH>, /* BSE-A interrupt */
+			     <GIC_SPI 12 IRQ_TYPE_LEVEL_HIGH>; /* SXE interrupt */
+		interrupt-names = "ucq-error", "sync-token", "bsev", "bsea", "sxe";
+		clocks = <&tegra_car TEGRA20_CLK_VDE>;
+		resets = <&tegra_car 61>;
+		reset-names = "vde";
+	};
diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
index 554683912cff..10c982811093 100644
--- a/drivers/staging/Kconfig
+++ b/drivers/staging/Kconfig
@@ -118,4 +118,6 @@  source "drivers/staging/vboxvideo/Kconfig"
 
 source "drivers/staging/pi433/Kconfig"
 
+source "drivers/staging/tegra-vde/Kconfig"
+
 endif # STAGING
diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index 8951c37d8d80..d07c167d5773 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile
@@ -49,3 +49,4 @@  obj-$(CONFIG_BCM2835_VCHIQ)	+= vc04_services/
 obj-$(CONFIG_CRYPTO_DEV_CCREE)	+= ccree/
 obj-$(CONFIG_DRM_VBOXVIDEO)	+= vboxvideo/
 obj-$(CONFIG_PI433)		+= pi433/
+obj-$(CONFIG_ARCH_TEGRA)	+= tegra-vde/
diff --git a/drivers/staging/tegra-vde/Kconfig b/drivers/staging/tegra-vde/Kconfig
new file mode 100644
index 000000000000..b947c012a373
--- /dev/null
+++ b/drivers/staging/tegra-vde/Kconfig
@@ -0,0 +1,6 @@ 
+config TEGRA_VDE
+	tristate "NVIDIA Tegra20 video decoder driver"
+	depends on ARCH_TEGRA_2x_SOC
+	help
+	    Say Y here to enable support for a NVIDIA Tegra20 video decoder
+	    driver.
diff --git a/drivers/staging/tegra-vde/Makefile b/drivers/staging/tegra-vde/Makefile
new file mode 100644
index 000000000000..e7c0df1174bf
--- /dev/null
+++ b/drivers/staging/tegra-vde/Makefile
@@ -0,0 +1 @@ 
+obj-$(CONFIG_TEGRA_VDE)	+= vde.o
diff --git a/drivers/staging/tegra-vde/TODO b/drivers/staging/tegra-vde/TODO
new file mode 100644
index 000000000000..533ddfc5190e
--- /dev/null
+++ b/drivers/staging/tegra-vde/TODO
@@ -0,0 +1,8 @@ 
+All TODO's require reverse-engineering to be done first, it is very
+unlikely that NVIDIA would ever release HW specs to public.
+
+TODO:
+	- properly handle decoding faults
+	- support more formats
+
+Contact: Dmitry Osipenko <digetx@gmail.com>
diff --git a/drivers/staging/tegra-vde/uapi.h b/drivers/staging/tegra-vde/uapi.h
new file mode 100644
index 000000000000..4e60449f688e
--- /dev/null
+++ b/drivers/staging/tegra-vde/uapi.h
@@ -0,0 +1,99 @@ 
+/*
+ * Copyright (C) 2016-2017 Dmitry Osipenko <digetx@gmail.com>
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * VA LINUX SYSTEMS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef _UAPI_TEGRA_VDE_H_
+#define _UAPI_TEGRA_VDE_H_
+
+#include <linux/types.h>
+#include <asm/ioctl.h>
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+#define FLAG_IS_B_FRAME		(1 << 0)
+#define FLAG_IS_REFERENCE	(1 << 1)
+
+struct tegra_vde_h264_frame {
+	__s32 y_fd;
+	__s32 cb_fd;
+	__s32 cr_fd;
+	__s32 aux_fd;
+	__u32 y_offset;
+	__u32 cb_offset;
+	__u32 cr_offset;
+	__u32 aux_offset;
+	__u32 frame_num;
+	__u32 flags;
+
+	__u32 reserved;
+} __attribute__((packed));
+
+struct tegra_vde_h264_decoder_ctx {
+	__s32 bitstream_data_fd;
+	__u32 bitstream_data_offset;
+
+	__u32 dpb_frames_ptr;
+	__u8  dpb_frames_nb;
+	__u8  dpb_ref_frames_with_earlier_poc_nb;
+
+	// SPS
+	__u8  is_baseline_profile;
+	__u8  level_idc;
+	__u8  log2_max_pic_order_cnt_lsb;
+	__u8  log2_max_frame_num;
+	__u8  pic_order_cnt_type;
+	__u8  direct_8x8_inference_flag;
+	__u8  pic_width_in_mbs;
+	__u8  pic_height_in_mbs;
+
+	// PPS
+	__u8  pic_init_qp;
+	__u8  deblocking_filter_control_present_flag;
+	__u8  constrained_intra_pred_flag;
+	__u8  chroma_qp_index_offset;
+	__u8  pic_order_present_flag;
+
+	// Slice header
+	__u8  num_ref_idx_l0_active_minus1;
+	__u8  num_ref_idx_l1_active_minus1;
+
+	__u32 reserved;
+} __attribute__((packed));
+
+#define VDE_IOCTL_BASE			'v'
+#define VDE_IO(nr)			_IO(VDE_IOCTL_BASE,nr)
+#define VDE_IOR(nr,type)		_IOR(VDE_IOCTL_BASE,nr,type)
+#define VDE_IOW(nr,type)		_IOW(VDE_IOCTL_BASE,nr,type)
+#define VDE_IOWR(nr,type)		_IOWR(VDE_IOCTL_BASE,nr,type)
+
+#define TEGRA_VDE_DECODE_H264		0x01
+
+#define TEGRA_VDE_IOCTL_DECODE_H264	VDE_IOW(VDE_IOCTL_BASE + TEGRA_VDE_DECODE_H264, struct tegra_vde_h264_decoder_ctx)
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif // _UAPI_TEGRA_VDE_H_
diff --git a/drivers/staging/tegra-vde/vde.c b/drivers/staging/tegra-vde/vde.c
new file mode 100644
index 000000000000..309d87926e21
--- /dev/null
+++ b/drivers/staging/tegra-vde/vde.c
@@ -0,0 +1,971 @@ 
+/*
+ * NVIDIA Tegra20 Video decoder driver
+ *
+ * Copyright (C) 2016-2017 Dmitry Osipenko <digetx@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/clk.h>
+#include <linux/delay.h>
+#include <linux/dma-buf.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/iopoll.h>
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/reset.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include <soc/tegra/pmc.h>
+
+#include "uapi.h"
+
+#define SXE(offt)		(0x0000 + (offt))
+#define BSEV(offt)		(0x1000 + (offt))
+#define MBE(offt)		(0x2000 + (offt))
+#define PPE(offt)		(0x2200 + (offt))
+#define MCE(offt)		(0x2400 + (offt))
+#define TFE(offt)		(0x2600 + (offt))
+#define VDMA(offt)		(0x2A00 + (offt))
+#define FRAMEID(offt)		(0x3800 + (offt))
+
+#define ICMDQUE_WR		0x00
+#define CMDQUE_CONTROL		0x08
+#define INTR_STATUS		0x18
+#define BSE_INT_ENB		0x40
+#define BSE_CONFIG		0x44
+
+#define BSE_ICMDQUE_EMPTY	BIT(3)
+#define BSE_DMA_BUSY		BIT(23)
+
+#define TEGRA_VDE_TIMEOUT	(msecs_to_jiffies(1000))
+
+#define VDE_WR(data, addr)				\
+do {							\
+	pr_debug("%s: %d: 0x%08X => " #addr ")\n",	\
+		  __func__, __LINE__, (data));		\
+	writel_relaxed(data, addr);			\
+} while (0)
+
+struct video_frame {
+	struct dma_buf_attachment *y_dmabuf_attachment;
+	struct dma_buf_attachment *cb_dmabuf_attachment;
+	struct dma_buf_attachment *cr_dmabuf_attachment;
+	struct dma_buf_attachment *aux_dmabuf_attachment;
+	dma_addr_t y_paddr;
+	dma_addr_t cb_paddr;
+	dma_addr_t cr_paddr;
+	dma_addr_t aux_paddr;
+	u32 frame_num;
+	u32 flags;
+};
+
+struct tegra_vde {
+	void __iomem *regs;
+	void __iomem *iram;
+	struct mutex lock;
+	struct miscdevice miscdev;
+	struct reset_control *rst;
+	struct clk *clk;
+	struct completion decode_completion;
+	phys_addr_t iram_lists_paddr;
+	int irq;
+};
+
+static void tegra_vde_set_bits(void __iomem *regs, u32 mask, u32 offset)
+{
+	u32 value = readl_relaxed(regs + offset);
+
+	VDE_WR(value | mask, regs + offset);
+}
+
+static int tegra_vde_wait_mbe(void __iomem *regs)
+{
+	u32 tmp;
+
+	return readl_poll_timeout(regs + MBE(0x8C), tmp, (tmp >= 0x10), 1, 100);
+}
+
+static int tegra_vde_setup_mbe_frame_idx(void __iomem *regs,
+					 int setup_refs, int refs_nb)
+{
+	u32 frame_idx_enb_mask = 0;
+	u32 value;
+	int frame_idx;
+	int idx;
+
+	VDE_WR(0xD0000000 | (0 << 23), regs + MBE(0x80));
+	VDE_WR(0xD0200000 | (0 << 23), regs + MBE(0x80));
+
+	if (tegra_vde_wait_mbe(regs))
+		return -EIO;
+
+	if (!setup_refs)
+		return 0;
+
+	for (idx = 0, frame_idx = 1; idx < refs_nb; idx++, frame_idx++) {
+		VDE_WR(0xD0000000 | (frame_idx << 23), regs + MBE(0x80));
+		VDE_WR(0xD0200000 | (frame_idx << 23), regs + MBE(0x80));
+
+		frame_idx_enb_mask |= frame_idx << (6 * (idx % 4));
+
+		if (idx % 4 == 3 || idx == refs_nb - 1) {
+			value = 0xC0000000;
+			value |= (idx >> 2) << 24;
+			value |= frame_idx_enb_mask;
+
+			VDE_WR(value, regs + MBE(0x80));
+
+			if (tegra_vde_wait_mbe(regs))
+				return -EIO;
+
+			frame_idx_enb_mask = 0;
+		}
+	}
+
+	return 0;
+}
+
+static void tegra_vde_mbe_set_0xa_reg(void __iomem *regs, int reg, u32 val)
+{
+	VDE_WR(0xA0000000 | (reg << 24) | (val & 0xFFFF), regs + MBE(0x80));
+	VDE_WR(0xA0000000 | ((reg + 1) << 24) | (val >> 16), regs + MBE(0x80));
+}
+
+static int tegra_vde_wait_bsev(struct tegra_vde *vde, bool wait_dma)
+{
+	struct device *dev = vde->miscdev.parent;
+	u32 polled;
+	int ret;
+
+	ret = readl_poll_timeout(vde->regs + BSEV(INTR_STATUS), polled,
+				 !(polled & BIT(2)), 1, 100);
+	if (ret) {
+		dev_err(dev, "BSEV unknown bit timeout\n");
+		return -EIO;
+	}
+
+	ret = readl_poll_timeout(vde->regs + BSEV(INTR_STATUS), polled,
+				 (polled & BSE_ICMDQUE_EMPTY), 1, 100);
+	if (ret) {
+		dev_err(dev, "BSEV ICMDQUE flush timeout\n");
+		return -EIO;
+	}
+
+	if (!wait_dma)
+		return 0;
+
+	ret = readl_poll_timeout(vde->regs + BSEV(INTR_STATUS), polled,
+				 !(polled & BSE_DMA_BUSY), 1, 100);
+	if (ret) {
+		dev_err(dev, "BSEV DMA timeout\n");
+		return -EIO;
+	}
+
+	return 0;
+}
+
+static int tegra_vde_push_bsev_icmdqueue(struct tegra_vde *vde,
+					 u32 value, bool wait_dma)
+{
+	VDE_WR(value, vde->regs + BSEV(ICMDQUE_WR));
+
+	return tegra_vde_wait_bsev(vde, wait_dma);
+}
+
+static void tegra_vde_setup_frameid(void __iomem *regs,
+				    struct video_frame *frame, int frameid,
+				    u32 mbs_width, u32 mbs_height)
+{
+	u32 y_paddr  = frame ? frame->y_paddr  : 0xFCDEAD00;
+	u32 cb_paddr = frame ? frame->cb_paddr : 0xFCDEAD00;
+	u32 cr_paddr = frame ? frame->cr_paddr : 0xFCDEAD00;
+	u32 v1 = frame ? ((mbs_width << 16) | mbs_height) : 0;
+	u32 v2 = frame ? ((((mbs_width + 1) >> 1) << 6) | 1) : 0;
+
+	VDE_WR( y_paddr >> 8, regs + FRAMEID(0x000 + frameid * 4));
+	VDE_WR(cb_paddr >> 8, regs + FRAMEID(0x100 + frameid * 4));
+	VDE_WR(cr_paddr >> 8, regs + FRAMEID(0x180 + frameid * 4));
+	VDE_WR(v1,            regs + FRAMEID(0x080 + frameid * 4));
+	VDE_WR(v2,            regs + FRAMEID(0x280 + frameid * 4));
+}
+
+static void tegra_setup_frameidx(void __iomem *regs,
+				 struct video_frame *frames, int frames_nb,
+				 u32 mbs_width, u32 mbs_height)
+{
+	int idx;
+
+	for (idx = 0; idx < frames_nb; idx++)
+		tegra_vde_setup_frameid(regs, &frames[idx], idx,
+					mbs_width, mbs_height);
+	for (; idx < 17; idx++)
+		tegra_vde_setup_frameid(regs, NULL, idx, 0, 0);
+}
+
+static void tegra_vde_write_iram_entry(void __iomem *tables,
+				       int table, int row,
+				       u32 value1, u32 value2)
+{
+	VDE_WR(value1, tables + 0x80 * table + row * 8);
+	VDE_WR(value2, tables + 0x80 * table + row * 8 + 4);
+}
+
+static void tegra_vde_setup_iram_tables(void __iomem *iram_tables,
+					struct video_frame *dpb_frames,
+					int ref_frames_nb,
+					int with_earlier_poc_nb)
+{
+	struct video_frame *frame;
+	u32 value, aux_paddr;
+	int with_later_poc_nb;
+	int i, k;
+
+	pr_debug("DPB: Frame 0: frame_num = %d\n", dpb_frames[0].frame_num);
+
+	pr_debug("REF L0:\n");
+
+	for (i = 0; i < 16; i++) {
+		if (i < ref_frames_nb) {
+			frame = &dpb_frames[i + 1];
+
+			aux_paddr = frame->aux_paddr;
+
+			value  = (i + 1) << 26;
+			value |= !(frame->flags & FLAG_IS_B_FRAME) << 25;
+			value |= 1 << 24;
+			value |= frame->frame_num;
+
+			pr_debug("\tFrame %d: frame_num = %d is_B_frame = %d\n",
+				 i + 1, frame->frame_num,
+				 (frame->flags & FLAG_IS_B_FRAME));
+		} else {
+			aux_paddr = 0xFADEAD00;
+			value = 0;
+		}
+
+		tegra_vde_write_iram_entry(iram_tables, 0, i, value, aux_paddr);
+		tegra_vde_write_iram_entry(iram_tables, 1, i, value, aux_paddr);
+		tegra_vde_write_iram_entry(iram_tables, 2, i, value, aux_paddr);
+		tegra_vde_write_iram_entry(iram_tables, 3, i, value, aux_paddr);
+	}
+
+	if (!(dpb_frames[0].flags & FLAG_IS_B_FRAME))
+		return;
+
+	if (with_earlier_poc_nb >= ref_frames_nb)
+		return;
+
+	with_later_poc_nb = ref_frames_nb - with_earlier_poc_nb;
+
+	pr_debug("REF L1: with_later_poc_nb %d with_earlier_poc_nb %d\n",
+		 with_later_poc_nb, with_earlier_poc_nb);
+
+	for (i = 0, k = with_earlier_poc_nb; i < with_later_poc_nb; i++, k++) {
+		frame = &dpb_frames[k + 1];
+
+		aux_paddr = frame->aux_paddr;
+
+		value  = (k + 1) << 26;
+		value |= !(frame->flags & FLAG_IS_B_FRAME) << 25;
+		value |= 1 << 24;
+		value |= frame->frame_num;
+
+		pr_debug("\tFrame %d: frame_num = %d\n",
+			 k + 1, frame->frame_num);
+
+		tegra_vde_write_iram_entry(iram_tables, 2, i, value, aux_paddr);
+	}
+
+	for (k = 0; i < ref_frames_nb; i++, k++) {
+		frame = &dpb_frames[k + 1];
+
+		aux_paddr = frame->aux_paddr;
+
+		value  = (k + 1) << 26;
+		value |= !(frame->flags & FLAG_IS_B_FRAME) << 25;
+		value |= 1 << 24;
+		value |= frame->frame_num;
+
+		pr_debug("\tFrame %d: frame_num = %d\n",
+			 k + 1, frame->frame_num);
+
+		tegra_vde_write_iram_entry(iram_tables, 2, i, value, aux_paddr);
+	}
+}
+
+static int tegra_vde_setup_context(struct tegra_vde *vde,
+				   struct tegra_vde_h264_decoder_ctx *ctx,
+				   struct video_frame *dpb_frames,
+				   phys_addr_t bitstream_data_paddr,
+				   int bitstream_data_size,
+				   int macroblocks_nb)
+{
+	struct device *dev = vde->miscdev.parent;
+	u32 value;
+
+	tegra_vde_set_bits(vde->regs,    0xA, SXE(0xF0));
+	tegra_vde_set_bits(vde->regs,    0xB, BSEV(CMDQUE_CONTROL));
+	tegra_vde_set_bits(vde->regs, 0x8002, MBE(0x50));
+	tegra_vde_set_bits(vde->regs,    0xA, MBE(0xA0));
+	tegra_vde_set_bits(vde->regs,    0xA, PPE(0x14));
+	tegra_vde_set_bits(vde->regs,    0xA, PPE(0x28));
+	tegra_vde_set_bits(vde->regs,  0xA00, MCE(0x08));
+	tegra_vde_set_bits(vde->regs,    0xA, TFE(0x00));
+	tegra_vde_set_bits(vde->regs,    0x5, VDMA(0x04));
+
+	VDE_WR(0x00000000, vde->regs + VDMA(0x1C));
+	VDE_WR(0x00000000, vde->regs + VDMA(0x00));
+	VDE_WR(0x00000007, vde->regs + VDMA(0x04));
+	VDE_WR(0x00000007, vde->regs + FRAMEID(0x200));
+	VDE_WR(0x00000005, vde->regs + TFE(0x04));
+	VDE_WR(0x00000000, vde->regs + MBE(0x84));
+	VDE_WR(0x00000010, vde->regs + SXE(0x08));
+	VDE_WR(0x00000150, vde->regs + SXE(0x54));
+	VDE_WR(0x0000054C, vde->regs + SXE(0x58));
+	VDE_WR(0x00000E34, vde->regs + SXE(0x5C));
+	VDE_WR(0x063C063C, vde->regs + MCE(0x10));
+	VDE_WR(0x0003FC00, vde->regs + BSEV(INTR_STATUS));
+	VDE_WR(0x0000150D, vde->regs + BSEV(BSE_CONFIG));
+	VDE_WR(0x00000100, vde->regs + BSEV(BSE_INT_ENB));
+	VDE_WR(0x00000000, vde->regs + BSEV(0x98));
+	VDE_WR(0x00000060, vde->regs + BSEV(0x9C));
+
+	memset_io(vde->iram + 512, 0, macroblocks_nb / 2);
+
+	tegra_setup_frameidx(vde->regs, dpb_frames, ctx->dpb_frames_nb,
+			     ctx->pic_width_in_mbs, ctx->pic_height_in_mbs);
+
+	tegra_vde_setup_iram_tables(vde->iram, dpb_frames,
+				    ctx->dpb_frames_nb - 1,
+				    ctx->dpb_ref_frames_with_earlier_poc_nb);
+
+	VDE_WR(0x00000000, vde->regs + BSEV(0x8C));
+	VDE_WR(bitstream_data_paddr + bitstream_data_size,
+	       vde->regs + BSEV(0x54));
+
+	value = ctx->pic_width_in_mbs << 11 | ctx->pic_height_in_mbs << 3;
+
+	VDE_WR(value, vde->regs + BSEV(0x88));
+
+	if (tegra_vde_wait_bsev(vde, false))
+		return -EIO;
+
+	if (tegra_vde_push_bsev_icmdqueue(vde, 0x800003FC, false))
+		return -EIO;
+
+	value = 0x01500000;
+	value |= ((vde->iram_lists_paddr + 512) >> 2) & 0xFFFF;
+
+	if (tegra_vde_push_bsev_icmdqueue(vde, value, true))
+		return -EIO;
+
+	if (tegra_vde_push_bsev_icmdqueue(vde, 0x840F054C, false))
+		return -EIO;
+
+	if (tegra_vde_push_bsev_icmdqueue(vde, 0x80000080, false))
+		return -EIO;
+
+	value = 0x0E340000 | ((vde->iram_lists_paddr >> 2) & 0xFFFF);
+
+	if (tegra_vde_push_bsev_icmdqueue(vde, value, true))
+		return -EIO;
+
+	value = (1 << 23) | 5;
+	value |= ctx->pic_width_in_mbs << 11;
+	value |= ctx->pic_height_in_mbs << 3;
+
+	VDE_WR(value, vde->regs + SXE(0x10));
+
+	value = !ctx->is_baseline_profile << 17;
+	value |= ctx->level_idc << 13;
+	value |= ctx->log2_max_pic_order_cnt_lsb << 7;
+	value |= ctx->pic_order_cnt_type << 5;
+	value |= ctx->log2_max_frame_num;
+
+	VDE_WR(value, vde->regs + SXE(0x40));
+
+	value = ctx->pic_init_qp << 25;
+	value |= !!(ctx->deblocking_filter_control_present_flag) << 2;
+	value |= !!ctx->pic_order_present_flag;
+
+	VDE_WR(value, vde->regs + SXE(0x44));
+
+	value = ctx->chroma_qp_index_offset;
+	value |= ctx->num_ref_idx_l0_active_minus1 << 5;
+	value |= ctx->num_ref_idx_l1_active_minus1 << 10;
+	value |= !!ctx->constrained_intra_pred_flag << 15;
+
+	VDE_WR(value, vde->regs + SXE(0x48));
+
+	value = 0x0C000000;
+	value |= !!(dpb_frames[0].flags & FLAG_IS_B_FRAME) << 24;
+
+	VDE_WR(value, vde->regs + SXE(0x4C));
+
+	value = 0x03800000;
+	value |= min(bitstream_data_size, SZ_1M);
+
+	VDE_WR(value, vde->regs + SXE(0x68));
+
+	VDE_WR(bitstream_data_paddr, vde->regs + SXE(0x6C));
+
+	value = (1 << 28) | 5;
+	value |= ctx->pic_width_in_mbs << 11;
+	value |= ctx->pic_height_in_mbs << 3;
+
+	VDE_WR(value, vde->regs + MBE(0x80));
+
+	value = 0x26800000;
+	value |= ctx->level_idc << 4;
+	value |= !ctx->is_baseline_profile << 1;
+	value |= !!ctx->direct_8x8_inference_flag;
+
+	VDE_WR(value, vde->regs + MBE(0x80));
+
+	VDE_WR(0xF4000001, vde->regs + MBE(0x80));
+	VDE_WR(0x20000000, vde->regs + MBE(0x80));
+	VDE_WR(0xF4000101, vde->regs + MBE(0x80));
+
+	value = 0x20000000;
+	value |= ctx->chroma_qp_index_offset << 8;
+
+	VDE_WR(value, vde->regs + MBE(0x80));
+
+	if (tegra_vde_setup_mbe_frame_idx(vde->regs,
+					  ctx->pic_order_cnt_type == 0,
+					  ctx->dpb_frames_nb - 1)) {
+		dev_err(dev, "MBE frames setup failed\n");
+		return -EIO;
+	}
+
+	tegra_vde_mbe_set_0xa_reg(vde->regs, 0, 0x000009FC);
+	tegra_vde_mbe_set_0xa_reg(vde->regs, 2, 0xF1DEAD00);
+	tegra_vde_mbe_set_0xa_reg(vde->regs, 4, 0xF2DEAD00);
+	tegra_vde_mbe_set_0xa_reg(vde->regs, 6, 0xF3DEAD00);
+	tegra_vde_mbe_set_0xa_reg(vde->regs, 8, dpb_frames[0].aux_paddr);
+
+	value = 0xFC000000;
+	value |= !!(dpb_frames[0].flags & FLAG_IS_B_FRAME) << 2;
+
+	if (!ctx->is_baseline_profile)
+		value |= !!(dpb_frames[0].flags & FLAG_IS_REFERENCE) << 1;
+
+	VDE_WR(value, vde->regs + MBE(0x80));
+
+	if (tegra_vde_wait_mbe(vde->regs)) {
+		dev_err(dev, "MBE programming failed\n");
+		return -EIO;
+	}
+
+	return 0;
+}
+
+static void tegra_vde_decode_frame(struct tegra_vde *vde, int macroblocks_nb)
+{
+	VDE_WR(0x00000001, vde->regs + BSEV(0x8C));
+	VDE_WR(0x20000000 | (macroblocks_nb - 1), vde->regs + SXE(0x00));
+}
+
+static void tegra_vde_detach_and_put_dmabuf(struct dma_buf_attachment *a)
+{
+	struct dma_buf *dmabuf = a->dmabuf;
+
+	if (IS_ERR_OR_NULL(a))
+		return;
+
+	dma_buf_detach(dmabuf, a);
+	dma_buf_put(dmabuf);
+}
+
+static int tegra_vde_attach_dmabuf(struct device *dev, int fd,
+				   unsigned long offset, int min_size,
+				   struct dma_buf_attachment **a,
+				   phys_addr_t *paddr, u32 *size,
+				   enum dma_data_direction dma_dir)
+{
+	struct dma_buf_attachment *attachment;
+	struct dma_buf *dmabuf;
+	struct sg_table *sgt;
+
+	*a = NULL;
+	*paddr = 0xFBDEAD00;
+
+	dmabuf = dma_buf_get(fd);
+	if (IS_ERR(dmabuf)) {
+		dev_err(dev, "Invalid dmabuf FD\n");
+		return PTR_ERR(dmabuf);
+	}
+
+	if ((u64)offset + min_size > dmabuf->size) {
+		dev_err(dev, "Too small dmabuf size %d @0x%lX, "
+			     "should be at least %d\n",
+			dmabuf->size, offset, min_size);
+		return -EINVAL;
+	}
+
+	attachment = dma_buf_attach(dmabuf, dev);
+	if (IS_ERR(attachment)) {
+		dev_err(dev, "Failed to attach dmabuf\n");
+		dma_buf_put(dmabuf);
+		return PTR_ERR(attachment);
+	}
+
+	sgt = dma_buf_map_attachment(attachment, dma_dir);
+	if (IS_ERR(sgt)) {
+		dev_err(dev, "Failed to get dmabuf sg_table\n");
+		dma_buf_detach(dmabuf, attachment);
+		dma_buf_put(dmabuf);
+		return PTR_ERR(sgt);
+	}
+
+	if (sgt->nents != 1) {
+		dev_err(dev, "Sparsed DMA area is unsupported\n");
+		dma_buf_unmap_attachment(attachment, sgt, dma_dir);
+		dma_buf_detach(dmabuf, attachment);
+		dma_buf_put(dmabuf);
+		return -EINVAL;
+	}
+
+	*paddr = sg_dma_address(sgt->sgl) + offset;
+
+	dma_buf_unmap_attachment(attachment, sgt, dma_dir);
+
+	*a = attachment;
+
+	if (size)
+		*size = dmabuf->size - offset;
+
+	return 0;
+}
+
+static int tegra_vde_attach_frame_dmabufs(struct device *dev,
+					  struct video_frame *frame,
+					  struct tegra_vde_h264_frame *source,
+					  enum dma_data_direction dma_dir,
+					  int is_baseline_profile, int csize)
+{
+	int ret;
+
+	ret = tegra_vde_attach_dmabuf(dev, source->y_fd,
+				      source->y_offset, csize * 4,
+				      &frame->y_dmabuf_attachment,
+				      &frame->y_paddr, NULL, dma_dir);
+	if (ret)
+		return ret;
+
+	ret = tegra_vde_attach_dmabuf(dev, source->cb_fd,
+				      source->cb_offset, csize,
+				      &frame->cb_dmabuf_attachment,
+				      &frame->cb_paddr, NULL, dma_dir);
+	if (ret)
+		return ret;
+
+	ret = tegra_vde_attach_dmabuf(dev, source->cr_fd,
+				      source->cr_offset, csize,
+				      &frame->cr_dmabuf_attachment,
+				      &frame->cr_paddr, NULL, dma_dir);
+	if (ret)
+		return ret;
+
+	if (is_baseline_profile)
+		frame->aux_paddr = 0xF4DEAD00;
+	else
+		ret = tegra_vde_attach_dmabuf(dev, source->aux_fd,
+					      source->aux_offset, csize,
+					      &frame->aux_dmabuf_attachment,
+					      &frame->aux_paddr, NULL, dma_dir);
+
+	return ret;
+}
+
+static void tegra_vde_deattach_frame_dmabufs(struct video_frame *frame)
+{
+	tegra_vde_detach_and_put_dmabuf(frame->y_dmabuf_attachment);
+	tegra_vde_detach_and_put_dmabuf(frame->cb_dmabuf_attachment);
+	tegra_vde_detach_and_put_dmabuf(frame->cr_dmabuf_attachment);
+	tegra_vde_detach_and_put_dmabuf(frame->aux_dmabuf_attachment);
+}
+
+static int tegra_vde_copy_and_validate_frame(struct device *dev,
+					     struct tegra_vde_h264_frame *frame,
+					     unsigned long vaddr)
+{
+	if (copy_from_user(frame, (void __user *)vaddr, sizeof(*frame))) {
+		dev_err(dev, "Copying of H.264 frame from user failed\n");
+		return -EFAULT;
+	}
+
+	if (frame->frame_num > 0x7FFFFF) {
+		dev_err(dev, "Bad frame_num %u\n", frame->frame_num);
+		return -EINVAL;
+	}
+
+	if (frame->y_offset & 0xFF) {
+		dev_err(dev, "Bad y_offset 0x%X\n", frame->y_offset);
+		return -EINVAL;
+	}
+
+	if (frame->cb_offset & 0xFF) {
+		dev_err(dev, "Bad cb_offset 0x%X\n", frame->cb_offset);
+		return -EINVAL;
+	}
+
+	if (frame->cr_offset & 0xFF) {
+		dev_err(dev, "Bad cr_offset 0x%X\n", frame->cr_offset);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int tegra_vde_validate_h264_ctx(struct device *dev,
+				       struct tegra_vde_h264_decoder_ctx *ctx)
+{
+	if (ctx->dpb_frames_nb == 0 || ctx->dpb_frames_nb > 17) {
+		dev_err(dev, "Bad DPB size %u\n", ctx->dpb_frames_nb);
+		return -EINVAL;
+	}
+
+	if (ctx->level_idc > 15) {
+		dev_err(dev, "Bad level value %u\n", ctx->level_idc);
+		return -EINVAL;
+	}
+
+	if (ctx->pic_init_qp > 52) {
+		dev_err(dev, "Bad pic_init_qp value %u\n", ctx->pic_init_qp);
+		return -EINVAL;
+	}
+
+	if (ctx->log2_max_pic_order_cnt_lsb > 16) {
+		dev_err(dev, "Bad log2_max_pic_order_cnt_lsb value %u\n",
+			ctx->log2_max_pic_order_cnt_lsb);
+		return -EINVAL;
+	}
+
+	if (ctx->log2_max_frame_num > 16) {
+		dev_err(dev, "Bad log2_max_frame_num value %u\n",
+			ctx->log2_max_frame_num);
+		return -EINVAL;
+	}
+
+	if (ctx->chroma_qp_index_offset > 31) {
+		dev_err(dev, "Bad chroma_qp_index_offset value %u\n",
+			ctx->chroma_qp_index_offset);
+		return -EINVAL;
+	}
+
+	if (ctx->pic_order_cnt_type > 2) {
+		dev_err(dev, "Bad pic_order_cnt_type value %u\n",
+			ctx->pic_order_cnt_type);
+		return -EINVAL;
+	}
+
+	if (ctx->num_ref_idx_l0_active_minus1 > 15) {
+		dev_err(dev, "Bad num_ref_idx_l0_active_minus1 value %u\n",
+			ctx->num_ref_idx_l0_active_minus1);
+		return -EINVAL;
+	}
+
+	if (ctx->num_ref_idx_l1_active_minus1 > 15) {
+		dev_err(dev, "Bad num_ref_idx_l1_active_minus1 value %u\n",
+			ctx->num_ref_idx_l1_active_minus1);
+		return -EINVAL;
+	}
+
+	if (!ctx->pic_width_in_mbs || ctx->pic_width_in_mbs > 127) {
+		dev_err(dev, "Bad pic_width_in_mbs value %u, min 1 max 127\n",
+			ctx->pic_width_in_mbs);
+		return -EINVAL;
+	}
+
+	if (!ctx->pic_height_in_mbs || ctx->pic_height_in_mbs > 127) {
+		dev_err(dev, "Bad pic_height_in_mbs value %u, min 1 max 127\n",
+			ctx->pic_height_in_mbs);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int tegra_vde_ioctl_decode_h264(struct tegra_vde *vde,
+				       unsigned long vaddr)
+{
+	struct tegra_vde_h264_decoder_ctx ctx;
+	struct tegra_vde_h264_frame frame;
+	struct device *dev = vde->miscdev.parent;
+	struct video_frame *dpb_frames = NULL;
+	struct dma_buf_attachment *bitstream_data_dmabuf_attachment = NULL;
+	enum dma_data_direction dma_dir;
+	phys_addr_t bitstream_data_paddr;
+	phys_addr_t bsev_paddr;
+	u32 bitstream_data_size;
+	u32 macroblocks_nb;
+	bool timeout = false;
+	int i, ret;
+
+	if (copy_from_user(&ctx, (void __user *)vaddr, sizeof(ctx))) {
+		dev_err(dev, "Copying of H.264 CTX from user failed\n");
+		return -EFAULT;
+	}
+
+	ret = tegra_vde_validate_h264_ctx(dev, &ctx);
+	if (ret)
+		return -EINVAL;
+
+	ret = tegra_vde_attach_dmabuf(dev, ctx.bitstream_data_fd,
+				      ctx.bitstream_data_offset, 0,
+				      &bitstream_data_dmabuf_attachment,
+				      &bitstream_data_paddr,
+				      &bitstream_data_size,
+				      DMA_TO_DEVICE);
+	if (ret)
+		goto cleanup;
+
+	dpb_frames = kcalloc(ctx.dpb_frames_nb, sizeof(*dpb_frames),
+			     GFP_KERNEL);
+	if (!dpb_frames) {
+		ret = -ENOMEM;
+		goto cleanup;
+	}
+
+	macroblocks_nb = ctx.pic_width_in_mbs * ctx.pic_height_in_mbs;
+	vaddr = ctx.dpb_frames_ptr;
+
+	for (i = 0; i < ctx.dpb_frames_nb; i++) {
+		ret = tegra_vde_copy_and_validate_frame(dev, &frame, vaddr);
+		if (ret)
+			goto cleanup;
+
+		dpb_frames[i].flags = frame.flags;
+		dpb_frames[i].frame_num = frame.frame_num;
+
+		dma_dir = (i == 0) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
+
+		ret = tegra_vde_attach_frame_dmabufs(dev,
+						     &dpb_frames[i], &frame,
+						     dma_dir,
+						     ctx.is_baseline_profile,
+						     macroblocks_nb * 64);
+		if (ret) {
+			tegra_vde_deattach_frame_dmabufs(&dpb_frames[i]);
+			goto cleanup;
+		}
+
+		vaddr += sizeof(frame);
+	}
+
+	ret = clk_prepare_enable(vde->clk);
+	if (ret) {
+		dev_err(dev, "Failed to enable clk: %d\n", ret);
+		goto cleanup;
+	}
+
+	ret = mutex_lock_interruptible(&vde->lock);
+	if (ret)
+		goto clkgate;
+
+	ret = reset_control_deassert(vde->rst);
+	if (ret) {
+		dev_err(dev, "Failed to deassert reset: %d\n", ret);
+		clk_disable_unprepare(vde->clk);
+		goto unlock;
+	}
+
+	ret = tegra_vde_setup_context(vde, &ctx, dpb_frames,
+				      bitstream_data_paddr,
+				      bitstream_data_size,
+				      macroblocks_nb);
+	if (ret)
+		goto reset;
+
+	tegra_vde_decode_frame(vde, macroblocks_nb);
+
+	timeout = !wait_for_completion_io_timeout(&vde->decode_completion,
+						  TEGRA_VDE_TIMEOUT);
+	if (timeout) {
+		bsev_paddr = readl(vde->regs + BSEV(0x10));
+		macroblocks_nb = readl(vde->regs + SXE(0xC8)) & 0x1FFF;
+
+		dev_err(dev, "Decoding failed, "
+				"read 0x%X bytes : %u macroblocks parsed\n",
+			bsev_paddr ? bsev_paddr - bitstream_data_paddr : 0,
+			macroblocks_nb);
+	}
+
+reset:
+	/*
+	 * We rely on the VDE registers reset value, otherwise VDE would
+	 * cause bus lockup.
+	 */
+	ret = reset_control_assert(vde->rst);
+	if (ret)
+		dev_err(dev, "Failed to assert reset: %d\n", ret);
+
+	if (timeout)
+		ret = -EIO;
+
+unlock:
+	mutex_unlock(&vde->lock);
+
+clkgate:
+	clk_disable_unprepare(vde->clk);
+
+cleanup:
+	if (dpb_frames)
+		while (i--)
+			tegra_vde_deattach_frame_dmabufs(&dpb_frames[i]);
+
+	kfree(dpb_frames);
+
+	tegra_vde_detach_and_put_dmabuf(bitstream_data_dmabuf_attachment);
+
+	return ret;
+}
+
+static long tegra_vde_unlocked_ioctl(struct file *filp,
+				     unsigned int cmd, unsigned long arg)
+{
+	struct miscdevice *miscdev = filp->private_data;
+	struct tegra_vde *vde = container_of(miscdev, struct tegra_vde,
+					     miscdev);
+
+	switch (cmd) {
+	case TEGRA_VDE_IOCTL_DECODE_H264:
+		return tegra_vde_ioctl_decode_h264(vde, arg);
+	}
+
+	dev_err(miscdev->parent, "Invalid IOCTL command %u\n", cmd);
+
+	return -ENOTTY;
+}
+
+static const struct file_operations tegra_vde_fops = {
+	.owner		= THIS_MODULE,
+	.unlocked_ioctl	= tegra_vde_unlocked_ioctl,
+};
+
+static irqreturn_t tegra_vde_isr(int irq, void *data)
+{
+	struct tegra_vde *vde = data;
+
+	tegra_vde_set_bits(vde->regs, 0, FRAMEID(0x208));
+	complete(&vde->decode_completion);
+
+	return IRQ_HANDLED;
+}
+
+static int tegra_vde_probe(struct platform_device *pdev)
+{
+	struct resource *res_regs, *res_iram;
+	struct device *dev = &pdev->dev;
+	struct tegra_vde *vde;
+	int ret;
+
+	vde = devm_kzalloc(&pdev->dev, sizeof(*vde), GFP_KERNEL);
+	if (!vde)
+		return -ENOMEM;
+
+	platform_set_drvdata(pdev, vde);
+
+	res_regs = platform_get_resource_byname(pdev, IORESOURCE_MEM, "regs");
+	if (!res_regs)
+		return -ENODEV;
+
+	res_iram = platform_get_resource_byname(pdev, IORESOURCE_MEM, "iram");
+	if (!res_iram)
+		return -ENODEV;
+
+	vde->irq = platform_get_irq_byname(pdev, "sync-token");
+	if (vde->irq < 0)
+		return vde->irq;
+
+	vde->regs = devm_ioremap_resource(dev, res_regs);
+	if (IS_ERR(vde->regs))
+		return PTR_ERR(vde->regs);
+
+	vde->iram = devm_ioremap_resource(dev, res_iram);
+	if (IS_ERR(vde->iram))
+		return PTR_ERR(vde->iram);
+
+	vde->clk = devm_clk_get(dev, NULL);
+	if (IS_ERR(vde->clk)) {
+		dev_err(dev, "Could not get VDE clk\n");
+		return PTR_ERR(vde->clk);
+	}
+
+	vde->rst = devm_reset_control_get(dev, "vde");
+	if (IS_ERR(vde->rst)) {
+		dev_err(dev, "Could not get VDE reset\n");
+		return PTR_ERR(vde->rst);
+	}
+
+	ret = devm_request_irq(dev, vde->irq, tegra_vde_isr, IRQF_SHARED,
+			       dev_name(dev), vde);
+	if (ret)
+		return -ENODEV;
+
+	ret = tegra_powergate_sequence_power_up(TEGRA_POWERGATE_VDEC,
+						vde->clk, vde->rst);
+	if (ret) {
+		dev_err(&pdev->dev, "Failed to power up VDE unit\n");
+		return ret;
+	}
+
+	ret = reset_control_assert(vde->rst);
+	if (ret) {
+		dev_err(dev, "Failed to assert reset: %d\n", ret);
+		return ret;
+	}
+
+	clk_disable_unprepare(vde->clk);
+
+	mutex_init(&vde->lock);
+	init_completion(&vde->decode_completion);
+
+	vde->iram_lists_paddr = res_iram->start;
+	vde->miscdev.minor = MISC_DYNAMIC_MINOR;
+	vde->miscdev.name = "tegra_vde";
+	vde->miscdev.fops = &tegra_vde_fops;
+	vde->miscdev.parent = dev;
+
+	ret = misc_register(&vde->miscdev);
+	if (ret)
+		return -ENODEV;
+
+	return 0;
+}
+
+static int tegra_vde_remove(struct platform_device *pdev)
+{
+	struct tegra_vde *vde = platform_get_drvdata(pdev);
+
+	misc_deregister(&vde->miscdev);
+
+	return 0;
+}
+
+static const struct of_device_id tegra_vde_of_match[] = {
+	{ .compatible = "nvidia,tegra20-vde", },
+	{ },
+};
+MODULE_DEVICE_TABLE(of, tegra_vde_of_match);
+
+static struct platform_driver tegra_vde_driver = {
+	.probe		= tegra_vde_probe,
+	.remove		= tegra_vde_remove,
+	.driver		= {
+		.name		= "tegra-vde",
+		.of_match_table = tegra_vde_of_match,
+	},
+};
+module_platform_driver(tegra_vde_driver);
+
+MODULE_DESCRIPTION("NVIDIA Tegra20 Video Decoder driver");
+MODULE_AUTHOR("Dmitry Osipenko");
+MODULE_LICENSE("GPL");