Message ID | 5c8b83775b982e6ee851c127444a8e839f422ad0.1506377430.git.digetx@gmail.com |
---|---|
State | Deferred |
Headers | show |
Series | NVIDIA Tegra20 video decoder driver | expand |
On 09/25/2017 04:15 PM, Dmitry Osipenko wrote: > Video decoder, found on NVIDIA Tegra20 SoC, supports a standard set of > video formats like H.264 / MPEG-4 / WMV / VC1. Currently driver supports > decoding of CAVLC H.264 only. Note: I don't know anything much about video decoding on Tegra (just NV desktop GPUs, and that was a while ago), but I had a couple small comments on the DT binding: > diff --git a/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt > +NVIDIA Tegra Video Decoder Engine > + > +Required properties: > +- compatible : "nvidia,tegra20-vde" > +- reg : Must contain 2 register ranges: registers and IRAM area. > +- reg-names : Must include the following entries: > + - regs > + - iram I think the IRAM region needs more explanation: What is the region used for and by what? Can it be moved, and if so does the move need to be co-ordinated with any other piece of SW? > +- clocks : Must contain one entry, for the module clock. > + See ../clocks/clock-bindings.txt for details. > +- resets : Must contain an entry for each entry in reset-names. > + See ../reset/reset.txt for details. > +- reset-names : Must include the following entries: > + - vde Let's require a clock-names property too. -- To unsubscribe from this list: send the line "unsubscribe linux-tegra" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 26.09.2017 02:01, Stephen Warren wrote: > On 09/25/2017 04:15 PM, Dmitry Osipenko wrote: >> Video decoder, found on NVIDIA Tegra20 SoC, supports a standard set of >> video formats like H.264 / MPEG-4 / WMV / VC1. Currently driver supports >> decoding of CAVLC H.264 only. > > Note: I don't know anything much about video decoding on Tegra (just NV desktop > GPUs, and that was a while ago), but I had a couple small comments on the DT > binding: > >> diff --git >> a/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt >> b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt > >> +NVIDIA Tegra Video Decoder Engine >> + >> +Required properties: >> +- compatible : "nvidia,tegra20-vde" >> +- reg : Must contain 2 register ranges: registers and IRAM area. >> +- reg-names : Must include the following entries: >> + - regs >> + - iram > > I think the IRAM region needs more explanation: What is the region used for and > by what? Can it be moved, and if so does the move need to be co-ordinated with > any other piece of SW? > IRAM region is used by Video Decoder HW for internal use and some of decoding parameters are supplied via IRAM, like frames order list. AFAIK IRAM addresses are hardwired in HW and aren't movable, it is not 100% but I'm pretty sure. Should it be explained in the binding? >> +- clocks : Must contain one entry, for the module clock. >> + See ../clocks/clock-bindings.txt for details. >> +- resets : Must contain an entry for each entry in reset-names. >> + See ../reset/reset.txt for details. >> +- reset-names : Must include the following entries: >> + - vde > > Let's require a clock-names property too. Okay, I'll add this property to the binding.
On 09/25/2017 05:45 PM, Dmitry Osipenko wrote: > On 26.09.2017 02:01, Stephen Warren wrote: >> On 09/25/2017 04:15 PM, Dmitry Osipenko wrote: >>> Video decoder, found on NVIDIA Tegra20 SoC, supports a standard set of >>> video formats like H.264 / MPEG-4 / WMV / VC1. Currently driver supports >>> decoding of CAVLC H.264 only. >> >> Note: I don't know anything much about video decoding on Tegra (just NV desktop >> GPUs, and that was a while ago), but I had a couple small comments on the DT >> binding: >> >>> diff --git >>> a/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt >>> b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt >> >>> +NVIDIA Tegra Video Decoder Engine >>> + >>> +Required properties: >>> +- compatible : "nvidia,tegra20-vde" >>> +- reg : Must contain 2 register ranges: registers and IRAM area. >>> +- reg-names : Must include the following entries: >>> + - regs >>> + - iram >> >> I think the IRAM region needs more explanation: What is the region used for and >> by what? Can it be moved, and if so does the move need to be co-ordinated with >> any other piece of SW? > > IRAM region is used by Video Decoder HW for internal use and some of decoding > parameters are supplied via IRAM, like frames order list. AFAIK IRAM addresses > are hardwired in HW and aren't movable, it is not 100% but I'm pretty sure. > Should it be explained in the binding? I think this should be briefly mentioned, yes. Otherwise at least people who don't know the VDE HW well (like me) will wonder why on earth VDE interacts with IRAM at all. I would have assumed all parameters were supplied via registers or via descriptors in DRAM. Thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-tegra" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 26.09.2017 08:11, Stephen Warren wrote: > On 09/25/2017 05:45 PM, Dmitry Osipenko wrote: >> On 26.09.2017 02:01, Stephen Warren wrote: >>> On 09/25/2017 04:15 PM, Dmitry Osipenko wrote: >>>> Video decoder, found on NVIDIA Tegra20 SoC, supports a standard set of >>>> video formats like H.264 / MPEG-4 / WMV / VC1. Currently driver supports >>>> decoding of CAVLC H.264 only. >>> >>> Note: I don't know anything much about video decoding on Tegra (just NV desktop >>> GPUs, and that was a while ago), but I had a couple small comments on the DT >>> binding: >>> >>>> diff --git >>>> a/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt >>>> b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt >>> >>>> +NVIDIA Tegra Video Decoder Engine >>>> + >>>> +Required properties: >>>> +- compatible : "nvidia,tegra20-vde" >>>> +- reg : Must contain 2 register ranges: registers and IRAM area. >>>> +- reg-names : Must include the following entries: >>>> + - regs >>>> + - iram >>> >>> I think the IRAM region needs more explanation: What is the region used for and >>> by what? Can it be moved, and if so does the move need to be co-ordinated with >>> any other piece of SW? >> >> IRAM region is used by Video Decoder HW for internal use and some of decoding >> parameters are supplied via IRAM, like frames order list. AFAIK IRAM addresses >> are hardwired in HW and aren't movable, it is not 100% but I'm pretty sure. >> Should it be explained in the binding? > > I think this should be briefly mentioned, yes. Otherwise at least people > who don't know the VDE HW well (like me) will wonder why on earth VDE > interacts with IRAM at all. I would have assumed all parameters were > supplied via registers or via descriptors in DRAM. > > Thanks. > I also forgot to mention that VDE scrubs that IRAM region on HW reset. So yeah, it's definitely a part of HW definition. I'll add a brief explanation to the binding.
I feel like this code is good enough to go into the regular kernel instead of staging, but I don't really know what "- properly handle decoding faults" means in this context. Does the driver Oops all the time or what? Anyway, minor comments inline. On Tue, Sep 26, 2017 at 01:15:42AM +0300, Dmitry Osipenko wrote: > diff --git a/drivers/staging/tegra-vde/Kconfig b/drivers/staging/tegra-vde/Kconfig > new file mode 100644 > index 000000000000..b947c012a373 > --- /dev/null > +++ b/drivers/staging/tegra-vde/Kconfig > @@ -0,0 +1,6 @@ > +config TEGRA_VDE > + tristate "NVIDIA Tegra20 video decoder driver" > + depends on ARCH_TEGRA_2x_SOC Could we get a || COMPILE_TEST here as well? > + help > + Say Y here to enable support for a NVIDIA Tegra20 video decoder > + driver. > diff --git a/drivers/staging/tegra-vde/Makefile b/drivers/staging/tegra-vde/Makefile > new file mode 100644 > index 000000000000..e7c0df1174bf > --- /dev/null > +++ b/drivers/staging/tegra-vde/Makefile > @@ -0,0 +1 @@ > +obj-$(CONFIG_TEGRA_VDE) += vde.o > diff --git a/drivers/staging/tegra-vde/TODO b/drivers/staging/tegra-vde/TODO > new file mode 100644 > index 000000000000..533ddfc5190e > --- /dev/null > +++ b/drivers/staging/tegra-vde/TODO > @@ -0,0 +1,8 @@ > +All TODO's require reverse-engineering to be done first, it is very > +unlikely that NVIDIA would ever release HW specs to public. > + > +TODO: > + - properly handle decoding faults > + - support more formats Adding more formats is not a reason to put this into staging instead of the normal drivers/ dir. > +static int tegra_vde_setup_context(struct tegra_vde *vde, > + struct tegra_vde_h264_decoder_ctx *ctx, > + struct video_frame *dpb_frames, > + phys_addr_t bitstream_data_paddr, > + int bitstream_data_size, > + int macroblocks_nb) > +{ > + struct device *dev = vde->miscdev.parent; > + u32 value; > + > + tegra_vde_set_bits(vde->regs, 0xA, SXE(0xF0)); > + tegra_vde_set_bits(vde->regs, 0xB, BSEV(CMDQUE_CONTROL)); > + tegra_vde_set_bits(vde->regs, 0x8002, MBE(0x50)); > + tegra_vde_set_bits(vde->regs, 0xA, MBE(0xA0)); > + tegra_vde_set_bits(vde->regs, 0xA, PPE(0x14)); > + tegra_vde_set_bits(vde->regs, 0xA, PPE(0x28)); > + tegra_vde_set_bits(vde->regs, 0xA00, MCE(0x08)); > + tegra_vde_set_bits(vde->regs, 0xA, TFE(0x00)); > + tegra_vde_set_bits(vde->regs, 0x5, VDMA(0x04)); > + > + VDE_WR(0x00000000, vde->regs + VDMA(0x1C)); > + VDE_WR(0x00000000, vde->regs + VDMA(0x00)); > + VDE_WR(0x00000007, vde->regs + VDMA(0x04)); > + VDE_WR(0x00000007, vde->regs + FRAMEID(0x200)); > + VDE_WR(0x00000005, vde->regs + TFE(0x04)); > + VDE_WR(0x00000000, vde->regs + MBE(0x84)); > + VDE_WR(0x00000010, vde->regs + SXE(0x08)); > + VDE_WR(0x00000150, vde->regs + SXE(0x54)); > + VDE_WR(0x0000054C, vde->regs + SXE(0x58)); > + VDE_WR(0x00000E34, vde->regs + SXE(0x5C)); > + VDE_WR(0x063C063C, vde->regs + MCE(0x10)); > + VDE_WR(0x0003FC00, vde->regs + BSEV(INTR_STATUS)); > + VDE_WR(0x0000150D, vde->regs + BSEV(BSE_CONFIG)); > + VDE_WR(0x00000100, vde->regs + BSEV(BSE_INT_ENB)); > + VDE_WR(0x00000000, vde->regs + BSEV(0x98)); > + VDE_WR(0x00000060, vde->regs + BSEV(0x9C)); > + > + memset_io(vde->iram + 512, 0, macroblocks_nb / 2); > + > + tegra_setup_frameidx(vde->regs, dpb_frames, ctx->dpb_frames_nb, > + ctx->pic_width_in_mbs, ctx->pic_height_in_mbs); > + > + tegra_vde_setup_iram_tables(vde->iram, dpb_frames, > + ctx->dpb_frames_nb - 1, > + ctx->dpb_ref_frames_with_earlier_poc_nb); > + > + VDE_WR(0x00000000, vde->regs + BSEV(0x8C)); > + VDE_WR(bitstream_data_paddr + bitstream_data_size, > + vde->regs + BSEV(0x54)); > + > + value = ctx->pic_width_in_mbs << 11 | ctx->pic_height_in_mbs << 3; > + > + VDE_WR(value, vde->regs + BSEV(0x88)); > + > + if (tegra_vde_wait_bsev(vde, false)) > + return -EIO; > + > + if (tegra_vde_push_bsev_icmdqueue(vde, 0x800003FC, false)) Preserve the error code from tegra_vde_push_bsev_icmdqueue(). It's still -EIO so this doesn't affect runtime. > + return -EIO; > + > + value = 0x01500000; > + value |= ((vde->iram_lists_paddr + 512) >> 2) & 0xFFFF; > + > + if (tegra_vde_push_bsev_icmdqueue(vde, value, true)) Same for all. > + return -EIO; > + > + if (tegra_vde_push_bsev_icmdqueue(vde, 0x840F054C, false)) > + return -EIO; > + > + if (tegra_vde_push_bsev_icmdqueue(vde, 0x80000080, false)) > + return -EIO; > + > + value = 0x0E340000 | ((vde->iram_lists_paddr >> 2) & 0xFFFF); > + > + if (tegra_vde_push_bsev_icmdqueue(vde, value, true)) > + return -EIO; > + > + value = (1 << 23) | 5; I don't like these magic numbers. > + value |= ctx->pic_width_in_mbs << 11; > + value |= ctx->pic_height_in_mbs << 3; > + > + VDE_WR(value, vde->regs + SXE(0x10)); > + > + value = !ctx->is_baseline_profile << 17; > + value |= ctx->level_idc << 13; > + value |= ctx->log2_max_pic_order_cnt_lsb << 7; > + value |= ctx->pic_order_cnt_type << 5; > + value |= ctx->log2_max_frame_num; > + > + VDE_WR(value, vde->regs + SXE(0x40)); > + > + value = ctx->pic_init_qp << 25; > + value |= !!(ctx->deblocking_filter_control_present_flag) << 2; > + value |= !!ctx->pic_order_present_flag; > + > + VDE_WR(value, vde->regs + SXE(0x44)); > + > + value = ctx->chroma_qp_index_offset; > + value |= ctx->num_ref_idx_l0_active_minus1 << 5; > + value |= ctx->num_ref_idx_l1_active_minus1 << 10; > + value |= !!ctx->constrained_intra_pred_flag << 15; > + > + VDE_WR(value, vde->regs + SXE(0x48)); > + > + value = 0x0C000000; > + value |= !!(dpb_frames[0].flags & FLAG_IS_B_FRAME) << 24; > + > + VDE_WR(value, vde->regs + SXE(0x4C)); > + > + value = 0x03800000; > + value |= min(bitstream_data_size, SZ_1M); > + > + VDE_WR(value, vde->regs + SXE(0x68)); > + > + VDE_WR(bitstream_data_paddr, vde->regs + SXE(0x6C)); > + > + value = (1 << 28) | 5; > + value |= ctx->pic_width_in_mbs << 11; > + value |= ctx->pic_height_in_mbs << 3; > + > + VDE_WR(value, vde->regs + MBE(0x80)); > + > + value = 0x26800000; > + value |= ctx->level_idc << 4; > + value |= !ctx->is_baseline_profile << 1; > + value |= !!ctx->direct_8x8_inference_flag; > + > + VDE_WR(value, vde->regs + MBE(0x80)); > + > + VDE_WR(0xF4000001, vde->regs + MBE(0x80)); > + VDE_WR(0x20000000, vde->regs + MBE(0x80)); > + VDE_WR(0xF4000101, vde->regs + MBE(0x80)); > + > + value = 0x20000000; > + value |= ctx->chroma_qp_index_offset << 8; > + > + VDE_WR(value, vde->regs + MBE(0x80)); > + > + if (tegra_vde_setup_mbe_frame_idx(vde->regs, > + ctx->pic_order_cnt_type == 0, > + ctx->dpb_frames_nb - 1)) { Preserve the error code. > + dev_err(dev, "MBE frames setup failed\n"); > + return -EIO; > + } > + > + tegra_vde_mbe_set_0xa_reg(vde->regs, 0, 0x000009FC); > + tegra_vde_mbe_set_0xa_reg(vde->regs, 2, 0xF1DEAD00); > + tegra_vde_mbe_set_0xa_reg(vde->regs, 4, 0xF2DEAD00); > + tegra_vde_mbe_set_0xa_reg(vde->regs, 6, 0xF3DEAD00); > + tegra_vde_mbe_set_0xa_reg(vde->regs, 8, dpb_frames[0].aux_paddr); > + > + value = 0xFC000000; > + value |= !!(dpb_frames[0].flags & FLAG_IS_B_FRAME) << 2; > + > + if (!ctx->is_baseline_profile) > + value |= !!(dpb_frames[0].flags & FLAG_IS_REFERENCE) << 1; > + > + VDE_WR(value, vde->regs + MBE(0x80)); > + > + if (tegra_vde_wait_mbe(vde->regs)) { > + dev_err(dev, "MBE programming failed\n"); > + return -EIO; > + } > + > + return 0; > +} > + > +static void tegra_vde_decode_frame(struct tegra_vde *vde, int macroblocks_nb) > +{ > + VDE_WR(0x00000001, vde->regs + BSEV(0x8C)); > + VDE_WR(0x20000000 | (macroblocks_nb - 1), vde->regs + SXE(0x00)); > +} > + > +static void tegra_vde_detach_and_put_dmabuf(struct dma_buf_attachment *a) > +{ > + struct dma_buf *dmabuf = a->dmabuf; > + > + if (IS_ERR_OR_NULL(a)) You just dereferenced this on the line before so it's too late. Obviously we don't want to dereference either an error pointer or a NULL but I'm wondering the significance of having it be both. Normally that would mean that NULL is a special case of success. In other words, error pointer means the hardware is broken but NULL means it's missing and not required. I guess I'm suggesting to just delete the check and make sure we never set this to either NULL or ERR_PTR. > + return; > + > + dma_buf_detach(dmabuf, a); > + dma_buf_put(dmabuf); > +} > + > +static int tegra_vde_attach_dmabuf(struct device *dev, int fd, > + unsigned long offset, int min_size, > + struct dma_buf_attachment **a, > + phys_addr_t *paddr, u32 *size, > + enum dma_data_direction dma_dir) > +{ > + struct dma_buf_attachment *attachment; > + struct dma_buf *dmabuf; > + struct sg_table *sgt; > + > + *a = NULL; > + *paddr = 0xFBDEAD00; > + > + dmabuf = dma_buf_get(fd); > + if (IS_ERR(dmabuf)) { > + dev_err(dev, "Invalid dmabuf FD\n"); > + return PTR_ERR(dmabuf); > + } > + > + if ((u64)offset + min_size > dmabuf->size) { > + dev_err(dev, "Too small dmabuf size %d @0x%lX, " > + "should be at least %d\n", > + dmabuf->size, offset, min_size); > + return -EINVAL; > + } > + > + attachment = dma_buf_attach(dmabuf, dev); > + if (IS_ERR(attachment)) { > + dev_err(dev, "Failed to attach dmabuf\n"); > + dma_buf_put(dmabuf); > + return PTR_ERR(attachment); > + } > + > + sgt = dma_buf_map_attachment(attachment, dma_dir); > + if (IS_ERR(sgt)) { > + dev_err(dev, "Failed to get dmabuf sg_table\n"); > + dma_buf_detach(dmabuf, attachment); > + dma_buf_put(dmabuf); > + return PTR_ERR(sgt); > + } > + > + if (sgt->nents != 1) { > + dev_err(dev, "Sparsed DMA area is unsupported\n"); s/Sparsed/Sparse/ > + dma_buf_unmap_attachment(attachment, sgt, dma_dir); > + dma_buf_detach(dmabuf, attachment); > + dma_buf_put(dmabuf); > + return -EINVAL; This function would be cleaner using gotos to unwind. Pick the goto name to reflect what the goto does. For example, here it would be: if (sgt->nents != 1) { dev_err(dev, "Sparse DMA area is unsupported\n"); ret = -EINVAL; goto err_umap; } > + } > + > + *paddr = sg_dma_address(sgt->sgl) + offset; > + > + dma_buf_unmap_attachment(attachment, sgt, dma_dir); > + > + *a = attachment; > + > + if (size) > + *size = dmabuf->size - offset; > + > + return 0; return 0; err_unmap: dma_buf_unmap_attachment(attachment, sgt, dma_dir); err_detach: dma_buf_detach(dmabuf, attachment); err_put: dma_buf_put(dmabuf); return ret; > +} > + > +static int tegra_vde_attach_frame_dmabufs(struct device *dev, > + struct video_frame *frame, > + struct tegra_vde_h264_frame *source, > + enum dma_data_direction dma_dir, > + int is_baseline_profile, int csize) > +{ > + int ret; > + > + ret = tegra_vde_attach_dmabuf(dev, source->y_fd, > + source->y_offset, csize * 4, > + &frame->y_dmabuf_attachment, > + &frame->y_paddr, NULL, dma_dir); > + if (ret) > + return ret; > + > + ret = tegra_vde_attach_dmabuf(dev, source->cb_fd, > + source->cb_offset, csize, > + &frame->cb_dmabuf_attachment, > + &frame->cb_paddr, NULL, dma_dir); > + if (ret) > + return ret; > + > + ret = tegra_vde_attach_dmabuf(dev, source->cr_fd, > + source->cr_offset, csize, > + &frame->cr_dmabuf_attachment, > + &frame->cr_paddr, NULL, dma_dir); > + if (ret) > + return ret; > + > + if (is_baseline_profile) > + frame->aux_paddr = 0xF4DEAD00; The handling of is_baseline_profile is strange to me. It feels like we should always check it before we use ->aux_paddr but we don't ever do that. > + else > + ret = tegra_vde_attach_dmabuf(dev, source->aux_fd, > + source->aux_offset, csize, > + &frame->aux_dmabuf_attachment, > + &frame->aux_paddr, NULL, dma_dir); This function should have some error handling to undo the earlier attach calls if the latter ones fail. See below: > + > + return ret; return 0; err_detach_cr: tegra_vde_detach_and_put_dmabuf(frame->cr_dmabuf_attachment); err_detach_cb: tegra_vde_detach_and_put_dmabuf(frame->cb_dmabuf_attachment); err_detach_y: tegra_vde_detach_and_put_dmabuf(frame->y_dmabuf_attachment); return ret; > +} > + > +static void tegra_vde_deattach_frame_dmabufs(struct video_frame *frame) > +{ > + tegra_vde_detach_and_put_dmabuf(frame->y_dmabuf_attachment); > + tegra_vde_detach_and_put_dmabuf(frame->cb_dmabuf_attachment); > + tegra_vde_detach_and_put_dmabuf(frame->cr_dmabuf_attachment); > + tegra_vde_detach_and_put_dmabuf(frame->aux_dmabuf_attachment); It would be happier to write this in the reverse order so it matches the error handling that I wrote for you. > +} > + > +static int tegra_vde_copy_and_validate_frame(struct device *dev, > + struct tegra_vde_h264_frame *frame, > + unsigned long vaddr) > +{ > + if (copy_from_user(frame, (void __user *)vaddr, sizeof(*frame))) { > + dev_err(dev, "Copying of H.264 frame from user failed\n"); > + return -EFAULT; Error message are a funny thing and different people feel different ways. These can be triggered by the user so they let you spam dmesg but I can see how many of them would be useful. These ones for copy_from_user() are not useful since we assume the programmer should know that stuff. The error code seems enough. > + } > + > + if (frame->frame_num > 0x7FFFFF) { > + dev_err(dev, "Bad frame_num %u\n", frame->frame_num); > + return -EINVAL; > + } > + > + if (frame->y_offset & 0xFF) { > + dev_err(dev, "Bad y_offset 0x%X\n", frame->y_offset); > + return -EINVAL; > + } > + > + if (frame->cb_offset & 0xFF) { > + dev_err(dev, "Bad cb_offset 0x%X\n", frame->cb_offset); > + return -EINVAL; > + } > + > + if (frame->cr_offset & 0xFF) { > + dev_err(dev, "Bad cr_offset 0x%X\n", frame->cr_offset); > + return -EINVAL; > + } > + > + return 0; > +} > + > +static int tegra_vde_validate_h264_ctx(struct device *dev, > + struct tegra_vde_h264_decoder_ctx *ctx) > +{ > + if (ctx->dpb_frames_nb == 0 || ctx->dpb_frames_nb > 17) { > + dev_err(dev, "Bad DPB size %u\n", ctx->dpb_frames_nb); > + return -EINVAL; > + } > + > + if (ctx->level_idc > 15) { > + dev_err(dev, "Bad level value %u\n", ctx->level_idc); > + return -EINVAL; > + } > + > + if (ctx->pic_init_qp > 52) { > + dev_err(dev, "Bad pic_init_qp value %u\n", ctx->pic_init_qp); > + return -EINVAL; > + } > + > + if (ctx->log2_max_pic_order_cnt_lsb > 16) { > + dev_err(dev, "Bad log2_max_pic_order_cnt_lsb value %u\n", > + ctx->log2_max_pic_order_cnt_lsb); > + return -EINVAL; > + } > + > + if (ctx->log2_max_frame_num > 16) { > + dev_err(dev, "Bad log2_max_frame_num value %u\n", > + ctx->log2_max_frame_num); > + return -EINVAL; > + } > + > + if (ctx->chroma_qp_index_offset > 31) { > + dev_err(dev, "Bad chroma_qp_index_offset value %u\n", > + ctx->chroma_qp_index_offset); > + return -EINVAL; > + } > + > + if (ctx->pic_order_cnt_type > 2) { > + dev_err(dev, "Bad pic_order_cnt_type value %u\n", > + ctx->pic_order_cnt_type); > + return -EINVAL; > + } > + > + if (ctx->num_ref_idx_l0_active_minus1 > 15) { > + dev_err(dev, "Bad num_ref_idx_l0_active_minus1 value %u\n", > + ctx->num_ref_idx_l0_active_minus1); > + return -EINVAL; > + } > + > + if (ctx->num_ref_idx_l1_active_minus1 > 15) { > + dev_err(dev, "Bad num_ref_idx_l1_active_minus1 value %u\n", > + ctx->num_ref_idx_l1_active_minus1); > + return -EINVAL; > + } > + > + if (!ctx->pic_width_in_mbs || ctx->pic_width_in_mbs > 127) { > + dev_err(dev, "Bad pic_width_in_mbs value %u, min 1 max 127\n", > + ctx->pic_width_in_mbs); > + return -EINVAL; > + } > + > + if (!ctx->pic_height_in_mbs || ctx->pic_height_in_mbs > 127) { > + dev_err(dev, "Bad pic_height_in_mbs value %u, min 1 max 127\n", > + ctx->pic_height_in_mbs); > + return -EINVAL; > + } > + > + return 0; > +} > + > +static int tegra_vde_ioctl_decode_h264(struct tegra_vde *vde, > + unsigned long vaddr) > +{ > + struct tegra_vde_h264_decoder_ctx ctx; > + struct tegra_vde_h264_frame frame; > + struct device *dev = vde->miscdev.parent; > + struct video_frame *dpb_frames = NULL; > + struct dma_buf_attachment *bitstream_data_dmabuf_attachment = NULL; > + enum dma_data_direction dma_dir; > + phys_addr_t bitstream_data_paddr; > + phys_addr_t bsev_paddr; > + u32 bitstream_data_size; > + u32 macroblocks_nb; > + bool timeout = false; > + int i, ret; > + > + if (copy_from_user(&ctx, (void __user *)vaddr, sizeof(ctx))) { > + dev_err(dev, "Copying of H.264 CTX from user failed\n"); > + return -EFAULT; > + } > + > + ret = tegra_vde_validate_h264_ctx(dev, &ctx); > + if (ret) > + return -EINVAL; > + > + ret = tegra_vde_attach_dmabuf(dev, ctx.bitstream_data_fd, > + ctx.bitstream_data_offset, 0, > + &bitstream_data_dmabuf_attachment, > + &bitstream_data_paddr, > + &bitstream_data_size, > + DMA_TO_DEVICE); > + if (ret) > + goto cleanup; I hate this label name. What are we cleaning up??? Now I have to set a bookmark so I can remember where I left and then scroll down to the bottom of the function and take a look. [pause] OK. I'm back. I call this "one err" style error handling. And it's very bug prone. Please read my essay on the topic: https://plus.google.com/106378716002406849458/posts/dnanfhQ4mHQ The bug is that we call tegra_vde_detach_and_put_dmabuf() with a NULL pointer and there was that dereference before check that I mentioned earlier. > + > + dpb_frames = kcalloc(ctx.dpb_frames_nb, sizeof(*dpb_frames), > + GFP_KERNEL); > + if (!dpb_frames) { > + ret = -ENOMEM; > + goto cleanup; > + } > + > + macroblocks_nb = ctx.pic_width_in_mbs * ctx.pic_height_in_mbs; > + vaddr = ctx.dpb_frames_ptr; > + > + for (i = 0; i < ctx.dpb_frames_nb; i++) { > + ret = tegra_vde_copy_and_validate_frame(dev, &frame, vaddr); > + if (ret) > + goto cleanup; > + > + dpb_frames[i].flags = frame.flags; > + dpb_frames[i].frame_num = frame.frame_num; > + > + dma_dir = (i == 0) ? DMA_FROM_DEVICE : DMA_TO_DEVICE; > + > + ret = tegra_vde_attach_frame_dmabufs(dev, > + &dpb_frames[i], &frame, > + dma_dir, > + ctx.is_baseline_profile, > + macroblocks_nb * 64); > + if (ret) { > + tegra_vde_deattach_frame_dmabufs(&dpb_frames[i]); > + goto cleanup; > + } > + > + vaddr += sizeof(frame); > + } > + > + ret = clk_prepare_enable(vde->clk); > + if (ret) { > + dev_err(dev, "Failed to enable clk: %d\n", ret); > + goto cleanup; > + } > + > + ret = mutex_lock_interruptible(&vde->lock); > + if (ret) > + goto clkgate; > + > + ret = reset_control_deassert(vde->rst); > + if (ret) { > + dev_err(dev, "Failed to deassert reset: %d\n", ret); > + clk_disable_unprepare(vde->clk); We do a second clk_disable_unprepare(vde->clk); after the unlock. Delete this? > + goto unlock; > + } > + > + ret = tegra_vde_setup_context(vde, &ctx, dpb_frames, > + bitstream_data_paddr, > + bitstream_data_size, > + macroblocks_nb); > + if (ret) > + goto reset; > + > + tegra_vde_decode_frame(vde, macroblocks_nb); > + > + timeout = !wait_for_completion_io_timeout(&vde->decode_completion, > + TEGRA_VDE_TIMEOUT); > + if (timeout) { > + bsev_paddr = readl(vde->regs + BSEV(0x10)); > + macroblocks_nb = readl(vde->regs + SXE(0xC8)) & 0x1FFF; > + > + dev_err(dev, "Decoding failed, " > + "read 0x%X bytes : %u macroblocks parsed\n", > + bsev_paddr ? bsev_paddr - bitstream_data_paddr : 0, > + macroblocks_nb); > + } > + > +reset: > + /* > + * We rely on the VDE registers reset value, otherwise VDE would > + * cause bus lockup. > + */ > + ret = reset_control_assert(vde->rst); > + if (ret) > + dev_err(dev, "Failed to assert reset: %d\n", ret); We're overwriting "ret" here when we probably want to preserve the error code from reset_control_deassert(). > + > + if (timeout) > + ret = -EIO; > + > +unlock: > + mutex_unlock(&vde->lock); > + > +clkgate: > + clk_disable_unprepare(vde->clk); > + > +cleanup: > + if (dpb_frames) > + while (i--) > + tegra_vde_deattach_frame_dmabufs(&dpb_frames[i]); > + > + kfree(dpb_frames); > + > + tegra_vde_detach_and_put_dmabuf(bitstream_data_dmabuf_attachment); > + > + return ret; > +} > + > +static long tegra_vde_unlocked_ioctl(struct file *filp, > + unsigned int cmd, unsigned long arg) > +{ > + struct miscdevice *miscdev = filp->private_data; > + struct tegra_vde *vde = container_of(miscdev, struct tegra_vde, > + miscdev); > + > + switch (cmd) { > + case TEGRA_VDE_IOCTL_DECODE_H264: > + return tegra_vde_ioctl_decode_h264(vde, arg); > + } > + > + dev_err(miscdev->parent, "Invalid IOCTL command %u\n", cmd); > + > + return -ENOTTY; > +} > + > +static const struct file_operations tegra_vde_fops = { > + .owner = THIS_MODULE, > + .unlocked_ioctl = tegra_vde_unlocked_ioctl, > +}; > + > +static irqreturn_t tegra_vde_isr(int irq, void *data) > +{ > + struct tegra_vde *vde = data; > + > + tegra_vde_set_bits(vde->regs, 0, FRAMEID(0x208)); > + complete(&vde->decode_completion); > + > + return IRQ_HANDLED; > +} > + > +static int tegra_vde_probe(struct platform_device *pdev) > +{ > + struct resource *res_regs, *res_iram; > + struct device *dev = &pdev->dev; > + struct tegra_vde *vde; > + int ret; > + > + vde = devm_kzalloc(&pdev->dev, sizeof(*vde), GFP_KERNEL); > + if (!vde) > + return -ENOMEM; > + > + platform_set_drvdata(pdev, vde); > + > + res_regs = platform_get_resource_byname(pdev, IORESOURCE_MEM, "regs"); > + if (!res_regs) > + return -ENODEV; > + > + res_iram = platform_get_resource_byname(pdev, IORESOURCE_MEM, "iram"); > + if (!res_iram) > + return -ENODEV; > + > + vde->irq = platform_get_irq_byname(pdev, "sync-token"); > + if (vde->irq < 0) > + return vde->irq; > + > + vde->regs = devm_ioremap_resource(dev, res_regs); > + if (IS_ERR(vde->regs)) > + return PTR_ERR(vde->regs); > + > + vde->iram = devm_ioremap_resource(dev, res_iram); > + if (IS_ERR(vde->iram)) > + return PTR_ERR(vde->iram); > + > + vde->clk = devm_clk_get(dev, NULL); > + if (IS_ERR(vde->clk)) { > + dev_err(dev, "Could not get VDE clk\n"); > + return PTR_ERR(vde->clk); > + } > + > + vde->rst = devm_reset_control_get(dev, "vde"); > + if (IS_ERR(vde->rst)) { > + dev_err(dev, "Could not get VDE reset\n"); > + return PTR_ERR(vde->rst); > + } > + > + ret = devm_request_irq(dev, vde->irq, tegra_vde_isr, IRQF_SHARED, > + dev_name(dev), vde); > + if (ret) > + return -ENODEV; Presever the error code. > + > + ret = tegra_powergate_sequence_power_up(TEGRA_POWERGATE_VDEC, > + vde->clk, vde->rst); > + if (ret) { > + dev_err(&pdev->dev, "Failed to power up VDE unit\n"); > + return ret; > + } > + > + ret = reset_control_assert(vde->rst); > + if (ret) { > + dev_err(dev, "Failed to assert reset: %d\n", ret); > + return ret; > + } > + > + clk_disable_unprepare(vde->clk); > + > + mutex_init(&vde->lock); > + init_completion(&vde->decode_completion); > + > + vde->iram_lists_paddr = res_iram->start; > + vde->miscdev.minor = MISC_DYNAMIC_MINOR; > + vde->miscdev.name = "tegra_vde"; > + vde->miscdev.fops = &tegra_vde_fops; > + vde->miscdev.parent = dev; > + > + ret = misc_register(&vde->miscdev); > + if (ret) > + return -ENODEV; Preserve the error code. > + > + return 0; > +} > + regards, dan carpenter -- To unsubscribe from this list: send the line "unsubscribe linux-tegra" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 27.09.2017 12:45, Dan Carpenter wrote: > I feel like this code is good enough to go into the regular kernel > instead of staging, but I don't really know what "- properly handle > decoding faults" means in this context. As Greg pointed out, this patch should be cc'ed to media maintainers for a review. I'll cc them on V2, will see what they would suggest. Yeah, probably the decoding faults isn't a very candidate for a TODO for staging. Does the driver Oops all the > time or what? > Driver works fine. > Anyway, minor comments inline. > Thank you very much for the awesome review. I agree with the most of the comments. > On Tue, Sep 26, 2017 at 01:15:42AM +0300, Dmitry Osipenko wrote: >> diff --git a/drivers/staging/tegra-vde/Kconfig b/drivers/staging/tegra-vde/Kconfig >> new file mode 100644 >> index 000000000000..b947c012a373 >> --- /dev/null >> +++ b/drivers/staging/tegra-vde/Kconfig >> @@ -0,0 +1,6 @@ >> +config TEGRA_VDE >> + tristate "NVIDIA Tegra20 video decoder driver" >> + depends on ARCH_TEGRA_2x_SOC > > Could we get a || COMPILE_TEST here as well? > >> + help >> + Say Y here to enable support for a NVIDIA Tegra20 video decoder >> + driver. >> diff --git a/drivers/staging/tegra-vde/Makefile b/drivers/staging/tegra-vde/Makefile >> new file mode 100644 >> index 000000000000..e7c0df1174bf >> --- /dev/null >> +++ b/drivers/staging/tegra-vde/Makefile >> @@ -0,0 +1 @@ >> +obj-$(CONFIG_TEGRA_VDE) += vde.o >> diff --git a/drivers/staging/tegra-vde/TODO b/drivers/staging/tegra-vde/TODO >> new file mode 100644 >> index 000000000000..533ddfc5190e >> --- /dev/null >> +++ b/drivers/staging/tegra-vde/TODO >> @@ -0,0 +1,8 @@ >> +All TODO's require reverse-engineering to be done first, it is very >> +unlikely that NVIDIA would ever release HW specs to public. >> + >> +TODO: >> + - properly handle decoding faults >> + - support more formats > > Adding more formats is not a reason to put this into staging instead of > the normal drivers/ dir. > Well, it feels that the driver isn't very acceptable for the core drivers/. But maybe it's a wrong feeling. Again, will see what media/ maintainers would suggest. >> +static int tegra_vde_setup_context(struct tegra_vde *vde, >> + struct tegra_vde_h264_decoder_ctx *ctx, >> + struct video_frame *dpb_frames, >> + phys_addr_t bitstream_data_paddr, >> + int bitstream_data_size, >> + int macroblocks_nb) >> +{ >> + struct device *dev = vde->miscdev.parent; >> + u32 value; >> + >> + tegra_vde_set_bits(vde->regs, 0xA, SXE(0xF0)); >> + tegra_vde_set_bits(vde->regs, 0xB, BSEV(CMDQUE_CONTROL)); >> + tegra_vde_set_bits(vde->regs, 0x8002, MBE(0x50)); >> + tegra_vde_set_bits(vde->regs, 0xA, MBE(0xA0)); >> + tegra_vde_set_bits(vde->regs, 0xA, PPE(0x14)); >> + tegra_vde_set_bits(vde->regs, 0xA, PPE(0x28)); >> + tegra_vde_set_bits(vde->regs, 0xA00, MCE(0x08)); >> + tegra_vde_set_bits(vde->regs, 0xA, TFE(0x00)); >> + tegra_vde_set_bits(vde->regs, 0x5, VDMA(0x04)); >> + >> + VDE_WR(0x00000000, vde->regs + VDMA(0x1C)); >> + VDE_WR(0x00000000, vde->regs + VDMA(0x00)); >> + VDE_WR(0x00000007, vde->regs + VDMA(0x04)); >> + VDE_WR(0x00000007, vde->regs + FRAMEID(0x200)); >> + VDE_WR(0x00000005, vde->regs + TFE(0x04)); >> + VDE_WR(0x00000000, vde->regs + MBE(0x84)); >> + VDE_WR(0x00000010, vde->regs + SXE(0x08)); >> + VDE_WR(0x00000150, vde->regs + SXE(0x54)); >> + VDE_WR(0x0000054C, vde->regs + SXE(0x58)); >> + VDE_WR(0x00000E34, vde->regs + SXE(0x5C)); >> + VDE_WR(0x063C063C, vde->regs + MCE(0x10)); >> + VDE_WR(0x0003FC00, vde->regs + BSEV(INTR_STATUS)); >> + VDE_WR(0x0000150D, vde->regs + BSEV(BSE_CONFIG)); >> + VDE_WR(0x00000100, vde->regs + BSEV(BSE_INT_ENB)); >> + VDE_WR(0x00000000, vde->regs + BSEV(0x98)); >> + VDE_WR(0x00000060, vde->regs + BSEV(0x9C)); >> + >> + memset_io(vde->iram + 512, 0, macroblocks_nb / 2); >> + >> + tegra_setup_frameidx(vde->regs, dpb_frames, ctx->dpb_frames_nb, >> + ctx->pic_width_in_mbs, ctx->pic_height_in_mbs); >> + >> + tegra_vde_setup_iram_tables(vde->iram, dpb_frames, >> + ctx->dpb_frames_nb - 1, >> + ctx->dpb_ref_frames_with_earlier_poc_nb); >> + >> + VDE_WR(0x00000000, vde->regs + BSEV(0x8C)); >> + VDE_WR(bitstream_data_paddr + bitstream_data_size, >> + vde->regs + BSEV(0x54)); >> + >> + value = ctx->pic_width_in_mbs << 11 | ctx->pic_height_in_mbs << 3; >> + >> + VDE_WR(value, vde->regs + BSEV(0x88)); >> + >> + if (tegra_vde_wait_bsev(vde, false)) >> + return -EIO; >> + >> + if (tegra_vde_push_bsev_icmdqueue(vde, 0x800003FC, false)) > > Preserve the error code from tegra_vde_push_bsev_icmdqueue(). It's > still -EIO so this doesn't affect runtime. > Okay, I'll propagate error code in all other places as well. >> + return -EIO; >> + >> + value = 0x01500000; >> + value |= ((vde->iram_lists_paddr + 512) >> 2) & 0xFFFF; >> + >> + if (tegra_vde_push_bsev_icmdqueue(vde, value, true)) > > Same for all. > >> + return -EIO; >> + >> + if (tegra_vde_push_bsev_icmdqueue(vde, 0x840F054C, false)) >> + return -EIO; >> + >> + if (tegra_vde_push_bsev_icmdqueue(vde, 0x80000080, false)) >> + return -EIO; >> + >> + value = 0x0E340000 | ((vde->iram_lists_paddr >> 2) & 0xFFFF); >> + >> + if (tegra_vde_push_bsev_icmdqueue(vde, value, true)) >> + return -EIO; >> + >> + value = (1 << 23) | 5; > > I don't like these magic numbers. > I don't know what these bits do, at best I can replace them with a hex for consistency. >> + value |= ctx->pic_width_in_mbs << 11; >> + value |= ctx->pic_height_in_mbs << 3; >> + >> + VDE_WR(value, vde->regs + SXE(0x10)); >> + >> + value = !ctx->is_baseline_profile << 17; >> + value |= ctx->level_idc << 13; >> + value |= ctx->log2_max_pic_order_cnt_lsb << 7; >> + value |= ctx->pic_order_cnt_type << 5; >> + value |= ctx->log2_max_frame_num; >> + >> + VDE_WR(value, vde->regs + SXE(0x40)); >> + >> + value = ctx->pic_init_qp << 25; >> + value |= !!(ctx->deblocking_filter_control_present_flag) << 2; >> + value |= !!ctx->pic_order_present_flag; >> + >> + VDE_WR(value, vde->regs + SXE(0x44)); >> + >> + value = ctx->chroma_qp_index_offset; >> + value |= ctx->num_ref_idx_l0_active_minus1 << 5; >> + value |= ctx->num_ref_idx_l1_active_minus1 << 10; >> + value |= !!ctx->constrained_intra_pred_flag << 15; >> + >> + VDE_WR(value, vde->regs + SXE(0x48)); >> + >> + value = 0x0C000000; >> + value |= !!(dpb_frames[0].flags & FLAG_IS_B_FRAME) << 24; >> + >> + VDE_WR(value, vde->regs + SXE(0x4C)); >> + >> + value = 0x03800000; >> + value |= min(bitstream_data_size, SZ_1M); >> + >> + VDE_WR(value, vde->regs + SXE(0x68)); >> + >> + VDE_WR(bitstream_data_paddr, vde->regs + SXE(0x6C)); >> + >> + value = (1 << 28) | 5; >> + value |= ctx->pic_width_in_mbs << 11; >> + value |= ctx->pic_height_in_mbs << 3; >> + >> + VDE_WR(value, vde->regs + MBE(0x80)); >> + >> + value = 0x26800000; >> + value |= ctx->level_idc << 4; >> + value |= !ctx->is_baseline_profile << 1; >> + value |= !!ctx->direct_8x8_inference_flag; >> + >> + VDE_WR(value, vde->regs + MBE(0x80)); >> + >> + VDE_WR(0xF4000001, vde->regs + MBE(0x80)); >> + VDE_WR(0x20000000, vde->regs + MBE(0x80)); >> + VDE_WR(0xF4000101, vde->regs + MBE(0x80)); >> + >> + value = 0x20000000; >> + value |= ctx->chroma_qp_index_offset << 8; >> + >> + VDE_WR(value, vde->regs + MBE(0x80)); >> + >> + if (tegra_vde_setup_mbe_frame_idx(vde->regs, >> + ctx->pic_order_cnt_type == 0, >> + ctx->dpb_frames_nb - 1)) { > > > Preserve the error code. > >> + dev_err(dev, "MBE frames setup failed\n"); >> + return -EIO; >> + } >> + >> + tegra_vde_mbe_set_0xa_reg(vde->regs, 0, 0x000009FC); >> + tegra_vde_mbe_set_0xa_reg(vde->regs, 2, 0xF1DEAD00); >> + tegra_vde_mbe_set_0xa_reg(vde->regs, 4, 0xF2DEAD00); >> + tegra_vde_mbe_set_0xa_reg(vde->regs, 6, 0xF3DEAD00); >> + tegra_vde_mbe_set_0xa_reg(vde->regs, 8, dpb_frames[0].aux_paddr); >> + >> + value = 0xFC000000; >> + value |= !!(dpb_frames[0].flags & FLAG_IS_B_FRAME) << 2; >> + >> + if (!ctx->is_baseline_profile) >> + value |= !!(dpb_frames[0].flags & FLAG_IS_REFERENCE) << 1; >> + >> + VDE_WR(value, vde->regs + MBE(0x80)); >> + >> + if (tegra_vde_wait_mbe(vde->regs)) { >> + dev_err(dev, "MBE programming failed\n"); >> + return -EIO; >> + } >> + >> + return 0; >> +} >> + >> +static void tegra_vde_decode_frame(struct tegra_vde *vde, int macroblocks_nb) >> +{ >> + VDE_WR(0x00000001, vde->regs + BSEV(0x8C)); >> + VDE_WR(0x20000000 | (macroblocks_nb - 1), vde->regs + SXE(0x00)); >> +} >> + >> +static void tegra_vde_detach_and_put_dmabuf(struct dma_buf_attachment *a) >> +{ >> + struct dma_buf *dmabuf = a->dmabuf; >> + >> + if (IS_ERR_OR_NULL(a)) > > You just dereferenced this on the line before so it's too late. > Indeed, good catch! > Obviously we don't want to dereference either an error pointer or a NULL > but I'm wondering the significance of having it be both. Normally that > would mean that NULL is a special case of success. In other words, > error pointer means the hardware is broken but NULL means it's missing > and not required. I guess I'm suggesting to just delete the check and > make sure we never set this to either NULL or ERR_PTR. > NULL here exactly means that attachment is missing and not required. But anyway I'll get rid of IS_ERR_OR_NULL, thanks for the suggestion. >> + return; >> + >> + dma_buf_detach(dmabuf, a); >> + dma_buf_put(dmabuf); >> +} >> + >> +static int tegra_vde_attach_dmabuf(struct device *dev, int fd, >> + unsigned long offset, int min_size, >> + struct dma_buf_attachment **a, >> + phys_addr_t *paddr, u32 *size, >> + enum dma_data_direction dma_dir) >> +{ >> + struct dma_buf_attachment *attachment; >> + struct dma_buf *dmabuf; >> + struct sg_table *sgt; >> + >> + *a = NULL; >> + *paddr = 0xFBDEAD00; >> + >> + dmabuf = dma_buf_get(fd); >> + if (IS_ERR(dmabuf)) { >> + dev_err(dev, "Invalid dmabuf FD\n"); >> + return PTR_ERR(dmabuf); >> + } >> + >> + if ((u64)offset + min_size > dmabuf->size) { >> + dev_err(dev, "Too small dmabuf size %d @0x%lX, " >> + "should be at least %d\n", >> + dmabuf->size, offset, min_size); >> + return -EINVAL; >> + } >> + >> + attachment = dma_buf_attach(dmabuf, dev); >> + if (IS_ERR(attachment)) { >> + dev_err(dev, "Failed to attach dmabuf\n"); >> + dma_buf_put(dmabuf); >> + return PTR_ERR(attachment); >> + } >> + >> + sgt = dma_buf_map_attachment(attachment, dma_dir); >> + if (IS_ERR(sgt)) { >> + dev_err(dev, "Failed to get dmabuf sg_table\n"); >> + dma_buf_detach(dmabuf, attachment); >> + dma_buf_put(dmabuf); >> + return PTR_ERR(sgt); >> + } >> + >> + if (sgt->nents != 1) { >> + dev_err(dev, "Sparsed DMA area is unsupported\n"); > > s/Sparsed/Sparse/ > ACK. >> + dma_buf_unmap_attachment(attachment, sgt, dma_dir); >> + dma_buf_detach(dmabuf, attachment); >> + dma_buf_put(dmabuf); >> + return -EINVAL; > > This function would be cleaner using gotos to unwind. Pick the goto > name to reflect what the goto does. For example, here it would be: > Okay. > if (sgt->nents != 1) { > dev_err(dev, "Sparse DMA area is unsupported\n"); > ret = -EINVAL; > goto err_umap; > } > > > >> + } >> + >> + *paddr = sg_dma_address(sgt->sgl) + offset; >> + >> + dma_buf_unmap_attachment(attachment, sgt, dma_dir); >> + >> + *a = attachment; >> + >> + if (size) >> + *size = dmabuf->size - offset; >> + >> + return 0; > > return 0; > > err_unmap: > dma_buf_unmap_attachment(attachment, sgt, dma_dir); > err_detach: > dma_buf_detach(dmabuf, attachment); > err_put: > dma_buf_put(dmabuf); > return ret; > >> +} >> + >> +static int tegra_vde_attach_frame_dmabufs(struct device *dev, >> + struct video_frame *frame, >> + struct tegra_vde_h264_frame *source, >> + enum dma_data_direction dma_dir, >> + int is_baseline_profile, int csize) >> +{ >> + int ret; >> + >> + ret = tegra_vde_attach_dmabuf(dev, source->y_fd, >> + source->y_offset, csize * 4, >> + &frame->y_dmabuf_attachment, >> + &frame->y_paddr, NULL, dma_dir); >> + if (ret) >> + return ret; >> + >> + ret = tegra_vde_attach_dmabuf(dev, source->cb_fd, >> + source->cb_offset, csize, >> + &frame->cb_dmabuf_attachment, >> + &frame->cb_paddr, NULL, dma_dir); >> + if (ret) >> + return ret; >> + >> + ret = tegra_vde_attach_dmabuf(dev, source->cr_fd, >> + source->cr_offset, csize, >> + &frame->cr_dmabuf_attachment, >> + &frame->cr_paddr, NULL, dma_dir); >> + if (ret) >> + return ret; >> + >> + if (is_baseline_profile) >> + frame->aux_paddr = 0xF4DEAD00; > > The handling of is_baseline_profile is strange to me. It feels like we > should always check it before we use ->aux_paddr but we don't ever do > that. > In a case of baseline profile, aux buffer isn't needed, HW should't use it. Aux phys address is set to a predefined and invalid address, so that in a case of VDE trying to use it, its invalid memory accesses would be reported in KMSG by memory controller driver and the reported invalid addresses would be known to be associated with the aux buffer. I'm not sure what you are meaning. >> + else >> + ret = tegra_vde_attach_dmabuf(dev, source->aux_fd, >> + source->aux_offset, csize, >> + &frame->aux_dmabuf_attachment, >> + &frame->aux_paddr, NULL, dma_dir); > > > This function should have some error handling to undo the earlier > attach calls if the latter ones fail. See below: > > Okay. >> + >> + return ret; > > return 0; > > err_detach_cr: > tegra_vde_detach_and_put_dmabuf(frame->cr_dmabuf_attachment); > err_detach_cb: > tegra_vde_detach_and_put_dmabuf(frame->cb_dmabuf_attachment); > err_detach_y: > tegra_vde_detach_and_put_dmabuf(frame->y_dmabuf_attachment); > > return ret; > > >> +} >> + >> +static void tegra_vde_deattach_frame_dmabufs(struct video_frame *frame) >> +{ >> + tegra_vde_detach_and_put_dmabuf(frame->y_dmabuf_attachment); >> + tegra_vde_detach_and_put_dmabuf(frame->cb_dmabuf_attachment); >> + tegra_vde_detach_and_put_dmabuf(frame->cr_dmabuf_attachment); >> + tegra_vde_detach_and_put_dmabuf(frame->aux_dmabuf_attachment); > > > It would be happier to write this in the reverse order so it matches > the error handling that I wrote for you. > > Okay. >> +} >> + >> +static int tegra_vde_copy_and_validate_frame(struct device *dev, >> + struct tegra_vde_h264_frame *frame, >> + unsigned long vaddr) >> +{ >> + if (copy_from_user(frame, (void __user *)vaddr, sizeof(*frame))) { >> + dev_err(dev, "Copying of H.264 frame from user failed\n"); >> + return -EFAULT; > > Error message are a funny thing and different people feel different > ways. These can be triggered by the user so they let you spam dmesg > but I can see how many of them would be useful. These ones for > copy_from_user() are not useful since we assume the programmer should > know that stuff. The error code seems enough. > Agree. >> + } >> + >> + if (frame->frame_num > 0x7FFFFF) { >> + dev_err(dev, "Bad frame_num %u\n", frame->frame_num); >> + return -EINVAL; >> + } >> + >> + if (frame->y_offset & 0xFF) { >> + dev_err(dev, "Bad y_offset 0x%X\n", frame->y_offset); >> + return -EINVAL; >> + } >> + >> + if (frame->cb_offset & 0xFF) { >> + dev_err(dev, "Bad cb_offset 0x%X\n", frame->cb_offset); >> + return -EINVAL; >> + } >> + >> + if (frame->cr_offset & 0xFF) { >> + dev_err(dev, "Bad cr_offset 0x%X\n", frame->cr_offset); >> + return -EINVAL; >> + } >> + >> + return 0; >> +} >> + >> +static int tegra_vde_validate_h264_ctx(struct device *dev, >> + struct tegra_vde_h264_decoder_ctx *ctx) >> +{ >> + if (ctx->dpb_frames_nb == 0 || ctx->dpb_frames_nb > 17) { >> + dev_err(dev, "Bad DPB size %u\n", ctx->dpb_frames_nb); >> + return -EINVAL; >> + } >> + >> + if (ctx->level_idc > 15) { >> + dev_err(dev, "Bad level value %u\n", ctx->level_idc); >> + return -EINVAL; >> + } >> + >> + if (ctx->pic_init_qp > 52) { >> + dev_err(dev, "Bad pic_init_qp value %u\n", ctx->pic_init_qp); >> + return -EINVAL; >> + } >> + >> + if (ctx->log2_max_pic_order_cnt_lsb > 16) { >> + dev_err(dev, "Bad log2_max_pic_order_cnt_lsb value %u\n", >> + ctx->log2_max_pic_order_cnt_lsb); >> + return -EINVAL; >> + } >> + >> + if (ctx->log2_max_frame_num > 16) { >> + dev_err(dev, "Bad log2_max_frame_num value %u\n", >> + ctx->log2_max_frame_num); >> + return -EINVAL; >> + } >> + >> + if (ctx->chroma_qp_index_offset > 31) { >> + dev_err(dev, "Bad chroma_qp_index_offset value %u\n", >> + ctx->chroma_qp_index_offset); >> + return -EINVAL; >> + } >> + >> + if (ctx->pic_order_cnt_type > 2) { >> + dev_err(dev, "Bad pic_order_cnt_type value %u\n", >> + ctx->pic_order_cnt_type); >> + return -EINVAL; >> + } >> + >> + if (ctx->num_ref_idx_l0_active_minus1 > 15) { >> + dev_err(dev, "Bad num_ref_idx_l0_active_minus1 value %u\n", >> + ctx->num_ref_idx_l0_active_minus1); >> + return -EINVAL; >> + } >> + >> + if (ctx->num_ref_idx_l1_active_minus1 > 15) { >> + dev_err(dev, "Bad num_ref_idx_l1_active_minus1 value %u\n", >> + ctx->num_ref_idx_l1_active_minus1); >> + return -EINVAL; >> + } >> + >> + if (!ctx->pic_width_in_mbs || ctx->pic_width_in_mbs > 127) { >> + dev_err(dev, "Bad pic_width_in_mbs value %u, min 1 max 127\n", >> + ctx->pic_width_in_mbs); >> + return -EINVAL; >> + } >> + >> + if (!ctx->pic_height_in_mbs || ctx->pic_height_in_mbs > 127) { >> + dev_err(dev, "Bad pic_height_in_mbs value %u, min 1 max 127\n", >> + ctx->pic_height_in_mbs); >> + return -EINVAL; >> + } >> + >> + return 0; >> +} >> + >> +static int tegra_vde_ioctl_decode_h264(struct tegra_vde *vde, >> + unsigned long vaddr) >> +{ >> + struct tegra_vde_h264_decoder_ctx ctx; >> + struct tegra_vde_h264_frame frame; >> + struct device *dev = vde->miscdev.parent; >> + struct video_frame *dpb_frames = NULL; >> + struct dma_buf_attachment *bitstream_data_dmabuf_attachment = NULL; >> + enum dma_data_direction dma_dir; >> + phys_addr_t bitstream_data_paddr; >> + phys_addr_t bsev_paddr; >> + u32 bitstream_data_size; >> + u32 macroblocks_nb; >> + bool timeout = false; >> + int i, ret; >> + >> + if (copy_from_user(&ctx, (void __user *)vaddr, sizeof(ctx))) { >> + dev_err(dev, "Copying of H.264 CTX from user failed\n"); >> + return -EFAULT; >> + } >> + >> + ret = tegra_vde_validate_h264_ctx(dev, &ctx); >> + if (ret) >> + return -EINVAL; >> + >> + ret = tegra_vde_attach_dmabuf(dev, ctx.bitstream_data_fd, >> + ctx.bitstream_data_offset, 0, >> + &bitstream_data_dmabuf_attachment, >> + &bitstream_data_paddr, >> + &bitstream_data_size, >> + DMA_TO_DEVICE); >> + if (ret) >> + goto cleanup; > > > I hate this label name. What are we cleaning up??? Now I have to set a > bookmark so I can remember where I left and then scroll down to the > bottom of the function and take a look. > > [pause] > > OK. I'm back. I call this "one err" style error handling. And it's > very bug prone. Please read my essay on the topic: > > https://plus.google.com/106378716002406849458/posts/dnanfhQ4mHQ > > The bug is that we call tegra_vde_detach_and_put_dmabuf() with a NULL > pointer and there was that dereference before check that I mentioned > earlier. > Again, this NULL pointer was intentional. But yes, I messed up the IS_ERR_OR_NULL/derefernce order and will get rid of it as you suggested. No objections to what you wrote in the essay ;) >> + >> + dpb_frames = kcalloc(ctx.dpb_frames_nb, sizeof(*dpb_frames), >> + GFP_KERNEL); >> + if (!dpb_frames) { >> + ret = -ENOMEM; >> + goto cleanup; >> + } >> + >> + macroblocks_nb = ctx.pic_width_in_mbs * ctx.pic_height_in_mbs; >> + vaddr = ctx.dpb_frames_ptr; >> + >> + for (i = 0; i < ctx.dpb_frames_nb; i++) { >> + ret = tegra_vde_copy_and_validate_frame(dev, &frame, vaddr); >> + if (ret) >> + goto cleanup; >> + >> + dpb_frames[i].flags = frame.flags; >> + dpb_frames[i].frame_num = frame.frame_num; >> + >> + dma_dir = (i == 0) ? DMA_FROM_DEVICE : DMA_TO_DEVICE; >> + >> + ret = tegra_vde_attach_frame_dmabufs(dev, >> + &dpb_frames[i], &frame, >> + dma_dir, >> + ctx.is_baseline_profile, >> + macroblocks_nb * 64); >> + if (ret) { >> + tegra_vde_deattach_frame_dmabufs(&dpb_frames[i]); >> + goto cleanup; >> + } >> + >> + vaddr += sizeof(frame); >> + } >> + >> + ret = clk_prepare_enable(vde->clk); >> + if (ret) { >> + dev_err(dev, "Failed to enable clk: %d\n", ret); >> + goto cleanup; >> + } >> + >> + ret = mutex_lock_interruptible(&vde->lock); >> + if (ret) >> + goto clkgate; >> + >> + ret = reset_control_deassert(vde->rst); >> + if (ret) { >> + dev_err(dev, "Failed to deassert reset: %d\n", ret); >> + clk_disable_unprepare(vde->clk); > > We do a second clk_disable_unprepare(vde->clk); after the unlock. > Delete this? > Good catch! I've reshuffled clk managment several times before.. >> + goto unlock; >> + } >> + >> + ret = tegra_vde_setup_context(vde, &ctx, dpb_frames, >> + bitstream_data_paddr, >> + bitstream_data_size, >> + macroblocks_nb); >> + if (ret) >> + goto reset; >> + >> + tegra_vde_decode_frame(vde, macroblocks_nb); >> + >> + timeout = !wait_for_completion_io_timeout(&vde->decode_completion, >> + TEGRA_VDE_TIMEOUT); >> + if (timeout) { >> + bsev_paddr = readl(vde->regs + BSEV(0x10)); >> + macroblocks_nb = readl(vde->regs + SXE(0xC8)) & 0x1FFF; >> + >> + dev_err(dev, "Decoding failed, " >> + "read 0x%X bytes : %u macroblocks parsed\n", >> + bsev_paddr ? bsev_paddr - bitstream_data_paddr : 0, >> + macroblocks_nb); >> + } >> + >> +reset: >> + /* >> + * We rely on the VDE registers reset value, otherwise VDE would >> + * cause bus lockup. >> + */ >> + ret = reset_control_assert(vde->rst); >> + if (ret) >> + dev_err(dev, "Failed to assert reset: %d\n", ret); > > We're overwriting "ret" here when we probably want to preserve the error > code from reset_control_deassert(). > I think it doesn't really matter. It's very unlikely that the reset assert could ever fail, it might result in a further system hang on the next decode invocation. >> + >> + if (timeout) >> + ret = -EIO; >> + >> +unlock: >> + mutex_unlock(&vde->lock); >> + >> +clkgate: >> + clk_disable_unprepare(vde->clk); >> + >> +cleanup: >> + if (dpb_frames) >> + while (i--) >> + tegra_vde_deattach_frame_dmabufs(&dpb_frames[i]); >> + >> + kfree(dpb_frames); >> + >> + tegra_vde_detach_and_put_dmabuf(bitstream_data_dmabuf_attachment); >> + >> + return ret; >> +} >> + >> +static long tegra_vde_unlocked_ioctl(struct file *filp, >> + unsigned int cmd, unsigned long arg) >> +{ >> + struct miscdevice *miscdev = filp->private_data; >> + struct tegra_vde *vde = container_of(miscdev, struct tegra_vde, >> + miscdev); >> + >> + switch (cmd) { >> + case TEGRA_VDE_IOCTL_DECODE_H264: >> + return tegra_vde_ioctl_decode_h264(vde, arg); >> + } >> + >> + dev_err(miscdev->parent, "Invalid IOCTL command %u\n", cmd); >> + >> + return -ENOTTY; >> +} >> + >> +static const struct file_operations tegra_vde_fops = { >> + .owner = THIS_MODULE, >> + .unlocked_ioctl = tegra_vde_unlocked_ioctl, >> +}; >> + >> +static irqreturn_t tegra_vde_isr(int irq, void *data) >> +{ >> + struct tegra_vde *vde = data; >> + >> + tegra_vde_set_bits(vde->regs, 0, FRAMEID(0x208)); >> + complete(&vde->decode_completion); >> + >> + return IRQ_HANDLED; >> +} >> + >> +static int tegra_vde_probe(struct platform_device *pdev) >> +{ >> + struct resource *res_regs, *res_iram; >> + struct device *dev = &pdev->dev; >> + struct tegra_vde *vde; >> + int ret; >> + >> + vde = devm_kzalloc(&pdev->dev, sizeof(*vde), GFP_KERNEL); >> + if (!vde) >> + return -ENOMEM; >> + >> + platform_set_drvdata(pdev, vde); >> + >> + res_regs = platform_get_resource_byname(pdev, IORESOURCE_MEM, "regs"); >> + if (!res_regs) >> + return -ENODEV; >> + >> + res_iram = platform_get_resource_byname(pdev, IORESOURCE_MEM, "iram"); >> + if (!res_iram) >> + return -ENODEV; >> + >> + vde->irq = platform_get_irq_byname(pdev, "sync-token"); >> + if (vde->irq < 0) >> + return vde->irq; >> + >> + vde->regs = devm_ioremap_resource(dev, res_regs); >> + if (IS_ERR(vde->regs)) >> + return PTR_ERR(vde->regs); >> + >> + vde->iram = devm_ioremap_resource(dev, res_iram); >> + if (IS_ERR(vde->iram)) >> + return PTR_ERR(vde->iram); >> + >> + vde->clk = devm_clk_get(dev, NULL); >> + if (IS_ERR(vde->clk)) { >> + dev_err(dev, "Could not get VDE clk\n"); >> + return PTR_ERR(vde->clk); >> + } >> + >> + vde->rst = devm_reset_control_get(dev, "vde"); >> + if (IS_ERR(vde->rst)) { >> + dev_err(dev, "Could not get VDE reset\n"); >> + return PTR_ERR(vde->rst); >> + } >> + >> + ret = devm_request_irq(dev, vde->irq, tegra_vde_isr, IRQF_SHARED, >> + dev_name(dev), vde); >> + if (ret) >> + return -ENODEV; > > Presever the error code. > >> + >> + ret = tegra_powergate_sequence_power_up(TEGRA_POWERGATE_VDEC, >> + vde->clk, vde->rst); >> + if (ret) { >> + dev_err(&pdev->dev, "Failed to power up VDE unit\n"); >> + return ret; >> + } >> + >> + ret = reset_control_assert(vde->rst); >> + if (ret) { >> + dev_err(dev, "Failed to assert reset: %d\n", ret); >> + return ret; >> + } >> + >> + clk_disable_unprepare(vde->clk); >> + >> + mutex_init(&vde->lock); >> + init_completion(&vde->decode_completion); >> + >> + vde->iram_lists_paddr = res_iram->start; >> + vde->miscdev.minor = MISC_DYNAMIC_MINOR; >> + vde->miscdev.name = "tegra_vde"; >> + vde->miscdev.fops = &tegra_vde_fops; >> + vde->miscdev.parent = dev; >> + >> + ret = misc_register(&vde->miscdev); >> + if (ret) >> + return -ENODEV; > > Preserve the error code. > >> + >> + return 0; >> +} >> + >
On 27.09.2017 12:45, Dan Carpenter wrote: >> --- /dev/null >> +++ b/drivers/staging/tegra-vde/Kconfig >> @@ -0,0 +1,6 @@ >> +config TEGRA_VDE >> + tristate "NVIDIA Tegra20 video decoder driver" >> + depends on ARCH_TEGRA_2x_SOC > > Could we get a || COMPILE_TEST here as well? > Good point, I'll address it in V2.
On Thu, Sep 28, 2017 at 02:28:04AM +0300, Dmitry Osipenko wrote: > >> + if (is_baseline_profile) > >> + frame->aux_paddr = 0xF4DEAD00; > > > > The handling of is_baseline_profile is strange to me. It feels like we > > should always check it before we use ->aux_paddr but we don't ever do > > that. > > > > In a case of baseline profile, aux buffer isn't needed, HW should't use it. Aux > phys address is set to a predefined and invalid address, so that in a case of > VDE trying to use it, its invalid memory accesses would be reported in KMSG by > memory controller driver and the reported invalid addresses would be known to be > associated with the aux buffer. I'm not sure what you are meaning. It's not used perhaps, but we do write it to the hardware, right? tegra_vde_write_iram_entry(iram_tables, 0, i, value, aux_paddr); It's just strange. regards, dan carpenter -- To unsubscribe from this list: send the line "unsubscribe linux-tegra" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 28.09.2017 10:23, Dan Carpenter wrote: > On Thu, Sep 28, 2017 at 02:28:04AM +0300, Dmitry Osipenko wrote: >>>> + if (is_baseline_profile) >>>> + frame->aux_paddr = 0xF4DEAD00; >>> >>> The handling of is_baseline_profile is strange to me. It feels like we >>> should always check it before we use ->aux_paddr but we don't ever do >>> that. >>> >> >> In a case of baseline profile, aux buffer isn't needed, HW should't use it. Aux >> phys address is set to a predefined and invalid address, so that in a case of >> VDE trying to use it, its invalid memory accesses would be reported in KMSG by >> memory controller driver and the reported invalid addresses would be known to be >> associated with the aux buffer. I'm not sure what you are meaning. > > It's not used perhaps, but we do write it to the hardware, right? > That's right. I'm pretty sure HW won't use the aux in a case of baseline profile, haven't seen an evidence of it. But in order to be on a safe side, the addresses are initialized to an invalid value, so HW won't have a chance to silently fetch/trash 'arbitrary' main memory and we will know if it tries to do it. if (baseline_profile) frame->aux_paddr = 0xF4DEAD00; else { So here ^ all *used* frame entries are being initialized. > tegra_vde_write_iram_entry(iram_tables, 0, i, value, aux_paddr); > > It's just strange. > There up to 16 reference video frames (already decoded) that could be used for decoding of a video frame. Addresses of reference frames that shouldn't be used for decoding of the current frame are set to an invalid address. Userspace my supply wrong frames list or frames list setup code may contain an obscure bug and we will know about it. } else { aux_paddr = 0xFADEAD00; value = 0; } Here ^ all *unused* frame entries are being initialized. tegra_vde_write_iram_entry(iram_tables, 0, i, value, aux_paddr); And here ^ these entries are written to the tables that are read by HW.
diff --git a/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt new file mode 100644 index 000000000000..e5ca6f5e96c0 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt @@ -0,0 +1,38 @@ +NVIDIA Tegra Video Decoder Engine + +Required properties: +- compatible : "nvidia,tegra20-vde" +- reg : Must contain 2 register ranges: registers and IRAM area. +- reg-names : Must include the following entries: + - regs + - iram +- interrupts : Must contain an entry for each entry in interrupt-names. +- interrupt-names : Must include the following entries: + - ucq-error + - sync-token + - bsev + - bsea + - sxe +- clocks : Must contain one entry, for the module clock. + See ../clocks/clock-bindings.txt for details. +- resets : Must contain an entry for each entry in reset-names. + See ../reset/reset.txt for details. +- reset-names : Must include the following entries: + - vde + +Example: + vde@6001a000 { + compatible = "nvidia,tegra20-vde"; + reg = <0x6001a000 0x3D00 /* VDE registers */ + 0x40000400 0x3FC00>; /* IRAM area */ + reg-names = "regs", "iram"; + interrupts = <GIC_SPI 8 IRQ_TYPE_LEVEL_HIGH>, /* UCQ error interrupt */ + <GIC_SPI 9 IRQ_TYPE_LEVEL_HIGH>, /* Sync token interrupt */ + <GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>, /* BSE-V interrupt */ + <GIC_SPI 11 IRQ_TYPE_LEVEL_HIGH>, /* BSE-A interrupt */ + <GIC_SPI 12 IRQ_TYPE_LEVEL_HIGH>; /* SXE interrupt */ + interrupt-names = "ucq-error", "sync-token", "bsev", "bsea", "sxe"; + clocks = <&tegra_car TEGRA20_CLK_VDE>; + resets = <&tegra_car 61>; + reset-names = "vde"; + }; diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig index 554683912cff..10c982811093 100644 --- a/drivers/staging/Kconfig +++ b/drivers/staging/Kconfig @@ -118,4 +118,6 @@ source "drivers/staging/vboxvideo/Kconfig" source "drivers/staging/pi433/Kconfig" +source "drivers/staging/tegra-vde/Kconfig" + endif # STAGING diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile index 8951c37d8d80..d07c167d5773 100644 --- a/drivers/staging/Makefile +++ b/drivers/staging/Makefile @@ -49,3 +49,4 @@ obj-$(CONFIG_BCM2835_VCHIQ) += vc04_services/ obj-$(CONFIG_CRYPTO_DEV_CCREE) += ccree/ obj-$(CONFIG_DRM_VBOXVIDEO) += vboxvideo/ obj-$(CONFIG_PI433) += pi433/ +obj-$(CONFIG_ARCH_TEGRA) += tegra-vde/ diff --git a/drivers/staging/tegra-vde/Kconfig b/drivers/staging/tegra-vde/Kconfig new file mode 100644 index 000000000000..b947c012a373 --- /dev/null +++ b/drivers/staging/tegra-vde/Kconfig @@ -0,0 +1,6 @@ +config TEGRA_VDE + tristate "NVIDIA Tegra20 video decoder driver" + depends on ARCH_TEGRA_2x_SOC + help + Say Y here to enable support for a NVIDIA Tegra20 video decoder + driver. diff --git a/drivers/staging/tegra-vde/Makefile b/drivers/staging/tegra-vde/Makefile new file mode 100644 index 000000000000..e7c0df1174bf --- /dev/null +++ b/drivers/staging/tegra-vde/Makefile @@ -0,0 +1 @@ +obj-$(CONFIG_TEGRA_VDE) += vde.o diff --git a/drivers/staging/tegra-vde/TODO b/drivers/staging/tegra-vde/TODO new file mode 100644 index 000000000000..533ddfc5190e --- /dev/null +++ b/drivers/staging/tegra-vde/TODO @@ -0,0 +1,8 @@ +All TODO's require reverse-engineering to be done first, it is very +unlikely that NVIDIA would ever release HW specs to public. + +TODO: + - properly handle decoding faults + - support more formats + +Contact: Dmitry Osipenko <digetx@gmail.com> diff --git a/drivers/staging/tegra-vde/uapi.h b/drivers/staging/tegra-vde/uapi.h new file mode 100644 index 000000000000..4e60449f688e --- /dev/null +++ b/drivers/staging/tegra-vde/uapi.h @@ -0,0 +1,99 @@ +/* + * Copyright (C) 2016-2017 Dmitry Osipenko <digetx@gmail.com> + * All Rights Reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * VA LINUX SYSTEMS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + */ + +#ifndef _UAPI_TEGRA_VDE_H_ +#define _UAPI_TEGRA_VDE_H_ + +#include <linux/types.h> +#include <asm/ioctl.h> + +#if defined(__cplusplus) +extern "C" { +#endif + +#define FLAG_IS_B_FRAME (1 << 0) +#define FLAG_IS_REFERENCE (1 << 1) + +struct tegra_vde_h264_frame { + __s32 y_fd; + __s32 cb_fd; + __s32 cr_fd; + __s32 aux_fd; + __u32 y_offset; + __u32 cb_offset; + __u32 cr_offset; + __u32 aux_offset; + __u32 frame_num; + __u32 flags; + + __u32 reserved; +} __attribute__((packed)); + +struct tegra_vde_h264_decoder_ctx { + __s32 bitstream_data_fd; + __u32 bitstream_data_offset; + + __u32 dpb_frames_ptr; + __u8 dpb_frames_nb; + __u8 dpb_ref_frames_with_earlier_poc_nb; + + // SPS + __u8 is_baseline_profile; + __u8 level_idc; + __u8 log2_max_pic_order_cnt_lsb; + __u8 log2_max_frame_num; + __u8 pic_order_cnt_type; + __u8 direct_8x8_inference_flag; + __u8 pic_width_in_mbs; + __u8 pic_height_in_mbs; + + // PPS + __u8 pic_init_qp; + __u8 deblocking_filter_control_present_flag; + __u8 constrained_intra_pred_flag; + __u8 chroma_qp_index_offset; + __u8 pic_order_present_flag; + + // Slice header + __u8 num_ref_idx_l0_active_minus1; + __u8 num_ref_idx_l1_active_minus1; + + __u32 reserved; +} __attribute__((packed)); + +#define VDE_IOCTL_BASE 'v' +#define VDE_IO(nr) _IO(VDE_IOCTL_BASE,nr) +#define VDE_IOR(nr,type) _IOR(VDE_IOCTL_BASE,nr,type) +#define VDE_IOW(nr,type) _IOW(VDE_IOCTL_BASE,nr,type) +#define VDE_IOWR(nr,type) _IOWR(VDE_IOCTL_BASE,nr,type) + +#define TEGRA_VDE_DECODE_H264 0x01 + +#define TEGRA_VDE_IOCTL_DECODE_H264 VDE_IOW(VDE_IOCTL_BASE + TEGRA_VDE_DECODE_H264, struct tegra_vde_h264_decoder_ctx) + +#if defined(__cplusplus) +} +#endif + +#endif // _UAPI_TEGRA_VDE_H_ diff --git a/drivers/staging/tegra-vde/vde.c b/drivers/staging/tegra-vde/vde.c new file mode 100644 index 000000000000..309d87926e21 --- /dev/null +++ b/drivers/staging/tegra-vde/vde.c @@ -0,0 +1,971 @@ +/* + * NVIDIA Tegra20 Video decoder driver + * + * Copyright (C) 2016-2017 Dmitry Osipenko <digetx@gmail.com> + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + */ + +#include <linux/clk.h> +#include <linux/delay.h> +#include <linux/dma-buf.h> +#include <linux/interrupt.h> +#include <linux/io.h> +#include <linux/iopoll.h> +#include <linux/miscdevice.h> +#include <linux/module.h> +#include <linux/platform_device.h> +#include <linux/reset.h> +#include <linux/slab.h> +#include <linux/uaccess.h> + +#include <soc/tegra/pmc.h> + +#include "uapi.h" + +#define SXE(offt) (0x0000 + (offt)) +#define BSEV(offt) (0x1000 + (offt)) +#define MBE(offt) (0x2000 + (offt)) +#define PPE(offt) (0x2200 + (offt)) +#define MCE(offt) (0x2400 + (offt)) +#define TFE(offt) (0x2600 + (offt)) +#define VDMA(offt) (0x2A00 + (offt)) +#define FRAMEID(offt) (0x3800 + (offt)) + +#define ICMDQUE_WR 0x00 +#define CMDQUE_CONTROL 0x08 +#define INTR_STATUS 0x18 +#define BSE_INT_ENB 0x40 +#define BSE_CONFIG 0x44 + +#define BSE_ICMDQUE_EMPTY BIT(3) +#define BSE_DMA_BUSY BIT(23) + +#define TEGRA_VDE_TIMEOUT (msecs_to_jiffies(1000)) + +#define VDE_WR(data, addr) \ +do { \ + pr_debug("%s: %d: 0x%08X => " #addr ")\n", \ + __func__, __LINE__, (data)); \ + writel_relaxed(data, addr); \ +} while (0) + +struct video_frame { + struct dma_buf_attachment *y_dmabuf_attachment; + struct dma_buf_attachment *cb_dmabuf_attachment; + struct dma_buf_attachment *cr_dmabuf_attachment; + struct dma_buf_attachment *aux_dmabuf_attachment; + dma_addr_t y_paddr; + dma_addr_t cb_paddr; + dma_addr_t cr_paddr; + dma_addr_t aux_paddr; + u32 frame_num; + u32 flags; +}; + +struct tegra_vde { + void __iomem *regs; + void __iomem *iram; + struct mutex lock; + struct miscdevice miscdev; + struct reset_control *rst; + struct clk *clk; + struct completion decode_completion; + phys_addr_t iram_lists_paddr; + int irq; +}; + +static void tegra_vde_set_bits(void __iomem *regs, u32 mask, u32 offset) +{ + u32 value = readl_relaxed(regs + offset); + + VDE_WR(value | mask, regs + offset); +} + +static int tegra_vde_wait_mbe(void __iomem *regs) +{ + u32 tmp; + + return readl_poll_timeout(regs + MBE(0x8C), tmp, (tmp >= 0x10), 1, 100); +} + +static int tegra_vde_setup_mbe_frame_idx(void __iomem *regs, + int setup_refs, int refs_nb) +{ + u32 frame_idx_enb_mask = 0; + u32 value; + int frame_idx; + int idx; + + VDE_WR(0xD0000000 | (0 << 23), regs + MBE(0x80)); + VDE_WR(0xD0200000 | (0 << 23), regs + MBE(0x80)); + + if (tegra_vde_wait_mbe(regs)) + return -EIO; + + if (!setup_refs) + return 0; + + for (idx = 0, frame_idx = 1; idx < refs_nb; idx++, frame_idx++) { + VDE_WR(0xD0000000 | (frame_idx << 23), regs + MBE(0x80)); + VDE_WR(0xD0200000 | (frame_idx << 23), regs + MBE(0x80)); + + frame_idx_enb_mask |= frame_idx << (6 * (idx % 4)); + + if (idx % 4 == 3 || idx == refs_nb - 1) { + value = 0xC0000000; + value |= (idx >> 2) << 24; + value |= frame_idx_enb_mask; + + VDE_WR(value, regs + MBE(0x80)); + + if (tegra_vde_wait_mbe(regs)) + return -EIO; + + frame_idx_enb_mask = 0; + } + } + + return 0; +} + +static void tegra_vde_mbe_set_0xa_reg(void __iomem *regs, int reg, u32 val) +{ + VDE_WR(0xA0000000 | (reg << 24) | (val & 0xFFFF), regs + MBE(0x80)); + VDE_WR(0xA0000000 | ((reg + 1) << 24) | (val >> 16), regs + MBE(0x80)); +} + +static int tegra_vde_wait_bsev(struct tegra_vde *vde, bool wait_dma) +{ + struct device *dev = vde->miscdev.parent; + u32 polled; + int ret; + + ret = readl_poll_timeout(vde->regs + BSEV(INTR_STATUS), polled, + !(polled & BIT(2)), 1, 100); + if (ret) { + dev_err(dev, "BSEV unknown bit timeout\n"); + return -EIO; + } + + ret = readl_poll_timeout(vde->regs + BSEV(INTR_STATUS), polled, + (polled & BSE_ICMDQUE_EMPTY), 1, 100); + if (ret) { + dev_err(dev, "BSEV ICMDQUE flush timeout\n"); + return -EIO; + } + + if (!wait_dma) + return 0; + + ret = readl_poll_timeout(vde->regs + BSEV(INTR_STATUS), polled, + !(polled & BSE_DMA_BUSY), 1, 100); + if (ret) { + dev_err(dev, "BSEV DMA timeout\n"); + return -EIO; + } + + return 0; +} + +static int tegra_vde_push_bsev_icmdqueue(struct tegra_vde *vde, + u32 value, bool wait_dma) +{ + VDE_WR(value, vde->regs + BSEV(ICMDQUE_WR)); + + return tegra_vde_wait_bsev(vde, wait_dma); +} + +static void tegra_vde_setup_frameid(void __iomem *regs, + struct video_frame *frame, int frameid, + u32 mbs_width, u32 mbs_height) +{ + u32 y_paddr = frame ? frame->y_paddr : 0xFCDEAD00; + u32 cb_paddr = frame ? frame->cb_paddr : 0xFCDEAD00; + u32 cr_paddr = frame ? frame->cr_paddr : 0xFCDEAD00; + u32 v1 = frame ? ((mbs_width << 16) | mbs_height) : 0; + u32 v2 = frame ? ((((mbs_width + 1) >> 1) << 6) | 1) : 0; + + VDE_WR( y_paddr >> 8, regs + FRAMEID(0x000 + frameid * 4)); + VDE_WR(cb_paddr >> 8, regs + FRAMEID(0x100 + frameid * 4)); + VDE_WR(cr_paddr >> 8, regs + FRAMEID(0x180 + frameid * 4)); + VDE_WR(v1, regs + FRAMEID(0x080 + frameid * 4)); + VDE_WR(v2, regs + FRAMEID(0x280 + frameid * 4)); +} + +static void tegra_setup_frameidx(void __iomem *regs, + struct video_frame *frames, int frames_nb, + u32 mbs_width, u32 mbs_height) +{ + int idx; + + for (idx = 0; idx < frames_nb; idx++) + tegra_vde_setup_frameid(regs, &frames[idx], idx, + mbs_width, mbs_height); + for (; idx < 17; idx++) + tegra_vde_setup_frameid(regs, NULL, idx, 0, 0); +} + +static void tegra_vde_write_iram_entry(void __iomem *tables, + int table, int row, + u32 value1, u32 value2) +{ + VDE_WR(value1, tables + 0x80 * table + row * 8); + VDE_WR(value2, tables + 0x80 * table + row * 8 + 4); +} + +static void tegra_vde_setup_iram_tables(void __iomem *iram_tables, + struct video_frame *dpb_frames, + int ref_frames_nb, + int with_earlier_poc_nb) +{ + struct video_frame *frame; + u32 value, aux_paddr; + int with_later_poc_nb; + int i, k; + + pr_debug("DPB: Frame 0: frame_num = %d\n", dpb_frames[0].frame_num); + + pr_debug("REF L0:\n"); + + for (i = 0; i < 16; i++) { + if (i < ref_frames_nb) { + frame = &dpb_frames[i + 1]; + + aux_paddr = frame->aux_paddr; + + value = (i + 1) << 26; + value |= !(frame->flags & FLAG_IS_B_FRAME) << 25; + value |= 1 << 24; + value |= frame->frame_num; + + pr_debug("\tFrame %d: frame_num = %d is_B_frame = %d\n", + i + 1, frame->frame_num, + (frame->flags & FLAG_IS_B_FRAME)); + } else { + aux_paddr = 0xFADEAD00; + value = 0; + } + + tegra_vde_write_iram_entry(iram_tables, 0, i, value, aux_paddr); + tegra_vde_write_iram_entry(iram_tables, 1, i, value, aux_paddr); + tegra_vde_write_iram_entry(iram_tables, 2, i, value, aux_paddr); + tegra_vde_write_iram_entry(iram_tables, 3, i, value, aux_paddr); + } + + if (!(dpb_frames[0].flags & FLAG_IS_B_FRAME)) + return; + + if (with_earlier_poc_nb >= ref_frames_nb) + return; + + with_later_poc_nb = ref_frames_nb - with_earlier_poc_nb; + + pr_debug("REF L1: with_later_poc_nb %d with_earlier_poc_nb %d\n", + with_later_poc_nb, with_earlier_poc_nb); + + for (i = 0, k = with_earlier_poc_nb; i < with_later_poc_nb; i++, k++) { + frame = &dpb_frames[k + 1]; + + aux_paddr = frame->aux_paddr; + + value = (k + 1) << 26; + value |= !(frame->flags & FLAG_IS_B_FRAME) << 25; + value |= 1 << 24; + value |= frame->frame_num; + + pr_debug("\tFrame %d: frame_num = %d\n", + k + 1, frame->frame_num); + + tegra_vde_write_iram_entry(iram_tables, 2, i, value, aux_paddr); + } + + for (k = 0; i < ref_frames_nb; i++, k++) { + frame = &dpb_frames[k + 1]; + + aux_paddr = frame->aux_paddr; + + value = (k + 1) << 26; + value |= !(frame->flags & FLAG_IS_B_FRAME) << 25; + value |= 1 << 24; + value |= frame->frame_num; + + pr_debug("\tFrame %d: frame_num = %d\n", + k + 1, frame->frame_num); + + tegra_vde_write_iram_entry(iram_tables, 2, i, value, aux_paddr); + } +} + +static int tegra_vde_setup_context(struct tegra_vde *vde, + struct tegra_vde_h264_decoder_ctx *ctx, + struct video_frame *dpb_frames, + phys_addr_t bitstream_data_paddr, + int bitstream_data_size, + int macroblocks_nb) +{ + struct device *dev = vde->miscdev.parent; + u32 value; + + tegra_vde_set_bits(vde->regs, 0xA, SXE(0xF0)); + tegra_vde_set_bits(vde->regs, 0xB, BSEV(CMDQUE_CONTROL)); + tegra_vde_set_bits(vde->regs, 0x8002, MBE(0x50)); + tegra_vde_set_bits(vde->regs, 0xA, MBE(0xA0)); + tegra_vde_set_bits(vde->regs, 0xA, PPE(0x14)); + tegra_vde_set_bits(vde->regs, 0xA, PPE(0x28)); + tegra_vde_set_bits(vde->regs, 0xA00, MCE(0x08)); + tegra_vde_set_bits(vde->regs, 0xA, TFE(0x00)); + tegra_vde_set_bits(vde->regs, 0x5, VDMA(0x04)); + + VDE_WR(0x00000000, vde->regs + VDMA(0x1C)); + VDE_WR(0x00000000, vde->regs + VDMA(0x00)); + VDE_WR(0x00000007, vde->regs + VDMA(0x04)); + VDE_WR(0x00000007, vde->regs + FRAMEID(0x200)); + VDE_WR(0x00000005, vde->regs + TFE(0x04)); + VDE_WR(0x00000000, vde->regs + MBE(0x84)); + VDE_WR(0x00000010, vde->regs + SXE(0x08)); + VDE_WR(0x00000150, vde->regs + SXE(0x54)); + VDE_WR(0x0000054C, vde->regs + SXE(0x58)); + VDE_WR(0x00000E34, vde->regs + SXE(0x5C)); + VDE_WR(0x063C063C, vde->regs + MCE(0x10)); + VDE_WR(0x0003FC00, vde->regs + BSEV(INTR_STATUS)); + VDE_WR(0x0000150D, vde->regs + BSEV(BSE_CONFIG)); + VDE_WR(0x00000100, vde->regs + BSEV(BSE_INT_ENB)); + VDE_WR(0x00000000, vde->regs + BSEV(0x98)); + VDE_WR(0x00000060, vde->regs + BSEV(0x9C)); + + memset_io(vde->iram + 512, 0, macroblocks_nb / 2); + + tegra_setup_frameidx(vde->regs, dpb_frames, ctx->dpb_frames_nb, + ctx->pic_width_in_mbs, ctx->pic_height_in_mbs); + + tegra_vde_setup_iram_tables(vde->iram, dpb_frames, + ctx->dpb_frames_nb - 1, + ctx->dpb_ref_frames_with_earlier_poc_nb); + + VDE_WR(0x00000000, vde->regs + BSEV(0x8C)); + VDE_WR(bitstream_data_paddr + bitstream_data_size, + vde->regs + BSEV(0x54)); + + value = ctx->pic_width_in_mbs << 11 | ctx->pic_height_in_mbs << 3; + + VDE_WR(value, vde->regs + BSEV(0x88)); + + if (tegra_vde_wait_bsev(vde, false)) + return -EIO; + + if (tegra_vde_push_bsev_icmdqueue(vde, 0x800003FC, false)) + return -EIO; + + value = 0x01500000; + value |= ((vde->iram_lists_paddr + 512) >> 2) & 0xFFFF; + + if (tegra_vde_push_bsev_icmdqueue(vde, value, true)) + return -EIO; + + if (tegra_vde_push_bsev_icmdqueue(vde, 0x840F054C, false)) + return -EIO; + + if (tegra_vde_push_bsev_icmdqueue(vde, 0x80000080, false)) + return -EIO; + + value = 0x0E340000 | ((vde->iram_lists_paddr >> 2) & 0xFFFF); + + if (tegra_vde_push_bsev_icmdqueue(vde, value, true)) + return -EIO; + + value = (1 << 23) | 5; + value |= ctx->pic_width_in_mbs << 11; + value |= ctx->pic_height_in_mbs << 3; + + VDE_WR(value, vde->regs + SXE(0x10)); + + value = !ctx->is_baseline_profile << 17; + value |= ctx->level_idc << 13; + value |= ctx->log2_max_pic_order_cnt_lsb << 7; + value |= ctx->pic_order_cnt_type << 5; + value |= ctx->log2_max_frame_num; + + VDE_WR(value, vde->regs + SXE(0x40)); + + value = ctx->pic_init_qp << 25; + value |= !!(ctx->deblocking_filter_control_present_flag) << 2; + value |= !!ctx->pic_order_present_flag; + + VDE_WR(value, vde->regs + SXE(0x44)); + + value = ctx->chroma_qp_index_offset; + value |= ctx->num_ref_idx_l0_active_minus1 << 5; + value |= ctx->num_ref_idx_l1_active_minus1 << 10; + value |= !!ctx->constrained_intra_pred_flag << 15; + + VDE_WR(value, vde->regs + SXE(0x48)); + + value = 0x0C000000; + value |= !!(dpb_frames[0].flags & FLAG_IS_B_FRAME) << 24; + + VDE_WR(value, vde->regs + SXE(0x4C)); + + value = 0x03800000; + value |= min(bitstream_data_size, SZ_1M); + + VDE_WR(value, vde->regs + SXE(0x68)); + + VDE_WR(bitstream_data_paddr, vde->regs + SXE(0x6C)); + + value = (1 << 28) | 5; + value |= ctx->pic_width_in_mbs << 11; + value |= ctx->pic_height_in_mbs << 3; + + VDE_WR(value, vde->regs + MBE(0x80)); + + value = 0x26800000; + value |= ctx->level_idc << 4; + value |= !ctx->is_baseline_profile << 1; + value |= !!ctx->direct_8x8_inference_flag; + + VDE_WR(value, vde->regs + MBE(0x80)); + + VDE_WR(0xF4000001, vde->regs + MBE(0x80)); + VDE_WR(0x20000000, vde->regs + MBE(0x80)); + VDE_WR(0xF4000101, vde->regs + MBE(0x80)); + + value = 0x20000000; + value |= ctx->chroma_qp_index_offset << 8; + + VDE_WR(value, vde->regs + MBE(0x80)); + + if (tegra_vde_setup_mbe_frame_idx(vde->regs, + ctx->pic_order_cnt_type == 0, + ctx->dpb_frames_nb - 1)) { + dev_err(dev, "MBE frames setup failed\n"); + return -EIO; + } + + tegra_vde_mbe_set_0xa_reg(vde->regs, 0, 0x000009FC); + tegra_vde_mbe_set_0xa_reg(vde->regs, 2, 0xF1DEAD00); + tegra_vde_mbe_set_0xa_reg(vde->regs, 4, 0xF2DEAD00); + tegra_vde_mbe_set_0xa_reg(vde->regs, 6, 0xF3DEAD00); + tegra_vde_mbe_set_0xa_reg(vde->regs, 8, dpb_frames[0].aux_paddr); + + value = 0xFC000000; + value |= !!(dpb_frames[0].flags & FLAG_IS_B_FRAME) << 2; + + if (!ctx->is_baseline_profile) + value |= !!(dpb_frames[0].flags & FLAG_IS_REFERENCE) << 1; + + VDE_WR(value, vde->regs + MBE(0x80)); + + if (tegra_vde_wait_mbe(vde->regs)) { + dev_err(dev, "MBE programming failed\n"); + return -EIO; + } + + return 0; +} + +static void tegra_vde_decode_frame(struct tegra_vde *vde, int macroblocks_nb) +{ + VDE_WR(0x00000001, vde->regs + BSEV(0x8C)); + VDE_WR(0x20000000 | (macroblocks_nb - 1), vde->regs + SXE(0x00)); +} + +static void tegra_vde_detach_and_put_dmabuf(struct dma_buf_attachment *a) +{ + struct dma_buf *dmabuf = a->dmabuf; + + if (IS_ERR_OR_NULL(a)) + return; + + dma_buf_detach(dmabuf, a); + dma_buf_put(dmabuf); +} + +static int tegra_vde_attach_dmabuf(struct device *dev, int fd, + unsigned long offset, int min_size, + struct dma_buf_attachment **a, + phys_addr_t *paddr, u32 *size, + enum dma_data_direction dma_dir) +{ + struct dma_buf_attachment *attachment; + struct dma_buf *dmabuf; + struct sg_table *sgt; + + *a = NULL; + *paddr = 0xFBDEAD00; + + dmabuf = dma_buf_get(fd); + if (IS_ERR(dmabuf)) { + dev_err(dev, "Invalid dmabuf FD\n"); + return PTR_ERR(dmabuf); + } + + if ((u64)offset + min_size > dmabuf->size) { + dev_err(dev, "Too small dmabuf size %d @0x%lX, " + "should be at least %d\n", + dmabuf->size, offset, min_size); + return -EINVAL; + } + + attachment = dma_buf_attach(dmabuf, dev); + if (IS_ERR(attachment)) { + dev_err(dev, "Failed to attach dmabuf\n"); + dma_buf_put(dmabuf); + return PTR_ERR(attachment); + } + + sgt = dma_buf_map_attachment(attachment, dma_dir); + if (IS_ERR(sgt)) { + dev_err(dev, "Failed to get dmabuf sg_table\n"); + dma_buf_detach(dmabuf, attachment); + dma_buf_put(dmabuf); + return PTR_ERR(sgt); + } + + if (sgt->nents != 1) { + dev_err(dev, "Sparsed DMA area is unsupported\n"); + dma_buf_unmap_attachment(attachment, sgt, dma_dir); + dma_buf_detach(dmabuf, attachment); + dma_buf_put(dmabuf); + return -EINVAL; + } + + *paddr = sg_dma_address(sgt->sgl) + offset; + + dma_buf_unmap_attachment(attachment, sgt, dma_dir); + + *a = attachment; + + if (size) + *size = dmabuf->size - offset; + + return 0; +} + +static int tegra_vde_attach_frame_dmabufs(struct device *dev, + struct video_frame *frame, + struct tegra_vde_h264_frame *source, + enum dma_data_direction dma_dir, + int is_baseline_profile, int csize) +{ + int ret; + + ret = tegra_vde_attach_dmabuf(dev, source->y_fd, + source->y_offset, csize * 4, + &frame->y_dmabuf_attachment, + &frame->y_paddr, NULL, dma_dir); + if (ret) + return ret; + + ret = tegra_vde_attach_dmabuf(dev, source->cb_fd, + source->cb_offset, csize, + &frame->cb_dmabuf_attachment, + &frame->cb_paddr, NULL, dma_dir); + if (ret) + return ret; + + ret = tegra_vde_attach_dmabuf(dev, source->cr_fd, + source->cr_offset, csize, + &frame->cr_dmabuf_attachment, + &frame->cr_paddr, NULL, dma_dir); + if (ret) + return ret; + + if (is_baseline_profile) + frame->aux_paddr = 0xF4DEAD00; + else + ret = tegra_vde_attach_dmabuf(dev, source->aux_fd, + source->aux_offset, csize, + &frame->aux_dmabuf_attachment, + &frame->aux_paddr, NULL, dma_dir); + + return ret; +} + +static void tegra_vde_deattach_frame_dmabufs(struct video_frame *frame) +{ + tegra_vde_detach_and_put_dmabuf(frame->y_dmabuf_attachment); + tegra_vde_detach_and_put_dmabuf(frame->cb_dmabuf_attachment); + tegra_vde_detach_and_put_dmabuf(frame->cr_dmabuf_attachment); + tegra_vde_detach_and_put_dmabuf(frame->aux_dmabuf_attachment); +} + +static int tegra_vde_copy_and_validate_frame(struct device *dev, + struct tegra_vde_h264_frame *frame, + unsigned long vaddr) +{ + if (copy_from_user(frame, (void __user *)vaddr, sizeof(*frame))) { + dev_err(dev, "Copying of H.264 frame from user failed\n"); + return -EFAULT; + } + + if (frame->frame_num > 0x7FFFFF) { + dev_err(dev, "Bad frame_num %u\n", frame->frame_num); + return -EINVAL; + } + + if (frame->y_offset & 0xFF) { + dev_err(dev, "Bad y_offset 0x%X\n", frame->y_offset); + return -EINVAL; + } + + if (frame->cb_offset & 0xFF) { + dev_err(dev, "Bad cb_offset 0x%X\n", frame->cb_offset); + return -EINVAL; + } + + if (frame->cr_offset & 0xFF) { + dev_err(dev, "Bad cr_offset 0x%X\n", frame->cr_offset); + return -EINVAL; + } + + return 0; +} + +static int tegra_vde_validate_h264_ctx(struct device *dev, + struct tegra_vde_h264_decoder_ctx *ctx) +{ + if (ctx->dpb_frames_nb == 0 || ctx->dpb_frames_nb > 17) { + dev_err(dev, "Bad DPB size %u\n", ctx->dpb_frames_nb); + return -EINVAL; + } + + if (ctx->level_idc > 15) { + dev_err(dev, "Bad level value %u\n", ctx->level_idc); + return -EINVAL; + } + + if (ctx->pic_init_qp > 52) { + dev_err(dev, "Bad pic_init_qp value %u\n", ctx->pic_init_qp); + return -EINVAL; + } + + if (ctx->log2_max_pic_order_cnt_lsb > 16) { + dev_err(dev, "Bad log2_max_pic_order_cnt_lsb value %u\n", + ctx->log2_max_pic_order_cnt_lsb); + return -EINVAL; + } + + if (ctx->log2_max_frame_num > 16) { + dev_err(dev, "Bad log2_max_frame_num value %u\n", + ctx->log2_max_frame_num); + return -EINVAL; + } + + if (ctx->chroma_qp_index_offset > 31) { + dev_err(dev, "Bad chroma_qp_index_offset value %u\n", + ctx->chroma_qp_index_offset); + return -EINVAL; + } + + if (ctx->pic_order_cnt_type > 2) { + dev_err(dev, "Bad pic_order_cnt_type value %u\n", + ctx->pic_order_cnt_type); + return -EINVAL; + } + + if (ctx->num_ref_idx_l0_active_minus1 > 15) { + dev_err(dev, "Bad num_ref_idx_l0_active_minus1 value %u\n", + ctx->num_ref_idx_l0_active_minus1); + return -EINVAL; + } + + if (ctx->num_ref_idx_l1_active_minus1 > 15) { + dev_err(dev, "Bad num_ref_idx_l1_active_minus1 value %u\n", + ctx->num_ref_idx_l1_active_minus1); + return -EINVAL; + } + + if (!ctx->pic_width_in_mbs || ctx->pic_width_in_mbs > 127) { + dev_err(dev, "Bad pic_width_in_mbs value %u, min 1 max 127\n", + ctx->pic_width_in_mbs); + return -EINVAL; + } + + if (!ctx->pic_height_in_mbs || ctx->pic_height_in_mbs > 127) { + dev_err(dev, "Bad pic_height_in_mbs value %u, min 1 max 127\n", + ctx->pic_height_in_mbs); + return -EINVAL; + } + + return 0; +} + +static int tegra_vde_ioctl_decode_h264(struct tegra_vde *vde, + unsigned long vaddr) +{ + struct tegra_vde_h264_decoder_ctx ctx; + struct tegra_vde_h264_frame frame; + struct device *dev = vde->miscdev.parent; + struct video_frame *dpb_frames = NULL; + struct dma_buf_attachment *bitstream_data_dmabuf_attachment = NULL; + enum dma_data_direction dma_dir; + phys_addr_t bitstream_data_paddr; + phys_addr_t bsev_paddr; + u32 bitstream_data_size; + u32 macroblocks_nb; + bool timeout = false; + int i, ret; + + if (copy_from_user(&ctx, (void __user *)vaddr, sizeof(ctx))) { + dev_err(dev, "Copying of H.264 CTX from user failed\n"); + return -EFAULT; + } + + ret = tegra_vde_validate_h264_ctx(dev, &ctx); + if (ret) + return -EINVAL; + + ret = tegra_vde_attach_dmabuf(dev, ctx.bitstream_data_fd, + ctx.bitstream_data_offset, 0, + &bitstream_data_dmabuf_attachment, + &bitstream_data_paddr, + &bitstream_data_size, + DMA_TO_DEVICE); + if (ret) + goto cleanup; + + dpb_frames = kcalloc(ctx.dpb_frames_nb, sizeof(*dpb_frames), + GFP_KERNEL); + if (!dpb_frames) { + ret = -ENOMEM; + goto cleanup; + } + + macroblocks_nb = ctx.pic_width_in_mbs * ctx.pic_height_in_mbs; + vaddr = ctx.dpb_frames_ptr; + + for (i = 0; i < ctx.dpb_frames_nb; i++) { + ret = tegra_vde_copy_and_validate_frame(dev, &frame, vaddr); + if (ret) + goto cleanup; + + dpb_frames[i].flags = frame.flags; + dpb_frames[i].frame_num = frame.frame_num; + + dma_dir = (i == 0) ? DMA_FROM_DEVICE : DMA_TO_DEVICE; + + ret = tegra_vde_attach_frame_dmabufs(dev, + &dpb_frames[i], &frame, + dma_dir, + ctx.is_baseline_profile, + macroblocks_nb * 64); + if (ret) { + tegra_vde_deattach_frame_dmabufs(&dpb_frames[i]); + goto cleanup; + } + + vaddr += sizeof(frame); + } + + ret = clk_prepare_enable(vde->clk); + if (ret) { + dev_err(dev, "Failed to enable clk: %d\n", ret); + goto cleanup; + } + + ret = mutex_lock_interruptible(&vde->lock); + if (ret) + goto clkgate; + + ret = reset_control_deassert(vde->rst); + if (ret) { + dev_err(dev, "Failed to deassert reset: %d\n", ret); + clk_disable_unprepare(vde->clk); + goto unlock; + } + + ret = tegra_vde_setup_context(vde, &ctx, dpb_frames, + bitstream_data_paddr, + bitstream_data_size, + macroblocks_nb); + if (ret) + goto reset; + + tegra_vde_decode_frame(vde, macroblocks_nb); + + timeout = !wait_for_completion_io_timeout(&vde->decode_completion, + TEGRA_VDE_TIMEOUT); + if (timeout) { + bsev_paddr = readl(vde->regs + BSEV(0x10)); + macroblocks_nb = readl(vde->regs + SXE(0xC8)) & 0x1FFF; + + dev_err(dev, "Decoding failed, " + "read 0x%X bytes : %u macroblocks parsed\n", + bsev_paddr ? bsev_paddr - bitstream_data_paddr : 0, + macroblocks_nb); + } + +reset: + /* + * We rely on the VDE registers reset value, otherwise VDE would + * cause bus lockup. + */ + ret = reset_control_assert(vde->rst); + if (ret) + dev_err(dev, "Failed to assert reset: %d\n", ret); + + if (timeout) + ret = -EIO; + +unlock: + mutex_unlock(&vde->lock); + +clkgate: + clk_disable_unprepare(vde->clk); + +cleanup: + if (dpb_frames) + while (i--) + tegra_vde_deattach_frame_dmabufs(&dpb_frames[i]); + + kfree(dpb_frames); + + tegra_vde_detach_and_put_dmabuf(bitstream_data_dmabuf_attachment); + + return ret; +} + +static long tegra_vde_unlocked_ioctl(struct file *filp, + unsigned int cmd, unsigned long arg) +{ + struct miscdevice *miscdev = filp->private_data; + struct tegra_vde *vde = container_of(miscdev, struct tegra_vde, + miscdev); + + switch (cmd) { + case TEGRA_VDE_IOCTL_DECODE_H264: + return tegra_vde_ioctl_decode_h264(vde, arg); + } + + dev_err(miscdev->parent, "Invalid IOCTL command %u\n", cmd); + + return -ENOTTY; +} + +static const struct file_operations tegra_vde_fops = { + .owner = THIS_MODULE, + .unlocked_ioctl = tegra_vde_unlocked_ioctl, +}; + +static irqreturn_t tegra_vde_isr(int irq, void *data) +{ + struct tegra_vde *vde = data; + + tegra_vde_set_bits(vde->regs, 0, FRAMEID(0x208)); + complete(&vde->decode_completion); + + return IRQ_HANDLED; +} + +static int tegra_vde_probe(struct platform_device *pdev) +{ + struct resource *res_regs, *res_iram; + struct device *dev = &pdev->dev; + struct tegra_vde *vde; + int ret; + + vde = devm_kzalloc(&pdev->dev, sizeof(*vde), GFP_KERNEL); + if (!vde) + return -ENOMEM; + + platform_set_drvdata(pdev, vde); + + res_regs = platform_get_resource_byname(pdev, IORESOURCE_MEM, "regs"); + if (!res_regs) + return -ENODEV; + + res_iram = platform_get_resource_byname(pdev, IORESOURCE_MEM, "iram"); + if (!res_iram) + return -ENODEV; + + vde->irq = platform_get_irq_byname(pdev, "sync-token"); + if (vde->irq < 0) + return vde->irq; + + vde->regs = devm_ioremap_resource(dev, res_regs); + if (IS_ERR(vde->regs)) + return PTR_ERR(vde->regs); + + vde->iram = devm_ioremap_resource(dev, res_iram); + if (IS_ERR(vde->iram)) + return PTR_ERR(vde->iram); + + vde->clk = devm_clk_get(dev, NULL); + if (IS_ERR(vde->clk)) { + dev_err(dev, "Could not get VDE clk\n"); + return PTR_ERR(vde->clk); + } + + vde->rst = devm_reset_control_get(dev, "vde"); + if (IS_ERR(vde->rst)) { + dev_err(dev, "Could not get VDE reset\n"); + return PTR_ERR(vde->rst); + } + + ret = devm_request_irq(dev, vde->irq, tegra_vde_isr, IRQF_SHARED, + dev_name(dev), vde); + if (ret) + return -ENODEV; + + ret = tegra_powergate_sequence_power_up(TEGRA_POWERGATE_VDEC, + vde->clk, vde->rst); + if (ret) { + dev_err(&pdev->dev, "Failed to power up VDE unit\n"); + return ret; + } + + ret = reset_control_assert(vde->rst); + if (ret) { + dev_err(dev, "Failed to assert reset: %d\n", ret); + return ret; + } + + clk_disable_unprepare(vde->clk); + + mutex_init(&vde->lock); + init_completion(&vde->decode_completion); + + vde->iram_lists_paddr = res_iram->start; + vde->miscdev.minor = MISC_DYNAMIC_MINOR; + vde->miscdev.name = "tegra_vde"; + vde->miscdev.fops = &tegra_vde_fops; + vde->miscdev.parent = dev; + + ret = misc_register(&vde->miscdev); + if (ret) + return -ENODEV; + + return 0; +} + +static int tegra_vde_remove(struct platform_device *pdev) +{ + struct tegra_vde *vde = platform_get_drvdata(pdev); + + misc_deregister(&vde->miscdev); + + return 0; +} + +static const struct of_device_id tegra_vde_of_match[] = { + { .compatible = "nvidia,tegra20-vde", }, + { }, +}; +MODULE_DEVICE_TABLE(of, tegra_vde_of_match); + +static struct platform_driver tegra_vde_driver = { + .probe = tegra_vde_probe, + .remove = tegra_vde_remove, + .driver = { + .name = "tegra-vde", + .of_match_table = tegra_vde_of_match, + }, +}; +module_platform_driver(tegra_vde_driver); + +MODULE_DESCRIPTION("NVIDIA Tegra20 Video Decoder driver"); +MODULE_AUTHOR("Dmitry Osipenko"); +MODULE_LICENSE("GPL");
Video decoder, found on NVIDIA Tegra20 SoC, supports a standard set of video formats like H.264 / MPEG-4 / WMV / VC1. Currently driver supports decoding of CAVLC H.264 only. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> --- .../bindings/arm/tegra/nvidia,tegra20-vde.txt | 38 + drivers/staging/Kconfig | 2 + drivers/staging/Makefile | 1 + drivers/staging/tegra-vde/Kconfig | 6 + drivers/staging/tegra-vde/Makefile | 1 + drivers/staging/tegra-vde/TODO | 8 + drivers/staging/tegra-vde/uapi.h | 99 +++ drivers/staging/tegra-vde/vde.c | 971 +++++++++++++++++++++ 8 files changed, 1126 insertions(+) create mode 100644 Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt create mode 100644 drivers/staging/tegra-vde/Kconfig create mode 100644 drivers/staging/tegra-vde/Makefile create mode 100644 drivers/staging/tegra-vde/TODO create mode 100644 drivers/staging/tegra-vde/uapi.h create mode 100644 drivers/staging/tegra-vde/vde.c