From patchwork Fri Feb 1 13:28:24 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thierry Reding X-Patchwork-Id: 1034759 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-tegra-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="hxEuTFna"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 43rdGp1Dpsz9sDr for ; Sat, 2 Feb 2019 00:28:50 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726734AbfBAN2q (ORCPT ); Fri, 1 Feb 2019 08:28:46 -0500 Received: from mail-wr1-f68.google.com ([209.85.221.68]:35267 "EHLO mail-wr1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726190AbfBAN2q (ORCPT ); Fri, 1 Feb 2019 08:28:46 -0500 Received: by mail-wr1-f68.google.com with SMTP id r17so1136111wrp.2 for ; Fri, 01 Feb 2019 05:28:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=NwsWVWy8vuEMAajnZP7InA+g7wcnn8BpTYvFQ8gZvv0=; b=hxEuTFnaURF05Tlscni15NeD6qE7CPPebBJYjOnu4G0nKFNBTMydHsuziPwUjtv9cb 8aVhMEaDC8LzwYk4cH1LfrcbP8Jvbu/vYIz8rwgYmXT2kEqhWawLyudqa05B51tZmo86 FivMdY9nMLcta/YcMC+tLfAzCcr5xMh6AG6u7a8zSk5YX8mgatIoK5jEnVvFtCW0muqH /qOJRzQzq74+h13GTblzSkL4QsZVcHhyjJZOBjKrC8EBtKKvMYD0lkLz8La/k/At4Xj6 A6uweF5jjZhGWy31ip85GG0zE5sm9Z6w3E1IFJ72HiWOplP9taS5o/iv0LeGFqbS4j1B /ocA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=NwsWVWy8vuEMAajnZP7InA+g7wcnn8BpTYvFQ8gZvv0=; b=VkTUPOf7l4jSTMcGMyW/fy5SvMkFvJ7Dhuw2lRNeuS4W5neEs28L7PlPvkajynwHDk XtEWvWFy5IZua5O5BPog6uMWiu9vsieUDA2MWpVfShvG6ODuiVGKA5D8RWhgVAh4Qo71 kB+LJVMAVxZgc+34RUNyzi6hZXEMMwOsVQ2jZVw+xpE1oy2kpkcet8d0I+Y/+2yjD2ld Nz7ZdF7uUAme9HR0p2EPwQkfN/apfoVXeW9HRSPhdCwmuSbaxJXavhEtwuV9aJG0LOrb nAGAOe/YtVzXWmCWbUz/t39Ow9EDRaYzUa1ttoJ/+BjeplqWXt4xHJwB5/brgQLSDBkF QT+g== X-Gm-Message-State: AJcUukeELuGfVv1iOjmmlU9wnaXxPfzxUI19sPPwWF8QE8qHqnZlUbnM ysFYmBzRcYS4KVKUeFFSIqs= X-Google-Smtp-Source: ALg8bN74W/e1A8HSmj1wLJqiY2py6iA33J7IGE5/VKoOW7aEe6u/ZKZwUU/1s3E0cViIe24x0F5yqg== X-Received: by 2002:a5d:6549:: with SMTP id z9mr37550430wrv.116.1549027724078; Fri, 01 Feb 2019 05:28:44 -0800 (PST) Received: from localhost (pD9E51040.dip0.t-ipconnect.de. [217.229.16.64]) by smtp.gmail.com with ESMTPSA id y14sm5308192wro.92.2019.02.01.05.28.43 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 01 Feb 2019 05:28:43 -0800 (PST) From: Thierry Reding To: Thierry Reding Cc: Mikko Perttunen , Dmitry Osipenko , dri-devel@lists.freedesktop.org, linux-tegra@vger.kernel.org Subject: [PATCH v3 03/16] gpu: host1x: Introduce support for wide opcodes Date: Fri, 1 Feb 2019 14:28:24 +0100 Message-Id: <20190201132837.12327-4-thierry.reding@gmail.com> X-Mailer: git-send-email 2.19.1 In-Reply-To: <20190201132837.12327-1-thierry.reding@gmail.com> References: <20190201132837.12327-1-thierry.reding@gmail.com> MIME-Version: 1.0 Sender: linux-tegra-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-tegra@vger.kernel.org From: Thierry Reding The CDMA push buffer can currently only handle opcodes that take a single word parameter. However, the host1x implementation on Tegra186 and later supports opcodes that require multiple words as parameters. Unfortunately the way the push buffer is structured, these wide opcodes cannot simply be composed of two regular opcodes because that could result in the wide opcode being split across the end of the push buffer and the final RESTART opcode required to wrap the push buffer around would break the wide opcode. One way to fix this would be to remove the concept of slots to simplify push buffer operations. However, that's not entirely trivial and should be done in a separate patch. For now, simply use a different function to push four-word opcodes into the push buffer. Technically only three words are pushed, with the fourth word used as padding to preserve the 2-word alignment required by the slots abstraction. The fourth word is always a NOP opcode. Additional care must be taken when the end of the push buffer is reached. If a four-word opcode doesn't fit into the push buffer without being split by the boundary, NOP opcodes will be introduced and the new wide opcode placed at the beginning of the push buffer. Signed-off-by: Thierry Reding --- drivers/gpu/host1x/cdma.c | 92 +++++++++++++++++++++++++++++++++++ drivers/gpu/host1x/cdma.h | 2 + include/trace/events/host1x.h | 26 ++++++++++ 3 files changed, 120 insertions(+) diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c index 91df51e631b2..e2d106fa3c0b 100644 --- a/drivers/gpu/host1x/cdma.c +++ b/drivers/gpu/host1x/cdma.c @@ -217,6 +217,45 @@ unsigned int host1x_cdma_wait_locked(struct host1x_cdma *cdma, return 0; } +/* + * Sleep (if necessary) until the push buffer has enough free space. + * + * Must be called with the cdma lock held. + */ +int host1x_cdma_wait_pushbuffer_space(struct host1x *host1x, + struct host1x_cdma *cdma, + unsigned int needed) +{ + while (true) { + struct push_buffer *pb = &cdma->push_buffer; + unsigned int space; + + space = host1x_pushbuffer_space(pb); + if (space >= needed) + break; + + trace_host1x_wait_cdma(dev_name(cdma_to_channel(cdma)->dev), + CDMA_EVENT_PUSH_BUFFER_SPACE); + + host1x_hw_cdma_flush(host1x, cdma); + + /* If somebody has managed to already start waiting, yield */ + if (cdma->event != CDMA_EVENT_NONE) { + mutex_unlock(&cdma->lock); + schedule(); + mutex_lock(&cdma->lock); + continue; + } + + cdma->event = CDMA_EVENT_PUSH_BUFFER_SPACE; + + mutex_unlock(&cdma->lock); + down(&cdma->sem); + mutex_lock(&cdma->lock); + } + + return 0; +} /* * Start timer that tracks the time spent by the job. * Must be called with the cdma lock held. @@ -509,6 +548,59 @@ void host1x_cdma_push(struct host1x_cdma *cdma, u32 op1, u32 op2) host1x_pushbuffer_push(pb, op1, op2); } +/* + * Push four words into two consecutive push buffer slots. Note that extra + * care needs to be taken not to split the two slots across the end of the + * push buffer. Otherwise the RESTART opcode at the end of the push buffer + * that ensures processing will restart at the beginning will break up the + * four words. + * + * Blocks as necessary if the push buffer is full. + */ +void host1x_cdma_push_wide(struct host1x_cdma *cdma, u32 op1, u32 op2, + u32 op3, u32 op4) +{ + struct host1x_channel *channel = cdma_to_channel(cdma); + struct host1x *host1x = cdma_to_host1x(cdma); + struct push_buffer *pb = &cdma->push_buffer; + unsigned int needed = 2, extra = 0, i; + unsigned int space = cdma->slots_free; + + if (host1x_debug_trace_cmdbuf) + trace_host1x_cdma_push_wide(dev_name(channel->dev), op1, op2, + op3, op4); + + /* compute number of extra slots needed for padding */ + if (pb->pos + 16 > pb->size) { + extra = (pb->size - pb->pos) / 8; + needed += extra; + } + + host1x_cdma_wait_pushbuffer_space(host1x, cdma, needed); + space = host1x_pushbuffer_space(pb); + + cdma->slots_free = space - needed; + cdma->slots_used += needed; + + /* + * Note that we rely on the fact that this is only used to submit wide + * gather opcodes, which consist of 3 words, and they are padded with + * a NOP to avoid having to deal with fractional slots (a slot always + * represents 2 words). The fourth opcode passed to this function will + * therefore always be a NOP. + * + * This works around a slight ambiguity when it comes to opcodes. For + * all current host1x incarnations the NOP opcode uses the exact same + * encoding (0x20000000), so we could hard-code the value here, but a + * new incarnation may change it and break that assumption. + */ + for (i = 0; i < extra; i++) + host1x_pushbuffer_push(pb, op4, op4); + + host1x_pushbuffer_push(pb, op1, op2); + host1x_pushbuffer_push(pb, op3, op4); +} + /* * End a cdma submit * Kick off DMA, add job to the sync queue, and a number of slots to be freed diff --git a/drivers/gpu/host1x/cdma.h b/drivers/gpu/host1x/cdma.h index e97e17b82370..ebf6c6a06541 100644 --- a/drivers/gpu/host1x/cdma.h +++ b/drivers/gpu/host1x/cdma.h @@ -90,6 +90,8 @@ int host1x_cdma_init(struct host1x_cdma *cdma); int host1x_cdma_deinit(struct host1x_cdma *cdma); int host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job); void host1x_cdma_push(struct host1x_cdma *cdma, u32 op1, u32 op2); +void host1x_cdma_push_wide(struct host1x_cdma *cdma, u32 op1, u32 op2, + u32 op3, u32 op4); void host1x_cdma_end(struct host1x_cdma *cdma, struct host1x_job *job); void host1x_cdma_update(struct host1x_cdma *cdma); void host1x_cdma_peek(struct host1x_cdma *cdma, u32 dmaget, int slot, diff --git a/include/trace/events/host1x.h b/include/trace/events/host1x.h index a37ef73092e5..3d340b6f1ea3 100644 --- a/include/trace/events/host1x.h +++ b/include/trace/events/host1x.h @@ -80,6 +80,32 @@ TRACE_EVENT(host1x_cdma_push, __entry->name, __entry->op1, __entry->op2) ); +TRACE_EVENT(host1x_cdma_push_wide, + TP_PROTO(const char *name, u32 op1, u32 op2, u32 op3, u32 op4), + + TP_ARGS(name, op1, op2, op3, op4), + + TP_STRUCT__entry( + __field(const char *, name) + __field(u32, op1) + __field(u32, op2) + __field(u32, op3) + __field(u32, op4) + ), + + TP_fast_assign( + __entry->name = name; + __entry->op1 = op1; + __entry->op2 = op2; + __entry->op3 = op3; + __entry->op4 = op4; + ), + + TP_printk("name=%s, op1=%08x, op2=%08x, op3=%08x op4=%08x", + __entry->name, __entry->op1, __entry->op2, __entry->op3, + __entry->op4) +); + TRACE_EVENT(host1x_cdma_push_gather, TP_PROTO(const char *name, struct host1x_bo *bo, u32 words, u32 offset, void *cmdbuf),