From patchwork Wed Aug 9 22:14:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Lewis Hyatt X-Patchwork-Id: 1819641 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=CCeW9yMC; dkim-atps=neutral Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4RLkrP04vxz1yf2 for ; Thu, 10 Aug 2023 08:14:59 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 0E7C03857727 for ; Wed, 9 Aug 2023 22:14:57 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 0E7C03857727 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1691619297; bh=uO+ucp4kgopmi2Z+NkP8oylmdM169vOblcJeQNyh95A=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=CCeW9yMCYjWUdfUIYINxoCIQPpfFFnDKsMhfilMnPJu3IL/cUxkoEprLGk9s49EL7 woXIDB1jw+f7aU97k19jOh4eJfMIhhNuckzqTut5X/NV9GKGVsZJC1BeZ3OjDSw5dS ZCZSylmIAc6mLCT6LNiLbv6RD7eZeEk3SpOaP6Ic= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-ot1-x334.google.com (mail-ot1-x334.google.com [IPv6:2607:f8b0:4864:20::334]) by sourceware.org (Postfix) with ESMTPS id 8415E3858D20 for ; Wed, 9 Aug 2023 22:14:37 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 8415E3858D20 Received: by mail-ot1-x334.google.com with SMTP id 46e09a7af769-6bc886d1504so297617a34.0 for ; Wed, 09 Aug 2023 15:14:37 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691619276; x=1692224076; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=uO+ucp4kgopmi2Z+NkP8oylmdM169vOblcJeQNyh95A=; b=P4kLjqtRd/6tjfHiCToiYVDyY+0h6jysq0FtkmvmKiBY1+lsaVP0XvsH8VjFsE8+tU CI5WuUAwcPPuxJJtaFwPj9S0fNCJWkgh8sZCOiZvHU1nppZLfDoOvV7XJSR3J1XrV7uz n0VaJVmWiOxT5LzxcIe33SpjeVLbZFZNecaRhUnRdst2tnTy8xMHdMXAK978nDESLBwf fPjq86iP+odZnH0odfRpZhH7ls5FAwVC0f/uZJv1QhPvUqg1C6429gG5QIC+pW8GHJ2s TgqU7x1Y7SJnC6eb+cT4g8I1XiSGgi0QZge4fxU3C87IC2ppla2gTJWkRbLlExVtKFkR rObQ== X-Gm-Message-State: AOJu0YwWRJFCk0hpIqUxV5UN9RKLFT7gfS+TDO/tMiLWlS2lzt88zZKh h9PXjWN2fVcJ6RcObnLWovPoaxnURnk= X-Google-Smtp-Source: AGHT+IHMGJOexSdizv0VJ0FLqdqHO4SFKu67VCw8fmS+r0LLD52brwkOqoQ1JklcQ1ChDWEgUbDPYA== X-Received: by 2002:a05:6358:5295:b0:137:8c47:ba55 with SMTP id g21-20020a056358529500b001378c47ba55mr1010089rwa.1.1691619276367; Wed, 09 Aug 2023 15:14:36 -0700 (PDT) Received: from localhost.localdomain (96-67-140-173-static.hfc.comcastbusiness.net. [96.67.140.173]) by smtp.gmail.com with ESMTPSA id ce11-20020a05622a41cb00b0040fef71dc1esm46334qtb.10.2023.08.09.15.14.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 09 Aug 2023 15:14:35 -0700 (PDT) To: gcc-patches@gcc.gnu.org Cc: David Malcolm , Lewis Hyatt Subject: [PATCH v4 0/8] diagnostics: libcpp: Overhaul locations for _Pragma tokens Date: Wed, 9 Aug 2023 18:14:06 -0400 Message-Id: <20230809221414.2849878-1-lhyatt@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 X-Spam-Status: No, score=-3038.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Lewis Hyatt via Gcc-patches From: Lewis Hyatt Reply-To: Lewis Hyatt Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" On Mon, Jul 31, 2023 at 06:39:15PM -0400, Lewis Hyatt wrote: > On Fri, Jul 28, 2023 at 6:58 PM David Malcolm wrote: > > > > On Fri, 2023-07-21 at 19:08 -0400, Lewis Hyatt wrote: > > > Add a new linemap reason LC_GEN which enables encoding the location > > > of data > > > that was generated during compilation and does not appear in any > > > source file. > > > There could be many use cases, such as, for instance, referring to > > > the content > > > of builtin macros (not yet implemented, but an easy lift after this > > > one.) The > > > first intended application is to create a place to store the input to > > > a > > > _Pragma directive, so that proper locations can be assigned to those > > > tokens. This will be done in a subsequent commit. > > > > > > The actual change needed to the line-maps API in libcpp is not too > > > large and > > > requires no space overhead in the line map data structures (on 64-bit > > > systems > > > that is; one newly added data member to class line_map_ordinary sits > > > inside > > > former padding bytes.) An LC_GEN map is just an ordinary map like any > > > other, > > > but the TO_FILE member that normally points to the file name points > > > instead to > > > the actual data. This works automatically with PCH as well, for the > > > same > > > reason that the file name makes its way into a PCH. In order to > > > avoid > > > confusion, the member has been renamed from TO_FILE to DATA, and > > > associated > > > accessors adjusted. > > > > > > Outside libcpp, there are many small changes but most of them are to > > > selftests, which are necessarily more sensitive to implementation > > > details. From the perspective of the user (the "user", here, being a > > > frontend > > > using line maps or else the diagnostics infrastructure), the chief > > > visible > > > change is that the function location_get_source_line() should be > > > passed an > > > expanded_location object instead of a separate filename and line > > > number. This > > > is not a big change because in most cases, this information came > > > anyway from a > > > call to expand_location and the needed expanded_location object is > > > readily > > > available. The new overload of location_get_source_line() uses the > > > extra > > > information in the expanded_location object to obtain the data from > > > the > > > in-memory buffer when it originated from an LC_GEN map. > > > > > > Until the subsequent patch that starts using LC_GEN maps, none are > > > yet > > > generated within GCC, hence nothing is added to the testsuite here; > > > but all > > > relevant selftests have been extended to cover generated data maps in > > > addition > > > to normal files. > > > > [..snip...] > > > > Thanks for the updated patch. > > > > Reading this patch, it felt a bit unnatural to me to have an > > (exploded location, source line) > > pair where the exploded location seems to be representing "which source > > file or generated buffer", but the line/column info in that > > exploded_location is to be ignored in favor of the 2nd source line. > > > > I think we're missing a class: something that identifies either a > > specific source file, or a specific generated buffer. > > > > How about something like either: > > > > class source_id > > { > > public: > > source_id (const char *filename) > > : m_filename_or_buffer (filename), > > m_len (0) > > { > > } > > > > explicit source_id (const char *buffer, unsigned buffer_len) > > : m_filename_or_buffer (buffer), > > m_len (buffer_len) > > { > > linemap_assert (buffer_len > 0); > > } > > > > private: > > const char *m_filename_or_buffer; > > unsigned m_len; // where 0 means "it's a filename" > > }; > > > > or: > > > > class source_id > > { > > public: > > source_id (const char *filename) > > : m_ptr (filename), > > m_is_buffer (false) > > { > > } > > > > explicit source_id (const linemap_ordinary *buffer_linemap) > > : m_ptr (buffer_linemap), > > m_is_buffer (true) > > { > > } > > > > private: > > const void *m_ptr; > > bool m_is_buffer; > > }; > > > > and use one of these "source_id file" in place of "const char *file", > > rather than replacing such things with expanded_location? > > > > > diff --git a/gcc/c-family/c-indentation.cc b/gcc/c-family/c-indentation.cc > > > index e8d3dece770..4164fa0b1ba 100644 > > > --- a/gcc/c-family/c-indentation.cc > > > +++ b/gcc/c-family/c-indentation.cc > > > @@ -50,7 +50,7 @@ get_visual_column (expanded_location exploc, > > > unsigned int *first_nws, > > > unsigned int tab_width) > > > { > > > - char_span line = location_get_source_line (exploc.file, exploc.line); > > > + char_span line = location_get_source_line (exploc); > > > > ...so this might contine to be: > > > > char_span line = location_get_source_line (exploc.file, exploc.line); > > > > ...but expanded_location's "file" field would become a source_id, > > rather than a const char *. It looks like doing do might make a lot of > > "is this the same file or buffer?" turn into comparisons of source_id > > instances. > > > > So I think expanded_location would become: > > > > typedef struct > > { > > /* Either the name of the source file involved, or the > > specific generated buffer. */ > > source_id file; > > > > /* The line-location in the source file. */ > > int line; > > > > int column; > > > > void *data; > > > > /* In a system header?. */ > > bool sysp; > > } expanded_location; > > > > and we wouldn't need to add these extra fields: > > > > > + > > > + /* If generated data, the data and its length. The data may contain embedded > > > + nulls and need not be null-terminated. */ > > > + unsigned int generated_data_len; > > > + const char *generated_data; > > > } expanded_location; > > > > and we could pass around source_id instances when identifying specific > > filenames/generated buffers. > > > > Does this idea simplify/clarify the patch, or make it more complicated? > > > > [...snip...] > > > > Thoughts? > > Dave > > > > Thanks, this makes sense and I think on balance it makes the interface > nicer to do it this way. In the last patch of this series (for SARIF > output) I had found it necessary to use a > typdef std::pair filename_or_buffer; > which was the same thing in spirit as your source_id. It makes sense > to promote that to a real class and use it more widely. > > I will send out an updated series with that change later after testing. > > I don't think we can simply change "file" in expanded_location to be a > source_id, because this field is used in lots of places that don't > care about generated data buffers, and that interface change would > necessitate touching all of them. For example, gengtype.cc uses libcpp > and expanded_location, and there are a bunch of call sites in the > middle end and backend that do as well, plus e.g. the custom > diagnostics hooks in the C++ front end that print "inlined from > xyz.cc". I thought about making source_id implicitly convertible to > char*, but I think that is too error prone, plus it doesn't help with > the most common use of this field, which is to pass it to printf(). > The approach I am thinking to take is to leave "file" as it is in > expanded_location, but to also add a "source_id src" field too. This > way, the only call sites that need to be touched are those that care > about this distinction, and so the number of changes is much more > limited. But it does still achieve the goal that we don't need to use > an expanded_location to communicate with input.cc, we can use a > source_id instead, and that makes the interface more natural. > > With the new interface, the main change needed for diagnostics code > would be that instead of calling location_get_source_line(file_name, > line), you would need to call location_get_source_line(src_id, line). > In case both the source_id and the line number come from an > expanded_location, there could be a convenience overload like > location_get_source_line(exploc) also, but it wouldn't be necessary to > involved an expanded_location if the source_id and line are better > handled separately. > Hello- Here is the updated patch series with the interface change we discussed. I also identified an issue with the previous version. I had intended for struct line_map_ordinary to incur 0 space overhead with this change, but with the prior approach, the size was increasing by 8 bytes (from 24 to 32). In this round, I changed the approach to use a union so there is no size overhead. Rather, there is an extra level of indirection incurred only when LC_GEN maps are used, so that the impact on current usage is minimal to none. While working on this version I felt that the first patch is really too large. In this iteration, I split it into 6 patches to (hopefully) make it easier to review. If that is inconvenient for any reason, please let me know and I can send it the old way. Thanks again for taking a look at it. -Lewis