From patchwork Sat Nov 16 01:23:25 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Malcolm
X-Patchwork-Id: 1196030
From: David Malcolm
To: gcc-patches@gcc.gnu.org
Cc: David Malcolm
Subject: [PATCH 38/49] analyzer: new file: sm-taint.cc
Date: Fri, 15 Nov 2019 20:23:25 -0500
Message-Id: <1573867416-55618-39-git-send-email-dmalcolm@redhat.com>
In-Reply-To: <1573867416-55618-1-git-send-email-dmalcolm@redhat.com>
References: <1573867416-55618-1-git-send-email-dmalcolm@redhat.com>

This patch adds a state machine checker for tracking "taint",
where data potentially under an attacker's control is used for
things like array indices without sanitization (CWE-129).

This checker isn't ready for production, and is presented as a
proof-of-concept of the sm-based approach.

gcc/ChangeLog:
	* analyzer/sm-taint.cc: New file.
---
 gcc/analyzer/sm-taint.cc | 338 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 338 insertions(+)
 create mode 100644 gcc/analyzer/sm-taint.cc

diff --git a/gcc/analyzer/sm-taint.cc b/gcc/analyzer/sm-taint.cc
new file mode 100644
index 0000000..c664a54
--- /dev/null
+++ b/gcc/analyzer/sm-taint.cc
@@ -0,0 +1,338 @@
+/* An experimental state machine, for tracking "taint": unsanitized uses
+   of data potentially under an attacker's control.
+
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   Contributed by David Malcolm .
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "gcc-plugin.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tree.h"
+#include "gimple.h"
+#include "diagnostic-path.h"
+#include "diagnostic-metadata.h"
+#include "analyzer/analyzer.h"
+#include "analyzer/pending-diagnostic.h"
+#include "analyzer/sm.h"
+
+namespace {
+
+/* An experimental state machine, for tracking "taint": unsanitized uses
+   of data potentially under an attacker's control.  */
+
+class taint_state_machine : public state_machine
+{
+public:
+  taint_state_machine (logger *logger);
+
+  bool inherited_state_p () const FINAL OVERRIDE { return true; }
+
+  bool on_stmt (sm_context *sm_ctxt,
+                const supernode *node,
+                const gimple *stmt) const FINAL OVERRIDE;
+
+  void on_condition (sm_context *sm_ctxt,
+                     const supernode *node,
+                     const gimple *stmt,
+                     tree lhs,
+                     enum tree_code op,
+                     tree rhs) const FINAL OVERRIDE;
+
+  void on_leak (sm_context *sm_ctxt,
+                const supernode *node,
+                const gimple *stmt,
+                tree var,
+                state_machine::state_t state) const FINAL OVERRIDE;
+  bool can_purge_p (state_t s) const FINAL OVERRIDE;
+
+  /* Start state.  */
+  state_t m_start;
+
+  /* State for a "tainted" value: unsanitized data potentially under an
+     attacker's control.  */
+  state_t m_tainted;
+
+  /* State for a "tainted" value that has a lower bound.  */
+  state_t m_has_lb;
+
+  /* State for a "tainted" value that has an upper bound.  */
+  state_t m_has_ub;
+
+  /* Stop state, for a value we don't want to track any more.  */
+  state_t m_stop;
+};
+
+////////////////////////////////////////////////////////////////////////////
+
+enum bounds
+{
+  BOUNDS_NONE,
+  BOUNDS_UPPER,
+  BOUNDS_LOWER
+};
+
+class tainted_array_index
+  : public pending_diagnostic_subclass<tainted_array_index>
+{
+public:
+  tainted_array_index (const taint_state_machine &sm, tree arg,
+                       enum bounds has_bounds)
+  : m_sm (sm), m_arg (arg), m_has_bounds (has_bounds) {}
+
+  const char *get_kind () const FINAL OVERRIDE { return "tainted_array_index"; }
+
+  bool operator== (const tainted_array_index &other) const
+  {
+    return m_arg == other.m_arg;
+  }
+
+  bool emit (rich_location *rich_loc) FINAL OVERRIDE
+  {
+    diagnostic_metadata m;
+    m.add_cwe (129);
+    switch (m_has_bounds)
+      {
+      default:
+        gcc_unreachable ();
+      case BOUNDS_NONE:
+        return warning_at (rich_loc, m, OPT_Wanalyzer_tainted_array_index,
+                           "use of tainted value %qE in array lookup"
+                           " without bounds checking",
+                           m_arg);
+        break;
+      case BOUNDS_UPPER:
+        return warning_at (rich_loc, m, OPT_Wanalyzer_tainted_array_index,
+                           "use of tainted value %qE in array lookup"
+                           " without lower-bounds checking",
+                           m_arg);
+        break;
+      case BOUNDS_LOWER:
+        return warning_at (rich_loc, m, OPT_Wanalyzer_tainted_array_index,
+                           "use of tainted value %qE in array lookup"
+                           " without upper-bounds checking",
+                           m_arg);
+        break;
+      }
+  }
+
+  label_text describe_state_change (const evdesc::state_change &change)
+    FINAL OVERRIDE
+  {
+    if (change.m_new_state == m_sm.m_tainted)
+      {
+        if (change.m_origin)
+          return change.formatted_print ("%qE has an unchecked value here"
+                                         " (from %qE)",
+                                         change.m_expr, change.m_origin);
+        else
+          return change.formatted_print ("%qE gets an unchecked value here",
+                                         change.m_expr);
+      }
+    else if (change.m_new_state == m_sm.m_has_lb)
+      return change.formatted_print ("%qE has its lower bound checked here",
+                                     change.m_expr);
+    else if (change.m_new_state == m_sm.m_has_ub)
+      return change.formatted_print ("%qE has its upper bound checked here",
+                                     change.m_expr);
+    return label_text ();
+  }
+
+  label_text describe_final_event (const evdesc::final_event &ev) FINAL OVERRIDE
+  {
+    switch (m_has_bounds)
+      {
+      default:
+        gcc_unreachable ();
+      case BOUNDS_NONE:
+        return ev.formatted_print ("use of tainted value %qE in array lookup"
+                                   " without bounds checking",
+                                   m_arg);
+      case BOUNDS_UPPER:
+        return ev.formatted_print ("use of tainted value %qE in array lookup"
+                                   " without lower-bounds checking",
+                                   m_arg);
+      case BOUNDS_LOWER:
+        return ev.formatted_print ("use of tainted value %qE in array lookup"
+                                   " without upper-bounds checking",
+                                   m_arg);
+      }
+  }
+
+private:
+  const taint_state_machine &m_sm;
+  tree m_arg;
+  enum bounds m_has_bounds;
+};
+
+////////////////////////////////////////////////////////////////////////////
+
+/* taint_state_machine's ctor.  */
+
+taint_state_machine::taint_state_machine (logger *logger)
+: state_machine ("taint", logger)
+{
+  m_start = add_state ("start");
+  m_tainted = add_state ("tainted");
+  m_has_lb = add_state ("has_lb");
+  m_has_ub = add_state ("has_ub");
+  m_stop = add_state ("stop");
+}
+
+/* Implementation of state_machine::on_stmt vfunc for taint_state_machine.  */
+
+bool
+taint_state_machine::on_stmt (sm_context *sm_ctxt,
+                              const supernode *node,
+                              const gimple *stmt) const
+{
+  if (const gcall *call = dyn_cast <const gcall *> (stmt))
+    {
+      if (is_named_call_p (call, "fread", 4))
+        {
+          tree arg = gimple_call_arg (call, 0);
+          arg = sm_ctxt->get_readable_tree (arg);
+
+          sm_ctxt->on_transition (node, stmt, arg, m_start, m_tainted);
+
+          /* Dereference an ADDR_EXPR.  */
+          // TODO: should the engine do this?
+          if (TREE_CODE (arg) == ADDR_EXPR)
+            sm_ctxt->on_transition (node, stmt, TREE_OPERAND (arg, 0),
+                                    m_start, m_tainted);
+          return true;
+        }
+    }
+  // TODO: ...etc; many other sources of untrusted data
+
+  if (const gassign *assign = dyn_cast <const gassign *> (stmt))
+    {
+      tree rhs1 = gimple_assign_rhs1 (assign);
+      enum tree_code op = gimple_assign_rhs_code (assign);
+
+      /* Check array accesses.  */
+      if (op == ARRAY_REF)
+        {
+          tree arg = TREE_OPERAND (rhs1, 1);
+          arg = sm_ctxt->get_readable_tree (arg);
+
+          /* Unsigned types have an implicit lower bound.  */
+          bool is_unsigned = false;
+          if (INTEGRAL_TYPE_P (TREE_TYPE (arg)))
+            is_unsigned = TYPE_UNSIGNED (TREE_TYPE (arg));
+
+          /* Complain about missing bounds.  */
+          sm_ctxt->warn_for_state
+            (node, stmt, arg, m_tainted,
+             new tainted_array_index (*this, arg,
+                                      is_unsigned
+                                      ? BOUNDS_LOWER : BOUNDS_NONE));
+          sm_ctxt->on_transition (node, stmt, arg, m_tainted, m_stop);
+
+          /* Complain about missing upper bound.  */
+          sm_ctxt->warn_for_state (node, stmt, arg, m_has_lb,
+                                   new tainted_array_index (*this, arg,
+                                                            BOUNDS_LOWER));
+          sm_ctxt->on_transition (node, stmt, arg, m_has_lb, m_stop);
+
+          /* Complain about missing lower bound.  */
+          if (!is_unsigned)
+            {
+              sm_ctxt->warn_for_state (node, stmt, arg, m_has_ub,
+                                       new tainted_array_index (*this, arg,
+                                                                BOUNDS_UPPER));
+              sm_ctxt->on_transition (node, stmt, arg, m_has_ub, m_stop);
+            }
+        }
+    }
+
+  return false;
+}
+
+/* Implementation of state_machine::on_condition vfunc for taint_state_machine.
+   Potentially transition state 'tainted' to 'has_ub' or 'has_lb',
+   and states 'has_ub' and 'has_lb' to 'stop'.  */
+
+void
+taint_state_machine::on_condition (sm_context *sm_ctxt,
+                                   const supernode *node,
+                                   const gimple *stmt,
+                                   tree lhs,
+                                   enum tree_code op,
+                                   tree rhs ATTRIBUTE_UNUSED) const
+{
+  if (stmt == NULL)
+    return;
+
+  // TODO: this doesn't use the RHS; should we make it symmetric?
+
+  // TODO
+  switch (op)
+    {
+      //case NE_EXPR:
+      //case EQ_EXPR:
+    case GE_EXPR:
+    case GT_EXPR:
+      {
+        sm_ctxt->on_transition (node, stmt, lhs, m_tainted,
+                                m_has_lb);
+        sm_ctxt->on_transition (node, stmt, lhs, m_has_ub,
+                                m_stop);
+      }
+      break;
+    case LE_EXPR:
+    case LT_EXPR:
+      {
+        sm_ctxt->on_transition (node, stmt, lhs, m_tainted,
+                                m_has_ub);
+        sm_ctxt->on_transition (node, stmt, lhs, m_has_lb,
+                                m_stop);
+      }
+      break;
+    default:
+      break;
+    }
+}
+
+void
+taint_state_machine::on_leak (sm_context *sm_ctxt ATTRIBUTE_UNUSED,
+                              const supernode *node ATTRIBUTE_UNUSED,
+                              const gimple *stmt ATTRIBUTE_UNUSED,
+                              tree var ATTRIBUTE_UNUSED,
+                              state_machine::state_t state ATTRIBUTE_UNUSED)
+  const
+{
+  /* Empty.  */
+}
+
+bool
+taint_state_machine::can_purge_p (state_t s ATTRIBUTE_UNUSED) const
+{
+  return true;
+}
+
+} // anonymous namespace
+
+/* Internal interface to this file. */
+
+state_machine *
+make_taint_state_machine (logger *logger)
+{
+  return new taint_state_machine (logger);
+}
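
For reviewers who want to reason about the transitions without building
the analyzer, here is a rough standalone model of the state machine in
this patch. It is only a sketch: State, Cmp, Warning, after_fread,
after_condition and array_index_warning are hypothetical names invented
for this illustration and are not part of GCC. It assumes the tracked
value is always the LHS of the comparison, as on_condition does above.

```cpp
#include <cassert>

// Hypothetical stand-ins for the five states registered in the ctor.
enum class State { Start, Tainted, HasLb, HasUb, Stop };

// Comparison kinds; only >=, >, <= and < change state in the patch.
enum class Cmp { Ge, Gt, Le, Lt, Other };

// Which diagnostic (if any) an array lookup would trigger.
enum class Warning { None, NoBounds, NoLowerBound, NoUpperBound };

// Model of the fread handling in on_stmt: start -> tainted.
State after_fread (State s)
{
  return s == State::Start ? State::Tainted : s;
}

// Model of on_condition, with the tracked value on the LHS.
State after_condition (State s, Cmp op)
{
  switch (op)
    {
    case Cmp::Ge:
    case Cmp::Gt:
      // A lower bound was checked.
      if (s == State::Tainted) return State::HasLb;
      if (s == State::HasUb) return State::Stop;
      return s;
    case Cmp::Le:
    case Cmp::Lt:
      // An upper bound was checked.
      if (s == State::Tainted) return State::HasUb;
      if (s == State::HasLb) return State::Stop;
      return s;
    default:
      return s;
    }
}

// Model of the ARRAY_REF handling in on_stmt.  Unsigned index types
// are treated as having an implicit lower bound.
Warning array_index_warning (State s, bool is_unsigned)
{
  switch (s)
    {
    case State::Tainted:
      return is_unsigned ? Warning::NoUpperBound : Warning::NoBounds;
    case State::HasLb:
      return Warning::NoUpperBound;
    case State::HasUb:
      return is_unsigned ? Warning::None : Warning::NoLowerBound;
    default:
      return Warning::None;
    }
}
```

In this model a signed index read via fread and then checked with both
">= 0" and "< N" ends up in the stop state, so a later array lookup is
silent. Dropping either check leaves it in has_ub or has_lb, and the
lookup maps to the corresponding warning.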