[net-next,2/3] net-timestamp: allow reading recv cmsg on errqueue with origin tstamp

Message ID 1417404155-28607-3-git-send-email-willemb@google.com
State Accepted, archived
Delegated to: David Miller

Commit Message

Willem de Bruijn Dec. 1, 2014, 3:22 a.m. UTC
From: Willem de Bruijn <willemb@google.com>

Allow reading of timestamps and cmsg at the same time on all relevant
socket families. One use is to correlate timestamps with egress
device, by asking for cmsg IP_PKTINFO.

On AF_INET sockets, call the relevant function (ip_cmsg_recv). To
avoid changing legacy expectations, only do so if the caller sets a
new timestamping flag SOF_TIMESTAMPING_OPT_CMSG.

On AF_INET6 sockets, IPV6_PKTINFO and all other recv cmsg are already
returned for all origins. The only change is to set ifindex, which is
not initialized for all error origins.

In both cases, only generate the pktinfo message if an ifindex is
known. This is not the case for ACK timestamps.

The difference between the protocol families is probably a historical
accident as a result of the different conditions for generating cmsg
in the relevant ip(v6)_recv_error function:

ipv4:        if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP) {
ipv6:        if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL) {

At one time, this was the same test, bar the ICMP/ICMP6
distinction. This is no longer true.

Signed-off-by: Willem de Bruijn <willemb@google.com>

----

Changes
  v1 -> v2
    large rewrite
    - integrate with existing pktinfo cmsg generation code
    - on ipv4: only send with new flag, to maintain legacy behavior
    - on ipv6: send at most a single pktinfo cmsg
    - on ipv6: initialize fields if not yet initialized

The recv cmsg interfaces are also relevant to the discussion of
whether looping packet headers is problematic. For v6, cmsgs that
identify many headers are already returned. This patch expands
that to v4. If it sounds reasonable, I will follow with patches

1. request timestamps without payload with SOF_TIMESTAMPING_OPT_TSONLY
   (http://patchwork.ozlabs.org/patch/366967/)
2. sysctl to conditionally drop all timestamps that have payload or
   cmsg from users without CAP_NET_RAW.
---
 Documentation/networking/timestamping.txt | 12 +++++++++++-
 include/uapi/linux/net_tstamp.h           |  3 ++-
 net/ipv4/ip_sockglue.c                    | 22 ++++++++++++++++++++--
 net/ipv6/datagram.c                       | 21 +++++++++++++++++++--
 4 files changed, 52 insertions(+), 6 deletions(-)
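
To illustrate the commit message from the userspace side, here is a minimal sketch (not part of the patch) of how a process might opt in to cmsg on IPv4 transmit timestamps. The helper names tx_tstamp_cmsg_flags and enable_tx_tstamp_pktinfo are hypothetical, and the choice of software transmit timestamps is an assumption; the patch only adds the SOF_TIMESTAMPING_OPT_CMSG bit.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <linux/net_tstamp.h>

/*
 * Hypothetical helper: flags for software transmit timestamps that also
 * ask for recv cmsg on the error queue (the new OPT_CMSG flag).
 */
static unsigned int tx_tstamp_cmsg_flags(void)
{
	return SOF_TIMESTAMPING_TX_SOFTWARE |
	       SOF_TIMESTAMPING_SOFTWARE |
	       SOF_TIMESTAMPING_OPT_CMSG;
}

/*
 * Hypothetical helper: enable IP_PKTINFO and SO_TIMESTAMPING on fd.
 * Both options are needed: OPT_CMSG lets the IPv4 error-queue path call
 * ip_cmsg_recv, and IP_PKTINFO selects the pktinfo cmsg itself.
 * Returns 0 on success, -1 on failure with errno set by setsockopt.
 */
static int enable_tx_tstamp_pktinfo(int fd)
{
	int on = 1;
	unsigned int flags = tx_tstamp_cmsg_flags();

	if (setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on)))
		return -1;
	return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
			  &flags, sizeof(flags));
}
```

A caller would run enable_tx_tstamp_pktinfo() once on an AF_INET datagram socket before sending, then read timestamps with recvmsg() and MSG_ERRQUEUE.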

Comments

Andy Lutomirski Dec. 1, 2014, 2:53 p.m. UTC | #1
On Sun, Nov 30, 2014 at 7:22 PM, Willem de Bruijn <willemb@google.com> wrote:
> From: Willem de Bruijn <willemb@google.com>
>
> Allow reading of timestamps and cmsg at the same time on all relevant
> socket families. One use is to correlate timestamps with egress
> device, by asking for cmsg IP_PKTINFO.
>
> on AF_INET sockets, call the relevant function (ip_cmsg_recv). To
> avoid changing legacy expectations, only do so if the caller sets a
> new timestamping flag SOF_TIMESTAMPING_OPT_CMSG.
>
> on AF_INET6 sockets, IPV6_PKTINFO and all other recv cmsg are already
> returned for all origins. only change is to set ifindex, which is
> not initialized for all error origins.
>
> In both cases, only generate the pktinfo message if an ifindex is
> known. This is not the case for ACK timestamps.
>
> The difference between the protocol families is probably a historical
> accident as a result of the different conditions for generating cmsg
> in the relevant ip(v6)_recv_error function:
>
> ipv4:        if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP) {
> ipv6:        if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL) {
>
> At one time, this was the same test bar for the ICMP/ICMP6
> distinction. This is no longer true.
>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
>
> ----
>
> Changes
>   v1 -> v2
>     large rewrite
>     - integrate with existing pktinfo cmsg generation code
>     - on ipv4: only send with new flag, to maintain legacy behavior
>     - on ipv6: send at most a single pktinfo cmsg
>     - on ipv6: initialize fields if not yet initialized
>
> The recv cmsg interfaces are also relevant to the discussion of
> whether looping packet headers is problematic. For v6, cmsgs that
> identify many headers are already returned. This patch expands
> that to v4. If it sounds reasonable, I will follow with patches
>
> 1. request timestamps without payload with SOF_TIMESTAMPING_OPT_TSONLY
>    (http://patchwork.ozlabs.org/patch/366967/)
> 2. sysctl to conditionally drop all timestamps that have payload or
>    cmsg from users without CAP_NET_RAW.
> ---
>  Documentation/networking/timestamping.txt | 12 +++++++++++-
>  include/uapi/linux/net_tstamp.h           |  3 ++-
>  net/ipv4/ip_sockglue.c                    | 22 ++++++++++++++++++++--
>  net/ipv6/datagram.c                       | 21 +++++++++++++++++++--
>  4 files changed, 52 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
> index 1d6d02d..b08e272 100644
> --- a/Documentation/networking/timestamping.txt
> +++ b/Documentation/networking/timestamping.txt
> @@ -122,7 +122,7 @@ SOF_TIMESTAMPING_RAW_HARDWARE:
>
>  1.3.3 Timestamp Options
>
> -The interface supports one option
> +The interface supports the options
>
>  SOF_TIMESTAMPING_OPT_ID:
>
> @@ -145,6 +145,16 @@ SOF_TIMESTAMPING_OPT_ID:
>    stream sockets, it increments with every byte.
>
>
> +SOF_TIMESTAMPING_OPT_CMSG:
> +
> +  Support recv() cmsg for all timestamped packets. Control messages
> +  are already supported unconditionally on all packets with receive
> +  timestamps and on IPv6 packets with transmit timestamp. This option
> +  extends them to IPv4 packets with transmit timestamp. One use case
> +  is to correlate packets with their egress device, by enabling socket
> +  option IP_PKTINFO simultaneously.
> +

I haven't tested yet, but where in the code is the check for
IP_PKTINFO being requested?  I may have missed it.

> +
>  1.4 Bytestream Timestamps
>
>  The SO_TIMESTAMPING interface supports timestamping of bytes in a
> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
> index ff35402..edbc888 100644
> --- a/include/uapi/linux/net_tstamp.h
> +++ b/include/uapi/linux/net_tstamp.h
> @@ -23,8 +23,9 @@ enum {
>         SOF_TIMESTAMPING_OPT_ID = (1<<7),
>         SOF_TIMESTAMPING_TX_SCHED = (1<<8),
>         SOF_TIMESTAMPING_TX_ACK = (1<<9),
> +       SOF_TIMESTAMPING_OPT_CMSG = (1<<10),
>
> -       SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_TX_ACK,
> +       SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_CMSG,
>         SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
>                                  SOF_TIMESTAMPING_LAST
>  };
> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> index 59eba6c..640f26c 100644
> --- a/net/ipv4/ip_sockglue.c
> +++ b/net/ipv4/ip_sockglue.c
> @@ -399,6 +399,22 @@ void ip_local_error(struct sock *sk, int err, __be32 daddr, __be16 port, u32 inf
>                 kfree_skb(skb);
>  }
>
> +static bool ipv4_pktinfo_prepare_errqueue(const struct sock *sk,
> +                                         const struct sk_buff *skb,
> +                                         int ee_origin)
> +{
> +       struct in_pktinfo *info = PKTINFO_SKB_CB(skb);
> +
> +       if ((ee_origin != SO_EE_ORIGIN_TIMESTAMPING) ||
> +           (!(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_CMSG)) ||
> +           (!skb->dev))
> +               return false;
> +
> +       info->ipi_spec_dst.s_addr = ip_hdr(skb)->saddr;

Is this the source addr chosen by the initial routing decision when
the packet was sent, or is it the final source addr on the way out?
If the latter, is this an information leak when network namespaces are
in use?

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Willem de Bruijn Dec. 1, 2014, 3:41 p.m. UTC | #2
>> The recv cmsg interfaces are also relevant to the discussion of
>> whether looping packet headers is problematic. For v6, cmsgs that
>> identify many headers are already returned. This patch expands
>> that to v4. If it sounds reasonable, I will follow with patches
>>
>> 1. request timestamps without payload with SOF_TIMESTAMPING_OPT_TSONLY
>>    (http://patchwork.ozlabs.org/patch/366967/)
>> 2. sysctl to conditionally drop all timestamps that have payload or
>>    cmsg from users without CAP_NET_RAW.
>> ---

>> +SOF_TIMESTAMPING_OPT_CMSG:
>> +
>> +  Support recv() cmsg for all timestamped packets. Control messages
>> +  are already supported unconditionally on all packets with receive
>> +  timestamps and on IPv6 packets with transmit timestamp. This option
>> +  extends them to IPv4 packets with transmit timestamp. One use case
>> +  is to correlate packets with their egress device, by enabling socket
>> +  option IP_PKTINFO simultaneously.
>> +
>
> I haven't tested yet, but where in the code is the check for
> IP_PKTINFO being requested?  I may have missed it.

See comment below

>> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
>> index 59eba6c..640f26c 100644
>> --- a/net/ipv4/ip_sockglue.c
>> +++ b/net/ipv4/ip_sockglue.c
>> @@ -399,6 +399,22 @@ void ip_local_error(struct sock *sk, int err, __be32 daddr, __be16 port, u32 inf
>>                 kfree_skb(skb);
>>  }
>>
>> +static bool ipv4_pktinfo_prepare_errqueue(const struct sock *sk,
>> +                                         const struct sk_buff *skb,
>> +                                         int ee_origin)
>> +{
>> +       struct in_pktinfo *info = PKTINFO_SKB_CB(skb);
>> +
>> +       if ((ee_origin != SO_EE_ORIGIN_TIMESTAMPING) ||
>> +           (!(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_CMSG)) ||
>> +           (!skb->dev))
>> +               return false;

This function is called to decide whether to call ip_cmsg_recv when
the origin is not SO_EE_ORIGIN_ICMP. For other origins, the pktinfo
field is not initialized, so we initialize it here.

The socket owner must still set socket option IP_PKTINFO to exercise
the relevant code in ip_cmsg_recv:

        unsigned int flags = inet->cmsg_flags;

        if (flags & 1)
                ip_cmsg_recv_pktinfo(msg, skb);

and

net/ipv4/ip_sockglue.c:48:#define IP_CMSG_PKTINFO 1


>> +
>> +       info->ipi_spec_dst.s_addr = ip_hdr(skb)->saddr;
>
> Is this the source addr chosen by the initial routing decision when
> the packet was sent, or is it the final source addr on the way out?
> If the latter, is this an information leak when network namespaces are
> in use?

Not as long as the entire payload is looped back with the metadata,
as it is currently. Note my suggested fix at the top: to give processes
an option to request timestamps without either, and to give the
administrator the option to drop all others.
Andy Lutomirski Dec. 1, 2014, 3:48 p.m. UTC | #3
On Mon, Dec 1, 2014 at 7:41 AM, Willem de Bruijn <willemb@google.com> wrote:
>>> The recv cmsg interfaces are also relevant to the discussion of
>>> whether looping packet headers is problematic. For v6, cmsgs that
>>> identify many headers are already returned. This patch expands
>>> that to v4. If it sounds reasonable, I will follow with patches
>>>
>>> 1. request timestamps without payload with SOF_TIMESTAMPING_OPT_TSONLY
>>>    (http://patchwork.ozlabs.org/patch/366967/)
>>> 2. sysctl to conditionally drop all timestamps that have payload or
>>>    cmsg from users without CAP_NET_RAW.
>>> ---
>
>>> +SOF_TIMESTAMPING_OPT_CMSG:
>>> +
>>> +  Support recv() cmsg for all timestamped packets. Control messages
>>> +  are already supported unconditionally on all packets with receive
>>> +  timestamps and on IPv6 packets with transmit timestamp. This option
>>> +  extends them to IPv4 packets with transmit timestamp. One use case
>>> +  is to correlate packets with their egress device, by enabling socket
>>> +  option IP_PKTINFO simultaneously.
>>> +
>>
>> I haven't tested yet, but where in the code is the check for
>> IP_PKTINFO being requested?  I may have missed it.
>
> See comment below
>
>>> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
>>> index 59eba6c..640f26c 100644
>>> --- a/net/ipv4/ip_sockglue.c
>>> +++ b/net/ipv4/ip_sockglue.c
>>> @@ -399,6 +399,22 @@ void ip_local_error(struct sock *sk, int err, __be32 daddr, __be16 port, u32 inf
>>>                 kfree_skb(skb);
>>>  }
>>>
>>> +static bool ipv4_pktinfo_prepare_errqueue(const struct sock *sk,
>>> +                                         const struct sk_buff *skb,
>>> +                                         int ee_origin)
>>> +{
>>> +       struct in_pktinfo *info = PKTINFO_SKB_CB(skb);
>>> +
>>> +       if ((ee_origin != SO_EE_ORIGIN_TIMESTAMPING) ||
>>> +           (!(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_CMSG)) ||
>>> +           (!skb->dev))
>>> +               return false;
>
> This function is called to decide whether to call ip_cmsg_recv when
> the origin is not SO_EE_ORIGIN_ICMP. For other origins, the pktinfo
> field is not initialized, so we initialize it here.
>
> The socket owner must still set socket option IP_PKTINFO to exercise
> the relevant code in ip_cmsg_recv:
>
>         unsigned int flags = inet->cmsg_flags;
>
>         if (flags & 1)
>                 ip_cmsg_recv_pktinfo(msg, skb);
>
> and
>
> net/ipv4/ip_sockglue.c:48:#define IP_CMSG_PKTINFO 1
>

Aha, got it.  I'm really not very familiar with the network plumbing
-- I just use it from the userspace side and occasionally try to write
patches when something doesn't work :)

>
>>> +
>>> +       info->ipi_spec_dst.s_addr = ip_hdr(skb)->saddr;
>>
>> Is this the source addr chosen by the initial routing decision when
>> the packet was sent, or is it the final source addr on the way out?
>> If the latter, is this an information leak when network namespaces are
>> in use?
>
> Not as long as the entire payload is looped back with the metadata,
> as it is currently. Note my suggested fix at the top: to give processes
> an option to request timestamps without either, and to give the
> administrator the option to drop all others.

So what happens to ipi_spec_dst if that admin option is set?

--Andy
Willem de Bruijn Dec. 1, 2014, 3:59 p.m. UTC | #4
>>>> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
>>>> index 59eba6c..640f26c 100644
>>>> --- a/net/ipv4/ip_sockglue.c
>>>> +++ b/net/ipv4/ip_sockglue.c
>>>> @@ -399,6 +399,22 @@ void ip_local_error(struct sock *sk, int err, __be32 daddr, __be16 port, u32 inf
>>>>                 kfree_skb(skb);
>>>>  }
>>>>
>>>> +static bool ipv4_pktinfo_prepare_errqueue(const struct sock *sk,
>>>> +                                         const struct sk_buff *skb,
>>>> +                                         int ee_origin)
>>>> +{
>>>> +       struct in_pktinfo *info = PKTINFO_SKB_CB(skb);
>>>> +
>>>> +       if ((ee_origin != SO_EE_ORIGIN_TIMESTAMPING) ||
>>>> +           (!(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_CMSG)) ||
>>>> +           (!skb->dev))
>>>> +               return false;
>>
>> This function is called to decide whether to call ip_cmsg_recv when
>> the origin is not SO_EE_ORIGIN_ICMP. For other origins, the pktinfo
>> field is not initialized, so we initialize it here.
>>
>> The socket owner must still set socket option IP_PKTINFO to exercise
>> the relevant code in ip_cmsg_recv:
>>
>>         unsigned int flags = inet->cmsg_flags;
>>
>>         if (flags & 1)
>>                 ip_cmsg_recv_pktinfo(msg, skb);
>>
>> and
>>
>> net/ipv4/ip_sockglue.c:48:#define IP_CMSG_PKTINFO 1
>>
>
> Aha, got it.  I'm really not very familiar with the network plumbing
> -- I just use it from the userspace side and occasionally try to write
> patches when something doesn't work :)
>
>>
>>>> +
>>>> +       info->ipi_spec_dst.s_addr = ip_hdr(skb)->saddr;
>>>
>>> Is this the source addr chosen by the initial routing decision when
>>> the packet was sent, or is it the final source addr on the way out?
>>> If the latter, is this an information leak when network namespaces are
>>> in use?
>>
>> Not as long as the entire payload is looped back with the metadata,
>> as it is currently. Note my suggested fix at the top: to give processes
>> an option to request timestamps without either, and to give the
>> administrator the option to drop all others.
>
> So what happens to ipi_spec_dst if that admin option is set?

I would drop the packet completely. Drop with payload has to be
implemented as a separate check before skb_copy_datagram_msg
anyway.

It is possible to return the timestamp with the fields zeroed, but I
find that harder to reason about, so it may cause subtle
process bugs.

A related question is what this field holds if the process
requests cmsg, but no payload. Again, this is probably
best treated as an illegal combination of options and
should fail hard.

>
> --Andy
Andy Lutomirski Dec. 1, 2014, 4:16 p.m. UTC | #5
On Mon, Dec 1, 2014 at 7:59 AM, Willem de Bruijn <willemb@google.com> wrote:
>>>>> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
>>>>> index 59eba6c..640f26c 100644
>>>>> --- a/net/ipv4/ip_sockglue.c
>>>>> +++ b/net/ipv4/ip_sockglue.c
>>>>> @@ -399,6 +399,22 @@ void ip_local_error(struct sock *sk, int err, __be32 daddr, __be16 port, u32 inf
>>>>>                 kfree_skb(skb);
>>>>>  }
>>>>>
>>>>> +static bool ipv4_pktinfo_prepare_errqueue(const struct sock *sk,
>>>>> +                                         const struct sk_buff *skb,
>>>>> +                                         int ee_origin)
>>>>> +{
>>>>> +       struct in_pktinfo *info = PKTINFO_SKB_CB(skb);
>>>>> +
>>>>> +       if ((ee_origin != SO_EE_ORIGIN_TIMESTAMPING) ||
>>>>> +           (!(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_CMSG)) ||
>>>>> +           (!skb->dev))
>>>>> +               return false;
>>>
>>> This function is called to decide whether to call ip_cmsg_recv when
>>> the origin is not SO_EE_ORIGIN_ICMP. For other origins, the pktinfo
>>> field is not initialized, so we initialize it here.
>>>
>>> The socket owner must still set socket option IP_PKTINFO to exercise
>>> the relevant code in ip_cmsg_recv:
>>>
>>>         unsigned int flags = inet->cmsg_flags;
>>>
>>>         if (flags & 1)
>>>                 ip_cmsg_recv_pktinfo(msg, skb);
>>>
>>> and
>>>
>>> net/ipv4/ip_sockglue.c:48:#define IP_CMSG_PKTINFO 1
>>>
>>
>> Aha, got it.  I'm really not very familiar with the network plumbing
>> -- I just use it from the userspace side and occasionally try to write
>> patches when something doesn't work :)
>>
>>>
>>>>> +
>>>>> +       info->ipi_spec_dst.s_addr = ip_hdr(skb)->saddr;
>>>>
>>>> Is this the source addr chosen by the initial routing decision when
>>>> the packet was sent, or is it the final source addr on the way out?
>>>> If the latter, is this an information leak when network namespaces are
>>>> in use?
>>>
>>> Not as long as the entire payload is looped back with the metadata,
>>> as it is currently. Note my suggested fix at the top: to give processes
>>> an option to request timestamps without either, and to give the
>>> administrator the option to drop all others.
>>
>> So what happens to ipi_spec_dst if that admin option is set?
>
> I would drop the packet completely. Drop with payload has to be
> implemented as a separate check before skb_copy_datagram_msg
> anyway.
>
> It is possible to return the timestamp, but zero the fields, but I
> find that harder to reason about, so it may cause subtle
> process bugs.
>
> A related question is what this field holds if the process
> requests cmsg, but no payload. Again, this is probably
> best treated as an illegal combination of options and
> should fail hard.
>

Here's a thought: what if you just drop any timestamp loopback message
if the interface doesn't belong to the sending socket's network
namespace?

Does that solve all of the problems (except perhaps those associated
with LSM use or maybe ipsec)?

--Andy
Willem de Bruijn Dec. 1, 2014, 4:59 p.m. UTC | #6
> Here's a thought: what if you just drop any timestamp loopback message
> if the interface doesn't belong to the sending socket's network
> namespace?
>
> Does that solve all of the problems (except perhaps those associated
> with LSM use or maybe ipsec)?

I don't have an exhaustive list of potential vulnerabilities, so it's hard to
say. The iptables example was another case where policy is leaked
that might reasonably be intended to be hidden.

We have to do it behind a sysctl, to avoid breaking legacy applications.
If so, then I would just opt for the strongest interpretation and apply it to
users, regardless of namespaces. The one exception is to always allow
for callers with CAP_NET_RAW, since those can always open a packet
socket for sniffing, anyway.
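
The policy described above can be sketched as a small decision function. This is purely illustrative: no such sysctl exists in this series, and the function and parameter names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical policy check, not code from this patch series.
 *
 * sysctl_restrict:         the proposed administrator knob is enabled
 * has_cap_net_raw:         the caller holds CAP_NET_RAW
 * carries_payload_or_cmsg: the queued timestamp loops payload or cmsg
 *
 * Returns true if the error-queue message may be delivered.
 */
static bool tstamp_errqueue_allowed(bool sysctl_restrict,
				    bool has_cap_net_raw,
				    bool carries_payload_or_cmsg)
{
	if (!sysctl_restrict)	/* legacy behavior: deliver everything */
		return true;
	if (has_cap_net_raw)	/* could sniff via a packet socket anyway */
		return true;
	/* Restricted callers only get timestamp-only messages. */
	return !carries_payload_or_cmsg;
}
```

Under this reading, a message that fails the check is dropped outright rather than delivered with zeroed fields.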
Patch

diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
index 1d6d02d..b08e272 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -122,7 +122,7 @@  SOF_TIMESTAMPING_RAW_HARDWARE:
 
 1.3.3 Timestamp Options
 
-The interface supports one option
+The interface supports the options
 
 SOF_TIMESTAMPING_OPT_ID:
 
@@ -145,6 +145,16 @@  SOF_TIMESTAMPING_OPT_ID:
   stream sockets, it increments with every byte.
 
 
+SOF_TIMESTAMPING_OPT_CMSG:
+
+  Support recv() cmsg for all timestamped packets. Control messages
+  are already supported unconditionally on all packets with receive
+  timestamps and on IPv6 packets with transmit timestamp. This option
+  extends them to IPv4 packets with transmit timestamp. One use case
+  is to correlate packets with their egress device, by enabling socket
+  option IP_PKTINFO simultaneously.
+
+
 1.4 Bytestream Timestamps
 
 The SO_TIMESTAMPING interface supports timestamping of bytes in a
diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index ff35402..edbc888 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -23,8 +23,9 @@  enum {
 	SOF_TIMESTAMPING_OPT_ID = (1<<7),
 	SOF_TIMESTAMPING_TX_SCHED = (1<<8),
 	SOF_TIMESTAMPING_TX_ACK = (1<<9),
+	SOF_TIMESTAMPING_OPT_CMSG = (1<<10),
 
-	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_TX_ACK,
+	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_CMSG,
 	SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
 				 SOF_TIMESTAMPING_LAST
 };
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 59eba6c..640f26c 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -399,6 +399,22 @@  void ip_local_error(struct sock *sk, int err, __be32 daddr, __be16 port, u32 inf
 		kfree_skb(skb);
 }
 
+static bool ipv4_pktinfo_prepare_errqueue(const struct sock *sk,
+					  const struct sk_buff *skb,
+					  int ee_origin)
+{
+	struct in_pktinfo *info = PKTINFO_SKB_CB(skb);
+
+	if ((ee_origin != SO_EE_ORIGIN_TIMESTAMPING) ||
+	    (!(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_CMSG)) ||
+	    (!skb->dev))
+		return false;
+
+	info->ipi_spec_dst.s_addr = ip_hdr(skb)->saddr;
+	info->ipi_ifindex = skb->dev->ifindex;
+	return true;
+}
+
 /*
  *	Handle MSG_ERRQUEUE
  */
@@ -446,7 +462,9 @@  int ip_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
 	memcpy(&errhdr.ee, &serr->ee, sizeof(struct sock_extended_err));
 	sin = &errhdr.offender;
 	sin->sin_family = AF_UNSPEC;
-	if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP) {
+
+	if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP ||
+	    ipv4_pktinfo_prepare_errqueue(sk, skb, serr->ee.ee_origin)) {
 		struct inet_sock *inet = inet_sk(sk);
 
 		sin->sin_family = AF_INET;
@@ -1051,7 +1069,7 @@  e_inval:
 }
 
 /**
- * ipv4_pktinfo_prepare - transfert some info from rtable to skb
+ * ipv4_pktinfo_prepare - transfer some info from rtable to skb
  * @sk: socket
  * @skb: buffer
  *
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index cc11396..2464a00 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -325,6 +325,16 @@  void ipv6_local_rxpmtu(struct sock *sk, struct flowi6 *fl6, u32 mtu)
 	kfree_skb(skb);
 }
 
+static void ip6_datagram_prepare_pktinfo_errqueue(struct sk_buff *skb)
+{
+	int ifindex = skb->dev ? skb->dev->ifindex : -1;
+
+	if (skb->protocol == htons(ETH_P_IPV6))
+		IP6CB(skb)->iif = ifindex;
+	else
+		PKTINFO_SKB_CB(skb)->ipi_ifindex = ifindex;
+}
+
 /*
  *	Handle MSG_ERRQUEUE
  */
@@ -388,8 +398,12 @@  int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
 		sin->sin6_family = AF_INET6;
 		sin->sin6_flowinfo = 0;
 		sin->sin6_port = 0;
-		if (np->rxopt.all)
+		if (np->rxopt.all) {
+			if (serr->ee.ee_origin != SO_EE_ORIGIN_ICMP &&
+			    serr->ee.ee_origin != SO_EE_ORIGIN_ICMP6)
+				ip6_datagram_prepare_pktinfo_errqueue(skb);
 			ip6_datagram_recv_common_ctl(sk, msg, skb);
+		}
 		if (skb->protocol == htons(ETH_P_IPV6)) {
 			sin->sin6_addr = ipv6_hdr(skb)->saddr;
 			if (np->rxopt.all)
@@ -491,7 +505,10 @@  void ip6_datagram_recv_common_ctl(struct sock *sk, struct msghdr *msg,
 			ipv6_addr_set_v4mapped(ip_hdr(skb)->daddr,
 					       &src_info.ipi6_addr);
 		}
-		put_cmsg(msg, SOL_IPV6, IPV6_PKTINFO, sizeof(src_info), &src_info);
+
+		if (src_info.ipi6_ifindex >= 0)
+			put_cmsg(msg, SOL_IPV6, IPV6_PKTINFO,
+				 sizeof(src_info), &src_info);
 	}
 }
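
As a worked example of the net_tstamp.h hunk above: moving SOF_TIMESTAMPING_LAST to bit 10 widens SOF_TIMESTAMPING_MASK to cover bits 0 through 10. The sketch below reproduces that arithmetic with stand-in names (the EX_ prefix and ex_flags_invalid are hypothetical); the validity check is an assumption about how a caller of the mask would typically use it.

```c
#include <assert.h>

enum {
	EX_OPT_CMSG = 1 << 10,		/* value given to the new flag */
	EX_LAST = EX_OPT_CMSG,
	/* (LAST - 1) sets bits 0..9; OR-ing LAST adds bit 10: 0x7ff. */
	EX_MASK = (EX_LAST - 1) | EX_LAST,
};

/* Returns nonzero if flags has any bit outside the valid mask. */
static int ex_flags_invalid(unsigned int flags)
{
	return (flags & ~(unsigned int)EX_MASK) != 0;
}
```

With the old definition the mask would stop at bit 9, so a request carrying the new flag would fall outside it.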