
[PATCHv4,net-next-2.6,4/5] XFRM,IPv6: Add IRO remapping hook in xfrm_input()

Message ID db067c0a2ab679dfb16c84e8509e671fa6c5cb01.1286139129.git.arno@natisbad.org
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Arnaud Ebalard Oct. 4, 2010, 6:25 a.m. UTC
Add a hook in xfrm_input() to allow IRO remapping to occur when
an incoming packet matching an existing SA (based on SPI) with
an unexpected destination or source address is received.
Because IRO does not consume additional bits in a packet (that's
the point), there is no way to demultiplex based on something
like nh or spi. Instead, IRO input handlers (for source and
destination address remapping) are called upon address mismatch
during IPsec processing.
For that to work, we rely on the fact that SPI values generated
locally are no longer tied to the destination address (first patch
of the set), and we slightly postpone the expected-address check in
xfrm_input() (done inside xfrm_state_lookup() against the daddr param) by
introducing a call to the input_addr_check() handler from the
struct xfrm_state_afinfo associated with the address family.
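A minimal userspace sketch of this control flow may help. The types and functions below (`struct toy_state`, `struct toy_afinfo`, `toy_input()`) are hypothetical simplifications, not the kernel's `struct xfrm_state` or `struct xfrm_state_afinfo`: the SA is found by SPI alone, and only afterwards does the per-family `input_addr_check` callback decide whether the packet's destination is acceptable. The strict callback corresponds roughly to the IPv4 behaviour in the patch; an IRO-aware callback would attempt a remapping before reporting a mismatch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct toy_addr { uint8_t a[16]; };

struct toy_state;

struct toy_afinfo {
	/* Returns 0 if the packet's address is acceptable for the SA,
	 * non-zero otherwise (same convention as xfrm_addr_cmp()). */
	int (*input_addr_check)(const struct toy_addr *pkt_daddr,
				const struct toy_state *x);
};

struct toy_state {
	uint32_t spi;
	struct toy_addr daddr;
	const struct toy_afinfo *afinfo;
};

/* Strict check: plain destination address comparison, no remapping. */
static int strict_addr_check(const struct toy_addr *pkt_daddr,
			     const struct toy_state *x)
{
	return memcmp(pkt_daddr, &x->daddr, sizeof(*pkt_daddr)) != 0;
}

/* SPI-only lookup over a flat table (the kernel uses a hash table). */
static const struct toy_state *lookup_by_spi(const struct toy_state *tab,
					     size_t n, uint32_t spi)
{
	for (size_t i = 0; i < n; i++)
		if (tab[i].spi == spi)
			return &tab[i];
	return NULL;
}

/* Returns the accepted state, or NULL where xfrm_input() would drop. */
static const struct toy_state *toy_input(const struct toy_state *tab, size_t n,
					 uint32_t spi,
					 const struct toy_addr *pkt_daddr)
{
	const struct toy_state *x = lookup_by_spi(tab, n, spi);

	assert(pkt_daddr != NULL);
	/* Address check is postponed until after the SPI lookup. */
	if (x == NULL || x->afinfo->input_addr_check(pkt_daddr, x))
		return NULL;
	return x;
}
```

Keeping the check behind a per-family callback is what lets IPv6 swap in a remapping-aware version without touching the generic lookup.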

Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
---
 include/net/xfrm.h     |    5 +++
 net/ipv4/xfrm4_input.c |   11 +++++++
 net/ipv4/xfrm4_state.c |    1 +
 net/ipv6/xfrm6_input.c |   69 +++++++++++++++++++++++++++++++++++++++++++++++-
 net/ipv6/xfrm6_state.c |    1 +
 net/xfrm/xfrm_input.c  |    5 ++-
 net/xfrm/xfrm_state.c  |    2 +-
 7 files changed, 90 insertions(+), 4 deletions(-)

Comments

Herbert Xu Oct. 4, 2010, 8:40 a.m. UTC | #1
On Mon, Oct 04, 2010 at 08:25:23AM +0200, Arnaud Ebalard wrote:
> Add a hook in xfrm_input() to allow IRO remapping to occur when
> an incoming packet matching an existing SA (based on SPI) with
> an unexpected destination or source address is received.
> Because IRO does not consume additional bits in a packet (that's
> the point), there is no way to demultiplex based on something
> like nh or spi. Instead, IRO input handlers (for source and
> destination address remapping) are called upon address mismatch
> during IPsec processing.
> For that to work, we rely on the fact that SPI values generated
> locally are no longer tied to the destination address (first patch
> of the set), and we slightly postpone the expected-address check in
> xfrm_input() (done inside xfrm_state_lookup() against the daddr param) by
> introducing a call to the input_addr_check() handler from the
> struct xfrm_state_afinfo associated with the address family.
> 
> Signed-off-by: Arnaud Ebalard <arno@natisbad.org>

I would prefer for this check to go into x->type->input since
it does not apply to IPsec.

Just because the SPI is unique for inbound SAs, it doesn't mean
that we should ignore the destination IP address in the packet for
IPsec.

I think another way of getting what you want is to simply add
inbound SAs with a zero destination address in your case which
can then be made to match any destination IP address.  You can
then follow that up with additional checks in x->type->input.
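As a rough illustration of this suggestion, an all-zero SA destination can be treated as a wildcard during matching. This is a sketch with hypothetical names (`toy_addr_is_any()`, `sa_daddr_matches()`), not the kernel's state lookup, and it does not model the follow-up checks in x->type->input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit address type (IPv4 would only use 4 bytes). */
struct toy_addr { uint8_t a[16]; };

static bool toy_addr_is_any(const struct toy_addr *x)
{
	static const struct toy_addr zero; /* all-zero: '::' or 0.0.0.0 */

	return memcmp(x, &zero, sizeof(zero)) == 0;
}

/* An inbound SA whose destination is the zero address accepts any packet
 * destination; otherwise the destination must match exactly. */
static bool sa_daddr_matches(const struct toy_addr *sa_daddr,
			     const struct toy_addr *pkt_daddr)
{
	assert(sa_daddr != NULL && pkt_daddr != NULL);
	return toy_addr_is_any(sa_daddr) ||
	       memcmp(sa_daddr, pkt_daddr, sizeof(*sa_daddr)) == 0;
}
```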

Cheers,
Arnaud Ebalard Oct. 4, 2010, 8:51 p.m. UTC | #2
Hi,

Herbert Xu <herbert@gondor.apana.org.au> writes:

> On Mon, Oct 04, 2010 at 08:25:23AM +0200, Arnaud Ebalard wrote:
>> Add a hook in xfrm_input() to allow IRO remapping to occur when
>> an incoming packet matching an existing SA (based on SPI) with
>> an unexpected destination or source address is received.
>> Because IRO does not consume additional bits in a packet (that's
>> the point), there is no way to demultiplex based on something
>> like nh or spi. Instead, IRO input handlers (for source and
>> destination address remapping) are called upon address mismatch
>> during IPsec processing.
>> For that to work, we rely on the fact that SPI values generated
>> locally are no longer tied to the destination address (first patch
>> of the set), and we slightly postpone the expected-address check in
>> xfrm_input() (done inside xfrm_state_lookup() against the daddr param) by
>> introducing a call to the input_addr_check() handler from the
>> struct xfrm_state_afinfo associated with the address family.
>> 
>> Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
>
> I would prefer for this check to go into x->type->input since
> it does not apply to IPsec.

Either I don't understand the sentence or this is not feasible: there
is nothing in the packet to demultiplex on, the way nh is used for
RH2/HAO. Here, we only look up a remapping state when there is a
mismatch in the source/destination addresses expected for the SA.

That's the reason IRO remapping states only apply to IPsec traffic.

> Just because the SPI is unique for inbound SAs, it doesn't mean
> that we should ignore the destination IP address in the packet for
> IPsec.

I don't ignore it. Before the change, for input IPsec traffic, the SA
lookup is done as follows:

  - SA lookup based mostly on SPI
  - Destination address check (done simultaneously during lookup)
    fatal if mismatch

After the change, there are three steps for IPv6:

  - SA lookup based on SPI
  - Destination address check
      mismatch => look up a destination remapping state,
                  call the associated input handler,
                  fatal if none matches
  - Source address check
      mismatch => look up a source remapping state,
                  call the associated input handler
                  (not fatal if none matches)
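The three IPv6 steps can be modelled in userspace as below. This is a hypothetical sketch (`toy_addr_check()`, `struct toy_remap` and the flat remapping tables are assumptions) that follows the structure of xfrm6_input_addr_check() in the patch; it leaves out the saddr hint passed to xfrm6_input_addr(), the copy of the on-wire addresses into the sec_path, and the re-routing via ip6_route_input().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_addr { uint8_t a[16]; };

/* Hypothetical IRO remapping entry: an on-wire CoA standing for a HoA. */
struct toy_remap { struct toy_addr hoa, coa; };

struct toy_pkt { struct toy_addr saddr, daddr; };

static bool addr_eq(const struct toy_addr *x, const struct toy_addr *y)
{
	return memcmp(x, y, sizeof(*x)) == 0;
}

static bool find_remap(const struct toy_remap *tab, size_t n,
		       const struct toy_addr *hoa, const struct toy_addr *coa)
{
	for (size_t i = 0; i < n; i++)
		if (addr_eq(&tab[i].hoa, hoa) && addr_eq(&tab[i].coa, coa))
			return true;
	return false;
}

/* Step 1 (SPI lookup) is assumed done. Returns 0 to accept, -1 to drop.
 * Remapped addresses are rewritten back to the HoAs expected by the SA. */
static int toy_addr_check(struct toy_pkt *p,
			  const struct toy_addr *exp_saddr,
			  const struct toy_addr *exp_daddr,
			  const struct toy_remap *dst_tab, size_t ndst,
			  const struct toy_remap *src_tab, size_t nsrc)
{
	assert(p != NULL);
	/* Step 2: a destination mismatch is fatal unless explained. */
	if (!addr_eq(&p->daddr, exp_daddr)) {
		if (!find_remap(dst_tab, ndst, exp_daddr, &p->daddr))
			return -1;
		p->daddr = *exp_daddr;
	}
	/* Step 3: a source mismatch is never fatal. */
	if (!addr_eq(&p->saddr, exp_saddr) &&
	    find_remap(src_tab, nsrc, exp_saddr, &p->saddr))
		p->saddr = *exp_saddr;
	return 0;
}
```

The asymmetry between steps 2 and 3 is the point: only the destination check can cause a drop, so a missing source remapping state leaves the packet untouched rather than discarding it.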

The explanation makes it look more complex than it is:

 - IPv4 IPsec is basically untouched;
 - IPv6 IPsec is basically untouched when CONFIG_XFRM_SUB_POLICY is not
   enabled;
 - when CONFIG_XFRM_SUB_POLICY is enabled, additional work is done only
   for IPv6, and only upon address mismatch.

> I think another way of getting what you want is to simply add
> inbound SAs with a zero destination address in your case which
> can then be made to match any destination IP address.  You can
> then follow that up with additional checks in x->type->input.

The idea is to allow the optimization for unmodified IPsec SAs
(between stable addresses, i.e. HoAs). Updating the IRO src/dst
remapping states allows changing the on-wire src/dst address for
unmodified SAs w/o the need to explicitly add RH2 and/or HAO.
Additionally, because source and destination remapping are not linked,
your proposal would not solve the source remapping case, would it?

Thanks for your feedback and patience, Herbert!

Cheers,

a+

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu Oct. 5, 2010, 6:27 a.m. UTC | #3
On Mon, Oct 04, 2010 at 10:51:46PM +0200, Arnaud Ebalard wrote:
> 
> Either I don't understand the sentence or this is not feasible: the
> thing is there is nothing in the packet to demultiplex like nh for
> RH2/HAO. Here, we only lookup for a remapping state when there is a
> mismatch in the source/destination addresses expected for the SA.
> 
> That's the reason IRO remapping states only apply to IPsec traffic.

I see.

The thing that bugs me is that you've added an indirect call for
all IPsec traffic when only MIPv6 users would ever need this.

With your remapping, would it be possible to add dummy xfrm_state
objects with the remapped destination address that could then call
xfrm6_input_addr?

That way normal IPsec users would not be affected at all while
preserving your new functionality.

Cheers,
Arnaud Ebalard Oct. 5, 2010, 11:28 p.m. UTC | #4
Hi,

Herbert Xu <herbert@gondor.apana.org.au> writes:

> On Mon, Oct 04, 2010 at 10:51:46PM +0200, Arnaud Ebalard wrote:
>> 
>> Either I don't understand the sentence or this is not feasible: the
>> thing is there is nothing in the packet to demultiplex like nh for
>> RH2/HAO. Here, we only lookup for a remapping state when there is a
>> mismatch in the source/destination addresses expected for the SA.
>> 
>> That's the reason IRO remapping states only apply to IPsec traffic.
>
> I see.
>
> The thing that bugs me is that you've added an indirect call for
> all IPsec traffic when only MIPv6 users would ever need this.

The destination address check is always done by the IPsec stack and
usually results in a direct drop if/when it fails. I just replace the
direct drop by a possible recovery (a state lookup and a possible
remapping). The change does not impact standard IPsec users.

Regarding the source address check, I indeed add an additional memcmp()
with some additional work when there is an address mismatch. From a
performance standpoint, I *think* it does not change much: removing the
address from the hash computation for the lookup should balance the
comparison.

I ran some fairly crude performance tests with ... dd and nc. Two boxes,
both with Intel Gigabit cards, connected via a Gigabit switch. The
receiver runs current net-next-2.6 and has static IPv6/IPv4 SAs/SPs
(transport mode ESP using AES):

 1) First, current net-next-2.6 (no patches applied)
 2) then, with my patches applied and CONFIG_XFRM_SUB_POLICY enabled
 3) then, with my patches applied and CONFIG_XFRM_SUB_POLICY disabled

I use dd and netcat6 on the source to send 1GB of data to the receiver
over TCP (over IPv4 and then IPv6) for the various flavours of kernel
above (on the receiver):

 dd if=/dev/zero bs=1024 count=1048576 | nc -x <receiverip> 1234

receiver has

 nc -x -l -p 1234 > /dev/null

The (lack of) results are (reported by dd, 3 runs each time):

 1) IPv4: 33.5760s (32.0 MB/s)   IPv6: 29.0952s (36.9 MB/s)
          28.1210s (38.2 MB/s)         31.7187s (33.9 MB/s)
          29.6547s (36.2 MB/s)         30.6551s (35.0 MB/s)

 2) IPv4: 29.4168s (36.5 MB/s)   IPv6: 30.8944s (34.8 MB/s)
          28.6593s (37.5 MB/s)         30.0922s (35.7 MB/s)
          30.1222s (35.6 MB/s)         30.1781s (35.6 MB/s)

 3) IPv4: 31.0125s (34.6 MB/s)   IPv6: 31.6964s (33.9 MB/s)
          28.8677s (37.2 MB/s)         30.1182s (35.7 MB/s)
          30.4820s (35.2 MB/s)         30.4874s (35.2 MB/s)

I expected (hoped) the additional processing time to add up and show
in the final result, but I think I will need to reduce the rest of the
processing to prove you right :-) I'd be happy to run more tests if you
point me to better tools or good parameters for that (use UDP? change
MTU? NULL enc? more runs? ...).

> With your remapping, would it be possible to add dummy xfrm_state
> objects with the remapped destination address that could then call
> xfrm6_input_addr?
>
> That way normal IPsec users would not be affected at all while
> preserving your new functionality.

I don't think I can do that easily (at all?) with what XFRM provides,
can I? Or at least I don't see how it is possible because I would need
some kind of policy for the state to be applied and the only trigger I
see is the src/dst address mismatch when processing the IPsec packet.

One might think the ideal solution would be to use the SPI and
associate a remapping state with it, but I already dropped that idea
because SPI tracking is simply broken. Among the problems: you need to
update on rekeying, you can only install the remapping state after the
SA is installed, ... The SPI is simply not stable, unlike the SA
addresses I use in the current proposal.

Cheers,

a+
Herbert Xu Oct. 6, 2010, 1:25 a.m. UTC | #5
On Wed, Oct 06, 2010 at 01:28:42AM +0200, Arnaud Ebalard wrote:
>
>  1) First, current net-next-2.6 (no patches applied)
>  2) then, with my patches applied and CONFIG_XFRM_SUB_POLICY enabled
>  3) then, with my patches applied and CONFIG_XFRM_SUB_POLICY disabled

To measure the effect of this properly you should use null
encryption/hashing and look at the CPU utilisation with minimum
packet sizes.
 
> > With your remapping, would it be possible to add dummy xfrm_state
> > objects with the remapped destination address that could then call
> > xfrm6_input_addr?
> >
> > That way normal IPsec users would not be affected at all while
> > preserving your new functionality.
> 
> I don't think I can do that easily (at all?) with what XFRM provides,
> can I? Or at least I don't see how it is possible because I would need
> some kind of policy for the state to be applied and the only trigger I
> see is the src/dst address mismatch when processing the IPsec packet.

So do you know the remapped destination addresses a priori?

If not, then the other possibility would be to add the code hook
in case of xfrm_state_lookup failure.

But more importantly you need to solve the hash collision issue
that I mentioned earlier.  Without that it won't work at all.

Cheers,
Arnaud Ebalard Oct. 6, 2010, 9:42 p.m. UTC | #6
Hi Herbert,

Herbert Xu <herbert@gondor.apana.org.au> writes:

> On Wed, Oct 06, 2010 at 01:28:42AM +0200, Arnaud Ebalard wrote:
>>
>>  1) First, current net-next-2.6 (no patches applied)
>>  2) then, with my patches applied and CONFIG_XFRM_SUB_POLICY enabled
>>  3) then, with my patches applied and CONFIG_XFRM_SUB_POLICY disabled
>
> To measure the effect of this properly you should use null
> encryption/hashing and look at the CPU utilisation with minimum
> packet sizes.

I did that w/ the following kernels on the receiver (same as before):

 - kernel 1: w/o patches (net-next-2.6)
 - kernel 2: patches applied w/ CONFIG_XFRM_SUB_POLICY
 - kernel 3: patches applied w/o CONFIG_XFRM_SUB_POLICY

5 runs of sending 1073741824 bytes over TCP protected using ESP (null
enc and null auth) in transport mode, forcing small packets:

dd if=/dev/zero bs=1024 count=1048576 | nc --mtu=100 -x <v4orv6dst> 1234 

5 runs for each kernel, for IPv4 and IPv6. I keep only the 3 central
completion times reported by dd (dropping the highest and the lowest)
and average those 3 values:

              IPv4          IPv6
kernel 1    70.0815s      72.4843s
kernel 2    69.8335s      72.3462s
kernel 3    69.9758s      72.3588s
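The averaging above (5 runs, drop the highest and lowest, average the middle 3) is a simple trimmed mean. A small C sketch follows; `trimmed_mean()` is a hypothetical helper, and the 32-sample cap is an arbitrary assumption made to avoid dynamic allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static int cmp_double(const void *a, const void *b)
{
	double x = *(const double *)a, y = *(const double *)b;

	return (x > y) - (x < y);
}

/* Mean of n samples after dropping the single lowest and highest value.
 * Sorts a local copy so the caller's array is left untouched. */
static double trimmed_mean(const double *samples, size_t n)
{
	double buf[32];
	double sum = 0.0;

	assert(n >= 3 && n <= 32);
	memcpy(buf, samples, n * sizeof(*buf));
	qsort(buf, n, sizeof(*buf), cmp_double);
	for (size_t i = 1; i + 1 < n; i++)
		sum += buf[i];
	return sum / (double)(n - 2);
}
```

Fed the five kernel 1 IPv4 timings listed at the end of this message, it yields 70.0815 s, matching the first cell of the table.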

I used the exact same method for all 3 cases (reboot, start the test,
nothing else running). I am unable to explain why the test takes longer
to complete with an unpatched kernel, but this is what I get. Maybe it
is just an artefact and the impact is simply too small to be measured.
Anyway, the full set of results is at the end of the email.


>> > With your remapping, would it be possible to add dummy xfrm_state
>> > objects with the remapped destination address that could then call
>> > xfrm6_input_addr?
>> >
>> > That way normal IPsec users would not be affected at all while
>> > preserving your new functionality.
>> 
>> I don't think I can do that easily (at all?) with what XFRM provides,
>> can I? Or at least I don't see how it is possible because I would need
>> some kind of policy for the state to be applied and the only trigger I
>> see is the src/dst address mismatch when processing the IPsec packet.
>
> So do you know the remapped destination addresses a priori?

I have both the HoA and the CoA (this is x->coaddr in my state; '::' is
allowed if I want to accept anything), but the state needs to be
applied to traffic meant for the HoA, not blindly to all the IPsec
traffic received with the CoA (as source or destination).

It's not possible to blindly remap things based on the on-wire address
and the fact that it is IPsec traffic. For instance, if tunnel mode
IPsec traffic between a MN and its HA is used in parallel, I cannot
remap the CoA of packets received on the MN to the HoA.

That's the reason why the lookup is done via something stable (i.e. the
HoA) which is derived from the SPI during SA lookup. 


> If not, then the other possibility would be to add the code hook
> in case of xfrm_state_lookup failure.

This would work for the destination address. But it has the drawback of
requiring a first lookup (guaranteed to fail whenever the packet needs
destination remapping) and then a second one (somehow w/o using the
destination address). The source check would still be done in all cases.


> But more importantly you need to solve the hash collision issue
> that I mentioned earlier.  Without that it won't work at all.

You are correct. The fact that the byspi hash table contains all states
(for inbound and outbound traffic) and that the state lookup does not
involve the direction but only the destination address (damn, the very
one I removed ;-)) is a real issue. It simply makes the state lookup
unreliable when I pass NULL as daddr for incoming traffic. Thanks for
pointing that out.

I don't see any good solution yet (states have no direction), but I will
definitely focus on that issue and spend some time on it tomorrow.
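The collision is easy to reproduce in a toy model. In the sketch below the types are hypothetical, but the match condition copies the patched `__xfrm_state_lookup()` test (`daddr && xfrm_addr_cmp(...)`). An outbound SA, whose SPI was chosen by the peer, sits in the same bucket with the same SPI as a local inbound SA. With a destination address the lookup finds the inbound SA; with NULL it returns the first SPI match, which here is the outbound one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_addr { uint8_t a[16]; };

/* Hypothetical SA entry; like struct xfrm_state, it records no direction. */
struct toy_sa {
	uint32_t spi;
	uint8_t proto;
	struct toy_addr daddr;
};

/* Mirrors the patched match condition: daddr == NULL skips the check. */
static const struct toy_sa *toy_lookup(const struct toy_sa *bucket, size_t n,
				       const struct toy_addr *daddr,
				       uint32_t spi, uint8_t proto)
{
	assert(bucket != NULL);
	for (size_t i = 0; i < n; i++) {
		const struct toy_sa *x = &bucket[i];

		if (x->spi != spi || x->proto != proto ||
		    (daddr && memcmp(&x->daddr, daddr, sizeof(*daddr))))
			continue;
		return x; /* first match wins, whatever its direction */
	}
	return NULL;
}
```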

Cheers,

a+

The full set of results for the tests:

* kernel 1

$ dd if=/dev/zero bs=1024 count=1048576 | nc --mtu=100 -x 192.168.0.14 1234
1073741824 bytes (1.1 GB) copied, 70.0393 s, 15.3 MB/s
1073741824 bytes (1.1 GB) copied, 70.2442 s, 15.3 MB/s
1073741824 bytes (1.1 GB) copied, 70.1995 s, 15.3 MB/s
1073741824 bytes (1.1 GB) copied, 70.0057 s, 15.3 MB/s
1073741824 bytes (1.1 GB) copied, 69.9887 s, 15.3 MB/s

$ dd if=/dev/zero bs=1024 count=1048576 | nc --mtu=100 -x fdff::1 1234 
1073741824 bytes (1.1 GB) copied, 72.5605 s, 14.8 MB/s
1073741824 bytes (1.1 GB) copied, 72.9725 s, 14.7 MB/s
1073741824 bytes (1.1 GB) copied, 72.3099 s, 14.8 MB/s
1073741824 bytes (1.1 GB) copied, 72.435 s, 14.8 MB/s
1073741824 bytes (1.1 GB) copied, 72.4573 s, 14.8 MB/s

* kernel 2

$ dd if=/dev/zero bs=1024 count=1048576 | nc --mtu=100 -x 192.168.0.14 1234
1073741824 bytes (1.1 GB) copied, 69.6845 s, 15.4 MB/s
1073741824 bytes (1.1 GB) copied, 69.9419 s, 15.4 MB/s
1073741824 bytes (1.1 GB) copied, 69.8615 s, 15.4 MB/s
1073741824 bytes (1.1 GB) copied, 70.0142 s, 15.3 MB/s
1073741824 bytes (1.1 GB) copied, 69.6970 s, 15.4 MB/s

$ dd if=/dev/zero bs=1024 count=1048576 | nc --mtu=100 -x fdff::1 1234 
1073741824 bytes (1.1 GB) copied, 72.4252 s, 14.8 MB/s
1073741824 bytes (1.1 GB) copied, 71.5411 s, 15.0 MB/s
1073741824 bytes (1.1 GB) copied, 72.2388 s, 14.9 MB/s
1073741824 bytes (1.1 GB) copied, 72.3745 s, 14.8 MB/s
1073741824 bytes (1.1 GB) copied, 72.6553 s, 14.8 MB/s

* kernel 3

$ dd if=/dev/zero bs=1024 count=1048576 | nc --mtu=100 -x 192.168.0.14 1234
1073741824 bytes (1.1 GB) copied, 69.8569 s, 15.4 MB/s
1073741824 bytes (1.1 GB) copied, 70.4445 s, 15.2 MB/s
1073741824 bytes (1.1 GB) copied, 69.8989 s, 15.4 MB/s
1073741824 bytes (1.1 GB) copied, 70.0238 s, 15.3 MB/s
1073741824 bytes (1.1 GB) copied, 70.0047 s, 15.3 MB/s

$ dd if=/dev/zero bs=1024 count=1048576 | nc --mtu=100 -x fdff::1 1234 
1073741824 bytes (1.1 GB) copied, 72.4677 s, 14.8 MB/s
1073741824 bytes (1.1 GB) copied, 72.2757 s, 14.9 MB/s
1073741824 bytes (1.1 GB) copied, 72.2261 s, 14.9 MB/s
1073741824 bytes (1.1 GB) copied, 72.3331 s, 14.8 MB/s
1073741824 bytes (1.1 GB) copied, 72.8262 s, 14.7 MB/s


Patch

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 05b2b1f..5b84c19 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -312,6 +312,8 @@  struct xfrm_state_afinfo {
 						  struct sk_buff *skb);
 	int			(*transport_finish)(struct sk_buff *skb,
 						    int async);
+	int			(*input_addr_check)(struct sk_buff *skb,
+						    struct xfrm_state *x);
 };
 
 extern int xfrm_state_register_afinfo(struct xfrm_state_afinfo *afinfo);
@@ -623,6 +625,7 @@  struct xfrm_spi_skb_cb {
 		struct inet6_skb_parm h6;
 	} header;
 
+	unsigned int saddroff;
 	unsigned int daddroff;
 	unsigned int family;
 };
@@ -1405,6 +1408,7 @@  extern int xfrm4_rcv_encap(struct sk_buff *skb, int nexthdr, __be32 spi,
 			   int encap_type);
 extern int xfrm4_transport_finish(struct sk_buff *skb, int async);
 extern int xfrm4_rcv(struct sk_buff *skb);
+extern int xfrm4_input_addr_check(struct sk_buff *skb, struct xfrm_state *x);
 
 static inline int xfrm4_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi)
 {
@@ -1423,6 +1427,7 @@  extern int xfrm6_transport_finish(struct sk_buff *skb, int async);
 extern int xfrm6_rcv(struct sk_buff *skb);
 extern int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr,
 			    xfrm_address_t *saddr, u8 proto);
+extern int xfrm6_input_addr_check(struct sk_buff *skb, struct xfrm_state *x);
 extern int xfrm6_tunnel_register(struct xfrm6_tunnel *handler, unsigned short family);
 extern int xfrm6_tunnel_deregister(struct xfrm6_tunnel *handler, unsigned short family);
 extern __be32 xfrm6_tunnel_alloc_spi(struct net *net, xfrm_address_t *saddr);
diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
index 06814b6..8d414ca 100644
--- a/net/ipv4/xfrm4_input.c
+++ b/net/ipv4/xfrm4_input.c
@@ -41,6 +41,7 @@  int xfrm4_rcv_encap(struct sk_buff *skb, int nexthdr, __be32 spi,
 		    int encap_type)
 {
 	XFRM_SPI_SKB_CB(skb)->family = AF_INET;
+	XFRM_SPI_SKB_CB(skb)->saddroff = offsetof(struct iphdr, saddr);
 	XFRM_SPI_SKB_CB(skb)->daddroff = offsetof(struct iphdr, daddr);
 	return xfrm_input(skb, nexthdr, spi, encap_type);
 }
@@ -164,3 +165,13 @@  int xfrm4_rcv(struct sk_buff *skb)
 	return xfrm4_rcv_spi(skb, ip_hdr(skb)->protocol, 0);
 }
 EXPORT_SYMBOL(xfrm4_rcv);
+
+int xfrm4_input_addr_check(struct sk_buff *skb, struct xfrm_state *x)
+{
+	xfrm_address_t *daddr;
+
+	daddr = (xfrm_address_t *)(skb_network_header(skb) +
+				   XFRM_SPI_SKB_CB(skb)->daddroff);
+
+	return xfrm_addr_cmp(&x->id.daddr, daddr, AF_INET);
+}
diff --git a/net/ipv4/xfrm4_state.c b/net/ipv4/xfrm4_state.c
index 4794762..c6b038a 100644
--- a/net/ipv4/xfrm4_state.c
+++ b/net/ipv4/xfrm4_state.c
@@ -79,6 +79,7 @@  static struct xfrm_state_afinfo xfrm4_state_afinfo = {
 	.extract_input		= xfrm4_extract_input,
 	.extract_output		= xfrm4_extract_output,
 	.transport_finish	= xfrm4_transport_finish,
+	.input_addr_check	= xfrm4_input_addr_check,
 };
 
 void __init xfrm4_state_init(void)
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index f8c3cf8..aeb7fc6 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -15,6 +15,7 @@ 
 #include <linux/netfilter_ipv6.h>
 #include <net/ipv6.h>
 #include <net/xfrm.h>
+#include <net/ip6_route.h> /* XXX for ip6_route_input() */
 
 int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb)
 {
@@ -24,6 +25,7 @@  int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb)
 int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi)
 {
 	XFRM_SPI_SKB_CB(skb)->family = AF_INET6;
+	XFRM_SPI_SKB_CB(skb)->saddroff = offsetof(struct ipv6hdr, saddr);
 	XFRM_SPI_SKB_CB(skb)->daddroff = offsetof(struct ipv6hdr, daddr);
 	return xfrm_input(skb, nexthdr, spi, 0);
 }
@@ -142,5 +144,70 @@  int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr,
 drop:
 	return -1;
 }
-
 EXPORT_SYMBOL(xfrm6_input_addr);
+
+#if defined(CONFIG_XFRM_SUB_POLICY)
+/* Perform check on source and destination addresses and possibly IRO
+ * address remapping upon mismatch and if matching IRO state exists. */
+int xfrm6_input_addr_check(struct sk_buff *skb, struct xfrm_state *x)
+{
+	xfrm_address_t *saddr, *exp_saddr, *daddr, *exp_daddr;
+
+	saddr = (xfrm_address_t *)(skb_network_header(skb) +
+				   XFRM_SPI_SKB_CB(skb)->saddroff);
+	daddr = (xfrm_address_t *)(skb_network_header(skb) +
+				   XFRM_SPI_SKB_CB(skb)->daddroff);
+
+	exp_daddr = &x->id.daddr;
+	if (xfrm_addr_cmp(exp_daddr, daddr, AF_INET6)) {
+		/* Destination address mismatch: check if we have an IRO
+		 * destination remapping state to explain that.
+		 *
+		 * Note: saddr is provided as a hint. If source address
+		 * is also a remapped one, xfrm6_input_addr() will manage
+		 * to find IRO destination remapping state */
+		if (xfrm6_input_addr(skb, exp_daddr, saddr,
+				     XFRM_PROTO_IRO_DST) < 0)
+			return -1;
+
+		/* Copy destination address to sec_path for sock opts and
+		 * replace packet destination address with expected HoA */
+		ipv6_addr_copy(&skb->sp->irodst, (struct in6_addr *)daddr);
+		ipv6_addr_copy((struct in6_addr *)daddr,
+			       (struct in6_addr *)exp_daddr);
+
+		skb_dst_drop(skb);
+		ip6_route_input(skb);
+		if (skb_dst(skb)->error)
+			return -1;
+	}
+
+	exp_saddr = &x->props.saddr;
+	if (xfrm_addr_cmp(exp_saddr, saddr, AF_INET6)) {
+		/* Source address mismatch: check if we have an IRO
+		 * source remapping state to explain that.
+		 *
+		 * Note: unlike for destination addresses above, a
+		 * source mismatch is not considered fatal */
+		if (xfrm6_input_addr(skb, daddr, exp_saddr,
+				     XFRM_PROTO_IRO_SRC) < 0)
+			return 0;
+
+		/* Copy source address to sec_path for sock opts and
+		 * then replace source address with expected peer's HoA */
+		ipv6_addr_copy(&skb->sp->irosrc, (struct in6_addr *)saddr);
+		ipv6_addr_copy((struct in6_addr *)saddr,
+			       (struct in6_addr *)exp_saddr);
+	}
+
+	return 0;
+}
+#else
+int xfrm6_input_addr_check(struct sk_buff *skb, struct xfrm_state *x)
+{
+	xfrm_address_t *daddr;
+	daddr = (xfrm_address_t *)(skb_network_header(skb) +
+				   XFRM_SPI_SKB_CB(skb)->daddroff);
+	return xfrm_addr_cmp(&x->id.daddr, daddr, AF_INET6);
+}
+#endif
diff --git a/net/ipv6/xfrm6_state.c b/net/ipv6/xfrm6_state.c
index a67575d..aeb4688 100644
--- a/net/ipv6/xfrm6_state.c
+++ b/net/ipv6/xfrm6_state.c
@@ -179,6 +179,7 @@  static struct xfrm_state_afinfo xfrm6_state_afinfo = {
 	.extract_input		= xfrm6_extract_input,
 	.extract_output		= xfrm6_extract_output,
 	.transport_finish	= xfrm6_transport_finish,
+	.input_addr_check	= xfrm6_input_addr_check,
 };
 
 int __init xfrm6_state_init(void)
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 45f1c98..9ff65f6 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -152,8 +152,9 @@  int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 			goto drop;
 		}
 
-		x = xfrm_state_lookup(net, skb->mark, daddr, spi, nexthdr, family);
-		if (x == NULL) {
+		x = xfrm_state_lookup(net, skb->mark, NULL, spi, nexthdr, family);
+		if (x == NULL ||
+		    x->outer_mode->afinfo->input_addr_check(skb, x)) {
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOSTATES);
 			xfrm_audit_state_notfound(skb, family, spi, seq);
 			goto drop;
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index b6a4d8d..b8f7c08 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -685,7 +685,7 @@  static struct xfrm_state *__xfrm_state_lookup(struct net *net, u32 mark, xfrm_ad
 		if (x->props.family != family ||
 		    x->id.spi       != spi ||
 		    x->id.proto     != proto ||
-		    xfrm_addr_cmp(&x->id.daddr, daddr, family))
+		    (daddr && xfrm_addr_cmp(&x->id.daddr, daddr, family)))
 			continue;
 
 		if ((mark & x->mark.m) != x->mark.v)