diff mbox series

[SRU,B,F,1/1] scsi: zfcp: Fix panic on ERP timeout for previously dismissed ERP action

Message ID 20200716144446.50898-2-frank.heimes@canonical.com
State New
Headers show
Series zfcp: Fix panic on ERP timeout for previously dismissed ERP (LP: 1887774) | expand

Commit Message

Frank Heimes July 16, 2020, 2:44 p.m. UTC
From: Steffen Maier <maier@linux.ibm.com>

BugLink: https://bugs.launchpad.net/bugs/1887774

Suppose that, for unrelated reasons, FSF requests on behalf of recovery are
very slow and can run into the ERP timeout.

In the case at hand, we did adapter recovery to a large degree.  However
due to the slowness a LUN open is pending so the corresponding fc_rport
remains blocked.  After fast_io_fail_tmo we trigger close physical port
recovery for the port under which the LUN should have been opened.  The new
higher order port recovery dismisses the pending LUN open ERP action and
dismisses the pending LUN open FSF request.  Such dismissal decouples the
ERP action from the pending corresponding FSF request by setting
zfcp_fsf_req->erp_action to NULL (among other things)
[zfcp_erp_strategy_check_fsfreq()].

If now the ERP timeout for the pending open LUN request runs out, we must
not use zfcp_fsf_req->erp_action in the ERP timeout handler.  This is a
problem since v4.15 commit 75492a51568b ("s390/scsi: Convert timers to use
timer_setup()"). Before that we intentionally only passed zfcp_erp_action
as context argument to zfcp_erp_timeout_handler().

Note: The lifetime of the corresponding zfcp_fsf_req object continues until
a (late) response or an (unrelated) adapter recovery.

Just like the regular response path ignores dismissed requests
[zfcp_fsf_req_complete() => zfcp_fsf_protstatus_eval() => return early] the
ERP timeout handler now needs to ignore dismissed requests.  So simply
return early in the ERP timeout handler if the FSF request is marked as
dismissed in its status flags.  To protect against the race where
zfcp_erp_strategy_check_fsfreq() dismisses and sets
zfcp_fsf_req->erp_action to NULL after our previous status flag check,
return early if zfcp_fsf_req->erp_action is NULL.  After all, the former
ERP action does not need to be woken up as that was already done as part of
the dismissal above [zfcp_erp_action_dismiss()].

This fixes the following panic due to kernel page fault in IRQ context:

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 0000000000000000 TEID: 0000000000000483
Fault in home space mode while using kernel ASCE.
AS:000009859238c00b R2:00000e3e7ffd000b R3:00000e3e7ffcc007 S:00000e3e7ffd7000 P:000000000000013d
Oops: 0004 ilc:2 [#1] SMP
Modules linked in: ...
CPU: 82 PID: 311273 Comm: stress Kdump: loaded Tainted: G            E  X   ...
Hardware name: IBM 8561 T01 701 (LPAR)
Krnl PSW : 0404c00180000000 001fffff80549be0 (zfcp_erp_notify+0x40/0xc0 [zfcp])
           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000080 00000e3d00000000 00000000000000f0 0000000000030000
           000000010028e700 000000000400a39c 000000010028e700 00000e3e7cf87e02
           0000000010000000 0700098591cb67f0 0000000000000000 0000000000000000
           0000033840e9a000 0000000000000000 001fffe008d6bc18 001fffe008d6bbc8
Krnl Code: 001fffff80549bd4: a7180000            lhi     %r1,0
           001fffff80549bd8: 4120a0f0            la      %r2,240(%r10)
          #001fffff80549bdc: a53e0003            llilh   %r3,3
          >001fffff80549be0: ba132000            cs      %r1,%r3,0(%r2)
           001fffff80549be4: a7740037            brc     7,1fffff80549c52
           001fffff80549be8: e320b0180004        lg      %r2,24(%r11)
           001fffff80549bee: e31020e00004        lg      %r1,224(%r2)
           001fffff80549bf4: 412020e0            la      %r2,224(%r2)
Call Trace:
 [<001fffff80549be0>] zfcp_erp_notify+0x40/0xc0 [zfcp]
 [<00000985915e26f0>] call_timer_fn+0x38/0x190
 [<00000985915e2944>] expire_timers+0xfc/0x190
 [<00000985915e2ac4>] run_timer_softirq+0xec/0x218
 [<0000098591ca7c4c>] __do_softirq+0x144/0x398
 [<00000985915110aa>] do_softirq_own_stack+0x72/0x88
 [<0000098591551b58>] irq_exit+0xb0/0xb8
 [<0000098591510c6a>] do_IRQ+0x82/0xb0
 [<0000098591ca7140>] ext_int_handler+0x128/0x12c
 [<0000098591722d98>] clear_subpage.constprop.13+0x38/0x60
([<000009859172ae4c>] clear_huge_page+0xec/0x250)
 [<000009859177e7a2>] do_huge_pmd_anonymous_page+0x32a/0x768
 [<000009859172a712>] __handle_mm_fault+0x88a/0x900
 [<000009859172a860>] handle_mm_fault+0xd8/0x1b0
 [<0000098591529ef6>] do_dat_exception+0x136/0x3e8
 [<0000098591ca6d34>] pgm_check_handler+0x1c8/0x220
Last Breaking-Event-Address:
 [<001fffff80549c88>] zfcp_erp_timeout_handler+0x10/0x18 [zfcp]
Kernel panic - not syncing: Fatal exception in interrupt

Link: https://lore.kernel.org/r/20200623140242.98864-1-maier@linux.ibm.com
Fixes: 75492a51568b ("s390/scsi: Convert timers to use timer_setup()")
Cc: <stable@vger.kernel.org> #4.15+
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 936e6b85da0476dd2edac7c51c68072da9fb4ba2)
Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
---
 drivers/s390/scsi/zfcp_erp.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

Comments

Colin Ian King July 16, 2020, 2:49 p.m. UTC | #1
On 16/07/2020 15:44, frank.heimes@canonical.com wrote:
> From: Steffen Maier <maier@linux.ibm.com>
> 
> BugLink: https://bugs.launchpad.net/bugs/1887774
> 
> Suppose that, for unrelated reasons, FSF requests on behalf of recovery are
> very slow and can run into the ERP timeout.
> 
> In the case at hand, we did adapter recovery to a large degree.  However
> due to the slowness a LUN open is pending so the corresponding fc_rport
> remains blocked.  After fast_io_fail_tmo we trigger close physical port
> recovery for the port under which the LUN should have been opened.  The new
> higher order port recovery dismisses the pending LUN open ERP action and
> dismisses the pending LUN open FSF request.  Such dismissal decouples the
> ERP action from the pending corresponding FSF request by setting
> zfcp_fsf_req->erp_action to NULL (among other things)
> [zfcp_erp_strategy_check_fsfreq()].
> 
> If now the ERP timeout for the pending open LUN request runs out, we must
> not use zfcp_fsf_req->erp_action in the ERP timeout handler.  This is a
> problem since v4.15 commit 75492a51568b ("s390/scsi: Convert timers to use
> timer_setup()"). Before that we intentionally only passed zfcp_erp_action
> as context argument to zfcp_erp_timeout_handler().
> 
> Note: The lifetime of the corresponding zfcp_fsf_req object continues until
> a (late) response or an (unrelated) adapter recovery.
> 
> Just like the regular response path ignores dismissed requests
> [zfcp_fsf_req_complete() => zfcp_fsf_protstatus_eval() => return early] the
> ERP timeout handler now needs to ignore dismissed requests.  So simply
> return early in the ERP timeout handler if the FSF request is marked as
> dismissed in its status flags.  To protect against the race where
> zfcp_erp_strategy_check_fsfreq() dismisses and sets
> zfcp_fsf_req->erp_action to NULL after our previous status flag check,
> return early if zfcp_fsf_req->erp_action is NULL.  After all, the former
> ERP action does not need to be woken up as that was already done as part of
> the dismissal above [zfcp_erp_action_dismiss()].
> 
> This fixes the following panic due to kernel page fault in IRQ context:
> 
> Unable to handle kernel pointer dereference in virtual kernel address space
> Failing address: 0000000000000000 TEID: 0000000000000483
> Fault in home space mode while using kernel ASCE.
> AS:000009859238c00b R2:00000e3e7ffd000b R3:00000e3e7ffcc007 S:00000e3e7ffd7000 P:000000000000013d
> Oops: 0004 ilc:2 [#1] SMP
> Modules linked in: ...
> CPU: 82 PID: 311273 Comm: stress Kdump: loaded Tainted: G            E  X   ...
> Hardware name: IBM 8561 T01 701 (LPAR)
> Krnl PSW : 0404c00180000000 001fffff80549be0 (zfcp_erp_notify+0x40/0xc0 [zfcp])
>            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> Krnl GPRS: 0000000000000080 00000e3d00000000 00000000000000f0 0000000000030000
>            000000010028e700 000000000400a39c 000000010028e700 00000e3e7cf87e02
>            0000000010000000 0700098591cb67f0 0000000000000000 0000000000000000
>            0000033840e9a000 0000000000000000 001fffe008d6bc18 001fffe008d6bbc8
> Krnl Code: 001fffff80549bd4: a7180000            lhi     %r1,0
>            001fffff80549bd8: 4120a0f0            la      %r2,240(%r10)
>           #001fffff80549bdc: a53e0003            llilh   %r3,3
>           >001fffff80549be0: ba132000            cs      %r1,%r3,0(%r2)
>            001fffff80549be4: a7740037            brc     7,1fffff80549c52
>            001fffff80549be8: e320b0180004        lg      %r2,24(%r11)
>            001fffff80549bee: e31020e00004        lg      %r1,224(%r2)
>            001fffff80549bf4: 412020e0            la      %r2,224(%r2)
> Call Trace:
>  [<001fffff80549be0>] zfcp_erp_notify+0x40/0xc0 [zfcp]
>  [<00000985915e26f0>] call_timer_fn+0x38/0x190
>  [<00000985915e2944>] expire_timers+0xfc/0x190
>  [<00000985915e2ac4>] run_timer_softirq+0xec/0x218
>  [<0000098591ca7c4c>] __do_softirq+0x144/0x398
>  [<00000985915110aa>] do_softirq_own_stack+0x72/0x88
>  [<0000098591551b58>] irq_exit+0xb0/0xb8
>  [<0000098591510c6a>] do_IRQ+0x82/0xb0
>  [<0000098591ca7140>] ext_int_handler+0x128/0x12c
>  [<0000098591722d98>] clear_subpage.constprop.13+0x38/0x60
> ([<000009859172ae4c>] clear_huge_page+0xec/0x250)
>  [<000009859177e7a2>] do_huge_pmd_anonymous_page+0x32a/0x768
>  [<000009859172a712>] __handle_mm_fault+0x88a/0x900
>  [<000009859172a860>] handle_mm_fault+0xd8/0x1b0
>  [<0000098591529ef6>] do_dat_exception+0x136/0x3e8
>  [<0000098591ca6d34>] pgm_check_handler+0x1c8/0x220
> Last Breaking-Event-Address:
>  [<001fffff80549c88>] zfcp_erp_timeout_handler+0x10/0x18 [zfcp]
> Kernel panic - not syncing: Fatal exception in interrupt
> 
> Link: https://lore.kernel.org/r/20200623140242.98864-1-maier@linux.ibm.com
> Fixes: 75492a51568b ("s390/scsi: Convert timers to use timer_setup()")
> Cc: <stable@vger.kernel.org> #4.15+
> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
> Signed-off-by: Steffen Maier <maier@linux.ibm.com>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> (cherry picked from commit 936e6b85da0476dd2edac7c51c68072da9fb4ba2)
> Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
> ---
>  drivers/s390/scsi/zfcp_erp.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/s390/scsi/zfcp_erp.c b/drivers/s390/scsi/zfcp_erp.c
> index db320dab1fee..79f6e8fb03ca 100644
> --- a/drivers/s390/scsi/zfcp_erp.c
> +++ b/drivers/s390/scsi/zfcp_erp.c
> @@ -577,7 +577,10 @@ static void zfcp_erp_strategy_check_fsfreq(struct zfcp_erp_action *act)
>  				   ZFCP_STATUS_ERP_TIMEDOUT)) {
>  			req->status |= ZFCP_STATUS_FSFREQ_DISMISSED;
>  			zfcp_dbf_rec_run("erscf_1", act);
> -			req->erp_action = NULL;
> +			/* lock-free concurrent access with
> +			 * zfcp_erp_timeout_handler()
> +			 */
> +			WRITE_ONCE(req->erp_action, NULL);
>  		}
>  		if (act->status & ZFCP_STATUS_ERP_TIMEDOUT)
>  			zfcp_dbf_rec_run("erscf_2", act);
> @@ -613,8 +616,14 @@ void zfcp_erp_notify(struct zfcp_erp_action *erp_action, unsigned long set_mask)
>  void zfcp_erp_timeout_handler(struct timer_list *t)
>  {
>  	struct zfcp_fsf_req *fsf_req = from_timer(fsf_req, t, timer);
> -	struct zfcp_erp_action *act = fsf_req->erp_action;
> +	struct zfcp_erp_action *act;
>  
> +	if (fsf_req->status & ZFCP_STATUS_FSFREQ_DISMISSED)
> +		return;
> +	/* lock-free concurrent access with zfcp_erp_strategy_check_fsfreq() */
> +	act = READ_ONCE(fsf_req->erp_action);
> +	if (!act)
> +		return;
>  	zfcp_erp_notify(act, ZFCP_STATUS_ERP_TIMEDOUT);
>  }
>  
> 
Fix looks sane, upstream fix that addresses this issue. The SRU
justification in the bug report may need a bit of a clean up to match
the normal SRU template style.

Acked-by: Colin Ian King <colin.king@canonical.com>
Frank Heimes July 16, 2020, 3:26 p.m. UTC | #2
Hi Colin, well, yeah - what we see here is the official message from the
upstream commit.
I thought about summarizing this for the SRU Justification, but then
thought I will eventually end up in just repeating information,
since the bug description in LP covers that level of detail, too (in add.
to the justification).
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1887774
(need to think about improving it next time ...)


On Thu, Jul 16, 2020 at 4:50 PM Colin Ian King <colin.king@canonical.com>
wrote:

> On 16/07/2020 15:44, frank.heimes@canonical.com wrote:
> > From: Steffen Maier <maier@linux.ibm.com>
> >
> > BugLink: https://bugs.launchpad.net/bugs/1887774
> >
> > Suppose that, for unrelated reasons, FSF requests on behalf of recovery
> are
> > very slow and can run into the ERP timeout.
> >
> > In the case at hand, we did adapter recovery to a large degree.  However
> > due to the slowness a LUN open is pending so the corresponding fc_rport
> > remains blocked.  After fast_io_fail_tmo we trigger close physical port
> > recovery for the port under which the LUN should have been opened.  The
> new
> > higher order port recovery dismisses the pending LUN open ERP action and
> > dismisses the pending LUN open FSF request.  Such dismissal decouples the
> > ERP action from the pending corresponding FSF request by setting
> > zfcp_fsf_req->erp_action to NULL (among other things)
> > [zfcp_erp_strategy_check_fsfreq()].
> >
> > If now the ERP timeout for the pending open LUN request runs out, we must
> > not use zfcp_fsf_req->erp_action in the ERP timeout handler.  This is a
> > problem since v4.15 commit 75492a51568b ("s390/scsi: Convert timers to
> use
> > timer_setup()"). Before that we intentionally only passed zfcp_erp_action
> > as context argument to zfcp_erp_timeout_handler().
> >
> > Note: The lifetime of the corresponding zfcp_fsf_req object continues
> until
> > a (late) response or an (unrelated) adapter recovery.
> >
> > Just like the regular response path ignores dismissed requests
> > [zfcp_fsf_req_complete() => zfcp_fsf_protstatus_eval() => return early]
> the
> > ERP timeout handler now needs to ignore dismissed requests.  So simply
> > return early in the ERP timeout handler if the FSF request is marked as
> > dismissed in its status flags.  To protect against the race where
> > zfcp_erp_strategy_check_fsfreq() dismisses and sets
> > zfcp_fsf_req->erp_action to NULL after our previous status flag check,
> > return early if zfcp_fsf_req->erp_action is NULL.  After all, the former
> > ERP action does not need to be woken up as that was already done as part
> of
> > the dismissal above [zfcp_erp_action_dismiss()].
> >
> > This fixes the following panic due to kernel page fault in IRQ context:
> >
> > Unable to handle kernel pointer dereference in virtual kernel address
> space
> > Failing address: 0000000000000000 TEID: 0000000000000483
> > Fault in home space mode while using kernel ASCE.
> > AS:000009859238c00b R2:00000e3e7ffd000b R3:00000e3e7ffcc007
> S:00000e3e7ffd7000 P:000000000000013d
> > Oops: 0004 ilc:2 [#1] SMP
> > Modules linked in: ...
> > CPU: 82 PID: 311273 Comm: stress Kdump: loaded Tainted: G            E
> X   ...
> > Hardware name: IBM 8561 T01 701 (LPAR)
> > Krnl PSW : 0404c00180000000 001fffff80549be0 (zfcp_erp_notify+0x40/0xc0
> [zfcp])
> >            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> > Krnl GPRS: 0000000000000080 00000e3d00000000 00000000000000f0
> 0000000000030000
> >            000000010028e700 000000000400a39c 000000010028e700
> 00000e3e7cf87e02
> >            0000000010000000 0700098591cb67f0 0000000000000000
> 0000000000000000
> >            0000033840e9a000 0000000000000000 001fffe008d6bc18
> 001fffe008d6bbc8
> > Krnl Code: 001fffff80549bd4: a7180000            lhi     %r1,0
> >            001fffff80549bd8: 4120a0f0            la      %r2,240(%r10)
> >           #001fffff80549bdc: a53e0003            llilh   %r3,3
> >           >001fffff80549be0: ba132000            cs      %r1,%r3,0(%r2)
> >            001fffff80549be4: a7740037            brc     7,1fffff80549c52
> >            001fffff80549be8: e320b0180004        lg      %r2,24(%r11)
> >            001fffff80549bee: e31020e00004        lg      %r1,224(%r2)
> >            001fffff80549bf4: 412020e0            la      %r2,224(%r2)
> > Call Trace:
> >  [<001fffff80549be0>] zfcp_erp_notify+0x40/0xc0 [zfcp]
> >  [<00000985915e26f0>] call_timer_fn+0x38/0x190
> >  [<00000985915e2944>] expire_timers+0xfc/0x190
> >  [<00000985915e2ac4>] run_timer_softirq+0xec/0x218
> >  [<0000098591ca7c4c>] __do_softirq+0x144/0x398
> >  [<00000985915110aa>] do_softirq_own_stack+0x72/0x88
> >  [<0000098591551b58>] irq_exit+0xb0/0xb8
> >  [<0000098591510c6a>] do_IRQ+0x82/0xb0
> >  [<0000098591ca7140>] ext_int_handler+0x128/0x12c
> >  [<0000098591722d98>] clear_subpage.constprop.13+0x38/0x60
> > ([<000009859172ae4c>] clear_huge_page+0xec/0x250)
> >  [<000009859177e7a2>] do_huge_pmd_anonymous_page+0x32a/0x768
> >  [<000009859172a712>] __handle_mm_fault+0x88a/0x900
> >  [<000009859172a860>] handle_mm_fault+0xd8/0x1b0
> >  [<0000098591529ef6>] do_dat_exception+0x136/0x3e8
> >  [<0000098591ca6d34>] pgm_check_handler+0x1c8/0x220
> > Last Breaking-Event-Address:
> >  [<001fffff80549c88>] zfcp_erp_timeout_handler+0x10/0x18 [zfcp]
> > Kernel panic - not syncing: Fatal exception in interrupt
> >
> > Link:
> https://lore.kernel.org/r/20200623140242.98864-1-maier@linux.ibm.com
> > Fixes: 75492a51568b ("s390/scsi: Convert timers to use timer_setup()")
> > Cc: <stable@vger.kernel.org> #4.15+
> > Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
> > Signed-off-by: Steffen Maier <maier@linux.ibm.com>
> > Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> > (cherry picked from commit 936e6b85da0476dd2edac7c51c68072da9fb4ba2)
> > Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
> > ---
> >  drivers/s390/scsi/zfcp_erp.c | 13 +++++++++++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/s390/scsi/zfcp_erp.c b/drivers/s390/scsi/zfcp_erp.c
> > index db320dab1fee..79f6e8fb03ca 100644
> > --- a/drivers/s390/scsi/zfcp_erp.c
> > +++ b/drivers/s390/scsi/zfcp_erp.c
> > @@ -577,7 +577,10 @@ static void zfcp_erp_strategy_check_fsfreq(struct
> zfcp_erp_action *act)
> >                                  ZFCP_STATUS_ERP_TIMEDOUT)) {
> >                       req->status |= ZFCP_STATUS_FSFREQ_DISMISSED;
> >                       zfcp_dbf_rec_run("erscf_1", act);
> > -                     req->erp_action = NULL;
> > +                     /* lock-free concurrent access with
> > +                      * zfcp_erp_timeout_handler()
> > +                      */
> > +                     WRITE_ONCE(req->erp_action, NULL);
> >               }
> >               if (act->status & ZFCP_STATUS_ERP_TIMEDOUT)
> >                       zfcp_dbf_rec_run("erscf_2", act);
> > @@ -613,8 +616,14 @@ void zfcp_erp_notify(struct zfcp_erp_action
> *erp_action, unsigned long set_mask)
> >  void zfcp_erp_timeout_handler(struct timer_list *t)
> >  {
> >       struct zfcp_fsf_req *fsf_req = from_timer(fsf_req, t, timer);
> > -     struct zfcp_erp_action *act = fsf_req->erp_action;
> > +     struct zfcp_erp_action *act;
> >
> > +     if (fsf_req->status & ZFCP_STATUS_FSFREQ_DISMISSED)
> > +             return;
> > +     /* lock-free concurrent access with
> zfcp_erp_strategy_check_fsfreq() */
> > +     act = READ_ONCE(fsf_req->erp_action);
> > +     if (!act)
> > +             return;
> >       zfcp_erp_notify(act, ZFCP_STATUS_ERP_TIMEDOUT);
> >  }
> >
> >
> Fix looks sane, upstream fix that addresses this issue. The SRU
> justification in the bug report may need a bit of a clean up to match
> the normal SRU template style.
>
> Acked-by: Colin Ian King <colin.king@canonical.com>
>
diff mbox series

Patch

diff --git a/drivers/s390/scsi/zfcp_erp.c b/drivers/s390/scsi/zfcp_erp.c
index db320dab1fee..79f6e8fb03ca 100644
--- a/drivers/s390/scsi/zfcp_erp.c
+++ b/drivers/s390/scsi/zfcp_erp.c
@@ -577,7 +577,10 @@  static void zfcp_erp_strategy_check_fsfreq(struct zfcp_erp_action *act)
 				   ZFCP_STATUS_ERP_TIMEDOUT)) {
 			req->status |= ZFCP_STATUS_FSFREQ_DISMISSED;
 			zfcp_dbf_rec_run("erscf_1", act);
-			req->erp_action = NULL;
+			/* lock-free concurrent access with
+			 * zfcp_erp_timeout_handler()
+			 */
+			WRITE_ONCE(req->erp_action, NULL);
 		}
 		if (act->status & ZFCP_STATUS_ERP_TIMEDOUT)
 			zfcp_dbf_rec_run("erscf_2", act);
@@ -613,8 +616,14 @@  void zfcp_erp_notify(struct zfcp_erp_action *erp_action, unsigned long set_mask)
 void zfcp_erp_timeout_handler(struct timer_list *t)
 {
 	struct zfcp_fsf_req *fsf_req = from_timer(fsf_req, t, timer);
-	struct zfcp_erp_action *act = fsf_req->erp_action;
+	struct zfcp_erp_action *act;
 
+	if (fsf_req->status & ZFCP_STATUS_FSFREQ_DISMISSED)
+		return;
+	/* lock-free concurrent access with zfcp_erp_strategy_check_fsfreq() */
+	act = READ_ONCE(fsf_req->erp_action);
+	if (!act)
+		return;
 	zfcp_erp_notify(act, ZFCP_STATUS_ERP_TIMEDOUT);
 }