
opal-prd: Add RestartSec to service script

Message ID 20200227060113.20316-1-hegdevasant@linux.vnet.ibm.com
State Changes Requested

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch master (f123417068e51842004bdc047c8c5107b70442ef)
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot success Test snowpatch/job/snowpatch-skiboot on branch master
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot-dco success Signed-off-by present

Commit Message

Vasant Hegde Feb. 27, 2020, 6:01 a.m. UTC
We are seeing random failures of `opal-prd` during boot. It fails mostly
because the prd module is not yet loaded when the service starts.

Sample failure message:
----------------------
Feb 21 19:03:09 grsp1 opal-prd: FW: Can't open PRD device /dev/opal-prd: No such file or directory
Feb 21 19:03:09 grsp1 opal-prd: FW: Error initialising PRD channel

The service script already sets the `Restart` option, but systemd attempts
the restart as soon as the service fails and gives up after a few attempts
(by default it retries 5 times). This patch adds the `RestartSec` option to
the service script.

RestartSec = Configures the time to sleep before restarting a service.

Note that I have picked 30 seconds as the wait time. Before this change we
hit an `opal-prd` failure within 50 reboots. With this fix, 200 reboots
completed without a single failure.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
---
 external/opal-prd/opal-prd.service | 1 +
 1 file changed, 1 insertion(+)
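
The reasoning in the commit message can be checked with a small simulation.
This is only a sketch under stated assumptions, not how systemd is
implemented: it assumes the stock systemd defaults (RestartSec=100ms, and a
start limit of 5 starts per 10 seconds), assumes `opal-prd` exits immediately
whenever /dev/opal-prd is missing, and uses a made-up time of 2 seconds for
the prd module to finish loading. The function name and its parameters are
hypothetical.

```python
def simulate_restarts(restart_sec, device_ready_at,
                      start_limit_interval=10.0, start_limit_burst=5,
                      horizon=600.0):
    """Model systemd restarting a service that fails until a device exists.

    Returns a tuple (outcome, time_in_seconds) where outcome is one of
    "running", "start-limit-hit" or "timeout".
    """
    start_times = []  # timestamps of every start attempt so far
    now = 0.0
    while now <= horizon:
        # Count the starts that fall inside the rate-limit window.
        recent = [t for t in start_times if now - t < start_limit_interval]
        if len(recent) >= start_limit_burst:
            # Too many starts in the window: systemd refuses to start again.
            return ("start-limit-hit", now)
        start_times.append(now)
        if now >= device_ready_at:
            # The device node exists, so the daemon stays up.
            return ("running", now)
        # Assumed: the daemon fails at once, then systemd sleeps RestartSec.
        now += restart_sec
    return ("timeout", now)


if __name__ == "__main__":
    # Default restart delay: five quick failures exhaust the start limit.
    print(simulate_restarts(restart_sec=0.1, device_ready_at=2.0)[0])
    # With RestartSec=30 the second attempt finds the device.
    print(simulate_restarts(restart_sec=30.0, device_ready_at=2.0)[0])
```

Under these assumptions the default delay burns through all five allowed
starts in about half a second, well before the module is loaded, while a
30 second delay never gets near the start limit and succeeds on the first
retry. It does not address the ordering problem itself; it only makes the
retry wide enough to cover it.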

Comments

Oliver O'Halloran Feb. 27, 2020, 11:39 a.m. UTC | #1
On Thu, Feb 27, 2020 at 5:01 PM Vasant Hegde
<hegdevasant@linux.vnet.ibm.com> wrote:
>
> We are seeing random failure of `opal-prd` during boot. Its failing during
> boot mostly because prd module is not yet loaded.

Ehhhhh, surely systemd has some way to specify that the opal-prd
service depends on the driver being loaded?


> Sample failure message:
> ----------------------
> Feb 21 19:03:09 grsp1 opal-prd: FW: Can't open PRD device /dev/opal-prd: No such file or directory
> Feb 21 19:03:09 grsp1 opal-prd: FW: Error initialising PRD channel
>
> We have `Restart` option in service script. But systemd will attempt to
> restart as soon as it fails and stops after few attempts (by default it
> retries for 5 times). This patch add `RestartSec` option to service script
>
> RestartSec = Configures the time to sleep before restarting a service.
>
> Note that I have pickedup 30sec as wait time. Before this change we hit
> `opal-prd` failure within 50 reboots. With this fix 200 reboots went fine
> without any failure.
>
> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
> ---
>  external/opal-prd/opal-prd.service | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/external/opal-prd/opal-prd.service b/external/opal-prd/opal-prd.service
> index dce0dd262..755b5f45c 100644
> --- a/external/opal-prd/opal-prd.service
> +++ b/external/opal-prd/opal-prd.service
> @@ -6,6 +6,7 @@ ConditionPathExists=/sys/firmware/devicetree/base/ibm,opal/diagnostics
>  [Service]
>  ExecStart=/usr/sbin/opal-prd
>  Restart=always
> +RestartSec=30
>
>  [Install]
>  WantedBy=multi-user.target
> --
> 2.21.1
>
Dan Horák Feb. 27, 2020, 12:29 p.m. UTC | #2
On Thu, 27 Feb 2020 22:39:51 +1100
"Oliver O'Halloran" <oohall@gmail.com> wrote:

> On Thu, Feb 27, 2020 at 5:01 PM Vasant Hegde
> <hegdevasant@linux.vnet.ibm.com> wrote:
> >
> > We are seeing random failure of `opal-prd` during boot. Its failing
> > during boot mostly because prd module is not yet loaded.
> 
> Ehhhhh, surely systemd has some way to specify that the opal-prd
> service depends on the driver being loaded?

Yup, a systemd service should be able to wait for that. I found this:
https://lists.fedoraproject.org/pipermail/devel/2012-January/160917.html


		Dan
 
> 
> > Sample failure message:
> > ----------------------
> > Feb 21 19:03:09 grsp1 opal-prd: FW: Can't open PRD device /dev/opal-prd: No such file or directory
> > Feb 21 19:03:09 grsp1 opal-prd: FW: Error initialising PRD channel
> >
> > We have `Restart` option in service script. But systemd will
> > attempt to restart as soon as it fails and stops after few attempts
> > (by default it retries for 5 times). This patch add `RestartSec`
> > option to service script
> >
> > RestartSec = Configures the time to sleep before restarting a
> > service.
> >
> > Note that I have pickedup 30sec as wait time. Before this change we
> > hit `opal-prd` failure within 50 reboots. With this fix 200 reboots
> > went fine without any failure.
> >
> > Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
> > ---
> >  external/opal-prd/opal-prd.service | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/external/opal-prd/opal-prd.service b/external/opal-prd/opal-prd.service
> > index dce0dd262..755b5f45c 100644
> > --- a/external/opal-prd/opal-prd.service
> > +++ b/external/opal-prd/opal-prd.service
> > @@ -6,6 +6,7 @@ ConditionPathExists=/sys/firmware/devicetree/base/ibm,opal/diagnostics
> >  [Service]
> >  ExecStart=/usr/sbin/opal-prd
> >  Restart=always
> > +RestartSec=30
> >
> >  [Install]
> >  WantedBy=multi-user.target
> > --
> > 2.21.1
> >

Patch

diff --git a/external/opal-prd/opal-prd.service b/external/opal-prd/opal-prd.service
index dce0dd262..755b5f45c 100644
--- a/external/opal-prd/opal-prd.service
+++ b/external/opal-prd/opal-prd.service
@@ -6,6 +6,7 @@  ConditionPathExists=/sys/firmware/devicetree/base/ibm,opal/diagnostics
 [Service]
 ExecStart=/usr/sbin/opal-prd
 Restart=always
+RestartSec=30
 
 [Install]
 WantedBy=multi-user.target