Patchwork [RFC] Add timestamp to error message

login
register
mail settings
Submitter Seiji Aguchi
Date Feb. 1, 2013, 2:53 p.m.
Message ID <A5ED84D3BB3A384992CBB9C77DEDA4D414A5F112@USINDEM103.corp.hds.com>
Download mbox | patch
Permalink /patch/217554/
State New
Headers show

Comments

Seiji Aguchi - Feb. 1, 2013, 2:53 p.m.
[Issue]
When we offer a customer support service and a problem happens 
in a customer's system, we try to understand the problem by 
comparing what the customer reports with message logs of the 
customer's system.

In this case, we often need to know when the problem happens.

But, currently, there is no timestamp in qemu's error messages.
Therefore, we may not be able to understand the problem based on
error messages.

[Solution]
This patch adds a timestamp to qemu's error message logged by
error_report().

A logic calculating a time is copied from libvirt, src/util/virtime.c.

[TODO]
Other functions below are used to output error messages in qemu.
 - qerror_report() is called in many source codes.
 - fprintf() is called in vl.c.

A timestamp should be added to these messages as well.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
---
 util/qemu-error.c |   90 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 90 insertions(+), 0 deletions(-)

-- 1.7.1
Laszlo Ersek - Feb. 1, 2013, 6:28 p.m.
Hello Seiji-san,

On 02/01/13 15:53, Seiji Aguchi wrote:

> A logic calculating a time is copied from libvirt, src/util/virtime.c.

> +static void error_print_timestamp(void)

Are gmtime() + strftime() unsuitable for some reason?

Also, since the timestamp is ultimately printed with millisecond
resolution, clock_gettime(CLOCK_REALTIME) could be replaced with the
more portable gettimeofday().

Thanks,
Laszlo
Seiji Aguchi - Feb. 1, 2013, 8:25 p.m.
Hi Laszlo,

Thank you for reviewing my patch!
I will update my patch in accordance with your comment, using gmtime() + strftime() and gettimeofday().

Seiji
Eric Blake - Feb. 1, 2013, 9:35 p.m.
On 02/01/2013 11:28 AM, Laszlo Ersek wrote:
> Hello Seiji-san,
> 
> On 02/01/13 15:53, Seiji Aguchi wrote:
> 
>> A logic calculating a time is copied from libvirt, src/util/virtime.c.
> 
>> +static void error_print_timestamp(void)
> 
> Are gmtime() + strftime() unsuitable for some reason?

They are not async-signal safe, so they are not usable in between a
fork() and exec*().  Libvirt avoids them because it DOES log information
including timestamps in between fork/exec (and prior to writing
libvirt's virtime.c, we DID hit cases where libvirt would deadlock a
child process due to the non-safe use of a more naive timestamp
generator), but I don't know if qemu suffers from the same restriction
of when it has anything worth logging.
Seiji Aguchi - Feb. 1, 2013, 10:41 p.m.
> > Are gmtime() + strftime() unsuitable for some reason?

> 

> They are not async-signal safe, so they are not usable in between a

> fork() and exec*().  Libvirt avoids them because it DOES log information including timestamps in between fork/exec (and prior to

> writing libvirt's virtime.c, we DID hit cases where libvirt would deadlock a child process due to the non-safe use of a more naive

> timestamp generator), but I don't know if qemu suffers from the same restriction of when it has anything worth logging.


In my understanding, gmtime() + strftime() is usable in error_report() because it seems to be called 
For just a option check.

>[TODO]

>Other functions below are used to output error messages in qemu.

 >- qerror_report() is called in many source codes.

 >- fprintf() is called in vl.c.


On the other hand, qerror_report() and fprintf () seems to be called in signal handers.
So, in those cases, some signal-safe functions should be used.

Seiji
Markus Armbruster - Feb. 2, 2013, 7:23 a.m.
Seiji Aguchi <seiji.aguchi@hds.com> writes:

>> > Are gmtime() + strftime() unsuitable for some reason?
>> 
>> They are not async-signal safe, so they are not usable in between a
>> fork() and exec*().  Libvirt avoids them because it DOES log
>> information including timestamps in between fork/exec (and prior to
>> writing libvirt's virtime.c, we DID hit cases where libvirt would
>> deadlock a child process due to the non-safe use of a more naive
>> timestamp generator), but I don't know if qemu suffers from the same
>> restriction of when it has anything worth logging.
>
> In my understanding, gmtime() + strftime() is usable in error_report()
> because it seems to be called
> For just a option check.

error_report() is not bound to a particular task.  It simply reports
either to the current monitor or to stderr.  It's certainly not usuable
in a signal handler, or similar signal-unsafe context.

>>[TODO]
>>Other functions below are used to output error messages in qemu.
>  >- qerror_report() is called in many source codes.
>  >- fprintf() is called in vl.c.
>
> On the other hand, qerror_report() and fprintf () seems to be called
> in signal handers.

fprintf() in a signal handler?  Where?

> So, in those cases, some signal-safe functions should be used.
Markus Armbruster - Feb. 2, 2013, 7:31 a.m.
Seiji Aguchi <seiji.aguchi@hds.com> writes:

> [Issue]
> When we offer a customer support service and a problem happens 
> in a customer's system, we try to understand the problem by 
> comparing what the customer reports with message logs of the 
> customer's system.
>
> In this case, we often need to know when the problem happens.
>
> But, currently, there is no timestamp in qemu's error messages.
> Therefore, we may not be able to understand the problem based on
> error messages.
>
> [Solution]
> This patch adds a timestamp to qemu's error message logged by
> error_report().
>
> A logic calculating a time is copied from libvirt, src/util/virtime.c.

Do we really want to add timestamps to error messages unconditionally?
I don't doubt it's useful in your scenario, but most of the time it's
just annoying clutter.

> [TODO]
> Other functions below are used to output error messages in qemu.
>  - qerror_report() is called in many source codes.

Not a problem, it ends up in qerror_print(), which calls error_report().

>  - fprintf() is called in vl.c.

All over the place, not just vl.c.  Also monitor_printf().

> A timestamp should be added to these messages as well.

[...]
Stefan Hajnoczi - Feb. 4, 2013, 9:18 a.m.
On Sat, Feb 02, 2013 at 08:31:45AM +0100, Markus Armbruster wrote:
> Seiji Aguchi <seiji.aguchi@hds.com> writes:
> 
> > [Issue]
> > When we offer a customer support service and a problem happens 
> > in a customer's system, we try to understand the problem by 
> > comparing what the customer reports with message logs of the 
> > customer's system.
> >
> > In this case, we often need to know when the problem happens.
> >
> > But, currently, there is no timestamp in qemu's error messages.
> > Therefore, we may not be able to understand the problem based on
> > error messages.
> >
> > [Solution]
> > This patch adds a timestamp to qemu's error message logged by
> > error_report().
> >
> > A logic calculating a time is copied from libvirt, src/util/virtime.c.
> 
> Do we really want to add timestamps to error messages unconditionally?
> I don't doubt it's useful in your scenario, but most of the time it's
> just annoying clutter.

Agreed, I think it should be an option.  Also remember that management
tools can pass a pipe as stderr when starting QEMU, and then they can
add their preferred timestamping/log formatting outside of QEMU.

Stefan
Markus Armbruster - Feb. 4, 2013, 10:29 a.m.
Stefan Hajnoczi <stefanha@gmail.com> writes:

> On Sat, Feb 02, 2013 at 08:31:45AM +0100, Markus Armbruster wrote:
>> Seiji Aguchi <seiji.aguchi@hds.com> writes:
>> 
>> > [Issue]
>> > When we offer a customer support service and a problem happens 
>> > in a customer's system, we try to understand the problem by 
>> > comparing what the customer reports with message logs of the 
>> > customer's system.
>> >
>> > In this case, we often need to know when the problem happens.
>> >
>> > But, currently, there is no timestamp in qemu's error messages.
>> > Therefore, we may not be able to understand the problem based on
>> > error messages.
>> >
>> > [Solution]
>> > This patch adds a timestamp to qemu's error message logged by
>> > error_report().
>> >
>> > A logic calculating a time is copied from libvirt, src/util/virtime.c.
>> 
>> Do we really want to add timestamps to error messages unconditionally?
>> I don't doubt it's useful in your scenario, but most of the time it's
>> just annoying clutter.
>
> Agreed, I think it should be an option.  Also remember that management
> tools can pass a pipe as stderr when starting QEMU, and then they can
> add their preferred timestamping/log formatting outside of QEMU.

That's actually how I'd solve the problem.
Seiji Aguchi - Feb. 4, 2013, 3:58 p.m.
> > Do we really want to add timestamps to error messages unconditionally?
> > I don't doubt it's useful in your scenario, but most of the time it's
> > just annoying clutter.
> 
> Agreed, I think it should be an option.  

OK. I will add the timestamp as an option.

> Also remember that management tools can pass a pipe as stderr when starting QEMU, and
> then they can add their preferred timestamping/log formatting outside of QEMU.

To get an exact timestamp, qemu should supply it to the messages.
Also, there are some customers who use qemu in a small environment.
In that case, they don't use the management like vdsm.

Seiji
Seiji Aguchi - Feb. 4, 2013, 4:09 p.m.
> > [TODO]
> > Other functions below are used to output error messages in qemu.
> >  - qerror_report() is called in many source codes.
> 
> Not a problem, it ends up in qerror_print(), which calls error_report().

Thanks.

> 
> >  - fprintf() is called in vl.c.
> 
> All over the place, not just vl.c.  Also monitor_printf().

I will take a look at the code in monitor_printf() and fprintf() in other places.

> > On the other hand, qerror_report() and fprintf () seems to be called
> > in signal handers.
> 
> fprintf() in a signal handler?  Where?

Sorry, I misunderstood the code.
In signal handers, qemu just set a value like shutdown_signal and shutdown_pid or so.

Seiji
Stefan Hajnoczi - Feb. 5, 2013, 12:40 p.m.
On Mon, Feb 04, 2013 at 03:58:29PM +0000, Seiji Aguchi wrote:
> > Also remember that management tools can pass a pipe as stderr when starting QEMU, and
> > then they can add their preferred timestamping/log formatting outside of QEMU.
> 
> To get an exact timestamp, qemu should supply it to the messages.
> Also, there are some customers who use qemu in a small environment.
> In that case, they don't use the management like vdsm.

Fair enough, I think an option doesn't hurt anyone.

Stefan
Laszlo Ersek - April 4, 2013, 2:54 p.m.
On 02/01/13 15:53, Seiji Aguchi wrote:

>  /*
>   * Print an error message to current monitor if we have one, else to stderr.
>   * Format arguments like sprintf().  The result should not contain
> @@ -207,6 +296,7 @@ void error_report(const char *fmt, ...)
>  {
>      va_list ap;
>  
> +    error_print_timestamp();
>      error_print_loc();
>      va_start(ap, fmt);
>      error_vprintf(fmt, ap);
> -- 1.7.1

Side note: strictly in theory, this would result in two vfprintf()
calls. The message log would remain "record oriented", but (again, in
theory) another thread *might* get interleaved and mess up our format
with a parallel call to error_report() (or more deeply, to fprintf()).

Importantly I'm not talking about "corrupting data"; stdio streams are
automatically locked by the fprintf() family. The thing (theoretically,
possibly) corrupted would be our record-oriented message format, by
interleaved printfs.

(a) I'm not sure if this is possible at all in qemu.

(b) Anyway, there are two ways to fix it:

(b1) In error_report(), lock the stream across the two printfs with
flockfile(). Probably overkill, and in case we're printing to the
monitor, wasteful/useless. Or,

(b2) Format the full message (including the timestamp) into a buffer
(sprintf, vsprintf(), or their glib wrappers with automatic allocation,
if any) and print it with a single error_printf("%s", buf).

Anyway I absolutely do not insist on this, so sorry for the noise.

Thanks
Laszlo
Paolo Bonzini - April 4, 2013, 3:20 p.m.
Il 04/04/2013 16:54, Laszlo Ersek ha scritto:
> Side note: strictly in theory, this would result in two vfprintf()
> calls. The message log would remain "record oriented", but (again, in
> theory) another thread *might* get interleaved and mess up our format
> with a parallel call to error_report() (or more deeply, to fprintf()).
> 
> Importantly I'm not talking about "corrupting data"; stdio streams are
> automatically locked by the fprintf() family. The thing (theoretically,
> possibly) corrupted would be our record-oriented message format, by
> interleaved printfs.
> 
> (a) I'm not sure if this is possible at all in qemu.

It would be one more thing that is protected by the big QEMU lock.

Regarding your other comment, please use gmtime_r, not gmtime.

Paolo

> (b) Anyway, there are two ways to fix it:
> 
> (b1) In error_report(), lock the stream across the two printfs with
> flockfile(). Probably overkill, and in case we're printing to the
> monitor, wasteful/useless. Or,

> (b2) Format the full message (including the timestamp) into a buffer
> (sprintf, vsprintf(), or their glib wrappers with automatic allocation,
> if any) and print it with a single error_printf("%s", buf).
> 
> Anyway I absolutely do not insist on this, so sorry for the noise.

Patch

diff --git a/util/qemu-error.c b/util/qemu-error.c
index 08a36f4..44d0fc1 100644
--- a/util/qemu-error.c
+++ b/util/qemu-error.c
@@ -196,6 +196,95 @@  void error_print_loc(void)
     }
 }
 
+
+#define SECS_PER_HOUR   (60 * 60)
+#define SECS_PER_DAY    (SECS_PER_HOUR * 24)
+#define DIV(a, b) ((a) / (b) - ((a) % (b) < 0))
+#define LEAPS_THRU_END_OF(y) (DIV(y, 4) - DIV(y, 100) + DIV(y, 400))
+
+const unsigned short int __mon_yday[2][13] = {
+    /* Normal years.  */
+    { 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365 },
+    /* Leap years.  */
+    { 0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335, 366 }
+};
+
+#define is_leap_year(y) \
+    ((y) % 4 == 0 && ((y) % 100 != 0 || (y) % 400 == 0))
+
+static void error_print_timestamp(void)
+{
+    struct timespec ts;
+    unsigned long long now;
+
+    struct tm fields;
+    long int days, rem, y;
+    const unsigned short int *ip;
+    unsigned long long whenSecs;
+    unsigned int offset = 0; /* We hardcoded GMT */
+
+    if (clock_gettime(CLOCK_REALTIME, &ts) < 0) {
+        return;
+    }
+
+    now = (ts.tv_sec * 1000ull) + (ts.tv_nsec / (1000ull * 1000ull));
+    /* This code is taken from GLibC under terms of LGPLv2+ */
+
+    whenSecs = now / 1000ull;
+
+    days = whenSecs / SECS_PER_DAY;
+    rem = whenSecs % SECS_PER_DAY;
+    rem += offset;
+    while (rem < 0) {
+        rem += SECS_PER_DAY;
+        --days;
+    }
+    while (rem >= SECS_PER_DAY) {
+        rem -= SECS_PER_DAY;
+        ++days;
+    }
+    fields.tm_hour = rem / SECS_PER_HOUR;
+    rem %= SECS_PER_HOUR;
+    fields.tm_min = rem / 60;
+    fields.tm_sec = rem % 60;
+    /* January 1, 1970 was a Thursday.  */
+    fields.tm_wday = (4 + days) % 7;
+    if (fields.tm_wday < 0) {
+        fields.tm_wday += 7;
+    }
+    y = 1970;
+
+    while (days < 0 || days >= (is_leap_year(y) ? 366 : 365)) {
+        /* Guess a corrected year, assuming 365 days per year.  */
+        long int yg = y + days / 365 - (days % 365 < 0);
+
+      /* Adjust DAYS and Y to match the guessed year.  */
+      days -= ((yg - y) * 365
+               + LEAPS_THRU_END_OF(yg - 1)
+               - LEAPS_THRU_END_OF(y - 1));
+      y = yg;
+    }
+    fields.tm_year = y - 1900;
+
+    fields.tm_yday = days;
+    ip = __mon_yday[is_leap_year(y)];
+    for (y = 11; days < (long int) ip[y]; --y) {
+        continue;
+    }
+
+    days -= ip[y];
+    fields.tm_mon = y;
+    fields.tm_mday = days + 1;
+
+    error_printf(
+           "%4d-%02d-%02d %02d:%02d:%02d.%03lld+0000 ",
+           fields.tm_year + 1900, fields.tm_mon + 1, fields.tm_mday,
+           fields.tm_hour, fields.tm_min, fields.tm_sec,
+           now % 1000);
+
+    return;
+}
+
 /*
  * Print an error message to current monitor if we have one, else to stderr.
  * Format arguments like sprintf().  The result should not contain
@@ -207,6 +296,7 @@  void error_report(const char *fmt, ...)
 {
     va_list ap;
 
+    error_print_timestamp();
     error_print_loc();
     va_start(ap, fmt);
     error_vprintf(fmt, ap);