[committed,buildbot] Replace the aarch64 build slave

Message ID 5049068a-9243-e699-5dda-bde6d97f832c@arm.com
State New

Commit Message

Szabolcs Nagy Oct. 5, 2018, 10:22 a.m.
This one is a thunderx machine.

(the other one was down for a while now.)

i assume the slave will be able to connect once there is a server restart.

Comments

Tulio Magno Quites Machado Filho Oct. 8, 2018, 1:53 p.m. | #1
Szabolcs Nagy <szabolcs.nagy@arm.com> writes:

> This one is a thunderx machine.
>
> (the other one was down for a while now.)
>
> i assume the slave will be able to connect once there is a server restart.

The server has just been restarted.

If the new slave doesn't reconnect in the following minutes, we'll have to
analyze its log.

Thanks!
Szabolcs Nagy Oct. 8, 2018, 2:46 p.m. | #2
On 08/10/18 14:53, Tulio Magno Quites Machado Filho wrote:
> Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
>> i assume the slave will be able to connect once there is a server restart.
> 
> The server has just been restarted.
> 
> If the new slave doesn't reconnect in the following minutes, we'll have to
> analyze its log.

sorry i stopped the slaves, since it could not connect previously.

now i restarted it and it fails with

2018-10-08 14:44:22+0000 [-] Connection to 144.217.14.79:9989 failed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.TimeoutError'>: User timeout caused connection failure.
Tulio Magno Quites Machado Filho Oct. 8, 2018, 3:23 p.m. | #3
Szabolcs Nagy <Szabolcs.Nagy@arm.com> writes:

> On 08/10/18 14:53, Tulio Magno Quites Machado Filho wrote:
>> Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
>>> i assume the slave will be able to connect once there is a server restart.
>> 
>> The server has just been restarted.
>> 
>> If the new slave doesn't reconnect in the following minutes, we'll have to
>> analyze its log.
>
> sorry i stopped the slaves, since it could not connect previously.
>
> now i restarted it and it fails with
>
> 2018-10-08 14:44:22+0000 [-] Connection to 144.217.14.79:9989 failed: [Failure instance: Traceback (failure with no frames): <class
> 'twisted.internet.error.TimeoutError'>: User timeout caused connection failure.

That port is wrong.  It should have been 9991.
You have to change that in the buildbot.tac:

port = 9991
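
The timeout in the log is consistent with the slave dialing a port the master is not listening on. As a minimal sketch, using only the Python standard library, one could check from the slave host whether a given master port is reachable before editing buildbot.tac. The helper name `can_connect` and the 5-second default timeout are illustrative assumptions and are not part of buildbot.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (including socket.timeout) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, calling `can_connect("144.217.14.79", 9989)` and `can_connect("144.217.14.79", 9991)` from the slave would show which of the two ports accepts connections. A successful TCP connection only shows that something is listening there, not that the slave's credentials are accepted.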
Szabolcs Nagy Oct. 8, 2018, 4:53 p.m. | #4
On 08/10/18 16:23, Tulio Magno Quites Machado Filho wrote:
> Szabolcs Nagy <Szabolcs.Nagy@arm.com> writes:
> 
>> On 08/10/18 14:53, Tulio Magno Quites Machado Filho wrote:
>>> Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
>>>> i assume the slave will be able to connect once there is a server restart.
>>>
>>> The server has just been restarted.
>>>
>>> If the new slave doesn't reconnect in the following minutes, we'll have to
>>> analyze its log.
>>
>> sorry i stopped the slaves, since it could not connect previously.
>>
>> now i restarted it and it fails with
>>
>> 2018-10-08 14:44:22+0000 [-] Connection to 144.217.14.79:9989 failed: [Failure instance: Traceback (failure with no frames): <class
>> 'twisted.internet.error.TimeoutError'>: User timeout caused connection failure.
> 
> That port is wrong.  It should have been 9991.
> You have to change that in the buildbot.tac:
> 
> port = 9991
> 

thanks, fixed, and updated the wiki to mention the nondefault port.
Szabolcs Nagy Oct. 9, 2018, 9:55 a.m. | #5
On 08/10/18 17:53, Szabolcs Nagy wrote:
> On 08/10/18 16:23, Tulio Magno Quites Machado Filho wrote:
>> That port is wrong.  It should have been 9991.
>> You have to change that in the buildbot.tac:
>>
>> port = 9991
> 
> thanks, fixed, and updated the wiki to mention the nondefault port.

the first build is red, there are two failures, both are timeouts:

libio/tst-readline takes more than 80s
nss/tst-nss-files-hosts-multi takes about 30s

i assume it's ok to set TIMEOUTFACTOR=5 in the bot environment
or should we raise the TIMEOUT of these particular tests?

XPASS: elf/tst-protected1a
XPASS: elf/tst-protected1b
UNSUPPORTED: iconv/tst-gconv-init-failure
FAIL: libio/tst-readline
UNSUPPORTED: math/test-fesetexcept-traps
UNSUPPORTED: math/test-fexcept-traps
UNSUPPORTED: math/test-nearbyint-except-2
UNSUPPORTED: misc/tst-pkey
UNSUPPORTED: nptl/test-cond-printers
UNSUPPORTED: nptl/test-condattr-printers
UNSUPPORTED: nptl/test-mutex-printers
UNSUPPORTED: nptl/test-mutexattr-printers
UNSUPPORTED: nptl/test-rwlock-printers
UNSUPPORTED: nptl/test-rwlockattr-printers
FAIL: nss/tst-nss-files-hosts-multi
UNSUPPORTED: posix/tst-spawn4-compat
UNSUPPORTED: resolv/tst-resolv-ai_idn
UNSUPPORTED: resolv/tst-resolv-ai_idn-latin1
Summary of test results:
      2 FAIL
   5815 PASS
     14 UNSUPPORTED
     17 XFAIL
      2 XPASS
Makefile:401: recipe for target 'tests' failed
make[1]: *** [tests] Error 1
Joseph Myers Oct. 9, 2018, 11:44 a.m. | #6
On Tue, 9 Oct 2018, Szabolcs Nagy wrote:

> the first build is red, there are two failures, both are timeouts:
> 
> libio/tst-readline takes more than 80s
> nss/tst-nss-files-hosts-multi takes about 30s
> 
> i assume it's ok to set TIMEOUTFACTOR=5 in the bot environment
> or should we raise the TIMEOUT of these particular tests?

If only a few tests are timing out, and there are good reasons for them to 
time out on slow systems (amount of processing or I/O involved), then I 
think raising those tests' TIMEOUT is appropriate.
Tulio Magno Quites Machado Filho Oct. 9, 2018, 12:53 p.m. | #7
Joseph Myers <joseph@codesourcery.com> writes:

> On Tue, 9 Oct 2018, Szabolcs Nagy wrote:
>
>> the first build is red, there are two failures, both are timeouts:
>> 
>> libio/tst-readline takes more than 80s
>> nss/tst-nss-files-hosts-multi takes about 30s
>> 
>> i assume it's ok to set TIMEOUTFACTOR=5 in the bot environment
>> or should we raise the TIMEOUT of these particular tests?
>
> If only a few tests are timing out, and there are good reasons for them to 
> time out on slow systems (amount of processing or I/O involved), then I 
> think raising those tests' TIMEOUT is appropriate.

I agree with Joseph.

But answering your initial question: we can indeed change TIMEOUTFACTOR in the
bot.
We can tune it for each slave, if necessary.
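
To make the two options concrete, here is a rough sketch of the arithmetic. It assumes that the test harness multiplies each test's TIMEOUT by TIMEOUTFACTOR and that a test without its own TIMEOUT gets 20 seconds; both are assumptions about the harness and are not stated in this thread. The per-slave table and the function names are hypothetical.

```python
# Assumed harness default, in seconds, for tests that do not set TIMEOUT.
DEFAULT_TIMEOUT = 20

# Hypothetical per-slave factors; slaves not listed keep a factor of 1.
SLAVE_TIMEOUTFACTOR = {"tx1-ubuntu-aarch64": 5}


def effective_timeout(test_timeout=DEFAULT_TIMEOUT, factor=1):
    """Seconds a test may run, assuming timeout = TIMEOUT * TIMEOUTFACTOR."""
    return test_timeout * factor


def slave_env(slave_name):
    """Extra environment variables to export for builds on `slave_name`."""
    factor = SLAVE_TIMEOUTFACTOR.get(slave_name, 1)
    # Only export the variable when it differs from the implicit factor of 1.
    return {"TIMEOUTFACTOR": str(factor)} if factor != 1 else {}
```

Under these assumptions, a factor of 5 would allow 100 seconds per test. That covers a test taking about 30 seconds, and covers one taking "more than 80s" only if it still finishes within 100 seconds. Raising TIMEOUT in the individual tests changes the first argument instead and leaves every other test untouched.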

Patch

diff --git a/master.cfg b/master.cfg
index 164d309..701def3 100644
--- a/master.cfg
+++ b/master.cfg
@@ -26,7 +26,7 @@  builder_map = {
   'glibc-ppc-linux': ['debian8-ppc-power8-1'],
   'glibc-ppc64le-linux': ['fedora25-ppc64le-power8-1'],
   'glibc-s390x-linux': ['marist-fedora-s390x'],
-  'glibc-aarch64-linux': ['reservedbit-xgene-ubuntu-aarch64'],
+  'glibc-aarch64-linux': ['tx1-ubuntu-aarch64'],
 }
 
 # Sets with all builders and all slaves.
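
The trailing comment in the hunk suggests that master.cfg derives its sets of builders and slaves from `builder_map`. The following is a sketch of one way to do that, not the actual contents of master.cfg. It reproduces only the entries visible in the hunk, and the names `all_builders` and `all_slaves` are hypothetical.

```python
# Entries visible in the hunk above, after the patch is applied.
builder_map = {
    'glibc-ppc-linux': ['debian8-ppc-power8-1'],
    'glibc-ppc64le-linux': ['fedora25-ppc64le-power8-1'],
    'glibc-s390x-linux': ['marist-fedora-s390x'],
    'glibc-aarch64-linux': ['tx1-ubuntu-aarch64'],
}

# Builder names are the dictionary keys.
all_builders = set(builder_map)

# Slave names are the union of every builder's slave list.
all_slaves = {slave for slaves in builder_map.values() for slave in slaves}
```

With a derivation of this kind, replacing the aarch64 slave in `builder_map` would be enough for the old slave name to drop out of the slave set.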