[v2,0/7] colo: Introduce resource agent and test suite/CI

Message ID cover.1591456338.git.lukasstraub2@web.de

Message

Lukas Straub June 6, 2020, 7:17 p.m. UTC
Hello Everyone,
So here is v2. Patch 1 can already be merged independently of the others.

Regards,
Lukas Straub

Changes:
v2:
 -use new yank api
 -drop disk_size parameter
 -introduce pick_qemu_util function and use it

Overview:

Hello Everyone,
These patches introduce a resource agent for fully automatic management of colo
and a test suite building upon the resource agent to extensively test colo.

Test suite features:
-Tests failover with peer crashing and hanging and failover during checkpoint
-Tests network using ssh and iperf3
-Quick test requires no special configuration
-Network test for testing colo-compare
-Stress test: failover all the time with network load
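To illustrate the difference between a crashing and a hanging peer, here is a small sketch of one common way to simulate both on a POSIX host: SIGKILL for a crash and SIGSTOP for a hang. Whether tests/acceptance/colo.py does it this way is an assumption; the helper names are hypothetical and a sleeping Python process stands in for qemu.

```python
import signal
import subprocess
import sys


def spawn_peer():
    """Start a stand-in for the peer qemu process (here: a sleeping child)."""
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )


def hang_peer(proc):
    """Simulate a hang: the process stays alive but stops making progress."""
    proc.send_signal(signal.SIGSTOP)


def crash_peer(proc):
    """Simulate a crash: the process is killed immediately and reaped."""
    proc.kill()
    proc.wait()


if __name__ == "__main__":
    peer = spawn_peer()
    hang_peer(peer)
    print("alive while hung:", peer.poll() is None)
    crash_peer(peer)
    print("exit status after crash:", peer.returncode)
```

A hung peer is the harder case for failover, since its sockets stay open and nothing signals an error; only timeouts reveal the failure.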

Resource agent features:
-Fully automatic management of colo
-Handles many failures: hanging/crashing qemu, replication error, disk error, ...
-Recovers from hanging qemu by using the "yank" oob command
-Tracks which node has up-to-date data
-Works well in clusters with more than 2 nodes
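As a rough illustration of the "yank" oob recovery, the sketch below builds the QMP messages involved. In QMP, out-of-band execution has to be enabled during capability negotiation, and an out-of-band request uses the "exec-oob" key in place of "execute". The layout of "arguments" here follows the yank command as I understand it from later QEMU releases and may differ from the API used in this series; the node name "nbd0" and the helper names are hypothetical.

```python
import json


def qmp_enable_oob():
    """Capability negotiation that turns on out-of-band execution."""
    return {"execute": "qmp_capabilities", "arguments": {"enable": ["oob"]}}


def qmp_yank_oob(instances, cmd_id):
    """Build a 'yank' request to be run out-of-band.

    'exec-oob' replaces 'execute' for out-of-band commands. The layout
    of 'arguments' is an assumption and depends on the yank API version.
    """
    return {
        "exec-oob": "yank",
        "arguments": {"instances": instances},
        "id": cmd_id,
    }


if __name__ == "__main__":
    # "nbd0" is a made-up block node name for illustration.
    msg = qmp_yank_oob([{"type": "block-node", "node-name": "nbd0"}], "yank-1")
    print(json.dumps(qmp_enable_oob()))
    print(json.dumps(msg))
```

Running the command out-of-band matters because a hanging qemu may be blocked in the main loop, where an ordinary in-band command would queue up behind the very operation that is stuck.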

Run times on my laptop:
Quick test: 200s
Network test: 800s (tagged as slow)
Stress test: 1300s (tagged as slow)

The test suite needs access to a network bridge to properly test the network,
so some parameters need to be given to the test run. See
tests/acceptance/colo.py for more information.

I wonder how this integrates with the existing CI infrastructure. Is there a
common CI for qemu where this can run, or does every subsystem have to run its
own CI?

Regards,
Lukas Straub


Lukas Straub (7):
  block/quorum.c: stable children names
  avocado_qemu: Introduce pick_qemu_util to pick qemu utility binaries
  boot_linux.py: Use pick_qemu_util
  colo: Introduce resource agent
  colo: Introduce high-level test suite
  configure,Makefile: Install colo resource-agent
  MAINTAINERS: Add myself as maintainer for COLO resource agent

 MAINTAINERS                               |    6 +
 Makefile                                  |    5 +
 block/quorum.c                            |   20 +-
 configure                                 |   10 +
 scripts/colo-resource-agent/colo          | 1466 +++++++++++++++++++++
 scripts/colo-resource-agent/crm_master    |   44 +
 scripts/colo-resource-agent/crm_resource  |   12 +
 tests/acceptance/avocado_qemu/__init__.py |   15 +
 tests/acceptance/boot_linux.py            |   11 +-
 tests/acceptance/colo.py                  |  677 ++++++++++
 10 files changed, 2251 insertions(+), 15 deletions(-)
 create mode 100755 scripts/colo-resource-agent/colo
 create mode 100755 scripts/colo-resource-agent/crm_master
 create mode 100755 scripts/colo-resource-agent/crm_resource
 create mode 100644 tests/acceptance/colo.py

--
2.20.1

Comments

Lukas Straub July 5, 2020, 9:37 a.m. UTC | #1

Ping...
Philippe Mathieu-Daudé July 14, 2020, 2:33 p.m. UTC | #2

Ping^2?