Friday, June 22, 2012

Yellow Squad Weekly Retrospective Minutes: June 22

Introduction

What is this post?

I'm the lead for the "Yellow" squad in Canonical's collection of geographically distributed, agile squads.  We're directed to work as needed on various web and cloud projects and technologies.  Every Friday, our squad has a call to review what happened in the past week and see what we can learn from it.  We follow a simple, evolving format that we keep track of on a wiki.  This post contains the minutes of one of those meetings.

Why read it?

The point of the meeting, and of these minutes, is to share and learn.  We'd be happy if you do both of those.  You might be interested in our technical topics, in the problems we encounter, or in the process changes we try to make based on our successes and failures.

What are we working on right now?

Our current project is applying LXC virtualization to the 5+ hour test suite of the Launchpad web application.  By parallelizing the test suite across lightweight virtual machines on the same box, we've gotten the time down to under 40 minutes.  That's still not ideal, but it is a whole lot better.

Now read the minutes!

Attendance

Attending: the gang's all here (bac benji frankban gary_poster gmb)
(These are freenode.net nicks)

Project plan

  • For a brief moment, we achieved our goal of a 95% parallel test success rate for the first time, as measured by our statistical approach of looking at the test runs in a rolling window of the last three days.  We've been moving between an 80% and 95% success rate this week.
  • With Serge Hallyn's help we have a workaround for bugs 994752 and 1014916.  If you are starting seven or more Lucid LXC containers at once and care about start-up time, you probably want this workaround too.  It gives us increased reliability and shaves about three minutes off our test run time.
  • We have chosen to reduce concurrency on the 32-hyperthread machines to 20 simultaneous LXC containers.  This seems to give slightly yet noticeably better timing than our other experiments with 32, 24, and 16 containers.
  • However, we still see timeouts as described in bugs 974617, 1011847, and 1002820 (as discussed last week), and these are the sources of our only recurring failures now.  We adjusted our approach last week, but it did not eliminate the timeouts.  We're going to increase the timeouts one more time, and then go back to the drawing board.
  • As mentioned last week, we found and addressed one issue with testrepository parallel LXC workers completing at very different times, but it is still a problem.  The first worker to finish is now typically about seven minutes earlier than the last worker to finish within a given test run.  We figure contention of some sort might be throwing a random spanner in the works, or the test timing and scheduling is too far off from ideal, or the division of layer setup across the workers introduces too much variability for the scheduling to do a good job.  We have a low-priority task to investigate further.
  • The two new 24-core machines in the data center that will actually run the virtualized tests in production are supposed to arrive within the next couple of business days.
  • We are landing kanban cards toward our lpsetup stretch goal.  We'll be talking with matsubara to hopefully set up tarmac for the project, and maybe Jenkins later for some integration tests.

Action Items

[None]

New tricks

gary_poster: when you think of something to share for the weekly retrospective call, why not write it on the wiki page?

As a follow-on to benji's suggestion to have daily calendar reminders to think about things to share, why not write down any topics you think of on the wiki page?  It might give us something like an agenda.
benji: how would we write them--just cryptic notes to remind ourselves, or a full write-up?  gary_poster: I think just notes are fine.

gary_poster: When starting a new project, look at our checklist and jml's

We're officially starting a new stretch project now with lpsetup.  We have our own tiny baby checklist for a project.  It also links to a getting-fabulously-better-and-yet-ever-depressingly-larger checklist that jml has been working on.

The only real message in our checklist is "hey, prototypes are cool, and competing prototypes especially.  And follow those other rules too, whydoncha."  jml's is a lot more comprehensive (and he's looking for help with automation, if you are interested!).

gary_poster: When apt-get fails...

Our juju charm sometimes fails on ec2 with errors like this:

    subprocess.CalledProcessError: Command '['apt-get', 'install', '-y', '--force-yes', u'your-package-name']' returned non-zero exit status 100

We've seen this before, but I forgot, so I'm sharing it now.  This is caused by an apt cache whose hashes don't match the packages.  You can resolve it with apt-get clean on the cloud machine (then locally use juju resolved --retry your_service_name/0 and wait for the install error to go away).

It would be really nice to add this automation to the Python charm helpers once they are packaged and usable (waiting on bug 1016588).  If the install fails with exit status 100, we would automatically try an apt-get clean and retry.
benji: what about just always clearing the cache first?  gary_poster: if the charm is otherwise relatively fast to start and the cache is fine, the extra time would be a loss that might be noticeable.
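
To make the idea concrete, here is a minimal sketch of what that automatic retry could look like.  This is only an illustration of the approach, not the actual charm helpers API, and the helper name is made up.

    import subprocess

    def apt_install(packages):
        # Hypothetical helper: install packages, retrying once after an
        # 'apt-get clean' if apt-get exits with status 100 (which we have
        # seen when the apt cache hashes don't match the packages).
        command = ['apt-get', 'install', '-y', '--force-yes'] + list(packages)
        try:
            subprocess.check_call(command)
        except subprocess.CalledProcessError as error:
            if error.returncode != 100:
                raise
            subprocess.check_call(['apt-get', 'clean'])
            subprocess.check_call(command)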

benji: bzr negative ignore

A neat trick benji learned this week: bzr supports negative ignore patterns, marked with a bang ("!"), like "!pattern".  This came in handy when he wanted everything inside a log directory to be ignored except a README.
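
For example (the paths here are illustrative, not the actual branch layout), the relevant .bzrignore entries might look something like this:

    logs/*
    !logs/README

The first pattern ignores everything under logs/, and the bang pattern then exempts the README so it can stay versioned.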

frankban: Python's inspect module

Python's inspect module was helpful this week for a tough analysis of what was going on.  You can look back through the frames of a given call.  It can help when pdb is not an option because code is split up across threads or processes, because stdin or stdout are being used, or because there are too many call sites and you need to come up with data to analyze rather than stepping through something.

benji: it is nice for profiling too.  You can log one level back, and then two levels back, and so on, when the standard profiling tools don't work for one reason or another.  gary_poster: traceback module is less fine-grained but can be more quickly convenient for some tasks.
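
As a rough illustration of the frame-walking idea (the helper below is made up for this example, not code from the actual analysis), something like this can be dropped into a suspect call site to record where it was called from:

    import inspect

    def describe_callers(levels=3):
        # Hypothetical helper: print the file, line, and function for a few
        # frames above the caller, as a cheap alternative to stepping
        # through with pdb.
        frame = inspect.currentframe().f_back  # skip this helper's own frame
        for depth in range(1, levels + 1):
            if frame is None:
                break
            info = inspect.getframeinfo(frame)
            print('%d: %s:%s in %s'
                  % (depth, info.filename, info.lineno, info.function))
            frame = frame.f_back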

gmb: Beware: global state hates you: doctest die in a fire part XXXII

The doctest module mucks with stdout, stderr, __stdout__, and __stderr__ by its very nature.  This can make debugging particularly unpleasant when you yourself are doing things with stdout and stderr.  Our solution was to convert the doctests to unittests.  bac: The testtools doctest matcher makes converting from doctest to unit test a lot easier.
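
For reference, the kind of conversion bac mentions can look roughly like this (the code under test and its output are invented for the example):

    import doctest

    from testtools import TestCase
    from testtools.matchers import DocTestMatches

    class ExampleTest(TestCase):
        # A sketch only: keeps the doctest-style ellipsis matching while
        # running as an ordinary unit test.
        def test_output_looks_right(self):
            output = "Created Launchpad object <0x1234abcd>\n"
            self.assertThat(
                output,
                DocTestMatches(
                    "Created Launchpad object <...>\n", doctest.ELLIPSIS))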

Successes

gary_poster: lpsetup came from our first trashed prototype.  Yay!

We had setuplxc, a script that we used to initialize our ec2 instances for parallel testing.  It ended up being a prototype that we trashed and rewrote into lpsetup, our current stretch project.  This is the first time we followed our resolution to prototype, trash, and rewrite, and it has worked well.  frankban: releasing early means you can refactor early.

gary_poster: can we learn anything from frankban's successful analysis of bug 1015318?

We already discussed the inspect module in relation to this bug.  It let him gain necessary knowledge of what's going on in distant parts of the code (transaction code and database wrappers).

Pain

gary_poster: do we want lpsetup integration tests?

Our lpsetup code has a nice set of unit tests of its infrastructure and helpers, but no integration tests--and therefore, effectively, no tests of the commands themselves.

We know from experience that a full run of the code to create a working lxc launchpad environment takes about an hour on ec2.  Full integration tests could take multiples of that.

Do we want integration tests of lpsetup?  If so, what are their goals?  Can we use mocks or stubs to keep from actually running the real commands, and spending hours running tests, or is that pointless?  How valuable would these tests be?  Will reports from users tell us of problems at about the same speed as the integration tests?

benji/frankban: we could run commands in an LXC ephemeral container to get a full end-to-end test that is cleanly thrown away at the end.  LXC containers can nest now so it could work.

benji: we can write some tests that verify that the commands still basically work.  For instance, yesterday I made sure that the help command still worked and showed the expected information.  We could automate that and have quick tests.  gary_poster: that sounds like smoke tests, and it would be nice to have those in the test suite as a first step.  frankban: our infrastructure has us write commands as steps, where each step is a function.  Another smoke test might be to make sure that we have the steps we expect in the order we expect.
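
A first cut at the help-command smoke test benji describes might look something like the sketch below; the exact lpsetup invocation and the expected text are assumptions, not the project's real interface.

    import subprocess
    import unittest

    class HelpSmokeTest(unittest.TestCase):
        # Hypothetical smoke test: run the command-line help and check that
        # it at least produces the usual usage text without crashing.
        def test_help_shows_usage(self):
            output = subprocess.check_output(
                ['lpsetup', '--help'], stderr=subprocess.STDOUT)
            self.assertIn(b'usage', output.lower())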

ACTION: gary_poster will make a kanban card to create a first cut at smoke tests for all of our subcommands.

ACTION: gary_poster will make a kanban card to make the tarmac gatekeeper enforce our test coverage expectations (the code review of the enforcement will also include the discussion as to what we are enforcing as a first cut).

gary_poster: sometimes when I write tests with mocks/stubs I feel like I'm just copying the same information from the source to the test, with different spelling.  It feels like using stubs for this code would be like that.  In that situation, what's the value?  benji: when I feel like that, I usually find that it is because I'm writing tests of the "happy path" and not of the exceptions.

Consensus: we would like a guarantee that the code actually builds a working Launchpad environment before it is released.  We want true integration tests, rather than merely mocks/stubs.  How could we do that?  benji: we could have the integration tests in Jenkins, with tarmac using only the unit tests to gate commits to lpsetup.

How do we use the Jenkins tests to keep bad code from being released?  gary_poster: lpsetup is packaged, so we could use the results to decide whether we manually build the lpsetup package.  This could perhaps be automated in a variety of ways (could Jenkins trigger a PPA build of a specific revision?  Could we have a second branch that accepts Jenkins/integration-blessed revisions, with PPAs built daily from it?).

benji: if we do these integration tests, we should probably first have a card for an integration test prototype, so we can figure out how to do it.

gary_poster: do we allow tests of one subcommand to build off the state generated by another subcommand?  Don't we have bad experience with that?  bac: yes, we do, but how else would we do it in this case?  [We have no answer, so that is how we'd do it.]

gary_poster: the integration test ideas sound great, and I want them, but they sound expensive.  We do not have an unlimited runway for this--we won't necessarily be working on it until it's done, and in fact we could be pulled off of this stretch goal project in two or three weeks.  I'd rather have something released that is better than what we had, instead of something discarded that was supposed to have been even better.  Given unit tests and smoke tests, are integration tests something we should discard or postpone?  Should we timebox them?  Or are they essential?

bac: an additional cost is that getting a box from IS for automating integration tests on Jenkins may add enough extra time and effort to make this entirely impractical.  That's a process issue that we have yet to address.

benji, bac, gmb: we vote to postpone integration tests.  frankban: I vote for timeboxed integration tests.

ACTION: gary_poster will create a slack card for investigating integration test approaches.  If someone works on this in slack time and shows us a way forward, we'll open this conversation again.  Until that point, or until we successfully release lpsetup for developer usage, they are postponed and effectively discarded.

gmb: what can we learn from fixing the failing zope.testing fork tests?  It never should have happened

[Editor's note: The kanban card for this task took more than a day to move out of coding, so it automatically became a topic for the weekly call, per our checklist.]

The Launchpad project has a longstanding fork of zope.testing.  Some of the tests started failing a year or more ago.  Since the yellow squad started working with it, we fixed many of the broken tests and documented the remaining three that we felt were too much of a bother for their value.  More recently, in the work to clean up the subunit stream, we made a mistake and suddenly broke many of the tests and committed this to our "trunk".  How did this happen?

It simply shouldn't have happened.  We know better.  We shouldn't have a fork, we shouldn't have commits with broken tests, and we shouldn't have a project without a gatekeeper like pqm or tarmac.

gary_poster: following jml's new-project checklist would at least have made us set up a gatekeeper, addressing two of those three problems.  Is getting tarmac for a project cheap enough now that it is reasonable to deploy even for small projects?

ACTION: bac will research how to get and integrate tarmac resources (a testing machine) for a project.  He will first consult with matsubara about this.  The results will be new/improved documentation on how to get tarmac deployed for a project, and/or information on what it would take to make this easier.
