The Mystery of the Chinese Junk

You know it’s going to be one of those days when, just as you’re about to put on your headphones and get into ‘the zone’, you overhear somebody saying the fateful words ‘ok, then maybe we’ll need to get Dylan to look at it.’

See, amongst the many hats I wear in the course of a given week, there’s one that’s probably labelled ‘dungeon master’. I’m the one who remembers where all the bodies are buried, because – for all sorts of reasons that made very good sense at the time – I probably helped bury most of them. And on this particular day, the source of so much excitement was our venerable Microsoft Dynamics CRM v4 server. It started out with a sort of general grumbling on the support channel about CRM4 being slow… but by the time it was handed over to me to look into, it was beautifully summarised as ‘dude… there’s Chinese in the Windows event log’

And, sure enough, there is – complete with the lovely Courier typeface that Windows Event Viewer kicks into when you get errors so weird that good old Microsoft Sans Serif can’t even display them:


Now, whilst it’s been a while since I’ve done any serious work on our old CRM system, I’m pretty sure it’s not supposed to do that – so we start investigating. Working theory #1: some sort of vulnerability has resulted in attackers injecting Chinese characters into our database – whilst CRM4 is generally pretty well insulated from any public-facing code, there’s one or two places where signup forms would generate CRM Leads, that sort of thing. So we start grepping the entire database for one of the Chinese strings we’ve found in the event log.

Whilst this is going on – and trust me, it takes a while – I decide to share my excitement via the wonder of social media. This turns out to be a Really Good idea, because… well, here’s what happened…

“The incoming tabular data stream TDS RPC protocol stream is incorrect. Parameter (“䐀攀氀攀琀椀漀渀匀琀愀琀攀…” Oh. It’s gonna be one of THOSE days.

— Dylan Beattie (@dylanbeattie) October 18, 2016

@dwm @dylanbeattie The low bits are all null, so this probably UTF-16LE being mistaken for UTF-16BE (or vice versa…?).

— Fake Unicode ⁰ ⁧ (@FakeUnicode) October 18, 2016

You see in @FakeUnicode’s screenshot there, the words ‘DeletionState’ appear quite clearly at the bottom of the message?

Whilst this is going on, our database search comes back reporting that there’s no mysterious Chinese characters in any of our CRM database tables. Which is good, since it means we probably haven’t been compromised. So, next step is to work through that Unicode lead, see if that gets us anywhere. Because .NET has a built-in encoding for big-endian Unicode, this is pretty simple:

var source = "䐀攀氀攀琀椀漀渀匀琀愀琀攀";
var bytes = Encoding.BigEndianUnicode.GetBytes(source);
var result = Encoding.Unicode.GetString(bytes);

Turns out – just as in FakeUnicode’s screenshot – that’s the text “DeletionState” with the byte order flipped. We grabbed a few examples of the ‘Chinese’ text from the event log and ran them through this – sure enough, in every single case it’s a valid CRM database query that’s somehow been flipped into wrong-endian Unicode. At this point we start suspecting some sort of latent bug – this is old software, running on an old operating system, talking to an old database server – and sure enough, a bit of googling turns up a couple of likely-looking issues, most of which are addressed in various updates to SQL Server 2008. We take a VM snapshot in case everything goes horribly wrong, and one of the Ops gang volunteers to work late to get the server patched.

Next morning, turns out the server hasn’t been patched – because every single download of the relevant service pack has been corrupted. At which point all bets are off, because chances are the problem is actually network-related – which also explains where the ‘Chinese’ is coming from.

OK, let’s capture a stream of bytes from somewhere. Like, say, from the TDS data stream used by the MSCRMAsyncService:


What does that say? If you think you know the answer, you’re wrong. Pop off and read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) – done? Awesome. NOW what do you think it says?

See, we have no idea. It’s a stream of bytes. Without some indication of how we’re supposed to interpret those bytes, it’s meaningless. OK, I’ll give you a clue – it’s UTF-16. Now can you tell what it says? No, you can’t – because (1) you don’t know whether it’s big-endian or little-endian, and (2) you don’t know where it started.

If we assume it’s big-endian, then the first byte pair – 00 48 – would encode the character ‘H’, the second byte pair – 00 65 – would encode ‘e’, and so on. If we assume it’s little-endian, then the first byte pair – 00 48 – encodes the character 䠀 – and suddenly the mysterious Chinese characters in the event log start to make sense.
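That’s easy to verify – here’s a quick sketch in Python rather than C#, just because it’s concise:

```python
data = bytes([0x00, 0x48, 0x00, 0x65])  # the byte pairs 00 48, 00 65

# Big-endian: 00 48 is U+0048 'H', 00 65 is U+0065 'e'
print(data.decode("utf-16-be"))  # He

# Little-endian: 00 48 is U+4800, 00 65 is U+6500 - CJK ideographs
print(data.decode("utf-16-le"))  # 䠀攀
```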


Of course, the data stream between the MSCRMAsyncService and the SQL server hasn’t actually flipped from little-endian UTF-16 to big-endian – what’s happened is that the network connection between them is dropping bytes. And if you drop a single byte – or any odd number of bytes – from a little-endian Unicode stream, you get a sort of off-by-one error right along the rest of the data stream, resulting in all sorts of weirdness – including Chinese in the event logs.
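The whole failure mode fits in a few lines of Python (a sketch of the effect, obviously – not the actual CRM code):

```python
# The query text as it should appear on the wire: little-endian UTF-16
data = "DeletionState".encode("utf-16-le")

# Read it with the wrong byte order and you get the 'Chinese'
# from the event log:
print(data.decode("utf-16-be"))  # 䐀攀氀攀琀椀漀渀匀琀愀琀攀

# Drop a single byte and every byte pair after the drop is misaligned,
# which flips the rest of the stream in exactly the same way
# (trimming the final byte too, just to keep the length even):
print(data[1:-1].decode("utf-16-le"))  # 攀氀攀琀椀漀渀匀琀愀琀攀
```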

Turns out there was a problem with the virtual network interface on the SQL Server box – which was causing poor performance, timeouts, bizarre query syntax errors, Chinese in the event logs, and corrupted service pack downloads. Fortunately the databases themselves were intact, so we offlined them, cloned the virtual disk they were sitting on, attached that to a different server and brought them back online.

Every once in a while, you get a weird problem like this. I’ve seen maybe half-a-dozen problems in my entire career that made absolutely no sense until they turned out to be a faulty network connection, at which point generally you not only solve the problem, but explain a whole load of other weirdness that you hadn’t got round to investigating yet. The only thing more fun than dodgy networks is dodgy memory – but that’s a post for another day.

Oh, and if you’re wondering about the title of this post, you clearly haven’t studied the classics.

The Next Big(int) Thing

One of our systems here uses a bigint identity column as a database primary key – because we knew when we built it, back in 2010, that we were going to end up with more than 2,147,483,647 records.

Well, that happened at 12:02 today, and a couple of systems promptly failed – because, despite the underlying database being designed to handle 2^63 records, the POCOs those records were being mapped to were using a regular C# int to store the record ID, and so as soon as they got an ID from the database bigger than Int32.MaxValue, they blew up. Thanks to the underlying DB schema already supporting 64-bit IDs, the fix was pretty simple – just change int to long in a few carefully-selected places and redeploy the applications – but it’s still annoying that something we knew about, and planned for, still came back to bite us. So I started thinking – how could we stop this happening?

The problem is that, despite being a bigint column, we just accepted SQL Server’s default identity setting of (1,1) – i.e. start counting at 1, and increment by 1 each time. Which means that until you hit 2-billion-and-something records, it doesn’t actually make any difference – and that takes a while. In our case, it took 5 years, 8 months and 26 days. During that time we’ve made hundreds of changes to our code, and in a handful of those cases, we’ve mapped that bigint ID onto a regular C# Int32 – and so inadvertently planted a little time-bomb in our production code. Tick, tick, tick…

So here’s a nice neat solution, that I wish I’d thought of five years ago. Anytime you create a bigint identity, seed it with (2147483648, 1) – so that right from day one, it’s already too big to fit in an Int32. Any system that tries to store an ID in a regular int variable will fail immediately, not in five years when someone creates that magic 2.14-billion-and-somethingth record. Even though you’ve effectively thrown away 2^31 possible values, you have another (2^63 – 2^31) positive values to play with, so you’ve lost a tiny, tiny fraction of the available keyspace in exchange for immediate feedback if any of your client apps can’t cope with 64-bit ID values.
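In T-SQL terms it’s a one-line change when you create the table (table and column names here are made up for illustration):

```sql
CREATE TABLE dbo.Widget (
    -- Seed the identity above Int32.MaxValue from day one, so any client
    -- code that maps this column to a 32-bit int fails on the very first
    -- record, not five years from now.
    Id bigint IDENTITY(2147483648, 1) NOT NULL PRIMARY KEY,
    Name nvarchar(100) NOT NULL
);
```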

ASP.NET Authentication with Adxstudio

I’m looking into options for integrating our shiny new CRM system with our website, so we can provide all sorts of neat self-service capabilities and features. One of the applications I’m investigating is a thing called Adxstudio – now owned by Microsoft – which claims to “transform Dynamics CRM into powerful application platform with dozens of apps and starter portals.”


This is one of those situations where we really are dealing with ‘solved problems.’ Email campaigns. Customers updating their own contact details, potentially things like forums, helpdesk/ticketing systems – lots of things which are nice-to-have but really aren’t strategic differentiators, and so there’s a compelling argument to find an off-the-shelf solution and just plug it in. We already have a federated authentication system here at Spotlight – something we built a few years ago that provides basic identity and authentication capabilities on top of OAuth2. At the time we built it, OpenID Connect didn’t exist yet, so we’ve got a system that does basically the same thing but isn’t actually compatible with OpenID Connect – and consequently doesn’t work out-of-the-box with Adxstudio. So I’ve been poking around, trying to work out the best way to plug Adxstudio into our infrastructure so we can evaluate it as a solution.

One of the options on the table was to replace our existing authentication system with IdentityServer; another was to implement OpenID Connect support on top of our existing authentication system – both quite elegant solutions, but both of which involve quite a lot more work than is actually required for what we’re trying to do.

The core requirement here is:

  • We already have a CRM Contact record for every user of our system
  • We can look up a user’s CRM Contact GUID during authentication
  • We want to set up the Adxstudio MasterPortal demo so that our customers are seamlessly authenticated and can use Adxstudio features as though they had registered via the Adxstudio registration facility.

Now, one of the nice things about Adxstudio is that it’s built as OWIN middleware, and uses the ASP.NET Identity framework to handle authentication – so what we need to do is work out how to translate the CRM Contact GUID into an IPrincipal/IIdentity instance that we can assign to the HttpContext.Current.User property, and hope that Adxstudio then does the right thing once the HttpContext User is set correctly.

Adxstudio provides an implementation of ApplicationUserManager that’s already registered with the OWIN model, which accepts a CRM Contact GUID (as a string) and returns an instance of ApplicationUser that we can use to spin up a new ClaimsIdentity. So the simplest possible approach here is this snippet of code:

protected void Application_AuthenticateRequest(object sender, EventArgs e) {
    Guid crmContactGuid;
    var cookie = Request.Cookies["crm_contact_guid"];
    if (cookie != null && Guid.TryParse(cookie.Value, out crmContactGuid)) {
        var http = HttpContext.Current;
        var owin = http.GetOwinContext();
        var userManager = owin.Get<ApplicationUserManager>();
        var user = userManager.FindById(crmContactGuid.ToString());
        var identity = user.GenerateUserIdentityAsync(userManager).Result;
        HttpContext.Current.User = new RolePrincipal(identity);
    }
}

Doing this with a Contact that’s been created via the Adxstudio registration thing works just fine – but trying to do it with a ‘vanilla’ contact blows up:

As you can see from that stack trace, deep down buried under several layers of Adxstudio and ASP.NET Identity code, something is trying to construct a System.Security.Claims.Claim instance, and it’s blowing up because we’re passing in a null value for something that’s not allowed to be null. Unfortunately for us, because we don’t have the source for the thing that’s actually blowing up, we can’t see the actual parameter values that are causing the exception… so it’s time for a bit of hunch-driven development. 🙂

I’d already noticed that when you install Adxstudio into your CRM system, it adds a bunch of custom attributes to the Contact entity in Dynamics CRM; here’s a dump of those attributes for a working contact:

Attribute Key Value
adx_changepasswordatnextlogon False
adx_identity_emailaddress1confirmed False
adx_identity_lockoutenabled True
adx_identity_logonenabled True
adx_identity_mobilephoneconfirmed False
adx_identity_passwordhash (omitted for security reasons)
adx_identity_securitystamp ca49a664-0385-4eb4-90c0-6283c9e704ea
adx_identity_twofactorenabled False
adx_identity_username ali_baba
adx_lockedout False
adx_logonenabled False
adx_profilealert False
adx_profileisanonymous False
adx_profilemodifiedon 2016-07-20 09:18:43

The ASP.NET Identity model is generally pretty flexible, but I have a hunch that the username and the security stamp are both required fields because they’re fundamental to the way authentication works. So, let’s try inserting some code into the AuthenticateRequest handler that will check these fields exist, and update them directly in CRM if they don’t:

protected void Application_AuthenticateRequest(object sender, EventArgs e) {
    Guid userGuid;
    var cookie = Request.Cookies["crm_contact_guid"];
    if (cookie != null && Guid.TryParse(cookie.Value, out userGuid)) {
        var http = HttpContext.Current;
        var owin = http.GetOwinContext();
        var userManager = owin.Get<ApplicationUserManager>();
        var user = userManager.FindById(userGuid.ToString());
        if (String.IsNullOrEmpty(user.UserName)) {
            // "Xrm" is the connection string name from web.config
            using (var crm = new OrganizationService("Xrm")) {
                var entity = crm.Retrieve("contact", userGuid, new ColumnSet(true));
                entity.Attributes["adx_identity_securitystamp"] =
                    entity.Attributes["adx_identity_username"] =
                        Guid.NewGuid().ToString().Substring(0, 8);
                crm.Update(entity);
            }
        }
        var identity = user.GenerateUserIdentityAsync(userManager).Result;
        HttpContext.Current.User = new RolePrincipal(identity);
    }
}

(For the sake of this demo, all we care about is making sure those values are no longer null. In reality, make sure you understand the significance of the username and security stamp fields in the identity model, and populate them with suitable values.)

OK, this now works sometimes – but only following an IISRESET. Turns out that Adxstudio caches data from CRM locally, so although that new chunk of code turns the Contact entity into a valid identity, the Adxstudio local cache doesn’t see those changes because it’s looking at an out-of-date copy of the Contact entity. So… time to configure some cache invalidation.

You can read about Adxstudio’s web notifications feature here. Adxstudio includes some code that will call a cache invalidation handler on your own site every time an entity is updated. Which works just fine IF CRM Online can see your Adxstudio portal site. And right now I’m running CRM Online as a 30-day trial and I’m running Adxstudio on localhost, and my workstation isn’t on the internet, so CRM Online can’t see it.

Time to fire up my favourite toolchain – Runscope and ngrok. First, I’ve set up ngrok so that requests to my tunnel URL are forwarded to my local machine – you’ll need a paid ngrok license to use custom tunnel names, but if you’re using the free version, try this:

D:\tools\ngrok>ngrok http --host-header=adx.local 80


Now, as long as that ngrok process is running, you can hit that tunnel URL from anywhere on the internet, and it’ll be tunneled to localhost on port 80 with the host header rewritten to adx.local. This neatly solves the problem of CRM Online not being able to connect to my local Adxstudio instance.

Next, just to give us a bit of insight into what’s going on, I’m going to set up a Runscope bucket for that. Remember – we need to route requests to /cache.axd on our local Adxstudio portal instance, via ngrok, so here’s how to get the Runscope URL you’ll need:


So, last step – you see that big URL in the middle?  We need to tell the Adxstudio Web Notifications plugin to notify that URL every time something changes. The option is under CRM > Settings > Web Notification URLs.

Note that the Adxstudio documentation refers to a Configuration screen accessible from Solutions > Adxstudio Portals Base. It appears this screen doesn’t exist any more – I certainly couldn’t find any trace of it in my CRM Online instance – but it also appears it isn’t necessary, because as soon as I’d created an active Web Notification URL, things started happening.

So, now we have something that works – but it still fails on the first Portal request for a particular Contact, probably because the Adxstudio cache isn’t picking up those two new fields fast enough for the login to succeed. To work around this, I’ve put in a thread sleep and then an HTTP redirect, so the first time a user lands on the portal they’ll get a slight delay whilst we populate their Adxstudio attributes, and then they’ll get their personalised screen:

protected void Application_AuthenticateRequest(object sender, EventArgs e) {
    Guid userGuid;
    var cookie = Request.Cookies["crm_contact_guid"];
    if (cookie != null && Guid.TryParse(cookie.Value, out userGuid)) {
        var http = HttpContext.Current;
        var owin = http.GetOwinContext();
        var userManager = owin.Get<ApplicationUserManager>();
        var user = userManager.FindById(userGuid.ToString());
        if (String.IsNullOrEmpty(user.UserName)) {
            using (var crm = new OrganizationService("Xrm")) {
                var entity = crm.Retrieve("contact", userGuid, new ColumnSet(true));
                entity.Attributes["adx_identity_securitystamp"] =
                    entity.Attributes["adx_identity_username"] =
                        Guid.NewGuid().ToString().Substring(0, 8);
                crm.Update(entity);
            }
            // Give the cache invalidation a moment to catch up, then redirect
            // back to the same page, so that Adxstudio will retrieve a fresh
            // copy of the cached Contact data.
            System.Threading.Thread.Sleep(2000);
            Response.Redirect(Request.RawUrl);
        }
        var identity = user.GenerateUserIdentityAsync(userManager).Result;
        HttpContext.Current.User = new RolePrincipal(identity);
    }
}

And it works. The final step for me was to spin up a separate web app that lists all the Contacts in the CRM system, with a login handler that puts the CRM Contact GUID into a cookie and redirects the browser to http://adx.local/ – and it works. No registration, no login, and any user with a valid CRM Contact GUID can now log directly into the Adxstudio MasterPortal example.

Scrum Values

The Scrum Guide has had a makeover. Well, it has had a small but powerful addition: values.

Scrum values have been around for a while. They are now officially part of the Scrum Guide, following overwhelming demand to add them.

Here is a quick run through of what these are:

Commitment – People personally commit to achieving the goals of the Scrum Team

Courage – The Scrum Team members have courage to do the right thing and work on tough problems

Focus – Everyone focuses on the work of the Sprint and the goals of the Scrum Team

Openness – The Scrum Team and its stakeholders agree to be open about all the work and the challenges with performing the work

Respect – Scrum Team members respect each other to be capable, independent people


This is a good opportunity to reflect on how we are doing with these at Spotlight. Please take a couple of minutes to fill in a quick form here.
To read the whole scrum guide, please click here.

Affordances, Signifiers, and Cartographobia

One of the teams here is putting the finishing touches on a new online version of Spotlight Contacts, our venerable and much-loved industry guide that started life as a printed handbook way back in 1947. Along the way, we’ve learned some very interesting things about data, and how people perceive that their data is being used.


One of the features of the new online version is that every listing includes a location map – a little embedded Google Map showing the business’ location. When we rolled this feature out as part of a recent beta, we got some very unhappy advertisers asking us to please remove the map from their listing immediately. Now, most of these were freelancers who work from home – so you can understand their concerns. But what’s really interesting is that in most cases, they were quite happy for their full street address to stay on the page – it was just the map that they were worried about.

Of course, this immediately resulted in quite a lot of “what? they want to keep the address and remove the map? ha ha! that’s daft!” from developers – who, as you well know, are prone to occasional outbursts of apoplectic indignation when they have to let go of their abstractions and engage with reality for any length of time – but when you think about it, it actually makes quite a lot of sense.

See, street addresses are used for lots of things. They’re used on contracts and invoices, they’re used to post letters and deliver packages. Yes, you can also use somebody’s address to go and pay them a visit, but there are many, many reasons why you might need to know somebody’s address that have nothing to do with you turning up on their doorstep. In UX parlance, we’d say that the address affords all of these interactions – the presence of a street address enables us to post a letter, write a contract or plan a trip.

A map, on the other hand, only affords one kind of interaction; it tells you how to actually visit somewhere. But because of this, a map is also a signifier. It sends a message saying “come and visit us” – because if you weren’t actually planning to visit us, why would you need to know that Spotlight’s office at 7 Leicester Place is actually in between the cinema and the church, down one of the little alleys that run between Leicester Square and Chinatown? For posting a letter or writing a contract, you don’t care – the street address is enough. But by including a map, you’re sending a message that says “hey – stop round next time you’re in the neighbourhood”, and it’s easy to see why that’s not really something you want if you’re a freelancer working from your home.

It’s important to consider this distinction between affordances and signifiers when you’re designing your user interactions. Don’t just think about what your system can do – think about all the subtle and not-so-subtle messages that your UI is sending.

Here’s the classic Far Side cartoon “Midvale School for the Gifted”, which provides us with some great examples of affordances and signifiers. The fact you can pull the door is an affordance. The sign saying PULL is a signifier – but the handle is both. Looking at it gives you a clue – “hey, I could probably pull that!” – and when you do, voila, the door swings open. If you’ve ever found a door where you have to grasp the handle and push, then you’ve found a false affordance – a handle that’s sat there saying ‘pull me…’ and when you do, nothing happens. And, in software as in the Far Side, there are going to be times when all the affordances and signifiers in the world are no match for your users’ astonishing capacity to ignore them all and persist in doing it wrong.

(Far Side © Gary Larson)

ASP.NET Core 1.0 High Performance

Former Spotlighter James Singleton – who worked on our web team for several years and built some of our most popular applications, including our video/voice upload and playback platform – has just published his first book, ASP.NET Core 1.0 High Performance. Since the book includes one or two things that James learnt during his time here at Spotlight, he was gracious enough to invite me to contribute the foreword – and since the whole point of a foreword is to tell you all why the book is worth buying, I figured I’d just post the whole thing. Read the foreword, then read the book (or better still, buy it then read it.)

TL;DR: it’s a really good book aimed at .NET developers who want to improve application performance, it’s out now, and you can buy your copy direct from

And that foreword in full, in case you’re not convinced:


“The most amazing achievement of the computer software industry is its continuing cancellation of the steady and staggering gains made by the computer hardware industry.”
– Henry Petroski

We live in the age of distributed systems. Computers have shrunk from room-sized industrial mainframes to embedded devices smaller than a thumbnail. However, at the same time, the software applications that we build, maintain and use every day have grown beyond measure. We create distributed applications that run on clusters of virtual machines scattered all over the world, and billions of people rely on these systems, such as email, chat, social networks, productivity applications and banking, every day. We’re online 24 hours a day, 7 days a week,  and we’re hooked on instant gratification. A generation ago we’d happily wait until after the weekend for a cheque to clear, or allow 28 days for delivery; today, we expect instant feedback, and why shouldn’t we? The modern web is real-time, immediate, on-demand, built on packets of data flashing round the world at the speed of light, and when it isn’t, we notice. We’ve all had that sinking feeling… you know, when you’ve just put your credit card number into a page to buy some expensive concert tickets, and the site takes just a little too long to respond. Performance and responsiveness are a fundamental part of delivering great user experience in the distributed age. However, for a working developer trying to ship your next feature on time, performance is often one of the most challenging requirements. How do you find the bottlenecks in your application performance? How do you measure the impact of those problems? How do you analyse them, design and test solutions and workarounds, and monitor them in production so you can be confident they won’t happen again?

This book has the answers. Inside, James Singleton presents a pragmatic, in-depth and balanced discussion of modern performance optimization techniques, and how to apply them to your .NET and web applications. Starting from the premise that we should treat performance as a core feature of our systems, James shows how you can use profiling tools like Glimpse, MiniProfiler, Fiddler and Wireshark to track down the bottlenecks and bugs that are causing your performance problems. He addresses the scientific principles behind effective performance tuning – monitoring, instrumentation, and the importance of using accurate and repeatable measurements when you’re making changes to a running system to try and improve performance.

The book goes on to discuss almost every aspect of modern application development – database tuning, hardware optimisations, compression algorithms, network protocols, object-relational mappers. For each topic, James describes the symptoms of common performance problems, identifies the underlying causes of those symptoms, and then describes the patterns and tools you can use to measure and fix those underlying causes in your own applications. There’s in-depth discussion of high-performance software patterns like asynchronous methods and message queues, accompanied by real-world examples showing how to implement these patterns in the latest versions of the .NET framework. Finally, James shows how you can not only load test your applications as part of your release pipeline, but can continuously monitor and measure your systems in production, letting you find and fix potential problems long before they start upsetting your end users.

When I worked with James here at Spotlight, he consistently demonstrated a remarkable breadth of knowledge, from ASP.NET to Arduinos, from Resharper to resistors. One day he’d be building reactive front-end interfaces in ASP.NET and JavaScript, the next he’d be creating build monitors by wiring microcontrollers into Star Wars toys, or working out how to connect the bathroom door lock to the intranet so that our bicycling employees could see from their desks when the office shower was free. Since James moved on from Spotlight, I’ve been following his work with Cleanweb and Computing 4 Kids Education. He’s one of those rare developers who really understands the social and environmental implications of technology – that whether it’s delivering great user interactions or just saving electricity, improving your systems’ performance is a great way to delight your users. With this book, James has distilled years of hands-on lessons and experience into a truly excellent all-round reference for .NET developers who want to understand how to build responsive, scalable applications. It’s a great resource for new developers who want to develop a holistic understanding of application performance, but the coverage of cutting-edge techniques and patterns means it’s also ideal for more experienced developers who want to make sure they’re not getting left behind. Buy it, read it, share it with your team, and let’s make the web a better place.

Check it out. The chapter on caching & message queueing is particularly good 🙂

Quality of Life with Git and Pivotal

At Spotlight we are currently using Pivotal Tracker as our planning tool and git for version control. One of the obvious things to do is to connect code changes to stories by including the story ID in branch names and commit messages. This allows you to cross-reference between the two and track why you made which code changes.

But developers are lazy beasts, Pivotal story IDs are long, and life is too short to write [#123456789] a gazillion times a day. Let’s see how we can improve things!

Automated branch creation

If you follow the standard git workflow, the first thing to do when starting some work is to create a branch. The branch should have a story number, as well as a meaningful title. Luckily, Pivotal has a very nice REST API we can use directly from curl to free us from the burden of typing the branch name:

STORY_ID=`grep -o '[0-9]*' <<< "$1"`
NAME=`curl -X GET -H "X-TrackerToken: $TOKEN" "https://www.pivotaltracker.com/services/v5/projects/$PROJECT_ID/stories/$STORY_ID" | grep -o '"name":"[^"]*"' | head -1 | sed "s/'//" | sed s/'"name":"'// | sed s/'"'//g | sed s/' '/'_'/g | sed s/'#'//g | sed s~/~_~g | sed s/,//g`
# branch name = story id plus sanitised title, so hooks can find the id later
branchName="${STORY_ID}_${NAME}"

git checkout -b $branchName
git push origin -u $branchName

Yes, my bash is horrendous, so if your eyes are melting from reading it here is what it does: you pass the pivotal story id, the script curls the story as json, extracts the name and replaces characters that would upset git. Then it creates a branch and pushes it to the origin upstream repo.
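If you want to see what that sed pipeline actually does to a story title, here’s the sanitising step on its own, run against a made-up title:

```shell
title='Fix the #1 bug, now!'
# Replace spaces with underscores; strip quotes, #, / and commas -
# the same substitutions as the pipeline above, just in one sed call
clean=$(sed "s/'//; s/\"//g; s/ /_/g; s/#//g; s~/~_~g; s/,//g" <<< "$title")
echo "$clean"   # Fix_the_1_bug_now!
```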

Stick it into a file, put it somewhere in your PATH, and chmod 755 it. For added automation, set it as a git alias and hey presto! Guaranteed to work 99% of the time.


This gets the job done, the only downside being that the branch names sometimes end up being too long. But that’s just an added incentive to keep the story titles short.

Add [#story id] if you are committing to a story branch

We can (hopefully) assume that people will not start their branch with a number, so we can try to filter them on commit:

BranchName=`git rev-parse --abbrev-ref HEAD`
TicketNo=`grep -o '^[0-9]*' <<< $BranchName`
if [ -n "$TicketNo" ]; then
  git commit -m "[#$TicketNo] $1"
else
  git commit -m "$1"
fi

Now you can just type your commit message, and git will add the story number if you are on a story branch. That saves us 11 keystrokes per commit! How cool is that?


And just for completeness, here are the git aliases I am using – these go into .gitconfig:

 cm = !
 grab = !

Enjoy one line git-pivotal experience!

Exupérianism: Improving Things by Removing Things

Last night this popped up on Twitter:

Last year, as part of migrating our main web stack to AWS, we created a set of conventions for things like connection strings and API endpoint addresses across our various environments, and then updated all of our legacy systems to use these conventions instead of fragile per-environment configuration. This meant deleting quite a lot of code, and reviewing pull requests with a lot more red lines than green lines in them – I once reviewed a PR which removed 600 lines of code across fifteen different files, and added nothing. No new files, no new lines – no green at all – and yet that change made one of our most complicated applications completely environment-agnostic. It was absolutely delightful.

When I saw John’s tweet, what instantly came to mind was a quote from Antoine de Saint-Exupéry:

“Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.”

So how about we adopt the term “Exupérian” for any change which improves something by making it smaller or simpler? The commit that removes 600 lines of unnecessary configuration. The copy-edit that turns fifteen thousand words of unstructured waffle into ten thousand words of focused, elegant writing. Maybe even that one weekend you spent going through all the clutter in your garage and finally getting rid of your unwanted lamps and old VHS tapes.

Saint-Exupéry was talking about designing aircraft, but I think the principle is equally applicable to software, to writing, to music, to architecture – in fact, to just about any creative process. I was submitting papers to a couple of conferences last week, and discovered that Øredev has a 1,000-character limit for session descriptions. Turns out my session descriptions all end up around 2,000-3,000 characters, and editing those down to 1,000 characters is really hard. But – it made them better. You look at every single word, you think ‘does it still work if I remove this?’, and it’s surprising how often the answer is ‘yes’.

Go on, give it a try. Do something #exuperian today. Edit that email before you send it. Remove those two classes that you’re sure aren’t used any more but you’re scared to delete in case they break something. Throw out the dead batteries and expired coupons from your desk drawer. Remove a pointless feature nobody really wants.

Maybe you even have an EU cookie banner you can get rid of? 🙂

Agile Tour London

Here’s a bit of a delayed blog post about my not-so-recent visit to Agile Tour London on 23rd October 2015. There isn’t going to be another one in London until late next year, so it’s great to be able to share what went on whilst it’s still relevant!

It was an interesting event; I had moments of “oh wow this is brilliant” followed by “what am I doing here?!”, learned some new tricks and refreshed some old practices.

In a nutshell I would consider it a success. A couple of topics covered that I particularly enjoyed and which had a definite impact were:

  1. The frequency of releases to customers: 

We all talk about how it is a good thing to get early feedback from customers and get new features out there as soon as possible, starting with a minimum viable product. But in that passion to deliver fast, what we sometimes fail to consider is how often customers actually want or need updates. Releasing too frequently can be more disruptive than constructive, especially if a new feature release actually disrupts a customer’s day-to-day job.

  2. How fast we really are going compared to the rest of the world:

One of the main objectives of Agile is to achieve continuous improvement, and there are a number of key metrics to help measure success – velocity, cycle times… the list goes on. These help to see whether a team is improving and moving forwards, but if you have multiple teams, how do you know how they compare? To take it a step further, do you know whether your teams are doing as well as the rest of the industry? Where do you stand?

As this was an interactive session, there were a lot of ideas flowing around the room. One that I liked (mainly because it was mine) was an app which could record these measures and compare and score teams across companies. Another was cross-company agile workshops. While we were discussing that idea, the risk of such workshops becoming a big overhead became apparent, but having thought about it since, actual work could get done if they were properly planned and structured.

Those of you who are interested and are “clever” enough to search the internet will now find some funny looking pictures of me from the conference.

Merry Christmas and a Happy New Year.

Spotlight on… Future Decoded and Project Oxford

I was at ExCel earlier this week for Microsoft’s annual Future Decoded event. Future Decoded’s a combination of big-picture keynote speeches – Internet of Things, quantum computing, artificial intelligence – and focused talks on current and future Microsoft technology like ASP.NET 5, Windows 10, the new Roslyn compiler infrastructure. It’s always an excellent event, but something that really jumped out at me this year was a talk by Chris Bishop from Microsoft Research about Project Oxford, a set of AI services for dealing with speech, natural language – and human faces. As you can appreciate, human faces are a hugely important part of casting. From 10×8″ headshots to online portfolios, a performer’s photographs have always been an essential part of any sort of casting service, and Spotlight is no different.

We humans are sociable animals, and one of the things we are astonishingly good at is recognising each other’s faces – our parents, our friends, celebrities, even the grainy photocopies in the picture round of your local pub quiz. This capacity to detect and recognise faces is vital to our social groups and communities, and accurate face recognition has long been one of the holy grails of artificial intelligence research. Over the last decade, there have been some remarkable developments in the areas of computer vision associated with human faces.

First, there’s face detection – analysing a photograph and working out whether there are any people in it, like this example from Apple’s iOS libraries:

When I visited Japan in 2007, Sony were proudly showing off a cutting-edge digital camera that would detect human faces and adjust the autofocus so that your subjects’ faces would be in focus – very cool, very innovative, very expensive. Eight years later, most of us have a phone in our pocket that can do face detection via a built-in camera, and if it doesn’t, Facebook will detect the faces when you upload your photographs.

So… what’s next? The really exciting thing – certainly from a casting perspective – is face recognition, and being able to measure similarity between faces. How many casting briefs have you seen looking for someone to play a historical figure, or brothers and sisters of a character who’s already been cast? Or those breakdowns looking for a “Kate Winslet type” or a “Michael Fassbender type”?

Among the technologies Microsoft demonstrated at ExCel on Wednesday was Project Oxford’s “similar face search” capability. It’s available via an HTTP API from Microsoft Research, and they’ve also put together a rather neat online demo. So I decided to kick it around a bit and see what it can do – and, since this is Spotlight, I’ve tried it out on a couple of castings to see how closely Project Oxford thinks these performers match the people they’re portraying.



That’s Morgan Freeman, who portrayed Nelson Mandela in “Invictus”; Tom Hanks playing Walt Disney in “Saving Mr Banks”; Helen Mirren playing Elizabeth II in “The Queen” – and Sacha Baron Cohen, who was in talks to play Freddie Mercury in a Queen biopic before it was confirmed in 2013 that he was no longer involved and the project was put on hold.

Of course, making a fun online technology demo is one thing; actually turning this kind of technology into a usable casting tool is still some way off. For starters, the processing power involved in this kind of analysis is considerable – there are nearly a quarter of a million performer photographs in Spotlight’s database, so analysing our whole data set for similarity would mean comparing over thirty billion pairs of photographs, and Microsoft’s beta programme is currently limited to 5,000 requests per month. But not long ago, this kind of stuff wasn’t just expensive, it was actually impossible, and with the cost of computation halving every eighteen months, it won’t be long before this kind of research opens up a whole new range of possibilities for digital casting tools.
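Back-of-the-envelope, for a quarter of a million photographs compared pairwise:

```shell
# Unordered pairs among N photographs: N * (N - 1) / 2
# (needs a shell with 64-bit arithmetic, e.g. bash)
N=250000
echo $(( N * (N - 1) / 2 ))   # prints 31249875000
```

At 5,000 API requests a month, you can see why a full pairwise analysis isn’t on the cards just yet.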

In the meantime, head over to the demo to try it out for yourself, or read more about it at Microsoft’s Project Oxford site.