Jenny is a data enthusiast and product lead of a Kiwi team analytics startup, Multitudes, which provides ethical engineering metrics. Having a keen interest in how technology impacts our world, she is interested in exploring how we can use data in ways that are not reductive but empowering. Before Multitudes, she built data products at Xero.
Outside of work, Jenny has been involved in several organisations around DEI, climate action, and youth civic engagement. She is currently a data ethics facilitator at Colab Cohorts’ Equitable Product course. She is also co-founder and editor of Climate Club, a weekly newsletter for busy folks who care about climate change.
Engineering metrics, minus the creepiness
So! The last session for the day is — of course — packed with great content. As we hoped it would be.
It's all about data.
What would we do without it? And how do we actually prevent the misuse of it?
When we're running our teams, and when we're working with them... Sorry, when we're working on the software that they'll be running the team with.
So actually, I don't recall the first time that I met the next speaker.
She's actually one of those people who you've heard of, who we've seen the name of... everywhere.
In some articles at work, behind a commit, in some other places, or on LinkedIn! All over the place.
And so of course, when we were thinking about a data lens on culture, we were like: right, we know who to go to. It's Jenny.
Her work in the last few years has been very focused on just that. Making sure that metrics, data, and culture intertwine in the best possible way.
And so please welcome Jenny Sahng.
Hi my name is Jenny.
I'm a data scientist and product lead at a Kiwi startup called Multitudes.
And I just really appreciate everyone being here. I know it's 3:30pm; it's like, peak nap time.
So thanks for being here, and I hope it's an interesting talk.
I know we've heard a lot about engineering metrics today: DORA, all the rest of it.
So I'd like to expand on that a little, by talking about how we actually use these metrics to uncover insights about how our engineering teams are going, in a way that's empowering, sustainable, and — crucially — not creepy.
I'll just share a bit about like why I'm qualified to talk about this.
The company I work for — Multitudes — started out because our CEO (Lauren who's second from the right) was running a diversity, equity, and inclusion consultancy.
They gave workshops to teams about... you know the ones.
It's a really great opportunity to get everyone on the same page to skill up, so that everyone has the same language to work with.
When they're talking about how we might make a better team culture for everyone on teams.
So Lauren knew that workshops are just part of the solution, when an organisation is working on becoming more equitable and inclusive.
That was really driven home when a company that they were thinking about working with made national news over some DEI issues, because it was a pretty big workplace culture scandal.
I had to block out that thing [points at whited-out protest sign], in case there are legal ramifications of that.
But yeah, I think everyone would have heard of it. It was a huge thing.
This really drove home the fact that diversity, equity, and inclusion can never be like a one-and-done thing.
You need to have better feedback loops. It has to be embedded into the everyday culture.
So after workshops, after a one-off speaker comes in, how do teams themselves know that they're actually improving in these metrics, in these sort of pillars?
So Employee Engagement surveys. They're really cool. They're interesting.
They're very rich in qualitative information.
But the accuracy depends on how many people actually choose to respond, and the data could be unreliable.
Talking to people — of course — is the best but it's not very scalable
It's quite time intensive for people to have those kinds of in-depth conversations, and also depends on how willing people are to share.
So ultimately we wanted to give people a way to have easy and regular feedback loops on how things are going. That means individuals and teams can track their progress, have data for retros and one-on-ones that are — sort of — more informed, and just have better conversations that lead to a culture that everyone can participate in fully.
So that's how Multitudes was born.
Over the past few years, we've been speaking to literally hundreds of engineering managers, CTOs, developers.
Many of whom are in this crowd. So thank you, if you've been part of that journey, to learn what the biggest challenges are that people face in engineering teams. What's worked, and why.
We think metrics can help with some of those challenges, when used in the right way.
So I'd like to distill those findings for you.
What makes up a strong team culture?
First off, let's talk about why engineering culture is important in the first place. I think after reading and watching all the slides, and watching all the talks, I'm sort of like:
"Wow okay, this audience really knows why it's important."
But I'm just going to keep going.
So you might have heard the quote "culture eats strategy for breakfast".
It's saying that "no matter how great your business strategy is, it's going to fail if you don't have a team culture to back it up, and have people on board and pulling in the same direction"
There's a parallel for engineering teams.
Lauren often says "code is easy, people are hard".
The biggest challenges that people often face are not technical hurdles.
While those are difficult, a strong team can work through them and find solutions.
Allen mentioned in his talk that "tech systems are inherently sort of social systems".
So often the things that cause a team to fall apart, are things like: restructures that take a while to ramp up from, personalities that haven't learned to work together yet, a lack of alignment and priorities...
All sort of people problems.
It's like this meme, which I think was a leak from like Xero's memes channel.
You can like, leave Xero, but the yeah the memes don't leave you.
We've all been in the situation before. I'm doing a bit of product now, so I kind of... know the feels.
We've been in this situation before because it is so easy for team culture to fall into these common traps.
With engineering and product feeling at odds. Not being aligned on priorities. Goals and stakeholders not being clearly communicated or understood clearly.
These are all people things, and culture is what drives that — for the better or for worse.
So what makes up a strong team culture?
Google has done a lot of research on this. They had a project called Project Aristotle.
It studied hundreds of teams and found that psychological safety, was by far the most important factor or the best predictor, for a high performing team.
So this means a sense of shared trust, that team members feel safe to take risks.
You can imagine things like: a blameless culture when it comes to incidents, or the sense that you won't be shunned if you bring up tough issues, throw a spanner in the works by pointing out a problem in the plan, or propose a new idea, and also knowing that you can ask questions or ask for help, without anyone thinking you're silly.
So it's pretty easy to see why psychological safety, would be really valuable for a team's success.
The second thing that fosters strong team dynamics is shared experience.
There's lots of studies in org psych (Organisational Psychology) over many decades, that show that teams that have worked together in the past perform better.
That's the case for software developers, security analysts, surgeons, astronauts, basketball teams.
There has probably been a case study for it.
But in the world of fast-moving teams and fast-moving projects, and especially in our industry where churn is quite a bit higher than in other industries, shared team experience can be a rarity.
Also more recent research suggests that, while team familiarity can be good for really routine things, it might be less effective for tasks that require new thinking and innovation.
For these reasons, restructures and team changes are probably unavoidable.
But how do we manage that? And manage the negative impacts, when you have these ruptures in shared experience?
The last thing is the environment.
So there have been lots of studies that have found that environments with both diverse teams and where everyone feels included result in significantly higher business outcomes.
80% increase in the ability to innovate.
31% increase in responsiveness to changing customer needs.
So I mean... aside from all the other reasons why diversity and inclusion are important, even if you had only capitalist reasons for it, here they are.
While employee engagement and satisfaction surveys can give you visibility on this, as I mentioned earlier they can get low response rates.
They're kind of hard to interpret. It's hard to pull out the key themes without your own unconscious bias casting a lens over it.
So yeah these things are really important, but they can be hard to measure.
So this is where engineering metrics can come in.
They provide an accompaniment to things like conversations and one-on-ones and employee engagement surveys.
They collect data from a different source to those, in a place where your team members are working — online.
So they've been getting a lot of airtime recently, and not all good.
There have been some pretty big horror stories recently, with CEOs laying off developers based on lines of code written, or tools that stack rank employees based on some arbitrary metric, and just quite general Big Brother-esque vibes that track and monitor behaviour.
So yeah, it's just it's a bit of a wild west out there, and it's kind of hard to know.
Everyone's saying "Oh you need DORA metrics", "You need these metrics" but it is quite hard to work through.
Having said that, metrics can provide real value in helping us improve what matters to us.
As long as we're thoughtful about what we choose to measure, and how.
So here are two examples.
Laura Tacho is a VP of engineering and a leadership coach.
She highlights the benefits of measuring what you want to improve, but also the risks and how you should roll it out.
Rebecca here was an engineering manager at Stripe.
When she was rolling out developer metrics, she found it valuable in how it informed team conversations.
So we can see that engineering metrics, like DORA and some other ones I'll touch on, can bring value.
It can give you better visibility over team health; especially in a world where remote working is now the norm.
I mentioned in the intro that it can help with providing fast feedback loops on not just team performance and system stability, but also aspects of team culture.
A case that I saw at Multitudes that really inspires me, and I think back to a lot, is that there was a team where a manager thought one person was doing a lot less work than others.
The team had access to not just performance data, but also data on collaboration.
So they saw that this person was actually getting less feedback than everyone else, by quite a significant margin.
Which is a pretty key ingredient for growth and learning.
I mean in the case of software, you need reviews. You need code reviews to ship things.
Coincidentally, that person also happened to be the only woman in the team.
There's lots of research out there that shows that women receive less feedback, and the feedback they do receive, is less specific and actionable.
So that team was able to use the data insight to work on distributing feedback more evenly across the team.
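To make the feedback example concrete, here is a minimal Python sketch of counting how many reviews each person receives on their own work. The names, the record shape, and the 0.75 threshold are all made up for illustration; this is not how Multitudes or that team actually measured it, and the result is meant as a conversation starter rather than a verdict.

```python
from collections import Counter

# Hypothetical review records: (reviewer, author of the pull request).
reviews = [
    ("ana", "ben"), ("cam", "ben"), ("ben", "ana"),
    ("cam", "ana"), ("ana", "cam"), ("ben", "cam"),
    ("ana", "ben"), ("cam", "dee"),
]

def feedback_received(reviews, team):
    """Count how many reviews each team member received on their own work."""
    counts = Counter(author for _, author in reviews)
    return {person: counts.get(person, 0) for person in team}

def below_team_average(received, ratio=0.75):
    """Return people who received less than `ratio` times the team average."""
    average = sum(received.values()) / len(received)
    return sorted(p for p, n in received.items() if n < ratio * average)

team = ["ana", "ben", "cam", "dee"]
received = feedback_received(reviews, team)
flagged = below_team_average(received)
print(received)
print(flagged)
```

A list like `flagged` only says where feedback is unevenly spread. Why that is happening still has to come from the team.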
So yeah, you can definitely see that there are ways metrics can be used to identify issues early, measure progress, inform decisions, and bring that in — along with the human context — to make decisions.
It can help you see past blind spots and unconscious biases, and reveal insights that you might not have picked up on your own with just your own lens on how the team's going.
The question of what to measure is really important, but fortunately a lot of that thinking has already been done for us, in our industry.
There are two main complementary frameworks, that are widely accepted as industry standards.
They're the result of years of org psych.
So they can really help you skip over the pitfalls that you could fall into, like measuring reductive metrics such as lines of code or number of pull requests per person, and help you avoid those backfiring.
So the first one is DORA (which stands for DevOps Research and Assessment).
It's the result of years of research across thousands of teams, for a reliable and actionable set of metrics to understand software team performance.
It's from the book, Accelerate, which was mentioned earlier.
So you've got two velocity metrics and two stability metrics. I won't go over them because it's been talked about a lot, and you can look it up.
But yeah, I think the key to understand is that, they're really good at covering a broad base of system stability, and software team performance.
How you measure these is dependent on your stack.
There's lots of tools out there to help you get this started.
You can build your own dashboard.
You can try an open source one, or you can pay for a tool.
There are lots of options.
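If you do build your own, a first pass can be quite small. Here is a minimal Python sketch of three of the four DORA metrics computed from a deployment log. The record shape, field names, and numbers are assumptions for illustration; where the data really comes from depends on your stack. Time to restore needs incident data, so it is left out here.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical deployment log: when the change was first committed, when it
# was deployed, and whether the deployment caused a failure in production.
deployments = [
    {"committed": datetime(2023, 5, 1, 9), "deployed": datetime(2023, 5, 2, 9), "failed": False},
    {"committed": datetime(2023, 5, 3, 9), "deployed": datetime(2023, 5, 3, 15), "failed": True},
    {"committed": datetime(2023, 5, 8, 9), "deployed": datetime(2023, 5, 10, 9), "failed": False},
    {"committed": datetime(2023, 5, 11, 9), "deployed": datetime(2023, 5, 12, 9), "failed": False},
]

def deployment_frequency(deploys, period_days):
    """Deployments per week over a period of `period_days` days."""
    return len(deploys) / (period_days / 7)

def median_lead_time(deploys):
    """Median time from commit to deployment."""
    return median(d["deployed"] - d["committed"] for d in deploys)

def change_failure_rate(deploys):
    """Share of deployments that caused a failure in production."""
    return sum(d["failed"] for d in deploys) / len(deploys)

print(deployment_frequency(deployments, period_days=14))
print(median_lead_time(deployments))
print(change_failure_rate(deployments))
```

Everything here is measured at the team level on purpose; nothing in the log records who made each change.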
These have kind of become a standard, because now every year they put out a "State of DevOps" report.
That means you can benchmark your teams against your industry, or even your subsection of the industry or similar organisation sizes, and it also shows trends in how the industry is evolving its practices.
So the authors of DORA then moved on to expand it in 2021 with the SPACE framework, to capture more of the human elements of engineering performance.
Because they recognised that engineering productivity is about so much more than just activities, incidents, pull requests, and deploys.
It expands on DORA to include well-being, collaboration, satisfaction — that sort of thing.
So you've got satisfaction and well-being. So that's, you know, burnout, efficacy.
Does everyone have the tools that they need?
Then performance, so quality, including change failure rate and time to restore, but also things like "are we moving the needle on customer satisfaction?"; and activity, so the deployment frequency, incidents, and the severity of the incidents.
Communication and collaboration is the one that I am really interested in.
So it's things like who's working with who? Like knowledge silos.
Around how long does it take to onboard someone?
Who's doing a lot of the support work? The documentation, the reviews...
And yeah, as Aleisha mentioned in her talk. It's not just about who needs support.
It's about getting visibility on whether you've got a culture, that allows everyone to give feedback to anyone.
It's not just juniors getting support from seniors. It's juniors being able to have the freedom to ask questions, and to question, I guess, infrastructure decisions or architecture decisions that other people have made.
I think that's a really important part of SPACE, and an area where it really expands on DORA in a cool way.
The last one is efficiency and flow. So that's lead time sort of thing, but also:
How much focus time are people getting?
How often are people getting interrupted?
How long do people wait for reviews?
To give an example of why incorporating well-being and collaboration is important.
This is just a made-up example. So let's say team Yetis is shipping code faster. Which is a performance or activity metric from DORA.
But they're also doing more out of hours work. So the well-being metric is going badly.
Looking at both makes sure that teams aren't getting higher performance at the expense of well-being, which would never be sustainable anyway.
It's often an early indicator of burnout and churn.
So this is why SPACE is a really great way to measure, not just the performance, but everything else that leads to long-term sustainable performance.
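The made-up team Yetis example can be sketched in a few lines of Python: pair a speed metric with a well-being metric and check whether both moved at once. The numbers, field names, and the function itself are hypothetical, and a result of `True` is a prompt for a conversation, not proof of burnout.

```python
# Made-up numbers per cycle for one team.
cycles = [
    {"cycle": "1", "deploys": 8, "commits": 120, "out_of_hours_commits": 6},
    {"cycle": "2", "deploys": 11, "commits": 150, "out_of_hours_commits": 30},
]

def out_of_hours_share(cycle):
    """Fraction of the team's commits made outside agreed working hours."""
    return cycle["out_of_hours_commits"] / cycle["commits"]

def speed_at_cost_of_wellbeing(previous, current):
    """True when deploys went up and the out-of-hours share went up too."""
    faster = current["deploys"] > previous["deploys"]
    more_out_of_hours = out_of_hours_share(current) > out_of_hours_share(previous)
    return faster and more_out_of_hours

worth_discussing = speed_at_cost_of_wellbeing(cycles[0], cycles[1])
print(worth_discussing)
```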
So that's cool, but how do you use these metrics day-to-day, and how do you do it in a way that will empower teams?
How to use metrics day-to-day
There are four rules of thumb that I'd like to propose today.
The first is to remember that the data is never the full picture.
So you could look at these development metrics to prepare for a retro, a one-on-one, or a stand-up.
It's a good idea to use them as a conversation starter, and not hard evidence of something going absolutely wrong or right.
We recommend presenting the data in a way that promotes discussion.
So you could ask something like "hey it looks like deployment frequency has been dropping over the last two cycles,
does that line up with how things are feeling on the ground in the team? Are there any ideas on what could be contributing?"
That gives the team the opportunity to bring their own context and to share: maybe they're working on some tech debt, or there's a big gnarly refactor going on, or maybe the data is being measured wrong and you need to change that.
But the team should have an opportunity to weigh in on that.
Much like what people have been saying about security as well.
The team has the most context and it would be a real shame to not use that, when you're interpreting this information.
It's just a good practice for interpreting data in general; bringing the real world context into those numbers, and not using numbers as the be-all and end-all.
So it's really important to let your team know that you're trying out engineering metrics, and get buy-in from them first.
There are some real horror stories that I've come across where people rolled out the metrics framework without buy-in.
And that really damaged the trust and the team cohesion.
So giving your team members opportunities to weigh in on how metrics are being used.
And then a few weeks later — maybe in a retro — checking in on how it's going, whether it's useful, and how they can change, how it's being used, if there's any feedback on what's good and what's not, and then coming up with the solution together.
So in the same way that values, product ideas, infrastructure, and team norms should be created and maintained together as a team, the same should go for the metrics that the team is holding themselves to.
So on that note, being transparent about the data being collected is really important as well. We really recommend giving teams access to their own data if possible.
That creates a high trust environment.
It also lets the team members view and use their own data to set their own goals, and keep themselves accountable.
In my view, an example of a really empowered team is when an individual in a 1-on-1, can use their own data to set their own goals, to work towards them, keep track, and then advocate for that in the next performance review. And say
"Hey I set this goal. This was what I want to get to, and I got to it"
We really recommend that transparency of sharing data, and giving people access to their own data.
The last one is we really recommend that you don't track, and you certainly don't rank, by individual performance metrics.
That's things like lines of code, numbers of pull requests, how fast people are merging their PRs; shown at an individual level.
This might seem intuitive but it's... it's a whole thing at the moment.
Firstly, it's just not useful. Software's a team sport.
If someone's taking ages to merge a PR, that's probably not their fault.
They need other people to be reviewing their code, it might be really complex, maybe they're stuck with all the glue work to help other people on the team move faster.
So it's not really about the individual at all.
It doesn't give any bearing on the individual's contribution to the team.
Secondly, knowing that their performance is being measured using simplistic measures like these means that people are incentivised to game them, rather than focusing on the team's goal. Seriously, that's just the last thing you want your engineers working on.
I think recently there was a pretty well-known company that tried to roll out some individual performance metrics, then have it feed into performance reviews, and developers just ended up writing scripts to game their commits.
That's just the last thing you want people spending time on. And you can imagine what that did to the culture, and the trust in the team.
So yeah, just don't go there basically. It's pointless and it'll be terrible.
So — on that note — to summarise.
Metrics are a really great way to make sure that important things aren't being missed in the conversations you're already having.
So we know that getting culture right can be challenging. But it is crucial to any team's success.
When it's done right, metrics can help you progress towards a more equitable culture; as a valuable data point, alongside other data like qualitative feedback or observing your teams in person.
There are lots of frameworks out there to get you started like DORA and SPACE.
They can help you avoid some of the pitfalls of metrics: like making sure you're measuring the right thing, making sure you have visibility over well-being and collaboration, as well as performance.
Then how you measure those things in a way that helps you move with the rest of the industry — if you're interested in that.
When you implement these frameworks and use them day-to-day, we really recommend keeping the human context in the picture by using these data insights as conversation starters.
People should ideally have access to their own data, or at least transparency of what metrics are being measured, and how they're being used.
They should have opportunities to give feedback on how it's going.
Then finally, it's important to remember that the mythical 10x developer is not a thing anymore.
It's very 2010. It's counterproductive to focus on individual performance.
Software is a team sport. So the focus should always be on the team level.
So that's the talk. Thank you for your time and attention.
Some things that you can check out if you're interested. And yeah, I'm happy to take questions.
Well, we actually do have some time for the horrifying Q&A if you're okay with that.
Please be nice, it's the end of the day.
Any questions for Jenny?
Are there any tools that you know of, that can easily hook into a repo, and give these metrics?
Sorry, "are there any tools that..." What was the last part of the question?
Hook into repos to give metrics.
Oh um... Multitudes?
Yeah... there are lots. Like, there are heaps out there. Um...
Ah yeah, I don't know how to make this not salesy. [awkward laugh]
We really care about well-being, collaboration, as well as performance?
But yeah there are a lot out there if you Google DORA metrics or engineering metrics tool.
I'm just adding to that.
I think Multitudes and Jenny do put out a lot of blog posts and content around this sort of improvement as well.
So even if you don't have the budget, their content is still excellent, and their meetups are still quite good as well.
One of the things we've struggled with is time to recover, because an incident is a complex thing; from when a customer first experiences it, through to when a team first realises it, to when a team realises it's actually their fault, through to when the team deigns to respond to it.
Sometimes you have... incidents that's really incidents.
And they drag on for 3 or 4 days because nobody has priority.
How do you deal with that sort of noisiness in the data around recovery?
So, just to make sure I got the right question. Are you asking
"How do you get clean data on cases like incidents, where the start can be really tricky. Attributing it to a team can be really tricky." That sort of thing?
Yeah Mean Time To Restore (one of the 4 DORA metrics) is quite tricky to measure, and it does rely on the team agreeing on a set of norms around how you deal with incidents.
So once the incident is open, we're going to, like, flag it in Opsgenie, have this go to the Slack channel...
I mean you can do it really lightweight. You could just do it on a Slack channel and say
"Hey P1 happening here. Check this link on this CloudWatch log".
But whatever you define it as, that is how... it really depends on the team.
So if you want to do a really light touch thing, you can do that. Then just chuck it into a spreadsheet.
If you want to use a proprietary tool, you can have it link up to whatever IMS tool you've got.
Then say like "incident start is when you should start measuring, incident closes is when you should stop measuring"
And then another tool would normally calculate that for you.
In terms of how complex you want to make it, it depends on the team and the kind of processes you're after.
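For the light-touch version, the calculation itself is small once the team has agreed what "start" and "close" mean. Here is a minimal Python sketch, assuming incidents are kept as simple records like rows in a spreadsheet. The incident IDs, field names, and timestamps are made up for this example.

```python
from datetime import datetime, timedelta

# Hypothetical incident records, e.g. exported from a team spreadsheet.
incidents = [
    {"id": "INC-1", "opened": datetime(2023, 6, 1, 10, 0), "closed": datetime(2023, 6, 1, 11, 30)},
    {"id": "INC-2", "opened": datetime(2023, 6, 5, 14, 0), "closed": datetime(2023, 6, 5, 14, 30)},
    {"id": "INC-3", "opened": datetime(2023, 6, 9, 9, 0), "closed": None},  # still open
]

def mean_time_to_restore(incidents):
    """Average of (closed - opened) over closed incidents, or None if there are none."""
    durations = [i["closed"] - i["opened"] for i in incidents if i["closed"] is not None]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)

print(mean_time_to_restore(incidents))
```

Open incidents are skipped here rather than counted up to the current time. That is one possible convention, and it is exactly the kind of norm the team needs to agree on.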
I saw you, the person in the back on that corner.
Hi. Thank you.
How do you measure a team's well-being, and define it, without constant surveys that drag down their well-being?
Yeah... it's... Oh, the question. The question's
"How do you measure well-being in a good way, without constant surveys that drag down the team and drag things out?"
So... the way that we measure well-being is, we measure out of hours commits.
So teams can set the hours that they expect people to be working, and if a commit is outside of those hours, then it's counted.
It can be sort of like "Oh so-and-so did a bunch of out of hours commits on a Sunday" and then that could feed into a one-on-one where you're like
"Oh did you want some time in lieu? Or is it something we can change, to make sure that you're not crunched like that in the future?"
That's like one way to do it.
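As a rough sketch of that idea in Python: define the expected working hours, then count commits whose timestamp falls outside them. The hours, working days, and sample timestamps below are assumptions for illustration, not the actual Multitudes implementation, and the sketch assumes timestamps are already in each person's local time.

```python
from datetime import datetime, time

# Hypothetical working hours agreed by the team.
WORK_START = time(9, 0)
WORK_END = time(17, 30)
WORK_DAYS = {0, 1, 2, 3, 4}  # Monday to Friday

def is_out_of_hours(timestamp):
    """True if the timestamp falls on a non-working day or outside working hours."""
    if timestamp.weekday() not in WORK_DAYS:
        return True
    return not (WORK_START <= timestamp.time() < WORK_END)

# Made-up commit timestamps.
commits = [
    datetime(2023, 6, 5, 10, 15),  # Monday morning
    datetime(2023, 6, 6, 21, 40),  # Tuesday evening
    datetime(2023, 6, 11, 14, 0),  # Sunday afternoon
]

out_of_hours = [c for c in commits if is_out_of_hours(c)]
print(len(out_of_hours))
```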
Obviously wellbeing is so broad, and so things around surveys...
I mean we're sort of experimenting with very light touch ways to do surveys.
But you can push that to Slack and make it a one-question thing, where it's just like a pulse check.
I think these are things that teams do already. Like those Team Health Checks — you know — red-yellow-green traffic lights.
We're sort of interested in how do you collate that data, and present it in a way that's actionable.
And that would be the last question, before we transition.
Thank you so much for raising awareness of this topic.
The topic of looking at the picture holistically and the engineering experience.
My question is around one of the first topics that you touched on which is understanding the context.
So... what we've done in the past is, we've gotten all these metrics, which is really interesting.
But it's hard to understand what is the cause behind them; like if you've got low satisfaction scores.
Do you have any insights, as to how you can get a better understanding of the reasons behind the metrics?
So, the question was "How do you get the context behind the metrics?
So you've got all these numbers... But how do you know what's causing them, and how you can change them?"
I think this is where coaching comes in.
This is sort of where engineering managers come in; your experience, work you do with talking to your team members, having those retros, and things like that.
I don't think metrics should ever replace that.
The reasons and the causes behind it will be different for each team.
They do tend to be sort of similar, though.
I mean, there's a lot of research that's already been done about what impacts developer satisfaction.
It's often things like lack of autonomy, not being able to work on something that allows you to grow, lack of growth opportunities, work-life balance...
There are set buckets that they tend to fall into, but even then you definitely want to be having conversations with people to figure out exactly what it is, and what solution would work for that person and that team.
So yeah, it's the same stuff. But, we think, better informed by data.
Okay and that is the last of the questions for Jenny. Thank you so much. Please give her a round of applause.