What makes a great data scientist and data science team? (on the Data Pubcast podcast)

I was recently honoured to be a guest on The Data Pubcast - a podcast about making data accessible to everyone, hosted by the incredibly talented Nick Latocha and Andy Crossley.

In the episode we discuss:

  • the spectrum of what "data science" means and how knowing that can help you create excellent data science teams [4:10]
  • Maker vs Manager schedules: how to carve out uninterrupted time of deep work for your team [11:00]
  • how Data Product Managers can protect the time of those doing "flow state" work [13:00]
  • the 3 skill categories that every excellent "Machine Learning Engineer" needs [14:00]
  • one of the hardest problems that successful data science teams need to overcome [17:00]
  • Data Engineers: why all successful product data science teams need them [18:00]
  • does data democratisation work? (Empowering stakeholders to do analysis themselves) [25:30]
  • a profound approach to creating an excellent data science team that the rest of the business want to support [30:20]
  • why "delivery" is the second part of being a great team, and what that involves - including software engineer training, mathematical theory, and why data scientists need to be great communicators who practice and deliver thoughtful feedback to each other [36:00]
  • one of the biggest mistakes I made that had a colossal impact on me (and the solution) [38:00]
  • a few organisations that stand out as leaders of data science at the moment [41:49]
  • why data scientists are expected more and more to prove their worth (and why execs need to believe and support their work) [45:00]

I also answer the 3 questions they ask every guest:

  • What would you tell yourself as a 20-year old to help your data science career? [48:00]
  • What #1 training course would you recommend to someone just starting a data career? [51:11]
  • If you could sit down with anyone - dead or alive - who would you want to meet? [54:00]

I really enjoyed speaking to Nick and Andy, and I hope you enjoy listening.


Automated transcript

The Data Pubcast - Episode 3: What Makes a Great Data Scientist and Data Science Team?

Guest: Greg Detre (former co-founder of Memrise, former Chief Data Scientist at Channel 4)
Hosts: Nick Latocha, Andy Crossley


Nick: Hello and welcome to the second proper data podcast. My name's Nick Latocha, joined by Andy Crossley as usual. We've just poured ourselves a pint and we'd like to welcome our guest for this evening, former co-founder of Memrise and former chief data scientist at Channel 4, Greg Detre. So Greg, do you wanna give a brief introduction of your, I guess, your data life so far?

Greg: Yeah, I'd be happy to. So I trained as a psychologist trying to understand why we forget things. And I put people in brain scanners and used machine learning to try and read their minds.

Nick: Nice.

Greg: And it was enormous fun. And in the process of doing that, I ended up having to learn a bunch of hard lessons about machine learning, about software engineering, about dealing with gigabytes of data, made loads of mistakes, but also got to explore and experiment and try out a bunch of different techniques and work with some really smart people. We released an open source toolbox for machine learning that got used around the world. And so it was just a great way to cut my teeth on data and machine learning. And then from there, I've been kind of involved in data and technology for most of the last 15 years. As you say, I started by co-founding Memrise as the CTO there, and then most recently, leading data science at Channel 4. So trying to apply machine learning to business problems. And with loads of different ways in which that turned out to be interesting. And it was a great bunch of people. And now I mostly spend my days helping a mix of people from startups up to decent sized companies, helping them get the most out of their data teams, helping those teams be productive and happy and work on the right problems. So a mix of advisory and coaching work for the most part.

Andy: Cool. And I think one of the, well, the main, well, we've actually got a theme this time. I was gonna say the main theme. We didn't have a theme in the first one with the data bob. But given that intro, Greg, and your own blog around making data mistakes, we wanted to theme this around what makes a great data scientist or data science team. Just those two terms have a lot of debate in them themselves. Well, I'm certainly finding that out on a day-to-day basis with some of our clients at the moment. But really to pick up on your experience and those learnings, and your willingness to say that there've been some mistakes and learning along the way. But to pick your brain around what you think makes a great data scientist and data science team. And that's before you start turning the psychologist on me and Nick, and we have to go and lie down. Hopefully not. (laughing)

So with that as a background and the sort of coaching you do today, I'm gonna try and avoid the C word - we did enough about COVID last time. But I guess as a 30,000 foot view, in your mind, what makes a great data science team? The specific things you've seen work or don't work, or particular traits? 'Cause I think a lot of organizations now are creating data science teams. I'm not sure they always are, but I think that's down to different definitions of data science.

Greg: Yeah, so why don't we start there? Because I think that might be the key to unlocking things. I think one of the reasons this ends up being a tricky question and a conversation that can easily get derailed is that people have a differing sense of what they mean by data science or a data scientist. And the easiest way, I think, to disentangle things is to imagine two broad categories that both get referred to as data science.

And on the one side, we have using data to help us make decisions, and the kinds of tools that we rely on for that - everything from A/B testing to analytics, to business intelligence and visualization. Lots of questions, interrogating things in the service of understanding the situation so that we can then make good business decisions.

And I think we could contrast that with product, using data to create new products, new data products, often with heavy reliance on machine learning techniques. So a recommendations engine or a self-driving car, even an automatic forecasting tool or a new advertising product. Those are all examples of ways in which we can use data to create perhaps entire new revenue streams, self-standing sources of business value. And the tools involved for those products tend to be lots of software engineering, Python and deep learning often, or various kinds of algorithms.

Now, it's far too simplistic to say, okay, on the one hand, we've got data that helps us make decisions, and on the other hand, we've got data that we use to create products, and that those are two completely distinct things. They're not - there's a spectrum, with lots of skills in the middle. And in fact, often you need really good analytics to do really good machine learning, for example.

But just starting by saying, hang on, when you say data science, which do you have in mind? 'Cause usually people have one or the other in mind. Then at least now we can start to say, well, okay, maybe the lower-hanging fruit is to make sure that we've got our analytics house in order. And then once we've done that, we can start to think about the more kind of highfalutin machine learning applications, for instance. Does that make sense to start with?

Andy: I think that's a great starting point and way of, I like the way you sort of started with the differentiation, but actually, yeah. So it's two ends of a spectrum or continuum because they're not perfect black and white. And Nick, I'd be interested, yeah, given your new role, does that line up with what you're seeing in work at the moment?

Nick: Yeah, I think so. I think it's a great way of separating or trying to separate the two kind of disciplines, I guess. And it at least puts you in the right ballpark. So, I've been looking to hire fairly recently and as soon as you put data scientist on something, you get a range of backgrounds, experiences and skillsets. So I think that's kind of a great way of kind of narrowing it down. And then I guess the next level is, what type of kind of machine learning engineer are you after and breaking it down even further?

Andy: And that's the bit that I'm particularly interested in because I can see that there's then, if you think of that as a horizontal continuum, there's a vertical continuum in terms of the skillsets required.

Nick: So I guess, which were you then, Greg, when you were chief data scientist at Channel 4, which spectrum did you fall on?

Greg: So Channel 4 did something interesting. My boss, Sanjeev Mbala, had kind of seen this distinction and made a very clear and helpful separation between the kind of decision-making and analytics versus the product. And so the team that I was leading was squarely focused on product. So applying usually machine learning algorithms to try and create business value. And I think it makes a great deal of sense to have at least some separation in the org chart. You don't want them to be too far apart because the two need to work closely together, but to have some separation between those two. Otherwise, your machine learning engineers will kind of be swamped with ad hoc requests for basically SQL queries or reports. And it's a somewhat different skillset, and it's certainly a different kind of work, a different cadence of work. And it ends up that those kinds of ad hoc, relatively urgent requests are an ideal way to squash any productivity on hard, slow problems like designing a machine learning algorithm.

Andy: Yeah, that's interesting about the cadence. I've had some of our team say there's a real difference between being in a product team, and this is more from the combination of data and software engineering world. And I'd be interested in your take on the difference. But yeah, internal, I guess, business optimization and helping the business make decisions to make us better, ultimately, versus creating a product team and building products. In your mind, in your experience, which one runs at a faster cadence?

Greg: So the closer you are to decisions, the faster your cadence, almost invariably. And so if you're on the decision-making side and you're dealing with Tableau reports or ad hoc queries, what were the most popular programs last week broken down by channel and time of day or whatever, then chances are whoever's asking that probably would like an answer in hours or days because they're planning to do something as a result. Whereas if you're working on a recommendations engine, it's unlikely that you're trying to do things in that kind of a hurry.

So there's a very well-known article by Paul Graham distinguishing between maker and manager schedules, which in a sense makes the same point that if your job requires you to get into a state of flow, so you're a maker, and I think that's more true of the product building kind of data scientist, if you need to get into a state of flow, you wanna minimize interruptions. And so those ad hoc requests are kind of death by a thousand cuts to your productivity.

So one of the tiny micro improvements we made was to attempt to carve out chunks of the week that were uninterrupted by meetings where everybody knew they could basically turn off Slack notifications, that they'd have four hours or even a whole day in a row uninterrupted to work on whatever felt most important rather than what felt most urgent.

Andy: Cool. Did that work?

Greg: So it's always very hard to measure productivity and there's never a counterfactual, but yeah, I'm pretty certain that it helped - based on my own anecdotal experience, based on work from psychology on flow, based on the noisy but measurable happiness that it evoked in the team and the grumpiness when things got interrupted. So I think you do need to then have a conversation with some of your stakeholders, because that could impact them. So there've always gotta be exceptions. And so people have gotta be reasonable, and it helps to have a separation, sort of a division of labor.

So this gets into, so it helps to have a data product manager who's the kind of conduit with the business who's much more available for meetings. Whereas the data scientists who might be doing this kind of flow state work aren't necessarily in as many meetings.

So that gets to this question, which is kind of a follow on from, okay, so let's say you are building a machine learning product data science team. What is the profile of skills? What makes for a good team? And I think that was one of the kind of starting questions. And one of the things that makes for a really effective product team is having a really great data product manager or two.

Andy: Yeah, I think that makes sense. And the way I've handled that before is having, as you've said, someone to protect the time of those people doing the work and give them that flow. So I guess what else kind of makes up those great data teams? Is it just data scientists and a product manager or are there different disciplines that then are brought into that?

Greg: Yeah, so that's a great question. So let's talk through some of the skills and break down what makes for a good blend. So let's assume we're talking about what we'll call a machine learning engineer, which appears to be the term that's starting to emerge for this kind of product data scientist with a heavy emphasis on machine learning.

So what does that machine learning engineer need? Well, I think they probably need a mix of roughly three - I think of a Venn diagram with roughly three circles in it. So they obviously need a bunch of computational skills, everything from understanding the machine learning theory, probably some stats, and everything else that enables them to understand the data and to pick, modify, and parameterize algorithms. Great, so that's the computational circle. That's number one.

Number two is they've gotta be great software engineers. And I think that often gets kind of downplayed, but if you are building models that are gonna get used in production and you're having to experiment and inevitably do some data wrangling and probably deploy it and test it and visualize the output and everything else that goes into it, you've gotta be a great software engineer. Otherwise, you're simply unable to realize your ideas.

And then the third circle is a bit of a catch-all, but it's something like having a product sense. So it's knowing enough about the business to get a sense of what's important. So those three skills are the kind of three that I tend to think of as being key for a machine learning engineer. So computational, software engineering, and sort of broadly product, which includes communication and business savvy and everything else.

So if those are the three skills, and no one, of course, has all of them, to the degree that one might dream of, but you need a sort of balance between them. So that's your core machine learning engineer.

We've talked briefly, we've touched on the importance of the data product manager, who supports them as the kind of conduit to the business. And we had some great data product managers at Channel 4, which made a huge difference. They build up a long-term relationship with your potential stakeholders in order to be able to be the voice of what's valuable. 'Cause at the end of the day, the really hard part, I would say, of being a successful data science team is working on the right problems. And to choose the right problems, you need to have a clear sense of what's valuable and a clear sense of what's feasible. And usually the data science team knows what's feasible, but they often don't know what's valuable. And one of the ways to get a richer sense of what's valuable is to have a great data product manager who has many of the skills that a kind of more traditional product manager has in terms of managing stakeholders and being able to kind of guess what they'd say and keep the delivery cadence of the project working well, blah, blah, blah.

And then I'd say the third role that tends to be important in a successful product data science team is data engineering. And that can take many forms as well. But broadly it ends up being some mix of mostly software engineering skills, lots of database stuff, lots of deployment and operations stuff. And so it's not like there is an absolutely clear distinction between a data scientist and a data engineer, or a machine learning engineer and a data engineer. Often they overlap, often they need to work really closely together. And there's a question about whether the data engineers should be in the same team as the machine learning engineers. Certainly you want communication between them, and between all three of those roles - the data product manager, the machine learning engineer, and the data engineer. You want all three of those roles to be communicating well, because if they're too distinct, if they're too far apart in terms of org charts or whatever, then you're just gonna create friction and struggle to get good stuff done and into production.

Nick: So that was an interesting point you made about whether the data engineer and the machine learning engineer should be in the same team. For me, I've never thought that they shouldn't be in the same team. So that's interesting. Have you seen or come across a setup where you have separated them into different teams, and how did that work?

Greg: Well, I think often they are, and it depends. So you can imagine the data engineers have the permissions to deploy to production and to access production data. So it makes sense, at least on the face of things, to put them in the technology department.

Andy: I guess it's kind of similar to your typical DBAs that are kind of key to anyone developing a data warehouse, but they often sit in a technology team because that's where they typically lie. And they've got, as you've said, the permissions and the processes.

Greg: Yeah, exactly. So, I mean, you can see how they might end up being in a very different part of the org chart where in some sense, the nearest common boss is an exec reporting to the CEO. And so you can imagine how that creates, that that's a communication gap that you need to find ways to bridge if that's the case. And there's a million solutions that one can imagine. You can have those two teams just talking lots, you can second people from one to the other, you can put the data scientists in technology, but then they're far from the business. You can put the data engineers in with the data scientists, but then they're far from technology. You can try and make sure that your data scientists think of themselves partly as data engineers. You can have a few rogue data engineers in amongst the data scientists.

These are some of the difficult trade-offs. Ben Horowitz talks about org charts in "The Hard Thing About Hard Things" as being basically a kind of constraint satisfaction problem for who needs to be able to communicate most fluidly with whom. And inevitably, by putting two groups next to one another, they're gonna communicate more effectively, but then they won't be able to communicate so well with somebody in the opposite part of the business. So it's just a trade-off.

And I think the one thing, the one impression I wouldn't wanna leave anyone with is that I think there should be too clean a distinction between the machine learning engineers and the data engineers. I think machine learning engineers need to be good software engineers and they need to know what data engineers do pretty well and ideally be a little bit full stack.

Andy: Yes, and so that was the interesting, or that's where I come from and that's why I asked the question about separation because I come from a consulting background and so we tend to try and bring that whole capability together in a single team to help a client, a client organization.

Greg: And I'll just say that I think we probably agree on this, that the ideal is probably to have small cross-functional teams that are self-sufficient, with the skills to be responsible for their own areas, working on particular bits of the product. So for instance, if you're trying to release a data product, that might be a couple of machine learning engineers, a data engineer, maybe a product manager, and then maybe designers or people from the business, domain experts, maybe someone from QA, I don't know. And that half-dozen, two-pizza-sized team would work really closely together, bound around a product rather than bound by skill disciplines. And so, in this kind of eternal debate, on balance I suspect that's probably the best arrangement if you wanna release great products.

Andy: Yeah, and I'm waiting for Nick to jump in 'cause he's the one holding this in the real world in his day job, but I suspect some of it comes down to what fits with the wider culture of the organization as well, actually.

Nick: Yeah, you're right. And actually, I've been in that situation where there's a big divide between IT and the business, and the two never shall meet. And the answer has always been, as you've said, Greg, but let's build these cross-functional, self-sufficient, multidisciplinary teams. But the challenge has always been that the business side doesn't really get that way of working or doesn't really understand how to work in an agile way and how to work with developers and QAs and UX people. So that's always been the challenge that I've seen is technology are kind of dying to do that and transform in that way, but the business don't know how to kind of gel and merge with them.

Andy: So that's an interesting point, actually, around how, 'cause we've talked around the data science capability, data team, et cetera. I think that's a really interesting point, Nick, around how the rest of the business adopts or adapts to suddenly having that capability. In the pre-chat, while we were all getting our pints poured for us - if only - the three of us were talking about democratization of data. Anybody who's listened to all of our podcasts so far will know that I'm not a fan of that term particularly. But I guess there comes a point, Greg, where that capability gets handed to the business in whatever form or whatever product. And if the business isn't ready to take that on or adapt their ways of working - so a key one for me is around making decisions. Great, I've got a dashboard, and it now tells me that that's red, but now what do I do if the business process isn't there? So what have you seen in terms of, I guess, the adoption of that data team within the wider business, and what works and what doesn't work for the organization?

Greg: Right, so there's two interesting questions there. I think the democratization of data still thrills me as a notion. (laughing) But I feel like I just maybe haven't been lucky enough to see it done really brilliantly. I mean, I've worked with some great data analytics teams, but I'm not sure that I would say that they've been able to fully empower their stakeholders to do analysis themselves, which is, I think, part of what I think of as democratization. So that one, if people know how to do that, I would love to hear from them, 'cause I'd love to learn.

Yeah, there was a second question.

Andy: I guess it's around, so I've read other blogs and stuff recently about this. There's a data product produced, which ultimately, you know, whether that goes out to a consumer, but let's assume that the business utilizes it, whether that's a dashboard or some alert system or control room capability or whatever it is. But if that business isn't ready to adopt that, or that part of the business isn't ready to adopt that new capability - to cartoon it, well, it's not really a cartoon, but historically, we did work in sort of operational improvement, and you get Dave, the factory manager, who's running around with a spreadsheet printed off on his clipboard and knows exactly what's going on. Now, we all know that that could be done way better, and I could put a device in his hand that would have real-time information about the production line and what's working and what's not working. But if Dave doesn't like that, or he feels comfortable with his spreadsheet - it comes a bit back to your being close to the business, but there's a whole piece that comes after that which gets Dave ready to use that data capability. And I'm just interested in how you've seen that evolve. 'Cause I think that does come into the democratization of data. I think people are becoming more savvy, or believe they are anyway.

Greg: Yeah, so I think, I'm not sure what the right phrase for it is, but there's, I suppose, adoption. So I can think of a few examples of where that's been done well and probably many more where it hasn't. Because it's so easy, it's almost the default to build something that people don't want or don't realize they want, which might be the perspective of the data science team. They'd want it if they only knew. And so the data product manager is absolutely key there.

So to succeed, you probably have to have built up a long-term relationship with them. So when this worked well, in my experience, in one case it resulted from the fact that the model was built in close concert with the end users. And so they helped shape it - it was an audience segmentation model. They helped shape the approach, the parameterization. They participated in naming the different segments - there were some really fun names. And so when it was finally in production, they were already kind of invested, bought in, part of its creation. And they had a whole plan for which people were gonna work on which segments. And so it was an enormous success.

The other time I can think of, that could easily have gone the way you described with Dave, was where we were working closely with a forecasting team who had until then been doing everything manually with some automation. And we were proposing to work with them to see whether machine learning could help. And I think there were a couple of things that helped a great deal. One was again about human relationships. We had an incredibly charming and super smart data scientist who was very junior at the time and was seconded with them for a month. And after a month of beers with them and making friends, he understood their pain points. And so when we went to try and build a data product that was intended to help them, not only could we rely on their help in shaping it, but also somehow it just worked. The whole process went much more smoothly than it had before he'd been involved. So there's a human relationships part.

I think I'm gonna suggest that there's another, profound approach we can take as a data science team that can make a big difference. So before Channel 4, I also worked with the Guardian. And in both cases, I was part of the team that tends to think in terms of numbers, working in organizations filled with people who tend to think in terms of words or images. And so it would be very, very easy, I think, to create a dynamic of sort of, hey, good news, everybody - we're here with our machine learning algorithms to automate your jobs away. And to become, I think, a very threatening presence, a threatening know-it-all presence that alienates people.

And one of the things I think we did right with the forecasting was to say, listen, there's no way that the machine learning is gonna be as good as the humans on their own in their main domain of expertise. There's so much nuance that goes into their judgment, so much experience. The machine learning algorithms are just not gonna do as good a job. But there are a few areas where we think they could improve on what the humans are doing, perhaps 'cause the humans don't have enough time, or because actually the data is sufficiently rich that we'd expect the algorithms to outperform humans, because there's no bias or tiredness. So you can imagine creating a hybrid system that's greater than the sum of its parts. And the name we used to give this - we used to call these centaur teams, as in the half-human, half-horse creatures of Greek mythology.

Andy: Love it.

Greg: Exactly. And so those hybrid teams end up being potentially much greater than the sum of their parts. And this example, where you've got a kind of division of labor, is a simple one. I think there are much more sophisticated examples. And this is really the pattern that I see emerging from large swathes of knowledge work: it's not realistic to imagine that an algorithm will do as good a job as a smart human being, but there are various ways in which together you end up with something much better than either in isolation.

Andy: Yeah, interesting - I thought I knew where you were gonna go there at the end with that description. Not to mix mythical creatures, but the whole unicorn piece. I remember when we started down this path in terms of setting up the consultancy, everybody then talked about the data unicorn and the data science unicorn, and data scientist was the next key role. I don't know about you, but there seems to be a bit of an acceptance now, in a good way, that it's unrealistic to expect individuals to cover off all of those capabilities or all of those roles. I'm saying that as long as they've got an awareness of it - for me, it's the same as in more general management consultancy: you can't be great at everything, but recognizing where you need to back up your capability by bringing in other people from a team, or a different team, gets you to something greater than the sum of the parts. Which I think is really important, because the different perspectives, I find, play into better outcomes for the end user. And without getting into the whole ethics piece, the different perspectives about what that data tells you, how to interpret it - and like you say, it's never gonna replace a human, so putting a human at the end of it to take that forward, I think, is really key.

Greg: Yeah, I'm with you. I think you can make a clear business case for the effectiveness of diverse perspectives, and to some degree, even machines and humans are a form of diversity.

Andy: Yes. You must see that now, Nick, mustn't you?

Nick: Yeah, and I think just to link back to a couple of the other topics around the multidisciplinary team, and making sure you've got a plan to get user adoption, and things like that, I've seen people start to include roles such as UX and UI within data, and also a data comms person, so someone who's talking about what the data team or the data function are doing, relating it to the business, the business values, the business goals, and communicating that as if it was any other project. Is that anything that you've seen, Greg?

Greg: So there's an element of that which I think is a role played by the data product manager, but the one pushback I'd offer is that I would not want to suggest that the data scientists don't have to be great communicators. I think data scientists do need to be great communicators, and the good news is that it's something you can learn. And this gets into, I suppose, the other half of what I would give as my answer to what makes a great data science team.

If the first half is picking the right problems, which is a lot of what we've talked about so far, the second half is obviously delivering and getting them into production, and part of the way that you make a team that can deliver really well is by training them, and training can mean saying, we're all gonna make sure that we have a high standard of capability as software engineers, there's loads of ways to talk about that, we're all gonna make sure that we've read the basic papers that we need to be able to do our jobs in terms of machine learning theory. We're also gonna think about soft skills, we're also gonna think about stakeholder management, and being able to communicate really effectively to each other and to non-technical people, and those are all learnable skills. Even just practice alone will make you good or better. Practice plus great feedback in a structured program makes a transformative difference to everybody.

Andy: I think that's a great point around the softer skills as well. One of my guys said to me the other day, 'cause we were talking about the difficulties of managing your backlog with your budget, your time, your deliverables, your deadlines, et cetera, that we've found people are starting to work longer hours 'cause there's no hard delineation between work and play anymore.

Greg: I came from the startup world and actually made this mistake myself horribly, and it sort of had a really colossal impact on how I think about things and how I try and work myself, and it makes me much more worried about it when I see other people at risk of burnout. But burnout is much more than just working too many hours. There's a cynicism and feelings of frustration that accompany it.

So I guess I would suggest that, if we're just talking about the number of hours in a week, part of the solution, at least for me when managing a team, is to explicitly demarcate different gears. It may be that you're in the midst of an exciting bit of the project and you think you can knock it on the head in a couple of weeks if you really burst at it, and there's a joy to that, a camaraderie to choosing to stay late on a goal that feels manageable, seeing real progress, and having a kind of finish line. But then once you hit that finish line, you explicitly signal that now's a good time to take a break, right? Because this is a marathon, not a sprint.

And I think it's probably different in consulting versus working for a larger company, but one way you can do that, 'cause people, I think, get a tremendous energy from those changes of pace, at least I do, some people do. So one of the things we do is explicitly have very short projects, sometimes a few days or a couple of weeks, usually building prototypes to establish whether something's feasible before we embark on a longer project. So you'd have a couple of weeks every quarter where you knew you were gonna do a mad dash, and then the rest of the time would be a more measured pace, 'cause honestly, shipping something broken into production ends up being much more costly in firefighting than fun. And so you provide an outlet for that desire in us all to really run and feel the wind in our hair, without making it either an expectation or a run to exhaustion.

Nick: Yeah, yeah, I'm having half seven calls with Europe and 6 p.m. calls with the US and massive long days, but yeah, there's a lot going on and right now we're kind of sprinting. And I think it's a great bit of advice to then start to think about when do we plan that slower pace, when can the team catch a breath and start to think about what they've achieved, what they've done and what they're gonna do next. I think it's a great thing to start to incorporate into everyone's data function.

So I think we've kind of covered a lot of what makes a good data science team or a good data scientist. Is there anyone, any company or any organization that you think is doing this well and think is kind of leading the way with data science?

Greg: So it's tricky because one rarely gets to glimpse inside other organizations. And so it's hard to evaluate based on the quality of someone's chat, whether the innards match the plumage. So it's very hard to say from the outside. I suppose I've always been really impressed by the FT and I think they're very thoughtful and very competent having interacted with them a fair amount. So they're the ones that probably stick out to me.

Nick: And Andy, I guess you obviously work with a wide range of clients. Is there anyone that stands out for you at the moment?

Andy: So I've seen sort of quite small pockets actually, as opposed to broader teams. I'm slightly reluctant to name them given that we do work with them. But some have stood out because they've generated, or curated, maybe, is a better term, quite niche teams internally, and, a bit like Greg was saying, they've managed to somehow find a way of operating at a really high cadence all the time. I don't think that's always the same individuals, but it's created a desire from the outside business to want stuff from them all the time 'cause they deliver all the time.

I do think, and this is a bit controversial, I do think there's a lot of, as Greg said, there's a lot of chat out there that I'm not sure I believe as well. I don't know what people think 'cause I saw, trying to remember what the article was now, but somebody talking today about there is, they're seeing a trend of a flurry of activity. So a bit like you said, Greg, I can do a really short project, create a prototype, business gets excited, and then you can slow down a bit while you sort of productionize it, if you will. But I read somewhere today that businesses are now becoming almost apathetic to that. And so the initial interest in it has waned and the data science teams or data teams are struggling to find the next thing of value or maintain that value. And that's really interesting for me. Has the bubble burst slightly? I doubt it very much, but it's interesting that somebody was noticing that as a small but growing trend.

Greg: So we can definitely predict with confidence that the hype cycle pendulum will swing in a kind of more ominous direction over the coming years. Companies have hired and created a lot of data science teams, and those teams are now gonna be on the hook to create some really measurable value. And it's not at all easy to do. And so there's gonna be a bunch of people who are gonna feel like they were sold a lemon.

Andy: No pressure, Nick, by the way. (laughing)

Nick: I think it's that whole, well, there's two ends of the spectrum. There's the execs that think AI is gonna solve all of their problems, and I use the term AI and not data science on purpose. And then the other end is those that just don't really know what it can do. And you've got to find the middle ground and have them as the champions of your data function or your data capability. And I think that also plays into what makes a good or a great data science team: having the execs on board to support, help develop, and help play a part in that function and what it delivers. So yeah, I think that plays a really big part.

Greg: Yeah, you're dead right.

Andy: 'Cause that does link back to what Greg said about it's a marathon, not a sprint. But I think we've all just acknowledged that there will be a dwindling of interest if teams can't maintain value delivery. And I think you're right: if the exec isn't on board and doesn't recognize that it's a marathon, not a sprint, you're gonna face these problems quite quickly.

Greg: Yeah, I suspect so.

Greg: I think it's one of those things where I was lucky in that my boss was great at managing upwards and getting that support, so it's easy for me to kind of ignore it or forget about it. In practice, I think it's totally essential, exactly as you said, Nick: you need that exec buy-in in order to be able to do a really effective job. You need the resources, and to be able to coordinate with a whole bunch of teams, 'cause often everything's kind of interwoven. Yeah, that's essential. (laughs)


Three Questions Segment

Nick: Okay, so last time we had a set of questions that we decided we were gonna ask every guest. So Andy, do you wanna kick off with the first one?

Andy: Yeah, so this is our three questions to all guests that come on, and it's just a little bit of fun, really, sort of away from the more serious debate around whether execs care about AI or not. And so, Greg, if you could go back and talk to your 20-year-old self, what do you wish you could tell yourself that would help your data career now?

Greg: Well, so the tricky part of this is there's plenty of things I can imagine telling my 20-year-old self, but having met my 20-year-old self, it's hard to imagine what I could say that that self would listen to.

Andy: Yeah. I thought you were gonna say, "I'm only 21, so it's quite easy." (both laugh)

Greg: Yeah, exactly. So, let's see. If it was specifically about my data career, probably the single most important thing, and I think I knew this, I just didn't wanna believe it, is: do as much maths early on as possible, because once you get past university, it gets really hard, and I tried, it gets really hard to motivate yourself to make progress learning new bits of maths in your spare time. And if you don't do the exercises, it doesn't count. That is undeniably true. I spent quite a lot of time reading maths textbooks and programming textbooks, and if you don't do the exercises, you might as well not bother. (laughs)

Andy: I need you to go and tell my son that, 'cause he just guesses the answers and he gets there 90% of the time. He doesn't work out why, so he can't actually do the thinking.

Greg: Yeah, I'm trying to figure out what I could say to my 20-year-old self 'cause I think at some level, I knew that then. (both laugh)

So I think the second thing I realized was, I mean, my 20-year-old self was sort of almost debilitated by procrastination, and that was true up until about my 30s. I just procrastinated so much about everything. I happened to be really, really good at estimating how long things would take and working extremely effectively in a blind panic in that final phase, so it all worked out just about fine. But I finally realized that procrastination ultimately is a failure of emotion regulation: basically, there's a negative emotion that you don't wanna face, right? I don't wanna deal with the boredom or the stress or the uncertainty or the confusion or the fear of failure, and so for that reason, I'm not doing this thing. Once you realize that, you can focus instead on the positive feeling of relief you'll have when you've done it. Somehow that helped a great deal, along with a bunch of the other standard advice.

And so reading the textbook anyway but skipping the exercises, 'cause the reading is the fun bit, is in some sense a form of that failure of emotion regulation. You know it won't actually do much good, but you're not willing to pay the emotional pain for the long-term learning and gain.

Nick: Okay, so Greg, what one piece of training or course would you recommend to anyone joining the data profession? And I've got a sneaking suspicion of what it might be.

Greg: Well, actually, I'd love to know what you think it might be and then I'll answer.

Nick: Well, given your previous answer on maths, I was wondering whether it might be maths related.

Greg: No, actually, 'cause while I think maths is an important component in being successful at data, there's lots of ways of being a successful data scientist, and I've never been a particularly strong mathematician. I put a lot of effort into getting good at software engineering, and that ended up being the thing I could always fall back on, the thing that gave me confidence.

In terms of the one thing that I would recommend to anyone joining the data profession, I really struggled with this, 'cause I don't think there is any one thing. But I'll tell you, there's a certain inflection point when you go from being a relatively senior individual contributor, where you kind of know what you're doing, you've been around the block a few times, people respect your judgment, you get things done, to being asked to manage other people. And that inflection point is really hard, especially once you get to a team of about five people. Up to about five people, you can manage them and still do a bunch of good stuff yourself and still be on top of all the programming. As it gets larger than that, you suddenly realize, oh my goodness, I actually don't really know what I'm doing anymore: it's been a while since I deployed any code, and actually somebody else is doing all the difficult bits. And you start to feel a lot less valuable and a lot less confident, and you might cling to that former feeling of competence, 'cause you're probably not very good at managing yet. I distinctly remember this period; it lasted about eight years: trying to be an individual contributor when really I should have been focusing on the management.

And so my favorite resource for this is called Software Lead Weekly. It's a sort of compendium of great articles that covers this and many other issues around leading a team. It's not data science specific, but only a handful of the problems in data science are data science specific; most of them are kind of a superset of the problems in running a software product team.

Nick: Interesting. That is not what I was expecting you to say at all. That's a lot. So I'm always intrigued by this one and see what weird and wacky answers people come up with. So if you could sit down with anyone in data, dead or alive, to chat to during one of our data podcasts, who would you meet?

Greg: This is, I don't know how tenuous this is. For me, it'd be Alan Turing for a few different reasons. So there's a meaningful sense in which I think Bletchley Park was the first and probably the most successful data science team of all time. (laughing)

In the space of, I think something like three, four years, they picked the right problems, built their own computer from scratch, including the theory needed, invented a bunch of crazy Bayesian mathematics to enable them to sort of optimize their project management and how to work on it. And, oh, and then cryptography. And so I think that's part of the reason, but it's actually probably not the main reason.

The main reason is I'm just absolutely delighted by the notion of the Turing test as a test for intelligence. I spent ages thinking about ways in which we could improve upon it as a means of determining whether something's intelligent. And I came up with one set of ideas by stealing from Dungeons and Dragons of all places, a potential Turing test mark two. And I would love to hear him sort of rubbish it with his astonishing thinking.

Andy: I think many other people would like that; it could be a follow-up episode, I think. (laughing) The Greg Depto version of the Turing test with Dungeons and Dragons, I like that.

Greg: Oh, be careful what you wish for. (laughing)

Nick: Yeah, it might be a Christmas special, I think.


Closing

Nick: Cool, so yeah, I think, I think, you know, I've definitely finished my pint. I feel as though I could do with another now, but, you know, Greg, it's been great chatting to you. I think we've learned a lot about data science. You know, the different spectrums of data science, what makes a good data science function. And, you know, thoroughly enjoyed chatting to you. So thank you very much.

Andy: Yeah, cheers, Greg. Thank you for that. I've made lots of notes that I'm taking back to the office tomorrow morning. So I appreciate your time. That's been a great episode again. Thank you.

Greg: Oh, it's been brilliant. Thank you.