UB Insider #57: Dawn of the Data Age
About this episode:
The world is in the midst of a data explosion. For some companies, big data is a magic bullet to solve all of their problems. For others, there doesn’t seem to be any reason to deviate from business as usual. In this episode of UB Insider, Luciano Pesci, co-founder and CEO of Emperitas, founder and director of the Utah Community Research Group at the University of Utah, and a teacher of data analysis and science for the University of Utah and Westminster College, talks about how data is changing the business environment and what companies need to consider to stay ahead. Subscribe to our podcast or download this episode on Apple Podcasts, Stitcher or Google Play.
Lisa Christensen: Hello and welcome to UB Insider. I’m Lisa Christensen, online editor at Utah Business magazine. Between the unprecedented connectivity of the internet and the immense capability to collect and process information, the world is in the midst of a data explosion. Some companies see it as a magic bullet while others don’t see the need to change the way business has been done at all.
Luciano Pesci, co-founder and CEO of Emperitas, founder and director of the Utah Community Research Group at the University of Utah, and a teacher of data analysis and science for the University of Utah and Westminster College, is here to talk about how the data age is changing business and what businesses need to consider to stay ahead. Welcome.
Luciano Pesci: It is a probabilistic miracle that we’re here today, right?
Lisa Christensen: Yeah.
Luciano Pesci: So we’re going to talk a lot about data, but as you were going through that intro it hit me: all the things that had to happen to get to this moment. If you look at those from a probability standpoint, they’re each an independent event with almost infinite possibilities, right? We could have been locked out. We could have been delayed. Any one of those would have made this moment different from what it is. So from a probability standpoint, the chance of all of those things lining up is almost zero. So with that long-winded explanation, thank you for having me here.
Lisa Christensen: Well, I’m glad probability smiled on us to…
Luciano Pesci: It was favorable this time.
Lisa Christensen: Yeah. So when you talk about the dawn of the data age, what exactly do you mean? How has access to data changed over the years?
Luciano Pesci: So that’s the title of the lecture series I’m doing off campus with businesses. Some former students have been coming too, and it’s both online and offline. We picked that title because, as we started to look at the ways in which organizations are currently using data, we tried to find the origins.
We’re kind of curious by nature because we’re mostly economists, so we dig into things like: What was the origin? What were the competing theories? And we’ve come so far so fast that I think people don’t appreciate the scope at all. I think there’s an element within the community that believes this is just another wave of something, that big data will die at some point. There are some people who eagerly await that. But it’s not going to die. It’s not going to go away, because the efficiency gains from information, which is what data is, are hugely, hugely impactful on competition and markets and outcomes. So we are at the very, very beginning.
If we look back at the three technologies that make this all possible, the ones you talked about in your intro, we’re talking first about computing. It’s 1956 with the IBM RAMAC, right? Five megabytes of storage for accounting data at $30,000 a month. We’re talking about connectivity: what became the internet, the ARPANET, which had a node here at the University of Utah early on. All those things were so expensive that they were the exclusive domain of the state. In fact, the word statistic comes from statisticum collegium, council of the state. The only people who could afford data were in government. No one else could afford to collect it, store it, or process it by hand. In one generation, computers have literally democratized it for everybody.
Lisa Christensen: So that has brought on a whole new way of looking at and treating data, since it has been democratized so recently?
Luciano Pesci: Yeah, well, there are new methods coming out at a record pace. That’s part of it. The other part is the technology: costs are plummeting, speeds are going up, quality is going up. There was a good image floating around the web that compared the development of car technology to the development of computer technology.
So say you took a car, looked at everything from the Ford Model T in the early 1900s through today, about a hundred years, and imagined that it progressed at the same rate of technological improvement that computers have from 1956 until today. It’s not quite the same time period, but you’re just trying to get a sense of the magnitudes, right? If cars had progressed as well as the computers we’re using for data science, they would have six million horsepower, and they would go something like three million miles on a gallon of gas that cost $4400. They wouldn’t even be close to the same thing. So it’s not easy for people to see how much the computer keeps getting smaller, faster, better. That allows new methods in data science. It also allows methods that have been around for a long time in academic books to finally be tested and validated, which has been really interesting.
Lisa Christensen: So one of the things you talk about is data culture. So what does that mean for businesses, and what are some examples of businesses that are doing it really well?
Luciano Pesci: So the companies that are doing it really well are the biggest companies in the world right now for a reason. Amazon is probably the best example of a company with a data culture, meaning that wherever it’s possible for them to quantify and measure something in order to examine it for improvement, they do. That’s a data culture. And I lay it out that way because there are a lot of people who just collect data.
Probably the number one thing we hear now is that people are just drowning in data. They have no idea what to do with it. That’s because they’re not using any sort of framework to guide the data science. They just hire data scientists, which was part of the lecture we just did. They don’t know which type of data scientist they need, but they hire a data scientist anyway. That person doesn’t really gain any traction, because the role isn’t incorporated into the larger organization and its strategy.
And for Amazon, the goal was always removing friction from the buying process, improving the customer experience as much as possible. So they not only collect data, they analyze it in a very transparent way. And I think that’s the part that’s missing, or maybe what scares some organizations away from embracing data science: there are some who don’t want to see the output. If you don’t want to see the output, if you’re not ready to possibly hear that your baby is ugly, then why would you hire data scientists to do anything? Now, that’s rare. It’s rare that a data scientist gets through all of the analysis and presents something that is then actively fought. It happens, but more likely they’re just not doing it: they’ve got the data and they’re not using it, or they didn’t hire the right talent and they’re not gaining any traction.
Lisa Christensen: So the successful places have a specific goal they want to accomplish and they use data to accomplish that goal? They have a direction to go with that data?
Luciano Pesci: They have a framework. I keep emphasizing the idea of frameworks because they’ve been cropping up in areas like marketing and sales. The customer journey is a great one. It’s one that we use, because it forces you to ask: What data fits at this stage? How does that data differ from the data at another stage? And how does it cross boundary lines within the organization? I mean, organizations put all of their problems into silos. Marketing goes into its little group, right? That way they can leverage all of their shared expertise and focus on it. Well, that’s great, but a lot of their problems overlap with sales, and if information and data aren’t actively shared between those groups, then they’re only ever going to be able to do a minimal amount with what they’ve got from their own viewpoint.
So the problems organizations face overlap the boundaries of these siloed departments. If they have a bigger framework, something like the customer journey, it forces them to see where their contributions are and where that overlap happens, so that they can start to collaborate. Data is a good place for that collaboration, again, if you can empower people within the organization not only to access it but to actually use it in a transparent way to guide them toward those outcomes. When you have hard metrics that you’re steering toward, it’s far more successful. So frameworks really are the key.
Lisa Christensen: So going back to the data scientist you mentioned, you were talking about how companies hire a data scientist because they figure they need one, but they don’t consider exactly what kind of data scientist they need, or they don’t establish a framework in which they want the data scientist to work. So how can businesses hire the right kind of data scientists and then get the most bang for their buck?
Luciano Pesci: So there is some high-level and some low-level advice on this. I think the low-level advice is that you should be looking at people as a complete picture. I think there’s a lot of talent that gets overlooked because it doesn’t check enough of the boxes that employers arbitrarily put on most of these job descriptions. And they do that because they don’t understand what they’re hiring for: they haven’t actually figured out the framework, the data that fits into it, and what’s going to be needed to handle that.
So, for example, there are really four functions within data science. There are those I call the “data guardians.” These are database administrators, maybe ETL people. They are the ones making sure that the data gets put into one place where it will be usable and where people will have access. They are managing that access. They are architecting it. They know what’s in a table if you need to know because you need to pull from the data. And that job is completely different from the statistical programming approach to data science, where you pull that data and use it to start running models and descriptives and doing visualizations. Those are probably the two core data science types right now.
There are two more that have emerged that are less frequently hired: the data detectives and the data storytellers. These are people who are really good at knowing what data matters, understanding the decision maker’s problem, tying it into that framework for the organization, and finding other sources of data that aren’t internally generated but might be useful. And then they can put that into very easy-to-understand information. Because even in organizations that have a framework and have been successful in hiring data scientists, if they get all that really complicated work done and they can’t convince the stakeholders and decision makers who are going to act on it to believe it’s true, or to understand it appropriately, maybe even to understand the limits of what it’s saying, then there’s still a chance of failure. And this is why 80%, four out of five data projects, actually fail. They’re not the Amazon success stories right now.
So to get back to your question of how to hire: once you understand your framework and how the data fits into it, the answer will be clear to you. Do we need people to aggregate data? Then we need a DBA. We need a data guardian. Do we have lots and lots of data and just need someone to really go through it and tell us what’s going on? You might need a data storyteller first. Most people think they have data; sometimes that’s not true. Sometimes they think they’re collecting it and they’re not. And then it’s just sitting there. So this is a top-down problem. Organizations have to figure out a direction, and then they can hire the appropriate talent.
Lisa Christensen: So it sounds like a lot of the disconnect there is coming from a misconception that big data means having data for data’s sake, and that, you know, somehow that helps the organization.
Luciano Pesci: That bigger’s always better?
Lisa Christensen: Yeah.
Luciano Pesci: There’s a really good saying, I don’t know if it’s a colloquialism or a folk saying, that “perfect is the enemy of better.” I think that’s really true with data analysis. There is this belief that if you don’t have big data and you don’t have the AI expert, you can’t be competitive with data. And that couldn’t be further from the truth. Most people don’t even need big data to start optimizing big things in their organization.
One of the nice things about being a teacher and having students I’ve stayed in contact with over the years is that I hear their stories. There’s one guy in particular who’s part of a local organization. I won’t name them, but they’re a great one. We go out to lunch maybe every quarter or two, and I call them million-dollar lunches because the punchline of the story is always, “And then I did this with data and it saved us half a million dollars.” Or it saved us a million dollars. He’s not doing AI with billions of rows of data. He’s doing that data detective role: going through the organization and hearing pain points where people have uncertainty, figuring out where there actually is a match in the company’s existing data, and if it’s not there, starting to collect it. And then he brings the information back to them in simple reports and they’re able to see something. It’s the information. All this obsession over data misses the point that it’s the information the data represents that’s supposed to be the useful part. And it doesn’t have to be big to do that.
Lisa Christensen: Going along with how fast computers have been developing and are developing, and how good our machine learning is getting, some people fear that someday computers are going to take over the world, that humans are going to become obsolete. In your presentation you did touch on that, and you talked about why machines and humans are both essential when you’re looking at data. Would you mind explaining that?
Luciano Pesci: Yeah. I think it’s kind of ironic that people are fearing machines so much. Now this is not new. I told you I like to dig into the history of technologies and their patterns. If you go all the way back you had people who feared the rail. They thought that moving at that speed pressed your organs against your back and your body and was going to kill you.
Lisa Christensen: Wasn’t there something about how the speed is going to rip the skin off your face?
Luciano Pesci: Yeah. Women were told not to ride bicycles because it would stretch the skin on their face. You can’t go back and point to a technology that didn’t start with a lot of fear. And sometimes that fear is founded. Steam engines, steamboats, one of the most revolutionary products in American history: 40% of the time they were blowing up.
Tesla, another good example, could not be selling its cars right now if two out of five blew up. So people had good reason to suspect that version one wasn’t going to be that great, but version two has consistently been better, and version three has been significantly better. So that fear was always there.
I actually think the only fear people should have around machines is a Skynet scenario, and how other humans will command it. Because right now, in a power structure, there are still humans involved, and they would refuse to do certain things. That will not be true of machines. They will follow code. That is the potential danger. It’s not that they’re going to think themselves into a superiority complex. It’s not that they’re going to threaten every single job. They will take on the jobs that humans cannot do repetitively at massive, massive scale. But machines don’t create, so far. Not in a true sense. They have to be fed information to then model that behavior on.
There’s this idea that we’ll get to the singularity: there was the world before computers were as good as humans, then there’s the world where computers are as good as humans, and then there’s the world where computers are even better than humans. I just don’t see why there has to be fear throughout that process, because the roles are very complementary, at least in the first two of those. And then I hope the third works out more like Star Trek, where it’s just abundance. But the part the human is still much better at than the machine is interpretation. A machine can process information significantly faster. It can just go through things faster. It can search a database faster. It can do discovery.
So IBM Watson became the basis for ROSS. ROSS is an AI “lawyer” that firms have hired, and it’s putting a lot of lawyers out of business right now in a way people aren’t paying much attention to. The process ROSS handles is just searching through text files in discovery for things like keywords. That can be done so much faster by a machine, so why do it as humans anymore? The part that’s really useful is: okay, once you have all that evidence put in front of you, what’s the case you’re going to make? What’s the story you have to tell from this information? The machines do not do that part for you. They just show you the cat that they found in the YouTube video. Then you’ve got to know the story of that cat and how that cat fits into your life, and how YouTube fits into your life. And humans are still far better at that than machines.
Lisa Christensen: So as with every other discovery or advancement, it’s people we have to worry about. Not machines, not tools.
Luciano Pesci: Some people. Yeah, I’d say some people.
Lisa Christensen: Some people.
Luciano Pesci: That’s the other part. That’s the potential of these machines. You’ve got things like facial recognition, you’ve got things like RFID. These technologies could be used for immense good just like they could be used for immense bad. The machine won’t determine that itself. A human would determine that.
Lisa Christensen: You have another lecture coming up this month. Do you want to talk about that?
Luciano Pesci: Sure. Yeah, it’s on August 18th, and the in-person and live online versions will happen at the same time. It will be in our neighbor building in Cottonwood Heights, at MasterControl. So if you just look up Emperitas, it’s the building right next to it. It will be about an hour of lecture and discussion, and then thirty minutes of open discussion. It’s really built just around education. The topic for this upcoming lecture is Getting to Quick Data Wins, because this is another reason why these data projects fail: perfect is the enemy of good. They set these really impossible-to-reach goals, that it has to do X, Y and Z within some time frame, and then when that doesn’t happen, they just dismiss it as a very expensive waste of money and time.
And again, if you take the approach where you think you know the answer and you’re going to build out the entire system all at once, then I agree: to go back to the probability, the odds are actually against you. But if you can break down your problems into smaller pieces that can be attacked very quickly and validated very quickly, then you will start to see the gains. And that is usually enough encouragement to break down people’s other hesitancies about transparency. And sometimes, by the way, people don’t want transparency because they’re not allowed to fail. That’s another thing with data that you’ve got to accept: you will fail at some point. You will run something wrong. You will interpret something wrong. The data will not be clean, so you shouldn’t just take it blindly at face value. You should discuss it and compare it to all the other evidence, the other information that you have. And if you can do that in quick bursts, you will see gains very quickly.
Lisa Christensen: Okay, well we will have information about that on our website. And thank you so much for coming in today.
Luciano Pesci: Thank you for having me.
Lisa Christensen: Thanks also to Greg Shaw for production help today. Be sure to reach out to us on social media at @UtahBusiness or email us at firstname.lastname@example.org. Also make sure to subscribe to UB Insider wherever you get your podcasts. Thanks for listening.