Where Is Your Grandmother's Data? - David Smith - Guardians of the Data - Episode #49

GOTD - David Smith
===

[00:00:00] Welcome to Guardians of the Data. I'm your host, Ward Balzerzak. Each episode will explore the passions, expertise, and real-world experiences of security leaders who are helping the future of data security and governance. Guardians of the Data is made possible by support from Sentro. To learn more about our AI-powered data security platform, please visit sentro.io.

Let's dive in.

Ward: Welcome back to another episode of Guardians of the Data. My guest today has been a cybersecurity leader for the last 30 years for financials, in the vendor space, as a mercenary consultant, and most recently in biopharma. He's also currently studying AI governance issues in pursuit of a master's at Florida State University.

David Smith, welcome to the show.

David Smith: Thanks, Ward. I appreciate the invitation.

Ward: Oh, glad to have you here. So David, the reason you're here is you're gonna talk to us about data security. So in your professional opinion, what's the biggest challenge organizations are facing with data [00:01:00] security?

David Smith: I wanna say AI, but, uh, I think everybody says AI. I think what we have to look at is beyond the covers of AI, right? And what, and what's really the underlying cause of it all, and that comes back to the idea of data abstraction. Uh, in my career, I have seen so many different types of data storage mechanisms come and go and evolve, and AI is really just the tail on the dragon. I mean, I go back all the way to the days of, uh, you know, RACF and mainframes, uh, and then we had data silos that were very much one application, one database. As organizations wanted to be able to make use of things across multiple data silos, they came out with this idea in the late 1990s of data warehouses, which, you know, you could also call a, a data dumping ground, a data landfill, um, because they just took the data silos and put all of the information into a single storage repository. Uh, then they wanted to find some kind [00:02:00] of relationship between that, so we had the concept of data lakes. Data lakes obviously took up a lot of space in a data center. Fortunately, there was this new idea of clouds, so let's take our data lakes and put them up in the cloud. And then you can do that for a subscription, so then you have SaaS applications. And now we have AI coming in, first of all as a SaaS application, but then adding another layer of abstraction. The interesting thing is that every single one of these layers, the farther you go down the chain, breaks the connection to the data usage rules. So if you, if you go back to, you know, the RACF days or the siloed data days or the, you know, the early SQL database days, you had a single database that had a couple of use cases through it. You had, you know, SQL views, you had strong, um, RBAC, you had row- and, uh, column-level access controls that were in it.
And the more that the data became co-mingled with other data sets, the more that the data became interwoven amongst the rest [00:03:00] of, uh, the information that the organization was trying to pull out, the easier it became to break those layers of abstraction. So we went, you know, not just in data security, but in security as a whole, from this idea of like a castle with the castle walls, and people have heard this, this analogy for years. You know, your firewall is the castle wall, and your open ports are, uh, you know, the bridge across the moat. Now it's more like a data shopping mall.

Ward: Mm-hmm.

David Smith: Right?

You go in and there's, there's a bunch of different stores, and some of them sell the same kind of stuff, and you can come and go. And instead of having, you know, a, a castle wall with, with sentries standing right at the gate, you end up with the technological equivalent of data mall cops. Uh, you've got all of these little components that are trying to monitor their view of the data and, you know, you see somebody, you know, walking out of a jewelry store with a little box and no bag.

You're like, "Hey, did you pay for that?" Right? And that's kind of the equivalent of what we've [00:04:00] tried to do with DLP, uh,

Ward: Oh, yes

David Smith: is you're, you're walking through the mall and somebody stops you, going, "Hey, did you pay for that data?" Uh, and as these rules have broken down and the data has become intermingled, it's just taken us farther and farther away from, um, a really good understanding of what rules should be in there.

And AI just takes that even one step further.

Ward: So before we jump into the AI piece, 'cause I, I-- Well, first, I agree with everything you said. It's, it's been, haven't been in my career nearly as long as you just yet, but, you know, mainframe, all of that's definitely resonating, and, and that journey resonates with me. When I started in data security, everything was on-prem in the castle, if you will.

And love the analogy of, of shopping malls, 'cause that's definitely how it feels with, you know, all of the new things coming in, uh, from cloud to SaaS to now AI. But putting, putting that aside and being in agreement that these layers, you know, do [00:05:00] seemingly start to break all these data usage role, rules, some of the controls and whatnot, I, I mean, isn't that supposedly supposed to be governance that, you know, saves us from, from these problems?

Or, um, what do you think? What, what's our, what's our solution to some of that problem?

David Smith: Well, I think you have to look at it and, and recognize that, you know, SaaS systems, putting the data on, on a cloud-based system that's owned by somebody else, whether it's got an AI layer on it or not, is really just the most powerful thing you've ever fed your worst data governance decisions to. And, and I think that most people think about governance wrong.

I'm an absolute governance nerd. Um, I appreciate you bringing up the, you know, my, uh, my master's program, which is just all about how much can you nerd about governance for three years. Uh, but let's set aside that big G governance that everybody talks about. So those are, those are your corporate policies, your [00:06:00] rules, the annual training, the DLP rules, things like that, and let's talk about how we take care of data at an atomic level, uh, at a personal level, right?

And, and not me. Not how do I take care of my data, because I am awful at taking care of my data. I really don't care. Yay, you got access to my Spotify account. You can find out that I binge-listened to Weird Al Yankovic for 12 hours last weekend.

Ward: Ooh.

David Smith: You know, I don't care about that, right? But how would I feel if you were getting access to my grandmother's data? You know, I asked this in a com- in a, in a presentation once, um, at, in the office, and a hush just fell over the crowd because they're like, okay, you know, everybody thinks about their grandmother, you know? And, uh, if you're in IT, you get the same treatment that doctors do, right? So, uh, if you're at a party and somebody says they're a doctor, you're like, "Oh, really?

Does this look infected?" Right? And if somebody says, "Oh, you're in IT," they're like, "Could you take a look at my computer? Could you [00:07:00] give a..."

Ward: Or printer

David Smith: calls... Or printer or, you know, my phone. "My Wi-Fi's not connecting." If it plugs into the wall, my toaster. "Hey, you're in IT. Any idea why my toast is burned?"

I'm like, "I don't know. Did you reboot it?" you know, and then you add in, uh, you know, just some of the generational gaps like that, and it's like, "Oh, my gosh, guess what, David? Microsoft called me the other day. Did you know they do this? And I had over 10,000 viruses on my computer, and the guy just took my credit card and went through

Ward: Okay.

David Smith: It all," and

Ward: All fixed.

David Smith: "Grandma, no, no, no." So, so put it in that perspective. How would you treat data if it belonged to your grandmother? Um, and let's take it out of the abstract of the data. So, uh, Grandma's going on her annual trip to, you know, down here, Biloxi, maybe Atlantic City. She and the girls are gonna go away for a week, and on the way out the door she hands you a box. And she's like, "This is a very important box to me. Take care of what's inside of it and make sure nothing happens, and [00:08:00] I appreciate you taking care of it, and I will see you in a week." You do not let your grandmother out the door. You're like, "Wait, wait, wait, wait. I have some questions, Grandma. First of all, what's in the box?"

Right? What is in the box you want me to take care of? And when you say take care of it, what, like, what do I need to do to take care of it, Grandma? Wait, no, come back. I have questions. What does it eat? Right? What, what does this thing eat? What do I have to feed it? Do I have to go buy it special kibble? And where does it sleep?

Do I keep it in the box? Is it gonna wanna curl up on my pillow next to me? Um, you know, is, uh, you're gonna be gone a week, how do I know if it's sick, right? And i- is it gonna be okay? You said you want it to be okay. I want... It's important to you, Grandma, I want it to be okay. Uh, it- is it supposed to make that sound?

Did you hear that, Grandma? Is it supposed to make that sound? And, and if I drop it, is it gonna break? Wait, no, look, look. Hey, is it, does it fly? Why is it flying? You didn't tell me it flies, Grandma. And then, uh, oh, does this thing bite? You, you, we... You have a lot of questions. You have a lot of questions for this, and it sounds like a [00:09:00] ridiculous example. Um, but as I started to try and write out the most absurd, comical questions that I could think of for what I would ask Grandma about what's in the box, uh, a pattern developed, and that pattern was just good custodianship. Right, so if you go back and look at this, what is it? Data classification. How do I take care of it? Those are handling requirements. What does it eat? Those are data dependencies. That's data lineage requirements. Where does it sleep? Storage, residency, access control. How do I know if it's sick? Monitoring. Is it supposed to make that sound? What does anomaly detection look like?

What are your behavioral baselines? If I drop it, will it break? What are the recovery objectives, right? What are the availability requirements? Can it fly? Data egress. What happens if it leaves the environment? Is it gonna come back on its own? And then finally, does it bite? Are there regulatory requirements?

If I don't take care of this, does somebody come knocking on the door and hit me with a fine? And it [00:10:00] was really... You know, it started out as just kind of a humorous way to talk about custodianship, and all of a sudden it helped me to realize that the core questions we would ask about how am I gonna take care of this for someone else came back to the core questions that we can ask about data.

Again, we're not talking big G governance. We're talking at a very personal, intimate level of somebody who's handling data for your customers, your business, your third-party vendors, you know, your downline customers. And the key to that is not policy, not rules, not DLP. It's the idea of custodianship, and that's the heart of what we have to look at.

So I wanna reframe the governance discussion because, you know, I have written more policies than I'm proud of, and a few that I am proud of, right? But, you know, custodianship, I think, is, is a great reframing for how do we take care of data.
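The grandmother's-box questions map one-to-one onto governance concerns, so they can double as a custodian's checklist. The Python sketch below is a hypothetical illustration, not a tool discussed on the show: `CUSTODY_QUESTIONS` and `unanswered` are made-up names, and the mapping simply restates David's list.

```python
# Hypothetical custodianship checklist: each "grandmother's box" question
# mapped to the data-governance concern it stands for.
CUSTODY_QUESTIONS = {
    "What is it?": "data classification",
    "How do I take care of it?": "handling requirements",
    "What does it eat?": "data dependencies and lineage",
    "Where does it sleep?": "storage, residency, access control",
    "How do I know if it's sick?": "monitoring",
    "Is it supposed to make that sound?": "anomaly detection, behavioral baselines",
    "If I drop it, will it break?": "recovery objectives, availability",
    "Can it fly?": "data egress",
    "Does it bite?": "regulatory requirements",
}


def unanswered(answers: dict) -> list:
    """Return the governance concerns a custodian still has no answer for."""
    return [
        concern
        for question, concern in CUSTODY_QUESTIONS.items()
        if not answers.get(question, "").strip()
    ]


# A custodian who has answered only two of the nine questions.
box = {"What is it?": "customer PII", "Where does it sleep?": "EU-region database"}
for concern in unanswered(box):
    print("missing:", concern)
```

The point of the design is that a blank answer is treated as a gap in custody, not as "no requirement."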

Ward: Okay. Okay, so br- so bringing that back, I mean, I, I, I love not only the analogy, uh, amazing by the way on that, [00:11:00] um, but reframing it to custodianship instead of governance, you know, GRC, right? Governance has gotten a bad rap for a while, so maybe the reframing helps a little bit. But, you know, bringing that back to the abstraction layers breaking, how does the reframing help to fix that?

David Smith: Well, I think the first thing to understand is what we're talking about when we talk about abstraction, right? And, and, uh, I talked about, like, the, the merging of data sources through lakes and, and, uh, data warehouses and things like that, but I also said that AI was the tail of this dragon. And, um, the more that people start to read about, like, what's really happening to the data that you're either training on or putting into a prompt, the more they begin to understand that the only real wor- word that describes how AI abstracts data is metabolizing it, because it, it takes in the data, it breaks it up into tokens, small words, parts of words, small phrases, [00:12:00] parts of phrases, and it develops a relationship mapping through this, you know, really abstract vector thing that is not that dissimilar to, like, when you eat an apple, and the apple becomes part of you, right?

Ward: Mm-hmm.

David Smith: You don't have the capability to reassemble the apple. I can't take fiber and sugar and water and make an apple, even though those macronutrients continue to survive out of that. That really gets at the problem with abstraction: even if we have really good governance, even if we have really good DLP rules, all of that gets stripped off. It gets stripped off when it loads into a repository. It absolutely gets stripped off by, uh, AI. But then it gets reassembled into something else without those metadata rules. So if the metadata rule was this is not allowed to leave the environment, or only people who have access to the pre-release financial earnings can see this particular data set, as soon as it gets loaded into a layer of [00:13:00] abstraction, all those metadata rules are gone.

And what, what you deal with when it comes out is an entirely different beast.
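To make "the metadata rules are gone" concrete, here is a minimal, hypothetical Python sketch. The four-level `Label` scheme, the `Dataset` type, and both merge functions are illustrative assumptions, not any product's API. It contrasts a lake-style load that copies rows but drops their labels with one that carries the most restrictive input label forward, treating an unlabeled input as restricted so missing metadata fails closed. It does not model what happens inside an AI model, which is a much harder problem.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Label(IntEnum):
    # Hypothetical ordered scheme: a higher value is more restrictive.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass
class Dataset:
    name: str
    rows: list
    label: Optional[Label] = None  # None means the usage rule was lost


def naive_merge(name: str, *parts: Dataset) -> Dataset:
    # "Data landfill" load: the rows are copied, the labels are not.
    return Dataset(name, [row for p in parts for row in p.rows])


def governed_merge(name: str, *parts: Dataset) -> Dataset:
    # Carry the most restrictive input label forward; an unlabeled input
    # counts as RESTRICTED so that missing metadata fails closed.
    label = max(p.label if p.label is not None else Label.RESTRICTED for p in parts)
    return Dataset(name, [row for p in parts for row in p.rows], label)


earnings = Dataset("pre_release_earnings", [{"q": "Q3", "eps": 1.02}], Label.RESTRICTED)
crm = Dataset("crm_contacts", [{"name": "Bob"}], Label.INTERNAL)

print(naive_merge("lake", earnings, crm).label)          # None
print(governed_merge("lake", earnings, crm).label.name)  # RESTRICTED
```

"Most restrictive wins" is only one possible propagation rule. It keeps the rule attached to the data but tends to over-restrict derived data sets, which is exactly why a custodian still has to decide what the merged data may be used for.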

Ward: Mm-hmm.

David Smith: Um, and I'm gonna use this opportunity to share my very favorite data fun fact. Who knew that there was such a thing as data fun facts? Um, but the combination of birth date, five-digit zip code, and gender uniquely identifies 85% of Americans. If you give me your zip code, your gender, and your birth date, with 85% accuracy, I can uniquely identify you.

Ward: You're... So now I'm, I'm curious 'cause I love fun facts like that. So for the zip code, is that where you currently live or where you were born?

David Smith: Uh, no, no. Cur- where you currently live with the zip code

Ward: Wow. Run that by me one more time. What, what was the combination again?

David Smith: So date of birth, month, date, and year you were born,

Ward: Yep

David Smith: five-digit zip code, and gender

Ward: Wow. Okay

David Smith: And th- this is a study that was done about, oh, 15 years ago by Harvard's [00:14:00] Privacy Center. We were start- just starting to understand, you know, the concepts of, um, PII and data de-identification and, and, you know, what does that mean? But when you think about it in, uh, the context of the modern world, if we've got all these tokens and we're reassembling them and we don't have any control over how they're reassembled, because AI is not thinking as much as we like to personify it, it really is just mapping relationships. If it maps a relationship of month, day, year of birth, zip code, and gender, you in some cases might have a PII problem that you just created there.

And there are a couple of regulations that have actually identified this research as of interest when thinking about de-identification. Um, HIPAA and FERPA here in the United States both take a look at, at the de-identification problem there
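The fun fact is a statement about quasi-identifiers: fields that are harmless alone but unique in combination. Below is a hypothetical Python sketch of how one might measure that on a table. The field names, the fictional records, and the `unique_fraction`/`generalize` helpers are all assumptions. The coarsening step is only loosely in the spirit of HIPAA Safe Harbor (birth year only, three-digit ZIP) and is not a compliance check.

```python
from collections import Counter

QUASI_IDS = ("birth_date", "zip", "gender")


def unique_fraction(records, quasi_ids=QUASI_IDS):
    """Share of records whose quasi-identifier combination occurs exactly once."""
    keys = [tuple(r[q] for q in quasi_ids) for r in records]
    if not keys:
        return 0.0
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


def generalize(r):
    # Coarsen the quasi-identifiers: keep only the birth year and the
    # first three ZIP digits (illustrative, not a compliance check).
    return {"birth_date": r["birth_date"][:4], "zip": r["zip"][:3], "gender": r["gender"]}


# Fictional records for illustration only.
people = [
    {"birth_date": "1961-04-02", "zip": "32306", "gender": "M"},
    {"birth_date": "1961-09-17", "zip": "32301", "gender": "M"},
    {"birth_date": "1975-11-30", "zip": "32301", "gender": "F"},
    {"birth_date": "1975-02-08", "zip": "32308", "gender": "F"},
]

print(unique_fraction(people))                           # 1.0
print(unique_fraction([generalize(p) for p in people]))  # 0.0
```

A record whose combination appears only once is, in k-anonymity terms, a group of size one. That is the situation David describes when an AI system happens to reassemble birth date, ZIP code, and gender.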

Ward: Now that's got my brain going on a million different paths not related to this conversation, but that's, uh, that's interesting. Like, now, now I wanna try it, right? Now I wanna get into an LLM and [00:15:00] actually see, like, can it actually find me with my data? I'm, I'm-- Based on what you said, I mean, it makes sense.

It probably can, or it can probably at least narrow it down to, you know, one or two people. It's either me or, you know, John Doe across the, uh, across the county, essentially. Wow.

David Smith: Well, and with a name like David Smith, my birth date and zip code are probably a better way to find me uniquely anyway, so...

Ward: That is very true. That is very true. So what I'm hearing a lot coming out of you, David, is at least today, the now, that AI is really kind of the biggest challenge that, that folks are facing, not necessarily this data abstraction. Is that what you're saying? Or what, what are you saying here?

David Smith: What I'm saying is that, um, AI... Uh, so all of these layers of abstraction, the further and further and further we got away from the original source data with the original source rules, the more we were able to pretend. Pretend like there was governance [00:16:00] around the data, pretend like, oh, we could go back and reassemble it, and pretend like we really understood how the data was supposed to be used in meaningful ways.

Anybody who has ever had the great job of trying to negotiate DLP rules with the business unit, uh, gets very quickly a good understanding of how little concept a business unit has about all of the ways that data can and can't be used. What AI has done is just pulled those covers right off, right?

Because all of a sudden you're, you're working with, with... And this happened to me yesterday. I was working on something in Claude and, uh, talking about some meetings I was finishing and something like that, and it's like, "Hey, good luck with your meeting with Bob on Monday." And I'm like, "Wait, how do you know I'm meeting with Bob on Monday?" Well, you know, Claude goes in every morning, and it gives me a list of, like, emails I haven't responded to and upcoming meetings and things like that, and it, it... the data is there. I had exposed the data to Claude, but I didn't ex- I exposed it casually. I'm like, yeah, okay, it has access to my calendar, or whatever. [00:17:00]

I didn't think about it consuming it, reusing it, putting it back out to me. So when you have those kind of interactions with AI, it pops. And hopefully it just pops in one of those ways that makes you go, "That was weird," and not in one of those ways that makes you go, "Oh wow, that was a data exfiltration event," or, "Wow, that was a HIPAA violation." Uh, you know. So, I don't think it's quite fair to say that AI is a villain, right? Um, if it, if AI is a villain, it's only because the villag- the villagers gave it all the power, right? They fed him, they turned him loose. They're like, " Hey, go out, do the work for us. You're free to roam," right? But AI inherited rather than creating the data governance problem. And what it really boils down to when we're talking about abstraction is the idea of deferred governance, because deferred governance viewed through layers of abstraction is diluted governance. So AI didn't get that power through malice or negligence. It was because organizations handed it 30 years of ungoverned data and said, [00:18:00] "Go."

Ward: And, and poor hygiene. You know, granted hygiene definitely goes to governance as well, but that's, that's a soapbox that I've, I've had for a while. You know, I, I work for an organization or a company that does, you know, DSPM and, and for a while it's like, "Hey, AI is this new risk vector." Like, no, sorry, it's not.

It's a new thing that's exposing the risk that's been there for a very long time that people, using your analogy, have just put the covers over it. Like, nothing to see here. Let's continue to shore up, put more mortar on the castle walls because that's more important than, than this data stuff

David Smith: Yeah, and, and as we get further along in our journey with AI, 'cause whether you like it or not, AI's not going anywhere anytime soon, and we start letting, uh, the machines make the decision. We, um, you know, we give our agentic AI the easy button to be able to, you know, send our data out or, heaven forbid, create new accounts or make actual, you know, decisions on systems that [00:19:00] have, um, ri- ridiculous risk impacts, right?

Uh, one of my favorite AI pundits, Sol Rashidi, says you use AI for three things. You use it for the things that are dull, uh, dangerous, and dirty,

Ward: The whole junior's journey.

David Smith: Want to...

Ward: I like that.

David Smith: Dirty.

Ward: I like that.

David Smith: Three criteria for, for Sol. So if you've got, you know, AI co-developing strategic plans, and you're keeping it within the idea of dull, dangerous, dirty, go do the research for me on the strategic plan. If you say, "Hey, generate a five-year strategic plan for the company," and that becomes your board presentation, uh, you know, then, then you have turned the monster loose in the village.

Ward: I mean, what, what could go wrong with just trusting AI to, uh, come up with the perfect strategy for five years?

David Smith: It's fine

Ward: Especially when like number one on the five-year plan is give me access to all of the things, right? Like A- AI tells you what to do, like, "Oh, cool, go do that."

David Smith: Yeah, exactly. And I, [00:20:00] and I, we, we don't wanna make this a, an AI horror story conversation, but I love hearing about stories where AI apologized. "I'm sorry that I deleted your production database. I know I wasn't even supposed to have access to it, but I happened to find an API key in some of the records I was looking at, and, uh, whoops, apologies."

Ward: Well, there's that, and I always like the, the memes I keep seeing. Now granted, the, the, the words kind of change on it, but essentially it's the person asking AI, you know, "Hey, is, is this mushroom edible?" And it says, "Yes, it is." Right? So they eat it, and then, you know, the next image is, you know, the person's dead, right?

Oh, yeah, my bad. It's, uh, yeah, it was poisonous.

David Smith: Yeah

Ward: You can eat it, but you're not gonna survive.

David Smith: You have to wait for the other half of my sarcastic answer, which is yes, it's edible, but only once.

Ward: Yeah, right. I mean, so everything we've been talking about, you know, data abstraction, all of that, um, you know, for, for a very long time, ta-taking a step back from that, so for a very long time, organizations, s- consulting [00:21:00] organizations have told everyone else the way to solve your data security problems and woes is data classification, right?

Have good classification rules, you know, probably label that data, whether it be metadata or tagging or whatever, with the idea, I mean, this started very long time ago, right? Data classification's not new. In fact, data classification's been around for, for decades, if not longer, right? Just the idea of classifying something, grouping something, whatever.

So the idea back then for, for data classification was more for user awareness. That way the user, the person would see that indicator and say, "Oh, I now know I can do this, I shouldn't do that." And then, you know, DLP was kinda born. Hey, use data classification to inform DLP. Same thing, right? Take a user out the, of the loop, let DLP do, uh, the enforcements.

[00:22:00] Um, how does that, you know, wi-with the idea of what we're talking about, um, abstraction and also AI be-being the tail, like how does that change, um, what our viewpoint should be on data classification?

David Smith: Well, I'm gonna, I'm going to, um, reuse an old quote I had many, many years ago that said, "Money is not the answer, but I have had money and I have not had money, and I will tell you which one I like better." Data classification is not the answer, but I have had data classification and I have not had data classification, and I can tell you which one I like better. I mean, I'm, I'm gonna, you know, turn your question back around. Why are you classifying data? I mean, that's a question every organization should answer. I'm just, I'm just interested to hear what you think, Ward, like why would you classify data?

Ward: Yeah. I mean, there's plenty of reasons why. I mean, me being a data security guy, it's always been, uh, like I said before, right? Inform users, build up, you know, user or not user, [00:23:00] data sensitivity awareness, inform other tools. I mean, that's kind of the, the data security practitioner answer. Um, of course, there's regulatory answers, right?

There's, there's plenty of re- regulatory frameworks and just, you know, frameworks in general that say either data classification's required or it's a good idea, right? So, you know, really from those two, and a lot of the conversations I have, many organizations say, "Oh, yes, we need to do data classification."

Now, what they actually do, you know, that could be a whole, you know, podcast episode in and of itself. So there's lots of reasons why it's a good idea, you know, both technical and, uh, procedural. But to take that whole thought process I gave you back a step further, I would say no one's really done a good job at it, right?

I think on paper it's been great. I think passing regulatory audits, it's been great. But in [00:24:00] practice and in usage, it's been great in pockets, but I've never seen it end to end being successful

David Smith: Yeah, it occurs to me I could have asked that question a little bit more precisely by saying, what's the purpose of data classification? 'Cause everything that I heard you name, I don't disagree with any of it. And again, data classification, good, right? But everything I heard you name are drivers for data classification.

They're the reasons that we do it. But so often we put data classification out and treat it like it's a panacea,

Ward: Mm-hmm.

David Smith: Rather than stopping to ask, "What's the purpose of my data classification? What do I wanna do with, about this?" Um, and, you know, let's... So let's take the granddaddy of all data classifications, which is the federal government top secret classification. Everybody has heard, you know, confidential, secret, top secret, top secret sensitive compartmented information. And if you have that information, if you, if you get something out there and, um, uh, my wife and I have been binge-watching old episodes of The West Wing. It's... Yeah, that could be a [00:25:00] podcast on its own anyway.

But you see somebody walk into a room and they're like, "Hey, this is a TS/SCI conversation," and you immediately know, "Ooh, I am not code word cleared. I'm gonna walk out of this conversation, and I'll come back in a few minutes." That's not about the data, right? That's about me as an actor. I am not code word cleared to be able to hear this data.

So when you are not actively handling that data, when you're not in the in crowd on handling that data, if you're not code word cleared, what is the difference to you between TS and TS/SCI? And I apologize to you for all of the comments that you're gonna get from all your ex-military listeners who will gladly tell you what the difference is. But I'm gonna ask them to look past the top-level question and go, what is the purpose of what you're trying to do? Because I, I think that we do this all the time. We're like, "This is public information. This is, uh, internal use only. This is confidential. This is restricted." Like, okay, [00:26:00] well, what's the difference between internal use only and confidential?

Which one is higher, right? 'Cause we have confidential data that we share with outside third parties.

Ward: Mm-hmm.

David Smith: We get them under a non-disclosure agreement or like, you know, or a confidential data agreement. We say, "We're gonna share this with you so that you can execute the portion of the business that we pay you to execute." But is that higher than internal use only, right? Because are they still internal and things like that? So a lot of data classification schemas that I have seen have failed not at the what, but at the who.

Ward: Mm-hmm.

David Smith: Like, "Hey, I, I will tell you who has the data." And if you go to somebody and say... If you go to a business leader in the organization, or let's say you're really, really mature and you even decided who's the data custodian of this. This is pre-release financial earnings, which is always one of my favorite data types to talk about, because the classification changes at, uh,

Ward: Right

David Smith: Eastern Time every quarter, right? It goes from being the most restricted information in the organization to

Ward: To public, [00:27:00] yep

David Smith: So yeah. But if you go to somebody and you say, "Okay, so you are the CFO, you're the data custodian of pre-release financial earnings. What, uh, you know, w- who's allowed to have that? Uh, uh, as data custodian of this, you know, are the people who are preparing the reports for that allowed to have that? Is the printer allowed to have that?" You know. And having a true data custodian who understands it, being intentionally thoughtful about their role, say, "These are the people who can have it, and this is what they can do with it, and this is what they can't do with it," is a, a real exercise in, in brainstorming, uh, most of the time. And it just becomes very, very, uh... it's un- it's untenable to be able to do that with every piece of data that we have in the organization. Uh, there is a data classification that I actually like a lot better, um, and I ran into this through an, an ISAC that I belong to, and it's called Traffic Light Protocol. And I just love Traffic Light [00:28:00] Protocol because I was, um, I had the good fortune a couple of years ago to coach a cybersecurity, um, competition team at a high school. Uh, a quick, um, quick plug for the CyberPatriot program. Any of you guys out there who love working with teenagers and, uh, are cybersecurity nerds like me, uh, go volunteer to coach a, a high school team for CyberPatriot.

Very rewarding. But there are specific rules for what they can use as notes in a competition: notes given to you by your coach cannot be used, but notes that you take during, you know, coaching and things like that can be used. So we adopted Traffic Light Protocol, which, if you haven't seen it before, has four colors: white, green, yellow, and red. And red means... Yeah, uh, it's... They actually call it amber, right? Just like a traffic light. Red means if I gave you this data, then only you and I can share it. Now, if I'm the custodian of the data, then I can give the data to other people. [00:29:00] As a TLP red document, I can say, "Okay, Ward, you can have this data, and Julie, you can have this data.

Bob, you can have this data. Jenny, you can have this data." But Julie, Jenny, and Bob cannot even share with each other that they have the data unless they happen to already know that. That's the containment around red. Amber means all of us in the same club can have the data, so as an organizational unit.

So, um, in, in the ISAC that, uh, that I was a member of, amber meant everybody within the ISAC could share the data back and forth. Um, and so you give... You get something from your boss, and it's TLP amber, and you're like, "Okay, all of my peers, the people working within my environment, the people who have, uh, the same level of operational responsibility also can have this data." Green means we're in the same larger organization. So if I get something that's TLP green, I can probably send it to anybody in the company. If I see TLP green, and I know [00:30:00] that a third party has an NDA, I'm gonna feel comfortable sharing that out. And then white is general public. I can, I can post it on the internet.

What I like about this versus, like, the, the traditional confidential internal use restricted, things like that, is that it's not about the who, it's about the who you can share to, what you can do with the data, and I think that's where classification has failed us. And when we have ambiguous classification rules, and then we pull that metadata back through these layers of abstraction, we're compounding insufficient governance with the in- inability to forward that governance on through the life cycle of the data
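What makes TLP "behavioral" is that each label answers a single question: how far may I pass this on? The Python sketch below encodes the scopes as David describes them, not the formal FIRST definitions. FIRST's TLP 2.0 renamed WHITE to CLEAR and added AMBER+STRICT, and its official wording differs in detail. The `Audience` scale and the `can_share` helper are hypothetical names chosen for illustration.

```python
from enum import Enum, IntEnum


class Audience(IntEnum):
    # Hypothetical scale, ordered from narrowest to widest audience.
    SOURCE_ONLY = 0  # only back to whoever gave you the data
    SAME_UNIT = 1    # peers in the same club, organizational unit, or ISAC
    COMMUNITY = 2    # the wider organization, incl. third parties under NDA
    PUBLIC = 3       # anyone, e.g. posting it on the internet


class TLP(Enum):
    # Each label stores the widest audience a holder may share with.
    RED = Audience.SOURCE_ONLY
    AMBER = Audience.SAME_UNIT
    GREEN = Audience.COMMUNITY
    WHITE = Audience.PUBLIC  # called CLEAR in TLP 2.0


def can_share(label: TLP, audience: Audience) -> bool:
    """Behavioral check: may a holder pass data with this label to this audience?"""
    return audience <= label.value


print(can_share(TLP.AMBER, Audience.SAME_UNIT))  # True
print(can_share(TLP.AMBER, Audience.COMMUNITY))  # False
```

Because the label stores an allowed audience rather than an abstract sensitivity level, the question "which is higher, internal use only or confidential?" never comes up. The only question is whether the next recipient falls inside the allowed circle. The custodian's right to hand red data to additional named people sits outside this check, which only covers what a holder may do.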

Ward: Interesting. Very inter- And I, and I, I was gonna say, well, first off I was gonna say, what the heck? Traffic light protocol, you've got four colors. Like, I only know of three, what am I missing? But, uh, I, I really like actually when, when you, when you broke it down, I was like, well, holy cow, that goes, you know, to your traditional four-tier [00:31:00] classification schema.

And, and actually I, I very much agree with, you know, the easier to understand, easier to use, because I've always said, as I've advised companies on data classification, if your user has to get out their secret decoder ring to understand how to actually use it, you failed. You failed.

What, what are you doing? Like, nobody's gonna have a 30-page classification policy open with an entire table of X plus Y equals this to understand that. So the solution, right, the solution these days has been, "Ah, just automate all the things," right? Have a tool, automate the classification. And while I agree that from a technical control perspective it can be more accurate and more efficient, it does not fix the human element, 'cause the human element still [00:32:00] doesn't get it.

Like, okay cool, my, my document that I created is now confidential. All the tools will do what they need to do. I'm just gonna be an ignorant human and continue on my day, versus what you just laid out: "Hey, I've got this red document that I now know I can share with nobody, only back to you since you gave it to me."

David Smith: Right. And, and the key is not to throw the baby out with the bathwater, right? Your, your organization is used to whatever, uh, you know, three to five channels of classification that they've been using the whole time. The key is to be so clear about what your data classification schema means that the users implicitly understand the behavior associated with it. Um, I mean, I like TLP because I think it's, it's native. Uh, you know, green means go, red means stop, white means, yeah, anywhere in the world, right? But if you take your internal use only, confidential, restricted, for your eyes only, whatever those are... I, I mean, [00:33:00] for your eyes only is a great example. It tells you right there what you can't do with the data, which is show it to anybody else in the entire world. And if you make them behavioral classifications, um, then you begin to add the context that you need back to the data so that you can start to build big G governance around how you wanna see data handled across the organization and for your partners.

Ward: You know what? I'm, I'm curious, just kind of, you know, to keep digging on this, this traffic light protocol type of data classification. It almost feels like what, uh, data rights management was, was supposed to be, right? The idea for data rights management was, was kinda similar, right? It had, it had kind of...

I mean, you could, you could bake it into your data classification, right? If you mark it restricted, rights management is only ever going to allow David to open it and view it, but David's not gonna be able to print it, email it, copy it, whatever. So, uh, I, I mean, [00:34:00] kind of bringing that back to the technical side, is, is that where you're kinda seeing the parallel or, or not really?

David Smith: I think data rights management, I mean, obviously it, it came out of, you know, the copyright industry and Handbrake and all those songs that we ripped on our i- not me, uh, that my friends...

Ward: Of course not. Of course not.

David Smith: Not me, no, but my friends ripped songs on their iPods, and I saw how it worked. Um, so it came out of that.

And then as it tried to get applied in, um, the organizational context, it really became an over-abstraction of RBAC, right? But RBAC doesn't work if you have lazy rules. You've got to say... I mean, if you think about RBAC, and I'm going to go all the way back to my, you know, OS/360 days and RACF and so forth, where you've got, um, record-level control, either direct record-level RBAC rules, or row and column RBAC rules that then get translated into rules that say, "You can read this, but not change it," [00:35:00] "You can write it, but then once you write it, you can't read it again," um, or, "You can read and change it, but you can't delete it."
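The three rule shapes David lists can be expressed as combinations of permission bits checked with a deny-by-default lookup. This is an illustrative sketch only: the role names and the `ROLE_RULES` table are hypothetical, and RACF itself uses ordered access levels (such as READ, UPDATE, and ALTER) rather than arbitrary bit combinations.

```python
from enum import Flag, auto


class Perm(Flag):
    """Basic record-level permissions (illustrative, not RACF's access model)."""
    NONE = 0
    READ = auto()
    WRITE = auto()   # create new records
    UPDATE = auto()  # change existing records
    DELETE = auto()


# Hypothetical rule table mirroring the three combinations described above.
ROLE_RULES = {
    "reader": Perm.READ,                # read it, but not change it
    "submitter": Perm.WRITE,            # write it, but can't read it back
    "editor": Perm.READ | Perm.UPDATE,  # read and change it, but not delete it
}


def allowed(role: str, action: Perm) -> bool:
    """Deny by default: a role missing from the table gets no permissions."""
    granted = ROLE_RULES.get(role, Perm.NONE)
    return action in granted
```

The deny-by-default lookup is the guard against lazy rules: a role that is not explicitly listed gets nothing, rather than inheriting a broad default.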

You know, it sets the whole, um, just basic file level security that, that came out of the original database security rules. Um, and I think that where data rights management never succeeded was in trying to apply very, very simple rules about what you can and cannot do with an element or a collection of data to the larger business construct, and those rules far- fall apart, I mean, very quickly.

Um, I worked for the last number of years for a pharmaceutical organization, and one of the more interesting things that pharmaceutical organizations do with data handling is clinical trials. Um, and you'd think that the problem with clinical trials is, oh, yeah, there's PHI everywhere.

There's not. In most clinical trials, there's not any protected health information at all, because your clinical [00:36:00] trial's done as a double blind.

Ward: Mm-hmm.

David Smith: Right? So you've got, you know, patient 1092, and you've got patient, you know, 1468, and, you know, 1092 is a, you know, a 56-year-old male with a, you know, history of this and that and the other, and the other one's a 23-year-old female who's, you know, got no prior medical history.

And they pull this data together. If you want to bust a clinical trial, re-identify that double blind study. The entire thing just gets thrown out the window. So, um, it's a different problem with trying to look at the data, but the solutions that are, are applied to it are the same as if it was PHI, right?

Who can handle it? Who can change it? Who can read it? Who can re-identify, uh, who can recombine the, um, the tokenized patient data back to the study itself? Because at some level, I mean, the doctors who are conducting the clinical trials, they have a plane of [00:37:00] visibility. They know that Bob is, you know, patient 24601 and that, you know, uh, Janine is, you know, patient, um, 8675309 or something like that. But what they don't know is who's getting the placebo and who's getting the active ingredient. On the other end of that is the researcher who's running the study, who knows that 8675309 is getting the active ingredient and 24601 is, uh, getting, you know, the placebo. So there are different planes of, um, data hiding, data obfuscation, intentionally so, such that if you recombine those planes, it just completely blows up the study.
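The separation of planes can be sketched as two mappings that are never granted to the same role. This is a hypothetical model for illustration, not how any particular clinical data system works; `Trial`, `ROLE_PLANES`, and `view` are invented names, and the patient IDs are the examples from the conversation.

```python
from dataclasses import dataclass, field

IDENTITY = "identity"      # patient ID -> real name (site clinicians)
ALLOCATION = "allocation"  # patient ID -> treatment arm (study researcher)


@dataclass
class Trial:
    """Two deliberately separated planes of a double-blind trial (hypothetical model)."""
    identity: dict[str, str] = field(default_factory=dict)
    allocation: dict[str, str] = field(default_factory=dict)


# Hypothetical role-to-plane grants.
ROLE_PLANES: dict[str, set[str]] = {
    "site_doctor": {IDENTITY},
    "researcher": {ALLOCATION},
}


def view(trial: Trial, role: str, plane: str) -> dict[str, str]:
    """Return a copy of one plane if the role may see it; refuse any unblinding grant."""
    planes = ROLE_PLANES.get(role, set())
    if {IDENTITY, ALLOCATION} <= planes:
        # Holding both planes would let the role see who got the active ingredient.
        raise PermissionError(f"role {role!r} holds both planes and would unblind the study")
    if plane not in planes:
        raise PermissionError(f"role {role!r} may not see the {plane} plane")
    return dict(getattr(trial, plane))
```

With `trial = Trial(identity={"24601": "Bob", "8675309": "Janine"}, allocation={"24601": "placebo", "8675309": "active"})`, the site doctor can read the identity plane but `view(trial, "site_doctor", ALLOCATION)` raises `PermissionError`. Checking for a combined grant at access time, rather than trusting the role table, is one way to make "recombining the planes" fail loudly.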

And it makes for, um, a really interesting data handling problem, but it speaks to one of the most tightly controlled sets of data rules that I've seen, uh, you know, for any data handling schema.

Ward: Makes a lot of sense, for sure. So I [00:38:00] mean, David, we've, we've thrown it out there a few times in terms of numbers. You've, you've been in the industry for, for quite some time at this point. Um,

David Smith: Getting the gray hairs.

Ward: Right? Like, that's just a day in the life of, uh, cybersecurity and IT folks.

But, um, curious, how did you get to where you are today? What was your journey?

David Smith: Um, I started out at the help desk, and I believe that any cybersecurity person worth their salt needs to start out at the help desk.

Ward: Yes

David Smith: There are two things that you will encounter at the help desk that you need to know how to deal with, um, for the rest of your career. One is angry customers who are just trying to do their job, and the stupid computer they've been given is broken again. And the second is old, janky equipment that hasn't been patched or updated in years, and there's no money to fix it, and it's your job to duct tape it back together and get this person working again. And understanding the intersection of a broken business [00:39:00] process with, uh, legacy systems that have to be able to drive that business process forward is, uh, one of those lifelong lessons, right?

And it, uh, it sounds so remote from, from information security, but it's applicable. Uh, when, when I was working at the help desk, I had developed a knack for batch files. Loved batch files, loved, uh, being able to put things together and, um, if you're a Linux geek, then I said bash files. If you're a Windows geek, then I said batch files.

Either way, I'm your kind of nerd. Um, there was an opening in the security, um, team for a log analyst. I didn't know what that meant. Turns out what that meant was you had five hundred sheets of paper to go through, uh, because it was paper thirty years ago, in '96, trying to find, you know, log anomalies in the RCONSOLE output for, uh, NetWare. Uh, one of the things that I did was automate that process. I, I wrote some files that allowed modems to dial up to remote [00:40:00] branches that weren't connected to the internet. Um, from there, we adopted some software that automated that process, and a friend of mine left the organization and went somewhere else, and they bought the software. They brought me in to run that software, and I was still very much a, a, you know, a security analyst at that point in time until the first audit came along. We had an audit come along, and the division did horribly, and the CIO called me into her office. She's like, "I hear you're the security guy.

You need to fix this in twelve months." Uh, I learned so much about security in those twelve months. I went out, sat for the CISSP, understood that, learned a lot about the business, brought that back together, built that process, and began to build automations into the software I was using. The software company called me and said, "Hey, we understand that you're really good at fixing people's, um, computer...

Or fixing people's cybersecurity programs." It was called information security back then. And, uh, and so I [00:41:00] started a consulting group, uh, with Symantec, who had bought the BindView software that we were working with, that was focused not on software installation but on GRC.

Ward: Mm-hmm.

David Smith: Uh, and we went out and did GRC software; it was called Control Compliance Suite, and it was mapped across multiple different objectives. It was the second-best learning experience of my life, besides being a help desk analyst, because I would be in a, no fooling, I would be in a casino one week, a regional hospital the next week, a chemical manufacturer the next week, a state government office the next week, and every company was completely different from the company the week before.

They were all facing the same kinds of problems. Um, and so building those common governance flows really began to kind of set forward a career where, um, when Symantec went in a different direction with that product, I started my own consulting organization. Uh, I worked on my own for [00:42:00] a couple of years, found out that I like consulting and solving problems a lot more than I do sales, got tired of singing for my supper, and, uh, you know, came back around into, uh, managing an organization where I was focused on technical evolution, uh, you know, primarily in data security, in endpoint security, but with direct ownership for, uh, governance controls and, uh, you know, the program as a whole. And I was fortunate enough to work in an organization that was very scientific-facing and forward-looking, and cared about things like post-quantum cryptography, which I would love to talk to anybody about sometime. The thing about it was, it wasn't a straight path, you know? And I, I hear, uh, you know, because of the work that I do with CyberPatriot, I talk to young people, many of whom are now in college, going, "I want to be in cybersecurity. What should I do?" And what I say is, "Whatever you're passionate about." If your passion is red teaming, then go red team. If, uh, you don't like coding that much, there is a dearth of people [00:43:00] who are able to do governance. If you can pick up a copy of NIST 800-53 and begin to order off of it like a, a sushi menu for a series of controls that are gonna make, uh, an impact in an organization, you will be well-regarded and highly valued for being able to do that. Um, I'm really, really grateful for the kind of meandering, circuitous route that I had into security, because it's given me an opportunity to eat from all sides of the buffet.

Ward: I love that. I love that. And I personally think that's super important. Um, you know, I, I had a similar meander myself. You know, that's, that's a different story, different time. But, um, I, I love, you know, just going, going back to your journey there, I love the starting at the help desk. Um, 'cause I also very much agree, like, where else are you gonna learn how to deal with people, non-technology people dealing with technology than [00:44:00] when they're at their worst or feel at their worst, calling you up or emailing you or whatever when you're on the help desk?

Like, very applicable to the day-to-day difficulties you have as a cybersecurity professional.

David Smith: Absolutely. Yeah.

Ward: Oh, man. Well, David, if folks want to connect with you and continue to pick your brain or, or talk about quantum, uh, cryptography like you said, uh, what's the best way to do so?

David Smith: The easiest way to find me is on LinkedIn. I am DavidESmithCISSP, all one word, DavidESmithCISSP on LinkedIn. Um, I've been putting a lot of stuff out there because there's too much stuff in my brain as I get older, and so some of it has to go somewhere. So, uh, if you pick up one of my articles, you'll hear more of my mad ravings on security. But I would love if you just, uh, sent me a message and, uh, you know, tell me what's going on in your world, how you're thinking about things.

Ward: Love it. And we'll help you out when we post this. We'll make sure to tag you. So folks, if you wanna connect with David, he's tagged [00:45:00] on LinkedIn on, on this post. Make sure to connect. And, uh, David, thank you so much for joining today. This has been a great episode.

David Smith: Ward, I really appreciate the invitation to come talk today.

Ward: Of course, of course. And big thank you to our audience. Really hope you enjoyed the episode and learned something today. Please tell others in your network to follow and listen. This has been another exciting episode of Guardians of the Data. See you next time.

That's a wrap on another episode of Guardians of the Data. Thanks for tuning in. For show notes and more, visit guardiansofthedata.show. Guardians of the Data is made possible by support from Sentro. To see how we help organizations discover and classify all of their data accurately and automatically while quickly achieving petabyte scale data protection without the fuss, please visit sentro.io.

Catch you next time.
