
How AI is Changing DevOps Careers | What You Need to Know

By TechWorld with Nana

Summary

Key takeaways

  • **AI Only 70% Accurate on GitHub Issues**: Even the highest-end Claude and GPT models achieve at best 70% accuracy on real GitHub issues from SWE-bench, which means the app won't function due to bugs, unlike editable tasks like blog posts. [06:11], [07:27]
  • **DevOps Needs 100% Correctness**: In DevOps, 70% correct Terraform or Kubernetes YAML won't deploy or start pods; it's binary, either it works or it fails, requiring human verification of the output. [07:15], [08:08]
  • **Conference Hype vs Engineer Reality**: For two years at KubeCon, Bret asked engineers if they use AI in DevOps; despite talks and hype, no one was automating infrastructure with AI, only casual ChatGPT use. [34:39], [35:51]
  • **AI Scales Engineer Capacity Historically**: One sysadmin went from 10 servers to 100 virtual, 1,000 cloud, and 10,000 containers per tech wave; AI will enable managing more without reducing jobs as companies expand. [19:03], [20:24]
  • **Alerting: Low-Risk AI Use Case**: AI on observability tools analyzes alerts, logs, and metrics, and posts likely causes to Slack for faster troubleshooting; it's read-only, low risk, saving 30-60 minutes at 3AM. [24:01], [27:13]
  • **Humans Accountable, Not AI**: AI can't be fired if it exposes a Kubernetes API publicly; the engineer who commits the AI-generated code takes the blame and must understand it to push deploy. [00:29], [45:03]

Topics Covered

  • 70% AI accuracy fails DevOps
  • Start AI with CICD pipelines
  • AI amplifies engineer scope
  • Hype masks slow AI adoption
  • Humans accountable for AI failures

Full Transcript

Even if you paid the most money, the hundreds of dollars for the high-end Claude models, high-end GPT models, you are at best gonna get about 70% accuracy for a software developer.

70% means it won't work, it's not gonna function at 70%.

Correct.

The app will not function.

It was a weird disconnect between what I was hearing on the internet, what even some AI conference talks were saying, and what everyone I knew and everyone I would talk to was actually doing.

The AI can't be fired.

The company isn't going to fire Anthropic because it deployed a Kubernetes cluster with the Kubernetes API accidentally public on the internet. The person who committed the AI code is gonna get fired, so there's still gotta be a human that has to go.

I'm willing to take this risk.

I'm willing to push that deploy button.

I spent two hours talking with Bret Fisher about AI and DevOps, and what he told me completely changed how I think about the combination of these two.

Here's the most common question everyone wants answered.

Is AI going to automate DevOps work and make engineers obsolete, or are we still many years away from that?

Will AI do our work so well that we will need fewer engineers?

Should we really believe when people say engineers who know how to use AI are becoming more valuable, not obsolete? Should we believe the hype?

Or is there still a massive gap between what AI companies promise and what's actually happening in production?

And that's exactly what I discussed with Bret to address these questions with actual data and real-world experience.

Not just personal opinions and theory.

Bret has been teaching Docker and DevOps for over a decade, but he has been diving deep into AI tools, concepts, and trends.

He also spent two years walking around conferences asking engineers about their real practical experiences with different AI tools, and all this showed him the truth about AI adoption in IT, and in DevOps specifically.

He even launched a podcast specifically about AI for DevOps.

So he has some real, tangible advice and insights on this topic, which you will get fully in this conversation.

So let's dive in.

The first thing I wanted to understand from Bret was simple.

Can AI actually do DevOps work?

Everyone is talking about AI automating infrastructure, AI writing Terraform code, AI managing Kubernetes clusters.

So I asked him directly, is this real or is this just hype?

I saw that you are creating a lot of videos about AI in DevOps, or agentic DevOps.

So for someone who has been doing DevOps and who has been doing a little bit of like software development or IT infrastructure, who hasn't touched the AI topic yet, how would you describe that ecosystem?

Like how messy it is, how chaotic it is, how difficult it is to choose the right tooling, or what do you focus on instead?

Like tools or the use cases?

How would you describe that?

Only eight months ago, right, I wasn't even aware of the agentic idea of how we were going to really automate an LLM in a loop.

Essentially, that's what most of these agents are, is just fancy logic in a loop.

That was very much a surprise to me.

I mean, I was like everyone else.

I was using ChatGPT, I was using Cursor and Copilot, and trying all the different tools, trying to figure out what was useful for me, but also trying to casually, in my spare time, spend a little time figuring out what other people are doing.

But in 2024, even very early 2025, no one in DevOps was really talking about using AI tools to do their job.

And other than maybe writing YAML with an AI that you would just, you know, one shot it, that's where you just ask it a question and you take the response and there's no agent involved.

You mentioned that you dug deeper, you know, you kind of had this realization, okay, there must be something there.

There's gold there.

From the information that you got, I'm pretty sure that you started learning and researching with certain expectations.

Right?

So you had some kind of assumption probably in your head, like you didn't know what you were gonna discover.

What was something that you discovered during this search that you didn't expect in terms of like, I didn't expect it to work this well.

Like to be that mature and also the opposite.

When you dug deeper, like, I didn't expect it to be this bad actually, you know, compared to the hype.

Like, what were some of the details or aha moments that you had during those three months of intense research?

Yeah, great question.

Most of it is not good news, but maybe a surprise on the good side was when you can get one of the high-end frontier models in an agent loop and it actually spits out something that you know how to read.

Like, the key is that right now AIs really aren't gonna do any of your job for you. You've gotta babysit 'em.

But when it gets it right and you get, you know, whether it's a couple hundred lines of YAML, or a whole new class in a Java file or a JavaScript file, or a whole perfect TypeScript layout, that does feel like magic, right? When you just saved yourself 20, 30 minutes of research.

And I've been making a lot of CLI tools lately because this stuff is so easy.

So tools to automate my work, tools to help give DevOps examples of tools, and I like Golang; Go is my preferred language for DevOps tooling.

When it gets it right, which is rare and it writes the way I would and stuff like that, it feels so exciting and it feels like I'm a kid again.

Discovering code, discovering how to automate things and play with things, that just doesn't happen often enough yet.

And even the best models that we have, the absolute best models at the top of the rankings, are only going to fix about two thirds, maybe, if you're lucky, three fourths of GitHub issues.

We have this thing called SWE-bench now, and all the models are listed there, and they're constantly competing with each other for the top rankings.

But even if you paid the most money, the hundreds of dollars for the high-end Claude models or the high-end GPT models, you are at best gonna get about 70% accuracy, which for a software developer, 70% means it won't work. It's not gonna function at 70%.

Correct.

The app will not function. Eventually, there'll be so many bugs.

Right.

Let me explain what SWE-bench is and why this 70% number matters so much.

SWE-bench is a benchmark that tests how well AI models can solve real GitHub issues from actual open source projects.

So these are not some theoretical problems; these are actual bugs and features that real developers submitted in those projects. And the best models, like Claude Sonnet and GPT-4, the most expensive, most advanced AI available out there, solve about 70% of these issues correctly.

Now, 70% sounds pretty good, right?

But here's the problem.

In DevOps specifically, partial correctness does not work because if your Terraform configuration is 70% correct, your infrastructure will not deploy.

If your Kubernetes YAML is 70% correct, your pods will not start. If your CI/CD pipeline is just 70% correct,

It will fail, and you will need to troubleshoot and identify what the issues are, what things are breaking.

So you will need to understand how the pipeline, the manifest file, and the Terraform code work because you'll need to fix those issues.

And this is fundamentally different from other tasks where AI works well, because if AI writes a blog post that's 70% correct, you can edit the other 30%, right?

But infrastructure code does not work that way.

It's binary.

Either it works or it doesn't.

And this is why you can't just prompt an AI and walk away.

You need to understand what it's building.

You need to verify the output, and you need the knowledge to spot the 30% that's wrong.
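Since infrastructure code fails as a whole, a lightweight verification gate before anything gets applied is a natural pattern. Here is a minimal sketch in Python; the required fields, the Deployment check, and the sample manifest are all illustrative assumptions, not a real admission policy.

```python
# A minimal sketch of a human-in-the-loop verification gate for an
# AI-generated Kubernetes manifest (represented here as a plain dict).
# The field list and checks below are made-up examples, not real policy.

REQUIRED_FIELDS = ["apiVersion", "kind", "metadata", "spec"]

def validate_manifest(manifest: dict) -> list[str]:
    """Return a list of problems; an empty list means it may be applied."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in manifest]
    # Partial correctness is not enough: one missing field and the
    # API server rejects the whole object.
    if manifest.get("kind") == "Deployment":
        replicas = manifest.get("spec", {}).get("replicas")
        if not isinstance(replicas, int) or replicas < 1:
            problems.append("spec.replicas must be a positive integer")
    return problems

# An AI-generated manifest that is "mostly right" still fails the gate.
ai_output = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    # spec is missing, so the cluster would reject this outright
}

print(validate_manifest(ai_output))
```

The point of the sketch is the binary outcome: the manifest either passes every check and may be applied, or it fails and a human has to fix it; there is no "70% deployed."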

I think there is a moment with every developer where they're picking up steam and then they realize that we're still many years, I think away from these models being so trustworthy that we can just pass on our requests and not have to look at the answers.

So the idea of, like, an AI writing the PR, and then an AI reviewing the PR and approving the PR.

Whether it's for software code or DevOps YAML, I think we're still many years away from that sort of fully-AI workflow, which is why I'm not worried about jobs and stuff like that.

I think most of that stuff is really hyped up, and don't believe anyone in AI if they work for an AI company. You just can't trust 'em, because they're not living in reality with the rest of us. They're living like 10 years in the future, or 15 years in the future.

And so I have, uh, three follow up questions on those.

So first of all, because there is a difference between the tasks that software engineers or software developers are doing, mm-hmm, and the tasks that DevOps engineers are doing. You know, writing code is pretty much different from creating and designing an architecture and then kind of putting those Lego boxes and pieces together to build this entire thing, right? So there is less writing code, but more like mixing and matching different tools. So how would you compare?

So is AI usable on top of DevOps? Because one thing that immediately stood out for me was that DevOps was always about automation, right?

Right.

So we had lots of bottlenecks.

We had human in the loop, you know, things that slowed down the process.

Mostly it was manual work, and DevOps was like, let's automate and remove those bottlenecks and humans where it's not necessary.

And AI conceptually, like agentic AI conceptually does the same, which is let's automate things which are tedious, let's make it more intelligent.

So it's like intelligent automation versus like DevOps automation.

So is there any value in using AI on top of DevOps, or would you say it's not really as valuable as for software development?

Like what would you say?

I think both are very valuable.

If you're gonna compare, like, software development versus DevOps engineering, I would say that if we're going with just a general accuracy of 70%, if that's a ballpark, they're both very much usable.

I think the difference is that DevOps is not as intuitive, because we don't necessarily have a tool sitting in front of us that's gonna explain exactly where and how we're supposed to automate with AI today, right?

Like, all the IDEs are focused on code generation. They happen to generate YAML just as well, or XML, or JSON, or whatever you need.

They do that just as well as they do the rest of software engineering.

But I wouldn't say we have a lot; especially in open source, we have very little DevOps tooling that lets us just go.

So I think the real challenge for any DevOps person today who's trying to adopt AI is where is it working and where is it not?

And where are the patterns that I can just, you know, where are the recipes?

I need recipes.

You know, I need cookbooks just to be able to automate something in GitHub.

Or can I use AWS's MCP server to spin up Kubernetes clusters with very little effort?

The answer is yes, absolutely.

The real trick is how do we narrow that scope down per task, to give that AI, you know, very specific guardrails to keep it safe, to make sure that humans are reviewing it.

I feel like we can bring in AI to DevOps today as long as it's only doing one little part of the job.

And then we slowly, like with everything in DevOps, we slowly expand our new automation.

I mean, this is no different than if you want to implement a new CI platform for your team.

Maybe you wanna move from Jenkins to GitHub Actions.

You're not gonna do that all in a day, right?

That's gonna be too crazy risky.

This is really important.

Software developers have tools like Cursor, GitHub Copilot, Visual Studio Code extensions, all designed specifically for writing code with AI.

You open your IDE and AI is right there helping you.

But for DevOps engineers, the tooling landscape is much more scattered.

You're not just writing code as a DevOps engineer.

You are configuring infrastructure across multiple cloud providers.

You are setting up CI/CD pipelines, maybe on different platforms, managing Kubernetes clusters, writing Terraform configurations, setting up monitoring and alerting in the cluster or on a cloud platform, managing secrets and security policies.

And each of these lives in a different tool or different platform, different interface.

There is no single AI assistant that understands your entire DevOps workflow.

So the challenge is not can AI help with DevOps?

The challenge is, where do I even start?

And Bret's answer is clear.

Start small and specific.

Pick one task.

Automate that with AI.

Make sure it works and then expand.

Do not try to automate your entire infrastructure with AI on day one because that's how you break production.

So instead just pick something narrow and more specific and give your AI clear guardrails to protect from messing up anything important.

And at the beginning, keep humans in the review loop.

And once you have that fully working, then expand slowly.

And this is exactly how we adopted every other DevOps tool.

When Kubernetes came out, you didn't immediately migrate all your applications to Kubernetes, right?

You started with one service or one application.

Maybe you did a test run, you made sure it worked and then migrated more.

AI is the same.

It's a new tool in your toolkit, so you introduce it and you migrate your stuff to it incrementally.

So we know the limitations.

But what are the practical workflows where AI is useful right now?

Not in the future, not someday, but today.

And Bret's answer was surprisingly specific, and it starts with CI/CD.

So if you're experimenting with that, absolutely.

I think that CI is a wonderful place to play right now, because it's not yet building production deployments for you, which is also an area that you can experiment with. An AWS engineer, Raj, showed us how to use the MCP servers at AWS to build a Kubernetes cluster.

And it is possible, it is totally doable.

It built CloudFormation, it built Terraform for us; you can use CDK.

You can use any of these ways to program the infrastructure and try it out. But what you're gonna realize through the process, if we're gonna imagine anyone out here that's gonna play with this stuff after listening to us, is that the more you try to automate it with AI, the more it's gonna force you to document, document, document, and give that documentation as a bigger and bigger context to your AI.

So it ends up having all the documentation, all your standards.

What are your requirements?

There are specific IAM groups that it has to use; you're gonna have to give it all that crazy detail.

Now let me explain why CI/CD is the perfect starting point for AI in DevOps, and what MCP actually means.

First, CI/CD. Why is this the best place to start?

One, CI/CD pipelines are relatively isolated.

If your AI generates a bad pipeline, it fails to build, but it doesn't take down the production environment.

So the blast radius is actually small.

Two, pipeline code is repetitive.

You are doing similar tasks across different projects.

You run tests, build container images, push them to the registry, and then you deploy that newly built image to the deployment environment.

And AI is good at repetitive patterns.

Three, you can iterate quickly.

If the AI generated pipeline does not work, you fix it and you try it again, no production impact.

And four, the feedback loop is fast.

You commit the pipeline, it runs.

You see if it works or not, and within minutes, you basically know if the AI got it right.

Now, MCP stands for Model Context Protocol, and this is becoming really important in the AI landscape. In very simple words, MCP is basically a way to give AI access to your tools and systems through APIs.

So in the AWS example that Bret mentioned, the AI can actually talk to AWS APIs.

It can create infrastructure, it can build Kubernetes clusters, it can write Terraform or CloudFormation code, and so on.

But here is the critical insight.

The more you automate with AI, the more documentation you need.

Why?

Because AI needs context.

It needs to know your naming conventions, which IAM roles to use, your security requirements and guidelines, which regions you are deploying to, your tagging standards, and how you structure your infrastructure code.

And all of that context needs to be documented and fed to the AI, because otherwise it'll make assumptions, and those assumptions will most probably be wrong.
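One way to make that context explicit is to assemble your documented standards into the prompt itself rather than hoping the model guesses them. A minimal sketch follows; every standard listed here is a made-up example of the kind of detail you would document.

```python
# A sketch of turning documented standards into explicit context for the
# model instead of letting it guess. Every value here is a hypothetical
# example of the kind of detail worth writing down.

standards = {
    "naming": "all resources prefixed with 'acme-<env>-'",
    "iam": "workloads must use the acme-app-runtime role, never admin",
    "regions": "deploy to eu-central-1 only",
    "tagging": "every resource tagged with team and cost-center",
}

def build_context(task: str, standards: dict) -> str:
    """Prepend the documented standards to the task prompt."""
    rules = "\n".join(f"- {topic}: {rule}" for topic, rule in sorted(standards.items()))
    return f"Follow these standards without exception:\n{rules}\n\nTask: {task}"

prompt = build_context("write Terraform for an S3 bucket for build artifacts", standards)
print(prompt)
```

The same dictionary of standards can be reused for every task, which is exactly the "document once, feed it every time" loop described here.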

And that's gonna actually improve your project.

There's a conversation we're all having right now that AI might actually save the testing/QA/documentation people, because they are experts in how all that stuff needs to happen, and AI needs that level of context in order to help us create reliable infrastructure, time and time and time again, and reliable automation.

I think this is actually gonna be helpful for us.

I really don't think it's gonna replace a lot of us.

It's just gonna make us go bigger.

The story I always tell, because I'm a graybeard and I've been around for over 30 years in tech, is that I have seen these waves.

This one's happening faster than every wave I've seen. But in the nineties we went from mainframe to PC, and in the early two thousands we went from hardware to virtualization.

Five years later, we went from on-prem to cloud.

Then we went five years later from cloud to containers, and now we're just automating all of that. And at each level, every single time we went through those phases, there was a sysadmin. We didn't really always have the DevOps term, but a sysadmin, or someone who's ops, someone who cares about infrastructure and deployments and automation and management. So in all those generations of us moving from one evolution of infrastructure to another, whether it was a PC or servers or whatever, each time the sysadmin grew. In the nineties, I could only really manage 10 servers.

I don't even think I had 10.

I had to babysit them.

Everything was manual.

There was no scripting or automation.

Open source wasn't really popular yet in terms of typical enterprise infrastructure, but open source grew, and the tooling grew over those decades. And I have a chart that shows, like, one sysadmin to 10 servers.

Then with virtualization, it was one sysadmin with a hundred servers.

Then with the cloud, it was one sysadmin with a thousand servers.

Then with containers, it sort of went beyond servers and you thought about workloads, and it went to where one sysadmin or DevOps person could do 10,000 containers, and our tooling got better, our automation got better, and each time it was all about a single individual managing a bigger and bigger fleet.

So I don't look at that as DevOps lost jobs, or ops lost jobs.

We just were able to manage more, and it turns out, so far at least in my whole career, that companies and organizations will take as much infrastructure as you can give them.

They will take as much automation.

They always wanna run more stuff.

I've never seen a team have an empty queue, their ticket queue be zero, forever. Everyone's got broken stuff.

Everyone needs more, more things.

They wanna launch more apps, more copies of the app to make it stable.

More for capacity, more disaster recovery locations where we can spin it up on the fly in case this place goes down.

Like, that's been expanding and expanding and expanding, and I don't think AI is going to cause that to stop.

I think AI is just gonna cause us all to manage more.

Now, a quick note here.

While everyone's talking about AI, just go and check any DevOps job postings right now. They're asking for Kubernetes, Terraform, Docker, CI/CD.

You will not find "must know how to use ChatGPT" on DevOps job descriptions or cloud engineer job descriptions.

I actually analyzed over a hundred DevOps job posts from dozens of different countries to see what companies are actually hiring for and what skills they're looking for, regardless of the role: cloud engineer, platform engineer, DevOps engineer, whatever is related to DevOps technologies.

And I'll link that video here if you want to see the breakdown.

And Bret confirmed this as well in the interview: you still need DevOps skills.

They are the foundational knowledge that AI does not work without.

But as we all know, the DevOps ecosystem is pretty overwhelming.

I think it's extremely important to have a clear guide for how to navigate this ecosystem, and that's why we created the DevOps Starter Kit, which shows you exactly which skills to learn in which sequence, based on what actually gets people hired.

So instead of getting caught up in the AI hype, focus on the skills that are most in demand on the job market right now.

So be sure to grab it below; the link is gonna be in the description. Because you need to learn DevOps first, and you can add AI on top of it once you have properly mastered DevOps.

Now this historical perspective is super important.

You can literally take notes on this.

So one engineer managing 10 physical servers became a hundred virtual servers, then a thousand cloud servers, then 10,000 containers. Each automation technology did not reduce the number of engineers needed.

It actually increased the scope of what one engineer could manage, and companies always expanded to fill that capacity.

Why?

Because their competitors were expanding as well.

If you can deploy faster, you build more features.

If you can manage more infrastructure, you launch in more regions.

The demand did not shrink.

It grew with every such advancement.

And with ai, the pattern is the same.

One engineer will manage 50,000 containers, or a hundred thousand serverless functions, or complex multi-cloud deployments that would have been impossible before.

But this is important to understand.

Companies will not say, great.

Now we need fewer engineers because we are gonna keep managing the same number of servers and containers and functions as before.

They'll say, great.

Now we can build that new AI product that we have been planning.

Now we can expand to these new markets.

Now we can build more products and features.

Now we can deploy in additional regions, much faster, with more scale.

So the ticket queue never goes to zero.

There is always more work to do, and AI makes an individual engineer faster and more efficient, but it doesn't make you obsolete.

Now let me show you one of the most practical use cases where AI is already providing value today.

And this is something that you can implement right now.

I think every observability and monitoring tool is gonna have, if it doesn't already, AI features to help you troubleshoot faster, get to the problem resolution faster, make suggestions.

We've seen some of these. They will see an issue that's triggered in an alert system, so whether you have PagerDuty or whatever, PagerDuty notifies the AI, it accesses the metrics and logs and data, and it guesses at what the resolution might be.

And then it posts that into Slack, right?

So then, like, you as the human, when you're woken up in the middle of the night, you automatically look at the issue in Slack, and right below it is an AI response going, I've looked at the data, it might be this, this, or this.

Like, I think that's the very first thing that ops people should be looking at, because, you know, reducing your time to fix is a key DevOps metric that we've all been working on.

And AI, I think, can give us that first pass, at least to narrow it down, you know: hey, I looked in the pod spec and logs for that pod that was alerting, I see these three errors, and I filtered it down to the things I think are the likely causes.

Yeah, I think this specific use case that you just mentioned, AI layered on top of observability, because you have tons of metrics, you have tons of data, it kind of is a perfect use case for AI and analytics and then coming up with a suggested solution. Because that's probably one of the things that humans do not like doing or do not enjoy doing, and especially reducing the time to fix the issue when it's actually very critical.

I've heard that that actually is one of the top use cases whenever, you know, there was a discussion about AI in the context of DevOps.

And it can be read only.

Like, a benefit of that is it's read-only. You don't necessarily need to give it write access to your infrastructure.

So it's low risk; like, if a guess is wrong, you didn't waste your time, right?

You just have to troubleshoot more.

This is one of the most practical AI use cases for DevOps right now: alerting and initial troubleshooting.

So here is how it works.

Your monitoring system fires an alert. AI automatically pulls all the relevant logs, metrics, traces, and any recent configuration changes made to the environment.

AI then analyzes all this data and suggests possible causes: what caused this issue? Then AI posts its analysis of all this aggregated data to a Slack channel alongside the alert, describing exactly what happened and why.

When you wake up at 3:00 AM you immediately see exactly what the cause of the issue was and the potential solutions.

This saves the first 30 to 60 minutes of troubleshooting: that time when you're finding which pod is failing, pulling all the logs, checking any recent deployments, any recent changes, looking at the metrics, searching for similar past incidents.

AI can do all that automatically and much faster, and it will just present you with: here are the three most likely causes based on the data. But here is why this use case works so well.

Because it's read only, the AI is not changing anything.

It is just analyzing and suggesting, so if it's wrong, you lost a few minutes.

If it's right, you just saved 30 minutes of troubleshooting.

But it doesn't actually have a risk of messing up anything in production because it doesn't make any changes.

So low risk, high value.

This is exactly the pattern you want for AI in operations.
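The read-only triage flow above can be sketched end to end: gather context for a firing alert, then format a suggestion message for Slack. The alert data, log lines, and message shape below are simplified assumptions; the key property is that nothing in this path writes to the infrastructure.

```python
# A read-only sketch of the alert-triage flow: collect context for a
# firing alert and format a Slack-style suggestion. The alert, logs, and
# causes are invented examples; no step here changes the infrastructure.

def gather_context(alert: dict, logs: list[str], recent_changes: list[str]) -> dict:
    """Pull together the data a human would check first at 3 AM."""
    errors = [line for line in logs if "ERROR" in line]
    return {"alert": alert["name"], "errors": errors, "recent_changes": recent_changes}

def format_slack_message(context: dict, likely_causes: list[str]) -> str:
    """Render the analysis as a message posted alongside the alert."""
    causes = "\n".join(f"{i}. {c}" for i, c in enumerate(likely_causes, 1))
    return f"*{context['alert']}* fired. Most likely causes based on the data:\n{causes}"

ctx = gather_context(
    {"name": "checkout-pod-crashloop"},
    ["INFO started", "ERROR OOMKilled", "ERROR liveness probe failed"],
    ["deploy checkout v2.3.1 at 02:41"],
)
msg = format_slack_message(ctx, ["memory limit too low after v2.3.1", "bad liveness probe config"])
print(msg)
```

Because the functions only read and summarize, a wrong guess costs a few minutes of reading, while a right one skips the whole first phase of troubleshooting.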

One theme kept coming up in my conversation with Bret: context.

AI needs massive amounts of context in order to work well, and the biggest misconception that people have is that AI will just magically understand what you want with a single prompt.

So let me show you what actually happens in reality.

So you said this phrase, AI is not gonna do your stuff, so you still have to babysit it, whether it's doing a DevOps-related task or code generation.

And a lot of people, I think the biggest misconception that I see people have, mm-hmm, when they have not done proper research on AI and its capabilities, is that, you know, you just give it one prompt and it's gonna magically do stuff for you. And when you start digging deeper and understanding how it works, it's exactly what you said: it needs a lot of instructions, it needs a lot of guidance, documentation.

So once you get all of that knowledge down, back to the original story, you realize the mistakes, so you start giving it more context and more context. At the end of it, you'll realize that your AI, maybe it's weeks or months later, but your AI is actually way more reliable now.

It can take those DevOps tickets from your GitHub issues or your Jira or whatever.

It can take those tickets, and then, like 90% of the time, it gets it right the first time, maybe, and you're getting pretty close to almost trusting that it can do the job.

Yeah, but what it's really doing is a very narrow scope.

The documents are very well written.

The requirements for the pull request or the change request are very specific.

And because it can feed from all those different places.

It suddenly has, you know, it's probably using hundreds of thousands of tokens now.

It might even have access to your metrics to understand.

The biggest misconception about AI is that you give it just one prompt and it does everything.

But in reality, AI needs extensive instructions, documentation, context, data, and guidance.

Think about when you onboard a new engineer in your team.

You're not gonna say, just set up our infrastructure and walk away.

You give them architecture documentation, you explain your naming conventions, you tell them what the security requirements are.

You give code examples, access to the existing systems. And you also tell them, this is a person that you can go with your questions if you need any help.

And AI is the same, except it cannot ask clarifying questions, so you need to provide even more context upfront. And notice the timeline that Bret mentioned: weeks or months.

So this is not instant.

You don't get AI working perfectly on day one or day two.

You start with 70% accuracy.

Then you give it more context, you document better, you refine your prompts.

You narrow your scope, and gradually, over weeks or sometimes months, you might get to 90% accuracy for specific narrow tasks. But that 90% accuracy only happens when your documentation is excellent.

Your requirements are specific.

The task scope is narrow, and you have fed it hundreds of thousands of tokens of context. And this is why companies with mature DevOps practices will benefit more from AI: if you already have infrastructure as code properly organized, CI/CD pipelines well documented, runbooks for common tasks, architecture diagrams, and clear standards and conventions,

Then AI can learn from all of that.

But if your infrastructure is undocumented, if you're doing a lot of the stuff manually, your standards are inconsistent, and everything is just tribal knowledge in people's heads, AI will not help much, because it has nothing to learn from.

So the work you do now to organize and document your systems is an investment you are making to use those AI tools effectively.

Or maybe it has access to your actual live Kubernetes endpoints, and it might reach out to see what the real world looks like right now.

And that's something we're seeing, where you tie in these MCP tools. MCP basically just gives your AI access to all your other APIs.

And so it might have access to all of your remote cloud tools.

It has access to GitHub through the API.

It has access to AWS through the API, so it can start to read and look at things all the time.

You're slowly refining it and getting the scope down, and right now it's loosey-goosey.

Everything goes.

Everyone's just throwing everything into it.

It's kinda like the early days of containers where we were throwing everything in our containers, including all the bad stuff.

I think it's a maturity model and you're all gonna get there, but you have to start now (or wait, but really, start now), and then eventually you'll have something mature in six months.

MCP or model context protocol is becoming really important for this reason.

Right now, when you use AI, you are basically just copy-pasting information, right?

You copy your Terraform code into ChatGPT.

You copy error logs into Claude, or use it in your IDE, so you kind of manually feed it the context.

MCP changes that, because it lets AI directly access your systems through APIs.

It can read your Kubernetes cluster state, it can check your GitHub repositories and the code inside.

It can query your monitoring metrics or look at your cloud resources and existing infrastructure, and it can access your documentation. Instead of you manually providing context and explaining what your infrastructure looks like or what your code is doing,

AI can automatically pull the current state of your systems. It can pull the current state of the code.

It can see what's actually running in production right now.

And this makes AI much more useful for troubleshooting operations work.
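As a rough sketch of the MCP pattern described here: an MCP server exposes named, read-only tools that the model can call, instead of you pasting data into the chat. The tool names and return values below are hypothetical stand-ins, not the real MCP SDK, just to show the shape of the idea.

```python
# Minimal sketch of the MCP idea: the AI calls named, read-only "tools"
# instead of you copy-pasting their output into a prompt.
# All names and data here are invented stand-ins for illustration.

def get_pod_status(namespace: str) -> dict:
    """Stand-in for a tool that would query the Kubernetes API (read-only)."""
    return {
        "namespace": namespace,
        "pods": {"web-7f9c": "Running", "worker-2b1d": "CrashLoopBackOff"},
    }

def get_recent_deploys(repo: str) -> list:
    """Stand-in for a tool that would query the GitHub API (read-only)."""
    return [{"repo": repo, "sha": "a1b2c3", "age_minutes": 12}]

# The "server" side is just a registry mapping tool names to functions;
# the model decides which tool to call and with what arguments.
TOOLS = {"get_pod_status": get_pod_status, "get_recent_deploys": get_recent_deploys}

def call_tool(name: str, **kwargs):
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)

print(call_tool("get_pod_status", namespace="prod"))
```

The important property is that every tool is read-only: the AI can inspect current state, but a human still owns any change.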

For example, when an alert fires, AI can immediately check the logs of the pods, look at recent deployments, query metrics from the last hour, and review any configuration changes.

And based on all this rich data, it can suggest potential causes.
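The alert-triage flow just described can be sketched as a small read-only script. The log lines, patterns, and hints below are invented examples; a real setup would pull from your observability stack and only suggest causes, never act.

```python
# Hedged sketch of read-only alert triage: gather recent logs and deploys,
# match known failure patterns, and rank likely causes for a human.
# All data below is stubbed for illustration.

RECENT_LOGS = [
    "2024-01-05T03:01Z web OOMKilled: container exceeded memory limit",
    "2024-01-05T03:01Z web readiness probe failed",
]
RECENT_DEPLOYS = [{"service": "web", "minutes_ago": 9}]

# Known symptom -> likely-cause hints (examples, not an exhaustive runbook).
CAUSE_PATTERNS = {
    "OOMKilled": "memory limit too low, or a leak in the new build",
    "readiness probe failed": "app not healthy after rollout",
    "connection refused": "dependency down or wrong service address",
}

def triage(logs, deploys):
    findings = [hint for pattern, hint in CAUSE_PATTERNS.items()
                if any(pattern in line for line in logs)]
    # A deploy in the last 30 minutes is a strong correlation signal.
    if deploys and deploys[0]["minutes_ago"] < 30:
        findings.append(f"recent deploy of '{deploys[0]['service']}' is a likely trigger")
    return findings or ["no known pattern matched; escalate to a human"]

for finding in triage(RECENT_LOGS, RECENT_DEPLOYS):
    print("-", finding)
```

The output is a ranked list posted for a human to verify, which is what keeps this use case low risk.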

But Bret's warning is important: right now, everyone is just throwing everything at AI.

No structure, no security model, no guardrails.

This is like the early days of containers.

People were putting literally everything in containers, including passwords and secrets, all the bad practices.

And then we learned from all those mistakes and we matured.

We learned the best practices.

And AI is following the same path.

We start now, we experiment, we make a bunch of mistakes.

We learn and iterate, and then in six months you'll have something mature.

Now, here's the part that surprised me the most.

Bret spent two years walking around conferences, asking one simple question, and the answers revealed the actual truth about AI adoption in DevOps.

Because of the AI, AI, AI hype, that was where this all came from. I was like, how? I don't even know. Like, I'm overwhelmed.

I hear all this wonderful hype about all this code being written in ai, and then I go talk to my friends and they're like, Nope.

That's not happening.

Like I don't see it, like no one was seeing it.

So I started asking questions and I started going around KubeCon.

And for two years now, I walk around KubeCon.

I live in the expo hall.

That's all where we hang out the whole week.

'cause I could watch the videos online.

I don't need to go to the sessions.

I, I'm there for people.

And all week long I'm asking everyone, hey, do you run your own AI inference? Do you use AI in DevOps?

For years, all the talks were telling us about it, but then everyone I talked to is like, no.

No, no, not using ai.

Not running ai.

I'm using ChatGPT and I'm code-genning, but I'm not automating with AI, I'm certainly not touching infrastructure with AI, and I'm certainly not running my own AI inference cluster.

So it was a weird dystopia, not dystopia.

It was a weird disconnect between what I was hearing on the internet and what even some AI conference talks were talking about.

And then what everyone I knew and everyone I would talk to, I was like a man on the street trying to find that one person that was using it.

It just wasn't happening, and it wasn't until this year that we finally started having actual DevOps AI conversations.

If you'd asked someone a year ago, they would've said, that's crazy.

I would never let an AI touch my infrastructure.

But now we're actually starting to have conversations, so I feel like hopefully, hopefully I nailed the timing.

Like not too early, not too late.

'cause you can be too early.

That's also another problem.

This is such an important observation.

For two years, Bret walked around KubeCon, the biggest Kubernetes and cloud native conference, and he asked people: are you using AI in DevOps?

Conference talks said yes.

Marketing materials said yes.

LinkedIn posts said yes, but actual engineers, the practitioners who were supposed to be using those tools said no.

They said, we are using ChatGPT to write code faster, but we're not automating infrastructure with AI.

We are not letting AI touch production environments.

And that means there was a complete disconnect between the hype of AI and the reality.

Now, why does this matter for you?

Because you need to separate what's possible in demos, what's being sold by the AI companies, and what's actually being used in production by real engineering teams. The AI companies will tell you their tools can automate everything.

The conference talks will also show these impressive demos of their AI tools, but those demos are often heavily scripted: controlled environments, perfectly prepared, with perfect documentation and context and unlimited time to prepare. In real production environments, you have messy systems, the documentation is incomplete, and requirements change constantly.

So the adoption is happening much slower than the hype suggests, and that's okay.

That's absolutely normal, and it kind of fits into the pattern of new technology adoption.

This is more of a slow step-by-step adoption.

And Bret says it's only this year, so 2024/2025, that real conversations about AI in DevOps are starting. Not implementation at scale, just conversations about how to implement it safely.

So that means we are still in the very early days.

You are not behind.

You are actually right on time.

Now, I wanted to understand what Bret's vision is for the future.

What does he think AI in DevOps will actually look like when it matures? His answer paints a really interesting picture.

I don't want to ever be called out of like, you're teaching this, but you have no idea what you're talking about.

Right?

Like, I really hope that I don't. I'm sure it happens, but I really try. So I'm really excited right now.

What will my AI workflow in DevOps look like once I've taught everyone else all the steps? Because I don't quite have that yet, and in fact the industry doesn't have that yet.

I'm imagining this future six months from now where I have a lot of GitHub examples. For years I've had DevOps and Kubernetes examples and all sorts of GitHub Actions examples, because I've been teaching that stuff for a decade.

I'm imagining taking all those repos, or one of those repos, and turning it into like a fork that's an AI version, where literally everything is done by the AI.

Maybe this isn't like the future, like actually gonna be how production is done, but every step is done by AI.

All I've gotta do is put in an issue.

Everything else happens after that.

And at the same time, it is way more thorough.

So it does automatic security checks for CVEs, and then when it sees the CVEs, it actually recommends how I could fix those CVEs in my images or my code, or whatever I have to deploy for somebody. When I make a Terraform commit, it reviews it and gives me feedback on how to improve it.

Like I'm, I'm imagining this scenario where I can actually, hopefully, eventually know less about what I'm doing and the AI provides me the safety guardrails all the way through my pipeline.

I don't think it's any one tool specifically.

It may not answer your question, but I think it's understanding the end-to-end workflow of what the current AIs can do for me after commit.

'Cause I kind of look at my job as like, I'm the person after the developer. The developer who's making the next Facebook or whatever, their job is to make the best app for users.

My job is to make the best experience for them.

Uh, that's how I look at it.

Them and the users.

The users are my users, but the developers themselves are my users.

And what if I could, like, there's an old show called Silicon Valley, I still love it.

You should, absolutely, if you're not a fan of the show, if you've never watched the show.

Silicon Valley, a decade later (I think it just hit its 10 year anniversary), is a fantastic show that actually talks about AI. They say "inference" from like eight years ago, back when I didn't even know what that word meant.

I watched that show again recently and it's absolutely still relevant today for like this current chaos of AI is perfect for today.

They should just rerun it.

I had this idea that like there's this character called Gilfoyle.

He's basically the DevOps engineer, and he is able to step away and replace himself with an AI.

The AI chatbot's chatting away with his colleagues, saying that it's doing the work, and it does things.

He's not working, right?

He's still gotta be there because he's gotta manage the AI, but he's not really working.

He's just sitting back drinking a cocktail.

What I am kind of imagining is: are we there?

Are we close?

I mean, no one's really shown that yet.

There is certainly no one at conferences talking about that yet.

Even in London six months ago, at all of KubeCon, with hundreds and hundreds of talks, there was one talk that detailed in depth their efforts of trying to reduce toil by having an AI review all their DevOps PRs.

And it turns out at least six months ago, it was way harder than I was interested in doing.

Like, I watched that and it caused me to go, nope.

Not for me, not, not yet because there was so much to it.

And I'll put the link in the show notes.

That to me is like, how can I mix and match?

That's what I'm really interested in.

I don't think there's any one tool DevOps is gonna use to magically solve all these problems, but I am very interested in the workflow pipeline aspect of: what if I replace myself with an AI? How crazy would it be?

How wrong would it be?

And obviously we want safety and all that, but just experiment.

See how far you can get, and then you sit back and you can drink your mai tais or whatever.

This vision is fascinating because it shows both the promise and the current reality.

Bret imagines a future where you open a GitHub issue and AI handles everything after that. It runs security scans.

It reviews your Terraform code, it checks for CVEs and suggests any fixes.

It provides guardrails throughout your entire pipeline.

Okay, but notice what he says.

I can know less about what I'm doing, and the AI provides the safety guardrails, and this is a key shift here.

The value is not in memorizing YAML syntax anymore.

The value is in understanding the workflow well enough to set up AI correctly, to give it the right context to review its output.

To know when something looks wrong and needs correction, and the Silicon Valley reference is perfect here.

The DevOps character in the series replaces himself with an AI chatbot, but he's still there.

He still manages the AI behind the scenes.

He just automates the repetitive work, and that is the realistic future.

Not AI replaces DevOps engineers.

But DevOps engineers use AI to handle the tedious parts.

So they can focus on higher level problems, but here is the reality check.

Even at KubeCon, with hundreds of talks, there was one talk about actually implementing AI for DevOps PRs, and it was way harder than most people want to deal with right now.

So the vision is there, but the path is unclear.

We don't know how fast we're gonna get there.

The tools are still immature, but people are experimenting and that's how every new technology starts.

Do you think that the value of engineers will switch more towards, um, understanding the context, understanding the use cases, the logic rather than memorizing the syntax of the tools or even specific tools and how they work?

Because maybe that can be automated and done with AI, where AI kind of does this tool selection, maybe at a granular level. Do you think that's where the value of engineers, or the future requirements of engineering skills, will shift, where they're more like architects and designers rather than people who actually do the execution or implementation? What do you think that evolution is gonna look like for engineering skills in general?

Again, trying to read the tea leaves, trying to predict the future is always tricky with this stuff.

But if we can look sort of at the last two years of progression, we are nowhere near AGI for DevOps, right? We're nowhere near an AI that you can just hand over the keys of the kingdom to, and it does the things without you even thinking about it.

So in my mind, you still kind of need to know all these things.

You do need to know the syntax, because how will you know when it's wrong? And the AI can't be fired. The company isn't going to fire Anthropic because its model deployed a Kubernetes cluster with the Kubernetes API accidentally public on the internet.

The person who committed the AI code is gonna get fired or at least in trouble.

Right.

Maybe in a world of the far future where we think of AI as having rights and jobs and paychecks, and it can be fireable and hireable, maybe in that world, but we don't treat them that way yet.

I don't see any sign of that happening anytime soon.

So there's still gotta be a human that has to go, I'm willing to take this risk.

I'm willing to push that commit button or push that deploy button.

Until that one step can be removed, I think we have to be able to know what it's doing.

I think that for every team, when things don't go well and production goes down, or we have a deployment that fails and rolls back, there's always gonna be that DevOps engineering management that's gonna say, okay, let's do a root cause analysis.

Let's ask the five whys. And the people in the room are gonna say, well, the AI did this. Well, why did the AI decide to do this? Well, why?

This is probably the most important insight in this entire video.

Let me break it down for you.

The question I asked was, will engineering shift to being more about architecture and design while AI handles the implementation?

And Bret's answer is clear.

Not yet.

Not for many years.

Why?

Because someone needs to be accountable.

When your Kubernetes cluster is accidentally exposed to the public internet, your company cannot fire Claude or ChatGPT.

They're gonna ask: which engineer deployed this? Which human approved this? Which person takes responsibility? And that human had better understand what they deployed and why.

You can't say AI did it in a postmortem because that won't save your job, and this is why you still need to know the syntax and the implementation details.

You still need to understand how Kubernetes works at a low level, not just a high level.

You still need to know what good Terraform code looks like versus bad Terraform code, not because you are writing it all from scratch, but because you need to evaluate and assess what the AI generated. You need to spot the mistakes.

You need to know when something is wrong.
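To make "evaluating instead of writing" concrete, here is a tiny, hedged illustration: a script that scans an AI-generated Terraform snippet for red flags a human reviewer must catch before committing, like the publicly exposed Kubernetes API discussed above. The patterns are examples only; real teams use dedicated policy tools (tfsec, OPA, and similar) for this.

```python
# Toy reviewer for AI-generated Terraform: flag a few classic red flags.
# These checks are illustrative examples, not a complete security policy.

import re

RED_FLAGS = [
    (r'cidr_blocks\s*=\s*\[\s*"0\.0\.0\.0/0"\s*\]',
     "security group open to the whole internet"),
    (r'publicly_accessible\s*=\s*true',
     "resource reachable from outside the VPC"),
    (r'"Action"\s*:\s*"\*"',
     "IAM policy allows every action"),
]

def review(terraform_text: str) -> list:
    """Return a warning for each red-flag pattern found in the snippet."""
    return [message for pattern, message in RED_FLAGS
            if re.search(pattern, terraform_text)]

# Hypothetical AI output containing the exact mistake discussed above:
ai_output = '''
resource "aws_security_group_rule" "api" {
  type        = "ingress"
  from_port   = 6443
  to_port     = 6443
  cidr_blocks = ["0.0.0.0/0"]
}
'''

for warning in review(ai_output):
    print("REVIEW:", warning)
```

The point is not the script itself but the skill it stands for: you can only write checks like these if you already know what dangerous Terraform looks like.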

So the skill shift is actually subtle, but very important.

Before AI, you wrote infrastructure code from memory or from documentation, copy-pasting some code snippets.

With AI, you get the output from AI: you review and verify infrastructure code that AI generated.

But both require deep knowledge.

You can't review something you don't understand.

You cannot spot errors in code you've never learned.

So the fundamentals still matter.

Learning Kubernetes, learning Terraform, learning CI/CD and the syntax of all these configurations, learning networking and security. You need to understand not just the high level of how these things work, but also the syntax and configuration details and the basic foundational knowledge of all of these topics, and then you add AI on top as a tool to work faster with all of them.

But don't skip the foundation thinking that AI will replace it.

Because when things break at 3:00 AM you need to understand what's happening and how it needs to be fixed.

The AI may help troubleshoot it, but you are the one who needs to fix it or be responsible for the final fix that gets applied.

So after hearing all of this, the limitations, the possibilities, the gap between the hype and reality, the question becomes how do you actually start?

What's the practical first step?

And I asked Bret this specifically, because it's easy to feel overwhelmed in this whole AI ecosystem right now.

There are hundreds of AI tools, hundreds of workflows that you could automate.

So where do you even begin?

And I believe his answer is going to save you a lot of wasted time.

So what's the practical strategy?

How do you actually get started with AI in DevOps, given everything we've discussed?

And so that, to me, is most people's step one. I don't even think I have to tell them that; they're just all gonna do that first. But that doesn't necessarily correlate to AI in my CI, in terms of how I'm gonna set all that up.

It's gonna run somewhere else.

I'm gonna pick a bunch of tools that already exist to start with.

Maybe my team already uses Claude Code for everything, so we're just gonna use Claude Code in CI.

If your team has experience with that, do that, because you're gonna save yourself the time of learning the nuances of prompt engineering, and how do I write prompts for an AI so it doesn't hallucinate.

I gotta tell it, it's gotta do it great, or we might have to hurt grandma.

Like you gotta threaten it.

Hopefully grandma will be fine.

Hopefully we don't have to mess with grandma much longer.

There's another category, which is not necessarily a category of what AI will do, but the buy-versus-build model, basically.

Like everyone, right now, all these small startups are all in a race to be the DevOps AI company.

I'm not necessarily an expert on any of them, but I have been paying attention to that market and how they are all trying to basically shortcut, like any other build-versus-buy model.

Like I always tell people, you're either gonna pay for it with money or you're gonna pay for it with time. With open source, you usually pay for it in time, and if you're gonna buy, you usually pay for it with real money and it saves you time.

The same thing is happening with AI for DevOps. We're now aware of maybe half a dozen companies that are trying to be the one-shot AI-for-DevOps automation tool, where you use their platform rather than putting together your own things through your own scripts.

Turns out, largely, their problem is context.

So a lot of them are spending a lot of their time training and specializing these models to be really good at DevOps tasks.

Here's the practical strategy.

Step one, start with the AI tools that you are already using.

If your team uses ChatGPT, use that.

If you're using Claude in your team, use that.

Don't try to learn five new tools all at once.

Step two, pick one specific narrow workflow to automate. CI/CD pipeline generation is a good start, for example, or writing Kubernetes manifest files or generating Terraform modules; something specific and isolated.

Step three, document your requirements clearly. This is critical. The more context you give the AI, the better it performs. So write down your naming conventions, your security requirements, your standards, and feed all of that to the AI.

Step four, review everything.

Do not trust the AI output blindly.

Check it, test it, make sure it works, and even if it works, check that it follows best practices, and so on.

Step five, iterate.

The first attempt will most probably not be perfect.

You will give AI more context.

You will refine your prompts.

You will narrow the scope, and over weeks or months, it will get better.
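Steps three through five can be sketched in a few lines: turn your documented standards into reusable prompt context instead of re-typing them on every iteration. The standards below are invented examples of what a team might document.

```python
# Sketch of turning documented team standards into reusable prompt context.
# The standards and names below are hypothetical examples, not a real policy.

STANDARDS = {
    "naming": "resources are prefixed '<team>-<env>-', e.g. 'payments-prod-'",
    "security": "no public ingress; all secrets come from the vault, never inline",
    "pipeline": "every pipeline needs lint, test, and image-scan stages before deploy",
}

def build_prompt(task: str, standards: dict) -> str:
    """Assemble a task prompt with the documented standards as explicit context."""
    context = "\n".join(f"- {topic}: {rule}" for topic, rule in standards.items())
    return (f"Task: {task}\n"
            f"Follow these team standards exactly:\n{context}\n"
            "If a standard conflicts with the task, stop and ask.")

prompt = build_prompt("Generate a CI pipeline for the payments service", STANDARDS)
print(prompt)
```

As you iterate, you refine the standards dictionary rather than the prompt wording, which is exactly the "document better, then narrow the scope" loop described above.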

Now, there's also the buy versus build decision.

You can build your own AI workflows with open source tools, or you can pay for a platform that specializes in DevOps AI. Build takes time but gives you control; buy costs money but saves time.

So depending on what's your priority, you are gonna need to trade one thing for the other.

The key is start somewhere.

Do not wait for the perfect tool or the perfect workflow, or for AI to mature enough. But also don't panic and try to bring tens of AI tools into your projects all at once out of fear of falling behind.

You have time, so pick something small and start experimenting.

That's how you learn and that's how you'll be ready when the AI tools actually mature.

So after two hours with Bret, here is what changed my perspective completely.

Everyone is asking the wrong question.

They're asking, will AI replace me?

But the real question is, what happens when one DevOps engineer can manage 50,000 containers instead of 1000?

History gives us the answer here.

Companies do not downsize. They expand, they build more products, they enter new markets.

They improve reliability.

So the work never shrinks.

It grows.

Now, here's something that most people completely miss.

AI does not eliminate the need for DevOps knowledge.

It actually makes it even more important. Because think about it: when your monitoring system fires an alert at 3:00 AM and AI suggests three possible causes, how do you know which one is right?

You need to understand your infrastructure deeply enough to evaluate all those suggestions.

When AI generates a Terraform configuration code, how do you know it's secure?

You need to know what good Terraform looks like versus dangerous Terraform.

So the skill is not changing from doing to not doing and just delegating it all off to AI.

The skill is changing from writing to evaluating, but evaluation requires even deeper knowledge than writing.

You can't spot mistakes in code you don't understand, and this is why Bret's advice is so practical.

Start small. Pick one workflow, maybe CI/CD pipeline generation or monitoring.

Give AI clear documentation about your standards.

Let it generate the pipeline.

You review it.

You learn what it gets right and what it gets wrong, and over weeks and months, you build that muscle.

The muscle of working with AI.

Not replacing yourself, but augmenting yourself.

And the disconnect that Bret discovered at conferences is actually good news for you.

It means you are not late.

The hype says AI is everywhere and it's super magical, but the reality says that most engineers are still figuring out where to start.

So start now, not because you will fall behind otherwise, but because engineers who experiment today will have months or years of experience when these tools actually mature. And document your systems, not just for humans anymore, but for the AI that will help you work faster next year or in two years based on that documentation.

And very, very importantly, learn the fundamentals, not despite ai, but because of ai.

And final one is stay skeptical.

When someone promises full automation with ai, ask them, who have you talked to?

That's actually running this in production and getting the results that you are promising.

Now, if you want to continue learning about this topic of AI for DevOps, make sure to check out Bret's podcast.

It's called Agentic DevOps.

He's interviewing people who are actually implementing this stuff in production, so not some theory and opinions, but real use cases.

Which is how you should be learning everything.

And let me know in the comments, are you already using AI in your DevOps work?

And if yes, what workflows have you tried to automate already?

And share in your comment what worked for you and what didn't.

I wanna hear your real experiences.

That way, people who have not tried AI can see a realistic picture of AI usage in real projects. So leave your insights and experience down below for the whole community.

And with that, thanks for watching and I'll see you in the next video.
