Career and networking evolution with BGPMon's Founder Andree Toonk
Avi Freedman: Welcome to Network AF, a journey of super-nerd proportions into the world of networking, cloud and the internet. I'm Avi Freedman, networker, coder, husband and CEO of the network observability company Kentik. On this first episode, we'll talk with my friend Andree Toonk, of OpenDNS, BGPMon, Cisco and multiple networking projects, about career and networking evolution. Some of the takeaways you'll hear are about overcoming intimidation, staying connected to mentors, and driving your own growth. On the networking side, we'll talk about the internetworking world, the relationship between SRE and networking, and some of the trends in disaggregation and cloud-scale networking. Welcome, everyone, to Network AF. I'd like to introduce my friend and fellow networker, Andree Toonk. Andree, if you could give us a brief background about yourself, what you've been up to and what you're up to now.
Andree Toonk: All right. Thanks for having me, Avi. My name is Andree, I'm based out of Vancouver, Canada. I like everything networking and, I guess, what we now call DevOps. I've been sort of doing that for the last 20 years. I think what's unique is that I've always had one leg in what I call DevOps now, sysadmin in the past maybe, and the other in networking. I've really always tried to take the learnings from, mostly, the DevOps world into the network world. It's always been a lot of fun. I started my career at the Amsterdam Internet Exchange. I worked for a few ISPs, then moved to Vancouver, Canada, 15 years ago. In 2012, I joined a company that has really changed a lot for me, called OpenDNS, as one of the early, what was back then called, ops engineers. That was a lot of fun. When I joined, OpenDNS had a few PoPs or data centers; I left last fall, and I think they're up to 40. That's been a very exciting and interesting journey; I learned a lot there. Then also, I think one other thing a lot of folks might know me from is that I founded a company called BGPMon. I know-
Avi Freedman: Thank you.
Andree Toonk: ...a lot of folks... I know Kentik, obviously, and Avi, through that and through Cisco, and I know at Cisco we were big fans of Kentik. Everything around BGP monitoring, hijacks, outages and that kind of cool stuff. I know, Avi, you're super interested in that; you even had a predecessor to that. I think it was called-
Avi Freedman: I did.
Andree Toonk: Watch my Net or something?
Avi Freedman: Yeah. It was Watch my Net. Actually, I met [inaudible] at a convention for people who run science fiction conventions. He did the taunting-me thing to discover whether I was a worthy nerd, which I don't really enjoy, but anyway, I did that. Then he's like, "You can't build a BGP monitoring system." I'm like, "Well, I don't really do the front end, but I could certainly."
Andree Toonk: I remember the front end was a Perl CGI thing.
Avi Freedman: It was BGP. Now, he said, "It's actually dead," but I think that week I tried to compile my old BGP implementation that I wrote for my interview at AboveNet, that used to run the RBL. I was like, "Ooh, I guess BGP has changed a little." I found some BGP Perl thing and made it work and dumped the things and did the graphing. I just didn't really try to put... oh, I should have, but BGPMon did well. Maybe not. It did well, got a huge community.
Andree Toonk: It got a huge community and eventually got acquired, and it's now still, I think, being run by Cisco, although I think they're shutting it down, but you might actually know better by now.
Avi Freedman: It got rewritten and put into something, and now they're, with all respect, a competitor, because Kentik has just launched-
Andree Toonk: Kentik does something similar. I know. Obviously, you guys have Doug Madory, who's an expert in this field too. Anyways, that was a really fun journey. Then I left Cisco, so, a lot of that was around last fall. Right now I'm on a bit of a sabbatical and trying to figure out what I'm going to do next.
Avi Freedman: Well, we can talk about that, because I am only one of many people, I'm sure, who have tried to recruit you, and you've been like, "Nah, it's okay." But I've been following some of the fun stuff that you've been doing, which, again, we can talk about. You said systems. I know a lot of people are like, "Oh, networking is hardware." It's like, "Really? I mean, it's hardware that runs software." Software can run networking. I mean, I have always seen that it can be a continuum. But what got you into the networking, especially internetworking, but networking overall, side of things?
Andree Toonk: Well, I guess I was lucky. When I was in university, we had a teacher that somehow, I guess, got a deal with Cisco, and he had these Cisco labs, and CCNA was just becoming a big thing. I guess Cisco had sponsored a lab in my university, so they had all the gear, and they had also just released all these kind of Flash-based tutorials, Excel material that you could basically follow, and then you learn about networking. It turned out that I had just moved out of my parents' place to the university there and I didn't have too many friends there yet. The ones that I had, they still lived at home. I kind of got used to that lab, and it was open basically till midnight. I got addicted to it. It's a lot of fun. Then I just started doing them. Then I got really interested in it, then got really good at it. That's how it started. That's how I originally got into it. To be honest, I guess, I worked at a help desk before as well, but this is really how I got into networking. Then I got a gig at AMS-IX, the Amsterdam Internet Exchange, which is one of the larger internet exchanges in the world. That's where I learned even more. For me, this is where the door opened beyond the default route. I didn't know. I knew BGP, you set a default route to your ISP, but it never occurred to me what happens after the default route. Then I ended up there, and I was like, "Wow, there's a whole world behind here, which is super fascinating." I really never left that world. It's so fascinating. That's how I really got into it. The Cisco labs helped, and then the door opened to the "core of the internet," between quotes.
Avi Freedman: I remember the first time I visited AMS-IX, the university park site, I'm not used to smelling smoke in a data center, but people are still-
Andree Toonk: That's right.
Avi Freedman: ...smoking in the lobby, right outside the data center. I'm like, "Usually it's bad if you let the smoke out of the computer, the magic smoke that makes it run." Then, I remember they had, that was, I think, the first time I saw the forklift for sticking stuff in, and of course the Glimmerglass, which, if it was the United States, you would think was for the NSA, but was actually for an interesting approach to layer zero.
Andree Toonk: It's interesting you bring that up. My first real job was at SARA, which is in the science park where a lot of the AMS-IX stuff is. Some of my coworkers were still smoking when I worked there. I worked on the network called SURFnet, which is sort of like what Internet2 is in the US. One of my first automation experiences, network automation, was all around TL1. I worked a lot with someone at AMS-IX, Aryan, if you're listening. They were automating the Glimmerglass with TL1, and we had a lot of optical gear, like Nortel gear. Telco gear has a standardized language, which is called TL1, for folks that don't know it. This is a really obscure thing.
Avi Freedman: I didn't know that.
Andree Toonk: All the telco gear seemed to have it, at least in the past. Different vendors, Cisco, Nortel, all spoke the same kind of language. We built some Perl libraries to automate that, together with AMS-IX, to do the Glimmerglass.
Avi Freedman: Actually, for those that don't know, it is an interesting take. I don't know if they still do it anymore, but the Glimmerglass was sort of, it was an active, the idea was you'd have a standby port. It would be watching, so it would be doing the MAC learning. Then, when they wanted to maintain a switch, they... it wasn't all switched, they'd flip over to another. It was-
Andree Toonk: I think they just had basically two switches. The Glimmerglass is basically a big switch with mirrors in it, right, and it's an optical switch, but it's a very dumb switch. It either goes to switch A, the cable comes in and says, "I want to go to switch A." If switch A is unhealthy, well, then they change the mirror slightly and then it will go to switch B. The advantage as a customer was that you only needed one connection, and then, on the provider side, they would just say, "Oh, switch A is unhealthy. We'll switch you to switch B." Then you want to automate that. But the cool thing is that it was all optical, with mirrors, essentially.
Avi Freedman: Automated mirrors. Pretty cool.
Andree Toonk: Automated patch panel, whatever you want to call it.
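The failover described above can be sketched in a few lines of Python. This is a toy model only: the class, method and port names are invented for illustration, and nothing here reflects how AMS-IX or the Glimmerglass actually exposed this (per the conversation, the real automation spoke TL1 from Perl).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OpticalCrossConnect:
    """Toy model of a mirror-based optical switch (hypothetical names)."""

    # customer port -> exchange switch the mirror currently points at
    paths: Dict[str, str] = field(default_factory=dict)

    def connect(self, customer_port: str, switch: str = "A") -> None:
        """Point a customer's single fiber at one of the exchange switches."""
        self.paths[customer_port] = switch

    def fail_over(self, unhealthy: str, standby: str) -> List[str]:
        """Re-point every port on the unhealthy switch to the standby one."""
        moved = [port for port, sw in self.paths.items() if sw == unhealthy]
        for port in moved:
            self.paths[port] = standby
        return moved


xc = OpticalCrossConnect()
xc.connect("customer-1")
xc.connect("customer-2")
moved = xc.fail_over(unhealthy="A", standby="B")
print(moved, xc.paths)
```

The point of the model is the one Andree makes: the customer keeps a single connection, and the provider moves it between switches without the customer changing anything.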
Avi Freedman: Did you have great mentors, whether, how did you get... you did the lab yourself, did you have people pointing you at the things to learn and was there that kind of culture there at the time?
Andree Toonk: If I think back about the lab and really getting into networking and Cisco, a lot of people crap on that, like the certification programs, but to be honest, I think if it weren't for those, I would not have gotten into it. I think it gave me a solid kind of background in terms of, what is BGP, what's OSPF, what's... whatever, and actually some hands-on and getting some of the basics. For me, that was a great way to get in. I was surrounded by that teacher and a few other folks that shared that passion. I don't think there was really a mentor there at that point; it was pure passion. But certainly the opportunity was created for me to spend however much time I wanted on all the cool gear. If I think about mentors, I've had a few over my career, but I think, early on, when I was still in the Netherlands, there was someone called [inaudible] at SARA; he really got me into the automation type thing. He was a really smart guy. In university we had to do some programming, but I never enjoyed it. I never really got into it. But I guess the problem was you were just doing puzzles. I was like, "I don't get it. It's kind of boring. I want to go back to the lab." This was at SURFnet, where we were running this very advanced and very large network, and because of all this new gear, and especially the Nortel stuff, the network management systems didn't exist yet for Nortel. Again, in the telco world, you basically buy the network management system from the vendor, but because it was all so new, it didn't exist or it was very expensive. He and I kind of started going like, "Okay, how would we do this ourselves?" Back then Perl was a big thing, and he took me under his wing. Then there was another guy named Marco who helped me with a lot of BGP stuff. All of a sudden, my passion for programming kind of came back, because all of a sudden I had an itch to scratch. 
I had an actual problem, and it would save me some time in the middle of the night or whatever. I was a new engineer, sort of, so they gave me all the crappy jobs, like, "Go collect all the serial numbers from all the devices." Well, that's a three-week job, right? I can spend two days programming this thing and then run it in an hour and it's done.
Avi Freedman: Laziness breeds elegance in many people, right? It's the, "Why do I want to do this again and again?" It's an interesting theme, I guess we'll talk about it, because some of the things that you've done are now living between networking and systems, but there were in some ways more advances in networking automation, at least the life cycle, even before that really crawled out of the high-performance computing world and became cloud and SRE. Then, I guess it sort of stopped for a while. That's definitely interesting.
Andree Toonk: Avi, just maybe one sec back to mentors, right? There've been a few throughout my career, and those are the ones that really helped me get going. Then I guess my message would be, for sure, do find mentors, people that want to help you. To be honest, it always sounds a little bit scary, but a lot of times it's just, find someone you respect or who is a few years ahead of you. It can be a lot of things, but a lot of times it's just unofficial, right? Just have a coffee with that person once a week or once a month. The relationship changes over time, and with whom, but there's so much you can learn from others if you're just kind of open to it. Even for me, even now, I meet with a lot of folks. Sometimes they're just chats, and sometimes you're like, "Wow, they're full of gold." I think it's very important. I also tell people, a lot of folks within a company have one-on-ones. If I worked for Kentik, I would have a one-on-one with you, for example, every week. But more and more, I find it's very important to... If you're doing that, keep doing that, but also do it with people outside of your company. See, that's a very well understood thing, but as you go further on in your career, there's only so much you can learn. The company is a little bit of a bubble in itself. Meet with someone you worked with in the past. That's a one-on-one too. Don't feel guilty that you spend an hour talking to someone at another company, because you're learning from that too.
Avi Freedman: Absolutely. Sometimes I get daunted when I look at my LinkedIn and I try to accept people that I know. You never really know what can come in the future from staying connected to people. Definitely, for sure. The other thing I would encourage is, there's lots of different terms for the bright, shiny-eyed kid, but if you demonstrate that you are interested in learning and picking things up, there are often, especially in university, but even in companies, ways to get involved in projects and learn. In a healthy company and a healthy environment, people will invest in you if it's clear that you're learning and growing, and hopefully you will pay it back too, at some point. With COVID, we need to figure out some of these things, how that's all going to work and how we grow the community. But those are all things that we as a community are thinking about. You mentioned sort of, I guess now we have to say SRE or DevOps or DevNetOps, but the link between these. You certainly have come from the world of, well, the labs, the actual Cisco 2500s and 7500s and such. Also, on the internet routing side, which is already virtual, and OpenDNS was, I mean, services on top of a network, right, with anycast and load balancing and things like that. But I see you've been playing a lot with data plane networking, eBPF. Actually, the prehistory of Kentik was network sensors doing packet stuff. Then I just got really interested when customers said, "What do we do with all the data?" But what was your interest in playing with high-speed networking through systems and all the Linux evolution around that?
Andree Toonk: I think there were a few things that drove me to dig into it deeper. Some of it was driven by the things that were happening in my work environment back then, at Cisco Umbrella. I know a lot of other folks are doing this too, with what's called SASE, where basically what's happening is you're offering the typical network services as a service, right? Typical network components like a load balancer or a firewall or NAT or IPsec. You see this in all the cloud providers, starting with Amazon, which has IPsec gateways, NAT gateways, whatever. Traditionally, we would ship lots of appliances, but that doesn't really work anymore. The question then becomes, how do you build these things in a virtual environment, in a cloud-native way? That was something I worked on a lot and was super fascinated about, because again, it brought together those two worlds that I was interested in. That worked really well for me. There's two challenges to be solved. One of them is the implementation on the control plane and management plane, because the nice thing about an appliance is, it has ASICs and, basically, they scale vertically, right? They can do 100 gig in one device, for example. Well, there's no way you can get that out of a VM, right? Then you're faced with it: you have to basically disaggregate that. Now you need to have, I don't know, let's call it 10 VMs to do the equivalent of one big box. Now you're basically into the distributed computing problem. How do you synchronize state and all that kind of stuff? That's a whole interesting problem. That was part of the problem space. It's super interesting. Some people call it microservices, but basically it's a distributed computing problem. Then the other part is speed. These big boxes have ASICs and they go crazy, they're really fast, right? If you do this in virtual environments, you actually have a problem because, well, there's a few problems. 
The first one you often hear about is that Linux networking is slow, right? That's kind of true, I guess. Part of my journey was to define what is slow. The conclusion was, in Linux, and I'm going to skip a whole bunch of details, you can do roughly a million packets per second per core. That's sort of the rough-
Avi Freedman: Through the kernel. Just to be clear, through the kernel, forwarding using the IP stack.
Andree Toonk: Thanks for clarifying.
Avi Freedman: That's the baseline, right.
Andree Toonk: There's a whole bunch of details there, but that's rough. If you want to take the numbers: you just use kernel networking, you say the kernel does the forwarding, a million packets per second per core. Whether that's slow or not really depends on your situation, but certainly if you're trying to do 100 gig through a firewall service in a data center, well, that's... Now you really have to do a... First of all, again, I'm talking about 100 gigs, but really, the number we should be talking about is the number of packets per second. The reason why we say packets per second is that, roughly speaking, again skipping details, every packet is an interrupt or a soft interrupt, essentially. Right? This is where the CPU speed then kicks in. Actually, we'll go a little bit deeper; since we're here with Avi, I mean, we can go geeky. That's just the kernel, right? Let's say you're a fancy shop and you're doing Kubernetes. Cool. What does a typical Kubernetes world look like? Well, actually you run Kubernetes, basically Docker containers, on a VM host on hardware, right? Typically in the world of Linux, you have these, what we call VNIFs, virtual network interfaces, connecting them. If you now look at the path and the cost that it takes to take a packet off the physical wire into, say, the hypervisor, that's the first NIC you get into, the network interface. That's an interrupt. Then you go into the VM host and then you go into the Docker container. There's typically three: the physical NIC, a VNIF through the VM host, and a VNIF through the container.
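To make the "think in packets per second, not gigabits" point concrete, here is a small Python calculation of how many packets per second a 100 Gbps Ethernet link carries at different frame sizes. It assumes standard Ethernet framing overhead of 20 bytes per frame on the wire (8-byte preamble and start delimiter plus a 12-byte inter-frame gap); the function and constant names are made up for this sketch.

```python
ETHERNET_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD + 12-byte inter-frame gap


def packets_per_second(link_bps: float, frame_bytes: int) -> float:
    """Maximum frames per second on a link for a given Ethernet frame size."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


for size in (64, 512, 1518):
    pps = packets_per_second(100e9, size)
    print(f"{size:>5}-byte frames: {pps / 1e6:7.1f} Mpps")
```

The same 100 Gbps link is roughly 149 million packets per second with minimum-size frames but only about 8 million with full-size frames, which is why a per-core packet rate is the more useful budget figure.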
Avi Freedman: You might have a magic tunneling policy enforcement.
Andree Toonk: There's that too. But just in this example, there's a few of these virtual interfaces, and they all have a cost. They all have that 1 million packets per second per core type thing. Now if you have a simple scenario, as I just explained, and there's lots of variants, then all of a sudden your budget is cut into three, right? Very crude. Anyways, that was something where, when I was in that world, it was like, "Well, this sucks." Because how are we ever going to build a 100 gig IPsec gateway, for example, at a reasonable cost? Long story short, I started exploring what the alternatives are. There's two main alternatives that have a bit of traction. One of them, which people have probably heard of, is DPDK. DPDK is basically a very fast driver that bypasses the kernel. But, it's only-
Avi Freedman: But it only works well with spin locking, right?
Andree Toonk: Yeah. Basically, the CPU runs all the time, whether you have packets or not. It's a little bit of cheating, but if you work in network-heavy environments, you probably don't care about giving up a CPU just for the networking. This is part of the challenge: Linux is a multi-user, multifunction system. It's optimized to be as generic as possible. Whereas for a lot of these workloads, you actually don't want time sharing. You basically say, "Hey, this CPU, all you're doing is network packets." Then you can leverage L1 caching and all that kind of stuff. Basically, that's what DPDK does. But then DPDK only does sending and receiving. It doesn't do forwarding, for example. Now you need to build a network function that actually understands, "Well, this packet with this destination IP or MAC should go there." That's what VPP does, and we can dig deeper there, but DPDK and VPP is one option. In the BPF world, you have XDP. DPDK and VPP completely bypass the kernel, which is very interesting. The NIC, the network interface, literally disappears: if you do ifconfig or ip link, you don't see it anymore. You gave it to a userland program and that's taking care of it.
Avi Freedman: Can you use SR-IOV and make some of the NIC interrupts disappear, or does it take over the whole NIC only?
Andree Toonk: Yeah. You can do SR-IOV, and then it makes it a little bit easier, because then you don't have to have dedicated NICs. Certainly good for testing, if you don't have a machine with multiple... because then you need an out-of-band one for management. The other one is XDP, which actually is a little bit of a hybrid between the two, and it allows you to execute network code. XDP is basically part of BPF. A lot of folks have heard of BPF, and the specific BPF code related to networking is XDP, eXpress Data Path. It basically executes some of the network code a lot earlier in the Linux kernel stack. As a result, it's a lot cheaper.
Avi Freedman: It's above the driver, but before the IP stack?
Andree Toonk: Yeah. It's after the driver, and it's almost immediately after the driver code. What they basically want to say is, in Linux, you have this concept of SKBs, which is a structure for socket buffers. As soon as you get there, you get very slow. They say, "We want to do a lot of that work before that." That's what you can do. In one of my blogs, I said, "Okay, let's say you have two network interfaces and I want to route between them. I still want to use BIRD or whatever for BGP, and build the forwarding in XDP," which is really nice because you still get to use the Linux forwarding table, the kernel routing tables. You can say ip route this, ip address list, behind that NIC, which, again, if you use DPDK and VPP, you can't. Everything has to be done in userland.
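The forwarding decision being reused here is a longest-prefix match against the routing table. A real XDP program would be written in C and would ask the kernel for this lookup (the BPF helper for that is `bpf_fib_lookup`); the Python below is only an illustration of the lookup logic itself, with an invented routing table and invented interface names.

```python
import ipaddress
from typing import Dict, Optional


def lookup(routes: Dict[str, str], dst: str) -> Optional[str]:
    """Return the egress interface for dst using longest-prefix match."""
    addr = ipaddress.ip_address(dst)
    best_len = -1
    best_if = None
    for prefix, ifname in routes.items():
        net = ipaddress.ip_network(prefix)
        # Prefer the most specific (longest) prefix that contains the address.
        if addr in net and net.prefixlen > best_len:
            best_len, best_if = net.prefixlen, ifname
    return best_if


# Hypothetical routing table: prefix -> egress interface
routes = {
    "0.0.0.0/0": "eth0",
    "10.0.0.0/8": "eth1",
    "10.1.0.0/16": "eth2",
}

for dst in ("10.1.2.3", "10.2.0.1", "8.8.8.8"):
    print(dst, "->", lookup(routes, dst))
```

The appeal Andree describes is that with XDP the table behind this lookup is still the kernel's own, so a routing daemon and the usual `ip route` tooling keep working, whereas a kernel-bypass stack has to maintain its own copy.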
Avi Freedman: Zero kernel. But you get some kernel.
Andree Toonk: This was the big lesson for me. As we went into this journey, it was like, "Okay, well, what does this really mean?" Then it became very obvious. If you take it away from the kernel and you have to do everything in userland, you have to reimplement everything. There's no tcpdump. There's no iptables. You can't type in ip route this. In fact, there's no TCP stack, right? All of that has to be reimplemented.
Avi Freedman: It's the unikernel, people trying to make a unikernel to run a microservice, but for networking.
Andree Toonk: You can do it really fast, but you'd better go all in and have a team to actually do that. It's not something you can just kind of do an-
Avi Freedman: What was the answer? One million packets a second per core; what was the answer with XDP?
Andree Toonk: I think with XDP, it was something like 10 million packets per second. With VPP, I think it was around 14 million packets per second per core. Significantly more expensive, or rather, significantly cheaper in terms of budget. Just switching to XDP, which is sort of a nice middle ground where you get some kernel functionality but a lot of speed improvements, you've got a 10X improvement or so. If you guys are interested, look at [inaudible], that's where all the details are. I don't recall the exact numbers, but it was pretty impressive. It's a very steep learning curve though. You have to figure out if that's what you need, but certainly some of the new CNIs in the world of Kubernetes are leveraging a lot of that stuff. A lot of that work: VPP and XDP have CNI implementations. That's a very interesting topic.
Avi Freedman: You've come a long way from not liking coding. But again, you have to find the right problem to solve too. That's pretty cool. I remember when I first started seeing people try to do this and I was looking at Snabb Switch and I'm like, "That's pretty fast. What's going on there?" That's pretty interesting.
Andree Toonk: Snabb is another one. There's a few out there. But I think XDP is gaining a lot of momentum, and I think VPP is super interesting as well. By the way, if you guys are interested in VPP, it also goes by Fido, and the project name is fd.io. It's actually a Cisco technology that they open sourced. Apparently some of the Cisco products use it internally, and it's pretty powerful. I guess, if you need raw power, go look at it, but only if you really need it, because it comes at a pretty high cost in terms of investing in understanding what you're doing, right? It's not like, "I'm going to replace some parts with NGINX or something, it will take 10 minutes to figure out." No, this is going to be measured in days or weeks.
Avi Freedman: Well, that's a separate topic, maybe for another time, but the whole idea of white box, and the idea of running Linux, it's nice, but be careful what you wish for when you don't have vendor support for everything. I think, again, a separate topic, there has been some interesting middle ground. But I wanted to ask you: I saw all that work, which I consider to be low level. You have to know how the systems work. You have to understand what's in the kernel, at what layer. I like that stuff, but then I've been looking at Mysocket, SASE. I was following Tailscale and some of that stuff. That's sort of upper level. I was like, "Well, that seems pretty simple," but it seems there are some interesting things that you can do with it. I'm sort of curious about what motivates you.
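As a rough back-of-the-envelope check on those figures, the sketch below turns the per-core rates quoted in the conversation (about 1, 10 and 14 million packets per second per core) into a core count for a 100 Gbps link carrying minimum-size frames. The numbers are the approximate ones from the discussion, not benchmarks; the target rate and the `hops` parameter (modelling the "budget cut into three" across stacked interfaces) are assumptions of this sketch.

```python
def cores_needed(target_pps: int, per_core_pps: int, hops: int = 1) -> int:
    """Cores required when each packet costs one unit of per-core budget per hop."""
    total_work = target_pps * hops
    return -(-total_work // per_core_pps)  # integer ceiling division


# 100 Gbps of 64-byte Ethernet frames is roughly 148.8 million packets/second.
TARGET_PPS = 148_809_524

# Approximate per-core rates as quoted in the conversation.
RATES = {"kernel": 1_000_000, "XDP": 10_000_000, "VPP": 14_000_000}

for name, rate in RATES.items():
    print(f"{name:>6}: {cores_needed(TARGET_PPS, rate)} cores")

# Kernel path through three interfaces (NIC, VM host, container).
print("kernel, 3 hops:", cores_needed(TARGET_PPS, RATES["kernel"], hops=3), "cores")
```

Under these assumptions the plain kernel path needs on the order of 150 cores for worst-case line rate, while the faster paths need around 11 to 15, which is the cost difference being described.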
Andree Toonk: You're kind of curious. Like, "What is this guy up to? He's all over the place. He's doing this, he's doing that." First, there was all the cloud-native networking type stuff, the data plane networking and the control plane, and then another project that I was working on is Mysocket. Maybe a little bit about what that is. The 10,000-foot-level thing is, what is Mysocket? It's kind of an alternative to remote access, VPN type stuff. I think the challenge with the typical VPN is, when I was at Cisco, or any big org, it's like, "I want to onboard Avi as a contractor." And he needs access to wiki.com. I have to remember to create an IT ticket two weeks ahead of time. We would create a corporate account. I would send you the AnyConnect client, then you could VPN in. Then the challenge with that is, well, Avi's now VPNing in; now he basically becomes a member of the network. Again, I'll use Cisco as an example. That's a very big network. Avi's a smart guy, he's going to poke around, right? See what he can do. It turns out we also gave him corporate credentials, because they use SSO for VPN. Now he finds a GitHub server. Chances are, he can actually log in. Right? So that's a typical problem. I was like, "Okay, how do we solve this?" Now, there's a lot of people that have thought about this. With Mysocket, the question was, how would you build a solution to that? Something like what Google calls BeyondCorp; other people have called it private access, zero trust type stuff. That's what Mysocket is. The basic idea is that you have this wiki server, and it actually kind of dials out to, in this case, Mysocket, over a secure tunnel for just, say, port 443, that particular service. Then Avi says, "Okay, I want to go there." We do all the corporate [inaudible]. And then we act sort of as the bouncer in the middle and say, "Okay, we have a checklist. Is Avi allowed?" We also check what Avi's doing exactly. 
Then we stitch those two connections together. The nice thing is that the wiki, for example, can be in a private VPC behind NAT, because it's an outbound connection, so no VPN changes are needed. Avi can be anywhere, and it can also be clientless; you don't need a VPN client or something like that. One of the things that I'm doing, which from a technical perspective really drives me: okay, HTTP was relatively easy, because there's a lot of proxies out there. How do you build more application-aware proxies? And with application-aware proxy I mean, like, it can speak the protocol, like HTTP.
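The "bouncer in the middle" flow can be sketched as a small in-memory model: the service registers an outbound tunnel, the user asks for the service, and the broker only pairs the two if policy allows it. This is a simplified illustration, not Mysocket's implementation; the class, method and identifier names are all hypothetical, and real connection handling, authentication and encryption are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class AccessBroker:
    """Toy policy broker that pairs a user with a service's outbound tunnel."""

    tunnels: Dict[str, str] = field(default_factory=dict)      # service -> tunnel id
    policy: Dict[str, Set[str]] = field(default_factory=dict)  # service -> allowed users

    def register_service(self, service: str, tunnel_id: str) -> None:
        """Called when a service dials out to the broker (no inbound access needed)."""
        self.tunnels[service] = tunnel_id

    def allow(self, user: str, service: str) -> None:
        self.policy.setdefault(service, set()).add(user)

    def connect(self, user: str, service: str) -> Tuple[str, str]:
        """Check the 'checklist', then return the (user, tunnel) pair to stitch."""
        if service not in self.tunnels:
            raise LookupError(f"no tunnel registered for {service}")
        if user not in self.policy.get(service, set()):
            raise PermissionError(f"{user} is not allowed to reach {service}")
        return (user, self.tunnels[service])


broker = AccessBroker()
broker.register_service("wiki", tunnel_id="tunnel-443")
broker.allow("avi", "wiki")
print(broker.connect("avi", "wiki"))
```

Note that access is granted per service rather than per network: a user who is allowed to reach the wiki never becomes a member of the network the wiki lives on.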
Avi Freedman: Is QUIC tough too?
Andree Toonk: I haven't looked at QUIC. The ones that I've been looking at are more the typical remote access use cases, like SSH. I have a bastion host. Normally, you need a VPN, right? Or a router. I don't want to use the VPN. Okay, well, how do you do that? Then I built my own sort of SSH proxy, my own implementation of that stuff, and with things like Go, it's actually relatively easy, so I learned a lot of Go. Because we speak the protocol, we can do session recording, we can kill sessions, all that kind of stuff. I'm building an SQL proxy right now where we can log all the queries or even modify what's coming back. If Avi does a select on users, I'm garbling the email address or something. That's the kind of stuff I'm working on with Mysocket, where you're basically sitting in the middle as this application-aware proxy and provide policy, authorization, authentication, enforcement and recording and stuff like that.
Avi Freedman: I would encourage people that are looking at trying to play with technology to look at what Andree's done. I try to think the same way: what's the minimum covering set that I can build to demonstrate the concept, and then build on top of it, and then all of a sudden it's like, "Oh, look, a zero trust and everything platform," that you could do whatever with. Then, that'll be the interesting question to ask next time: what are you going to do with it? Just project or company side, because there's lots of different takes on the direction things are going. I'm actually in Vegas. This is my virtual background, but of my actual home. But I'm in Vegas at Black Hat. It's much smaller than before, but it's just interesting to try to poke through the marketing and figure out what people are actually doing.
Andree Toonk: It's part of my journey as well. A lot of these things I start with, like, "I don't really understand what this..." I keep hearing about it. Then I get slightly frustrated because I don't understand it, right? Well, let's just build this DPDK or XDP thing. In this case, well, zero trust, very popular. I was like, "I don't really understand anymore what this is." And I just go ahead and go like, "Well, if I would build it, what would it do?" Then it's kind of a forcing function to go in and figure this out. I'm sure now that you're at Black Hat or DEF CON in Vegas, you've heard zero trust like a million times, and everybody means something different. It's just a great exercise to figure out what you think it is, because you're forced to... If you're implementing it, you'd better understand what you're doing.
Avi Freedman: Actually, in 2003, for the first issue of ACM Queue, Eric Allman, who I was like, "Oh, Mr. Sendmail," reached out and was like, "Would you do an article for this?" And I'm like, "What do you want to write about?" I was like, "I don't know what to write about." He said, "Well, what does port 80 make you think of?" I'm like, "It makes me think, ooh, I think I can say this, of the illicit tunnels that I have put in to my home stuff, because of their firewalls." I didn't like the firewalls. I said, securing the edge. If I had said zero trust, it would have been really cool.
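The "garble the email address on the way back" idea can be illustrated with a small result-rewriting function. Andree's proxy is described as being written in Go and speaking the database protocol; the Python below skips the wire protocol entirely and only shows the masking step applied to rows a proxy has already decoded. The function names, the masking rule and the set of sensitive columns are assumptions made for this sketch.

```python
from typing import Iterable, List, Sequence, Tuple


def mask_email(value: str) -> str:
    """Keep the first character and the domain, star out the rest of the local part."""
    local, sep, domain = value.partition("@")
    if not sep or not local:
        return value  # not something that looks like an email address
    return local[0] + "*" * (len(local) - 1) + "@" + domain


def mask_rows(
    columns: Sequence[str],
    rows: Iterable[Sequence[object]],
    sensitive: Tuple[str, ...] = ("email",),
) -> List[tuple]:
    """Return rows with values in sensitive columns masked."""
    indexes = [i for i, name in enumerate(columns) if name.lower() in sensitive]
    masked = []
    for row in rows:
        values = list(row)
        for i in indexes:
            if isinstance(values[i], str):
                values[i] = mask_email(values[i])
        masked.append(tuple(values))
    return masked


columns = ["id", "name", "email"]
rows = [(1, "Avi", "avi@example.com"), (2, "Andree", "andree@example.com")]
print(mask_rows(columns, rows))
```

Because the proxy sits in the path and understands the protocol, the same hook point could also log each query or drop a session, which is the policy and recording role described above.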
Andree Toonk: Well, you were way ahead of your time.
Avi Freedman: Actually, I just was talking about IP, so I didn't have the user concept in there, but, obviously, nowadays, that makes sense. Who's the user or the role or the service or-
Andree Toonk: Tie it all together. Right. User, expected role, the application. Now you can tie in context like, "Hey, Avi is all of a sudden in Vegas at Black Hat, maybe one or two, and there are laptops doing all kinds of weird things." Maybe you want to dial back a little bit of these permissions and stuff.
Avi Freedman: You mentioned something which I think is pretty cool, because it's been part of the journey of networking for me. I sometimes describe networking as lots of little simple things that interact in complex ways, involved with vendor bugs. The average sysadmin might find a kernel bug in their lifetime nowadays; it didn't use to be the case, but nowadays. For the average network practitioner, it might be monthly or quarterly or yearly depending on how active you are. The other thing with networking that I've found is that it can be hard until you put your hands on it. You think there's complexity that there isn't. Like, I remember when I was learning BGP, I'm like, "There must be performance-based stuff. Something must make things change when the performance is bad." No, there isn't. What you just said is really powerful, which is like, "Sometimes you just need to put your hands on it." Whether that's because it's all marketing or because it can be hard in any kind of technology to see the description and put it in your head until you really put your hands on it. I love that you're making all the blogs available and the code available for people to play with, that's pretty cool.
Andree Toonk: No better way to demystify some of this stuff than just getting started. To be frank, it is scary, it's intimidating and it takes a lot of time. You do have to have that time; not everybody has that, and especially not every employer allows you to do some of that stuff. But that's the best way, just get your hands dirty and keep staying in shape, essentially.
Avi Freedman: One of the things I like to ask people is sort of, what's hot and what's hype. It sounds like you're both big and hype-y on SASE, but you're big on what you can do with Linux and disaggregating. But I guess I'll ask you the question: what do you think in networking in general is really cool? And what do you think is being talked about maybe more than it should be?
Andree Toonk: That's a deep question. There's a lot of hype around, well, zero trust, stuff like that. There's a segment of things called SDP, Software Defined Perimeter, which is actually quite interesting. I don't know if this is exactly your question, but it's certainly what I've seen over time. I know a lot of folks like you will say that, traditionally, networks are dumb. It's just big, dumb, fast pipes. Then there's been this trend over the last 20 years, somewhat driven by people like us, or our employers, that want to make more money out of the network, to put more smarts and services in the network, in the network gear, into firewalls and all that kind of stuff. In fact, Kentik as a company wants to extract value from the network, value in terms of visibility and stuff like that. Unfortunately, as an industry we've been trying to put more and more stuff like that into the devices, into the networking. Network gear has gone from just doing layer two and layer three all the way up to layer seven. This has been very, I've seen this around me, I guess, stressful or difficult on network engineers. That's why I, heavily biased, think that in terms of engineers, internet engineers, network engineers are some of the best around, because they are forced to understand the full stack. I remember outages in various companies where, if there was an outage, it was typically the network engineering team that, first of all, had so many scars, they were battle-hardened, so they were cool under pressure. But they also understood, in a lot of chaos, they were very good in trying to figure out where's the problem, is it in this stack, this stack, where, because they understand all that. But there's also a lot of cognitive overload. This has been great, or it's worked well, for people like me and you, that every year we learn something new and are sort of driving or sort of surfing that wave.
It's a lot more challenging for people that come out of university and want to become a network engineer. Right? First off, I don't think there is an education or a program. There's lots of computer science stuff; that's a problem. I don't know if that really ever existed, but the way into sort of network engineering was a lot easier. There's this problem for new engineers and even existing ones, I call it cognitive overload. You need to do so much nowadays. Depending on where you work, the network side is seen as a valuable asset. Certainly in the world where I came from, we were delivering network add-on services, so it was seen as a strategic advantage, same for Amazon, the more we invest in it the better and stuff. Whereas if you work in a more enterprise environment, you're typically just the cost center, right? That's very challenging, but you still have to deal with this cognitive overload. Why am I saying this? I feel we've reached a top, a peak. Right now, what you're seeing more is that networks are made dumber again, and this is a good thing, in my opinion. We talked a little bit about service mesh, and a lot of that stuff is now built into applications. I think service mesh, as hype-y as it may be, look, that's the whole question, what does service mesh mean to you and me. But a lot of times I think that's interesting that you see. Basically, what service mesh provides is an alternative to your traditional load balancer or traditional firewall or whatever, right? Where traditionally we would put a load balancer in the network and we would funnel all the traffic through there. From there on it would go to the things again. Service mesh is different. It's hosted in a much more meshed way, if you will. But it's interesting that those responsibilities are now being basically put in a different part of the stack and also in different teams. Similarly with security, where you had firewalls. But if you look at a cloud, we don't have firewalls anymore.
Now we have security groups and whatever, and they're managed by the teams themselves. I think those are all good things to be done, because no network engineer I know ever liked doing ACLs and stuff like that, although they will always be the voice of reason, like, "Are you sure you want to open this up to the whole world?" I think that's a bit of a trend that I've seen, and I think that's good for network engineers. It's also good for innovation in general, because other teams can kind of re-imagine that. Now, whether that's going to be as scalable and as good, look at Kubernetes networking. It's getting better slowly, but it's been a big mess.
Avi Freedman: It's flexible. Kubernetes networking is flexible.
Andree Toonk: It's flexible. But if you came from the world where I came from, where you want to build network functions, you want to build a firewall. Okay. First of all, I have to go through these three levels of network interfaces. Oh, by the way, there's like three levels of NATing as well. It's like, "This is great for proof of concept lab stuff, but there's no way I'm going to get significant amounts of traffic through this." I mean, that's part of just the evolution, eventually they'll solve that. But I think it's certainly a trend that I'm seeing, and I'm not quite sure if it's good or bad, but it's certainly good for network engineers that have this tremendous cognitive overload, I think.
Avi Freedman: When I was an ISP in the '90s, every year it was a paradigm shift, it was awesome. For people that hate being bored, by which I mean hate being bored from a technology perspective. But I do see a lot of customers, a lot of people, sort of confused between the service mesh, and then Istio was going to be hot, but it's a coordinator. What are you running underneath? Then network mesh and what some of the network meshes are doing, because ultimately there's policy, there's load balancing, there's telemetry, there's all these things. I do sort of like the separating into dumb pipes versus the services on top. But I think there still is a lot of question about how much is sort of the load balancing, service-meshy stuff, and how much of that network intelligence and balancing and things like that is going to happen down there. But it's interesting, I agree, it is an area of hotness and hype altogether. We see a lot of innovation and I look forward to seeing and figuring it out. We keep it pretty simple with our infrastructure backend, because we want the training to be simple and stuff like that.
Andree Toonk: It's super fascinating. Our service mesh stuff is basically all these overlay tunnels going everywhere. It certainly allows for more innovation, I find. I know we got a lot of questions from other teams to the network teams that I led. I mean, we wanted to do it, but there just wasn't enough time. What you see in several companies is that there are now two teams. There's the network team, the big pipe, dumb pipe stuff. Then there's an overlay team, for example, somehow embedded or however you want to implement that. That's where you could do a lot of smarts. I only can see the networks that Avi has invited me to; we provide the encryption on that level versus sort of the underlying level. We've brought identification on top of that, and failovers, and now it's all software. You're no longer limited by what you could do on the big Cisco, Juniper, Nokia, whatever boxes, right? Now what you want to do is only limited by your imagination. It makes me very excited.
Avi Freedman: It's exciting, but it's scary, because how do you document all these? Again, it's many simple things, but they can evolve in complex ways. Something else you said, you were talking about the lab days when we started talking. It used to be said that until you've destroyed a system you aren't a real sysadmin, and until you've taken down the internet you're not a real internetworker. Is that easier or harder? I mean, do we have labs that simulate that? Was there more freedom, less freedom?
Andree Toonk: That's a great question. I have a few of those scars myself. I'm very proud of them. I wear them with pride, but if I think back on them, they were very stressful. Having worked in a lot of operational environments and training new engineers, sometimes I feel a little bit bad because it's a lot harder. Well, it's still very easy to make those mistakes, unfortunately, although we try to put in more controls. But the cost of making those mistakes is so much higher nowadays. When you created some kind of routing loop, a PoP went off, that sucked for a few minutes, but then it wasn't the end of the world. But now, depending on your environment, there's millions of dollars and just really bad press or whatever. I think those mistakes will still get made, but the cost is a lot higher and it will make a lot more impact on the L&D engineer. You see this trend now with [inaudible], I think that's great. That's been a change over the last maybe two or three years, because accidents do happen and humans do make mistakes, and hopefully we'll learn from it. But it's certainly, I think, harder. Well, it's not harder to make the mistakes, maybe it is, but the cost is so much higher, so people are less willing to experiment. They are a lot more careful.
Avi Freedman: It's definitely interesting. People are trying to do labs to simulate the internet and peering and all that, especially because it's not just about technology, it's about politics, which is sort of layer eight or nine, depending on how you look at it, especially with internet engineering. But how do we help there and give people... Because sometimes, until you really get it in your head, the dangers of redistributing routing protocols, which hopefully no one's doing much. Until you see it, it can be hard to really internalize. The thing about automation is, it can save a lot of time, or it can do the wrong thing really fast and really well, because computers are dumb and will do what you tell them to do.
Andree Toonk: "You want me to remove this thing? I've done it everywhere now." And there's a global outage. Every now and then we hear, I mean, recently there was something with Fastly and then Akamai, I think two, three weeks ago, and it's just hard down. Right. Whether that's network-related or DNS, it's the same type of challenge. I fully agree. The other challenge is, as we are trying to scale these environments, we're actually putting layers on top of that. There's a lot of network engineers, I don't mean this in a bad way, that actually never log into routers. All they do is make changes through a GUI or a pull request and a Jinja template, and then magic happens, which is great for scaling and automation and makes the network safer, assuming you do testing on them. But the challenge then is, now it broke, and eventually something's going to break, whether it's a bug or whatever. Now all of a sudden you do have to log in. One of my worries is that over time we lose that knowledge, someone that can log in and understands the TCP three-way handshake, BGP handshakes, and why is this route flapping, stuff like that. That's a very interesting challenge as we scale up, like, how do you keep that sort of more and more niche knowledge? Are there enough people that keep doing that?
Avi Freedman: As an observability vendor, I like to think at some point we'd have all the telemetry and whatever to do that. But as you said, sometimes you need to get the tcpdump because there's something in the middle, which you're never going to get telemetry from, which is behaving poorly, and you need to be able to point the finger and go figure it out. I mean, sometimes it has been handy. Sometimes my fingers actually know more than my brain what to do on a device. As you mentioned, you do need to get baked a little bit under pressure sometimes. Sometimes it's helpful to have your own parallel processing of like, "Here's the incident, here's what I'm looking at, and here's what's up there." It's something I'm thinking about. I think the community is thinking about it, especially with COVID, because networking, especially internetworking, has always been a little bit of tribal knowledge and an apprenticeship. In the '90s, you didn't graduate to above journeyman until you actually broke something.
Andree Toonk: Congratulations, here's your medal. Now you have root access.
Avi Freedman: Exactly. But it's hard to really understand how bad the internet was in the '90s, and yet it all worked. But now that we live connected over it, those are things, look, we're all going to think about, blog about, if you have any great ideas. Something I'm going to keep asking people about. On that note, any advice you'd give younger Andree, for the career?
Andree Toonk: Younger Andree. I don't know, I think I was very lucky to just stumble into the right things and be surrounded by this lab and the right people early on that just kind of, I guess, directed me in the right way, and was lucky to take the right risks and just have a good environment and good people around me. If I think about new networkers or people that are interested in this, or maybe you're just getting started or earlier in your career, I think, find those people that you think you can learn from and that are willing to spend some time with you. Find an environment and a company where things like experimentation are encouraged, versus "just don't touch it." That is just fear, and you'll never learn much. And those companies do exist. Just talk to a lot of folks, read things like NANOG and stuff like that. Even that has changed a lot, but even by lurking and watching the presentations, you'll learn so much. To be honest, the one thing that was very important to me was that I early on learned how to code, essentially. I still think that some of the best people I've worked with understood both. I mean, you don't have to be great at programming; basic scripting is enough. Nowadays, especially if you are in networking, Python is kind of the go-to language. There's tons of libraries out there for network folks. I think that's very important. It will set you apart from some of the others. It will make your job a lot more fun too. All of a sudden, you're no longer handcuffed by what the vendor allows you to do. You can build your own systems and change things. I think that one is definitely one of my big tips: spend some time on it. And if you don't know how to do it, if you've never done it, it can be very intimidating. But this has been the same for me with, like, DPDK or XDP or service mesh. I didn't know any of that. What's hard is to get started, and it's almost like, "I don't know. This weekend, or I'm going to take two days off."
Or maybe if you're lucky you can talk to your manager or your boss and say, "I want to spend two, three days just to get going," because those first two, three days are the hardest. Just watch some videos, do a course; there's lots of course websites out there. Then once you get going, once you've written your first few scripts, then all of a sudden it starts to unlock, and then you can start to solve your own little problems, and then the future is limitless, essentially.
Avi Freedman: I guess I'll follow that up with a few thoughts. First, I want to encourage everyone listening to do what Andree has done, something I have done too: as you learn, document and teach. If you have a question that was confusing for you, I guarantee you, it's confusing to someone else. No, you will not look... you will look smart by breaking down the things that were confusing and then how you got unconfused, right? It's like, "The kernel is this big thing." Then we try to understand what it is and hear it in a different way, because different people learn differently and need that. I'd just encourage people; that also is really helpful to the community. And also acknowledge, which you've said a couple of times, Andree, that we've both been pretty fortunate and lucky: we found our way in, we had access, we had the privilege to take time and do what we thought was interesting. If people are looking to get in, people are looking for mentors, people are looking for pointers, feel free to ping me, I'm avi@avi.net, avi@kentik.com. People like Andree and me, if you demonstrate that you're reading and thinking and interested and can be passionate, we're happy to help connect you to people in your area of interest and try to grow the community. Because it's something we're all actively thinking about as things get more and more abstract, and even as the world is opening up, hopefully safely, post COVID, there's going to be different patterns that we need to figure out for how to teach and grow these communities. Thank you for being on and sharing, Andree. How can people find you and reach you?
Andree Toonk: I'm on Twitter @atoonk, A-T-O-O-N-K. That's the best way to reach me, or check out my website at toonk.io, about all my adventures around exploring some of these things and learning sort of in the public.
Avi Freedman: Cool. Well, thank you very much. I look forward to maybe in another year seeing what progress you've made on Mysocket and whether you have a third project, maybe a product company, coming down the pipe.
Andree Toonk: Awesome.
Avi Freedman: Thank you very much.
Andree Toonk: Well, thanks for having me, man.
DESCRIPTION
In our first episode of Network AF, our host Avi Freedman sits down with BGPMon Founder Andree Toonk to discuss the world of networking. Andree is the Senior Engineering Manager at Cisco and has 20 years of experience in network infrastructure. In today's episode, you'll hear insights from Andree that will help you drive your career growth by taking control of your learning experience and finding your mentors. In terms of networking, you'll hear about some of the trends in disaggregation and cloud-scale networking. Listen now for a deep dive into network engineering!