Michael Smith: I’m here with Charlie Arehart, veteran ColdFusion server troubleshooter. Hey Charlie. We’re going to be looking at some very strange stuff, which is why your ColdFusion might be slow, but it’s nothing to do with your SQL, JVM, or even your CF code. And what we’re going to be talking about is unexpected load.
So, we’ll look at that and we’ll look at what different kinds of unexpected traffic you may have on your server and how much traffic that could be and how common it is to find. I’ll let you into a secret here, it’s pretty darn common. And what kind of crazy problems you could have on your server when you have unexpected load.
You might be thinking, “Well, I have an intranet site, aren’t they safe from unexpected load?” Charlie is going to let out, reveal the secret on that. No, he’s shaking his head. So, you better listen to this even if you have intranets. And we’re going to look at how you can mitigate these problems fast a lot of the time, other times it might take longer. We’ll look at the different kinds.
And you may be thinking, “But what about robots.txt, doesn’t that block all these things?” And unfortunately I think the answer is no again. And also what about all these spiders and bots that scan your site, aren’t they getting harder to block and getting smart, so how do you handle a smart spider? It sounds dangerous. So, it’s a very interesting topic that you may not have thought about before. So, welcome Charlie.
Charlie Arehart: [foreign 00:01:36]
Michael Smith: [foreign 00:01:39]
Charlie Arehart: [foreign 00:01:42]
Michael Smith: Yeah.
Charlie Arehart: It’s going to be lots of Spanish and mariachi and all kinds of fun stuff.
Michael Smith: Yeah, si.
Charlie Arehart: Let’s not confuse anybody [crosstalk 00:01:55] It’s okay if you just speak English.
Michael Smith: Okay, we’ll stay in English. Yeah, we’ll speak English. We’re just joking around. So, yeah. Usually when people think about server troubleshooting, their usual go-tos are looking at their SQL or their JVM or their ColdFusion code, but what you’re saying is that … Of all those often our problems on servers, [inaudible 00:02:19] will be something else happening.
Charlie Arehart: Yeah. And it may be that those things have an impact. It may be that those things are being stressed in ways you don’t expect. And that’s really the bottom line of this talk. I think the title I had was “Are spiders eating your server?” and so that’s … What we’re getting at is all I do all day, every day is troubleshoot people’s servers, whether CF or Lucee or Railo or even some other things.
Most of the time when we’re doing things there’s some commonality, the configuration issues. Sometimes there’s coding issues, it’s not as often as people might think, I mean. That’s another little lesson of this, I would say, is that a lot of people when they have trouble they focus on looking at their code, they focus on looking at their SQL, or if they have gotten past that or ran out of ideas then they start thinking about tuning their JVM. I’m telling you it’s really rarely those things.
Instead there’s usually some, potentially a configuration issue that’s the matter, there could be something else in your environment that’s the matter, but when it’s none of those things I’m saying I frequently find, frequently find that maybe either in a moment or perhaps persistently some people’s problems end up being unexpected load on their server. And when I say unexpected load I mean it could be all kinds of stuff. So, I use the term spiders, for those who maybe don’t know it, generically people think of spiders as like search engine spiders.
Now, they may be a contributor to this. I find a lot of people who are confused about that too because they think, “So what, when somebody searches Google and they get told that there’s a page on my site is that what we’re … ” No, no. What we’re talking about is that for Google to give the answer that says your site has that answer Google’s got to visit your site and Google doesn’t just visit one page of your site, they visit every page of your site. And they don’t just visit it once, they’ll typically visit it every day or every couple days. And then if you start to think about it, you might have more than one site on your server, right? You might have multiple sites.
So, they might be visiting all the pages on multiple sites every day and that’s just Google. Then there’s Bing and there’s Yahoo and then there’s Yandex from Russia and Baidu from China and there’s Ezooms and Ahrefs and Rogerbot and it just goes on and on and on. It’s crazy. If you have the right tools to look you could be stunned to find, I’m telling you, I’ve seen as much as 80% of some people’s website traffic is these spiders and bots. Now, and I’m just talking about generic search engine spiders, there’s much more that could be hitting you.
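One rough way to check a figure like that 80% for your own site is to classify logged requests by their user agent string. The following is a minimal Python sketch, not a definitive tool: the marker list and the sample user agents are illustrative assumptions, and it only catches clients that identify themselves honestly.

```python
from collections import Counter

# Hypothetical substrings to look for in the User-Agent header; extend as needed.
BOT_MARKERS = ("googlebot", "bingbot", "slurp", "yandex", "baiduspider",
               "rogerbot", "bot", "spider", "crawl")


def is_automated(user_agent: str) -> bool:
    """Return True if the user agent string contains a known bot marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def automated_share(user_agents) -> float:
    """Fraction of requests (0.0 to 1.0) whose user agent looks automated."""
    counts = Counter("automated" if is_automated(ua) else "other"
                     for ua in user_agents)
    total = sum(counts.values())
    return counts["automated"] / total if total else 0.0


# Made-up sample standing in for user agents pulled from a web server log.
sample = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
    "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15",
]
print(f"{automated_share(sample):.0%}")  # 60%
```

In practice the list of user agents would come from the web server's access log rather than a hard-coded sample.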
So, some people are probably already thinking, “Okay, well, I’ll just block Baidu and I’ll block Yandex, I don’t care about them.” But you’re probably not going to block Google or Yahoo or Bing, so you’re still going to potentially have lots of traffic. But even if you block those I’m telling you they’re not it. They are a contributor, they may be hugely hitting your site and sites, multiple sites and you may not notice it, but there’s just all kinds of other ones that you’ve never heard of that are hitting you.
And they may be legitimate, there’s RSS reading tools out there and they love to look for blogs and podcast feeds and all kinds of fun stuff and they’re checking them every day to say, “Have you changed? Have you changed?” And then sometimes it’s hackers, bad guys. They’re trying to break in and they may be trying to hit pages in a certain way, maybe trying to break in with SQL injection or various other forms of HTML and HTTP forms of break-ins. So, they may be hitting your site or sites dozens, hundreds of times a day.
And then another problem is that these … Whatever it is, and we’ll talk more in the presentation about the different kinds, but you’d be stunned to find out how often they’re coming. And we’ve talked about that they come a lot and they come every day, but sometimes you look closely and you find out, “Holy smokes, we’re getting like three or five requests per second.” Maybe from one of them or maybe over multiples of them. Go ahead, Michael.
Michael Smith: No way. Five requests a second you’ve seen?
Charlie Arehart: Yeah, yeah, yeah. And the thing is, remember, we’re talking about like say on a ColdFusion server, that might have multiple sites on it. So, it could just be, for instance, Google and Google’s not paying attention to where your domain is. All they know is one machine on Google somewhere in the world has been told, “Go visit this site.” And another machine on Google in the world has been told, “Go visit this site.” And then there’s one for Yahoo and one for Yandex and Baidu.
So, they all just randomly come in visiting and they could conceivably, in a perfect storm, have multiples of them hitting at once. So, each one might try to avoid coming too often but they may or may not be sensitive to the fact that you’ve got multiple sites on the box and they’re hitting multiples at once, or you may have two or three of them at once. So, I don’t want to overstate that because somebody might say, “No, no, no, no that’s … That doesn’t happen to us, we know that.” Okay, but there … I’ve seen … Let’s do another one.
Site security scanning tools, right? These days a lot of people are concerned about site security and they’ll … Either they or someone in the organization will run some sort of a scan against their server, but when they run that scan, again what are they doing? They’re trying to call lots of requests with different forms of potentially bad break-in type code. And those tools, sometimes they’re not as careful and they might make a request every second.
And then I go, “Come on, it’s an HTML page, what’s the big deal?” But you know that your CF code is probably doing lots of stuff, even one page might do lots of stuff. Well then, stop and think about it. If they’re trying to break in and you’ve been good about like putting cfqueryparam in to block SQL injections, it’s just going to throw an error. Well, what happens when your code throws an error? Hopefully you have some error handling. What do you do in your error handling? Well, maybe you send some e-mails to somebody or you write something to a database or you call some CFC to store it somewhere and you have some interface for looking at your errors.
Well, what I’m saying is, you might do a fair amount of stuff when there’s an error. Well, if a security scanner is creating an error every second, I’ve seen servers where we found out that error handling was clobbering them. Think about it, CF developers out there, you might have in your error handler a dump of the CGI scope and the session scope and the form scope and the cookie scope, server scope, the application scope, thinking, “Hey, I’m a developer. I need to solve this problem by looking at what’s wrong.” So, you dump all these scopes. Well, that takes time.
I’ve seen sometimes just doing all those dumps either itself takes time or when you start doing one every second the different requests are tripping over each other, one’s trying to do the dump and the other one’s trying to do the dump and they’re contending with each other for parts of memory, it’s just, it’s amazing. When you have the right tools, and I know Mike Brown was on one of your interviews recently and he’s going to talk about tools for troubleshooting, I think he’s talking about mainly built-in tools.
I know he was not getting into it, but there are certainly other tools out there, FusionReactor, SeeFusion, there’s Java APM tools, all sorts of tools. My point is when you have the right tools to watch what’s going on in your environment … We tend to assume that when there’s a problem, a, it’s ColdFusion. When are we getting off ColdFusion? When are we moving to something else? Or somebody in the organization is saying that. Or b, we think, “Oh, it’s talking to the database. I must have to tune my SQL or I got to tune my code.” And you start looking at all these examples will have … I’m telling you, when you have the right tools you might be stunned to find out that the cause of the problem, what’s exhibiting the problem, is not what you expect. And then when you dig into that you find out, “Holy smokes, this is not real traffic.”
And I’ve helped people, I think I said this already, I’ve seen it be 80% of their traffic. If this could be causing load, if this could be making requests slow, either on occasion in chunks or if it’s even just like a hum always in the background, there’s always noisy requests. You might start to think, “Hey, ColdFusion can’t handle the load, let’s get to a bigger box, let’s go to cluster instances, let’s deploy into the cloud.” Well, sometimes, if you just find out what the real problem is you might be able to fix it and obviate the need to do some of those … So again, sometimes I’m pointing out people … I bite my tongue sometimes because I know some people are like, “Well, this is our chance to get a new box or to get on a better [inaudible 00:11:14] platform.”
So, I try to read the audience to know what people want, but my point is I’ve definitely helped people avoid major infrastructure improvements because they just presumed that the problem was something that was insurmountable and it was just inherent in ColdFusion. And I was able to show with the right tools, sometimes it wasn’t at all what they expected, and it’s not at all unusual to find that it’s about spiders and bots and other kinds-
Michael Smith: And of course, if someone in your organization is trying to get you to move the site to another language, that isn’t going to solve this problem at all. They’d move to PHP or whatever, Ruby on Rails, and they’d still have the unexpected load problem. And probably worse because a lot of those languages don’t handle load as well as ColdFusion servers do.
Charlie Arehart: Well, what I will say is that I first started putting this talk together as a concept to propose to a SQL server conference, you all might know about [inaudible 00:12:10]. Because this could absolutely hit the database server. I mean if you stop and think about it, the database server would get stressed by this traffic as well. So, your app server gets stressed, your web server gets stressed, the database server gets stressed and it doesn’t really matter what the application server is. It could be ColdFusion, it could be PHP, could be Ruby, could be Python.
So, yeah, you could go on and on and … I’ll say too that this isn’t new information to a lot of people. This is stuff that’s been … People have been suffering for a long time. And there are tools to try to help it and there’s tools that have over the years been tried to be used but they don’t work well. Let me say one other thing, which is really surprising. Sometimes people will use tools like say Google Analytics or other similar things to watch what their traffic is and they’ll say, “You know, we don’t have that problem. We can see … We’ve got three, four requests per second and it’s always real people,” through whatever means they’re using to gauge that.
Well, it turns out some of the tools that are trying to show you traffic, they specifically don’t show you spiders and bots because they think you’re interested in who are your real visitors, where are they from, what are they doing, how long are they staying, what pages do they go to. It’s often these kind of tools that are out there to help with quote observing how you traffic [inaudible 00:13:34]. They’re focused on persuading you where you should be focused on you know improving your search results and improving your conversions and improving purchases or whatever.
And I’m not saying those are bad, but I’m saying some of those literally, specifically filter out spider, bot, hacker, monitor traffic. And by the way, let me throw out another couple more I just thought of. Load balancing things. People often have their load … If they’re in an environment with load balancers, the load balancer’s sending a ping every second, every five seconds, saying, “How you doing? How you doing?” That’s calling the CF page, sometimes they literally tell it to call the front page of their website that’s going to be doing queries to say, “Yes, use your answer.” And sometimes that traffic adds up.
And then monitors, it really becomes almost like a … I won’t cell phone [inaudible 00:14:30], but it eats into itself. When the performance gets to be bad people think, “Well, let me get some monitoring on it.” Uptime, WhatsUp Gold, there’s all kinds of tools out there and they’ll have [inaudible 00:14:42], either they or colleagues or clients are sending requests in saying, “We would need to get evidence of when the server is not responding.” And you look and you start to find out, “Holy smokes, we’ve got 30 or 40 of these every minute coming in.”
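To put those ping and monitor rates in daily terms, here is a small Python sketch of the arithmetic. The function names are made up for illustration; the intervals and per-minute rates are the ones mentioned above.

```python
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = MINUTES_PER_DAY * 60


def requests_per_day_from_interval(seconds_between_pings: float) -> float:
    """Daily requests from one probe that fires every N seconds."""
    return SECONDS_PER_DAY / seconds_between_pings


def requests_per_day_from_rate(per_minute: float) -> float:
    """Daily requests from a combined rate expressed per minute."""
    return per_minute * MINUTES_PER_DAY


print(requests_per_day_from_interval(1))  # 86400.0 (a ping every second)
print(requests_per_day_from_interval(5))  # 17280.0 (a ping every five seconds)
print(requests_per_day_from_rate(30))     # 43200   (30 monitor requests a minute)
print(requests_per_day_from_rate(40))     # 57600   (40 monitor requests a minute)
```

If each of those requests runs the same queries as the site's front page, that is tens of thousands of extra page executions a day before any real visitor arrives.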
So, it’s just amazing when you see how much automated traffic can come in. And it can come from seemingly legitimate, reasonable places, but it can also come from surprising places that could be …
Michael Smith: It does sound pretty frightening in some ways that all this traffic can be there and some of the tools don’t show it to you. I mean, the mental image I have is if you were like managing a fast food restaurant and you had your regular customers coming in to get their hamburgers and shakes, fries and whatever. And then suddenly a whole bunch of people who weren’t really there to buy any of your stuff started piling into the restaurant at random intervals. And some of them are American and some Russian and some are Chinese and some you actually invited in but you forgot you invited them in and suddenly your restaurant’s packed with people and your real customers can’t get to the front of the counter to get their order.
Charlie Arehart: Absolutely, you’re right. And it’s funny you say that because I had literally in my own mind, as I was creating this presentation, at the various points I started to think, “What would be an analogy?” And that’s one of the analogies I did think of and I’m not sure if I’ll still use it because I wanted to evolve the analogy through the different scenarios and it started to break down, but absolutely, you’re right.
I mean, the bottom line is you are sometimes serving traffic that you don’t even realize is hitting you and you may be going through hoops to serve it. And some of that you might go, “What? Get out. Stay out. I don’t know who you are. Stop coming in here.” And we’ll talk about mitigating later, but if it is something simple like that, sometimes it literally can be a 10 second job to say, “Look, no, you don’t come in here anymore.” And boom, they’re gone.
Michael Smith: No shirt, no shoes, you don’t come into the restaurant.
Charlie Arehart: Yeah, yeah. Good point.
Michael Smith: So, you mentioned some of the problems with the server being slow. Are there any other problems that this unexpected traffic can cause?
Charlie Arehart: Well, when you stop and think about the traffic coming in, that the reason why it becomes really troublesome is that almost always these automated requests, they come in as if they’ve never been to your server before. And what I mean by that is that if you were CFML developers, you typically use a session scope and you have somebody come in and you might track that they’ve come in and you set some session variables and you know that as they go from page to page to page to page they’re going to have that session for several pages, maybe for half an hour, an hour, whatever you set it to.
Well, that happens because those clients in the browser typically are sending cookies to ColdFusion and those cookies, CFID, CFTOKEN, JSESSIONID, depending on how things are configured, those cookies cause ColdFusion to go, “Oh, I see you’ve been here before recently. I’ll use your cookie to connect you to your session and it’ll keep the session.” Well, the problem with almost all of this automated traffic, monitors, security scanning tools, load balancing pings, spiders, bots, hackers, is that they tend to not bother sending any cookies, because again their thought is, “Look, I just want the page. Give me the page. Shut up, give me the page. Just give me the page. Give me another page.” Seriously, that’s the way they are.
Michael Smith: They’re greedy.
Charlie Arehart: What’s that?
Michael Smith: [inaudible 00:18:17] a little greedy.
Charlie Arehart: Oh, they are. It’s crazy. When you stop and start to study this, you start looking to go, “Guys, chill out, calm down.” So, yes, there may be ways we’re going to say, “Calm down, slow down.” But they don’t present the cookies, and because they don’t present the cookies guess what ColdFusion does, it says, “Oh, no cookie, no problem. Let me create a [inaudible 00:18:35] for you.” Now, [inaudible 00:18:37] you might go, “Oh, big deal. So what?” Well, here’s the big deal. Think about this. What do CFML developers do when a session starts.
You might have it in an onApplicationStart or onSessionStart method in your Application.cfc, or if you’re an old school Application.cfm developer, you’ve got some code that says, “If not isDefined session.whatever, do some stuff that you put into the session scope.” And you might want to be a good soldier for your users, you might do some queries and call some CFCs and stick them in the session scope thinking, “Hey, I’m just doing this up front. You’ll enjoy it, trust me, as you go from page to page. I’ve already done those queries for you, you can just rely on them being there.” Well, that’s all great for real people, but boy, if you stop and think about it, if you’re getting 30,000, 40,000, 50,000 requests a day that are from some automated client and it might even be a million requests a day.
You’re now doing that session startup code on every request, not just once for several pages for a given user, but if a spider or a hacker or monitor or security tool comes through 100 times a minute, you’re going to do that session startup code 100 times a minute. And again, you might go, “Come on, computers are powerful and fast, why do I need to worry about this?” But you might do some pretty heavy stuff, you might do some queries, you might call some CFCs, you might log something, who knows. But you’re now doing it possibly 100 times more per interval, per second, per minute, per hour, you might be doing 100 times more than you thought you were ever going to do.
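The difference between a browser that returns its cookie and a client that never does can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how ColdFusion is implemented: `SessionStore` and its fields are hypothetical names, and the "session start work" is just a placeholder dictionary.

```python
import uuid


class SessionStore:
    """Toy stand-in for an app server's in-memory session table."""

    def __init__(self):
        self.sessions = {}
        self.startup_runs = 0  # how many times session-start code ran

    def handle(self, cookie=None):
        """Return the session id for a request, creating a session if needed."""
        if cookie in self.sessions:
            return cookie  # known cookie: reuse the existing session
        session_id = uuid.uuid4().hex
        self.sessions[session_id] = {"prefs": "..."}  # pretend startup work
        self.startup_runs += 1
        return session_id


store = SessionStore()

cookie = None
for _ in range(100):  # a browser: sends back the cookie it was given
    cookie = store.handle(cookie)

for _ in range(100):  # a bot: never sends a cookie
    store.handle(None)

print(store.startup_runs, len(store.sessions))  # 101 101
```

One hundred page views from the browser cost one session; one hundred from the cookieless client cost one hundred, each running the startup work again.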
So, that’s impactful, right? That stresses the database, that stresses your file system maybe, whatever. But then it gets worse, because where does this session get stored? In memory by default.
Michael Smith: In memory.
Charlie Arehart: In ColdFusion, in the heap. You run out of heap, you go digging into why you’re running out of heap. I’ve helped a lot of people with the right tools go, “Hey, let’s see how many sessions you have right now.” And with the right tools it’s pretty easy to see and I say, “How many sessions you think you should have? Let’s just guess.” If we see there’s a memory problem and it’s in the heap and we haven’t found anything obvious I say, “Let’s just think, it could be sessions. How many sessions you think you have?” And somebody might say, “Well, okay, what … A session lasts for about half an hour, assuming that the people stop working with their site, maybe we expect 100 people to be logged in.” “Okay, so couple hundred sessions right now, maybe, typically.”
And we go look at a tool that tracks how many sessions there are. I’ve seen 200, I’ve seen 2000, I’ve seen 20000, I’ve seen 200000, I’ve seen 400000 sessions. Well, then the question is what do you put in those sessions. The more stuff you put in the session, the more weight it has, the more memory it’s using. The more sessions you have, the more memory you’re using. And then your session time out. And I’ve helped people dig into it and we might find out, “Hey, this session timeout isn’t 20 minutes, it isn’t an hour, it isn’t two hours. It’s eight hours or it’s a day.”
Because some users were complaining they didn’t like being logged out during the day so they said, “Let’s make the session timeout longer.” Well, that was fine for them but what about for real people? All of a sudden … For spiders and bots. So, all of a sudden you find out, not at all unusually, that you’ve got tens of thousands, hundreds of thousands of sessions. And that on the surface is not a problem, but again, what do you put into the session. If you put a lot of stuff in the session it starts to get very heavy use of heap.
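A back-of-the-envelope estimate shows why the timeout matters so much. The Python sketch below assumes every cookieless request creates a session that lives for the full timeout, so the steady-state count is simply arrival rate times timeout. The 100 requests a minute comes from the example above; the 10 KB per session is an illustrative guess, not a measured ColdFusion figure.

```python
def steady_state_sessions(new_sessions_per_minute: float,
                          timeout_minutes: float) -> float:
    """Sessions alive at once if each one lives for the full timeout."""
    return new_sessions_per_minute * timeout_minutes


def heap_mb(sessions: float, kb_per_session: float) -> float:
    """Approximate heap used by session data, in megabytes."""
    return sessions * kb_per_session / 1024


KB_PER_SESSION = 10  # assumption for illustration only

for label, timeout in (("30 minutes", 30), ("8 hours", 8 * 60)):
    count = steady_state_sessions(100, timeout)
    print(f"{label}: {count:.0f} sessions, "
          f"{heap_mb(count, KB_PER_SESSION):.1f} MB")
# 30 minutes: 3000 sessions, 29.3 MB
# 8 hours: 48000 sessions, 468.8 MB
```

The same bot traffic that is harmless with a short timeout can hold many times more heap once the timeout is stretched to cover a working day.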
And there’s ways to deal with that, so I’m not saying that that’s necessarily wrong and it’s not necessary going to be a problem for everybody, but I’m saying for some people, as we dig into these things one thing leads to another and boom, there’s your problem. And what’s the real root cause and what might you do to mitigate it.
So, there’s a lot to it and I just want to get people thinking about this problem. Some are well aware of it, others were like, “Never heard of it, never thought of it. Didn’t realize it was that significant a potential problem.” Or they’re thinking, “It’s not for us.” And I’ll give you the tools to be able to find out for yourself whether it is.
Michael Smith: That sounds amazing Charlie. And I’m just thinking of that little metaphor we had earlier of the fast food restaurant. It would be like the manager of the restaurant wants to be extra helpful, so every time someone comes in the door, they like throw another burger on the grill, they put some more fries in the thing. And if enough people come in they start hiring extra staff to come in the restaurant, but they’re not real customers coming in-
Charlie Arehart: They’re just coming to have a glass of wine. Or they’re just coming to the door to use the wifi.
Michael Smith: Yeah. And suddenly your whole restaurant is piled to the ceiling with frying burgers and the whole thing’s not running good.
Charlie Arehart: Or lots of tables for them to sit at and all they’re doing is using your wifi and they’re not buying a thing. There’s definitely those kinds of analogies that eventually could help some people connect the dots. So, yeah, good for you.
Michael Smith: So, wait a minute. Some people here have intranet ColdFusion sites, so aren’t they safe from all this stuff?
Charlie Arehart: Well, I’ll say this, is that I would always … If I’m troubleshooting with somebody we’re going to follow the diagnostics and let the evidence lead us. So, if they’re having a memory problem we’ll look at sessions. If there’s a lot of sessions we might say, “What might those be from?” And they might be an intranet site and they’ll say, “So, you know, we shouldn’t have more than a couple hundred sessions at any one time.” And yet I’ve found sometimes, no, there were. And then we dig in and find out, well, why was that so.
For instance I’ve seen security scanners. You may not even know this but somebody else in your organization whose responsibility is security, may be scanning your servers to see if they can break in. So, you might find a cluster of requests, perhaps briefly, maybe for half hour, an hour, but you might think that, “Well, they’ll do it at night or they’ll do it on the weekend or they’ll tell me when they’re going to do it.” No, they don’t want to tell you, that’s the whole point. They don’t tell you when they’re going to do it because they’re trying to say, “Can we break in without anybody doing anything fancy to protect it?” And hopefully they can’t break in, but the point is you never know when they’ll do it.
Then I’ve also seen people with internal, say, Google Search Appliances, they used to be pretty popular years ago, they still are for some people. But some people need to have their intranet website text searchable and so they’ll go buy the Google Search Appliance or maybe some other implementation, but the point is that’s going to act just like Google. It’s going to be hitting their pages and it might hit them every day. And it just depends on the nature of the site.
I remember helping somebody once, a few years ago. They had a site where they sold … Well, I’ll just say widgets, I don’t want to give it away because if I said it literally would … Someone would give it away. Let’s say they sold widgets and their website had the categories of widgets and the manufacturers of the widgets, the styles of widgets, the colors of widgets and for each of those you could see a list of the widgets and let’s say for the color red it would have a paged interface through all the widgets that they had, maybe 100 or 1000 widgets that were red and it had interface of 10 on the screen and one, two, three, four, five, six, seven, eight, nine, ten and then dot, dot at the end.
So, users could just go hopping along looking at the 12th group of widgets if they wanted to. And that’s great for people but guess what, that’s like honey to a bear for spiders and bots. They’re following every link and they do the first time they come in, they literally follow every link and what they do is they shunt off to a server somewhere, here’s the URL, here’s the URL, here’s the URL. So, for your site they end up with sometimes, not unusually, 1000 URLs and so now their server says, “Okay, every day we’re going to come visit these 1000 URLs.”
And maybe then once in a while they also are looking to see do you have new URLs, but they’re visiting these pages every day, just a list. I’ve got a list, I’m coming to get it, give me the page, give me the page, give me the page. And that sort of thing could happen in an intranet site with some sort of an internal search engine. I’m just saying your application may be more prone to having something trawl through it like that, and some of these internal things, they may do it every day and they may do it 1000 times a day. And even if they spread it out to be every five seconds, well some of your pages might be heavier than others, and so while your real people are trying to get stuff they’re contending with these every five second automated requests coming in.
And then, as I said, it could be a load balancer ping, it could be an internal monitor. So, yeah, there’s a lot of things where … Don’t think spiders only means Google and such. There’s all kinds of automated requests that could be made even in an intranet site and it can be devastating.
Michael Smith: So, let’s talk about mitigating and how you can solve these problems, but I think first the thing that a lot of people are going to think is, “Well, don’t I just change my robots.txt file, doesn’t that deal with this?”
Charlie Arehart: Right. And that was a standard that came out several years ago, to tell spiders and bots, legitimate ones, “Hey, we don’t want you coming into this section of our site.” Or, “We don’t want you coming to the site at all.” So, for those who don’t know it, that was a very simple, standard … You can just Google robots.txt and you’ll find out about it. And it’s just the file where you say, in this root directory I put this file and for this website and who’s this … Whose [inaudible 00:27:52] directories placed, a good spider should look at the file and say, “Is there any place I’m not supposed to go? Otherwise I’m going to go there.”
Well, I said very carefully there, a good spider is going to [inaudible 00:28:04] file and say, “Should I go there or not?” A bad spider’s going to go, “Robots.txt, you’re cute. Give me the page, give me the page, give me the page.” So, that’s what happens literally and so, no, the robots.txt … It might help. So, that goes back to what I said earlier. Let’s say it was some specific … Like Yandex from Russia, which is surprising how much you see. Baidu from China, Yandex from Russia and some other ones. They’re good guys. If you tell them in the robots.txt, “Don’t come here.” They won’t come there.
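What a well-behaved spider does with that file can be sketched with Python's standard `urllib.robotparser` module. The robots.txt content and the `example.com` URLs below are made-up examples; a spider that ignores the file is simply never running a check like this.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: keep Yandex out entirely, keep everyone out of /admin/.
ROBOTS_TXT = """\
User-agent: Yandex
Disallow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler asks before each fetch whether it is allowed.
print(parser.can_fetch("Yandex", "https://example.com/products"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/products"))  # True
print(parser.can_fetch("Googlebot", "https://example.com/admin/x"))   # False
```

The file is only advice: the check happens on the crawler's side, which is why robots.txt helps with the polite spiders and does nothing about the rest.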
Very quickly you put in there and say, “Stop coming.” They stop coming. I’ll tell you also that … And I’ll talk about this in the presentation. Most web servers have some provision for a feature to also block them. Now, some people think of blocking things by IP address, but the IP addresses for these things are changing all the time, it’s like playing whack a mole. Don’t get into the game of trying to block by IP address, but you can usually block by user agent.
And the user agent is a self identifying string that a browser or one of these automated requesting tools typically will say, “Here’s who I am.” And they might say, “I’m Baidu, I’m Yandex, I’m Google, Googlebot, Yahoo Slurp.” So, you could also with some web server tools tell them, the web server, “Hey, if you see something that looks like this, keep them out.” So, that was what I was getting at earlier when I said you can very quickly block something even if they don’t honor the robots.txt. If they present a user agent you could tell it, and I’ll show you that in the class how to do it, point you to resources for more, but you could block a given user agent.
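The idea of refusing requests by user agent can be illustrated in Python as a small WSGI wrapper. This is only a sketch of the logic, assuming a hypothetical block list; on a real site the rule would normally live in the web server's own configuration rather than in application code.

```python
BLOCKED_AGENT_MARKERS = ("yandex", "baiduspider")  # hypothetical block list


def block_user_agents(app, markers=BLOCKED_AGENT_MARKERS):
    """Wrap a WSGI app so requests from listed user agents get a 403."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(marker in user_agent for marker in markers):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return app(environ, start_response)
    return middleware


def site(environ, start_response):
    """Stand-in for the real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]


def status_for(user_agent):
    """Send a fake request through the wrapped app and return its status."""
    seen = []
    app = block_user_agents(site)
    app({"HTTP_USER_AGENT": user_agent},
        lambda status, headers: seen.append(status))
    return seen[0]


print(status_for("Mozilla/5.0 (compatible; YandexBot/3.0)"))  # 403 Forbidden
print(status_for("Mozilla/5.0 (Windows NT 10.0)"))            # 200 OK
```

The blocked request is turned away before the application runs, so no session is created and no queries are executed for it.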
Now, unfortunately some of them don’t identify themselves. I mean, this just keeps going. So, some of them want to seem to you to be legitimate. So, they’ll present what looks to be a legitimate string or even worse, you’ll have bad guys … Seriously, listen to this. You’ll have bad guys who will say they’re Google. They’re bad guys trying to do naughty things but their user agent string will say they’re Google because they can say whatever they want, there’s no way to authenticate this stuff. And you will say, “Oh Google, you’re welcome, come on in. Please have another beer on me.” But they’re coming in and doing naughty stuff.
So, I don’t want anybody to think that these things are all foolproof, but you also need to think carefully, be wise, because you won’t always succeed in blocking them through some of the means that you’ll think of, but there’s some other ones.
So, something else I’ll talk about in the presentation is that there are services now where this is what they do for a living. Their living, their money is made by providing to you a service where they’re going to watch your traffic coming into your server and they’re going to look for these characteristics. And they know about the ones that are legit and how to block them and they know the ones that are illegitimate and how to find them. And they use [inaudible 00:30:58] and they use learning and some human assisted discovery of study of things.
The point is you’re paying them to say, “Hey, please just take care of this problem for me. Just wrap my site in a condom and keep that crap out.” Seriously, that’s the analogy to think of. And I’ll mention some in the presentation, but there’s a few of them and some of them have free levels of service, some of them have paid levels of service. Some have crazy expensive paid levels of service, but the point about them is that you do some configuration of just a few minutes, literally it could be just a few minutes, you make some configuration changes and now when somebody comes to your site, before they go to your site they go through one of these services and it basically washes the traffic.
And if it doesn’t like what it sees you can configure to just track that it’s happening, in case you’re not trusting the tool. You can have the traffic … All the bad traffic just monitored and you can look at it and go, “Wow, yeah, holy smokes, please turn it on. Block all this traffic.” Or you might look at it and go, “Whoa, whoa, whoa, this is legitimate traffic. These are real requests. So, we need to tweak the tool to not block them.” But the good news is that many of these tools, the lessons have been learned over recent years and some of them are literally staying up on this stuff every day. So, they may have a new version of their file and their service may look at a new version of their rules to decide how to handle things every day.
I know some people are thinking, “I’ve never heard of this before.” Others, I said at the beginning, are well aware of it. And there’s others for whom this is their livelihood, bad guys trying to get in and good guys trying to handle it. So, it’s like a [inaudible 00:32:42] of security problems and there’s black hat guys and white hat guys and just like we tend to ignore security until it bites us, I understand most people tend to ignore this stuff until it bites them. So, I’m trying to bring it to your attention and wherever you are on the spectrum of understanding of the problem I want to give you resources to be able to go dig into it more on your own.
Michael Smith: Yeah, I was talking on the podcast with someone from NGINX … I’m spacing out on their name, but it will come to me in a moment.
Charlie Arehart: Speaker on them, from them.
Michael Smith: Yeah. And that-
Charlie Arehart: They have some [inaudible 00:33:20] abilities to help with that.
Michael Smith: They’re basically a virtual proxy firewall security thing. I mean, you can code it, so it’s not hardware. So, I can see that there’s a lot of things you can do to stop people at the front door before they come in and make your server or your restaurant crazy.
Charlie Arehart: You can use the analogy … And it’s the same with security. There’s a fence around your property, there’s walls around your house, there’s windows that may or may not have good locks, there’s doors that may have a double-keyed dead bolt, there’s … Somebody might say, “I want to have cameras.” And somebody might say, “I want detection of movement out there.” Somebody might say, “I want a guard posted at the door.” Somebody might say, “I want to have a guard inside the house.” And somebody might say, “I want a guard who sits in the room while I’m sleeping.”
I mean, there’s a range of security needs and expenses and services out there. And I’m saying this is like a subset of that, in that … Don’t let that discourage you, for some people … If you would just do anything. It’s kind of like if you’re getting annoyed by gnats, just having a screen on your window will keep the gnats out. And that’s going to be, for some people, all you need, because you’re just being annoyed by gnats. But obviously, if all you have is a screen and a bad guy wants to get in he’s just going to cut the screen and step right in.
So, you’ll need increasing levels of protection for whether it’s security things or these kinds of automated requests. We call them attacks, I don’t know if they’re attacks, it’s just annoyance, automated request annoyance and how annoying it is to you. I’m saying there’s varying levels of controls to help with it and there’s tools to help identify it. So, it’s an exciting and interesting subset of stuff that, for some people, is just a big surprise for them.
Michael Smith: Some of these could actually be attacks. They could be a denial of service attack or a security attack.
Charlie Arehart: Absolutely [crosstalk 00:35:25]. A bad guy could be wanting to just make your site suffer because they’re a former clerk who you ticked off or somebody might be trying to do it to United, with all the mess that’s going on with United these days. I mean, all it takes is something like that and you get … People could launch … Because there’s bad guys who have broken into people’s computers all over the world and they might have control of thousands of them and they can just flip a switch and those thousands of computers will send requests to your server trying to get stuff. Then you just think, “Oh my gosh, we’re so popular. We’ve got real visitors coming in.” No, it’s all automated crap coming in and it’s bringing your server to its knees.
And somebody could be smart about it and just send a low level of several requests per second and you might generally be able to handle that, but if that noise of several requests per second is always running and along comes your real people and then suddenly something’s a little bit slow, well they are suffering now because at the same time they’re just trying to do their random few requests per minute. You’ve got several requests per second that are now starting to slow down, and if it was noise that’s going to hurt your server.
So, it’s just amazing. The bottom line is again, if you have the right tools and Mike Brown’s talk will get into it, you might notice I’m not really talking about this, I talk about troubleshooting tools and tuning and stuff like that. It’s kind of a what’s the root cause of some of the problems, so I’m saying, “Then might you want to dig in with tools to really understand the problem and see what the impact is.” But I’ll give you some insights to think about.
Michael Smith: Cool. So, are these spiders and bots getting smarter every year and so harder to handle?
Charlie Arehart: Yeah. Well, that’s what I was getting at there about the tools there are evolving over time. So, yeah, you could find that you may find some of your mitigation efforts start to not be as successful, because they’re getting smarter. The black hats are always trying to stay a step ahead of the white hats, but that’s where those … For some people you’ll want to then just bite the bullet and pay somebody who, that’s their living, they make tools that watch out for that kind of traffic and they’re staying on top of that.
But I would also say don’t somebody go … This just sounds like one of those things where, like with security, I’m just going to bury my head in the sand and wait till I’m hit by the problem, because it’s just too complicated. No, the good news is that with the right tools it’s really pretty easy to tell when this is happening, it’s really pretty easy to take care of potentially significant percentages of this.
It might literally be that some knucklehead user agent that you never heard of is sending 1,000 requests in an hour and you just block them. And if they don’t honor the robots.txt you block them with some web server blocking on the user agent. And if they’re going to identify the user agent you are free to block based on it. I’ve seen that be what took care of some little weird spike that was killing somebody. So, don’t let this stuff sound too overwhelming and complicated, it’s really not. It’s just a matter of being aware of it and thinking about it and then using the right tools to find it and solve it.
Michael Smith: Well, that sounds like a really interesting talk. So, I’m looking forward to seeing you at the event and hopefully people listening will have learned a lot from this. So, let’s just switch gears a moment here. This is a question I’m asking all the speakers at Into The Box. Why are you proud to use ColdFusion?
Charlie Arehart: Well, I will say that … Some of the people here will know that I’ve been in ColdFusion for 20 years. I started in ’97 and I’ve stuck with it throughout. I had a brief period for a while where I worked with an alternative CFML server engine. And after about three years of that I came back. So, I’ve been fully invested in ColdFusion since about 2006, so that’s 11 years. And started in ’97.
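To make the "1,000 requests in an hour" case concrete, here is a minimal sketch of how such a user agent could be spotted in a web server access log. It assumes the common "combined" log layout, with the timestamp in square brackets and the user agent as the last double-quoted field; the function names and the threshold are illustrative and do not come from any tool mentioned in the interview.

```python
import re
from collections import Counter

# Assumption: "combined"-style log lines, e.g.
# 1.2.3.4 - - [10/Oct/2017:13:55:36 -0700] "GET / HTTP/1.1" 200 123 "-" "SomeBot/1.0"
# Captures the date plus hour from the timestamp and the last quoted field.
LINE_RE = re.compile(r'\[(?P<hour>[^:\]]+:\d{2})[^\]]*\].*"(?P<agent>[^"]*)"\s*$')


def requests_per_agent_hour(lines):
    """Count requests for each (user agent, hour) pair; unparsable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match.group("agent"), match.group("hour"))] += 1
    return counts


def noisy_agents(lines, threshold=1000):
    """Return the user agents that reached the threshold within any single hour."""
    counts = requests_per_agent_hour(lines)
    return sorted({agent for (agent, _hour), n in counts.items() if n >= threshold})
```

Something like `noisy_agents(open("access.log"))` (the file name is a placeholder) would then list candidates worth a closer look before deciding whether to block them.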
So, I help people use it every day. I help people solve problems with it every day. I help problems go away. So, as people benefit from using it and we fix the problems they’re able to stay with it. I’ve helped a lot of companies who were on the verge of saying, “This is the last straw. Maybe it’s issues like we’re talking about or maybe it’s just some performance troubleshooting problem.” And they have spent weeks or months potentially, or lots of hours trying to solve it and they can’t and they think, “That’s it. This is the last straw. Let’s move to X.”
Sometimes I’m able with just a half hour, an hour of looking at things, connecting the dots, solving the problem, to keep them in it. So, that’s where my pride about it comes in, is that I know that it can be useful for many people and that … And don’t believe the people who say, “Nobody uses it.” Because I get new customers every week. I’ve had over 1,000 customers in the last 10 years. And they’ve run the gamut from small to huge to agencies to government to universities and all kinds of segments in between.
So, it’s used very widely and I know you know that because you’re very active in connecting people in the community. And through my other activities like the ColdFusion meetup. There’s over what, 70 … 2700, somewhere, I can’t remember, but it’s a big number. Thousands of people belong to the online ColdFusion meetup. I’ve had a harder time recently getting speakers to come on. People seem to just not be into doing online presentations very much these days for some reason, just like they don’t do live ones anymore. So, they do conferences and that’s it. But I’d love to get the speakers. Hey you speakers who are watching this, come give your presentation on the online ColdFusion meetup.
But anyway, my point is that there’s a huge amount of people that are still using ColdFusion very happily. Sure there are some that have left, some that leave very noisily, some that leave grumpy and never want to look back, others who leave and say they’re not coming back and then they end up coming back. Again, my pride would be in saying that I know it does great things for people. Sure, there are some people who stumble with it or use it for things that they ought not or they have old code that isn’t performing well and they may want to rewrite it, but I know that there’s a lot of people that use and are very happy with it and who continue to develop with it.
Adobe has even said, I don’t know if anybody else has brought this up, but they said in a blog post last year, in a comment in one of their blog posts about CF 2016, about three months after it came out, they said in a comment that they were selling 2000 new licenses of ColdFusion. I think it was every year or maybe it was every quarter. And people said, “Oh you mean updates or you mean upgrades or you mean maintenance subscription renewals, right?” And he said, “No, no, no, no, these are new people who’ve never had ColdFusion before buying it.” That just flies in the face of the conventional wisdom that nobody’s using ColdFusion anymore, it’s a dead language, only government uses it. And it’s just not true, just not true.
So, my pride is in that it’s a lot more capable and widely used than most people realize, that’s the flag I like to carry. But I understand choosing to use other stuff and I’m not knocking anybody who uses something else. I’m just saying everybody don’t jump on the CF people as if they have no knowledge of what makes sense. They’re doing what makes sense for them and for many of them, they’re very happy with it.
Michael Smith: I appreciate you saying that Charlie. And you do a lot for the ColdFusion community, with your CFO [inaudible 00:42:57] site and other resources you put out there, as well as all the talks you give. And you mentioned that there is that … [inaudible 00:43:06] that’s been going around, that ColdFusion is dead or dying, which has been going around for 10 years and ColdFusion is still thriving and doing well. So, my question here is what would it take to make ColdFusion more alive this year.
Charlie Arehart: Well, I will say that the spate of conferences this year is a sign that it is more alive than most people realize. I mean, we’ve got Into The Box, we’ve got the Adobe Government Summit next week, Into The Box the week after that. We’ve got cf.Objective() in July. We’ve got NCDevCon in October. There was The Mirror conference in February. It’s much more alive than people realize, I think that’s what I would say. As far as saying what would make it more alive …
Michael Smith: Well, and don’t forget the Adobe ColdFusion Summit, because that’s [crosstalk 00:44:00]
Charlie Arehart: Of course. And I knew there was one that I wouldn’t think of. Yes, so the Adobe ColdFusion Summit. I had it in my brain. And that’s a great one. A lot of people don’t know about that one. Adobe puts it on and they substantially subsidize it. So, it’s really surprisingly inexpensive and it’s a couple days every year in Vegas. And I’ve been very happy to be able to speak at it every year and most of these conferences I’ve had the chance to speak, but I was going to say if anything would help make it more alive I would say it’s certain levels of greater involvement by Adobe, whether it’s in marketing, whether it’s in support, whether it’s in evangelism.
I know that’s a sore subject for some people. And I know Adobe gets beat up a lot on it. But I would just say that if anything was going to make it more alive I would think that … We could hope that the community would come out, but I think the community’s getting really dispersed and there isn’t a lot of folks like you and [inaudible 00:45:02] with their effort. There are some devoted and specific areas where there’s effort and energy, but to really … I think to really make [inaudible 00:45:12] more alive I think it would really take some new efforts by Adobe.
I think their hands are tied being a big corporate entity and ColdFusion is just one in a stable of dozens of products and it’s a different one than any other. I think we may be hard pressed to get much in the way of improvement there, but I would argue that if they did, that would help make it seem more alive.
Michael Smith: Yeah. Well, those are great suggestions. I mean, given that I would just suggest everyone listening ask yourself this question, what would it take to make ColdFusion more alive this year as regards my own action. Is there something I can do … Can I find a user group to go to? Could I start a user group? Could I blog about the great stuff I’m doing with ColdFusion?
Charlie Arehart: Put together a talk and offer it on the ColdFusion meetup.
Michael Smith: Yeah. Put a talk together and do the meetup or go hang out at the-
Charlie Arehart: [crosstalk 00:46:03] podcast you’re doing more than just these, right? You’re going to start to do other interviews.
Michael Smith: Yes. I mean, I was inspired … I asked myself this question because I was tired of hearing the ColdFusion is dying thing, so I created this ColdFusion Alive podcast. Just to highlight how alive it is and how modern a language it can be.
Charlie Arehart: Or more than people realize.
Michael Smith: And [inaudible 00:46:27] is doing a lot of work on all the Box products, making what they call the modern face of ColdFusion. And a lot of other people out there are doing great things and I’d like to … If you’re doing great things let me know and I’d love to interview you.
Charlie Arehart: Sure. And then there’s something else you’re doing, but before I ask you about that I just remembered another one, CF Camp. So, CF Camp takes place in Munich and generally about October, November timeframe, and that’s another wonderful conference. We have a lot of wonderful conferences literally all over the world, but aren’t you also doing some online sessions, did I see something where you’re doing some variant of online presentations, you’re letting people give online talks? Or maybe I’m confusing it with the podcast.
Michael Smith: Yeah. Right now I have done webinars in the past, and I ran into the same issue you had, that for whatever reason people didn’t want to [inaudible 00:47:25], but people seem very happy to do podcast interviews. So, I’m doing that and I’m currently doing it for Into The Box conference, I’m happy to do it for any other ColdFusion conference out there.
Charlie Arehart: You can be busy all year with as many as there are.
Michael Smith: Well, indeed. Unfortunately there wasn’t … Until I started this there wasn’t an active ColdFusion podcast. CFI had an episode a year ago but that’s kind of wound down now. And then I’ve decided, running a podcast takes a lot of effort and energy and money to do. You know, if you do show notes and get a transcript and promote it out as much as you can, it’s not insignificant in the effort it takes.
Charlie Arehart: Yeah. So, good for you for putting this together, hope it goes well.
Michael Smith: Yeah, it’s going great. My goal is to get to 1000 downloads a month by the end of the quarter. So, we’re well on our way there. If you can share this episode out and share the podcast, subscribe on iTunes, leave reviews, those are all great things to do, to help get the word out.
Charlie Arehart: Spread the word.
Michael Smith: Yeah. So, anyway, mentioning Into The Box conference, what are you looking forward to at it, Charlie?
Charlie Arehart: Well, it’s a great community of people and they’re very friendly. It’s a small group, so it’s very intimate and you’ll be able to really make great connections, so there’s that. There’s some elements of fun about it that are unique to Into The Box and I’ll just say if you’re not coming then you’re not going to know what you’re missing, but if you come you’re going to find out that there’s a lot of fun. There’s just a great spirit behind it, that is absolutely true, it’s just a great spirit behind the conference.
Michael Smith: The guys running it, Lewis and Gavin and Brad and all the people there, they really work on making it fun. They’ve got their heart in ColdFusion, so that’s one of the reasons I stepped up to help them get the word out, because they were too busy organizing the conference to get the word out enough.
Charlie Arehart: You know what that’s like, right? So, for those who don’t know, Michael used to run CF United conference, which ran from about 2003 to 2010 and then there were all-
Michael Smith: There was even one before … We started in ’99 with a-
Charlie Arehart: With the small ones.
Michael Smith: What did we call it, CFUN, it was called CFUN back then.
Charlie Arehart: I was there and I did an article in the ColdFusion Developers Journal about it. I still have that on my website, that was at the NIH conference.
Michael Smith: That’s right. Good memory Charlie. I think you may be the only speaker who spoke at every single one of those …
Charlie Arehart: We started going around to the different hotels in Bethesda and Rockville and Gaithersburg. And then you started with the Maryland CFUG, which I remember the earliest ones-
Michael Smith: Yeah, that’s right. That’s when you lived in Maryland.
Charlie Arehart: And then you started doing that, you [inaudible 00:50:32] offices there behind Parkland building.
Michael Smith: Yeah, we ran that for 11 years too. So, I kind of picked it up a bit with the webinar thing, but …
Charlie Arehart: But I just wanted to shout out for those who maybe don’t know because they’re just watching Michael give these podcasts. Michael was the guy behind … I mean, of course had a team of people and I know he’ll want to give credit to the team of people, but he was the driving force behind the CF United conference, which in its time, in the 2000’s, was the biggest conference in CF and it was very well attended, very well appreciated.
But it’s a lot of work and that’s why I brought it up, is that he was running it and his team of people and then others that he brought in, they were all busy doing that and he can attest to the effort it takes to run even such a big one and to run a small one, you can just get caught up in the details and miss out on opportunities for things like Evangelism. So, good on you for doing them, thank you for that. And thank you for your-
Michael Smith: You’re welcome. I had a great time doing it and I’m glad I did it. And I’m glad other people are doing it now. I was trying to search for the tweet that Gavin made, I’m pretty sure they’ve got record numbers this year Into The Box, but I can’t remember the exact number. So, great. So, anyway, let’s wrap up this episode. If people wanted to find out more about you Charlie, how could they find you?
Charlie Arehart: I’m really easy to find. First, my website is carehart.org, so that’s my first initial C, and my last name Arehart, A-R-E-H-A-R-T, carehart.org. So, that’s easy to find, if you just Google me you’ll find me, if you Google Charlie ColdFusion you’ll find me. And then all on Twitter, LinkedIn, Facebook, it’s all just CareHart, one of the benefits of having a relatively unique last name. So, it’s … My Twitter handle’s Carehart, LinkedIn is Carehart, Facebook is Carehart, [inaudible 00:52:32] is CareHart. So, I’m easy to find if you can remember how to spell my name, that’s half the battle.
Michael Smith: There you go. And I’ve just realized reading your name, you can actually read it as Carehart, without the e in the heart.
Charlie Arehart: That’s why I did it.
Michael Smith: [crosstalk 00:52:46] Caring nature.
Charlie Arehart: I did it partly … Two reasons. One was, people did often not know how to pronounce it, because we just see things in writing and we don’t talk to people. So, we don’t know how to pronounce somebody’s name when it’s potentially variable like that. So I heard a lot of Arhart and I just thought … Care, that would kind of force you to say Air, Carehart.
And then the other motivation was many of you will remember that Sean Corfield, his site was called corfield.org, and it was around the same time that I was creating my domain, in about 2006, that he was making great strides in the community and everybody knew about corfield.org and I just thought, hey, carehart.org kind of has the similar ring. So, that’s why I created it and have stuck with it.
And still even in my business, the consulting that I do, I just do troubleshooting all day, every day I go from one client to another doing remote CF server troubleshooting, Lucee and other things. I still just go by carehart.org. If you write the check it’s carehart.org. I probably ought to be like Carehart consulting or something like that, but I just call it carehart.org.
Michael Smith: Great. Well, thanks for being with us on this podcast, Charlie.
Charlie Arehart: My pleasure, good to see you.
Michael Smith: Yeah, you know it.