Charlie Arehart talks about “Are spiders eating your servers? The impact of their unexpected load and how to counter it” in this episode of the ColdFusion Alive Podcast, with host Michael Smith.
- Why your ColdFusion app being slow might not be your SQL, JVM, or even your CF code
- What exactly is unexpected load
- What kind of unexpected traffic have you seen
- Is it a lot of traffic?
- Is it common to find on CF servers?
- What can be the crazy problems with unexpected traffic?
- But, but, what about intranet-only sites – aren’t they safe from unexpected traffic?
- How you can mitigate these problems fast
- What about robots.txt? Doesn’t that block bots?
- Ok, but aren’t spiders and bots getting smarter, so harder to handle?
- Why are you proud to use CF?
- WWIT to make CF more alive this year?
- What are you looking forward to at Into The Box?
- The move to the countryside
For years I’ve watched people try to tame “server problems” with a focus on their code, their SQL, the JVM, and so on. Yet often it turns out that the root cause is actually unexpected load. And that load may be from things you never expected (automated), at volumes you never expected. I’ve found folks with as much as 80% of their web traffic to be such unexpected automated traffic! Worse, there are characteristics of such automated visits that may actually have MORE IMPACT than “real users”: for instance, did you know they create new session/client variables, and run session startup code, for each page they visit?!
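To make the session point concrete, here is a minimal sketch of why a spider can cost more than a person. It is a toy simulation in Python, not ColdFusion's actual session code: the `SessionStore` class and `simulate` function are hypothetical names, and it assumes the automated client never sends back the session cookie it was given, so the server sees every page request as a brand-new visitor.

```python
import uuid


class SessionStore:
    """Toy stand-in for a server's session handling (an assumption, not CF internals)."""

    def __init__(self):
        self.sessions = {}
        self.startup_runs = 0

    def handle_request(self, cookie=None):
        """Return the session ID for this request, creating a session when
        the cookie is missing or unknown."""
        if cookie in self.sessions:
            return cookie
        session_id = uuid.uuid4().hex
        self.sessions[session_id] = {}
        self.startup_runs += 1  # session startup code would run here
        return session_id


def simulate(pages, keeps_cookies):
    """Request `pages` pages and report (sessions created, startup runs)."""
    store = SessionStore()
    cookie = None
    for _ in range(pages):
        session_id = store.handle_request(cookie)
        if keeps_cookies:
            cookie = session_id  # a browser sends the cookie back next time
    return len(store.sessions), store.startup_runs


browser = simulate(50, keeps_cookies=True)
spider = simulate(50, keeps_cookies=False)
print("browser:", browser)
print("spider:", spider)
```

Under that assumption, the cookie-keeping browser ends up with one session for its 50 pages, while the cookie-less client ends up with 50 sessions and 50 runs of the startup code.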
The good news is there are solutions to better manage (or simply block) such automated requests which may already exist in your environment, and tools you may consider (some free, some commercial) which can be easily implemented. There are even SaaS solutions that could help alleviate such problems with just a single tiny change in your environment! You may also want to consider some admin configuration options related to sessions and/or client variables, as well as reconsider some coding choices in your session startup code.
In this session, veteran ColdFusion server troubleshooter Charlie Arehart will guide a more detailed review of the issues above, including how to identify such traffic, more on these specific impacts, and, most important, the solutions along with their pros and cons. He has helped shops achieve dramatic reductions in impact from such automated requests, resulting in greater server stability and performance.
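As a rough illustration of what "identifying such traffic" can look like, the sketch below estimates the share of requests whose user agent looks automated. It is written in Python and rests on several assumptions that are not from the episode: the web server writes a combined-style access log with the user agent as the last quoted field, the keyword list in `BOT_PATTERN` is only a starting point, and the sample lines are made up for the example. Bots that disguise their user agent will not be caught this way.

```python
import re

# Assumed keyword list; real spiders use many more names, and some lie.
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)

# Assumes the user agent is the last double-quoted field on each log line.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def user_agent(line):
    """Pull the user agent out of one access-log line ('' if not found)."""
    match = UA_FIELD.search(line)
    return match.group(1) if match else ""


def automated_share(lines):
    """Fraction of requests whose user agent matches BOT_PATTERN."""
    total = 0
    automated = 0
    for line in lines:
        total += 1
        if BOT_PATTERN.search(user_agent(line)):
            automated += 1
    return automated / total if total else 0.0


# Made-up sample lines, for illustration only.
SAMPLE = [
    '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /index.cfm HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '203.0.113.9 - - [10/Oct/2024:13:55:37 +0000] "GET /list.cfm HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '198.51.100.7 - - [10/Oct/2024:13:55:38 +0000] "GET /index.cfm HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0"',
    '198.51.100.8 - - [10/Oct/2024:13:55:39 +0000] "GET /cart.cfm HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"',
]

print(f"automated share: {automated_share(SAMPLE):.0%}")
```

To run it against a real log, pass the open file object to `automated_share` in place of `SAMPLE`.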
Mentioned in this episode
- Web spiders
- Yahoo slurp
- User agent
Michael Smith: I’m here with Charlie Arehart, veteran ColdFusion server troubleshooter. Hey Charlie. We’re going to be looking at some very strange stuff, which is why your ColdFusion might be slow, but it’s nothing to do with your SQL, JVM, or even your CF code. And what we’re going to be talking about is an unexpected load.
So, we’ll look at that and we’ll look at what different kinds of unexpected traffic you may have on your server and how much traffic that could be and how common it is to find. I’ll let you into a secret here, it’s pretty darn common. And what kind of crazy problems you could have on your server when you have an unexpected load.
You might be thinking, “Well, I have an intranet site, aren’t they safe from the unexpected load?” Charlie is going to let out, reveal the secret on that. No, he’s shaking his head. So, you better listen to this even if you have intranets. And we’re going to look at how you can mitigate these problems fast a lot of the time, other times it might take longer. We’ll look at the different kinds.
And you may be thinking, “But what about robots.txt, doesn’t that block all these things?” And unfortunately, I think the answer is no again. And also what about all these spiders and bots that scan your site, aren’t they getting harder to block, and aren’t they getting smarter, so how do you handle a smart spider? It sounds dangerous. So, it’s a very interesting topic that you may not have thought about before. So, welcome Charlie.
Charlie Arehart: [foreign 00:01:36]
Michael Smith: [foreign 00:01:39]
Charlie Arehart: [foreign 00:01:42]
Michael Smith: Yeah.
Charlie Arehart: It’s going to be lots of Spanish and mariachi and all kinds of fun stuff.
Michael Smith: Yeah, si.
Charlie Arehart: Let’s not confuse anybody [crosstalk 00:01:55] It’s okay if you just speak English.
Michael Smith: Okay, we’ll stay in English. Yeah, we’ll speak English. We’re just joking around. So, yeah. Usually when people think about server troubleshooting, their usual go-tos are looking at their SQL or their JVM or their ColdFusion code, but what you’re saying is that … Of all those often our problems on servers, [inaudible 00:02:19] will be something else happening.
Charlie Arehart: Yeah. And it may be that those things have an impact. It may be that those things are being stressed in ways you don’t expect. And that’s really the bottom line of this talk. I think the title I had was “Are spiders eating your servers?” and so that’s … What we’re getting at is all I do all day, every day is troubleshoot people’s servers, whether CF or Lucee or Railo or even some other things.
Most of the time when we’re doing things there’s some commonality, the configuration issues. Sometimes there’s coding issues; it’s not as often as people might think, I mean. That’s another little lesson of this, I would say, is that a lot of people when they have trouble they focus on looking at their code, they focus on looking at their SQL, or if they have gotten past that or run out of ideas then they start thinking about tuning their JVM. I’m telling you it’s really rarely those things.
About Charlie Arehart
A veteran server troubleshooter who’s worked in enterprise IT for more than three decades, Charlie Arehart (@carehart) is a longtime community contributor who as an independent consultant provides short-term, remote, on-demand troubleshooting/tuning assistance for organizations of all sizes and experience levels (carehart.org/consulting).