Charlie Arehart talks about “Are spiders eating your servers? The impact of their unexpected load and how to counter it” in this episode of the ColdFusion Alive Podcast, with host Michaela Light.
“These days a lot of people are concerned about site security and they'll … Either they or someone in the organization will run some sort of a scan against their server, but when they run that scan, again what are they doing? They're trying to call lots of requests with different forms of potentially bad break-in type code. And those tools, sometimes they're not as careful and they might make a request every second.” – Charlie Arehart
Show notes
- Why your slow ColdFusion app might not be caused by your SQL, JVM, or even your CF code
- What exactly is unexpected load
- What kind of unexpected traffic have you seen
- Is it a lot of traffic?
- Is it common to find on CF servers?
- What can be the crazy problems with unexpected traffic?
- But, but, what about intranet-only sites – aren’t they safe from unexpected traffic?
- How you can mitigate these problems fast
- What about robots.txt? Doesn’t that block bots?
- Ok, but aren’t spiders and bots getting smarter, so harder to handle?
- Why are you proud to use CF?
- WWIT to make CF more alive this year?
- What are you looking forward to at Into The Box?
- The move to the countryside
Charlie says:
For years I've watched people try to tame “server problems” with a focus on their code, their SQL, the JVM, and so on. Yet often it turns out that the root cause is actually unexpected load. And that load may be from things you never expected (automated), at volumes you never expected. I've found folks with as much as 80% of their web traffic coming from such unexpected automated requests! Worse, there are characteristics of such automated visits that may actually have MORE IMPACT than “real users”: for instance, did you know they create new session/client variables, and run session startup code, for each page they visit?!
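Why would a spider create a new session on every page? A toy model helps. ColdFusion, like most application servers, ties a session to a cookie (CFID/CFTOKEN, or JSESSIONID when J2EE sessions are enabled) that the client must send back on its next request. Many crawlers don't keep cookies, so every request they make looks like a brand-new visitor. The Python sketch below is a hypothetical, simplified session store (not ColdFusion's actual implementation; `ToySessionStore` and `browse` are illustrative names) that counts how often the "session start" step runs for a cookie-keeping browser versus a cookie-ignoring crawler.

```python
from __future__ import annotations

import uuid


class ToySessionStore:
    """Hypothetical in-memory stand-in for an app server's session scope.

    A request carrying a known session ID reuses that session; any other
    request gets a brand-new session, and the session-start step runs again.
    """

    def __init__(self) -> None:
        self.sessions: dict[str, dict] = {}
        self.session_starts = 0

    def handle(self, session_id: str | None) -> str:
        if session_id in self.sessions:
            return session_id  # returning visitor: reuse existing session
        # No (or unknown) session cookie: create a session and run startup work.
        new_id = uuid.uuid4().hex
        self.sessions[new_id] = {"started": True}
        self.session_starts += 1
        return new_id


def browse(store: ToySessionStore, pages: int, keeps_cookies: bool) -> None:
    """Simulate one client requesting `pages` pages."""
    cookie = None
    for _ in range(pages):
        returned = store.handle(cookie)
        if keeps_cookies:
            cookie = returned  # a browser sends the cookie back next time


store = ToySessionStore()
browse(store, pages=50, keeps_cookies=True)   # one human browsing 50 pages
browse(store, pages=50, keeps_cookies=False)  # one crawler fetching 50 pages
print(store.session_starts)  # 51
```

Fifty page views from the browser cost one session start; fifty from the crawler cost fifty, and each of those sessions sits in memory until it times out. That multiplier is why session settings and session startup code matter so much under automated load.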
The good news is there are solutions to better manage (or simply block) such automated requests which may already exist in your environment, and tools you may consider (some free, some commercial) which can be easily implemented. There are even SaaS solutions that could help alleviate such problems with just a single tiny change in your environment! You may also want to consider some admin configuration options related to sessions and/or client variables, as well as reconsider some coding choices in your session startup code.
In this session, veteran ColdFusion server troubleshooter Charlie Arehart will guide a more detailed review of the issues above, including how to identify such traffic, more on these specific impacts, and, most important, the solutions along with their pros and cons. He has helped shops achieve dramatic reductions in impact from such automated requests, resulting in greater server stability and performance.
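To get a first feel for how much of your own traffic is automated, one rough approach is to tally the user-agent strings in your web server logs. The Python sketch below is a hypothetical heuristic, not a tool described in the episode: it assumes you have already extracted the user-agent field from each log line, and it flags a request as automated if that field is empty or contains a common crawler token (Googlebot, bingbot, YandexBot, Baiduspider, and Yahoo's Slurp all match). The token list and sample strings are illustrative assumptions. User agents are self-reported and easily spoofed, and some bots send browser-like strings, so treat the result as an estimate.

```python
from typing import Iterable

# Hypothetical token list; extend it with names seen in your own logs.
BOT_TOKENS = ("bot", "spider", "crawl", "slurp", "yandex", "baidu")


def looks_automated(user_agent: str) -> bool:
    """Heuristic: empty UA or a known crawler token means 'automated'."""
    ua = user_agent.lower()
    if not ua or ua == "-":  # many logs record a missing UA as "-"
        return True
    return any(token in ua for token in BOT_TOKENS)


def automated_share(user_agents: Iterable[str]) -> float:
    """Fraction of requests whose user agent looks automated."""
    total = 0
    automated = 0
    for ua in user_agents:
        total += 1
        if looks_automated(ua):
            automated += 1
    return automated / total if total else 0.0


sample = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
    "Mozilla/5.0 (compatible; Baiduspider/2.0; "
    "+http://www.baidu.com/search/spider.html)",
    "-",
]
print(automated_share(sample))  # 0.8
```

A substring match keeps the sketch short; a real review would also group by IP address and request rate, since a single client hammering the site is often more telling than its user-agent label.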
Mentioned in this episode
- Web spiders
- Yandex
- Baidu
- Yahoo Slurp
- User agent
- robots.txt
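On the robots.txt question: robots.txt is a voluntary convention. The Robots Exclusion Protocol (RFC 9309) states that its rules are not a form of access authorization. A well-behaved crawler downloads the file and checks it before each fetch, while a careless or malicious one can simply skip the check, and the web server does nothing to enforce it. This sketch uses Python's standard `urllib.robotparser` to show where that check lives: on the crawler's side. The rules, the `PoliteCrawler` user-agent name, and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay out of /search.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
""".splitlines()

parser = RobotFileParser()
parser.parse(ROBOTS_TXT)

# The crawler itself asks these questions; nothing forces it to.
print(parser.can_fetch("PoliteCrawler", "https://www.example.com/search?q=cf"))  # False
print(parser.can_fetch("PoliteCrawler", "https://www.example.com/about"))  # True
```

Because the check is opt-in, robots.txt can reduce load from polite spiders, but blocking the rest takes enforcement on the server side or in front of it.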
Bio
Charlie Arehart
A veteran server troubleshooter who's worked in enterprise IT for more than three decades, Charlie Arehart (@carehart) is a longtime community contributor who as an independent consultant provides short-term, remote, on-demand troubleshooting/tuning assistance for organizations of all sizes and experience levels (carehart.org/consulting).
Links
(* WWIT = What Would It Take)
Interview transcript
Michaela Light: I'm here with Charlie Arehart, veteran ColdFusion server troubleshooter. Hey Charlie. We're going to be looking at some very strange stuff: why your ColdFusion might be slow even though it has nothing to do with your SQL, your JVM, or even your CF code. What we're going to be talking about is unexpected load.
So, we'll look at what different kinds of unexpected traffic you may have on your server, how much traffic that could be, and how common it is to find. I'll let you in on a secret here: it's pretty darn common. And we'll look at what kind of crazy problems you could have on your server when you have unexpected load.
You might be thinking, “Well, I have an intranet site, isn't that safe from unexpected load?” Charlie is going to reveal the secret on that. No, he's shaking his head. So, you'd better listen to this even if you have intranets. And we're going to look at how you can mitigate these problems, fast a lot of the time, though other times it might take longer. We'll look at the different kinds.
And you may be thinking, “But what about robots.txt, doesn't that block all these things?” Unfortunately, I think the answer is no again. And also, what about all these spiders and bots that scan your site? Aren't they getting harder to block and getting smarter, so how do you handle a smart spider? It sounds dangerous. So, it's a very interesting topic that you may not have thought about before. So, welcome Charlie.
Charlie Arehart: [foreign 00:01:36]
Michaela Light: [foreign 00:01:39]
Charlie Arehart: [foreign 00:01:42]
Michaela Light: Yeah.
Charlie Arehart: It's going to be lots of Spanish and mariachi and all kinds of fun stuff.
Michaela Light: Yeah, si.
Charlie Arehart: Let's not confuse anybody [crosstalk 00:01:55]. It's okay if you just speak English.
Michaela Light: Okay, we'll stay in English. Yeah, we'll speak English. We're just joking around. So, yeah. Usually when people think about server troubleshooting, their usual go-tos are their SQL or their JVM or their ColdFusion code, but what you're saying is that… often the problems on servers [inaudible 00:02:19] will be something else happening.
Charlie Arehart: Yeah. And it may be that those things have an impact. It may be that those things are being stressed in ways you don't expect. And that's really the bottom line of this talk. I think the title I had was “Are spiders eating your servers?” and so that's… What we're getting at is this: all I do, all day, every day, is troubleshoot people's servers, whether CF or Lucee or Railo or even some other things.
Most of the time when we're doing things there's some commonality: configuration issues. Sometimes there are coding issues, but not as often as people might think. That's another little lesson of this, I would say: a lot of people, when they have trouble, focus on looking at their code, they focus on looking at their SQL, or if they have gotten past that or run out of ideas, then they start thinking about tuning their JVM. I'm telling you it's really rarely those things.
Michaela Light is the host of the CF Alive Podcast and has interviewed more than 100 ColdFusion experts. In each interview, she asks "What Would It Take to make CF more alive this year?" The answers still inspire her to continue to write and interview new speakers.
Michaela has been programming in ColdFusion for more than 20 years. She founded TeraTech in 1989. The company specializes in ColdFusion application development, security and optimization. She also founded the CFUnited Conference and runs the annual State of the CF Union Survey.