I used to tell my techs not to go immediately for the most exotic underlying cause when someone reported a problem. Just as a doctor should not immediately diagnose a brain tumor from a headache, a tech should not diagnose an email profile corruption issue when the error says “incorrect username or password”.
We have dedicated server clients who sometimes cannot understand that taking the time to eliminate the easy items first is not only highly efficient (I am a huge fan of efficiency, because it means more time slacking off at work for me) but also resolves the problem much faster if it does wind up being one of the easy things. As an added bonus, it gets them out of my helpdesk, too. Everyone wins, but I win more, and who doesn’t want that?
Today, from our desk…
Client: I keep seeing ratelimit entries in the logs. Why would that be?
Us: Well, the most common reason would be that you actually have ratelimit logging enabled and also have the ratelimit hosts setting enabled.
Client: Exim also keeps failing over and over, even while we’re sending out mail from our scripts. Why would that be?
Us: Any number of reasons, but the most common would be…that exim is failing over and over because the system detects no available connections (because all available ones are in use) and restarts the process for that reason.
Client: But our script only uses one connection at a time.
What we wanted to say: Yes. And do you suppose the many thousands of items that you’re sending out just wait patiently, one by one, or do you suppose that exim is capable of processing multiple items simultaneously, pushing out mail much more efficiently (there’s that word again) by using multiple sockets?
Us: It isn’t just a matter of outbound mail. Inbound mail also uses those very same sockets. Every time you send something from your lists, or unfreeze the queue, and as long as you’re accepting inbound mail, you’re using more than one socket at a time.
Client, some time later: We found that the process monitor we put in was killing off the exim process, and that coincided directly with the restart notes in the logs. So it wasn’t ratelimiting or the system check after all.
OK. Was that particular bit on a need-to-know basis or something? Do you think it might have been somewhat helpful to mention that a third, unrelated application capable of directly affecting other applications was running on the server? Because that, dear client, would have been one of those “easy things” that could have solved this quite some time ago, either by raising the process monitor’s threshold for exim or by eliminating the monitor altogether.
Or is that also too easy?