OK, it's no secret I've always abhorred qmail. Now, it appears to have reached an all-time peak.
While trying to diagnose a "what's happening to our mail, it's not getting through" condition, I found lots of this error on the server that was apparently the culprit:
2005-08-11 13:12:31.061306500 starting delivery 2753: msg 34103 to local XXXXXXXXX-deballing@XXXXXXXX
2005-08-11 13:12:31.122103500 delivery 2753: success: lseek_error_29/lseek_errno=29/did_0+0+1/
Yup, you read that right... qmail SAW the error, knew enough to log it, and yet still called it a "success", so it pulled it out of its queue.
I'm looking for two things:
- Can a qmail-savvy person please tell me that my worst fears are true: That any message which meets that criterion is, in fact, visiting Dave Null and won't be back for a while?
- What the hell causes that lseek_error thing in the first place, and how does one correct it? It seems to be fairly rare, near as I can tell, given that the only real mentions of it I can find on Google are two people seemingly asking questions about it, and neither of them seems to be seeing the "I got an error, but I'll treat it as a success anyway, and put the queued-item in the bit-bucket." situation we were seeing.
A quick check of errno.h shows that 29 is ESPIPE. Cross-referencing the errno man page, ESPIPE is an invalid seek (which makes sense for an lseek error). Looking at the lseek man page, that seems to mean that the code tried to do an lseek (which is a file operation) on a file descriptor that referred to a pipe rather than a file (which wouldn't work very well). My best guess is that this error only happens on a message that is forwarded to a program (for instance a pipe in a .forward file). However, I am not a qmail expert, so I could be wrong about what specifically causes that in qmail.
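The ESPIPE behaviour described above is easy to reproduce outside qmail. This is only a minimal sketch in Python, assuming a POSIX system; the helper name seek_on_pipe is made up for illustration and has nothing to do with qmail's own code. It calls lseek on the read end of a pipe and reports which errno comes back.

```python
import errno
import os


def seek_on_pipe():
    """Call lseek() on the read end of a pipe; return the errno it fails with.

    Hypothetical helper for illustration only. Returns None if the seek
    unexpectedly succeeds.
    """
    read_fd, write_fd = os.pipe()
    try:
        os.lseek(read_fd, 0, os.SEEK_SET)
    except OSError as exc:
        return exc.errno
    finally:
        # Always release both ends of the pipe.
        os.close(read_fd)
        os.close(write_fd)
    return None


err = seek_on_pipe()
if err is not None:
    # Print the numeric value, the symbolic name, and the system's message.
    print(err, errno.errorcode.get(err), os.strerror(err))
```

The numeric value of ESPIPE is platform-specific, so comparing against errno.ESPIPE is safer than hard-coding 29.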
My first guess would be to see if qmail hands off the delivery to something else. It may be the second program actually throwing off the error message.
I'm a postfix guy, but I've seen cases where the reported error message actually comes from the secondary delivery program, not from postfix itself as it first appears.
Right, but the hard part for me to wrap my head around is: "It SAW the fucking error and still said 'Ah, it's a success.'"
That part I cannot accept. :-(
Who in his sane mind still uses qmail?
What crap.
qmail delivers into a pipe, and the program behind the pipe returns exit status 0 ALTHOUGH it encountered an error. So don't blame qmail, but the program qmail delivers to...
I don't think that's the case, Ralf. Qmail *saw* the error, it's in the qmail log. That it failed to behave properly when faced with it is most definitely qmail's fault.
Definitely not qmail's fault. Qmail doesn't ever print "lseek error". What's almost certainly happening is that a program delivery is printing that error and then exiting 0. If a program exits zero, qmail cannot reasonably parse its output and gainsay it. I checked: if you're piping one program into another, the exit code of the first program is lost. This exits zero:
exit 1 | exit 0
That's probably not a good thing, but you have the shell to "thank" for that, not qmail.
-russ
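For anyone who wants to check the pipeline claim above without touching a mail server, here is a small sketch in Python, assuming a POSIX sh is on the PATH. The helper name pipeline_status is made up for illustration. It runs a command line under sh and returns the exit status the shell reports, which is all a parent process gets to see.

```python
import subprocess


def pipeline_status(command):
    """Run a command line under sh and return the exit status sh reports.

    Hypothetical helper for illustration only; assumes a POSIX sh on PATH.
    """
    return subprocess.run(["sh", "-c", command]).returncode


# The failing first stage is hidden by the successful last stage.
print(pipeline_status("exit 1 | exit 0"))

# Only the last stage's status is reported to the caller.
print(pipeline_status("exit 0 | exit 1"))
```

By default the shell reports the status of the last command in the pipeline, so an error in an earlier stage never reaches the caller unless the script checks for it explicitly.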