From news@news.cam.ac.uk Fri Aug 8 10:30:41 EDT 1997
Article: 41009 of news.software.nntp
Path: news.cis.ohio-state.edu!news.maxwell.syr.edu!howland.erols.net!rill.news.pipex.net!pipex!join.news.pipex.net!pipex!server1.netnews.ja.net!lyra.csx.cam.ac.uk!not-for-mail
From: news@news.cam.ac.uk (USENET news)
Newsgroups: news.software.nntp
Subject: INN 1.5.1sec2 - fix for an innd crash
Date: 8 Aug 1997 00:22:17 +0100
Organization: Computing Service, Cambridge University, England
Lines: 77
Message-ID: <5sdlb9$oq$1@lyra.csx.cam.ac.uk>
NNTP-Posting-Host: lyra.csx.cam.ac.uk
Xref: news.cis.ohio-state.edu news.software.nntp:41009

At least, it seems to be the fix for a problem I've been seeing with INN 1.5.1sec2 on Solaris 2.5 (SPARC) and Sun's C compiler - "your mileage may vary".

The symptom I saw was innd crashing whenever something caused outbound feeds to be restarted (ctlinnd reload newsfeeds, ctlinnd newgroup, etc.), though it also depended on the newsfeeds configuration: it was related to funnel files, and to how many of the sites feeding into a funnel a particular article was being sent to.

Since the underlying cause is a memory addressing error (a write past the end of a buffer), the symptoms are likely to vary with OS/compiler/phase of the moon, depending on where data structures get allocated, etc. In my case, examining a core dump in dbx showed, for example:
    Current function is SITEfree
     1044   DISPOSE(sp->FNLnames.Data);
    (dbx)
      [1] kill(0x0, 0x6, 0x0, 0x0, 0xffffffff, 0x128be8), at 0xef67434c
      [2] abort(0x11b104, 0x127580, 0x2ebc80, 0x0, 0x0, 0x1), at 0xef6396a4
      [3] free(0x549e08, 0x55555400, 0x55555555, 0x549e00, 0xd0, 0x131f48), at 0x6e9d8
    =>[4] SITEfree(sp = 0x52a0a0), line 1044 in "site.c"
      [5] SITEparsefile(StartSite = 1), line 542 in "newsfeeds.c"
      [6] ICDsetup(StartSites = 1), line 98 in "icd.c"
      [7] ICDwritevactive(vp = 0xefffef54, vpcount = 2), line 221 in "icd.c"
      [8] ICDnewgroup(Name = 0xeffff6aa "soc.genealogy.britain", Rest = 0xeffff6c0 "y"), line 281 in "icd.c"
      [9] CCnewgroup(av = 0xeffff264), line 1016 in "cc.c"
      [10] CCreader(cp = 0x131728), line 1739 in "cc.c"
      [11] CHANreadloop(), line 833 in "chan.c"
      [12] main(ac = 0, av = 0xeffffe7c), line 972 in "innd.c"

accompanied consistently by

    assertion botched: *(unsigned int *)((caddr_t)Perl_op + Perl_op->ovu.ovu_size + 1 - sizeof (unsigned int)) == 0x55555555

in errlog.

The error (if I'm right, and the code certainly appears to be wrong anyway!) is actually in innd/art.c, in an area where the INN 1.5.1sec2 (and sec) code differs from both 1.5.1 and the 1.6 beta releases. The 1.5.1sec2 version appears broken: it measures both the strncpy() limit (bp->Size - 1) and the position of the terminating NUL (p[bp->Size - 1]) from p rather than from the start of the buffer, so once bp->Used is non-zero it writes past the end of bp->Data. Substituting the 1.6b3 version (patch below), which allows only for the space remaining, appears to fix it.

Grateful thanks are due to Forrest J. Cavalier III, whose unified source files for the various recent INN versions (announced here recently) made it much easier to see where possibly-relevant bits of 1.5.1sec2 differed from 1.5.1 and 1.6beta. In fact, he also highlighted the 1.5.1sec2 version of the change as dubious (from examining the source code) a couple of days ago on the inn-workers mailing list, but the effects of the broken code were hitherto unknown...

John Line

===== patch for innd/art.c in INN 1.5.1sec2 (retrofitted from 1.6b3)
*** art.c.original	Thu Aug 7 21:30:52 1997
--- art.c	Thu Aug 7 22:06:26 1997
***************
*** 1690,1697 ****
          *p++ = ' ';
          bp->Used++;
      }
!     strncpy(p, sp->Name, bp->Size - 1) ;
!     p[bp->Size - 1] = '\0';
      bp->Used += strlen(p);
    }
  }
--- 1690,1697 ----
          *p++ = ' ';
          bp->Used++;
      }
!     strncpy(p, sp->Name, bp->Size - bp->Used - 1) ;
!     bp->Data[bp->Size - 1] = '\0';
      bp->Used += strlen(p);
    }
  }
=====
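As a rough illustration of the difference between the two versions of those lines (a standalone sketch, not INN code), the C below replays both. It assumes, as the surrounding context suggests, that p equals bp->Data + bp->Used at the strncpy() call and that bp->Data holds bp->Size bytes; struct buffer, append_sec2(), append_fixed() and demo() are hypothetical names. The buffer sits at the front of a larger array pre-filled with 0x55 bytes, so the old code's overrun lands in memory the program owns and can be counted rather than being undefined behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXSIZE    256   /* largest buffer this demo allows */
#define GUARD      256   /* spare bytes after the buffer, >= any Used */
#define GUARD_BYTE 0x55  /* fill pattern for the guard area */

/* Hypothetical stand-in for innd's buffer: Data holds Size bytes,
   of which the first Used are already filled. */
struct buffer {
    char  *Data;
    size_t Size;
    size_t Used;
};

/* The two 1.5.1sec2 lines. strncpy() always writes exactly n bytes
   (padding with NULs), so from p = Data + Used it fills
   Data[Used .. Used + Size - 2] and the NUL lands at
   Data[Used + Size - 1]: Used bytes past the end. */
static void append_sec2(struct buffer *bp, const char *name)
{
    char *p = bp->Data + bp->Used;

    strncpy(p, name, bp->Size - 1);
    p[bp->Size - 1] = '\0';
    bp->Used += strlen(p);
}

/* The two 1.6b3 lines: only the remaining space is used, and the
   NUL always goes in the last byte of the buffer. */
static void append_fixed(struct buffer *bp, const char *name)
{
    char *p = bp->Data + bp->Used;

    assert(bp->Used + 1 < bp->Size);   /* this sketch's own precondition */
    strncpy(p, name, bp->Size - bp->Used - 1);
    bp->Data[bp->Size - 1] = '\0';
    bp->Used += strlen(p);
}

/* Put a Size-byte buffer already holding `prefix` at the start of a
   larger array, append `name`, and count how many bytes after the
   buffer's end no longer hold GUARD_BYTE. */
static size_t demo(int use_fix, size_t size, const char *prefix,
                   const char *name)
{
    static char arena[MAXSIZE + GUARD];
    struct buffer b;
    size_t i, damaged = 0;

    assert(size <= MAXSIZE && strlen(prefix) < size);
    memset(arena, GUARD_BYTE, sizeof arena);
    b.Data = arena;
    b.Size = size;
    b.Used = strlen(prefix);
    memcpy(b.Data, prefix, b.Used);

    if (use_fix)
        append_fixed(&b, name);
    else
        append_sec2(&b, name);

    for (i = size; i < sizeof arena; i++)
        if ((unsigned char)arena[i] != GUARD_BYTE)
            damaged++;
    return damaged;
}
```

For example, with a 16-byte buffer already holding "site1", demo(0, 16, "site1", "site2") finds 5 bytes changed beyond the end and demo(1, 16, "site1", "site2") finds none. Because strncpy() pads out to its full count, the old lines overrun by exactly bp->Used bytes whenever bp->Used is non-zero, which would be consistent with the crash depending on how many site names had already been added to the buffer.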