commit 62beb979b4e43c361db54fbf3084f876fd2c11da
Author: Andrew Deason <adeason@sinenomine.net>
Date: Mon Jul 16 16:53:34 2018 -0500
afs: Skip bulkstat if stat cache looks full
Currently, afs_lookup() will try to prefetch dir entries for normal
dirs via bulkstat whenever multiple pids are reading that dir.
However, if we already have a lot of vcaches, ShakeLooseVCaches may be
struggling to limit the vcaches we already have. Entering
afs_DoBulkStat can make this worse, since we grab afs_xvcache
repeatedly, we may kick out other vcaches, and we'll possibly create
30 new vcaches that may not even be used before they're evicted.
To try to avoid this, skip running afs_DoBulkStat if it looks like the
stat cache is really full.
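As a rough sketch of the kind of guard described above (the helper name,
threshold, and counters are assumptions, not the actual patch):
    /* Illustrative only; the real counters live in the cache manager. */
    static int stat_cache_looks_full(long nvcaches, long vcache_limit)
    {
        /* Treat the stat cache as "full" once we are at or past the
         * configured limit; in that case skip the bulkstat prefetch. */
        return nvcaches >= vcache_limit;
    }
    /* In the afs_lookup() path, roughly:
     *   if (stat_cache_looks_full(current vcache count, configured limit))
     *       skip afs_DoBulkStat() and do a normal single stat instead. */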
Reviewed-on: https://gerrit.openafs.org/13256
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 9ff45e73cf3d91d12f09e108e1267e37ae842c87)
Change-Id: I1b84ab3bb918252e8db5e4379730a517181bc9d8
Reviewed-on: https://gerrit.openafs.org/14400
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 28794011085d35b293d5e829adadb372b2a2b3fd
Author: Andrew Deason <adeason@sinenomine.net>
Date: Mon Jul 16 16:44:14 2018 -0500
afs: Log warning when we detect too many vcaches
Currently, afs_ShakeLooseVCaches has a kind of warning that is logged
when we fail to free up any vcaches. This information can be useful to
know, since it may be a sign that users are trying to access way more
files than our configured vcache limit, hindering performance as we
constantly try to evict and re-create vcaches for files.
However, the current warning is not clear at all to non-expert users,
and it can only occur for non-dynamic vcaches (which is uncommon these
days).
To improve this, try to make a general determination if it looks like
the stat cache is "stressed", and log a message if so after
afs_ShakeLooseVCaches runs (for all platforms, regardless of dynamic
vcaches). Also try to make the message a little more user-friendly,
and only log it (at most) once per 4 hours.
Determining whether the stat cache looks stressed or not is difficult
and arguably subjective (especially for dynamic vcaches). This commit
draws a few arbitrary lines in the sand to make the decision, so at
least something will be logged in the cases where users are constantly
accessing way more files than our configured vcache limit.
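A small sketch of the rate-limited warning described above (the 4-hour
interval comes from the message; the names here are illustrative):
    /* Illustrative only: log at most once per 4 hours. */
    #define WARN_INTERVAL (4 * 60 * 60)   /* seconds */
    static long last_warned;              /* time of the last warning */
    static void maybe_warn_stressed(long now)
    {
        if (now - last_warned >= WARN_INTERVAL) {
            last_warned = now;
            /* log the user-friendly "stat cache looks stressed" message
             * here (afs_warn() in the real code path). */
        }
    }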
Reviewed-on: https://gerrit.openafs.org/13255
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 0532f917f29bdb44f4933f9c8a6c05c7fecc6bbb)
Change-Id: Ic7260f276e00f3e34541207955df841d4ed27844
Reviewed-on: https://gerrit.openafs.org/14399
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 47ab46e1e9fe56b5bf8147eab0b652e74078cbe7
Author: Andrew Deason <adeason@sinenomine.net>
Date: Mon Jul 16 16:08:13 2018 -0500
afs: Split out bulkstat conditions into a function
Our current if() statement for determining whether we should run
afs_DoBulkStat to prefetch dir entries is a bit large, and grows over
time. Split this logic out into a separate function to make it easier
to maintain, and add some comments to help explain each condition.
This commit should have no visible effects; it's just code
reorganization.
Reviewed-on: https://gerrit.openafs.org/13254
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 19cd454f11997d286bc415e9bc9318a31f73e2c6)
Change-Id: Ida322c518d11787fd794df7534135fbc2dec2935
Reviewed-on: https://gerrit.openafs.org/14398
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 738eb3e4976947657f877ec107d517d63a66a907
Author: Andrew Deason <adeason@sinenomine.net>
Date: Sun Jul 8 15:00:02 2018 -0500
afs: Bound afs_DoBulkStat dir scan
Currently, afs_DoBulkStat will scan the entire directory blob, looking
for entries to stat. If all or almost all entries are already stat'd,
we'll scan through the entire directory, doing nontrivial work on
each entry (we grab afs_xvcache, at least). All of this work is pretty
pointless, since the entries are already cached and so we won't do
anything. If many processes are trying to acquire afs_xvcache, this
can contribute to performance issues.
To avoid this, provide a constant bound on the number of entries we'll
search through: nentries * 4. The current arbitrary limits cap
nentries at 30, so this means we're capping the afs_DoBulkStat search
to 120 entries.
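A sketch of the bounded scan described above (the 4x multiplier and the
cap of 30 come from the message; the variable names are assumptions):
    /* Illustrative only: stop scanning after nentries * 4 slots. */
    int nentries = 30;               /* existing arbitrary bulkstat cap */
    int max_scan = nentries * 4;     /* == 120 entries */
    int scanned = 0;
    /* for (each directory slot) { */
        if (++scanned > max_scan)
            break;                   /* give up; entries are likely cached */
    /*     ... existing per-entry work ... } */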
Reviewed-on: https://gerrit.openafs.org/13253
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit ba8b92401b8cb2f5a5306313c2702cb36cba083c)
Change-Id: Icf82f88328621cb5a9e0ad52f873b8a7d74b1f3a
Reviewed-on: https://gerrit.openafs.org/14397
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit fbba8e4cbf00f32181b5075e8d7a876cccd2a046
Author: Andrew Deason <adeason@sinenomine.net>
Date: Thu Jul 13 17:40:36 2017 -0500
afs: Avoid needless W-locks for afs_FindVCache
The callers of afs_FindVCache must hold at least a read lock on
afs_xvcache; some hold a shared or write lock (and set IS_SLOCK or
IS_WLOCK in the given flags). Two callers (afs_EvalFakeStat_int and
afs_DoBulkStat) currently hold a write lock, but neither of them need
to.
In the optimal case, where afs_FindVCache finds the given vcache, this
means that we unnecessarily hold a write lock on afs_xvcache. This can
impact performance, since afs_xvcache can be a very frequently
accessed lock (a simple operation like afs_PutVCache briefly holds a
read lock, for example).
To avoid this, have afs_DoBulkStat hold a shared lock on afs_xvcache,
upgrading to a write lock when needed. afs_EvalFakeStat_int doesn't
ever need a write lock at all, so just convert it to a read lock.
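A sketch of the locking pattern this describes, using the usual OpenAFS
lock macros (lock-site numbers are placeholders; this is not the literal
diff):
    /* Illustrative only. */
    ObtainSharedLock(&afs_xvcache, 998);     /* cheap: readers proceed */
    tvc = afs_FindVCache(&afid, ..., IS_SLOCK);
    if (!tvc) {
        UpgradeSToWLock(&afs_xvcache, 998);  /* pay for the write lock
                                              * only when creating */
        /* ... create or evict vcaches ... */
        ConvertWToSLock(&afs_xvcache, 998);
    }
    ReleaseSharedLock(&afs_xvcache);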
Reviewed-on: https://gerrit.openafs.org/12656
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 6c808e05adb0609e02cd61e3c6c4c09eb93c1630)
Change-Id: Id517d1098b4c3a02db646b2a74535f77cb684ec3
Reviewed-on: https://gerrit.openafs.org/14396
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 5c476b91fd2259e9c34070be8ba201dd471ad974
Author: Andrew Deason <adeason@sinenomine.net>
Date: Thu Jul 13 17:40:21 2017 -0500
afs: Change VerifyVCache2 calls to VerifyVCache
afs_VerifyVCache is a macro that (on most platforms) effectively
expands to:
    if ((avc->f.states & CStatd)) {
        return 0;
    } else {
        return afs_VerifyVCache2(...);
    }
Some callers call afs_VerifyVCache2 directly, since they already check
for CStatd for other reasons. A few callers currently call
afs_VerifyVCache2, but without guaranteeing that CStatd is not set.
Specifically, in afs_getattr and afs_linux_VerifyVCache, CStatd could
be set while afs_CreateReq drops GLOCK. And in afs_linux_readdir,
CStatd could be cleared at multiple different points before the
VerifyVCache call.
This can result in afs_VerifyVCache2 acquiring a write-lock on the
vcache, even when CStatd is already set, which is an unnecessary
performance hit.
To avoid this, change these call sites to use afs_VerifyVCache instead
of calling afs_VerifyVCache2 directly, which skips the write lock when
CStatd is already set.
Reviewed-on: https://gerrit.openafs.org/12655
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit a05d5b7503e466e18f5157006c1de2a2f7d019f7)
Change-Id: I05bdcb7f10930ed465c24a8d7e51077a027b1a4b
Reviewed-on: https://gerrit.openafs.org/14395
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 150ee65f286409e37fcf94807dbeaf4b79ab0769
Author: Andrew Deason <adeason@sinenomine.net>
Date: Sun Apr 26 17:26:02 2020 -0500
rx: Use _IsLast to check for last call in queue
Ever since commits 170dbb3c (rx: Use opr queues) and d9fc4890 (rx: Fix
test for end of call queue for LWP), rx_GetCall checks if the current
call is the last one on rx_incomingCallQueue by doing this:
opr_queue_IsEnd(&rx_incomingCallQueue, cursor)
But opr_queue_IsEnd checks if the given pointer is the _end_ of the
list; that is, if it's the end-of-list sentinel, not an item on the
actual list. Testing for the last item in a list is what
opr_queue_IsLast is for. This is the same convention that the old Rx
queues used, but 170dbb3c just accidentally replaced queue_IsLast with
opr_queue_IsEnd (instead of opr_queue_IsLast), and d9fc4890 copied the
mistake.
Because this check is inside an opr_queue_Scan loop, opr_queue_IsEnd
will never be true, so we'll never enter this block of code (unless we
are the "fcfs" thread). This means that an incoming Rx call can get stuck
in the incoming call queue, if all of the following are true:
- The incoming call consists of more than 1 packet of incoming data.
- The incoming call "waits" when it comes in (that is, there are no
free threads or the service is over quota).
- The "fcfs" thread doesn't scan the incoming call queue (because it
is idle when the call comes in, but the relevant service is over
quota).
To fix this, just use opr_queue_IsLast here instead of
opr_queue_IsEnd.
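A small standalone model of the distinction (this re-implements the
sentinel-list idea in plain C for illustration; it is not the real
opr_queue API):
    #include <stdio.h>
    struct node { struct node *prev, *next; };  /* circular list, sentinel head */
    /* "IsEnd": cursor is the sentinel itself -- never true inside a scan loop. */
    static int is_end(struct node *head, struct node *cur)  { return cur == head; }
    /* "IsLast": cursor is the final real element before the sentinel. */
    static int is_last(struct node *head, struct node *cur) { return cur->next == head; }
    int main(void)
    {
        struct node head, a, b;
        head.next = &a; a.prev = &head; a.next = &b; b.prev = &a;
        b.next = &head; head.prev = &b;
        for (struct node *cur = head.next; cur != &head; cur = cur->next)
            printf("is_end=%d is_last=%d\n", is_end(&head, cur), is_last(&head, cur));
        /* prints: is_end=0 is_last=0   (node a)
         *         is_end=0 is_last=1   (node b) */
        return 0;
    }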
Reviewed-on: https://gerrit.openafs.org/14158
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit befc72749884c6752c7789479343ba48c7d5cea1)
Change-Id: If724245414798ae7a86dfa048cf99863317aef8e
Reviewed-on: https://gerrit.openafs.org/14394
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 12aed9ef2988a69187910bada5ba7325cb6144ab
Author: Andrew Deason <adeason@dson.org>
Date: Sun Jul 21 18:48:51 2019 -0500
rx: Avoid osi_NetSend during rx shutdown
Commit 8d939c08 (rx: avoid nat ping during shutdown) added a call
to shutdown_rx() inside the DARWIN shutdown sequence, before the rx
socket was closed. From the commit message, it sounds like this was
done to avoid NAT pings from calling osi_NetSend during the shutdown
sequence after the rx socket was closed; calling shutdown_rx() before
closing the socket would cause any connections we had to be destroyed
first, avoiding that.
The problem is that shutdown_rx() is then called when
osi_StopNetIfPoller is called, which is much earlier than some other
portions of the shutdown sequence, some of which may hold references
to e.g. rx connections. If we try to, for instance, destroy an rx
connection after shutdown_rx() is called, we could panic.
An earlier version of that commit (gerrit PS1) just tried to insert a
check before the relevant osi_NetSend call, making us just skip the
osi_NetSend if the shutdown sequence had been started. So to avoid the
above issue, try to implement that approach instead. And instead of
doing it just for NAT pings, we can do it for almost all osi_NetSend
calls (besides those involved in the shutdown sequence itself), by
checking this in rxi_NetSend. Also return an error (ESHUTDOWN) if we
skip the osi_NetSend call, so we're not completely silent about doing
so.
This means we also remove the call to shutdown_rx() inside DARWIN's
osi_StopNetIfPoller(). This allows us to interact with Rx objects
during more of the shutdown process in cross-platform code.
Reviewed-on: https://gerrit.openafs.org/13718
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 9866511bb0a5323853e97e3ee92524198813776e)
Change-Id: Ie62c1a68d8a8889f7a8aa3eff3973c219b45a95c
Reviewed-on: https://gerrit.openafs.org/14393
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 7f59989cf84df1e2077f4fcf5649c9624e79a5d2
Author: Andrew Deason <adeason@dson.org>
Date: Sun Jul 21 18:31:53 2019 -0500
rx: Introduce rxi_NetSend
Introduce a small wrapper around osi_NetSend, called rxi_NetSend. This
small wrapper allows future commits to change the code around our
osi_NetSend calls, without needing to change every single call site,
or every implementation of osi_NetSend.
Change most call sites to use rxi_NetSend, instead of osi_NetSend. Do
not change a few callers in the platform-specific kernel shutdown
sequence, since those call osi_NetSend for platform-specific reasons.
This commit on its own does not change any behavior with osi_NetSend;
it is just code reorganization.
Reviewed-on: https://gerrit.openafs.org/13717
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 2a33a80f7026df6b5e47e42319c55d8b7155675a)
Change-Id: I6af8541953a582d044fb668eb4a91720536bc8e1
Reviewed-on: https://gerrit.openafs.org/14392
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 38f613f5e15552938cb425c68b22c166e35284be
Author: Andrew Deason <adeason@sinenomine.net>
Date: Tue Oct 13 20:18:59 2020 -0500
ubik: Remove unused sampleName
The RPC-L type sampleName and related constant UMAXNAMELEN are not
referenced by anything, and have been unused since OpenAFS 1.0. Remove
the unused definitions.
Reviewed-on: https://gerrit.openafs.org/14386
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 83ce8d41c68a5ebcc84132d77af9024c6d285e05)
Change-Id: I1d6c583d9c630fc9704578ba3329132e16b3a803
Reviewed-on: https://gerrit.openafs.org/14401
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit f7374a883505aa34a3db34ddd1163367c544bb0c
Author: Andrew Deason <adeason@dson.org>
Date: Sat May 2 23:54:55 2020 -0500
afs: Drop GLOCK for RXAFS_GetCapabilities
We are hitting the net here; we certainly should not be holding
AFS_GLOCK while waiting for the server's response.
Found via FreeBSD WITNESS.
Reviewed-on: https://gerrit.openafs.org/14181
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 44b7b93b593371bfdddd0be0ae603f4f8720f78b)
Change-Id: I186e08c89136cc3109fd2519bb0d2abbb52f9ba0
Reviewed-on: https://gerrit.openafs.org/14391
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 02fb067d3fded0eee5c9326c56e66b512a75d71c
Author: Michael Meffie <mmeffie@sinenomine.net>
Date: Mon Mar 23 09:46:05 2020 -0400
build: remove unused LINUX_PKGREL from configure.ac
This change removes the unused LINUX_PKGREL definition from the
configure.ac file.
Commit 6a27e228bac196abada96f34ca9cd57f32e31f5c converted the setting of
the RPM package version and release values in the openafs.spec file from
autoconf to the makesrpm.pl script. That commit left LINUX_PKGREL in
configure.ac because it was still referenced by the Debian packaging,
which was still in-tree at that time.
Commit ada9dba0756450993a8e57c05ddbcae7d1891582 removed the last trace
of the Debian packaging, but missed the removal of the LINUX_PKGREL.
Reviewed-on: https://gerrit.openafs.org/14117
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 8ae4531c5720baff9e11e4b05706eab6c82de5f9)
Conflicts:
configure.ac
Change-Id: I69925f89c52aef32aea5bc308140936517b1aeb0
Reviewed-on: https://gerrit.openafs.org/14363
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 2f2cbff766650e7c25eb332d5385f4a3fca8676d
Author: Andrew Deason <adeason@sinenomine.net>
Date: Mon Apr 20 13:03:15 2020 -0500
ubik: Avoid unlinking garbage during recovery
In urecovery_Interact, if any of our operations fail around
calling DISK_GetFile, we will jump to FetchEndCall and eventually
unlink 'pbuffer'. But if we failed before opening our .DB0.TMP file,
the contents of 'pbuffer' will not be initialized yet.
During most iterations of the recovery loop, the contents of 'pbuffer'
will be filled in from previous loops, and it should always stay the
same, so it's not a big problem. But if this is the first iteration of
the loop, the contents of 'pbuffer' may be stack garbage.
Solve this in two ways. To make sure we don't use garbage contents in
'pbuffer', memset the whole thing to zeroes at the beginning of
urecovery_Interact(). And then to make sure we're not reusing
'pbuffer' contents from previous iterations of the loop, also clear
the first character to NUL each time we arrive at this area of the
recovery code. And avoid unlinking anything if pbuffer starts with a
NUL.
Commit 44e80643 (ubik: Avoid unlinking garbage) fixed the same issue,
but only in the SDISK_SendFile codepath in remote.c.
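A sketch of the guards described above (pbuffer and urecovery_Interact
come from the message; the buffer size and surrounding structure are
assumptions):
    /* Illustrative fragment only. */
    char pbuffer[1024];                   /* size is illustrative */
    /* 1) At the top of urecovery_Interact(): never leave stack garbage. */
    memset(pbuffer, 0, sizeof(pbuffer));
    /* 2) Each time we reach the DISK_GetFile area: forget prior iterations. */
    pbuffer[0] = '\0';
    /* ... build the .DB0.TMP path into pbuffer, open it, fetch the db ... */
    /* 3) On the error path: only unlink if we actually have a path. */
    if (pbuffer[0] != '\0')
        unlink(pbuffer);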
Reviewed-on: https://gerrit.openafs.org/14153
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 98b5ffb52117aefac5afb47b30ce9b87eb2fdebf)
Change-Id: I5cadb88e466ddd326ef1e4138d5b1bf89fdb27dc
Reviewed-on: https://gerrit.openafs.org/14365
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 8993e35578e4ae51dd5e8941f77c18bb975e51af
Author: Andrew Deason <adeason@sinenomine.net>
Date: Fri Sep 18 14:03:37 2020 -0600
WINNT: Make opr_threadname_set a no-op
We don't supply an implementation for opr_threadname_set for WINNT;
don't pretend that we do.
Reviewed-on: https://gerrit.openafs.org/13817
Tested-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit f895a9b51671ffdc920fd9b4284337c5b737a0ef)
Change-Id: Ie8df82550f5420e2b024dea29aae0e39e3ee506f
Reviewed-on: https://gerrit.openafs.org/14369
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 045a97dfbc8dd6a2794b74e16f47984dc5f8eccf
Author: Andrew Deason <adeason@sinenomine.net>
Date: Fri Sep 18 12:11:36 2020 -0600
Move afs_pthread_setname_self to opr
Move the functionality in afs_pthread_setname_self from libutil to
opr, in a new function opr_threadname_set. This allows us to more
easily use the routine in more subsystems, since most code already
uses opr.
Reviewed-on: https://gerrit.openafs.org/13655
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 9d28f7390332c92b3d9e863c6fe70c26db28b5ad)
Change-Id: Ic6bbb615bc3494a7a114a0f4cae711b65ebec111
Reviewed-on: https://gerrit.openafs.org/14368
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 41f4bb48741065da6a69ffcb05e451e5c7dac757
Author: Marcio Barbosa <mbarbosa@sinenomine.net>
Date: Mon Aug 31 19:56:56 2020 +0000
Revert "vos: take RO volume offline during convertROtoRW"
This reverts commit 32d35db64061e4102281c235cf693341f9de9271. While that
commit did fix the mentioned problem, depending on "vos" to set the
volume to be converted as "out of service" is not ideal. Instead, this
volume should be set as offline by the SAFSVolConvertROtoRWvolume RPC,
executed on the volume server.
The proper fix for this problem will be introduced by another commit.
Reviewed-on: https://gerrit.openafs.org/14339
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 85893ac3df0c2cb48776cf1203ec200507b6ce7d)
Change-Id: Ie125d2dae1301ca5a4f8323099e6e42bc57b6d28
Reviewed-on: https://gerrit.openafs.org/14361
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 29cdc0af10cb2a0f4cb4d1b05e72d6f066a1a7a5
Author: Yadavendra Yadav <yadayada@in.ibm.com>
Date: Wed Apr 15 05:33:00 2020 -0500
LINUX: Always crref after _settok_setParentPag
Commit b61eac78 (Linux: setpag() may replace credentials) changed
PSetTokens2 to call crref() after _settok_setParentPag(), since
changing the parent PAG may change our credentials structure. But that
commit did not update the old pioctl PSetTokens, so -setpag
functionality remained broken on Linux for utilities that called the
old pioctl ('klog' is one such utility).
To fix this, we could copy the same code from PSetTokens2 into
PSetTokens. But instead just move this code into _settok_setParentPag
itself, to avoid code duplication. This commit also refactors
_settok_setParentPag a little to make the platform-specific ifdefs a
little easier to read through.
Reviewed-on: https://gerrit.openafs.org/14147
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Yadavendra Yadav <yadayada@in.ibm.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 8002a46125e8224ba697c194edba5ad09e4cfc44)
Change-Id: I6a9d10428fe470cb38e3ca669f273dde0fa9c875
Reviewed-on: https://gerrit.openafs.org/14328
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 9553128008c7656d78759aa685a574906d352f3c
Author: Yadavendra Yadav <yadayada@in.ibm.com>
Date: Wed Apr 15 05:33:00 2020 -0500
LINUX: Copy session keys to parent in SetToken
Commit 48589b5d (Linux: Restore aklog -setpag functionality for kernel
2.6.32+) added code to SetToken() to copy our session keyring to the
parent process, in order to implement -setpag functionality. But this
was removed from SetToken() in commit 1a6d4c16 (Linux: fix aklog
-setpag to work with ktc_SetTokenEx), when the same code was moved to
ktc_SetTokenEx().
Add this code back to SetToken(), so -setpag functionality can work
again with utilities that use older functions such as ktc_SetToken
(e.g. 'klog').
Reviewed-on: https://gerrit.openafs.org/14146
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 826bb826274e48c867b41cb948d031a423373901)
Change-Id: I1ae90d92efa27bce2ff59ff9b9dcca370eaf4730
Reviewed-on: https://gerrit.openafs.org/14327
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 01dfdf2608c5643714da61950637c1e7d1994023
Author: Yadavendra Yadav <yadayada@in.ibm.com>
Date: Fri Aug 21 01:54:00 2020 +0530
afs: Avoid NatPing event on all connection
Inside release_conns_user_server, the connection vector is traversed
and, after a connection is destroyed, a new eligible connection is found
on which the NatPing event will be set. Ideally there should be only one
connection on which NatPing is set, but in the current code, while
traversing all of a server's connections, a NatPing event is set on
every connection to that server. In cases where we have a large number
of connections to a server, this can lead to a huge number of
RX_PACKET_TYPE_VERSION packets being sent to the server.
Since this happens during garbage collection of user structs, the
following steps were used to simulate the issue:
- Ran a script which "cd"s into a volume mount point and then sleeps
for a long time.
- Ran an infinite loop in which the above script was called using
PAG-based tokens (a new connection is created for each PAG).
- Instrumented the code so that the code segment where the NatPing
event is set is reached; mainly, reduced NOTOKTIMEOUT to 60 seconds.
To fix this issue, set NatPing on one connection and, once it is set,
break out of the "for" loop traversing the server's connections.
Reviewed-on: https://gerrit.openafs.org/14312
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
(cherry picked from commit b968875a342ba8f11378e76560b46701f21391e8)
Change-Id: I8c70108ab3eb73ed1d9e598381bc29b87ca42aa0
Reviewed-on: https://gerrit.openafs.org/14364
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit e4103743d91bcdba1c57c850eb07b0b03abe72d6
Author: Andrew Deason <adeason@sinenomine.net>
Date: Mon Jun 22 22:54:52 2020 -0500
volser: Don't NUL-pad failed pread()s in dumps
Currently, the volserver SAFSVolDump RPC and the 'voldump' utility
handle short reads from pread() for vnode payloads by padding the
missing data with NUL bytes. That is, if we request 4k of data for our
pread() call, and we only get back 1k of data, we'll write 1k of data
to the volume dump stream followed by 3k of NUL bytes, and log
messages like this:
1 Volser: DumpFile: Error reading inode 1234 for vnode 5678
1 Volser: DumpFile: Null padding file: 3072 bytes at offset 40960
This can happen if we hit EOF on the underlying file sooner than
expected, or if the OS just responds with fewer bytes than requested
for any reason.
The same code path tries to do the same NUL-padding if pread() returns
an error (for example, EIO), padding the entire e.g. 4k block with
NULs. However, in this case, the "padding" code often doesn't work as
intended, because we compare 'n' (set to -1) with 'howMany' (set to 4k
in this example), like so:
if (n < howMany)
Here, 'n' is signed (ssize_t), and 'howMany' is unsigned (size_t), and
so compilers will promote 'n' to the unsigned type, causing this
conditional to fail when n is -1. As a result, all of the relevant log
messages are skipped, and the data in the dumpstream gets corrupted
(we skip a block of data, and our 'howFar' offset goes back by 1). So
this can result in rare silent data corruption in volume dumps, which
can occur during volume releases, moves, etc.
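The promotion can be demonstrated with a few lines of standalone C
(4096 stands in for the block size used in the example above):
    #include <stdio.h>
    #include <sys/types.h>
    int main(void)
    {
        ssize_t n = -1;          /* pread() error return */
        size_t howMany = 4096;   /* requested block size */
        /* 'n' is converted to size_t, becoming a huge positive value,
         * so the "padding" branch is silently skipped. */
        if (n < howMany)
            printf("padding branch taken\n");
        else
            printf("padding branch skipped (n promoted to %zu)\n", (size_t)n);
        return 0;
    }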
To fix all of this, remove this bizarre NUL-padding behavior in the
volserver. Instead:
- For actual errors from pread(), return an error, like we do for I/O
errors in most other code paths.
- For short reads, just write out the amount of data we actually read,
and keep going.
- For premature EOF, treat it like a pread() error, but log a slightly
different message.
For the 'voldump' utility, the padding behavior can make sense if a
user is trying to recover volume data offline in a disaster recovery
scenario. So for voldump, add a new switch (-pad-errors) to enable the
padding behavior, but change the default behavior to bail out on
errors.
Reviewed-on: https://gerrit.openafs.org/14255
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 4498bd8179e5e93a33468be3c8e7a30e569d560a)
Change-Id: Idf89d70c9d4d650dbf7b73e67c5b71b9bab7c3f4
Reviewed-on: https://gerrit.openafs.org/14367
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 9947625a1d67b4ffdc0582e9081000e34be2b46b
Author: Michael Meffie <mmeffie@sinenomine.net>
Date: Fri May 15 12:01:44 2020 -0400
vos: avoid CreateVolume when restoring over an existing volume
Currently, the UV_RestoreVolume2 function always attempts to create a
new volume, even when doing an incremental restore over an existing
volume. When the volume already exists, the volume creation operation
fails on the volume server with a VVOLEXISTS error. The client will then
attempt to obtain a transaction on the existing volume. If a transaction
is obtained, the incremental restore operation will proceed. If a full
restore is being done, the existing volume is removed and a new empty
volume is created.
Unfortunately, the failed volume creation is logged by the volume
server, and so litters the log file with:
Volser: CreateVolume: Unable to create the volume; aborted, error code 104
To avoid polluting the volume server log with these messages, reverse
the logic in UV_RestoreVolume2. Assume the volume already exists and try
to get the transaction first when doing an incremental restore. Create a
new volume if the transaction cannot be obtained because the volume is
not present. When doing a full restore, remove the existing volume, if
one exists, and then create a new empty volume.
Reviewed-on: https://gerrit.openafs.org/14208
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Marcio Brito Barbosa <marciobritobarbosa@gmail.com>
Tested-by: Marcio Brito Barbosa <marciobritobarbosa@gmail.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit f5051b87a56b3a4f7fd7188cbd16a663eee8abbf)
Change-Id: I422b81e0c800ff655ac8851b2872f4d7160d79a8
Reviewed-on: https://gerrit.openafs.org/14326
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 967aa95fe23f98c911b8a618132442159eda3f77
Author: Yadavendra Yadav <yadayada@in.ibm.com>
Date: Wed Apr 29 05:10:05 2020 +0000
rxkad: Use krb5_enctype_keysize in tkt_DecodeTicket5
Inside the tkt_DecodeTicket5 (rxkad/ticket5.c) function, the key size is
calculated by calling krb5_enctype_keybits and dividing the number of
bits by 8. For 3DES the number of key bits is 168, so the key size comes
out to 21 (168/8). However, the actual key size of a 3DES key is 24.
This key size is passed to _afsconf_GetRxkadKrb5Key, where a key size
comparison happens; since there is a mismatch, it returns AFSCONF_BADKEY.
To fix this issue, get the key size from the krb5_enctype_keysize
function instead of krb5_enctype_keybits. Thanks to John Janosik
(jpjanosi@us.ibm.com) for analyzing and fixing this issue.
Reviewed-on: https://gerrit.openafs.org/14203
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 5d53ed0bdab6fea6d2426691bdef2b6f9cb7f2fe)
Change-Id: I16cd7a803a139802671a8045dac09e10ef4ad6cb
Reviewed-on: https://gerrit.openafs.org/14325
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 28e96b954fe17810d1ebdd9fdd4702511e870a67
Author: Jan Iven <iven.jan@gmail.com>
Date: Tue Sep 1 14:51:25 2020 +0200
ptserver: Remove duplicate ubik_SetLock in listSuperGroups
It looks like a call to ubik_SetLock(.. LOCKREAD) was left in
place in listSuperGroups after locking was moved to ReadPreamble
in commit a6d64d70 (ptserver: Refactor per-call ubik initialisation).
When compiled with 'supergroups', and once contacted by
"pts mem -expandgroups ..", ptserver will therefore abort() with:
Ubik: Internal Error: attempted to take lock twice
This patch removes the superfluous ubik_SetLock.
FIXES 135147
Reviewed-on: https://gerrit.openafs.org/14338
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 696f2ec67b049639abf04905255a7d6173dbb19e)
Change-Id: I62bfe44e374b398278658b61f2ecf9a66fab18ae
Reviewed-on: https://gerrit.openafs.org/14345
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit c7a3154382376ca2e5effdfed2c447c86b9f6eed
Author: Michael Meffie <mmeffie@sinenomine.net>
Date: Wed Jul 1 21:50:09 2020 -0400
redhat: Add make to the dkms-openafs pre-requirements
If `make` is not installed before dkms-openafs, the OpenAFS kernel
module is not built during the dkms-openafs package installation.
The failure happens in the "checking if linux kernel module build works"
configure step, which invokes `make` to check the linux buildsystem.
configure fails when `make` is not available, and gives the unhelpful
suggestion (in this case) of configuring with --disable-kernel-module.
Running the configure.log in the dkms build directory shows:
configure:7739: checking if linux kernel module build works
make -C /lib/modules/4.18.0-193.6.3.el8_2.x86_64/build M=/var/lib/dkms/openafs/...
./configure: line 7771: make: command not found
configure: failed using Makefile:
Avoid this build failure by adding `make` to the list of dkms-openafs
package pre-requirements.
Reviewed-on: https://gerrit.openafs.org/14266
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit e61ab9353e99d3298815296abf6b02c50ebe3df0)
Change-Id: I9b2bb73acabfabc1cf8b5514c8aa66572cc96066
Reviewed-on: https://gerrit.openafs.org/14314
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit ddaf6a74ff3e0aeb4c8b2b15886eaa5c90db59bf
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Fri Aug 28 10:32:01 2020 -0600
INSTALL: document the minimum Linux kernel level
The change associated with gerrit #14300 removed support for older
Linux kernels (2.6.10 and earlier).
The commit 'Import of code from autoconf-archive' (d8205bbb4) introduced
a check for Autoconf 2.64. Autoconf 2.64 was released in 2009.
The commit 'regen.sh: Use libtoolize -i, and .gitignore generated
build-tools' (a7cc505d3) introduced a dependency on libtool's '-i'
option. Libtool supported the '-i' option with libtool 1.9b in 2004.
Update the INSTALL instructions to document a minimum Linux kernel
level and the minimum levels for autoconf and libtool.
Notes: RHEL4 (EOL in 2017) had a 2.6.9 kernel and RHEL5 has a 2.6.18
kernel. RHEL5 has libtool 1.5.22 and autoconf 2.59, RHEL6 has libtool
2.2.6 and autoconf 2.63, and RHEL7 has libtool 2.4.2 and autoconf 2.69.
Reviewed-on: https://gerrit.openafs.org/14305
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 16bae98ec525fa013514fb46398df682d7637ae0)
Change-Id: I7aaf434928204df77851dd2d651d43b86f5b53d2
Reviewed-on: https://gerrit.openafs.org/14331
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit fee39617c350623dc659e7a21287a1791bdff5d2
Author: Andrew Deason <adeason@sinenomine.net>
Date: Wed Apr 1 22:59:38 2020 -0500
vos: Print "done" in non-verbose 'vos remsite'
Currently, 'vos remsite' always prints the message "Deleting the
replication site for volume %lu ...", and then calls VDONE if the
operation is successful. VDONE prints the trailing "done", but only if
-verbose is turned on, and so if -verbose is not specified, the output
of 'vos remsite' looks broken:
$ vos remsite fs1 vicepa vol.foo
Deleting the replication site for volume 1234 ...Removed replication site fs1 /vicepa for volume vol.foo
To fix this, unconditionally print the trailing "done", instead of
going through VDONE, so 'vos remsite' output now looks like this:
$ vos remsite fs1 vicepa vol.foo
Deleting the replication site for volume 1234 ... done
Removed replication site fs1 /vicepa for volume vol.foo
Reviewed-on: https://gerrit.openafs.org/14127
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit f16d40ad26df3ec871f8c73952594ad2e723c9b4)
Change-Id: I4cd7cb2156391004e57cd42437d7174a6bd70992
Reviewed-on: https://gerrit.openafs.org/14311
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit ff5b64d074376992d0be02e19b4da789c20d6bab
Author: Marcio Barbosa <mbarbosa@sinenomine.net>
Date: Thu Feb 13 00:39:00 2020 -0300
vos: take RO volume offline during convertROtoRW
The vos convertROtoRW command converts a RO volume into a RW volume.
Unfortunately, the RO volume in question is not set as "out of service"
during this process. As a result, accesses to the volume being converted
can leave volume objects in an inconsistent state.
Consider the following scenario:
1. Create a volume on host_b and add replicas on host_a and host_b.
$ vos create host_b a vol_1
$ vos addsite host_b a vol_1
$ vos addsite host_a a vol_1
2. Mount the volume:
$ fs mkmount /afs/.mycell/vol_1 vol_1
$ vos release vol_1
$ vos release root.cell
3. Shutdown dafs on host_b:
$ bos shutdown host_b dafs
4. Remove RO reference to host_b from the vldb:
$ vos remsite host_b a vol_1
5. Attach the RO copy by touching it:
$ fs flushall
$ ls /afs/mycell/vol_1
6. Convert RO copy to RW:
$ vos convertROtoRW host_a a vol_1
Notice that FSYNC_com_VolDone fails silently (FSYNC_BAD_STATE), leaving
the volume object for the RO copy set as VOL_STATE_ATTACHED (on success,
this volume should be set as VOL_STATE_DELETED).
7. Add replica on host_a:
$ vos addsite host_a a vol_1
8. Wait until the "inUse" flag of the RO entry is cleared (or force this
to happen by attaching multiple volumes).
9. Release the volume:
$ vos release vol_1
Failed to start transaction on volume 536870922
Volume not attached, does not exist, or not on line
Error in vos release command.
Volume not attached, does not exist, or not on line
To fix this problem, take the RO volume offline during the vos
convertROtoRW operation.
Reviewed-on: https://gerrit.openafs.org/14066
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 32d35db64061e4102281c235cf693341f9de9271)
Change-Id: Ie4cfab2f04a8859ddfcaece371198ac544066770
Reviewed-on: https://gerrit.openafs.org/14310
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 696d9ce83e3556bda2d3945326937c63dd3560ed
Author: Michael Meffie <mmeffie@sinenomine.net>
Date: Tue Apr 19 20:46:33 2016 -0400
ubik: positional io for db reads and writes
The ubik library was written before positional i/o was available and
issues an lseek system call for each database file read and write. This
change converts the ubik database accesses to use positional i/o on
platforms where pread and pwrite are available, in order to reduce
system call load.
The new inline uphys_pread and uphys_pwrite functions are supplied on
platforms which do not supply pread and pwrite. These functions fall
back to non-positional i/o. If these symbols are present in the database
server binary then the server process will continue to call lseek before
each read and write access of the database file.
This change does not affect the whole-file database synchronization done
by ubik during database recovery (via the DISK_SendFile and DISK_GetFile
RPCs), which still uses non-positional i/o. However, that code does not
share file descriptors with the phys.c code, so there is no possibility
of mixing positional and non-positional i/o on the same FDs.
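A sketch of the fallback functions described above, assuming they simply
emulate positional i/o with lseek (the names uphys_pread/uphys_pwrite
come from the message; the bodies here are illustrative):
    #include <sys/types.h>
    #include <unistd.h>
    /* Only used on platforms without a native pread/pwrite. */
    static ssize_t
    uphys_pread(int fd, void *buf, size_t nbyte, off_t offset)
    {
        if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
            return -1;
        return read(fd, buf, nbyte);
    }
    static ssize_t
    uphys_pwrite(int fd, const void *buf, size_t nbyte, off_t offset)
    {
        if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
            return -1;
        return write(fd, buf, nbyte);
    }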
Reviewed-on: https://gerrit.openafs.org/12272
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit f62fb17b3cf1c886f8cfc2fabe9984070dd3eec4)
Change-Id: Iccc8f749c89659f4acebd74a1115425f69610ba8
Reviewed-on: https://gerrit.openafs.org/14142
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Marcio Brito Barbosa <marciobritobarbosa@gmail.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit fffd6e07c87e36cd9a4a36858c3df0c282622195
Author: Michael Meffie <mmeffie@sinenomine.net>
Date: Wed Apr 20 18:17:16 2016 -0400
ubik: remove unnecessary lseeks in uphys_open
The ubik database file access layer has a file descriptor cache to avoid
reopening the database file on each file access. However, the file
offset is reset with lseek on each and every use of the cached file
descriptor, and the file offset is set twice when reading or writing
data records.
This change removes unnecessary and duplicate lseek system calls to
reduce the system call load.
Reviewed-on: https://gerrit.openafs.org/12271
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 6892bfbd701899281b34ee337637d438c7d8f8c6)
Change-Id: I5ea7857796d94eb5b659d868e79b9fea2411f301
Reviewed-on: https://gerrit.openafs.org/14141
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Marcio Brito Barbosa <marciobritobarbosa@gmail.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 438bc2c4caf96485e12eda9e7591fc4bd887ebad
Author: Andrew Deason <adeason@sinenomine.net>
Date: Thu Nov 9 12:47:57 2017 -0600
asetkey: Add new 'delete' command variants
The current 'delete' command from asetkey only lets the user delete
old-style rxkad keys. Add a couple of new variants to allow specifying
the key type and subtype, so the user can delete specific key types
and enctypes if they want.
Reviewed-on: https://gerrit.openafs.org/12767
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 5120409cc998284f2fb0467c2f88030976140341)
Change-Id: I8c762839b50f4faf5e583fb5c510bf2ff9dd2259
Reviewed-on: https://gerrit.openafs.org/14293
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 1c395cd1bcf5770f70dce36ef29f9bd7ba98acaa
Author: Kailas Zadbuke <kailashsz@in.ibm.com>
Date: Thu May 7 23:55:39 2020 -0400
salvaged: Fix "-parallel all<number>" parsing
In the salvageserver, the -parallel option takes an "all<number>"
argument. However, the code does not parse the numeric part correctly.
Due to this, only a single salvageserver instance was running even if a
larger number was provided with the "all" argument.
With this fix, the numeric part of the "all" argument is parsed
correctly and the required number of salvageserver instances is started.
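A sketch of the corrected parsing (the helper and parameter names are
assumptions; the "all<number>" form is from the message):
    #include <stdlib.h>
    #include <string.h>
    /* Illustrative only: return how many salvage processes to run. */
    static int
    parse_parallel(const char *parm)
    {
        int n = 1;
        if (strncmp(parm, "all", 3) == 0)
            parm += 3;            /* skip the "all" prefix ...          */
        if (*parm != '\0')
            n = atoi(parm);       /* ... then parse the numeric part    */
        return n > 0 ? n : 1;
    }
    /* e.g. parse_parallel("all4") == 4, parse_parallel("all") == 1 */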
Reviewed-on: https://gerrit.openafs.org/14201
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 4512d04a9b721cd9052c0e8fe026c93faf6edb9e)
Change-Id: I8910fb514986a404b22256e8a514955a684c8a27
Reviewed-on: https://gerrit.openafs.org/14285
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Kailas Zadbuke <kailashsz@in.ibm.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 38f7e123c33c1c11eb275e0619080ee874244afd
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Tue Jun 16 15:20:20 2020 -0600
tests: Use usleep instead of nanosleep
Commit "Build tests by default" 68f406436cc21853ff854c514353e7eb607cb6cb
changes the build so tests are always built.
On Solaris 10 the build fails because nanosleep is in librt, which we do
not link against.
Replace nanosleep with usleep. This avoids introducing extra configure
tests just for Solaris 10.
Note that with Solaris 11 nanosleep was moved from librt to libc, the
standard C library.
Reviewed-on: https://gerrit.openafs.org/14244
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 22a66e7b7e1d73437a8c26c2a1b45bc4ef214e77)
Change-Id: Ic24c5a149451955b5c130e7b36cec27e02688d83
Reviewed-on: https://gerrit.openafs.org/14291
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 264d6b786c2fd0aff0e4abe835e42868f936afce
Author: Andrew Deason <adeason@sinenomine.net>
Date: Tue Mar 24 11:59:48 2020 -0500
tests: Wait for server start in auth/superuser-t
The auth/superuser-t test runs an Rx server and client in two child
processes. If the client process tries to contact the server before
the server has started listening on its port, some tests involving
RPCs can fail (notably test 39, "Can run a simple RPC").
Normally if we try to contact a server that's not there, Rx will try
resending its packets a few times, but on Linux with AFS_RXERRQ_ENV,
if the port isn't open at all, we can get an ICMP_PORT_UNREACH error,
which causes the relevant Rx call to die immediately with
RX_CALL_DEAD.
This means that if the auth/superuser-t client is only just a bit
faster than the server starting up, tests can fail, since the server's
port is not open yet.
To avoid this, we can wait until the server's port is open before
starting the client process. To do this, have the server process send
a SIGUSR1 to the parent after rx_Init() is called, and have the parent
process wait for the SIGUSR1 (waiting for a max of 5 seconds before
failing). This should guarantee that the server's port will be open by
the time the client starts running.
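A sketch of the parent-side wait (sigtimedwait is one way to implement
the 5-second bound; the actual test code may differ):
    #include <signal.h>
    #include <time.h>
    /* Illustrative only: block SIGUSR1, then wait up to 5 seconds for it. */
    static int
    wait_for_server_ready(void)
    {
        sigset_t set;
        struct timespec timeout = { 5, 0 };
        sigemptyset(&set);
        sigaddset(&set, SIGUSR1);
        sigprocmask(SIG_BLOCK, &set, NULL);
        if (sigtimedwait(&set, NULL, &timeout) < 0)
            return -1;    /* timed out (or error): fail the test run */
        return 0;         /* server signalled: its port is open */
    }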
Note that before commit 086d1858 (LINUX: Include linux/time.h for
linux/errqueue.h), AFS_RXERRQ_ENV was mistakenly disabled on Linux
3.17+, so this issue was probably not possible on recent Linux before
that commit.
Reviewed-on: https://gerrit.openafs.org/14109
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
(cherry picked from commit 66d0f91791695ac585f0511d0dadafd4e570b1bf)
Change-Id: Ia6c06ca9a05e33b3bc35238d9c0d18e7ff339438
Reviewed-on: https://gerrit.openafs.org/14290
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit eaa552d27074cd7e1c862d606916b063e6d89a27
Author: Andrew Deason <adeason@sinenomine.net>
Date: Tue Dec 31 12:04:48 2019 -0600
tests: Introduce afstest_GetProgname
Currently, in tests/volser/vos-t.c we call afs_com_err as
"authname-t", which is clearly a mistake during some code refactoring
(introduced in commit 2ce3fdc5, "tests: Abstract out code to produce a
Ubik client").
We could just change this to "vos-t", but instead of specifying
constant strings everywhere, change this to figure out what the
current command is called, and just use that. Put this code into a new
function, afstest_GetProgname, and convert existing tests to use that
instead of hard-coding the program name given to afs_com_err.
Reviewed-on: https://gerrit.openafs.org/13991
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit a21a2f8edb79d6190976e920a9a90d0878411146)
Change-Id: I3d410d6de132f8a0fffeb9cce32a912fe3bbdc20
Reviewed-on: https://gerrit.openafs.org/14289
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit fe57174d84cab19a13eea9a8f31f0ac71c8cf838
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Wed Aug 5 09:23:36 2020 -0600
butc: fix int to float conversion warning
Building with clang-10 results in 2 warnings/errors associated with
trying to convert 0x7fffffff to a floating point value.
tcmain.c:240:18: error: implicit conversion from 'int' to 'float'
changes value from 2147483647 to 2147483648 [-Werror,
-Wimplicit-int-float-conversion]
if ((total > 0x7fffffff) || (total < 0)) /* Don't go over 2G */
and the same conversion warning on the statement on the following line:
total = 0x7fffffff;
Use floating point and decimal constants instead of the hex constants.
For the test, use 2147483648.0 which is cleanly represented by a float.
Change the comparison in the test from '>' to '>='.
If the total value exceeds 2G, just assign the max value directly to the
return variable.
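A sketch of the adjusted bound check (variable names and the return
variable's type are assumptions; the 2147483648.0 constant and the '>='
comparison are from the message):
    /* Illustrative only: 'total' is accumulated as a float elsewhere. */
    int size;                                  /* return variable (type assumed) */
    if (total >= 2147483648.0 || total < 0)    /* don't go over 2G */
        size = 2147483647;                     /* assign the max directly */
    else
        size = (int)total;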
Reviewed-on: https://gerrit.openafs.org/14277
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 37b55b30c65d0ab8c8eaabfda0dbd90829e2c46a)
Change-Id: I16e0acd893049b01a2c5e4c7e71de3fa40e28d3e
Reviewed-on: https://gerrit.openafs.org/14299
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit e7902252f15acfc28453c531f6fa3b29c9c91b92
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Fri Aug 21 10:37:51 2020 -0600
LINUX 5.9: Remove HAVE_UNLOCKED_IOCTL/COMPAT_IOCTL
Linux-5.9-rc1 commit 'fs: remove the HAVE_UNLOCKED_IOCTL and
HAVE_COMPAT_IOCTL defines' (4e24566a) removed the two referenced macros
from the kernel.
The support for unlocked_ioctl and compat_ioctl were introduced in
Linux 2.6.11.
Remove references to HAVE_UNLOCKED_IOCTL and HAVE_COMPAT_IOCTL using
the assumption that they were always defined.
Notes:
With this change, building against kernels 2.6.10 and older will fail.
RHEL4 (EOL in March 2017) used a 2.6.9 kernel. RHEL5 uses a 2.6.18
kernel.
In linux-2.6.33-rc1 the commit messages for "staging: comedi:
Remove check for HAVE_UNLOCKED_IOCTL" (00a1855c) and "Staging: comedi:
remove check for HAVE_COMPAT_IOCTL" (5d7ae225) both state that all new
kernels have support for unlocked_ioctl/compat_ioctl so the checks can
be removed along with removing support for older kernels.
Reviewed-on: https://gerrit.openafs.org/14300
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 13a49aaf0d5c43bce08135edaabb65587e1a8031)
Change-Id: I6dc5ae5b32031641f4a021a31630390a91d834fe
Reviewed-on: https://gerrit.openafs.org/14315
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 99232fa48020bde90f6b245da4374bc63399654f
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Wed Aug 5 09:19:02 2020 -0600
afs: Set AFS_VFSFSID to a numerical value
Currently when UKERNEL is defined, AFS_VFSFSID is always set to
AFS_MOUNT_AFS, which is a string for many platforms for UKERNEL.
Update src/afs/afs.h to ensure that the define for AFS_VFSFSID is a
numeric value when building UKERNEL.
Clean up the preprocessor indentation in src/afs/afs.h in the area
around the AFS_VFSFSID defines.
Thanks to adeason@sinenomine.net for pointing out a much easier solution
for resolving this problem.
Reviewed-on: https://gerrit.openafs.org/14279
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 446457a1240b88fd94fc34ff5715f2b7f2f3ef12)
Change-Id: Ic0f9c2f1f4baeb9f99da19e1187f1bc9f5d7f824
Reviewed-on: https://gerrit.openafs.org/14297
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 7e2a62fb594a7713a5c694b9e0401b5e3c2648c8
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Wed Aug 5 09:18:49 2020 -0600
afs: Avoid using logical OR when setting f_fsid
Building with clang-10 produces the warning/error message
warning: converting the result of '<<' to a boolean always evaluates
to true [-Wtautological-constant-compare]
for the expression
abp->f_fsid = (AFS_VFSMAGIC << 16) || AFS_VFSFSID;
The message appears because a logical OR '||' is used instead of a
bitwise OR '|'. The result of this expression will always set the f_fsid
member to 1 and not the intended value of AFS_VFSMAGIC combined with
AFS_VFSFSID.
Update the expression to use a bitwise OR instead of the logical OR.
Note: This will change the value stored in the f_fsid that is returned from
statfs.
Using a logical OR has existed since OpenAFS 1.0 for hpux/solaris and in
UKERNEL since OpenAFS 1.5 with the commit 'UKERNEL: add uafs_statvfs'
b822971a.
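The difference is easy to see in a standalone snippet (the constants
here are placeholders, not the real AFS_VFSMAGIC/AFS_VFSFSID values):
    #include <stdio.h>
    int main(void)
    {
        unsigned int magic = 0x1234, fsid = 0x0056;     /* placeholders */
        unsigned int wrong = (magic << 16) || fsid;     /* logical OR: always 1 */
        unsigned int right = (magic << 16) |  fsid;     /* bitwise OR: 0x12340056 */
        printf("logical OR: 0x%x\n", wrong);
        printf("bitwise OR: 0x%x\n", right);
        return 0;
    }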
Reviewed-on: https://gerrit.openafs.org/14292
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit c56873bf95f6325b70e63ed56ce59a3c6b2b753b)
Change-Id: I2e3fe6a84ef6ce73fff933f137d4806efefa5949
Reviewed-on: https://gerrit.openafs.org/14298
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit fc32922236122b9013ed9bdedda8c54c2e8c75d8
Author: Andrew Deason <adeason@sinenomine.net>
Date: Fri May 1 15:02:08 2020 -0500
vlserver: Return error when growing beyond 2 GiB
In the vlserver, when we add a new vlentry or extent block, we grow
the VLDB by doing something like this:
vital_header.eofPtr += sizeof(item);
Since we don't check for overflow, and all of our offset-related
variables are signed 32-bit integers, this can cause some odd behavior
if we try to grow the database to be over 2 GiB in size.
To avoid this, change the two places in vlserver code that grow the
database to use a new function, grow_eofPtr(), which checks for 31-bit
overflow. If we are about to overflow, log a message and return an
error.
See the following for a specific example of our "odd behavior" when we
overflow the 2 GiB limit in the VLDB:
With 1 extent block, we can create 14509076 vlentries successfully. On
the 14509077th vlentry, we'll attempt to write the entry to offset
2147483560 (0x7FFFFFA8). Since a vlentry is 148 bytes long, we'll
write all the way through offset 2147483707 (0x8000003B), which is
over the 31-bit limit.
In the udisk subsystem, this results in writing to page numbers
2097151, and -2097152 (since our ubik pages are 1k, and going over the
31-bit limit causes us to treat offsets as negative). These pages
start at physical offsets 2147482688 (0x7FFFFC40) and -2147483584
(-0x7FFFFFC0) in our vldb.DB0 (where offset is page*1024+64).
Modifying each of these pages involves reading in the existing page
first, modifying the parts we are changing, and writing it back. This
works just fine for 2097151, but of course fails for -2097152. The
latter fails in DReadBuffer when eventually our pread() fails with
EINVAL, and causes ubik to log the message:
Ubik: Error reading database file: errno=22
But when DReadBuffer fails, DReadBufferForWrite assumes this is due to
EOF, and just creates a new buffer for the given page (DNewBuffer).
So, the udisk_write() call ultimately succeeds.
When we go to flush the dirty data to disk when committing the
transaction, after we have successfully written the transaction log,
DFlush() fails for the -2097152 page when the pwrite() call eventually
fails with EINVAL, causing ubik to panic, logging the messages:
Ubik PANIC:
Writing Ubik DB modifications
When the vlserver gets restarted by bosserver, we then process the
transaction log, and perform the operations in the log before starting
up (ReplayLog). The log records the actual data we wrote, not split
into pages, and the log-replaying code writes directly to the db
using uphys_write instead of udisk_write. So, because of this, the
write actually succeeds when replaying the log, since we just write
148 bytes to offset 2147483624 (0x7FFFFFE8), and no negative offsets
are used.
The vlserver will then be able to run, but will be unable to read that
newly-created vlentry, since it involves reading a ubik page beyond
the 31-bit boundary. That means trying to look up that entry will fail
with i/o errors, as will lookups of any entry on the same hash chain as
the new entry (since the new entry will be added to the head of the
hash chain). Listing all entries in the database will also just show
an empty database, since our vital_header.eofPtr will be negative, and
we determine EOF by comparing our current blockindex to the value in
eofPtr.
Reviewed-on: https://gerrit.openafs.org/14180
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
(cherry picked from commit d01398731550b8a93b293800642c3e1592099114)
Change-Id: I72302a8c472b3270e99e58a573f5cf25dd34b9c5
Reviewed-on: https://gerrit.openafs.org/14288
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 03979eaa174de0cff188af451806fb7ae10d3a62
Author: Andrew Deason <adeason@sinenomine.net>
Date: Tue Apr 7 13:15:31 2020 -0500
vlserver: Correctly pad nvlentry for "O" RPCs
For our old-style "O" RPCs (e.g. VL_CreateEntry, instead of
VL_CreateEntryN), vlserver calls vldbentry_to_vlentry to convert to
the internal 'struct nvlentry' format. After all of the sites have
been copied to the internal format, we fill the remaining sites by
setting the serverNumber to BADSERVERID. For nvldbentry_to_vlentry, we
do this for NMAXNSERVERS sites, but for vldbentry_to_vlentry, we do
this for OMAXNSERVERS.
The thing is, both functions are filling in entries for a 'struct
nvlentry', which has NMAXNSERVERS 'serverNumber' entries. So for
vldbentry_to_vlentry, we are skipping setting the last few sites
(specifically, NMAXNSERVERS-OMAXNSERVERS = 13-8 = 5).
This can easily cause our O-style RPCs to write out entries to disk
that have uninitialized sites at the end of the array. For example, an
entry with one site should have server numbers that look like this:
serverNumber = {1, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}
That is, one real serverid (a '1' here), followed by twelve
BADSERVERIDs.
But for a VL_CreateEntry call, the 'struct nvlentry' is zeroed out
before vldbentry_to_vlentry is called, and so the server numbers in
the written entry look like this:
serverNumber = {1, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0}
That is, one real serverid (a '1' here), followed by seven
BADSERVERIDs, followed by five '0's.
Most of the time, this is not noticeable, since our code that reads in
entries from disk stops processing sites when we encounter the first
BADSERVERID site (see vlentry_to_nvldbentry). However, if the entry
has 8 sites, then none of the entries will contain BADSERVERID, and so
we will actually process the trailing 5 bogus sites. This would appear
as 5 extra volume sites for a volume, most likely all for the same
server.
For VL_CreateEntry, the vlentry struct is always zeroed before we use
it, so the trailing sites will always be filled with 0. For
VL_ReplaceEntry, the trailing sites will be unchanged from whatever
was read in from the existing disk entry.
To fix this, just change the relevant loop to go through NMAXNSERVERS
entries, so we actually go to the end of the serverNumber (et al)
array.
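A sketch of the corrected padding loop, with simplified stand-ins for
the struct nvlentry fields (the constants 13 and 255 are the ones
quoted above):
    #define NMAXNSERVERS 13
    #define BADSERVERID  255

    struct nvlentry_sketch {
        int serverNumber[NMAXNSERVERS];
    };

    static void
    pad_remaining_sites(struct nvlentry_sketch *nve, int nsites)
    {
        int i;

        /* was "i < OMAXNSERVERS" (8), which left the last 5 slots untouched */
        for (i = nsites; i < NMAXNSERVERS; i++)
            nve->serverNumber[i] = BADSERVERID;
    }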
This may appear similar to commit ddf7d2a7 (vlserver: initialize
nvlentry elements after read). However, that commit fixed a case
involving the old vldb database format (which hopefully is not being
used). This commit fixes a case where we are using the new vldb
database format, but with the old RPCs, which may still be used by old
tools.
Reviewed-on: https://gerrit.openafs.org/14139
Tested-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 7e41ee0bd50d39a356f0435ff370a0a7be40306f)
Change-Id: I68ecc41e7268efd0424e6c129aa914cd04115b59
Reviewed-on: https://gerrit.openafs.org/14287
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 986ee1fd422d1608553af5a6a02bd60349b864ac
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Fri Jun 19 08:01:14 2020 -0600
afs: Avoid panics on failed return from afs_CFileOpen
afs_CFileOpen is a macro that invokes the open "method" of the
afs_cacheOps structure, and for disk caches the osi_UFSOpen function is
used.
Currently osi_UFSOpen will panic if there is an error encountered while
opening a file.
Prepare to handle the osi_UFSOpen function returning NULL instead of
issuing a panic (future commit).
Update callers of afs_CFileOpen to test for an error and to return an
error instead of issuing a panic.
While this commit eliminates some panics, it does not address some of the
more complex cases associated with errors from afs_CFileOpen.
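A sketch of the caller-side pattern this change introduces; the helper
names and surrounding code are illustrative, not the actual
cache-manager callers:
    #include <errno.h>

    struct osi_file;                        /* opaque cache-file handle */
    extern struct osi_file *afs_CFileOpen_sketch(void *ainode);

    static int
    use_cache_file_sketch(void *ainode)
    {
        struct osi_file *tfile = afs_CFileOpen_sketch(ainode);

        if (tfile == NULL)
            return EIO;         /* report the failure instead of panicking */

        /* ... read/write the chunk, then close tfile ... */
        return 0;
    }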
Reviewed-on: https://gerrit.openafs.org/14241
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit d2d27f975df13c3833898611dacff940a5ba3e2a)
Change-Id: Ia30711748b3cffd56eb3120961aa1747dfae0f23
Reviewed-on: https://gerrit.openafs.org/14286
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit a00fe9fd4253b2436ae649b05b329d0d52c8b6f6
Author: Yadavendra Yadav <yadayada@in.ibm.com>
Date: Thu Mar 5 07:21:55 2020 +0000
LINUX: Initialize CellLRU during osi_Init
When the OpenAFS kernel module is loaded, it creates certain entries
in the "proc" filesystem. One of those entries is "CellServDB"; if we
read "/proc/fs/openafs/CellServDB" without starting "afsd", the result
is a crash with a NULL pointer dereference. The crash happens because
CellLRU has not been initialized yet (since "afsd" has not started),
i.e. afs_CellInit has not yet been called, so the "next" and "prev"
pointers are NULL. Inside "c_start()" we do not check for a NULL
pointer while traversing CellLRU, and this causes the crash.
To avoid this, initialize CellLRU during module initialization.
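A minimal sketch of the idea, assuming CellLRU is the usual afs_q-style
circular list head; the initializer is written out by hand here rather
than using whatever list-init macro the real code uses:
    struct afs_q_sketch {
        struct afs_q_sketch *next, *prev;
    };

    static struct afs_q_sketch CellLRU_sketch;

    /* Called from module initialization, before afsd has run, so that a
     * reader walking the list sees an empty ring rather than NULL pointers. */
    static void
    init_cell_lru_sketch(void)
    {
        CellLRU_sketch.next = &CellLRU_sketch;
        CellLRU_sketch.prev = &CellLRU_sketch;
    }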
Reviewed-on: https://gerrit.openafs.org/14093
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 8d90a9d27b0ef28ddcdd3eb041c8a9d019b84b50)
Change-Id: I9ed97d3751943331c9d9bc9dfc73acc24231187b
Reviewed-on: https://gerrit.openafs.org/14284
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 6824d45c2ab83e52350c3f366e4cb6f1eb263088
Author: Kailas Zadbuke <kailashsz@in.ibm.com>
Date: Wed Jun 3 15:44:08 2020 +0530
util: Handle serverLogMutex lock across forks
If a process forks when another thread has serverLogMutex locked, the
child process inherits the locked serverLogMutex. This causes a deadlock
when code in the child process tries to lock serverLogMutex, since we
can never unlock serverLogMutex because the locking thread no longer
exists. This can happen in the salvageserver, since the salvageserver
locks serverLogMutex in different threads, and forks to handle salvage
jobs.
To avoid this deadlock, we register handlers using pthread_atfork()
so that serverLogMutex is acquired before the fork proceeds: the fork
blocks until any worker thread releases serverLogMutex, the forking
thread then holds it across the fork, and the atfork handlers release
it in both the parent and the child once the fork is complete.
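A self-contained sketch of the pthread_atfork() arrangement (handler
names are illustrative):
    #include <pthread.h>

    static pthread_mutex_t serverLogMutex = PTHREAD_MUTEX_INITIALIZER;

    /* prepare: taken before fork(), so no worker holds the lock when the
     * child is created; parent/child: release it on both sides afterwards. */
    static void log_fork_prepare(void) { pthread_mutex_lock(&serverLogMutex); }
    static void log_fork_parent(void)  { pthread_mutex_unlock(&serverLogMutex); }
    static void log_fork_child(void)   { pthread_mutex_unlock(&serverLogMutex); }

    static void
    setup_log_atfork_sketch(void)
    {
        pthread_atfork(log_fork_prepare, log_fork_parent, log_fork_child);
    }
The prepare handler serializes against any worker that currently holds
the lock, which is what keeps the child's copy of the mutex consistent.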
Thanks to Yadavendra Yadav (yadayada@in.ibm.com) for working with me
on this issue.
Reviewed-on: https://gerrit.openafs.org/14239
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit e44d6441c8786fdaaa1fad1b1ae77704c12f7d60)
Change-Id: I09c04c0bd99b10433857ccdaeb4ee6a4cd50f768
Reviewed-on: https://gerrit.openafs.org/14283
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Kailas Zadbuke <kailashsz@in.ibm.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 179a418ea5063785a23e4faf35134f063a6f3e1c
Author: Andrew Deason <adeason@sinenomine.net>
Date: Fri Mar 13 13:00:35 2020 -0500
LINUX: Properly revert creds in osi_UFSTruncate
Commit cd3221d3 (Linux: use override_creds when available) caused us
to force the current process's creds to the creds of afsd during
osi_file.c file ops, to avoid access errors in some cases.
However, one code path in osi_UFSTruncate was missed and does not
revert our creds back to the original user's creds: when the
afs_osi_Stat call fails or deems the truncate unnecessary. In that
case, the calling process keeps afsd's creds after osi_UFSTruncate
returns, causing our subsequent access-checking code to think that the
current process is in the same context as afsd (typically uid 0
without a pag).
This can cause the calling process to appear to transiently have the
same access as non-pag uid 0; typically this will be unauthenticated
access, but could be authenticated if uid 0 has tokens.
To fix this, modify the early return in osi_UFSTruncate to go through
a 'goto done' destructor instead, and make sure we revert our creds in
that destructor.
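A sketch of the control-flow change; the *_sketch stand-ins keep it
compilable outside the kernel, and the real function's locking and stat
details are elided:
    struct cred;
    extern const struct cred *override_creds_sketch(const struct cred *new);
    extern void revert_creds_sketch(const struct cred *old);
    extern int stat_and_check_sketch(void *afile, int asize);
    extern const struct cred *cache_creds_sketch;

    static int
    osi_UFSTruncate_sketch(void *afile, int asize)
    {
        int code;
        const struct cred *saved = override_creds_sketch(cache_creds_sketch);

        code = stat_and_check_sketch(afile, asize);
        if (code)
            goto done;  /* previously returned here, leaving afsd's creds active */

        /* ... perform the actual truncate ... */

     done:
        revert_creds_sketch(saved);     /* always restore the caller's creds */
        return code;
    }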
Thanks to cwills@sinenomine.net for finding and helping reproduce the
issue.
Reviewed-on: https://gerrit.openafs.org/14098
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Jeffrey Hutzelman <jhutz@cmu.edu>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: Cheyenne Wills <cwills@sinenomine.net>
(cherry picked from commit 57b4f4f9be1e25d5609301c10f717aff32aef676)
Change-Id: I714eb2dea9645ffe555f26b5d69707a7afbe8d81
Reviewed-on: https://gerrit.openafs.org/14099
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Jeffrey Hutzelman <jhutz@cmu.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit ee578e92d9f810d93659a9805d0c12084fe2bb95
Author: Jeffrey Hutzelman <jhutz@cmu.edu>
Date: Thu May 2 16:02:47 2019 -0400
Linux: use override_creds when available
Linux may perform some access control checks at the time of an I/O
operation, rather than relying solely on checks done when the file is
opened. In some cases (e.g. AppArmor), these checks are done based on
the current task's creds at the time of the I/O operation, not those
used when the file was opened.
Because of this, we must use override_creds() / revert_creds() to make
sure we are using privileged credentials when performing I/O operations
on cache files. Otherwise, cache I/O operations done in the context of
a task with a restrictive AppArmor profile will fail.
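A kernel-side sketch of the pairing, assuming a saved 'cache_creds'
pointer captured at afsd startup; the wrapper names and the use of
kernel_read() (its 4-argument form is the 4.14+ one) are illustrative:
    #include <linux/cred.h>
    #include <linux/fs.h>

    /* Captured once at afsd startup (sketch; the real code stores this
     * differently). */
    static const struct cred *cache_creds;

    static void
    remember_afsd_creds_sketch(void)
    {
        cache_creds = get_current_cred();   /* take a ref on afsd's creds */
    }

    static ssize_t
    cache_read_sketch(struct file *fp, void *buf, size_t len, loff_t *pos)
    {
        const struct cred *saved = override_creds(cache_creds);
        ssize_t n = kernel_read(fp, buf, len, pos); /* checked against afsd */
        revert_creds(saved);                        /* back to caller's creds */
        return n;
    }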
Reviewed-on: https://gerrit.openafs.org/13751
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit cd3221d3532a28111ad22d4090ec913cbbff40da)
Change-Id: I8955ff6150462fecba9a10a8f99bce9ee8163435
Reviewed-on: https://gerrit.openafs.org/14082
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Jeffrey Hutzelman <jhutz@cmu.edu>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit facff58b840a47853592510617ba7a1da2e3eaa9
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Fri Jul 3 10:35:06 2020 -0600
LINUX 5.8: use lru_cache_add
With Linux-5.8-rc1 commit 'mm: fold and remove lru_cache_add_anon() and
lru_cache_add_file()' (6058eaec), the lru_cache_add_file function is
removed since it was functionally equivalent to lru_cache_add.
Replace lru_cache_add_file with lru_cache_add.
Introduce a new autoconf test to determine if lru_cache_add is present.
For reference, the Linux changes associated with the lru caches:
  __pagevec_lru_add introduced before v2.6.12-rc2
  lru_cache_add_file introduced in v2.6.28-rc1
  __pagevec_lru_add_file replaces __pagevec_lru_add in v2.6.28-rc1
    "vmscan: split LRU lists into anon & file sets" (4f98a2fee)
  __pagevec_lru_add removed in v5.7 with a note to use lru_cache_add_file
    "mm/swap.c: not necessary to export __pagevec_lru_add()" (bde07cfc6)
  lru_cache_add_file removed in v5.8
    "mm: fold and remove lru_cache_add_anon() and lru_cache_add_file()"
    (6058eaec)
  lru_cache_add exported
    "mm: fold and remove lru_cache_add_anon() and lru_cache_add_file()"
    (6058eaec)
OpenAFS will use (see the compat sketch below):
  lru_cache_add on 5.8 kernels
  lru_cache_add_file from 2.6.28 through 5.7 kernels
  __pagevec_lru_add/__pagevec_lru_add_file on pre-2.6.28 kernels
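A compat-shim sketch; the macro and config-symbol names are
illustrative, not necessarily the ones the patch introduces:
    #if defined(HAVE_LINUX_LRU_CACHE_ADD)            /* Linux 5.8+ */
    # define afs_lru_cache_add_page(p)   lru_cache_add(p)
    #elif defined(HAVE_LINUX_LRU_CACHE_ADD_FILE)     /* 2.6.28 through 5.7 */
    # define afs_lru_cache_add_page(p)   lru_cache_add_file(p)
    #endif
    /* pre-2.6.28 kernels keep the __pagevec_lru_add/__pagevec_lru_add_file path */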
Reviewed-on: https://gerrit.openafs.org/14249
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Yadavendra Yadav <yadayada@in.ibm.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 7d85ce221d6ccc19cf76ce7680c74311e4ed2632)
Change-Id: Iba6ef4441687dbf60d227a708e2a032c2c0dc79f
Reviewed-on: https://gerrit.openafs.org/14269
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Laß <lass@mail.uni-paderborn.de>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 335f37be13d2ff954e4aeea617ee66502170805e
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Fri Jul 3 10:34:42 2020 -0600
LINUX 5.8: do not set name field in backing_dev_info
Linux-5.8-rc1 commit 'bdi: remove the name field in struct
backing_dev_info' (1cd925d5838) removes the name field.
Do not set the name field in the backing_dev_info structure if it is
not available. This uses the existing config test
'STRUCT_BACKING_DEV_INFO_HAS_NAME'.
Note: the name field in the backing_dev_info structure was added in
Linux 2.6.32.
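A sketch of the guarded assignment (the variable and the "openafs"
string are illustrative):
    #if defined(STRUCT_BACKING_DEV_INFO_HAS_NAME)
        afs_backing_dev_info->name = "openafs";   /* field exists: 2.6.32 - 5.7 */
    #endif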
Reviewed-on: https://gerrit.openafs.org/14248
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit d8ec294534fcdee77a2ccd297b4b167dc4d5573d)
Change-Id: I3d9e18092db998a4c4f26bd63ee3b75383a53d4c
Reviewed-on: https://gerrit.openafs.org/14268
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Laß <lass@mail.uni-paderborn.de>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit d7fc5bf9bf031089d80703c48daf30d5b15a80ca
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Fri Jul 3 10:33:51 2020 -0600
LINUX 5.8: Replace kernel_setsockopt with new funcs
Linux 5.8-rc1 commit 'net: remove kernel_setsockopt' (5a892ff2facb)
retires the kernel_setsockopt function. In prior kernel commits new
functions (ip_sock_set_*) were added to replace the specific functions
performed by kernel_setsockopt.
Define a new config test 'HAVE_IP_SOCK_SET' if the 'ip_sock_set' functions
are available. The config define 'HAVE_KERNEL_SETSOCKOPT' is no longer
set in Linux 5.8.
Create wrapper functions that replace the kernel_setsockopt calls with
calls to the appropriate Linux kernel function(s) (depending on what
functions the kernel supports).
Remove the unused 'kernel_getsockopt' function (used for building with
pre 2.6.19 kernels).
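One possible shape for such a wrapper, using IP_TOS purely as an
example; the wrapper name is made up, while ip_sock_set_tos() and
kernel_setsockopt() are the real kernel entry points:
    #include <linux/in.h>
    #include <linux/net.h>
    #include <net/ip.h>

    /* Set the IP TOS on a kernel socket with whichever API the kernel offers. */
    static void
    osi_set_tos_sketch(struct socket *sockp, int tos)
    {
    #if defined(HAVE_IP_SOCK_SET)
        ip_sock_set_tos(sockp->sk, tos);              /* Linux 5.8 and later */
    #else
        kernel_setsockopt(sockp, SOL_IP, IP_TOS,
                          (char *)&tos, sizeof(tos)); /* Linux 2.6.19 - 5.7 */
    #endif
    }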
For reference:
Linux 2.6.19 introduced kernel_setsockopt
Linux 5.8 removed kernel_setsockopt and replaced the functionality
with a set of new functions (ip_sock_set_*)
Reviewed-on: https://gerrit.openafs.org/14247
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit c48072b9800759ef1682b91ff1e962f6904a2594)
Change-Id: I2724fad06b1882149d2066d13eced55eff5ee695
Reviewed-on: https://gerrit.openafs.org/14267
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Laß <lass@mail.uni-paderborn.de>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 0f67e733e82a9001f3f9253c5e1880be845d537b
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Thu Apr 2 13:27:50 2020 -0600
LINUX: Include linux/time.h for linux/errqueue.h
The configuration test for errqueue.h fails with an undefined structure
error on a Linux 3.17 (or higher) system. This prevents setting
HAVE_LINUX_ERRQUEUE_H, which is used to define AFS_RXERRQ_ENV.
Linux commit f24b9be5957b38bb420b838115040dc2031b7d0c (net-timestamp:
extend SCM_TIMESTAMPING ancillary data struct), which was picked up in
Linux 3.17, added a structure that uses the timespec structure. After
this commit, we need to include linux/time.h to pull in the definition
of the timespec struct.
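The fix boils down to the include ordering below, as it would appear in
the configure test body:
    #include <linux/time.h>     /* struct timespec, needed by errqueue.h on 3.17+ */
    #include <linux/errqueue.h>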
Reviewed-on: https://gerrit.openafs.org/13950
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 086d185872da5f19447cf5ec7846e7ce5104563f)
Change-Id: I67d01b11c5ea95b8b87832fc833f8fc850ade684
Reviewed-on: https://gerrit.openafs.org/14130
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 5a14bd0abe83b580f0cc7a200ae963399ab7de5f
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Tue May 26 12:11:28 2020 -0600
vol: Fix format-truncation warning with gcc-10.1
Building with gcc-10.1 produces a warning (error if --enable-checking)
in vol-salvage.c
error: %s directive output may be truncated writing up to 755 bytes
into a region of size 255 [-Werror=format-truncation=]
809 | snprintf(inodeListPath, 255, "%s" OS_DIRSEP "salvage.inodes.%s.%d", tdir, name,
Use strdup/asprintf to allocate the buffer dynamically instead of using
a buffer with a hardcoded size.
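A sketch of the dynamic-allocation approach; the final format argument
is elided in the snippet quoted above, so 'pid' is a hypothetical
placeholder here:
    char *inodeListPath = NULL;

    if (asprintf(&inodeListPath, "%s" OS_DIRSEP "salvage.inodes.%s.%d",
                 tdir, name, pid) < 0) {
        inodeListPath = NULL;   /* allocation failed; treat as an error */
    }
    /* ... use inodeListPath, then free(inodeListPath); ... */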
Reviewed-on: https://gerrit.openafs.org/14207
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit d73680c5f70ee5aeb634a9ec88bf1097743d0f76)
Change-Id: I8d3bf244a70f723f585738905deea7ddfb1bb862
Reviewed-on: https://gerrit.openafs.org/14232
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit d5fc5283e91ea94a67df8364a5b8bf8970ffe934
Author: Michael Meffie <mmeffie@sinenomine.net>
Date: Mon Oct 9 22:16:09 2017 -0400
afsmonitor: remove unused LWP_WaitProcess
Remove the unimplemented once-only flag and the unused LWP_WaitProcess
call.
Reviewed-on: https://gerrit.openafs.org/12745
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 7c27365ea24aed5787f6fc03f30f6085c78ece51)
Change-Id: I3b61f9fb4f45564304b0e35878d3535a10e31d02
Reviewed-on: https://gerrit.openafs.org/14226
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit a2eec64374d6b754b29c509b554573cb6e53eb46
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Fri May 22 12:16:48 2020 -0600
Avoid duplicate definitions of globals
GCC 10 changed a default flag from -fcommon to -fno-common. See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85678 for some background.
The change in gcc 10 results in link-time errors when building. For example:
../../src/xstat/.libs/liboafs_xstat_cm.a(xstat_cm.o):(.bss+0x2050):
multiple definition of `numCollections';
Ensure that only one definition exists for each global data object, and
change other references to use "extern" as needed.
To ensure that future changes do not introduce duplicated global
definitions, add the -fno-common flag to XCFLAGS when using the
configure --enable-checking setting.
[cwills@sinenomine.net: Note for 1.8.x: renamed terminationEvent
to cm_terminationEvent/fs_terminationEvent instead of deleting it.]
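A sketch of the one-definition pattern that -fno-common enforces;
numCollections is the symbol from the linker error above, with its type
shown as plain int for illustration:
    /* xstat_cm.h (shared header): declaration only */
    extern int numCollections;

    /* xstat_cm.c (exactly one translation unit): the single definition */
    int numCollections = 0;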
Reviewed-on: https://gerrit.openafs.org/14106
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 0e2072ae386d4111bef161eb955964b649c31386)
Change-Id: I54ca61d372cf763e4a28c0b0829ea361219f6203
Reviewed-on: https://gerrit.openafs.org/14217
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>